API and Data Handoff

This is a handoff snapshot, not the product SSOT.

Product SSOT: mall-docs report-notebooklm docs, snapshot date: 2026-06-03.

Current Implementation Status

Implemented in this repository:

  • FastAPI app under /api/report-notebooklm/v1.
  • SQLAlchemy model layer for the Phase 1 table set.
  • Alembic initial migration.
  • Seed import script with institutions, reports, modules, audio assets, users, favorites, and playback-progress fixtures.
  • Public read endpoints for health, feeds, reports, modules, institutions, and listen list.
  • Tests covering seed counts, public response shape, module visibility, gray-source handling, and listen behavior.

Not implemented yet:

  • Auth APIs.
  • Personal state APIs.
  • Audio stream signing endpoint.
  • Outbound events endpoint.
  • Internal management APIs.
  • Real Redis cache invalidation policy.
  • Real object-storage signed URL policy.
  • Production pagination/cursor behavior beyond seed-scale responses.

Data Tables

| Table | Purpose | Current model |
| --- | --- | --- |
| institutions | Institution profile, source tier, website, topics, credibility notes. | Implemented |
| reports | Report master record, source, topics, publication state, cache version. | Implemented |
| raw_artifacts | NotebookLM artifact metadata and object-storage references. | Implemented as metadata only |
| display_artifacts | Reviewed display version metadata for App consumption. | Implemented |
| display_modules | Detail-page modules, sort order, visibility, content or content reference. | Implemented |
| audio_assets | Audio metadata and object-storage key. | Implemented |
| related_news | Related-source candidates and reviewed related items. | Implemented |
| users | User account records. | Implemented as seed model, no auth routes |
| favorites | User report favorites. | Implemented as seed model, no API routes |
| reading_history | User reading/history events. | Implemented as model, no API routes |
| saved_listens | User saved-listen records. | Implemented as model, no API routes |
| playback_progress | Playback progress sync records. | Implemented as seed model, no API routes |
| outbound_events | External attribution events. | Implemented as model, no API route |

Public API Implemented

Prefix: /api/report-notebooklm/v1

| Method | Path | Purpose |
| --- | --- | --- |
| GET | /health | Service health. |
| GET | /feed/recommended | Published report cards for recommendation feed. |
| GET | /reports | Published report cards with basic filters. |
| GET | /reports/{report_id} | Report detail skeleton and published modules. |
| GET | /reports/{report_id}/modules/{module_id} | Full content for a visible module. |
| GET | /institutions | Active institution list. |
| GET | /institutions/{institution_id} | Institution detail with latest/recent reports. |
| GET | /listen | Published audio-backed report list. |

Current filters:

  • /reports: topic, institution_id, has_audio, source_tier, q, page_size.
  • /institutions: topic, source_tier, page_size.
  • /feed/recommended and /listen: page_size.

Pagination is currently seed-scale: responses return next_cursor: null and has_more: false, and cursor-based paging is not implemented yet.
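
The filters and pagination fields above can be exercised from a client with a small helper. This is a minimal sketch, not code from this repository: the function names `reports_url` and `is_last_page` are hypothetical, and serializing booleans as lowercase `true`/`false` is an assumption about the wire format.

```python
from urllib.parse import urlencode

PREFIX = "/api/report-notebooklm/v1"

# Filter names taken from the "Current filters" list for /reports.
REPORT_FILTERS = ("topic", "institution_id", "has_audio", "source_tier", "q", "page_size")


def reports_url(**filters):
    """Build a /reports URL, rejecting filters the endpoint does not list."""
    unknown = set(filters) - set(REPORT_FILTERS)
    if unknown:
        raise ValueError(f"unsupported filters: {sorted(unknown)}")
    # Assumption: booleans travel as lowercase strings; None means "not set".
    params = {
        key: (str(value).lower() if isinstance(value, bool) else value)
        for key, value in filters.items()
        if value is not None
    }
    query = urlencode(params)
    return f"{PREFIX}/reports" + (f"?{query}" if query else "")


def is_last_page(body):
    """True when the response signals that no further page exists."""
    return body.get("next_cursor") is None and not body.get("has_more", False)
```

With the current seed-scale behavior, `is_last_page` is expected to be true for every response, so a client loop written against it would stop after one request.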

Planned Public API

The Phase 1 contract also expects:

| Method | Path | Purpose |
| --- | --- | --- |
| GET | /audio/{audio_id}/stream | Return short-lived playable URL. |
| POST | /outbound/events | Persist external attribution click event. |

The audio stream endpoint must not return a permanent object-storage URL. The planned behavior is a backend-signed, short-lived playback URL with no download URL.
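
One way to picture a backend-signed, short-lived URL is an HMAC over the audio ID and an expiry timestamp. The sketch below only illustrates that idea; the secret, the base URL, the `expires`/`sig` parameter names, and both function names are hypothetical. The real signing policy is not implemented yet and may instead rely on the object-storage provider's own presigned URLs.

```python
import hashlib
import hmac
import time
from urllib.parse import urlencode

SIGNING_SECRET = b"demo-secret"  # hypothetical; a real key would come from config


def _signature(audio_id, expires):
    payload = f"{audio_id}:{expires}".encode()
    return hmac.new(SIGNING_SECRET, payload, hashlib.sha256).hexdigest()


def sign_playback_url(base_url, audio_id, ttl_seconds=300, now=None):
    """Return a playback URL that stops verifying after ttl_seconds."""
    issued_at = int(time.time() if now is None else now)
    expires = issued_at + ttl_seconds
    query = urlencode({"expires": expires, "sig": _signature(audio_id, expires)})
    return f"{base_url}/{audio_id}?{query}"


def verify_playback_url(audio_id, expires, sig, now=None):
    """Reject expired links and links whose signature does not match."""
    current = int(time.time() if now is None else now)
    if current >= expires:
        return False
    expected = _signature(audio_id, expires)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), sig.encode())
```

Because the expiry is part of the signed payload, a client cannot extend the lifetime of a link by editing the `expires` parameter.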

Planned Auth and Personal State API

Auth:

  • POST /auth/phone/start
  • POST /auth/phone/verify
  • POST /auth/wechat
  • POST /auth/apple

Personal state:

  • GET /me
  • GET /me/favorites
  • POST /me/favorites
  • DELETE /me/favorites/{report_id}
  • GET /me/history
  • POST /me/history
  • GET /me/listens/saved
  • POST /me/listens/saved
  • DELETE /me/listens/saved/{audio_id}
  • POST /me/playback-progress
  • GET /me/playback-progress/{audio_id}

These endpoints are contract-level requirements but are not implemented in this scaffold.

Planned Internal API

Internal APIs should require a service token and a network allowlist. They must never be exposed to the App.

  • POST /internal/reports
  • POST /internal/reports/{report_id}/raw-artifacts
  • GET /internal/reports/{report_id}/raw-artifacts
  • POST /internal/reports/{report_id}/display-artifacts
  • PATCH /internal/modules/{module_id}
  • POST /internal/reports/{report_id}/publish
  • POST /internal/reports/{report_id}/hide
  • POST /internal/related-news/candidates
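
The service-token and network-allowlist requirement for these internal routes could be checked along the following lines. This is a sketch under stated assumptions: the token value, the allowed network range, and the function name `is_internal_request_allowed` are all hypothetical, and a real deployment would wire the check into the framework's dependency or middleware layer.

```python
import hmac
import ipaddress

SERVICE_TOKEN = "demo-service-token"  # hypothetical; load from secrets in practice
ALLOWED_NETWORKS = [ipaddress.ip_network("10.0.0.0/8")]  # hypothetical range


def is_internal_request_allowed(client_ip, token):
    """Require both a valid service token and an allowlisted source address."""
    if token is None:
        return False
    # Constant-time comparison so timing does not reveal the token.
    if not hmac.compare_digest(token.encode(), SERVICE_TOKEN.encode()):
        return False
    try:
        address = ipaddress.ip_address(client_ip)
    except ValueError:
        return False  # unparseable addresses are rejected, not trusted
    return any(address in network for network in ALLOWED_NETWORKS)
```

Both conditions must hold: a valid token from outside the allowlist is refused, and so is an allowlisted caller without a token.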

Publishing should update report display status, update has_audio, bump cache_version, and clear related cache keys.
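
The four publishing steps can be sketched against an in-memory stand-in. Everything concrete here is an assumption made for illustration: the `Report` dataclass, the `"published"` status value, the numeric-string `cache_version` scheme, and the cache key names are hypothetical and do not describe the repository's models or the eventual Redis policy.

```python
from dataclasses import dataclass


@dataclass
class Report:
    report_id: str
    display_status: str = "draft"  # hypothetical status value
    has_audio: bool = False
    cache_version: str = "1"  # public contract is a single string


def related_cache_keys(report_id):
    # Hypothetical key scheme: the report itself plus the lists that embed it.
    return [f"report:{report_id}", "feed:recommended", "listen"]


def publish(report, audio_asset_ids, cache):
    """Apply the publish steps and return the cache keys that were cleared."""
    report.display_status = "published"
    report.has_audio = bool(audio_asset_ids)
    # Assumption: versions are numeric strings, so a bump is an increment.
    report.cache_version = str(int(report.cache_version) + 1)
    cleared = []
    for key in related_cache_keys(report.report_id):
        if key in cache:
            del cache[key]
            cleared.append(key)
    return cleared
```

Bumping `cache_version` and clearing keys in the same step keeps clients from reading a stale cached payload under the new version.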

Public vs Internal Fields

Public responses may expose:

  • Report identity, title, subtitle, one-liner, topics, institution card, release time, source tier, interpretation label, has_audio, and cache_version.
  • Detail source note, source URL where allowed, risk disclaimer, and published display modules.
  • Module metadata needed by the client: module_id, type, layer, render_mode, has_detail_page, is_publish_blocking, requires_human_review, sort_order, title_cn, content, preview, content_ref, content_etag.

Public responses must not expose:

  • Raw artifact payload.
  • Object-storage private paths for raw artifacts.
  • NotebookLM notebook IDs, source IDs, conversation IDs, or local account information.
  • Local filesystem paths.
  • display_version or module.version.
  • User phone hash, WeChat OpenID, Apple user ID, or auth internals.

The public cache contract is a single cache_version string. display_version and module.version are server-internal fields and must stay out of public responses.
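
An allowlist is one straightforward way to enforce this split for modules, since any field not explicitly named is dropped by default. The sketch below uses the public module field names listed above; the function name `to_public_module` and the dict-based row shape are hypothetical.

```python
# Field names copied from the public module metadata list in this section.
PUBLIC_MODULE_FIELDS = (
    "module_id", "type", "layer", "render_mode", "has_detail_page",
    "is_publish_blocking", "requires_human_review", "sort_order",
    "title_cn", "content", "preview", "content_ref", "content_etag",
)


def to_public_module(row):
    """Keep only allowlisted fields; internal ones such as version are dropped."""
    return {key: row[key] for key in PUBLIC_MODULE_FIELDS if key in row}
```

A denylist would also work, but it would leak any internal column added later unless someone remembered to update it.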

Seed Data

The seed importer currently creates:

  • 18 institutions.
  • 27 reports, including one NotebookLM sample report and multiple boundary cases.
  • 15 audio assets.
  • More than 120 display modules.
  • Test users, favorites, and playback progress.

Seed boundary cases intentionally cover:

  • Reports with audio and reports without audio.
  • Hidden/unpublished report behavior.
  • Gray broker source with restricted source URL behavior.
  • Published modules vs review-only modules.
  • study_guide module replacing legacy faq.
  • Heavy modules using card_plus_page preview plus full-module endpoint.

Do not treat seed content as production content. It exists to exercise app/API behavior and edge cases.

Detail Module Model

The detail page uses a skeleton plus module model:

  • Inline modules include small content directly in the detail response.
  • Heavy modules use render_mode=card_plus_page, return preview in detail, and load full content from /reports/{report_id}/modules/{module_id}.
  • Unknown future module types should not break the App; they should fall back to hidden or generic rendering.
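
On the client side, the inline versus heavy distinction comes down to where the full content is read from. A minimal sketch, assuming modules arrive as dicts and that any render_mode other than card_plus_page carries its content inline; the function name `module_content_source` is hypothetical.

```python
from urllib.parse import quote

PREFIX = "/api/report-notebooklm/v1"


def module_content_source(report_id, module):
    """Return ("fetch", url) for heavy modules, else ("inline", content)."""
    if module.get("render_mode") == "card_plus_page":
        # Heavy module: the detail response holds only a preview, so the
        # full content comes from the full-module endpoint.
        path = f"/reports/{quote(report_id, safe='')}/modules/{quote(module['module_id'], safe='')}"
        return ("fetch", PREFIX + path)
    # Assumption: every other render mode ships its content inline.
    return ("inline", module.get("content"))
```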

Core module types:

  • basic_info
  • executive_overview
  • core_insights
  • key_data
  • source_compliance
  • institution
  • differentiated_view
  • weaknesses
  • timeline
  • study_guide
  • structure_graph
  • related_sources
  • infographic
  • audio
  • research_discovery
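
The rule that unknown module types must not break the App can be expressed as a lookup against this type list with an explicit fallback. This is an illustrative sketch: the function name `resolve_renderer` and the `"generic"` label are hypothetical, and passing `None` as the fallback stands in for hiding the module.

```python
CORE_MODULE_TYPES = frozenset({
    "basic_info", "executive_overview", "core_insights", "key_data",
    "source_compliance", "institution", "differentiated_view", "weaknesses",
    "timeline", "study_guide", "structure_graph", "related_sources",
    "infographic", "audio", "research_discovery",
})


def resolve_renderer(module_type, fallback="generic"):
    """Return the renderer key for a module type, or the fallback if unknown."""
    if module_type in CORE_MODULE_TYPES:
        return module_type
    # Unknown or future types never raise; the caller hides or renders generically.
    return fallback
```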