# API and Data Handoff

This is a handoff snapshot, not the product SSOT.

Product SSOT: mall-docs report-notebooklm docs, snapshot date: 2026-06-03.

## Current Implementation Status

Implemented in this repository:

- FastAPI app under `/api/report-notebooklm/v1`.
- SQLAlchemy model layer for the Phase 1 table set.
- Alembic initial migration.
- Seed import script with institutions, reports, modules, audio assets, users, favorites, and playback-progress fixtures.
- Public read endpoints for health, feeds, reports, modules, institutions, and listen list.
- Tests covering seed counts, public response shape, module visibility, gray-source handling, and listen behavior.

Not implemented yet:

- Auth APIs.
- Personal state APIs.
- Audio stream signing endpoint.
- Outbound events endpoint.
- Internal management APIs.
- Real Redis cache invalidation policy.
- Real object-storage signed URL policy.
- Production pagination/cursor behavior beyond seed-scale responses.

## Data Tables

| Table | Purpose | Current model |
|---|---|---|
| `institutions` | Institution profile, source tier, website, topics, credibility notes. | Implemented |
| `reports` | Report master record, source, topics, publication state, cache version. | Implemented |
| `raw_artifacts` | NotebookLM artifact metadata and object-storage references. | Implemented as metadata only |
| `display_artifacts` | Reviewed display version metadata for App consumption. | Implemented |
| `display_modules` | Detail-page modules, sort order, visibility, content or content reference. | Implemented |
| `audio_assets` | Audio metadata and object-storage key. | Implemented |
| `related_news` | Related-source candidates and reviewed related items. | Implemented |
| `users` | User account records. | Implemented as seed model, no auth routes |
| `favorites` | User report favorites. | Implemented as seed model, no API routes |
| `reading_history` | User reading/history events. | Implemented as model, no API routes |
| `saved_listens` | User saved-listen records. | Implemented as model, no API routes |
| `playback_progress` | Playback progress sync records. | Implemented as seed model, no API routes |
| `outbound_events` | External attribution events. | Implemented as model, no API route |

## Public API Implemented

Prefix: `/api/report-notebooklm/v1`

| Method | Path | Purpose |
|---|---|---|
| `GET` | `/health` | Service health. |
| `GET` | `/feed/recommended` | Published report cards for recommendation feed. |
| `GET` | `/reports` | Published report cards with basic filters. |
| `GET` | `/reports/{report_id}` | Report detail skeleton and published modules. |
| `GET` | `/reports/{report_id}/modules/{module_id}` | Full content for a visible module. |
| `GET` | `/institutions` | Active institution list. |
| `GET` | `/institutions/{institution_id}` | Institution detail with latest/recent reports. |
| `GET` | `/listen` | Published audio-backed report list. |

Current filters:

- `/reports`: `topic`, `institution_id`, `has_audio`, `source_tier`, `q`, `page_size`.
- `/institutions`: `topic`, `source_tier`, `page_size`.
- `/feed/recommended` and `/listen`: `page_size`.

Current pagination is seed-scale. Responses return `next_cursor: null` and `has_more: false`.

## Planned Public API

The Phase 1 contract also expects:

| Method | Path | Purpose |
|---|---|---|
| `GET` | `/audio/{audio_id}/stream` | Return short-lived playable URL. |
| `POST` | `/outbound/events` | Persist external attribution click event. |

The audio stream endpoint must not return a permanent object-storage URL. The planned behavior is a backend-signed, short-lived playback URL with no download URL.
## Planned Auth and Personal State API

Auth:

- `POST /auth/phone/start`
- `POST /auth/phone/verify`
- `POST /auth/wechat`
- `POST /auth/apple`

Personal state:

- `GET /me`
- `GET /me/favorites`
- `POST /me/favorites`
- `DELETE /me/favorites/{report_id}`
- `GET /me/history`
- `POST /me/history`
- `GET /me/listens/saved`
- `POST /me/listens/saved`
- `DELETE /me/listens/saved/{audio_id}`
- `POST /me/playback-progress`
- `GET /me/playback-progress/{audio_id}`

These endpoints are contract-level requirements but are not implemented in this scaffold.

## Planned Internal API

Internal APIs should require a service token and a network allowlist. They must never be exposed to the App.

- `POST /internal/reports`
- `POST /internal/reports/{report_id}/raw-artifacts`
- `GET /internal/reports/{report_id}/raw-artifacts`
- `POST /internal/reports/{report_id}/display-artifacts`
- `PATCH /internal/modules/{module_id}`
- `POST /internal/reports/{report_id}/publish`
- `POST /internal/reports/{report_id}/hide`
- `POST /internal/related-news/candidates`

Publishing should update report display status, update `has_audio`, bump `cache_version`, and clear related cache keys.

## Public vs Internal Fields

Public responses may expose:

- Report identity, title, subtitle, one-liner, topics, institution card, release time, source tier, interpretation label, `has_audio`, and `cache_version`.
- Detail source note, source URL where allowed, risk disclaimer, and published display modules.
- Module metadata needed by the client: `module_id`, `type`, `layer`, `render_mode`, `has_detail_page`, `is_publish_blocking`, `requires_human_review`, `sort_order`, `title_cn`, `content`, `preview`, `content_ref`, `content_etag`.

Public responses must not expose:

- Raw artifact payload.
- Object-storage private paths for raw artifacts.
- NotebookLM notebook IDs, source IDs, conversation IDs, or local account information.
- Local filesystem paths.
- `display_version` or `module.version`.
- User phone hash, WeChat OpenID, Apple user ID, or auth internals.

The public cache contract is a single `cache_version` string. `display_version` and module `version` are server-internal fields only.

## Seed Data

The seed importer currently creates:

- 18 institutions.
- 27 reports, including one NotebookLM sample report and multiple boundary cases.
- 15 audio assets.
- More than 120 display modules.
- Test users, favorites, and playback progress.

Seed boundary cases intentionally cover:

- Reports with audio and reports without audio.
- Hidden/unpublished report behavior.
- Gray broker source with restricted source URL behavior.
- Published modules vs review-only modules.
- `study_guide` module replacing legacy `faq`.
- Heavy modules using `card_plus_page` preview plus full-module endpoint.

Do not treat seed content as production content. It exists to exercise app/API behavior and edge cases.

## Detail Module Model

The detail page uses a skeleton plus module model:

- Inline modules include small `content` directly in the detail response.
- Heavy modules use `render_mode=card_plus_page`, return `preview` in detail, and load full content from `/reports/{report_id}/modules/{module_id}`.
- Unknown future module types should not break the App; they should fall back to hidden or generic rendering.

Core module types:

- `basic_info`
- `executive_overview`
- `core_insights`
- `key_data`
- `source_compliance`
- `institution`
- `differentiated_view`
- `weaknesses`
- `timeline`
- `study_guide`
- `structure_graph`
- `related_sources`
- `infographic`
- `audio`
- `research_discovery`
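
The skeleton-plus-module rules above can be sketched as a small client-side decision function. This is an illustrative sketch, not the App's actual code. The function name `plan_module_render`, the action labels (`inline`, `preview_card`, `generic`, `hidden`), and the rule that picks generic over hidden for unknown types are all assumptions. Only the module type names, `render_mode=card_plus_page`, and the module endpoint path come from this document.

```python
from typing import Any, Optional

API_PREFIX = "/api/report-notebooklm/v1"

# Core module types listed in this handoff.
KNOWN_MODULE_TYPES = frozenset({
    "basic_info", "executive_overview", "core_insights", "key_data",
    "source_compliance", "institution", "differentiated_view", "weaknesses",
    "timeline", "study_guide", "structure_graph", "related_sources",
    "infographic", "audio", "research_discovery",
})


def plan_module_render(report_id: str,
                       module: dict[str, Any]) -> dict[str, Optional[str]]:
    """Decide how a client might render one module from the detail response."""
    if module.get("type") not in KNOWN_MODULE_TYPES:
        # Assumption: show a generic card only when there is text to display;
        # otherwise hide the module so an unknown type never breaks the page.
        has_text = bool(module.get("content") or module.get("preview"))
        return {"action": "generic" if has_text else "hidden",
                "fetch_path": None}

    if module.get("render_mode") == "card_plus_page":
        # Heavy module: render `preview` now, load full content on demand.
        path = f"{API_PREFIX}/reports/{report_id}/modules/{module['module_id']}"
        return {"action": "preview_card", "fetch_path": path}

    # Inline module: `content` is already in the detail response.
    return {"action": "inline", "fetch_path": None}
```

Checking the type before the render mode means a future module type falls back safely even if it arrives with a render mode the client has never seen.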