chore: prepare yanting monorepo handoff

2026-06-03 10:39:03 +09:00
commit fde51468c6
106 changed files with 8171 additions and 0 deletions
@@ -0,0 +1,185 @@
# API and Data Handoff
This is a handoff snapshot, not the product's single source of truth (SSOT).
Product SSOT: mall-docs report-notebooklm docs, snapshot date: 2026-06-03.
## Current Implementation Status
Implemented in this repository:
- FastAPI app under `/api/report-notebooklm/v1`.
- SQLAlchemy model layer for the Phase 1 table set.
- Alembic initial migration.
- Seed import script with institutions, reports, modules, audio assets, users, favorites, and playback-progress fixtures.
- Public read endpoints for health, feeds, reports, modules, institutions, and listen list.
- Tests covering seed counts, public response shape, module visibility, gray-source handling, and listen behavior.
Not implemented yet:
- Auth APIs.
- Personal state APIs.
- Audio stream signing endpoint.
- Outbound events endpoint.
- Internal management APIs.
- Real Redis cache invalidation policy.
- Real object-storage signed URL policy.
- Production pagination/cursor behavior beyond seed-scale responses.
## Data Tables
| Table | Purpose | Current model |
|---|---|---|
| `institutions` | Institution profile, source tier, website, topics, credibility notes. | Implemented |
| `reports` | Report master record, source, topics, publication state, cache version. | Implemented |
| `raw_artifacts` | NotebookLM artifact metadata and object-storage references. | Implemented as metadata only |
| `display_artifacts` | Reviewed display version metadata for App consumption. | Implemented |
| `display_modules` | Detail-page modules, sort order, visibility, content or content reference. | Implemented |
| `audio_assets` | Audio metadata and object-storage key. | Implemented |
| `related_news` | Related-source candidates and reviewed related items. | Implemented |
| `users` | User account records. | Implemented as seed model, no auth routes |
| `favorites` | User report favorites. | Implemented as seed model, no API routes |
| `reading_history` | User reading/history events. | Implemented as model, no API routes |
| `saved_listens` | User saved-listen records. | Implemented as model, no API routes |
| `playback_progress` | Playback progress sync records. | Implemented as seed model, no API routes |
| `outbound_events` | External attribution events. | Implemented as model, no API route |
## Public API Implemented
Prefix: `/api/report-notebooklm/v1`
| Method | Path | Purpose |
|---|---|---|
| `GET` | `/health` | Service health. |
| `GET` | `/feed/recommended` | Published report cards for recommendation feed. |
| `GET` | `/reports` | Published report cards with basic filters. |
| `GET` | `/reports/{report_id}` | Report detail skeleton and published modules. |
| `GET` | `/reports/{report_id}/modules/{module_id}` | Full content for a visible module. |
| `GET` | `/institutions` | Active institution list. |
| `GET` | `/institutions/{institution_id}` | Institution detail with latest/recent reports. |
| `GET` | `/listen` | Published audio-backed report list. |
Current filters:
- `/reports`: `topic`, `institution_id`, `has_audio`, `source_tier`, `q`, `page_size`.
- `/institutions`: `topic`, `source_tier`, `page_size`.
- `/feed/recommended` and `/listen`: `page_size`.
Pagination is currently seed-scale only: responses always return `next_cursor: null` and `has_more: false`.
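
A minimal sketch of what a seed-scale list envelope could look like. Only `page_size`, `next_cursor`, and `has_more` come from this handoff; the `items` key, the default of 20, and the `paginate_seed_scale` helper are assumptions for illustration, not the repository's actual code.

```python
from typing import Any, Dict, List


def paginate_seed_scale(rows: List[Dict[str, Any]], page_size: int = 20) -> Dict[str, Any]:
    """Hypothetical helper: truncate to page_size and report no further pages."""
    if page_size < 1:
        raise ValueError("page_size must be positive")
    return {
        "items": rows[:page_size],  # "items" is an assumed key name
        "next_cursor": None,        # always null at seed scale
        "has_more": False,          # always false at seed scale
    }
```

A production version would need to derive `next_cursor` and `has_more` from the query result, which is listed above as not implemented yet.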
## Planned Public API
The Phase 1 contract also expects:
| Method | Path | Purpose |
|---|---|---|
| `GET` | `/audio/{audio_id}/stream` | Return short-lived playable URL. |
| `POST` | `/outbound/events` | Persist external attribution click event. |
The audio stream endpoint must not return a permanent object-storage URL. The planned behavior is a backend-signed, short-lived playback URL, with no download URL exposed.
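
One way to picture "backend-signed and short-lived" is a generic HMAC signature over the audio ID and an expiry timestamp. This is a sketch under assumptions: the real object-storage signing policy is not defined yet, and `SECRET`, `BASE_URL`, the query parameter names, and both function names are hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import urlencode

SECRET = b"replace-me"                      # hypothetical signing key
BASE_URL = "https://media.example.invalid"  # hypothetical playback host


def _signature(audio_id: str, expires: int) -> str:
    payload = f"{audio_id}:{expires}".encode()
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()


def sign_stream_url(audio_id: str, ttl_seconds: int = 300, now: Optional[int] = None) -> str:
    """Build a playback URL that stops validating after ttl_seconds."""
    issued_at = int(time.time()) if now is None else now
    expires = issued_at + ttl_seconds
    query = urlencode({"expires": expires, "sig": _signature(audio_id, expires)})
    return f"{BASE_URL}/audio/{audio_id}?{query}"


def verify_stream_url(audio_id: str, expires: int, sig: str, now: Optional[int] = None) -> bool:
    """Reject expired links and links whose signature does not match."""
    current = int(time.time()) if now is None else now
    if current > expires:
        return False
    return hmac.compare_digest(_signature(audio_id, expires), sig)
```

Because the expiry is part of the signed payload, a client cannot extend a link by editing the `expires` parameter.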
## Planned Auth and Personal State API
Auth:
- `POST /auth/phone/start`
- `POST /auth/phone/verify`
- `POST /auth/wechat`
- `POST /auth/apple`
Personal state:
- `GET /me`
- `GET /me/favorites`
- `POST /me/favorites`
- `DELETE /me/favorites/{report_id}`
- `GET /me/history`
- `POST /me/history`
- `GET /me/listens/saved`
- `POST /me/listens/saved`
- `DELETE /me/listens/saved/{audio_id}`
- `POST /me/playback-progress`
- `GET /me/playback-progress/{audio_id}`
These endpoints are contract-level requirements but are not implemented in this scaffold.
## Planned Internal API
Internal APIs should require a service token and a network allowlist. They must never be exposed to the App.
- `POST /internal/reports`
- `POST /internal/reports/{report_id}/raw-artifacts`
- `GET /internal/reports/{report_id}/raw-artifacts`
- `POST /internal/reports/{report_id}/display-artifacts`
- `PATCH /internal/modules/{module_id}`
- `POST /internal/reports/{report_id}/publish`
- `POST /internal/reports/{report_id}/hide`
- `POST /internal/related-news/candidates`
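
The service-token and allowlist requirement for the internal endpoints above could be checked along these lines. This is a sketch only: `ALLOWED_NETWORKS`, `SERVICE_TOKEN`, and `is_internal_call_allowed` are hypothetical names, and how the token is transported (header, mTLS, etc.) is not specified in this handoff.

```python
import hmac
import ipaddress

# Hypothetical configuration values; real ones would come from deployment config.
ALLOWED_NETWORKS = [ipaddress.ip_network("10.0.0.0/8")]
SERVICE_TOKEN = "replace-me"


def is_internal_call_allowed(client_ip: str, token: str) -> bool:
    """Require both an allowlisted source address and a matching service token."""
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        return False  # unparseable address is never allowlisted
    in_allowlist = any(addr in network for network in ALLOWED_NETWORKS)
    # compare_digest avoids leaking token contents through timing differences
    token_ok = hmac.compare_digest(token.encode(), SERVICE_TOKEN.encode())
    return in_allowlist and token_ok
```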
Publishing should update the report's display status, update `has_audio`, bump `cache_version`, and clear related cache keys.
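
The four publish side effects can be sketched as one function. Everything concrete here is an assumption: the `Report` dataclass stands in for the SQLAlchemy model, a plain dict stands in for Redis, the `"published"` status value and the cache key names are invented, and "bump" is interpreted as replacing `cache_version` with a fresh opaque string.

```python
import uuid
from dataclasses import dataclass
from typing import Dict


@dataclass
class Report:
    """Hypothetical stand-in for the real report model."""
    report_id: str
    display_status: str = "review"
    has_audio: bool = False
    cache_version: str = "v0"


def publish_report(report: Report, audio_asset_count: int, cache: Dict[str, object]) -> Report:
    report.display_status = "published"      # assumed status value
    report.has_audio = audio_asset_count > 0
    report.cache_version = uuid.uuid4().hex  # assumed bump scheme: new opaque string
    # Assumed key names; the real invalidation policy is not implemented yet.
    for key in (f"report:{report.report_id}", "feed:recommended", "reports:list", "listen:list"):
        cache.pop(key, None)
    return report
```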
## Public vs Internal Fields
Public responses may expose:
- Report identity, title, subtitle, one-liner, topics, institution card, release time, source tier, interpretation label, `has_audio`, and `cache_version`.
- Detail source note, source URL where allowed, risk disclaimer, and published display modules.
- Module metadata needed by the client: `module_id`, `type`, `layer`, `render_mode`, `has_detail_page`, `is_publish_blocking`, `requires_human_review`, `sort_order`, `title_cn`, `content`, `preview`, `content_ref`, `content_etag`.
Public responses must not expose:
- Raw artifact payload.
- Object-storage private paths for raw artifacts.
- NotebookLM notebook IDs, source IDs, conversation IDs, or local account information.
- Local filesystem paths.
- `display_version` or `module.version`.
- User phone hash, WeChat OpenID, Apple user ID, or auth internals.
The public cache contract is a single `cache_version` string. `display_version` and module `version` are server-internal fields only.
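
A simple way to enforce the public/internal split is an allowlist projection, so that any field not explicitly named never reaches the client. The field names below are the public module fields listed above; the `to_public_module` helper and the dict-based row are assumptions for illustration, not the repository's serializer.

```python
from typing import Any, Dict

PUBLIC_MODULE_FIELDS = (
    "module_id", "type", "layer", "render_mode", "has_detail_page",
    "is_publish_blocking", "requires_human_review", "sort_order",
    "title_cn", "content", "preview", "content_ref", "content_etag",
)


def to_public_module(row: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only allowlisted keys; internal fields such as `version` are dropped."""
    return {key: row[key] for key in PUBLIC_MODULE_FIELDS if key in row}
```

An allowlist fails closed: a new internal column added to the model stays private until someone deliberately adds it to the tuple.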
## Seed Data
The seed importer currently creates:
- 18 institutions.
- 27 reports, including one NotebookLM sample report and multiple boundary cases.
- 15 audio assets.
- More than 120 display modules.
- Test users, favorites, and playback progress.
Seed boundary cases intentionally cover:
- Reports with audio and reports without audio.
- Hidden/unpublished report behavior.
- Gray broker source with restricted source URL behavior.
- Published modules vs review-only modules.
- `study_guide` module replacing legacy `faq`.
- Heavy modules using `card_plus_page` preview plus full-module endpoint.
Do not treat seed content as production content. It exists to exercise app/API behavior and edge cases.
## Detail Module Model
The detail page uses a skeleton plus module model:
- Inline modules include small `content` directly in the detail response.
- Heavy modules use `render_mode=card_plus_page`, return `preview` in detail, and load full content from `/reports/{report_id}/modules/{module_id}`.
- Unknown future module types should not break the App; they should fall back to hidden or generic rendering.
Core module types:
- `basic_info`
- `executive_overview`
- `core_insights`
- `key_data`
- `source_compliance`
- `institution`
- `differentiated_view`
- `weaknesses`
- `timeline`
- `study_guide`
- `structure_graph`
- `related_sources`
- `infographic`
- `audio`
- `research_discovery`
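
The skeleton-plus-module rules and the core type list above can be sketched as a small dispatch function. The App itself is presumably not written in Python; this only illustrates the decision logic. The plan labels (`"inline"`, `"card_then_fetch"`, `"generic"`) and both function names are hypothetical.

```python
from typing import Any, Dict

API_PREFIX = "/api/report-notebooklm/v1"

KNOWN_MODULE_TYPES = frozenset({
    "basic_info", "executive_overview", "core_insights", "key_data",
    "source_compliance", "institution", "differentiated_view", "weaknesses",
    "timeline", "study_guide", "structure_graph", "related_sources",
    "infographic", "audio", "research_discovery",
})


def resolve_render_plan(module: Dict[str, Any]) -> str:
    """Decide how a client could treat one module from the detail response."""
    if module.get("type") not in KNOWN_MODULE_TYPES:
        return "generic"          # unknown future type: fall back, do not crash
    if module.get("render_mode") == "card_plus_page":
        return "card_then_fetch"  # show `preview`, load full content on demand
    return "inline"               # small `content` is already in the response


def full_module_path(report_id: str, module_id: str) -> str:
    """Path of the full-content endpoint for a heavy module."""
    return f"{API_PREFIX}/reports/{report_id}/modules/{module_id}"
```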