Compare commits

..

4 Commits

Author SHA1 Message Date
jimme 6c72b7d048 docs: add data source flow guide and localize handoff READMEs
- Add docs/DATA_SOURCE_FLOW.md: end-to-end source -> NotebookLM ->
  storage -> App flow, source list with publish frequency, institution
  intro status, ingestion artifact structure, and known cadence gaps
- Link the new doc from README and PROJECT_OVERVIEW indexes
- Localize top-level and subproject READMEs to Chinese for handoff
  (pre-existing working-tree changes)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-03 13:59:38 +09:00
jingyun 556f366894 fix: build the iOS and web directories 2026-06-03 10:17:39 +08:00
jingyun 638cf20629 fix: Flutter SDK version was too high; adapt to the current development version 2026-06-03 10:17:17 +08:00
jimme fde51468c6 chore: prepare yanting monorepo handoff 2026-06-03 10:39:03 +09:00
146 changed files with 3383 additions and 85 deletions
+46 -44
@@ -1,47 +1,49 @@
# Miscellaneous # Local/private agent overlays
*.class AGENTS.local.md
*.log CURRENT_STATUS.md
*.pyc docs.jimme.local/
*.swp docs.*.local/
# Secrets and local env
.env
*.env.local
# Python
report-notebooklm-api/.venv/
report-notebooklm-api/.pytest_cache/
report-notebooklm-api/.mypy_cache/
report-notebooklm-api/**/*.pyc
report-notebooklm-api/**/__pycache__/
report-notebooklm-api/*.db
report-notebooklm-api/*.egg-info/
# Flutter / Dart
report-notebooklm-app/.dart_tool/
report-notebooklm-app/.flutter-plugins-dependencies
report-notebooklm-app/.pub-cache/
report-notebooklm-app/.pub/
report-notebooklm-app/build/
report-notebooklm-app/coverage/
# Android local/generated
report-notebooklm-app/android/.gradle/
report-notebooklm-app/android/local.properties
report-notebooklm-app/android/app/debug/
report-notebooklm-app/android/app/profile/
report-notebooklm-app/android/app/release/
report-notebooklm-app/**/*.apk
report-notebooklm-app/**/*.jks
report-notebooklm-app/**/*.keystore
report-notebooklm-app/android/key.properties
# IDE / OS noise
.DS_Store .DS_Store
.atom/ **/.DS_Store
.build/
.buildlog/
.history
.svn/
.swiftpm/
migrate_working_dir/
# IntelliJ related
*.iml
*.ipr
*.iws
.idea/ .idea/
*.iml
.vscode/
# The .vscode folder contains launch configuration and tasks you configure in # Build artifacts
# VS Code which you may wish to be included in version control, so this line build/
# is commented out by default. dist/
#.vscode/ coverage/
# Flutter/Dart/Pub related
**/doc/api/
**/ios/Flutter/.last_build_id
.dart_tool/
.flutter-plugins-dependencies
.pub-cache/
.pub/
/build/
/coverage/
*.apk
build/verification/
# Symbolication related
app.*.symbols
# Obfuscation related
app.*.map.json
# Android Studio will place build artifacts here
/android/app/debug
/android/app/profile
/android/app/release
+126
@@ -0,0 +1,126 @@
# AGENTS.md - Yanting Engineering Repo
> Public agent instructions for this repository. This file is safe to commit.
> Local agents may read ignored `AGENTS.local.md`, but the repository must not depend on it.
> Last updated: 2026-06-03.
## Project
This repository contains the Phase 1 implementation and engineering handoff for `研听 / report-notebooklm`.
`研听` is a Chinese research-report interpretation app. It turns global institutional research reports into structured Chinese reading and listening experiences. The product is an interpretation and annotation service, not investment advice.
Technical identifiers:
- Code/API/internal name: `report-notebooklm`
- Short prefix: `rnb`
- API prefix: `/api/report-notebooklm/v1`
- Database schema name: `report_notebooklm`
- User-facing display name: `研听`
Do not use the user-facing display name in code identifiers, database schema names, Redis keys, object-storage paths, or package names.
## Repository Layout
This is intended to be a single Gitea repository.
| Path | Purpose |
|---|---|
| `README.md` | Human-facing repository entry point. |
| `docs/` | Repo-level public handoff, decisions, and development history. |
| `report-notebooklm-api/` | FastAPI backend, database models, migrations, seed importer, API docs. |
| `report-notebooklm-app/` | Flutter app, Android/web scaffolds, App docs. |
| `docs.jimme.local/` | Ignored local-only notes, not required by the team. |
| `AGENTS.local.md` | Ignored local agent overlay. |
## Public vs Local Documentation
Public, committed documentation must be portable:
- Use repository-relative paths.
- Use environment variables and placeholders for credentials.
- Describe product decisions in team-readable language.
- Distinguish implemented behavior from planned/spec behavior.
Do not commit local-only material:
- Local absolute paths.
- Personal machine setup.
- Private agent workflow.
- Raw session logs.
- Local screenshots, APKs, caches, virtualenvs, build outputs.
- Credentials or local service passwords.
Use `docs.jimme.local/` for ignored local notes and raw process references. Durable team-facing conclusions should be distilled into public `docs/`.
## Product and Compliance Constraints
- Public responses expose only reviewed display artifacts, not raw NotebookLM artifacts.
- Public responses expose `cache_version`; `display_version` and module `version` are internal.
- Do not expose raw artifact payloads, local file paths, NotebookLM notebook/source/conversation IDs, account identifiers, or private object-storage paths.
- Phase 1 has no report-interpretation download feature.
- Phase 1 does not include comments, UGC, paid unlocks, membership, task walls, points, trading signals, or investment recommendations.
- NotebookLM-native/source-driven artifacts are the content source. Do not use local LLM rewriting to invent publishable report content.
- Gray broker sources and generated media require compliance/operations review before public release.
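The exposure rules above can be sketched as a response filter. This is a minimal sketch, not the repository's implementation: `INTERNAL_KEYS` and `to_public` are hypothetical names, and the key names are assumptions chosen to mirror the constraints listed above.

```python
from typing import Any

# Hypothetical internal-only keys; adjust to the real response schema.
INTERNAL_KEYS = frozenset({
    "display_version", "version", "raw_payload", "payload_ref",
    "local_path", "notebook_id", "notebooklm_source_id",
    "conversation_id", "account_id", "oss_key",
})


def to_public(value: Any) -> Any:
    """Return a copy of `value` with internal-only keys removed at every depth."""
    if isinstance(value, dict):
        return {k: to_public(v) for k, v in value.items() if k not in INTERNAL_KEYS}
    if isinstance(value, list):
        return [to_public(item) for item in value]
    return value


internal = {
    "id": "rep_example",
    "cache_version": 3,
    "display_version": 7,
    "modules": [{"module_type": "study_guide", "version": 2, "notebook_id": "nb-1"}],
}
public = to_public(internal)
```

A deny-list like this only illustrates the rule; an explicit allow-list per response model would be the stricter design.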
## Backend
Read first:
- `report-notebooklm-api/README.md`
- `report-notebooklm-api/docs/HANDOFF.md`
- `report-notebooklm-api/docs/API_AND_DATA.md`
- `report-notebooklm-api/docs/CONTENT_PIPELINE.md`
- `report-notebooklm-api/docs/RUNBOOK.md`
Verify:
```bash
cd report-notebooklm-api
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
alembic upgrade head
python scripts/import_seed_content.py
pytest -q
uvicorn app.main:app --reload --host <bind-host> --port <port>
```
The backend requires `.env` settings for real MySQL/Redis environments. Use `.env.example` as the template. Do not commit `.env`.
## App
Read first:
- `report-notebooklm-app/README.md`
- `report-notebooklm-app/docs/HANDOFF.md`
- `report-notebooklm-app/docs/API_CONTRACT_NOTES.md`
- `report-notebooklm-app/docs/APP_RUNBOOK.md`
Verify:
```bash
cd report-notebooklm-app
flutter analyze
flutter test
flutter build web --dart-define=RNB_API_BASE=<api-base-url>
flutter build apk --debug --dart-define=RNB_API_BASE=<emulator-api-base-url>
```
The App intentionally has no built-in live API default. Always pass `RNB_API_BASE`.
## Decision Records
Long-lived decisions belong in `docs/DECISIONS.md`.
Development timeline and major implementation changes belong in `docs/DEVELOPMENT_HISTORY.md`.
Raw session logs, temporary planning transcripts, or local-only evidence pointers belong in ignored `docs.jimme.local/`.
## Git Rules
- Target remote: `https://gitea.neuronlabs.art/third-party-project/yanting.git`.
- Commit one monorepo, not nested repositories.
- Before the final monorepo push, remove or archive nested `.git/` directories under subprojects so source files are committed as normal directories.
- Keep `.env`, build artifacts, caches, APKs, local status files, and local agent overlays ignored.
- Use English commit messages with prefixes such as `feat:`, `fix:`, `docs:`, and `chore:`.
+104 -41
@@ -1,36 +1,107 @@
# report-notebooklm-app # 研听 / report-notebooklm
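Flutter client for the report-notebooklm Phase 1 app shell. `研听` is a Phase 1 app and backend that turns global institutional research reports into structured Chinese reading and listening experiences.
The backend API lives in `../report-notebooklm-api/` in the same monorepo. API, data, and content-pipeline details are documented there; this directory focuses on app handoff, UI state, build commands, and integration notes. This repository is organized as a single Gitea handoff repository for the product and engineering teams to take over.
## What's in the repository
| Area | Path | Description |
|---|---|---|
| Backend API | `report-notebooklm-api/` | FastAPI service, MySQL models, Alembic migrations, seed data import, public read-only API. |
| Flutter app | `report-notebooklm-app/` | Flutter client with five main tabs, report detail modules, and Android/Web scaffolds. |
| Repository docs | `docs/` | Project-level overview, decision records, development history, and handoff guides. |
| Backend docs | `report-notebooklm-api/docs/` | Details on the API, data, content pipeline, and runbook. |
| App docs | `report-notebooklm-app/docs/` | App runbook, project map, and API consumption notes. |
## Product at a glance
`研听` helps Chinese-speaking users understand global institutional research reports, covering topics such as macro, precious metals, commodities, energy, central banks, and cross-asset.
Phase 1 focuses on:
- 推荐: curated / latest report interpretations.
- 研报: report list and basic filters.
- 机构: institution list and institution detail.
- 听单: reports with audio.
- 我的: guest / logged-in state and shallow personal-state entries.
Phase 1 explicitly does **not** include: comments, UGC, paid unlocks, membership, ads, trading signals, investment advice, or report-interpretation downloads.
## Read these first ## Read these first
- [docs/HANDOFF.md](docs/HANDOFF.md): current app status, implemented pages, placeholders, and next steps. For human readers:
- [docs/PROJECT_BRIEF.md](docs/PROJECT_BRIEF.md): product and Phase 1 scope at a glance.
- [docs/APP_RUNBOOK.md](docs/APP_RUNBOOK.md): Flutter version, local run, Web build, Android debug build, and verification.
- [docs/API_CONTRACT_NOTES.md](docs/API_CONTRACT_NOTES.md): endpoints and fields the app consumes.
- [docs/PROJECT_MAP.md](docs/PROJECT_MAP.md): source directory map.
## Product boundaries 1. `docs/PROJECT_OVERVIEW.md`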
2. `docs/DECISIONS.md`
3. `docs/DATA_SOURCE_FLOW.md`
4. `docs/DEVELOPMENT_HISTORY.md`
5. `report-notebooklm-api/docs/HANDOFF.md`
6. `report-notebooklm-app/docs/HANDOFF.md`
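This repository holds the app code and an engineering handoff snapshot; it is not the product's single source of truth. For AI agents:
Product SSOT: the report-notebooklm docs in mall-docs. Snapshot date: 2026-06-03. 1. `AGENTS.md`
2. `docs/DECISIONS.md`
3. The README and runbook of the relevant subsystem.
Technical identifiers are `report-notebooklm` / `rnb`; the user-facing product name is `研听`. ## Current implementation status
## Requirements Backend implemented:
- Flutter 3.44.1 / Dart 3.12.1, or a compatible newer version. - FastAPI application mounted under `/api/report-notebooklm/v1`.
- A running backend that serves `/api/report-notebooklm/v1`. - SQLAlchemy model layer for the Phase 1 tables.
- Android builds also need the Android SDK, accepted licenses, and an emulator or physical device. - Alembic initial migration.
- Seed data import script.
- Public read-only endpoints for health check, feed, reports, report modules, institutions, and listen list.
- Tests for seed data and public API behavior.
## API base URL App implemented:
The app intentionally ships without any built-in live API default. Pass the backend base URL explicitly: - Five bottom tabs: 推荐, 研报, 机构, 听单, 我的.
- List / detail views driven by `RNB_API_BASE`.
- Module renderer registry for report detail.
- Local placeholder implementations for login, favorites, outbound-link confirmation, and playback progress.
- Android and Web build scaffolds.
Not yet production-ready:
- Auth and personal state.
- Real audio-stream signing.
- Outbound event writes.
- Internal content-management API.
- Production object storage and cache invalidation.
- Production API domain, release signing, final app icon, and app-store metadata.
## Backend quick start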
```bash ```bash
cd report-notebooklm-api
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
cp .env.example .env
# Edit .env for your MySQL and Redis
alembic upgrade head
python scripts/import_seed_content.py
uvicorn app.main:app --reload --host <bind-host> --port <port>
```
Smoke check:
```bash
API_BASE_URL=http://<api-host>:<port>/api/report-notebooklm/v1
curl "$API_BASE_URL/health"
curl "$API_BASE_URL/feed/recommended"
curl "$API_BASE_URL/reports/rep_ssga_gold"
```
## App quick start
```bash
cd report-notebooklm-app
flutter analyze
flutter test
flutter run -d chrome --dart-define=RNB_API_BASE=<api-base-url> flutter run -d chrome --dart-define=RNB_API_BASE=<api-base-url>
``` ```
@@ -40,38 +111,30 @@ Android emulator:
flutter run -d <emulator-id> --dart-define=RNB_API_BASE=<emulator-api-base-url> flutter run -d <emulator-id> --dart-define=RNB_API_BASE=<emulator-api-base-url>
``` ```
Physical Android device on the same LAN:
```bash
flutter run -d <device-id> --dart-define=RNB_API_BASE=http://<host-lan-ip>:<port>/api/report-notebooklm/v1
```
Cleartext HTTP is for debug builds only. Release builds must use HTTPS.
## Verification ## Verification
Backend:
```bash ```bash
cd report-notebooklm-api
source .venv/bin/activate
pytest -q
```
App:
```bash
cd report-notebooklm-app
flutter analyze flutter analyze
flutter test flutter test
flutter build web --dart-define=RNB_API_BASE=<api-base-url> flutter build web --dart-define=RNB_API_BASE=<api-base-url>
flutter build apk --debug --dart-define=RNB_API_BASE=<emulator-api-base-url> flutter build apk --debug --dart-define=RNB_API_BASE=<emulator-api-base-url>
``` ```
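## Current app scope ## Documentation boundaries
Implemented: This repository is a code handoff snapshot and does not replace the product's single source of truth (SSOT).
- Five bottom tabs: 推荐, 研报, 机构, 听单, 我的. Product SSOT: the report-notebooklm docs in mall-docs, snapshot date: 2026-06-03.
- API-backed feed, report list, institution list, listen list, institution detail, and report detail.
- Module renderer registry for inline modules and "card + page" modules.
- Product display name `研听`.
- Local UI placeholders for login, favorites, outbound-link confirmation, and playback progress.
Not yet implemented: Machine-local notes, private paths, raw session pointers, and personal agent workflows belong in the ignored `docs.jimme.local/` or `AGENTS.local.md`.
- Real auth.
- Real sync of favorites / history / listening records.
- Actually playable audio streams.
- Real outbound event writes.
- Production API domain.
- Release signing, final icon, and final app-store metadata.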
+269
@@ -0,0 +1,269 @@
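# Data Source Flow
This is a handoff snapshot, not the product's single source of truth (SSOT).
Product SSOT: mall-docs report-notebooklm docs, snapshot date: 2026-06-03.
This document explains in one place the chain "where reports come from → how they are interpreted → where they are stored → how they reach the app", and fills in parts that were scattered or missing in earlier docs (especially the **data source list** and **update frequency**). For implementation specifics, check against the subsystem docs and the SSOT.
---
## 1. One diagram: four-layer data model + end-to-end flow
研听's data is organized in **four layers**. The first two are internal evidence and are **not visible to the app**; the last two are reviewed display artifacts and are visible to the app.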
```
┌───────────────────────────────────────────────────────────────────────────┐
│ Layer 1  Report Source                                                    │
│   Institutional report PDF / source URL / institution metadata            │
│   └─ sources: official public / licensed partner / gray-broker public     │
└───────────────────────────────┬───────────────────────────────────────────┘
                                │ upload to NotebookLM, source-driven interpretation
┌───────────────────────────────────────────────────────────────────────────┐
│ Layer 2  Raw Artifact (internal, not visible to the app)                  │
│   NotebookLM-native + targeted-query artifacts, retained in full          │
│   └─ payload in object storage; DB: metadata + payload_ref + sha256 only  │
└───────────────────────────────┬───────────────────────────────────────────┘
                                │ deterministic assembly / cleaning / field mapping + human review
                                │ (no local-LLM rewriting of the source text)
┌───────────────────────────────────────────────────────────────────────────┐
│ Layer 3  Display Artifact (reviewed, visible to the app)                  │
│   display_artifacts + display_modules (detail modules tiered P0/P1/P2)    │
│   └─ state machine: missing → raw_ready → review → approved → published   │
└───────────────────────────────┬───────────────────────────────────────────┘
                                │ read-only public API
┌───────────────────────────────────────────────────────────────────────────┐
│ Layer 4  App Response                                                     │
│   lists / detail skeleton / lazy-loaded modules /                         │
│   signed audio URLs / institution cards                                   │
└───────────────────────────────────────────────────────────────────────────┘
```
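**Core principle**: the app only ever consumes the reviewed display artifacts of Layers 3/4 and never reads the raw PDFs or raw NotebookLM artifacts of Layers 1/2 directly. A mistaken request for a raw artifact should return `RAW_ARTIFACT_NOT_EXPOSED` (403).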
---
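## 2. Data sources (Layer 1)
### 2.1 Three source categories / three trust tiers
| Source category | Trust tier | Handling rule |
|---|---|---|
| Official public sources (regulators, international organizations, industry bodies) | `tier_1` | Standard process. |
| Sell-side research / asset management (investment banks, asset managers, data providers) | `tier_2` | Standard process. |
| Gray broker public sources | `tier_3` | Stricter review; source URL display is restricted and must go through a backend short-lived signed URL; compliance/operations review is required before release. |
| In-house / licensed partner sources | As agreed | Empty for now; to be added later when internal research from futures companies / brokers is onboarded. |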
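The tier rules in the table above can be encoded as a small lookup. This is a sketch only: the `TierPolicy` fields and the `policy_for` helper are hypothetical names, not part of the repository; only the tier identifiers and the rules they summarize come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierPolicy:
    strict_review: bool        # stricter editorial review than the standard process
    signed_source_url: bool    # source URL served only via a short-lived signed URL
    compliance_signoff: bool   # compliance/operations review required before release


# tier_1 and tier_2 follow the standard process; tier_3 adds all three controls.
TIER_POLICIES = {
    "tier_1": TierPolicy(False, False, False),
    "tier_2": TierPolicy(False, False, False),
    "tier_3": TierPolicy(True, True, True),
}


def policy_for(source_tier: str) -> TierPolicy:
    """Look up the handling policy, failing loudly on an unknown tier."""
    try:
        return TIER_POLICIES[source_tier]
    except KeyError:
        raise ValueError(f"unknown source tier: {source_tier!r}") from None
```

Failing on unknown tiers keeps a not-yet-agreed partner source from silently inheriting the standard process.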
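Historical experience usable as reference comes from Vision's source list and source-health data, but **production data must not depend on the local Vision runtime, local paths, local caches, or local account state**.
### 2.2 Report PDF source list and publish frequency (filled in)
Below are the **enabled report PDF sources** in the SSOT (vision-research-sources), grouped by topic, with their **natural publish frequency**; this is the "PDF update frequency" baseline that earlier docs lacked. The frequency column refers to **the source site's own publishing cycle**, which is not the same as 研听's interpretation / re-read cadence (see Section 6).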
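**Precious-metals specialists**
| Institution | Representative reports | Publish frequency |
|---|---|---|
| World Gold Council | Weekly Markets Monitor / Silver Lining | Weekly |
| WPIC (World Platinum Investment Council) | Platinum quarterly | Quarterly |
| State Street | Precious metals monthly | Monthly |
| ING | Precious metals / FX research | Irregular |
| Silver Institute | Silver market | Annual / irregular |
| HDFC Securities / Sharekhan | India market perspective | Irregular |
| Emirates NBD | Middle East / central-bank gold buying | Irregular |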
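**Cross-asset / commodity macro (mainly sell-side)**
| Institution | Representative reports | Publish frequency |
|---|---|---|
| Goldman Sachs (Goldman Sachs Research) | Commodity / macro outlook | Annual / irregular |
| J.P. Morgan (AM + PWM) | Asset allocation outlook | Annual / irregular |
| Bloomberg Intelligence | Cross-asset | Irregular |
| WisdomTree (EU + US) | Commodity outlook | Irregular |
| Invesco | ETF / asset allocation | Annual |
| World Bank | Commodity Markets Outlook / **Pink Sheet** | Semiannual / **biweekly (highest frequency)** |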
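**Energy**
| Institution | Representative reports | Publish frequency |
|---|---|---|
| EIA (U.S. Energy Information Administration) | Short-Term Energy Outlook | Monthly |
| IEA (International Energy Agency) | Oil Market Report / Gas Market Report | Monthly / quarterly |
| OPEC | Annual outlook | Annual |
| IEEJ (Institute of Energy Economics, Japan) | Energy economics | Irregular |
| Policy Center (Morocco) | Energy policy | Irregular |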
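**Miners / industrial metals / agriculture**
| Institution | Representative reports | Publish frequency |
|---|---|---|
| USGS (U.S. Geological Survey) | Mineral Commodity Summaries | Annual |
| USDA | WASDE agricultural supply and demand | Monthly (PDF link rotates monthly) |
| Eldorado Gold / Pan American Silver | Miner quarterly reports | Quarterly |
> Frequency overview: **weekly (WGC) → biweekly (WB Pink Sheet) → monthly (EIA / IEA OMR / USDA / State Street) → quarterly (WPIC / IEA Gas / miners) → semiannual (WB CMO) → annual (Goldman Sachs / JPM / Invesco / USGS / OPEC)**.
**Order of authority**: actual ingested tables > Vision `config/research_report_sources.json` / `config/sources.yaml` > this document. This document is an aggregated view from 研听's consumption perspective and will periodically go stale; check against the source before relying on it.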
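### 2.3 Differences from the seed data (important)
The **18 institutions in the backend's `import_seed_content.py` are seed data, not the production list**. The authoritative production list is the ~31 PDF sources in §2.2. In addition:
- BIS / Fed / IMF and others that appear in the seed, as well as ECB / BOJ envisioned in early design drafts, are **not in the enabled report source list**; they are design ideas or experimental samples (such as the BIS quarterly used in the NotebookLM capability experiment). For rollout, the enabled list is authoritative.
- To onboard a new source before launch, add the source in the Vision source config (or, later, 研听's own source config) and update this document accordingly.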
---
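## 3. Institution information (`institutions` table)
| Field | Description | Visible to app | Status |
|---|---|---|---|
| `name_cn` / `name_en` | Chinese and English names | Yes | Filled |
| `institution_type` | Enum of 7 types: `official` / `international_org` / `industry_org` / `bank_research` / `asset_manager` / `data_provider` / `partner` | Yes | Filled |
| `source_tier` | `tier_1/2/3` | Yes | Filled |
| `website_url` | Official website | Yes | Filled |
| `covered_topics` | Covered topics | Yes | Filled |
| `intro_cn` | **Institution detail page intro** | Yes | ⚠️ Field exists; per-institution text is mostly unwritten |
| `credibility_note` | **Credibility note** | Yes | ⚠️ Only one WGC sample exists |
**Status of institution intros**: the schema fully supports `intro_cn` + `credibility_note`, but the SSOT currently holds only one real sample, the WGC credibility note: "全球黄金行业组织,公开发布黄金需求与市场研究。" ("Global gold industry organization that publicly publishes gold demand and market research.") The "representative reports / topics" of each institution in §2.2 can serve as material for writing per-institution intros, but **paragraph-length intro text for each of the 31 institutions is still to be written**.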
---
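## 4. PDF → NotebookLM: interpretation and captured content structure (Layer 2)
### 4.1 Interpretation workflow (recommended order)
1. Inspect the source PDF: title, institution, date, page count, size, report type.
2. Create (or reuse) a notebook for each report source; unless explicitly producing a multi-report synthesis, use one notebook per report.
3. Upload the report source.
4. Generate the **P0 text bundle**: source description, native Briefing Doc, native Blog Post, data table, query dimensions, query key data, query divergence, query weaknesses.
5. Generate **P1 artifacts**: query timeline, query related sources, Study Guide, mind map (if export succeeds).
6. Asynchronously generate **P2 artifacts**: infographic candidates, audio brief, research discovery.
7. After each step, write to the manifest and persist every artifact's status.
8. Assemble display modules **deterministically** from reviewed artifacts.
9. Run human review before publishing.
Toolchain: the NotebookLM CLI (`nlm`) creates notebooks, uploads sources, and generates and exports artifacts; the production worker turns PDFs into raw artifacts and ingests them.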
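### 4.2 Artifact types (16) and measured structure
One measured run (a 106-page institutional quarterly sample) produced **16 artifact types: 15 succeeded, 1 failed (mind map export failed)**, ranging in size from ~1KB of text to a 5.4MB infographic and ~75 seconds of audio. Purpose and publishing constraints by type:
| Artifact type | Purpose | Blocks publishing | Needs human review |
|---|---|:--:|:--:|
| `source_summary` / `notebook_summary` | Source / notebook-level summary | No | No |
| `native_briefing_doc` | Native briefing doc | **Yes** | No |
| `native_blog_post` | Native blog post | **Yes** | No |
| `native_study_guide` | FAQ / study guide / glossary | No | No |
| `data_table` | Structured table (CSV) | **Yes** | No |
| `mind_map` | Mind map / graph-structure source | No | No |
| `query_dimensions` | Analysis dimensions | **Yes** | No |
| `query_key_data` | Key data points | **Yes** | No |
| `query_divergence` | Divergence from consensus | No | No |
| `query_weaknesses` | Weaknesses and open questions | No | No |
| `query_timeline` | Timeline and turning points | No | No |
| `query_related_sources` | Related source candidates | No | **Yes** |
| `research_discovery` | Expansion queue | No | **Yes** |
| `infographic` | Public candidate image | No | **Yes** |
| `audio_brief` | Audio preview / audio source | No | No |
> The **highest-value layer** is the query-family artifacts (dimensions / key_data / divergence / weaknesses / timeline): the largest in volume and the densest in information.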
### 4.3 Raw artifact metadata structure (manifest → database `raw_artifacts`)
Fields persisted for each artifact record: `artifact_type`, `provider` (default notebooklm), `payload_format`, `payload_ref` (object-storage reference), `sha256`, `size_bytes`, `status` (pending/ok/failed), `error`, `generated_at` / `ingested_at`, `is_publish_blocking`, `requires_human_review`, `quality_flags`, `retention_status`, and internal linkage IDs (notebook / source / conversation; **internal only, never included in app responses**).
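To illustrate how the integrity fields fit together, here is a minimal sketch that derives `sha256` and `size_bytes` from a payload before the metadata row is written. `RawArtifactRecord` and `build_record` are hypothetical names covering only a subset of the fields above, and the object key in the example is made up; the real model may differ.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class RawArtifactRecord:
    artifact_type: str
    payload_format: str
    payload_ref: str           # object-storage key, never the payload itself
    sha256: str
    size_bytes: int
    provider: str = "notebooklm"
    status: str = "pending"    # pending / ok / failed
    is_publish_blocking: bool = False
    requires_human_review: bool = False
    quality_flags: list[str] = field(default_factory=list)


def build_record(artifact_type: str, payload_format: str,
                 payload: bytes, payload_ref: str) -> RawArtifactRecord:
    """Derive the integrity fields from the payload bytes."""
    return RawArtifactRecord(
        artifact_type=artifact_type,
        payload_format=payload_format,
        payload_ref=payload_ref,
        sha256=hashlib.sha256(payload).hexdigest(),
        size_bytes=len(payload),
        status="ok",
    )


record = build_record("query_key_data", "markdown", b"abc",
                      "rnb/raw/example/query_key_data.md")
```

Storing the digest next to `payload_ref` lets a later step confirm that the object in storage is still the one that was reviewed.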
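### 4.4 Two hard rules for capture
- **No local-LLM rewriting** of NotebookLM source text. The pipeline may only orchestrate, clean, validate, map fields, assemble deterministically, and trim by hand; it must not use local rewriting to invent publishable content.
- **Citation page numbers need a second normalization pass**: NotebookLM citations may give the report's printed page number (≠ the PDF's physical page number). The UI does not expose the raw page label and shows no page marker until it is normalized; the citation is kept as internal evidence.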
---
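## 5. Storage and landing points (Layer 2 → 3)
### 5.1 Object storage (Alibaba Cloud OSS)
Raw payloads, audio, images, and oversized module content are all stored in OSS; the DB stores only reference keys. Agreed prefixes:
| Prefix | Contents |
|---|---|
| `rnb/raw/` | Raw NotebookLM artifact payloads |
| `rnb/modules/` | Display module content (`content_ref` for large modules) |
| `rnb/audio/` | Audio assets |
| `rnb/images/` | Infographics / images |
- Raw payloads are stored in OSS; MySQL stores only `payload_ref` + metadata + `sha256` (internal).
- The audio object key `audio_assets.oss_key` is internal and not exposed; the playback `stream_url` is a **short-lived signed URL issued on the fly** by the backend (planned validity ~2 hours), is not persisted, and has no download counterpart.
- Large module content (such as mind map / infographic / long tables, >100KB) is stored in OSS; `display_modules.content` stores only `content_ref` + `content_etag`.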
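The >100KB rule above can be sketched as a function that decides what goes into `display_modules.content`. This is an illustration under stated assumptions: `stored_content` is a hypothetical helper, the object key layout under `rnb/modules/` is invented, the threshold is read as 100 × 1024 bytes, and using a SHA-256 digest as `content_etag` is an assumption rather than the documented scheme.

```python
import hashlib
import json

INLINE_LIMIT_BYTES = 100 * 1024  # assumption: ">100KB" read as 100 KiB of serialized JSON


def stored_content(module_id: str, content: dict) -> dict:
    """Return the value to keep in the DB column: inline content or a reference."""
    body = json.dumps(content, ensure_ascii=False, sort_keys=True).encode("utf-8")
    if len(body) <= INLINE_LIMIT_BYTES:
        return content
    return {
        "content_ref": f"rnb/modules/{module_id}.json",     # hypothetical key layout
        "content_etag": hashlib.sha256(body).hexdigest(),   # assumed etag scheme
    }


small = stored_content("m_summary", {"text": "short summary"})
large = stored_content("m_table", {"text": "x" * 200_000})
```

The upload of the serialized body to object storage is deliberately left out; the sketch only covers the inline-versus-reference decision.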
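> ⚠️ Current implementation status: real OSS signing and the expiry policy are still **planned**; the scaffold in this repository does not implement production object storage.
### 5.2 Database tables (schema = `report_notebooklm`, MySQL 8)
13 tables in total: `institutions`, `reports`, `raw_artifacts`, `display_artifacts`, `display_modules`, `audio_assets`, `related_news`, `users`, `favorites`, `reading_history`, `saved_listens`, `playback_progress`, `outbound_events`.
- **Content side** (models implemented): the first 7.
- **User-state side** (models implemented, APIs mostly planned): the last 6.
### 5.3 raw → display review state machine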
```
missing → raw_ready → review → approved → published
↑↓
hidden
```
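**Publish gate**: every P0 module with `is_publish_blocking=True` is `published`, source attribution and the risk disclaimer are in place, and the public response contains no raw payloads / local paths / NotebookLM internal IDs / account information.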
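A minimal sketch of the state machine and the publish gate follows. Everything here is illustrative: `ALLOWED`, `advance`, and `can_publish_report` are hypothetical names, and the `published` ↔ `hidden` transitions are an assumption, because the diagram above does not make clear which state `hidden` attaches to. The "no internal data in the public response" part of the gate is not modeled.

```python
ALLOWED = {
    "missing": {"raw_ready"},
    "raw_ready": {"review"},
    "review": {"approved"},
    "approved": {"published"},
    "published": {"hidden"},   # assumption: hiding applies to published items
    "hidden": {"published"},   # assumption: a hidden item can be re-published
}


def advance(current: str, target: str) -> str:
    """Return `target` if the transition is allowed, otherwise raise ValueError."""
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {target}")
    return target


def can_publish_report(modules: list[dict], has_attribution: bool,
                       has_disclaimer: bool) -> bool:
    """Gate: all publish-blocking modules published, plus attribution and disclaimer."""
    blocking = [m for m in modules if m.get("is_publish_blocking")]
    return (
        all(m["status"] == "published" for m in blocking)
        and has_attribution
        and has_disclaimer
    )


modules = [
    {"type": "native_briefing_doc", "is_publish_blocking": True, "status": "published"},
    {"type": "audio_brief", "is_publish_blocking": False, "status": "review"},
]
ready = can_publish_report(modules, has_attribution=True, has_disclaimer=True)
```

Non-blocking modules such as audio stay out of the gate, which matches the rule that media must not block text publishability.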
---
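## 6. Cadence and known gaps
To avoid confusion, consider "frequency" at three levels:
| Level | Status |
|---|---|
| **A. Natural publish frequency of each source** | ✅ Defined; see §2.2 (weekly / biweekly / monthly / quarterly / semiannual / annual). |
| **B. Per-run NotebookLM production load policy** | ✅ Measured: single account, serial (`parallelism=1`), rate-limited (on the order of ~48 ops/hour), 60–150 second cooldown depending on artifact weight, no slides/video, research discovery not auto-imported; the image-and-text layer of one report takes about 20–30 minutes. |
| **C. 研听's own interpretation / re-read / scheduling cadence** | ❌ **Not frozen**: the product contract layer does not define "how often each report is re-read", "how many reports are interpreted per day/week", or "the cron/trigger cadence of the production runner". |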
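As a rough consistency check on the level B numbers, the arithmetic below converts the rate limit into an average interval and estimates cooldown time per report. It is a back-of-the-envelope sketch: `ARTIFACT_OPS = 16` assumes one operation per artifact type from §4.2, and generation time itself is ignored.

```python
OPS_PER_HOUR = 48              # "~48 ops/hour" from the table
COOLDOWN_RANGE_S = (60, 150)   # cooldown bounds from the table
ARTIFACT_OPS = 16              # assumption: one operation per artifact type


def cooldown_minutes(ops: int, cooldown_s: float) -> float:
    """Total cooldown time in minutes for `ops` serial operations."""
    return ops * cooldown_s / 60


avg_interval_s = 3600 / OPS_PER_HOUR
low = cooldown_minutes(ARTIFACT_OPS, COOLDOWN_RANGE_S[0])
avg = cooldown_minutes(ARTIFACT_OPS, avg_interval_s)
high = cooldown_minutes(ARTIFACT_OPS, COOLDOWN_RANGE_S[1])
print(f"{avg_interval_s:.0f} s/op; {low:.0f}-{high:.0f} min, {avg:.0f} min at the average rate")
```

Under these assumptions the average interval is 75 seconds, inside the 60–150 second cooldown range, and a full 16-operation run spends roughly 16 to 40 minutes in cooldown, which is the same order of magnitude as the measured 20–30 minutes.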
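**Content volume thresholds (not frequency, but related)**
- Development seed: 10–20 reports / 5–8 institutions / 3–5 with audio.
- First batch before launch: 30–50 reviewed report interpretations, ≥10 with audio.
**Remaining gaps (suggested next steps)**
1. **研听 production cadence (level C)**: the re-read cycle per report, daily/weekly output, and production runner scheduling. Phase 1 is positioned as "run a minimal content set in one batch before launch, without blocking app development"; **ongoing cadence is left to a later phase (G5 server-side production-chain migration)**, and for now there is only a weekly check of the publishable count.
2. **Per-institution intro text**: the `intro_cn` / `credibility_note` content from §3 for each of the 31 institutions.
3. **Seed list vs production list alignment**: consolidate the production source list in §2.2 into formal institution master data, replacing the 18 seed institutions.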
---
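## 7. Exit into the app (Layer 4)
Yes, **content ultimately lands in two places, the database and object storage**, and the app consumes it through the read-only API:
- Metadata and structured module content → **MySQL** (13 tables).
- Raw PDFs, raw artifacts, audio, images, and oversized modules → **object storage**; the DB stores reference keys.
- Cache → Redis (feed/detail cache, playback-progress debouncing, rate limiting).
**Public API** (prefix `/api/report-notebooklm/v1`): `/feed/recommended`, `/reports`, `/reports/{id}` (detail skeleton), `/reports/{id}/modules/{module_id}` (lazy-loaded full content for heavy modules), `/institutions`, `/institutions/{id}`, `/listen`, and the planned `/audio/{id}/stream` (short-lived signed URL).
**Detail-page fetch model**: skeleton + lazy module loading. Light modules return `content` inline; heavy modules return a `preview`, with full content served via the second-level endpoint or `content_ref`, and the client validates its cache using `content_etag`. Public published content may read `content_ref` directly; restricted (gray) sources go through backend short-lived signed URLs.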
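The skeleton-plus-lazy-loading model can be sketched from the client's side as an etag-keyed cache. The real client is the Flutter app; this Python sketch only illustrates the logic, and `ModuleCache`, the `fetch_full` callback, and the `module_id` field name are hypothetical.

```python
from typing import Callable


class ModuleCache:
    """Caches full module content and refetches only when content_etag changes."""

    def __init__(self, fetch_full: Callable[[str, str], dict]):
        self._fetch_full = fetch_full
        self._store: dict[tuple[str, str], tuple[str, dict]] = {}

    def get(self, report_id: str, module: dict) -> dict:
        if "content" in module:            # light module: content is inline
            return module["content"]
        key = (report_id, module["module_id"])
        etag = module["content_etag"]
        cached = self._store.get(key)
        if cached is not None and cached[0] == etag:
            return cached[1]               # cache hit with a matching etag
        full = self._fetch_full(report_id, module["module_id"])
        self._store[key] = (etag, full)
        return full


calls: list[tuple[str, str]] = []


def fake_fetch(report_id: str, module_id: str) -> dict:
    # Stand-in for GET /reports/{id}/modules/{module_id}.
    calls.append((report_id, module_id))
    return {"body": "full text"}


cache = ModuleCache(fake_fetch)
heavy = {"module_id": "m1", "preview": "first lines", "content_etag": "v1"}
first = cache.get("r1", heavy)
second = cache.get("r1", heavy)
```

Keying the cache on the etag means a republished module is refetched once and an unchanged one is never refetched.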
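**Internal production-chain API** (service token + network allowlist, never exposed to the app): `POST /internal/reports/{id}/raw-artifacts`, `/display-artifacts`, `/publish`, `/hide`, and so on. The publish action updates display status, refreshes `has_audio`, bumps `cache_version`, and clears the related cache keys.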
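The four side effects of the publish action can be sketched as one function. This is an illustration only: `publish` is a hypothetical helper, a plain dict stands in for Redis, and the cache key names under the `rnb:` namespace are invented for the example.

```python
def publish(report: dict, cache: dict) -> dict:
    """Apply the publish side effects and return the updated report row."""
    updated = dict(report)
    updated["display_status"] = "published"
    # Refresh has_audio from the audio assets that are themselves published.
    updated["has_audio"] = any(
        asset.get("status") == "published" for asset in report.get("audio_assets", [])
    )
    updated["cache_version"] = report.get("cache_version", 0) + 1
    # Clear feed caches and this report's detail cache (hypothetical key names).
    detail_key = f"rnb:detail:{report['id']}"
    for key in [k for k in cache if k.startswith("rnb:feed:") or k == detail_key]:
        del cache[key]
    return updated


cache = {
    "rnb:feed:recommended": "cached feed",
    "rnb:detail:rep_1": "cached detail",
    "rnb:detail:rep_10": "other report",
}
row = {"id": "rep_1", "cache_version": 4, "audio_assets": [{"status": "published"}]}
published = publish(row, cache)
```

The detail key is matched exactly rather than by prefix so that publishing `rep_1` does not evict `rep_10`.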
---
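## 8. Related documents
- Content pipeline details: `report-notebooklm-api/docs/CONTENT_PIPELINE.md`
- API and data model: `report-notebooklm-api/docs/API_AND_DATA.md`
- Operations and storage conventions: `report-notebooklm-api/docs/RUNBOOK.md`
- Decision records: `docs/DECISIONS.md`
- Product SSOT: mall-docs report-notebooklm docs (data source list, build brief, data model contract, NotebookLM capability experiment report).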
+44
@@ -0,0 +1,44 @@
# Decision Record
This is a handoff snapshot, not the product SSOT.
Product SSOT: mall-docs report-notebooklm docs, snapshot date: 2026-06-03.
## Product Decisions
| Date | Decision | Impact |
|---|---|---|
| 2026-06-02 | Phase 1 scope is a Chinese global institutional report interpretation app, not a pure audio app. | Five main tabs remain 推荐 / 研报 / 机构 / 听单 / 我的. |
| 2026-06-02 | Phase 1 has no commercialization. | No ads, paid unlock, membership, task wall, or points. |
| 2026-06-02 | Phase 1 does not open comments, UGC, or user-generated report interpretation. | App should not show community or publishing entry points. |
| 2026-06-02 | Guest users can browse public content and fully listen to at least one item. | Login should not block first listening experience. |
| 2026-06-03 | Product display name is `研听`; technical identifiers stay `report-notebooklm` / `rnb`. | Code identifiers, database schema, Redis keys, object-storage paths, and API prefixes remain brand-neutral. |
| 2026-06-03 | Phase 1 has no report-interpretation download feature. | No top-level download icon, detail download button, profile download record, download API, or offline audio package. |
## API and Data Decisions
| Date | Decision | Impact |
|---|---|---|
| 2026-06-03 | Public responses expose only `cache_version`. | `display_version`, module `version`, and nested cache version objects are internal. |
| 2026-06-03 | Heavy modules use a skeleton plus lazy full-module flow. | Detail returns previews; full content uses `/reports/{report_id}/modules/{module_id}` or a content reference. |
| 2026-06-03 | FAQ, Study Guide, and Glossary are represented as `study_guide`. | Legacy `faq` should map to `study_guide`; no separate public `faq` type. |
| 2026-06-03 | Public published content may use direct content references; restricted sources need short-lived backend signed URLs. | Backend keeps module endpoint and should add signed URL behavior for restricted content. |
| 2026-06-03 | Gray broker sources may be full-text audio-ized, but need compliance/operations review before production. | Seed and production rules can allow audio, but release must remain reviewed. |
## Content Pipeline Decisions
| Date | Decision | Impact |
|---|---|---|
| 2026-06-02 | NotebookLM is treated as a source-driven research engine. | Use native artifacts and targeted queries; do not invent unsupported copy. |
| 2026-06-02 | Raw artifacts stay internal. | App consumes reviewed display artifacts only. |
| 2026-06-02 | P0 text artifacts publish first; media and enrichment are async. | Audio, infographic, research discovery, and mind map must not block text publishability. |
| 2026-06-02 | Vision can be used as source/reference experience but not as a production runtime dependency. | Production data must not depend on local Vision runtime, local paths, or local account state. |
## Repository and Handoff Decisions
| Date | Decision | Impact |
|---|---|---|
| 2026-06-03 | Gitea target is a single repository. | `report-notebooklm-api/` and `report-notebooklm-app/` should be ordinary subdirectories in one repo. |
| 2026-06-03 | Public docs must be portable. | No local absolute paths or private machine setup in committed docs. |
| 2026-06-03 | Local-only agent and status material goes into ignored files. | Use `AGENTS.local.md` and `docs.jimme.local/`. |
| 2026-06-03 | Long-lived decisions are public; raw sessions are local. | Distill decisions into this file; keep session pointers in ignored local docs. |
+85
@@ -0,0 +1,85 @@
# Development History
This is a handoff snapshot, not the product SSOT.
Product SSOT: mall-docs report-notebooklm docs, snapshot date: 2026-06-03.
## 2026-06-02 - Product Scope Freeze
- Product scope was corrected away from the old "Wall Street listening" / pure-audio framing.
- Phase 1 was frozen around a Chinese research-report interpretation app.
- Main tabs were fixed as 推荐 / 研报 / 机构 / 听单 / 我的.
- Non-goals were made explicit: no commercialization, comments, UGC, trading advice, professional terminal, or local Vision runtime dependency.
- Vision was kept as reference/source experience, not production runtime.
## 2026-06-02 - Development Plan and Review
- Phase 1 technical baseline was selected:
- Flutter App.
- FastAPI backend.
- MySQL 8.
- Redis with `rnb:` namespace.
- Object storage for raw artifacts, heavy modules, audio, and images.
- Existing cloud/server deployment model.
- External launch dependencies were identified:
- SMS template/signature.
- WeChat Open Platform.
- Apple login if required.
- AI-generated-content labeling.
- Compliance review for source and media policies.
- The plan passed independent review with changes requested around launch blockers and implementation details.
## 2026-06-03 - Backend Scaffold
- FastAPI service created under `report-notebooklm-api/`.
- SQLAlchemy model layer created for Phase 1 tables.
- Alembic initial migration added.
- Seed importer added with institutions, reports, display artifacts, display modules, audio assets, users, favorites, and playback progress.
- Public read routes implemented:
- `/health`
- `/feed/recommended`
- `/reports`
- `/reports/{id}`
- `/reports/{id}/modules/{module_id}`
- `/institutions`
- `/institutions/{id}`
- `/listen`
- Tests added for seed counts, public API shape, hidden/review module boundaries, gray-source behavior, and listen list behavior.
## 2026-06-03 - App Scaffold
- Flutter app shell created under `report-notebooklm-app/`.
- Five tabs implemented.
- API client added with explicit `RNB_API_BASE`.
- Feature folders created for feed, reports, institutions, listen, profile, detail, and shared widgets.
- Detail module renderer registry added.
- Local placeholders added for blocked behaviors:
- login
- favorite
- outbound confirmation
- playback progress
- real audio stream
- Android platform scaffold added.
## 2026-06-03 - Handoff Preparation
- Backend and App documentation added.
- Public docs were distilled from product documents without copying the full product-doc tree.
- Local-only paths and raw session details were separated from public docs.
- Root README and public AGENTS instructions were introduced for the single-repo Gitea handoff.
## Current Verification Snapshot
Validated during handoff preparation:
- Backend editable install with dev dependencies.
- Backend migration.
- Backend seed import.
- Backend tests.
- Backend smoke checks for health, feed, and report detail.
- App analyze.
- App widget test.
- App web build.
- App debug APK build.
Build artifacts are transient and are not committed.
+45
@@ -0,0 +1,45 @@
# Project Overview
This is a handoff snapshot, not the product SSOT.
Product SSOT: mall-docs report-notebooklm docs, snapshot date: 2026-06-03.
## Purpose
`研听` is a Chinese app for understanding global institutional research reports. It converts difficult English research reports into reviewed Chinese reading and listening experiences.
The product is a research-report interpretation and annotation service. It does not provide investment advice.
## Technical Shape
| Layer | Technology | Path |
|---|---|---|
| App | Flutter | `report-notebooklm-app/` |
| API | FastAPI | `report-notebooklm-api/` |
| Database | MySQL 8 | configured by `RNB_DATABASE_URL` |
| Cache | Redis | configured by `RNB_REDIS_URL` |
| Storage | Object storage | planned for raw artifacts, modules, audio, images |
## Phase 1 Surfaces
- 推荐: latest and curated report interpretations.
- 研报: all published report interpretations with basic filters.
- 机构: institution list, institution detail, and recent reports.
- 听单: audio-backed reports.
- 我的: guest/login state and shallow personal-state entries.
## Key Engineering Principle
The app consumes reviewed display artifacts through the API. Raw NotebookLM artifacts are internal evidence and must not be exposed publicly.
NotebookLM-native content may be cleaned, mapped, reviewed, and assembled deterministically. It must not be silently replaced by local LLM rewriting.
## Repository Documentation
- `README.md`: human entry point.
- `AGENTS.md`: public agent instructions.
- `docs/DECISIONS.md`: durable decisions.
- `docs/DEVELOPMENT_HISTORY.md`: major change history.
- `docs/DATA_SOURCE_FLOW.md`: end-to-end data source flow, source list with publish frequency, and storage/ingestion path.
- `report-notebooklm-api/docs/`: backend, data, API, and content pipeline details.
- `report-notebooklm-app/docs/`: App runbook and API consumption notes.
+3
@@ -0,0 +1,3 @@
RNB_DATABASE_URL=mysql+asyncmy://<db-user>:<db-pass>@<db-host>:<db-port>/report_notebooklm
RNB_REDIS_URL=redis://<host>:<port>/0
RNB_REDIS_KEY_PREFIX=rnb:
+11
@@ -0,0 +1,11 @@
.venv/
__pycache__/
.pytest_cache/
.mypy_cache/
*.egg-info/
*.pyc
*.db
.env
.DS_Store
build/
*.apk
+56
@@ -0,0 +1,56 @@
# report-notebooklm-api Notes
This file keeps short engineering notes for this repository. The durable handoff is in `docs/`.
## 2026-06-03 Phase 1 Scaffold
- Started the Phase 1 backend scaffold from `phase1-build-brief.md`.
- Technical identifiers use `report-notebooklm` / `rnb`; user-facing product name is `研听`.
- Implemented FastAPI config, database, cache helper, routers, SQLAlchemy models, Alembic migration, seed importer, and public read API.
- Public API prefix is `/api/report-notebooklm/v1`.
- Implemented public routes:
- `/health`
- `/feed/recommended`
- `/reports`
- `/reports/{id}`
- `/reports/{id}/modules/{module_id}`
- `/institutions`
- `/institutions/{id}`
- `/listen`
- Seed importer covers institutions, reports, display artifacts, display modules, audio assets, users, favorites, and playback progress.
- Heavy modules store preview/full content in a JSON envelope. Public detail responses expose previews for `card_plus_page`; the module endpoint exposes full content.
- Review-only modules do not appear in public responses.
- Public responses expose `cache_version`; `display_version` and module `version` remain internal.
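The envelope and visibility notes above can be sketched as two small functions. This is a hedged illustration, not the service code: `detail_modules` and `module_full` are hypothetical names, the `envelope` / `status` / `display_mode` field names are assumptions, and `inline` is an assumed name for the non-`card_plus_page` mode.

```python
def detail_modules(modules: list[dict]) -> list[dict]:
    """Build the module list for a public detail response."""
    public = []
    for module in modules:
        if module["status"] != "published":   # review-only modules stay hidden
            continue
        item = {"module_id": module["module_id"], "display_mode": module["display_mode"]}
        if module["display_mode"] == "card_plus_page":
            item["preview"] = module["envelope"]["preview"]   # heavy: preview only
        else:
            item["content"] = module["envelope"]["full"]      # light: inline content
        public.append(item)
    return public


def module_full(module: dict) -> dict:
    """Body of the module endpoint: full content for a published module."""
    if module["status"] != "published":
        raise LookupError(f"module {module['module_id']} is not public")
    return {"module_id": module["module_id"], "content": module["envelope"]["full"]}


rows = [
    {"module_id": "m1", "status": "published", "display_mode": "card_plus_page",
     "envelope": {"preview": "top rows", "full": "all rows"}},
    {"module_id": "m2", "status": "review", "display_mode": "inline",
     "envelope": {"preview": "", "full": "draft"}},
]
detail = detail_modules(rows)
```

Keeping preview and full content in one stored envelope lets both endpoints read the same row while exposing different slices of it.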
## Verification Snapshot
- Backend tests: `pytest -q` passed.
- Local API smoke checks passed for `/health`, `/feed/recommended`, and `/reports/rep_ssga_gold`.
- Companion App analyze/test/build checks passed when using a Flutter SDK compatible with Dart 3.12.1.
- Android debug validation was completed during local handoff. Build artifacts and screenshots are transient and should not be committed.
## Resolved Product Decisions
- Public responses expose only `cache_version`.
- Heavy module access keeps both `content_ref` and `GET /reports/{id}/modules/{module_id}` available.
- Public published content may use direct content references; restricted sources should use backend short-lived signed URLs.
- FAQ, Study Guide, and Glossary are represented as a single `study_guide` module type.
- `faq` stays deprecated; legacy seed `faq` should map to `study_guide`.
- Gray-source full-text audio is allowed by product decision but still needs operations/compliance review before production release.
- App prototype feedback decisions from 2026-06-03 are recorded in mall-docs at `docs/2026-06-03-app-prototype-feedback-decisions.md`.
- Seed/display module order is: 报告概览 / 报告摘要 / 听研报 / 报告要点 / 报告中的关键数据 / 观点差异 / 局限与疑问 / 时间线 / 术语与问答 / 结构梳理 / 延伸阅读 / 报告来源.
- Do not seed a separate `institution` display module for public Detail. Publisher information belongs inside the source/compliance surface rendered as `报告来源`.
- The real BIS sample should be the top report, but public UI copy must not expose internal labels such as NotebookLM sample, query artifact, or artifact mapping.
- `basic_info` and `executive_overview` must not repeat the same text: overview is factual scope/metadata; summary is a few-sentence report-level description.
- All public modules returned for Detail should expose `has_detail_page=True`; tests assert this to prevent accidental regression.
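The `faq` to `study_guide` decision above could be applied at seed-import time along these lines. This is only a sketch under assumptions: the alias table and both function names are hypothetical, and each seed module is assumed to be a dict with a `type` key.

```python
from typing import Any, Dict, List

# Hypothetical alias table: deprecated module types mapped to their replacement.
LEGACY_TYPE_ALIASES: Dict[str, str] = {"faq": "study_guide"}


def normalize_module_type(module_type: str) -> str:
    """Return the current module type for a possibly deprecated one."""
    return LEGACY_TYPE_ALIASES.get(module_type, module_type)


def normalize_seed_modules(modules: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Copy seed modules, rewriting deprecated types without mutating the input."""
    return [{**module, "type": normalize_module_type(module["type"])} for module in modules]


seed = [{"type": "faq", "title_cn": "术语与问答"}, {"type": "timeline", "title_cn": "时间线"}]
normalized = normalize_seed_modules(seed)
```

The sketch does not merge a report that carries both a legacy `faq` module and a `study_guide` module; how to handle that case is left open here.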
## Remaining Backend Gaps
- Auth routes.
- Personal-state routes.
- Audio stream signed URL route.
- Outbound events route.
- Internal management routes.
- Production object storage integration.
- Production cache invalidation and pagination.
- Deployment environment configuration.
+62
@@ -0,0 +1,62 @@
# report-notebooklm-api
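FastAPI service for the report-notebooklm Phase 1 public read-only API.
This directory is the main engineering handoff entry point for the API, the data models, seed data import, and the NotebookLM-backed content pipeline. The companion Flutter app lives in `../report-notebooklm-app/` in the same monorepo.
## Read These First
- [docs/HANDOFF.md](docs/HANDOFF.md): current progress, resolved issues, open issues, and handoff order.
- [docs/PROJECT_BRIEF.md](docs/PROJECT_BRIEF.md): quick overview of the product and the Phase 1 scope.
- [docs/API_AND_DATA.md](docs/API_AND_DATA.md): data tables, endpoints, and implemented / planned APIs.
- [docs/CONTENT_PIPELINE.md](docs/CONTENT_PIPELINE.md): how report sources and NotebookLM artifacts flow through the pipeline.
- [docs/RUNBOOK.md](docs/RUNBOOK.md): local setup, seed data import, smoke checks, and deployment checks.
- [docs/ROADMAP_AND_OPEN_ISSUES.md](docs/ROADMAP_AND_OPEN_ISSUES.md): upcoming engineering work.
- [docs/SOURCE_INDEX.md](docs/SOURCE_INDEX.md): names of the source documents used for this handoff snapshot.
## Product Boundary
This repository holds the code and an engineering handoff snapshot; it is not the single source of truth for the product.
Product SSOT: the report-notebooklm docs in mall-docs. Snapshot date: 2026-06-03.
Technical identifiers use `report-notebooklm` / `rnb`; the user-facing product name is `研听`.
## Local Quick Start
Create a `.env` file that matches the backend services available in your environment: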
```bash
RNB_DATABASE_URL=mysql+asyncmy://<db-user>:<db-pass>@<db-host>:<db-port>/report_notebooklm
RNB_REDIS_URL=redis://<host>:<port>/0
RNB_REDIS_KEY_PREFIX=rnb:
```
Then run:
```bash
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
alembic upgrade head
python scripts/import_seed_content.py
uvicorn app.main:app --reload --host <bind-host> --port <port>
```
API prefix: `/api/report-notebooklm/v1`
## Verification
```bash
source .venv/bin/activate
pytest -q
```
Once the service is running, these smoke checks are recommended:
```bash
API_BASE_URL=http://<api-host>:<port>/api/report-notebooklm/v1
curl "$API_BASE_URL/health"
curl "$API_BASE_URL/feed/recommended"
curl "$API_BASE_URL/reports/rep_ssga_gold"
```
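The same smoke checks can also be scripted. The sketch below is one possible shape, not part of the repository: `http_status` and `smoke_check` are hypothetical helpers, and only the three paths from the curl commands are taken from this README.

```python
from typing import Callable, Dict
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

# Paths taken from the curl smoke checks.
SMOKE_PATHS = ["/health", "/feed/recommended", "/reports/rep_ssga_gold"]


def http_status(url: str) -> int:
    """Return the HTTP status for a GET request, or 0 if the host is unreachable."""
    try:
        with urlopen(url, timeout=5) as response:
            return response.status
    except HTTPError as exc:  # non-2xx responses still carry a status code
        return exc.code
    except URLError:
        return 0


def smoke_check(base_url: str, fetch: Callable[[str], int] = http_status) -> Dict[str, bool]:
    """Map each smoke path to True when it answers with HTTP 200."""
    base = base_url.rstrip("/")
    return {path: fetch(base + path) == 200 for path in SMOKE_PATHS}
```

Calling `smoke_check` with the same value as `API_BASE_URL` returns one boolean per path; the `fetch` parameter exists only so the check can be exercised without a running server.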
+39
@@ -0,0 +1,39 @@
[alembic]
script_location = migrations
prepend_sys_path = .
path_separator = os
# Runtime value is injected from RNB_DATABASE_URL in migrations/env.py.
sqlalchemy.url = sqlite+aiosqlite:///unused_alembic_config.db
[loggers]
keys = root,sqlalchemy,alembic
[handlers]
keys = console
[formatters]
keys = generic
[logger_root]
level = WARNING
handlers = console
[logger_sqlalchemy]
level = WARNING
handlers =
qualname = sqlalchemy.engine
[logger_alembic]
level = INFO
handlers =
qualname = alembic
[handler_console]
class = StreamHandler
args = (sys.stderr,)
level = NOTSET
formatter = generic
[formatter_generic]
format = %(levelname)-5.5s [%(name)s] %(message)s
datefmt = %H:%M:%S
+1
@@ -0,0 +1 @@
+15
@@ -0,0 +1,15 @@
from redis.asyncio import Redis

from app.config import get_settings

settings = get_settings()


def prefixed_key(key: str) -> str:
    return f"{settings.redis_key_prefix}{key}"


def get_redis() -> Redis:
    return Redis.from_url(settings.redis_url, decode_responses=True)
+18
@@ -0,0 +1,18 @@
from functools import lru_cache

from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    app_name: str = "report-notebooklm-api"
    api_prefix: str = "/api/report-notebooklm/v1"
    database_url: str
    redis_url: str
    redis_key_prefix: str = "rnb:"

    model_config = SettingsConfigDict(env_prefix="RNB_", env_file=".env", extra="ignore")


@lru_cache
def get_settings() -> Settings:
    return Settings()
+21
@@ -0,0 +1,21 @@
from collections.abc import AsyncGenerator

from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker, create_async_engine
from sqlalchemy.orm import DeclarativeBase

from app.config import get_settings


class Base(DeclarativeBase):
    pass


settings = get_settings()
engine = create_async_engine(settings.database_url, pool_pre_ping=True)
SessionLocal = async_sessionmaker(engine, expire_on_commit=False, class_=AsyncSession)


async def get_session() -> AsyncGenerator[AsyncSession, None]:
    async with SessionLocal() as session:
        yield session
+22
@@ -0,0 +1,22 @@
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

from app.config import get_settings
from app.routers import health, institutions, listen, reports

settings = get_settings()

app = FastAPI(title=settings.app_name)
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=False,
    allow_methods=["GET", "POST", "PATCH", "DELETE"],
    allow_headers=["*"],
)

app.include_router(health.router, prefix=settings.api_prefix)
app.include_router(reports.router, prefix=settings.api_prefix)
app.include_router(institutions.router, prefix=settings.api_prefix)
app.include_router(listen.router, prefix=settings.api_prefix)
@@ -0,0 +1,32 @@
from app.models.entities import (
    AudioAsset,
    DisplayArtifact,
    DisplayModule,
    Favorite,
    Institution,
    OutboundEvent,
    PlaybackProgress,
    RawArtifact,
    ReadingHistory,
    RelatedNews,
    Report,
    SavedListen,
    User,
)

__all__ = [
    "AudioAsset",
    "DisplayArtifact",
    "DisplayModule",
    "Favorite",
    "Institution",
    "OutboundEvent",
    "PlaybackProgress",
    "RawArtifact",
    "ReadingHistory",
    "RelatedNews",
    "Report",
    "SavedListen",
    "User",
]
@@ -0,0 +1,302 @@
from __future__ import annotations

import datetime as dt

from sqlalchemy import BigInteger, Boolean, DateTime, ForeignKey, Index, Integer, String, Text, UniqueConstraint
from sqlalchemy.dialects.mysql import MEDIUMTEXT
from sqlalchemy.orm import Mapped, mapped_column, relationship

from app.db import Base


def utcnow() -> dt.datetime:
    return dt.datetime.now(dt.UTC).replace(tzinfo=None)


MediumText = Text().with_variant(MEDIUMTEXT, "mysql")


class Institution(Base):
    __tablename__ = "institutions"

    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    institution_id: Mapped[str] = mapped_column(String(64), unique=True, nullable=False)
    name_cn: Mapped[str] = mapped_column(String(255), nullable=False)
    name_en: Mapped[str | None] = mapped_column(String(255))
    institution_type: Mapped[str] = mapped_column(String(32), nullable=False)
    source_tier: Mapped[str] = mapped_column(String(16), nullable=False)
    website_url: Mapped[str | None] = mapped_column(String(512))
    covered_topics: Mapped[str | None] = mapped_column(Text)
    intro_cn: Mapped[str | None] = mapped_column(Text)
    credibility_note: Mapped[str | None] = mapped_column(Text)
    report_count: Mapped[int] = mapped_column(Integer, nullable=False, default=0)
    latest_report_id: Mapped[str | None] = mapped_column(String(64))
    latest_report_at: Mapped[dt.datetime | None] = mapped_column(DateTime)
    status: Mapped[str] = mapped_column(String(16), nullable=False, default="active")
    created_at: Mapped[dt.datetime] = mapped_column(DateTime, nullable=False, default=utcnow)
    updated_at: Mapped[dt.datetime] = mapped_column(DateTime, nullable=False, default=utcnow, onupdate=utcnow)

    reports: Mapped[list[Report]] = relationship(back_populates="institution")

    __table_args__ = (Index("ix_institutions_status_latest", "status", "latest_report_at"),)


class Report(Base):
    __tablename__ = "reports"

    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    report_id: Mapped[str] = mapped_column(String(64), unique=True, nullable=False)
    report_type: Mapped[str] = mapped_column(String(16), nullable=False, default="single")
    title_cn: Mapped[str] = mapped_column(String(512), nullable=False)
    subtitle_cn: Mapped[str | None] = mapped_column(String(512))
    original_title: Mapped[str | None] = mapped_column(String(512))
    one_liner: Mapped[str | None] = mapped_column(String(512))
    institution_id: Mapped[str] = mapped_column(String(64), ForeignKey("institutions.institution_id"), nullable=False)
    co_institution_ids: Mapped[str | None] = mapped_column(Text)
    source_tier: Mapped[str] = mapped_column(String(32), nullable=False)
    source_url: Mapped[str | None] = mapped_column(String(512))
    source_note: Mapped[str] = mapped_column(Text, nullable=False)
    published_at: Mapped[dt.datetime | None] = mapped_column(DateTime)
    interpreted_at: Mapped[dt.datetime | None] = mapped_column(DateTime)
    released_at: Mapped[dt.datetime | None] = mapped_column(DateTime)
    topics: Mapped[str | None] = mapped_column(Text)
    language: Mapped[str] = mapped_column(String(8), nullable=False, default="en")
    has_audio: Mapped[bool] = mapped_column(Boolean, nullable=False, default=False)
    display_status: Mapped[str] = mapped_column(String(16), nullable=False, default="draft")
    display_version: Mapped[int] = mapped_column(Integer, nullable=False, default=0)
    cache_version: Mapped[str] = mapped_column(String(128), nullable=False)
    risk_disclaimer: Mapped[str | None] = mapped_column(Text)
    interpretation_label: Mapped[str | None] = mapped_column(String(64), default="研报解读")
    created_at: Mapped[dt.datetime] = mapped_column(DateTime, nullable=False, default=utcnow)
    updated_at: Mapped[dt.datetime] = mapped_column(DateTime, nullable=False, default=utcnow, onupdate=utcnow)

    institution: Mapped[Institution] = relationship(back_populates="reports")

    __table_args__ = (
        Index("ix_reports_status_released", "display_status", "released_at"),
        Index("ix_reports_institution_released", "institution_id", "released_at"),
        Index("ix_reports_audio_released", "has_audio", "released_at"),
    )


class RawArtifact(Base):
    __tablename__ = "raw_artifacts"

    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    raw_artifact_id: Mapped[str] = mapped_column(String(64), unique=True, nullable=False)
    report_id: Mapped[str] = mapped_column(String(64), ForeignKey("reports.report_id"), nullable=False)
    provider: Mapped[str] = mapped_column(String(32), nullable=False, default="notebooklm")
    artifact_type: Mapped[str] = mapped_column(String(64), nullable=False)
    conversation_id: Mapped[str | None] = mapped_column(String(128))
    source_id: Mapped[str | None] = mapped_column(String(128))
    notebook_id: Mapped[str | None] = mapped_column(String(128))
    source_language: Mapped[str | None] = mapped_column(String(8))
    payload_format: Mapped[str] = mapped_column(String(16), nullable=False)
    payload_ref: Mapped[str | None] = mapped_column(String(512))
    sha256: Mapped[str | None] = mapped_column(String(128))
    status: Mapped[str] = mapped_column(String(16), nullable=False, default="pending")
    error: Mapped[str | None] = mapped_column(Text)
    size_bytes: Mapped[int | None] = mapped_column(BigInteger)
    generated_at: Mapped[dt.datetime | None] = mapped_column(DateTime)
    ingested_at: Mapped[dt.datetime | None] = mapped_column(DateTime)
    is_publish_blocking: Mapped[bool] = mapped_column(Boolean, nullable=False, default=False)
    requires_human_review: Mapped[bool] = mapped_column(Boolean, nullable=False, default=False)
    quality_flags: Mapped[str | None] = mapped_column(Text)
    retention_status: Mapped[str] = mapped_column(String(32), nullable=False, default="retained")
    created_at: Mapped[dt.datetime] = mapped_column(DateTime, nullable=False, default=utcnow)

    __table_args__ = (
        Index("ix_raw_report_type", "report_id", "artifact_type"),
        Index("ix_raw_report_status", "report_id", "status"),
        Index("ix_raw_retention", "retention_status"),
    )


class DisplayArtifact(Base):
    __tablename__ = "display_artifacts"

    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    display_artifact_id: Mapped[str] = mapped_column(String(64), unique=True, nullable=False)
    report_id: Mapped[str] = mapped_column(String(64), ForeignKey("reports.report_id"), nullable=False)
    display_version: Mapped[int] = mapped_column(Integer, nullable=False, default=1)
    title_cn: Mapped[str] = mapped_column(String(512), nullable=False)
    summary_cn: Mapped[str | None] = mapped_column(Text)
    source_label: Mapped[str | None] = mapped_column(String(255))
    interpretation_label: Mapped[str | None] = mapped_column(String(64), default="研报解读")
    ai_generated_label: Mapped[str | None] = mapped_column(String(128))
    synthesis_type: Mapped[str | None] = mapped_column(String(16))
    source_disclosure_text: Mapped[str | None] = mapped_column(Text)
    review_status: Mapped[str] = mapped_column(String(16), nullable=False, default="review")
    reviewed_by: Mapped[str | None] = mapped_column(String(128))
    reviewed_at: Mapped[dt.datetime | None] = mapped_column(DateTime)
    published_at: Mapped[dt.datetime | None] = mapped_column(DateTime)
    created_at: Mapped[dt.datetime] = mapped_column(DateTime, nullable=False, default=utcnow)
    updated_at: Mapped[dt.datetime] = mapped_column(DateTime, nullable=False, default=utcnow, onupdate=utcnow)

    __table_args__ = (Index("ix_display_artifacts_report_status_version", "report_id", "review_status", "display_version"),)


class DisplayModule(Base):
    __tablename__ = "display_modules"

    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    module_id: Mapped[str] = mapped_column(String(64), unique=True, nullable=False)
    report_id: Mapped[str] = mapped_column(String(64), ForeignKey("reports.report_id"), nullable=False)
    display_artifact_id: Mapped[str] = mapped_column(String(64), ForeignKey("display_artifacts.display_artifact_id"), nullable=False)
    type: Mapped[str] = mapped_column(String(32), nullable=False)
    title_cn: Mapped[str | None] = mapped_column(String(255))
    content_format: Mapped[str] = mapped_column(String(16), nullable=False)
    content: Mapped[str | None] = mapped_column(MediumText)
    content_ref: Mapped[str | None] = mapped_column(String(512))
    content_etag: Mapped[str | None] = mapped_column(String(64))
    source_raw_artifact_ids: Mapped[str | None] = mapped_column(Text)
    status: Mapped[str] = mapped_column(String(16), nullable=False, default="missing")
    sort_order: Mapped[int] = mapped_column(Integer, nullable=False, default=0)
    version: Mapped[int] = mapped_column(Integer, nullable=False, default=1)
    updated_at: Mapped[dt.datetime] = mapped_column(DateTime, nullable=False, default=utcnow, onupdate=utcnow)

    __table_args__ = (
        Index("ix_display_modules_report_status_sort", "report_id", "status", "sort_order"),
        Index("ix_display_modules_artifact_status", "display_artifact_id", "status"),
    )


class AudioAsset(Base):
    __tablename__ = "audio_assets"

    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    audio_id: Mapped[str] = mapped_column(String(64), unique=True, nullable=False)
    report_id: Mapped[str] = mapped_column(String(64), ForeignKey("reports.report_id"), nullable=False)
    source_raw_artifact_id: Mapped[str | None] = mapped_column(String(64), ForeignKey("raw_artifacts.raw_artifact_id"))
    title_cn: Mapped[str] = mapped_column(String(512), nullable=False)
    duration_sec: Mapped[int | None] = mapped_column(Integer)
    oss_key: Mapped[str | None] = mapped_column(String(512))
    waveform_ref: Mapped[str | None] = mapped_column(String(512))
    chapters: Mapped[str | None] = mapped_column(Text)
    status: Mapped[str] = mapped_column(String(16), nullable=False, default="missing")
    published_at: Mapped[dt.datetime | None] = mapped_column(DateTime)
    created_at: Mapped[dt.datetime] = mapped_column(DateTime, nullable=False, default=utcnow)
    updated_at: Mapped[dt.datetime] = mapped_column(DateTime, nullable=False, default=utcnow, onupdate=utcnow)

    __table_args__ = (Index("ix_audio_report_status", "report_id", "status"),)


class RelatedNews(Base):
    __tablename__ = "related_news"

    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    related_news_id: Mapped[str] = mapped_column(String(64), unique=True, nullable=False)
    report_id: Mapped[str] = mapped_column(String(64), ForeignKey("reports.report_id"), nullable=False)
    title: Mapped[str] = mapped_column(String(512), nullable=False)
    source_name: Mapped[str | None] = mapped_column(String(255))
    source_url: Mapped[str | None] = mapped_column(String(512))
    published_at: Mapped[dt.datetime | None] = mapped_column(DateTime)
    language: Mapped[str | None] = mapped_column(String(8))
    summary_cn: Mapped[str | None] = mapped_column(Text)
    match_method: Mapped[str] = mapped_column(String(32), nullable=False, default="manual_curated")
    match_keywords: Mapped[str | None] = mapped_column(Text)
    match_confidence: Mapped[str | None] = mapped_column(String(8))
    status: Mapped[str] = mapped_column(String(16), nullable=False, default="candidate")
    created_at: Mapped[dt.datetime] = mapped_column(DateTime, nullable=False, default=utcnow)

    __table_args__ = (Index("ix_related_news_report_status", "report_id", "status"),)


class User(Base):
    __tablename__ = "users"

    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    user_id: Mapped[str] = mapped_column(String(64), unique=True, nullable=False)
    phone_hash: Mapped[str | None] = mapped_column(String(128), unique=True)
    wechat_openid: Mapped[str | None] = mapped_column(String(128), unique=True)
    apple_user_id: Mapped[str | None] = mapped_column(String(256), unique=True)
    display_name: Mapped[str | None] = mapped_column(String(128))
    avatar_url: Mapped[str | None] = mapped_column(String(512))
    created_at: Mapped[dt.datetime] = mapped_column(DateTime, nullable=False, default=utcnow)
    last_login_at: Mapped[dt.datetime | None] = mapped_column(DateTime)
    status: Mapped[str] = mapped_column(String(16), nullable=False, default="active")


class Favorite(Base):
    __tablename__ = "favorites"

    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    favorite_id: Mapped[str] = mapped_column(String(64), unique=True, nullable=False)
    user_id: Mapped[str] = mapped_column(String(64), ForeignKey("users.user_id"), nullable=False)
    report_id: Mapped[str] = mapped_column(String(64), ForeignKey("reports.report_id"), nullable=False)
    created_at: Mapped[dt.datetime] = mapped_column(DateTime, nullable=False, default=utcnow)
    status: Mapped[str] = mapped_column(String(16), nullable=False, default="active")

    __table_args__ = (UniqueConstraint("user_id", "report_id", name="uq_favorites_user_report"), Index("ix_favorites_user_report", "user_id", "report_id"))


class ReadingHistory(Base):
    __tablename__ = "reading_history"

    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    history_id: Mapped[str] = mapped_column(String(64), unique=True, nullable=False)
    user_id: Mapped[str] = mapped_column(String(64), ForeignKey("users.user_id"), nullable=False)
    report_id: Mapped[str] = mapped_column(String(64), ForeignKey("reports.report_id"), nullable=False)
    event_type: Mapped[str] = mapped_column(String(32), nullable=False, default="view_detail")
    last_position: Mapped[str | None] = mapped_column(Text)
    last_seen_at: Mapped[dt.datetime] = mapped_column(DateTime, nullable=False, default=utcnow)

    __table_args__ = (Index("ix_reading_history_user_seen", "user_id", "last_seen_at"), Index("ix_reading_history_user_report", "user_id", "report_id"))


class SavedListen(Base):
    __tablename__ = "saved_listens"

    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    saved_listen_id: Mapped[str] = mapped_column(String(64), unique=True, nullable=False)
    user_id: Mapped[str] = mapped_column(String(64), ForeignKey("users.user_id"), nullable=False)
    report_id: Mapped[str] = mapped_column(String(64), ForeignKey("reports.report_id"), nullable=False)
    audio_id: Mapped[str] = mapped_column(String(64), ForeignKey("audio_assets.audio_id"), nullable=False)
    created_at: Mapped[dt.datetime] = mapped_column(DateTime, nullable=False, default=utcnow)
    status: Mapped[str] = mapped_column(String(16), nullable=False, default="active")

    __table_args__ = (UniqueConstraint("user_id", "audio_id", name="uq_saved_listens_user_audio"),)


class PlaybackProgress(Base):
    __tablename__ = "playback_progress"

    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    progress_id: Mapped[str] = mapped_column(String(64), unique=True, nullable=False)
    user_id: Mapped[str] = mapped_column(String(64), ForeignKey("users.user_id"), nullable=False)
    audio_id: Mapped[str] = mapped_column(String(64), ForeignKey("audio_assets.audio_id"), nullable=False)
    report_id: Mapped[str] = mapped_column(String(64), ForeignKey("reports.report_id"), nullable=False)
    position_sec: Mapped[int] = mapped_column(Integer, nullable=False, default=0)
    duration_sec: Mapped[int | None] = mapped_column(Integer)
    completed: Mapped[bool] = mapped_column(Boolean, nullable=False, default=False)
    updated_at: Mapped[dt.datetime] = mapped_column(DateTime, nullable=False, default=utcnow, onupdate=utcnow)

    __table_args__ = (UniqueConstraint("user_id", "audio_id", name="uq_playback_user_audio"), Index("ix_playback_user_audio", "user_id", "audio_id"))


class OutboundEvent(Base):
    __tablename__ = "outbound_events"

    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    outbound_id: Mapped[str] = mapped_column(String(64), unique=True, nullable=False)
    click_id: Mapped[str] = mapped_column(String(64), unique=True, nullable=False)
    tracking_id: Mapped[str] = mapped_column(String(64), nullable=False)
    user_id: Mapped[str | None] = mapped_column(String(64), ForeignKey("users.user_id"))
    device_id: Mapped[str | None] = mapped_column(String(128))
    report_id: Mapped[str | None] = mapped_column(String(64), ForeignKey("reports.report_id"))
    institution_id: Mapped[str | None] = mapped_column(String(64))
    scene: Mapped[str | None] = mapped_column(String(64))
    ref: Mapped[str | None] = mapped_column(String(128))
    target: Mapped[str | None] = mapped_column(String(32))
    source_page: Mapped[str | None] = mapped_column(String(32))
    placement: Mapped[str | None] = mapped_column(String(64))
    campaign_id: Mapped[str | None] = mapped_column(String(64))
    target_app: Mapped[str | None] = mapped_column(String(64))
    commodity_tag: Mapped[str | None] = mapped_column(String(64))
    hook_type: Mapped[str | None] = mapped_column(String(64))
    user_state: Mapped[str | None] = mapped_column(String(16))
    ts: Mapped[int | None] = mapped_column(BigInteger)
    created_at: Mapped[dt.datetime] = mapped_column(DateTime, nullable=False, default=utcnow)

    __table_args__ = (Index("ix_outbound_tracking", "tracking_id"), Index("ix_outbound_report_created", "report_id", "created_at"))
@@ -0,0 +1 @@
@@ -0,0 +1 @@
@@ -0,0 +1,9 @@
from fastapi import APIRouter

router = APIRouter()


@router.get("/health")
async def health() -> dict[str, str]:
    return {"status": "ok"}
@@ -0,0 +1,23 @@
from fastapi import APIRouter, Depends, Query
from sqlalchemy.ext.asyncio import AsyncSession

from app.db import get_session
from app.services.catalog import CatalogService

router = APIRouter()


@router.get("/institutions")
async def institutions(
    topic: str | None = None,
    source_tier: str | None = None,
    page_size: int = Query(20, ge=1, le=50),
    session: AsyncSession = Depends(get_session),
) -> dict:
    return await CatalogService(session).institutions(topic=topic, source_tier=source_tier, page_size=page_size)


@router.get("/institutions/{institution_id}")
async def institution_detail(institution_id: str, session: AsyncSession = Depends(get_session)) -> dict:
    return await CatalogService(session).institution_detail(institution_id)
@@ -0,0 +1,13 @@
from fastapi import APIRouter, Depends, Query
from sqlalchemy.ext.asyncio import AsyncSession

from app.db import get_session
from app.services.catalog import CatalogService

router = APIRouter()


@router.get("/listen")
async def listen(page_size: int = Query(20, ge=1, le=50), session: AsyncSession = Depends(get_session)) -> dict:
    return await CatalogService(session).listen_items(page_size=page_size)
@@ -0,0 +1,47 @@
from fastapi import APIRouter, Depends, Query
from sqlalchemy.ext.asyncio import AsyncSession

from app.db import get_session
from app.services.catalog import CatalogService

router = APIRouter()


@router.get("/feed/recommended")
async def recommended_feed(
    topic: str | None = None,
    page_size: int = Query(20, ge=1, le=50),
    session: AsyncSession = Depends(get_session),
) -> dict:
    return await CatalogService(session).report_cards(topic=topic, page_size=page_size)


@router.get("/reports")
async def reports(
    topic: str | None = None,
    institution_id: str | None = None,
    has_audio: bool | None = None,
    source_tier: str | None = None,
    q: str | None = None,
    page_size: int = Query(20, ge=1, le=50),
    session: AsyncSession = Depends(get_session),
) -> dict:
    return await CatalogService(session).report_cards(
        topic=topic,
        institution_id=institution_id,
        has_audio=has_audio,
        source_tier=source_tier,
        q=q,
        page_size=page_size,
    )


@router.get("/reports/{report_id}")
async def report_detail(report_id: str, session: AsyncSession = Depends(get_session)) -> dict:
    return await CatalogService(session).report_detail(report_id)


@router.get("/reports/{report_id}/modules/{module_id}")
async def module_detail(report_id: str, module_id: str, session: AsyncSession = Depends(get_session)) -> dict:
    return await CatalogService(session).module_detail(report_id, module_id)
@@ -0,0 +1 @@
@@ -0,0 +1 @@
@@ -0,0 +1,272 @@
from __future__ import annotations
import json
from typing import Any
from fastapi import HTTPException
from sqlalchemy import Select, func, select
from sqlalchemy.ext.asyncio import AsyncSession
from app.models import AudioAsset, DisplayModule, Institution, RelatedNews, Report
MODULE_META: dict[str, dict[str, Any]] = {
"basic_info": {"layer": "p0", "render_mode": "inline", "has_detail_page": True, "is_publish_blocking": True, "requires_human_review": False},
"executive_overview": {"layer": "p0", "render_mode": "card_plus_page", "has_detail_page": True, "is_publish_blocking": True, "requires_human_review": False},
"core_insights": {"layer": "p0", "render_mode": "inline", "has_detail_page": True, "is_publish_blocking": True, "requires_human_review": False},
"key_data": {"layer": "p0", "render_mode": "card_plus_page", "has_detail_page": True, "is_publish_blocking": True, "requires_human_review": False},
"source_compliance": {"layer": "p0", "render_mode": "inline", "has_detail_page": True, "is_publish_blocking": True, "requires_human_review": False},
"differentiated_view": {"layer": "p1", "render_mode": "card_plus_page", "has_detail_page": True, "is_publish_blocking": False, "requires_human_review": False},
"weaknesses": {"layer": "p1", "render_mode": "card_plus_page", "has_detail_page": True, "is_publish_blocking": False, "requires_human_review": False},
"timeline": {"layer": "p1", "render_mode": "card_plus_page", "has_detail_page": True, "is_publish_blocking": False, "requires_human_review": False},
"study_guide": {"layer": "p1", "render_mode": "card_plus_page", "has_detail_page": True, "is_publish_blocking": False, "requires_human_review": False},
"related_sources": {"layer": "p1", "render_mode": "inline", "has_detail_page": True, "is_publish_blocking": False, "requires_human_review": True},
"structure_graph": {"layer": "p1", "render_mode": "card_plus_page", "has_detail_page": True, "is_publish_blocking": False, "requires_human_review": False},
"infographic": {"layer": "p2", "render_mode": "card_plus_page", "has_detail_page": True, "is_publish_blocking": False, "requires_human_review": True},
"audio": {"layer": "p2", "render_mode": "inline", "has_detail_page": True, "is_publish_blocking": False, "requires_human_review": False},
"research_discovery": {"layer": "p2", "render_mode": "card_plus_page", "has_detail_page": True, "is_publish_blocking": False, "requires_human_review": True},
"institution": {"layer": "p0", "render_mode": "inline", "has_detail_page": False, "is_publish_blocking": False, "requires_human_review": False},
}
def loads_json(value: str | None, default: Any) -> Any:
if not value:
return default
return json.loads(value)
def iso(value: Any) -> str | None:
return value.isoformat() if value else None
def institution_public(inst: Institution, *, detail: bool = False) -> dict[str, Any]:
data = {
"institution_id": inst.institution_id,
"name_cn": inst.name_cn,
"name_en": inst.name_en,
"institution_type": inst.institution_type,
"source_tier": inst.source_tier,
"website_url": inst.website_url,
"covered_topics": loads_json(inst.covered_topics, []),
"report_count": inst.report_count,
"latest_report_at": iso(inst.latest_report_at),
"credibility_note": inst.credibility_note,
}
if detail:
data["intro_cn"] = inst.intro_cn
return data
def institution_card(inst: Institution) -> dict[str, Any]:
return {
"institution_id": inst.institution_id,
"name_cn": inst.name_cn,
"name_en": inst.name_en,
"source_tier": inst.source_tier,
}
def report_card(report: Report, inst: Institution) -> dict[str, Any]:
return {
"report_id": report.report_id,
"title_cn": report.title_cn,
"subtitle_cn": report.subtitle_cn or "",
"one_liner": report.one_liner,
"institution": institution_card(inst),
"topics": loads_json(report.topics, []),
"released_at": iso(report.released_at),
"has_audio": report.has_audio,
"interpretation_label": report.interpretation_label,
"source_tier": report.source_tier,
"cache_version": report.cache_version,
}
def module_payload(module: DisplayModule) -> dict[str, Any]:
meta = MODULE_META.get(module.type, {"layer": "p2", "render_mode": "card_plus_page", "has_detail_page": True, "is_publish_blocking": False, "requires_human_review": False})
envelope = loads_json(module.content, {})
render_mode = meta["render_mode"]
content = None
preview = None
if render_mode == "inline":
content = envelope.get("content", envelope)
else:
preview = envelope.get("preview", {})
return {
"module_id": module.module_id,
"type": module.type,
"layer": meta["layer"],
"render_mode": render_mode,
"has_detail_page": meta["has_detail_page"],
"is_publish_blocking": meta["is_publish_blocking"],
"requires_human_review": meta["requires_human_review"],
"sort_order": module.sort_order,
"title_cn": module.title_cn,
"content": content,
"preview": preview,
"content_ref": module.content_ref,
"content_etag": module.content_etag,
}
class CatalogService:
def __init__(self, session: AsyncSession) -> None:
self.session = session
async def _published_report_query(self) -> Select[tuple[Report, Institution]]:
return (
select(Report, Institution)
.join(Institution, Report.institution_id == Institution.institution_id)
.where(Report.display_status == "published", Institution.status == "active")
.order_by(Report.released_at.desc(), Report.report_id)
)
async def report_cards(
self,
*,
topic: str | None = None,
institution_id: str | None = None,
has_audio: bool | None = None,
source_tier: str | None = None,
q: str | None = None,
page_size: int = 20,
) -> dict[str, Any]:
stmt = await self._published_report_query()
if topic:
stmt = stmt.where(Report.topics.like(f"%{topic}%"))
if institution_id:
stmt = stmt.where(Report.institution_id == institution_id)
if has_audio is not None:
stmt = stmt.where(Report.has_audio == has_audio)
if source_tier:
stmt = stmt.where(Report.source_tier == source_tier)
if q:
stmt = stmt.where(Report.title_cn.like(f"%{q}%"))
stmt = stmt.limit(min(max(page_size, 1), 50))
rows = (await self.session.execute(stmt)).all()
return {
"items": [report_card(report, inst) for report, inst in rows],
"page": {"next_cursor": None, "has_more": False},
"cache_version": "feed:recommended:seed:v1",
}

    async def report_detail(self, report_id: str) -> dict[str, Any]:
        row = (
            await self.session.execute(
                select(Report, Institution)
                .join(Institution, Report.institution_id == Institution.institution_id)
                .where(Report.report_id == report_id, Report.display_status == "published", Institution.status == "active")
            )
        ).one_or_none()
        if row is None:
            raise HTTPException(status_code=404, detail={"error": {"code": "REPORT_NOT_FOUND", "message": "报告不存在或未发布。"}})
        report, inst = row
        modules = (
            await self.session.execute(
                select(DisplayModule)
                .where(DisplayModule.report_id == report_id, DisplayModule.status == "published")
                .order_by(DisplayModule.sort_order)
            )
        ).scalars().all()
        return {
            "report_id": report.report_id,
            "title_cn": report.title_cn,
            "subtitle_cn": report.subtitle_cn or "",
            "original_title": report.original_title,
            "one_liner": report.one_liner,
            "institution": institution_public(inst, detail=True),
            "source": {
                "source_url": report.source_url,
                "source_note": report.source_note,
                "source_tier": report.source_tier,
                "published_at": iso(report.published_at),
            },
            "topics": loads_json(report.topics, []),
            "has_audio": report.has_audio,
            "interpretation_label": report.interpretation_label,
            "risk_disclaimer": report.risk_disclaimer,
            "released_at": iso(report.released_at),
            "cache_version": report.cache_version,
            "modules": [module_payload(module) for module in modules],
        }

    async def module_detail(self, report_id: str, module_id: str) -> dict[str, Any]:
        report = (
            await self.session.execute(select(Report).where(Report.report_id == report_id, Report.display_status == "published"))
        ).scalar_one_or_none()
        if report is None:
            raise HTTPException(status_code=404, detail={"error": {"code": "REPORT_NOT_FOUND", "message": "报告不存在或未发布。"}})
        module = (
            await self.session.execute(
                select(DisplayModule).where(DisplayModule.report_id == report_id, DisplayModule.module_id == module_id, DisplayModule.status == "published")
            )
        ).scalar_one_or_none()
        if module is None:
            raise HTTPException(status_code=404, detail={"error": {"code": "MODULE_HIDDEN", "message": "模块隐藏或不可见。"}})
        envelope = loads_json(module.content, {})
        content = envelope.get("full") or envelope.get("content") or envelope
        return {
            "module_id": module.module_id,
            "type": module.type,
            "title_cn": module.title_cn,
            "content": content,
            "content_etag": module.content_etag,
            "cache_version": report.cache_version,
        }

    async def institutions(self, *, topic: str | None = None, source_tier: str | None = None, page_size: int = 20) -> dict[str, Any]:
        stmt = (
            select(Institution)
            .where(Institution.status == "active")
            .order_by(Institution.source_tier, Institution.name_cn)
            .limit(min(max(page_size, 1), 50))
        )
        if topic:
            stmt = stmt.where(Institution.covered_topics.like(f"%{topic}%"))
        if source_tier:
            stmt = stmt.where(Institution.source_tier == source_tier)
        rows = (await self.session.execute(stmt)).scalars().all()
        return {"items": [institution_public(inst) for inst in rows], "page": {"next_cursor": None, "has_more": False}}

    async def institution_detail(self, institution_id: str) -> dict[str, Any]:
        inst = (await self.session.execute(select(Institution).where(Institution.institution_id == institution_id, Institution.status == "active"))).scalar_one_or_none()
        if inst is None:
            raise HTTPException(status_code=404, detail={"error": {"code": "INSTITUTION_NOT_FOUND", "message": "机构不存在。"}})
        reports = await self.report_cards(institution_id=institution_id, page_size=5)
        detail = institution_public(inst, detail=True)
        detail["latest_report"] = reports["items"][0] if reports["items"] else None
        detail["recent_reports"] = reports["items"]
        return detail

    async def listen_items(self, *, page_size: int = 20) -> dict[str, Any]:
        stmt = (
            select(AudioAsset, Report, Institution)
            .join(Report, AudioAsset.report_id == Report.report_id)
            .join(Institution, Report.institution_id == Institution.institution_id)
            .where(AudioAsset.status == "published", Report.display_status == "published")
            .order_by(Report.released_at.desc(), AudioAsset.audio_id)
            .limit(min(max(page_size, 1), 50))
        )
        rows = (await self.session.execute(stmt)).all()
        items = [
            {
                "audio_id": audio.audio_id,
                "title_cn": audio.title_cn,
                "duration_sec": audio.duration_sec,
                "report_id": report.report_id,
                "report_title_cn": report.title_cn,
                "institution": institution_card(inst),
                "released_at": iso(report.released_at),
                "cache_version": report.cache_version,
            }
            for audio, report, inst in rows
        ]
        return {"items": items, "page": {"next_cursor": None, "has_more": False}, "cache_version": "listen:seed:v1"}

    async def seed_counts(self) -> dict[str, int]:
        models = {
            "institutions": Institution,
            "reports": Report,
            "audio_assets": AudioAsset,
            "display_modules": DisplayModule,
            "related_news": RelatedNews,
        }
        counts = {}
        for name, model in models.items():
            counts[name] = await self.session.scalar(select(func.count()).select_from(model)) or 0
        return counts
@@ -0,0 +1,185 @@
# API and Data Handoff
This is a handoff snapshot, not the product SSOT.
Product SSOT: mall-docs report-notebooklm docs, snapshot date: 2026-06-03.
## Current Implementation Status
Implemented in this repository:
- FastAPI app under `/api/report-notebooklm/v1`.
- SQLAlchemy model layer for the Phase 1 table set.
- Alembic initial migration.
- Seed import script with institutions, reports, modules, audio assets, users, favorites, and playback-progress fixtures.
- Public read endpoints for health, feeds, reports, modules, institutions, and listen list.
- Tests covering seed counts, public response shape, module visibility, gray-source handling, and listen behavior.
Not implemented yet:
- Auth APIs.
- Personal state APIs.
- Audio stream signing endpoint.
- Outbound events endpoint.
- Internal management APIs.
- Real Redis cache invalidation policy.
- Real object-storage signed URL policy.
- Production pagination/cursor behavior beyond seed-scale responses.
## Data Tables
| Table | Purpose | Current model |
|---|---|---|
| `institutions` | Institution profile, source tier, website, topics, credibility notes. | Implemented |
| `reports` | Report master record, source, topics, publication state, cache version. | Implemented |
| `raw_artifacts` | NotebookLM artifact metadata and object-storage references. | Implemented as metadata only |
| `display_artifacts` | Reviewed display version metadata for App consumption. | Implemented |
| `display_modules` | Detail-page modules, sort order, visibility, content or content reference. | Implemented |
| `audio_assets` | Audio metadata and object-storage key. | Implemented |
| `related_news` | Related-source candidates and reviewed related items. | Implemented |
| `users` | User account records. | Implemented as seed model, no auth routes |
| `favorites` | User report favorites. | Implemented as seed model, no API routes |
| `reading_history` | User reading/history events. | Implemented as model, no API routes |
| `saved_listens` | User saved-listen records. | Implemented as model, no API routes |
| `playback_progress` | Playback progress sync records. | Implemented as seed model, no API routes |
| `outbound_events` | External attribution events. | Implemented as model, no API route |
## Public API Implemented
Prefix: `/api/report-notebooklm/v1`
| Method | Path | Purpose |
|---|---|---|
| `GET` | `/health` | Service health. |
| `GET` | `/feed/recommended` | Published report cards for recommendation feed. |
| `GET` | `/reports` | Published report cards with basic filters. |
| `GET` | `/reports/{report_id}` | Report detail skeleton and published modules. |
| `GET` | `/reports/{report_id}/modules/{module_id}` | Full content for a visible module. |
| `GET` | `/institutions` | Active institution list. |
| `GET` | `/institutions/{institution_id}` | Institution detail with latest/recent reports. |
| `GET` | `/listen` | Published audio-backed report list. |
Current filters:
- `/reports`: `topic`, `institution_id`, `has_audio`, `source_tier`, `q`, `page_size`.
- `/institutions`: `topic`, `source_tier`, `page_size`.
- `/feed/recommended` and `/listen`: `page_size`.
Current pagination is a seed-scale placeholder: `page_size` is clamped to the range 1 to 50, and every response returns `next_cursor: null` and `has_more: false`.
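
A production cursor could follow the ordering the catalog queries already use (`released_at` descending, then `report_id`). The sketch below is one possible shape, not the agreed contract: `clamp_page_size`, `encode_cursor`, and `decode_cursor` are hypothetical helper names, and the base64-encoded JSON cursor format is an assumption.

```python
import base64
import json

MAX_PAGE_SIZE = 50  # mirrors the clamp used by the current seed-scale queries


def clamp_page_size(page_size: int) -> int:
    """Keep page_size within 1..MAX_PAGE_SIZE."""
    return min(max(page_size, 1), MAX_PAGE_SIZE)


def encode_cursor(released_at_iso: str, report_id: str) -> str:
    """Pack the last row's sort key into an opaque, URL-safe string."""
    raw = json.dumps({"released_at": released_at_iso, "report_id": report_id})
    return base64.urlsafe_b64encode(raw.encode("utf-8")).decode("ascii")


def decode_cursor(cursor: str) -> tuple[str, str]:
    """Recover the sort key; raise ValueError on a malformed cursor."""
    try:
        data = json.loads(base64.urlsafe_b64decode(cursor.encode("ascii")))
        return data["released_at"], data["report_id"]
    except (ValueError, KeyError, TypeError) as exc:
        raise ValueError("invalid cursor") from exc
```

With a cursor like this, the next page would filter on rows that sort after the decoded `(released_at, report_id)` pair, which stays stable when new reports are published between requests.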
## Planned Public API
The Phase 1 contract also expects:
| Method | Path | Purpose |
|---|---|---|
| `GET` | `/audio/{audio_id}/stream` | Return short-lived playable URL. |
| `POST` | `/outbound/events` | Persist external attribution click event. |
The audio stream endpoint must not return a permanent object-storage URL. The planned behavior is a backend-signed, short-lived playback URL with no download URL.
## Planned Auth and Personal State API
Auth:
- `POST /auth/phone/start`
- `POST /auth/phone/verify`
- `POST /auth/wechat`
- `POST /auth/apple`
Personal state:
- `GET /me`
- `GET /me/favorites`
- `POST /me/favorites`
- `DELETE /me/favorites/{report_id}`
- `GET /me/history`
- `POST /me/history`
- `GET /me/listens/saved`
- `POST /me/listens/saved`
- `DELETE /me/listens/saved/{audio_id}`
- `POST /me/playback-progress`
- `GET /me/playback-progress/{audio_id}`
These endpoints are contract-level requirements but are not implemented in this scaffold.
## Planned Internal API
Internal APIs should require a service token and a network allowlist. They must never be exposed to the App.
- `POST /internal/reports`
- `POST /internal/reports/{report_id}/raw-artifacts`
- `GET /internal/reports/{report_id}/raw-artifacts`
- `POST /internal/reports/{report_id}/display-artifacts`
- `PATCH /internal/modules/{module_id}`
- `POST /internal/reports/{report_id}/publish`
- `POST /internal/reports/{report_id}/hide`
- `POST /internal/related-news/candidates`
Publishing should update report display status, update `has_audio`, bump `cache_version`, and clear related cache keys.
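
The four publish side effects can be sketched against in-memory state. Everything here is illustrative: `ReportState`, `publish_report`, the `revision` counter, the `draft` starting status, and the cache key layout are assumptions rather than the repository's actual model or key scheme.

```python
from dataclasses import dataclass


@dataclass
class ReportState:
    report_id: str
    display_status: str = "draft"  # assumed pre-publish value
    has_audio: bool = False
    revision: int = 0              # hypothetical internal counter

    @property
    def cache_version(self) -> str:
        """Public cache token derived from the internal counter."""
        return f"report:{self.report_id}:v{self.revision}"


def publish_report(report: ReportState, published_audio_count: int,
                   cache: dict[str, object], key_prefix: str = "rnb:") -> list[str]:
    """Apply the publish side effects and return the cache keys cleared."""
    report.display_status = "published"
    report.has_audio = published_audio_count > 0
    report.revision += 1
    # Assumption: report-scoped, feed, and listen keys all go stale on publish.
    stale_prefixes = (
        f"{key_prefix}report:{report.report_id}:",
        f"{key_prefix}feed:",
        f"{key_prefix}listen:",
    )
    cleared = sorted(key for key in cache if key.startswith(stale_prefixes))
    for key in cleared:
        del cache[key]
    return cleared
```

Matching on a prefix that ends with `:` keeps `rep_1` from also clearing keys that belong to `rep_10`.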
## Public vs Internal Fields
Public responses may expose:
- Report identity, title, subtitle, one-liner, topics, institution card, release time, source tier, interpretation label, `has_audio`, and `cache_version`.
- Detail source note, source URL where allowed, risk disclaimer, and published display modules.
- Module metadata needed by the client: `module_id`, `type`, `layer`, `render_mode`, `has_detail_page`, `is_publish_blocking`, `requires_human_review`, `sort_order`, `title_cn`, `content`, `preview`, `content_ref`, `content_etag`.
Public responses must not expose:
- Raw artifact payload.
- Object-storage private paths for raw artifacts.
- NotebookLM notebook IDs, source IDs, conversation IDs, or local account information.
- Local filesystem paths.
- `display_version` or `module.version`.
- User phone hash, WeChat OpenID, Apple user ID, or auth internals.
The public cache contract is a single `cache_version` string. `display_version` and module `version` are server-internal fields only.
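
A response-boundary test could walk a public payload and flag any internal key. The sketch below assumes key spellings: `display_version` and `version` come from the list above, while `notebook_id`, `source_id`, `conversation_id`, `phone_hash`, `wechat_openid`, and `apple_user_id` are guesses at how those internal fields might be named. `find_forbidden_keys` is a hypothetical helper.

```python
from typing import Any

# Assumed spellings for fields that must never appear in public responses.
FORBIDDEN_KEYS = {
    "display_version", "version", "notebook_id", "source_id",
    "conversation_id", "phone_hash", "wechat_openid", "apple_user_id",
}


def find_forbidden_keys(payload: Any, path: str = "$") -> list[str]:
    """Return JSON-style paths of forbidden keys found anywhere in payload."""
    hits: list[str] = []
    if isinstance(payload, dict):
        for key, value in payload.items():
            child = f"{path}.{key}"
            if key in FORBIDDEN_KEYS:
                hits.append(child)
            hits.extend(find_forbidden_keys(value, child))
    elif isinstance(payload, list):
        for index, item in enumerate(payload):
            hits.extend(find_forbidden_keys(item, f"{path}[{index}]"))
    return hits
```

A test can then assert that `find_forbidden_keys(response.json())` is empty for every public endpoint.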
## Seed Data
The seed importer currently creates:
- 18 institutions.
- 27 reports, including one NotebookLM sample report and multiple boundary cases.
- 15 audio assets.
- More than 120 display modules.
- Test users, favorites, and playback progress.
Seed boundary cases intentionally cover:
- Reports with audio and reports without audio.
- Hidden/unpublished report behavior.
- Gray broker source with restricted source URL behavior.
- Published modules vs review-only modules.
- `study_guide` module replacing legacy `faq`.
- Heavy modules using `card_plus_page` preview plus full-module endpoint.
Do not treat seed content as production content. It exists to exercise app/API behavior and edge cases.
## Detail Module Model
The detail page uses a skeleton plus module model:
- Inline modules include small `content` directly in the detail response.
- Heavy modules use `render_mode=card_plus_page`, return `preview` in detail, and load full content from `/reports/{report_id}/modules/{module_id}`.
- Unknown future module types should not break the App; they should fall back to hidden or generic rendering.
Core module types:
- `basic_info`
- `executive_overview`
- `core_insights`
- `key_data`
- `source_compliance`
- `institution`
- `differentiated_view`
- `weaknesses`
- `timeline`
- `study_guide`
- `structure_graph`
- `related_sources`
- `infographic`
- `audio`
- `research_discovery`
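
The skeleton-plus-module rules above can be expressed as a small dispatch function. This is a sketch, not the App's implementation (the App is Flutter): `render_plan` and the `action` values are hypothetical, and any `render_mode` other than `card_plus_page` is assumed to mean inline rendering.

```python
KNOWN_TYPES = {
    "basic_info", "executive_overview", "core_insights", "key_data",
    "source_compliance", "institution", "differentiated_view", "weaknesses",
    "timeline", "study_guide", "structure_graph", "related_sources",
    "infographic", "audio", "research_discovery",
}


def render_plan(module: dict, report_id: str) -> dict:
    """Decide how a client could render one module from the detail response."""
    if module.get("type") not in KNOWN_TYPES:
        # Unknown future types must not break the client; hide them.
        return {"action": "hide"}
    if module.get("render_mode") == "card_plus_page":
        # Heavy module: show the preview, fetch full content on demand.
        return {
            "action": "card",
            "preview": module.get("preview"),
            "detail_path": f"/reports/{report_id}/modules/{module['module_id']}",
        }
    # Assumption: every other render mode carries its content inline.
    return {"action": "inline", "content": module.get("content")}
```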
@@ -0,0 +1,142 @@
# Content Pipeline Handoff
This is a handoff snapshot, not the product SSOT.
Product SSOT: mall-docs report-notebooklm docs, snapshot date: 2026-06-03.
## Content Principle
Use NotebookLM as a source-driven research engine, not as a generic rewriting model.
The pipeline may orchestrate, clean, validate, map, and review NotebookLM-native artifacts. It must not silently replace missing NotebookLM artifacts with locally rewritten publishable content.
## Source Inputs
Phase 1 content is based on public or authorized institutional research reports. Priority source categories:
- Official public sources.
- Authorized partner sources.
- Gray broker public sources, with stricter review and source display handling.
Vision source lists, tiering, and historical source-health experience may be used as reference material. Production data must not depend on a local Vision runtime, local path, local cache, or local account state.
## NotebookLM Workflow
Recommended report run order:
1. Inspect the source PDF: title, institution, date, page count, size, and report type.
2. Create or reuse one notebook for one report source unless a multi-report synthesis is explicitly planned.
3. Upload the report source.
4. Generate the P0 text package:
   - source description
   - native Briefing Doc
   - native Blog Post
   - data table
   - query dimensions
   - query key data
   - query divergence
   - query weaknesses
5. Generate useful P1 artifacts:
   - query timeline
   - query related sources
   - Study Guide
   - mind map, if download succeeds
6. Generate P2 artifacts asynchronously:
   - infographic candidate
   - audio brief
   - research discovery
7. Persist every artifact status in a manifest.
8. Deterministically assemble display modules from reviewed artifacts.
9. Run human review before publishing.
## Artifact Types
The Phase 1 schema supports these NotebookLM artifact types:
| Artifact type | Purpose | Publish blocking | Human review |
|---|---|---:|---:|
| `source_summary` | Source-level summary. | No | No |
| `notebook_summary` | Notebook-level summary. | No | No |
| `native_briefing_doc` | Native briefing document. | Yes | No |
| `native_blog_post` | Native blog post. | Yes | No |
| `native_study_guide` | FAQ, study guide, glossary. | No | No |
| `data_table` | Structured table data. | Yes | No |
| `mind_map` | Mind map or graph source. | No | No |
| `query_dimensions` | Analysis dimensions. | Yes | No |
| `query_key_data` | Key data points. | Yes | No |
| `query_divergence` | Views that diverge from consensus. | No | No |
| `query_weaknesses` | Weaknesses and open questions. | No | No |
| `query_timeline` | Timeline and turning points. | No | No |
| `query_related_sources` | Related source candidates. | No | Yes |
| `research_discovery` | Enrichment queue. | No | Yes |
| `infographic` | Candidate public image. | No | Yes |
| `audio_brief` | Listening preview or audio source. | No | No |
Artifact records should keep status, object reference, format, size, hash, generated time, error, and review flags. Raw payloads should stay in object storage and remain internal.
## Module Mapping
| Product module | Primary artifact sources | Notes |
|---|---|---|
| `basic_info` | Source metadata and source summary. | P0, inline. |
| `executive_overview` | Briefing Doc and Blog Post. | P0, heavy card plus page. |
| `core_insights` | Briefing Doc and query dimensions. | P0, inline with optional detail page. |
| `key_data` | Data table and query key data. | P0, heavy card plus page. |
| `source_compliance` | Source metadata and review notes. | P0, inline, must include disclaimer. |
| `institution` | Institution record. | P0, inline. |
| `differentiated_view` | Query divergence. | P1, optional. |
| `weaknesses` | Query weaknesses. | P1, optional, avoid investment-advice wording. |
| `timeline` | Query timeline. | P1, optional. |
| `study_guide` | Native Study Guide. | P1, optional, replaces legacy `faq`. |
| `structure_graph` | Mind map or deterministic fallback. | P1, optional. |
| `related_sources` | Related-source query and review queue. | P1, review required before display. |
| `infographic` | Infographic candidate. | P2, review required before display. |
| `audio` | Audio brief or reviewed audio asset. | P2, not required for text publish. |
| `research_discovery` | Research discovery queue. | P2, internal or reviewed only. |
## Publish Gates
Blocking before public release:
- Source upload succeeded and is traceable.
- Required P0 text artifacts exist and have usable content.
- `basic_info`, `executive_overview`, `core_insights`, `key_data`, and `source_compliance` are present unless a product decision allows a partial report.
- Display artifact is reviewed and approved.
- Source attribution and risk disclaimer are present.
- No raw artifact payload, local path, private notebook ID, or account information appears in public responses.
Non-blocking:
- Mind map.
- Study guide.
- Timeline.
- Related-source candidates.
- Research discovery.
- Infographic.
- Audio.
If optional artifacts fail, record the failure and continue without inventing fallback public copy. The one exception is the structure graph, which may use a deterministic fallback built from artifacts that are already available.
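
The blocking gates can be collected into one check that returns human-readable reasons. This is a minimal sketch: `publish_blockers` and its keyword arguments are hypothetical, and it covers only the gates that reduce to a simple boolean or set-membership test.

```python
REQUIRED_MODULES = (
    "basic_info", "executive_overview", "core_insights",
    "key_data", "source_compliance",
)


def publish_blockers(module_types: set[str], *, source_traceable: bool,
                     display_approved: bool, has_attribution: bool,
                     has_disclaimer: bool) -> list[str]:
    """Return the reasons a report cannot be published; empty means clear."""
    blockers = [f"missing module: {name}"
                for name in REQUIRED_MODULES if name not in module_types]
    checks = {
        "source upload not traceable": source_traceable,
        "display artifact not approved": display_approved,
        "source attribution missing": has_attribution,
        "risk disclaimer missing": has_disclaimer,
    }
    blockers += [reason for reason, passed in checks.items() if not passed]
    return blockers
```

Optional modules such as `audio` or `timeline` never appear in the result, so their absence cannot block a text publish.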
## Cadence Notes
NotebookLM operations should be conservative by default:
- One active NotebookLM operation per account.
- Text artifacts first.
- Media artifacts after text success.
- Heavy media should not block publishable text.
- On transient failure, retry once; if an optional artifact fails again, mark it failed and continue.
The seed importer is not a production runner. A production runner should persist manifests after every operation and support resumable review/import.
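
The retry-once rule might look like the following in a runner. This is a sketch under assumptions: `TransientError` and `run_artifact` are hypothetical names, and re-raising for non-optional artifacts is a guess, since the notes above only specify the behavior for optional ones.

```python
from typing import Callable


class TransientError(Exception):
    """Hypothetical marker for failures that are worth one retry."""


def run_artifact(operation: Callable[[], str], *, optional: bool,
                 max_retries: int = 1) -> dict:
    """Run one operation, retrying transient failures up to max_retries."""
    attempts = 0
    while True:
        attempts += 1
        try:
            result = operation()
        except TransientError as exc:
            if attempts <= max_retries:
                continue  # retry once by default
            if optional:
                # Mark failed and let the pipeline continue.
                return {"status": "failed", "error": str(exc), "attempts": attempts}
            raise  # assumption: blocking artifacts stop the run
        return {"status": "succeeded", "result": result, "attempts": attempts}
```

The returned dictionary is what a runner would persist to the manifest after each operation, so an interrupted run can resume from the last recorded status.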
## Human Review
Review is mandatory for:
- Gray broker sources.
- Related-source candidate display.
- Infographic or generated media.
- Any content where citations/page labels are ambiguous.
- Any copy that could be interpreted as investment advice.
Do not display raw NotebookLM page labels until they are normalized against verifiable source pages or sections.
@@ -0,0 +1,84 @@
# Backend Handoff
This is a handoff snapshot, not the product SSOT.
Product SSOT: mall-docs report-notebooklm docs, snapshot date: 2026-06-03.
## Current State
The backend is a runnable Phase 1 scaffold for the public read surface. It is not production-ready yet.
Implemented:
- FastAPI app and API prefix.
- SQLAlchemy models for the Phase 1 table set.
- Alembic initial migration.
- Seed import script.
- Public read API for feed, reports, module detail, institutions, and listen list.
- Tests for current seed behavior and public response boundaries.
Not implemented:
- Authentication.
- User personal-state routes.
- Audio stream signing.
- Outbound attribution route.
- Internal management routes.
- Production object storage integration.
- Real Redis cache invalidation.
- Production deployment config.
## Repository Map
| Path | Purpose |
|---|---|
| `app/main.py` | FastAPI app, CORS, router registration. |
| `app/config.py` | Environment-driven settings. |
| `app/db.py` | Async SQLAlchemy engine and session dependency. |
| `app/cache.py` | Redis client helper and key prefixing. |
| `app/models/entities.py` | SQLAlchemy table models. |
| `app/routers/` | HTTP route handlers. |
| `app/services/catalog.py` | Public catalog response assembly. |
| `migrations/` | Alembic environment and migration files. |
| `scripts/import_seed_content.py` | Seed data importer and module fixture builder. |
| `tests/test_public_api.py` | Current API and seed behavior tests. |
| `docs/` | Engineering handoff documentation. |
## Resolved Decisions
- Technical identifiers stay `report-notebooklm` / `rnb`; display name is `研听`.
- Public API responses expose `cache_version`, not `display_version` or module `version`.
- `study_guide` replaces legacy `faq`.
- Heavy modules use preview cards plus full-module endpoint.
- Raw artifacts stay internal; App consumes reviewed display artifacts only.
- Gray broker sources may be converted to audio only after the latest product decision and compliance review.
- Phase 1 has no interpretation-content download feature.
## Known Gaps
- `GET /audio/{audio_id}/stream` needs signed playback URL behavior.
- Auth and personal state APIs need implementation.
- `POST /outbound/events` needs implementation and validation for `click_id` / `tracking_id`.
- Internal publish/hide/import management endpoints need implementation.
- Cursor pagination and cache invalidation are seed-scale placeholders.
- Object storage policy needs a production decision for public vs signed module content.
- Release/deploy settings need staging and production environment values.
- Compliance must re-review gray-source audio and generated media rules before launch.
## Suggested Handoff Order
1. Read `docs/PROJECT_BRIEF.md`.
2. Read `docs/API_AND_DATA.md`.
3. Run the backend locally with seed data using `docs/RUNBOOK.md`.
4. Run `pytest -q` and smoke-test the three core public endpoints (health, recommended feed, and report detail).
5. Pair with `report-notebooklm-app/` and verify `RNB_API_BASE` points to this service.
6. Choose the next work item from `docs/ROADMAP_AND_OPEN_ISSUES.md`.
## Definition of Done for Next Backend Work
- New API behavior has tests.
- Public responses do not expose internal/raw fields.
- Migrations include downgrade.
- New config is environment-driven.
- Seed data remains useful for App development.
- Documentation is updated when contract behavior changes.
@@ -0,0 +1,71 @@
# Project Brief Snapshot
This is a handoff snapshot, not the product SSOT.
Product SSOT: mall-docs report-notebooklm docs, snapshot date: 2026-06-03.
## Product
`研听` is a Chinese research-report interpretation app for users who want to understand global institutional research with lower language and time barriers. It turns hard-to-read English research reports into structured Chinese reading and listening experiences.
Technical identifiers remain `report-notebooklm` and `rnb`. Do not use the product display name in code identifiers, database schema names, Redis keys, object storage paths, or API prefixes.
## Phase 1 Goals
- Validate whether Chinese users will repeatedly consume global institutional research-report interpretations.
- Ship a complete first app experience for discovery, reading, listening, saving, and returning to reports.
- Establish a minimum loop from report sources to selection, NotebookLM-assisted interpretation, review, storage, API distribution, and app display.
- Keep source attribution and compliance clear: this is report interpretation and annotation, not investment advice.
- Keep the commercial app independent from any local-only Vision runtime.
## Target Users
- General Chinese users interested in macro, precious metals, commodities, energy, central banks, and cross-asset research.
- Light professional users who want overseas institutional views and original-source traceability, without trading advice.
- Commuting or fragmented-time users who want reports transformed into listenable content.
Non-target users: professional terminal users, real-time trading-signal users, UGC/community users, and users expecting original investment recommendations.
## Main Tabs
| Tab | Phase 1 scope | Explicitly out of scope |
|---|---|---|
| 推荐 | Latest and curated report interpretations. | Ads, hard trading CTAs, real-time news flashes. |
| 研报 | All published report interpretations with basic filters. | Advanced investment terminal search. |
| 机构 | Institution list and institution report entry points. | Commercial institution ranking or onboarding backend. |
| 听单 | Reports that have audio form. | User-created podcasts, downloads, offline packages. |
| 我的 | Guest/login state, favorites, history, saved listening entry points. | Comments, UGC, paid membership, points. |
## Phase 1 Must Do
- Public browsing for recommended reports, report list, institutions, and listen list.
- Report detail pages with title, institution, publication/release data, source type, topics, summary, structured modules, source/compliance information, and favorite entry.
- Guest users can browse public content and listen to at least one full episode.
- Logged-in users can synchronize favorites, reading history, saved listens, and playback progress.
- Published app responses must expose only reviewed display artifacts, not raw NotebookLM artifacts.
- Every report detail must preserve source attribution and risk disclaimer wording.
## Phase 1 Must Not Do
- No commercialization: no ads, paid unlock, membership, task wall, or points.
- No comments, community, UGC, or user-generated report interpretations.
- No investment advice, trading signals, buy/sell points, return promises, or portfolio recommendations.
- No original financial news, real-time reporting, or commentary positioned as original market views.
- No in-product downloads for interpretation content, audio packages, or PDFs.
- No long-term production dependency on a local Vision runtime, local SQLite, local scripts, local paths, or local account state.
- No App or server-side LLM rewriting of NotebookLM-native content into unsupported original copy.
## Compliance Boundary
- Positioning: research-report interpretation and annotation service.
- Content: Chinese interpretation of public or authorized institutional reports.
- Detail pages, agreements, and store metadata must state that content is not investment advice.
- Each item must show institution, source, publication time, and interpretation/source labels.
- Gray broker sources require special handling and human review before public release.
- Phase 1 does not open user content surfaces.
## Vision Decoupling
Vision source experience can be reused as reference material: source lists, source tiers, source-health lessons, NotebookLM experience, and prior pitfalls.
The app must not depend on local Vision runtime state in production. Any short-term Vision consumption must be read-only transition input, must not write back to Vision, and must not leak local file paths into production data.
@@ -0,0 +1,57 @@
# Roadmap and Open Issues
This is a handoff snapshot, not the product SSOT.
Product SSOT: mall-docs report-notebooklm docs, snapshot date: 2026-06-03.
## P0 Before Production Handoff
- Add environment examples and production-safe defaults for all deploy-time settings.
- Decide staging and production API domains.
- Implement `GET /audio/{audio_id}/stream` with short-lived signed playback URL.
- Implement auth start/verify flow and token handling.
- Implement `/me` personal-state APIs for favorites, history, saved listens, and playback progress.
- Implement `POST /outbound/events` with required `click_id` and `tracking_id`.
- Implement production cursor pagination.
- Implement cache invalidation on publish/hide/module/audio changes.
- Add smoke scripts for health, feed, detail, listen, audio stream, favorite, and outbound event.
## P1 Content and Admin
- Implement internal APIs for report import, raw artifacts, display artifacts, module patching, publish, hide, and related-source candidates.
- Implement production content importer from a manifest-based NotebookLM runner.
- Add validation for module JSON schemas.
- Add object storage integration for raw payloads, heavy module content, audio, images, and source references.
- Add publish blocking validation for P0 modules.
- Add gray-source review flags and operational reporting.
## P1 App/API Contract
- Align App with real auth state and return-to-action behavior.
- Add playable audio stream integration once backend stream endpoint exists.
- Replace local playback placeholders with API-backed progress.
- Add real outbound event write before external navigation.
- Decide whether heavy P1 modules stay as separate pages or merge into one deep-dive page.
## P2 Production Operations
- Add structured logs and request IDs.
- Add application metrics for feed/detail/listen/audio/outbound.
- Add backup and restore runbook for database and content objects.
- Add staging seed or reviewed staging content set.
- Add CI checks for lint, tests, migrations, and public response snapshots.
## Product and Compliance Open Issues
- Re-review gray-source audio policy before public release.
- Define AI-generated-content labeling requirements in App detail and store metadata.
- Define infographic watermark, QA, and factual-check process.
- Define source citation display rules after citation/page-label normalization.
- Confirm login channels and external approvals: phone SMS, WeChat, Apple.
- Confirm store listing wording and risk disclaimers.
## Gitea Handoff Blockers
- Use the single Gitea remote for the monorepo.
- Decide whether the initial push goes directly to `main` or to a review branch.
- Confirm the team has access to the product SSOT or accepts the code-repo snapshot as the development handoff.
@@ -0,0 +1,112 @@
# Backend Runbook
This is a handoff snapshot, not the product SSOT.
Product SSOT: mall-docs report-notebooklm docs, snapshot date: 2026-06-03.
## Requirements
- Python 3.12, or another version compatible with the project's configured dependencies.
- MySQL 8 for local/staging/prod-like runs.
- Redis 7 for cache-compatible local/staging/prod-like runs.
- A shell environment that can create a Python virtual environment.
SQLite is used by the automated tests through `RNB_DATABASE_URL`; production-like local runs should use MySQL.
## Environment Variables
Create `.env` in the repository root:
```bash
RNB_DATABASE_URL=mysql+asyncmy://<db-user>:<db-pass>@<db-host>:<db-port>/report_notebooklm
RNB_REDIS_URL=redis://<redis-host>:<redis-port>/0
RNB_REDIS_KEY_PREFIX=rnb:
```
Do not commit `.env`.
## Install
```bash
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
```
## Migrate and Seed
```bash
source .venv/bin/activate
alembic upgrade head
python scripts/import_seed_content.py
```
Seed import is destructive for seed tables. Use it only in local or disposable test environments unless a production-safe importer is written.
## Run API
```bash
source .venv/bin/activate
uvicorn app.main:app --reload --host <bind-host> --port <port>
```
API prefix:
```text
/api/report-notebooklm/v1
```
## Smoke Checks
```bash
API_BASE_URL=http://<api-host>:<port>/api/report-notebooklm/v1
curl "$API_BASE_URL/health"
curl "$API_BASE_URL/feed/recommended"
curl "$API_BASE_URL/reports/rep_ssga_gold"
```
Expected:
- Health returns `{"status":"ok"}`.
- Feed returns non-empty `items`.
- Report detail returns modules and does not include `display_version`.
## Test
```bash
source .venv/bin/activate
pytest -q
```
## App Integration
Start this backend first, then run the App with:
```bash
flutter run -d chrome --dart-define=RNB_API_BASE=<api-base-url>
```
For Android emulator, use an API base URL reachable from that emulator:
```bash
flutter run -d <emulator-id> --dart-define=RNB_API_BASE=<emulator-api-base-url>
```
Only use cleartext HTTP for local debug builds. Release builds must use HTTPS.
## Deployment Checks
Before staging or production:
- Use environment variables for all database, Redis, object storage, auth, and signing settings.
- Configure HTTPS at the gateway.
- Confirm migrations can both upgrade and downgrade in staging.
- Import reviewed content, not raw/unreviewed NotebookLM artifacts.
- Smoke `/health`, `/feed/recommended`, report detail, audio stream, favorites, and outbound event once those APIs exist.
- Confirm public responses do not expose local paths, raw payloads, notebook IDs, source IDs, conversation IDs, or secrets.
## Operational Notes
- Redis keys must use the `rnb:` prefix or a compatible namespace.
- Object storage keys should use `rnb/raw/`, `rnb/modules/`, `rnb/audio/`, and `rnb/images/` style prefixes.
- Long NotebookLM operations should live in a resumable runner, not inside HTTP request handlers.
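
The namespacing rules above are easy to centralize in two helpers so that no caller builds keys by hand. `redis_key` and `object_key` are hypothetical names, and the key segments in the usage are illustrative only.

```python
REDIS_PREFIX = "rnb:"
OBJECT_PREFIXES = {
    "raw": "rnb/raw/",
    "modules": "rnb/modules/",
    "audio": "rnb/audio/",
    "images": "rnb/images/",
}


def redis_key(*parts: str) -> str:
    """Join parts with ':' under the shared rnb: namespace."""
    return REDIS_PREFIX + ":".join(parts)


def object_key(kind: str, *parts: str) -> str:
    """Build an object-storage key under one of the allowed prefixes."""
    if kind not in OBJECT_PREFIXES:
        raise ValueError(f"unknown object kind: {kind}")
    return OBJECT_PREFIXES[kind] + "/".join(parts)
```

For example, `redis_key("report", "rep_1", "detail")` yields `rnb:report:rep_1:detail`, and `object_key("audio", "rep_1", "brief.m4a")` yields `rnb/audio/rep_1/brief.m4a`.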
@@ -0,0 +1,35 @@
# Source Index
This is a handoff snapshot, not the product SSOT.
Product SSOT: mall-docs report-notebooklm docs, snapshot date: 2026-06-03.
## Snapshot Sources
The handoff documents in this repository were distilled from these logical product-document sources:
| Logical source | Used for |
|---|---|
| `phase1-scope.md` | Product positioning, target users, tabs, Phase 1 scope, non-goals, compliance boundary, Vision decoupling. |
| `phase1-build-brief.md` | Data tables, endpoint list, display module model, artifact enum, seed mapping, open questions. |
| `phase1-development-plan.md` | Technology choices, architecture, Redis/object-storage strategy, phases, deployment assumptions, external dependencies. |
| `data-model-api-contract-v0.1.md` | API/data object intent and response boundaries. |
| `user-flows.md` | Guest vs logged-in behavior, shallow interaction expectations, no-download clarification. |
| `app-prd-v0.1.md` | App-side behavior and page-level expectations. |
| `vision-research-sources.md` | Source-reference context and Vision decoupling principle. |
## Drift Rule
Do not treat this repository snapshot as the product SSOT. When product requirements change:
1. Update the product SSOT first.
2. Update this code-repo snapshot only for information needed by engineers.
3. Bump the snapshot date or add a short changelog entry.
## What Was Not Copied
- Historical drafts.
- Full experiment reports.
- Local-only evidence paths.
- Private local notes.
- Raw NotebookLM notebook IDs, source IDs, conversation IDs, account identifiers, or payloads.
@@ -0,0 +1,56 @@
from __future__ import annotations

import asyncio
from logging.config import fileConfig

from alembic import context
from sqlalchemy import pool
from sqlalchemy.engine import Connection
from sqlalchemy.ext.asyncio import async_engine_from_config

from app.config import get_settings
from app.db import Base
from app.models import *  # noqa: F401,F403

config = context.config
config.set_main_option("sqlalchemy.url", get_settings().database_url)

if config.config_file_name is not None:
    fileConfig(config.config_file_name)

target_metadata = Base.metadata


def run_migrations_offline() -> None:
    context.configure(
        url=get_settings().database_url,
        target_metadata=target_metadata,
        literal_binds=True,
        dialect_opts={"paramstyle": "named"},
    )
    with context.begin_transaction():
        context.run_migrations()


def do_run_migrations(connection: Connection) -> None:
    context.configure(connection=connection, target_metadata=target_metadata)
    with context.begin_transaction():
        context.run_migrations()


async def run_migrations_online() -> None:
    connectable = async_engine_from_config(
        config.get_section(config.config_ini_section, {}),
        prefix="sqlalchemy.",
        poolclass=pool.NullPool,
    )
    async with connectable.connect() as connection:
        await connection.run_sync(do_run_migrations)
    await connectable.dispose()


if context.is_offline_mode():
    run_migrations_offline()
else:
    asyncio.run(run_migrations_online())
@@ -0,0 +1,23 @@
"""${message}

Revision ID: ${up_revision}
Revises: ${down_revision | comma,n}
Create Date: ${create_date}
"""
from alembic import op
import sqlalchemy as sa
${imports if imports else ""}

revision = ${repr(up_revision)}
down_revision = ${repr(down_revision)}
branch_labels = ${repr(branch_labels)}
depends_on = ${repr(depends_on)}


def upgrade() -> None:
    ${upgrades if upgrades else "pass"}


def downgrade() -> None:
    ${downgrades if downgrades else "pass"}
@@ -0,0 +1,26 @@
"""phase1 initial tables

Revision ID: 202606030100
Revises:
Create Date: 2026-06-03 01:00:00
"""
from alembic import op

from app.db import Base
from app.models import *  # noqa: F401,F403

revision = "202606030100"
down_revision = None
branch_labels = None
depends_on = None


def upgrade() -> None:
    bind = op.get_bind()
    Base.metadata.create_all(bind=bind)


def downgrade() -> None:
    bind = op.get_bind()
    Base.metadata.drop_all(bind=bind)
@@ -0,0 +1,32 @@
[project]
name = "report-notebooklm-api"
version = "0.1.0"
requires-python = ">=3.12"
dependencies = [
"fastapi",
"uvicorn",
"sqlalchemy",
"greenlet",
"alembic",
"pydantic",
"pydantic-settings",
"asyncmy",
"redis",
]
[project.optional-dependencies]
dev = [
"aiosqlite",
"httpx",
"pytest",
"pytest-asyncio",
]
[tool.setuptools.packages.find]
include = ["app*", "scripts*"]
exclude = ["migrations*", "tests*"]
[tool.pytest.ini_options]
asyncio_mode = "auto"
testpaths = ["tests"]
pythonpath = ["."]
@@ -0,0 +1 @@
@@ -0,0 +1,649 @@
from __future__ import annotations
import asyncio
import csv
import datetime as dt
import hashlib
import re
import json
import sys
from typing import Any
from pathlib import Path
sys.path.insert(0, str(Path(__file__).resolve().parents[1]))
from sqlalchemy import delete, select
from sqlalchemy.ext.asyncio import AsyncSession
from app.db import Base, SessionLocal, engine
from app.models import (
AudioAsset,
DisplayArtifact,
DisplayModule,
Favorite,
Institution,
OutboundEvent,
PlaybackProgress,
RawArtifact,
ReadingHistory,
RelatedNews,
Report,
SavedListen,
User,
)
def j(value: Any) -> str:
    return json.dumps(value, ensure_ascii=False, separators=(",", ":"))


def d(value: str) -> dt.datetime:
    return dt.datetime.fromisoformat(value.replace("Z", "+00:00")).replace(tzinfo=None)


def etag(value: Any) -> str:
    return hashlib.sha256(j(value).encode("utf-8")).hexdigest()[:16]
REAL_SAMPLE_REPORT_ID = "rep_bis_notebooklm_sample"
REAL_SAMPLE_ROOT = (
Path.home()
/ "Projects/team-project/mall-docs/products/type3-orbit/report-notebooklm/docs.jimme.local/report-notebooklm/notebooklm-capability-bis-2026-06-02"
)
REAL_SAMPLE_ARTIFACTS = REAL_SAMPLE_ROOT / "artifacts"
def read_real_sample(name: str) -> str:
    path = REAL_SAMPLE_ARTIFACTS / name
    if not path.exists():
        return ""
    return path.read_text(encoding="utf-8-sig")


def clean_markdown_text(value: str) -> str:
    text = re.sub(r"\[\d+(?:[-, ]+\d+)*\]", "", value)
    text = re.sub(r"\*\*(.*?)\*\*", r"\1", text)
    text = text.replace("`", "")
    return re.sub(r"\s+", " ", text).strip()
def markdown_sections(markdown: str, *, min_level: int = 2, limit: int = 8) -> list[dict[str, str]]:
    sections: list[dict[str, str]] = []
    current_heading = ""
    current_lines: list[str] = []
    heading_re = re.compile(r"^(#{%d,4})\s+(.+)$" % min_level)
    for raw_line in markdown.splitlines():
        line = raw_line.strip()
        if line == "## Citations":
            break
        match = heading_re.match(line)
        if match:
            if current_heading and current_lines:
                body = clean_markdown_text("\n".join(current_lines))
                if body:
                    sections.append({"heading": current_heading, "body": body})
            current_heading = clean_markdown_text(match.group(2))
            current_lines = []
            continue
        if current_heading and line and not line.startswith("---") and not line.startswith("|"):
            current_lines.append(line)
    if current_heading and current_lines:
        body = clean_markdown_text("\n".join(current_lines))
        if body:
            sections.append({"heading": current_heading, "body": body})
    return sections[:limit]
def numbered_sections(markdown: str, *, limit: int = 8) -> list[dict[str, str]]:
    sections: list[dict[str, str]] = []
    pattern = re.compile(r"^(?:###\s*)?\d+\.\s+\**(.+?)\**$")
    current_heading = ""
    current_lines: list[str] = []
    for raw_line in markdown.splitlines():
        line = raw_line.strip()
        if line == "## Citations":
            break
        match = pattern.match(line)
        if match:
            if current_heading and current_lines:
                body = clean_markdown_text("\n".join(current_lines))
                if body:
                    sections.append({"heading": current_heading, "body": body})
            current_heading = clean_markdown_text(match.group(1))
            current_lines = []
            continue
        if current_heading and line and not line.startswith("#") and not line.startswith("---"):
            current_lines.append(line)
    if current_heading and current_lines:
        body = clean_markdown_text("\n".join(current_lines))
        if body:
            sections.append({"heading": current_heading, "body": body})
    return sections[:limit]
def split_heading_body(section: dict[str, str]) -> tuple[str, str]:
    body = section["body"]
    parts = re.split(r"研报观点与证据:|证据:|影响:", body, maxsplit=1)
    if len(parts) == 2:
        return clean_markdown_text(parts[0]), clean_markdown_text(parts[1])
    return "", body
def key_data_rows() -> list[dict[str, str]]:
    csv_text = read_real_sample("data-table.csv")
    rows: list[dict[str, str]] = []
    if csv_text:
        for row in csv.DictReader(csv_text.splitlines()):
            rows.append(
                {
                    "metric": row.get("数据点/指标名称", ""),
                    "value": row.get("定量数值或趋势", ""),
                    "unit": "",
                    "importance": row.get("风险/修订指示", ""),
                    "judgment": row.get("相关行业或资产类别", ""),
                }
            )
    if rows:
        return rows
    return [
        {"metric": "M7 市值占比", "value": "近 35%", "unit": "", "importance": "提示指数集中度风险", "judgment": "美国大型科技股"},
        {"metric": "SRT 覆盖贷款", "value": "约 8000 亿欧元", "unit": "", "importance": "提示隐藏信贷风险规模", "judgment": "银行业 / 非银机构"},
    ]
def sample_artifact_types() -> list[str]:
return [
"describe-source",
"native_briefing_doc",
"native_blog_post",
"native_study_guide",
"data_table",
"query_dimensions",
"query_key_data",
"query_divergence",
"query_weaknesses",
"query_timeline",
"query_related_sources",
"audio_brief",
]
MODULE_TITLES = {
"basic_info": "报告概览",
"executive_overview": "报告摘要",
"audio": "听研报",
"core_insights": "报告要点",
"key_data": "报告中的关键数据",
"differentiated_view": "观点差异",
"weaknesses": "局限与疑问",
"timeline": "时间线",
"study_guide": "术语与问答",
"structure_graph": "结构梳理",
"related_sources": "延伸阅读",
"source_compliance": "报告来源",
}
MODULE_DISPLAY_ORDER = {module_type: index for index, module_type in enumerate(MODULE_TITLES)}
def real_sample_module_envelope(module_type: str, report_id: str, title: str, institution_name: str) -> dict[str, Any]:
briefing = read_real_sample("briefing-doc.md")
blog = read_real_sample("blog-post.md")
study = read_real_sample("study-guide.md")
dimensions = read_real_sample("query-dimensions.md")
key_data = read_real_sample("query-key-data.md")
divergence = read_real_sample("query-divergence.md")
weaknesses = read_real_sample("query-weaknesses.md")
timeline = read_real_sample("query-timeline.md")
related_sources = read_real_sample("query-related-sources.md")
briefing_sections = markdown_sections(briefing, min_level=2, limit=7)
blog_sections = markdown_sections(blog, min_level=3, limit=7)
dimension_sections = numbered_sections(dimensions, limit=5)
key_data_sections = numbered_sections(key_data, limit=12)
divergence_sections = numbered_sections(divergence, limit=5)
weakness_sections = numbered_sections(weaknesses, limit=5)
timeline_sections = numbered_sections(timeline, limit=10)
related_sections = numbered_sections(related_sources, limit=8)
study_faq = numbered_sections(study, limit=5)
key_rows = key_data_rows()
core_points = [
{"kind": "view", "text": "市场表面平静,但底层已经从美国大型科技股向欧洲、日本、新兴市场、价值股和小盘股重新轮动。"},
{"kind": "number", "text": "M7 在标普 500 指数中的市值占比接近 35%,单一板块波动正在显著影响指数风险。"},
{"kind": "risk", "text": "AI 基础设施融资从现金流叙事转向债务和表外融资,私人信贷、保险公司和银行授信之间的关联增强。"},
{"kind": "risk", "text": "白银在 2026 年 1 月先涨超 50%、后单日跌近 30%,暴露了杠杆 ETF 再平衡和保证金触发平仓的放大效应。"},
]
base = {
"basic_info": {
"content": {
"report_id": report_id,
"title_cn": title,
"summary_cn": "BIS 2026 年 3 月季度评论,回顾 2025 年 11 月 29 日至 2026 年 3 月 5 日的全球金融市场变化,覆盖市场轮动、AI 融资、私募信贷、贵金属和新兴市场政策反应。",
"topics": ["宏观金融", "金融稳定", "AI 融资", "非银风险"],
"interpretation_label": "研报解读",
}
},
"executive_overview": {
"preview": {
"preview_summary": "报告认为,本轮变化不是单一资产回调,而是高估值科技股、AI 基础设施融资、贵金属杠杆交易和非银信用链条共同推动的市场重新校准。它的核心价值在于把看似分散的市场波动,放回金融稳定和跨市场传导的框架里理解。",
"section_count": len(briefing_sections) + len(blog_sections),
"key_quote_snippet": "全球金融市场在表面的平静下经历了深刻的流向切换与重新校准。",
"highlights": ["资金从美国大型科技股转向欧洲、日本和新兴市场", "AI 基础设施融资开始暴露信用风险", "贵金属波动显示杠杆交易的放大效应"],
},
"full": {
"intro_cn": "这份报告把 2025 年底到 2026 年初的市场变化概括为一次跨资产重新校准:美国大型科技股降温,资金转向欧洲、日本和新兴市场;AI 融资从高成长叙事进入债务和表外风险阶段;贵金属和非银金融机构的波动说明杠杆与流动性仍是金融稳定的关键变量。",
"sections": briefing_sections + blog_sections[:4],
"source_artifacts": ["native_briefing_doc", "native_blog_post"],
},
},
"core_insights": {
"content": {"points": core_points},
"full": {
"points": core_points,
"dimensions": [{"dimension": item["heading"], "summary": item["body"]} for item in dimension_sections],
},
},
"key_data": {
"preview": {
"preview_headline": "8 个真实关键数据点",
"highlights": [f"{row['metric']}{row['value']}" for row in key_rows[:3]],
"row_count": len(key_rows),
},
"full": {
"rows": key_rows,
"source_artifacts": ["data_table", "query_key_data"],
"supporting_notes": [{"heading": item["heading"], "body": item["body"]} for item in key_data_sections[:6]],
},
},
"source_compliance": {
"content": {
"source_url": "https://www.bis.org/publ/qtrpdf/r_qt2603.htm",
"source_note": "原文为 BIS Quarterly Review, March 2026 的公开研报;本页仅提供中文解读,不提供解读内容下载。",
"copyright_cn": "原文版权归发布机构所有;本页为基于公开研报整理的中文阅读辅助。",
"disclaimer": "本内容仅供研报阅读参考,不构成投资建议。",
"ai_generated_label": "AI 辅助生成",
}
},
"differentiated_view": {
"preview": {
"preview_headline": "5 处与常见叙事的分歧",
"highlights": [item["heading"] for item in divergence_sections[:3]],
"divergence_count": len(divergence_sections),
},
"full": {
"divergences": [
{
"topic": item["heading"],
"consensus_view": split_heading_body(item)[0] or "常规叙事没有充分覆盖该维度。",
"report_position": split_heading_body(item)[1],
}
for item in divergence_sections
]
},
},
"weaknesses": {
"preview": {
"preview_headline": "5 个论证弱点与反方向证据",
"highlights": [item["heading"] for item in weakness_sections[:3]],
"item_count": len(weakness_sections),
"disclaimer_brief": "只做论证质量分析,不做投资建议。",
},
"full": {
"disclaimer_cn": "以下仅分析研报论证质量,不构成投资建议。",
"verification_notes": ["以上问题需要结合后续市场数据、原文脚注和反方向证据继续验证。"],
"items": [
{
"topic": item["heading"],
"weakness": item["body"],
"counter_evidence": "需要结合后续数据、原文脚注与反方向证据继续验证。",
}
for item in weakness_sections
],
},
},
"timeline": {
"preview": {
"preview_headline": "10 个关键事件节点",
"date_range": "1990s-2026",
"highlights": [item["heading"] for item in timeline_sections[:3]],
"event_count": len(timeline_sections),
},
"full": {
"events": [
{
"date": item["heading"],
"period_type": "report_timeline",
"event": item["heading"],
"impact": item["body"],
}
for item in timeline_sections
]
},
},
"study_guide": {
"preview": {
"preview_headline": "术语与问答",
"faq_count": len(study_faq),
"glossary_count": 8,
"sample_question": study_faq[0]["heading"] if study_faq else "为什么要读这份 BIS 季报?",
"highlights": ["核心概念摘要", "简答练习题", "重要术语表"],
},
"full": {
"intro_cn": "这一部分整理了阅读本篇研报时容易遇到的概念、问题和术语。",
"faq_items": [{"question": item["heading"], "answer": item["body"]} for item in study_faq],
"glossary": [
{"term": "M7", "definition": "主导美国股市的七大科技巨头。"},
{"term": "SRT", "definition": "合成风险转移,银行通过衍生品或担保转移部分信用风险。"},
{"term": "BISTRO", "definition": "BIS Time-series Regression Oracle,宏观时间序列预测工具。"},
{"term": "NBFI", "definition": "非银行金融机构。"},
{"term": "Shadow Borrowing", "definition": "经济实质类似债务、但主要存在于资产负债表外的融资安排。"},
{"term": "BDCs", "definition": "业务发展公司,是私募信贷市场的公开交易窗口之一。"},
{"term": "Carry Trade", "definition": "借入低息货币、投资高息资产的套利交易。"},
{"term": "Margin-triggered Liquidations", "definition": "保证金要求上升触发的被迫平仓。"},
],
},
},
"structure_graph": {
"preview": {
"preview_headline": "结构梳理",
"root": "BIS 季报:分析框架",
"top_nodes": [item["heading"] for item in dimension_sections[:5]],
"fallback_derived": True,
},
"full": {
"root": "BIS 季报:分析框架",
"nodes": [
{
"label": item["heading"],
"children": [phrase.strip() for phrase in re.split(r"[。;;]", item["body"])[:3] if phrase.strip()],
}
for item in dimension_sections
],
"fallback_derived": True,
"source_artifacts": ["query_dimensions"],
},
},
"related_sources": {
"content": {
"items": [
{"title": item["heading"], "source_name": "延伸资料", "summary_cn": item["body"]}
for item in related_sections[:3]
],
"review_note": "延伸来源仅作为候选队列,正式展示前需要人工审核。",
},
"full": {
"items": [
{"title": item["heading"], "source_name": "延伸资料", "summary_cn": item["body"]}
for item in related_sections
],
"review_note": "延伸来源仅作为候选队列,正式展示前需要人工审核。",
},
},
"audio": {
"content": {
"audio_id": "aud_bis_notebooklm_sample",
"title_cn": "BIS 季度评论",
"duration_sec": 75,
"chapters": [],
}
},
}
return base[module_type]
INSTITUTIONS = [
("inst_wgc", "世界黄金协会", "World Gold Council", "industry_org", "tier_1", "https://www.gold.org/", ["贵金属", "央行"]),
("inst_imf", "国际货币基金组织", "International Monetary Fund", "international_org", "tier_1", "https://www.imf.org/", ["宏观金融", "外汇"]),
("inst_world_bank", "世界银行", "World Bank", "international_org", "tier_1", "https://www.worldbank.org/", ["大宗商品", "发展经济"]),
("inst_iea", "国际能源署", "International Energy Agency", "international_org", "tier_1", "https://www.iea.org/", ["能源", "原油"]),
("inst_eia", "美国能源信息署", "U.S. Energy Information Administration", "official", "tier_1", "https://www.eia.gov/", ["能源", "原油"]),
("inst_usgs", "美国地质调查局", "U.S. Geological Survey", "official", "tier_1", "https://www.usgs.gov/", ["矿产", "贵金属"]),
("inst_ecb", "欧洲央行", "European Central Bank", "official", "tier_1", "https://www.ecb.europa.eu/", ["货币政策", "欧元区"]),
("inst_bis", "国际清算银行", "Bank for International Settlements", "international_org", "tier_1", "https://www.bis.org/", ["宏观金融", "金融稳定"]),
("inst_fed", "美联储", "Federal Reserve", "official", "tier_1", "https://www.federalreserve.gov/", ["货币政策", "美元"]),
("inst_opec", "欧佩克", "OPEC", "international_org", "tier_1", "https://www.opec.org/", ["能源", "原油"]),
("inst_ssga", "道富环球投资管理", "State Street Global Advisors", "asset_manager", "tier_2", "https://www.ssga.com/", ["贵金属", "跨资产"]),
("inst_wisdomtree", "WisdomTree", "WisdomTree", "asset_manager", "tier_2", "https://www.wisdomtree.com/", ["大宗商品", "资产配置"]),
("inst_ing", "ING 银行研究", "ING Think", "bank_research", "tier_2", "https://think.ing.com/", ["贵金属", "外汇"]),
("inst_silver_institute", "白银协会", "The Silver Institute", "industry_org", "tier_2", "https://silverinstitute.org/", ["白银", "矿产"]),
("inst_goldman", "高盛研究", "Goldman Sachs Research", "bank_research", "tier_3", "https://www.goldmansachs.com/", ["大宗商品", "宏观"]),
("inst_jpm", "摩根大通研究", "J.P. Morgan Research", "bank_research", "tier_3", "https://www.jpmorgan.com/", ["大宗商品", "宏观"]),
("inst_invesco", "景顺", "Invesco", "asset_manager", "tier_3", "https://www.invesco.com/", ["ETF", "资产配置"]),
("inst_pas", "泛美白银", "Pan American Silver", "partner", "tier_3", "https://www.panamericansilver.com/", ["白银", "矿业"]),
]
BASE_REPORTS = [
(REAL_SAMPLE_REPORT_ID, "BIS 季度评论:全球金融市场重新校准", "inst_bis", "official_public", True, ["宏观金融", "金融稳定", "AI 融资", "非银风险"], "2026-06-02T00:00:00Z"),
("rep_ssga_gold", "黄金月报:金价新高之后,谁在继续买?", "inst_ssga", "authorized_partner", True, ["贵金属", "跨资产"], "2026-05-22T00:00:00Z"),
("rep_wb_pinksheet", "世界银行大宗商品价格表:金属分化继续", "inst_world_bank", "official_public", True, ["大宗商品", "金属"], "2026-05-20T00:00:00Z"),
("rep_iea_omr", "IEA 原油市场月报:库存与需求再平衡", "inst_iea", "official_public", True, ["能源", "原油"], "2026-05-18T00:00:00Z"),
("rep_ing_gold", "ING 黄金观点:实际利率回摆的压力测试", "inst_ing", "authorized_partner", False, ["贵金属", "外汇"], "2026-05-16T00:00:00Z"),
("rep_wisdomtree_outlook", "WisdomTree 商品展望:配置窗口与回撤风险", "inst_wisdomtree", "authorized_partner", False, ["大宗商品", "资产配置"], "2026-05-14T00:00:00Z"),
("rep_usgs_minerals", "USGS 矿产摘要:关键金属供给约束", "inst_usgs", "official_public", True, ["矿产", "贵金属"], "2026-05-12T00:00:00Z"),
("rep_pas_silver", "白银矿业更新:供给扰动与成本曲线", "inst_pas", "broker_public_gray", False, ["白银", "矿业"], "2026-05-10T00:00:00Z"),
("rep_eia_steo", "EIA 短期能源展望:油气价格情景", "inst_eia", "official_public", True, ["能源", "原油"], "2026-05-08T00:00:00Z"),
]
LIGHT_REPORTS = [
("rep_imf_weo", "IMF 世界经济展望:增长分化与政策空间", "inst_imf", "official_public", True, ["宏观金融"], "2026-05-06T00:00:00Z"),
("rep_bis_quarterly", "BIS 季报:市场重新校准", "inst_bis", "official_public", True, ["宏观金融", "金融稳定"], "2026-05-04T00:00:00Z"),
("rep_fed_fsr", "美联储金融稳定报告:杠杆与流动性", "inst_fed", "official_public", True, ["金融稳定"], "2026-05-02T00:00:00Z"),
("rep_ecb_bulletin", "欧洲央行经济公报:通胀路径更新", "inst_ecb", "official_public", True, ["货币政策"], "2026-04-30T00:00:00Z"),
("rep_opec_momr", "OPEC 月报:供需缺口与配额纪律", "inst_opec", "official_public", True, ["能源", "原油"], "2026-04-28T00:00:00Z"),
("rep_wgc_trends", "世界黄金协会:黄金需求趋势", "inst_wgc", "official_public", True, ["贵金属", "央行"], "2026-04-26T00:00:00Z"),
("rep_silver_survey", "白银协会:白银供需调查", "inst_silver_institute", "official_public", True, ["白银"], "2026-04-24T00:00:00Z"),
("rep_gs_commodity", "高盛商品观点:再通胀交易复盘", "inst_goldman", "broker_public_gray", False, ["大宗商品"], "2026-04-22T00:00:00Z"),
("rep_jpm_flows", "摩根大通资金流:商品 ETF 与风险偏好", "inst_jpm", "authorized_partner", False, ["跨资产"], "2026-04-20T00:00:00Z"),
("rep_invesco_etf", "景顺 ETF 观察:黄金与能源配置", "inst_invesco", "authorized_partner", False, ["ETF", "贵金属"], "2026-04-18T00:00:00Z"),
("rep_world_bank_macro", "世界银行宏观更新:贸易与大宗商品", "inst_world_bank", "official_public", True, ["宏观金融", "大宗商品"], "2026-04-16T00:00:00Z"),
("rep_iea_gas", "IEA 天然气市场报告:需求弹性", "inst_iea", "official_public", True, ["能源"], "2026-04-14T00:00:00Z"),
("rep_eia_inventory", "EIA 库存周报解读:裂解价差与需求", "inst_eia", "official_public", False, ["能源"], "2026-04-12T00:00:00Z"),
("rep_usgs_copper", "USGS 铜矿供给:项目延迟与品位下降", "inst_usgs", "official_public", False, ["矿产"], "2026-04-10T00:00:00Z"),
("rep_ing_fx", "ING 外汇周报:美元路径与黄金敏感性", "inst_ing", "authorized_partner", False, ["外汇", "贵金属"], "2026-04-08T00:00:00Z"),
("rep_wisdomtree_gold", "WisdomTree 黄金配置:避险与实际利率", "inst_wisdomtree", "authorized_partner", False, ["贵金属"], "2026-04-06T00:00:00Z"),
("rep_ecb_stability", "欧洲央行稳定评估:非银金融风险", "inst_ecb", "official_public", False, ["金融稳定"], "2026-04-04T00:00:00Z"),
("rep_bis_ai_credit", "BIS 专题:AI 融资与信用风险", "inst_bis", "official_public", False, ["金融稳定", "AI"], "2026-04-02T00:00:00Z"),
]
def module_envelope(module_type: str, report_id: str, title: str, institution_name: str, *, fallback: bool = False) -> dict[str, Any]:
base = {
"basic_info": {"content": {"report_id": report_id, "title_cn": title, "summary_cn": f"{title} 的基础信息,包含发布机构、发布时间、主题标签和来源层级。", "topics": ["贵金属"], "interpretation_label": "研报解读"}},
"executive_overview": {
"preview": {"preview_summary": f"{title} 的结构化摘要,聚焦核心结论、数据线索与风险边界。", "section_count": 3, "key_quote_snippet": "公开研报显示关键变量正在重新定价。"},
"full": {"intro_cn": f"{title} 的执行摘要。", "sections": [{"heading": "核心结论", "body": "报告把需求、价格和风险拆成可读结构。"}, {"heading": "数据线索", "body": "关键指标用于判断趋势是否可持续。"}, {"heading": "风险边界", "body": "外部冲击和估值回摆仍可能改变短期路径。"}], "source_artifacts": ["native_briefing_doc", "native_blog_post"]},
},
"core_insights": {"content": {"points": [{"kind": "view", "text": "核心变量从情绪驱动转向结构驱动。"}, {"kind": "number", "text": "多项关键指标出现同步变化。"}, {"kind": "risk", "text": "若宏观假设反转,短期波动会放大。"}]}, "full": {"dimensions": [{"dimension": "需求结构", "summary": "机构、ETF 与产业需求变化共同影响价格。"}, {"dimension": "风险路径", "summary": "利率、美元和地缘冲击是主要风险因子。"}]}},
"key_data": {"preview": {"preview_headline": "10 个关键数据点", "highlights": ["央行购金保持韧性", "ETF 资金重新流入", "库存周期出现分化"], "row_count": 10}, "full": {"rows": [{"metric": "样本指标", "value": "10", "unit": "", "importance": "用于验证关键数据模块渲染", "judgment": "方向性信号清晰"}], "source_artifacts": ["data_table", "query_key_data"]}},
"source_compliance": {"content": {"source_url": None if report_id == "rep_pas_silver" else "https://example.org/public-report", "source_note": "灰度来源仅展示来源说明,不提供原文链接。" if report_id == "rep_pas_silver" else "原文来源于机构公开研究页。", "copyright_cn": "内容基于机构公开研报的中文结构化解读。", "disclaimer": "本内容不构成投资建议。", "ai_generated_label": "AI 辅助生成"}},
"differentiated_view": {"preview": {"preview_headline": "3 处与共识的关键分歧", "highlights": ["结构性买盘强于短期情绪", "库存周期解释部分价格韧性"], "divergence_count": 3}, "full": {"divergences": [{"topic": "买盘结构", "consensus_view": "价格主要由短期情绪驱动。", "report_position": "报告强调更稳定的结构性买盘。"}]}},
"weaknesses": {"preview": {"preview_headline": "3 处质疑点与开放问题", "highlights": ["样本窗口偏短", "反方向证据仍需跟踪"], "item_count": 3, "disclaimer_brief": "AI 辅助论证质量分析"}, "full": {"disclaimer_cn": "仅供学习参考,不构成投资建议。", "verification_notes": ["这些开放问题需要结合后续数据、原文脚注和反方向证据继续验证。"], "items": [{"topic": "样本窗口", "weakness": "短周期数据可能放大结论。", "counter_evidence": "后续数据可能修正方向。"}]}},
"timeline": {"preview": {"preview_headline": "5 个关键事件节点", "date_range": "2025-2026", "highlights": ["2026:价格重新定价", "2025:资金结构切换"], "event_count": 5}, "full": {"events": [{"date": "2026-05", "period_type": "review_period", "event": "报告发布", "impact": "为市场判断提供公开依据。"}]}},
"study_guide": {"preview": {"preview_headline": "学习指南", "faq_count": 3, "glossary_count": 5, "sample_question": "这份报告适合谁读?"}, "full": {"intro_cn": "学习指南帮助读者理解术语和关键问题。", "faq_items": [{"question": "这份报告适合谁读?", "answer": "适合关注宏观、商品和资产配置的读者。"}], "glossary": [{"term": "source_tier", "definition": "来源可信层级。"}]}},
"structure_graph": {"preview": {"preview_headline": "研报结构图", "root": f"{title}:分析框架", "top_nodes": ["需求", "价格", "风险"], "fallback_derived": fallback}, "full": {"root": f"{title}:分析框架", "nodes": [{"label": "需求", "children": ["机构", "产业", "投资"]}, {"label": "价格", "children": ["利率", "美元", "库存"]}], "fallback_derived": fallback, "source_artifacts": ["query_dimensions"] if fallback else ["mind_map"]}},
"audio": {"content": {"audio_id": f"aud_{report_id.removeprefix('rep_')}", "title_cn": f"{title} 音频摘要", "duration_sec": 180, "chapters": []}},
}
return base[module_type]
def rich_module_types(report_id: str) -> list[str]:
by_report = {
REAL_SAMPLE_REPORT_ID: [
"basic_info",
"executive_overview",
"core_insights",
"key_data",
"source_compliance",
"institution",
"differentiated_view",
"weaknesses",
"timeline",
"study_guide",
"structure_graph",
"related_sources",
"audio",
],
"rep_ssga_gold": ["basic_info", "executive_overview", "core_insights", "key_data", "source_compliance", "institution", "differentiated_view", "weaknesses", "timeline", "study_guide", "structure_graph", "audio"],
"rep_wb_pinksheet": ["basic_info", "executive_overview", "core_insights", "key_data", "source_compliance", "institution", "timeline", "study_guide", "audio"],
"rep_iea_omr": ["basic_info", "executive_overview", "core_insights", "key_data", "source_compliance", "institution", "study_guide", "structure_graph", "audio"],
"rep_ing_gold": ["basic_info", "executive_overview", "core_insights", "key_data", "source_compliance", "institution"],
"rep_wisdomtree_outlook": ["basic_info", "executive_overview", "core_insights", "source_compliance", "institution", "timeline"],
"rep_usgs_minerals": ["basic_info", "executive_overview", "core_insights", "key_data", "source_compliance", "institution", "timeline", "structure_graph", "audio"],
"rep_pas_silver": ["basic_info", "executive_overview", "core_insights", "key_data", "source_compliance", "institution"],
"rep_eia_steo": ["basic_info", "executive_overview", "core_insights", "key_data", "source_compliance", "institution", "study_guide", "audio"],
}
return by_report.get(report_id, ["basic_info", "executive_overview", "core_insights", "key_data", "source_compliance", "institution"])
async def reset(session: AsyncSession) -> None:
    for model in [
        OutboundEvent,
        PlaybackProgress,
        SavedListen,
        ReadingHistory,
        Favorite,
        User,
        RelatedNews,
        AudioAsset,
        DisplayModule,
        DisplayArtifact,
        RawArtifact,
        Report,
        Institution,
    ]:
        await session.execute(delete(model))
    await session.commit()
async def import_seed(session: AsyncSession) -> None:
await reset(session)
inst_lookup: dict[str, str] = {}
for inst_id, name_cn, name_en, inst_type, tier, url, topics in INSTITUTIONS:
inst_lookup[inst_id] = name_cn
session.add(Institution(institution_id=inst_id, name_cn=name_cn, name_en=name_en, institution_type=inst_type, source_tier=tier, website_url=url, covered_topics=j(topics), intro_cn=f"{name_cn} 的公开研究和数据用于 Phase 1 seed 展示。", credibility_note=f"{name_cn}{tier} 来源。", status="active"))
await session.flush()
all_reports = BASE_REPORTS + LIGHT_REPORTS
audio_report_ids = {report_id for report_id, *_rest, has_audio, _topics, _date in all_reports if has_audio}
for idx, (report_id, title, inst_id, source_tier, has_audio, topics, released) in enumerate(all_reports, start=1):
display_status = "draft" if report_id == "rep_wisdomtree_outlook" else "published"
source_url = None if source_tier == "broker_public_gray" else "https://example.org/public-report"
source_note = "灰度公开来源,仅保留来源说明,不做默认音频化。" if source_tier == "broker_public_gray" else "原文来源于机构公开研究页。"
if report_id == REAL_SAMPLE_REPORT_ID:
source_url = "https://www.bis.org/publ/qtrpdf/r_qt2603.htm"
source_note = "原文为 BIS Quarterly Review, March 2026 的公开研报。"
session.add(
Report(
report_id=report_id,
report_type="single",
title_cn=title,
subtitle_cn="",
original_title="BIS Quarterly Review, March 2026" if report_id == REAL_SAMPLE_REPORT_ID else f"{title} original",
one_liner="2025 年底至 2026 年初,全球金融市场在表面平静下出现资金流向切换,AI 融资、贵金属杠杆和非银风险成为主要线索。" if report_id == REAL_SAMPLE_REPORT_ID else f"{title} 的一分钟结构化摘要。",
institution_id=inst_id,
source_tier=source_tier,
source_url=source_url,
source_note=source_note,
published_at=d(released),
interpreted_at=d(released),
released_at=d(released),
topics=j(topics),
language="en",
has_audio=has_audio,
display_status=display_status,
display_version=1,
cache_version=f"{report_id}:v1",
risk_disclaimer="本内容为公开研报的结构化解读,不构成投资建议。",
interpretation_label="研报解读",
)
)
await session.flush()
da_id = f"da_{report_id.removeprefix('rep_')}_v1"
session.add(DisplayArtifact(display_artifact_id=da_id, report_id=report_id, display_version=1, title_cn=title, summary_cn=f"{title} seed display artifact", source_label=inst_lookup[inst_id], interpretation_label="研报解读", ai_generated_label="AI 辅助生成", synthesis_type="mixed" if has_audio else "text", source_disclosure_text=source_note, review_status="published", published_at=d(released)))
await session.flush()
artifact_types = sample_artifact_types() if report_id == REAL_SAMPLE_REPORT_ID else ["native_briefing_doc", "native_blog_post", "native_study_guide", "data_table", "query_dimensions", "query_key_data"]
for artifact_type in artifact_types:
session.add(RawArtifact(raw_artifact_id=f"raw_{report_id.removeprefix('rep_')}_{artifact_type}", report_id=report_id, artifact_type=artifact_type, payload_format="markdown" if artifact_type != "data_table" else "csv", status="ok", is_publish_blocking=artifact_type in {"native_briefing_doc", "native_blog_post", "data_table", "query_dimensions", "query_key_data"}, retention_status="retained", ingested_at=d(released)))
if report_id == "rep_iea_omr":
session.add(RawArtifact(raw_artifact_id="raw_iea_omr_mind_map", report_id=report_id, artifact_type="mind_map", payload_format="json", status="failed", error="Download failed for mind_map", is_publish_blocking=False, retention_status="retained", ingested_at=d(released)))
module_types = [
value
for value in sorted(
rich_module_types(report_id),
key=lambda value: MODULE_DISPLAY_ORDER.get(value, len(MODULE_DISPLAY_ORDER)),
)
if value != "institution"
]
for order, module_type in enumerate(module_types):
if report_id == REAL_SAMPLE_REPORT_ID:
payload = real_sample_module_envelope(module_type, report_id, title, inst_lookup[inst_id])
else:
payload = module_envelope(module_type, report_id, title, inst_lookup[inst_id], fallback=(report_id == "rep_iea_omr" and module_type == "structure_graph"))
module_id = f"mod_{report_id.removeprefix('rep_')}_{module_type}"
content_ref = f"rnb/modules/{module_id}.json" if "full" in payload else None
session.add(
DisplayModule(
module_id=module_id,
report_id=report_id,
display_artifact_id=da_id,
type=module_type,
title_cn=MODULE_TITLES.get(module_type, module_type),
content_format="json",
content=j(payload),
content_ref=content_ref,
content_etag=etag(payload),
source_raw_artifact_ids=j([]),
status="published" if display_status == "published" else "review",
sort_order=order,
version=1,
)
)
if has_audio and report_id in audio_report_ids:
audio_id = f"aud_{report_id.removeprefix('rep_')}"
session.add(AudioAsset(audio_id=audio_id, report_id=report_id, title_cn=f"{title} 音频摘要", duration_sec=180 + idx, oss_key=f"rnb/audio/{audio_id}.m4a", chapters=j([]), status="published" if display_status == "published" else "review", published_at=d(released)))
if idx <= 15:
session.add(RelatedNews(related_news_id=f"news_{idx:03d}", report_id=report_id, title=f"{title} 延伸阅读", source_name="公开财经资讯", source_url="https://example.org/news", published_at=d(released), language="zh", summary_cn="整理自公开财经资讯的延伸阅读。", match_method="manual_curated", match_keywords=j(topics), match_confidence="medium", status="published"))
await session.flush()
for inst_id in inst_lookup:
reports = (await session.execute(select(Report).where(Report.institution_id == inst_id, Report.display_status == "published").order_by(Report.released_at.desc()))).scalars().all()
inst = (await session.execute(select(Institution).where(Institution.institution_id == inst_id))).scalar_one()
inst.report_count = len(reports)
if reports:
inst.latest_report_id = reports[0].report_id
inst.latest_report_at = reports[0].released_at
users = [
User(user_id="user_alpha", phone_hash="hash_alpha", display_name="Alpha", status="active"),
User(user_id="user_history", phone_hash="hash_history", display_name="History", status="active"),
User(user_id="user_guest_placeholder", display_name="Guest Placeholder", status="disabled"),
]
session.add_all(users)
await session.flush()
for idx, report_id in enumerate(["rep_ssga_gold", "rep_wb_pinksheet", "rep_iea_omr", "rep_usgs_minerals", "rep_eia_steo"], start=1):
session.add(Favorite(favorite_id=f"fav_{idx:03d}", user_id="user_alpha", report_id=report_id, status="active"))
for idx, report_id in enumerate(["rep_ssga_gold", "rep_wb_pinksheet", "rep_iea_omr"], start=1):
audio_id = f"aud_{report_id.removeprefix('rep_')}"
session.add(PlaybackProgress(progress_id=f"prog_{idx:03d}", user_id="user_alpha", audio_id=audio_id, report_id=report_id, position_sec=idx * 30, duration_sec=180 + idx, completed=False))
await session.commit()
async def main() -> None:
async with engine.begin() as conn:
await conn.run_sync(Base.metadata.create_all)
async with SessionLocal() as session:
await import_seed(session)
print("seed import complete")
if __name__ == "__main__":
asyncio.run(main())
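The institution rollup at the end of `import_seed` (published report count plus the most recent published report) can be pictured without a database. The following is only a minimal sketch under stated assumptions: reports are plain dicts, `rollup_institutions` is a hypothetical helper that does not exist in the seed script, and the IDs and dates in the usage lines are made up for illustration.

```python
from datetime import date


def rollup_institutions(institution_ids, reports):
    """Hypothetical helper: summarise published reports per institution.

    `reports` is an iterable of dicts with the keys institution_id,
    report_id, released_at and display_status (an assumed shape).
    """
    summary = {
        inst_id: {"report_count": 0, "latest_report_id": None, "latest_report_at": None}
        for inst_id in institution_ids
    }
    for report in reports:
        entry = summary.get(report["institution_id"])
        # Skip unknown institutions and anything that is not published.
        if entry is None or report["display_status"] != "published":
            continue
        entry["report_count"] += 1
        # Keep the report with the greatest release date as the latest one.
        if entry["latest_report_at"] is None or report["released_at"] > entry["latest_report_at"]:
            entry["latest_report_id"] = report["report_id"]
            entry["latest_report_at"] = report["released_at"]
    return summary


# Illustrative data only; these IDs and dates are not taken from the seed.
sample_reports = [
    {"institution_id": "inst_a", "report_id": "rep_a1", "released_at": date(2026, 1, 1), "display_status": "published"},
    {"institution_id": "inst_a", "report_id": "rep_a2", "released_at": date(2026, 2, 1), "display_status": "published"},
    {"institution_id": "inst_a", "report_id": "rep_a3", "released_at": date(2026, 3, 1), "display_status": "review"},
]
rollup = rollup_institutions(["inst_a", "inst_b"], sample_reports)
```

As in the seed script, an institution with no published reports keeps a count of zero and its latest-report fields are left untouched.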
@@ -0,0 +1,117 @@
from __future__ import annotations

import os

os.environ["RNB_DATABASE_URL"] = "sqlite+aiosqlite:///./test_seed.db"
os.environ["RNB_REDIS_URL"] = "redis://test-redis.invalid/0"

import pytest
from httpx import ASGITransport, AsyncClient
from sqlalchemy import select

from app.db import Base, SessionLocal, engine
from app.main import app
from app.models import AudioAsset, DisplayModule, Institution, Report
from scripts.import_seed_content import import_seed

PREFIX = "/api/report-notebooklm/v1"


@pytest.fixture(autouse=True)
async def seeded_db():
    async with engine.begin() as conn:
        await conn.run_sync(Base.metadata.drop_all)
        await conn.run_sync(Base.metadata.create_all)
    async with SessionLocal() as session:
        await import_seed(session)
    yield


@pytest.fixture
async def client():
    transport = ASGITransport(app=app)
    async with AsyncClient(transport=transport, base_url="http://test") as ac:
        yield ac


async def test_seed_counts_match_phase1_shape():
    async with SessionLocal() as session:
        assert len((await session.execute(select(Institution))).scalars().all()) == 18
        assert len((await session.execute(select(Report))).scalars().all()) == 27
        assert len((await session.execute(select(AudioAsset))).scalars().all()) == 15
        assert len((await session.execute(select(DisplayModule))).scalars().all()) >= 120


async def test_health_and_recommended_feed(client: AsyncClient):
    health = await client.get(f"{PREFIX}/health")
    assert health.status_code == 200
    assert health.json() == {"status": "ok"}
    feed = await client.get(f"{PREFIX}/feed/recommended")
    assert feed.status_code == 200
    body = feed.json()
    assert body["items"]
    assert body["items"][0]["report_id"] == "rep_bis_notebooklm_sample"
    assert "display_version" not in body["items"][0]
    assert body["items"][0]["cache_version"].startswith("rep_")


async def test_report_detail_hides_internal_fields_and_review_modules(client: AsyncClient):
    response = await client.get(f"{PREFIX}/reports/rep_ssga_gold")
    assert response.status_code == 200
    body = response.json()
    assert body["report_id"] == "rep_ssga_gold"
    assert "display_version" not in body
    module_types = [module["type"] for module in body["modules"]]
    assert "study_guide" in module_types
    assert "institution" not in module_types
    assert "faq" not in module_types
    assert "infographic" not in module_types
    assert all(module["has_detail_page"] for module in body["modules"])
    assert module_types[-1] == "source_compliance"
    key_data = next(module for module in body["modules"] if module["type"] == "key_data")
    assert key_data["render_mode"] == "card_plus_page"
    assert key_data["content"] is None
    assert key_data["preview"]["row_count"] == 10
    assert key_data["content_ref"].startswith("rnb/modules/")


async def test_module_endpoint_returns_full_content(client: AsyncClient):
    detail = (await client.get(f"{PREFIX}/reports/rep_ssga_gold")).json()
    key_data = next(module for module in detail["modules"] if module["type"] == "key_data")
    response = await client.get(f"{PREFIX}/reports/rep_ssga_gold/modules/{key_data['module_id']}")
    assert response.status_code == 200
    body = response.json()
    assert body["module_id"] == key_data["module_id"]
    assert "rows" in body["content"]
    assert body["cache_version"] == "rep_ssga_gold:v1"


async def test_boundary_reports(client: AsyncClient):
    listen = (await client.get(f"{PREFIX}/listen")).json()
    listen_report_ids = {item["report_id"] for item in listen["items"]}
    assert "rep_ing_gold" not in listen_report_ids
    assert "rep_pas_silver" not in listen_report_ids
    hidden = await client.get(f"{PREFIX}/reports/rep_wisdomtree_outlook")
    assert hidden.status_code == 404
    gray = (await client.get(f"{PREFIX}/reports/rep_pas_silver")).json()
    compliance = next(module for module in gray["modules"] if module["type"] == "source_compliance")
    assert compliance["content"]["source_url"] is None
    assert "灰度" in compliance["content"]["source_note"]


async def test_institutions_and_listen(client: AsyncClient):
    institutions = await client.get(f"{PREFIX}/institutions")
    assert institutions.status_code == 200
    assert len(institutions.json()["items"]) == 18
    inst = await client.get(f"{PREFIX}/institutions/inst_ssga")
    assert inst.status_code == 200
    assert inst.json()["latest_report"]["report_id"] == "rep_ssga_gold"
    listen = await client.get(f"{PREFIX}/listen")
    assert listen.status_code == 200
    assert listen.json()["items"][0]["audio_id"].startswith("aud_")
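The `aud_` prefix asserted in the listen test follows from how the seed script derives audio IDs and storage keys from report IDs. A small sketch of that naming convention, assuming it is pulled out into helpers; `audio_id_for` and `audio_oss_key` are hypothetical names, since the seed script inlines these expressions:

```python
def audio_id_for(report_id: str) -> str:
    """Hypothetical helper: map a report ID such as rep_x to the audio ID aud_x."""
    # str.removeprefix needs Python 3.9+ and returns the string unchanged
    # when it does not start with the given prefix.
    return f"aud_{report_id.removeprefix('rep_')}"


def audio_oss_key(audio_id: str) -> str:
    """Hypothetical helper: build the object storage key used for a seeded audio asset."""
    return f"rnb/audio/{audio_id}.m4a"


seed_audio_id = audio_id_for("rep_ssga_gold")
seed_audio_key = audio_oss_key(seed_audio_id)
```

Keeping the derivation in one place would let the audio assets and the playback progress rows share it instead of repeating the f-string.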
@@ -0,0 +1,47 @@
# Miscellaneous
*.class
*.log
*.pyc
*.swp
.DS_Store
.atom/
.build/
.buildlog/
.history
.svn/
.swiftpm/
migrate_working_dir/
# IntelliJ related
*.iml
*.ipr
*.iws
.idea/
# The .vscode folder contains launch configuration and tasks you configure in
# VS Code which you may wish to be included in version control, so this line
# is commented out by default.
#.vscode/
# Flutter/Dart/Pub related
**/doc/api/
**/ios/Flutter/.last_build_id
.dart_tool/
.flutter-plugins-dependencies
.pub-cache/
.pub/
/build/
/coverage/
*.apk
build/verification/
# Symbolication related
app.*.symbols
# Obfuscation related
app.*.map.json
# Android Studio will place build artifacts here
/android/app/debug
/android/app/profile
/android/app/release
@@ -0,0 +1,77 @@
# report-notebooklm-app
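Flutter client for the report-notebooklm Phase 1 app shell.
The backend API lives in `../report-notebooklm-api/` in the same monorepo. Details of the API, data, and content pipeline are documented there; this directory focuses on app handoff, UI state, build commands, and integration notes.
## Read these first
- [docs/HANDOFF.md](docs/HANDOFF.md): current app state, implemented pages, placeholders, and next steps.
- [docs/PROJECT_BRIEF.md](docs/PROJECT_BRIEF.md): quick overview of the product and the Phase 1 scope.
- [docs/APP_RUNBOOK.md](docs/APP_RUNBOOK.md): Flutter version, local run, Web build, Android debug build, and verification.
- [docs/API_CONTRACT_NOTES.md](docs/API_CONTRACT_NOTES.md): endpoints and fields the app consumes.
- [docs/PROJECT_MAP.md](docs/PROJECT_MAP.md): source directory map.
## Product boundary
This repository holds the app code and an engineering handoff snapshot; it is not the single source of truth for the product.
Product SSOT: the report-notebooklm documents in mall-docs. Snapshot date: 2026-06-03.
The technical identifiers are `report-notebooklm` and `rnb`; the user-facing product name is `研听`.
## Requirements
- Flutter 3.44.1 / Dart 3.12.1, or a compatible newer version.
- A running backend that serves `/api/report-notebooklm/v1`.
- For Android builds, additionally: the Android SDK, accepted licenses, and an emulator or a physical device.
## API base URL
The app deliberately ships with no built-in default for the production API. Pass the backend base URL explicitly: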
```bash
flutter run -d chrome --dart-define=RNB_API_BASE=<api-base-url>
```
Android emulator:
```bash
flutter run -d <emulator-id> --dart-define=RNB_API_BASE=<emulator-api-base-url>
```
Physical Android device on the same LAN:
```bash
flutter run -d <device-id> --dart-define=RNB_API_BASE=http://<host-lan-ip>:<port>/api/report-notebooklm/v1
```
Cleartext HTTP is allowed only in debug builds. Release builds must use HTTPS.
## Verification
```bash
flutter analyze
flutter test
flutter build web --dart-define=RNB_API_BASE=<api-base-url>
flutter build apk --debug --dart-define=RNB_API_BASE=<emulator-api-base-url>
```
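## Current app scope
Implemented:
- Five bottom tabs: Recommended (推荐), Reports (研报), Institutions (机构), Listen (听单), and Me (我的).
- API-backed feed, report list, institution list, listen list, institution detail, and report detail.
- A module renderer registry for inline modules and "card + page" modules.
- Product display name `研听`.
- Local UI placeholders for login, favorites, external-link confirmation, and playback progress.
Not yet implemented:
- Real authentication.
- Real sync of favorites / history / listening records.
- Actually playable audio streaming.
- Real external-link event logging.
- Production API domain.
- Release signing, final icons, and final app store metadata.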
