Compare commits

4 Commits

Author SHA1 Message Date
jimme e93356e849 docs: add data source flow guide and localize handoff READMEs
- Add docs/DATA_SOURCE_FLOW.md: end-to-end source -> NotebookLM ->
  storage -> App flow, source list with publish frequency, institution
  intro status, ingestion artifact structure, and known cadence gaps
- Link the new doc from README and PROJECT_OVERVIEW indexes
- Localize top-level and subproject READMEs to Chinese for handoff
  (pre-existing working-tree changes)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-03 13:59:38 +09:00
jingyun a76ea8dd07 fix: build the iOS and web directories 2026-06-03 10:17:39 +08:00
jingyun 4a632ba60f fix: Flutter SDK version was too high; adapt it to the current development version 2026-06-03 10:17:17 +08:00
jimme 634ae98dec chore: prepare yanting monorepo handoff 2026-06-03 10:39:03 +09:00
146 changed files with 83 additions and 3381 deletions
+45 -47
@@ -1,49 +1,47 @@
# Local/private agent overlays
AGENTS.local.md
CURRENT_STATUS.md
docs.jimme.local/
docs.*.local/
# Secrets and local env
.env
*.env.local
# Python
report-notebooklm-api/.venv/
report-notebooklm-api/.pytest_cache/
report-notebooklm-api/.mypy_cache/
report-notebooklm-api/**/*.pyc
report-notebooklm-api/**/__pycache__/
report-notebooklm-api/*.db
report-notebooklm-api/*.egg-info/
# Flutter / Dart
report-notebooklm-app/.dart_tool/
report-notebooklm-app/.flutter-plugins-dependencies
report-notebooklm-app/.pub-cache/
report-notebooklm-app/.pub/
report-notebooklm-app/build/
report-notebooklm-app/coverage/
# Android local/generated
report-notebooklm-app/android/.gradle/
report-notebooklm-app/android/local.properties
report-notebooklm-app/android/app/debug/
report-notebooklm-app/android/app/profile/
report-notebooklm-app/android/app/release/
report-notebooklm-app/**/*.apk
report-notebooklm-app/**/*.jks
report-notebooklm-app/**/*.keystore
report-notebooklm-app/android/key.properties
# IDE / OS noise
# Miscellaneous
*.class
*.log
*.pyc
*.swp
.DS_Store
**/.DS_Store
.idea/
*.iml
.vscode/
.atom/
.build/
.buildlog/
.history
.svn/
.swiftpm/
migrate_working_dir/
# Build artifacts
build/
dist/
coverage/
# IntelliJ related
*.iml
*.ipr
*.iws
.idea/
# The .vscode folder contains launch configuration and tasks you configure in
# VS Code which you may wish to be included in version control, so this line
# is commented out by default.
#.vscode/
# Flutter/Dart/Pub related
**/doc/api/
**/ios/Flutter/.last_build_id
.dart_tool/
.flutter-plugins-dependencies
.pub-cache/
.pub/
/build/
/coverage/
*.apk
build/verification/
# Symbolication related
app.*.symbols
# Obfuscation related
app.*.map.json
# Android Studio will place build artifacts here
/android/app/debug
/android/app/profile
/android/app/release
-126
@@ -1,126 +0,0 @@
# AGENTS.md - Yanting Engineering Repo
> Public agent instructions for this repository. This file is safe to commit.
> Local agents may read ignored `AGENTS.local.md`, but the repository must not depend on it.
> Last updated: 2026-06-03.
## Project
This repository contains the Phase 1 implementation and engineering handoff for `研听 / report-notebooklm`.
`研听` is a Chinese research-report interpretation app. It turns global institutional research reports into structured Chinese reading and listening experiences. The product is an interpretation and annotation service, not investment advice.
Technical identifiers:
- Code/API/internal name: `report-notebooklm`
- Short prefix: `rnb`
- API prefix: `/api/report-notebooklm/v1`
- Database schema name: `report_notebooklm`
- User-facing display name: `研听`
Do not use the user-facing display name in code identifiers, database schema names, Redis keys, object-storage paths, or package names.
## Repository Layout
This is intended to be a single Gitea repository.
| Path | Purpose |
|---|---|
| `README.md` | Human-facing repository entry point. |
| `docs/` | Repo-level public handoff, decisions, and development history. |
| `report-notebooklm-api/` | FastAPI backend, database models, migrations, seed importer, API docs. |
| `report-notebooklm-app/` | Flutter app, Android/web scaffolds, App docs. |
| `docs.jimme.local/` | Ignored local-only notes, not required by the team. |
| `AGENTS.local.md` | Ignored local agent overlay. |
## Public vs Local Documentation
Public, committed documentation must be portable:
- Use repository-relative paths.
- Use environment variables and placeholders for credentials.
- Describe product decisions in team-readable language.
- Distinguish implemented behavior from planned/spec behavior.
Do not commit local-only material:
- Local absolute paths.
- Personal machine setup.
- Private agent workflow.
- Raw session logs.
- Local screenshots, APKs, caches, virtualenvs, build outputs.
- Credentials or local service passwords.
Use `docs.jimme.local/` for ignored local notes and raw process references. Durable team-facing conclusions should be distilled into public `docs/`.
## Product and Compliance Constraints
- Public responses expose only reviewed display artifacts, not raw NotebookLM artifacts.
- Public responses expose `cache_version`; `display_version` and module `version` are internal.
- Do not expose raw artifact payloads, local file paths, NotebookLM notebook/source/conversation IDs, account identifiers, or private object-storage paths.
- Phase 1 has no report-interpretation download feature.
- Phase 1 does not include comments, UGC, paid unlocks, membership, task walls, points, trading signals, or investment recommendations.
- NotebookLM-native/source-driven artifacts are the content source. Do not use local LLM rewriting to invent publishable report content.
- Gray broker sources and generated media require compliance/operations review before public release.
## Backend
Read first:
- `report-notebooklm-api/README.md`
- `report-notebooklm-api/docs/HANDOFF.md`
- `report-notebooklm-api/docs/API_AND_DATA.md`
- `report-notebooklm-api/docs/CONTENT_PIPELINE.md`
- `report-notebooklm-api/docs/RUNBOOK.md`
Verify:
```bash
cd report-notebooklm-api
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
alembic upgrade head
python scripts/import_seed_content.py
pytest -q
uvicorn app.main:app --reload --host <bind-host> --port <port>
```
The backend requires `.env` settings for real MySQL/Redis environments. Use `.env.example` as the template. Do not commit `.env`.
## App
Read first:
- `report-notebooklm-app/README.md`
- `report-notebooklm-app/docs/HANDOFF.md`
- `report-notebooklm-app/docs/API_CONTRACT_NOTES.md`
- `report-notebooklm-app/docs/APP_RUNBOOK.md`
Verify:
```bash
cd report-notebooklm-app
flutter analyze
flutter test
flutter build web --dart-define=RNB_API_BASE=<api-base-url>
flutter build apk --debug --dart-define=RNB_API_BASE=<emulator-api-base-url>
```
The App intentionally has no built-in live API default. Always pass `RNB_API_BASE`.
## Decision Records
Long-lived decisions belong in `docs/DECISIONS.md`.
Development timeline and major implementation changes belong in `docs/DEVELOPMENT_HISTORY.md`.
Raw session logs, temporary planning transcripts, or local-only evidence pointers belong in ignored `docs.jimme.local/`.
## Git Rules
- Target remote: `https://gitea.neuronlabs.art/third-party-project/yanting.git`.
- Commit one monorepo, not nested repositories.
- Before the final monorepo push, remove or archive nested `.git/` directories under subprojects so source files are committed as normal directories.
- Keep `.env`, build artifacts, caches, APKs, local status files, and local agent overlays ignored.
- Use English commit messages with prefixes such as `feat:`, `fix:`, `docs:`, and `chore:`.
+38 -101
@@ -1,107 +1,36 @@
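# 研听 / report-notebooklm
# report-notebooklm-app
`研听` is a Phase 1 app and backend that turns global institutional research reports into structured Chinese reading and listening experiences.
The Flutter client for the report-notebooklm Phase 1 app shell.
This repository is organized as a single Gitea handoff repository for the product and engineering teams to take over.
## What's in the repository
| Area | Path | Description |
|---|---|---|
| Backend API | `report-notebooklm-api/` | FastAPI service, MySQL models, Alembic migrations, seed data import, public read-only API. |
| Flutter app | `report-notebooklm-app/` | Flutter client with five main tabs, report detail modules, and Android/Web scaffolds. |
| Repository docs | `docs/` | Project-level overview, decision records, development history, and handoff guide. |
| Backend docs | `report-notebooklm-api/docs/` | Details on the API, data, content pipeline, and runbook. |
| App docs | `report-notebooklm-app/docs/` | App runbook, project map, and API usage notes. |
## Product at a glance
`研听` helps Chinese-speaking users understand global institutional research reports, covering macro, precious metals, commodities, energy, central banks, cross-asset, and other topics.
Phase 1 focuses on:
- 推荐 (Recommended): curated and latest report interpretations.
- 研报 (Reports): report list and basic filters.
- 机构 (Institutions): institution list and institution detail.
- 听单 (Listen): reports with audio.
- 我的 (Me): guest / login state and shallow personal-state entries.
Phase 1 explicitly does **not** include: comments, UGC, paid unlocks, membership, ads, trading signals, investment advice, or report-interpretation downloads.
The backend API lives in `../report-notebooklm-api/` in the same monorepo. API, data, and content pipeline details are documented there; this directory focuses on app handoff, UI state, build commands, and integration notes.
## Read these first
For human readers:
- [docs/HANDOFF.md](docs/HANDOFF.md): current app status, implemented pages, placeholders, and next steps.
- [docs/PROJECT_BRIEF.md](docs/PROJECT_BRIEF.md): quick overview of the product and Phase 1 scope.
- [docs/APP_RUNBOOK.md](docs/APP_RUNBOOK.md): Flutter version, local run, Web build, Android debug build, and verification.
- [docs/API_CONTRACT_NOTES.md](docs/API_CONTRACT_NOTES.md): endpoints and fields the app consumes.
- [docs/PROJECT_MAP.md](docs/PROJECT_MAP.md): source directory map.
1. `docs/PROJECT_OVERVIEW.md`
2. `docs/DECISIONS.md`
3. `docs/DATA_SOURCE_FLOW.md`
4. `docs/DEVELOPMENT_HISTORY.md`
5. `report-notebooklm-api/docs/HANDOFF.md`
6. `report-notebooklm-app/docs/HANDOFF.md`
## Product boundaries
For AI agents:
This repository holds the app code and an engineering handoff snapshot; it is not the product's single source of truth.
1. `AGENTS.md`
2. `docs/DECISIONS.md`
3. The README and runbook of the relevant subsystem.
Product SSOT: the report-notebooklm docs in mall-docs. Snapshot date: 2026-06-03.
## Current implementation status
Technical identifiers are `report-notebooklm` and `rnb`; the user-facing product name is `研听`.
Backend, implemented:
## Requirements
- FastAPI application mounted under `/api/report-notebooklm/v1`.
- SQLAlchemy model layer for the Phase 1 tables.
- Alembic initial migration.
- Seed data import script.
- Public read-only endpoints for health check, feed, reports, report modules, institutions, and the listen list.
- Tests for seed data and public API behavior.
- Flutter 3.44.1 / Dart 3.12.1, or a compatible newer version.
- A running backend that serves `/api/report-notebooklm/v1`.
- For Android builds, additionally: the Android SDK, accepted licenses, and an emulator or physical device.
App, implemented:
## API base URL
- Five bottom tabs: 推荐, 研报, 机构, 听单, 我的.
- List / detail views driven by `RNB_API_BASE`.
- Module renderer registry for report detail.
- Local placeholder implementations for login, favorites, outbound-link confirmation, and playback progress.
- Android and Web build scaffolds.
Not yet production-ready:
- Authentication and personal state.
- Real audio stream signing.
- Outbound-link event writes.
- Internal content management API.
- Production object storage and cache invalidation.
- Production API domain, release signing, final app icon, and app store metadata.
## Backend quick start
The app intentionally ships with no built-in live API default. Pass the backend base URL explicitly: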
```bash
cd report-notebooklm-api
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
cp .env.example .env
# Edit .env to match your MySQL and Redis setup
alembic upgrade head
python scripts/import_seed_content.py
uvicorn app.main:app --reload --host <bind-host> --port <port>
```
Smoke check:
```bash
API_BASE_URL=http://<api-host>:<port>/api/report-notebooklm/v1
curl "$API_BASE_URL/health"
curl "$API_BASE_URL/feed/recommended"
curl "$API_BASE_URL/reports/rep_ssga_gold"
```
## App quick start
```bash
cd report-notebooklm-app
flutter analyze
flutter test
flutter run -d chrome --dart-define=RNB_API_BASE=<api-base-url>
```
@@ -111,30 +40,38 @@ Android emulator:
flutter run -d <emulator-id> --dart-define=RNB_API_BASE=<emulator-api-base-url>
```
## Verification
Backend:
Physical Android device on the same LAN:
```bash
cd report-notebooklm-api
source .venv/bin/activate
pytest -q
flutter run -d <device-id> --dart-define=RNB_API_BASE=http://<host-lan-ip>:<port>/api/report-notebooklm/v1
```
App:
Cleartext HTTP is for debug builds only. Release builds must use HTTPS.
## Verification
```bash
cd report-notebooklm-app
flutter analyze
flutter test
flutter build web --dart-define=RNB_API_BASE=<api-base-url>
flutter build apk --debug --dart-define=RNB_API_BASE=<emulator-api-base-url>
```
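## Documentation boundaries
## Current app scope
This repository is a code handoff snapshot and does not replace the product's single source of truth (SSOT).
Implemented:
Product SSOT: the report-notebooklm docs in mall-docs. Snapshot date: 2026-06-03.
- Five bottom tabs: 推荐, 研报, 机构, 听单, 我的.
- API-backed feed, report list, institution list, listen list, institution detail, and report detail.
- Module renderer registry for inline modules and "card + page" modules.
- Product display name `研听`.
- Local UI placeholders for login, favorites, outbound-link confirmation, and playback progress.
Local-only notes, private paths, raw session pointers, and personal agent workflows belong in the ignored `docs.jimme.local/` and `AGENTS.local.md`.
Not yet implemented:
- Real authentication.
- Real sync of favorites / history / listening records.
- Actually playable audio streams.
- Real outbound-link event writes.
- Production API domain.
- Release signing, final icon, and final app store metadata.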

[Image diffs: five image files, each the same size before and after (544 B, 442 B, 721 B, 1.0 KiB, 1.4 KiB).]

-269
@@ -1,269 +0,0 @@
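# Data Source Flow
This is a handoff snapshot, not the product's single source of truth (SSOT).
Product SSOT: the report-notebooklm docs in mall-docs. Snapshot date: 2026-06-03.
This document walks through the whole chain in one place: where reports come from, how they are interpreted, where they are stored, and how they reach the app. It also fills in parts that earlier docs left scattered or missing, especially the **data source list** and the **update frequency**. For concrete implementation details, check against the subsystem docs and the SSOT.
---
## 1. At a glance: four-layer data model and end-to-end flow
研听 data is organized in **four layers**. The first two are internal evidence and are **not visible to the app**; the last two are reviewed display artifacts and are visible to the app.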
```
┌──────────────────────────────────────────────────────────────────────────────────┐
│ Layer 1  Report Source                                                           │
│   Institutional report PDF / source URL / institution metadata                   │
│   └─ Public official / licensed partner / gray broker public sources             │
└────────────────────────────────┬─────────────────────────────────────────────────┘
                                 │ upload to NotebookLM; source-driven interpretation
┌──────────────────────────────────────────────────────────────────────────────────┐
│ Layer 2  Raw Artifact (internal, not visible to the app)                         │
│   NotebookLM native + targeted-query artifacts, retained in full                 │
│   └─ payload in object storage; DB stores only metadata + payload_ref + sha256   │
└────────────────────────────────┬─────────────────────────────────────────────────┘
                                 │ deterministic assembly / cleaning / field mapping + human review
                                 │ (no local LLM rewriting of the original text)
┌──────────────────────────────────────────────────────────────────────────────────┐
│ Layer 3  Display Artifact (reviewed, visible to the app)                         │
│   display_artifacts + display_modules (detail modules tiered as P0/P1/P2)        │
│   └─ state machine: missing → raw_ready → review → approved → published          │
└────────────────────────────────┬─────────────────────────────────────────────────┘
                                 │ read-only public API
┌──────────────────────────────────────────────────────────────────────────────────┐
│ Layer 4  App Response                                                            │
│   Lists / detail skeleton / lazy modules / signed audio URLs / institution cards │
└──────────────────────────────────────────────────────────────────────────────────┘
```
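**Core principle**: the app only ever consumes the reviewed display artifacts of Layers 3/4 and never reads the raw PDFs or raw NotebookLM artifacts of Layers 1/2 directly. A mistaken request for a raw artifact should return `RAW_ARTIFACT_NOT_EXPOSED` (403).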
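A minimal sketch of that guard, assuming a hypothetical `ensure_publicly_readable` check that runs before a public handler serves a resource. The resource kind names and the `ApiError` class are illustrative and are not taken from the backend; only the error code and the 403 status come from the text above.

```python
# Hypothetical guard: names below are illustrative, not the backend's actual API.

# Resource kinds that belong to the reviewed, app-visible layers (Layers 3/4).
PUBLIC_KINDS = {"display_artifact", "display_module"}


class ApiError(Exception):
    """Error carrying an HTTP status and a machine-readable error code."""

    def __init__(self, status_code: int, code: str) -> None:
        super().__init__(code)
        self.status_code = status_code
        self.code = code


def ensure_publicly_readable(resource_kind: str) -> None:
    """Reject any request that targets a non-public (raw) resource kind."""
    if resource_kind not in PUBLIC_KINDS:
        # Raw PDFs and raw NotebookLM artifacts are internal evidence only.
        raise ApiError(403, "RAW_ARTIFACT_NOT_EXPOSED")
```

Using an allowlist rather than a blocklist means a newly added internal resource kind is refused by default.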
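---
## 2. Data sources (Layer 1)
### 2.1 Three source categories / three trust tiers
| Source category | Trust tier | Handling rule |
|---|---|---|
| Official public sources (regulators, international organizations, industry organizations) | `tier_1` | Standard flow. |
| Sell-side research / asset management (investment banks, asset managers, data vendors) | `tier_2` | Standard flow. |
| Gray broker public sources | `tier_3` | Stricter review; source URL display is restricted and must go through short-lived backend signed URLs; compliance/operations review is required before release. |
| In-house / licensed partner sources | By agreement | Empty for now; to be added later when internal reports from futures companies / brokers are onboarded. |
Vision's source list and source-health data can serve as reference experience for sourcing, but **production data must not depend on the local Vision runtime, local paths, local caches, or local account state**.
### 2.2 Report PDF source list and publish frequency (filled in)
The following are the **enabled report PDF sources** from the SSOT (vision-research-sources), grouped by topic, with their **natural publish frequency**. This is the baseline for "PDF update frequency" that earlier docs lacked. The frequency column is **the source site's own report publishing cycle**; it is not 研听's interpretation / re-read cadence (see Section 6).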
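**Precious-metals specialist institutions**
| Institution | Representative reports | Publish frequency |
|---|---|---|
| World Gold Council | Weekly Markets Monitor / Silver Lining | Weekly |
| WPIC (World Platinum Investment Council) | Platinum quarterly | Quarterly |
| State Street | Precious metals monthly | Monthly |
| ING | Precious metals / FX research | Irregular |
| Silver Institute | Silver market | Annual / irregular |
| HDFC Securities / Sharekhan | Indian market perspective | Irregular |
| Emirates NBD | Middle East / central-bank gold purchases | Irregular |
**Cross-asset / commodities macro (mainly sell-side)**
| Institution | Representative reports | Publish frequency |
|---|---|---|
| Goldman Sachs (Goldman Sachs Research) | Commodities / macro outlook | Annual / irregular |
| J.P. Morgan (AM + PWM) | Asset allocation outlook | Annual / irregular |
| Bloomberg Intelligence | Cross-asset | Irregular |
| WisdomTree (EU + US) | Commodities outlook | Irregular |
| Invesco | ETF / asset allocation | Annual |
| World Bank | Commodity Markets Outlook / **Pink Sheet** | Semiannual / **biweekly (most frequent)** |
**Energy**
| Institution | Representative reports | Publish frequency |
|---|---|---|
| EIA (U.S. Energy Information Administration) | Short-Term Energy Outlook | Monthly |
| IEA (International Energy Agency) | Oil Market Report / Gas Market Report | Monthly / quarterly |
| OPEC | Annual outlook | Annual |
| IEEJ (Institute of Energy Economics, Japan) | Energy economics | Irregular |
| Policy Center (Morocco) | Energy policy | Irregular |
**Mining companies / industrial metals / agricultural products**
| Institution | Representative reports | Publish frequency |
|---|---|---|
| USGS (U.S. Geological Survey) | Mineral Commodity Summaries | Annual |
| USDA | WASDE agricultural supply and demand | Monthly (PDF link rotates monthly) |
| Eldorado Gold / Pan American Silver | Mining company quarterly reports | Quarterly |
> Frequency overview: **weekly (WGC) → biweekly (WB Pink Sheet) → monthly (EIA / IEA OMR / USDA / State Street) → quarterly (WPIC / IEA Gas / mining companies) → semiannual (WB CMO) → annual (Goldman Sachs / JPM / Invesco / USGS / OPEC)**.
**Precedence**: the actual ingested tables > Vision `config/research_report_sources.json` / `config/sources.yaml` > this document. This document is an aggregated view from 研听's consumption perspective and will go stale over time; check against the source before relying on it.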
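### 2.3 Differences from the seed data (important)
The **18 institutions in the backend `import_seed_content.py` are seed data, not the production list**. The authoritative production list is the ~31 PDF sources in §2.2. In addition:
- BIS / Fed / IMF and others that appear in the seed, as well as ECB / BOJ envisioned in early design drafts, are **not in the enabled report source list**. They are design ideas or experimental samples (such as the BIS quarterly used in the NotebookLM capability experiment); at rollout, the enabled list takes precedence.
- To onboard a new source before launch, add the source to the Vision source config (or a future 研听-owned source config) and update this document accordingly.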
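---
## 3. Institution information (institutions table)
| Field | Description | Visible to app | Status |
|---|---|---|---|
| `name_cn` / `name_en` | Chinese and English names | Yes | Filled |
| `institution_type` | Enum of 7 types: `official` / `international_org` / `industry_org` / `bank_research` / `asset_manager` / `data_provider` / `partner` | Yes | Filled |
| `source_tier` | `tier_1/2/3` | Yes | Filled |
| `website_url` | Official website | Yes | Filled |
| `covered_topics` | Topics covered | Yes | Filled |
| `intro_cn` | **Institution detail page intro** | Yes | ⚠️ Field exists; per-institution text is mostly unwritten |
| `credibility_note` | **Credibility note** | Yes | ⚠️ Only one sample, for WGC |
**Institution intro status**: the schema fully supports `intro_cn` + `credibility_note`, but the SSOT currently has only one actual sample, WGC's credibility note: "全球黄金行业组织,公开发布黄金需求与市场研究。" ("A global gold industry organization that publicly publishes gold demand and market research.") The "representative reports / topics" of each institution in §2.2 can serve as material for writing per-institution intros, but **paragraph-length intro text for each of the 31 institutions is still to be written**.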
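---
## 4. PDF → NotebookLM: interpretation and captured content structure (Layer 2)
### 4.1 Interpretation workflow (recommended order)
1. Inspect the source PDF: title, institution, date, page count, size, report type.
2. Create (or reuse) one notebook per report source; unless you are explicitly producing a multi-report synthesis, use one notebook per report.
3. Upload the report source.
4. Generate the **P0 text package**: source description, native Briefing Doc, native Blog Post, data table, query dimensions, query key data, query divergence, query weaknesses.
5. Generate the **P1 artifacts**: query timeline, query related sources, Study Guide, mind map (if the export succeeds).
6. Asynchronously generate the **P2 artifacts**: infographic candidates, audio brief, research discovery.
7. After each step, write to the manifest and persist each artifact's status.
8. Assemble display modules **deterministically** from reviewed artifacts.
9. Run human review before release.
Toolchain: the NotebookLM CLI (`nlm`) creates notebooks, uploads sources, and generates and exports artifacts; the production worker turns PDFs into raw artifacts and ingests them.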
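### 4.2 Artifact types (16) and measured structure
One test run (a 106-page institutional quarterly sample) produced **16 artifact types: 15 succeeded, 1 failed (mind map export failed)**, ranging in size from ~1 KB of text to a 5.4 MB infographic and ~75 seconds of audio. Purpose and release constraints per type:
| Artifact type | Purpose | Blocks release | Needs human review |
|---|---|:--:|:--:|
| `source_summary` / `notebook_summary` | Source / notebook-level summary | No | No |
| `native_briefing_doc` | Native briefing doc | **Yes** | No |
| `native_blog_post` | Native blog post | **Yes** | No |
| `native_study_guide` | FAQ / study guide / glossary | No | No |
| `data_table` | Structured table (CSV) | **Yes** | No |
| `mind_map` | Mind map / graph-structure source | No | No |
| `query_dimensions` | Analysis dimensions | **Yes** | No |
| `query_key_data` | Key data points | **Yes** | No |
| `query_divergence` | Divergence from consensus | No | No |
| `query_weaknesses` | Weaknesses and open questions | No | No |
| `query_timeline` | Timeline and turning points | No | No |
| `query_related_sources` | Related source candidates | No | **Yes** |
| `research_discovery` | Expansion queue | No | **Yes** |
| `infographic` | Public candidate image | No | **Yes** |
| `audio_brief` | Audio preview / audio source | No | No |
> The **highest-value layer** is the query-family artifacts (dimensions / key_data / divergence / weaknesses / timeline): they are the largest and the most information-dense.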
### 4.3 Raw artifact metadata structure (manifest → database `raw_artifacts`)
Fields persisted for each artifact record: `artifact_type`, `provider` (default notebooklm), `payload_format`, `payload_ref` (object storage reference), `sha256`, `size_bytes`, `status` (pending/ok/failed), `error`, `generated_at` / `ingested_at`, `is_publish_blocking`, `requires_human_review`, `quality_flags`, `retention_status`, and internal linkage IDs (notebook / source / conversation; **internal only, never included in app responses**).
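The field list above can be sketched as a plain record type. This is an illustration under assumptions, not the actual SQLAlchemy model: it covers only a subset of the listed fields, and the `describe_payload` helper and the example key layout are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class RawArtifactRecord:
    """Subset of the per-artifact metadata fields named in the text."""

    artifact_type: str
    payload_format: str
    payload_ref: str            # object-storage reference; the payload itself is not stored here
    sha256: str
    size_bytes: int
    provider: str = "notebooklm"
    status: str = "pending"     # pending / ok / failed
    error: Optional[str] = None
    is_publish_blocking: bool = False
    requires_human_review: bool = False
    quality_flags: list[str] = field(default_factory=list)
    ingested_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def describe_payload(
    artifact_type: str,
    payload_format: str,
    payload_ref: str,
    payload: bytes,
    *,
    is_publish_blocking: bool = False,
    requires_human_review: bool = False,
) -> RawArtifactRecord:
    """Hypothetical helper: derive sha256 and size from the payload bytes."""
    return RawArtifactRecord(
        artifact_type=artifact_type,
        payload_format=payload_format,
        payload_ref=payload_ref,
        sha256=hashlib.sha256(payload).hexdigest(),
        size_bytes=len(payload),
        status="ok",
        is_publish_blocking=is_publish_blocking,
        requires_human_review=requires_human_review,
    )
```

Deriving `sha256` and `size_bytes` from the bytes at ingestion time keeps the database row verifiable against the stored payload later.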
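### 4.4 Two hard rules for capture
- **No local LLM rewriting** of NotebookLM's original text. The pipeline may only orchestrate, clean, validate, map fields, assemble deterministically, and trim by hand; it must not use local rewriting to invent publishable content.
- **Citation page numbers need a second normalization pass**: NotebookLM citations may give the report's printed page number (≠ the PDF's physical page number). The UI does not expose the raw page label and shows no page marker until the number is normalized; the citation is kept as internal evidence.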
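---
## 5. Storage and landing points (Layer 2 → 3)
### 5.1 Object storage (Alibaba Cloud OSS)
Raw payloads, audio, images, and oversized module content all live in OSS; the DB stores only reference keys. Agreed prefixes:
| Prefix | Content |
|---|---|
| `rnb/raw/` | NotebookLM raw artifact payloads |
| `rnb/modules/` | Display module content (`content_ref` for large modules) |
| `rnb/audio/` | Audio assets |
| `rnb/images/` | Infographics / images |
- Raw payloads are stored in OSS; MySQL stores only `payload_ref` + metadata + `sha256` (internal).
- The audio object key `audio_assets.oss_key` is internal and not exposed; the playback `stream_url` is a **short-lived signed URL issued on demand** by the backend (planned validity ~2 hours). It is not persisted, and there is no download URL.
- Large module content (such as mind map / infographic / long tables, >100 KB) is stored in OSS; `display_modules.content` stores only `content_ref` + `content_etag`.
> ⚠️ Current implementation status: real OSS signing and the invalidation policy are still **planned**; the scaffold in this repository has not implemented production object storage.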
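The inline-versus-reference rule for module content can be sketched as follows. This is a sketch under assumptions: the function name, the object key layout under `rnb/modules/`, and the use of a SHA-256 hex digest as the etag are all hypothetical; only the ~100 KB threshold, the prefix, and the `content_ref` / `content_etag` field names come from the text.

```python
import hashlib
import json

# ">100KB" threshold from the storage rule above.
INLINE_LIMIT_BYTES = 100 * 1024


def module_storage_fields(report_id: str, module_id: str, content: dict) -> dict:
    """Decide what goes into display_modules.content for one module."""
    body = json.dumps(content, ensure_ascii=False, sort_keys=True).encode("utf-8")
    if len(body) <= INLINE_LIMIT_BYTES:
        # Light module: keep the content inline.
        return {"content": content}
    # Heavy module: the body would be uploaded to object storage (upload not shown);
    # the row keeps only a reference and an etag. The key layout is an assumption.
    return {
        "content": {
            "content_ref": f"rnb/modules/{report_id}/{module_id}.json",
            "content_etag": hashlib.sha256(body).hexdigest(),
        }
    }
```

Serializing with `sort_keys=True` makes the etag stable for equal content regardless of dictionary ordering.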
### 5.2 Database tables (schema = `report_notebooklm`, MySQL 8)
13 tables in total: `institutions`, `reports`, `raw_artifacts`, `display_artifacts`, `display_modules`, `audio_assets`, `related_news`, `users`, `favorites`, `reading_history`, `saved_listens`, `playback_progress`, `outbound_events`.
- **Content side** (models implemented): the first 7.
- **User-state side** (models implemented, APIs mostly planned): the last 6.
### 5.3 raw → display review state machine
```
missing → raw_ready → review → approved → published
↑↓
hidden
```
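**Release gate**: all P0 modules with `is_publish_blocking=True` are `published`, source attribution and the risk disclaimer are present, and public responses contain no raw payloads / local paths / NotebookLM internal IDs / account information.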
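A minimal sketch of the state machine and the release gate, under stated assumptions. The diagram's whitespace was lost, so treating `hidden` as reachable only from `published` (and back) is a guess; the function names and the module dictionary shape are hypothetical. The response-content part of the gate (no raw payloads, paths, or internal IDs) is not modeled here.

```python
# Allowed transitions. The published <-> hidden pair is an assumption about
# where the "↑↓ hidden" branch attaches in the diagram.
ALLOWED_TRANSITIONS = {
    "missing": {"raw_ready"},
    "raw_ready": {"review"},
    "review": {"approved"},
    "approved": {"published"},
    "published": {"hidden"},
    "hidden": {"published"},
}


def advance(current: str, target: str) -> str:
    """Return the new state, or raise if the transition is not allowed."""
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"transition {current} -> {target} is not allowed")
    return target


def passes_release_gate(
    modules: list[dict], *, has_attribution: bool, has_disclaimer: bool
) -> bool:
    """Check the gate: every publish-blocking module is published,
    and source attribution plus the risk disclaimer are present."""
    blocking_ready = all(
        module["status"] == "published"
        for module in modules
        if module["is_publish_blocking"]
    )
    return blocking_ready and has_attribution and has_disclaimer
```

Non-blocking modules (audio, infographic, and similar) are ignored by the gate, which matches the rule that media must not block text publishability.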
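---
## 6. Flow cadence and known gaps
Split "frequency" into three levels to avoid confusion:
| Level | Status |
|---|---|
| **A. Natural publish frequency of each source** | ✅ Defined; see §2.2 (weekly / biweekly / monthly / quarterly / semiannual / annual). |
| **B. Per-run NotebookLM production load policy** | ✅ Measured: single account, serial (`parallelism=1`), rate-limited (on the order of ~48 ops/hour), 60–150 second cooldown depending on artifact weight, no slides/video, research discovery is not auto-imported; the text-and-image layer of one report takes about 20–30 minutes. |
| **C. 研听's own interpretation / re-read / scheduling cadence** | ❌ **Not frozen**: the product contract does not define how often each report is re-read, how many reports are interpreted per day/week, or the cron/trigger cadence of the production runner. |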
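The level B load policy can be sketched as a serial throttle. This is an illustration, not the production runner: the class name, the three weight tiers, and their exact cooldown values are assumptions chosen inside the 60–150 second range; the hourly cap uses the ~48 ops/hour figure from the table.

```python
import time

MAX_OPS_PER_HOUR = 48          # "~48 ops/hour" from the table
WINDOW_SECONDS = 3600
# Hypothetical weight tiers inside the 60-150 second cooldown range.
COOLDOWN_SECONDS = {"light": 60, "medium": 100, "heavy": 150}


class SerialThrottle:
    """Run operations one at a time with an hourly cap and per-weight cooldown."""

    def __init__(self, clock=time.monotonic, sleep=time.sleep) -> None:
        self._clock = clock
        self._sleep = sleep
        self._started: list[float] = []  # start times within the last hour

    def _wait_for_hourly_budget(self) -> None:
        while True:
            now = self._clock()
            self._started = [t for t in self._started if now - t < WINDOW_SECONDS]
            if len(self._started) < MAX_OPS_PER_HOUR:
                return
            # Sleep until the oldest operation leaves the one-hour window.
            self._sleep(WINDOW_SECONDS - (now - self._started[0]))

    def run(self, operation, weight: str):
        """Execute one operation serially (parallelism=1), then cool down."""
        self._wait_for_hourly_budget()
        self._started.append(self._clock())
        result = operation()
        self._sleep(COOLDOWN_SECONDS[weight])
        return result
```

The clock and sleep functions are injectable so the policy can be exercised in tests without real waiting.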
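**Content volume thresholds (not frequency, but related)**
- Development-period seed: 10–20 Reports / 5–8 Institutions / 3–5 with audio.
- First batch before launch: 30–50 reviewed report interpretations, ≥10 with audio.
**Remaining gaps (suggested next steps)**
1. **研听 production cadence (level C)**: per-report re-read cycle, daily/weekly output, and production runner scheduling. Phase 1 is positioned as "run a minimal content set once in batch before launch, without blocking app development"; **ongoing cadence is left to a later phase (G5, server-side production chain migration)**, and for now there is only a weekly check of the publishable count.
2. **Per-institution intro text**: the `intro_cn` / `credibility_note` content from §3 for each of the 31 institutions.
3. **Align the seed list with the production list**: turn the §2.2 production source list into formal institution master data, replacing the 18 seed institutions.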
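---
## 7. The exit into the app (Layer 4)
The final landing is two-tier, **database plus object storage**, and the app consumes it through the read-only API:
- Metadata and structured module content → **MySQL** (13 tables).
- Raw PDFs, raw artifacts, audio, images, oversized modules → **object storage**; the DB stores reference keys.
- Cache → Redis (feed/detail caching, playback-progress debouncing, rate limiting).
**Public API** (prefix `/api/report-notebooklm/v1`): `/feed/recommended`, `/reports`, `/reports/{id}` (detail skeleton), `/reports/{id}/modules/{module_id}` (lazy-loaded full text for heavy modules), `/institutions`, `/institutions/{id}`, `/listen`, and the planned `/audio/{id}/stream` (short-lived signed URL).
**Detail page data model**: skeleton plus lazy module loading. Light modules return `content` inline; heavy modules return a `preview`, with the full text served from the secondary endpoint or `content_ref`, and the client validates its cache with `content_etag`. Public published content can read `content_ref` directly; restricted (gray) sources go through short-lived backend signed URLs.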
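**Internal production-chain API** (service token + network allowlist, never exposed to the app): `POST /internal/reports/{id}/raw-artifacts`, `/display-artifacts`, `/publish`, `/hide`, and others. A publish action updates the display status, refreshes `has_audio`, bumps `cache_version`, and clears the related cache keys.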
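---
## 8. Related documents
- Content pipeline details: `report-notebooklm-api/docs/CONTENT_PIPELINE.md`
- API and data model: `report-notebooklm-api/docs/API_AND_DATA.md`
- Operations and storage conventions: `report-notebooklm-api/docs/RUNBOOK.md`
- Decision records: `docs/DECISIONS.md`
- Product SSOT: the mall-docs report-notebooklm docs (data source list, build brief, data model contract, NotebookLM capability experiment report).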
-44
@@ -1,44 +0,0 @@
# Decision Record
This is a handoff snapshot, not the product SSOT.
Product SSOT: mall-docs report-notebooklm docs, snapshot date: 2026-06-03.
## Product Decisions
| Date | Decision | Impact |
|---|---|---|
| 2026-06-02 | Phase 1 scope is a Chinese global institutional report interpretation app, not a pure audio app. | Five main tabs remain 推荐 / 研报 / 机构 / 听单 / 我的. |
| 2026-06-02 | Phase 1 has no commercialization. | No ads, paid unlock, membership, task wall, or points. |
| 2026-06-02 | Phase 1 does not open comments, UGC, or user-generated report interpretation. | App should not show community or publishing entry points. |
| 2026-06-02 | Guest users can browse public content and fully listen to at least one item. | Login should not block first listening experience. |
| 2026-06-03 | Product display name is `研听`; technical identifiers stay `report-notebooklm` / `rnb`. | Code identifiers, database schema, Redis keys, object-storage paths, and API prefixes remain brand-neutral. |
| 2026-06-03 | Phase 1 has no report-interpretation download feature. | No top-level download icon, detail download button, profile download record, download API, or offline audio package. |
## API and Data Decisions
| Date | Decision | Impact |
|---|---|---|
| 2026-06-03 | Public responses expose only `cache_version`. | `display_version`, module `version`, and nested cache version objects are internal. |
| 2026-06-03 | Heavy modules use a skeleton plus lazy full-module flow. | Detail returns previews; full content uses `/reports/{report_id}/modules/{module_id}` or a content reference. |
| 2026-06-03 | FAQ, Study Guide, and Glossary are represented as `study_guide`. | Legacy `faq` should map to `study_guide`; no separate public `faq` type. |
| 2026-06-03 | Public published content may use direct content references; restricted sources need short-lived backend signed URLs. | Backend keeps module endpoint and should add signed URL behavior for restricted content. |
| 2026-06-03 | Gray broker sources may be full-text audio-ized, but need compliance/operations review before production. | Seed and production rules can allow audio, but release must remain reviewed. |
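Two of the decisions above (only `cache_version` is public; legacy `faq` maps to `study_guide`) can be sketched as a serialization step. This is an illustration under assumptions: the function names and the dictionary field names other than `cache_version`, `display_version`, and `version` are hypothetical, and the real backend may shape responses differently.

```python
# Fields the decision table marks as internal.
INTERNAL_REPORT_FIELDS = {"display_version"}
INTERNAL_MODULE_FIELDS = {"version"}
# Legacy module types mapped to their public type.
LEGACY_TYPE_MAP = {"faq": "study_guide"}


def to_public_module(module: dict) -> dict:
    """Copy a module, dropping internal fields and normalizing legacy types."""
    public = {k: v for k, v in module.items() if k not in INTERNAL_MODULE_FIELDS}
    public["type"] = LEGACY_TYPE_MAP.get(public["type"], public["type"])
    return public


def to_public_report(report: dict, modules: list[dict]) -> dict:
    """Build the public report payload; cache_version stays, display_version goes."""
    public = {k: v for k, v in report.items() if k not in INTERNAL_REPORT_FIELDS}
    public["modules"] = [to_public_module(m) for m in modules]
    return public
```

Both functions return new dictionaries, so the internal records are left untouched for later pipeline steps.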
## Content Pipeline Decisions
| Date | Decision | Impact |
|---|---|---|
| 2026-06-02 | NotebookLM is treated as a source-driven research engine. | Use native artifacts and targeted queries; do not invent unsupported copy. |
| 2026-06-02 | Raw artifacts stay internal. | App consumes reviewed display artifacts only. |
| 2026-06-02 | P0 text artifacts publish first; media and enrichment are async. | Audio, infographic, research discovery, and mind map must not block text publishability. |
| 2026-06-02 | Vision can be used as source/reference experience but not as a production runtime dependency. | Production data must not depend on local Vision runtime, local paths, or local account state. |
## Repository and Handoff Decisions
| Date | Decision | Impact |
|---|---|---|
| 2026-06-03 | Gitea target is a single repository. | `report-notebooklm-api/` and `report-notebooklm-app/` should be ordinary subdirectories in one repo. |
| 2026-06-03 | Public docs must be portable. | No local absolute paths or private machine setup in committed docs. |
| 2026-06-03 | Local-only agent and status material goes into ignored files. | Use `AGENTS.local.md` and `docs.jimme.local/`. |
| 2026-06-03 | Long-lived decisions are public; raw sessions are local. | Distill decisions into this file; keep session pointers in ignored local docs. |
-85
@@ -1,85 +0,0 @@
# Development History
This is a handoff snapshot, not the product SSOT.
Product SSOT: mall-docs report-notebooklm docs, snapshot date: 2026-06-03.
## 2026-06-02 - Product Scope Freeze
- Product scope was corrected away from the old "Wall Street listening" / pure-audio framing.
- Phase 1 was frozen around a Chinese research-report interpretation app.
- Main tabs were fixed as 推荐 / 研报 / 机构 / 听单 / 我的.
- Non-goals were made explicit: no commercialization, comments, UGC, trading advice, professional terminal, or local Vision runtime dependency.
- Vision was kept as reference/source experience, not production runtime.
## 2026-06-02 - Development Plan and Review
- Phase 1 technical baseline was selected:
- Flutter App.
- FastAPI backend.
- MySQL 8.
- Redis with `rnb:` namespace.
- Object storage for raw artifacts, heavy modules, audio, and images.
- Existing cloud/server deployment model.
- External launch dependencies were identified:
- SMS template/signature.
- WeChat Open Platform.
- Apple login if required.
- AI-generated-content labeling.
- Compliance review for source and media policies.
- The plan passed independent review with changes requested around launch blockers and implementation details.
## 2026-06-03 - Backend Scaffold
- FastAPI service created under `report-notebooklm-api/`.
- SQLAlchemy model layer created for Phase 1 tables.
- Alembic initial migration added.
- Seed importer added with institutions, reports, display artifacts, display modules, audio assets, users, favorites, and playback progress.
- Public read routes implemented:
- `/health`
- `/feed/recommended`
- `/reports`
- `/reports/{id}`
- `/reports/{id}/modules/{module_id}`
- `/institutions`
- `/institutions/{id}`
- `/listen`
- Tests added for seed counts, public API shape, hidden/review module boundaries, gray-source behavior, and listen list behavior.
## 2026-06-03 - App Scaffold
- Flutter app shell created under `report-notebooklm-app/`.
- Five tabs implemented.
- API client added with explicit `RNB_API_BASE`.
- Feature folders created for feed, reports, institutions, listen, profile, detail, and shared widgets.
- Detail module renderer registry added.
- Local placeholders added for blocked behaviors:
- login
- favorite
- outbound confirmation
- playback progress
- real audio stream
- Android platform scaffold added.
## 2026-06-03 - Handoff Preparation
- Backend and App documentation added.
- Public docs were distilled from product documents without copying the full product-doc tree.
- Local-only paths and raw session details were separated from public docs.
- Root README and public AGENTS instructions were introduced for the single-repo Gitea handoff.
## Current Verification Snapshot
Validated during handoff preparation:
- Backend editable install with dev dependencies.
- Backend migration.
- Backend seed import.
- Backend tests.
- Backend smoke checks for health, feed, and report detail.
- App analyze.
- App widget test.
- App web build.
- App debug APK build.
Build artifacts are transient and are not committed.
-45
@@ -1,45 +0,0 @@
# Project Overview
This is a handoff snapshot, not the product SSOT.
Product SSOT: mall-docs report-notebooklm docs, snapshot date: 2026-06-03.
## Purpose
`研听` is a Chinese app for understanding global institutional research reports. It converts difficult English research reports into reviewed Chinese reading and listening experiences.
The product is a research-report interpretation and annotation service. It does not provide investment advice.
## Technical Shape
| Layer | Technology | Path |
|---|---|---|
| App | Flutter | `report-notebooklm-app/` |
| API | FastAPI | `report-notebooklm-api/` |
| Database | MySQL 8 | configured by `RNB_DATABASE_URL` |
| Cache | Redis | configured by `RNB_REDIS_URL` |
| Storage | Object storage | planned for raw artifacts, modules, audio, images |
## Phase 1 Surfaces
- 推荐: latest and curated report interpretations.
- 研报: all published report interpretations with basic filters.
- 机构: institution list, institution detail, and recent reports.
- 听单: audio-backed reports.
- 我的: guest/login state and shallow personal-state entries.
## Key Engineering Principle
The app consumes reviewed display artifacts through the API. Raw NotebookLM artifacts are internal evidence and must not be exposed publicly.
NotebookLM-native content may be cleaned, mapped, reviewed, and assembled deterministically. It must not be silently replaced by local LLM rewriting.
## Repository Documentation
- `README.md`: human entry point.
- `AGENTS.md`: public agent instructions.
- `docs/DECISIONS.md`: durable decisions.
- `docs/DEVELOPMENT_HISTORY.md`: major change history.
- `docs/DATA_SOURCE_FLOW.md`: end-to-end data source flow, source list with publish frequency, and storage/ingestion path.
- `report-notebooklm-api/docs/`: backend, data, API, and content pipeline details.
- `report-notebooklm-app/docs/`: App runbook and API consumption notes.

[Image diffs: three image files, each 68 B both before and after.]

Some files were not shown because too many files have changed in this diff.