chore: prepare yanting monorepo handoff

2026-06-03 10:39:03 +09:00
commit fde51468c6
106 changed files with 8171 additions and 0 deletions
@@ -0,0 +1,3 @@
RNB_DATABASE_URL=mysql+asyncmy://<db-user>:<db-pass>@<db-host>:<db-port>/report_notebooklm
RNB_REDIS_URL=redis://<host>:<port>/0
RNB_REDIS_KEY_PREFIX=rnb:
@@ -0,0 +1,11 @@
.venv/
__pycache__/
.pytest_cache/
.mypy_cache/
*.egg-info/
*.pyc
*.db
.env
.DS_Store
build/
*.apk
@@ -0,0 +1,56 @@
# report-notebooklm-api Notes
This file keeps short engineering notes for this repository. The durable handoff is in `docs/`.
## 2026-06-03 Phase 1 Scaffold
- Started the Phase 1 backend scaffold from `phase1-build-brief.md`.
- Technical identifiers use `report-notebooklm` / `rnb`; the user-facing product name is `研听` (Yanting).
- Implemented FastAPI config, database, cache helper, routers, SQLAlchemy models, Alembic migration, seed importer, and public read API.
- Public API prefix is `/api/report-notebooklm/v1`.
- Implemented public routes:
  - `/health`
  - `/feed/recommended`
  - `/reports`
  - `/reports/{id}`
  - `/reports/{id}/modules/{module_id}`
  - `/institutions`
  - `/institutions/{id}`
  - `/listen`
- Seed importer covers institutions, reports, display artifacts, display modules, audio assets, users, favorites, and playback progress.
- Heavy modules store their preview and full content in a JSON envelope. For `card_plus_page` modules, the public detail response exposes only the preview, and the module endpoint returns the full content.
- Review-only modules do not appear in public responses.
- Public responses expose `cache_version`; `display_version` and module `version` remain internal.
## Verification Snapshot
- Backend tests: `pytest -q` passed.
- Local API smoke checks passed for `/health`, `/feed/recommended`, and `/reports/rep_ssga_gold`.
- Companion App analyze/test/build checks passed when using a Flutter SDK compatible with Dart 3.12.1.
- Android debug validation was completed during local handoff. Build artifacts and screenshots are transient and should not be committed.
## Resolved Product Decisions
- Public responses expose only `cache_version`.
- Heavy module access keeps both `content_ref` and `GET /reports/{id}/modules/{module_id}` available.
- Public published content may use direct content references; restricted sources should use short-lived signed URLs issued by the backend.
- FAQ, Study Guide, and Glossary are represented as a single `study_guide` module type.
- `faq` stays deprecated; legacy seed `faq` should map to `study_guide`.
- Gray-source full-text audio is allowed by product decision but still needs operations/compliance review before production release.
- App prototype feedback decisions from 2026-06-03 are recorded in mall-docs `docs/2026-06-03-app-prototype-feedback-decisions.md`.
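- Seed/display module order is: 报告概览 (Report Overview) / 报告摘要 (Report Summary) / 听研报 (Listen to the Report) / 报告要点 (Key Points) / 报告中的关键数据 (Key Data in the Report) / 观点差异 (Differing Views) / 局限与疑问 (Limitations and Open Questions) / 时间线 (Timeline) / 术语与问答 (Glossary and Q&A) / 结构梳理 (Structure Outline) / 延伸阅读 (Further Reading) / 报告来源 (Report Source).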
- Do not seed a separate `institution` display module for public Detail. Publisher information belongs in the source/compliance surface, rendered as `报告来源` (Report Source).
- The real BIS sample should be the top report, but public UI copy must not expose internal labels such as NotebookLM sample, query artifact, or artifact mapping.
- `basic_info` and `executive_overview` must not repeat the same text: overview is factual scope/metadata; summary is a few-sentence report-level description.
- All public modules returned for Detail should expose `has_detail_page=True`; tests assert this to prevent accidental regression.
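
A sketch of how a seed importer could apply two of the module-type rules above: legacy `faq` maps to `study_guide`, and `institution` modules are skipped so publisher information stays in `报告来源`. `LEGACY_TYPE_MAP`, `SKIPPED_TYPES` and `normalize_seed_modules` are hypothetical names. The seed record shape (a dict with `type` and `sort_order` keys) is also an assumption.

```python
from typing import Any

# Legacy seed types renamed on import (rule: `faq` stays deprecated -> `study_guide`).
LEGACY_TYPE_MAP = {"faq": "study_guide"}
# Types never seeded as public Detail modules (publisher info lives in the source module).
SKIPPED_TYPES = {"institution"}


def normalize_seed_modules(modules: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Rename legacy types, drop skipped types, and return modules in sort order."""
    result = []
    for module in modules:
        module_type = LEGACY_TYPE_MAP.get(module["type"], module["type"])
        if module_type in SKIPPED_TYPES:
            continue
        result.append({**module, "type": module_type})
    return sorted(result, key=lambda m: m["sort_order"])


seed = [
    {"type": "faq", "sort_order": 9},
    {"type": "institution", "sort_order": 1},
    {"type": "basic_info", "sort_order": 0},
]
print([m["type"] for m in normalize_seed_modules(seed)])  # ['basic_info', 'study_guide']
```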
## Remaining Backend Gaps
- Auth routes.
- Personal-state routes.
- Audio stream signed URL route.
- Outbound events route.
- Internal management routes.
- Production object storage integration.
- Production cache invalidation and pagination.
- Deployment environment configuration.
@@ -0,0 +1,62 @@
# report-notebooklm-api
FastAPI service for the report-notebooklm Phase 1 public read surface.
This directory is the main engineering handoff entry for API, data model, seed import, and the NotebookLM-backed content pipeline. The companion Flutter app lives in `../report-notebooklm-app/` in the same monorepo.
## Read First
- [docs/HANDOFF.md](docs/HANDOFF.md): current progress, solved issues, open issues, and handoff order.
- [docs/PROJECT_BRIEF.md](docs/PROJECT_BRIEF.md): product and Phase 1 scope snapshot.
- [docs/API_AND_DATA.md](docs/API_AND_DATA.md): data tables, endpoints, implemented vs planned API.
- [docs/CONTENT_PIPELINE.md](docs/CONTENT_PIPELINE.md): report source and NotebookLM artifact flow.
- [docs/RUNBOOK.md](docs/RUNBOOK.md): local setup, seed import, smoke checks, and deployment checks.
- [docs/ROADMAP_AND_OPEN_ISSUES.md](docs/ROADMAP_AND_OPEN_ISSUES.md): next engineering work.
- [docs/SOURCE_INDEX.md](docs/SOURCE_INDEX.md): source document names used for this handoff snapshot.
## Product Boundary
This repo contains code and an engineering handoff snapshot. It is not the product source of truth.
Product SSOT: mall-docs report-notebooklm docs. Snapshot date: 2026-06-03.
Use `report-notebooklm` and `rnb` for technical identifiers. The user-facing product name is `研听` (Yanting).
## Local Quick Start
Create a `.env` file that points at the MySQL and Redis instances available in your environment:
```bash
RNB_DATABASE_URL=mysql+asyncmy://<db-user>:<db-pass>@<db-host>:<db-port>/report_notebooklm
RNB_REDIS_URL=redis://<host>:<port>/0
RNB_REDIS_KEY_PREFIX=rnb:
```
Then run:
```bash
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
alembic upgrade head
python scripts/import_seed_content.py
uvicorn app.main:app --reload --host <bind-host> --port <port>
```
API prefix: `/api/report-notebooklm/v1`
## Verify
```bash
source .venv/bin/activate
pytest -q
```
Recommended smoke checks after the service starts:
```bash
API_BASE_URL=http://<api-host>:<port>/api/report-notebooklm/v1
curl "$API_BASE_URL/health"
curl "$API_BASE_URL/feed/recommended"
curl "$API_BASE_URL/reports/rep_ssga_gold"
```
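
The curl checks above only confirm that the routes respond. A small stdlib script can also enforce one rule from the engineering notes: public payloads expose `cache_version` but never `display_version`. This is a hypothetical sketch. `find_internal_keys`, `smoke` and the script name are illustrative, and the base URL is whatever you set as `API_BASE_URL`.

```python
import json
import sys
from typing import Any
from urllib.request import urlopen

INTERNAL_KEYS = {"display_version"}  # internal-only fields that must not leak publicly


def find_internal_keys(payload: Any, path: str = "$") -> list[str]:
    """Return JSON paths of any internal-only keys found anywhere in the payload."""
    hits: list[str] = []
    if isinstance(payload, dict):
        for key, value in payload.items():
            child = f"{path}.{key}"
            if key in INTERNAL_KEYS:
                hits.append(child)
            hits.extend(find_internal_keys(value, child))
    elif isinstance(payload, list):
        for index, item in enumerate(payload):
            hits.extend(find_internal_keys(item, f"{path}[{index}]"))
    return hits


def smoke(base_url: str) -> int:
    """Fetch each route (urlopen raises on non-2xx) and count responses that leak keys."""
    failures = 0
    for route in ("/health", "/feed/recommended", "/reports/rep_ssga_gold"):
        with urlopen(f"{base_url}{route}", timeout=10) as resp:
            payload = json.load(resp)
        leaks = find_internal_keys(payload)
        print(f"{route}: {'ok' if not leaks else f'leaks {leaks}'}")
        failures += bool(leaks)
    return failures


if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(1 if smoke(sys.argv[1]) else 0)
```

Usage, assuming the script is saved as `smoke_check.py`: `python smoke_check.py "$API_BASE_URL"`.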
@@ -0,0 +1,39 @@
[alembic]
script_location = migrations
prepend_sys_path = .
path_separator = os
# Runtime value is injected from RNB_DATABASE_URL in migrations/env.py.
sqlalchemy.url = sqlite+aiosqlite:///unused_alembic_config.db
[loggers]
keys = root,sqlalchemy,alembic
[handlers]
keys = console
[formatters]
keys = generic
[logger_root]
level = WARNING
handlers = console
[logger_sqlalchemy]
level = WARNING
handlers =
qualname = sqlalchemy.engine
[logger_alembic]
level = INFO
handlers =
qualname = alembic
[handler_console]
class = StreamHandler
args = (sys.stderr,)
level = NOTSET
formatter = generic
[formatter_generic]
format = %(levelname)-5.5s [%(name)s] %(message)s
datefmt = %H:%M:%S
@@ -0,0 +1 @@
@@ -0,0 +1,15 @@
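from redis.asyncio import Redis

from app.config import get_settings

settings = get_settings()


def prefixed_key(key: str) -> str:
    # Namespace every key with RNB_REDIS_KEY_PREFIX (default "rnb:").
    return f"{settings.redis_key_prefix}{key}"


def get_redis() -> Redis:
    # decode_responses=True makes reads return str instead of bytes.
    return Redis.from_url(settings.redis_url, decode_responses=True)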
@@ -0,0 +1,18 @@
from functools import lru_cache

from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    app_name: str = "report-notebooklm-api"
    api_prefix: str = "/api/report-notebooklm/v1"
    database_url: str
    redis_url: str
    redis_key_prefix: str = "rnb:"

    # Each field is read from an RNB_-prefixed environment variable or from .env.
    model_config = SettingsConfigDict(env_prefix="RNB_", env_file=".env", extra="ignore")


@lru_cache
def get_settings() -> Settings:
    return Settings()
@@ -0,0 +1,21 @@
from collections.abc import AsyncGenerator

from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker, create_async_engine
from sqlalchemy.orm import DeclarativeBase

from app.config import get_settings


class Base(DeclarativeBase):
    pass


settings = get_settings()
engine = create_async_engine(settings.database_url, pool_pre_ping=True)
SessionLocal = async_sessionmaker(engine, expire_on_commit=False, class_=AsyncSession)


async def get_session() -> AsyncGenerator[AsyncSession, None]:
    # FastAPI dependency: one session per request, closed when the request finishes.
    async with SessionLocal() as session:
        yield session
@@ -0,0 +1,22 @@
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

from app.config import get_settings
from app.routers import health, institutions, listen, reports

settings = get_settings()
app = FastAPI(title=settings.app_name)
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=False,
    allow_methods=["GET", "POST", "PATCH", "DELETE"],
    allow_headers=["*"],
)

app.include_router(health.router, prefix=settings.api_prefix)
app.include_router(reports.router, prefix=settings.api_prefix)
app.include_router(institutions.router, prefix=settings.api_prefix)
app.include_router(listen.router, prefix=settings.api_prefix)
@@ -0,0 +1,32 @@
from app.models.entities import (
    AudioAsset,
    DisplayArtifact,
    DisplayModule,
    Favorite,
    Institution,
    OutboundEvent,
    PlaybackProgress,
    RawArtifact,
    ReadingHistory,
    RelatedNews,
    Report,
    SavedListen,
    User,
)

__all__ = [
    "AudioAsset",
    "DisplayArtifact",
    "DisplayModule",
    "Favorite",
    "Institution",
    "OutboundEvent",
    "PlaybackProgress",
    "RawArtifact",
    "ReadingHistory",
    "RelatedNews",
    "Report",
    "SavedListen",
    "User",
]
@@ -0,0 +1,302 @@
from __future__ import annotations

import datetime as dt

from sqlalchemy import BigInteger, Boolean, DateTime, ForeignKey, Index, Integer, String, Text, UniqueConstraint
from sqlalchemy.dialects.mysql import MEDIUMTEXT
from sqlalchemy.orm import Mapped, mapped_column, relationship

from app.db import Base


def utcnow() -> dt.datetime:
    # Naive UTC timestamp: the DateTime columns below store no timezone info.
    return dt.datetime.now(dt.UTC).replace(tzinfo=None)


# TEXT on most backends, MEDIUMTEXT on MySQL for large module payloads.
MediumText = Text().with_variant(MEDIUMTEXT, "mysql")


class Institution(Base):
    __tablename__ = "institutions"
    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    institution_id: Mapped[str] = mapped_column(String(64), unique=True, nullable=False)
    name_cn: Mapped[str] = mapped_column(String(255), nullable=False)
    name_en: Mapped[str | None] = mapped_column(String(255))
    institution_type: Mapped[str] = mapped_column(String(32), nullable=False)
    source_tier: Mapped[str] = mapped_column(String(16), nullable=False)
    website_url: Mapped[str | None] = mapped_column(String(512))
    covered_topics: Mapped[str | None] = mapped_column(Text)
    intro_cn: Mapped[str | None] = mapped_column(Text)
    credibility_note: Mapped[str | None] = mapped_column(Text)
    report_count: Mapped[int] = mapped_column(Integer, nullable=False, default=0)
    latest_report_id: Mapped[str | None] = mapped_column(String(64))
    latest_report_at: Mapped[dt.datetime | None] = mapped_column(DateTime)
    status: Mapped[str] = mapped_column(String(16), nullable=False, default="active")
    created_at: Mapped[dt.datetime] = mapped_column(DateTime, nullable=False, default=utcnow)
    updated_at: Mapped[dt.datetime] = mapped_column(DateTime, nullable=False, default=utcnow, onupdate=utcnow)
    reports: Mapped[list[Report]] = relationship(back_populates="institution")
    __table_args__ = (Index("ix_institutions_status_latest", "status", "latest_report_at"),)


class Report(Base):
    __tablename__ = "reports"
    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    report_id: Mapped[str] = mapped_column(String(64), unique=True, nullable=False)
    report_type: Mapped[str] = mapped_column(String(16), nullable=False, default="single")
    title_cn: Mapped[str] = mapped_column(String(512), nullable=False)
    subtitle_cn: Mapped[str | None] = mapped_column(String(512))
    original_title: Mapped[str | None] = mapped_column(String(512))
    one_liner: Mapped[str | None] = mapped_column(String(512))
    institution_id: Mapped[str] = mapped_column(String(64), ForeignKey("institutions.institution_id"), nullable=False)
    co_institution_ids: Mapped[str | None] = mapped_column(Text)
    source_tier: Mapped[str] = mapped_column(String(32), nullable=False)
    source_url: Mapped[str | None] = mapped_column(String(512))
    source_note: Mapped[str] = mapped_column(Text, nullable=False)
    published_at: Mapped[dt.datetime | None] = mapped_column(DateTime)
    interpreted_at: Mapped[dt.datetime | None] = mapped_column(DateTime)
    released_at: Mapped[dt.datetime | None] = mapped_column(DateTime)
    topics: Mapped[str | None] = mapped_column(Text)
    language: Mapped[str] = mapped_column(String(8), nullable=False, default="en")
    has_audio: Mapped[bool] = mapped_column(Boolean, nullable=False, default=False)
    display_status: Mapped[str] = mapped_column(String(16), nullable=False, default="draft")
    display_version: Mapped[int] = mapped_column(Integer, nullable=False, default=0)
    cache_version: Mapped[str] = mapped_column(String(128), nullable=False)
    risk_disclaimer: Mapped[str | None] = mapped_column(Text)
    interpretation_label: Mapped[str | None] = mapped_column(String(64), default="研报解读")
    created_at: Mapped[dt.datetime] = mapped_column(DateTime, nullable=False, default=utcnow)
    updated_at: Mapped[dt.datetime] = mapped_column(DateTime, nullable=False, default=utcnow, onupdate=utcnow)
    institution: Mapped[Institution] = relationship(back_populates="reports")
    __table_args__ = (
        Index("ix_reports_status_released", "display_status", "released_at"),
        Index("ix_reports_institution_released", "institution_id", "released_at"),
        Index("ix_reports_audio_released", "has_audio", "released_at"),
    )


class RawArtifact(Base):
    __tablename__ = "raw_artifacts"
    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    raw_artifact_id: Mapped[str] = mapped_column(String(64), unique=True, nullable=False)
    report_id: Mapped[str] = mapped_column(String(64), ForeignKey("reports.report_id"), nullable=False)
    provider: Mapped[str] = mapped_column(String(32), nullable=False, default="notebooklm")
    artifact_type: Mapped[str] = mapped_column(String(64), nullable=False)
    conversation_id: Mapped[str | None] = mapped_column(String(128))
    source_id: Mapped[str | None] = mapped_column(String(128))
    notebook_id: Mapped[str | None] = mapped_column(String(128))
    source_language: Mapped[str | None] = mapped_column(String(8))
    payload_format: Mapped[str] = mapped_column(String(16), nullable=False)
    payload_ref: Mapped[str | None] = mapped_column(String(512))
    sha256: Mapped[str | None] = mapped_column(String(128))
    status: Mapped[str] = mapped_column(String(16), nullable=False, default="pending")
    error: Mapped[str | None] = mapped_column(Text)
    size_bytes: Mapped[int | None] = mapped_column(BigInteger)
    generated_at: Mapped[dt.datetime | None] = mapped_column(DateTime)
    ingested_at: Mapped[dt.datetime | None] = mapped_column(DateTime)
    is_publish_blocking: Mapped[bool] = mapped_column(Boolean, nullable=False, default=False)
    requires_human_review: Mapped[bool] = mapped_column(Boolean, nullable=False, default=False)
    quality_flags: Mapped[str | None] = mapped_column(Text)
    retention_status: Mapped[str] = mapped_column(String(32), nullable=False, default="retained")
    created_at: Mapped[dt.datetime] = mapped_column(DateTime, nullable=False, default=utcnow)
    __table_args__ = (
        Index("ix_raw_report_type", "report_id", "artifact_type"),
        Index("ix_raw_report_status", "report_id", "status"),
        Index("ix_raw_retention", "retention_status"),
    )


class DisplayArtifact(Base):
    __tablename__ = "display_artifacts"
    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    display_artifact_id: Mapped[str] = mapped_column(String(64), unique=True, nullable=False)
    report_id: Mapped[str] = mapped_column(String(64), ForeignKey("reports.report_id"), nullable=False)
    display_version: Mapped[int] = mapped_column(Integer, nullable=False, default=1)
    title_cn: Mapped[str] = mapped_column(String(512), nullable=False)
    summary_cn: Mapped[str | None] = mapped_column(Text)
    source_label: Mapped[str | None] = mapped_column(String(255))
    interpretation_label: Mapped[str | None] = mapped_column(String(64), default="研报解读")
    ai_generated_label: Mapped[str | None] = mapped_column(String(128))
    synthesis_type: Mapped[str | None] = mapped_column(String(16))
    source_disclosure_text: Mapped[str | None] = mapped_column(Text)
    review_status: Mapped[str] = mapped_column(String(16), nullable=False, default="review")
    reviewed_by: Mapped[str | None] = mapped_column(String(128))
    reviewed_at: Mapped[dt.datetime | None] = mapped_column(DateTime)
    published_at: Mapped[dt.datetime | None] = mapped_column(DateTime)
    created_at: Mapped[dt.datetime] = mapped_column(DateTime, nullable=False, default=utcnow)
    updated_at: Mapped[dt.datetime] = mapped_column(DateTime, nullable=False, default=utcnow, onupdate=utcnow)
    __table_args__ = (Index("ix_display_artifacts_report_status_version", "report_id", "review_status", "display_version"),)


class DisplayModule(Base):
    __tablename__ = "display_modules"
    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    module_id: Mapped[str] = mapped_column(String(64), unique=True, nullable=False)
    report_id: Mapped[str] = mapped_column(String(64), ForeignKey("reports.report_id"), nullable=False)
    display_artifact_id: Mapped[str] = mapped_column(String(64), ForeignKey("display_artifacts.display_artifact_id"), nullable=False)
    type: Mapped[str] = mapped_column(String(32), nullable=False)
    title_cn: Mapped[str | None] = mapped_column(String(255))
    content_format: Mapped[str] = mapped_column(String(16), nullable=False)
    content: Mapped[str | None] = mapped_column(MediumText)
    content_ref: Mapped[str | None] = mapped_column(String(512))
    content_etag: Mapped[str | None] = mapped_column(String(64))
    source_raw_artifact_ids: Mapped[str | None] = mapped_column(Text)
    status: Mapped[str] = mapped_column(String(16), nullable=False, default="missing")
    sort_order: Mapped[int] = mapped_column(Integer, nullable=False, default=0)
    version: Mapped[int] = mapped_column(Integer, nullable=False, default=1)
    updated_at: Mapped[dt.datetime] = mapped_column(DateTime, nullable=False, default=utcnow, onupdate=utcnow)
    __table_args__ = (
        Index("ix_display_modules_report_status_sort", "report_id", "status", "sort_order"),
        Index("ix_display_modules_artifact_status", "display_artifact_id", "status"),
    )


class AudioAsset(Base):
    __tablename__ = "audio_assets"
    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    audio_id: Mapped[str] = mapped_column(String(64), unique=True, nullable=False)
    report_id: Mapped[str] = mapped_column(String(64), ForeignKey("reports.report_id"), nullable=False)
    source_raw_artifact_id: Mapped[str | None] = mapped_column(String(64), ForeignKey("raw_artifacts.raw_artifact_id"))
    title_cn: Mapped[str] = mapped_column(String(512), nullable=False)
    duration_sec: Mapped[int | None] = mapped_column(Integer)
    oss_key: Mapped[str | None] = mapped_column(String(512))
    waveform_ref: Mapped[str | None] = mapped_column(String(512))
    chapters: Mapped[str | None] = mapped_column(Text)
    status: Mapped[str] = mapped_column(String(16), nullable=False, default="missing")
    published_at: Mapped[dt.datetime | None] = mapped_column(DateTime)
    created_at: Mapped[dt.datetime] = mapped_column(DateTime, nullable=False, default=utcnow)
    updated_at: Mapped[dt.datetime] = mapped_column(DateTime, nullable=False, default=utcnow, onupdate=utcnow)
    __table_args__ = (Index("ix_audio_report_status", "report_id", "status"),)


class RelatedNews(Base):
    __tablename__ = "related_news"
    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    related_news_id: Mapped[str] = mapped_column(String(64), unique=True, nullable=False)
    report_id: Mapped[str] = mapped_column(String(64), ForeignKey("reports.report_id"), nullable=False)
    title: Mapped[str] = mapped_column(String(512), nullable=False)
    source_name: Mapped[str | None] = mapped_column(String(255))
    source_url: Mapped[str | None] = mapped_column(String(512))
    published_at: Mapped[dt.datetime | None] = mapped_column(DateTime)
    language: Mapped[str | None] = mapped_column(String(8))
    summary_cn: Mapped[str | None] = mapped_column(Text)
    match_method: Mapped[str] = mapped_column(String(32), nullable=False, default="manual_curated")
    match_keywords: Mapped[str | None] = mapped_column(Text)
    match_confidence: Mapped[str | None] = mapped_column(String(8))
    status: Mapped[str] = mapped_column(String(16), nullable=False, default="candidate")
    created_at: Mapped[dt.datetime] = mapped_column(DateTime, nullable=False, default=utcnow)
    __table_args__ = (Index("ix_related_news_report_status", "report_id", "status"),)


class User(Base):
    __tablename__ = "users"
    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    user_id: Mapped[str] = mapped_column(String(64), unique=True, nullable=False)
    phone_hash: Mapped[str | None] = mapped_column(String(128), unique=True)
    wechat_openid: Mapped[str | None] = mapped_column(String(128), unique=True)
    apple_user_id: Mapped[str | None] = mapped_column(String(256), unique=True)
    display_name: Mapped[str | None] = mapped_column(String(128))
    avatar_url: Mapped[str | None] = mapped_column(String(512))
    created_at: Mapped[dt.datetime] = mapped_column(DateTime, nullable=False, default=utcnow)
    last_login_at: Mapped[dt.datetime | None] = mapped_column(DateTime)
    status: Mapped[str] = mapped_column(String(16), nullable=False, default="active")


class Favorite(Base):
    __tablename__ = "favorites"
    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    favorite_id: Mapped[str] = mapped_column(String(64), unique=True, nullable=False)
    user_id: Mapped[str] = mapped_column(String(64), ForeignKey("users.user_id"), nullable=False)
    report_id: Mapped[str] = mapped_column(String(64), ForeignKey("reports.report_id"), nullable=False)
    created_at: Mapped[dt.datetime] = mapped_column(DateTime, nullable=False, default=utcnow)
    status: Mapped[str] = mapped_column(String(16), nullable=False, default="active")
    __table_args__ = (UniqueConstraint("user_id", "report_id", name="uq_favorites_user_report"), Index("ix_favorites_user_report", "user_id", "report_id"))


class ReadingHistory(Base):
    __tablename__ = "reading_history"
    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    history_id: Mapped[str] = mapped_column(String(64), unique=True, nullable=False)
    user_id: Mapped[str] = mapped_column(String(64), ForeignKey("users.user_id"), nullable=False)
    report_id: Mapped[str] = mapped_column(String(64), ForeignKey("reports.report_id"), nullable=False)
    event_type: Mapped[str] = mapped_column(String(32), nullable=False, default="view_detail")
    last_position: Mapped[str | None] = mapped_column(Text)
    last_seen_at: Mapped[dt.datetime] = mapped_column(DateTime, nullable=False, default=utcnow)
    __table_args__ = (Index("ix_reading_history_user_seen", "user_id", "last_seen_at"), Index("ix_reading_history_user_report", "user_id", "report_id"))


class SavedListen(Base):
    __tablename__ = "saved_listens"
    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    saved_listen_id: Mapped[str] = mapped_column(String(64), unique=True, nullable=False)
    user_id: Mapped[str] = mapped_column(String(64), ForeignKey("users.user_id"), nullable=False)
    report_id: Mapped[str] = mapped_column(String(64), ForeignKey("reports.report_id"), nullable=False)
    audio_id: Mapped[str] = mapped_column(String(64), ForeignKey("audio_assets.audio_id"), nullable=False)
    created_at: Mapped[dt.datetime] = mapped_column(DateTime, nullable=False, default=utcnow)
    status: Mapped[str] = mapped_column(String(16), nullable=False, default="active")
    __table_args__ = (UniqueConstraint("user_id", "audio_id", name="uq_saved_listens_user_audio"),)


class PlaybackProgress(Base):
    __tablename__ = "playback_progress"
    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    progress_id: Mapped[str] = mapped_column(String(64), unique=True, nullable=False)
    user_id: Mapped[str] = mapped_column(String(64), ForeignKey("users.user_id"), nullable=False)
    audio_id: Mapped[str] = mapped_column(String(64), ForeignKey("audio_assets.audio_id"), nullable=False)
    report_id: Mapped[str] = mapped_column(String(64), ForeignKey("reports.report_id"), nullable=False)
    position_sec: Mapped[int] = mapped_column(Integer, nullable=False, default=0)
    duration_sec: Mapped[int | None] = mapped_column(Integer)
    completed: Mapped[bool] = mapped_column(Boolean, nullable=False, default=False)
    updated_at: Mapped[dt.datetime] = mapped_column(DateTime, nullable=False, default=utcnow, onupdate=utcnow)
    __table_args__ = (UniqueConstraint("user_id", "audio_id", name="uq_playback_user_audio"), Index("ix_playback_user_audio", "user_id", "audio_id"))


class OutboundEvent(Base):
    __tablename__ = "outbound_events"
    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    outbound_id: Mapped[str] = mapped_column(String(64), unique=True, nullable=False)
    click_id: Mapped[str] = mapped_column(String(64), unique=True, nullable=False)
    tracking_id: Mapped[str] = mapped_column(String(64), nullable=False)
    user_id: Mapped[str | None] = mapped_column(String(64), ForeignKey("users.user_id"))
    device_id: Mapped[str | None] = mapped_column(String(128))
    report_id: Mapped[str | None] = mapped_column(String(64), ForeignKey("reports.report_id"))
    institution_id: Mapped[str | None] = mapped_column(String(64))
    scene: Mapped[str | None] = mapped_column(String(64))
    ref: Mapped[str | None] = mapped_column(String(128))
    target: Mapped[str | None] = mapped_column(String(32))
    source_page: Mapped[str | None] = mapped_column(String(32))
    placement: Mapped[str | None] = mapped_column(String(64))
    campaign_id: Mapped[str | None] = mapped_column(String(64))
    target_app: Mapped[str | None] = mapped_column(String(64))
    commodity_tag: Mapped[str | None] = mapped_column(String(64))
    hook_type: Mapped[str | None] = mapped_column(String(64))
    user_state: Mapped[str | None] = mapped_column(String(16))
    ts: Mapped[int | None] = mapped_column(BigInteger)
    created_at: Mapped[dt.datetime] = mapped_column(DateTime, nullable=False, default=utcnow)
    __table_args__ = (Index("ix_outbound_tracking", "tracking_id"), Index("ix_outbound_report_created", "report_id", "created_at"))
@@ -0,0 +1 @@
@@ -0,0 +1 @@
@@ -0,0 +1,9 @@
from fastapi import APIRouter

router = APIRouter()


@router.get("/health")
async def health() -> dict[str, str]:
    return {"status": "ok"}
@@ -0,0 +1,23 @@
from fastapi import APIRouter, Depends, Query
from sqlalchemy.ext.asyncio import AsyncSession

from app.db import get_session
from app.services.catalog import CatalogService

router = APIRouter()


@router.get("/institutions")
async def institutions(
    topic: str | None = None,
    source_tier: str | None = None,
    page_size: int = Query(20, ge=1, le=50),
    session: AsyncSession = Depends(get_session),
) -> dict:
    return await CatalogService(session).institutions(topic=topic, source_tier=source_tier, page_size=page_size)


@router.get("/institutions/{institution_id}")
async def institution_detail(institution_id: str, session: AsyncSession = Depends(get_session)) -> dict:
    return await CatalogService(session).institution_detail(institution_id)
@@ -0,0 +1,13 @@
from fastapi import APIRouter, Depends, Query
from sqlalchemy.ext.asyncio import AsyncSession

from app.db import get_session
from app.services.catalog import CatalogService

router = APIRouter()


@router.get("/listen")
async def listen(page_size: int = Query(20, ge=1, le=50), session: AsyncSession = Depends(get_session)) -> dict:
    return await CatalogService(session).listen_items(page_size=page_size)
@@ -0,0 +1,47 @@
from fastapi import APIRouter, Depends, Query
from sqlalchemy.ext.asyncio import AsyncSession

from app.db import get_session
from app.services.catalog import CatalogService

router = APIRouter()


@router.get("/feed/recommended")
async def recommended_feed(
    topic: str | None = None,
    page_size: int = Query(20, ge=1, le=50),
    session: AsyncSession = Depends(get_session),
) -> dict:
    return await CatalogService(session).report_cards(topic=topic, page_size=page_size)


@router.get("/reports")
async def reports(
    topic: str | None = None,
    institution_id: str | None = None,
    has_audio: bool | None = None,
    source_tier: str | None = None,
    q: str | None = None,
    page_size: int = Query(20, ge=1, le=50),
    session: AsyncSession = Depends(get_session),
) -> dict:
    return await CatalogService(session).report_cards(
        topic=topic,
        institution_id=institution_id,
        has_audio=has_audio,
        source_tier=source_tier,
        q=q,
        page_size=page_size,
    )


@router.get("/reports/{report_id}")
async def report_detail(report_id: str, session: AsyncSession = Depends(get_session)) -> dict:
    return await CatalogService(session).report_detail(report_id)


@router.get("/reports/{report_id}/modules/{module_id}")
async def module_detail(report_id: str, module_id: str, session: AsyncSession = Depends(get_session)) -> dict:
    return await CatalogService(session).module_detail(report_id, module_id)
@@ -0,0 +1 @@
@@ -0,0 +1 @@
@@ -0,0 +1,272 @@
from __future__ import annotations

import json
from typing import Any

from fastapi import HTTPException
from sqlalchemy import Select, func, select
from sqlalchemy.ext.asyncio import AsyncSession

from app.models import AudioAsset, DisplayModule, Institution, RelatedNews, Report

MODULE_META: dict[str, dict[str, Any]] = {
    "basic_info": {"layer": "p0", "render_mode": "inline", "has_detail_page": True, "is_publish_blocking": True, "requires_human_review": False},
    "executive_overview": {"layer": "p0", "render_mode": "card_plus_page", "has_detail_page": True, "is_publish_blocking": True, "requires_human_review": False},
    "core_insights": {"layer": "p0", "render_mode": "inline", "has_detail_page": True, "is_publish_blocking": True, "requires_human_review": False},
    "key_data": {"layer": "p0", "render_mode": "card_plus_page", "has_detail_page": True, "is_publish_blocking": True, "requires_human_review": False},
    "source_compliance": {"layer": "p0", "render_mode": "inline", "has_detail_page": True, "is_publish_blocking": True, "requires_human_review": False},
    "differentiated_view": {"layer": "p1", "render_mode": "card_plus_page", "has_detail_page": True, "is_publish_blocking": False, "requires_human_review": False},
    "weaknesses": {"layer": "p1", "render_mode": "card_plus_page", "has_detail_page": True, "is_publish_blocking": False, "requires_human_review": False},
    "timeline": {"layer": "p1", "render_mode": "card_plus_page", "has_detail_page": True, "is_publish_blocking": False, "requires_human_review": False},
    "study_guide": {"layer": "p1", "render_mode": "card_plus_page", "has_detail_page": True, "is_publish_blocking": False, "requires_human_review": False},
    "related_sources": {"layer": "p1", "render_mode": "inline", "has_detail_page": True, "is_publish_blocking": False, "requires_human_review": True},
    "structure_graph": {"layer": "p1", "render_mode": "card_plus_page", "has_detail_page": True, "is_publish_blocking": False, "requires_human_review": False},
    "infographic": {"layer": "p2", "render_mode": "card_plus_page", "has_detail_page": True, "is_publish_blocking": False, "requires_human_review": True},
    "audio": {"layer": "p2", "render_mode": "inline", "has_detail_page": True, "is_publish_blocking": False, "requires_human_review": False},
    "research_discovery": {"layer": "p2", "render_mode": "card_plus_page", "has_detail_page": True, "is_publish_blocking": False, "requires_human_review": True},
    "institution": {"layer": "p0", "render_mode": "inline", "has_detail_page": False, "is_publish_blocking": False, "requires_human_review": False},
}


def loads_json(value: str | None, default: Any) -> Any:
    if not value:
        return default
    return json.loads(value)


def iso(value: Any) -> str | None:
    return value.isoformat() if value else None


def institution_public(inst: Institution, *, detail: bool = False) -> dict[str, Any]:
    data = {
        "institution_id": inst.institution_id,
        "name_cn": inst.name_cn,
        "name_en": inst.name_en,
        "institution_type": inst.institution_type,
        "source_tier": inst.source_tier,
        "website_url": inst.website_url,
        "covered_topics": loads_json(inst.covered_topics, []),
        "report_count": inst.report_count,
        "latest_report_at": iso(inst.latest_report_at),
        "credibility_note": inst.credibility_note,
    }
    if detail:
        data["intro_cn"] = inst.intro_cn
    return data


def institution_card(inst: Institution) -> dict[str, Any]:
    return {
        "institution_id": inst.institution_id,
        "name_cn": inst.name_cn,
        "name_en": inst.name_en,
        "source_tier": inst.source_tier,
    }


def report_card(report: Report, inst: Institution) -> dict[str, Any]:
    return {
        "report_id": report.report_id,
        "title_cn": report.title_cn,
        "subtitle_cn": report.subtitle_cn or "",
        "one_liner": report.one_liner,
        "institution": institution_card(inst),
        "topics": loads_json(report.topics, []),
        "released_at": iso(report.released_at),
        "has_audio": report.has_audio,
        "interpretation_label": report.interpretation_label,
        "source_tier": report.source_tier,
        "cache_version": report.cache_version,
    }
def module_payload(module: DisplayModule) -> dict[str, Any]:
    # Unknown module types fall back to a non-blocking P2 card with a detail page.
    meta = MODULE_META.get(module.type, {"layer": "p2", "render_mode": "card_plus_page", "has_detail_page": True, "is_publish_blocking": False, "requires_human_review": False})
    envelope = loads_json(module.content, {})
    render_mode = meta["render_mode"]
    content = None
    preview = None
    # Inline modules ship their content in the detail response;
    # card_plus_page modules ship only a preview and load full content separately.
    if render_mode == "inline":
        content = envelope.get("content", envelope)
    else:
        preview = envelope.get("preview", {})
    return {
        "module_id": module.module_id,
        "type": module.type,
        "layer": meta["layer"],
        "render_mode": render_mode,
        "has_detail_page": meta["has_detail_page"],
        "is_publish_blocking": meta["is_publish_blocking"],
        "requires_human_review": meta["requires_human_review"],
        "sort_order": module.sort_order,
        "title_cn": module.title_cn,
        "content": content,
        "preview": preview,
        "content_ref": module.content_ref,
        "content_etag": module.content_etag,
    }
class CatalogService:
    def __init__(self, session: AsyncSession) -> None:
        self.session = session
    async def _published_report_query(self) -> Select[tuple[Report, Institution]]:
        return (
            select(Report, Institution)
            .join(Institution, Report.institution_id == Institution.institution_id)
            .where(Report.display_status == "published", Institution.status == "active")
            .order_by(Report.released_at.desc(), Report.report_id)
        )
    async def report_cards(
        self,
        *,
        topic: str | None = None,
        institution_id: str | None = None,
        has_audio: bool | None = None,
        source_tier: str | None = None,
        q: str | None = None,
        page_size: int = 20,
    ) -> dict[str, Any]:
        stmt = await self._published_report_query()
        if topic:
            stmt = stmt.where(Report.topics.like(f"%{topic}%"))
        if institution_id:
            stmt = stmt.where(Report.institution_id == institution_id)
        if has_audio is not None:
            stmt = stmt.where(Report.has_audio == has_audio)
        if source_tier:
            stmt = stmt.where(Report.source_tier == source_tier)
        if q:
            stmt = stmt.where(Report.title_cn.like(f"%{q}%"))
        stmt = stmt.limit(min(max(page_size, 1), 50))
        rows = (await self.session.execute(stmt)).all()
        return {
            "items": [report_card(report, inst) for report, inst in rows],
            "page": {"next_cursor": None, "has_more": False},
            "cache_version": "feed:recommended:seed:v1",
        }
    async def report_detail(self, report_id: str) -> dict[str, Any]:
        row = (
            await self.session.execute(
                select(Report, Institution)
                .join(Institution, Report.institution_id == Institution.institution_id)
                .where(Report.report_id == report_id, Report.display_status == "published", Institution.status == "active")
            )
        ).one_or_none()
        if row is None:
            raise HTTPException(status_code=404, detail={"error": {"code": "REPORT_NOT_FOUND", "message": "报告不存在或未发布。"}})
        report, inst = row
        modules = (
            await self.session.execute(
                select(DisplayModule)
                .where(DisplayModule.report_id == report_id, DisplayModule.status == "published")
                .order_by(DisplayModule.sort_order)
            )
        ).scalars().all()
        return {
            "report_id": report.report_id,
            "title_cn": report.title_cn,
            "subtitle_cn": report.subtitle_cn or "",
            "original_title": report.original_title,
            "one_liner": report.one_liner,
            "institution": institution_public(inst, detail=True),
            "source": {
                "source_url": report.source_url,
                "source_note": report.source_note,
                "source_tier": report.source_tier,
                "published_at": iso(report.published_at),
            },
            "topics": loads_json(report.topics, []),
            "has_audio": report.has_audio,
            "interpretation_label": report.interpretation_label,
            "risk_disclaimer": report.risk_disclaimer,
            "released_at": iso(report.released_at),
            "cache_version": report.cache_version,
            "modules": [module_payload(module) for module in modules],
        }
    async def module_detail(self, report_id: str, module_id: str) -> dict[str, Any]:
        report = (
            await self.session.execute(select(Report).where(Report.report_id == report_id, Report.display_status == "published"))
        ).scalar_one_or_none()
        if report is None:
            raise HTTPException(status_code=404, detail={"error": {"code": "REPORT_NOT_FOUND", "message": "报告不存在或未发布。"}})
        module = (
            await self.session.execute(
                select(DisplayModule).where(DisplayModule.report_id == report_id, DisplayModule.module_id == module_id, DisplayModule.status == "published")
            )
        ).scalar_one_or_none()
        if module is None:
            raise HTTPException(status_code=404, detail={"error": {"code": "MODULE_HIDDEN", "message": "模块隐藏或不可见。"}})
        envelope = loads_json(module.content, {})
        # Prefer the full payload; fall back to inline content, then to the raw envelope.
        content = envelope.get("full") or envelope.get("content") or envelope
        return {
            "module_id": module.module_id,
            "type": module.type,
            "title_cn": module.title_cn,
            "content": content,
            "content_etag": module.content_etag,
            "cache_version": report.cache_version,
        }
    async def institutions(self, *, topic: str | None = None, source_tier: str | None = None, page_size: int = 20) -> dict[str, Any]:
        stmt = select(Institution).where(Institution.status == "active")
        if topic:
            stmt = stmt.where(Institution.covered_topics.like(f"%{topic}%"))
        if source_tier:
            stmt = stmt.where(Institution.source_tier == source_tier)
        # Apply ordering and the clamped limit after all filters.
        stmt = stmt.order_by(Institution.source_tier, Institution.name_cn).limit(min(max(page_size, 1), 50))
        rows = (await self.session.execute(stmt)).scalars().all()
        return {"items": [institution_public(inst) for inst in rows], "page": {"next_cursor": None, "has_more": False}}
    async def institution_detail(self, institution_id: str) -> dict[str, Any]:
        inst = (await self.session.execute(select(Institution).where(Institution.institution_id == institution_id, Institution.status == "active"))).scalar_one_or_none()
        if inst is None:
            raise HTTPException(status_code=404, detail={"error": {"code": "INSTITUTION_NOT_FOUND", "message": "机构不存在。"}})
        reports = await self.report_cards(institution_id=institution_id, page_size=5)
        detail = institution_public(inst, detail=True)
        detail["latest_report"] = reports["items"][0] if reports["items"] else None
        detail["recent_reports"] = reports["items"]
        return detail
    async def listen_items(self, *, page_size: int = 20) -> dict[str, Any]:
        stmt = (
            select(AudioAsset, Report, Institution)
            .join(Report, AudioAsset.report_id == Report.report_id)
            .join(Institution, Report.institution_id == Institution.institution_id)
            .where(AudioAsset.status == "published", Report.display_status == "published")
            .order_by(Report.released_at.desc(), AudioAsset.audio_id)
            .limit(min(max(page_size, 1), 50))
        )
        rows = (await self.session.execute(stmt)).all()
        items = [
            {
                "audio_id": audio.audio_id,
                "title_cn": audio.title_cn,
                "duration_sec": audio.duration_sec,
                "report_id": report.report_id,
                "report_title_cn": report.title_cn,
                "institution": institution_card(inst),
                "released_at": iso(report.released_at),
                "cache_version": report.cache_version,
            }
            for audio, report, inst in rows
        ]
        return {"items": items, "page": {"next_cursor": None, "has_more": False}, "cache_version": "listen:seed:v1"}
    async def seed_counts(self) -> dict[str, int]:
        models = {
            "institutions": Institution,
            "reports": Report,
            "audio_assets": AudioAsset,
            "display_modules": DisplayModule,
            "related_news": RelatedNews,
        }
        counts: dict[str, int] = {}
        for name, model in models.items():
            counts[name] = await self.session.scalar(select(func.count()).select_from(model)) or 0
        return counts
@@ -0,0 +1,185 @@
# API and Data Handoff
This is a handoff snapshot, not the product SSOT.
Product SSOT: mall-docs report-notebooklm docs, snapshot date: 2026-06-03.
## Current Implementation Status
Implemented in this repository:
- FastAPI app under `/api/report-notebooklm/v1`.
- SQLAlchemy model layer for the Phase 1 table set.
- Alembic initial migration.
- Seed import script with institutions, reports, modules, audio assets, users, favorites, and playback-progress fixtures.
- Public read endpoints for health, feeds, reports, modules, institutions, and listen list.
- Tests covering seed counts, public response shape, module visibility, gray-source handling, and listen behavior.
Not implemented yet:
- Auth APIs.
- Personal state APIs.
- Audio stream signing endpoint.
- Outbound events endpoint.
- Internal management APIs.
- Real Redis cache invalidation policy.
- Real object-storage signed URL policy.
- Production pagination/cursor behavior beyond seed-scale responses.
## Data Tables
| Table | Purpose | Current model |
|---|---|---|
| `institutions` | Institution profile, source tier, website, topics, credibility notes. | Implemented |
| `reports` | Report master record, source, topics, publication state, cache version. | Implemented |
| `raw_artifacts` | NotebookLM artifact metadata and object-storage references. | Implemented as metadata only |
| `display_artifacts` | Reviewed display version metadata for App consumption. | Implemented |
| `display_modules` | Detail-page modules, sort order, visibility, content or content reference. | Implemented |
| `audio_assets` | Audio metadata and object-storage key. | Implemented |
| `related_news` | Related-source candidates and reviewed related items. | Implemented |
| `users` | User account records. | Implemented as seed model, no auth routes |
| `favorites` | User report favorites. | Implemented as seed model, no API routes |
| `reading_history` | User reading/history events. | Implemented as model, no API routes |
| `saved_listens` | User saved-listen records. | Implemented as model, no API routes |
| `playback_progress` | Playback progress sync records. | Implemented as seed model, no API routes |
| `outbound_events` | External attribution events. | Implemented as model, no API route |
## Public API Implemented
Prefix: `/api/report-notebooklm/v1`
| Method | Path | Purpose |
|---|---|---|
| `GET` | `/health` | Service health. |
| `GET` | `/feed/recommended` | Published report cards for recommendation feed. |
| `GET` | `/reports` | Published report cards with basic filters. |
| `GET` | `/reports/{report_id}` | Report detail skeleton and published modules. |
| `GET` | `/reports/{report_id}/modules/{module_id}` | Full content for a visible module. |
| `GET` | `/institutions` | Active institution list. |
| `GET` | `/institutions/{institution_id}` | Institution detail with latest/recent reports. |
| `GET` | `/listen` | Published audio-backed report list. |
Current filters:
- `/reports`: `topic`, `institution_id`, `has_audio`, `source_tier`, `q`, `page_size`.
- `/institutions`: `topic`, `source_tier`, `page_size`.
- `/feed/recommended` and `/listen`: `page_size`.
Current pagination is seed-scale. Responses return `next_cursor: null` and `has_more: false`.
## Planned Public API
The Phase 1 contract also expects:
| Method | Path | Purpose |
|---|---|---|
| `GET` | `/audio/{audio_id}/stream` | Return short-lived playable URL. |
| `POST` | `/outbound/events` | Persist external attribution click event. |
The audio stream endpoint must not return a permanent object-storage URL. The planned behavior is to return a short-lived, backend-signed playback URL and no download URL.
## Planned Auth and Personal State API
Auth:
- `POST /auth/phone/start`
- `POST /auth/phone/verify`
- `POST /auth/wechat`
- `POST /auth/apple`
Personal state:
- `GET /me`
- `GET /me/favorites`
- `POST /me/favorites`
- `DELETE /me/favorites/{report_id}`
- `GET /me/history`
- `POST /me/history`
- `GET /me/listens/saved`
- `POST /me/listens/saved`
- `DELETE /me/listens/saved/{audio_id}`
- `POST /me/playback-progress`
- `GET /me/playback-progress/{audio_id}`
These endpoints are contract-level requirements but are not implemented in this scaffold.
## Planned Internal API
Internal APIs should require a service token and a network allowlist. They must never be exposed to the App.
- `POST /internal/reports`
- `POST /internal/reports/{report_id}/raw-artifacts`
- `GET /internal/reports/{report_id}/raw-artifacts`
- `POST /internal/reports/{report_id}/display-artifacts`
- `PATCH /internal/modules/{module_id}`
- `POST /internal/reports/{report_id}/publish`
- `POST /internal/reports/{report_id}/hide`
- `POST /internal/related-news/candidates`
Publishing should update report display status, update `has_audio`, bump `cache_version`, and clear related cache keys.
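A minimal sketch of the publish step above, assuming an in-memory `ReportState` in place of the `reports` row, a plain dict in place of Redis, and an `<id>:v<N>` format for `cache_version`. All three, plus the key names under the `rnb:` prefix, are hypothetical; a real implementation would update the row inside a database transaction and delete keys through the Redis client.

```python
from dataclasses import dataclass


@dataclass
class ReportState:
    # Hypothetical in-memory stand-in for a `reports` row.
    report_id: str
    display_status: str
    has_audio: bool
    cache_version: str


def bump_cache_version(current: str) -> str:
    """Increment an assumed '<id>:v<N>' suffix; start at v1 if there is none."""
    base, sep, number = current.rpartition(":v")
    if sep and number.isdigit():
        return f"{base}:v{int(number) + 1}"
    return f"{current}:v1"


def publish_report(
    report: ReportState,
    audio_published: bool,
    cache: dict[str, object],
    prefix: str = "rnb:",
) -> list[str]:
    """Publish a report and return the cache keys that were actually cleared."""
    report.display_status = "published"
    report.has_audio = audio_published
    report.cache_version = bump_cache_version(report.cache_version)
    # Hypothetical key layout: the report's own detail plus the lists that include it.
    keys = [
        f"{prefix}report:{report.report_id}",
        f"{prefix}feed:recommended",
        f"{prefix}listen",
    ]
    # Like Redis DEL, removing a missing key is a no-op.
    return [key for key in keys if cache.pop(key, None) is not None]
```

Only keys that were present are returned, which makes it easy to log what a publish actually invalidated.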
## Public vs Internal Fields
Public responses may expose:
- Report identity, title, subtitle, one-liner, topics, institution card, release time, source tier, interpretation label, `has_audio`, and `cache_version`.
- Detail source note, source URL where allowed, risk disclaimer, and published display modules.
- Module metadata needed by the client: `module_id`, `type`, `layer`, `render_mode`, `has_detail_page`, `is_publish_blocking`, `requires_human_review`, `sort_order`, `title_cn`, `content`, `preview`, `content_ref`, `content_etag`.
Public responses must not expose:
- Raw artifact payload.
- Object-storage private paths for raw artifacts.
- NotebookLM notebook IDs, source IDs, conversation IDs, or local account information.
- Local filesystem paths.
- `display_version` or `module.version`.
- User phone hash, WeChat OpenID, Apple user ID, or auth internals.
The public cache contract is a single `cache_version` string. `display_version` and module `version` are server-internal fields only.
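The boundary above lends itself to an automated check. This sketch walks a decoded JSON response and reports every key from a deny-list, at any depth. The deny-list is an assumption: `display_version` and `version` come from this document, while `notebook_id`, `source_id`, `conversation_id`, `phone_hash`, `wechat_openid`, and `apple_user_id` are hypothetical field names standing in for the NotebookLM and auth internals listed above. Exact key matching keeps `cache_version` allowed.

```python
from typing import Any

# Hypothetical deny-list; only display_version and version are named in this document.
FORBIDDEN_KEYS = frozenset({
    "display_version",
    "version",
    "notebook_id",
    "source_id",
    "conversation_id",
    "phone_hash",
    "wechat_openid",
    "apple_user_id",
})


def find_forbidden_keys(payload: Any, forbidden: frozenset[str] = FORBIDDEN_KEYS, path: str = "$") -> list[str]:
    """Return JSON-path-like locations of forbidden keys anywhere in a response body."""
    hits: list[str] = []
    if isinstance(payload, dict):
        for key, value in payload.items():
            child = f"{path}.{key}"
            if key in forbidden:
                hits.append(child)
            hits.extend(find_forbidden_keys(value, forbidden, child))
    elif isinstance(payload, list):
        for index, item in enumerate(payload):
            hits.extend(find_forbidden_keys(item, forbidden, f"{path}[{index}]"))
    return hits
```

A public-API test can assert that `find_forbidden_keys(response.json())` is empty for every endpoint.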
## Seed Data
The seed importer currently creates:
- 18 institutions.
- 27 reports, including one NotebookLM sample report and multiple boundary cases.
- 15 audio assets.
- More than 120 display modules.
- Test users, favorites, and playback progress.
Seed boundary cases intentionally cover:
- Reports with audio and reports without audio.
- Hidden/unpublished report behavior.
- Gray broker source with restricted source URL behavior.
- Published modules vs review-only modules.
- `study_guide` module replacing legacy `faq`.
- Heavy modules using `card_plus_page` preview plus full-module endpoint.
Do not treat seed content as production content. It exists to exercise app/API behavior and edge cases.
## Detail Module Model
The detail page uses a skeleton plus module model:
- Inline modules include small `content` directly in the detail response.
- Heavy modules use `render_mode=card_plus_page`, return `preview` in detail, and load full content from `/reports/{report_id}/modules/{module_id}`.
- Unknown future module types should not break the App; they should fall back to hidden or generic rendering.
Core module types:
- `basic_info`
- `executive_overview`
- `core_insights`
- `key_data`
- `source_compliance`
- `institution`
- `differentiated_view`
- `weaknesses`
- `timeline`
- `study_guide`
- `structure_graph`
- `related_sources`
- `infographic`
- `audio`
- `research_discovery`
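A client-side sketch of the skeleton-plus-module model and core module types above, written in Python for consistency with this repository even though the App itself is a separate Flutter project. `render_plan` and its return shape are hypothetical; it shows the three paths: inline content, a preview card linking to the full-module endpoint, and a fallback for unknown types.

```python
from typing import Any

# Core module types listed above; anything else is treated as unknown.
KNOWN_MODULE_TYPES = frozenset({
    "basic_info", "executive_overview", "core_insights", "key_data",
    "source_compliance", "institution", "differentiated_view", "weaknesses",
    "timeline", "study_guide", "structure_graph", "related_sources",
    "infographic", "audio", "research_discovery",
})


def render_plan(report_id: str, module: dict[str, Any], *, hide_unknown: bool = True) -> dict[str, Any]:
    """Hypothetical client decision for one module from the detail response."""
    if module.get("type") not in KNOWN_MODULE_TYPES:
        # Unknown future types must not break the client: hide them or render generically.
        if hide_unknown:
            return {"action": "hide"}
        return {"action": "generic", "title": module.get("title_cn")}
    if module.get("render_mode") == "inline":
        return {"action": "inline", "content": module.get("content")}
    # card_plus_page: show the preview now and fetch full content on demand.
    return {
        "action": "card",
        "preview": module.get("preview"),
        "detail_path": f"/reports/{report_id}/modules/{module['module_id']}",
    }
```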
@@ -0,0 +1,142 @@
# Content Pipeline Handoff
This is a handoff snapshot, not the product SSOT.
Product SSOT: mall-docs report-notebooklm docs, snapshot date: 2026-06-03.
## Content Principle
Use NotebookLM as a source-driven research engine, not as a generic rewriting model.
The pipeline may orchestrate, clean, validate, map, and review NotebookLM-native artifacts. It must not silently replace missing NotebookLM artifacts with locally rewritten publishable content.
## Source Inputs
Phase 1 content is based on public or authorized institutional research reports. Priority source categories:
- Official public sources.
- Authorized partner sources.
- Gray broker public sources, with stricter review and source display handling.
Vision's source lists, source tiering, and historical source-health lessons may be used as reference material. Production data must not depend on a local Vision runtime, local path, local cache, or local account state.
## NotebookLM Workflow
Recommended report run order:
1. Inspect the source PDF: title, institution, date, page count, size, and report type.
2. Create or reuse one notebook for one report source unless a multi-report synthesis is explicitly planned.
3. Upload the report source.
4. Generate the P0 text package:
   - source description
   - native Briefing Doc
   - native Blog Post
   - data table
   - query dimensions
   - query key data
   - query divergence
   - query weaknesses
5. Generate useful P1 artifacts:
   - query timeline
   - query related sources
   - Study Guide
   - mind map, if download succeeds
6. Generate P2 artifacts asynchronously:
   - infographic candidate
   - audio brief
   - research discovery
7. Persist every artifact status in a manifest.
8. Deterministically assemble display modules from reviewed artifacts.
9. Run human review before publishing.
## Artifact Types
The Phase 1 schema supports these NotebookLM artifact types:
| Artifact type | Purpose | Publish blocking | Human review |
|---|---|---:|---:|
| `source_summary` | Source-level summary. | No | No |
| `notebook_summary` | Notebook-level summary. | No | No |
| `native_briefing_doc` | Native briefing document. | Yes | No |
| `native_blog_post` | Native blog post. | Yes | No |
| `native_study_guide` | FAQ, study guide, glossary. | No | No |
| `data_table` | Structured table data. | Yes | No |
| `mind_map` | Mind map or graph source. | No | No |
| `query_dimensions` | Analysis dimensions. | Yes | No |
| `query_key_data` | Key data points. | Yes | No |
| `query_divergence` | Views that diverge from consensus. | No | No |
| `query_weaknesses` | Weaknesses and open questions. | No | No |
| `query_timeline` | Timeline and turning points. | No | No |
| `query_related_sources` | Related source candidates. | No | Yes |
| `research_discovery` | Enrichment queue. | No | Yes |
| `infographic` | Candidate public image. | No | Yes |
| `audio_brief` | Listening preview or audio source. | No | No |
Artifact records should keep status, object reference, format, size, hash, generated time, error, and review flags. Raw payloads should stay in object storage and remain internal.
## Module Mapping
| Product module | Primary artifact sources | Notes |
|---|---|---|
| `basic_info` | Source metadata and source summary. | P0, inline. |
| `executive_overview` | Briefing Doc and Blog Post. | P0, heavy card plus page. |
| `core_insights` | Briefing Doc and query dimensions. | P0, inline with optional detail page. |
| `key_data` | Data table and query key data. | P0, heavy card plus page. |
| `source_compliance` | Source metadata and review notes. | P0, inline, must include disclaimer. |
| `institution` | Institution record. | P0, inline. |
| `differentiated_view` | Query divergence. | P1, optional. |
| `weaknesses` | Query weaknesses. | P1, optional, avoid investment-advice wording. |
| `timeline` | Query timeline. | P1, optional. |
| `study_guide` | Native Study Guide. | P1, optional, replaces legacy `faq`. |
| `structure_graph` | Mind map or deterministic fallback. | P1, optional. |
| `related_sources` | Related-source query and review queue. | P1, review required before display. |
| `infographic` | Infographic candidate. | P2, review required before display. |
| `audio` | Audio brief or reviewed audio asset. | P2, not required for text publish. |
| `research_discovery` | Research discovery queue. | P2, internal or reviewed only. |
## Publish Gates
Blocking before public release:
- Source upload succeeded and is traceable.
- Required P0 text artifacts exist and have usable content.
- `basic_info`, `executive_overview`, `core_insights`, `key_data`, and `source_compliance` are present unless a product decision allows a partial report.
- Display artifact is reviewed and approved.
- Source attribution and risk disclaimer are present.
- No raw artifact payload, local path, private notebook ID, or account information appears in public responses.
Non-blocking:
- Mind map.
- Study guide.
- Timeline.
- Related-source candidates.
- Research discovery.
- Infographic.
- Audio.
If an optional artifact fails, record the failure and continue without inventing fallback public copy. The one allowed exception is `structure_graph`, which may use a deterministic fallback built from artifacts that are already available.
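The blocking gates can be expressed as a function that returns every failing reason, so a reviewer sees all problems at once rather than only the first. This is a sketch over a hypothetical dict shape (`source_upload_ok`, `display_review`, `source_note`, `risk_disclaimer`, `modules`); the leak check for raw payloads and private IDs is left out, and non-blocking artifacts are deliberately not inspected.

```python
from typing import Any

# P0 modules from the blocking gate list.
REQUIRED_P0_MODULES = ("basic_info", "executive_overview", "core_insights", "key_data", "source_compliance")


def blocking_errors(report: dict[str, Any], *, allow_partial: bool = False) -> list[str]:
    """Return reasons a report cannot be published; an empty list means the gates pass."""
    errors: list[str] = []
    if not report.get("source_upload_ok"):
        errors.append("source upload missing or untraceable")
    present = {m["type"] for m in report.get("modules", [])}
    missing = [t for t in REQUIRED_P0_MODULES if t not in present]
    # allow_partial models "unless a product decision allows a partial report".
    if missing and not allow_partial:
        errors.append("missing P0 modules: " + ", ".join(missing))
    if report.get("display_review") != "approved":
        errors.append("display artifact not approved")
    if not report.get("source_note") or not report.get("risk_disclaimer"):
        errors.append("source attribution or risk disclaimer missing")
    return errors
```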
## Cadence Notes
NotebookLM operations should be conservative by default:
- One active NotebookLM operation per account.
- Text artifacts first.
- Media artifacts after text success.
- Heavy media should not block publishable text.
- On transient failure, retry once; if an optional artifact fails again, mark it failed and continue.
The seed importer is not a production runner. A production runner should persist manifests after every operation and support resumable review/import.
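The retry rule above can be sketched as follows. `TransientError`, `run_artifact`, and the `save` callback are hypothetical names; `save` stands for whatever persists the manifest after each operation. Calls are made one at a time, in line with the one-operation-per-account rule, and errors that are not `TransientError` propagate immediately without a retry or a status write.

```python
from typing import Callable


class TransientError(Exception):
    """Hypothetical marker for failures worth one retry, such as timeouts."""


def run_artifact(
    name: str,
    operation: Callable[[], object],
    *,
    blocking: bool,
    save: Callable[[str, str], None],
) -> str:
    """Run one operation, retrying once on TransientError, and persist the outcome via `save`."""
    try:
        operation()
    except TransientError:
        try:
            operation()  # the single retry
        except TransientError:
            save(name, "failed")
            if blocking:
                raise  # a blocking artifact stops the run after its failure is recorded
            return "failed"  # optional artifact: record the failure and continue
    save(name, "succeeded")
    return "succeeded"
```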
## Human Review
Review is mandatory for:
- Gray broker sources.
- Related-source candidate display.
- Infographic or generated media.
- Any content where citations/page labels are ambiguous.
- Any copy that could be interpreted as investment advice.
Do not display raw NotebookLM page labels until they are normalized against verifiable source pages or sections.
@@ -0,0 +1,84 @@
# Backend Handoff
This is a handoff snapshot, not the product SSOT.
Product SSOT: mall-docs report-notebooklm docs, snapshot date: 2026-06-03.
## Current State
The backend is a runnable Phase 1 scaffold for the public read surface. It is not production-ready yet.
Implemented:
- FastAPI app and API prefix.
- SQLAlchemy models for the Phase 1 table set.
- Alembic initial migration.
- Seed import script.
- Public read API for feed, reports, module detail, institutions, and listen list.
- Tests for current seed behavior and public response boundaries.
Not implemented:
- Authentication.
- User personal-state routes.
- Audio stream signing.
- Outbound attribution route.
- Internal management routes.
- Production object storage integration.
- Real Redis cache invalidation.
- Production deployment config.
## Repository Map
| Path | Purpose |
|---|---|
| `app/main.py` | FastAPI app, CORS, router registration. |
| `app/config.py` | Environment-driven settings. |
| `app/db.py` | Async SQLAlchemy engine and session dependency. |
| `app/cache.py` | Redis client helper and key prefixing. |
| `app/models/entities.py` | SQLAlchemy table models. |
| `app/routers/` | HTTP route handlers. |
| `app/services/catalog.py` | Public catalog response assembly. |
| `migrations/` | Alembic environment and migration files. |
| `scripts/import_seed_content.py` | Seed data importer and module fixture builder. |
| `tests/test_public_api.py` | Current API and seed behavior tests. |
| `docs/` | Engineering handoff documentation. |
## Solved Decisions
- Technical identifiers stay `report-notebooklm` / `rnb`; display name is `研听`.
- Public API responses expose `cache_version`, not `display_version` or module `version`.
- `study_guide` replaces legacy `faq`.
- Heavy modules use preview cards plus full-module endpoint.
- Raw artifacts stay internal; App consumes reviewed display artifacts only.
- Audio versions of gray broker sources may be produced only after the latest product decision and a compliance review.
- Phase 1 has no interpretation-content download feature.
## Known Gaps
- `GET /audio/{audio_id}/stream` needs signed playback URL behavior.
- Auth and personal state APIs need implementation.
- `POST /outbound/events` needs implementation and validation for `click_id` / `tracking_id`.
- Internal publish/hide/import management endpoints need implementation.
- Cursor pagination and cache invalidation are seed-scale placeholders.
- Object storage policy needs a production decision for public vs signed module content.
- Release/deploy settings need staging and production environment values.
- Compliance must re-review gray-source audio and generated media rules before launch.
## Suggested Handoff Order
1. Read `docs/PROJECT_BRIEF.md`.
2. Read `docs/API_AND_DATA.md`.
3. Run the backend locally with seed data using `docs/RUNBOOK.md`.
4. Run `pytest -q` and smoke-test the three core public endpoints (`/health`, `/feed/recommended`, and report detail).
5. Pair with `report-notebooklm-app/` and verify `RNB_API_BASE` points to this service.
6. Choose the next work item from `docs/ROADMAP_AND_OPEN_ISSUES.md`.
## Definition of Done for Next Backend Work
- New API behavior has tests.
- Public responses do not expose internal/raw fields.
- Migrations include downgrade.
- New config is environment-driven.
- Seed data remains useful for App development.
- Documentation is updated when contract behavior changes.
@@ -0,0 +1,71 @@
# Project Brief Snapshot
This is a handoff snapshot, not the product SSOT.
Product SSOT: mall-docs report-notebooklm docs, snapshot date: 2026-06-03.
## Product
`研听` is a Chinese research-report interpretation app for users who want to understand global institutional research with lower language and time barriers. It turns hard-to-read English research reports into structured Chinese reading and listening experiences.
Technical identifiers remain `report-notebooklm` and `rnb`. Do not use the product display name in code identifiers, database schema names, Redis keys, object storage paths, or API prefixes.
## Phase 1 Goals
- Validate whether Chinese users will repeatedly consume global institutional research-report interpretations.
- Ship a complete first app experience for discovery, reading, listening, saving, and returning to reports.
- Establish a minimum loop from report sources to selection, NotebookLM-assisted interpretation, review, storage, API distribution, and app display.
- Keep source attribution and compliance clear: this is report interpretation and annotation, not investment advice.
- Keep the commercial app independent from any local-only Vision runtime.
## Target Users
- General Chinese users interested in macro, precious metals, commodities, energy, central banks, and cross-asset research.
- Light professional users who want overseas institutional views and original-source traceability, without trading advice.
- Commuting or fragmented-time users who want reports transformed into listenable content.
Non-target users: professional terminal users, real-time trading-signal users, UGC/community users, and users expecting original investment recommendations.
## Main Tabs
| Tab | Phase 1 scope | Explicitly out of scope |
|---|---|---|
| 推荐 | Latest and curated report interpretations. | Ads, hard trading CTAs, real-time news flashes. |
| 研报 | All published report interpretations with basic filters. | Advanced investment terminal search. |
| 机构 | Institution list and institution report entry points. | Commercial institution ranking or onboarding backend. |
| 听单 | Reports that have audio form. | User-created podcasts, downloads, offline packages. |
| 我的 | Guest/login state, favorites, history, saved listening entry points. | Comments, UGC, paid membership, points. |
## Phase 1 Must Do
- Public browsing for recommended reports, report list, institutions, and listen list.
- Report detail pages with title, institution, publication and release dates, source type, topics, summary, structured modules, source/compliance information, and favorite entry.
- Guest users can browse public content and fully listen to at least one episode.
- Logged-in users can synchronize favorites, reading history, saved listens, and playback progress.
- Published app responses must expose only reviewed display artifacts, not raw NotebookLM artifacts.
- Every report detail must preserve source attribution and risk disclaimer wording.
## Phase 1 Must Not Do
- No commercialization: no ads, paid unlock, membership, task wall, or points.
- No comments, community, UGC, or user-generated report interpretations.
- No investment advice, trading signals, buy/sell points, return promises, or portfolio recommendations.
- No original financial news, real-time reporting, or commentary positioned as original market views.
- No in-product downloads for interpretation content, audio packages, or PDFs.
- No long-term production dependency on a local Vision runtime, local SQLite, local scripts, local paths, or local account state.
- No App or server-side LLM rewriting of NotebookLM-native content into unsupported original copy.
## Compliance Boundary
- Positioning: research-report interpretation and annotation service.
- Content: Chinese interpretation of public or authorized institutional reports.
- Detail pages, agreements, and store metadata must state that content is not investment advice.
- Each item must show institution, source, publication time, and interpretation/source labels.
- Gray broker sources require special handling and human review before public release.
- Phase 1 does not open user content surfaces.
## Vision Decoupling
Vision source experience can be reused as reference material: source lists, source tiers, source-health lessons, NotebookLM experience, and prior pitfalls.
The app must not depend on local Vision runtime state in production. Any short-term Vision consumption must be read-only transition input, must not write back to Vision, and must not leak local file paths into production data.
@@ -0,0 +1,57 @@
# Roadmap and Open Issues
This is a handoff snapshot, not the product SSOT.
Product SSOT: mall-docs report-notebooklm docs, snapshot date: 2026-06-03.
## P0 Before Production Handoff
- Add environment examples and production-safe defaults for all deploy-time settings.
- Decide staging and production API domains.
- Implement `GET /audio/{audio_id}/stream` with short-lived signed playback URL.
- Implement auth start/verify flow and token handling.
- Implement `/me` personal-state APIs for favorites, history, saved listens, and playback progress.
- Implement `POST /outbound/events` with required `click_id` and `tracking_id`.
- Implement production cursor pagination.
- Implement cache invalidation on publish/hide/module/audio changes.
- Add smoke scripts for health, feed, detail, listen, audio stream, favorite, and outbound event.
## P1 Content and Admin
- Implement internal APIs for report import, raw artifacts, display artifacts, module patching, publish, hide, and related-source candidates.
- Implement production content importer from a manifest-based NotebookLM runner.
- Add validation for module JSON schemas.
- Add object storage integration for raw payloads, heavy module content, audio, images, and source references.
- Add publish blocking validation for P0 modules.
- Add gray-source review flags and operational reporting.
## P1 App/API Contract
- Align App with real auth state and return-to-action behavior.
- Add playable audio stream integration once backend stream endpoint exists.
- Replace local playback placeholders with API-backed progress.
- Add real outbound event write before external navigation.
- Decide whether heavy P1 modules stay as separate pages or merge into one deep-dive page.
## P2 Production Operations
- Add structured logs and request IDs.
- Add application metrics for feed/detail/listen/audio/outbound.
- Add backup and restore runbook for database and content objects.
- Add staging seed or reviewed staging content set.
- Add CI checks for lint, tests, migrations, and public response snapshots.
## Product and Compliance Open Issues
- Re-review gray-source audio policy before public release.
- Define AI-generated-content labeling requirements in App detail and store metadata.
- Define infographic watermark, QA, and factual-check process.
- Define source citation display rules after citation/page-label normalization.
- Confirm login channels and external approvals: phone SMS, WeChat, Apple.
- Confirm store listing wording and risk disclaimers.
## Gitea Handoff Blockers
- Use the single Gitea remote for the monorepo.
- Decide whether the initial push goes directly to `main` or to a review branch.
- Confirm the team has access to the product SSOT or accepts the code-repo snapshot as the development handoff.
@@ -0,0 +1,112 @@
# Backend Runbook
This is a handoff snapshot, not the product SSOT.
Product SSOT: mall-docs report-notebooklm docs, snapshot date: 2026-06-03.
## Requirements
- Python 3.12, or another version compatible with the project's configured dependencies.
- MySQL 8 for local/staging/prod-like runs.
- Redis 7 for cache-compatible local/staging/prod-like runs.
- A shell environment that can create a Python virtual environment.
SQLite is used by the automated tests through `RNB_DATABASE_URL`; production-like local runs should use MySQL.
## Environment Variables
Create `.env` in the repository root:
```bash
RNB_DATABASE_URL=mysql+asyncmy://<db-user>:<db-pass>@<db-host>:<db-port>/report_notebooklm
RNB_REDIS_URL=redis://<redis-host>:<redis-port>/0
RNB_REDIS_KEY_PREFIX=rnb:
```
Do not commit `.env`.
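Before running the API, you can add a quick pre-flight check. It confirms that `.env` defines every variable listed above and that no `<placeholder>` values remain. This is a stdlib-only sketch. The backend most likely loads these values through `pydantic-settings`, which is a declared dependency. The required-key list below covers only the three variables shown, which is an assumption.

```python
# Assumption: only the three variables from the runbook are required.
REQUIRED_KEYS = ("RNB_DATABASE_URL", "RNB_REDIS_URL", "RNB_REDIS_KEY_PREFIX")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return required keys that are absent, empty, or still hold a <placeholder>."""
    return [key for key in REQUIRED_KEYS if not values.get(key) or "<" in values[key]]
```

For example, `missing_keys(parse_env(Path(".env").read_text(encoding="utf-8")))` should return an empty list once `.env` is filled in.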
## Install
```bash
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
```
## Migrate and Seed
```bash
source .venv/bin/activate
alembic upgrade head
python scripts/import_seed_content.py
```
Seed import is destructive for seed tables. Use it only in local or disposable test data environments unless a production-safe importer is written.
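The importer deletes every seed table before inserting (through the `reset()` call at the start of `import_seed`). A guard on the database URL is therefore a cheap safety net. The sketch below is hypothetical and is not part of the importer as far as this snapshot shows. The allow-list of disposable hosts is an assumption to adjust per team.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list of hosts treated as disposable.
DISPOSABLE_HOSTS = {"localhost", "127.0.0.1", "::1"}


def seed_import_allowed(database_url: str) -> bool:
    """Allow the destructive seed import only for SQLite or a local database host."""
    parts = urlsplit(database_url)
    # Scheme is e.g. "sqlite+aiosqlite" or "mysql+asyncmy".
    if parts.scheme.startswith("sqlite"):
        return True
    return parts.hostname in DISPOSABLE_HOSTS
```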
## Run API
```bash
source .venv/bin/activate
uvicorn app.main:app --reload --host <bind-host> --port <port>
```
API prefix:
```text
/api/report-notebooklm/v1
```
## Smoke Checks
```bash
API_BASE_URL=http://<api-host>:<port>/api/report-notebooklm/v1
curl "$API_BASE_URL/health"
curl "$API_BASE_URL/feed/recommended"
curl "$API_BASE_URL/reports/rep_ssga_gold"
```
Expected:
- Health returns `{"status":"ok"}`.
- Feed returns non-empty `items`.
- Report detail returns modules and does not include `display_version`.
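The same expectations can be scripted for CI or a post-deploy check. This stdlib sketch assumes the detail payload keeps its modules under a top-level `modules` key, which is an assumption about the response shape. The `display_version` test is a crude substring check over the serialized body.

```python
import json
import urllib.request
from typing import Any


def smoke_problems(health: Any, feed: Any, detail: Any) -> list[str]:
    """Compare decoded smoke-check responses against the expectations in this runbook."""
    problems: list[str] = []
    if health != {"status": "ok"}:
        problems.append("health: expected {'status': 'ok'}")
    if not isinstance(feed, dict) or not feed.get("items"):
        problems.append("feed: expected non-empty 'items'")
    # Assumption: report detail exposes its modules under "modules".
    if not isinstance(detail, dict) or not detail.get("modules"):
        problems.append("detail: expected non-empty 'modules'")
    if "display_version" in json.dumps(detail):
        problems.append("detail: leaks 'display_version'")
    return problems


def fetch_json(url: str, timeout: float = 10.0) -> Any:
    """GET a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

With `base` set to the same value as `API_BASE_URL`, the following call should return an empty list when all checks pass:

`smoke_problems(fetch_json(f"{base}/health"), fetch_json(f"{base}/feed/recommended"), fetch_json(f"{base}/reports/rep_ssga_gold"))`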
## Test
```bash
source .venv/bin/activate
pytest -q
```
## App Integration
Start this backend first, then run the App with:
```bash
flutter run -d chrome --dart-define=RNB_API_BASE=<api-base-url>
```
For an Android emulator, use an API base URL that the emulator can reach:
```bash
flutter run -d <emulator-id> --dart-define=RNB_API_BASE=<emulator-api-base-url>
```
Only use cleartext HTTP for local debug builds. Release builds must use HTTPS.
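A build script could enforce this rule on the value passed through `--dart-define=RNB_API_BASE=...` before it invokes `flutter build`. The helper below is hypothetical. The set of local debug hosts is an assumption: `10.0.2.2` is the Android emulator's alias for the host machine's loopback. Extend the set if your debug devices reach the API over a LAN address.

```python
from urllib.parse import urlsplit

# Assumption: hosts that may be reached over cleartext HTTP in debug builds.
LOCAL_DEBUG_HOSTS = {"localhost", "127.0.0.1", "10.0.2.2"}


def api_base_allowed(api_base: str, *, release: bool) -> bool:
    """Release builds need HTTPS; debug builds may use HTTP only for local hosts."""
    parts = urlsplit(api_base)
    if parts.scheme == "https":
        return True
    if release or parts.scheme != "http":
        return False
    return parts.hostname in LOCAL_DEBUG_HOSTS
```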
## Deployment Checks
Before staging or production:
- Use environment variables for all database, Redis, object storage, auth, and signing settings.
- Configure HTTPS at the gateway.
- Confirm that migrations upgrade and downgrade cleanly in staging.
- Import reviewed content, not raw/unreviewed NotebookLM artifacts.
- Smoke-test `/health`, `/feed/recommended`, and report detail; add audio stream, favorites, and outbound events once those APIs exist.
- Confirm public responses do not expose local paths, raw payloads, notebook IDs, source IDs, conversation IDs, or secrets.
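The last check can be automated by walking each decoded public response. The walk flags internal field names and strings that look like local filesystem paths. The key names in `FORBIDDEN_KEYS` are placeholders inferred from this list and from the notes, which document `display_version` as internal. Align them with the real model columns before relying on the scan.

```python
import re
from typing import Any

# Placeholder field names; align with the real internal columns.
FORBIDDEN_KEYS = {"display_version", "notebook_id", "source_id", "conversation_id", "raw_payload"}
# Rough heuristic for local paths: POSIX home directories or a Windows drive prefix.
LOCAL_PATH_RE = re.compile(r"(?:^|\s)(?:/Users/|/home/|[A-Za-z]:\\)")


def find_leaks(node: Any, path: str = "$") -> list[str]:
    """Return JSON paths of forbidden keys and of strings that look like local paths."""
    leaks: list[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key in FORBIDDEN_KEYS:
                leaks.append(child)
            leaks.extend(find_leaks(value, child))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            leaks.extend(find_leaks(item, f"{path}[{index}]"))
    elif isinstance(node, str) and LOCAL_PATH_RE.search(node):
        leaks.append(path)
    return leaks
```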
## Operational Notes
- Redis keys must use the `rnb:` prefix or a compatible namespace.
- Object storage keys should use `rnb/raw/`, `rnb/modules/`, `rnb/audio/`, and `rnb/images/` style prefixes.
- Long NotebookLM operations should live in a resumable runner, not inside HTTP request handlers.
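Small key-builder helpers keep these namespaces consistent across the codebase. The function names and the `report` key segment in the example are hypothetical. The `rnb/modules/<module_id>.json` and `rnb/audio/<audio_id>.m4a` shapes match what the seed importer already writes to `content_ref` and `oss_key`.

```python
# Default mirrors RNB_REDIS_KEY_PREFIX; helper names are hypothetical.
REDIS_PREFIX = "rnb:"
OBJECT_KINDS = {"raw", "modules", "audio", "images"}


def redis_key(*parts: str, prefix: str = REDIS_PREFIX) -> str:
    """Join key parts under the rnb: namespace, rejecting empty parts or stray colons."""
    if not parts or any(not part or ":" in part for part in parts):
        raise ValueError(f"invalid key parts: {parts!r}")
    return prefix + ":".join(parts)


def object_key(kind: str, name: str) -> str:
    """Build an object-storage key such as rnb/audio/aud_x.m4a."""
    if kind not in OBJECT_KINDS:
        raise ValueError(f"unknown object kind: {kind}")
    if not name or name.startswith("/") or ".." in name:
        raise ValueError(f"unsafe object name: {name!r}")
    return f"rnb/{kind}/{name}"
```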
@@ -0,0 +1,35 @@
# Source Index
This is a handoff snapshot, not the product SSOT.
Product SSOT: mall-docs report-notebooklm docs, snapshot date: 2026-06-03.
## Snapshot Sources
The handoff documents in this repository were distilled from these logical product-document sources:
| Logical source | Used for |
|---|---|
| `phase1-scope.md` | Product positioning, target users, tabs, Phase 1 scope, non-goals, compliance boundary, Vision decoupling. |
| `phase1-build-brief.md` | Data tables, endpoint list, display module model, artifact enum, seed mapping, open questions. |
| `phase1-development-plan.md` | Technology choices, architecture, Redis/object-storage strategy, phases, deployment assumptions, external dependencies. |
| `data-model-api-contract-v0.1.md` | API/data object intent and response boundaries. |
| `user-flows.md` | Guest vs logged-in behavior, shallow interaction expectations, no-download clarification. |
| `app-prd-v0.1.md` | App-side behavior and page-level expectations. |
| `vision-research-sources.md` | Source-reference context and Vision decoupling principle. |
## Drift Rule
Do not treat this repository snapshot as the product SSOT. When product requirements change:
1. Update the product SSOT first.
2. Update this code-repo snapshot only for information needed by engineers.
3. Bump the snapshot date or add a short changelog entry.
## What Was Not Copied
- Historical drafts.
- Full experiment reports.
- Local-only evidence paths.
- Private local notes.
- Raw NotebookLM notebook IDs, source IDs, conversation IDs, account identifiers, or payloads.
@@ -0,0 +1,56 @@
from __future__ import annotations
import asyncio
from logging.config import fileConfig
from alembic import context
from sqlalchemy import pool
from sqlalchemy.engine import Connection
from sqlalchemy.ext.asyncio import async_engine_from_config
from app.config import get_settings
from app.db import Base
from app.models import * # noqa: F401,F403
config = context.config
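# Alembic passes this value through ConfigParser interpolation, so a literal "%" (e.g. in a URL-encoded password) must be escaped as "%%".
config.set_main_option("sqlalchemy.url", get_settings().database_url.replace("%", "%%"))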
if config.config_file_name is not None:
    fileConfig(config.config_file_name)
target_metadata = Base.metadata
def run_migrations_offline() -> None:
    context.configure(
        url=get_settings().database_url,
        target_metadata=target_metadata,
        literal_binds=True,
        dialect_opts={"paramstyle": "named"},
    )
    with context.begin_transaction():
        context.run_migrations()
def do_run_migrations(connection: Connection) -> None:
    context.configure(connection=connection, target_metadata=target_metadata)
    with context.begin_transaction():
        context.run_migrations()
async def run_migrations_online() -> None:
    connectable = async_engine_from_config(
        config.get_section(config.config_ini_section, {}),
        prefix="sqlalchemy.",
        poolclass=pool.NullPool,
    )
    async with connectable.connect() as connection:
        await connection.run_sync(do_run_migrations)
    await connectable.dispose()
if context.is_offline_mode():
    run_migrations_offline()
else:
    asyncio.run(run_migrations_online())
@@ -0,0 +1,23 @@
"""${message}
Revision ID: ${up_revision}
Revises: ${down_revision | comma,n}
Create Date: ${create_date}
"""
from alembic import op
import sqlalchemy as sa
${imports if imports else ""}
revision = ${repr(up_revision)}
down_revision = ${repr(down_revision)}
branch_labels = ${repr(branch_labels)}
depends_on = ${repr(depends_on)}
def upgrade() -> None:
    ${upgrades if upgrades else "pass"}
def downgrade() -> None:
    ${downgrades if downgrades else "pass"}
@@ -0,0 +1,26 @@
"""phase1 initial tables
Revision ID: 202606030100
Revises:
Create Date: 2026-06-03 01:00:00
"""
from alembic import op
from app.db import Base
from app.models import * # noqa: F401,F403
revision = "202606030100"
down_revision = None
branch_labels = None
depends_on = None
def upgrade() -> None:
    bind = op.get_bind()
    Base.metadata.create_all(bind=bind)
def downgrade() -> None:
    bind = op.get_bind()
    Base.metadata.drop_all(bind=bind)
@@ -0,0 +1,32 @@
[project]
name = "report-notebooklm-api"
version = "0.1.0"
requires-python = ">=3.12"
dependencies = [
    "fastapi",
    "uvicorn",
    "sqlalchemy",
    "greenlet",
    "alembic",
    "pydantic",
    "pydantic-settings",
    "asyncmy",
    "redis",
]
[project.optional-dependencies]
dev = [
    "aiosqlite",
    "httpx",
    "pytest",
    "pytest-asyncio",
]
[tool.setuptools.packages.find]
include = ["app*", "scripts*"]
exclude = ["migrations*", "tests*"]
[tool.pytest.ini_options]
asyncio_mode = "auto"
testpaths = ["tests"]
pythonpath = ["."]
@@ -0,0 +1 @@
@@ -0,0 +1,649 @@
from __future__ import annotations
import asyncio
import csv
import datetime as dt
import hashlib
import re
import json
import sys
from typing import Any
from pathlib import Path
sys.path.insert(0, str(Path(__file__).resolve().parents[1]))
from sqlalchemy import delete, select
from sqlalchemy.ext.asyncio import AsyncSession
from app.db import Base, SessionLocal, engine
from app.models import (
AudioAsset,
DisplayArtifact,
DisplayModule,
Favorite,
Institution,
OutboundEvent,
PlaybackProgress,
RawArtifact,
ReadingHistory,
RelatedNews,
Report,
SavedListen,
User,
)
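def j(value: Any) -> str:
    """Serialize to compact JSON, keeping non-ASCII (Chinese) text readable."""
    return json.dumps(value, ensure_ascii=False, separators=(",", ":"))
def d(value: str) -> dt.datetime:
    """Parse an ISO-8601 timestamp (a trailing Z is accepted) and drop the tzinfo."""
    return dt.datetime.fromisoformat(value.replace("Z", "+00:00")).replace(tzinfo=None)
def etag(value: Any) -> str:
    """Return a short content hash: the first 16 hex chars of SHA-256 over the compact JSON."""
    return hashlib.sha256(j(value).encode("utf-8")).hexdigest()[:16]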
REAL_SAMPLE_REPORT_ID = "rep_bis_notebooklm_sample"
REAL_SAMPLE_ROOT = (
    Path.home()
    / "Projects/team-project/mall-docs/products/type3-orbit/report-notebooklm/docs.jimme.local/report-notebooklm/notebooklm-capability-bis-2026-06-02"
)
REAL_SAMPLE_ARTIFACTS = REAL_SAMPLE_ROOT / "artifacts"
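def read_real_sample(name: str) -> str:
    """Return the text of a local real-sample artifact, or "" when the file is absent."""
    path = REAL_SAMPLE_ARTIFACTS / name
    if not path.exists():
        return ""
    return path.read_text(encoding="utf-8-sig")
def clean_markdown_text(value: str) -> str:
    """Drop citation markers such as [3] or [1-4], bold markers, and backticks, then collapse whitespace."""
    text = re.sub(r"\[\d+(?:[-, ]+\d+)*\]", "", value)
    text = re.sub(r"\*\*(.*?)\*\*", r"\1", text)
    text = text.replace("`", "")
    return re.sub(r"\s+", " ", text).strip()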
def markdown_sections(markdown: str, *, min_level: int = 2, limit: int = 8) -> list[dict[str, str]]:
    """Split markdown into {heading, body} pairs at headings of min_level..4 hashes, stopping at "## Citations"."""
    sections: list[dict[str, str]] = []
    current_heading = ""
    current_lines: list[str] = []
    heading_re = re.compile(r"^(#{%d,4})\s+(.+)$" % min_level)
    for raw_line in markdown.splitlines():
        line = raw_line.strip()
        if line == "## Citations":
            break
        match = heading_re.match(line)
        if match:
            if current_heading and current_lines:
                body = clean_markdown_text("\n".join(current_lines))
                if body:
                    sections.append({"heading": current_heading, "body": body})
            current_heading = clean_markdown_text(match.group(2))
            current_lines = []
            continue
        # Skip horizontal rules and table rows.
        if current_heading and line and not line.startswith("---") and not line.startswith("|"):
            current_lines.append(line)
    if current_heading and current_lines:
        body = clean_markdown_text("\n".join(current_lines))
        if body:
            sections.append({"heading": current_heading, "body": body})
    return sections[:limit]
def numbered_sections(markdown: str, *, limit: int = 8) -> list[dict[str, str]]:
    """Split markdown into {heading, body} pairs at numbered items like "1. Title" or "### 2. **Title**"."""
    sections: list[dict[str, str]] = []
    pattern = re.compile(r"^(?:###\s*)?\d+\.\s+\**(.+?)\**$")
    current_heading = ""
    current_lines: list[str] = []
    for raw_line in markdown.splitlines():
        line = raw_line.strip()
        if line == "## Citations":
            break
        match = pattern.match(line)
        if match:
            if current_heading and current_lines:
                body = clean_markdown_text("\n".join(current_lines))
                if body:
                    sections.append({"heading": current_heading, "body": body})
            current_heading = clean_markdown_text(match.group(1))
            current_lines = []
            continue
        if current_heading and line and not line.startswith("#") and not line.startswith("---"):
            current_lines.append(line)
    if current_heading and current_lines:
        body = clean_markdown_text("\n".join(current_lines))
        if body:
            sections.append({"heading": current_heading, "body": body})
    return sections[:limit]
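def split_heading_body(section: dict[str, str]) -> tuple[str, str]:
    """Split a body at the first evidence/impact marker; return ("", body) when no marker is found."""
    body = section["body"]
    # Accept both ASCII and full-width colons after the marker.
    parts = re.split(r"研报观点与证据[:：]|证据[:：]|影响[:：]", body, maxsplit=1)
    if len(parts) == 2:
        return clean_markdown_text(parts[0]), clean_markdown_text(parts[1])
    return "", body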
def key_data_rows() -> list[dict[str, str]]:
    csv_text = read_real_sample("data-table.csv")
    rows: list[dict[str, str]] = []
    if csv_text:
        for row in csv.DictReader(csv_text.splitlines()):
            rows.append(
                {
                    "metric": row.get("数据点/指标名称", ""),
                    "value": row.get("定量数值或趋势", ""),
                    "unit": "",
                    "importance": row.get("风险/修订指示", ""),
                    "judgment": row.get("相关行业或资产类别", ""),
                }
            )
    if rows:
        return rows
    return [
        {"metric": "M7 市值占比", "value": "近 35%", "unit": "", "importance": "提示指数集中度风险", "judgment": "美国大型科技股"},
        {"metric": "SRT 覆盖贷款", "value": "约 8000 亿欧元", "unit": "", "importance": "提示隐藏信贷风险规模", "judgment": "银行业 / 非银机构"},
    ]
def sample_artifact_types() -> list[str]:
    return [
        "describe-source",
        "native_briefing_doc",
        "native_blog_post",
        "native_study_guide",
        "data_table",
        "query_dimensions",
        "query_key_data",
        "query_divergence",
        "query_weaknesses",
        "query_timeline",
        "query_related_sources",
        "audio_brief",
    ]
MODULE_TITLES = {
    "basic_info": "报告概览",
    "executive_overview": "报告摘要",
    "audio": "听研报",
    "core_insights": "报告要点",
    "key_data": "报告中的关键数据",
    "differentiated_view": "观点差异",
    "weaknesses": "局限与疑问",
    "timeline": "时间线",
    "study_guide": "术语与问答",
    "structure_graph": "结构梳理",
    "related_sources": "延伸阅读",
    "source_compliance": "报告来源",
}
MODULE_DISPLAY_ORDER = {module_type: index for index, module_type in enumerate(MODULE_TITLES)}
def real_sample_module_envelope(module_type: str, report_id: str, title: str, institution_name: str) -> dict[str, Any]:
    briefing = read_real_sample("briefing-doc.md")
    blog = read_real_sample("blog-post.md")
    study = read_real_sample("study-guide.md")
    dimensions = read_real_sample("query-dimensions.md")
    key_data = read_real_sample("query-key-data.md")
    divergence = read_real_sample("query-divergence.md")
    weaknesses = read_real_sample("query-weaknesses.md")
    timeline = read_real_sample("query-timeline.md")
    related_sources = read_real_sample("query-related-sources.md")
    briefing_sections = markdown_sections(briefing, min_level=2, limit=7)
    blog_sections = markdown_sections(blog, min_level=3, limit=7)
    dimension_sections = numbered_sections(dimensions, limit=5)
    key_data_sections = numbered_sections(key_data, limit=12)
    divergence_sections = numbered_sections(divergence, limit=5)
    weakness_sections = numbered_sections(weaknesses, limit=5)
    timeline_sections = numbered_sections(timeline, limit=10)
    related_sections = numbered_sections(related_sources, limit=8)
    study_faq = numbered_sections(study, limit=5)
    key_rows = key_data_rows()
    core_points = [
        {"kind": "view", "text": "市场表面平静,但底层已经从美国大型科技股向欧洲、日本、新兴市场、价值股和小盘股重新轮动。"},
        {"kind": "number", "text": "M7 在标普 500 指数中的市值占比接近 35%,单一板块波动正在显著影响指数风险。"},
        {"kind": "risk", "text": "AI 基础设施融资从现金流叙事转向债务和表外融资,私人信贷、保险公司和银行授信之间的关联增强。"},
        {"kind": "risk", "text": "白银在 2026 年 1 月先涨超 50%、后单日跌近 30%,暴露了杠杆 ETF 再平衡和保证金触发平仓的放大效应。"},
    ]
    base = {
        "basic_info": {
            "content": {
                "report_id": report_id,
                "title_cn": title,
                "summary_cn": "BIS 2026 年 3 月季度评论,回顾 2025 年 11 月 29 日至 2026 年 3 月 5 日的全球金融市场变化,覆盖市场轮动、AI 融资、私募信贷、贵金属和新兴市场政策反应。",
                "topics": ["宏观金融", "金融稳定", "AI 融资", "非银风险"],
                "interpretation_label": "研报解读",
            }
        },
        "executive_overview": {
            "preview": {
                "preview_summary": "报告认为,本轮变化不是单一资产回调,而是高估值科技股、AI 基础设施融资、贵金属杠杆交易和非银信用链条共同推动的市场重新校准。它的核心价值在于把看似分散的市场波动,放回金融稳定和跨市场传导的框架里理解。",
                "section_count": len(briefing_sections) + len(blog_sections),
                "key_quote_snippet": "全球金融市场在表面的平静下经历了深刻的流向切换与重新校准。",
                "highlights": ["资金从美国大型科技股转向欧洲、日本和新兴市场", "AI 基础设施融资开始暴露信用风险", "贵金属波动显示杠杆交易的放大效应"],
            },
            "full": {
                "intro_cn": "这份报告把 2025 年底到 2026 年初的市场变化概括为一次跨资产重新校准:美国大型科技股降温,资金转向欧洲、日本和新兴市场;AI 融资从高成长叙事进入债务和表外风险阶段;贵金属和非银金融机构的波动说明杠杆与流动性仍是金融稳定的关键变量。",
                "sections": briefing_sections + blog_sections[:4],
                "source_artifacts": ["native_briefing_doc", "native_blog_post"],
            },
        },
        "core_insights": {
            "content": {"points": core_points},
            "full": {
                "points": core_points,
                "dimensions": [{"dimension": item["heading"], "summary": item["body"]} for item in dimension_sections],
            },
        },
        "key_data": {
            "preview": {
                # Derive counts from the parsed data so headlines stay correct if the sample files change.
                "preview_headline": f"{len(key_rows)} 个真实关键数据点",
                "highlights": [f"{row['metric']}{row['value']}" for row in key_rows[:3]],
                "row_count": len(key_rows),
            },
            "full": {
                "rows": key_rows,
                "source_artifacts": ["data_table", "query_key_data"],
                "supporting_notes": [{"heading": item["heading"], "body": item["body"]} for item in key_data_sections[:6]],
            },
        },
        "source_compliance": {
            "content": {
                "source_url": "https://www.bis.org/publ/qtrpdf/r_qt2603.htm",
                "source_note": "原文为 BIS Quarterly Review, March 2026 的公开研报;本页仅提供中文解读,不提供解读内容下载。",
                "copyright_cn": "原文版权归发布机构所有;本页为基于公开研报整理的中文阅读辅助。",
                "disclaimer": "本内容仅供研报阅读参考,不构成投资建议。",
                "ai_generated_label": "AI 辅助生成",
            }
        },
        "differentiated_view": {
            "preview": {
                "preview_headline": f"{len(divergence_sections)} 处与常见叙事的分歧",
                "highlights": [item["heading"] for item in divergence_sections[:3]],
                "divergence_count": len(divergence_sections),
            },
            "full": {
                "divergences": [
                    {
                        "topic": item["heading"],
                        "consensus_view": split_heading_body(item)[0] or "常规叙事没有充分覆盖该维度。",
                        "report_position": split_heading_body(item)[1],
                    }
                    for item in divergence_sections
                ]
            },
        },
        "weaknesses": {
            "preview": {
                "preview_headline": f"{len(weakness_sections)} 个论证弱点与反方向证据",
                "highlights": [item["heading"] for item in weakness_sections[:3]],
                "item_count": len(weakness_sections),
                "disclaimer_brief": "只做论证质量分析,不做投资建议。",
            },
            "full": {
                "disclaimer_cn": "以下仅分析研报论证质量,不构成投资建议。",
                "verification_notes": ["以上问题需要结合后续市场数据、原文脚注和反方向证据继续验证。"],
                "items": [
                    {
                        "topic": item["heading"],
                        "weakness": item["body"],
                        "counter_evidence": "需要结合后续数据、原文脚注与反方向证据继续验证。",
                    }
                    for item in weakness_sections
                ],
            },
        },
        "timeline": {
            "preview": {
                "preview_headline": f"{len(timeline_sections)} 个关键事件节点",
                "date_range": "1990s-2026",
                "highlights": [item["heading"] for item in timeline_sections[:3]],
                "event_count": len(timeline_sections),
            },
            "full": {
                "events": [
                    {
                        "date": item["heading"],
                        "period_type": "report_timeline",
                        "event": item["heading"],
                        "impact": item["body"],
                    }
                    for item in timeline_sections
                ]
            },
        },
        "study_guide": {
            "preview": {
                "preview_headline": "术语与问答",
                "faq_count": len(study_faq),
                "glossary_count": 8,
                "sample_question": study_faq[0]["heading"] if study_faq else "为什么要读这份 BIS 季报?",
                "highlights": ["核心概念摘要", "简答练习题", "重要术语表"],
            },
            "full": {
                "intro_cn": "这一部分整理了阅读本篇研报时容易遇到的概念、问题和术语。",
                "faq_items": [{"question": item["heading"], "answer": item["body"]} for item in study_faq],
                "glossary": [
                    {"term": "M7", "definition": "主导美国股市的七大科技巨头。"},
                    {"term": "SRT", "definition": "合成风险转移,银行通过衍生品或担保转移部分信用风险。"},
                    {"term": "BISTRO", "definition": "BIS Time-series Regression Oracle,宏观时间序列预测工具。"},
                    {"term": "NBFI", "definition": "非银行金融机构。"},
                    {"term": "Shadow Borrowing", "definition": "经济实质类似债务、但主要存在于资产负债表外的融资安排。"},
                    {"term": "BDCs", "definition": "业务发展公司,是私募信贷市场的公开交易窗口之一。"},
                    {"term": "Carry Trade", "definition": "借入低息货币、投资高息资产的套利交易。"},
                    {"term": "Margin-triggered Liquidations", "definition": "保证金要求上升触发的被迫平仓。"},
                ],
            },
        },
        "structure_graph": {
            "preview": {
                "preview_headline": "结构梳理",
                "root": "BIS 季报:分析框架",
                "top_nodes": [item["heading"] for item in dimension_sections[:5]],
                "fallback_derived": True,
            },
            "full": {
                "root": "BIS 季报:分析框架",
                "nodes": [
                    {
                        "label": item["heading"],
                        # Up to three child phrases per dimension, split on Chinese full stops and semicolons.
                        "children": [phrase.strip() for phrase in re.split(r"[。；;]", item["body"])[:3] if phrase.strip()],
                    }
                    for item in dimension_sections
                ],
                "fallback_derived": True,
                "source_artifacts": ["query_dimensions"],
            },
        },
        "related_sources": {
            "content": {
                "items": [
                    {"title": item["heading"], "source_name": "延伸资料", "summary_cn": item["body"]}
                    for item in related_sections[:3]
                ],
                "review_note": "延伸来源仅作为候选队列,正式展示前需要人工审核。",
            },
            "full": {
                "items": [
                    {"title": item["heading"], "source_name": "延伸资料", "summary_cn": item["body"]}
                    for item in related_sections
                ],
                "review_note": "延伸来源仅作为候选队列,正式展示前需要人工审核。",
            },
        },
        "audio": {
            "content": {
                "audio_id": "aud_bis_notebooklm_sample",
                "title_cn": "BIS 季度评论",
                "duration_sec": 75,
                "chapters": [],
            }
        },
    }
    return base[module_type]
INSTITUTIONS = [
    ("inst_wgc", "世界黄金协会", "World Gold Council", "industry_org", "tier_1", "https://www.gold.org/", ["贵金属", "央行"]),
    ("inst_imf", "国际货币基金组织", "International Monetary Fund", "international_org", "tier_1", "https://www.imf.org/", ["宏观金融", "外汇"]),
    ("inst_world_bank", "世界银行", "World Bank", "international_org", "tier_1", "https://www.worldbank.org/", ["大宗商品", "发展经济"]),
    ("inst_iea", "国际能源署", "International Energy Agency", "international_org", "tier_1", "https://www.iea.org/", ["能源", "原油"]),
    ("inst_eia", "美国能源信息署", "U.S. Energy Information Administration", "official", "tier_1", "https://www.eia.gov/", ["能源", "原油"]),
    ("inst_usgs", "美国地质调查局", "U.S. Geological Survey", "official", "tier_1", "https://www.usgs.gov/", ["矿产", "贵金属"]),
    ("inst_ecb", "欧洲央行", "European Central Bank", "official", "tier_1", "https://www.ecb.europa.eu/", ["货币政策", "欧元区"]),
    ("inst_bis", "国际清算银行", "Bank for International Settlements", "international_org", "tier_1", "https://www.bis.org/", ["宏观金融", "金融稳定"]),
    ("inst_fed", "美联储", "Federal Reserve", "official", "tier_1", "https://www.federalreserve.gov/", ["货币政策", "美元"]),
    ("inst_opec", "欧佩克", "OPEC", "international_org", "tier_1", "https://www.opec.org/", ["能源", "原油"]),
    ("inst_ssga", "道富环球投资管理", "State Street Global Advisors", "asset_manager", "tier_2", "https://www.ssga.com/", ["贵金属", "跨资产"]),
    ("inst_wisdomtree", "WisdomTree", "WisdomTree", "asset_manager", "tier_2", "https://www.wisdomtree.com/", ["大宗商品", "资产配置"]),
    ("inst_ing", "ING 银行研究", "ING Think", "bank_research", "tier_2", "https://think.ing.com/", ["贵金属", "外汇"]),
    ("inst_silver_institute", "白银协会", "The Silver Institute", "industry_org", "tier_2", "https://silverinstitute.org/", ["白银", "矿产"]),
    ("inst_goldman", "高盛研究", "Goldman Sachs Research", "bank_research", "tier_3", "https://www.goldmansachs.com/", ["大宗商品", "宏观"]),
    ("inst_jpm", "摩根大通研究", "J.P. Morgan Research", "bank_research", "tier_3", "https://www.jpmorgan.com/", ["大宗商品", "宏观"]),
    ("inst_invesco", "景顺", "Invesco", "asset_manager", "tier_3", "https://www.invesco.com/", ["ETF", "资产配置"]),
    ("inst_pas", "泛美白银", "Pan American Silver", "partner", "tier_3", "https://www.panamericansilver.com/", ["白银", "矿业"]),
]
BASE_REPORTS = [
    (REAL_SAMPLE_REPORT_ID, "BIS 季度评论:全球金融市场重新校准", "inst_bis", "official_public", True, ["宏观金融", "金融稳定", "AI 融资", "非银风险"], "2026-06-02T00:00:00Z"),
    ("rep_ssga_gold", "黄金月报:金价新高之后,谁在继续买?", "inst_ssga", "authorized_partner", True, ["贵金属", "跨资产"], "2026-05-22T00:00:00Z"),
    ("rep_wb_pinksheet", "世界银行大宗商品价格表:金属分化继续", "inst_world_bank", "official_public", True, ["大宗商品", "金属"], "2026-05-20T00:00:00Z"),
    ("rep_iea_omr", "IEA 原油市场月报:库存与需求再平衡", "inst_iea", "official_public", True, ["能源", "原油"], "2026-05-18T00:00:00Z"),
    ("rep_ing_gold", "ING 黄金观点:实际利率回摆的压力测试", "inst_ing", "authorized_partner", False, ["贵金属", "外汇"], "2026-05-16T00:00:00Z"),
    ("rep_wisdomtree_outlook", "WisdomTree 商品展望:配置窗口与回撤风险", "inst_wisdomtree", "authorized_partner", False, ["大宗商品", "资产配置"], "2026-05-14T00:00:00Z"),
    ("rep_usgs_minerals", "USGS 矿产摘要:关键金属供给约束", "inst_usgs", "official_public", True, ["矿产", "贵金属"], "2026-05-12T00:00:00Z"),
    ("rep_pas_silver", "白银矿业更新:供给扰动与成本曲线", "inst_pas", "broker_public_gray", False, ["白银", "矿业"], "2026-05-10T00:00:00Z"),
    ("rep_eia_steo", "EIA 短期能源展望:油气价格情景", "inst_eia", "official_public", True, ["能源", "原油"], "2026-05-08T00:00:00Z"),
]
LIGHT_REPORTS = [
    ("rep_imf_weo", "IMF 世界经济展望:增长分化与政策空间", "inst_imf", "official_public", True, ["宏观金融"], "2026-05-06T00:00:00Z"),
    ("rep_bis_quarterly", "BIS 季报:市场重新校准", "inst_bis", "official_public", True, ["宏观金融", "金融稳定"], "2026-05-04T00:00:00Z"),
    ("rep_fed_fsr", "美联储金融稳定报告:杠杆与流动性", "inst_fed", "official_public", True, ["金融稳定"], "2026-05-02T00:00:00Z"),
    ("rep_ecb_bulletin", "欧洲央行经济公报:通胀路径更新", "inst_ecb", "official_public", True, ["货币政策"], "2026-04-30T00:00:00Z"),
    ("rep_opec_momr", "OPEC 月报:供需缺口与配额纪律", "inst_opec", "official_public", True, ["能源", "原油"], "2026-04-28T00:00:00Z"),
    ("rep_wgc_trends", "世界黄金协会:黄金需求趋势", "inst_wgc", "official_public", True, ["贵金属", "央行"], "2026-04-26T00:00:00Z"),
    ("rep_silver_survey", "白银协会:白银供需调查", "inst_silver_institute", "official_public", True, ["白银"], "2026-04-24T00:00:00Z"),
    ("rep_gs_commodity", "高盛商品观点:再通胀交易复盘", "inst_goldman", "broker_public_gray", False, ["大宗商品"], "2026-04-22T00:00:00Z"),
    ("rep_jpm_flows", "摩根大通资金流:商品 ETF 与风险偏好", "inst_jpm", "authorized_partner", False, ["跨资产"], "2026-04-20T00:00:00Z"),
    ("rep_invesco_etf", "景顺 ETF 观察:黄金与能源配置", "inst_invesco", "authorized_partner", False, ["ETF", "贵金属"], "2026-04-18T00:00:00Z"),
    ("rep_world_bank_macro", "世界银行宏观更新:贸易与大宗商品", "inst_world_bank", "official_public", True, ["宏观金融", "大宗商品"], "2026-04-16T00:00:00Z"),
    ("rep_iea_gas", "IEA 天然气市场报告:需求弹性", "inst_iea", "official_public", True, ["能源"], "2026-04-14T00:00:00Z"),
    ("rep_eia_inventory", "EIA 库存周报解读:裂解价差与需求", "inst_eia", "official_public", False, ["能源"], "2026-04-12T00:00:00Z"),
    ("rep_usgs_copper", "USGS 铜矿供给:项目延迟与品位下降", "inst_usgs", "official_public", False, ["矿产"], "2026-04-10T00:00:00Z"),
    ("rep_ing_fx", "ING 外汇周报:美元路径与黄金敏感性", "inst_ing", "authorized_partner", False, ["外汇", "贵金属"], "2026-04-08T00:00:00Z"),
    ("rep_wisdomtree_gold", "WisdomTree 黄金配置:避险与实际利率", "inst_wisdomtree", "authorized_partner", False, ["贵金属"], "2026-04-06T00:00:00Z"),
    ("rep_ecb_stability", "欧洲央行稳定评估:非银金融风险", "inst_ecb", "official_public", False, ["金融稳定"], "2026-04-04T00:00:00Z"),
    ("rep_bis_ai_credit", "BIS 专题:AI 融资与信用风险", "inst_bis", "official_public", False, ["金融稳定", "AI"], "2026-04-02T00:00:00Z"),
]
def module_envelope(module_type: str, report_id: str, title: str, institution_name: str, *, fallback: bool = False) -> dict[str, Any]:
    base = {
        "basic_info": {"content": {"report_id": report_id, "title_cn": title, "summary_cn": f"{title} 的基础信息,包含发布机构、发布时间、主题标签和来源层级。", "topics": ["贵金属"], "interpretation_label": "研报解读"}},
        "executive_overview": {
            "preview": {"preview_summary": f"{title} 的结构化摘要,聚焦核心结论、数据线索与风险边界。", "section_count": 3, "key_quote_snippet": "公开研报显示关键变量正在重新定价。"},
            "full": {"intro_cn": f"{title} 的执行摘要。", "sections": [{"heading": "核心结论", "body": "报告把需求、价格和风险拆成可读结构。"}, {"heading": "数据线索", "body": "关键指标用于判断趋势是否可持续。"}, {"heading": "风险边界", "body": "外部冲击和估值回摆仍可能改变短期路径。"}], "source_artifacts": ["native_briefing_doc", "native_blog_post"]},
        },
        "core_insights": {"content": {"points": [{"kind": "view", "text": "核心变量从情绪驱动转向结构驱动。"}, {"kind": "number", "text": "多项关键指标出现同步变化。"}, {"kind": "risk", "text": "若宏观假设反转,短期波动会放大。"}]}, "full": {"dimensions": [{"dimension": "需求结构", "summary": "机构、ETF 与产业需求变化共同影响价格。"}, {"dimension": "风险路径", "summary": "利率、美元和地缘冲击是主要风险因子。"}]}},
        "key_data": {"preview": {"preview_headline": "10 个关键数据点", "highlights": ["央行购金保持韧性", "ETF 资金重新流入", "库存周期出现分化"], "row_count": 10}, "full": {"rows": [{"metric": "样本指标", "value": "10", "unit": "", "importance": "用于验证关键数据模块渲染", "judgment": "方向性信号清晰"}], "source_artifacts": ["data_table", "query_key_data"]}},
        "source_compliance": {"content": {"source_url": None if report_id == "rep_pas_silver" else "https://example.org/public-report", "source_note": "灰度来源仅展示来源说明,不提供原文链接。" if report_id == "rep_pas_silver" else "原文来源于机构公开研究页。", "copyright_cn": "内容基于机构公开研报的中文结构化解读。", "disclaimer": "本内容不构成投资建议。", "ai_generated_label": "AI 辅助生成"}},
        "differentiated_view": {"preview": {"preview_headline": "3 处与共识的关键分歧", "highlights": ["结构性买盘强于短期情绪", "库存周期解释部分价格韧性"], "divergence_count": 3}, "full": {"divergences": [{"topic": "买盘结构", "consensus_view": "价格主要由短期情绪驱动。", "report_position": "报告强调更稳定的结构性买盘。"}]}},
        "weaknesses": {"preview": {"preview_headline": "3 处质疑点与开放问题", "highlights": ["样本窗口偏短", "反方向证据仍需跟踪"], "item_count": 3, "disclaimer_brief": "AI 辅助论证质量分析"}, "full": {"disclaimer_cn": "仅供学习参考,不构成投资建议。", "verification_notes": ["这些开放问题需要结合后续数据、原文脚注和反方向证据继续验证。"], "items": [{"topic": "样本窗口", "weakness": "短周期数据可能放大结论。", "counter_evidence": "后续数据可能修正方向。"}]}},
        "timeline": {"preview": {"preview_headline": "5 个关键事件节点", "date_range": "2025-2026", "highlights": ["2026:价格重新定价", "2025:资金结构切换"], "event_count": 5}, "full": {"events": [{"date": "2026-05", "period_type": "review_period", "event": "报告发布", "impact": "为市场判断提供公开依据。"}]}},
        "study_guide": {"preview": {"preview_headline": "学习指南", "faq_count": 3, "glossary_count": 5, "sample_question": "这份报告适合谁读?"}, "full": {"intro_cn": "学习指南帮助读者理解术语和关键问题。", "faq_items": [{"question": "这份报告适合谁读?", "answer": "适合关注宏观、商品和资产配置的读者。"}], "glossary": [{"term": "source_tier", "definition": "来源可信层级。"}]}},
        "structure_graph": {"preview": {"preview_headline": "研报结构图", "root": f"{title}:分析框架", "top_nodes": ["需求", "价格", "风险"], "fallback_derived": fallback}, "full": {"root": f"{title}:分析框架", "nodes": [{"label": "需求", "children": ["机构", "产业", "投资"]}, {"label": "价格", "children": ["利率", "美元", "库存"]}], "fallback_derived": fallback, "source_artifacts": ["query_dimensions"] if fallback else ["mind_map"]}},
        "audio": {"content": {"audio_id": f"aud_{report_id.removeprefix('rep_')}", "title_cn": f"{title} 音频摘要", "duration_sec": 180, "chapters": []}},
    }
    return base[module_type]
def rich_module_types(report_id: str) -> list[str]:
    by_report = {
        REAL_SAMPLE_REPORT_ID: [
            "basic_info",
            "executive_overview",
            "core_insights",
            "key_data",
            "source_compliance",
            "institution",
            "differentiated_view",
            "weaknesses",
            "timeline",
            "study_guide",
            "structure_graph",
            "related_sources",
            "audio",
        ],
        "rep_ssga_gold": ["basic_info", "executive_overview", "core_insights", "key_data", "source_compliance", "institution", "differentiated_view", "weaknesses", "timeline", "study_guide", "structure_graph", "audio"],
        "rep_wb_pinksheet": ["basic_info", "executive_overview", "core_insights", "key_data", "source_compliance", "institution", "timeline", "study_guide", "audio"],
        "rep_iea_omr": ["basic_info", "executive_overview", "core_insights", "key_data", "source_compliance", "institution", "study_guide", "structure_graph", "audio"],
        "rep_ing_gold": ["basic_info", "executive_overview", "core_insights", "key_data", "source_compliance", "institution"],
        "rep_wisdomtree_outlook": ["basic_info", "executive_overview", "core_insights", "source_compliance", "institution", "timeline"],
        "rep_usgs_minerals": ["basic_info", "executive_overview", "core_insights", "key_data", "source_compliance", "institution", "timeline", "structure_graph", "audio"],
        "rep_pas_silver": ["basic_info", "executive_overview", "core_insights", "key_data", "source_compliance", "institution"],
        "rep_eia_steo": ["basic_info", "executive_overview", "core_insights", "key_data", "source_compliance", "institution", "study_guide", "audio"],
    }
    return by_report.get(report_id, ["basic_info", "executive_overview", "core_insights", "key_data", "source_compliance", "institution"])
async def reset(session: AsyncSession) -> None:
    # Delete child tables before their parents.
    for model in [
        OutboundEvent,
        PlaybackProgress,
        SavedListen,
        ReadingHistory,
        Favorite,
        User,
        RelatedNews,
        AudioAsset,
        DisplayModule,
        DisplayArtifact,
        RawArtifact,
        Report,
        Institution,
    ]:
        await session.execute(delete(model))
    await session.commit()
async def import_seed(session: AsyncSession) -> None:
await reset(session)
inst_lookup: dict[str, str] = {}
for inst_id, name_cn, name_en, inst_type, tier, url, topics in INSTITUTIONS:
inst_lookup[inst_id] = name_cn
session.add(Institution(institution_id=inst_id, name_cn=name_cn, name_en=name_en, institution_type=inst_type, source_tier=tier, website_url=url, covered_topics=j(topics), intro_cn=f"{name_cn} 的公开研究和数据用于 Phase 1 seed 展示。", credibility_note=f"{name_cn}{tier} 来源。", status="active"))
await session.flush()
all_reports = BASE_REPORTS + LIGHT_REPORTS
audio_report_ids = {report_id for report_id, *_rest, has_audio, _topics, _date in all_reports if has_audio}
for idx, (report_id, title, inst_id, source_tier, has_audio, topics, released) in enumerate(all_reports, start=1):
display_status = "draft" if report_id == "rep_wisdomtree_outlook" else "published"
source_url = None if source_tier == "broker_public_gray" else "https://example.org/public-report"
source_note = "灰度公开来源,仅保留来源说明,不做默认音频化。" if source_tier == "broker_public_gray" else "原文来源于机构公开研究页。"
if report_id == REAL_SAMPLE_REPORT_ID:
source_url = "https://www.bis.org/publ/qtrpdf/r_qt2603.htm"
source_note = "原文为 BIS Quarterly Review, March 2026 的公开研报。"
session.add(
Report(
report_id=report_id,
report_type="single",
title_cn=title,
subtitle_cn="",
original_title="BIS Quarterly Review, March 2026" if report_id == REAL_SAMPLE_REPORT_ID else f"{title} original",
one_liner="2025 年底至 2026 年初,全球金融市场在表面平静下出现资金流向切换,AI 融资、贵金属杠杆和非银风险成为主要线索。" if report_id == REAL_SAMPLE_REPORT_ID else f"{title} 的一分钟结构化摘要。",
institution_id=inst_id,
source_tier=source_tier,
source_url=source_url,
source_note=source_note,
published_at=d(released),
interpreted_at=d(released),
released_at=d(released),
topics=j(topics),
language="en",
has_audio=has_audio,
display_status=display_status,
display_version=1,
cache_version=f"{report_id}:v1",
risk_disclaimer="本内容为公开研报的结构化解读,不构成投资建议。",
interpretation_label="研报解读",
)
)
await session.flush()
da_id = f"da_{report_id.removeprefix('rep_')}_v1"
session.add(DisplayArtifact(display_artifact_id=da_id, report_id=report_id, display_version=1, title_cn=title, summary_cn=f"{title} seed display artifact", source_label=inst_lookup[inst_id], interpretation_label="研报解读", ai_generated_label="AI 辅助生成", synthesis_type="mixed" if has_audio else "text", source_disclosure_text=source_note, review_status="published", published_at=d(released)))
await session.flush()
artifact_types = sample_artifact_types() if report_id == REAL_SAMPLE_REPORT_ID else ["native_briefing_doc", "native_blog_post", "native_study_guide", "data_table", "query_dimensions", "query_key_data"]
for artifact_type in artifact_types:
session.add(RawArtifact(raw_artifact_id=f"raw_{report_id.removeprefix('rep_')}_{artifact_type}", report_id=report_id, artifact_type=artifact_type, payload_format="markdown" if artifact_type != "data_table" else "csv", status="ok", is_publish_blocking=artifact_type in {"native_briefing_doc", "native_blog_post", "data_table", "query_dimensions", "query_key_data"}, retention_status="retained", ingested_at=d(released)))
if report_id == "rep_iea_omr":
session.add(
RawArtifact(
raw_artifact_id="raw_iea_omr_mind_map",
report_id=report_id,
artifact_type="mind_map",
payload_format="json",
status="failed",
error="Download failed for mind_map",
is_publish_blocking=False,
retention_status="retained",
ingested_at=d(released),
)
)
module_types = [
value
for value in sorted(
rich_module_types(report_id),
key=lambda value: MODULE_DISPLAY_ORDER.get(value, len(MODULE_DISPLAY_ORDER)),
)
if value != "institution"
]
for order, module_type in enumerate(module_types):
if report_id == REAL_SAMPLE_REPORT_ID:
payload = real_sample_module_envelope(module_type, report_id, title, inst_lookup[inst_id])
else:
payload = module_envelope(module_type, report_id, title, inst_lookup[inst_id], fallback=(report_id == "rep_iea_omr" and module_type == "structure_graph"))
module_id = f"mod_{report_id.removeprefix('rep_')}_{module_type}"
content_ref = f"rnb/modules/{module_id}.json" if "full" in payload else None
session.add(
DisplayModule(
module_id=module_id,
report_id=report_id,
display_artifact_id=da_id,
type=module_type,
title_cn=MODULE_TITLES.get(module_type, module_type),
content_format="json",
content=j(payload),
content_ref=content_ref,
content_etag=etag(payload),
source_raw_artifact_ids=j([]),
status="published" if display_status == "published" else "review",
sort_order=order,
version=1,
)
)
if has_audio and report_id in audio_report_ids:
audio_id = f"aud_{report_id.removeprefix('rep_')}"
session.add(
AudioAsset(
audio_id=audio_id,
report_id=report_id,
title_cn=f"{title} 音频摘要",
duration_sec=180 + idx,
oss_key=f"rnb/audio/{audio_id}.m4a",
chapters=j([]),
status="published" if display_status == "published" else "review",
published_at=d(released),
)
)
if idx <= 15:
session.add(
RelatedNews(
related_news_id=f"news_{idx:03d}",
report_id=report_id,
title=f"{title} 延伸阅读",
source_name="公开财经资讯",
source_url="https://example.org/news",
published_at=d(released),
language="zh",
summary_cn="整理自公开财经资讯的延伸阅读。",
match_method="manual_curated",
match_keywords=j(topics),
match_confidence="medium",
status="published",
)
)
await session.flush()
for inst_id in inst_lookup:
reports = (await session.execute(select(Report).where(Report.institution_id == inst_id, Report.display_status == "published").order_by(Report.released_at.desc()))).scalars().all()
inst = (await session.execute(select(Institution).where(Institution.institution_id == inst_id))).scalar_one()
inst.report_count = len(reports)
if reports:
inst.latest_report_id = reports[0].report_id
inst.latest_report_at = reports[0].released_at
users = [
User(user_id="user_alpha", phone_hash="hash_alpha", display_name="Alpha", status="active"),
User(user_id="user_history", phone_hash="hash_history", display_name="History", status="active"),
User(user_id="user_guest_placeholder", display_name="Guest Placeholder", status="disabled"),
]
session.add_all(users)
await session.flush()
for idx, report_id in enumerate(["rep_ssga_gold", "rep_wb_pinksheet", "rep_iea_omr", "rep_usgs_minerals", "rep_eia_steo"], start=1):
session.add(Favorite(favorite_id=f"fav_{idx:03d}", user_id="user_alpha", report_id=report_id, status="active"))
for idx, report_id in enumerate(["rep_ssga_gold", "rep_wb_pinksheet", "rep_iea_omr"], start=1):
audio_id = f"aud_{report_id.removeprefix('rep_')}"
session.add(
PlaybackProgress(
progress_id=f"prog_{idx:03d}",
user_id="user_alpha",
audio_id=audio_id,
report_id=report_id,
position_sec=idx * 30,
duration_sec=180 + idx,
completed=False,
)
)
await session.commit()


async def main() -> None:
    # Create any missing tables, then load the Phase 1 seed data.
    async with engine.begin() as conn:
        await conn.run_sync(Base.metadata.create_all)
    async with SessionLocal() as session:
        await import_seed(session)
    print("seed import complete")


if __name__ == "__main__":
    asyncio.run(main())
@@ -0,0 +1,117 @@
from __future__ import annotations

import os

# Set before importing app modules so their settings pick up the test database
# and a placeholder Redis URL.
os.environ["RNB_DATABASE_URL"] = "sqlite+aiosqlite:///./test_seed.db"
os.environ["RNB_REDIS_URL"] = "redis://test-redis.invalid/0"

import pytest
from httpx import ASGITransport, AsyncClient
from sqlalchemy import select

from app.db import Base, SessionLocal, engine
from app.main import app
from app.models import AudioAsset, DisplayModule, Institution, Report
from scripts.import_seed_content import import_seed

PREFIX = "/api/report-notebooklm/v1"


@pytest.fixture(autouse=True)
async def seeded_db():
    async with engine.begin() as conn:
        await conn.run_sync(Base.metadata.drop_all)
        await conn.run_sync(Base.metadata.create_all)
    async with SessionLocal() as session:
        await import_seed(session)
    yield


@pytest.fixture
async def client():
    transport = ASGITransport(app=app)
    async with AsyncClient(transport=transport, base_url="http://test") as ac:
        yield ac


async def test_seed_counts_match_phase1_shape():
    async with SessionLocal() as session:
        assert len((await session.execute(select(Institution))).scalars().all()) == 18
        assert len((await session.execute(select(Report))).scalars().all()) == 27
        assert len((await session.execute(select(AudioAsset))).scalars().all()) == 15
        assert len((await session.execute(select(DisplayModule))).scalars().all()) >= 120


async def test_health_and_recommended_feed(client: AsyncClient):
    health = await client.get(f"{PREFIX}/health")
    assert health.status_code == 200
    assert health.json() == {"status": "ok"}

    feed = await client.get(f"{PREFIX}/feed/recommended")
    assert feed.status_code == 200
    body = feed.json()
    assert body["items"]
    assert body["items"][0]["report_id"] == "rep_bis_notebooklm_sample"
    assert "display_version" not in body["items"][0]
    assert body["items"][0]["cache_version"].startswith("rep_")


async def test_report_detail_hides_internal_fields_and_review_modules(client: AsyncClient):
    response = await client.get(f"{PREFIX}/reports/rep_ssga_gold")
    assert response.status_code == 200
    body = response.json()
    assert body["report_id"] == "rep_ssga_gold"
    assert "display_version" not in body

    module_types = [module["type"] for module in body["modules"]]
    assert "study_guide" in module_types
    assert "institution" not in module_types
    assert "faq" not in module_types
    assert "infographic" not in module_types
    assert all(module["has_detail_page"] for module in body["modules"])
    assert module_types[-1] == "source_compliance"

    key_data = next(module for module in body["modules"] if module["type"] == "key_data")
    assert key_data["render_mode"] == "card_plus_page"
    assert key_data["content"] is None
    assert key_data["preview"]["row_count"] == 10
    assert key_data["content_ref"].startswith("rnb/modules/")


async def test_module_endpoint_returns_full_content(client: AsyncClient):
    detail = (await client.get(f"{PREFIX}/reports/rep_ssga_gold")).json()
    key_data = next(module for module in detail["modules"] if module["type"] == "key_data")
    response = await client.get(f"{PREFIX}/reports/rep_ssga_gold/modules/{key_data['module_id']}")
    assert response.status_code == 200
    body = response.json()
    assert body["module_id"] == key_data["module_id"]
    assert "rows" in body["content"]
    assert body["cache_version"] == "rep_ssga_gold:v1"


async def test_boundary_reports(client: AsyncClient):
    listen = (await client.get(f"{PREFIX}/listen")).json()
    listen_report_ids = {item["report_id"] for item in listen["items"]}
    assert "rep_ing_gold" not in listen_report_ids
    assert "rep_pas_silver" not in listen_report_ids

    hidden = await client.get(f"{PREFIX}/reports/rep_wisdomtree_outlook")
    assert hidden.status_code == 404

    gray = (await client.get(f"{PREFIX}/reports/rep_pas_silver")).json()
    compliance = next(module for module in gray["modules"] if module["type"] == "source_compliance")
    assert compliance["content"]["source_url"] is None
    assert "灰度" in compliance["content"]["source_note"]


async def test_institutions_and_listen(client: AsyncClient):
    institutions = await client.get(f"{PREFIX}/institutions")
    assert institutions.status_code == 200
    assert len(institutions.json()["items"]) == 18

    inst = await client.get(f"{PREFIX}/institutions/inst_ssga")
    assert inst.status_code == 200
    assert inst.json()["latest_report"]["report_id"] == "rep_ssga_gold"

    listen = await client.get(f"{PREFIX}/listen")
    assert listen.status_code == 200
    assert listen.json()["items"][0]["audio_id"].startswith("aud_")
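

# Hypothetical helper, not part of the original suite. The assertions above only
# check that `display_version` is missing at the top level of a response body;
# this sketch walks a decoded JSON payload (dicts and lists) and returns a
# JSON-path-like string for every occurrence of `key`, so nested objects such as
# each entry in body["modules"] could be checked as well.
def find_key_paths(payload: object, key: str, path: str = "$") -> list[str]:
    found: list[str] = []
    if isinstance(payload, dict):
        for name, value in payload.items():
            child = f"{path}.{name}"
            if name == key:
                found.append(child)
            # Keep descending even on a match, in case the key is nested again.
            found.extend(find_key_paths(value, key, child))
    elif isinstance(payload, list):
        for index, item in enumerate(payload):
            found.extend(find_key_paths(item, key, f"{path}[{index}]"))
    # Scalars (str, int, None, ...) contain no keys, so they add nothing.
    return found


# Possible use inside one of the async tests above, which already receive `client`:
#     body = (await client.get(f"{PREFIX}/reports/rep_ssga_gold")).json()
#     assert find_key_paths(body, "display_version") == []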