yanting/report-notebooklm-api/docs/CONTENT_PIPELINE.md

# Content Pipeline Handoff

This is a handoff snapshot, not the product's single source of truth (SSOT).

Product SSOT: mall-docs report-notebooklm docs, snapshot date: 2026-06-03.

## Content Principle

Use NotebookLM as a source-driven research engine, not as a generic rewriting model.

The pipeline may orchestrate, clean, validate, map, and review NotebookLM-native artifacts. It must not silently replace missing NotebookLM artifacts with locally rewritten publishable content.

## Source Inputs

Phase 1 content is based on public or authorized institutional research reports. Priority source categories:

- Official public sources.
- Authorized partner sources.
- Gray broker public sources, with stricter review and source display handling.

Vision source lists, tiering, and historical source-health experience may be used as reference material. Production data must not depend on a local Vision runtime, local path, local cache, or local account state.

## NotebookLM Workflow

Recommended report run order:

1. Inspect the source PDF: title, institution, date, page count, size, and report type.
2. Create or reuse one notebook per report source, unless a multi-report synthesis is explicitly planned.
3. Upload the report source.
4. Generate the P0 text package:
   - source description
   - native Briefing Doc
   - native Blog Post
   - data table
   - query dimensions
   - query key data
   - query divergence
   - query weaknesses
5. Generate useful P1 artifacts:
   - query timeline
   - query related sources
   - Study Guide
   - mind map, if download succeeds
6. Generate P2 artifacts asynchronously:
   - infographic candidate
   - audio brief
   - research discovery
7. Persist every artifact status in a manifest.
8. Deterministically assemble display modules from reviewed artifacts.
9. Run human review before publishing.
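The run order above can be sketched as a tiered loop that writes the manifest after every operation. This is a minimal illustration, not the production runner: `run_report`, the `generate` callable, the manifest layout, and the artifact keys (including mapping "source description" to `source_summary`) are all assumptions made for the example.

```python
import json
from pathlib import Path
from typing import Callable

# Tier layout copied from the run order above. The keys are assumed names;
# in particular, "source description" is assumed to map to source_summary.
TIERS: dict[str, list[str]] = {
    "P0": [
        "source_summary", "native_briefing_doc", "native_blog_post",
        "data_table", "query_dimensions", "query_key_data",
        "query_divergence", "query_weaknesses",
    ],
    "P1": ["query_timeline", "query_related_sources", "native_study_guide", "mind_map"],
    "P2": ["infographic", "audio_brief", "research_discovery"],
}


def run_report(generate: Callable[[str], None], manifest_path: Path) -> dict:
    """Run every tier in order and persist the manifest after each operation."""
    manifest: dict[str, dict] = {}
    for tier, artifact_types in TIERS.items():
        for artifact_type in artifact_types:
            try:
                generate(artifact_type)  # hypothetical NotebookLM call
                entry = {"tier": tier, "status": "succeeded", "error": None}
            except Exception as exc:
                # Record the failure; never substitute locally written copy.
                entry = {"tier": tier, "status": "failed", "error": str(exc)}
            manifest[artifact_type] = entry
            # Persist after every operation so an interrupted run can resume.
            manifest_path.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return manifest
```

Writing the file inside the inner loop is deliberate: a crash midway leaves a manifest that still reflects every completed operation.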

## Artifact Types

The Phase 1 schema supports these NotebookLM artifact types:

| Artifact type | Purpose | Publish blocking | Human review |
|---|---|---:|---:|
| `source_summary` | Source-level summary. | No | No |
| `notebook_summary` | Notebook-level summary. | No | No |
| `native_briefing_doc` | Native briefing document. | Yes | No |
| `native_blog_post` | Native blog post. | Yes | No |
| `native_study_guide` | FAQ, study guide, glossary. | No | No |
| `data_table` | Structured table data. | Yes | No |
| `mind_map` | Mind map or graph source. | No | No |
| `query_dimensions` | Analysis dimensions. | Yes | No |
| `query_key_data` | Key data points. | Yes | No |
| `query_divergence` | Views that diverge from consensus. | No | No |
| `query_weaknesses` | Weaknesses and open questions. | No | No |
| `query_timeline` | Timeline and turning points. | No | No |
| `query_related_sources` | Related source candidates. | No | Yes |
| `research_discovery` | Enrichment queue. | No | Yes |
| `infographic` | Candidate public image. | No | Yes |
| `audio_brief` | Listening preview or audio source. | No | No |

Artifact records should keep status, object reference, format, size, hash, generated time, error, and review flags. Raw payloads should stay in object storage and remain internal.
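One possible shape for such a record is sketched below. The field names, the choice of SHA-256, and the `record_for_payload` helper are assumptions for illustration; the actual Phase 1 schema may differ. Note that the record stores only a reference to the payload, never the payload itself.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ArtifactRecord:
    """Hypothetical manifest entry; holds metadata, not the raw payload."""
    artifact_type: str
    status: str                       # e.g. "succeeded" or "failed"
    object_ref: Optional[str] = None  # object-storage key, internal only
    format: Optional[str] = None
    size: Optional[int] = None
    sha256: Optional[str] = None
    generated_at: Optional[str] = None
    error: Optional[str] = None
    needs_review: bool = False
    approved: bool = False


def record_for_payload(artifact_type: str, payload: bytes, object_ref: str,
                       fmt: str, needs_review: bool = False) -> ArtifactRecord:
    """Build a record for a payload that has already been stored elsewhere."""
    return ArtifactRecord(
        artifact_type=artifact_type,
        status="succeeded",
        object_ref=object_ref,
        format=fmt,
        size=len(payload),
        sha256=hashlib.sha256(payload).hexdigest(),
        generated_at=datetime.now(timezone.utc).isoformat(),
        needs_review=needs_review,
    )
```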

## Module Mapping

| Product module | Primary artifact sources | Notes |
|---|---|---|
| `basic_info` | Source metadata and source summary. | P0, inline. |
| `executive_overview` | Briefing Doc and Blog Post. | P0, heavy card plus page. |
| `core_insights` | Briefing Doc and query dimensions. | P0, inline with optional detail page. |
| `key_data` | Data table and query key data. | P0, heavy card plus page. |
| `source_compliance` | Source metadata and review notes. | P0, inline, must include disclaimer. |
| `institution` | Institution record. | P0, inline. |
| `differentiated_view` | Query divergence. | P1, optional. |
| `weaknesses` | Query weaknesses. | P1, optional, avoid investment-advice wording. |
| `timeline` | Query timeline. | P1, optional. |
| `study_guide` | Native Study Guide. | P1, optional, replaces legacy `faq`. |
| `structure_graph` | Mind map or deterministic fallback. | P1, optional. |
| `related_sources` | Related-source query and review queue. | P1, review required before display. |
| `infographic` | Infographic candidate. | P2, review required before display. |
| `audio` | Audio brief or reviewed audio asset. | P2, not required for text publish. |
| `research_discovery` | Research discovery queue. | P2, internal or reviewed only. |
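Part of the mapping above can be expressed as a fixed lookup so that assembly stays deterministic. The sketch below covers only modules whose sources are NotebookLM artifacts; `MODULE_SOURCES` and `assemble_modules` are hypothetical names, and the rule "omit a module when none of its sources is approved" is an assumption, not a documented product decision.

```python
# Artifact-backed modules only. Source metadata, institution records, and
# review notes come from other systems and are left out of this sketch.
MODULE_SOURCES: dict[str, list[str]] = {
    "basic_info": ["source_summary"],
    "executive_overview": ["native_briefing_doc", "native_blog_post"],
    "core_insights": ["native_briefing_doc", "query_dimensions"],
    "key_data": ["data_table", "query_key_data"],
    "differentiated_view": ["query_divergence"],
    "weaknesses": ["query_weaknesses"],
    "timeline": ["query_timeline"],
    "study_guide": ["native_study_guide"],
    "structure_graph": ["mind_map"],
    "related_sources": ["query_related_sources"],
    "infographic": ["infographic"],
    "audio": ["audio_brief"],
    "research_discovery": ["research_discovery"],
}


def assemble_modules(approved_artifacts: set[str]) -> dict[str, list[str]]:
    """Map each module to its approved sources, in a fixed order.

    The same input always yields the same output: module order follows
    MODULE_SOURCES and source order follows each module's list.
    """
    assembled: dict[str, list[str]] = {}
    for module, sources in MODULE_SOURCES.items():
        available = [s for s in sources if s in approved_artifacts]
        if available:  # assumed rule: skip modules with no approved source
            assembled[module] = available
    return assembled
```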

## Publish Gates

Blocking before public release:

- Source upload succeeded and is traceable.
- Required P0 text artifacts exist and have usable content.
- `basic_info`, `executive_overview`, `core_insights`, `key_data`, and `source_compliance` are present unless a product decision allows a partial report.
- The display artifact has been reviewed and approved.
- Source attribution and risk disclaimer are present.
- No raw artifact payload, local path, private notebook ID, or account information appears in public responses.
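The last gate can be partly automated with a recursive scan of the public response before it is served. This is a sketch under assumptions: the forbidden key names below are invented for the example, and a key-name scan does not catch sensitive values stored under harmless-looking keys.

```python
from typing import Any

# Hypothetical key names; adjust to whatever the real schema uses.
FORBIDDEN_KEYS = {"raw_payload", "local_path", "notebook_id", "account"}


def find_leaks(value: Any, path: str = "$") -> list[str]:
    """Return the paths of forbidden keys found anywhere in a response."""
    leaks: list[str] = []
    if isinstance(value, dict):
        for key, child in value.items():
            child_path = f"{path}.{key}"
            if key in FORBIDDEN_KEYS:
                leaks.append(child_path)
            leaks.extend(find_leaks(child, child_path))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            leaks.extend(find_leaks(child, f"{path}[{index}]"))
    return leaks
```

An empty result means no forbidden key was found, not that the response is safe; human review still applies.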

Non-blocking:

- Mind map.
- Study guide.
- Timeline.
- Related-source candidates.
- Research discovery.
- Infographic.
- Audio.

If optional artifacts fail, record the failure and continue without inventing fallback public copy. A deterministic fallback is allowed for the structure graph, provided it is built from artifacts that are already available.
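Several of the blocking gates above can be checked mechanically before a report reaches a reviewer. The sketch below assumes a hypothetical `report` dictionary with flags such as `source_traceable` and `review_approved`; none of these field names come from the real schema. Optional modules never appear in the result, which matches the non-blocking list.

```python
from typing import Any

REQUIRED_MODULES = (
    "basic_info", "executive_overview", "core_insights",
    "key_data", "source_compliance",
)


def publish_blockers(report: dict[str, Any]) -> list[str]:
    """Return the reasons a report cannot be published; empty means clear."""
    blockers: list[str] = []
    if not report.get("source_traceable", False):
        blockers.append("source upload not confirmed or not traceable")
    modules = report.get("modules", {})
    missing = [name for name in REQUIRED_MODULES if not modules.get(name)]
    # A product decision may allow a partial report (assumed flag name).
    if missing and not report.get("partial_report_allowed", False):
        blockers.append("missing required modules: " + ", ".join(missing))
    if not report.get("review_approved", False):
        blockers.append("display artifact not reviewed and approved")
    if not report.get("has_attribution", False) or not report.get("has_disclaimer", False):
        blockers.append("source attribution or risk disclaimer missing")
    return blockers
```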

## Cadence Notes

NotebookLM operations should be conservative by default:

- One active NotebookLM operation per account.
- Text artifacts first.
- Media artifacts after text success.
- Heavy media should not block publishable text.
- On transient failure, retry once; if an optional artifact fails again, mark it failed and continue.
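The retry rule in the last bullet amounts to at most two attempts per operation. A minimal sketch, assuming a hypothetical `TransientError` that the caller raises for retryable failures; how a failed required artifact should be handled is not specified here and is left to the caller.

```python
from typing import Callable, Optional, Tuple, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures that are worth one retry."""


def run_with_one_retry(
    operation: Callable[[], T],
) -> Tuple[str, Optional[T], Optional[str]]:
    """Try an operation, retry once on TransientError, then give up.

    Returns (status, result, error). Non-transient exceptions propagate.
    """
    last_error: Optional[str] = None
    for _attempt in range(2):  # first try plus exactly one retry
        try:
            return "succeeded", operation(), None
        except TransientError as exc:
            last_error = str(exc)
    # Second failure: mark failed so the run can continue.
    return "failed", None, last_error
```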

The seed importer is not a production runner. A production runner should persist manifests after every operation and support resumable review/import.

## Human Review

Review is mandatory for:

- Gray broker sources.
- Related-source candidate display.
- Infographic or generated media.
- Any content whose citations or page labels are ambiguous.
- Any copy that could be interpreted as investment advice.

Do not display raw NotebookLM page labels until they are normalized against verifiable source pages or sections.
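The mandatory-review list above can be encoded as a small predicate that routes items to the review queue. The field names on `item` (`source_category`, `module`, and the boolean flags) are assumptions for illustration; how the flags get set, for example how ambiguous citations are detected, is outside this sketch.

```python
def review_reasons(item: dict) -> list[str]:
    """Return every mandatory-review rule that applies to an item."""
    reasons: list[str] = []
    if item.get("source_category") == "gray_broker":
        reasons.append("gray broker source")
    if item.get("module") == "related_sources":
        reasons.append("related-source candidate display")
    if item.get("is_generated_media", False):
        reasons.append("infographic or generated media")
    if item.get("citations_ambiguous", False):
        reasons.append("ambiguous citations or page labels")
    if item.get("possible_investment_advice", False):
        reasons.append("possible investment advice")
    return reasons
```

A non-empty list means review is mandatory. An empty list only means none of these five rules matched; the publish gates still apply.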