Content Pipeline Handoff

This is a handoff snapshot, not the product SSOT.

Product SSOT: mall-docs report-notebooklm docs, snapshot date: 2026-06-03.

Content Principle

Use NotebookLM as a source-driven research engine, not as a generic rewriting model.

The pipeline may orchestrate, clean, validate, map, and review NotebookLM-native artifacts. It must not silently replace missing NotebookLM artifacts with locally rewritten publishable content.

Source Inputs

Phase 1 content is based on public or authorized institutional research reports. Priority source categories:

  • Official public sources.
  • Authorized partner sources.
  • Gray broker public sources, with stricter review and source display handling.

Vision source lists, tiering, and historical source-health experience may be used as reference material. Production data must not depend on a local Vision runtime, local path, local cache, or local account state.

NotebookLM Workflow

Recommended report run order:

  1. Inspect the source PDF: title, institution, date, page count, size, and report type.
  2. Create or reuse one notebook for one report source unless a multi-report synthesis is explicitly planned.
  3. Upload the report source.
  4. Generate the P0 text package:
    • source description
    • native Briefing Doc
    • native Blog Post
    • data table
    • query dimensions
    • query key data
    • query divergence
    • query weaknesses
  5. Generate useful P1 artifacts:
    • query timeline
    • query related sources
    • Study Guide
    • mind map, if download succeeds
  6. Generate P2 artifacts asynchronously:
    • infographic candidate
    • audio brief
    • research discovery
  7. Persist every artifact status in a manifest.
  8. Deterministically assemble display modules from reviewed artifacts.
  9. Run human review before publishing.
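The tiered order above can be sketched as a small run plan. This is an illustrative sketch, not the runner's actual code: `PlannedStep`, `RUN_PLAN`, and `build_run_plan` are hypothetical names, and mapping "source description" to `source_summary` is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlannedStep:
    artifact_type: str
    tier: str           # "P0", "P1", or "P2"
    asynchronous: bool  # True when the step should not block the text flow


# Artifact types per tier, in the order listed in the workflow.
# Assumption: "source description" corresponds to source_summary.
RUN_PLAN: dict[str, list[str]] = {
    "P0": [
        "source_summary", "native_briefing_doc", "native_blog_post",
        "data_table", "query_dimensions", "query_key_data",
        "query_divergence", "query_weaknesses",
    ],
    "P1": [
        "query_timeline", "query_related_sources",
        "native_study_guide", "mind_map",
    ],
    "P2": ["infographic", "audio_brief", "research_discovery"],
}


def build_run_plan(plan: dict[str, list[str]] = RUN_PLAN) -> list[PlannedStep]:
    """Flatten the tiers into one ordered list: P0 first, then P1, then P2."""
    steps: list[PlannedStep] = []
    for tier in ("P0", "P1", "P2"):
        for artifact_type in plan[tier]:
            # Only P2 artifacts are generated asynchronously in this sketch.
            steps.append(PlannedStep(artifact_type, tier, tier == "P2"))
    return steps
```

A runner could iterate `build_run_plan()` and hand the asynchronous steps to a background queue once the text steps have succeeded.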

Artifact Types

The Phase 1 schema supports these NotebookLM artifact types:

| Artifact type | Purpose | Publish blocking | Human review |
| --- | --- | --- | --- |
| source_summary | Source-level summary. | No | No |
| notebook_summary | Notebook-level summary. | No | No |
| native_briefing_doc | Native briefing document. | Yes | No |
| native_blog_post | Native blog post. | Yes | No |
| native_study_guide | FAQ, study guide, glossary. | No | No |
| data_table | Structured table data. | Yes | No |
| mind_map | Mind map or graph source. | No | No |
| query_dimensions | Analysis dimensions. | Yes | No |
| query_key_data | Key data points. | Yes | No |
| query_divergence | Views that diverge from consensus. | No | No |
| query_weaknesses | Weaknesses and open questions. | No | No |
| query_timeline | Timeline and turning points. | No | No |
| query_related_sources | Related source candidates. | No | Yes |
| research_discovery | Enrichment queue. | No | Yes |
| infographic | Candidate public image. | No | Yes |
| audio_brief | Listening preview or audio source. | No | No |

Artifact records should keep status, object reference, format, size, hash, generated time, error, and review flags. Raw payloads should stay in object storage and remain internal.
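A minimal sketch of such a record, assuming Python dataclasses and SHA-256 for the hash. The field names, the `"ok"` status value, and the helper functions are hypothetical and not taken from the Phase 1 schema. The record keeps only a reference to the payload, never the payload itself.

```python
import hashlib
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ArtifactRecord:
    artifact_type: str
    status: str                         # hypothetical values: "ok", "failed"
    object_ref: Optional[str] = None    # key in object storage, internal only
    format: Optional[str] = None
    size_bytes: Optional[int] = None
    sha256: Optional[str] = None        # assumption: SHA-256 as the hash
    generated_at: Optional[str] = None  # ISO 8601 timestamp, UTC
    error: Optional[str] = None
    needs_review: bool = False
    approved: bool = False


def record_from_payload(artifact_type: str, payload: bytes, object_ref: str,
                        fmt: str, needs_review: bool = False) -> ArtifactRecord:
    """Describe a stored payload without copying it into the record."""
    return ArtifactRecord(
        artifact_type=artifact_type,
        status="ok",
        object_ref=object_ref,
        format=fmt,
        size_bytes=len(payload),
        sha256=hashlib.sha256(payload).hexdigest(),
        generated_at=datetime.now(timezone.utc).isoformat(),
        needs_review=needs_review,
    )


def to_manifest_entry(record: ArtifactRecord) -> dict:
    """Plain dict form, suitable for JSON serialization in a manifest."""
    return asdict(record)
```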

Module Mapping

| Product module | Primary artifact sources | Notes |
| --- | --- | --- |
| basic_info | Source metadata and source summary. | P0, inline. |
| executive_overview | Briefing Doc and Blog Post. | P0, heavy card plus page. |
| core_insights | Briefing Doc and query dimensions. | P0, inline with optional detail page. |
| key_data | Data table and query key data. | P0, heavy card plus page. |
| source_compliance | Source metadata and review notes. | P0, inline, must include disclaimer. |
| institution | Institution record. | P0, inline. |
| differentiated_view | Query divergence. | P1, optional. |
| weaknesses | Query weaknesses. | P1, optional, avoid investment-advice wording. |
| timeline | Query timeline. | P1, optional. |
| study_guide | Native Study Guide. | P1, optional, replaces legacy faq. |
| structure_graph | Mind map or deterministic fallback. | P1, optional. |
| related_sources | Related-source query and review queue. | P1, review required before display. |
| infographic | Infographic candidate. | P2, review required before display. |
| audio | Audio brief or reviewed audio asset. | P2, not required for text publish. |
| research_discovery | Research discovery queue. | P2, internal or reviewed only. |
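The P0 rows of the mapping can be expressed as a lookup table that drives deterministic assembly. This is a partial sketch under stated assumptions: it treats every listed source as required, and `source_metadata`, `review_notes`, and `institution_record` are hypothetical input names for the non-artifact sources. The P1 and P2 rows, including the structure graph fallback, are left out.

```python
# P0 modules and the inputs each one is assembled from.
# Assumption: "A and B" in the mapping means both inputs are required.
P0_MODULE_SOURCES: dict[str, tuple[str, ...]] = {
    "basic_info": ("source_metadata", "source_summary"),
    "executive_overview": ("native_briefing_doc", "native_blog_post"),
    "core_insights": ("native_briefing_doc", "query_dimensions"),
    "key_data": ("data_table", "query_key_data"),
    "source_compliance": ("source_metadata", "review_notes"),
    "institution": ("institution_record",),
}


def assemblable_modules(reviewed_inputs: set[str]) -> list[str]:
    """Return the P0 modules whose inputs are all available.

    The result follows the fixed order of P0_MODULE_SOURCES, so the same
    inputs always produce the same list.
    """
    return [
        module
        for module, sources in P0_MODULE_SOURCES.items()
        if all(source in reviewed_inputs for source in sources)
    ]
```

For example, with only the Briefing Doc and Blog Post reviewed, `assemblable_modules` returns just `executive_overview`; nothing is synthesized for the modules whose inputs are missing.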

Publish Gates

Blocking before public release:

  • Source upload succeeded and is traceable.
  • Required P0 text artifacts exist and have usable content.
  • basic_info, executive_overview, core_insights, key_data, and source_compliance are present unless a product decision allows a partial report.
  • Each display artifact has been reviewed and approved.
  • Source attribution and risk disclaimer are present.
  • No raw artifact payload, local path, private notebook ID, or account information appears in public responses.

Non-blocking:

  • Mind map.
  • Study guide.
  • Timeline.
  • Related-source candidates.
  • Research discovery.
  • Infographic.
  • Audio.

If optional artifacts fail, record the failure and continue without inventing fallback public copy. A deterministic fallback is allowed for the structure graph when it is built from artifacts that are already available.
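The blocking gates can be sketched as one function that returns the reasons a report cannot be published. `ReportState` and its fields are hypothetical. The blocking artifact list follows the artifact types marked as publish blocking, and the response-leak gate (raw payloads, local paths, notebook IDs, account information) is not modelled here.

```python
from dataclasses import dataclass

BLOCKING_ARTIFACTS = (
    "native_briefing_doc", "native_blog_post", "data_table",
    "query_dimensions", "query_key_data",
)
REQUIRED_MODULES = (
    "basic_info", "executive_overview", "core_insights",
    "key_data", "source_compliance",
)


@dataclass
class ReportState:
    source_uploaded: bool       # upload succeeded and is traceable
    usable_artifacts: set[str]  # artifact types with usable content
    modules: set[str]           # assembled display modules
    display_approved: bool
    has_attribution: bool
    has_disclaimer: bool
    allow_partial: bool = False  # product decision to permit a partial report


def blocking_failures(state: ReportState) -> list[str]:
    """Return every blocking reason; an empty list means the gates pass."""
    failures: list[str] = []
    if not state.source_uploaded:
        failures.append("source upload missing or untraceable")
    for artifact in BLOCKING_ARTIFACTS:
        if artifact not in state.usable_artifacts:
            failures.append(f"missing P0 artifact: {artifact}")
    if not state.allow_partial:
        for module in REQUIRED_MODULES:
            if module not in state.modules:
                failures.append(f"missing module: {module}")
    if not state.display_approved:
        failures.append("display artifact not approved")
    if not (state.has_attribution and state.has_disclaimer):
        failures.append("source attribution or risk disclaimer missing")
    return failures
```

Optional artifacts such as the mind map or audio brief never appear in the result, which keeps them non-blocking by construction.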

Cadence Notes

NotebookLM operations should be conservative by default:

  • One active NotebookLM operation per account.
  • Text artifacts first.
  • Media artifacts after text success.
  • Heavy media should not block publishable text.
  • On transient failure, retry once; if an optional artifact fails again, mark it failed and continue.
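The retry rule can be sketched as a small wrapper. `TransientError` and `run_with_one_retry` are hypothetical names. Re-raising the error for a required artifact is an assumption, since the rule above only covers optional artifacts.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures that are worth a single retry."""


def run_with_one_retry(operation: Callable[[], T],
                       optional: bool) -> tuple[str, Optional[T]]:
    """Run the operation, retrying once after a transient failure.

    Returns ("ok", result) on success. After two failures an optional
    artifact is reported as ("failed", None) so the run can continue.
    Assumption: a required artifact re-raises the last error instead.
    """
    last_error: Optional[TransientError] = None
    for _attempt in range(2):  # first try plus one retry
        try:
            return "ok", operation()
        except TransientError as exc:
            last_error = exc
    if optional:
        return "failed", None
    assert last_error is not None
    raise last_error
```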

The seed importer is not a production runner. A production runner should persist manifests after every operation and support resumable review/import.
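A minimal sketch of that persistence pattern, assuming a JSON manifest on local disk. The file layout, the function names, and the `run_step` callback are hypothetical. Skipping artifacts already recorded as failed on resume is also an assumption; a real runner might retry them.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Callable


def load_manifest(path: Path) -> dict:
    """Load the manifest, or start an empty one on the first run."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {"artifacts": {}}


def save_manifest(path: Path, manifest: dict) -> None:
    """Write to a temp file, then rename, so a crash leaves no partial file."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(manifest, handle, indent=2, sort_keys=True)
    os.replace(tmp_name, path)


def run_resumable(path: Path, steps: list[str],
                  run_step: Callable[[str], dict]) -> dict:
    """Run each step not yet in the manifest, persisting after every one."""
    manifest = load_manifest(path)
    for artifact_type in steps:
        if artifact_type in manifest["artifacts"]:
            continue  # recorded by an earlier run, whether ok or failed
        manifest["artifacts"][artifact_type] = run_step(artifact_type)
        save_manifest(path, manifest)
    return manifest
```

Because the manifest is saved after each step, an interrupted run can be restarted with the same step list and will pick up at the first unrecorded artifact.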

Human Review

Review is mandatory for:

  • Gray broker sources.
  • Related-source candidate display.
  • Infographic or generated media.
  • Any content whose citations or page labels are ambiguous.
  • Any copy that could be interpreted as investment advice.

Do not display raw NotebookLM page labels until they are normalized against verifiable source pages or sections.