Portfolio · Case Study
An end-to-end AI content pipeline that generates grounded, multi-channel content from a curated knowledge base. Source material (YouTube transcripts, PDFs, eBooks, web articles, and podcast episodes) is ingested, chunked, and embedded into ChromaDB. A podcast topic extraction pipeline uses Claude to identify article-worthy topics in each episode and keeps a verbatim source segment with each topic as raw material. At generation time, RAG retrieval supplies source-attributed context, so drafts draw on the ingested sources rather than the model's training data. That context is rendered through a configurable voice template, calibrated against reference writers or publications defined by the author, so the output reads as a consistent human voice rather than generic AI. A standalone Claude Haiku review agent audits every draft before it is published. Feature images are generated via gpt-image-2 against a branded studio style guide. Final output publishes to WordPress with a companion ElevenLabs audio file.
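The chunking step can be illustrated with a minimal sketch. It assumes overlapping word windows and a per-chunk copy of the source metadata; the names (Chunk, chunk_source), the window sizes, and the metadata keys are hypothetical and not taken from the actual pipeline. Embedding and storage in ChromaDB are left out so the example runs on the standard library alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    text: str
    metadata: dict


def chunk_source(text, source_id, metadata, size=200, overlap=40):
    """Split text into overlapping word windows that each carry source metadata.

    size and overlap are counted in words; both defaults are illustrative.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for index, start in enumerate(range(0, len(words), step)):
        window = words[start:start + size]
        # Copy the metadata so every chunk stays traceable to its source.
        chunks.append(Chunk(f"{source_id}-{index}", " ".join(window), dict(metadata)))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


if __name__ == "__main__":
    demo = chunk_source(
        "one two three four five six seven eight nine ten",
        source_id="episode-1",
        metadata={"kind": "Podcast", "title": "Hypothetical Episode"},
        size=4,
        overlap=1,
    )
    for chunk in demo:
        print(chunk.chunk_id, "|", chunk.text)
```

Because each chunk carries its own metadata, a retrieved passage can be attributed to its source at generation time without a second lookup.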
Content is generated from the author's actual knowledge base, not the model's training data. RAG retrieval pulls only from ingested sources, and chunks are grouped by source title so Claude cites real authors and titles rather than hallucinated ones. Source metadata (author, date, URL) is injected in a separate block to reduce the risk of fabricated bibliographic details.
When Claude generates a draft, it sees chunks grouped under labeled source headings such as '[YouTube] Author Name · Title' and '[PDF] Book Title by Author'. The model tends to cite from these labels rather than inventing sources. Canonical bibliographic metadata (author, date, URL) is injected in a separate block after the context, so the model has the real data available and has no need to fabricate it.
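The prompt layout described above might be assembled along these lines. This is a sketch under stated assumptions, not the pipeline's actual code: the chunk dictionary shape, the block names (CONTEXT, SOURCE METADATA, TASK), and the helper names are hypothetical. Only the two heading formats and the placement of the metadata block after the context come from the description.

```python
from collections import defaultdict


def source_label(meta):
    """Build the heading shown above a source's chunks."""
    if meta["kind"] == "PDF":
        return f"[PDF] {meta['title']} by {meta['author']}"
    return f"[{meta['kind']}] {meta['author']} · {meta['title']}"


def build_prompt(chunks, task):
    """Group retrieved chunks by source, then append canonical metadata."""
    grouped = defaultdict(list)
    metadata_by_label = {}
    for chunk in chunks:
        label = source_label(chunk["metadata"])
        grouped[label].append(chunk["text"])
        metadata_by_label.setdefault(label, chunk["metadata"])

    # One heading per source, with that source's chunks listed beneath it.
    context_parts = [
        label + "\n" + "\n".join(f"- {text}" for text in texts)
        for label, texts in grouped.items()
    ]
    # Bibliographic details live in their own block, placed after the context.
    metadata_lines = [
        f"{label} | author: {m['author']} | date: {m['date']} | url: {m['url']}"
        for label, m in metadata_by_label.items()
    ]
    return (
        "CONTEXT\n" + "\n\n".join(context_parts)
        + "\n\nSOURCE METADATA\n" + "\n".join(metadata_lines)
        + "\n\nTASK\n" + task
    )


if __name__ == "__main__":
    # Placeholder sources; none of these refer to real ingested material.
    talk = {"kind": "YouTube", "author": "Author Name", "title": "Title",
            "date": "2024-01-01", "url": "https://example.com/talk"}
    book = {"kind": "PDF", "author": "Author", "title": "Book Title",
            "date": "2020-05-05", "url": "https://example.com/book"}
    retrieved = [
        {"text": "First passage from the talk.", "metadata": talk},
        {"text": "A passage from the book.", "metadata": book},
        {"text": "Second passage from the talk.", "metadata": talk},
    ]
    print(build_prompt(retrieved, "Write an article that cites its sources."))
```

Grouping by label means each source heading appears once in the context however many of its chunks were retrieved, which keeps the attribution unambiguous for the model.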
A single view of the curated knowledge base — what's in it, what's been processed, and how to add more.
Every source ingested into the knowledge base — searchable, filterable, and traceable from any generated article back to its origin.
The pipeline of article ideas — sourced from the curated knowledge base and automatically mined from new podcast episodes.
Every article the engine has produced — long-form, voice-aligned, source-cited, and ready for publishing across channels.