
On retrieval that survives the on-call rotation.

Why “RAG” is a research toy until you treat ingestion, eval, and observability as first-class concerns. Field-tested patterns from three production deployments.

Most “RAG demos” survive about thirty seconds of real production traffic before the wheels come off. The model is fine. The retriever is fine. What breaks is the boring connective tissue between them — ingestion, schema, eval, and observability.

This is a rough field guide built from three deployments: an asset-management research tool, a customer-support assistant for a B2B SaaS product, and an internal docs search at a 600-person company. Different shapes, same lessons.

Ingestion is half the system

The thing nobody writes about: getting good documents into your store is harder than the retrieval itself. PDFs are a war crime. Confluence pages have a half-life of six months. Slack threads are useful but contextually orphaned the moment they leave the channel.

The patterns that worked:

  1. Idempotent ingestion, keyed on a stable document ID, with a content hash so re-runs don’t churn the store.
  2. Layered chunking — ingest the document at three resolutions (page, section, paragraph) and pick at retrieval time.
  3. Metadata over cleverness — author, last-edited timestamp, ACL tags. Retrieval can filter on these before embedding similarity, which is faster and more correct.
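The first pattern fits in a few lines. This is a minimal sketch: the in-memory `DocStore` is a hypothetical stand-in for a real document or vector store, SHA-256 is an assumed choice of content hash, and the `confluence:12345` ID format is made up. The stable ID decides which record a document maps to, and the hash decides whether anything needs rewriting at all.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class StoredDoc:
    doc_id: str
    content_hash: str
    text: str


def content_hash(text: str) -> str:
    # SHA-256 of the raw text acts as a change detector.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class DocStore:
    """Hypothetical in-memory stand-in for a real document/vector store."""

    def __init__(self) -> None:
        self._docs: dict[str, StoredDoc] = {}
        self.writes = 0  # writes that would trigger re-chunking/re-embedding

    def upsert(self, doc_id: str, text: str) -> bool:
        """Write only if the doc is new or its content changed.

        Returns True if a write happened, False if this run was a no-op.
        """
        new_hash = content_hash(text)
        existing = self._docs.get(doc_id)
        if existing is not None and existing.content_hash == new_hash:
            return False  # same ID, same content: nothing to do
        self._docs[doc_id] = StoredDoc(doc_id, new_hash, text)
        self.writes += 1
        return True


store = DocStore()
store.upsert("confluence:12345", "Runbook v1")  # first run: written
store.upsert("confluence:12345", "Runbook v1")  # re-run: no-op
store.upsert("confluence:12345", "Runbook v2")  # edited upstream: rewritten
print(store.writes)  # 2
```

The no-op branch is where re-runs stop churning the store: an unchanged page never gets re-chunked or re-embedded.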
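Layered chunking can look roughly like this. The splitting conventions are assumptions for the sketch, not the post’s: form feeds separate pages, a line starting with `# ` opens a section, and blank lines separate paragraphs. `pick_level` is a hypothetical and deliberately crude routing rule; in practice, the resolution to search is whatever your eval set says works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    level: str  # "page", "section", or "paragraph"
    index: int  # position within its level
    text: str


def layered_chunks(doc_id: str, text: str) -> list[Chunk]:
    """Emit one document at three resolutions (page, section, paragraph)."""
    pages = text.split("\f")  # assumed: form feed marks a page break

    sections: list[str] = []
    current: list[str] = []
    for line in text.replace("\f", "\n").splitlines():
        if line.startswith("# ") and current:
            sections.append("\n".join(current))  # close the previous section
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current))

    paragraphs = text.replace("\f", "\n\n").split("\n\n")

    chunks: list[Chunk] = []
    for level, parts in (("page", pages), ("section", sections), ("paragraph", paragraphs)):
        kept = [p.strip() for p in parts if p.strip()]  # drop empty fragments
        chunks += [Chunk(doc_id, level, i, p) for i, p in enumerate(kept)]
    return chunks


def pick_level(query: str) -> str:
    # Hypothetical rule: short lookups get tight paragraphs, longer
    # questions get more surrounding context.
    n = len(query.split())
    if n <= 6:
        return "paragraph"
    return "section" if n <= 15 else "page"


def at_level(chunks: list[Chunk], level: str) -> list[Chunk]:
    # Retrieval-time selection: only search the chosen resolution.
    return [c for c in chunks if c.level == level]


doc = "# Intro\nHello world.\n\nSecond para.\f# Setup\nInstall it."
chunks = layered_chunks("handbook", doc)
candidates = at_level(chunks, pick_level("how do I install it"))
```

Ingesting all three resolutions up front costs storage, but it moves the “how big should a chunk be” decision from ingestion time, where it is frozen, to query time, where it can change per query.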
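Filtering on metadata before similarity, sketched with plain Python lists instead of a vector index. The `Record` shape (author, last-edited timestamp, ACL groups) and the `search` signature are hypothetical. The filter is exact and cheap, and cosine scoring only runs over documents the user may read and that are fresh enough, so a document outside the user’s ACL can never outrank a permitted one.

```python
import math
from dataclasses import dataclass


@dataclass
class Record:
    doc_id: str
    embedding: list[float]
    author: str
    last_edited: int  # Unix timestamp, seconds
    acl: frozenset[str]  # groups allowed to read this document


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(records: list[Record], query_emb: list[float], user_groups: set[str],
           min_last_edited: int = 0, k: int = 3) -> list[str]:
    # 1. Exact, cheap metadata filter: the user must share a group with the
    #    document's ACL, and the document must be recent enough.
    allowed = [
        r for r in records
        if r.acl & user_groups and r.last_edited >= min_last_edited
    ]
    # 2. Similarity is computed only over the survivors.
    ranked = sorted(allowed, key=lambda r: cosine(r.embedding, query_emb), reverse=True)
    return [r.doc_id for r in ranked[:k]]


records = [
    Record("a", [1.0, 0.0], "ana", 100, frozenset({"eng"})),
    Record("b", [0.9, 0.1], "bo", 100, frozenset({"hr"})),
    Record("c", [0.5, 0.5], "cy", 50, frozenset({"eng"})),
]
print(search(records, [1.0, 0.0], {"eng"}))  # ['a', 'c']: 'b' is never scored
```

Many vector stores support filtering on metadata during search; the sketch only makes the ordering explicit.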

Eval is not a launch-day ritual

The day you ship is the day you should already have an eval set running on every pull request. If you’re hand-checking outputs in a Notion doc, you’ve already lost.
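A minimal per-PR retrieval check might look like the following. Recall@k is a generic, swapped-in metric, not a reconstruction of the two evals the post goes on to mention, and the eval set, `fake_retrieve`, and the 0.8 threshold are all hypothetical. The shape is what matters: a small labeled set of queries, one score, and a threshold that fails the build.

```python
from typing import Callable

# Hypothetical labeled eval set: query -> IDs a human marked as relevant.
EVAL_SET: dict[str, set[str]] = {
    "how do I rotate an API key": {"kb-101"},
    "refund policy for annual plans": {"kb-204", "kb-205"},
}

Retriever = Callable[[str, int], list[str]]


def recall_at_k(retrieve: Retriever, eval_set: dict[str, set[str]], k: int = 5) -> float:
    """Mean, over queries, of the fraction of relevant docs found in the top k."""
    per_query = [
        len(set(retrieve(query, k)) & relevant) / len(relevant)
        for query, relevant in eval_set.items()
    ]
    return sum(per_query) / len(per_query)


def gate(retrieve: Retriever, eval_set: dict[str, set[str]],
         k: int = 5, threshold: float = 0.8) -> bool:
    score = recall_at_k(retrieve, eval_set, k)
    print(f"recall@{k} = {score:.3f} (threshold {threshold})")
    return score >= threshold


# Stand-in retriever so the sketch runs; in CI, call the real retriever
# and exit nonzero (for example with sys.exit(1)) when gate() is False.
def fake_retrieve(query: str, k: int) -> list[str]:
    return ["kb-101", "kb-204"][:k]


passed = gate(fake_retrieve, EVAL_SET)  # prints: recall@5 = 0.750 (threshold 0.8)
```

With the stand-in retriever, the second query finds only one of its two relevant documents, so recall@5 comes out at 0.750 and `passed` is False. Wired into CI, that is a red build on the pull request rather than a surprise in production.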

Two evals that pay rent:

Observability beats prompt engineering

You will spend more time staring at the trace of a single bad query than rewriting prompts. Tools that helped:

The takeaways

If you’re building this for the second time:

The unsexy half is where the production value lives. The model is, by 2026, a commodity.
