
The boring parts of MLOps.

Schemas, artifacts, and the audit trail. The unsexy infrastructure that lets a model retraining pipeline survive a quarterly review.

There is a popular shape of MLOps post that goes: “we used MLflow + Airflow + Feast and now we ship models continuously.” This is not that post.

This is the post about the three things that quietly determine whether your retraining pipeline survives twelve months of organizational entropy: schemas, artifacts, and the audit trail.

Schemas

A model is downstream of a feature contract whether you’ve written one down or not. The question is whether the contract is checked at every layer or only by accident.

What “checked” looks like in practice:

That last one is the boring one nobody builds until they get burned.
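As one illustration of checking a feature contract at a boundary, here is a minimal sketch in plain Python. The contract format, the column names, and the helpers `check_record` and `validate_batch` are all hypothetical, invented for this example; a real pipeline would more likely lean on a schema library.

```python
from dataclasses import dataclass

# Hypothetical feature contract: column name -> (expected type, nullable).
CONTRACT = {
    "user_id": (str, False),
    "session_length_s": (float, False),
    "country": (str, True),
}


@dataclass
class ValidationResult:
    valid: list        # records that satisfy the contract
    quarantined: list  # (record, reasons) pairs for records that do not


def check_record(record: dict, contract: dict) -> list:
    """Return a list of contract violations; an empty list means the record is clean."""
    problems = []
    for name, (expected_type, nullable) in contract.items():
        if name not in record:
            problems.append(f"missing column: {name}")
            continue
        value = record[name]
        if value is None:
            if not nullable:
                problems.append(f"null in non-nullable column: {name}")
        elif not isinstance(value, expected_type):
            # Deliberately strict: an int in a float column is a violation.
            problems.append(
                f"{name}: expected {expected_type.__name__}, got {type(value).__name__}"
            )
    # Columns the contract does not know about are also violations.
    for name in sorted(record.keys() - contract.keys()):
        problems.append(f"unexpected column: {name}")
    return problems


def validate_batch(records, contract=CONTRACT) -> ValidationResult:
    """Split a batch into valid records and quarantined records with reasons."""
    result = ValidationResult(valid=[], quarantined=[])
    for record in records:
        problems = check_record(record, contract)
        if problems:
            result.quarantined.append((record, problems))
        else:
            result.valid.append(record)
    return result
```

Keeping the reasons next to each quarantined record is the design choice that matters here: a rejected row with no explanation is just a smaller, more confusing dataset.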

Artifacts

An artifact is anything the pipeline produces that another part of the system reads. Models, of course — but also feature stats, calibration tables, evaluation reports, the prompt template for an LLM-judge eval.

The discipline that pays:

  1. Every artifact has a stable ID: a hash of its inputs plus the version of the code that produced it. Re-running the same code on the same inputs produces the same ID.
  2. Every artifact is immutable once produced. No overwrites. New version → new ID.
  3. Every artifact has lineage — a record of which other artifacts it was produced from.
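The three rules above can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not a recommended implementation: `artifact_id`, `ArtifactRecord`, and `ArtifactStore` are hypothetical names, the store is an in-memory dict standing in for a real registry, and input order is assumed not to matter (hence the sort).

```python
import hashlib
import json
from dataclasses import dataclass


def artifact_id(kind: str, input_ids: list, params: dict, code_version: str) -> str:
    """Derive a stable ID from the inputs and the code version that produced the artifact."""
    payload = json.dumps(
        {
            "kind": kind,
            "inputs": sorted(input_ids),  # assumption: input order is irrelevant
            "params": params,
            "code_version": code_version,
        },
        sort_keys=True,  # canonical key order so the hash is reproducible
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return f"{kind}-{digest[:16]}"


@dataclass(frozen=True)
class ArtifactRecord:
    id: str
    kind: str
    parents: tuple  # lineage: IDs of the artifacts this one was produced from
    code_version: str


class ArtifactStore:
    """In-memory stand-in for a write-once artifact registry."""

    def __init__(self):
        self._records = {}

    def put(self, record: ArtifactRecord) -> None:
        existing = self._records.get(record.id)
        if existing is not None:
            if existing != record:
                raise ValueError(f"artifact {record.id} already exists with different content")
            return  # identical re-registration is a no-op
        self._records[record.id] = record

    def get(self, aid: str) -> ArtifactRecord:
        return self._records[aid]
```

Truncating the digest to 16 hex characters is a readability choice for the example; a real system may prefer the full hash.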

This is what lets you answer the question “why is the model that’s serving traffic right now this version?” in fewer than five steps.
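A sketch of what answering that question might look like when lineage is stored as a simple parent map. The IDs and the `trace_lineage` helper are hypothetical, made up for illustration only.

```python
from collections import deque

# Hypothetical lineage records: artifact ID -> IDs it was produced from.
LINEAGE = {
    "model-7f3a": ["features-c41d", "trainconfig-09be"],
    "features-c41d": ["raw-a1b2"],
    "trainconfig-09be": [],
    "raw-a1b2": [],
}


def trace_lineage(start_id: str, lineage: dict) -> list:
    """Breadth-first walk from an artifact to all of its ancestors.

    Returns (depth, artifact_id) pairs, starting with the artifact itself at depth 0.
    """
    seen = {start_id}
    queue = deque([(0, start_id)])
    trail = []
    while queue:
        depth, current = queue.popleft()
        trail.append((depth, current))
        for parent in lineage.get(current, []):
            if parent not in seen:  # shared ancestors are reported once
                seen.add(parent)
                queue.append((depth + 1, parent))
    return trail


for depth, aid in trace_lineage("model-7f3a", LINEAGE):
    print("  " * depth + aid)
```

With the sample map, the serving model resolves to its feature set and training config in one hop and to the raw data in two.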

The audit trail

Six months from now, someone — possibly you — will need to answer one of:

The audit trail is the answer. It is not a Slack scroll. It is:

What it looks like, end-to-end

The simplest version of all of this that actually works:

  1. Ingestion writes raw data with schema validation at the boundary. Bad records go to a quarantine bucket.
  2. Feature pipeline produces feature artifacts with hash-based IDs. Stats artifacts produced alongside.
  3. Training consumes feature artifact IDs (not paths) and produces a model artifact and an eval artifact, each with its own ID.
  4. Promotion is a recorded event in the registry, gated on eval thresholds.
  5. Deploy reads the registry, writes a deployment manifest with the artifact ID it pulled.
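Steps 4 and 5 are the ones most often left informal, so here is a minimal sketch of both. Everything in it is an assumption made for illustration: the `Registry` class, the metric names and thresholds, and the manifest fields are hypothetical, and the event log is an in-memory list rather than durable storage.

```python
import json
from datetime import datetime, timezone

# Hypothetical gate: metric name -> (comparison, threshold).
THRESHOLDS = {"auc": (">=", 0.80), "calibration_error": ("<=", 0.05)}


def _utc_now() -> str:
    return datetime.now(timezone.utc).isoformat()


class Registry:
    """Append-only log of promotion decisions, approved or rejected."""

    def __init__(self):
        self.events = []

    def promote(self, model_id, eval_id, metrics, actor, now=None) -> bool:
        failures = []
        for name, (op, limit) in THRESHOLDS.items():
            value = metrics.get(name)
            passed = value is not None and (value >= limit if op == ">=" else value <= limit)
            if not passed:
                failures.append(f"{name}={value} violates {op} {limit}")
        # Rejections are recorded too: the audit trail needs the "no" as much as the "yes".
        self.events.append({
            "type": "promotion",
            "model_artifact_id": model_id,
            "eval_artifact_id": eval_id,
            "metrics": dict(metrics),
            "approved": not failures,
            "failures": failures,
            "actor": actor,
            "at": now or _utc_now(),
        })
        return not failures

    def current_model(self):
        """Return the most recently approved model ID, or None if there is none."""
        for event in reversed(self.events):
            if event["type"] == "promotion" and event["approved"]:
                return event["model_artifact_id"]
        return None


def write_manifest(registry: Registry, environment: str, now=None) -> str:
    """Resolve the model from the registry and record exactly which ID was deployed."""
    model_id = registry.current_model()
    if model_id is None:
        raise RuntimeError("no approved model in registry")
    manifest = {
        "environment": environment,
        "model_artifact_id": model_id,
        "deployed_at": now or _utc_now(),
    }
    return json.dumps(manifest, sort_keys=True, indent=2)
```

The point of the manifest is that deploy never takes a model ID from a human or a path; it reads what the registry says and writes down what it pulled.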

None of this is novel. None of it requires a hyperscaler-grade platform. Three engineers can build the bones of it in two months.

The reason it’s worth building is that the day you need it — when an auditor asks, when a model misbehaves, when someone asks why version 7 outperformed version 8 — you have actual evidence instead of a theory.
