This is Post 5 of The Production Ready Series.
"A production AI system should be able to tell you exactly what it did, in what order, at what speed, for any request — without anyone having to dig through logs to find out."
The Question That Exposes Everything
There is one question that separates a production AI system from a demo that made it to production.
What happened to that request.
Not approximately. Not "we think the pipeline ran and the search probably returned results." Exactly. Which endpoint handled it. What the search layer returned. Which documents were in scope. How long each step took. Whether anything failed silently along the way.
If your system cannot answer that question with precision and speed, you do not have a production AI system. You have an application that works until someone asks you to prove it.
Request tracing is how you prove it. Playback is how you investigate when the system did not work.
What Request Tracing Actually Is
Request tracing is the ability to follow a single user interaction through every layer of the system and produce a complete, correlated account of what happened.
It is not logging. Logs tell you what individual components felt like reporting. Request tracing tells you what the system did as a unified whole — from the moment a user submitted a query to the moment a response came back, with every step in between captured, typed, and queryable.
In plain terms — imagine being able to pull up a complete receipt for any user interaction. Timestamp. Endpoint called. Search executed. Documents retrieved. Latency at each step. Errors if any occurred. Everything in one place, structured, accurate, and available in seconds rather than hours.
That is what the event taxonomy makes possible. Each table owns its domain slice of the request. Together they tell the complete story.
Walking the Chain
A request enters the system through ApplicationHealthEvents. Endpoint called. Timestamp recorded. Latency clock started.
That request triggers document retrieval. OperationalMetrics captures the LanceDB query — latency, index state, scan type, result count. The search layer returns candidates.
Those candidates trace back to DocProcessingEvents. When were these documents ingested. Did processing complete cleanly. Were there any parse issues that might explain degraded result quality. The document trail is there because every document that entered the system left a record.
If the request hit a near-real-time pipeline, PipelineEvents closes the loop. Which batch produced the data this request searched against. batch_id is the thread. Pull it and you know exactly which micro-batch was in scope when the user submitted their query.
UserAdvancedSearchEvents adds the behavioral layer. What did the user actually ask. Which filters were applied. Was this a query pattern the system handles well or one that consistently underperforms. Usage patterns in this table tell you whether the application is being used the way it was designed — and where the design needs to evolve.
Five tables. One request. Complete picture. No log diving required.
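A minimal sketch of that walk, assuming each table is available as a list of records. Only the table names and batch_id come from the chain above. Every other field name (request_id, doc_ids, doc_id, and so on) and all of the sample rows are hypothetical, chosen just to show how the keys hand off from one table to the next.

```python
# Hypothetical sample rows. Only the table names and batch_id appear in the post;
# every other field name and value here is an illustrative assumption.
APPLICATION_HEALTH_EVENTS = [
    {"request_id": "req-1", "endpoint": "/search", "latency_ms": 420},
]
OPERATIONAL_METRICS = [
    {"request_id": "req-1", "query_latency_ms": 310, "scan_type": "ann",
     "result_count": 2, "doc_ids": ["doc-7", "doc-9"]},
]
DOC_PROCESSING_EVENTS = [
    {"doc_id": "doc-7", "batch_id": "batch-42", "status": "ok", "parse_warnings": 0},
    {"doc_id": "doc-9", "batch_id": "batch-42", "status": "ok", "parse_warnings": 3},
]
PIPELINE_EVENTS = [
    {"batch_id": "batch-42", "status": "completed"},
]
USER_ADVANCED_SEARCH_EVENTS = [
    {"request_id": "req-1", "query_text": "renewal terms", "filters": {"year": 2024}},
]


def walk_chain(request_id):
    """Follow one request through each table and return a single trace."""
    health = [e for e in APPLICATION_HEALTH_EVENTS if e["request_id"] == request_id]
    searches = [e for e in OPERATIONAL_METRICS if e["request_id"] == request_id]
    # The search rows name the documents returned; those ids lead to ingestion records.
    doc_ids = {d for s in searches for d in s["doc_ids"]}
    docs = [e for e in DOC_PROCESSING_EVENTS if e["doc_id"] in doc_ids]
    # Each ingestion record carries the batch_id that ties it to a pipeline run.
    batch_ids = {d["batch_id"] for d in docs}
    batches = [e for e in PIPELINE_EVENTS if e["batch_id"] in batch_ids]
    user = [e for e in USER_ADVANCED_SEARCH_EVENTS if e["request_id"] == request_id]
    return {
        "request": health[0] if health else None,
        "search": searches,
        "documents": docs,
        "batches": batches,
        "user_query": user[0] if user else None,
    }


trace = walk_chain("req-1")
```

In this sketch the request identifier links the request, search, and user-query rows, while document ids and batch_id carry the trail back through ingestion. A real deployment would likely run this as queries against the event store rather than over in-memory lists.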
Single Request Forensics
The first use case for this capability is forensics on a specific interaction.
A user got a bad result. Or no result. Or a slow response that degraded their experience. Someone — a product owner, a business sponsor, a support escalation — wants to know what happened.
Without request tracing, the answer is an investigation. Pull logs from multiple systems. Reconstruct a timeline manually. Try to correlate events that were never designed to be correlated. Hours of work that may not produce a definitive answer.
With request tracing, it is a query. Take the request identifier from ApplicationHealthEvents. Join across the relevant tables. Within minutes, produce a complete account of what the system did for that specific interaction.
That is the difference between a team that can stand in front of a business sponsor with confidence and a team that cannot. The data is either there or it is not. There is no improvising a credible answer when someone is asking hard questions about a specific user interaction on a specific day at a specific time.
Build the tracing capability before you need it. You will need it.
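To make "it is a query" concrete, here is a hedged sketch using an in-memory SQLite database as a stand-in for wherever the events actually live. The column names (request_id, event_ts, latency_ms, and the rest) are assumptions, not the real schema, and the sketch assumes one row per request in each table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical columns; only the table names come from the post.
conn.executescript("""
CREATE TABLE ApplicationHealthEvents (
    request_id TEXT, endpoint TEXT, event_ts TEXT, latency_ms INTEGER, status TEXT);
CREATE TABLE OperationalMetrics (
    request_id TEXT, query_latency_ms INTEGER, scan_type TEXT, result_count INTEGER);
CREATE TABLE UserAdvancedSearchEvents (
    request_id TEXT, query_text TEXT, filters TEXT);

INSERT INTO ApplicationHealthEvents VALUES
    ('req-1', '/search', '2024-05-01T14:00:00', 420, 'ok'),
    ('req-2', '/search', '2024-05-01T14:01:00', 95, 'ok');
INSERT INTO OperationalMetrics VALUES ('req-1', 310, 'ann', 2);
INSERT INTO UserAdvancedSearchEvents VALUES ('req-1', 'renewal terms', '{"year": 2024}');
""")

# LEFT JOINs keep the request row even when a downstream table has no matching
# event, so a step that never reported shows up as NULLs instead of vanishing.
RECEIPT_SQL = """
SELECT h.request_id, h.endpoint, h.event_ts, h.latency_ms AS total_latency_ms,
       h.status, m.query_latency_ms, m.scan_type, m.result_count,
       u.query_text, u.filters
FROM ApplicationHealthEvents AS h
LEFT JOIN OperationalMetrics AS m ON m.request_id = h.request_id
LEFT JOIN UserAdvancedSearchEvents AS u ON u.request_id = h.request_id
WHERE h.request_id = ?
"""


def request_receipt(conn, request_id):
    """Return one correlated record for a request, or None if it is unknown."""
    conn.row_factory = sqlite3.Row
    row = conn.execute(RECEIPT_SQL, (request_id,)).fetchone()
    return dict(row) if row is not None else None


receipt = request_receipt(conn, "req-1")
silent = request_receipt(conn, "req-2")  # search columns come back as None
```

The second lookup is the interesting one: the endpoint reported success, but no search event was recorded for that request, which is the kind of silent gap a single joined record makes visible.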
Time Window Replay
The second use case is broader. Not one request — a window of time.
What did the application do between 2pm and 4pm yesterday. Which endpoints were called and at what volume. Where did latency spike. Did document processing keep pace with request volume or fall behind. Were there clusters of failed or degraded requests that suggest a systemic issue rather than a one-off anomaly.
Time window replay is how you investigate incidents that do not announce themselves cleanly. No single failure. No obvious error. Just a pattern of degraded behavior across a window of time that only becomes visible when you can look at the full sequence of events together.
In plain terms — think of it as rewinding the tape. Not on one play but on a full quarter. You are looking for patterns, sequences, and correlations that individual request forensics would miss because no single request tells the whole story.
ApplicationHealthEvents gives you the request volume and latency picture. DocProcessingEvents shows whether ingestion kept up. OperationalMetrics shows whether the vector store was performing consistently. UserAdvancedSearchEvents shows whether query patterns shifted. PipelineEvents shows whether upstream data delivery was stable.
Together they give you a complete operational picture of any window of time the system has been running. That picture is how you move from "something felt off yesterday afternoon" to "here is exactly what happened and here is why."
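One way to picture the replay, as a sketch only: bucket the events in a window into fixed intervals and summarize each interval. The example below does this for request-level events alone, with hypothetical fields (event_ts, latency_ms, status) and made-up rows. The same bucketing could be applied to the other tables and lined up by interval.

```python
from collections import defaultdict
from datetime import datetime


def bucket_start(ts, bucket_minutes):
    """Round a timestamp down to the start of its bucket.

    Assumes bucket_minutes divides 60 evenly (5, 10, 15, 30, ...).
    """
    minute = ts.minute - ts.minute % bucket_minutes
    return ts.replace(minute=minute, second=0, microsecond=0)


def replay_window(events, start, end, bucket_minutes=15):
    """Summarize events with start <= event_ts < end, one entry per bucket."""
    buckets = defaultdict(lambda: {"requests": 0, "failed": 0, "max_latency_ms": 0})
    for event in events:
        ts = datetime.fromisoformat(event["event_ts"])
        if not (start <= ts < end):
            continue
        summary = buckets[bucket_start(ts, bucket_minutes)]
        summary["requests"] += 1
        summary["failed"] += int(event["status"] != "ok")
        summary["max_latency_ms"] = max(summary["max_latency_ms"], event["latency_ms"])
    return dict(sorted(buckets.items()))


# Hypothetical request events; field names and values are illustrative only.
events = [
    {"event_ts": "2024-05-01T14:05:00", "latency_ms": 200, "status": "ok"},
    {"event_ts": "2024-05-01T14:10:00", "latency_ms": 250, "status": "ok"},
    {"event_ts": "2024-05-01T14:20:00", "latency_ms": 1900, "status": "error"},
    {"event_ts": "2024-05-01T14:40:00", "latency_ms": 2100, "status": "ok"},
    {"event_ts": "2024-05-01T16:30:00", "latency_ms": 180, "status": "ok"},
]

timeline = replay_window(
    events,
    start=datetime(2024, 5, 1, 14, 0),
    end=datetime(2024, 5, 1, 16, 0),
)
for bucket, summary in timeline.items():
    print(bucket.isoformat(), summary)
```

Read top to bottom, the per-interval summaries show when latency started climbing and whether failures clustered, which no single request record would reveal on its own.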
Playback Is Not Fully Built Yet. The Foundation Is.
Honest caveat. The full playback capability (structured, tooled, and queryable across every event table through a clean interface) is not complete. The system is maturing. What exists today is the foundation, not the finished product.
What is locked is the architecture that makes it possible. The event taxonomy captures the right signals. The schema discipline ensures they are typed consistently and joinable across tables. The correlation keys — batch_id, request identifiers, timestamps — thread through every table so the chain can be walked when you need to walk it.
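As an illustration of what that schema discipline could look like in practice, the sketch below checks that an event carries its correlation keys before it is written. The per-table key sets are assumptions made for the example (the post names batch_id, request identifiers, and timestamps, but not which table carries which), and request_id, doc_id, and event_ts are hypothetical field names.

```python
# Hypothetical contract: which correlation keys each table must carry.
# The table names come from the post; the key assignments are assumptions.
REQUIRED_KEYS = {
    "ApplicationHealthEvents": {"request_id", "event_ts"},
    "OperationalMetrics": {"request_id", "event_ts"},
    "DocProcessingEvents": {"doc_id", "batch_id", "event_ts"},
    "PipelineEvents": {"batch_id", "event_ts"},
    "UserAdvancedSearchEvents": {"request_id", "event_ts"},
}


def missing_correlation_keys(table, event):
    """Return the required keys that are absent or empty in this event."""
    required = REQUIRED_KEYS[table]
    return sorted(key for key in required if event.get(key) in (None, ""))


good = {"batch_id": "batch-42", "event_ts": "2024-05-01T14:00:00"}
bad = {"doc_id": "doc-7", "status": "ok"}

print(missing_correlation_keys("PipelineEvents", good))
print(missing_correlation_keys("DocProcessingEvents", bad))
```

A check like this at write time is cheap, and it is what keeps the chain walkable later: an event that lands without its keys cannot be joined to anything when the question finally gets asked.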
You do not build playback and then decide what to capture. You capture everything correctly first, and playback becomes a natural capability of the layer you already built.
That sequencing is intentional. Get the events right. Get the schema right. Get the correlation keys right. The tooling that surfaces those events for investigation follows naturally as the team's operational questions get more sophisticated.
The foundation is being laid now. The capability deepens from there.
What This Gives the Business
Request tracing and playback are engineering capabilities. But their value is measured in business terms.
When a user escalates a complaint, the team has an answer in minutes, not hours. When the business sponsor asks what happened during yesterday's degraded window, the engineering team walks them through a complete timeline rather than offering uncertainty. When a pattern of poor results emerges in a specific query type, the team identifies it from the data rather than waiting for enough user complaints to make the pattern visible.
An AI system that can account for its own behavior is an AI system the business can trust. Not because it never fails. Because when it fails, the team can explain exactly what happened and demonstrate exactly what changed to prevent it from happening again.
That is not a technical capability. That is a business capability built on a technical foundation.
Build the foundation right and the business capability follows.
What's Next in This Series
Post 6 — Closing the Loop: Feeding clean signal to your observability platform and making the business conversation possible
Clarity through the chaos.