The decision to proceed past the pilot is a milestone, but it is not the finish line. Rather, it’s where a different kind of vitally important work begins. Even organizations that ran disciplined, well-structured pilots stumble in the transition, and the stumbles tend to follow a pattern.

Three gaps explain most of the friction: the visibility gap, the resilience gap, and the ownership gap. These gaps compound: visibility problems mask resilience issues, and resilience issues are worse without clear ownership.

The Visibility Gap

In a pilot, you see everything: every input, every output, every failure. In production, you see dashboards and alerts, if you’ve even built them.

The challenge is designing monitoring that catches meaningful problems without creating alert fatigue (alerts that people learn to ignore). By the time the pilot is done, the operational framework exists and the system can run the tasks, but throughout the pilot, the monitoring was you. Building automated monitoring that replaces your judgment is real work, potentially comparable in effort to building the pipeline itself.

The key is to use the pilot as calibration, not just validation. The quality signals you learned to watch for, the failure patterns you caught, the edge cases that surfaced: that is the specification for production monitoring. What you observed during the pilot doesn’t disappear when the pilot ends — it becomes the blueprint for what automated systems need to surface.

It is important to monitor in both directions. Tracking failures is obvious, but tracking whether quality is drifting downward before it becomes a failure is how you catch degradation early. In the ResortSteward pilot I described in my previous posts, I manually reviewed every email the system generated. That review let me revise and tweak parts of the system, and it was a big part of what made the pilot successful. At ten times the volume, that’s no longer viable. The system needs to tell you which emails to review, and it needs to do so before a customer notices a problem.
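One way to make that selection concrete is a review filter: hard flags for the failure patterns the pilot surfaced, plus a small random sample so a human still sees a slice of “normal” output. This is a minimal sketch, not ResortSteward’s actual implementation; the field names, flag rules, and sample rate are hypothetical placeholders you would calibrate from your own pilot data.

```python
import random

def needs_review(email: dict, sample_rate: float = 0.05) -> bool:
    """Decide whether a generated email should be routed to a human reviewer.

    The hard flags below stand in for whatever failure patterns your
    pilot actually surfaced; the random sample keeps a baseline picture
    of overall quality so drift in unflagged output stays visible.
    """
    # Hard flags: known failure patterns from the pilot (illustrative).
    if email["length"] < 50 or email["length"] > 4000:
        return True
    if any(marker in email["body"] for marker in ("{{", "}}", "[PLACEHOLDER]")):
        return True  # template variables that never got filled in
    # Random sample: review a small slice of apparently-normal output too.
    return random.random() < sample_rate
```

The point of the random sample is subtle but important: if you only review flagged output, you never learn about failure modes your flags don’t cover.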

The practical principle: monitor for drift (are outputs getting worse over time?) rather than perfection (is every output correct?). If you can’t describe what “worse than expected” looks like and set an alert for it, your monitoring isn’t done.
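As a sketch of what “monitor for drift” can mean in practice: compare a rolling average of a per-output quality score against a baseline established during the pilot, and alert when it sags below a tolerance band. Everything here is illustrative; `baseline`, `tolerance`, and the window size are assumptions you would tune, and the quality score itself is whatever metric your pilot taught you to trust.

```python
from collections import deque

class DriftMonitor:
    """Fire an alert when the rolling average quality score falls
    below a tolerance band around the pilot-era baseline.

    baseline, tolerance, and window are calibration assumptions,
    not universal defaults.
    """
    def __init__(self, baseline: float, tolerance: float = 0.05, window: int = 200):
        self.baseline = baseline
        self.tolerance = tolerance
        self.scores = deque(maxlen=window)

    def record(self, score: float) -> bool:
        """Record one output's quality score; return True if the drift alert fires."""
        self.scores.append(score)
        if len(self.scores) < self.scores.maxlen:
            return False  # not enough data yet to judge a trend
        avg = sum(self.scores) / len(self.scores)
        return avg < self.baseline - self.tolerance
```

Note that this never asks whether any single output is perfect; it asks whether the population is getting worse, which is exactly the question the principle above poses.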

The Resilience Gap

Pilots run on relatively controlled inputs under favorable conditions, but production runs on whatever shows up.

Three dimensions of resilience matter: data quality (malformed inputs, missing fields, sources that changed structure since the pilot ran), infrastructure (API outages, rate limits, timeout handling, model version changes), and cost (what happens to unit economics when volume or error rates climb).

Start by inventorying every dependency: data inputs, AI models, external APIs, third-party services. Each one can change, degrade, or disappear without notice. LLM behavior in particular can shift between model versions: the prompt that worked in March may not produce the same output in June. For each dependency, define what success looks like, then monitor for its absence rather than trying to anticipate every failure mode. A job that completes but produces no usable output is harder to catch than one that crashes outright, and far more dangerous.
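Monitoring for the absence of success might look like the following sketch: instead of enumerating failure modes, assert that the signals of a healthy run are present. The field names (`status`, `records_in`, `records_out`) and the 50% volume-drop threshold are illustrative assumptions, not a prescription.

```python
def check_job_health(job_result: dict) -> list[str]:
    """Return a list of alert messages; an empty list means the run looks healthy.

    Checks for the presence of success rather than enumerating failures:
    a run that exits cleanly but produces nothing is still an incident.
    """
    alerts = []
    if job_result.get("status") != "completed":
        alerts.append(f"job did not complete: {job_result.get('status')}")
    if job_result.get("records_out", 0) == 0:
        alerts.append("job produced zero output records")
    expected = job_result.get("records_in", 0)
    produced = job_result.get("records_out", 0)
    if expected and produced < 0.5 * expected:
        alerts.append(f"output volume dropped: {produced}/{expected} records")
    return alerts
```

The zero-output and volume-drop checks are the ones that catch the dangerous case from the paragraph above: a job that “succeeded” while silently doing nothing.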

The ratio of error-handling paths to “happy paths” should feel uncomfortable. If it isn’t, you haven’t thought hard enough about what can break. Build for graceful degradation: a job that errors and alerts is better than one that silently produces bad output.
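A graceful-degradation wrapper can be as simple as this sketch: run each stage, and on failure, alert loudly and hand downstream code a safe fallback instead of silently-bad output. `step_fn` and `fallback` are generic placeholders; in a real pipeline the alert would page someone rather than just write a log line.

```python
import logging

logger = logging.getLogger("pipeline")

def run_step(step_fn, payload, fallback=None):
    """Run one pipeline stage; on failure, alert and degrade gracefully.

    Returns (result, ok). Downstream code receives the explicit fallback
    value, never an exception swallowed into bad output.
    """
    try:
        result = step_fn(payload)
    except Exception as exc:
        # Loud failure: surface the error, then substitute a safe
        # placeholder so bad output never flows downstream unnoticed.
        logger.error("step %s failed: %s", step_fn.__name__, exc)
        return fallback, False
    return result, True
```

The `ok` flag is the important design choice: callers are forced to acknowledge that a step degraded, rather than discovering it later from a malformed result.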

The practical principle: the things that hurt you in production are the dependencies you assumed were stable. Assume none of them are stable and all will fail at some point.

The Ownership Gap

The pilot had a champion — someone who built it, understood every quirk, and cared about the results. Production needs more than a champion.

Pilots are engaging and people are invested. But once a system moves into production (especially when the initial transition goes smoothly!) it fades into the background and complacency builds quietly. Then something, somewhere, breaks, and the people who notice are your customers getting malformed emails or your suppliers wondering why payments stopped.

Treat the system the way you’d treat a person doing the same job. If an employee silently stopped doing part of their work or started doing it wrong, you’d have accountability structures to catch that: reviews, check-ins, escalation paths. Automated systems need the same, not because they’re fragile, but because unattended doesn’t mean unsupervised.

Ownership means a named individual accountable for the system’s health, documented procedures for common failures, and an escalation path when something unexpected happens. But it also means a structure around that person: who checks the work, how often, and how you verify the checking is actually happening. Accountability without oversight is just a name on a document.

The test: if the builder is unreachable for two weeks, does the system keep running and can someone else troubleshoot it?

This gap is organizational, not technical. It’s about whether the company has decided this system is infrastructure or still treats it as someone’s side project. Once that decision is made, the visibility and resilience gaps become solvable. Without it, they’re just technical debt with no owner.

Closing the Gaps

These three gaps are structural, but closing them is a human problem. Every one of them requires people to change how they work, not just what the technology does. Address them together, not sequentially: as noted earlier, visibility problems mask resilience issues, and resilience issues are worse without clear ownership.

Closing them isn’t about perfection — it’s about the same discipline of honest evaluation that guided the pilot decision in the first place. Automating the work doesn’t automate the accountability. That part is still yours.

Next week: The ownership gap goes deeper than naming a maintainer. It touches organizational structure and accountability, and I’ll dig into why “the team owns it” is usually the same as “nobody owns it.”

Wrap Up

This post is part of a series on the current state of AI, focused on how it can be applied in practical ways to deliver measurable improvements in productivity, cost savings, and response times. If you’d like to explore more, all previous posts are available under Insights; please read them and reach out with any questions or comments you have.