Part 1 of this series (“The Costs of AI You See, and the Ones You Don’t”) focused on identifying the full set of costs associated with implementing an AI-assisted solution. When deploying AI, it’s easy to focus on the costs of the AI model because that’s the central part of the solution. But often, other services are required to provide the AI model with good data, and it’s easy to dismiss those costs as secondary even though they can approach (or even exceed) the cost of AI model usage. This post stays on the subject of cost but shifts from identifying the different cost contributors to understanding how to manage them.
Most businesses are accustomed to allocating costs to projects. Costs that are directly attributable to a project or job are coded as such, and only costs that have no direct relationship to a project are coded as overhead. But I regularly see that discipline lacking when it comes to AI-focused projects: The costs are treated as “IT overhead” and subsumed into the IT budget, rather than being properly allocated to the projects using them.
There’s a foundational principle at play here: To properly allocate cost, there must be true visibility into what area of the business is responsible for that cost. Cost visibility is the operational discipline that makes the bill legible at the level you actually need to manage it, which is rarely the level at which the bill arrives. Without cost visibility, none of the cost-control techniques in next week’s Part 3 will have anything concrete to operate on.
The Shared API Key
The most common cost-visibility failure I see in production AI deployments is the shared API key.
The shape of the problem is familiar. One Anthropic or OpenAI account is set up early in the AI journey, usually for the initial AI pilot project. That key then gets reused as new projects spin up, and within a few months that single account is billing for prospecting, knowledge retrieval, internal experiments, and the coding workflow.
When the bill arrives, the costs default to the original coding in the G/L. If you’re fortunate enough to have a diligent Finance department, they may ask “Which project drove last month’s spike?”, but all too often the answer is “we don’t know” because the consolidated bill has no per-project breakdown. The same pattern repeats at the supporting-API layer: One SendGrid account serving prospecting, transactional mail, and marketing produces one number per month and no way to attribute it.
The diagnostic question to ask of any production AI deployment is this: If next month’s AI bill jumps 30%, which project caused it? If the honest answer is “we can’t tell,” the deployment is running blind, and most cost-control techniques cannot apply because there is nothing concrete to apply them to.
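To make the diagnostic concrete: once spend is recorded per project, the question reduces to a subtraction. The following is a minimal sketch, assuming each usage record already carries a project label. The function names and the dollar figures are hypothetical and chosen only so that the total rises by 30%.

```python
from collections import defaultdict

def spend_by_project(records):
    """Sum cost per project from (project, cost_usd) records."""
    totals = defaultdict(float)
    for project, cost_usd in records:
        totals[project] += cost_usd
    return dict(totals)

def explain_increase(last_month, this_month):
    """Return (project, delta_usd) pairs, largest increase first."""
    before = spend_by_project(last_month)
    after = spend_by_project(this_month)
    projects = set(before) | set(after)
    deltas = {p: after.get(p, 0.0) - before.get(p, 0.0) for p in projects}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)

# Invented figures for illustration: 1,000 USD last month, 1,300 USD this month.
last_month = [("prospecting", 400.0), ("knowledge-retrieval", 300.0), ("coding", 300.0)]
this_month = [("prospecting", 680.0), ("knowledge-retrieval", 315.0), ("coding", 305.0)]

for project, delta in explain_increase(last_month, this_month):
    print(f"{project}: {delta:+.2f} USD")
```

With a shared key and no tagging, the inputs to this calculation do not exist, which is the whole problem.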
The Fix Is Operational, Not Technical
The cleanest path is separate API keys per project where the vendor supports them. That puts the boundary at the account level, which is the same boundary the invoice uses. Where separate keys are not practical, the major model APIs and most supporting APIs accept a metadata, user-id, or category field on each call. How far that field flows through to the vendor’s usage dashboards and invoice varies by vendor, so it is worth recording the same tag alongside each call’s usage on your own side as well. Tagging at the call level by project, feature, or customer makes attribution possible after the fact. A cost dashboard that breaks spend down by those same dimensions turns “where did the money go” from a forensic exercise into a routine review.
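Call-level tagging can be sketched without committing to any one vendor. The example below is an illustration under stated assumptions, not a vendor integration: `call_model` is a stand-in for whatever client you use and is assumed to return the response text plus input and output token counts, `example-model` is a made-up model name, and the prices are placeholders rather than real rates.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Placeholder prices in USD per million tokens; substitute your vendor's rates.
PRICE_PER_MTOK = {"example-model": {"input": 3.00, "output": 15.00}}

@dataclass
class UsageRecord:
    project: str
    feature: str
    model: str
    input_tokens: int
    output_tokens: int
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    @property
    def cost_usd(self) -> float:
        rates = PRICE_PER_MTOK[self.model]
        return (self.input_tokens * rates["input"]
                + self.output_tokens * rates["output"]) / 1_000_000

def tagged_call(ledger, call_model, *, project, feature, model, prompt):
    """Invoke call_model and append a tagged usage record to the ledger.

    call_model is assumed to return (text, input_tokens, output_tokens).
    """
    text, input_tokens, output_tokens = call_model(model, prompt)
    ledger.append(UsageRecord(project, feature, model, input_tokens, output_tokens))
    return text

def fake_model(model, prompt):
    # Stub so the sketch runs without network access or credentials.
    return "ok", 1000, 200

ledger = []
tagged_call(ledger, fake_model, project="prospecting", feature="profile-search",
            model="example-model", prompt="Find matching companies")
print(ledger[0].project, ledger[0].feature, ledger[0].cost_usd)
```

Making `project` and `feature` required keyword arguments is the design choice that matters here: an untagged call becomes an error at the call site instead of an unattributed line on the bill.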
The dashboard is the part that has to be designed deliberately. Three dimensions matter more than the others, and most production deployments need all three. Cost per project answers the attribution question the shared-key problem creates. Cost per useful output (per qualified lead, per answered question, per shipped feature) is the metric to actually optimize for; cost per call is too granular and rarely tells you anything actionable. Per-day burn rate with a rolling 30-day forecast is the watchful eye that catches runaway workloads well before the monthly bill arrives; the invoice is much later than you want to find out.
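The burn-rate dimension is the easiest of the three to prototype. The sketch below assumes you already have one spend total per day and uses a simple trailing average as the forecast; the 7-day window, the budget figure, and the function names are illustrative choices, and a real dashboard might prefer a more robust estimator.

```python
from statistics import mean

def forecast_30_day(daily_spend, window=7):
    """Project 30 days of spend from the average of the most recent days."""
    if not daily_spend:
        raise ValueError("need at least one day of spend")
    recent = daily_spend[-window:]
    return mean(recent) * 30

def is_runaway(daily_spend, monthly_budget, window=7):
    """True when the projected 30-day spend exceeds the monthly budget."""
    return forecast_30_day(daily_spend, window) > monthly_budget

# Invented figures: ten quiet days, then a workload that quadruples daily spend.
daily_spend = [10.0] * 10 + [40.0] * 4

print(f"30-day forecast: {forecast_30_day(daily_spend):.2f} USD")
print("over budget:", is_runaway(daily_spend, monthly_budget=500.0))
```

In this example the alert would fire four days into the spike rather than at invoice time, because the short window lets recent days dominate the projection.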
In the prospecting pipeline I have written about over the last few weeks, every API call is tagged with both the customer-profile definition being searched and the batch identifier of the run. The dashboard rolls those up automatically, so when I want to know how a specific profile performed in dollar terms (cost per qualified prospect, by profile, over the last quarter) I can look at it the same morning the run completes. That is the level of granularity that makes any of the techniques we’ll look at in Part 3 actually possible.
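A roll-up of that kind can be sketched in a few lines. This is not the pipeline's actual code: the record shape (`profile`, `batch_id`, `cost_usd`, `qualified`), the profile names, and the numbers are all hypothetical, and the sketch assumes per-call costs have already been summed per batch.

```python
from collections import defaultdict

def cost_per_qualified_prospect(runs):
    """Roll batch-level records up to cost per qualified prospect, by profile.

    Returns None for a profile that spent money but produced no qualified
    prospects, so that case stays visible instead of raising an error.
    """
    cost = defaultdict(float)
    qualified = defaultdict(int)
    for run in runs:
        cost[run["profile"]] += run["cost_usd"]
        qualified[run["profile"]] += run["qualified"]
    return {
        profile: (cost[profile] / qualified[profile] if qualified[profile] else None)
        for profile in cost
    }

# Invented batch records for two hypothetical profiles.
runs = [
    {"profile": "profile-a", "batch_id": "batch-001", "cost_usd": 12.0, "qualified": 6},
    {"profile": "profile-a", "batch_id": "batch-002", "cost_usd": 18.0, "qualified": 9},
    {"profile": "profile-b", "batch_id": "batch-001", "cost_usd": 20.0, "qualified": 0},
]

for profile, unit_cost in cost_per_qualified_prospect(runs).items():
    print(profile, "no qualified prospects" if unit_cost is None else f"{unit_cost:.2f} USD each")
```

Keeping the batch identifier on each record means the same data can be regrouped by run when a single batch looks anomalous.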
None of this is terribly sophisticated; it just needs attention. Most production deployments are well underway before anyone gets around to setting it up, and the reason is almost always the same: The pilot worked, traffic started to scale, and nobody stopped to instrument the cost dashboard because the bill was small enough not to demand attention. By the time the bill demands attention, the instrumentation is a retrofit project on a system that is already running in production. The earlier this gets done, the cheaper it is to do, and the more visibility you’ll have into historical metrics.
The Principle
You cannot optimize what you do not measure. A shared API key is not typically a decision that someone made, but rather a default that crept into the deployment without anyone noticing. Being aware of this issue and addressing it up front is one of the highest-leverage things you can do on an AI deployment, because it converts the cost question from something that arrives once a month on an invoice you cannot interrogate into something you can proactively manage.
The Cost That Never Reaches the Invoice
There is one more visibility problem worth naming in this post, and it has nothing to do with API keys. There’s a large hidden cost in many production AI deployments that never appears on an invoice at all. The hours spent reviewing AI output, fixing the underlying corpus, handling exceptions, and evaluating new model versions every quarter are AI costs whether anyone labels them that way or not. The other cost categories respond to optimization techniques, but this one responds to operational maturity. If you want to evaluate the true cost of an AI implementation, you need to explicitly track those hours alongside the dollars on the API bill. The Part 3 techniques will reduce the dollar lines, but only deliberate attention to the human-time line will reduce this one.
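Putting the hours next to the dollars is simple arithmetic once the hours are logged. The sketch below assumes a single loaded hourly rate for everyone involved, which is a simplification; the rate, the hours, and the API total are invented for illustration.

```python
def true_monthly_cost(api_usd, hours_by_activity, loaded_hourly_rate):
    """Combine invoiced API spend with the cost of tracked human hours."""
    human_usd = sum(hours_by_activity.values()) * loaded_hourly_rate
    total_usd = api_usd + human_usd
    return {
        "api_usd": api_usd,
        "human_usd": human_usd,
        "total_usd": total_usd,
        # Fraction of the total that never appears on a vendor invoice.
        "human_share": human_usd / total_usd if total_usd else 0.0,
    }

# Invented figures: 40 tracked hours at a hypothetical loaded rate of 75 USD/hour.
hours = {
    "output review": 20,
    "corpus fixes": 10,
    "exception handling": 6,
    "model evaluation": 4,
}
summary = true_monthly_cost(api_usd=1200.0, hours_by_activity=hours, loaded_hourly_rate=75.0)
print(summary)
```

In this made-up month the human line is larger than the invoice, and it is the line no vendor dashboard will ever show.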
Closing
This post made the case for cost visibility as the precondition for cost control. Part 3 of this series is the operational one, where we’ll look at practical techniques I use to bring the AI bill down at every layer of the stack: model selection, hybrid hosting, search-API choices, enrichment waterfalls, mailing trade-offs, and caching strategies.