brianletort.ai
AI Architecture · AI Control Plane · Token Economics · Executive · Enterprise AI · AI Governance · FinOps

Designing the AI Control Plane

Seventeen control planes, zero control. The architecture pattern that turns the CEO's token-economics argument and the Data Gravity placement argument into a single governed operating system for enterprise AI.

April 19, 2026 · 16 min read

TL;DR

  • The moment an enterprise lets every app team build its own budget logic, prompts, and routing, it has seventeen control planes and zero control. The platform move is to push that logic up — once — and make every app a client of it.
  • A real AI control plane does five jobs on every request, before a single token is sent: classify, price, route, eval, and ledger. Skip any one and the economics, the policy, or the audit story breaks.
  • The control plane is a data model, not a sidecar. Nine entities and a handful of foreign keys turn 'we think AI is governed' into 'we can prove it to a regulator in an afternoon.'
  • FOCUS-compliant cost records are the interface between the AI ledger and the enterprise ERP. They are what makes AI spend legible to finance without a six-month integration project.
  • The organizations winning at enterprise AI in 2026 are not the ones with the best models. They are the ones that own the control plane where those models are asked to run.

A CIO at a global bank reached out through a mutual friend a few weeks back — the kind of call where someone wants to compare notes, not sell anything. She opened by sharing her screen: an AI cost dashboard showing thirty-one million dollars, trailing twelve months. She knew that number. She wanted to know the other one.

"How much of that is governed?"

She described the answer. It took her a while to get through it. Seven different app teams had integrated seven different vendors. Four of them had their own prompt registry. Three of them had their own caching layer. Two of them had their own retry logic. Nobody had a canonical list of which regulated data classes were allowed to travel to which region. Compliance had signed off on the concept, not on any specific run. The chargeback report came out of a finance spreadsheet that nobody trusted, reconstructed from vendor invoices that did not agree with each other to within a tenth of a percent.

Then she said it, and I wrote it down: "I have seventeen control planes. I have zero control."

That meeting is becoming universal. The CEO's Guide to Token Economics argued that token economics is an operating discipline, not a procurement discipline. Data Gravity Meets Token Economics argued that placement is the other half of that discipline. Both essays left a specific question unanswered: what, concretely, does the system that runs those disciplines look like?

This is the architecture post. It is for the CIO and the CTO who have to build it. It is also for the CEO, the CFO, and the board member who will write the checks — because if you cannot recognize a real AI control plane when a vendor shows you one, you will buy a dashboard instead of a platform, and the bill will keep growing.

Why the control plane belongs outside the app

The first architectural sin is the most common one.

Every app team building with AI eventually writes budget logic, prompt scaffolding, model-selection rules, retry handlers, and audit trails. Every app team writes them a little differently. Every app team has an opinion about which model is best, which cache to use, and how much to spend per user. Every app team is right, for its own scope, for about a quarter. Then the next team stands up, makes different choices, and the enterprise now has two control planes with divergent policies, divergent costs, and divergent audit trails.

Repeat eight times. That is where most large enterprises sit right now.

The correct architecture inverts it. The control plane lives outside the app. Apps, agents, and copilots are clients — they request intelligence, the control plane decides how to satisfy the request, and the result comes back with policy, cost, and audit already attached. The app does not choose the model. The app does not enforce residency. The app does not do the chargeback math. The app writes business logic, which is what it is supposed to do.
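
To make the inversion concrete, here is a minimal sketch of what the client side can look like. Everything in it (the ControlPlaneClient name, the ask call, the Verdict fields) is illustrative, not a real product API; the point is the shape: the app states a task, and model choice, residency, and cost come back already decided.

```python
# Sketch of the inversion: the app requests intelligence; the control
# plane owns model choice, residency, and chargeback. All names here
# are illustrative, not a real product API.
from dataclasses import dataclass

@dataclass
class Verdict:
    output: str        # model response, already eval-gated
    cost_usd: float    # fully loaded cost, already on the ledger
    run_id: str        # foreign key into the audit ledger

class ControlPlaneClient:
    """What an app sees: no model names, no regions, no prices."""

    def ask(self, task: str, payload: dict) -> Verdict:
        # Stub. A real client calls the platform, which classifies,
        # prices, routes, evals, and ledgers before returning.
        return Verdict(output="<governed response>", cost_usd=0.0, run_id="run-000")

def summarize_claim(plane: ControlPlaneClient, claim: dict) -> str:
    # Business logic only: the app says what it needs, never how.
    return plane.ask("claims.summarize", claim).output
```

The app code never names a model, a region, or a price; that is the whole inversion.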

That is not exotic. It is the same inversion that happened with authentication (apps stopped rolling their own auth and called an identity provider), payment processing (apps stopped handling cards and called a gateway), and secrets management (apps stopped hard-coding credentials and called a vault). Every one of those inversions looked optional until the compliance surface caught up with them, and then it looked mandatory.

Enterprise AI is ten minutes before mandatory.

The token policy engine, five steps at a time

The core of the control plane is a policy engine that runs on every request, in the same five steps, every time.

The token policy engine, one request at a time

Five steps the AI control plane performs before a single token is sent to a model. Skip any one of them and the economics, the policy, or the audit story breaks.

Step 1: Classify. What is this request?

Admission control. Before a request is even priced, it is classified along four axes: data sensitivity, domain ownership, workload character, and residency constraint. Unclassified requests are rejected at the door.

Inputs

  • Agent request
  • Context package
  • Caller identity

Outputs

  • Sensitivity class
  • Domain owner
  • Residency lock

The engine runs on every request, every time. There is no "skip for this agent" option. That symmetry is what lets finance, compliance, and the CEO trust a single number at the top of the house.

The shape of the engine matters more than the implementation. There are five disciplines, five handoffs, and five things that break when any one of them is missing. Walk the flow above in order.

Classify. Before a request is priced, it is labeled. Sensitivity class from the data it intends to read. Domain owner from the caller's identity. Workload character from the request shape. Residency lock from the asset catalog. Unclassified requests are rejected at admission. This is the step most in-house control planes skip — "we trust our engineers" is how it gets rationalized — and without it the rest of the pipeline is useless. You cannot route by sovereignty you never assigned.
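
A hedged sketch of the admission step, with toy lookup tables standing in for the asset catalog and identity system (every name and value is illustrative):

```python
# Admission-control sketch. The lookup tables stand in for the asset
# catalog and identity system; all names and values are illustrative.
SENSITIVITY_BY_ASSET = {"claims_db": "regulated", "public_wiki": "public"}
RESIDENCY_BY_ASSET = {"claims_db": "eu-sovereign", "public_wiki": "any"}

def classify(request: dict) -> dict:
    """Label a request on all four axes, or reject it at the door."""
    asset = request.get("data_asset")
    if asset not in SENSITIVITY_BY_ASSET:
        # Unclassified requests never reach pricing or routing.
        raise PermissionError("unclassified request: rejected at admission")
    return {
        "sensitivity": SENSITIVITY_BY_ASSET[asset],      # from the data it reads
        "domain": request["caller"].split("/")[0],       # from caller identity
        "workload": request.get("task", "interactive"),  # from request shape
        "residency_lock": RESIDENCY_BY_ASSET[asset],     # from the asset catalog
    }
```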

Price. Count before call. The major model providers expose token-counting APIs precisely so this step can happen at the control plane. Estimate tokens, multiply by the tier the router is about to pick, subtract any cache leverage, compare against the domain's wallet balance, and — if the quote exceeds policy — reject the request before the model is ever contacted. Price is not a single number, it is a function of model, lane, cache, and locality.
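
The count-before-call logic fits in a few lines. The lane prices, cached-token rate, and wallet check below are invented placeholders, not any provider's real rates; real engines get the token estimate from the provider's token-counting API:

```python
# Count-before-call sketch. Lane prices and the cached-token rate are
# invented; real engines use the provider's token-counting API for the
# estimate and the domain wallet for the policy check.
PRICE_PER_1K_INPUT = {"utility": 0.0005, "regional": 0.0015}  # USD, illustrative

def quote(prompt_tokens: int, cached_tokens: int, lane: str,
          wallet_balance_usd: float) -> float:
    cached_rate = 0.10  # cached input billed at ~10% of list price
    billable = (prompt_tokens - cached_tokens) + cached_tokens * cached_rate
    cost = billable / 1000 * PRICE_PER_1K_INPUT[lane]
    if cost > wallet_balance_usd:
        # Rejected before the model is ever contacted.
        raise RuntimeError("quote exceeds domain wallet")
    return cost
```

Price comes out as a function of model tier, lane, and cache leverage, which is exactly the claim in the paragraph above.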

Route. Pick the cheapest acceptable lane. Utility-tier public call, regional API, reserved deployment, sovereign colocated GPU, batch-flex. Residency class is a hard constraint. Quality floor, latency budget, and data gravity are cost dimensions. Every routing decision is logged — not "this call went to Claude", but "this call went to Claude on the regional lane in us-east-governed because the residency lock required it, and the router chose this model because the eval spec for this workload requires a 0.85 floor that the utility tier has historically missed."
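
A minimal version of cheapest-acceptable-lane routing, with residency as a hard filter and quality as a floor. The lane table, quality scores, and prices are invented for illustration:

```python
# Cheapest-acceptable-lane sketch. Lanes, quality scores, and prices
# are invented; residency is a hard constraint, quality is a floor,
# and price breaks ties among whatever survives.
LANES = [
    {"name": "utility",   "region": "any",              "quality": 0.82, "usd_per_1k": 0.0005},
    {"name": "regional",  "region": "us-east-governed", "quality": 0.88, "usd_per_1k": 0.0015},
    {"name": "sovereign", "region": "eu-sovereign",     "quality": 0.88, "usd_per_1k": 0.0040},
]

def route(residency_lock: str, quality_floor: float) -> dict:
    """Filter by policy first, then pick the cheapest survivor."""
    eligible = [
        lane for lane in LANES
        if (residency_lock == "any" or lane["region"] == residency_lock)
        and lane["quality"] >= quality_floor
    ]
    if not eligible:
        raise RuntimeError("no lane satisfies policy: rejected")
    return min(eligible, key=lambda lane: lane["usd_per_1k"])
```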

Eval. Verification gate. Before the output ships back to the caller, it runs through an eval scoped to the workload. Pass, and the ledger records success. Fail, and the request retries, escalates to a higher-tier model, or enters a human-review lane. This is the step that lets you measure "cost per verified outcome" — without it, you are measuring cost per attempt, which tells you nothing about whether the platform is creating value.

Ledger. Every run lands in an immutable audit record. FOCUS-compliant fields so finance can reconcile AI spend against ERP accounts without a six-month project. Regulators asking the locality question get an answer from the ledger instead of from a forensics exercise. Showback today; chargeback when the data is stable.

The five steps are not optional, in the same way that the five steps of a payment transaction — authorize, capture, settle, reconcile, refund — are not optional. You can skip them, but you will rebuild them in a panic when the CFO asks for numbers she cannot produce or the regulator asks a question you cannot answer.

The control plane is a data model

The second architectural sin is thinking that the control plane is a layer of middleware. It is not. It is a schema.

The control plane as a data model

Nine entities, one fact table, a handful of foreign keys. This is the part of the architecture most teams get wrong by building the policy engine without the schema underneath it.

  • INFERENCE_RUN: the fact record
  • DOMAIN: who owns the spend
  • BUDGET: policy on spend
  • AGENT: what asked
  • DATA_ASSET: what it saw
  • MODEL: what ran
  • ROUTE_POLICY: where it ran
  • EVAL_RESULT: how well it ran
  • CHARGE_RECORD: what it cost whom

The verbs on the arrows: funds, sponsors, issues, feeds, serves, governs, scored_by, debits, caps.
INFERENCE_RUN: the fact record

The central fact in the control plane. Every run creates exactly one record, immutable, with foreign keys to everything around it. If the ledger is the source of truth, this is the fact table.

Key fields

run_id · timestamp · outcome · cost_usd · eval_score

Relationships

  • DOMAIN · sponsors
  • AGENT · issues
  • DATA_ASSET · feeds
  • MODEL · serves
  • ROUTE_POLICY · governs
  • EVAL_RESULT · scored_by
  • CHARGE_RECORD · debits

Every arrow here is a foreign key the audit team can follow. This is why the control plane is a platform, not a sidecar — it owns the schema that makes the rest of the argument auditable.

Nine entities. One fact table at the center — INFERENCE_RUN — with foreign keys to everything that matters for governance. Domain owns budget. Agent issues runs. Data asset feeds runs. Model serves runs. Route policy governs runs. Eval result scores runs. Charge record debits runs. Every arrow is a question the audit team can answer by following a join.

This is the part of the architecture most teams get wrong. They build the policy engine without the schema underneath it. The engine makes decisions, but the decisions do not land anywhere queryable. Six months in, when the finance team asks for spend-by-domain-by-lane, or the compliance team asks for all inference that ran against a specific data asset, the answer is a week of engineering effort instead of a SQL query.

The right move is to design the schema first. Stand up the INFERENCE_RUN table before you write the policy engine. Populate it from day one. Every downstream feature — showback, chargeback, audit, FinOps reporting, cost-per-outcome metrics — becomes a view on top of that single well-designed fact table.

A good test: can your control plane answer the following five questions in under a minute, without a human running a script?

  1. What is every inference run that touched data asset X in the last ninety days?
  2. What is the total AI spend by domain, broken out by lane, for the current fiscal quarter?
  3. Which runs in the last thirty days failed their eval floor, and what did they cost?
  4. What is the regional premium paid by each domain, set against its residency-compliance rate?
  5. Which agents have the fastest-growing spend, and what is their quality trend?

If the answer is "give me a week," the control plane is not yet a platform. It is a pile of plumbing.
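
As a sketch of what "a SQL query, not a week of engineering" means, here is question 2 answered against a toy INFERENCE_RUN table in SQLite. The schema and rows are an invented, simplified stand-in for the nine-entity model:

```python
# Question 2 (spend by domain, broken out by lane) as one GROUP BY over
# a toy fact table. Schema and rows are invented; the real table would
# carry all sixteen ledger fields and foreign keys.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE inference_run (
        run_id TEXT PRIMARY KEY, ts TEXT, domain TEXT, lane TEXT,
        cost_usd REAL, eval_score REAL, outcome TEXT);
    INSERT INTO inference_run VALUES
        ('r1', '2026-04-01', 'claims',  'regional', 0.024, 0.91, 'verified'),
        ('r2', '2026-04-02', 'claims',  'utility',  0.003, 0.84, 'verified'),
        ('r3', '2026-04-02', 'support', 'utility',  0.002, 0.79, 'failed');
""")
# Each of the five questions becomes a view on this one fact table.
spend = db.execute("""
    SELECT domain, lane, SUM(cost_usd)
    FROM inference_run
    GROUP BY domain, lane
    ORDER BY domain, lane
""").fetchall()
```

Every other question on the list is the same shape: a filter, a join, or an aggregate over the fact table populated at the run.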

The audit ledger, on one run

The ledger is not a log file. It is the interface between the AI platform and the rest of the enterprise. Finance reads it to produce the chargeback. Compliance reads it to respond to audits. The board reads a rolled-up version of it on the quarterly slide. The regulator reads it when the phone rings.

A single AI run, on the ledger

What an auditable AI run actually looks like when the control plane is doing its job. Sixteen fields, every one of them answerable.

inference_runs / record #2638711 (FOCUS-compliant)

cost_usd (Economics): $0.024

Fully loaded cost for this run, including any regional or sovereign premium. Premium is broken out in the charge_record entry so finance can see what locality actually cost.

Field groups: Identity · Routing · Economics · Quality · Governance

Sixteen fields. Every one of them is answerable because every one of them is populated at the moment of admission or at the moment of return, by the control plane, before the run is considered complete. Not at the end of the quarter. Not when someone asks. At the run.

Three fields in particular separate mature operators from experimenters.

focus_account — a FOCUS-compliant account key that maps this run into the ERP chart of accounts. FOCUS is the FinOps Foundation's open cost-record specification; treating it as a first-class field rather than an afterthought is what makes AI spend legible to finance without a multi-quarter integration project. The CFO should be able to run her standard variance report and see AI alongside everything else, without a translation layer.
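
A sketch of projecting one ledger run into a FOCUS-style record. The column names here follow the spirit of the spec (BilledCost, ServiceCategory, and so on); verify the exact field names against the current FinOps Foundation specification before wiring anything to the ERP:

```python
# One ledger run projected into a FOCUS-style cost record. Column names
# follow the spirit of the FOCUS spec and should be checked against the
# current specification; the input field names are illustrative.
def to_focus_record(run: dict) -> dict:
    return {
        "BilledCost": run["cost_usd"],
        "ServiceCategory": "AI and Machine Learning",
        "ChargePeriodStart": run["ts"],
        "SubAccountId": run["focus_account"],  # maps into the ERP chart of accounts
        "Tags": {"domain": run["domain"], "lane": run["lane"]},
    }
```

Because the projection is a pure function of the run record, finance gets a feed in its own vocabulary without the AI platform changing anything upstream.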

eval_score and outcome — these two fields together are what make the cost per verified outcome metric calculable at all. An operator that does not populate them is publishing token volumes and calling them metrics. A mature operator is publishing verified outcomes at a per-workload cost and trending that number quarter over quarter. The difference is measurable; the conversation in the boardroom is not the same.

residency_lock — the constraint set during classification. The lane and region fields have to satisfy it. If they do not, the ledger flags a policy violation. Most enterprises have been running for two years without this field. Most enterprises will wish, in retrospect, that they had not.
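
The ledger-side check is small enough to show whole. Field names are illustrative; the invariant is the point: the region on the record must satisfy the lock set at classification, or the run is flagged.

```python
# Ledger-side residency check (field names illustrative). A run whose
# region does not satisfy the lock set at classification is flagged as
# a policy violation on the record itself.
def residency_violation(run: dict) -> bool:
    lock = run["residency_lock"]
    return lock != "any" and run["region"] != lock

# A regulated run that landed outside its lock gets flagged:
flagged = residency_violation({"residency_lock": "eu-sovereign", "region": "us-east"})
```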

The cache and batch strategy: context as capital, async as off-peak

Two of the largest economic levers in enterprise AI live in the control plane and almost nowhere else.

The first is cache discipline. Every non-trivial AI workload sends the same context over and over — system prompts, policy blocks, tool definitions, few-shot examples, retrieval snippets. Major providers now discount cached input tokens to ten percent of standard price and document latency reductions of up to eighty percent. At portfolio scale, cache leverage is the single largest cost lever most enterprises have not pulled. The control plane is the only place that can pull it consistently, because only the control plane sees every request across every app. App-level caching creates fragmentation; control-plane caching creates compounding.

Treat repeated context as reusable capital. System prompts are not strings, they are instruments. Policy blocks are not boilerplate, they are assets. Tool definitions are not JSON, they are infrastructure. The control plane is what lets you depreciate that infrastructure across millions of requests, which is what makes the unit economics work.

The second is batch and flex lanes. Not every workload is latency-sensitive. Overnight enrichment, eval runs, back-office summarization, periodic data tagging — any of these is a candidate for asynchronous execution. Major providers discount asynchronous batches by up to fifty percent. The control plane is the only place that can route work into the right lane, because the app does not know — and should not know — which of its requests can wait. It just asks for intelligence; the control plane decides when "intelligence" means "right now" and when it means "next hour, at off-peak economics."

Both levers are worth double-digit percentage points of AI spend, separately and combined. Both are inaccessible without a real control plane.
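
A back-of-envelope of the two levers together, using the discounts cited above; the spend, cache-hit, and batch-share figures are invented, so substitute your own portfolio numbers:

```python
# Back-of-envelope for the two levers, using the discounts cited above
# (cached input at ~10% of list price, async batches at ~50% off).
# Spend, hit-rate, and batch-share figures are invented.
monthly_input_spend = 1_000_000  # USD of input tokens at list price
cache_hit_rate = 0.40            # share of input tokens served from cache
batch_share = 0.30               # share of remaining spend movable to batch lanes

after_cache = monthly_input_spend * (1 - cache_hit_rate * 0.90)  # cache saves 90% on hits
after_batch = after_cache * (1 - batch_share * 0.50)             # batch saves 50% on that share
savings = 1 - after_batch / monthly_input_spend
print(round(savings, 3))  # 0.456: roughly 46% off, from two config-level levers
```

Even with much more conservative inputs, the combined effect clears the double-digit bar the paragraph above claims.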

The eval-and-approval loop is not optional

A quick but critical point on quality.

The eval step is the weakest link in most in-house control planes. Teams build the policy engine, the router, and the ledger. They say they will "add evals later." Six months later, the platform is shipping thousands of runs a day, nobody knows how good those runs are, and the CFO's question — "how do we know this is working?" — goes back to the same two data points it started with: adoption and anecdote.

Build the eval gate with the rest of the platform. The specification does not need to be complex on day one: a small number of assertions, a configurable quality floor, pass/fail output, escalation to a higher-tier model or to human review on failure. The important thing is that the field is on the record for every run, from the first run. Quality is measurable in retrospect only if you measured it at the time.

The architectural pattern is simple: route on the way out, eval on the way back. A failed eval returns the request to the router with an escalation flag, which picks the next-best lane. A second failure routes to a human-review queue with the original context, the failed output, and the eval assertions that failed. The human reviews, corrects, and stamps the record. The next run of the same workload learns from the correction.
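
The route-out, eval-back loop in sketch form. The router, model call, and eval gate are passed in as callables because their internals vary by platform; the control flow is the pattern:

```python
# Route-out / eval-back sketch. The router, model call, and eval gate
# are injected callables (their internals vary); the escalation control
# flow is the pattern described above.
def run_with_eval(request, router, call_model, eval_gate, review_queue):
    """One attempt, one escalation, then human review."""
    lane = router(request, escalate=False)
    output = call_model(lane, request)
    if eval_gate(output):
        return output, "verified"
    lane = router(request, escalate=True)     # next-best, higher-tier lane
    output = call_model(lane, request)
    if eval_gate(output):
        return output, "verified_on_retry"
    review_queue.append((request, output))    # original context + failed output
    return output, "human_review"
```

The status string lands on the ledger record, which is what makes "cost per verified outcome" computable later.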

Without this loop, the scorecard at the top of the house is a lie. With it, the scorecard is a business system.

Five architecture questions a CEO should ask the CIO

The boardroom version of this post is five questions. Ask them cold. The answers are diagnostic.

  1. Show me one inference run. Not a dashboard. One run. If the platform team can pull up a single record with all sixteen fields populated in under a minute, the control plane is real. If the answer is "let me build a query for that," it isn't yet.

  2. What happens when a regulated workload tries to run against a non-compliant region? The answer should be "it is rejected at admission, before the model is contacted." If the answer involves a review board, a ticket, or a manual check, residency is a convention, not a policy.

  3. How much of your AI spend is directly allocated to a business domain in FOCUS-compliant records? If the number is under 60 percent, the chargeback conversation you are about to have with finance is going to go badly. If the number is above 80 percent, you are operating, not experimenting.

  4. Which models are running on which lanes, by policy, right now? A mature control plane can show a live routing report with model mix, lane mix, cache-hit rate, and regional-premium-paid — by domain, by workload, by week. If it cannot, nobody is measuring the single highest-leverage cost decisions the platform is making.

  5. When was the last model deprecation handled by the platform without an app-team emergency? Model deprecations are the stress test of a real control plane. If deprecations still require every app team to rewrite code, the control plane is not running the router. If deprecations are a routing-policy update — one config change, one release, done — the platform is working.
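
What "a routing-policy update" can look like in its smallest possible form, with invented workload and model names. Apps reference workloads; only the policy table knows models:

```python
# Deprecation as a config change, not an app rewrite. Workload and
# model names are invented; apps reference workloads, and only the
# policy table knows which model serves each one.
ROUTE_POLICY = {
    "claims.summarize": {"model": "model-a-2025", "lane": "regional"},
    "support.triage":   {"model": "model-a-2025", "lane": "utility"},
}

def deprecate(old_model: str, new_model: str) -> None:
    """One config change retires a model across every workload."""
    for policy in ROUTE_POLICY.values():
        if policy["model"] == old_model:
            policy["model"] = new_model

deprecate("model-a-2025", "model-b-2026")
# Apps keep calling "claims.summarize"; nothing on their side changed.
```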

Most CEOs will not get a clean answer to all five. That is fine. The question is whether the CIO has a credible plan to get to yes on each, on a timeline a board can endorse.

The ERP connection, in one sentence

One last architectural note, because this is the one most often overlooked.

The AI platform should reconcile against the ERP as a peer system, not as a feed from a cost center. FOCUS is the vocabulary that makes this possible without a bespoke integration for every vendor. When the CFO pulls her quarterly finance report, she should see AI in the same layout as every other operating expense — not as a line item called "Shared Infrastructure" or "Platform Services," which is how most enterprises hide AI from the view of the people who ought to be governing it.

That is a one-sentence change in architecture that compounds into a decade of different conversations with finance.

The leadership move

Everyone has access to the same models. Everyone has access to the same APIs. Everyone has access to the same caching, batching, and regional-deployment options.

The difference between the organizations that will create lasting economic value from AI and the organizations that will merely spend on it is not the models. It is the control plane.

The CEOs I talk to most often right now are the ones who have stopped asking their CIOs "which model should we standardize on" and started asking "who owns the control plane." The answer to the first question changes every quarter. The answer to the second question — if it is a named owner, with a named team, with a named platform, with a schema that finance and compliance can both read — is the thing that compounds.

Own the control plane. The models are rented. The control plane is the asset.


This is the architecture post in the executive token-economics thread. The frame is in The CEO's Guide to Token Economics. The placement dimension is in Data Gravity Meets Token Economics. The measurement layer is in The Enterprise Token Scorecard. The technical substrate lives in the three-part Token Economy series. All four meet in the control plane — which is why it has its own essay.