TL;DR
- Retrieval returns candidates, not a governed working set. Memory determines what can be accessed. The missing layer is context compilation.
- Context IR is the portable intermediate representation between compilation and runtime — structured, inspectable, governable, provenance-preserving
- The six-layer architecture separates source, substrate, compiler, IR, lowering, and runtime — making the governed context layer explicit
- Compilation quality is a constrained optimization: relevance, cost, latency, policy risk, and provenance loss all matter simultaneously
In Part 1, I showed that existing benchmarks miss governance, safety, provenance, and compilation efficiency, and defined eight metrics to measure those gaps.
But measuring the gaps told us something bigger than any individual metric result: there is a missing systems layer between retrieval and reasoning.
This post is about that layer. It's about how we went from "the benchmarks don't cover this" to "there is a category of system that doesn't have a theory yet." The full formalization is in Toward a Theory of Context Compilation for Human-AI Systems, available as a preprint. Here I'll walk through the ideas in accessible form.
The Problem with "Just Retrieval"
The common architecture for AI systems that use external knowledge looks like this:
data → retrieval → prompt → model → UI
This works for prototypes. It does not age well. Change the source system, model family, interaction surface, or governance rule, and the whole path often needs to be rebuilt.
The deeper issue is that each paradigm in the current landscape solves part of the problem, but none fully isolates the missing layer:
- Retrieval-augmented generation is necessary, but retrieval returns candidates, not a governed working set. It can tell you what might be relevant without deciding what should survive budgeting, deduplication, policy checks, and provenance attachment.
- Memory systems improve persistence and recall, but they do not provide compile-time control over what should be included now, summarized now, omitted now, or transformed now.
- Context engineering correctly reframes the problem around finite context and high-signal tokens, but it does not yet guarantee a portable internal representation that can be inspected once and lowered into multiple runtimes.
- Existing benchmarks mostly evaluate answer quality, recall, or long-horizon memory behavior. They rarely expose the compilation decisions themselves.
The missing layer is context compilation: the process of turning heterogeneous source context into a governed active working set under budget, latency, and policy constraints.
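That definition is operational enough to sketch. Here is a minimal, hypothetical compile step in Python; the names `Candidate` and `compile_context` are illustrative, not MemoryOS's actual API:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    """Hypothetical retrieval candidate: what retrieval hands the compiler."""
    text: str
    source: str        # provenance: where this came from
    relevance: float   # retrieval score
    sensitivity: str   # policy label, e.g. "public" or "restricted"
    tokens: int        # estimated token cost

def compile_context(candidates, token_budget, allowed=frozenset({"public"})):
    """Turn candidates into a governed working set:
    policy-filter, deduplicate, rank, then pack under budget."""
    # 1. Policy gate: drop anything outside the allowed sensitivity scope.
    permitted = [c for c in candidates if c.sensitivity in allowed]
    # 2. Deduplicate on exact text (real systems dedup semantically).
    seen, unique = set(), []
    for c in permitted:
        if c.text not in seen:
            seen.add(c.text)
            unique.append(c)
    # 3. Rank by relevance and pack greedily under the token budget.
    working_set, used = [], 0
    for c in sorted(unique, key=lambda c: c.relevance, reverse=True):
        if used + c.tokens <= token_budget:
            working_set.append(c)
            used += c.tokens
    return working_set
```

The steps are the point, not the implementation: retrieval ends at the candidate list, and everything after it is compilation.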
Three Lines
The paper reduces the argument to three lines:
Memory determines what a system can know.
Retrieval identifies candidate evidence.
Context compilation determines what the system actually thinks with.
That distinction is the central systems claim. Future AI systems will not win only by having larger context windows or better retrieval. They will win by compiling the right governed working set for the task, the model, and the moment.
The Six-Layer Architecture
If context compilation is a systems layer, what does the full architecture look like? The paper separates the context stack into six layers:
[Figure: The Six-Layer Context Architecture. The stable asset is not a prompt or a UI — it is the governed context layer (layers 3-4) that survives source, model, and interface changes.]
The key architectural insight is the separation between layers 3-4 (the compiler and Context IR) and everything above and below them:
- Below (layers 1-2): Source data and storage are implementation details. Change from email to Slack, from Qdrant to Pinecone, from SQLite to Postgres — the compilation layer doesn't care.
- Above (layers 5-6): Runtimes and rendering are presentation details. The same compiled working set can be lowered into a chat message, an executive brief, an agent payload, a voice script, or an API response.
- The stable core (layers 3-4): The compiler and IR form the governed context layer. This is the durable asset — it survives source changes, model changes, and interface changes. The paper calls this property continuity of cognition.
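To make the decoupling concrete, here is a toy pipeline in which only the middle two stages carry the durable contract. Every function body is a placeholder, not the real system:

```python
# Toy six-layer pipeline. Only compile_ir() and the IR dict it produces
# are the stable contract; every other stage is swappable.

def load_sources():              # Layer 1: source data (email, CRM, docs)
    return ["invoice overdue", "meeting moved to Tuesday"]

def store(records):              # Layer 2: substrate (vector DB, SQL)
    return records               # placeholder: storage is an implementation detail

def compile_ir(records):         # Layer 3: compiler (selection, governance)
    return {"objects": records}  # Layer 4: Context IR (the stable object)

def lower(ir, target):           # Layer 5: lowering (per-runtime formatting)
    return f"[{target}] " + "; ".join(ir["objects"])

def run(prompt):                 # Layer 6: runtime (model call, UI render)
    return prompt                # placeholder

# Swap the substrate or the target; the IR in the middle stays the same.
ir = compile_ir(store(load_sources()))
chat = run(lower(ir, "chat"))
brief = run(lower(ir, "executive-brief"))
```

Replacing `store` or adding a new lowering target touches nothing in the middle: the IR is the boundary both sides compile against.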
Context IR: The Portable Representation
The paper introduces Context Intermediate Representation (Context IR) as the portable internal object between source context and runtime execution.
In plain terms, Context IR is the representation that lets a system say: "these are the specific facts, events, entities, commitments, issues, preferences, and contradictions that survived compilation, together with the provenance, policy, freshness, and confidence metadata needed to use them safely."
[Figure: Context IR Object Model — typed semantic objects plus governance metadata.]
What Context IR is not
This matters enough to state explicitly:
- Not raw retrieval output. Retrieval returns candidates. Context IR contains what survived compilation — what was selected, ranked, deduplicated, and policy-filtered.
- Not a prompt template. Prompts are runtime outputs. Context IR is the governed input to the lowering stage that produces prompts.
- Not MCP itself. MCP standardizes interaction surfaces between tools and models. Context IR standardizes the governed working set that compilation produces.
Properties of a useful IR
A credible Context IR must be:
- Structured rather than free text — typed semantic objects, not one undifferentiated string
- Portable across runtimes — the same IR lowers into chat, agent, voice, or API
- Inspectable by humans and systems — you can examine what the compiler decided
- Governable through explicit policy metadata — sensitivity labels, domain ACLs, redaction records
- Provenance-preserving — every object traces to its source
- Budget-aware — token estimates attached to every object
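A minimal sketch of one such typed object, assuming hypothetical field names; the actual schema lives in `context_ir.py`:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class IRObject:
    """One typed semantic object in a hypothetical Context IR."""
    kind: str                    # structured: "fact", "event", "commitment", ...
    content: str                 # the semantic payload
    source_id: str               # provenance-preserving: traces to its source
    sensitivity: str = "public"  # governable: explicit policy label
    confidence: float = 1.0      # how much the compiler trusts this object
    token_estimate: int = 0      # budget-aware: cost attached per object

    def to_json(self) -> str:
        # Inspectable: the whole object serializes to plain JSON.
        return json.dumps(asdict(self))

obj = IRObject(kind="commitment",
               content="Ship the Q3 report by Friday",
               source_id="email:msg-123",
               sensitivity="internal",
               confidence=0.9,
               token_estimate=12)
```

Each property from the list above maps to a field: typing via `kind`, provenance via `source_id`, governance via `sensitivity`, budget-awareness via `token_estimate`, and inspectability via plain JSON serialization.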
These properties are not aspirational. They are implemented in MemoryOS's context_ir.py and exported as a JSON Schema for external inspection.
Compilation as Optimization
The paper does not present context compilation as a vague design preference. It presents it as a constrained optimization problem.
The paper formalizes this as choosing the working set C* that maximizes downstream utility net of weighted penalties:

C* = argmax_C [ U(C) − λ1·TokenCost(C) − λ2·Latency(C) − λ3·ScopeViolations(C) − λ4·ProvenanceLoss(C) ]

where U(C) is the downstream utility of working set C, ScopeViolations captures policy risk, and ProvenanceLoss measures source lineage lost in transformation.
The point of the formulation is not the exact symbols. It is the recognition that compilation quality is never only about relevance. A good context pack must also respect:
- Token and compute cost — budgets are finite
- End-to-end latency — real-time use cases can't wait
- Policy and privacy risk — restricted content must not leak
- Provenance loss — summarization and transformation can destroy source lineage
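The objective above can be scored per candidate working set. A sketch, with made-up weights λ chosen only for illustration:

```python
def compile_score(utility, token_cost, latency_ms, policy_violations,
                  provenance_loss, weights=(0.001, 0.01, 10.0, 1.0)):
    """Utility of a working set C minus weighted penalties:
    U(C) - l1*Cost - l2*Latency - l3*PolicyRisk - l4*ProvenanceLoss."""
    l_cost, l_lat, l_pol, l_prov = weights
    return (utility
            - l_cost * token_cost
            - l_lat * latency_ms
            - l_pol * policy_violations   # scope violations dominate by design
            - l_prov * provenance_loss)

candidate_sets = [
    dict(utility=0.90, token_cost=4000, latency_ms=120,
         policy_violations=0, provenance_loss=0.1),
    dict(utility=0.95, token_cost=4000, latency_ms=120,
         policy_violations=1, provenance_loss=0.0),
]
# The second set is slightly more relevant but leaks scope, so it loses.
best = max(candidate_sets, key=lambda c: compile_score(**c))
```

With the policy weight set high, a marginal relevance gain never buys back a scope violation, which is exactly the trade-off the formulation is meant to encode.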
The two-stage formulation is the stronger insight: IR-space optimization (what should be in the working set) is separated from runtime-space lowering (how to express it for a specific target). Selection and governance happen once. Formatting happens per-runtime.
This separation means you make policy decisions once — in IR space — and then lower the same governed working set into chat for one user, into an executive brief for another, and into an agent payload for a third. No re-running policy checks. No accidental scope leakage because a different runtime path had different filtering.
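A sketch of that split, with a hypothetical `lower` function standing in for the real lowering stage in `lowering.py`:

```python
# The IR is decided once; each target only changes the rendering.
ir = {
    "objects": [
        {"kind": "fact", "content": "Renewal is due March 1",
         "source": "crm:acct-7"},
        {"kind": "commitment", "content": "Send pricing by Friday",
         "source": "email:msg-9"},
    ]
}

def lower(ir, target):
    """Lower the same governed working set into a runtime-specific form.
    No policy logic here: governance already happened in IR space."""
    if target == "chat":
        return "\n".join(o["content"] for o in ir["objects"])
    if target == "executive_brief":
        return "\n".join(f"- {o['content']} [{o['source']}]"
                         for o in ir["objects"])
    if target == "agent":
        return ir["objects"]          # structured payload, not prose
    raise ValueError(f"unknown lowering target: {target}")
```

Note that none of the branches filter, rank, or redact: by the time `lower` runs, those decisions are already fixed in the IR, which is what makes the per-runtime paths safe to vary freely.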
Continuity of Cognition
Here's why this architecture matters beyond just "better context."
If the source environment changes — a new email system, a new CRM, a different document store — the compilation layer adapts, but the IR stays stable. The model keeps reasoning with the same governed working set.
If the model changes — GPT-4o to Claude to Gemini to a local model — the IR stays stable. The lowering stage adapts to the new model's format preferences, but the semantic content doesn't change.
If the runtime changes — from chat to voice to workflow agent — the IR stays stable. The lowering stage produces runtime-appropriate output.
The paper calls this continuity of cognition. The stable asset in the system is not a prompt or a UI. It is the governed context layer that preserves what the system knows, what it's allowed to use, and what evidence that knowledge came from — across every kind of change.
From Theory to Code
One of the strengths of this work is that it is not only architectural language. The MemoryOS repository contains concrete artifacts:
- `context_ir.py` — the typed IR object model with JSON Schema export
- `context_pack.py` — the compiler path: intent planning, multi-channel retrieval, policy gating, compression, manifest generation
- `lowering.py` — the explicit `Lower(IR*, T, R)` stage with targets including chat, executive brief, agent, voice, and API
The theory is implemented. The implementation is inspectable. And the measurements that validate the architecture are the subject of Part 3.
Next: Part 3: The Evidence — Eight metrics measured on a live system. The CRR journey from 48.6% to 100%. CompileBench: the benchmark that evaluates compilation decisions, not just answers. And the open standard proposal.