brianletort.ai

Modes of the LLM OS

A 6-part series on what actually happens when you prompt frontier AI — and how to run your own

LLM OS · Cowork · Claude Code · Cursor · Operator · MCP · Enterprise AI · AI Governance

Cowork Mode: State Is the Coworker

The difference between a chatbot and a coworker is state. Claude Code, Cursor, Operator, Codex, ChatGPT Projects. Persistent memory, skills, knowledge base, environment access. Session-long state — and the most dangerous un-governed surface in the enterprise today.

May 18, 2026 · 11 min read

TL;DR

  • Cowork Mode is defined by state — persistent memory, skills library, knowledge base, and environment access — all at once, session-long
  • Cursor, Claude Code, Operator, Codex, ChatGPT Projects are all cowork products. They differ in which surfaces they expose and how they govern them
  • Skills are the 2026 abstraction for shared organizational behavior — reusable, named, versionable prompt + tool bundles
  • MCP (Model Context Protocol) is the connective tissue that makes Cowork Mode possible at enterprise scope
  • A cowork session costs 100–10,000× a single Chat call and is the most dangerous un-governed surface in most enterprises today

The difference between a chatbot and a coworker is state.

A chatbot is a conversation. You ask a question. It answers. You ask another question. It answers again. Each turn is stateless at the model layer; the illusion of memory is whatever the client rehydrates into the next prompt.

A coworker is different. A coworker remembers what you decided last week. A coworker knows the project. A coworker has skills that your whole team shares. A coworker can open your terminal, read your files, commit your code, and move your mouse. A coworker does not forget between Tuesday and Thursday.

That is Cowork Mode. Claude Code. Cursor. ChatGPT Agent and Projects. Codex. Operator. They are all the same category of machine, and they are the fourth and final operating mode of the LLM OS.

Cowork Mode: The Coworker's Desk

Memory, skills, knowledge base, environment. Each cowork product activates a different subset of these surfaces.


Persistent memory

What the coworker remembers across sessions.


Skills library

Shared organizational behaviors, addressed by name.


Knowledge base

Project files, docs, code index. Retrieval scope.


Environment access

Terminal, browser, screen. Real side effects.

The coworker's desk: four surfaces. A Claude Code session uses all four.

Session trace

  1. Load project memory (resume prior conversation state, todos, decisions)
  2. Read CLAUDE.md (load project rules, conventions, architecture notes)
  3. Activate code-review skill (load skill prompt, tool registry, examples)
  4. Terminal: git status (execute shell command in project root)
  5. Read 4 source files (src/api/*.ts, package.json)
  6. Update scratchpad (record plan: refactor auth middleware)
  7. Terminal: run tests (vitest run)
  8. Edit src/api/auth.ts (apply patch)
  9. Persist session memory (save decisions for next session)

A cowork session is not a conversation. It is persistent memory, a skills library, a knowledge base, and environment access, all at once. Every surface is a separate governance question.

The four surfaces

Cowork Mode is defined by four surfaces, all present at once, all in play every session:

  1. Persistent memory. What the coworker remembers across sessions. Project history, decisions made, preferences, ongoing plans, unfinished work. Every serious cowork product has some version of this — Cursor's chat history, Claude Code's session resume, ChatGPT's long-term memory, Projects' thread persistence.

  2. Skills library. Reusable, named behaviors. A skill is a bundle — prompt, tool registry, examples, sometimes scaffolding — addressed by name. "Activate the incident-response skill" loads a consistent organizational behavior. Skills are how a team's best practices become a shared resource instead of a document nobody reads.

  3. Knowledge base. The body of content the coworker can retrieve from. Project files, documentation, code index, uploaded PDFs, internal wikis. The knowledge base defines what the coworker knows about you — not what it knows about the world.

  4. Environment access. The part that most separates Cowork from the other three modes. A coworker can open a terminal. Execute a shell command. Read files. Write files. Click a button in a browser. Move your mouse. Run your tests. Commit to your repo. This is how real work actually gets done.

No Chat call has any of these. An Agent turn has tools but not persistent state. A Deep Research run has a swarm but not environment access. Cowork has them all.
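As a mental model, the four surfaces can be treated as one session-state object. A minimal TypeScript sketch; every interface and field name here is illustrative, not any product's real schema:

```typescript
// Illustrative shapes only: the four cowork surfaces as one session state.
type EnvGrant = "terminal" | "filesystem" | "browser" | "screen";

interface PersistentMemory {
  decisions: string[]; // what was decided in prior sessions
  scratchpad: string;  // current plan / unfinished work
}

interface Skill {
  name: string;    // addressed by name, e.g. "code-review"
  prompt: string;  // behavior loaded into context on activation
  tools: string[]; // tool registry scoped to this skill
}

interface CoworkSession {
  memory: PersistentMemory;   // survives across sessions
  skills: Map<string, Skill>; // shared organizational behaviors
  knowledgePaths: string[];   // files/docs the coworker can retrieve from
  envGrants: Set<EnvGrant>;   // real side effects the session may cause
}

// A Chat call has none of these surfaces; a cowork session has all four.
function surfacesInPlay(s: CoworkSession): number {
  return [
    s.memory.decisions.length > 0 || s.memory.scratchpad.length > 0,
    s.skills.size > 0,
    s.knowledgePaths.length > 0,
    s.envGrants.size > 0,
  ].filter(Boolean).length;
}
```

Each field of `CoworkSession` is a separate governance question, which is the point of the sections that follow.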

The five products, and what they share

| Product | Memory | Skills | Knowledge | Environment |
| --- | --- | --- | --- | --- |
| Cursor | Yes | Yes (rules, hooks) | Yes (project index) | Yes (terminal, editor, git) |
| Claude Code | Yes | Yes (Claude Skills) | Yes (CLAUDE.md + files) | Yes (terminal, file I/O) |
| ChatGPT Projects | Yes | Partial (project instructions) | Yes (uploaded files + web) | No |
| Operator / Computer Use | Yes | Partial (via tools) | Minimal | Yes (screen, browser) |
| Codex | Yes | Yes | Yes (code index) | Yes (terminal, editor) |

Cursor and Claude Code are the most complete cowork products in 2026. Both give the model all four surfaces. Both are where most engineering teams are feeling the productivity lift — and also where most are quietly accumulating governance debt.

ChatGPT Projects occupies an interesting middle ground. It has memory, partial skills (custom instructions at the project level), and knowledge (uploaded files), but no environment access. The trade is deliberate: it is the safer product, and it also means it cannot actually do the work you would ask a coworker to do.

Operator (and the broader Computer Use category) inverts the trade: thin memory, minimal skills, no real knowledge base, but direct environment access — screen, mouse, browser. This is what you reach for when the only way to do the job is to operate the UI a human would operate. It is the riskiest cowork surface for anything sensitive.

Codex and code-specific offerings largely mirror the Cursor and Claude Code shape.

Skills, specifically

The skills-as-infrastructure move is the single most important product shift of 2026.

A skill file is small. It contains a prompt, a tool registry scoped to that skill, often a few example invocations, and sometimes scaffolding code. When the runtime detects that a skill applies — usually via name match or semantic match — it loads the skill bundle into the context and the coworker now "has" that behavior.
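The activation step described above can be written down in a few lines. This is a sketch of the mechanism, not any product's real loader; the names (`SkillBundle`, `activateSkill`) are hypothetical:

```typescript
// Sketch: activate a skill by name and splice only its bundle into context.
// All structures are illustrative, not any shipping product's format.
interface SkillBundle {
  name: string;       // the skill's address, e.g. "incident-response"
  prompt: string;     // the behavior, as instructions
  tools: string[];    // tools exposed only while this skill is active
  examples: string[]; // example invocations included with the bundle
}

function activateSkill(
  library: Map<string, SkillBundle>,
  requested: string,
  baseContext: string
): { context: string; tools: string[] } {
  const skill = library.get(requested);
  if (!skill) {
    // No match: the coworker runs with baseline behavior and no extra tools.
    return { context: baseContext, tools: [] };
  }
  // Only the active skill's prompt and examples enter the context window --
  // the rest of the library stays out of the token budget.
  const context = [baseContext, skill.prompt, ...skill.examples].join("\n\n");
  return { context, tools: skill.tools };
}
```

Real runtimes select by semantic match as well as exact name, but the consequence is the same: the coworker "has" the behavior only while the bundle is loaded.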

Examples from products shipping today:

  • Claude Skills. ~/.claude/skills/ or workspace-level. "Create-pull-request" skill loads PR-writing tools and examples. "Review-code" skill loads the review checklist and structured output. "Release-notes" skill assembles changelogs. The skill fires when it's selected; your coworker runs with that behavior, then returns to baseline.
  • Cursor Skills and Rules. .cursor/rules/*.mdc and per-project skill definitions. Rules define the coworker's behavior at the repo level. Skills are higher-level bundles that can be invoked explicitly.
  • ChatGPT custom GPTs. Custom instructions + function tools + knowledge files. The GPT-as-product model is effectively a skill.

The economics of skills matter too. A skill is how you avoid paying input-token cost on your entire tool library for every call. Only the active skill's prompt and tools are in context. Used well, this reduces context bloat by 5–10× for a heavy-tool-use project.
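A back-of-envelope calculation shows where a figure in the 5–10× range can come from. The tool counts and per-schema token costs below are assumptions chosen for illustration, not measurements:

```typescript
// Illustrative arithmetic only: why scoping tools to skills cuts input cost.
// Assume a library of 40 tools at ~600 tokens of schema each, versus one
// active skill exposing 5 of those tools.
const TOOL_SCHEMA_TOKENS = 600;

const allToolsTokens = 40 * TOOL_SCHEMA_TOKENS;       // 24,000 tokens per call
const activeSkillTokens = 5 * TOOL_SCHEMA_TOKENS;     //  3,000 tokens per call
const reduction = allToolsTokens / activeSkillTokens; // 8x, inside the 5-10x range
```

The exact multiple depends on your tool count and schema verbosity, but the shape of the saving is the same: pay for the active skill, not the whole library, on every call.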

Skills are also where cross-enterprise sharing starts. The best teams I have seen in 2026 are building internal skill marketplaces: the "onboard-new-engineer" skill, the "security-review" skill, the "compliance-check-for-SOC2" skill. The skills are the team's shared intelligence, versioned, reviewed, and governed — more durable than documentation, more consistent than training.

MCP is the connective tissue

Model Context Protocol is why Cowork Mode works at enterprise scope.

MCP is the wire format that lets a cowork runtime talk to external servers exposing tools and resources. Claude Code uses MCP to reach GitHub. Cursor uses MCP to reach your filesystem and databases. Your enterprise can stand up an MCP server that exposes your internal systems — your data warehouse, your ticketing system, your feature flags, your secrets manager — to whatever cowork product your team uses. The model does not need to be trained on your systems. It needs a protocol that tells it what they are and how to call them.

Three implications for the enterprise:

  • MCP is a new API boundary. The MCP servers your team exposes are as important as the public API. They are often the first place sensitive data leaves, and the first place a compromised cowork session can reach inside.
  • MCP auth is the new OAuth moment. Scoped tokens, short-lived credentials, per-tool permissions — the patterns are not new, but they have to be applied to MCP servers with new discipline. Most enterprises are running MCP on shared developer credentials in 2026. That will not age well.
  • MCP observability is essential. Every MCP call is a tool invocation from the runtime. Log them all. Correlate them to the session. Build the audit.
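Those three bullets converge on one component: an intercept-log-forward layer in front of the JSON-RPC stream. The `tools/call` method name follows MCP's convention; the gateway class itself is an illustrative sketch, not a production design:

```typescript
// Sketch of a governed MCP gateway: every tools/call is checked against a
// scoped allow-list and recorded against its session before forwarding.
interface McpRequest {
  jsonrpc: "2.0";
  id: number;
  method: string; // e.g. "tools/call"
  params?: { name: string; arguments?: Record<string, unknown> };
}

interface AuditEntry {
  sessionId: string; // correlate every call back to the cowork session
  tool: string;
  at: string;        // ISO timestamp
}

class McpGateway {
  readonly audit: AuditEntry[] = [];

  constructor(
    private forward: (req: McpRequest) => unknown, // the real MCP server
    private allowedTools: Set<string>              // per-gateway permissions
  ) {}

  handle(sessionId: string, req: McpRequest): unknown {
    if (req.method === "tools/call" && req.params) {
      // Scoped auth: deny any tool not explicitly granted to this gateway.
      if (!this.allowedTools.has(req.params.name)) {
        throw new Error(`tool not permitted: ${req.params.name}`);
      }
      // Observability: log before forwarding, so even failed calls are seen.
      this.audit.push({
        sessionId,
        tool: req.params.name,
        at: new Date().toISOString(),
      });
    }
    return this.forward(req);
  }
}
```

A real gateway adds token exchange, rate limits, and argument redaction, but the allow-list plus per-session audit log is the minimum viable shape.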

MCP turned out to be the sleeper standard of 2025 and 2026. If you have not stood up a governed MCP gateway yet, start there before anything else in this section.

Computer use and Operator

The screen-and-mouse branch of Cowork is both the most impressive and the most dangerous.

What Operator (and Claude Computer Use, and the broader category) actually does: the runtime takes a screenshot, asks the model to reason about what to do next, emits a keyboard or mouse action, executes it on a real screen, takes another screenshot, and loops. Screen as an API. OS as a tool.
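The loop just described can be written down directly. A runnable skeleton with the screen, model, and executor injected as functions; all names are illustrative, and a real runtime adds safety checks and confirmation gates at every step:

```typescript
// The observe-reason-act loop behind computer use, as a bare skeleton.
type Action =
  | { kind: "click"; x: number; y: number }
  | { kind: "type"; text: string }
  | { kind: "done" };

function runComputerUse(
  screenshot: () => string,          // capture the screen
  decide: (image: string) => Action, // model: reason about the next action
  execute: (a: Action) => void,      // emit the keyboard/mouse event
  maxSteps = 50                      // hard stop so the loop always terminates
): number {
  let steps = 0;
  while (steps < maxSteps) {
    const image = screenshot();      // observe
    const action = decide(image);    // reason
    steps++;
    if (action.kind === "done") break; // model declares the task finished
    execute(action);                 // act, then loop back to observe
  }
  return steps;
}
```

Everything the governance section below worries about lives in those three injected functions: `screenshot` sees whatever the screen sees, and `execute` can do whatever the logged-in user can do.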

The use cases that justify it are narrow but real: legacy applications without APIs, vendor portals with no MCP integration, tasks where the UI is the interface and there is no other way to reach it. In those cases, a cowork session with computer use can do in ten minutes what a human has been doing in an hour.

The governance challenge is brutal. Everything the coworker sees on screen is in the context. Everything the coworker can click, it can click. Every form it can fill, it can fill. If it has your browser, it has your cookies. If it has your terminal, it has your credentials. If it has your screen, it has everything your screen has.

The right answer is not "don't use computer use." The right answer is to give computer use its own dedicated environment — a VM, a container, a browser profile with narrow permissions, scoped credentials, isolated network — and log every screenshot and action. If your Operator is running against your main laptop with your everyday credentials, you have built a loaded gun you walk past every day.

The cost curve

Cowork Mode does not bill like the others.

A Chat call is priced per call. An Agent turn is priced per turn. A Deep Research run is priced per run. A cowork session is priced per hour an engineer spends in it, because the model is in the loop the entire time. Tokens accumulate constantly. Context grows constantly. Compaction fires periodically, but it cannot undo the cost of a multi-hour context.

Rough numbers for a serious production session:

  • ChatGPT Projects, one-hour session: ~$40, ~180K tokens
  • Operator / Computer Use, 45-minute session: ~$95, ~380K tokens
  • Claude Code, two-hour session: ~$180, ~950K tokens
  • Cursor, three-hour session: ~$220, ~1.1M tokens

For comparison, a Chat call is about $0.02. A cowork session costs 100× to 10,000× as much as a single Chat call.
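Taking the session figures above against a $0.02 Chat call, the multiple falls out directly (costs copied from the list; the Chat-call price is the rough figure stated above):

```typescript
// Session cost as a multiple of a single Chat call, from the figures above.
const CHAT_CALL_USD = 0.02;

const sessionCostUsd = {
  projects: 40,   // ChatGPT Projects, one hour
  operator: 95,   // Operator / Computer Use, 45 minutes
  claudeCode: 180, // Claude Code, two hours
  cursor: 220,    // Cursor, three hours
};

const multiple = (cost: number) => Math.round(cost / CHAT_CALL_USD);
// e.g. multiple(sessionCostUsd.claudeCode) -> 9000x a single Chat call
```

The exact multiple moves with session length and model choice, but the order of magnitude is the point: a cowork session is a budget line, not a query.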

The economics only work if the session produces value that corresponds. An engineer closing two tickets in three hours that would have taken a day and a half is a great trade at $220. An engineer wandering with an AI for three hours on a half-formed idea is a worse trade. The meter runs whether or not the work compounds.

This is also why skills matter economically. A well-scoped skill keeps the session focused. A cowork session without scoped skills drifts, which means longer sessions, bigger contexts, more compaction, more cost, less value. Enterprise cost management in Cowork is mostly about scoping the workload, not about model choice.

Enterprise sidebar — the most dangerous un-governed surface

Three uncomfortable questions most enterprises have not answered in 2026:

Who in your organization is running Cowork Mode today? Not just your engineers. Your analysts running Cursor. Your sales engineers running Claude Code to write demo scripts. Your IT team running Operator against vendor portals. The answer is almost always "more people than you think."

What credentials are those sessions running under? If the answer is "the person's everyday credentials with full access to production and email," you have a first-class governance problem hiding in plain sight.

If the coworker did something wrong yesterday, can you find it today? The cowork run ledger is the primitive that answers this. If you cannot produce a chronological trace of every surface the coworker touched — memory reads, skill activations, knowledge retrievals, environment actions — you do not have audit. You have a demo.
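A run ledger with exactly that shape is small to sketch. The field names are illustrative; the essential property is one append-only, ordered trace per session covering all four surfaces:

```typescript
// Sketch of a cowork run ledger: one append-only trace per session,
// one entry per surface touch. Field names are illustrative.
type Surface = "memory" | "skill" | "knowledge" | "environment";

interface LedgerEvent {
  seq: number;     // strictly increasing within a session
  surface: Surface;
  detail: string;  // e.g. "terminal: git status"
}

class RunLedger {
  private events = new Map<string, LedgerEvent[]>();

  record(sessionId: string, surface: Surface, detail: string): void {
    const trace = this.events.get(sessionId) ?? [];
    trace.push({ seq: trace.length + 1, surface, detail });
    this.events.set(sessionId, trace);
  }

  // The audit question: what did this session touch, in what order?
  trace(sessionId: string): LedgerEvent[] {
    return this.events.get(sessionId) ?? [];
  }
}
```

In production this writes to durable storage and is populated by the MCP gateway and runtime hooks rather than by hand, but the query it must answer is this one.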

Three moves for Monday, in order:

  1. Inventory cowork usage. Which products. Which people. Which credentials. Which environments. Which data reachable from those environments. If the inventory takes more than a week, you are already late.
  2. Stand up a governed MCP gateway. Every cowork product your team uses should reach enterprise systems through one controlled MCP layer, with scoped auth, rate limits, and full logging. No direct MCP-server-to-laptop connections to production.
  3. Define skill governance. Who can publish a skill. Who reviews it. Where the skill library lives. How skills get versioned. Skills will become your team's most valuable shared asset. They should not be accumulating on individual laptops.

Next up

Five modes of the LLM OS, fully mapped. Part 6 assembles them into an enterprise operating picture. Bedrock, Azure, Vertex, OpenRouter, self-hosted vLLM against H100 / H200 / Blackwell hardware. The reusable control plane underneath — context compiler, token ledger, skill registry, MCP gateway, eval harness. What an enterprise actually builds when it decides to run its own LLM OS.

Operate. Publish. Teach.