Anthropic / chat
Claude Sonnet 5
Positioned as the most agentic Sonnet yet — planning, browser and terminal tool use, and autonomous runs previously requiring larger models — with native 1M-token context, adjustable effort levels, and default status for Free and Pro tiers.
The mid-tier model is now explicitly an agent runtime with a cost dial, which changes routing economics for agent fleets. Operators should re-baseline which tasks need a frontier model. Note that the intro pricing ($2/$10 per Mtok) steps up to $3/$15 after 2026-08-31, a 50% increase on both figures, so model the increase before standing up fleets on the intro rate.
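The pricing step-up can be modeled with simple arithmetic. The sketch below assumes the two figures are input and output prices per million tokens, and the monthly token volumes are hypothetical placeholders to replace with a fleet's own numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rate:
    """Price in USD per million tokens (assumed input/output split)."""
    input_per_mtok: float
    output_per_mtok: float


INTRO = Rate(input_per_mtok=2.0, output_per_mtok=10.0)     # through 2026-08-31
STANDARD = Rate(input_per_mtok=3.0, output_per_mtok=15.0)  # after 2026-08-31


def monthly_cost(rate: Rate, input_mtok: float, output_mtok: float) -> float:
    """Cost in USD for a month's token volume, given in millions of tokens."""
    return rate.input_per_mtok * input_mtok + rate.output_per_mtok * output_mtok


# Hypothetical fleet volume: 500 Mtok in, 100 Mtok out per month.
intro_cost = monthly_cost(INTRO, input_mtok=500.0, output_mtok=100.0)
standard_cost = monthly_cost(STANDARD, input_mtok=500.0, output_mtok=100.0)
increase = (standard_cost - intro_cost) / intro_cost

print(f"intro: ${intro_cost:,.0f}  standard: ${standard_cost:,.0f}  (+{increase:.0%})")
```

Because both figures rise by the same factor (3/2 and 15/10), the bill rises 50% whatever the input/output mix, so the useful variable to model is volume, not mix.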
Anthropic / automate
Claude Code v2.1.198
Background agents launched from `claude agents` now automatically commit, push, and open a draft PR when they finish code work in a worktree. They also fire notification hooks and run subagents in the background by default. Claude in Chrome reached GA in the same release.
Unattended work now terminates in a reviewable artifact with a human merge gate by default — the W26 escalation runbook rule became the product default. Automation owners should stop writing that rule into prompts and instead verify their review queues and notification hooks can absorb the incoming draft-PR volume.
GitHub / cowork
Browser tools for Copilot in VS Code
Agents can drive a real browser at GA: navigate live apps, click and type, read pages, capture console errors, take screenshots, and run scripted flows — on by default with a deliberate permission model.
Computer use is converging on agents verifying their own web work: Claude in Chrome went GA the same week. Coworking agents can now test the thing they just built, so leaders should update their definition of done to include agent-run browser verification, and review the permission model before broad enablement.
GitHub / automate
Copilot AI credit session limits
Per-session spend caps in Copilot CLI and SDK cover model calls, subagents, and background compaction — `/limits` interactively, `--max-ai-credits` in scripts — with soft-cap semantics that let the agent wrap up gracefully instead of running open-ended.
Spend just became a first-class loop constraint rather than an after-the-fact bill, explicitly aimed at unmonitored automation. Automation owners should set caps on every scripted or scheduled agent run now, and treat any agent surface without a budget primitive as a gap to raise with the vendor.
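To make the soft-cap idea concrete, here is a minimal, vendor-neutral sketch of a budget as a loop constraint. It is not Copilot's implementation: `SessionBudget`, `run_session`, and the credit costs are hypothetical names and numbers chosen for illustration. Once spend reaches the cap, no new step starts, but the agent gets one wrap-up step instead of being killed mid-task.

```python
from dataclasses import dataclass


@dataclass
class SessionBudget:
    """Hypothetical per-session credit budget (illustrative only)."""
    cap: float
    spent: float = 0.0

    def charge(self, credits: float) -> None:
        self.spent += credits

    @property
    def exhausted(self) -> bool:
        return self.spent >= self.cap


def run_session(step_costs: list[float], cap: float, wrap_up_cost: float = 1.0):
    """Run steps until the cap is reached, then allow one wrap-up step.

    Soft-cap semantics: a step that begins under the cap may finish over it,
    and the wrap-up is still charged, so final spend can exceed the cap.
    """
    budget = SessionBudget(cap=cap)
    log: list[str] = []
    completed = 0
    for cost in step_costs:
        if budget.exhausted:
            break  # cap reached: do not start another step
        budget.charge(cost)
        completed += 1
        log.append(f"step {completed}")
    if completed < len(step_costs):
        # Work was cut short: summarize and save state rather than stop cold.
        budget.charge(wrap_up_cost)
        log.append("wrap-up")
    return log, budget.spent


log, spent = run_session([4.0, 4.0, 4.0, 4.0], cap=10.0)
print(log, spent)
```

The design choice worth noticing is that overshoot is bounded by one step plus the wrap-up, which is what separates a soft cap from both a hard kill and an open-ended run.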
Cognition / build
Devin Security Swarm
An enterprise swarm that finds vulnerabilities across large codebases using Agentic MapReduce, validates exploitability at runtime in an isolated sandbox, and ships remediation PRs, accompanied by a six-week backlog-remediation program.
This is swarm-scale agent work shipping with a published architecture and eval rather than a demo. Even teams that never buy the product should study the orchestration: deterministic coverage plus sandbox verification is the transferable answer to 'how do agents handle codebases bigger than a context window.'
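The transferable pattern, deterministic coverage plus verification, can be sketched in a few lines. This is a generic map-reduce illustration, not Cognition's architecture: `shard`, `map_reduce_scan`, `toy_scan`, and `toy_verify` are hypothetical, and the scan and verify stubs stand in for an agent call and a sandboxed exploit check.

```python
from typing import Callable, Iterable

Finding = tuple[str, str]  # (file path, description)


def shard(paths: Iterable[str], size: int) -> list[list[str]]:
    """Deterministically split paths into fixed-size batches (sorted order)."""
    ordered = sorted(paths)
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def map_reduce_scan(
    files: dict[str, str],
    size: int,
    scan: Callable[[dict[str, str]], list[Finding]],
    verify: Callable[[Finding], bool],
) -> list[Finding]:
    """Map: scan each shard. Filter: verify each candidate. Reduce: merge."""
    covered: set[str] = set()
    confirmed: list[Finding] = []
    for batch in shard(files, size):
        covered.update(batch)
        candidates = scan({path: files[path] for path in batch})
        confirmed.extend(f for f in candidates if verify(f))
    # Coverage is a checked property, not something left to agent judgment.
    assert covered == set(files), "every file must be scanned exactly once"
    return sorted(confirmed)


def toy_scan(batch: dict[str, str]) -> list[Finding]:
    """Stand-in for an agent: flag any file that calls eval()."""
    return [(path, "eval call") for path, text in batch.items() if "eval(" in text]


def toy_verify(finding: Finding) -> bool:
    """Stand-in for sandbox validation: treat test files as non-exploitable."""
    return not finding[0].startswith("tests/")


files = {
    "app/a.py": "eval(user_input)",
    "app/b.py": "print('ok')",
    "tests/t.py": "eval('1+1')",
}
results = map_reduce_scan(files, size=2, scan=toy_scan, verify=toy_verify)
print(results)
```

Each shard is sized to fit a single agent's context, so the codebase size only affects the number of shards, and the verify step keeps unconfirmed candidates out of the merged result.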
Cursor / automate
iOS app (public beta) and Team MCPs
A native iOS app on all paid plans lets operators launch and manage always-on agents remotely with live notifications, and review or merge PRs from a phone; Team MCPs let admins configure MCP servers once and distribute them across cloud agents, IDE, and CLI with org-group scoping.
The supervision surface for background agents is going mobile while the connector surface goes centrally administered. Leaders should decide whether phone-based PR merges fit their review policy before the beta normalizes it, and move MCP administration from per-developer sprawl to the org-scoped model.