The Model Pulse

Issue 07 · Week 23 of 2026.

June 6, 2026/Weekly read/~6 min read/Public sources onlyDownload brief

The Big Read

No new closed frontier shipped; the open/local agent substrate widened underneath it.

The thesis this issue defends

W23 did not produce the expected Gemini 3.5 Pro GA or a fresh Anthropic/OpenAI frontier release. That absence is the story: Claude Opus 4.8 remains the public closed-frontier leader for coding and agentic work, while Google kept Pro in the June watch window and the model layer's actual shipping activity moved down-stack. JetBrains released Mellum2, an Apache-2.0 12B/2.5B-active MoE designed for low-latency routing, RAG, summarization, validation, and sub-agent calls; NVIDIA released Cosmos 3 as an open physical-AI omni-model with Nano 16B and Super 64B variants; H Company released Holo3.1 with local computer-use sizes and quantized checkpoints. The procurement implication is sharper than another leaderboard reshuffle: production agent systems are becoming portfolios of models. Keep Opus/GPT/Gemini-class models for high-risk reasoning and codebase-scale orchestration, but push cheap, private, repeated sub-agent work into specialized open/local models. The tree delta therefore adds efficient-agent and physical-AI nodes rather than another general chatbot crown.

Tree delta

What changed in the tree.

3 models added, 0 updated.

Three W23 additions: Mellum2 for efficient text/code sub-agent workloads, Cosmos 3 for physical-AI omni-modeling, and Holo3.1 for local computer-use agents.

Added (3)

mellum2
cosmos-3-nano
holo-3-1

Updated

None this period.

Gemini 3.5 Pro remains pending; no tree row is added until Google publishes the GA model card or API identifier.

Explore the LLM Evolutionary Tree

Frontier movements

Flagship-class releases.

1 release this period.

Vendor-stated frontier capability. The releases that reset the closed-source ceiling.

2026-06 target/Google DeepMind/Frontier/Reasoning
Gemini 3.5 Pro
Still pending at W23 close: Google has said Pro follows Gemini 3.5 Flash in June, but no public API ID, pricing, or independent benchmark row landed in-window
This is a frontier movement by absence. Buyers waiting for Gemini 3.5 Pro should keep the launch on the June watchlist but should not pause current coding-agent baselines: the public board still has Opus 4.8 leading the closed frontier, and Pro's economics and benchmark profile remain unverified. The likely enterprise split is task routing — Gemini for huge-context/multimodal work, Opus/GPT for coding and agentic reliability — not a universal replacement.
Google Gemini 3.5 announcement; AI Tool Bolt June comparison

Open weights

Open-frontier and open-source drops.

3 releases this period.

Open-weights releases that change procurement options. Pull these into pilot when score parity meets license parity.

2026-06-01/JetBrains/Edge / small/MoE
Mellum2
Apache-2.0 12B MoE with 2.5B active parameters per token for low-latency text/code sub-agent workloads
Mellum2 is not trying to win the frontier leaderboard; it is trying to lower the cost of the thousands of routine model calls inside agent systems. Routing, RAG, summarization, validation, and lightweight code tasks are exactly where private deployments want an efficient open model. If the benchmark claims hold, this is a practical procurement node for teams trying to cut orchestration cost without sending every step to a closed flagship.
Hugging Face JetBrains Mellum2 launch
2026-06-01/NVIDIA/Specialist/Multimodal
NVIDIA Cosmos 3
Open physical-AI omni-model family combining world generation, physical reasoning, and action generation in Nano 16B and Super 64B variants
Cosmos 3 widens the model tree away from text-only agents and into physical AI. The important shift is a unified model that can reason over world state and action generation rather than stitching separate generation and control pipelines together. Robotics, simulation, and industrial automation teams should evaluate it as synthetic-data and reasoning infrastructure, not as a chatbot substitute.
Hugging Face NVIDIA Cosmos 3 launch
2026-06-02/H Company/Specialist/Agentic
Holo3.1
Local computer-use agent family with 0.8B, 4B, 9B, and 35B-A3B sizes plus FP8, Q4 GGUF, and NVFP4 checkpoints
Holo3.1 pushes computer-use agents toward deployability: multiple sizes, quantized checkpoints, and local inference targets matter more than a single headline score. Enterprises automating browser, desktop, and internal-tool workflows can now separate privacy-sensitive UI action from hosted frontier reasoning. That supports a two-layer architecture: local CUA for execution, closed frontier for planning and verification.
Hugging Face Holo3.1 launch

Architecture watch

Patterns to track.

3 patterns reshaping the canopy.

Architectural patterns that crossed multiple vendors this period. Each pattern lists exemplar releases and what it changes for deployment, cost, or capability.

Cheap specialist sub-agents
Mellum2Holo3.1-0.8B / 4B / 9BClaude Opus 4.8 fast mode
The agent stack is splitting into high-reasoning planners and cheap repeated workers. Mellum2 and Holo3.1 are purpose-built for the calls that happen hundreds or thousands of times inside a workflow: routing, validation, summarization, UI action, and local execution. Model routers should now budget by step type rather than treating one flagship as the default for every agent call.
Hugging Face Mellum2 and Holo3.1 launches
Physical-AI omni-models
NVIDIA Cosmos 3 NanoNVIDIA Cosmos 3 SuperBernini-R renderer
Open model activity is expanding from language and code into physical-world simulation, video rendering, and action generation. Cosmos 3's combined world generation, physical reasoning, and action generation points toward a branch where synthetic data and robotics workflows become first-class model workloads. That is a different buyer and deployment path than enterprise chat.
Hugging Face NVIDIA Cosmos 3 launch; ByteDance Bernini-R Hugging Face card
Frontier release gaps measured in weeks
Claude Opus 4.8Gemini 3.5 Pro pendingGPT-5.6 speculation
The absence of a new frontier release this week matters because expectations have compressed. Buyers are now tempted to delay procurement for a model that may arrive in days. The practical answer is to separate infrastructure choices from model choice: standardize evaluation harnesses, routers, and cost controls so a June GA can be tested and slotted without freezing current deployments.
Google Gemini 3.5 announcement; Anthropic Opus 4.8 announcement

Benchmark moves

Where the leaderboard moved.

3 benchmarks shifted.

Benchmark deltas that change a procurement read. Scores reflect public leaderboards or vendor model cards as of publication.

Artificial Analysis Intelligence Index
No new W23 leaderboard reset; Claude Opus 4.8 remains the public #1 at 61.4 while Gemini 3.5 Pro is still pending GA
- Claude Opus 4.861.4
- GPT-5.5~60
- Gemini 3.1 Pro~57
Artificial Analysis summaries via June model comparisons
SWE-Bench Pro
Closed frontier still leads coding: Opus 4.8's 69.2% remains the public bar; no new open release in W23 changes the top coding score
- Claude Opus 4.869.2%
- GPT-5.5~66-67%
- Gemini 3.1 Pro~62%
AI Tool Bolt / Pristren June benchmark summaries
Local computer-use deployability
Holo3.1 shifts the measurable axis from one top-line CUA score to model size and quantization availability for local execution
- Holo3.1-0.8Bultra-light local
- Holo3.1-9Bbalanced local
- Holo3.1-35B-A3Bstate-of-the-art tier
Hugging Face Holo3.1 launch

Tier scorecard

Who leads, who pushes.

6 tiers · leaders as of Jun 6, 2026.

A snapshot of leader-vs-challenger by tier. Useful for procurement shortlists when matching workload to model class. Pair with the benchmark moves above for the underlying scores.

Tier

Leader

Challenger

Read

Closed frontier
Claude Opus 4.8
GPT-5.5
No W23 reset; Gemini 3.5 Pro remains the watched June challenger rather than a published benchmark row.
Open frontier
DeepSeek V4-Pro
GLM-5.1
No new open frontier text model displaced the April leaders; W23 open activity shifted to specialist/local models.
Reasoning
Claude Opus 4.8
GPT-5.5
Closed reasoning leadership steady while Gemini 3.5 Pro remains pending.
Coding
Claude Opus 4.8
GPT-5.5
Opus 4.8 still owns the visible SWE-Bench Pro lead; Mellum2 matters for cheap sub-agent code/text calls.
Multimodal
Gemini 3.1 Pro
Cosmos 3
Gemini remains the general multimodal reference; Cosmos 3 creates a specialist physical-AI branch.
Edge / small
Mellum2
Holo3.1-9B
Efficient local/sub-agent models were the week's real release activity.

Closed frontier
Leader: Claude Opus 4.8
Challenger: GPT-5.5
No W23 reset; Gemini 3.5 Pro remains the watched June challenger rather than a published benchmark row.
Open frontier
Leader: DeepSeek V4-Pro
Challenger: GLM-5.1
No new open frontier text model displaced the April leaders; W23 open activity shifted to specialist/local models.
Reasoning
Leader: Claude Opus 4.8
Challenger: GPT-5.5
Closed reasoning leadership steady while Gemini 3.5 Pro remains pending.
Coding
Leader: Claude Opus 4.8
Challenger: GPT-5.5
Opus 4.8 still owns the visible SWE-Bench Pro lead; Mellum2 matters for cheap sub-agent code/text calls.
Multimodal
Leader: Gemini 3.1 Pro
Challenger: Cosmos 3
Gemini remains the general multimodal reference; Cosmos 3 creates a specialist physical-AI branch.
Edge / small
Leader: Mellum2
Challenger: Holo3.1-9B
Efficient local/sub-agent models were the week's real release activity.

Vendor signals

Pricing, gating, deprecation.

3 non-release signals worth tracking.

The non-release moves that shift vendor risk — pricing, deprecations, gating decisions, license changes — with a one-line procurement read.

2026-06/Google DeepMind
Gemini 3.5 Pro remains promised for June with no public API model ID, pricing, or third-party benchmark row by W23 close
Procurement teams should prepare an eval slot but avoid freezing current deployments on an unreleased model. The operating pattern is rapid re-baselining, not launch-date speculation.
Google Gemini 3.5 announcement and June developer guidance
2026-06-01/JetBrains
Released Mellum2 under Apache 2.0 with an explicit low-latency production-workload positioning
IDE and enterprise-platform vendors can now point to a plausible open/private model for background text-code tasks. That increases pressure on hosted copilots to justify every closed-frontier call by risk or quality, not habit.
Hugging Face JetBrains Mellum2 launch
2026-06-01/NVIDIA
Released Cosmos 3 on Hugging Face while also announcing Vera Rubin production at GTC Taipei
NVIDIA is binding the model and infrastructure stories together: physical-AI models create demand for the simulation, synthetic-data, and rack-scale compute stack it sells. Buyers should evaluate model capability and deployment substrate together.
Hugging Face Cosmos 3 launch; NVIDIA GTC Taipei

Watchlist

On the radar next.

3 catalysts to watch, starting Jun 7-30.

Specific model-side catalysts in the next 7–30 days that would change the read materially. Watching these tells us whether the canopy is widening or thinning.

Jun 7-30
Gemini 3.5 Pro GA
The first public model card, price row, API ID, and Artificial Analysis pass will determine whether June becomes a true frontier reset or just a routing expansion.
Jun-Jul
Mythos-class Anthropic availability
Anthropic has publicly framed stronger gated models as pending cyber safeguards. A wider release would change the closed-frontier scorecard more than another Opus point release.
Jun-Aug
Open/local agent model adoption
Downloads, integrations, and benchmark replications for Mellum2, Cosmos 3, and Holo3.1 will show whether specialist open models are becoming production substrate or just launch-week noise.

No new closed frontier shipped; the open/local agent substrate widened underneath it.

What changed in the tree.

Flagship-class releases.

Gemini 3.5 Pro

Open-frontier and open-source drops.

Mellum2

NVIDIA Cosmos 3

Holo3.1

Patterns to track.

Cheap specialist sub-agents

Physical-AI omni-models

Frontier release gaps measured in weeks

Where the leaderboard moved.

Who leads, who pushes.

Pricing, gating, deprecation.

On the radar next.