brianletort.ai
All issues

The Model Pulse

Issue 07 · Week 23 of 2026.

/Weekly read/~6 min read/Public sources onlyDownload brief

The Big Read

No new closed frontier shipped; the open/local agent substrate widened underneath it.

The thesis this issue defends

W23 did not produce the expected Gemini 3.5 Pro GA or a fresh Anthropic/OpenAI frontier release. That absence is the story: Claude Opus 4.8 remains the public closed-frontier leader for coding and agentic work, while Google kept Pro in the June watch window and the model layer's actual shipping activity moved down-stack. JetBrains released Mellum2, an Apache-2.0 12B/2.5B-active MoE designed for low-latency routing, RAG, summarization, validation, and sub-agent calls; NVIDIA released Cosmos 3 as an open physical-AI omni-model with Nano 16B and Super 64B variants; H Company released Holo3.1 with local computer-use sizes and quantized checkpoints. The procurement implication is sharper than another leaderboard reshuffle: production agent systems are becoming portfolios of models. Keep Opus/GPT/Gemini-class models for high-risk reasoning and codebase-scale orchestration, but push cheap, private, repeated sub-agent work into specialized open/local models. The tree delta therefore adds efficient-agent and physical-AI nodes rather than another general chatbot crown.

Tree delta

What changed in the tree.

3 models added, 0 updated.

Three W23 additions: Mellum2 for efficient text/code sub-agent workloads, Cosmos 3 for physical-AI omni-modeling, and Holo3.1 for local computer-use agents.

Added (3)

  • mellum2
  • cosmos-3-nano
  • holo-3-1

Updated

None this period.

Gemini 3.5 Pro remains pending; no tree row is added until Google publishes the GA model card or API identifier.

Explore the LLM Evolutionary Tree

Frontier movements

Flagship-class releases.

1 release this period.

Vendor-stated frontier capability. The releases that reset the closed-source ceiling.

  • /Google DeepMind/Frontier/Reasoning

    Gemini 3.5 Pro

    Still pending at W23 close: Google has said Pro follows Gemini 3.5 Flash in June, but no public API ID, pricing, or independent benchmark row landed in-window

    This is a frontier movement by absence. Buyers waiting for Gemini 3.5 Pro should keep the launch on the June watchlist but should not pause current coding-agent baselines: the public board still has Opus 4.8 leading the closed frontier, and Pro's economics and benchmark profile remain unverified. The likely enterprise split is task routing — Gemini for huge-context/multimodal work, Opus/GPT for coding and agentic reliability — not a universal replacement.

    Google Gemini 3.5 announcement; AI Tool Bolt June comparison

Open weights

Open-frontier and open-source drops.

3 releases this period.

Open-weights releases that change procurement options. Pull these into pilot when score parity meets license parity.

  • /JetBrains/Edge / small/MoE

    Mellum2

    Apache-2.0 12B MoE with 2.5B active parameters per token for low-latency text/code sub-agent workloads

    Mellum2 is not trying to win the frontier leaderboard; it is trying to lower the cost of the thousands of routine model calls inside agent systems. Routing, RAG, summarization, validation, and lightweight code tasks are exactly where private deployments want an efficient open model. If the benchmark claims hold, this is a practical procurement node for teams trying to cut orchestration cost without sending every step to a closed flagship.

    Hugging Face JetBrains Mellum2 launch

  • /NVIDIA/Specialist/Multimodal

    NVIDIA Cosmos 3

    Open physical-AI omni-model family combining world generation, physical reasoning, and action generation in Nano 16B and Super 64B variants

    Cosmos 3 widens the model tree away from text-only agents and into physical AI. The important shift is a unified model that can reason over world state and action generation rather than stitching separate generation and control pipelines together. Robotics, simulation, and industrial automation teams should evaluate it as synthetic-data and reasoning infrastructure, not as a chatbot substitute.

    Hugging Face NVIDIA Cosmos 3 launch

  • /H Company/Specialist/Agentic

    Holo3.1

    Local computer-use agent family with 0.8B, 4B, 9B, and 35B-A3B sizes plus FP8, Q4 GGUF, and NVFP4 checkpoints

    Holo3.1 pushes computer-use agents toward deployability: multiple sizes, quantized checkpoints, and local inference targets matter more than a single headline score. Enterprises automating browser, desktop, and internal-tool workflows can now separate privacy-sensitive UI action from hosted frontier reasoning. That supports a two-layer architecture: local CUA for execution, closed frontier for planning and verification.

    Hugging Face Holo3.1 launch

Architecture watch

Patterns to track.

3 patterns reshaping the canopy.

Architectural patterns that crossed multiple vendors this period. Each pattern lists exemplar releases and what it changes for deployment, cost, or capability.

  • Cheap specialist sub-agents

    Mellum2Holo3.1-0.8B / 4B / 9BClaude Opus 4.8 fast mode

    The agent stack is splitting into high-reasoning planners and cheap repeated workers. Mellum2 and Holo3.1 are purpose-built for the calls that happen hundreds or thousands of times inside a workflow: routing, validation, summarization, UI action, and local execution. Model routers should now budget by step type rather than treating one flagship as the default for every agent call.

    Hugging Face Mellum2 and Holo3.1 launches

  • Physical-AI omni-models

    NVIDIA Cosmos 3 NanoNVIDIA Cosmos 3 SuperBernini-R renderer

    Open model activity is expanding from language and code into physical-world simulation, video rendering, and action generation. Cosmos 3's combined world generation, physical reasoning, and action generation points toward a branch where synthetic data and robotics workflows become first-class model workloads. That is a different buyer and deployment path than enterprise chat.

    Hugging Face NVIDIA Cosmos 3 launch; ByteDance Bernini-R Hugging Face card

  • Frontier release gaps measured in weeks

    Claude Opus 4.8Gemini 3.5 Pro pendingGPT-5.6 speculation

    The absence of a new frontier release this week matters because expectations have compressed. Buyers are now tempted to delay procurement for a model that may arrive in days. The practical answer is to separate infrastructure choices from model choice: standardize evaluation harnesses, routers, and cost controls so a June GA can be tested and slotted without freezing current deployments.

    Google Gemini 3.5 announcement; Anthropic Opus 4.8 announcement

Benchmark moves

Where the leaderboard moved.

3 benchmarks shifted.

Benchmark deltas that change a procurement read. Scores reflect public leaderboards or vendor model cards as of publication.

  • Artificial Analysis Intelligence Index

    No new W23 leaderboard reset; Claude Opus 4.8 remains the public #1 at 61.4 while Gemini 3.5 Pro is still pending GA

    • Claude Opus 4.861.4
    • GPT-5.5~60
    • Gemini 3.1 Pro~57

    Artificial Analysis summaries via June model comparisons

  • SWE-Bench Pro

    Closed frontier still leads coding: Opus 4.8's 69.2% remains the public bar; no new open release in W23 changes the top coding score

    • Claude Opus 4.869.2%
    • GPT-5.5~66-67%
    • Gemini 3.1 Pro~62%

    AI Tool Bolt / Pristren June benchmark summaries

  • Local computer-use deployability

    Holo3.1 shifts the measurable axis from one top-line CUA score to model size and quantization availability for local execution

    • Holo3.1-0.8Bultra-light local
    • Holo3.1-9Bbalanced local
    • Holo3.1-35B-A3Bstate-of-the-art tier

    Hugging Face Holo3.1 launch

Tier scorecard

Who leads, who pushes.

6 tiers · leaders as of Jun 6, 2026.

A snapshot of leader-vs-challenger by tier. Useful for procurement shortlists when matching workload to model class. Pair with the benchmark moves above for the underlying scores.

  • Closed frontier

    Leader: Claude Opus 4.8

    Challenger: GPT-5.5

    No W23 reset; Gemini 3.5 Pro remains the watched June challenger rather than a published benchmark row.

  • Open frontier

    Leader: DeepSeek V4-Pro

    Challenger: GLM-5.1

    No new open frontier text model displaced the April leaders; W23 open activity shifted to specialist/local models.

  • Reasoning

    Leader: Claude Opus 4.8

    Challenger: GPT-5.5

    Closed reasoning leadership steady while Gemini 3.5 Pro remains pending.

  • Coding

    Leader: Claude Opus 4.8

    Challenger: GPT-5.5

    Opus 4.8 still owns the visible SWE-Bench Pro lead; Mellum2 matters for cheap sub-agent code/text calls.

  • Multimodal

    Leader: Gemini 3.1 Pro

    Challenger: Cosmos 3

    Gemini remains the general multimodal reference; Cosmos 3 creates a specialist physical-AI branch.

  • Edge / small

    Leader: Mellum2

    Challenger: Holo3.1-9B

    Efficient local/sub-agent models were the week's real release activity.

Vendor signals

Pricing, gating, deprecation.

3 non-release signals worth tracking.

The non-release moves that shift vendor risk — pricing, deprecations, gating decisions, license changes — with a one-line procurement read.

  • /Google DeepMind

    Gemini 3.5 Pro remains promised for June with no public API model ID, pricing, or third-party benchmark row by W23 close

    Procurement teams should prepare an eval slot but avoid freezing current deployments on an unreleased model. The operating pattern is rapid re-baselining, not launch-date speculation.

    Google Gemini 3.5 announcement and June developer guidance

  • /JetBrains

    Released Mellum2 under Apache 2.0 with an explicit low-latency production-workload positioning

    IDE and enterprise-platform vendors can now point to a plausible open/private model for background text-code tasks. That increases pressure on hosted copilots to justify every closed-frontier call by risk or quality, not habit.

    Hugging Face JetBrains Mellum2 launch

  • /NVIDIA

    Released Cosmos 3 on Hugging Face while also announcing Vera Rubin production at GTC Taipei

    NVIDIA is binding the model and infrastructure stories together: physical-AI models create demand for the simulation, synthetic-data, and rack-scale compute stack it sells. Buyers should evaluate model capability and deployment substrate together.

    Hugging Face Cosmos 3 launch; NVIDIA GTC Taipei

Watchlist

On the radar next.

3 catalysts to watch, starting Jun 7-30.

Specific model-side catalysts in the next 7–30 days that would change the read materially. Watching these tells us whether the canopy is widening or thinning.

  • Jun 7-30

    Gemini 3.5 Pro GA

    The first public model card, price row, API ID, and Artificial Analysis pass will determine whether June becomes a true frontier reset or just a routing expansion.

  • Jun-Jul

    Mythos-class Anthropic availability

    Anthropic has publicly framed stronger gated models as pending cyber safeguards. A wider release would change the closed-frontier scorecard more than another Opus point release.

  • Jun-Aug

    Open/local agent model adoption

    Downloads, integrations, and benchmark replications for Mellum2, Cosmos 3, and Holo3.1 will show whether specialist open models are becoming production substrate or just launch-week noise.

Edits this issue

  • Added Mellum2, Cosmos 3 Nano, and Holo3.1 to the LLM tree and reframed W23 around efficient/local specialist models rather than a closed-frontier release.

About The Model Pulse

A weekly read on the software side of the AI stack. Anchored to the LLM Evolutionary Tree, which the brief annotates each week. The cross-stack flywheel (capital, hardware, networking) is covered in The AI Stack Weekly.

Authorship and sources

Compiled from public model cards, vendor blogs, leaderboards, and official lab announcements. Written by Brian Letort. Independent analysis. Not investment guidance.

Operate. Publish. Teach.