The Model Pulse

Issue 05 · Week 21 of 2026.

May 23, 2026/Weekly read/~6 min read/Public sources onlyDownload brief

The Big Read

Six new tree rows. The loudest model-layer week of Q2 — Gemini 3.5 + Omni, Qwen3.7-Max in the global top 5, three open-weights drops that reset what 'open' means.

The thesis this issue defends

W21 (May 18-23, 2026) was the loudest model-layer week of Q2. Google's I/O turned out larger than expected — the long-anticipated 'Gemini 3.2 Flash' shipped as Gemini 3.5 Flash, paired with native multimodal video generation in Omni Flash and the Antigravity 2.0 standalone agent IDE; Alibaba's same-week Cloud Summit pushed Qwen3.7-Max into the global AA Intelligence Index top 5 (a first for a Chinese model at 56.6, ahead of Gemini 3.5 Flash); three open-weights drops — Cohere Command A+ (first Apache 2.0 frontier-adjacent MoE), Microsoft Fara1.5 (browser agents that out-score OpenAI Operator and Gemini Computer Use), Tencent Hy-MT2 (translation MoE that runs offline on a phone) — reset what 'open' means at the frontier-adjacent tier.

The structural shift underneath all six launches is the same: every flagship was engineered for sustained agentic loops first and chat second, with 1M-token context now table stakes and speed-per-task on the Pareto frontier replacing peak-intelligence as the procurement metric. Gemini 3.5 Flash hits Terminal-Bench 76.2% / MCP Atlas 83.6% at ~280 output tokens per second — outperforming prior-gen Gemini 3.1 Pro on agentic benchmarks at 4x throughput. Qwen3.7-Max demonstrated a 35-hour autonomous run with 1,158 tool calls. The unit of competition shifted from raw intelligence to runtime monetization and ecosystem lock-in.

NVIDIA's Q1 FY27 print (May 20: $81.6B revenue, Vera Rubin announced for Q3 ship, Vera CPU already in OpenAI / Anthropic / Oracle / xAI hands, $145B supply commitments) confirmed the compute supply chain that makes all this possible is still ramping, not peaking. Huang named Anthropic alongside the hyperscalers as a Blackwell deployment customer — the compute-side demand matches the equity bid behind Anthropic's $30B / $900B round closing imminently per Bloomberg (May 22).

The specialist canopy widened structurally. Cohere Command A+ (218B / 25B active MoE, Apache 2.0, runs on 2x H100s) closed the open-vs-closed gap on enterprise RAG and tool-use with τ²-Bench Telecom jumping 37% to 85% generation-over-generation; AA-Omniscience Non-Hallucination #1 at 86%. Microsoft Fara1.5-27B (open-weight, Qwen3.5 fine-tune) scored 72% Online-Mind2Web — beating OpenAI Operator (58.3%) and Gemini Computer Use (57.3%) — collapsing the per-task cost for browser-agent fleets. Tencent Hy-MT2 at 30B-A3B runs offline on a phone after 1.25-bit quantization.

Notable absences: no DeepSeek R3, no Kimi K3, no Llama 5, no Grok 4.4 / 4.5, and Anthropic's I/O-week response was infrastructure (self-hosted Managed Agent sandboxes, MCP tunnels) rather than a new model. For the May 25 - June 13 window, the load-bearing catalysts are Anthropic's round close, the Samsung-union ratification vote May 27-28, Microsoft Build (June 2-3), Computex Taipei + NVIDIA GTC Taipei (June 1-5), and any frontier-text counter-launch from Anthropic / OpenAI.

Tree delta

What changed in the tree.

6 models added, 0 updated.

6 model rows added this week across 6 vendors. Canopy widened across Google's I/O drop (Gemini 3.5 Flash, Omni Flash), Alibaba's Cloud Summit response (Qwen3.7-Max — first Chinese model in AA Index top 5), Cohere's first Apache 2.0 frontier-adjacent MoE (Command A+), Microsoft browser-agent open weights (Fara1.5), and Tencent translation MoE (Hy-MT2). Speed-per-task replaces peak-intelligence as the dominant procurement metric.

Added (6)

gemini-3-5-flash
gemini-omni-flash
qwen-3-7-max
cohere-command-a-plus
fara-1-5-27b
hy-mt2-30b-a3b

Updated

None this period.

I/O + Alibaba Cloud Summit + NVIDIA Q1 FY27 + Cohere / Microsoft / Tencent open-weights all stacked into five business days. This was the densest model-layer week of the quarter. Microsoft Build (June 2-3) sits outside the window, so Microsoft's MAI silence on the platform layer is not yet a signal.

Explore the LLM Evolutionary Tree

Frontier movements

Flagship-class releases.

3 releases this period.

Vendor-stated frontier capability. The releases that reset the closed-source ceiling.

2026-05-19/Google DeepMind/Frontier/Multimodal
Gemini 3.5 Flash
Buyers should rerun agentic-coding RFPs — a Flash-tier model now out-codes prior-gen Pro on most agent benchmarks at 4x speed.
Gemini 3.5 Flash hits Terminal-Bench 2.1 76.2%, MCP Atlas 83.6%, and OSWorld 78.4% — beating Gemini 3.1 Pro on almost every agentic eval while running at ~280 output tokens per second, the new Pareto frontier on speed-vs-intelligence. Architects optimizing for sustained agent loops (not chat) should re-baseline cost-per-task within 30 days; Flash pricing is up 3x ($1.50/$9 per Mtok) but per-task cost is often half of frontier peers because of latency and token economy.
deepmind.google/models/gemini/flash, blog.google
2026-05-19/Google DeepMind/Frontier/Multimodal
Gemini Omni Flash
Marketing / ad / content-ops leaders should pilot an Omni workflow this quarter — natively multimodal video gen is now in-product, not a research demo.
Omni Flash ingests text / image / audio / video and outputs grounded 10-second video clips with conversational editing — landing simultaneously in the Gemini app, Google Flow, YouTube Shorts, and YouTube Create. The 'create anything from any input' framing is real because the model carries Gemini's world-knowledge through the generation, unlike Veo's text-to-video pipeline. API access lands in weeks; enterprises that ignore this until then will be six weeks behind on creative-ops tooling.
blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-omni
2026-05-20/Alibaba/Frontier/Reasoning
Qwen3.7-Max
China-sovereignty buyers have a top-5 global intelligence option in-country — add Qwen3.7-Max to closed-model bake-offs this quarter.
First Chinese model in the Artificial Analysis Intelligence Index top 5 (56.6, ahead of Gemini 3.5 Flash's 55.3); 1M context, 35-hour autonomous tool-use run demonstrated on a real engineering task. Closed-weight strategy and Alibaba Cloud-only access (with homegrown Zhenwu M890 silicon underneath) explicitly target enterprises that want frontier intelligence without US-vendor dependence. The 3-4 point gap to GPT-5.5 has closed enough that 'use Western frontier' is now a debatable position, not a default.
alibabacloud.com/blog, marktechpost.com, Artificial Analysis

Open weights

Open-frontier and open-source drops.

3 releases this period.

Open-weights releases that change procurement options. Pull these into pilot when score parity meets license parity.

2026-05-20/Cohere/Open frontier/MoE
Command A+
Sovereign-AI and data-residency buyers should add Command A+ to their on-prem shortlist — Cohere just shipped the first Apache 2.0 frontier-adjacent MoE that fits on two H100s.
218B total / 25B active MoE, multimodal with citation grounding, W4A4 quantization on B200 or 2x H100 — a deployment profile that previously required either a closed API or a license-restricted open model. τ²-Bench Telecom jumped 37% to 85% generation-over-generation; AA-Omniscience Non-Hallucination #1 at 86%; 48 languages. Procurement teams running RAG or agentic workflows in regulated industries should evaluate this within the next two sprints; strongest non-Chinese open-weight frontier-adjacent candidate in the Western catalog.
cohere.com/blog/command-a-plus, venturebeat.com
2026-05-22/Microsoft Research/Specialist/Multimodal
Fara1.5-27B
Teams building computer-use agents should switch to open-weight Fara1.5 before scaling — Microsoft's tiny model beats OpenAI Operator and Google Computer Use on live web tasks.
Fara1.5-27B scores 72% on Online-Mind2Web vs 58.3% for OpenAI Operator and 57.3% for Gemini 2.5 Computer Use; even the 9B variant beats them at 63.4%. Built on Qwen3.5, open weights on Azure AI Foundry. Collapses the cost structure for browser-agent deployments — operators currently paying per-task to closed providers can run an entire fleet on a single host. Microsoft's strategy reveal: in browser-agent specialty, scale is no longer the moat.
microsoft.com/en-us/research/articles/fara1-5-computer-use-agent, decrypt.co
2026-05-21/Tencent/Specialist/MoE
Hunyuan Hy-MT2-30B-A3B
Localization and globalization ops teams should test Hy-MT2 against incumbent MT APIs — 30B MoE running 3B active now leads open-source translation; 1.8B variant runs offline on a phone.
Family release (1.8B / 7B / 30B-A3B) supporting 33 languages + 5 Chinese dialects. The 1.8B variant fits in 440MB after 1.25-bit AngelSlim quantization — runs offline on a phone. The 7B and 30B-A3B beat DeepSeek-V4-Pro and Kimi K2.6 on fast-thinking translation. Architects shipping multilingual products should benchmark: this is the first open-weight family where on-device offline translation is genuinely production-grade, not a research curiosity.
github.com/Tencent-Hunyuan/Hy-MT2

Architecture watch

Patterns to track.

5 patterns reshaping the canopy.

Architectural patterns that crossed multiple vendors this period. Each pattern lists exemplar releases and what it changes for deployment, cost, or capability.

Speed-intelligence Pareto frontier reset by Flash tier
Gemini 3.5 Flash (~280 tok/sec, AA Index 55+)Command A+ (W4A4 quantized, 2x H100)Gemini 3.1 Flash-Lite (carryover anchor)
Gemini 3.5 Flash at 278-284 output tokens per second with AA Intelligence Index 55+ has redrawn the Pareto frontier — Flash-class models now match prior-generation Pro intelligence at sub-100ms-per-token throughput. Architects budgeting for agentic loops should recompute cost-per-task assuming Flash-tier models now do what required a Pro model six months ago. Closed Pro-tier models retain a 3-5 point intelligence lead but at 4-6x the latency.
artificialanalysis.ai/articles/cohere-launches-open-weights-model-command-a, deepmind.google/models/gemini/flash
Agent-first ground-up redesigns replace chat-first architectures
Gemini 3.5 Flash + Antigravity 2.0Qwen3.7-Max (35h autonomous demo)Command A+ (unified four prior single-purpose models)Mistral Medium 3.5 (anchor)
Every flagship model launched this week was explicitly engineered for sustained multi-step agentic workloads — long-horizon planning, parallel tool use, persistent state — not for conversational turns. Antigravity 2.0 ships as a standalone agent-first GUI replacing the IDE paradigm; Command A+ collapsed four prior single-purpose models into one unified weights. Architects assuming a chat-first interaction model in their AI stack should review their roadmaps — vendor pricing, latency tuning, and API shapes are all moving toward agents as the unit of work.
antigravity.google/blog/introducing-google-antigravity-2-0, cohere.com/blog/command-a-plus
Open-weight frontier shifts from license-restricted to fully Apache 2.0
Command A+ (Cohere Apache 2.0)Gemma 4 (Apache 2.0)DeepSeek V4 (MIT)Mistral Medium 3.5 (modified MIT)
Cohere's Apache 2.0 release of a 218B MoE — its first ever under that license — joins Gemma 4's Apache 2.0 shift in marking a definitive move away from license-restricted open weights at the frontier. Llama Community License-style restrictions are now the laggard, not the standard. Procurement teams that previously had to lawyer up around 700M-MAU caps and field-of-use carveouts have a fast-growing set of frontier-adjacent options without that overhead.
venturebeat.com/technology/cohere-cracks-lossless-quantization-and-native-citations-with-first-full-apache-2-0-licensed-open-model-command-a
1M-token context becomes table stakes for closed frontier
Gemini 3.5 Flash (1M)Qwen3.7-Max (1M)DeepSeek V4 Pro/Flash (1M)Claude Opus 4.7 (1M)GPT-5.5 Pro (1M)
Every major frontier or frontier-adjacent model shipped this month carries a 1M-token context as the default, not a premium tier. The bar moved from 256K to 1M in two quarters. Architects building RAG or long-document workflows should stop treating retrieval chunking as the default — single-shot whole-corpus inference is now economically viable for many workloads. Mid-tier models still at 128K-256K (Command A+ at 128K input) are now the exception that needs justification.
benchlm.ai/benchmarks/artificialAnalysis, model cards across labs
Specialist computer-use models challenge generalist agents on UI tasks
Fara1.5-27B (72% Online-Mind2Web)Yutori Navigator n1 (64.7%)OpenAI Operator (58.3%)Gemini 2.5 Computer Use (57.3%)
Fara1.5-27B's 72% Online-Mind2Web beats OpenAI's Operator (58.3%) and Gemini 2.5 Computer Use (57.3%) — purpose-built browser agents on fine-tuned small bases now outperform frontier generalists on web-task completion. Operators paying per-task on closed Operator / Computer Use APIs should run a TCO comparison against a self-hosted Fara1.5 fleet. The pattern: specialist fine-tunes on focused data beat broad-but-shallow frontier generalists on narrow real-world workflows.
decrypt.co/368807/microsoft-fara15-open-source-ai-beats-openai-gemini, microsoft.com/en-us/research/articles/fara1-5-computer-use-agent

Benchmark moves

Where the leaderboard moved.

5 benchmarks shifted.

Benchmark deltas that change a procurement read. Scores reflect public leaderboards or vendor model cards as of publication.

Artificial Analysis Intelligence Index (snapshot May 21, 2026)
Top remains GPT-5.5 (60.2); Qwen3.7-Max debuts at 56.6 — first Chinese model in global top 5, ahead of Gemini 3.5 Flash (55.3); Claude Opus 4.7 holds 57.3.
- GPT-5.5 (xhigh)60.2
- Claude Opus 4.7 (Adaptive)57.3
- Gemini 3.1 Pro Preview57.2
- Qwen3.7-Max56.6
- Gemini 3.5 Flash (high)55.3
artificialanalysis.ai/articles/cohere-launches-open-weights-model-command-a
Terminal-Bench 2.1 (agentic terminal coding)
Gemini 3.5 Flash jumps to 76.2% — outscores Claude Opus 4.7 (66.1%), Gemini 3.1 Pro (70.3%), and Gemini 3 Flash (58.0%); GPT-5.5 leads at 78.2%.
- GPT-5.578.2
- Gemini 3.5 Flash76.2
- Gemini 3.1 Pro70.3
- Claude Opus 4.766.1
- Gemini 3 Flash58.0
deepmind.google/models/gemini/flash
Online-Mind2Web (browser computer-use, 300 tasks across 136 live sites)
Open-weight Fara1.5-27B sets new SOTA at 72.0%, beating OpenAI Operator (58.3%), Gemini 2.5 Computer Use (57.3%), and Yutori Navigator n1 (64.7%).
- Fara1.5-27B72.0
- Yutori Navigator n164.7
- Fara1.5-9B63.4
- OpenAI Operator58.3
- Gemini 2.5 Computer Use57.3
microsoft.com/en-us/research/articles/fara1-5-computer-use-agent
GDPval-AA (economically valuable knowledge work, Elo)
Gemini 3.5 Flash reaches 1656 Elo — leapfrogging Gemini 3.1 Pro (1314) and Gemini 3 Flash (1204); Claude Opus 4.7 still leads enterprise reasoning at 1753.
- GPT-5.51769
- Claude Opus 4.71753
- Claude Sonnet 4.61676
- Gemini 3.5 Flash1656
- Gemini 3.1 Pro1314
deepmind.google/models/gemini/flash
τ²-Bench Telecom (agentic tool use)
Command A+ jumps from 37% (Command A Reasoning) to 85% — a 48-point generation-over-generation gain on open weights.
- Mistral Medium 3.591.4
- Command A+85.0
- Command A Reasoning37.0
cohere.com/blog/command-a-plus

Tier scorecard

Who leads, who pushes.

6 tiers · leaders as of May 23, 2026.

A snapshot of leader-vs-challenger by tier. Useful for procurement shortlists when matching workload to model class. Pair with the benchmark moves above for the underlying scores.

Tier

Leader

Challenger

Read

Closed frontier
GPT-5.5 (xhigh) — AA Index 60.2
Claude Opus 4.7 (Adaptive Max) — AA Index 57.3, LMArena 1492-1501
Headline intelligence crown unchanged; Gemini 3.5 Pro arrives next month and will be the real challenge to GPT-5.5's lead.
Open frontier
Kimi K2.6 (Moonshot) — AA Index lead among open; DeepSeek V4 Pro #2 on open
Command A+ (Cohere) — first Apache 2.0 frontier-adjacent MoE; Mistral Medium 3.5 dense flagship
Cohere made sovereign-AI on-prem deployment a real Western option this week; Chinese labs still hold raw intelligence lead.
Reasoning
GPT-5.5 Pro / Claude Opus 4.7 (adaptive thinking)
Qwen3.7-Max — 35-hour autonomous tool-use, AA 56.6 with strong long-horizon traces
Reasoning tier is now where China is closest to parity — buyers should test long-horizon agent loops side-by-side.
Coding
Claude Opus 4.7 — SWE-Bench Verified 87.6%, SWE-Bench Pro 64.3%
Gemini 3.5 Flash — agentic-coding Pareto leader at 4x speed (Terminal-Bench 76.2)
Opus 4.7 owns hard coding; Flash owns sustained agentic coding loops where token-per-second matters more than peak score.
Multimodal
Gemini Omni Flash — native text/image/audio/video to grounded video
Gemini 3.5 Flash on input multimodal (CharXiv 84.2, MMMU-Pro 83.6)
Google now owns both ends of multimodal — input understanding and conditional video output — without a clear cross-vendor competitor this week.
Edge / small
Gemma 4 family (E2B / E4B / 26B-MoE / 31B) — Apache 2.0, runs from Pi to workstation
Hy-MT2-1.8B (440MB at 1.25-bit) for translation; Fara1.5-4B for browser agents
Edge tier consolidates around specialists — translation, browser-use, multimodal — rather than general-purpose miniatures.

Closed frontier
Leader: GPT-5.5 (xhigh) — AA Index 60.2
Challenger: Claude Opus 4.7 (Adaptive Max) — AA Index 57.3, LMArena 1492-1501
Headline intelligence crown unchanged; Gemini 3.5 Pro arrives next month and will be the real challenge to GPT-5.5's lead.
Open frontier
Leader: Kimi K2.6 (Moonshot) — AA Index lead among open; DeepSeek V4 Pro #2 on open
Challenger: Command A+ (Cohere) — first Apache 2.0 frontier-adjacent MoE; Mistral Medium 3.5 dense flagship
Cohere made sovereign-AI on-prem deployment a real Western option this week; Chinese labs still hold raw intelligence lead.
Reasoning
Leader: GPT-5.5 Pro / Claude Opus 4.7 (adaptive thinking)
Challenger: Qwen3.7-Max — 35-hour autonomous tool-use, AA 56.6 with strong long-horizon traces
Reasoning tier is now where China is closest to parity — buyers should test long-horizon agent loops side-by-side.
Coding
Leader: Claude Opus 4.7 — SWE-Bench Verified 87.6%, SWE-Bench Pro 64.3%
Challenger: Gemini 3.5 Flash — agentic-coding Pareto leader at 4x speed (Terminal-Bench 76.2)
Opus 4.7 owns hard coding; Flash owns sustained agentic coding loops where token-per-second matters more than peak score.
Multimodal
Leader: Gemini Omni Flash — native text/image/audio/video to grounded video
Challenger: Gemini 3.5 Flash on input multimodal (CharXiv 84.2, MMMU-Pro 83.6)
Google now owns both ends of multimodal — input understanding and conditional video output — without a clear cross-vendor competitor this week.
Edge / small
Leader: Gemma 4 family (E2B / E4B / 26B-MoE / 31B) — Apache 2.0, runs from Pi to workstation
Challenger: Hy-MT2-1.8B (440MB at 1.25-bit) for translation; Fara1.5-4B for browser agents
Edge tier consolidates around specialists — translation, browser-use, multimodal — rather than general-purpose miniatures.

Vendor signals

Pricing, gating, deprecation.

6 non-release signals worth tracking.

The non-release moves that shift vendor risk — pricing, deprecations, gating decisions, license changes — with a one-line procurement read.

2026-05-19/Google
Gemini 3.5 Flash + Omni Flash + Antigravity 2.0 + Managed Agents API at I/O — three-month product cycle compressed into one keynote
Google shifted the conversation from 'model choice' to 'agent platform choice'. Procurement teams should reopen Gemini-vs-incumbent bake-offs that closed in Q1; the agent-harness lock-in (AGENTS.md, SKILL.md, managed Linux sandboxes) is now the strategic battleground, not the model itself.
blog.google/innovation-and-ai/technology/developers-tools/google-io-2026-developer-highlights
2026-05-20/NVIDIA
Q1 FY27 record $81.6B revenue (+85% YoY); Vera Rubin + Vera CPU + BlueField-4 STX announced; Vera CPUs already shipping to OpenAI, Oracle, Anthropic, xAI
Data Center revenue $75.2B (+92% YoY) confirms the buildout has not yet peaked; Vera Rubin Q3 2026 ship date with a purpose-built agentic-AI CPU reframes 2H training capacity. CFOs and infrastructure architects should assume frontier training compute remains supply-constrained through year-end and budget GPU access on Q3-availability cadence, not on price.
investor.nvidia.com/news/press-release-details/2026/NVIDIA-Announces-Financial-Results-for-First-Quarter-Fiscal-2027
2026-05-19/Anthropic
Claude Managed Agents add self-hosted sandboxes (public beta) and MCP tunnels (research preview) — agent execution moves into customer infrastructure boundary
Anthropic's response to I/O wasn't a new model — it was an enterprise security control surface. Security and platform teams should evaluate this as the new baseline for agent-deployment threat models: tool execution stays on-prem, credentials never enter agent context, only outbound traffic. Buyers that previously gated Claude agents on data-leakage risk now have a clear path forward.
claude.com/fr/blog/claude-managed-agents-updates
2026-05-21/OpenAI
Codex 'Goal mode' GA, macOS Appshots, locked-Mac computer use, browser annotations + Dell hybrid deployment partnership (May 18-19)
OpenAI did not counter-launch a model into I/O week — instead doubled down on Codex as the front door for enterprise on-prem (Dell AI Factory) and pushed agent persistence (Goal mode, locked computer use). Engineering leaders evaluating coding-agent vendors should now treat Codex as a multi-surface platform (CLI + IDE + mobile + macOS app + Windows incoming) rather than a chat feature, and re-evaluate hybrid deployment paths.
help.openai.com/en/articles/6825453-chatgpt-release-notes
2026-05-20/Alibaba
Alibaba Cloud Summit: Qwen3.7-Max + Qwen3.7-Plus-Preview + homegrown Zhenwu M890 AI chip (3x predecessor) + rebuilt full-stack agentic cloud platform
Alibaba projects model & application services ARR of ¥10B (~$1.4B) in the June quarter and ¥30B (~$4.1B) by year-end — a credible commercial AI business at scale. Buyers in APAC or with China-region data residency requirements should treat Alibaba as a primary frontier vendor evaluation this quarter, not a regional alternative.
alibabacloud.com/blog/alibaba-unveils-new-ai-chip-flagship-model-and-rebuilt-cloud-stack-ai-for-agentic-era
2026-05-20/Cohere
Command A+ shipped under Apache 2.0 — first ever fully-permissive license from Cohere on a frontier-adjacent MoE
Sovereign-AI and data-residency buyers now have a Western on-prem frontier-adjacent option without license restrictions. Strategic signal: Cohere is positioning as the Western counter to Chinese open-weights labs (DeepSeek, Moonshot, Z.ai) for regulated enterprises that cannot use Chinese weights. Procurement teams should test Command A+ vs Mistral Medium 3.5 for RAG and tool-use workloads inside the next two sprints.
cohere.com/blog/command-a-plus

Watchlist

On the radar next.

7 catalysts to watch, starting May 25 - 30.

Specific model-side catalysts in the next 7–30 days that would change the read materially. Watching these tells us whether the canopy is widening or thinning.

May 25 - 30
Anthropic round close — final terms, lead investor, full investor list
Bloomberg May 22 reported close 'as soon as next week' at $30B-plus / $900B-plus. Resets the frontier-lab valuation curve 15x in 14 months. A slip past end-May or a haircut below $700B would be the loudest re-rating signal in a year.
May 27 - 28
Samsung-union ratification vote on tentative HBM4 agreement
Binary outcome. Approval keeps the 18-day walkout off the table and HBM4 supply for Vera Rubin intact; rejection re-activates the W20 supply-risk thesis on three days' notice.
May 26 - June 13
Anthropic Opus 4.8 — leaked partner evaluations, expected I/O response
Multiple sources reporting select Anthropic partners running Opus 4.8 internal evals; Anthropic's I/O-week response was infrastructure (sandboxes, MCP tunnels), not a model. The model itself likely lands within 3 weeks based on Opus 4.6 to 4.7 cadence.
June 2 - 3
Microsoft Build 2026 (San Francisco) — MAI-line announcements, Foundry updates, Fara1.5 enterprise rollout
Build was pushed from May to June 2-3 — Microsoft has been notably quiet on its own MAI model line through I/O week. Build is where MAI frontier models, Copilot agent stack moves, and Foundry consolidation are likely to land; procurement teams running Azure / Foundry should hold model-selection decisions until after Build.
May 26 - July 15
Gemini 3.5 Pro launch — frontier intelligence rematch
Google promised Pro 'next month' (June 2026). This is the model that determines whether Google retakes the AA Intelligence Index lead from GPT-5.5 (currently 60.2) or stays on the second podium tier. Buyers planning Pro-tier procurement should hold pending Pro pricing and benchmarks.
May 26 - July 31
DeepSeek R3 / V4-Thinking — overdue per W20 watchlist
DeepSeek V4 Pro/Flash landed April 24 with strong benchmarks but explicitly without R3-style deep reasoning weights. Cohere CEO and DeepSeek itself have publicly acknowledged being 3-6 months behind US frontier — an R3 release is the catalyst that could close that gap or confirm it.
June 9 - 13
Apple WWDC 2026 — on-device model strategy, Apple Intelligence v2
Apple absent from the model-launch cycle through Q2; WWDC is the only major Apple model-disclosure window in 1H. Mobile / edge architects building on-device AI should expect significant Apple Intelligence repositioning and possibly partnered closed-model integration changes.

Six new tree rows. The loudest model-layer week of Q2 — Gemini 3.5 + Omni, Qwen3.7-Max in the global top 5, three open-weights drops that reset what 'open' means.

What changed in the tree.

Flagship-class releases.

Gemini 3.5 Flash

Gemini Omni Flash

Qwen3.7-Max

Open-frontier and open-source drops.

Command A+

Fara1.5-27B

Hunyuan Hy-MT2-30B-A3B

Patterns to track.

Speed-intelligence Pareto frontier reset by Flash tier

Agent-first ground-up redesigns replace chat-first architectures

Open-weight frontier shifts from license-restricted to fully Apache 2.0

1M-token context becomes table stakes for closed frontier

Specialist computer-use models challenge generalist agents on UI tasks

Where the leaderboard moved.

Who leads, who pushes.

Pricing, gating, deprecation.

On the radar next.