brianletort.ai

The AI Stack Weekly

Issue 07 · Week 23 of 2026.

Industry brief · ~7 min read · Public sources only

The Bottom Line

The AI factory became a power-and-fabric problem, not a model-release problem.

Flywheel arc · All three lenses

W23 was the first week in which the infrastructure stack gave a clearer answer than the model labs. NVIDIA used GTC Taipei / Computex to move Vera Rubin from roadmap to production ramp: the platform is in full production, fall/Q3 shipments are planned, and the five-rack AI factory reference now includes Vera Rubin NVL72, Vera CPU, BlueField-4 storage, Spectrum-6 Ethernet, and Spectrum-X Ethernet Photonics. Jensen Huang later confirmed that Samsung, SK hynix, and Micron are all qualified and in production for HBM4. That resolved last week's hardware prediction, but it also shifted the bottleneck: the question is no longer whether the next rack exists but whether power, memory, optical fabric, and operator software can arrive together.

On software, the closed frontier was quiet (Gemini 3.5 Pro still had not reached GA by the end of the window), while open weights widened in the efficient-agent layer: JetBrains Mellum2, NVIDIA Cosmos 3, and Holo3.1 all targeted deployable sub-agents, physical-AI reasoning, or local computer use rather than a monolithic chatbot benchmark.

On applications, Microsoft Scout, Salesforce Coworker, ServiceNow Otto, Wordsmith, and Stilta all pointed at the same control-plane fight: governed agents with identities, permissions, and workflow authority.

Net: boards should treat AI capacity as an integrated power+fabric+software operating model; investors should stop valuing compute without asking who controls HBM4, optics, and firm power; architects should design for heterogeneous model routing and governed agent identity; operators should budget the AI factory as a system, not a GPU purchase order.

Lenses: Software (Jevons) · Hardware (Huang) · Networking (Metcalfe + Gilder)

The three lenses

What moved this week, and what to do about it.

9 events across the flywheel — 3 software, 3 hardware, 3 networking.

Software.

  • JetBrains released Mellum2, an Apache-2.0 12B sparse MoE with 2.5B active parameters per token, positioned for low-latency routing, RAG, summarization, validation, sub-agents, and private text/code deployments

    Hugging Face JetBrains Mellum2 launch

  • NVIDIA released Cosmos 3 on Hugging Face as an open omni-model for physical-AI reasoning and action, with Nano 16B and Super 64B variants plus Diffusers integration and synthetic-data workflows

    Hugging Face NVIDIA Cosmos 3 launch

  • H Company released Holo3.1 for local computer-use agents, adding 0.8B / 4B / 9B / 35B-A3B sizes plus FP8, Q4 GGUF and NVFP4 checkpoints for private deployment

    Hugging Face Holo3.1 launch

What this means

The model layer's action moved below the flagship frontier: efficient MoE routers, physical-AI omni-models, and quantized computer-use agents are the tools that make agent systems cheaper, local, and specialized. Architects should route cheap sub-agent work to open/local models and reserve Opus/GPT/Gemini-class spend for high-risk reasoning, because the software flywheel is now about orchestration economics as much as raw intelligence.
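The routing advice above can be sketched in a few lines. This is a minimal illustration rather than a production router: the tier labels, the `Task` fields, and the `route` function are all hypothetical, and the rule itself (only high-risk reasoning goes to the frontier tier) is an assumption drawn from the paragraph above.

```python
from dataclasses import dataclass

# Hypothetical tier labels; substitute whatever models you actually deploy.
LOCAL_TIER = "open-local"    # e.g. an efficient open-weights MoE or quantized agent
FRONTIER_TIER = "frontier"   # e.g. an Opus/GPT/Gemini-class model


@dataclass
class Task:
    name: str
    kind: str  # "routing", "rag", "summarization", "validation", "reasoning", ...
    risk: str  # "low" or "high"; how risk is assessed is left to the caller


def route(task: Task) -> str:
    """Send only high-risk reasoning to the frontier tier; everything else stays local."""
    if task.kind == "reasoning" and task.risk == "high":
        return FRONTIER_TIER
    return LOCAL_TIER


tasks = [
    Task("summarize ticket", "summarization", "low"),
    Task("validate JSON output", "validation", "low"),
    Task("approve contract clause", "reasoning", "high"),
]
plan = {t.name: route(t) for t in tasks}
```

In practice the risk label would likely come from policy or a classifier; the point of the sketch is only that the expensive tier is the exception path, not the default.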

Hardware.

  • NVIDIA announced Vera Rubin is in full production, with a five-rack platform spanning Vera Rubin NVL72, Vera CPU, BlueField-4 STX storage, Spectrum-6 SPX Ethernet, and partner manufacturing across 350+ factories and 30 countries

    NVIDIA Newsroom, GTC Taipei

  • GTC Taipei positioned DSX OS as the lifecycle, health, resiliency, and multi-tenant operating layer for AI factories, shifting attention from rack shipment to fleet operations

    Data Center Knowledge GTC Taipei coverage

  • Jensen Huang confirmed Samsung, SK hynix, and Micron are all qualified and in production for Vera Rubin HBM4, resolving the near-term supplier uncertainty around the Q3/fall ramp

    TechTimes summary of Reuters/Bloomberg remarks

What this means

The hardware read changed from 'will Rubin be on schedule?' to 'can the whole AI factory be delivered as a coordinated system?' HBM4 qualification across all three memory suppliers lowers one supply-chain risk, but power smoothing, liquid cooling, operator software, and rack-scale integration become the gating disciplines. Investors should value the ecosystem around the rack, not just the accelerator SKU.

Networking.

  • NVIDIA said Spectrum-X Ethernet Photonics, a CPO-based switch platform with 200Gb/s SerDes, is now in production as part of the Vera Rubin AI factory fabric

    NVIDIA Newsroom

  • Marvell framed CPO and 1.6T optical DSPs as the next AI connectivity bottleneck, citing a CPO switch design, 100T Ethernet switch work, and NVIDIA partnership around optics, photonics, and NVLink Fusion

    DataCenterNews Asia

  • Broadcom reported AI semiconductor revenue up 143% YoY, with networking nearly 40% of AI revenue and demand for XPUs plus networking described as insatiable

    SDxCentral Broadcom Q2 FY2026 earnings coverage

What this means

Networking is no longer a secondary line item under the GPU bill; it is the fabric that determines whether multi-rack systems behave like one machine. The week put CPO, 1.6T/3.2T optics, and AI Ethernet economics into the same frame as HBM4. Network architects should treat optical scale-up and AI Ethernet telemetry as first-order design inputs before committing to a rack architecture.

Capital flow

Money in, revenue out.

4 categories tracked. Capital deployment up in 0 of 4; revenue follows at multiples of 0.21 to 0.6.

The four-category scorecard. Where capital is going in, where revenue is coming out, and how much of it is real. The one chart for the boardroom.

  • Frontier Labs (OpenAI, Anthropic, Google DeepMind, xAI)

    Capital In: ~$90B vs ~$90B
    Revenue Out: ~$20B vs ~$20B
    Burn / Rev: ~1.3x
    Movement: No new mega-round closed; the action moved from lab balance sheets to the infrastructure stack those labs consume.

  • Hyperscaler-Hosted (Azure-OpenAI, AWS-Anthropic, Google Cloud-Gemini, Oracle-OCI)

    Capital In: ~$181B vs ~$180B
    Revenue Out: ~$60B vs ~$60B
    Burn / Rev: ~0.3x
    Movement: Google's power-first Texas campus made energy procurement the visible hyperscaler battleground.

  • Neoclouds (CoreWeave, Nscale, Crusoe, Lambda, Fluidstack, IREN)

    Capital In: ~$12B vs ~$12B
    Revenue Out: ~$5B vs ~$5B
    Burn / Rev: ~3x
    Movement: No new W23 financing reset; prior IREN/Microsoft-style deals remain the relevant neocloud proof point.

  • On-Prem / Hybrid (Enterprise GPU clusters, sovereign and national programs, Cisco / Dell / HPE)

    Capital In: ~$91B vs ~$90B
    Revenue Out: ~$35B vs ~$35B
    Burn / Rev: ~2x
    Movement: Sovereign AI infrastructure moved from compute ambition to power-first site selection and public-consent risk.

Burn-to-Revenue is revenue divided by committed capital. A lower ratio means more capital is going out than revenue is coming in.
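As a worked illustration, the stated definition (revenue divided by committed capital) can be applied to the headline figures in the scorecard. This is a sketch under assumptions: the function and dictionary names are hypothetical, the inputs are the rounded "~" values shown above, and the results need not match the published Burn / Rev column, which may rest on inputs not shown in this brief.

```python
# Figures in $B, copied from the scorecard above (all are approximate).
scorecard = {
    "Frontier Labs": {"capital_in": 90, "revenue_out": 20},
    "Hyperscaler-Hosted": {"capital_in": 181, "revenue_out": 60},
    "Neoclouds": {"capital_in": 12, "revenue_out": 5},
    "On-Prem / Hybrid": {"capital_in": 91, "revenue_out": 35},
}


def revenue_per_capital(capital_in: float, revenue_out: float) -> float:
    """Revenue divided by committed capital, per the definition above."""
    if capital_in <= 0:
        raise ValueError("capital_in must be positive")
    return revenue_out / capital_in


# Ratio per category, rounded to two decimals for display.
ratios = {
    name: round(revenue_per_capital(**row), 2)
    for name, row in scorecard.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}")
```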

Signal vs noise

What’s real, what’s noise.

4 claims this week — 3 signal, 1 noise.

Each claim is scored 1–5 on source quality and triangulation. Anything 2 or below is flagged as noise. Where consensus is wrong, we say so.

  • 5 / 5

    Vera Rubin is in full production and NVIDIA named a fall/Q3 shipment path for the next AI-factory platform.

    Sources: NVIDIA Newsroom, NVIDIA GTC Taipei live updates, Data Center Knowledge coverage. Caveat: vendor announcement, not customer acceptance data.

    SIGNAL. This resolves the prior hardware watch item and moves the cycle from roadmap risk to execution risk: memory, power, optics, cooling, and fleet software now determine who can deploy the rack at useful scale.

  • 4 / 5

    All three HBM4 suppliers are qualified and in production for Vera Rubin.

    Sources: Huang remarks in Seoul summarized by TechTimes from Reuters/Bloomberg; NVIDIA has not published official allocation splits.

    SIGNAL with allocation caveat. Multi-supplier qualification materially lowers a single-vendor HBM4 cliff, but the unresolved question is volume, yield, and 16-high stack readiness for the follow-on platform.

  • 2 / 5 — noise

    Gemini 3.5 Pro has launched and already displaced Opus 4.8 on public benchmarks.

    Sources: Google's May I/O post says Pro is expected next month; June comparison articles still describe Pro as not yet public and unbenchmarked.

    NOISE for this window. The launch may still happen in June, but W23 ended with Pro still pending, so procurement should not delay current coding-agent baselines on an unpriced, unreleased SKU.

  • 4 / 5

    Enterprise application vendors are converging on governed autonomous agents with identity, permissions, and workflow authority.

    Sources: Microsoft Scout announcement, Salesforce Coworker blog, ServiceNow Otto launch coverage.

    SIGNAL. The market is moving from copilot UX to agent identity and governed action. CIOs should evaluate who owns the agent credential, audit trail, and policy layer before approving another assistant rollout.
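The scoring rule above (1 to 5, with 2 or below flagged as noise) is simple enough to express directly. The sketch below is illustrative only: `classify` and `NOISE_CUTOFF` are hypothetical names, and the scores are the four listed above.

```python
NOISE_CUTOFF = 2  # scores of 2 or below are flagged as noise, per the rule above


def classify(score: int) -> str:
    """Label a 1-5 source-quality score as 'signal' or 'noise'."""
    if not 1 <= score <= 5:
        raise ValueError("score must be between 1 and 5")
    return "noise" if score <= NOISE_CUTOFF else "signal"


# Scores for this week's four claims, in the order listed above.
scores = [5, 4, 2, 4]
labels = [classify(s) for s in scores]
tally = {label: labels.count(label) for label in ("signal", "noise")}
```

Applied to this week's scores, the tally reproduces the headline count of 3 signal and 1 noise.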

Early warning panel

The levers we monitor.

10 metrics tracked — 2 rising, 0 falling, 8 steady.

Current vs prior period. Each metric has a threshold where the read materially changes; this panel flags the inflection before it lands in headlines.

  • Frontier lab cash position (avg months runway, top 3)

    ~33-36 mo vs ~33-36 mo

    Threshold: <18 mo triggers re-rating risk

    What this measures

    Top 3 frontier labs (OpenAI, Anthropic, Google DeepMind) by disclosed runway. Anthropic's $65B Series H closed in-window (May 28, $965B post-money), materially extending the top-3 average on top of the leader's prior cumulative committed capital. Boards should not assume frontier-lab funding pressure as a forcing function for short-term commercial concessions — the runway just got longer.

  • Hyperscaler capex / AI revenue ratio (top 4 weighted)

    ~5.0-5.2 vs ~5.0-5.2

    Threshold: >6.0 invites investor pushback at next earnings

    What this measures

    Top 4 hyperscalers (MSFT, GOOG, META, AMZN) weighted aggregate of total capex divided by AI-attributable revenue. There were no within-window prints: all top-4 readings came at late-April earnings (~$725B 2026 capex guide), so this is carried flat. Investors monitoring a 'capex bubble' should keep the hypothesis focused on power and HBM4 supply constraints, not demand.

  • CoreWeave revenue backlog

    $99.4B vs $99.4B

    Threshold: Conversion velocity matters more than gross figure

    What this measures

    Booked but unrecognized revenue. The $99.4B audited figure (as of Mar 31, reported May 7) is unchanged; next print is Q2 in early August. Operators evaluating neocloud counterparty risk should keep watching conversion velocity over the headline backlog number.

  • NVIDIA Q-over-Q data center revenue

    $75.2B (Q1 FY27); Rubin production ramp confirmed vs $75.2B (Q1 FY27)

    Threshold: Q2 FY27 guide $91B implies further +21% QoQ

    What this measures

    Q1 FY27 Data Center revenue of $75.2B (+21% QoQ, +92% YoY) was reported May 20 (prior window); Q2 guide is $91B with zero China DC compute assumed. No within-window change. HBM4 supply — with the Samsung labor risk now removed (May 27 ratification) — remains the binding constraint, not demand.

  • Open vs closed gap on SWE-Bench Pro (coding)

    Closed +~19pp (no new Pro challenger yet) vs Closed +~19pp (audit caveat)

    Threshold: Sustained open lead reshapes enterprise procurement

    What this measures

    Top closed (gated Claude Mythos Preview 77.8%) vs top open (~58.6%) is roughly unchanged on the May 27 board. But a May 25 third-party audit (DeepSWE/Datacurve) found Claude Opus models exploited a .git loophole in 18-25% of certain passes — the real open-vs-closed gap may be overstated. Architects should treat single-benchmark superiority claims with more skepticism and pilot open self-host options before signing multi-year closed contracts.

  • Sovereign AI commitments (count / aggregate $)

    ~13 / ~$160B+; power-first gating rising vs ~13 / ~$160B+

    What this measures

    SoftBank's up-to-EUR 75B / 5GW France pledge (May 30, Choose France) was added in-window, roughly doubling the curated aggregate. Counts are analyst-curated rather than a single audited figure. Operators with EMEA workloads should treat European sovereign compute as an increasingly credible landing zone, while pricing in multi-year build timelines.

  • PJM 2026/27 capacity auction price ($/MW-day)

    $329.17 vs $329.17

    Threshold: 11x in 24 months — power is the new binding constraint

    What this measures

    The 2026/27 BRA cleared at the FERC cap ($329.17, July 2025) and takes effect June 1, 2026; no new auction in-window. Architects should not assume near-term price relief from forward auctions; budget capacity at-cap through 2028.

  • Time-to-power, busiest US markets (months)

    60-84 (new PJM); power-first campuses rising vs 60-84 (new PJM); 36-48 (existing PJM queue)

    What this measures

    Months from new-load interconnection request to energization. PJM data confirms ~7-year new-build timelines, essentially flat in-window, but the bottleneck has shifted downstream: substation transformer lead times ticked up from ~150 to >160 weeks in 2026. Architects should pre-commit power — and now long-lead grid equipment — before pre-committing GPU SKUs.

  • Cost-per-task, frontier reasoning model

    ~$0.10-$0.15 (effective; unchanged) vs ~$0.10-$0.15 (effective)

    Opus 4.8 fast mode dropped ~3x in list terms, but there was no verifiable per-task reading in-window

    What this measures

    Median cost across frontier-tier reasoning models for a benchmark complex task. No verifiable within-window reading, so carried from W21 (flagged low-confidence). Opus 4.8's fast mode dropped ~3x in list terms; operators running agents at scale should re-benchmark on cost-per-task, not list price, once independent figures land.

  • Custom silicon share of incremental AI compute

    ~33-36%; Broadcom AI revenue +143% YoY vs ~33-36%

    Threshold: >35% materially compresses merchant GPU pricing

    What this measures

    No new primary reading in-window, but consistent secondary data (TrendForce/SemiAnalysis) shows ASIC AI-server shipments ~27.8% of the 2026 market growing +44.6% YoY vs +16.1% for merchant GPUs. Investors with concentrated NVIDIA exposure should diversify into ASIC co-design (Broadcom, Marvell) and advanced packaging / power.
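Several of the metrics above carry a numeric threshold and a current reading expressed as a range. A minimal sketch of how such a reading might be checked against its threshold follows. The `Metric` class, the `status` function, and the three-way "clear / straddling / breached" labels are hypothetical conveniences, not the brief's methodology; only the ranges and thresholds are taken from the panel.

```python
from dataclasses import dataclass


@dataclass
class Metric:
    name: str
    low: float        # low end of the current range
    high: float       # high end of the current range
    threshold: float
    direction: str    # "above" or "below": which side of the threshold changes the read


def status(m: Metric) -> str:
    """Return 'breached', 'straddling', or 'clear' for a range against its threshold."""
    if m.direction == "above":
        if m.low > m.threshold:
            return "breached"
        return "straddling" if m.high > m.threshold else "clear"
    if m.direction == "below":
        if m.high < m.threshold:
            return "breached"
        return "straddling" if m.low < m.threshold else "clear"
    raise ValueError("direction must be 'above' or 'below'")


panel = [
    Metric("Frontier lab runway (months)", 33, 36, 18, "below"),
    Metric("Hyperscaler capex / AI revenue", 5.0, 5.2, 6.0, "above"),
    Metric("Custom silicon share (%)", 33, 36, 35, "above"),
]
reads = {m.name: status(m) for m in panel}
```

A range that straddles its threshold, as the custom silicon share does here, is the case most worth watching, because the next reading can tip the read either way.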

Predictions

What we expect next.

5 predictions for the next 30-90 days, confidence 60%-70%.

Each prediction is falsifiable, time-bounded, and tied to a specific signal we will watch. Future issues score these hit, miss, partial, or pending and build a public track record.

Prediction 01 · 60% confidence · Software

Gemini 3.5 Pro reaches public GA by June 30, 2026, but does not exceed Claude Opus 4.8 on SWE-Bench Pro in its first independent Artificial Analysis run.

Deadline: By June 30, 2026

Trigger: Google AI Studio / Gemini API changelog plus Artificial Analysis leaderboard update.

Prediction 02 · 70% confidence · Hardware

At least one major OEM announces customer shipment or formal order availability for Vera Rubin NVL72-class systems before September 30, 2026.

Deadline: By September 30, 2026

Trigger: Dell, HPE, Lenovo, Supermicro, or NVIDIA customer-shipment announcement.

Prediction 03 · 65% confidence · Hardware

Before August 31, 2026, at least one memory supplier or supply-chain analyst reports HBM4 allocation tightness despite three-supplier qualification.

Deadline: By August 31, 2026

Trigger: SK hynix, Samsung, Micron, TrendForce, or Bloomberg/Reuters supply-chain reporting.

Prediction 04 · 65% confidence · Networking

Broadcom, Marvell, or NVIDIA announces a new CPO/1.6T production design win or revenue guide uplift tied to AI networking before August 31, 2026.

Deadline: By August 31, 2026

Trigger: Earnings call, product release, or customer design-win disclosure.

Prediction 05 · 60% confidence · Power

A hyperscaler announces another >500MW power-first AI campus or behind-the-meter generation deal by September 30, 2026.

Deadline: By September 30, 2026

Trigger: Hyperscaler energy/data-center announcement; utility or developer disclosure.

Track record

Scoring prior predictions.

5 prior predictions: 0 hit, 0 miss, 0 partial, 5 pending. Hit rate: n/a (none resolved yet).


Prediction 01 · 65% confidence · Capital

No frontier lab (Anthropic or OpenAI) files a publicly visible S-1 on SEC EDGAR before August 31, 2026, keeping the IPO race at the confidential-DRS stage.

Deadline: By August 31, 2026

Trigger: SEC EDGAR public filings; confirmed public S-1 vs confidential DRS reporting from Reuters / Bloomberg / The Information.

pending

Prediction 02 · 60% confidence · Software

Gemini 3.5 Pro reaches general availability by June 30, 2026 and scores AA Intelligence Index >= 61, contesting Claude Opus 4.8's fresh lead.

Deadline: By June 30, 2026

Trigger: Google / DeepMind GA announcement; Artificial Analysis leaderboard update.

pending

Prediction 03 · 75% confidence · Hardware

At GTC Taipei / Computex (June 1), NVIDIA reaffirms Vera Rubin production starting in 2H 2026 and frames HBM4 + CoWoS as the binding supply constraint rather than demand.

Deadline: By June 7, 2026

Trigger: NVIDIA GTC Taipei keynote; press coverage; investor notes.

pending

Prediction 04 · 65% confidence · Networking

At least two of (Credo, Marvell, Broadcom) cite co-packaged-optics or 1.6T design wins in their next quarterly earnings, validating the W22 optical-fabric push.

Deadline: By August 31, 2026

Trigger: Q2 earnings calls and investor decks from optical/interconnect vendors.

pending

Prediction 05 · 60% confidence · Power

A major hyperscaler or sovereign program announces a new behind-the-meter or >1GW power-procurement deal (SMR, gas, or grid) by August 31, 2026, as time-to-power stays the binding US constraint.

Deadline: By August 31, 2026

Trigger: Utility / PPA announcements; hyperscaler energy disclosures; sovereign program financing milestones.

pending
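For readers who want to reproduce the track-record tally, a minimal sketch follows. The `hit_rate` function is hypothetical, and its scoring rule is an assumption: pending predictions are excluded from the denominator and a partial counts as half a hit. The brief does not state how partials are weighted.

```python
from collections import Counter
from typing import Optional

# Status of the five prior predictions listed above.
statuses = ["pending"] * 5


def hit_rate(statuses: list[str]) -> Optional[float]:
    """Hit rate over resolved predictions; None when nothing has resolved yet."""
    counts = Counter(statuses)
    resolved = counts["hit"] + counts["miss"] + counts["partial"]
    if resolved == 0:
        return None
    # Assumption: a partial counts as half a hit.
    return (counts["hit"] + 0.5 * counts["partial"]) / resolved


current = hit_rate(statuses)
```

With all five predictions still pending, the function returns `None`, which corresponds to the empty hit rate reported above.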

Watchlist

On the radar this week.

4 catalysts to watch, starting Jun 7-30.

Specific catalysts that would change the read materially. Watching these tells us whether the thesis is strengthening or weakening.

  • Jun 7-30

    Gemini 3.5 Pro GA and first independent benchmark pass

    Google's Pro release is the largest unresolved software catalyst from W22/W23. If it ships below Opus 4.8 on coding but above on context/multimodal, routing architectures will split more cleanly by task type.

  • Jun-Aug

    HBM4 allocation and Vera Rubin first customer shipment evidence

    Three-supplier qualification reduces one risk, but volume/yield determines whether the fall ramp is broad or supply-rationed. Watch supplier allocation, OEM shipment language, and lead-time changes.

  • Jun-Aug

    CPO and 1.6T optics revenue conversion

    The networking thesis needs earnings-confirmed dollar content, not just product demos. Broadcom, Marvell, Credo, and NVIDIA commentary will show whether optical fabric becomes a 2026 budget line.

  • Jun-Sep

    Power-first campus replication

    Google/Intersect's model could become the hyperscaler template. A second large deal would confirm that energy development is now part of AI capacity procurement.

Companion reads

The rest of the spine.

The AI Stack Weekly is the cross-stack flywheel read. Pair it with the model-and-tree spine and the working framework to get the full picture.

Edits this issue

  • Added W23 evidence that the binding constraint moved from model releases to integrated AI-factory delivery: Vera Rubin production, HBM4 qualification, CPO fabric, and power-first site strategy.

About this brief

Compiled from public announcements, SEC filings, earnings transcripts, and official lab and vendor publications. Every quantitative claim is graded 1–5 on source quality. Claims graded 2 or below are flagged as noise. The thesis the brief defends is published separately and updated only when a hypothesis materially changes.

Authorship

Written by Brian Letort. Independent analysis. All sources cited are public. Not investment guidance.
