AI League — Game Day 6: Speed Dead Heat at the Top, Meta Checks In, xAI Ships New Code

Claude and GPT are now separated by 0.2 tokens per second. Six days into the season, the intelligence board is locked in — but the speed race just got a lot more interesting.

Game Day 6: Key metrics at a glance

As of June 3, 2026 (Artificial Analysis live data). Each line gives AI Index, then output speed or blended price, then the speed change vs GD5 in t/s.

Claude Opus 4.8 (max): 61 / 62.4 t/s, speed ↑ +2.3 vs GD5

GPT-5.5 (xhigh): 60 / 62.6 t/s, speed ↓ −0.5 vs GD5

Gemini 3.5 Flash: 55 / 163 t/s, speed ↓ −20 vs GD5

Grok 4.3 (high): 53 / 141 t/s

DeepSeek V4 Pro (Max): 52 / $0.18 blended, speed ↑ +3.5 vs GD5


Final score: intelligence standings

No movement at the top for the fourth straight game day. The leaderboard as of June 3, 2026:

Rank	Team	Model	AA Intelligence	Δ
🥇 1	Anthropic	Claude Opus 4.8 (max)	61	↔
🥈 2	OpenAI	GPT-5.5 (xhigh)	60	↔
🥉 3	OpenAI	GPT-5.5 (high)	59	↔
4	Anthropic	Claude Opus 4.7 (max)	57	↔
4	Google	Gemini 3.1 Pro Preview	57	↔
4	OpenAI	GPT-5.5 (medium)	57	↔
4	Alibaba	Qwen3.7 Max	57	↔
8	Google	Gemini 3.5 Flash	55	↔
9	Kimi	Kimi K2.6	54	↔
9	Xiaomi	MiMo-V2.5-Pro	54	↔
11	xAI	Grok 4.3 (high)	53	↔
12	Meta	Muse Spark	52	NEW
12	DeepSeek	DeepSeek V4 Pro (Max)	52	↔

The top bracket has been static since opening night. Claude holds the title at 61, one point ahead of GPT-5.5 (xhigh) at 60. The real story today: Meta's Muse Spark checked in at AI Index 52 — the Llama squad's creative model joining the roster for the first time this season, though performance data remains partial.
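The tied ranks in the table (four teams at 4, then a jump to 8) look like standard competition ranking, where tied scores share a rank and the next rank skips ahead. A minimal Python sketch of that scheme follows, assuming this is how the board is ranked; the function name `competition_rank` is illustrative, and the scores are copied from the table above.

```python
from typing import List, Tuple


def competition_rank(scores: List[Tuple[str, int]]) -> List[Tuple[int, str, int]]:
    """Assign 1-2-2-4 style ranks: ties share a rank, the next rank skips ahead."""
    # sorted() is stable, so tied models keep their input order.
    ordered = sorted(scores, key=lambda item: item[1], reverse=True)
    ranked: List[Tuple[int, str, int]] = []
    for position, (model, score) in enumerate(ordered, start=1):
        if ranked and ranked[-1][2] == score:
            rank = ranked[-1][0]  # same score as the previous row: share its rank
        else:
            rank = position  # new score: rank equals 1-based position
        ranked.append((rank, model, score))
    return ranked


# Scores as listed in the June 3, 2026 standings table.
standings = [
    ("Claude Opus 4.8 (max)", 61),
    ("GPT-5.5 (xhigh)", 60),
    ("GPT-5.5 (high)", 59),
    ("Claude Opus 4.7 (max)", 57),
    ("Gemini 3.1 Pro Preview", 57),
    ("GPT-5.5 (medium)", 57),
    ("Qwen3.7 Max", 57),
    ("Gemini 3.5 Flash", 55),
    ("Kimi K2.6", 54),
    ("MiMo-V2.5-Pro", 54),
    ("Grok 4.3 (high)", 53),
    ("Muse Spark", 52),
    ("DeepSeek V4 Pro (Max)", 52),
]

ranks = [rank for rank, _, _ in competition_rank(standings)]
for rank, model, score in competition_rank(standings):
    print(f"{rank:>2}  {model:<24} {score}")
```

Run against the table's scores, this reproduces the rank column as published, including the skip from 4 to 8.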

Speed panel: the dead heat

This is the number that changed today. After five game days in which Claude trailed GPT-5.5 on raw throughput, the two are now nearly level:

Team	Model	Speed (t/s)	Δ vs GD5 (t/s)
Anthropic	Claude Opus 4.8 (max)	62.4	↑ +2.3
OpenAI	GPT-5.5 (xhigh)	62.6	↓ −0.5
OpenAI	GPT-5.5 (high)	59.0	↔
Google	Gemini 3.1 Pro Preview	134	↓ −0.7
Google	Gemini 3.5 Flash	163	↓ −20
xAI	Grok 4.3 (high)	141	↓ −4.2
DeepSeek	V4 Pro (Max)	51	↑ +3.5
Kimi	K2.6	44	↔

Claude crept up to 62.4 t/s while GPT-5.5 slipped to 62.6 t/s, a margin so thin it's within normal API variance. Call it a draw. Time to first token is where the two actually diverge (Claude: 9.35 s; GPT-5.5: 58.37 s), meaning the real bottleneck for GPT's xhigh variant is getting the first word out, not the pace after that.
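To see why the wait for the first token outweighs the throughput gap, here is a rough Python sketch that estimates total response time as TTFT plus generation time. The 500-token response length is a hypothetical value chosen for illustration, and the simple additive model is an assumption, not a measurement method taken from Artificial Analysis.

```python
def response_time_s(ttft_s: float, speed_tps: float, output_tokens: int) -> float:
    """Estimate wall-clock time: wait for the first token, then stream the rest."""
    return ttft_s + output_tokens / speed_tps


OUTPUT_TOKENS = 500  # hypothetical response length, not from the source data

# TTFT and speed figures as quoted for Game Day 6.
claude = response_time_s(ttft_s=9.35, speed_tps=62.4, output_tokens=OUTPUT_TOKENS)
gpt = response_time_s(ttft_s=58.37, speed_tps=62.6, output_tokens=OUTPUT_TOKENS)

print(f"Claude Opus 4.8 (max): {claude:.1f} s")
print(f"GPT-5.5 (xhigh):       {gpt:.1f} s")
print(f"Gap from TTFT alone:   {58.37 - 9.35:.1f} s")
```

Under these assumptions the two generation phases differ by a small fraction of a second, while the TTFT difference is about 49 seconds.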

Gemini 3.5 Flash's speed reading dropped from 183.2 t/s (GD5) to 163 t/s today. That 11% dip is likely measurement variance rather than a sustained regression: Flash has oscillated between 163 and 207 t/s over the season. Google's squad still posts the fastest reading on today's speed panel.
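The absolute and relative versions of that drop are easy to mix up, so here is a short Python sketch that computes both from the two readings quoted above. The helper name `pct_change` is illustrative.

```python
def pct_change(previous: float, current: float) -> float:
    """Relative change from previous to current, in percent."""
    return (current - previous) / previous * 100.0


flash_gd5 = 183.2  # t/s, Game Day 5 reading
flash_gd6 = 163.0  # t/s, Game Day 6 reading

abs_delta = flash_gd6 - flash_gd5            # change in t/s
rel_delta = pct_change(flash_gd5, flash_gd6)  # change in percent

print(f"{abs_delta:+.1f} t/s, {rel_delta:+.1f}%")  # -20.2 t/s, -11.0%
```

The roughly 20-point drop is in tokens per second; as a percentage of the GD5 reading it is about 11%.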


Pricing war: nothing moved

All six franchise price tags are exactly where they were at season opening. No discounts, no hikes:

Team	Flagship	Input $/M	Output $/M	Blended $/M
Anthropic	Claude Opus 4.8	$6.25	$25.00	$4.10
OpenAI	GPT-5.5 (xhigh)	$5.00	$30.00	$4.35
Google	Gemini 3.1 Pro Preview	$2.00	$12.00	$1.74
Google	Gemini 3.5 Flash	$1.50	$9.00	$1.31
xAI	Grok 4.3 (high)	$1.25	$2.50	$0.64
DeepSeek	V4 Pro (Max)	$0.435	$0.87	$0.18


The value-per-intelligence-point spread remains wide. DeepSeek's permanent $0.18 blended rate stays the sharpest budget play: 52 points of AI Index for less than a fiftieth of a cent per thousand tokens. Grok 4.3 at $0.64 delivers 53 points. Gemini Pro at $1.74 hits 57. The gap between "flagship intelligence" and "budget intelligence" is now priced at roughly a 23× premium (Claude Opus 4.8 at $4.10 blended vs DeepSeek at $0.18).
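For readers who want to check the value math, the Python sketch below computes blended dollars per AI Index point and the flagship-to-budget premium. Scores and blended prices are copied from the tables above; dividing price by score is just one simple way to define "value per point", and the helper names are illustrative.

```python
# model -> (AI Index, blended $ per million tokens), from the tables above
flagships = {
    "Claude Opus 4.8 (max)": (61, 4.10),
    "GPT-5.5 (xhigh)": (60, 4.35),
    "Gemini 3.1 Pro Preview": (57, 1.74),
    "Gemini 3.5 Flash": (55, 1.31),
    "Grok 4.3 (high)": (53, 0.64),
    "DeepSeek V4 Pro (Max)": (52, 0.18),
}


def dollars_per_point(index: int, blended_per_million: float) -> float:
    """Blended $/M tokens divided by AI Index points (lower is better value)."""
    return blended_per_million / index


def cents_per_thousand_tokens(blended_per_million: float) -> float:
    """Convert $ per million tokens into cents per thousand tokens."""
    return blended_per_million / 1000 * 100


# Print models from best to worst value under this definition.
for model, (index, price) in sorted(
    flagships.items(), key=lambda kv: dollars_per_point(*kv[1])
):
    print(f"{model:<24} ${dollars_per_point(index, price):.4f} per point")

cheapest = min(flagships, key=lambda m: dollars_per_point(*flagships[m]))
premium = flagships["Claude Opus 4.8 (max)"][1] / flagships["DeepSeek V4 Pro (Max)"][1]
print(f"Best value: {cheapest}")
print(f"Flagship premium over DeepSeek: {premium:.1f}x")
print(f"DeepSeek: {cents_per_thousand_tokens(0.18):.3f} cents per 1k tokens")
```

On these figures the Claude-to-DeepSeek blended price ratio comes out just under 23, and DeepSeek's rate works out to about 0.018 cents per thousand tokens.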


Challenger watch

Qwen3.7 Max (Alibaba, AI Index 57, 183 t/s, $1.43 blended) is running one of the better value-per-point plays among the models tied for fourth. At $1.43 blended, it matches Gemini 3.1 Pro Preview's intelligence score for 18% less per token. The squad has been quietly building: Qwen3.6 Plus sits at 53 points for just $0.25 blended.

MiMo-V2.5-Pro (Xiaomi, AI Index 54, $0.18 blended) continues to shadow DeepSeek across both performance and pricing, now holding the same 54 score as Kimi K2.6 while costing roughly the same per token as DeepSeek.

xAI's bench deepened this week. The franchise shipped Composer 2.5 on June 1 and Grok Build 0.1, a dedicated agentic coding model priced at $1.00 input / $2.00 output. Grok Build 0.1 lands at 256k context and targets software engineering pipelines directly. That leaves xAI fielding three distinct depth-chart roles: Grok 4.3 (flagship intelligence), Grok Build 0.1 (coding specialist), and the Imagine API (creative media). Roster depth matters in a league where use-case specialization is becoming the differentiator.

Game notes

A few things worth flagging at the 72-hour data update window:

The Claude vs. GPT speed convergence at ~62.4–62.6 t/s is the first time in this season's tracking that Anthropic's flagship has pulled even on throughput. Whether that holds through next week or reverts to GPT's prior +5 t/s edge is worth watching. Both teams are well below the 100 t/s threshold where "speed" stops mattering for most interactive use cases, but a 0.2 t/s gap is more bragging rights than functional.

Llama 4 Scout (10M context, AI Index 14) remains the unique value proposition on Meta's open-source bench: none of the challengers come close to that context window. Muse Spark's 262k context is competent but not unusual.

The no-news-is-news story remains pricing. Six days, zero API price moves across all six core franchises. The DeepSeek permanent-promo floor at $0.18 appears to have anchored the low end — nobody has tried to undercut it.

Stats sourced from Artificial Analysis live benchmarks (72-hour rolling measurements, 8× daily). All prices from official API documentation. Speed figures represent first-party API performance. #AILeague