AI League — Game Day 6: Speed Dead Heat at the Top, Meta Checks In, xAI Ships New Code

Claude & GPT-5.5 hit a 0.2 t/s dead heat at 62.4 vs 62.6 t/s. Meta's Muse Spark checks in at AI Index 52. xAI ships Grok Build 0.1 coding model. Intelligence board still locked: Claude #1 (61). Full June 3 stats. #AILeague

AIL·Stats Board
2026/6/3 · 8:11
Claude and GPT are now separated by 0.2 tokens per second. Six days into the season, the intelligence board is locked in — but the speed race just got a lot more interesting.

Final score: intelligence standings

No movement at the top for the fourth straight game day. The leaderboard as of June 3, 2026:
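| Rank | Team | Model | AA Intelligence | Δ |
|---|---|---|---|---|
| 🥇 1 | Anthropic | Claude Opus 4.8 (max) | 61 | |
| 🥈 2 | OpenAI | GPT-5.5 (xhigh) | 60 | |
| 🥉 3 | OpenAI | GPT-5.5 (high) | 59 | |
| 4 | Anthropic | Claude Opus 4.7 (max) | 57 | |
| 4 | Google | Gemini 3.1 Pro Preview | 57 | |
| 4 | OpenAI | GPT-5.5 (medium) | 57 | |
| 4 | Alibaba | Qwen3.7 Max | 57 | |
| 8 | Google | Gemini 3.5 Flash | 55 | |
| 9 | Kimi | Kimi K2.6 | 54 | |
| 9 | Xiaomi | MiMo-V2.5-Pro | 54 | |
| 11 | xAI | Grok 4.3 (high) | 53 | |
| 12 | Meta | Muse Spark | 52 | NEW |
| 12 | DeepSeek | DeepSeek V4 Pro (Max) | 52 | |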
The top bracket has been static since opening night. Claude holds the title at 61, one point ahead of GPT-5.5 (xhigh) at 60. The real story today: Meta's Muse Spark checked in at AI Index 52 — the Llama squad's creative model joining the roster for the first time this season, though performance data remains partial.

Speed panel: the dead heat

This is the number that changed today. After five game days where Claude trailed GPT-5.5 on raw throughput:
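| Team | Model | Speed (t/s) | Δ vs GD5 |
|---|---|---|---|
| Anthropic | Claude Opus 4.8 (max) | 62.4 | ↑ +2.3 |
| OpenAI | GPT-5.5 (xhigh) | 62.6 | ↓ −0.5 |
| OpenAI | GPT-5.5 (high) | 59.0 | |
| Google | Gemini 3.1 Pro Preview | 134 | ↓ −0.7 |
| Google | Gemini 3.5 Flash | 163 | ↓ −20 |
| xAI | Grok 4.3 (high) | 141 | ↓ −4.2 |
| DeepSeek | V4 Pro (Max) | 51 | ↑ +3.5 |
| Kimi | K2.6 | 44 | |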
Claude crept up to 62.4 t/s while GPT-5.5 slipped to 62.6 t/s, a margin so thin it's within normal API variance. Call it a draw. Throughput only measures the pace once output starts, though, and the two are far apart on time to first token (TTFT): 9.35 s for Claude versus 58.37 s for GPT-5.5. The real bottleneck for GPT's xhigh variant is getting the first word out, not the pace after that.
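A rough way to see how TTFT and throughput combine: total response time is approximately the wait for the first token plus the output length divided by the streaming speed. The sketch below uses the TTFT and speed figures from today's panel. The 500-token response length and the function name are assumptions made for illustration, not measured values.

```python
def response_time_s(ttft_s: float, speed_tps: float, output_tokens: int) -> float:
    """Approximate wall-clock time: wait for the first token, then stream the rest."""
    return ttft_s + output_tokens / speed_tps


# TTFT and speed come from the June 3 panel; the response length is an assumption.
OUTPUT_TOKENS = 500

claude = response_time_s(ttft_s=9.35, speed_tps=62.4, output_tokens=OUTPUT_TOKENS)
gpt = response_time_s(ttft_s=58.37, speed_tps=62.6, output_tokens=OUTPUT_TOKENS)

print(f"Claude Opus 4.8 (max): {claude:.1f} s")
print(f"GPT-5.5 (xhigh):       {gpt:.1f} s")
```

Under that assumed length, the streaming phase takes about 8 seconds for either model, so nearly all of the difference in total time comes from TTFT.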
Gemini 3.5 Flash's speed reading dropped from 183.2 t/s (GD5) to 163 t/s today. That 11% dip is likely measurement variance rather than a sustained regression — Flash has oscillated between 163 and 207 over the season. Google's squad still holds the clear speed crown in the mid-tier bracket.
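The dip figure can be checked with a one-line relative-change calculation. This is a minimal sketch using the readings quoted above; the helper name is hypothetical, and the range check only tests whether today's reading sits inside the season's observed 163 to 207 t/s band, which is not a statistical test of variance.

```python
def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent (negative means a drop)."""
    return (new - old) / old * 100.0


gd5_speed, gd6_speed = 183.2, 163.0      # Gemini 3.5 Flash, t/s
season_low, season_high = 163.0, 207.0   # observed range quoted in the text

dip = pct_change(gd5_speed, gd6_speed)
within_season_range = season_low <= gd6_speed <= season_high

print(f"Change vs GD5: {dip:.1f}%")
print(f"Inside season range: {within_season_range}")
```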

Pricing war: nothing moved

All six franchise price tags are exactly where they were at season opening. No discounts, no hikes:
| Team | Flagship | Input $/M | Output $/M | Blended $/M |
|---|---|---|---|---|
| Anthropic | Claude Opus 4.8 | $6.25 | $25.00 | $4.10 |
| OpenAI | GPT-5.5 (xhigh) | $5.00 | $30.00 | $4.35 |
| Google | Gemini 3.1 Pro Preview | $2.00 | $12.00 | $1.74 |
| Google | Gemini 3.5 Flash | $1.50 | $9.00 | $1.31 |
| xAI | Grok 4.3 (high) | $1.25 | $2.50 | $0.64 |
| DeepSeek | V4 Pro (Max) | $0.435 | $0.87 | $0.18 |
The value-per-intelligence-point spread keeps widening. DeepSeek's permanent $0.18 blended rate stays the sharpest budget play: 52 points of AI Index for less than a fifth of a dollar per million tokens. Grok 4.3 at $0.64 delivers 53 points. Gemini Pro at $1.74 hits 57. The gap between "flagship intelligence" and "budget intelligence" is now priced at roughly a 23× premium (Claude's $4.10 blended against DeepSeek's $0.18).
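The value comparison can be reproduced from the published figures. The sketch below pairs each flagship's blended price from the pricing table with its AI Index score from the standings table, then computes dollars per index point and the flagship-to-budget premium. The dictionary layout and function name are illustrative assumptions, and "dollars per point" is a simple ratio rather than an official Artificial Analysis metric.

```python
# (blended $ per million tokens, AI Index) from the pricing and standings tables.
flagships = {
    "Claude Opus 4.8": (4.10, 61),
    "GPT-5.5 (xhigh)": (4.35, 60),
    "Gemini 3.1 Pro Preview": (1.74, 57),
    "Gemini 3.5 Flash": (1.31, 55),
    "Grok 4.3 (high)": (0.64, 53),
    "DeepSeek V4 Pro (Max)": (0.18, 52),
}


def dollars_per_point(blended: float, index: int) -> float:
    """Blended price per million tokens divided by AI Index points."""
    return blended / index


per_point = {name: dollars_per_point(*row) for name, row in flagships.items()}
cheapest = min(per_point, key=per_point.get)

# Premium of the top-ranked flagship over the cheapest blended rate.
premium = flagships["Claude Opus 4.8"][0] / flagships["DeepSeek V4 Pro (Max)"][0]

for name, value in sorted(per_point.items(), key=lambda item: item[1]):
    print(f"{name:<24} ${value:.4f} per index point")
print(f"Flagship premium over budget: {premium:.1f}x")
```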

Challenger watch

Qwen3.7 Max (Alibaba, AI Index 57, 183 t/s, $1.43 blended) is running the best value-per-point play in the top 10. At $1.43 blended, it matches Gemini 3.1 Pro Preview's intelligence score for 18% less per token. The squad has been quietly building — Qwen3.6 Plus sits at 53 points for just $0.25 blended.
MiMo-V2.5-Pro (Xiaomi, AI Index 54, $0.18 blended) continues to shadow DeepSeek across both performance and pricing, now holding the same 54 score as Kimi K2.6 while costing roughly the same per token as DeepSeek.
xAI's bench deepened this week. The franchise shipped Composer 2.5 on June 1, followed by Grok Build 0.1, a dedicated agentic coding model priced at $1.00 input / $2.00 output per million tokens. Grok Build 0.1 lands at 256k context and targets software engineering pipelines directly. That gives xAI's squad three distinct depth-chart roles: Grok 4.3 (flagship intelligence), Grok Build 0.1 (coding specialist), and the Imagine API (creative media). Roster depth matters in a league where use-case specialization is becoming the differentiator.

Game notes

A few things worth flagging at the 72-hour data update window:
The Claude vs. GPT speed convergence at ~62.4–62.6 t/s is the first time in this season's tracking that Anthropic's flagship has pulled even on throughput. Whether that holds through next week or reverts to GPT's prior +5 t/s edge is worth watching. Both teams are well below the 100 t/s threshold where "speed" stops mattering for most interactive use cases, so the gap is more bragging rights than functional.
Llama 4 Scout (10M context, AI Index 14) remains the one unique value proposition on Meta's open-source bench: none of the challengers come close to that context window. Muse Spark's 262k context is competent but not unusual.
The no-news-is-news story remains pricing. Six days, zero API price moves across all six core franchises. The DeepSeek permanent-promo floor at $0.18 appears to have anchored the low end — nobody has tried to undercut it.

Stats sourced from Artificial Analysis live benchmarks (72-hour rolling measurements, 8× daily). All prices from official API documentation. Speed figures represent first-party API performance. #AILeague
