AI League — Season Opening Night: The Official Stats Panel, Week 1

Claude Opus 4.8 tops the board (AI Index: 61). DeepSeek V4 Pro cuts output price 75% to $0.87/M. Gemini 3.5 Flash hits 207 t/s. Full post-game stats panel. #AILeague

AIL·Stats Board
2026/5/29 · 11:27
Presented by AI League Daily. Data sourced from Artificial Analysis, official API pricing pages, and OpenRouter usage rankings.

Claude Opus 4.8 takes the #1 slot on the Artificial Analysis Intelligence Index. GPT-5.5 drops to second. Gemini 3.5 Flash sets a season-high speed record at 207 t/s. And DeepSeek just slashed V4 Pro output pricing by 75% — the biggest in-season price cut any team has posted this year.

🏆 Overall standings — Intelligence Index (Artificial Analysis, live)

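| Rank | Team | Model | AI Index | Δ | Blended price ($/1M tokens) |
|------|------|-------|----------|---|-----------------------------|
| 🥇 1 | Anthropic (Claude) | Claude Opus 4.8 (Reasoning Max) | 61 | ↑ NEW | $4.10 |
| 🥈 2 | OpenAI (GPT) | GPT-5.5 (xhigh) | 60 | | $4.35 |
| 🥉 3 | OpenAI (GPT) | GPT-5.5 (high) | 59 | | $4.35 |
| 4 | Anthropic (Claude) | Claude Opus 4.7 (Reasoning Max) | 57 | ↓ −1 | $4.10 |
| 5 | Google (Gemini) | Gemini 3.1 Pro Preview | 57 | ↑ NEW | $1.74 |
| 13 | xAI (Grok) | Grok 4.3 (high) | 53 | | $0.64 |
| 17 | DeepSeek | DeepSeek V4 Pro (Reasoning Max) | 52 | | $0.18 |
| 18 | Meta (Llama) | Muse Spark | 52 | ↑ NEW | |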
Artificial Analysis Intelligence Index (0–100 composite score, live 72-hr rolling window) [1]
[Figure: Live Intelligence Index rankings from Artificial Analysis, updated every 72 hours across 100+ models] [1]

🔥 Game of the night: Claude vs. GPT in the scoring race

Claude Opus 4.8 posts an Intelligence Index of 61 to GPT-5.5's 60. That one-point margin is statistically tight on Artificial Analysis's composite methodology, but Claude has now held the top position across successive measurements. GPT-5.5 remains the higher-volume franchise: it fields three reasoning-effort tiers (xhigh, high, medium), all priced at a flat $4.35/1M blend. Claude Opus 4.8 comes in at $4.10/1M, roughly a 6% pricing edge for Anthropic at near-equal performance.
Meanwhile, the legacy GPT-4o line (not on the current Artificial Analysis front page) has been effectively benched in favor of the GPT-5 series. OpenAI's pricing page no longer lists standalone GPT-4o rates: the franchise is running on a GPT-5.4 / GPT-5.5 two-tier rotation. [2][3]

⚡ Speed panel — tokens per second (Artificial Analysis live)

This is where the scoreboard gets interesting: the highest-scoring models are rarely the fastest.
| Team | Model | Output speed (t/s) | TTFT (s) | Speed tier |
|------|-------|--------------------|----------|------------|
| Google (Gemini) | Gemini 3.5 Flash | 207 | 16.2 | 🚀 Season-high |
| Google (Gemini) | Gemini 3.1 Flash-Lite | 268 | 5.7 | 🚀 Fastest Gemini |
| Anthropic (Claude) | Claude Sonnet 4.6 (Reasoning Max) | 120 | 110.7 | ⚡ Fast-reasoning |
| Anthropic (Claude) | Claude Haiku 4.5 | 97 | 16.9 | ⚡ Budget burner |
| xAI (Grok) | Grok 4.3 (high) | 182 | 12.5 | 🏃 Underrated pace |
| OpenAI (GPT) | GPT-5.5 (xhigh) | 40.7 | 49.1 | 🐢 Deliberate |
| DeepSeek | DeepSeek V4 Pro (Reasoning Max) | 50 | 1.8 | ⚡ Low TTFT |
Season stat: Gemini 3.5 Flash's 207 t/s is the fastest production speed among models in the AI Index top-10 score bracket (55+). Gemini 3.5 Flash itself sits outside the intelligence top 5, but within that bracket Google is running the fastest offense of any team.
Grok's quiet hustle: Grok 4.3 (high) at 182 t/s with a 12.5s TTFT punches above its Intelligence Index score of 53 (rank 13). If xAI can close the scoring gap, the speed advantage becomes a real weapon.
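Output speed and TTFT pull in different directions for several teams, so it can help to fold them into one number. The sketch below uses a common rough approximation, total time ≈ TTFT + output tokens / output speed, with the figures from the speed panel. The 500-token response length and the function names are assumptions made for illustration, not part of the published stats.

```python
# Rough end-to-end latency estimate: TTFT + output_tokens / output_speed.
# (speed in tokens/second, TTFT in seconds) copied from the speed panel.
# The 500-token response length is an illustrative assumption.
SPEED_PANEL = {
    "Gemini 3.5 Flash": (207.0, 16.2),
    "Gemini 3.1 Flash-Lite": (268.0, 5.7),
    "Claude Sonnet 4.6 (Reasoning Max)": (120.0, 110.7),
    "Claude Haiku 4.5": (97.0, 16.9),
    "Grok 4.3 (high)": (182.0, 12.5),
    "GPT-5.5 (xhigh)": (40.7, 49.1),
    "DeepSeek V4 Pro (Reasoning Max)": (50.0, 1.8),
}


def estimated_seconds(tokens_per_second, ttft_seconds, output_tokens=500):
    """Approximate wall-clock time to stream a full response."""
    return ttft_seconds + output_tokens / tokens_per_second


def rank_by_latency(panel, output_tokens=500):
    """Return (model, estimated seconds) pairs, fastest first."""
    rows = [
        (name, estimated_seconds(speed, ttft, output_tokens))
        for name, (speed, ttft) in panel.items()
    ]
    return sorted(rows, key=lambda row: row[1])


if __name__ == "__main__":
    for name, seconds in rank_by_latency(SPEED_PANEL):
        print(f"{name:36s} {seconds:6.1f} s")
```

Under this assumption, a low TTFT can outweigh a modest output speed on short responses, which is why the ordering differs from the raw t/s column. Longer responses shift the balance back toward output speed.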
[Figure: Speed and throughput metrics, multi-dimension performance dashboard] [1]

💰 Pricing war analysis — the cost-per-point breakdown

The real competition is in the value tier — Intelligence Index points per dollar of API cost.
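| Team | Flagship model | Input $/1M | Output $/1M | AI Index | Index points per $1 of output |
|------|----------------|------------|-------------|----------|-------------------------------|
| DeepSeek | V4 Pro (75% promo) | $0.435 | $0.87 | 52 | 59.8 |
| Google | Gemini 3.5 Flash | $0.30* | $2.50* | 55 | 22.0 |
| xAI | Grok 4.3 | $1.25 | $2.50 | 53 | 21.2 |
| Google | Gemini 2.5 Pro | $1.25 | $10.00 | 35 | 3.5 |
| Anthropic | Claude Sonnet 4.6 | $3.00 | $15.00 | 52 | 3.5 |
| OpenAI | GPT-5.5 | $5.00 | $30.00 | 60 | 2.0 |
| Anthropic | Claude Opus 4.8 | $5.00 | $25.00 | 61 | 2.4 |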
Gemini 2.5 Pro/Flash pricing from Google AI Developer pricing page; DeepSeek pricing per official API docs (75% discount valid through May 31, 2026 UTC); Grok from xAI API page. [2][3][4][5][6]
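The last column of the table is simply the AI Index divided by the output price per 1M tokens. A minimal sketch of that calculation, using the table's own figures (the variable and function names are made up for illustration):

```python
# Recompute "Index points per $1 of output" from the pricing table.
# Each entry: (model, output price in $ per 1M tokens, AI Index).
PRICING = [
    ("DeepSeek V4 Pro (75% promo)", 0.87, 52),
    ("Gemini 3.5 Flash", 2.50, 55),
    ("Grok 4.3", 2.50, 53),
    ("Gemini 2.5 Pro", 10.00, 35),
    ("Claude Sonnet 4.6", 15.00, 52),
    ("GPT-5.5", 30.00, 60),
    ("Claude Opus 4.8", 25.00, 61),
]


def points_per_dollar(ai_index, output_price):
    """AI Index points bought by $1 of output (per 1M tokens)."""
    return ai_index / output_price


def value_table(rows):
    """Return (model, points per dollar rounded to 1 decimal), best value first."""
    scored = [
        (model, round(points_per_dollar(index, price), 1))
        for model, price, index in rows
    ]
    return sorted(scored, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for model, value in value_table(PRICING):
        print(f"{model:30s} {value:5.1f}")
```

The metric ignores input pricing and speed, so it is best read as a quick screen for output-heavy workloads rather than a full cost model.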

DeepSeek's fire sale

DeepSeek's V4 Pro is currently on a 75% promotional discount — output at $0.87/1M versus the post-promotion rate of $3.48/1M. With an Intelligence Index of 52, it delivers roughly 60 AI-index points per dollar of output, compared to GPT-5.5's 2.0. This promo expires May 31, 2026. If DeepSeek holds that line post-discount, it becomes a real threat to the mid-market. If it reverts to full price, the equation changes entirely.
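To see how much the equation changes at full price, the promo arithmetic can be run in reverse. This sketch assumes the 75% discount applies directly to the output rate, which matches the $0.87 and $3.48 figures quoted here; the function names are illustrative only.

```python
# Back out the full output price from the promotional price and compare value.
PROMO_OUTPUT_PRICE = 0.87   # $ per 1M output tokens during the promotion
DISCOUNT = 0.75             # 75% off
AI_INDEX = 52               # DeepSeek V4 Pro Intelligence Index


def full_price(promo_price, discount):
    """Undo a fractional discount: promo = full * (1 - discount)."""
    return promo_price / (1.0 - discount)


def points_per_dollar(ai_index, output_price):
    """AI Index points bought by $1 of output (per 1M tokens)."""
    return ai_index / output_price


post_promo_price = full_price(PROMO_OUTPUT_PRICE, DISCOUNT)
promo_value = points_per_dollar(AI_INDEX, PROMO_OUTPUT_PRICE)
post_promo_value = points_per_dollar(AI_INDEX, post_promo_price)

if __name__ == "__main__":
    print(f"Full output price:    ${post_promo_price:.2f}/1M")
    print(f"Value during promo:   {promo_value:.1f} points per $1")
    print(f"Value at full price:  {post_promo_value:.1f} points per $1")
```

At the full rate, DeepSeek V4 Pro would land at roughly 15 points per dollar, below the 22.0 and 21.2 listed for Gemini 3.5 Flash and Grok 4.3 in the pricing table but still well ahead of the premium tier.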
The macro pattern: Google and DeepSeek are playing a volume game. Gemini 3.5 Flash and Gemini 2.5 Flash, both at $0.30/$2.50 in/out, are priced below every other input rate in the table above, while DeepSeek's promotional $0.87 rate is the lowest output price. Google's free tier (via Google AI Studio) makes it the only named franchise offering zero-cost access to a competitive model.
[Figure: API pricing race. Value-per-token is the new speed stat; DeepSeek's 75% promo ends May 31] [4]

📊 Context window depth chart

| Team | Model | Context window | Field advantage |
|------|-------|----------------|-----------------|
| Meta (Llama) | Llama 4 Scout | 10M tokens | Widest context in the league |
| xAI (Grok) | Grok 4.20 / 4.1 Fast | 1M tokens | Tied second-widest |
| Google (Gemini) | Gemini 1.5 Pro (May) | 1M tokens | Tied second-widest |
| Anthropic (Claude) | Claude Opus 4.8 | 1M tokens | Tied second-widest |
| OpenAI (GPT) | GPT-5.5 | 922k tokens | About 7x the prior 128k window |
| DeepSeek | DeepSeek V4 Pro/Flash | 1M tokens | Tied second-widest (full million) |
Meta's Llama 4 Scout at 10 million tokens is the long-haul freight runner of this league: its window is ten times the next-largest listed here. That's a specialized stat, but for enterprises running massive document pipelines it can be a decisive edge. [1]

🆕 Challenger watch: who is making noise this cycle

  • Kimi K2.6 (Moonshot AI) — Intelligence Index 54, ranking it above DeepSeek V4 Pro and all named Grok variants. Priced at $0.70/1M blend. An unseeded challenger crashing the main draw.
  • Mistral Medium 3.5 — AI Index 39, 152 t/s, priced at $2.10/1M blend. Solid mid-tier performance, especially for EU-regulated deployments.
  • Cohere Command A+: AI Index 37, 223 t/s output speed, among the highest raw token throughput figures on the leaderboard (the speed panel above lists Gemini 3.1 Flash-Lite higher at 268 t/s). Zero-cost pricing (free tier). Useful for latency-critical applications, but the intelligence gap versus tier-1 models is wide.

🏁 Postgame summary

Claude Opus 4.8 holds the scoring record. GPT-5.5 stays in contention on depth and ecosystem. Google's Gemini family is running the fastest offense in the 55+ score bracket. DeepSeek's discount is the price-war story of the cycle: the $0.87/1M output rate is scheduled to end May 31. Meta's Llama 4 Scout has a context advantage no other franchise can touch. Grok is faster than its ranking suggests.
Who should be watching? In the budget tier, every team other than Google and DeepSeek is currently competing on value at a significant per-token disadvantage.

Data as of May 29, 2026 — live metrics from Artificial Analysis (72-hr rolling), official pricing from each team's API documentation pages. Speed metrics reflect streaming output tokens per second under standard single-request conditions.
#AILeague
