AI League — Game Day 11: Grok Clears 207 t/s, Gemini Flash Gives Chase in Speed Dead Heat

Grok 4.3 clears 207 t/s — new season-high — as Gemini 3.5 Flash matches pace within 0.8 t/s, turning the speed crown into a two-horse race. GPT-5.5 bounces back 3.3 t/s after back-to-back declines. Intelligence board stays frozen for an 11th consecutive day. Full June 8 stats. #AILeague

Intelligence board

The scoreboard hasn't shifted since Season Opening Night. Claude Opus 4.8 sits at 61 pts for an 11th straight day, GPT-5.5 at 60, with no new model reaching the 60-tier. The frozen board is less a sign of stability than a sign of how hard it is to break through: reaching 61 means outperforming 395 other benchmarked models across 10 independent evaluations.

Rank	Team	Model	AI Index	Δ Day
1	Anthropic	Claude Opus 4.8 (Max)	61	↔
2	OpenAI	GPT-5.5 (xhigh)	60	↔
3	Google	Gemini 3.1 Pro Preview	57	↔
4	Kimi	Kimi K2.6	54	↔
5	xAI	Grok 4.3 (high)	53	↔
6	DeepSeek	DeepSeek V4 Pro (Max)	52	↔

Kimi K2.6 remains the highest-ranked open-weights model, above every Grok variant and above DeepSeek on the intelligence board, even though it sits only fourth in this core-six snapshot.

Speed panel

This is where today gets interesting.

Speed panel: June 8

Output tokens per second, first-party API. Source: Artificial Analysis

Model	Speed (t/s)	Δ Day
Grok 4.3 (high)	207.6	+9.9 t/s (+5.0%)
Gemini 3.5 Flash (high)	206.8	↔
Gemini 3.1 Pro Preview	142.6	↔
Claude Opus 4.8 (Max)	66.8	↔
GPT-5.5 (xhigh)	65.0	+3.3 t/s (+5.3%)
DeepSeek V4 Pro (Max)	59.7	↔

[Chart: Grok 4.3's 11-day speed run, from Season Opening Night (145.2 t/s) to today (207.6 t/s)]

Grok 4.3 hit 207.6 t/s — its fifth consecutive record high and a 5% single-day gain over Day 10's 197.7 t/s. Since Season Opening Night, xAI has added 62.4 t/s to its output speed, a +43% run in 11 days. 2
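The gains quoted above are plain absolute and percentage changes. A minimal Python sketch, using only the three Grok speeds reported in this post (the helper name `pct_change` is illustrative, not part of any Artificial Analysis tooling):

```python
def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new."""
    return (new - old) / old * 100.0

# Grok 4.3 output speeds in t/s, as reported in this post.
opening_night = 145.2
day_10 = 197.7
day_11 = 207.6

daily_gain = day_11 - day_10          # absolute day-over-day gain, t/s
season_gain = day_11 - opening_night  # absolute gain since opening night, t/s

print(f"Day-over-day: +{daily_gain:.1f} t/s ({pct_change(day_10, day_11):.1f}%)")
print(f"Season to date: +{season_gain:.1f} t/s ({pct_change(opening_night, day_11):.1f}%)")
```

Keeping the absolute gain (t/s) and the relative gain (%) as separate values helps avoid mixing up the two units when labeling a stat card.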

The story isn't just Grok's run, though. Gemini 3.5 Flash checked in at 206.8 t/s, within 0.8 t/s of Grok and essentially tied. Google is running a two-pronged speed strategy: Flash at 206.8 t/s for throughput-sensitive workloads, and Pro Preview at 142.6 t/s for the accuracy-conscious tier. Together, they give Google the deepest speed bench in the league. 3 4

GPT-5.5 bounced from 61.7 to 65.0 t/s — recouping about half of its two-day slide. Not a breakout, but at least the bleeding stopped. 5

Claude Opus 4.8 held at 66.8 t/s, barely ahead of GPT-5.5 and roughly on pace with where it opened the season. 6

Pricing war breakdown

No price moves today. The gap between the top-intelligence tier and the value tier remains wide, and that's by design.

Team	Model	Input	Output	Blended
Anthropic	Claude Opus 4.8 (Max)	$6.25	$25.00	$4.10
OpenAI	GPT-5.5 (xhigh)	$5.00	$30.00	$4.35
Google	Gemini 3.1 Pro Preview	$2.00	$12.00	$1.74
Google	Gemini 3.5 Flash (high)	$1.50	$9.00	$1.31
Kimi	Kimi K2.6	$0.95	$4.00	$0.70
xAI	Grok 4.3 (high)	$1.25	$2.50	$0.64
DeepSeek	DeepSeek V4 Pro (Max)	$0.435	$0.87	$0.18
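The Blended column follows the 7:2:1 cache-hit/input/output ratio given in the sourcing note at the end of this post. Cache-hit prices are not listed in the table, so the Python sketch below back-solves the cache-hit price each blended figure would imply. It assumes the ratio normalizes to weights of 0.7, 0.2 and 0.1; the function names are illustrative, and because the blended figures are rounded to cents, the implied prices are approximations rather than sourced values.

```python
# Assumed weights: the 7:2:1 cache-hit/input/output ratio, normalized to sum to 1.
W_CACHE, W_INPUT, W_OUTPUT = 0.7, 0.2, 0.1

def blended_price(cache_hit: float, inp: float, out: float) -> float:
    """Weighted USD price per 1M tokens."""
    return W_CACHE * cache_hit + W_INPUT * inp + W_OUTPUT * out

def implied_cache_hit(inp: float, out: float, blended: float) -> float:
    """Back-solve the cache-hit price that a blended figure implies."""
    return (blended - W_INPUT * inp - W_OUTPUT * out) / W_CACHE

# (input, output, blended) rows copied from the pricing table.
rows = {
    "Claude Opus 4.8 (Max)": (6.25, 25.00, 4.10),
    "GPT-5.5 (xhigh)": (5.00, 30.00, 4.35),
    "Grok 4.3 (high)": (1.25, 2.50, 0.64),
}

for model, (inp, out, blended) in rows.items():
    cache = implied_cache_hit(inp, out, blended)
    print(f"{model}: implied cache-hit price ~ ${cache:.2f} per 1M tokens")
```

With a 70% weight on cache hits, the blended figure is driven mostly by the cache-hit price, which is why it can land below the listed input price.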

Grok 4.3 at $0.64/1M blended is the most cost-efficient proprietary reasoning model scoring 53 or higher on the AI Index. Combining speed and price, xAI is posting arguably the best value proposition for throughput-heavy workloads. DeepSeek V4 Pro at $0.18 blended remains the outright cheapest at AI Index 52, but the open-weights team's 59.7 t/s is the slowest in this cohort.

The real squeeze is on Anthropic and OpenAI: both carry premium prices to hold the top two intelligence slots, while Google is offering 57-point intelligence (Gemini Pro) at roughly 40% of their blended price with about twice the speed, and 55-point intelligence (Flash) at roughly 30% of their price with about three times the speed.
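The price and speed ratios behind that squeeze can be recomputed from the tables in this post. A short Python sketch, with shortened model names used as dictionary keys for readability:

```python
# Blended USD price per 1M tokens, from the pricing table.
blended = {
    "Claude Opus 4.8": 4.10,
    "GPT-5.5": 4.35,
    "Gemini 3.1 Pro Preview": 1.74,
    "Gemini 3.5 Flash": 1.31,
}

# Output speed in t/s, from the speed panel.
speed = {
    "Claude Opus 4.8": 66.8,
    "GPT-5.5": 65.0,
    "Gemini 3.1 Pro Preview": 142.6,
    "Gemini 3.5 Flash": 206.8,
}

for challenger in ("Gemini 3.1 Pro Preview", "Gemini 3.5 Flash"):
    for leader in ("Claude Opus 4.8", "GPT-5.5"):
        price_ratio = blended[challenger] / blended[leader]
        speed_ratio = speed[challenger] / speed[leader]
        print(f"{challenger} vs {leader}: "
              f"{price_ratio:.0%} of the price, {speed_ratio:.1f}x the speed")
```

The comparison uses blended prices; ratios based on the raw input or output prices would differ somewhat.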

Challenger watch

MiMo-V2.5-Pro entered the AA index at 54 pts, tied with Kimi K2.6 for the highest open-weights score on the board. If MiMo sustains that rating across reruns, it becomes the first new open-weights model this season to match Kimi's standing. No speed or pricing data was available for MiMo in today's snapshot. 1

Stat of the day

Grok 4.3 is now generating tokens 3.1× faster than Claude Opus 4.8 (207.6 vs 66.8 t/s) while scoring only 8 intelligence points lower (53 vs 61). For latency-sensitive applications, that trade-off math is hard to ignore.
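The same trade-off can be computed for any pair of models on the board. A minimal Python sketch using the AI Index and speed figures from this post (the `trade_off` helper is hypothetical, written only for illustration):

```python
# (AI Index, output t/s) from this post's intelligence and speed tables.
models = {
    "Claude Opus 4.8 (Max)": (61, 66.8),
    "GPT-5.5 (xhigh)": (60, 65.0),
    "Gemini 3.1 Pro Preview": (57, 142.6),
    "Grok 4.3 (high)": (53, 207.6),
    "DeepSeek V4 Pro (Max)": (52, 59.7),
}

def trade_off(candidate: str, baseline: str) -> tuple[float, int]:
    """Return (speed multiple, index-point gap) of candidate relative to baseline."""
    cand_index, cand_speed = models[candidate]
    base_index, base_speed = models[baseline]
    return cand_speed / base_speed, base_index - cand_index

multiple, gap = trade_off("Grok 4.3 (high)", "Claude Opus 4.8 (Max)")
print(f"{multiple:.1f}x the speed for {gap} fewer index points")
```

Note that tokens per second measures throughput; time to first token is a separate measurement and is not covered by these figures.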

Stats sourced from Artificial Analysis live API measurements. Intelligence Index v4.0 incorporates 10 evaluations. Speed figures reflect first-party API performance. Prices in USD per 1M tokens (blended at 7:2:1 cache-hit/input/output ratio).

#AILeague