GPT-5.5 is out — and it closes the gap on reasoning while cutting token costs in half

OpenAI shipped GPT-5.5 on April 23, 2026 — its most capable model to date, with major benchmark gains over GPT-5.4 and per-token costs roughly half those of comparable frontier models. This article covers what changed, how it stacks up against Claude Opus 4.7, API pricing, and the competitive implications including DeepSeek V4's arrival the next day.

OpenAI shipped GPT-5.5 on April 23, 2026[1], its most capable model to date across coding, reasoning, and long-context tasks. The release matters because it sets a new price-performance ceiling: Artificial Analysis puts the per-token cost at roughly half that of comparable frontier coding models, while the benchmarks show clear wins over its predecessor across nine of ten shared evaluations[2].
Neural network connections illustration by Google DeepMind

What changed from GPT-5.4

The headline number is Terminal-Bench 2.0: 82.7% versus 75.1% for GPT-5.4[1]. That's a +7.6 point jump on the most demanding agentic coding benchmark in the current evaluation set. Long-context retrieval tells a similar story — on the Graphwalks 1M task (one million token context), GPT-5.5 scores 45.4% where GPT-5.4 managed only 9.4%[1]. That's not a marginal improvement; it's a qualitative change in what the model can hold in working memory.
On ARC-AGI-2 — the visual reasoning benchmark designed to resist pattern-matching — GPT-5.5 hits 85.0%, the biggest single-generation jump in the comparison set at +11.7 points[2]. Third-party code review platform CodeRabbit ran it against their curated review benchmark: issue detection climbed from 58.3% to 79.2%[3].

How it compares to Claude Opus 4.7

The rivalry with Anthropic's Claude Opus 4.7 (released April 16) is the more interesting competitive angle. MindStudio found that GPT-5.5 completes equivalent coding tasks using 72% fewer output tokens than Opus 4.7[4] — a structural efficiency gap that compounds fast at scale. Raw capability is roughly even on most dimensions, though Opus 4.7 holds an edge on SWE-Bench Pro (64.3% vs 58.6%) and long-context coherence at the million-token range[1]. On the LMArena human-preference leaderboard (updated April 27), gpt-5.5-high ranks fourth overall in text at 1488 Elo and fifth in code at 1500 — both claude-opus-4-7-thinking variants sit above it[5].
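To see why a 72% output-token reduction compounds at scale, here is a back-of-the-envelope sketch. The gpt-5.5 output price ($30/1M) comes from the pricing section below; the Opus 4.7 output price and the per-task token count are placeholder assumptions for illustration only, not published figures.

```python
# Output prices in dollars per token. The gpt-5.5 figure is from the
# article; the Opus 4.7 figure is an assumed placeholder.
GPT55_OUT = 30.0 / 1_000_000
OPUS_OUT = 60.0 / 1_000_000  # assumption, not a published price

opus_tokens_per_task = 4_000  # hypothetical agentic coding task
gpt55_tokens_per_task = opus_tokens_per_task * (1 - 0.72)  # 72% fewer

tasks = 1_000_000  # e.g. a month of automated runs at enterprise scale
opus_cost = tasks * opus_tokens_per_task * OPUS_OUT
gpt55_cost = tasks * gpt55_tokens_per_task * GPT55_OUT

print(f"Opus 4.7 output spend:  ${opus_cost:,.0f}")
print(f"GPT-5.5 output spend:   ${gpt55_cost:,.0f}")
print(f"ratio: {opus_cost / gpt55_cost:.1f}x")
```

Under these assumptions the two effects multiply: half the per-token price times roughly a quarter of the tokens yields a ~7x gap in output spend, which is why token efficiency matters more than headline pricing alone.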

Access and pricing

GPT-5.5 and GPT-5.5 Pro are live via API[6]. Pricing: gpt-5.5 runs $5/1M input tokens and $30/1M output; gpt-5.5-pro is $30/$180[1]. Both are available in ChatGPT Plus, Pro, Business, and Enterprise tiers. API access was rolling out at announcement; it is now confirmed open to developers.
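As a quick sanity check on what those rates mean per request, here is a minimal cost calculator using the listed prices; the example token counts are hypothetical, chosen only to illustrate the arithmetic.

```python
# Listed API prices in USD per 1M tokens (from the article).
PRICES = {
    "gpt-5.5":     {"input": 5.0,  "output": 30.0},
    "gpt-5.5-pro": {"input": 30.0, "output": 180.0},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single request at the listed per-1M-token rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Hypothetical request: 20k-token prompt, 2k-token completion.
print(request_cost("gpt-5.5", 20_000, 2_000))      # 0.16
print(request_cost("gpt-5.5-pro", 20_000, 2_000))  # 0.96
```

At these rates the pro tier is a flat 6x multiplier on both sides of the ledger, so the same request costs six times as much regardless of the input/output split.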
Sam Altman confirmed the rollout on X (formerly Twitter) on April 23, 2026.

Context

The timing isn't accidental. GPT-5.5 arrived a week after Anthropic's Opus 4.7 and five days before OpenAI's revised Microsoft partnership announcement[7], which lets OpenAI sell across cloud providers rather than through Azure exclusively. That structural change — combined with a model that costs half as much to run per output token as its competition — is worth watching as enterprise procurement decisions for the rest of 2026 start to crystallize. DeepSeek's V4 preview, released one day later on April 24 with 1.6 trillion parameters and a $0.145/1M input price[8], adds a third cost axis to that equation — but GPT-5.5 currently has the advantage on both raw benchmark performance and the ecosystem breadth developers actually ship against.

