
AIL Player Card #004 — DeepSeek V4 Pro: The Value Engineer
95 OVR. VE. Open-source. 1.6T parameters. $3.48/M output vs $30 for GPT-5.5. Codeforces ELO 3206 — beats the incumbent by 38 points. DeepSeek Athletic just repriced the frontier. #AILeague

#AILeague · AI League Player Card #004
🃏 DEEPSEEK ATHLETIC — DEEPSEEK V4 PRO
┌─────────────────────────────────────────────┐
│ 95 DeepSeek V4 Pro │
│ VE VALUE ENGINEER │
│ DeepSeek Athletic │
├──────────────────┬──────────────────────────┤
│ RZN 91 ████████░░ │
│ CRE 84 ███████░░░ │
│ SPD 79 ██████░░░░ │
│ MLT 68 █████░░░░░ │
│ SAF 76 ██████░░░░ │
│ VAL 99 ██████████ │
└─────────────────────────────────────────────┘95 OVR · VE · DeepSeek Athletic. Open-source. 1.6 trillion parameters. Released April 24, 2026 — exactly 24 hours after GPT-5.5 hit the field. The price: $3.48 per million output tokens versus $30 for the incumbent. DeepSeek Athletic just walked into the most expensive stadium in sports and handed everyone a cheaper ticket.
Season overview

On April 24, 2026, a model dropped on Hugging Face under the MIT license. DeepSeek V4 Pro: 1.6 trillion total parameters, 49 billion active per token, 1 million token context window, 384,000 token max output. 1
The timing was not accidental. OpenAI had shipped GPT-5.5 the previous day. Anthropic's Claude Opus 4.7 had launched eight days earlier. April 2026 was, by raw model-release count, the densest month in the history of public AI. DeepSeek dropped V4 into that pile and added one more variable: it was free to download, run, and modify. 2
Loading content card…
The companion V4 Flash (284 billion parameters, 13 billion active) slots in at $0.14 input / $0.28 output per million tokens. V4 Pro costs $1.74 / $3.48 — roughly 7× less than GPT-5.5 ($5/$30) and 6× less than Claude Opus 4.7 ($5/$25) at comparable flagship-reasoning workloads. 3
Scouting report
Coding — the position this player was built for
Codeforces ELO: 3206. GPT-5.5 sits at 3168 on the same leaderboard. That is not a statistical tie — V4 Pro beats the closed-source incumbent on competitive programming by 38 rating points. 1
LiveCodeBench: 93.5% against Claude Opus 4.7's 88.8%. SWE-bench Verified: 80.6% — effectively tied with Claude Opus 4.7's 80.8%, a gap that rounds to zero on real-world repository work.
Terminal-Bench 2.0 is the exception. V4 Pro scores 67.9% against GPT-5.5's 82.7% on long-running autonomous DevOps tasks. If the workflow is a 3-hour unsupervised infra migration, GPT-5.5 has the clear edge. For everything else in a standard coding sprint, V4 Pro is in the same tier — at a fraction of the cost.
Reasoning and science
GPQA Diamond: 90.1%. Claude Opus 4.7 leads that specific benchmark at 94.2%, and GPT-5.5 posts 93.6%. V4 Pro is behind but within striking distance on PhD-level science questions. MMLU-Pro: 87.5%. 4
Humanity's Last Exam (HLE) — the hardest test anyone has designed for frontier models — shows the gap more clearly: Claude Opus 4.7 at 46.9%, V4 Pro at 37.7%. On the problems that require genuine expert-level synthesis across disciplines, the closed-source labs still have a lead.
The number that explains everything: VAL 99
The architecture behind V4 is what makes the economics possible. Compressed Sparse Attention (CSA) compresses input sequences 4× and selects only the most relevant tokens before computing attention. Highly Compressed Attention (HCA) compresses the KV cache 128×. Running a 1-million-token context on V4 Pro costs 27% of the FLOPs and 10% of the KV cache of its predecessor at the same context length. 5
The model was trained on 33 trillion tokens using the Muon optimizer, which applies gradient orthogonalization for faster convergence than standard AdamW. Mixture-of-Experts routing activates 49 billion of 1.6 trillion parameters per token — the model is large in theory, cheap in practice.
This is the DeepSeek Athletic house style: build for efficiency, release open weights, let the benchmark numbers speak for themselves.
DeepSeek benchmark progression across V3.x family 5
Head-to-head
| DeepSeek V4 Pro | Claude Opus 4.7 | GPT-5.5 | |
|---|---|---|---|
| Arena ELO | 1,467 | ~1,510 | ~1,530+ |
| Codeforces ELO | 3,206 | N/A | 3,168 |
| LiveCodeBench | 93.5% | 88.8% | — |
| SWE-bench Verified | 80.6% | 80.8% | — |
| Terminal-Bench 2.0 | 67.9% | 65.4% | 82.7% |
| GPQA Diamond | 90.1% | 94.2% | 93.6% |
| MMLU-Pro | 87.5% | — | — |
| HLE | 37.7% | 46.9% | — |
| Input price (per 1M tokens) | $1.74 | $5.00 | $5.00 |
| Output price (per 1M tokens) | $3.48 | $25.00 | $30.00 |
| Open weights | ✅ MIT | ❌ | ❌ |
| Context window | 1M | 200K | 1M (Codex config) |
The VE position: what it means
The league does not have a prior template for what V4 Pro does. It does not win every category. It does not need to.
The Value Engineer position is defined by a specific competitive logic: perform within one tier of the best closed-source models across core benchmarks, then price the API at 7× below them. At 100 million output tokens per month — a realistic production workload — V4 Pro costs $348,000 per year. GPT-5.5 costs $3,000,000 for the same traffic. The performance differential on most production tasks does not justify that gap.
That calculus is pushing production developers toward multi-model routing architectures where V4 Flash handles 70% of traffic at $0.28/M output, V4 Pro covers mid-tier tasks, and frontier closed models take only the highest-stakes calls. 2
One real limitation: V4 Pro has no multimodal input. It cannot process images, audio, or video. For any pipeline touching file analysis, screenshot debugging, or document OCR, the routing still goes to Gemini, GPT-5.5, or Claude. The SAF score (76) reflects another genuine gap — hallucination testing shows V4 Pro confidently fabricates answers to questions it should decline. These are not rounding errors; they are architectural choices that constrain where V4 Pro can be safely deployed.
League notes
DeepSeek Athletic has now beaten OpenAI United to the open-source frontier model slot twice. V3 in January 2025 sent Nvidia's stock down $600 billion in a day. V4 dropped 24 hours after GPT-5.5 and did it with an MIT license and a 1-trillion-parameter count that no Western lab has matched in open weights. 6
The club's competitive philosophy is not "beat OpenAI at their own game." It is to build the game everyone else can afford to play. Whether that strategy holds as US export controls tighten on Huawei Ascend chips — the hardware DeepSeek V4 was reportedly trained on — is the question every scout in the league is currently writing up.
#AILeague
Add more perspectives or context around this Drop.