


1/3
DeepMind Resolves 9 Open Erdős Problems — All Lean-Kernel Verified
DeepMind's AlphaProof Nexus agent resolved 9 open Erdős problems and 44 OEIS conjectures — all formally verified by the Lean 4 kernel with zero sorry placeholders. First large-scale open-problem resolution via AI + formal proof search. Claim audit: Lean-kernel verification confirmed; scope limited to Mathlib-mature domains.
2026. 6. 11. · 03:57
갤러리
What happened: On May 21, 2026, DeepMind and UT Austin released a preprint (arXiv:2605.22763) reporting the first large-scale resolution of open mathematical problems using AI-driven formal proof search. Their most capable agent autonomously resolved 9 out of 353 open Erdős problems and proved 44 out of 492 auto-formalized OEIS conjectures — all verified by the Lean 4 proof kernel, zero
sorry placeholders.This is a different category of result from the OpenAI unit-distance disproof covered last issue: every proof here is machine-checked. The Lean kernel accepted each one end-to-end; human experts only confirmed post-hoc that the formal statement faithfully matched the original informal problem.
How the Agent Works
The system is built on the AlphaProof Nexus framework. The core loop is simple: an LLM generates a Lean proof attempt, the Lean compiler verifies it and returns error feedback, the LLM revises — repeat until a
sorry-free proof is accepted or the budget runs out.The paper evaluates four agent designs:
- Agent A (basic): multiple independent prover subagents, each running the LLM–Lean feedback loop.
- Agent B: same as A, but can call AlphaProof (tree-search mode) to discharge hard subgoals.
- Agent C: adds a shared population database of proof sketches; LLM raters score sketches by plausibility/clarity/novelty, P-UCB sampling drives evolutionary search.
- Agent D (full): combines AlphaProof tool access + evolutionary search + global goal caching to avoid redundant computation.
Agent D solved the hardest problems (Erdős #125, #138) with 2–5× lower cost than Agent A. Standalone AlphaProof in tree-search mode solved 0/9 — the LLM-guided natural language layer is the critical piece, not just search.
Domains where the agent succeeded: combinatorics, optimization, graph theory, algebraic geometry, additive number theory, quantum optics. All domains where Mathlib is mature and problems decompose into tractable subgoals.
What Was Proved
Nine Erdős open problems including:
- #12(i), #12(ii) — chromatic / combinatorial
- #125, #138, #152 — among the hardest cases for basic agents; required Agent D
- #741(i), #741(ii), #846
- #26 variant — related to Ben Green's open conjecture list (#57)
Plus 44/492 OEIS conjectures formally proved, plus results in optimization (tight O(1/t) rate for Anchored Gradient Descent-Ascent), a Graffiti conjecture from 1996 (graph theory), Zanello's log-concavity conjecture for pure O-sequences (algebraic geometry), and quantum optics existence conjectures for N=d ∈ {4, 6, 10}.
Cost: a few hundred USD per solved problem (Agent D); high variance. Screening all 353 Erdős problems for tractability required significant total compute.
Claim Audit
| Claim | Verdict | Notes |
|---|---|---|
| Lean-kernel verification | ✅ Verified | All 9 Erdős proofs are sorry-free; Lean kernel accepted each one end-to-end |
| Autonomy | ✅ Verified | Agent found proofs without step-by-step human guidance; pre-formalized statements provided as input |
| Statement faithfulness | ✅ Verified | Domain experts confirmed each formal statement matches the original open problem post-hoc |
| Scope | ⚠️ Partial | 9/353 Erdős problems; selection biased toward Mathlib-mature domains; deep structural results remain out of reach |
| Cost claim | ⚠️ Partial | "Few hundred USD" is median; true cost has high variance; total screening budget was substantially larger |
| Independent replication | ❓ Pending | As of June 10, 2026, no independent third-party replication reported |
Key distinction from last issue: The unit-distance disproof (OpenAI, May 20) was verified by human experts (Gowers, Alon, Shankar, Tsimerman) — not the Lean kernel. LeanMarathon tried to formalize it and failed. This paper's 9 Erdős proofs are the inverse: weaker human expert involvement, stronger mechanical verification.
Primary Sources
- Preprint: arXiv:2605.22763v2 — Tsoukalas et al., DeepMind / UT Austin, submitted May 21, revised June 8, 2026
- Related: SorryDB paper arXiv:2603.02668 — real-world Lean sorry benchmark (see next issue)

댓글