


Goedel-Architect: 99.2% on MiniF2F — Open-Source, 500× Cheaper
Goedel-Architect reports 99.2% pass@1 on MiniF2F-test in autonomous mode and 88.8% pass@4 on PutnamBench with NL seeding (75.6% pass@1 autonomous), using open-weight DeepSeek-V4-Flash. That makes it the strongest open-source formal theorem proving pipeline to date, at ~$294 for a full PutnamBench run (vs. ~$163,000 for Hilbert). Lean 4 kernel verification throughout, zero sorry. Claim audit: competition scores (IMO 4/6, Putnam 11/12) require NL-seeded mode, not fully autonomous.
June 12, 2026 · 2:08 AM
arXiv:2606.06468 · June 4, 2026 · Princeton / NVIDIA
Goedel-Architect is a new agentic framework for formal theorem proving in Lean 4, built around a single innovation: the blueprint — a global dependency graph of lemmas that gets generated, proved in parallel, and globally rewritten when things fail. The result is the strongest open-source pass@1 performance on MiniF2F and PutnamBench to date, at a cost that undercuts comparable pipelines by up to 500×.
What Happened
The Goedel-LM team (Princeton, NVIDIA, collaborators) released Goedel-Architect on June 4. With open-weight DeepSeek-V4-Flash (284B-A13B) as the backbone, the reported results are:
- MiniF2F-test: 99.2% pass@1 (242/244) in autonomous mode; 100% with optional natural-language proof seeding — the first pipeline to close all 244 problems.
- PutnamBench: 75.6% pass@1 autonomous; 88.8% pass@4 (597/672) with NL seeding — surpassing Hilbert's 70.0% (which required ~$163,000 vs. Goedel-Architect's ~$294).
- Competition results (NL-seeded mode): 4/6 IMO 2025, 11/12 Putnam 2025, 3/6 USAMO 2026.
- Full pipeline and model weights open-sourced at github.com/Goedel-LM/Goedel-Prover-V2.
Prior best PutnamBench result: Seed-Prover 1.5 at 87.9% (NL-seeded, closed-source backbone).
How It Works
Most theorem-proving pipelines decompose a goal recursively, building a top-down tree that can loop on dead-end strategies. Goedel-Architect instead runs a three-stage loop over a single global plan:
- Blueprint generation: Given only the target theorem statement, a planner builds a blueprint — a DAG of formally stated definitions and lemmas, with all proof bodies left empty. An optional NL proof can seed the structure.
- Parallel proving: Each open leaf node in the DAG is dispatched to a Lean prover that can only use that node's declared dependencies. The prover has access to the Lean compiler and Mathlib search. Success freezes the node; failure emits structured diagnostics (and counterexamples if the statement is wrong).
- Blueprint refinement: Failed nodes drive global blueprint rewrites: split an over-hard lemma, fix a misformalized statement, or add auxiliary lemmas. Already-proved nodes are preserved. The loop continues until all nodes are closed or an iteration cap is hit.
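To make the blueprint idea concrete, here is a hypothetical two-node blueprint in Lean 4. The namespace, the lemma names, and the use of `sorry` as the placeholder for an empty proof body are all assumptions for illustration; the source only says that proof bodies start empty and that the final output contains no `sorry`.

```lean
import Mathlib

namespace BlueprintSketch

-- Hypothetical leaf node: the statement is fixed, the proof body is still open.
theorem sq_sub_nonneg (a b : ℝ) : 0 ≤ (a - b) ^ 2 := by
  sorry

-- Hypothetical target node: its only declared dependency is `sq_sub_nonneg`.
theorem two_mul_le_sq_add_sq (a b : ℝ) : 2 * a * b ≤ a ^ 2 + b ^ 2 := by
  sorry

end BlueprintSketch
```

In this picture, a prover working on the target would be allowed to cite `sq_sub_nonneg` as if it were already proved, and a node counts as closed only once its `sorry` is replaced by a proof the Lean kernel accepts.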
The key mechanism: global blueprint rewriting avoids the dead-end recursion that plagues tree-based approaches. The blueprint is also the shared system of record — no hidden state outside it.
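The generate, prove, refine loop described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: `Node`, `run_blueprint`, `toy_prove`, and `toy_refine` are hypothetical names, the prover and refiner are deterministic stand-ins for the LLM-driven components, and the sketch dispatches every unproved node (treating its dependencies as assumed), which is one reading of "open leaf node".

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Node:
    """One blueprint entry: a formal statement plus its declared dependencies."""
    name: str
    statement: str
    deps: list[str] = field(default_factory=list)
    proved: bool = False


Blueprint = dict[str, Node]
Prover = Callable[[Node, list[Node]], bool]
Refiner = Callable[[Blueprint, list[Node]], Blueprint]


def run_blueprint(blueprint: Blueprint, prove: Prover, refine: Refiner,
                  max_iters: int = 5) -> tuple[Blueprint, bool]:
    """Prove open nodes in parallel, then rewrite the blueprint around failures."""
    for _ in range(max_iters):
        open_nodes = [n for n in blueprint.values() if not n.proved]
        if not open_nodes:
            break
        # Each prover call sees only the node's declared dependencies.
        jobs = [(n, [blueprint[d] for d in n.deps]) for n in open_nodes]
        with ThreadPoolExecutor() as pool:
            results = list(pool.map(lambda job: prove(*job), jobs))
        failed = []
        for node, ok in zip(open_nodes, results):
            if ok:
                node.proved = True  # success freezes the node
            else:
                failed.append(node)
        if failed:
            blueprint = refine(blueprint, failed)
    return blueprint, all(n.proved for n in blueprint.values())


def toy_prove(node: Node, deps: list[Node]) -> bool:
    """Stand-in prover: a 'hard' statement only succeeds with an auxiliary lemma."""
    has_aux = any(d.name.endswith("_aux") for d in deps)
    return "hard" not in node.statement or has_aux


def toy_refine(blueprint: Blueprint, failed: list[Node]) -> Blueprint:
    """Stand-in refiner: add one auxiliary lemma per failed node, keep the rest."""
    new_bp = dict(blueprint)
    for node in failed:
        aux = Node(f"{node.name}_aux", f"easy helper for {node.name}")
        new_bp[aux.name] = aux
        node.deps.append(aux.name)
    return new_bp


if __name__ == "__main__":
    bp = {
        "base": Node("base", "easy fact"),
        "main": Node("main", "hard target", deps=["base"]),
    }
    final, closed = run_blueprint(bp, toy_prove, toy_refine)
    print(closed, sorted(final))
```

The design point the sketch tries to capture is that all state lives in the blueprint dictionary: the refiner receives the whole graph plus the failures, so it can restructure globally while proved nodes carry over unchanged.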
Claim Audit
| Dimension | Assessment |
|---|---|
| Benchmark numbers | MiniF2F 99.2% pass@1 and PutnamBench 75.6% pass@1 as reported. Independently reproducing full runs is expensive, but the methodology is auditable. |
| Verification method | Lean 4 kernel + Mathlib throughout. All accepted proofs compile clean. Zero sorry in final output. ✓ Kernel-verified. |
| Autonomy level | Fully autonomous mode exists (target statement only). Competition scores (IMO/Putnam/USAMO) use NL-seeded mode — not autonomous. |
| Cost claim | ~$294 for a full PutnamBench run ($0.44/problem). Hilbert: ~$163,000. AxProverBase (Claude Opus 4.5): $8,467. The 500× claim is against Hilbert; against AxProverBase it is ~29×. |
| Statement faithfulness | Not independently audited in the paper. Auto-formalization from NL proofs introduces faithfulness risk — the paper does not report domain-expert re-checking beyond the automated pipeline. |
| Openness | Full pipeline + DeepSeek-V4-Flash weights open. No closed API dependency in the default setup. |
⚠️ Caveat: Competition benchmark results (IMO 4/6, Putnam 11/12) use NL-seeded mode. Fully autonomous numbers on competition-level problems are lower. This is a meaningful distinction for "did AI prove this without human proof hints."
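The percentages and cost ratios quoted in this post can be rechecked with a few lines of arithmetic. The sketch below only recomputes from the figures stated above; the variable and function names are illustrative, and nothing here is independently measured.

```python
# Figures exactly as quoted in this post.
MINIF2F_SOLVED, MINIF2F_TOTAL = 242, 244
PUTNAM_SOLVED, PUTNAM_TOTAL = 597, 672
COST_GOEDEL_USD = 294
COST_HILBERT_USD = 163_000
COST_AXPROVERBASE_USD = 8_467


def percent(solved: int, total: int) -> float:
    """Solve rate as a percentage, rounded to one decimal place."""
    return round(100 * solved / total, 1)


minif2f_pct = percent(MINIF2F_SOLVED, MINIF2F_TOTAL)
putnam_pct = percent(PUTNAM_SOLVED, PUTNAM_TOTAL)

# Cost per PutnamBench problem, in dollars.
cost_per_problem = round(COST_GOEDEL_USD / PUTNAM_TOTAL, 2)

# How many times more expensive each comparison pipeline is.
ratio_vs_hilbert = COST_HILBERT_USD / COST_GOEDEL_USD
ratio_vs_axproverbase = COST_AXPROVERBASE_USD / COST_GOEDEL_USD

if __name__ == "__main__":
    print(minif2f_pct, putnam_pct, cost_per_problem)
    print(round(ratio_vs_hilbert), round(ratio_vs_axproverbase))
```

On these inputs the Hilbert ratio comes out near 554, so "500×" is a rounded-down headline figure, while the AxProverBase ratio rounds to the ~29× given in the table.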
Primary Sources
- Paper: arXiv:2606.06468 — Goedel-Architect: Streamlining Formal Theorem Proving with Blueprint Generation and Refinement
- Code: github.com/Goedel-LM/Goedel-Prover-V2
