
AIL Player Card #005 — Llama 4 Maverick: The Community Wing
88 OVR. CW. Open-weight. 1M context. $0.15/M input. The model that launched on a Saturday with a fake ELO score but a real value case. Meta Open fields its community roster pick. #AILeague

Meta Open · CW · OVR 88 · Season 1, 2025 · #AILeague
The scouting report
Meta dropped Llama 4 Maverick on a Saturday — which already tells you something. Not the polished LlamaCon reveal everyone expected, but a weekend panic release that landed in the community's inbox like an unsigned transfer fee notice. The models came in hot, the Arena ELO headline read #2 in the world, and r/LocalLLaMA immediately started sharpening its pitchforks.
Here's the play: Maverick is a mixture-of-experts model with 17B active parameters out of 400B total, spread across 128 experts, with native multimodal capability built into its foundation via early fusion [1]. It's distilled from Llama 4 Behemoth, a teacher model with 288B active and roughly 2T total parameters that was still in training at launch, which makes Maverick the first open-weight model to inherit frontier-grade distillation at scale. The 1 million token context window means it can process an entire codebase, several books, or a year's worth of email in one shot.
The position: CW (Community Wing). Meta Open's players don't compete on raw ELO — they compete on availability, cost, and the right to modify the product. Maverick's job on this roster is to give the open-source ecosystem a frontier-class multimodal option they can actually self-host and fine-tune.
Stat card
| Dimension | Score | Signal |
|---|---|---|
| OVR | 88 | Solid frontier-class. Not elite, but open-weight changes the math entirely. |
| RZN · Reasoning | 82 | MMLU-Pro 80.5, GPQA Diamond 69.8 — respectable, not dominant |
| CRE · Creativity | 84 | Native multimodal + strong conversational tone, image captioning and chart QA strong |
| SPD · Speed | 86 | MoE architecture pays off at inference — 307 tok/s on Groq hardware |
| MLT · Multimodal | 85 | MMMU 73.4, DocVQA 94.4, MathVista 73.7 — natively multimodal from day one |
| SAF · Safety | 76 | Refusal rate dropped from 7% (Llama 3.3) to under 2% — more helpful, but the guardrails loosened |
| VAL · Value | 92 | $0.15/M input · $0.60/M output via API — free to download and self-host |
Comparable ELO at release: The public instruct model sits in the upper-mid frontier tier. The LMArena #2 ELO of 1417 belonged to a separate "experimental chat version customized to optimize for human preference", not the model you can download [2].
Season highlights
The 1M context play. No other fully open model at this performance tier offers 1 million tokens of context. Maverick's nearest open peer, Scout, unlocks 10M, but at lower overall capability. For enterprises running retrieval over massive document archives, Maverick at 1M tokens and $0.15 per million input tokens is a different conversation than any closed API [3].
The MoE efficiency trade. Only 17B of the model's 400B total parameters are active per token, so inference is fast and cheap when hosted right. But Maverick's hardware footprint (it needs a full H100 DGX host to run locally) priced out the LocalLLaMA community that built the entire Llama brand [4]. The r/LocalLLaMA subreddit, literally named after this franchise, was visibly not having it.
The Arena controversy. Meta showed a chart with Maverick at ELO 1417, ahead of GPT-4o and Gemini 2.0 Flash. The fine print: that was a non-public version. The model available on Hugging Face and third-party inference providers behaves noticeably differently. LMArena later confirmed the submitted model was "customized for human preference", which in the league reads as the front office filing fraudulent stats before contract negotiations [2].
The benchmark floor is real, though. Strip away the controversy and the official numbers from Hugging Face tell a coherent story: MMLU-Pro 80.5 beats Llama 3.3 70B (68.9) by a solid margin. GPQA Diamond at 69.8 is competitive mid-frontier. LiveCodeBench at 43.4 again outpaces the prior generation. This is a legitimate upgrade on paper [5].
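The hardware complaint in the MoE efficiency trade comes down to simple arithmetic: only a small fraction of the weights is used per token, but all of them still have to sit in memory. Here is a minimal sketch of that estimate. The 80 GB per GPU and 8 GPUs per host figures are assumptions about a typical DGX H100 configuration, the precision table is illustrative, and the estimate covers weights only (no KV cache or activations).

```python
# Rough weight-memory estimate for a mixture-of-experts model.
# Assumptions (not taken from Meta's documentation): weights only,
# no KV cache or activation memory, and 1 GB = 1e9 bytes.

TOTAL_PARAMS = 400e9   # total parameters, from the card
ACTIVE_PARAMS = 17e9   # parameters active per token, from the card
H100_MEMORY_GB = 80    # assumed memory per H100 GPU
GPUS_PER_HOST = 8      # assumed GPU count of one DGX H100 host

# Illustrative storage cost per parameter at common precisions.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Return the memory needed to hold the weights, in GB."""
    return params * bytes_per_param / 1e9


def fits_on_host(precision: str) -> bool:
    """True if all weights fit in the host's combined GPU memory."""
    needed = weight_memory_gb(TOTAL_PARAMS, BYTES_PER_PARAM[precision])
    return needed <= H100_MEMORY_GB * GPUS_PER_HOST


# Share of the weights that does work on any single token.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

if __name__ == "__main__":
    print(f"active fraction per token: {active_fraction:.2%}")
    for precision, nbytes in BYTES_PER_PARAM.items():
        gb = weight_memory_gb(TOTAL_PARAMS, nbytes)
        print(f"{precision}: {gb:.0f} GB, fits on one host: {fits_on_host(precision)}")
```

Under these assumptions the compute per token looks like a small model, while the memory bill looks like a 400B one, which is the gap the home-lab crowd ran into.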
Head-to-head: Community Wing class
How Maverick stacks up against the other models sharing its position on the open-source / value spectrum:
| Model | OVR | RZN | CRE | SPD | MLT | SAF | VAL | Context | Price (in/out /M) |
|---|---|---|---|---|---|---|---|---|---|
| Llama 4 Maverick | 88 | 82 | 84 | 86 | 85 | 76 | 92 | 1M | $0.15 / $0.60 |
| DeepSeek V4 Pro (#004) | 95 | 95 | 88 | 80 | 72 | 78 | 97 | 1M | $0.87 / $3.48 |
| GPT-4o (#002) | 90 | 88 | 89 | 85 | 91 | 85 | 74 | 128K | $2.50 / $10.00 |
| Gemini 2.5 Pro (#003) | 93 | 92 | 88 | 82 | 94 | 84 | 80 | 1M | $1.25 / $5.00 |
The VAL score is where Maverick makes its case. At $0.15/M input, it costs roughly 1/17 the price of GPT-4o ($2.50/M) for input tokens, and nothing in API fees if you self-host. The SWE-bench Pro score of 5.24% is a problem: no one is deploying this for autonomous coding. But for retrieval-heavy workflows, multimodal document understanding, and anything that needs a 1M context window without a $5/M output tab, it fills a lane nobody else owns in open-weight form.
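The price gap is easy to check from the table. The sketch below copies the per-million-token prices from the head-to-head table and computes an input-price ratio plus the cost of one request. The function names and the sample workload (a 100K-token prompt with a 2K-token reply) are hypothetical, chosen only so the request fits every model's listed context window.

```python
# Per-million-token prices in USD as (input, output),
# copied from the head-to-head table.
PRICES = {
    "Llama 4 Maverick": (0.15, 0.60),
    "DeepSeek V4 Pro": (0.87, 3.48),
    "GPT-4o": (2.50, 10.00),
    "Gemini 2.5 Pro": (1.25, 5.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the API cost in USD of a single request."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


def input_price_ratio(model: str, baseline: str = "Llama 4 Maverick") -> float:
    """Return how many times more the model charges for input than the baseline."""
    return PRICES[model][0] / PRICES[baseline][0]


if __name__ == "__main__":
    # Hypothetical workload: 100K-token prompt, 2K-token reply.
    for name in PRICES:
        cost = request_cost(name, 100_000, 2_000)
        ratio = input_price_ratio(name)
        print(f"{name}: ${cost:.4f} per request, {ratio:.1f}x Maverick's input price")
```

These are API list prices only; self-hosting swaps the per-token fee for hardware and operations costs that this sketch does not model.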
The real lineup question
Maverick's biggest rival isn't GPT-4o or Gemini 2.5 Pro. It's the Qwen series. Qwen 2.5 covers the whole size range from 0.5B to 72B, with most sizes released under the Apache 2.0 license, meaning zero downstream naming restrictions, zero EU regulatory headaches, and hardware that actually fits in a home lab [4].
The Llama 4 Community License requires "Built with Llama" branding, restricts use for companies with over 700M monthly active users, and limits vision capabilities in the EU. For an "open-source community club," that's a lot of fine print in the season ticket contract.
Meta still controls the biggest social media training data moat and distribution pipeline in the world. Llama 4 running inside WhatsApp, Instagram Direct, and Messenger hits users at a scale no academic lab or DeepSeek competitor touches. The question the community is asking: is Maverick designed for them, or designed for Meta's own platform plays?
The Saturday release, the ELO stat controversy, and the departure of Meta's head of AI research three days before launch all suggest the front office has some things to sort out. The talent is real. The locker room is complicated.
Final call: 88 OVR. Community Wing. Class of 2025. Download the jersey, read the license. #AILeague