No Winner-Take-All: What Two ML Researchers Think Is Actually Happening in AI Right Now

No Winner-Take-All: What Two ML Researchers Think Is Actually Happening in AI Right Now

Nathan Lambert and Sebastian Raschka join Lex Fridman for a 4-hour state-of-AI roundtable. Their core argument: no single company holds a decisive edge in 2026 — ideas flow freely, the differentiator is compute resources and organizational culture. They cover China vs. US open-model dynamics, RLVR post-training, scaling law status, coding tools, and honest AGI timelines.

AI Podcast Insights
2026/6/9 · 10:30
購読 1 件 · コンテンツ 1 件

リサーチノート

In episode #490 of the Lex Fridman Podcast, Nathan Lambert (post-training lead at the Allen Institute for AI) and Sebastian Raschka (author of Build a Large Language Model from Scratch) spent over four hours dissecting the state of AI heading into 2026. Neither is a hype merchant. Lambert runs the OLMo open-model program and co-coined the term RLVR. Raschka builds models from first principles and writes detailed technical breakdowns for practitioners. Their core argument, stated plainly: the AI race has no clear winner, ideas are freely circulating, and the real moat is compute budget and organizational culture — not secret techniques.
コンテンツカードを読み込んでいます…

The DeepSeek moment reframed

The episode opens with Lex's framing: the DeepSeek R1 release in January 2025 kicked off an acceleration that has not slowed. Lambert and Raschka agree it matters, but resist the narrative that any single company is "winning." 1
Raschka's position: no company holds exclusive technology in 2026 because researchers move between labs constantly, and the ideas themselves — Mixture of Experts, RLVR, multi-head latent attention — are all published. The differentiating factor is budget and hardware to implement them. Lambert extends this: what DeepSeek really did was catalyze a whole generation of Chinese AI labs (Z.ai, MiniMax, Kimi Moonshot) into releasing strong open-weight models, similar to how ChatGPT seeded US startup activity in 2022. 1
On the closed-model side, both researchers flag Claude Opus 4.5 as the current darling of the developer community — particularly for code — while acknowledging that ChatGPT and Gemini command far larger consumer bases through sheer brand inertia. Gemini 3 launched with enormous marketing momentum, then faded conversationally even though the model itself remains strong. Lambert's prediction: Gemini continues gaining on ChatGPT in 2026, Anthropic dominates enterprise software, and OpenAI keeps pulling off research surprises (o1, Deep Research, Sora) through what he describes as a uniquely high-velocity research culture. 1

Where the actual gains are coming from

The most technically substantive section of the episode concerns the three axes of scaling: pre-training, post-training RL, and inference-time compute. Both researchers are bullish on all three but sharply disagree on which is currently the best return on investment.
The transformer architecture itself, Lambert notes, has barely changed since GPT-2. The real changes are in training stages:
  • Pre-training is still the knowledge-absorption phase — next-token prediction on a large corpus. At current scale (GPT-4-class models estimated around one trillion parameters), training runs cost roughly $5–10 million in cloud compute, while serving those models to hundreds of millions of users costs billions annually. Pre-training still improves the base model, but it is no longer the cheapest path to capability gains.
  • Mid-training refines pre-training data toward specific objectives (long context, high-quality synthetic rewrites, code-heavy mixes). It was not a distinct concept in the GPT-2 era.
  • Post-training, particularly Reinforcement Learning with Verifiable Rewards (RLVR), is the most active research area. The technique — popularized by DeepSeek R1, but named earlier by AI2's Tulu 3 project — lets models grade their own outputs on verifiable tasks (math, code) and iterate. Raschka demonstrates its speed: training Qwen 3 base with RLVR on the MATH-500 benchmark moved accuracy from 15% to 50% in just 50 steps. What the model is not doing is learning new math — the knowledge was already in pre-training. RLVR is an unlock, not an upload. 1
Lambert describes the third axis, inference-time scaling, as the most user-facing step change of the past year. The shift from "get the first token immediately" to "model thinks for 30 seconds or 30 minutes before answering" has transformed what models can do — especially for software tasks that require iterative tool use. One practical implication: a single o3-style Pro query running for an hour becomes an infrastructure problem at 100 million users.
There is a genuine debate about whether pre-training has plateaued. Lambert's answer is careful: the excitement has moved elsewhere, but the practice has not. "People vibe that way, but it's not what is actually happening," he says. Labs still redo pre-training every cycle. The question is purely financial — whether the next increment in model size is worth the fixed training cost versus cheaper alternatives like RLVR or better inference-time strategy.

The open-model landscape in 2026

Raschka and Lambert rattle off the open-model ecosystem without notes: DeepSeek, Kimi, MiniMax, Z.ai, Moonshot from China; Mistral, Gemma, gpt-oss-120b, OLMo, Qwen, NVIDIA Nemotron, and Hugging Face SmolLM from the West. This list was substantially shorter a year ago.
Overview of open and closed AI models released in 2025, compiled by Lex Fridman
AI models landscape 2025 — an overview updated by the episode's production team 1
Several things stand out in their analysis:
  • gpt-oss-120b (OpenAI's first openly-weighted model since GPT-2) is notable specifically because it was the first major model trained explicitly for tool use, enabling reliable web search and Python interpreter calls rather than relying on memorization. This is a significant capability shift for the open ecosystem.
  • Qwen 3 gets high marks from both for breadth and performance, though there is a known data-contamination debate around how much of its math benchmark performance reflects training on near-identical problems.
  • Llama's decline is acknowledged with some sympathy. Lambert's read: Meta's Llama program got pressured toward benchmark-chasing rather than serving the practitioner community that originally loved it. Internal misaligned incentives — researchers wanting to hit frontiers, management wanting headlines — may have broken the program. "RIP Llama," Lex says. Lambert doesn't argue. Whether there's a Llama 5 as an open-weight model is genuinely uncertain given Alexandr Wang's reported preference for closed development. 1
Lambert's personal project, ADAM (American Truly Open Models), runs through the conversation as a recurring thread: Lambert believes the US is ceding influence to Chinese open-weight models and argues for a centralized national effort — not on security-theater grounds, but because open models are the starting point for the next generation of AI research. If Qwen and DeepSeek are what every new researcher is building on, the ideas that flow from that will disproportionately be Chinese.

Coding tools and the "programming with English" shift

Both researchers use AI coding tools daily, but choose differently. Raschka uses Codeium in VS Code, preferring to stay in the diff and keep oversight. Lambert uses Cursor and Claude Code interchangeably, with a deliberate reason: Claude Code is teaching him to "program with English" — operating at the system-design level rather than managing line-by-line diffs. He considers this a distinct skill worth building now. The researchers also note that, in a survey of about 791 professional developers, roughly 80% found coding with AI more enjoyable, and senior developers — not juniors — were more likely to ship AI-generated code. The usual assumption (that juniors lean on AI more) is wrong in practice. 1
Lex pushes a question both find uncomfortable: if you never struggle with the bug yourself, do you eventually lose the skill? Raschka's view is that the answer depends on what you want to preserve. The AI is great for mundane tasks — his wife used ChatGPT to fix 100 broken links in podcast show notes that would have taken two hours manually. But the intrinsic satisfaction of debugging — the "drink of water after crossing a desert" — is worth protecting, and deliberate offline practice remains important.

AGI timelines: honest uncertainty, not false precision

On timelines, both researchers resist the strong claims circulating in AI circles. The most recent major episode to tackle this question from a different angle is Dwarkesh Patel's interview with Andrej Karpathy, who argues AGI is still at least a decade away — a useful counterpoint for calibrating expectations.
コンテンツカードを読み込んでいます…
Lambert roughly agrees with the AI 2027 report's framing of milestone-based progress (superhuman coder → superhuman AI researcher → ASI) but disagrees with the timeline optimism and the presumption of clean capability thresholds. His counterpoint: AI is "jagged" — already superhuman at frontend code and traditional ML systems, genuinely bad at distributed ML training because almost no training data exists for that. A "superhuman coder" is not a binary threshold; it's an uneven capability surface.
Raschka is even more cautious. The AI 2027 report originally projected 2027–2028 for fully automated coding; it has since been pushed to 2031 (mean estimate). Raschka says his own estimate would be longer still. 1
What both agree on: the most concrete near-term economic impact is the democratization of knowledge access, not the replacement of jobs. Lambert: "Nobody's really talking about that, because they just immediately take it for granted that this is awesome. Think about the impact across time, all across the world — kids throughout the world being able to learn these ideas. That's how we get to Mars." That framing cuts through most of the AGI debate as currently conducted in Silicon Valley.

The 996 grind and the Silicon Valley bubble

The episode ends with some cultural candor. Lambert describes frontier AI labs as operating on 9-9-6 culture (9am to 9pm, six days a week), compares the dynamic to Patrick McGee's account of Apple's China supply-chain engineers, and acknowledges he has personally hit burnout. Both researchers call out the Silicon Valley echo chamber — not to dismiss it, but to identify a real limitation: people building systems for the entire world from inside a very specific geographic, cultural, and socioeconomic context. 1
Raschka's advice for people in that bubble: read history books, travel, and don't confuse X/Twitter discourse for ground truth. Lambert recommends Season of the Witch, a history of San Francisco from 1960 to 1985, as a corrective for anyone moving to SF without context.
The overarching message is not pessimistic — both researchers genuinely believe AI will improve human lives at scale, and both are building pieces of that future. But they're unusually willing to say where the uncertainty is real, which makes this episode a useful baseline against which to measure the more aggressive claims circulating in AI coverage.

This article was distilled from Lex Fridman Podcast #490 (4 hours 25 minutes), published January 31, 2026, with guests Nathan Lambert and Sebastian Raschka. Full episode: 1

このコンテンツについて、さらに観点や背景を補足しましょう。

  • ログインするとコメントできます。