DeepMind deep-dive and AI coding wars: May 20–27

A quiet week for startup and VC channels — most were in a Memorial Day publishing lull — but the creators who did post brought focus. Two Minute Papers landed its first in-person CEO interview (21 minutes with Demis Hassabis). Fireship handled Google I/O cleanup. And Theo published five videos in seven days that together form the most complete map of the current AI coding tool landscape available on YouTube.

Eight confirmed videos. All rated worth watching.

AI research

DeepMind's Insane AI Breakthroughs With CEO Demis Hassabis

Channel: Two Minute Papers · Published: May 25 · Duration: 21 min 28 sec

Two Minute Papers host Károly Zsolnai-Fehér meets Google DeepMind CEO Demis Hassabis in London 1


Host Károly Zsolnai-Fehér (Two Minute Papers) sat down with Demis Hassabis at Google DeepMind's London office for the channel's first face-to-face CEO interview, a format distinct from its usual paper breakdowns. 1
Hassabis described DeepMind's drug discovery platform as six to twelve new models at AlphaFold's scale — covering protein interactions, molecular binding, ADME properties (absorption, distribution, metabolism, excretion), toxicity, and compound design — currently in pre-clinical stages. "We're building another half dozen to a dozen AlphaFold level models that are on different parts of the drug discovery process," he said. 1
Co-Scientist, a Gemini-based AI research assistant fine-tuned for hypothesis generation and literature synthesis, was demonstrated by Zsolnai-Fehér on his own research domain (ray tracing / global illumination). An earlier Co-Scientist variant independently discovered a more efficient matrix multiplication algorithm — and that algorithm was then used to improve AlphaFold itself. 1
DeepMind is building an automated materials science laboratory in London with 200,000 new material designs pending physical testing, including superconductor candidates. "We're sitting on 200,000 designs of new materials... There's some superconductors in there," Hassabis said. An automated drug discovery lab is expected within 18 to 24 months, contingent on robotics maturity. 1
Hassabis introduced the "Einstein test" as a benchmark for genuine AI scientific capability: train a model only on knowledge available up to 1901, then ask whether it can independently derive Einstein's four 1905 breakthrough papers (special relativity and three others). If it can, the system is ready for original science. He also noted that recursive self-improvement is feasible in programming and mathematics, where verification is fast, but harder in the physical sciences, where each verification step requires an automated lab or physical-world confirmation. 1

Verdict: Watch — Hassabis disclosed DeepMind's drug discovery roadmap in unusual detail, gave a concrete timeline for automated labs, and offered a genuinely novel benchmark for evaluating AI scientific capability. The interview format also surfaced a more candid version of his views on AGI safety than typical conference panels.

Fireship

Google's AI endgame is here — everything you missed at I/O 2026

Channel: Fireship · Published: May 23 · Duration: ~10 min (estimated from transcript)

Image: keynote overview from Fireship's Google I/O 2026 recap 2

Google I/O 2026 was framed entirely around the "agentic Gemini era" — Gemini embedded across Search, Gmail, Android, and hardware including AR glasses. Token serving volume reached 3.2 quadrillion tokens per month, up from 9.7 trillion two years earlier. 2
Gemini Omni launched as a full-multimodal model accepting text, video, and audio, with output in any format — Fireship described it as Google's formal embrace of the world-model path. TPU chips were split into TPU-T (training) and TPU-I (inference) as separate product lines. 2
Gemini 3.5 Flash launched with performance comparable to Opus 4.7 and GPT-5.5, but at three times the price of the previous Flash version ($1.50/M input, $9/M output). Fireship noted: "The price of Gemini 3.5 Flash is three times more than the previous version and 30 times more than Gemini 1.5 Flash." Gemini 3.5 Pro remains in development, expected in late summer. 2
Anti-Gravity IDE (formerly Windsurf) demonstrated building a complete operating system that could run Doom during the keynote; its interface has shifted from "write code" to "manage agents." Chrome added an HTML on Canvas API allowing direct use of HTML elements within Canvas, useful for building high-interactivity UIs with WebGPU. 2
Fireship's editorial read: "Google is no longer trying to organize the world's information with blue hyperlinks, because search engines are now an archaic technology. Instead, Google is trying to become the interface to reality itself." The Neural Expressive design system for Gemini apps can generate charts, timelines, and mini-apps from natural language in real time. 2

Verdict: Watch — Fireship's high-density format covers the full I/O in under 10 minutes more efficiently than Google's official keynote. Particularly useful for developers tracking Gemini pricing and the Chrome API roadmap.
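For anyone budgeting against the new Flash pricing, the quoted numbers can be turned into a rough per-request estimate. This is a minimal sketch using only the prices and token volumes cited above; the workload size (20,000 input tokens, 4,000 output tokens) is a made-up example, and `request_cost` is a hypothetical helper, not part of any Google SDK.

```python
# Prices quoted in the video for Gemini 3.5 Flash, in USD per million tokens.
INPUT_PRICE_PER_M = 1.50
OUTPUT_PRICE_PER_M = 9.00


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the quoted per-million-token prices."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Hypothetical workload: 20,000 input tokens and 4,000 output tokens per request.
example_cost = request_cost(20_000, 4_000)

# Growth in monthly token volume cited in the recap:
# 9.7 trillion two years earlier, 3.2 quadrillion now.
growth_factor = 3.2e15 / 9.7e12

print(f"cost per request: ${example_cost:.3f}")
print(f"token volume growth: about {growth_factor:.0f}x")
```

With these inputs the output side dominates the bill even though it is a fifth of the token count, which is worth keeping in mind for agent workloads that generate a lot of text.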

10 weird OSS projects you need right now

Channel: Fireship · Published: May 27 · Duration: ~10 min (estimated)

Fireship compiled ten overlooked open-source projects as a counter-narrative to AI-generated software saturation — "Underneath the AI sewage layer and below the prompt bros and notion template goblins, there are still real humans building insane, beautiful, and deeply unnecessary software." 3
Ratty is a 3D terminal emulator built in Rust using the Bevy game engine; the cursor is a rotating 3D rat, the viewport can be tilted, and the whole thing uses 300MB of RAM. CUDA Oxide is Nvidia's quietly published Rust GPU kernel tool, letting developers write kernels with a #[kernel] annotation that compiles directly to PTX — no FFI, no C++ required. 3
Terminal Phone is a Tor-based P2P voice and text shell script with no server, no account, and no phone number — end-to-end encrypted. Honker adds Postgres-style notify/listen to SQLite as a Rust extension, enabling persistent pub/sub, task queues, event streams, and cron scheduling. 3
Other entries: They Live Ad Blocker (replaces ads with 1980s sci-fi horror imagery), Wario Synth (in-browser Game Boy chiptune synthesizer using Web Audio API), Exipedia (Wikipedia as infinite TikTok-style scroll with a 40MB offline Wikipedia and an in-browser algorithm), Pewtor (a full desktop environment running in the browser with taskbar, draggable windows, file manager, and terminal). 3
Each project includes a GitHub link in the description. Fireship's framing positions the list as evidence that human-driven open-source creativity survives the AI content wave — a premise that will land differently depending on where you sit on that debate. 3

Verdict: Watch — The CUDA Oxide entry alone is worth clicking through to GitHub. The ten projects take under 10 minutes to survey and cover enough ground (Rust tooling, privacy tools, browser-based computing) to surface at least one thing relevant to most developers.
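To make the Honker entry more concrete, here is a rough illustration of what persistent pub/sub on top of SQLite can look like. This is not Honker's API (the video summary does not show it); it is a polling-based sketch using only Python's standard `sqlite3` module, and the table name, `publish`, and `poll` are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "channel TEXT NOT NULL, "
    "payload TEXT NOT NULL)"
)


def publish(channel: str, payload: str) -> None:
    """Append a message to a channel; the row persists until it is deleted."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO events (channel, payload) VALUES (?, ?)",
            (channel, payload),
        )


def poll(channel: str, after_id: int) -> list[tuple[int, str]]:
    """Return (id, payload) pairs on the channel newer than after_id, oldest first."""
    rows = conn.execute(
        "SELECT id, payload FROM events WHERE channel = ? AND id > ? ORDER BY id",
        (channel, after_id),
    )
    return rows.fetchall()


publish("jobs", "resize-image")
publish("jobs", "send-email")
messages = poll("jobs", after_id=0)
print(messages)
```

A subscriber remembers the last `id` it processed and passes it back to `poll`, so messages survive restarts. A notify/listen extension would presumably replace the polling loop with a push-style wakeup, which is the part a plain table cannot provide.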

Theo - t3.gg

Five videos this week from Theo (Theodore Browne / t3.gg), published May 20–27. Taken together they form a fairly complete picture of the current AI coding tool ecosystem — the individual videos each stand alone, but there's obvious thematic continuity.

How I code with AI changed a lot

Channel: Theo - t3.gg · Published: May 27 · Duration: ~45–60 min (estimated from transcript length)

Theo documented a full workflow migration away from Cursor + Plan Mode + Claude Opus toward GPT-5.5 + Codex harness + T3 Code (his open-source agent management application). The switch was driven by OpenAI providing 10× API credits through an event, combined with frustration at Anthropic limiting non-CLI Claude Code access. 4
Core toolchain: T3 Code (desktop app for managing coding agents) + Codex CLI + remote Mac Mini accessed via Tailscale. He tested Codex's desktop app for remote coding and found it unusable (missing model selector, 50/50 paste success rate for images, terminal latency up to two minutes). T3 Code with remote hosting was the solution. 4
Context management strategy: clone relevant repos into a scratch directory for the agent to reference directly, rather than relying on documentation descriptions. The agent can read actual code patterns from the codebase it needs to match. 4
Voice input via Wispr Flow for prompt composition: Theo's argument is that speaking naturally produces higher-quality prompts than typing. He also pushed back on reliance on skills and plugins, and on terminal-based agent coding: "I really, really don't like doing agent coding in terminals anymore. A good desktop app for coding will shit all over a CLI any day." 4
The video previews T3 Code's upcoming React Native mobile app and the development status of Lakebed, his new full-stack framework. Anthropic's decision to restrict non-CLI Claude Code access (reportedly dropping from $5,000 to $200 in API quota for third-party applications) is described as a hostile move that forced T3 Code to wrap the Claude CLI in a terminal UI rather than using native API calls. 4

Verdict: Watch — The remote coding setup comparison contains practical data not found elsewhere. Useful to anyone choosing between Codex, T3 Code, or Cursor for their day-to-day workflow. Long (potentially an hour) but structured enough that the toolchain sections can be watched at 1.5×.
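The scratch-directory approach to context management is easy to script. The sketch below is one possible way to do it, not Theo's actual setup: it assumes `git` is on the PATH, and the function names and shallow-clone choice are illustrative.

```python
import subprocess
from pathlib import Path


def clone_command(repo_url: str, scratch_dir: Path) -> list[str]:
    """Build a shallow `git clone` command that targets scratch_dir/<repo name>."""
    name = repo_url.rstrip("/").removesuffix(".git").rsplit("/", 1)[-1]
    return ["git", "clone", "--depth", "1", repo_url, str(scratch_dir / name)]


def clone_references(repo_urls: list[str], scratch_dir: Path) -> None:
    """Clone each reference repo into the scratch directory, skipping existing ones."""
    scratch_dir.mkdir(parents=True, exist_ok=True)
    for url in repo_urls:
        cmd = clone_command(url, scratch_dir)
        if Path(cmd[-1]).exists():
            continue  # already cloned on an earlier run
        subprocess.run(cmd, check=True)
```

Calling `clone_references([...], Path(".scratch"))` with the repos a task depends on, and then telling the agent to read from `.scratch`, gives it real code to pattern-match against. Adding the scratch directory to `.gitignore` keeps the reference clones out of the working repo.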

Claude Code vs Codex vs Cursor (an honest comparison)

Channel: Theo - t3.gg · Published: May 26 · Duration: ~45–60 min (estimated)

Theo's framework for comparison: not "which model is smarter" but "which philosophy is each tool optimizing for." Claude Code optimizes for the feeling of productivity — a slot-machine UI, high token spend, and features (Pet Mode, Sub-Agent, Lo-fi Radio) designed for Twitter screenshots. Codex optimizes for actual productivity — minimal UI, high token efficiency (GPT-5.5 used one-third the tokens of Sonnet 4.6 for higher benchmark scores), computer-use verification of code changes. 5
Cursor optimizes for the cloud-native future: the browser version can spin up a full GUI Linux sandbox, trigger agent coding via Slack bot, and return an automatically-generated verification video showing the changes working. "OpenAI is trying way harder to be token efficient, but they don't want to compromise on the accuracy of the solution," Theo said. 5
Anthropic and OpenAI are both subsidizing heavily ($200/month subscriptions delivering $4,000+ of equivalent API value), effectively running a subsidy war against Cursor's per-API-call model. Cursor cannot match this pricing at scale without similar lab backing. 5
Claude Code desktop app issues documented: Safari in-app login flow blocks 1Password, threads do not sync between CLI and desktop versions, adding new projects is broken. "Claude Code is as much a marketing tool as it is a developer tool. Anthropic is largely using Claude Code to push the idea that Anthropic is the best thing to use for building with AI." 5
Cursor's cloud sandbox capability — graphical Linux instance with automated verification video — is described as having no equivalent in either Codex or Claude Code for remote or async coding workflows. 5

Verdict: Watch — The philosophy-level framing is genuinely useful for tool selection. This is more valuable than benchmark comparisons because it identifies what each tool's makers consider success, which predicts where future improvements will land.

Cursor just crushed Claude Code

Channel: Theo - t3.gg · Published: May 25 · Duration: ~40–55 min (estimated)

Image: per-task cost bar chart comparing Cursor, OpenAI, and Claude Code in Theo's Fish Slap benchmark test 6

Cursor released Composer 2.5, a distilled model based on Moonshot Kimi K2.5's open weights with substantial reinforcement learning post-training. On Cursor Bench it scored 63%, compared to GPT-5.5 at 64% and Opus 4.7 at 65% — but at $0.50/M input and $2.50/M output, roughly one-sixth the cost of Sonnet ($3/$15). 6
Training method: targeted textual feedback RL — inserting hints at error positions and guiding a student model via a teacher model. The training dataset was 25× larger than what was used for Composer 2, including techniques like deleting and recovering functionality to generate synthetic training data. SpaceX AI provided compute equivalent to 10× the total compute used to train Kimi K2.5. 6
Gemini 3.5 Flash direct comparison: scored below Composer 2.0 on Cursor Bench and cost $1.94 per coding task in the same Fish Slap test, more expensive and less capable than Composer 2.5 ($0.56/task). "I just don't see why anyone would use 3.5 Flash for almost anything. It's just not a good option." 6
Key limitation: Composer 2.5 is unavailable outside Cursor — no public API, no independent benchmark, Cursor Bench scores are not publicly auditable. Theo's own Fish Slap game test showed Composer 2.5 successfully implemented the core game mechanic but failed on first page load (fixed after one correction); Gemini 3.5 Flash produced broken code and failed completely. 6
SpaceX is reportedly in late-stage acquisition discussions for Cursor at a $60 billion valuation (S-1 filed). Theo's read is that Composer 2.5's distillation success demonstrates RL post-training on open weights can reach near-frontier performance, which has pricing implications for every lab selling coding APIs. 6

Verdict: Watch — If you care about the price-performance curve for coding models, this is the most concrete data available this week. The distillation methodology writeup is also worth reading independently of the tool comparison.
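The "roughly one-sixth" claim and the per-task gap can be checked directly from the figures quoted in the video. A quick sketch of the arithmetic, using only those numbers:

```python
# USD per million tokens (input, output), as quoted in the video.
COMPOSER_2_5 = (0.50, 2.50)
SONNET = (3.00, 15.00)

input_ratio = COMPOSER_2_5[0] / SONNET[0]
output_ratio = COMPOSER_2_5[1] / SONNET[1]

# Per-task cost from Theo's Fish Slap test, in USD.
GEMINI_3_5_FLASH_TASK = 1.94
COMPOSER_2_5_TASK = 0.56
task_cost_multiple = GEMINI_3_5_FLASH_TASK / COMPOSER_2_5_TASK

print(f"input price ratio: {input_ratio:.3f}")
print(f"output price ratio: {output_ratio:.3f}")
print(f"Gemini 3.5 Flash per-task cost multiple: {task_cost_multiple:.2f}x")
```

Both list-price ratios come out to exactly one-sixth. The per-task multiple is a separate measurement because it also reflects how many tokens each model actually spent on the task, not just the price per token.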

This is bad...

Channel: Theo - t3.gg · Published: May 21 · Duration: ~30–40 min (estimated)

On May 19, GitHub confirmed an intrusion into internal repositories. The attack vector: a malicious version of the Nx Console VS Code extension (2.2 million installs) was live for 18 minutes on the VS Code Marketplace (12:30–12:48 UTC) and 36 minutes on Open VSX. The Nx Console maintainer's GitHub token had been stolen in a prior supply chain attack (mini-shyllet). 7
Approximately 3,800 GitHub internal repositories were reportedly stolen; GitHub confirmed only that the reported scope was "in the right direction." The 18-minute window was sufficient because VS Code's auto-update triggers on any gallery interaction, not just when explicitly updating extensions. 7
Theo's critique centers on three missing infrastructure features: automatic audits when high-download packages receive updates, a staging buffer window before new versions go live, and version rollback capability. "NPM was so far ahead when it dropped that they haven't felt the need to fix things since and it shows now more than ever." 7
Socket (supply chain security company) closed a $60M Series C at a $1B valuation around this period, which Theo used as a signal that enterprise demand for package security tooling has reached critical mass. 7
The mini-shyllet attack previously stole tokens at scale; Theo's view is that attackers are now burning through that stolen token inventory in waves — this GitHub breach is not an isolated incident. "It's time for Microsoft to wake the fuck up to the reality they've put us in." 7

Verdict: Watch — If you use VS Code and npm, this is a direct threat model update. Theo's proposed fixes are concrete and actionable, not generic security advice. The attack chain (token theft → extension compromise → auto-update propagation → internal repo access) is worth understanding even if you're not in the GitHub ecosystem.
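Of the three fixes, the staging buffer is the simplest to reason about: a client refuses any version that has not been public for some minimum time. The sketch below shows that check in isolation. The 24-hour cooldown is a hypothetical value (the video does not propose a specific duration), the calendar date is a placeholder, and `past_cooldown` is not an existing VS Code or npm feature.

```python
from datetime import datetime, timedelta, timezone


def past_cooldown(published_at: datetime, now: datetime, cooldown: timedelta) -> bool:
    """Return True once a release has been public for at least `cooldown`."""
    return now - published_at >= cooldown


# Hypothetical policy value; the video does not name a duration.
COOLDOWN = timedelta(hours=24)

# The malicious build was live from 12:30 to 12:48 UTC on the Marketplace.
# The calendar date here is a placeholder.
published = datetime(2026, 1, 1, 12, 30, tzinfo=timezone.utc)
pulled = published + timedelta(minutes=18)

# Would any client honoring the cooldown have installed it before it was pulled?
installable_during_window = past_cooldown(published, pulled, COOLDOWN)
print(installable_during_window)
```

Under this policy, a release that is pulled within the cooldown never reaches clients that auto-update. The trade-off is that legitimate security patches are delayed by the same amount unless there is a reviewed fast path.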

I'm scared to make this video

Channel: Theo - t3.gg · Published: May 20 · Duration: ~20–25 min (estimated)

Theo opened by disclosing that his previous critical video about Google's Anti-Gravity IDE led YouTube to demonetize his channel, stop recommending his content, and flag it for "dishonest behavior" — framing the current video as a personal risk. 8
Gemini 3.5 Flash pricing: $1.50/M input, $9/M output — more than 20× the cost of Gemini 2.0 Flash and 3× Gemini 3 Flash. In Theo's Fish Slap coding test, Gemini 3.5 Flash was the only model to fail completely (broken code, non-functional game mechanics, corrupted assets). Actual usage cost ran close to twice that of Gemini 3.1 Pro due to token waste from unnecessary output generation. 8
Google shut down the open-source Gemini CLI (100K+ GitHub stars, 6,000+ merged PRs, hundreds of contributors) with no public warning, replacing it with the closed-source Anti-Gravity CLI. The Anti-Gravity CLI had multiple launch bugs: broken scrolling, non-functional Control+C, displaying the user's email address, input field jitter, requiring /exit to quit. The original Gemini CLI will stop working on June 18. 8
Google Cloud reportedly suspended Railway's account (monthly spend: $2M+) without explanation, taking down all of Railway's web services. Theo referenced a prior Google Cloud incident from two years earlier in which a private cloud subscription for UniSuper (a $135B pension fund) was accidentally deleted. 8
Theo's framing: "Google is not just a company that doesn't care. It's a company that's incapable of caring in its current structure." He named three former Gemini CLI team members (Demitri, Jack, and Gal) who had built strong community trust — their roles have since been absorbed by the Anti-Gravity team. 8

Verdict: Watch — The three issues Theo documents (model pricing, open-source abandonment, cloud reliability) are independent of each other and each has direct implications for anyone building on Google's infrastructure. The demonetization context adds relevant weight to understanding YouTube's relationship with creator criticism of Google products.

Channels checked — no new content this week

Lex Fridman (last published May 6), Machine Learning Street Talk (last published March 13), Y Combinator (last published April 24), a16z (last published April 14), IBM Technology (last published April 25), and six other channels in the startup/VC/research category had no new videos during the May 20–27 window. Lex Fridman's Podcast #497 is likely the next release to watch from this group.

Reference sources