
Best of your X follows: June 4
Today: Claude Mythos already hit the METR task-horizon milestone superforecasters said wouldn't happen until year-end; Ethan Mollick flags why AI acceleration feels invisible at the task level; Anthropic maps 832 malicious accounts against MITRE ATT&CK; OpenAI upgrades GPT-Rosalind for life sciences; Greg Brockman outlines a democratic AI governance blueprint; Andrew Ng covers LLM serving fundamentals; and Paul Graham notes YC startups' fundraising problem has flipped.

A short list today, but each item has some depth: OpenAI ships a life-sciences model upgrade and outlines how it wants AI governed; Anthropic maps real-world AI-enabled cyberattacks against a security framework; Ethan Mollick flags two under-discussed phenomena, the METR task-horizon milestone and the growing gap between what AI can do and what it feels like it can do; Andrew Ng breaks down LLM serving fundamentals; and Paul Graham notes that the biggest fundraising problem for YC startups has inverted.
AI safety and research
Anthropic studied 832 malicious accounts — here's what held up
Anthropic analyzed 832 accounts engaged in AI-enabled cyberattacks and mapped their behavior onto MITRE ATT&CK, a widely used knowledge base of adversary tactics and techniques. The study is one of the first to apply that framework to AI-assisted threat actors at scale [1].
The goal: find out which existing defensive techniques actually hold against AI-assisted offense, and which assumptions need updating. The full analysis is on Anthropic's blog.
Claude Mythos already hit the METR milestone superforecasters said would take until year-end
In early May, the best superforecasters predicted that, by December 2026, the longest METR 80% task-horizon score would reach 3–4 hours. In late May, Claude Mythos reached that range [2].
That's an eight-month forecast compressed into three weeks. Ethan Mollick flagged this as a data point worth sitting with: not as proof of AGI, but as a concrete signal that frontier capability is outrunning even well-calibrated expert timelines.
Why AI feels slow even as it gets faster
Mollick also posted a short observation that's been circulating: models are improving by large margins on benchmarks, but because current frontier models are already strong, users don't feel the difference on most individual tasks [3].
The implication he draws: acceleration is real, but it's increasingly invisible at the task level — you only see it when you zoom out to aggregate scores or novel capability ceilings.
Mollick also recommends reading Anthropic's new RSI paper
He called Anthropic's piece on recursive self-improvement "a bit of navel-gazing, some marketing, and a lot of very sincere beliefs about what Anthropic thinks is likely in the near future of AI that you probably want to be aware of" [4].
Model releases and enterprise tools
OpenAI upgrades GPT-Rosalind for life sciences
OpenAI announced expanded capabilities for GPT-Rosalind, its model series built for life sciences research at enterprise scale [5].
The update brings GPT-5.5's agentic coding and tool use to drug discovery and experimental workflows. OpenAI is pitching it as the intersection of general-purpose reasoning and domain-specific scientific depth. Access is targeted at enterprise partners in biopharma and related fields.
OpenAI proposes a blueprint for governing frontier AI democratically
Greg Brockman posted that OpenAI has put out a framework for "democratic governance of frontier AI": how to build durable public institutions for frontier AI safety in the US [6].
The short post was light on detail, but the framing signals OpenAI is positioning itself as an actor that wants external governance structures rather than just resisting them.
AI tools and developer ecosystem
Andrew Ng's new vLLM serving course covers the basics most people skip
A new short course on the DeepLearning.AI platform, built with Red Hat and taught by Cedric Clyburn, covers the fundamentals of serving LLMs to many concurrent users at low latency [7].
The mechanics: a 70B-parameter model takes ~140 GB just to load its weights at 16-bit precision (2 bytes per parameter). Each active request also needs its own KV cache, the block of memory that stores the attention keys and values for the tokens processed so far. The course covers quantization to reduce the memory footprint and vLLM's approach to managing memory across parallel requests.
Skills covered: quantize a model, benchmark throughput vs. accuracy tradeoffs, serve with vLLM.
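The memory arithmetic in this item can be sketched in a few lines. This is a rough estimate, not material from the course: it assumes 16-bit weights (2 bytes per parameter), which is what puts a 70B model at about 140 GB, and the layer count, KV-head count, and head dimension below are illustrative values picked for the example, not the specification of any particular model.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Memory needed to hold the weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


def kv_cache_bytes_per_token(num_layers: int, num_kv_heads: int,
                             head_dim: int, bytes_per_value: int = 2) -> int:
    """Bytes of KV cache for one token: one key and one value vector per layer."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value


def kv_cache_gb(context_tokens: int, concurrent_requests: int,
                bytes_per_token: int) -> float:
    """Total KV cache across all active requests, in decimal gigabytes."""
    return bytes_per_token * context_tokens * concurrent_requests / 1e9


# Weights: 16-bit versus a 4-bit quantized copy of the same 70B model.
fp16_weights = weight_memory_gb(70e9, 16)   # 140.0 GB
int4_weights = weight_memory_gb(70e9, 4)    # 35.0 GB

# KV cache: hypothetical shape (80 layers, 8 KV heads, head dimension 128).
per_token = kv_cache_bytes_per_token(num_layers=80, num_kv_heads=8, head_dim=128)
cache_total = kv_cache_gb(context_tokens=4096, concurrent_requests=32,
                          bytes_per_token=per_token)

print(f"fp16 weights: {fp16_weights:.0f} GB, 4-bit weights: {int4_weights:.0f} GB")
print(f"KV cache per token: {per_token} bytes")
print(f"KV cache for 32 requests x 4096 tokens: {cache_total:.1f} GB")
```

Under these assumptions, 32 concurrent requests at 4,096 tokens each need roughly 43 GB of KV cache on top of the weights, which is why both quantization and careful cache management matter when serving many users.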
Startups and venture
Paul Graham: YC's funding problem has flipped
At a YC event last night, Graham and Jessica Livingston were interviewed about YC's early days. Graham noted that startups used to struggle to raise money after YC. Now they face the opposite problem: so much capital is available that they have to be careful not to raise too much [8].
No policy recommendation attached — just a brief observation that the constraint has inverted.
Reference sources
- [1] Anthropic: AI-enabled cyber threats and MITRE ATT&CK
- [2] Ethan Mollick on Claude Mythos METR score
- [3] Ethan Mollick on feeling the acceleration
- [4] Anthropic Institute: Recursive Self-Improvement
- [5] OpenAI: GPT-Rosalind update
- [6] Greg Brockman on OpenAI's AI governance blueprint
- [7] Andrew Ng on efficient LLM serving course
- [8] Paul Graham on YC fundraising today vs. early days