Best of your X follows: June 4

A light day for news, but a technically deep one: OpenAI ships a life-sciences model upgrade and outlines how it wants AI governed; Anthropic maps real-world AI-enabled cyberattacks against a security framework; Ethan Mollick flags two under-discussed phenomena: the METR task-horizon milestone and the growing gap between what AI can do and what it feels like it can do. Andrew Ng highlights a new course on LLM serving fundamentals. Paul Graham notes that YC startups' biggest problem has inverted.

AI safety and research

Anthropic studied 832 malicious accounts — here's what held up

Anthropic analyzed 832 accounts engaged in AI-enabled cyberattacks and mapped their behavior onto MITRE ATT&CK, a widely used knowledge base of adversarial tactics and techniques. The study is one of the first to apply that framework to AI-assisted threat actors at scale 1.

The goal: find out which existing defensive techniques actually hold against AI-assisted offense, and which assumptions need updating. The full analysis is on Anthropic's blog.


Claude Mythos already hit the METR milestone superforecasters said would take until year-end

In early May, the best superforecasters predicted that, by December 2026, the longest METR 80% task-horizon score would reach 3–4 hours. In late May, Claude Mythos achieved that number 2.

That's an eight-month forecast compressed into three weeks. Ethan Mollick flagged this as a data point worth sitting with: not as proof of AGI, but as a concrete signal that frontier capability is outrunning even well-calibrated expert timelines.


Why AI feels slow even as it gets faster

Mollick also posted a short observation that's been circulating: models are improving by large margins on benchmarks, but because current frontier models are already strong, users don't feel the difference on most individual tasks 3.

The implication he draws: acceleration is real, but it's increasingly invisible at the task level — you only see it when you zoom out to aggregate scores or novel capability ceilings.

Mollick also recommends reading Anthropic's new RSI paper

He called Anthropic's piece on recursive self-improvement "a bit of navel-gazing, some marketing, and a lot of very sincere beliefs about what Anthropic thinks is likely in the near future of AI that you probably want to be aware of" 4.

Model releases and enterprise tools

OpenAI upgrades GPT-Rosalind for life sciences

OpenAI announced expanded capabilities for GPT-Rosalind, its model series built for life sciences research at enterprise scale 5.

The update brings GPT-5.5's agentic coding and tool use to drug discovery and experimental workflows. OpenAI is pitching it as general-purpose reasoning combined with domain-specific scientific depth. Access is targeted at enterprise partners in biopharma and related fields.


OpenAI proposes a blueprint for governing frontier AI democratically

Greg Brockman posted that OpenAI has put out a framework for "democratic governance of frontier AI" — how to build durable public institutions for frontier AI safety in the US 6.

The short post was light on detail, but the framing signals OpenAI is positioning itself as an actor that wants external governance structures rather than just resisting them.

AI tools and developer ecosystem

Andrew Ng's new vLLM serving course covers the basics most people skip

A new short course on the DeepLearning.AI platform, built with Red Hat and taught by Cedric Clyburn, covers the fundamentals of serving LLMs to many concurrent users at low latency 7.

The mechanics: a 70B-parameter model takes ~140 GB just to load weights at 16-bit precision (2 bytes per parameter). Each active request also needs its own KV cache, the block of memory that holds the attention keys and values for the tokens processed so far, so it grows with context length and with the number of concurrent requests. The course covers quantization to reduce memory footprint and vLLM's approach to memory management across parallel requests.
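
The memory arithmetic behind those figures can be sketched in a few lines of Python. This is a rough back-of-the-envelope estimate, not material from the course: the layer count, KV-head count, and head dimension below are hypothetical values chosen to resemble a 70B-class model with grouped-query attention, and the function names are made up for illustration.

```python
"""Rough GPU memory estimate for serving an LLM (illustrative only)."""

GB = 1e9  # decimal gigabytes, matching the ~140 GB figure quoted above


def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the model weights."""
    return n_params * bytes_per_param / GB


def kv_cache_bytes_per_token(n_layers: int, n_kv_heads: int,
                             head_dim: int, bytes_per_value: int) -> int:
    """KV cache cost of one token: one key and one value vector per layer."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value


def kv_cache_gb(context_tokens: int, concurrent_requests: int,
                per_token_bytes: int) -> float:
    """Total KV cache if every request holds a full context."""
    return context_tokens * concurrent_requests * per_token_bytes / GB


if __name__ == "__main__":
    n_params = 70e9

    # Weights at different precisions: 16-bit, 8-bit, 4-bit.
    for label, width in [("fp16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
        print(f"{label}: {weight_memory_gb(n_params, width):.0f} GB of weights")

    # ASSUMED architecture values (hypothetical, not from the source).
    per_token = kv_cache_bytes_per_token(
        n_layers=80, n_kv_heads=8, head_dim=128, bytes_per_value=2
    )
    print(f"KV cache per token: {per_token / 1024:.0f} KiB")

    # 32 concurrent requests, each holding a 4096-token context.
    total = kv_cache_gb(4096, 32, per_token)
    print(f"KV cache for 32 x 4096 tokens: {total:.1f} GB")
```

Under these assumptions, quantizing weights from 16-bit to 4-bit cuts the weight footprint to a quarter, while the KV cache still scales linearly with both context length and concurrency. That is why serving many users depends on cache management as well as model size.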

Skills covered: quantize a model, benchmark throughput vs. accuracy tradeoffs, serve with vLLM.

Startups and venture

Paul Graham: YC's funding problem has flipped

At a YC event last night, Graham and Jessica Livingston were interviewed about YC's early days. Graham noted that startups used to struggle to raise money after YC — now they face the opposite problem: so much capital that they have to be careful not to raise too much 8.

No policy recommendation attached — just a brief observation that the constraint has inverted.


Reference sources