Drug discovery agents need better geometry (2026)

The interesting claim in this Latent Space episode is not that AI can propose more drug candidates. That part is almost too easy. The harder claim is that an AI system may now be accurate enough to reason about whether a molecule is physically worth making.

Evan Feinberg, Genesis Molecular AI's co-founder and CEO, and Sergey Edunov, Genesis's CTO and former Meta researcher who led Llama 2 and Llama 3 pretraining, argue that the frontier of generative AI has moved into 3D structure prediction: modeling how a small molecule sits inside a flexible protein binding pocket 1. That makes the episode less a biotech company pitch than a useful test case for a broader AI question: when do agents become useful in the physical world?

The central argument: agents need geometry before autonomy

In software, coding agents became more useful as base models crossed a reliability threshold. Feinberg and Edunov make the same argument for drug discovery. A drug-discovery agent is not valuable because it can brainstorm molecules forever; it is valuable only if its underlying models can predict binding poses, potency, toxicity, and related properties well enough that medicinal chemists would trust the next experiment.

That distinction matters because small-molecule discovery is a brutal search problem. Edunov says there are around 10^60 possible drug-like small molecules, so the task is not merely finding a needle in a haystack; in Feinberg's follow-up line, it may be closer to finding hay in a needle stack 2. Most candidate molecules are useless, unsafe, hard to synthesize, or strong in one property while failing another.
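
To give the 10^60 figure some scale, here is a back-of-the-envelope calculation. The throughput of one billion molecules per second is a hypothetical assumption chosen for illustration, not a number from the episode, and the age of the universe is rounded.

```python
# Back-of-the-envelope: how long would brute-force enumeration take?
SEARCH_SPACE = 10**60            # estimate quoted in the episode
RATE_PER_SECOND = 10**9          # ASSUMPTION: hypothetical scoring throughput
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 1.38e10  # approximate, rounded


def years_to_enumerate(space: int, rate_per_second: int) -> float:
    """Years needed to visit every molecule once at a fixed rate."""
    return space / rate_per_second / SECONDS_PER_YEAR


years = years_to_enumerate(SEARCH_SPACE, RATE_PER_SECOND)
lifetimes = years / AGE_OF_UNIVERSE_YEARS
print(f"{years:.2e} years, or {lifetimes:.1e} universe lifetimes")
```

Even at that generous rate the enumeration takes on the order of 10^43 years, which is why the search has to be guided by predictive models rather than exhausted.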

The episode's best explanation is the tension between binding and ADMET: absorption, distribution, metabolism, excretion, and toxicity. A molecule that binds tightly may be greasy; a greasy molecule may dissolve poorly and never reach the tissue where it is needed 1. So the promise of an AI agent is not one-step invention. It is a closed loop that can generate candidates, inspect their 3D poses, reason about tradeoffs, and decide which experiment deserves lab time.
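
One common way to express this kind of tradeoff in code is a Pareto filter: keep only candidates that no other candidate beats on every property at once. The sketch below is a generic illustration, not a description of how Genesis ranks molecules. The `Candidate` type, the two score axes, and all numbers are hypothetical.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    potency: float     # hypothetical score, higher is better
    solubility: float  # hypothetical score, higher is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b on both axes and strictly better on one."""
    no_worse = a.potency >= b.potency and a.solubility >= b.solubility
    better = a.potency > b.potency or a.solubility > b.solubility
    return no_worse and better


def pareto_front(cands: list[Candidate]) -> list[Candidate]:
    """Keep candidates that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


# ASSUMPTION: made-up scores for illustration only.
candidates = [
    Candidate("greasy_binder", 9.0, 2.0),
    Candidate("balanced", 7.0, 6.0),
    Candidate("soluble_weak", 3.0, 9.0),
    Candidate("dominated", 6.0, 5.0),
]
front = pareto_front(candidates)
print([c.name for c in front])
```

The tight but greasy binder survives the filter alongside the balanced molecule, so the filter narrows the field without deciding the tradeoff. That decision is the part the closed loop, and the chemist, still has to make.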

Why diffusion, not another LLM metaphor

The episode is also a reminder that not all AI progress looks like language modeling. Feinberg says the field had to wait for the right primitive, and that primitive turned out to be diffusion: the iterative denoising approach now familiar from image generation, adapted here to 3D molecular structures 2.

Genesis's PEARL model, short for Place Every Atom at the Right Location, is a protein-ligand cofolding model. It predicts the 3D structure of a protein and a candidate ligand together, rather than treating the protein as a rigid lock and the ligand as a static key. Genesis describes PEARL as using an SO(3)-equivariant diffusion module, meaning the model is built to respect 3D rotational symmetry instead of relearning from data that a rotated molecule is still the same molecule 3.
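
The symmetry property itself is easy to state in code: a function f is rotation-equivariant if rotating the input and then applying f gives the same result as applying f and then rotating. The sketch below checks that for a toy coordinate update. It says nothing about PEARL's actual architecture; `toy_update`, the coordinates, and the angles are all hypothetical.

```python
import math


def rotate(points, yaw, pitch):
    """Rotate points about the z-axis by `yaw`, then about the x-axis by `pitch` (radians)."""
    out = []
    for x, y, z in points:
        x, y = (x * math.cos(yaw) - y * math.sin(yaw),
                x * math.sin(yaw) + y * math.cos(yaw))
        y, z = (y * math.cos(pitch) - z * math.sin(pitch),
                y * math.sin(pitch) + z * math.cos(pitch))
        out.append((x, y, z))
    return out


def toy_update(points, step=0.1):
    """Toy equivariant map: nudge every atom a fraction of the way toward the centroid."""
    n = len(points)
    centroid = tuple(sum(p[i] for p in points) / n for i in range(3))
    return [tuple(p[i] + step * (centroid[i] - p[i]) for i in range(3)) for p in points]


def max_gap(a, b):
    """Largest coordinate-wise difference between two point sets."""
    return max(abs(u - v) for p, q in zip(a, b) for u, v in zip(p, q))


# ASSUMPTION: arbitrary coordinates in angstroms, arbitrary rotation angles.
atoms = [(0.0, 0.0, 0.0), (1.4, 0.0, 0.0), (2.1, 1.2, 0.3), (0.5, -0.9, 1.1)]
rotate_then_update = toy_update(rotate(atoms, 0.7, -1.2))
update_then_rotate = rotate(toy_update(atoms), 0.7, -1.2)
gap = max_gap(rotate_then_update, update_then_rotate)
print(gap < 1e-9)
```

A model with this property built in does not have to spend scarce training data learning that every orientation of the same complex is equivalent.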

That design choice is important because public experimental data is scarce. The global public repository of protein-ligand structures is tiny compared with the internet-scale corpora used for language models. Genesis says PEARL leans on physics-generated synthetic structures and inference-time steering, a rough analogue to giving a model more thinking budget at prediction time, except the intermediate object is not a chain of text but a possible molecular geometry 4.

The benchmark fight is really about usefulness

The most pointed part of the conversation is Feinberg's critique of the field's comfort with a 2-angstrom RMSD cutoff. RMSD, or root-mean-square deviation, measures how far, averaged over atoms, a predicted molecular pose sits from the experimentally observed structure. The blunt version of Feinberg's argument: a pose within 2 angstroms can still be wrong enough to mislead a chemist.

He gives a concrete reason. If an aromatic ring flips, the pose may still look plausible under a loose metric while no longer modeling the right interactions. Hydrogen bonds, which often determine whether a ligand binds well, operate over a much shorter range than that, so a pose can pass the cutoff while missing them. In Feinberg's words, a model sitting at 1.8 or 1.9 angstrom RMSD is "slop, most likely" 2.
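
A toy calculation makes the ring-flip point concrete. The sketch below assumes an idealised six-membered ring with one substituent atom and a 25-heavy-atom ligand in which every other atom is placed perfectly. The geometry and the ligand size are illustrative assumptions, and the RMSD is computed by atom index without alignment or symmetry correction.

```python
import math


def rmsd(pred, ref):
    """Root-mean-square deviation over atoms matched by index (no alignment)."""
    sq = sum((a - b) ** 2 for p, r in zip(pred, ref) for a, b in zip(p, r))
    return math.sqrt(sq / len(ref))


def ring_with_substituent(flip=False):
    """Six ring atoms plus one substituent atom, all in the z = 0 plane.

    flip=True rotates the group 180 degrees about the x-axis, which maps
    (x, y, 0) to (x, -y, 0).
    """
    radii = [1.39] * 6 + [2.74]  # ASSUMPTION: idealised distances in angstroms
    angles = [0, 60, 120, 180, 240, 300, 60]
    sign = -1.0 if flip else 1.0
    return [(r * math.cos(math.radians(a)), sign * r * math.sin(math.radians(a)), 0.0)
            for r, a in zip(radii, angles)]


# ASSUMPTION: 18 further heavy atoms that the model places exactly right.
rest = [(5.0 + 1.5 * i, 0.0, 0.0) for i in range(18)]

reference = ring_with_substituent() + rest
predicted = ring_with_substituent(flip=True) + rest

flip_rmsd = rmsd(predicted, reference)
sub_shift = math.dist(predicted[6], reference[6])
print(f"ligand RMSD = {flip_rmsd:.2f} A, substituent moved {sub_shift:.2f} A")
```

Under these assumptions the whole-ligand RMSD comes out at roughly 1.35 angstroms, comfortably inside a 2-angstrom cutoff, even though the substituent has moved by more than 4.5 angstroms and would no longer be anywhere near its original interaction partner.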

That is not just scorekeeping. If the downstream goal is to decide what molecule to synthesize next, the metric must measure whether the model preserves chemically meaningful interactions. Genesis's technical report makes the same point in a more formal way: PEARL's practical value shows up most clearly under stricter thresholds, including RMSD below 1 angstrom on challenging internal targets 4.

OpenBind is the proof point the episode leans on

The episode's strongest evidence is Genesis's OpenBind result. OpenBind evaluated 802 ligand-protein complexes for EV-A71 2A protease, a target where the binding pocket changes shape when the ligand enters. Classical docking methods struggle because they often assume a more rigid receptor; modeling the target correctly requires induced fit, the protein's movement around the ligand 5.

Genesis reports that its zero-shot PEARL system reached a 78% success rate on OpenBind's joint criteria: RMSD at or below 2 angstroms, PoseBusters physical validity, and LDDT-PLI at or above 0.8. With pocket conditioning, PEARL reached 85% on the same joint criteria, and at the stricter RMSD below 1 angstrom threshold, it reported 60% zero-shot success and 70% pocket-conditioned success 5.
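
The joint criteria amount to a logical AND over three checks per complex. The sketch below shows how such a success rate could be tallied. The `PoseResult` type and the five example rows are hypothetical, not OpenBind data, and the strict variant uses "at or below 1 angstrom" for simplicity where the text above says "below".

```python
from dataclasses import dataclass


@dataclass
class PoseResult:
    rmsd: float              # angstroms, predicted vs. experimental ligand pose
    posebusters_valid: bool  # passed physical-validity checks
    lddt_pli: float          # interaction-level score between 0 and 1


def passes_joint(p, rmsd_cutoff=2.0, lddt_cutoff=0.8):
    """A pose counts only if it clears all three criteria at once."""
    return p.rmsd <= rmsd_cutoff and p.posebusters_valid and p.lddt_pli >= lddt_cutoff


def success_rate(results, **cutoffs):
    """Fraction of poses that pass the joint criteria."""
    return sum(passes_joint(p, **cutoffs) for p in results) / len(results)


# ASSUMPTION: made-up results for illustration only.
results = [
    PoseResult(0.6, True, 0.93),
    PoseResult(1.7, True, 0.85),
    PoseResult(1.2, False, 0.90),  # close pose, physically invalid
    PoseResult(1.9, True, 0.71),   # under 2 angstroms, interactions not preserved
    PoseResult(3.4, True, 0.55),
]
loose = success_rate(results)
strict = success_rate(results, rmsd_cutoff=1.0)
print(loose, strict)
```

Two of the five toy poses clear the 2-angstrom joint criteria and only one clears the stricter cutoff, which is the same pattern of attrition the reported 78% and 60% figures describe.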

The detail that matters most is not the leaderboard placement. It is that Genesis says the result was zero-shot: PEARL used the protein sequence, the ligand's 2D chemical structure, and an apo crystal structure template, without target-specific tuning or data from the target or homologous targets 5. If that generalization holds across more targets, the model becomes more than a benchmark specialist. It becomes part of an experimental loop.

What SAPPHIRE suggests about the next interface

The agentic part of the episode is Genesis's internal system, SAPPHIRE. The hosts describe it as an agentic drug-discovery workflow that can reason about poses, form hypotheses, read literature, use internal tools, and propose candidates for the next iteration 1.

That sounds familiar to anyone following software agents, but the constraint is harsher. A bad coding agent can waste reviewer time. A bad drug-discovery agent can send scientists into expensive, slow, or biologically irrelevant experiments. The useful interface is therefore not "generate me a drug." It is a loop with model confidence, physics checks, wet-lab feedback, and human chemists deciding which hypotheses deserve resources.

This is where the episode's AI lesson travels beyond biotech. Agents become more credible when the environment gives them tight feedback and when their actions can be checked against reality. In drug discovery, that feedback is molecular geometry, lab data, and eventual biological response. The closer the model gets to chemically meaningful accuracy, the less the agent is just a text interface wrapped around uncertainty.

The caveat: this is still a closed, expensive frontier

The episode should not be read as "drug discovery is solved." The most impressive systems discussed here are closed, the strongest comparisons depend on specialized benchmarks, and even a correct binding pose does not prove a molecule will become a safe drug. Clinical success still depends on biology, pharmacokinetics, toxicity, manufacturability, and trials.

But the conversation does change the shape of the question. Instead of asking whether AI can invent molecules, the sharper question is whether models can predict molecular interactions accurately enough to make each lab cycle more informative. If PEARL-like systems keep improving, the scarce resource may shift from candidate generation to experimental judgment: which model-generated hypothesis is reliable enough to test next?
