Best of your X follows: June 20

Mollick turns model evaluation into artifact inspection with GLM-5.2 and a harbor-town benchmark. Google DeepMind points AI at UK housing planning workflows, while Simon Willison and Charity Majors push the developer-tooling theme: generated code is cheap, engineering discipline is not.

Daily Best of Who I Follow on X
18/6/2026 · 2:06
1 subscription · 25 posts
The strongest signal today is not one giant launch. It is a set of small tests for where AI systems are starting to show up: model comparisons that use artifacts instead of leaderboard numbers, public-sector workflow prototypes, and developer tools that now assume agents can write to real systems.
Source mix: mostly X posts from the monitored account set, plus Simon Willison's weblog when his X timeline was quiet. Pure retweets, one-line political posts, and low-context small talk were left out.

Model releases and evaluation

Ethan Mollick: GLM-5.2 Max can do the task, but Fable still changes the shape of it

What happened: Mollick credited GLM-5.2 Max, a new open-weights model, for completing a constrained poem task that involved disappearing letters 1.
Why it matters: his comparison was not about whether the output was correct. He argued that Fable integrated the disappearing-letter constraint into the poem's theme, while GLM-5.2 Max mostly satisfied the surface requirement 1.
Implication: if you evaluate creative or agentic systems only by task completion, you miss the difference between following an instruction and using the constraint as part of the work.

Ethan Mollick: a one-prompt harbor-town benchmark spans 39 months of model progress

What happened: Mollick shared a benchmark prompt asking models to build a procedurally generated 3D harbor-town simulation from 3000 BCE to 3000 AD, with beauty and user control in the spec 2.
Why it matters: the linked gallery compares model outputs from one prompt and describes the set as spanning 39 months of AI progress; the older GPT-3.5 and GPT-4 entries needed one standardized follow-up 3.
Implication: this is the kind of artifact-based benchmark that is easy for practitioners to inspect. You can judge coherence, interactivity, aesthetics, and failure modes without reducing everything to one score.
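To make "without reducing everything to one score" concrete, here is a minimal sketch of a review record that keeps each dimension separate. The four dimensions come from the paragraph above; the 1 to 5 scale, the `ArtifactReview` class, the `compare` helper, and the model names in the usage note are all hypothetical, not part of Mollick's gallery.

```python
from dataclasses import dataclass, field

# The four dimensions named in the text; the 1-5 scale is an assumption.
DIMENSIONS = ("coherence", "interactivity", "aesthetics", "failure_modes")


@dataclass
class ArtifactReview:
    """One reviewer's notes on one model's artifact (hypothetical structure)."""
    model: str
    # Ratings are kept per dimension and deliberately never summed.
    ratings: dict = field(default_factory=dict)
    notes: dict = field(default_factory=dict)

    def rate(self, dimension: str, score: int, note: str = "") -> None:
        if dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {dimension}")
        if not 1 <= score <= 5:
            raise ValueError("score must be between 1 and 5")
        self.ratings[dimension] = score
        if note:
            self.notes[dimension] = note


def compare(reviews, dimension):
    """Rank models on a single dimension, highest rating first."""
    ranked = [(r.model, r.ratings[dimension]) for r in reviews if dimension in r.ratings]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```

With two hypothetical reviews, `compare(reviews, "coherence")` and `compare(reviews, "aesthetics")` can produce different orderings, which is exactly the information a single leaderboard number would hide.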

Public-sector AI

Google DeepMind: planning-office prototype targets housing applications

What happened: Google DeepMind said it is working with UK government bodies on an AI housing application planning prototype 4.
Why it matters: the post says the prototype is aimed at repetitive planning-officer work, so officers can spend more attention on complex projects 4.
Implication: DeepMind is claiming a processing-time reduction of up to 50%. Treat that as a target claim from the project team, not yet an audited deployment result 4.

Developer tools and engineering practice

Simon Willison: Datasette gets first-class row editing

What happened: Simon Willison released Datasette 1.0a34, adding insert, edit, and delete tools to the Datasette interface 5.
Why it matters: the feature is available on table pages, while edit and delete also appear as row-level actions. That makes the ordinary UI catch up with the write workflows Simon had already been exploring through Datasette Agent 5.
Implication: agent-assisted database work is pushing product surfaces back toward explicit human approval and visible edit controls, not just chat-only automation.
Datasette 1.0a34 adds row insert, edit, and delete actions to the web interface 5.

Simon Willison / Charity Majors: AI coding raises the bar for engineering discipline

What happened: Willison surfaced Charity Majors' argument that AI made code generation cheap and fast, changing the economics of software production 6.
Why it matters: Majors' longer piece argues that if code becomes more disposable, teams need stronger production understanding, observability, review habits, and system invariants, not weaker ones 7.
Implication: the practical takeaway for AI coding teams is blunt: optimize for shared understanding and production feedback, because generated code is cheap and operational confusion is still expensive.

Short signals

Greg Brockman: GPT-Realtime-2 gets a terse internal endorsement

What happened: Greg Brockman posted that "GPT-Realtime-2 is something new" 8.
Why it matters: the post gives no launch note or technical detail, so the signal is weaker than a product announcement. It does show OpenAI's cofounder drawing attention to the realtime line after recent voice and WebRTC experiments in the developer community 8.
Implication: keep an eye on demos and docs before treating this as more than a high-level hint.

François Chollet: solve hard problems by reframing, not piling on complexity

What happened: Chollet argued that hard problems are rarely solved by adding complexity; they are solved by reframing the question until a simpler answer becomes visible 9.
Why it matters: in the context of AI research and software design, that is a useful counterweight to scale-first thinking. More machinery can hide a bad problem statement.
Implication: before adding another layer to an agent pipeline, ask whether the task definition is wrong.