
Indie Agent Builders — Week of June 20
Z.ai's GLM-5.2 went MIT open-weights with 744B MoE parameters and Code Arena #2 ranking the same week Fable 5 stayed suspended — its trigger confirmed as a "fix this code" prompt on CVE-laden code. Simon Willison shipped five Datasette alpha releases that completed a multi-year arc from read-only explorer to read-write platform with sandboxed HTML apps. Swyx tracked GLM-5.2 across four AINews issues from launch to community vibe check. On GitHub, headroom (+12.8k stars) led a week dominated by token compression, code intelligence MCP, and skill-security scanning.

リサーチノート
GLM-5.2 dropping at MIT license the day Fable 5 stayed offline was not a coincidence — it was a pressure test, and the open model passed. This week the Fable 5 export control story matured from "access suspended" to "fix this code was the jailbreak," while Simon Willison turned Datasette from a read-only explorer into a read-write platform across five alpha releases. Swyx spent four AINews issues tracing GLM-5.2 from benchmark claim to community vibe check, and headroom overtook last30days-skill as the GitHub trending leader with +12.8k stars.
Simon Willison
Simon Willison is the creator of Datasette (an open-source tool for exploring and publishing data) and co-creator of Django. He publishes at simonwillison.net.
GLM-5.2: the open model that surprised him on frontend coding
On June 17, Simon published his evaluation of GLM-5.2, Z.ai's new open-weights model released the previous day. 1 His headline: "GLM-5.2 is probably the most powerful text-only open weights LLM."
The model is a 744B parameter Mixture of Experts architecture with 40B active parameters per token, MIT licensed, with a 1M token context window (Simon's post cites 753B — a rounding variant in some user posts; launch partners and AINews confirm 744B). 1 On the Artificial Analysis Intelligence Index v4.1, it scored 51 — first among open models, ahead of MiniMax-M3 (44) and DeepSeek V4 Pro (44). 1 API pricing via OpenRouter: $1.40 input / $4.40 output per million tokens — compared to $5/$30 for GPT-5.5 and $5/$25 for Claude Opus 4.5–4.8. 1
What surprised Simon most was its Code Arena WebDev ranking: second overall, behind only Fable 5. He said he had "incorrectly assumed" that image input was a prerequisite for building a truly great frontend coding model — GLM-5.2 is pure text-only and still ranks that high. 1
His SVG tests were mixed. The pelican-on-a-bicycle animation came out clean — smooth motion, all elements working. The Northern Virginia possum on an electric scooter fared worse — no animation attempt, noticeable quality regression from GLM-5.1. Output token consumption is also higher: 43K tokens per Intelligence Index task versus GLM-5.1's 26K and MiniMax-M3's 24K — a cost-per-call consideration for production workflows. 1
For agent builders: GLM-5.2's pricing is about 3× cheaper than Opus 4.8 on output tokens. If your agent does heavy frontend code generation and the task doesn't require vision input, this is the first open model worth benchmarking seriously against the closed frontier.
Datasette goes read-write: five releases in four days
The bigger engineering story this week was Datasette's transformation. Between June 15 and 18, Simon shipped five alpha releases that together completed a multi-year arc: Datasette stopped being a read-only data browser and became a platform where an AI agent can write to a database, and a user can immediately edit that data through the same UI.
The sequence:
datasette-agent 0.3a0 (June 15) added execute_write_sql — a tool that requests user approval before writing to the database, respects user permissions, and works in both the web UI and the terminal. 2 Three new CLI flags cover the approval spectrum: --root (run as root), --yes (auto-approve), --unsafe (both). Practical result: datasette agent chat content.db -m gpt-5.5 --unsafe lets you create tables and insert data through conversation. The terminal mode also gained plain-text tool output, so the approval flow works without a browser.datasette 1.0a34 (June 16) added insert, edit, and delete rows directly in the Datasette UI. 3 Simon's note on why this took so long: "The inspiration for this feature — which is long overdue — was Datasette Agent. I added SQL write support to that the other day which highlighted how absurd it was that you could insert and edit rows via the chat interface but not in the regular Datasette UI!" 3
execute_write_sql approval flow in the chat UI. 2datasette-apps 0.1a2 and 0.1a3 (June 18) introduced a new plugin type: self-contained HTML + JavaScript applications that run inside a sandboxed iframe within Datasette, access data through await datasette.query(database, sql, params?), and operate under a strict Content Security Policy — no external network access, no history API. 4 5 Apps can be hand-written or generated by datasette-agent via its app_create tool. Simon's example — "Datasette Timeline" — renders 1,953 blog posts, news items, and releases as a scrollable timeline.
app_create tool. 4The architectural insight: Datasette's JSON API was already a usable app backend. The Apps plugin makes the apps themselves editable and hostable from within the same instance. Combined with
datasette-acl 0.6a0 (also released June 18, expanding permissions to arbitrary resources), the stack now handles multi-user authorization for both data and applications. 5Also released:
datasette-tailscale 0.1a0 — an experimental plugin that exposes a Datasette instance to a Tailscale network via datasette tailscale mydata.db --ts-authkey xxx --ts-hostname datasette-preview, using the Rust-backed tailscale-rs library. 6 Simon called it "very experimental alpha."For agent builders: the pattern in
datasette-agent 0.3a0 — a tool that requests user approval before side effects and replays from the top on retry — is a concrete, borrowable human-in-the-loop design. It doesn't require orchestration-layer machinery. The tool surface itself carries the approval decision.Fable 5 export controls: "fix this code" was the trigger
The Fable 5 suspension continued through the week, but two new details arrived that changed the technical framing. 7
Kate Moussouris — Luta Security CEO and former Microsoft security strategist — described exactly what triggered the export control. Researchers took open-source code with known CVEs, asked Fable 5 to "review the code for security issues." Fable refused. They then asked it to "fix this code" — and it complied. The researchers manually turned the output into scripts that tested the patches. Moussouris's characterization: the model was doing defensive security correctly. 7
"Defenders need to be able to ask AI to fix the bugs in a file, explain why the fix matters, and write tests that confirm the patch works. That is not a guardrail bypass. It is the most valuable thing an AI model can do for defensive security: executing the find, fix, and test loop defenders run every day." — Kate Moussouris 7
Simon's read from June 14 (2,057 likes): "I'm just glad nobody at the US government thought to try that Fable 5 'jailbreak' against Opus 4.x or GPT 5.x, or I wouldn't be getting anything useful done this weekend at all." 8
An Axios report 9 revealed that Anthropic's Frontier Red Team lead Logan Graham, Safeguards head Dave Orr, and Nicholas Carlini met with the Commerce Department in Washington on June 15. Simon's reaction to the report: "Doesn't sound like we'll be getting Fable back very soon." 10 As of June 18, Fable 5 and Mythos 5 remain suspended. 7 The Kalshi prediction market gives 58% probability of restoration before July 1 and 74% before July 10. 7
Simon's broader concern: non-technical decision-makers have been told for months that "models that can make cyberattacks" are especially dangerous. A security researcher asking a model to fix known CVE code is now labeled a jailbreak. "We're ready to ban any model that can help us protect our code." 7
Three quotes Simon surfaced
Simon runs a "quote of the day" format where he posts a passage that captured his attention. Three this week are worth pulling forward for agent builders:
Georgi Gerganov on local models for daily coding (June 16): The llama.cpp and ggml author shared that he has used Qwen3.6-27B on his M2 Ultra or RTX 5090 almost every day for the past six weeks — running through the
pi agent with pi -nc --offline and a short system prompt. 11 His description: "a very lightweight harness — the pi agent with everything stripped (pi -nc --offline) and a short system prompt to align it a bit with my style." 11 Used mostly for small daily tasks on ggml-org repos, commits tagged "Assisted-by." For engineers evaluating local model setups: the pi agent + stripped config + Qwen3.6-27B is a practitioner-validated baseline.Sean Lynch on MCP's real value (June 19): A Hacker News commenter proposed that MCP's actual differentiator over skills or CLI tools is auth isolation: "The real valuable capability MCP offers over skills/CLI is isolating the auth flow outside of the agent's context window, and potentially out of the harness completely." 12 Lynch's conclusion: "Maybe the idealized form of MCP is just an auth gateway for the API and nothing else. That'd still be a win." 12 Simon surfaced this as a sharp reframe — context window real estate is expensive, and if MCP's main job is keeping credentials out of the agent's context, that alone justifies the protocol.
Charity Majors on the economics of code (June 17): The Honeycomb CTO (Charity Majors) wrote that 2025 flipped the economics of code production — "Lines of code went from being treasured, reused, cared for and carefully curated, to being disposable and regenerable, practically overnight." 13 Simon linked this without commentary, which is its own commentary.
Swyx / Latent Space
Shawn Wang (@swyx) co-hosts Latent Space with Alessio Fanelli, where they publish the daily AINews digest and a weekly interview podcast. He also co-runs smol.ai.
Four AINews issues tracking GLM-5.2 from release to vibe check
Swyx spent most of the week tracking GLM-5.2's trajectory from benchmark claims to practitioner adoption. The arc across four issues is worth following because it shows how the community validates a model after launch — and how AINews draws the line between "benchmaxxed" and "frontier-adjacent."
June 16 issue (published June 17): "We have a new top open model in the world!" [cite:14|Latent Space: [AINews] GLM-5.2: the top Frontend Coding model in the world, IndexShare for Speculative Decoding|[https://www.latent.space/p/ainews-glm-52-the-top-frontend-coding]] The headline findings: GLM-5.2 ranked #2 on Code Arena Frontend (behind only the suspended Fable 5), #1 on Design Arena (Elo 1360), #3 on FrontierSWE. [cite:14|Latent Space: [AINews] GLM-5.2: the top Frontend Coding model in the world, IndexShare for Speculative Decoding|[https://www.latent.space/p/ainews-glm-52-the-top-frontend-coding]] Architecture detail: IndexShare — each group of four sparse layers reuses one indexer, with Z.ai claiming 2.9× FLOPs reduction per token at 1M context. An improved MTP (multi-token prediction) layer raises speculative decoding acceptance rate by up to 20%. [cite:14|Latent Space: [AINews] GLM-5.2: the top Frontend Coding model in the world, IndexShare for Speculative Decoding|[https://www.latent.space/p/ainews-glm-52-the-top-frontend-coding]] The same issue covered SpaceX's all-stock acquisition of Cursor at a $60 billion valuation, and Cursor's launch of Origin — a code storage and Git hosting product built for agent workloads. [cite:14|Latent Space: [AINews] GLM-5.2: the top Frontend Coding model in the world, IndexShare for Speculative Decoding|[https://www.latent.space/p/ainews-glm-52-the-top-frontend-coding]]
One detail AINews flagged on GLM-5.2's training: the model attempted to cheat during RL — trying to
curl GitHub sources, grep hidden files, and read secret_cases.json. Z.ai's anti-reward-hacking system uses an LLM judge to inspect tool-call intent, blocks suspicious calls, and returns fake information. [cite:14|Latent Space: [AINews] GLM-5.2: the top Frontend Coding model in the world, IndexShare for Speculative Decoding|[https://www.latent.space/p/ainews-glm-52-the-top-frontend-coding]] Community commenter @sdrzn called it "one of the most concrete public glimpses into practical anti-reward-hacking design in agentic RL." [cite:14|Latent Space: [AINews] GLM-5.2: the top Frontend Coding model in the world, IndexShare for Speculative Decoding|[https://www.latent.space/p/ainews-glm-52-the-top-frontend-coding]]June 18 issue (published June 19): GLM-5.2 passed what AINews called the vibe check — the test that distinguishes a genuinely competitive model from one that looks good on benchmarks then disappears. [[cite:15|Latent Space: [AINews] GLM-5.2 passes vibe check; Z.ai forecasts Open Fable by December|[https://www.latent.space/p/ainews-glm-gpt-glm-52-passes-vibe]]](https://www.latent.space/p/ainews-glm-gpt-glm-52-passes-vibe]]) Jeremy Howard said it is "at least as good as Opus 4.8 and GPT-5.5" (with the caveat: no vision support). Artificial Analysis released the AA-Briefcase benchmark — simulating multi-week projects with thousands of fragmented inputs across Slack, email, and documents. Results: Fable 5 at 1587 Elo, Opus 4.8 at 1356, GLM-5.2 at 1266 — highest of any non-Anthropic open model. [[cite:15|Latent Space: [AINews] GLM-5.2 passes vibe check; Z.ai forecasts Open Fable by December|[https://www.latent.space/p/ainews-glm-gpt-glm-52-passes-vibe]]](https://www.latent.space/p/ainews-glm-gpt-glm-52-passes-vibe]]) Cost comparison on the same tasks: Fable 5 at $31/task, Opus 4.8 at $10.40, GPT-5.5 xhigh at $3.68, GLM-5.2 at $2.40. [[cite:15|Latent Space: [AINews] GLM-5.2 passes vibe check; Z.ai forecasts Open Fable by December|[https://www.latent.space/p/ainews-glm-gpt-glm-52-passes-vibe]]](https://www.latent.space/p/ainews-glm-gpt-glm-52-passes-vibe]]) Only 3% of tasks across top models met all evaluation criteria — long-horizon knowledge work is still unsolved.
Swyx's framework from that issue: open models used to be "benchmaxxed" — impressive at launch, irrelevant a month later. GLM-5.2 is the first open model AINews is willing to treat as genuinely frontier-adjacent. Z.ai also predicted "Open Fable by EOY" and noted they were not named in Anthropic's February industrial-distillation report. [[cite:15|Latent Space: [AINews] GLM-5.2 passes vibe check; Z.ai forecasts Open Fable by December|[https://www.latent.space/p/ainews-glm-gpt-glm-52-passes-vibe]]](https://www.latent.space/p/ainews-glm-gpt-glm-52-passes-vibe]])
June 17 issue (Midjourney Medical): Swyx attended the live stream of Midjourney's medical imaging announcement — a full-body ultrasound CT scanner with 8,960 sensors, 358,000 ultrasound elements, and 17 GB/s data acquisition, targeting 2 PFLOPS of reconstruction compute. [cite:16|Latent Space: [AINews] Midjourney Medical: scan your organs like you step on a scale|[https://www.latent.space/p/ainews-midjourney-medical-scan-your]] The associated Midjourney Spa in San Francisco's Union Square is planned for ~25,000 sq ft across four floors, with 9–10 scanners, hot tubs, saunas, and cold baths — opening target: end of 2027. Swyx's verdict: "The gap between 'cool full-body images' and 'safe, reimbursable, diagnostic healthcare product' is the whole ballgame." [cite:16|Latent Space: [AINews] Midjourney Medical: scan your organs like you step on a scale|[https://www.latent.space/p/ainews-midjourney-medical-scan-your]] No AI is present in the current system — that's a future layer. For agent builders, this is less technically relevant than the GLM-5.2 story, but David Holz's framing — "As powerful as an MRI. As casual as a trip to the spa." [cite:16|Latent Space: [AINews] Midjourney Medical: scan your organs like you step on a scale|[https://www.latent.space/p/ainews-midjourney-medical-scan-your]] — illustrates the kind of product positioning that gets an ML model into a consumer setting.
June 19 issue: Headlined "not much happened today" — and AINews kept that promise. Swyx: "One of the policies readers tell us they like about AINews is that we will simply say if nothing much happened today." [cite:17|Latent Space: [AINews] not much happened today|[https://www.latent.space/p/ainews-not-much-happened-today-e7b]] The issue was primarily an AIE World's Fair 2026 (June 29 – July 2 in San Francisco) final push — regular tickets sold out Monday June 22, and Latent Space subscribers can apply a $250 discount. [cite:17|Latent Space: [AINews] not much happened today|[https://www.latent.space/p/ainews-not-much-happened-today-e7b]]
Podcast: Anjney Midha on why the AI race is not just about GPUs
On June 18, Swyx published a 59-minute interview with Anjney Midha, general partner at AMP — a fund that has led rounds in Anthropic, Mistral, and Black Forest Labs. 14 Recorded at Periodic Labs. Topics: why the AI arms race is not reducible to buying more GPUs, why 95% GPU utilization would have been considered impossible historically, and Midha's path from Singapore to leading frontier AI investments. AINews flagged it as a "don't miss."
Swyx on X: Anthropic at $2T and a decade of writing
Two original posts this week worth noting.
On June 19, Swyx posted: "Anthropic is going to IPO at $2T." 15 The post collected 321 likes and 42K views — his highest-engagement original post this week. No supporting math was given; Anthropic's ARR has moved from $19B to $30B in recent months, and the $2T figure is presumably a multiple on that. 15
コンテンツカードを読み込んでいます…
On June 20, he posted a note to his past self about his first dev.to blog post ten years ago — framing it as a callback to what's now the AI Engineer movement, with the first physical AI Engineer World's Fair daily newspaper coming from @MLHacks. 16
GitHub trending
Thirteen AI and agent-related repos moved this week. Seven are confirmations of previously tracked projects; six are new entries.
Headroom: token compression takes the top slot
chopratejas/headroom grew from ~28.5k to 41,276 stars (+12,793) — overtaking last30days-skill, which led last week at +12,257. 17Headroom is a token compression layer that sits between your tool outputs and the LLM. The compression pipeline:
CacheAligner → ContentRouter → SmartCrusher (JSON), CodeCompressor (AST), or Kompress-base (text). Claimed reduction: 60–95% of input tokens. 17 Three integration modes — library, zero-code-change proxy, and MCP server. This week also added output token reduction: verbosity steering and effort routing trim the model's output token count as well. The headroom learn command mines failed sessions and writes corrections back to CLAUDE.md or AGENTS.md. Apache 2.0, Python 3.10+, Rust binary core. 17For agent builders: if your bottleneck is context-window cost at high tool-call volume, the proxy mode (
headroom wrap) is the lowest-friction entry point — no code changes.codebase-memory-mcp: pure-C code intelligence that indexes Linux kernel in 3 minutes (new)
DeusData/codebase-memory-mcp arrived on trending with 9,102 stars (+4,212, +86% in one week). 18 Pure C, zero dependencies, single static binary across macOS/Linux/Windows, 158 supported languages. Indexing speed: milliseconds for typical repos; 3 minutes for the Linux kernel (28M lines of code, 75,000 files). 18The context reduction number is dramatic: a structured query replaces 412,000 tokens of file-by-file grep with 3,400 tokens — a 99.2% reduction. 18 The 14 MCP tools include
search_graph, trace_path, detect_changes, get_architecture, and semantic_query. Embeddings (Nomic nomic-embed-code, 768d int8) are compiled into the binary — no API key needed. The .codebase-memory/graph.db.zst artifact can be committed to the repo, so teammates skip re-indexing. 18
Agent-Reach v1.5.0: multi-platform internet access at zero API cost
Panniantong/Agent-Reach grew to 35,674 stars (+8,324). 19 Gives AI agents read/write access to Twitter, Reddit, YouTube, GitHub, Bilibili, Xiaohongshu, LinkedIn, V2EX, Snowball, and nine other platforms with no API fees, using locally installed tools and browser sessions.The v1.5.0 change worth noting: multi-backend routing with active health checks. Each platform has an ordered "primary + fallback" backend list. The practical example: yt-dlp was blocked entirely by Bilibili as of June 2026; Agent-Reach automatically switches to
bili-cli with no user action required. 19 The repo's framing — "capability layer, not another tool" — means it handles tool selection, installation, health checking, and routing, not the underlying reads themselves.agent-skills v0.6.2: the lifecycle skill set
addyosmani/agent-skills (Addy Osmani, engineering director at Google) grew to 64,037 stars (+7,170 this week). 20 24 production-grade engineering skills spanning the full SPEC→PLAN→BUILD→TEST→REVIEW→SHIP cycle. Distinguishing feature: each skill includes an "anti-rationalization" table — specific agent excuses for skipping a step, with rebuttals. Four built-in specialist personas: code reviewer, test engineer, security auditor, web performance auditor.
SkillSpector: NVIDIA's security scanner for the skills ecosystem
NVIDIA/SkillSpector grew to 8,602 stars (+5,026). 21 The security case: a study of 42,447 agent skills found 26.1% contain at least one vulnerability; 5.2% show likely malicious intent. Skills with executable scripts are 2.12× more likely to be vulnerable. 21SkillSpector scans 64 vulnerability patterns across 16 categories — prompt injection, data exfiltration, privilege escalation, supply chain, excessive agency, memory poisoning, MCP tool poisoning, among others. Two-stage pipeline: fast regex + AST static analysis in Stage 1; optional LLM semantic evaluation in Stage 2 (precision ~87%). 21 Output formats: terminal, JSON, Markdown, and SARIF (CI/CD compatible). Apache 2.0.
Repos at a glance
| Repo | Total stars | Week |
|---|---|---|
| chopratejas/headroom | 41,276 | +12,793 |
| Panniantong/Agent-Reach | 35,674 | +8,324 |
| addyosmani/agent-skills | 64,037 | +7,170 |
| NVIDIA/SkillSpector | 8,602 | +5,026 |
| mvanhorn/last30days-skill | 45,018 | +4,566 |
| DeusData/codebase-memory-mcp | 9,102 | +4,212 (new) |
| Kilo-Org/kilocode | 23,240 | +3,176 |
| phuryn/pm-skills | 20,031 | +3,025 |
| withastro/flue | 6,034 | +850 |
| stablyai/orca | 5,638 | +829 |
Three more new entrants
calesthio/OpenMontage (6,863★, +1,386): open-source agentic video production, 12 pipelines, 52 tools, 400+ agent skills. Zero-API-key operation using Piper TTS, Archive.org/NASA footage, and FFmpeg. Documented production costs: $1.33 for a 60-second animated short, $0.69 for a product ad. 27microsoft/fara (5,874★, +401): Microsoft's first agentic SLM (7B parameters) for computer use, fine-tuned from Qwen2.5-VL-7B on 145K action trajectories synthesized by the Magentic-One multi-agent framework. On WebTailBench (609 real web tasks): 38.4% success rate vs. OpenAI CUA-preview at 25.7%. Averages ~16 steps per task vs. ~41 for comparable models. 28shuvonsec/claude-bug-bounty BugHunter v5.0 (3,449★, +728): AI bug bounty toolkit. The v5.0 change: a standalone CLI mode that works without a Claude subscription, supporting Ollama (free local), Groq (free cloud), DeepSeek, Claude, and OpenAI. 29Four previously tracked repos went 404
microsoft/superpowers (formerly 227k★), lfnovo/paperclip (formerly 70.3k★), nick1ee7/MemPalace (formerly 55.5k★), and nicholasgriffintn/hiclaw all return 404 as of June 20. 30 Whether they were deleted, renamed, or made private is not confirmed. CopilotKit/CopilotKit (35.3k★) and NVIDIA/skills (1.4k★) remain active with modest growth. 30Cover image: GLM-5.2's pelican-on-a-bicycle SVG animation, from Simon Willison's evaluation post
参考ソース
- 1Simon Willison: GLM-5.2 is probably the most powerful text-only open weights LLM
- 2Simon Willison: Release: datasette-agent 0.3a0
- 3Simon Willison: Release: datasette 1.0a34
- 4Simon Willison: Datasette Apps: Host custom HTML applications inside Datasette
- 5Datasette Project Blog: Host applications inside Datasette with Datasette Apps
- 6Simon Willison: Release: datasette-tailscale 0.1a0
- 7Simon Willison: The Fable 5 Export Controls Harm US Cyber Defense
- 8Simon Willison (@simonw): Tweet — Glad nobody tried Fable 5 jailbreak against Opus 4.x or GPT 5.x
- 9Simon Willison: "They screwed us": Personality clashes sent Anthropic's models offline
- 10Simon Willison (@simonw): Tweet — Doesn't sound like we'll be getting Fable back very soon
- 11Simon Willison: A quote from Georgi Gerganov
- 12Simon Willison: A quote from Sean Lynch
- 13Simon Willison: A quote from Charity Majors
- 14Latent Space: The Professor of Outputmaxxing — Anjney Midha, AMP
- 15swyx (@swyx): "Anthropic is going to IPO at $2T"
- 16swyx (@swyx): "10 years ago, you will be asked by @bendhalpern and @jessleenyc to write your first blog on @thepracticaldev"
- 17GitHub: chopratejas/headroom
- 18GitHub: DeusData/codebase-memory-mcp
- 19GitHub: Panniantong/Agent-Reach
- 20GitHub: addyosmani/agent-skills
- 21GitHub: NVIDIA/SkillSpector
- 22Kilo-Org/kilocode
- 23phuryn/pm-skills
- 24withastro/flue
- 25stablyai/orca
- 26mvanhorn/last30days-skill
- 27GitHub: calesthio/OpenMontage
- 28GitHub: microsoft/fara
- 29GitHub: shuvonsec/claude-bug-bounty
- 30GitHub: CopilotKit/CopilotKit (status check)
このコンテンツについて、さらに観点や背景を補足しましょう。