
Issue 05: Build Week — Microsoft Ships a Coding Model, OpenAI Sunsets Two, and Governments Ask for Early Access
Four developments from June 2 converge on the same question — what does production AI look like next year: Microsoft debuts MAI-Code-1-Flash (5B params, 51% SWE-Bench Pro) and Microsoft IQ at Build 2026; OpenAI retires GPT-5.2 and GPT-5.3-Codex from ChatGPT subscriptions with GPT-5.5 as default; Anthropic expands Project Glasswing to 150 critical infrastructure organizations across 15+ countries with Claude Mythos Preview; and Trump signs a voluntary 30-day pre-release review order for frontier AI models.

This week's biggest cluster hit on Tuesday, June 2: Microsoft held Build 2026 in San Francisco, OpenAI quietly retired GPT-5.2 and GPT-5.3-Codex from its coding platform the same day, Anthropic expanded its restricted Mythos model to 150 critical infrastructure organizations, and Trump signed an executive order asking AI labs for voluntary 30-day early access to frontier models. Four separate announcements, one underlying tension: the people building and regulating AI infrastructure are all making bets this week on what the next 12 months of "production AI" looks like.
1. Microsoft Build 2026: MAI-Code-1-Flash and the Microsoft IQ layer
Microsoft's keynote on June 2 led with two announcements aimed squarely at the Copilot gap narrative — the perception that GitHub Copilot's agentic coding capability still trails Claude Code and Cursor.
MAI-Code-1-Flash is the concrete answer. It's a 5-billion-parameter coding model shipping natively into GitHub Copilot and VS Code. According to CEO Satya Nadella's published keynote transcript, the model hits 51% on SWE-Bench Pro. 1 To put that in context: Claude Code scores around 80.8% on similar benchmarks and was the model Microsoft's own E+D division was paying roughly $2K/month per engineer to use before canceling those licenses last week. MAI-Code-1-Flash at 51% on SWE-Bench Pro from 5 billion parameters — Claude Haiku-class size, not Claude Code-class — is a meaningful efficiency story, even if the absolute number still trails.
Mustafa Suleyman confirmed the model on X, listing it alongside six other new MAI models released the same day, including MAI-Image-2.5. 2
Loading content card…
The second piece is Microsoft IQ, a context layer that went generally available June 2 across GitHub Copilot, Microsoft Foundry, and Copilot Studio. It's designed to ground agents in enterprise-specific context — org charts, project history, internal terminology — rather than just world knowledge. The framing from Microsoft's blog is that you build your agent in GitHub, deploy it to Foundry, and IQ handles the context continuity between them. 3
Microsoft also previewed an Agent Context Stack (ACS) — positioned as "the MCP or A2A of agent safety." Where MCP standardized how agents connect to tools and A2A standardized agent-to-agent communication, ACS targets identity, permissions, and audit trails. Available in private preview on Foundry. 4
What to watch: MAI-Code-1-Flash is in private preview on Foundry as of June 2. GitHub Copilot CLI (the Explore, Task, Review, Plan agents) still routes through external models by default; it's unclear when Copilot will default to MAI-Code-1-Flash in agentic mode vs. continuing to offer Claude routing. 5
2. OpenAI retires GPT-5.2 and GPT-5.3-Codex from ChatGPT subscriptions
The same day as Build, OpenAI sunset GPT-5.2 and GPT-5.3-Codex from the Codex platform when accessed via ChatGPT subscriptions. GPT-5.5 is now the default. 6
The migration path is straightforward on paper: move to GPT-5.5, which carries a 400K context window in Codex, or use GPT-5.1-Codex-Mini for cost-sensitive work. GPT-5.5 benchmarks at 82.7% on Terminal-Bench 2.0 with $5/$30 per million tokens. 7 Via API, GPT-5.2 will be retired on July 16, 2026 per Databricks documentation. 8
The developer reaction in the OpenAI forums is pointed: several engineers reported that GPT-5.3-Codex was their preferred model for long-running autonomous tasks precisely because it was cheaper and more coding-specific than GPT-5.5, which they describe as "more expensive and less coding-oriented." GPT-5.3-Codex introduced context compaction for long-running tasks in February 2026, a feature that GPT-5.5 retains but at a higher token price. 6
Loading content card…
This is part of a wider OpenAI model rationalization: per the official model release notes, GPT-4.5 leaves ChatGPT on June 27, and o3 follows on August 26. 9 GPT-5.5, GPT-5.4, and GPT-5.5 Instant are the surviving first-class models. AWS announced that GPT-5.5, GPT-5.4, and Codex are now generally available on Amazon Bedrock as well.
For teams running automated coding pipelines on Codex, the practical question is whether to pin GPT-5.5 now with a higher token budget, or migrate to GPT-5.1-Codex-Mini and accept the capability ceiling. The SWE-Bench Pro score for GPT-5.5 (82.7% on Terminal-Bench, not directly SWE-Bench Pro) suggests the capability is there; the cost delta for high-volume agentic loops is the decision variable.
3. Anthropic expands Project Glasswing to 150 organizations across 15+ countries
On June 2, Anthropic expanded Project Glasswing — its restricted program for using Claude Mythos Preview on critical infrastructure vulnerability discovery — to 150 new organizations across more than 15 countries. The sectors explicitly listed include energy, water utilities, healthcare, communications, and hardware. NATO is among the new partners. 10 11
Mythos Preview is Anthropic's frontier model that has not been released publicly — described by Anthropic as too capable (and potentially too dangerous) to release without controlled access. Rubrik, one of the new Glasswing partners, published a blog post confirming the relationship and describing the arrangement as using Mythos to identify vulnerabilities in critical software infrastructure. 12
The expansion of Glasswing from Big Tech and hyperscalers to 150 organizations in energy, water, healthcare, and NATO has two readings. The charitable one: Anthropic is trying to get adversarial testing at production scale before a public Mythos release — using real infrastructure to surface real vulnerabilities. The less charitable one: a "too dangerous to release" model being given to 150 organizations with varying security maturity in sectors like water utilities is itself a risk vector. The CFR analysis published the same day notes this tension explicitly. 13
Loading content card…
Reuters confirmed as of May 28 that a public Mythos launch is still "coming weeks" away. 14 This Glasswing expansion looks like Anthropic using the remaining pre-release window to stress-test Mythos in high-stakes environments before general availability.
4. Trump signs AI executive order: voluntary 30-day pre-release access for federal review
On June 2, Trump signed an executive order asking AI developers to voluntarily provide the federal government access to "covered frontier models" 30 days before public release for cybersecurity review. 15
Key details: the framework is explicitly voluntary — the EO does not mandate licensing, regulatory approval, or require companies to hold releases. WilmerHale's client alert, published the same day, confirms: "AI developers may, on a voluntary basis, share covered frontier models with the [federal government]." 16
The 30-day window is also shorter than what some in the industry expected, per PBS reporting. The order additionally directs the Pentagon to secure networks within 30 days and the Justice Department to pursue prosecutions for AI-related offenses. 17
The practical implications for AI engineers building on frontier APIs are limited in the short term — no launch delays are required or even asked for. But the order sets a precedent: voluntary today, potentially mandatory next. For API-dependent products where a 30-day delay to a major model release would have material impact (especially given how fast capability cycles have compressed in 2026), this is worth tracking.
What to watch
- MAI-Code-1-Flash evaluation window: Microsoft says it's in private preview on Foundry. The question for engineers evaluating Copilot Max is whether the model routes into agentic tasks by default, or whether Copilot continues offering Claude Opus 4.8 / external model selection at a premium. Independent SWE-Bench Pro benchmarks — rather than Microsoft's own 51% claim — will be the credibility test.
- Mythos public launch: Reuters put it at "coming weeks" as of May 28. The Glasswing expansion to 150 orgs looks like the final round of pre-release testing. Watch anthropic.com/news.
- OpenAI o3 API retirement (separate from ChatGPT): The June 2 ChatGPT sunset applies to ChatGPT subscriptions. o3 remains available via API and is not on the same retirement timeline. Developers using o3 programmatically should check their specific API tier documentation before assuming August 26 applies to them.
- Pydantic AI v2.0.0 GA: Still on beta track. The GA release would be worth a focused item — several teams are watching the tool prepare-callback change (now throws TypeError on invalid input rather than returning None) before committing to the v2 migration path.
References
- 1Microsoft Build 2026 Keynote Transcript
- 2Mustafa Suleyman on X
- 3Microsoft Build 2026: Be Yourself at Work
- 4Build Agents You Can Trust Across Any Framework
- 5Best Coding Agents for VS Code in 2026
- 6GPT-5.2 and GPT-5.3-Codex have been sunset in Codex
- 7Codex CLI v0.135 Reference
- 8Databricks Foundation Models — Supported Models
- 9OpenAI Model Release Notes
- 10Anthropic expands Mythos to 150 additional organizations — CNBC
- 11TechCrunch: Anthropic scales Claude Mythos to critical infrastructure in 15 countries
- 12Rubrik joins Anthropic's Project Glasswing
- 13Assessing Trump's executive order on AI oversight — CFR
- 14Reuters: Anthropic to roll out Claude Mythos in coming weeks
- 15White House: Promoting Advanced AI Innovation and Security
- 16WilmerHale: New Executive Order on Frontier AI Model Access
- 17PBS: Trump signs executive order on voluntary federal vetting of AI
Add more perspectives or context around this Post.