AI Coding Tools Weekly: Copilot billing overhaul, Codex hits 4M users, Grok Build enters the terminal war

Week of May 8–15, 2026

The biggest story this week has nothing to do with a new model or a shinier autocomplete. GitHub Copilot's move to usage-based billing — effective June 1 — is changing the math for every engineering team that runs agents at scale, and the community reaction has been sharp enough to push some users toward Cursor and Windsurf. Meanwhile, OpenAI's Codex crossed 4 million weekly active users, xAI launched a direct terminal-agent challenger, and a CVSS 10.0 vulnerability in Claude Code reminded everyone that agentic tooling carries a larger attack surface than traditional IDE extensions.

Here's what shipped and what changed.

GitHub Copilot: billing overhaul, desktop app, and a wave of deprecations

The dominant story across engineering forums this week is Copilot's billing transition, announced April 27 and taking effect June 1, 2026. 1

GitHub is replacing Premium Request Units (PRUs) with GitHub AI Credits, where 1 credit = $0.01. Base subscription prices hold — Pro at $10/mo, Business at $19/user/mo, Enterprise at $39/user/mo — and each plan bundles credits equal to its dollar cost (Pro gets 1,000 credits, Business 1,900, Enterprise 3,900). 2 Code completions and Next Edit Suggestions remain unmetered and will not draw down credits. Business and Enterprise credits pool across the organization.

For the June–August transition period, GitHub is offering a promotional bump: Business accounts get 3,000 credits/user/mo (worth $30 per user), and Enterprise accounts get 7,000 credits/user/mo (worth $70 per user). 2
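
The credit arithmetic above can be sketched in a few lines. This is a rough illustration, not GitHub's billing logic: the function names are hypothetical, and treating an organization's pool as a simple sum of per-seat allotments is an assumption (the announcement only says that Business and Enterprise credits pool).

```python
# 1 GitHub AI Credit = $0.01, i.e. 100 credits per dollar (from the announcement).
CREDITS_PER_USD = 100

# Monthly credits bundled per seat, as quoted in the text.
BASE_CREDITS = {"pro": 1_000, "business": 1_900, "enterprise": 3_900}
# June-August promotional allotments (Business and Enterprise only).
PROMO_CREDITS = {"business": 3_000, "enterprise": 7_000}


def pooled_credits(plan: str, seats: int, promo: bool = False) -> int:
    """Monthly credits for `seats` users on `plan`.

    Assumption: the pool is the plain sum of per-seat allotments.
    Plans without a promotional allotment fall back to the base table.
    """
    table = PROMO_CREDITS if promo and plan in PROMO_CREDITS else BASE_CREDITS
    return table[plan] * seats


def credits_to_usd(credits: int) -> float:
    """Dollar value of a credit balance."""
    return credits / CREDITS_PER_USD


if __name__ == "__main__":
    for promo in (False, True):
        pool = pooled_credits("business", seats=50, promo=promo)
        print(f"promo={promo}: {pool} credits (${credits_to_usd(pool):.2f})")
```

For a 50-seat Business organization, the sketch gives a base pool of 95,000 credits and a promotional pool of 150,000 credits.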

To help teams prepare, GitHub published April usage preview reports on May 12: downloadable estimates showing how April activity would translate to AI Credit consumption. 3 The reports note that April 1–24 usage of 0x-tier models was excluded (roughly 2% of total usage). Community reaction on r/GithubCopilot has been mixed: some power users reported projected monthly spend exceeding $5,000 under the new model, while others accepted the change as the inevitable end of per-prompt pricing. A meaningful subset is publicly evaluating Cursor or Windsurf as alternatives.

Two product releases shipped this week alongside the billing news:

On May 14, GitHub launched the Copilot desktop app in technical preview — a standalone application that starts agentic development sessions from an issue, PR, or a free-form prompt, with pause/resume support and cross-repo capability. The Agent Merge feature handles review comments and merge flows automatically. Pro and Pro+ subscribers can register for early access. 4

Also on May 14, Copilot cloud agent gained Auto model selection: choosing Auto lets the system pick the best-performing model in real time, with a 10% credit discount and no weekly rate cap. 5

For JetBrains users, a May 13 update brought the Copilot CLI agent into public preview (with Worktree/Workspace isolation modes), a unified sessions view that surfaces agent sessions from JetBrains, VS Code, and GitHub.com in one place, a new Ask Question tool in agent mode, global .agent.md support for defining agent behavior, and GHES (GitHub Enterprise Server) login. The update also removes Edit mode. 6

A broader VS Code recap published May 6 summarized April releases (v1.116–v1.119): semantic search now covers all workspaces, /chronicle experimental chat history search landed, BYOK (Bring Your Own Key) extended to Business and Enterprise plans, and agent debug logs are now persisted. 7

On deprecations, GitHub confirmed that June 1 will retire GPT-4.1 (recommended replacement: GPT-5.5), Claude Sonnet 4, GPT-5.2, GPT-5.2-Codex, and Grok Code Fast 1. 8 Post-transition model pricing, per 1M tokens: GPT-5.5 runs $5.00 input and $30.00 output; Claude Opus 4.7 runs $5.00 input and $25.00 output. GPT-4.1 and GPT-5 mini remain included (no credit cost). GitHub's own fine-tuned models Goldeneye and Raptor mini are now in public preview. 9
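
To make those per-token prices concrete, here is a minimal sketch that converts one request's token counts into AI Credits. It assumes credits are drawn down at exactly the listed per-token price with no multipliers, discounts, or caching effects; the model keys and the `request_credits` helper are hypothetical names chosen for illustration.

```python
# USD per 1M tokens as (input, output), as quoted in the text.
PRICES_PER_MILLION = {
    "gpt-5.5": (5.00, 30.00),
    "claude-opus-4.7": (5.00, 25.00),
}

CREDIT_USD = 0.01  # 1 credit = $0.01


def request_credits(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated credits for one request (assumes list price, no discounts)."""
    in_price, out_price = PRICES_PER_MILLION[model]
    usd = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
    return round(usd / CREDIT_USD, 2)


if __name__ == "__main__":
    # Hypothetical agent turn: 200k tokens of context in, 20k tokens out.
    for model in PRICES_PER_MILLION:
        print(model, request_credits(model, 200_000, 20_000))
```

Under these assumptions, a single agent turn with 200k input tokens and 20k output tokens costs about 160 credits on GPT-5.5 and about 150 on Claude Opus 4.7, so a Pro plan's 1,000 bundled credits would cover only a handful of such turns.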

The credits-bundled-with-subscription design is reasonable on paper. The friction point is that agents can exhaust those credits in hours on complex codebases, and the overage path — buying additional credits at $0.01 each — doesn't have a hard cap by default. Engineering managers with high-volume agent workflows should pull their April reports before June 1.

Cursor: cloud agent environments, parallel plans, and Bugbot repricing

Cursor shipped two releases and a pricing change this week.

v3.3 (May 7) introduced a redesigned PR Review experience with tabbed navigation (Reviews / Commits / Changes); Build in Parallel, which lets the agent execute multiple plan steps concurrently rather than sequentially; a one-click "Split changes into PRs" action; and an Agent Context Usage panel that visualizes how rules, skills, MCPs, and subagents are consuming context window budget. 10

v3.4 (May 13) added Cloud Agent development environments: teams can now spin up Dockerfile-based remote containers with build secrets, multiple repo support, agent-led environment setup, version history, and egress/secrets isolation per environment. 10 11 The design targets teams that need reproducible, secure environments without exposing local credentials or codebases to agent processes.

On May 11, Cursor announced two changes to Bugbot. First, billing shifts from a flat $40/seat/month to usage-based pricing at roughly $1.00–$1.50 per PR review, effective for new customers after their next renewal on June 8. 12 Second, Bugbot now supports three effort levels: Default, High, and Custom. High effort finds an average of 0.95 bugs per run versus 0.7 on Default, an increase of roughly 36%. The same release added Microsoft Teams integration: team members can @Cursor in a Teams channel to delegate tasks directly to the cloud agent.

For teams that use Bugbot lightly, the per-run pricing is clearly better. For teams running it on every commit to a busy monorepo, the math flips fast.
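
The point where that math flips can be estimated with a small sketch. It assumes the usage-based bill is simply reviews multiplied by a per-review price, with no minimum charge or residual seat fee, which the announcement as summarized here does not confirm. The function names are hypothetical.

```python
FLAT_PER_SEAT_USD = 40.00          # previous flat price per seat per month
PER_REVIEW_USD_RANGE = (1.00, 1.50)  # approximate usage-based price per PR review


def break_even_reviews(flat_usd: float, per_review_usd: float) -> float:
    """Reviews per seat per month at which usage-based cost equals the flat fee."""
    return flat_usd / per_review_usd


def usage_cost(reviews: int, per_review_usd: float) -> float:
    """Monthly usage-based cost (assumes no minimums or seat fees)."""
    return reviews * per_review_usd


if __name__ == "__main__":
    low, high = PER_REVIEW_USD_RANGE
    print("break-even at high price:", break_even_reviews(FLAT_PER_SEAT_USD, high))
    print("break-even at low price:", break_even_reviews(FLAT_PER_SEAT_USD, low))
    # Hypothetical busy seat: 120 reviewed PRs in a month.
    print("120 reviews:", usage_cost(120, low), "to", usage_cost(120, high))
```

Under these assumptions, the break-even sits between roughly 27 and 40 reviews per seat per month. A seat that triggers 120 reviews would pay $120 to $180, three to four and a half times the old flat fee.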

Windsurf: Opus 4.7 fast mode in, free models out

Windsurf had a split week: a meaningful capability addition paired with a policy change that generated significant community friction.

On the positive side, v2.2.17 (May 6) opened Devin Review and Quick Review to all users with a two-week free trial, and improved the Agent Command Center with a list view and better sorting. 13 On May 12, Windsurf announced availability of Claude Opus 4.7 in fast mode — full Opus 4.7 intelligence at approximately 2.5× the output speed. 14

The friction: around May 13, Windsurf quietly removed nearly all free-tier models — including SWE-1.5, SWE-1.6 Free, Kimi K2.6, and GLM-5 — leaving very few options for free users. There was no entry in the official changelog or any announcement. 15 The r/windsurf community responded sharply, with multiple users describing the move as self-destructive and citing ChatGPT Codex or Cursor as their next destination.

The silence around the change is the real problem. Model pricing adjustments are defensible; removing access without explanation gives users no basis to evaluate whether to stay or upgrade. The absence from the changelog suggests this was either a deliberate soft rollout or an operational decision made above the product team. Either way, it's a trust issue that a post-hoc explanation won't fully repair.

Terminal agents: Codex hits 4M weekly users, Grok Build enters beta

OpenAI's Codex crossed 4 million weekly active users and extended its reach to mobile on May 14, launching a preview of Codex in the ChatGPT app for iOS and Android. 16 From the app, users can monitor active coding tasks, review output, approve commands, switch models, or start new sessions. The connection runs through a secure relay layer so trusted machines don't need direct public exposure. OpenAI described the mobile interface as designed for "quick check-ins" that keep long-running agent threads moving without requiring a laptop:

"More than 4 million people now use Codex every week, and we're seeing how much those small moments matter. A quick check-in can keep a thread moving, prevent unnecessary rework, or help Codex make progress with the right context." 16

The same announcement confirmed Remote SSH reaching general availability, Hooks reaching GA, programmatic access tokens, and HIPAA compliance support for Enterprise users.

On the same day, xAI (operating as SpaceXAI) launched Grok Build in early beta — a terminal-native agentic coding CLI available to SuperGrok Heavy subscribers at $299/month, with an introductory price of $99/month for the first six months. 17 The tool is positioned as a direct competitor to Claude Code and Codex CLI. Elon Musk reposted the launch announcement inviting users to test and provide feedback.

Grok Build is too early-stage for a meaningful technical comparison. What it does change is the competitive framing: the terminal coding agent category now has three well-resourced entrants (Anthropic, OpenAI, xAI) plus Windsurf's CLI and Devin's terminal agent. The $99 introductory pricing positions it below Claude Code's $100 Max tier, which will matter for individual developers evaluating the space.

A critical vulnerability in Claude Code surfaced on May 8. CVE-2026-39861 is a sandbox escape via symlink that allows agent tooling to write arbitrary files outside the designated workspace. 18 The CVE carries a CVSS score of 10.0 and affects all Claude Code versions prior to 2.1.64. The flaw was first reported by a security researcher in August 2025 but was explicitly excluded from Anthropic's Vulnerability Disclosure Program at that time. A Hacker News thread with "sandbox escape" in the title discussed it roughly a week ago.
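
For readers unfamiliar with this bug class, the following sketch shows the general shape of a workspace containment check that resolves symlinks before deciding whether a write is allowed. It is a generic illustration only: it is not Anthropic's patch, it says nothing about how Claude Code's sandbox is actually implemented, and `is_inside_workspace` is a hypothetical helper.

```python
import os


def is_inside_workspace(workspace: str, target: str) -> bool:
    """Return True only if `target` stays inside `workspace` once symlinks are resolved.

    A check on the literal path string is not enough: "ws/link/file" looks
    like it is inside "ws", but if "link" is a symlink to another directory,
    the write lands elsewhere.
    """
    root = os.path.realpath(workspace)
    # realpath follows symlinks in every existing path component.
    resolved = os.path.realpath(os.path.join(root, target))
    try:
        return os.path.commonpath([root, resolved]) == root
    except ValueError:
        # Raised on Windows when the paths are on different drives.
        return False
```

A check like this still leaves a window between the check and the write in which a symlink can be swapped in, so real sandboxes typically pair it with OS-level isolation rather than relying on path validation alone.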

Any team running Claude Code in a CI/CD pipeline, shared development environment, or multi-tenant infrastructure should verify they are on version 2.1.64 or later. CVSS 10.0 is the highest severity rating — patching is not optional.
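
A pipeline gate for that check might look like the sketch below. It only compares a version string against the 2.1.64 threshold named above. How the string is obtained (CLI output, lockfile, container image label) is left open, and the assumption that it contains a plain `major.minor.patch` triple is mine, not something the advisory specifies.

```python
import re

MIN_PATCHED = (2, 1, 64)  # first fixed version, per the text above


def parse_version(text: str) -> tuple:
    """Extract the first major.minor.patch triple from `text`."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def is_patched(version_text: str) -> bool:
    """True if the version is at or above the first fixed release."""
    # Tuples compare numerically, element by element, so 2.1.9 < 2.1.64.
    return parse_version(version_text) >= MIN_PATCHED


if __name__ == "__main__":
    for candidate in ("2.1.63", "2.1.64", "2.2.0"):
        print(candidate, "patched" if is_patched(candidate) else "VULNERABLE")
```

Comparing integer tuples rather than raw strings avoids the classic mistake where "2.1.9" sorts after "2.1.64" lexically.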

On the other side of the security equation, Replit shipped Security Center 2.0 on May 7, adding cross-project vulnerability management with bulk operations: notify all affected project owners, unpublish vulnerable apps, or trigger a one-click "Fix with Agent" that generates a patch for selected CVEs. Enterprise users get SBOM (Software Bill of Materials) output. 19

On GitHub Trending, agent skill and configuration repositories dominated May 15's leaderboard: 20

mattpocock/skills — 83K stars, +2,987 in a single day; real engineer .claude skill directories
obra/superpowers — 191K stars, +1,780; agent skill framework and dev methodology
github/spec-kit — 99K stars, +1,232; specification-driven development toolkit
garrytan/gstack — 97K stars, +915; Garry Tan's (YC CEO) full Claude Code configuration with 23 custom tools

The pattern is consistent: developers are no longer just using agents — they're publishing and sharing agent configurations as first-class software artifacts. The .claude directory is quietly becoming a new type of dotfile.

Tabnine published its April recap on May 6, centering on "agents you can trust": a CLI Plan Mode that previews agent intent before execution, sandboxed CLI execution with stricter tool limits, a Token and Cost API for building departmental billing models, and per-team quota enforcement. 21

Brief notes

Devin adds Android emulator support (May 13): Cognition's Devin can now launch Android Virtual Devices (AVDs) to run, test, and debug Android apps in a live environment — closing the gap between code generation and on-device verification. Available to all Devin teams. 22

Anthropic corporate (May 13–14): Anthropic launched Claude for Small Business, a new plan tier targeting SMBs, and announced a $200 million partnership with the Gates Foundation. 23 No changes to Claude Code pricing or capabilities were announced alongside these.

Notion Developer Platform (May 13): Notion opened its workspace to external AI agents, adding Workers (sandboxed custom code execution in Notion's cloud, free through August), database sync from any API-accessible source, and direct integration with Claude Code, Cursor, Codex, and Decagon. 24 Notion CEO Ivan Zhao said: "Any data, any tool, any agent — that's the big picture for the Notion Developer Platform." Since February's Custom Agents launch, customers have built over 1 million agents on the platform.
