agent-browser: 82% fewer tokens per browser command


Vercel Labs' agent-browser (36.3K stars, Apache 2.0) uses a snapshot+refs system to cut browser automation token cost by 82.5% — so your Claude Code agent runs 5.7× more iterations per context budget.

Today's Trending Agent Skills
June 18, 2026 · 02:20

Research brief

Today's pick: vercel-labs/agent-browser — a native Rust browser automation CLI built specifically for AI agents, now at 36.3K stars and 458.3K installs on skills.sh. 1

Field             Value
Repo              vercel-labs/agent-browser
Maintainer        Chris Tate (@ctatedev), Vercel Labs
License           Apache 2.0
Stars / forks     36,300+ / 2,300+
Latest release    v0.28.0 (Jun 16, 2026)
Install (skill)   npx skills add vercel-labs/agent-browser
Supported agents  18+, including Claude Code, Cursor, Copilot, Codex, Gemini CLI, Windsurf, Hermes
Platform          macOS (ARM64/x64), Linux (ARM64/x64), Windows (x64)
Official site     agent-browser.dev

What it does — and why token count is the whole story

Most browser tools for AI agents were designed for humans first: Playwright, Puppeteer, and their MCP wrappers return full DOM trees and verbose success/failure JSON. That made sense when a human was reading the output. When a model is the consumer, it's a tax.
agent-browser inverts that assumption. Every interaction command returns just six characters: ✓ Done. Page state is surfaced only on demand, via a snapshot command that returns a compact accessibility tree — elements labeled with short references like @e1, @e2 — rather than raw HTML. 2
Pulumi engineer Engin Diri ran six identical tests against Playwright MCP and agent-browser side by side in January 2026. Total output: 31,117 characters for Playwright MCP, 5,455 characters for agent-browser. A single button click: 12,891 characters vs. 6. Homepage snapshot: 8,247 vs. 280. In Diri's framing: "agent-browser takes the same approach. Instead of separate tools for clicking, typing, scrolling, and navigating, it has a unified CLI with one clever idea: the snapshot + refs system." 3
That 82.5% reduction is measured in output characters, which serve here as a proxy for tokens. It translates directly into throughput: the same context budget runs roughly 5.7× more test iterations. For long autonomous agent sessions (CI validation loops, multi-step form testing, site scraping pipelines), context budget is the practical ceiling. Shrinking each interaction's output is how you raise it.
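The two headline figures follow from Diri's character totals with simple arithmetic. The sketch below reproduces them, assuming token count scales roughly with character count (the article's implicit assumption, not a measured tokenizer result). The variable names are illustrative.

```shell
# Character totals reported in Diri's six-test comparison (from the text above).
playwright_chars=31117
agent_browser_chars=5455

# reduction  = 1 - agent / playwright
# multiplier = playwright / agent
summary=$(awk -v p="$playwright_chars" -v a="$agent_browser_chars" \
  'BEGIN { printf "%.1f%% fewer characters, %.1fx more iterations per budget", (1 - a / p) * 100, p / a }')
echo "$summary"
# prints: 82.5% fewer characters, 5.7x more iterations per budget
```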
Figure: vercel-labs/agent-browser on GitHub (36K stars, 2K forks, 116 contributors, used by 553 projects). 1

How the snapshot + refs system works

The core loop is four commands:
agent-browser open https://example.com    # launch browser, navigate
agent-browser snapshot -i                 # return accessibility tree
agent-browser click @e2                   # interact via ref
agent-browser screenshot                  # capture current state
A snapshot returns something like:
- button "Sign In" [ref=e1]
- textbox "Email" [ref=e2]
- textbox "Password" [ref=e3]
- link "Forgot password" [ref=e4]
The agent picks elements by reference, not by CSS selector or XPath. This matters: selectors break when a site refactors its DOM; accessibility refs track semantic identity across minor layout changes. 2
Traditional selectors are still supported when needed — agent-browser click "#submit" works, as do semantic locators like agent-browser find role button --name "Submit" and agent-browser find text "Continue" — but the ref system is the default path for agent-driven flows. 3
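For scripted flows where no model is reading the snapshot, a ref can be looked up from the snapshot text by role and name. This is a minimal sketch that assumes every line has exactly the shape of the sample above (- ROLE "NAME" [ref=eN]) with no indentation. Real snapshot output may be nested or carry extra attributes, and find_ref is a hypothetical helper, not part of the CLI.

```shell
# find_ref ROLE NAME: read snapshot text on stdin, print the first matching ref as @eN.
# Hypothetical helper; assumes the flat line shape shown in the sample snapshot.
find_ref() {
  awk -v role="$1" -v name="$2" '
    # Match only lines that start with: - ROLE "NAME" [ref=
    index($0, "- " role " \"" name "\" [ref=") == 1 {
      ref = $0
      sub(/^.*\[ref=/, "", ref)   # drop everything through "[ref="
      sub(/\].*$/, "", ref)       # drop the closing bracket and anything after it
      print "@" ref
      exit
    }
  '
}

# Usage with the sample snapshot from the text:
find_ref textbox "Email" <<'EOF'
- button "Sign In" [ref=e1]
- textbox "Email" [ref=e2]
- textbox "Password" [ref=e3]
- link "Forgot password" [ref=e4]
EOF
# prints: @e2
```

In a live session the input would come from piping agent-browser snapshot -i into find_ref rather than from a heredoc.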

Install

Four paths, same underlying binary:
npm (recommended):
npm install -g agent-browser
agent-browser install   # downloads Chrome for Testing
Homebrew (macOS):
brew install agent-browser
agent-browser install
Cargo (Rust ecosystem):
cargo install agent-browser
agent-browser install
Skill-only (agent gets browser commands immediately, no separate binary install required):
npx skills add vercel-labs/agent-browser
Zero-dependency one-off: npx agent-browser open example.com runs without global install. Node.js 24+ and pnpm 11+ are only needed if building from source. 1
Update with agent-browser upgrade — auto-detects which install method you used.
The SKILL.md stub uses a runtime-loaded design: the agent runs agent-browser skills get core to pull the full ~420-line usage guide dynamically rather than relying on a bundled static file. This means the skill stays version-accurate automatically, without requiring a reinstall when the CLI updates. 4

Usage examples

E2E test loop with Claude Code — The pattern most users report: the agent opens a URL, takes a snapshot, fills a form using refs, submits, screenshots the result, and compares against expected state. All in a single session without the agent losing context to verbose DOM dumps.
Electron desktop automation — agent-browser works on any Chromium-embedded application, not just web browsers. Load the dedicated skill with agent-browser skills get electron and the agent can interact with VS Code, Slack, Discord, Figma, Notion, and Spotify the same way it would a webpage. 2 Developer @0xSero (53K followers) described this directly: "Agent-browser is the best CLI tool I have given to my agents. It lets them control my browser, and all my electron apps (discord, vscode, slack, etc.) It barely consumes any tokens compared to things like playwright, has a great skill and the agents seem very comfortable with it." 5
Zero-cost autonomous browsing — Install the skill file, run two CLI tools, pay nothing. No subscription, no API key, no MCP server configuration. As one user put it: "i just gave my claude code the ability to browse the internet autonomously and it cost me $0. zero subscriptions. zero API keys. zero MCP servers. just a skill file + two free CLI tools." 6
Natural language control (v0.25.0+): agent-browser chat "open google.com and search for agent tools" accepts plain language commands if you prefer not to chain explicit CLI calls.
Page change detection: agent-browser diff snapshot and agent-browser diff screenshot detect what changed between two states — useful for monitoring workflows.
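The test-loop and change-detection patterns above can be chained into one scripted flow. The sketch below uses only commands that appear in this article (open, snapshot -i, click, diff snapshot, screenshot). It assumes diff snapshot compares against the previously captured snapshot, which the article does not spell out, and that the ref passed in is already known. The AB variable and run_flow function are hypothetical conveniences so the sequence can be dry-run without a browser.

```shell
# run_flow URL REF: open a page, snapshot it, click one ref, then report what changed.
# AB is a hypothetical override for the binary name; AB=echo prints the commands instead.
run_flow() {
  ab="${AB:-agent-browser}"
  url="$1"   # page to open
  ref="$2"   # element ref to click, e.g. @e1

  # Stop at the first failing step so later steps never run against a bad state.
  "$ab" open "$url" &&
  "$ab" snapshot -i &&
  "$ab" click "$ref" &&
  "$ab" diff snapshot &&
  "$ab" screenshot
}

# Dry run: print the command sequence without launching anything.
AB=echo
run_flow https://example.com @e1
```

Unsetting AB (or setting it to agent-browser) runs the same sequence against the real CLI.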

Advanced features worth knowing

v0.27.0 (May 7, 2026) added React DevTools integration: react tree returns the full component hierarchy, react inspect <fiberId> shows props, hooks, and state, react renders start|stop profiles render performance, and react suspense classifies Suspense boundaries. Web Vitals reporting (vitals [url]) covers LCP, CLS, TTFB, FCP, and INP with React hydration timing. 7
v0.28.0 (Jun 16, 2026) introduced two larger architectural additions: an MCP Server mode (agent-browser mcp starts a stdio server providing typed tools like agent_browser_open and agent_browser_snapshot) and an out-of-process plugin system over a stdio protocol. Both expand how agent-browser can be integrated into larger pipelines. 7
The skill collection in the repo has grown to nine entries: the main agent-browser skill (458.3K installs), plus core, electron, slack, dogfood, vercel-sandbox, agentcore (AWS Bedrock), skill-creator, and next. 8
Figure: agent-browser feature overview from Towards AI (universal agent compatibility, AI-first accessibility tree snapshot, native Rust speed, 50+ commands, session support, cross-platform binaries). 9

Community signals

The project launched around January 11, 2026, gaining 1.5K GitHub stars within 24 hours. 10 By June 2026 it reached 36.3K stars and 458.3K installs — a strong, sustained adoption curve rather than a spike-and-fade pattern. 1 8
Paweł Huryn (79K followers, AI PM) included it in a list of "6 free GitHub repos for Claude Code" that can save $100/month on API costs. 11 @SeeLos called it "probably the most underrated dev tool Vercel labs has built." 12 Community discussion on Reddit r/ClaudeAI (88 upvotes) walked through the full Claude Code integration flow and sparked detailed comparison threads with Playwright MCP. 10
Chris Tate has noted that evals show agents using agent-browser "more often and more correctly" after the skill install, suggesting the runtime-loaded SKILL.md gives the model enough context to reach for the right tool without prompt engineering. 4

Known limitations

Dynamic waits are less mature than Playwright's. Playwright has years of accumulated logic for waiting on network idle, element visibility, and animation completion. agent-browser requires explicit wait calls for modals and async UI updates. In Diri's testing, this was the practical gap: "Start with agent-browser for AI validation loops. Move to Playwright when you outgrow it." 3
Headless detection. Some sites detect headless Chrome and block it. This affects agent-browser the same way it affects any headless automation tool. Reddit r/ClaudeAI discussion has noted this as a recurring friction point, particularly for scraping-heavy workflows. 10
Snapshot stability on complex SPAs. At least one Reddit user raised questions about whether the accessibility tree approach holds up reliably when DOM state changes frequently. This is an open empirical question — the tool performs well in standard scenarios, but edge behavior in highly dynamic single-page apps is less documented. 10
Playwright still leads on advanced protocol features. Multi-tab management, PDF generation, HAR recording, network interception and mocking — these exist in Playwright at a depth agent-browser hasn't matched yet. For workflows that depend on those specific capabilities, agent-browser isn't a drop-in replacement. 3
Documentation is thin in spots. With 50+ commands and a growing plugin system, the official docs lag the CHANGELOG. The runtime-loaded skill file covers the core loop well, but edge behaviors (session persistence, profile management, advanced MCP tool profiles) require reading the source or CHANGELOG directly.
The positioning Diri landed on holds up: agent-browser is the right default for AI-driven browser loops. If you're already reaching for Playwright because of a specific protocol requirement, that need is real. But for the majority of agent-driven browser tasks (navigate, snapshot, fill, submit, screenshot), agent-browser runs leaner and costs less.
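For the dynamic-wait gap noted above, one workaround is to poll the snapshot until an expected element shows up, for example a modal's confirm button. This is a hand-rolled sketch, not the CLI's own wait mechanism. wait_for_snapshot_text and the AB override are hypothetical names, and it assumes the element's label appears verbatim in the snapshot -i output.

```shell
# wait_for_snapshot_text TEXT [TRIES] [DELAY_SECONDS]
# Returns 0 once TEXT appears in a snapshot, 1 if it never appears within TRIES attempts.
# AB is a hypothetical override for the binary name, useful for dry runs.
wait_for_snapshot_text() {
  ab="${AB:-agent-browser}"
  text="$1"
  tries="${2:-10}"
  delay="${3:-1}"
  attempt=1
  while [ "$attempt" -le "$tries" ]; do
    # grep -F matches TEXT literally; -q suppresses output and only sets the exit status.
    if "$ab" snapshot -i 2>/dev/null | grep -Fq -- "$text"; then
      return 0
    fi
    # Sleep only if another attempt remains.
    if [ "$attempt" -lt "$tries" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}
```

A caller might run wait_for_snapshot_text 'button "Confirm"' 10 1 before clicking, and treat a non-zero exit as a timeout.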

Get it

# Install the skill (Claude Code, Cursor, Codex, and 15+ others)
npx skills add vercel-labs/agent-browser

# Install the CLI globally
npm install -g agent-browser && agent-browser install
Repo: github.com/vercel-labs/agent-browser · Apache 2.0 · v0.28.0 · 36.3K stars · agent-browser.dev
Cover image: AI-generated illustration
