CLI vs MCP on Chrome DevTools Protocol
Anthropic published an interesting observation about the Model Context Protocol and code execution. They noted that executable code in the filesystem might be more efficient for AI agents than protocol servers. Isn’t that essentially what CLI tools already are?
I decided to test this hypothesis by comparing two approaches to browser automation with AI agents:
- CLI approach: bdg - A browser debugger CLI I built
- MCP approach: Chrome DevTools MCP - The official MCP server for Chrome DevTools
Both tools interact with the Chrome DevTools Protocol, so they have access to the same underlying capabilities. The question is: does the interface matter?
Methodology
I used a fresh Claude instance (Sonnet 4.5) with zero prior knowledge of either tool. The agent received identical tasks across three real websites:
- Hacker News - Navigate, count stories, extract comments
- CodePen - Inspect trending pens, capture screenshots
- Amazon - Extract product information (anti-bot stress test)
The goal was to see how agents discover and use each tool naturally, without human guidance.
Full methodology: BENCHMARK_PROMPT.md
Results
Token Efficiency: 13x Difference
This was the most striking finding:
| Tool | Total Tokens | Per Test Average |
|---|---|---|
| bdg (CLI) | 6,500 | ~2,200 |
| Chrome MCP | 85,500 | ~28,500 |
| Ratio | ~13x | ~13x |
The gap comes from how each tool returns information:
MCP’s approach: Full accessibility snapshots
- Every page state = complete accessibility tree
- Amazon product page alone: 52,000 tokens in one snapshot
- Includes every element, nested structure, full context
CLI’s approach: Targeted queries
- CSS selectors return only matching elements
bdg dom query ".athing"→ 1,200 tokens (30 stories)- Progressive disclosure - get what you need, when you need it
Command Count: Roughly Equivalent
| Test | bdg Commands | MCP Calls | Winner |
|---|---|---|---|
| Hacker News | 11 | 8 | MCP |
| CodePen | 6 | 5 | MCP |
| Amazon | 4 | 3 | MCP |
| Average | 7 | ~5 | MCP |
MCP requires slightly fewer calls for simple tasks. However, when each call consumes roughly 17x more tokens, as the arithmetic below shows, command count becomes far less relevant.
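Dividing the token totals by the call totals from the two tables above makes this concrete:
bdg:  6,500 tokens / 21 commands ≈   310 tokens per command
MCP: 85,500 tokens / 16 calls    ≈ 5,340 tokens per call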
Discovery: Zero-Knowledge Learning
One of the most interesting aspects was watching how the agent learned each tool.
bdg discovery path (5 commands):
# 1. What is this tool?
bdg --help --json
# 2. What can CDP do?
bdg cdp --list
# Result: 53 domains available
# 3. What Network methods exist?
bdg cdp Network --list
# Result: 39 methods
# 4. How do I get cookies?
bdg cdp --search cookie
bdg cdp Network.getCookies --describe
# 5. Execute
bdg cdp Network.getCookies
The agent went from zero knowledge to successful execution without external documentation. It taught itself through the tool’s introspection.
MCP discovery path:
- Requires understanding of MCP protocol
- Uses UID-based element selection from snapshots
- Must parse 10k+ token accessibility trees to find elements
Element Selection
bdg: Standard CSS selectors
bdg dom query ".athing" # Hacker News stories
bdg dom query "#productTitle" # Amazon product
bdg dom click "button[type=submit]" MCP: UID-based from accessibility tree
take_snapshot({});
// Returns 10k tokens
click({ uid: "1_28" });
// Must find UID in snapshot first CSS selectors are more familiar to developers, but UID-based selection is more robust for dynamic content. Trade-offs exist on both sides.
Why CLI Tools Enable Self-Correction
There’s a fundamental difference in how CLI tools and protocol servers handle limitations:
CLI tools expose their constraints explicitly. When a command fails, you get structured errors with exit codes, suggestions, and full context. This enables agents to self-correct:
$ bdg dom click ".missing-button"
Error: Element not found: .missing-button
Exit code: 81 (user error)
Suggestions:
- Verify selector: bdg dom query ".missing-button"
- List all buttons: bdg dom query "button"
- Wait for element: sleep 2 && bdg dom click ".missing-button"
The agent learns what went wrong and how to fix it. Error recovery becomes part of the workflow.
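Because failures come back as distinct exit codes, the same recovery can be scripted. A minimal sketch, using only the commands shown above:
# Follow the error’s own suggestions when the click fails
if ! bdg dom click ".missing-button"; then
  bdg dom query "button"            # list candidate buttons, as the error suggests
  sleep 2
  bdg dom click ".missing-button"   # retry after giving dynamic content time to load
fi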
Protocol servers hide implementation gaps. If an MCP server doesn’t expose a specific CDP method, there’s no way to access it. You’re limited to the 28 curated tools the server provides. Need something from the Profiler domain? Security domain? WebAuthn domain? You’re stuck until someone updates the server.
Composability means extensibility. CLI tools integrate with the Unix ecosystem:
# Filter requests by status code
bdg peek --network | jq '.[] | select(.status >= 400)'
# Chain commands for workflows
bdg dom query "button" | jq '.[0].nodeId' | xargs bdg dom click
# Combine with other tools
bdg network getCookies | grep "session" | cut -d: -f2
If bdg doesn’t provide exactly what you need, you can compose it with jq, grep, awk, or any other Unix tool. MCP servers require protocol extensions and server updates.
Full protocol access matters. bdg exposes all 644 CDP methods across 53 domains. If you need Profiler.startPreciseCoverage or Security.setIgnoreCertificateErrors, it’s already there via bdg cdp Profiler.startPreciseCoverage. No waiting for server maintainers to add support.
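The introspection workflow from the discovery section carries over unchanged to these uncurated domains. A sketch, assuming only the flags already demonstrated:
# Discover, inspect, then invoke a method no curated tool list exposes
bdg cdp --search coverage
bdg cdp Profiler.startPreciseCoverage --describe
bdg cdp Profiler.startPreciseCoverage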
This isn’t just about efficiency. It’s about not being artificially limited by someone else’s API design decisions.
What This Means
Token Efficiency Compounds
For a single task, 13x might seem manageable. But consider:
- Debugging session: 20+ page states → 200k vs 15k tokens (~10k per MCP snapshot vs well under 1k per targeted query)
- Multi-step workflow: Navigate, fill forms, verify → tokens add up fast
- Context window limits: More tokens = less room for reasoning
At scale, this efficiency gap becomes significant.
Self-Documentation Enables Autonomy
The most interesting finding wasn’t the numbers - it was watching the agent learn through introspection:
$ bdg --help --json
# Agent learns: 10 commands, exit codes, task mappings
$ bdg cdp --list
# Agent learns: 53 domains
$ bdg cdp --search cookie
# Agent discovers: 14 cookie-related methods
$ bdg cdp Network.getCookies --describe
# Agent learns: parameters, return types, examples
No external documentation needed. The tool IS the documentation.
Unix Composability Matters
CLI tools compose naturally with Unix tools:
# Filter network requests
bdg peek --last 20 | jq '.[] | select(.status >= 400)'
# Count specific elements
bdg dom query ".error" | jq 'length'
# Chain commands
bdg dom query "button" && bdg dom click "button:first-child" This flexibility is harder to replicate in protocol-based tools.
Limitations
This benchmark has clear constraints:
- Small sample size - Only 3 websites tested
- Single model - Only Claude Sonnet 4.5
- Specific scenarios - Information extraction workflows
- Bot detection - Both tools faced blocks on some sites
On Debugging Workflows
The most common criticism is that the benchmark focused on “information extraction” rather than debugging workflows. However, bdg provides comprehensive debugging abstractions:
- Console debugging: bdg console --follow for real-time error streaming
- Network debugging: bdg peek --network, bdg tail, bdg network headers
- Performance profiling: full CDP access via bdg cdp Profiler --list
- HAR export: bdg network har for deep network analysis
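A debugging pass can compose these directly. A rough sketch, using only the commands listed above (the jq filter mirrors the earlier examples):
# Stream console errors while inspecting failing requests
bdg console --follow &
bdg peek --network | jq '.[] | select(.status >= 400)'
# Export a HAR for offline analysis
bdg network har > session.har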
The token efficiency advantage applies equally to debugging workflows. A debugging session with 20 page states would consume 200k+ tokens via MCP snapshots vs ~15k via targeted bdg queries.
Both tools access the same Chrome DevTools Protocol - the difference is the interface, not the capabilities.
Takeaways
This isn’t a definitive “CLI beats MCP” statement. It’s one data point suggesting:
Token Efficiency: With MCP, you pay upfront for every tool definition and capability declaration - whether you use them or not. CLI tools like glab, jq, and grep were already in the model’s training data. A skill document showing usage patterns is ~3k tokens; MCP server definitions alone can run 5-10k tokens before you invoke a single tool.
Composability: Unix philosophy wins here. CLI tools pipe together, each doing one thing well, and you chain them for complex workflows. MCP servers are monolithic endpoints. If it doesn’t expose your exact query, you’re stuck. With CLI, you can grep, pipe to files, and combine tools. The model already knows these patterns.
Debuggability: CLI errors are transparent. You see exactly what failed and why. MCP errors hide behind protocol layers and server logs you can’t access. The model can identify CLI errors, understand them, and adapt accordingly.
Real-Time Evolution: I can update my skill document while the agent uses it, adding patterns and refining examples. With MCP, you’re locked to whatever the server exposes. Want new functionality? Wait for the maintainer to add it, redeploy, hope nothing breaks. With CLI, I just update the markdown.
For my use case (browser automation with AI agents), CLI tools with self-documentation proved more efficient than MCP servers. The ability to compose with Unix tools and access the full CDP surface without waiting for server updates is significant.
Your mileage may vary depending on your needs.
Resources
- Full benchmark results: BENCHMARK_RESULTS_2025-11-23.md
- bdg CLI tool: github.com/szymdzum/browser-debugger-cli
- Chrome DevTools MCP: github.com/ChromeDevTools/chrome-devtools-mcp
- Anthropic’s MCP article: Code Execution with MCP