13 KiB
Browser Automation CLI Reference
This document provides detailed technical reference for the CLI browser automation tool.
Architecture Overview
The browser automation system consists of:
- Stagehand: TypeScript library wrapping Playwright for AI-driven browser control. Uses AI model to find and interact with the right elements, so be specific
- Chrome CDP: Chrome DevTools Protocol connection on port 9222
- CLI Tool: Command-line interface in
src/cli.tsfor browser automation - Local Chrome: Chrome browser launched with remote debugging enabled
File Locations
- Chrome Profile:
.chrome-profile/- Persistent browser profile directory - Screenshots:
./agent/browser_screenshots/- Screenshot output directory - Downloads:
./agent/downloads/- File download directory
CLI Command Reference
navigate
Navigate to a URL in the browser.
Usage:
browser navigate <url>
Parameters:
url(string, required): The URL to navigate to. Must include protocol (http:// or https://)
Returns: JSON output:
{
"success": true,
"message": "Successfully navigated to <url>",
"screenshot": "/path/to/screenshot.png"
}
Implementation Details:
- Uses Playwright's
page.goto()under the hood - Waits for network idle and DOM content loaded
- Automatically takes a screenshot after navigation
- Supports HTTPS upgrade for HTTP URLs
Example:
browser navigate https://example.com
Error Handling:
- Invalid URLs return error with
success: false - Network timeouts return timeout error
- SSL certificate errors may fail navigation
act
Perform an action on the page using natural language.
Usage:
browser act "<action>"
Parameters:
action(string, required): Natural language description of the action to perform
Returns: JSON output:
{
"success": true,
"message": "Successfully performed action: <action>",
"screenshot": "/path/to/screenshot.png"
}
Note: Without specificity it might succeed on the wrong element!
Implementation Details:
- Uses Stagehand's
page.act()which leverages Claude Haiku 4.5 - AI model interprets natural language and executes corresponding browser actions
- Supports: clicking, typing, selecting, scrolling, waiting, hovering, and more
- Automatically handles element location and interaction
- Automatically takes a screenshot after the action
Natural Language Examples:
browser act "Click the login button"
browser act "Fill in email field with test@example.com"
browser act "Scroll to the bottom of the page"
browser act "Select 'California' from the state dropdown"
browser act "Hover over the menu icon"
browser act "Wait for 3 seconds"
browser act "Press the Enter key"
browser act "Double-click the file icon"
Best Practices:
- Be specific about which element to interact with
- Include visual descriptors ("button next to the form", "top menu", "form at bottom")
- For ambiguous elements, mention nearby context
- Break complex actions into multiple simple actions
Error Handling:
- Element not found errors indicate selector couldn't be resolved
- Timeout errors occur when action takes too long
- Action not possible errors indicate element state prevents action
- All errors return JSON with
success: false
extract
Extract structured data from the current page using a schema.
Usage:
browser extract "<instruction>" '{"field": "type"}'
Parameters:
instruction(string, required): Natural language description of what to extractschema(JSON string, required): Schema definition mapping field names to types
Schema Types:
"string": Text content"number": Numeric values (integers or floats)"boolean": True/false values
Returns: JSON output:
{
"success": true,
"data": {
"field1": "value",
"field2": 123
}
}
Implementation Details:
- Uses Stagehand's
page.extract()with Zod schema validation - AI model (Claude Haiku 4.5) identifies relevant page elements
- Automatically handles pagination and dynamic content
- Validates extracted data against schema
Schema Example:
browser extract "Extract the product information" '{"productName": "string", "price": "number", "inStock": "boolean", "description": "string", "rating": "number"}'
Complex Extraction Example:
browser extract "Extract all items from the shopping cart" '{"itemName": "string", "quantity": "number", "unitPrice": "number", "totalPrice": "number", "imageUrl": "string"}'
Best Practices:
- Use clear, descriptive field names
- Match schema types to expected data types
- Provide specific extraction instructions
- Handle missing data by checking result properties
Error Handling:
- Schema validation errors indicate type mismatch
- Extraction failures occur when data not found on page
- Timeout errors for pages that take too long to analyze
- All errors return JSON with
success: false
observe
Discover available actions on the page.
Usage:
browser observe "<query>"
Parameters:
query(string, required): Natural language query to discover elements
Returns: JSON output:
{
"success": true,
"data": [
{
"selector": "button.submit-btn",
"text": "Submit Form",
"type": "button",
"visible": true,
"enabled": true
}
]
}
Implementation Details:
- Uses Stagehand's
page.observe()to scan page elements - Returns actionable elements matching the query
- Provides element properties, states, and available actions
Query Examples:
browser observe "Find all buttons"
browser observe "Find clickable links in the navigation"
browser observe "Find form input fields"
browser observe "Find all submit buttons"
browser observe "Find elements with text 'Login'"
browser observe "Find all images"
Use Cases:
- Page exploration and discovery
- Debugging action failures
- Understanding page structure
- Finding dynamic element selectors
Error Handling:
- Empty array returned when no elements match
- Timeout for pages that take too long to scan
- All errors return JSON with
success: false
screenshot
Take a screenshot of the current page.
Usage:
browser screenshot
Parameters: None
Returns: JSON output:
{
"success": true,
"screenshot": "/path/to/screenshot.png"
}
Implementation Details:
- Uses Chrome DevTools Protocol
Page.captureScreenshot - Captures full viewport at current scroll position
- Saves as PNG format with timestamp in filename
- Automatically resizes images larger than 2000x2000 pixels using Sharp
- Uses lossless PNG compression
Screenshot Path Format:
./agent/browser_screenshots/screenshot-YYYY-MM-DDTHH-MM-SS-mmmZ.png
Example:
browser screenshot
Image Processing:
- Original resolution preserved if ≤ 2000x2000
- Larger images resized to fit within 2000x2000 while maintaining aspect ratio
- Uses Sharp library for high-quality image processing
Best Practices:
- Take screenshots before and after important actions
- Use for visual debugging and verification
- Screenshot after navigation to confirm page loaded
- Capture error states for troubleshooting
Error Handling:
- Directory creation errors if screenshots folder can't be created
- CDP errors if Chrome DevTools Protocol connection fails
- File write errors if disk space insufficient
- All errors return JSON with
success: false
close
Close the browser and cleanup resources.
Usage:
browser close
Parameters: None
Returns: JSON output:
{
"success": true,
"message": "Browser closed"
}
Implementation Details:
- Calls
stagehand.close()to clean up Playwright resources - Kills Chrome process if it was started by the CLI tool
- Clears internal state variables
- Does NOT delete
.chrome-profile/directory (preserved for reuse)
Resource Cleanup:
- Closes all browser tabs and windows
- Terminates Chrome process (only if started by this tool)
- Releases CDP connection
- Clears Stagehand instance
Best Practices:
- Always call at the end of browser automation tasks
- Call even if errors occurred during automation
- Don't call mid-workflow unless explicitly needed
Error Handling:
- Continues cleanup even if some steps fail
- Safe to call multiple times
- Gracefully handles already-closed browser
- All errors return JSON with
success: false
Configuration Details
Stagehand Initialization
The Stagehand instance is configured in src/cli.ts with:
new Stagehand({
env: "LOCAL",
verbose: 0,
enableCaching: true,
modelName: "anthropic/claude-haiku-4-5-20251001",
localBrowserLaunchOptions: {
cdpUrl: `http://localhost:9222`,
},
})
Configuration Options:
env: "LOCAL": Uses local Chrome instead of remote browserverbose: 0: Minimal logging outputenableCaching: true: Caches page analysis for better performancemodelName: Claude Haiku 4.5 for AI-driven actions and extractioncdpUrl: Chrome DevTools Protocol endpoint
Chrome Launch Arguments
Chrome is launched by src/cli.ts with:
--remote-debugging-port=9222
--user-data-dir=.chrome-profile
--window-position=-9999,-9999
--window-size=1280,720
Arguments:
--remote-debugging-port: Enables CDP on port 9222--user-data-dir: Persistent profile directory for session/cookie persistence--window-position: Launches minimized off-screen--window-size: Default window size
Download Configuration
Downloads are configured via CDP:
await client.send("Browser.setDownloadBehavior", {
behavior: "allow",
downloadPath: "./agent/downloads",
eventsEnabled: true,
})
Behavior:
- Downloads start automatically (no dialog)
- Files saved to
./agent/downloads/ - Download events can be monitored via CDP
Error Messages Reference
Common Errors
"Could not find local Chrome installation"
- Cause: Chrome/Chromium not installed or not in standard locations
- Solution: Install Chrome from https://www.google.com/chrome/
"Chrome failed to start with remote debugging on port 9222"
- Cause: Port 9222 already in use or Chrome can't bind to port
- Solution: Close other Chrome instances or change CDP port
"Browser failed to become ready within timeout"
- Cause: Chrome launched but page context not ready
- Solution: Check Chrome version compatibility, restart system
"Error performing action: element not found"
- Cause: Natural language description didn't match any page element
- Solution: Use more specific description or use observe to find elements
"Error extracting data: schema validation failed"
- Cause: Extracted data type doesn't match schema
- Solution: Verify schema types match actual page data
"Error taking screenshot: directory not writable"
- Cause: Insufficient permissions for screenshots directory
- Solution: Check file permissions on
./agent/browser_screenshots/
Performance Considerations
Caching
Stagehand caches page analysis to improve performance on repeated actions. Cache is maintained for:
- Element selectors
- Page structure analysis
- Vision model results
Timeouts
Default timeouts:
- Navigation: 30 seconds
- Action execution: 30 seconds
- Extraction: 60 seconds
- CDP connection: 15 seconds (50 retries × 300ms)
Resource Usage
Browser automation consumes:
- Memory: ~200-500MB for Chrome process
- CPU: Variable based on page complexity
- Disk: ~50-200MB for Chrome profile
- Network: Depends on pages visited
Security Considerations
Credential Handling
- Browser uses persistent profile (
.chrome-profile/) - Saved passwords and cookies persist between sessions
- Consider using isolated profiles for sensitive operations
Download Safety
- Downloads automatically saved to
./agent/downloads/ - No file type restrictions enforced
- Verify downloaded file integrity before use
Network Access
- Browser has full network access
- Respects system proxy settings
- Can access localhost and internal networks
Debugging Tips
Enable Verbose Logging
Edit src/cli.ts and change verbose level in Stagehand configuration:
// Change verbose: 0 to verbose: 1 or 2
verbose: 2, // Maximum verbosity
View Chrome Console
Connect to Chrome DevTools manually:
- Open Chrome
- Navigate to
chrome://inspect - Click "inspect" under Remote Target
Check CDP Connection
Test CDP endpoint:
curl http://localhost:9222/json/version
Monitor Browser Process
Check Chrome process:
ps aux | grep chrome
View Screenshots
Screenshots provide visual debugging:
ls -lh ./agent/browser_screenshots/
open ./agent/browser_screenshots/screenshot-*.png
Test CLI Commands
Test individual commands:
browser navigate https://example.com
browser screenshot
browser close
Version Information
- Stagehand: Uses
@browserbasehq/stagehandpackage v2.5.2+ - Model: Claude Haiku 4.5 (claude-haiku-4-5-20251001) for browser actions
- CLI Tool: TypeScript CLI in
src/cli.ts - Agent SDK:
@anthropic-ai/claude-agent-sdkfor conversation framework - Browser: Local Chrome/Chromium installation
For updates and changelog, see the main project repository.