Initial commit

This commit is contained in:
Zhongwei Li
2025-11-29 18:18:56 +08:00
commit 3aef0f6c84
70 changed files with 14222 additions and 0 deletions

View File

@@ -0,0 +1,574 @@
# Browser Pilot Commands Reference
Complete reference for all Browser Pilot CLI commands using the project-local wrapper script.
## Command Entry Points
Browser Pilot uses the CLI wrapper script `.browser-pilot/bp`:
**Single command execution:**
```bash
node .browser-pilot/bp <command> [options]
```
**Chain mode (multiple commands):**
```bash
node .browser-pilot/bp chain <command1> [args1] <command2> [args2] ...
```
**Daemon management:**
```bash
node .browser-pilot/bp daemon-<action>
```
## Single Command Execution
### Navigation Commands
**navigate** - Navigate to URL (auto-generates interaction map on page load)
```bash
node .browser-pilot/bp navigate -u "<url>"
```
**back** - Navigate back in history
```bash
node .browser-pilot/bp back
```
**forward** - Navigate forward in history
```bash
node .browser-pilot/bp forward
```
**reload** - Reload current page (regenerates interaction map)
```bash
node .browser-pilot/bp reload
```
### Interaction Commands
**click** - Click an element
Smart Mode (Recommended):
```bash
node .browser-pilot/bp click --text "<text>" [options]
Options:
--text <text> Text content to search for
--index <number> Select nth match (1-based)
--type <type> Element type filter (supports aliases: "input""input-*")
--tag <tag> HTML tag filter (e.g., "button", "input")
--viewport-only Only search visible elements
--verify Verify action success
Examples:
node .browser-pilot/bp click --text Submit
node .browser-pilot/bp click --text "Sign In" --type button
node .browser-pilot/bp click --text Delete --index 2
node .browser-pilot/bp click --text Search --type input # Auto-expands to all input types
node .browser-pilot/bp click --text Submit --tag button # Tag-based search
```
Direct Mode (fallback for unique IDs):
```bash
node .browser-pilot/bp click -s "<selector>"
node .browser-pilot/bp click -u "<url>" -s "<selector>"
```
**fill** - Fill input field
Smart Mode (Recommended):
```bash
node .browser-pilot/bp fill --text "<label>" -v "<value>" [options]
Options:
--text <label> Label or placeholder text to search
--type <type> Input type filter (supports aliases: "input""input-*")
--tag <tag> HTML tag filter (e.g., "input", "textarea")
--viewport-only Only search visible elements
--verify Verify action success
Examples:
node .browser-pilot/bp fill --text Email -v user@example.com
node .browser-pilot/bp fill --text Password -v secret --type input-password
node .browser-pilot/bp fill --text Email --tag input -v user@example.com # Tag-based search
```
Direct Mode (fallback for unique IDs):
```bash
node .browser-pilot/bp fill -s "<selector>" -v "<value>"
```
**hover** - Hover over element
```bash
node .browser-pilot/bp hover -s "<selector>"
```
**press** - Press keyboard key
```bash
node .browser-pilot/bp press -k "<key>"
Keys: Enter, Tab, Escape, ArrowUp, ArrowDown, etc.
```
**type** - Type text character by character
```bash
node .browser-pilot/bp type -t "<text>" -d <delay-ms>
```
**upload** - Upload file to input element
```bash
node .browser-pilot/bp upload -s "<selector>" -f "<file-path>"
```
### Data Extraction Commands
**extract** - Extract text content from element
```bash
node .browser-pilot/bp extract -s "<selector>"
```
**content** - Get full page HTML
```bash
node .browser-pilot/bp content
```
**console** - Get console messages with powerful filtering and formatting
```bash
node .browser-pilot/bp console [options]
Options:
-u, --url <url> Navigate to URL before getting console messages
Level Filtering:
-e, --errors-only Show only error messages
-l, --level <level> Filter by level: error, warning, log, info, verbose
--warnings Show only warning messages
--logs Show only log messages
Message Limiting:
--limit <number> Maximum number of messages to display
--skip <number> Skip first N messages
Text Filtering:
-f, --filter <pattern> Show only messages matching regex pattern
-x, --exclude <pattern> Exclude messages matching regex pattern
Output Format:
-j, --json Output in JSON format
-t, --timestamp Show timestamps
--no-color Disable colored output
File Output:
-o, --output <file> Save output to file
Source Filtering:
--url-filter <pattern> Filter by source URL (regex)
Examples:
# Get all console messages
node .browser-pilot/bp console
# Get only errors with timestamps
node .browser-pilot/bp console -e -t
# Filter messages containing "API"
node .browser-pilot/bp console -f "API"
# Get warnings and exclude messages containing "deprecated"
node .browser-pilot/bp console --warnings -x "deprecated"
# Get first 10 log messages in JSON format
node .browser-pilot/bp console --logs --limit 10 -j
# Save all console messages to file
node .browser-pilot/bp console -o console-output.txt
# Get errors from specific source file
node .browser-pilot/bp console -e --url-filter "app.js"
# Navigate and get console errors
node .browser-pilot/bp console -u "http://localhost:3000" -e
```
**cookies** - Get page cookies
```bash
node .browser-pilot/bp cookies
```
### Capture Commands
**screenshot** - Capture screenshot (saved to `.browser-pilot/screenshots/`)
```bash
node .browser-pilot/bp screenshot -o "<filename>.png" [options]
Options:
-u, --url <url> URL to capture (optional)
-o, --output <path> Output filename (saved to .browser-pilot/screenshots/)
--full-page Capture full page (default: true)
--clip-x <x> Clip region X coordinate (pixels)
--clip-y <y> Clip region Y coordinate (pixels)
--clip-width <width> Clip region width (pixels)
--clip-height <height> Clip region height (pixels)
--clip-scale <scale> Clip region scale factor (default: 1)
--headless Run in headless mode
Note: Clip region options take priority over --full-page. When clip options are specified,
only the specified region will be captured regardless of --full-page setting.
Examples:
# Full page screenshot
node .browser-pilot/bp screenshot -o result.png --full-page
# Saves to: .browser-pilot/screenshots/result.png
# Capture specific region (clip takes priority over full-page)
node .browser-pilot/bp screenshot -o region.png --clip-x 100 --clip-y 200 --clip-width 800 --clip-height 600
# Saves to: .browser-pilot/screenshots/region.png
```
**pdf** - Generate PDF (saved to `.browser-pilot/pdfs/`)
```bash
node .browser-pilot/bp pdf -o "<filename>.pdf" [options]
Options:
-u, --url <url> URL to capture (optional)
-o, --output <path> Output filename (saved to .browser-pilot/pdfs/)
--landscape Landscape orientation
--headless Run in headless mode
Example:
node .browser-pilot/bp pdf -o document.pdf --landscape
# Saves to: .browser-pilot/pdfs/document.pdf
```
**set-viewport** - Set browser viewport size (useful for responsive design testing)
```bash
node .browser-pilot/bp set-viewport -w <width> -h <height> [options]
Options:
-w, --width <width> Viewport width in pixels (required)
-h, --height <height> Viewport height in pixels (required)
--scale <scale> Device scale factor (default: 1)
--mobile Emulate mobile device (default: false)
Examples:
# Desktop viewport
node .browser-pilot/bp set-viewport -w 1920 -h 1080
# Mobile viewport (iPhone 12)
node .browser-pilot/bp set-viewport -w 390 -h 844 --scale 3 --mobile
# Tablet viewport (iPad)
node .browser-pilot/bp set-viewport -w 768 -h 1024 --scale 2
```
**get-viewport** - Get current viewport size
```bash
node .browser-pilot/bp get-viewport
Output:
=== Viewport Information ===
Size: 1920x1080
Scale: 1
```
**get-screen-info** - Get screen and viewport information
```bash
node .browser-pilot/bp get-screen-info
Output:
=== Screen Information ===
Screen: 2560x1440 # Physical screen resolution
Available: 2560x1392 # Available screen (excluding taskbar)
Viewport: 1920x1080 # Current browser viewport
Scale: 1 # Device pixel ratio
```
### Tab Management Commands
**tabs** - List all open tabs
```bash
node .browser-pilot/bp tabs
```
**new-tab** - Open new tab
```bash
node .browser-pilot/bp new-tab -u "<url>"
```
**close-tab** - Close tab by index
```bash
node .browser-pilot/bp close-tab -i <index>
```
**close** - Close browser
```bash
node .browser-pilot/bp close
```
### Utility Commands
**eval** - Execute JavaScript in browser context
```bash
node .browser-pilot/bp eval -e "<javascript-expression>"
Example:
node .browser-pilot/bp eval -e "document.title"
```
**wait** - Wait for element to appear
```bash
node .browser-pilot/bp wait -s "<selector>" -t <timeout-ms>
```
**scroll** - Scroll page or element
```bash
# Scroll to position
node .browser-pilot/bp scroll -x <x-pos> -y <y-pos>
# Scroll element into view
node .browser-pilot/bp scroll -s "<selector>"
```
**select** - Select dropdown option
```bash
node .browser-pilot/bp select -s "<selector>" -v "<option-value>"
```
**find** - Find elements matching selector
```bash
node .browser-pilot/bp find -s "<selector>"
```
**query** - Query interaction map for elements
```bash
# List all element types with counts
node .browser-pilot/bp query --list-types
# List all text contents (paginated, default 20)
node .browser-pilot/bp query --list-texts
# List text contents with type filter
node .browser-pilot/bp query --list-texts --type button
# Find elements by text
node .browser-pilot/bp query --text "<text-content>"
# Find all elements of a type (paginated)
node .browser-pilot/bp query --type <element-type>
# Type aliases (auto-expanded)
node .browser-pilot/bp query --type input # Matches: input, input-text, input-search, etc.
node .browser-pilot/bp query --type input-search # Exact match only
# Find by HTML tag
node .browser-pilot/bp query --tag button # All <button> elements
node .browser-pilot/bp query --text Submit --tag button
# Show detailed information
node .browser-pilot/bp query --type button --verbose
# Pagination options
node .browser-pilot/bp query --type button --limit 50 --offset 20
# Unlimited results
node .browser-pilot/bp query --type button --limit 0
# Other options:
# --index <n> Select nth match (1-based)
# --viewport-only Only visible elements
# --id <id> Direct element ID lookup
```
**map-status** - Check interaction map status
```bash
node .browser-pilot/bp map-status
```
**regen-map** - Force regenerate interaction map
```bash
node .browser-pilot/bp regen-map
```
## Chain Mode
Execute multiple commands sequentially in a single call with automatic map synchronization.
**Syntax:**
```bash
node .browser-pilot/bp chain <command1> [args1] <command2> [args2] ...
```
**Key Features:**
- Auto-waits for page load and map generation after navigation
- Supports Smart Mode (--text) for reliable element targeting
- Adds random human-like delay (300-800ms) between commands
- Stops execution if any command fails
- Each command executes after the previous one completes
**Chain-specific Options:**
```bash
--timeout <ms> Timeout for waiting map ready after navigation (default: 10000ms)
--delay <ms> Fixed delay between commands (overrides random 300-800ms delay)
```
**Quote Rules:**
- No quotes needed when values have no spaces
- Use quotes when values contain spaces
**Examples:**
```bash
# Basic chain (no quotes needed)
node .browser-pilot/bp chain navigate -u <url> click --text Submit extract -s .result
# With spaces (quotes required)
node .browser-pilot/bp chain navigate -u <url> click --text "Sign In" fill -s #email -v "user@example.com"
# Login workflow with Smart Mode
node .browser-pilot/bp chain navigate -u <url> fill --text Email -v <email> fill --text Password -v <password> click --text Login
# Multi-step workflow with Korean text
node .browser-pilot/bp chain navigate -u <url> click --text "메인 메뉴" click --text "설정 변경" click --text "저장"
# Screenshot workflow
node .browser-pilot/bp chain navigate -u <url> wait -s .content-loaded screenshot -o result.png
# Custom timing
node .browser-pilot/bp chain --timeout 15000 --delay 1000 navigate -u <url> click --text Submit
```
## Daemon Commands
Daemon starts automatically on first command and stops automatically at session end.
**daemon-start** - Start daemon (auto-starts on first command)
```bash
node .browser-pilot/bp daemon-start
```
**daemon-stop** - Stop daemon and close browser
```bash
node .browser-pilot/bp daemon-stop
```
**daemon-restart** - Restart daemon
```bash
node .browser-pilot/bp daemon-restart
```
**daemon-status** - Check daemon status
```bash
node .browser-pilot/bp daemon-status
```
## System Maintenance Commands
**reinstall** - Reinstall Browser Pilot scripts
Removes the `.browser-pilot` directory to force complete reinstallation on next command. Useful when:
- Installation or build is corrupted
- Scripts are not updating properly
- Troubleshooting persistent issues
```bash
# Show confirmation prompt
node .browser-pilot/bp reinstall
# Skip confirmation and reinstall immediately
node .browser-pilot/bp reinstall --yes
# Quiet mode (no output)
node .browser-pilot/bp reinstall --yes --quiet
```
**What it does:**
1. Stops the daemon if running
2. Removes `.browser-pilot` directory completely
3. Next command will trigger automatic reinstallation via SessionStart hook
**Options:**
- `-y, --yes`: Skip confirmation prompt
- `-q, --quiet`: Suppress output messages
**Example workflow:**
```bash
# Reinstall scripts
node .browser-pilot/bp reinstall --yes
# Next command triggers automatic reinstallation
node .browser-pilot/bp navigate -u "https://example.com"
```
## Common Options
Most commands support:
- `-u, --url <url>`: Navigate to URL before action
- `--headless`: Run in headless mode (no visible browser)
- `--timeout <ms>`: Custom timeout for operations
## Smart Mode vs Direct Mode
**🌟 Recommendation: Use Smart Mode by default for better reliability**
| Feature | Smart Mode (Recommended) | Direct Mode |
|---------|--------------------------|-------------|
| Selector | Text content | CSS or XPath |
| Reliability | ⭐⭐⭐⭐⭐ High (stable) | ⭐⭐ Low (brittle) |
| Duplicates | Auto indexing | Manual indexing |
| Map Required | Yes (auto-generated) | No |
| Speed | Medium | Fast |
| Best For | Most cases, text-based UI | Unique IDs only |
| Maintenance | Low (text rarely changes) | High (selectors break often) |
**When to use Direct Mode:**
- Element has a unique, stable ID (e.g., `#user-profile-button`)
- Performance-critical operations requiring maximum speed
- Element has no visible text content
## Exit Codes
- `0`: Success
- `1`: General error
- Non-zero: Command failed
## Examples
**Take screenshot:**
```bash
node .browser-pilot/bp screenshot -u "https://example.com" -o "example.png" --full-page
```
**Login flow (Direct Mode):**
```bash
node .browser-pilot/bp navigate -u "https://example.com/login"
node .browser-pilot/bp fill -s "#email" -v "user@example.com"
node .browser-pilot/bp fill -s "#password" -v "secret"
node .browser-pilot/bp click -s "#login-btn"
```
**Login flow (Chain Mode):**
```bash
node .browser-pilot/bp chain navigate -u "https://example.com/login" fill -s "#email" -v "user@example.com" fill -s "#password" -v "secret" click -s "#login-btn"
```
**Smart mode workflow (map auto-generated on navigate):**
```bash
node .browser-pilot/bp navigate -u "https://example.com"
node .browser-pilot/bp click --text "Login" --type button
node .browser-pilot/bp fill --text "Email" -v "user@example.com"
node .browser-pilot/bp fill --text "Password" -v "secret"
node .browser-pilot/bp click --text "Submit" --verify
```
**Smart mode workflow (Chain Mode):**
```bash
node .browser-pilot/bp chain navigate -u "https://example.com" click --text "Login" --type button fill --text "Email" -v "user@example.com" fill --text "Password" -v "secret" click --text "Submit" --verify
```
**Extract data:**
```bash
node .browser-pilot/bp navigate -u "https://example.com/products"
node .browser-pilot/bp extract -s ".product-title"
node .browser-pilot/bp extract -s ".product-price"
```

View File

@@ -0,0 +1,434 @@
# Interaction Map System
## Overview
The Interaction Map system provides reliable element targeting for browser automation by generating a structured JSON representation of all interactive elements on a webpage. This eliminates brittle CSS selectors and enables text-based element search with automatic selector generation.
## Architecture
### Components
1. **Map Generator** (`src/cdp/map/generate-interaction-map.ts`)
- Browser-side script that extracts all interactive elements
- Generates multiple selector types for each element
- Handles SVG elements, disabled states, React components
2. **Map Manager** (`src/daemon/map-manager.ts`)
- Daemon-level automatic map generation on page load
- 10-minute cache with auto-regeneration
- URL-based cache validation
- Event-driven DOM stabilization detection
3. **Map Query Module** (`src/cdp/map/query-map.ts`)
- Loads and queries interaction maps
- Searches by text, type, ID, visibility
- Returns best selector with alternatives
4. **CLI Integration** (`src/cli/commands/interaction.ts`)
- Smart Mode options: `--text`, `--index`, `--type`, `--viewport-only`
- Automatic map querying before action execution
- Fallback to alternative selectors on failure
### Automatic Map Generation
Maps are automatically generated when:
- Navigating to a new page (`node .browser-pilot/bp navigate -u "<url>"`)
- Page reload (`node .browser-pilot/bp reload`)
- Cache expires (10 minutes)
- Manual force generation (daemon command)
No manual map generation needed - the daemon handles it automatically.
Output location: `.browser-pilot/interaction-map.json`
## JSON Structure
Maps use a hybrid structure optimized for both direct access and search:
```json
{
"url": "https://example.com",
"timestamp": "2025-11-05T14:39:03.598+09:00",
"viewport": {
"width": 2560,
"height": 1305
},
"elements": {
"elem_0": {
"id": "elem_0",
"type": "button",
"tag": "button",
"text": "Submit",
"value": null,
"selectors": {
"byText": "//button[contains(text(), 'Submit')]",
"byId": "#submit-btn",
"byCSS": "button.btn.btn-primary",
"byRole": "[role='button']",
"byAriaLabel": "[aria-label='Submit form']"
},
"attributes": {
"id": "submit-btn",
"class": "btn btn-primary",
"disabled": false
},
"position": {
"x": 1275,
"y": 650
},
"visibility": {
"inViewport": true,
"visible": true,
"obscured": false
},
"context": {
"section": "Form"
}
}
},
"indexes": {
"byText": {
"Submit": ["elem_0", "elem_15"],
"Delete": ["elem_5", "elem_6", "elem_7"]
},
"byType": {
"button": ["elem_0", "elem_1", "elem_2"],
"input-text": ["elem_10", "elem_11"]
},
"inViewport": ["elem_0", "elem_1", "elem_2", "elem_10"]
},
"statistics": {
"total": 45,
"byType": {
"button": 12,
"input-text": 5,
"a": 8
},
"duplicates": 3
}
}
```
### Key Features
**1. Key-Value Structure** (`elements`)
- Direct ID access: `map.elements["elem_0"]`
- Avoids array iteration for known IDs
**2. Indexes** (fast lookup)
- `byText`: Maps text content → element IDs
- `byType`: Maps element types → element IDs
- `inViewport`: Array of visible element IDs
**3. Multiple Selectors**
- `byText`: XPath with tag name (e.g., `//button[contains(text(), 'Submit')]`)
- `byId`: CSS ID selector (highest priority)
- `byCSS`: CSS class selector
- `byRole`: ARIA role selector
- `byAriaLabel`: ARIA label selector
**4. Automatic Indexing**
- Duplicate text elements get indexed: `(//button[contains(text(), 'Delete')])[2]`
- Enables "click the 3rd Delete button" functionality
**5. Auto-Caching**
- 10-minute cache TTL
- Automatically regenerates on expiration or navigation
- URL-based validation to prevent stale maps
## Element Detection
### Interactive Element Types
The map generator detects:
- Standard inputs: `<input>`, `<button>`, `<select>`, `<textarea>`
- Links: `<a href="...">`
- ARIA roles: `button`, `link`, `textbox`, `checkbox`, `radio`, etc.
- Click handlers: Elements with `onclick`, React event handlers
- Cursor style: `cursor: pointer`
- Tab-navigable: `tabindex >= 0`
### Special Cases
**SVG Elements:**
```typescript
// Handles SVGAnimatedString className
const className = typeof el.className === 'string'
? el.className
: (el.className.baseVal || '');
```
**Disabled Buttons:**
```typescript
// Standard interactive elements included even if disabled
const isStandardInteractive = ['INPUT', 'BUTTON', 'SELECT', 'TEXTAREA', 'A'].includes(tag);
if (!isStandardInteractive && style.pointerEvents === 'none') {
return false; // Skip
}
```
**React Components:**
```typescript
// Detect React event handlers
const reactProps = Object.keys(el).filter(key => key.startsWith('__react'));
const hasReactHandlers = reactProps.some(prop => {
const value = el[prop];
return value && typeof value === 'object' && value.onClick;
});
```
## Selector Generation
### Priority Order
Query system selects best selector with this priority:
1. **byId** (highest priority)
- Most stable, unique identifier
- Example: `#login-button`
2. **byText** (indexed for duplicates)
- Tag-specific XPath: `//button[contains(text(), 'Submit')]`
- With indexing: `(//button[contains(text(), 'Delete')])[2]`
3. **byCSS**
- Safe classes only (alphanumeric, hyphens, underscores)
- Example: `button.btn.btn-primary`
- Skips generic tag-only selectors
4. **byRole**
- ARIA role attribute
- Example: `[role="button"]`
5. **byAriaLabel** (lowest priority)
- ARIA label attribute
- Example: `[aria-label="Submit form"]`
### Text-Based XPath
XPath selectors include tag names for precision:
**Before:** `//*[contains(text(), 'Submit')]`
- Problem: Matches any element with that text (div, span, button, etc.)
**After:** `//button[contains(text(), 'Submit')]`
- Solution: Only matches `<button>` elements
- More precise, faster execution
## Query API
### Query Options
```typescript
interface QueryOptions {
text?: string; // Search by text content
type?: string; // Filter by element type (supports aliases: "input" → "input-*")
tag?: string; // Filter by HTML tag (e.g., "input", "button")
index?: number; // Select nth match (1-based)
viewportOnly?: boolean; // Only visible elements
id?: string; // Direct ID lookup
}
```
**Type Aliases:**
- Generic types auto-expand to match all subtypes
- `type: "input"` → matches `input`, `input-text`, `input-search`, `input-password`, etc.
- `type: "button"` → matches `button`, `button-submit`, `button-reset`, etc.
- Specific types match exactly: `type: "input-search"` → only `input-search`
**Tag vs Type:**
- `tag`: Filters by HTML tag name (e.g., `<input>`, `<button>`)
- `type`: Filters by interaction map type classification (more specific, includes subtypes)
- Use `tag` for broader matching, `type` for precise targeting
**3-Stage Fallback (Automatic):**
When element not found, system automatically:
1. Tries type-based search (with alias expansion)
2. Falls back to tag-based search (if type specified)
3. Regenerates map and retries (up to 3 attempts)
### Usage Examples
**Direct ID lookup:**
```typescript
const results = queryMap(map, { id: 'elem_0' });
// Returns: Single element with that ID
```
**Text search:**
```typescript
const results = queryMap(map, { text: 'Delete' });
// Returns: All elements containing "Delete"
```
**Text + index:**
```typescript
const results = queryMap(map, { text: 'Delete', index: 2 });
// Returns: Second element containing "Delete"
```
**Type filter:**
```typescript
const results = queryMap(map, { type: 'button' });
// Returns: All button elements
```
**Text + type:**
```typescript
const results = queryMap(map, { text: 'Submit', type: 'button' });
// Returns: Button elements containing "Submit"
```
**Visibility filter:**
```typescript
const results = queryMap(map, { text: 'Add to Cart', viewportOnly: true });
// Returns: Only "Add to Cart" elements currently visible
```
### Fuzzy Search
When exact text match fails, falls back to fuzzy search:
```typescript
// Query: { text: 'menu' }
// Matches: "메뉴로 돌아가기", "Main Menu", "menu button"
// Case-insensitive, substring matching
```
## CLI Smart Mode
### Click Command
```bash
# Search by text
node .browser-pilot/bp click --text "Submit"
# With index for duplicates
node .browser-pilot/bp click --text "Delete" --index 2
# Filter by type
node .browser-pilot/bp click --text "Add to Cart" --type button
# Visible elements only
node .browser-pilot/bp click --text "Next" --viewport-only
```
### Fill Command
```bash
# Search input by label
node .browser-pilot/bp fill --text "Username" -v "testuser"
# Filter by input type
node .browser-pilot/bp fill --text "Password" -v "secret" --type input-password
# Visible inputs only
node .browser-pilot/bp fill --text "Email" -v "test@example.com" --viewport-only
```
## Cache Management
### Automatic Cache
Maps are cached for 10 minutes with automatic management:
- Auto-generated on first page load
- Auto-regenerated after 10 minutes
- Auto-regenerated on navigation
- URL validation prevents stale maps
Cache location: `.browser-pilot/map-cache.json`
### Manual Control (Daemon Commands)
Force regenerate map:
```bash
npm run bp:daemon-send -- --command MAP_GENERATE --params '{"force":true}'
```
Query current map:
```bash
npm run bp:daemon-send -- --command MAP_QUERY --params '{"text":"Submit","type":"button"}'
```
## Best Practices
1. **Let daemon auto-manage**
- Maps generate automatically on page load
- No manual generation needed
2. **Use text + index for duplicates**
- Better than CSS classes that may change
- More readable: "click 2nd Delete" vs complex selector
3. **Filter by type**
- Narrows results when text is ambiguous
- `--type button` excludes links, divs with same text
4. **Verify visibility**
- `--viewport-only` ensures element is on screen
- Avoids clicking hidden/off-screen elements
5. **Check map statistics**
- Review duplicates count in map JSON
- Helps determine if indexing is needed
6. **Fallback handling**
- Smart Mode automatically tries alternative selectors
- Check console for errors if action fails
## Troubleshooting
### Element not found in map
**Cause:** Element may not be detected as interactive
**Solutions:**
1. Check if element has click handler: Look for `onclick`, React handlers
2. Verify cursor style: Should be `pointer` for clickable elements
3. Check ARIA role: Element should have appropriate role
4. Force regenerate map if recently added to page
### Wrong element selected
**Cause:** Multiple elements with same text
**Solutions:**
1. Use `--index` to select specific match
2. Add `--type` filter to narrow results
3. Use `--viewport-only` to exclude off-screen elements
4. Check element position in map JSON
### Map out of date
**Cause:** Page changed after map generation
**Solutions:**
1. Maps auto-regenerate after 10 minutes
2. Force regenerate with daemon command
3. Check timestamp in map JSON
4. Verify URL matches current page
### Cache not updating
**Cause:** URL changed but cache still returns old map
**Solutions:**
1. Daemon validates URL before returning cache
2. Force regenerate with `force:true` parameter
3. Check cache file for URL mismatch
4. Restart daemon if persists
## Future Enhancements
Current status (v1.3.0):
- ✓ Automatic map generation on page load
- ✓ Daemon-level map caching and management
- ✓ Action verification with automatic retry
- ✓ URL-based cache validation
- ✓ Chain mode with automatic map synchronization
- ✓ Handler architecture refactoring for maintainability
Planned improvements:
- Visual map inspector tool
- Map diff for debugging selector changes
- Performance metrics and optimization
- Additional daemon commands (wait-idle, sleep)

View File

@@ -0,0 +1,447 @@
# Selector Strategies Guide
## Selector Types Overview
| Selector Type | Stability | Speed | Best For |
|--------------|-----------|-------|----------|
| CSS ID | ⭐⭐⭐⭐⭐ | Fast | Unique IDs |
| Smart Mode (text) | ⭐⭐⭐⭐ | Medium | Text-based UI |
| XPath (text) | ⭐⭐⭐⭐ | Medium | Text content |
| CSS Class | ⭐⭐ | Fast | Stable classes |
| XPath (structure) | ⭐⭐ | Medium | DOM structure |
## Decision Tree
```
Does element have unique ID?
├─ YES → Use CSS: #element-id
└─ NO ↓
Is element identified by text?
├─ YES → Use Smart Mode: --text "Submit"
└─ NO ↓
Does element have stable class?
├─ YES → Use CSS: .stable-class
└─ NO ↓
Can use ARIA attributes?
├─ YES → Use XPath: //*[@role='button']
└─ NO ↓
Use structural XPath
```
## CSS Selectors
### By ID (Most Stable)
```bash
node .browser-pilot/bp click -s "#login-button"
node .browser-pilot/bp click -s "#user-menu"
```
### By Class
```bash
node .browser-pilot/bp click -s ".submit-btn"
node .browser-pilot/bp click -s ".modal .close-button"
```
### By Attribute
```bash
node .browser-pilot/bp fill -s "input[name='email']" -v "test@example.com"
node .browser-pilot/bp click -s "button[data-action='submit']"
```
### Complex Selectors
```bash
# Direct child
node .browser-pilot/bp click -s "div.modal > button.primary"
# Descendant
node .browser-pilot/bp click -s "form .submit-button"
# Nth-child
node .browser-pilot/bp click -s "ul > li:nth-child(3)"
# Multiple classes
node .browser-pilot/bp click -s "button.btn.btn-primary.btn-lg"
```
## XPath Selectors
### Text-Based (Most Reliable)
**With Tag Name (Recommended):**
```bash
# Specific tag with text
node .browser-pilot/bp click -s "//button[contains(text(), 'Submit')]"
node .browser-pilot/bp click -s "//a[contains(text(), 'Learn More')]"
# Exact text match
node .browser-pilot/bp click -s "//button[text()='Sign In']"
```
**With Wildcard (Avoid):**
```bash
# Searches all elements (slow, imprecise)
node .browser-pilot/bp click -s "//*[contains(text(), 'Submit')]"
```
### Indexed XPath (Duplicates)
```bash
# First match
node .browser-pilot/bp click -s "(//button[contains(text(), 'Delete')])[1]"
# Third match
node .browser-pilot/bp click -s "(//button[contains(text(), 'Delete')])[3]"
# Last match (if 5 exist)
node .browser-pilot/bp click -s "(//button[contains(text(), 'Delete')])[5]"
```
### Attribute-Based
```bash
# By type
node .browser-pilot/bp fill -s "//*[@type='email']" -v "test@example.com"
# By role
node .browser-pilot/bp click -s "//*[@role='button']"
# By data attribute
node .browser-pilot/bp click -s "//*[@data-testid='submit']"
# Partial match
node .browser-pilot/bp click -s "//*[contains(@href, 'checkout')]"
```
### Structural XPath
```bash
# Parent-child
node .browser-pilot/bp click -s "//div[@class='modal']//button[@type='submit']"
# Following sibling
node .browser-pilot/bp click -s "//h1[contains(text(), 'Welcome')]/following-sibling::button"
# Preceding sibling
node .browser-pilot/bp click -s "//button[contains(text(), 'Next')]/preceding-sibling::button"
# Parent
node .browser-pilot/bp click -s "//button[@type='submit']/parent::form"
```
## Smart Mode Selectors
### Basic Text Search
```bash
node .browser-pilot/bp click --text "Submit"
node .browser-pilot/bp click --text "Add to Cart"
node .browser-pilot/bp fill --text "Username" -v "test"
```
### With Type Filter
```bash
# Only buttons
node .browser-pilot/bp click --text "Submit" --type button
# Only links
node .browser-pilot/bp click --text "Learn More" --type a
# Only text inputs
node .browser-pilot/bp fill --text "Email" -v "test@example.com" --type input-text
```
### Type Aliases (Auto-Expanded)
```bash
# Generic type (matches all subtypes)
node .browser-pilot/bp click --text "Search" --type input
# Expands to: input, input-text, input-search, input-password, etc.
# Specific type (exact match)
node .browser-pilot/bp fill --text "Email" --type input-search -v "query"
# Matches only: input-search
# Button aliases
node .browser-pilot/bp click --text "Submit" --type button
# Expands to: button, button-submit, button-reset, etc.
```
### Tag-Based Search
```bash
# Filter by HTML tag (broader matching)
node .browser-pilot/bp click --text "Submit" --tag button
# Matches all <button> elements regardless of type
node .browser-pilot/bp fill --text "Email" --tag input -v "user@example.com"
# Matches all <input> elements regardless of type
# Combined with text search
node .browser-pilot/bp click --text "Next" --tag button --viewport-only
```
**Tag vs Type:**
- `--tag`: Matches HTML tag name (`<button>`, `<input>`)
- `--type`: Matches interaction map classification (more specific)
- Use `--tag` when type filtering fails or for broader matches
### With Indexing
```bash
# Second "Delete" button
node .browser-pilot/bp click --text "Delete" --index 2 --type button
# First "Submit" button
node .browser-pilot/bp click --text "Submit" --index 1
```
### With Visibility Filter
```bash
# Only visible elements
node .browser-pilot/bp click --text "Next" --viewport-only
# Visible "Add to Cart" buttons
node .browser-pilot/bp click --text "Add to Cart" --viewport-only --type button
```
## Common Patterns
### Form Filling
**Direct mode (fast but brittle):**
```bash
node .browser-pilot/bp fill -s "#email" -v "user@example.com"
node .browser-pilot/bp fill -s "#password" -v "secret"
node .browser-pilot/bp click -s "#login-btn"
```
**Smart mode (slower but reliable, map auto-generated on page load):**
```bash
node .browser-pilot/bp fill --text "Email" -v "user@example.com"
node .browser-pilot/bp fill --text "Password" -v "secret"
node .browser-pilot/bp click --text "Login" --type button
```
**Chain mode (Direct):**
```bash
node .browser-pilot/bp chain navigate -u <url> fill -s #email -v <email> fill -s #password -v <password> click -s #login-btn
```
**Chain mode (Smart - recommended):**
```bash
node .browser-pilot/bp chain navigate -u <url> fill --text Email -v <email> fill --text Password -v <password> click --text Login --type button
```
**Note:** Chain mode auto-waits for map generation after navigation and adds human-like delays between commands.
### Clicking Nth Item
**CSS nth-child:**
```bash
node .browser-pilot/bp click -s "ul.products > li:nth-child(3) button"
```
**XPath indexing:**
```bash
node .browser-pilot/bp click -s "(//ul[@class='products']//button)[3]"
```
**Smart mode indexing:**
```bash
node .browser-pilot/bp click --text "Add to Cart" --index 3
```
### Modal Interactions
**CSS scoping:**
```bash
node .browser-pilot/bp click -s ".modal .btn-primary"
node .browser-pilot/bp click -s "#confirm-modal button.submit"
```
**XPath scoping:**
```bash
node .browser-pilot/bp click -s "//div[@class='modal']//button[contains(text(), 'Confirm')]"
```
**Smart mode:**
```bash
node .browser-pilot/bp click --text "Confirm" --type button --viewport-only
```
### Dynamic Content
**Wait then interact:**
```bash
node .browser-pilot/bp wait -s ".loading-complete" -t 5000
node .browser-pilot/bp click --text "Load More" --viewport-only
```
**Chain mode:**
```bash
node .browser-pilot/bp chain wait -s ".loading-complete" -t 5000 click --text "Load More" --viewport-only
```
## Best Practices
### 1. Prefer Stable Identifiers
**Good: Unique ID**
```bash
node .browser-pilot/bp click -s "#checkout-button"
```
**Good: Data attribute**
```bash
node .browser-pilot/bp click -s "button[data-testid='submit']"
```
**Avoid: Generated classes**
```bash
node .browser-pilot/bp click -s ".btn-a7s9d2f" # Likely to change
```
### 2. Use Smart Mode for Text-Based UI
**Good: Text content is stable**
```bash
node .browser-pilot/bp click --text "Continue to Checkout"
```
**Avoid: CSS classes for text buttons**
```bash
node .browser-pilot/bp click -s ".checkout-btn-primary-lg"
```
### 3. Scope Selectors When Possible
**Good: Scoped to container**
```bash
node .browser-pilot/bp click -s "#user-menu button.logout"
```
**Avoid: Global selector**
```bash
node .browser-pilot/bp click -s "button.logout" # May match wrong button
```
### 4. Handle Duplicates Explicitly
**Good: Specific index**
```bash
node .browser-pilot/bp click --text "Delete" --index 2
```
**Good: Scoped selector**
```bash
node .browser-pilot/bp click -s "#product-123 button.delete"
```
**Avoid: Ambiguous selector**
```bash
node .browser-pilot/bp click --text "Delete" # Which Delete button?
```
### 5. Verify Element Visibility
**Good: Checks visibility**
```bash
node .browser-pilot/bp click --text "Submit" --viewport-only
```
**Consider: May be off-screen**
```bash
node .browser-pilot/bp click --text "Submit"
```
## Troubleshooting
### "Element not found"
**Solution 1: Use Smart Mode**
```bash
# Map auto-generates on page load, just use text search
node .browser-pilot/bp click --text "Submit"
```
**Solution 2: Wait for element**
```bash
node .browser-pilot/bp wait -s "#submit-button" -t 5000
node .browser-pilot/bp click -s "#submit-button"
```
**Solution 3: Check selector in DevTools**
```javascript
// In browser console:
document.querySelector('#submit-button') // CSS
$x("//button[contains(text(), 'Submit')]") // XPath
```
### "Multiple elements match"
**Solution 1: Use indexing**
```bash
node .browser-pilot/bp click --text "Delete" --index 2
```
**Solution 2: Add more specificity**
```bash
# Before
node .browser-pilot/bp click -s "button"
# After
node .browser-pilot/bp click -s "#product-list button.delete"
```
**Solution 3: Filter by type**
```bash
node .browser-pilot/bp click --text "Submit" --type button
```
### "Wrong element clicked"
**Solution 1: Inspect map**
```bash
# Check interaction map JSON
cat .browser-pilot/interaction-map.json | jq '.indexes.byText'
# Find element IDs with your text
# Verify positions and types
```
**Solution 2: Use visibility filter**
```bash
node .browser-pilot/bp click --text "Add to Cart" --viewport-only
```
**Solution 3: Be more specific**
```bash
# Before
node .browser-pilot/bp click --text "Submit"
# After
node .browser-pilot/bp click --text "Submit Order" --type button
```
## Framework-Specific Tips
### React Applications
- Use `data-testid` attributes if available
- Text-based XPath works well (stable)
- Smart Mode recommended for dynamic classes
- Avoid CSS classes (often generated)
### Angular Applications
- Use `ng-` attributes if available
- Text-based selectors are stable
- Smart Mode recommended
- Avoid dynamic `_ngcontent` attributes
### Vue Applications
- Use `data-` attributes if available
- Text-based XPath works well
- Smart Mode recommended
- Avoid scoped CSS classes
### Plain HTML
- CSS selectors work well
- IDs and classes are usually stable
- Direct mode is often sufficient
- Use Smart Mode for dynamic content