Files
gh-dev-gom-claude-code-mark…/skills/references/interaction-map.md
2025-11-29 18:18:56 +08:00

435 lines
12 KiB
Markdown

# Interaction Map System
## Overview
The Interaction Map system provides reliable element targeting for browser automation by generating a structured JSON representation of all interactive elements on a webpage. This eliminates brittle CSS selectors and enables text-based element search with automatic selector generation.
## Architecture
### Components
1. **Map Generator** (`src/cdp/map/generate-interaction-map.ts`)
- Browser-side script that extracts all interactive elements
- Generates multiple selector types for each element
- Handles SVG elements, disabled states, React components
2. **Map Manager** (`src/daemon/map-manager.ts`)
- Daemon-level automatic map generation on page load
- 10-minute cache with auto-regeneration
- URL-based cache validation
- Event-driven DOM stabilization detection
3. **Map Query Module** (`src/cdp/map/query-map.ts`)
- Loads and queries interaction maps
- Searches by text, type, ID, visibility
- Returns best selector with alternatives
4. **CLI Integration** (`src/cli/commands/interaction.ts`)
- Smart Mode options: `--text`, `--index`, `--type`, `--viewport-only`
- Automatic map querying before action execution
- Fallback to alternative selectors on failure
### Automatic Map Generation
Maps are automatically generated when:
- Navigating to a new page (`node .browser-pilot/bp navigate -u "<url>"`)
- Page reload (`node .browser-pilot/bp reload`)
- Cache expires (10 minutes)
- Manual force generation (daemon command)
No manual map generation needed - the daemon handles it automatically.
Output location: `.browser-pilot/interaction-map.json`
## JSON Structure
Maps use a hybrid structure optimized for both direct access and search:
```json
{
"url": "https://example.com",
"timestamp": "2025-11-05T14:39:03.598+09:00",
"viewport": {
"width": 2560,
"height": 1305
},
"elements": {
"elem_0": {
"id": "elem_0",
"type": "button",
"tag": "button",
"text": "Submit",
"value": null,
"selectors": {
"byText": "//button[contains(text(), 'Submit')]",
"byId": "#submit-btn",
"byCSS": "button.btn.btn-primary",
"byRole": "[role='button']",
"byAriaLabel": "[aria-label='Submit form']"
},
"attributes": {
"id": "submit-btn",
"class": "btn btn-primary",
"disabled": false
},
"position": {
"x": 1275,
"y": 650
},
"visibility": {
"inViewport": true,
"visible": true,
"obscured": false
},
"context": {
"section": "Form"
}
}
},
"indexes": {
"byText": {
"Submit": ["elem_0", "elem_15"],
"Delete": ["elem_5", "elem_6", "elem_7"]
},
"byType": {
"button": ["elem_0", "elem_1", "elem_2"],
"input-text": ["elem_10", "elem_11"]
},
"inViewport": ["elem_0", "elem_1", "elem_2", "elem_10"]
},
"statistics": {
"total": 45,
"byType": {
"button": 12,
"input-text": 5,
"a": 8
},
"duplicates": 3
}
}
```
### Key Features
**1. Key-Value Structure** (`elements`)
- Direct ID access: `map.elements["elem_0"]`
- Avoids array iteration for known IDs
**2. Indexes** (fast lookup)
- `byText`: Maps text content → element IDs
- `byType`: Maps element types → element IDs
- `inViewport`: Array of visible element IDs
**3. Multiple Selectors**
- `byText`: XPath with tag name (e.g., `//button[contains(text(), 'Submit')]`)
- `byId`: CSS ID selector (highest priority)
- `byCSS`: CSS class selector
- `byRole`: ARIA role selector
- `byAriaLabel`: ARIA label selector
**4. Automatic Indexing**
- Duplicate text elements get indexed: `(//button[contains(text(), 'Delete')])[2]`
- Enables "click the 3rd Delete button" functionality
**5. Auto-Caching**
- 10-minute cache TTL
- Automatically regenerates on expiration or navigation
- URL-based validation to prevent stale maps
## Element Detection
### Interactive Element Types
The map generator detects:
- Standard inputs: `<input>`, `<button>`, `<select>`, `<textarea>`
- Links: `<a href="...">`
- ARIA roles: `button`, `link`, `textbox`, `checkbox`, `radio`, etc.
- Click handlers: Elements with `onclick`, React event handlers
- Cursor style: `cursor: pointer`
- Tab-navigable: `tabindex >= 0`
### Special Cases
**SVG Elements:**
```typescript
// Handles SVGAnimatedString className
const className = typeof el.className === 'string'
? el.className
: (el.className.baseVal || '');
```
**Disabled Buttons:**
```typescript
// Standard interactive elements included even if disabled
const isStandardInteractive = ['INPUT', 'BUTTON', 'SELECT', 'TEXTAREA', 'A'].includes(tag);
if (!isStandardInteractive && style.pointerEvents === 'none') {
return false; // Skip
}
```
**React Components:**
```typescript
// Detect React event handlers
const reactProps = Object.keys(el).filter(key => key.startsWith('__react'));
const hasReactHandlers = reactProps.some(prop => {
const value = el[prop];
return value && typeof value === 'object' && value.onClick;
});
```
## Selector Generation
### Priority Order
Query system selects best selector with this priority:
1. **byId** (highest priority)
- Most stable, unique identifier
- Example: `#login-button`
2. **byText** (indexed for duplicates)
- Tag-specific XPath: `//button[contains(text(), 'Submit')]`
- With indexing: `(//button[contains(text(), 'Delete')])[2]`
3. **byCSS**
- Safe classes only (alphanumeric, hyphens, underscores)
- Example: `button.btn.btn-primary`
- Skips generic tag-only selectors
4. **byRole**
- ARIA role attribute
- Example: `[role="button"]`
5. **byAriaLabel** (lowest priority)
- ARIA label attribute
- Example: `[aria-label="Submit form"]`
### Text-Based XPath
XPath selectors include tag names for precision:
**Before:** `//*[contains(text(), 'Submit')]`
- Problem: Matches any element with that text (div, span, button, etc.)
**After:** `//button[contains(text(), 'Submit')]`
- Solution: Only matches `<button>` elements
- More precise, faster execution
## Query API
### Query Options
```typescript
interface QueryOptions {
text?: string; // Search by text content
type?: string; // Filter by element type (supports aliases: "input" → "input-*")
tag?: string; // Filter by HTML tag (e.g., "input", "button")
index?: number; // Select nth match (1-based)
viewportOnly?: boolean; // Only visible elements
id?: string; // Direct ID lookup
}
```
**Type Aliases:**
- Generic types auto-expand to match all subtypes
- `type: "input"` → matches `input`, `input-text`, `input-search`, `input-password`, etc.
- `type: "button"` → matches `button`, `button-submit`, `button-reset`, etc.
- Specific types match exactly: `type: "input-search"` → only `input-search`
**Tag vs Type:**
- `tag`: Filters by HTML tag name (e.g., `<input>`, `<button>`)
- `type`: Filters by interaction map type classification (more specific, includes subtypes)
- Use `tag` for broader matching, `type` for precise targeting
**3-Stage Fallback (Automatic):**
When element not found, system automatically:
1. Tries type-based search (with alias expansion)
2. Falls back to tag-based search (if type specified)
3. Regenerates map and retries (up to 3 attempts)
### Usage Examples
**Direct ID lookup:**
```typescript
const results = queryMap(map, { id: 'elem_0' });
// Returns: Single element with that ID
```
**Text search:**
```typescript
const results = queryMap(map, { text: 'Delete' });
// Returns: All elements containing "Delete"
```
**Text + index:**
```typescript
const results = queryMap(map, { text: 'Delete', index: 2 });
// Returns: Second element containing "Delete"
```
**Type filter:**
```typescript
const results = queryMap(map, { type: 'button' });
// Returns: All button elements
```
**Text + type:**
```typescript
const results = queryMap(map, { text: 'Submit', type: 'button' });
// Returns: Button elements containing "Submit"
```
**Visibility filter:**
```typescript
const results = queryMap(map, { text: 'Add to Cart', viewportOnly: true });
// Returns: Only "Add to Cart" elements currently visible
```
### Fuzzy Search
When exact text match fails, falls back to fuzzy search:
```typescript
// Query: { text: 'menu' }
// Matches: "메뉴로 돌아가기", "Main Menu", "menu button"
// Case-insensitive, substring matching
```
## CLI Smart Mode
### Click Command
```bash
# Search by text
node .browser-pilot/bp click --text "Submit"
# With index for duplicates
node .browser-pilot/bp click --text "Delete" --index 2
# Filter by type
node .browser-pilot/bp click --text "Add to Cart" --type button
# Visible elements only
node .browser-pilot/bp click --text "Next" --viewport-only
```
### Fill Command
```bash
# Search input by label
node .browser-pilot/bp fill --text "Username" -v "testuser"
# Filter by input type
node .browser-pilot/bp fill --text "Password" -v "secret" --type input-password
# Visible inputs only
node .browser-pilot/bp fill --text "Email" -v "test@example.com" --viewport-only
```
## Cache Management
### Automatic Cache
Maps are cached for 10 minutes with automatic management:
- Auto-generated on first page load
- Auto-regenerated after 10 minutes
- Auto-regenerated on navigation
- URL validation prevents stale maps
Cache location: `.browser-pilot/map-cache.json`
### Manual Control (Daemon Commands)
Force regenerate map:
```bash
npm run bp:daemon-send -- --command MAP_GENERATE --params '{"force":true}'
```
Query current map:
```bash
npm run bp:daemon-send -- --command MAP_QUERY --params '{"text":"Submit","type":"button"}'
```
## Best Practices
1. **Let daemon auto-manage**
- Maps generate automatically on page load
- No manual generation needed
2. **Use text + index for duplicates**
- Better than CSS classes that may change
- More readable: "click 2nd Delete" vs complex selector
3. **Filter by type**
- Narrows results when text is ambiguous
- `--type button` excludes links, divs with same text
4. **Verify visibility**
- `--viewport-only` ensures element is on screen
- Avoids clicking hidden/off-screen elements
5. **Check map statistics**
- Review duplicates count in map JSON
- Helps determine if indexing is needed
6. **Fallback handling**
- Smart Mode automatically tries alternative selectors
- Check console for errors if action fails
## Troubleshooting
### Element not found in map
**Cause:** Element may not be detected as interactive
**Solutions:**
1. Check if element has click handler: Look for `onclick`, React handlers
2. Verify cursor style: Should be `pointer` for clickable elements
3. Check ARIA role: Element should have appropriate role
4. Force regenerate map if recently added to page
### Wrong element selected
**Cause:** Multiple elements with same text
**Solutions:**
1. Use `--index` to select specific match
2. Add `--type` filter to narrow results
3. Use `--viewport-only` to exclude off-screen elements
4. Check element position in map JSON
### Map out of date
**Cause:** Page changed after map generation
**Solutions:**
1. Maps auto-regenerate after 10 minutes
2. Force regenerate with daemon command
3. Check timestamp in map JSON
4. Verify URL matches current page
### Cache not updating
**Cause:** URL changed but cache still returns old map
**Solutions:**
1. Daemon validates URL before returning cache
2. Force regenerate with `force:true` parameter
3. Check cache file for URL mismatch
4. Restart daemon if persists
## Future Enhancements
Current status (v1.3.0):
- ✓ Automatic map generation on page load
- ✓ Daemon-level map caching and management
- ✓ Action verification with automatic retry
- ✓ URL-based cache validation
- ✓ Chain mode with automatic map synchronization
- ✓ Handler architecture refactoring for maintainability
Planned improvements:
- Visual map inspector tool
- Map diff for debugging selector changes
- Performance metrics and optimization
- Additional daemon commands (wait-idle, sleep)