Initial commit
skills/reflect-appworld-failure/SKILL.md
@@ -0,0 +1,200 @@
---
name: reflect-appworld-failure
description: Analyze AppWorld task failures to extract specific API patterns and generate actionable playbook bullets with concrete code examples
allowed-tools: Read
---

# Reflect on AppWorld Failure

Analyze failed AppWorld tasks to extract specific, actionable learnings that can be added to the playbook.

## Purpose

When an AppWorld task fails, the Reflector calls this Skill with error details and failed code. You analyze the failure semantically and generate a high-quality bullet with:
1. Specific title describing the pattern
2. Detailed content with working code examples
3. Relevant tags for retrieval
4. Appropriate confidence level

## Input Format

The input will be a text description with sections:

```
# Task
<task instruction>

## Apps
<comma-separated list of apps used>

## Error Type
<error_type: api_misuse, logic_error, timeout, etc.>

## Error Messages
<list of error messages from execution>

## Failed Code Snippet
<relevant code that failed>

## Missing Patterns (from heuristics)
<list of patterns the old system identified>

## Suggested Fixes (from heuristics)
<list of fix suggestions>
```
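
For orientation, here is a minimal sketch of how a report in this layout could be split into named sections. The function name `parse_failure_report` is hypothetical and not part of the Skill. The sketch assumes headings always start a line with `#` or `##`, so a Python comment at the start of a line inside the failed code snippet would be misread as a heading.

```python
import re


def parse_failure_report(text: str) -> dict[str, str]:
    """Split a failure report into {section heading: body text}."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        match = re.match(r"^#{1,2}\s+(.*)$", line)
        if match:
            # Drop trailing qualifiers such as "(from heuristics)".
            current = re.sub(r"\s*\(.*\)$", "", match.group(1)).strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(body).strip() for name, body in sections.items()}


report = """# Task
What is the title of the most-liked song in my Spotify playlists

## Apps
spotify

## Error Type
api_misuse

## Error Messages
AttributeError: 'Spotify' object has no attribute 'get_tracks'
"""

parsed = parse_failure_report(report)
# The Apps section is a comma-separated list.
apps = [name.strip() for name in parsed["Apps"].split(",")]
```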

## Your Analysis Process

1. **Identify Root Cause**: What was the fundamental mistake?
   - Wrong API method name?
   - Missing authentication?
   - Incorrect data structure access?
   - Logic error?

2. **Extract Pattern**: What general pattern does this represent?
   - Is this specific to one app, or does it apply to several?
   - Is this about API order (login first)?
   - Is this about method naming conventions?
   - Is this about data validation?

3. **Generate Concrete Example**: Create working code that demonstrates the CORRECT pattern

4. **Write Actionable Bullet**: Make it specific enough that the Generator can apply it
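
As a rough illustration of step 1, the sketch below maps error messages to a candidate root cause with simple regular expressions. The rule table and the labels (`wrong_method_name` and so on) are assumptions made for this example, not categories defined by the Skill. It is at most a first-pass triage and does not replace the semantic analysis described above.

```python
import re

# Hypothetical triage rules: (regex on the error text, candidate root cause).
ROOT_CAUSE_RULES = [
    (r"has no attribute", "wrong_method_name"),
    (r"401|unauthorized|access_token", "missing_authentication"),
    (r"KeyError|TypeError|IndexError", "incorrect_data_access"),
]


def guess_root_cause(error_messages: list[str]) -> str:
    """Return the first matching label, or 'logic_error' when nothing matches."""
    joined = "\n".join(error_messages)
    for pattern, label in ROOT_CAUSE_RULES:
        if re.search(pattern, joined, flags=re.IGNORECASE):
            return label
    return "logic_error"


cause = guess_root_cause(
    ["AttributeError: 'Spotify' object has no attribute 'get_tracks'"]
)
```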

## Output Format

Return a JSON object with this structure:

```json
{
  "bullet": {
    "id": "bullet-YYYY-MM-DD-HHMMSS",
    "title": "<Specific pattern title>",
    "content": "<Detailed explanation with working code example>",
    "tags": ["app.<app_name>", "<error_category>", "<pattern_type>"],
    "evidence": [
      {
        "type": "execution",
        "ref": "<task_id>",
        "note": "<brief note about failure>"
      }
    ],
    "confidence": "high|medium|low",
    "scope": "app|global"
  }
}
```
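
The structure above can be checked mechanically. The following is a minimal sketch, assuming the caller wants a list of problems rather than an exception. `make_bullet_id` and `validate_bullet` are hypothetical helper names; only the field names and allowed values come from the template.

```python
from datetime import datetime

REQUIRED_KEYS = {"id", "title", "content", "tags", "evidence", "confidence", "scope"}


def make_bullet_id(now: datetime) -> str:
    """Format an id following the bullet-YYYY-MM-DD-HHMMSS template."""
    return now.strftime("bullet-%Y-%m-%d-%H%M%S")


def validate_bullet(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload looks valid."""
    bullet = payload.get("bullet")
    if not isinstance(bullet, dict):
        return ["top-level 'bullet' object is missing"]
    problems = []
    missing = REQUIRED_KEYS - bullet.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if bullet.get("confidence") not in {"high", "medium", "low"}:
        problems.append("confidence must be high, medium, or low")
    if bullet.get("scope") not in {"app", "global"}:
        problems.append("scope must be app or global")
    for item in bullet.get("evidence", []):
        if not {"type", "ref", "note"} <= item.keys():
            problems.append("each evidence entry needs type, ref, and note")
    return problems


example = {
    "bullet": {
        "id": make_bullet_id(datetime(2025, 10, 27, 12, 34, 56)),
        "title": "Spotify: Use show_playlist_songs() not get_tracks()",
        "content": "Use show_playlist_songs(access_token, playlist_id).",
        "tags": ["app.spotify", "api_misuse"],
        "evidence": [{"type": "execution", "ref": "task", "note": "AttributeError"}],
        "confidence": "high",
        "scope": "app",
    }
}
problems = validate_bullet(example)
```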

## Bullet Quality Guidelines

### GOOD Bullets (Specific and Actionable)

**Title**: "Spotify: Use show_playlist_songs() not get_tracks()"
**Content**: "Spotify API uses show_playlist_songs(access_token, playlist_id) to retrieve tracks. The method get_tracks() does not exist. Example: `songs = apis.spotify.show_playlist_songs(access_token=token, playlist_id=playlist['id'])`"
**Tags**: ["app.spotify", "api_misuse", "method_names", "playlists"]

**Title**: "Venmo: Call login() before search_transactions()"
**Content**: "Venmo API requires authentication token for all operations. Always call venmo.login() first to get access_token, then pass it to other methods. Example: `response = apis.venmo.login(username='user', password='pass'); token = response['access_token']; results = apis.venmo.search_transactions(access_token=token, query={'friend': 'Alice'})`"
**Tags**: ["app.venmo", "authentication", "api_order", "search"]

### BAD Bullets (Too Generic)

**Title**: "Verify venmo API logic and requirements"
**Content**: "When implementing venmo operations: Check task logic and requirements; Missing login() call for venmo"
**Tags**: ["logic", "debugging", "api", "app.venmo"]

**Why Bad**: No concrete code example, vague guidance, doesn't teach the specific pattern

## Example Analysis

### Input:
```
# Task
What is the title of the most-liked song in my Spotify playlists

## Apps
spotify

## Error Type
api_misuse

## Error Messages
AttributeError: 'Spotify' object has no attribute 'get_tracks'

## Failed Code Snippet
songs = spotify.get_tracks(playlist_id=pid)

## Missing Patterns
- Use correct Spotify API methods

## Suggested Fixes
- Check Spotify API documentation for available methods
```

### Your Analysis:

1. **Root Cause**: Code used non-existent method `get_tracks()` instead of correct `show_playlist_songs()`

2. **Pattern**: Spotify uses `show_*` naming convention for retrieval methods

3. **Scope**: App-specific (Spotify)

### Output:
```json
{
  "bullet": {
    "id": "bullet-2025-10-27-123456",
    "title": "Spotify: Use show_playlist_songs() to get tracks from playlist",
    "content": "To retrieve songs from a Spotify playlist, use show_playlist_songs(access_token, playlist_id). Don't use get_tracks() - it doesn't exist. To answer a question about all playlists, collect songs from every playlist, not only the first. Example: `token = apis.spotify.login(username='user', password='pass')['access_token']; playlists = apis.spotify.show_playlist_library(access_token=token); songs = [s for p in playlists for s in apis.spotify.show_playlist_songs(access_token=token, playlist_id=p['id'])]; most_liked = max(songs, key=lambda s: s['likes'])`",
    "tags": ["app.spotify", "api_misuse", "method_names", "playlists", "retrieval"],
    "evidence": [
      {
        "type": "execution",
        "ref": "spotify_task_001",
        "note": "AttributeError: 'Spotify' object has no attribute 'get_tracks'"
      }
    ],
    "confidence": "high",
    "scope": "app"
  }
}
```

## Common AppWorld Patterns to Look For

### Authentication Order
- Most apps require login() first to get access_token
- Token must be passed to subsequent API calls
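
The ordering can be sketched as follows. `FakeVenmo` is a hypothetical stand-in written for this example so that it runs without AppWorld. Its method names and parameters mirror the Venmo example earlier in this file and are not a verified copy of the real client.

```python
class FakeVenmo:
    """Hypothetical stand-in for apis.venmo; not the real AppWorld client."""

    def login(self, username: str, password: str) -> dict:
        return {"access_token": f"token-for-{username}"}

    def search_transactions(self, access_token: str, query: dict) -> list[dict]:
        if not access_token:
            raise PermissionError("access_token is required")
        return [{"friend": query.get("friend"), "amount": 12.5}]


venmo = FakeVenmo()

# Step 1: authenticate first and keep the token.
token = venmo.login(username="user", password="pass")["access_token"]

# Step 2: pass the token to every later call.
results = venmo.search_transactions(access_token=token, query={"friend": "Alice"})
```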

### Method Naming Conventions
- Spotify: `show_*` for retrieval (show_playlist_songs, show_album_library)
- Venmo: `show_friends`, `send_payment`, `search_transactions`
- Gmail: `fetch_emails`, `send_email`
- Contacts: `show_contacts`, `add_contact`
- Calendar: `show_events`, `create_event`

### Data Structure Access
- API responses may have nested structures
- Always check if keys exist before accessing
- Use `.get()` with defaults for safety
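
A small sketch of defensive access on a nested response. The field names (`likes`, `artist`, `name`) and both helper functions are assumptions chosen for illustration, not a documented response shape.

```python
def like_count(song: dict) -> int:
    # A missing key or an explicit None both fall back to 0.
    return song.get("likes") or 0


def artist_name(song: dict) -> str:
    # Guard the intermediate level before reading the nested key.
    return (song.get("artist") or {}).get("name", "unknown")


response = [
    {"title": "A", "likes": 3, "artist": {"name": "X"}},
    {"title": "B"},  # no likes, no artist
]
counts = [like_count(song) for song in response]
names = [artist_name(song) for song in response]
```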

### Aggregation Patterns
- To find "most-liked song in playlists": Get all playlists → Get songs from each → Find max by likes
- To find "most expensive transaction": Get all transactions → Find max by amount
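
The first pattern can be sketched with the data source injected as a function, so the example runs on in-memory data. `most_liked_song` and the sample records are hypothetical. In a real task, `songs_for_playlist` would presumably wrap the `show_playlist_songs` call named elsewhere in this file.

```python
from typing import Callable, Optional


def most_liked_song(
    playlists: list[dict],
    songs_for_playlist: Callable[[int], list[dict]],
) -> Optional[dict]:
    """Collect songs from every playlist, then pick the one with the most likes."""
    all_songs = [
        song
        for playlist in playlists
        for song in songs_for_playlist(playlist["id"])
    ]
    if not all_songs:
        return None  # max() would raise on an empty list
    return max(all_songs, key=lambda song: song.get("likes", 0))


# In-memory sample data standing in for API responses.
songs_by_playlist = {
    1: [{"title": "A", "likes": 3}],
    2: [{"title": "B", "likes": 9}, {"title": "C", "likes": 5}],
}
playlists = [{"id": 1}, {"id": 2}]
best = most_liked_song(playlists, lambda pid: songs_by_playlist[pid])
```

Looking only at the first playlist would return "A" here, which is why the sketch flattens all playlists before taking the maximum.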

### Task Completion
- ALWAYS call `apis.supervisor.complete_task()` at the end
- This signals successful completion to test framework

## Important Rules

1. **Be Specific**: Include actual method names, parameter names, and code examples
2. **Be Actionable**: The Generator should know exactly what to do after reading your bullet
3. **Include Working Code**: Show a complete example that demonstrates the correct pattern
4. **Tag Appropriately**: Use `app.<app_name>` for app-specific bullets, plus semantic tags
5. **Set Confidence**: "high" for clear patterns, "medium" for uncertain, "low" for speculative
6. **Return ONLY JSON**: No explanations, no markdown formatting outside the JSON

## Response Format

Return the JSON object as plain text. Make sure it's valid JSON that can be parsed directly.
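
To make "parsed directly" concrete, here is a sketch of what a consumer might do with the reply. `parse_skill_response` is a hypothetical name, and the real Reflector may handle replies differently. The point is that a reply wrapped in a Markdown fence or preceded by commentary fails a plain `json.loads`.

```python
import json


def parse_skill_response(raw: str) -> dict:
    """Parse the Skill's reply; raise ValueError if it is not a bare JSON object."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"response is not valid JSON: {exc}") from exc
    if not isinstance(payload, dict) or "bullet" not in payload:
        raise ValueError("response must be a JSON object with a 'bullet' key")
    return payload


good = parse_skill_response('{"bullet": {"title": "Spotify: Use show_playlist_songs()"}}')

# A fenced reply is rejected because the fence is not part of JSON.
try:
    parse_skill_response('```json\n{"bullet": {}}\n```')
    fenced_ok = True
except ValueError:
    fenced_ok = False
```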