---
name: error-troubleshooter
description: Automatically troubleshoot unexpected results OR command/script errors without user request. Triggers when: (1) unexpected behavior - command succeeded but expected effect didn't happen, missing expected errors, wrong output, silent failures; (2) explicit failures - stderr, exceptions, non-zero exit, SDK/API errors. Applies systematic diagnosis using error templates, hypothesis testing, and web research for any Stack Overflow-worthy issue.
---

# Error Troubleshooter

## Overview

This skill enables systematic troubleshooting of unexpected behavior and technical failures - whether explicit errors or silent anomalies where commands succeed but don't produce expected results. Proactively investigate any mismatch between expected and actual outcomes using a structured approach that balances quick fixes with thorough analysis.

## When to Use This Skill

Trigger this skill automatically when encountering either:

### (1) Unexpected Behavior (Priority)
- **Command succeeded but expected effect didn't happen** - e.g., configuration set but not taking effect, file created but empty
- **Missing expected errors** - e.g., test was designed to fail but passed, validation that should reject but accepted
- **Wrong or unexpected output** - e.g., different data than expected, incorrect format, unexpected side effects
- **Silent failures** - no error reported but operation clearly didn't work
- **Behavioral anomalies** - program runs but behaves differently than intended

### (2) Explicit Failures
- **Error messages from SDK/API calls** - exceptions, error codes, failure responses
- **Tool execution failures** - Bash errors, script crashes, MCP tool failures
- **Runtime errors** - exceptions in any programming language
- **Build or compilation failures** - compiler errors, linking failures
- **System errors** - permission denied, file not found, connection refused

**Key principle**: If there's any mismatch between expected and actual behavior - whether explicit error or silent anomaly - this skill applies.

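
The two trigger categories above can be told apart mechanically by checking both the exit status and the expected effect. The following is a minimal sketch, not part of the skill itself: `run_and_check` and its `expected_file` parameter are hypothetical names, and "expected effect" is assumed here to mean "a non-empty file was created".

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_and_check(cmd, expected_file):
    """Classify a command by its exit status AND its observable effect.

    `expected_file` is a hypothetical stand-in for "the expected effect":
    here, a file the command is supposed to create with non-empty content.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Explicit failure: non-zero exit; stderr usually carries the message.
        return "explicit_failure"
    path = Path(expected_file)
    if not path.exists() or path.stat().st_size == 0:
        # Silent failure: exit code 0, but the expected effect is missing.
        return "unexpected_behavior"
    return "ok"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "out.txt"
        # This command exits 0 but never creates the file.
        print(run_and_check([sys.executable, "-c", "pass"], target))
```

Checking the effect separately from the exit code is what surfaces the "succeeded but nothing happened" case, which an exit-code check alone would report as success.
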
## Troubleshooting Decision Tree

### 1. Initial Assessment

When unexpected behavior or an error occurs, immediately assess the situation:

```
Unexpected Behavior or Error Detected
    ↓
What type of issue is this?
    │
    ├─ UNEXPECTED BEHAVIOR (command succeeded but wrong result)
    │   ↓
    │   Document the mismatch:
    │   - What was expected?
    │   - What actually happened?
    │   - Any error messages? (none expected for unexpected behavior)
    │   ↓
    │   Is the cause obvious? (e.g., wrong variable, typo, wrong file)
    │   ├─ YES → Apply quick fix
    │   │   ↓
    │   │   Did expected behavior occur?
    │   │   ├─ YES → Done ✓
    │   │   └─ NO → Revert, proceed to Rigorous Investigation
    │   │
    │   └─ NO → Proceed directly to Rigorous Investigation
    │
    └─ EXPLICIT ERROR (stderr, exception, non-zero exit)
        ↓
        Is the fix obvious from the error message itself?
        ├─ YES → Apply quick fix (Happy Case Path)
        │   ↓
        │   Did it work?
        │   ├─ YES → Done ✓
        │   └─ NO → Revert changes, proceed to Rigorous Investigation
        │
        └─ NO → Is this a common trivial error?
            ├─ YES → Apply known fix based on experience
            │   ↓
            │   Did it work?
            │   ├─ YES → Done ✓
            │   └─ NO → Revert changes, proceed to Rigorous Investigation
            │
            └─ NO → Proceed directly to Rigorous Investigation
```

### 2. Happy Case Path (Quick Resolution)

For issues with obvious causes and fixes:

**Unexpected Behavior Quick Fixes:**
- Obvious typo or wrong variable name
- Wrong file path or target
- Cached data (clear cache and retry)
- Tool not reloaded after changes (restart and retry)

**Explicit Error Quick Fixes:**
- Error message explicitly states the solution (e.g., "Missing required argument --config")
- Common trivial errors (e.g., "command not found" → check installation)
- Direct and unambiguous error descriptions

**Action**: Apply the fix immediately and verify the result.

**Failure criteria for unexpected behavior**: If expected behavior still doesn't occur, revert and switch to rigorous investigation.

**Failure criteria for explicit errors**: If the error message is unchanged OR the problem clearly worsened, revert immediately and switch to rigorous investigation.

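
The apply, verify, revert cycle described above can be sketched for the simple case where the quick fix edits a single file. This is an illustration under stated assumptions: `apply_fix_with_revert`, `fix`, and `verify` are hypothetical names, and a real fix may touch state (caches, services, several files) that a file backup does not cover.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def apply_fix_with_revert(
    target: Path,
    fix: Callable[[Path], None],
    verify: Callable[[], bool],
) -> bool:
    """Apply `fix` to `target`; restore the original file if `verify` fails."""
    backup = target.with_name(target.name + ".bak")
    shutil.copy2(target, backup)  # snapshot before touching anything
    try:
        fix(target)
        if verify():
            return True  # fix worked, keep the change
        shutil.copy2(backup, target)  # failure criteria met: revert
        return False
    except Exception:
        shutil.copy2(backup, target)  # the fix itself crashed: revert too
        raise
    finally:
        backup.unlink(missing_ok=True)  # never leave the backup behind


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        config = Path(tmp) / "app.conf"
        config.write_text("port=80\n")
        # Hypothetical quick fix whose verification fails, so it is reverted.
        kept = apply_fix_with_revert(
            config,
            fix=lambda p: p.write_text("port=8080\n"),
            verify=lambda: False,
        )
        print(kept, config.read_text().strip())
```

A `False` return is the signal to switch to the rigorous investigation path with the file back in its original state.
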
### 3. Rigorous Investigation Path (Complex Problems)

When quick fixes fail or the problem is non-trivial, follow this systematic approach:

#### Step 1: Extract Problem Pattern

**For Explicit Errors:**
SDK/API errors often follow fixed templates. Extract the template by:
- Removing variable components (file paths, user inputs, timestamps, IDs)
- Isolating the core error message structure
- Preparing the template for web search

Example:
```
Original: FileNotFoundError: [Errno 2] No such file or directory: '/home/user/data.csv'
Template: FileNotFoundError No such file or directory
```

See `{baseDir}/references/error-template-patterns.md` for detailed guidance.

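
Template extraction of this kind can be approximated with a few regular expressions. The sketch below is an assumption about what "variable components" covers (quoted values, errno codes, ISO-style timestamps, hex addresses, unquoted absolute POSIX paths); it is not taken from `error-template-patterns.md`, and `extract_template` is a hypothetical name.

```python
import re

# Each pattern removes one kind of variable component (assumed list).
_VARIABLE_PARTS = [
    re.compile(r"""(?<!\w)(['"]).*?\1"""),  # quoted values: paths, user input
    re.compile(r"\[Errno \d+\]"),  # errno codes
    re.compile(r"\b\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}\S*"),  # timestamps
    re.compile(r"\b0x[0-9a-fA-F]+\b"),  # hex addresses and IDs
    re.compile(r"(?<!\w)/[^\s:'\"]+"),  # unquoted absolute POSIX paths
]


def extract_template(message: str) -> str:
    """Reduce an error message to a search-friendly template."""
    for pattern in _VARIABLE_PARTS:
        message = pattern.sub("", message)
    message = re.sub(r"\s*:\s*", " ", message)  # colons only separate parts
    return re.sub(r"\s+", " ", message).strip()  # collapse leftover gaps


if __name__ == "__main__":
    original = (
        "FileNotFoundError: [Errno 2] No such file or directory: "
        "'/home/user/data.csv'"
    )
    print(extract_template(original))
    # prints: FileNotFoundError No such file or directory
```

The lookbehind `(?<!\w)` keeps apostrophes inside words (such as "didn't") and slashes inside tokens (such as "I/O") from being treated as quotes or paths. Messages from other tools will likely need additional patterns.
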
**For Unexpected Behavior:**
Document the behavior pattern:
- What command/operation was performed?
- What was the expected outcome?
- What actually happened instead?
- Are there any observable symptoms (wrong data, missing files, etc.)?
- Has this operation worked before? When did it stop working?

Formulate a search query focusing on the behavior:
- "command X succeeded but didn't create Y"
- "configuration Z not taking effect"
- "expected validation to fail but passed"

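
One way to keep the answers to these questions together, and to derive a behavior-focused query from them, is a small record type. This is only a sketch: `BehaviorMismatch`, its fields, and the query wording are hypothetical and simply mirror the questions and query shapes listed above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BehaviorMismatch:
    """Hypothetical record of one expected-vs-actual mismatch."""

    operation: str  # what command/operation was performed
    expected: str  # what should have happened
    actual: str  # what happened instead
    symptoms: List[str] = field(default_factory=list)  # observable symptoms
    last_known_good: Optional[str] = None  # when it last worked, if known

    def search_query(self) -> str:
        # Follows the "command X succeeded but ..." query shape.
        return f"{self.operation} succeeded but {self.actual}"


if __name__ == "__main__":
    mismatch = BehaviorMismatch(
        operation="mytool export",  # hypothetical command
        expected="report.csv contains rows",
        actual="output file is empty",
        symptoms=["report.csv is 0 bytes"],
    )
    print(mismatch.search_query())
```
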
#### Step 2: Gather Environment Information

Collect relevant environment details when:
- The pattern search doesn't yield clear solutions
- Environment-specific factors are likely relevant (versions, configurations, system state)
- Context is needed to understand the problem

**IMPORTANT**: Avoid collecting sensitive information. If sensitive data is necessary, explicitly request user authorization first.

See `{baseDir}/references/environment-info-guide.md` for collection guidelines and privacy protection.

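
A conservative way to collect environment details is to read only an explicit allowlist and redact anything whose name looks like a credential. The sketch below assumes Python is available on the affected machine; `collect_environment`, the default allowlist, and the name-based redaction rule are illustrative choices, not the contents of `environment-info-guide.md`.

```python
import os
import platform
import re

# Assumed heuristic: variable names containing these words are sensitive.
_SENSITIVE_NAME = re.compile(
    r"KEY|TOKEN|SECRET|PASSWORD|PASSWD|CREDENTIAL", re.IGNORECASE
)


def collect_environment(env_names=("LANG", "SHELL"), environ=None):
    """Return non-sensitive environment facts as a flat dict.

    Only variables named in `env_names` are read; values of variables
    with sensitive-looking names are replaced by a placeholder.
    """
    environ = os.environ if environ is None else environ
    info = {
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "python": platform.python_version(),
    }
    for name in env_names:
        if name not in environ:
            continue
        if _SENSITIVE_NAME.search(name):
            info[f"env.{name}"] = "<redacted>"
        else:
            info[f"env.{name}"] = environ[name]
    return info


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```

Name-based redaction is only a first line of defense: values such as paths can still contain usernames, so the output is worth reviewing before it is pasted into a search or a report.
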
#### Step 3: Research the Problem

Use efficient research strategies:
- **Web Search**: Search the extracted pattern (error template or behavior description)
- **Parallel Investigation**: Use Task tool with subagent_type=Explore for multiple research angles simultaneously
- **Documentation**: Search official docs, GitHub Issues, Stack Overflow for similar problems
- **Counter-evidence**: Look for cases where the expected behavior DID occur to identify what's different

**Token Efficiency**: For complex investigations, delegate research to subagents to avoid context exhaustion.

#### Step 4: Create Debug Notes File

For difficult problems, create a debug notes file to:
- Track theories and test results
- Enable parallel investigation
- Resume from interruptions
- Maintain systematic progress

Use the template in `{baseDir}/assets/debug-notes-template.md` to structure notes.

#### Step 5: Formulate and Test Theories

Based on research:
1. List plausible theories explaining the problem (why error occurred OR why expected behavior didn't happen)
2. Order by likelihood
3. Design tests to verify each theory
4. Execute tests systematically
5. Document results in debug notes
6. Iterate until the correct solution is found

**For unexpected behavior**: Focus theories on "why the expected effect didn't occur" rather than "why an error happened".

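
The loop in the numbered steps above (order by likelihood, test each theory, record the result, stop when one holds) can be sketched as follows. `Theory`, `investigate`, and the numeric likelihood are hypothetical; in practice the likelihood is a subjective estimate and the log would go to the debug notes file.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Theory:
    description: str
    likelihood: float  # subjective estimate between 0 and 1
    test: Callable[[], bool]  # returns True if evidence supports the theory


def investigate(theories: List[Theory], log: List[str]) -> Optional[Theory]:
    """Test theories from most to least likely; return the first confirmed."""
    ordered = sorted(theories, key=lambda t: t.likelihood, reverse=True)
    for theory in ordered:
        supported = theory.test()
        verdict = "CONFIRMED" if supported else "REJECTED"
        log.append(f"{verdict}: {theory.description}")  # document every result
        if supported:
            return theory
    return None  # nothing confirmed: go back to research and add theories


if __name__ == "__main__":
    notes: List[str] = []
    candidates = [
        Theory("stale cache", 0.3, lambda: True),
        Theory("wrong config file loaded", 0.6, lambda: False),
    ]
    found = investigate(candidates, notes)
    print(found.description if found else "no theory confirmed")
    print("\n".join(notes))
```

Rejected theories stay in the log on purpose: they record what has already been ruled out if the investigation is interrupted and resumed.
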
#### Step 6: Implement Solution

Once the correct theory is identified:
- Apply the fix
- Verify the problem is resolved:
  - **For errors**: Error no longer occurs
  - **For unexpected behavior**: Expected behavior now occurs as intended
- Document the solution in debug notes (if notes were created)
- Consider if this pattern should be added to common patterns

## Token Efficiency Strategies

For complex investigations:

1. **Use Subagents**: Delegate research tasks using the Task tool with subagent_type=Explore
2. **File-Based Notes**: Write debug notes to files instead of maintaining context in memory
3. **Parallel Research**: Launch multiple subagents simultaneously for different research angles
4. **Selective Context**: Only load reference files when specifically needed

## Key Principles

1. **Proactive Investigation**: Don't wait for the user to request troubleshooting; start investigating immediately when unexpected behavior or errors occur
2. **Prioritize Unexpected Behavior**: Check for silent failures and behavioral anomalies first, as they're more subtle than explicit errors
3. **Bold Hypotheses, Careful Verification**: Generate multiple competing theories, then rigorously verify each with concrete evidence (see `{baseDir}/references/problem-solving-mindset.md`)
4. **Challenge Your Own Reasoning**: Actively search for counter-evidence and successful counter-examples that would disprove your theories
5. **Acknowledge Uncertainty**: Present confidence levels; admit when evidence is incomplete rather than pretending certainty
6. **Revert on Failure**: If a fix doesn't work, always revert before trying another approach
7. **Systematic Documentation**: For difficult problems, maintain structured debug notes
8. **Privacy Protection**: Never collect sensitive information without explicit user authorization
9. **Efficient Resource Usage**: Use subagents and files to manage context for complex investigations

## Resources

This skill includes:

### references/
- `problem-solving-mindset.md` - Scientific approach to problem-solving: bold hypotheses, careful verification, and disciplined reasoning
- `systematic-debugging-methodology.md` - Practical debugging framework: Occam's Razor, diagnostic scripts, evidence hierarchy, and real-world examples
- `troubleshooting-sop.md` - Detailed standard operating procedures for systematic troubleshooting
- `error-template-patterns.md` - Guide to identifying and extracting error message templates
- `environment-info-guide.md` - Environment information collection guidelines with privacy protection
- `common-error-patterns.md` - Database of frequently encountered trivial errors and quick fixes

### assets/
- `debug-notes-template.md` - Template for structured debugging documentation

These resources are loaded as needed during the troubleshooting process.