Error Troubleshooter
Overview
This skill enables systematic troubleshooting of unexpected behavior and technical failures - whether explicit errors or silent anomalies where commands succeed but don't produce expected results. Proactively investigate any mismatch between expected and actual outcomes using a structured approach that balances quick fixes with thorough analysis.
When to Use This Skill
Trigger this skill automatically when encountering either:
(1) Unexpected Behavior (Priority)
- Command succeeded but expected effect didn't happen - e.g., configuration set but not taking effect, file created but empty
- Missing expected errors - e.g., test was designed to fail but passed, validation that should reject but accepted
- Wrong or unexpected output - e.g., different data than expected, incorrect format, unexpected side effects
- Silent failures - no error reported but operation clearly didn't work
- Behavioral anomalies - program runs but behaves differently than intended
(2) Explicit Failures
- Error messages from SDK/API calls - exceptions, error codes, failure responses
- Tool execution failures - Bash errors, script crashes, MCP tool failures
- Runtime errors - exceptions in any programming language
- Build or compilation failures - compiler errors, linking failures
- System errors - permission denied, file not found, connection refused
Key principle: If there's any mismatch between expected and actual behavior - whether explicit error or silent anomaly - this skill applies.
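The distinction between the two categories can be sketched in code. The following is a minimal illustration, not part of the skill itself: the function name `classify_outcome` and the "expected file" post-condition are assumptions chosen to mirror the "file created but empty" example above.

```python
from pathlib import Path


def classify_outcome(returncode: int, expected_file: Path) -> str:
    """Classify a finished command by its exit code and its expected side effect.

    Hypothetical helper: assumes the command was supposed to write a
    non-empty file at `expected_file`.
    """
    if returncode != 0:
        return "explicit_failure"      # category (2): the command reported failure
    if not expected_file.exists():
        return "silent_failure"        # exit code 0, but the expected file is missing
    if expected_file.stat().st_size == 0:
        return "unexpected_behavior"   # file created but empty
    return "ok"
```

A caller might pass the `returncode` attribute of a `subprocess.run(...)` result as the first argument. The point of the sketch is that a zero exit code alone is not treated as success; the expected effect is checked as well.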
Troubleshooting Decision Tree
1. Initial Assessment
When unexpected behavior or an error occurs, immediately assess the situation:
Unexpected Behavior or Error Detected
    ↓
What type of issue is this?
    │
    ├─ UNEXPECTED BEHAVIOR (command succeeded but wrong result)
    │   ↓
    │   Document the mismatch:
    │   - What was expected?
    │   - What actually happened?
    │   - Any error messages? (none expected for unexpected behavior)
    │   ↓
    │   Is the cause obvious? (e.g., wrong variable, typo, wrong file)
    │   ├─ YES → Apply quick fix
    │   │   ↓
    │   │   Did expected behavior occur?
    │   │   ├─ YES → Done ✓
    │   │   └─ NO → Revert, proceed to Rigorous Investigation
    │   │
    │   └─ NO → Proceed directly to Rigorous Investigation
    │
    └─ EXPLICIT ERROR (stderr, exception, non-zero exit)
        ↓
        Is the fix obvious from the error message itself?
        ├─ YES → Apply quick fix (Happy Case Path)
        │   ↓
        │   Did it work?
        │   ├─ YES → Done ✓
        │   └─ NO → Revert changes, proceed to Rigorous Investigation
        │
        └─ NO → Is this a common trivial error?
            ├─ YES → Apply known fix based on experience
            │   ↓
            │   Did it work?
            │   ├─ YES → Done ✓
            │   └─ NO → Revert changes, proceed to Rigorous Investigation
            │
            └─ NO → Proceed directly to Rigorous Investigation
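The branching in the tree can be condensed into a small function. This is only a sketch under stated assumptions: the `Issue` fields and the `NextStep` names are hypothetical labels for the questions in the tree, and judging whether a cause is "obvious" still requires human or agent judgment.

```python
from dataclasses import dataclass
from enum import Enum


class NextStep(Enum):
    QUICK_FIX = "apply quick fix, verify, revert on failure"
    RIGOROUS = "proceed to Rigorous Investigation"


@dataclass
class Issue:
    explicit_error: bool          # stderr, exception, or non-zero exit
    obvious_cause: bool           # cause or fix is obvious (typo, wrong file, message states the fix)
    common_trivial: bool = False  # only consulted for explicit errors


def triage(issue: Issue) -> NextStep:
    """Return the first step the decision tree prescribes for this issue."""
    if issue.obvious_cause:
        return NextStep.QUICK_FIX
    # Explicit errors get a second chance at a quick fix if they are common and trivial.
    if issue.explicit_error and issue.common_trivial:
        return NextStep.QUICK_FIX
    return NextStep.RIGOROUS
```

For example, `triage(Issue(explicit_error=False, obvious_cause=False))` returns `NextStep.RIGOROUS`, matching the "NO → Proceed directly to Rigorous Investigation" branch for unexpected behavior.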
2. Happy Case Path (Quick Resolution)
For issues with obvious causes and fixes:
Unexpected Behavior Quick Fixes:
- Obvious typo or wrong variable name
- Wrong file path or target
- Cached data (clear cache and retry)
- Tool not reloaded after changes (restart and retry)
Explicit Error Quick Fixes:
- Error message explicitly states the solution (e.g., "Missing required argument --config")
- Common trivial errors (e.g., "command not found" → check installation)
- Direct and unambiguous error descriptions
Action: Apply the fix immediately and verify the result.
Failure criteria for unexpected behavior: If expected behavior still doesn't occur, revert and switch to rigorous investigation.
Failure criteria for explicit errors: If the error message is unchanged OR the problem clearly worsened, revert immediately and switch to rigorous investigation.
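The apply, verify, revert cycle described above might look like the following sketch. The helper `try_quick_fix` and the in-memory `config` dictionary are hypothetical; in practice the three callables would wrap whatever change is being attempted (editing a file, setting a flag, restarting a tool).

```python
from typing import Callable


def try_quick_fix(
    apply: Callable[[], None],
    verify: Callable[[], bool],
    revert: Callable[[], None],
) -> bool:
    """Apply a fix, verify it, and revert if verification fails."""
    apply()
    if verify():
        return True   # expected behavior occurred: done
    revert()          # restore the previous state before investigating further
    return False      # caller should switch to rigorous investigation


# Illustrative use on an in-memory config (hypothetical key and values).
config = {"timeout": 5}
previous = dict(config)


def restore() -> None:
    config.clear()
    config.update(previous)


fixed = try_quick_fix(
    apply=lambda: config.update(timeout=30),
    verify=lambda: config["timeout"] >= 60,  # this fix turns out to be insufficient
    revert=restore,
)
```

Here `fixed` is `False` and `config` is back to its original contents, so the rigorous investigation starts from a known state rather than from a half-applied fix.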
3. Rigorous Investigation Path (Complex Problems)
When quick fixes fail or the problem is non-trivial, follow this systematic approach:
Step 1: Extract Problem Pattern
For Explicit Errors: SDK/API errors often follow fixed templates. Extract the template by:
- Removing variable components (file paths, user inputs, timestamps, IDs)
- Isolating the core error message structure
- Preparing the template for web search
Example:
Original: FileNotFoundError: [Errno 2] No such file or directory: '/home/user/data.csv'
Template: FileNotFoundError No such file or directory
See {baseDir}/references/error-template-patterns.md for detailed guidance.
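As a rough illustration of template extraction, the sketch below strips a few kinds of variable components with regular expressions. The pattern list and the function name `extract_template` are assumptions made for this example; they are not taken from error-template-patterns.md, and real error formats will need additional patterns.

```python
import re

# Each pattern targets one kind of variable component (illustrative, not exhaustive).
_VARIABLE_PARTS = [
    re.compile(r"'[^']*'|\"[^\"]*\""),                                # quoted values: paths, user input
    re.compile(r"\[Errno \d+\]"),                                     # errno codes
    re.compile(r"\b\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}\S*"),       # ISO-like timestamps
    re.compile(r"0x[0-9a-fA-F]+"),                                    # hex IDs and addresses
    re.compile(r"(?<!\w)/[\w./-]+"),                                  # unquoted Unix-style paths
]


def extract_template(message: str) -> str:
    """Reduce an error message to a searchable template."""
    for pattern in _VARIABLE_PARTS:
        message = pattern.sub("", message)
    message = message.replace(":", " ")   # drop separators left behind
    return " ".join(message.split())      # collapse runs of whitespace


template = extract_template(
    "FileNotFoundError: [Errno 2] No such file or directory: '/home/user/data.csv'"
)
# template == "FileNotFoundError No such file or directory"
```

Applied to the example above, the sketch reproduces the template shown there. Stripping every colon is a blunt simplification and would damage messages that contain URLs, so treat the patterns as a starting point.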
For Unexpected Behavior: Document the behavior pattern:
- What command/operation was performed?
- What was the expected outcome?
- What actually happened instead?
- Are there any observable symptoms (wrong data, missing files, etc.)?
- Has this operation worked before? When did it stop working?
Formulate a search query focusing on the behavior:
- "command X succeeded but didn't create Y"
- "configuration Z not taking effect"
- "expected validation to fail but passed"
Step 2: Gather Environment Information
Collect relevant environment details when:
- The pattern search doesn't yield clear solutions
- Environment-specific factors are likely relevant (versions, configurations, system state)
- Context is needed to understand the problem
IMPORTANT: Avoid collecting sensitive information. If sensitive data is necessary, explicitly request user authorization first.
See {baseDir}/references/environment-info-guide.md for collection guidelines and privacy protection.
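One way to collect environment details while respecting the privacy rule is to use an explicit allow-list and redact anything that looks sensitive. The sketch below assumes a Python environment; the marker list, the default variable names, and the function `collect_environment` are illustrative assumptions, not the contents of environment-info-guide.md.

```python
import os
import platform
from typing import Mapping, Optional, Sequence

# Substrings that suggest a variable holds a secret (assumed list, not exhaustive).
SENSITIVE_MARKERS = ("KEY", "TOKEN", "SECRET", "PASSWORD", "CREDENTIAL")


def collect_environment(
    env_names: Sequence[str] = ("LANG", "VIRTUAL_ENV"),
    environ: Optional[Mapping[str, str]] = None,
) -> dict:
    """Collect basic platform details plus an explicit allow-list of variables."""
    environ = os.environ if environ is None else environ
    info = {
        "os": platform.system(),
        "os_release": platform.release(),
        "python": platform.python_version(),
    }
    for name in env_names:
        if any(marker in name.upper() for marker in SENSITIVE_MARKERS):
            info[f"env:{name}"] = "<redacted>"   # never record the value itself
        elif name in environ:
            info[f"env:{name}"] = environ[name]
    return info
```

Only variables named in `env_names` are read, so nothing is collected by accident. Values such as paths can still reveal usernames, so the output should be reviewed before it is shared.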
Step 3: Research the Problem
Use efficient research strategies:
- Web Search: Search the extracted pattern (error template or behavior description)
- Parallel Investigation: Use Task tool with subagent_type=Explore for multiple research angles simultaneously
- Documentation: Search official docs, GitHub Issues, Stack Overflow for similar problems
- Counter-evidence: Look for cases where the expected behavior DID occur to identify what's different
Token Efficiency: For complex investigations, delegate research to subagents to avoid context exhaustion.
Step 4: Create Debug Notes File
For difficult problems, create a debug notes file to:
- Track theories and test results
- Enable parallel investigation
- Resume from interruptions
- Maintain systematic progress
Use the template in {baseDir}/assets/debug-notes-template.md to structure notes.
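Appending to a notes file after each test keeps the record usable across interruptions. The sketch below is a minimal illustration; the entry layout (Theory / Test / Result) and the helper `append_note` are assumptions for this example and do not reproduce the actual structure of debug-notes-template.md.

```python
from datetime import datetime, timezone
from pathlib import Path


def append_note(notes_path: Path, theory: str, test: str, result: str) -> None:
    """Append one theory/test/result entry to the debug notes file."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    entry = (
        f"\n[{stamp}]\n"
        f"Theory: {theory}\n"
        f"Test: {test}\n"
        f"Result: {result}\n"
    )
    # Append mode creates the file if needed and leaves earlier entries untouched.
    with notes_path.open("a", encoding="utf-8") as fh:
        fh.write(entry)
```

Because entries are only ever appended, a later session (or a parallel subagent) can read the file to see which theories were already ruled out.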
Step 5: Formulate and Test Theories
Based on research:
- List plausible theories explaining the problem (why error occurred OR why expected behavior didn't happen)
- Order by likelihood
- Design tests to verify each theory
- Execute tests systematically
- Document results in debug notes
- Iterate until the correct solution is found
For unexpected behavior: Focus theories on "why the expected effect didn't occur" rather than "why an error happened".
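The loop in Step 5 can be sketched as follows. The `Theory` structure and the `investigate` function are hypothetical; likelihoods are subjective estimates, and each `test` callable stands in for whatever experiment would confirm or refute the theory.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Theory:
    description: str
    likelihood: float         # subjective estimate in [0, 1]
    test: Callable[[], bool]  # returns True if the evidence supports the theory


def investigate(theories: List[Theory]) -> Tuple[Optional[Theory], List[Tuple[str, bool]]]:
    """Test theories from most to least likely; stop at the first confirmed one."""
    log: List[Tuple[str, bool]] = []
    for theory in sorted(theories, key=lambda t: t.likelihood, reverse=True):
        confirmed = theory.test()
        log.append((theory.description, confirmed))  # record every result, including failures
        if confirmed:
            return theory, log
    return None, log  # nothing confirmed: gather more evidence and form new theories
```

The returned `log` is what would be written to the debug notes, so refuted theories are documented alongside the confirmed one.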
Step 6: Implement Solution
Once the correct theory is identified:
- Apply the fix
- Verify the problem is resolved:
  - For errors: Error no longer occurs
  - For unexpected behavior: Expected behavior now occurs as intended
- Document the solution in debug notes (if notes were created)
- Consider if this pattern should be added to common patterns
Token Efficiency Strategies
For complex investigations:
- Use Subagents: Delegate research tasks using the Task tool with subagent_type=Explore
- File-Based Notes: Write debug notes to files instead of maintaining context in memory
- Parallel Research: Launch multiple subagents simultaneously for different research angles
- Selective Context: Only load reference files when specifically needed
Key Principles
- Proactive Investigation: Don't wait for the user to request troubleshooting; start investigating immediately when unexpected behavior or errors occur
- Prioritize Unexpected Behavior: Check for silent failures and behavioral anomalies first, as they're more subtle than explicit errors
- Bold Hypotheses, Careful Verification: Generate multiple competing theories, then rigorously verify each with concrete evidence (see {baseDir}/references/problem-solving-mindset.md)
- Challenge Your Own Reasoning: Actively search for counter-evidence and successful counter-examples that would disprove your theories
- Acknowledge Uncertainty: Present confidence levels; admit when evidence is incomplete rather than pretending certainty
- Revert on Failure: If a fix doesn't work, always revert before trying another approach
- Systematic Documentation: For difficult problems, maintain structured debug notes
- Privacy Protection: Never collect sensitive information without explicit user authorization
- Efficient Resource Usage: Use subagents and files to manage context for complex investigations
Resources
This skill includes:
references/
- problem-solving-mindset.md - Scientific approach to problem-solving: bold hypotheses, careful verification, and disciplined reasoning
- systematic-debugging-methodology.md - Practical debugging framework: Occam's Razor, diagnostic scripts, evidence hierarchy, and real-world examples
- troubleshooting-sop.md - Detailed standard operating procedures for systematic troubleshooting
- error-template-patterns.md - Guide to identifying and extracting error message templates
- environment-info-guide.md - Environment information collection guidelines with privacy protection
- common-error-patterns.md - Database of frequently encountered trivial errors and quick fixes
assets/
- debug-notes-template.md - Template for structured debugging documentation
These resources are loaded as needed during the troubleshooting process.