Files

Zhongwei Li e732da8316 Initial commit

2025-11-29 18:47:55 +08:00

10 KiB

Raw Permalink Blame History

name: error-troubleshooter description: Automatically troubleshoot unexpected results OR command/script errors without user request. Triggers when: (1) unexpected behavior - command succeeded but expected effect didn't happen, missing expected errors, wrong output, silent failures; (2) explicit failures - stderr, exceptions, non-zero exit, SDK/API errors. Applies systematic diagnosis using error templates, hypothesis testing, and web research for any Stack Overflow-worthy issue.

Error Troubleshooter

Overview

This skill enables systematic troubleshooting of unexpected behavior and technical failures - whether explicit errors or silent anomalies where commands succeed but don't produce expected results. Proactively investigate any mismatch between expected and actual outcomes using a structured approach that balances quick fixes with thorough analysis.

When to Use This Skill

Trigger this skill automatically when encountering either:

(1) Unexpected Behavior (Priority)

Command succeeded but expected effect didn't happen - e.g., configuration set but not taking effect, file created but empty
Missing expected errors - e.g., test was designed to fail but passed, validation that should reject but accepted
Wrong or unexpected output - e.g., different data than expected, incorrect format, unexpected side effects
Silent failures - no error reported but operation clearly didn't work
Behavioral anomalies - program runs but behaves differently than intended

(2) Explicit Failures

Error messages from SDK/API calls - exceptions, error codes, failure responses
Tool execution failures - Bash errors, script crashes, MCP tool failures
Runtime errors - exceptions in any programming language
Build or compilation failures - compiler errors, linking failures
System errors - permission denied, file not found, connection refused

Key principle: If there's any mismatch between expected and actual behavior - whether explicit error or silent anomaly - this skill applies.

Troubleshooting Decision Tree

1. Initial Assessment

When unexpected behavior or an error occurs, immediately assess the situation:

Unexpected Behavior or Error Detected
    ↓
What type of issue is this?
    │
    ├─ UNEXPECTED BEHAVIOR (command succeeded but wrong result)
    │   ↓
    │   Document the mismatch:
    │   - What was expected?
    │   - What actually happened?
    │   - Any error messages? (none expected for unexpected behavior)
    │   ↓
    │   Is the cause obvious? (e.g., wrong variable, typo, wrong file)
    │   ├─ YES → Apply quick fix
    │   │         ↓
    │   │      Did expected behavior occur?
    │   │         ├─ YES → Done ✓
    │   │         └─ NO → Revert, proceed to Rigorous Investigation
    │   │
    │   └─ NO → Proceed directly to Rigorous Investigation
    │
    └─ EXPLICIT ERROR (stderr, exception, non-zero exit)
        ↓
        Is the fix obvious from the error message itself?
        ├─ YES → Apply quick fix (Happy Case Path)
        │         ↓
        │      Did it work?
        │         ├─ YES → Done ✓
        │         └─ NO → Revert changes, proceed to Rigorous Investigation
        │
        └─ NO → Is this a common trivial error?
               ├─ YES → Apply known fix based on experience
               │         ↓
               │      Did it work?
               │         ├─ YES → Done ✓
               │         └─ NO → Revert changes, proceed to Rigorous Investigation
               │
               └─ NO → Proceed directly to Rigorous Investigation

2. Happy Case Path (Quick Resolution)

For issues with obvious causes and fixes:

Unexpected Behavior Quick Fixes:

Obvious typo or wrong variable name
Wrong file path or target
Cached data (clear cache and retry)
Tool not reloaded after changes (restart and retry)

Explicit Error Quick Fixes:

Error message explicitly states the solution (e.g., "Missing required argument --config")
Common trivial errors (e.g., "command not found" → check installation)
Direct and unambiguous error descriptions

Action: Apply the fix immediately and verify the result.

Failure criteria for unexpected behavior: If expected behavior still doesn't occur, revert and switch to rigorous investigation.

Failure criteria for explicit errors: If the error message is unchanged OR the problem clearly worsened, revert immediately and switch to rigorous investigation.

3. Rigorous Investigation Path (Complex Problems)

When quick fixes fail or the problem is non-trivial, follow this systematic approach:

Step 1: Extract Problem Pattern

For Explicit Errors: SDK/API errors often follow fixed templates. Extract the template by:

Removing variable components (file paths, user inputs, timestamps, IDs)
Isolating the core error message structure
Preparing the template for web search

Example:

Original: FileNotFoundError: [Errno 2] No such file or directory: '/home/user/data.csv'
Template: FileNotFoundError No such file or directory

See {baseDir}/references/error-template-patterns.md for detailed guidance.

For Unexpected Behavior: Document the behavior pattern:

What command/operation was performed?
What was the expected outcome?
What actually happened instead?
Are there any observable symptoms (wrong data, missing files, etc.)?
Has this operation worked before? When did it stop working?

Formulate a search query focusing on the behavior:

"command X succeeded but didn't create Y"
"configuration Z not taking effect"
"expected validation to fail but passed"

Step 2: Gather Environment Information

Collect relevant environment details when:

The pattern search doesn't yield clear solutions
Environment-specific factors are likely relevant (versions, configurations, system state)
Context is needed to understand the problem

IMPORTANT: Avoid collecting sensitive information. If sensitive data is necessary, explicitly request user authorization first.

See {baseDir}/references/environment-info-guide.md for collection guidelines and privacy protection.

Step 3: Research the Problem

Use efficient research strategies:

Web Search: Search the extracted pattern (error template or behavior description)
Parallel Investigation: Use Task tool with subagent_type=Explore for multiple research angles simultaneously
Documentation: Search official docs, GitHub Issues, Stack Overflow for similar problems
Counter-evidence: Look for cases where the expected behavior DID occur to identify what's different

Token Efficiency: For complex investigations, delegate research to subagents to avoid context exhaustion.

Step 4: Create Debug Notes File

For difficult problems, create a debug notes file to:

Track theories and test results
Enable parallel investigation
Resume from interruptions
Maintain systematic progress

Use the template in {baseDir}/assets/debug-notes-template.md to structure notes.

Step 5: Formulate and Test Theories

Based on research:

List plausible theories explaining the problem (why error occurred OR why expected behavior didn't happen)
Order by likelihood
Design tests to verify each theory
Execute tests systematically
Document results in debug notes
Iterate until the correct solution is found

For unexpected behavior: Focus theories on "why the expected effect didn't occur" rather than "why an error happened".

Step 6: Implement Solution

Once the correct theory is identified:

Apply the fix
Verify the problem is resolved:
- For errors: Error no longer occurs
- For unexpected behavior: Expected behavior now occurs as intended
Document the solution in debug notes (if notes were created)
Consider if this pattern should be added to common patterns

Token Efficiency Strategies

For complex investigations:

Use Subagents: Delegate research tasks using the Task tool with subagent_type=Explore
File-Based Notes: Write debug notes to files instead of maintaining context in memory
Parallel Research: Launch multiple subagents simultaneously for different research angles
Selective Context: Only load reference files when specifically needed

Key Principles

Proactive Investigation: Don't wait for the user to request troubleshooting—start investigating immediately when unexpected behavior or errors occur
Prioritize Unexpected Behavior: Check for silent failures and behavioral anomalies first, as they're more subtle than explicit errors
Bold Hypotheses, Careful Verification: Generate multiple competing theories, then rigorously verify each with concrete evidence (see {baseDir}/references/problem-solving-mindset.md)
Challenge Your Own Reasoning: Actively search for counter-evidence and successful counter-examples that would disprove your theories
Acknowledge Uncertainty: Present confidence levels; admit when evidence is incomplete rather than pretending certainty
Revert on Failure: If a fix doesn't work, always revert before trying another approach
Systematic Documentation: For difficult problems, maintain structured debug notes
Privacy Protection: Never collect sensitive information without explicit user authorization
Efficient Resource Usage: Use subagents and files to manage context for complex investigations

Resources

This skill includes:

references/

problem-solving-mindset.md - Scientific approach to problem-solving: bold hypotheses, careful verification, and disciplined reasoning
systematic-debugging-methodology.md - Practical debugging framework: Occam's Razor, diagnostic scripts, evidence hierarchy, and real-world examples
troubleshooting-sop.md - Detailed standard operating procedures for systematic troubleshooting
error-template-patterns.md - Guide to identifying and extracting error message templates
environment-info-guide.md - Environment information collection guidelines with privacy protection
common-error-patterns.md - Database of frequently encountered trivial errors and quick fixes

assets/

debug-notes-template.md - Template for structured debugging documentation

These resources are loaded as needed during the troubleshooting process.

10 KiB Raw Permalink Blame History