7.1 KiB
name, description, trigger, skip_when, related
| name | description | trigger | skip_when | related | |||
|---|---|---|---|---|---|---|---|
| systematic-debugging | Four-phase debugging framework - root cause investigation, pattern analysis, hypothesis testing, implementation. Ensures understanding before attempting fixes. | - Bug reported or test failure observed - Unexpected behavior or error message - Root cause unknown - Previous fix attempt didn't work | - Root cause already known → just fix it - Error deep in call stack, need to trace backward → use root-cause-tracing - Issue obviously caused by your last change → quick verification first |
|
Systematic Debugging
Core principle: NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST.
When to Use
Use for ANY technical issue: test failures, bugs, unexpected behavior, performance problems, build failures, integration issues.
Especially when:
- Under time pressure (emergencies make guessing tempting)
- "Just one quick fix" seems obvious
- Previous fix didn't work
- You don't fully understand the issue
The Four Phases
Complete each phase before proceeding to the next.
Phase 1: Root Cause Investigation
MUST complete ALL before Phase 2:
Phase 1 Investigation:
□ Error message copied verbatim: ___________
□ Reproduction confirmed: [steps documented]
□ Recent changes reviewed: [git diff output]
□ Evidence from ALL components: [list components checked]
□ Data flow traced: [origin → error location]
Copy this checklist to TodoWrite.
-
Read Error Messages
- Stack traces completely
- Line numbers, file paths, error codes
- Don't skip past warnings
-
Reproduce Consistently
- Exact steps to trigger
- Happens every time? If not → gather more data
-
Check Recent Changes
git diff, recent commits- New dependencies, config changes
-
Multi-Component Systems
Add diagnostic instrumentation at EACH boundary:
# For each layer, log: - What enters component - What exits component - Environment/config stateRun once, analyze evidence, identify failing layer.
-
Trace Data Flow
Error deep in stack? Use ring-default:root-cause-tracing skill.
Quick version:
- Where does bad value originate?
- Trace up call stack to source
- Fix at source, not symptom
Phase 1 Summary (write before Phase 2):
FINDINGS:
- Error: [exact error]
- Reproduces: [steps]
- Recent changes: [commits]
- Component evidence: [what each shows]
- Data origin: [where bad data starts]
Phase 2: Pattern Analysis
-
Find Working Examples
- Similar working code in codebase
- What works that's similar to what's broken?
-
Compare Against References
- Read reference implementation COMPLETELY
- Don't skim - understand fully
-
Identify Differences
- List EVERY difference (working vs broken)
- Don't assume "that can't matter"
-
Understand Dependencies
- What components, config, environment needed?
- What assumptions does it make?
Phase 3: Hypothesis Testing
-
Form Single Hypothesis
- "I think X is root cause because Y"
- Be specific
-
Test Minimally
- SMALLEST possible change
- One variable at a time
-
Verify and Track
Hypothesis #1: [what] → [result] Hypothesis #2: [what] → [result] Hypothesis #3: [what] → [STOP if fails]If 3 hypotheses fail:
- STOP immediately
- "3 hypotheses failed, architecture review required"
- Discuss with partner before more attempts
-
When You Don't Know
- Say "I don't understand X"
- Ask for help
- Research more
Phase 4: Implementation
Fix root cause, not symptom:
-
Create Failing Test
- Simplest reproduction
- Use ring-default:test-driven-development skill
-
Implement Single Fix
- Address root cause only
- ONE change at a time
- No "while I'm here" improvements
-
Verify Fix
- Test passes?
- No other tests broken?
- Issue resolved?
-
If Fix Doesn't Work
- Count fixes attempted
- If < 3: Return to Phase 1
- If ≥ 3: STOP → Architecture review required
-
After Fix Verified
- Test passes and issue resolved?
- If non-trivial (took > 5 min): Suggest documentation
"The fix has been verified. Would you like to document this solution for future reference? Run:
/ring-default:codify"- Use ring-default:codify-solution skill to capture institutional knowledge
-
If 3+ Fixes Failed: Question Architecture
Pattern indicating architectural problem:
- Each fix reveals new problem elsewhere
- Fixes require massive refactoring
- Each fix creates new symptoms
STOP and discuss: Is architecture sound? Should we refactor vs. fix?
Time Limits
Debugging time boxes:
- 30 min without root cause → Escalate
- 3 failed fixes → Architecture review
- 1 hour total → Stop, document, ask for guidance
Red Flags
STOP and return to Phase 1 if thinking:
- "Quick fix for now, investigate later"
- "Just try changing X and see if it works"
- "Add multiple changes, run tests"
- "Skip the test, I'll manually verify"
- "It's probably X, let me fix that"
- "I don't fully understand but this might work"
- "One more fix attempt" (when already tried 2+)
- "Each fix reveals new problem" (architecture issue)
User signals you're wrong:
- "Is that not happening?" → You assumed without verifying
- "Stop guessing" → You're proposing fixes without understanding
- "We're stuck?" → Your approach isn't working
When you see these: STOP. Return to Phase 1.
Quick Reference
| Phase | Key Activities | Success Criteria |
|---|---|---|
| 1. Root Cause | Read errors, reproduce, check changes, gather evidence, trace data flow | Understand WHAT and WHY |
| 2. Pattern | Find working examples, compare differences, understand dependencies | Identify what's different |
| 3. Hypothesis | Form theory, test minimally, verify one at a time | Confirmed or new hypothesis |
| 4. Implementation | Create test, fix root cause, verify | Bug resolved, tests pass |
Circuit breakers:
- 3 hypotheses fail → STOP, architecture review
- 3 fixes fail → STOP, question fundamentals
- 30 min no root cause → Escalate
Integration with Other Skills
Required sub-skills:
- root-cause-tracing - When error is deep in call stack (Phase 1, Step 5)
- test-driven-development - For failing test case (Phase 4, Step 1)
Post-completion:
- codify-solution - Document non-trivial fixes (Phase 4, Step 5)
Complementary:
- defense-in-depth - Add validation after finding root cause
- verification-before-completion - Verify fix worked before claiming success
Required Patterns
This skill uses these universal patterns:
- State Tracking: See
skills/shared-patterns/state-tracking.md - Failure Recovery: See
skills/shared-patterns/failure-recovery.md - Exit Criteria: See
skills/shared-patterns/exit-criteria.md - TodoWrite: See
skills/shared-patterns/todowrite-integration.md
Apply ALL patterns when using this skill.