gh-lerianstudio-ring-default/skills/systematic-debugging/SKILL.md
2025-11-30 08:37:11 +08:00

name: systematic-debugging
description: Four-phase debugging framework - root cause investigation, pattern analysis, hypothesis testing, implementation. Ensures understanding before attempting fixes.
trigger:
  • Bug reported or test failure observed
  • Unexpected behavior or error message
  • Root cause unknown
  • Previous fix attempt didn't work
skip_when:
  • Root cause already known → just fix it
  • Error deep in call stack, need to trace backward → use root-cause-tracing
  • Issue obviously caused by your last change → quick verification first
related:
  • complementary: root-cause-tracing

Systematic Debugging

Core principle: NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST.

When to Use

Use for ANY technical issue: test failures, bugs, unexpected behavior, performance problems, build failures, integration issues.

Especially when:

  • Under time pressure (emergencies make guessing tempting)
  • "Just one quick fix" seems obvious
  • Previous fix didn't work
  • You don't fully understand the issue

The Four Phases

Complete each phase before proceeding to the next.

Phase 1: Root Cause Investigation

MUST complete ALL before Phase 2:

Phase 1 Investigation:
□ Error message copied verbatim: ___________
□ Reproduction confirmed: [steps documented]
□ Recent changes reviewed: [git diff output]
□ Evidence from ALL components: [list components checked]
□ Data flow traced: [origin → error location]

Copy this checklist to TodoWrite.

  1. Read Error Messages

    • Read stack traces completely
    • Note line numbers, file paths, error codes
    • Don't skip past warnings
  2. Reproduce Consistently

    • Exact steps to trigger
    • Happens every time? If not → gather more data
  3. Check Recent Changes

    • git diff, recent commits
    • New dependencies, config changes
  4. Multi-Component Systems

    Add diagnostic instrumentation at EACH boundary:

    # For each layer, log:
    - What enters component
    - What exits component
    - Environment/config state
    

    Run once, analyze evidence, identify failing layer.

  5. Trace Data Flow

    Error deep in stack? Use ring-default:root-cause-tracing skill.

    Quick version:

    • Where does bad value originate?
    • Trace up call stack to source
    • Fix at source, not symptom
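
The boundary instrumentation from step 4 can be sketched as a small decorator. This is a minimal illustration, not part of the skill itself: the `log_boundary` helper, the `parse`/`load` components, and the `APP_ENV` variable are all hypothetical names chosen for the example.

```python
import functools
import logging
import os

logger = logging.getLogger("boundary")


def log_boundary(component, env_keys=()):
    """Log what enters and exits a component, plus selected environment state."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Capture only the environment variables named by the caller.
            env = {key: os.environ.get(key) for key in env_keys}
            logger.info("[%s] IN  args=%r kwargs=%r env=%r", component, args, kwargs, env)
            try:
                result = func(*args, **kwargs)
            except Exception:
                # Record which layer raised, then let the error propagate unchanged.
                logger.exception("[%s] RAISED", component)
                raise
            logger.info("[%s] OUT result=%r", component, result)
            return result
        return wrapper
    return decorator


# Hypothetical two-layer pipeline, for illustration only.
@log_boundary("parser")
def parse(raw):
    return raw.strip().split(",")


@log_boundary("loader", env_keys=("APP_ENV",))
def load(fields):
    return {"name": fields[0], "age": int(fields[1])}
```

With `logging.basicConfig(level=logging.INFO)` enabled, a single run of `load(parse(" ann,30 "))` logs one IN/OUT pair per layer, so the first layer whose output looks wrong is the one to investigate.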

Phase 1 Summary (write before Phase 2):

FINDINGS:
- Error: [exact error]
- Reproduces: [steps]
- Recent changes: [commits]
- Component evidence: [what each shows]
- Data origin: [where bad data starts]
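
One way to enforce "complete ALL before Phase 2" is to hold the findings in a structure that reports what is still empty. The sketch below is an assumption about how that could look; `Phase1Findings` and its field names are hypothetical and simply mirror the summary template above.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Phase1Findings:
    error: str = ""
    reproduction_steps: list = field(default_factory=list)
    recent_changes: list = field(default_factory=list)
    component_evidence: dict = field(default_factory=dict)
    data_origin: str = ""

    def missing(self):
        """Return the names of findings that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def ready_for_phase_2(self):
        """True only when every finding has been filled in."""
        return not self.missing()
```

In this sketch an empty value always counts as "not done", so if there really were no recent changes, record that explicitly (for example `["none since last release"]`) rather than leaving the list empty.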

Phase 2: Pattern Analysis

  1. Find Working Examples

    • Similar working code in codebase
    • What works that's similar to what's broken?
  2. Compare Against References

    • Read reference implementation COMPLETELY
    • Don't skim - understand fully
  3. Identify Differences

    • List EVERY difference (working vs broken)
    • Don't assume "that can't matter"
  4. Understand Dependencies

    • What components, config, environment needed?
    • What assumptions does it make?
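
Listing every difference is easier to do mechanically than by eye. As a minimal sketch, assuming the working and broken cases can each be summarized as a flat dictionary of settings, the hypothetical helper `list_differences` below reports every key whose value differs or is absent on one side.

```python
def list_differences(working, broken):
    """Return {key: (working_value, broken_value)} for every key that differs."""
    missing = object()  # sentinel, so a stored None is not confused with "absent"
    diffs = {}
    for key in sorted(set(working) | set(broken)):
        w = working.get(key, missing)
        b = broken.get(key, missing)
        if w != b:
            diffs[key] = (
                "<absent>" if w is missing else w,
                "<absent>" if b is missing else b,
            )
    return diffs
```

The function deliberately does no filtering: deciding that a difference "can't matter" is left to the reader, after the full list is on the table.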

Phase 3: Hypothesis Testing

  1. Form Single Hypothesis

    • "I think X is root cause because Y"
    • Be specific
  2. Test Minimally

    • SMALLEST possible change
    • One variable at a time
  3. Verify and Track

    Hypothesis #1: [what] → [result]
    Hypothesis #2: [what] → [result]
    Hypothesis #3: [what] → [STOP if fails]
    

    If 3 hypotheses fail:

    • STOP immediately
    • "3 hypotheses failed, architecture review required"
    • Discuss with partner before more attempts
  4. When You Don't Know

    • Say "I don't understand X"
    • Ask for help
    • Research more
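
The hypothesis log and its three-failure stop can be kept honest with a few lines of code. This is an illustrative sketch only; `HypothesisLog` and `ArchitectureReviewRequired` are hypothetical names, and counting total (rather than consecutive) failures is an assumption.

```python
from dataclasses import dataclass, field

MAX_FAILED_HYPOTHESES = 3


class ArchitectureReviewRequired(Exception):
    """Raised when the failure limit is reached and debugging must stop."""


@dataclass
class HypothesisLog:
    entries: list = field(default_factory=list)

    def record(self, hypothesis, confirmed, result):
        """Store one tested hypothesis; stop once three have failed."""
        self.entries.append((hypothesis, confirmed, result))
        failed = sum(1 for _, ok, _ in self.entries if not ok)
        if failed >= MAX_FAILED_HYPOTHESES:
            raise ArchitectureReviewRequired(
                f"{failed} hypotheses failed, architecture review required"
            )

    def report(self):
        """Format the log in the 'Hypothesis #N: [what] → [result]' style."""
        return [
            f"Hypothesis #{i}: {what} → {result}"
            for i, (what, _, result) in enumerate(self.entries, 1)
        ]
```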

Phase 4: Implementation

Fix root cause, not symptom:

  1. Create Failing Test

    • Simplest reproduction
    • Use ring-default:test-driven-development skill
  2. Implement Single Fix

    • Address root cause only
    • ONE change at a time
    • No "while I'm here" improvements
  3. Verify Fix

    • Test passes?
    • No other tests broken?
    • Issue resolved?
  4. If Fix Doesn't Work

    • Count fixes attempted
    • If < 3: Return to Phase 1
    • If ≥ 3: STOP → Architecture review required
  5. After Fix Verified

    • Test passes and issue resolved?
    • If non-trivial (took > 5 min): Suggest documentation

    "The fix has been verified. Would you like to document this solution for future reference? Run: /ring-default:codify"

    • Use ring-default:codify-solution skill to capture institutional knowledge
  6. If 3+ Fixes Failed: Question Architecture

    Pattern indicating architectural problem:

    • Each fix reveals new problem elsewhere
    • Fixes require massive refactoring
    • Each fix creates new symptoms

    STOP and discuss: Is architecture sound? Should we refactor vs. fix?
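
Steps 1 to 3 can be illustrated with a made-up bug. In the sketch below, `parse_port` is a hypothetical function whose earlier version rejected input with a trailing newline; the test is the simplest reproduction, and the fix is a single change at the point where the bad value enters.

```python
import unittest


def parse_port_before_fix(value):
    # Hypothetical buggy version: rejects input with surrounding whitespace.
    if not value.isdigit():
        raise ValueError(f"invalid port: {value!r}")
    return int(value)


def parse_port(value):
    # Single fix at the root cause: normalize the input once, at the source.
    value = value.strip()
    if not value.isdigit():
        raise ValueError(f"invalid port: {value!r}")
    return int(value)


class ParsePortRegressionTest(unittest.TestCase):
    def test_trailing_newline_is_accepted(self):
        # Simplest reproduction of the reported failure.
        self.assertEqual(parse_port("8080\n"), 8080)
```

Written against `parse_port_before_fix`, the same assertion fails with `ValueError`, which is the point: the test should be seen failing before the fix and passing after it, with no other changes in between.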

Time Limits

Debugging time boxes:

  • 30 min without root cause → Escalate
  • 3 failed fixes → Architecture review
  • 1 hour total → Stop, document, ask for guidance
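
The time boxes can be expressed as one decision function. This is a sketch under stated assumptions: the function name `next_action` is hypothetical, thresholds are treated as inclusive, and the order in which the rules are checked is a choice made for the example, not something the skill specifies.

```python
def next_action(elapsed_min, root_cause_found, failed_fixes):
    """Map the current debugging state to the action the time boxes call for."""
    if failed_fixes >= 3:
        return "architecture review"
    if elapsed_min >= 60:
        return "stop, document, ask for guidance"
    if elapsed_min >= 30 and not root_cause_found:
        return "escalate"
    return "continue"
```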

Red Flags

STOP and return to Phase 1 if thinking:

  • "Quick fix for now, investigate later"
  • "Just try changing X and see if it works"
  • "Add multiple changes, run tests"
  • "Skip the test, I'll manually verify"
  • "It's probably X, let me fix that"
  • "I don't fully understand but this might work"
  • "One more fix attempt" (when already tried 2+)
  • "Each fix reveals new problem" (architecture issue)

User signals you're wrong:

  • "Is that not happening?" → You assumed without verifying
  • "Stop guessing" → You're proposing fixes without understanding
  • "We're stuck?" → Your approach isn't working

When you see these: STOP. Return to Phase 1.

Quick Reference

| Phase | Key Activities | Success Criteria |
|---|---|---|
| 1. Root Cause | Read errors, reproduce, check changes, gather evidence, trace data flow | Understand WHAT and WHY |
| 2. Pattern | Find working examples, compare differences, understand dependencies | Identify what's different |
| 3. Hypothesis | Form theory, test minimally, verify one at a time | Confirmed or new hypothesis |
| 4. Implementation | Create test, fix root cause, verify | Bug resolved, tests pass |

Circuit breakers:

  • 3 hypotheses fail → STOP, architecture review
  • 3 fixes fail → STOP, question fundamentals
  • 30 min no root cause → Escalate

Integration with Other Skills

Required sub-skills:

  • root-cause-tracing - When error is deep in call stack (Phase 1, Step 5)
  • test-driven-development - For failing test case (Phase 4, Step 1)

Post-completion:

  • codify-solution - Document non-trivial fixes (Phase 4, Step 5)

Complementary:

  • defense-in-depth - Add validation after finding root cause
  • verification-before-completion - Verify fix worked before claiming success

Required Patterns

This skill uses these universal patterns:

  • State Tracking: See skills/shared-patterns/state-tracking.md
  • Failure Recovery: See skills/shared-patterns/failure-recovery.md
  • Exit Criteria: See skills/shared-patterns/exit-criteria.md
  • TodoWrite: See skills/shared-patterns/todowrite-integration.md

Apply ALL patterns when using this skill.