gh-lerianstudio-ring-default/skills/systematic-debugging/SKILL.md

---
name: systematic-debugging
description: |
  Four-phase debugging framework - root cause investigation, pattern analysis,
  hypothesis testing, implementation. Ensures understanding before attempting fixes.

trigger: |
  - Bug reported or test failure observed
  - Unexpected behavior or error message
  - Root cause unknown
  - Previous fix attempt didn't work

skip_when: |
  - Root cause already known → just fix it
  - Error deep in call stack, need to trace backward → use root-cause-tracing
  - Issue obviously caused by your last change → quick verification first

related:
  complementary: [root-cause-tracing]
---

# Systematic Debugging

**Core principle:** NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST.

## When to Use

Use for ANY technical issue: test failures, bugs, unexpected behavior, performance problems, build failures, integration issues.

**Especially when:**
- Under time pressure (emergencies make guessing tempting)
- "Just one quick fix" seems obvious
- Previous fix didn't work
- You don't fully understand the issue

## The Four Phases

Complete each phase before proceeding to the next.

### Phase 1: Root Cause Investigation

**MUST complete ALL before Phase 2:**

```
Phase 1 Investigation:
□ Error message copied verbatim: ___________
□ Reproduction confirmed: [steps documented]
□ Recent changes reviewed: [git diff output]
□ Evidence from ALL components: [list components checked]
□ Data flow traced: [origin → error location]
```

**Copy this checklist to TodoWrite.**

1. **Read Error Messages**
   - Stack traces completely
   - Line numbers, file paths, error codes
   - Don't skip past warnings

2. **Reproduce Consistently**
   - Exact steps to trigger
   - Happens every time? If not → gather more data

3. **Check Recent Changes**
   - `git diff`, recent commits
   - New dependencies, config changes

4. **Multi-Component Systems**

   **Add diagnostic instrumentation at EACH boundary:**
   ```bash
   # For each layer, log:
   - What enters component
   - What exits component
   - Environment/config state
   ```

   Run once, analyze evidence, identify failing layer.

5. **Trace Data Flow**

   Error deep in stack? **Use ring-default:root-cause-tracing skill.**

   Quick version:
   - Where does bad value originate?
   - Trace up call stack to source
   - Fix at source, not symptom

**Phase 1 Summary (write before Phase 2):**
```
FINDINGS:
- Error: [exact error]
- Reproduces: [steps]
- Recent changes: [commits]
- Component evidence: [what each shows]
- Data origin: [where bad data starts]
```

### Phase 2: Pattern Analysis

1. **Find Working Examples**
   - Similar working code in codebase
   - What works that's similar to what's broken?

2. **Compare Against References**
   - Read reference implementation COMPLETELY
   - Don't skim - understand fully

3. **Identify Differences**
   - List EVERY difference (working vs broken)
   - Don't assume "that can't matter"

4. **Understand Dependencies**
   - What components, config, environment needed?
   - What assumptions does it make?

### Phase 3: Hypothesis Testing

1. **Form Single Hypothesis**
   - "I think X is root cause because Y"
   - Be specific

2. **Test Minimally**
   - SMALLEST possible change
   - One variable at a time

3. **Verify and Track**
   ```
   Hypothesis #1: [what] → [result]
   Hypothesis #2: [what] → [result]
   Hypothesis #3: [what] → [STOP if fails]
   ```

   **If 3 hypotheses fail:**
   - STOP immediately
   - "3 hypotheses failed, architecture review required"
   - Discuss with partner before more attempts

4. **When You Don't Know**
   - Say "I don't understand X"
   - Ask for help
   - Research more

### Phase 4: Implementation

**Fix root cause, not symptom:**

1. **Create Failing Test**
   - Simplest reproduction
   - **Use ring-default:test-driven-development skill**

2. **Implement Single Fix**
   - Address root cause only
   - ONE change at a time
   - No "while I'm here" improvements

3. **Verify Fix**
   - Test passes?
   - No other tests broken?
   - Issue resolved?

4. **If Fix Doesn't Work**
   - Count fixes attempted
   - If < 3: Return to Phase 1
   - **If ≥ 3: STOP → Architecture review required**

5. **After Fix Verified**
   - Test passes and issue resolved?
   - **If non-trivial (took > 5 min):** Suggest documentation
   > "The fix has been verified. Would you like to document this solution for future reference?
   > Run: `/ring-default:codify`"
   - **Use ring-default:codify-solution skill** to capture institutional knowledge

6. **If 3+ Fixes Failed: Question Architecture**

   Pattern indicating architectural problem:
   - Each fix reveals new problem elsewhere
   - Fixes require massive refactoring
   - Each fix creates new symptoms

   **STOP and discuss:** Is architecture sound? Should we refactor vs. fix?

## Time Limits

**Debugging time boxes:**
- 30 min without root cause → Escalate
- 3 failed fixes → Architecture review
- 1 hour total → Stop, document, ask for guidance

## Red Flags

**STOP and return to Phase 1 if thinking:**
- "Quick fix for now, investigate later"
- "Just try changing X and see if it works"
- "Add multiple changes, run tests"
- "Skip the test, I'll manually verify"
- "It's probably X, let me fix that"
- "I don't fully understand but this might work"
- "One more fix attempt" (when already tried 2+)
- "Each fix reveals new problem" (architecture issue)

**User signals you're wrong:**
- "Is that not happening?" → You assumed without verifying
- "Stop guessing" → You're proposing fixes without understanding
- "We're stuck?" → Your approach isn't working

**When you see these: STOP. Return to Phase 1.**

## Quick Reference

| Phase | Key Activities | Success Criteria |
|-------|---------------|------------------|
| **1. Root Cause** | Read errors, reproduce, check changes, gather evidence, trace data flow | Understand WHAT and WHY |
| **2. Pattern** | Find working examples, compare differences, understand dependencies | Identify what's different |
| **3. Hypothesis** | Form theory, test minimally, verify one at a time | Confirmed or new hypothesis |
| **4. Implementation** | Create test, fix root cause, verify | Bug resolved, tests pass |

**Circuit breakers:**
- 3 hypotheses fail → STOP, architecture review
- 3 fixes fail → STOP, question fundamentals
- 30 min no root cause → Escalate

## Integration with Other Skills

**Required sub-skills:**
- **root-cause-tracing** - When error is deep in call stack (Phase 1, Step 5)
- **test-driven-development** - For failing test case (Phase 4, Step 1)

**Post-completion:**
- **codify-solution** - Document non-trivial fixes (Phase 4, Step 5)

**Complementary:**
- **defense-in-depth** - Add validation after finding root cause
- **verification-before-completion** - Verify fix worked before claiming success

## Required Patterns

This skill uses these universal patterns:
- **State Tracking:** See `skills/shared-patterns/state-tracking.md`
- **Failure Recovery:** See `skills/shared-patterns/failure-recovery.md`
- **Exit Criteria:** See `skills/shared-patterns/exit-criteria.md`
- **TodoWrite:** See `skills/shared-patterns/todowrite-integration.md`

Apply ALL patterns when using this skill.