4.2 KiB
Root Cause Analysis Framework
Advanced techniques for identifying fundamental causes rather than symptoms.
Table of Contents
- The 5 Whys (Applied to Code)
- Architectural Analysis Method
- Data Flow Tracing
- State Analysis Patterns
- Integration Point Analysis
- Dependency Chain Analysis
- Performance Root Cause Analysis
- Sequential Thinking Templates
The 5 Whys (Applied to Code)
Ask "why" iteratively to drill down from symptom to root cause.
Example: Null Pointer Exception
- Why does the null pointer exception occur?
→
user.getEmail()is called on a null user object - Why is the user object null?
→
findUserById()returns null when no user is found - Why does
findUserById()return null instead of throwing? → Original design used null to indicate "not found" - Why wasn't this caught earlier in the call chain? → Calling code doesn't check for null before using the user
- Why doesn't the calling code check for null? → API contract is ambiguous about null as a valid return value
Root cause: Ambiguous API contract leads to inconsistent null handling. Proper fix: Define and enforce a clear API contract (Optional/exception/documented null).
Architectural Analysis Method
When bugs suggest deeper design issues, analyze architecture systematically.
- Map components: interactions, data flows, boundaries
- Identify assumptions (inputs, state, timing, external systems)
- Find assumption mismatches between components
- Choose architectural fix over workaround when systemic
Use codebase_search prompts like:
- "How does ComponentA communicate with ComponentB?"
- "What data flows from Source to Destination?"
Data Flow Tracing
Trace transformations to locate where data goes wrong.
- Backward tracing: start at observation point → immediate source → transformation → origin
- Forward tracing: origin → each transformation → final state
- At each step compare expected vs actual state
Common root causes:
- Missing validation
- Incorrect transformation logic
- Lost context/metadata
- Race conditions
- Type/encoding mismatch
State Analysis Patterns
Investigate state transitions and invariants.
- Uninitialized state: used before proper setup
- Stale state: cache invalidation/refresh failures
- Inconsistent state: related data out of sync (needs atomicity)
- Invalid state: invariants not enforced (add validation/assertions)
- Concurrent corruption: missing synchronization/immutability
Integration Point Analysis
Verify integration contracts at boundaries.
- Data format: actual vs expected
- Protocol/version: compatibility and usage
- Timing: sync vs async, timeouts, ordering
- Error handling: propagation and retries
- AuthZ/AuthN: credentials, validation, failure behavior
Root cause patterns:
- Mismatched versions
- Incomplete error handling
- Configuration mismatch
- Network constraints
Dependency Chain Analysis
Map direct, transitive, and hidden dependencies.
- Version conflicts (multiple versions)
- Missing dependencies (runtime load failures)
- Initialization order issues
- Circular dependencies
Use codebase_search:
- "What imports/uses ComponentX?"
- "What does ComponentX depend on?"
Performance Root Cause Analysis
Identify bottlenecks systematically.
- Measure first (profile under realistic load)
- Check algorithmic complexity and hotspots
- Analyze resource usage (CPU, memory, I/O, network)
- Classify cause: algorithm, implementation, contention, external
Fix strategies:
- Algorithmic improvements
- Caching/batching
- Lazy loading
- Parallelization/asynchronous I/O
Sequential Thinking Templates
Use SequentialThinking:process_thought to structure complex analysis.
Thought 1 - Problem Definition
- Symptom, context, confirmed facts, unknowns
Thought 2 - Hypotheses
- 3–5 candidates, assumptions, likelihood ranking
Thought 3 - Evidence
- For/against each hypothesis; challenge assumptions
Thought 4 - Selection
- Pick most likely; rationale; confidence
Thought 5 - Verification
- Predictions, test plan, alternatives if wrong