Files
2025-11-29 18:20:33 +08:00

136 lines
4.2 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Root Cause Analysis Framework
Advanced techniques for identifying fundamental causes rather than symptoms.
## Table of Contents
- [The 5 Whys (Applied to Code)](#the-5-whys-applied-to-code)
- [Architectural Analysis Method](#architectural-analysis-method)
- [Data Flow Tracing](#data-flow-tracing)
- [State Analysis Patterns](#state-analysis-patterns)
- [Integration Point Analysis](#integration-point-analysis)
- [Dependency Chain Analysis](#dependency-chain-analysis)
- [Performance Root Cause Analysis](#performance-root-cause-analysis)
- [Sequential Thinking Templates](#sequential-thinking-templates)
## The 5 Whys (Applied to Code)
Ask "why" iteratively to drill down from symptom to root cause.
### Example: Null Pointer Exception
1. Why does the null pointer exception occur?
`user.getEmail()` is called on a null user object
2. Why is the user object null?
`findUserById()` returns null when no user is found
3. Why does `findUserById()` return null instead of throwing?
→ Original design used null to indicate "not found"
4. Why wasn't this caught earlier in the call chain?
→ Calling code doesn't check for null before using the user
5. Why doesn't the calling code check for null?
→ API contract is ambiguous about null as a valid return value
Root cause: Ambiguous API contract leads to inconsistent null handling.
Proper fix: Define and enforce a clear API contract (Optional/exception/documented null).
## Architectural Analysis Method
When bugs suggest deeper design issues, analyze architecture systematically.
1. Map components: interactions, data flows, boundaries
2. Identify assumptions (inputs, state, timing, external systems)
3. Find assumption mismatches between components
4. Choose architectural fix over workaround when systemic
Use `codebase_search` prompts like:
- "How does ComponentA communicate with ComponentB?"
- "What data flows from Source to Destination?"
## Data Flow Tracing
Trace transformations to locate where data goes wrong.
- Backward tracing: start at observation point → immediate source → transformation → origin
- Forward tracing: origin → each transformation → final state
- At each step compare expected vs actual state
Common root causes:
- Missing validation
- Incorrect transformation logic
- Lost context/metadata
- Race conditions
- Type/encoding mismatch
## State Analysis Patterns
Investigate state transitions and invariants.
- Uninitialized state: used before proper setup
- Stale state: cache invalidation/refresh failures
- Inconsistent state: related data out of sync (needs atomicity)
- Invalid state: invariants not enforced (add validation/assertions)
- Concurrent corruption: missing synchronization/immutability
## Integration Point Analysis
Verify integration contracts at boundaries.
- Data format: actual vs expected
- Protocol/version: compatibility and usage
- Timing: sync vs async, timeouts, ordering
- Error handling: propagation and retries
- AuthZ/AuthN: credentials, validation, failure behavior
Root cause patterns:
- Mismatched versions
- Incomplete error handling
- Configuration mismatch
- Network constraints
## Dependency Chain Analysis
Map direct, transitive, and hidden dependencies.
- Version conflicts (multiple versions)
- Missing dependencies (runtime load failures)
- Initialization order issues
- Circular dependencies
Use `codebase_search`:
- "What imports/uses ComponentX?"
- "What does ComponentX depend on?"
## Performance Root Cause Analysis
Identify bottlenecks systematically.
1. Measure first (profile under realistic load)
2. Check algorithmic complexity and hotspots
3. Analyze resource usage (CPU, memory, I/O, network)
4. Classify cause: algorithm, implementation, contention, external
Fix strategies:
- Algorithmic improvements
- Caching/batching
- Lazy loading
- Parallelization/asynchronous I/O
## Sequential Thinking Templates
Use `SequentialThinking:process_thought` to structure complex analysis.
Thought 1 - Problem Definition
- Symptom, context, confirmed facts, unknowns
Thought 2 - Hypotheses
- 35 candidates, assumptions, likelihood ranking
Thought 3 - Evidence
- For/against each hypothesis; challenge assumptions
Thought 4 - Selection
- Pick most likely; rationale; confidence
Thought 5 - Verification
- Predictions, test plan, alternatives if wrong