Initial commit

Zhongwei Li
2025-11-29 18:20:33 +08:00
commit 977fbf5872
27 changed files with 5714 additions and 0 deletions

skills/debugging/SKILL.md

@@ -0,0 +1,121 @@
---
name: debugging
description: Systematic debugging that identifies root causes rather than treating symptoms. Uses sequential thinking for complex analysis, web search for research, and structured investigation to avoid circular reasoning and whack-a-mole fixes.
---
# Debugging
## Quickstart
1. Capture exact repro, scope, and recent changes
2. Isolate components/files; trace path to failure
3. Research exact error; check official docs
4. Compare failing vs working patterns; form a testable hypothesis
5. Verify with minimal test; apply minimal fix across all instances; validate
## When to Use This Skill
Use debugging when:
- A bug has no obvious cause or has been "fixed" before but returned
- Error messages are unclear or misleading
- Multiple attempted fixes have failed
- The issue might affect multiple locations in the codebase
- Understanding the root cause is critical for proper resolution
Skip this skill for:
- Simple syntax errors with obvious fixes
- Trivial typos or missing imports
- Well-understood, isolated bugs with clear solutions
## Core Anti-Patterns to Avoid
Based on documented failures in AI debugging, explicitly avoid:
1. **Circular Reasoning**: Never propose the same fix twice without learning why it failed
2. **Premature Victory**: Always verify fixes were actually implemented and work
3. **Pattern Amnesia**: Maintain awareness of established code patterns throughout the session
4. **Context Overload**: Use the 50% rule: restart the conversation when context usage reaches about 50%
5. **Symptom Chasing**: Resist fixing error messages without understanding root causes
6. **Implementation Before Understanding**: Never jump to code changes before examining existing patterns
## UNDERSTAND (10-step checklist)
- Understand: capture exact repro, scope, and recent changes
- Narrow: isolate components/files; trace path to failure
- Discover: research exact error (WebSearch → Parallel Search, Context7:get-library-docs)
- Examine: compare against known-good patterns in the codebase
- Reason: use SequentialThinking:process_thought and 5 Whys to reach root cause
- Synthesize: write a falsifiable hypothesis with predictions
- Test: add logs/tests to confirm the mechanism
- Apply: minimal fix for root cause, across all occurrences, following patterns
- Note: record insights, warnings, decisions
- Document: update comments/docs/tests as needed
## Progress Tracking with TodoWrite
Use TodoWrite to track debugging progress through the UNDERSTAND checklist:
1. **At start**: Create todos for each applicable step:
```
☐ U - Capture exact repro and scope
☐ N - Isolate failing component
☐ D - Research error message
☐ E - Compare with working patterns
☐ R - Root cause analysis (5 Whys)
☐ S - Write falsifiable hypothesis
☐ T - Verify with minimal test
☐ A - Apply fix across all occurrences
☐ N - Record insights
☐ D - Update docs/tests
```
2. **During debugging**: Mark steps in_progress → completed as you work through them
3. **When stuck**: The todo list shows which step is blocked, which helps reveal whether you are skipping steps or going in circles
4. **Skip steps only if**: The bug is simple enough that the checklist is overkill (see "Skip this skill for" above)
## Tool Decision Tree
- Know exact text/symbol? → grep
- Need conceptual/semantic location? → codebase_search
- Need full file context? → read_file
- Unfamiliar error/behavior? → Context7:get-library-docs, then WebSearch → Parallel Search
- Complex multi-hypothesis analysis? → SequentialThinking:process_thought
## Context Management
- Restart at ~50% context usage to avoid degraded reasoning
- Before restart: summarize facts, hypothesis, ruled-outs, next step
- Start a fresh chat with just that summary; continue
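
A restart summary can be kept as a small structured record so nothing is dropped between sessions. The sketch below is illustrative only: `RestartSummary` and its field names are hypothetical and simply mirror the four items listed above.

```python
from dataclasses import dataclass, field


@dataclass
class RestartSummary:
    """Hypothetical container for what to carry into a fresh chat."""

    facts: list[str] = field(default_factory=list)
    hypothesis: str = ""
    ruled_out: list[str] = field(default_factory=list)
    next_step: str = ""

    def render(self) -> str:
        """Format the summary as plain text to paste into the new session."""
        lines = ["Confirmed facts:"]
        lines += [f"- {fact}" for fact in self.facts]
        lines.append(f"Current hypothesis: {self.hypothesis}")
        lines.append("Ruled out:")
        lines += [f"- {item}" for item in self.ruled_out]
        lines.append(f"Next step: {self.next_step}")
        return "\n".join(lines)
```

Paste only the output of `render()` into the new chat; leaving out the old transcript is the point of the restart.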
## Decision Framework
- **IF** same fix proposed twice → Stop; use SequentialThinking:process_thought
- **IF** error is unclear → Research via WebSearch → Parallel Search; verify with docs
- **IF** area is unfamiliar → Explore with codebase_search; don't guess
- **IF** fix seems too easy → Confirm it addresses root cause (not symptom)
- **IF** context is cluttered → Restart at 50% with summary
- **IF** multiple hypotheses exist → Evaluate explicitly (evidence for/against)
- **IF** similar code works → Find and diff via codebase_search/read_file
- **IF** declaring success → Show changed lines; test fail-before/pass-after
- **IF** fix spans multiple files → Search and patch all occurrences
- **IF** library behavior assumed → Check Context7:get-library-docs
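
The "same fix proposed twice" rule is easier to follow when attempted fixes are written down. A minimal sketch, assuming a hypothetical `FixLog` helper (not part of any tool named in this skill) that normalizes descriptions and refuses repeats:

```python
class FixLog:
    """Hypothetical log of attempted fixes, used to catch repeated proposals."""

    def __init__(self) -> None:
        # Maps a normalized fix description to its recorded outcome.
        self._attempts: dict[str, str] = {}

    @staticmethod
    def _normalize(description: str) -> str:
        # Ignore case and whitespace differences between proposals.
        return " ".join(description.lower().split())

    def propose(self, description: str) -> bool:
        """Return True if the fix is new, False if it was already tried."""
        key = self._normalize(description)
        if key in self._attempts:
            return False
        self._attempts[key] = "outcome not recorded"
        return True

    def record_outcome(self, description: str, outcome: str) -> None:
        """Store why a fix worked or failed so the next proposal can differ."""
        self._attempts[self._normalize(description)] = outcome

    def outcome_of(self, description: str) -> str:
        """Look up what happened the last time this fix was tried."""
        return self._attempts[self._normalize(description)]
```

When `propose()` returns `False`, stop and read `outcome_of()` before suggesting anything else.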
## Quality Checks Before Finishing
Before declaring a bug fixed, verify:
- [ ] Root cause identified and documented
- [ ] Fix addresses cause, not symptom
- [ ] All occurrences fixed (searched project-wide)
- [ ] Follows existing code patterns
- [ ] Original symptom eliminated
- [ ] No regressions introduced
- [ ] Tests/logs verify the fix under relevant conditions
- [ ] Docs/tests updated (comments, docs, regression tests)
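
One way to check "original symptom eliminated" is to run the same regression test against the code before and after the fix. The sketch below is an illustration under assumptions: `average_buggy`, `average_fixed`, and `fails_before_passes_after` are hypothetical names, and the empty-input bug is invented for the example.

```python
from typing import Callable, Sequence

Average = Callable[[Sequence[float]], float]


def average_buggy(values: Sequence[float]) -> float:
    # Bug: raises ZeroDivisionError when values is empty.
    return sum(values) / len(values)


def average_fixed(values: Sequence[float]) -> float:
    # Fix: define the result for empty input explicitly.
    if not values:
        return 0.0
    return sum(values) / len(values)


def regression_test(average: Average) -> None:
    """Encodes the original symptom plus one normal case."""
    assert average([]) == 0.0
    assert average([2.0, 4.0]) == 3.0


def fails_before_passes_after(
    test: Callable[[Average], None], before: Average, after: Average
) -> bool:
    """True only if the test fails on the old code and passes on the new code."""
    try:
        test(before)
        failed_before = False
    except Exception:
        failed_before = True
    try:
        test(after)
    except Exception:
        return False
    return failed_before
```

A test that also passes on the old code does not exercise the bug, so it is no evidence that the fix worked.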
## References
- `reference/root-cause-framework.md`
- `reference/antipatterns.md`


@@ -0,0 +1,43 @@
# Debugging Antipatterns (and Recoveries)
Avoid these documented failure modes; use the recovery steps when detected.
## 1) Circular Reasoning Without Learning
- Symptom: Proposing the same fix repeatedly
- Recovery: Stop and use `SequentialThinking:process_thought` to analyze why the fix failed; propose a substantively different approach
## 2) Premature Victory Declaration
- Symptom: Declaring success without showing changes or running tests
- Recovery: Show changed lines; run tests that fail-before/pass-after; verify across scenarios
## 3) Pattern Amnesia
- Symptom: Ignoring established code patterns/conventions
- Recovery: `codebase_search` similar implementations; extract and follow patterns; explain any deviation
## 4) Implementation Before Understanding
- Symptom: Jumping to code edits without examining context
- Recovery: Explore → Plan → Code; read relevant files; outline plan; then implement
## 5) Context-Limited Fixes
- Symptom: Fixing one location only
- Recovery: Search project-wide (grep/codebase_search) for the root pattern; patch all occurrences; refactor if repeated
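
When grep or codebase_search is not at hand, a project-wide search for the root pattern can be approximated with a short script. This is a sketch only: `find_occurrences` is a hypothetical helper, and the default `*.py` glob is an assumption to adjust per project.

```python
import re
from pathlib import Path


def find_occurrences(
    root: Path, pattern: str, glob: str = "*.py"
) -> list[tuple[Path, int, str]]:
    """Return (file, line number, line text) for every line matching pattern."""
    regex = re.compile(pattern)
    hits: list[tuple[Path, int, str]] = []
    for path in sorted(root.rglob(glob)):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # Skip unreadable or non-text files.
        for number, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append((path, number, line.strip()))
    return hits
```

Review every hit before patching; some matches may be legitimate uses rather than instances of the bug.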
## 6) Symptom Chasing
- Symptom: Treating error messages as the problem
- Recovery: Apply 5 Whys; confirm root cause explains all symptoms; then fix
## 7) Assumption-Based Debugging
- Symptom: Assuming library/system behavior
- Recovery: Research via Firecrawl:search; verify with `Context7:get-library-docs`; test assumptions
## 8) Context Overload Ignorance
- Symptom: Degraded reasoning in long sessions
- Recovery: Restart at ~50%; carry summary of facts, hypothesis, next step only
## 9) Tool Misuse
- Symptom: Using wrong tool for task
- Recovery: Decision tree: exact text→grep; concept→codebase_search; full context→read_file; research→Firecrawl/Perplexity; complex analysis→SequentialThinking
## 10) Plan Abandonment
- Symptom: Ignoring the plan mid-way
- Recovery: Note deviation; justify; update plan; resume at correct step


@@ -0,0 +1,135 @@
# Root Cause Analysis Framework
Advanced techniques for identifying fundamental causes rather than symptoms.
## Table of Contents
- [The 5 Whys (Applied to Code)](#the-5-whys-applied-to-code)
- [Architectural Analysis Method](#architectural-analysis-method)
- [Data Flow Tracing](#data-flow-tracing)
- [State Analysis Patterns](#state-analysis-patterns)
- [Integration Point Analysis](#integration-point-analysis)
- [Dependency Chain Analysis](#dependency-chain-analysis)
- [Performance Root Cause Analysis](#performance-root-cause-analysis)
- [Sequential Thinking Templates](#sequential-thinking-templates)
## The 5 Whys (Applied to Code)
Ask "why" iteratively to drill down from symptom to root cause.
### Example: Null Pointer Exception
1. Why does the null pointer exception occur?
→ `user.getEmail()` is called on a null user object
2. Why is the user object null?
→ `findUserById()` returns null when no user is found
3. Why does `findUserById()` return null instead of throwing?
→ Original design used null to indicate "not found"
4. Why wasn't this caught earlier in the call chain?
→ Calling code doesn't check for null before using the user
5. Why doesn't the calling code check for null?
→ API contract is ambiguous about null as a valid return value
Root cause: Ambiguous API contract leads to inconsistent null handling.
Proper fix: Define and enforce a clear API contract (Optional/exception/documented null).
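
A Python rendering of the "clear API contract" fix might look like the sketch below. The names are hypothetical analogues of the example above (`find_user_by_id` for `findUserById()`), and the in-memory `_USERS` table stands in for a real data store.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    user_id: int
    email: str


class UserNotFoundError(LookupError):
    """Raised when a caller requires a user that does not exist."""


# Hypothetical stand-in for a database.
_USERS = {1: User(1, "ada@example.com")}


def find_user_by_id(user_id: int) -> Optional[User]:
    """Contract: returns None when no user exists; callers must check."""
    return _USERS.get(user_id)


def get_user_by_id(user_id: int) -> User:
    """Contract: never returns None; raises UserNotFoundError instead."""
    user = find_user_by_id(user_id)
    if user is None:
        raise UserNotFoundError(f"no user with id {user_id}")
    return user


def email_for(user_id: int) -> str:
    # Uses the raising variant, so no None can reach the attribute access.
    return get_user_by_id(user_id).email
```

Either contract is workable; the root-cause fix is that each function states which one it follows and callers pick deliberately.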
## Architectural Analysis Method
When bugs suggest deeper design issues, analyze architecture systematically.
1. Map components: interactions, data flows, boundaries
2. Identify assumptions (inputs, state, timing, external systems)
3. Find assumption mismatches between components
4. Choose architectural fix over workaround when systemic
Use `codebase_search` prompts like:
- "How does ComponentA communicate with ComponentB?"
- "What data flows from Source to Destination?"
## Data Flow Tracing
Trace transformations to locate where data goes wrong.
- Backward tracing: start at observation point → immediate source → transformation → origin
- Forward tracing: origin → each transformation → final state
- At each step compare expected vs actual state
Common root causes:
- Missing validation
- Incorrect transformation logic
- Lost context/metadata
- Race conditions
- Type/encoding mismatch
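
Forward tracing can be sketched as running each transformation in order and stopping at the first step whose output differs from what was expected. `trace_forward` and the step tuples are hypothetical; the price pipeline in the usage note is invented for illustration.

```python
from typing import Any, Callable, Optional

# (step name, transformation, predicate describing the expected output)
Step = tuple[str, Callable[[Any], Any], Callable[[Any], bool]]


def trace_forward(value: Any, steps: list[Step]) -> Optional[str]:
    """Return the name of the first step whose output fails its expectation.

    Returns None when every step produces the expected state.
    """
    for name, transform, is_expected in steps:
        value = transform(value)
        if not is_expected(value):
            return name
    return None
```

For example, with steps `strip`, `parse` (`float`), and `to_cents` where `to_cents` wrongly multiplies by 10 instead of 100, tracing the input `" 12.50 "` names `to_cents` as the first step where actual and expected state diverge.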
## State Analysis Patterns
Investigate state transitions and invariants.
- Uninitialized state: used before proper setup
- Stale state: cache invalidation/refresh failures
- Inconsistent state: related data out of sync (needs atomicity)
- Invalid state: invariants not enforced (add validation/assertions)
- Concurrent corruption: missing synchronization/immutability
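
For the "invalid state" pattern, enforcing the invariant at every mutation makes a violation fail at its source rather than later. A minimal sketch with a hypothetical `Inventory` class whose invariant is that stock never goes negative:

```python
class Inventory:
    """Hypothetical example: the invariant is stock >= 0."""

    def __init__(self, stock: int) -> None:
        self._stock = stock
        self._check_invariant()

    @property
    def stock(self) -> int:
        return self._stock

    def _check_invariant(self) -> None:
        # Fail loudly where the bad state is created, not where it is read.
        if self._stock < 0:
            raise ValueError(f"invariant violated: stock={self._stock}")

    def remove(self, quantity: int) -> None:
        if quantity < 0:
            raise ValueError("quantity must be non-negative")
        if quantity > self._stock:
            raise ValueError("cannot remove more than is in stock")
        self._stock -= quantity
        self._check_invariant()
```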
## Integration Point Analysis
Verify integration contracts at boundaries.
- Data format: actual vs expected
- Protocol/version: compatibility and usage
- Timing: sync vs async, timeouts, ordering
- Error handling: propagation and retries
- AuthZ/AuthN: credentials, validation, failure behavior
Root cause patterns:
- Mismatched versions
- Incomplete error handling
- Configuration mismatch
- Network constraints
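
Checking the data format at the boundary turns a vague downstream failure into a specific error at the integration point. The sketch below assumes a hypothetical JSON contract (`REQUIRED_FIELDS`) and a hypothetical `parse_user_payload` function; real contracts will differ.

```python
import json

# Hypothetical contract: field name -> required Python type.
REQUIRED_FIELDS = {"id": int, "email": str}


def parse_user_payload(raw: str) -> dict:
    """Parse a JSON payload and verify it against the expected format."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"payload is not valid JSON: {exc}") from exc
    if not isinstance(payload, dict):
        raise ValueError("payload must be a JSON object")
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in payload:
            raise ValueError(f"missing field: {name}")
        if not isinstance(payload[name], expected_type):
            raise ValueError(
                f"field {name} should be {expected_type.__name__}, "
                f"got {type(payload[name]).__name__}"
            )
    return payload
```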
## Dependency Chain Analysis
Map direct, transitive, and hidden dependencies.
- Version conflicts (multiple versions)
- Missing dependencies (runtime load failures)
- Initialization order issues
- Circular dependencies
Use `codebase_search`:
- "What imports/uses ComponentX?"
- "What does ComponentX depend on?"
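
Once the dependency map is written down, circular dependencies can be found mechanically. The sketch below uses a depth-first search over a plain dictionary; `find_cycle` and the graph shape (module name mapped to the modules it depends on) are assumptions for illustration.

```python
from typing import Optional


def find_cycle(graph: dict[str, list[str]]) -> Optional[list[str]]:
    """Return one dependency cycle as a path (first node repeated at the end).

    Returns None when the graph has no cycle.
    """
    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = {node: UNVISITED for node in graph}
    path: list[str] = []

    def visit(node: str) -> Optional[list[str]]:
        state[node] = IN_PROGRESS
        path.append(node)
        for dep in graph.get(node, []):
            dep_state = state.get(dep, UNVISITED)
            if dep_state == IN_PROGRESS:
                # dep is already on the current path, so we closed a loop.
                return path[path.index(dep):] + [dep]
            if dep_state == UNVISITED:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        state[node] = DONE
        return None

    for node in graph:
        if state[node] == UNVISITED:
            cycle = visit(node)
            if cycle:
                return cycle
    return None
```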
## Performance Root Cause Analysis
Identify bottlenecks systematically.
1. Measure first (profile under realistic load)
2. Check algorithmic complexity and hotspots
3. Analyze resource usage (CPU, memory, I/O, network)
4. Classify cause: algorithm, implementation, contention, external
Fix strategies:
- Algorithmic improvements
- Caching/batching
- Lazy loading
- Parallelization/asynchronous I/O
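
"Measure first" can be as simple as wrapping the suspect call in the standard-library profiler. A minimal sketch using `cProfile` and `pstats`; the `profile_call` wrapper is a hypothetical helper, and the workload should be replaced with a realistic one.

```python
import cProfile
import io
import pstats
from typing import Any, Callable


def profile_call(
    func: Callable[..., Any], *args: Any, top: int = 5, **kwargs: Any
) -> tuple[Any, str]:
    """Run func under cProfile; return its result and a text report.

    The report lists the `top` entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()
```

Read the report before choosing a fix strategy; the hotspot is often not where it was assumed to be.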
## Sequential Thinking Templates
Use `SequentialThinking:process_thought` to structure complex analysis.
Thought 1 - Problem Definition
- Symptom, context, confirmed facts, unknowns
Thought 2 - Hypotheses
- 3-5 candidates, assumptions, likelihood ranking
Thought 3 - Evidence
- For/against each hypothesis; challenge assumptions
Thought 4 - Selection
- Pick most likely; rationale; confidence
Thought 5 - Verification
- Predictions, test plan, alternatives if wrong
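
The hypothesis and evidence steps of the template can also be tracked as plain data. This is a rough sketch: `Hypothesis` and `rank` are hypothetical, and scoring by counting evidence items is a deliberately crude assumption, not a substitute for judging evidence quality.

```python
from dataclasses import dataclass, field


@dataclass
class Hypothesis:
    statement: str
    prediction: str  # What should be observed if the hypothesis is true.
    evidence_for: list[str] = field(default_factory=list)
    evidence_against: list[str] = field(default_factory=list)

    @property
    def score(self) -> int:
        # Crude assumption: every piece of evidence carries equal weight.
        return len(self.evidence_for) - len(self.evidence_against)


def rank(hypotheses: list[Hypothesis]) -> list[Hypothesis]:
    """Order hypotheses from most to least supported."""
    return sorted(hypotheses, key=lambda h: h.score, reverse=True)
```

The top-ranked entry is the candidate for the Selection step; its `prediction` field is what the Verification step should test.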