Initial commit

2025-11-29 18:20:33 +08:00
commit 977fbf5872
27 changed files with 5714 additions and 0 deletions
--- a/skills/debugging/SKILL.md
+++ b/skills/debugging/SKILL.md
@@ -0,0 +1,121 @@
+---
+name: debugging
+description: Systematic debugging that identifies root causes rather than treating symptoms. Uses sequential thinking for complex analysis, web search for research, and structured investigation to avoid circular reasoning and whack-a-mole fixes.
+---
+
+# Debugging
+
+## Quickstart
+
+1. Capture exact repro, scope, and recent changes
+2. Isolate components/files; trace path to failure
+3. Research exact error; check official docs
+4. Compare failing vs working patterns; form a testable hypothesis
+5. Verify with minimal test; apply minimal fix across all instances; validate
+
+## When to Use This Skill
+
+Use debugging when:
+- A bug has no obvious cause or has been "fixed" before but returned
+- Error messages are unclear or misleading
+- Multiple attempted fixes have failed
+- The issue might affect multiple locations in the codebase
+- Understanding the root cause is critical for proper resolution
+
+Skip this skill for:
+- Simple syntax errors with obvious fixes
+- Trivial typos or missing imports
+- Well-understood, isolated bugs with clear solutions
+
+## Core Anti-Patterns to Avoid
+
+Based on documented failures in AI debugging, explicitly avoid:
+
+1. **Circular Reasoning**: Never propose the same fix twice without learning why it failed
+2. **Premature Victory**: Always verify fixes were actually implemented and work
+3. **Pattern Amnesia**: Maintain awareness of established code patterns throughout the session
+4. **Context Overload**: Follow the 50% rule: restart the conversation once context usage reaches about 50%
+5. **Symptom Chasing**: Resist fixing error messages without understanding root causes
+6. **Implementation Before Understanding**: Never jump to code changes before examining existing patterns
+
+## UNDERSTAND (10-step checklist)
+
+- Understand: capture exact repro, scope, and recent changes
+- Narrow: isolate components/files; trace path to failure
+- Discover: research exact error (WebSearch → Parallel Search, Context7:get-library-docs)
+- Examine: compare against known-good patterns in the codebase
+- Reason: use SequentialThinking:process_thought and 5 Whys to reach root cause
+- Synthesize: write a falsifiable hypothesis with predictions
+- Test: add logs/tests to confirm the mechanism
+- Apply: minimal fix for root cause, across all occurrences, following patterns
+- Note: record insights, warnings, decisions
+- Document: update comments/docs/tests as needed
+
+## Progress Tracking with TodoWrite
+
+Use TodoWrite to track debugging progress through the UNDERSTAND checklist:
+
+1. **At start**: Create todos for each applicable step:
+   ```
+   ☐ U - Capture exact repro and scope
+   ☐ N - Isolate failing component
+   ☐ D - Research error message
+   ☐ E - Compare with working patterns
+   ☐ R - Root cause analysis (5 Whys)
+   ☐ S - Write falsifiable hypothesis
+   ☐ T - Verify with minimal test
+   ☐ A - Apply fix across all occurrences
+   ☐ N - Record insights
+   ☐ D - Update docs/tests
+   ```
+
+2. **During debugging**: Mark steps in_progress → completed as you work through them
+
+3. **When stuck**: The todo list shows which step is blocked, which helps reveal whether you are skipping steps or going in circles
+
+4. **Skip steps only if**: The bug is simple enough that the checklist is overkill (see "Skip this skill for" above)
+
+## Tool Decision Tree
+
+- Know exact text/symbol? → grep
+- Need conceptual/semantic location? → codebase_search
+- Need full file context? → read_file
+- Unfamiliar error/behavior? → Context7:get-library-docs, then WebSearch → Parallel Search
+- Complex multi-hypothesis analysis? → SequentialThinking:process_thought
+
+## Context Management
+
+- Restart at ~50% context usage to avoid degraded reasoning
+- Before restart: summarize facts, hypothesis, ruled-outs, next step
+- Start a fresh chat with just that summary; continue
+
+## Decision Framework
+
+- **IF** same fix proposed twice → Stop; use SequentialThinking:process_thought
+- **IF** error is unclear → Research via WebSearch → Parallel Search; verify with docs
+- **IF** area is unfamiliar → Explore with codebase_search; don't guess
+- **IF** fix seems too easy → Confirm it addresses root cause (not symptom)
+- **IF** context is cluttered → Restart at 50% with summary
+- **IF** multiple hypotheses exist → Evaluate explicitly (evidence for/against)
+- **IF** similar code works → Find and diff via codebase_search/read_file
+- **IF** declaring success → Show changed lines; test fail-before/pass-after
+- **IF** fix spans multiple files → Search and patch all occurrences
+- **IF** library behavior assumed → Check Context7:get-library-docs
+
+## Quality Checks Before Finishing
+
+Before declaring a bug fixed, verify:
+
+- [ ] Root cause identified and documented
+- [ ] Fix addresses cause, not symptom
+- [ ] All occurrences fixed (searched project-wide)
+- [ ] Follows existing code patterns
+- [ ] Original symptom eliminated
+- [ ] No regressions introduced
+- [ ] Tests/logs verify under relevant conditions
+- [ ] Docs/tests updated (comments, docs, regression tests)
+
+## References
+
+- `reference/root-cause-framework.md`
+- `reference/antipatterns.md`
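
The "test fail-before/pass-after" check named in the Decision Framework and the Quality Checks of SKILL.md can be made concrete with a small regression test. The sketch below is illustrative only: `normalize_email_buggy`, `normalize_email_fixed`, and the sample input are hypothetical stand-ins, not code from this repository.

```python
from typing import Callable


def normalize_email_buggy(raw: str) -> str:
    # Hypothetical bug: lowercases but keeps surrounding whitespace.
    return raw.lower()


def normalize_email_fixed(raw: str) -> str:
    # Hypothetical fix: strip whitespace before lowercasing.
    return raw.strip().lower()


def regression_test(normalize: Callable[[str], str]) -> bool:
    """Return True when the original symptom is gone."""
    return normalize("  Alice@Example.COM ") == "alice@example.com"


def fails_before_passes_after(
    before: Callable[[str], str], after: Callable[[str], str]
) -> bool:
    """A fix counts as verified only if the same test fails on the old
    code and passes on the new code."""
    return (not regression_test(before)) and regression_test(after)


if __name__ == "__main__":
    verified = fails_before_passes_after(normalize_email_buggy, normalize_email_fixed)
    print("fix verified:", verified)
```

Running the same test against both versions guards against Premature Victory: a test that already passed before the change proves nothing about the fix.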
--- a/skills/debugging/reference/antipatterns.md
+++ b/skills/debugging/reference/antipatterns.md
@@ -0,0 +1,43 @@
+# Debugging Antipatterns (and Recoveries)
+
+Avoid these documented failure modes; when you detect one, apply its recovery steps.
+
+## 1) Circular Reasoning Without Learning
+- Symptom: Proposing the same fix repeatedly
+- Recovery: Stop and use `SequentialThinking:process_thought` to analyze why the fix failed; propose a substantively different approach
+
+## 2) Premature Victory Declaration
+- Symptom: Declaring success without showing changes or running tests
+- Recovery: Show changed lines; run tests that fail-before/pass-after; verify across scenarios
+
+## 3) Pattern Amnesia
+- Symptom: Ignoring established code patterns/conventions
+- Recovery: Use `codebase_search` to find similar implementations; extract and follow their patterns; explain any deviation
+
+## 4) Implementation Before Understanding
+- Symptom: Jumping to code edits without examining context
+- Recovery: Explore → Plan → Code; read relevant files; outline plan; then implement
+
+## 5) Context-Limited Fixes
+- Symptom: Fixing one location only
+- Recovery: Search project-wide (grep/codebase_search) for the root pattern; patch all occurrences; refactor if repeated
+
+## 6) Symptom Chasing
+- Symptom: Treating error messages as the problem
+- Recovery: Apply 5 Whys; confirm root cause explains all symptoms; then fix
+
+## 7) Assumption-Based Debugging
+- Symptom: Assuming library/system behavior
+- Recovery: Research via Firecrawl:search; verify with `Context7:get-library-docs`; test assumptions
+
+## 8) Context Overload Ignorance
+- Symptom: Degraded reasoning in long sessions
+- Recovery: Restart at ~50%; carry summary of facts, hypothesis, next step only
+
+## 9) Tool Misuse
+- Symptom: Using wrong tool for task
+- Recovery: Follow the decision tree: exact text→grep; concept→codebase_search; full context→read_file; unfamiliar error→Context7:get-library-docs, then WebSearch → Parallel Search; complex analysis→SequentialThinking:process_thought
+
+## 10) Plan Abandonment
+- Symptom: Ignoring the plan mid-way
+- Recovery: Note deviation; justify; update plan; resume at correct step
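
The recovery for Context-Limited Fixes (search project-wide for the root pattern, then patch all occurrences) amounts to a recursive text search. The following is a minimal sketch using only the Python standard library; `find_occurrences`, `Hit`, and the default `.py` suffix filter are assumptions made for illustration, and a real session would more likely use grep or `codebase_search` as the antipattern describes.

```python
import re
from pathlib import Path
from typing import Iterator, NamedTuple, Tuple


class Hit(NamedTuple):
    path: Path
    line_no: int
    text: str


def find_occurrences(
    root: Path, pattern: str, suffixes: Tuple[str, ...] = (".py",)
) -> Iterator[Hit]:
    """Yield every line under `root` that matches `pattern`.

    Only files whose suffix is in `suffixes` are scanned, in sorted
    path order so that results are reproducible.
    """
    regex = re.compile(pattern)
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        content = path.read_text(encoding="utf-8", errors="replace")
        for line_no, text in enumerate(content.splitlines(), start=1):
            if regex.search(text):
                yield Hit(path, line_no, text.strip())


if __name__ == "__main__":
    # Example: list every call site of a hypothetical get_email() method.
    for hit in find_occurrences(Path("."), r"\.get_email\("):
        print(f"{hit.path}:{hit.line_no}: {hit.text}")
```

Listing every hit before editing makes it easier to confirm that the fix reached each location and to spot repetition that might justify a refactor.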
--- a/skills/debugging/reference/root-cause-framework.md
+++ b/skills/debugging/reference/root-cause-framework.md
@@ -0,0 +1,135 @@
+# Root Cause Analysis Framework
+
+Advanced techniques for identifying fundamental causes rather than symptoms.
+
+## Table of Contents
+
+- [The 5 Whys (Applied to Code)](#the-5-whys-applied-to-code)
+- [Architectural Analysis Method](#architectural-analysis-method)
+- [Data Flow Tracing](#data-flow-tracing)
+- [State Analysis Patterns](#state-analysis-patterns)
+- [Integration Point Analysis](#integration-point-analysis)
+- [Dependency Chain Analysis](#dependency-chain-analysis)
+- [Performance Root Cause Analysis](#performance-root-cause-analysis)
+- [Sequential Thinking Templates](#sequential-thinking-templates)
+
+## The 5 Whys (Applied to Code)
+
+Ask "why" iteratively to drill down from symptom to root cause.
+
+### Example: Null Pointer Exception
+
+1. Why does the null pointer exception occur?
+   → `user.getEmail()` is called on a null user object
+2. Why is the user object null?
+   → `findUserById()` returns null when no user is found
+3. Why does `findUserById()` return null instead of throwing?
+   → Original design used null to indicate "not found"
+4. Why wasn't this caught earlier in the call chain?
+   → Calling code doesn't check for null before using the user
+5. Why doesn't the calling code check for null?
+   → API contract is ambiguous about null as a valid return value
+
+- Root cause: Ambiguous API contract leads to inconsistent null handling.
+- Proper fix: Define and enforce a clear API contract (Optional/exception/documented null).
+
+## Architectural Analysis Method
+
+When bugs suggest deeper design issues, analyze architecture systematically.
+
+1. Map components: interactions, data flows, boundaries
+2. Identify assumptions (inputs, state, timing, external systems)
+3. Find assumption mismatches between components
+4. Prefer an architectural fix over a workaround when the problem is systemic
+
+Use `codebase_search` prompts like:
+- "How does ComponentA communicate with ComponentB?"
+- "What data flows from Source to Destination?"
+
+## Data Flow Tracing
+
+Trace transformations to locate where data goes wrong.
+
+- Backward tracing: start at observation point → immediate source → transformation → origin
+- Forward tracing: origin → each transformation → final state
+- At each step compare expected vs actual state
+
+Common root causes:
+- Missing validation
+- Incorrect transformation logic
+- Lost context/metadata
+- Race conditions
+- Type/encoding mismatch
+
+## State Analysis Patterns
+
+Investigate state transitions and invariants.
+
+- Uninitialized state: used before proper setup
+- Stale state: cache invalidation/refresh failures
+- Inconsistent state: related data out of sync (needs atomicity)
+- Invalid state: invariants not enforced (add validation/assertions)
+- Concurrent corruption: missing synchronization/immutability
+
+## Integration Point Analysis
+
+Verify integration contracts at boundaries.
+
+- Data format: actual vs expected
+- Protocol/version: compatibility and usage
+- Timing: sync vs async, timeouts, ordering
+- Error handling: propagation and retries
+- AuthZ/AuthN: credentials, validation, failure behavior
+
+Root cause patterns:
+- Mismatched versions
+- Incomplete error handling
+- Configuration mismatch
+- Network constraints
+
+## Dependency Chain Analysis
+
+Map direct, transitive, and hidden dependencies.
+
+- Version conflicts (multiple versions)
+- Missing dependencies (runtime load failures)
+- Initialization order issues
+- Circular dependencies
+
+Use `codebase_search`:
+- "What imports/uses ComponentX?"
+- "What does ComponentX depend on?"
+
+## Performance Root Cause Analysis
+
+Identify bottlenecks systematically.
+
+1. Measure first (profile under realistic load)
+2. Check algorithmic complexity and hotspots
+3. Analyze resource usage (CPU, memory, I/O, network)
+4. Classify cause: algorithm, implementation, contention, external
+
+Fix strategies:
+- Algorithmic improvements
+- Caching/batching
+- Lazy loading
+- Parallelization/asynchronous I/O
+
+## Sequential Thinking Templates
+
+Use `SequentialThinking:process_thought` to structure complex analysis.
+
+Thought 1 - Problem Definition
+- Symptom, context, confirmed facts, unknowns
+
+Thought 2 - Hypotheses
+- 3–5 candidates, assumptions, likelihood ranking
+
+Thought 3 - Evidence
+- For/against each hypothesis; challenge assumptions
+
+Thought 4 - Selection
+- Pick most likely; rationale; confidence
+
+Thought 5 - Verification
+- Predictions, test plan, alternatives if wrong
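
The fix proposed in the 5 Whys example of root-cause-framework.md (define and enforce a clear API contract) might look like the sketch below. It is a Python adaptation of the Java-style names in the example, and every identifier (`User`, `UserNotFoundError`, `find_user_by_id`, `get_user_by_id`, `email_or_default`, the in-memory `_USERS` store) is hypothetical. The approach assumed here is to expose two functions whose names and return types state the contract explicitly.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class User:
    user_id: int
    email: str


class UserNotFoundError(LookupError):
    """Raised when a caller requires a user that does not exist."""


# Hypothetical in-memory store standing in for a real data source.
_USERS: Dict[int, User] = {1: User(1, "alice@example.com")}


def find_user_by_id(user_id: int) -> Optional[User]:
    """Contract: returns None when no user exists; callers must handle None."""
    return _USERS.get(user_id)


def get_user_by_id(user_id: int) -> User:
    """Contract: never returns None; raises UserNotFoundError instead."""
    user = find_user_by_id(user_id)
    if user is None:
        raise UserNotFoundError(f"no user with id {user_id}")
    return user


def email_or_default(user_id: int, default: str = "") -> str:
    """Caller that honors the Optional contract instead of assuming a user."""
    user = find_user_by_id(user_id)
    return user.email if user is not None else default


if __name__ == "__main__":
    print(email_or_default(1))
    print(email_or_default(2, default="unknown"))
```

With the `Optional[User]` annotation, a type checker can flag callers that skip the `None` check, which addresses the fifth "why" (an ambiguous contract) rather than patching a single call site.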