Initial commit

Zhongwei Li
2025-11-29 18:47:55 +08:00
commit e732da8316
20 changed files with 4969 additions and 0 deletions

skills/debug/SKILL.md Normal file

@@ -0,0 +1,179 @@
---
name: debug
description: Apply systematic debugging methodology using medical differential diagnosis principles. Trigger when AI modifies working code and anomalies occur, or when users report unexpected test results or execution failures. Use observation without preconception, fact isolation, differential diagnosis lists, deductive exclusion, experimental verification, precise fixes, and prevention mechanisms.
---
# Debug
## Overview
This skill applies a systematic debugging methodology inspired by medical differential diagnosis. It provides a rigorous 7-step process for investigating and resolving bugs through observation, classification, hypothesis testing, and verification. This approach prioritizes evidence-based reasoning over assumptions, ensuring root causes are identified rather than symptoms treated.
## When to Use This Skill
Activate this skill in two primary scenarios:
**Scenario A: Post-Modification Anomalies**
When a previously tested, working version has been modified and unexpected behavior emerges after the changes.
**Scenario B: User-Reported Issues**
When users report that test results don't meet expectations or the system fails to execute as intended.
## Debugging Workflow
Follow this 7-step systematic approach to diagnose and resolve issues.
For a detailed checklist of each step, refer to `{baseDir}/references/debugging_checklist.md`. For common bug patterns and their signatures, see `{baseDir}/references/common_patterns.md`.
### Step 1: Observe Without Preconception (Observe)
**Objective:** Collect all available evidence without jumping to conclusions.
**Process:**
- Gather all accessible clues: user reports, system logs, dashboards, error stack traces, version changes (git diff), configuration parameters (configs/args/env)
- Focus exclusively on facts and observable phenomena
- Avoid premature hypotheses or assumptions about causes
- Document all observations systematically
**Key Principle:** Observe, don't just see. At this stage, the goal is comprehensive data collection, not interpretation.
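The collection itself can be scripted. As a minimal illustration (not one of this skill's bundled resources), a helper like the following could snapshot the git diff, recent history, and environment into a timestamped log, assuming the project lives in a git repository; names and paths are placeholders:
```python
# Hypothetical evidence-snapshot helper (illustrative only).
import os
import subprocess
from datetime import datetime
from pathlib import Path

def snapshot_evidence(log_dir: str = "debug_logs") -> Path:
    """Capture git diff, recent commits, and environment into one log file."""
    Path(log_dir).mkdir(exist_ok=True)
    log_path = Path(log_dir) / f"observations_{datetime.now():%Y%m%d_%H%M%S}.md"

    def run(cmd):
        # Capture output without raising so a missing tool doesn't stop collection.
        return subprocess.run(cmd, capture_output=True, text=True).stdout

    sections = {
        "Uncommitted changes (git diff)": run(["git", "diff"]),
        "Recent commits (git log)": run(["git", "log", "--oneline", "-10"]),
        # Redact secrets before sharing this section with others.
        "Environment variables": "\n".join(f"{k}={v}" for k, v in sorted(os.environ.items())),
    }
    with open(log_path, "w") as f:
        for title, body in sections.items():
            f.write(f"## {title}\n{body}\n\n")
    return log_path
```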
### Step 2: Classify and Isolate Facts
**Objective:** Distinguish symptoms from root causes and narrow the problem scope.
**Process:**
**For Incremental Development (Scenario A - Post-Modification Anomalies):**
- Confirm the previous step still works (ensure issue is from new changes)
- List ALL changes since last working state (git diff, code modifications, config changes)
- Identify implicit assumptions in these changes, such as:
- API calling conventions ("I assume this API works this way")
- Parameter types/order ("I assume this parameter accepts X")
- Configuration values ("I assume this env var is set")
- Data formats ("I assume the response is JSON")
- [And other fundamental assumptions embedded in the changes]
- **Apply Occam's Razor**: The simplest explanation is usually correct—prioritize basic assumption errors (typos, wrong parameters, incorrect API usage) over complex failure modes
- Verify fundamental assumptions with this priority:
1. Check how it was implemented in the last working version (proven to work)
2. Consult official documentation for correct usage (may be outdated)
3. Only then consider external issues (community-reported bugs, known issues)
**General Isolation:**
- Separate "what is broken" (symptoms) from "why it's broken" (causes)
- Systematically narrow down the problem domain by testing:
- Does it occur only in specific browsers?
- Does it happen on specific operating systems?
- Is it time-dependent?
- Is it triggered by specific parameter values or input data?
- Eliminate all modules/components that function correctly
- Isolate the suspicious area
**Key Principle:** Reduce the search space by eliminating what works correctly.
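When the last working state corresponds to a known commit, `git bisect` mechanizes much of this narrowing. A minimal sketch, assuming a failing test can serve as the oracle (the refs and test path below are placeholders):
```python
# Hypothetical isolation helper built on `git bisect` (refs/paths are placeholders).
import subprocess

def bisect_regression(good_ref: str, test_cmd: list[str]) -> None:
    """Binary-search history for the commit that introduced the failure."""
    def git(*args, check=True):
        return subprocess.run(["git", *args], check=check)

    git("bisect", "start")
    git("bisect", "bad", "HEAD")      # the current state is broken
    git("bisect", "good", good_ref)   # the last version known to work
    # git re-runs test_cmd at each step; exit code 0 = good, non-zero = bad.
    git("bisect", "run", *test_cmd, check=False)
    git("bisect", "reset")            # return to the original checkout

# Example:
# bisect_regression("v1.4.2", ["python", "-m", "pytest", "tests/test_feature.py", "-x"])
```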
### Step 3: Build Differential Diagnosis List
**Objective:** Enumerate all possible technical failure points.
**Process:**
- Create a comprehensive list of potential failure modes:
- Cache errors
- Database connection failures
- Third-party API outages
- Memory leaks
- Configuration anomalies
- Version compatibility issues
- Race conditions
- Resource exhaustion
- Include even rare or unlikely scenarios
- Draw on knowledge base and past experiences
- Consider both common and edge cases
- Consult `{baseDir}/references/common_patterns.md` for known bug patterns
**Key Principle:** Cast a wide net initially—don't prematurely exclude possibilities.
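A lightweight way to keep this list honest is to record it as data rather than prose. The sketch below mirrors the hypothesis fields used in `{baseDir}/references/investigation_template.md`; the example entries and likelihood values are illustrative:
```python
# Illustrative differential-diagnosis list; entries are examples only.
from dataclasses import dataclass

@dataclass
class Hypothesis:
    name: str
    category: str      # Infrastructure / Application / Configuration / Code
    likelihood: int    # 3 = High, 2 = Medium, 1 = Low
    reasoning: str
    status: str = "open"   # open / eliminated / confirmed

hypotheses = [
    Hypothesis("Stale cache after deploy", "Application", 3,
               "Symptoms disappear after a manual cache clear"),
    Hypothesis("Missing env var in production", "Configuration", 2,
               "Only the production environment is affected"),
    Hypothesis("Race condition under load", "Code", 1,
               "Failures are intermittent, but also occur with a single user"),
]

# Work through the list from most to least likely (see Step 4).
for h in sorted(hypotheses, key=lambda h: h.likelihood, reverse=True):
    print(f"[{h.category}] {h.name} (likelihood={h.likelihood}): {h.reasoning}")
```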
### Step 4: Apply Elimination and Deductive Reasoning (Deduce & Exclude)
**Objective:** Systematically eliminate impossible factors to find the truth.
**Process:**
- Follow Sherlock Holmes' principle: "When you have eliminated the impossible, whatever remains, however improbable, must be the truth"
- Design precise tests to validate or invalidate each hypothesis
- Use Chain-of-Thought reasoning to document the deductive process
- Make reasoning transparent and verifiable
- Progressively eliminate factors until a single root cause remains
**Key Principle:** Evidence-based elimination leads to certainty.
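One way to keep the deduction chain transparent is to pair each hypothesis with a concrete test and record the expected-versus-observed outcome. A minimal sketch; the test function is a placeholder for a real check:
```python
# Illustrative elimination loop; the check below is a placeholder.
def still_fails_with_cache_disabled() -> bool:
    ...  # placeholder: rerun the failing case with caching turned off
    return True

checks = {
    # hypothesis -> (test, result expected if the hypothesis were true)
    "Stale cache is the root cause": (still_fails_with_cache_disabled, False),
}

for hypothesis, (test, expected_if_true) in checks.items():
    observed = test()
    verdict = "still plausible" if observed == expected_if_true else "ELIMINATED"
    print(f"If '{hypothesis}', the check should return {expected_if_true}; "
          f"observed {observed} -> {verdict}")
```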
### Step 5: Experimental Verification and Investigation
**Objective:** Validate hypotheses through controlled experiments.
**Process:**
- Create restorable checkpoints before making changes
- Design and execute targeted experiments to test remaining hypotheses
- Research latest versions, known issues, and community discussions (GitHub issues, Stack Overflow)
- Conduct focused verification tests
- Use experimental evidence to prove each logical step
- Iterate until the exact cause is confirmed
**Key Principle:** Prove hypotheses with experiments, not assumptions.
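The restorable checkpoint can be as simple as a dedicated branch. A hypothetical helper (the branch naming scheme is illustrative):
```python
# Hypothetical checkpoint helper: park the current state on a branch so every
# experiment is reversible.
import subprocess
from datetime import datetime

def create_checkpoint() -> str:
    branch = f"debug/checkpoint-{datetime.now():%Y%m%d-%H%M%S}"
    subprocess.run(["git", "checkout", "-b", branch], check=True)
    subprocess.run(["git", "add", "-A"], check=True)
    subprocess.run(["git", "commit", "--allow-empty", "-m",
                    "Checkpoint before debugging experiments"], check=True)
    return branch
```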
### Step 6: Locate and Implement Fix
**Objective:** Apply the most elegant and least invasive solution.
**Process:**
- Pinpoint the exact code location or configuration causing the issue
- Design the fix with minimal side effects
- Prioritize elegant solutions over quick patches
- Consider long-term maintainability
- Implement the fix with precision
**Key Principle:** Seek elegant solutions, not temporary workarounds.
### Step 7: Prevention Mechanism (Prevent)
**Objective:** Ensure the same error doesn't recur and verify stability.
**Process:**
- Verify all related modules remain stable after the fix
- Run comprehensive regression tests
- Review the entire debugging process
- Generalize lessons learned
- Document findings in CLAUDE.md or project documentation
- Implement safeguards to prevent similar issues
**Key Principle:** Fix once, prevent forever.
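The most durable safeguard is usually a regression test that encodes the failure. A hedged sketch in pytest style; `parse_price` and its module are hypothetical stand-ins for the code that was fixed:
```python
# Hypothetical regression tests pinning the fixed behavior (names are illustrative).
import pytest

from myapp.pricing import parse_price  # placeholder module under test

def test_parse_price_accepts_trailing_whitespace():
    # Regression test: trailing whitespace used to raise ValueError before the fix.
    assert parse_price("19.99 \n") == 19.99

def test_parse_price_rejects_empty_input():
    with pytest.raises(ValueError):
        parse_price("")
```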
## Best Practices
**Maintain Scientific Rigor:**
- Bold hypotheses, careful verification
- Evidence before assertions
- Transparency in reasoning
**Documentation:**
- Track all observations, hypotheses, and test results
- Make the investigation reproducible
- Document not just the fix, but the reasoning process
- Use `{baseDir}/references/investigation_template.md` to structure investigation logs
- Use `{baseDir}/assets/debug_report_template.md` for creating post-mortem reports
**Communication:**
- Explain findings clearly to users
- Provide context for why the issue occurred
- Describe preventive measures implemented
## Resources
This skill includes bundled resources to support the debugging workflow:
### references/
Load these into context as needed during investigation:
- `{baseDir}/references/debugging_checklist.md` - Comprehensive checklist for each debugging step
- `{baseDir}/references/common_patterns.md` - Common bug patterns and their signatures
- `{baseDir}/references/investigation_template.md` - Template for documenting investigations
### assets/
Use these templates for documentation and reporting:
- `{baseDir}/assets/debug_report_template.md` - Template for summarizing debugging sessions and creating post-mortem reports

skills/debug/assets/debug_report_template.md Normal file

@@ -0,0 +1,134 @@
# Debug Report: [Issue Title]
**Date:** [YYYY-MM-DD]
**Investigator:** [Name/AI]
**Status:** 🟢 Resolved / 🔴 Unresolved / ⚠️ Workaround Applied
---
## Executive Summary
[2-3 sentence summary of the issue, root cause, and resolution]
---
## Issue Description
**Reported By:** [User/System]
**Initial Report:**
> [User's description or error message]
**Impact:**
- **Severity:** Critical / High / Medium / Low
- **Users Affected:** [Number or description]
- **Systems Affected:** [List of affected components]
---
## Root Cause
**TL;DR:** [One sentence explanation]
**Detailed Explanation:**
[Explain what caused the issue and why it manifested the way it did]
**Location:**
- File: `[path/to/file]:[line]`
- Component: [Component name]
- Introduced in: [Commit hash or version]
---
## Investigation Process
### Observations
- [Key observation 1]
- [Key observation 2]
- [Key observation 3]
### Hypotheses Considered
1. ❌ [Eliminated hypothesis] - Ruled out because [reason]
2. ❌ [Eliminated hypothesis] - Ruled out because [reason]
3. ✅ [Confirmed hypothesis] - Confirmed by [evidence]
### Key Evidence
- [Evidence 1 that led to root cause]
- [Evidence 2 that confirmed the diagnosis]
---
## Resolution
### Fix Applied
```diff
# File: [filename]
- [removed code]
+ [added code]
```
**Rationale:** [Why this fix was chosen over alternatives]
### Verification
- ✅ Original issue resolved
- ✅ No regression in related functionality
- ✅ Test suite passes
- ✅ Deployed to production
---
## Prevention Measures
**Immediate Actions:**
1. [Action 1 - e.g., Added validation]
2. [Action 2 - e.g., Added test coverage]
**Long-term Improvements:**
1. [Improvement 1 - e.g., Refactor error handling]
2. [Improvement 2 - e.g., Add monitoring]
**Tests Added:**
```
[Description or snippet of regression test]
```
---
## Timeline
| Time | Event |
|------|-------|
| [HH:MM] | Issue reported |
| [HH:MM] | Investigation started |
| [HH:MM] | Root cause identified |
| [HH:MM] | Fix implemented |
| [HH:MM] | Fix deployed |
| [HH:MM] | Issue resolved |
**Total Resolution Time:** [Duration]
---
## Lessons Learned
**What Went Well:**
- [Positive aspect 1]
- [Positive aspect 2]
**What Could Be Improved:**
- [Improvement area 1]
- [Improvement area 2]
**Key Takeaway:**
[Main lesson for future reference]
---
## Related Issues
- [Related issue #1]
- [Related issue #2]
---
**Report Generated:** [YYYY-MM-DD HH:MM]

skills/debug/references/common_patterns.md Normal file

@@ -0,0 +1,306 @@
# Common Bug Patterns and Signatures
This reference documents frequently encountered bug patterns, their signatures, and diagnostic approaches.
## Pattern Categories
### 1. Timing and Concurrency Issues
#### Race Conditions
**Signature:**
- Intermittent failures
- Works in development but fails in production
- Different results with same input
- Failures during high load
**Common Causes:**
- Shared mutable state without synchronization
- Incorrect thread-safety assumptions
- Async operations completing in unexpected order
**Investigation Approach:**
- Add extensive logging with timestamps
- Use debugger breakpoints sparingly (changes timing)
- Add delays to expose race windows
- Review all shared state access patterns
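For reference, a minimal reproduction of the lost-update form of this pattern, with the corresponding fix (the sleep only widens the race window so the effect is reliable to observe):
```python
# Lost-update race on shared mutable state, and the locked fix.
import threading
import time

balance = 100
lock = threading.Lock()

def withdraw_unsafe(amount):
    global balance
    current = balance            # read
    time.sleep(0.001)            # simulated work widens the race window
    balance = current - amount   # write based on a now-stale read

def withdraw_safe(amount):
    global balance
    with lock:                   # read-modify-write becomes one critical section
        current = balance
        time.sleep(0.001)
        balance = current - amount

def run(worker):
    global balance
    balance = 100
    threads = [threading.Thread(target=worker, args=(10,)) for _ in range(5)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return balance

print("unsafe:", run(withdraw_unsafe))  # typically 90: four withdrawals were lost
print("safe:  ", run(withdraw_safe))    # 50: all five withdrawals applied
```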
#### Deadlocks
**Signature:**
- Application hangs indefinitely
- No error messages
- High CPU or complete freeze
- Multiple threads waiting
**Common Causes:**
- Circular wait for locks
- Lock ordering violations
- Database transaction deadlocks
**Investigation Approach:**
- Check thread dumps / stack traces
- Review lock acquisition order
- Use database deadlock detection tools
- Add timeout mechanisms
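A compact illustration of the lock-ordering cause: two workers take the same pair of locks in opposite order. The timeout is only there so the demo terminates and reports the circular wait; the real fix is a single global acquisition order.
```python
# Circular wait from inconsistent lock ordering (timeouts keep the demo finite).
import threading
import time

lock_a, lock_b = threading.Lock(), threading.Lock()

def worker(first, second, name):
    with first:
        time.sleep(0.1)  # ensure both threads hold their first lock
        # Each thread now waits for the lock the other one holds.
        if not second.acquire(timeout=1):
            print(f"{name}: gave up waiting -> likely deadlock")
            return
        try:
            print(f"{name}: finished")
        finally:
            second.release()

# Deadlock-prone ordering: (a, b) in one thread, (b, a) in the other.
t1 = threading.Thread(target=worker, args=(lock_a, lock_b, "t1"))
t2 = threading.Thread(target=worker, args=(lock_b, lock_a, "t2"))
for t in (t1, t2):
    t.start()
for t in (t1, t2):
    t.join()

# Fix: always acquire in one fixed order (e.g. lock_a before lock_b) in every thread.
```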
### 2. Memory Issues
#### Memory Leaks
**Signature:**
- Gradually increasing memory usage
- Performance degradation over time
- Out of memory errors after extended runtime
- Works initially, fails after hours/days
**Common Causes:**
- Event listeners not cleaned up
- Cache without eviction policy
- Circular references preventing garbage collection
- Resource handles not closed
**Investigation Approach:**
- Profile memory over time
- Take heap dumps at intervals
- Compare object counts between snapshots
- Check for unclosed resources (files, connections, sockets)
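A common concrete instance is a cache with no eviction policy; heap-dump comparisons then show a single dictionary growing without bound. A sketch of the leak and a bounded alternative (`compute` stands in for the real expensive work):
```python
# Unbounded module-level cache (leak) versus a bounded LRU cache.
from functools import lru_cache

def compute(key):
    return key * 2            # stand-in for the real expensive work

_results = {}                 # grows by one entry per distinct key, never evicted

def lookup_leaky(key):
    if key not in _results:
        _results[key] = compute(key)
    return _results[key]

@lru_cache(maxsize=1024)      # bounded: least-recently-used entries are evicted
def lookup_bounded(key):
    return compute(key)
```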
#### Stack Overflow
**Signature:**
- Stack overflow error
- Deep recursion errors
- Crashes at predictable depth
**Common Causes:**
- Unbounded recursion
- Missing base case in recursive function
- Circular data structure traversal
**Investigation Approach:**
- Check recursion depth
- Verify base case conditions
- Look for circular references
- Consider iterative alternative
### 3. State Management Issues
#### Stale Cache
**Signature:**
- Outdated data displayed
- Inconsistency between systems
- Works after cache clear
- Different results on different servers
**Common Causes:**
- Cache invalidation not triggered
- TTL too long
- Distributed cache synchronization issues
**Investigation Approach:**
- Check cache invalidation logic
- Verify cache key generation
- Test with cache disabled
- Review cache update patterns
#### State Corruption
**Signature:**
- Invalid state transitions
- Data inconsistency
- Unexpected null values
- Objects in impossible states
**Common Causes:**
- Direct state mutation
- Missing validation
- Incorrect error handling leaving partial updates
- Concurrent modifications
**Investigation Approach:**
- Add state validation assertions
- Review state mutation points
- Check transaction boundaries
- Look for error handling gaps
### 4. Integration Issues
#### API Failures
**Signature:**
- Timeout errors
- 500/503 errors
- Network errors
- Rate limiting responses
**Common Causes:**
- Third-party API downtime
- Network connectivity issues
- Authentication token expiration
- Rate limits exceeded
**Investigation Approach:**
- Check API status pages
- Verify network connectivity
- Review authentication flow
- Check rate limit headers
- Test with API directly (curl/Postman)
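A defensive call pattern for these failure modes combines an explicit timeout, bounded retries with backoff, and special handling for rate-limit and server errors. A standard-library sketch; the URL and retry policy are illustrative:
```python
# Illustrative retry-with-backoff wrapper around a flaky HTTP call.
import time
import urllib.error
import urllib.request

def fetch_with_retries(url, attempts=3, timeout=5):
    for attempt in range(1, attempts + 1):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            # 429 and 5xx are usually transient; other codes are real client errors.
            if err.code not in (429, 500, 502, 503, 504) or attempt == attempts:
                raise
        except OSError:
            # URLError, refused connections, and timeouts all derive from OSError.
            if attempt == attempts:
                raise
        time.sleep(2 ** attempt)  # exponential backoff: 2s, 4s, 8s...
```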
#### Database Issues
**Signature:**
- Connection pool exhausted
- Slow query performance
- Lock wait timeouts
- Connection refused errors
**Common Causes:**
- Connection leaks (not closing connections)
- Missing indexes causing full table scans
- N+1 query problems
- Database server overload
**Investigation Approach:**
- Monitor connection pool metrics
- Review slow query logs
- Check execution plans
- Look for repeated queries in loops
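The N+1 problem in particular is easy to spot once written out: one query for the parent rows, then one extra query per row. A self-contained `sqlite3` sketch (schema and data are illustrative):
```python
# N+1 query pattern versus a single JOIN, using an in-memory SQLite database.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
    INSERT INTO users  VALUES (1, 'Ada'), (2, 'Lin');
    INSERT INTO orders VALUES (1, 1, 9.50), (2, 2, 12.00), (3, 1, 3.25);
""")

# N+1: one query for the orders, then one additional query per order.
for order_id, user_id, total in conn.execute("SELECT id, user_id, total FROM orders"):
    name = conn.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()[0]

# Fix: fetch everything in a single JOIN (1 query instead of N + 1).
rows = conn.execute("""
    SELECT o.id, u.name, o.total
    FROM orders AS o JOIN users AS u ON u.id = o.user_id
""").fetchall()
```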
### 5. Configuration Issues
#### Environment Mismatches
**Signature:**
- Works locally, fails in production
- Different behavior across environments
- "It works on my machine"
**Common Causes:**
- Different environment variables
- Different dependency versions
- Different configuration files
- Platform-specific code paths
**Investigation Approach:**
- Compare environment variables
- Check dependency versions (package-lock.json, poetry.lock, etc.)
- Review configuration for environment-specific values
- Check platform-specific code paths
#### Missing Dependencies
**Signature:**
- Module not found errors
- Import errors
- Class/function not defined
- Version incompatibility errors
**Common Causes:**
- Missing package in requirements
- Outdated dependency versions
- Peer dependency conflicts
- System library missing
**Investigation Approach:**
- Review dependency manifests
- Check installed versions vs required
- Look for dependency conflicts
- Verify system libraries installed
### 6. Logic Errors
#### Off-by-One Errors
**Signature:**
- Index out of bounds
- Missing first or last element
- Infinite loops
- Incorrect boundary handling
**Common Causes:**
- Using < instead of <=
- 0-indexed vs 1-indexed confusion
- Incorrect loop conditions
**Investigation Approach:**
- Check boundary conditions
- Test with edge cases (empty, single element)
- Review loop conditions carefully
- Add assertions for expected ranges
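A minimal illustration of the two most common shapes: an exclusive bound that skips the last element, and a neighbour comparison whose boundaries need care.
```python
# Off-by-one examples: a skipped last element, and boundary-safe neighbour checks.
items = [3, 1, 4, 1, 5]

# Bug: range(len(items) - 1) stops one index early, so items[-1] is never visited.
visited_buggy = [items[i] for i in range(len(items) - 1)]   # [3, 1, 4, 1]

# Fix: iterate the sequence directly (or use range(len(items))).
visited = list(items)                                       # [3, 1, 4, 1, 5]

# Neighbour comparison: start at index 1 so seq[i - 1] never wraps to seq[-1].
def is_sorted(seq):
    return all(seq[i - 1] <= seq[i] for i in range(1, len(seq)))

# Edge cases called out above: empty and single-element inputs.
assert is_sorted([]) and is_sorted([7]) and not is_sorted(items)
```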
#### Type Coercion Bugs
**Signature:**
- Unexpected type errors
- Comparison behaving unexpectedly
- String concatenation instead of addition
- Falsy value handling issues
**Common Causes:**
- Implicit type conversion
- Loose equality checks (== vs ===)
- Type assumptions without validation
- Mixed numeric types
**Investigation Approach:**
- Add explicit type checks
- Use strict equality
- Add type annotations/hints
- Check for implicit conversions
### 7. Error Handling Issues
#### Swallowed Exceptions
**Signature:**
- Silent failures
- No error messages despite failure
- Incomplete operations
- Success reported despite failure
**Common Causes:**
- Empty catch blocks
- Broad exception catching
- Returning default values on error
- Not re-raising exceptions
**Investigation Approach:**
- Search for empty catch/except blocks
- Review exception handling patterns
- Add logging to all error paths
- Check for bare except/catch clauses
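The pattern and its repair side by side: the first version reports success no matter what happened, the second logs the full stack trace and lets the failure propagate.
```python
# Swallowed exception versus visible, logged, re-raised failure.
import logging

logger = logging.getLogger(__name__)

def save_settings_silent(path, data):
    try:
        with open(path, "w") as f:
            f.write(data)
    except Exception:
        pass            # swallowed: the caller believes the save succeeded

def save_settings(path, data):
    try:
        with open(path, "w") as f:
            f.write(data)
    except OSError:
        logger.exception("Failed to save settings to %s", path)
        raise           # propagate so the failure is not silent
```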
#### Error Propagation Failures
**Signature:**
- Low-level errors exposed to users
- Unclear error messages
- Generic "Something went wrong"
- Stack traces in user interface
**Common Causes:**
- No error translation layer
- Missing error boundaries
- Not catching specific exceptions
- No user-friendly error messages
**Investigation Approach:**
- Review error handling architecture
- Check error message clarity
- Verify error boundaries exist
- Test error scenarios
## Pattern Recognition Strategies
### Look for These Red Flags:
1. **Time-based behavior**: If adding delays changes behavior, suspect timing issues
2. **Load-based failures**: If failures increase with load, suspect resource exhaustion or race conditions
3. **Environment-specific**: If only fails in certain environments, suspect configuration differences
4. **Gradual degradation**: If performance worsens over time, suspect memory leaks or resource leaks
5. **Intermittent failures**: If behavior is non-deterministic, suspect concurrency issues or external dependencies
### Diagnostic Quick Checks:
1. **Can you reproduce it consistently?** No → Likely timing/concurrency issue
2. **Does it fail immediately?** Yes → Likely configuration or initialization issue
3. **Does it fail after some time?** Yes → Likely resource leak or state corruption
4. **Does it fail with specific input?** Yes → Likely validation or edge case handling issue
5. **Does it fail only in production?** Yes → Likely environment or load-related issue
## Using This Reference
When encountering a bug:
1. Match the signature to patterns above
2. Review common causes for that pattern
3. Follow the investigation approach
4. Apply lessons from similar past issues
5. Update this document if you discover new patterns

skills/debug/references/debugging_checklist.md Normal file

@@ -0,0 +1,176 @@
# Debugging Checklist
This checklist provides detailed action items for each step of the debugging workflow.
## Step 1: Observe Without Preconception ✓
**Evidence Collection:**
- [ ] Review user's bug report or issue description
- [ ] Examine error messages and stack traces
- [ ] Check application logs (stderr, stdout, application-specific logs)
- [ ] Review monitoring dashboards (if available)
- [ ] Inspect recent code changes (`git diff`, `git log`)
- [ ] Document current environment (OS, versions, dependencies)
- [ ] Capture configuration files (config files, environment variables, CLI arguments)
- [ ] Screenshot or record the error if visual
- [ ] Note exact steps to reproduce
**Documentation:**
- [ ] Create investigation log file
- [ ] Record timestamp and initial observations
- [ ] List all data sources consulted
## Step 2: Classify and Isolate Facts ✓
**Symptom Analysis:**
- [ ] List all observable symptoms
- [ ] Distinguish symptoms from potential causes
- [ ] Identify what changed recently (code, config, dependencies, infrastructure)
**Scope Narrowing:**
- [ ] Test across different environments (dev, staging, production)
- [ ] Test across different platforms (Windows, Linux, macOS)
- [ ] Test across different browsers (if web application)
- [ ] Test with different input data
- [ ] Test with different configurations
- [ ] Identify minimal reproduction case
- [ ] Test with previous working version (regression testing)
**Component Isolation:**
- [ ] List all involved components/modules
- [ ] Mark components known to work correctly
- [ ] Highlight suspicious components
- [ ] Draw dependency diagram if complex
## Step 3: Build Differential Diagnosis List ✓
**Infrastructure Issues:**
- [ ] Network connectivity problems
- [ ] DNS resolution failures
- [ ] Load balancer misconfiguration
- [ ] Firewall/security group blocking
- [ ] Resource exhaustion (CPU, memory, disk)
**Application Issues:**
- [ ] Cache staleness or corruption
- [ ] Database connection pool exhaustion
- [ ] Database deadlocks or slow queries
- [ ] Third-party API failures or timeouts
- [ ] Memory leaks
- [ ] Race conditions or threading issues
- [ ] Incorrect error handling
- [ ] Invalid input validation
**Configuration Issues:**
- [ ] Environment variable mismatch
- [ ] Configuration file errors
- [ ] Version incompatibility
- [ ] Missing dependencies
- [ ] Permission problems
**Code Issues:**
- [ ] Logic errors in recent changes
- [ ] Null pointer/undefined errors
- [ ] Type mismatches
- [ ] Off-by-one errors
- [ ] Incorrect assumptions
## Step 4: Apply Elimination and Deductive Reasoning ✓
**Hypothesis Testing:**
- [ ] Rank hypotheses by likelihood
- [ ] Design test for most likely hypothesis
- [ ] Execute test and document result
- [ ] If hypothesis invalidated, mark as eliminated
- [ ] If hypothesis confirmed, design further verification
- [ ] Move to next hypothesis if needed
**Reasoning Documentation:**
- [ ] Document "If X, then Y" statements
- [ ] Record why each hypothesis was eliminated
- [ ] Note which tests ruled out which possibilities
- [ ] Maintain chain of reasoning for review
**Narrowing Down:**
- [ ] Eliminate external factors first (network, APIs)
- [ ] Then infrastructure (resources, configuration)
- [ ] Then application-level issues (cache, database)
- [ ] Finally code-level issues (logic, types)
## Step 5: Experimental Verification ✓
**Preparation:**
- [ ] Create git branch for experiments
- [ ] Backup current state (checkpoint)
- [ ] Document experiment plan
**Experimentation:**
- [ ] Add logging/instrumentation to suspected area
- [ ] Add debug breakpoints if using debugger
- [ ] Create controlled test case
- [ ] Run experiment and capture output
- [ ] Compare actual vs expected behavior
**Research:**
- [ ] Search GitHub issues for similar problems
- [ ] Check Stack Overflow for related questions
- [ ] Review official documentation for edge cases
- [ ] Check release notes for known issues
- [ ] Consult language/framework changelog
**Validation:**
- [ ] Can the issue be reproduced consistently?
- [ ] Does the evidence match the hypothesis?
- [ ] Are there alternative explanations?
## Step 6: Locate and Implement Fix ✓
**Root Cause Confirmation:**
- [ ] Identify exact file and line number
- [ ] Understand why the code fails
- [ ] Confirm this is root cause, not symptom
**Solution Design:**
- [ ] Consider multiple fix approaches
- [ ] Evaluate side effects of each approach
- [ ] Choose most elegant and maintainable solution
- [ ] Ensure fix doesn't introduce new issues
**Implementation:**
- [ ] Implement the fix
- [ ] Add comments explaining the fix
- [ ] Update related documentation
- [ ] Add test case to prevent regression
**Verification:**
- [ ] Test the fix resolves original issue
- [ ] Run existing test suite
- [ ] Test edge cases
- [ ] Verify no new issues introduced
## Step 7: Prevention Mechanism ✓
**Stability Verification:**
- [ ] Run full test suite
- [ ] Perform integration testing
- [ ] Test in staging environment
- [ ] Monitor for unexpected behavior
**Documentation:**
- [ ] Update CLAUDE.md or project docs
- [ ] Document root cause
- [ ] Document fix and reasoning
- [ ] Add to knowledge base
**Prevention Measures:**
- [ ] Add automated test for this scenario
- [ ] Add validation/assertions to prevent recurrence
- [ ] Update error messages for clarity
- [ ] Add monitoring/alerting if applicable
- [ ] Share learnings with team
**Post-Mortem:**
- [ ] Review what went well
- [ ] Identify what could improve
- [ ] Update debugging procedures if needed
- [ ] Celebrate the fix! 🎉

skills/debug/references/investigation_template.md Normal file

@@ -0,0 +1,292 @@
# Bug Investigation Log Template
Use this template to document debugging sessions systematically. Copy and adapt as needed.
---
## Investigation Metadata
**Issue ID/Reference:** [e.g., #123, TICKET-456]
**Date Started:** [YYYY-MM-DD HH:MM]
**Investigator:** [Name or AI assistant]
**Priority:** [Critical / High / Medium / Low]
**Status:** [🔴 Investigating / 🟡 In Progress / 🟢 Resolved]
---
## Step 1: Initial Observations
**User Report:**
```
[Paste user's bug report or description here]
```
**Reproduction Steps:**
1. [Step 1]
2. [Step 2]
3. [Step 3]
**Expected Behavior:**
[What should happen]
**Actual Behavior:**
[What actually happens]
**Environment:**
- OS: [e.g., Ubuntu 22.04, Windows 11, macOS 14]
- Application Version: [e.g., v2.3.1]
- Runtime: [e.g., Node.js 18.16, Python 3.11]
- Browser: [if applicable]
**Evidence Collected:**
*Error Messages:*
```
[Paste error messages, stack traces]
```
*Logs:*
```
[Relevant log entries]
```
*Configuration:*
```
[Relevant config values, environment variables]
```
*Recent Changes:*
- [Commit hash / PR / change description]
- [git diff summary if relevant]
---
## Step 2: Fact Classification
**Confirmed Symptoms (Observable Facts):**
1. [Symptom 1]
2. [Symptom 2]
3. [Symptom 3]
**Scope Analysis:**
| Test | Result | Notes |
|------|--------|-------|
| Different environments (dev/staging/prod) | ✓/✗ | |
| Different platforms (Win/Mac/Linux) | ✓/✗ | |
| Different browsers | ✓/✗ | |
| Different input data | ✓/✗ | |
| Previous version | ✓/✗ | |
**Isolated Components:**
- ✅ Working correctly: [Component A, Component B]
- ❌ Suspected issues: [Component C, Component D]
- ❓ Uncertain: [Component E]
**What Changed Recently:**
- [Change 1 - date, description]
- [Change 2 - date, description]
---
## Step 3: Differential Diagnosis List
**Hypotheses (Ranked by Likelihood):**
### Hypothesis 1: [Name of hypothesis]
**Likelihood:** High / Medium / Low
**Category:** [Infrastructure / Application / Configuration / Code]
**Reasoning:** [Why this is suspected]
### Hypothesis 2: [Name of hypothesis]
**Likelihood:** High / Medium / Low
**Category:** [Infrastructure / Application / Configuration / Code]
**Reasoning:** [Why this is suspected]
### Hypothesis 3: [Name of hypothesis]
**Likelihood:** High / Medium / Low
**Category:** [Infrastructure / Application / Configuration / Code]
**Reasoning:** [Why this is suspected]
[Add more as needed]
---
## Step 4: Elimination and Deductive Reasoning
### Test 1: [Hypothesis being tested]
**Test Design:** [How to test this hypothesis]
**Expected Result if Hypothesis True:** [What you expect to see]
**Actual Result:** [What you observed]
**Conclusion:** ✅ Confirmed / ❌ Eliminated / ⚠️ Inconclusive
**Reasoning:**
```
If [condition], then [expected behavior]
We observed [actual behavior]
Therefore [conclusion]
```
### Test 2: [Hypothesis being tested]
**Test Design:** [How to test this hypothesis]
**Expected Result if Hypothesis True:** [What you expect to see]
**Actual Result:** [What you observed]
**Conclusion:** ✅ Confirmed / ❌ Eliminated / ⚠️ Inconclusive
**Reasoning:**
```
[Chain of reasoning]
```
[Continue for each test]
**Hypotheses Remaining:** [List hypotheses not yet eliminated]
---
## Step 5: Experimental Verification
**Checkpoint Created:** [git branch name, commit hash, or backup location]
### Experiment 1: [Description]
**Goal:** [What this experiment aims to prove/disprove]
**Method:**
```bash
# Commands or code used
```
**Results:**
```
[Output or findings]
```
**Conclusion:** [What this proves]
### Experiment 2: [Description]
**Goal:** [What this experiment aims to prove/disprove]
**Method:**
```bash
# Commands or code used
```
**Results:**
```
[Output or findings]
```
**Conclusion:** [What this proves]
**Research Conducted:**
- [ ] GitHub issues searched: [keywords used]
- [ ] Stack Overflow checked: [relevant Q&As]
- [ ] Documentation reviewed: [sections consulted]
- [ ] Release notes: [findings]
**Findings:**
[Summary of research findings]
**Root Cause Identified:** ✅ Yes / ❌ No
---
## Step 6: Root Cause and Fix
**Root Cause:**
[Precise description of what's causing the issue]
**Location:**
- File: [path/to/file.ext]
- Line(s): [line number(s)]
- Function/Method: [name]
**Why This Causes the Issue:**
[Explanation of the causal mechanism]
**Fix Approaches Considered:**
| Approach | Pros | Cons | Selected |
|----------|------|------|----------|
| [Approach 1] | [pros] | [cons] | ✅/❌ |
| [Approach 2] | [pros] | [cons] | ✅/❌ |
| [Approach 3] | [pros] | [cons] | ✅/❌ |
**Selected Fix:**
```diff
[Show code diff or configuration change]
```
**Rationale:** [Why this fix was chosen]
**Implementation Notes:**
[Any important details about the fix]
**Verification:**
- [ ] Original issue resolved
- [ ] No new issues introduced
- [ ] Test suite passes
- [ ] Edge cases tested
---
## Step 7: Prevention and Documentation
**Regression Test Added:**
```
[Test code or test case description]
```
**Documentation Updated:**
- [ ] CLAUDE.md updated
- [ ] Code comments added
- [ ] API documentation updated
- [ ] README updated (if needed)
**Prevention Measures Implemented:**
1. [Measure 1 - e.g., added validation]
2. [Measure 2 - e.g., improved error handling]
3. [Measure 3 - e.g., added monitoring]
**Lessons Learned:**
1. [Lesson 1]
2. [Lesson 2]
3. [Lesson 3]
**Knowledge Base Update:**
- Pattern: [If this represents a new pattern to document]
- Category: [What category of bug this was]
- Key Insight: [Main takeaway for future debugging]
---
## Timeline Summary
| Time | Activity | Result |
|------|----------|--------|
| [HH:MM] | Investigation started | |
| [HH:MM] | Initial observations completed | |
| [HH:MM] | Hypothesis list created | |
| [HH:MM] | Testing began | |
| [HH:MM] | Root cause identified | |
| [HH:MM] | Fix implemented | |
| [HH:MM] | Verification completed | |
| [HH:MM] | Issue resolved | |
**Total Time:** [Duration]
---
## Status Update for Stakeholders
**Summary for Non-Technical Audience:**
[1-2 sentence explanation of what went wrong and how it was fixed]
**Impact:**
- Users affected: [number or description]
- Duration: [how long the issue existed]
- Severity: [impact level]
**Resolution:**
[Brief description of the fix]
**Follow-up Actions:**
- [ ] [Action 1]
- [ ] [Action 2]
---
**Investigation Completed:** [YYYY-MM-DD HH:MM]
**Final Status:** 🟢 Resolved / 🔴 Unresolved / ⚠️ Workaround Applied

skills/debug/repackage.py Normal file

@@ -0,0 +1,33 @@
#!/usr/bin/env python3
"""
Repackage this skill into a distributable zip file.
Usage:
    cd debug
    python repackage.py
Output: ../debug.zip
"""
import zipfile
from pathlib import Path
# Paths relative to this script
script_dir = Path(__file__).parent
skill_name = script_dir.name
zip_path = script_dir.parent / f'{skill_name}.zip'
# Remove old zip if exists
if zip_path.exists():
    zip_path.unlink()
    print(f"Removed old: {zip_path.name}")
print(f"Packaging skill: {skill_name}\n")
with zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED) as zf:
    for file_path in script_dir.rglob('*'):
        if file_path.is_file() and file_path.name != 'repackage.py':  # Don't include this script
            arcname = file_path.relative_to(script_dir.parent)
            zf.write(file_path, arcname)
            print(f"  Added: {arcname}")
print(f"\n✅ Successfully packaged to: {zip_path.absolute()}")