Initial commit

2025-11-29 18:50:24 +08:00
commit f172746dc6
52 changed files with 17406 additions and 0 deletions
--- a/skills/debugging-issues/SKILL.md
+++ b/skills/debugging-issues/SKILL.md
@@ -0,0 +1,500 @@
+---
+name: Debugging Issues
+description: Systematically debug issues with reproduction steps, error analysis, hypothesis testing, and root cause fixes. Use when investigating bugs, analyzing production incidents, or troubleshooting unexpected behavior.
+---
+
+# Debugging Issues
+
+## Purpose
+Provides systematic approaches to debugging, troubleshooting techniques, and error analysis strategies.
+
+## When to Use
+- Investigating bugs or unexpected behavior
+- Analyzing error messages and stack traces
+- Troubleshooting system issues
+- Performance debugging
+- Root cause analysis
+- Production incident response
+
+## Systematic Debugging Process
+
+### 1. Reproduce the Issue
+**Goal**: Create a consistent way to trigger the bug
+
+**Steps:**
+- [ ] Document exact steps to reproduce
+- [ ] Identify required preconditions
+- [ ] Note the environment (OS, browser, versions)
+- [ ] Create minimal reproduction case
+- [ ] Verify it reproduces consistently
+
+**Example:**
+```yaml
+reproduction:
+  steps:
+    - "Log in as admin user"
+    - "Navigate to /dashboard"
+    - "Click 'Export Data' button"
+  expected: "CSV file downloads"
+  actual: "HTTP 500 error appears"
+  frequency: "Occurs every time"
+```
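+
+Rather than estimating `frequency`, you can measure it by triggering the bug repeatedly. A minimal sketch, assuming the bug can be triggered from a Python callable; `reproduction_rate` and `export_data` are hypothetical names, and `export_data` only simulates the failing export:
+
+```python
+from typing import Callable
+
+
+def reproduction_rate(trigger: Callable[[], None], attempts: int = 20) -> float:
+    """Call `trigger` repeatedly; return the fraction of attempts that raised."""
+    failures = 0
+    for _ in range(attempts):
+        try:
+            trigger()
+        except Exception:
+            failures += 1
+    return failures / attempts
+
+
+def export_data() -> None:
+    # Hypothetical stand-in for clicking 'Export Data': always fails here
+    raise RuntimeError("HTTP 500")
+
+
+print(f"Reproduced in {reproduction_rate(export_data):.0%} of attempts")
+```
+
+A rate well below 100% suggests a timing or data dependency that is worth recording in the report.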
+
+### 2. Isolate the Problem
+**Goal**: Narrow down where the issue occurs
+
+**Techniques:**
+```yaml
+isolation_methods:
+  Divide and Conquer:
+    description: "Split system in half, test which half has issue"
+    example: "Comment out half the code, see if error persists"
+
+  Binary Search:
+    description: "Use git bisect or similar to find breaking commit"
+    command: "git bisect start && git bisect bad && git bisect good v1.0"
+
+  Component Isolation:
+    description: "Test each component individually"
+    example: "Test database, API, frontend separately"
+
+  Environment Comparison:
+    description: "Compare working vs broken environments"
+    checklist:
+      - Different OS?
+      - Different versions?
+      - Different configurations?
+      - Different data?
+```
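+
+Binary search works on anything ordered, not only commits: a list of versions, input sizes, or config snapshots. The sketch below uses a hypothetical `first_bad` helper and makes the same assumption `git bisect` does: the first candidate is good, the last is bad, and once the bug appears it stays.
+
+```python
+from typing import Callable, Sequence, TypeVar
+
+T = TypeVar("T")
+
+
+def first_bad(candidates: Sequence[T], is_bad: Callable[[T], bool]) -> T:
+    """Return the first candidate for which is_bad() is True.
+
+    Assumes candidates[0] is good and candidates[-1] is bad.
+    """
+    lo, hi = 0, len(candidates) - 1  # invariant: lo is good, hi is bad
+    while hi - lo > 1:
+        mid = (lo + hi) // 2
+        if is_bad(candidates[mid]):
+            hi = mid  # bug already present: look earlier
+        else:
+            lo = mid  # still good: look later
+    return candidates[hi]
+
+
+# Hypothetical example: versions 1..10, where the bug appeared in version 7
+versions = list(range(1, 11))
+print(first_bad(versions, lambda v: v >= 7))  # 7
+```
+
+Each probe halves the remaining range, so ten versions need at most four checks.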
+
+### 3. Analyze Logs and Errors
+**Goal**: Gather evidence about what's going wrong
+
+**Log Analysis:**
+```yaml
+log_analysis:
+  error_messages:
+    - Read the full error message
+    - Note the error type/code
+    - Identify the failing component
+
+  stack_traces:
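+    - Start at the frame where the exception was raised (last in Python tracebacks, first in Java stack traces)
+    - Identify the first frame in your own code rather than in a library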
+    - Check function arguments at that point
+
+  correlation:
+    - Check logs before the error
+    - Look for patterns
+    - Correlate with user actions
+    - Check timestamps
+```
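+
+Correlating by timestamp can be partly automated. A minimal sketch, assuming a log format where each line starts with an ISO-8601 timestamp and a level (e.g. `2025-01-01T12:05:00 ERROR ...`); `context_before_errors` is a hypothetical helper that returns, for each ERROR line, the entries logged shortly before it:
+
+```python
+from datetime import datetime, timedelta
+from typing import Iterable, Iterator, List, Tuple
+
+
+def context_before_errors(
+    lines: Iterable[str], window_seconds: int = 30
+) -> Iterator[Tuple[str, List[str]]]:
+    """Yield (error_line, lines logged within `window_seconds` before it)."""
+    parsed = []
+    for line in lines:
+        # Assumed format: "<timestamp> <LEVEL> <message>"
+        stamp, level, _message = line.split(" ", 2)
+        parsed.append((datetime.fromisoformat(stamp), level, line))
+
+    window = timedelta(seconds=window_seconds)
+    for i, (error_time, level, line) in enumerate(parsed):
+        if level == "ERROR":
+            earlier = [prev for t, _, prev in parsed[:i] if error_time - t <= window]
+            yield line, earlier
+
+
+log = [
+    "2025-01-01T12:00:00 INFO User admin logged in",
+    "2025-01-01T12:04:50 WARN Export query took 9.8s",
+    "2025-01-01T12:05:00 ERROR Export failed with HTTP 500",
+]
+for error, context in context_before_errors(log):
+    print(error)
+    for prev in context:
+        print("    before:", prev)
+```
+
+Here only the slow-query warning falls inside the 30-second window, which makes it the first lead to follow.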
+
+**Common Error Patterns:**
+```python
+# NullPointerException / AttributeError
+# Usually: Accessing property of None/null object
+# Fix: Add null checks or ensure object is initialized
+
+# IndexError / ArrayIndexOutOfBoundsException
+# Usually: Accessing array index that doesn't exist
+# Fix: Check array length before accessing
+
+# KeyError / Property not found
+# Usually: Accessing dict/object key that doesn't exist
+# Fix: Use .get() with default or check if key exists
+
+# TypeError / Type mismatch
+# Usually: Wrong type passed to function
+# Fix: Validate inputs at boundaries; add type hints and run a type checker (e.g., mypy)
+
+# ConnectionError / Timeout
+# Usually: Network issues or service down
+# Fix: Add timeouts and retries with backoff; check service health
+```
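+
+The fixes listed above look like this in Python. A minimal sketch with made-up data; `call_with_retries` is a hypothetical helper, and it only catches the built-in `ConnectionError` and `TimeoutError` (HTTP client libraries often raise their own exception types):
+
+```python
+import time
+from typing import Callable, Optional, TypeVar
+
+T = TypeVar("T")
+
+config = {"timeout": 30}
+items = [10, 20, 30]
+user_email: Optional[str] = None  # e.g. a field that was never filled in
+
+# KeyError: fall back to a default instead of indexing a missing key
+retries = config.get("retries", 3)
+
+# IndexError: check the length before indexing
+fourth = items[3] if len(items) > 3 else None
+
+# AttributeError on None: guard before calling methods on the value
+domain = user_email.split("@")[-1] if user_email is not None else "unknown"
+
+print(retries, fourth, domain)  # 3 None unknown
+
+
+def call_with_retries(fn: Callable[[], T], attempts: int = 3, base_delay: float = 0.5) -> T:
+    """ConnectionError/Timeout: retry with exponential backoff, then give up."""
+    for attempt in range(attempts):
+        try:
+            return fn()
+        except (ConnectionError, TimeoutError):
+            if attempt == attempts - 1:
+                raise  # out of attempts: surface the original error
+            time.sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
+    raise ValueError("attempts must be at least 1")
+```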
+
+### 4. Form Hypothesis
+**Goal**: Develop theory about what's causing the issue
+
+**Hypothesis Framework:**
+```yaml
+hypothesis_template:
+  observation: "What did you observe?"
+  theory: "What do you think is causing it?"
+  prediction: "If theory is correct, what else would be true?"
+  test: "How can you test this?"
+
+example:
+  observation: "API returns 500 error on POST /users"
+  theory: "Input validation is rejecting valid email format"
+  prediction: "If true, different email format should work"
+  test: "Try with various email formats"
+```
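+
+Testing the prediction means trying several inputs and comparing outcomes. A minimal sketch, assuming a hypothetical `create_user` whose regex rejects plus-addressed emails; because a failed `fullmatch` returns `None`, calling `.group()` on it raises `AttributeError`, which is one way a rejected but valid email can surface as a 500 rather than a 400:
+
+```python
+import re
+from typing import Callable, Dict, Iterable
+
+# Hypothetical handler under suspicion: the pattern is too strict, and a failed
+# fullmatch() returns None, so .group() raises AttributeError (-> HTTP 500).
+EMAIL_RE = re.compile(r"[\w.]+@[\w.]+\.\w+")
+
+
+def create_user(email: str) -> str:
+    return EMAIL_RE.fullmatch(email).group(0)
+
+
+def sweep(func: Callable[[str], object], inputs: Iterable[str]) -> Dict[str, str]:
+    """Map each input to 'ok' or to the name of the exception it raised."""
+    results: Dict[str, str] = {}
+    for value in inputs:
+        try:
+            func(value)
+            results[value] = "ok"
+        except Exception as exc:
+            results[value] = type(exc).__name__
+    return results
+
+
+emails = ["alice@example.com", "alice+tag@example.com", "alice@sub.example.co.uk"]
+for email, outcome in sweep(create_user, emails).items():
+    print(f"{email:26} {outcome}")
+```
+
+Only the plus-addressed email fails, which supports the theory; if every format failed, the cause would more likely lie elsewhere.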
+
+### 5. Test the Hypothesis
+**Goal**: Verify or disprove your theory
+
+**Testing Approaches:**
+```yaml
+testing_methods:
+  Add Logging:
+    description: "Add detailed logs around suspected area"
+    example: |
+      logger.debug("Input data: %r", data)
+      logger.debug("Validation result: %r", is_valid)
+
+  Add Breakpoints:
+    description: "Pause execution to inspect state"
+    tools:
+      - "pdb for Python"
+      - "debugger statement for JavaScript"
+      - "gdb for C/C++"
+
+  Change One Thing:
+    description: "Modify one variable at a time"
+    example: "Change input value, run again, observe result"
+
+  Write Failing Test:
+    description: "Create test that reproduces the bug"
+    benefit: "Ensures fix works and prevents regression"
+```
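+
+A failing test pins the bug down before you touch the code. A minimal sketch using `unittest`, assuming a hypothetical `validate_email` that has already been fixed; written against the buggy version, these tests would fail first and then pass once the fix lands. Run it with `python -m unittest` or pytest.
+
+```python
+import re
+import unittest
+from typing import Optional
+
+EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
+
+
+def validate_email(email: Optional[str]) -> bool:
+    """Hypothetical fixed validator: None and malformed input return False."""
+    if email is None:
+        return False
+    return EMAIL_RE.fullmatch(email) is not None
+
+
+class ValidateEmailRegressionTest(unittest.TestCase):
+    # Each test reproduces one reported symptom of the original bug
+    def test_plus_addressing_is_accepted(self):
+        self.assertTrue(validate_email("alice+tag@example.com"))
+
+    def test_none_is_rejected_without_crashing(self):
+        self.assertFalse(validate_email(None))
+
+    def test_missing_at_sign_is_rejected(self):
+        self.assertFalse(validate_email("alice.example.com"))
+```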
+
+### 6. Implement Fix
+**Goal**: Resolve the root cause
+
+**Fix Strategies:**
+```yaml
+fix_approaches:
+  Quick Fix:
+    when: "Production is down"
+    approach: "Minimal change to restore service"
+    followup: "Proper fix later"
+
+  Root Cause Fix:
+    when: "Have time to do it right"
+    approach: "Fix underlying cause"
+    benefit: "Prevents similar bugs"
+
+  Workaround:
+    when: "Fix is complex and a temporary solution is needed"
+    approach: "Add special handling"
+    document: "Explain why workaround exists"
+```
+
+### 7. Verify the Fix
+**Goal**: Ensure the issue is resolved
+
+**Verification Checklist:**
+- [ ] Original bug is fixed
+- [ ] No new bugs introduced
+- [ ] All tests pass
+- [ ] Edge cases handled
+- [ ] Code reviewed
+- [ ] Deployed to test environment
+- [ ] Tested in production-like environment
+
+## Debugging Techniques
+
+### Print Debugging
+```python
+# Simple but effective
+def calculate_total(items):
+    print(f"DEBUG: items = {items}")
+    total = sum(item.price for item in items)
+    print(f"DEBUG: total = {total}")
+    return total
+```
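+
+Print statements have to be deleted later; the standard `logging` module lets you leave debug output in place and switch it off by raising the level. A minimal sketch of the same function, assuming a hypothetical `Item` type with a `price` field:
+
+```python
+import logging
+from dataclasses import dataclass
+
+
+@dataclass
+class Item:  # hypothetical item type
+    price: float
+
+
+logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(funcName)s: %(message)s")
+logger = logging.getLogger(__name__)
+
+
+def calculate_total(items):
+    logger.debug("items = %r", items)
+    total = sum(item.price for item in items)
+    logger.debug("total = %r", total)
+    return total
+
+
+calculate_total([Item(1.5), Item(2.5)])
+```
+
+Setting `level=logging.INFO` silences these messages without touching the function.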
+
+### Interactive Debugging
+```python
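+# Python pdb: pauses here and opens an interactive prompt
+import pdb; pdb.set_trace()
+# Python 3.7+: the built-in breakpoint() does the same
+# (set PYTHONBREAKPOINT=0 to disable all breakpoint() calls)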
+
+# Common commands:
+# n (next) - Execute next line
+# s (step) - Step into function
+# c (continue) - Continue execution
+# p variable - Print variable
+# l (list) - Show code context
+# q (quit) - Exit debugger
+```
+
+### Rubber Duck Debugging
+```yaml
+rubber_duck_method:
+  step_1: "Get a rubber duck (or patient colleague)"
+  step_2: "Explain your code line by line"
+  step_3: "Explain what you expect to happen"
+  step_4: "Explain what actually happens"
+  step_5: "Often you'll realize the issue while explaining"
+```
+
+### Binary Search Debugging
+```bash
+# Find which commit introduced a bug
+git bisect start
+git bisect bad  # Current commit is bad
+git bisect good v1.0  # v1.0 was working
+
+# Git will checkout commits for you to test
+# After each test, mark as good or bad:
+git bisect good  # if works
+git bisect bad   # if broken
+
+# Git reports the first bad commit; then return to where you started:
+git bisect reset
+```
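+
+`git bisect run <cmd>` automates the loop above: it runs the command on each candidate commit and reads its exit code (0 = good, 125 = skip, other values from 1 to 127 = bad). A minimal sketch of that translation step in Python; `bisect_exit_code` is a hypothetical helper, and the check you pass it should signal the bug with an `assert`:
+
+```python
+import sys
+from typing import Callable
+
+GOOD, BAD, SKIP = 0, 1, 125  # exit codes understood by `git bisect run`
+
+
+def bisect_exit_code(check: Callable[[], None]) -> int:
+    """Run one bug check and translate its outcome for `git bisect run`.
+
+    check() returns normally      -> GOOD: this commit does not have the bug
+    check() raises AssertionError -> BAD: the bug is present
+    any other exception           -> SKIP: this commit cannot be tested
+    """
+    try:
+        check()
+    except AssertionError:
+        return BAD
+    except Exception as exc:  # e.g. ImportError on a commit that doesn't build
+        print(f"skipping commit, check could not run: {exc!r}", file=sys.stderr)
+        return SKIP
+    return GOOD
+```
+
+In a real check script, end with `sys.exit(bisect_exit_code(your_check))` and run `git bisect run python3 check_script.py`; `your_check` and `check_script.py` are placeholders.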
+
+### Adding Instrumentation
+```python
+# Add metrics to understand behavior
+import time
+from functools import wraps
+
+def timing_decorator(func):
+    @wraps(func)
+    def wrapper(*args, **kwargs):
+        start = time.perf_counter()  # monotonic clock, suited to measuring durations
+        result = func(*args, **kwargs)
+        duration = time.perf_counter() - start
+        print(f"{func.__name__} took {duration:.4f}s")
+        return result
+    return wrapper
+
+@timing_decorator
+def slow_function():
+    # Your code here
+    pass
+```
+
+## Common Debugging Scenarios
+
+### Performance Issues
+```yaml
+performance_debugging:
+  profile_the_code:
+    python: "python -m cProfile -s cumulative script.py"
+    node: "node --prof script.js, then node --prof-process on the generated isolate-*.log"
+
+  identify_bottlenecks:
+    - Look for functions called many times
+    - Check for slow database queries
+    - Look for excessive memory allocations
+
+  optimize:
+    - Cache repeated calculations
+    - Use more efficient algorithms
+    - Add database indexes
+    - Implement pagination
+```
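+
+The same profile can be collected from inside a program, which helps when only one code path is slow. A minimal sketch using the standard `cProfile` and `pstats` modules; `slow_sum` is a made-up, deliberately wasteful function:
+
+```python
+import cProfile
+import io
+import pstats
+
+
+def slow_sum(n: int) -> int:
+    # Deliberately wasteful: re-sums a growing range on every iteration
+    return sum(sum(range(i)) for i in range(n))
+
+
+profiler = cProfile.Profile()
+profiler.enable()
+slow_sum(2000)
+profiler.disable()
+
+stream = io.StringIO()
+stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
+stats.print_stats(5)  # top 5 entries by cumulative time
+print(stream.getvalue())
+```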
+
+### Memory Leaks
+```yaml
+memory_leak_debugging:
+  detect:
+    - Monitor memory usage over time
+    - Look for steadily increasing memory
+    - Check for unclosed resources
+
+  common_causes:
+    - Unclosed file handles
+    - Unclosed database connections
+    - Event listeners not removed
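+    - Reference cycles the runtime cannot collect (e.g., strong cycles under reference counting)
+    - Objects kept reachable unintentionally (unbounded caches, global collections)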
+
+  fix:
+    - Use context managers (with statement)
+    - Explicitly close connections
+    - Remove event listeners
+    - Break reference cycles (e.g., with weak references) and bound cache sizes
+```
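+
+For Python, the standard `tracemalloc` module can compare two snapshots and point at the source lines whose allocations grew. A minimal sketch with a simulated leak (the hypothetical `_cache` list that is never trimmed):
+
+```python
+import tracemalloc
+
+_cache = []  # hypothetical leak: module-level list that is never trimmed
+
+
+def handle_request(payload: str) -> int:
+    _cache.append(payload * 1000)  # stays reachable forever, so memory only grows
+    return len(payload)
+
+
+tracemalloc.start()
+before = tracemalloc.take_snapshot()
+for i in range(1000):
+    handle_request(f"request-{i}")
+after = tracemalloc.take_snapshot()
+tracemalloc.stop()
+
+# Source lines whose allocations grew the most between the two snapshots
+for stat in after.compare_to(before, "lineno")[:3]:
+    print(stat)
+```
+
+In a real service, take the snapshots some minutes apart under normal load; lines that keep growing across several comparisons are the leak candidates.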
+
+### Race Conditions
+```yaml
+race_condition_debugging:
+  symptoms:
+    - Intermittent failures
+    - Hard to reproduce
+    - Timing-dependent
+
+  detection:
+    - Add logging with timestamps
+    - Use thread/process IDs in logs
+    - Add artificial delays to expose timing issues
+
+  solutions:
+    - Add proper locking (mutex, semaphore)
+    - Use atomic operations
+    - Redesign to avoid shared state
+    - Use message queues
+```
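+
+Locking is the most direct of the solutions above. A minimal sketch in Python: `self.value += 1` is a read-modify-write that is not guaranteed to be atomic across threads, so the hypothetical `Counter` guards it with a `threading.Lock`:
+
+```python
+import threading
+
+
+class Counter:
+    """Shared counter; the lock makes the read-modify-write atomic."""
+
+    def __init__(self) -> None:
+        self.value = 0
+        self._lock = threading.Lock()
+
+    def increment(self) -> None:
+        with self._lock:  # without this, two threads can read the same old value
+            self.value += 1
+
+
+def hammer(counter: Counter, times: int) -> None:
+    for _ in range(times):
+        counter.increment()
+
+
+counter = Counter()
+threads = [threading.Thread(target=hammer, args=(counter, 10_000)) for _ in range(8)]
+for t in threads:
+    t.start()
+for t in threads:
+    t.join()
+print(counter.value)  # 80000
+```
+
+Removing the `with self._lock:` line turns this into a reproduction case: the total may come up short, but only intermittently, which is exactly the symptom pattern listed above.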
+
+### Database Issues
+```yaml
+database_debugging:
+  slow_queries:
+    identify: "EXPLAIN ANALYZE <query> (PostgreSQL/MySQL; this executes the query)"
+    solutions:
+      - Add indexes
+      - Optimize joins
+      - Reduce data fetched
+      - Use connection pooling
+
+  deadlocks:
+    detect: "Check database logs for deadlock errors"
+    prevent:
+      - Acquire locks in consistent order
+      - Keep transactions short
+      - Use appropriate isolation levels
+
+  connection_issues:
+    symptoms: "Connection refused, timeout errors"
+    check:
+      - Database is running
+      - Connection string correct
+      - Firewall/network allows connection
+      - Connection pool not exhausted
+```
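+
+`EXPLAIN ANALYZE` is PostgreSQL/MySQL syntax; SQLite's closest equivalent is `EXPLAIN QUERY PLAN`, which shows the plan without running the query. A minimal sketch using the standard `sqlite3` module and a made-up `users` table to show a full scan turning into an index search; the exact plan wording varies between SQLite versions:
+
+```python
+import sqlite3
+
+conn = sqlite3.connect(":memory:")
+conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
+conn.executemany(
+    "INSERT INTO users (email, name) VALUES (?, ?)",
+    [(f"user{i}@example.com", f"User {i}") for i in range(10_000)],
+)
+
+query = "SELECT * FROM users WHERE email = ?"
+
+
+def plan(sql: str) -> str:
+    # The last column of each row is the human-readable plan step
+    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, ("user42@example.com",)).fetchall()
+    return "; ".join(row[-1] for row in rows)
+
+
+before = plan(query)  # full table scan, e.g. "SCAN users"
+conn.execute("CREATE INDEX idx_users_email ON users (email)")
+after = plan(query)  # e.g. "SEARCH users USING INDEX idx_users_email (email=?)"
+print("before:", before)
+print("after: ", after)
+```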
+
+## Error Analysis Patterns
+
+### Stack Trace Reading
+```text
+# Example stack trace
+Traceback (most recent call last):
+  File "app.py", line 45, in main
+    process_user(user_data)
+  File "services.py", line 23, in process_user
+    validate_email(user_data['email'])
+  File "validators.py", line 12, in validate_email
+    if '@' not in email:
+TypeError: argument of type 'NoneType' is not iterable
+
+# Analysis (read from the bottom up):
+# 1. Error: TypeError raised at line 12 in validators.py
+# 2. Cause: 'email' is None, and the `in` operator cannot search None
+# 3. Origin: user_data['email'] holds None (a missing key would raise KeyError instead)
+# 4. Fix: Handle None in validate_email, and find out why the email field is None upstream
+```
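+
+The same bottom-up reading can be done programmatically, for example to log the failing frame in an error handler. A minimal sketch that re-creates the scenario above with simplified, hypothetical versions of `process_user` and `validate_email`, then uses the standard `traceback` module:
+
+```python
+import traceback
+
+
+# Simplified, hypothetical versions of the functions in the trace above
+def validate_email(email):
+    if "@" not in email:  # TypeError when email is None
+        raise ValueError("missing @")
+
+
+def process_user(user_data):
+    validate_email(user_data["email"])
+
+
+try:
+    process_user({"email": None})
+except TypeError as exc:
+    frames = traceback.extract_tb(exc.__traceback__)
+    innermost = frames[-1]  # last entry = frame where the exception was raised
+    print(f"{type(exc).__name__}: {exc}")
+    print(f"raised in {innermost.name}() at line {innermost.lineno}")
+    print("call chain:", " -> ".join(frame.name for frame in frames))
+```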
+
+### Error Messages Interpretation
+```yaml
+error_interpretation:
+  "Connection refused":
+    likely_causes:
+      - Service not running
+      - Wrong port
+      - Firewall blocking
+
+  "Permission denied":
+    likely_causes:
+      - Insufficient file permissions
+      - User lacks required role
+      - Protected resource
+
+  "Resource not found":
+    likely_causes:
+      - Typo in path/URL
+      - Resource deleted
+      - Wrong environment
+
+  "Timeout":
+    likely_causes:
+      - Service too slow
+      - Network issues
+      - Infinite loop
+      - Deadlock
+```
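+
+Distinguishing "Connection refused" from "Timeout" is often the first split between "service down" and "network problem". A minimal sketch using the standard `socket` module; `diagnose_tcp` is a hypothetical helper, and the example targets a throwaway listener on localhost:
+
+```python
+import socket
+
+
+def diagnose_tcp(host: str, port: int, timeout: float = 3.0) -> str:
+    """Classify one TCP connection attempt."""
+    try:
+        with socket.create_connection((host, port), timeout=timeout):
+            return "open"
+    except ConnectionRefusedError:
+        return "refused"  # host answered, but nothing is listening on that port
+    except socket.timeout:
+        return "timeout"  # no answer: firewall dropping packets, routing, or overload
+    except socket.gaierror:
+        return "dns-failure"  # hostname could not be resolved
+    except OSError as exc:
+        return f"error: {exc}"
+
+
+# Usage: a throwaway listener on a free port chosen by the OS
+listener = socket.socket()
+listener.bind(("127.0.0.1", 0))
+listener.listen()
+port = listener.getsockname()[1]
+print(diagnose_tcp("127.0.0.1", port))
+listener.close()
+print(diagnose_tcp("127.0.0.1", port))
+```
+
+The first call should report `open`; the second, made after the listener is closed, should report `refused`.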
+
+## Debugging Checklist
+
+### Before Starting
+- [ ] Can you reproduce the issue?
+- [ ] Do you have access to logs?
+- [ ] Do you have a test environment?
+- [ ] Is there a recent change that might have caused it?
+
+### During Debugging
+- [ ] Have you isolated the problem area?
+- [ ] Have you checked the logs?
+- [ ] Have you formed a hypothesis?
+- [ ] Have you tested your hypothesis?
+- [ ] Are you changing one thing at a time?
+
+### Before Closing
+- [ ] Is the original issue fixed?
+- [ ] Have you written a test for this bug?
+- [ ] Have you checked for similar bugs?
+- [ ] Have you documented the root cause?
+- [ ] Have you shared knowledge with the team?
+
+## Production Debugging
+
+### Safe Debugging in Production
+```yaml
+production_debugging:
+  do:
+    - Add detailed logging
+    - Monitor metrics
+    - Use feature flags to isolate issues
+    - Take snapshots/backups before changes
+    - Have rollback plan ready
+
+  dont:
+    - Don't use debugger breakpoints (freezes service)
+    - Don't make changes without review
+    - Don't restart services unnecessarily
+    - Don't expose sensitive data in logs
+```
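+
+One way to keep sensitive data out of logs is a `logging.Filter` that rewrites records before handlers format them. A minimal sketch that masks email addresses; the class name and regex are illustrative assumptions, not a complete PII policy:
+
+```python
+import logging
+import re
+
+
+class RedactEmails(logging.Filter):
+    """Mask email addresses in a record before a handler emits it."""
+
+    EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
+
+    def filter(self, record: logging.LogRecord) -> bool:
+        # Render the final message once, redact it, and clear args so the
+        # handler does not re-apply %-formatting to the redacted text
+        record.msg = self.EMAIL.sub("<redacted-email>", record.getMessage())
+        record.args = ()
+        return True  # keep the record, only its text changes
+
+
+logger = logging.getLogger("app")
+handler = logging.StreamHandler()
+handler.addFilter(RedactEmails())
+logger.addHandler(handler)
+logger.setLevel(logging.INFO)
+logger.info("Export failed for user %s", "alice@example.com")
+```
+
+Attach the filter to every handler that writes somewhere persistent, since a handler without it would still see the original text.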
+
+### Incident Response
+```yaml
+incident_response:
+  immediate:
+    - Assess severity
+    - Notify stakeholders
+    - Start incident log
+    - Begin mitigation
+
+  mitigation:
+    - Restore service (rollback if needed)
+    - Implement workaround
+    - Monitor closely
+
+  resolution:
+    - Identify root cause
+    - Implement proper fix
+    - Test thoroughly
+    - Deploy fix
+
+  followup:
+    - Write postmortem
+    - Update runbooks
+    - Add monitoring/alerts
+    - Share learnings
+```
+
+## Tools and Resources
+
+### Debugging Tools
+```yaml
+tools_by_language:
+  python:
+    - "pdb - Interactive debugger"
+    - "ipdb - Enhanced pdb"
+    - "memory_profiler - Memory profiling"
+    - "cProfile - Performance profiling"
+
+  javascript:
+    - "Chrome DevTools"
+    - "Node.js debugger"
+    - "VS Code debugger"
+
+  general:
+    - "Git bisect - Find breaking commit"
+    - "curl - Test APIs"
+    - "tcpdump - Network debugging"
+    - "strace/dtrace - System call tracing"
+```
+
+---
+*Use this skill when debugging issues or conducting root cause analysis*