Initial commit
This commit is contained in:
500
skills/debugging-issues/SKILL.md
Normal file
500
skills/debugging-issues/SKILL.md
Normal file
@@ -0,0 +1,500 @@
|
||||
---
|
||||
name: Debugging Issues
|
||||
description: Systematically debug issues with reproduction steps, error analysis, hypothesis testing, and root cause fixes. Use when investigating bugs, analyzing production incidents, or troubleshooting unexpected behavior.
|
||||
---
|
||||
|
||||
# Debugging Issues
|
||||
|
||||
## Purpose
|
||||
Provides systematic approaches to debugging, troubleshooting techniques, and error analysis strategies.
|
||||
|
||||
## When to Use
|
||||
- Investigating bugs or unexpected behavior
|
||||
- Analyzing error messages and stack traces
|
||||
- Troubleshooting system issues
|
||||
- Performance debugging
|
||||
- Root cause analysis
|
||||
- Production incident response
|
||||
|
||||
## Systematic Debugging Process
|
||||
|
||||
### 1. Reproduce the Issue
|
||||
**Goal**: Create a consistent way to trigger the bug
|
||||
|
||||
**Steps:**
|
||||
- [ ] Document exact steps to reproduce
|
||||
- [ ] Identify required preconditions
|
||||
- [ ] Note the environment (OS, browser, versions)
|
||||
- [ ] Create minimal reproduction case
|
||||
- [ ] Verify it reproduces consistently
|
||||
|
||||
**Example:**
|
||||
```yaml
|
||||
reproduction_steps:
|
||||
- action: "Login as admin user"
|
||||
- action: "Navigate to /dashboard"
|
||||
- action: "Click 'Export Data' button"
|
||||
- expected: "CSV file downloads"
|
||||
- actual: "Error 500 appears"
|
||||
- frequency: "Occurs every time"
|
||||
```
|
||||
|
||||
### 2. Isolate the Problem
|
||||
**Goal**: Narrow down where the issue occurs
|
||||
|
||||
**Techniques:**
|
||||
```yaml
|
||||
isolation_methods:
|
||||
Divide and Conquer:
|
||||
description: "Split system in half, test which half has issue"
|
||||
example: "Comment out half the code, see if error persists"
|
||||
|
||||
Binary Search:
|
||||
description: "Use git bisect or similar to find breaking commit"
|
||||
command: "git bisect start && git bisect bad && git bisect good v1.0"
|
||||
|
||||
Component Isolation:
|
||||
description: "Test each component individually"
|
||||
example: "Test database, API, frontend separately"
|
||||
|
||||
Environment Comparison:
|
||||
description: "Compare working vs broken environments"
|
||||
checklist:
|
||||
- Different OS?
|
||||
- Different versions?
|
||||
- Different configurations?
|
||||
- Different data?
|
||||
```
|
||||
|
||||
### 3. Analyze Logs and Errors
|
||||
**Goal**: Gather evidence about what's going wrong
|
||||
|
||||
**Log Analysis:**
|
||||
```yaml
|
||||
log_analysis:
|
||||
error_messages:
|
||||
- Read the full error message
|
||||
- Note the error type/code
|
||||
- Identify the failing component
|
||||
|
||||
stack_traces:
|
||||
- Start from the bottom (root cause)
|
||||
- Identify the first non-library code
|
||||
- Check function arguments at that point
|
||||
|
||||
correlation:
|
||||
- Check logs before the error
|
||||
- Look for patterns
|
||||
- Correlate with user actions
|
||||
- Check timestamps
|
||||
```
|
||||
|
||||
**Common Error Patterns:**
|
||||
```python
|
||||
# NullPointerException / AttributeError
|
||||
# Usually: Accessing property of None/null object
|
||||
# Fix: Add null checks or ensure object is initialized
|
||||
|
||||
# IndexError / ArrayIndexOutOfBoundsException
|
||||
# Usually: Accessing array index that doesn't exist
|
||||
# Fix: Check array length before accessing
|
||||
|
||||
# KeyError / Property not found
|
||||
# Usually: Accessing dict/object key that doesn't exist
|
||||
# Fix: Use .get() with default or check if key exists
|
||||
|
||||
# TypeError / Type mismatch
|
||||
# Usually: Wrong type passed to function
|
||||
# Fix: Validate types, add type hints
|
||||
|
||||
# ConnectionError / Timeout
|
||||
# Usually: Network issues or service down
|
||||
# Fix: Add retry logic, check service health
|
||||
```
|
||||
|
||||
### 4. Form Hypothesis
|
||||
**Goal**: Develop theory about what's causing the issue
|
||||
|
||||
**Hypothesis Framework:**
|
||||
```yaml
|
||||
hypothesis_template:
|
||||
observation: "What did you observe?"
|
||||
theory: "What do you think is causing it?"
|
||||
prediction: "If theory is correct, what else would be true?"
|
||||
test: "How can you test this?"
|
||||
|
||||
example:
|
||||
observation: "API returns 500 error on POST /users"
|
||||
theory: "Input validation is rejecting valid email format"
|
||||
prediction: "If true, different email format should work"
|
||||
test: "Try with various email formats"
|
||||
```
|
||||
|
||||
### 5. Test the Hypothesis
|
||||
**Goal**: Verify or disprove your theory
|
||||
|
||||
**Testing Approaches:**
|
||||
```yaml
|
||||
testing_methods:
|
||||
Add Logging:
|
||||
description: "Add detailed logs around suspected area"
|
||||
example: |
|
||||
logger.debug(f"Input data: {data}")
|
||||
logger.debug(f"Validation result: {is_valid}")
|
||||
|
||||
Add Breakpoints:
|
||||
description: "Pause execution to inspect state"
|
||||
tools:
|
||||
- "pdb for Python"
|
||||
- "debugger for JavaScript"
|
||||
- "gdb for C/C++"
|
||||
|
||||
Change One Thing:
|
||||
description: "Modify one variable at a time"
|
||||
example: "Change input value, run again, observe result"
|
||||
|
||||
Write Failing Test:
|
||||
description: "Create test that reproduces the bug"
|
||||
benefit: "Ensures fix works and prevents regression"
|
||||
```
|
||||
|
||||
### 6. Implement Fix
|
||||
**Goal**: Resolve the root cause
|
||||
|
||||
**Fix Strategies:**
|
||||
```yaml
|
||||
fix_approaches:
|
||||
Quick Fix:
|
||||
when: "Production is down"
|
||||
approach: "Minimal change to restore service"
|
||||
followup: "Proper fix later"
|
||||
|
||||
Root Cause Fix:
|
||||
when: "Have time to do it right"
|
||||
approach: "Fix underlying cause"
|
||||
benefit: "Prevents similar bugs"
|
||||
|
||||
Workaround:
|
||||
when: "Fix is complex, need temporary solution"
|
||||
approach: "Add special handling"
|
||||
document: "Explain why workaround exists"
|
||||
```
|
||||
|
||||
### 7. Verify the Fix
|
||||
**Goal**: Ensure the issue is resolved
|
||||
|
||||
**Verification Checklist:**
|
||||
- [ ] Original bug is fixed
|
||||
- [ ] No new bugs introduced
|
||||
- [ ] All tests pass
|
||||
- [ ] Edge cases handled
|
||||
- [ ] Code reviewed
|
||||
- [ ] Deployed to test environment
|
||||
- [ ] Tested in production-like environment
|
||||
|
||||
## Debugging Techniques
|
||||
|
||||
### Print Debugging
|
||||
```python
|
||||
# Simple but effective
|
||||
def calculate_total(items):
|
||||
print(f"DEBUG: items = {items}")
|
||||
total = sum(item.price for item in items)
|
||||
print(f"DEBUG: total = {total}")
|
||||
return total
|
||||
```
|
||||
|
||||
### Interactive Debugging
|
||||
```python
|
||||
# Python pdb
|
||||
import pdb; pdb.set_trace()
|
||||
|
||||
# Common commands:
|
||||
# n (next) - Execute next line
|
||||
# s (step) - Step into function
|
||||
# c (continue) - Continue execution
|
||||
# p variable - Print variable
|
||||
# l (list) - Show code context
|
||||
# q (quit) - Exit debugger
|
||||
```
|
||||
|
||||
### Rubber Duck Debugging
|
||||
```yaml
|
||||
rubber_duck_method:
|
||||
step_1: "Get a rubber duck (or patient colleague)"
|
||||
step_2: "Explain your code line by line"
|
||||
step_3: "Explain what you expect to happen"
|
||||
step_4: "Explain what actually happens"
|
||||
step_5: "Often you'll realize the issue while explaining"
|
||||
```
|
||||
|
||||
### Binary Search Debugging
|
||||
```bash
|
||||
# Find which commit introduced a bug
|
||||
git bisect start
|
||||
git bisect bad # Current commit is bad
|
||||
git bisect good v1.0 # v1.0 was working
|
||||
|
||||
# Git will checkout commits for you to test
|
||||
# After each test, mark as good or bad:
|
||||
git bisect good # if works
|
||||
git bisect bad # if broken
|
||||
|
||||
# Git will find the problematic commit
|
||||
```
|
||||
|
||||
### Adding Instrumentation
|
||||
```python
|
||||
# Add metrics to understand behavior
|
||||
import time
|
||||
from functools import wraps
|
||||
|
||||
def timing_decorator(func):
|
||||
@wraps(func)
|
||||
def wrapper(*args, **kwargs):
|
||||
start = time.time()
|
||||
result = func(*args, **kwargs)
|
||||
duration = time.time() - start
|
||||
print(f"{func.__name__} took {duration:.2f}s")
|
||||
return result
|
||||
return wrapper
|
||||
|
||||
@timing_decorator
|
||||
def slow_function():
|
||||
# Your code here
|
||||
pass
|
||||
```
|
||||
|
||||
## Common Debugging Scenarios
|
||||
|
||||
### Performance Issues
|
||||
```yaml
|
||||
performance_debugging:
|
||||
profile_the_code:
|
||||
python: "python -m cProfile script.py"
|
||||
node: "node --prof script.js"
|
||||
|
||||
identify_bottlenecks:
|
||||
- Look for functions called many times
|
||||
- Check for slow database queries
|
||||
- Identify memory allocations
|
||||
|
||||
optimize:
|
||||
- Cache repeated calculations
|
||||
- Use more efficient algorithms
|
||||
- Add database indexes
|
||||
- Implement pagination
|
||||
```
|
||||
|
||||
### Memory Leaks
|
||||
```yaml
|
||||
memory_leak_debugging:
|
||||
detect:
|
||||
- Monitor memory usage over time
|
||||
- Look for steadily increasing memory
|
||||
- Check for unclosed resources
|
||||
|
||||
common_causes:
|
||||
- Unclosed file handles
|
||||
- Unclosed database connections
|
||||
- Event listeners not removed
|
||||
- Circular references
|
||||
- Large objects not garbage collected
|
||||
|
||||
fix:
|
||||
- Use context managers (with statement)
|
||||
- Explicitly close connections
|
||||
- Remove event listeners
|
||||
- Break circular references
|
||||
```
|
||||
|
||||
### Race Conditions
|
||||
```yaml
|
||||
race_condition_debugging:
|
||||
symptoms:
|
||||
- Intermittent failures
|
||||
- Harder to reproduce
|
||||
- Timing-dependent
|
||||
|
||||
detection:
|
||||
- Add logging with timestamps
|
||||
- Use thread/process IDs in logs
|
||||
- Add artificial delays to expose timing issues
|
||||
|
||||
solutions:
|
||||
- Add proper locking (mutex, semaphore)
|
||||
- Use atomic operations
|
||||
- Redesign to avoid shared state
|
||||
- Use message queues
|
||||
```
|
||||
|
||||
### Database Issues
|
||||
```yaml
|
||||
database_debugging:
|
||||
slow_queries:
|
||||
identify: "EXPLAIN ANALYZE query"
|
||||
solutions:
|
||||
- Add indexes
|
||||
- Optimize joins
|
||||
- Reduce data fetched
|
||||
- Use connection pooling
|
||||
|
||||
deadlocks:
|
||||
detect: "Check database logs for deadlock errors"
|
||||
prevent:
|
||||
- Acquire locks in consistent order
|
||||
- Keep transactions short
|
||||
- Use appropriate isolation levels
|
||||
|
||||
connection_issues:
|
||||
symptoms: "Connection refused, timeout errors"
|
||||
check:
|
||||
- Database is running
|
||||
- Connection string correct
|
||||
- Firewall/network allows connection
|
||||
- Connection pool not exhausted
|
||||
```
|
||||
|
||||
## Error Analysis Patterns
|
||||
|
||||
### Stack Trace Reading
|
||||
```python
|
||||
# Example stack trace
|
||||
Traceback (most recent call last):
|
||||
File "app.py", line 45, in main
|
||||
process_user(user_data)
|
||||
File "services.py", line 23, in process_user
|
||||
validate_email(user_data['email'])
|
||||
File "validators.py", line 12, in validate_email
|
||||
if '@' not in email:
|
||||
TypeError: argument of type 'NoneType' is not iterable
|
||||
|
||||
# Analysis:
|
||||
# 1. Error: TypeError at line 12 in validators.py
|
||||
# 2. Cause: 'email' variable is None
|
||||
# 3. Origin: Likely user_data['email'] is None from services.py line 23
|
||||
# 4. Fix: Add None check before validation
|
||||
```
|
||||
|
||||
### Error Messages Interpretation
|
||||
```yaml
|
||||
error_interpretation:
|
||||
"Connection refused":
|
||||
likely_causes:
|
||||
- Service not running
|
||||
- Wrong port
|
||||
- Firewall blocking
|
||||
|
||||
"Permission denied":
|
||||
likely_causes:
|
||||
- Insufficient file permissions
|
||||
- User lacks required role
|
||||
- Protected resource
|
||||
|
||||
"Resource not found":
|
||||
likely_causes:
|
||||
- Typo in path/URL
|
||||
- Resource deleted
|
||||
- Wrong environment
|
||||
|
||||
"Timeout":
|
||||
likely_causes:
|
||||
- Service too slow
|
||||
- Network issues
|
||||
- Infinite loop
|
||||
- Deadlock
|
||||
```
|
||||
|
||||
## Debugging Checklist
|
||||
|
||||
### Before Starting
|
||||
- [ ] Can you reproduce the issue?
|
||||
- [ ] Do you have access to logs?
|
||||
- [ ] Do you have a test environment?
|
||||
- [ ] Is there a recent change that might have caused it?
|
||||
|
||||
### During Debugging
|
||||
- [ ] Have you isolated the problem area?
|
||||
- [ ] Have you checked the logs?
|
||||
- [ ] Have you formed a hypothesis?
|
||||
- [ ] Have you tested your hypothesis?
|
||||
- [ ] Are you changing one thing at a time?
|
||||
|
||||
### Before Closing
|
||||
- [ ] Is the original issue fixed?
|
||||
- [ ] Have you written a test for this bug?
|
||||
- [ ] Have you checked for similar bugs?
|
||||
- [ ] Have you documented the root cause?
|
||||
- [ ] Have you shared knowledge with the team?
|
||||
|
||||
## Production Debugging
|
||||
|
||||
### Safe Debugging in Production
|
||||
```yaml
|
||||
production_debugging:
|
||||
do:
|
||||
- Add detailed logging
|
||||
- Monitor metrics
|
||||
- Use feature flags to isolate issues
|
||||
- Take snapshots/backups before changes
|
||||
- Have rollback plan ready
|
||||
|
||||
dont:
|
||||
- Don't use debugger breakpoints (freezes service)
|
||||
- Don't make changes without review
|
||||
- Don't restart services unnecessarily
|
||||
- Don't expose sensitive data in logs
|
||||
```
|
||||
|
||||
### Incident Response
|
||||
```yaml
|
||||
incident_response:
|
||||
immediate:
|
||||
- Assess severity
|
||||
- Notify stakeholders
|
||||
- Start incident log
|
||||
- Begin mitigation
|
||||
|
||||
mitigation:
|
||||
- Restore service (rollback if needed)
|
||||
- Implement workaround
|
||||
- Monitor closely
|
||||
|
||||
resolution:
|
||||
- Identify root cause
|
||||
- Implement proper fix
|
||||
- Test thoroughly
|
||||
- Deploy fix
|
||||
|
||||
followup:
|
||||
- Write postmortem
|
||||
- Update runbooks
|
||||
- Add monitoring/alerts
|
||||
- Share learnings
|
||||
```
|
||||
|
||||
## Tools and Resources
|
||||
|
||||
### Debugging Tools
|
||||
```yaml
|
||||
tools_by_language:
|
||||
python:
|
||||
- "pdb - Interactive debugger"
|
||||
- "ipdb - Enhanced pdb"
|
||||
- "memory_profiler - Memory profiling"
|
||||
- "cProfile - Performance profiling"
|
||||
|
||||
javascript:
|
||||
- "Chrome DevTools"
|
||||
- "Node.js debugger"
|
||||
- "VS Code debugger"
|
||||
|
||||
general:
|
||||
- "Git bisect - Find breaking commit"
|
||||
- "curl - Test APIs"
|
||||
- "tcpdump - Network debugging"
|
||||
- "strace/dtrace - System call tracing"
|
||||
```
|
||||
|
||||
---
|
||||
*Use this skill when debugging issues or conducting root cause analysis*
|
||||
Reference in New Issue
Block a user