# Debugging and Profiling
## Overview
**Core Principle:** Profile before optimizing. Humans are terrible at guessing where code is slow. Always measure before making changes.
Python debugging and profiling enables systematic problem diagnosis and performance optimization. Use debugpy/pdb for step-through debugging, cProfile for CPU profiling, memory_profiler for memory analysis. The biggest mistake: optimizing code without profiling first—you'll likely optimize the wrong thing.
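For a quick first measurement, the standard library's profiler can be run straight from the command line before touching any code (`script.py` below is a placeholder for your entry point):
```bash
# Profile the whole script and print functions sorted by cumulative time
python -m cProfile -s cumulative script.py
```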
## When to Use
**Use this skill when:**
- "Code is slow"
- "How to profile Python?"
- "Memory leak"
- "Debugging not working"
- "Find bottleneck"
- "Optimize performance"
- "Step through code"
- "Where is my code spending time?"
**Don't use when:**
- Setting up project (use project-structure-and-tooling)
- Already know what to optimize (but still profile to verify!)
- Algorithm selection (different skill domain)
**Symptoms triggering this skill:**
- Code runs slower than expected
- Memory usage growing over time
- Need to understand execution flow
- Performance degraded after changes
## Debugging Fundamentals
### Using debugpy with VS Code
```python
# ✅ CORRECT: debugpy for remote debugging
import debugpy
# Allow VS Code to attach
debugpy.listen(5678)
print("Waiting for debugger to attach...")
debugpy.wait_for_client()
# Your code here
def process_data(data):
    result = []
    for item in data:
        # Set breakpoint in VS Code on this line
        transformed = transform(item)
        result.append(transformed)
    return result

# VS Code launch.json configuration:
"""
{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python: Attach",
            "type": "python",
            "request": "attach",
            "connect": {
                "host": "localhost",
                "port": 5678
            }
        }
    ]
}
"""
```
### Using pdb (Python Debugger)
```python
# ✅ CORRECT: pdb for interactive debugging
import pdb
def buggy_function(data):
    result = []
    for i, item in enumerate(data):
        # Drop into debugger
        pdb.set_trace()  # Or: breakpoint() in Python 3.7+
        processed = item * 2
        result.append(processed)
    return result
# pdb commands:
# n (next): Execute next line
# s (step): Step into function
# c (continue): Continue execution
# p variable: Print variable
# pp variable: Pretty print variable
# l (list): Show current location in code
# w (where): Show stack trace
# q (quit): Quit debugger
```
### Conditional Breakpoints
```python
# ❌ WRONG: Breaking on every iteration
def process_items(items):
    for item in items:
        pdb.set_trace()  # Breaks 10000 times!
        process(item)
# ✅ CORRECT: Conditional breakpoint
def process_items(items):
    for i, item in enumerate(items):
        if i == 5000:  # Only break on specific iteration
            breakpoint()
        process(item)
# ✅ BETTER: Break only when the data is problematic
def process_items(items):
    for item in items:
        if item.value < 0:  # Break only when problematic
            breakpoint()
        process(item)
```
### Post-Mortem Debugging
```python
# ✅ CORRECT: Debug after exception
import pdb
def main():
    try:
        # Code that might raise exception
        result = risky_operation()
    except Exception:
        # Drop into debugger at exception point
        pdb.post_mortem()
# ✅ CORRECT: Auto post-mortem for unhandled exceptions
import sys
def custom_excepthook(exc_type, exc_value, exc_traceback):
    pdb.post_mortem(exc_traceback)
sys.excepthook = custom_excepthook
# Now unhandled exceptions drop into pdb automatically
```
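The same post-mortem behaviour is available from the shell without editing the code; a minimal sketch (`script.py` is a placeholder):
```bash
# Run under pdb; "-c continue" executes normally until an unhandled
# exception, then drops into the post-mortem debugger at the crash site
python -m pdb -c continue script.py
```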
**Why this matters**: Breakpoints let you inspect state at exact point of failure. Conditional breakpoints avoid noise. Post-mortem debugging examines crashes.
## CPU Profiling
### cProfile for Function-Level Profiling
```python
import cProfile
import pstats
# ❌ WRONG: Guessing which function is slow
def slow_program():
    # "I think this loop is the problem..."
    for i in range(1000):
        process_data(i)
# ✅ CORRECT: Profile to find actual bottleneck
def slow_program():
    for i in range(1000):
        process_data(i)
# Profile the function
cProfile.run('slow_program()', 'profile_stats')
# Analyze results
stats = pstats.Stats('profile_stats')
stats.strip_dirs()
stats.sort_stats('cumulative')
stats.print_stats(20)  # Top 20 functions by cumulative time
# ✅ CORRECT: Profile with context manager
from contextlib import contextmanager
@contextmanager
def profiled():
    pr = cProfile.Profile()
    pr.enable()
    try:
        yield
    finally:
        pr.disable()
        stats = pstats.Stats(pr)
        stats.strip_dirs()
        stats.sort_stats('cumulative')
        stats.print_stats(20)
# Usage
with profiled():
    slow_program()
```
### Profiling Specific Code Blocks
```python
# ✅ CORRECT: Profile specific section
import cProfile
pr = cProfile.Profile()
# Normal code
setup_data()
# Profile this section
pr.enable()
expensive_operation()
pr.disable()
# More normal code
cleanup()
# View results
pr.print_stats(sort='cumulative')
```
### Line-Level Profiling with line_profiler
```python
# Install: pip install line_profiler
# ✅ CORRECT: Line-by-line profiling
from line_profiler import LineProfiler
@profile  # kernprof injects @profile; remove it when profiling programmatically
def slow_function():
    total = 0
    for i in range(10000):
        total += i ** 2
    return total
# Run with kernprof:
# kernprof -l -v script.py
# Or programmatically (without the @profile decorator):
lp = LineProfiler()
lp.add_function(slow_function)
lp.enable()
slow_function()
lp.disable()
lp.print_stats()
# Output shows time spent per line:
# Line #      Hits         Time  Per Hit   % Time  Line Contents
# ==============================================================
#      1                                           def slow_function():
#      2         1          2.0      2.0      0.0      total = 0
#      3     10001      15234.0      1.5     20.0      for i in range(10000):
#      4     10000      60123.0      6.0     80.0          total += i ** 2
#      5         1          1.0      1.0      0.0      return total
```
**Why this matters**: cProfile shows which functions are slow. line_profiler shows which lines within functions. Both essential for optimization.
### Visualizing Profiles with SnakeViz
```bash
# Install: pip install snakeviz
# Profile code
python -m cProfile -o program.prof script.py
# Visualize
snakeviz program.prof
# Opens browser with interactive visualization:
# - Sunburst chart showing call hierarchy
# - Icicle chart showing time distribution
# - Click functions to zoom in
```
## Memory Profiling
### Memory Usage with memory_profiler
```python
# Install: pip install memory_profiler
from memory_profiler import profile
# ✅ CORRECT: Track memory usage per line
@profile
def memory_hungry_function():
    # Line-by-line memory usage shown
    big_list = [i for i in range(1000000)]        # Allocates ~40MB
    big_dict = {i: i**2 for i in range(1000000)}  # Another ~40MB
    return len(big_list), len(big_dict)
# Run with:
# python -m memory_profiler script.py
# Output:
# Line #    Mem usage    Increment   Line Contents
# ================================================
#      3     38.3 MiB     38.3 MiB   @profile
#      4                             def memory_hungry_function():
#      5     45.2 MiB      6.9 MiB       big_list = [i for i in range(1000000)]
#      6     83.1 MiB     37.9 MiB       big_dict = {i: i**2 for i in range(1000000)}
#      7     83.1 MiB      0.0 MiB       return len(big_list), len(big_dict)
```
### Finding Memory Leaks
```python
# ✅ CORRECT: Detect memory leaks with tracemalloc
import tracemalloc
# Start tracing
tracemalloc.start()
# Take snapshot before
snapshot1 = tracemalloc.take_snapshot()
# Run code that might leak
problematic_function()
# Take snapshot after
snapshot2 = tracemalloc.take_snapshot()
# Compare snapshots
top_stats = snapshot2.compare_to(snapshot1, 'lineno')
print("Top 10 memory increases:")
for stat in top_stats[:10]:
print(stat)
tracemalloc.stop()
# ✅ CORRECT: Track specific objects
import gc
import sys
def find_memory_leak():
    # Force garbage collection
    gc.collect()
    # Track objects before
    before = len(gc.get_objects())
    # Run potentially leaky code
    for _ in range(100):
        leaky_operation()
    # Force GC again
    gc.collect()
    # Track objects after
    after = len(gc.get_objects())
    if after > before:
        print(f"Potential leak: {after - before} objects not collected")
    # Find what's keeping objects alive
    for obj in gc.get_objects():
        if isinstance(obj, MyClass):  # Suspect class
            print(f"Found {type(obj)}: {sys.getrefcount(obj)} references")
            print(gc.get_referrers(obj))
```
### Profiling Memory with objgraph
```python
# Install: pip install objgraph
import objgraph
# ✅ CORRECT: Find most common objects
def analyze_memory():
    objgraph.show_most_common_types()
# Output:
# dict        12453
# function     8234
# list         6789
# ...
# ✅ CORRECT: Track object growth
objgraph.show_growth()
potentially_leaky_function()
objgraph.show_growth() # Shows objects that increased
# ✅ CORRECT: Visualize object references
objgraph.show_refs([my_object], filename='refs.png')
# Creates graph showing what references my_object
```
**Why this matters**: Memory leaks cause gradual performance degradation. tracemalloc and memory_profiler help find exactly where memory is allocated.
## Profiling Async Code
### Profiling Async Functions
```python
import asyncio
import cProfile
import pstats
# ❌ WRONG: cProfile doesn't work well with async
async def slow_async():
    await asyncio.sleep(1)
    await process_data()
cProfile.run('asyncio.run(slow_async())')  # Misleading results
# ✅ CORRECT: Use yappi for async profiling
# Install: pip install yappi
import yappi
async def slow_async():
    await asyncio.sleep(1)
    await process_data()
yappi.set_clock_type("wall") # Use wall time, not CPU time
yappi.start()
asyncio.run(slow_async())
yappi.stop()
# Print stats
stats = yappi.get_func_stats()
stats.sort("totaltime", "desc")
stats.print_all()
# ✅ CORRECT: Profile coroutines specifically
stats = yappi.get_func_stats(filter_callback=lambda x: 'coroutine' in x.name)
stats.print_all()
```
### Detecting Blocking Code in Async
```python
# ✅ CORRECT: Detect event loop blocking
import asyncio
import time
class LoopMonitor:
    def __init__(self, threshold: float = 0.1):
        self.threshold = threshold

    async def monitor(self):
        while True:
            start = time.monotonic()
            await asyncio.sleep(0.01)  # Very short sleep
            elapsed = time.monotonic() - start
            if elapsed > self.threshold:
                print(f"WARNING: Event loop blocked for {elapsed:.3f}s")

async def main():
    # Start monitor
    monitor = LoopMonitor(threshold=0.1)
    monitor_task = asyncio.create_task(monitor.monitor())
    # Run your async code
    await your_async_function()
    monitor_task.cancel()
# ✅ CORRECT: Use asyncio debug mode
asyncio.run(main(), debug=True)
# Warns about slow callbacks (>100ms)
```
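The 100ms threshold that debug mode uses is exposed on the running loop as `slow_callback_duration`; a small sketch of tightening it (the awaited coroutine is illustrative):
```python
import asyncio

async def main():
    # Warn about steps longer than 50ms instead of the default 100ms
    asyncio.get_running_loop().slow_callback_duration = 0.05
    await your_async_function()

asyncio.run(main(), debug=True)
```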
## Performance Optimization Strategies
### Optimization Workflow
```python
# ✅ CORRECT: Systematic optimization approach
# 1. Profile to find bottleneck
import cProfile
import pstats
cProfile.run('main()', 'profile_stats')
# 2. Analyze results
stats = pstats.Stats('profile_stats')
stats.sort_stats('cumulative')
stats.print_stats(10)  # Focus on top 10
# 3. Identify specific slow function
def slow_function(data):
    # Original implementation
    result = []
    for item in data:
        if is_valid(item):
            result.append(transform(item))
    return result
# 4. Create benchmark
import timeit
data = create_test_data(10000)
time_taken = timeit.timeit(
    lambda: slow_function(data),
    number=100
)
print(f"Average time: {time_taken / 100:.4f}s")  # Baseline: 0.1234s
# 5. Optimize
def optimized_function(data):
    # Use list comprehension (faster)
    return [transform(item) for item in data if is_valid(item)]
# 6. Benchmark again
time_taken = timeit.timeit(
    lambda: optimized_function(data),
    number=100
)
print(f"Average time: {time_taken / 100:.4f}s")  # 0.0789s - 36% faster!
# 7. Verify correctness
assert slow_function(data) == optimized_function(data)
# 8. Re-profile entire program to verify improvement
cProfile.run('main()', 'profile_stats_optimized')
```
**Why this matters**: Without profiling, you might optimize code that takes 1% of runtime, ignoring the 90% bottleneck. Always measure.
### Common Optimizations
```python
import re

# ❌ WRONG: Repeated expensive operations
def process_items(items):
    for item in items:
        # Regex compiled every iteration!
        pattern = re.compile(r'\d+')
        match = pattern.search(item)
# ✅ CORRECT: Move expensive operations outside loop
def process_items(items):
    pattern = re.compile(r'\d+')  # Compile once
    for item in items:
        match = pattern.search(item)
# ❌ WRONG: Growing list with repeated concatenation
def build_large_list():
    result = []
    for i in range(100000):
        result = result + [i]  # Creates new list each time! O(n²)
# ✅ CORRECT: Use append
def build_large_list():
    result = []
    for i in range(100000):
        result.append(i)  # O(n)
# ❌ WRONG: Checking membership in list
def filter_items(items, blacklist):
    return [item for item in items if item not in blacklist]
# O(n * m) if blacklist is a list
# ✅ CORRECT: Use set for membership checks
def filter_items(items, blacklist):
    blacklist_set = set(blacklist)  # O(m)
    return [item for item in items if item not in blacklist_set]
# O(n) for iteration + O(1) per lookup = O(n)
```
### Caching Results
```python
from functools import lru_cache
# ❌ WRONG: Recomputing expensive results
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n-1) + fibonacci(n-2)
# O(2^n) - recalculates same values repeatedly
# ✅ CORRECT: Cache results
@lru_cache(maxsize=None)
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n-1) + fibonacci(n-2)
# O(n) - each value computed once
# ✅ CORRECT: Custom caching for unhashable arguments
import hashlib
from functools import wraps
def cache_dataframe_results(func):
    cache = {}
    @wraps(func)
    def wrapper(df):
        # Use hash of dataframe content as key
        key = hashlib.md5(df.to_csv(index=False).encode()).hexdigest()
        if key not in cache:
            cache[key] = func(df)
        return cache[key]
    return wrapper
@cache_dataframe_results
def expensive_dataframe_operation(df):
    # Complex computation
    return df.groupby('category').agg({'value': 'sum'})
```
## Systematic Diagnosis
### Performance Degradation Diagnosis
```python
# ✅ CORRECT: Diagnose performance regression
import cProfile
import pstats
def diagnose_slowdown():
    """Compare current vs baseline performance."""
    # Profile current code
    cProfile.run('main()', 'current_profile.prof')
    # Load baseline profile (from git history or previous run)
    # git show main:profile.prof > baseline_profile.prof
    current = pstats.Stats('current_profile.prof')
    baseline = pstats.Stats('baseline_profile.prof')
    print("=== CURRENT ===")
    current.sort_stats('cumulative')
    current.print_stats(10)
    print("\n=== BASELINE ===")
    baseline.sort_stats('cumulative')
    baseline.print_stats(10)
    # Look for functions that got slower
    # Compare cumulative times
```
### Memory Leak Diagnosis
```python
# ✅ CORRECT: Systematic memory leak detection
import tracemalloc
import gc
def diagnose_memory_leak():
    """Run function multiple times and check memory growth."""
    gc.collect()
    tracemalloc.start()
    # Baseline
    snapshot1 = tracemalloc.take_snapshot()
    # Run 100 times
    for _ in range(100):
        potentially_leaky_function()
    gc.collect()
    # Check memory
    snapshot2 = tracemalloc.take_snapshot()
    top_stats = snapshot2.compare_to(snapshot1, 'lineno')
    print("Top 10 memory allocations:")
    for stat in top_stats[:10]:
        print(f"{stat.traceback}: +{stat.size_diff / 1024:.1f} KB")
    tracemalloc.stop()
```
### I/O vs CPU Bound Diagnosis
```python
# ✅ CORRECT: Determine if I/O or CPU bound
import time

def diagnose_bottleneck():
    """Determine if program is I/O or CPU bound."""
    # Measure wall-clock and CPU time around the same run
    start_wall = time.perf_counter()
    start_cpu = time.process_time()
    main()
    wall_time = time.perf_counter() - start_wall
    cpu_time = time.process_time() - start_cpu
    print(f"Wall time: {wall_time:.2f}s")
    print(f"CPU time: {cpu_time:.2f}s")
    if cpu_time / wall_time > 0.9:
        print("CPU bound - optimize computation")
        # Consider: Cython, NumPy, multiprocessing
    else:
        print("I/O bound - optimize I/O")
        # Consider: async/await, caching, batching
```
## Common Bottlenecks and Solutions
### String Concatenation
```python
# ❌ WRONG: String concatenation in loop
def build_string(items):
    result = ""
    for item in items:
        result += str(item) + "\n"  # Creates new string each time
    return result
# O(n²) time complexity
# ✅ CORRECT: Use join
def build_string(items):
    return "\n".join(str(item) for item in items)
# O(n) time complexity
# Benchmark:
# 1000 items: 0.0015s (join) vs 0.0234s (concatenation) - 15x faster
# 10000 items: 0.015s (join) vs 2.341s (concatenation) - 156x faster
```
### List Comprehension vs Map/Filter
```python
import timeit
# ✅ CORRECT: List comprehension (usually fastest)
def with_list_comp(data):
    return [x * 2 for x in data if x > 0]
# ✅ CORRECT: Generator (memory efficient for large data)
def with_generator(data):
    return (x * 2 for x in data if x > 0)
# Map/filter (sometimes faster for simple operations)
def with_map_filter(data):
    return map(lambda x: x * 2, filter(lambda x: x > 0, data))
# Benchmark
data = list(range(1000000))
print(timeit.timeit(lambda: list(with_list_comp(data)), number=10))
print(timeit.timeit(lambda: list(with_generator(data)), number=10))
print(timeit.timeit(lambda: list(with_map_filter(data)), number=10))
# Results: List comprehension usually fastest for complex logic
# Generator best when you don't need all results at once
```
### Dictionary Lookups vs List Searches
```python
# ❌ WRONG: Searching in list
def find_users_list(user_ids, all_users_list):
    results = []
    for user_id in user_ids:
        for user in all_users_list:  # O(n) per lookup
            if user['id'] == user_id:
                results.append(user)
                break
    return results
# O(n * m) time complexity
# ✅ CORRECT: Use dictionary
def find_users_dict(user_ids, all_users_dict):
    return [all_users_dict[uid] for uid in user_ids if uid in all_users_dict]
# O(n) time complexity
# Benchmark:
# 1000 lookups in 10000 items:
# List: 1.234s
# Dict: 0.001s - 1234x faster!
```
### DataFrame Iteration Anti-Pattern
```python
import pandas as pd

# ❌ WRONG: Iterating over DataFrame rows
def process_rows_iterrows(df):
    results = []
    for idx, row in df.iterrows():  # VERY SLOW
        if row['value'] > 0:
            results.append(row['value'] * 2)
    return results
# ✅ CORRECT: Vectorized operations
def process_rows_vectorized(df):
    mask = df['value'] > 0
    return (df.loc[mask, 'value'] * 2).tolist()
# Benchmark with 100,000 rows:
# iterrows: 15.234s
# vectorized: 0.015s - 1000x faster!
```
## Profiling Tools Comparison
### When to Use Which Tool
| Tool | Use Case | Output |
|------|----------|--------|
| cProfile | Function-level CPU profiling | Which functions take most time |
| line_profiler | Line-level CPU profiling | Which lines within function slow |
| memory_profiler | Line-level memory profiling | Memory usage per line |
| tracemalloc | Memory allocation tracking | Where memory allocated |
| yappi | Async/multithreaded profiling | Profile concurrent code |
| py-spy | Sampling profiler (no code changes) | Profile running processes |
| scalene | CPU+GPU+memory profiling | Comprehensive profiling |
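scalene is the one tool in the table without an example elsewhere in this skill; a minimal invocation sketch (output format varies by version):
```bash
# Install: pip install scalene
# Line-level CPU + memory (and GPU, where available) profiling
scalene script.py
# Or via the module entry point
python -m scalene script.py
```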
### py-spy for Production Profiling
```bash
# Install: pip install py-spy
# Profile running process (no code changes needed!)
py-spy record -o profile.svg --pid 12345
# Profile for 60 seconds
py-spy record -o profile.svg --duration 60 -- python script.py
# Top-like view of running process
py-spy top --pid 12345
# Why use py-spy:
# - No code changes needed
# - Minimal overhead
# - Can attach to running process
# - Great for production debugging
```
## Anti-Patterns
### Premature Optimization
```python
# ❌ WRONG: Optimizing before measuring
def process_data(data):
    # "Let me make this fast with complex caching..."
    # Spend hours optimizing a function that takes 0.1% of runtime
    ...
# ✅ CORRECT: Profile first
cProfile.run('main()', 'profile.prof')
# Oh, process_data only takes 0.1% of time
# The real bottleneck is database queries (90% of time)
# Optimize database queries instead!
```
### Micro-Optimizations
```python
# ❌ WRONG: Micro-optimizing at expense of readability
def calculate(x, y):
    # "Using bit shift instead of multiply by 2 for speed!"
    return (x << 1) + (y << 1)
# Saved: ~0.0000001 seconds per call
# Cost: Unreadable code
# ✅ CORRECT: Clear code first
def calculate(x, y):
    return 2 * x + 2 * y
# The difference is negligible in practice
# Only optimize if the profiler shows this is the bottleneck
```
### Not Benchmarking Changes
```python
# ❌ WRONG: Assuming optimization worked
def slow_function():
    # Original code
    pass
def optimized_function():
    # "Optimized" code
    pass
# Assume optimized_function is faster without measuring
# ✅ CORRECT: Benchmark before and after
import timeit
before = timeit.timeit(slow_function, number=1000)
after = timeit.timeit(optimized_function, number=1000)
print(f"Before: {before:.4f}s")
print(f"After: {after:.4f}s")
print(f"Speedup: {before/after:.2f}x")
# Verify correctness
assert slow_function() == optimized_function()
```
## Decision Trees
### What Tool to Use for Profiling?
```
What do I need to profile?
├─ CPU time
│ ├─ Function-level → cProfile
│ ├─ Line-level → line_profiler
│ └─ Async code → yappi
├─ Memory usage
│ ├─ Line-level → memory_profiler
│ ├─ Allocation tracking → tracemalloc
│ └─ Object types → objgraph
└─ Running process (no code changes) → py-spy
```
### Optimization Strategy
```
Is code slow?
├─ Yes → Profile to find bottleneck
│ ├─ CPU bound → Profile with cProfile
│ │ └─ Optimize hot functions (vectorize, cache, algorithms)
│ └─ I/O bound → Profile with timing
│ └─ Use async/await, caching, batching
└─ No → Don't optimize (focus on features/correctness)
```
### Memory Issue Diagnosis
```
Is memory usage high?
├─ Yes → Profile with memory_profiler
│ ├─ Growing over time → Memory leak
│ │ └─ Use tracemalloc to find leak
│ └─ High but stable → Large data structures
│ └─ Optimize data structures (generators, efficient types)
└─ No → Monitor but don't optimize yet
```
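To make the last branch concrete, a quick size comparison of a list versus a generator over the same data (sizes are approximate and platform-dependent):
```python
import sys

nums_list = [i * i for i in range(1_000_000)]  # Materializes every value
nums_gen = (i * i for i in range(1_000_000))   # Produces values on demand

print(sys.getsizeof(nums_list))  # ~8 MB just for the list's pointer array
print(sys.getsizeof(nums_gen))   # ~200 bytes regardless of range size
```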
## Integration with Other Skills
**After using this skill:**
- If I/O bound → See @async-patterns-and-concurrency for async optimization
- If data processing slow → See @scientific-computing-foundations for vectorization
- If need to track improvements → See @ml-engineering-workflows for metrics
**Before using this skill:**
- If unsure code is slow → Use this skill to profile and confirm!
- If setting up profiling → See @project-structure-and-tooling for dependencies
## Quick Reference
### Essential Profiling Commands
```python
# CPU profiling
import cProfile
cProfile.run('main()', 'profile.prof')
# View results
import pstats
stats = pstats.Stats('profile.prof')
stats.sort_stats('cumulative')
stats.print_stats(20)
# Memory profiling
import tracemalloc
tracemalloc.start()
# ... code ...
snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('lineno')
for stat in top_stats[:10]:
    print(stat)
```
### Debugging Commands
```python
# Set breakpoint
breakpoint() # Python 3.7+
# or
import pdb; pdb.set_trace()
# pdb commands:
# n - next line
# s - step into
# c - continue
# p var - print variable
# l - list code
# w - where am I
# q - quit
```
### Optimization Checklist
- [ ] Profile before optimizing (use cProfile)
- [ ] Identify bottleneck (top 20% of time)
- [ ] Create benchmark for bottleneck
- [ ] Optimize bottleneck
- [ ] Benchmark again to verify improvement
- [ ] Re-profile entire program
- [ ] Verify correctness (tests still pass)
### Common Optimizations
| Problem | Solution | Speedup |
|---------|----------|---------|
| String concatenation in loop | Use str.join() | 10-100x |
| List membership checks | Use set | 100-1000x |
| DataFrame iteration | Vectorize with NumPy/pandas | 100-1000x |
| Repeated expensive computation | Cache with @lru_cache | ∞ (depends on cache hits) |
| I/O bound | Use async/await | 10-100x |
| CPU bound with parallelizable work | Use multiprocessing | ~number of cores |
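
For the last row of the table, a minimal multiprocessing sketch (the workload function and inputs are illustrative):
```python
from multiprocessing import Pool

def cpu_heavy(n: int) -> int:
    # Stand-in for a CPU-bound computation
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    inputs = [200_000] * 8
    with Pool() as pool:  # One worker per CPU core by default
        results = pool.map(cpu_heavy, inputs)
    print(results[:3])
```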
### Red Flags
If you find yourself:
- Optimizing before profiling → STOP, profile first
- Spending hours on micro-optimizations → Check if it's bottleneck
- Making code unreadable for speed → Benchmark the benefit
- Assuming what's slow → Profile to verify
**Always measure. Never assume.**