# Debugging and Profiling

## Overview

**Core Principle:** Profile before optimizing. Humans are terrible at guessing where code is slow. Always measure before making changes.

Python debugging and profiling enable systematic problem diagnosis and performance optimization. Use debugpy/pdb for step-through debugging, cProfile for CPU profiling, and memory_profiler for memory analysis. The biggest mistake: optimizing code without profiling first—you'll likely optimize the wrong thing.

## When to Use

**Use this skill when:**

- "Code is slow"
- "How to profile Python?"
- "Memory leak"
- "Debugging not working"
- "Find bottleneck"
- "Optimize performance"
- "Step through code"
- "Where is my code spending time?"

**Don't use when:**

- Setting up a project (use project-structure-and-tooling)
- Already know what to optimize (but still profile to verify!)
- Algorithm selection (different skill domain)

**Symptoms triggering this skill:**

- Code runs slower than expected
- Memory usage growing over time
- Need to understand execution flow
- Performance degraded after changes

## Debugging Fundamentals

### Using debugpy with VS Code

```python
# ✅ CORRECT: debugpy for remote debugging
import debugpy

# Allow VS Code to attach
debugpy.listen(5678)
print("Waiting for debugger to attach...")
debugpy.wait_for_client()

# Your code here
def process_data(data):
    result = []
    for item in data:
        # Set breakpoint in VS Code on this line
        transformed = transform(item)
        result.append(transformed)
    return result

# VS Code launch.json configuration:
"""
{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python: Attach",
            "type": "python",
            "request": "attach",
            "connect": {
                "host": "localhost",
                "port": 5678
            }
        }
    ]
}
"""
```

### Using pdb (Python Debugger)

```python
# ✅ CORRECT: pdb for interactive debugging
import pdb

def buggy_function(data):
    result = []
    for i, item in enumerate(data):
        # Drop into debugger
        pdb.set_trace()  # Or: breakpoint() in Python 3.7+
        processed = item * 2
        result.append(processed)
    return result

# pdb commands:
# n (next): Execute next line
# s (step): Step into function
# c (continue): Continue execution
# p variable: Print variable
# pp variable: Pretty-print variable
# l (list): Show current location in code
# w (where): Show stack trace
# q (quit): Quit debugger
```

### Conditional Breakpoints

```python
# ❌ WRONG: Breaking on every iteration
def process_items(items):
    for item in items:
        pdb.set_trace()  # Breaks 10000 times!
        process(item)

# ✅ CORRECT: Conditional breakpoint
def process_items(items):
    for i, item in enumerate(items):
        if i == 5000:  # Only break on specific iteration
            breakpoint()
        process(item)

# ✅ BETTER: Break only when the data is problematic
def process_items(items):
    for item in items:
        if item.value < 0:  # Break only when problematic
            breakpoint()
        process(item)
```

### Post-Mortem Debugging

```python
# ✅ CORRECT: Debug after exception
import pdb

def main():
    try:
        # Code that might raise exception
        result = risky_operation()
    except Exception:
        # Drop into debugger at exception point
        pdb.post_mortem()

# ✅ CORRECT: Auto post-mortem for unhandled exceptions
import sys

def custom_excepthook(exc_type, exc_value, exc_traceback):
    pdb.post_mortem(exc_traceback)

sys.excepthook = custom_excepthook
# Now unhandled exceptions drop into pdb automatically
```

**Why this matters**: Breakpoints let you inspect state at the exact point of failure. Conditional breakpoints avoid noise. Post-mortem debugging examines crashes after the fact.
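### Controlling breakpoint() from the Environment

The built-in `breakpoint()` hook can also be toggled without editing code via the `PYTHONBREAKPOINT` environment variable (PEP 553). A minimal sketch, reusing `process_items` from above; the `ipdb` alternative assumes that package is installed:

```python
# breakpoint() respects the PYTHONBREAKPOINT environment variable (PEP 553):
#
#   PYTHONBREAKPOINT=0 python script.py               # silently skip every breakpoint()
#   PYTHONBREAKPOINT=ipdb.set_trace python script.py  # use ipdb instead of pdb (if installed)

def process_items(items):
    for item in items:
        breakpoint()  # becomes a no-op when PYTHONBREAKPOINT=0
        process(item)
```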
## CPU Profiling

### cProfile for Function-Level Profiling

```python
import cProfile
import pstats

# ❌ WRONG: Guessing which function is slow
def slow_program():
    # "I think this loop is the problem..."
    for i in range(1000):
        process_data(i)

# ✅ CORRECT: Profile to find the actual bottleneck
def slow_program():
    for i in range(1000):
        process_data(i)

# Profile the function
cProfile.run('slow_program()', 'profile_stats')

# Analyze results
stats = pstats.Stats('profile_stats')
stats.strip_dirs()
stats.sort_stats('cumulative')
stats.print_stats(20)  # Top 20 functions by cumulative time

# ✅ CORRECT: Profile with a context manager
from contextlib import contextmanager

@contextmanager
def profiled():
    pr = cProfile.Profile()
    pr.enable()
    yield
    pr.disable()
    stats = pstats.Stats(pr)
    stats.strip_dirs()
    stats.sort_stats('cumulative')
    stats.print_stats(20)

# Usage
with profiled():
    slow_program()
```

### Profiling Specific Code Blocks

```python
# ✅ CORRECT: Profile a specific section
import cProfile

pr = cProfile.Profile()

# Normal code
setup_data()

# Profile this section
pr.enable()
expensive_operation()
pr.disable()

# More normal code
cleanup()

# View results
pr.print_stats(sort='cumulative')
```

### Line-Level Profiling with line_profiler

```python
# Install: pip install line_profiler

# ✅ CORRECT: Line-by-line profiling
from line_profiler import LineProfiler

@profile  # @profile is injected by kernprof; remove it when profiling programmatically
def slow_function():
    total = 0
    for i in range(10000):
        total += i ** 2
    return total

# Run with kernprof:
# kernprof -l -v script.py

# Or programmatically:
lp = LineProfiler()
lp.add_function(slow_function)
lp.enable()
slow_function()
lp.disable()
lp.print_stats()

# Output shows time spent per line:
# Line #   Hits     Time  Per Hit  % Time  Line Contents
# ==============================================================
#      1                                   def slow_function():
#      2      1      2.0      2.0     0.0      total = 0
#      3  10001  15234.0      1.5    20.0      for i in range(10000):
#      4  10000  60123.0      6.0    80.0          total += i ** 2
#      5      1      1.0      1.0     0.0      return total
```

**Why this matters**: cProfile shows which functions are slow. line_profiler shows which lines within those functions are slow. Both are essential for optimization.
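### Reading cProfile Output: tottime vs cumulative

When reading cProfile output, sort by both `tottime` (time spent inside the function itself) and `cumulative` (the function plus everything it calls); a function that ranks high on `cumulative` but low on `tottime` is usually just a caller of the real hot spot. A small sketch using the `profile_stats` file and `process_data` example from above:

```python
import pstats

stats = pstats.Stats('profile_stats')
stats.strip_dirs()

# Self time only: where the CPU actually burns cycles
stats.sort_stats('tottime')
stats.print_stats(10)

# Inclusive time: which call chains are expensive end to end
stats.sort_stats('cumulative')
stats.print_stats(10)

# Who calls a suspicious function? (the string is treated as a regex filter)
stats.print_callers('process_data')
```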
### Visualizing Profiles with SnakeViz

```bash
# Install: pip install snakeviz

# Profile code
python -m cProfile -o program.prof script.py

# Visualize
snakeviz program.prof

# Opens browser with interactive visualization:
# - Sunburst chart showing call hierarchy
# - Icicle chart showing time distribution
# - Click functions to zoom in
```

## Memory Profiling

### Memory Usage with memory_profiler

```python
# Install: pip install memory_profiler
from memory_profiler import profile

# ✅ CORRECT: Track memory usage per line
@profile
def memory_hungry_function():
    # Line-by-line memory usage shown
    big_list = [i for i in range(1000000)]
    big_dict = {i: i**2 for i in range(1000000)}
    return len(big_list), len(big_dict)

# Run with:
# python -m memory_profiler script.py

# Output:
# Line #    Mem usage    Increment   Line Contents
# ================================================
#      3     38.3 MiB     38.3 MiB   @profile
#      4                             def memory_hungry_function():
#      5     45.2 MiB      6.9 MiB       big_list = [i for i in range(1000000)]
#      6     83.1 MiB     37.9 MiB       big_dict = {i: i**2 for i in range(1000000)}
#      7     83.1 MiB      0.0 MiB       return len(big_list), len(big_dict)
```

### Finding Memory Leaks

```python
# ✅ CORRECT: Detect memory leaks with tracemalloc
import tracemalloc

# Start tracing
tracemalloc.start()

# Take snapshot before
snapshot1 = tracemalloc.take_snapshot()

# Run code that might leak
problematic_function()

# Take snapshot after
snapshot2 = tracemalloc.take_snapshot()

# Compare snapshots
top_stats = snapshot2.compare_to(snapshot1, 'lineno')

print("Top 10 memory increases:")
for stat in top_stats[:10]:
    print(stat)

tracemalloc.stop()

# ✅ CORRECT: Track specific objects
import gc
import sys

def find_memory_leak():
    # Force garbage collection
    gc.collect()

    # Track objects before
    before = len(gc.get_objects())

    # Run potentially leaky code
    for _ in range(100):
        leaky_operation()

    # Force GC again
    gc.collect()

    # Track objects after
    after = len(gc.get_objects())

    if after > before:
        print(f"Potential leak: {after - before} objects not collected")

        # Find what's keeping objects alive
        for obj in gc.get_objects():
            if isinstance(obj, MyClass):  # Suspect class
                print(f"Found {type(obj)}: {sys.getrefcount(obj)} references")
                print(gc.get_referrers(obj))
```

### Profiling Memory with objgraph

```python
# Install: pip install objgraph
import objgraph

# ✅ CORRECT: Find most common objects
def analyze_memory():
    objgraph.show_most_common_types()
    # Output:
    # dict       12453
    # function    8234
    # list        6789
    # ...

# ✅ CORRECT: Track object growth
objgraph.show_growth()
potentially_leaky_function()
objgraph.show_growth()  # Shows objects that increased

# ✅ CORRECT: Visualize object references
objgraph.show_refs([my_object], filename='refs.png')
# Creates a graph showing what references my_object
```

**Why this matters**: Memory leaks cause gradual performance degradation. tracemalloc and memory_profiler help find exactly where memory is allocated.
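### Deeper Allocation Tracebacks with tracemalloc

By default tracemalloc records only one stack frame per allocation, so the comparison above often points at a generic helper rather than your own code. Passing a frame count to `tracemalloc.start()` and grouping by `'traceback'` shows the full allocation path. A minimal sketch, reusing `problematic_function` from the leak example:

```python
import tracemalloc

tracemalloc.start(25)  # keep up to 25 frames per allocation

problematic_function()

snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('traceback')  # group by full traceback, not just line number

# Show the biggest allocation site with its full call stack
stat = top_stats[0]
print(f"{stat.count} blocks, {stat.size / 1024:.1f} KiB")
for line in stat.traceback.format():
    print(line)

tracemalloc.stop()
```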
## Profiling Async Code

### Profiling Async Functions

```python
import asyncio
import cProfile
import pstats

# ❌ WRONG: cProfile doesn't work well with async
async def slow_async():
    await asyncio.sleep(1)
    await process_data()

cProfile.run('asyncio.run(slow_async())')  # Misleading results

# ✅ CORRECT: Use yappi for async profiling
# Install: pip install yappi
import yappi

async def slow_async():
    await asyncio.sleep(1)
    await process_data()

yappi.set_clock_type("wall")  # Use wall time, not CPU time
yappi.start()
asyncio.run(slow_async())
yappi.stop()

# Print stats
stats = yappi.get_func_stats()
stats.sort("totaltime", "desc")
stats.print_all()

# ✅ CORRECT: Profile coroutines specifically
stats = yappi.get_func_stats(filter_callback=lambda x: 'coroutine' in x.name)
stats.print_all()
```

### Detecting Blocking Code in Async

```python
# ✅ CORRECT: Detect event loop blocking
import asyncio
import time

class LoopMonitor:
    def __init__(self, threshold: float = 0.1):
        self.threshold = threshold

    async def monitor(self):
        while True:
            start = time.monotonic()
            await asyncio.sleep(0.01)  # Very short sleep
            elapsed = time.monotonic() - start

            if elapsed > self.threshold:
                print(f"WARNING: Event loop blocked for {elapsed:.3f}s")

async def main():
    # Start monitor
    monitor = LoopMonitor(threshold=0.1)
    monitor_task = asyncio.create_task(monitor.monitor())

    # Run your async code
    await your_async_function()

    monitor_task.cancel()

# ✅ CORRECT: Use asyncio debug mode
asyncio.run(main(), debug=True)  # Warns about slow callbacks (>100ms)
```
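### Tuning the Slow-Callback Threshold

In debug mode the 100 ms "slow callback" warning threshold is configurable through the running loop's `slow_callback_duration` attribute, which helps in latency-sensitive services where even 50 ms of blocking matters. A small sketch, reusing `your_async_function` from above:

```python
import asyncio

async def main():
    # Flag anything that blocks the loop for more than 50 ms instead of the default 100 ms
    loop = asyncio.get_running_loop()
    loop.slow_callback_duration = 0.05  # seconds; only takes effect in debug mode

    await your_async_function()

asyncio.run(main(), debug=True)
```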
## Performance Optimization Strategies

### Optimization Workflow

```python
# ✅ CORRECT: Systematic optimization approach

# 1. Profile to find the bottleneck
import cProfile
import pstats

cProfile.run('main()', 'profile_stats')

# 2. Analyze results
stats = pstats.Stats('profile_stats')
stats.sort_stats('cumulative')
stats.print_stats(10)  # Focus on top 10

# 3. Identify the specific slow function
def slow_function(data):
    # Original implementation
    result = []
    for item in data:
        if is_valid(item):
            result.append(transform(item))
    return result

# 4. Create a benchmark
import timeit

data = create_test_data(10000)

def benchmark():
    time_taken = timeit.timeit(
        lambda: slow_function(data),
        number=100
    )
    print(f"Average time: {time_taken / 100:.4f}s")

benchmark()  # Baseline: 0.1234s

# 5. Optimize
def optimized_function(data):
    # Use list comprehension (faster)
    return [transform(item) for item in data if is_valid(item)]

# 6. Benchmark again
time_taken = timeit.timeit(
    lambda: optimized_function(data),
    number=100
)
print(f"Average time: {time_taken / 100:.4f}s")  # 0.0789s - 36% faster!

# 7. Verify correctness
assert slow_function(data) == optimized_function(data)

# 8. Re-profile the entire program to verify the improvement
cProfile.run('main()', 'profile_stats_optimized')
```

**Why this matters**: Without profiling, you might optimize code that takes 1% of runtime, ignoring the 90% bottleneck. Always measure.

### Common Optimizations

```python
import re

# ❌ WRONG: Repeated expensive operations
def process_items(items):
    for item in items:
        # Regex compiled every iteration!
        pattern = re.compile(r'\d+')
        match = pattern.search(item)

# ✅ CORRECT: Move expensive operations outside the loop
def process_items(items):
    pattern = re.compile(r'\d+')  # Compile once
    for item in items:
        match = pattern.search(item)

# ❌ WRONG: Growing a list with repeated concatenation
def build_large_list():
    result = []
    for i in range(100000):
        result = result + [i]  # Creates a new list each time! O(n²)

# ✅ CORRECT: Use append
def build_large_list():
    result = []
    for i in range(100000):
        result.append(i)  # O(n)

# ❌ WRONG: Checking membership in a list
def filter_items(items, blacklist):
    return [item for item in items if item not in blacklist]
    # O(n * m) if blacklist is a list

# ✅ CORRECT: Use a set for membership checks
def filter_items(items, blacklist):
    blacklist_set = set(blacklist)  # O(m)
    return [item for item in items if item not in blacklist_set]
    # O(n) iteration with O(1) lookups = O(n)
```

### Caching Results

```python
from functools import lru_cache

# ❌ WRONG: Recomputing expensive results
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n-1) + fibonacci(n-2)
# O(2^n) - recalculates the same values repeatedly

# ✅ CORRECT: Cache results
@lru_cache(maxsize=None)
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n-1) + fibonacci(n-2)
# O(n) - each value computed once

# ✅ CORRECT: Custom caching for unhashable arguments
import hashlib
from functools import wraps

def cache_dataframe_results(func):
    cache = {}

    @wraps(func)
    def wrapper(df):
        # Use a hash of the dataframe content as the key
        key = hashlib.md5(df.to_csv(index=False).encode()).hexdigest()
        if key not in cache:
            cache[key] = func(df)
        return cache[key]

    return wrapper

@cache_dataframe_results
def expensive_dataframe_operation(df):
    # Complex computation
    return df.groupby('category').agg({'value': 'sum'})
```

## Systematic Diagnosis

### Performance Degradation Diagnosis

```python
# ✅ CORRECT: Diagnose a performance regression
import cProfile
import pstats

def diagnose_slowdown():
    """Compare current vs baseline performance."""
    # Profile current code
    cProfile.run('main()', 'current_profile.prof')

    # Load baseline profile (from git history or a previous run)
    # git show main:profile.prof > baseline_profile.prof

    current = pstats.Stats('current_profile.prof')
    baseline = pstats.Stats('baseline_profile.prof')

    print("=== CURRENT ===")
    current.sort_stats('cumulative')
    current.print_stats(10)

    print("\n=== BASELINE ===")
    baseline.sort_stats('cumulative')
    baseline.print_stats(10)

    # Look for functions that got slower
    # Compare cumulative times
```

### Memory Leak Diagnosis

```python
# ✅ CORRECT: Systematic memory leak detection
import tracemalloc
import gc

def diagnose_memory_leak():
    """Run the function multiple times and check memory growth."""
    gc.collect()
    tracemalloc.start()

    # Baseline
    snapshot1 = tracemalloc.take_snapshot()

    # Run 100 times
    for _ in range(100):
        potentially_leaky_function()
        gc.collect()

    # Check memory
    snapshot2 = tracemalloc.take_snapshot()
    top_stats = snapshot2.compare_to(snapshot1, 'lineno')

    print("Top 10 memory allocations:")
    for stat in top_stats[:10]:
        print(f"{stat.traceback}: +{stat.size_diff / 1024:.1f} KB")

    tracemalloc.stop()
```

### I/O vs CPU Bound Diagnosis

```python
# ✅ CORRECT: Determine whether the program is I/O or CPU bound
import time

def diagnose_bottleneck():
    """Determine whether the program is I/O or CPU bound."""
    # Wall-clock time
    start_wall = time.time()
    main()
    wall_time = time.time() - start_wall

    # CPU time
    start_cpu = time.process_time()
    main()
    cpu_time = time.process_time() - start_cpu

    print(f"Wall time: {wall_time:.2f}s")
    print(f"CPU time: {cpu_time:.2f}s")

    if cpu_time / wall_time > 0.9:
        print("CPU bound - optimize computation")
        # Consider: Cython, NumPy, multiprocessing
    else:
        print("I/O bound - optimize I/O")
        # Consider: async/await, caching, batching
```
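### Acting on a CPU-Bound Diagnosis: multiprocessing

When the diagnosis above says "CPU bound" and the hot loop is parallelizable, a process pool is often the simplest next step because it sidesteps the GIL. A minimal sketch, assuming `transform` (from the earlier examples) is a pure, picklable, module-level function and `create_test_data` exists as above:

```python
from concurrent.futures import ProcessPoolExecutor

def process_serial(items):
    return [transform(item) for item in items]

def process_parallel(items, workers=4):
    # Each worker process runs transform() independently, so CPU-bound work
    # scales with core count (at the cost of pickling items between processes).
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(transform, items, chunksize=256))

if __name__ == "__main__":  # required for multiprocessing on spawn-based platforms
    data = create_test_data(10000)
    assert process_serial(data) == process_parallel(data)  # verify correctness first
    # ...then benchmark both with timeit before committing to the extra complexity
```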
## Common Bottlenecks and Solutions

### String Concatenation

```python
# ❌ WRONG: String concatenation in a loop
def build_string(items):
    result = ""
    for item in items:
        result += str(item) + "\n"  # Creates a new string each time
    return result
# O(n²) time complexity

# ✅ CORRECT: Use join
def build_string(items):
    return "\n".join(str(item) for item in items)
# O(n) time complexity

# Benchmark:
# 1000 items:  0.0015s (join) vs 0.0234s (concatenation) - 15x faster
# 10000 items: 0.015s (join) vs 2.341s (concatenation) - 156x faster
```

### List Comprehension vs Map/Filter

```python
import timeit

# ✅ CORRECT: List comprehension (usually fastest)
def with_list_comp(data):
    return [x * 2 for x in data if x > 0]

# ✅ CORRECT: Generator (memory efficient for large data)
def with_generator(data):
    return (x * 2 for x in data if x > 0)

# Map/filter (sometimes faster for simple operations)
def with_map_filter(data):
    return map(lambda x: x * 2, filter(lambda x: x > 0, data))

# Benchmark
data = list(range(1000000))

print(timeit.timeit(lambda: list(with_list_comp(data)), number=10))
print(timeit.timeit(lambda: list(with_generator(data)), number=10))
print(timeit.timeit(lambda: list(with_map_filter(data)), number=10))

# Results: List comprehension usually fastest for complex logic
# Generator best when you don't need all results at once
```

### Dictionary Lookups vs List Searches

```python
# ❌ WRONG: Searching in a list
def find_users_list(user_ids, all_users_list):
    results = []
    for user_id in user_ids:
        for user in all_users_list:  # O(n) per lookup
            if user['id'] == user_id:
                results.append(user)
                break
    return results
# O(n * m) time complexity

# ✅ CORRECT: Use a dictionary
def find_users_dict(user_ids, all_users_dict):
    return [all_users_dict[uid] for uid in user_ids if uid in all_users_dict]
# O(n) time complexity

# Benchmark:
# 1000 lookups in 10000 items:
# List: 1.234s
# Dict: 0.001s - 1234x faster!
```

### DataFrame Iteration Anti-Pattern

```python
import pandas as pd
import numpy as np

# ❌ WRONG: Iterating over DataFrame rows
def process_rows_iterrows(df):
    results = []
    for idx, row in df.iterrows():  # VERY SLOW
        if row['value'] > 0:
            results.append(row['value'] * 2)
    return results

# ✅ CORRECT: Vectorized operations
def process_rows_vectorized(df):
    mask = df['value'] > 0
    return (df.loc[mask, 'value'] * 2).tolist()

# Benchmark with 100,000 rows:
# iterrows:   15.234s
# vectorized:  0.015s - 1000x faster!
```
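### Queues: list.pop(0) vs collections.deque

Another bottleneck in the same family: using `list.pop(0)` as a queue, which shifts every remaining element and costs O(n) per pop. `collections.deque` gives O(1) appends and pops at both ends. A minimal sketch; `handle` is a hypothetical placeholder for whatever consumes each task:

```python
from collections import deque

# ❌ WRONG: list as a FIFO queue - pop(0) shifts all remaining elements, O(n) each
def drain_list(items):
    queue = list(items)
    while queue:
        task = queue.pop(0)
        handle(task)  # hypothetical consumer

# ✅ CORRECT: deque - popleft() is O(1)
def drain_deque(items):
    queue = deque(items)
    while queue:
        task = queue.popleft()
        handle(task)
```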
## Profiling Tools Comparison

### When to Use Which Tool

| Tool | Use Case | Output |
|------|----------|--------|
| cProfile | Function-level CPU profiling | Which functions take the most time |
| line_profiler | Line-level CPU profiling | Which lines within a function are slow |
| memory_profiler | Line-level memory profiling | Memory usage per line |
| tracemalloc | Memory allocation tracking | Where memory is allocated |
| yappi | Async/multithreaded profiling | Profiles concurrent code |
| py-spy | Sampling profiler (no code changes) | Profiles running processes |
| scalene | CPU+GPU+memory profiling | Comprehensive profiling |

### py-spy for Production Profiling

```bash
# Install: pip install py-spy

# Profile a running process (no code changes needed!)
py-spy record -o profile.svg --pid 12345

# Profile for 60 seconds
py-spy record -o profile.svg --duration 60 -- python script.py

# Top-like view of a running process
py-spy top --pid 12345

# Why use py-spy:
# - No code changes needed
# - Minimal overhead
# - Can attach to a running process
# - Great for production debugging
```
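### scalene for Combined CPU and Memory Profiling

Scalene (from the table above) covers CPU and memory in a single run and separates time spent in Python code from time in native/C extensions, which helps decide whether to vectorize or to optimize the Python itself. A minimal invocation sketch; flags vary between versions, so check `scalene --help`:

```bash
# Install: pip install scalene

# Profile CPU, memory, and (where available) GPU in one pass
scalene script.py
```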
## Anti-Patterns

### Premature Optimization

```python
# ❌ WRONG: Optimizing before measuring
def process_data(data):
    # "Let me make this fast with complex caching..."
    # Spend hours optimizing a function that takes 0.1% of runtime
    ...

# ✅ CORRECT: Profile first
cProfile.run('main()', 'profile.prof')
# Oh, process_data only takes 0.1% of the time
# The real bottleneck is database queries (90% of the time)
# Optimize the database queries instead!
```

### Micro-Optimizations

```python
# ❌ WRONG: Micro-optimizing at the expense of readability
def calculate(x, y):
    # "Using bit shift instead of multiply by 2 for speed!"
    return (x << 1) + (y << 1)
# Saved: ~0.0000001 seconds per call
# Cost: unreadable code

# ✅ CORRECT: Clear code first
def calculate(x, y):
    return 2 * x + 2 * y
# The difference is negligible in practice
# Only optimize if the profiler shows this is the bottleneck
```

### Not Benchmarking Changes

```python
# ❌ WRONG: Assuming an optimization worked
def slow_function():
    # Original code
    pass

def optimized_function():
    # "Optimized" code
    pass

# Assuming optimized_function is faster without measuring

# ✅ CORRECT: Benchmark before and after
import timeit

before = timeit.timeit(slow_function, number=1000)
after = timeit.timeit(optimized_function, number=1000)

print(f"Before: {before:.4f}s")
print(f"After: {after:.4f}s")
print(f"Speedup: {before/after:.2f}x")

# Verify correctness
assert slow_function() == optimized_function()
```

## Decision Trees

### What Tool to Use for Profiling?

```
What do I need to profile?
├─ CPU time
│   ├─ Function-level → cProfile
│   ├─ Line-level → line_profiler
│   └─ Async code → yappi
├─ Memory usage
│   ├─ Line-level → memory_profiler
│   ├─ Allocation tracking → tracemalloc
│   └─ Object types → objgraph
└─ Running process (no code changes) → py-spy
```

### Optimization Strategy

```
Is code slow?
├─ Yes → Profile to find bottleneck
│   ├─ CPU bound → Profile with cProfile
│   │   └─ Optimize hot functions (vectorize, cache, better algorithms)
│   └─ I/O bound → Profile with timing
│       └─ Use async/await, caching, batching
└─ No → Don't optimize (focus on features/correctness)
```

### Memory Issue Diagnosis

```
Is memory usage high?
├─ Yes → Profile with memory_profiler
│   ├─ Growing over time → Memory leak
│   │   └─ Use tracemalloc to find the leak
│   └─ High but stable → Large data structures
│       └─ Optimize data structures (generators, efficient types)
└─ No → Monitor but don't optimize yet
```

## Integration with Other Skills

**After using this skill:**

- If I/O bound → See @async-patterns-and-concurrency for async optimization
- If data processing is slow → See @scientific-computing-foundations for vectorization
- If you need to track improvements → See @ml-engineering-workflows for metrics

**Before using this skill:**

- If unsure whether code is slow → Use this skill to profile and confirm!
- If setting up profiling → See @project-structure-and-tooling for dependencies

## Quick Reference

### Essential Profiling Commands

```python
# CPU profiling
import cProfile
cProfile.run('main()', 'profile.prof')

# View results
import pstats
stats = pstats.Stats('profile.prof')
stats.sort_stats('cumulative')
stats.print_stats(20)

# Memory profiling
import tracemalloc
tracemalloc.start()
# ... code ...
snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('lineno')
for stat in top_stats[:10]:
    print(stat)
```

### Debugging Commands

```python
# Set breakpoint
breakpoint()  # Python 3.7+
# or
import pdb; pdb.set_trace()

# pdb commands:
# n     - next line
# s     - step into
# c     - continue
# p var - print variable
# l     - list code
# w     - where am I
# q     - quit
```

### Optimization Checklist

- [ ] Profile before optimizing (use cProfile)
- [ ] Identify the bottleneck (top 20% of time)
- [ ] Create a benchmark for the bottleneck
- [ ] Optimize the bottleneck
- [ ] Benchmark again to verify the improvement
- [ ] Re-profile the entire program
- [ ] Verify correctness (tests still pass)

### Common Optimizations

| Problem | Solution | Speedup |
|---------|----------|---------|
| String concatenation in loop | Use str.join() | 10-100x |
| List membership checks | Use set | 100-1000x |
| DataFrame iteration | Vectorize with NumPy/pandas | 100-1000x |
| Repeated expensive computation | Cache with @lru_cache | Depends on cache hit rate |
| I/O bound | Use async/await | 10-100x |
| CPU bound with parallelizable work | Use multiprocessing | ~number of cores |

### Red Flags

If you find yourself:

- Optimizing before profiling → STOP, profile first
- Spending hours on micro-optimizations → Check whether it's the bottleneck
- Making code unreadable for speed → Benchmark the benefit
- Assuming you know what's slow → Profile to verify

**Always measure. Never assume.**