---
name: python-performance-optimization
description: Profile and optimize Python code using cProfile, memory profilers, and performance best practices. Use when debugging slow Python code, optimizing bottlenecks, or improving application performance.
---

# Python Performance Optimization

Comprehensive guide to profiling, analyzing, and optimizing Python code for better performance, including CPU profiling, memory optimization, and implementation best practices.

## When to Use This Skill

- Identifying performance bottlenecks in Python applications
- Reducing application latency and response times
- Optimizing CPU-intensive operations
- Reducing memory consumption and plugging memory leaks
- Improving database query performance
- Optimizing I/O operations
- Speeding up data processing pipelines
- Implementing high-performance algorithms
- Profiling production applications

## Core Concepts

### 1. Profiling Types
- **CPU Profiling**: Identify time-consuming functions
- **Memory Profiling**: Track memory allocation and leaks
- **Line Profiling**: Profile at line-by-line granularity
- **Call Graph**: Visualize function call relationships

### 2. Performance Metrics
- **Execution Time**: How long operations take
- **Memory Usage**: Peak and average memory consumption
- **CPU Utilization**: Processor usage patterns
- **I/O Wait**: Time spent on I/O operations

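A minimal sketch of capturing two of these metrics at once (execution time and peak memory) using only the standard library; `workload` is a hypothetical stand-in for the code under measurement:

```python
import time
import tracemalloc

def workload():
    """Hypothetical stand-in for the code under measurement."""
    return sum(i**2 for i in range(1000000))

tracemalloc.start()
start = time.perf_counter()

workload()

elapsed = time.perf_counter() - start
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"Execution time: {elapsed:.4f} seconds")
print(f"Peak memory: {peak / 1024:.1f} KiB")
```
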
### 3. Optimization Strategies
- **Algorithmic**: Better algorithms and data structures
- **Implementation**: More efficient code patterns
- **Parallelization**: Multi-threading/processing
- **Caching**: Avoid redundant computation
- **Native Extensions**: C/Rust for critical paths

## Quick Start

### Basic Timing

```python
import time

def measure_time():
    """Simple timing measurement."""
    # perf_counter is preferred over time.time() for durations:
    # it is monotonic and has higher resolution
    start = time.perf_counter()

    # Your code here
    result = sum(range(1000000))

    elapsed = time.perf_counter() - start
    print(f"Execution time: {elapsed:.4f} seconds")
    return result

# Better: use timeit for accurate measurements
import timeit

execution_time = timeit.timeit(
    "sum(range(1000000))",
    number=100
)
print(f"Average time: {execution_time/100:.6f} seconds")
```

## Profiling Tools

### Pattern 1: cProfile - CPU Profiling

```python
import cProfile
import pstats
from pstats import SortKey

def slow_function():
    """Function to profile."""
    total = 0
    for i in range(1000000):
        total += i
    return total

def another_function():
    """Another function."""
    return [i**2 for i in range(100000)]

def main():
    """Main function to profile."""
    result1 = slow_function()
    result2 = another_function()
    return result1, result2

# Profile the code
if __name__ == "__main__":
    profiler = cProfile.Profile()
    profiler.enable()

    main()

    profiler.disable()

    # Print stats sorted by cumulative time
    stats = pstats.Stats(profiler)
    stats.sort_stats(SortKey.CUMULATIVE)
    stats.print_stats(10)  # Top 10 functions

    # Save to file for later analysis
    stats.dump_stats("profile_output.prof")
```

**Command-line profiling:**
```bash
# Profile a script
python -m cProfile -o output.prof script.py

# View results interactively
python -m pstats output.prof
# In the pstats prompt:
#   sort cumtime
#   stats 10
```

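The saved `.prof` file can also be explored graphically. One option, assuming the third-party `snakeviz` package is installed, is its interactive browser view:

```bash
pip install snakeviz

# Opens the profile as an interactive chart in the browser
snakeviz output.prof
```
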
### Pattern 2: line_profiler - Line-by-Line Profiling

```python
# Install: pip install line-profiler

# Add the @profile decorator (kernprof injects it as a builtin
# when the script runs under kernprof -l)
@profile
def process_data(data):
    """Process data with line profiling."""
    result = []
    for item in data:
        processed = item * 2
        result.append(processed)
    return result

# Run with:
# kernprof -l -v script.py
```

**Manual line profiling:**
```python
from line_profiler import LineProfiler

def process_data(data):
    """Function to profile."""
    result = []
    for item in data:
        processed = item * 2
        result.append(processed)
    return result

if __name__ == "__main__":
    lp = LineProfiler()
    lp.add_function(process_data)

    data = list(range(100000))

    # Calling lp(...) returns a profiled wrapper of the function
    lp_wrapper = lp(process_data)
    lp_wrapper(data)

    lp.print_stats()
```

### Pattern 3: memory_profiler - Memory Usage

```python
# Install: pip install memory-profiler

from memory_profiler import profile

@profile
def memory_intensive():
    """Function that uses lots of memory."""
    # Create large list
    big_list = [i for i in range(1000000)]

    # Create large dict
    big_dict = {i: i**2 for i in range(100000)}

    # Process data
    result = sum(big_list)

    return result

if __name__ == "__main__":
    memory_intensive()

# Run with:
# python -m memory_profiler script.py
```

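memory_profiler also ships an `mprof` helper that samples memory use over a whole run and plots it (plotting assumes matplotlib is available):

```bash
# Record a memory-over-time profile, then plot it
mprof run script.py
mprof plot
```
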
### Pattern 4: py-spy - Production Profiling

py-spy samples a running interpreter without modifying it, so it is safe for production use; note that attaching to another process may require root privileges or ptrace permissions on Linux.

```bash
# Install: pip install py-spy

# Live top-like view of a running Python process
py-spy top --pid 12345

# Generate flamegraph
py-spy record -o profile.svg --pid 12345

# Profile a script from launch
py-spy record -o profile.svg -- python script.py

# Dump current call stack
py-spy dump --pid 12345
```

## Optimization Patterns

### Pattern 5: List Comprehensions vs Loops

```python
import timeit

# Slow: Traditional loop
def slow_squares(n):
    """Create list of squares using loop."""
    result = []
    for i in range(n):
        result.append(i**2)
    return result

# Fast: List comprehension
def fast_squares(n):
    """Create list of squares using comprehension."""
    return [i**2 for i in range(n)]

# Benchmark
n = 100000

slow_time = timeit.timeit(lambda: slow_squares(n), number=100)
fast_time = timeit.timeit(lambda: fast_squares(n), number=100)

print(f"Loop: {slow_time:.4f}s")
print(f"Comprehension: {fast_time:.4f}s")
print(f"Speedup: {slow_time/fast_time:.2f}x")

# map can win when paired with a C-implemented function;
# with a Python lambda it is usually no faster than a comprehension
def map_squares(n):
    """map with a lambda: benchmark before assuming it is faster."""
    return list(map(lambda x: x**2, range(n)))
```

### Pattern 6: Generator Expressions for Memory

```python
import sys

def list_approach():
    """Memory-intensive list."""
    data = [i**2 for i in range(1000000)]
    return sum(data)

def generator_approach():
    """Memory-efficient generator."""
    data = (i**2 for i in range(1000000))
    return sum(data)

# Memory comparison
list_data = [i for i in range(1000000)]
gen_data = (i for i in range(1000000))

print(f"List size: {sys.getsizeof(list_data)} bytes")
print(f"Generator size: {sys.getsizeof(gen_data)} bytes")

# Generators use constant memory regardless of size
```

### Pattern 7: String Concatenation

```python
import timeit

def slow_concat(items):
    """Slow string concatenation."""
    result = ""
    for item in items:
        result += str(item)
    return result

def fast_concat(items):
    """Fast string concatenation with join."""
    return "".join(str(item) for item in items)

def faster_concat(items):
    """Even faster with a list."""
    parts = [str(item) for item in items]
    return "".join(parts)

items = list(range(10000))

# Benchmark
slow = timeit.timeit(lambda: slow_concat(items), number=100)
fast = timeit.timeit(lambda: fast_concat(items), number=100)
faster = timeit.timeit(lambda: faster_concat(items), number=100)

print(f"Concatenation (+): {slow:.4f}s")
print(f"Join (generator): {fast:.4f}s")
print(f"Join (list): {faster:.4f}s")
```

### Pattern 8: Dictionary Lookups vs List Searches

```python
import timeit

# Create test data
size = 10000
items = list(range(size))
lookup_dict = {i: i for i in range(size)}

def list_search(items, target):
    """O(n) search in list."""
    return target in items

def dict_search(lookup_dict, target):
    """O(1) average-case search in dict."""
    return target in lookup_dict

target = size - 1  # Worst case for list

# Benchmark
list_time = timeit.timeit(
    lambda: list_search(items, target),
    number=1000
)
dict_time = timeit.timeit(
    lambda: dict_search(lookup_dict, target),
    number=1000
)

print(f"List search: {list_time:.6f}s")
print(f"Dict search: {dict_time:.6f}s")
print(f"Speedup: {list_time/dict_time:.0f}x")
```

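When only membership matters and no value is attached, a set gives the same average O(1) behavior without the dummy values; a sketch in the same style as above:

```python
import timeit

size = 10000
items = list(range(size))
lookup_set = set(items)
target = size - 1  # Worst case for the list

list_time = timeit.timeit(lambda: target in items, number=1000)
set_time = timeit.timeit(lambda: target in lookup_set, number=1000)

print(f"List membership: {list_time:.6f}s")
print(f"Set membership: {set_time:.6f}s")
```
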
### Pattern 9: Local Variable Access

```python
import timeit

# Global variable (slow)
GLOBAL_VALUE = 100

def use_global():
    """Access global variable."""
    total = 0
    for i in range(10000):
        total += GLOBAL_VALUE
    return total

def use_local():
    """Use local variable."""
    local_value = 100
    total = 0
    for i in range(10000):
        total += local_value
    return total

# Local is faster: locals are resolved by array index,
# globals by dictionary lookup
global_time = timeit.timeit(use_global, number=1000)
local_time = timeit.timeit(use_local, number=1000)

print(f"Global access: {global_time:.4f}s")
print(f"Local access: {local_time:.4f}s")
print(f"Speedup: {global_time/local_time:.2f}x")
```

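The same effect applies to attribute and method lookups, which are also resolved on every iteration; hoisting a bound method into a local is a common micro-optimization in hot loops. A sketch, worth benchmarking in your own context:

```python
import timeit

def repeated_lookup():
    """Resolve result.append on every iteration."""
    result = []
    for i in range(10000):
        result.append(i)
    return result

def hoisted_lookup():
    """Look up the bound method once, outside the loop."""
    result = []
    append = result.append
    for i in range(10000):
        append(i)
    return result

print(f"Repeated lookup: {timeit.timeit(repeated_lookup, number=1000):.4f}s")
print(f"Hoisted lookup: {timeit.timeit(hoisted_lookup, number=1000):.4f}s")
```
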
### Pattern 10: Function Call Overhead

```python
import timeit

def calculate_inline():
    """Inline calculation."""
    total = 0
    for i in range(10000):
        total += i * 2 + 1
    return total

def helper_function(x):
    """Helper function."""
    return x * 2 + 1

def calculate_with_function():
    """Calculation with function calls."""
    total = 0
    for i in range(10000):
        total += helper_function(i)
    return total

# Inline is faster due to no call overhead
inline_time = timeit.timeit(calculate_inline, number=1000)
function_time = timeit.timeit(calculate_with_function, number=1000)

print(f"Inline: {inline_time:.4f}s")
print(f"Function calls: {function_time:.4f}s")
```

## Advanced Optimization

### Pattern 11: NumPy for Numerical Operations

```python
import timeit
import numpy as np

def python_sum(n):
    """Sum using pure Python."""
    return sum(range(n))

def numpy_sum(n):
    """Sum using NumPy."""
    return np.arange(n).sum()

n = 1000000

python_time = timeit.timeit(lambda: python_sum(n), number=100)
numpy_time = timeit.timeit(lambda: numpy_sum(n), number=100)

print(f"Python: {python_time:.4f}s")
print(f"NumPy: {numpy_time:.4f}s")
print(f"Speedup: {python_time/numpy_time:.2f}x")

# Vectorized operations
def python_multiply():
    """Element-wise multiplication in Python."""
    a = list(range(100000))
    b = list(range(100000))
    return [x * y for x, y in zip(a, b)]

def numpy_multiply():
    """Vectorized multiplication in NumPy."""
    a = np.arange(100000)
    b = np.arange(100000)
    return a * b

py_time = timeit.timeit(python_multiply, number=100)
np_time = timeit.timeit(numpy_multiply, number=100)

print(f"\nPython multiply: {py_time:.4f}s")
print(f"NumPy multiply: {np_time:.4f}s")
print(f"Speedup: {py_time/np_time:.2f}x")
```

### Pattern 12: Caching with functools.lru_cache

```python
from functools import lru_cache
import timeit

def fibonacci_slow(n):
    """Recursive fibonacci without caching."""
    if n < 2:
        return n
    return fibonacci_slow(n-1) + fibonacci_slow(n-2)

@lru_cache(maxsize=None)
def fibonacci_fast(n):
    """Recursive fibonacci with caching."""
    if n < 2:
        return n
    return fibonacci_fast(n-1) + fibonacci_fast(n-2)

# Massive speedup for recursive algorithms: note the cached
# version runs 1000 times in less time than one uncached run
n = 30

slow_time = timeit.timeit(lambda: fibonacci_slow(n), number=1)
fast_time = timeit.timeit(lambda: fibonacci_fast(n), number=1000)

print(f"Without cache (1 run): {slow_time:.4f}s")
print(f"With cache (1000 runs): {fast_time:.4f}s")

# Cache info
print(f"Cache info: {fibonacci_fast.cache_info()}")
```

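On Python 3.9+, `functools.cache` is a simpler spelling of `lru_cache(maxsize=None)`, and any cached function exposes `cache_clear()` for when the underlying data can change; a brief usage sketch:

```python
from functools import cache

@cache
def expensive_lookup(key):
    """Hypothetical stand-in for a slow computation or query."""
    return key * 2

expensive_lookup(21)            # computed on the first call
expensive_lookup(21)            # served from the cache
expensive_lookup.cache_clear()  # discard all cached results
```
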
### Pattern 13: Using `__slots__` for Memory

```python
import sys

class RegularClass:
    """Regular class with __dict__."""
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

class SlottedClass:
    """Class with __slots__ for memory efficiency."""
    __slots__ = ['x', 'y', 'z']

    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

# Memory comparison
regular = RegularClass(1, 2, 3)
slotted = SlottedClass(1, 2, 3)

# Note: sys.getsizeof does not include the per-instance
# __dict__, so the real gap is larger than shown here
print(f"Regular class size: {sys.getsizeof(regular)} bytes")
print(f"Slotted class size: {sys.getsizeof(slotted)} bytes")

# Savings add up across many instances; for an honest total,
# measure allocations directly (see the tracemalloc sketch below)
regular_objects = [RegularClass(i, i+1, i+2) for i in range(10000)]
slotted_objects = [SlottedClass(i, i+1, i+2) for i in range(10000)]
```

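For an allocation-based comparison rather than a per-object estimate, `tracemalloc` can measure the real cost of building many instances; a sketch reusing the two classes above:

```python
import tracemalloc

def allocated_bytes(cls, count=100000):
    """Bytes still allocated after building `count` instances."""
    tracemalloc.start()
    instances = [cls(i, i + 1, i + 2) for i in range(count)]
    size, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return size

print(f"RegularClass: {allocated_bytes(RegularClass) / 1e6:.1f} MB")
print(f"SlottedClass: {allocated_bytes(SlottedClass) / 1e6:.1f} MB")
```
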
### Pattern 14: Multiprocessing for CPU-Bound Tasks

```python
import multiprocessing as mp
import time

def cpu_intensive_task(n):
    """CPU-intensive calculation."""
    return sum(i**2 for i in range(n))

def sequential_processing():
    """Process tasks sequentially."""
    start = time.time()
    results = [cpu_intensive_task(1000000) for _ in range(4)]
    elapsed = time.time() - start
    return elapsed, results

def parallel_processing():
    """Process tasks in parallel."""
    start = time.time()
    with mp.Pool(processes=4) as pool:
        results = pool.map(cpu_intensive_task, [1000000] * 4)
    elapsed = time.time() - start
    return elapsed, results

if __name__ == "__main__":
    seq_time, seq_results = sequential_processing()
    par_time, par_results = parallel_processing()

    print(f"Sequential: {seq_time:.2f}s")
    print(f"Parallel: {par_time:.2f}s")
    print(f"Speedup: {seq_time/par_time:.2f}x")
```

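The standard library's `concurrent.futures` wraps the same worker-pool idea in a higher-level API; an equivalent sketch of the parallel path above:

```python
from concurrent.futures import ProcessPoolExecutor

def cpu_intensive_task(n):
    """CPU-intensive calculation (must be importable by workers)."""
    return sum(i**2 for i in range(n))

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=4) as executor:
        results = list(executor.map(cpu_intensive_task, [1000000] * 4))
    print(len(results), "tasks done")
```
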
### Pattern 15: Async I/O for I/O-Bound Tasks

```python
import asyncio
import time

import aiohttp
import requests

urls = [
    "https://httpbin.org/delay/1",
    "https://httpbin.org/delay/1",
    "https://httpbin.org/delay/1",
    "https://httpbin.org/delay/1",
]

def synchronous_requests():
    """Synchronous HTTP requests."""
    start = time.time()
    results = []
    for url in urls:
        response = requests.get(url)
        results.append(response.status_code)
    elapsed = time.time() - start
    return elapsed, results

async def async_fetch(session, url):
    """Async HTTP request."""
    async with session.get(url) as response:
        return response.status

async def asynchronous_requests():
    """Asynchronous HTTP requests."""
    start = time.time()
    async with aiohttp.ClientSession() as session:
        tasks = [async_fetch(session, url) for url in urls]
        results = await asyncio.gather(*tasks)
    elapsed = time.time() - start
    return elapsed, results

# Async is much faster for I/O-bound work
sync_time, sync_results = synchronous_requests()
async_time, async_results = asyncio.run(asynchronous_requests())

print(f"Synchronous: {sync_time:.2f}s")
print(f"Asynchronous: {async_time:.2f}s")
print(f"Speedup: {sync_time/async_time:.2f}x")
```

## Database Optimization

### Pattern 16: Batch Database Operations

```python
import sqlite3
import time

def create_db():
    """Create test database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    return conn

def slow_inserts(conn, count):
    """Insert records one at a time."""
    start = time.time()
    cursor = conn.cursor()
    for i in range(count):
        cursor.execute("INSERT INTO users (name) VALUES (?)", (f"User {i}",))
        conn.commit()  # Commit each insert
    elapsed = time.time() - start
    return elapsed

def fast_inserts(conn, count):
    """Batch insert with a single commit."""
    start = time.time()
    cursor = conn.cursor()
    data = [(f"User {i}",) for i in range(count)]
    cursor.executemany("INSERT INTO users (name) VALUES (?)", data)
    conn.commit()  # Single commit
    elapsed = time.time() - start
    return elapsed

# Benchmark
conn1 = create_db()
slow_time = slow_inserts(conn1, 1000)

conn2 = create_db()
fast_time = fast_inserts(conn2, 1000)

print(f"Individual inserts: {slow_time:.4f}s")
print(f"Batch insert: {fast_time:.4f}s")
print(f"Speedup: {slow_time/fast_time:.2f}x")
```

### Pattern 17: Query Optimization

```python
# Use indexes for frequently queried columns
"""
-- Slow: No index
SELECT * FROM users WHERE email = 'user@example.com';

-- Fast: With index
CREATE INDEX idx_users_email ON users(email);
SELECT * FROM users WHERE email = 'user@example.com';
"""

# Inspect the query plan
import sqlite3

conn = sqlite3.connect("example.db")
cursor = conn.cursor()

# Analyze query performance
cursor.execute("EXPLAIN QUERY PLAN SELECT * FROM users WHERE email = ?", ("test@example.com",))
print(cursor.fetchall())

# Select only the columns you need
# Slow: SELECT *
# Fast: SELECT id, name
```

## Memory Optimization

### Pattern 18: Detecting Memory Leaks

```python
import tracemalloc
import gc

# Module-level reference: this is what keeps the objects alive.
# In real code the lingering reference is usually unintended.
_leaked_objects = []

def memory_leak_example():
    """Example that leaks memory."""
    for i in range(100000):
        # Objects added but never removed
        _leaked_objects.append([i] * 100)

def track_memory_usage():
    """Track memory allocations."""
    tracemalloc.start()

    # Take snapshot before
    snapshot1 = tracemalloc.take_snapshot()

    # Run code
    memory_leak_example()

    # Take snapshot after
    snapshot2 = tracemalloc.take_snapshot()

    # Compare
    top_stats = snapshot2.compare_to(snapshot1, 'lineno')

    print("Top 10 memory allocations:")
    for stat in top_stats[:10]:
        print(stat)

    tracemalloc.stop()

# Monitor memory
track_memory_usage()

# Force garbage collection
gc.collect()
```

### Pattern 19: Iterators vs Lists

```python
def process_file_list(filename):
    """Load the entire file into memory."""
    with open(filename) as f:
        lines = f.readlines()  # Loads all lines at once
    return sum(1 for line in lines if line.strip())

def process_file_iterator(filename):
    """Process the file line by line."""
    with open(filename) as f:
        return sum(1 for line in f if line.strip())

# The iterator version uses constant memory;
# the list version loads the entire file into memory
```

### Pattern 20: Weakref for Caches

```python
import weakref

class CachedResource:
    """Resource that can be garbage collected."""
    def __init__(self, data):
        self.data = data

# Regular cache prevents garbage collection
regular_cache = {}

def get_resource_regular(key):
    """Get resource from regular cache."""
    if key not in regular_cache:
        regular_cache[key] = CachedResource(f"Data for {key}")
    return regular_cache[key]

# Weak reference cache allows garbage collection
weak_cache = weakref.WeakValueDictionary()

def get_resource_weak(key):
    """Get resource from weak cache."""
    resource = weak_cache.get(key)
    if resource is None:
        resource = CachedResource(f"Data for {key}")
        weak_cache[key] = resource
    return resource

# When no strong references exist, objects can be GC'd
```

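A quick demonstration of the difference, reusing the two caches above; CPython frees an object as soon as its last strong reference disappears:

```python
import gc

resource = get_resource_weak("a")
print("a" in weak_cache)   # True while a strong reference exists

del resource
gc.collect()  # explicit, though CPython usually frees immediately
print("a" in weak_cache)   # False: the entry was evicted

get_resource_regular("b")
print("b" in regular_cache)  # True, and it stays until removed by hand
```
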
## Benchmarking Tools

### Custom Benchmark Decorator

```python
import time
from functools import wraps

def benchmark(func):
    """Decorator to benchmark function execution."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"{func.__name__} took {elapsed:.6f} seconds")
        return result
    return wrapper

@benchmark
def slow_function():
    """Function to benchmark."""
    time.sleep(0.5)
    return sum(range(1000000))

result = slow_function()
```

### Performance Testing with pytest-benchmark

```python
# Install: pip install pytest-benchmark

def test_list_comprehension(benchmark):
    """Benchmark list comprehension."""
    result = benchmark(lambda: [i**2 for i in range(10000)])
    assert len(result) == 10000

def test_map_function(benchmark):
    """Benchmark map function."""
    result = benchmark(lambda: list(map(lambda x: x**2, range(10000))))
    assert len(result) == 10000

# Run with: pytest test_performance.py --benchmark-compare
```

## Best Practices

1. **Profile before optimizing** - Measure to find real bottlenecks
2. **Focus on hot paths** - Optimize code that runs most frequently
3. **Use appropriate data structures** - Dict for lookups, set for membership
4. **Avoid premature optimization** - Clarity first, then optimize
5. **Use built-in functions** - They're implemented in C
6. **Cache expensive computations** - Use lru_cache
7. **Batch I/O operations** - Reduce system calls
8. **Use generators** - Stream large datasets instead of materializing them
9. **Consider NumPy** - Vectorize numerical operations
10. **Profile production code** - Use py-spy for live systems

## Common Pitfalls

- Optimizing without profiling
- Using global variables unnecessarily
- Not using appropriate data structures
- Creating unnecessary copies of data
- Not using connection pooling for databases
- Ignoring algorithmic complexity
- Over-optimizing rare code paths
- Not considering memory usage

## Resources

- **cProfile**: Built-in CPU profiler
- **memory_profiler**: Memory usage profiling
- **line_profiler**: Line-by-line profiling
- **py-spy**: Sampling profiler for production
- **NumPy**: High-performance numerical computing
- **Cython**: Compile Python to C
- **PyPy**: Alternative Python interpreter with JIT

## Performance Checklist

- [ ] Profiled code to identify bottlenecks
- [ ] Used appropriate data structures
- [ ] Implemented caching where beneficial
- [ ] Optimized database queries
- [ ] Used generators for large datasets
- [ ] Considered multiprocessing for CPU-bound tasks
- [ ] Used async I/O for I/O-bound tasks
- [ ] Minimized function call overhead in hot loops
- [ ] Checked for memory leaks
- [ ] Benchmarked before and after optimization