# Python Performance Optimization

**Load this file when:** optimizing performance in Python projects.

## Profiling Tools

### Execution Time Profiling
```bash
# cProfile - built-in deterministic profiler
python -m cProfile -o profile.stats script.py
python -m pstats profile.stats

# py-spy - sampling profiler (no code changes needed)
py-spy record -o profile.svg -- python script.py
py-spy top -- python script.py

# line_profiler - line-by-line profiling
# (decorate the functions you want profiled with @profile first)
kernprof -l -v script.py
```
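The stats file written by cProfile can also be inspected programmatically with the standard-library `pstats` module; a minimal sketch, assuming `profile.stats` was produced by the command above:

```python
import pstats

# Load the cProfile output and print the ten functions with the
# highest cumulative time.
stats = pstats.Stats("profile.stats")
stats.strip_dirs().sort_stats("cumulative").print_stats(10)
```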
### Memory Profiling

```bash
# memory_profiler - line-by-line memory usage
# (decorate the functions you want profiled with @profile first)
python -m memory_profiler script.py

# memray - modern memory profiler
memray run -o output.bin script.py
memray flamegraph output.bin

# tracemalloc - built-in memory tracking
# (used in code; see the example under "Performance Measurement" below)
```
### Benchmarking

```bash
# pytest-benchmark
pytest tests/ --benchmark-only

# timeit - quick microbenchmarks
python -m timeit "'-'.join(str(n) for n in range(100))"
```
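The same microbenchmark can be run from inside a script or REPL via the `timeit` API; a minimal sketch (the repetition counts are illustrative):

```python
import timeit

# Run 5 repeats of 100,000 loops each; the minimum is the least
# noisy estimate of the statement's cost.
times = timeit.repeat(
    "'-'.join(str(n) for n in range(100))",
    repeat=5,
    number=100_000,
)
print(f"best of 5: {min(times) / 100_000 * 1e6:.2f} µs per loop")
```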
## Python-Specific Optimization Patterns

### Async/Await Patterns
```python
import asyncio

import aiohttp

async def fetch_url(session, url):
    async with session.get(url) as response:
        return await response.text()

# Good: launch all requests concurrently
async def fetch_all(urls):
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_url(session, url) for url in urls]
        return await asyncio.gather(*tasks)

# Bad: awaiting inside the loop runs the requests sequentially
# (defeats the purpose of async)
async def fetch_all_bad(urls):
    results = []
    async with aiohttp.ClientSession() as session:
        for url in urls:
            results.append(await fetch_url(session, url))
    return results
```
### List Comprehensions vs Generators

```python
# Generator (memory-efficient for large files; the with block closes
# the file once the generator is consumed)
def process_large_file(filename):
    with open(filename) as f:
        for line in f:
            yield process_line(line)

# List comprehension (when you need all the data in memory at once)
def process_small_file(filename):
    with open(filename) as f:
        return [process_line(line) for line in f]

# Use itertools for composing generators
from itertools import islice, chain

first_10 = list(islice(generate_data(), 10))
```
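`chain` is the companion tool for stitching iterables together lazily; a minimal sketch (the two generators are illustrative):

```python
from itertools import chain, islice

evens = (n for n in range(10) if n % 2 == 0)
odds = (n for n in range(10) if n % 2 == 1)

# chain yields from each iterable in turn without building an
# intermediate list.
print(list(islice(chain(evens, odds), 7)))  # [0, 2, 4, 6, 8, 1, 3]
```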
### Efficient Data Structures

```python
# Use sets for membership testing
# Bad: O(n)
if item in my_list:  # slow for large lists
    ...

# Good: O(1) on average
if item in my_set:  # fast
    ...

# Use deque for queue operations
from collections import deque

queue = deque()
queue.append(item)  # O(1)
queue.popleft()     # O(1), vs. list.pop(0) which is O(n)

# Use defaultdict to avoid key checks
from collections import defaultdict

counter = defaultdict(int)
counter[key] += 1  # no need to check whether the key exists
```
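A quick way to see the list-vs-set gap is a `timeit` microbenchmark; a minimal sketch (the collection size is arbitrary):

```python
import timeit

setup = "items = list(range(100_000)); items_set = set(items)"

# Probe a value at the end of the list, the worst case for a scan.
list_t = timeit.timeit("99_999 in items", setup=setup, number=1_000)
set_t = timeit.timeit("99_999 in items_set", setup=setup, number=1_000)
print(f"list: {list_t:.4f}s  set: {set_t:.6f}s")  # set wins by orders of magnitude
```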
## GIL (Global Interpreter Lock) Considerations

### CPU-Bound Work

```python
# Use multiprocessing for CPU-bound tasks; each worker process has
# its own interpreter and its own GIL.
from multiprocessing import Pool

def cpu_intensive_task(data):
    # heavy computation
    return sum(x * x for x in data)

if __name__ == "__main__":  # guard required where workers are spawned
    data_list = [range(1_000_000) for _ in range(4)]
    with Pool(processes=4) as pool:
        results = pool.map(cpu_intensive_task, data_list)
```
### I/O-Bound Work

```python
# Use asyncio (or threads) for I/O-bound tasks; the GIL is released
# while waiting on the network or the filesystem.
import asyncio

async def io_bound_task(url):
    ...  # network I/O, file I/O

async def main(urls):
    return await asyncio.gather(*(io_bound_task(url) for url in urls))

results = asyncio.run(main(urls))  # gather must run inside an event loop
```
## Common Python Anti-Patterns

### String Concatenation

```python
# Bad: O(n²) overall for n strings (each += may copy the accumulator)
result = ""
for s in strings:
    result += s

# Good: O(n)
result = "".join(strings)
```
### Unnecessary Lambda

```python
# Bad: a Python-level function call per element
sorted_items = sorted(items, key=lambda x: x.value)

# Good: attrgetter does the attribute lookup in C
from operator import attrgetter

sorted_items = sorted(items, key=attrgetter("value"))
```
### Loop-Invariant Code

```python
# Bad: recomputing an invariant value on every iteration
for item in items:
    expensive_result = expensive_function()
    process(item, expensive_result)

# Good: compute it once, outside the loop
expensive_result = expensive_function()
for item in items:
    process(item, expensive_result)
```
## Performance Measurement

### Tracemalloc for Memory Tracking

```python
import tracemalloc

# Start tracking allocations
tracemalloc.start()

# Your code here
data = [i for i in range(1_000_000)]

# Get current and peak traced memory
current, peak = tracemalloc.get_traced_memory()
print(f"Current: {current / 1024 / 1024:.2f} MB")
print(f"Peak: {peak / 1024 / 1024:.2f} MB")

tracemalloc.stop()
```
### Context Manager for Timing

```python
import time
from contextlib import contextmanager

@contextmanager
def timer(name):
    start = time.perf_counter()
    try:
        yield
    finally:  # report even if the timed block raises
        elapsed = time.perf_counter() - start
        print(f"{name}: {elapsed:.4f}s")

# Usage
with timer("Database query"):
    results = db.query(...)
```
## Database Optimization (Python-Specific)

### SQLAlchemy Best Practices

```python
# Bad: N+1 queries
for user in session.query(User).all():
    print(user.profile.bio)  # issues a separate query per user

# Good: eager loading
from sqlalchemy.orm import joinedload

users = session.query(User).options(
    joinedload(User.profile)
).all()

# Good: batch operations
session.bulk_insert_mappings(User, user_dicts)
session.commit()
```
## Caching Strategies

### Function Caching

```python
from functools import lru_cache, cache

# LRU cache with a size limit
@lru_cache(maxsize=128)
def expensive_computation(n):
    # heavy computation
    return n ** n

# Unbounded cache (Python 3.9+)
@cache
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

# Manual cache with expiration (renamed here so it doesn't shadow
# functools.cache imported above)
from cachetools import TTLCache

ttl_cache = TTLCache(maxsize=100, ttl=300)  # entries expire after 5 minutes
```
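cachetools also provides a `cached` decorator that wires a `TTLCache` into a function the way `lru_cache` does; a minimal sketch (the function is illustrative):

```python
from cachetools import TTLCache, cached

# Results are reused for up to 60 seconds, then recomputed.
@cached(cache=TTLCache(maxsize=100, ttl=60))
def fetch_exchange_rate(currency):
    ...  # e.g. an HTTP call
```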
## Performance Testing

### pytest-benchmark

```python
def test_processing_performance(benchmark):
    # benchmark() handles warmup and iteration counts automatically
    result = benchmark(process_data, large_dataset)
    assert result is not None

# Pin down rounds/iterations explicitly; compare runs against a
# saved baseline with --benchmark-compare
def test_against_baseline(benchmark):
    benchmark.pedantic(
        process_data,
        args=(dataset,),
        iterations=10,
        rounds=100,
    )
```
### Load Testing with Locust

```python
from locust import HttpUser, task, between

class WebsiteUser(HttpUser):
    wait_time = between(1, 3)  # pause 1-3s between tasks per user

    @task
    def load_homepage(self):
        self.client.get("/")

    @task(3)  # weighted: 3x more likely than the homepage task
    def load_api(self):
        self.client.get("/api/data")

# Run with: locust -f locustfile.py
```
## Performance Checklist

**Before Optimizing:**
- Profile to identify actual bottlenecks (don't guess!)
- Measure baseline performance
- Set performance targets
**Python-Specific Optimizations:**
- Use generators for large datasets
- Replace loops with list comprehensions where appropriate
- Use appropriate data structures (`set`, `deque`, `defaultdict`)
- Implement caching with `@lru_cache` or `@cache`
- Use async/await for I/O-bound operations
- Use multiprocessing for CPU-bound operations
- Avoid string concatenation in loops
- Minimize attribute lookups in hot loops
- Use `__slots__` for classes with many instances (see the sketch after this list)
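A minimal `__slots__` sketch for the last item (the class and fields are illustrative): declaring slots removes the per-instance `__dict__`, which shrinks each instance and speeds up attribute access.

```python
import sys

class Point:
    __slots__ = ("x", "y")  # no per-instance __dict__ is allocated

    def __init__(self, x, y):
        self.x = x
        self.y = y

class PointNoSlots:
    def __init__(self, x, y):
        self.x = x
        self.y = y

p, q = Point(1.0, 2.0), PointNoSlots(1.0, 2.0)
print(sys.getsizeof(p))                              # slotted instance
print(sys.getsizeof(q) + sys.getsizeof(q.__dict__))  # instance + its dict
```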
**After Optimizing:**
- Re-profile to verify improvements
- Check memory usage hasn't increased significantly
- Ensure code readability is maintained
- Add performance regression tests
## Tools and Libraries

**Profiling:**

- `cProfile` - built-in execution profiler
- `py-spy` - sampling profiler, no code changes needed
- `memory_profiler` - line-by-line memory usage
- `memray` - modern memory profiler with flamegraphs

**Performance Testing:**

- `pytest-benchmark` - benchmark tests
- `locust` - load-testing framework
- `hyperfine` - command-line benchmarking

**Optimization:**

- `numpy` - vectorized operations for numerical data
- `numba` - JIT compilation for numerical functions
- `cython` - compile Python to C for speed
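A minimal sketch of what "vectorized" means in practice (the array size is arbitrary): the per-element work moves from the Python interpreter into compiled code.

```python
import numpy as np

values = np.random.rand(1_000_000)

# Python-level loop: one interpreter round-trip per element
total = 0.0
for v in values:
    total += v * v

# Vectorized: the multiply and the sum run in compiled code
total_vec = np.dot(values, values)

assert np.isclose(total, total_vec)
```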