# Python Performance Optimization

**Load this file when:** optimizing performance in Python projects.

## Profiling Tools

### Execution Time Profiling

```bash
# cProfile - built-in deterministic profiler
python -m cProfile -o profile.stats script.py
python -m pstats profile.stats

# py-spy - sampling profiler (no code changes needed)
py-spy record -o profile.svg -- python script.py
py-spy top -- python script.py

# line_profiler - line-by-line profiling of functions decorated with @profile
kernprof -l -v script.py
```
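
cProfile can also be driven from code, which is handy for profiling a single function rather than a whole script. A minimal sketch (the summed loop stands in for whatever code is under measurement):

```python
import cProfile
import io
import pstats

profiler = cProfile.Profile()
profiler.enable()
total = sum(i * i for i in range(100_000))  # code under measurement
profiler.disable()

# Print the five most expensive entries, sorted by cumulative time
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
print(buffer.getvalue())
```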

### Memory Profiling

```bash
# memory_profiler - line-by-line memory usage of functions decorated with @profile
python -m memory_profiler script.py

# memray - modern memory profiler
memray run -o output.bin script.py
memray flamegraph output.bin

# tracemalloc - built-in memory tracking
# (used in code; see the example below)
```

### Benchmarking

```bash
# pytest-benchmark
pytest tests/ --benchmark-only

# timeit - quick microbenchmarks
python -m timeit "'-'.join(str(n) for n in range(100))"
```
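
The same microbenchmark can be run from code via the `timeit` module, which returns the total elapsed seconds as a float:

```python
import timeit

# Time 10,000 runs of the snippet; returns total seconds as a float
seconds = timeit.timeit(
    "'-'.join(str(n) for n in range(100))",
    number=10_000,
)
print(f"{seconds / 10_000 * 1e6:.1f} microseconds per call")
```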

## Python-Specific Optimization Patterns

### Async/Await Patterns

```python
import asyncio
import aiohttp


async def fetch_url(session, url):
    async with session.get(url) as response:
        return await response.text()


# Good: concurrent async operations
async def fetch_all(urls):
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_url(session, url) for url in urls]
        return await asyncio.gather(*tasks)


# Bad: sequential awaits (defeats the purpose of async)
async def fetch_all_bad(urls):
    results = []
    async with aiohttp.ClientSession() as session:
        for url in urls:
            results.append(await fetch_url(session, url))
    return results
```
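
Since aiohttp is a third-party dependency, the concurrency win is easier to see in a self-contained sketch where `asyncio.sleep` stands in for a network call:

```python
import asyncio
import time


async def fake_fetch(i):
    # Simulate a 0.1s network call
    await asyncio.sleep(0.1)
    return i


async def main():
    start = time.perf_counter()
    # Ten 0.1s waits overlap, so the total is roughly 0.1s, not 1s
    results = await asyncio.gather(*(fake_fetch(i) for i in range(10)))
    elapsed = time.perf_counter() - start
    return results, elapsed


results, elapsed = asyncio.run(main())
print(results, round(elapsed, 2))
```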

### List Comprehensions vs Generators

```python
# Generator (memory-efficient for large datasets; lines are processed lazily,
# and the with-block keeps the file handle from leaking)
def process_large_file(filename):
    with open(filename) as f:
        yield from (process_line(line) for line in f)


# List comprehension (when you need all the data in memory at once)
def process_small_file(filename):
    with open(filename) as f:
        return [process_line(line) for line in f]


# Use itertools for complex generator pipelines
from itertools import islice

first_10 = list(islice(generate_data(), 10))
```
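
The memory difference can be made concrete with `sys.getsizeof`, which measures the container itself (the list's pointer array vs. the generator's fixed-size iteration state), not the elements:

```python
import sys

# A list materializes every element up front; a generator holds only its state
squares_list = [n * n for n in range(100_000)]
squares_gen = (n * n for n in range(100_000))

print(sys.getsizeof(squares_list))  # hundreds of kilobytes
print(sys.getsizeof(squares_gen))   # a couple hundred bytes
```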

### Efficient Data Structures

```python
# Use sets for membership testing
# Bad: O(n) - scans the whole list
if item in my_list:
    ...

# Good: O(1) average - hash lookup
if item in my_set:
    ...

# Use deque for queue operations
from collections import deque

queue = deque()
queue.append(item)   # O(1)
queue.popleft()      # O(1), vs list.pop(0) which is O(n)

# Use defaultdict to avoid key-existence checks
from collections import defaultdict

counter = defaultdict(int)
counter[key] += 1  # Missing keys start at 0
```
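
For pure counting, the stdlib also offers `collections.Counter`, which subsumes the `defaultdict(int)` pattern and adds conveniences like `most_common`:

```python
from collections import Counter

# Counter tallies any iterable of hashable items
letter_counts = Counter("abracadabra")
print(letter_counts.most_common(1))  # [('a', 5)]
```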

## GIL (Global Interpreter Lock) Considerations

### CPU-Bound Work

```python
# Use multiprocessing for CPU-bound tasks: each worker process has its own
# interpreter and GIL, so computation runs in parallel across cores
from multiprocessing import Pool


def cpu_intensive_task(data):
    # Heavy computation
    return result


if __name__ == "__main__":  # Required on platforms that spawn workers
    with Pool(processes=4) as pool:
        results = pool.map(cpu_intensive_task, data_list)
```

### I/O-Bound Work

```python
# Use asyncio (or threading) for I/O-bound tasks: the GIL is released
# while waiting on network or file I/O
import asyncio


async def io_bound_task(url):
    # Network I/O, file I/O
    return result


async def main(urls):
    return await asyncio.gather(*[io_bound_task(url) for url in urls])


results = asyncio.run(main(urls))
```
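
When the I/O library is blocking rather than async, a thread pool gives the same overlap; a self-contained sketch where `time.sleep` stands in for a blocking call:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_io(i):
    # Simulate a 0.1s blocking call (e.g. a synchronous HTTP request)
    time.sleep(0.1)
    return i


start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as pool:
    # Ten 0.1s waits overlap across threads, so the total is roughly 0.1s
    results = list(pool.map(blocking_io, range(10)))
elapsed = time.perf_counter() - start
print(results, round(elapsed, 2))
```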

## Common Python Anti-Patterns

### String Concatenation

```python
# Bad: repeated concatenation copies the string each time - O(n^2) overall
result = ""
for s in strings:
    result += s

# Good: single pass - O(n)
result = "".join(strings)
```

### Unnecessary Lambda

```python
# Slower: an extra Python-level function call per element
sorted_items = sorted(items, key=lambda x: x.value)

# Faster: operator.attrgetter is implemented in C
from operator import attrgetter

sorted_items = sorted(items, key=attrgetter("value"))
```
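
The same trick applies to tuples and dicts via `operator.itemgetter`:

```python
from operator import itemgetter

rows = [("carol", 31), ("alice", 25), ("bob", 28)]
# Sort by the second element (age) without a lambda
by_age = sorted(rows, key=itemgetter(1))
print(by_age)  # [('alice', 25), ('bob', 28), ('carol', 31)]
```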

### Loop-Invariant Code

```python
# Bad: repeated calculation inside the loop
for item in items:
    expensive_result = expensive_function()
    process(item, expensive_result)

# Good: hoist the invariant out and calculate once
expensive_result = expensive_function()
for item in items:
    process(item, expensive_result)
```

## Performance Measurement

### Tracemalloc for Memory Tracking

```python
import tracemalloc

# Start tracking allocations
tracemalloc.start()

# Your code here
data = [i for i in range(1_000_000)]

# Get current and peak traced memory, in bytes
current, peak = tracemalloc.get_traced_memory()
print(f"Current: {current / 1024 / 1024:.2f} MB")
print(f"Peak: {peak / 1024 / 1024:.2f} MB")

tracemalloc.stop()
```
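
Beyond aggregate totals, tracemalloc can attribute memory to individual source lines via snapshots:

```python
import tracemalloc

tracemalloc.start()
data = [i for i in range(100_000)]
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

# Top allocation sites, grouped by source line
for stat in snapshot.statistics("lineno")[:3]:
    print(stat)
```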

### Context Manager for Timing

```python
import time
from contextlib import contextmanager


@contextmanager
def timer(name):
    start = time.perf_counter()
    try:
        yield
    finally:
        # try/finally ensures the timing prints even if the body raises
        elapsed = time.perf_counter() - start
        print(f"{name}: {elapsed:.4f}s")


# Usage
with timer("Database query"):
    results = db.query(...)
```

## Database Optimization (Python-Specific)

### SQLAlchemy Best Practices

```python
# Bad: N+1 queries - one query for the users, then one more per user
for user in session.query(User).all():
    print(user.profile.bio)

# Good: eager loading fetches users and profiles together
from sqlalchemy.orm import joinedload

users = session.query(User).options(
    joinedload(User.profile)
).all()

# Good: batch inserts instead of one INSERT per object
session.bulk_insert_mappings(User, user_dicts)
session.commit()
```

## Caching Strategies

### Function Caching

```python
from functools import lru_cache, cache


# LRU cache with a size limit
@lru_cache(maxsize=128)
def expensive_computation(n):
    # Heavy computation
    return result


# Unbounded cache (Python 3.9+)
@cache
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)


# Cache with expiration (third-party cachetools); note the variable is not
# named `cache`, which would shadow the functools.cache import above
from cachetools import TTLCache

ttl_cache = TTLCache(maxsize=100, ttl=300)  # Entries expire after 5 minutes
```
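
`lru_cache` exposes hit/miss statistics, which is a quick way to confirm a cache is actually being exercised:

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)


value = fib(30)
info = fib.cache_info()
print(value, info)  # hits and misses recorded during the recursion
```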

## Performance Testing

### pytest-benchmark

```python
def test_processing_performance(benchmark):
    # The benchmark fixture handles warmup and iteration counts automatically
    result = benchmark(process_data, large_dataset)
    assert result is not None


# Fine-grained control over rounds and iterations
def test_against_baseline(benchmark):
    benchmark.pedantic(
        process_data,
        args=(dataset,),
        iterations=10,
        rounds=100,
    )
```

### Load Testing with Locust

```python
from locust import HttpUser, task, between


class WebsiteUser(HttpUser):
    wait_time = between(1, 3)

    @task
    def load_homepage(self):
        self.client.get("/")

    @task(3)  # Weighted 3x relative to load_homepage
    def load_api(self):
        self.client.get("/api/data")
```

## Performance Checklist

**Before Optimizing:**
- [ ] Profile to identify actual bottlenecks (don't guess!)
- [ ] Measure baseline performance
- [ ] Set performance targets

**Python-Specific Optimizations:**
- [ ] Use generators for large datasets
- [ ] Replace explicit loops with comprehensions where appropriate
- [ ] Use appropriate data structures (`set`, `deque`, `defaultdict`)
- [ ] Implement caching with `@lru_cache` or `@cache`
- [ ] Use async/await for I/O-bound operations
- [ ] Use multiprocessing for CPU-bound operations
- [ ] Avoid string concatenation in loops
- [ ] Minimize attribute lookups in hot loops
- [ ] Use `__slots__` for classes with many instances

**After Optimizing:**
- [ ] Re-profile to verify improvements
- [ ] Check that memory usage hasn't increased significantly
- [ ] Ensure code readability is maintained
- [ ] Add performance regression tests
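
The `__slots__` item above, sketched: declaring slots removes the per-instance `__dict__`, which shrinks each instance and rejects accidental new attributes.

```python
class PointDict:
    def __init__(self, x, y):
        self.x = x
        self.y = y


class PointSlots:
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = PointSlots(1, 2)
print(hasattr(p, "__dict__"))  # False: no per-instance dict to allocate

try:
    p.z = 3  # Not declared in __slots__
except AttributeError as exc:
    print("rejected:", exc)
```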

## Tools and Libraries

**Profiling:**
- `cProfile` - built-in execution profiler
- `py-spy` - sampling profiler, no code changes needed
- `memory_profiler` - line-by-line memory usage
- `memray` - modern memory profiler with flamegraphs

**Performance Testing:**
- `pytest-benchmark` - benchmark tests integrated with pytest
- `locust` - load-testing framework
- `hyperfine` - command-line benchmarking tool

**Optimization:**
- `numpy` - vectorized operations on numerical data
- `numba` - JIT compilation for numerical functions
- `cython` - compile Python to C for speed

---

*Python-specific performance optimization with profiling tools and patterns*