
# Python Performance Optimization

**Load this file when:** Optimizing performance in Python projects

## Profiling Tools

### Execution Time Profiling
```bash
# cProfile - Built-in profiler
python -m cProfile -o profile.stats script.py
python -m pstats profile.stats

# py-spy - Sampling profiler (no code changes needed)
py-spy record -o profile.svg -- python script.py
py-spy top -- python script.py

# line_profiler - Line-by-line profiling (decorate the functions to measure with @profile)
kernprof -l -v script.py
```
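The command-line invocations above profile a whole script. To profile a single call from inside a program, `cProfile` and `pstats` can also be driven directly. The following is a minimal sketch; `slow_sum` and `profile_call` are hypothetical names introduced here for illustration.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Hypothetical workload: sum of squares in a plain Python loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args, top=5):
    """Run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args)
    finally:
        profiler.disable()

    # Render the top entries, sorted by cumulative time, into a string
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


result, report = profile_call(slow_sum, 100_000)
print(report)
```

Sorting by cumulative time tends to surface the call paths that dominate the run, which is usually the first thing to look for.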

### Memory Profiling
```bash
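# memory_profiler - Line-by-line memory usage (decorate the functions to measure with @profile)
python -m memory_profiler script.py

# memray - Modern memory profiler
# (-o sets the capture file name so the flamegraph command can find it)
memray run -o output.bin script.py
memray flamegraph output.bin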

# tracemalloc - Built-in memory tracking
# (use in code, see example below)
```

### Benchmarking
```bash
# pytest-benchmark
pytest tests/ --benchmark-only

# timeit - Quick microbenchmarks
python -m timeit "'-'.join(str(n) for n in range(100))"
```
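The same kind of microbenchmark can be run from code with the `timeit` module, which is convenient when comparing two variants side by side. This is a sketch only; `join_generator` and `join_map` are hypothetical functions chosen for illustration, and absolute timings will vary by machine.

```python
import timeit


def join_generator():
    return "-".join(str(n) for n in range(100))


def join_map():
    return "-".join(map(str, range(100)))


# repeat() returns one total per run; the minimum is the least noisy figure
gen_time = min(timeit.repeat(join_generator, number=1_000, repeat=5))
map_time = min(timeit.repeat(join_map, number=1_000, repeat=5))

print(f"generator: {gen_time:.4f}s per 1000 calls")
print(f"map:       {map_time:.4f}s per 1000 calls")
```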

## Python-Specific Optimization Patterns

### Async/Await Patterns
```python
import asyncio
import aiohttp

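# Helper used by both versions below
async def fetch_url(session, url):
    async with session.get(url) as response:
        return await response.text()

# Good: Concurrent async operations
async def fetch_all(urls):
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_url(session, url) for url in urls]
        return await asyncio.gather(*tasks)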

# Bad: Sequential async (defeats the purpose)
async def fetch_all_bad(urls):
    results = []
    async with aiohttp.ClientSession() as session:
        for url in urls:
            results.append(await fetch_url(session, url))
    return results
```
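The difference between the two patterns can be observed without a network by substituting `asyncio.sleep` for the HTTP call. The sketch below assumes a hypothetical `fake_fetch` coroutine; the other helper names are also illustrative.

```python
import asyncio
import time


async def fake_fetch(delay):
    """Stand-in for a network call; sleeps instead of doing real I/O."""
    await asyncio.sleep(delay)
    return delay


async def run_concurrently(delays):
    # All coroutines are scheduled at once, so the waits overlap
    return await asyncio.gather(*(fake_fetch(d) for d in delays))


async def run_sequentially(delays):
    # Each await finishes before the next one starts
    return [await fake_fetch(d) for d in delays]


def timed(coro_func, delays):
    start = time.perf_counter()
    results = asyncio.run(coro_func(delays))
    return results, time.perf_counter() - start


delays = [0.05] * 5
concurrent_results, concurrent_time = timed(run_concurrently, delays)
sequential_results, sequential_time = timed(run_sequentially, delays)

print(f"concurrent: {concurrent_time:.2f}s, sequential: {sequential_time:.2f}s")
```

With five waits of equal length, the concurrent version should take roughly as long as one wait, while the sequential version takes roughly the sum of all five.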

### List Comprehensions vs Generators
```python
# process_line and generate_data are placeholders for your own functions

# Generator (memory efficient for large datasets)
def process_large_file(filename):
    # `with` closes the file once the generator is exhausted or closed
    with open(filename) as f:
        for line in f:
            yield process_line(line)

# List comprehension (when you need all data in memory)
def process_small_file(filename):
    with open(filename) as f:
        return [process_line(line) for line in f]

# Use itertools to slice or combine generators lazily
from itertools import islice
first_10 = list(islice(generate_data(), 10))
```
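To make the memory trade-off concrete, the container sizes can be compared with `sys.getsizeof`. This is a rough sketch: `getsizeof` reports only the container itself (not the integers it references), and the variable names are illustrative.

```python
import sys

n = 100_000

squares_list = [i * i for i in range(n)]   # all values materialized at once
squares_gen = (i * i for i in range(n))    # values produced on demand

list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)

print(f"list object:      {list_size} bytes")
print(f"generator object: {gen_size} bytes")

# A generator can be consumed only once
total_from_gen = sum(squares_gen)
leftover = sum(squares_gen)  # the generator is now exhausted
```

The generator's size stays constant regardless of `n`, which is why it suits large or unbounded inputs, at the cost of single-pass access.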

### Efficient Data Structures
```python
# Use sets for membership testing
# Bad: O(n)
if item in my_list:  # Slow for large lists
    ...

# Good: O(1) on average
if item in my_set:  # Fast
    ...

# Use deque for queue operations
from collections import deque
queue = deque()
queue.append(item)      # O(1)
queue.popleft()         # O(1) vs list.pop(0) which is O(n)

# Use defaultdict to avoid key checks
from collections import defaultdict
counter = defaultdict(int)
counter[key] += 1  # No need to check if key exists
```
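The membership-testing gap is easy to measure. The sketch below times a worst-case lookup (a value that is absent) against a list and a set holding the same items; the sizes and names are arbitrary choices for illustration.

```python
import timeit

n = 100_000
items_list = list(range(n))
items_set = set(items_list)
missing = -1  # absent value: the list must compare against every element

list_time = timeit.timeit(lambda: missing in items_list, number=100)
set_time = timeit.timeit(lambda: missing in items_set, number=100)

print(f"list: {list_time:.4f}s for 100 lookups")
print(f"set:  {set_time:.6f}s for 100 lookups")
```

Building the set costs O(n) once, so the conversion pays off when the same collection is queried repeatedly.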

## GIL (Global Interpreter Lock) Considerations

### CPU-Bound Work
```python
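# Use multiprocessing for CPU-bound tasks
from multiprocessing import Pool

def cpu_intensive_task(n):
    # Heavy computation (placeholder: sum of squares)
    return sum(i * i for i in range(n))

# The guard is required on platforms that spawn workers (Windows, macOS)
if __name__ == "__main__":
    data_list = [10**6] * 8
    with Pool(processes=4) as pool:
        results = pool.map(cpu_intensive_task, data_list)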
```

### I/O-Bound Work
```python
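# Use asyncio or threading for I/O-bound tasks
import asyncio

async def io_bound_task(url):
    # Network I/O, file I/O (placeholder: simulated wait)
    await asyncio.sleep(0.1)
    return url

async def main(urls):
    return await asyncio.gather(*(io_bound_task(url) for url in urls))

# `await` is only valid inside a coroutine, so start the event loop explicitly
urls = ["https://example.com/a", "https://example.com/b"]  # placeholder URLs
results = asyncio.run(main(urls))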
```
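For blocking libraries that have no async interface, the threading option mentioned above can be sketched with `concurrent.futures.ThreadPoolExecutor`. Here `blocking_io` is a hypothetical stand-in that sleeps instead of performing real I/O; a blocking call like this releases the GIL while it waits, which is what lets the threads overlap.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_io(task_id):
    """Stand-in for a blocking call such as a synchronous HTTP request."""
    time.sleep(0.05)
    return task_id


start = time.perf_counter()
with ThreadPoolExecutor(max_workers=5) as executor:
    # map() yields results in the same order as the inputs
    thread_results = list(executor.map(blocking_io, range(5)))
threaded_time = time.perf_counter() - start

print(thread_results, f"{threaded_time:.2f}s")
```

Threads do not help CPU-bound pure-Python code in the same way, because only one thread executes Python bytecode at a time under the GIL.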

## Common Python Anti-Patterns

### String Concatenation
```python
# Bad: Can degrade to O(n²), since each += may copy the whole string so far
result = ""
for s in strings:
    result += s

# Good: O(n)
result = "".join(strings)
```
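When the pieces are produced inside a loop rather than available up front, one common way to keep the `join` approach is to collect them in a list first. A minimal sketch, where `build_report` and its `rows` input are hypothetical names used only for illustration:

```python
def build_report(rows):
    """Format (name, value) pairs as one line each, joined once at the end."""
    parts = []
    for name, value in rows:
        parts.append(f"{name}={value}")  # list.append is amortized O(1)
    return "\n".join(parts)              # single pass to build the final string


report = build_report([("a", 1), ("b", 2)])
print(report)
```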

### Unnecessary Lambda
```python
# Slower: The lambda adds a Python-level function call per item
sorted_items = sorted(items, key=lambda x: x.value)

# Faster: attrgetter is implemented in C (the gain is usually small; measure first)
from operator import attrgetter
sorted_items = sorted(items, key=attrgetter('value'))
```

### Loop Invariant Code
```python
# Bad: Repeated calculation in loop
for item in items:
    expensive_result = expensive_function()
    process(item, expensive_result)

# Good: Calculate once
expensive_result = expensive_function()
for item in items:
    process(item, expensive_result)
```

## Performance Measurement

### Tracemalloc for Memory Tracking
```python
import tracemalloc

# Start tracking
tracemalloc.start()

# Your code here (placeholder allocation)
data = list(range(1_000_000))

# Get memory usage
current, peak = tracemalloc.get_traced_memory()
print(f"Current: {current / 1024 / 1024:.2f} MB")
print(f"Peak: {peak / 1024 / 1024:.2f} MB")

tracemalloc.stop()
```
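Beyond totals, `tracemalloc` can attribute allocations to source lines by comparing two snapshots. The following is a sketch under the assumption that the allocation of interest happens between the snapshots; the `data` list is a made-up workload.

```python
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()

# Hypothetical allocation we want to locate
data = [str(i) * 10 for i in range(50_000)]

after = tracemalloc.take_snapshot()
tracemalloc.stop()

# Differences are grouped by source line, largest change first
top_stats = after.compare_to(before, "lineno")
for stat in top_stats[:3]:
    print(stat)
```

Each printed entry shows the file and line responsible together with the size difference, which helps narrow down where memory growth originates.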

### Context Manager for Timing
```python
import time
from contextlib import contextmanager

@contextmanager
def timer(name):
    start = time.perf_counter()
    try:
        yield
    finally:
        # Report the elapsed time even if the block raises
        elapsed = time.perf_counter() - start
        print(f"{name}: {elapsed:.4f}s")

# Usage
with timer("Database query"):
    results = db.query(...)
```

## Database Optimization (Python-Specific)

### SQLAlchemy Best Practices
```python
# Bad: N+1 queries
for user in session.query(User).all():
    print(user.profile.bio)  # Separate query for each

# Good: Eager loading
from sqlalchemy.orm import joinedload

users = session.query(User).options(
    joinedload(User.profile)
).all()

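# Good: Batch operations
# (bulk_insert_mappings is a legacy API in SQLAlchemy 2.0, where
# session.execute(insert(User), user_dicts) is the recommended form)
session.bulk_insert_mappings(User, user_dicts)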
session.commit()
```

## Caching Strategies

### Function Caching
```python
from functools import lru_cache, cache

# LRU cache with size limit
@lru_cache(maxsize=128)
def expensive_computation(n):
    # Heavy computation (placeholder: sum of squares)
    return sum(i * i for i in range(n))

# Unlimited cache (Python 3.9+)
@cache
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n-1) + fibonacci(n-2)

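# Manual cache with expiration (third-party cachetools package)
from cachetools import TTLCache
# Named ttl_cache so it does not shadow functools.cache imported above
ttl_cache = TTLCache(maxsize=100, ttl=300)  # 5 minutes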
```
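Whether a cache is actually paying off can be checked with the `cache_info()` method that `lru_cache` attaches to the wrapped function. A small sketch, using a hypothetical `fib` function so the example stands alone:

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)


value = fib(30)
info = fib.cache_info()  # named tuple: hits, misses, maxsize, currsize
print(value, info)

# Drop cached entries, e.g. when the underlying data changes
fib.cache_clear()
size_after_clear = fib.cache_info().currsize
```

A low hit count relative to misses suggests the function is rarely called with repeated arguments, in which case the cache adds memory cost without much benefit.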

## Performance Testing

### pytest-benchmark
```python
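def test_processing_performance(benchmark):
    # The benchmark fixture calibrates rounds and iterations automatically
    result = benchmark(process_data, large_dataset)
    assert result is not None

# Pedantic mode: set rounds and iterations explicitly
# (to compare against a saved baseline, run pytest with
# --benchmark-autosave and later with --benchmark-compare)
def test_with_fixed_rounds(benchmark):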
    benchmark.pedantic(
        process_data,
        args=(dataset,),
        iterations=10,
        rounds=100
    )
```

### Load Testing with Locust
```python
from locust import HttpUser, task, between

class WebsiteUser(HttpUser):
    wait_time = between(1, 3)

    @task
    def load_homepage(self):
        self.client.get("/")

    @task(3)  # 3x more likely than homepage
    def load_api(self):
        self.client.get("/api/data")
```

## Performance Checklist

**Before Optimizing:**
- [ ] Profile to identify actual bottlenecks (don't guess!)
- [ ] Measure baseline performance
- [ ] Set performance targets

**Python-Specific Optimizations:**
- [ ] Use generators for large datasets
- [ ] Replace loops with list comprehensions where appropriate
- [ ] Use appropriate data structures (set, deque, defaultdict)
- [ ] Implement caching with @lru_cache or @cache
- [ ] Use async/await for I/O-bound operations
- [ ] Use multiprocessing for CPU-bound operations
- [ ] Avoid string concatenation in loops
- [ ] Minimize attribute lookups in hot loops
- [ ] Use __slots__ for classes with many instances
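
The `__slots__` item above can be illustrated by comparing two otherwise identical classes. This is a rough sketch: `PointDict` and `PointSlots` are hypothetical names, and exact sizes depend on the Python version.

```python
import sys


class PointDict:
    def __init__(self, x, y):
        self.x = x
        self.y = y


class PointSlots:
    __slots__ = ("x", "y")  # fixed attribute set, no per-instance __dict__

    def __init__(self, x, y):
        self.x = x
        self.y = y


plain = PointDict(1, 2)
slotted = PointSlots(1, 2)

# A regular instance also pays for its attribute dictionary
plain_size = sys.getsizeof(plain) + sys.getsizeof(plain.__dict__)
slotted_size = sys.getsizeof(slotted)
print(f"with __dict__: {plain_size} bytes, with __slots__: {slotted_size} bytes")

# The trade-off: attributes not listed in __slots__ cannot be added
try:
    slotted.z = 3
    can_add_attribute = True
except AttributeError:
    can_add_attribute = False
```

The saving per instance is small, so it tends to matter only when many instances are alive at once.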

**After Optimizing:**
- [ ] Re-profile to verify improvements
- [ ] Check memory usage hasn't increased significantly
- [ ] Ensure code readability is maintained
- [ ] Add performance regression tests

## Tools and Libraries

**Profiling:**
- `cProfile` - Built-in execution profiler
- `py-spy` - Sampling profiler without code changes
- `memory_profiler` - Memory usage line-by-line
- `memray` - Modern memory profiler with flamegraphs

**Performance Testing:**
- `pytest-benchmark` - Benchmark tests
- `locust` - Load testing framework
- `hyperfine` - Command-line benchmarking of whole programs (language-agnostic)

**Optimization:**
- `numpy` - Vectorized operations for numerical data
- `numba` - JIT compilation for numerical functions
- `cython` - Compile Python to C for speed

---

*Python-specific performance optimization with profiling tools and patterns*