
Python Performance Optimization

Load this file when: Optimizing performance in Python projects

Profiling Tools

Execution Time Profiling

# cProfile - Built-in profiler
python -m cProfile -o profile.stats script.py
python -m pstats profile.stats

# py-spy - Sampling profiler (no code changes needed)
py-spy record -o profile.svg -- python script.py
py-spy top -- python script.py

# line_profiler - Line-by-line profiling (profiles only functions
# decorated with @profile; see the sketch below)
kernprof -l -v script.py
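
When run with -l, kernprof injects a profile decorator into builtins, so only decorated functions are measured. A minimal sketch of a script prepared for it (the function is illustrative):

# script.py - run with: kernprof -l -v script.py
@profile  # injected by kernprof at runtime; no import needed
def sum_of_squares(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == "__main__":
    sum_of_squares(1_000_000)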

Memory Profiling

# memory_profiler - Line-by-line memory usage
# (profiles only functions decorated with @profile)
python -m memory_profiler script.py

# memray - Modern memory profiler
memray run -o output.bin script.py
memray flamegraph output.bin

# tracemalloc - Built-in memory tracking
# (use in code, see example below)

Benchmarking

# pytest-benchmark
pytest tests/ --benchmark-only

# timeit - Quick microbenchmarks
python -m timeit "'-'.join(str(n) for n in range(100))"
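
The same microbenchmark can also be run from Python with the timeit module (a sketch; statement and repetition count are arbitrary):

import timeit

elapsed = timeit.timeit("'-'.join(str(n) for n in range(100))", number=10_000)
print(f"{elapsed:.4f}s for 10,000 runs")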

Python-Specific Optimization Patterns

Async/Await Patterns

import asyncio
import aiohttp

async def fetch_url(session, url):
    async with session.get(url) as response:
        return await response.text()

# Good: Parallel async operations
async def fetch_all(urls):
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_url(session, url) for url in urls]
        return await asyncio.gather(*tasks)

# Bad: Sequential awaits (defeats the purpose of async)
async def fetch_all_bad(urls):
    results = []
    async with aiohttp.ClientSession() as session:
        for url in urls:
            results.append(await fetch_url(session, url))
    return results

List Comprehensions vs Generators

# Generator (memory efficient for large datasets)
def process_large_file(filename):
    with open(filename) as f:  # file stays open while being consumed
        for line in f:
            yield process_line(line)

# List comprehension (when you need all data in memory)
def process_small_file(filename):
    with open(filename) as f:
        return [process_line(line) for line in f]

# Use itertools to compose generators lazily
from itertools import islice, chain
first_10 = list(islice(generate_data(), 10))
combined = chain(generate_data(), generate_data())  # concatenate without materializing

Efficient Data Structures

# Use sets for membership testing
# Bad: O(n)
if item in my_list:  # Slow for large lists
    ...

# Good: O(1)
if item in my_set:  # Fast
    ...

# Use deque for queue operations
from collections import deque
queue = deque()
queue.append(item)      # O(1)
queue.popleft()         # O(1) vs list.pop(0) which is O(n)

# Use defaultdict to avoid key checks
from collections import defaultdict
counter = defaultdict(int)
counter[key] += 1  # No need to check if key exists

GIL (Global Interpreter Lock) Considerations

The GIL lets only one thread execute Python bytecode at a time, so threads speed up I/O-bound work (the lock is released while waiting on I/O) but not CPU-bound work.

CPU-Bound Work

# Use multiprocessing for CPU-bound tasks (separate processes bypass the GIL)
from multiprocessing import Pool

def cpu_intensive_task(n):
    # Heavy computation (illustrative)
    return sum(i * i for i in range(n))

if __name__ == "__main__":  # guard required under the spawn start method
    with Pool(processes=4) as pool:
        results = pool.map(cpu_intensive_task, [10**6] * 8)

I/O-Bound Work

# Use asyncio (or threads) for I/O-bound tasks; the GIL is released during I/O
import asyncio
import aiohttp

async def io_bound_task(session, url):
    async with session.get(url) as response:  # network I/O
        return await response.text()

async def main(urls):
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*[io_bound_task(session, url) for url in urls])

results = asyncio.run(main(["https://example.com"]))

Common Python Anti-Patterns

String Concatenation

# Bad: O(n²) for n strings
result = ""
for s in strings:
    result += s

# Good: O(n)
result = "".join(strings)

Unnecessary Lambda

# Bad: Python-level function call per element
sorted_items = sorted(items, key=lambda x: x.value)

# Good: attrgetter runs in C, avoiding per-call overhead
from operator import attrgetter
sorted_items = sorted(items, key=attrgetter('value'))

Loop Invariant Code

# Bad: Repeated calculation in loop
for item in items:
    expensive_result = expensive_function()
    process(item, expensive_result)

# Good: Calculate once
expensive_result = expensive_function()
for item in items:
    process(item, expensive_result)

Performance Measurement

Tracemalloc for Memory Tracking

import tracemalloc

# Start tracking
tracemalloc.start()

# Your code here
data = [i for i in range(1000000)]

# Get memory usage
current, peak = tracemalloc.get_traced_memory()
print(f"Current: {current / 1024 / 1024:.2f} MB")
print(f"Peak: {peak / 1024 / 1024:.2f} MB")

tracemalloc.stop()

Context Manager for Timing

import time
from contextlib import contextmanager

@contextmanager
def timer(name):
    start = time.perf_counter()
    try:
        yield
    finally:  # report even if the timed block raises
        elapsed = time.perf_counter() - start
        print(f"{name}: {elapsed:.4f}s")

# Usage
with timer("Database query"):
    results = db.query(...)

Database Optimization (Python-Specific)

SQLAlchemy Best Practices

# Bad: N+1 queries
for user in session.query(User).all():
    print(user.profile.bio)  # Separate query for each

# Good: Eager loading
from sqlalchemy.orm import joinedload

users = session.query(User).options(
    joinedload(User.profile)
).all()

# Good: Batch operations
session.bulk_insert_mappings(User, user_dicts)
session.commit()

Caching Strategies

Function Caching

from functools import lru_cache, cache

# LRU cache with size limit
@lru_cache(maxsize=128)
def expensive_computation(n):
    # Heavy computation (illustrative)
    return sum(i * i for i in range(n))

# Unlimited cache (Python 3.9+)
@cache
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n-1) + fibonacci(n-2)

# Manual cache with expiration (renamed to avoid shadowing functools.cache)
from cachetools import TTLCache
ttl_cache = TTLCache(maxsize=100, ttl=300)  # entries expire after 5 minutes
ttl_cache["key"] = expensive_computation(10_000)
value = ttl_cache.get("key")  # None once the entry has expired

Performance Testing

pytest-benchmark

def test_processing_performance(benchmark):
    # benchmark() handles warm-up and iteration counts automatically
    result = benchmark(process_data, large_dataset)
    assert result is not None

# Fine-grained control over iterations and rounds
def test_against_baseline(benchmark):
    benchmark.pedantic(
        process_data,
        args=(dataset,),
        iterations=10,
        rounds=100,
    )

Load Testing with Locust

from locust import HttpUser, task, between

class WebsiteUser(HttpUser):
    wait_time = between(1, 3)

    @task
    def load_homepage(self):
        self.client.get("/")

    @task(3)  # 3x more likely than homepage
    def load_api(self):
        self.client.get("/api/data")

Performance Checklist

Before Optimizing:

  • Profile to identify actual bottlenecks (don't guess!)
  • Measure baseline performance
  • Set performance targets

Python-Specific Optimizations:

  • Use generators for large datasets
  • Replace loops with list comprehensions where appropriate
  • Use appropriate data structures (set, deque, defaultdict)
  • Implement caching with @lru_cache or @cache
  • Use async/await for I/O-bound operations
  • Use multiprocessing for CPU-bound operations
  • Avoid string concatenation in loops
  • Minimize attribute lookups in hot loops
  • Use __slots__ for classes with many instances (see the sketch after this list)
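
A minimal sketch of the last two items (names are illustrative):

# __slots__ drops the per-instance __dict__, shrinking memory when
# millions of instances are alive at once
class Point:
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y

# Bind a method to a local name to avoid a fresh attribute lookup per iteration
def scale(values, factor):
    out = []
    append = out.append
    for v in values:
        append(v * factor)
    return out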

After Optimizing:

  • Re-profile to verify improvements
  • Check memory usage hasn't increased significantly
  • Ensure code readability is maintained
  • Add performance regression tests

Tools and Libraries

Profiling:

  • cProfile - Built-in execution profiler
  • py-spy - Sampling profiler without code changes
  • memory_profiler - Memory usage line-by-line
  • memray - Modern memory profiler with flamegraphs

Performance Testing:

  • pytest-benchmark - Benchmark tests
  • locust - Load testing framework
  • hyperfine - Command-line benchmarking (example below)
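
hyperfine benchmarks whole commands, which makes it handy for before/after comparisons (a sketch; the script names are placeholders):

hyperfine 'python script.py' 'python optimized_script.py'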

Optimization:

  • numpy - Vectorized operations for numerical data (see the sketch after this list)
  • numba - JIT compilation for numerical functions
  • cython - Compile Python to C for speed
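
A quick illustration of the numpy item (a sketch; the actual speedup depends on array size and dtype):

import numpy as np

def sum_of_squares_loop(values):
    # Pure Python: one bytecode dispatch per element
    total = 0.0
    for v in values:
        total += v * v
    return total

def sum_of_squares_numpy(values):
    # Vectorized: the loop runs in C inside numpy
    arr = np.asarray(values, dtype=np.float64)
    return float(np.dot(arr, arr))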

Python-specific performance optimization with profiling tools and patterns