---
name: python-performance-optimization
description: Profile and optimize Python code using cProfile, memory profilers, and performance best practices. Use when debugging slow Python code, optimizing bottlenecks, or improving application performance.
---

# Python Performance Optimization

Comprehensive guide to profiling, analyzing, and optimizing Python code for better performance, including CPU profiling, memory optimization, and implementation best practices.
## When to Use This Skill

- Identifying performance bottlenecks in Python applications
- Reducing application latency and response times
- Optimizing CPU-intensive operations
- Reducing memory consumption and fixing memory leaks
- Improving database query performance
- Optimizing I/O operations
- Speeding up data processing pipelines
- Implementing high-performance algorithms
- Profiling production applications
## Core Concepts

### 1. Profiling Types
- **CPU Profiling**: Identify time-consuming functions
- **Memory Profiling**: Track memory allocation and leaks
- **Line Profiling**: Profile at line-by-line granularity
- **Call Graph**: Visualize function call relationships

### 2. Performance Metrics
- **Execution Time**: How long operations take
- **Memory Usage**: Peak and average memory consumption
- **CPU Utilization**: Processor usage patterns
- **I/O Wait**: Time spent waiting on I/O operations

### 3. Optimization Strategies
- **Algorithmic**: Better algorithms and data structures
- **Implementation**: More efficient code patterns
- **Parallelization**: Multi-threading/multiprocessing
- **Caching**: Avoid redundant computation
- **Native Extensions**: C/Rust for critical paths
## Quick Start

### Basic Timing

```python
import time

def measure_time():
    """Simple timing measurement."""
    # perf_counter is a monotonic, high-resolution clock suited to timing
    start = time.perf_counter()

    # Your code here
    result = sum(range(1000000))

    elapsed = time.perf_counter() - start
    print(f"Execution time: {elapsed:.4f} seconds")
    return result

# Better: use timeit for accurate measurements
import timeit

execution_time = timeit.timeit(
    "sum(range(1000000))",
    number=100
)
print(f"Average time: {execution_time/100:.6f} seconds")
```
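A single `timeit` run can still be skewed by background load; `timeit.repeat` runs several independent trials so you can take the least-disturbed one (a minimal sketch using the same statement as above):

```python
import timeit

# repeat=5 runs five independent trials of 100 executions each;
# the minimum is the estimate least affected by system noise
times = timeit.repeat("sum(range(1000000))", number=100, repeat=5)
print(f"Best of 5: {min(times)/100:.6f} seconds per call")
```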
## Profiling Tools

### Pattern 1: cProfile - CPU Profiling

```python
import cProfile
import pstats
from pstats import SortKey

def slow_function():
    """Function to profile."""
    total = 0
    for i in range(1000000):
        total += i
    return total

def another_function():
    """Another function."""
    return [i**2 for i in range(100000)]

def main():
    """Main function to profile."""
    result1 = slow_function()
    result2 = another_function()
    return result1, result2

# Profile the code
if __name__ == "__main__":
    profiler = cProfile.Profile()
    profiler.enable()

    main()

    profiler.disable()

    # Print stats
    stats = pstats.Stats(profiler)
    stats.sort_stats(SortKey.CUMULATIVE)
    stats.print_stats(10)  # Top 10 functions

    # Save to file for later analysis
    stats.dump_stats("profile_output.prof")
```

**Command-line profiling:**
```bash
# Profile a script
python -m cProfile -o output.prof script.py

# View results
python -m pstats output.prof
# In pstats:
# sort cumtime
# stats 10
```
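For quick one-off measurements, `cProfile.run` wraps the enable/run/report cycle in a single call (a minimal sketch reusing `main` from above):

```python
import cProfile

# One-line alternative to the explicit Profile object;
# sorts the printed report by cumulative time
cProfile.run("main()", sort="cumulative")
```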
### Pattern 2: line_profiler - Line-by-Line Profiling

```python
# Install: pip install line-profiler

# Add @profile decorator (kernprof injects it into builtins at runtime,
# so running this with plain `python` raises NameError)
@profile
def process_data(data):
    """Process data with line profiling."""
    result = []
    for item in data:
        processed = item * 2
        result.append(processed)
    return result

# Run with:
# kernprof -l -v script.py
```

**Manual line profiling:**
```python
from line_profiler import LineProfiler

def process_data(data):
    """Function to profile."""
    result = []
    for item in data:
        processed = item * 2
        result.append(processed)
    return result

if __name__ == "__main__":
    lp = LineProfiler()
    lp.add_function(process_data)

    data = list(range(100000))

    lp_wrapper = lp(process_data)
    lp_wrapper(data)

    lp.print_stats()
```
### Pattern 3: memory_profiler - Memory Usage

```python
# Install: pip install memory-profiler

from memory_profiler import profile

@profile
def memory_intensive():
    """Function that uses lots of memory."""
    # Create large list
    big_list = [i for i in range(1000000)]

    # Create large dict
    big_dict = {i: i**2 for i in range(100000)}

    # Process data
    result = sum(big_list)

    return result

if __name__ == "__main__":
    memory_intensive()

# Run with:
# python -m memory_profiler script.py
```
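memory-profiler also ships an `mprof` command for tracking memory over time rather than per line (a sketch; plotting assumes matplotlib is installed):

```bash
# Record memory usage of a full run, then plot it over time
mprof run script.py
mprof plot
```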
### Pattern 4: py-spy - Production Profiling

```bash
# Install: pip install py-spy

# Profile a running Python process
py-spy top --pid 12345

# Generate flamegraph
py-spy record -o profile.svg --pid 12345

# Profile a script
py-spy record -o profile.svg -- python script.py

# Dump current call stack
py-spy dump --pid 12345
```
## Optimization Patterns

### Pattern 5: List Comprehensions vs Loops

```python
import timeit

# Slow: Traditional loop
def slow_squares(n):
    """Create list of squares using loop."""
    result = []
    for i in range(n):
        result.append(i**2)
    return result

# Fast: List comprehension
def fast_squares(n):
    """Create list of squares using comprehension."""
    return [i**2 for i in range(n)]

# Benchmark
n = 100000

slow_time = timeit.timeit(lambda: slow_squares(n), number=100)
fast_time = timeit.timeit(lambda: fast_squares(n), number=100)

print(f"Loop: {slow_time:.4f}s")
print(f"Comprehension: {fast_time:.4f}s")
print(f"Speedup: {slow_time/fast_time:.2f}x")

# map only wins when the mapped function is a C builtin,
# e.g. list(map(str, range(n))); with a Python lambda it is
# usually slower than the comprehension above
def map_squares(n):
    """map with a lambda still pays Python call overhead per element."""
    return list(map(lambda x: x**2, range(n)))
```
### Pattern 6: Generator Expressions for Memory

```python
import sys

def list_approach():
    """Memory-intensive list."""
    data = [i**2 for i in range(1000000)]
    return sum(data)

def generator_approach():
    """Memory-efficient generator."""
    data = (i**2 for i in range(1000000))
    return sum(data)

# Memory comparison
list_data = [i for i in range(1000000)]
gen_data = (i for i in range(1000000))

print(f"List size: {sys.getsizeof(list_data)} bytes")
print(f"Generator size: {sys.getsizeof(gen_data)} bytes")

# Generators use constant memory regardless of size
```
### Pattern 7: String Concatenation

```python
import timeit

def slow_concat(items):
    """Slow string concatenation."""
    result = ""
    for item in items:
        result += str(item)  # each += can copy the whole string so far (quadratic worst case)
    return result

def fast_concat(items):
    """Fast string concatenation with join."""
    return "".join(str(item) for item in items)

def faster_concat(items):
    """Even faster with list."""
    parts = [str(item) for item in items]
    return "".join(parts)

items = list(range(10000))

# Benchmark
slow = timeit.timeit(lambda: slow_concat(items), number=100)
fast = timeit.timeit(lambda: fast_concat(items), number=100)
faster = timeit.timeit(lambda: faster_concat(items), number=100)

print(f"Concatenation (+): {slow:.4f}s")
print(f"Join (generator): {fast:.4f}s")
print(f"Join (list): {faster:.4f}s")
```
### Pattern 8: Dictionary Lookups vs List Searches

```python
import timeit

# Create test data
size = 10000
items = list(range(size))
lookup_dict = {i: i for i in range(size)}

def list_search(items, target):
    """O(n) search in list."""
    return target in items

def dict_search(lookup_dict, target):
    """O(1) average-case search in dict."""
    return target in lookup_dict

target = size - 1  # Worst case for list

# Benchmark
list_time = timeit.timeit(
    lambda: list_search(items, target),
    number=1000
)
dict_time = timeit.timeit(
    lambda: dict_search(lookup_dict, target),
    number=1000
)

print(f"List search: {list_time:.6f}s")
print(f"Dict search: {dict_time:.6f}s")
print(f"Speedup: {list_time/dict_time:.0f}x")
```
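When only membership matters and no value is attached to each key, a set gives the same average O(1) lookup (a minimal sketch reusing `items` and `target` from above):

```python
lookup_set = set(items)

def set_search(lookup_set, target):
    """O(1) average-case membership test in a set."""
    return target in lookup_set

set_time = timeit.timeit(lambda: set_search(lookup_set, target), number=1000)
print(f"Set search: {set_time:.6f}s")
```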
### Pattern 9: Local Variable Access

```python
import timeit

# Global variable (slow)
GLOBAL_VALUE = 100

def use_global():
    """Access global variable."""
    total = 0
    for i in range(10000):
        total += GLOBAL_VALUE  # each access is a dict lookup in module globals
    return total

def use_local():
    """Use local variable."""
    local_value = 100
    total = 0
    for i in range(10000):
        total += local_value  # locals use fast array-indexed access
    return total

# Local is faster
global_time = timeit.timeit(use_global, number=1000)
local_time = timeit.timeit(use_local, number=1000)

print(f"Global access: {global_time:.4f}s")
print(f"Local access: {local_time:.4f}s")
print(f"Speedup: {global_time/local_time:.2f}x")
```
### Pattern 10: Function Call Overhead

```python
import timeit

def calculate_inline():
    """Inline calculation."""
    total = 0
    for i in range(10000):
        total += i * 2 + 1
    return total

def helper_function(x):
    """Helper function."""
    return x * 2 + 1

def calculate_with_function():
    """Calculation with function calls."""
    total = 0
    for i in range(10000):
        total += helper_function(i)
    return total

# Inline is faster due to no call overhead
inline_time = timeit.timeit(calculate_inline, number=1000)
function_time = timeit.timeit(calculate_with_function, number=1000)

print(f"Inline: {inline_time:.4f}s")
print(f"Function calls: {function_time:.4f}s")
```
## Advanced Optimization

### Pattern 11: NumPy for Numerical Operations

```python
import timeit
import numpy as np

def python_sum(n):
    """Sum using pure Python."""
    return sum(range(n))

def numpy_sum(n):
    """Sum using NumPy."""
    return np.arange(n).sum()

n = 1000000

python_time = timeit.timeit(lambda: python_sum(n), number=100)
numpy_time = timeit.timeit(lambda: numpy_sum(n), number=100)

print(f"Python: {python_time:.4f}s")
print(f"NumPy: {numpy_time:.4f}s")
print(f"Speedup: {python_time/numpy_time:.2f}x")

# Vectorized operations
def python_multiply():
    """Element-wise multiplication in Python."""
    a = list(range(100000))
    b = list(range(100000))
    return [x * y for x, y in zip(a, b)]

def numpy_multiply():
    """Vectorized multiplication in NumPy."""
    a = np.arange(100000)
    b = np.arange(100000)
    return a * b

py_time = timeit.timeit(python_multiply, number=100)
np_time = timeit.timeit(numpy_multiply, number=100)

print(f"\nPython multiply: {py_time:.4f}s")
print(f"NumPy multiply: {np_time:.4f}s")
print(f"Speedup: {py_time/np_time:.2f}x")
```
### Pattern 12: Caching with functools.lru_cache

```python
from functools import lru_cache
import timeit

def fibonacci_slow(n):
    """Recursive fibonacci without caching."""
    if n < 2:
        return n
    return fibonacci_slow(n-1) + fibonacci_slow(n-2)

@lru_cache(maxsize=None)  # unbounded; functools.cache is equivalent on 3.9+
def fibonacci_fast(n):
    """Recursive fibonacci with caching."""
    if n < 2:
        return n
    return fibonacci_fast(n-1) + fibonacci_fast(n-2)

# Massive speedup for recursive algorithms
n = 30

slow_time = timeit.timeit(lambda: fibonacci_slow(n), number=1)
fast_time = timeit.timeit(lambda: fibonacci_fast(n), number=1000)

print(f"Without cache (1 run): {slow_time:.4f}s")
print(f"With cache (1000 runs): {fast_time:.4f}s")

# Cache info
print(f"Cache info: {fibonacci_fast.cache_info()}")
```
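In long-running processes an unbounded cache can itself become a memory leak; a bounded cache evicts least-recently-used entries instead (a minimal sketch with an illustrative `maxsize` and a stand-in function):

```python
from functools import lru_cache

@lru_cache(maxsize=1024)  # keeps at most 1024 results, evicting LRU entries
def expensive_computation(x):
    """Stand-in for any pure, expensive function."""
    return sum(i * x for i in range(100000))

print(expensive_computation(3))
print(expensive_computation.cache_info())  # hits, misses, maxsize, currsize
```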
### Pattern 13: Using `__slots__` for Memory

```python
import sys

class RegularClass:
    """Regular class with __dict__."""
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

class SlottedClass:
    """Class with __slots__ for memory efficiency."""
    __slots__ = ['x', 'y', 'z']

    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

# Memory comparison
regular = RegularClass(1, 2, 3)
slotted = SlottedClass(1, 2, 3)

print(f"Regular class size: {sys.getsizeof(regular)} bytes")
print(f"Slotted class size: {sys.getsizeof(slotted)} bytes")

# Significant savings with many instances
regular_objects = [RegularClass(i, i+1, i+2) for i in range(10000)]
slotted_objects = [SlottedClass(i, i+1, i+2) for i in range(10000)]

# sys.getsizeof does not count the per-instance __dict__, so this
# rough estimate understates how much regular instances really use
print(f"\nMemory for 10000 regular objects: ~{sys.getsizeof(regular) * 10000} bytes (underestimate)")
print(f"Memory for 10000 slotted objects: ~{sys.getsizeof(slotted) * 10000} bytes")
```
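Because `sys.getsizeof` ignores each instance's `__dict__`, `tracemalloc` gives a more honest comparison (a sketch reusing the two classes above):

```python
import tracemalloc

# Measure total allocations for each class, including per-instance dicts
for cls in (RegularClass, SlottedClass):
    tracemalloc.start()
    objects = [cls(i, i + 1, i + 2) for i in range(10000)]
    current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    print(f"{cls.__name__}: {current} bytes allocated")
```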
### Pattern 14: Multiprocessing for CPU-Bound Tasks

```python
import multiprocessing as mp
import time

def cpu_intensive_task(n):
    """CPU-intensive calculation."""
    return sum(i**2 for i in range(n))

def sequential_processing():
    """Process tasks sequentially."""
    start = time.time()
    results = [cpu_intensive_task(1000000) for _ in range(4)]
    elapsed = time.time() - start
    return elapsed, results

def parallel_processing():
    """Process tasks in parallel."""
    start = time.time()
    with mp.Pool(processes=4) as pool:
        results = pool.map(cpu_intensive_task, [1000000] * 4)
    elapsed = time.time() - start
    return elapsed, results

if __name__ == "__main__":
    seq_time, seq_results = sequential_processing()
    par_time, par_results = parallel_processing()

    print(f"Sequential: {seq_time:.2f}s")
    print(f"Parallel: {par_time:.2f}s")
    print(f"Speedup: {seq_time/par_time:.2f}x")
```
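The same pattern is available through the higher-level `concurrent.futures` API (a minimal sketch reusing `cpu_intensive_task` from above):

```python
from concurrent.futures import ProcessPoolExecutor

if __name__ == "__main__":
    # executor.map distributes the calls across worker processes
    with ProcessPoolExecutor(max_workers=4) as executor:
        results = list(executor.map(cpu_intensive_task, [1000000] * 4))
    print(results)
```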
### Pattern 15: Async I/O for I/O-Bound Tasks

```python
# Install: pip install aiohttp requests

import asyncio
import aiohttp
import time
import requests

urls = [
    "https://httpbin.org/delay/1",
    "https://httpbin.org/delay/1",
    "https://httpbin.org/delay/1",
    "https://httpbin.org/delay/1",
]

def synchronous_requests():
    """Synchronous HTTP requests."""
    start = time.time()
    results = []
    for url in urls:
        response = requests.get(url)
        results.append(response.status_code)
    elapsed = time.time() - start
    return elapsed, results

async def async_fetch(session, url):
    """Async HTTP request."""
    async with session.get(url) as response:
        return response.status

async def asynchronous_requests():
    """Asynchronous HTTP requests."""
    start = time.time()
    async with aiohttp.ClientSession() as session:
        tasks = [async_fetch(session, url) for url in urls]
        results = await asyncio.gather(*tasks)
    elapsed = time.time() - start
    return elapsed, results

# Async is much faster for I/O-bound work
sync_time, sync_results = synchronous_requests()
async_time, async_results = asyncio.run(asynchronous_requests())

print(f"Synchronous: {sync_time:.2f}s")
print(f"Asynchronous: {async_time:.2f}s")
print(f"Speedup: {sync_time/async_time:.2f}x")
```
## Database Optimization

### Pattern 16: Batch Database Operations

```python
import sqlite3
import time

def create_db():
    """Create test database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    return conn

def slow_inserts(conn, count):
    """Insert records one at a time."""
    start = time.time()
    cursor = conn.cursor()
    for i in range(count):
        cursor.execute("INSERT INTO users (name) VALUES (?)", (f"User {i}",))
        conn.commit()  # Commit each insert
    elapsed = time.time() - start
    return elapsed

def fast_inserts(conn, count):
    """Batch insert with single commit."""
    start = time.time()
    cursor = conn.cursor()
    data = [(f"User {i}",) for i in range(count)]
    cursor.executemany("INSERT INTO users (name) VALUES (?)", data)
    conn.commit()  # Single commit
    elapsed = time.time() - start
    return elapsed

# Benchmark
conn1 = create_db()
slow_time = slow_inserts(conn1, 1000)

conn2 = create_db()
fast_time = fast_inserts(conn2, 1000)

print(f"Individual inserts: {slow_time:.4f}s")
print(f"Batch insert: {fast_time:.4f}s")
print(f"Speedup: {slow_time/fast_time:.2f}x")
```
### Pattern 17: Query Optimization

```python
# Use indexes for frequently queried columns
"""
-- Slow: No index
SELECT * FROM users WHERE email = 'user@example.com';

-- Fast: With index
CREATE INDEX idx_users_email ON users(email);
SELECT * FROM users WHERE email = 'user@example.com';
"""

# Use query planning
import sqlite3

# Assumes example.db contains a users table with an email column
conn = sqlite3.connect("example.db")
cursor = conn.cursor()

# Analyze query performance
cursor.execute("EXPLAIN QUERY PLAN SELECT * FROM users WHERE email = ?", ("test@example.com",))
print(cursor.fetchall())

# SELECT only the columns you need
# Slow: SELECT *
# Fast: SELECT id, name
```
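For large result sets, stream rows in chunks with `fetchmany` instead of loading everything with `fetchall` (a sketch against the same connection; `handle_row` is a hypothetical per-row callback):

```python
cursor.execute("SELECT id, name FROM users")
while True:
    rows = cursor.fetchmany(1000)  # fetch at most 1000 rows per round trip
    if not rows:
        break
    for row in rows:
        handle_row(row)  # hypothetical handler
```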
## Memory Optimization

### Pattern 18: Detecting Memory Leaks

```python
import tracemalloc
import gc

def memory_leak_example():
    """Example that leaks memory."""
    leaked_objects = []

    for i in range(100000):
        # Objects added but never removed
        leaked_objects.append([i] * 100)

    # In real code, this would be an unintended reference

def track_memory_usage():
    """Track memory allocations."""
    tracemalloc.start()

    # Take snapshot before
    snapshot1 = tracemalloc.take_snapshot()

    # Run code
    memory_leak_example()

    # Take snapshot after
    snapshot2 = tracemalloc.take_snapshot()

    # Compare
    top_stats = snapshot2.compare_to(snapshot1, 'lineno')

    print("Top 10 memory allocations:")
    for stat in top_stats[:10]:
        print(stat)

    tracemalloc.stop()

# Monitor memory
track_memory_usage()

# Force garbage collection
gc.collect()
```
### Pattern 19: Iterators vs Lists

```python
def process_file_list(filename):
    """Load entire file into memory."""
    with open(filename) as f:
        lines = f.readlines()  # Loads all lines at once
    return sum(1 for line in lines if line.strip())

def process_file_iterator(filename):
    """Process file line by line."""
    with open(filename) as f:
        return sum(1 for line in f if line.strip())

# Iterator uses constant memory
# List loads entire file into memory
```
### Pattern 20: Weakref for Caches

```python
import weakref

class CachedResource:
    """Resource that can be garbage collected."""
    def __init__(self, data):
        self.data = data

# Regular cache prevents garbage collection
regular_cache = {}

def get_resource_regular(key):
    """Get resource from regular cache."""
    if key not in regular_cache:
        regular_cache[key] = CachedResource(f"Data for {key}")
    return regular_cache[key]

# Weak reference cache allows garbage collection
weak_cache = weakref.WeakValueDictionary()

def get_resource_weak(key):
    """Get resource from weak cache."""
    resource = weak_cache.get(key)
    if resource is None:
        resource = CachedResource(f"Data for {key}")
        weak_cache[key] = resource
    return resource

# When no strong references exist, objects can be GC'd
```
## Benchmarking Tools

### Custom Benchmark Decorator

```python
import time
from functools import wraps

def benchmark(func):
    """Decorator to benchmark function execution."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"{func.__name__} took {elapsed:.6f} seconds")
        return result
    return wrapper

@benchmark
def slow_function():
    """Function to benchmark."""
    time.sleep(0.5)
    return sum(range(1000000))

result = slow_function()
```
### Performance Testing with pytest-benchmark

```python
# Install: pip install pytest-benchmark

def test_list_comprehension(benchmark):
    """Benchmark list comprehension."""
    result = benchmark(lambda: [i**2 for i in range(10000)])
    assert len(result) == 10000

def test_map_function(benchmark):
    """Benchmark map function."""
    result = benchmark(lambda: list(map(lambda x: x**2, range(10000))))
    assert len(result) == 10000

# Run with: pytest test_performance.py
# Save a baseline with --benchmark-autosave, then compare runs
# with --benchmark-compare
```
## Best Practices

1. **Profile before optimizing** - Measure to find real bottlenecks
2. **Focus on hot paths** - Optimize code that runs most frequently
3. **Use appropriate data structures** - Dict for lookups, set for membership
4. **Avoid premature optimization** - Clarity first, then optimize
5. **Use built-in functions** - They're implemented in C
6. **Cache expensive computations** - Use lru_cache
7. **Batch I/O operations** - Reduce system calls
8. **Use generators** for large datasets
9. **Consider NumPy** for numerical operations
10. **Profile production code** - Use py-spy for live systems
## Common Pitfalls

- Optimizing without profiling
- Using global variables unnecessarily
- Not using appropriate data structures
- Creating unnecessary copies of data
- Not using connection pooling for databases
- Ignoring algorithmic complexity
- Over-optimizing rare code paths
- Not considering memory usage
## Resources

- **cProfile**: Built-in CPU profiler
- **memory_profiler**: Memory usage profiling
- **line_profiler**: Line-by-line profiling
- **py-spy**: Sampling profiler for production
- **NumPy**: High-performance numerical computing
- **Cython**: Compile Python to C
- **PyPy**: Alternative Python interpreter with JIT
## Performance Checklist

- [ ] Profiled code to identify bottlenecks
- [ ] Used appropriate data structures
- [ ] Implemented caching where beneficial
- [ ] Optimized database queries
- [ ] Used generators for large datasets
- [ ] Considered multiprocessing for CPU-bound tasks
- [ ] Used async I/O for I/O-bound tasks
- [ ] Minimized function call overhead in hot loops
- [ ] Checked for memory leaks
- [ ] Benchmarked before and after optimization