# Python Memory Profiling with Scalene Line-by-line memory and CPU profiling for Python applications using Scalene, with pytest integration and optimization strategies. ## Overview **Before Optimization**: - Memory usage: 500MB for processing 10K records - OOM (Out of Memory) errors with 100K records - Processing time: 45 seconds for 10K records - List comprehensions loading entire dataset **After Optimization**: - Memory usage: 5MB for processing 10K records (99% reduction) - No OOM errors with 1M records - Processing time: 8 seconds for 10K records (82% faster) - Generator-based streaming **Tools**: Scalene, pytest, memory_profiler, tracemalloc ## 1. Scalene Installation and Setup ### Installation ```bash # Install Scalene pip install scalene # Or with uv (faster) uv pip install scalene ``` ### Basic Usage ```bash # Profile entire script scalene script.py # Profile with pytest (recommended) scalene --cli --memory -m pytest tests/ # HTML output scalene --html --outfile profile.html script.py # Profile specific function scalene --reduced-profile script.py ``` ## 2. Profiling with pytest ### Test File Setup ```python # tests/test_data_processing.py import pytest from data_processor import DataProcessor @pytest.fixture def processor(): return DataProcessor() def test_process_large_dataset(processor): # Generate 10K records records = [{'id': i, 'value': i * 2} for i in range(10000)] # Process (this is where memory spike occurs) result = processor.process_records(records) assert len(result) == 10000 ``` ### Running Scalene with pytest ```bash # Profile memory usage during test execution uv run scalene --cli --memory -m pytest tests/test_data_processing.py 2>&1 | grep -i "memory\|mb\|test" # Output shows line-by-line memory allocation ``` **Scalene Output** (before optimization): ``` data_processor.py: Line | Memory % | Memory (MB) | CPU % | Code -----|----------|-------------|-------|----- 12 | 45% | 225 MB | 10% | result = [transform(r) for r in records] 18 | 30% | 150 MB | 5% | filtered = [r for r in result if r['value'] > 0] 25 | 15% | 75 MB | 20% | sorted_data = sorted(filtered, key=lambda x: x['id']) ``` **Analysis**: Line 12 is the hotspot (45% of memory) ## 3. Memory Hotspot Identification ### Vulnerable Code (Memory Spike) ```python # data_processor.py (BEFORE OPTIMIZATION) class DataProcessor: def process_records(self, records: list[dict]) -> list[dict]: # ❌ HOTSPOT: List comprehension loads entire dataset result = [self.transform(r) for r in records] # 225MB for 10K records # ❌ Creates another copy filtered = [r for r in result if r['value'] > 0] # +150MB # ❌ sorted() creates yet another copy sorted_data = sorted(filtered, key=lambda x: x['id']) # +75MB return sorted_data # Total: 450MB for 10K records def transform(self, record: dict) -> dict: return { 'id': record['id'], 'value': record['value'] * 2, 'timestamp': datetime.now() } ``` **Scalene Report**: ``` Memory allocation breakdown: - Line 12 (list comprehension): 225MB (50%) - Line 18 (filtering): 150MB (33%) - Line 25 (sorting): 75MB (17%) Total memory: 450MB for 10,000 records Projected for 100K: 4.5GB → OOM! ``` ### Optimized Code (Generator-Based) ```python # data_processor.py (AFTER OPTIMIZATION) from typing import Iterator class DataProcessor: def process_records(self, records: list[dict]) -> Iterator[dict]: # ✅ Generator: processes one record at a time transformed = (self.transform(r) for r in records) # O(1) memory # ✅ Generator chaining filtered = (r for r in transformed if r['value'] > 0) # O(1) memory # ✅ Stream-based sorting (only if needed) # For very large datasets, use external sorting or database ORDER BY yield from sorted(filtered, key=lambda x: x['id']) # Still O(n), but lazy def transform(self, record: dict) -> dict: return { 'id': record['id'], 'value': record['value'] * 2, 'timestamp': datetime.now() } # Alternative: Fully streaming (no sorting) def process_records_streaming(self, records: list[dict]) -> Iterator[dict]: for record in records: transformed = self.transform(record) if transformed['value'] > 0: yield transformed # O(1) memory, fully streaming ``` **Scalene Report (After)**: ``` Memory allocation breakdown: - Line 12 (generator): 5MB (100% - constant overhead) - Line 18 (filter generator): 0MB (lazy) - Line 25 (yield): 0MB (lazy) Total memory: 5MB for 10,000 records (99% reduction!) Scalable to 1M+ records without OOM ``` ## 4. Common Memory Patterns ### Pattern 1: List Comprehension → Generator **Before** (High Memory): ```python # ❌ Loads entire list into memory def process_large_file(filename: str) -> list[dict]: with open(filename) as f: lines = f.readlines() # Loads entire file (500MB) # Another copy return [json.loads(line) for line in lines] # +500MB = 1GB total ``` **After** (Low Memory): ```python # ✅ Generator: processes line-by-line def process_large_file(filename: str) -> Iterator[dict]: with open(filename) as f: for line in f: # Reads one line at a time yield json.loads(line) # O(1) memory ``` **Scalene diff**: 1GB → 5MB (99.5% reduction) ### Pattern 2: DataFrame Memory Optimization **Before** (High Memory): ```python # ❌ Loads entire CSV into memory import pandas as pd def analyze_data(filename: str): df = pd.read_csv(filename) # 10GB CSV → 10GB RAM # All transformations in memory df['new_col'] = df['value'] * 2 df_filtered = df[df['value'] > 0] return df_filtered.groupby('category').sum() ``` **After** (Low Memory with Chunking): ```python # ✅ Process in chunks import pandas as pd def analyze_data(filename: str): chunk_size = 10000 results = [] # Process 10K rows at a time for chunk in pd.read_csv(filename, chunksize=chunk_size): chunk['new_col'] = chunk['value'] * 2 filtered = chunk[chunk['value'] > 0] group_result = filtered.groupby('category').sum() results.append(group_result) # Combine results return pd.concat(results).groupby(level=0).sum() # Much smaller ``` **Scalene diff**: 10GB → 500MB (95% reduction) ### Pattern 3: String Concatenation **Before** (High Memory): ```python # ❌ Creates new string each iteration (O(n²) memory) def build_report(data: list[dict]) -> str: report = "" for item in data: # 100K items report += f"{item['id']}: {item['value']}\n" # New string every time return report # 500MB final string + 500MB garbage = 1GB ``` **After** (Low Memory): ```python # ✅ StringIO or join (O(n) memory) from io import StringIO def build_report(data: list[dict]) -> str: buffer = StringIO() for item in data: buffer.write(f"{item['id']}: {item['value']}\n") return buffer.getvalue() # Or even better: generator def build_report_streaming(data: list[dict]) -> Iterator[str]: for item in data: yield f"{item['id']}: {item['value']}\n" ``` **Scalene diff**: 1GB → 50MB (95% reduction) ## 5. Scalene CLI Reference ### Common Options ```bash # Memory-only profiling (fastest) scalene --cli --memory script.py # CPU + Memory profiling scalene --cli --cpu --memory script.py # Reduced profile (functions only, not lines) scalene --reduced-profile script.py # Profile specific function scalene --profile-only process_data script.py # HTML report scalene --html --outfile profile.html script.py # Profile with pytest scalene --cli --memory -m pytest tests/ # Set memory sampling interval (default: 1MB) scalene --malloc-threshold 0.1 script.py # Sample every 100KB ``` ### Interpreting Output **Column Meanings**: ``` Memory % | Percentage of total memory allocated Memory MB | Absolute memory allocated (in megabytes) CPU % | Percentage of CPU time spent Python % | Time spent in Python (vs native code) ``` **Example Output**: ``` script.py: Line | Memory % | Memory MB | CPU % | Python % | Code -----|----------|-----------|-------|----------|----- 12 | 45.2% | 225.6 MB | 10.5% | 95.2% | data = [x for x in range(1000000)] 18 | 30.1% | 150.3 MB | 5.2% | 98.1% | filtered = list(filter(lambda x: x > 0, data)) ``` **Analysis**: - Line 12: High memory (45.2%) → optimize list comprehension - Line 18: Moderate memory (30.1%) → use generator instead of list() ## 6. Integration with CI/CD ### GitHub Actions Workflow ```yaml # .github/workflows/memory-profiling.yml name: Memory Profiling on: [pull_request] jobs: profile: runs-on: ubuntu-latest steps: - uses: actions/checkout@v4 - name: Setup Python uses: actions/setup-python@v4 with: python-version: '3.11' - name: Install dependencies run: | pip install scalene pytest - name: Run memory profiling run: | scalene --cli --memory --reduced-profile -m pytest tests/ > profile.txt - name: Check for memory hotspots run: | if grep -q "Memory %" profile.txt; then # Alert if any line uses >100MB if awk '$3 > 100 {exit 1}' profile.txt; then echo "Memory hotspot detected!" exit 1 fi fi - name: Upload profile uses: actions/upload-artifact@v3 with: name: memory-profile path: profile.txt ``` ## 7. Real-World Optimization: CSV Processing ### Before (500MB Memory, OOM at 100K rows) ```python # csv_processor.py (BEFORE) import pandas as pd class CSVProcessor: def process_file(self, filename: str) -> dict: # ❌ Loads entire CSV df = pd.read_csv(filename) # 500MB for 10K rows # ❌ Multiple copies df['total'] = df['quantity'] * df['price'] df_filtered = df[df['total'] > 100] summary = df_filtered.groupby('category').agg({ 'total': 'sum', 'quantity': 'sum' }) return summary.to_dict() ``` **Scalene Output**: ``` Line 8: 500MB (75%) - pd.read_csv() Line 11: 100MB (15%) - df['total'] calculation Line 12: 50MB (10%) - filtering Total: 650MB for 10K rows ``` ### After (5MB Memory, Handles 1M rows) ```python # csv_processor.py (AFTER) import pandas as pd from collections import defaultdict class CSVProcessor: def process_file(self, filename: str) -> dict: # ✅ Process in 10K row chunks chunk_size = 10000 results = defaultdict(lambda: {'total': 0, 'quantity': 0}) for chunk in pd.read_csv(filename, chunksize=chunk_size): chunk['total'] = chunk['quantity'] * chunk['price'] filtered = chunk[chunk['total'] > 100] # Aggregate incrementally for category, group in filtered.groupby('category'): results[category]['total'] += group['total'].sum() results[category]['quantity'] += group['quantity'].sum() return dict(results) ``` **Scalene Output (After)**: ``` Line 9: 5MB (100%) - chunk processing (constant memory) Total: 5MB for any file size (99% reduction) ``` ## 8. Results and Impact ### Before vs After Metrics | Metric | Before | After | Impact | |--------|--------|-------|--------| | **Memory Usage** | 500MB (10K rows) | 5MB (1M rows) | **99% reduction** | | **Processing Time** | 45s (10K rows) | 8s (10K rows) | **82% faster** | | **Max File Size** | 100K rows (OOM) | 10M+ rows | **100x scalability** | | **OOM Errors** | 5/week | 0/month | **100% eliminated** | ### Key Optimizations Applied 1. **List comprehension → Generator**: 225MB → 0MB 2. **DataFrame chunking**: 500MB → 5MB per chunk 3. **String concatenation**: 1GB → 50MB (StringIO) 4. **Lazy evaluation**: Load on demand vs load all ## Related Documentation - **Node.js Leaks**: [nodejs-memory-leak.md](nodejs-memory-leak.md) - **DB Leaks**: [database-connection-leak.md](database-connection-leak.md) - **Reference**: [../reference/profiling-tools.md](../reference/profiling-tools.md) - **Templates**: [../templates/scalene-config.txt](../templates/scalene-config.txt) --- Return to [examples index](INDEX.md)