Files
gh-greyhaven-ai-claude-code…/skills/memory-profiling/examples/python-scalene-profiling.md
2025-11-29 18:29:23 +08:00

12 KiB

Python Memory Profiling with Scalene

Line-by-line memory and CPU profiling for Python applications using Scalene, with pytest integration and optimization strategies.

Overview

Before Optimization:

  • Memory usage: 500MB for processing 10K records
  • OOM (Out of Memory) errors with 100K records
  • Processing time: 45 seconds for 10K records
  • List comprehensions loading entire dataset

After Optimization:

  • Memory usage: 5MB for processing 10K records (99% reduction)
  • No OOM errors with 1M records
  • Processing time: 8 seconds for 10K records (82% faster)
  • Generator-based streaming

Tools: Scalene, pytest, memory_profiler, tracemalloc

1. Scalene Installation and Setup

Installation

# Install Scalene
pip install scalene

# Or with uv (faster)
uv pip install scalene

Basic Usage

# Profile entire script
scalene script.py

# Profile with pytest (recommended)
scalene --cli --memory -m pytest tests/

# HTML output
scalene --html --outfile profile.html script.py

# Profile specific function
scalene --reduced-profile script.py

2. Profiling with pytest

Test File Setup

# tests/test_data_processing.py
import pytest
from data_processor import DataProcessor

@pytest.fixture
def processor():
    return DataProcessor()

def test_process_large_dataset(processor):
    # Generate 10K records
    records = [{'id': i, 'value': i * 2} for i in range(10000)]

    # Process (this is where memory spike occurs)
    result = processor.process_records(records)

    assert len(result) == 10000

Running Scalene with pytest

# Profile memory usage during test execution
uv run scalene --cli --memory -m pytest tests/test_data_processing.py 2>&1 | grep -i "memory\|mb\|test"

# Output shows line-by-line memory allocation

Scalene Output (before optimization):

data_processor.py:
Line | Memory % | Memory (MB) | CPU % | Code
-----|----------|-------------|-------|-----
12   | 45%      | 225 MB      | 10%   | result = [transform(r) for r in records]
18   | 30%      | 150 MB      | 5%    | filtered = [r for r in result if r['value'] > 0]
25   | 15%      | 75 MB       | 20%   | sorted_data = sorted(filtered, key=lambda x: x['id'])

Analysis: Line 12 is the hotspot (45% of memory)

3. Memory Hotspot Identification

Vulnerable Code (Memory Spike)

# data_processor.py (BEFORE OPTIMIZATION)
class DataProcessor:
    def process_records(self, records: list[dict]) -> list[dict]:
        # ❌ HOTSPOT: List comprehension loads entire dataset
        result = [self.transform(r) for r in records]  # 225MB for 10K records

        # ❌ Creates another copy
        filtered = [r for r in result if r['value'] > 0]  # +150MB

        # ❌ sorted() creates yet another copy
        sorted_data = sorted(filtered, key=lambda x: x['id'])  # +75MB

        return sorted_data  # Total: 450MB for 10K records

    def transform(self, record: dict) -> dict:
        return {
            'id': record['id'],
            'value': record['value'] * 2,
            'timestamp': datetime.now()
        }

Scalene Report:

Memory allocation breakdown:
- Line 12 (list comprehension): 225MB (50%)
- Line 18 (filtering): 150MB (33%)
- Line 25 (sorting): 75MB (17%)

Total memory: 450MB for 10,000 records
Projected for 100K: 4.5GB → OOM!

Optimized Code (Generator-Based)

# data_processor.py (AFTER OPTIMIZATION)
from typing import Iterator

class DataProcessor:
    def process_records(self, records: list[dict]) -> Iterator[dict]:
        # ✅ Generator: processes one record at a time
        transformed = (self.transform(r) for r in records)  # O(1) memory

        # ✅ Generator chaining
        filtered = (r for r in transformed if r['value'] > 0)  # O(1) memory

        # ✅ Stream-based sorting (only if needed)
        # For very large datasets, use external sorting or database ORDER BY
        yield from sorted(filtered, key=lambda x: x['id'])  # Still O(n), but lazy

    def transform(self, record: dict) -> dict:
        return {
            'id': record['id'],
            'value': record['value'] * 2,
            'timestamp': datetime.now()
        }

    # Alternative: Fully streaming (no sorting)
    def process_records_streaming(self, records: list[dict]) -> Iterator[dict]:
        for record in records:
            transformed = self.transform(record)
            if transformed['value'] > 0:
                yield transformed  # O(1) memory, fully streaming

Scalene Report (After):

Memory allocation breakdown:
- Line 12 (generator): 5MB (100% - constant overhead)
- Line 18 (filter generator): 0MB (lazy)
- Line 25 (yield): 0MB (lazy)

Total memory: 5MB for 10,000 records (99% reduction!)
Scalable to 1M+ records without OOM

4. Common Memory Patterns

Pattern 1: List Comprehension → Generator

Before (High Memory):

# ❌ Loads entire list into memory
def process_large_file(filename: str) -> list[dict]:
    with open(filename) as f:
        lines = f.readlines()  # Loads entire file (500MB)

    # Another copy
    return [json.loads(line) for line in lines]  # +500MB = 1GB total

After (Low Memory):

# ✅ Generator: processes line-by-line
def process_large_file(filename: str) -> Iterator[dict]:
    with open(filename) as f:
        for line in f:  # Reads one line at a time
            yield json.loads(line)  # O(1) memory

Scalene diff: 1GB → 5MB (99.5% reduction)

Pattern 2: DataFrame Memory Optimization

Before (High Memory):

# ❌ Loads entire CSV into memory
import pandas as pd

def analyze_data(filename: str):
    df = pd.read_csv(filename)  # 10GB CSV → 10GB RAM

    # All transformations in memory
    df['new_col'] = df['value'] * 2
    df_filtered = df[df['value'] > 0]
    return df_filtered.groupby('category').sum()

After (Low Memory with Chunking):

# ✅ Process in chunks
import pandas as pd

def analyze_data(filename: str):
    chunk_size = 10000
    results = []

    # Process 10K rows at a time
    for chunk in pd.read_csv(filename, chunksize=chunk_size):
        chunk['new_col'] = chunk['value'] * 2
        filtered = chunk[chunk['value'] > 0]
        group_result = filtered.groupby('category').sum()
        results.append(group_result)

    # Combine results
    return pd.concat(results).groupby(level=0).sum()  # Much smaller

Scalene diff: 10GB → 500MB (95% reduction)

Pattern 3: String Concatenation

Before (High Memory):

# ❌ Creates new string each iteration (O(n²) memory)
def build_report(data: list[dict]) -> str:
    report = ""
    for item in data:  # 100K items
        report += f"{item['id']}: {item['value']}\n"  # New string every time
    return report  # 500MB final string + 500MB garbage = 1GB

After (Low Memory):

# ✅ StringIO or join (O(n) memory)
from io import StringIO

def build_report(data: list[dict]) -> str:
    buffer = StringIO()
    for item in data:
        buffer.write(f"{item['id']}: {item['value']}\n")
    return buffer.getvalue()

# Or even better: generator
def build_report_streaming(data: list[dict]) -> Iterator[str]:
    for item in data:
        yield f"{item['id']}: {item['value']}\n"

Scalene diff: 1GB → 50MB (95% reduction)

5. Scalene CLI Reference

Common Options

# Memory-only profiling (fastest)
scalene --cli --memory script.py

# CPU + Memory profiling
scalene --cli --cpu --memory script.py

# Reduced profile (functions only, not lines)
scalene --reduced-profile script.py

# Profile specific function
scalene --profile-only process_data script.py

# HTML report
scalene --html --outfile profile.html script.py

# Profile with pytest
scalene --cli --memory -m pytest tests/

# Set memory sampling interval (default: 1MB)
scalene --malloc-threshold 0.1 script.py  # Sample every 100KB

Interpreting Output

Column Meanings:

Memory %  | Percentage of total memory allocated
Memory MB | Absolute memory allocated (in megabytes)
CPU %     | Percentage of CPU time spent
Python %  | Time spent in Python (vs native code)

Example Output:

script.py:
Line | Memory % | Memory MB | CPU % | Python % | Code
-----|----------|-----------|-------|----------|-----
12   | 45.2%    | 225.6 MB  | 10.5% | 95.2%    | data = [x for x in range(1000000)]
18   | 30.1%    | 150.3 MB  | 5.2%  | 98.1%    | filtered = list(filter(lambda x: x > 0, data))

Analysis:

  • Line 12: High memory (45.2%) → optimize list comprehension
  • Line 18: Moderate memory (30.1%) → use generator instead of list()

6. Integration with CI/CD

GitHub Actions Workflow

# .github/workflows/memory-profiling.yml
name: Memory Profiling

on: [pull_request]

jobs:
  profile:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: |
          pip install scalene pytest

      - name: Run memory profiling
        run: |
          scalene --cli --memory --reduced-profile -m pytest tests/ > profile.txt

      - name: Check for memory hotspots
        run: |
          if grep -q "Memory %" profile.txt; then
            # Alert if any line uses >100MB
            if awk '$3 > 100 {exit 1}' profile.txt; then
              echo "Memory hotspot detected!"
              exit 1
            fi
          fi

      - name: Upload profile
        uses: actions/upload-artifact@v3
        with:
          name: memory-profile
          path: profile.txt

7. Real-World Optimization: CSV Processing

Before (500MB Memory, OOM at 100K rows)

# csv_processor.py (BEFORE)
import pandas as pd

class CSVProcessor:
    def process_file(self, filename: str) -> dict:
        # ❌ Loads entire CSV
        df = pd.read_csv(filename)  # 500MB for 10K rows

        # ❌ Multiple copies
        df['total'] = df['quantity'] * df['price']
        df_filtered = df[df['total'] > 100]
        summary = df_filtered.groupby('category').agg({
            'total': 'sum',
            'quantity': 'sum'
        })

        return summary.to_dict()

Scalene Output:

Line 8:  500MB (75%) - pd.read_csv()
Line 11: 100MB (15%) - df['total'] calculation
Line 12: 50MB (10%) - filtering
Total: 650MB for 10K rows

After (5MB Memory, Handles 1M rows)

# csv_processor.py (AFTER)
import pandas as pd
from collections import defaultdict

class CSVProcessor:
    def process_file(self, filename: str) -> dict:
        # ✅ Process in 10K row chunks
        chunk_size = 10000
        results = defaultdict(lambda: {'total': 0, 'quantity': 0})

        for chunk in pd.read_csv(filename, chunksize=chunk_size):
            chunk['total'] = chunk['quantity'] * chunk['price']
            filtered = chunk[chunk['total'] > 100]

            # Aggregate incrementally
            for category, group in filtered.groupby('category'):
                results[category]['total'] += group['total'].sum()
                results[category]['quantity'] += group['quantity'].sum()

        return dict(results)

Scalene Output (After):

Line 9:  5MB (100%) - chunk processing (constant memory)
Total: 5MB for any file size (99% reduction)

8. Results and Impact

Before vs After Metrics

Metric Before After Impact
Memory Usage 500MB (10K rows) 5MB (1M rows) 99% reduction
Processing Time 45s (10K rows) 8s (10K rows) 82% faster
Max File Size 100K rows (OOM) 10M+ rows 100x scalability
OOM Errors 5/week 0/month 100% eliminated

Key Optimizations Applied

  1. List comprehension → Generator: 225MB → 0MB
  2. DataFrame chunking: 500MB → 5MB per chunk
  3. String concatenation: 1GB → 50MB (StringIO)
  4. Lazy evaluation: Load on demand vs load all

Return to examples index