zhongwei/gh-greyhaven-ai-claude-code-config-grey-haven-plugins-observability

Files

Zhongwei Li ebc71f5387 Initial commit

2025-11-29 18:29:23 +08:00

12 KiB

Raw Blame History

Python Memory Profiling with Scalene

Line-by-line memory and CPU profiling for Python applications using Scalene, with pytest integration and optimization strategies.

Overview

Before Optimization:

Memory usage: 500MB for processing 10K records
OOM (Out of Memory) errors with 100K records
Processing time: 45 seconds for 10K records
List comprehensions loading entire dataset

After Optimization:

Memory usage: 5MB for processing 10K records (99% reduction)
No OOM errors with 1M records
Processing time: 8 seconds for 10K records (82% faster)
Generator-based streaming

Tools: Scalene, pytest, memory_profiler, tracemalloc

1. Scalene Installation and Setup

Installation

# Install Scalene
pip install scalene

# Or with uv (faster)
uv pip install scalene

Basic Usage

# Profile entire script
scalene script.py

# Profile with pytest (recommended)
scalene --cli --memory -m pytest tests/

# HTML output
scalene --html --outfile profile.html script.py

# Profile specific function
scalene --reduced-profile script.py

2. Profiling with pytest

Test File Setup

# tests/test_data_processing.py
import pytest
from data_processor import DataProcessor

@pytest.fixture
def processor():
    return DataProcessor()

def test_process_large_dataset(processor):
    # Generate 10K records
    records = [{'id': i, 'value': i * 2} for i in range(10000)]

    # Process (this is where memory spike occurs)
    result = processor.process_records(records)

    assert len(result) == 10000

Running Scalene with pytest

# Profile memory usage during test execution
uv run scalene --cli --memory -m pytest tests/test_data_processing.py 2>&1 | grep -i "memory\|mb\|test"

# Output shows line-by-line memory allocation

Scalene Output (before optimization):

data_processor.py:
Line | Memory % | Memory (MB) | CPU % | Code
-----|----------|-------------|-------|-----
12   | 45%      | 225 MB      | 10%   | result = [transform(r) for r in records]
18   | 30%      | 150 MB      | 5%    | filtered = [r for r in result if r['value'] > 0]
25   | 15%      | 75 MB       | 20%   | sorted_data = sorted(filtered, key=lambda x: x['id'])

Analysis: Line 12 is the hotspot (45% of memory)

3. Memory Hotspot Identification

Vulnerable Code (Memory Spike)

# data_processor.py (BEFORE OPTIMIZATION)
class DataProcessor:
    def process_records(self, records: list[dict]) -> list[dict]:
        # ❌ HOTSPOT: List comprehension loads entire dataset
        result = [self.transform(r) for r in records]  # 225MB for 10K records

        # ❌ Creates another copy
        filtered = [r for r in result if r['value'] > 0]  # +150MB

        # ❌ sorted() creates yet another copy
        sorted_data = sorted(filtered, key=lambda x: x['id'])  # +75MB

        return sorted_data  # Total: 450MB for 10K records

    def transform(self, record: dict) -> dict:
        return {
            'id': record['id'],
            'value': record['value'] * 2,
            'timestamp': datetime.now()
        }

Scalene Report:

Memory allocation breakdown:
- Line 12 (list comprehension): 225MB (50%)
- Line 18 (filtering): 150MB (33%)
- Line 25 (sorting): 75MB (17%)

Total memory: 450MB for 10,000 records
Projected for 100K: 4.5GB → OOM!

Optimized Code (Generator-Based)

# data_processor.py (AFTER OPTIMIZATION)
from typing import Iterator

class DataProcessor:
    def process_records(self, records: list[dict]) -> Iterator[dict]:
        # ✅ Generator: processes one record at a time
        transformed = (self.transform(r) for r in records)  # O(1) memory

        # ✅ Generator chaining
        filtered = (r for r in transformed if r['value'] > 0)  # O(1) memory

        # ✅ Stream-based sorting (only if needed)
        # For very large datasets, use external sorting or database ORDER BY
        yield from sorted(filtered, key=lambda x: x['id'])  # Still O(n), but lazy

    def transform(self, record: dict) -> dict:
        return {
            'id': record['id'],
            'value': record['value'] * 2,
            'timestamp': datetime.now()
        }

    # Alternative: Fully streaming (no sorting)
    def process_records_streaming(self, records: list[dict]) -> Iterator[dict]:
        for record in records:
            transformed = self.transform(record)
            if transformed['value'] > 0:
                yield transformed  # O(1) memory, fully streaming

Scalene Report (After):

Memory allocation breakdown:
- Line 12 (generator): 5MB (100% - constant overhead)
- Line 18 (filter generator): 0MB (lazy)
- Line 25 (yield): 0MB (lazy)

Total memory: 5MB for 10,000 records (99% reduction!)
Scalable to 1M+ records without OOM

4. Common Memory Patterns

Pattern 1: List Comprehension → Generator

Before (High Memory):

# ❌ Loads entire list into memory
def process_large_file(filename: str) -> list[dict]:
    with open(filename) as f:
        lines = f.readlines()  # Loads entire file (500MB)

    # Another copy
    return [json.loads(line) for line in lines]  # +500MB = 1GB total

After (Low Memory):

# ✅ Generator: processes line-by-line
def process_large_file(filename: str) -> Iterator[dict]:
    with open(filename) as f:
        for line in f:  # Reads one line at a time
            yield json.loads(line)  # O(1) memory

Scalene diff: 1GB → 5MB (99.5% reduction)

Pattern 2: DataFrame Memory Optimization

Before (High Memory):

# ❌ Loads entire CSV into memory
import pandas as pd

def analyze_data(filename: str):
    df = pd.read_csv(filename)  # 10GB CSV → 10GB RAM

    # All transformations in memory
    df['new_col'] = df['value'] * 2
    df_filtered = df[df['value'] > 0]
    return df_filtered.groupby('category').sum()

After (Low Memory with Chunking):

# ✅ Process in chunks
import pandas as pd

def analyze_data(filename: str):
    chunk_size = 10000
    results = []

    # Process 10K rows at a time
    for chunk in pd.read_csv(filename, chunksize=chunk_size):
        chunk['new_col'] = chunk['value'] * 2
        filtered = chunk[chunk['value'] > 0]
        group_result = filtered.groupby('category').sum()
        results.append(group_result)

    # Combine results
    return pd.concat(results).groupby(level=0).sum()  # Much smaller

Scalene diff: 10GB → 500MB (95% reduction)

Pattern 3: String Concatenation

Before (High Memory):

# ❌ Creates new string each iteration (O(n²) memory)
def build_report(data: list[dict]) -> str:
    report = ""
    for item in data:  # 100K items
        report += f"{item['id']}: {item['value']}\n"  # New string every time
    return report  # 500MB final string + 500MB garbage = 1GB

After (Low Memory):

# ✅ StringIO or join (O(n) memory)
from io import StringIO

def build_report(data: list[dict]) -> str:
    buffer = StringIO()
    for item in data:
        buffer.write(f"{item['id']}: {item['value']}\n")
    return buffer.getvalue()

# Or even better: generator
def build_report_streaming(data: list[dict]) -> Iterator[str]:
    for item in data:
        yield f"{item['id']}: {item['value']}\n"

Scalene diff: 1GB → 50MB (95% reduction)

5. Scalene CLI Reference

Common Options

# Memory-only profiling (fastest)
scalene --cli --memory script.py

# CPU + Memory profiling
scalene --cli --cpu --memory script.py

# Reduced profile (functions only, not lines)
scalene --reduced-profile script.py

# Profile specific function
scalene --profile-only process_data script.py

# HTML report
scalene --html --outfile profile.html script.py

# Profile with pytest
scalene --cli --memory -m pytest tests/

# Set memory sampling interval (default: 1MB)
scalene --malloc-threshold 0.1 script.py  # Sample every 100KB

Interpreting Output

Column Meanings:

Memory %  | Percentage of total memory allocated
Memory MB | Absolute memory allocated (in megabytes)
CPU %     | Percentage of CPU time spent
Python %  | Time spent in Python (vs native code)

Example Output:

script.py:
Line | Memory % | Memory MB | CPU % | Python % | Code
-----|----------|-----------|-------|----------|-----
12   | 45.2%    | 225.6 MB  | 10.5% | 95.2%    | data = [x for x in range(1000000)]
18   | 30.1%    | 150.3 MB  | 5.2%  | 98.1%    | filtered = list(filter(lambda x: x > 0, data))

Analysis:

Line 12: High memory (45.2%) → optimize list comprehension
Line 18: Moderate memory (30.1%) → use generator instead of list()

6. Integration with CI/CD

GitHub Actions Workflow

# .github/workflows/memory-profiling.yml
name: Memory Profiling

on: [pull_request]

jobs:
  profile:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: |
          pip install scalene pytest

      - name: Run memory profiling
        run: |
          scalene --cli --memory --reduced-profile -m pytest tests/ > profile.txt

      - name: Check for memory hotspots
        run: |
          if grep -q "Memory %" profile.txt; then
            # Alert if any line uses >100MB
            if awk '$3 > 100 {exit 1}' profile.txt; then
              echo "Memory hotspot detected!"
              exit 1
            fi
          fi

      - name: Upload profile
        uses: actions/upload-artifact@v3
        with:
          name: memory-profile
          path: profile.txt

7. Real-World Optimization: CSV Processing

Before (500MB Memory, OOM at 100K rows)

# csv_processor.py (BEFORE)
import pandas as pd

class CSVProcessor:
    def process_file(self, filename: str) -> dict:
        # ❌ Loads entire CSV
        df = pd.read_csv(filename)  # 500MB for 10K rows

        # ❌ Multiple copies
        df['total'] = df['quantity'] * df['price']
        df_filtered = df[df['total'] > 100]
        summary = df_filtered.groupby('category').agg({
            'total': 'sum',
            'quantity': 'sum'
        })

        return summary.to_dict()

Scalene Output:

Line 8:  500MB (75%) - pd.read_csv()
Line 11: 100MB (15%) - df['total'] calculation
Line 12: 50MB (10%) - filtering
Total: 650MB for 10K rows

After (5MB Memory, Handles 1M rows)

# csv_processor.py (AFTER)
import pandas as pd
from collections import defaultdict

class CSVProcessor:
    def process_file(self, filename: str) -> dict:
        # ✅ Process in 10K row chunks
        chunk_size = 10000
        results = defaultdict(lambda: {'total': 0, 'quantity': 0})

        for chunk in pd.read_csv(filename, chunksize=chunk_size):
            chunk['total'] = chunk['quantity'] * chunk['price']
            filtered = chunk[chunk['total'] > 100]

            # Aggregate incrementally
            for category, group in filtered.groupby('category'):
                results[category]['total'] += group['total'].sum()
                results[category]['quantity'] += group['quantity'].sum()

        return dict(results)

Scalene Output (After):

Line 9:  5MB (100%) - chunk processing (constant memory)
Total: 5MB for any file size (99% reduction)

8. Results and Impact

Before vs After Metrics

Metric	Before	After	Impact
Memory Usage	500MB (10K rows)	5MB (1M rows)	99% reduction
Processing Time	45s (10K rows)	8s (10K rows)	82% faster
Max File Size	100K rows (OOM)	10M+ rows	100x scalability
OOM Errors	5/week	0/month	100% eliminated

Key Optimizations Applied

List comprehension → Generator: 225MB → 0MB
DataFrame chunking: 500MB → 5MB per chunk
String concatenation: 1GB → 50MB (StringIO)
Lazy evaluation: Load on demand vs load all

Node.js Leaks: nodejs-memory-leak.md
DB Leaks: database-connection-leak.md
Reference: ../reference/profiling-tools.md
Templates: ../templates/scalene-config.txt

Return to examples index

12 KiB Raw Blame History

Python Memory Profiling with Scalene

Overview

1. Scalene Installation and Setup

Installation

Basic Usage

2. Profiling with pytest

Test File Setup

Running Scalene with pytest

3. Memory Hotspot Identification

Vulnerable Code (Memory Spike)

Optimized Code (Generator-Based)

4. Common Memory Patterns

Pattern 1: List Comprehension → Generator

Pattern 2: DataFrame Memory Optimization

Pattern 3: String Concatenation

5. Scalene CLI Reference

Common Options

Interpreting Output

6. Integration with CI/CD

GitHub Actions Workflow

7. Real-World Optimization: CSV Processing

Before (500MB Memory, OOM at 100K rows)

After (5MB Memory, Handles 1M rows)

8. Results and Impact

Before vs After Metrics

Key Optimizations Applied

Related Documentation

12 KiB

Raw Blame History