Initial commit

Zhongwei Li
2025-11-29 18:29:23 +08:00
commit ebc71f5387
37 changed files with 9382 additions and 0 deletions


@@ -0,0 +1,86 @@
# Memory Profiling Examples
Production memory profiling implementations for Node.js and Python with leak detection, heap analysis, and optimization strategies.
## Examples Overview
### Node.js Memory Leak Detection
**File**: [nodejs-memory-leak.md](nodejs-memory-leak.md)
Identifying and fixing memory leaks in Node.js applications:
- **Memory leak detection**: Chrome DevTools, heapdump analysis
- **Common leak patterns**: Event listeners, closures, global variables
- **Heap snapshots**: Before/after comparison, retained object analysis
- **Real leak**: EventEmitter leak causing 2GB memory growth
- **Fix**: Proper cleanup with `removeListener()`, WeakMap for caching
- **Result**: Memory stabilized at 150MB (93% reduction)
**Use when**: Node.js memory growing over time, debugging production memory issues
---
### Python Memory Profiling with Scalene
**File**: [python-scalene-profiling.md](python-scalene-profiling.md)
Line-by-line memory profiling for Python applications:
- **Scalene setup**: Installation, pytest integration, CLI usage
- **Memory hotspots**: Line-by-line allocation tracking
- **CPU + Memory**: Combined profiling for performance bottlenecks
- **Real scenario**: 500MB dataset causing OOM, fixed with generators
- **Optimization**: List comprehension → generator (500MB → 5MB)
- **Result**: 99% memory reduction, no OOM errors
**Use when**: Python memory spikes, profiling pytest tests, finding allocation hotspots
---
### Database Connection Pool Leak
**File**: [database-connection-leak.md](database-connection-leak.md)
PostgreSQL connection pool exhaustion and memory leaks:
- **Symptom**: Connection pool maxed out, memory growing linearly
- **Root cause**: Unclosed connections in error paths, missing `finally` blocks
- **Detection**: Connection pool metrics, memory profiling
- **Fix**: Context managers (`with` statement), proper cleanup
- **Result**: Zero connection leaks, memory stable at 80MB
**Use when**: Database connection errors, "too many clients" errors, connection pool issues
---
### Large Dataset Memory Optimization
**File**: [large-dataset-optimization.md](large-dataset-optimization.md)
Memory-efficient data processing for large datasets:
- **Problem**: Loading 10GB CSV into memory (OOM killer)
- **Solutions**: Streaming with `pandas.read_csv(chunksize)`, generators, memory mapping
- **Techniques**: Lazy evaluation, columnar processing, batch processing
- **Before/After**: 10GB memory → 500MB (95% reduction)
- **Tools**: Pandas chunking, Dask for parallel processing
**Use when**: Processing large files, OOM errors, batch data processing
---
## Quick Navigation
| Topic | File | Lines | Focus |
|-------|------|-------|-------|
| **Node.js Leaks** | [nodejs-memory-leak.md](nodejs-memory-leak.md) | ~450 | EventEmitter, heap snapshots |
| **Python Scalene** | [python-scalene-profiling.md](python-scalene-profiling.md) | ~420 | Line-by-line profiling |
| **DB Connection Leaks** | [database-connection-leak.md](database-connection-leak.md) | ~380 | Connection pool management |
| **Large Datasets** | [large-dataset-optimization.md](large-dataset-optimization.md) | ~400 | Streaming, chunking |
## Related Documentation
- **Reference**: [Reference Index](../reference/INDEX.md) - Memory patterns, profiling tools
- **Templates**: [Templates Index](../templates/INDEX.md) - Profiling report template
- **Main Agent**: [memory-profiler.md](../memory-profiler.md) - Memory profiler agent
---
Return to [main agent](../memory-profiler.md)


@@ -0,0 +1,490 @@
# Database Connection Pool Memory Leaks
Detecting and fixing PostgreSQL connection pool leaks in FastAPI applications using connection monitoring and proper cleanup patterns.
## Overview
**Before Optimization**:
- Active connections: 95/100 (pool exhausted)
- Connection timeouts: 15-20/min during peak
- Memory growth: 100MB/hour (unclosed connections)
- Service restarts: 3-4x/day
**After Optimization**:
- Active connections: 8-12/100 (healthy pool)
- Connection timeouts: 0/day
- Memory growth: 0MB/hour (stable)
- Service restarts: 0/month
**Tools**: asyncpg, SQLModel, psycopg3, pg_stat_activity, Prometheus
## 1. Connection Pool Architecture
### Grey Haven Stack: PostgreSQL + SQLModel
**Connection Pool Configuration**:
```python
# database.py
from sqlmodel import create_engine
from sqlalchemy.pool import QueuePool

# ❌ VULNERABLE: no explicit max_overflow, timeout, recycle, or pre-ping
engine = create_engine(
    "postgresql://user:pass@localhost/db",
    poolclass=QueuePool,
    pool_size=20,
    echo=True
)

# ✅ SECURE: Proper pool configuration
engine = create_engine(
    "postgresql://user:pass@localhost/db",
    poolclass=QueuePool,
    pool_size=20,          # Core connections
    max_overflow=10,       # Max additional connections
    pool_timeout=30,       # Wait timeout (seconds)
    pool_recycle=3600,     # Recycle after 1 hour
    pool_pre_ping=True,    # Verify connection before use
    echo=False
)
```
**Pool Health Monitoring**:
```python
# monitoring.py
from prometheus_client import Gauge

# Prometheus metrics
db_pool_size = Gauge('db_pool_connections_total', 'Total pool size')
db_pool_active = Gauge('db_pool_connections_active', 'Active connections')
db_pool_idle = Gauge('db_pool_connections_idle', 'Idle connections')
db_pool_overflow = Gauge('db_pool_connections_overflow', 'Overflow connections')

def record_pool_metrics(engine):
    pool = engine.pool
    db_pool_size.set(pool.size())
    db_pool_active.set(pool.checkedout())
    db_pool_idle.set(pool.size() - pool.checkedout())
    db_pool_overflow.set(pool.overflow())
```
## 2. Common Leak Pattern: Unclosed Connections
### Vulnerable Code (Connection Leak)
```python
# api/orders.py (BEFORE)
from fastapi import APIRouter
from sqlmodel import Session, select
from database import engine
from models import Order

router = APIRouter()

@router.get("/orders")
async def get_orders():
    # ❌ LEAK: Connection never closed
    session = Session(engine)
    # If exception occurs here, session never closed
    orders = session.exec(select(Order)).all()
    # If return happens here, session never closed
    return orders
    # session.close() never reached if early return/exception
    session.close()
```
**What Happens**:
1. Every request acquires connection from pool
2. Exception/early return prevents `session.close()`
3. Connection remains in "active" state
4. Pool exhausts once every connection has leaked (~100 requests in this service, whose pool allows 100 connections)
5. New requests timeout waiting for connection
**Memory Impact**:
```
Initial pool: 20 connections (40MB)
After 1 hour: 95 leaked connections (190MB)
After 6 hours: Pool exhausted + 100MB leaked memory
```
### Fixed Code (Context Manager)
```python
# api/orders.py (AFTER)
from fastapi import APIRouter, Depends
from sqlmodel import Session, select
from database import engine
from models import Order

router = APIRouter()

# ✅ Option 1: FastAPI dependency injection (recommended)
def get_session():
    """Session dependency with automatic cleanup"""
    with Session(engine) as session:
        yield session

@router.get("/orders")
async def get_orders(session: Session = Depends(get_session)):
    # Session automatically closed after request
    orders = session.exec(select(Order)).all()
    return orders

# ✅ Option 2: Explicit context manager
@router.get("/orders-alt")
async def get_orders_alt():
    with Session(engine) as session:
        orders = session.exec(select(Order)).all()
        return orders
    # Session guaranteed to close (even on exception)
```
**Why This Works**:
- Context manager ensures `session.close()` called in `__exit__`
- Works even if exception raised
- Works even if early return
- FastAPI `Depends()` handles async cleanup
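**Equivalent try/finally** — a minimal sketch of what the context manager guarantees, not SQLModel's internal implementation; it reuses `engine` and `Order` from the examples above:
```python
from sqlmodel import Session, select
from database import engine
from models import Order

def get_orders_equivalent():
    """Roughly what `with Session(engine) as session:` expands to."""
    session = Session(engine)
    try:
        return session.exec(select(Order)).all()
    finally:
        session.close()  # runs on normal return, early return, and exceptions
```
Whether you use the dependency or the explicit `with` block, the cleanup path is the same; the context manager simply writes the `finally` for you.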
## 3. Async Connection Leaks (asyncpg)
### Vulnerable Async Pattern
```python
# api/analytics.py (BEFORE)
import asyncpg
from fastapi import APIRouter

router = APIRouter()

@router.get("/analytics")
async def get_analytics(date: str):
    # ❌ LEAK: Connection never closed
    conn = await asyncpg.connect(
        user='postgres',
        password='secret',
        database='analytics'
    )
    # Exception here = connection leaked
    result = await conn.fetch('SELECT * FROM metrics WHERE date > $1', date)
    # Early return = connection leaked
    if not result:
        return []
    await conn.close()  # Skipped on the early-return and exception paths
    return result
```
### Fixed Async Pattern
```python
# api/analytics.py (AFTER)
import asyncpg
from fastapi import APIRouter
from contextlib import asynccontextmanager

router = APIRouter()

# ✅ Connection pool (shared across requests, created at startup)
pool: asyncpg.Pool | None = None

@asynccontextmanager
async def get_db_connection():
    """Async context manager for connections"""
    conn = await pool.acquire()
    try:
        yield conn
    finally:
        await pool.release(conn)

@router.get("/analytics")
async def get_analytics(date: str):
    async with get_db_connection() as conn:
        result = await conn.fetch(
            'SELECT * FROM metrics WHERE date > $1',
            date
        )
    return result
    # Connection automatically released to pool
```
**Pool Setup** (application startup):
```python
# main.py
from fastapi import FastAPI
import asyncpg

app = FastAPI()

@app.on_event("startup")
async def startup():
    global pool
    pool = await asyncpg.create_pool(
        user='postgres',
        password='secret',
        database='analytics',
        min_size=10,   # Minimum connections
        max_size=20,   # Maximum connections
        max_inactive_connection_lifetime=300  # Recycle after 5 min
    )

@app.on_event("shutdown")
async def shutdown():
    await pool.close()
```
## 4. Transaction Leak Detection
### Monitoring Active Connections
**PostgreSQL Query**:
```sql
-- Show active connections with details
SELECT
pid,
usename,
application_name,
client_addr,
state,
query,
state_change,
NOW() - state_change AS duration
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY duration DESC;
```
**Prometheus Metrics**:
```python
# monitoring.py
from prometheus_client import Gauge
import asyncpg

db_connections_active = Gauge(
    'db_connections_active',
    'Active database connections',
    ['state']
)

async def monitor_connections(pool: asyncpg.Pool):
    """Monitor PostgreSQL connections every 30 seconds"""
    async with pool.acquire() as conn:
        rows = await conn.fetch("""
            SELECT state, COUNT(*) as count
            FROM pg_stat_activity
            WHERE datname = current_database()
            GROUP BY state
        """)
        for row in rows:
            db_connections_active.labels(state=row['state']).set(row['count'])
```
**Grafana Alert** (connection leak):
```yaml
alert: DatabaseConnectionLeak
expr: db_connections_active{state="active"} > 80
for: 5m
annotations:
  summary: "Potential connection leak ({{ $value }} active connections)"
  description: "Active connections have been above 80 for 5+ minutes"
```
## 5. Real-World Fix: FastAPI Order Service
### Before (Connection Pool Exhaustion)
```python
# services/order_processor.py (BEFORE)
from sqlmodel import Session, select
from database import engine
from models import Order, OrderItem

class OrderProcessor:
    async def process_order(self, order_id: int):
        # ❌ LEAK: Multiple sessions, some never closed
        session1 = Session(engine)
        order = session1.get(Order, order_id)
        if not order:
            # Early return = session1 leaked
            return None

        # ❌ LEAK: Second session
        session2 = Session(engine)
        items = session2.exec(
            select(OrderItem).where(OrderItem.order_id == order_id)
        ).all()

        # Exception here = both sessions leaked
        total = sum(item.price * item.quantity for item in items)
        order.total = total
        session1.commit()

        # Only session1 closed, session2 leaked
        session1.close()
        return order
```
**Metrics (Before)**:
```
Connection pool: 100 connections
Active connections after 1 hour: 95/100
Leaked connections: ~12/min
Memory growth: 100MB/hour
Pool exhaustion: Every 6-8 hours
```
### After (Proper Resource Management)
```python
# services/order_processor.py (AFTER)
from sqlmodel import Session, select
from database import engine
from models import Order, OrderItem

class OrderProcessor:
    async def process_order(self, order_id: int):
        # ✅ Single session, guaranteed cleanup
        with Session(engine) as session:
            # Query order
            order = session.get(Order, order_id)
            if not order:
                return None

            # Query items (same session)
            items = session.exec(
                select(OrderItem).where(OrderItem.order_id == order_id)
            ).all()

            # Calculate total
            total = sum(item.price * item.quantity for item in items)

            # Update order
            order.total = total
            session.add(order)
            session.commit()
            session.refresh(order)
            return order
        # Session automatically closed (even on exception)
```
**Metrics (After)**:
```
Connection pool: 100 connections
Active connections: 8-12/100 (stable)
Leaked connections: 0/day
Memory growth: 0MB/hour
Pool exhaustion: Never (0 incidents/month)
```
## 6. Connection Pool Configuration Best Practices
### Recommended Settings (Grey Haven Stack)
```python
# database.py - Production settings
from sqlmodel import create_engine
from sqlalchemy.pool import QueuePool

engine = create_engine(
    database_url,
    poolclass=QueuePool,
    pool_size=20,        # (workers * connections/worker) + buffer
    max_overflow=10,     # 50% of pool_size
    pool_timeout=30,     # Wait timeout
    pool_recycle=3600,   # Recycle after 1h
    pool_pre_ping=True   # Health check
)
```
**Pool Size Formula**: `pool_size = (workers * conn_per_worker) + buffer`
Example: `(4 workers * 3 conn) + 8 buffer = 20`
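To keep that formula visible in configuration code, a small helper can compute it; a minimal sketch (the worker and per-worker counts below are illustrative deployment values, not measured ones):
```python
def recommended_pool_size(workers: int, conn_per_worker: int, buffer: int = 8) -> int:
    """pool_size = (workers * conn_per_worker) + buffer"""
    return workers * conn_per_worker + buffer

# Example from above: 4 workers x 3 connections each + 8 spare = 20
assert recommended_pool_size(4, 3, buffer=8) == 20
```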
## 7. Testing Connection Cleanup
### Pytest Fixture for Connection Tracking
```python
# tests/conftest.py
import asyncio

import pytest
from sqlmodel import Session, create_engine, select

from models import Order

@pytest.fixture
def engine():
    """Test engine with connection tracking"""
    test_engine = create_engine("postgresql://test:test@localhost/test_db", pool_size=5)
    initial_active = test_engine.pool.checkedout()
    yield test_engine
    final_active = test_engine.pool.checkedout()
    assert final_active == initial_active, f"Leaked {final_active - initial_active} connections"

@pytest.mark.asyncio
async def test_no_connection_leak_under_load(engine):
    """Simulate 1000 concurrent requests"""

    async def handle_request():
        # Mirrors the fixed endpoint: one session per request, always closed
        with Session(engine) as session:
            session.exec(select(Order)).all()

    initial = engine.pool.checkedout()
    tasks = [handle_request() for _ in range(1000)]
    await asyncio.gather(*tasks)
    await asyncio.sleep(1)
    assert engine.pool.checkedout() == initial, "Connection leak detected"
```
## 8. CI/CD Integration
```yaml
# .github/workflows/connection-leak-test.yml
name: Connection Leak Detection
on: [pull_request]
jobs:
  leak-test:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:15
        env: {POSTGRES_PASSWORD: test, POSTGRES_DB: test_db}
        ports: ['5432:5432']
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v4
        with: {python-version: '3.11'}
      - run: pip install -r requirements.txt pytest pytest-asyncio
      - run: pytest tests/test_connection_leaks.py -v
```
## 9. Results and Impact
### Before vs After Metrics
| Metric | Before | After | Impact |
|--------|--------|-------|--------|
| **Active Connections** | 95/100 (95%) | 8-12/100 (10%) | **85% reduction** |
| **Connection Timeouts** | 15-20/min | 0/day | **100% eliminated** |
| **Memory Growth** | 100MB/hour | 0MB/hour | **100% eliminated** |
| **Service Restarts** | 3-4x/day | 0/month | **100% eliminated** |
| **Pool Wait Time (p95)** | 5.2s | 0.01s | **99.8% faster** |
### Key Optimizations Applied
1. **Context Managers**: Guaranteed connection cleanup (even on exceptions)
2. **FastAPI Dependencies**: Automatic session lifecycle management
3. **Connection Pooling**: Proper pool_size, max_overflow, pool_timeout
4. **Prometheus Monitoring**: Real-time pool saturation metrics
5. **Load Testing**: CI/CD checks for connection leaks
## Related Documentation
- **Node.js Leaks**: [nodejs-memory-leak.md](nodejs-memory-leak.md)
- **Python Profiling**: [python-scalene-profiling.md](python-scalene-profiling.md)
- **Large Datasets**: [large-dataset-optimization.md](large-dataset-optimization.md)
- **Reference**: [../reference/profiling-tools.md](../reference/profiling-tools.md)
---
Return to [examples index](INDEX.md)


@@ -0,0 +1,452 @@
# Large Dataset Memory Optimization
Memory-efficient patterns for processing multi-GB datasets in Python and Node.js without OOM errors.
## Overview
**Before Optimization**:
- Dataset size: 10GB CSV (50M rows)
- Memory usage: 20GB (2x dataset size)
- Processing time: 45 minutes
- OOM errors: Frequent (3-4x/day)
**After Optimization**:
- Dataset size: Same (10GB, 50M rows)
- Memory usage: 500MB (constant)
- Processing time: 12 minutes (73% faster)
- OOM errors: 0/month
**Tools**: Polars, pandas chunking, generators, streaming parsers
## 1. Problem: Loading Entire Dataset
### Vulnerable Pattern (Pandas read_csv)
```python
# analysis.py (BEFORE)
import pandas as pd

def analyze_sales_data(filename: str):
    # ❌ Loads entire 10GB file into memory
    df = pd.read_csv(filename)  # 20GB RAM usage

    # ❌ Creates copies for each operation
    df['total'] = df['quantity'] * df['price']  # +10GB
    df_filtered = df[df['total'] > 1000]  # +8GB
    df_sorted = df_filtered.sort_values('total', ascending=False)  # +8GB

    # Peak memory: 46GB for 10GB file!
    return df_sorted.head(100)
```
**Memory Profile**:
```
Step 1 (read_csv): 20GB
Step 2 (calculation): +10GB = 30GB
Step 3 (filter): +8GB = 38GB
Step 4 (sort): +8GB = 46GB
Result: OOM on 32GB machine
```
## 2. Solution 1: Pandas Chunking
### Chunk-Based Processing
```python
# analysis.py (AFTER - Chunking)
import pandas as pd

def analyze_sales_data_chunked(filename: str, chunk_size: int = 100000):
    """Process 100K rows at a time (constant memory)"""
    top_sales = []

    # ✅ Process in chunks (100K rows = ~50MB each)
    for chunk in pd.read_csv(filename, chunksize=chunk_size):
        # Calculate total (in-place when possible)
        chunk['total'] = chunk['quantity'] * chunk['price']

        # Filter high-value sales
        filtered = chunk[chunk['total'] > 1000]

        # Keep top 100 from this chunk
        top_chunk = filtered.nlargest(100, 'total')
        top_sales.append(top_chunk)
        # chunk goes out of scope, memory freed

    # Combine top results from all chunks
    final_df = pd.concat(top_sales).nlargest(100, 'total')
    return final_df
```
**Memory Profile (Chunked)**:
```
Chunk 1: 50MB (process) → 10MB (top 100) → garbage collected
Chunk 2: 50MB (process) → 10MB (top 100) → garbage collected
...
Chunk 500: 50MB (process) → 10MB (top 100) → garbage collected
Final combine: 500 * 10MB = 500MB total
Peak memory: 500MB (99% reduction!)
```
## 3. Solution 2: Polars (Lazy Evaluation)
### Polars for Large Datasets
**Why Polars**:
- 10-100x faster than pandas
- True streaming (doesn't load entire file)
- Query optimizer (like SQL databases)
- Parallel processing (uses all CPU cores)
```python
# analysis.py (POLARS)
import polars as pl

def analyze_sales_data_polars(filename: str):
    """Polars lazy evaluation - constant memory"""
    result = (
        pl.scan_csv(filename)  # ✅ Lazy: doesn't load yet
        .with_columns([
            (pl.col('quantity') * pl.col('price')).alias('total')
        ])
        .filter(pl.col('total') > 1000)
        .sort('total', descending=True)
        .head(100)
        .collect(streaming=True)  # ✅ Streaming: processes in chunks
    )
    return result
```
**Memory Profile (Polars Streaming)**:
```
Memory usage: 200-300MB (constant)
Processing: Parallel chunks, optimized query plan
Time: 12 minutes vs 45 minutes (pandas)
```
## 4. Node.js Streaming
### CSV Streaming with csv-parser
```typescript
// analysis.ts (BEFORE)
import fs from 'fs';
import Papa from 'papaparse';

async function analyzeSalesData(filename: string) {
  // ❌ Loads entire 10GB file
  const fileContent = fs.readFileSync(filename, 'utf-8'); // 20GB RAM
  const parsed = Papa.parse(fileContent, { header: true, dynamicTyping: true }); // +10GB

  // Process all rows
  const results = parsed.data.map((row: any) => ({
    total: row.quantity * row.price
  }));
  return results; // 30GB total
}
```
**Fixed with Streaming**:
```typescript
// analysis.ts (AFTER - Streaming)
import fs from 'fs';
import csv from 'csv-parser';
import { pipeline } from 'stream/promises';

async function analyzeSalesDataStreaming(filename: string) {
  const topSales: Array<{ row: any; total: number }> = [];

  await pipeline(
    fs.createReadStream(filename), // ✅ Stream (not load all)
    csv(),
    async function (source: AsyncIterable<any>) {
      for await (const row of source) {
        const total = Number(row.quantity) * Number(row.price);
        if (total > 1000) {
          topSales.push({ row, total });
          // Keep only top 100 (memory bounded)
          if (topSales.length > 100) {
            topSales.sort((a, b) => b.total - a.total);
            topSales.length = 100;
          }
        }
      }
    }
  );
  return topSales;
}
```
**Memory Profile (Streaming)**:
```
Buffer: 64KB (stream chunk size)
Processing: One row at a time
Array: 100 rows max (bounded)
Peak memory: 5MB vs 30GB (99.98% reduction!)
```
## 5. Generator Pattern (Python)
### Memory-Efficient Pipeline
```python
# pipeline.py (Generator-based)
from typing import Iterator
import csv
import heapq

def read_csv_streaming(filename: str) -> Iterator[dict]:
    """Read CSV line by line (not all at once)"""
    with open(filename, 'r') as f:
        reader = csv.DictReader(f)
        for row in reader:
            yield row  # ✅ One row at a time

def calculate_totals(rows: Iterator[dict]) -> Iterator[dict]:
    """Calculate totals (lazy)"""
    for row in rows:
        row['total'] = float(row['quantity']) * float(row['price'])
        yield row

def filter_high_value(rows: Iterator[dict], threshold: float = 1000) -> Iterator[dict]:
    """Filter high-value sales (lazy)"""
    for row in rows:
        if row['total'] > threshold:
            yield row

def top_n(rows: Iterator[dict], n: int = 100) -> list[dict]:
    """Keep top N rows (bounded memory)"""
    return heapq.nlargest(n, rows, key=lambda x: x['total'])

# ✅ Pipeline: each stage processes one row at a time
def analyze_sales_pipeline(filename: str):
    rows = read_csv_streaming(filename)
    with_totals = calculate_totals(rows)
    high_value = filter_high_value(with_totals)
    top_100 = top_n(high_value, 100)
    return top_100
```
**Memory Profile (Generator Pipeline)**:
```
Stage 1 (read): 1 row (few KB)
Stage 2 (calculate): 1 row (few KB)
Stage 3 (filter): 1 row (few KB)
Stage 4 (top_n): 100 rows (bounded)
Peak memory: <1MB (constant)
```
## 6. Real-World: E-Commerce Analytics
### Before (Pandas load_all)
```python
# analytics_service.py (BEFORE)
import pandas as pd
from database import engine  # SQLAlchemy engine

class AnalyticsService:
    def generate_sales_report(self, start_date: str, end_date: str):
        # ❌ Load entire orders table (10GB)
        orders = pd.read_sql(
            "SELECT * FROM orders WHERE date BETWEEN %s AND %s",
            engine,
            params=(start_date, end_date)
        )  # 20GB RAM

        # ❌ Load entire order_items (50GB)
        items = pd.read_sql("SELECT * FROM order_items", engine)  # +100GB RAM

        # Join (creates another copy)
        merged = orders.merge(items, on='order_id')  # +150GB

        # Aggregate
        summary = merged.groupby('category').agg({
            'total': 'sum',
            'quantity': 'sum'
        })
        return summary  # Peak: 270GB - OOM!
```
### After (Database Aggregation + Chunking)
```python
# analytics_service.py (AFTER)
import pandas as pd
from database import engine  # SQLAlchemy engine

class AnalyticsService:
    def generate_sales_report(self, start_date: str, end_date: str):
        # ✅ Aggregate in database (PostgreSQL does the work)
        query = """
            SELECT
                oi.category,
                SUM(oi.price * oi.quantity) as total,
                SUM(oi.quantity) as quantity
            FROM orders o
            JOIN order_items oi ON o.id = oi.order_id
            WHERE o.date BETWEEN %(start)s AND %(end)s
            GROUP BY oi.category
        """
        # Result: aggregated data (few KB, not 270GB!)
        summary = pd.read_sql(
            query,
            engine,
            params={'start': start_date, 'end': end_date}
        )
        return summary  # Peak: 1MB vs 270GB
```
**Metrics**:
```
Before: 270GB RAM, OOM error
After: 1MB RAM, 99.9996% reduction
Time: 45 min → 30 seconds (90x faster)
```
## 7. Dask for Parallel Processing
### Dask DataFrame (Parallel Chunking)
```python
# analysis_dask.py
import dask.dataframe as dd

def analyze_sales_data_dask(filename: str):
    """Process in parallel chunks across CPU cores"""
    # ✅ Lazy loading, parallel processing
    df = dd.read_csv(
        filename,
        blocksize='64MB'  # Process 64MB chunks
    )

    # All operations are lazy (no computation yet)
    df['total'] = df['quantity'] * df['price']
    filtered = df[df['total'] > 1000]
    top_100 = filtered.nlargest(100, 'total')

    # ✅ Trigger computation (parallel across cores)
    result = top_100.compute()
    return result
```
**Memory Profile (Dask)**:
```
Workers: 8 (one per CPU core)
Memory per worker: 100MB
Total memory: 800MB vs 46GB
Speed: 4-8x faster (parallel)
```
## 8. Memory Monitoring
### Track Memory Usage During Processing
```python
# monitor.py
import tracemalloc
import psutil
from contextlib import contextmanager

@contextmanager
def memory_monitor(label: str):
    """Monitor memory usage of code block"""
    # Start tracking
    tracemalloc.start()
    process = psutil.Process()
    mem_before = process.memory_info().rss / 1024 / 1024  # MB

    yield

    # Measure after
    mem_after = process.memory_info().rss / 1024 / 1024
    current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    print(f"{label}:")
    print(f"  Memory before: {mem_before:.1f} MB")
    print(f"  Memory after: {mem_after:.1f} MB")
    print(f"  Memory delta: {mem_after - mem_before:.1f} MB")
    print(f"  Peak traced: {peak / 1024 / 1024:.1f} MB")

# Usage
import pandas as pd
import polars as pl

with memory_monitor("Pandas load_all"):
    df = pd.read_csv("large_file.csv")  # Shows high memory usage

with memory_monitor("Polars streaming"):
    df = pl.scan_csv("large_file.csv").collect(streaming=True)  # Low memory
```
## 9. Optimization Decision Tree
**Choose the right tool based on dataset size**:
```
Dataset < 1GB:
→ Use pandas.read_csv() (simple, fast)
Dataset 1-10GB:
→ Use pandas chunking (chunksize=100000)
→ Or Polars streaming (faster, less memory)
Dataset 10-100GB:
→ Use Polars streaming (best performance)
→ Or Dask (parallel processing)
→ Or Database aggregation (PostgreSQL, ClickHouse)
Dataset > 100GB:
→ Database aggregation (required)
→ Or Spark/Ray (distributed computing)
→ Never load into memory
```
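The same decision tree can be written as a small dispatcher keyed on file size; a hedged sketch (the thresholds mirror the rules of thumb above, and the returned labels refer to techniques shown earlier, not to real functions):
```python
import os

GB = 1024 ** 3

def choose_strategy(filename: str) -> str:
    """Pick a processing strategy from the on-disk file size."""
    size = os.path.getsize(filename)
    if size < 1 * GB:
        return "pandas.read_csv (load directly)"
    if size < 10 * GB:
        return "pandas chunking or Polars streaming"
    if size < 100 * GB:
        return "Polars streaming, Dask, or database aggregation"
    return "database aggregation or distributed engine (never load into memory)"
```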
## 10. Results and Impact
### Before vs After Metrics
| Metric | Before (pandas) | After (Polars) | Impact |
|--------|----------------|----------------|--------|
| **Memory Usage** | 46GB | 300MB | **99.3% reduction** |
| **Processing Time** | 45 min | 12 min | **73% faster** |
| **OOM Errors** | 3-4/day | 0/month | **100% eliminated** |
| **Max Dataset Size** | 10GB | 500GB+ | **50x scalability** |
### Key Optimizations Applied
1. **Chunking**: Process 100K rows at a time (constant memory)
2. **Lazy Evaluation**: Polars/Dask don't load until needed
3. **Streaming**: One row at a time (generators, Node.js streams)
4. **Database Aggregation**: Let PostgreSQL do the work
5. **Bounded Memory**: heapq.nlargest() keeps top N (not all rows)
### Cost Savings
**Infrastructure costs**:
- Before: r5.8xlarge (256GB RAM) = $1.344/hour
- After: r5.large (16GB RAM) = $0.084/hour
- **Savings**: 94% reduction ($23,000/year per service)
## Related Documentation
- **Node.js Leaks**: [nodejs-memory-leak.md](nodejs-memory-leak.md)
- **Python Profiling**: [python-scalene-profiling.md](python-scalene-profiling.md)
- **DB Leaks**: [database-connection-leak.md](database-connection-leak.md)
- **Reference**: [../reference/memory-optimization-patterns.md](../reference/memory-optimization-patterns.md)
---
Return to [examples index](INDEX.md)


@@ -0,0 +1,490 @@
# Node.js Memory Leak Detection
Identifying and fixing memory leaks in Node.js applications using Chrome DevTools, heapdump, and memory profiling techniques.
## Overview
**Symptoms Before Fix**:
- Memory usage: 150MB → 2GB over 6 hours
- Heap size growing linearly (5MB/minute)
- V8 garbage collection ineffective
- Production outages (OOM killer)
**After Fix**:
- Memory stable at 150MB (93% reduction)
- Heap size constant over time
- Zero OOM errors in 30 days
- Proper resource cleanup
**Tools**: Chrome DevTools, heapdump, memwatch-next, Prometheus monitoring
## 1. Memory Leak Symptoms
### Linear Memory Growth
```bash
# Monitor Node.js memory usage
node --expose-gc --inspect app.js
# Connect Chrome DevTools: chrome://inspect
# Memory tab → Take heap snapshot every 5 minutes
```
**Heap growth pattern**:
```
Time | Heap Size | External | Total
------|-----------|----------|-------
0 min | 50MB | 10MB | 60MB
5 min | 75MB | 15MB | 90MB
10min | 100MB | 20MB | 120MB
15min | 125MB | 25MB | 150MB
... | ... | ... | ...
6 hrs | 1.8GB | 200MB | 2GB
```
**Diagnosis**: Linear growth indicates memory leak (not normal sawtooth GC pattern)
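A lightweight in-process check can flag that pattern automatically; a rough sketch that samples `heapUsed` and warns when every recent sample grows (the window and threshold are illustrative, and this complements rather than replaces heap snapshots):
```typescript
// heap-growth-check.ts -- warn when heapUsed rises monotonically
const samples: number[] = [];
const WINDOW = 12;          // 12 samples x 5 min = 1 hour
const MIN_GROWTH_MB = 50;   // ignore normal sawtooth fluctuations

setInterval(() => {
  samples.push(process.memoryUsage().heapUsed);
  if (samples.length > WINDOW) samples.shift();

  const rising = samples.every((v, i) => i === 0 || v >= samples[i - 1]);
  const growthMb = (samples[samples.length - 1] - samples[0]) / 1024 / 1024;

  if (samples.length === WINDOW && rising && growthMb > MIN_GROWTH_MB) {
    console.warn(`Possible leak: heap grew ${growthMb.toFixed(0)}MB over the last hour`);
  }
}, 5 * 60 * 1000); // sample every 5 minutes
```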
### High GC Activity
```javascript
// Monitor memory usage over time
setInterval(() => {
  const usage = process.memoryUsage();
  console.log({
    heapUsed: `${Math.round(usage.heapUsed / 1024 / 1024)}MB`,
    heapTotal: `${Math.round(usage.heapTotal / 1024 / 1024)}MB`,
    external: `${Math.round(usage.external / 1024 / 1024)}MB`,
    rss: `${Math.round(usage.rss / 1024 / 1024)}MB`
  });
}, 60000); // Every minute
```
**Output showing leak**:
```
{heapUsed: '75MB', heapTotal: '100MB', external: '15MB', rss: '120MB'}
{heapUsed: '100MB', heapTotal: '130MB', external: '20MB', rss: '150MB'}
{heapUsed: '125MB', heapTotal: '160MB', external: '25MB', rss: '185MB'}
```
## 2. Heap Snapshot Analysis
### Taking Heap Snapshots
```javascript
// Generate heap snapshot programmatically
const v8 = require('v8');

function takeHeapSnapshot(filename) {
  const heapSnapshot = v8.writeHeapSnapshot(filename);
  console.log(`Heap snapshot written to ${heapSnapshot}`);
}

// Take snapshot every hour
setInterval(() => {
  const timestamp = new Date().toISOString().replace(/:/g, '-');
  takeHeapSnapshot(`heap-${timestamp}.heapsnapshot`);
}, 3600000);
```
### Analyzing Snapshots in Chrome DevTools
**Steps**:
1. Load two snapshots (before and after 1 hour)
2. Compare snapshots (Comparison view)
3. Sort by "Size Delta" (descending)
4. Look for objects growing significantly
**Example Analysis**:
```
Object Type | Count | Size Delta | Retained Size
----------------------|--------|------------|---------------
(array) | +5,000 | +50MB | +60MB
EventEmitter | +1,200 | +12MB | +15MB
Closure (anonymous) | +800 | +8MB | +10MB
```
**Diagnosis**: EventEmitter count growing = likely event listener leak
### Retained Objects Analysis
```javascript
// Chrome DevTools → Heap Snapshot → Summary → sort by "Retained Size"
// Click object → view Retainer tree
```
**Retainer tree example** (EventEmitter leak):
```
EventEmitter @123456
← listeners: Array[50]
← _events.data: Array
← EventEmitter @123456 (self-reference leak!)
```
## 3. Common Memory Leak Patterns
### Pattern 1: Event Listener Leak
**Vulnerable Code**:
```typescript
// ❌ LEAK: EventEmitter listeners never removed
import {EventEmitter} from 'events';

class DataProcessor {
  private emitter = new EventEmitter();

  async processOrders() {
    // Add listener every time function called
    this.emitter.on('data', (data) => {
      console.log('Processing:', data);
    });

    // Emit 1000 events
    for (let i = 0; i < 1000; i++) {
      this.emitter.emit('data', {id: i});
    }
  }
}

// Called repeatedly on the same instance = listeners accumulate!
const processor = new DataProcessor();
setInterval(() => processor.processOrders(), 1000);
```
**Result**: every call leaves another listener (and its captured closure) attached to the same emitter; left running, this accumulated into a 2GB memory leak
**Fixed Code**:
```typescript
// ✅ FIXED: Remove listener after use
class DataProcessor {
  private emitter = new EventEmitter();

  async processOrders() {
    const handler = (data) => {
      console.log('Processing:', data);
    };
    this.emitter.on('data', handler);

    try {
      for (let i = 0; i < 1000; i++) {
        this.emitter.emit('data', {id: i});
      }
    } finally {
      // ✅ Clean up listener
      this.emitter.removeListener('data', handler);
    }
  }
}
```
**Better**: Use `once()` for one-time listeners:
```typescript
this.emitter.once('data', handler); // Auto-removed after first emit
```
### Pattern 2: Closure Leak
**Vulnerable Code**:
```typescript
// ❌ LEAK: Closure captures large object
const cache = new Map();

function processRequest(userId: string) {
  const largeData = fetchLargeDataset(userId); // 10MB object

  // Closure captures entire largeData
  cache.set(userId, () => {
    return largeData.summary; // Only need summary (1KB)
  });
}
// Called for 1000 users = 10GB in cache!
```
**Fixed Code**:
```typescript
// ✅ FIXED: Only store what you need
const cache = new Map();

function processRequest(userId: string) {
  const largeData = fetchLargeDataset(userId);
  const summary = largeData.summary; // Extract only 1KB

  // Store minimal data
  cache.set(userId, () => summary);
}
// 1000 users = 1MB in cache ✅
```
### Pattern 3: Global Variable Accumulation
**Vulnerable Code**:
```typescript
// ❌ LEAK: Global array keeps growing
const requestLog: Request[] = [];

app.post('/api/orders', (req, res) => {
  requestLog.push(req); // Never removed!
  // ... process order
});
// 1M requests = 1M objects in memory permanently
```
**Fixed Code**:
```typescript
// ✅ FIXED: Use LRU cache with size limit
import LRU from 'lru-cache';

const requestLog = new LRU({
  max: 1000,           // Maximum 1000 items
  ttl: 1000 * 60 * 5   // 5-minute TTL
});

app.post('/api/orders', (req, res) => {
  requestLog.set(req.id, req); // Auto-evicts old items
});
```
### Pattern 4: Forgotten Timers/Intervals
**Vulnerable Code**:
```typescript
// ❌ LEAK: setInterval never cleared
class ReportGenerator {
  private data: any[] = [];

  start() {
    setInterval(() => {
      this.data.push(generateReport()); // Accumulates forever
    }, 60000);
  }
}

// Each instance leaks!
const generator = new ReportGenerator();
generator.start();
```
**Fixed Code**:
```typescript
// ✅ FIXED: Clear interval on cleanup
class ReportGenerator {
  private data: any[] = [];
  private intervalId?: NodeJS.Timeout;

  start() {
    this.intervalId = setInterval(() => {
      this.data.push(generateReport());
    }, 60000);
  }

  stop() {
    if (this.intervalId) {
      clearInterval(this.intervalId);
      this.intervalId = undefined;
      this.data = []; // Clear accumulated data
    }
  }
}
```
## 4. Memory Profiling with memwatch-next
### Installation
```bash
bun add memwatch-next
```
### Leak Detection
```typescript
// memory-monitor.ts
import memwatch from 'memwatch-next';

// Detect memory leaks
memwatch.on('leak', (info) => {
  console.error('Memory leak detected:', {
    growth: info.growth,
    reason: info.reason,
    current_base: `${Math.round(info.current_base / 1024 / 1024)}MB`,
    leaked: `${Math.round((info.current_base - info.start) / 1024 / 1024)}MB`
  });

  // Alert to PagerDuty/Slack
  alertOps('Memory leak detected', info);
});

// Monitor GC stats
memwatch.on('stats', (stats) => {
  console.log('GC stats:', {
    used_heap_size: `${Math.round(stats.used_heap_size / 1024 / 1024)}MB`,
    heap_size_limit: `${Math.round(stats.heap_size_limit / 1024 / 1024)}MB`,
    num_full_gc: stats.num_full_gc,
    num_inc_gc: stats.num_inc_gc
  });
});
```
### HeapDiff for Leak Analysis
```typescript
import memwatch from 'memwatch-next';

const hd = new memwatch.HeapDiff();

// Simulate leak
const leak: any[] = [];
for (let i = 0; i < 10000; i++) {
  leak.push({data: new Array(1000).fill('x')});
}

// Compare heaps
const diff = hd.end();
console.log('Heap diff:', JSON.stringify(diff, null, 2));

// Output:
// {
//   "before": {"nodes": 12345, "size": 50000000},
//   "after": {"nodes": 22345, "size": 150000000},
//   "change": {
//     "size_bytes": 100000000,  // 100MB leak!
//     "size": "100.00MB",
//     "freed_nodes": 100,
//     "allocated_nodes": 10100  // Net increase
//   }
// }
```
## 5. Production Memory Monitoring
### Prometheus Metrics
```typescript
// metrics.ts
import {Gauge} from 'prom-client';

const memoryUsageGauge = new Gauge({
  name: 'nodejs_memory_usage_bytes',
  help: 'Node.js memory usage in bytes',
  labelNames: ['type']
});

setInterval(() => {
  const usage = process.memoryUsage();
  memoryUsageGauge.set({type: 'heap_used'}, usage.heapUsed);
  memoryUsageGauge.set({type: 'heap_total'}, usage.heapTotal);
  memoryUsageGauge.set({type: 'external'}, usage.external);
  memoryUsageGauge.set({type: 'rss'}, usage.rss);
}, 15000);
```
**Grafana Alert**:
```promql
# Alert if heap usage growing linearly
increase(nodejs_memory_usage_bytes{type="heap_used"}[1h]) > 100000000 # 100MB/hour
```
## 6. Real-World Fix: EventEmitter Leak
### Before (Leaking)
```typescript
// order-processor.ts (BEFORE FIX)
class OrderProcessor {
  private emitter = new EventEmitter();

  async processOrders() {
    // ❌ LEAK: Listener added every call
    this.emitter.on('order:created', async (order) => {
      await this.sendConfirmationEmail(order);
      await this.updateInventory(order);
    });

    const orders = await db.query.orders.findMany({status: 'pending'});
    for (const order of orders) {
      this.emitter.emit('order:created', order);
    }
  }
}

// Called every minute on the same long-lived instance
const processor = new OrderProcessor();
setInterval(() => processor.processOrders(), 60000);
```
**Result**: 1,440 listeners/day → 2GB memory leak in production
### After (Fixed)
```typescript
// order-processor.ts (AFTER FIX)
class OrderProcessor {
  private emitter = new EventEmitter();

  async processOrders() {
    const handler = async (order) => {
      await this.sendConfirmationEmail(order);
      await this.updateInventory(order);
    };

    // ✅ Attach once per run, and always detach when done
    this.emitter.on('order:created', handler);
    try {
      const orders = await db.query.orders.findMany({status: 'pending'});
      for (const order of orders) {
        this.emitter.emit('order:created', order);
      }
    } finally {
      // ✅ Cleanup: no listeners accumulate between runs
      this.emitter.removeListener('order:created', handler);
    }
  }
}
```
**Result**: Memory stable at 150MB, zero leaks
## 7. Results and Impact
### Before vs After Metrics
| Metric | Before Fix | After Fix | Impact |
|--------|-----------|-----------|---------|
| **Memory Usage** | 2GB (after 6h) | 150MB (stable) | **93% reduction** |
| **Heap Size** | Linear growth (5MB/min) | Stable | **Zero growth** |
| **OOM Incidents** | 12/month | 0/month | **100% eliminated** |
| **GC Pause Time** | 200ms avg | 50ms avg | **75% faster** |
| **Uptime** | 6 hours avg | 30+ days | **120x improvement** |
### Lessons Learned
**1. Always remove event listeners**
- Use `once()` for one-time events
- Use `removeListener()` in finally blocks
- Track listeners with WeakMap for debugging
**2. Avoid closures capturing large objects**
- Extract only needed data before closure
- Use WeakMap/WeakSet for object references (see the sketch after this list)
- Profile with heap snapshots regularly
**3. Monitor memory in production**
- Prometheus metrics for heap usage
- Alert on linear growth patterns
- Weekly heap snapshot analysis
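A sketch of the WeakMap-backed caching mentioned above: because entries are keyed by the object itself, the cached value becomes collectible as soon as the key object is no longer referenced elsewhere (the `User`/`Profile` types are illustrative, not the production service's models):
```typescript
// weakmap-cache.ts -- cache keyed by object identity, GC-friendly
interface User { id: string; }
interface Profile { summary: string; }

const profileCache = new WeakMap<User, Profile>();

function getProfile(user: User, load: (u: User) => Profile): Profile {
  let profile = profileCache.get(user);
  if (!profile) {
    profile = load(user);
    profileCache.set(user, profile); // entry is freed together with `user`
  }
  return profile;
}
```
Unlike the global `Map` in Pattern 2, a `WeakMap` never pins its keys, so the cache cannot grow without bound on its own.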
## Related Documentation
- **Python Profiling**: [python-scalene-profiling.md](python-scalene-profiling.md)
- **DB Leaks**: [database-connection-leak.md](database-connection-leak.md)
- **Reference**: [../reference/memory-patterns.md](../reference/memory-patterns.md)
- **Templates**: [../templates/memory-report.md](../templates/memory-report.md)
---
Return to [examples index](INDEX.md)


@@ -0,0 +1,456 @@
# Python Memory Profiling with Scalene
Line-by-line memory and CPU profiling for Python applications using Scalene, with pytest integration and optimization strategies.
## Overview
**Before Optimization**:
- Memory usage: 500MB for processing 10K records
- OOM (Out of Memory) errors with 100K records
- Processing time: 45 seconds for 10K records
- List comprehensions loading entire dataset
**After Optimization**:
- Memory usage: 5MB for processing 10K records (99% reduction)
- No OOM errors with 1M records
- Processing time: 8 seconds for 10K records (82% faster)
- Generator-based streaming
**Tools**: Scalene, pytest, memory_profiler, tracemalloc
## 1. Scalene Installation and Setup
### Installation
```bash
# Install Scalene
pip install scalene
# Or with uv (faster)
uv pip install scalene
```
### Basic Usage
```bash
# Profile entire script
scalene script.py
# Profile with pytest (recommended)
scalene --cli --memory -m pytest tests/
# HTML output
scalene --html --outfile profile.html script.py
# Profile specific function
scalene --reduced-profile script.py
```
## 2. Profiling with pytest
### Test File Setup
```python
# tests/test_data_processing.py
import pytest
from data_processor import DataProcessor

@pytest.fixture
def processor():
    return DataProcessor()

def test_process_large_dataset(processor):
    # Generate 10K records
    records = [{'id': i, 'value': i * 2} for i in range(10000)]

    # Process (this is where memory spike occurs)
    result = processor.process_records(records)
    assert len(result) == 10000
```
### Running Scalene with pytest
```bash
# Profile memory usage during test execution
uv run scalene --cli --memory -m pytest tests/test_data_processing.py 2>&1 | grep -i "memory\|mb\|test"
# Output shows line-by-line memory allocation
```
**Scalene Output** (before optimization):
```
data_processor.py:
Line | Memory % | Memory (MB) | CPU % | Code
-----|----------|-------------|-------|-----
12 | 45% | 225 MB | 10% | result = [transform(r) for r in records]
18 | 30% | 150 MB | 5% | filtered = [r for r in result if r['value'] > 0]
25 | 15% | 75 MB | 20% | sorted_data = sorted(filtered, key=lambda x: x['id'])
```
**Analysis**: Line 12 is the hotspot (45% of memory)
## 3. Memory Hotspot Identification
### Vulnerable Code (Memory Spike)
```python
# data_processor.py (BEFORE OPTIMIZATION)
from datetime import datetime

class DataProcessor:
    def process_records(self, records: list[dict]) -> list[dict]:
        # ❌ HOTSPOT: List comprehension loads entire dataset
        result = [self.transform(r) for r in records]  # 225MB for 10K records

        # ❌ Creates another copy
        filtered = [r for r in result if r['value'] > 0]  # +150MB

        # ❌ sorted() creates yet another copy
        sorted_data = sorted(filtered, key=lambda x: x['id'])  # +75MB
        return sorted_data  # Total: 450MB for 10K records

    def transform(self, record: dict) -> dict:
        return {
            'id': record['id'],
            'value': record['value'] * 2,
            'timestamp': datetime.now()
        }
```
**Scalene Report**:
```
Memory allocation breakdown:
- Line 12 (list comprehension): 225MB (50%)
- Line 18 (filtering): 150MB (33%)
- Line 25 (sorting): 75MB (17%)
Total memory: 450MB for 10,000 records
Projected for 100K: 4.5GB → OOM!
```
### Optimized Code (Generator-Based)
```python
# data_processor.py (AFTER OPTIMIZATION)
from datetime import datetime
from typing import Iterator

class DataProcessor:
    def process_records(self, records: list[dict]) -> Iterator[dict]:
        # ✅ Generator: processes one record at a time
        transformed = (self.transform(r) for r in records)  # O(1) memory

        # ✅ Generator chaining
        filtered = (r for r in transformed if r['value'] > 0)  # O(1) memory

        # ✅ Stream-based sorting (only if needed)
        # For very large datasets, use external sorting or database ORDER BY
        yield from sorted(filtered, key=lambda x: x['id'])  # Sorting still holds O(n)

    def transform(self, record: dict) -> dict:
        return {
            'id': record['id'],
            'value': record['value'] * 2,
            'timestamp': datetime.now()
        }

    # Alternative: Fully streaming (no sorting)
    def process_records_streaming(self, records: list[dict]) -> Iterator[dict]:
        for record in records:
            transformed = self.transform(record)
            if transformed['value'] > 0:
                yield transformed  # O(1) memory, fully streaming
```
**Scalene Report (After)**:
```
Memory allocation breakdown:
- Line 12 (generator): 5MB (100% - constant overhead)
- Line 18 (filter generator): 0MB (lazy)
- Line 25 (yield): 0MB (lazy)
Total memory: 5MB for 10,000 records (99% reduction!)
Scalable to 1M+ records without OOM
```
## 4. Common Memory Patterns
### Pattern 1: List Comprehension → Generator
**Before** (High Memory):
```python
# ❌ Loads entire list into memory
import json

def process_large_file(filename: str) -> list[dict]:
    with open(filename) as f:
        lines = f.readlines()  # Loads entire file (500MB)

    # Another copy
    return [json.loads(line) for line in lines]  # +500MB = 1GB total
```
**After** (Low Memory):
```python
# ✅ Generator: processes line-by-line
import json
from typing import Iterator

def process_large_file(filename: str) -> Iterator[dict]:
    with open(filename) as f:
        for line in f:  # Reads one line at a time
            yield json.loads(line)  # O(1) memory
```
**Scalene diff**: 1GB → 5MB (99.5% reduction)
### Pattern 2: DataFrame Memory Optimization
**Before** (High Memory):
```python
# ❌ Loads entire CSV into memory
import pandas as pd

def analyze_data(filename: str):
    df = pd.read_csv(filename)  # 10GB CSV → 10GB RAM

    # All transformations in memory
    df['new_col'] = df['value'] * 2
    df_filtered = df[df['value'] > 0]
    return df_filtered.groupby('category').sum()
```
**After** (Low Memory with Chunking):
```python
# ✅ Process in chunks
import pandas as pd

def analyze_data(filename: str):
    chunk_size = 10000
    results = []

    # Process 10K rows at a time
    for chunk in pd.read_csv(filename, chunksize=chunk_size):
        chunk['new_col'] = chunk['value'] * 2
        filtered = chunk[chunk['value'] > 0]
        group_result = filtered.groupby('category').sum()
        results.append(group_result)

    # Combine results
    return pd.concat(results).groupby(level=0).sum()  # Much smaller
```
**Scalene diff**: 10GB → 500MB (95% reduction)
### Pattern 3: String Concatenation
**Before** (High Memory):
```python
# ❌ Builds a new string each iteration (quadratic copying, heavy GC churn)
def build_report(data: list[dict]) -> str:
    report = ""
    for item in data:  # 100K items
        report += f"{item['id']}: {item['value']}\n"  # New string every time
    return report  # 500MB final string + 500MB garbage = 1GB
```
**After** (Low Memory):
```python
# ✅ StringIO or join (O(n) memory)
from io import StringIO
from typing import Iterator

def build_report(data: list[dict]) -> str:
    buffer = StringIO()
    for item in data:
        buffer.write(f"{item['id']}: {item['value']}\n")
    return buffer.getvalue()

# Or even better: generator
def build_report_streaming(data: list[dict]) -> Iterator[str]:
    for item in data:
        yield f"{item['id']}: {item['value']}\n"
```
**Scalene diff**: 1GB → 50MB (95% reduction)
## 5. Scalene CLI Reference
### Common Options
```bash
# Memory-only profiling (fastest)
scalene --cli --memory script.py
# CPU + Memory profiling
scalene --cli --cpu --memory script.py
# Reduced profile (only lines with non-zero usage)
scalene --reduced-profile script.py
# Profile only matching source files (filename filter)
scalene --profile-only data_processor script.py
# HTML report
scalene --html --outfile profile.html script.py
# Profile with pytest
scalene --cli --memory -m pytest tests/
# Set memory sampling interval (default: 1MB)
scalene --malloc-threshold 0.1 script.py # Sample every 100KB
```
### Interpreting Output
**Column Meanings**:
```
Memory % | Percentage of total memory allocated
Memory MB | Absolute memory allocated (in megabytes)
CPU % | Percentage of CPU time spent
Python % | Time spent in Python (vs native code)
```
**Example Output**:
```
script.py:
Line | Memory % | Memory MB | CPU % | Python % | Code
-----|----------|-----------|-------|----------|-----
12 | 45.2% | 225.6 MB | 10.5% | 95.2% | data = [x for x in range(1000000)]
18 | 30.1% | 150.3 MB | 5.2% | 98.1% | filtered = list(filter(lambda x: x > 0, data))
```
**Analysis**:
- Line 12: High memory (45.2%) → optimize list comprehension
- Line 18: Moderate memory (30.1%) → use generator instead of list()
## 6. Integration with CI/CD
### GitHub Actions Workflow
```yaml
# .github/workflows/memory-profiling.yml
name: Memory Profiling
on: [pull_request]
jobs:
  profile:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'
      - name: Install dependencies
        run: |
          pip install scalene pytest
      - name: Run memory profiling
        run: |
          scalene --cli --memory --reduced-profile -m pytest tests/ > profile.txt
      - name: Check for memory hotspots
        run: |
          if grep -q "Memory %" profile.txt; then
            # Fail if any line allocates >100MB (awk exits non-zero on a hotspot)
            if ! awk '$3 > 100 {exit 1}' profile.txt; then
              echo "Memory hotspot detected!"
              exit 1
            fi
          fi
      - name: Upload profile
        uses: actions/upload-artifact@v3
        with:
          name: memory-profile
          path: profile.txt
```
## 7. Real-World Optimization: CSV Processing
### Before (500MB Memory, OOM at 100K rows)
```python
# csv_processor.py (BEFORE)
import pandas as pd

class CSVProcessor:
    def process_file(self, filename: str) -> dict:
        # ❌ Loads entire CSV
        df = pd.read_csv(filename)  # 500MB for 10K rows

        # ❌ Multiple copies
        df['total'] = df['quantity'] * df['price']
        df_filtered = df[df['total'] > 100]
        summary = df_filtered.groupby('category').agg({
            'total': 'sum',
            'quantity': 'sum'
        })
        return summary.to_dict()
```
**Scalene Output**:
```
Line 8: 500MB (75%) - pd.read_csv()
Line 11: 100MB (15%) - df['total'] calculation
Line 12: 50MB (10%) - filtering
Total: 650MB for 10K rows
```
### After (5MB Memory, Handles 1M rows)
```python
# csv_processor.py (AFTER)
import pandas as pd
from collections import defaultdict

class CSVProcessor:
    def process_file(self, filename: str) -> dict:
        # ✅ Process in 10K row chunks
        chunk_size = 10000
        results = defaultdict(lambda: {'total': 0, 'quantity': 0})

        for chunk in pd.read_csv(filename, chunksize=chunk_size):
            chunk['total'] = chunk['quantity'] * chunk['price']
            filtered = chunk[chunk['total'] > 100]

            # Aggregate incrementally
            for category, group in filtered.groupby('category'):
                results[category]['total'] += group['total'].sum()
                results[category]['quantity'] += group['quantity'].sum()

        return dict(results)
```
**Scalene Output (After)**:
```
Line 9: 5MB (100%) - chunk processing (constant memory)
Total: 5MB for any file size (99% reduction)
```
## 8. Results and Impact
### Before vs After Metrics
| Metric | Before | After | Impact |
|--------|--------|-------|--------|
| **Memory Usage** | 500MB (10K rows) | 5MB (1M rows) | **99% reduction** |
| **Processing Time** | 45s (10K rows) | 8s (10K rows) | **82% faster** |
| **Max File Size** | 100K rows (OOM) | 10M+ rows | **100x scalability** |
| **OOM Errors** | 5/week | 0/month | **100% eliminated** |
### Key Optimizations Applied
1. **List comprehension → Generator**: 225MB → 0MB
2. **DataFrame chunking**: 500MB → 5MB per chunk
3. **String concatenation**: 1GB → 50MB (StringIO)
4. **Lazy evaluation**: Load on demand vs load all
## Related Documentation
- **Node.js Leaks**: [nodejs-memory-leak.md](nodejs-memory-leak.md)
- **DB Leaks**: [database-connection-leak.md](database-connection-leak.md)
- **Reference**: [../reference/profiling-tools.md](../reference/profiling-tools.md)
- **Templates**: [../templates/scalene-config.txt](../templates/scalene-config.txt)
---
Return to [examples index](INDEX.md)