Initial commit

This commit is contained in:
Zhongwei Li
2025-11-29 18:29:23 +08:00
commit ebc71f5387
37 changed files with 9382 additions and 0 deletions


@@ -0,0 +1,85 @@
---
name: grey-haven-memory-profiling
description: "Identify memory leaks, inefficient allocations, and optimization opportunities in JavaScript/TypeScript and Python applications. Analyze heap snapshots, allocation patterns, garbage collection, and memory retention. Use when memory grows over time, high memory consumption detected, performance degradation, or when user mentions 'memory leak', 'memory usage', 'heap analysis', 'garbage collection', 'memory profiling', or 'out of memory'."
---
# Memory Profiling Skill
Identify memory leaks, inefficiencies, and optimization opportunities in running applications through systematic heap analysis and allocation profiling.
## Description
Specialized memory profiling skill for analyzing allocation patterns, heap usage, garbage collection behavior, and memory retention in JavaScript/TypeScript (Node.js, Bun, browsers) and Python applications. Detect memory leaks, optimize memory usage, and prevent out-of-memory errors.
## What's Included
### Examples (`examples/`)
- **Memory leak detection** - Finding and fixing common leak patterns
- **Heap snapshot analysis** - Interpreting Chrome DevTools heap snapshots
- **Allocation profiling** - Tracking memory allocation over time
- **Real-world scenarios** - E-commerce app leak, API server memory growth
### Reference Guides (`reference/`)
- **Profiling tools** - Chrome DevTools, Node.js inspector, Python memory_profiler
- **Memory concepts** - Heap, stack, GC algorithms, retention paths
- **Optimization techniques** - Object pooling, weak references, lazy loading
- **Common leak patterns** - Event listeners, closures, caching, timers
### Templates (`templates/`)
- **Profiling report template** - Standardized memory analysis reports
- **Heap snapshot comparison template** - Before/after analysis
- **Memory budget template** - Setting and tracking memory limits
### Checklists (`checklists/`)
- **Memory leak checklist** - Systematic leak detection process
- **Optimization checklist** - Memory optimization verification
## Use This Skill When
- ✅ Memory usage growing continuously over time
- ✅ High memory consumption detected (> 500MB for Node, > 1GB for Python)
- ✅ Performance degradation with prolonged runtime
- ✅ Out of memory errors in production
- ✅ Garbage collection causing performance issues
- ✅ Need to optimize memory footprint
- ✅ User mentions: "memory leak", "memory usage", "heap", "garbage collection", "OOM"
## Related Agents
- `memory-profiler` - Automated memory analysis and leak detection
- `performance-optimizer` - Broader performance optimization including memory
## Quick Start
```bash
# View leak detection examples
cat examples/memory-leak-detection.md
# Check profiling tools reference
cat reference/profiling-tools.md
# Use memory leak checklist
cat checklists/memory-leak-checklist.md
```
## Common Memory Issues
1. **Event Listener Leaks** - Unremoved listeners holding references
2. **Closure Leaks** - Variables captured in closures never released
3. **Cache Leaks** - Unbounded caches growing indefinitely
4. **Timer Leaks** - setInterval/setTimeout not cleared
5. **DOM Leaks** - Detached DOM nodes retained in memory
6. **Circular References** - Objects referencing each other preventing GC
## Typical Workflow
1. **Detect**: Run profiler, take heap snapshots
2. **Analyze**: Compare snapshots, identify growing objects
3. **Locate**: Find retention paths, trace to source
4. **Fix**: Remove references, clean up resources
5. **Verify**: Re-profile to confirm fix
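As a minimal illustration of the Detect, Analyze, and Verify steps, the sketch below uses Python's built-in `tracemalloc` to snapshot the heap before and after a suspect workload and print the top growth sites; `run_workload` is a placeholder for your own code path, not part of this skill's tooling.

```python
import tracemalloc

def run_workload():
    # Placeholder: replace with the code path you suspect of leaking.
    return [object() for _ in range(100_000)]

tracemalloc.start()
before = tracemalloc.take_snapshot()        # Detect: baseline snapshot

retained = run_workload()                   # Exercise the suspect code path

after = tracemalloc.take_snapshot()         # Detect: second snapshot
stats = after.compare_to(before, "lineno")  # Analyze: diff the snapshots by line

for stat in stats[:10]:                     # Largest allocation growth first
    print(stat)
```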
---
**Skill Version**: 1.0
**Last Updated**: 2025-11-09


@@ -0,0 +1,86 @@
# Memory Profiling Examples
Production memory profiling implementations for Node.js and Python with leak detection, heap analysis, and optimization strategies.
## Examples Overview
### Node.js Memory Leak Detection
**File**: [nodejs-memory-leak.md](nodejs-memory-leak.md)
Identifying and fixing memory leaks in Node.js applications:
- **Memory leak detection**: Chrome DevTools, heapdump analysis
- **Common leak patterns**: Event listeners, closures, global variables
- **Heap snapshots**: Before/after comparison, retained object analysis
- **Real leak**: EventEmitter leak causing 2GB memory growth
- **Fix**: Proper cleanup with `removeListener()`, WeakMap for caching
- **Result**: Memory stabilized at 150MB (93% reduction)
**Use when**: Node.js memory growing over time, debugging production memory issues
---
### Python Memory Profiling with Scalene
**File**: [python-scalene-profiling.md](python-scalene-profiling.md)
Line-by-line memory profiling for Python applications:
- **Scalene setup**: Installation, pytest integration, CLI usage
- **Memory hotspots**: Line-by-line allocation tracking
- **CPU + Memory**: Combined profiling for performance bottlenecks
- **Real scenario**: 500MB dataset causing OOM, fixed with generators
- **Optimization**: List comprehension → generator (500MB → 5MB)
- **Result**: 99% memory reduction, no OOM errors
**Use when**: Python memory spikes, profiling pytest tests, finding allocation hotspots
---
### Database Connection Pool Leak
**File**: [database-connection-leak.md](database-connection-leak.md)
PostgreSQL connection pool exhaustion and memory leaks:
- **Symptom**: Connection pool maxed out, memory growing linearly
- **Root cause**: Unclosed connections in error paths, missing `finally` blocks
- **Detection**: Connection pool metrics, memory profiling
- **Fix**: Context managers (`with` statement), proper cleanup
- **Result**: Zero connection leaks, memory stable at 80MB
**Use when**: Database connection errors, "too many clients" errors, connection pool issues
---
### Large Dataset Memory Optimization
**File**: [large-dataset-optimization.md](large-dataset-optimization.md)
Memory-efficient data processing for large datasets:
- **Problem**: Loading 10GB CSV into memory (OOM killer)
- **Solutions**: Streaming with `pandas.read_csv(chunksize)`, generators, memory mapping
- **Techniques**: Lazy evaluation, columnar processing, batch processing
- **Before/After**: 10GB memory → 500MB (95% reduction)
- **Tools**: Pandas chunking, Dask for parallel processing
**Use when**: Processing large files, OOM errors, batch data processing
---
## Quick Navigation
| Topic | File | Lines | Focus |
|-------|------|-------|-------|
| **Node.js Leaks** | [nodejs-memory-leak.md](nodejs-memory-leak.md) | ~450 | EventEmitter, heap snapshots |
| **Python Scalene** | [python-scalene-profiling.md](python-scalene-profiling.md) | ~420 | Line-by-line profiling |
| **DB Connection Leaks** | [database-connection-leak.md](database-connection-leak.md) | ~380 | Connection pool management |
| **Large Datasets** | [large-dataset-optimization.md](large-dataset-optimization.md) | ~400 | Streaming, chunking |
## Related Documentation
- **Reference**: [Reference Index](../reference/INDEX.md) - Memory patterns, profiling tools
- **Templates**: [Templates Index](../templates/INDEX.md) - Profiling report template
- **Main Agent**: [memory-profiler.md](../memory-profiler.md) - Memory profiler agent
---
Return to [main agent](../memory-profiler.md)


@@ -0,0 +1,490 @@
# Database Connection Pool Memory Leaks
Detecting and fixing PostgreSQL connection pool leaks in FastAPI applications using connection monitoring and proper cleanup patterns.
## Overview
**Before Optimization**:
- Active connections: 95/100 (pool exhausted)
- Connection timeouts: 15-20/min during peak
- Memory growth: 100MB/hour (unclosed connections)
- Service restarts: 3-4x/day
**After Optimization**:
- Active connections: 8-12/100 (healthy pool)
- Connection timeouts: 0/day
- Memory growth: 0MB/hour (stable)
- Service restarts: 0/month
**Tools**: asyncpg, SQLModel, psycopg3, pg_stat_activity, Prometheus
## 1. Connection Pool Architecture
### Grey Haven Stack: PostgreSQL + SQLModel
**Connection Pool Configuration**:
```python
# database.py
from sqlmodel import create_engine
from sqlalchemy.pool import QueuePool

# ❌ VULNERABLE: No max_overflow, no timeout
engine = create_engine(
    "postgresql://user:pass@localhost/db",
    poolclass=QueuePool,
    pool_size=20,
    echo=True
)

# ✅ SECURE: Proper pool configuration
engine = create_engine(
    "postgresql://user:pass@localhost/db",
    poolclass=QueuePool,
    pool_size=20,         # Core connections
    max_overflow=10,      # Max additional connections
    pool_timeout=30,      # Wait timeout (seconds)
    pool_recycle=3600,    # Recycle after 1 hour
    pool_pre_ping=True,   # Verify connection before use
    echo=False
)
```
**Pool Health Monitoring**:
```python
# monitoring.py
from prometheus_client import Gauge

# Prometheus metrics
db_pool_size = Gauge('db_pool_connections_total', 'Total pool size')
db_pool_active = Gauge('db_pool_connections_active', 'Active connections')
db_pool_idle = Gauge('db_pool_connections_idle', 'Idle connections')
db_pool_overflow = Gauge('db_pool_connections_overflow', 'Overflow connections')

def record_pool_metrics(engine):
    pool = engine.pool
    db_pool_size.set(pool.size())
    db_pool_active.set(pool.checkedout())
    db_pool_idle.set(pool.size() - pool.checkedout())
    db_pool_overflow.set(pool.overflow())
```
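These gauges only reflect reality if `record_pool_metrics` runs on a schedule. A minimal sketch of one way to wire that up, assuming an asyncio-based application (the 30-second interval and the task wiring are illustrative, not part of the snippet above):

```python
import asyncio

async def pool_metrics_loop(engine, interval_seconds: float = 30.0):
    """Periodically refresh the pool gauges until the task is cancelled."""
    while True:
        record_pool_metrics(engine)           # Reuse the function defined above
        await asyncio.sleep(interval_seconds)

# Example wiring at application startup (e.g. inside a FastAPI startup hook):
# metrics_task = asyncio.create_task(pool_metrics_loop(engine))
# ...and cancel metrics_task on shutdown.
```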
## 2. Common Leak Pattern: Unclosed Connections
### Vulnerable Code (Connection Leak)
```python
# api/orders.py (BEFORE)
from fastapi import APIRouter, Depends
from sqlmodel import Session, select
from database import engine
from models import Order

router = APIRouter()

@router.get("/orders")
async def get_orders():
    # ❌ LEAK: Connection never closed
    session = Session(engine)

    # If exception occurs here, session never closed
    orders = session.exec(select(Order)).all()

    # If return happens here, session never closed
    return orders

    # session.close() never reached if early return/exception
    session.close()
```
**What Happens**:
1. Every request acquires connection from pool
2. Exception/early return prevents `session.close()`
3. Connection remains in "active" state
4. Pool exhausts after 100 requests (pool_size=100)
5. New requests timeout waiting for connection
**Memory Impact**:
```
Initial pool: 20 connections (40MB)
After 1 hour: 95 leaked connections (190MB)
After 6 hours: Pool exhausted + 100MB leaked memory
```
### Fixed Code (Context Manager)
```python
# api/orders.py (AFTER)
from fastapi import APIRouter, Depends
from sqlmodel import Session, select
from database import engine
from models import Order

router = APIRouter()

# ✅ Option 1: FastAPI dependency injection (recommended)
def get_session():
    """Session dependency with automatic cleanup"""
    with Session(engine) as session:
        yield session

@router.get("/orders")
async def get_orders(session: Session = Depends(get_session)):
    # Session automatically closed after request
    orders = session.exec(select(Order)).all()
    return orders

# ✅ Option 2: Explicit context manager
@router.get("/orders-alt")
async def get_orders_alt():
    with Session(engine) as session:
        orders = session.exec(select(Order)).all()
        return orders
        # Session guaranteed to close (even on exception)
```
**Why This Works**:
- Context manager ensures `session.close()` called in `__exit__`
- Works even if exception raised
- Works even if early return
- FastAPI `Depends()` handles async cleanup
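Conceptually, both options reduce to the same try/finally shape. The sketch below is an illustrative expansion (not code from this service) of what the `with` block guarantees; `Order` is assumed to be the same model imported above:

```python
from sqlmodel import Session, select
from models import Order  # Assumed: same model as in the handlers above

def get_orders_expanded(engine):
    """Roughly what `with Session(engine) as session:` guarantees."""
    session = Session(engine)
    try:
        return session.exec(select(Order)).all()
    finally:
        session.close()  # Runs on normal return, early return, or exception
```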
## 3. Async Connection Leaks (asyncpg)
### Vulnerable Async Pattern
```python
# api/analytics.py (BEFORE)
import asyncpg
from fastapi import APIRouter

router = APIRouter()

@router.get("/analytics")
async def get_analytics(date: str):
    # ❌ LEAK: Connection never closed
    conn = await asyncpg.connect(
        user='postgres',
        password='secret',
        database='analytics'
    )

    # Exception here = connection leaked
    result = await conn.fetch('SELECT * FROM metrics WHERE date > $1', date)

    # Early return = connection leaked
    if not result:
        return []

    await conn.close()  # Skipped on exception or early return
    return result
```
### Fixed Async Pattern
```python
# api/analytics.py (AFTER)
import asyncpg
from fastapi import APIRouter
from contextlib import asynccontextmanager

router = APIRouter()

# ✅ Connection pool (shared across requests)
pool: asyncpg.Pool | None = None

@asynccontextmanager
async def get_db_connection():
    """Async context manager for connections"""
    conn = await pool.acquire()
    try:
        yield conn
    finally:
        await pool.release(conn)

@router.get("/analytics")
async def get_analytics(date: str):
    async with get_db_connection() as conn:
        result = await conn.fetch(
            'SELECT * FROM metrics WHERE date > $1',
            date
        )
        return result
        # Connection automatically released to pool
```
**Pool Setup** (application startup):
```python
# main.py
from fastapi import FastAPI
import asyncpg

app = FastAPI()

@app.on_event("startup")
async def startup():
    global pool
    pool = await asyncpg.create_pool(
        user='postgres',
        password='secret',
        database='analytics',
        min_size=10,                          # Minimum connections
        max_size=20,                          # Maximum connections
        max_inactive_connection_lifetime=300  # Recycle after 5 min
    )

@app.on_event("shutdown")
async def shutdown():
    await pool.close()
```
## 4. Transaction Leak Detection
### Monitoring Active Connections
**PostgreSQL Query**:
```sql
-- Show active connections with details
SELECT
    pid,
    usename,
    application_name,
    client_addr,
    state,
    query,
    state_change,
    NOW() - state_change AS duration
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY duration DESC;
```
**Prometheus Metrics**:
```python
# monitoring.py
from prometheus_client import Gauge
import asyncpg

db_connections_active = Gauge(
    'db_connections_active',
    'Active database connections',
    ['state']
)

async def monitor_connections(pool: asyncpg.Pool):
    """Monitor PostgreSQL connections every 30 seconds"""
    async with pool.acquire() as conn:
        rows = await conn.fetch("""
            SELECT state, COUNT(*) as count
            FROM pg_stat_activity
            WHERE datname = current_database()
            GROUP BY state
        """)
        for row in rows:
            db_connections_active.labels(state=row['state']).set(row['count'])
```
**Grafana Alert** (connection leak):
```yaml
alert: DatabaseConnectionLeak
expr: db_connections_active{state="active"} > 80
for: 5m
annotations:
  summary: "Potential connection leak ({{ $value }} active connections)"
  description: "Active connections have been above 80 for 5+ minutes"
```
## 5. Real-World Fix: FastAPI Order Service
### Before (Connection Pool Exhaustion)
```python
# services/order_processor.py (BEFORE)
from sqlmodel import Session, select
from database import engine
from models import Order, OrderItem

class OrderProcessor:
    async def process_order(self, order_id: int):
        # ❌ LEAK: Multiple sessions, some never closed
        session1 = Session(engine)
        order = session1.get(Order, order_id)

        if not order:
            # Early return = session1 leaked
            return None

        # ❌ LEAK: Second session
        session2 = Session(engine)
        items = session2.exec(
            select(OrderItem).where(OrderItem.order_id == order_id)
        ).all()

        # Exception here = both sessions leaked
        total = sum(item.price * item.quantity for item in items)
        order.total = total
        session1.commit()

        # Only session1 closed, session2 leaked
        session1.close()
        return order
```
**Metrics (Before)**:
```
Connection pool: 100 connections
Active connections after 1 hour: 95/100
Leaked connections: ~12/min
Memory growth: 100MB/hour
Pool exhaustion: Every 6-8 hours
```
### After (Proper Resource Management)
```python
# services/order_processor.py (AFTER)
from sqlmodel import Session, select
from database import engine
from models import Order, OrderItem

class OrderProcessor:
    async def process_order(self, order_id: int):
        # ✅ Single session, guaranteed cleanup
        with Session(engine) as session:
            # Query order
            order = session.get(Order, order_id)
            if not order:
                return None

            # Query items (same session)
            items = session.exec(
                select(OrderItem).where(OrderItem.order_id == order_id)
            ).all()

            # Calculate total
            total = sum(item.price * item.quantity for item in items)

            # Update order
            order.total = total
            session.add(order)
            session.commit()
            session.refresh(order)

            return order
            # Session automatically closed (even on exception)
```
**Metrics (After)**:
```
Connection pool: 100 connections
Active connections: 8-12/100 (stable)
Leaked connections: 0/day
Memory growth: 0MB/hour
Pool exhaustion: Never (0 incidents/month)
```
## 6. Connection Pool Configuration Best Practices
### Recommended Settings (Grey Haven Stack)
```python
# database.py - Production settings
from sqlmodel import create_engine
from sqlalchemy.pool import QueuePool

engine = create_engine(
    database_url,
    poolclass=QueuePool,
    pool_size=20,        # (workers * connections/worker) + buffer
    max_overflow=10,     # 50% of pool_size
    pool_timeout=30,     # Wait timeout
    pool_recycle=3600,   # Recycle after 1h
    pool_pre_ping=True   # Health check
)
```
**Pool Size Formula**: `pool_size = (workers * conn_per_worker) + buffer`
Example: `(4 workers * 3 conn) + 8 buffer = 20`
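As a quick sanity check, the formula can be expressed as a one-line helper; the worker and per-worker connection counts are just the example values above:

```python
def pool_size(workers: int, conn_per_worker: int, buffer: int) -> int:
    """pool_size = (workers * conn_per_worker) + buffer"""
    return workers * conn_per_worker + buffer

assert pool_size(workers=4, conn_per_worker=3, buffer=8) == 20
```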
## 7. Testing Connection Cleanup
### Pytest Fixture for Connection Tracking
```python
# tests/test_connection_leaks.py (the fixture could also live in conftest.py)
import asyncio

import pytest
from sqlmodel import Session, create_engine

@pytest.fixture
def engine():
    """Test engine with connection tracking"""
    test_engine = create_engine("postgresql://test:test@localhost/test_db", pool_size=5)
    initial_active = test_engine.pool.checkedout()
    yield test_engine
    final_active = test_engine.pool.checkedout()
    assert final_active == initial_active, f"Leaked {final_active - initial_active} connections"

@pytest.mark.asyncio
async def test_no_connection_leak_under_load(engine):
    """Simulate 1000 concurrent requests"""
    initial = engine.pool.checkedout()
    # get_orders is the endpoint coroutine under test (imported from the app)
    tasks = [get_orders() for _ in range(1000)]
    await asyncio.gather(*tasks)
    await asyncio.sleep(1)
    assert engine.pool.checkedout() == initial, "Connection leak detected"
```
## 8. CI/CD Integration
```yaml
# .github/workflows/connection-leak-test.yml
name: Connection Leak Detection
on: [pull_request]
jobs:
  leak-test:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:15
        env: {POSTGRES_PASSWORD: test, POSTGRES_DB: test_db}
        ports:
          - "5432:5432"
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v4
        with: {python-version: '3.11'}
      - run: pip install -r requirements.txt pytest pytest-asyncio
      - run: pytest tests/test_connection_leaks.py -v
```
## 9. Results and Impact
### Before vs After Metrics
| Metric | Before | After | Impact |
|--------|--------|-------|--------|
| **Active Connections** | 95/100 (95%) | 8-12/100 (10%) | **85% reduction** |
| **Connection Timeouts** | 15-20/min | 0/day | **100% eliminated** |
| **Memory Growth** | 100MB/hour | 0MB/hour | **100% eliminated** |
| **Service Restarts** | 3-4x/day | 0/month | **100% eliminated** |
| **Pool Wait Time (p95)** | 5.2s | 0.01s | **99.8% faster** |
### Key Optimizations Applied
1. **Context Managers**: Guaranteed connection cleanup (even on exceptions)
2. **FastAPI Dependencies**: Automatic session lifecycle management
3. **Connection Pooling**: Proper pool_size, max_overflow, pool_timeout
4. **Prometheus Monitoring**: Real-time pool saturation metrics
5. **Load Testing**: CI/CD checks for connection leaks
## Related Documentation
- **Node.js Leaks**: [nodejs-memory-leak.md](nodejs-memory-leak.md)
- **Python Profiling**: [python-scalene-profiling.md](python-scalene-profiling.md)
- **Large Datasets**: [large-dataset-optimization.md](large-dataset-optimization.md)
- **Reference**: [../reference/profiling-tools.md](../reference/profiling-tools.md)
---
Return to [examples index](INDEX.md)


@@ -0,0 +1,452 @@
# Large Dataset Memory Optimization
Memory-efficient patterns for processing multi-GB datasets in Python and Node.js without OOM errors.
## Overview
**Before Optimization**:
- Dataset size: 10GB CSV (50M rows)
- Memory usage: 20GB (2x dataset size)
- Processing time: 45 minutes
- OOM errors: Frequent (3-4x/day)
**After Optimization**:
- Dataset size: Same (10GB, 50M rows)
- Memory usage: 500MB (constant)
- Processing time: 12 minutes (73% faster)
- OOM errors: 0/month
**Tools**: Polars, pandas chunking, generators, streaming parsers
## 1. Problem: Loading Entire Dataset
### Vulnerable Pattern (Pandas read_csv)
```python
# analysis.py (BEFORE)
import pandas as pd

def analyze_sales_data(filename: str):
    # ❌ Loads entire 10GB file into memory
    df = pd.read_csv(filename)  # 20GB RAM usage

    # ❌ Creates copies for each operation
    df['total'] = df['quantity'] * df['price']                      # +10GB
    df_filtered = df[df['total'] > 1000]                            # +8GB
    df_sorted = df_filtered.sort_values('total', ascending=False)   # +8GB

    # Peak memory: 46GB for 10GB file!
    return df_sorted.head(100)
```
**Memory Profile**:
```
Step 1 (read_csv): 20GB
Step 2 (calculation): +10GB = 30GB
Step 3 (filter): +8GB = 38GB
Step 4 (sort): +8GB = 46GB
Result: OOM on 32GB machine
```
## 2. Solution 1: Pandas Chunking
### Chunk-Based Processing
```python
# analysis.py (AFTER - Chunking)
import pandas as pd

def analyze_sales_data_chunked(filename: str, chunk_size: int = 100000):
    """Process 100K rows at a time (constant memory)"""
    top_sales = []

    # ✅ Process in chunks (100K rows = ~50MB each)
    for chunk in pd.read_csv(filename, chunksize=chunk_size):
        # Calculate total (in-place when possible)
        chunk['total'] = chunk['quantity'] * chunk['price']

        # Filter high-value sales
        filtered = chunk[chunk['total'] > 1000]

        # Keep top 100 from this chunk
        top_chunk = filtered.nlargest(100, 'total')
        top_sales.append(top_chunk)
        # chunk goes out of scope, memory freed

    # Combine top results from all chunks
    final_df = pd.concat(top_sales).nlargest(100, 'total')
    return final_df
```
**Memory Profile (Chunked)**:
```
Chunk 1: 50MB (process) → ~1MB (top 100 rows) → garbage collected
Chunk 2: 50MB (process) → ~1MB (top 100 rows) → garbage collected
...
Chunk 500: 50MB (process) → ~1MB (top 100 rows) → garbage collected
Final combine: 500 chunks * ~1MB ≈ 500MB total
Peak memory: 500MB (99% reduction!)
```
## 3. Solution 2: Polars (Lazy Evaluation)
### Polars for Large Datasets
**Why Polars**:
- 10-100x faster than pandas
- True streaming (doesn't load entire file)
- Query optimizer (like SQL databases)
- Parallel processing (uses all CPU cores)
```python
# analysis.py (POLARS)
import polars as pl

def analyze_sales_data_polars(filename: str):
    """Polars lazy evaluation - constant memory"""
    result = (
        pl.scan_csv(filename)                    # ✅ Lazy: doesn't load yet
        .with_columns([
            (pl.col('quantity') * pl.col('price')).alias('total')
        ])
        .filter(pl.col('total') > 1000)
        .sort('total', descending=True)
        .head(100)
        .collect(streaming=True)                 # ✅ Streaming: processes in chunks
    )
    return result
```
**Memory Profile (Polars Streaming)**:
```
Memory usage: 200-300MB (constant)
Processing: Parallel chunks, optimized query plan
Time: 12 minutes vs 45 minutes (pandas)
```
## 4. Node.js Streaming
### CSV Streaming with csv-parser
```typescript
// analysis.ts (BEFORE)
import fs from 'fs';
import Papa from 'papaparse';

async function analyzeSalesData(filename: string) {
  // ❌ Loads entire 10GB file
  const fileContent = fs.readFileSync(filename, 'utf-8');   // 20GB RAM
  const parsed = Papa.parse(fileContent, { header: true }); // +10GB

  // Process all rows
  const results = parsed.data.map(row => ({
    total: row.quantity * row.price
  }));
  return results; // 30GB total
}
```
**Fixed with Streaming**:
```typescript
// analysis.ts (AFTER - Streaming)
import fs from 'fs';
import csv from 'csv-parser';
import { pipeline } from 'stream/promises';

async function analyzeSalesDataStreaming(filename: string) {
  const topSales: Array<{row: any, total: number}> = [];

  await pipeline(
    fs.createReadStream(filename),   // ✅ Stream (not load all)
    csv(),
    async function* (source) {
      for await (const row of source) {
        const total = row.quantity * row.price;
        if (total > 1000) {
          topSales.push({ row, total });
          // Keep only top 100 (memory bounded)
          if (topSales.length > 100) {
            topSales.sort((a, b) => b.total - a.total);
            topSales.length = 100;
          }
        }
      }
      yield topSales;
    }
  );

  return topSales;
}
```
**Memory Profile (Streaming)**:
```
Buffer: 64KB (stream chunk size)
Processing: One row at a time
Array: 100 rows max (bounded)
Peak memory: 5MB vs 30GB (99.98% reduction!)
```
## 5. Generator Pattern (Python)
### Memory-Efficient Pipeline
```python
# pipeline.py (Generator-based)
from typing import Iterator
import csv
import heapq

def read_csv_streaming(filename: str) -> Iterator[dict]:
    """Read CSV line by line (not all at once)"""
    with open(filename, 'r') as f:
        reader = csv.DictReader(f)
        for row in reader:
            yield row  # ✅ One row at a time

def calculate_totals(rows: Iterator[dict]) -> Iterator[dict]:
    """Calculate totals (lazy)"""
    for row in rows:
        row['total'] = float(row['quantity']) * float(row['price'])
        yield row

def filter_high_value(rows: Iterator[dict], threshold: float = 1000) -> Iterator[dict]:
    """Filter high-value sales (lazy)"""
    for row in rows:
        if row['total'] > threshold:
            yield row

def top_n(rows: Iterator[dict], n: int = 100) -> list[dict]:
    """Keep top N rows (bounded memory)"""
    return heapq.nlargest(n, rows, key=lambda x: x['total'])

# ✅ Pipeline: each stage processes one row at a time
def analyze_sales_pipeline(filename: str):
    rows = read_csv_streaming(filename)
    with_totals = calculate_totals(rows)
    high_value = filter_high_value(with_totals)
    top_100 = top_n(high_value, 100)
    return top_100
```
**Memory Profile (Generator Pipeline)**:
```
Stage 1 (read): 1 row (few KB)
Stage 2 (calculate): 1 row (few KB)
Stage 3 (filter): 1 row (few KB)
Stage 4 (top_n): 100 rows (bounded)
Peak memory: <1MB (constant)
```
## 6. Real-World: E-Commerce Analytics
### Before (Pandas load_all)
```python
# analytics_service.py (BEFORE)
import pandas as pd

# `engine`: an existing SQLAlchemy engine/connection (assumed defined elsewhere)

class AnalyticsService:
    def generate_sales_report(self, start_date: str, end_date: str):
        # ❌ Load entire orders table (10GB)
        orders = pd.read_sql(
            "SELECT * FROM orders WHERE date BETWEEN %s AND %s",
            engine,
            params=(start_date, end_date)
        )  # 20GB RAM

        # ❌ Load entire order_items (50GB)
        items = pd.read_sql("SELECT * FROM order_items", engine)  # +100GB RAM

        # Join (creates another copy)
        merged = orders.merge(items, on='order_id')  # +150GB

        # Aggregate
        summary = merged.groupby('category').agg({
            'total': 'sum',
            'quantity': 'sum'
        })
        return summary  # Peak: 270GB - OOM!
```
### After (Database Aggregation + Chunking)
```python
# analytics_service.py (AFTER)
import pandas as pd

# `engine`: an existing SQLAlchemy engine/connection (assumed defined elsewhere)

class AnalyticsService:
    def generate_sales_report(self, start_date: str, end_date: str):
        # ✅ Aggregate in database (PostgreSQL does the work)
        query = """
            SELECT
                oi.category,
                SUM(oi.price * oi.quantity) as total,
                SUM(oi.quantity) as quantity
            FROM orders o
            JOIN order_items oi ON o.id = oi.order_id
            WHERE o.date BETWEEN %(start)s AND %(end)s
            GROUP BY oi.category
        """

        # Result: aggregated data (few KB, not 270GB!)
        summary = pd.read_sql(
            query,
            engine,
            params={'start': start_date, 'end': end_date}
        )
        return summary  # Peak: 1MB vs 270GB
```
**Metrics**:
```
Before: 270GB RAM, OOM error
After: 1MB RAM, 99.9996% reduction
Time: 45 min → 30 seconds (90x faster)
```
## 7. Dask for Parallel Processing
### Dask DataFrame (Parallel Chunking)
```python
# analysis_dask.py
import dask.dataframe as dd

def analyze_sales_data_dask(filename: str):
    """Process in parallel chunks across CPU cores"""
    # ✅ Lazy loading, parallel processing
    df = dd.read_csv(
        filename,
        blocksize='64MB'  # Process 64MB chunks
    )

    # All operations are lazy (no computation yet)
    df['total'] = df['quantity'] * df['price']
    filtered = df[df['total'] > 1000]
    top_100 = filtered.nlargest(100, 'total')

    # ✅ Trigger computation (parallel across cores)
    result = top_100.compute()
    return result
```
**Memory Profile (Dask)**:
```
Workers: 8 (one per CPU core)
Memory per worker: 100MB
Total memory: 800MB vs 46GB
Speed: 4-8x faster (parallel)
```
## 8. Memory Monitoring
### Track Memory Usage During Processing
```python
# monitor.py
import tracemalloc
import psutil
from contextlib import contextmanager

import pandas as pd
import polars as pl

@contextmanager
def memory_monitor(label: str):
    """Monitor memory usage of a code block"""
    # Start tracking
    tracemalloc.start()
    process = psutil.Process()
    mem_before = process.memory_info().rss / 1024 / 1024  # MB

    yield

    # Measure after
    mem_after = process.memory_info().rss / 1024 / 1024
    current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    print(f"{label}:")
    print(f"  Memory before: {mem_before:.1f} MB")
    print(f"  Memory after: {mem_after:.1f} MB")
    print(f"  Memory delta: {mem_after - mem_before:.1f} MB")
    print(f"  Peak traced: {peak / 1024 / 1024:.1f} MB")

# Usage
with memory_monitor("Pandas load_all"):
    df = pd.read_csv("large_file.csv")  # Shows high memory usage

with memory_monitor("Polars streaming"):
    df = pl.scan_csv("large_file.csv").collect(streaming=True)  # Low memory
```
## 9. Optimization Decision Tree
**Choose the right tool based on dataset size**:
```
Dataset < 1GB:
  → Use pandas.read_csv() (simple, fast)

Dataset 1-10GB:
  → Use pandas chunking (chunksize=100000)
  → Or Polars streaming (faster, less memory)

Dataset 10-100GB:
  → Use Polars streaming (best performance)
  → Or Dask (parallel processing)
  → Or Database aggregation (PostgreSQL, ClickHouse)

Dataset > 100GB:
  → Database aggregation (required)
  → Or Spark/Ray (distributed computing)
  → Never load into memory
```
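Purely as an illustration, the decision tree can be encoded as a small helper; the size thresholds and strategy names simply mirror the tree above and are not a library API:

```python
import os

def choose_strategy(path: str) -> str:
    """Map a file's size to the processing strategy suggested above."""
    gb = os.path.getsize(path) / (1024 ** 3)
    if gb < 1:
        return "pandas.read_csv"
    if gb <= 10:
        return "pandas chunking or Polars streaming"
    if gb <= 100:
        return "Polars streaming, Dask, or database aggregation"
    return "database aggregation or distributed computing (Spark/Ray)"
```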
## 10. Results and Impact
### Before vs After Metrics
| Metric | Before (pandas) | After (Polars) | Impact |
|--------|----------------|----------------|--------|
| **Memory Usage** | 46GB | 300MB | **99.3% reduction** |
| **Processing Time** | 45 min | 12 min | **73% faster** |
| **OOM Errors** | 3-4/day | 0/month | **100% eliminated** |
| **Max Dataset Size** | 10GB | 500GB+ | **50x scalability** |
### Key Optimizations Applied
1. **Chunking**: Process 100K rows at a time (constant memory)
2. **Lazy Evaluation**: Polars/Dask don't load until needed
3. **Streaming**: One row at a time (generators, Node.js streams)
4. **Database Aggregation**: Let PostgreSQL do the work
5. **Bounded Memory**: heapq.nlargest() keeps top N (not all rows)
### Cost Savings
**Infrastructure costs**:
- Before: r5.8xlarge (256GB RAM) = $1.344/hour
- After: r5.large (16GB RAM) = $0.084/hour
- **Savings**: 94% reduction ($23,000/year per service)
## Related Documentation
- **Node.js Leaks**: [nodejs-memory-leak.md](nodejs-memory-leak.md)
- **Python Profiling**: [python-scalene-profiling.md](python-scalene-profiling.md)
- **DB Leaks**: [database-connection-leak.md](database-connection-leak.md)
- **Reference**: [../reference/memory-optimization-patterns.md](../reference/memory-optimization-patterns.md)
---
Return to [examples index](INDEX.md)


@@ -0,0 +1,490 @@
# Node.js Memory Leak Detection
Identifying and fixing memory leaks in Node.js applications using Chrome DevTools, heapdump, and memory profiling techniques.
## Overview
**Symptoms Before Fix**:
- Memory usage: 150MB → 2GB over 6 hours
- Heap size growing linearly (5MB/minute)
- V8 garbage collection ineffective
- Production outages (OOM killer)
**After Fix**:
- Memory stable at 150MB (93% reduction)
- Heap size constant over time
- Zero OOM errors in 30 days
- Proper resource cleanup
**Tools**: Chrome DevTools, heapdump, memwatch-next, Prometheus monitoring
## 1. Memory Leak Symptoms
### Linear Memory Growth
```bash
# Monitor Node.js memory usage
node --expose-gc --inspect app.js
# Connect Chrome DevTools: chrome://inspect
# Memory tab → Take heap snapshot every 5 minutes
```
**Heap growth pattern**:
```
Time | Heap Size | External | Total
------|-----------|----------|-------
0 min | 50MB | 10MB | 60MB
5 min | 75MB | 15MB | 90MB
10min | 100MB | 20MB | 120MB
15min | 125MB | 25MB | 150MB
... | ... | ... | ...
6 hrs | 1.8GB | 200MB | 2GB
```
**Diagnosis**: Linear growth indicates memory leak (not normal sawtooth GC pattern)
### High GC Activity
```javascript
// Log process memory usage every minute (a rising floor between GC cycles indicates a leak)
setInterval(() => {
  const usage = process.memoryUsage();
  console.log({
    heapUsed: `${Math.round(usage.heapUsed / 1024 / 1024)}MB`,
    heapTotal: `${Math.round(usage.heapTotal / 1024 / 1024)}MB`,
    external: `${Math.round(usage.external / 1024 / 1024)}MB`,
    rss: `${Math.round(usage.rss / 1024 / 1024)}MB`
  });
}, 60000); // Every minute
```
**Output showing leak**:
```
{heapUsed: '75MB', heapTotal: '100MB', external: '15MB', rss: '120MB'}
{heapUsed: '100MB', heapTotal: '130MB', external: '20MB', rss: '150MB'}
{heapUsed: '125MB', heapTotal: '160MB', external: '25MB', rss: '185MB'}
```
## 2. Heap Snapshot Analysis
### Taking Heap Snapshots
```javascript
// Generate heap snapshot programmatically
const v8 = require('v8');

function takeHeapSnapshot(filename) {
  const heapSnapshot = v8.writeHeapSnapshot(filename);
  console.log(`Heap snapshot written to ${heapSnapshot}`);
}

// Take snapshot every hour
setInterval(() => {
  const timestamp = new Date().toISOString().replace(/:/g, '-');
  takeHeapSnapshot(`heap-${timestamp}.heapsnapshot`);
}, 3600000);
```
### Analyzing Snapshots in Chrome DevTools
**Steps**:
1. Load two snapshots (before and after 1 hour)
2. Compare snapshots (Comparison view)
3. Sort by "Size Delta" (descending)
4. Look for objects growing significantly
**Example Analysis**:
```
Object Type | Count | Size Delta | Retained Size
----------------------|--------|------------|---------------
(array) | +5,000 | +50MB | +60MB
EventEmitter | +1,200 | +12MB | +15MB
Closure (anonymous) | +800 | +8MB | +10MB
```
**Diagnosis**: EventEmitter count growing = likely event listener leak
### Retained Objects Analysis
```javascript
// Chrome DevTools → Heap Snapshot → Summary → sort by "Retained Size"
// Click object → view Retainer tree
```
**Retainer tree example** (EventEmitter leak):
```
EventEmitter @123456
← listeners: Array[50]
← _events.data: Array
← EventEmitter @123456 (self-reference leak!)
```
## 3. Common Memory Leak Patterns
### Pattern 1: Event Listener Leak
**Vulnerable Code**:
```typescript
// ❌ LEAK: EventEmitter listeners never removed
import {EventEmitter} from 'events';

class DataProcessor {
  private emitter = new EventEmitter();

  async processOrders() {
    // Add listener every time function called
    this.emitter.on('data', (data) => {
      console.log('Processing:', data);
    });

    // Emit 1000 events
    for (let i = 0; i < 1000; i++) {
      this.emitter.emit('data', {id: i});
    }
  }
}

// Called repeatedly on the same long-lived processor: listeners accumulate!
const processor = new DataProcessor();
setInterval(() => processor.processOrders(), 1000);
```
**Result**: One leaked listener per call = 3,600 listeners/hour, each retaining its closure → multi-GB heap growth over time
**Fixed Code**:
```typescript
// ✅ FIXED: Remove listener after use
class DataProcessor {
  private emitter = new EventEmitter();

  async processOrders() {
    const handler = (data) => {
      console.log('Processing:', data);
    };
    this.emitter.on('data', handler);

    try {
      for (let i = 0; i < 1000; i++) {
        this.emitter.emit('data', {id: i});
      }
    } finally {
      // ✅ Clean up listener
      this.emitter.removeListener('data', handler);
    }
  }
}
```
**Better**: Use `once()` for one-time listeners:
```typescript
this.emitter.once('data', handler); // Auto-removed after first emit
```
### Pattern 2: Closure Leak
**Vulnerable Code**:
```typescript
// ❌ LEAK: Closure captures large object
const cache = new Map();

function processRequest(userId: string) {
  const largeData = fetchLargeDataset(userId); // 10MB object

  // Closure captures entire largeData
  cache.set(userId, () => {
    return largeData.summary; // Only need summary (1KB)
  });
}

// Called for 1000 users = 10GB in cache!
```
**Fixed Code**:
```typescript
// ✅ FIXED: Only store what you need
const cache = new Map();

function processRequest(userId: string) {
  const largeData = fetchLargeDataset(userId);
  const summary = largeData.summary; // Extract only 1KB

  // Store minimal data
  cache.set(userId, () => summary);
}

// 1000 users = 1MB in cache ✅
```
### Pattern 3: Global Variable Accumulation
**Vulnerable Code**:
```typescript
// ❌ LEAK: Global array keeps growing
const requestLog: Request[] = [];

app.post('/api/orders', (req, res) => {
  requestLog.push(req); // Never removed!
  // ... process order
});

// 1M requests = 1M objects in memory permanently
```
**Fixed Code**:
```typescript
// ✅ FIXED: Use LRU cache with size limit
import { LRUCache } from 'lru-cache';

const requestLog = new LRUCache({
  max: 1000,          // Maximum 1000 items
  ttl: 1000 * 60 * 5  // 5-minute TTL
});

app.post('/api/orders', (req, res) => {
  requestLog.set(req.id, req); // Auto-evicts old items
});
```
### Pattern 4: Forgotten Timers/Intervals
**Vulnerable Code**:
```typescript
// ❌ LEAK: setInterval never cleared
class ReportGenerator {
  private data: any[] = [];

  start() {
    setInterval(() => {
      this.data.push(generateReport()); // Accumulates forever
    }, 60000);
  }
}

// Each instance leaks!
const generator = new ReportGenerator();
generator.start();
```
**Fixed Code**:
```typescript
// ✅ FIXED: Clear interval on cleanup
class ReportGenerator {
  private data: any[] = [];
  private intervalId?: NodeJS.Timeout;

  start() {
    this.intervalId = setInterval(() => {
      this.data.push(generateReport());
    }, 60000);
  }

  stop() {
    if (this.intervalId) {
      clearInterval(this.intervalId);
      this.intervalId = undefined;
      this.data = []; // Clear accumulated data
    }
  }
}
```
## 4. Memory Profiling with memwatch-next
### Installation
```bash
bun add memwatch-next
```
### Leak Detection
```typescript
// memory-monitor.ts
import memwatch from 'memwatch-next';

// Detect memory leaks
memwatch.on('leak', (info) => {
  console.error('Memory leak detected:', {
    growth: info.growth,
    reason: info.reason,
    current_base: `${Math.round(info.current_base / 1024 / 1024)}MB`,
    leaked: `${Math.round((info.current_base - info.start) / 1024 / 1024)}MB`
  });

  // Alert to PagerDuty/Slack
  alertOps('Memory leak detected', info);
});

// Monitor GC stats
memwatch.on('stats', (stats) => {
  console.log('GC stats:', {
    used_heap_size: `${Math.round(stats.used_heap_size / 1024 / 1024)}MB`,
    heap_size_limit: `${Math.round(stats.heap_size_limit / 1024 / 1024)}MB`,
    num_full_gc: stats.num_full_gc,
    num_inc_gc: stats.num_inc_gc
  });
});
```
### HeapDiff for Leak Analysis
```typescript
import memwatch from 'memwatch-next';

const hd = new memwatch.HeapDiff();

// Simulate leak
const leak: any[] = [];
for (let i = 0; i < 10000; i++) {
  leak.push({data: new Array(1000).fill('x')});
}

// Compare heaps
const diff = hd.end();
console.log('Heap diff:', JSON.stringify(diff, null, 2));

// Output:
// {
//   "before": {"nodes": 12345, "size": 50000000},
//   "after": {"nodes": 22345, "size": 150000000},
//   "change": {
//     "size_bytes": 100000000, // 100MB leak!
//     "size": "100.00MB",
//     "freed_nodes": 100,
//     "allocated_nodes": 10100 // Net increase
//   }
// }
```
## 5. Production Memory Monitoring
### Prometheus Metrics
```typescript
// metrics.ts
import {Gauge} from 'prom-client';

const memoryUsageGauge = new Gauge({
  name: 'nodejs_memory_usage_bytes',
  help: 'Node.js memory usage in bytes',
  labelNames: ['type']
});

setInterval(() => {
  const usage = process.memoryUsage();
  memoryUsageGauge.set({type: 'heap_used'}, usage.heapUsed);
  memoryUsageGauge.set({type: 'heap_total'}, usage.heapTotal);
  memoryUsageGauge.set({type: 'external'}, usage.external);
  memoryUsageGauge.set({type: 'rss'}, usage.rss);
}, 15000);
```
**Grafana Alert**:
```promql
# Alert if heap usage growing linearly
increase(nodejs_memory_usage_bytes{type="heap_used"}[1h]) > 100000000 # 100MB/hour
```
## 6. Real-World Fix: EventEmitter Leak
### Before (Leaking)
```typescript
// order-processor.ts (BEFORE FIX)
class OrderProcessor {
  private emitter = new EventEmitter();

  async processOrders() {
    // ❌ LEAK: Listener added every call
    this.emitter.on('order:created', async (order) => {
      await this.sendConfirmationEmail(order);
      await this.updateInventory(order);
    });

    const orders = await db.query.orders.findMany({status: 'pending'});
    for (const order of orders) {
      this.emitter.emit('order:created', order);
    }
  }
}

// Called every minute on the same long-lived processor: listeners accumulate
const processor = new OrderProcessor();
setInterval(() => processor.processOrders(), 60000);
```
**Result**: 1,440 listeners/day → 2GB memory leak in production
### After (Fixed)
```typescript
// order-processor.ts (AFTER FIX)
class OrderProcessor {
  private emitter = new EventEmitter();

  async processOrders() {
    const handler = async (order) => {
      await this.sendConfirmationEmail(order);
      await this.updateInventory(order);
    };

    // ✅ Register one handler per run
    this.emitter.on('order:created', handler);

    try {
      const orders = await db.query.orders.findMany({status: 'pending'});
      for (const order of orders) {
        this.emitter.emit('order:created', order);
      }
    } finally {
      // ✅ Cleanup so listeners never accumulate across runs
      this.emitter.removeListener('order:created', handler);
    }
    // (For a genuinely one-shot event, emitter.once() removes itself automatically.)
  }
}
```
**Result**: Memory stable at 150MB, zero leaks
## 7. Results and Impact
### Before vs After Metrics
| Metric | Before Fix | After Fix | Impact |
|--------|-----------|-----------|---------|
| **Memory Usage** | 2GB (after 6h) | 150MB (stable) | **93% reduction** |
| **Heap Size** | Linear growth (5MB/min) | Stable | **Zero growth** |
| **OOM Incidents** | 12/month | 0/month | **100% eliminated** |
| **GC Pause Time** | 200ms avg | 50ms avg | **75% faster** |
| **Uptime** | 6 hours avg | 30+ days | **120x improvement** |
### Lessons Learned
**1. Always remove event listeners**
- Use `once()` for one-time events
- Use `removeListener()` in finally blocks
- Track listeners with WeakMap for debugging
**2. Avoid closures capturing large objects**
- Extract only needed data before closure
- Use WeakMap/WeakSet for object references
- Profile with heap snapshots regularly
**3. Monitor memory in production**
- Prometheus metrics for heap usage
- Alert on linear growth patterns
- Weekly heap snapshot analysis
## Related Documentation
- **Python Profiling**: [python-scalene-profiling.md](python-scalene-profiling.md)
- **DB Leaks**: [database-connection-leak.md](database-connection-leak.md)
- **Reference**: [../reference/memory-patterns.md](../reference/memory-patterns.md)
- **Templates**: [../templates/memory-report.md](../templates/memory-report.md)
---
Return to [examples index](INDEX.md)


@@ -0,0 +1,456 @@
# Python Memory Profiling with Scalene
Line-by-line memory and CPU profiling for Python applications using Scalene, with pytest integration and optimization strategies.
## Overview
**Before Optimization**:
- Memory usage: 500MB for processing 10K records
- OOM (Out of Memory) errors with 100K records
- Processing time: 45 seconds for 10K records
- List comprehensions loading entire dataset
**After Optimization**:
- Memory usage: 5MB for processing 10K records (99% reduction)
- No OOM errors with 1M records
- Processing time: 8 seconds for 10K records (82% faster)
- Generator-based streaming
**Tools**: Scalene, pytest, memory_profiler, tracemalloc
## 1. Scalene Installation and Setup
### Installation
```bash
# Install Scalene
pip install scalene
# Or with uv (faster)
uv pip install scalene
```
### Basic Usage
```bash
# Profile entire script
scalene script.py
# Profile with pytest (recommended)
scalene --cli --memory -m pytest tests/
# HTML output
scalene --html --outfile profile.html script.py
# Profile specific function
scalene --reduced-profile script.py
```
## 2. Profiling with pytest
### Test File Setup
```python
# tests/test_data_processing.py
import pytest
from data_processor import DataProcessor

@pytest.fixture
def processor():
    return DataProcessor()

def test_process_large_dataset(processor):
    # Generate 10K records
    records = [{'id': i, 'value': i * 2} for i in range(10000)]

    # Process (this is where the memory spike occurs)
    result = processor.process_records(records)
    assert len(result) == 10000
```
### Running Scalene with pytest
```bash
# Profile memory usage during test execution
uv run scalene --cli --memory -m pytest tests/test_data_processing.py 2>&1 | grep -i "memory\|mb\|test"
# Output shows line-by-line memory allocation
```
**Scalene Output** (before optimization):
```
data_processor.py:
Line | Memory % | Memory (MB) | CPU % | Code
-----|----------|-------------|-------|-----
12 | 45% | 225 MB | 10% | result = [transform(r) for r in records]
18 | 30% | 150 MB | 5% | filtered = [r for r in result if r['value'] > 0]
25 | 15% | 75 MB | 20% | sorted_data = sorted(filtered, key=lambda x: x['id'])
```
**Analysis**: Line 12 is the hotspot (45% of memory)
## 3. Memory Hotspot Identification
### Vulnerable Code (Memory Spike)
```python
# data_processor.py (BEFORE OPTIMIZATION)
from datetime import datetime

class DataProcessor:
    def process_records(self, records: list[dict]) -> list[dict]:
        # ❌ HOTSPOT: List comprehension loads entire dataset
        result = [self.transform(r) for r in records]  # 225MB for 10K records

        # ❌ Creates another copy
        filtered = [r for r in result if r['value'] > 0]  # +150MB

        # ❌ sorted() creates yet another copy
        sorted_data = sorted(filtered, key=lambda x: x['id'])  # +75MB

        return sorted_data  # Total: 450MB for 10K records

    def transform(self, record: dict) -> dict:
        return {
            'id': record['id'],
            'value': record['value'] * 2,
            'timestamp': datetime.now()
        }
```
**Scalene Report**:
```
Memory allocation breakdown:
- Line 12 (list comprehension): 225MB (50%)
- Line 18 (filtering): 150MB (33%)
- Line 25 (sorting): 75MB (17%)
Total memory: 450MB for 10,000 records
Projected for 100K: 4.5GB → OOM!
```
### Optimized Code (Generator-Based)
```python
# data_processor.py (AFTER OPTIMIZATION)
from datetime import datetime
from typing import Iterator

class DataProcessor:
    def process_records(self, records: list[dict]) -> Iterator[dict]:
        # ✅ Generator: processes one record at a time
        transformed = (self.transform(r) for r in records)  # O(1) memory

        # ✅ Generator chaining
        filtered = (r for r in transformed if r['value'] > 0)  # O(1) memory

        # ✅ Stream-based sorting (only if needed)
        # For very large datasets, use external sorting or database ORDER BY
        yield from sorted(filtered, key=lambda x: x['id'])  # Still O(n), but lazy

    def transform(self, record: dict) -> dict:
        return {
            'id': record['id'],
            'value': record['value'] * 2,
            'timestamp': datetime.now()
        }

    # Alternative: Fully streaming (no sorting)
    def process_records_streaming(self, records: list[dict]) -> Iterator[dict]:
        for record in records:
            transformed = self.transform(record)
            if transformed['value'] > 0:
                yield transformed  # O(1) memory, fully streaming
```
**Scalene Report (After)**:
```
Memory allocation breakdown:
- Line 12 (generator): 5MB (100% - constant overhead)
- Line 18 (filter generator): 0MB (lazy)
- Line 25 (yield): 0MB (lazy)
Total memory: 5MB for 10,000 records (99% reduction!)
Scalable to 1M+ records without OOM
```
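Because `process_records` now returns an iterator, callers must consume it lazily to keep the benefit. A minimal sketch of lazy consumption under these assumptions (the output filename and in-memory record source here are illustrative only):

```python
import json

processor = DataProcessor()
records = [{'id': i, 'value': i * 2} for i in range(10_000)]

# Stream results straight to disk: only one processed record is alive at a time.
with open("processed.jsonl", "w") as out:
    for item in processor.process_records_streaming(records):
        out.write(json.dumps(item, default=str) + "\n")  # default=str handles datetime
```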
## 4. Common Memory Patterns
### Pattern 1: List Comprehension → Generator
**Before** (High Memory):
```python
# ❌ Loads entire list into memory
import json

def process_large_file(filename: str) -> list[dict]:
    with open(filename) as f:
        lines = f.readlines()  # Loads entire file (500MB)

    # Another copy
    return [json.loads(line) for line in lines]  # +500MB = 1GB total
```
**After** (Low Memory):
```python
# ✅ Generator: processes line-by-line
import json
from typing import Iterator

def process_large_file(filename: str) -> Iterator[dict]:
    with open(filename) as f:
        for line in f:  # Reads one line at a time
            yield json.loads(line)  # O(1) memory
```
**Scalene diff**: 1GB → 5MB (99.5% reduction)
### Pattern 2: DataFrame Memory Optimization
**Before** (High Memory):
```python
# ❌ Loads entire CSV into memory
import pandas as pd

def analyze_data(filename: str):
    df = pd.read_csv(filename)  # 10GB CSV → 10GB RAM

    # All transformations in memory
    df['new_col'] = df['value'] * 2
    df_filtered = df[df['value'] > 0]
    return df_filtered.groupby('category').sum()
```
**After** (Low Memory with Chunking):
```python
# ✅ Process in chunks
import pandas as pd

def analyze_data(filename: str):
    chunk_size = 10000
    results = []

    # Process 10K rows at a time
    for chunk in pd.read_csv(filename, chunksize=chunk_size):
        chunk['new_col'] = chunk['value'] * 2
        filtered = chunk[chunk['value'] > 0]
        group_result = filtered.groupby('category').sum()
        results.append(group_result)

    # Combine results
    return pd.concat(results).groupby(level=0).sum()  # Much smaller
```
**Scalene diff**: 10GB → 500MB (95% reduction)
### Pattern 3: String Concatenation
**Before** (High Memory):
```python
# ❌ Creates new string each iteration (O(n²) memory)
def build_report(data: list[dict]) -> str:
    report = ""
    for item in data:  # 100K items
        report += f"{item['id']}: {item['value']}\n"  # New string every time
    return report  # 500MB final string + 500MB garbage = 1GB
```
**After** (Low Memory):
```python
# ✅ StringIO or join (O(n) memory)
from io import StringIO
from typing import Iterator

def build_report(data: list[dict]) -> str:
    buffer = StringIO()
    for item in data:
        buffer.write(f"{item['id']}: {item['value']}\n")
    return buffer.getvalue()

# Or even better: generator
def build_report_streaming(data: list[dict]) -> Iterator[str]:
    for item in data:
        yield f"{item['id']}: {item['value']}\n"
```
**Scalene diff**: 1GB → 50MB (95% reduction)
## 5. Scalene CLI Reference
### Common Options
```bash
# Memory-only profiling (fastest)
scalene --cli --memory script.py

# CPU + Memory profiling
scalene --cli --cpu --memory script.py

# Reduced profile (functions only, not lines)
scalene --reduced-profile script.py

# Restrict profiling to files whose names contain a substring
scalene --profile-only data_processor script.py

# HTML report
scalene --html --outfile profile.html script.py

# Profile with pytest
scalene --cli --memory -m pytest tests/

# Only report lines with at least this many allocations (default: 100)
scalene --malloc-threshold 50 script.py
```
### Interpreting Output
**Column Meanings**:
```
Memory % | Percentage of total memory allocated
Memory MB | Absolute memory allocated (in megabytes)
CPU % | Percentage of CPU time spent
Python % | Time spent in Python (vs native code)
```
**Example Output**:
```
script.py:
Line | Memory % | Memory MB | CPU % | Python % | Code
-----|----------|-----------|-------|----------|-----
12 | 45.2% | 225.6 MB | 10.5% | 95.2% | data = [x for x in range(1000000)]
18 | 30.1% | 150.3 MB | 5.2% | 98.1% | filtered = list(filter(lambda x: x > 0, data))
```
**Analysis**:
- Line 12: High memory (45.2%) → optimize list comprehension
- Line 18: Moderate memory (30.1%) → use generator instead of list()
## 6. Integration with CI/CD
### GitHub Actions Workflow
```yaml
# .github/workflows/memory-profiling.yml
name: Memory Profiling
on: [pull_request]
jobs:
  profile:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'
      - name: Install dependencies
        run: |
          pip install scalene pytest
      - name: Run memory profiling
        run: |
          scalene --cli --memory --reduced-profile -m pytest tests/ > profile.txt
      - name: Check for memory hotspots
        run: |
          if grep -q "Memory %" profile.txt; then
            # Fail the job if any profiled line reports more than 100 MB
            if awk '$3+0 > 100 { found=1 } END { exit found ? 0 : 1 }' profile.txt; then
              echo "Memory hotspot detected!"
              exit 1
            fi
          fi
      - name: Upload profile
        uses: actions/upload-artifact@v3
        with:
          name: memory-profile
          path: profile.txt
```
## 7. Real-World Optimization: CSV Processing
### Before (500MB Memory, OOM at 100K rows)
```python
# csv_processor.py (BEFORE)
import pandas as pd

class CSVProcessor:
    def process_file(self, filename: str) -> dict:
        # ❌ Loads entire CSV
        df = pd.read_csv(filename)  # 500MB for 10K rows

        # ❌ Multiple copies
        df['total'] = df['quantity'] * df['price']
        df_filtered = df[df['total'] > 100]
        summary = df_filtered.groupby('category').agg({
            'total': 'sum',
            'quantity': 'sum'
        })
        return summary.to_dict()
return summary.to_dict()
```
**Scalene Output**:
```
Line 8: 500MB (75%) - pd.read_csv()
Line 11: 100MB (15%) - df['total'] calculation
Line 12: 50MB (10%) - filtering
Total: 650MB for 10K rows
```
### After (5MB Memory, Handles 1M rows)
```python
# csv_processor.py (AFTER)
import pandas as pd
from collections import defaultdict

class CSVProcessor:
    def process_file(self, filename: str) -> dict:
        # ✅ Process in 10K row chunks
        chunk_size = 10000
        results = defaultdict(lambda: {'total': 0, 'quantity': 0})

        for chunk in pd.read_csv(filename, chunksize=chunk_size):
            chunk['total'] = chunk['quantity'] * chunk['price']
            filtered = chunk[chunk['total'] > 100]

            # Aggregate incrementally
            for category, group in filtered.groupby('category'):
                results[category]['total'] += group['total'].sum()
                results[category]['quantity'] += group['quantity'].sum()

        return dict(results)
```
**Scalene Output (After)**:
```
Line 9: 5MB (100%) - chunk processing (constant memory)
Total: 5MB for any file size (99% reduction)
```
## 8. Results and Impact
### Before vs After Metrics
| Metric | Before | After | Impact |
|--------|--------|-------|--------|
| **Memory Usage** | 500MB (10K rows) | 5MB (1M rows) | **99% reduction** |
| **Processing Time** | 45s (10K rows) | 8s (10K rows) | **82% faster** |
| **Max File Size** | 100K rows (OOM) | 10M+ rows | **100x scalability** |
| **OOM Errors** | 5/week | 0/month | **100% eliminated** |
### Key Optimizations Applied
1. **List comprehension → Generator**: 225MB → 0MB
2. **DataFrame chunking**: 500MB → 5MB per chunk
3. **String concatenation**: 1GB → 50MB (StringIO)
4. **Lazy evaluation**: Load on demand vs load all
## Related Documentation
- **Node.js Leaks**: [nodejs-memory-leak.md](nodejs-memory-leak.md)
- **DB Leaks**: [database-connection-leak.md](database-connection-leak.md)
- **Reference**: [../reference/profiling-tools.md](../reference/profiling-tools.md)
- **Templates**: [../templates/scalene-config.txt](../templates/scalene-config.txt)
---
Return to [examples index](INDEX.md)


@@ -0,0 +1,75 @@
# Memory Profiler Reference
Quick reference guides for memory optimization patterns, profiling tools, and garbage collection.
## Reference Guides
### Memory Optimization Patterns
**File**: [memory-optimization-patterns.md](memory-optimization-patterns.md)
Comprehensive catalog of memory leak patterns and their fixes:
- **Event Listener Leaks**: EventEmitter cleanup, closure traps
- **Connection Pool Leaks**: Database connection management
- **Large Dataset Patterns**: Streaming, chunking, lazy evaluation
- **Cache Management**: LRU caches, WeakMap/WeakSet
- **Closure Memory Traps**: Variable capture, scope management
**Use when**: Quick lookup for specific memory leak pattern
---
### Profiling Tools Comparison
**File**: [profiling-tools.md](profiling-tools.md)
Comparison matrix and usage guide for memory profiling tools:
- **Node.js**: Chrome DevTools, heapdump, memwatch-next, clinic.js
- **Python**: Scalene, memory_profiler, tracemalloc, py-spy
- **Monitoring**: Prometheus, Grafana, DataDog APM
- **Tool Selection**: When to use which tool
**Use when**: Choosing the right profiling tool for your stack
---
### Garbage Collection Guide
**File**: [garbage-collection-guide.md](garbage-collection-guide.md)
Understanding and tuning garbage collectors:
- **V8 (Node.js)**: Generational GC, heap structure, --max-old-space-size
- **Python**: Reference counting, generational GC, gc.collect()
- **GC Monitoring**: Metrics, alerts, optimization
- **GC Tuning**: When and how to tune
**Use when**: GC issues, tuning performance, understanding memory behavior
---
## Quick Lookup
**Common Patterns**:
- EventEmitter leak → [memory-optimization-patterns.md#event-listener-leaks](memory-optimization-patterns.md#event-listener-leaks)
- Connection leak → [memory-optimization-patterns.md#connection-pool-leaks](memory-optimization-patterns.md#connection-pool-leaks)
- Large dataset → [memory-optimization-patterns.md#large-dataset-patterns](memory-optimization-patterns.md#large-dataset-patterns)
**Tool Selection**:
- Node.js profiling → [profiling-tools.md#nodejs-tools](profiling-tools.md#nodejs-tools)
- Python profiling → [profiling-tools.md#python-tools](profiling-tools.md#python-tools)
- Production monitoring → [profiling-tools.md#monitoring-tools](profiling-tools.md#monitoring-tools)
**GC Issues**:
- Node.js heap → [garbage-collection-guide.md#v8-heap](garbage-collection-guide.md#v8-heap)
- Python GC → [garbage-collection-guide.md#python-gc](garbage-collection-guide.md#python-gc)
- GC metrics → [garbage-collection-guide.md#gc-monitoring](garbage-collection-guide.md#gc-monitoring)
## Related Documentation
- **Examples**: [Examples Index](../examples/INDEX.md) - Full walkthroughs
- **Templates**: [Templates Index](../templates/INDEX.md) - Memory report templates
- **Main Agent**: [memory-profiler.md](../memory-profiler.md) - Memory profiler agent
---
Return to [main agent](../memory-profiler.md)


@@ -0,0 +1,392 @@
# Garbage Collection Guide
Understanding and tuning garbage collectors in Node.js (V8) and Python for optimal memory management.
## V8 Garbage Collector (Node.js)
### Heap Structure
**Two Generations**:
```
┌─────────────────────────────────────────────────────────┐
│ V8 Heap │
├─────────────────────────────────────────────────────────┤
│ New Space (Young Generation) - 8MB-32MB │
│ ┌─────────────┬─────────────┐ │
│ │ From-Space │ To-Space │ ← Minor GC (Scavenge) │
│ └─────────────┴─────────────┘ │
│ │
│ Old Space (Old Generation) - Remaining heap │
│ ┌──────────────────────────────────────┐ │
│ │ Long-lived objects │ ← Major GC │
│ │ (survived 2+ Minor GCs) │ (Mark-Sweep)│
│ └──────────────────────────────────────┘ │
│ │
│ Large Object Space - Objects >512KB │
└─────────────────────────────────────────────────────────┘
```
**GC Types**:
- **Scavenge (Minor GC)**: Fast (~1ms), clears new space, runs frequently
- **Mark-Sweep (Major GC)**: Slow (100-500ms), clears old space, runs when old space fills
- **Mark-Compact**: Like Mark-Sweep but also defragments memory
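To see both collectors in the logs, run a small allocation-churn script under `--trace-gc` — a minimal sketch (file name and numbers are arbitrary):
```typescript
// gc-demo.ts — compile, then run with: node --trace-gc gc-demo.js
// Short-lived arrays are reclaimed by Scavenge; retained ones get promoted
// to old space and eventually show up as Mark-sweep lines.
const retained: number[][] = [];

setInterval(() => {
  for (let i = 0; i < 1_000; i++) {
    const temp = new Array(1_000).fill(i);  // short-lived → Minor GC
    if (i % 100 === 0) retained.push(temp); // survives → promoted → Major GC
  }
}, 10);
```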
---
### Monitoring V8 GC
**Built-in GC Traces**:
```bash
# Enable GC logging
node --trace-gc server.js
# Output:
# [12345:0x104800000] 42 ms: Scavenge 8.5 (10.2) -> 7.8 (10.2) MB
# [12345:0x104800000] 123 ms: Mark-sweep 95.2 (100.5) -> 82.3 (100.5) MB
```
**Parse GC logs**:
```
[PID:address] time ms: GC-type before (heap) -> after (heap) MB
Scavenge = Minor GC (young generation)
Mark-sweep = Major GC (old generation)
```
**Prometheus Metrics**:
```typescript
import { Gauge, Histogram } from 'prom-client';
import { PerformanceObserver, constants } from 'perf_hooks';
import v8 from 'v8';

const heap_size = new Gauge({ name: 'nodejs_heap_size_total_bytes', help: 'V8 total heap size' });
const heap_used = new Gauge({ name: 'nodejs_heap_used_bytes', help: 'V8 used heap size' });
const gc_duration = new Histogram({
  name: 'nodejs_gc_duration_seconds',
  help: 'GC pause duration by kind',
  labelNames: ['kind']
});

// Map perf_hooks GC kind constants to readable labels
const GC_KINDS: Record<number, string> = {
  [constants.NODE_PERFORMANCE_GC_MINOR]: 'minor',
  [constants.NODE_PERFORMANCE_GC_MAJOR]: 'major',
  [constants.NODE_PERFORMANCE_GC_INCREMENTAL]: 'incremental',
  [constants.NODE_PERFORMANCE_GC_WEAKCB]: 'weakcb'
};

// Track GC events (kind lives on entry.detail in Node >= 16, entry.kind before that)
const obs = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    const kind = GC_KINDS[(entry as any).detail?.kind ?? (entry as any).kind] ?? 'unknown';
    gc_duration.labels(kind).observe(entry.duration / 1000);
  }
});
obs.observe({ entryTypes: ['gc'] });

// Update heap metrics every 10s
setInterval(() => {
  const stats = v8.getHeapStatistics();
  heap_size.set(stats.total_heap_size);
  heap_used.set(stats.used_heap_size);
}, 10000);
```
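To let Prometheus scrape these, expose the default prom-client registry over HTTP — a minimal sketch assuming Express and an arbitrary port:
```typescript
import express from 'express';
import { register } from 'prom-client';

const app = express();
app.get('/metrics', async (_req, res) => {
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics()); // includes the heap and GC metrics registered above
});
app.listen(9091);
```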
---
### V8 GC Tuning
**Heap Size Limits**:
```bash
# Default old-space limit: roughly 1.5GB on older 64-bit Node; Node 12+ sizes it from available memory
# Increase max heap size
node --max-old-space-size=4096 server.js # 4GB heap
# For containers (set to 75% of container memory)
# 8GB container → --max-old-space-size=6144
```
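If the container limit is not known ahead of time, it can be read from the cgroup filesystem at startup and turned into a flag. A minimal sketch, assuming a Linux container (the 75% fraction and the wrapper approach are choices, not requirements):
```typescript
// heap-size.ts — print a recommended --max-old-space-size (75% of the cgroup limit)
import { readFileSync } from 'node:fs';

function recommendedOldSpaceMiB(fraction = 0.75): number | undefined {
  const paths = [
    '/sys/fs/cgroup/memory.max',                    // cgroup v2
    '/sys/fs/cgroup/memory/memory.limit_in_bytes',  // cgroup v1
  ];
  for (const path of paths) {
    try {
      const raw = readFileSync(path, 'utf8').trim();
      if (raw !== 'max') {
        return Math.floor((Number(raw) * fraction) / (1024 * 1024));
      }
    } catch {
      // this cgroup version is not mounted here — try the next path
    }
  }
  return undefined; // no limit found; fall back to V8 defaults
}

console.log(recommendedOldSpaceMiB() ?? ''); // e.g. 6144 in an 8 GiB container
```
An entrypoint script can then run `export NODE_OPTIONS="--max-old-space-size=$(node heap-size.js)"` before starting the server.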
**GC Optimization Flags**:
```bash
# Lower memory footprint (trades CPU time for memory)
node --optimize-for-size server.js
# (--gc-interval=N forces a GC every N allocations — useful for testing, not production)
# Optimize for throughput (higher memory, less CPU)
node --max-old-space-size=8192 server.js
# Expose GC to JavaScript
node --expose-gc server.js
# Then: global.gc() to force GC
```
**When to tune**:
- ✅ Container memory limits (set heap to 75% of limit)
- ✅ Frequent Major GC causing latency spikes
- ✅ OOM errors with available memory
- ❌ Don't tune as first step (fix leaks first!)
---
## Python Garbage Collector
### GC Mechanism
**Two Systems**:
1. **Reference Counting**: Primary mechanism, immediate cleanup when refcount = 0
2. **Generational GC**: Handles circular references
**Generational Structure**:
```
┌─────────────────────────────────────────────────────────┐
│ Python GC (Generational) │
├─────────────────────────────────────────────────────────┤
│ Generation 0 (Young) - Threshold: 700 objects │
│ ├─ New objects │
│ └─ Collected most frequently │
│ │
│ Generation 1 (Middle) - Threshold: 10 collections │
│ ├─ Survived 1 Gen0 collection │
│ └─ Collected less frequently │
│ │
│ Generation 2 (Old) - Threshold: 10 collections │
│ ├─ Survived Gen1 collection │
│ └─ Collected rarely │
└─────────────────────────────────────────────────────────┘
```
---
### Monitoring Python GC
**GC Statistics**:
```python
import gc
# Get GC stats
print(gc.get_stats())
# [{'collections': 42, 'collected': 123, 'uncollectable': 0}, ...]
# Get object count by generation
print(gc.get_count())
# (45, 3, 1) = (gen0, gen1, gen2) object counts
# Get thresholds
print(gc.get_threshold())
# (700, 10, 10) = collect when gen0 has 700 objects, etc.
```
**Track GC Pauses**:
```python
import gc
import time
class GCMonitor:
    def __init__(self):
        self.start_time = None

    def gc_callback(self, phase, info):
        # gc invokes each callback twice per collection: phase "start", then "stop"
        if phase == "start":
            self.start_time = time.time()
        elif phase == "stop" and self.start_time is not None:
            duration = time.time() - self.start_time
            print(f"GC gen{info['generation']}: {duration*1000:.1f}ms, "
                  f"collected {info['collected']}")

# Install a single callback that handles both phases
gc.callbacks.append(GCMonitor().gc_callback)
```
**Prometheus Metrics**:
```python
from prometheus_client import Gauge, Histogram
import gc
gc_collections = Gauge('python_gc_collections_total', 'GC collections', ['generation'])
gc_collected = Gauge('python_gc_objects_collected_total', 'Objects collected', ['generation'])
gc_duration = Histogram('python_gc_duration_seconds', 'GC duration', ['generation'])
def record_gc_metrics():
    stats = gc.get_stats()
    for gen, stat in enumerate(stats):
        gc_collections.labels(generation=gen).set(stat['collections'])
        gc_collected.labels(generation=gen).set(stat['collected'])
```
---
### Python GC Tuning
**Disable GC (for batch jobs)**:
```python
import gc
# Disable automatic GC
gc.disable()
# Process large dataset without GC pauses
for chunk in large_dataset:
    process(chunk)
# Manual GC at end
gc.collect()
```
**Adjust Thresholds**:
```python
import gc
# Default: (700, 10, 10)
# More aggressive: collect more often, lower memory
gc.set_threshold(400, 5, 5)
# Less aggressive: collect less often, higher memory but faster
gc.set_threshold(1000, 15, 15)
```
**Debug Circular References**:
```python
import gc
# Find objects that can't be collected
gc.set_debug(gc.DEBUG_SAVEALL)
gc.collect()
print(f"Uncollectable: {len(gc.garbage)}")
for obj in gc.garbage:
    print(type(obj), obj)
```
**When to tune**:
- ✅ Batch jobs: disable GC, manual collect at end
- ✅ Real-time systems: adjust thresholds to avoid long pauses
- ✅ Debugging: use `DEBUG_SAVEALL` to find leaks
- ❌ Don't disable GC in long-running services (memory will grow!)
---
## GC-Related Memory Issues
### Issue 1: Long GC Pauses
**Symptom**: Request latency spikes every few minutes
**V8 Fix**:
```bash
# Monitor GC pauses
node --trace-gc server.js 2>&1 | grep "Mark-sweep"
# If Major GC >500ms, increase heap size
node --max-old-space-size=4096 server.js
```
**Python Fix**:
```python
# Disable GC during request handling
import gc
gc.disable()
# Periodic manual GC (in background thread)
import threading
import time

def periodic_gc():
    while True:
        time.sleep(60)
        gc.collect()

threading.Thread(target=periodic_gc, daemon=True).start()
```
---
### Issue 2: Frequent Minor GC
**Symptom**: High CPU from constant minor GC
**Cause**: Too many short-lived objects
**Fix**: Reduce allocations
```python
# ❌ BAD: Creates many temporary objects
def process_data(items):
    return [str(i) for i in items]  # New list + strings

# ✅ BETTER: Generator (no intermediate list)
def process_data(items):
    return (str(i) for i in items)
```
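The same idea applies on the Node.js side: avoid materializing intermediate arrays in hot paths. A small sketch (function names are illustrative):
```typescript
// ❌ Allocates a new array plus one string per item on every call
function formatAll(items: number[]): string[] {
  return items.map((i) => `item-${i}`);
}

// ✅ Generator: yields one string at a time, no intermediate array to scavenge
function* formatLazy(items: number[]): Generator<string> {
  for (const i of items) {
    yield `item-${i}`;
  }
}
```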
---
### Issue 3: Memory Not Released After GC
**Symptom**: Heap usage high even after GC
**V8 Cause**: Objects in old generation (major GC needed)
```bash
# Force full GC to reclaim memory
node --expose-gc server.js
# In code:
if (global.gc) global.gc();
```
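With `--expose-gc` enabled, a quick check can distinguish "GC simply hasn't run yet" from "objects are genuinely retained" — a sketch:
```typescript
import v8 from 'node:v8';

function heapUsedMiB(): number {
  return v8.getHeapStatistics().used_heap_size / (1024 * 1024);
}

const before = heapUsedMiB();
(globalThis as any).gc?.(); // only defined when started with --expose-gc
const after = heapUsedMiB();
console.log(`heap before=${before.toFixed(1)} MiB, after forced GC=${after.toFixed(1)} MiB`);
// If "after" stays high, the memory is genuinely retained — inspect retainer paths in a heap snapshot.
```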
**Python Cause**: Reference cycles
```python
# Debug reference cycles
import gc
import sys
# Find what's keeping object alive
obj = my_object
print(sys.getrefcount(obj)) # Should be low
# Get referrers
print(gc.get_referrers(obj))
```
---
## GC Alerts (Prometheus)
```yaml
# Prometheus alert rules
groups:
  - name: gc_alerts
    rules:
      # V8: Major GC taking too long (p95 over 5 minutes)
      - alert: SlowMajorGC
        expr: histogram_quantile(0.95, rate(nodejs_gc_duration_seconds_bucket{kind="major"}[5m])) > 0.5
        for: 5m
        annotations:
          summary: "Major GC p95 >500ms ({{ $value }}s)"
      # V8: High GC frequency
      - alert: FrequentGC
        expr: increase(nodejs_gc_duration_seconds_count[1m]) > 10
        for: 10m
        annotations:
          summary: "GC running >10x/min"
      # Python: High Gen2 collections
      - alert: FrequentFullGC
        expr: increase(python_gc_collections_total{generation="2"}[1h]) > 1
        for: 1h
        annotations:
          summary: "Full GC >1x/hour (potential leak)"
```
---
## Best Practices
### V8 (Node.js)
1. **Set heap size**: `--max-old-space-size` to 75% of container memory
2. **Monitor GC**: Track duration and frequency with Prometheus
3. **Alert on slow GC**: Major GC >500ms indicates heap too small or memory leak
4. **Don't force GC**: Let V8 manage (except for tests/debugging)
### Python
1. **Rely on reference counting**: Most cleanup happens immediately when an object's refcount drops to 0
2. **Avoid circular refs**: Use `weakref` for back-references
3. **Batch jobs**: Disable GC, manual `gc.collect()` at end
4. **Monitor Gen2**: Frequent Gen2 collections = potential leak
---
## Related Documentation
- **Patterns**: [memory-optimization-patterns.md](memory-optimization-patterns.md)
- **Tools**: [profiling-tools.md](profiling-tools.md)
- **Examples**: [Examples Index](../examples/INDEX.md)
---
Return to [reference index](INDEX.md)

View File

@@ -0,0 +1,371 @@
# Memory Optimization Patterns Reference
Quick reference catalog of common memory leak patterns and their fixes.
## Event Listener Leaks
### Pattern: EventEmitter Accumulation
**Symptom**: Memory grows linearly with time/requests
**Cause**: Event listeners added but never removed
**Vulnerable**:
```typescript
// ❌ LEAK: listener added every call
class DataProcessor {
  private emitter = new EventEmitter();

  async process() {
    this.emitter.on('data', handler); // Never removed
  }
}
```
**Fixed**:
```typescript
// ✅ FIX 1: Remove listener when done
this.emitter.on('data', handler);
try { /* work */ } finally {
  this.emitter.removeListener('data', handler);
}

// ✅ FIX 2: Use once()
this.emitter.once('data', handler); // Auto-removed after first event

// ✅ FIX 3: Use an AbortSignal. Note: emitter.on() takes no options object —
// { signal } works with EventTarget#addEventListener and with the
// once()/on() helpers from 'node:events'
const controller = new AbortController();
once(this.emitter, 'data', { signal: controller.signal }) // import { once } from 'node:events'
  .then(handler)
  .catch(() => { /* aborted */ });
controller.abort(); // Aborts the wait and detaches the pending listener
```
**Detection**:
```typescript
// Check listener count
console.log(emitter.listenerCount('data')); // Should stay constant over time

// Monitor in production
process.on('warning', (warning) => {
  if (warning.name === 'MaxListenersExceededWarning') {
    console.error('Listener leak detected:', warning);
  }
});
```
---
## Closure Memory Traps
### Pattern: Captured Variables in Closures
**Symptom**: Memory not released after scope exits
**Cause**: Closure captures large variables
**Vulnerable**:
```typescript
// ❌ LEAK: Closure captures entire 1GB buffer
function createHandler(largeBuffer: Buffer) {
  return function handler() {
    // Only uses buffer.length, but keeps the whole Buffer reachable
    console.log(largeBuffer.length);
  };
}
```
**Fixed**:
```typescript
// ✅ FIX: Extract only what's needed
function createHandler(largeBuffer: Buffer) {
  const length = largeBuffer.length; // Extract value
  return function handler() {
    console.log(length); // Only captures a number, not the Buffer
  };
}
```
---
## Connection Pool Leaks
### Pattern: Unclosed Database Connections
**Symptom**: Pool exhaustion, connection timeouts
**Cause**: Connections acquired but not released
**Vulnerable**:
```python
# ❌ LEAK: Connection never released (including on exception)
def get_orders():
    conn = pool.acquire()
    orders = conn.execute("SELECT * FROM orders")
    return orders  # conn never released
```
**Fixed**:
```python
# ✅ FIX: Context manager guarantees cleanup
def get_orders():
    with pool.acquire() as conn:
        orders = conn.execute("SELECT * FROM orders")
        return orders  # conn auto-released
```
---
## Large Dataset Patterns
### Pattern 1: Loading Entire File into Memory
**Vulnerable**:
```python
# ❌ BAD: a 10GB CSV loaded eagerly can need ~20GB of RAM
df = pd.read_csv("large.csv")
```
**Fixed**:
```python
# ✅ FIX: Chunking
for chunk in pd.read_csv("large.csv", chunksize=10000):
    process(chunk)  # Constant memory

# ✅ BETTER: Polars streaming
df = pl.scan_csv("large.csv").collect(streaming=True)
```
### Pattern 2: List Comprehension vs Generator
**Vulnerable**:
```python
# ❌ BAD: Materializes the entire result list in memory
result = [process(item) for item in huge_list]
```
**Fixed**:
```python
# ✅ FIX: Generator (lazy evaluation)
result = (process(item) for item in huge_list)
for item in result:
    use(item)  # Processes one at a time
```
---
## Cache Management
### Pattern: Unbounded Cache Growth
**Vulnerable**:
```typescript
// ❌ LEAK: Cache grows forever
const cache = new Map<string, Data>();

function getData(key: string) {
  if (!cache.has(key)) {
    cache.set(key, fetchData(key)); // Never evicted
  }
  return cache.get(key);
}
}
```
**Fixed**:
```typescript
// ✅ FIX 1: LRU cache with max size
import { LRUCache } from 'lru-cache';

const cache = new LRUCache<string, Data>({
  max: 1000,            // Max 1000 entries
  ttl: 1000 * 60 * 5    // 5 minute TTL
});

// ✅ FIX 2: WeakMap — entry dropped automatically once the key object is GC'd
// (WeakMap keys must be objects, so this does not work for string keys)
const weakCache = new WeakMap<object, Data>();
weakCache.set(key, data);
```
---
## Timer and Interval Leaks
### Pattern: Forgotten Timers
**Vulnerable**:
```typescript
// ❌ LEAK: Timer never cleared
class Component {
  startPolling() {
    setInterval(() => {
      this.fetchData(); // Keeps Component alive forever
    }, 1000);
  }
}
}
```
**Fixed**:
```typescript
// ✅ FIX: Clear timer on cleanup
class Component {
  private intervalId?: NodeJS.Timeout;

  startPolling() {
    this.intervalId = setInterval(() => {
      this.fetchData();
    }, 1000);
  }

  cleanup() {
    if (this.intervalId) {
      clearInterval(this.intervalId);
    }
  }
}
```
---
## Global Variable Accumulation
### Pattern: Growing Global Arrays
**Vulnerable**:
```typescript
// ❌ LEAK: Array grows forever
const logs: string[] = [];

function log(message: string) {
  logs.push(message); // Never cleared
}
```
**Fixed**:
```typescript
// ✅ FIX 1: Bounded array
const MAX_LOGS = 1000;
const logs: string[] = [];

function log(message: string) {
  logs.push(message);
  if (logs.length > MAX_LOGS) {
    logs.shift(); // Remove oldest
  }
}

// ✅ FIX 2: Fixed-capacity circular buffer (illustrative; exact API depends on the package)
import { CircularBuffer } from 'circular-buffer';
const logBuffer = new CircularBuffer<string>(1000);
```
---
## String Concatenation
### Pattern: Repeated String Concatenation
**Vulnerable**:
```python
# ❌ BAD: Creates a new string each iteration (O(n²) copying)
result = ""
for item in items:
    result += str(item)  # New string allocation every time
```
**Fixed**:
```python
# ✅ FIX 1: Join
result = "".join(str(item) for item in items)

# ✅ FIX 2: StringIO
from io import StringIO

buffer = StringIO()
for item in items:
    buffer.write(str(item))
result = buffer.getvalue()
```
---
## React Component Leaks
### Pattern: setState After Unmount
**Vulnerable**:
```typescript
// ❌ LEAK: setState called after unmount
function Component() {
  const [data, setData] = useState(null);

  useEffect(() => {
    fetchData().then(setData); // If unmounted first, the pending promise still references the component
  }, []);
}
```
**Fixed**:
```typescript
// ✅ FIX: Cleanup with AbortController
function Component() {
  const [data, setData] = useState(null);

  useEffect(() => {
    const controller = new AbortController();
    fetchData(controller.signal).then(setData);
    return () => controller.abort(); // Cleanup on unmount
  }, []);
}
```
---
## Detection Patterns
### Memory Leak Indicators
1. **Linear growth**: Memory usage increases linearly with time/requests
2. **Pool exhaustion**: Connection pool hits max size
3. **EventEmitter warnings**: "MaxListenersExceededWarning"
4. **GC pressure**: Frequent/long GC pauses
5. **OOM errors**: Process crashes with "JavaScript heap out of memory"
### Monitoring Metrics
```typescript
// Prometheus metrics for leak detection
import { Gauge } from 'prom-client';

const heap_used = new Gauge({
  name: 'nodejs_heap_used_bytes',
  help: 'V8 heap used bytes'
});

const event_listeners = new Gauge({
  name: 'event_listeners_total',
  help: 'Total event listeners',
  labelNames: ['event']
});

// Alert if heap grows >10% per hour
// Alert if listener count >100 for a single event
```
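The `event_listeners` gauge above still needs to be fed; one approach is to sample listener counts for the emitters you care about — a sketch where `dataBus` and the `'data'` event are hypothetical application objects:
```typescript
setInterval(() => {
  event_listeners.labels('data').set(dataBus.listenerCount('data'));
}, 15_000); // a steadily climbing count is a strong leak signal
```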
---
## Quick Fixes Checklist
- [ ] **Event listeners**: Use `once()` or `removeListener()`
- [ ] **Database connections**: Use context managers or `try/finally`
- [ ] **Large datasets**: Use chunking or streaming
- [ ] **Caches**: Implement LRU or WeakMap
- [ ] **Timers**: Clear with `clearInterval()` or `clearTimeout()`
- [ ] **Closures**: Extract values, avoid capturing large objects
- [ ] **React**: Cleanup in `useEffect()` return
- [ ] **Strings**: Use `join()` or `StringIO`, not `+=`
---
## Related Documentation
- **Examples**: [Examples Index](../examples/INDEX.md)
- **Tools**: [profiling-tools.md](profiling-tools.md)
- **GC**: [garbage-collection-guide.md](garbage-collection-guide.md)
---
Return to [reference index](INDEX.md)

View File

@@ -0,0 +1,407 @@
# Memory Profiling Tools Comparison
Quick reference for choosing and using memory profiling tools across Node.js, Python, and production monitoring.
## Node.js Tools
### Chrome DevTools (Built-in)
**Best for**: Interactive heap snapshot analysis, timeline profiling
**Cost**: Free (built into Node.js)
**Usage**:
```bash
# Start Node.js with inspector
node --inspect server.js
# Open chrome://inspect
# Click "Open dedicated DevTools for Node"
```
**Features**:
- Heap snapshots (memory state at point in time)
- Timeline recording (allocations over time)
- Comparison view (find leaks by comparing snapshots)
- Retainer paths (why object not GC'd)
**When to use**:
- Development/staging environments
- Interactive debugging sessions
- Visual leak analysis
---
### heapdump (npm package)
**Best for**: Production heap snapshots without restarts
**Cost**: Free (npm package)
**Usage**:
```typescript
import heapdump from 'heapdump';

// Trigger snapshot on signal
process.on('SIGUSR2', () => {
  heapdump.writeSnapshot((err, filename) => {
    console.log('Heap dump written to', filename);
  });
});

// Or write a named snapshot on demand (e.g., from a scheduled job or health check)
heapdump.writeSnapshot('./dump-' + Date.now() + '.heapsnapshot');
```
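If adding a dependency is not an option, recent Node.js versions (≥ 11.13) ship a built-in equivalent, and `--heapsnapshot-signal` can bind it to a signal with no code at all — a minimal sketch:
```typescript
import v8 from 'node:v8';

// Programmatic snapshot: writes a .heapsnapshot file and returns its name
process.on('SIGUSR2', () => {
  const file = v8.writeHeapSnapshot();
  console.log('Heap snapshot written to', file);
});

// Zero-code alternative: node --heapsnapshot-signal=SIGUSR2 server.js
```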
**When to use**:
- Production memory leak diagnosis
- Scheduled snapshots (daily/weekly)
- OOM analysis (capture before crash)
---
### clinic.js (Comprehensive Suite)
**Best for**: All-in-one performance profiling
**Cost**: Free (open source)
**Usage**:
```bash
# Install
npm install -g clinic
# Memory profiling
clinic heapprofiler -- node server.js
# Generates interactive HTML report
```
**Features**:
- Heap profiler (memory allocations)
- Flame graphs (CPU + memory)
- Timeline visualization
- Automatic leak detection
**When to use**:
- Initial performance investigation
- Comprehensive profiling (CPU + memory)
- Team-friendly reports (HTML)
---
### memwatch-next
**Best for**: Real-time leak detection in production (the original `memwatch-next` package is unmaintained; the example below uses the `@airbnb/node-memwatch` fork)
**Cost**: Free (npm package)
**Usage**:
```typescript
import memwatch from '@airbnb/node-memwatch';
memwatch.on('leak', (info) => {
  console.error('Memory leak detected:', info);
  // Alert, log, snapshot, etc.
});

memwatch.on('stats', (stats) => {
  console.log('GC stats:', stats);
});
```
**When to use**:
- Production leak monitoring
- Automatic alerting
- GC pressure tracking
---
## Python Tools
### Scalene (Line-by-Line Profiler)
**Best for**: Fastest, most detailed Python profiler
**Cost**: Free (pip package)
**Usage**:
```bash
# Install
pip install scalene
# Profile script
scalene script.py
# Profile with pytest
scalene --cli --memory -m pytest tests/
# HTML report
scalene --html --outfile profile.html script.py
```
**Features**:
- Line-by-line memory allocation
- CPU profiling
- GPU profiling
- Native code vs Python time
- Memory timeline
**When to use**:
- Python memory optimization
- Line-level bottleneck identification
- pytest integration
---
### memory_profiler
**Best for**: Simple decorator-based profiling
**Cost**: Free (pip package)
**Usage**:
```python
from memory_profiler import profile

@profile
def my_function():
    a = [1] * (10 ** 6)
    b = [2] * (2 * 10 ** 7)
    return a + b

# Run with: python -m memory_profiler script.py
```
**When to use**:
- Quick function-level profiling
- Simple memory debugging
- Educational/learning
---
### tracemalloc (Built-in)
**Best for**: Production memory tracking without dependencies
**Cost**: Free (Python standard library)
**Usage**:
```python
import tracemalloc
tracemalloc.start()
# Your code here
current, peak = tracemalloc.get_traced_memory()
print(f"Current: {current / 1024 / 1024:.1f} MB")
print(f"Peak: {peak / 1024 / 1024:.1f} MB")
# Top allocations
snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('lineno')
for stat in top_stats[:10]:
    print(stat)
tracemalloc.stop()
```
**When to use**:
- Production environments (no external dependencies)
- Allocation tracking
- Top allocators identification
---
### py-spy (Sampling Profiler)
**Best for**: Near-zero-overhead production profiling (samples CPU call stacks, not the heap — pair with tracemalloc for allocation detail)
**Cost**: Free (cargo/pip package)
**Usage**:
```bash
# Install
pip install py-spy
# Attach to running process (no code changes!)
py-spy top --pid 12345
# Flame graph
py-spy record --pid 12345 --output profile.svg
```
**When to use**:
- Production profiling (minimal overhead)
- No code modification required
- Running process analysis
---
## Monitoring Tools
### Prometheus + Grafana
**Best for**: Production metrics and alerting
**Cost**: Free (open source)
**Metrics to track**:
```typescript
import { Gauge } from 'prom-client';

// Heap usage
const heap_used = new Gauge({
  name: 'nodejs_heap_used_bytes',
  help: 'V8 heap used bytes'
});

// Memory allocation rate
const allocation_rate = new Gauge({
  name: 'memory_allocation_bytes_per_second',
  help: 'Memory allocation rate'
});

// Connection pool
const pool_active = new Gauge({
  name: 'db_pool_connections_active',
  help: 'Active database connections'
});
```
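prom-client can also register a standard set of Node.js process metrics (heap size, GC duration, event loop lag) with one call; custom gauges like the pool metric above still need to be registered manually:
```typescript
import { collectDefaultMetrics } from 'prom-client';

// Registers heap, GC and process metrics with names like
// nodejs_heap_size_used_bytes and process_resident_memory_bytes
collectDefaultMetrics();
```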
**Alerts**:
```yaml
# Prometheus alert rules
groups:
  - name: memory_alerts
    rules:
      - alert: MemoryLeak
        expr: delta(nodejs_heap_used_bytes[1h]) > 100000000  # +100MB over the last hour
        for: 6h
        annotations:
          summary: "Potential memory leak ({{ $value | humanize }} growth in 1h)"
      - alert: HeapNearLimit
        expr: nodejs_heap_used_bytes / nodejs_heap_size_total_bytes > 0.9
        for: 5m
        annotations:
          summary: "Heap usage >90%"
```
**When to use**:
- Production monitoring (all environments)
- Long-term trend analysis
- Automatic alerting
---
### DataDog APM
**Best for**: Comprehensive observability platform
**Cost**: Paid (starts $15/host/month)
**Features**:
- Automatic heap tracking
- Memory leak detection
- Distributed tracing
- Alert management
- Dashboards
**When to use**:
- Enterprise environments
- Multi-service tracing
- Managed solution preferred
---
## Tool Selection Matrix
| Scenario | Node.js Tool | Python Tool | Monitoring |
|----------|-------------|-------------|------------|
| **Development debugging** | Chrome DevTools | Scalene | - |
| **Production leak** | heapdump | py-spy | Prometheus |
| **Line-level analysis** | clinic.js | Scalene | - |
| **Real-time monitoring** | memwatch-next | tracemalloc | Grafana |
| **Zero overhead** | - | py-spy | DataDog |
| **No dependencies** | Chrome DevTools | tracemalloc | - |
| **Team reports** | clinic.js | Scalene HTML | Grafana |
---
## Quick Start Commands
### Node.js
```bash
# Development: Chrome DevTools
node --inspect server.js
# Production: Heap snapshot
kill -USR2 <pid> # If heapdump configured
# Comprehensive: clinic.js
clinic heapprofiler -- node server.js
```
### Python
```bash
# Line-by-line: Scalene
scalene --cli --memory script.py
# Quick profile: memory_profiler
python -m memory_profiler script.py
# Production: py-spy
py-spy top --pid <pid>
```
### Monitoring
```bash
# Prometheus metrics
curl http://localhost:9090/metrics | grep memory
# Grafana dashboard
# Import a community dashboard, e.g. ID 11159 (NodeJS Application Dashboard)
# For Python, search grafana.com/dashboards for a process/runtime memory dashboard
```
---
## Tool Comparison Table
| Tool | Language | Type | Overhead | Production-Safe | Interactive |
|------|----------|------|----------|----------------|-------------|
| **Chrome DevTools** | Node.js | Heap snapshot | Low | No | Yes |
| **heapdump** | Node.js | Heap snapshot | Low | Yes | No |
| **clinic.js** | Node.js | Profiler | Medium | No | Yes |
| **memwatch-next** | Node.js | Real-time | Low | Yes | No |
| **Scalene** | Python | Profiler | Low | Staging | Yes |
| **memory_profiler** | Python | Decorator | Medium | No | No |
| **tracemalloc** | Python | Built-in | Low | Yes | No |
| **py-spy** | Python | Sampling | Very Low | Yes | No |
| **Prometheus** | Both | Metrics | Very Low | Yes | Yes (Grafana) |
| **DataDog** | Both | APM | Very Low | Yes | Yes |
---
## Best Practices
### Development Workflow
1. **Initial investigation**: Chrome DevTools (Node.js) or Scalene (Python)
2. **Line-level analysis**: clinic.js or Scalene with `--html`
3. **Root cause**: Heap snapshot comparison (DevTools)
4. **Validation**: Load testing with monitoring
### Production Workflow
1. **Detection**: Prometheus alerts (heap growth, pool exhaustion)
2. **Diagnosis**: heapdump snapshot or py-spy sampling
3. **Analysis**: Chrome DevTools (load snapshot) or Scalene (if reproducible in staging)
4. **Monitoring**: Grafana dashboards for trends
---
## Related Documentation
- **Patterns**: [memory-optimization-patterns.md](memory-optimization-patterns.md)
- **GC**: [garbage-collection-guide.md](garbage-collection-guide.md)
- **Examples**: [Examples Index](../examples/INDEX.md)
---
Return to [reference index](INDEX.md)

View File

@@ -0,0 +1,60 @@
# Memory Profiler Templates
Ready-to-use templates for memory profiling reports and heap snapshot analysis.
## Templates Overview
### Memory Investigation Report
**File**: [memory-report-template.md](memory-report-template.md)
Template for documenting memory leak investigations:
- **Incident Summary**: Timeline, symptoms, impact
- **Investigation Steps**: Tools used, findings
- **Root Cause**: Code analysis, leak pattern identified
- **Fix Implementation**: Code changes, validation
- **Results**: Before/after metrics
**Use when**: Documenting memory leak investigations for team/postmortems
---
### Heap Snapshot Analysis Checklist
**File**: [heap-snapshot-analysis.md](heap-snapshot-analysis.md)
Step-by-step checklist for analyzing V8 heap snapshots:
- **Snapshot Collection**: When/how to capture snapshots
- **Comparison Analysis**: Finding leaks by comparing snapshots
- **Retainer Analysis**: Understanding why objects not GC'd
- **Common Patterns**: EventEmitter, closures, timers
**Use when**: Analyzing heap snapshots in Chrome DevTools
---
## Quick Usage
### Memory Report
1. Copy template: `cp templates/memory-report-template.md docs/investigations/memory-leak-YYYY-MM-DD.md`
2. Fill in sections as you investigate
3. Share with team for review
### Heap Analysis
1. Open template: `templates/heap-snapshot-analysis.md`
2. Follow checklist step-by-step
3. Document findings in memory report
---
## Related Documentation
- **Examples**: [Examples Index](../examples/INDEX.md) - Full investigation examples
- **Reference**: [Reference Index](../reference/INDEX.md) - Pattern catalog
- **Main Agent**: [memory-profiler.md](../memory-profiler.md) - Memory profiler agent
---
Return to [main agent](../memory-profiler.md)

View File

@@ -0,0 +1,322 @@
# Memory Leak Investigation Report
**Service**: [Service Name]
**Date**: [YYYY-MM-DD]
**Investigator**: [Your Name]
**Severity**: [Critical/High/Medium/Low]
---
## Executive Summary
**TL;DR**: [One sentence summary of the leak, cause, and fix]
**Impact**:
- Memory growth: [X MB/hour or X% increase]
- OOM incidents: [Number of crashes]
- Affected users: [Number or percentage]
- Duration: [How long the leak existed]
**Resolution**:
- Root cause: [Leak pattern - e.g., "EventEmitter listeners not removed"]
- Fix deployed: [Date/time]
- Status: [Resolved/Monitoring/In Progress]
---
## Incident Timeline
| Time | Event | Details |
|------|-------|---------|
| [HH:MM] | Detection | [How was leak detected? Alert, manual observation, etc.] |
| [HH:MM] | Investigation started | [Initial actions taken] |
| [HH:MM] | Root cause identified | [What was found] |
| [HH:MM] | Fix implemented | [Code changes made] |
| [HH:MM] | Fix deployed | [Deployment details] |
| [HH:MM] | Validation complete | [Confirmation that leak is fixed] |
---
## Symptoms and Detection
### Initial Symptoms
- [ ] Linear memory growth (X MB/hour)
- [ ] OOM crashes (frequency: ___)
- [ ] GC pressure (frequent/long pauses)
- [ ] Connection pool exhaustion
- [ ] Service degradation (slow responses)
- [ ] Other: ___
### Detection Method
**How Discovered**: [Alert, monitoring dashboard, user report, etc.]
**Monitoring Data**:
```
Prometheus query: [Query used to detect the leak]
Alert rule: [Alert name/threshold]
Dashboard: [Link to Grafana dashboard]
```
**Example Metrics**:
```
Before:
- Heap usage baseline: X MB
- After 6 hours: Y MB
- Growth rate: Z MB/hour
Current:
- Heap usage: [Current value]
- Active connections: [Number]
- GC pause duration: [p95 value]
```
---
## Investigation Steps
### 1. Initial Data Collection
**Tools Used**:
- [ ] Chrome DevTools heap snapshots
- [ ] Node.js `--trace-gc` logs
- [ ] Python Scalene profiling
- [ ] Prometheus metrics
- [ ] Application logs
- [ ] Other: ___
**Heap Snapshots Collected**:
```
Snapshot 1: [timestamp] - [size] MB - [location/filename]
Snapshot 2: [timestamp] - [size] MB - [location/filename]
Snapshot 3: [timestamp] - [size] MB - [location/filename]
```
### 2. Snapshot Comparison Analysis
**Method**: [Comparison view in Chrome DevTools, diff analysis, etc.]
**Findings**:
```
Objects growing between snapshots:
- [Object type 1]: +X instances (+Y MB)
- [Object type 2]: +X instances (+Y MB)
- [Object type 3]: +X instances (+Y MB)
Top 3 memory consumers:
1. [Object type] - X MB - [Retainer path]
2. [Object type] - X MB - [Retainer path]
3. [Object type] - X MB - [Retainer path]
```
### 3. Retainer Path Analysis
**Leaked Object**: [Type of object that's leaking]
**Retainer Path**:
```
Window / Global
→ [Variable name]
→ [Object/function]
→ [Property]
→ [Leaked object]
```
**Why Not GC'd**: [Explanation of what's keeping object alive]
---
## Root Cause Analysis
### Leak Pattern Identified
**Pattern**: [e.g., EventEmitter leak, closure trap, unclosed connection, etc.]
**Vulnerable Code** (before fix):
```typescript
// File: [filepath]:[line]
// [Brief explanation of why this leaks]
[Paste vulnerable code here]
```
**Why This Leaks**:
1. [Step 1 of how the leak occurs]
2. [Step 2]
3. [Result: memory accumulates]
### Reproduction Steps
1. [Step to reproduce leak in dev/staging]
2. [Step 2]
3. [Observed result: memory growth]
**Reproduction Time**: [How long to observe leak? Minutes/hours]
---
## Fix Implementation
### Code Changes
**Pull Request**: [Link to PR]
**Files Modified**:
- [file1.ts] - [Brief description of change]
- [file2.ts] - [Brief description of change]
**Fixed Code**:
```typescript
// File: [filepath]:[line]
// [Brief explanation of fix]
[Paste fixed code here]
```
**Fix Strategy**:
- [ ] Remove event listeners (use `removeListener()` or `once()`)
- [ ] Close connections (use context managers or `try/finally`)
- [ ] Clear timers (use `clearInterval()`/`clearTimeout()`)
- [ ] Use WeakMap/WeakSet (for cache)
- [ ] Implement generator/streaming (for large datasets)
- [ ] Other: ___
### Testing and Validation
**Tests Added**:
```typescript
// Test that verifies no leak
describe('Memory leak fix', () => {
  it('should not leak listeners', () => {
    const before = emitter.listenerCount('event');
    // ... execute code under test
    const after = emitter.listenerCount('event');
    expect(after).toBe(before); // No leak
  });
});
```
**Load Test Results**:
```
Before fix:
- Memory after 1000 requests: X MB
- Memory after 10000 requests: Y MB (growth)
After fix:
- Memory after 1000 requests: X MB
- Memory after 10000 requests: X MB (stable)
```
---
## Deployment and Results
### Deployment Details
**Environment**: [staging/production]
**Deployment Time**: [YYYY-MM-DD HH:MM UTC]
**Rollout Strategy**: [Canary, blue-green, rolling, etc.]
### Post-Deployment Metrics
**Before Fix**:
```
Memory baseline: X MB
Memory after 6h: Y MB
Growth rate: Z MB/hour
OOM incidents: N/week
```
**After Fix**:
```
Memory baseline: X MB
Memory after 6h: X MB (stable!)
Growth rate: 0 MB/hour
OOM incidents: 0/month
```
**Improvement**:
- Memory reduction: [X% or Y MB]
- OOM elimination: [100%]
- GC pressure: [Reduced by X%]
### Grafana Dashboard
**Link**: [Dashboard URL]
**Key Panels**:
- Heap usage trend: [Shows memory stable after fix]
- GC pause duration: [Shows improved GC behavior]
- Error rate: [Shows OOM errors eliminated]
---
## Lessons Learned
### What Went Well
- [Positive aspect 1]
- [Positive aspect 2]
### What Could Be Improved
- [Improvement area 1]
- [Improvement area 2]
### Preventive Measures
**Monitoring Added**:
- [ ] Alert: Memory growth >X MB/hour for >Y hours
- [ ] Alert: Heap usage >Z% of limit
- [ ] Dashboard: Memory trend visualization
- [ ] Alert: Connection pool saturation >X%
**Code Review Checklist Updated**:
- [ ] Event listeners properly cleaned up
- [ ] Database connections closed
- [ ] Timers/intervals cleared
- [ ] Large datasets processed with streaming/chunking
**Testing Standards**:
- [ ] Memory leak tests for event listeners
- [ ] Load tests with memory monitoring
- [ ] CI/CD checks for connection cleanup
---
## Related Documentation
- **Pattern Catalog**: [Link to memory-optimization-patterns.md]
- **Similar Incidents**: [Links to previous memory leak reports]
- **Runbook**: [Link to memory leak runbook]
---
## Appendix
### Heap Snapshot Files
- [snapshot1.heapsnapshot] - [Location/S3 URL]
- [snapshot2.heapsnapshot] - [Location/S3 URL]
### GC Logs
```
[Relevant GC log excerpts showing the leak]
```
### Prometheus Queries
```promql
# Heap growth over the last hour (bytes)
delta(nodejs_heap_used_bytes[1h])
# GC pause duration
histogram_quantile(0.95, rate(nodejs_gc_duration_seconds_bucket[5m]))
```
---
**Report Completed**: [YYYY-MM-DD]
**Next Review**: [Date for follow-up validation]