Initial commit

skills/memory-profiling/SKILL.md (new file, 85 lines)
@@ -0,0 +1,85 @@
---
name: grey-haven-memory-profiling
description: "Identify memory leaks, inefficient allocations, and optimization opportunities in JavaScript/TypeScript and Python applications. Analyze heap snapshots, allocation patterns, garbage collection, and memory retention. Use when memory grows over time, high memory consumption detected, performance degradation, or when user mentions 'memory leak', 'memory usage', 'heap analysis', 'garbage collection', 'memory profiling', or 'out of memory'."
---

# Memory Profiling Skill

Identify memory leaks, inefficiencies, and optimization opportunities in running applications through systematic heap analysis and allocation profiling.

## Description

Specialized memory profiling skill for analyzing allocation patterns, heap usage, garbage collection behavior, and memory retention in JavaScript/TypeScript (Node.js, Bun, browsers) and Python applications. Detect memory leaks, optimize memory usage, and prevent out-of-memory errors.

## What's Included

### Examples (`examples/`)
- **Memory leak detection** - Finding and fixing common leak patterns
- **Heap snapshot analysis** - Interpreting Chrome DevTools heap snapshots
- **Allocation profiling** - Tracking memory allocation over time
- **Real-world scenarios** - E-commerce app leak, API server memory growth

### Reference Guides (`reference/`)
- **Profiling tools** - Chrome DevTools, Node.js inspector, Python memory_profiler
- **Memory concepts** - Heap, stack, GC algorithms, retention paths
- **Optimization techniques** - Object pooling, weak references, lazy loading
- **Common leak patterns** - Event listeners, closures, caching, timers

### Templates (`templates/`)
- **Profiling report template** - Standardized memory analysis reports
- **Heap snapshot comparison template** - Before/after analysis
- **Memory budget template** - Setting and tracking memory limits

### Checklists (`checklists/`)
- **Memory leak checklist** - Systematic leak detection process
- **Optimization checklist** - Memory optimization verification

## Use This Skill When

- ✅ Memory usage growing continuously over time
- ✅ High memory consumption detected (> 500MB for Node, > 1GB for Python)
- ✅ Performance degradation with prolonged runtime
- ✅ Out-of-memory errors in production
- ✅ Garbage collection causing performance issues
- ✅ Need to optimize memory footprint
- ✅ User mentions: "memory leak", "memory usage", "heap", "garbage collection", "OOM"

## Related Agents

- `memory-profiler` - Automated memory analysis and leak detection
- `performance-optimizer` - Broader performance optimization including memory

## Quick Start

```bash
# View leak detection examples
cat examples/memory-leak-detection.md

# Check profiling tools reference
cat reference/profiling-tools.md

# Use memory leak checklist
cat checklists/memory-leak-checklist.md
```

## Common Memory Issues

1. **Event Listener Leaks** - Unremoved listeners holding references
2. **Closure Leaks** - Variables captured in closures never released
3. **Cache Leaks** - Unbounded caches growing indefinitely
4. **Timer Leaks** - setInterval/setTimeout not cleared
5. **DOM Leaks** - Detached DOM nodes retained in memory
6. **Circular References** - Cycles that keep large object graphs reachable longer than intended

## Typical Workflow

1. **Detect**: Run profiler, take heap snapshots (see the capture sketch below)
2. **Analyze**: Compare snapshots, identify growing objects
3. **Locate**: Find retention paths, trace to source
4. **Fix**: Remove references, clean up resources
5. **Verify**: Re-profile to confirm the fix
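
A minimal sketch of the snapshot capture used in steps 1 and 5, via Node's built-in `v8` module (the signal choice is an assumption; `SIGUSR1` is reserved by the Node debugger):

```javascript
// snapshot-on-signal.js - drop into the target app, then `kill -USR2 <pid>`
const v8 = require('v8');

process.on('SIGUSR2', () => {
  // Writes Heap.<timestamp>.heapsnapshot into the working directory
  const file = v8.writeHeapSnapshot();
  console.log(`heap snapshot written: ${file}`);
});
```

Load two such snapshots into Chrome DevTools and compare them to see which objects grew between captures.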

---

**Skill Version**: 1.0
**Last Updated**: 2025-11-09

skills/memory-profiling/examples/INDEX.md (new file, 86 lines)
@@ -0,0 +1,86 @@
# Memory Profiling Examples

Production memory profiling implementations for Node.js and Python with leak detection, heap analysis, and optimization strategies.

## Examples Overview

### Node.js Memory Leak Detection

**File**: [nodejs-memory-leak.md](nodejs-memory-leak.md)

Identifying and fixing memory leaks in Node.js applications:
- **Memory leak detection**: Chrome DevTools, heapdump analysis
- **Common leak patterns**: Event listeners, closures, global variables
- **Heap snapshots**: Before/after comparison, retained object analysis
- **Real leak**: EventEmitter leak causing 2GB memory growth
- **Fix**: Proper cleanup with `removeListener()`, WeakMap for caching
- **Result**: Memory stabilized at 150MB (93% reduction)

**Use when**: Node.js memory growing over time, debugging production memory issues

---

### Python Memory Profiling with Scalene

**File**: [python-scalene-profiling.md](python-scalene-profiling.md)

Line-by-line memory profiling for Python applications:
- **Scalene setup**: Installation, pytest integration, CLI usage
- **Memory hotspots**: Line-by-line allocation tracking
- **CPU + Memory**: Combined profiling for performance bottlenecks
- **Real scenario**: 500MB dataset causing OOM, fixed with generators
- **Optimization**: List comprehension → generator (500MB → 5MB)
- **Result**: 99% memory reduction, no OOM errors

**Use when**: Python memory spikes, profiling pytest tests, finding allocation hotspots

---

### Database Connection Pool Leak

**File**: [database-connection-leak.md](database-connection-leak.md)

PostgreSQL connection pool exhaustion and memory leaks:
- **Symptom**: Connection pool maxed out, memory growing linearly
- **Root cause**: Unclosed connections in error paths, missing `finally` blocks
- **Detection**: Connection pool metrics, memory profiling
- **Fix**: Context managers (`with` statement), proper cleanup
- **Result**: Zero connection leaks, memory stable at 80MB

**Use when**: Database connection errors, "too many clients" errors, connection pool issues

---

### Large Dataset Memory Optimization

**File**: [large-dataset-optimization.md](large-dataset-optimization.md)

Memory-efficient data processing for large datasets:
- **Problem**: Loading 10GB CSV into memory (OOM killer)
- **Solutions**: Streaming with `pandas.read_csv(chunksize)`, generators, memory mapping
- **Techniques**: Lazy evaluation, columnar processing, batch processing
- **Before/After**: 10GB memory → 500MB (95% reduction)
- **Tools**: Pandas chunking, Dask for parallel processing

**Use when**: Processing large files, OOM errors, batch data processing

---

## Quick Navigation

| Topic | File | Lines | Focus |
|-------|------|-------|-------|
| **Node.js Leaks** | [nodejs-memory-leak.md](nodejs-memory-leak.md) | ~450 | EventEmitter, heap snapshots |
| **Python Scalene** | [python-scalene-profiling.md](python-scalene-profiling.md) | ~420 | Line-by-line profiling |
| **DB Connection Leaks** | [database-connection-leak.md](database-connection-leak.md) | ~380 | Connection pool management |
| **Large Datasets** | [large-dataset-optimization.md](large-dataset-optimization.md) | ~400 | Streaming, chunking |

## Related Documentation

- **Reference**: [Reference Index](../reference/INDEX.md) - Memory patterns, profiling tools
- **Templates**: [Templates Index](../templates/INDEX.md) - Profiling report template
- **Main Agent**: [memory-profiler.md](../memory-profiler.md) - Memory profiler agent

---

Return to [main agent](../memory-profiler.md)

skills/memory-profiling/examples/database-connection-leak.md (new file, 490 lines)
@@ -0,0 +1,490 @@
# Database Connection Pool Memory Leaks

Detecting and fixing PostgreSQL connection pool leaks in FastAPI applications using connection monitoring and proper cleanup patterns.

## Overview

**Before Optimization**:
- Active connections: 95/100 (pool exhausted)
- Connection timeouts: 15-20/min during peak
- Memory growth: 100MB/hour (unclosed connections)
- Service restarts: 3-4x/day

**After Optimization**:
- Active connections: 8-12/100 (healthy pool)
- Connection timeouts: 0/day
- Memory growth: 0MB/hour (stable)
- Service restarts: 0/month

**Tools**: asyncpg, SQLModel, psycopg3, pg_stat_activity, Prometheus

## 1. Connection Pool Architecture

### Grey Haven Stack: PostgreSQL + SQLModel

**Connection Pool Configuration**:
```python
# database.py
from sqlmodel import create_engine
from sqlalchemy.pool import QueuePool

# ❌ LEAK-PRONE: no max_overflow, no timeout, no recycling
engine = create_engine(
    "postgresql://user:pass@localhost/db",
    poolclass=QueuePool,
    pool_size=20,
    echo=True
)

# ✅ ROBUST: proper pool configuration
engine = create_engine(
    "postgresql://user:pass@localhost/db",
    poolclass=QueuePool,
    pool_size=20,        # Core connections
    max_overflow=10,     # Max additional connections
    pool_timeout=30,     # Wait timeout (seconds)
    pool_recycle=3600,   # Recycle after 1 hour
    pool_pre_ping=True,  # Verify connection before use
    echo=False
)
```

**Pool Health Monitoring**:
```python
# monitoring.py
from prometheus_client import Gauge

# Prometheus metrics
db_pool_size = Gauge('db_pool_connections_total', 'Total pool size')
db_pool_active = Gauge('db_pool_connections_active', 'Active connections')
db_pool_idle = Gauge('db_pool_connections_idle', 'Idle connections')
db_pool_overflow = Gauge('db_pool_connections_overflow', 'Overflow connections')

def record_pool_metrics(engine):
    pool = engine.pool
    db_pool_size.set(pool.size())
    db_pool_active.set(pool.checkedout())
    db_pool_idle.set(pool.size() - pool.checkedout())
    db_pool_overflow.set(pool.overflow())
```

## 2. Common Leak Pattern: Unclosed Connections

### Vulnerable Code (Connection Leak)

```python
# api/orders.py (BEFORE)
from fastapi import APIRouter, Depends
from sqlmodel import Session, select
from database import engine
from models import Order

router = APIRouter()

@router.get("/orders")
async def get_orders():
    # ❌ LEAK: connection never closed
    session = Session(engine)

    # If an exception occurs here, the session is never closed
    orders = session.exec(select(Order)).all()

    # If we return here, the session is never closed
    return orders

    # session.close() is unreachable after the return above
    session.close()
```

**What Happens**:
1. Every request acquires a connection from the pool
2. An exception or early return prevents `session.close()`
3. The connection remains in the "active" state
4. The pool exhausts once every connection has leaked (production pool_size=100)
5. New requests time out waiting for a connection

**Memory Impact**:
```
Initial pool: 20 connections (40MB)
After 1 hour: 95 leaked connections (190MB)
After 6 hours: Pool exhausted + 100MB leaked memory
```

### Fixed Code (Context Manager)

```python
# api/orders.py (AFTER)
from fastapi import APIRouter, Depends
from sqlmodel import Session, select
from database import engine
from models import Order

router = APIRouter()

# ✅ Option 1: FastAPI dependency injection (recommended)
def get_session():
    """Session dependency with automatic cleanup"""
    with Session(engine) as session:
        yield session

@router.get("/orders")
async def get_orders(session: Session = Depends(get_session)):
    # Session automatically closed after the request
    orders = session.exec(select(Order)).all()
    return orders


# ✅ Option 2: explicit context manager
@router.get("/orders-alt")
async def get_orders_alt():
    with Session(engine) as session:
        orders = session.exec(select(Order)).all()
        return orders
    # Session guaranteed to close (even on exception)
```

**Why This Works**:
- The context manager ensures `session.close()` is called in `__exit__`
- Works even if an exception is raised
- Works even on an early return
- FastAPI `Depends()` handles async cleanup
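
For readers new to context managers, the `with Session(engine) as session:` block above is roughly equivalent to this try/finally sketch (simplified; the real `__exit__` also handles rollback on error):

```python
@router.get("/orders")
async def get_orders():
    session = Session(engine)
    try:
        orders = session.exec(select(Order)).all()
        return orders
    finally:
        session.close()  # runs on normal return, early return, and exception alike
```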

## 3. Async Connection Leaks (asyncpg)

### Vulnerable Async Pattern

```python
# api/analytics.py (BEFORE)
import asyncpg
from fastapi import APIRouter

router = APIRouter()

@router.get("/analytics")
async def get_analytics(date: str):
    # ❌ LEAK: connection never closed
    conn = await asyncpg.connect(
        user='postgres',
        password='secret',
        database='analytics'
    )

    # Exception here = connection leaked
    result = await conn.fetch('SELECT * FROM metrics WHERE date > $1', date)

    # Early return = connection leaked
    if not result:
        return []

    await conn.close()  # Skipped whenever the early return or an exception fires
    return result
```

### Fixed Async Pattern

```python
# api/analytics.py (AFTER)
import asyncpg
from fastapi import APIRouter
from contextlib import asynccontextmanager

router = APIRouter()

# ✅ Connection pool (shared across requests, created at startup)
pool: asyncpg.Pool | None = None

@asynccontextmanager
async def get_db_connection():
    """Async context manager for connections"""
    conn = await pool.acquire()
    try:
        yield conn
    finally:
        await pool.release(conn)

@router.get("/analytics")
async def get_analytics(date: str):
    async with get_db_connection() as conn:
        result = await conn.fetch(
            'SELECT * FROM metrics WHERE date > $1',
            date
        )
        return result
    # Connection automatically released to the pool
```

**Pool Setup** (application startup):
```python
# main.py
from fastapi import FastAPI
import asyncpg

app = FastAPI()

@app.on_event("startup")
async def startup():
    global pool
    pool = await asyncpg.create_pool(
        user='postgres',
        password='secret',
        database='analytics',
        min_size=10,  # Minimum connections
        max_size=20,  # Maximum connections
        max_inactive_connection_lifetime=300  # Recycle after 5 min
    )

@app.on_event("shutdown")
async def shutdown():
    await pool.close()
```

## 4. Transaction Leak Detection

### Monitoring Active Connections

**PostgreSQL Query**:
```sql
-- Show active connections with details
SELECT
    pid,
    usename,
    application_name,
    client_addr,
    state,
    query,
    state_change,
    NOW() - state_change AS duration
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY duration DESC;
```
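
While the application fix rolls out, leaked sessions can also be reaped server-side as a stopgap; a sketch (the 10-minute threshold is an assumption, tune it to your workload):

```sql
-- Terminate sessions stuck in a transaction for more than 10 minutes
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE datname = current_database()
  AND state = 'idle in transaction'
  AND state_change < NOW() - INTERVAL '10 minutes';
```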

**Prometheus Metrics**:
```python
# monitoring.py
from prometheus_client import Gauge
import asyncpg

db_connections_active = Gauge(
    'db_connections_active',
    'Active database connections',
    ['state']
)

async def monitor_connections(pool: asyncpg.Pool):
    """Record PostgreSQL connection states (schedule every 30 seconds)"""
    async with pool.acquire() as conn:
        rows = await conn.fetch("""
            SELECT state, COUNT(*) as count
            FROM pg_stat_activity
            WHERE datname = current_database()
            GROUP BY state
        """)

        for row in rows:
            db_connections_active.labels(state=row['state']).set(row['count'])
```

**Grafana Alert** (connection leak):
```yaml
alert: DatabaseConnectionLeak
expr: db_connections_active{state="active"} > 80
for: 5m
annotations:
  summary: "Potential connection leak ({{ $value }} active connections)"
  description: "Active connections have been above 80 for 5+ minutes"
```

## 5. Real-World Fix: FastAPI Order Service

### Before (Connection Pool Exhaustion)

```python
# services/order_processor.py (BEFORE)
from sqlmodel import Session, select
from database import engine
from models import Order, OrderItem

class OrderProcessor:
    async def process_order(self, order_id: int):
        # ❌ LEAK: multiple sessions, some never closed
        session1 = Session(engine)
        order = session1.get(Order, order_id)

        if not order:
            # Early return = session1 leaked
            return None

        # ❌ LEAK: second session
        session2 = Session(engine)
        items = session2.exec(
            select(OrderItem).where(OrderItem.order_id == order_id)
        ).all()

        # Exception here = both sessions leaked
        total = sum(item.price * item.quantity for item in items)

        order.total = total
        session1.commit()

        # Only session1 is closed; session2 leaked
        session1.close()
        return order
```

**Metrics (Before)**:
```
Connection pool: 100 connections
Active connections after 1 hour: 95/100
Leaked connections: ~12/min
Memory growth: 100MB/hour
Pool exhaustion: Every 6-8 hours
```

### After (Proper Resource Management)

```python
# services/order_processor.py (AFTER)
from sqlmodel import Session, select
from database import engine
from models import Order, OrderItem

class OrderProcessor:
    async def process_order(self, order_id: int):
        # ✅ Single session, guaranteed cleanup
        with Session(engine) as session:
            # Query order
            order = session.get(Order, order_id)
            if not order:
                return None

            # Query items (same session)
            items = session.exec(
                select(OrderItem).where(OrderItem.order_id == order_id)
            ).all()

            # Calculate total
            total = sum(item.price * item.quantity for item in items)

            # Update order
            order.total = total
            session.add(order)
            session.commit()
            session.refresh(order)

            return order
        # Session automatically closed (even on exception)
```

**Metrics (After)**:
```
Connection pool: 100 connections
Active connections: 8-12/100 (stable)
Leaked connections: 0/day
Memory growth: 0MB/hour
Pool exhaustion: Never (0 incidents/month)
```

## 6. Connection Pool Configuration Best Practices

### Recommended Settings (Grey Haven Stack)

```python
# database.py - production settings
from sqlmodel import create_engine
from sqlalchemy.pool import QueuePool

engine = create_engine(
    database_url,
    poolclass=QueuePool,
    pool_size=20,        # (workers * connections/worker) + buffer
    max_overflow=10,     # 50% of pool_size
    pool_timeout=30,     # Wait timeout
    pool_recycle=3600,   # Recycle after 1h
    pool_pre_ping=True   # Health check
)
```

**Pool Size Formula**: `pool_size = (workers * conn_per_worker) + buffer`
Example: `(4 workers * 3 conn) + 8 buffer = 20`

## 7. Testing Connection Cleanup

### Pytest Fixture for Connection Tracking

```python
# tests/conftest.py
import asyncio

import pytest
from sqlmodel import Session, create_engine

@pytest.fixture
def engine():
    """Test engine that fails the test if connections leak"""
    test_engine = create_engine("postgresql://test:test@localhost/test_db", pool_size=5)
    initial_active = test_engine.pool.checkedout()
    yield test_engine
    final_active = test_engine.pool.checkedout()
    assert final_active == initial_active, f"Leaked {final_active - initial_active} connections"

@pytest.mark.asyncio
async def test_no_connection_leak_under_load(engine):
    """Simulate 1000 concurrent requests (get_orders is the endpoint under test)"""
    initial = engine.pool.checkedout()
    tasks = [get_orders() for _ in range(1000)]
    await asyncio.gather(*tasks)
    await asyncio.sleep(1)
    assert engine.pool.checkedout() == initial, "Connection leak detected"
```

## 8. CI/CD Integration

```yaml
# .github/workflows/connection-leak-test.yml
name: Connection Leak Detection
on: [pull_request]
jobs:
  leak-test:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:15
        env: {POSTGRES_PASSWORD: test, POSTGRES_DB: test_db}
        ports: ["5432:5432"]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v4
        with: {python-version: '3.11'}
      - run: pip install -r requirements.txt pytest pytest-asyncio
      - run: pytest tests/test_connection_leaks.py -v
```

## 9. Results and Impact

### Before vs After Metrics

| Metric | Before | After | Impact |
|--------|--------|-------|--------|
| **Active Connections** | 95/100 (95%) | 8-12/100 (10%) | **85% reduction** |
| **Connection Timeouts** | 15-20/min | 0/day | **100% eliminated** |
| **Memory Growth** | 100MB/hour | 0MB/hour | **100% eliminated** |
| **Service Restarts** | 3-4x/day | 0/month | **100% eliminated** |
| **Pool Wait Time (p95)** | 5.2s | 0.01s | **99.8% faster** |

### Key Optimizations Applied

1. **Context Managers**: Guaranteed connection cleanup (even on exceptions)
2. **FastAPI Dependencies**: Automatic session lifecycle management
3. **Connection Pooling**: Proper pool_size, max_overflow, pool_timeout
4. **Prometheus Monitoring**: Real-time pool saturation metrics
5. **Load Testing**: CI/CD checks for connection leaks

## Related Documentation

- **Node.js Leaks**: [nodejs-memory-leak.md](nodejs-memory-leak.md)
- **Python Profiling**: [python-scalene-profiling.md](python-scalene-profiling.md)
- **Large Datasets**: [large-dataset-optimization.md](large-dataset-optimization.md)
- **Reference**: [../reference/profiling-tools.md](../reference/profiling-tools.md)

---

Return to [examples index](INDEX.md)

skills/memory-profiling/examples/large-dataset-optimization.md (new file, 452 lines)
@@ -0,0 +1,452 @@
# Large Dataset Memory Optimization

Memory-efficient patterns for processing multi-GB datasets in Python and Node.js without OOM errors.

## Overview

**Before Optimization**:
- Dataset size: 10GB CSV (50M rows)
- Memory usage: 20GB (2x dataset size)
- Processing time: 45 minutes
- OOM errors: Frequent (3-4x/day)

**After Optimization**:
- Dataset size: Same (10GB, 50M rows)
- Memory usage: 500MB (constant)
- Processing time: 12 minutes (73% faster)
- OOM errors: 0/month

**Tools**: Polars, pandas chunking, generators, streaming parsers

## 1. Problem: Loading the Entire Dataset

### Vulnerable Pattern (Pandas read_csv)

```python
# analysis.py (BEFORE)
import pandas as pd

def analyze_sales_data(filename: str):
    # ❌ Loads the entire 10GB file into memory
    df = pd.read_csv(filename)  # 20GB RAM usage

    # ❌ Creates copies for each operation
    df['total'] = df['quantity'] * df['price']  # +10GB
    df_filtered = df[df['total'] > 1000]  # +8GB
    df_sorted = df_filtered.sort_values('total', ascending=False)  # +8GB

    # Peak memory: 46GB for a 10GB file!
    return df_sorted.head(100)
```

**Memory Profile**:
```
Step 1 (read_csv): 20GB
Step 2 (calculation): +10GB = 30GB
Step 3 (filter): +8GB = 38GB
Step 4 (sort): +8GB = 46GB
Result: OOM on a 32GB machine
```

## 2. Solution 1: Pandas Chunking

### Chunk-Based Processing

```python
# analysis.py (AFTER - chunking)
import pandas as pd

def analyze_sales_data_chunked(filename: str, chunk_size: int = 100000):
    """Process 100K rows at a time (constant memory)"""

    top_sales = []

    # ✅ Process in chunks (100K rows ≈ 50MB each)
    for chunk in pd.read_csv(filename, chunksize=chunk_size):
        # Calculate total (in place when possible)
        chunk['total'] = chunk['quantity'] * chunk['price']

        # Filter high-value sales
        filtered = chunk[chunk['total'] > 1000]

        # Keep the top 100 from this chunk
        top_chunk = filtered.nlargest(100, 'total')
        top_sales.append(top_chunk)

        # chunk goes out of scope, memory freed

    # Combine the top results from all chunks
    final_df = pd.concat(top_sales).nlargest(100, 'total')
    return final_df
```

**Memory Profile (Chunked)**:
```
Chunk 1: 50MB (process) → ~1MB (top 100 kept) → garbage collected
Chunk 2: 50MB (process) → ~1MB (top 100 kept) → garbage collected
...
Chunk 500: 50MB (process) → ~1MB (top 100 kept) → garbage collected
Final combine: 500 * ~1MB = 500MB total
Peak memory: 500MB (99% reduction!)
```

## 3. Solution 2: Polars (Lazy Evaluation)

### Polars for Large Datasets

**Why Polars**:
- 10-100x faster than pandas
- True streaming (doesn't load the entire file)
- Query optimizer (like SQL databases)
- Parallel processing (uses all CPU cores)

```python
# analysis.py (POLARS)
import polars as pl

def analyze_sales_data_polars(filename: str):
    """Polars lazy evaluation - constant memory"""

    result = (
        pl.scan_csv(filename)  # ✅ Lazy: nothing is loaded yet
        .with_columns([
            (pl.col('quantity') * pl.col('price')).alias('total')
        ])
        .filter(pl.col('total') > 1000)
        .sort('total', descending=True)
        .head(100)
        .collect(streaming=True)  # ✅ Streaming: processes in chunks
    )

    return result
```

**Memory Profile (Polars Streaming)**:
```
Memory usage: 200-300MB (constant)
Processing: Parallel chunks, optimized query plan
Time: 12 minutes vs 45 minutes (pandas)
```

## 4. Node.js Streaming

### CSV Streaming with csv-parser

```typescript
// analysis.ts (BEFORE)
import fs from 'fs';
import Papa from 'papaparse';

async function analyzeSalesData(filename: string) {
  // ❌ Loads the entire 10GB file
  const fileContent = fs.readFileSync(filename, 'utf-8'); // 20GB RAM
  const parsed = Papa.parse(fileContent, { header: true }); // +10GB

  // Process all rows
  const results = parsed.data.map(row => ({
    total: row.quantity * row.price
  }));

  return results; // 30GB total
}
```

**Fixed with Streaming**:
```typescript
// analysis.ts (AFTER - streaming)
import fs from 'fs';
import csv from 'csv-parser';
import { pipeline } from 'stream/promises';

async function analyzeSalesDataStreaming(filename: string) {
  const topSales: Array<{row: any, total: number}> = [];

  await pipeline(
    fs.createReadStream(filename), // ✅ Stream (not load all)
    csv(),
    async function* (source) {
      for await (const row of source) {
        const total = row.quantity * row.price;

        if (total > 1000) {
          topSales.push({ row, total });

          // Keep only the top 100 (memory bounded)
          if (topSales.length > 100) {
            topSales.sort((a, b) => b.total - a.total);
            topSales.length = 100;
          }
        }
      }
      yield topSales;
    }
  );

  return topSales;
}
```

**Memory Profile (Streaming)**:
```
Buffer: 64KB (stream chunk size)
Processing: One row at a time
Array: 100 rows max (bounded)
Peak memory: 5MB vs 30GB (99.98% reduction!)
```
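
A usage sketch for the streaming analyzer (the file path is a placeholder):

```typescript
analyzeSalesDataStreaming('./sales.csv')
  .then((top) => console.log(`kept ${top.length} rows; best total: ${top[0]?.total}`))
  .catch((err) => console.error('stream failed:', err));
```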

## 5. Generator Pattern (Python)

### Memory-Efficient Pipeline

```python
# pipeline.py (generator-based)
from typing import Iterator
import csv
import heapq

def read_csv_streaming(filename: str) -> Iterator[dict]:
    """Read CSV line by line (not all at once)"""
    with open(filename, 'r') as f:
        reader = csv.DictReader(f)
        for row in reader:
            yield row  # ✅ One row at a time

def calculate_totals(rows: Iterator[dict]) -> Iterator[dict]:
    """Calculate totals (lazy)"""
    for row in rows:
        row['total'] = float(row['quantity']) * float(row['price'])
        yield row

def filter_high_value(rows: Iterator[dict], threshold: float = 1000) -> Iterator[dict]:
    """Filter high-value sales (lazy)"""
    for row in rows:
        if row['total'] > threshold:
            yield row

def top_n(rows: Iterator[dict], n: int = 100) -> list[dict]:
    """Keep the top N rows (bounded memory)"""
    return heapq.nlargest(n, rows, key=lambda x: x['total'])

# ✅ Pipeline: each stage processes one row at a time
def analyze_sales_pipeline(filename: str):
    rows = read_csv_streaming(filename)
    with_totals = calculate_totals(rows)
    high_value = filter_high_value(with_totals)
    top_100 = top_n(high_value, 100)
    return top_100
```

**Memory Profile (Generator Pipeline)**:
```
Stage 1 (read): 1 row (few KB)
Stage 2 (calculate): 1 row (few KB)
Stage 3 (filter): 1 row (few KB)
Stage 4 (top_n): 100 rows (bounded)
Peak memory: <1MB (constant)
```
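
Memory mapping (mentioned in the examples index) is a related technique for numeric data already stored in a binary format; a minimal `numpy.memmap` sketch (file name and dtype are assumptions about the layout) lets the OS page data in on demand:

```python
import numpy as np

# Map a large binary file of float32 totals without reading it into RAM
totals = np.memmap('sales_totals.bin', dtype=np.float32, mode='r')

# Slices touch only the pages they need; resident memory stays small
high_value = int((totals[:1_000_000] > 1000).sum())
print(f"high-value rows in first 1M: {high_value}")
```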

## 6. Real-World: E-Commerce Analytics

### Before (Pandas load_all)

```python
# analytics_service.py (BEFORE)
import pandas as pd

class AnalyticsService:
    def generate_sales_report(self, start_date: str, end_date: str):
        # ❌ Load the entire orders table (10GB)
        orders = pd.read_sql(
            "SELECT * FROM orders WHERE date BETWEEN %s AND %s",
            engine,
            params=(start_date, end_date)
        )  # 20GB RAM

        # ❌ Load the entire order_items table (50GB)
        items = pd.read_sql("SELECT * FROM order_items", engine)  # +100GB RAM

        # Join (creates another copy)
        merged = orders.merge(items, on='order_id')  # +150GB

        # Aggregate
        summary = merged.groupby('category').agg({
            'total': 'sum',
            'quantity': 'sum'
        })

        return summary  # Peak: 270GB - OOM!
```

### After (Database Aggregation + Chunking)

```python
# analytics_service.py (AFTER)
import pandas as pd

class AnalyticsService:
    def generate_sales_report(self, start_date: str, end_date: str):
        # ✅ Aggregate in the database (PostgreSQL does the work)
        query = """
            SELECT
                oi.category,
                SUM(oi.price * oi.quantity) as total,
                SUM(oi.quantity) as quantity
            FROM orders o
            JOIN order_items oi ON o.id = oi.order_id
            WHERE o.date BETWEEN %(start)s AND %(end)s
            GROUP BY oi.category
        """

        # Result: aggregated data (a few KB, not 270GB!)
        summary = pd.read_sql(
            query,
            engine,
            params={'start': start_date, 'end': end_date}
        )

        return summary  # Peak: 1MB vs 270GB
```

**Metrics**:
```
Before: 270GB RAM, OOM error
After: 1MB RAM, 99.9996% reduction
Time: 45 min → 30 seconds (90x faster)
```

## 7. Dask for Parallel Processing

### Dask DataFrame (Parallel Chunking)

```python
# analysis_dask.py
import dask.dataframe as dd

def analyze_sales_data_dask(filename: str):
    """Process in parallel chunks across CPU cores"""

    # ✅ Lazy loading, parallel processing
    df = dd.read_csv(
        filename,
        blocksize='64MB'  # Process 64MB chunks
    )

    # All operations are lazy (no computation yet)
    df['total'] = df['quantity'] * df['price']
    filtered = df[df['total'] > 1000]
    top_100 = filtered.nlargest(100, 'total')

    # ✅ Trigger computation (parallel across cores)
    result = top_100.compute()

    return result
```

**Memory Profile (Dask)**:
```
Workers: 8 (one per CPU core)
Memory per worker: 100MB
Total memory: 800MB vs 46GB
Speed: 4-8x faster (parallel)
```

## 8. Memory Monitoring

### Track Memory Usage During Processing

```python
# monitor.py
import tracemalloc
import psutil
from contextlib import contextmanager

@contextmanager
def memory_monitor(label: str):
    """Monitor memory usage of a code block"""

    # Start tracking
    tracemalloc.start()
    process = psutil.Process()
    mem_before = process.memory_info().rss / 1024 / 1024  # MB

    yield

    # Measure after
    mem_after = process.memory_info().rss / 1024 / 1024
    current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    print(f"{label}:")
    print(f"  Memory before: {mem_before:.1f} MB")
    print(f"  Memory after: {mem_after:.1f} MB")
    print(f"  Memory delta: {mem_after - mem_before:.1f} MB")
    print(f"  Peak traced: {peak / 1024 / 1024:.1f} MB")

# Usage
with memory_monitor("Pandas load_all"):
    df = pd.read_csv("large_file.csv")  # Shows high memory usage

with memory_monitor("Polars streaming"):
    df = pl.scan_csv("large_file.csv").collect(streaming=True)  # Low memory
```

## 9. Optimization Decision Tree

**Choose the right tool based on dataset size**:

```
Dataset < 1GB:
  → Use pandas.read_csv() (simple, fast)

Dataset 1-10GB:
  → Use pandas chunking (chunksize=100000)
  → Or Polars streaming (faster, less memory)

Dataset 10-100GB:
  → Use Polars streaming (best performance)
  → Or Dask (parallel processing)
  → Or database aggregation (PostgreSQL, ClickHouse)

Dataset > 100GB:
  → Database aggregation (required)
  → Or Spark/Ray (distributed computing)
  → Never load into memory
```

## 10. Results and Impact

### Before vs After Metrics

| Metric | Before (pandas) | After (Polars) | Impact |
|--------|----------------|----------------|--------|
| **Memory Usage** | 46GB | 300MB | **99.3% reduction** |
| **Processing Time** | 45 min | 12 min | **73% faster** |
| **OOM Errors** | 3-4/day | 0/month | **100% eliminated** |
| **Max Dataset Size** | 10GB | 500GB+ | **50x scalability** |

### Key Optimizations Applied

1. **Chunking**: Process 100K rows at a time (constant memory)
2. **Lazy Evaluation**: Polars/Dask don't load data until needed
3. **Streaming**: One row at a time (generators, Node.js streams)
4. **Database Aggregation**: Let PostgreSQL do the work
5. **Bounded Memory**: heapq.nlargest() keeps the top N (not all rows)

### Cost Savings

**Infrastructure costs**:
- Before: r5.8xlarge (256GB RAM) = $1.344/hour
- After: r5.large (16GB RAM) = $0.084/hour
- **Savings**: 94% reduction ($23,000/year per service)

## Related Documentation

- **Node.js Leaks**: [nodejs-memory-leak.md](nodejs-memory-leak.md)
- **Python Profiling**: [python-scalene-profiling.md](python-scalene-profiling.md)
- **DB Leaks**: [database-connection-leak.md](database-connection-leak.md)
- **Reference**: [../reference/memory-optimization-patterns.md](../reference/memory-optimization-patterns.md)

---

Return to [examples index](INDEX.md)

skills/memory-profiling/examples/nodejs-memory-leak.md (new file, 490 lines)
@@ -0,0 +1,490 @@
# Node.js Memory Leak Detection

Identifying and fixing memory leaks in Node.js applications using Chrome DevTools, heapdump, and memory profiling techniques.

## Overview

**Symptoms Before Fix**:
- Memory usage: 150MB → 2GB over 6 hours
- Heap size growing linearly (5MB/minute)
- V8 garbage collection ineffective
- Production outages (OOM killer)

**After Fix**:
- Memory stable at 150MB (93% reduction)
- Heap size constant over time
- Zero OOM errors in 30 days
- Proper resource cleanup

**Tools**: Chrome DevTools, heapdump, memwatch-next, Prometheus monitoring

## 1. Memory Leak Symptoms

### Linear Memory Growth

```bash
# Monitor Node.js memory usage
node --expose-gc --inspect app.js

# Connect Chrome DevTools: chrome://inspect
# Memory tab → Take heap snapshot every 5 minutes
```

**Heap growth pattern**:
```
Time  | Heap Size | External | Total
------|-----------|----------|-------
0 min | 50MB      | 10MB     | 60MB
5 min | 75MB      | 15MB     | 90MB
10min | 100MB     | 20MB     | 120MB
15min | 125MB     | 25MB     | 150MB
...   | ...       | ...      | ...
6 hrs | 1.8GB     | 200MB    | 2GB
```

**Diagnosis**: Linear growth indicates a memory leak (not the normal sawtooth GC pattern)

### High GC Activity

```javascript
// Log memory usage every minute to spot growth trends
setInterval(() => {
  const usage = process.memoryUsage();
  console.log({
    heapUsed: `${Math.round(usage.heapUsed / 1024 / 1024)}MB`,
    heapTotal: `${Math.round(usage.heapTotal / 1024 / 1024)}MB`,
    external: `${Math.round(usage.external / 1024 / 1024)}MB`,
    rss: `${Math.round(usage.rss / 1024 / 1024)}MB`
  });
}, 60000); // Every minute
```

**Output showing leak**:
```
{heapUsed: '75MB', heapTotal: '100MB', external: '15MB', rss: '120MB'}
{heapUsed: '100MB', heapTotal: '130MB', external: '20MB', rss: '150MB'}
{heapUsed: '125MB', heapTotal: '160MB', external: '25MB', rss: '185MB'}
```

## 2. Heap Snapshot Analysis

### Taking Heap Snapshots

```javascript
// Generate heap snapshots programmatically
const v8 = require('v8');

function takeHeapSnapshot(filename) {
  const heapSnapshot = v8.writeHeapSnapshot(filename);
  console.log(`Heap snapshot written to ${heapSnapshot}`);
}

// Take a snapshot every hour
setInterval(() => {
  const timestamp = new Date().toISOString().replace(/:/g, '-');
  takeHeapSnapshot(`heap-${timestamp}.heapsnapshot`);
}, 3600000);
```

### Analyzing Snapshots in Chrome DevTools

**Steps**:
1. Load two snapshots (before and after 1 hour)
2. Compare snapshots (Comparison view)
3. Sort by "Size Delta" (descending)
4. Look for objects growing significantly

**Example Analysis**:
```
Object Type           | Count  | Size Delta | Retained Size
----------------------|--------|------------|---------------
(array)               | +5,000 | +50MB      | +60MB
EventEmitter          | +1,200 | +12MB      | +15MB
Closure (anonymous)   | +800   | +8MB       | +10MB
```

**Diagnosis**: EventEmitter count growing = likely event listener leak

### Retained Objects Analysis

```javascript
// Chrome DevTools → Heap Snapshot → Summary → sort by "Retained Size"
// Click an object → view its Retainer tree
```

**Retainer tree example** (EventEmitter leak):
```
EventEmitter @123456
  ← listeners: Array[50]
    ← _events.data: Array
      ← EventEmitter @123456 (self-reference leak!)
```

## 3. Common Memory Leak Patterns

### Pattern 1: Event Listener Leak

**Vulnerable Code**:
```typescript
// ❌ LEAK: EventEmitter listeners never removed
import {EventEmitter} from 'events';

class DataProcessor {
  private emitter = new EventEmitter();

  async processOrders() {
    // A new listener is added every time the method is called
    this.emitter.on('data', (data) => {
      console.log('Processing:', data);
    });

    // Emit 1000 events
    for (let i = 0; i < 1000; i++) {
      this.emitter.emit('data', {id: i});
    }
  }
}

// Shared instance, called every second = listeners accumulate on one emitter!
const processor = new DataProcessor();
setInterval(() => processor.processOrders(), 1000);
```

**Result**: One leaked listener per second (86,400/day), each closure pinning its captured scope → 2GB memory growth

**Fixed Code**:
```typescript
// ✅ FIXED: Remove the listener after use
class DataProcessor {
  private emitter = new EventEmitter();

  async processOrders() {
    const handler = (data) => {
      console.log('Processing:', data);
    };

    this.emitter.on('data', handler);

    try {
      for (let i = 0; i < 1000; i++) {
        this.emitter.emit('data', {id: i});
      }
    } finally {
      // ✅ Clean up the listener
      this.emitter.removeListener('data', handler);
    }
  }
}
```

**Better**: Use `once()` for one-time listeners:
```typescript
this.emitter.once('data', handler); // Auto-removed after the first emit
```

### Pattern 2: Closure Leak

**Vulnerable Code**:
```typescript
// ❌ LEAK: Closure captures a large object
const cache = new Map();

function processRequest(userId: string) {
  const largeData = fetchLargeDataset(userId); // 10MB object

  // The closure captures all of largeData
  cache.set(userId, () => {
    return largeData.summary; // Only the summary (1KB) is needed
  });
}

// Called for 1000 users = 10GB held in the cache!
```

**Fixed Code**:
```typescript
// ✅ FIXED: Only store what you need
const cache = new Map();

function processRequest(userId: string) {
  const largeData = fetchLargeDataset(userId);
  const summary = largeData.summary; // Extract only the 1KB summary

  // Store minimal data
  cache.set(userId, () => summary);
}

// 1000 users = 1MB in cache ✅
```
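
When the cache key is itself an object, a `WeakMap` avoids this whole class of leak: entries are collected automatically once the key becomes unreachable. A minimal sketch (names are illustrative):

```typescript
interface User { id: string; history: unknown[] }

const summaryCache = new WeakMap<User, string>();

function getSummary(user: User): string {
  const cached = summaryCache.get(user);
  if (cached !== undefined) return cached;

  const summary = `user:${user.id} events:${user.history.length}`;
  summaryCache.set(user, summary); // entry dies with the User object
  return summary;
}
```

Note that a `WeakMap` cannot be keyed by strings like `userId`, so this fits caches keyed by live objects (requests, sessions) rather than IDs.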

### Pattern 3: Global Variable Accumulation

**Vulnerable Code**:
```typescript
// ❌ LEAK: Global array keeps growing
const requestLog: Request[] = [];

app.post('/api/orders', (req, res) => {
  requestLog.push(req); // Never removed!
  // ... process order
});

// 1M requests = 1M objects held in memory permanently
```

**Fixed Code**:
```typescript
// ✅ FIXED: Use an LRU cache with a size limit
import LRU from 'lru-cache';

const requestLog = new LRU({
  max: 1000,          // Maximum 1000 items
  ttl: 1000 * 60 * 5  // 5-minute TTL
});

app.post('/api/orders', (req, res) => {
  requestLog.set(req.id, req); // Auto-evicts old items
});
```

### Pattern 4: Forgotten Timers/Intervals

**Vulnerable Code**:
```typescript
// ❌ LEAK: setInterval never cleared
class ReportGenerator {
  private data: any[] = [];

  start() {
    setInterval(() => {
      this.data.push(generateReport()); // Accumulates forever
    }, 60000);
  }
}

// Each instance leaks!
const generator = new ReportGenerator();
generator.start();
```

**Fixed Code**:
```typescript
// ✅ FIXED: Clear the interval on cleanup
class ReportGenerator {
  private data: any[] = [];
  private intervalId?: NodeJS.Timeout;

  start() {
    this.intervalId = setInterval(() => {
      this.data.push(generateReport());
    }, 60000);
  }

  stop() {
    if (this.intervalId) {
      clearInterval(this.intervalId);
      this.intervalId = undefined;
      this.data = []; // Clear accumulated data
    }
  }
}
```
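
A short usage sketch tying the lifecycle to process shutdown (the signal handling is an assumption about the host environment):

```typescript
const generator = new ReportGenerator();
generator.start();

// Release the timer and buffered data when the service stops
process.on('SIGTERM', () => generator.stop());
```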

## 4. Memory Profiling with memwatch-next

### Installation

```bash
bun add memwatch-next
```

### Leak Detection

```typescript
// memory-monitor.ts
import memwatch from 'memwatch-next';

// Detect memory leaks
memwatch.on('leak', (info) => {
  console.error('Memory leak detected:', {
    growth: info.growth,
    reason: info.reason,
    current_base: `${Math.round(info.current_base / 1024 / 1024)}MB`,
    leaked: `${Math.round((info.current_base - info.start) / 1024 / 1024)}MB`
  });

  // Alert to PagerDuty/Slack
  alertOps('Memory leak detected', info);
});

// Monitor GC stats
memwatch.on('stats', (stats) => {
  console.log('GC stats:', {
    used_heap_size: `${Math.round(stats.used_heap_size / 1024 / 1024)}MB`,
    heap_size_limit: `${Math.round(stats.heap_size_limit / 1024 / 1024)}MB`,
    num_full_gc: stats.num_full_gc,
    num_inc_gc: stats.num_inc_gc
  });
});
```

### HeapDiff for Leak Analysis

```typescript
import memwatch from 'memwatch-next';

const hd = new memwatch.HeapDiff();

// Simulate a leak
const leak: any[] = [];
for (let i = 0; i < 10000; i++) {
  leak.push({data: new Array(1000).fill('x')});
}

// Compare heaps
const diff = hd.end();
console.log('Heap diff:', JSON.stringify(diff, null, 2));

// Output:
// {
//   "before": {"nodes": 12345, "size": 50000000},
//   "after": {"nodes": 22345, "size": 150000000},
//   "change": {
//     "size_bytes": 100000000, // 100MB leak!
//     "size": "100.00MB",
//     "freed_nodes": 100,
//     "allocated_nodes": 10100 // Net increase
//   }
// }
```

## 5. Production Memory Monitoring

### Prometheus Metrics

```typescript
// metrics.ts
import {Gauge} from 'prom-client';

const memoryUsageGauge = new Gauge({
  name: 'nodejs_memory_usage_bytes',
  help: 'Node.js memory usage in bytes',
  labelNames: ['type']
});

setInterval(() => {
  const usage = process.memoryUsage();
  memoryUsageGauge.set({type: 'heap_used'}, usage.heapUsed);
  memoryUsageGauge.set({type: 'heap_total'}, usage.heapTotal);
  memoryUsageGauge.set({type: 'external'}, usage.external);
  memoryUsageGauge.set({type: 'rss'}, usage.rss);
}, 15000);
```

**Grafana Alert**:
```promql
# Alert if heap usage is growing linearly (delta(), not increase(), since this is a gauge)
delta(nodejs_memory_usage_bytes{type="heap_used"}[1h]) > 100000000 # 100MB/hour
```
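
For the scrape itself, prom-client's default registry can be exposed on an HTTP endpoint; a minimal sketch (Express and port 9091 are assumptions):

```typescript
// metrics-server.ts
import express from 'express';
import {register} from 'prom-client';

const app = express();

app.get('/metrics', async (_req, res) => {
  res.set('Content-Type', register.contentType);
  res.send(await register.metrics());
});

app.listen(9091); // scraped by Prometheus
```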

## 6. Real-World Fix: EventEmitter Leak

### Before (Leaking)

```typescript
// order-processor.ts (BEFORE FIX)
class OrderProcessor {
  private emitter = new EventEmitter();

  async processOrders() {
    // ❌ LEAK: a new listener is added on every call
    this.emitter.on('order:created', async (order) => {
      await this.sendConfirmationEmail(order);
      await this.updateInventory(order);
    });

    const orders = await db.query.orders.findMany({status: 'pending'});
    for (const order of orders) {
      this.emitter.emit('order:created', order);
    }
  }
}

// Shared instance, called every minute — listeners pile up
const processor = new OrderProcessor();
setInterval(() => processor.processOrders(), 60000);
```

**Result**: 1,440 leaked listeners/day → 2GB memory leak in production

### After (Fixed)

```typescript
// order-processor.ts (AFTER FIX)
class OrderProcessor {
  private emitter = new EventEmitter();

  async processOrders() {
    const handler = async (order) => {
      await this.sendConfirmationEmail(order);
      await this.updateInventory(order);
    };

    this.emitter.on('order:created', handler);

    try {
      const orders = await db.query.orders.findMany({status: 'pending'});
      for (const order of orders) {
        this.emitter.emit('order:created', order);
      }
    } finally {
      // ✅ Cleanup: the handler never outlives the batch
      // (use once() instead when exactly one event is expected)
      this.emitter.removeListener('order:created', handler);
    }
  }
}
```

**Result**: Memory stable at 150MB, zero leaks

## 7. Results and Impact

### Before vs After Metrics

| Metric | Before Fix | After Fix | Impact |
|--------|-----------|-----------|---------|
| **Memory Usage** | 2GB (after 6h) | 150MB (stable) | **93% reduction** |
| **Heap Size** | Linear growth (5MB/min) | Stable | **Zero growth** |
| **OOM Incidents** | 12/month | 0/month | **100% eliminated** |
| **GC Pause Time** | 200ms avg | 50ms avg | **75% faster** |
| **Uptime** | 6 hours avg | 30+ days | **120x improvement** |

### Lessons Learned

**1. Always remove event listeners**
- Use `once()` for one-time events
- Use `removeListener()` in finally blocks
- Track listeners with a WeakMap for debugging

**2. Avoid closures capturing large objects**
- Extract only the needed data before the closure
- Use WeakMap/WeakSet for object references
- Profile with heap snapshots regularly

**3. Monitor memory in production**
- Prometheus metrics for heap usage
- Alert on linear growth patterns
- Weekly heap snapshot analysis

## Related Documentation

- **Python Profiling**: [python-scalene-profiling.md](python-scalene-profiling.md)
- **DB Leaks**: [database-connection-leak.md](database-connection-leak.md)
- **Reference**: [../reference/memory-patterns.md](../reference/memory-patterns.md)
- **Templates**: [../templates/memory-report.md](../templates/memory-report.md)

---

Return to [examples index](INDEX.md)

skills/memory-profiling/examples/python-scalene-profiling.md (new file, 456 lines)
@@ -0,0 +1,456 @@
# Python Memory Profiling with Scalene

Line-by-line memory and CPU profiling for Python applications using Scalene, with pytest integration and optimization strategies.

## Overview

**Before Optimization**:
- Memory usage: 500MB for processing 10K records
- OOM (Out of Memory) errors with 100K records
- Processing time: 45 seconds for 10K records
- List comprehensions loading the entire dataset

**After Optimization**:
- Memory usage: 5MB for processing 10K records (99% reduction)
- No OOM errors with 1M records
- Processing time: 8 seconds for 10K records (82% faster)
- Generator-based streaming

**Tools**: Scalene, pytest, memory_profiler, tracemalloc

## 1. Scalene Installation and Setup

### Installation

```bash
# Install Scalene
pip install scalene

# Or with uv (faster)
uv pip install scalene
```

### Basic Usage

```bash
# Profile an entire script
scalene script.py

# Profile with pytest (recommended)
scalene --cli --memory -m pytest tests/

# HTML output
scalene --html --outfile profile.html script.py

# Reduced profile (only lines with significant usage)
scalene --reduced-profile script.py
```

## 2. Profiling with pytest

### Test File Setup

```python
# tests/test_data_processing.py
import pytest
from data_processor import DataProcessor

@pytest.fixture
def processor():
    return DataProcessor()

def test_process_large_dataset(processor):
    # Generate 10K records
    records = [{'id': i, 'value': i * 2} for i in range(10000)]

    # Process (this is where the memory spike occurs)
    result = processor.process_records(records)

    assert len(result) == 10000
```

### Running Scalene with pytest

```bash
# Profile memory usage during test execution
uv run scalene --cli --memory -m pytest tests/test_data_processing.py 2>&1 | grep -i "memory\|mb\|test"

# Output shows line-by-line memory allocation
```

**Scalene Output** (before optimization):
```
data_processor.py:
  Line | Memory % | Memory (MB) | CPU % | Code
  -----|----------|-------------|-------|-----
  12   | 45%      | 225 MB      | 10%   | result = [transform(r) for r in records]
  18   | 30%      | 150 MB      | 5%    | filtered = [r for r in result if r['value'] > 0]
  25   | 15%      | 75 MB       | 20%   | sorted_data = sorted(filtered, key=lambda x: x['id'])
```
|
||||
|
||||
**Analysis**: Line 12 is the hotspot (45% of memory)
|
||||
|
||||
## 3. Memory Hotspot Identification

### Vulnerable Code (Memory Spike)

```python
# data_processor.py (BEFORE OPTIMIZATION)
from datetime import datetime


class DataProcessor:
    def process_records(self, records: list[dict]) -> list[dict]:
        # ❌ HOTSPOT: List comprehension loads the entire dataset
        result = [self.transform(r) for r in records]  # 225MB for 10K records

        # ❌ Creates another copy
        filtered = [r for r in result if r['value'] > 0]  # +150MB

        # ❌ sorted() creates yet another copy
        sorted_data = sorted(filtered, key=lambda x: x['id'])  # +75MB

        return sorted_data  # Total: 450MB for 10K records

    def transform(self, record: dict) -> dict:
        return {
            'id': record['id'],
            'value': record['value'] * 2,
            'timestamp': datetime.now()
        }
```

**Scalene Report**:
```
Memory allocation breakdown:
- Line 12 (list comprehension): 225MB (50%)
- Line 18 (filtering): 150MB (33%)
- Line 25 (sorting): 75MB (17%)

Total memory: 450MB for 10,000 records
Projected for 100K: 4.5GB → OOM!
```

### Optimized Code (Generator-Based)

```python
# data_processor.py (AFTER OPTIMIZATION)
from datetime import datetime
from typing import Iterator


class DataProcessor:
    def process_records(self, records: list[dict]) -> Iterator[dict]:
        # ✅ Generator: processes one record at a time
        transformed = (self.transform(r) for r in records)  # O(1) memory

        # ✅ Generator chaining
        filtered = (r for r in transformed if r['value'] > 0)  # O(1) memory

        # ⚠️ sorted() still materializes the surviving records (O(n) memory);
        # for very large datasets, use external sorting or database ORDER BY
        yield from sorted(filtered, key=lambda x: x['id'])

    def transform(self, record: dict) -> dict:
        return {
            'id': record['id'],
            'value': record['value'] * 2,
            'timestamp': datetime.now()
        }

    # Alternative: fully streaming (no sorting)
    def process_records_streaming(self, records: list[dict]) -> Iterator[dict]:
        for record in records:
            transformed = self.transform(record)
            if transformed['value'] > 0:
                yield transformed  # O(1) memory, fully streaming
```
**Scalene Report (After)**:
```
Memory allocation breakdown:
- Line 12 (generator): 5MB (100% - constant overhead)
- Line 18 (filter generator): 0MB (lazy)
- Line 25 (yield): 0MB (lazy)

Total memory: 5MB for 10,000 records (99% reduction!)
Scalable to 1M+ records without OOM
```
## 4. Common Memory Patterns

### Pattern 1: List Comprehension → Generator

**Before** (High Memory):
```python
import json

# ❌ Loads entire list into memory
def process_large_file(filename: str) -> list[dict]:
    with open(filename) as f:
        lines = f.readlines()  # Loads entire file (500MB)

    # Another copy
    return [json.loads(line) for line in lines]  # +500MB = 1GB total
```

**After** (Low Memory):
```python
import json
from typing import Iterator

# ✅ Generator: processes line-by-line
def process_large_file(filename: str) -> Iterator[dict]:
    with open(filename) as f:
        for line in f:  # Reads one line at a time
            yield json.loads(line)  # O(1) memory
```

**Scalene diff**: 1GB → 5MB (99.5% reduction)
### Pattern 2: DataFrame Memory Optimization

**Before** (High Memory):
```python
# ❌ Loads entire CSV into memory
import pandas as pd


def analyze_data(filename: str):
    df = pd.read_csv(filename)  # 10GB CSV → 10GB RAM

    # All transformations in memory
    df['new_col'] = df['value'] * 2
    df_filtered = df[df['value'] > 0]
    return df_filtered.groupby('category').sum()
```

**After** (Low Memory with Chunking):
```python
# ✅ Process in chunks
import pandas as pd


def analyze_data(filename: str):
    chunk_size = 10000
    results = []

    # Process 10K rows at a time
    for chunk in pd.read_csv(filename, chunksize=chunk_size):
        chunk['new_col'] = chunk['value'] * 2
        filtered = chunk[chunk['value'] > 0]
        group_result = filtered.groupby('category').sum()
        results.append(group_result)

    # Combine results
    return pd.concat(results).groupby(level=0).sum()  # Much smaller
```

**Scalene diff**: 10GB → 500MB (95% reduction)
### Pattern 3: String Concatenation

**Before** (High Memory):
```python
# ❌ Creates a new string each iteration (O(n²) copying)
def build_report(data: list[dict]) -> str:
    report = ""
    for item in data:  # 100K items
        report += f"{item['id']}: {item['value']}\n"  # New string every time
    return report  # 500MB final string + 500MB garbage = 1GB
```

**After** (Low Memory):
```python
# ✅ StringIO or join (O(n) memory)
from io import StringIO
from typing import Iterator


def build_report(data: list[dict]) -> str:
    buffer = StringIO()
    for item in data:
        buffer.write(f"{item['id']}: {item['value']}\n")
    return buffer.getvalue()


# Or even better: generator
def build_report_streaming(data: list[dict]) -> Iterator[str]:
    for item in data:
        yield f"{item['id']}: {item['value']}\n"
```

**Scalene diff**: 1GB → 50MB (95% reduction)
## 5. Scalene CLI Reference

### Common Options

```bash
# Memory-only profiling (fastest)
scalene --cli --memory script.py

# CPU + Memory profiling
scalene --cli --cpu --memory script.py

# Reduced profile (only lines with non-zero usage)
scalene --reduced-profile script.py

# Profile only files whose paths contain the given string
scalene --profile-only process_data script.py

# HTML report
scalene --html --outfile profile.html script.py

# Profile with pytest
scalene --cli --memory -m pytest tests/

# Only report lines with at least this many allocations
scalene --malloc-threshold 100 script.py
```
### Interpreting Output

**Column Meanings**:
```
Memory %  | Percentage of total memory allocated
Memory MB | Absolute memory allocated (in megabytes)
CPU %     | Percentage of CPU time spent
Python %  | Time spent in Python (vs native code)
```

**Example Output**:
```
script.py:
Line | Memory % | Memory MB | CPU % | Python % | Code
-----|----------|-----------|-------|----------|-----
12   | 45.2%    | 225.6 MB  | 10.5% | 95.2%    | data = [x for x in range(1000000)]
18   | 30.1%    | 150.3 MB  | 5.2%  | 98.1%    | filtered = list(filter(lambda x: x > 0, data))
```

**Analysis**:
- Line 12: High memory (45.2%) → optimize the list comprehension
- Line 18: Moderate memory (30.1%) → use a generator instead of list()
## 6. Integration with CI/CD

### GitHub Actions Workflow

```yaml
# .github/workflows/memory-profiling.yml
name: Memory Profiling

on: [pull_request]

jobs:
  profile:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: |
          pip install scalene pytest

      - name: Run memory profiling
        run: |
          scalene --cli --memory --reduced-profile -m pytest tests/ > profile.txt

      - name: Check for memory hotspots
        run: |
          # Fail the build if any profiled line allocates >100MB
          # (assumes the MB figure appears in column 3 of the CLI output)
          if grep -q "Memory %" profile.txt; then
            if ! awk '$3 > 100 {exit 1}' profile.txt; then
              echo "Memory hotspot detected!"
              exit 1
            fi
          fi

      - name: Upload profile
        uses: actions/upload-artifact@v3
        with:
          name: memory-profile
          path: profile.txt
```
## 7. Real-World Optimization: CSV Processing

### Before (500MB Memory, OOM at 100K rows)

```python
# csv_processor.py (BEFORE)
import pandas as pd


class CSVProcessor:
    def process_file(self, filename: str) -> dict:
        # ❌ Loads entire CSV
        df = pd.read_csv(filename)  # 500MB for 10K rows

        # ❌ Multiple copies
        df['total'] = df['quantity'] * df['price']
        df_filtered = df[df['total'] > 100]
        summary = df_filtered.groupby('category').agg({
            'total': 'sum',
            'quantity': 'sum'
        })

        return summary.to_dict()
```

**Scalene Output**:
```
Line 8: 500MB (75%) - pd.read_csv()
Line 11: 100MB (15%) - df['total'] calculation
Line 12: 50MB (10%) - filtering
Total: 650MB for 10K rows
```
### After (5MB Memory, Handles 1M rows)

```python
# csv_processor.py (AFTER)
from collections import defaultdict

import pandas as pd


class CSVProcessor:
    def process_file(self, filename: str) -> dict:
        # ✅ Process in 10K row chunks
        chunk_size = 10000
        results = defaultdict(lambda: {'total': 0, 'quantity': 0})

        for chunk in pd.read_csv(filename, chunksize=chunk_size):
            chunk['total'] = chunk['quantity'] * chunk['price']
            filtered = chunk[chunk['total'] > 100]

            # Aggregate incrementally
            for category, group in filtered.groupby('category'):
                results[category]['total'] += group['total'].sum()
                results[category]['quantity'] += group['quantity'].sum()

        return dict(results)
```

**Scalene Output (After)**:
```
Line 9: 5MB (100%) - chunk processing (constant memory)
Total: 5MB for any file size (99% reduction)
```
## 8. Results and Impact

### Before vs After Metrics

| Metric | Before | After | Impact |
|--------|--------|-------|--------|
| **Memory Usage** | 500MB (10K rows) | 5MB (1M rows) | **99% reduction** |
| **Processing Time** | 45s (10K rows) | 8s (10K rows) | **82% faster** |
| **Max File Size** | 100K rows (OOM) | 10M+ rows | **100x scalability** |
| **OOM Errors** | 5/week | 0/month | **100% eliminated** |

### Key Optimizations Applied

1. **List comprehension → Generator**: 225MB → 0MB
2. **DataFrame chunking**: 500MB → 5MB per chunk
3. **String concatenation**: 1GB → 50MB (StringIO)
4. **Lazy evaluation**: Load on demand vs load all (see the sketch below)
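Optimization 4 can be as small as deferring an expensive load until first access; a minimal sketch with a hypothetical `load_rows` helper (a stand-in, not part of the case study above):

```python
from functools import cached_property


def load_rows(path: str) -> list[dict]:
    # Hypothetical stand-in for the expensive load
    with open(path) as f:
        return [{'line': line.rstrip()} for line in f]


class Dataset:
    def __init__(self, path: str):
        self.path = path  # cheap: no I/O at construction time

    @cached_property
    def rows(self) -> list[dict]:
        return load_rows(self.path)  # loaded on first access, then memoized
```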
## Related Documentation

- **Node.js Leaks**: [nodejs-memory-leak.md](nodejs-memory-leak.md)
- **DB Leaks**: [database-connection-leak.md](database-connection-leak.md)
- **Reference**: [../reference/profiling-tools.md](../reference/profiling-tools.md)
- **Templates**: [../templates/scalene-config.txt](../templates/scalene-config.txt)

---

Return to [examples index](INDEX.md)
75
skills/memory-profiling/reference/INDEX.md
Normal file
@@ -0,0 +1,75 @@
# Memory Profiler Reference

Quick reference guides for memory optimization patterns, profiling tools, and garbage collection.

## Reference Guides

### Memory Optimization Patterns

**File**: [memory-optimization-patterns.md](memory-optimization-patterns.md)

Comprehensive catalog of memory leak patterns and their fixes:
- **Event Listener Leaks**: EventEmitter cleanup, closure traps
- **Connection Pool Leaks**: Database connection management
- **Large Dataset Patterns**: Streaming, chunking, lazy evaluation
- **Cache Management**: LRU caches, WeakMap/WeakSet
- **Closure Memory Traps**: Variable capture, scope management

**Use when**: Quick lookup for a specific memory leak pattern

---

### Profiling Tools Comparison

**File**: [profiling-tools.md](profiling-tools.md)

Comparison matrix and usage guide for memory profiling tools:
- **Node.js**: Chrome DevTools, heapdump, memwatch-next, clinic.js
- **Python**: Scalene, memory_profiler, tracemalloc, py-spy
- **Monitoring**: Prometheus, Grafana, DataDog APM
- **Tool Selection**: When to use which tool

**Use when**: Choosing the right profiling tool for your stack

---

### Garbage Collection Guide

**File**: [garbage-collection-guide.md](garbage-collection-guide.md)

Understanding and tuning garbage collectors:
- **V8 (Node.js)**: Generational GC, heap structure, --max-old-space-size
- **Python**: Reference counting, generational GC, gc.collect()
- **GC Monitoring**: Metrics, alerts, optimization
- **GC Tuning**: When and how to tune

**Use when**: GC issues, tuning performance, understanding memory behavior

---

## Quick Lookup

**Common Patterns**:
- EventEmitter leak → [memory-optimization-patterns.md#event-listener-leaks](memory-optimization-patterns.md#event-listener-leaks)
- Connection leak → [memory-optimization-patterns.md#connection-pool-leaks](memory-optimization-patterns.md#connection-pool-leaks)
- Large dataset → [memory-optimization-patterns.md#large-dataset-patterns](memory-optimization-patterns.md#large-dataset-patterns)

**Tool Selection**:
- Node.js profiling → [profiling-tools.md#nodejs-tools](profiling-tools.md#nodejs-tools)
- Python profiling → [profiling-tools.md#python-tools](profiling-tools.md#python-tools)
- Production monitoring → [profiling-tools.md#monitoring-tools](profiling-tools.md#monitoring-tools)

**GC Issues**:
- Node.js heap → [garbage-collection-guide.md#v8-heap](garbage-collection-guide.md#v8-heap)
- Python GC → [garbage-collection-guide.md#python-gc](garbage-collection-guide.md#python-gc)
- GC metrics → [garbage-collection-guide.md#gc-monitoring](garbage-collection-guide.md#gc-monitoring)

## Related Documentation

- **Examples**: [Examples Index](../examples/INDEX.md) - Full walkthroughs
- **Templates**: [Templates Index](../templates/INDEX.md) - Memory report templates
- **Main Agent**: [memory-profiler.md](../memory-profiler.md) - Memory profiler agent

---

Return to [main agent](../memory-profiler.md)
392
skills/memory-profiling/reference/garbage-collection-guide.md
Normal file
@@ -0,0 +1,392 @@
# Garbage Collection Guide

Understanding and tuning garbage collectors in Node.js (V8) and Python for optimal memory management.

## V8 Garbage Collector (Node.js)

### Heap Structure

**Two Generations**:
```
┌─────────────────────────────────────────────────────────┐
│ V8 Heap                                                 │
├─────────────────────────────────────────────────────────┤
│ New Space (Young Generation) - 8MB-32MB                 │
│ ┌─────────────┬─────────────┐                           │
│ │ From-Space  │ To-Space    │ ← Minor GC (Scavenge)     │
│ └─────────────┴─────────────┘                           │
│                                                         │
│ Old Space (Old Generation) - Remaining heap             │
│ ┌──────────────────────────────────────┐                │
│ │ Long-lived objects                   │ ← Major GC     │
│ │ (survived 2+ Minor GCs)              │ (Mark-Sweep)   │
│ └──────────────────────────────────────┘                │
│                                                         │
│ Large Object Space - Objects >512KB                     │
└─────────────────────────────────────────────────────────┘
```

**GC Types**:
- **Scavenge (Minor GC)**: Fast (~1ms), clears new space, runs frequently
- **Mark-Sweep (Major GC)**: Slow (100-500ms), clears old space, runs when old space fills
- **Mark-Compact**: Like Mark-Sweep but also defragments memory

---

### Monitoring V8 GC

**Built-in GC Traces**:
```bash
# Enable GC logging
node --trace-gc server.js

# Output:
# [12345:0x104800000]    42 ms: Scavenge 8.5 (10.2) -> 7.8 (10.2) MB
# [12345:0x104800000]   123 ms: Mark-sweep 95.2 (100.5) -> 82.3 (100.5) MB
```

**Parse GC logs**:
```
[PID:address] time ms: GC-type before (heap) -> after (heap) MB

Scavenge   = Minor GC (young generation)
Mark-sweep = Major GC (old generation)
```
**Prometheus Metrics**:
```typescript
import { Gauge, Histogram } from 'prom-client';
import { PerformanceObserver } from 'perf_hooks';
import v8 from 'v8';

const heap_size = new Gauge({
  name: 'nodejs_heap_size_total_bytes',
  help: 'V8 total heap size'
});
const heap_used = new Gauge({
  name: 'nodejs_heap_used_bytes',
  help: 'V8 heap used bytes'
});
const gc_duration = new Histogram({
  name: 'nodejs_gc_duration_seconds',
  help: 'GC pause duration',
  labelNames: ['kind']
});

// Track GC events
// (on newer Node versions the GC kind lives at entry.detail.kind instead)
const obs = new PerformanceObserver((list) => {
  const entry = list.getEntries()[0];
  gc_duration.labels(String(entry.kind)).observe(entry.duration / 1000);
});
obs.observe({ entryTypes: ['gc'] });

// Update heap metrics every 10s
setInterval(() => {
  const stats = v8.getHeapStatistics();
  heap_size.set(stats.total_heap_size);
  heap_used.set(stats.used_heap_size);
}, 10000);
```

---
### V8 GC Tuning

**Heap Size Limits**:
```bash
# Default old-space limit: roughly 1.5–2GB on 64-bit systems (varies by Node version)
# Increase max heap size
node --max-old-space-size=4096 server.js  # 4GB heap

# For containers (set to 75% of container memory)
# 8GB container → --max-old-space-size=6144
```

**GC Optimization Flags**:
```bash
# Aggressive GC (lower memory, more CPU; --gc-interval is mainly a debugging flag)
node --optimize-for-size --gc-interval=100 server.js

# Optimize for throughput (higher memory, less CPU)
node --max-old-space-size=8192 server.js

# Expose GC to JavaScript
node --expose-gc server.js
# Then: global.gc() to force GC
```

**When to tune**:
- ✅ Container memory limits (set heap to 75% of the limit)
- ✅ Frequent Major GC causing latency spikes
- ✅ OOM errors with available memory
- ❌ Don't tune as a first step (fix leaks first!)

---
## Python Garbage Collector

### GC Mechanism

**Two Systems**:
1. **Reference Counting**: Primary mechanism, immediate cleanup when refcount = 0
2. **Generational GC**: Handles circular references (see the sketch after the diagram)

**Generational Structure**:
```
┌─────────────────────────────────────────────────────────┐
│ Python GC (Generational)                                │
├─────────────────────────────────────────────────────────┤
│ Generation 0 (Young) - Threshold: 700 objects           │
│ ├─ New objects                                          │
│ └─ Collected most frequently                            │
│                                                         │
│ Generation 1 (Middle) - Threshold: 10 collections       │
│ ├─ Survived 1 Gen0 collection                           │
│ └─ Collected less frequently                            │
│                                                         │
│ Generation 2 (Old) - Threshold: 10 collections          │
│ ├─ Survived Gen1 collection                             │
│ └─ Collected rarely                                     │
└─────────────────────────────────────────────────────────┘
```
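To make the two systems concrete, here is a minimal sketch (illustrative `Node` class) of an object freed immediately by reference counting versus a cycle that only the generational collector reclaims:

```python
import gc


class Node:
    def __init__(self, name: str):
        self.name = name
        self.other = None


# Reference counting: freed as soon as the last reference disappears
a = Node("a")
del a  # refcount hits 0; reclaimed immediately

# Cycle: refcounts never reach 0, so only the cycle detector can free it
b, c = Node("b"), Node("c")
b.other, c.other = c, b  # b and c reference each other
del b, c  # both still alive: each is kept at refcount 1 by the cycle

print(gc.collect())  # the generational GC finds and frees the cycle
```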
---

### Monitoring Python GC

**GC Statistics**:
```python
import gc

# Get GC stats
print(gc.get_stats())
# [{'collections': 42, 'collected': 123, 'uncollectable': 0}, ...]

# Get object count by generation
print(gc.get_count())
# (45, 3, 1) = (gen0, gen1, gen2) object counts

# Get thresholds
print(gc.get_threshold())
# (700, 10, 10) = collect when gen0 has 700 objects, etc.
```
**Track GC Pauses**:
```python
import gc
import time


class GCMonitor:
    """Single callback: gc invokes it with phase "start" or "stop"."""

    def __init__(self):
        self.start_time = None

    def __call__(self, phase, info):
        if phase == "start":
            self.start_time = time.time()
        elif self.start_time is not None:
            duration = time.time() - self.start_time
            print(f"GC gen{info['generation']}: {duration*1000:.1f}ms, "
                  f"collected {info['collected']}")


# Install the callback (entries in gc.callbacks are called on both phases)
gc.callbacks.append(GCMonitor())
```
**Prometheus Metrics**:
```python
import gc

from prometheus_client import Gauge, Histogram

gc_collections = Gauge('python_gc_collections_total', 'GC collections', ['generation'])
gc_collected = Gauge('python_gc_objects_collected_total', 'Objects collected', ['generation'])
gc_duration = Histogram('python_gc_duration_seconds', 'GC duration', ['generation'])


def record_gc_metrics():
    # Call periodically (e.g., from a background scheduler)
    stats = gc.get_stats()
    for gen, stat in enumerate(stats):
        gc_collections.labels(generation=gen).set(stat['collections'])
        gc_collected.labels(generation=gen).set(stat['collected'])
```

---
### Python GC Tuning

**Disable GC (for batch jobs)**:
```python
import gc

# Disable automatic GC
gc.disable()

# Process large dataset without GC pauses
for chunk in large_dataset:
    process(chunk)

# Manual GC at end
gc.collect()
```

**Adjust Thresholds**:
```python
import gc

# Default: (700, 10, 10)
# More aggressive: collect more often, lower memory
gc.set_threshold(400, 5, 5)

# Less aggressive: collect less often, higher memory but faster
gc.set_threshold(1000, 15, 15)
```

**Debug Circular References**:
```python
import gc

# Find objects that can't be collected
gc.set_debug(gc.DEBUG_SAVEALL)
gc.collect()

print(f"Uncollectable: {len(gc.garbage)}")
for obj in gc.garbage:
    print(type(obj), obj)
```

**When to tune**:
- ✅ Batch jobs: disable GC, manual collect at end
- ✅ Real-time systems: adjust thresholds to avoid long pauses
- ✅ Debugging: use `DEBUG_SAVEALL` to find leaks
- ❌ Don't disable GC in long-running services (memory will grow!)

---
## GC-Related Memory Issues

### Issue 1: Long GC Pauses

**Symptom**: Request latency spikes every few minutes

**V8 Fix**:
```bash
# Monitor GC pauses
node --trace-gc server.js 2>&1 | grep "Mark-sweep"

# If Major GC >500ms, increase heap size
node --max-old-space-size=4096 server.js
```

**Python Fix**:
```python
import gc
import threading
import time

# Disable GC during request handling
gc.disable()


# Periodic manual GC (in a background thread)
def periodic_gc():
    while True:
        time.sleep(60)
        gc.collect()


threading.Thread(target=periodic_gc, daemon=True).start()
```

---
### Issue 2: Frequent Minor GC

**Symptom**: High CPU from constant minor GC

**Cause**: Too many short-lived objects

**Fix**: Reduce allocations
```python
# ❌ BAD: Creates many temporary objects
def process_data(items):
    return [str(i) for i in items]  # New list + strings


# ✅ BETTER: Generator (no intermediate list)
def process_data(items):
    return (str(i) for i in items)
```

---
### Issue 3: Memory Not Released After GC

**Symptom**: Heap usage high even after GC

**V8 Cause**: Objects in old generation (major GC needed)
```bash
# Force full GC to reclaim memory
node --expose-gc server.js

# In code:
if (global.gc) global.gc();
```

**Python Cause**: Reference cycles
```python
# Debug reference cycles
import gc
import sys

# Find what's keeping the object alive
obj = my_object
print(sys.getrefcount(obj))  # Should be low

# Get referrers
print(gc.get_referrers(obj))
```

---
## GC Alerts (Prometheus)

```yaml
# Prometheus alert rules
groups:
  - name: gc_alerts
    rules:
      # V8: Major GC taking too long
      - alert: SlowMajorGC
        expr: nodejs_gc_duration_seconds{kind="major"} > 0.5
        for: 5m
        annotations:
          summary: "Major GC >500ms ({{ $value }}s)"

      # V8: High GC frequency
      - alert: FrequentGC
        expr: rate(nodejs_gc_duration_seconds_count[5m]) > 10
        for: 10m
        annotations:
          summary: "GC running >10x/min"

      # Python: High Gen2 collections
      - alert: FrequentFullGC
        expr: rate(python_gc_collections_total{generation="2"}[1h]) > 1
        for: 1h
        annotations:
          summary: "Full GC >1x/hour (potential leak)"
```

---
## Best Practices

### V8 (Node.js)

1. **Set heap size**: `--max-old-space-size` to 75% of container memory
2. **Monitor GC**: Track duration and frequency with Prometheus
3. **Alert on slow GC**: Major GC >500ms indicates the heap is too small or a memory leak
4. **Don't force GC**: Let V8 manage (except for tests/debugging)

### Python

1. **Use reference counting**: Most cleanup is automatic (refcount = 0)
2. **Avoid circular refs**: Use `weakref` for back-references (see the sketch below)
3. **Batch jobs**: Disable GC, manual `gc.collect()` at end
4. **Monitor Gen2**: Frequent Gen2 collections = potential leak
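As a sketch of the `weakref` advice in item 2 (hypothetical Parent/Child classes), a weakly held back-reference avoids the cycle entirely, so reference counting alone reclaims both objects:

```python
import weakref


class Child:
    def __init__(self):
        self.parent = None  # will hold a weakref.ref, not a strong reference


class Parent:
    def __init__(self):
        self.children = []

    def add(self, child: Child):
        child.parent = weakref.ref(self)  # weak back-reference: no cycle
        self.children.append(child)


p = Parent()
p.add(Child())
del p  # parent and child are freed by refcounting alone; no gc.collect() needed
```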
---

## Related Documentation

- **Patterns**: [memory-optimization-patterns.md](memory-optimization-patterns.md)
- **Tools**: [profiling-tools.md](profiling-tools.md)
- **Examples**: [Examples Index](../examples/INDEX.md)

---

Return to [reference index](INDEX.md)
371
skills/memory-profiling/reference/memory-optimization-patterns.md
Normal file
@@ -0,0 +1,371 @@
# Memory Optimization Patterns Reference

Quick reference catalog of common memory leak patterns and their fixes.

## Event Listener Leaks

### Pattern: EventEmitter Accumulation

**Symptom**: Memory grows linearly with time/requests
**Cause**: Event listeners added but never removed

**Vulnerable**:
```typescript
// ❌ LEAK: listener added on every call
class DataProcessor {
  private emitter = new EventEmitter();

  async process() {
    this.emitter.on('data', handler); // Never removed
  }
}
```

**Fixed**:
```typescript
// ✅ FIX 1: Remove listener
this.emitter.on('data', handler);
try { /* work */ } finally {
  this.emitter.removeListener('data', handler);
}

// ✅ FIX 2: Use once()
this.emitter.once('data', handler); // Auto-removed

// ✅ FIX 3: Use AbortController ({ signal } is supported by
// EventTarget#addEventListener and the events.on/once helpers,
// not by EventEmitter#on)
const controller = new AbortController();
eventTarget.addEventListener('data', handler, { signal: controller.signal });
controller.abort(); // Removes the listener
```

**Detection**:
```typescript
// Check listener count
console.log(emitter.listenerCount('data')); // Should be constant

// Monitor in production
process.on('warning', (warning) => {
  if (warning.name === 'MaxListenersExceededWarning') {
    console.error('Listener leak detected:', warning);
  }
});
```

---
## Closure Memory Traps

### Pattern: Captured Variables in Closures

**Symptom**: Memory not released after scope exits
**Cause**: Closure captures large variables

**Vulnerable**:
```typescript
// ❌ LEAK: Closure captures the entire 1GB buffer
function createHandler(largeBuffer: Buffer) {
  return function handler() {
    // Only uses buffer.length, but captures the entire buffer
    console.log(largeBuffer.length);
  };
}
```

**Fixed**:
```typescript
// ✅ FIX: Extract only what's needed
function createHandler(largeBuffer: Buffer) {
  const length = largeBuffer.length; // Extract value
  return function handler() {
    console.log(length); // Only captures a number, not the Buffer
  };
}
```

---
## Connection Pool Leaks

### Pattern: Unclosed Database Connections

**Symptom**: Pool exhaustion, connection timeouts
**Cause**: Connections acquired but not released

**Vulnerable**:
```python
# ❌ LEAK: Connection never closed on exception
def get_orders():
    conn = pool.acquire()
    orders = conn.execute("SELECT * FROM orders")
    return orders  # conn never released
```

**Fixed**:
```python
# ✅ FIX: Context manager guarantees cleanup
def get_orders():
    with pool.acquire() as conn:
        orders = conn.execute("SELECT * FROM orders")
        return orders  # conn auto-released
```
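If a pool exposes only `acquire()`/`release()` without native `with` support, `contextlib` can retrofit the same guarantee; a sketch assuming those two hypothetical methods:

```python
from contextlib import contextmanager


@contextmanager
def pooled_connection(pool):
    conn = pool.acquire()
    try:
        yield conn
    finally:
        pool.release(conn)  # released even if the query raises


# Usage mirrors the fixed example above:
# with pooled_connection(pool) as conn:
#     conn.execute("SELECT * FROM orders")
```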
---
## Large Dataset Patterns

### Pattern 1: Loading Entire File into Memory

**Vulnerable**:
```python
# ❌ LEAK: 10GB file → 20GB RAM
df = pd.read_csv("large.csv")
```

**Fixed**:
```python
# ✅ FIX: Chunking
for chunk in pd.read_csv("large.csv", chunksize=10000):
    process(chunk)  # Constant memory

# ✅ BETTER: Polars streaming
df = pl.scan_csv("large.csv").collect(streaming=True)
```

### Pattern 2: List Comprehension vs Generator

**Vulnerable**:
```python
# ❌ LEAK: Entire list in memory
result = [process(item) for item in huge_list]
```

**Fixed**:
```python
# ✅ FIX: Generator (lazy evaluation)
result = (process(item) for item in huge_list)
for item in result:
    use(item)  # Processes one at a time
```

---
## Cache Management

### Pattern: Unbounded Cache Growth

**Vulnerable**:
```typescript
// ❌ LEAK: Cache grows forever
const cache = new Map<string, Data>();

function getData(key: string) {
  if (!cache.has(key)) {
    cache.set(key, fetchData(key)); // Never evicted
  }
  return cache.get(key);
}
```

**Fixed**:
```typescript
// ✅ FIX 1: LRU cache with max size
import { LRUCache } from 'lru-cache';

const cache = new LRUCache<string, Data>({
  max: 1000,           // Max 1000 entries
  ttl: 1000 * 60 * 5   // 5 minute TTL
});

// ✅ FIX 2: WeakMap (auto-cleanup when key GC'd)
const cache = new WeakMap<object, Data>();
cache.set(key, data); // Auto-removed when key is GC'd
```
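The Python counterpart of FIX 1 ships with the standard library: `functools.lru_cache` bounds a memoization cache the same way (the `fetch_data` stub here is illustrative):

```python
from functools import lru_cache


def fetch_data(key: str) -> bytes:
    return key.encode()  # illustrative stand-in for the expensive lookup


@lru_cache(maxsize=1000)  # evicts least-recently-used entries beyond 1000
def get_data(key: str) -> bytes:
    return fetch_data(key)
```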
---
## Timer and Interval Leaks

### Pattern: Forgotten Timers

**Vulnerable**:
```typescript
// ❌ LEAK: Timer never cleared
class Component {
  startPolling() {
    setInterval(() => {
      this.fetchData(); // Keeps Component alive forever
    }, 1000);
  }
}
```

**Fixed**:
```typescript
// ✅ FIX: Clear timer on cleanup
class Component {
  private intervalId?: NodeJS.Timeout;

  startPolling() {
    this.intervalId = setInterval(() => {
      this.fetchData();
    }, 1000);
  }

  cleanup() {
    if (this.intervalId) {
      clearInterval(this.intervalId);
    }
  }
}
```

---
## Global Variable Accumulation

### Pattern: Growing Global Arrays

**Vulnerable**:
```typescript
// ❌ LEAK: Array grows forever
const logs: string[] = [];

function log(message: string) {
  logs.push(message); // Never cleared
}
```

**Fixed**:
```typescript
// ✅ FIX 1: Bounded array
const MAX_LOGS = 1000;
const logs: string[] = [];

function log(message: string) {
  logs.push(message);
  if (logs.length > MAX_LOGS) {
    logs.shift(); // Remove oldest
  }
}

// ✅ FIX 2: Circular buffer (e.g., via a ring-buffer package)
import { CircularBuffer } from 'circular-buffer';
const logs = new CircularBuffer<string>(1000);
```
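In Python, the same bound comes for free with `collections.deque`, which evicts the oldest entry automatically once `maxlen` is reached:

```python
from collections import deque

logs: deque[str] = deque(maxlen=1000)  # oldest entry dropped automatically


def log(message: str) -> None:
    logs.append(message)  # O(1); the buffer never grows past 1000 entries
```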
---
## String Concatenation

### Pattern: Repeated String Concatenation

**Vulnerable**:
```python
# ❌ LEAK: Creates a new string each iteration (O(n²))
result = ""
for item in items:
    result += str(item)  # New string allocation
```

**Fixed**:
```python
# ✅ FIX 1: Join
result = "".join(str(item) for item in items)

# ✅ FIX 2: StringIO
from io import StringIO

buffer = StringIO()
for item in items:
    buffer.write(str(item))
result = buffer.getvalue()
```

---
## React Component Leaks

### Pattern: setState After Unmount

**Vulnerable**:
```typescript
// ❌ LEAK: setState called after unmount
function Component() {
  const [data, setData] = useState(null);

  useEffect(() => {
    fetchData().then(setData); // If unmounted, causes a leak
  }, []);
}
```

**Fixed**:
```typescript
// ✅ FIX: Cleanup with AbortController
function Component() {
  const [data, setData] = useState(null);

  useEffect(() => {
    const controller = new AbortController();

    fetchData(controller.signal).then(setData);

    return () => controller.abort(); // Cleanup
  }, []);
}
```

---
## Detection Patterns

### Memory Leak Indicators

1. **Linear growth**: Memory usage increases linearly with time/requests (see the sketch after this list)
2. **Pool exhaustion**: Connection pool hits max size
3. **EventEmitter warnings**: "MaxListenersExceededWarning"
4. **GC pressure**: Frequent/long GC pauses
5. **OOM errors**: Process crashes with "JavaScript heap out of memory"
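Indicator 1 can be checked empirically from inside a Python process; a minimal sketch using the standard library's `tracemalloc` (the `watch_heap` name and thresholds are illustrative):

```python
import time
import tracemalloc


def watch_heap(samples: int = 6, interval_s: float = 10.0, max_growth_mb: float = 50.0):
    """Warn if traced allocations grow monotonically across samples."""
    tracemalloc.start()
    readings = []
    for _ in range(samples):
        current, _peak = tracemalloc.get_traced_memory()
        readings.append(current / 1024 / 1024)
        time.sleep(interval_s)
    tracemalloc.stop()

    monotone = all(b >= a for a, b in zip(readings, readings[1:]))
    growth = readings[-1] - readings[0]
    if monotone and growth > max_growth_mb:
        print(f"Possible leak: heap grew {growth:.1f} MB without ever shrinking")
```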
### Monitoring Metrics

```typescript
// Prometheus metrics for leak detection
const heap_used = new Gauge({
  name: 'nodejs_heap_used_bytes',
  help: 'V8 heap used bytes'
});

const event_listeners = new Gauge({
  name: 'event_listeners_total',
  help: 'Total event listeners',
  labelNames: ['event']
});

// Alert if heap grows >10% per hour
// Alert if listener count >100 for a single event
```

---
## Quick Fixes Checklist

- [ ] **Event listeners**: Use `once()` or `removeListener()`
- [ ] **Database connections**: Use context managers or `try/finally`
- [ ] **Large datasets**: Use chunking or streaming
- [ ] **Caches**: Implement LRU or WeakMap
- [ ] **Timers**: Clear with `clearInterval()` or `clearTimeout()`
- [ ] **Closures**: Extract values, avoid capturing large objects
- [ ] **React**: Cleanup in `useEffect()` return
- [ ] **Strings**: Use `join()` or `StringIO`, not `+=`

---

## Related Documentation

- **Examples**: [Examples Index](../examples/INDEX.md)
- **Tools**: [profiling-tools.md](profiling-tools.md)
- **GC**: [garbage-collection-guide.md](garbage-collection-guide.md)

---

Return to [reference index](INDEX.md)
407
skills/memory-profiling/reference/profiling-tools.md
Normal file
@@ -0,0 +1,407 @@
# Memory Profiling Tools Comparison

Quick reference for choosing and using memory profiling tools across Node.js, Python, and production monitoring.

## Node.js Tools

### Chrome DevTools (Built-in)

**Best for**: Interactive heap snapshot analysis, timeline profiling
**Cost**: Free (built into Node.js)

**Usage**:
```bash
# Start Node.js with inspector
node --inspect server.js

# Open chrome://inspect
# Click "Open dedicated DevTools for Node"
```

**Features**:
- Heap snapshots (memory state at a point in time)
- Timeline recording (allocations over time)
- Comparison view (find leaks by comparing snapshots)
- Retainer paths (why an object is not GC'd)

**When to use**:
- Development/staging environments
- Interactive debugging sessions
- Visual leak analysis

---
### heapdump (npm package)

**Best for**: Production heap snapshots without restarts
**Cost**: Free (npm package)

**Usage**:
```typescript
import heapdump from 'heapdump';

// Trigger snapshot on signal
process.on('SIGUSR2', () => {
  heapdump.writeSnapshot((err, filename) => {
    console.log('Heap dump written to', filename);
  });
});

// Write a snapshot on demand (e.g., from an OOM early-warning hook)
heapdump.writeSnapshot('./oom-' + Date.now() + '.heapsnapshot');
```

**When to use**:
- Production memory leak diagnosis
- Scheduled snapshots (daily/weekly)
- OOM analysis (capture before crash)

---
### clinic.js (Comprehensive Suite)

**Best for**: All-in-one performance profiling
**Cost**: Free (open source)

**Usage**:
```bash
# Install
npm install -g clinic

# Memory profiling
clinic heapprofiler -- node server.js

# Generates an interactive HTML report
```

**Features**:
- Heap profiler (memory allocations)
- Flame graphs (CPU + memory)
- Timeline visualization
- Automatic leak detection

**When to use**:
- Initial performance investigation
- Comprehensive profiling (CPU + memory)
- Team-friendly reports (HTML)

---
### memwatch-next

**Best for**: Real-time leak detection in production
**Cost**: Free (npm package, maintained as `@airbnb/node-memwatch`)

**Usage**:
```typescript
import memwatch from '@airbnb/node-memwatch';

memwatch.on('leak', (info) => {
  console.error('Memory leak detected:', info);
  // Alert, log, snapshot, etc.
});

memwatch.on('stats', (stats) => {
  console.log('GC stats:', stats);
});
```

**When to use**:
- Production leak monitoring
- Automatic alerting
- GC pressure tracking

---
## Python Tools

### Scalene (Line-by-Line Profiler)

**Best for**: Fast, detailed line-level Python profiling
**Cost**: Free (pip package)

**Usage**:
```bash
# Install
pip install scalene

# Profile script
scalene script.py

# Profile with pytest
scalene --cli --memory -m pytest tests/

# HTML report
scalene --html --outfile profile.html script.py
```

**Features**:
- Line-by-line memory allocation
- CPU profiling
- GPU profiling
- Native code vs Python time
- Memory timeline

**When to use**:
- Python memory optimization
- Line-level bottleneck identification
- pytest integration

---
### memory_profiler

**Best for**: Simple decorator-based profiling
**Cost**: Free (pip package)

**Usage**:
```python
from memory_profiler import profile


@profile
def my_function():
    a = [1] * (10 ** 6)
    b = [2] * (2 * 10 ** 7)
    return a + b


# Run with: python -m memory_profiler script.py
```

**When to use**:
- Quick function-level profiling
- Simple memory debugging
- Educational/learning

---
### tracemalloc (Built-in)

**Best for**: Production memory tracking without dependencies
**Cost**: Free (Python standard library)

**Usage**:
```python
import tracemalloc

tracemalloc.start()

# Your code here

current, peak = tracemalloc.get_traced_memory()
print(f"Current: {current / 1024 / 1024:.1f} MB")
print(f"Peak: {peak / 1024 / 1024:.1f} MB")

# Top allocations
snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('lineno')
for stat in top_stats[:10]:
    print(stat)

tracemalloc.stop()
```

**When to use**:
- Production environments (no external dependencies)
- Allocation tracking
- Top allocators identification

---
### py-spy (Sampling Profiler)

**Best for**: Low-overhead production profiling
**Cost**: Free (cargo/pip package)

**Usage**:
```bash
# Install
pip install py-spy

# Attach to a running process (no code changes!)
py-spy top --pid 12345

# Flame graph
py-spy record --pid 12345 --output profile.svg
```

**When to use**:
- Production profiling (minimal overhead)
- No code modification required
- Running process analysis

---
## Monitoring Tools

### Prometheus + Grafana

**Best for**: Production metrics and alerting
**Cost**: Free (open source)

**Metrics to track**:
```typescript
import { Gauge, Histogram } from 'prom-client';

// Heap usage
const heap_used = new Gauge({
  name: 'nodejs_heap_used_bytes',
  help: 'V8 heap used bytes'
});

// Memory allocation rate
const allocation_rate = new Gauge({
  name: 'memory_allocation_bytes_per_second',
  help: 'Memory allocation rate'
});

// Connection pool
const pool_active = new Gauge({
  name: 'db_pool_connections_active',
  help: 'Active database connections'
});
```

**Alerts**:
```yaml
# Prometheus alert rules
groups:
  - name: memory_alerts
    rules:
      - alert: MemoryLeak
        expr: increase(nodejs_heap_used_bytes[1h]) > 100000000  # +100MB/hour
        for: 6h
        annotations:
          summary: "Potential memory leak ({{ $value | humanize }} growth)"

      - alert: HeapNearLimit
        expr: nodejs_heap_used_bytes / nodejs_heap_size_total_bytes > 0.9
        for: 5m
        annotations:
          summary: "Heap usage >90%"
```

**When to use**:
- Production monitoring (all environments)
- Long-term trend analysis
- Automatic alerting

---
### DataDog APM

**Best for**: Comprehensive observability platform
**Cost**: Paid (starts at $15/host/month)

**Features**:
- Automatic heap tracking
- Memory leak detection
- Distributed tracing
- Alert management
- Dashboards

**When to use**:
- Enterprise environments
- Multi-service tracing
- Managed solution preferred

---
## Tool Selection Matrix

| Scenario | Node.js Tool | Python Tool | Monitoring |
|----------|-------------|-------------|------------|
| **Development debugging** | Chrome DevTools | Scalene | - |
| **Production leak** | heapdump | py-spy | Prometheus |
| **Line-level analysis** | clinic.js | Scalene | - |
| **Real-time monitoring** | memwatch-next | tracemalloc | Grafana |
| **Zero overhead** | - | py-spy | DataDog |
| **No dependencies** | Chrome DevTools | tracemalloc | - |
| **Team reports** | clinic.js | Scalene HTML | Grafana |

---
## Quick Start Commands

### Node.js

```bash
# Development: Chrome DevTools
node --inspect server.js

# Production: Heap snapshot
kill -USR2 <pid>  # If heapdump configured

# Comprehensive: clinic.js
clinic heapprofiler -- node server.js
```

### Python

```bash
# Line-by-line: Scalene
scalene --cli --memory script.py

# Quick profile: memory_profiler
python -m memory_profiler script.py

# Production: py-spy
py-spy top --pid <pid>
```

### Monitoring

```bash
# Prometheus metrics
curl http://localhost:9090/metrics | grep memory

# Grafana dashboard
# Import dashboard ID: 11159 (Node.js)
# Import dashboard ID: 7362 (Python)
```

---
## Tool Comparison Table

| Tool | Language | Type | Overhead | Production-Safe | Interactive |
|------|----------|------|----------|----------------|-------------|
| **Chrome DevTools** | Node.js | Heap snapshot | Low | No | Yes |
| **heapdump** | Node.js | Heap snapshot | Low | Yes | No |
| **clinic.js** | Node.js | Profiler | Medium | No | Yes |
| **memwatch-next** | Node.js | Real-time | Low | Yes | No |
| **Scalene** | Python | Profiler | Low | Staging | Yes |
| **memory_profiler** | Python | Decorator | Medium | No | No |
| **tracemalloc** | Python | Built-in | Low | Yes | No |
| **py-spy** | Python | Sampling | Very Low | Yes | No |
| **Prometheus** | Both | Metrics | Very Low | Yes | Yes (Grafana) |
| **DataDog** | Both | APM | Very Low | Yes | Yes |

---
## Best Practices

### Development Workflow

1. **Initial investigation**: Chrome DevTools (Node.js) or Scalene (Python)
2. **Line-level analysis**: clinic.js or Scalene with `--html`
3. **Root cause**: Heap snapshot comparison (DevTools)
4. **Validation**: Load testing with monitoring

### Production Workflow

1. **Detection**: Prometheus alerts (heap growth, pool exhaustion)
2. **Diagnosis**: heapdump snapshot or py-spy sampling
3. **Analysis**: Chrome DevTools (load snapshot) or Scalene (if reproducible in staging)
4. **Monitoring**: Grafana dashboards for trends

---

## Related Documentation

- **Patterns**: [memory-optimization-patterns.md](memory-optimization-patterns.md)
- **GC**: [garbage-collection-guide.md](garbage-collection-guide.md)
- **Examples**: [Examples Index](../examples/INDEX.md)

---

Return to [reference index](INDEX.md)
60
skills/memory-profiling/templates/INDEX.md
Normal file
@@ -0,0 +1,60 @@
# Memory Profiler Templates

Ready-to-use templates for memory profiling reports and heap snapshot analysis.

## Templates Overview

### Memory Investigation Report

**File**: [memory-report-template.md](memory-report-template.md)

Template for documenting memory leak investigations:
- **Incident Summary**: Timeline, symptoms, impact
- **Investigation Steps**: Tools used, findings
- **Root Cause**: Code analysis, leak pattern identified
- **Fix Implementation**: Code changes, validation
- **Results**: Before/after metrics

**Use when**: Documenting memory leak investigations for team/postmortems

---

### Heap Snapshot Analysis Checklist

**File**: [heap-snapshot-analysis.md](heap-snapshot-analysis.md)

Step-by-step checklist for analyzing V8 heap snapshots:
- **Snapshot Collection**: When/how to capture snapshots
- **Comparison Analysis**: Finding leaks by comparing snapshots
- **Retainer Analysis**: Understanding why objects are not GC'd
- **Common Patterns**: EventEmitter, closures, timers

**Use when**: Analyzing heap snapshots in Chrome DevTools

---

## Quick Usage

### Memory Report

1. Copy template: `cp templates/memory-report-template.md docs/investigations/memory-leak-YYYY-MM-DD.md`
2. Fill in sections as you investigate
3. Share with team for review

### Heap Analysis

1. Open template: `templates/heap-snapshot-analysis.md`
2. Follow the checklist step-by-step
3. Document findings in the memory report

---

## Related Documentation

- **Examples**: [Examples Index](../examples/INDEX.md) - Full investigation examples
- **Reference**: [Reference Index](../reference/INDEX.md) - Pattern catalog
- **Main Agent**: [memory-profiler.md](../memory-profiler.md) - Memory profiler agent

---

Return to [main agent](../memory-profiler.md)
322
skills/memory-profiling/templates/memory-report-template.md
Normal file
@@ -0,0 +1,322 @@
# Memory Leak Investigation Report

**Service**: [Service Name]
**Date**: [YYYY-MM-DD]
**Investigator**: [Your Name]
**Severity**: [Critical/High/Medium/Low]

---

## Executive Summary

**TL;DR**: [One-sentence summary of the leak, cause, and fix]

**Impact**:
- Memory growth: [X MB/hour or X% increase]
- OOM incidents: [Number of crashes]
- Affected users: [Number or percentage]
- Duration: [How long the leak existed]

**Resolution**:
- Root cause: [Leak pattern - e.g., "EventEmitter listeners not removed"]
- Fix deployed: [Date/time]
- Status: [Resolved/Monitoring/In Progress]

---

## Incident Timeline

| Time | Event | Details |
|------|-------|---------|
| [HH:MM] | Detection | [How was the leak detected? Alert, manual observation, etc.] |
| [HH:MM] | Investigation started | [Initial actions taken] |
| [HH:MM] | Root cause identified | [What was found] |
| [HH:MM] | Fix implemented | [Code changes made] |
| [HH:MM] | Fix deployed | [Deployment details] |
| [HH:MM] | Validation complete | [Confirmation that the leak is fixed] |

---
## Symptoms and Detection

### Initial Symptoms

- [ ] Linear memory growth (X MB/hour)
- [ ] OOM crashes (frequency: ___)
- [ ] GC pressure (frequent/long pauses)
- [ ] Connection pool exhaustion
- [ ] Service degradation (slow responses)
- [ ] Other: ___

### Detection Method

**How Discovered**: [Alert, monitoring dashboard, user report, etc.]

**Monitoring Data**:
```
Prometheus query: [Query used to detect the leak]
Alert rule: [Alert name/threshold]
Dashboard: [Link to Grafana dashboard]
```

**Example Metrics**:
```
Before:
- Heap usage baseline: X MB
- After 6 hours: Y MB
- Growth rate: Z MB/hour

Current:
- Heap usage: [Current value]
- Active connections: [Number]
- GC pause duration: [p95 value]
```

---
## Investigation Steps

### 1. Initial Data Collection

**Tools Used**:
- [ ] Chrome DevTools heap snapshots
- [ ] Node.js `--trace-gc` logs
- [ ] Python Scalene profiling
- [ ] Prometheus metrics
- [ ] Application logs
- [ ] Other: ___

**Heap Snapshots Collected**:
```
Snapshot 1: [timestamp] - [size] MB - [location/filename]
Snapshot 2: [timestamp] - [size] MB - [location/filename]
Snapshot 3: [timestamp] - [size] MB - [location/filename]
```

### 2. Snapshot Comparison Analysis

**Method**: [Comparison view in Chrome DevTools, diff analysis, etc.]

**Findings**:
```
Objects growing between snapshots:
- [Object type 1]: +X instances (+Y MB)
- [Object type 2]: +X instances (+Y MB)
- [Object type 3]: +X instances (+Y MB)

Top 3 memory consumers:
1. [Object type] - X MB - [Retainer path]
2. [Object type] - X MB - [Retainer path]
3. [Object type] - X MB - [Retainer path]
```

### 3. Retainer Path Analysis

**Leaked Object**: [Type of object that's leaking]

**Retainer Path**:
```
Window / Global
  → [Variable name]
    → [Object/function]
      → [Property]
        → [Leaked object]
```

**Why Not GC'd**: [Explanation of what's keeping the object alive]

---
## Root Cause Analysis

### Leak Pattern Identified

**Pattern**: [e.g., EventEmitter leak, closure trap, unclosed connection, etc.]

**Vulnerable Code** (before fix):
```typescript
// File: [filepath]:[line]
// [Brief explanation of why this leaks]

[Paste vulnerable code here]
```

**Why This Leaks**:
1. [Step 1 of how the leak occurs]
2. [Step 2]
3. [Result: memory accumulates]

### Reproduction Steps

1. [Step to reproduce the leak in dev/staging]
2. [Step 2]
3. [Observed result: memory growth]

**Reproduction Time**: [How long to observe the leak? Minutes/hours]

---
## Fix Implementation

### Code Changes

**Pull Request**: [Link to PR]

**Files Modified**:
- [file1.ts] - [Brief description of change]
- [file2.ts] - [Brief description of change]

**Fixed Code**:
```typescript
// File: [filepath]:[line]
// [Brief explanation of fix]

[Paste fixed code here]
```

**Fix Strategy**:
- [ ] Remove event listeners (use `removeListener()` or `once()`)
- [ ] Close connections (use context managers or `try/finally`)
- [ ] Clear timers (use `clearInterval()`/`clearTimeout()`)
- [ ] Use WeakMap/WeakSet (for cache)
- [ ] Implement generator/streaming (for large datasets)
- [ ] Other: ___

### Testing and Validation

**Tests Added**:
```typescript
// Test that verifies no leak
describe('Memory leak fix', () => {
  it('should not leak listeners', () => {
    const before = emitter.listenerCount('event');
    // ... execute code
    const after = emitter.listenerCount('event');
    expect(after).toBe(before); // No leak
  });
});
```

**Load Test Results**:
```
Before fix:
- Memory after 1000 requests: X MB
- Memory after 10000 requests: Y MB (growth)

After fix:
- Memory after 1000 requests: X MB
- Memory after 10000 requests: X MB (stable)
```

---
## Deployment and Results

### Deployment Details

**Environment**: [staging/production]
**Deployment Time**: [YYYY-MM-DD HH:MM UTC]
**Rollout Strategy**: [Canary, blue-green, rolling, etc.]

### Post-Deployment Metrics

**Before Fix**:
```
Memory baseline: X MB
Memory after 6h: Y MB
Growth rate: Z MB/hour
OOM incidents: N/week
```

**After Fix**:
```
Memory baseline: X MB
Memory after 6h: X MB (stable!)
Growth rate: 0 MB/hour
OOM incidents: 0/month
```

**Improvement**:
- Memory reduction: [X% or Y MB]
- OOM elimination: [100%]
- GC pressure: [Reduced by X%]

### Grafana Dashboard

**Link**: [Dashboard URL]

**Key Panels**:
- Heap usage trend: [Shows memory stable after fix]
- GC pause duration: [Shows improved GC behavior]
- Error rate: [Shows OOM errors eliminated]

---
## Lessons Learned

### What Went Well

- [Positive aspect 1]
- [Positive aspect 2]

### What Could Be Improved

- [Improvement area 1]
- [Improvement area 2]

### Preventive Measures

**Monitoring Added**:
- [ ] Alert: Memory growth >X MB/hour for >Y hours
- [ ] Alert: Heap usage >Z% of limit
- [ ] Dashboard: Memory trend visualization
- [ ] Alert: Connection pool saturation >X%

**Code Review Checklist Updated**:
- [ ] Event listeners properly cleaned up
- [ ] Database connections closed
- [ ] Timers/intervals cleared
- [ ] Large datasets processed with streaming/chunking

**Testing Standards**:
- [ ] Memory leak tests for event listeners
- [ ] Load tests with memory monitoring
- [ ] CI/CD checks for connection cleanup

---
## Related Documentation

- **Pattern Catalog**: [Link to memory-optimization-patterns.md]
- **Similar Incidents**: [Links to previous memory leak reports]
- **Runbook**: [Link to memory leak runbook]

---

## Appendix

### Heap Snapshot Files

- [snapshot1.heapsnapshot] - [Location/S3 URL]
- [snapshot2.heapsnapshot] - [Location/S3 URL]

### GC Logs

```
[Relevant GC log excerpts showing the leak]
```

### Prometheus Queries

```promql
# Memory growth rate
rate(nodejs_heap_used_bytes[1h])

# GC pause duration
histogram_quantile(0.95, rate(nodejs_gc_duration_seconds_bucket[5m]))
```

---

**Report Completed**: [YYYY-MM-DD]
**Next Review**: [Date for follow-up validation]