Initial commit

Zhongwei Li
2025-11-29 18:29:23 +08:00
commit ebc71f5387
37 changed files with 9382 additions and 0 deletions

# Memory Profiler Reference
Quick reference guides for memory optimization patterns, profiling tools, and garbage collection.
## Reference Guides
### Memory Optimization Patterns
**File**: [memory-optimization-patterns.md](memory-optimization-patterns.md)
Comprehensive catalog of memory leak patterns and their fixes:
- **Event Listener Leaks**: EventEmitter cleanup, closure traps
- **Connection Pool Leaks**: Database connection management
- **Large Dataset Patterns**: Streaming, chunking, lazy evaluation
- **Cache Management**: LRU caches, WeakMap/WeakSet
- **Closure Memory Traps**: Variable capture, scope management
**Use when**: Quick lookup for specific memory leak pattern
---
### Profiling Tools Comparison
**File**: [profiling-tools.md](profiling-tools.md)
Comparison matrix and usage guide for memory profiling tools:
- **Node.js**: Chrome DevTools, heapdump, memwatch-next, clinic.js
- **Python**: Scalene, memory_profiler, tracemalloc, py-spy
- **Monitoring**: Prometheus, Grafana, DataDog APM
- **Tool Selection**: When to use which tool
**Use when**: Choosing the right profiling tool for your stack
---
### Garbage Collection Guide
**File**: [garbage-collection-guide.md](garbage-collection-guide.md)
Understanding and tuning garbage collectors:
- **V8 (Node.js)**: Generational GC, heap structure, --max-old-space-size
- **Python**: Reference counting, generational GC, gc.collect()
- **GC Monitoring**: Metrics, alerts, optimization
- **GC Tuning**: When and how to tune
**Use when**: GC issues, tuning performance, understanding memory behavior
---
## Quick Lookup
**Common Patterns**:
- EventEmitter leak → [memory-optimization-patterns.md#event-listener-leaks](memory-optimization-patterns.md#event-listener-leaks)
- Connection leak → [memory-optimization-patterns.md#connection-pool-leaks](memory-optimization-patterns.md#connection-pool-leaks)
- Large dataset → [memory-optimization-patterns.md#large-dataset-patterns](memory-optimization-patterns.md#large-dataset-patterns)
**Tool Selection**:
- Node.js profiling → [profiling-tools.md#nodejs-tools](profiling-tools.md#nodejs-tools)
- Python profiling → [profiling-tools.md#python-tools](profiling-tools.md#python-tools)
- Production monitoring → [profiling-tools.md#monitoring-tools](profiling-tools.md#monitoring-tools)
**GC Issues**:
- Node.js heap → [garbage-collection-guide.md#v8-heap](garbage-collection-guide.md#v8-heap)
- Python GC → [garbage-collection-guide.md#python-gc](garbage-collection-guide.md#python-gc)
- GC metrics → [garbage-collection-guide.md#gc-monitoring](garbage-collection-guide.md#gc-monitoring)
## Related Documentation
- **Examples**: [Examples Index](../examples/INDEX.md) - Full walkthroughs
- **Templates**: [Templates Index](../templates/INDEX.md) - Memory report templates
- **Main Agent**: [memory-profiler.md](../memory-profiler.md) - Memory profiler agent
---
Return to [main agent](../memory-profiler.md)

# Garbage Collection Guide
Understanding and tuning garbage collectors in Node.js (V8) and Python for optimal memory management.
## V8 Garbage Collector (Node.js)
### Heap Structure
**Two Generations**:
```
┌─────────────────────────────────────────────────────────┐
│ V8 Heap │
├─────────────────────────────────────────────────────────┤
│ New Space (Young Generation) - 8MB-32MB │
│ ┌─────────────┬─────────────┐ │
│ │ From-Space │ To-Space │ ← Minor GC (Scavenge) │
│ └─────────────┴─────────────┘ │
│ │
│ Old Space (Old Generation) - Remaining heap │
│ ┌──────────────────────────────────────┐ │
│ │ Long-lived objects │ ← Major GC │
│ │ (survived 2+ Minor GCs) │ (Mark-Sweep)│
│ └──────────────────────────────────────┘ │
│ │
│ Large Object Space - Objects >512KB │
└─────────────────────────────────────────────────────────┘
```
**GC Types**:
- **Scavenge (Minor GC)**: Fast (~1ms), clears new space, runs frequently
- **Mark-Sweep (Major GC)**: Slower (tens to hundreds of ms on large heaps, though modern V8 runs much of it incrementally and concurrently), clears old space, runs when old space fills
- **Mark-Compact**: Like Mark-Sweep but also defragments memory
---
### Monitoring V8 GC
**Built-in GC Traces**:
```bash
# Enable GC logging
node --trace-gc server.js
# Output:
# [12345:0x104800000] 42 ms: Scavenge 8.5 (10.2) -> 7.8 (10.2) MB
# [12345:0x104800000] 123 ms: Mark-sweep 95.2 (100.5) -> 82.3 (100.5) MB
```
**Parse GC logs**:
```
[PID:address] time ms: GC-type before (heap) -> after (heap) MB
Scavenge = Minor GC (young generation)
Mark-sweep = Major GC (old generation)
```
**Prometheus Metrics**:
```typescript
import { Gauge, Histogram } from 'prom-client';
import { PerformanceObserver } from 'perf_hooks';
import v8 from 'v8';

const heap_size = new Gauge({ name: 'nodejs_heap_size_total_bytes', help: 'V8 total heap size' });
const heap_used = new Gauge({ name: 'nodejs_heap_used_bytes', help: 'V8 used heap size' });
const gc_duration = new Histogram({
  name: 'nodejs_gc_duration_seconds',
  help: 'GC pause duration',
  labelNames: ['kind']
});

// Track GC events (kind lives on entry.detail in Node >= 16, on entry.kind before)
const obs = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    const kind = (entry as any).detail?.kind ?? (entry as any).kind;
    gc_duration.labels(String(kind)).observe(entry.duration / 1000);
  }
});
obs.observe({ entryTypes: ['gc'] });

// Update heap metrics every 10s
setInterval(() => {
  const stats = v8.getHeapStatistics();
  heap_size.set(stats.total_heap_size);
  heap_used.set(stats.used_heap_size);
}, 10000);
```
---
### V8 GC Tuning
**Heap Size Limits**:
```bash
# Default old-space limit varies by Node.js version (roughly 1.5-4GB on 64-bit)
# Increase max heap size
node --max-old-space-size=4096 server.js  # 4GB old space
# For containers, leave headroom (set heap to ~75% of container memory)
# 8GB container → --max-old-space-size=6144
```
**GC Optimization Flags**:
```bash
# Aggressive GC: lower memory, more CPU (--gc-interval is primarily a testing flag)
node --optimize-for-size --gc-interval=100 server.js
# Optimize for throughput (higher memory, less CPU)
node --max-old-space-size=8192 server.js
# Expose GC to JavaScript
node --expose-gc server.js
# Then: global.gc() to force GC
```
**When to tune**:
- ✅ Container memory limits (set heap to 75% of limit)
- ✅ Frequent Major GC causing latency spikes
- ✅ OOM errors with available memory
- ❌ Don't tune as first step (fix leaks first!)
---
## Python Garbage Collector
### GC Mechanism
**Two Systems**:
1. **Reference Counting**: Primary mechanism, immediate cleanup when refcount = 0
2. **Generational GC**: Handles circular references
**Generational Structure**:
```
┌─────────────────────────────────────────────────────────┐
│ Python GC (Generational) │
├─────────────────────────────────────────────────────────┤
│ Generation 0 (Young) - Threshold: 700 objects │
│ ├─ New objects │
│ └─ Collected most frequently │
│ │
│ Generation 1 (Middle) - Threshold: 10 collections │
│ ├─ Survived 1 Gen0 collection │
│ └─ Collected less frequently │
│ │
│ Generation 2 (Old) - Threshold: 10 collections │
│ ├─ Survived Gen1 collection │
│ └─ Collected rarely │
└─────────────────────────────────────────────────────────┘
```
---
### Monitoring Python GC
**GC Statistics**:
```python
import gc
# Get GC stats
print(gc.get_stats())
# [{'collections': 42, 'collected': 123, 'uncollectable': 0}, ...]
# Get object count by generation
print(gc.get_count())
# (45, 3, 1) = (gen0, gen1, gen2) object counts
# Get thresholds
print(gc.get_threshold())
# (700, 10, 10) = collect when gen0 has 700 objects, etc.
```
**Track GC Pauses**:
```python
import gc
import time

class GCMonitor:
    """gc.callbacks invokes every callback for both phases ('start' and 'stop')."""
    def __init__(self):
        self.start_time = None

    def __call__(self, phase, info):
        if phase == "start":
            self.start_time = time.time()
        elif phase == "stop" and self.start_time is not None:
            duration = time.time() - self.start_time
            print(f"GC gen{info['generation']}: {duration*1000:.1f}ms, "
                  f"collected {info['collected']}")

# Install one callable that handles both phases
gc.callbacks.append(GCMonitor())
```
**Prometheus Metrics**:
```python
from prometheus_client import Gauge, Histogram
import gc
gc_collections = Gauge('python_gc_collections_total', 'GC collections', ['generation'])
gc_collected = Gauge('python_gc_objects_collected_total', 'Objects collected', ['generation'])
gc_duration = Histogram('python_gc_duration_seconds', 'GC duration', ['generation'])  # observe from a gc.callbacks timer
def record_gc_metrics():
stats = gc.get_stats()
for gen, stat in enumerate(stats):
gc_collections.labels(generation=gen).set(stat['collections'])
gc_collected.labels(generation=gen).set(stat['collected'])
```
---
### Python GC Tuning
**Disable GC (for batch jobs)**:
```python
import gc
# Disable automatic GC
gc.disable()
# Process large dataset without GC pauses
for chunk in large_dataset:
process(chunk)
# Manual GC at end
gc.collect()
```
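If an exception escapes mid-batch, the collector stays disabled and the earlier `gc.collect()` never runs. A `try/finally` guard (a small standard-library sketch, nothing project-specific) keeps the disable window bounded:

```python
import gc

gc.disable()
assert not gc.isenabled()
try:
    pass  # process chunks here without collector pauses
finally:
    gc.enable()   # always re-enable, even if processing raises
    gc.collect()  # one manual pass to clean up cycles created above
assert gc.isenabled()
```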
**Adjust Thresholds**:
```python
import gc
# Default: (700, 10, 10)
# More aggressive: collect more often, lower memory
gc.set_threshold(400, 5, 5)
# Less aggressive: collect less often, higher memory but faster
gc.set_threshold(1000, 15, 15)
```
**Debug Circular References**:
```python
import gc
# With DEBUG_SAVEALL, all unreachable objects are saved to gc.garbage
# instead of being freed, so they can be inspected
gc.set_debug(gc.DEBUG_SAVEALL)
gc.collect()
print(f"Saved garbage objects: {len(gc.garbage)}")
for obj in gc.garbage:
print(type(obj), obj)
```
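A minimal self-contained demonstration (standard library only) that the cycle collector reclaims objects reference counting alone cannot:

```python
import gc

class Node:
    pass

a, b = Node(), Node()
a.other, b.other = b, a   # reference cycle: refcounts never reach zero
del a, b                  # only the cycle keeps the pair alive now

gc.collect()              # the generational collector breaks the cycle
live = [o for o in gc.get_objects() if isinstance(o, Node)]
assert live == []         # both nodes were reclaimed
```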
**When to tune**:
- ✅ Batch jobs: disable GC, manual collect at end
- ✅ Real-time systems: adjust thresholds to avoid long pauses
- ✅ Debugging: use `DEBUG_SAVEALL` to find leaks
- ❌ Don't disable GC in long-running services (memory will grow!)
---
## GC-Related Memory Issues
### Issue 1: Long GC Pauses
**Symptom**: Request latency spikes every few minutes
**V8 Fix**:
```bash
# Monitor GC pauses
node --trace-gc server.js 2>&1 | grep "Mark-sweep"
# If Major GC >500ms, increase heap size
node --max-old-space-size=4096 server.js
```
**Python Fix**:
```python
# Disable automatic GC during request handling; safe only because a manual
# collection runs periodically, otherwise cyclic garbage accumulates
import gc
import threading
import time

gc.disable()

def periodic_gc():
    while True:
        time.sleep(60)
        gc.collect()

threading.Thread(target=periodic_gc, daemon=True).start()
```
---
### Issue 2: Frequent Minor GC
**Symptom**: High CPU from constant minor GC
**Cause**: Too many short-lived objects
**Fix**: Reduce allocations
```python
# ❌ BAD: Creates many temporary objects
def process_data(items):
return [str(i) for i in items] # New list + strings
# ✅ BETTER: Generator (no intermediate list)
def process_data(items):
return (str(i) for i in items)
```
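The difference is easy to observe with `tracemalloc` (standard library, Python 3.9+ for `reset_peak`; sizes here are illustrative):

```python
import tracemalloc

items = range(100_000)

tracemalloc.start()
data = [str(i) for i in items]                 # all 100k strings at once
_, peak_list = tracemalloc.get_traced_memory()
del data
tracemalloc.reset_peak()

total = sum(len(s) for s in (str(i) for i in items))  # one string at a time
_, peak_gen = tracemalloc.get_traced_memory()
tracemalloc.stop()

assert peak_gen < peak_list / 10  # the generator's peak is a small fraction
```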
---
### Issue 3: Memory Not Released After GC
**Symptom**: Heap usage high even after GC
**V8 Cause**: Objects in old generation (major GC needed)
```bash
# Force full GC to reclaim memory
node --expose-gc server.js
# In code:
if (global.gc) global.gc();
```
**Python Cause**: Reference cycles
```python
# Debug reference cycles
import gc
import sys
# Find what's keeping object alive
obj = my_object
print(sys.getrefcount(obj)) # Includes the temporary reference from the call itself
# Get referrers
print(gc.get_referrers(obj))
```
---
## GC Alerts (Prometheus)
```yaml
# Prometheus alert rules
groups:
  - name: gc_alerts
    rules:
      # V8: Major GC taking too long (mean pause over 5m; the metric is a histogram)
      - alert: SlowMajorGC
        expr: rate(nodejs_gc_duration_seconds_sum{kind="major"}[5m]) / rate(nodejs_gc_duration_seconds_count{kind="major"}[5m]) > 0.5
        for: 5m
        annotations:
          summary: "Major GC >500ms ({{ $value }}s)"
      # V8: High GC frequency (rate is per second, so scale to per minute)
      - alert: FrequentGC
        expr: rate(nodejs_gc_duration_seconds_count[5m]) * 60 > 10
        for: 10m
        annotations:
          summary: "GC running >10x/min"
      # Python: High Gen2 collections (delta, since the metric is a gauge)
      - alert: FrequentFullGC
        expr: delta(python_gc_collections_total{generation="2"}[1h]) > 1
        for: 1h
        annotations:
          summary: "Full GC >1x/hour (potential leak)"
```
---
## Best Practices
### V8 (Node.js)
1. **Set heap size**: `--max-old-space-size` to 75% of container memory
2. **Monitor GC**: Track duration and frequency with Prometheus
3. **Alert on slow GC**: Major GC >500ms indicates heap too small or memory leak
4. **Don't force GC**: Let V8 manage (except for tests/debugging)
### Python
1. **Use reference counting**: Most cleanup is automatic (refcount = 0)
2. **Avoid circular refs**: Use `weakref` for back-references
3. **Batch jobs**: Disable GC, manual `gc.collect()` at end
4. **Monitor Gen2**: Frequent Gen2 collections = potential leak
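Practice 2 above can be sketched with `weakref` (class names here are illustrative): the parent holds its children strongly, the child holds only a weak back-reference, so no cycle forms and refcounting frees the parent immediately.

```python
import weakref

class Child:
    def __init__(self, parent):
        self._parent = weakref.ref(parent)  # weak back-reference, no cycle

    @property
    def parent(self):
        return self._parent()  # None once the parent is gone

class Parent:
    def __init__(self):
        self.children = [Child(self)]  # strong forward reference

p = Parent()
c = p.children[0]
assert c.parent is p
del p
assert c.parent is None  # parent freed immediately by refcounting
```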
---
## Related Documentation
- **Patterns**: [memory-optimization-patterns.md](memory-optimization-patterns.md)
- **Tools**: [profiling-tools.md](profiling-tools.md)
- **Examples**: [Examples Index](../examples/INDEX.md)
---
Return to [reference index](INDEX.md)

# Memory Optimization Patterns Reference
Quick reference catalog of common memory leak patterns and their fixes.
## Event Listener Leaks
### Pattern: EventEmitter Accumulation
**Symptom**: Memory grows linearly with time/requests
**Cause**: Event listeners added but never removed
**Vulnerable**:
```typescript
// ❌ LEAK: listener added every call
class DataProcessor {
private emitter = new EventEmitter();
async process() {
this.emitter.on('data', handler); // Never removed
}
}
```
**Fixed**:
```typescript
// ✅ FIX 1: Remove listener
this.emitter.on('data', handler);
try { /* work */ } finally {
this.emitter.removeListener('data', handler);
}
// ✅ FIX 2: Use once()
this.emitter.once('data', handler); // Auto-removed
// ✅ FIX 3: Promise-based once() from node:events (supports AbortSignal;
// note EventEmitter#on does not accept a { signal } option)
import { once } from 'node:events';
const controller = new AbortController();
const dataPromise = once(this.emitter, 'data', { signal: controller.signal });
controller.abort(); // Rejects the promise and removes the listener
```
**Detection**:
```typescript
// Check listener count
console.log(emitter.listenerCount('data')); // Should be constant
// Monitor in production
process.on('warning', (warning) => {
if (warning.name === 'MaxListenersExceededWarning') {
console.error('Listener leak detected:', warning);
}
});
```
---
## Closure Memory Traps
### Pattern: Captured Variables in Closures
**Symptom**: Memory not released after scope exits
**Cause**: Closure captures large variables
**Vulnerable**:
```typescript
// ❌ LEAK: Closure captures entire 1GB buffer
function createHandler(largeBuffer: Buffer) {
return function handler() {
// Only uses buffer.length, but captures entire buffer
console.log(largeBuffer.length);
};
}
```
**Fixed**:
```typescript
// ✅ FIX: Extract only what's needed
function createHandler(largeBuffer: Buffer) {
const length = largeBuffer.length; // Extract value
return function handler() {
console.log(length); // Only captures number, not Buffer
};
}
```
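The same trap exists in Python. A small standard-library check of what a closure cell actually captures (function names are illustrative):

```python
def make_handler_bad(big):
    def handler():
        return len(big)        # closure cell holds the whole list
    return handler

def make_handler_good(big):
    n = len(big)               # extract the value first
    def handler():
        return n               # closure cell holds only an int
    return handler

big = list(range(100_000))
good = make_handler_good(big)
bad = make_handler_bad(big)
assert good() == bad() == 100_000
assert bad.__closure__[0].cell_contents is big    # entire list kept alive
assert good.__closure__[0].cell_contents == 100_000
```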
---
## Connection Pool Leaks
### Pattern: Unclosed Database Connections
**Symptom**: Pool exhaustion, connection timeouts
**Cause**: Connections acquired but not released
**Vulnerable**:
```python
# ❌ LEAK: Connection never closed on exception
def get_orders():
conn = pool.acquire()
orders = conn.execute("SELECT * FROM orders")
return orders # conn never released
```
**Fixed**:
```python
# ✅ FIX: Context manager guarantees cleanup
def get_orders():
with pool.acquire() as conn:
orders = conn.execute("SELECT * FROM orders")
return orders # conn auto-released
```
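What `pool.acquire()` must guarantee can be sketched with `contextlib` (a toy in-memory pool, not a real driver API): release happens in `finally`, so even an exception in the body returns the connection.

```python
import queue
from contextlib import contextmanager

class ToyPool:
    def __init__(self, size):
        self._idle = queue.Queue()
        for i in range(size):
            self._idle.put(f"conn-{i}")

    @contextmanager
    def acquire(self):
        conn = self._idle.get()
        try:
            yield conn
        finally:
            self._idle.put(conn)   # released even when the body raises

pool = ToyPool(size=2)
try:
    with pool.acquire():
        raise RuntimeError("query failed")
except RuntimeError:
    pass
assert pool._idle.qsize() == 2  # no connection leaked despite the exception
```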
---
## Large Dataset Patterns
### Pattern 1: Loading Entire File into Memory
**Vulnerable**:
```python
# ❌ BAD: loading a 10GB file eagerly can need 2x+ its size in RAM
df = pd.read_csv("large.csv")
```
**Fixed**:
```python
# ✅ FIX: Chunking
for chunk in pd.read_csv("large.csv", chunksize=10000):
process(chunk) # Constant memory
# ✅ BETTER: Polars streaming
df = pl.scan_csv("large.csv").collect(streaming=True)
```
### Pattern 2: List Comprehension vs Generator
**Vulnerable**:
```python
# ❌ BAD: entire result list materialized in memory
result = [process(item) for item in huge_list]
```
**Fixed**:
```python
# ✅ FIX: Generator (lazy evaluation)
result = (process(item) for item in huge_list)
for item in result:
use(item) # Processes one at a time
```
---
## Cache Management
### Pattern: Unbounded Cache Growth
**Vulnerable**:
```typescript
// ❌ LEAK: Cache grows forever
const cache = new Map<string, Data>();
function getData(key: string) {
if (!cache.has(key)) {
cache.set(key, fetchData(key)); // Never evicted
}
return cache.get(key);
}
```
**Fixed**:
```typescript
// ✅ FIX 1: LRU cache with max size
import { LRUCache } from 'lru-cache';
const cache = new LRUCache<string, Data>({
max: 1000, // Max 1000 entries
ttl: 1000 * 60 * 5 // 5 minute TTL
});
// ✅ FIX 2: WeakMap (auto-cleanup when key GC'd)
const cache = new WeakMap<object, Data>();
cache.set(key, data); // Auto-removed when key is GC'd
```
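Python's standard library ships the same bounded-cache idea; `functools.lru_cache` evicts least-recently-used entries once `maxsize` is reached (the fetch body here is a stand-in):

```python
from functools import lru_cache

@lru_cache(maxsize=1000)     # bounded: LRU entries evicted, never unbounded growth
def get_data(key):
    return key.upper()       # stand-in for an expensive fetch

get_data("a"); get_data("b"); get_data("a")
info = get_data.cache_info()
assert (info.hits, info.misses, info.maxsize) == (1, 2, 1000)
```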
---
## Timer and Interval Leaks
### Pattern: Forgotten Timers
**Vulnerable**:
```typescript
// ❌ LEAK: Timer never cleared
class Component {
startPolling() {
setInterval(() => {
this.fetchData(); // Keeps Component alive forever
}, 1000);
}
}
```
**Fixed**:
```typescript
// ✅ FIX: Clear timer on cleanup
class Component {
private intervalId?: NodeJS.Timeout;
startPolling() {
this.intervalId = setInterval(() => {
this.fetchData();
}, 1000);
}
cleanup() {
if (this.intervalId) {
clearInterval(this.intervalId);
}
}
}
```
---
## Global Variable Accumulation
### Pattern: Growing Global Arrays
**Vulnerable**:
```typescript
// ❌ LEAK: Array grows forever
const logs: string[] = [];
function log(message: string) {
logs.push(message); // Never cleared
}
```
**Fixed**:
```typescript
// ✅ FIX 1: Bounded array
const MAX_LOGS = 1000;
const logs: string[] = [];
function log(message: string) {
logs.push(message);
if (logs.length > MAX_LOGS) {
logs.shift(); // Remove oldest
}
}
// ✅ FIX 2: Circular (ring) buffer with fixed capacity, via a small
// third-party package or a hand-rolled array with a write index
import { CircularBuffer } from 'circular-buffer';
const logs = new CircularBuffer<string>(1000);
```
---
## String Concatenation
### Pattern: Repeated String Concatenation
**Vulnerable**:
```python
# ❌ LEAK: Creates new string each iteration (O(n²))
result = ""
for item in items:
result += str(item) # New string allocation
```
**Fixed**:
```python
# ✅ FIX 1: Join
result = "".join(str(item) for item in items)
# ✅ FIX 2: StringIO
from io import StringIO
buffer = StringIO()
for item in items:
buffer.write(str(item))
result = buffer.getvalue()
```
---
## React Component Leaks
### Pattern: setState After Unmount
**Vulnerable**:
```typescript
// ❌ LEAK: setState called after unmount
function Component() {
const [data, setData] = useState(null);
useEffect(() => {
fetchData().then(setData); // If unmounted, causes leak
}, []);
}
```
**Fixed**:
```typescript
// ✅ FIX: Cleanup with AbortController
function Component() {
const [data, setData] = useState(null);
useEffect(() => {
const controller = new AbortController();
fetchData(controller.signal).then(setData);
return () => controller.abort(); // Cleanup
}, []);
}
```
---
## Detection Patterns
### Memory Leak Indicators
1. **Linear growth**: Memory usage increases linearly with time/requests
2. **Pool exhaustion**: Connection pool hits max size
3. **EventEmitter warnings**: "MaxListenersExceededWarning"
4. **GC pressure**: Frequent/long GC pauses
5. **OOM errors**: Process crashes with "JavaScript heap out of memory"
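Indicator 1 (linear growth) can be checked mechanically from periodic heap samples. A rough heuristic sketch (function name and thresholds are illustrative, not from any library):

```python
def leak_suspected(heap_samples, min_growth_per_step=1_000_000):
    """heap_samples: heap-used bytes sampled at a fixed interval."""
    deltas = [b - a for a, b in zip(heap_samples, heap_samples[1:])]
    rising = sum(1 for d in deltas if d > 0)
    avg = sum(deltas) / len(deltas)
    # mostly-monotonic growth at a meaningful rate looks like a leak
    return rising >= 0.8 * len(deltas) and avg > min_growth_per_step

assert leak_suspected([100e6, 110e6, 121e6, 133e6, 146e6])      # steady climb
assert not leak_suspected([100e6, 130e6, 95e6, 128e6, 101e6])   # sawtooth (GC)
```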
### Monitoring Metrics
```typescript
// Prometheus metrics for leak detection
import { Gauge } from 'prom-client';
const heap_used = new Gauge({
name: 'nodejs_heap_used_bytes',
help: 'V8 heap used bytes'
});
const event_listeners = new Gauge({
name: 'event_listeners_total',
help: 'Total event listeners',
labelNames: ['event']
});
// Alert if heap grows >10% per hour
// Alert if listener count >100 for single event
```
---
## Quick Fixes Checklist
- [ ] **Event listeners**: Use `once()` or `removeListener()`
- [ ] **Database connections**: Use context managers or `try/finally`
- [ ] **Large datasets**: Use chunking or streaming
- [ ] **Caches**: Implement LRU or WeakMap
- [ ] **Timers**: Clear with `clearInterval()` or `clearTimeout()`
- [ ] **Closures**: Extract values, avoid capturing large objects
- [ ] **React**: Cleanup in `useEffect()` return
- [ ] **Strings**: Use `join()` or `StringIO`, not `+=`
---
## Related Documentation
- **Examples**: [Examples Index](../examples/INDEX.md)
- **Tools**: [profiling-tools.md](profiling-tools.md)
- **GC**: [garbage-collection-guide.md](garbage-collection-guide.md)
---
Return to [reference index](INDEX.md)

# Memory Profiling Tools Comparison
Quick reference for choosing and using memory profiling tools across Node.js, Python, and production monitoring.
## Node.js Tools
### Chrome DevTools (Built-in)
**Best for**: Interactive heap snapshot analysis, timeline profiling
**Cost**: Free (built into Node.js)
**Usage**:
```bash
# Start Node.js with inspector
node --inspect server.js
# Open chrome://inspect
# Click "Open dedicated DevTools for Node"
```
**Features**:
- Heap snapshots (memory state at point in time)
- Timeline recording (allocations over time)
- Comparison view (find leaks by comparing snapshots)
- Retainer paths (why object not GC'd)
**When to use**:
- Development/staging environments
- Interactive debugging sessions
- Visual leak analysis
---
### heapdump (npm package)
**Best for**: Production heap snapshots without restarts
**Cost**: Free (npm package)
**Usage**:
```typescript
import heapdump from 'heapdump';
// Trigger snapshot on signal
process.on('SIGUSR2', () => {
heapdump.writeSnapshot((err, filename) => {
console.log('Heap dump written to', filename);
});
});
// On-demand snapshot (e.g. from an admin endpoint); Node >= 11.13 also
// ships a built-in: require('v8').writeHeapSnapshot()
heapdump.writeSnapshot('./heap-' + Date.now() + '.heapsnapshot');
```
**When to use**:
- Production memory leak diagnosis
- Scheduled snapshots (daily/weekly)
- OOM analysis (capture before crash)
---
### clinic.js (Comprehensive Suite)
**Best for**: All-in-one performance profiling
**Cost**: Free (open source)
**Usage**:
```bash
# Install
npm install -g clinic
# Memory profiling
clinic heapprofiler -- node server.js
# Generates interactive HTML report
```
**Features**:
- Heap profiler (memory allocations)
- Flame graphs (CPU + memory)
- Timeline visualization
- Automatic leak detection
**When to use**:
- Initial performance investigation
- Comprehensive profiling (CPU + memory)
- Team-friendly reports (HTML)
---
### memwatch-next
**Best for**: Real-time leak detection in production
**Cost**: Free (npm package)
**Usage**:
```typescript
import memwatch from '@airbnb/node-memwatch';
memwatch.on('leak', (info) => {
console.error('Memory leak detected:', info);
// Alert, log, snapshot, etc.
});
memwatch.on('stats', (stats) => {
console.log('GC stats:', stats);
});
```
**When to use**:
- Production leak monitoring
- Automatic alerting
- GC pressure tracking
---
## Python Tools
### Scalene (Line-by-Line Profiler)
**Best for**: Fastest, most detailed Python profiler
**Cost**: Free (pip package)
**Usage**:
```bash
# Install
pip install scalene
# Profile script
scalene script.py
# Profile with pytest
scalene --cli --memory -m pytest tests/
# HTML report
scalene --html --outfile profile.html script.py
```
**Features**:
- Line-by-line memory allocation
- CPU profiling
- GPU profiling
- Native code vs Python time
- Memory timeline
**When to use**:
- Python memory optimization
- Line-level bottleneck identification
- pytest integration
---
### memory_profiler
**Best for**: Simple decorator-based profiling
**Cost**: Free (pip package)
**Usage**:
```python
from memory_profiler import profile
@profile
def my_function():
a = [1] * (10 ** 6)
b = [2] * (2 * 10 ** 7)
return a + b
# Run with: python -m memory_profiler script.py
```
**When to use**:
- Quick function-level profiling
- Simple memory debugging
- Educational/learning
---
### tracemalloc (Built-in)
**Best for**: Production memory tracking without dependencies
**Cost**: Free (Python standard library)
**Usage**:
```python
import tracemalloc
tracemalloc.start()
# Your code here
current, peak = tracemalloc.get_traced_memory()
print(f"Current: {current / 1024 / 1024:.1f} MB")
print(f"Peak: {peak / 1024 / 1024:.1f} MB")
# Top allocations
snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('lineno')
for stat in top_stats[:10]:
print(stat)
tracemalloc.stop()
```
**When to use**:
- Production environments (no external dependencies)
- Allocation tracking
- Top allocators identification
---
### py-spy (Sampling Profiler)
**Best for**: Zero-overhead production profiling
**Cost**: Free (cargo/pip package)
**Usage**:
```bash
# Install
pip install py-spy
# Attach to running process (no code changes!)
py-spy top --pid 12345
# Flame graph
py-spy record --pid 12345 --output profile.svg
```
**When to use**:
- Production profiling (minimal overhead)
- No code modification required
- Running process analysis
---
## Monitoring Tools
### Prometheus + Grafana
**Best for**: Production metrics and alerting
**Cost**: Free (open source)
**Metrics to track**:
```typescript
import { Gauge, Histogram } from 'prom-client';
// Heap usage
const heap_used = new Gauge({
name: 'nodejs_heap_used_bytes',
help: 'V8 heap used bytes'
});
// Memory allocation rate
const allocation_rate = new Gauge({
name: 'memory_allocation_bytes_per_second',
help: 'Memory allocation rate'
});
// Connection pool
const pool_active = new Gauge({
name: 'db_pool_connections_active',
help: 'Active database connections'
});
```
**Alerts**:
```yaml
# Prometheus alert rules
groups:
  - name: memory_alerts
    rules:
      - alert: MemoryLeak
        expr: delta(nodejs_heap_used_bytes[1h]) > 100000000  # +100MB/hour (delta, since the metric is a gauge)
        for: 6h
        annotations:
          summary: "Potential memory leak ({{ $value | humanize }} growth)"
      - alert: HeapNearLimit
        expr: nodejs_heap_used_bytes / nodejs_heap_size_total_bytes > 0.9
        for: 5m
        annotations:
          summary: "Heap usage >90%"
```
**When to use**:
- Production monitoring (all environments)
- Long-term trend analysis
- Automatic alerting
---
### DataDog APM
**Best for**: Comprehensive observability platform
**Cost**: Paid (starts $15/host/month)
**Features**:
- Automatic heap tracking
- Memory leak detection
- Distributed tracing
- Alert management
- Dashboards
**When to use**:
- Enterprise environments
- Multi-service tracing
- Managed solution preferred
---
## Tool Selection Matrix
| Scenario | Node.js Tool | Python Tool | Monitoring |
|----------|-------------|-------------|------------|
| **Development debugging** | Chrome DevTools | Scalene | - |
| **Production leak** | heapdump | py-spy | Prometheus |
| **Line-level analysis** | clinic.js | Scalene | - |
| **Real-time monitoring** | memwatch-next | tracemalloc | Grafana |
| **Zero overhead** | - | py-spy | DataDog |
| **No dependencies** | Chrome DevTools | tracemalloc | - |
| **Team reports** | clinic.js | Scalene HTML | Grafana |
---
## Quick Start Commands
### Node.js
```bash
# Development: Chrome DevTools
node --inspect server.js
# Production: Heap snapshot
kill -USR2 <pid> # If heapdump configured
# Comprehensive: clinic.js
clinic heapprofiler -- node server.js
```
### Python
```bash
# Line-by-line: Scalene
scalene --cli --memory script.py
# Quick profile: memory_profiler
python -m memory_profiler script.py
# Production: py-spy
py-spy top --pid <pid>
```
### Monitoring
```bash
# Prometheus metrics
curl http://localhost:9090/metrics | grep memory
# Grafana dashboard
# Import dashboard ID: 11159 (Node.js)
# Import dashboard ID: 7362 (Python)
```
---
## Tool Comparison Table
| Tool | Language | Type | Overhead | Production-Safe | Interactive |
|------|----------|------|----------|----------------|-------------|
| **Chrome DevTools** | Node.js | Heap snapshot | Low | No | Yes |
| **heapdump** | Node.js | Heap snapshot | Low | Yes | No |
| **clinic.js** | Node.js | Profiler | Medium | No | Yes |
| **memwatch-next** | Node.js | Real-time | Low | Yes | No |
| **Scalene** | Python | Profiler | Low | Staging | Yes |
| **memory_profiler** | Python | Decorator | Medium | No | No |
| **tracemalloc** | Python | Built-in | Low | Yes | No |
| **py-spy** | Python | Sampling | Very Low | Yes | No |
| **Prometheus** | Both | Metrics | Very Low | Yes | Yes (Grafana) |
| **DataDog** | Both | APM | Very Low | Yes | Yes |
---
## Best Practices
### Development Workflow
1. **Initial investigation**: Chrome DevTools (Node.js) or Scalene (Python)
2. **Line-level analysis**: clinic.js or Scalene with `--html`
3. **Root cause**: Heap snapshot comparison (DevTools)
4. **Validation**: Load testing with monitoring
### Production Workflow
1. **Detection**: Prometheus alerts (heap growth, pool exhaustion)
2. **Diagnosis**: heapdump snapshot or py-spy sampling
3. **Analysis**: Chrome DevTools (load snapshot) or Scalene (if reproducible in staging)
4. **Monitoring**: Grafana dashboards for trends
---
## Related Documentation
- **Patterns**: [memory-optimization-patterns.md](memory-optimization-patterns.md)
- **GC**: [garbage-collection-guide.md](garbage-collection-guide.md)
- **Examples**: [Examples Index](../examples/INDEX.md)
---
Return to [reference index](INDEX.md)