Initial commit

Zhongwei Li
2025-11-29 18:50:24 +08:00
commit f172746dc6
52 changed files with 17406 additions and 0 deletions


@@ -0,0 +1,211 @@
---
name: Optimizing Performance
description: Optimize performance with profiling, caching strategies, database query optimization, and bottleneck analysis. Use when improving response times, implementing caching layers, or scaling for high load.
---
# Optimizing Performance
I help you identify and fix performance bottlenecks using language-specific profiling tools, optimization patterns, and best practices.
## When to Use Me
**Performance analysis:**
- "Profile this code for bottlenecks"
- "Analyze performance issues"
- "Why is this slow?"
**Optimization:**
- "Optimize database queries"
- "Improve response time"
- "Reduce memory usage"
**Scaling:**
- "Implement caching strategy"
- "Optimize for high load"
- "Scale this service"
## How I Work - Progressive Loading
I load only the performance guidance relevant to your language:
```yaml
Language Detection:
"Python project" → Load @languages/PYTHON.md
"Rust project" → Load @languages/RUST.md
"JavaScript/Node.js" → Load @languages/JAVASCRIPT.md
"Go project" → Load @languages/GO.md
"Any language" → Load @languages/GENERIC.md
```
**Don't load all files!** Start with language detection, then load specific guidance.
## Core Principles
### 1. Measure First
**Never optimize without data.** Profile to find the actual bottlenecks; don't guess.
- Establish baseline metrics
- Profile to identify hot paths
- Focus on the 20% of code that takes 80% of the time
- Measure improvements after optimization
### 2. Performance Budgets
Set clear targets before optimizing:
```yaml
targets:
api_response: "<200ms (p95)"
page_load: "<2 seconds"
database_query: "<50ms (p95)"
cache_lookup: "<10ms"
```
### 3. Trade-offs
Balance performance vs:
- Code readability
- Maintainability
- Development time
- Memory usage
Premature optimization is the root of all evil. Optimize when:
- Profiling shows clear bottleneck
- Performance requirement not met
- User experience degraded
## Quick Wins (Language-Agnostic)
### Database
- Add indexes for frequently queried columns
- Implement connection pooling
- Use batch operations instead of loops (see the sketch below)
- Cache expensive query results
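A minimal sketch of the batch-operations point above, using Python's built-in `sqlite3` for illustration (the table and file names are hypothetical):
```python
import sqlite3

rows = [(i, f"event-{i}") for i in range(10_000)]

conn = sqlite3.connect("app.db")
conn.execute("CREATE TABLE IF NOT EXISTS events (id INTEGER, name TEXT)")

# Bad: one statement (and one round trip) per row
# for row in rows:
#     conn.execute("INSERT INTO events VALUES (?, ?)", row)

# Good: one batched statement, one commit
conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
conn.commit()
conn.close()
```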
### Caching
- Implement multi-level caching (L1: in-memory, L2: Redis, L3: database, L4: CDN)
- Define cache invalidation strategy
- Monitor cache hit rates
### Network
- Enable compression for responses
- Use HTTP/2 or HTTP/3
- Implement CDN for static assets
- Configure appropriate timeouts
## Language-Specific Guidance
### Python
**Load:** `@languages/PYTHON.md`
**Quick reference:**
- Profiling: `cProfile`, `py-spy`, `memory_profiler`
- Patterns: Generators, async/await, list comprehensions
- Anti-patterns: String concatenation in loops, GIL contention
- Tools: `pytest-benchmark`, `locust`
### Rust
**Load:** `@languages/RUST.md`
**Quick reference:**
- Profiling: `cargo bench`, `flamegraph`, `perf`
- Patterns: Zero-cost abstractions, iterator chains, preallocated collections
- Anti-patterns: Unnecessary allocations, large enum variants
- Tools: `criterion`, `rayon`, `parking_lot`
### JavaScript/Node.js
**Load:** `@languages/JAVASCRIPT.md`
**Quick reference:**
- Profiling: `clinic.js`, `0x`, Chrome DevTools
- Patterns: Event loop optimization, worker threads, streaming
- Anti-patterns: Blocking event loop, memory leaks, unnecessary re-renders
- Tools: `autocannon`, `react-window`, `p-limit`
### Go
**Load:** `@languages/GO.md`
**Quick reference:**
- Profiling: `pprof`, `go test -bench`, `go tool trace`
- Patterns: Goroutine pools, buffered channels, `sync.Pool`
- Anti-patterns: Unlimited goroutines, defer in loops, lock contention
- Tools: `benchstat`, `sync.Map`, `strings.Builder`
### Generic Patterns
**Load:** `@languages/GENERIC.md`
**When to use:** Database optimization, caching strategies, load balancing, monitoring - applicable to any language.
## Optimization Workflow
### Phase 1: Baseline
1. Define performance requirements
2. Measure current performance
3. Identify user-facing metrics (response time, throughput)
### Phase 2: Profile
1. Use language-specific profiling tools
2. Identify hot paths (where time is spent)
3. Find memory bottlenecks
4. Check for resource leaks
### Phase 3: Optimize
1. Focus on biggest bottleneck first
2. Apply language-specific optimizations
3. Implement caching where appropriate
4. Optimize database queries
### Phase 4: Verify
1. Re-profile to measure improvements
2. Run performance regression tests
3. Monitor in production
4. Set up alerts for degradation
## Common Bottlenecks
### Database
- Missing indexes
- N+1 query problem
- No connection pooling
- Expensive joins
**Load** `@languages/GENERIC.md` for DB optimization
### Memory
- Memory leaks
- Excessive allocations
- Large object graphs
- No pooling
**Load** language-specific file for memory management
### Network
- No compression
- Chatty API calls
- Synchronous external calls
- No CDN
**Load** `@languages/GENERIC.md` for network optimization
### Concurrency
- Lock contention
- Excessive threading/goroutines
- Blocking operations
- Poor work distribution
**Load** language-specific file for concurrency patterns
## Success Criteria
**Optimization complete when:**
- ✅ Performance targets met
- ✅ No regressions in functionality
- ✅ Code remains maintainable
- ✅ Improvements verified with profiling
- ✅ Production metrics show improvement
- ✅ Alerts configured for degradation
## Next Steps
- Use profiling tools to identify bottlenecks
- Load language-specific guidance
- Apply targeted optimizations
- Set up monitoring and alerts
---
*Load language-specific files for detailed profiling tools, optimization patterns, and best practices*


@@ -0,0 +1,426 @@
# Generic Performance Optimization
**Load this file when:** Optimizing performance in any language, or when you need language-agnostic patterns
## Universal Principles
### Measure First
- Never optimize without profiling
- Establish baseline metrics before changes
- Focus on bottlenecks, not micro-optimizations
- Use the 80/20 rule: 80% of the time is spent in 20% of the code
### Performance Budgets
```yaml
response_time_targets:
api_endpoint: "<200ms (p95)"
page_load: "<2 seconds"
database_query: "<50ms (p95)"
cache_lookup: "<10ms"
resource_limits:
max_memory: "512MB per process"
max_cpu: "80% sustained"
max_connections: "100 per instance"
```
## Database Optimization
### Indexing Strategy
```sql
-- Identify slow queries
-- PostgreSQL
SELECT query, mean_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
-- Add indexes for frequently queried columns
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_orders_user_created ON orders(user_id, created_at);
-- Composite indexes for common query patterns
CREATE INDEX idx_search ON products(category, price, created_at);
```
### Query Optimization
```sql
-- Use EXPLAIN to understand query plans
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'user@example.com';
-- Avoid SELECT *
-- Bad
SELECT * FROM users;
-- Good
SELECT id, name, email FROM users;
-- Use LIMIT for pagination
SELECT id, name FROM users ORDER BY created_at DESC LIMIT 20 OFFSET 0;
-- Use EXISTS instead of COUNT for checking existence
-- Bad
SELECT COUNT(*) FROM orders WHERE user_id = 123;
-- Good
SELECT EXISTS(SELECT 1 FROM orders WHERE user_id = 123);
```
### Connection Pooling
```yaml
connection_pool_config:
min_connections: 5
max_connections: 20
connection_timeout: 30s
idle_timeout: 10m
max_lifetime: 1h
```
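A hedged sketch of wiring that configuration into an application-level pool, using SQLAlchemy for illustration (the DSN is a placeholder; parameter names follow SQLAlchemy's engine options):
```python
from sqlalchemy import create_engine, text

engine = create_engine(
    "postgresql+psycopg2://user:pass@localhost/app",  # placeholder DSN
    pool_size=5,         # connections kept open (min_connections)
    max_overflow=15,     # extra connections under load (total of 20)
    pool_timeout=30,     # seconds to wait for a free connection
    pool_recycle=3600,   # recycle connections after 1 hour (max_lifetime)
    pool_pre_ping=True,  # detect and replace dead connections
)

with engine.connect() as conn:
    conn.execute(text("SELECT 1"))
```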
## Caching Strategies
### Multi-Level Caching
```yaml
caching_layers:
L1_application:
type: "In-Memory (LRU)"
size: "100MB"
ttl: "5 minutes"
use_case: "Hot data, session data"
L2_distributed:
type: "Redis"
ttl: "1 hour"
use_case: "Shared data across instances"
L3_database:
type: "Query Result Cache"
ttl: "15 minutes"
use_case: "Expensive query results"
L4_cdn:
type: "CDN"
ttl: "24 hours"
use_case: "Static assets, public API responses"
```
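A read-path sketch of the L1/L2 layering above (Python, with `cachetools` and `redis` as assumed dependencies; `query_database` stands in for the real L3 lookup):
```python
import json
import redis
from cachetools import LRUCache

l1 = LRUCache(maxsize=10_000)                    # L1: per-process, hot data
l2 = redis.Redis(host="localhost", port=6379)    # L2: shared across instances

def query_database(key):
    return {"key": key}                          # stand-in for the real query

def get(key):
    if key in l1:
        return l1[key]
    cached = l2.get(key)
    if cached is not None:
        value = json.loads(cached)
    else:
        value = query_database(key)              # L3: source of truth
        l2.setex(key, 3600, json.dumps(value))   # 1-hour TTL in Redis
    l1[key] = value                              # promote into L1
    return value
```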
### Cache Invalidation Patterns
```yaml
strategies:
time_based:
description: "TTL-based expiration"
use_case: "Data with predictable change patterns"
example: "Weather data, stock prices"
event_based:
description: "Invalidate on data change events"
use_case: "Real-time consistency required"
example: "User profile updates"
write_through:
description: "Update cache on write"
use_case: "Strong consistency needed"
example: "Shopping cart, user sessions"
lazy_refresh:
description: "Refresh on cache miss"
use_case: "Acceptable stale data"
example: "Analytics dashboards"
```
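Two of these strategies in a small Python sketch (time-based expiry plus write-through; `cachetools` is an assumed dependency and the `_db` dict stands in for the real data store):
```python
from cachetools import TTLCache

user_cache = TTLCache(maxsize=10_000, ttl=300)  # time-based: entries expire after 5 minutes
_db = {}                                        # stand-in for the source of truth

def get_user(user_id):
    if user_id not in user_cache:
        user_cache[user_id] = _db.get(user_id)  # lazy refresh on a miss
    return user_cache[user_id]

def update_user(user_id, data):
    _db[user_id] = data                         # write to the source of truth first
    user_cache[user_id] = data                  # write-through: keep the cache consistent
```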
## Network Optimization
### HTTP/2 and HTTP/3
```yaml
benefits:
- Multiplexing: Multiple requests over single connection
- Header compression: Reduced overhead
- Server push: Proactive resource sending
- Binary protocol: Faster parsing
```
### Compression
```yaml
compression_config:
enabled: true
min_size: "1KB" # Don't compress tiny responses
types:
- "text/html"
- "text/css"
- "application/javascript"
- "application/json"
level: 6 # Balance speed vs size
```
### Connection Management
```yaml
keep_alive:
enabled: true
timeout: "60s"
max_requests: 100
timeouts:
connect: "10s"
read: "30s"
write: "30s"
idle: "120s"
```
## Monitoring and Observability
### Key Metrics to Track
```yaml
application_metrics:
- response_time_p50
- response_time_p95
- response_time_p99
- error_rate
- throughput_rps
system_metrics:
- cpu_utilization
- memory_utilization
- disk_io
- network_io
database_metrics:
- query_execution_time
- connection_pool_usage
- slow_query_count
- cache_hit_rate
```
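For illustration, a small Python sketch of deriving the percentile metrics above from raw latency samples (a metrics library such as Prometheus normally does this for you; the samples here are synthetic):
```python
import random
import statistics

samples_ms = [random.gauss(120, 40) for _ in range(1_000)]  # synthetic response times

q = statistics.quantiles(samples_ms, n=100)  # 99 percentile cut points
p50, p95, p99 = q[49], q[94], q[98]
print(f"p50={p50:.0f}ms  p95={p95:.0f}ms  p99={p99:.0f}ms")
```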
### Alert Thresholds
```yaml
alerts:
critical:
- metric: "error_rate"
threshold: ">5%"
duration: "2 minutes"
- metric: "response_time_p99"
threshold: ">1000ms"
duration: "5 minutes"
warning:
- metric: "cpu_utilization"
threshold: ">80%"
duration: "10 minutes"
- metric: "memory_utilization"
threshold: ">85%"
duration: "5 minutes"
```
## Load Balancing
### Strategies
```yaml
round_robin:
description: "Distribute requests evenly"
use_case: "Homogeneous backend servers"
least_connections:
description: "Route to server with fewest connections"
use_case: "Varying request processing times"
ip_hash:
description: "Consistent routing based on client IP"
use_case: "Session affinity required"
weighted:
description: "Route based on server capacity"
use_case: "Heterogeneous server specs"
```
### Health Checks
```yaml
health_check:
interval: "10s"
timeout: "5s"
unhealthy_threshold: 3
healthy_threshold: 2
path: "/health"
```
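A hedged sketch of the `/health` endpoint those checks poll, using Flask for illustration (the dependency probes are placeholders):
```python
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/health")
def health():
    checks = {"database": True, "cache": True}  # replace with real, cheap probes
    healthy = all(checks.values())
    status = 200 if healthy else 503
    return jsonify(status="ok" if healthy else "degraded", checks=checks), status

if __name__ == "__main__":
    app.run(port=8080)
```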
## CDN Configuration
### Caching Rules
```yaml
static_assets:
pattern: "*.{js,css,png,jpg,svg,woff2}"
cache_control: "public, max-age=31536000, immutable"
api_responses:
pattern: "/api/public/*"
cache_control: "public, max-age=300, s-maxage=600"
html_pages:
pattern: "*.html"
cache_control: "public, max-age=60, s-maxage=300"
```
### Geographic Distribution
```yaml
regions:
- us-east: "Primary"
- us-west: "Failover"
- eu-west: "Regional"
- ap-southeast: "Regional"
routing:
policy: "latency-based"
fallback: "round-robin"
```
## Horizontal Scaling Patterns
### Stateless Services
```yaml
principles:
- No local state storage
- Session data in external store (Redis, database)
- Any instance can handle any request
- Easy to add/remove instances
```
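A minimal sketch of keeping session state out of the process so any instance can serve any request (Python, with `redis` as an assumed dependency):
```python
import json
import redis

r = redis.Redis(host="localhost", port=6379)

def save_session(session_id, data, ttl_seconds=1800):
    # Session lives in Redis with a TTL, not in local process memory
    r.setex(f"session:{session_id}", ttl_seconds, json.dumps(data))

def load_session(session_id):
    raw = r.get(f"session:{session_id}")
    return json.loads(raw) if raw else None
```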
### Message Queues
```yaml
use_cases:
- Decouple services
- Handle traffic spikes
- Async processing
- Retry logic
patterns:
work_queue:
description: "Distribute tasks to workers"
example: "Image processing, email sending"
pub_sub:
description: "Event broadcasting"
example: "User registration notifications"
```
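An in-process sketch of the work-queue pattern using Python's standard library (in production a broker such as RabbitMQ or SQS plays the role of the queue):
```python
import queue
import threading

tasks = queue.Queue()

def worker():
    while True:
        task = tasks.get()
        if task is None:          # sentinel: shut this worker down
            break
        print(f"processing {task}")
        tasks.task_done()

workers = [threading.Thread(target=worker) for _ in range(4)]
for w in workers:
    w.start()

for i in range(20):
    tasks.put(i)

tasks.join()                      # wait until every task is processed
for _ in workers:
    tasks.put(None)
```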
## Anti-Patterns to Avoid
### N+1 Query Problem
```sql
-- Bad: N+1 queries (1 for users + N for profiles)
SELECT * FROM users;
-- Then for each user:
SELECT * FROM profiles WHERE user_id = ?;
-- Good: Single join query
SELECT u.*, p.*
FROM users u
LEFT JOIN profiles p ON u.id = p.user_id;
```
### Chatty Interfaces
```yaml
bad:
requests: 100
description: "100 separate API calls to get data"
latency: "100 * 50ms = 5000ms"
good:
requests: 1
description: "Single batch API call"
latency: "200ms"
```
### Synchronous External Calls
```yaml
bad:
pattern: "Sequential blocking calls"
time: "call1 (500ms) + call2 (500ms) + call3 (500ms) = 1500ms"
good:
pattern: "Parallel async calls"
time: "max(call1, call2, call3) = 500ms"
```
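The same contrast in a short Python sketch with asyncio (`aiohttp` is an assumed dependency and the URLs are placeholders); total latency is roughly the slowest call rather than the sum:
```python
import asyncio
import aiohttp

URLS = ["https://svc-a/api", "https://svc-b/api", "https://svc-c/api"]  # placeholders

async def fetch(session, url):
    async with session.get(url) as resp:
        return await resp.json()

async def main():
    timeout = aiohttp.ClientTimeout(total=5)
    async with aiohttp.ClientSession(timeout=timeout) as session:
        # All three calls run concurrently instead of back to back
        return await asyncio.gather(*(fetch(session, u) for u in URLS))

results = asyncio.run(main())
```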
## Performance Testing Strategy
### Load Testing
```yaml
scenarios:
smoke_test:
users: 1
duration: "1 minute"
purpose: "Verify system works"
load_test:
users: "normal_traffic"
duration: "15 minutes"
purpose: "Performance under normal load"
stress_test:
users: "2x_normal"
duration: "30 minutes"
purpose: "Find breaking point"
spike_test:
users: "0 → 1000 → 0"
duration: "10 minutes"
purpose: "Handle sudden traffic spikes"
endurance_test:
users: "normal_traffic"
duration: "24 hours"
purpose: "Memory leaks, degradation"
```
### Performance Regression Tests
```yaml
approach:
- Baseline metrics from production
- Run automated perf tests in CI
- Compare against baseline
- Fail build if regression > threshold
thresholds:
response_time: "+10%"
throughput: "-5%"
error_rate: "+1%"
```
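A rough sketch of such a CI gate in Python (metric names, file paths, and the relative-change reading of the thresholds are all illustrative):
```python
import json
import sys

THRESHOLDS = {"response_time_ms": 0.10, "throughput_rps": -0.05, "error_rate": 0.01}

with open("baseline.json") as f:
    baseline = json.load(f)
with open("current.json") as f:
    current = json.load(f)

failures = []
for metric, limit in THRESHOLDS.items():
    delta = (current[metric] - baseline[metric]) / baseline[metric]
    # Positive limits cap increases (latency, errors); negative limits cap drops (throughput)
    if (limit >= 0 and delta > limit) or (limit < 0 and delta < limit):
        failures.append(f"{metric}: {delta:+.1%} (limit {limit:+.0%})")

if failures:
    sys.exit("Performance regression detected:\n" + "\n".join(failures))
print("No performance regressions detected")
```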
## Checklist
**Initial Assessment:**
- [ ] Identify performance requirements
- [ ] Establish current baseline metrics
- [ ] Profile to find bottlenecks
**Database Optimization:**
- [ ] Add indexes for common queries
- [ ] Implement connection pooling
- [ ] Cache query results
- [ ] Use batch operations
**Caching:**
- [ ] Implement multi-level caching
- [ ] Define cache invalidation strategy
- [ ] Monitor cache hit rates
**Network:**
- [ ] Enable compression
- [ ] Use HTTP/2 or HTTP/3
- [ ] Implement CDN for static assets
- [ ] Configure appropriate timeouts
**Monitoring:**
- [ ] Track key performance metrics
- [ ] Set up alerts for anomalies
- [ ] Implement distributed tracing
- [ ] Create performance dashboards
**Testing:**
- [ ] Run load tests
- [ ] Conduct stress tests
- [ ] Set up performance regression tests
- [ ] Monitor in production
---
*Language-agnostic performance optimization patterns applicable to any technology stack*


@@ -0,0 +1,433 @@
# Go Performance Optimization
**Load this file when:** Optimizing performance in Go projects
## Profiling Tools
### Built-in pprof
```bash
# CPU profiling
go test -cpuprofile=cpu.prof -bench=.
go tool pprof cpu.prof
# Memory profiling
go test -memprofile=mem.prof -bench=.
go tool pprof mem.prof
# Web UI for profiles
go tool pprof -http=:8080 cpu.prof
# Goroutine profiling
go tool pprof http://localhost:6060/debug/pprof/goroutine
# Heap profiling
go tool pprof http://localhost:6060/debug/pprof/heap
```
### Benchmarking
```go
// Basic benchmark
func BenchmarkFibonacci(b *testing.B) {
for i := 0; i < b.N; i++ {
fibonacci(20)
}
}
// With sub-benchmarks
func BenchmarkSizes(b *testing.B) {
sizes := []int{10, 100, 1000}
for _, size := range sizes {
b.Run(fmt.Sprintf("size=%d", size), func(b *testing.B) {
for i := 0; i < b.N; i++ {
process(size)
}
})
}
}
// Reset timer for setup
func BenchmarkWithSetup(b *testing.B) {
data := setupExpensiveData()
b.ResetTimer() // Don't count setup time
for i := 0; i < b.N; i++ {
process(data)
}
}
```
### Runtime Metrics
```go
import (
	"fmt"
	"net/http"
	_ "net/http/pprof" // Import for side effects: registers /debug/pprof handlers
	"runtime"
)
func init() {
// Enable profiling endpoint
go func() {
http.ListenAndServe("localhost:6060", nil)
}()
}
// Monitor goroutines
func printStats() {
fmt.Printf("Goroutines: %d\n", runtime.NumGoroutine())
var m runtime.MemStats
runtime.ReadMemStats(&m)
fmt.Printf("Alloc: %d MB\n", m.Alloc/1024/1024)
fmt.Printf("TotalAlloc: %d MB\n", m.TotalAlloc/1024/1024)
}
```
## Memory Management
### Avoiding Allocations
```go
// Bad: Allocates on every call
func process(data []byte) []byte {
result := make([]byte, len(data)) // New allocation
copy(result, data)
return result
}
// Good: Reuse buffer
var bufferPool = sync.Pool{
New: func() interface{} {
return make([]byte, 1024)
},
}
func process(data []byte) {
buf := bufferPool.Get().([]byte)
defer bufferPool.Put(buf)
// Process with buf
}
```
### Preallocate Slices
```go
// Bad: Multiple allocations as slice grows
items := []Item{}
for i := 0; i < 1000; i++ {
items = append(items, Item{i}) // Reallocates when cap exceeded
}
// Good: Single allocation
items := make([]Item, 0, 1000)
for i := 0; i < 1000; i++ {
items = append(items, Item{i}) // No reallocation
}
// Or if final size is known
items := make([]Item, 1000)
for i := 0; i < 1000; i++ {
items[i] = Item{i}
}
```
### String vs []byte
```go
// Bad: String concatenation allocates
var result string
for _, s := range strings {
result += s // New allocation each time
}
// Good: Use strings.Builder
var builder strings.Builder
builder.Grow(estimatedSize) // Preallocate
for _, s := range strings {
builder.WriteString(s)
}
result := builder.String()
// For byte operations, work with []byte
data := []byte("hello")
data = append(data, " world"...) // Efficient
```
## Goroutine Optimization
### Worker Pool Pattern
```go
// Bad: Unlimited goroutines
for _, task := range tasks {
go process(task) // Could spawn millions!
}
// Good: Limited worker pool
func workerPool(tasks <-chan Task, workers int) {
var wg sync.WaitGroup
for i := 0; i < workers; i++ {
wg.Add(1)
go func() {
defer wg.Done()
for task := range tasks {
process(task)
}
}()
}
wg.Wait()
}
// Usage
taskChan := make(chan Task, 100)
go workerPool(taskChan, 10) // 10 workers
```
### Channel Patterns
```go
// Buffered channels reduce blocking
ch := make(chan int, 100) // Buffer of 100
// Fan-out pattern for parallel work
func fanOut(in <-chan int, n int) []<-chan int {
outs := make([]<-chan int, n)
for i := 0; i < n; i++ {
out := make(chan int)
outs[i] = out
go func() {
for v := range in {
out <- process(v)
}
close(out)
}()
}
return outs
}
// Fan-in pattern to merge results
func fanIn(channels ...<-chan int) <-chan int {
out := make(chan int)
var wg sync.WaitGroup
for _, ch := range channels {
wg.Add(1)
go func(c <-chan int) {
defer wg.Done()
for v := range c {
out <- v
}
}(ch)
}
go func() {
wg.Wait()
close(out)
}()
return out
}
```
## Data Structure Optimization
### Map Preallocation
```go
// Bad: Map grows as needed
m := make(map[string]int)
for i := 0; i < 10000; i++ {
m[fmt.Sprint(i)] = i // Reallocates periodically
}
// Good: Preallocate
m := make(map[string]int, 10000)
for i := 0; i < 10000; i++ {
m[fmt.Sprint(i)] = i // No reallocation
}
```
### Struct Field Alignment
```go
// Bad: Poor alignment (40 bytes due to padding)
type BadLayout struct {
a bool // 1 byte + 7 padding
b int64 // 8 bytes
c bool // 1 byte + 7 padding
d int64 // 8 bytes
e bool // 1 byte + 7 padding
}
// Good: Optimal alignment (24 bytes)
type GoodLayout struct {
b int64 // 8 bytes
d int64 // 8 bytes
a bool // 1 byte
c bool // 1 byte
e bool // 1 byte + 5 padding
}
```
## I/O Optimization
### Buffered I/O
```go
// Bad: Default scanner buffering, no control over buffer size
file, _ := os.Open("file.txt")
scanner := bufio.NewScanner(file)
// Good: Buffered with custom size
file, _ := os.Open("file.txt")
reader := bufio.NewReaderSize(file, 64*1024) // 64KB buffer
scanner := bufio.NewScanner(reader)
```
### Connection Pooling
```go
// HTTP client with connection pooling
client := &http.Client{
Transport: &http.Transport{
MaxIdleConns: 100,
MaxIdleConnsPerHost: 10,
IdleConnTimeout: 90 * time.Second,
},
Timeout: 10 * time.Second,
}
// Database connection pool
db, _ := sql.Open("postgres", dsn)
db.SetMaxOpenConns(25)
db.SetMaxIdleConns(5)
db.SetConnMaxLifetime(5 * time.Minute)
```
## Performance Anti-Patterns
### Unnecessary Interface Conversions
```go
// Bad: Interface conversion in hot path
func process(items []interface{}) {
for _, item := range items {
v := item.(MyType) // Type assertion overhead
use(v)
}
}
// Good: Use concrete types
func process(items []MyType) {
for _, item := range items {
use(item) // Direct access
}
}
```
### Defer in Loops
```go
// Bad: Defers accumulate in loop
for _, file := range files {
f, _ := os.Open(file)
defer f.Close() // All close calls deferred until function returns!
}
// Good: Close immediately or use function
for _, file := range files {
func() {
f, _ := os.Open(file)
defer f.Close() // Deferred to end of this closure
process(f)
}()
}
```
### Lock Contention
```go
// Bad: Lock held during expensive operation
mu.Lock()
result := expensiveComputation(data)
cache[key] = result
mu.Unlock()
// Good: Minimize lock time
result := expensiveComputation(data)
mu.Lock()
cache[key] = result
mu.Unlock()
// Better: Use sync.Map for concurrent reads
var cache sync.Map
cache.Store(key, value)
val, ok := cache.Load(key)
```
## Compiler Optimizations
### Escape Analysis
```go
// Bad: Escapes to heap
func makeSlice() *[]int {
s := make([]int, 1000)
return &s // Pointer returned, allocates on heap
}
// Good: Stays on stack
func makeSlice() []int {
s := make([]int, 1000)
return s // Value returned, can stay on stack
}
// Check with: go build -gcflags='-m'
```
### Inline Functions
```go
// Small functions are inlined automatically
func add(a, b int) int {
return a + b // Will be inlined
}
// Prevent inlining if needed: //go:noinline
```
## Performance Checklist
**Before Optimizing:**
- [ ] Profile with pprof to identify bottlenecks
- [ ] Write benchmarks for hot paths
- [ ] Measure allocations with `-benchmem`
- [ ] Check for goroutine leaks
**Go-Specific Optimizations:**
- [ ] Preallocate slices and maps with known capacity
- [ ] Use `strings.Builder` for string concatenation
- [ ] Implement worker pools instead of unlimited goroutines
- [ ] Use buffered channels to reduce blocking
- [ ] Reuse buffers with `sync.Pool`
- [ ] Minimize allocations in hot paths
- [ ] Order struct fields by size (largest first)
- [ ] Use concrete types instead of interfaces in hot paths
- [ ] Avoid `defer` in tight loops
- [ ] Use `sync.Map` for concurrent read-heavy maps
**After Optimizing:**
- [ ] Re-profile to verify improvements
- [ ] Compare benchmarks: `benchstat old.txt new.txt`
- [ ] Check memory allocations decreased
- [ ] Monitor goroutine count in production
- [ ] Use `go test -race` to check for race conditions
## Tools and Packages
**Profiling:**
- `pprof` - Built-in profiler
- `go-torch` - Flamegraph generation
- `benchstat` - Compare benchmark results
- `trace` - Execution tracer
**Optimization:**
- `sync.Pool` - Object pooling
- `sync.Map` - Concurrent map
- `strings.Builder` - Efficient string building
- `bufio` - Buffered I/O
**Analysis:**
- `-gcflags='-m'` - Escape analysis
- `go test -race` - Race detector
- `go test -benchmem` - Memory allocations
- `goleak` - Goroutine leak detection
---
*Go-specific performance optimization with goroutines, channels, and profiling*


@@ -0,0 +1,406 @@
# JavaScript/Node.js Performance Optimization
**Load this file when:** Optimizing performance in JavaScript or Node.js projects
## Profiling Tools
### Node.js Built-in Profiler
```bash
# CPU profiling
node --prof app.js
node --prof-process isolate-0x*.log > processed.txt
# Inspect with Chrome DevTools
node --inspect app.js
# Open chrome://inspect
# Heap snapshots
node --inspect-brk app.js
# Take heap snapshots in DevTools
```
### Clinic.js Suite
```bash
# Install clinic
npm install -g clinic
# Doctor - Overall health check
clinic doctor -- node app.js
# Flame - Flamegraph profiling
clinic flame -- node app.js
# Bubbleprof - Async operations
clinic bubbleprof -- node app.js
# Heap profiler
clinic heapprofiler -- node app.js
```
### Performance Measurement
```bash
# 0x - Flamegraph generator
npx 0x app.js
# autocannon - HTTP load testing
npx autocannon http://localhost:3000
# lighthouse - Frontend performance
npx lighthouse https://example.com
```
## V8 Optimization Patterns
### Hidden Classes and Inline Caches
```javascript
// Bad: Dynamic property addition breaks hidden class
function Point(x, y) {
this.x = x;
this.y = y;
}
const p1 = new Point(1, 2);
p1.z = 3; // Deoptimizes!
// Good: Consistent object shape
function Point(x, y, z = 0) {
this.x = x;
this.y = y;
this.z = z; // Always present
}
```
### Avoid Polymorphism in Hot Paths
```javascript
// Bad: Type changes break optimization
function add(a, b) {
return a + b;
}
add(1, 2); // Optimized for numbers
add("a", "b"); // Deoptimized! Now handles strings too
// Good: Separate functions for different types
function addNumbers(a, b) {
return a + b; // Always numbers
}
function concatStrings(a, b) {
return a + b; // Always strings
}
```
### Array Optimization
```javascript
// Bad: Mixed types in array
const mixed = [1, "two", 3, "four"]; // Slow property access
// Good: Homogeneous arrays
const numbers = [1, 2, 3, 4]; // Fast element access
const strings = ["one", "two", "three"];
// Use typed arrays for numeric data
const buffer = new Float64Array(1000); // Faster than regular arrays
```
## Event Loop Optimization
### Avoid Blocking the Event Loop
```javascript
// Bad: Synchronous operations block event loop
const data = fs.readFileSync('large-file.txt');
const result = heavyComputation(data);
// Good: Async operations
const data = await fs.promises.readFile('large-file.txt');
const result = await processAsync(data);
// For CPU-intensive work, use worker threads
const { Worker } = require('worker_threads');
const worker = new Worker('./cpu-intensive.js');
```
### Batch Async Operations
```javascript
// Bad: Sequential async calls
for (const item of items) {
await processItem(item); // Waits for each
}
// Good: Parallel execution
await Promise.all(items.map(item => processItem(item)));
// Better: Controlled concurrency with p-limit
const pLimit = require('p-limit');
const limit = pLimit(10); // Max 10 concurrent
await Promise.all(
items.map(item => limit(() => processItem(item)))
);
```
## Memory Management
### Avoid Memory Leaks
```javascript
// Bad: Global variables and closures retain memory
let cache = {}; // Never cleared
function addToCache(key, value) {
cache[key] = value; // Grows indefinitely
}
// Good: Use WeakMap for caching
const cache = new WeakMap();
function addToCache(obj, value) {
cache.set(obj, value); // Auto garbage collected
}
// Good: Implement cache eviction
const LRU = require('lru-cache');
const cache = new LRU({ max: 500 });
```
### Stream Large Data
```javascript
// Bad: Load entire file in memory
const data = await fs.promises.readFile('large-file.txt');
const processed = data.toString().split('\n').map(process);
// Good: Stream processing
const readline = require('readline');
const stream = fs.createReadStream('large-file.txt');
const rl = readline.createInterface({ input: stream });
for await (const line of rl) {
process(line); // Process one line at a time
}
```
## Database Query Optimization
### Connection Pooling
```javascript
// Bad: Create new connection per request
async function query(sql) {
const conn = await mysql.createConnection(config);
const result = await conn.query(sql);
await conn.end();
return result;
}
// Good: Use connection pool
const pool = mysql.createPool(config);
async function query(sql) {
return pool.query(sql); // Reuses connections
}
```
### Batch Database Operations
```javascript
// Bad: Multiple round trips
for (const user of users) {
await db.insert('users', user);
}
// Good: Single batch insert
await db.batchInsert('users', users, 1000); // Chunks of 1000
```
## HTTP Server Optimization
### Compression
```javascript
const compression = require('compression');
app.use(compression({
level: 6, // Balance between speed and compression
threshold: 1024 // Only compress responses > 1KB
}));
```
### Caching Headers
```javascript
app.get('/static/*', (req, res) => {
res.setHeader('Cache-Control', 'public, max-age=31536000');
res.setHeader('ETag', computeETag(file));
res.sendFile(file);
});
```
### Keep-Alive Connections
```javascript
const http = require('http');
const server = http.createServer(app);
server.keepAliveTimeout = 60000; // Keep idle connections open for 60 seconds
```
## Frontend Performance
### Code Splitting
```javascript
// Dynamic imports for code splitting
const HeavyComponent = lazy(() => import('./HeavyComponent'));
// Route-based code splitting
const routes = [
{
path: '/dashboard',
component: lazy(() => import('./Dashboard'))
}
];
```
### Memoization
```javascript
// React.memo for expensive components
const ExpensiveComponent = React.memo(({ data }) => {
return <div>{expensiveRender(data)}</div>;
});
// useMemo for expensive computations
const sortedData = useMemo(() => {
  return [...data].sort(compare); // Copy first: sort() mutates in place
}, [data]);
// useCallback for stable function references
const handleClick = useCallback(() => {
doSomething(id);
}, [id]);
```
### Virtual Scrolling
```javascript
// For large lists, render only visible items
import { FixedSizeList } from 'react-window';
<FixedSizeList
height={600}
itemCount={10000}
itemSize={50}
width="100%"
>
{Row}
</FixedSizeList>
```
## Performance Anti-Patterns
### Unnecessary Re-renders
```javascript
// Bad: Creates new object on every render
function MyComponent() {
const style = { color: 'red' }; // New object each render
return <div style={style}>Text</div>;
}
// Good: Define outside or use useMemo
const style = { color: 'red' };
function MyComponent() {
return <div style={style}>Text</div>;
}
```
### Expensive Operations in Render
```javascript
// Bad: Expensive computation in render
function MyComponent({ items }) {
const sorted = items.sort(); // Sorts on every render!
return <List data={sorted} />;
}
// Good: Memoize expensive computations
function MyComponent({ items }) {
const sorted = useMemo(() => [...items].sort(), [items]);
return <List data={sorted} />;
}
```
## Benchmarking
### Simple Benchmarks
```javascript
const { performance } = require('perf_hooks');
function benchmark(fn, iterations = 1000) {
const start = performance.now();
for (let i = 0; i < iterations; i++) {
fn();
}
const end = performance.now();
console.log(`Avg: ${(end - start) / iterations}ms`);
}
benchmark(() => myFunction());
```
### Benchmark.js
```javascript
const Benchmark = require('benchmark');
const suite = new Benchmark.Suite;
suite
.add('Array#forEach', function() {
[1,2,3].forEach(x => x * 2);
})
.add('Array#map', function() {
[1,2,3].map(x => x * 2);
})
.on('complete', function() {
console.log('Fastest is ' + this.filter('fastest').map('name'));
})
.run();
```
## Performance Checklist
**Before Optimizing:**
- [ ] Profile with Chrome DevTools or clinic.js
- [ ] Identify hot paths and bottlenecks
- [ ] Measure baseline performance
**Node.js Optimizations:**
- [ ] Use worker threads for CPU-intensive tasks
- [ ] Implement connection pooling for databases
- [ ] Enable compression middleware
- [ ] Use streams for large data processing
- [ ] Implement caching (Redis, in-memory)
- [ ] Batch async operations with controlled concurrency
- [ ] Monitor event loop lag
**Frontend Optimizations:**
- [ ] Implement code splitting
- [ ] Use React.memo for expensive components
- [ ] Implement virtual scrolling for large lists
- [ ] Optimize bundle size (tree shaking, minification)
- [ ] Use Web Workers for heavy computations
- [ ] Implement service workers for offline caching
- [ ] Lazy load images and components
**After Optimizing:**
- [ ] Re-profile to verify improvements
- [ ] Check memory usage for leaks
- [ ] Run load tests (autocannon, artillery)
- [ ] Monitor with APM tools
## Tools and Libraries
**Profiling:**
- `clinic.js` - Performance profiling suite
- `0x` - Flamegraph profiler
- `node --inspect` - Chrome DevTools integration
- `autocannon` - HTTP load testing
**Optimization:**
- `p-limit` - Concurrency control
- `lru-cache` - LRU caching
- `compression` - Response compression
- `react-window` - Virtual scrolling
- `workerpool` - Worker thread pools
**Monitoring:**
- `prom-client` - Prometheus metrics
- `newrelic` / `datadog` - APM
- `clinic-doctor` - Health diagnostics
---
*JavaScript/Node.js-specific performance optimization with V8 patterns and profiling tools*


@@ -0,0 +1,326 @@
# Python Performance Optimization
**Load this file when:** Optimizing performance in Python projects
## Profiling Tools
### Execution Time Profiling
```bash
# cProfile - Built-in profiler
python -m cProfile -o profile.stats script.py
python -m pstats profile.stats
# py-spy - Sampling profiler (no code changes needed)
py-spy record -o profile.svg -- python script.py
py-spy top -- python script.py
# line_profiler - Line-by-line profiling
kernprof -l -v script.py
```
### Memory Profiling
```bash
# memory_profiler - Line-by-line memory usage
python -m memory_profiler script.py
# memray - Modern memory profiler
memray run script.py
memray flamegraph output.bin
# tracemalloc - Built-in memory tracking
# (use in code, see example below)
```
### Benchmarking
```bash
# pytest-benchmark
pytest tests/ --benchmark-only
# timeit - Quick microbenchmarks
python -m timeit "'-'.join(str(n) for n in range(100))"
```
## Python-Specific Optimization Patterns
### Async/Await Patterns
```python
import asyncio
import aiohttp
# Good: Parallel async operations
async def fetch_all(urls):
async with aiohttp.ClientSession() as session:
tasks = [fetch_url(session, url) for url in urls]
return await asyncio.gather(*tasks)
# Bad: Sequential async (defeats the purpose)
async def fetch_all_bad(urls):
results = []
async with aiohttp.ClientSession() as session:
for url in urls:
results.append(await fetch_url(session, url))
return results
```
### List Comprehensions vs Generators
```python
# Generator (memory efficient for large datasets)
def process_large_file(filename):
return (process_line(line) for line in open(filename))
# List comprehension (when you need all data in memory)
def process_small_file(filename):
return [process_line(line) for line in open(filename)]
# Use itertools for complex generators
from itertools import islice, chain
first_10 = list(islice(generate_data(), 10))
```
### Efficient Data Structures
```python
# Use sets for membership testing
# Bad: O(n)
if item in my_list: # Slow for large lists
...
# Good: O(1)
if item in my_set: # Fast
...
# Use deque for queue operations
from collections import deque
queue = deque()
queue.append(item) # O(1)
queue.popleft() # O(1) vs list.pop(0) which is O(n)
# Use defaultdict to avoid key checks
from collections import defaultdict
counter = defaultdict(int)
counter[key] += 1 # No need to check if key exists
```
## GIL (Global Interpreter Lock) Considerations
### CPU-Bound Work
```python
# Use multiprocessing for CPU-bound tasks
from multiprocessing import Pool
def cpu_intensive_task(data):
# Heavy computation
return result
with Pool(processes=4) as pool:
results = pool.map(cpu_intensive_task, data_list)
```
### I/O-Bound Work
```python
# Use asyncio or threading for I/O-bound tasks
import asyncio
async def io_bound_task(url):
# Network I/O, file I/O
return result
results = await asyncio.gather(*[io_bound_task(url) for url in urls])
```
## Common Python Anti-Patterns
### String Concatenation
```python
# Bad: O(n²) for n strings
result = ""
for s in strings:
result += s
# Good: O(n)
result = "".join(strings)
```
### Unnecessary Lambda
```python
# Bad: Extra function call overhead
sorted_items = sorted(items, key=lambda x: x.value)
# Good: Direct attribute access
from operator import attrgetter
sorted_items = sorted(items, key=attrgetter('value'))
```
### Loop Invariant Code
```python
# Bad: Repeated calculation in loop
for item in items:
expensive_result = expensive_function()
process(item, expensive_result)
# Good: Calculate once
expensive_result = expensive_function()
for item in items:
process(item, expensive_result)
```
## Performance Measurement
### Tracemalloc for Memory Tracking
```python
import tracemalloc
# Start tracking
tracemalloc.start()
# Your code here
data = [i for i in range(1000000)]
# Get memory usage
current, peak = tracemalloc.get_traced_memory()
print(f"Current: {current / 1024 / 1024:.2f} MB")
print(f"Peak: {peak / 1024 / 1024:.2f} MB")
tracemalloc.stop()
```
### Context Manager for Timing
```python
import time
from contextlib import contextmanager
@contextmanager
def timer(name):
start = time.perf_counter()
yield
elapsed = time.perf_counter() - start
print(f"{name}: {elapsed:.4f}s")
# Usage
with timer("Database query"):
results = db.query(...)
```
## Database Optimization (Python-Specific)
### SQLAlchemy Best Practices
```python
# Bad: N+1 queries
for user in session.query(User).all():
print(user.profile.bio) # Separate query for each
# Good: Eager loading
from sqlalchemy.orm import joinedload
users = session.query(User).options(
joinedload(User.profile)
).all()
# Good: Batch operations
session.bulk_insert_mappings(User, user_dicts)
session.commit()
```
## Caching Strategies
### Function Caching
```python
from functools import lru_cache, cache
# LRU cache with size limit
@lru_cache(maxsize=128)
def expensive_computation(n):
# Heavy computation
return result
# Unlimited cache (Python 3.9+)
@cache
def fibonacci(n):
if n < 2:
return n
return fibonacci(n-1) + fibonacci(n-2)
# Manual cache with expiration
from cachetools import TTLCache
cache = TTLCache(maxsize=100, ttl=300) # 5 minutes
```
## Performance Testing
### pytest-benchmark
```python
def test_processing_performance(benchmark):
# Benchmark automatically handles iterations
result = benchmark(process_data, large_dataset)
assert result is not None
# Compare against baseline
def test_against_baseline(benchmark):
benchmark.pedantic(
process_data,
args=(dataset,),
iterations=10,
rounds=100
)
```
### Load Testing with Locust
```python
from locust import HttpUser, task, between
class WebsiteUser(HttpUser):
wait_time = between(1, 3)
@task
def load_homepage(self):
self.client.get("/")
@task(3) # 3x more likely than homepage
def load_api(self):
self.client.get("/api/data")
```
## Performance Checklist
**Before Optimizing:**
- [ ] Profile to identify actual bottlenecks (don't guess!)
- [ ] Measure baseline performance
- [ ] Set performance targets
**Python-Specific Optimizations:**
- [ ] Use generators for large datasets
- [ ] Replace loops with list comprehensions where appropriate
- [ ] Use appropriate data structures (set, deque, defaultdict)
- [ ] Implement caching with @lru_cache or @cache
- [ ] Use async/await for I/O-bound operations
- [ ] Use multiprocessing for CPU-bound operations
- [ ] Avoid string concatenation in loops
- [ ] Minimize attribute lookups in hot loops
- [ ] Use `__slots__` for classes with many instances (see the sketch after this checklist)
**After Optimizing:**
- [ ] Re-profile to verify improvements
- [ ] Check memory usage hasn't increased significantly
- [ ] Ensure code readability is maintained
- [ ] Add performance regression tests
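A short sketch of the `__slots__` item above (class names are illustrative): declaring `__slots__` removes the per-instance `__dict__`, which saves memory when you create many instances.
```python
import sys

class PointDict:
    def __init__(self, x, y):
        self.x, self.y = x, y

class PointSlots:
    __slots__ = ("x", "y")
    def __init__(self, x, y):
        self.x, self.y = x, y

print(sys.getsizeof(PointDict(1, 2).__dict__))  # per-instance dict overhead
print(sys.getsizeof(PointSlots(1, 2)))          # compact fixed layout, no __dict__
```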
## Tools and Libraries
**Profiling:**
- `cProfile` - Built-in execution profiler
- `py-spy` - Sampling profiler without code changes
- `memory_profiler` - Memory usage line-by-line
- `memray` - Modern memory profiler with flamegraphs
**Performance Testing:**
- `pytest-benchmark` - Benchmark tests
- `locust` - Load testing framework
- `hyperfine` - Command-line benchmarking
**Optimization:**
- `numpy` - Vectorized operations for numerical data
- `numba` - JIT compilation for numerical functions
- `cython` - Compile Python to C for speed
---
*Python-specific performance optimization with profiling tools and patterns*


@@ -0,0 +1,382 @@
# Rust Performance Optimization
**Load this file when:** Optimizing performance in Rust projects
## Profiling Tools
### Benchmarking with Criterion
```bash
# Add to Cargo.toml
[dev-dependencies]
criterion = "0.5"
[[bench]]
name = "my_benchmark"
harness = false
# Run benchmarks
cargo bench
# Compare against a saved baseline (Criterion flags go after --)
cargo bench -- --save-baseline master   # record the baseline first
cargo bench -- --baseline master        # later runs compare against it
```
### CPU Profiling
```bash
# perf (Linux)
cargo build --release
perf record --call-graph dwarf ./target/release/myapp
perf report
# Instruments (macOS)
cargo instruments --release --template "Time Profiler"
# cargo-flamegraph
cargo install flamegraph
cargo flamegraph
# samply (cross-platform)
cargo install samply
samply record ./target/release/myapp
```
### Memory Profiling
```bash
# valgrind (memory leaks, cache performance)
cargo build
valgrind --tool=massif ./target/debug/myapp
# dhat (heap profiling)
# Add dhat crate to project
# cargo-bloat (binary size analysis)
cargo install cargo-bloat
cargo bloat --release
```
## Zero-Cost Abstractions
### Avoiding Unnecessary Allocations
```rust
// Bad: Allocates on every call
fn process_string(s: String) -> String {
s.to_uppercase()
}
// Good: Borrows, no allocation
fn process_string(s: &str) -> String {
s.to_uppercase()
}
// Best: In-place modification where possible
fn process_string_mut(s: &mut String) {
*s = s.to_uppercase();
}
```
### Stack vs Heap Allocation
```rust
// Stack: Fast, known size at compile time
let numbers = [1, 2, 3, 4, 5];
// Heap: Flexible, runtime-sized data
let numbers = vec![1, 2, 3, 4, 5];
// Use Box<[T]> for fixed-size heap data (smaller than Vec)
let numbers: Box<[i32]> = vec![1, 2, 3, 4, 5].into_boxed_slice();
```
### Iterator Chains vs For Loops
```rust
// Good: Zero-cost iterator chains (compiled to efficient code)
let sum: i32 = numbers
.iter()
.filter(|&&n| n > 0)
.map(|&n| n * 2)
.sum();
// Also good: Manual loop (similar performance)
let mut sum = 0;
for &n in numbers.iter() {
if n > 0 {
sum += n * 2;
}
}
// Choose iterators for readability, loops for complex logic
```
## Compilation Optimizations
### Release Profile Tuning
```toml
[profile.release]
opt-level = 3 # Maximum optimization
lto = "fat" # Link-time optimization
codegen-units = 1 # Better optimization, slower compile
strip = true # Strip symbols from binary
panic = "abort" # Smaller binary, no stack unwinding
[profile.release-with-debug]
inherits = "release"
debug = true # Keep debug symbols for profiling
```
### Target CPU Features
```bash
# Use native CPU features
RUSTFLAGS="-C target-cpu=native" cargo build --release
# Or in .cargo/config.toml
[build]
rustflags = ["-C", "target-cpu=native"]
```
## Memory Layout Optimization
### Struct Field Ordering
```rust
// Bad: Wasted padding (24 bytes under #[repr(C)]; the default Rust repr may reorder fields for you)
struct BadLayout {
a: u8, // 1 byte + 7 padding
b: u64, // 8 bytes
c: u8, // 1 byte + 7 padding
}
// Good: Minimal padding (16 bytes)
struct GoodLayout {
b: u64, // 8 bytes
a: u8, // 1 byte
c: u8, // 1 byte + 6 padding
}
// Use #[repr(C)] for consistent layout
#[repr(C)]
struct FixedLayout {
// Fields laid out in declaration order
}
```
### Enum Optimization
```rust
// Consider enum size (uses largest variant)
enum Large {
Small(u8),
Big([u8; 1000]), // Entire enum is 1000+ bytes!
}
// Better: Box large variants
enum Optimized {
Small(u8),
Big(Box<[u8; 1000]>), // Enum is now pointer-sized
}
```
## Concurrency Patterns
### Using Rayon for Data Parallelism
```rust
use rayon::prelude::*;
// Sequential
let sum: i32 = data.iter().map(|x| expensive(x)).sum();
// Parallel (automatic work stealing)
let sum: i32 = data.par_iter().map(|x| expensive(x)).sum();
```
### Async Runtime Optimization
```rust
// tokio - For I/O-heavy workloads
#[tokio::main(flavor = "multi_thread", worker_threads = 4)]
async fn main() {
// Async I/O operations
}
// async-std - Alternative runtime
// Choose based on ecosystem compatibility
```
## Common Rust Performance Patterns
### String Handling
```rust
// Avoid unnecessary clones
// Bad
fn process(s: String) -> String {
let upper = s.clone().to_uppercase();
upper
}
// Good
fn process(s: &str) -> String {
s.to_uppercase()
}
// Use Cow for conditional cloning
use std::borrow::Cow;
fn maybe_uppercase<'a>(s: &'a str, uppercase: bool) -> Cow<'a, str> {
if uppercase {
Cow::Owned(s.to_uppercase())
} else {
Cow::Borrowed(s)
}
}
```
### Collection Preallocation
```rust
// Bad: Multiple reallocations
let mut vec = Vec::new();
for i in 0..1000 {
vec.push(i);
}
// Good: Single allocation
let mut vec = Vec::with_capacity(1000);
for i in 0..1000 {
vec.push(i);
}
// Best: Use collect with size_hint
let vec: Vec<_> = (0..1000).collect();
```
### Minimize Clones
```rust
// Bad: Unnecessary clones in loop
for item in &items {
let owned = item.clone();
process(owned);
}
// Good: Borrow when possible
for item in &items {
process_borrowed(item);
}
// Use Rc/Arc only when necessary
use std::rc::Rc;
let shared = Rc::new(expensive_data);
let clone1 = Rc::clone(&shared); // Cheap pointer clone
```
## Performance Anti-Patterns
### Unnecessary Dynamic Dispatch
```rust
// Bad: Dynamic dispatch overhead
fn process(items: &[Box<dyn Trait>]) {
for item in items {
item.method(); // Virtual call
}
}
// Good: Static dispatch via generics
fn process<T: Trait>(items: &[T]) {
for item in items {
item.method(); // Direct call, can be inlined
}
}
```
### Lock Contention
```rust
// Bad: Holding lock during expensive operation
let data = mutex.lock().unwrap();
let result = expensive_computation(&data);
drop(data);
// Good: Release lock quickly
let cloned = {
let data = mutex.lock().unwrap();
data.clone()
};
let result = expensive_computation(&cloned);
```
## Benchmarking with Criterion
### Basic Benchmark
```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};
fn fibonacci_benchmark(c: &mut Criterion) {
c.bench_function("fib 20", |b| {
b.iter(|| fibonacci(black_box(20)))
});
}
criterion_group!(benches, fibonacci_benchmark);
criterion_main!(benches);
```
### Parameterized Benchmarks
```rust
use criterion::{black_box, BenchmarkId, Criterion};
fn bench_sizes(c: &mut Criterion) {
let mut group = c.benchmark_group("process");
for size in [10, 100, 1000, 10000].iter() {
group.bench_with_input(
BenchmarkId::from_parameter(size),
size,
|b, &size| {
b.iter(|| process_data(black_box(size)))
},
);
}
group.finish();
}
```
## Performance Checklist
**Before Optimizing:**
- [ ] Profile with release build to identify bottlenecks
- [ ] Measure baseline with criterion benchmarks
- [ ] Use cargo-flamegraph to visualize hot paths
**Rust-Specific Optimizations:**
- [ ] Enable LTO in release profile
- [ ] Use target-cpu=native for CPU-specific features
- [ ] Preallocate collections with `with_capacity`
- [ ] Prefer borrowing (&T) over owned (T) in APIs
- [ ] Use iterators over manual loops
- [ ] Minimize clones - use Rc/Arc only when needed
- [ ] Order struct fields by size (largest first)
- [ ] Box large enum variants
- [ ] Use rayon for CPU-bound parallelism
- [ ] Avoid unnecessary dynamic dispatch
**After Optimizing:**
- [ ] Re-benchmark to verify improvements
- [ ] Check binary size with cargo-bloat
- [ ] Profile memory with valgrind/dhat
- [ ] Add regression tests with criterion baselines
## Tools and Crates
**Profiling:**
- `criterion` - Statistical benchmarking
- `flamegraph` - Flamegraph generation
- `cargo-instruments` - macOS profiling
- `perf` - Linux performance analysis
- `dhat` - Heap profiling
**Optimization:**
- `rayon` - Data parallelism
- `tokio` / `async-std` - Async runtime
- `parking_lot` - Faster mutex/rwlock
- `smallvec` - Stack-allocated vectors
- `once_cell` - Lazy static initialization
**Analysis:**
- `cargo-bloat` - Binary size analysis
- `cargo-udeps` - Find unused dependencies
- `twiggy` - Code size profiler
---
*Rust-specific performance optimization with zero-cost abstractions and profiling tools*