Initial commit

Zhongwei Li
2025-11-29 18:50:24 +08:00
commit f172746dc6
52 changed files with 17406 additions and 0 deletions


@@ -0,0 +1,211 @@
---
name: Optimizing Performance
description: Optimize performance with profiling, caching strategies, database query optimization, and bottleneck analysis. Use when improving response times, implementing caching layers, or scaling for high load.
---
# Optimizing Performance
I help you identify and fix performance bottlenecks using language-specific profiling tools, optimization patterns, and best practices.
## When to Use Me
**Performance analysis:**
- "Profile this code for bottlenecks"
- "Analyze performance issues"
- "Why is this slow?"
**Optimization:**
- "Optimize database queries"
- "Improve response time"
- "Reduce memory usage"
**Scaling:**
- "Implement caching strategy"
- "Optimize for high load"
- "Scale this service"
## How I Work - Progressive Loading
I load only the performance guidance relevant to your language:
```yaml
Language Detection:
"Python project" → Load @languages/PYTHON.md
"Rust project" → Load @languages/RUST.md
"JavaScript/Node.js" → Load @languages/JAVASCRIPT.md
"Go project" → Load @languages/GO.md
"Any language" → Load @languages/GENERIC.md
```
**Don't load all files!** Start with language detection, then load specific guidance.
## Core Principles
### 1. Measure First
**Never optimize without data.** Profile to find the actual bottlenecks; don't guess.
- Establish baseline metrics
- Profile to identify hot paths
- Focus on the 20% of code that takes 80% of the time
- Measure improvements after optimization
### 2. Performance Budgets
Set clear targets before optimizing:
```yaml
targets:
api_response: "<200ms (p95)"
page_load: "<2 seconds"
database_query: "<50ms (p95)"
cache_lookup: "<10ms"
```
### 3. Trade-offs
Balance performance vs:
- Code readability
- Maintainability
- Development time
- Memory usage
Premature optimization is the root of all evil. Optimize when:
- Profiling shows clear bottleneck
- Performance requirement not met
- User experience degraded
## Quick Wins (Language-Agnostic)
### Database
- Add indexes for frequently queried columns
- Implement connection pooling
- Use batch operations instead of loops (see the sketch below)
- Cache expensive query results
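A minimal sketch of the batch-operations point above, using Python's built-in `sqlite3` for illustration (the table and file names are hypothetical):
```python
import sqlite3

rows = [(i, f"event-{i}") for i in range(10_000)]

conn = sqlite3.connect("app.db")
conn.execute("CREATE TABLE IF NOT EXISTS events (id INTEGER, name TEXT)")

# Bad: one statement (and one round trip) per row
# for row in rows:
#     conn.execute("INSERT INTO events VALUES (?, ?)", row)

# Good: one batched statement, one commit
conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
conn.commit()
conn.close()
```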
### Caching
- Implement multi-level caching (L1: in-memory, L2: Redis, L3: database, L4: CDN)
- Define cache invalidation strategy
- Monitor cache hit rates
### Network
- Enable compression for responses
- Use HTTP/2 or HTTP/3
- Implement CDN for static assets
- Configure appropriate timeouts
## Language-Specific Guidance
### Python
**Load:** `@languages/PYTHON.md`
**Quick reference:**
- Profiling: `cProfile`, `py-spy`, `memory_profiler`
- Patterns: Generators, async/await, list comprehensions
- Anti-patterns: String concatenation in loops, GIL contention
- Tools: `pytest-benchmark`, `locust`
### Rust
**Load:** `@languages/RUST.md`
**Quick reference:**
- Profiling: `cargo bench`, `flamegraph`, `perf`
- Patterns: Zero-cost abstractions, iterator chains, preallocated collections
- Anti-patterns: Unnecessary allocations, large enum variants
- Tools: `criterion`, `rayon`, `parking_lot`
### JavaScript/Node.js
**Load:** `@languages/JAVASCRIPT.md`
**Quick reference:**
- Profiling: `clinic.js`, `0x`, Chrome DevTools
- Patterns: Event loop optimization, worker threads, streaming
- Anti-patterns: Blocking event loop, memory leaks, unnecessary re-renders
- Tools: `autocannon`, `react-window`, `p-limit`
### Go
**Load:** `@languages/GO.md`
**Quick reference:**
- Profiling: `pprof`, `go test -bench`, `go tool trace`
- Patterns: Goroutine pools, buffered channels, `sync.Pool`
- Anti-patterns: Unlimited goroutines, defer in loops, lock contention
- Tools: `benchstat`, `sync.Map`, `strings.Builder`
### Generic Patterns
**Load:** `@languages/GENERIC.md`
**When to use:** Database optimization, caching strategies, load balancing, monitoring - applicable to any language.
## Optimization Workflow
### Phase 1: Baseline
1. Define performance requirements
2. Measure current performance
3. Identify user-facing metrics (response time, throughput)
### Phase 2: Profile
1. Use language-specific profiling tools
2. Identify hot paths (where time is spent)
3. Find memory bottlenecks
4. Check for resource leaks
### Phase 3: Optimize
1. Focus on biggest bottleneck first
2. Apply language-specific optimizations
3. Implement caching where appropriate
4. Optimize database queries
### Phase 4: Verify
1. Re-profile to measure improvements
2. Run performance regression tests
3. Monitor in production
4. Set up alerts for degradation
## Common Bottlenecks
### Database
- Missing indexes
- N+1 query problem
- No connection pooling
- Expensive joins
**Load** `@languages/GENERIC.md` for DB optimization
### Memory
- Memory leaks
- Excessive allocations
- Large object graphs
- No pooling
**Load** language-specific file for memory management
### Network
- No compression
- Chatty API calls
- Synchronous external calls
- No CDN
**Load** `@languages/GENERIC.md` for network optimization
### Concurrency
- Lock contention
- Excessive threading/goroutines
- Blocking operations
- Poor work distribution
**Load** language-specific file for concurrency patterns
## Success Criteria
**Optimization complete when:**
- ✅ Performance targets met
- ✅ No regressions in functionality
- ✅ Code remains maintainable
- ✅ Improvements verified with profiling
- ✅ Production metrics show improvement
- ✅ Alerts configured for degradation
## Next Steps
- Use profiling tools to identify bottlenecks
- Load language-specific guidance
- Apply targeted optimizations
- Set up monitoring and alerts
---
*Load language-specific files for detailed profiling tools, optimization patterns, and best practices*


@@ -0,0 +1,426 @@
# Generic Performance Optimization
**Load this file when:** Optimizing performance in any language, or when you need language-agnostic patterns
## Universal Principles
### Measure First
- Never optimize without profiling
- Establish baseline metrics before changes
- Focus on bottlenecks, not micro-optimizations
- Use the 80/20 rule: 80% of the time is spent in 20% of the code
### Performance Budgets
```yaml
response_time_targets:
api_endpoint: "<200ms (p95)"
page_load: "<2 seconds"
database_query: "<50ms (p95)"
cache_lookup: "<10ms"
resource_limits:
max_memory: "512MB per process"
max_cpu: "80% sustained"
max_connections: "100 per instance"
```
## Database Optimization
### Indexing Strategy
```sql
-- Identify slow queries
-- PostgreSQL
SELECT query, mean_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
-- Add indexes for frequently queried columns
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_orders_user_created ON orders(user_id, created_at);
-- Composite indexes for common query patterns
CREATE INDEX idx_search ON products(category, price, created_at);
```
### Query Optimization
```sql
-- Use EXPLAIN to understand query plans
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'user@example.com';
-- Avoid SELECT *
-- Bad
SELECT * FROM users;
-- Good
SELECT id, name, email FROM users;
-- Use LIMIT for pagination
SELECT id, name FROM users ORDER BY created_at DESC LIMIT 20 OFFSET 0;
-- Use EXISTS instead of COUNT for checking existence
-- Bad
SELECT COUNT(*) FROM orders WHERE user_id = 123;
-- Good
SELECT EXISTS(SELECT 1 FROM orders WHERE user_id = 123);
```
### Connection Pooling
```yaml
connection_pool_config:
min_connections: 5
max_connections: 20
connection_timeout: 30s
idle_timeout: 10m
max_lifetime: 1h
```
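A hedged sketch of wiring that configuration into an application-level pool, using SQLAlchemy for illustration (the DSN is a placeholder; parameter names follow SQLAlchemy's engine options):
```python
from sqlalchemy import create_engine, text

engine = create_engine(
    "postgresql+psycopg2://user:pass@localhost/app",  # placeholder DSN
    pool_size=5,         # connections kept open (min_connections)
    max_overflow=15,     # extra connections under load (total of 20)
    pool_timeout=30,     # seconds to wait for a free connection
    pool_recycle=3600,   # recycle connections after 1 hour (max_lifetime)
    pool_pre_ping=True,  # detect and replace dead connections
)

with engine.connect() as conn:
    conn.execute(text("SELECT 1"))
```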
## Caching Strategies
### Multi-Level Caching
```yaml
caching_layers:
L1_application:
type: "In-Memory (LRU)"
size: "100MB"
ttl: "5 minutes"
use_case: "Hot data, session data"
L2_distributed:
type: "Redis"
ttl: "1 hour"
use_case: "Shared data across instances"
L3_database:
type: "Query Result Cache"
ttl: "15 minutes"
use_case: "Expensive query results"
L4_cdn:
type: "CDN"
ttl: "24 hours"
use_case: "Static assets, public API responses"
```
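A read-path sketch of the L1/L2 layering above (Python, with `cachetools` and `redis` as assumed dependencies; `query_database` stands in for the real L3 lookup):
```python
import json
import redis
from cachetools import LRUCache

l1 = LRUCache(maxsize=10_000)                    # L1: per-process, hot data
l2 = redis.Redis(host="localhost", port=6379)    # L2: shared across instances

def query_database(key):
    return {"key": key}                          # stand-in for the real query

def get(key):
    if key in l1:
        return l1[key]
    cached = l2.get(key)
    if cached is not None:
        value = json.loads(cached)
    else:
        value = query_database(key)              # L3: source of truth
        l2.setex(key, 3600, json.dumps(value))   # 1-hour TTL in Redis
    l1[key] = value                              # promote into L1
    return value
```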
### Cache Invalidation Patterns
```yaml
strategies:
time_based:
description: "TTL-based expiration"
use_case: "Data with predictable change patterns"
example: "Weather data, stock prices"
event_based:
description: "Invalidate on data change events"
use_case: "Real-time consistency required"
example: "User profile updates"
write_through:
description: "Update cache on write"
use_case: "Strong consistency needed"
example: "Shopping cart, user sessions"
lazy_refresh:
description: "Refresh on cache miss"
use_case: "Acceptable stale data"
example: "Analytics dashboards"
```
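Two of these strategies in a small Python sketch (time-based expiry plus write-through; `cachetools` is an assumed dependency and the `_db` dict stands in for the real data store):
```python
from cachetools import TTLCache

user_cache = TTLCache(maxsize=10_000, ttl=300)  # time-based: entries expire after 5 minutes
_db = {}                                        # stand-in for the source of truth

def get_user(user_id):
    if user_id not in user_cache:
        user_cache[user_id] = _db.get(user_id)  # lazy refresh on a miss
    return user_cache[user_id]

def update_user(user_id, data):
    _db[user_id] = data                         # write to the source of truth first
    user_cache[user_id] = data                  # write-through: keep the cache consistent
```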
## Network Optimization
### HTTP/2 and HTTP/3
```yaml
benefits:
- Multiplexing: Multiple requests over single connection
- Header compression: Reduced overhead
- Server push: Proactive resource sending
- Binary protocol: Faster parsing
```
### Compression
```yaml
compression_config:
enabled: true
min_size: "1KB" # Don't compress tiny responses
types:
- "text/html"
- "text/css"
- "application/javascript"
- "application/json"
level: 6 # Balance speed vs size
```
### Connection Management
```yaml
keep_alive:
enabled: true
timeout: "60s"
max_requests: 100
timeouts:
connect: "10s"
read: "30s"
write: "30s"
idle: "120s"
```
## Monitoring and Observability
### Key Metrics to Track
```yaml
application_metrics:
- response_time_p50
- response_time_p95
- response_time_p99
- error_rate
- throughput_rps
system_metrics:
- cpu_utilization
- memory_utilization
- disk_io
- network_io
database_metrics:
- query_execution_time
- connection_pool_usage
- slow_query_count
- cache_hit_rate
```
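For illustration, a small Python sketch of deriving the percentile metrics above from raw latency samples (a metrics library such as Prometheus normally does this for you; the samples here are synthetic):
```python
import random
import statistics

samples_ms = [random.gauss(120, 40) for _ in range(1_000)]  # synthetic response times

q = statistics.quantiles(samples_ms, n=100)  # 99 percentile cut points
p50, p95, p99 = q[49], q[94], q[98]
print(f"p50={p50:.0f}ms  p95={p95:.0f}ms  p99={p99:.0f}ms")
```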
### Alert Thresholds
```yaml
alerts:
critical:
- metric: "error_rate"
threshold: ">5%"
duration: "2 minutes"
- metric: "response_time_p99"
threshold: ">1000ms"
duration: "5 minutes"
warning:
- metric: "cpu_utilization"
threshold: ">80%"
duration: "10 minutes"
- metric: "memory_utilization"
threshold: ">85%"
duration: "5 minutes"
```
## Load Balancing
### Strategies
```yaml
round_robin:
description: "Distribute requests evenly"
use_case: "Homogeneous backend servers"
least_connections:
description: "Route to server with fewest connections"
use_case: "Varying request processing times"
ip_hash:
description: "Consistent routing based on client IP"
use_case: "Session affinity required"
weighted:
description: "Route based on server capacity"
use_case: "Heterogeneous server specs"
```
### Health Checks
```yaml
health_check:
interval: "10s"
timeout: "5s"
unhealthy_threshold: 3
healthy_threshold: 2
path: "/health"
```
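A hedged sketch of the `/health` endpoint those checks poll, using Flask for illustration (the dependency probes are placeholders):
```python
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/health")
def health():
    checks = {"database": True, "cache": True}  # replace with real, cheap probes
    healthy = all(checks.values())
    status = 200 if healthy else 503
    return jsonify(status="ok" if healthy else "degraded", checks=checks), status

if __name__ == "__main__":
    app.run(port=8080)
```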
## CDN Configuration
### Caching Rules
```yaml
static_assets:
pattern: "*.{js,css,png,jpg,svg,woff2}"
cache_control: "public, max-age=31536000, immutable"
api_responses:
pattern: "/api/public/*"
cache_control: "public, max-age=300, s-maxage=600"
html_pages:
pattern: "*.html"
cache_control: "public, max-age=60, s-maxage=300"
```
### Geographic Distribution
```yaml
regions:
- us-east: "Primary"
- us-west: "Failover"
- eu-west: "Regional"
- ap-southeast: "Regional"
routing:
policy: "latency-based"
fallback: "round-robin"
```
## Horizontal Scaling Patterns
### Stateless Services
```yaml
principles:
- No local state storage
- Session data in external store (Redis, database)
- Any instance can handle any request
- Easy to add/remove instances
```
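A minimal sketch of keeping session state out of the process so any instance can serve any request (Python, with `redis` as an assumed dependency):
```python
import json
import redis

r = redis.Redis(host="localhost", port=6379)

def save_session(session_id, data, ttl_seconds=1800):
    # Session lives in Redis with a TTL, not in local process memory
    r.setex(f"session:{session_id}", ttl_seconds, json.dumps(data))

def load_session(session_id):
    raw = r.get(f"session:{session_id}")
    return json.loads(raw) if raw else None
```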
### Message Queues
```yaml
use_cases:
- Decouple services
- Handle traffic spikes
- Async processing
- Retry logic
patterns:
work_queue:
description: "Distribute tasks to workers"
example: "Image processing, email sending"
pub_sub:
description: "Event broadcasting"
example: "User registration notifications"
```
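An in-process sketch of the work-queue pattern using Python's standard library (in production a broker such as RabbitMQ or SQS plays the role of the queue):
```python
import queue
import threading

tasks = queue.Queue()

def worker():
    while True:
        task = tasks.get()
        if task is None:          # sentinel: shut this worker down
            break
        print(f"processing {task}")
        tasks.task_done()

workers = [threading.Thread(target=worker) for _ in range(4)]
for w in workers:
    w.start()

for i in range(20):
    tasks.put(i)

tasks.join()                      # wait until every task is processed
for _ in workers:
    tasks.put(None)
```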
## Anti-Patterns to Avoid
### N+1 Query Problem
```sql
-- Bad: N+1 queries (1 for users + N for profiles)
SELECT * FROM users;
-- Then for each user:
SELECT * FROM profiles WHERE user_id = ?;
-- Good: Single join query
SELECT u.*, p.*
FROM users u
LEFT JOIN profiles p ON u.id = p.user_id;
```
### Chatty Interfaces
```yaml
bad:
requests: 100
description: "100 separate API calls to get data"
latency: "100 * 50ms = 5000ms"
good:
requests: 1
description: "Single batch API call"
latency: "200ms"
```
### Synchronous External Calls
```yaml
bad:
pattern: "Sequential blocking calls"
time: "call1 (500ms) + call2 (500ms) + call3 (500ms) = 1500ms"
good:
pattern: "Parallel async calls"
time: "max(call1, call2, call3) = 500ms"
```
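The same contrast in a short Python sketch with asyncio (`aiohttp` is an assumed dependency and the URLs are placeholders); total latency is roughly the slowest call rather than the sum:
```python
import asyncio
import aiohttp

URLS = ["https://svc-a/api", "https://svc-b/api", "https://svc-c/api"]  # placeholders

async def fetch(session, url):
    async with session.get(url) as resp:
        return await resp.json()

async def main():
    timeout = aiohttp.ClientTimeout(total=5)
    async with aiohttp.ClientSession(timeout=timeout) as session:
        # All three calls run concurrently instead of back to back
        return await asyncio.gather(*(fetch(session, u) for u in URLS))

results = asyncio.run(main())
```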
## Performance Testing Strategy
### Load Testing
```yaml
scenarios:
smoke_test:
users: 1
duration: "1 minute"
purpose: "Verify system works"
load_test:
users: "normal_traffic"
duration: "15 minutes"
purpose: "Performance under normal load"
stress_test:
users: "2x_normal"
duration: "30 minutes"
purpose: "Find breaking point"
spike_test:
users: "0 → 1000 → 0"
duration: "10 minutes"
purpose: "Handle sudden traffic spikes"
endurance_test:
users: "normal_traffic"
duration: "24 hours"
purpose: "Memory leaks, degradation"
```
### Performance Regression Tests
```yaml
approach:
- Baseline metrics from production
- Run automated perf tests in CI
- Compare against baseline
- Fail build if regression > threshold
thresholds:
response_time: "+10%"
throughput: "-5%"
error_rate: "+1%"
```
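A rough sketch of such a CI gate in Python (metric names, file paths, and the relative-change reading of the thresholds are all illustrative):
```python
import json
import sys

THRESHOLDS = {"response_time_ms": 0.10, "throughput_rps": -0.05, "error_rate": 0.01}

with open("baseline.json") as f:
    baseline = json.load(f)
with open("current.json") as f:
    current = json.load(f)

failures = []
for metric, limit in THRESHOLDS.items():
    delta = (current[metric] - baseline[metric]) / baseline[metric]
    # Positive limits cap increases (latency, errors); negative limits cap drops (throughput)
    if (limit >= 0 and delta > limit) or (limit < 0 and delta < limit):
        failures.append(f"{metric}: {delta:+.1%} (limit {limit:+.0%})")

if failures:
    sys.exit("Performance regression detected:\n" + "\n".join(failures))
print("No performance regressions detected")
```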
## Checklist
**Initial Assessment:**
- [ ] Identify performance requirements
- [ ] Establish current baseline metrics
- [ ] Profile to find bottlenecks
**Database Optimization:**
- [ ] Add indexes for common queries
- [ ] Implement connection pooling
- [ ] Cache query results
- [ ] Use batch operations
**Caching:**
- [ ] Implement multi-level caching
- [ ] Define cache invalidation strategy
- [ ] Monitor cache hit rates
**Network:**
- [ ] Enable compression
- [ ] Use HTTP/2 or HTTP/3
- [ ] Implement CDN for static assets
- [ ] Configure appropriate timeouts
**Monitoring:**
- [ ] Track key performance metrics
- [ ] Set up alerts for anomalies
- [ ] Implement distributed tracing
- [ ] Create performance dashboards
**Testing:**
- [ ] Run load tests
- [ ] Conduct stress tests
- [ ] Set up performance regression tests
- [ ] Monitor in production
---
*Language-agnostic performance optimization patterns applicable to any technology stack*


@@ -0,0 +1,433 @@
# Go Performance Optimization
**Load this file when:** Optimizing performance in Go projects
## Profiling Tools
### Built-in pprof
```bash
# CPU profiling
go test -cpuprofile=cpu.prof -bench=.
go tool pprof cpu.prof
# Memory profiling
go test -memprofile=mem.prof -bench=.
go tool pprof mem.prof
# Web UI for profiles
go tool pprof -http=:8080 cpu.prof
# Goroutine profiling
go tool pprof http://localhost:6060/debug/pprof/goroutine
# Heap profiling
go tool pprof http://localhost:6060/debug/pprof/heap
```
### Benchmarking
```go
// Basic benchmark
func BenchmarkFibonacci(b *testing.B) {
for i := 0; i < b.N; i++ {
fibonacci(20)
}
}
// With sub-benchmarks
func BenchmarkSizes(b *testing.B) {
sizes := []int{10, 100, 1000}
for _, size := range sizes {
b.Run(fmt.Sprintf("size=%d", size), func(b *testing.B) {
for i := 0; i < b.N; i++ {
process(size)
}
})
}
}
// Reset timer for setup
func BenchmarkWithSetup(b *testing.B) {
data := setupExpensiveData()
b.ResetTimer() // Don't count setup time
for i := 0; i < b.N; i++ {
process(data)
}
}
```
### Runtime Metrics
```go
import (
	"fmt"
	"net/http"
	_ "net/http/pprof" // Import for side effects: registers /debug/pprof handlers
	"runtime"
)
func init() {
// Enable profiling endpoint
go func() {
http.ListenAndServe("localhost:6060", nil)
}()
}
// Monitor goroutines
func printStats() {
fmt.Printf("Goroutines: %d\n", runtime.NumGoroutine())
var m runtime.MemStats
runtime.ReadMemStats(&m)
fmt.Printf("Alloc: %d MB\n", m.Alloc/1024/1024)
fmt.Printf("TotalAlloc: %d MB\n", m.TotalAlloc/1024/1024)
}
```
## Memory Management
### Avoiding Allocations
```go
// Bad: Allocates on every call
func process(data []byte) []byte {
result := make([]byte, len(data)) // New allocation
copy(result, data)
return result
}
// Good: Reuse buffer
var bufferPool = sync.Pool{
New: func() interface{} {
return make([]byte, 1024)
},
}
func process(data []byte) {
buf := bufferPool.Get().([]byte)
defer bufferPool.Put(buf)
// Process with buf
}
```
### Preallocate Slices
```go
// Bad: Multiple allocations as slice grows
items := []Item{}
for i := 0; i < 1000; i++ {
items = append(items, Item{i}) // Reallocates when cap exceeded
}
// Good: Single allocation
items := make([]Item, 0, 1000)
for i := 0; i < 1000; i++ {
items = append(items, Item{i}) // No reallocation
}
// Or if final size is known
items := make([]Item, 1000)
for i := 0; i < 1000; i++ {
items[i] = Item{i}
}
```
### String vs []byte
```go
// Bad: String concatenation allocates
var result string
for _, s := range strings {
result += s // New allocation each time
}
// Good: Use strings.Builder
var builder strings.Builder
builder.Grow(estimatedSize) // Preallocate
for _, s := range strings {
builder.WriteString(s)
}
result := builder.String()
// For byte operations, work with []byte
data := []byte("hello")
data = append(data, " world"...) // Efficient
```
## Goroutine Optimization
### Worker Pool Pattern
```go
// Bad: Unlimited goroutines
for _, task := range tasks {
go process(task) // Could spawn millions!
}
// Good: Limited worker pool
func workerPool(tasks <-chan Task, workers int) {
var wg sync.WaitGroup
for i := 0; i < workers; i++ {
wg.Add(1)
go func() {
defer wg.Done()
for task := range tasks {
process(task)
}
}()
}
wg.Wait()
}
// Usage
taskChan := make(chan Task, 100)
go workerPool(taskChan, 10) // 10 workers
```
### Channel Patterns
```go
// Buffered channels reduce blocking
ch := make(chan int, 100) // Buffer of 100
// Fan-out pattern for parallel work
func fanOut(in <-chan int, n int) []<-chan int {
outs := make([]<-chan int, n)
for i := 0; i < n; i++ {
out := make(chan int)
outs[i] = out
go func() {
for v := range in {
out <- process(v)
}
close(out)
}()
}
return outs
}
// Fan-in pattern to merge results
func fanIn(channels ...<-chan int) <-chan int {
out := make(chan int)
var wg sync.WaitGroup
for _, ch := range channels {
wg.Add(1)
go func(c <-chan int) {
defer wg.Done()
for v := range c {
out <- v
}
}(ch)
}
go func() {
wg.Wait()
close(out)
}()
return out
}
```
## Data Structure Optimization
### Map Preallocation
```go
// Bad: Map grows as needed
m := make(map[string]int)
for i := 0; i < 10000; i++ {
m[fmt.Sprint(i)] = i // Reallocates periodically
}
// Good: Preallocate
m := make(map[string]int, 10000)
for i := 0; i < 10000; i++ {
m[fmt.Sprint(i)] = i // No reallocation
}
```
### Struct Field Alignment
```go
// Bad: Poor alignment (40 bytes due to padding)
type BadLayout struct {
a bool // 1 byte + 7 padding
b int64 // 8 bytes
c bool // 1 byte + 7 padding
d int64 // 8 bytes
e bool // 1 byte + 7 padding
}
// Good: Optimal alignment (24 bytes)
type GoodLayout struct {
b int64 // 8 bytes
d int64 // 8 bytes
a bool // 1 byte
c bool // 1 byte
e bool // 1 byte + 5 padding
}
```
## I/O Optimization
### Buffered I/O
```go
// Bad: Default scanner buffering, no control over buffer size
file, _ := os.Open("file.txt")
scanner := bufio.NewScanner(file)
// Good: Buffered with custom size
file, _ := os.Open("file.txt")
reader := bufio.NewReaderSize(file, 64*1024) // 64KB buffer
scanner := bufio.NewScanner(reader)
```
### Connection Pooling
```go
// HTTP client with connection pooling
client := &http.Client{
Transport: &http.Transport{
MaxIdleConns: 100,
MaxIdleConnsPerHost: 10,
IdleConnTimeout: 90 * time.Second,
},
Timeout: 10 * time.Second,
}
// Database connection pool
db, _ := sql.Open("postgres", dsn)
db.SetMaxOpenConns(25)
db.SetMaxIdleConns(5)
db.SetConnMaxLifetime(5 * time.Minute)
```
## Performance Anti-Patterns
### Unnecessary Interface Conversions
```go
// Bad: Interface conversion in hot path
func process(items []interface{}) {
for _, item := range items {
v := item.(MyType) // Type assertion overhead
use(v)
}
}
// Good: Use concrete types
func process(items []MyType) {
for _, item := range items {
use(item) // Direct access
}
}
```
### Defer in Loops
```go
// Bad: Defers accumulate in loop
for _, file := range files {
f, _ := os.Open(file)
defer f.Close() // All close calls deferred until function returns!
}
// Good: Close immediately or use function
for _, file := range files {
func() {
f, _ := os.Open(file)
defer f.Close() // Deferred to end of this closure
process(f)
}()
}
```
### Lock Contention
```go
// Bad: Lock held during expensive operation
mu.Lock()
result := expensiveComputation(data)
cache[key] = result
mu.Unlock()
// Good: Minimize lock time
result := expensiveComputation(data)
mu.Lock()
cache[key] = result
mu.Unlock()
// Better: Use sync.Map for concurrent reads
var cache sync.Map
cache.Store(key, value)
val, ok := cache.Load(key)
```
## Compiler Optimizations
### Escape Analysis
```go
// Bad: Escapes to heap
func makeSlice() *[]int {
s := make([]int, 1000)
return &s // Pointer returned, allocates on heap
}
// Good: Stays on stack
func makeSlice() []int {
s := make([]int, 1000)
return s // Value returned, can stay on stack
}
// Check with: go build -gcflags='-m'
```
### Inline Functions
```go
// Small functions are inlined automatically
func add(a, b int) int {
return a + b // Will be inlined
}
// Prevent inlining if needed: //go:noinline
```
## Performance Checklist
**Before Optimizing:**
- [ ] Profile with pprof to identify bottlenecks
- [ ] Write benchmarks for hot paths
- [ ] Measure allocations with `-benchmem`
- [ ] Check for goroutine leaks
**Go-Specific Optimizations:**
- [ ] Preallocate slices and maps with known capacity
- [ ] Use `strings.Builder` for string concatenation
- [ ] Implement worker pools instead of unlimited goroutines
- [ ] Use buffered channels to reduce blocking
- [ ] Reuse buffers with `sync.Pool`
- [ ] Minimize allocations in hot paths
- [ ] Order struct fields by size (largest first)
- [ ] Use concrete types instead of interfaces in hot paths
- [ ] Avoid `defer` in tight loops
- [ ] Use `sync.Map` for concurrent read-heavy maps
**After Optimizing:**
- [ ] Re-profile to verify improvements
- [ ] Compare benchmarks: `benchstat old.txt new.txt`
- [ ] Check memory allocations decreased
- [ ] Monitor goroutine count in production
- [ ] Use `go test -race` to check for race conditions
## Tools and Packages
**Profiling:**
- `pprof` - Built-in profiler
- `go-torch` - Flamegraph generation
- `benchstat` - Compare benchmark results
- `trace` - Execution tracer
**Optimization:**
- `sync.Pool` - Object pooling
- `sync.Map` - Concurrent map
- `strings.Builder` - Efficient string building
- `bufio` - Buffered I/O
**Analysis:**
- `-gcflags='-m'` - Escape analysis
- `go test -race` - Race detector
- `go test -benchmem` - Memory allocations
- `goleak` - Goroutine leak detection
---
*Go-specific performance optimization with goroutines, channels, and profiling*


@@ -0,0 +1,406 @@
# JavaScript/Node.js Performance Optimization
**Load this file when:** Optimizing performance in JavaScript or Node.js projects
## Profiling Tools
### Node.js Built-in Profiler
```bash
# CPU profiling
node --prof app.js
node --prof-process isolate-0x*.log > processed.txt
# Inspect with Chrome DevTools
node --inspect app.js
# Open chrome://inspect
# Heap snapshots
node --inspect-brk app.js
# Take heap snapshots in DevTools
```
### Clinic.js Suite
```bash
# Install clinic
npm install -g clinic
# Doctor - Overall health check
clinic doctor -- node app.js
# Flame - Flamegraph profiling
clinic flame -- node app.js
# Bubbleprof - Async operations
clinic bubbleprof -- node app.js
# Heap profiler
clinic heapprofiler -- node app.js
```
### Performance Measurement
```bash
# 0x - Flamegraph generator
npx 0x app.js
# autocannon - HTTP load testing
npx autocannon http://localhost:3000
# lighthouse - Frontend performance
npx lighthouse https://example.com
```
## V8 Optimization Patterns
### Hidden Classes and Inline Caches
```javascript
// Bad: Dynamic property addition breaks hidden class
function Point(x, y) {
this.x = x;
this.y = y;
}
const p1 = new Point(1, 2);
p1.z = 3; // Deoptimizes!
// Good: Consistent object shape
function Point(x, y, z = 0) {
this.x = x;
this.y = y;
this.z = z; // Always present
}
```
### Avoid Polymorphism in Hot Paths
```javascript
// Bad: Type changes break optimization
function add(a, b) {
return a + b;
}
add(1, 2); // Optimized for numbers
add("a", "b"); // Deoptimized! Now handles strings too
// Good: Separate functions for different types
function addNumbers(a, b) {
return a + b; // Always numbers
}
function concatStrings(a, b) {
return a + b; // Always strings
}
```
### Array Optimization
```javascript
// Bad: Mixed types in array
const mixed = [1, "two", 3, "four"]; // Slow property access
// Good: Homogeneous arrays
const numbers = [1, 2, 3, 4]; // Fast element access
const strings = ["one", "two", "three"];
// Use typed arrays for numeric data
const buffer = new Float64Array(1000); // Faster than regular arrays
```
## Event Loop Optimization
### Avoid Blocking the Event Loop
```javascript
// Bad: Synchronous operations block event loop
const data = fs.readFileSync('large-file.txt');
const result = heavyComputation(data);
// Good: Async operations
const data = await fs.promises.readFile('large-file.txt');
const result = await processAsync(data);
// For CPU-intensive work, use worker threads
const { Worker } = require('worker_threads');
const worker = new Worker('./cpu-intensive.js');
```
### Batch Async Operations
```javascript
// Bad: Sequential async calls
for (const item of items) {
await processItem(item); // Waits for each
}
// Good: Parallel execution
await Promise.all(items.map(item => processItem(item)));
// Better: Controlled concurrency with p-limit
const pLimit = require('p-limit');
const limit = pLimit(10); // Max 10 concurrent
await Promise.all(
items.map(item => limit(() => processItem(item)))
);
```
## Memory Management
### Avoid Memory Leaks
```javascript
// Bad: Global variables and closures retain memory
let cache = {}; // Never cleared
function addToCache(key, value) {
cache[key] = value; // Grows indefinitely
}
// Good: Use WeakMap for caching
const cache = new WeakMap();
function addToCache(obj, value) {
cache.set(obj, value); // Auto garbage collected
}
// Good: Implement cache eviction
const LRU = require('lru-cache');
const cache = new LRU({ max: 500 });
```
### Stream Large Data
```javascript
// Bad: Load entire file in memory
const data = await fs.promises.readFile('large-file.txt');
const processed = data.toString().split('\n').map(process);
// Good: Stream processing
const readline = require('readline');
const stream = fs.createReadStream('large-file.txt');
const rl = readline.createInterface({ input: stream });
for await (const line of rl) {
process(line); // Process one line at a time
}
```
## Database Query Optimization
### Connection Pooling
```javascript
// Bad: Create new connection per request
async function query(sql) {
const conn = await mysql.createConnection(config);
const result = await conn.query(sql);
await conn.end();
return result;
}
// Good: Use connection pool
const pool = mysql.createPool(config);
async function query(sql) {
return pool.query(sql); // Reuses connections
}
```
### Batch Database Operations
```javascript
// Bad: Multiple round trips
for (const user of users) {
await db.insert('users', user);
}
// Good: Single batch insert
await db.batchInsert('users', users, 1000); // Chunks of 1000
```
## HTTP Server Optimization
### Compression
```javascript
const compression = require('compression');
app.use(compression({
level: 6, // Balance between speed and compression
threshold: 1024 // Only compress responses > 1KB
}));
```
### Caching Headers
```javascript
app.get('/static/*', (req, res) => {
res.setHeader('Cache-Control', 'public, max-age=31536000');
res.setHeader('ETag', computeETag(file));
res.sendFile(file);
});
```
### Keep-Alive Connections
```javascript
const http = require('http');
const server = http.createServer(app);
server.keepAliveTimeout = 60000; // Keep idle connections open for 60 seconds
```
## Frontend Performance
### Code Splitting
```javascript
// Dynamic imports for code splitting
const HeavyComponent = lazy(() => import('./HeavyComponent'));
// Route-based code splitting
const routes = [
{
path: '/dashboard',
component: lazy(() => import('./Dashboard'))
}
];
```
### Memoization
```javascript
// React.memo for expensive components
const ExpensiveComponent = React.memo(({ data }) => {
return <div>{expensiveRender(data)}</div>;
});
// useMemo for expensive computations
const sortedData = useMemo(() => {
  return [...data].sort(compare); // Copy first: sort() mutates in place
}, [data]);
// useCallback for stable function references
const handleClick = useCallback(() => {
doSomething(id);
}, [id]);
```
### Virtual Scrolling
```javascript
// For large lists, render only visible items
import { FixedSizeList } from 'react-window';
<FixedSizeList
height={600}
itemCount={10000}
itemSize={50}
width="100%"
>
{Row}
</FixedSizeList>
```
## Performance Anti-Patterns
### Unnecessary Re-renders
```javascript
// Bad: Creates new object on every render
function MyComponent() {
const style = { color: 'red' }; // New object each render
return <div style={style}>Text</div>;
}
// Good: Define outside or use useMemo
const style = { color: 'red' };
function MyComponent() {
return <div style={style}>Text</div>;
}
```
### Expensive Operations in Render
```javascript
// Bad: Expensive computation in render
function MyComponent({ items }) {
const sorted = items.sort(); // Sorts on every render!
return <List data={sorted} />;
}
// Good: Memoize expensive computations
function MyComponent({ items }) {
const sorted = useMemo(() => [...items].sort(), [items]);
return <List data={sorted} />;
}
```
## Benchmarking
### Simple Benchmarks
```javascript
const { performance } = require('perf_hooks');
function benchmark(fn, iterations = 1000) {
const start = performance.now();
for (let i = 0; i < iterations; i++) {
fn();
}
const end = performance.now();
console.log(`Avg: ${(end - start) / iterations}ms`);
}
benchmark(() => myFunction());
```
### Benchmark.js
```javascript
const Benchmark = require('benchmark');
const suite = new Benchmark.Suite;
suite
.add('Array#forEach', function() {
[1,2,3].forEach(x => x * 2);
})
.add('Array#map', function() {
[1,2,3].map(x => x * 2);
})
.on('complete', function() {
console.log('Fastest is ' + this.filter('fastest').map('name'));
})
.run();
```
## Performance Checklist
**Before Optimizing:**
- [ ] Profile with Chrome DevTools or clinic.js
- [ ] Identify hot paths and bottlenecks
- [ ] Measure baseline performance
**Node.js Optimizations:**
- [ ] Use worker threads for CPU-intensive tasks
- [ ] Implement connection pooling for databases
- [ ] Enable compression middleware
- [ ] Use streams for large data processing
- [ ] Implement caching (Redis, in-memory)
- [ ] Batch async operations with controlled concurrency
- [ ] Monitor event loop lag
**Frontend Optimizations:**
- [ ] Implement code splitting
- [ ] Use React.memo for expensive components
- [ ] Implement virtual scrolling for large lists
- [ ] Optimize bundle size (tree shaking, minification)
- [ ] Use Web Workers for heavy computations
- [ ] Implement service workers for offline caching
- [ ] Lazy load images and components
**After Optimizing:**
- [ ] Re-profile to verify improvements
- [ ] Check memory usage for leaks
- [ ] Run load tests (autocannon, artillery)
- [ ] Monitor with APM tools
## Tools and Libraries
**Profiling:**
- `clinic.js` - Performance profiling suite
- `0x` - Flamegraph profiler
- `node --inspect` - Chrome DevTools integration
- `autocannon` - HTTP load testing
**Optimization:**
- `p-limit` - Concurrency control
- `lru-cache` - LRU caching
- `compression` - Response compression
- `react-window` - Virtual scrolling
- `workerpool` - Worker thread pools
**Monitoring:**
- `prom-client` - Prometheus metrics
- `newrelic` / `datadog` - APM
- `clinic-doctor` - Health diagnostics
---
*JavaScript/Node.js-specific performance optimization with V8 patterns and profiling tools*


@@ -0,0 +1,326 @@
# Python Performance Optimization
**Load this file when:** Optimizing performance in Python projects
## Profiling Tools
### Execution Time Profiling
```bash
# cProfile - Built-in profiler
python -m cProfile -o profile.stats script.py
python -m pstats profile.stats
# py-spy - Sampling profiler (no code changes needed)
py-spy record -o profile.svg -- python script.py
py-spy top -- python script.py
# line_profiler - Line-by-line profiling
kernprof -l -v script.py
```
### Memory Profiling
```bash
# memory_profiler - Line-by-line memory usage
python -m memory_profiler script.py
# memray - Modern memory profiler
memray run script.py
memray flamegraph output.bin
# tracemalloc - Built-in memory tracking
# (use in code, see example below)
```
### Benchmarking
```bash
# pytest-benchmark
pytest tests/ --benchmark-only
# timeit - Quick microbenchmarks
python -m timeit "'-'.join(str(n) for n in range(100))"
```
## Python-Specific Optimization Patterns
### Async/Await Patterns
```python
import asyncio
import aiohttp
# Good: Parallel async operations
async def fetch_all(urls):
async with aiohttp.ClientSession() as session:
tasks = [fetch_url(session, url) for url in urls]
return await asyncio.gather(*tasks)
# Bad: Sequential async (defeats the purpose)
async def fetch_all_bad(urls):
results = []
async with aiohttp.ClientSession() as session:
for url in urls:
results.append(await fetch_url(session, url))
return results
```
### List Comprehensions vs Generators
```python
# Generator (memory efficient for large datasets)
def process_large_file(filename):
return (process_line(line) for line in open(filename))
# List comprehension (when you need all data in memory)
def process_small_file(filename):
return [process_line(line) for line in open(filename)]
# Use itertools for complex generators
from itertools import islice, chain
first_10 = list(islice(generate_data(), 10))
```
### Efficient Data Structures
```python
# Use sets for membership testing
# Bad: O(n)
if item in my_list: # Slow for large lists
...
# Good: O(1)
if item in my_set: # Fast
...
# Use deque for queue operations
from collections import deque
queue = deque()
queue.append(item) # O(1)
queue.popleft() # O(1) vs list.pop(0) which is O(n)
# Use defaultdict to avoid key checks
from collections import defaultdict
counter = defaultdict(int)
counter[key] += 1 # No need to check if key exists
```
## GIL (Global Interpreter Lock) Considerations
### CPU-Bound Work
```python
# Use multiprocessing for CPU-bound tasks
from multiprocessing import Pool
def cpu_intensive_task(data):
# Heavy computation
return result
with Pool(processes=4) as pool:
results = pool.map(cpu_intensive_task, data_list)
```
### I/O-Bound Work
```python
# Use asyncio or threading for I/O-bound tasks
import asyncio
async def io_bound_task(url):
# Network I/O, file I/O
return result
results = await asyncio.gather(*[io_bound_task(url) for url in urls])
```
## Common Python Anti-Patterns
### String Concatenation
```python
# Bad: O(n²) for n strings
result = ""
for s in strings:
result += s
# Good: O(n)
result = "".join(strings)
```
### Unnecessary Lambda
```python
# Bad: Extra function call overhead
sorted_items = sorted(items, key=lambda x: x.value)
# Good: Direct attribute access
from operator import attrgetter
sorted_items = sorted(items, key=attrgetter('value'))
```
### Loop Invariant Code
```python
# Bad: Repeated calculation in loop
for item in items:
expensive_result = expensive_function()
process(item, expensive_result)
# Good: Calculate once
expensive_result = expensive_function()
for item in items:
process(item, expensive_result)
```
## Performance Measurement
### Tracemalloc for Memory Tracking
```python
import tracemalloc
# Start tracking
tracemalloc.start()
# Your code here
data = [i for i in range(1000000)]
# Get memory usage
current, peak = tracemalloc.get_traced_memory()
print(f"Current: {current / 1024 / 1024:.2f} MB")
print(f"Peak: {peak / 1024 / 1024:.2f} MB")
tracemalloc.stop()
```
### Context Manager for Timing
```python
import time
from contextlib import contextmanager
@contextmanager
def timer(name):
start = time.perf_counter()
yield
elapsed = time.perf_counter() - start
print(f"{name}: {elapsed:.4f}s")
# Usage
with timer("Database query"):
results = db.query(...)
```
## Database Optimization (Python-Specific)
### SQLAlchemy Best Practices
```python
# Bad: N+1 queries
for user in session.query(User).all():
print(user.profile.bio) # Separate query for each
# Good: Eager loading
from sqlalchemy.orm import joinedload
users = session.query(User).options(
joinedload(User.profile)
).all()
# Good: Batch operations
session.bulk_insert_mappings(User, user_dicts)
session.commit()
```
## Caching Strategies
### Function Caching
```python
from functools import lru_cache, cache
# LRU cache with size limit
@lru_cache(maxsize=128)
def expensive_computation(n):
# Heavy computation
return result
# Unlimited cache (Python 3.9+)
@cache
def fibonacci(n):
if n < 2:
return n
return fibonacci(n-1) + fibonacci(n-2)
# Manual cache with expiration
from cachetools import TTLCache
cache = TTLCache(maxsize=100, ttl=300) # 5 minutes
```
## Performance Testing
### pytest-benchmark
```python
def test_processing_performance(benchmark):
# Benchmark automatically handles iterations
result = benchmark(process_data, large_dataset)
assert result is not None
# Compare against baseline
def test_against_baseline(benchmark):
benchmark.pedantic(
process_data,
args=(dataset,),
iterations=10,
rounds=100
)
```
### Load Testing with Locust
```python
from locust import HttpUser, task, between
class WebsiteUser(HttpUser):
wait_time = between(1, 3)
@task
def load_homepage(self):
self.client.get("/")
@task(3) # 3x more likely than homepage
def load_api(self):
self.client.get("/api/data")
```
## Performance Checklist
**Before Optimizing:**
- [ ] Profile to identify actual bottlenecks (don't guess!)
- [ ] Measure baseline performance
- [ ] Set performance targets
**Python-Specific Optimizations:**
- [ ] Use generators for large datasets
- [ ] Replace loops with list comprehensions where appropriate
- [ ] Use appropriate data structures (set, deque, defaultdict)
- [ ] Implement caching with @lru_cache or @cache
- [ ] Use async/await for I/O-bound operations
- [ ] Use multiprocessing for CPU-bound operations
- [ ] Avoid string concatenation in loops
- [ ] Minimize attribute lookups in hot loops
- [ ] Use `__slots__` for classes with many instances (see the sketch after this checklist)
**After Optimizing:**
- [ ] Re-profile to verify improvements
- [ ] Check memory usage hasn't increased significantly
- [ ] Ensure code readability is maintained
- [ ] Add performance regression tests
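A short sketch of the `__slots__` item above (class names are illustrative): declaring `__slots__` removes the per-instance `__dict__`, which saves memory when you create many instances.
```python
import sys

class PointDict:
    def __init__(self, x, y):
        self.x, self.y = x, y

class PointSlots:
    __slots__ = ("x", "y")
    def __init__(self, x, y):
        self.x, self.y = x, y

print(sys.getsizeof(PointDict(1, 2).__dict__))  # per-instance dict overhead
print(sys.getsizeof(PointSlots(1, 2)))          # compact fixed layout, no __dict__
```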
## Tools and Libraries
**Profiling:**
- `cProfile` - Built-in execution profiler
- `py-spy` - Sampling profiler without code changes
- `memory_profiler` - Memory usage line-by-line
- `memray` - Modern memory profiler with flamegraphs
**Performance Testing:**
- `pytest-benchmark` - Benchmark tests
- `locust` - Load testing framework
- `hyperfine` - Command-line benchmarking
**Optimization:**
- `numpy` - Vectorized operations for numerical data
- `numba` - JIT compilation for numerical functions
- `cython` - Compile Python to C for speed
---
*Python-specific performance optimization with profiling tools and patterns*


@@ -0,0 +1,382 @@
# Rust Performance Optimization
**Load this file when:** Optimizing performance in Rust projects
## Profiling Tools
### Benchmarking with Criterion
```bash
# Add to Cargo.toml
[dev-dependencies]
criterion = "0.5"
[[bench]]
name = "my_benchmark"
harness = false
# Run benchmarks
cargo bench
# Compare against a saved baseline (Criterion flags go after --)
cargo bench -- --save-baseline master   # record the baseline first
cargo bench -- --baseline master        # later runs compare against it
```
### CPU Profiling
```bash
# perf (Linux)
cargo build --release
perf record --call-graph dwarf ./target/release/myapp
perf report
# Instruments (macOS)
cargo instruments --release --template "Time Profiler"
# cargo-flamegraph
cargo install flamegraph
cargo flamegraph
# samply (cross-platform)
cargo install samply
samply record ./target/release/myapp
```
### Memory Profiling
```bash
# valgrind (memory leaks, cache performance)
cargo build
valgrind --tool=massif ./target/debug/myapp
# dhat (heap profiling)
# Add dhat crate to project
# cargo-bloat (binary size analysis)
cargo install cargo-bloat
cargo bloat --release
```
## Zero-Cost Abstractions
### Avoiding Unnecessary Allocations
```rust
// Bad: Allocates on every call
fn process_string(s: String) -> String {
s.to_uppercase()
}
// Good: Borrows, no allocation
fn process_string(s: &str) -> String {
s.to_uppercase()
}
// Best: In-place modification where possible
fn process_string_mut(s: &mut String) {
*s = s.to_uppercase();
}
```
### Stack vs Heap Allocation
```rust
// Stack: Fast, known size at compile time
let numbers = [1, 2, 3, 4, 5];
// Heap: Flexible, runtime-sized data
let numbers = vec![1, 2, 3, 4, 5];
// Use Box<[T]> for fixed-size heap data (smaller than Vec)
let numbers: Box<[i32]> = vec![1, 2, 3, 4, 5].into_boxed_slice();
```
### Iterator Chains vs For Loops
```rust
// Good: Zero-cost iterator chains (compiled to efficient code)
let sum: i32 = numbers
.iter()
.filter(|&&n| n > 0)
.map(|&n| n * 2)
.sum();
// Also good: Manual loop (similar performance)
let mut sum = 0;
for &n in numbers.iter() {
if n > 0 {
sum += n * 2;
}
}
// Choose iterators for readability, loops for complex logic
```
## Compilation Optimizations
### Release Profile Tuning
```toml
[profile.release]
opt-level = 3 # Maximum optimization
lto = "fat" # Link-time optimization
codegen-units = 1 # Better optimization, slower compile
strip = true # Strip symbols from binary
panic = "abort" # Smaller binary, no stack unwinding
[profile.release-with-debug]
inherits = "release"
debug = true # Keep debug symbols for profiling
```
### Target CPU Features
```bash
# Use native CPU features
RUSTFLAGS="-C target-cpu=native" cargo build --release
# Or in .cargo/config.toml
[build]
rustflags = ["-C", "target-cpu=native"]
```
## Memory Layout Optimization
### Struct Field Ordering
```rust
// Bad: Wasted padding (24 bytes under #[repr(C)]; the default Rust repr may reorder fields for you)
struct BadLayout {
a: u8, // 1 byte + 7 padding
b: u64, // 8 bytes
c: u8, // 1 byte + 7 padding
}
// Good: Minimal padding (16 bytes)
struct GoodLayout {
b: u64, // 8 bytes
a: u8, // 1 byte
c: u8, // 1 byte + 6 padding
}
// Use #[repr(C)] for consistent layout
#[repr(C)]
struct FixedLayout {
// Fields laid out in declaration order
}
```
### Enum Optimization
```rust
// Consider enum size (uses largest variant)
enum Large {
Small(u8),
Big([u8; 1000]), // Entire enum is 1000+ bytes!
}
// Better: Box large variants
enum Optimized {
Small(u8),
Big(Box<[u8; 1000]>), // Enum is now pointer-sized
}
```
## Concurrency Patterns
### Using Rayon for Data Parallelism
```rust
use rayon::prelude::*;
// Sequential
let sum: i32 = data.iter().map(|x| expensive(x)).sum();
// Parallel (automatic work stealing)
let sum: i32 = data.par_iter().map(|x| expensive(x)).sum();
```
### Async Runtime Optimization
```rust
// tokio - For I/O-heavy workloads
#[tokio::main(flavor = "multi_thread", worker_threads = 4)]
async fn main() {
// Async I/O operations
}
// async-std - Alternative runtime
// Choose based on ecosystem compatibility
```
## Common Rust Performance Patterns
### String Handling
```rust
// Avoid unnecessary clones
// Bad
fn process(s: String) -> String {
let upper = s.clone().to_uppercase();
upper
}
// Good
fn process(s: &str) -> String {
s.to_uppercase()
}
// Use Cow for conditional cloning
use std::borrow::Cow;
fn maybe_uppercase<'a>(s: &'a str, uppercase: bool) -> Cow<'a, str> {
if uppercase {
Cow::Owned(s.to_uppercase())
} else {
Cow::Borrowed(s)
}
}
```
### Collection Preallocation
```rust
// Bad: Multiple reallocations
let mut vec = Vec::new();
for i in 0..1000 {
vec.push(i);
}
// Good: Single allocation
let mut vec = Vec::with_capacity(1000);
for i in 0..1000 {
vec.push(i);
}
// Best: Use collect with size_hint
let vec: Vec<_> = (0..1000).collect();
```
### Minimize Clones
```rust
// Bad: Unnecessary clones in loop
for item in &items {
let owned = item.clone();
process(owned);
}
// Good: Borrow when possible
for item in &items {
process_borrowed(item);
}
// Use Rc/Arc only when necessary
use std::rc::Rc;
let shared = Rc::new(expensive_data);
let clone1 = Rc::clone(&shared); // Cheap pointer clone
```
## Performance Anti-Patterns
### Unnecessary Dynamic Dispatch
```rust
// Bad: Dynamic dispatch overhead
fn process(items: &[Box<dyn Trait>]) {
for item in items {
item.method(); // Virtual call
}
}
// Good: Static dispatch via generics
fn process<T: Trait>(items: &[T]) {
for item in items {
item.method(); // Direct call, can be inlined
}
}
```
### Lock Contention
```rust
// Bad: Holding lock during expensive operation
let data = mutex.lock().unwrap();
let result = expensive_computation(&data);
drop(data);
// Good: Release lock quickly
let cloned = {
let data = mutex.lock().unwrap();
data.clone()
};
let result = expensive_computation(&cloned);
```
## Benchmarking with Criterion
### Basic Benchmark
```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};
fn fibonacci_benchmark(c: &mut Criterion) {
c.bench_function("fib 20", |b| {
b.iter(|| fibonacci(black_box(20)))
});
}
criterion_group!(benches, fibonacci_benchmark);
criterion_main!(benches);
```
### Parameterized Benchmarks
```rust
use criterion::{black_box, BenchmarkId, Criterion};
fn bench_sizes(c: &mut Criterion) {
let mut group = c.benchmark_group("process");
for size in [10, 100, 1000, 10000].iter() {
group.bench_with_input(
BenchmarkId::from_parameter(size),
size,
|b, &size| {
b.iter(|| process_data(black_box(size)))
},
);
}
group.finish();
}
```
## Performance Checklist
**Before Optimizing:**
- [ ] Profile with release build to identify bottlenecks
- [ ] Measure baseline with criterion benchmarks
- [ ] Use cargo-flamegraph to visualize hot paths
**Rust-Specific Optimizations:**
- [ ] Enable LTO in release profile
- [ ] Use target-cpu=native for CPU-specific features
- [ ] Preallocate collections with `with_capacity`
- [ ] Prefer borrowing (&T) over owned (T) in APIs
- [ ] Use iterators over manual loops
- [ ] Minimize clones - use Rc/Arc only when needed
- [ ] Order struct fields by size (largest first)
- [ ] Box large enum variants
- [ ] Use rayon for CPU-bound parallelism
- [ ] Avoid unnecessary dynamic dispatch
**After Optimizing:**
- [ ] Re-benchmark to verify improvements
- [ ] Check binary size with cargo-bloat
- [ ] Profile memory with valgrind/dhat
- [ ] Add regression tests with criterion baselines
## Tools and Crates
**Profiling:**
- `criterion` - Statistical benchmarking
- `flamegraph` - Flamegraph generation
- `cargo-instruments` - macOS profiling
- `perf` - Linux performance analysis
- `dhat` - Heap profiling
**Optimization:**
- `rayon` - Data parallelism
- `tokio` / `async-std` - Async runtime
- `parking_lot` - Faster mutex/rwlock
- `smallvec` - Stack-allocated vectors
- `once_cell` - Lazy static initialization
**Analysis:**
- `cargo-bloat` - Binary size analysis
- `cargo-udeps` - Find unused dependencies
- `twiggy` - Code size profiler
---
*Rust-specific performance optimization with zero-cost abstractions and profiling tools*