Initial commit

skills/optimizing-performance/SKILL.md

---
name: Optimizing Performance
description: Optimize performance with profiling, caching strategies, database query optimization, and bottleneck analysis. Use when improving response times, implementing caching layers, or scaling for high load.
---

# Optimizing Performance

I help you identify and fix performance bottlenecks using language-specific profiling tools, optimization patterns, and best practices.

## When to Use Me

**Performance analysis:**
- "Profile this code for bottlenecks"
- "Analyze performance issues"
- "Why is this slow?"

**Optimization:**
- "Optimize database queries"
- "Improve response time"
- "Reduce memory usage"

**Scaling:**
- "Implement caching strategy"
- "Optimize for high load"
- "Scale this service"

## How I Work - Progressive Loading

I load only the performance guidance relevant to your language:

```yaml
Language Detection:
  "Python project" → Load @languages/PYTHON.md
  "Rust project" → Load @languages/RUST.md
  "JavaScript/Node.js" → Load @languages/JAVASCRIPT.md
  "Go project" → Load @languages/GO.md
  "Any language" → Load @languages/GENERIC.md
```

**Don't load all files!** Start with language detection, then load specific guidance.

## Core Principles

### 1. Measure First
**Never optimize without data.** Profile to find actual bottlenecks, don't guess.

- Establish baseline metrics
- Profile to identify hot paths
- Focus on the 20% of code that takes 80% of the time
- Measure improvements after optimization

### 2. Performance Budgets
Set clear targets before optimizing:

```yaml
targets:
  api_response: "<200ms (p95)"
  page_load: "<2 seconds"
  database_query: "<50ms (p95)"
  cache_lookup: "<10ms"
```

### 3. Trade-offs
Balance performance against:
- Code readability
- Maintainability
- Development time
- Memory usage

Premature optimization is the root of all evil. Optimize only when:
- Profiling shows a clear bottleneck
- A performance requirement is not met
- User experience is degraded

## Quick Wins (Language-Agnostic)

### Database
- Add indexes for frequently queried columns
- Implement connection pooling
- Use batch operations instead of loops
- Cache expensive query results

### Caching
- Implement multi-level caching (L1: in-memory, L2: Redis, L3: database, L4: CDN)
- Define a cache invalidation strategy
- Monitor cache hit rates

### Network
- Enable compression for responses
- Use HTTP/2 or HTTP/3
- Implement a CDN for static assets
- Configure appropriate timeouts

## Language-Specific Guidance

### Python
**Load:** `@languages/PYTHON.md`

**Quick reference:**
- Profiling: `cProfile`, `py-spy`, `memory_profiler`
- Patterns: Generators, async/await, list comprehensions
- Anti-patterns: String concatenation in loops, GIL contention
- Tools: `pytest-benchmark`, `locust`

### Rust
**Load:** `@languages/RUST.md`

**Quick reference:**
- Profiling: `cargo bench`, `flamegraph`, `perf`
- Patterns: Zero-cost abstractions, iterator chains, preallocated collections
- Anti-patterns: Unnecessary allocations, large enum variants
- Tools: `criterion`, `rayon`, `parking_lot`

### JavaScript/Node.js
**Load:** `@languages/JAVASCRIPT.md`

**Quick reference:**
- Profiling: `clinic.js`, `0x`, Chrome DevTools
- Patterns: Event loop optimization, worker threads, streaming
- Anti-patterns: Blocking event loop, memory leaks, unnecessary re-renders
- Tools: `autocannon`, `react-window`, `p-limit`

### Go
**Load:** `@languages/GO.md`

**Quick reference:**
- Profiling: `pprof`, `go test -bench`, `go tool trace`
- Patterns: Goroutine pools, buffered channels, `sync.Pool`
- Anti-patterns: Unlimited goroutines, defer in loops, lock contention
- Tools: `benchstat`, `sync.Map`, `strings.Builder`

### Generic Patterns
**Load:** `@languages/GENERIC.md`

**When to use:** Database optimization, caching strategies, load balancing, monitoring - applicable to any language.

## Optimization Workflow

### Phase 1: Baseline
1. Define performance requirements
2. Measure current performance
3. Identify user-facing metrics (response time, throughput)

### Phase 2: Profile
1. Use language-specific profiling tools
2. Identify hot paths (where time is spent)
3. Find memory bottlenecks
4. Check for resource leaks

### Phase 3: Optimize
1. Focus on the biggest bottleneck first
2. Apply language-specific optimizations
3. Implement caching where appropriate
4. Optimize database queries

### Phase 4: Verify
1. Re-profile to measure improvements
2. Run performance regression tests
3. Monitor in production
4. Set up alerts for degradation

## Common Bottlenecks

### Database
- Missing indexes
- N+1 query problem
- No connection pooling
- Expensive joins

→ **Load** `@languages/GENERIC.md` for DB optimization

### Memory
- Memory leaks
- Excessive allocations
- Large object graphs
- No pooling

→ **Load** the language-specific file for memory management

### Network
- No compression
- Chatty API calls
- Synchronous external calls
- No CDN

→ **Load** `@languages/GENERIC.md` for network optimization

### Concurrency
- Lock contention
- Excessive threading/goroutines
- Blocking operations
- Poor work distribution

→ **Load** the language-specific file for concurrency patterns

## Success Criteria

**Optimization complete when:**
- ✅ Performance targets met
- ✅ No regressions in functionality
- ✅ Code remains maintainable
- ✅ Improvements verified with profiling
- ✅ Production metrics show improvement
- ✅ Alerts configured for degradation

## Next Steps

- Use profiling tools to identify bottlenecks
- Load language-specific guidance
- Apply targeted optimizations
- Set up monitoring and alerts

---

*Load language-specific files for detailed profiling tools, optimization patterns, and best practices*

skills/optimizing-performance/languages/GENERIC.md

# Generic Performance Optimization

**Load this file when:** Optimizing performance in any language, or when you need language-agnostic patterns

## Universal Principles

### Measure First
- Never optimize without profiling
- Establish baseline metrics before changes
- Focus on bottlenecks, not micro-optimizations
- Use the 80/20 rule: 80% of time is spent in 20% of the code

### Performance Budgets
```yaml
response_time_targets:
  api_endpoint: "<200ms (p95)"
  page_load: "<2 seconds"
  database_query: "<50ms (p95)"
  cache_lookup: "<10ms"

resource_limits:
  max_memory: "512MB per process"
  max_cpu: "80% sustained"
  max_connections: "100 per instance"
```

## Database Optimization

### Indexing Strategy
```sql
-- Identify slow queries
-- PostgreSQL
SELECT query, mean_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;

-- Add indexes for frequently queried columns
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_orders_user_created ON orders(user_id, created_at);

-- Composite indexes for common query patterns
CREATE INDEX idx_search ON products(category, price, created_at);
```

### Query Optimization
```sql
-- Use EXPLAIN to understand query plans
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'user@example.com';

-- Avoid SELECT *
-- Bad
SELECT * FROM users;

-- Good
SELECT id, name, email FROM users;

-- Use LIMIT for pagination
SELECT id, name FROM users ORDER BY created_at DESC LIMIT 20 OFFSET 0;

-- Use EXISTS instead of COUNT for checking existence
-- Bad
SELECT COUNT(*) FROM orders WHERE user_id = 123;

-- Good
SELECT EXISTS(SELECT 1 FROM orders WHERE user_id = 123);
```

### Connection Pooling
```yaml
connection_pool_config:
  min_connections: 5
  max_connections: 20
  connection_timeout: 30s
  idle_timeout: 10m
  max_lifetime: 1h
```
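
One way to apply a configuration like this in code is through your database client's engine options. A minimal sketch using SQLAlchemy (one possible client; the DSN is a placeholder, and SQLAlchemy's parameter names map onto the settings above only approximately):

```python
from sqlalchemy import create_engine, text

engine = create_engine(
    "postgresql://user:pass@db/app",  # placeholder DSN
    pool_size=5,         # ~min_connections: connections kept resident
    max_overflow=15,     # allows bursts up to 20 total (~max_connections)
    pool_timeout=30,     # seconds to wait for a free connection
    pool_recycle=3600,   # retire connections older than max_lifetime (1h)
    pool_pre_ping=True,  # transparently replace dead connections
)

with engine.connect() as conn:
    conn.execute(text("SELECT 1"))
```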

## Caching Strategies

### Multi-Level Caching
```yaml
caching_layers:
  L1_application:
    type: "In-Memory (LRU)"
    size: "100MB"
    ttl: "5 minutes"
    use_case: "Hot data, session data"

  L2_distributed:
    type: "Redis"
    ttl: "1 hour"
    use_case: "Shared data across instances"

  L3_database:
    type: "Query Result Cache"
    ttl: "15 minutes"
    use_case: "Expensive query results"

  L4_cdn:
    type: "CDN"
    ttl: "24 hours"
    use_case: "Static assets, public API responses"
```
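
A minimal sketch of a read-through lookup across the first layers above, assuming a Python service with a `redis`-style client and a `load_from_db` loader (both placeholders, not part of this document's stack):

```python
import json
import time

L1: dict = {}
L1_TTL = 300  # seconds, matching the 5-minute L1 policy above

def get(key, redis_client, load_from_db):
    hit = L1.get(key)
    if hit and hit[1] > time.time():
        return hit[0]                          # L1: in-process hit
    raw = redis_client.get(key)                # L2: shared Redis
    if raw is not None:
        value = json.loads(raw)
    else:
        value = load_from_db(key)              # fall through toward L3
        redis_client.setex(key, 3600, json.dumps(value))  # 1-hour L2 TTL
    L1[key] = (value, time.time() + L1_TTL)    # repopulate L1 on the way out
    return value
```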

### Cache Invalidation Patterns
```yaml
strategies:
  time_based:
    description: "TTL-based expiration"
    use_case: "Data with predictable change patterns"
    example: "Weather data, stock prices"

  event_based:
    description: "Invalidate on data change events"
    use_case: "Real-time consistency required"
    example: "User profile updates"

  write_through:
    description: "Update cache on write"
    use_case: "Strong consistency needed"
    example: "Shopping cart, user sessions"

  lazy_refresh:
    description: "Refresh on cache miss"
    use_case: "Acceptable stale data"
    example: "Analytics dashboards"
```

## Network Optimization

### HTTP/2 and HTTP/3
```yaml
benefits:
  - Multiplexing: Multiple requests over a single connection
  - Header compression: Reduced overhead
  - Server push: Proactive resource sending
  - Binary protocol: Faster parsing
```

### Compression
```yaml
compression_config:
  enabled: true
  min_size: "1KB"  # Don't compress tiny responses
  types:
    - "text/html"
    - "text/css"
    - "application/javascript"
    - "application/json"
  level: 6  # Balance speed vs size
```

### Connection Management
```yaml
keep_alive:
  enabled: true
  timeout: "60s"
  max_requests: 100

timeouts:
  connect: "10s"
  read: "30s"
  write: "30s"
  idle: "120s"
```

## Monitoring and Observability

### Key Metrics to Track
```yaml
application_metrics:
  - response_time_p50
  - response_time_p95
  - response_time_p99
  - error_rate
  - throughput_rps

system_metrics:
  - cpu_utilization
  - memory_utilization
  - disk_io
  - network_io

database_metrics:
  - query_execution_time
  - connection_pool_usage
  - slow_query_count
  - cache_hit_rate
```
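
As an illustration, the latency metrics above can be recorded with a histogram, here using Python's `prometheus_client` (one common metrics library; the port and endpoint label are illustrative):

```python
import random
import time

from prometheus_client import Histogram, start_http_server

REQUEST_LATENCY = Histogram(
    "http_request_latency_seconds",
    "HTTP request latency",
    ["endpoint"],
)

def handle_request():
    # Observe the duration of each request under its endpoint label
    with REQUEST_LATENCY.labels(endpoint="/api/data").time():
        time.sleep(random.uniform(0.01, 0.05))  # stand-in for real work

start_http_server(9102)  # expose /metrics for the scraper
# p50/p95/p99 are derived from the histogram buckets at query time
```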

### Alert Thresholds
```yaml
alerts:
  critical:
    - metric: "error_rate"
      threshold: ">5%"
      duration: "2 minutes"

    - metric: "response_time_p99"
      threshold: ">1000ms"
      duration: "5 minutes"

  warning:
    - metric: "cpu_utilization"
      threshold: ">80%"
      duration: "10 minutes"

    - metric: "memory_utilization"
      threshold: ">85%"
      duration: "5 minutes"
```

## Load Balancing

### Strategies
```yaml
round_robin:
  description: "Distribute requests evenly"
  use_case: "Homogeneous backend servers"

least_connections:
  description: "Route to the server with fewest connections"
  use_case: "Varying request processing times"

ip_hash:
  description: "Consistent routing based on client IP"
  use_case: "Session affinity required"

weighted:
  description: "Route based on server capacity"
  use_case: "Heterogeneous server specs"
```

### Health Checks
```yaml
health_check:
  interval: "10s"
  timeout: "5s"
  unhealthy_threshold: 3
  healthy_threshold: 2
  path: "/health"
```

## CDN Configuration

### Caching Rules
```yaml
static_assets:
  pattern: "*.{js,css,png,jpg,svg,woff2}"
  cache_control: "public, max-age=31536000, immutable"

api_responses:
  pattern: "/api/public/*"
  cache_control: "public, max-age=300, s-maxage=600"

html_pages:
  pattern: "*.html"
  cache_control: "public, max-age=60, s-maxage=300"
```

### Geographic Distribution
```yaml
regions:
  - us-east: "Primary"
  - us-west: "Failover"
  - eu-west: "Regional"
  - ap-southeast: "Regional"

routing:
  policy: "latency-based"
  fallback: "round-robin"
```

## Horizontal Scaling Patterns

### Stateless Services
```yaml
principles:
  - No local state storage
  - Session data in external store (Redis, database)
  - Any instance can handle any request
  - Easy to add/remove instances
```

### Message Queues
```yaml
use_cases:
  - Decouple services
  - Handle traffic spikes
  - Async processing
  - Retry logic

patterns:
  work_queue:
    description: "Distribute tasks to workers"
    example: "Image processing, email sending"

  pub_sub:
    description: "Event broadcasting"
    example: "User registration notifications"
```
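
A minimal work-queue sketch using Python's standard library; threads stand in for separate worker processes, and `handle` is a placeholder task:

```python
import queue
import threading

def handle(job: str) -> None:
    print("processing", job)  # placeholder: resize an image, send an email, ...

tasks: queue.Queue = queue.Queue()

def worker() -> None:
    while True:
        job = tasks.get()
        try:
            handle(job)
        finally:
            tasks.task_done()  # ack so join() can track completion

for _ in range(4):  # fixed-size worker pool absorbs traffic spikes
    threading.Thread(target=worker, daemon=True).start()

for job in ["img-1", "img-2", "mail-3"]:
    tasks.put(job)
tasks.join()  # block until every queued task has been handled
```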

## Anti-Patterns to Avoid

### N+1 Query Problem
```sql
-- Bad: N+1 queries (1 for users + N for profiles)
SELECT * FROM users;
-- Then for each user:
SELECT * FROM profiles WHERE user_id = ?;

-- Good: Single join query
SELECT u.*, p.*
FROM users u
LEFT JOIN profiles p ON u.id = p.user_id;
```

### Chatty Interfaces
```yaml
bad:
  requests: 100
  description: "100 separate API calls to get data"
  latency: "100 * 50ms = 5000ms"

good:
  requests: 1
  description: "Single batch API call"
  latency: "200ms"
```

### Synchronous External Calls
```yaml
bad:
  pattern: "Sequential blocking calls"
  time: "call1 (500ms) + call2 (500ms) + call3 (500ms) = 1500ms"

good:
  pattern: "Parallel async calls"
  time: "max(call1, call2, call3) = 500ms"
```
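
A sketch of the "good" case in Python with `asyncio`; the service names and 500ms sleep are stand-ins for real network calls:

```python
import asyncio

async def call_service(name: str) -> str:
    await asyncio.sleep(0.5)  # stands in for a 500ms network call
    return f"{name}: ok"

async def main() -> None:
    # Wall-clock time ≈ the slowest call (~500ms), not the 1500ms sum
    results = await asyncio.gather(
        call_service("inventory"),
        call_service("pricing"),
        call_service("shipping"),
    )
    print(results)

asyncio.run(main())
```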

## Performance Testing Strategy

### Load Testing
```yaml
scenarios:
  smoke_test:
    users: 1
    duration: "1 minute"
    purpose: "Verify system works"

  load_test:
    users: "normal_traffic"
    duration: "15 minutes"
    purpose: "Performance under normal load"

  stress_test:
    users: "2x_normal"
    duration: "30 minutes"
    purpose: "Find breaking point"

  spike_test:
    users: "0 → 1000 → 0"
    duration: "10 minutes"
    purpose: "Handle sudden traffic spikes"

  endurance_test:
    users: "normal_traffic"
    duration: "24 hours"
    purpose: "Memory leaks, degradation"
```

### Performance Regression Tests
```yaml
approach:
  - Baseline metrics from production
  - Run automated perf tests in CI
  - Compare against baseline
  - Fail build if regression > threshold

thresholds:
  response_time: "+10%"
  throughput: "-5%"
  error_rate: "+1%"
```
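
A sketch of such a CI gate in Python, assuming `baseline` and `current` are parsed from stored benchmark output (the numbers here are illustrative):

```python
import sys

baseline = {"response_time_ms": 180.0, "throughput_rps": 950.0, "error_rate_pct": 0.4}
current = {"response_time_ms": 195.0, "throughput_rps": 940.0, "error_rate_pct": 0.6}

failures = []
if current["response_time_ms"] > baseline["response_time_ms"] * 1.10:
    failures.append("response_time")  # regressed more than +10%
if current["throughput_rps"] < baseline["throughput_rps"] * 0.95:
    failures.append("throughput")     # dropped more than -5%
if current["error_rate_pct"] - baseline["error_rate_pct"] > 1.0:
    failures.append("error_rate")     # rose more than +1 point

if failures:
    sys.exit("Performance regression: " + ", ".join(failures))  # non-zero exit fails the build
```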

## Checklist

**Initial Assessment:**
- [ ] Identify performance requirements
- [ ] Establish current baseline metrics
- [ ] Profile to find bottlenecks

**Database Optimization:**
- [ ] Add indexes for common queries
- [ ] Implement connection pooling
- [ ] Cache query results
- [ ] Use batch operations

**Caching:**
- [ ] Implement multi-level caching
- [ ] Define cache invalidation strategy
- [ ] Monitor cache hit rates

**Network:**
- [ ] Enable compression
- [ ] Use HTTP/2 or HTTP/3
- [ ] Implement CDN for static assets
- [ ] Configure appropriate timeouts

**Monitoring:**
- [ ] Track key performance metrics
- [ ] Set up alerts for anomalies
- [ ] Implement distributed tracing
- [ ] Create performance dashboards

**Testing:**
- [ ] Run load tests
- [ ] Conduct stress tests
- [ ] Set up performance regression tests
- [ ] Monitor in production

---

*Language-agnostic performance optimization patterns applicable to any technology stack*

skills/optimizing-performance/languages/GO.md

# Go Performance Optimization

**Load this file when:** Optimizing performance in Go projects

## Profiling Tools

### Built-in pprof
```bash
# CPU profiling
go test -cpuprofile=cpu.prof -bench=.
go tool pprof cpu.prof

# Memory profiling
go test -memprofile=mem.prof -bench=.
go tool pprof mem.prof

# Web UI for profiles
go tool pprof -http=:8080 cpu.prof

# Goroutine profiling
go tool pprof http://localhost:6060/debug/pprof/goroutine

# Heap profiling
go tool pprof http://localhost:6060/debug/pprof/heap
```

### Benchmarking
```go
// Basic benchmark
func BenchmarkFibonacci(b *testing.B) {
    for i := 0; i < b.N; i++ {
        fibonacci(20)
    }
}

// With sub-benchmarks
func BenchmarkSizes(b *testing.B) {
    sizes := []int{10, 100, 1000}
    for _, size := range sizes {
        b.Run(fmt.Sprintf("size=%d", size), func(b *testing.B) {
            for i := 0; i < b.N; i++ {
                process(size)
            }
        })
    }
}

// Reset timer for setup
func BenchmarkWithSetup(b *testing.B) {
    data := setupExpensiveData()
    b.ResetTimer() // Don't count setup time

    for i := 0; i < b.N; i++ {
        process(data)
    }
}
```

### Runtime Metrics
```go
import (
    "fmt"
    "net/http"
    _ "net/http/pprof" // Import for side effects
    "runtime"
)

func init() {
    // Enable profiling endpoint
    go func() {
        http.ListenAndServe("localhost:6060", nil)
    }()
}

// Monitor goroutines
func printStats() {
    fmt.Printf("Goroutines: %d\n", runtime.NumGoroutine())

    var m runtime.MemStats
    runtime.ReadMemStats(&m)
    fmt.Printf("Alloc: %d MB\n", m.Alloc/1024/1024)
    fmt.Printf("TotalAlloc: %d MB\n", m.TotalAlloc/1024/1024)
}
```

## Memory Management

### Avoiding Allocations
```go
// Bad: Allocates on every call
func process(data []byte) []byte {
    result := make([]byte, len(data)) // New allocation
    copy(result, data)
    return result
}

// Good: Reuse buffers
var bufferPool = sync.Pool{
    New: func() interface{} {
        return make([]byte, 1024)
    },
}

func process(data []byte) {
    buf := bufferPool.Get().([]byte)
    defer bufferPool.Put(buf)
    // Process with buf
}
```

### Preallocate Slices
```go
// Bad: Multiple allocations as slice grows
items := []Item{}
for i := 0; i < 1000; i++ {
    items = append(items, Item{i}) // Reallocates when cap exceeded
}

// Good: Single allocation
items := make([]Item, 0, 1000)
for i := 0; i < 1000; i++ {
    items = append(items, Item{i}) // No reallocation
}

// Or, if the final size is known
items := make([]Item, 1000)
for i := 0; i < 1000; i++ {
    items[i] = Item{i}
}
```

### String vs []byte
```go
// Bad: String concatenation allocates
var result string
for _, s := range strings {
    result += s // New allocation each time
}

// Good: Use strings.Builder
var builder strings.Builder
builder.Grow(estimatedSize) // Preallocate
for _, s := range strings {
    builder.WriteString(s)
}
result := builder.String()

// For byte operations, work with []byte
data := []byte("hello")
data = append(data, " world"...) // Efficient
```

## Goroutine Optimization

### Worker Pool Pattern
```go
// Bad: Unlimited goroutines
for _, task := range tasks {
    go process(task) // Could spawn millions!
}

// Good: Limited worker pool
func workerPool(tasks <-chan Task, workers int) {
    var wg sync.WaitGroup
    for i := 0; i < workers; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for task := range tasks {
                process(task)
            }
        }()
    }
    wg.Wait()
}

// Usage
taskChan := make(chan Task, 100)
go workerPool(taskChan, 10) // 10 workers
```

### Channel Patterns
```go
// Buffered channels reduce blocking
ch := make(chan int, 100) // Buffer of 100

// Fan-out pattern for parallel work
func fanOut(in <-chan int, n int) []<-chan int {
    outs := make([]<-chan int, n)
    for i := 0; i < n; i++ {
        out := make(chan int)
        outs[i] = out
        go func() {
            for v := range in {
                out <- process(v)
            }
            close(out)
        }()
    }
    return outs
}

// Fan-in pattern to merge results
func fanIn(channels ...<-chan int) <-chan int {
    out := make(chan int)
    var wg sync.WaitGroup

    for _, ch := range channels {
        wg.Add(1)
        go func(c <-chan int) {
            defer wg.Done()
            for v := range c {
                out <- v
            }
        }(ch)
    }

    go func() {
        wg.Wait()
        close(out)
    }()

    return out
}
```

## Data Structure Optimization

### Map Preallocation
```go
// Bad: Map grows as needed
m := make(map[string]int)
for i := 0; i < 10000; i++ {
    m[fmt.Sprint(i)] = i // Reallocates periodically
}

// Good: Preallocate
m := make(map[string]int, 10000)
for i := 0; i < 10000; i++ {
    m[fmt.Sprint(i)] = i // No reallocation
}
```

### Struct Field Alignment
```go
// Bad: Poor alignment (40 bytes due to padding)
type BadLayout struct {
    a bool  // 1 byte + 7 padding
    b int64 // 8 bytes
    c bool  // 1 byte + 7 padding
    d int64 // 8 bytes
    e bool  // 1 byte + 7 padding
}

// Good: Optimal alignment (24 bytes)
type GoodLayout struct {
    b int64 // 8 bytes
    d int64 // 8 bytes
    a bool  // 1 byte
    c bool  // 1 byte
    e bool  // 1 byte + 5 padding
}
```
## I/O Optimization
|
||||
|
||||
### Buffered I/O
|
||||
```go
|
||||
// Bad: Unbuffered reads
|
||||
file, _ := os.Open("file.txt")
|
||||
scanner := bufio.NewScanner(file)
|
||||
|
||||
// Good: Buffered with custom size
|
||||
file, _ := os.Open("file.txt")
|
||||
reader := bufio.NewReaderSize(file, 64*1024) // 64KB buffer
|
||||
scanner := bufio.NewScanner(reader)
|
||||
```
|
||||
|
||||
### Connection Pooling
|
||||
```go
|
||||
// HTTP client with connection pooling
|
||||
client := &http.Client{
|
||||
Transport: &http.Transport{
|
||||
MaxIdleConns: 100,
|
||||
MaxIdleConnsPerHost: 10,
|
||||
IdleConnTimeout: 90 * time.Second,
|
||||
},
|
||||
Timeout: 10 * time.Second,
|
||||
}
|
||||
|
||||
// Database connection pool
|
||||
db, _ := sql.Open("postgres", dsn)
|
||||
db.SetMaxOpenConns(25)
|
||||
db.SetMaxIdleConns(5)
|
||||
db.SetConnMaxLifetime(5 * time.Minute)
|
||||
```
|
||||
|
||||

## Performance Anti-Patterns

### Unnecessary Interface Conversions
```go
// Bad: Interface conversion in hot path
func process(items []interface{}) {
    for _, item := range items {
        v := item.(MyType) // Type assertion overhead
        use(v)
    }
}

// Good: Use concrete types
func process(items []MyType) {
    for _, item := range items {
        use(item) // Direct access
    }
}
```

### Defer in Loops
```go
// Bad: Defers accumulate in loop
for _, file := range files {
    f, _ := os.Open(file)
    defer f.Close() // All close calls deferred until function returns!
}

// Good: Close immediately or use a function
for _, file := range files {
    func() {
        f, _ := os.Open(file)
        defer f.Close() // Deferred to end of this closure
        process(f)
    }()
}
```

### Lock Contention
```go
// Bad: Lock held during expensive operation
mu.Lock()
result := expensiveComputation(data)
cache[key] = result
mu.Unlock()

// Good: Minimize lock time
result := expensiveComputation(data)
mu.Lock()
cache[key] = result
mu.Unlock()

// Better: Use sync.Map for concurrent reads
var cache sync.Map
cache.Store(key, value)
val, ok := cache.Load(key)
```

## Compiler Optimizations

### Escape Analysis
```go
// Bad: Escapes to heap
func makeSlice() *[]int {
    s := make([]int, 1000)
    return &s // Pointer returned, allocates on heap
}

// Good: Stays on stack
func makeSlice() []int {
    s := make([]int, 1000)
    return s // Value returned, can stay on stack
}

// Check with: go build -gcflags='-m'
```

### Inline Functions
```go
// Small functions are inlined automatically
func add(a, b int) int {
    return a + b // Will be inlined
}

// Prevent inlining if needed: //go:noinline
```

## Performance Checklist

**Before Optimizing:**
- [ ] Profile with pprof to identify bottlenecks
- [ ] Write benchmarks for hot paths
- [ ] Measure allocations with `-benchmem`
- [ ] Check for goroutine leaks

**Go-Specific Optimizations:**
- [ ] Preallocate slices and maps with known capacity
- [ ] Use `strings.Builder` for string concatenation
- [ ] Implement worker pools instead of unlimited goroutines
- [ ] Use buffered channels to reduce blocking
- [ ] Reuse buffers with `sync.Pool`
- [ ] Minimize allocations in hot paths
- [ ] Order struct fields by size (largest first)
- [ ] Use concrete types instead of interfaces in hot paths
- [ ] Avoid `defer` in tight loops
- [ ] Use `sync.Map` for concurrent read-heavy maps

**After Optimizing:**
- [ ] Re-profile to verify improvements
- [ ] Compare benchmarks: `benchstat old.txt new.txt`
- [ ] Check memory allocations decreased
- [ ] Monitor goroutine count in production
- [ ] Use `go test -race` to check for race conditions

## Tools and Packages

**Profiling:**
- `pprof` - Built-in profiler
- `go-torch` - Flamegraph generation
- `benchstat` - Compare benchmark results
- `trace` - Execution tracer

**Optimization:**
- `sync.Pool` - Object pooling
- `sync.Map` - Concurrent map
- `strings.Builder` - Efficient string building
- `bufio` - Buffered I/O

**Analysis:**
- `-gcflags='-m'` - Escape analysis
- `go test -race` - Race detector
- `go test -benchmem` - Memory allocations
- `goleak` - Goroutine leak detection

---

*Go-specific performance optimization with goroutines, channels, and profiling*

skills/optimizing-performance/languages/JAVASCRIPT.md

# JavaScript/Node.js Performance Optimization

**Load this file when:** Optimizing performance in JavaScript or Node.js projects

## Profiling Tools

### Node.js Built-in Profiler
```bash
# CPU profiling
node --prof app.js
node --prof-process isolate-0x*.log > processed.txt

# Inspect with Chrome DevTools
node --inspect app.js
# Open chrome://inspect

# Heap snapshots (break before user code starts)
node --inspect-brk app.js
# Take heap snapshots in DevTools
```

### Clinic.js Suite
```bash
# Install clinic
npm install -g clinic

# Doctor - Overall health check
clinic doctor -- node app.js

# Flame - Flamegraph profiling
clinic flame -- node app.js

# Bubbleprof - Async operations
clinic bubbleprof -- node app.js

# Heap profiler
clinic heapprofiler -- node app.js
```

### Performance Measurement
```bash
# 0x - Flamegraph generator
npx 0x app.js

# autocannon - HTTP load testing
npx autocannon http://localhost:3000

# lighthouse - Frontend performance
npx lighthouse https://example.com
```

## V8 Optimization Patterns

### Hidden Classes and Inline Caches
```javascript
// Bad: Dynamic property addition breaks the hidden class
function Point(x, y) {
  this.x = x;
  this.y = y;
}
const p1 = new Point(1, 2);
p1.z = 3; // Deoptimizes!

// Good: Consistent object shape
function Point(x, y, z = 0) {
  this.x = x;
  this.y = y;
  this.z = z; // Always present
}
```

### Avoid Polymorphism in Hot Paths
```javascript
// Bad: Type changes break optimization
function add(a, b) {
  return a + b;
}
add(1, 2); // Optimized for numbers
add("a", "b"); // Deoptimized! Now handles strings too

// Good: Separate functions for different types
function addNumbers(a, b) {
  return a + b; // Always numbers
}

function concatStrings(a, b) {
  return a + b; // Always strings
}
```

### Array Optimization
```javascript
// Bad: Mixed types in array
const mixed = [1, "two", 3, "four"]; // Slow element access

// Good: Homogeneous arrays
const numbers = [1, 2, 3, 4]; // Fast element access
const strings = ["one", "two", "three"];

// Use typed arrays for numeric data
const buffer = new Float64Array(1000); // Faster than regular arrays
```

## Event Loop Optimization

### Avoid Blocking the Event Loop
```javascript
// Bad: Synchronous operations block the event loop
const data = fs.readFileSync('large-file.txt');
const result = heavyComputation(data);

// Good: Async operations
const data = await fs.promises.readFile('large-file.txt');
const result = await processAsync(data);

// For CPU-intensive work, use worker threads
const { Worker } = require('worker_threads');
const worker = new Worker('./cpu-intensive.js');
```

### Batch Async Operations
```javascript
// Bad: Sequential async calls
for (const item of items) {
  await processItem(item); // Waits for each
}

// Good: Parallel execution
await Promise.all(items.map(item => processItem(item)));

// Better: Controlled concurrency with p-limit
const pLimit = require('p-limit');
const limit = pLimit(10); // Max 10 concurrent

await Promise.all(
  items.map(item => limit(() => processItem(item)))
);
```

## Memory Management

### Avoid Memory Leaks
```javascript
// Bad: Global variables and closures retain memory
let cache = {}; // Never cleared
function addToCache(key, value) {
  cache[key] = value; // Grows indefinitely
}

// Good: Use WeakMap for caching
const cache = new WeakMap();
function addToCache(obj, value) {
  cache.set(obj, value); // Collected once the key object is unreachable
}

// Good: Implement cache eviction
const LRU = require('lru-cache');
const cache = new LRU({ max: 500 });
```

### Stream Large Data
```javascript
// Bad: Load the entire file into memory
const data = await fs.promises.readFile('large-file.txt');
const processed = data.toString().split('\n').map(process);

// Good: Stream processing
const readline = require('readline');
const stream = fs.createReadStream('large-file.txt');
const rl = readline.createInterface({ input: stream });

for await (const line of rl) {
  process(line); // Process one line at a time
}
```

## Database Query Optimization

### Connection Pooling
```javascript
// Bad: Create a new connection per request
async function query(sql) {
  const conn = await mysql.createConnection(config);
  const result = await conn.query(sql);
  await conn.end();
  return result;
}

// Good: Use a connection pool
const pool = mysql.createPool(config);
async function query(sql) {
  return pool.query(sql); // Reuses connections
}
```

### Batch Database Operations
```javascript
// Bad: Multiple round trips
for (const user of users) {
  await db.insert('users', user);
}

// Good: Single batch insert
await db.batchInsert('users', users, 1000); // Chunks of 1000
```

## HTTP Server Optimization

### Compression
```javascript
const compression = require('compression');
app.use(compression({
  level: 6, // Balance between speed and compression
  threshold: 1024 // Only compress responses > 1KB
}));
```

### Caching Headers
```javascript
app.get('/static/*', (req, res) => {
  res.setHeader('Cache-Control', 'public, max-age=31536000');
  res.setHeader('ETag', computeETag(file));
  res.sendFile(file);
});
```

### Keep-Alive Connections
```javascript
const http = require('http');

const server = http.createServer(app);
server.keepAliveTimeout = 60000; // Keep idle connections open for 60 seconds
```

## Frontend Performance

### Code Splitting
```javascript
// Dynamic imports for code splitting
const HeavyComponent = lazy(() => import('./HeavyComponent'));

// Route-based code splitting
const routes = [
  {
    path: '/dashboard',
    component: lazy(() => import('./Dashboard'))
  }
];
```

### Memoization
```javascript
// React.memo for expensive components
const ExpensiveComponent = React.memo(({ data }) => {
  return <div>{expensiveRender(data)}</div>;
});

// useMemo for expensive computations
const sortedData = useMemo(() => {
  return data.sort(compare);
}, [data]);

// useCallback for stable function references
const handleClick = useCallback(() => {
  doSomething(id);
}, [id]);
```

### Virtual Scrolling
```javascript
// For large lists, render only visible items
import { FixedSizeList } from 'react-window';

<FixedSizeList
  height={600}
  itemCount={10000}
  itemSize={50}
  width="100%"
>
  {Row}
</FixedSizeList>
```

## Performance Anti-Patterns

### Unnecessary Re-renders
```javascript
// Bad: Creates a new object on every render
function MyComponent() {
  const style = { color: 'red' }; // New object each render
  return <div style={style}>Text</div>;
}

// Good: Define outside or use useMemo
const style = { color: 'red' };
function MyComponent() {
  return <div style={style}>Text</div>;
}
```

### Expensive Operations in Render
```javascript
// Bad: Expensive computation in render
function MyComponent({ items }) {
  const sorted = items.sort(); // Sorts on every render!
  return <List data={sorted} />;
}

// Good: Memoize expensive computations (copy first; sort mutates)
function MyComponent({ items }) {
  const sorted = useMemo(() => [...items].sort(), [items]);
  return <List data={sorted} />;
}
```

## Benchmarking

### Simple Benchmarks
```javascript
const { performance } = require('perf_hooks');

function benchmark(fn, iterations = 1000) {
  const start = performance.now();
  for (let i = 0; i < iterations; i++) {
    fn();
  }
  const end = performance.now();
  console.log(`Avg: ${(end - start) / iterations}ms`);
}

benchmark(() => myFunction());
```

### Benchmark.js
```javascript
const Benchmark = require('benchmark');
const suite = new Benchmark.Suite;

suite
  .add('Array#forEach', function() {
    [1,2,3].forEach(x => x * 2);
  })
  .add('Array#map', function() {
    [1,2,3].map(x => x * 2);
  })
  .on('complete', function() {
    console.log('Fastest is ' + this.filter('fastest').map('name'));
  })
  .run();
```

## Performance Checklist

**Before Optimizing:**
- [ ] Profile with Chrome DevTools or clinic.js
- [ ] Identify hot paths and bottlenecks
- [ ] Measure baseline performance

**Node.js Optimizations:**
- [ ] Use worker threads for CPU-intensive tasks
- [ ] Implement connection pooling for databases
- [ ] Enable compression middleware
- [ ] Use streams for large data processing
- [ ] Implement caching (Redis, in-memory)
- [ ] Batch async operations with controlled concurrency
- [ ] Monitor event loop lag

**Frontend Optimizations:**
- [ ] Implement code splitting
- [ ] Use React.memo for expensive components
- [ ] Implement virtual scrolling for large lists
- [ ] Optimize bundle size (tree shaking, minification)
- [ ] Use Web Workers for heavy computations
- [ ] Implement service workers for offline caching
- [ ] Lazy load images and components

**After Optimizing:**
- [ ] Re-profile to verify improvements
- [ ] Check memory usage for leaks
- [ ] Run load tests (autocannon, artillery)
- [ ] Monitor with APM tools

## Tools and Libraries

**Profiling:**
- `clinic.js` - Performance profiling suite
- `0x` - Flamegraph profiler
- `node --inspect` - Chrome DevTools integration
- `autocannon` - HTTP load testing

**Optimization:**
- `p-limit` - Concurrency control
- `lru-cache` - LRU caching
- `compression` - Response compression
- `react-window` - Virtual scrolling
- `workerpool` - Worker thread pools

**Monitoring:**
- `prom-client` - Prometheus metrics
- `newrelic` / `datadog` - APM
- `clinic-doctor` - Health diagnostics

---

*JavaScript/Node.js-specific performance optimization with V8 patterns and profiling tools*

skills/optimizing-performance/languages/PYTHON.md

# Python Performance Optimization

**Load this file when:** Optimizing performance in Python projects

## Profiling Tools

### Execution Time Profiling
```bash
# cProfile - Built-in profiler
python -m cProfile -o profile.stats script.py
python -m pstats profile.stats

# py-spy - Sampling profiler (no code changes needed)
py-spy record -o profile.svg -- python script.py
py-spy top -- python script.py

# line_profiler - Line-by-line profiling
kernprof -l -v script.py
```

### Memory Profiling
```bash
# memory_profiler - Line-by-line memory usage
python -m memory_profiler script.py

# memray - Modern memory profiler
memray run script.py
memray flamegraph output.bin

# tracemalloc - Built-in memory tracking
# (use in code, see example below)
```

### Benchmarking
```bash
# pytest-benchmark
pytest tests/ --benchmark-only

# timeit - Quick microbenchmarks
python -m timeit "'-'.join(str(n) for n in range(100))"
```

## Python-Specific Optimization Patterns

### Async/Await Patterns
```python
import asyncio
import aiohttp

# Good: Parallel async operations
async def fetch_all(urls):
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_url(session, url) for url in urls]
        return await asyncio.gather(*tasks)

# Bad: Sequential async (defeats the purpose)
async def fetch_all_bad(urls):
    results = []
    async with aiohttp.ClientSession() as session:
        for url in urls:
            results.append(await fetch_url(session, url))
    return results
```

### List Comprehensions vs Generators
```python
# Generator (memory efficient for large datasets)
def process_large_file(filename):
    return (process_line(line) for line in open(filename))

# List comprehension (when you need all data in memory)
def process_small_file(filename):
    return [process_line(line) for line in open(filename)]

# Use itertools for complex generators
from itertools import islice, chain
first_10 = list(islice(generate_data(), 10))
```

### Efficient Data Structures
```python
# Use sets for membership testing
# Bad: O(n)
if item in my_list:  # Slow for large lists
    ...

# Good: O(1)
if item in my_set:  # Fast
    ...

# Use deque for queue operations
from collections import deque
queue = deque()
queue.append(item)  # O(1)
queue.popleft()     # O(1) vs list.pop(0) which is O(n)

# Use defaultdict to avoid key checks
from collections import defaultdict
counter = defaultdict(int)
counter[key] += 1  # No need to check if key exists
```

## GIL (Global Interpreter Lock) Considerations

### CPU-Bound Work
```python
# Use multiprocessing for CPU-bound tasks
from multiprocessing import Pool

def cpu_intensive_task(data):
    # Heavy computation
    return result

with Pool(processes=4) as pool:
    results = pool.map(cpu_intensive_task, data_list)
```

### I/O-Bound Work
```python
# Use asyncio or threading for I/O-bound tasks
import asyncio

async def io_bound_task(url):
    # Network I/O, file I/O
    return result

results = await asyncio.gather(*[io_bound_task(url) for url in urls])
```

## Common Python Anti-Patterns

### String Concatenation
```python
# Bad: O(n²) for n strings
result = ""
for s in strings:
    result += s

# Good: O(n)
result = "".join(strings)
```

### Unnecessary Lambda
```python
# Bad: Extra function call overhead
sorted_items = sorted(items, key=lambda x: x.value)

# Good: Direct attribute access
from operator import attrgetter
sorted_items = sorted(items, key=attrgetter('value'))
```

### Loop Invariant Code
```python
# Bad: Repeated calculation in loop
for item in items:
    expensive_result = expensive_function()
    process(item, expensive_result)

# Good: Calculate once
expensive_result = expensive_function()
for item in items:
    process(item, expensive_result)
```

## Performance Measurement

### Tracemalloc for Memory Tracking
```python
import tracemalloc

# Start tracking
tracemalloc.start()

# Your code here
data = [i for i in range(1000000)]

# Get memory usage
current, peak = tracemalloc.get_traced_memory()
print(f"Current: {current / 1024 / 1024:.2f} MB")
print(f"Peak: {peak / 1024 / 1024:.2f} MB")

tracemalloc.stop()
```

### Context Manager for Timing
```python
import time
from contextlib import contextmanager

@contextmanager
def timer(name):
    start = time.perf_counter()
    yield
    elapsed = time.perf_counter() - start
    print(f"{name}: {elapsed:.4f}s")

# Usage
with timer("Database query"):
    results = db.query(...)
```

## Database Optimization (Python-Specific)

### SQLAlchemy Best Practices
```python
# Bad: N+1 queries
for user in session.query(User).all():
    print(user.profile.bio)  # Separate query for each

# Good: Eager loading
from sqlalchemy.orm import joinedload

users = session.query(User).options(
    joinedload(User.profile)
).all()

# Good: Batch operations
session.bulk_insert_mappings(User, user_dicts)
session.commit()
```

## Caching Strategies

### Function Caching
```python
from functools import lru_cache, cache

# LRU cache with size limit
@lru_cache(maxsize=128)
def expensive_computation(n):
    # Heavy computation
    return result

# Unlimited cache (Python 3.9+)
@cache
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n-1) + fibonacci(n-2)

# Manual cache with expiration
from cachetools import TTLCache
cache = TTLCache(maxsize=100, ttl=300)  # 5 minutes
```

## Performance Testing

### pytest-benchmark
```python
def test_processing_performance(benchmark):
    # Benchmark automatically handles iterations
    result = benchmark(process_data, large_dataset)
    assert result is not None

# Compare against baseline
def test_against_baseline(benchmark):
    benchmark.pedantic(
        process_data,
        args=(dataset,),
        iterations=10,
        rounds=100
    )
```

### Load Testing with Locust
```python
from locust import HttpUser, task, between

class WebsiteUser(HttpUser):
    wait_time = between(1, 3)

    @task
    def load_homepage(self):
        self.client.get("/")

    @task(3)  # 3x more likely than homepage
    def load_api(self):
        self.client.get("/api/data")
```

## Performance Checklist

**Before Optimizing:**
- [ ] Profile to identify actual bottlenecks (don't guess!)
- [ ] Measure baseline performance
- [ ] Set performance targets

**Python-Specific Optimizations:**
- [ ] Use generators for large datasets
- [ ] Replace loops with list comprehensions where appropriate
- [ ] Use appropriate data structures (set, deque, defaultdict)
- [ ] Implement caching with `@lru_cache` or `@cache`
- [ ] Use async/await for I/O-bound operations
- [ ] Use multiprocessing for CPU-bound operations
- [ ] Avoid string concatenation in loops
- [ ] Minimize attribute lookups in hot loops
- [ ] Use `__slots__` for classes with many instances (see the sketch after this checklist)
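
A minimal `__slots__` sketch (exact memory savings vary by Python version and attribute count):

```python
class Point:
    __slots__ = ("x", "y")  # fixed attribute set; no per-instance __dict__

    def __init__(self, x: float, y: float) -> None:
        self.x = x
        self.y = y

# A million of these is noticeably smaller than the same class without
# __slots__, at the cost of not being able to add attributes dynamically.
points = [Point(i, i) for i in range(1_000_000)]
```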

**After Optimizing:**
- [ ] Re-profile to verify improvements
- [ ] Check memory usage hasn't increased significantly
- [ ] Ensure code readability is maintained
- [ ] Add performance regression tests

## Tools and Libraries

**Profiling:**
- `cProfile` - Built-in execution profiler
- `py-spy` - Sampling profiler without code changes
- `memory_profiler` - Memory usage line-by-line
- `memray` - Modern memory profiler with flamegraphs

**Performance Testing:**
- `pytest-benchmark` - Benchmark tests
- `locust` - Load testing framework
- `hyperfine` - Command-line benchmarking

**Optimization:**
- `numpy` - Vectorized operations for numerical data
- `numba` - JIT compilation for numerical functions
- `cython` - Compile Python to C for speed

---

*Python-specific performance optimization with profiling tools and patterns*

skills/optimizing-performance/languages/RUST.md

# Rust Performance Optimization

**Load this file when:** Optimizing performance in Rust projects

## Profiling Tools

### Benchmarking with Criterion
```bash
# Add to Cargo.toml
[dev-dependencies]
criterion = "0.5"

[[bench]]
name = "my_benchmark"
harness = false

# Run benchmarks
cargo bench

# Compare against a saved baseline (criterion flags go after --)
cargo bench -- --baseline master
```

### CPU Profiling
```bash
# perf (Linux)
cargo build --release
perf record --call-graph dwarf ./target/release/myapp
perf report

# Instruments (macOS)
cargo instruments --release --template "Time Profiler"

# cargo-flamegraph
cargo install flamegraph
cargo flamegraph

# samply (cross-platform)
cargo install samply
samply record ./target/release/myapp
```

### Memory Profiling
```bash
# valgrind (memory leaks, cache performance)
cargo build
valgrind --tool=massif ./target/debug/myapp

# dhat (heap profiling)
# Add the dhat crate to the project

# cargo-bloat (binary size analysis)
cargo install cargo-bloat
cargo bloat --release
```

## Zero-Cost Abstractions

### Avoiding Unnecessary Allocations
```rust
// Bad: Takes ownership, forcing callers to give up or clone their String
fn process_string(s: String) -> String {
    s.to_uppercase()
}

// Good: Borrows, callers keep their data
fn process_string(s: &str) -> String {
    s.to_uppercase()
}

// Best: In-place modification where possible
fn process_string_mut(s: &mut String) {
    *s = s.to_uppercase();
}
```

### Stack vs Heap Allocation
```rust
// Stack: Fast, known size at compile time
let numbers = [1, 2, 3, 4, 5];

// Heap: Flexible, runtime-sized data
let numbers = vec![1, 2, 3, 4, 5];

// Use Box<[T]> for fixed-size heap data (smaller than Vec)
let numbers: Box<[i32]> = vec![1, 2, 3, 4, 5].into_boxed_slice();
```

### Iterator Chains vs For Loops
```rust
// Good: Zero-cost iterator chains (compiled to efficient code)
let sum: i32 = numbers
    .iter()
    .filter(|&&n| n > 0)
    .map(|&n| n * 2)
    .sum();

// Also good: Manual loop (similar performance)
let mut sum = 0;
for &n in numbers.iter() {
    if n > 0 {
        sum += n * 2;
    }
}

// Choose iterators for readability, loops for complex logic
```

## Compilation Optimizations

### Release Profile Tuning
```toml
[profile.release]
opt-level = 3     # Maximum optimization
lto = "fat"       # Link-time optimization
codegen-units = 1 # Better optimization, slower compile
strip = true      # Strip symbols from binary
panic = "abort"   # Smaller binary, no stack unwinding

[profile.release-with-debug]
inherits = "release"
debug = true # Keep debug symbols for profiling
```

### Target CPU Features
```bash
# Use native CPU features
RUSTFLAGS="-C target-cpu=native" cargo build --release

# Or in .cargo/config.toml
[build]
rustflags = ["-C", "target-cpu=native"]
```

## Memory Layout Optimization

### Struct Field Ordering
```rust
// Bad: Wasted padding (24 bytes)
struct BadLayout {
    a: u8,  // 1 byte + 7 padding
    b: u64, // 8 bytes
    c: u8,  // 1 byte + 7 padding
}

// Good: Minimal padding (16 bytes)
struct GoodLayout {
    b: u64, // 8 bytes
    a: u8,  // 1 byte
    c: u8,  // 1 byte + 6 padding
}

// Note: under the default repr(Rust) the compiler may reorder fields for
// you; the sizes shown above are guaranteed with #[repr(C)].

// Use #[repr(C)] for consistent layout
#[repr(C)]
struct FixedLayout {
    // Fields laid out in declaration order
}
```

### Enum Optimization
```rust
// Consider enum size (uses largest variant)
enum Large {
    Small(u8),
    Big([u8; 1000]), // Entire enum is 1000+ bytes!
}

// Better: Box large variants
enum Optimized {
    Small(u8),
    Big(Box<[u8; 1000]>), // Enum is now pointer-sized
}
```

## Concurrency Patterns

### Using Rayon for Data Parallelism
```rust
use rayon::prelude::*;

// Sequential
let sum: i32 = data.iter().map(|x| expensive(x)).sum();

// Parallel (automatic work stealing)
let sum: i32 = data.par_iter().map(|x| expensive(x)).sum();
```

### Async Runtime Optimization
```rust
// tokio - For I/O-heavy workloads
#[tokio::main(flavor = "multi_thread", worker_threads = 4)]
async fn main() {
    // Async I/O operations
}

// async-std - Alternative runtime
// Choose based on ecosystem compatibility
```

## Common Rust Performance Patterns

### String Handling
```rust
// Avoid unnecessary clones
// Bad
fn process(s: String) -> String {
    let upper = s.clone().to_uppercase();
    upper
}

// Good
fn process(s: &str) -> String {
    s.to_uppercase()
}

// Use Cow for conditional cloning
use std::borrow::Cow;

fn maybe_uppercase<'a>(s: &'a str, uppercase: bool) -> Cow<'a, str> {
    if uppercase {
        Cow::Owned(s.to_uppercase())
    } else {
        Cow::Borrowed(s)
    }
}
```

### Collection Preallocation
```rust
// Bad: Multiple reallocations
let mut vec = Vec::new();
for i in 0..1000 {
    vec.push(i);
}

// Good: Single allocation
let mut vec = Vec::with_capacity(1000);
for i in 0..1000 {
    vec.push(i);
}

// Best: Use collect with size_hint
let vec: Vec<_> = (0..1000).collect();
```

### Minimize Clones
```rust
// Bad: Unnecessary clones in loop
for item in &items {
    let owned = item.clone();
    process(owned);
}

// Good: Borrow when possible
for item in &items {
    process_borrowed(item);
}

// Use Rc/Arc only when necessary
use std::rc::Rc;
let shared = Rc::new(expensive_data);
let clone1 = Rc::clone(&shared); // Cheap pointer clone
```

## Performance Anti-Patterns

### Unnecessary Dynamic Dispatch
```rust
// Bad: Dynamic dispatch overhead
fn process(items: &[Box<dyn Trait>]) {
    for item in items {
        item.method(); // Virtual call
    }
}

// Good: Static dispatch via generics
fn process<T: Trait>(items: &[T]) {
    for item in items {
        item.method(); // Direct call, can be inlined
    }
}
```

### Lock Contention
```rust
// Bad: Holding the lock during an expensive operation
let data = mutex.lock().unwrap();
let result = expensive_computation(&data);
drop(data);

// Good: Release the lock quickly
let cloned = {
    let data = mutex.lock().unwrap();
    data.clone()
};
let result = expensive_computation(&cloned);
```

## Benchmarking with Criterion

### Basic Benchmark
```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn fibonacci_benchmark(c: &mut Criterion) {
    c.bench_function("fib 20", |b| {
        b.iter(|| fibonacci(black_box(20)))
    });
}

criterion_group!(benches, fibonacci_benchmark);
criterion_main!(benches);
```

### Parameterized Benchmarks
```rust
use criterion::BenchmarkId;

fn bench_sizes(c: &mut Criterion) {
    let mut group = c.benchmark_group("process");

    for size in [10, 100, 1000, 10000].iter() {
        group.bench_with_input(
            BenchmarkId::from_parameter(size),
            size,
            |b, &size| {
                b.iter(|| process_data(black_box(size)))
            },
        );
    }

    group.finish();
}
```

## Performance Checklist

**Before Optimizing:**
- [ ] Profile with a release build to identify bottlenecks
- [ ] Measure baseline with criterion benchmarks
- [ ] Use cargo-flamegraph to visualize hot paths

**Rust-Specific Optimizations:**
- [ ] Enable LTO in release profile
- [ ] Use target-cpu=native for CPU-specific features
- [ ] Preallocate collections with `with_capacity`
- [ ] Prefer borrowing (&T) over owned (T) in APIs
- [ ] Use iterators over manual loops
- [ ] Minimize clones - use Rc/Arc only when needed
- [ ] Order struct fields by size (largest first)
- [ ] Box large enum variants
- [ ] Use rayon for CPU-bound parallelism
- [ ] Avoid unnecessary dynamic dispatch

**After Optimizing:**
- [ ] Re-benchmark to verify improvements
- [ ] Check binary size with cargo-bloat
- [ ] Profile memory with valgrind/dhat
- [ ] Add regression tests with criterion baselines

## Tools and Crates

**Profiling:**
- `criterion` - Statistical benchmarking
- `flamegraph` - Flamegraph generation
- `cargo-instruments` - macOS profiling
- `perf` - Linux performance analysis
- `dhat` - Heap profiling

**Optimization:**
- `rayon` - Data parallelism
- `tokio` / `async-std` - Async runtime
- `parking_lot` - Faster mutex/rwlock
- `smallvec` - Stack-allocated vectors
- `once_cell` - Lazy static initialization

**Analysis:**
- `cargo-bloat` - Binary size analysis
- `cargo-udeps` - Find unused dependencies
- `twiggy` - Code size profiler

---

*Rust-specific performance optimization with zero-cost abstractions and profiling tools*