Initial commit

Zhongwei Li
2025-11-30 09:08:46 +08:00
commit ab158ea3e9
10 changed files with 5566 additions and 0 deletions

# Serverless Performance Optimization
Performance optimization best practices for AWS Lambda and serverless architectures.
## Table of Contents
- [Lambda Execution Lifecycle](#lambda-execution-lifecycle)
- [Cold Start Optimization](#cold-start-optimization)
- [Memory and CPU Optimization](#memory-and-cpu-optimization)
- [Initialization Optimization](#initialization-optimization)
- [Runtime Performance](#runtime-performance)
- [Performance Testing](#performance-testing)
- [Cost Optimization](#cost-optimization)
- [Advanced Optimization](#advanced-optimization)
- [Performance Monitoring](#performance-monitoring)
## Lambda Execution Lifecycle
### Execution Environment Phases
**Three phases of Lambda execution**:
1. **Init Phase** (Cold Start):
- Download and unpack function package
- Create execution environment
- Initialize runtime
- Execute initialization code (outside handler)
2. **Invoke Phase**:
- Execute handler code
- Return response
- Freeze execution environment
3. **Shutdown Phase**:
- Runtime shutdown (after period of inactivity)
- Execution environment destroyed
### Concurrency and Scaling
**Key concepts**:
- **Concurrency**: Number of execution environments serving requests simultaneously
- **One event per environment**: Each environment processes one event at a time
- **Automatic scaling**: Lambda creates new environments as needed
- **Environment reuse**: Warm starts reuse existing environments
**Example**:
- A function takes 100 ms to execute
- One environment can therefore serve about 10 requests/second
- A sustained 100 requests/second needs roughly 10 concurrent environments (concurrency = request rate × average duration)
- Default account limit: 1,000 concurrent executions (can be raised with a quota increase)
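The arithmetic above is Little's law, and a small helper makes it explicit. This is a minimal sketch; the numbers mirror the example and are illustrative only:

```typescript
// Little's law: required concurrency = arrival rate × average duration.
function requiredConcurrency(requestsPerSecond: number, avgDurationMs: number): number {
  return Math.ceil(requestsPerSecond * (avgDurationMs / 1000));
}

// 100 req/s at 100 ms per request => 10 concurrent environments
console.log(requiredConcurrency(100, 100)); // 10
```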
## Cold Start Optimization
### Understanding Cold Starts
**Cold start components**:
```
Total Cold Start = Download Package + Init Environment + Init Code + Handler
```
**Cold start frequency**:
- Development: Every code change creates new environments (frequent)
- Production: Typically < 1% of invocations
- Optimize for p95/p99 latency, not average
### Package Size Optimization
**Minimize deployment package**:
```typescript
new NodejsFunction(this, 'Function', {
entry: 'src/handler.ts',
bundling: {
minify: true, // Minify production code
sourceMap: false, // Disable in production
externalModules: [
'@aws-sdk/*', // Use AWS SDK from runtime
],
// Tree-shaking removes unused code
},
});
```
**Tools for optimization**:
- **esbuild**: Automatic tree-shaking and minification (Node.js)
- **Webpack**: Bundle optimization (Node.js)
- **Maven**: Dependency analysis (Java)
- **Gradle**: Unused dependency detection (Java)
**Best practices**:
1. Avoid monolithic functions
2. Bundle only required dependencies
3. Use tree-shaking to remove unused code
4. Minify production code
5. Exclude AWS SDK (provided by runtime)
### Provisioned Concurrency
**Pre-initialize environments for predictable latency**:
```typescript
const fn = new NodejsFunction(this, 'Function', {
entry: 'src/handler.ts',
});
// Static provisioned concurrency
fn.currentVersion.addAlias('live', {
provisionedConcurrentExecutions: 10,
});
// Auto-scaling provisioned concurrency
const alias = fn.currentVersion.addAlias('prod');
const target = alias.addAutoScaling({
  minCapacity: 10,
  maxCapacity: 100,
});
target.scaleOnUtilization({
  utilizationTarget: 0.7, // target 70% utilization
});
```
**When to use**:
- **Consistent traffic patterns**: Predictable load
- **Latency-sensitive APIs**: Sub-100ms requirements
- **Cost consideration**: Compare cold start frequency vs. provisioned cost
**Cost comparison**:
- **On-demand**: Pay only for actual usage
- **Provisioned**: Pay for provisioned capacity + invocations
- **Breakeven**: When cold starts > ~20% of invocations
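One way to compare the two pricing modes is to model monthly compute cost under each. The rates below are placeholder assumptions, not current AWS prices; substitute the published rates for your region before relying on the result:

```typescript
// Placeholder rates -- check the Lambda pricing page for real numbers.
const ON_DEMAND_GB_SEC = 0.0000166667;
const PROVISIONED_GB_SEC = 0.0000041667; // charge for keeping environments warm

function monthlyComputeCost(
  memoryGb: number,
  invocations: number,
  avgDurationSec: number,
  provisionedEnvs = 0, // 0 = pure on-demand
): number {
  const execution = memoryGb * avgDurationSec * invocations * ON_DEMAND_GB_SEC;
  const secondsPerMonth = 30 * 24 * 3600;
  const warmCharge = provisionedEnvs * memoryGb * secondsPerMonth * PROVISIONED_GB_SEC;
  return execution + warmCharge;
}
```

Comparing `monthlyComputeCost(1, 10_000_000, 0.1)` with the same call plus `provisionedEnvs` shows how quickly the warm-environment charge dominates at low traffic, which is why provisioned concurrency only pays off when cold starts are frequent.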
### Lambda SnapStart (Java)
**Instant cold starts for Java**:
```typescript
new lambda.Function(this, 'JavaFunction', {
runtime: lambda.Runtime.JAVA_17,
code: lambda.Code.fromAsset('target/function.jar'),
handler: 'com.example.Handler::handleRequest',
snapStart: lambda.SnapStartConf.ON_PUBLISHED_VERSIONS,
});
```
**Benefits**:
- Up to 10x faster cold starts for Java
- No code changes required
- Works with published versions
- No additional cost
## Memory and CPU Optimization
### Memory = CPU Allocation
**Key principle**: Memory and CPU are proportionally allocated
| Memory | vCPU |
|--------|------|
| 128 MB | 0.07 vCPU |
| 512 MB | 0.28 vCPU |
| 1,024 MB | 0.57 vCPU |
| 1,769 MB | 1.00 vCPU |
| 3,538 MB | 2.00 vCPU |
| 10,240 MB | 6.00 vCPU |
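The table implies a simple linear rule. This is an approximation built on the one anchor point AWS documents (1,769 MB = 1 full vCPU):

```typescript
// Approximate vCPU share from memory, using the 1,769 MB = 1 vCPU anchor.
function approxVcpu(memoryMb: number): number {
  return memoryMb / 1769;
}

console.log(approxVcpu(1769)); // 1
console.log(approxVcpu(3538)); // 2
```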
### Cost vs. Performance Balancing
**Example - Compute-intensive function**:
| Memory | Duration | Cost (per 1,000 invocations) |
|--------|----------|------|
| 128 MB | 11.72s | $0.0246 |
| 256 MB | 6.68s | $0.0280 |
| 512 MB | 3.19s | $0.0268 |
| 1024 MB | 1.46s | $0.0246 |
**Key insight**: More memory = faster execution = similar or lower cost
**Formula**:
```
GB-seconds = Memory (GB) × Duration (seconds)
Cost = GB-seconds × Invocations × Price per GB-second
```
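Applying the formula to the table's first row reproduces its numbers, which suggests the Cost column is per 1,000 invocations. The rates assumed here are the public on-demand prices at the time of writing; verify them for your region:

```typescript
// Cost for 1,000 invocations at a given memory size and duration.
// Rates assumed: $0.0000166667 per GB-second, $0.20 per 1M requests.
function costPer1000Invocations(memoryMb: number, durationSec: number): number {
  const gbSeconds = (memoryMb / 1024) * durationSec * 1000;
  const compute = gbSeconds * 0.0000166667;
  const requestCharge = 1000 * (0.20 / 1_000_000);
  return compute + requestCharge;
}

console.log(costPer1000Invocations(128, 11.72).toFixed(4)); // "0.0246"
```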
### Finding Optimal Memory
**Use Lambda Power Tuning**:
```bash
# Deploy power tuning state machine
sam deploy --template-file template.yml --stack-name lambda-power-tuning
# Run power tuning (the tool runs as a Step Functions state machine)
aws stepfunctions start-execution \
  --state-machine-arn arn:aws:states:... \
  --input '{"lambdaARN": "arn:aws:lambda:...", "powerValues": [128, 256, 512, 1024, 1536, 3008]}'
```
**Manual testing approach**:
1. Test function at different memory levels
2. Measure execution time at each level
3. Calculate cost for each configuration
4. Choose optimal balance for your use case
### Multi-Core Optimization
**Leverage multiple vCPUs** (at 1,769 MB+):
```typescript
// Use Worker Threads for parallel processing
import { Worker } from 'worker_threads';
export const handler = async (event: any) => {
const items = event.items;
// Process in parallel using multiple cores
const workers = items.map(item =>
new Promise((resolve, reject) => {
const worker = new Worker('./worker.js', {
workerData: item,
});
worker.on('message', resolve);
worker.on('error', reject);
})
);
const results = await Promise.all(workers);
return results;
};
```
**Python multiprocessing**:
```python
import multiprocessing as mp
def handler(event, context):
items = event['items']
# Use multiple cores for CPU-bound work
with mp.Pool(mp.cpu_count()) as pool:
results = pool.map(process_item, items)
return {'results': results}
```
## Initialization Optimization
### Code Outside Handler
**Initialize once, reuse across invocations**:
```typescript
// ✅ GOOD - Initialize outside handler
import { DynamoDBClient, GetItemCommand } from '@aws-sdk/client-dynamodb';
import { S3Client, GetObjectCommand } from '@aws-sdk/client-s3';

// Initialized once per execution environment
const dynamodb = new DynamoDBClient({});
const s3 = new S3Client({});

// Connection pool initialized once (createConnectionPool is your DB library's factory)
const pool = createConnectionPool({
  host: process.env.DB_HOST,
  max: 1, // One connection per execution environment
});

export const handler = async (event: any) => {
  // Reuse clients and connections across invocations
  const data = await dynamodb.send(new GetItemCommand({ /* ... */ }));
  const file = await s3.send(new GetObjectCommand({ /* ... */ }));
  return processData(data, file);
};

// ❌ BAD - Initialize in handler
export const handler = async (event: any) => {
  const dynamodb = new DynamoDBClient({}); // New client and TLS handshake every invocation
  const s3 = new S3Client({}); // Created every invocation
  // ...
};
```
### Lazy Loading
**Load dependencies only when needed**:
```typescript
// ✅ GOOD - Conditional loading
export const handler = async (event: any) => {
if (event.operation === 'generatePDF') {
// Load heavy PDF library only when needed
const pdfLib = await import('./pdf-generator');
return pdfLib.generatePDF(event.data);
}
if (event.operation === 'processImage') {
    // Dynamic import returns a module namespace; use its default export
    const { default: sharp } = await import('sharp');
return processImage(sharp, event.data);
}
// Default operation (no heavy dependencies)
return processDefault(event);
};
// ❌ BAD - Load everything upfront
import pdfLib from './pdf-generator'; // 50MB
import sharp from 'sharp'; // 20MB
// Even if not used!
export const handler = async (event: any) => {
if (event.operation === 'generatePDF') {
return pdfLib.generatePDF(event.data);
}
};
```
### Connection Reuse
**Enable connection reuse**:
```typescript
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { NodeHttpHandler } from '@smithy/node-http-handler';
import { Agent } from 'https';

const client = new DynamoDBClient({
  // SDK v3 keeps connections alive by default; shown explicitly here
  requestHandler: new NodeHttpHandler({
    httpsAgent: new Agent({ keepAlive: true }),
    connectionTimeout: 3000,
    socketTimeout: 3000,
  }),
});

// For the older AWS SDK v2 for Node.js, set the environment variable
// AWS_NODEJS_CONNECTION_REUSE_ENABLED=1 on the function instead
```
## Runtime Performance
### Choose the Right Runtime
**Runtime comparison**:
| Runtime | Cold Start | Execution Speed | Ecosystem | Best For |
|---------|------------|-----------------|-----------|----------|
| Node.js 20 | Fast | Fast | Excellent | APIs, I/O-bound |
| Python 3.12 | Fast | Medium | Excellent | Data processing |
| Java 17 + SnapStart | Fast (w/SnapStart) | Fast | Good | Enterprise apps |
| .NET 8 | Medium | Fast | Good | Enterprise apps |
| Go | Very Fast | Very Fast | Good | High performance |
| Rust | Very Fast | Very Fast | Growing | High performance |
### Optimize Handler Code
**Efficient code patterns**:
```typescript
// ✅ GOOD - Batch operations
// (docClient: a DynamoDBDocumentClient created outside the handler)
import { BatchWriteCommand, PutCommand } from '@aws-sdk/lib-dynamodb';

const items = [{ id: 'item1' }, { id: 'item2' }, { id: 'item3' }];

// Single batch write (up to 25 items per request)
await docClient.send(new BatchWriteCommand({
  RequestItems: {
    [tableName]: items.map(item => ({
      PutRequest: { Item: item },
    })),
  },
}));

// ❌ BAD - Multiple single operations
for (const item of items) {
  await docClient.send(new PutCommand({
    TableName: tableName,
    Item: item,
  })); // Slow, one round trip per item
}
```
### Async Processing
**Use async/await effectively**:
```typescript
// ✅ GOOD - Parallel async operations
const [userData, orderData, inventoryData] = await Promise.all([
getUserData(userId),
getOrderData(orderId),
getInventoryData(productId),
]);
// ❌ BAD - Sequential async operations
const userData = await getUserData(userId);
const orderData = await getOrderData(orderId); // Waits unnecessarily
const inventoryData = await getInventoryData(productId); // Waits unnecessarily
```
### Caching Strategies
**Cache frequently accessed data**:
```typescript
// In-memory cache (persists in warm environments)
const cache = new Map<string, any>();
export const handler = async (event: any) => {
const key = event.key;
// Check cache first
if (cache.has(key)) {
console.log('Cache hit');
return cache.get(key);
}
// Fetch from database
const data = await fetchFromDatabase(key);
// Store in cache
cache.set(key, data);
return data;
};
```
**ElastiCache for shared cache**:
```typescript
import Redis from 'ioredis';
// Initialize once
const redis = new Redis({
host: process.env.REDIS_HOST,
port: 6379,
lazyConnect: true,
enableOfflineQueue: false,
});
export const handler = async (event: any) => {
const key = `order:${event.orderId}`;
// Try cache
const cached = await redis.get(key);
if (cached) {
return JSON.parse(cached);
}
// Fetch and cache
const data = await fetchOrder(event.orderId);
await redis.setex(key, 300, JSON.stringify(data)); // 5 min TTL
return data;
};
```
## Performance Testing
### Load Testing
**Use Artillery for load testing**:
```yaml
# load-test.yml
config:
target: https://api.example.com
phases:
- duration: 60
arrivalRate: 10
rampTo: 100 # Ramp from 10 to 100 req/sec
scenarios:
- flow:
- post:
url: /orders
json:
orderId: "{{ $randomString() }}"
amount: "{{ $randomNumber(10, 1000) }}"
```
```bash
artillery run load-test.yml
```
### Benchmarking
**Test different configurations**:
```typescript
// benchmark.ts
import { Lambda } from '@aws-sdk/client-lambda';
const lambda = new Lambda({});
const testConfigurations = [
{ memory: 128, name: 'Function-128' },
{ memory: 256, name: 'Function-256' },
{ memory: 512, name: 'Function-512' },
{ memory: 1024, name: 'Function-1024' },
];
for (const config of testConfigurations) {
const times: number[] = [];
// Warm up
for (let i = 0; i < 5; i++) {
await lambda.invoke({ FunctionName: config.name });
}
// Measure
for (let i = 0; i < 100; i++) {
const start = Date.now();
await lambda.invoke({ FunctionName: config.name });
times.push(Date.now() - start);
}
  const sorted = times.sort((a, b) => a - b); // numeric sort, not lexicographic
  const p99 = sorted[98]; // 99th percentile of 100 samples
  const avg = times.reduce((a, b) => a + b) / times.length;
console.log(`${config.memory}MB - Avg: ${avg}ms, p99: ${p99}ms`);
}
```
## Cost Optimization
### Right-Sizing Memory
**Balance cost and performance**:
**CPU-bound workloads**:
- More memory = more CPU = faster execution
- Often results in lower cost overall
- Test at 1769MB (1 vCPU) and above
**I/O-bound workloads**:
- Less sensitive to memory allocation
- May not benefit from higher memory
- Test at lower memory levels (256-512MB)
**Simple operations**:
- Minimal CPU required
- Use minimum memory (128-256MB)
- Fast execution despite low resources
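As CDK configuration, the guidance above might look like the sketch below. The memory values are starting points to validate with power tuning or benchmarks, and the entry paths are hypothetical:

```typescript
// CPU-bound: buy a full vCPU
new NodejsFunction(this, 'CpuBoundFn', {
  entry: 'src/transcode.ts', // hypothetical CPU-bound handler
  memorySize: 1769,          // 1 full vCPU
});

// I/O-bound: more memory rarely helps
new NodejsFunction(this, 'IoBoundFn', {
  entry: 'src/fetch-orders.ts', // hypothetical I/O-bound handler
  memorySize: 512,
});

// Trivial work: minimum allocation
new NodejsFunction(this, 'SimpleFn', {
  entry: 'src/healthcheck.ts', // hypothetical trivial handler
  memorySize: 128,
});
```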
### Billing Granularity
**Lambda bills in 1ms increments**:
- Precise billing (7ms execution = 7ms cost)
- Optimize even small improvements
- Consider trade-offs carefully
**Cost calculation**:
```
Cost = (Memory GB) × (Duration seconds) × (Invocations) × ($0.0000166667/GB-second)
+ (Invocations) × ($0.20/1M requests)
```
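A worked instance of the formula, for a hypothetical workload of 10 million invocations a month at 512 MB and 120 ms average duration:

```typescript
// Monthly cost = compute (GB-seconds × rate) + request charge.
const memoryGb = 512 / 1024;
const durationSec = 0.120;
const invocations = 10_000_000;

const compute = memoryGb * durationSec * invocations * 0.0000166667;
const requests = (invocations / 1_000_000) * 0.20;

console.log((compute + requests).toFixed(2)); // "12.00" per month
```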
### Cost Reduction Strategies
1. **Optimize execution time**: Faster = cheaper
2. **Right-size memory**: Balance CPU needs with cost
3. **Reduce invocations**: Batch processing, caching
4. **Use Graviton2**: 20% better price/performance
5. **Reserved Concurrency**: Only when needed
6. **Compression**: Reduce data transfer costs
## Advanced Optimization
### Lambda Extensions
**Use extensions for cross-cutting concerns**:
```typescript
// Lambda layer with extension
const extensionLayer = lambda.LayerVersion.fromLayerVersionArn(
this,
'Extension',
'arn:aws:lambda:us-east-1:123456789:layer:my-extension:1'
);
new NodejsFunction(this, 'Function', {
entry: 'src/handler.ts',
layers: [extensionLayer],
});
```
**Common extensions**:
- Secrets caching
- Configuration caching
- Custom logging
- Security scanning
- Performance monitoring
### Graviton2 Architecture
**20% better price/performance**:
```typescript
new NodejsFunction(this, 'Function', {
entry: 'src/handler.ts',
architecture: lambda.Architecture.ARM_64, // Graviton2
});
```
**Considerations**:
- Most runtimes support ARM64
- Test thoroughly before migrating
- Dependencies must support ARM64
- Native extensions may need recompilation
### VPC Optimization
**Hyperplane ENIs** (automatic since 2019):
- No ENI per function
- Faster cold starts in VPC
- Scales instantly
```typescript
// Modern VPC configuration (fast)
new NodejsFunction(this, 'VpcFunction', {
entry: 'src/handler.ts',
vpc,
vpcSubnets: { subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS },
// Fast scaling, no ENI limitations
});
```
## Performance Monitoring
### Key Metrics
**Monitor these metrics**:
- **Duration**: p50, p95, p99, max
- **Cold Start %**: invocations reporting an Init Duration ÷ total invocations
- **Error Rate**: Errors / Invocations
- **Throttles**: Indicates concurrency limit reached
- **Iterator Age**: For stream processing lag
### Performance Dashboards
```typescript
const dashboard = new cloudwatch.Dashboard(this, 'PerformanceDashboard');
dashboard.addWidgets(
new cloudwatch.GraphWidget({
title: 'Latency Distribution',
left: [
fn.metricDuration({ statistic: 'p50', label: 'p50' }),
fn.metricDuration({ statistic: 'p95', label: 'p95' }),
fn.metricDuration({ statistic: 'p99', label: 'p99' }),
fn.metricDuration({ statistic: 'Maximum', label: 'max' }),
],
}),
  new cloudwatch.GraphWidget({
    title: 'Duration and Errors',
    left: [fn.metricDuration()],
    right: [fn.metricErrors()],
  })
);
```
## Summary
- **Cold Starts**: Optimize package size, use provisioned concurrency for critical paths
- **Memory**: More memory often = faster execution = lower cost
- **Initialization**: Initialize connections outside handler
- **Lazy Loading**: Load dependencies only when needed
- **Connection Reuse**: Enable for AWS SDK clients
- **Testing**: Test at different memory levels to find optimal configuration
- **Monitoring**: Track p99 latency, not average
- **Graviton2**: Consider ARM64 for better price/performance
- **Batch Operations**: Reduce round trips to services
- **Caching**: Cache frequently accessed data