Initial commit

Zhongwei Li
2025-11-30 09:08:46 +08:00
commit ab158ea3e9
10 changed files with 5566 additions and 0 deletions

# Serverless Performance Optimization
Performance optimization best practices for AWS Lambda and serverless architectures.
## Table of Contents
- [Lambda Execution Lifecycle](#lambda-execution-lifecycle)
- [Cold Start Optimization](#cold-start-optimization)
- [Memory and CPU Optimization](#memory-and-cpu-optimization)
- [Initialization Optimization](#initialization-optimization)
- [Runtime Performance](#runtime-performance)
- [Performance Testing](#performance-testing)
- [Cost Optimization](#cost-optimization)
- [Advanced Optimization](#advanced-optimization)
- [Performance Monitoring](#performance-monitoring)
## Lambda Execution Lifecycle
### Execution Environment Phases
**Three phases of Lambda execution**:
1. **Init Phase** (Cold Start):
- Download and unpack function package
- Create execution environment
- Initialize runtime
- Execute initialization code (outside handler)
2. **Invoke Phase**:
- Execute handler code
- Return response
- Freeze execution environment
3. **Shutdown Phase**:
- Runtime shutdown (after period of inactivity)
- Execution environment destroyed
### Concurrency and Scaling
**Key concepts**:
- **Concurrency**: Number of execution environments serving requests simultaneously
- **One event per environment**: Each environment processes one event at a time
- **Automatic scaling**: Lambda creates new environments as needed
- **Environment reuse**: Warm starts reuse existing environments
**Example**:
- A function takes 100 ms to execute
- One environment can therefore serve about 10 requests/second
- A sustained 100 requests/second needs roughly 10 concurrent environments (concurrency = request rate × average duration)
- Default account limit: 1,000 concurrent executions (can be raised with a quota increase)
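The arithmetic above is Little's law, and a small helper makes it explicit. This is a minimal sketch; the numbers mirror the example and are illustrative only:

```typescript
// Little's law: required concurrency = arrival rate × average duration.
function requiredConcurrency(requestsPerSecond: number, avgDurationMs: number): number {
  return Math.ceil(requestsPerSecond * (avgDurationMs / 1000));
}

// 100 req/s at 100 ms per request => 10 concurrent environments
console.log(requiredConcurrency(100, 100)); // 10
```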
## Cold Start Optimization
### Understanding Cold Starts
**Cold start components**:
```
Total Cold Start = Download Package + Init Environment + Init Code + Handler
```
**Cold start frequency**:
- Development: Every code change creates new environments (frequent)
- Production: Typically < 1% of invocations
- Optimize for p95/p99 latency, not average
### Package Size Optimization
**Minimize deployment package**:
```typescript
new NodejsFunction(this, 'Function', {
entry: 'src/handler.ts',
bundling: {
minify: true, // Minify production code
sourceMap: false, // Disable in production
externalModules: [
'@aws-sdk/*', // Use AWS SDK from runtime
],
// Tree-shaking removes unused code
},
});
```
**Tools for optimization**:
- **esbuild**: Automatic tree-shaking and minification (Node.js)
- **Webpack**: Bundle optimization (Node.js)
- **Maven**: Dependency analysis (Java)
- **Gradle**: Unused dependency detection (Java)
**Best practices**:
1. Avoid monolithic functions
2. Bundle only required dependencies
3. Use tree-shaking to remove unused code
4. Minify production code
5. Exclude AWS SDK (provided by runtime)
### Provisioned Concurrency
**Pre-initialize environments for predictable latency**:
```typescript
const fn = new NodejsFunction(this, 'Function', {
entry: 'src/handler.ts',
});
// Static provisioned concurrency
fn.currentVersion.addAlias('live', {
provisionedConcurrentExecutions: 10,
});
// Auto-scaling provisioned concurrency
const alias = fn.currentVersion.addAlias('prod');
const target = alias.addAutoScaling({
  minCapacity: 10,
  maxCapacity: 100,
});
target.scaleOnUtilization({
  utilizationTarget: 0.7, // target 70% utilization
});
```
**When to use**:
- **Consistent traffic patterns**: Predictable load
- **Latency-sensitive APIs**: Sub-100ms requirements
- **Cost consideration**: Compare cold start frequency vs. provisioned cost
**Cost comparison**:
- **On-demand**: Pay only for actual usage
- **Provisioned**: Pay for provisioned capacity + invocations
- **Breakeven**: When cold starts > ~20% of invocations
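One way to compare the two pricing modes is to model monthly compute cost under each. The rates below are placeholder assumptions, not current AWS prices; substitute the published rates for your region before relying on the result:

```typescript
// Placeholder rates -- check the Lambda pricing page for real numbers.
const ON_DEMAND_GB_SEC = 0.0000166667;
const PROVISIONED_GB_SEC = 0.0000041667; // charge for keeping environments warm

function monthlyComputeCost(
  memoryGb: number,
  invocations: number,
  avgDurationSec: number,
  provisionedEnvs = 0, // 0 = pure on-demand
): number {
  const execution = memoryGb * avgDurationSec * invocations * ON_DEMAND_GB_SEC;
  const secondsPerMonth = 30 * 24 * 3600;
  const warmCharge = provisionedEnvs * memoryGb * secondsPerMonth * PROVISIONED_GB_SEC;
  return execution + warmCharge;
}
```

Comparing `monthlyComputeCost(1, 10_000_000, 0.1)` with the same call plus `provisionedEnvs` shows how quickly the warm-environment charge dominates at low traffic, which is why provisioned concurrency only pays off when cold starts are frequent.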
### Lambda SnapStart (Java)
**Instant cold starts for Java**:
```typescript
new lambda.Function(this, 'JavaFunction', {
runtime: lambda.Runtime.JAVA_17,
code: lambda.Code.fromAsset('target/function.jar'),
handler: 'com.example.Handler::handleRequest',
snapStart: lambda.SnapStartConf.ON_PUBLISHED_VERSIONS,
});
```
**Benefits**:
- Up to 10x faster cold starts for Java
- No code changes required
- Works with published versions
- No additional cost
## Memory and CPU Optimization
### Memory = CPU Allocation
**Key principle**: Memory and CPU are proportionally allocated
| Memory | vCPU |
|--------|------|
| 128 MB | 0.07 vCPU |
| 512 MB | 0.28 vCPU |
| 1,024 MB | 0.57 vCPU |
| 1,769 MB | 1.00 vCPU |
| 3,538 MB | 2.00 vCPU |
| 10,240 MB | 6.00 vCPU |
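The table implies a simple linear rule. This is an approximation built on the one anchor point AWS documents (1,769 MB = 1 full vCPU):

```typescript
// Approximate vCPU share from memory, using the 1,769 MB = 1 vCPU anchor.
function approxVcpu(memoryMb: number): number {
  return memoryMb / 1769;
}

console.log(approxVcpu(1769)); // 1
console.log(approxVcpu(3538)); // 2
```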
### Cost vs. Performance Balancing
**Example - Compute-intensive function**:
| Memory | Duration | Cost (per 1,000 invocations) |
|--------|----------|------|
| 128 MB | 11.72s | $0.0246 |
| 256 MB | 6.68s | $0.0280 |
| 512 MB | 3.19s | $0.0268 |
| 1024 MB | 1.46s | $0.0246 |
**Key insight**: More memory = faster execution = similar or lower cost
**Formula**:
```
GB-seconds = Memory (GB) × Duration (seconds)
Cost = GB-seconds × Invocations × Price per GB-second
```
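Applying the formula to the table's first row reproduces its numbers, which suggests the Cost column is per 1,000 invocations. The rates assumed here are the public on-demand prices at the time of writing; verify them for your region:

```typescript
// Cost for 1,000 invocations at a given memory size and duration.
// Rates assumed: $0.0000166667 per GB-second, $0.20 per 1M requests.
function costPer1000Invocations(memoryMb: number, durationSec: number): number {
  const gbSeconds = (memoryMb / 1024) * durationSec * 1000;
  const compute = gbSeconds * 0.0000166667;
  const requestCharge = 1000 * (0.20 / 1_000_000);
  return compute + requestCharge;
}

console.log(costPer1000Invocations(128, 11.72).toFixed(4)); // "0.0246"
```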
### Finding Optimal Memory
**Use Lambda Power Tuning**:
```bash
# Deploy power tuning state machine
sam deploy --template-file template.yml --stack-name lambda-power-tuning
# Run power tuning (the tool runs as a Step Functions state machine)
aws stepfunctions start-execution \
  --state-machine-arn arn:aws:states:... \
  --input '{"lambdaARN": "arn:aws:lambda:...", "powerValues": [128, 256, 512, 1024, 1536, 3008]}'
```
**Manual testing approach**:
1. Test function at different memory levels
2. Measure execution time at each level
3. Calculate cost for each configuration
4. Choose optimal balance for your use case
### Multi-Core Optimization
**Leverage multiple vCPUs** (at 1,769 MB+):
```typescript
// Use Worker Threads for parallel processing
import { Worker } from 'worker_threads';
export const handler = async (event: any) => {
const items = event.items;
// Process in parallel using multiple cores
const workers = items.map(item =>
new Promise((resolve, reject) => {
const worker = new Worker('./worker.js', {
workerData: item,
});
worker.on('message', resolve);
worker.on('error', reject);
})
);
const results = await Promise.all(workers);
return results;
};
```
**Python multiprocessing**:
```python
import multiprocessing as mp
def handler(event, context):
items = event['items']
# Use multiple cores for CPU-bound work
with mp.Pool(mp.cpu_count()) as pool:
results = pool.map(process_item, items)
return {'results': results}
```
## Initialization Optimization
### Code Outside Handler
**Initialize once, reuse across invocations**:
```typescript
// ✅ GOOD - Initialize outside handler
import { DynamoDBClient, GetItemCommand } from '@aws-sdk/client-dynamodb';
import { S3Client, GetObjectCommand } from '@aws-sdk/client-s3';

// Initialized once per execution environment
const dynamodb = new DynamoDBClient({});
const s3 = new S3Client({});

// Connection pool initialized once (createConnectionPool is your DB library's factory)
const pool = createConnectionPool({
  host: process.env.DB_HOST,
  max: 1, // One connection per execution environment
});

export const handler = async (event: any) => {
  // Reuse clients and connections across invocations
  const data = await dynamodb.send(new GetItemCommand({ /* ... */ }));
  const file = await s3.send(new GetObjectCommand({ /* ... */ }));
  return processData(data, file);
};

// ❌ BAD - Initialize in handler
export const handler = async (event: any) => {
  const dynamodb = new DynamoDBClient({}); // New client and TLS handshake every invocation
  const s3 = new S3Client({}); // Created every invocation
  // ...
};
```
### Lazy Loading
**Load dependencies only when needed**:
```typescript
// ✅ GOOD - Conditional loading
export const handler = async (event: any) => {
if (event.operation === 'generatePDF') {
// Load heavy PDF library only when needed
const pdfLib = await import('./pdf-generator');
return pdfLib.generatePDF(event.data);
}
if (event.operation === 'processImage') {
    // Dynamic import returns a module namespace; use its default export
    const { default: sharp } = await import('sharp');
return processImage(sharp, event.data);
}
// Default operation (no heavy dependencies)
return processDefault(event);
};
// ❌ BAD - Load everything upfront
import pdfLib from './pdf-generator'; // 50MB
import sharp from 'sharp'; // 20MB
// Even if not used!
export const handler = async (event: any) => {
if (event.operation === 'generatePDF') {
return pdfLib.generatePDF(event.data);
}
};
```
### Connection Reuse
**Enable connection reuse**:
```typescript
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { NodeHttpHandler } from '@smithy/node-http-handler';
import { Agent } from 'https';

const client = new DynamoDBClient({
  // SDK v3 keeps connections alive by default; shown explicitly here
  requestHandler: new NodeHttpHandler({
    httpsAgent: new Agent({ keepAlive: true }),
    connectionTimeout: 3000,
    socketTimeout: 3000,
  }),
});

// For the older AWS SDK v2 for Node.js, set the environment variable
// AWS_NODEJS_CONNECTION_REUSE_ENABLED=1 on the function instead
```
## Runtime Performance
### Choose the Right Runtime
**Runtime comparison**:
| Runtime | Cold Start | Execution Speed | Ecosystem | Best For |
|---------|------------|-----------------|-----------|----------|
| Node.js 20 | Fast | Fast | Excellent | APIs, I/O-bound |
| Python 3.12 | Fast | Medium | Excellent | Data processing |
| Java 17 + SnapStart | Fast (w/SnapStart) | Fast | Good | Enterprise apps |
| .NET 8 | Medium | Fast | Good | Enterprise apps |
| Go | Very Fast | Very Fast | Good | High performance |
| Rust | Very Fast | Very Fast | Growing | High performance |
### Optimize Handler Code
**Efficient code patterns**:
```typescript
// ✅ GOOD - Batch operations
// (docClient: a DynamoDBDocumentClient created outside the handler)
import { BatchWriteCommand, PutCommand } from '@aws-sdk/lib-dynamodb';

const items = [{ id: 'item1' }, { id: 'item2' }, { id: 'item3' }];

// Single batch write (up to 25 items per request)
await docClient.send(new BatchWriteCommand({
  RequestItems: {
    [tableName]: items.map(item => ({
      PutRequest: { Item: item },
    })),
  },
}));

// ❌ BAD - Multiple single operations
for (const item of items) {
  await docClient.send(new PutCommand({
    TableName: tableName,
    Item: item,
  })); // Slow, one round trip per item
}
```
### Async Processing
**Use async/await effectively**:
```typescript
// ✅ GOOD - Parallel async operations
const [userData, orderData, inventoryData] = await Promise.all([
getUserData(userId),
getOrderData(orderId),
getInventoryData(productId),
]);
// ❌ BAD - Sequential async operations
const userData = await getUserData(userId);
const orderData = await getOrderData(orderId); // Waits unnecessarily
const inventoryData = await getInventoryData(productId); // Waits unnecessarily
```
### Caching Strategies
**Cache frequently accessed data**:
```typescript
// In-memory cache (persists in warm environments)
const cache = new Map<string, any>();
export const handler = async (event: any) => {
const key = event.key;
// Check cache first
if (cache.has(key)) {
console.log('Cache hit');
return cache.get(key);
}
// Fetch from database
const data = await fetchFromDatabase(key);
// Store in cache
cache.set(key, data);
return data;
};
```
**ElastiCache for shared cache**:
```typescript
import Redis from 'ioredis';
// Initialize once
const redis = new Redis({
host: process.env.REDIS_HOST,
port: 6379,
lazyConnect: true,
enableOfflineQueue: false,
});
export const handler = async (event: any) => {
const key = `order:${event.orderId}`;
// Try cache
const cached = await redis.get(key);
if (cached) {
return JSON.parse(cached);
}
// Fetch and cache
const data = await fetchOrder(event.orderId);
await redis.setex(key, 300, JSON.stringify(data)); // 5 min TTL
return data;
};
```
## Performance Testing
### Load Testing
**Use Artillery for load testing**:
```yaml
# load-test.yml
config:
target: https://api.example.com
phases:
- duration: 60
arrivalRate: 10
rampTo: 100 # Ramp from 10 to 100 req/sec
scenarios:
- flow:
- post:
url: /orders
json:
orderId: "{{ $randomString() }}"
amount: "{{ $randomNumber(10, 1000) }}"
```
```bash
artillery run load-test.yml
```
### Benchmarking
**Test different configurations**:
```typescript
// benchmark.ts
import { Lambda } from '@aws-sdk/client-lambda';
const lambda = new Lambda({});
const testConfigurations = [
{ memory: 128, name: 'Function-128' },
{ memory: 256, name: 'Function-256' },
{ memory: 512, name: 'Function-512' },
{ memory: 1024, name: 'Function-1024' },
];
for (const config of testConfigurations) {
const times: number[] = [];
// Warm up
for (let i = 0; i < 5; i++) {
await lambda.invoke({ FunctionName: config.name });
}
// Measure
for (let i = 0; i < 100; i++) {
const start = Date.now();
await lambda.invoke({ FunctionName: config.name });
times.push(Date.now() - start);
}
  const sorted = times.sort((a, b) => a - b); // numeric sort, not lexicographic
  const p99 = sorted[98]; // 99th percentile of 100 samples
  const avg = times.reduce((a, b) => a + b) / times.length;
console.log(`${config.memory}MB - Avg: ${avg}ms, p99: ${p99}ms`);
}
```
## Cost Optimization
### Right-Sizing Memory
**Balance cost and performance**:
**CPU-bound workloads**:
- More memory = more CPU = faster execution
- Often results in lower cost overall
- Test at 1769MB (1 vCPU) and above
**I/O-bound workloads**:
- Less sensitive to memory allocation
- May not benefit from higher memory
- Test at lower memory levels (256-512MB)
**Simple operations**:
- Minimal CPU required
- Use minimum memory (128-256MB)
- Fast execution despite low resources
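As CDK configuration, the guidance above might look like the sketch below. The memory values are starting points to validate with power tuning or benchmarks, and the entry paths are hypothetical:

```typescript
// CPU-bound: buy a full vCPU
new NodejsFunction(this, 'CpuBoundFn', {
  entry: 'src/transcode.ts', // hypothetical CPU-bound handler
  memorySize: 1769,          // 1 full vCPU
});

// I/O-bound: more memory rarely helps
new NodejsFunction(this, 'IoBoundFn', {
  entry: 'src/fetch-orders.ts', // hypothetical I/O-bound handler
  memorySize: 512,
});

// Trivial work: minimum allocation
new NodejsFunction(this, 'SimpleFn', {
  entry: 'src/healthcheck.ts', // hypothetical trivial handler
  memorySize: 128,
});
```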
### Billing Granularity
**Lambda bills in 1ms increments**:
- Precise billing (7ms execution = 7ms cost)
- Optimize even small improvements
- Consider trade-offs carefully
**Cost calculation**:
```
Cost = (Memory GB) × (Duration seconds) × (Invocations) × ($0.0000166667/GB-second)
+ (Invocations) × ($0.20/1M requests)
```
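A worked instance of the formula, for a hypothetical workload of 10 million invocations a month at 512 MB and 120 ms average duration:

```typescript
// Monthly cost = compute (GB-seconds × rate) + request charge.
const memoryGb = 512 / 1024;
const durationSec = 0.120;
const invocations = 10_000_000;

const compute = memoryGb * durationSec * invocations * 0.0000166667;
const requests = (invocations / 1_000_000) * 0.20;

console.log((compute + requests).toFixed(2)); // "12.00" per month
```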
### Cost Reduction Strategies
1. **Optimize execution time**: Faster = cheaper
2. **Right-size memory**: Balance CPU needs with cost
3. **Reduce invocations**: Batch processing, caching
4. **Use Graviton2**: 20% better price/performance
5. **Reserved Concurrency**: Only when needed
6. **Compression**: Reduce data transfer costs
## Advanced Optimization
### Lambda Extensions
**Use extensions for cross-cutting concerns**:
```typescript
// Lambda layer with extension
const extensionLayer = lambda.LayerVersion.fromLayerVersionArn(
this,
'Extension',
'arn:aws:lambda:us-east-1:123456789:layer:my-extension:1'
);
new NodejsFunction(this, 'Function', {
entry: 'src/handler.ts',
layers: [extensionLayer],
});
```
**Common extensions**:
- Secrets caching
- Configuration caching
- Custom logging
- Security scanning
- Performance monitoring
### Graviton2 Architecture
**20% better price/performance**:
```typescript
new NodejsFunction(this, 'Function', {
entry: 'src/handler.ts',
architecture: lambda.Architecture.ARM_64, // Graviton2
});
```
**Considerations**:
- Most runtimes support ARM64
- Test thoroughly before migrating
- Dependencies must support ARM64
- Native extensions may need recompilation
### VPC Optimization
**Hyperplane ENIs** (automatic since 2019):
- No ENI per function
- Faster cold starts in VPC
- Scales instantly
```typescript
// Modern VPC configuration (fast)
new NodejsFunction(this, 'VpcFunction', {
entry: 'src/handler.ts',
vpc,
vpcSubnets: { subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS },
// Fast scaling, no ENI limitations
});
```
## Performance Monitoring
### Key Metrics
**Monitor these metrics**:
- **Duration**: p50, p95, p99, max
- **Cold Start %**: invocations reporting an Init Duration ÷ total invocations
- **Error Rate**: Errors / Invocations
- **Throttles**: Indicates concurrency limit reached
- **Iterator Age**: For stream processing lag
### Performance Dashboards
```typescript
const dashboard = new cloudwatch.Dashboard(this, 'PerformanceDashboard');
dashboard.addWidgets(
new cloudwatch.GraphWidget({
title: 'Latency Distribution',
left: [
fn.metricDuration({ statistic: 'p50', label: 'p50' }),
fn.metricDuration({ statistic: 'p95', label: 'p95' }),
fn.metricDuration({ statistic: 'p99', label: 'p99' }),
fn.metricDuration({ statistic: 'Maximum', label: 'max' }),
],
}),
  new cloudwatch.GraphWidget({
    title: 'Duration and Errors',
    left: [fn.metricDuration()],
    right: [fn.metricErrors()],
  })
);
```
## Summary
- **Cold Starts**: Optimize package size, use provisioned concurrency for critical paths
- **Memory**: More memory often = faster execution = lower cost
- **Initialization**: Initialize connections outside handler
- **Lazy Loading**: Load dependencies only when needed
- **Connection Reuse**: Enable for AWS SDK clients
- **Testing**: Test at different memory levels to find optimal configuration
- **Monitoring**: Track p99 latency, not average
- **Graviton2**: Consider ARM64 for better price/performance
- **Batch Operations**: Reduce round trips to services
- **Caching**: Cache frequently accessed data