Serverless Performance Optimization
Performance optimization best practices for AWS Lambda and serverless architectures.
Table of Contents
- Lambda Execution Lifecycle
- Cold Start Optimization
- Memory and CPU Optimization
- Package Size Optimization
- Initialization Optimization
- Runtime Performance
- Performance Testing
- Cost Optimization
- Advanced Optimization
- Performance Monitoring
Lambda Execution Lifecycle
Execution Environment Phases
Three phases of Lambda execution:
- Init Phase (Cold Start):
  - Download and unpack the function package
  - Create the execution environment
  - Initialize the runtime
  - Execute initialization code (outside the handler)
- Invoke Phase:
  - Execute the handler code
  - Return the response
  - Freeze the execution environment
- Shutdown Phase:
  - Runtime shuts down (after a period of inactivity)
  - Execution environment is destroyed
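A minimal sketch of where each phase shows up in a Node.js function (the DynamoDB client is illustrative):
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';

// Init phase: module-scope code runs once per execution environment,
// before the first invocation handled there
const dynamodb = new DynamoDBClient({});

// Invoke phase: the handler runs once per event; module state (like the
// client above) survives across warm invocations while the environment
// is frozen between them
export const handler = async (event: unknown) => {
  // ... use dynamodb here ...
  return { statusCode: 200 };
};

// Shutdown phase: after a period of inactivity the environment is
// destroyed; no handler code runs (extensions can hook this phase)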
Concurrency and Scaling
Key concepts:
- Concurrency: Number of execution environments serving requests simultaneously
- One event per environment: Each environment processes one event at a time
- Automatic scaling: Lambda creates new environments as needed
- Environment reuse: Warm starts reuse existing environments
Example:
- Function takes 100 ms to execute
- A single environment can therefore serve about 10 requests/second
- A sustained load of 100 requests/second needs about 10 concurrent environments (see the sketch below)
- Default account limit: 1,000 concurrent executions per Region (can be raised)
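The estimate follows Little's law; a quick check with the numbers above:
// Little's law: required concurrency ≈ arrival rate × average duration
const requestsPerSecond = 100;
const avgDurationSeconds = 0.1; // 100 ms per invocation
const environmentsNeeded = requestsPerSecond * avgDurationSeconds; // = 10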
Cold Start Optimization
Understanding Cold Starts
Cold start components:
Total Cold Start = Download Package + Init Environment + Init Code + Handler
Cold start frequency:
- Development: Every code change creates new environments (frequent)
- Production: Typically < 1% of invocations
- Optimize for p95/p99 latency, not average
Package Size Optimization
Minimize deployment package:
import { NodejsFunction } from 'aws-cdk-lib/aws-lambda-nodejs';

new NodejsFunction(this, 'Function', {
  entry: 'src/handler.ts',
  bundling: {
    minify: true,      // Minify production code
    sourceMap: false,  // Disable in production
    externalModules: [
      '@aws-sdk/*',    // Use the AWS SDK bundled with the runtime
    ],
    // esbuild tree-shaking removes unused code automatically
  },
});
Tools for optimization:
- esbuild: Automatic tree-shaking and minification
- Webpack: Bundle optimization
- Maven: Dependency analysis
- Gradle: Unused dependency detection
Best practices:
- Avoid monolithic functions
- Bundle only required dependencies
- Use tree-shaking to remove unused code
- Minify production code
- Exclude AWS SDK (provided by runtime)
Provisioned Concurrency
Pre-initialize environments for predictable latency:
const fn = new NodejsFunction(this, 'Function', {
  entry: 'src/handler.ts',
});

// Static provisioned concurrency
fn.currentVersion.addAlias('live', {
  provisionedConcurrentExecutions: 10,
});

// Auto-scaling provisioned concurrency (Alias.addAutoScaling wires up
// Application Auto Scaling for you)
const alias = fn.currentVersion.addAlias('prod');
const target = alias.addAutoScaling({
  minCapacity: 10,
  maxCapacity: 100,
});
target.scaleOnUtilization({
  utilizationTarget: 0.7, // Scale out when utilization exceeds 70%
});
When to use:
- Consistent traffic patterns: Predictable load
- Latency-sensitive APIs: Sub-100ms requirements
- Cost consideration: Compare cold start frequency vs. provisioned cost
Cost comparison:
- On-demand: Pay only for actual usage
- Provisioned: Pay for provisioned capacity + invocations
- Breakeven: When cold starts > ~20% of invocations
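A rough comparison sketch; the per-GB-second rates below are illustrative us-east-1 x86 figures and should be checked against current Lambda pricing:
// Illustrative rates (assumptions, not authoritative)
const onDemandGbSecond = 0.0000166667;   // on-demand duration
const pcCapacityGbSecond = 0.0000041667; // charge for keeping capacity warm
const pcDurationGbSecond = 0.0000097222; // reduced duration rate with PC

const memoryGb = 1;
const invocationsPerMonth = 1_000_000;
const avgDurationS = 0.1;
const secondsPerMonth = 30 * 24 * 3600;

const onDemand =
  invocationsPerMonth * avgDurationS * memoryGb * onDemandGbSecond; // ≈ $1.67
const provisioned =
  10 * memoryGb * secondsPerMonth * pcCapacityGbSecond +              // 10 warm envs ≈ $108
  invocationsPerMonth * avgDurationS * memoryGb * pcDurationGbSecond; // duration ≈ $0.97
// At this volume, provisioned concurrency costs far more; it pays off in
// latency, not dollars, unless utilization of the warm capacity is high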
Lambda SnapStart (Java)
Near-instant cold starts for Java:
new lambda.Function(this, 'JavaFunction', {
runtime: lambda.Runtime.JAVA_17,
code: lambda.Code.fromAsset('target/function.jar'),
handler: 'com.example.Handler::handleRequest',
snapStart: lambda.SnapStartConf.ON_PUBLISHED_VERSIONS,
});
Benefits:
- Up to 10x faster cold starts for Java
- No code changes required
- Works with published versions
- No additional cost
Memory and CPU Optimization
Memory = CPU Allocation
Key principle: Memory and CPU are proportionally allocated
| Memory | vCPU |
|---|---|
| 128 MB | 0.07 vCPU |
| 512 MB | 0.28 vCPU |
| 1,024 MB | 0.57 vCPU |
| 1,769 MB | 1.00 vCPU |
| 3,538 MB | 2.00 vCPU |
| 10,240 MB | 6.00 vCPU |
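The allocation is linear, so a rough rule of thumb follows directly from the table (1 vCPU per 1,769 MB):
// Approximate vCPUs for a given memory setting
const approxVcpus = (memoryMb: number): number => memoryMb / 1769;

approxVcpus(128);    // ≈ 0.07
approxVcpus(1769);   // = 1.00
approxVcpus(10240);  // ≈ 5.8 (Lambda tops out at 6 vCPUs)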
Cost vs. Performance Balancing
Example - Compute-intensive function:
| Memory | Duration | Cost (per 1,000 invocations) |
|---|---|---|
| 128 MB | 11.72s | $0.0246 |
| 256 MB | 6.68s | $0.0280 |
| 512 MB | 3.19s | $0.0268 |
| 1024 MB | 1.46s | $0.0246 |
Key insight: More memory = faster execution = similar or lower cost
Formula:
GB-seconds = Allocated Memory (GB) × Execution Time (seconds)
Cost = GB-seconds × Number of Invocations × Price per GB-second
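Plugging in the 1,024 MB row (and assuming the table's costs are per 1,000 invocations):
const memoryGb = 1;                    // 1,024 MB
const durationSeconds = 1.46;
const invocations = 1_000;
const pricePerGbSecond = 0.0000166667; // x86 on-demand rate
const cost = memoryGb * durationSeconds * invocations * pricePerGbSecond;
// ≈ $0.024, in line with the table's $0.0246 once billing rounding is included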
Finding Optimal Memory
Use Lambda Power Tuning:
# Deploy power tuning state machine
sam deploy --template-file template.yml --stack-name lambda-power-tuning
# Run power tuning (the tuner is a Step Functions state machine, not a
# directly invokable function)
aws stepfunctions start-execution \
--state-machine-arn arn:aws:states:...:stateMachine:powerTuningStateMachine \
--input '{"lambdaARN": "arn:aws:lambda:...", "powerValues": [128, 256, 512, 1024, 1536, 3008]}'
Manual testing approach:
- Test function at different memory levels
- Measure execution time at each level
- Calculate cost for each configuration
- Choose optimal balance for your use case
Multi-Core Optimization
Leverage multiple vCPUs (at 1,769 MB+):
// Use Worker Threads for parallel processing
import { Worker } from 'worker_threads';
export const handler = async (event: any) => {
const items = event.items;
// Process in parallel using multiple cores
const workers = items.map(item =>
new Promise((resolve, reject) => {
const worker = new Worker('./worker.js', {
workerData: item,
});
worker.on('message', resolve);
worker.on('error', reject);
})
);
const results = await Promise.all(workers);
return results;
};
Python multiprocessing (Lambda provides no /dev/shm, so multiprocessing.Pool and Queue fail; use Process and Pipe instead):
import multiprocessing as mp

def _worker(item, conn):
    conn.send(process_item(item))
    conn.close()

def handler(event, context):
    # Fan CPU-bound work out across cores with Process + Pipe
    pipes = [mp.Pipe() for _ in event['items']]
    procs = [mp.Process(target=_worker, args=(item, child))
             for item, (_, child) in zip(event['items'], pipes)]
    for p in procs: p.start()
    results = [parent.recv() for parent, _ in pipes]
    for p in procs: p.join()
    return {'results': results}
Initialization Optimization
Code Outside Handler
Initialize once, reuse across invocations:
// ✅ GOOD - Initialize outside handler
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { S3Client } from '@aws-sdk/client-s3';
// Initialized once per execution environment
const dynamodb = new DynamoDBClient({});
const s3 = new S3Client({});
// Connection pool initialized once
const pool = createConnectionPool({
host: process.env.DB_HOST,
max: 1, // One connection per execution environment
});
export const handler = async (event: any) => {
// Reuse connections across invocations
const data = await dynamodb.getItem({ /* ... */ });
const file = await s3.getObject({ /* ... */ });
return processData(data, file);
};
// ❌ BAD - Initialize in handler
export const handler = async (event: any) => {
const dynamodb = new DynamoDBClient({}); // Created every invocation
const s3 = new S3Client({}); // Created every invocation
// ...
};
Lazy Loading
Load dependencies only when needed:
// ✅ GOOD - Conditional loading
export const handler = async (event: any) => {
if (event.operation === 'generatePDF') {
// Load heavy PDF library only when needed
const pdfLib = await import('./pdf-generator');
return pdfLib.generatePDF(event.data);
}
if (event.operation === 'processImage') {
const { default: sharp } = await import('sharp'); // sharp exports a default
return processImage(sharp, event.data);
}
// Default operation (no heavy dependencies)
return processDefault(event);
};
// ❌ BAD - Load everything upfront
import pdfLib from './pdf-generator'; // 50MB
import sharp from 'sharp'; // 20MB
// Even if not used!
export const handler = async (event: any) => {
if (event.operation === 'generatePDF') {
return pdfLib.generatePDF(event.data);
}
};
Connection Reuse
Enable connection reuse:
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { NodeHttpHandler } from '@smithy/node-http-handler';
import { Agent } from 'https';

const client = new DynamoDBClient({
  requestHandler: new NodeHttpHandler({
    connectionTimeout: 3000,
    socketTimeout: 3000,
    // Keep-alive is the default in SDK v3; shown explicitly here
    httpsAgent: new Agent({ keepAlive: true }),
  }),
});

// AWS SDK v2 only: set AWS_NODEJS_CONNECTION_REUSE_ENABLED=1 as an
// environment variable (setting it at runtime is too late; v3 reuses
// connections by default)
Runtime Performance
Choose the Right Runtime
Runtime comparison:
| Runtime | Cold Start | Execution Speed | Ecosystem | Best For |
|---|---|---|---|---|
| Node.js 20 | Fast | Fast | Excellent | APIs, I/O-bound |
| Python 3.12 | Fast | Medium | Excellent | Data processing |
| Java 17 + SnapStart | Fast (w/SnapStart) | Fast | Good | Enterprise apps |
| .NET 8 | Medium | Fast | Good | Enterprise apps |
| Go | Very Fast | Very Fast | Good | High performance |
| Rust | Very Fast | Very Fast | Growing | High performance |
Optimize Handler Code
Efficient code patterns:
// ✅ GOOD - Batch operations
const items = [{ pk: { S: 'item1' } }, { pk: { S: 'item2' } }, { pk: { S: 'item3' } }];
// Single batch write (BatchWriteItem accepts up to 25 items per request)
await dynamodb.batchWriteItem({
RequestItems: {
[tableName]: items.map(item => ({
PutRequest: { Item: item },
})),
},
});
// ❌ BAD - Multiple single operations
for (const item of items) {
await dynamodb.putItem({
TableName: tableName,
Item: item,
}); // Slow, multiple round trips
}
Async Processing
Use async/await effectively:
// ✅ GOOD - Parallel async operations
const [userData, orderData, inventoryData] = await Promise.all([
getUserData(userId),
getOrderData(orderId),
getInventoryData(productId),
]);
// ❌ BAD - Sequential async operations
const userData = await getUserData(userId);
const orderData = await getOrderData(orderId); // Waits unnecessarily
const inventoryData = await getInventoryData(productId); // Waits unnecessarily
Caching Strategies
Cache frequently accessed data:
// In-memory cache: persists across warm invocations, but each execution
// environment keeps its own copy and entries never expire; add TTL and
// size bounds for production use
const cache = new Map<string, any>();
export const handler = async (event: any) => {
const key = event.key;
// Check cache first
if (cache.has(key)) {
console.log('Cache hit');
return cache.get(key);
}
// Fetch from database
const data = await fetchFromDatabase(key);
// Store in cache
cache.set(key, data);
return data;
};
ElastiCache for a cache shared across execution environments:
import Redis from 'ioredis';
// Initialize once
const redis = new Redis({
host: process.env.REDIS_HOST,
port: 6379,
lazyConnect: true,
enableOfflineQueue: false,
});
export const handler = async (event: any) => {
const key = `order:${event.orderId}`;
// Try cache
const cached = await redis.get(key);
if (cached) {
return JSON.parse(cached);
}
// Fetch and cache
const data = await fetchOrder(event.orderId);
await redis.setex(key, 300, JSON.stringify(data)); // 5 min TTL
return data;
};
Performance Testing
Load Testing
Use Artillery for load testing:
# load-test.yml
config:
target: https://api.example.com
phases:
- duration: 60
arrivalRate: 10
rampTo: 100 # Ramp from 10 to 100 req/sec
scenarios:
- flow:
- post:
url: /orders
json:
orderId: "{{ $randomString() }}"
amount: "{{ $randomNumber(10, 1000) }}"
Run the test:
artillery run load-test.yml
Benchmarking
Test different configurations:
// benchmark.ts
import { Lambda } from '@aws-sdk/client-lambda';
const lambda = new Lambda({});
const testConfigurations = [
{ memory: 128, name: 'Function-128' },
{ memory: 256, name: 'Function-256' },
{ memory: 512, name: 'Function-512' },
{ memory: 1024, name: 'Function-1024' },
];
for (const config of testConfigurations) {
const times: number[] = [];
// Warm up
for (let i = 0; i < 5; i++) {
await lambda.invoke({ FunctionName: config.name });
}
// Measure
for (let i = 0; i < 100; i++) {
const start = Date.now();
await lambda.invoke({ FunctionName: config.name });
times.push(Date.now() - start);
}
const sorted = times.sort((a, b) => a - b); // numeric sort; default .sort() compares as strings
const p99 = sorted[98]; // 99th percentile of 100 samples (index 99 would be the max)
const avg = times.reduce((a, b) => a + b) / times.length;
console.log(`${config.memory}MB - Avg: ${avg.toFixed(1)}ms, p99: ${p99}ms`);
}
Cost Optimization
Right-Sizing Memory
Balance cost and performance:
CPU-bound workloads:
- More memory = more CPU = faster execution
- Often results in lower cost overall
- Test at 1769MB (1 vCPU) and above
I/O-bound workloads:
- Less sensitive to memory allocation
- May not benefit from higher memory
- Test at lower memory levels (256-512MB)
Simple operations:
- Minimal CPU required
- Use minimum memory (128-256MB)
- Fast execution despite low resources
Billing Granularity
Lambda bills in 1ms increments:
- Precise billing (7ms execution = 7ms cost)
- Even small duration improvements translate directly into savings
- Consider trade-offs carefully
Cost calculation:
Cost = (Memory GB) × (Duration seconds) × (Invocations) × ($0.0000166667/GB-second)
+ (Invocations) × ($0.20/1M requests)
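A worked example for a hypothetical function (512 MB, 200 ms average duration, 1M invocations per month):
const gbSeconds = 0.5 * 0.2 * 1_000_000;           // 100,000 GB-seconds
const computeCost = gbSeconds * 0.0000166667;      // ≈ $1.67
const requestCost = (1_000_000 / 1_000_000) * 0.2; // $0.20 per 1M requests
const totalMonthly = computeCost + requestCost;    // ≈ $1.87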
Cost Reduction Strategies
- Optimize execution time: Faster = cheaper
- Right-size memory: Balance CPU needs with cost
- Reduce invocations: Batch processing, caching
- Use Graviton2 (ARM64): up to 34% better price performance; duration billed about 20% lower
- Reserved Concurrency: Only when needed
- Compression: Reduce data transfer costs
Advanced Optimization
Lambda Extensions
Use extensions for cross-cutting concerns:
// Lambda layer with extension
const extensionLayer = lambda.LayerVersion.fromLayerVersionArn(
this,
'Extension',
'arn:aws:lambda:us-east-1:123456789:layer:my-extension:1'
);
new NodejsFunction(this, 'Function', {
entry: 'src/handler.ts',
layers: [extensionLayer],
});
Common extensions:
- Secrets caching
- Configuration caching
- Custom logging
- Security scanning
- Performance monitoring
Graviton2 Architecture
Up to 34% better price performance (duration is billed about 20% lower than x86):
new NodejsFunction(this, 'Function', {
entry: 'src/handler.ts',
architecture: lambda.Architecture.ARM_64, // Graviton2
});
Considerations:
- Most runtimes support ARM64
- Test thoroughly before migrating
- Dependencies must support ARM64
- Native extensions may need recompilation
VPC Optimization
Hyperplane ENIs (automatic since 2019):
- No ENI per function
- Faster cold starts in VPC
- Scales instantly
// Modern VPC configuration (fast)
new NodejsFunction(this, 'VpcFunction', {
entry: 'src/handler.ts',
vpc,
vpcSubnets: { subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS },
// Fast scaling, no ENI limitations
});
Performance Monitoring
Key Metrics
Monitor these metrics:
- Duration: p50, p95, p99, max
- Cold start rate: cold-start invocations / total invocations (cold starts report an Init Duration in their REPORT log lines)
- Error Rate: Errors / Invocations
- Throttles: Indicates concurrency limit reached
- Iterator Age: For stream processing lag
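These metrics can feed CloudWatch alarms directly from the CDK; a minimal sketch (the thresholds are placeholders):
// Alarm when p99 duration exceeds 1 second for 3 consecutive periods
fn.metricDuration({ statistic: 'p99' }).createAlarm(this, 'P99DurationAlarm', {
  threshold: 1000, // milliseconds
  evaluationPeriods: 3,
});

// Alarm on any throttling (a sign the concurrency limit was reached)
fn.metricThrottles().createAlarm(this, 'ThrottlesAlarm', {
  threshold: 1,
  evaluationPeriods: 1,
});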
Performance Dashboards
const dashboard = new cloudwatch.Dashboard(this, 'PerformanceDashboard');
dashboard.addWidgets(
new cloudwatch.GraphWidget({
title: 'Latency Distribution',
left: [
fn.metricDuration({ statistic: 'p50', label: 'p50' }),
fn.metricDuration({ statistic: 'p95', label: 'p95' }),
fn.metricDuration({ statistic: 'p99', label: 'p99' }),
fn.metricDuration({ statistic: 'Maximum', label: 'max' }),
],
}),
new cloudwatch.GraphWidget({
  // Memory utilization is not a built-in Lambda metric (it requires
  // Lambda Insights or parsing REPORT logs), so plot duration and errors
  title: 'Duration and Errors',
  left: [fn.metricDuration()],
  right: [fn.metricErrors()],
})
);
Summary
- Cold Starts: Optimize package size, use provisioned concurrency for critical paths
- Memory: More memory often = faster execution = lower cost
- Initialization: Initialize connections outside handler
- Lazy Loading: Load dependencies only when needed
- Connection Reuse: Enable for AWS SDK clients
- Testing: Test at different memory levels to find optimal configuration
- Monitoring: Track p99 latency, not average
- Graviton2: Consider ARM64 for better price/performance
- Batch Operations: Reduce round trips to services
- Caching: Cache frequently accessed data