---
description: Set up advanced observability for Rust Lambda with OpenTelemetry, X-Ray, and structured logging
---
You are helping the user implement comprehensive observability for their Rust Lambda functions.
## Your Task
Guide the user through setting up production-grade observability including distributed tracing, metrics, and structured logging.
## Observability Stack Options
### Option 1: AWS X-Ray (Native AWS Solution)
**Best for**:
- AWS-native monitoring
- Quick setup
- CloudWatch integration
- Basic distributed tracing needs
#### Enable X-Ray in Lambda
**Via cargo-lambda:**
```bash
cargo lambda deploy --enable-tracing
```
**Via SAM template:**
```yaml
Resources:
MyFunction:
Type: AWS::Serverless::Function
Properties:
Tracing: Active # Enable X-Ray
```
**Via Terraform:**
```hcl
resource "aws_lambda_function" "function" {
# ... other config ...
tracing_config {
mode = "Active"
}
}
```
#### X-Ray with xray-lite
Add to `Cargo.toml`:
```toml
[dependencies]
xray-lite = "0.1"
xray-lite-aws-sdk = "0.1"  # AWS SDK tracing extension used in the example below
aws-config = "1"
aws-sdk-dynamodb = "1" # or other AWS services
```
Basic usage:
```rust
use aws_sdk_dynamodb::types::AttributeValue;
use lambda_runtime::{run, service_fn, Error, LambdaEvent};
use xray_lite::SubsegmentContext;
use xray_lite_aws_sdk::XRayAwsSdkExtension;
async fn function_handler(event: LambdaEvent<Request>) -> Result<Response, Error> {
// X-Ray automatically creates parent segment for Lambda
// Create subsegment for custom operation
let subsegment = SubsegmentContext::from_lambda_ctx(&event.context);
// Trace AWS SDK calls
let config = aws_config::load_from_env().await
.xray_extension(subsegment.clone());
let dynamodb = aws_sdk_dynamodb::Client::new(&config);
// This DynamoDB call will be traced automatically
let result = dynamodb
.get_item()
.table_name("MyTable")
.key("id", AttributeValue::S("123".to_string()))
.send()
.await?;
Ok(Response { data: result })
}
```
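For log/trace correlation, it also helps to log the X-Ray trace ID that Lambda injects into each invocation. A minimal sketch, assuming the `xray_trace_id` field on `lambda_runtime::Context` (populated from the `_X_AMZN_TRACE_ID` environment variable):
```rust
use lambda_runtime::{Error, LambdaEvent};
use tracing::info;

async fn function_handler(event: LambdaEvent<Request>) -> Result<Response, Error> {
    // Logging the trace ID lets you jump from a CloudWatch log line straight
    // to the matching trace in the X-Ray console.
    if let Some(trace_id) = &event.context.xray_trace_id {
        info!(xray_trace_id = %trace_id, "Invocation started");
    }
    // ... handler logic ...
    Ok(Response {})
}
```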
### Option 2: OpenTelemetry (Vendor-Neutral)
**Best for**:
- Multi-vendor monitoring
- Portability across platforms
- Advanced telemetry needs
- Custom metrics and traces
#### Setup OpenTelemetry
Add to `Cargo.toml`:
```toml
[dependencies]
lambda_runtime = "0.13"
tokio = { version = "1", features = ["macros"] }
lambda-otel-lite = "0.1" # Lightweight OpenTelemetry for Lambda
opentelemetry = "0.22"
opentelemetry-otlp = "0.15"
opentelemetry_sdk = "0.22"
tracing = "0.1"
tracing-opentelemetry = "0.23"
tracing-subscriber = { version = "0.3", features = ["env-filter", "json"] }
```
#### Basic OpenTelemetry Setup
```rust
use lambda_otel_lite::HttpTracerProviderBuilder;
use lambda_runtime::{run, service_fn, Error, LambdaEvent};
use opentelemetry::trace::TracerProvider;
use tracing::{info, instrument};
use tracing_subscriber::layer::SubscriberExt;
#[tokio::main]
async fn main() -> Result<(), Error> {
// Initialize OpenTelemetry
let tracer_provider = HttpTracerProviderBuilder::default()
.with_default_text_map_propagator()
.with_stdout_client() // For testing, use OTLP for production
.build()?;
let tracer = tracer_provider.tracer("my-rust-lambda");
// Setup tracing subscriber
let telemetry_layer = tracing_opentelemetry::layer()
.with_tracer(tracer);
let subscriber = tracing_subscriber::registry()
.with(tracing_subscriber::EnvFilter::from_default_env())
.with(tracing_subscriber::fmt::layer())
.with(telemetry_layer);
tracing::subscriber::set_global_default(subscriber)?;
run(service_fn(function_handler)).await
}
#[instrument(skip(event))]
async fn function_handler(event: LambdaEvent<Request>) -> Result<Response, Error> {
info!(request_id = %event.context.request_id, "Processing request");
let result = process_data(&event.payload).await?;
Ok(Response { result })
}
#[instrument]
async fn process_data(request: &Request) -> Result<Data, Error> {
info!("Processing data");
// Your processing logic
// All operations within this function will be traced
Ok(Data::new())
}
```
#### OpenTelemetry with OTLP Exporter
For production, export to an observability backend:
```rust
use lambda_otel_lite::HttpTracerProviderBuilder;
use opentelemetry_otlp::WithExportConfig;
use std::collections::HashMap;
let tracer_provider = HttpTracerProviderBuilder::default()
    .enable_otlp(
        opentelemetry_otlp::new_exporter()
            .http()
            .with_endpoint("https://your-collector:4318")
            // with_headers expects a map of header name/value strings
            .with_headers(HashMap::from([(
                "api-key".to_string(),
                "your-key".to_string(),
            )])),
    )?
    .build()?;
```
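One Lambda-specific caveat: spans buffered by a batch processor can be lost if the handler returns before they are exported, because Lambda may freeze the execution environment immediately afterwards. lambda-otel-lite is designed to take care of this; if you instead wire up an `opentelemetry_sdk` `TracerProvider` yourself, flush explicitly. A rough sketch under that assumption (the `handle` function name is a placeholder):
```rust
use lambda_runtime::{Error, LambdaEvent};
use opentelemetry_sdk::trace::TracerProvider as SdkTracerProvider;

async fn traced_handler(
    provider: &SdkTracerProvider, // built at startup with your OTLP exporter
    event: LambdaEvent<Request>,
) -> Result<Response, Error> {
    let result = handle(event).await; // placeholder for the real handler body
    // Export any spans still buffered in the batch processor before Lambda
    // can freeze the execution environment.
    let _ = provider.force_flush();
    result
}
```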
### Option 3: Datadog Integration
**Best for**:
- Datadog users
- Comprehensive APM
- Log aggregation
- Custom metrics
Add Datadog Lambda Extension layer and configure:
```rust
use lambda_runtime::{run, service_fn, Error, LambdaEvent};
use tracing::{info, instrument};
use tracing_subscriber::{fmt, EnvFilter};
#[tokio::main]
async fn main() -> Result<(), Error> {
// JSON format for Datadog log parsing
tracing_subscriber::fmt()
.json()
.with_env_filter(EnvFilter::from_default_env())
.with_target(false)
.with_current_span(false)
.init();
run(service_fn(function_handler)).await
}
#[instrument(
skip(event),
fields(
request_id = %event.context.request_id,
user_id = %event.payload.user_id,
)
)]
async fn function_handler(event: LambdaEvent<Request>) -> Result<Response, Error> {
info!("Processing user request");
// Datadog automatically traces this
let result = fetch_user_data(&event.payload.user_id).await?;
Ok(Response { result })
}
```
Deploy with Datadog extension layer:
```bash
# Lambda layer ARNs must pin a numeric version; check Datadog's docs for the current one
cargo lambda deploy \
  --layers arn:aws:lambda:us-east-1:464622532012:layer:Datadog-Extension-ARM:<version> \
--env-var DD_API_KEY=your-api-key \
--env-var DD_SITE=datadoghq.com \
--env-var DD_SERVICE=my-rust-service \
--env-var DD_ENV=production
```
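The extension also accepts custom metrics over DogStatsD on localhost. A minimal sketch that sends a raw DogStatsD datagram with the standard library instead of a client crate (port 8125 and the `metric:value|type|#tags` wire format are standard DogStatsD conventions; verify them against your extension configuration):
```rust
use std::net::UdpSocket;

/// Increment a Datadog custom counter via the extension's DogStatsD listener.
fn increment_counter(metric: &str, tags: &str) -> std::io::Result<()> {
    let socket = UdpSocket::bind("0.0.0.0:0")?; // ephemeral local port
    // DogStatsD wire format: "<metric.name>:<value>|<type>|#<tag1:val,tag2:val>"
    let datagram = format!("{metric}:1|c|#{tags}");
    socket.send_to(datagram.as_bytes(), "127.0.0.1:8125")?;
    Ok(())
}

// Usage inside the handler:
// increment_counter("my_rust_service.requests", "env:production,service:my-rust-service")?;
```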
## Structured Logging Best Practices
### Using tracing with Spans
```rust
use tracing::{debug, error, info, instrument, span, Instrument, Level};
async fn function_handler(event: LambdaEvent<Request>) -> Result<Response, Error> {
    let span = span!(
        Level::INFO,
        "process_request",
        request_id = %event.context.request_id,
        user_id = %event.payload.user_id,
    );
    // Use .instrument() rather than holding span.enter() across .await points,
    // which would attach the span incorrectly while the task is suspended.
    async move {
        info!("Starting request processing");
        match process_user(&event.payload.user_id).await {
            Ok(user) => {
                info!(user_name = %user.name, "User processed successfully");
                Ok(Response { user })
            }
            Err(e) => {
                error!(error = %e, "Failed to process user");
                Err(e)
            }
        }
    }
    .instrument(span)
    .await
}
#[instrument]
async fn process_user(user_id: &str) -> Result<User, Error> {
    debug!("Fetching user from database");
    let user = fetch_from_db(user_id).await?;
    info!(email = %user.email, "User fetched");
    Ok(user)
}
```
### JSON Structured Logging
```rust
use lambda_runtime::{run, service_fn, Error};
use tracing_subscriber::EnvFilter;
#[tokio::main]
async fn main() -> Result<(), Error> {
// JSON output for CloudWatch Insights
tracing_subscriber::fmt()
        .json()
        .flatten_event(true) // put event fields at the top level for simpler queries
.with_env_filter(EnvFilter::from_default_env())
.with_current_span(true)
.with_span_list(true)
.with_target(false)
.without_time() // CloudWatch adds timestamp
.init();
run(service_fn(function_handler)).await
}
// Logs will be structured JSON:
// {"level":"INFO","message":"Processing request","request_id":"abc123","user_id":"user456"}
```
### Custom Metrics with OpenTelemetry
```rust
use opentelemetry::metrics::{Counter, Histogram};
use opentelemetry::KeyValue;
// Instruments are created once at startup (see the meter sketch below) and
// shared with the handler, e.g. via a struct moved into service_fn's closure.
struct Metrics {
    request_counter: Counter<u64>,
    duration_histogram: Histogram<f64>,
}
async fn function_handler(
    metrics: &Metrics,
    event: LambdaEvent<Request>,
) -> Result<Response, Error> {
let start = std::time::Instant::now();
// Increment counter
metrics.request_counter.add(
1,
&[
KeyValue::new("function", "my-lambda"),
KeyValue::new("region", "us-east-1"),
],
);
let result = process_request(&event.payload).await?;
// Record duration
let duration = start.elapsed().as_secs_f64();
metrics.duration_histogram.record(
duration,
&[KeyValue::new("status", "success")],
);
Ok(result)
}
```
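The `Metrics` struct above still needs to be constructed somewhere. A minimal sketch of building the instruments from the global meter, assuming the opentelemetry 0.22 API where instrument builders are finished with `.init()` (newer releases renamed this to `.build()`):
```rust
use opentelemetry::global;
use opentelemetry::metrics::{Counter, Histogram};

// Same struct as in the handler example above
struct Metrics {
    request_counter: Counter<u64>,
    duration_histogram: Histogram<f64>,
}

impl Metrics {
    fn new() -> Self {
        // Returns a meter from whichever MeterProvider you registered at startup
        let meter = global::meter("my-rust-lambda");
        Metrics {
            request_counter: meter.u64_counter("requests").init(),
            duration_histogram: meter.f64_histogram("request_duration_seconds").init(),
        }
    }
}
```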
## CloudWatch Logs Insights Queries
With structured logging, you can query efficiently:
```
# Find errors for specific user
fields @timestamp, message, error
| filter user_id = "user456"
| filter level = "ERROR"
| sort @timestamp desc
# Calculate p95 latency
fields duration_ms
| stats percentile(duration_ms, 95) as p95_latency by bin(5m)
# Count requests by status
fields @timestamp
| filter message = "Request completed"
| stats count() by status
```
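These queries only work if the handler emits the fields they filter on. A small sketch of a completion log that produces the `duration_ms`, `status`, and `"Request completed"` values referenced above, using the `tracing` macros already configured for JSON output:
```rust
use tracing::info;

fn log_completion(duration_ms: u64, status: &str, user_id: &str) {
    // Field names deliberately match the CloudWatch Logs Insights queries above
    info!(duration_ms, status, user_id, "Request completed");
}
```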
## Distributed Tracing Pattern
For microservices calling each other:
```rust
use opentelemetry::global;
use opentelemetry::trace::{SpanKind, TraceContextExt, Tracer};
use opentelemetry_http::{HeaderExtractor, HeaderInjector};
async fn function_handler(event: LambdaEvent<ApiGatewayRequest>) -> Result<Response, Error> {
    let tracer = global::tracer("my-service");
    // Extract the trace context propagated by the caller (W3C traceparent header)
    let parent_cx = global::get_text_map_propagator(|propagator| {
        propagator.extract(&HeaderExtractor(&event.payload.headers))
    });
    // Create a server span that continues the caller's trace
    let span = tracer
        .span_builder("handle_request")
        .with_kind(SpanKind::Server)
        .start_with_context(&tracer, &parent_cx);
    let cx = opentelemetry::Context::current_with_span(span);
    // Inject the current trace context into the outgoing request headers
    let mut headers = reqwest::header::HeaderMap::new();
    global::get_text_map_propagator(|propagator| {
        propagator.inject_context(&cx, &mut HeaderInjector(&mut headers))
    });
    let client = reqwest::Client::new();
    let response = client
        .get("https://downstream-service.com/api")
        .headers(headers)
        .send()
        .await?;
    Ok(Response { data: response.text().await? })
}
```
## AWS ADOT Lambda Layer
For automatic instrumentation (limited Rust support):
```bash
# Add ADOT layer (note: Rust needs manual instrumentation)
cargo lambda deploy \
--layers arn:aws:lambda:us-east-1:901920570463:layer:aws-otel-collector-arm64-ver-0-90-1:1 \
--env-var AWS_LAMBDA_EXEC_WRAPPER=/opt/otel-instrument \
--env-var OPENTELEMETRY_COLLECTOR_CONFIG_FILE=/var/task/collector.yaml
```
## Cold Start Monitoring
Track cold start vs warm start:
```rust
use std::sync::atomic::{AtomicBool, Ordering};
use tracing::info;
static COLD_START: AtomicBool = AtomicBool::new(true);
async fn function_handler(event: LambdaEvent<Request>) -> Result<Response, Error> {
let is_cold_start = COLD_START.swap(false, Ordering::Relaxed);
info!(
cold_start = is_cold_start,
"Lambda invocation"
);
// Process request...
Ok(Response {})
}
```
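To turn cold starts into a CloudWatch metric (not just a log field), one option is the Embedded Metric Format (EMF): print a single JSON line containing an `_aws` metadata block and CloudWatch extracts the metric without any API calls. A sketch using `serde_json`; the namespace and dimension are placeholders, and the structure should be checked against the EMF specification:
```rust
use serde_json::json;
use std::time::{SystemTime, UNIX_EPOCH};

/// Emit a ColdStart metric via the CloudWatch Embedded Metric Format.
fn emit_cold_start_metric(function_name: &str, is_cold_start: bool) {
    let timestamp_ms = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .unwrap_or_default()
        .as_millis() as u64;
    let cold_start = if is_cold_start { 1 } else { 0 };
    let emf = json!({
        "_aws": {
            "Timestamp": timestamp_ms,
            "CloudWatchMetrics": [{
                // Placeholder namespace and dimension names
                "Namespace": "MyRustLambda",
                "Dimensions": [["FunctionName"]],
                "Metrics": [{ "Name": "ColdStart", "Unit": "Count" }]
            }]
        },
        "FunctionName": function_name,
        "ColdStart": cold_start
    });
    // EMF documents must be written to stdout/stderr as a single JSON line
    println!("{}", emf);
}
```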
## Error Tracking
### Capturing Error Context
```rust
use tracing::error;
use thiserror::Error;
#[derive(Error, Debug)]
enum LambdaError {
#[error("Database error: {0}")]
Database(#[from] sqlx::Error),
#[error("External API error: {status}, {message}")]
ExternalApi { status: u16, message: String },
}
async fn function_handler(event: LambdaEvent<Request>) -> Result<Response, Error> {
match process_request(&event.payload).await {
Ok(result) => {
info!("Request processed successfully");
Ok(Response { result })
}
Err(e) => {
error!(
error = %e,
error_type = std::any::type_name_of_val(&e),
request_id = %event.context.request_id,
"Request failed"
);
// Optionally send to error tracking service
send_to_sentry(&e, &event.context).await;
Err(e.into())
}
}
}
```
## Performance Monitoring
### Measure Operation Duration
```rust
use std::time::Instant;
use tracing::{info, instrument};
#[instrument]
async fn expensive_operation() -> Result<Data, Error> {
let start = Instant::now();
let result = do_work().await?;
let duration = start.elapsed();
info!(duration_ms = duration.as_millis(), "Operation completed");
Ok(result)
}
```
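If many operations need timing, a small RAII guard keeps the `Instant` bookkeeping in one place. This is a generic sketch built only on `std` and `tracing`; the `OpTimer` name is made up for illustration:
```rust
use std::time::Instant;
use tracing::info;

/// Logs how long a scope took when the guard is dropped.
struct OpTimer {
    name: &'static str,
    start: Instant,
}

impl OpTimer {
    fn new(name: &'static str) -> Self {
        Self { name, start: Instant::now() }
    }
}

impl Drop for OpTimer {
    fn drop(&mut self) {
        info!(
            operation = self.name,
            duration_ms = self.start.elapsed().as_millis() as u64,
            "Operation completed"
        );
    }
}

async fn expensive_operation() -> Result<Data, Error> {
    let _timer = OpTimer::new("expensive_operation"); // logs on scope exit
    do_work().await
}
```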
### Automatic Instrumentation
```rust
use tracing::instrument;
// Automatically creates span and logs entry/exit
#[instrument(
skip(db), // Don't log entire db object
fields(
user_id = %user_id,
operation = "fetch_user"
),
err // Log errors automatically
)]
async fn fetch_user(db: &Database, user_id: &str) -> Result<User, Error> {
db.get_user(user_id).await
}
```
## Observability Checklist
- [ ] Enable X-Ray or OpenTelemetry tracing
- [ ] Use structured logging (JSON format)
- [ ] Add span instrumentation to key functions
- [ ] Track cold vs warm starts
- [ ] Monitor error rates and types
- [ ] Measure operation durations
- [ ] Set up CloudWatch Logs Insights queries
- [ ] Configure alerts for errors and latency
- [ ] Track custom business metrics
- [ ] Propagate trace context across services
- [ ] Set appropriate log retention
- [ ] Use log levels correctly (debug, info, warn, error)
## Recommended Stack
**For AWS-only**:
- X-Ray for tracing
- CloudWatch Logs with structured JSON
- CloudWatch Insights for queries
- xray-lite for Rust integration
**For multi-cloud/vendor-neutral**:
- OpenTelemetry for tracing
- OTLP exporter to your backend
- lambda-otel-lite for Lambda optimization
- tracing crate for structured logging
**For Datadog users**:
- Datadog Lambda Extension
- DD_TRACE_ENABLED for automatic tracing
- JSON structured logging
- Custom metrics via DogStatsD
## Dependencies
```toml
[dependencies]
# Basic tracing
tracing = { version = "0.1", features = ["log"] }
tracing-subscriber = { version = "0.3", features = ["env-filter", "json"] }
# X-Ray
xray-lite = "0.1"
xray-lite-aws-sdk = "0.1"
# OpenTelemetry
lambda-otel-lite = "0.1"
opentelemetry = "0.22"
opentelemetry-otlp = "0.15"
opentelemetry_sdk = "0.22"
tracing-opentelemetry = "0.23"
# AWS SDK (for tracing AWS calls)
aws-config = "1"
aws-sdk-dynamodb = "1" # or other services
```
Guide the user through setting up observability appropriate for their needs and monitoring backend.