---
description: Set up advanced observability for Rust Lambda with OpenTelemetry, X-Ray, and structured logging
---

You are helping the user implement comprehensive observability for their Rust Lambda functions.

## Your Task

Guide the user through setting up production-grade observability, including distributed tracing, metrics, and structured logging.

## Observability Stack Options

### Option 1: AWS X-Ray (Native AWS Solution)

**Best for**:
- AWS-native monitoring
- Quick setup
- CloudWatch integration
- Basic distributed tracing needs

#### Enable X-Ray in Lambda

**Via cargo-lambda:**

```bash
cargo lambda deploy --tracing active
```

**Via SAM template:**

```yaml
Resources:
  MyFunction:
    Type: AWS::Serverless::Function
    Properties:
      Tracing: Active  # Enable X-Ray
```

**Via Terraform:**

```hcl
resource "aws_lambda_function" "function" {
  # ... other config ...

  tracing_config {
    mode = "Active"
  }
}
```

#### X-Ray with xray-lite

With active tracing enabled, Lambda creates the parent segment for each invocation. Use xray-lite to record subsegments for downstream AWS SDK calls.

Add to `Cargo.toml`:

```toml
[dependencies]
xray-lite = "0.1"
xray-lite-aws-sdk = "0.1"
aws-config = "1"
aws-sdk-dynamodb = "1"  # or other AWS services
```

Basic usage:

```rust
use aws_config::BehaviorVersion;
use aws_sdk_dynamodb::types::AttributeValue;
use lambda_runtime::{run, service_fn, Error, LambdaEvent};
use serde::Serialize;
use serde_json::Value;
use xray_lite::{DaemonClient, SubsegmentContext};
use xray_lite_aws_sdk::ContextExt as _;

#[derive(Serialize)]
struct Response {
    found: bool,
}

async fn function_handler(_event: LambdaEvent<Value>) -> Result<Response, Error> {
    // Lambda creates the parent segment. xray-lite reads the daemon address and the
    // current trace header (_X_AMZN_TRACE_ID) from the environment.
    // Names follow the xray-lite READMEs; verify them against the versions you install.
    let client = DaemonClient::from_lambda_env()?;
    let context = SubsegmentContext::from_lambda_env(client)?;

    // In production, build the SDK client once in main() and reuse it
    let config = aws_config::load_defaults(BehaviorVersion::latest()).await;
    let dynamodb = aws_sdk_dynamodb::Client::new(&config);

    // The interceptor records this DynamoDB call as an X-Ray subsegment
    let result = dynamodb
        .get_item()
        .table_name("MyTable")
        .key("id", AttributeValue::S("123".to_string()))
        .customize()
        .interceptor(context.intercept())
        .send()
        .await?;

    Ok(Response { found: result.item().is_some() })
}

#[tokio::main]
async fn main() -> Result<(), Error> {
    run(service_fn(function_handler)).await
}
```

### Option 2: OpenTelemetry (Vendor-Neutral)

**Best for**:
- Multi-vendor monitoring
- Portability across platforms
- Advanced telemetry needs
- Custom metrics and traces

#### Set Up OpenTelemetry

Add to `Cargo.toml`:

```toml
[dependencies]
lambda_runtime = "0.13"
lambda-otel-lite = "0.1"  # Lightweight OpenTelemetry for Lambda
opentelemetry = "0.22"
opentelemetry-otlp = "0.15"
opentelemetry_sdk = "0.22"
tracing = "0.1"
tracing-opentelemetry = "0.23"
tracing-subscriber = { version = "0.3", features = ["env-filter", "json"] }
```

#### Basic OpenTelemetry Setup

```rust
use lambda_otel_lite::HttpTracerProviderBuilder;
use lambda_runtime::{run, service_fn, Error, LambdaEvent};
use opentelemetry::trace::TracerProvider;
use tracing::{info, instrument};
use tracing_subscriber::layer::SubscriberExt;

#[tokio::main]
async fn main() -> Result<(), Error> {
    // Initialize OpenTelemetry. Builder method names differ between helper
    // crates and versions; check the docs of the version you pin.
    let tracer_provider = HttpTracerProviderBuilder::default()
        .with_default_text_map_propagator()
        .with_stdout_client() // For testing; use OTLP for production
        .build()?;

    let tracer = tracer_provider.tracer("my-rust-lambda");

    // Bridge `tracing` spans into OpenTelemetry
    let telemetry_layer = tracing_opentelemetry::layer().with_tracer(tracer);

    let subscriber = tracing_subscriber::registry()
        .with(tracing_subscriber::EnvFilter::from_default_env())
        .with(tracing_subscriber::fmt::layer())
        .with(telemetry_layer);

    tracing::subscriber::set_global_default(subscriber)?;

    // Flush spans before each invocation returns: Lambda can freeze the
    // execution environment before a background exporter gets to run.
    run(service_fn(function_handler)).await
}

#[instrument(skip(event))]
async fn function_handler(event: LambdaEvent<Request>) -> Result<Response, Error> {
    info!(request_id = %event.context.request_id, "Processing request");

    let result = process_data(&event.payload).await?;

    Ok(Response { result })
}

// `Request` must implement Debug, because #[instrument] records arguments
#[instrument]
async fn process_data(request: &Request) -> Result<Data, Error> {
    info!("Processing data");
    // Your processing logic; events emitted here are attached to this span
    Ok(Data::new())
}
```

#### OpenTelemetry with OTLP Exporter

For production, export to an observability backend over OTLP:

```rust
use lambda_otel_lite::HttpTracerProviderBuilder;
use opentelemetry_otlp::WithExportConfig;
use std::collections::HashMap;

// The HTTP exporter takes headers as a HashMap<String, String>.
// Load the key from the environment or Secrets Manager instead of hardcoding it.
let headers = HashMap::from([("api-key".to_string(), "your-key".to_string())]);

// Exporter wiring depends on the helper crate's builder API
let tracer_provider = HttpTracerProviderBuilder::default()
    .with_stdout_client()
    .enable_otlp(
        opentelemetry_otlp::new_exporter()
            .http()
            .with_endpoint("https://your-collector:4318")
            .with_headers(headers),
    )?
    .build()?;
```

### Option 3: Datadog Integration

**Best for**:
- Datadog users
- Comprehensive APM
- Log aggregation
- Custom metrics

Add the Datadog Lambda Extension layer and configure JSON logging:

```rust
use lambda_runtime::{run, service_fn, Error, LambdaEvent};
use tracing::{info, instrument};
use tracing_subscriber::EnvFilter;

#[tokio::main]
async fn main() -> Result<(), Error> {
    // JSON format for Datadog log parsing
    tracing_subscriber::fmt()
        .json()
        .with_env_filter(EnvFilter::from_default_env())
        .with_target(false)
        .with_current_span(true) // Include request_id/user_id from the span in each line
        .init();

    run(service_fn(function_handler)).await
}

#[instrument(
    skip(event),
    fields(
        request_id = %event.context.request_id,
        user_id = %event.payload.user_id,
    )
)]
async fn function_handler(event: LambdaEvent<Request>) -> Result<Response, Error> {
    info!("Processing user request");

    // fetch_user_data is application code; its logs carry this span's fields
    let result = fetch_user_data(&event.payload.user_id).await?;

    Ok(Response { result })
}
```

Deploy with the Datadog extension layer. This is the arm64 layer, so build with `cargo lambda build --arm64`:

```bash
# Lambda layer ARNs need a numeric version: replace <VERSION> with a published one.
# In production, prefer DD_API_KEY_SECRET_ARN over a plaintext DD_API_KEY.
cargo lambda deploy \
  --layer arn:aws:lambda:us-east-1:464622532012:layer:Datadog-Extension-ARM:<VERSION> \
  --env-var DD_API_KEY=your-api-key \
  --env-var DD_SITE=datadoghq.com \
  --env-var DD_SERVICE=my-rust-service \
  --env-var DD_ENV=production
```

## Structured Logging Best Practices

### Using tracing with Spans

```rust
use lambda_runtime::{Error, LambdaEvent};
use tracing::{debug, error, info, info_span, instrument, Instrument};

async fn function_handler(event: LambdaEvent<Request>) -> Result<Response, Error> {
    let span = info_span!(
        "process_request",
        request_id = %event.context.request_id,
        user_id = %event.payload.user_id,
    );

    // Attach the span to the future instead of holding `span.enter()` across
    // `.await`, which can attribute events to the wrong span in async code
    async move {
        info!("Starting request processing");

        match process_user(&event.payload.user_id).await {
            Ok(user) => {
                info!(user_name = %user.name, "User processed successfully");
                Ok(Response { user })
            }
            Err(e) => {
                error!(error = %e, "Failed to process user");
                Err(e)
            }
        }
    }
    .instrument(span)
    .await
}

#[instrument(skip_all, fields(user_id = %user_id))]
async fn process_user(user_id: &str) -> Result<User, Error> {
    debug!("Fetching user from database");

    let user = fetch_from_db(user_id).await?;

    info!(email = %user.email, "User fetched");
    Ok(user)
}
```

### JSON Structured Logging

```rust
use lambda_runtime::{run, service_fn, Error};
use tracing_subscriber::EnvFilter;

#[tokio::main]
async fn main() -> Result<(), Error> {
    // JSON output for CloudWatch Logs Insights
    tracing_subscriber::fmt()
        .json()
        .with_env_filter(EnvFilter::from_default_env())
        .with_current_span(true)
        .with_span_list(true)
        .flatten_event(true) // Put `message` and event fields at the top level
        .with_target(false)
        .without_time() // CloudWatch adds a timestamp
        .init();

    run(service_fn(function_handler)).await
}

// Each log line is a JSON object, roughly (key order may differ):
// {"level":"INFO","message":"Processing request","span":{"name":"process_request","request_id":"abc123","user_id":"user456"},"spans":[...]}
```

### Custom Metrics with OpenTelemetry

```rust
use lambda_runtime::{run, service_fn, Error, LambdaEvent};
use opentelemetry::metrics::{Counter, Histogram};
use opentelemetry::{global, KeyValue};
use std::time::Instant;

struct Metrics {
    request_counter: Counter<u64>,
    duration_histogram: Histogram<f64>,
}

impl Metrics {
    // Uses the global MeterProvider: until one is installed (and flushed at the
    // end of each invocation), these instruments record nothing
    fn new() -> Self {
        let meter = global::meter("my-lambda");
        Self {
            request_counter: meter.u64_counter("requests").init(),
            duration_histogram: meter.f64_histogram("request_duration_seconds").init(),
        }
    }
}

async fn function_handler(metrics: &Metrics, event: LambdaEvent<Request>) -> Result<Response, Error> {
    let start = Instant::now();

    // Increment counter
    metrics.request_counter.add(
        1,
        &[KeyValue::new("function", "my-lambda"), KeyValue::new("region", "us-east-1")],
    );

    let result = process_request(&event.payload).await;

    // Record duration for both outcomes, so failures are not missing from the histogram
    let status = if result.is_ok() { "success" } else { "error" };
    metrics
        .duration_histogram
        .record(start.elapsed().as_secs_f64(), &[KeyValue::new("status", status)]);

    result
}

#[tokio::main]
async fn main() -> Result<(), Error> {
    let metrics = Metrics::new();
    let metrics = &metrics;
    run(service_fn(move |event: LambdaEvent<Request>| function_handler(metrics, event))).await
}
```

## CloudWatch Logs Insights Queries

With structured logging, you can query fields directly. Field names must match what your subscriber emits. With the JSON setup above, levels are uppercase (`INFO`, `ERROR`) and span fields are nested under `span` (for example `span.user_id`). Run each query separately:

```
# Find errors for a specific user
fields @timestamp, message, error
| filter span.user_id = "user456"
| filter level = "ERROR"
| sort @timestamp desc

# Calculate p95 latency
fields duration_ms
| stats percentile(duration_ms, 95) as p95_latency by bin(5m)

# Count requests by status
fields @timestamp
| filter message = "Request completed"
| stats count() by status
```

## Distributed Tracing Pattern

For Lambda functions or services that call each other, extract the incoming trace context and inject it into outgoing requests:

```rust
use aws_lambda_events::event::apigw::ApiGatewayProxyRequest;
use lambda_runtime::{Error, LambdaEvent};
use opentelemetry::global;
use opentelemetry::trace::{SpanKind, TraceContextExt, Tracer};
use opentelemetry::Context;
use opentelemetry_http::{HeaderExtractor, HeaderInjector};

async fn function_handler(event: LambdaEvent<ApiGatewayProxyRequest>) -> Result<Response, Error> {
    // Assumes a W3C propagator was registered at startup, e.g.
    // global::set_text_map_propagator(opentelemetry_sdk::propagation::TraceContextPropagator::new())
    let tracer = global::tracer("my-service");

    // Extract trace context from the incoming request headers
    let parent_cx = global::get_text_map_propagator(|propagator| {
        propagator.extract(&HeaderExtractor(&event.payload.headers))
    });

    // Create a server span as a child of the caller's span
    let span = tracer
        .span_builder("handle_request")
        .with_kind(SpanKind::Server)
        .start_with_context(&tracer, &parent_cx);
    let cx = Context::current_with_span(span);

    // Inject this span's context (the traceparent header) into the outgoing request.
    // opentelemetry-http, aws_lambda_events and reqwest must share an `http` crate version.
    let mut headers = reqwest::header::HeaderMap::new();
    global::get_text_map_propagator(|propagator| {
        propagator.inject_context(&cx, &mut HeaderInjector(&mut headers))
    });

    // Call the downstream service with the trace context
    let client = reqwest::Client::new();
    let response = client
        .get("https://downstream-service.com/api")
        .headers(headers)
        .send()
        .await?;

    Ok(Response { data: response.text().await? })
}
```

## AWS ADOT Lambda Layer

The AWS Distro for OpenTelemetry (ADOT) collector layer runs the OpenTelemetry Collector as a Lambda extension. There is no ADOT auto-instrumentation for Rust, so instrument your code manually as shown above and export OTLP to the collector:

```bash
# Collector-only layer. AWS_LAMBDA_EXEC_WRAPPER=/opt/otel-instrument applies only to
# the language auto-instrumentation layers, so it is not set here.
# collector.yaml must be packaged with the function.
cargo lambda deploy \
  --layer arn:aws:lambda:us-east-1:901920570463:layer:aws-otel-collector-arm64-ver-0-90-1:1 \
  --env-var OPENTELEMETRY_COLLECTOR_CONFIG_FILE=/var/task/collector.yaml
```

## Cold Start Monitoring

Track cold starts versus warm starts. A static flag persists across invocations in the same execution environment, so only the first invocation sees `true`:

```rust
use lambda_runtime::{Error, LambdaEvent};
use std::sync::atomic::{AtomicBool, Ordering};
use tracing::info;

static COLD_START: AtomicBool = AtomicBool::new(true);

async fn function_handler(event: LambdaEvent<Request>) -> Result<Response, Error> {
    // swap() returns the previous value and clears the flag for later invocations
    let is_cold_start = COLD_START.swap(false, Ordering::Relaxed);

    info!(
        cold_start = is_cold_start,
        request_id = %event.context.request_id,
        "Lambda invocation"
    );

    // Process request...

    Ok(Response {})
}
```

## Error Tracking

### Capturing Error Context

```rust
use lambda_runtime::LambdaEvent;
use thiserror::Error;
use tracing::{error, info};

#[derive(Error, Debug)]
enum LambdaError {
    #[error("Database error: {0}")]
    Database(#[from] sqlx::Error),

    #[error("External API error: {status}, {message}")]
    ExternalApi { status: u16, message: String },
}

impl LambdaError {
    // Low-cardinality label for dashboards and alerts. type_name_of_val would only
    // return the enum's type name, not the variant.
    fn kind(&self) -> &'static str {
        match self {
            LambdaError::Database(_) => "database",
            LambdaError::ExternalApi { .. } => "external_api",
        }
    }
}

async fn function_handler(event: LambdaEvent<Request>) -> Result<Response, lambda_runtime::Error> {
    match process_request(&event.payload).await {
        Ok(result) => {
            info!("Request processed successfully");
            Ok(Response { result })
        }
        Err(e) => {
            error!(
                error = %e,
                error_kind = e.kind(),
                request_id = %event.context.request_id,
                "Request failed"
            );

            // Optionally send to an error tracking service (send_to_sentry is a hypothetical helper)
            send_to_sentry(&e, &event.context).await;

            Err(e.into())
        }
    }
}
```

## Performance Monitoring

### Measure Operation Duration

```rust
use std::time::Instant;
use tracing::{info, instrument};

#[instrument]
async fn expensive_operation() -> Result<Output, Error> {
    let start = Instant::now();

    let result = do_work().await?;

    let duration = start.elapsed();
    // as_millis() returns u128; u64 is ample for a duration in milliseconds
    info!(duration_ms = duration.as_millis() as u64, "Operation completed");

    Ok(result)
}
```

### Automatic Instrumentation

```rust
use tracing::instrument;

// Creates a span for every call, with user_id and operation as fields.
// `err` also emits an ERROR event whenever the function returns Err.
// Database, User and DbError are application types (DbError must implement Display).
#[instrument(
    skip(db), // Don't record the database handle
    fields(
        user_id = %user_id,
        operation = "fetch_user"
    ),
    err
)]
async fn fetch_user(db: &Database, user_id: &str) -> Result<User, DbError> {
    db.get_user(user_id).await
}
```

## Observability Checklist

- [ ] Enable X-Ray or OpenTelemetry tracing
- [ ] Use structured logging (JSON format)
- [ ] Add span instrumentation to key functions
- [ ] Track cold vs warm starts
- [ ] Monitor error rates and types
- [ ] Measure operation durations
- [ ] Set up CloudWatch Logs Insights queries
- [ ] Configure alerts for errors and latency
- [ ] Track custom business metrics
- [ ] Propagate trace context across services
- [ ] Set appropriate log retention
- [ ] Use log levels correctly (debug, info, warn, error)

## Recommended Stack

**For AWS-only**:
- X-Ray for tracing
- CloudWatch Logs with structured JSON
- CloudWatch Logs Insights for queries
- xray-lite for Rust integration

**For multi-cloud/vendor-neutral**:
- OpenTelemetry for tracing
- OTLP exporter to your backend
- lambda-otel-lite for Lambda optimization
- tracing crate for structured logging

**For Datadog users**:
- Datadog Lambda Extension
- DD_TRACE_ENABLED to enable trace collection
- JSON structured logging
- Custom metrics via DogStatsD

## Dependencies

```toml
[dependencies]
# Basic tracing
tracing = { version = "0.1", features = ["log"] }
tracing-subscriber = { version = "0.3", features = ["env-filter", "json"] }

# X-Ray
xray-lite = "0.1"
xray-lite-aws-sdk = "0.1"

# OpenTelemetry
lambda-otel-lite = "0.1"
opentelemetry = "0.22"
opentelemetry-otlp = "0.15"
opentelemetry_sdk = "0.22"
opentelemetry-http = "0.11"  # HeaderExtractor/HeaderInjector for context propagation
tracing-opentelemetry = "0.23"

# AWS SDK (for tracing AWS calls)
aws-config = "1"
aws-sdk-dynamodb = "1"  # or other services
```

Guide the user through setting up observability appropriate for their needs and monitoring backend.
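
## Inspecting `traceparent` Headers

The distributed tracing pattern above carries trace context in the W3C Trace Context `traceparent` header. That header has the form `version-traceid-parentid-flags`, for example `00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01`.

When a trace breaks between two functions, it can help to log exactly what the callee received and check it. The sketch below is a hypothetical, std-only helper. The names `TraceParent`, `parse` and `to_header_for_child` are illustrative and are not part of any OpenTelemetry crate. It validates only the version `00` layout. In application code, the OpenTelemetry propagator should still perform the actual extraction and injection.

```rust
/// Parsed form of a W3C Trace Context `traceparent` header (version 00 only).
/// Hypothetical debugging helper, not an OpenTelemetry API.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TraceParent {
    pub trace_id: u128,
    pub parent_id: u64,
    pub flags: u8,
}

/// The spec only allows lowercase hex digits in traceparent fields.
fn is_lower_hex(s: &str) -> bool {
    s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
}

impl TraceParent {
    /// Parses `00-<32 hex trace id>-<16 hex parent id>-<2 hex flags>`.
    /// Returns None for malformed headers and for the all-zero IDs the spec forbids.
    pub fn parse(header: &str) -> Option<Self> {
        let parts: Vec<&str> = header.trim().split('-').collect();
        if parts.len() != 4 || parts[0] != "00" {
            return None;
        }
        let (trace, parent, flags) = (parts[1], parts[2], parts[3]);
        if trace.len() != 32 || parent.len() != 16 || flags.len() != 2 {
            return None;
        }
        if !(is_lower_hex(trace) && is_lower_hex(parent) && is_lower_hex(flags)) {
            return None;
        }
        let trace_id = u128::from_str_radix(trace, 16).ok()?;
        let parent_id = u64::from_str_radix(parent, 16).ok()?;
        let flags = u8::from_str_radix(flags, 16).ok()?;
        if trace_id == 0 || parent_id == 0 {
            return None;
        }
        Some(TraceParent { trace_id, parent_id, flags })
    }

    /// Bit 0x01 of the flags is the "sampled" flag set by the caller.
    pub fn is_sampled(&self) -> bool {
        self.flags & 0x01 == 0x01
    }

    /// Header for an outgoing call made from span `span_id`:
    /// same trace ID and flags, new parent ID.
    pub fn to_header_for_child(&self, span_id: u64) -> String {
        format!("00-{:032x}-{:016x}-{:02x}", self.trace_id, span_id, self.flags)
    }
}

fn main() {
    let incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01";
    let tp = TraceParent::parse(incoming).expect("valid traceparent");
    println!("sampled = {}", tp.is_sampled());
    println!("{}", tp.to_header_for_child(0x00f067aa0ba902b8));
}
```

A caller that did not sample the trace sends flags `00`, so `is_sampled` returns `false`. Uppercase hex and all-zero trace or parent IDs are invalid under the spec, which is why `parse` rejects them rather than guessing.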