Files
gh-greyhaven-ai-claude-code…/skills/observability-monitoring/SKILL.md
2025-11-29 18:29:23 +08:00

11 KiB

name, description
name description
observability-monitoring Implement observability and monitoring using Cloudflare Workers Analytics, wrangler tail for logs, and health checks. Use when setting up monitoring, implementing logging, configuring alerts, or debugging production issues.

Grey Haven Observability and Monitoring

Implement comprehensive monitoring for Grey Haven applications using Cloudflare Workers built-in observability tools.

Observability Stack

Grey Haven Monitoring Architecture

  • Logging: Cloudflare Workers logs + wrangler tail
  • Metrics: Cloudflare Workers Analytics dashboard
  • Custom Events: Cloudflare Analytics Engine
  • Health Checks: Cloudflare Health Checks for endpoint availability
  • Error Tracking: Console errors visible in Cloudflare dashboard

Cloudflare Workers Logging

Console Logging in Workers

// app/utils/logger.ts
export interface LogEvent {
  level: "debug" | "info" | "warn" | "error";
  message: string;
  context?: Record<string, unknown>;
  userId?: string;
  tenantId?: string;
  requestId?: string;
  duration?: number;
}

export function log(event: LogEvent) {
  const logData = {
    timestamp: new Date().toISOString(),
    level: event.level,
    message: event.message,
    environment: process.env.ENVIRONMENT,
    user_id: event.userId,
    tenant_id: event.tenantId,
    request_id: event.requestId,
    duration_ms: event.duration,
    ...event.context,
  };

  // Structured console logging (visible in Cloudflare dashboard)
  console[event.level](JSON.stringify(logData));
}

// Convenience methods
export const logger = {
  debug: (message: string, context?: Record<string, unknown>) =>
    log({ level: "debug", message, context }),
  info: (message: string, context?: Record<string, unknown>) =>
    log({ level: "info", message, context }),
  warn: (message: string, context?: Record<string, unknown>) =>
    log({ level: "warn", message, context }),
  error: (message: string, context?: Record<string, unknown>) =>
    log({ level: "error", message, context }),
};

Logging Middleware

// app/middleware/logging.ts
import { logger } from "~/utils/logger";
import { v4 as uuidv4 } from "uuid";

export async function loggingMiddleware(
  request: Request,
  next: () => Promise<Response>
) {
  const requestId = uuidv4();
  const startTime = Date.now();

  try {
    const response = await next();
    const duration = Date.now() - startTime;

    logger.info("Request completed", {
      request_id: requestId,
      method: request.method,
      url: request.url,
      status: response.status,
      duration_ms: duration,
    });

    return response;
  } catch (error) {
    const duration = Date.now() - startTime;

    logger.error("Request failed", {
      request_id: requestId,
      method: request.method,
      url: request.url,
      error: error.message,
      stack: error.stack,
      duration_ms: duration,
    });

    throw error;
  }
}

Cloudflare Workers Analytics

Workers Analytics Dashboard

Access metrics at: https://dash.cloudflare.com → Workers → Analytics

Key Metrics:

  • Request rate (requests/second)
  • CPU time (milliseconds)
  • Error rate (%)
  • Success rate (%)
  • Response time (P50, P95, P99)
  • Invocations per day
  • GB-seconds (compute usage)

Wrangler Tail (Real-time Logs)

# Stream production logs
npx wrangler tail --config wrangler.production.toml

# Filter by status code
npx wrangler tail --status error --config wrangler.production.toml

# Filter by method
npx wrangler tail --method POST --config wrangler.production.toml

# Filter by IP address
npx wrangler tail --ip 1.2.3.4 --config wrangler.production.toml

# Output to file
npx wrangler tail --config wrangler.production.toml > logs.txt

Accessing Logs in Cloudflare Dashboard

  1. Go to https://dash.cloudflare.com
  2. Navigate to Workers & Pages
  3. Select your Worker
  4. Click "Logs" tab
  5. View real-time logs with filtering

Log Features:

  • Real-time streaming
  • Filter by status code
  • Filter by request method
  • Search log content
  • Export logs (JSON)

Analytics Engine (Custom Events)

Setup Analytics Engine

wrangler.toml:

[[analytics_engine_datasets]]
binding = "ANALYTICS"

Track Custom Events

// app/utils/analytics.ts
export async function trackEvent(
  env: Env,
  eventName: string,
  data: {
    user_id?: string;
    tenant_id?: string;
    duration_ms?: number;
    [key: string]: string | number | undefined;
  }
) {
  try {
    await env.ANALYTICS.writeDataPoint({
      blobs: [eventName],
      doubles: [data.duration_ms || 0],
      indexes: [data.user_id || "", data.tenant_id || ""],
    });
  } catch (error) {
    console.error("Failed to track event:", error);
  }
}

// Usage in server function
export const loginUser = createServerFn({ method: "POST" }).handler(
  async ({ data, context }) => {
    const startTime = Date.now();
    const user = await authenticateUser(data);
    const duration = Date.now() - startTime;

    // Track login event
    await trackEvent(context.env, "user_login", {
      user_id: user.id,
      tenant_id: user.tenantId,
      duration_ms: duration,
    });

    return user;
  }
);

Query Analytics Data

Use Cloudflare GraphQL API:

query GetLoginStats {
  viewer {
    accounts(filter: { accountTag: $accountId }) {
      workersAnalyticsEngineDataset(dataset: "my_analytics") {
        query(
          filter: {
            blob1: "user_login"
            datetime_gt: "2025-01-01T00:00:00Z"
          }
        ) {
          count
          dimensions {
            blob1  # event name
            index1 # user_id
            index2 # tenant_id
          }
        }
      }
    }
  }
}

Health Checks

Health Check Endpoint

// app/routes/api/health.ts
import { createServerFn } from "@tanstack/start";
import { db } from "~/lib/server/db";

export const GET = createServerFn({ method: "GET" }).handler(async ({ context }) => {
  const startTime = Date.now();
  const checks: Record<string, string> = {};

  // Check database
  let dbHealthy = false;
  try {
    await db.execute("SELECT 1");
    dbHealthy = true;
    checks.database = "ok";
  } catch (error) {
    console.error("Database health check failed:", error);
    checks.database = "failed";
  }

  // Check Redis (if using Upstash)
  let redisHealthy = false;
  if (context.env.REDIS) {
    try {
      await context.env.REDIS.ping();
      redisHealthy = true;
      checks.redis = "ok";
    } catch (error) {
      console.error("Redis health check failed:", error);
      checks.redis = "failed";
    }
  }

  const duration = Date.now() - startTime;
  const healthy = dbHealthy && (!context.env.REDIS || redisHealthy);

  return new Response(
    JSON.stringify({
      status: healthy ? "healthy" : "unhealthy",
      checks,
      duration_ms: duration,
      timestamp: new Date().toISOString(),
      environment: process.env.ENVIRONMENT,
    }),
    {
      status: healthy ? 200 : 503,
      headers: { "Content-Type": "application/json" },
    }
  );
});

Cloudflare Health Checks

Configure in Cloudflare dashboard:

  1. Go to Traffic → Health Checks
  2. Create health check for /api/health
  3. Configure:
    • Interval: 60 seconds
    • Timeout: 5 seconds
    • Retries: 2
    • Expected status: 200
  4. Set up notifications (email/webhook)

Error Tracking

Structured Error Logging

// app/utils/error-handler.ts
import { logger } from "~/utils/logger";

export function handleError(error: Error, context?: Record<string, unknown>) {
  // Log error with full context
  logger.error(error.message, {
    error_name: error.name,
    stack: error.stack,
    ...context,
  });

  // Also log to Analytics Engine for tracking
  if (context?.env) {
    trackEvent(context.env as Env, "error_occurred", {
      error_name: error.name,
      error_message: error.message,
    });
  }
}

// Usage in server function
export const updateUser = createServerFn({ method: "POST" }).handler(
  async ({ data, context }) => {
    try {
      return await userService.update(data);
    } catch (error) {
      handleError(error, {
        user_id: context.user?.id,
        tenant_id: context.tenant?.id,
        env: context.env,
      });
      throw error;
    }
  }
);

Viewing Errors in Cloudflare

  1. Workers Dashboard: View errors in real-time
  2. Wrangler Tail: npx wrangler tail --status error
  3. Analytics: Check error rate metrics
  4. Health Checks: Monitor endpoint failures

Supporting Documentation

All supporting files are under 500 lines per Anthropic best practices:

When to Apply This Skill

Use this skill when:

  • Setting up monitoring for new Cloudflare Workers projects
  • Implementing structured logging with console
  • Debugging production issues with wrangler tail
  • Setting up health checks
  • Implementing custom metrics tracking with Analytics Engine
  • Configuring Cloudflare alerts

Template Reference

These patterns are from Grey Haven's production monitoring:

  • Cloudflare Workers Analytics: Request and performance metrics
  • Wrangler tail: Real-time log streaming
  • Console logging: Structured JSON logs
  • Analytics Engine: Custom event tracking

Critical Reminders

  1. Structured logging: Use JSON.stringify for console logs
  2. Request IDs: Track requests with UUIDs for debugging
  3. Error context: Include tenant_id, user_id in all error logs
  4. Health checks: Monitor database and external service connections
  5. Wrangler tail: Use filters to narrow down logs (--status, --method)
  6. Performance: Track duration_ms for all operations
  7. Environment: Log environment in all messages for filtering
  8. Analytics Engine: Use for custom metrics and event tracking
  9. Dashboard access: Logs available in Cloudflare Workers dashboard
  10. Real-time debugging: Use wrangler tail for live production debugging