Files
gh-greyhaven-ai-claude-code…/skills/observability-monitoring/SKILL.md
2025-11-29 18:29:23 +08:00

414 lines
11 KiB
Markdown

---
name: observability-monitoring
description: Implement observability and monitoring using Cloudflare Workers Analytics, wrangler tail for logs, and health checks. Use when setting up monitoring, implementing logging, configuring alerts, or debugging production issues.
---
# Grey Haven Observability and Monitoring
Implement comprehensive monitoring for Grey Haven applications using **Cloudflare Workers** built-in observability tools.
## Observability Stack
### Grey Haven Monitoring Architecture
- **Logging**: Cloudflare Workers logs + wrangler tail
- **Metrics**: Cloudflare Workers Analytics dashboard
- **Custom Events**: Cloudflare Analytics Engine
- **Health Checks**: Cloudflare Health Checks for endpoint availability
- **Error Tracking**: Console errors visible in Cloudflare dashboard
## Cloudflare Workers Logging
### Console Logging in Workers
```typescript
// app/utils/logger.ts
export interface LogEvent {
level: "debug" | "info" | "warn" | "error";
message: string;
context?: Record<string, unknown>;
userId?: string;
tenantId?: string;
requestId?: string;
duration?: number;
}
export function log(event: LogEvent) {
const logData = {
timestamp: new Date().toISOString(),
level: event.level,
message: event.message,
environment: process.env.ENVIRONMENT,
user_id: event.userId,
tenant_id: event.tenantId,
request_id: event.requestId,
duration_ms: event.duration,
...event.context,
};
// Structured console logging (visible in Cloudflare dashboard)
console[event.level](JSON.stringify(logData));
}
// Convenience methods
export const logger = {
debug: (message: string, context?: Record<string, unknown>) =>
log({ level: "debug", message, context }),
info: (message: string, context?: Record<string, unknown>) =>
log({ level: "info", message, context }),
warn: (message: string, context?: Record<string, unknown>) =>
log({ level: "warn", message, context }),
error: (message: string, context?: Record<string, unknown>) =>
log({ level: "error", message, context }),
};
```
### Logging Middleware
```typescript
// app/middleware/logging.ts
import { logger } from "~/utils/logger";
import { v4 as uuidv4 } from "uuid";
export async function loggingMiddleware(
request: Request,
next: () => Promise<Response>
) {
const requestId = uuidv4();
const startTime = Date.now();
try {
const response = await next();
const duration = Date.now() - startTime;
logger.info("Request completed", {
request_id: requestId,
method: request.method,
url: request.url,
status: response.status,
duration_ms: duration,
});
return response;
} catch (error) {
const duration = Date.now() - startTime;
logger.error("Request failed", {
request_id: requestId,
method: request.method,
url: request.url,
error: error.message,
stack: error.stack,
duration_ms: duration,
});
throw error;
}
}
```
## Cloudflare Workers Analytics
### Workers Analytics Dashboard
Access metrics at: `https://dash.cloudflare.com → Workers → Analytics`
**Key Metrics**:
- Request rate (requests/second)
- CPU time (milliseconds)
- Error rate (%)
- Success rate (%)
- Response time (P50, P95, P99)
- Invocations per day
- GB-seconds (compute usage)
### Wrangler Tail (Real-time Logs)
```bash
# Stream production logs
npx wrangler tail --config wrangler.production.toml
# Filter by status code
npx wrangler tail --status error --config wrangler.production.toml
# Filter by method
npx wrangler tail --method POST --config wrangler.production.toml
# Filter by IP address
npx wrangler tail --ip 1.2.3.4 --config wrangler.production.toml
# Output to file
npx wrangler tail --config wrangler.production.toml > logs.txt
```
### Accessing Logs in Cloudflare Dashboard
1. Go to `https://dash.cloudflare.com`
2. Navigate to Workers & Pages
3. Select your Worker
4. Click "Logs" tab
5. View real-time logs with filtering
**Log Features**:
- Real-time streaming
- Filter by status code
- Filter by request method
- Search log content
- Export logs (JSON)
## Analytics Engine (Custom Events)
### Setup Analytics Engine
**wrangler.toml**:
```toml
[[analytics_engine_datasets]]
binding = "ANALYTICS"
```
### Track Custom Events
```typescript
// app/utils/analytics.ts
export async function trackEvent(
env: Env,
eventName: string,
data: {
user_id?: string;
tenant_id?: string;
duration_ms?: number;
[key: string]: string | number | undefined;
}
) {
try {
await env.ANALYTICS.writeDataPoint({
blobs: [eventName],
doubles: [data.duration_ms || 0],
indexes: [data.user_id || "", data.tenant_id || ""],
});
} catch (error) {
console.error("Failed to track event:", error);
}
}
// Usage in server function
export const loginUser = createServerFn({ method: "POST" }).handler(
async ({ data, context }) => {
const startTime = Date.now();
const user = await authenticateUser(data);
const duration = Date.now() - startTime;
// Track login event
await trackEvent(context.env, "user_login", {
user_id: user.id,
tenant_id: user.tenantId,
duration_ms: duration,
});
return user;
}
);
```
### Query Analytics Data
Use Cloudflare GraphQL API:
```graphql
query GetLoginStats {
viewer {
accounts(filter: { accountTag: $accountId }) {
workersAnalyticsEngineDataset(dataset: "my_analytics") {
query(
filter: {
blob1: "user_login"
datetime_gt: "2025-01-01T00:00:00Z"
}
) {
count
dimensions {
blob1 # event name
index1 # user_id
index2 # tenant_id
}
}
}
}
}
}
```
## Health Checks
### Health Check Endpoint
```typescript
// app/routes/api/health.ts
import { createServerFn } from "@tanstack/start";
import { db } from "~/lib/server/db";
export const GET = createServerFn({ method: "GET" }).handler(async ({ context }) => {
const startTime = Date.now();
const checks: Record<string, string> = {};
// Check database
let dbHealthy = false;
try {
await db.execute("SELECT 1");
dbHealthy = true;
checks.database = "ok";
} catch (error) {
console.error("Database health check failed:", error);
checks.database = "failed";
}
// Check Redis (if using Upstash)
let redisHealthy = false;
if (context.env.REDIS) {
try {
await context.env.REDIS.ping();
redisHealthy = true;
checks.redis = "ok";
} catch (error) {
console.error("Redis health check failed:", error);
checks.redis = "failed";
}
}
const duration = Date.now() - startTime;
const healthy = dbHealthy && (!context.env.REDIS || redisHealthy);
return new Response(
JSON.stringify({
status: healthy ? "healthy" : "unhealthy",
checks,
duration_ms: duration,
timestamp: new Date().toISOString(),
environment: process.env.ENVIRONMENT,
}),
{
status: healthy ? 200 : 503,
headers: { "Content-Type": "application/json" },
}
);
});
```
### Cloudflare Health Checks
Configure in Cloudflare dashboard:
1. Go to Traffic → Health Checks
2. Create health check for `/api/health`
3. Configure:
- Interval: 60 seconds
- Timeout: 5 seconds
- Retries: 2
- Expected status: 200
4. Set up notifications (email/webhook)
## Error Tracking
### Structured Error Logging
```typescript
// app/utils/error-handler.ts
import { logger } from "~/utils/logger";
export function handleError(error: Error, context?: Record<string, unknown>) {
// Log error with full context
logger.error(error.message, {
error_name: error.name,
stack: error.stack,
...context,
});
// Also log to Analytics Engine for tracking
if (context?.env) {
trackEvent(context.env as Env, "error_occurred", {
error_name: error.name,
error_message: error.message,
});
}
}
// Usage in server function
export const updateUser = createServerFn({ method: "POST" }).handler(
async ({ data, context }) => {
try {
return await userService.update(data);
} catch (error) {
handleError(error, {
user_id: context.user?.id,
tenant_id: context.tenant?.id,
env: context.env,
});
throw error;
}
}
);
```
### Viewing Errors in Cloudflare
1. **Workers Dashboard**: View errors in real-time
2. **Wrangler Tail**: `npx wrangler tail --status error`
3. **Analytics**: Check error rate metrics
4. **Health Checks**: Monitor endpoint failures
## Supporting Documentation
All supporting files are under 500 lines per Anthropic best practices:
- **[examples/](examples/)** - Complete monitoring examples
- [cloudflare-logging.md](examples/cloudflare-logging.md) - Structured console logging
- [wrangler-tail.md](examples/wrangler-tail.md) - Real-time log streaming
- [analytics-engine.md](examples/analytics-engine.md) - Custom event tracking
- [health-checks.md](examples/health-checks.md) - Health check implementations
- [error-tracking.md](examples/error-tracking.md) - Error handling patterns
- [INDEX.md](examples/INDEX.md) - Examples navigation
- **[reference/](reference/)** - Monitoring references
- [cloudflare-metrics.md](reference/cloudflare-metrics.md) - Available metrics
- [wrangler-commands.md](reference/wrangler-commands.md) - Wrangler CLI reference
- [alert-configuration.md](reference/alert-configuration.md) - Setting up alerts
- [INDEX.md](reference/INDEX.md) - Reference navigation
- **[templates/](templates/)** - Copy-paste ready templates
- [logger.ts](templates/logger.ts) - Cloudflare logger template
- [health-check.ts](templates/health-check.ts) - Health check endpoint
- **[checklists/](checklists/)** - Monitoring checklists
- [observability-setup-checklist.md](checklists/observability-setup-checklist.md) - Setup checklist
## When to Apply This Skill
Use this skill when:
- Setting up monitoring for new Cloudflare Workers projects
- Implementing structured logging with console
- Debugging production issues with wrangler tail
- Setting up health checks
- Implementing custom metrics tracking with Analytics Engine
- Configuring Cloudflare alerts
## Template Reference
These patterns are from Grey Haven's production monitoring:
- **Cloudflare Workers Analytics**: Request and performance metrics
- **Wrangler tail**: Real-time log streaming
- **Console logging**: Structured JSON logs
- **Analytics Engine**: Custom event tracking
## Critical Reminders
1. **Structured logging**: Use JSON.stringify for console logs
2. **Request IDs**: Track requests with UUIDs for debugging
3. **Error context**: Include tenant_id, user_id in all error logs
4. **Health checks**: Monitor database and external service connections
5. **Wrangler tail**: Use filters to narrow down logs (--status, --method)
6. **Performance**: Track duration_ms for all operations
7. **Environment**: Log environment in all messages for filtering
8. **Analytics Engine**: Use for custom metrics and event tracking
9. **Dashboard access**: Logs available in Cloudflare Workers dashboard
10. **Real-time debugging**: Use wrangler tail for live production debugging