Initial commit

This commit is contained in:
Zhongwei Li
2025-11-29 18:29:04 +08:00
commit 62641cca84
9 changed files with 3236 additions and 0 deletions


@@ -0,0 +1,278 @@
---
name: cf-deployment-status
description: Check Cloudflare deployment status across environments, view recent deployments, and monitor CI/CD pipeline health
---
Check the status of Cloudflare Workers and Pages deployments. This command provides a comprehensive view of deployment health across all environments.
## What This Command Does
1. **List Recent Deployments**
- Shows last 10 deployments
- Displays status (success/failure/in-progress)
- Shows deployment duration
- Includes commit SHA and message
2. **GitHub Actions Status**
- Lists recent workflow runs
- Shows current deployment pipeline status
- Identifies failed or stuck workflows
- Displays workflow execution time
3. **Environment Health Check**
- Checks production deployment status
- Verifies staging environment
- Tests preview deployments
- Shows environment-specific metrics
## Usage
```bash
# Basic usage - check all environments
/cf-deployment-status
# Check specific environment
/cf-deployment-status production
# Show last N deployments
/cf-deployment-status --limit 20
# Show failed deployments only
/cf-deployment-status --failed
# Check specific worker
/cf-deployment-status --worker my-worker-name
```
## Implementation
When you use this command, Claude will:
1. **Check Cloudflare Deployments**
```bash
# List deployments via Wrangler
wrangler deployments list --name <worker-name>
# Get deployment details
wrangler deployments view <deployment-id>
```
2. **Check GitHub Actions**
```bash
# List recent workflow runs
gh run list --workflow=deploy.yml --limit=10 --json status,conclusion,createdAt,updatedAt,headSha,headBranch
# Check for failures
gh run list --workflow=deploy.yml --status=failure --limit=5
```
3. **Environment Health**
```bash
# Test production endpoint
curl -f https://production.example.com/health
# Test staging endpoint
curl -f https://staging.example.com/health
```
4. **Generate Report**
```markdown
## Deployment Status Report
**Generated**: 2025-01-15 10:30:00 UTC
### Summary
- Total deployments (24h): 15
- Success rate: 93% (14/15)
- Active failures: 1
- Average duration: 2m 45s
### Environments
#### Production
- Status: ✓ Healthy
- Last deployment: 2 hours ago (abc123)
- Version: v1.2.3
- Health check: ✓ Passing
#### Staging
- Status: ✓ Healthy
- Last deployment: 30 minutes ago (def456)
- Version: v1.2.4-rc.1
- Health check: ✓ Passing
### Recent Deployments
| Time | Environment | Status | Duration | Commit | Triggered By |
|------|-------------|--------|----------|--------|--------------|
| 10:15 | production | ✓ Success | 2m 30s | abc123 | GitHub Actions |
| 10:00 | staging | ✓ Success | 2m 15s | def456 | GitHub Actions |
| 09:45 | staging | ✗ Failed | 1m 05s | ghi789 | Manual |
### Active Issues
1. Staging deployment failed (ghi789)
- Error: Build failed - missing environment variable
- Time: 09:45 UTC
- Duration: 1m 05s
- Recommendation: Check GitHub secrets configuration
### GitHub Actions Status
- Workflow: Deploy to Cloudflare
- Last run: ✓ Success (2 hours ago)
- Average duration: 2m 45s
- Success rate (7 days): 95%
### Recommendations
✓ All systems operational
- No action required
```
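As a rough illustration of how the Summary figures in such a report could be derived, the sketch below aggregates the JSON that `gh run list --json status,conclusion,createdAt,updatedAt` returns. The `WorkflowRun` and `RunSummary` types and the `summarizeRuns` helper are hypothetical names introduced here, and treating `updatedAt - createdAt` as the run duration is an approximation, not something the command documents.

```typescript
// One element of the array printed by
// `gh run list --json status,conclusion,createdAt,updatedAt`.
interface WorkflowRun {
  status: string;     // e.g. "completed", "in_progress", "queued"
  conclusion: string; // e.g. "success", "failure"; empty while unfinished
  createdAt: string;  // ISO 8601 timestamp
  updatedAt: string;  // ISO 8601 timestamp
}

interface RunSummary {
  total: number;
  succeeded: number;
  failed: number;
  inProgress: number;
  successRatePct: number;     // share of completed runs that succeeded
  avgDurationSeconds: number; // mean of (updatedAt - createdAt), completed runs only
}

function summarizeRuns(runs: WorkflowRun[]): RunSummary {
  const completed = runs.filter((r) => r.status === "completed");
  const succeeded = completed.filter((r) => r.conclusion === "success").length;
  const failed = completed.filter((r) => r.conclusion === "failure").length;

  // Assumption: wall-clock span between creation and last update
  // approximates the run duration.
  const totalSeconds = completed.reduce(
    (sum, r) => sum + (Date.parse(r.updatedAt) - Date.parse(r.createdAt)) / 1000,
    0,
  );

  const done = completed.length;
  return {
    total: runs.length,
    succeeded,
    failed,
    inProgress: runs.length - done,
    successRatePct: done === 0 ? 0 : Math.round((succeeded / done) * 100),
    avgDurationSeconds: done === 0 ? 0 : Math.round(totalSeconds / done),
  };
}
```

Unfinished runs are excluded from the success rate so that an in-progress deployment does not count as a failure.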
## Output Format
The command provides structured output with:
- **Executive summary** - Quick overview of deployment health
- **Environment status** - Status of each environment (production, staging, preview)
- **Recent deployments** - Table of recent deployments with status
- **Active issues** - Any current deployment problems
- **CI/CD health** - GitHub Actions workflow status
- **Recommendations** - Suggested actions
## Error Handling
If the command encounters issues:
1. **No Cloudflare credentials**
```
⚠ Warning: Cloudflare API token not found
Set the CLOUDFLARE_API_TOKEN environment variable or run: wrangler login
```
2. **GitHub CLI not authenticated**
```
⚠ Warning: GitHub CLI not authenticated
Run: gh auth login
```
3. **Worker not found**
```
✗ Error: Worker 'my-worker' not found
Available workers:
- production-worker
- staging-worker
```
4. **API rate limit**
```
⚠ Warning: Cloudflare API rate limit reached
Retry in 60 seconds or use cached data
```
## Best Practices
1. **Regular Monitoring**
- Run daily to track deployment health
- Set up automated checks in CI/CD
- Monitor success rate trends
2. **Quick Debugging**
- Use `--failed` flag to focus on issues
- Check specific environments during incidents
- Compare deployment durations
3. **Integration**
- Add to deployment pipeline for validation
- Include in monitoring dashboards
- Use in incident response runbooks
## Related Commands
- `/cf-logs-analyze` - Analyze deployment logs
- `/cf-metrics-dashboard` - View detailed metrics
- Use `cloudflare-deployment-monitor` agent for active monitoring
## Examples
### Example 1: Check Production Status
```bash
/cf-deployment-status production
```
Output:
```markdown
## Production Deployment Status
**Status**: ✓ Healthy
**Last Deployment**: 2 hours ago
**Version**: v1.2.3 (abc123)
**Health Check**: ✓ Passing
**Response Time**: 45ms (p95)
**Error Rate**: 0.01%
**Recent Deployments**:
1. ✓ abc123 - 2 hours ago - "Fix authentication bug" (2m 30s)
2. ✓ xyz789 - 1 day ago - "Add new feature" (2m 45s)
3. ✓ def456 - 2 days ago - "Update dependencies" (3m 10s)
```
### Example 2: Check Failed Deployments
```bash
/cf-deployment-status --failed
```
Output:
```markdown
## Failed Deployments
**Last 24 Hours**: 2 failures
### Failure 1: ghi789
- **Time**: 2 hours ago
- **Environment**: staging
- **Duration**: 1m 05s
- **Error**: Build failed - Type error in src/api/handler.ts
- **Triggered By**: GitHub Actions (PR #123)
- **Logs**: Available via `gh run view 12345678`
### Failure 2: jkl012
- **Time**: 5 hours ago
- **Environment**: preview
- **Duration**: 45s
- **Error**: Missing CLOUDFLARE_ACCOUNT_ID secret
- **Triggered By**: GitHub Actions (PR #122)
- **Fixed**: Yes (redeployed successfully)
```
### Example 3: Check All Workers
```bash
/cf-deployment-status
```
Output shows status for all workers and environments with summary metrics.
## Configuration
The command uses these configuration sources:
1. **wrangler.toml** - Worker configuration
2. **GitHub Actions workflows** - CI/CD configuration
3. **Environment variables**:
- `CLOUDFLARE_API_TOKEN`
- `CLOUDFLARE_ACCOUNT_ID`
- `GITHUB_TOKEN` (for gh CLI)
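A pre-flight check for these variables might look like the sketch below. The `findMissingVars` helper and the `REQUIRED_VARS` constant are hypothetical names for this example. The environment is passed in as a plain record so the function stays independent of any particular runtime.

```typescript
// Variable names taken from the configuration list above.
const REQUIRED_VARS: string[] = [
  "CLOUDFLARE_API_TOKEN",
  "CLOUDFLARE_ACCOUNT_ID",
  "GITHUB_TOKEN",
];

// Returns the names of required variables that are unset or blank.
function findMissingVars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_VARS.filter((key) => {
    const value = env[key];
    return value === undefined || value.trim() === "";
  });
}
```

In Node.js this could be called as `findMissingVars(process.env)`. A missing `GITHUB_TOKEN` is probably better reported as a warning than an error, since `gh auth login` is an alternative way to authenticate the gh CLI.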
## Troubleshooting
**Command returns no deployments**:
- Check wrangler.toml configuration
- Verify worker name
- Ensure API token has correct permissions
**GitHub Actions status unavailable**:
- Authenticate with `gh auth login`
- Check repository permissions
- Verify workflow file exists
**Health checks fail**:
- Verify endpoint URLs
- Check network connectivity
- Ensure health endpoint is implemented

commands/logs-analyze.md Normal file

@@ -0,0 +1,503 @@
---
name: cf-logs-analyze
description: Analyze Cloudflare Workers logs and GitHub Actions deployment logs to identify errors, patterns, and performance issues
---
Analyze logs from Cloudflare Workers and GitHub Actions deployments to identify errors, patterns, and performance issues.
## What This Command Does
1. **Cloudflare Workers Logs**
- Streams real-time Worker logs
- Filters for errors and exceptions
- Analyzes log patterns
- Tracks error frequency
2. **GitHub Actions Logs**
- Retrieves deployment workflow logs
- Identifies build/deploy failures
- Extracts error messages
- Shows failed job steps
3. **Log Analysis**
- Identifies common error patterns
- Groups similar errors
- Suggests fixes for common issues
- Provides error context
## Usage
```bash
# Analyze recent Worker logs
/cf-logs-analyze
# Analyze specific deployment
/cf-logs-analyze <deployment-id>
# Analyze failed GitHub Actions run
/cf-logs-analyze --run <run-id>
# Filter for errors only
/cf-logs-analyze --errors-only
# Analyze last N minutes
/cf-logs-analyze --since 30m
# Specific worker
/cf-logs-analyze --worker production-worker
# Export logs to file
/cf-logs-analyze --export logs.json
```
## Implementation
When you use this command, Claude will:
1. **Stream Cloudflare Workers Logs**
```bash
# Tail Worker logs
wrangler tail <worker-name> --format=pretty
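# Show only invocations that ended in an error
wrangler tail <worker-name> --status error
# Keep JSON events that contain an error-level log line
wrangler tail <worker-name> --format=json | jq 'select(any(.logs[]?; .level == "error"))'
# wrangler tail streams live events only and has no --since option;
# narrow the live stream with a text filter instead
wrangler tail <worker-name> --search <text>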
```
2. **Analyze GitHub Actions Logs**
```bash
# Get workflow run logs
gh run view <run-id> --log
# Get failed job logs only
gh run view <run-id> --log-failed
# Get specific job logs
gh run view <run-id> --job <job-id> --log
```
3. **Parse and Analyze**
```javascript
// Log analysis structure
const logAnalysis = {
  "analysis_period": "last_1_hour",
  "total_logs": 15432,
  "errors": 23,
  "warnings": 145,
  "error_breakdown": {
    "TypeError": 12,
    "NetworkError": 6,
    "AuthenticationError": 3,
    "Other": 2
  },
  "top_errors": [
    {
      "type": "TypeError",
      "message": "Cannot read property 'id' of undefined",
      "count": 8,
      "first_seen": "2025-01-15T10:15:00Z",
      "last_seen": "2025-01-15T10:45:00Z",
      "locations": ["src/api/users.ts:42", "src/api/users.ts:67"],
      "suggested_fix": "Add null check before accessing user.id"
    }
  ]
};
```
4. **Generate Analysis Report**
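As a rough illustration of the parse-and-analyze step, the sketch below aggregates already-parsed log entries into a subset of the structure shown in step 3. The `ParsedLogEntry` shape, including its `errorType` field, is an assumption made for this example and is not the format `wrangler tail` emits.

```typescript
interface ParsedLogEntry {
  level: "info" | "warning" | "error";
  errorType?: string; // e.g. "TypeError"; hypothetical field, set for errors
  message: string;
  timestamp: string;  // ISO 8601 in UTC, so string order equals time order
}

interface TopError {
  type: string;
  message: string;
  count: number;
  first_seen: string;
  last_seen: string;
}

interface LogAnalysis {
  total_logs: number;
  errors: number;
  warnings: number;
  error_breakdown: Record<string, number>;
  top_errors: TopError[];
}

function analyzeLogs(entries: ParsedLogEntry[], topN: number = 3): LogAnalysis {
  const breakdown: Record<string, number> = {};
  const groupsByKey: Record<string, TopError | undefined> = {};
  const groupList: TopError[] = [];
  let errors = 0;
  let warnings = 0;

  entries.forEach((entry) => {
    if (entry.level === "warning") warnings += 1;
    if (entry.level !== "error") return;
    errors += 1;

    const type = entry.errorType ?? "Other";
    breakdown[type] = (breakdown[type] ?? 0) + 1;

    // Group identical (type, message) pairs and track their time span.
    const key = type + ": " + entry.message;
    const existing = groupsByKey[key];
    if (existing === undefined) {
      const created: TopError = {
        type,
        message: entry.message,
        count: 1,
        first_seen: entry.timestamp,
        last_seen: entry.timestamp,
      };
      groupsByKey[key] = created;
      groupList.push(created);
    } else {
      existing.count += 1;
      if (entry.timestamp < existing.first_seen) existing.first_seen = entry.timestamp;
      if (entry.timestamp > existing.last_seen) existing.last_seen = entry.timestamp;
    }
  });

  // Most frequent groups first.
  const top_errors = groupList
    .slice()
    .sort((a, b) => b.count - a.count)
    .slice(0, topN);

  return { total_logs: entries.length, errors, warnings, error_breakdown: breakdown, top_errors };
}
```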
## Output Format
### Example: Worker Logs Analysis
```markdown
## Cloudflare Worker Logs Analysis
**Worker**: production-worker
**Period**: Last 1 hour
**Total Logs**: 15,432
### Summary
- Total requests: 15,000
- Errors: 23 (0.15%)
- Warnings: 145 (0.97%)
- Average response time: 45ms
### Error Breakdown
| Type | Count | % of Errors | First Seen | Status |
|------|-------|-------------|------------|--------|
| TypeError | 12 | 52% | 10:15 UTC | 🔴 Active |
| NetworkError | 6 | 26% | 10:30 UTC | 🔴 Active |
| AuthenticationError | 3 | 13% | 10:25 UTC | ✅ Resolved |
| Other | 2 | 9% | 10:40 UTC | 🔴 Active |
### Top Errors
#### 1. TypeError: Cannot read property 'id' of undefined
- **Count**: 8 occurrences
- **First seen**: 10:15 UTC
- **Last seen**: 10:45 UTC
- **Location**: src/api/users.ts:42, src/api/users.ts:67
- **Impact**: 0.05% of requests
- **Suggested fix**:
```typescript
// Before
const userId = user.id;
// After
const userId = user?.id;
// Compare against null/undefined so a legitimate falsy ID such as 0 still passes
if (userId == null) {
  throw new Error('User ID not found');
}
```
#### 2. NetworkError: Failed to fetch user data
- **Count**: 6 occurrences
- **First seen**: 10:30 UTC
- **Last seen**: 10:50 UTC
- **Location**: src/services/api.ts:123
- **Impact**: 0.04% of requests
- **Pattern**: All errors from same external API
- **Suggested fix**: Add retry logic with exponential backoff
#### 3. AuthenticationError: Invalid token
- **Count**: 3 occurrences
- **First seen**: 10:25 UTC
- **Last seen**: 10:35 UTC
- **Location**: src/middleware/auth.ts:45
- **Status**: ✅ Resolved at 10:36 UTC
- **Resolution**: Token refresh implemented
### Performance Issues
#### Slow Requests (>1s)
- **Count**: 45 (0.3% of requests)
- **Average duration**: 1.8s
- **Max duration**: 3.2s
- **Common pattern**: Database queries without indexes
### Log Patterns
#### Pattern 1: Rate Limiting
```
[10:15:32] WARNING: Rate limit approaching for user 12345
[10:15:45] WARNING: Rate limit approaching for user 12345
[10:15:58] ERROR: Rate limit exceeded for user 12345
```
**Analysis**: User hitting rate limits
**Recommendation**: Implement client-side throttling
#### Pattern 2: External API Timeouts
```
[10:30:12] INFO: Fetching user data from external API
[10:30:42] ERROR: Request timeout after 30s
```
**Analysis**: External API slow/unreachable
**Recommendation**: Add circuit breaker, reduce timeout
### Geographic Distribution
| Region | Requests | Errors | Error Rate |
|--------|----------|--------|------------|
| US-East | 8,000 | 5 | 0.06% |
| EU-West | 4,500 | 12 | 0.27% |
| APAC | 2,500 | 6 | 0.24% |
**Note**: Higher error rate in EU-West region
### Recommendations
1. **Critical**: Fix TypeError in user API (8 occurrences)
2. **High**: Add retry logic for external API calls
3. **Medium**: Optimize database queries causing slow requests
4. **Low**: Investigate higher error rate in EU-West region
### Next Steps
1. Deploy fix for TypeError in src/api/users.ts
2. Monitor error rate for next hour
3. Set up alert if error rate exceeds 0.5%
```
### Example: GitHub Actions Logs Analysis
```markdown
## GitHub Actions Deployment Logs Analysis
**Workflow**: Deploy to Cloudflare
**Run ID**: 12345678
**Status**: ✗ Failed
**Duration**: 3m 45s
**Triggered**: 2 hours ago by @developer
### Job Summary
| Job | Status | Duration | Error |
|-----|--------|----------|-------|
| Build | ✓ Success | 2m 15s | - |
| Test | ✓ Success | 1m 30s | - |
| Deploy | ✗ Failed | 0m 45s | Deployment rejected |
### Failed Job: Deploy
**Error**:
```
Error: Failed to publish your Function. Got error: Uncaught SyntaxError:
Unexpected token 'export' in dist/worker.js:1234
at worker.js:1234:5
```
**Failed Step**: Deploy to Cloudflare Workers
**Step**: 4 of 5
**Exit Code**: 1
**Log Context**:
```
[2025-01-15 10:30:15] Installing dependencies...
[2025-01-15 10:30:45] Dependencies installed successfully
[2025-01-15 10:30:50] Building worker...
[2025-01-15 10:31:30] Build completed successfully
[2025-01-15 10:31:35] Deploying to Cloudflare...
[2025-01-15 10:31:40] ERROR: Failed to publish your Function
[2025-01-15 10:31:40] ERROR: Got error: Uncaught SyntaxError
```
### Root Cause Analysis
**Issue**: SyntaxError in deployed worker
**Cause**: Build output contains ES6 modules but Cloudflare Worker expects bundled code
**Location**: dist/worker.js:1234
**Code Context**:
```javascript
// Line 1234 in dist/worker.js
export { handler }; // ❌ This is the problem
```
**Why it failed**:
- Build process didn't bundle the code properly
- Export statement not compatible with Worker runtime
- Missing bundler configuration
### Suggested Fix
**Option 1**: Update build configuration
```json
// package.json
{
  "scripts": {
    "build": "esbuild src/index.ts --bundle --format=esm --outfile=dist/worker.js"
  }
}
```
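**Option 2**: Update wrangler.toml so the bundled ES module is the entry point (the `[build.upload]` table is a Wrangler v1 setting; current Wrangler infers the module format from the entry file)
```toml
main = "dist/worker.js"
[build]
command = "npm run build"
watch_dir = "src"
```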
### Prevention
To prevent this in the future:
1. Add build validation step before deployment
2. Test worker locally with `wrangler dev`
3. Add syntax validation in CI
4. Use TypeScript strict mode
**Recommended CI step**:
```yaml
- name: Validate Worker
  run: |
    wrangler deploy --dry-run
    node -c dist/worker.js # Check syntax
```
### Related Issues
- Similar failure in run #12345600 (3 days ago)
- Pattern: Occurs after dependency updates
- Recommendation: Add pre-deployment validation
### Quick Fix Command
```bash
# Update build configuration
npm install --save-dev esbuild
# Update build script in package.json
# Redeploy
```
```
## Log Analysis Capabilities
### 1. Error Pattern Recognition
Identifies common error patterns:
- **Null pointer exceptions** → Add null checks
- **Authentication failures** → Check token/credentials
- **Network timeouts** → Add retry logic
- **Rate limiting** → Implement backoff
- **Build failures** → Check dependencies/configuration
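The mapping above can be pictured as a small lookup table. In the sketch below the labels and suggestions come from the list, while the regular expressions and the `classifyError` helper are illustrative assumptions, not the matching rules the command uses.

```typescript
interface ErrorPattern {
  label: string;
  matcher: RegExp;
  suggestion: string;
}

// Ordered from most to least specific; the first match wins.
const ERROR_PATTERNS: ErrorPattern[] = [
  {
    label: "Null pointer exception",
    matcher: /cannot read propert(y|ies)|of (undefined|null)/i,
    suggestion: "Add null checks",
  },
  {
    label: "Authentication failure",
    matcher: /invalid token|unauthorized|\b401\b/i,
    suggestion: "Check token/credentials",
  },
  {
    label: "Network timeout",
    matcher: /time(d)? ?out|failed to fetch/i,
    suggestion: "Add retry logic",
  },
  {
    label: "Rate limiting",
    matcher: /rate limit|\b429\b/i,
    suggestion: "Implement backoff",
  },
  {
    label: "Build failure",
    matcher: /build failed|syntaxerror/i,
    suggestion: "Check dependencies/configuration",
  },
];

// Returns the first matching pattern, or undefined for unrecognized messages.
function classifyError(message: string): ErrorPattern | undefined {
  return ERROR_PATTERNS.filter((p) => p.matcher.test(message))[0];
}
```

Keeping the rules in a data table rather than in branching code makes it easy to add or reorder patterns as new error classes show up in the logs.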
### 2. Performance Analysis
Tracks performance metrics from logs:
- Request duration distribution
- Slow endpoint identification
- Cold start frequency
- Resource usage patterns
### 3. Security Issue Detection
Identifies security-related log entries:
- Authentication failures
- Unauthorized access attempts
- Suspicious request patterns
- Potential DDoS indicators
### 4. Deployment Issue Analysis
Analyzes deployment-specific problems:
- Build failures
- Test failures
- Configuration errors
- Dependency issues
- API quota/rate limits
## Advanced Features
### Log Aggregation
Combine logs from multiple sources:
```bash
# Analyze both Worker and CI logs
/cf-logs-analyze --deployment abc123 --include-ci
```
Output combines:
- Worker execution logs
- GitHub Actions deployment logs
- Build process logs
- Test execution logs
### Time-Series Analysis
Track errors over time:
```bash
# Analyze last 24 hours
/cf-logs-analyze --since 24h --group-by hour
```
Output:
```markdown
### Error Rate Over Time
| Hour | Requests | Errors | Error Rate | Note |
|------|----------|--------|------------|------|
| 09:00 | 5,000 | 12 | 0.24% | |
| 10:00 | 5,200 | 23 | 0.44% | 📈 Spike |
| 11:00 | 5,100 | 8 | 0.16% | |
```
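One way to produce a table like this is sketched below. The `HourBucket` shape and the `errorRateByHour` helper are hypothetical, and the spike rule (an hourly rate above 1.5 times the overall rate) is an assumed heuristic rather than the command's actual threshold.

```typescript
interface HourBucket {
  hour: string; // e.g. "10:00"
  requests: number;
  errors: number;
}

interface HourRate extends HourBucket {
  errorRatePct: number; // errors / requests * 100, rounded to 2 decimals
  spike: boolean;
}

// spikeFactor is an assumed heuristic: flag an hour whose error rate exceeds
// spikeFactor times the overall rate across all buckets.
function errorRateByHour(buckets: HourBucket[], spikeFactor: number = 1.5): HourRate[] {
  const totalRequests = buckets.reduce((sum, b) => sum + b.requests, 0);
  const totalErrors = buckets.reduce((sum, b) => sum + b.errors, 0);
  const overallRate = totalRequests === 0 ? 0 : totalErrors / totalRequests;

  return buckets.map((b) => {
    const rate = b.requests === 0 ? 0 : b.errors / b.requests;
    return {
      hour: b.hour,
      requests: b.requests,
      errors: b.errors,
      errorRatePct: Math.round(rate * 10000) / 100,
      spike: overallRate > 0 && rate > overallRate * spikeFactor,
    };
  });
}
```

Comparing each hour against the overall rate keeps the check self-contained. A production setup would more likely compare against a longer historical baseline.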
### Error Correlation
Find correlated errors:
```markdown
### Correlated Errors
**Primary**: TypeError in user API
**Correlated with**:
- AuthenticationError (80% correlation)
- NetworkError to external API (60% correlation)
**Analysis**: TypeError occurs after auth token expiry
**Fix**: Refresh token before API call
```
## Integration
### With Monitoring Tools
Export to monitoring platforms:
```bash
# Export to Datadog
/cf-logs-analyze --export datadog
# Export to Sentry
/cf-logs-analyze --export sentry
# Export to JSON
/cf-logs-analyze --export logs.json
```
### With Incident Response
Use during incidents:
```bash
# Quick error analysis
/cf-logs-analyze --errors-only --since 30m
# Find specific error
/cf-logs-analyze --search "database timeout"
# Compare with previous deployment
/cf-logs-analyze --deployment abc123 --compare-to xyz789
```
## Best Practices
1. **Regular Analysis**
- Analyze logs after each deployment
- Review error patterns weekly
- Track error rate trends
2. **Proactive Monitoring**
- Set up log-based alerts
- Monitor error rate thresholds
- Track performance degradation
3. **Incident Response**
- Use during outages for quick diagnosis
- Compare with baseline logs
- Track error resolution
## Related Commands
- `/cf-deployment-status` - Check deployment status
- `/cf-metrics-dashboard` - View metrics dashboard
- Use `cloudflare-deployment-monitor` agent for active monitoring
- Use `cloudflare-cicd-analyzer` agent for CI/CD optimization
## Configuration
Configure log analysis behavior:
```json
// .claude/settings.json
{
  "cloudflare-logs": {
    "default_worker": "production-worker",
    "analysis_window": "1h",
    "error_threshold": 0.01,
    "include_warnings": true,
    "export_format": "json"
  }
}
```
## Troubleshooting
**No logs available**:
- Check worker name
- Verify API token permissions
- Ensure worker is receiving traffic
**GitHub Actions logs not found**:
- Authenticate with `gh auth login`
- Check run ID is correct
- Verify repository access
**Analysis too slow**:
- Reduce time window
- Use `--errors-only` flag
- Filter by specific log level


@@ -0,0 +1,619 @@
---
name: cf-metrics-dashboard
description: Display comprehensive deployment and performance metrics dashboard for Cloudflare Workers and Pages with GitHub Actions CI/CD integration
---
Display a comprehensive metrics dashboard for Cloudflare Workers and Pages deployments, including deployment metrics, performance data, CI/CD pipeline health, and Core Web Vitals.
## What This Command Does
1. **Deployment Metrics**
- Deployment frequency
- Success/failure rate
- Mean time to deployment (MTTD)
- Rollback frequency
- Deployment duration trends
2. **Performance Metrics**
- Request latency (p50, p95, p99)
- Error rates
- Requests per second
- Cold start metrics
- Bundle size trends
3. **CI/CD Pipeline Metrics**
- Workflow success rate
- Pipeline duration
- Job-level performance
- GitHub Actions minutes usage
- Queue time analysis
4. **Core Web Vitals**
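- LCP (Largest Contentful Paint)
- FID (First Input Delay; replaced by INP, Interaction to Next Paint, as a Core Web Vital in March 2024)
- CLS (Cumulative Layout Shift)
- TTFB (Time to First Byte; a supporting metric rather than a Core Web Vital)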
## Usage
```bash
# Show all metrics
/cf-metrics-dashboard
# Specific time range
/cf-metrics-dashboard --range 7d
/cf-metrics-dashboard --range 24h
/cf-metrics-dashboard --range 30d
# Specific worker
/cf-metrics-dashboard --worker production-worker
# Specific environment
/cf-metrics-dashboard --env production
# Compare deployments
/cf-metrics-dashboard --compare abc123 xyz789
# Export to file
/cf-metrics-dashboard --export dashboard.json
# Specific metric groups
/cf-metrics-dashboard --metrics deployment,performance
/cf-metrics-dashboard --metrics cicd
/cf-metrics-dashboard --metrics web-vitals
```
## Dashboard Output
### Full Dashboard View
```markdown
# Cloudflare Deployment Metrics Dashboard
**Worker**: production-worker
**Environment**: production
**Period**: Last 7 days
**Generated**: 2025-01-15 10:30:00 UTC
---
## 📊 Executive Summary
| Metric | Value | Trend | Status |
|--------|-------|-------|--------|
| Deployment Success Rate | 96% | ↑ +2% | ✅ Good |
| Average Deployment Time | 2m 45s | ↓ -15s | ✅ Good |
| Error Rate | 0.08% | ↓ -0.02% | ✅ Good |
| P95 Latency | 125ms | ↑ +10ms | ⚠️ Warning |
| Core Web Vitals Score | 92/100 | → 0 | ✅ Good |
---
## 🚀 Deployment Metrics
### Deployment Frequency
```
Week view:
Mon ████████████ 12 deployments
Tue ██████ 6 deployments
Wed █████████ 9 deployments
Thu ███████████ 11 deployments
Fri ████████ 8 deployments
Sat ████ 4 deployments
Sun ██ 2 deployments
Total: 52 deployments
Average: 7.4 deployments/day
```
### Deployment Success Rate
```
Last 7 days: 96% (50/52 successful)
Last 30 days: 94% (198/210 successful)
Trend: ↑ Improving
```
### Deployment Duration
| Metric | Current | Previous | Change |
|--------|---------|----------|--------|
| Mean | 2m 45s | 3m 00s | ↓ -15s |
| P95 | 4m 30s | 5m 00s | ↓ -30s |
| P99 | 6m 15s | 7m 00s | ↓ -45s |
| Max | 8m 20s | 9m 30s | ↓ -1m 10s |
**Trend**: ✅ Improving (mean is 15s, about 8%, faster)
### Recent Deployments
| Time | Status | Duration | Commit | Environment |
|------|--------|----------|--------|-------------|
| 2h ago | ✅ Success | 2m 30s | abc123 | production |
| 4h ago | ✅ Success | 2m 45s | def456 | staging |
| 6h ago | ❌ Failed | 1m 20s | ghi789 | production |
| 8h ago | ✅ Success | 3m 10s | jkl012 | production |
| 10h ago | ✅ Success | 2m 55s | mno345 | staging |
### Rollback Activity
```
Total rollbacks (7d): 2
Rollback rate: 3.8%
Reasons:
- Build failure: 1
- Post-deployment errors: 1
Mean time to rollback: 5m 30s
```
---
## ⚡ Performance Metrics
### Request Latency
```
Current (last hour):
p50: 45ms ████████████░░░░░░░░
p75: 82ms ████████████████░░░░
p95: 125ms █████████████████░░░
p99: 245ms ███████████████████░
Target thresholds:
p50: <50ms ✅ Met
p95: <200ms ✅ Met
p99: <500ms ✅ Met
```
**7-day trend**:
```
Day 1: p95=115ms ████████████░
Day 2: p95=118ms █████████████░
Day 3: p95=120ms █████████████░
Day 4: p95=125ms ██████████████
Day 5: p95=122ms █████████████░
Day 6: p95=125ms ██████████████
Day 7: p95=125ms ██████████████
Trend: ↑ Slight increase (+10ms)
```
### Request Volume
```
Requests/second (current): 1,245 rps
Requests/day (average): 107M requests
Peak: 2,180 rps (09:00 UTC)
Trough: 340 rps (03:00 UTC)
```
### Error Rates
| Error Type | Count | Rate | Trend |
|------------|-------|------|-------|
| 5xx errors | 850 | 0.08% | ↓ Good |
| 4xx errors | 12,400 | 1.16% | → Stable |
| Timeouts | 120 | 0.01% | ↓ Good |
| Total | 13,370 | 1.25% | ↓ Good |
**Target**: <1% error rate for 5xx errors ✅ Met
### Cold Start Analysis
```
Cold starts (7d): 3,420
Cold start rate: 0.32% of requests
Duration distribution:
p50: 180ms ████████████████░░░░
p95: 350ms ███████████████████░
p99: 520ms ████████████████████
Impact: Minimal (<0.5% of requests)
```
### Bundle Size
```
Current: 512 KB ████████████████░░░░
Maximum: 750 KB ████████████████████
Percentage: 68% of limit
7-day trend:
Day 1: 505 KB ████████████████░░░░
Day 2: 508 KB ████████████████░░░░
Day 3: 510 KB ████████████████░░░░
Day 4: 512 KB ████████████████░░░░
Day 5: 512 KB ████████████████░░░░
Day 6: 512 KB ████████████████░░░░
Day 7: 512 KB ████████████████░░░░
Change: +7 KB (+1.4%)
Status: ✅ Under control
```
---
## 🔄 CI/CD Pipeline Metrics
### GitHub Actions Performance
```
Workflow: Deploy to Cloudflare
Total runs (7d): 52
Success rate: 96% (50/52)
Duration breakdown:
├─ Build job: 2m 15s (50%)
├─ Test job: 1m 30s (33%)
└─ Deploy job: 45s (17%)
Total average: 4m 30s
```
### Job-Level Performance
| Job | Avg Duration | Success Rate | Trend |
|-----|--------------|--------------|-------|
| Build | 2m 15s | 98% | ↓ -10s |
| Test | 1m 30s | 96% | → 0s |
| Deploy | 45s | 100% | ↓ -5s |
### Cache Effectiveness
```
npm cache hit rate: 87%
Build cache hit rate: 72%
Time saved by caching:
- npm install: 1m 20s → 15s (saved 1m 05s)
- Build: 2m 30s → 45s (saved 1m 45s)
Total time saved per run: 2m 50s
```
### GitHub Actions Minutes Usage
```
Total minutes (7d): 234 minutes
Average per run: 4.5 minutes
Projected monthly: ~1,000 minutes
Cost (estimated): $0.00 (within free tier)
```
### Failure Analysis
```
Failed runs (7d): 2
Failure breakdown:
- Build failures: 1 (50%)
- Test failures: 0 (0%)
- Deployment failures: 1 (50%)
Mean time to fix: 15 minutes
```
---
## 🌐 Core Web Vitals
### Overall Score: 92/100 ✅
| Metric | Value | Target | Status | Trend |
|--------|-------|--------|--------|-------|
| LCP (p75) | 1.8s | <2.5s | ✅ Good | → Stable |
| FID (p75) | 45ms | <100ms | ✅ Good | ↓ Better |
| CLS (p75) | 0.05 | <0.1 | ✅ Good | → Stable |
| FCP (p75) | 1.2s | <1.8s | ✅ Good | → Stable |
| TTFB (p75) | 420ms | <600ms | ✅ Good | ↑ +20ms |
### LCP (Largest Contentful Paint)
```
Distribution:
Good (<2.5s): ████████████████████ 89% ✅
Needs work (2.5-4s): ███ 8% ⚠️
Poor (>4s): █ 3% ❌
p75 value: 1.8s ✅ Good
Target: <2.5s
```
### FID (First Input Delay)
```
Distribution:
Good (<100ms): ████████████████████ 95% ✅
Needs work (100-300ms): █ 4% ⚠️
Poor (>300ms): ░ 1% ❌
p75 value: 45ms ✅ Good
Target: <100ms
```
### CLS (Cumulative Layout Shift)
```
Distribution:
Good (<0.1): ████████████████████ 92% ✅
Needs work (0.1-0.25): ██ 6% ⚠️
Poor (>0.25): ░ 2% ❌
p75 value: 0.05 ✅ Good
Target: <0.1
```
### Geographic Performance
| Region | LCP | FID | CLS | Score |
|--------|-----|-----|-----|-------|
| US-East | 1.6s | 42ms | 0.04 | 95/100 ✅ |
| US-West | 1.7s | 44ms | 0.05 | 94/100 ✅ |
| EU-West | 1.9s | 48ms | 0.06 | 91/100 ✅ |
| APAC | 2.2s | 55ms | 0.07 | 88/100 ⚠️ |
**Note**: APAC region slightly slower, still meeting targets
---
## 📈 Trends & Insights
### Key Findings
1. ✅ Deployment speed improved 8% over last week (mean down 15s)
2. ⚠️ P95 latency increased by 10ms (monitoring)
3. ✅ Error rate decreased by 0.02%
4. ✅ Core Web Vitals stable and meeting targets
5. ✅ CI/CD pipeline optimized with caching
### Performance Regressions Detected
None. All metrics within acceptable thresholds.
### Recommendations
1. **Medium Priority**: Investigate P95 latency increase
- Started: 3 days ago
- Impact: +10ms (still within target)
- Action: Review recent code changes
2. **Low Priority**: Optimize APAC region performance
- LCP slightly higher (2.2s vs 1.8s average)
- Still meeting targets (<2.5s)
- Action: Consider regional caching strategy
### Upcoming Alerts
⚠️ Bundle size approaching 70% of limit
- Current: 512 KB / 750 KB
- Action: Plan bundle size optimization
---
## 📊 Historical Comparison
### vs. Last Week
| Metric | Current | Last Week | Change |
|--------|---------|-----------|--------|
| Deployment frequency | 52 | 48 | +4 (+8%) |
| Success rate | 96% | 94% | +2% |
| Avg deployment time | 2m 45s | 3m 00s | -15s (-8%) |
| Error rate | 0.08% | 0.10% | -0.02% |
| P95 latency | 125ms | 115ms | +10ms (+9%) |
### vs. Last Month
| Metric | Current | Last Month | Change |
|--------|---------|------------|--------|
| Deployment frequency | 52/wk | 45/wk | +7 (+16%) |
| Success rate | 96% | 92% | +4% |
| Avg deployment time | 2m 45s | 3m 30s | -45s (-21%) |
| Error rate | 0.08% | 0.12% | -0.04% |
| P95 latency | 125ms | 130ms | -5ms (-4%) |
---
## 🎯 SLO Status
### Service Level Objectives
| SLO | Target | Current | Status | Remaining Error Budget |
|-----|--------|---------|--------|------------------------|
| Availability | 99.9% | 99.98% | ✅ Met | 80% remaining |
| P95 Latency | <200ms | 125ms | ✅ Met | 62% used |
| Error Rate | <1% | 0.08% | ✅ Met | 92% remaining |
| Deployment Success | >95% | 96% | ✅ Met | 20% buffer |
**Error Budget Status**: ✅ Healthy
- 80% error budget remaining
- Current burn rate: Low
- Projected to meet SLOs for next 30 days
---
## 🔔 Active Alerts
No active alerts. All systems operational. ✅
---
## 💡 Next Actions
1. Continue monitoring P95 latency trend
2. Review code changes from last 3 days
3. Plan bundle size optimization for next sprint
4. Consider APAC region caching improvements
---
**Report Generated**: 2025-01-15 10:30:00 UTC
**Next Update**: Automatic (every hour) or run `/cf-metrics-dashboard` anytime
```
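The Remaining Error Budget column in the sample SLO table can be derived with simple arithmetic. The sketch below shows one common convention; the function names are hypothetical and the rounding is a choice made for this example. For the error-rate row, a 1% allowance with 0.08% observed leaves 92% of the budget.

```typescript
// Percentage of the error budget still unspent, for an SLO expressed as a
// maximum allowed "bad" percentage (for example, error rate < 1%).
function remainingBudgetPct(allowedBadPct: number, observedBadPct: number): number {
  if (allowedBadPct <= 0) throw new Error("allowedBadPct must be positive");
  const remaining = 1 - observedBadPct / allowedBadPct;
  // Clamp at 0: once the budget is exhausted there is nothing left to spend.
  return Math.max(0, Math.round(remaining * 100));
}

// Availability SLOs are stated as a "good" percentage, so convert first:
// a 99.9% target allows 0.1% unavailability.
function remainingAvailabilityBudgetPct(targetPct: number, observedPct: number): number {
  return remainingBudgetPct(100 - targetPct, 100 - observedPct);
}
```

Note the direction of the availability calculation: the closer the observed value sits to the target, the less budget remains.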
## Metric Categories
### 1. Deployment Metrics
- **Frequency**: Deployments per day/week
- **Success Rate**: % of successful deployments
- **Duration**: Time to complete deployment
- **Rollback Rate**: Frequency of rollbacks
- **MTTD**: Mean Time To Deployment
### 2. Performance Metrics
- **Latency**: p50, p95, p99 response times
- **Error Rates**: 4xx, 5xx, timeout errors
- **Throughput**: Requests per second
- **Cold Starts**: Frequency and duration
- **Bundle Size**: Size trends
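For reference, p50/p95/p99 values can be computed from raw latency samples with the nearest-rank method, sketched below. This is one of several percentile definitions; analytics backends may interpolate or use approximate sketches instead, so results can differ slightly. The `percentile` and `latencySummary` helpers are names introduced for this example.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = samples.slice().sort((a, b) => a - b);
  // 1-based rank; multiply before dividing to avoid fractional rounding noise.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1] as number;
}

function latencySummary(samplesMs: number[]): { p50: number; p95: number; p99: number } {
  return {
    p50: percentile(samplesMs, 50),
    p95: percentile(samplesMs, 95),
    p99: percentile(samplesMs, 99),
  };
}
```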
### 3. CI/CD Metrics
- **Workflow Success Rate**: GitHub Actions success %
- **Pipeline Duration**: Total workflow time
- **Job Performance**: Individual job times
- **Cache Hit Rate**: Effectiveness of caching
- **GitHub Actions Minutes**: Usage tracking
### 4. User Experience Metrics
- **Core Web Vitals**: LCP, FID, CLS
- **TTFB**: Time to First Byte
- **FCP**: First Contentful Paint
- **Geographic Performance**: Regional metrics
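The Good / Needs work / Poor buckets used in the sample dashboard can be expressed as a threshold lookup. In the sketch below the numeric limits are the ones shown in the dashboard's distribution charts; the type names and the `rateVital` helper are hypothetical, and counting a value exactly on a boundary toward the better bucket is an assumption.

```typescript
type VitalName = "lcp" | "fid" | "cls";
type VitalRating = "good" | "needs-work" | "poor";

const VITAL_THRESHOLDS: Record<VitalName, { good: number; poor: number }> = {
  lcp: { good: 2.5, poor: 4 },    // seconds
  fid: { good: 100, poor: 300 },  // milliseconds
  cls: { good: 0.1, poor: 0.25 }, // unitless layout-shift score
};

// Rates a p75 value; boundary values fall into the better bucket (assumption).
function rateVital(metric: VitalName, p75: number): VitalRating {
  const limits = VITAL_THRESHOLDS[metric];
  if (p75 <= limits.good) return "good";
  if (p75 <= limits.poor) return "needs-work";
  return "poor";
}
```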
## Advanced Features
### Metric Comparison
Compare different deployments:
```bash
/cf-metrics-dashboard --compare abc123 xyz789
```
Output shows side-by-side comparison with deltas.
### Custom Time Ranges
```bash
# Last 24 hours
/cf-metrics-dashboard --range 24h
# Last 7 days (default)
/cf-metrics-dashboard --range 7d
# Last 30 days
/cf-metrics-dashboard --range 30d
# Custom range
/cf-metrics-dashboard --from 2025-01-01 --to 2025-01-15
```
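Relative ranges such as `24h` or `7d` have to be resolved into an explicit time window before any data is queried. The sketch below shows one way to do that; `parseRangeMs` and `rangeWindow` are hypothetical helpers, and accepting only the `m`, `h`, and `d` suffixes is an assumption based on the examples in this document.

```typescript
const UNIT_MS: Record<string, number> = {
  m: 60 * 1000,
  h: 60 * 60 * 1000,
  d: 24 * 60 * 60 * 1000,
};

// Parses "30m", "24h", or "7d" into milliseconds; returns undefined otherwise.
function parseRangeMs(range: string): number | undefined {
  const match = /^(\d+)([mhd])$/.exec(range.trim());
  if (match === null) return undefined;
  const amount = Number(match[1]);
  const unitMs = UNIT_MS[match[2] as string];
  if (unitMs === undefined || amount <= 0) return undefined;
  return amount * unitMs;
}

// Resolves a relative range into an explicit window ending at `now`.
function rangeWindow(range: string, now: Date): { from: Date; to: Date } | undefined {
  const ms = parseRangeMs(range);
  if (ms === undefined) return undefined;
  return { from: new Date(now.getTime() - ms), to: now };
}
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the result reproducible when a report is regenerated or tested.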
### Filtered Views
Show specific metric categories:
```bash
# Only deployment metrics
/cf-metrics-dashboard --metrics deployment
# Only performance metrics
/cf-metrics-dashboard --metrics performance
# Multiple categories
/cf-metrics-dashboard --metrics deployment,performance,cicd
```
### Export Options
```bash
# Export to JSON
/cf-metrics-dashboard --export dashboard.json
# Export to CSV
/cf-metrics-dashboard --export metrics.csv
# Send to monitoring platform
/cf-metrics-dashboard --export datadog
```
## Integration
### With Monitoring Tools
Send metrics to external platforms:
- **Datadog**: Send metrics and events
- **Sentry**: Performance monitoring
- **Grafana**: Custom dashboards
- **CloudWatch**: AWS integration
### With Alerting
Set up alerts based on thresholds:
```json
{
  "alerts": [
    {
      "metric": "deployment_success_rate",
      "threshold": 0.95,
      "operator": "<",
      "action": "notify_slack"
    },
    {
      "metric": "p95_latency_ms",
      "threshold": 200,
      "operator": ">",
      "action": "create_incident"
    }
  ]
}
```
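Evaluating rules of this shape against a metrics snapshot is straightforward. The sketch below is one possible reading of the config; the `AlertRuleConfig` type and the `triggeredAlerts` helper are hypothetical, and how actions such as `notify_slack` get dispatched is outside its scope.

```typescript
interface AlertRuleConfig {
  metric: string;
  threshold: number;
  operator: "<" | ">";
  action: string;
}

// Returns the rules whose condition holds for the given metric snapshot.
// Metrics missing from the snapshot are skipped rather than treated as breaches.
function triggeredAlerts(
  rules: AlertRuleConfig[],
  metrics: Record<string, number>,
): AlertRuleConfig[] {
  return rules.filter((rule) => {
    const value = metrics[rule.metric];
    if (value === undefined) return false;
    return rule.operator === "<" ? value < rule.threshold : value > rule.threshold;
  });
}
```

Skipping absent metrics avoids false alarms when analytics data is delayed. A stricter setup might raise a separate "no data" alert instead.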
## Best Practices
1. **Regular Review**
- Check dashboard daily
- Review weekly trends
- Monthly deep dives
2. **Threshold Monitoring**
- Set up alerts for SLO violations
- Track error budget consumption
- Monitor trend changes
3. **Historical Analysis**
- Compare with previous periods
- Identify seasonal patterns
- Track long-term improvements
4. **Actionable Insights**
- Focus on trends, not just absolute values
- Investigate significant changes
- Correlate metrics with deployments
## Related Commands
- `/cf-deployment-status` - Check current deployment status
- `/cf-logs-analyze` - Analyze logs for errors
- Use `cloudflare-performance-tracker` agent for detailed performance analysis
- Use `cloudflare-deployment-monitor` agent for active monitoring
## Configuration
Customize dashboard settings:
```json
// .claude/settings.json
{
  "cloudflare-metrics": {
    "default_range": "7d",
    "default_worker": "production-worker",
    "refresh_interval": "1h",
    "thresholds": {
      "p95_latency_ms": 200,
      "error_rate": 0.01,
      "deployment_success_rate": 0.95
    },
    "web_vitals_targets": {
      "lcp": 2.5,
      "fid": 100,
      "cls": 0.1
    }
  }
}
```
## Troubleshooting
**No metrics available**:
- Check Cloudflare API access
- Verify worker name
- Ensure analytics are enabled
**Incomplete data**:
- Analytics data may lag by up to 5 minutes
- Check date range
- Verify data retention settings
**Metrics don't match other tools**:
- Check time zone differences
- Verify aggregation methods
- Compare data sources