14 KiB
Claude Code OpenTelemetry Setup Skill
Automated workflow for setting up OpenTelemetry telemetry collection for Claude Code usage monitoring, cost tracking, and productivity analytics.
Version: 1.0.0 Author: Prometheus Team
Features
- Mode 1: Local PoC Setup - Full Docker stack with Grafana dashboards
- Mode 2: Enterprise Setup - Connect to centralized infrastructure
- Automated configuration file generation
- Dashboard import with UID detection
- Verification and testing procedures
- Comprehensive troubleshooting guides
Quick Start
Prerequisites
For Mode 1 (Local PoC):
- Docker Desktop installed and running
- Claude Code installed
- Write access to
~/.claude/settings.json
For Mode 2 (Enterprise):
- OTEL Collector endpoint URL
- Authentication credentials
- Write access to
~/.claude/settings.json
Installation
This skill is designed to be invoked by Claude Code. No manual installation required.
Usage
Mode 1 - Local PoC Setup:
"Set up Claude Code telemetry locally"
"I want to try OpenTelemetry with Claude Code"
"Create a local telemetry stack for me"
Mode 2 - Enterprise Setup:
"Connect Claude Code to our company OTEL endpoint at otel.company.com:4317"
"Set up telemetry for team rollout"
"Configure enterprise telemetry"
What Gets Collected?
Metrics
- Session counts and active time - How much you use Claude Code
- Token usage - Input, output, cached tokens by model
- API costs - Spend tracking by model and time
- Lines of code - Code modifications (added, changed, deleted)
- Commits and PRs - Git activity tracking
Events/Logs
- User prompts (if enabled)
- Tool executions
- API requests
- Session lifecycle
Privacy: Metrics are anonymized. Source code content is never collected.
Directory Structure
claude-code-otel-setup/
├── SKILL.md # Main skill definition
├── README.md # This file
├── modes/
│ ├── mode1-poc-setup.md # Detailed local setup workflow
│ └── mode2-enterprise.md # Detailed enterprise setup workflow
├── templates/
│ ├── docker-compose.yml # Docker Compose configuration
│ ├── otel-collector-config.yml # OTEL Collector configuration
│ ├── prometheus.yml # Prometheus scrape configuration
│ ├── grafana-datasources.yml # Grafana datasource provisioning
│ ├── settings.json.local # Local telemetry settings template
│ ├── settings.json.enterprise # Enterprise settings template
│ ├── start-telemetry.sh # Start script
│ └── stop-telemetry.sh # Stop script
├── dashboards/
│ ├── README.md # Dashboard import guide
│ ├── claude-code-overview.json # Comprehensive dashboard
│ └── claude-code-simple.json # Simplified dashboard
└── data/
├── metrics-reference.md # Complete metrics documentation
├── prometheus-queries.md # Useful PromQL queries
└── troubleshooting.md # Common issues and solutions
Mode 1: Local PoC Setup
What it does:
- Creates
~/.claude/telemetry/directory - Generates Docker Compose configuration
- Starts 4 containers: OTEL Collector, Prometheus, Loki, Grafana
- Updates Claude Code settings.json
- Imports Grafana dashboards
- Verifies data flow
Time: 5-7 minutes
Output:
- Grafana: http://localhost:3000 (admin/admin)
- Prometheus: http://localhost:9090
- Working dashboards with real data
Detailed workflow: See modes/mode1-poc-setup.md
Mode 2: Enterprise Setup
What it does:
- Collects enterprise OTEL endpoint details
- Updates Claude Code settings.json with endpoint and auth
- Adds team/environment resource attributes
- Tests connectivity (optional)
- Provides team rollout documentation
Time: 2-3 minutes
Output:
- Claude Code configured to send to central endpoint
- Connectivity verified
- Team rollout guide generated
Detailed workflow: See modes/mode2-enterprise.md
Example Dashboards
Overview Dashboard
Includes:
- Total Lines of Code (all-time)
- Total Cost (24h)
- Total Tokens (24h)
- Active Time (24h)
- Cost Over Time (timeseries)
- Token Usage by Type (stacked)
- Lines of Code Modified (bar chart)
- Commits Created (24h)
Custom Queries
See data/prometheus-queries.md for 50+ ready-to-use PromQL queries:
- Cost analysis
- Token usage
- Productivity metrics
- Team aggregation
- Model comparison
- Alerting rules
Common Use Cases
Individual Developer
Goal: Track personal Claude Code usage and costs
Setup:
Mode 1 (Local PoC)
Access:
- Personal Grafana dashboard at localhost:3000
- All data stays local
Team Pilot (5-10 Users)
Goal: Aggregate metrics across pilot users
Setup:
Mode 2 (Enterprise)
Architecture:
- Centralized OTEL Collector
- Team-level Prometheus/Grafana
- Aggregated dashboards
Enterprise Rollout (100+ Users)
Goal: Organization-wide cost tracking and productivity analytics
Setup:
Mode 2 (Enterprise) + Managed Infrastructure
Features:
- Department/team/project attribution
- Chargeback reporting
- Executive dashboards
- Trend analysis
Troubleshooting
Quick Checks
Containers not starting:
docker compose logs
No metrics in Prometheus:
- Restart Claude Code (telemetry loads at startup)
- Wait 60 seconds (export interval)
- Check OTEL Collector logs:
docker compose logs otel-collector
Dashboard shows "No data":
- Verify metric names use double prefix:
claude_code_claude_code_* - Check time range (top-right corner)
- Verify datasource UID matches
Full troubleshooting guide: See data/troubleshooting.md
Known Issues
Issue 1: 🚨 CRITICAL - Missing OTEL Exporters
Description: Claude Code not sending telemetry even with CLAUDE_CODE_ENABLE_TELEMETRY=1
Cause: Missing required OTEL_METRICS_EXPORTER and OTEL_LOGS_EXPORTER settings
Solution: The skill templates include these by default. Always verify they're present in settings.json. See Configuration Reference for details.
Issue 2: OTEL Collector Deprecated 'address' Field
Description: Collector crashes with "'address' has invalid keys" error
Cause: The address field in service.telemetry.metrics is deprecated in collector v0.123.0+
Solution: Skill templates have this removed. If using custom config, remove the deprecated field.
Issue 3: Metric Double Prefix
Description: Metrics are named claude_code_claude_code_* instead of claude_code_*
Cause: OTEL Collector Prometheus exporter adds namespace prefix
Solution: This is expected. Dashboards use correct naming.
Issue 4: Dashboard Datasource UID Mismatch
Description: Dashboard shows "datasource prometheus not found"
Cause: Dashboard has hardcoded UID that doesn't match your Grafana
Solution: Skill automatically detects and fixes UID during import
Issue 5: OTEL Collector Deprecated Exporter
Description: Container fails with "logging exporter has been deprecated"
Cause: Old OTEL configuration
Solution: Skill uses debug exporter (not deprecated logging)
Configuration Reference
Settings.json (Local)
🚨 CRITICAL REQUIREMENTS:
The following settings are REQUIRED (not optional) for telemetry to work:
CLAUDE_CODE_ENABLE_TELEMETRY: "1"- Enables telemetry systemOTEL_METRICS_EXPORTER: "otlp"- REQUIRED to send metrics (most common missing setting!)OTEL_LOGS_EXPORTER: "otlp"- REQUIRED to send events/logs
Without OTEL_METRICS_EXPORTER and OTEL_LOGS_EXPORTER, telemetry will not send even if CLAUDE_CODE_ENABLE_TELEMETRY=1 is set.
{
"env": {
"CLAUDE_CODE_ENABLE_TELEMETRY": "1",
"OTEL_METRICS_EXPORTER": "otlp", // REQUIRED!
"OTEL_LOGS_EXPORTER": "otlp", // REQUIRED!
"OTEL_EXPORTER_OTLP_PROTOCOL": "grpc",
"OTEL_EXPORTER_OTLP_ENDPOINT": "http://localhost:4317",
"OTEL_METRIC_EXPORT_INTERVAL": "60000",
"OTEL_LOGS_EXPORT_INTERVAL": "5000",
"OTEL_LOG_USER_PROMPTS": "1",
"OTEL_METRICS_INCLUDE_SESSION_ID": "true",
"OTEL_METRICS_INCLUDE_VERSION": "true",
"OTEL_METRICS_INCLUDE_ACCOUNT_UUID": "true",
"OTEL_RESOURCE_ATTRIBUTES": "environment=local,deployment=poc"
}
}
Settings.json (Enterprise)
Same CRITICAL requirements apply:
OTEL_METRICS_EXPORTER: "otlp"- REQUIRED!OTEL_LOGS_EXPORTER: "otlp"- REQUIRED!
{
"env": {
"CLAUDE_CODE_ENABLE_TELEMETRY": "1",
"OTEL_METRICS_EXPORTER": "otlp", // REQUIRED!
"OTEL_LOGS_EXPORTER": "otlp", // REQUIRED!
"OTEL_EXPORTER_OTLP_PROTOCOL": "grpc",
"OTEL_EXPORTER_OTLP_ENDPOINT": "https://otel.company.com:4317",
"OTEL_EXPORTER_OTLP_HEADERS": "Authorization=Bearer YOUR_API_KEY",
"OTEL_METRIC_EXPORT_INTERVAL": "60000",
"OTEL_LOGS_EXPORT_INTERVAL": "5000",
"OTEL_LOG_USER_PROMPTS": "1",
"OTEL_METRICS_INCLUDE_SESSION_ID": "true",
"OTEL_METRICS_INCLUDE_VERSION": "true",
"OTEL_METRICS_INCLUDE_ACCOUNT_UUID": "true",
"OTEL_RESOURCE_ATTRIBUTES": "team=platform,environment=production"
}
}
Management
Start Telemetry Stack (Mode 1)
~/.claude/telemetry/start-telemetry.sh
Stop Telemetry Stack (Mode 1)
~/.claude/telemetry/stop-telemetry.sh
Check Status
docker compose ps
View Logs
docker compose logs -f
Restart Services
docker compose restart
Data Retention
Default: 15 days in Prometheus
Adjust retention:
Edit docker-compose.yml or prometheus.yml:
command:
- '--storage.tsdb.retention.time=90d'
- '--storage.tsdb.retention.size=50GB'
Disk usage: ~1-2 MB per day per active user
Security Considerations
Local Setup (Mode 1)
- Grafana accessible only on localhost
- Default credentials: admin/admin (change after first login)
- No external network exposure
- Data stored in Docker volumes
Enterprise Setup (Mode 2)
- Use HTTPS endpoints
- Store API keys securely (environment variables, secrets manager)
- Enable mTLS for production
- Tag metrics with team/project for proper attribution
Performance Tuning
Reduce OTEL Collector Memory
Edit otel-collector-config.yml:
processors:
memory_limiter:
limit_mib: 256 # Reduce from default
Reduce Prometheus Retention
Edit docker-compose.yml:
command:
- '--storage.tsdb.retention.time=7d' # Reduce from 15d
Optimize Dashboard Queries
- Use recording rules for expensive queries
- Reduce dashboard time ranges
- Increase refresh intervals
See data/prometheus-queries.md for recording rule examples
Integration Examples
Cost Alerts (PagerDuty/Slack)
# alertmanager.yml
groups:
- name: claude_code_cost
rules:
- alert: HighDailyCost
expr: sum(increase(claude_code_claude_code_cost_usage_USD_total[24h])) > 100
annotations:
summary: "Claude Code daily cost exceeded $100"
Weekly Cost Reports (Email)
Use Grafana Reporting:
- Create dashboard with cost panels
- Set up email delivery
- Schedule weekly reports
Chargeback Integration
Export metrics to data warehouse:
# Use Prometheus remote write
remote_write:
- url: "https://datawarehouse.company.com/prometheus"
Contributing
This skill is maintained by the Prometheus Team.
Feedback: Open an issue or contact the team
Improvements: Submit pull requests with enhancements
Changelog
Version 1.1.0 (2025-11-01)
Critical Updates from Production Testing:
- 🚨 CRITICAL FIX: Documented missing OTEL_METRICS_EXPORTER/OTEL_LOGS_EXPORTER as #1 cause of "telemetry not working"
- ✅ Added deprecated
addressfield fix for OTEL Collector v0.123.0+ - ✅ Enhanced troubleshooting with prominent exporter configuration section
- ✅ Updated all documentation with CRITICAL warnings for required settings
- ✅ Added comprehensive Known Issues section covering production scenarios
- ✅ Verified templates have correct exporter configuration
What Changed:
- Troubleshooting guide now prioritizes missing exporters as root cause
- Known Issues expanded from 3 to 6 issues with production learnings
- Configuration Reference includes prominent CRITICAL requirements callout
- SKILL.md Important Reminders section updated with exporter warnings
Version 1.0.0 (2025-10-31)
Initial Release:
- Mode 1: Local PoC setup with full Docker stack
- Mode 2: Enterprise setup with centralized endpoint
- Comprehensive documentation and troubleshooting
- Dashboard templates with correct metric naming
- Automated UID detection and replacement
Known Issues Fixed:
- ✅ OTEL Collector deprecated logging exporter
- ✅ Dashboard datasource UID mismatch
- ✅ Metric double prefix handling
- ✅ Loki exporter configuration
Additional Resources
- Claude Code Monitoring Docs: https://docs.claude.com/claude-code/monitoring
- OpenTelemetry Docs: https://opentelemetry.io/docs/
- Prometheus Docs: https://prometheus.io/docs/
- Grafana Docs: https://grafana.com/docs/
License
Internal use within Elsevier organization.
Support
Issues? Check data/troubleshooting.md first
Questions? Contact Prometheus Team or #claude-code-telemetry channel
Emergency? Rollback with: cp ~/.claude/settings.json.backup ~/.claude/settings.json
Ready to monitor your Claude Code usage! 🚀