# Claude Code OpenTelemetry Setup Skill

Automated workflow for setting up OpenTelemetry telemetry collection for Claude Code usage monitoring, cost tracking, and productivity analytics.

**Version:** 1.1.0

**Author:** Prometheus Team

---

## Features

- **Mode 1: Local PoC Setup** - Full Docker stack with Grafana dashboards
- **Mode 2: Enterprise Setup** - Connect to centralized infrastructure
- Automated configuration file generation
- Dashboard import with UID detection
- Verification and testing procedures
- Comprehensive troubleshooting guides

---

## Quick Start

### Prerequisites

**For Mode 1 (Local PoC):**

- Docker Desktop installed and running
- Claude Code installed
- Write access to `~/.claude/settings.json`

**For Mode 2 (Enterprise):**

- OTEL Collector endpoint URL
- Authentication credentials
- Write access to `~/.claude/settings.json`

### Installation

This skill is designed to be invoked by Claude Code. No manual installation is required.

### Usage

**Mode 1 - Local PoC Setup:**

```
"Set up Claude Code telemetry locally"
"I want to try OpenTelemetry with Claude Code"
"Create a local telemetry stack for me"
```

**Mode 2 - Enterprise Setup:**

```
"Connect Claude Code to our company OTEL endpoint at otel.company.com:4317"
"Set up telemetry for team rollout"
"Configure enterprise telemetry"
```

---

## What Gets Collected?

### Metrics

- **Session counts and active time** - How much you use Claude Code
- **Token usage** - Input, output, cached tokens by model
- **API costs** - Spend tracking by model and time
- **Lines of code** - Code modifications (added, changed, deleted)
- **Commits and PRs** - Git activity tracking

### Events/Logs

- User prompts (if enabled)
- Tool executions
- API requests
- Session lifecycle

**Privacy:** Metrics are anonymized. Source code content is never collected. User prompt content is logged only when `OTEL_LOG_USER_PROMPTS` is enabled; note that both settings templates in the Configuration Reference set it to `"1"`.

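
Both modes need write access to `~/.claude/settings.json`, and the Configuration Reference names three `env` keys that must be present before telemetry is sent. The following is a minimal sketch of a pre-flight check for both conditions. It assumes `settings.json` is plain JSON with a top-level `env` object, as in the templates; `check_settings` and `REQUIRED_ENV` are hypothetical names used for illustration and are not part of the skill.

```python
import json
import os
from pathlib import Path

# Keys the Configuration Reference marks as required for telemetry to be sent.
REQUIRED_ENV = {
    "CLAUDE_CODE_ENABLE_TELEMETRY": "1",
    "OTEL_METRICS_EXPORTER": "otlp",
    "OTEL_LOGS_EXPORTER": "otlp",
}


def check_settings(path):
    """Return a list of human-readable problems; an empty list means OK."""
    path = Path(path).expanduser()
    if not path.exists():
        return [f"{path} does not exist"]

    problems = []
    if not os.access(path, os.W_OK):
        problems.append(f"{path} is not writable")

    try:
        settings = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return problems + [f"{path} is not valid JSON: {exc}"]

    env = settings.get("env") if isinstance(settings, dict) else None
    if not isinstance(env, dict):
        return problems + ['no "env" object found']

    # Compare each required key against the value shown in the templates.
    for key, expected in REQUIRED_ENV.items():
        if env.get(key) != expected:
            problems.append(
                f"env.{key} should be {expected!r}, found {env.get(key)!r}"
            )
    return problems
```

Calling `check_settings("~/.claude/settings.json")` returns an empty list when the file is writable and all three keys carry the template values; otherwise each entry describes one problem to fix before restarting Claude Code.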

---

## Directory Structure

```
claude-code-otel-setup/
├── SKILL.md                          # Main skill definition
├── README.md                         # This file
├── modes/
│   ├── mode1-poc-setup.md            # Detailed local setup workflow
│   └── mode2-enterprise.md           # Detailed enterprise setup workflow
├── templates/
│   ├── docker-compose.yml            # Docker Compose configuration
│   ├── otel-collector-config.yml     # OTEL Collector configuration
│   ├── prometheus.yml                # Prometheus scrape configuration
│   ├── grafana-datasources.yml       # Grafana datasource provisioning
│   ├── settings.json.local           # Local telemetry settings template
│   ├── settings.json.enterprise      # Enterprise settings template
│   ├── start-telemetry.sh            # Start script
│   └── stop-telemetry.sh             # Stop script
├── dashboards/
│   ├── README.md                     # Dashboard import guide
│   ├── claude-code-overview.json     # Comprehensive dashboard
│   └── claude-code-simple.json       # Simplified dashboard
└── data/
    ├── metrics-reference.md          # Complete metrics documentation
    ├── prometheus-queries.md         # Useful PromQL queries
    └── troubleshooting.md            # Common issues and solutions
```

---

## Mode 1: Local PoC Setup

**What it does:**

- Creates `~/.claude/telemetry/` directory
- Generates Docker Compose configuration
- Starts 4 containers: OTEL Collector, Prometheus, Loki, Grafana
- Updates Claude Code settings.json
- Imports Grafana dashboards
- Verifies data flow

**Time:** 5-7 minutes

**Output:**

- Grafana: http://localhost:3000 (admin/admin)
- Prometheus: http://localhost:9090
- Working dashboards with real data

**Detailed workflow:** See `modes/mode1-poc-setup.md`

---

## Mode 2: Enterprise Setup

**What it does:**

- Collects enterprise OTEL endpoint details
- Updates Claude Code settings.json with endpoint and auth
- Adds team/environment resource attributes
- Tests connectivity (optional)
- Provides team rollout documentation

**Time:** 2-3 minutes

**Output:**

- Claude Code configured to send to central endpoint
- Connectivity verified
- Team rollout guide generated

**Detailed workflow:** See
`modes/mode2-enterprise.md`

---

## Example Dashboards

### Overview Dashboard

Includes:

- Total Lines of Code (all-time)
- Total Cost (24h)
- Total Tokens (24h)
- Active Time (24h)
- Cost Over Time (timeseries)
- Token Usage by Type (stacked)
- Lines of Code Modified (bar chart)
- Commits Created (24h)

### Custom Queries

See `data/prometheus-queries.md` for 50+ ready-to-use PromQL queries:

- Cost analysis
- Token usage
- Productivity metrics
- Team aggregation
- Model comparison
- Alerting rules

---

## Common Use Cases

### Individual Developer

**Goal:** Track personal Claude Code usage and costs

**Setup:**

```
Mode 1 (Local PoC)
```

**Access:**

- Personal Grafana dashboard at localhost:3000
- All data stays local

---

### Team Pilot (5-10 Users)

**Goal:** Aggregate metrics across pilot users

**Setup:**

```
Mode 2 (Enterprise)
```

**Architecture:**

- Centralized OTEL Collector
- Team-level Prometheus/Grafana
- Aggregated dashboards

---

### Enterprise Rollout (100+ Users)

**Goal:** Organization-wide cost tracking and productivity analytics

**Setup:**

```
Mode 2 (Enterprise) + Managed Infrastructure
```

**Features:**

- Department/team/project attribution
- Chargeback reporting
- Executive dashboards
- Trend analysis

---

## Troubleshooting

### Quick Checks

**Containers not starting:**

```bash
docker compose logs
```

**No metrics in Prometheus:**

1. Restart Claude Code (telemetry loads at startup)
2. Wait 60 seconds (export interval)
3. Check OTEL Collector logs: `docker compose logs otel-collector`

**Dashboard shows "No data":**

1. Verify metric names use double prefix: `claude_code_claude_code_*`
2. Check time range (top-right corner)
3. Verify datasource UID matches

**Full troubleshooting guide:** See `data/troubleshooting.md`

---

## Known Issues

### Issue 1: 🚨 CRITICAL - Missing OTEL Exporters

**Description:** Claude Code not sending telemetry even with `CLAUDE_CODE_ENABLE_TELEMETRY=1`

**Cause:** Missing required `OTEL_METRICS_EXPORTER` and `OTEL_LOGS_EXPORTER` settings

**Solution:** The skill templates include these by default. **Always verify** they're present in settings.json. See Configuration Reference for details.

---

### Issue 2: OTEL Collector Deprecated 'address' Field

**Description:** Collector crashes with "'address' has invalid keys" error

**Cause:** The `address` field in `service.telemetry.metrics` is deprecated in collector v0.123.0+

**Solution:** Skill templates have this removed. If using custom config, remove the deprecated field.

---

### Issue 3: Metric Double Prefix

**Description:** Metrics are named `claude_code_claude_code_*` instead of `claude_code_*`

**Cause:** OTEL Collector Prometheus exporter adds namespace prefix

**Solution:** This is expected. Dashboards use correct naming.

---

### Issue 4: Dashboard Datasource UID Mismatch

**Description:** Dashboard shows "datasource prometheus not found"

**Cause:** Dashboard has hardcoded UID that doesn't match your Grafana

**Solution:** Skill automatically detects and fixes UID during import

---

### Issue 5: OTEL Collector Deprecated Exporter

**Description:** Container fails with "logging exporter has been deprecated"

**Cause:** Old OTEL configuration

**Solution:** Skill uses `debug` exporter (not deprecated `logging`)

---

## Configuration Reference

### Settings.json (Local)

**🚨 CRITICAL REQUIREMENTS:**

The following settings are **REQUIRED** (not optional) for telemetry to work:

- `CLAUDE_CODE_ENABLE_TELEMETRY: "1"` - Enables telemetry system
- `OTEL_METRICS_EXPORTER: "otlp"` - **REQUIRED** to send metrics (most common missing setting!)
- `OTEL_LOGS_EXPORTER: "otlp"` - **REQUIRED** to send events/logs

Without `OTEL_METRICS_EXPORTER` and `OTEL_LOGS_EXPORTER`, telemetry will not be sent even if `CLAUDE_CODE_ENABLE_TELEMETRY=1` is set.

```json
{
  "env": {
    "CLAUDE_CODE_ENABLE_TELEMETRY": "1",
    "OTEL_METRICS_EXPORTER": "otlp",
    "OTEL_LOGS_EXPORTER": "otlp",
    "OTEL_EXPORTER_OTLP_PROTOCOL": "grpc",
    "OTEL_EXPORTER_OTLP_ENDPOINT": "http://localhost:4317",
    "OTEL_METRIC_EXPORT_INTERVAL": "60000",
    "OTEL_LOGS_EXPORT_INTERVAL": "5000",
    "OTEL_LOG_USER_PROMPTS": "1",
    "OTEL_METRICS_INCLUDE_SESSION_ID": "true",
    "OTEL_METRICS_INCLUDE_VERSION": "true",
    "OTEL_METRICS_INCLUDE_ACCOUNT_UUID": "true",
    "OTEL_RESOURCE_ATTRIBUTES": "environment=local,deployment=poc"
  }
}
```

### Settings.json (Enterprise)

**Same CRITICAL requirements apply:**

- `OTEL_METRICS_EXPORTER: "otlp"` - **REQUIRED!**
- `OTEL_LOGS_EXPORTER: "otlp"` - **REQUIRED!**

```json
{
  "env": {
    "CLAUDE_CODE_ENABLE_TELEMETRY": "1",
    "OTEL_METRICS_EXPORTER": "otlp",
    "OTEL_LOGS_EXPORTER": "otlp",
    "OTEL_EXPORTER_OTLP_PROTOCOL": "grpc",
    "OTEL_EXPORTER_OTLP_ENDPOINT": "https://otel.company.com:4317",
    "OTEL_EXPORTER_OTLP_HEADERS": "Authorization=Bearer YOUR_API_KEY",
    "OTEL_METRIC_EXPORT_INTERVAL": "60000",
    "OTEL_LOGS_EXPORT_INTERVAL": "5000",
    "OTEL_LOG_USER_PROMPTS": "1",
    "OTEL_METRICS_INCLUDE_SESSION_ID": "true",
    "OTEL_METRICS_INCLUDE_VERSION": "true",
    "OTEL_METRICS_INCLUDE_ACCOUNT_UUID": "true",
    "OTEL_RESOURCE_ATTRIBUTES": "team=platform,environment=production"
  }
}
```

---

## Management

### Start Telemetry Stack (Mode 1)

```bash
~/.claude/telemetry/start-telemetry.sh
```

### Stop Telemetry Stack (Mode 1)

```bash
~/.claude/telemetry/stop-telemetry.sh
```

### Check Status

Run the `docker compose` commands below from the directory that contains `docker-compose.yml` (`~/.claude/telemetry/` in a Mode 1 setup).

```bash
docker compose ps
```

### View Logs

```bash
docker compose logs -f
```

### Restart Services

```bash
docker compose restart
```

---

## Data Retention

**Default:** 15 days in Prometheus

**Adjust retention:** Retention is controlled by Prometheus command-line flags. Set them in the Prometheus service's `command:` in `docker-compose.yml`:

```yaml
command:
  - '--storage.tsdb.retention.time=90d'
  - '--storage.tsdb.retention.size=50GB'
```

**Disk usage:** ~1-2 MB per day per active user

---

## Security Considerations

### Local Setup (Mode 1)

- Grafana accessible only on localhost
- Default credentials: admin/admin (change after first login)
- No external network exposure
- Data stored in Docker volumes

### Enterprise Setup (Mode 2)

- Use HTTPS endpoints
- Store API keys securely (environment variables, secrets manager)
- Enable mTLS for production
- Tag metrics with team/project for proper attribution

---

## Performance Tuning

### Reduce OTEL Collector Memory

Edit `otel-collector-config.yml`:

```yaml
processors:
  memory_limiter:
    limit_mib: 256  # Reduce from default
```

### Reduce Prometheus Retention

Edit `docker-compose.yml`:

```yaml
command:
  - '--storage.tsdb.retention.time=7d'  # Reduce from 15d
```

### Optimize Dashboard Queries

- Use recording rules for expensive queries
- Reduce dashboard time ranges
- Increase refresh intervals

See `data/prometheus-queries.md` for
recording rule examples

---

## Integration Examples

### Cost Alerts (PagerDuty/Slack)

```yaml
# Prometheus alerting rule file (listed under rule_files in prometheus.yml).
# Alertmanager then routes the firing alert to PagerDuty or Slack.
groups:
  - name: claude_code_cost
    rules:
      - alert: HighDailyCost
        expr: sum(increase(claude_code_claude_code_cost_usage_USD_total[24h])) > 100
        annotations:
          summary: "Claude Code daily cost exceeded $100"
```

### Weekly Cost Reports (Email)

Use Grafana Reporting:

1. Create dashboard with cost panels
2. Set up email delivery
3. Schedule weekly reports

### Chargeback Integration

Export metrics to data warehouse:

```yaml
# Use Prometheus remote write (configured in prometheus.yml)
remote_write:
  - url: "https://datawarehouse.company.com/prometheus"
```

---

## Contributing

This skill is maintained by the Prometheus Team.

**Feedback:** Open an issue or contact the team

**Improvements:** Submit pull requests with enhancements

---

## Changelog

### Version 1.1.0 (2025-11-01)

**Critical Updates from Production Testing:**

- 🚨 **CRITICAL FIX**: Documented missing OTEL_METRICS_EXPORTER/OTEL_LOGS_EXPORTER as #1 cause of "telemetry not working"
- ✅ Added deprecated `address` field fix for OTEL Collector v0.123.0+
- ✅ Enhanced troubleshooting with prominent exporter configuration section
- ✅ Updated all documentation with CRITICAL warnings for required settings
- ✅ Added comprehensive Known Issues section covering production scenarios
- ✅ Verified templates have correct exporter configuration

**What Changed:**

- Troubleshooting guide now prioritizes missing exporters as root cause
- Known Issues expanded from 3 to 6 issues with production learnings
- Configuration Reference includes prominent CRITICAL requirements callout
- SKILL.md Important Reminders section updated with exporter warnings

### Version 1.0.0 (2025-10-31)

**Initial Release:**

- Mode 1: Local PoC setup with full Docker stack
- Mode 2: Enterprise setup with centralized endpoint
- Comprehensive documentation and troubleshooting
- Dashboard templates with correct metric naming
- Automated UID detection and replacement

**Known Issues Fixed:**
- ✅ OTEL Collector deprecated logging exporter
- ✅ Dashboard datasource UID mismatch
- ✅ Metric double prefix handling
- ✅ Loki exporter configuration

---

## Additional Resources

- **Claude Code Monitoring Docs:** https://docs.claude.com/claude-code/monitoring
- **OpenTelemetry Docs:** https://opentelemetry.io/docs/
- **Prometheus Docs:** https://prometheus.io/docs/
- **Grafana Docs:** https://grafana.com/docs/

---

## License

Internal use within Elsevier organization.

---

## Support

**Issues?** Check `data/troubleshooting.md` first

**Questions?** Contact Prometheus Team or #claude-code-telemetry channel

**Emergency?** Rollback with: `cp ~/.claude/settings.json.backup ~/.claude/settings.json`

---

**Ready to monitor your Claude Code usage!** 🚀
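
The rollback command in Support only works if `~/.claude/settings.json.backup` already exists. This README does not say when that backup is created, so the following is a minimal sketch for making one before editing settings by hand. `backup_settings` is a hypothetical helper name, not part of the skill; only the `.backup` file name is taken from the rollback command.

```python
import shutil
from pathlib import Path


def backup_settings(settings_path="~/.claude/settings.json"):
    """Copy settings.json to settings.json.backup and return the backup path.

    Returns None when there is no settings file to back up.
    """
    source = Path(settings_path).expanduser()
    if not source.is_file():
        return None
    # Same directory, same name, with ".backup" appended.
    backup = source.with_name(source.name + ".backup")
    # copy2 copies the content and also preserves timestamps and permissions.
    shutil.copy2(source, backup)
    return backup
```

An existing backup is overwritten on each call, so this is best run once, before the first change to `settings.json`.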