Initial commit

Zhongwei Li
2025-11-29 18:16:40 +08:00
commit f125e90b9f
370 changed files with 67769 additions and 0 deletions

@@ -0,0 +1,812 @@
# Mode 1: Local PoC Setup - Detailed Workflow
Complete step-by-step process for setting up a local OpenTelemetry stack for Claude Code telemetry.
---
## Overview
**Goal:** Create a complete local telemetry monitoring stack
**Time:** 5-7 minutes
**Prerequisites:** Docker Desktop, Claude Code, 2GB+ free disk space
**Output:** Running Grafana dashboard with Claude Code metrics
---
## Phase 0: Prerequisites Verification
### Step 0.1: Check Docker Installation
```bash
# Check if Docker is installed
docker --version
# Expected: Docker version 20.10.0 or higher
```
**If not installed:**
```
Docker is not installed. Please install Docker Desktop:
- Mac: https://docs.docker.com/desktop/install/mac-install/
- Linux: https://docs.docker.com/desktop/install/linux-install/
- Windows: https://docs.docker.com/desktop/install/windows-install/
```
**Stop if:** Docker not installed
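The version requirement above can be checked mechanically rather than by eye. The following is a minimal sketch, assuming `docker --version` prints a line of the form `Docker version X.Y.Z, ...`; the helper names `version_ge` and `docker_version_ok` are hypothetical and not part of any template in this repository.

```bash
# version_ge A B: succeed when dotted version A is >= dotted version B.
# Compares up to three numeric components; missing components count as 0.
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    split(a, x, "."); split(b, y, ".")
    for (i = 1; i <= 3; i++) {
      if (x[i] + 0 > y[i] + 0) exit 0
      if (x[i] + 0 < y[i] + 0) exit 1
    }
    exit 0
  }'
}

# docker_version_ok: extract X.Y.Z from `docker --version` and compare it
# against the 20.10.0 minimum stated above. Fails if Docker is missing.
docker_version_ok() {
  local installed
  installed=$(docker --version 2>/dev/null |
    sed -n -E 's/^Docker version ([0-9]+\.[0-9]+\.[0-9]+).*/\1/p')
  [ -n "$installed" ] && version_ge "$installed" "20.10.0"
}
```

A caller could run `docker_version_ok || echo "Docker 20.10.0 or higher is required"` before continuing with Step 0.2.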
### Step 0.2: Verify Docker is Running
```bash
# Check Docker daemon
docker ps
# Expected: List of containers (or empty list)
# Error: "Cannot connect to Docker daemon" means Docker isn't running
```
**If not running:**
```
Docker Desktop is not running. Please:
1. Open Docker Desktop application
2. Wait for the whale icon to be stable (not animated)
3. Try again
```
**Stop if:** Docker not running
### Step 0.3: Check Docker Compose
```bash
# Modern Docker includes compose
docker compose version
# Expected: Docker Compose version v2.x.x or higher
```
**Note:** This guide uses `docker compose` (the Compose v2 plugin bundled with Docker Desktop), not the legacy standalone `docker-compose` binary.
### Step 0.4: Check Available Ports
```bash
# Check if ports are available
lsof -i :3000 -i :4317 -i :4318 -i :8889 -i :9090 -i :3100
# Expected: No output (ports are free)
```
**If ports in use:**
```
The following ports are required but already in use:
- 3000: Grafana
- 4317: OTEL Collector (gRPC)
- 4318: OTEL Collector (HTTP)
- 8889: OTEL Collector (Prometheus exporter)
- 9090: Prometheus
- 3100: Loki
Options:
1. Stop services using these ports
2. Modify port mappings in docker-compose.yml (advanced)
```
**Stop if:** Critical ports (3000, 4317, 9090) are in use
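To report which component each busy port belongs to, the check can be wrapped in a small function. This is a sketch only: `port_label` and `busy_ports` are hypothetical helper names, and the port-to-service mapping is copied from the list above.

```bash
# port_label PORT: name the stack component that needs PORT.
port_label() {
  case "$1" in
    3000) echo "Grafana" ;;
    4317) echo "OTEL Collector (gRPC)" ;;
    4318) echo "OTEL Collector (HTTP)" ;;
    8889) echo "OTEL Collector (Prometheus exporter)" ;;
    9090) echo "Prometheus" ;;
    3100) echo "Loki" ;;
    *)    echo "unknown" ;;
  esac
}

# busy_ports: print one line per required port that already has a TCP listener.
# Returns non-zero if one of the critical ports (3000, 4317, 9090) is taken.
busy_ports() {
  local port status=0
  for port in 3000 4317 4318 8889 9090 3100; do
    if lsof -nP -iTCP:"$port" -sTCP:LISTEN >/dev/null 2>&1; then
      echo "$port in use (needed by $(port_label "$port"))"
      case "$port" in 3000|4317|9090) status=1 ;; esac
    fi
  done
  return "$status"
}
```

Running `busy_ports || echo "Free the critical ports before continuing"` gives the same stop condition as above, with the owning service named for each conflict.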
### Step 0.5: Check Disk Space
```bash
# Check available disk space
df -h ~
# Minimum: 2GB free (for Docker images ~1.5GB + data volumes)
# Recommended: 5GB+ free for comfortable operation
```
**If low disk space:**
```
Low disk space detected. Setup requires:
- Initial: ~1.5GB for Docker images (OTEL, Prometheus, Grafana, Loki)
- Runtime: 500MB+ for data volumes (grows over time)
- Minimum: 2GB free disk space required
Please free up space before continuing.
```
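The thresholds above (2GB minimum, 5GB recommended) can be turned into a scripted check. The sketch below assumes the POSIX `df -Pk` output layout, where the fourth column of the second line is the available space in KiB; `free_kb` and `space_status` are hypothetical helper names.

```bash
# free_kb DIR: available space on DIR's filesystem, in KiB.
free_kb() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# space_status FREE_KB: classify free space against the 2GB minimum
# and the 5GB recommendation.
space_status() {
  local min_kb=$((2 * 1024 * 1024))
  local rec_kb=$((5 * 1024 * 1024))
  if [ "$1" -lt "$min_kb" ]; then
    echo "insufficient"
  elif [ "$1" -lt "$rec_kb" ]; then
    echo "minimum"
  else
    echo "recommended"
  fi
}
```

For example, `space_status "$(free_kb "$HOME")"` prints one of the three labels, and setup should stop on `insufficient`.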
---
## Phase 1: Directory Structure Creation
### Step 1.1: Create Base Directory
```bash
mkdir -p ~/.claude/telemetry/{dashboards,docs}
cd ~/.claude/telemetry
```
**Verify:**
```bash
ls -la ~/.claude/telemetry
# Should show: dashboards/ and docs/ directories
```
---
## Phase 2: Configuration File Generation
### Step 2.1: Create docker-compose.yml
**Template:** `templates/docker-compose-template.yml`
```yaml
services:
  # OpenTelemetry Collector - receives telemetry from Claude Code
  otel-collector:
    image: otel/opentelemetry-collector-contrib:0.115.1
    container_name: claude-otel-collector
    command: ["--config=/etc/otel-collector-config.yml"]
    volumes:
      - ./otel-collector-config.yml:/etc/otel-collector-config.yml
    ports:
      - "4317:4317" # OTLP gRPC receiver
      - "4318:4318" # OTLP HTTP receiver
      - "8889:8889" # Prometheus metrics exporter
    networks:
      - claude-telemetry
  # Prometheus - stores metrics
  prometheus:
    image: prom/prometheus:v2.55.1
    container_name: claude-prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
      - '--web.enable-lifecycle'
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus-data:/prometheus
    ports:
      - "9090:9090"
    networks:
      - claude-telemetry
    depends_on:
      - otel-collector
  # Loki - stores logs
  loki:
    image: grafana/loki:3.0.0
    container_name: claude-loki
    ports:
      - "3100:3100"
    command: -config.file=/etc/loki/local-config.yaml
    volumes:
      - loki-data:/loki
    networks:
      - claude-telemetry
  # Grafana - visualization dashboards
  grafana:
    image: grafana/grafana:11.3.0
    container_name: claude-grafana
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=admin
      - GF_USERS_ALLOW_SIGN_UP=false
    volumes:
      - grafana-data:/var/lib/grafana
      - ./grafana-datasources.yml:/etc/grafana/provisioning/datasources/datasources.yml
    networks:
      - claude-telemetry
    depends_on:
      - prometheus
      - loki
networks:
  claude-telemetry:
    driver: bridge
volumes:
  prometheus-data:
  loki-data:
  grafana-data:
```
**Write to:** `~/.claude/telemetry/docker-compose.yml`
**Note on Image Versions:**
- Versions are pinned to prevent breaking changes from upstream
- Current versions (tested and stable):
  - OTEL Collector: 0.115.1
  - Prometheus: v2.55.1
  - Loki: 3.0.0
  - Grafana: 11.3.0
- To update: Change version tags in docker-compose.yml and run `docker compose pull`
### Step 2.2: Create OTEL Collector Configuration
**Template:** `templates/otel-collector-config-template.yml`
**CRITICAL:** Use `debug` exporter, not deprecated `logging` exporter
```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
processors:
  batch:
    timeout: 10s
    send_batch_size: 1024
  resource:
    attributes:
      - key: service.name
        value: claude-code
        action: upsert
  memory_limiter:
    check_interval: 1s
    limit_mib: 512
exporters:
  # Export metrics to Prometheus
  prometheus:
    endpoint: "0.0.0.0:8889"
    namespace: claude_code
    const_labels:
      source: claude_code_telemetry
  # Export logs to Loki via OTLP HTTP
  otlphttp/loki:
    endpoint: http://loki:3100/otlp
    tls:
      insecure: true
  # Debug exporter (replaces deprecated logging exporter)
  debug:
    verbosity: normal
service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch, resource]
      exporters: [prometheus, debug]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, batch, resource]
      exporters: [otlphttp/loki, debug]
  telemetry:
    logs:
      level: info
```
**Write to:** `~/.claude/telemetry/otel-collector-config.yml`
### Step 2.3: Create Prometheus Configuration
**Template:** `templates/prometheus-config-template.yml`
```yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
scrape_configs:
  - job_name: 'otel-collector'
    static_configs:
      - targets: ['otel-collector:8889']
```
**Write to:** `~/.claude/telemetry/prometheus.yml`
### Step 2.4: Create Grafana Datasources Configuration
**Template:** `templates/grafana-datasources-template.yml`
```yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    editable: true
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
    editable: true
```
**Write to:** `~/.claude/telemetry/grafana-datasources.yml`
### Step 2.5: Create Management Scripts
**Start Script:**
```bash
#!/bin/bash
# start-telemetry.sh
echo "🚀 Starting Claude Code Telemetry Stack..."
# Check if Docker is running
if ! docker info > /dev/null 2>&1; then
  echo "❌ Docker is not running. Please start Docker Desktop."
  exit 1
fi
cd ~/.claude/telemetry || exit 1
# Start containers
docker compose up -d
# Wait for services to be ready
echo "⏳ Waiting for services to start..."
sleep 5
# Check container status
echo ""
echo "📊 Container Status:"
docker ps --filter "name=claude-" --format "table {{.Names}}\t{{.Status}}"
echo ""
echo "✅ Telemetry stack started!"
echo ""
echo "🌐 Access URLs:"
echo " Grafana: http://localhost:3000 (admin/admin)"
echo " Prometheus: http://localhost:9090"
echo " Loki: http://localhost:3100"
echo ""
echo "📝 Next steps:"
echo " 1. Restart Claude Code to activate telemetry"
echo " 2. Import dashboards into Grafana"
echo " 3. Use Claude Code normally - metrics will appear in ~60 seconds"
```
**Write to:** `~/.claude/telemetry/start-telemetry.sh`
```bash
chmod +x ~/.claude/telemetry/start-telemetry.sh
```
**Stop Script:**
```bash
#!/bin/bash
# stop-telemetry.sh
echo "🛑 Stopping Claude Code Telemetry Stack..."
cd ~/.claude/telemetry || exit 1
docker compose down
echo "✅ Telemetry stack stopped"
echo ""
echo "Note: Data is preserved in Docker volumes."
echo "To start again: ./start-telemetry.sh"
echo "To completely remove all data: ./cleanup-telemetry.sh"
```
**Write to:** `~/.claude/telemetry/stop-telemetry.sh`
```bash
chmod +x ~/.claude/telemetry/stop-telemetry.sh
```
**Cleanup Script (Full Data Removal):**
```bash
#!/bin/bash
# cleanup-telemetry.sh
echo "⚠️ WARNING: This will remove ALL telemetry data including:"
echo " - All containers"
echo " - All Docker volumes (Grafana, Prometheus, Loki data)"
echo " - Network configuration"
echo ""
read -p "Are you sure you want to proceed? (yes/no): " -r
echo
if [[ ! $REPLY =~ ^[Yy][Ee][Ss]$ ]]; then
  echo "Cleanup cancelled."
  exit 0
fi
echo "Performing full cleanup of Claude Code telemetry stack..."
cd ~/.claude/telemetry || exit 1
docker compose down -v
echo ""
echo "✅ Full cleanup complete!"
echo ""
echo "Removed:"
echo " ✓ All containers (otel-collector, prometheus, loki, grafana)"
echo " ✓ All volumes (all historical data)"
echo " ✓ Network configuration"
echo ""
echo "Preserved:"
echo " ✓ Configuration files in ~/.claude/telemetry/"
echo " ✓ Claude Code settings in ~/.claude/settings.json"
echo ""
echo "To start fresh: ./start-telemetry.sh"
```
**Write to:** `~/.claude/telemetry/cleanup-telemetry.sh`
```bash
chmod +x ~/.claude/telemetry/cleanup-telemetry.sh
```
---
## Phase 3: Start Docker Containers
### Step 3.1: Start All Services
```bash
cd ~/.claude/telemetry
docker compose up -d
```
**Expected output:**
```
[+] Running 5/5
✔ Network telemetry_claude-telemetry Created
✔ Container claude-loki Started
✔ Container claude-otel-collector Started
✔ Container claude-prometheus Started
✔ Container claude-grafana Started
```
### Step 3.2: Verify Containers are Running
```bash
docker ps --filter "name=claude-" --format "table {{.Names}}\t{{.Status}}"
```
**Expected:** All 4 containers showing "Up X seconds/minutes"
**If OTEL Collector is not running:**
```bash
# Check logs
docker logs claude-otel-collector
```
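**Common issue:** A "logging exporter deprecated" error, raised when the config still references the old `logging` exporter.
**Solution:** Make sure `otel-collector-config.yml` uses the `debug` exporter, as the template in Step 2.2 already does.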
### Step 3.3: Wait for Services to be Healthy
```bash
# Give services time to initialize
sleep 10
# Test Prometheus
curl -s http://localhost:9090/-/healthy
# Expected: Prometheus is Healthy.
# Test Grafana
curl -s http://localhost:3000/api/health | jq .
# Expected: {"database": "ok", ...}
```
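A fixed `sleep 10` can be too short on a slow machine and wastes time on a fast one. One alternative is to poll the health endpoints until they answer. The sketch below uses a hypothetical helper, `retry_until`, which is not part of the generated scripts.

```bash
# retry_until ATTEMPTS DELAY CMD...: run CMD until it succeeds or ATTEMPTS
# runs have been used up. Output of CMD is discarded; DELAY is in seconds.
retry_until() {
  local attempts=$1
  local delay=$2
  local i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    # Only sleep when another attempt is still coming.
    if [ "$i" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}
```

For instance, `retry_until 30 1 curl -sf http://localhost:9090/-/healthy` waits up to roughly 30 seconds for Prometheus, and the same call with `http://localhost:3000/api/health` covers Grafana.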
---
## Phase 4: Update Claude Code Settings
### Step 4.1: Backup Existing Settings
```bash
# Only back up if a settings file already exists
[ -f ~/.claude/settings.json ] && cp ~/.claude/settings.json ~/.claude/settings.json.backup
```
### Step 4.2: Read Current Settings
```bash
# Read existing settings
cat ~/.claude/settings.json
```
### Step 4.3: Merge Telemetry Configuration
**Add to settings.json `env` section:**
```json
{
  "env": {
    "CLAUDE_CODE_ENABLE_TELEMETRY": "1",
    "OTEL_METRICS_EXPORTER": "otlp",
    "OTEL_LOGS_EXPORTER": "otlp",
    "OTEL_EXPORTER_OTLP_PROTOCOL": "grpc",
    "OTEL_EXPORTER_OTLP_ENDPOINT": "http://localhost:4317",
    "OTEL_METRIC_EXPORT_INTERVAL": "60000",
    "OTEL_LOGS_EXPORT_INTERVAL": "5000",
    "OTEL_LOG_USER_PROMPTS": "1",
    "OTEL_METRICS_INCLUDE_SESSION_ID": "true",
    "OTEL_METRICS_INCLUDE_VERSION": "true",
    "OTEL_METRICS_INCLUDE_ACCOUNT_UUID": "true",
    "OTEL_RESOURCE_ATTRIBUTES": "environment=local,deployment=poc"
  }
}
```
**Template:** `templates/settings-env-template.json`
**Note:** Merge with existing env vars, don't replace entire settings file
### Step 4.4: Verify Settings Updated
```bash
grep CLAUDE_CODE_ENABLE_TELEMETRY ~/.claude/settings.json
# Expected: "CLAUDE_CODE_ENABLE_TELEMETRY": "1"
```
---
## Phase 5: Grafana Dashboard Import
### Step 5.1: Detect Prometheus Datasource UID
**Option A: Via Grafana API**
```bash
curl -s http://admin:admin@localhost:3000/api/datasources | \
jq '.[] | select(.type=="prometheus") | {name, uid}'
```
**Expected:**
```json
{
"name": "Prometheus",
"uid": "PBFA97CFB590B2093"
}
```
**Option B: Manual Detection**
1. Open http://localhost:3000
2. Go to Connections → Data sources
3. Click Prometheus
4. Note the UID from the URL: `/datasources/edit/{UID}`
### Step 5.2: Fix Dashboard with Correct UID
**Read dashboard template:** `dashboards/claude-code-overview-template.json`
**Replace all instances of:**
```json
"datasource": {
"type": "prometheus",
"uid": "prometheus"
}
```
**With:**
```json
"datasource": {
"type": "prometheus",
"uid": "PBFA97CFB590B2093"
}
```
**Use detected UID from Step 5.1**
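The replacement can be scripted as a plain text substitution. This is a minimal sketch, assuming the template writes the placeholder exactly as `"uid": "prometheus"` (whitespace after the colon may vary) and that the detected UID contains only letters, digits, `-` and `_`. The function name `rewrite_datasource_uid` is hypothetical.

```bash
# rewrite_datasource_uid UID: read dashboard JSON on stdin and write it to
# stdout with every placeholder "uid": "prometheus" replaced by UID.
# The "type": "prometheus" fields are left untouched.
rewrite_datasource_uid() {
  case "$1" in
    ''|*[!A-Za-z0-9_-]*)
      echo "unexpected UID: $1" >&2
      return 1
      ;;
  esac
  sed -E "s/(\"uid\":[[:space:]]*)\"prometheus\"/\1\"$1\"/g"
}
```

Usage would look like `rewrite_datasource_uid PBFA97CFB590B2093 < dashboards/claude-code-overview-template.json > dashboards/claude-code-overview.json`, run from `~/.claude/telemetry` and with the UID from Step 5.1 substituted for the example value.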
### Step 5.3: Verify Metric Names
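**CRITICAL:** Claude Code metrics end up with a double prefix, `claude_code_claude_code_*`: the metric names already start with `claude_code`, and the collector's Prometheus exporter prepends its own `namespace: claude_code` (Step 2.2).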
**Verify actual metric names:**
```bash
curl -s 'http://localhost:9090/api/v1/label/__name__/values' | \
grep claude_code
```
**Expected metrics:**
- `claude_code_claude_code_active_time_seconds_total`
- `claude_code_claude_code_commit_count_total`
- `claude_code_claude_code_cost_usage_USD_total`
- `claude_code_claude_code_lines_of_code_count_total`
- `claude_code_claude_code_token_usage_tokens_total`
**Dashboard queries must use these exact names**
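To see where the double prefix comes from, the renaming can be approximated in a few lines. This is a simplified sketch, not the exporter's actual algorithm: it only replaces dots with underscores, prepends the namespace, and appends the unit and `_total` for a counter. The real exporter applies further rules (unit name mapping, suffix de-duplication), so always confirm names with the query above. The OpenTelemetry names and units used in the usage note (`claude_code.token.usage` in `tokens`, `claude_code.cost.usage` in `USD`) are assumptions about what Claude Code emits, and `prom_metric_name` is a hypothetical helper.

```bash
# prom_metric_name NAMESPACE OTEL_NAME UNIT: approximate the Prometheus name
# of a monotonic counter exported through a namespaced Prometheus exporter.
prom_metric_name() {
  local base
  base=$(printf '%s' "$2" | tr '.' '_')   # claude_code.token.usage -> claude_code_token_usage
  printf '%s_%s_%s_total\n' "$1" "$base" "$3"
}
```

Under these assumptions, `prom_metric_name claude_code claude_code.token.usage tokens` yields `claude_code_claude_code_token_usage_tokens_total`, matching the list above.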
### Step 5.4: Save Corrected Dashboard
**Write to:** `~/.claude/telemetry/dashboards/claude-code-overview.json`
### Step 5.5: Import Dashboard
**Option A: Via Grafana UI**
1. Open http://localhost:3000 (admin/admin)
2. Dashboards → New → Import
3. Upload JSON file: `~/.claude/telemetry/dashboards/claude-code-overview.json`
4. Click Import
**Option B: Via API**
```bash
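# Wrap the dashboard in the {"dashboard": ...} envelope the API expects, then POST it from stdin
jq '{dashboard: (. + {id: null}), overwrite: true}' "$HOME/.claude/telemetry/dashboards/claude-code-overview.json" | \
curl -X POST http://admin:admin@localhost:3000/api/dashboards/db \
-H "Content-Type: application/json" \
-d @-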
```
---
## Phase 6: Verification & Testing
### Step 6.1: Verify OTEL Collector Receiving Data
**Note:** Claude Code must be restarted for telemetry to activate!
```bash
# Check OTEL Collector logs for incoming data (the collector logs to stderr, hence 2>&1)
docker logs claude-otel-collector --tail 50 2>&1 | grep -i "received"
```
**Expected:** Messages about receiving OTLP data
**If no data:**
```
Reminder: You must restart Claude Code for telemetry to activate.
1. Exit current Claude Code session
2. Start new session: claude
3. Wait 60 seconds
4. Check again
```
### Step 6.2: Query Prometheus for Metrics
```bash
# Check if any claude_code metrics exist
curl -s 'http://localhost:9090/api/v1/label/__name__/values' | \
jq '.data[] | select(. | startswith("claude_code"))'
```
**Expected:** List of claude_code metrics
**Sample query:**
```bash
curl -s 'http://localhost:9090/api/v1/query?query=claude_code_claude_code_lines_of_code_count_total' | \
jq '.data.result'
```
**Expected:** Non-empty result array
### Step 6.3: Test Grafana Dashboard
1. Open http://localhost:3000
2. Navigate to imported dashboard
3. Check panels show data (or "No data" if Claude Code hasn't been used yet)
**If "No data":**
- Normal if Claude Code hasn't generated any activity yet
- Use Claude Code for 1-2 minutes
- Refresh dashboard
**If "Datasource not found":**
- UID mismatch - go back to Step 5.1
**If queries fail:**
- Metric name mismatch - verify double prefix
### Step 6.4: Generate Test Data
**To populate dashboard quickly:**
```
Use Claude Code to:
1. Ask a question (generates token usage)
2. Request a code modification (generates LOC metrics)
3. Have a conversation (generates active time)
```
**Wait 60 seconds, then refresh Grafana dashboard**
---
## Phase 7: Documentation & Quickstart Guide
### Step 7.1: Create Quickstart Guide
**Write to:** `~/.claude/telemetry/docs/quickstart.md`
**Include:**
- URLs and credentials
- Management commands (start/stop)
- What metrics are being collected
- How to access dashboards
- Troubleshooting quick reference
**Template:** `data/quickstart-template.md`
### Step 7.2: Provide User Summary
```
✅ Setup Complete!
📦 Installation:
Location: ~/.claude/telemetry/
Containers: 4 running (OTEL Collector, Prometheus, Loki, Grafana)
🌐 Access URLs:
Grafana: http://localhost:3000 (admin/admin)
Prometheus: http://localhost:9090
OTEL Collector: localhost:4317 (gRPC), localhost:4318 (HTTP)
📊 Dashboards Imported:
✓ Claude Code - Overview
📝 What's Being Collected:
• Session counts and active time
• Token usage (input/output/cached)
• API costs by model
• Lines of code modified
• Commits and PRs created
• Tool execution metrics
⚙️ Management:
Start: ~/.claude/telemetry/start-telemetry.sh
Stop: ~/.claude/telemetry/stop-telemetry.sh (preserves data)
Cleanup: ~/.claude/telemetry/cleanup-telemetry.sh (removes all data)
Logs: docker logs claude-otel-collector
🚀 Next Steps:
1. ✅ Restart Claude Code (telemetry activates on startup)
2. Use Claude Code normally
3. Check dashboard in ~60 seconds
4. Review quickstart: ~/.claude/telemetry/docs/quickstart.md
📚 Documentation:
- Quickstart: ~/.claude/telemetry/docs/quickstart.md
- Metrics Reference: data/metrics-reference.md
- Troubleshooting: data/troubleshooting.md
```
---
## Cleanup Instructions
### Remove Stack (Keep Data)
```bash
cd ~/.claude/telemetry
docker compose down
```
### Remove Stack and Data
```bash
cd ~/.claude/telemetry
docker compose down -v
```
### Remove Telemetry from Claude Code
Edit `~/.claude/settings.json` and remove the telemetry variables from the `env` section (keep any unrelated variables), or set:
```json
"CLAUDE_CODE_ENABLE_TELEMETRY": "0"
```
Then restart Claude Code.
---
## Troubleshooting
See `data/troubleshooting.md` for detailed solutions to common issues.
**Quick fixes:**
- Container won't start → Check logs: `docker logs claude-otel-collector`
- No metrics → Restart Claude Code
- Dashboard broken → Verify datasource UID
- Wrong metric names → Use double prefix: `claude_code_claude_code_*`

@@ -0,0 +1,572 @@
# Mode 2: Enterprise Setup (Connect to Existing Infrastructure)
**Goal:** Configure Claude Code to send telemetry to centralized company infrastructure
**When to use:**
- Company has centralized OTEL Collector endpoint
- Team rollout scenario
- Want aggregated team metrics
- Privacy/compliance requires centralized control
- No need for local Grafana dashboards
**Prerequisites:**
- OTEL Collector endpoint URL (e.g., `https://otel.company.com:4317`)
- Authentication credentials (API key or mTLS certificates)
- Optional: Team/department identifiers
- Write access to `~/.claude/settings.json`
**Estimated Time:** 2-3 minutes
---
## Phase 0: Gather Requirements
### Step 0.1: Collect endpoint information from user
Ask the user for the following details:
1. **OTEL Collector Endpoint URL**
   - Format: `https://otel.company.com:4317` or `http://otel.company.com:4318`
   - Protocol: gRPC (port 4317) or HTTP (port 4318)
2. **Authentication Method**
   - API Key/Bearer Token
   - mTLS certificates
   - Basic Auth
   - No authentication (internal network)
3. **Team/Environment Identifiers**
   - Team name (e.g., `team=platform`)
   - Environment (e.g., `environment=production`)
   - Department (e.g., `department=engineering`)
   - Any other custom attributes
4. **Optional: Protocol Preferences**
   - Default: gRPC (more efficient)
   - Alternative: HTTP (better firewall compatibility)
**Example Questions:**
```
To configure enterprise telemetry, I need a few details:
1. **Endpoint:** What is your OTEL Collector endpoint URL?
(e.g., https://otel.company.com:4317)
2. **Protocol:** HTTPS or HTTP? gRPC or HTTP/protobuf?
3. **Authentication:** Do you have an API key, certificate, or other credentials?
4. **Team identifier:** What team/department should metrics be tagged with?
(e.g., team=platform, department=engineering)
```
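When the user supplies only an endpoint URL, the protocol can often be inferred from the port, following the convention above (4317 for gRPC, 4318 for HTTP). The sketch below is a heuristic under that assumption; a non-standard port still needs a direct answer from the user. `otlp_protocol_for` is a hypothetical helper name.

```bash
# otlp_protocol_for ENDPOINT: suggest an OTEL_EXPORTER_OTLP_PROTOCOL value
# based on the port in ENDPOINT, or "unknown" when the port is not standard.
otlp_protocol_for() {
  local port="${1##*:}"   # text after the last colon
  port="${port%%/*}"      # drop any path that follows the port
  case "$port" in
    4317) echo "grpc" ;;
    4318) echo "http/protobuf" ;;
    *)    echo "unknown" ;;
  esac
}
```

For example, `otlp_protocol_for https://otel.company.com:4318` suggests `http/protobuf`, which corresponds to Scenario C in Phase 2.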
---
## Phase 1: Backup Existing Settings
### Step 1.1: Backup settings.json
**Always backup before modifying!**
```bash
# Check if settings.json exists
if [ -f ~/.claude/settings.json ]; then
  # Compute the timestamped name once so the message matches the file created
  backup=~/.claude/settings.json.backup.$(date +%Y%m%d-%H%M%S)
  cp ~/.claude/settings.json "$backup"
  echo "✅ Backup created: $backup"
else
  echo "⚠️ No existing settings.json found - will create new one"
fi
```
### Step 1.2: Read existing settings
```bash
# Check current settings
cat ~/.claude/settings.json
```
**Important:** Preserve all existing settings when adding telemetry configuration!
---
## Phase 2: Update Claude Code Settings
### Step 2.1: Determine configuration based on authentication method
**Scenario A: API Key Authentication**
```json
{
  "env": {
    "CLAUDE_CODE_ENABLE_TELEMETRY": "1",
    "OTEL_METRICS_EXPORTER": "otlp",
    "OTEL_LOGS_EXPORTER": "otlp",
    "OTEL_EXPORTER_OTLP_PROTOCOL": "grpc",
    "OTEL_EXPORTER_OTLP_ENDPOINT": "https://otel.company.com:4317",
    "OTEL_EXPORTER_OTLP_HEADERS": "Authorization=Bearer YOUR_API_KEY_HERE",
    "OTEL_METRIC_EXPORT_INTERVAL": "60000",
    "OTEL_LOGS_EXPORT_INTERVAL": "5000",
    "OTEL_LOG_USER_PROMPTS": "1",
    "OTEL_METRICS_INCLUDE_SESSION_ID": "true",
    "OTEL_METRICS_INCLUDE_VERSION": "true",
    "OTEL_METRICS_INCLUDE_ACCOUNT_UUID": "true",
    "OTEL_RESOURCE_ATTRIBUTES": "team=platform,environment=production,deployment=enterprise"
  }
}
```
**Scenario B: mTLS Certificate Authentication**
```json
{
  "env": {
    "CLAUDE_CODE_ENABLE_TELEMETRY": "1",
    "OTEL_METRICS_EXPORTER": "otlp",
    "OTEL_LOGS_EXPORTER": "otlp",
    "OTEL_EXPORTER_OTLP_PROTOCOL": "grpc",
    "OTEL_EXPORTER_OTLP_ENDPOINT": "https://otel.company.com:4317",
    "OTEL_EXPORTER_OTLP_CERTIFICATE": "/path/to/ca-cert.pem",
    "OTEL_EXPORTER_OTLP_CLIENT_KEY": "/path/to/client-key.pem",
    "OTEL_EXPORTER_OTLP_CLIENT_CERTIFICATE": "/path/to/client-cert.pem",
    "OTEL_METRIC_EXPORT_INTERVAL": "60000",
    "OTEL_LOGS_EXPORT_INTERVAL": "5000",
    "OTEL_LOG_USER_PROMPTS": "1",
    "OTEL_METRICS_INCLUDE_SESSION_ID": "true",
    "OTEL_METRICS_INCLUDE_VERSION": "true",
    "OTEL_METRICS_INCLUDE_ACCOUNT_UUID": "true",
    "OTEL_RESOURCE_ATTRIBUTES": "team=platform,environment=production,deployment=enterprise"
  }
}
```
**Scenario C: HTTP Protocol (Port 4318)**
```json
{
  "env": {
    "CLAUDE_CODE_ENABLE_TELEMETRY": "1",
    "OTEL_METRICS_EXPORTER": "otlp",
    "OTEL_LOGS_EXPORTER": "otlp",
    "OTEL_EXPORTER_OTLP_PROTOCOL": "http/protobuf",
    "OTEL_EXPORTER_OTLP_ENDPOINT": "https://otel.company.com:4318",
    "OTEL_EXPORTER_OTLP_HEADERS": "Authorization=Bearer YOUR_API_KEY_HERE",
    "OTEL_METRIC_EXPORT_INTERVAL": "60000",
    "OTEL_LOGS_EXPORT_INTERVAL": "5000",
    "OTEL_LOG_USER_PROMPTS": "1",
    "OTEL_METRICS_INCLUDE_SESSION_ID": "true",
    "OTEL_METRICS_INCLUDE_VERSION": "true",
    "OTEL_METRICS_INCLUDE_ACCOUNT_UUID": "true",
    "OTEL_RESOURCE_ATTRIBUTES": "team=platform,environment=production,deployment=enterprise"
  }
}
```
**Scenario D: No Authentication (Internal Network)**
```json
{
  "env": {
    "CLAUDE_CODE_ENABLE_TELEMETRY": "1",
    "OTEL_METRICS_EXPORTER": "otlp",
    "OTEL_LOGS_EXPORTER": "otlp",
    "OTEL_EXPORTER_OTLP_PROTOCOL": "grpc",
    "OTEL_EXPORTER_OTLP_ENDPOINT": "http://otel.internal.company.com:4317",
    "OTEL_METRIC_EXPORT_INTERVAL": "60000",
    "OTEL_LOGS_EXPORT_INTERVAL": "5000",
    "OTEL_LOG_USER_PROMPTS": "1",
    "OTEL_METRICS_INCLUDE_SESSION_ID": "true",
    "OTEL_METRICS_INCLUDE_VERSION": "true",
    "OTEL_METRICS_INCLUDE_ACCOUNT_UUID": "true",
    "OTEL_RESOURCE_ATTRIBUTES": "team=platform,environment=production,deployment=enterprise"
  }
}
```
### Step 2.2: Update settings.json
**Method 1: Manual Update (Safest)**
1. Open `~/.claude/settings.json` in editor
2. Merge the telemetry configuration into existing `env` object
3. Preserve all other settings
4. Save file
**Method 2: Programmatic Update (Use with Caution)**
```bash
# Merge the telemetry variables into the existing env object (requires jq).
# The result goes to a temporary file that replaces settings.json only if
# jq succeeded and produced a JSON object (an empty file would not pass).
if jq '.env = ((.env // {}) + {
    "CLAUDE_CODE_ENABLE_TELEMETRY": "1",
    "OTEL_METRICS_EXPORTER": "otlp",
    "OTEL_LOGS_EXPORTER": "otlp",
    "OTEL_EXPORTER_OTLP_PROTOCOL": "grpc",
    "OTEL_EXPORTER_OTLP_ENDPOINT": "https://otel.company.com:4317",
    "OTEL_EXPORTER_OTLP_HEADERS": "Authorization=Bearer YOUR_API_KEY_HERE",
    "OTEL_METRIC_EXPORT_INTERVAL": "60000",
    "OTEL_LOGS_EXPORT_INTERVAL": "5000",
    "OTEL_LOG_USER_PROMPTS": "1",
    "OTEL_METRICS_INCLUDE_SESSION_ID": "true",
    "OTEL_METRICS_INCLUDE_VERSION": "true",
    "OTEL_METRICS_INCLUDE_ACCOUNT_UUID": "true",
    "OTEL_RESOURCE_ATTRIBUTES": "team=platform,environment=production,deployment=enterprise"
  })' ~/.claude/settings.json > ~/.claude/settings.json.new \
  && jq -e 'type == "object"' ~/.claude/settings.json.new > /dev/null; then
  mv ~/.claude/settings.json.new ~/.claude/settings.json
  echo "✅ Settings updated successfully"
else
  echo "❌ Merge failed - settings.json left unchanged"
  rm -f ~/.claude/settings.json.new
fi
```
### Step 2.3: Validate configuration
```bash
# Check that settings.json is valid JSON
jq empty ~/.claude/settings.json
# Display telemetry configuration
jq '.env | with_entries(select(.key | startswith("OTEL_") or . == "CLAUDE_CODE_ENABLE_TELEMETRY"))' ~/.claude/settings.json
```
---
## Phase 3: Test Connectivity (Optional)
### Step 3.1: Test OTEL endpoint reachability
```bash
# Test gRPC endpoint (port 4317)
nc -zv otel.company.com 4317
# Test HTTP endpoint (port 4318)
curl -v https://otel.company.com:4318/v1/metrics -d '{}' -H "Content-Type: application/json"
```
### Step 3.2: Validate authentication
```bash
# Test with API key
curl -v https://otel.company.com:4318/v1/metrics \
-H "Authorization: Bearer YOUR_API_KEY_HERE" \
-H "Content-Type: application/json" \
-d '{}'
# Expected: 200 (request accepted) or 401/403 (endpoint reachable, but credentials rejected)
# Unexpected: Connection refused, timeout (network issue)
```
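The outcomes of this test fall into a few cases that can be told apart from the HTTP status alone. The sketch below assumes the HTTP endpoint on port 4318 and relies on curl reporting `000` when no HTTP response was received; `explain_otlp_status` and `probe_otlp` are hypothetical helper names.

```bash
# explain_otlp_status CODE: interpret the HTTP status of the test request.
explain_otlp_status() {
  case "$1" in
    2??)     echo "accepted: the collector took the request" ;;
    401|403) echo "rejected: endpoint reachable, credentials refused" ;;
    000)     echo "unreachable: connection failed or timed out" ;;
    *)       echo "unexpected: HTTP $1" ;;
  esac
}

# probe_otlp URL TOKEN: send an empty JSON body and print only the status code.
probe_otlp() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 10 \
    -H "Authorization: Bearer $2" \
    -H "Content-Type: application/json" \
    -d '{}' "$1"
}
```

A combined call such as `explain_otlp_status "$(probe_otlp https://otel.company.com:4318/v1/metrics YOUR_API_KEY_HERE)"` prints a one-line diagnosis instead of the verbose curl trace.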
---
## Phase 4: User Instructions
### Step 4.1: Provide restart instructions
**Display to user:**
```
✅ Configuration complete!
**Important Next Steps:**
1. **Restart Claude Code** for telemetry to take effect
- Telemetry configuration is only loaded at startup
- Close all Claude Code sessions and restart
2. **Verify with your platform team** that they see metrics
- Metrics should appear within 60 seconds of restart
- Tagged with: team=platform, environment=production
- Metric prefix: claude_code_claude_code_* (if the central collector's Prometheus exporter also uses the claude_code namespace)
3. **Dashboard access**
- Contact your platform team for Grafana/dashboard URLs
- Dashboards should be centrally managed
**Troubleshooting:**
If metrics don't appear:
- Check network connectivity to OTEL endpoint
- Verify authentication credentials are correct
- Check firewall rules allow outbound connections
- Review OTEL Collector logs on backend (platform team)
- Verify OTEL_EXPORTER_OTLP_ENDPOINT is correct
**Rollback:**
If you need to disable telemetry:
- Restore backup: cp ~/.claude/settings.json.backup.TIMESTAMP ~/.claude/settings.json
- Or set: "CLAUDE_CODE_ENABLE_TELEMETRY": "0"
```
---
## Phase 5: Create Team Rollout Documentation
### Step 5.1: Generate rollout guide for team distribution
**Create file: `claude-code-telemetry-setup-guide.md`**
```markdown
# Claude Code Telemetry Setup Guide
**For:** [Team Name] Team Members
**Last Updated:** [Date]
## Overview
We're collecting Claude Code usage telemetry to:
- Track API costs and optimize spending
- Measure productivity metrics (LOC, commits, PRs)
- Understand token usage patterns
- Identify high-value use cases
**Privacy:** Dashboards report metrics aggregated at the team level. The raw telemetry still carries session IDs and account UUIDs, as configured below.
## Setup Instructions
### Step 1: Backup Your Settings
```bash
cp ~/.claude/settings.json ~/.claude/settings.json.backup
```
### Step 2: Update Configuration
Add the following to your `~/.claude/settings.json`:
```json
{
  "env": {
    "CLAUDE_CODE_ENABLE_TELEMETRY": "1",
    "OTEL_METRICS_EXPORTER": "otlp",
    "OTEL_LOGS_EXPORTER": "otlp",
    "OTEL_EXPORTER_OTLP_PROTOCOL": "grpc",
    "OTEL_EXPORTER_OTLP_ENDPOINT": "https://otel.company.com:4317",
    "OTEL_EXPORTER_OTLP_HEADERS": "Authorization=Bearer [PROVIDED_BY_PLATFORM_TEAM]",
    "OTEL_METRIC_EXPORT_INTERVAL": "60000",
    "OTEL_LOGS_EXPORT_INTERVAL": "5000",
    "OTEL_LOG_USER_PROMPTS": "1",
    "OTEL_METRICS_INCLUDE_SESSION_ID": "true",
    "OTEL_METRICS_INCLUDE_VERSION": "true",
    "OTEL_METRICS_INCLUDE_ACCOUNT_UUID": "true",
    "OTEL_RESOURCE_ATTRIBUTES": "team=[TEAM_NAME],environment=production"
  }
}
```
**Important:** Replace `[PROVIDED_BY_PLATFORM_TEAM]` with your API key.
### Step 3: Restart Claude Code
Close all Claude Code sessions and restart for changes to take effect.
### Step 4: Verify Setup
After 5 minutes of usage:
1. Check team dashboard: [DASHBOARD_URL]
2. Verify your metrics appear in the team aggregation
3. Contact [TEAM_CONTACT] if you have issues
## What's Being Collected?
**Metrics:**
- Session counts and active time
- Token usage (input, output, cached)
- API costs by model
- Lines of code modified
- Commits and PRs created
**Events/Logs:**
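- User prompts (full prompt text, because `OTEL_LOG_USER_PROMPTS` is set to `1`; remove that setting to log only prompt length)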
- Tool executions
- API requests
**NOT Collected:**
- Source code content
- File names or paths
- Personal identifiers (beyond account UUID for deduplication)
## Dashboard Access
**Team Dashboard:** [URL]
**Login:** Use your company SSO
## Support
**Issues?** Contact [TEAM_CONTACT] or #claude-code-telemetry Slack channel
**Opt-Out:** Contact [TEAM_CONTACT] if you need to opt out for specific projects
```
---
## Phase 6: Success Criteria
### Checklist for Mode 2 completion:
- ✅ Backed up existing settings.json
- ✅ Updated settings with correct OTEL endpoint
- ✅ Added authentication (API key or certificates)
- ✅ Set team/environment resource attributes
- ✅ Validated JSON configuration
- ✅ Tested connectivity (optional)
- ✅ Provided restart instructions to user
- ✅ Created team rollout documentation (if applicable)
**Expected outcome:**
- Claude Code sends telemetry to central endpoint within 60 seconds of restart
- Platform team can see metrics tagged with team identifier
- User has clear instructions for verification and troubleshooting
---
## Troubleshooting
### Issue 1: Connection Refused
**Symptoms:** Claude Code can't reach OTEL endpoint
**Checks:**
```bash
# Test network connectivity
ping otel.company.com
# Test port access
nc -zv otel.company.com 4317
# Check corporate VPN/proxy
echo $HTTPS_PROXY
```
**Solutions:**
- Connect to corporate VPN
- Use HTTP proxy if required: `HTTPS_PROXY=http://proxy.company.com:8080`
- Try HTTP protocol (port 4318) instead of gRPC
- Contact network team to allow outbound connections
### Issue 2: Authentication Failed
**Symptoms:** 401 or 403 errors in logs
**Checks:**
```bash
# Verify API key format
jq '.env.OTEL_EXPORTER_OTLP_HEADERS' ~/.claude/settings.json
# Test manually
curl -v https://otel.company.com:4318/v1/metrics \
-H "Authorization: Bearer YOUR_KEY" \
-d '{}'
```
**Solutions:**
- Verify API key is correct and not expired
- Check header format: `Authorization=Bearer TOKEN` (no quotes, equals sign)
- Confirm permissions with platform team
- Try rotating API key
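The header format rule can be checked before restarting Claude Code. This is a rough sketch, assuming header names made of letters, digits, `-` and `_`, and values without quotes or commas; it is a format check only and says nothing about whether the key itself is valid. `valid_otlp_headers` is a hypothetical helper name.

```bash
# valid_otlp_headers VALUE: succeed if VALUE looks like
# "Key=Value" or "Key=Value,Key2=Value2", with no quotes in it.
valid_otlp_headers() {
  printf '%s\n' "$1" |
    grep -Eq '^[A-Za-z0-9_-]+=[^",]+(,[A-Za-z0-9_-]+=[^",]+)*$'
}
```

For example, `valid_otlp_headers "$(jq -r '.env.OTEL_EXPORTER_OTLP_HEADERS' ~/.claude/settings.json)"` fails for a value written with a colon (`Authorization: Bearer ...`) or wrapped in an extra pair of quotes.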
### Issue 3: Metrics Not Appearing
**Symptoms:** Platform team doesn't see metrics after 5 minutes
**Checks:**
```bash
# Verify telemetry is enabled
jq '.env.CLAUDE_CODE_ENABLE_TELEMETRY' ~/.claude/settings.json
# Check endpoint configuration
jq '.env.OTEL_EXPORTER_OTLP_ENDPOINT' ~/.claude/settings.json
# Confirm Claude Code was restarted (check the start time; the [c] keeps grep from matching itself)
ps aux | grep '[c]laude'
```
**Solutions:**
- Restart Claude Code (telemetry loads at startup only)
- Verify endpoint URL has correct protocol and port
- Check with platform team if OTEL Collector is receiving data
- Review OTEL Collector logs for errors
- Verify resource attributes match expected format
### Issue 4: Certificate Errors (mTLS)
**Symptoms:** SSL/TLS handshake errors
**Checks:**
```bash
# Verify certificate paths
ls -la /path/to/client-cert.pem
ls -la /path/to/client-key.pem
ls -la /path/to/ca-cert.pem
# Check certificate validity
openssl x509 -in /path/to/client-cert.pem -noout -dates
```
**Solutions:**
- Ensure certificate files are readable
- Verify certificates haven't expired
- Check certificate chain is complete
- Confirm CA certificate matches server
- Contact platform team for new certificates if needed
---
## Enterprise Configuration Examples
### Example 1: Multi-Environment Setup
**Development:**
```json
"OTEL_RESOURCE_ATTRIBUTES": "team=platform,environment=development,user=john.doe"
```
**Staging:**
```json
"OTEL_RESOURCE_ATTRIBUTES": "team=platform,environment=staging,user=john.doe"
```
**Production:**
```json
"OTEL_RESOURCE_ATTRIBUTES": "team=platform,environment=production,user=john.doe"
```
### Example 2: Department-Level Aggregation
```json
"OTEL_RESOURCE_ATTRIBUTES": "department=engineering,team=platform,squad=backend,environment=production"
```
Enables queries like:
- Cost by department
- Usage by team within department
- Squad-level productivity metrics
### Example 3: Project-Based Tagging
```json
"OTEL_RESOURCE_ATTRIBUTES": "team=platform,project=api-v2-migration,environment=production"
```
Track costs and effort for specific initiatives.
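The attribute strings in these examples follow one pattern: comma-separated `key=value` pairs. A small helper can assemble them and catch values that would break the format. The sketch below deliberately restricts keys and values to letters, digits, `.`, `_` and `-`, which is stricter than the format requires; `resource_attributes` is a hypothetical helper name.

```bash
# resource_attributes KEY=VALUE...: join the pairs into a string suitable for
# OTEL_RESOURCE_ATTRIBUTES. Fails on pairs containing spaces, commas, or
# anything else outside the restricted character set.
resource_attributes() {
  local pair
  local out=""
  for pair in "$@"; do
    if ! printf '%s\n' "$pair" | grep -Eq '^[A-Za-z0-9_.-]+=[A-Za-z0-9_.-]+$'; then
      echo "invalid attribute: $pair" >&2
      return 1
    fi
    out="${out:+$out,}$pair"   # add a comma only between pairs
  done
  printf '%s\n' "$out"
}
```

Calling `resource_attributes department=engineering team=platform squad=backend environment=production` reproduces the string from Example 2.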
---
## Additional Resources
- **OTEL Specification:** https://opentelemetry.io/docs/specs/otel/
- **Claude Code Metrics Reference:** See `data/metrics-reference.md`
- **Enterprise Architecture:** See `data/enterprise-architecture.md`
- **Team Dashboard Queries:** See `data/prometheus-queries.md`
---
**Mode 2 Complete!**