757 lines
11 KiB
Markdown
757 lines
11 KiB
Markdown
# DQL (Datadog Query Language) ↔ PromQL Translation Guide
|
|
|
|
## Quick Reference
|
|
|
|
| Concept | Datadog (DQL) | Prometheus (PromQL) |
|
|
|---------|---------------|---------------------|
|
|
| Aggregation | `avg:`, `sum:`, `min:`, `max:` | `avg()`, `sum()`, `min()`, `max()` |
|
|
| Rate | `.as_rate()`, `.as_count()` | `rate()`, `increase()` |
|
|
| Percentile | `p50:`, `p95:`, `p99:` | `histogram_quantile()` |
|
|
| Filtering | `{tag:value}` | `{label="value"}` |
|
|
| Time window | `last_5m`, `last_1h` | `[5m]`, `[1h]` |
|
|
|
|
---
|
|
|
|
## Basic Queries
|
|
|
|
### Simple Metric Query
|
|
|
|
**Datadog**:
|
|
```
|
|
system.cpu.user
|
|
```
|
|
|
|
**Prometheus**:
|
|
```promql
|
|
node_cpu_seconds_total{mode="user"}
|
|
```
|
|
|
|
---
|
|
|
|
### Metric with Filter
|
|
|
|
**Datadog**:
|
|
```
|
|
system.cpu.user{host:web-01}
|
|
```
|
|
|
|
**Prometheus**:
|
|
```promql
|
|
node_cpu_seconds_total{mode="user", instance="web-01"}
|
|
```
|
|
|
|
---
|
|
|
|
### Multiple Filters (AND)
|
|
|
|
**Datadog**:
|
|
```
|
|
system.cpu.user{host:web-01,env:production}
|
|
```
|
|
|
|
**Prometheus**:
|
|
```promql
|
|
node_cpu_seconds_total{mode="user", instance="web-01", env="production"}
|
|
```
|
|
|
|
---
|
|
|
|
### Wildcard Filters
|
|
|
|
**Datadog**:
|
|
```
|
|
system.cpu.user{host:web-*}
|
|
```
|
|
|
|
**Prometheus**:
|
|
```promql
|
|
node_cpu_seconds_total{mode="user", instance=~"web-.*"}
|
|
```
|
|
|
|
---
|
|
|
|
### OR Filters
|
|
|
|
**Datadog**:
|
|
```
|
|
system.cpu.user{host:web-01 OR host:web-02}
|
|
```
|
|
|
|
**Prometheus**:
|
|
```promql
|
|
node_cpu_seconds_total{mode="user", instance=~"web-01|web-02"}
|
|
```
|
|
|
|
---
|
|
|
|
## Aggregations
|
|
|
|
### Average
|
|
|
|
**Datadog**:
|
|
```
|
|
avg:system.cpu.user{*}
|
|
```
|
|
|
|
**Prometheus**:
|
|
```promql
|
|
avg(node_cpu_seconds_total{mode="user"})
|
|
```
|
|
|
|
---
|
|
|
|
### Sum
|
|
|
|
**Datadog**:
|
|
```
|
|
sum:requests.count{*}
|
|
```
|
|
|
|
**Prometheus**:
|
|
```promql
|
|
sum(http_requests_total)
|
|
```
|
|
|
|
---
|
|
|
|
### Min/Max
|
|
|
|
**Datadog**:
|
|
```
|
|
min:system.mem.free{*}
|
|
max:system.mem.free{*}
|
|
```
|
|
|
|
**Prometheus**:
|
|
```promql
|
|
min(node_memory_MemFree_bytes)
|
|
max(node_memory_MemFree_bytes)
|
|
```
|
|
|
|
---
|
|
|
|
### Aggregation by Tag/Label
|
|
|
|
**Datadog**:
|
|
```
|
|
avg:system.cpu.user{*} by {host}
|
|
```
|
|
|
|
**Prometheus**:
|
|
```promql
|
|
avg by (instance) (node_cpu_seconds_total{mode="user"})
|
|
```
|
|
|
|
---
|
|
|
|
## Rates and Counts
|
|
|
|
### Rate (per second)
|
|
|
|
**Datadog**:
|
|
```
|
|
sum:requests.count{*}.as_rate()
|
|
```
|
|
|
|
**Prometheus**:
|
|
```promql
|
|
sum(rate(http_requests_total[5m]))
|
|
```
|
|
|
|
Note: Prometheus requires explicit time window `[5m]`
|
|
|
|
---
|
|
|
|
### Count (total over time)
|
|
|
|
**Datadog**:
|
|
```
|
|
sum:requests.count{*}.as_count()
|
|
```
|
|
|
|
**Prometheus**:
|
|
```promql
|
|
sum(increase(http_requests_total[1h]))
|
|
```
|
|
|
|
---
|
|
|
|
### Derivative (change over time)
|
|
|
|
**Datadog**:
|
|
```
|
|
derivative(avg:system.disk.used{*})
|
|
```
|
|
|
|
**Prometheus**:
|
|
```promql
|
|
deriv(node_filesystem_size_bytes[5m])
|
|
```
|
|
|
|
---
|
|
|
|
## Percentiles
|
|
|
|
### P50 (Median)
|
|
|
|
**Datadog**:
|
|
```
|
|
p50:request.duration{*}
|
|
```
|
|
|
|
**Prometheus** (requires histogram):
|
|
```promql
|
|
histogram_quantile(0.50, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
|
|
```
|
|
|
|
---
|
|
|
|
### P95
|
|
|
|
**Datadog**:
|
|
```
|
|
p95:request.duration{*}
|
|
```
|
|
|
|
**Prometheus**:
|
|
```promql
|
|
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
|
|
```
|
|
|
|
---
|
|
|
|
### P99
|
|
|
|
**Datadog**:
|
|
```
|
|
p99:request.duration{*}
|
|
```
|
|
|
|
**Prometheus**:
|
|
```promql
|
|
histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
|
|
```
|
|
|
|
---
|
|
|
|
## Time Windows
|
|
|
|
### Last 5 minutes
|
|
|
|
**Datadog**:
|
|
```
|
|
avg(last_5m):system.cpu.user{*}
|
|
```
|
|
|
|
**Prometheus**:
|
|
```promql
|
|
avg(node_cpu_seconds_total{mode="user"}[5m])
|
|
```
|
|
|
|
---
|
|
|
|
### Last 1 hour
|
|
|
|
**Datadog**:
|
|
```
|
|
avg(last_1h):system.cpu.user{*}
|
|
```
|
|
|
|
**Prometheus**:
|
|
```promql
|
|
avg_over_time(node_cpu_seconds_total{mode="user"}[1h])
|
|
```
|
|
|
|
---
|
|
|
|
## Math Operations
|
|
|
|
### Division
|
|
|
|
**Datadog**:
|
|
```
|
|
avg:system.mem.used{*} / avg:system.mem.total{*}
|
|
```
|
|
|
|
**Prometheus**:
|
|
```promql
|
|
node_memory_MemUsed_bytes / node_memory_MemTotal_bytes
|
|
```
|
|
|
|
---
|
|
|
|
### Multiplication
|
|
|
|
**Datadog**:
|
|
```
|
|
avg:system.cpu.user{*} * 100
|
|
```
|
|
|
|
**Prometheus**:
|
|
```promql
|
|
avg(node_cpu_seconds_total{mode="user"}) * 100
|
|
```
|
|
|
|
---
|
|
|
|
### Percentage Calculation
|
|
|
|
**Datadog**:
|
|
```
|
|
(sum:requests.errors{*} / sum:requests.count{*}) * 100
|
|
```
|
|
|
|
**Prometheus**:
|
|
```promql
|
|
(sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))) * 100
|
|
```
|
|
|
|
---
|
|
|
|
## Common Use Cases
|
|
|
|
### CPU Usage Percentage
|
|
|
|
**Datadog**:
|
|
```
|
|
100 - avg:system.cpu.idle{*}
|
|
```
|
|
|
|
**Prometheus**:
|
|
```promql
|
|
100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
|
|
```
|
|
|
|
---
|
|
|
|
### Memory Usage Percentage
|
|
|
|
**Datadog**:
|
|
```
|
|
(avg:system.mem.used{*} / avg:system.mem.total{*}) * 100
|
|
```
|
|
|
|
**Prometheus**:
|
|
```promql
|
|
(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100
|
|
```
|
|
|
|
---
|
|
|
|
### Disk Usage Percentage
|
|
|
|
**Datadog**:
|
|
```
|
|
(avg:system.disk.used{*} / avg:system.disk.total{*}) * 100
|
|
```
|
|
|
|
**Prometheus**:
|
|
```promql
|
|
(node_filesystem_size_bytes - node_filesystem_free_bytes) / node_filesystem_size_bytes * 100
|
|
```
|
|
|
|
---
|
|
|
|
### Request Rate (requests/sec)
|
|
|
|
**Datadog**:
|
|
```
|
|
sum:requests.count{*}.as_rate()
|
|
```
|
|
|
|
**Prometheus**:
|
|
```promql
|
|
sum(rate(http_requests_total[5m]))
|
|
```
|
|
|
|
---
|
|
|
|
### Error Rate Percentage
|
|
|
|
**Datadog**:
|
|
```
|
|
(sum:requests.errors{*}.as_rate() / sum:requests.count{*}.as_rate()) * 100
|
|
```
|
|
|
|
**Prometheus**:
|
|
```promql
|
|
(sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))) * 100
|
|
```
|
|
|
|
---
|
|
|
|
### Request Latency (P95)
|
|
|
|
**Datadog**:
|
|
```
|
|
p95:request.duration{*}
|
|
```
|
|
|
|
**Prometheus**:
|
|
```promql
|
|
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
|
|
```
|
|
|
|
---
|
|
|
|
### Top 5 Hosts by CPU
|
|
|
|
**Datadog**:
|
|
```
|
|
top(avg:system.cpu.user{*} by {host}, 5, 'mean', 'desc')
|
|
```
|
|
|
|
**Prometheus**:
|
|
```promql
|
|
topk(5, avg by (instance) (rate(node_cpu_seconds_total{mode="user"}[5m])))
|
|
```
|
|
|
|
---
|
|
|
|
## Functions
|
|
|
|
### Absolute Value
|
|
|
|
**Datadog**:
|
|
```
|
|
abs(diff(avg:system.cpu.user{*}))
|
|
```
|
|
|
|
**Prometheus**:
|
|
```promql
|
|
abs(delta(node_cpu_seconds_total{mode="user"}[5m]))
|
|
```
|
|
|
|
---
|
|
|
|
### Ceiling/Floor
|
|
|
|
**Datadog**:
|
|
```
|
|
ceil(avg:system.cpu.user{*})
|
|
floor(avg:system.cpu.user{*})
|
|
```
|
|
|
|
**Prometheus**:
|
|
```promql
|
|
ceil(avg(node_cpu_seconds_total{mode="user"}))
|
|
floor(avg(node_cpu_seconds_total{mode="user"}))
|
|
```
|
|
|
|
---
|
|
|
|
### Clamp (Limit Range)
|
|
|
|
**Datadog**:
|
|
```
|
|
clamp_min(avg:system.cpu.user{*}, 0)
|
|
clamp_max(avg:system.cpu.user{*}, 100)
|
|
```
|
|
|
|
**Prometheus**:
|
|
```promql
|
|
clamp_min(avg(node_cpu_seconds_total{mode="user"}), 0)
|
|
clamp_max(avg(node_cpu_seconds_total{mode="user"}), 100)
|
|
```
|
|
|
|
---
|
|
|
|
### Moving Average
|
|
|
|
**Datadog**:
|
|
```
|
|
moving_rollup(avg:system.cpu.user{*}, 60, 'avg')
|
|
```
|
|
|
|
**Prometheus**:
|
|
```promql
|
|
avg_over_time(node_cpu_seconds_total{mode="user"}[1h])
|
|
```
|
|
|
|
---
|
|
|
|
## Advanced Patterns
|
|
|
|
### Compare to Previous Period
|
|
|
|
**Datadog**:
|
|
```
|
|
sum:requests.count{*}.as_rate() / timeshift(sum:requests.count{*}.as_rate(), 3600)
|
|
```
|
|
|
|
**Prometheus**:
|
|
```promql
|
|
sum(rate(http_requests_total[5m])) / sum(rate(http_requests_total[5m] offset 1h))
|
|
```
|
|
|
|
---
|
|
|
|
### Forecast
|
|
|
|
**Datadog**:
|
|
```
|
|
forecast(avg:system.disk.used{*}, 'linear', 1)
|
|
```
|
|
|
|
**Prometheus**:
|
|
```promql
|
|
predict_linear(node_filesystem_size_bytes[1h], 3600)
|
|
```
|
|
|
|
Note: Predicts value 1 hour in future based on last 1 hour trend
|
|
|
|
---
|
|
|
|
### Anomaly Detection
|
|
|
|
**Datadog**:
|
|
```
|
|
anomalies(avg:system.cpu.user{*}, 'basic', 2)
|
|
```
|
|
|
|
**Prometheus**: No built-in function
|
|
- Use recording rules with stddev
|
|
- External tools like **Robust Perception's anomaly detector**
|
|
- Or use **Grafana ML** plugin
|
|
|
|
---
|
|
|
|
### Outlier Detection
|
|
|
|
**Datadog**:
|
|
```
|
|
outliers(avg:system.cpu.user{*} by {host}, 'mad')
|
|
```
|
|
|
|
**Prometheus**: No built-in function
|
|
- Calculate manually with stddev:
|
|
```promql
|
|
abs(metric - avg(metric)) > 2 * stddev(metric)
|
|
```
|
|
|
|
---
|
|
|
|
## Container & Kubernetes
|
|
|
|
### Container CPU Usage
|
|
|
|
**Datadog**:
|
|
```
|
|
avg:docker.cpu.usage{*} by {container_name}
|
|
```
|
|
|
|
**Prometheus**:
|
|
```promql
|
|
avg by (container) (rate(container_cpu_usage_seconds_total[5m]))
|
|
```
|
|
|
|
---
|
|
|
|
### Container Memory Usage
|
|
|
|
**Datadog**:
|
|
```
|
|
avg:docker.mem.rss{*} by {container_name}
|
|
```
|
|
|
|
**Prometheus**:
|
|
```promql
|
|
avg by (container) (container_memory_rss)
|
|
```
|
|
|
|
---
|
|
|
|
### Pod Count by Status
|
|
|
|
**Datadog**:
|
|
```
|
|
sum:kubernetes.pods.running{*} by {kube_namespace}
|
|
```
|
|
|
|
**Prometheus**:
|
|
```promql
|
|
sum by (namespace) (kube_pod_status_phase{phase="Running"})
|
|
```
|
|
|
|
---
|
|
|
|
## Database Queries
|
|
|
|
### MySQL Queries Per Second
|
|
|
|
**Datadog**:
|
|
```
|
|
sum:mysql.performance.queries{*}.as_rate()
|
|
```
|
|
|
|
**Prometheus**:
|
|
```promql
|
|
sum(rate(mysql_global_status_queries[5m]))
|
|
```
|
|
|
|
---
|
|
|
|
### PostgreSQL Active Connections
|
|
|
|
**Datadog**:
|
|
```
|
|
avg:postgresql.connections{*}
|
|
```
|
|
|
|
**Prometheus**:
|
|
```promql
|
|
avg(pg_stat_database_numbackends)
|
|
```
|
|
|
|
---
|
|
|
|
### Redis Memory Usage
|
|
|
|
**Datadog**:
|
|
```
|
|
avg:redis.mem.used{*}
|
|
```
|
|
|
|
**Prometheus**:
|
|
```promql
|
|
avg(redis_memory_used_bytes)
|
|
```
|
|
|
|
---
|
|
|
|
## Network Metrics
|
|
|
|
### Network Bytes Sent
|
|
|
|
**Datadog**:
|
|
```
|
|
sum:system.net.bytes_sent{*}.as_rate()
|
|
```
|
|
|
|
**Prometheus**:
|
|
```promql
|
|
sum(rate(node_network_transmit_bytes_total[5m]))
|
|
```
|
|
|
|
---
|
|
|
|
### Network Bytes Received
|
|
|
|
**Datadog**:
|
|
```
|
|
sum:system.net.bytes_rcvd{*}.as_rate()
|
|
```
|
|
|
|
**Prometheus**:
|
|
```promql
|
|
sum(rate(node_network_receive_bytes_total[5m]))
|
|
```
|
|
|
|
---
|
|
|
|
## Key Differences
|
|
|
|
### 1. Time Windows
|
|
- **Datadog**: Optional, defaults to query time range
|
|
- **Prometheus**: Always required for rate/increase functions
|
|
|
|
### 2. Histograms
|
|
- **Datadog**: Percentiles available directly
|
|
- **Prometheus**: Requires histogram buckets + `histogram_quantile()`
|
|
|
|
### 3. Default Aggregation
|
|
- **Datadog**: No default, must specify
|
|
- **Prometheus**: Returns all time series unless aggregated
|
|
|
|
### 4. Metric Types
|
|
- **Datadog**: All metrics treated similarly
|
|
- **Prometheus**: Explicit types (counter, gauge, histogram, summary)
|
|
|
|
### 5. Tag vs Label
|
|
- **Datadog**: Uses "tags" (key:value)
|
|
- **Prometheus**: Uses "labels" (key="value")
|
|
|
|
---
|
|
|
|
## Migration Tips
|
|
|
|
1. **Start with dashboards**: Convert most-used dashboards first
|
|
2. **Use recording rules**: Pre-calculate expensive PromQL queries
|
|
3. **Test in parallel**: Run both systems during migration
|
|
4. **Document mappings**: Create team-specific translation guide
|
|
5. **Train team**: PromQL has learning curve, invest in training
|
|
|
|
---
|
|
|
|
## Tools
|
|
|
|
- **Datadog Dashboard Exporter**: Export JSON dashboards
|
|
- **Grafana Dashboard Linter**: Validate converted dashboards
|
|
- **PromQL Learning Resources**: https://prometheus.io/docs/prometheus/latest/querying/basics/
|
|
|
|
---
|
|
|
|
## Common Gotchas
|
|
|
|
### Rate without Time Window
|
|
|
|
❌ **Wrong**:
|
|
```promql
|
|
rate(http_requests_total)
|
|
```
|
|
|
|
✅ **Correct**:
|
|
```promql
|
|
rate(http_requests_total[5m])
|
|
```
|
|
|
|
---
|
|
|
|
### Aggregating Before Rate
|
|
|
|
❌ **Wrong**:
|
|
```promql
|
|
rate(sum(http_requests_total)[5m])
|
|
```
|
|
|
|
✅ **Correct**:
|
|
```promql
|
|
sum(rate(http_requests_total[5m]))
|
|
```
|
|
|
|
---
|
|
|
|
### Histogram Quantile Without by (le)
|
|
|
|
❌ **Wrong**:
|
|
```promql
|
|
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))
|
|
```
|
|
|
|
✅ **Correct**:
|
|
```promql
|
|
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
|
|
```
|
|
|
|
---
|
|
|
|
## Quick Conversion Checklist
|
|
|
|
When converting a Datadog query to PromQL:
|
|
|
|
- [ ] Replace metric name (e.g., `system.cpu.user` → `node_cpu_seconds_total`)
|
|
- [ ] Convert tags to labels (`{tag:value}` → `{label="value"}`)
|
|
- [ ] Add time window for rate/increase (`[5m]`)
|
|
- [ ] Change aggregation syntax (`avg:` → `avg()`)
|
|
- [ ] Convert percentiles to histogram_quantile if needed
|
|
- [ ] Test query in Prometheus before adding to dashboard
|
|
- [ ] Add `by (label)` for grouped aggregations
|
|
|
|
---
|
|
|
|
## Need More Help?
|
|
|
|
- See `datadog_migration.md` for full migration guide
|
|
- PromQL documentation: https://prometheus.io/docs/prometheus/latest/querying/
|
|
- Practice at: https://demo.promlens.com/
|