Initial commit
This commit is contained in:
756
references/dql_promql_translation.md
Normal file
756
references/dql_promql_translation.md
Normal file
@@ -0,0 +1,756 @@
|
||||
# DQL (Datadog Query Language) ↔ PromQL Translation Guide
|
||||
|
||||
## Quick Reference
|
||||
|
||||
| Concept | Datadog (DQL) | Prometheus (PromQL) |
|
||||
|---------|---------------|---------------------|
|
||||
| Aggregation | `avg:`, `sum:`, `min:`, `max:` | `avg()`, `sum()`, `min()`, `max()` |
|
||||
| Rate | `.as_rate()`, `.as_count()` | `rate()`, `increase()` |
|
||||
| Percentile | `p50:`, `p95:`, `p99:` | `histogram_quantile()` |
|
||||
| Filtering | `{tag:value}` | `{label="value"}` |
|
||||
| Time window | `last_5m`, `last_1h` | `[5m]`, `[1h]` |
|
||||
|
||||
---
|
||||
|
||||
## Basic Queries
|
||||
|
||||
### Simple Metric Query
|
||||
|
||||
**Datadog**:
|
||||
```
|
||||
system.cpu.user
|
||||
```
|
||||
|
||||
**Prometheus**:
|
||||
```promql
|
||||
node_cpu_seconds_total{mode="user"}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Metric with Filter
|
||||
|
||||
**Datadog**:
|
||||
```
|
||||
system.cpu.user{host:web-01}
|
||||
```
|
||||
|
||||
**Prometheus**:
|
||||
```promql
|
||||
node_cpu_seconds_total{mode="user", instance="web-01"}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Multiple Filters (AND)
|
||||
|
||||
**Datadog**:
|
||||
```
|
||||
system.cpu.user{host:web-01,env:production}
|
||||
```
|
||||
|
||||
**Prometheus**:
|
||||
```promql
|
||||
node_cpu_seconds_total{mode="user", instance="web-01", env="production"}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Wildcard Filters
|
||||
|
||||
**Datadog**:
|
||||
```
|
||||
system.cpu.user{host:web-*}
|
||||
```
|
||||
|
||||
**Prometheus**:
|
||||
```promql
|
||||
node_cpu_seconds_total{mode="user", instance=~"web-.*"}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### OR Filters
|
||||
|
||||
**Datadog**:
|
||||
```
|
||||
system.cpu.user{host:web-01 OR host:web-02}
|
||||
```
|
||||
|
||||
**Prometheus**:
|
||||
```promql
|
||||
node_cpu_seconds_total{mode="user", instance=~"web-01|web-02"}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Aggregations
|
||||
|
||||
### Average
|
||||
|
||||
**Datadog**:
|
||||
```
|
||||
avg:system.cpu.user{*}
|
||||
```
|
||||
|
||||
**Prometheus**:
|
||||
```promql
|
||||
avg(node_cpu_seconds_total{mode="user"})
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Sum
|
||||
|
||||
**Datadog**:
|
||||
```
|
||||
sum:requests.count{*}
|
||||
```
|
||||
|
||||
**Prometheus**:
|
||||
```promql
|
||||
sum(http_requests_total)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Min/Max
|
||||
|
||||
**Datadog**:
|
||||
```
|
||||
min:system.mem.free{*}
|
||||
max:system.mem.free{*}
|
||||
```
|
||||
|
||||
**Prometheus**:
|
||||
```promql
|
||||
min(node_memory_MemFree_bytes)
|
||||
max(node_memory_MemFree_bytes)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Aggregation by Tag/Label
|
||||
|
||||
**Datadog**:
|
||||
```
|
||||
avg:system.cpu.user{*} by {host}
|
||||
```
|
||||
|
||||
**Prometheus**:
|
||||
```promql
|
||||
avg by (instance) (node_cpu_seconds_total{mode="user"})
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Rates and Counts
|
||||
|
||||
### Rate (per second)
|
||||
|
||||
**Datadog**:
|
||||
```
|
||||
sum:requests.count{*}.as_rate()
|
||||
```
|
||||
|
||||
**Prometheus**:
|
||||
```promql
|
||||
sum(rate(http_requests_total[5m]))
|
||||
```
|
||||
|
||||
Note: Prometheus requires explicit time window `[5m]`
|
||||
|
||||
---
|
||||
|
||||
### Count (total over time)
|
||||
|
||||
**Datadog**:
|
||||
```
|
||||
sum:requests.count{*}.as_count()
|
||||
```
|
||||
|
||||
**Prometheus**:
|
||||
```promql
|
||||
sum(increase(http_requests_total[1h]))
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Derivative (change over time)
|
||||
|
||||
**Datadog**:
|
||||
```
|
||||
derivative(avg:system.disk.used{*})
|
||||
```
|
||||
|
||||
**Prometheus**:
|
||||
```promql
|
||||
deriv(node_filesystem_size_bytes[5m])
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Percentiles
|
||||
|
||||
### P50 (Median)
|
||||
|
||||
**Datadog**:
|
||||
```
|
||||
p50:request.duration{*}
|
||||
```
|
||||
|
||||
**Prometheus** (requires histogram):
|
||||
```promql
|
||||
histogram_quantile(0.50, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### P95
|
||||
|
||||
**Datadog**:
|
||||
```
|
||||
p95:request.duration{*}
|
||||
```
|
||||
|
||||
**Prometheus**:
|
||||
```promql
|
||||
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### P99
|
||||
|
||||
**Datadog**:
|
||||
```
|
||||
p99:request.duration{*}
|
||||
```
|
||||
|
||||
**Prometheus**:
|
||||
```promql
|
||||
histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Time Windows
|
||||
|
||||
### Last 5 minutes
|
||||
|
||||
**Datadog**:
|
||||
```
|
||||
avg(last_5m):system.cpu.user{*}
|
||||
```
|
||||
|
||||
**Prometheus**:
|
||||
```promql
|
||||
avg(node_cpu_seconds_total{mode="user"}[5m])
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Last 1 hour
|
||||
|
||||
**Datadog**:
|
||||
```
|
||||
avg(last_1h):system.cpu.user{*}
|
||||
```
|
||||
|
||||
**Prometheus**:
|
||||
```promql
|
||||
avg_over_time(node_cpu_seconds_total{mode="user"}[1h])
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Math Operations
|
||||
|
||||
### Division
|
||||
|
||||
**Datadog**:
|
||||
```
|
||||
avg:system.mem.used{*} / avg:system.mem.total{*}
|
||||
```
|
||||
|
||||
**Prometheus**:
|
||||
```promql
|
||||
node_memory_MemUsed_bytes / node_memory_MemTotal_bytes
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Multiplication
|
||||
|
||||
**Datadog**:
|
||||
```
|
||||
avg:system.cpu.user{*} * 100
|
||||
```
|
||||
|
||||
**Prometheus**:
|
||||
```promql
|
||||
avg(node_cpu_seconds_total{mode="user"}) * 100
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Percentage Calculation
|
||||
|
||||
**Datadog**:
|
||||
```
|
||||
(sum:requests.errors{*} / sum:requests.count{*}) * 100
|
||||
```
|
||||
|
||||
**Prometheus**:
|
||||
```promql
|
||||
(sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))) * 100
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Common Use Cases
|
||||
|
||||
### CPU Usage Percentage
|
||||
|
||||
**Datadog**:
|
||||
```
|
||||
100 - avg:system.cpu.idle{*}
|
||||
```
|
||||
|
||||
**Prometheus**:
|
||||
```promql
|
||||
100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Memory Usage Percentage
|
||||
|
||||
**Datadog**:
|
||||
```
|
||||
(avg:system.mem.used{*} / avg:system.mem.total{*}) * 100
|
||||
```
|
||||
|
||||
**Prometheus**:
|
||||
```promql
|
||||
(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Disk Usage Percentage
|
||||
|
||||
**Datadog**:
|
||||
```
|
||||
(avg:system.disk.used{*} / avg:system.disk.total{*}) * 100
|
||||
```
|
||||
|
||||
**Prometheus**:
|
||||
```promql
|
||||
(node_filesystem_size_bytes - node_filesystem_free_bytes) / node_filesystem_size_bytes * 100
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Request Rate (requests/sec)
|
||||
|
||||
**Datadog**:
|
||||
```
|
||||
sum:requests.count{*}.as_rate()
|
||||
```
|
||||
|
||||
**Prometheus**:
|
||||
```promql
|
||||
sum(rate(http_requests_total[5m]))
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Error Rate Percentage
|
||||
|
||||
**Datadog**:
|
||||
```
|
||||
(sum:requests.errors{*}.as_rate() / sum:requests.count{*}.as_rate()) * 100
|
||||
```
|
||||
|
||||
**Prometheus**:
|
||||
```promql
|
||||
(sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))) * 100
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Request Latency (P95)
|
||||
|
||||
**Datadog**:
|
||||
```
|
||||
p95:request.duration{*}
|
||||
```
|
||||
|
||||
**Prometheus**:
|
||||
```promql
|
||||
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Top 5 Hosts by CPU
|
||||
|
||||
**Datadog**:
|
||||
```
|
||||
top(avg:system.cpu.user{*} by {host}, 5, 'mean', 'desc')
|
||||
```
|
||||
|
||||
**Prometheus**:
|
||||
```promql
|
||||
topk(5, avg by (instance) (rate(node_cpu_seconds_total{mode="user"}[5m])))
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Functions
|
||||
|
||||
### Absolute Value
|
||||
|
||||
**Datadog**:
|
||||
```
|
||||
abs(diff(avg:system.cpu.user{*}))
|
||||
```
|
||||
|
||||
**Prometheus**:
|
||||
```promql
|
||||
abs(delta(node_cpu_seconds_total{mode="user"}[5m]))
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Ceiling/Floor
|
||||
|
||||
**Datadog**:
|
||||
```
|
||||
ceil(avg:system.cpu.user{*})
|
||||
floor(avg:system.cpu.user{*})
|
||||
```
|
||||
|
||||
**Prometheus**:
|
||||
```promql
|
||||
ceil(avg(node_cpu_seconds_total{mode="user"}))
|
||||
floor(avg(node_cpu_seconds_total{mode="user"}))
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Clamp (Limit Range)
|
||||
|
||||
**Datadog**:
|
||||
```
|
||||
clamp_min(avg:system.cpu.user{*}, 0)
|
||||
clamp_max(avg:system.cpu.user{*}, 100)
|
||||
```
|
||||
|
||||
**Prometheus**:
|
||||
```promql
|
||||
clamp_min(avg(node_cpu_seconds_total{mode="user"}), 0)
|
||||
clamp_max(avg(node_cpu_seconds_total{mode="user"}), 100)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Moving Average
|
||||
|
||||
**Datadog**:
|
||||
```
|
||||
moving_rollup(avg:system.cpu.user{*}, 60, 'avg')
|
||||
```
|
||||
|
||||
**Prometheus**:
|
||||
```promql
|
||||
avg_over_time(node_cpu_seconds_total{mode="user"}[1h])
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Advanced Patterns
|
||||
|
||||
### Compare to Previous Period
|
||||
|
||||
**Datadog**:
|
||||
```
|
||||
sum:requests.count{*}.as_rate() / timeshift(sum:requests.count{*}.as_rate(), 3600)
|
||||
```
|
||||
|
||||
**Prometheus**:
|
||||
```promql
|
||||
sum(rate(http_requests_total[5m])) / sum(rate(http_requests_total[5m] offset 1h))
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Forecast
|
||||
|
||||
**Datadog**:
|
||||
```
|
||||
forecast(avg:system.disk.used{*}, 'linear', 1)
|
||||
```
|
||||
|
||||
**Prometheus**:
|
||||
```promql
|
||||
predict_linear(node_filesystem_size_bytes[1h], 3600)
|
||||
```
|
||||
|
||||
Note: Predicts value 1 hour in future based on last 1 hour trend
|
||||
|
||||
---
|
||||
|
||||
### Anomaly Detection
|
||||
|
||||
**Datadog**:
|
||||
```
|
||||
anomalies(avg:system.cpu.user{*}, 'basic', 2)
|
||||
```
|
||||
|
||||
**Prometheus**: No built-in function
|
||||
- Use recording rules with stddev
|
||||
- External tools like **Robust Perception's anomaly detector**
|
||||
- Or use **Grafana ML** plugin
|
||||
|
||||
---
|
||||
|
||||
### Outlier Detection
|
||||
|
||||
**Datadog**:
|
||||
```
|
||||
outliers(avg:system.cpu.user{*} by {host}, 'mad')
|
||||
```
|
||||
|
||||
**Prometheus**: No built-in function
|
||||
- Calculate manually with stddev:
|
||||
```promql
|
||||
abs(metric - avg(metric)) > 2 * stddev(metric)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Container & Kubernetes
|
||||
|
||||
### Container CPU Usage
|
||||
|
||||
**Datadog**:
|
||||
```
|
||||
avg:docker.cpu.usage{*} by {container_name}
|
||||
```
|
||||
|
||||
**Prometheus**:
|
||||
```promql
|
||||
avg by (container) (rate(container_cpu_usage_seconds_total[5m]))
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Container Memory Usage
|
||||
|
||||
**Datadog**:
|
||||
```
|
||||
avg:docker.mem.rss{*} by {container_name}
|
||||
```
|
||||
|
||||
**Prometheus**:
|
||||
```promql
|
||||
avg by (container) (container_memory_rss)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Pod Count by Status
|
||||
|
||||
**Datadog**:
|
||||
```
|
||||
sum:kubernetes.pods.running{*} by {kube_namespace}
|
||||
```
|
||||
|
||||
**Prometheus**:
|
||||
```promql
|
||||
sum by (namespace) (kube_pod_status_phase{phase="Running"})
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Database Queries
|
||||
|
||||
### MySQL Queries Per Second
|
||||
|
||||
**Datadog**:
|
||||
```
|
||||
sum:mysql.performance.queries{*}.as_rate()
|
||||
```
|
||||
|
||||
**Prometheus**:
|
||||
```promql
|
||||
sum(rate(mysql_global_status_queries[5m]))
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### PostgreSQL Active Connections
|
||||
|
||||
**Datadog**:
|
||||
```
|
||||
avg:postgresql.connections{*}
|
||||
```
|
||||
|
||||
**Prometheus**:
|
||||
```promql
|
||||
avg(pg_stat_database_numbackends)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Redis Memory Usage
|
||||
|
||||
**Datadog**:
|
||||
```
|
||||
avg:redis.mem.used{*}
|
||||
```
|
||||
|
||||
**Prometheus**:
|
||||
```promql
|
||||
avg(redis_memory_used_bytes)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Network Metrics
|
||||
|
||||
### Network Bytes Sent
|
||||
|
||||
**Datadog**:
|
||||
```
|
||||
sum:system.net.bytes_sent{*}.as_rate()
|
||||
```
|
||||
|
||||
**Prometheus**:
|
||||
```promql
|
||||
sum(rate(node_network_transmit_bytes_total[5m]))
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Network Bytes Received
|
||||
|
||||
**Datadog**:
|
||||
```
|
||||
sum:system.net.bytes_rcvd{*}.as_rate()
|
||||
```
|
||||
|
||||
**Prometheus**:
|
||||
```promql
|
||||
sum(rate(node_network_receive_bytes_total[5m]))
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Key Differences
|
||||
|
||||
### 1. Time Windows
|
||||
- **Datadog**: Optional, defaults to query time range
|
||||
- **Prometheus**: Always required for rate/increase functions
|
||||
|
||||
### 2. Histograms
|
||||
- **Datadog**: Percentiles available directly
|
||||
- **Prometheus**: Requires histogram buckets + `histogram_quantile()`
|
||||
|
||||
### 3. Default Aggregation
|
||||
- **Datadog**: No default, must specify
|
||||
- **Prometheus**: Returns all time series unless aggregated
|
||||
|
||||
### 4. Metric Types
|
||||
- **Datadog**: All metrics treated similarly
|
||||
- **Prometheus**: Explicit types (counter, gauge, histogram, summary)
|
||||
|
||||
### 5. Tag vs Label
|
||||
- **Datadog**: Uses "tags" (key:value)
|
||||
- **Prometheus**: Uses "labels" (key="value")
|
||||
|
||||
---
|
||||
|
||||
## Migration Tips
|
||||
|
||||
1. **Start with dashboards**: Convert most-used dashboards first
|
||||
2. **Use recording rules**: Pre-calculate expensive PromQL queries
|
||||
3. **Test in parallel**: Run both systems during migration
|
||||
4. **Document mappings**: Create team-specific translation guide
|
||||
5. **Train team**: PromQL has learning curve, invest in training
|
||||
|
||||
---
|
||||
|
||||
## Tools
|
||||
|
||||
- **Datadog Dashboard Exporter**: Export JSON dashboards
|
||||
- **Grafana Dashboard Linter**: Validate converted dashboards
|
||||
- **PromQL Learning Resources**: https://prometheus.io/docs/prometheus/latest/querying/basics/
|
||||
|
||||
---
|
||||
|
||||
## Common Gotchas
|
||||
|
||||
### Rate without Time Window
|
||||
|
||||
❌ **Wrong**:
|
||||
```promql
|
||||
rate(http_requests_total)
|
||||
```
|
||||
|
||||
✅ **Correct**:
|
||||
```promql
|
||||
rate(http_requests_total[5m])
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Aggregating Before Rate
|
||||
|
||||
❌ **Wrong**:
|
||||
```promql
|
||||
rate(sum(http_requests_total)[5m])
|
||||
```
|
||||
|
||||
✅ **Correct**:
|
||||
```promql
|
||||
sum(rate(http_requests_total[5m]))
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Histogram Quantile Without by (le)
|
||||
|
||||
❌ **Wrong**:
|
||||
```promql
|
||||
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))
|
||||
```
|
||||
|
||||
✅ **Correct**:
|
||||
```promql
|
||||
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Quick Conversion Checklist
|
||||
|
||||
When converting a Datadog query to PromQL:
|
||||
|
||||
- [ ] Replace metric name (e.g., `system.cpu.user` → `node_cpu_seconds_total`)
|
||||
- [ ] Convert tags to labels (`{tag:value}` → `{label="value"}`)
|
||||
- [ ] Add time window for rate/increase (`[5m]`)
|
||||
- [ ] Change aggregation syntax (`avg:` → `avg()`)
|
||||
- [ ] Convert percentiles to histogram_quantile if needed
|
||||
- [ ] Test query in Prometheus before adding to dashboard
|
||||
- [ ] Add `by (label)` for grouped aggregations
|
||||
|
||||
---
|
||||
|
||||
## Need More Help?
|
||||
|
||||
- See `datadog_migration.md` for full migration guide
|
||||
- PromQL documentation: https://prometheus.io/docs/prometheus/latest/querying/
|
||||
- Practice at: https://demo.promlens.com/
|
||||
Reference in New Issue
Block a user