DQL (Datadog Query Language) ↔ PromQL Translation Guide
Quick Reference
| Concept | Datadog (DQL) | Prometheus (PromQL) |
|---|---|---|
| Aggregation | avg:, sum:, min:, max: | avg(), sum(), min(), max() |
| Rate | .as_rate(), .as_count() | rate(), increase() |
| Percentile | p50:, p95:, p99: | histogram_quantile() |
| Filtering | {tag:value} | {label="value"} |
| Time window | last_5m, last_1h | [5m], [1h] |
Basic Queries
Simple Metric Query
Datadog:
system.cpu.user
Prometheus:
node_cpu_seconds_total{mode="user"}
Metric with Filter
Datadog:
system.cpu.user{host:web-01}
Prometheus:
node_cpu_seconds_total{mode="user", instance="web-01"}
Multiple Filters (AND)
Datadog:
system.cpu.user{host:web-01,env:production}
Prometheus:
node_cpu_seconds_total{mode="user", instance="web-01", env="production"}
Wildcard Filters
Datadog:
system.cpu.user{host:web-*}
Prometheus:
node_cpu_seconds_total{mode="user", instance=~"web-.*"}
OR Filters
Datadog:
system.cpu.user{host:web-01 OR host:web-02}
Prometheus:
node_cpu_seconds_total{mode="user", instance=~"web-01|web-02"}
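The filter rules above (exact match, wildcard, OR) are mechanical enough to script. The following is a minimal sketch, not a complete parser: the function name `dd_scope_to_matchers` and the `TAG_TO_LABEL` mapping are hypothetical, and it assumes tag values contain no regex metacharacters other than the Datadog `*` wildcard.

```python
# Hypothetical tag-to-label mapping; the right names depend on your exporters.
TAG_TO_LABEL = {"host": "instance"}


def dd_scope_to_matchers(scope: str) -> str:
    """Turn a Datadog scope such as 'host:web-*,env:production' into PromQL matchers."""
    scope = scope.strip()
    if scope in ("", "*"):
        return ""  # '{*}' means "no filter"
    matchers = []
    for term in scope.split(","):
        # 'host:a OR host:b' becomes one regex matcher.
        # Assumption: every alternative in an OR group uses the same tag.
        pairs = [alt.strip().partition(":") for alt in term.split(" OR ")]
        label = TAG_TO_LABEL.get(pairs[0][0], pairs[0][0])
        values = [value for _, _, value in pairs]
        if len(values) > 1 or "*" in values[0]:
            # Datadog '*' wildcard maps to the regex '.*'.
            pattern = "|".join(v.replace("*", ".*") for v in values)
            matchers.append(f'{label}=~"{pattern}"')
        else:
            matchers.append(f'{label}="{values[0]}"')
    return ", ".join(matchers)


print(dd_scope_to_matchers("host:web-01 OR host:web-02"))
```

The returned string goes between the braces of the PromQL selector, next to any fixed labels such as mode="user".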
Aggregations
Average
Datadog:
avg:system.cpu.user{*}
Prometheus:
avg(node_cpu_seconds_total{mode="user"})
Sum
Datadog:
sum:requests.count{*}
Prometheus:
sum(http_requests_total)
Min/Max
Datadog:
min:system.mem.free{*}
max:system.mem.free{*}
Prometheus:
min(node_memory_MemFree_bytes)
max(node_memory_MemFree_bytes)
Aggregation by Tag/Label
Datadog:
avg:system.cpu.user{*} by {host}
Prometheus:
avg by (instance) (node_cpu_seconds_total{mode="user"})
Rates and Counts
Rate (per second)
Datadog:
sum:requests.count{*}.as_rate()
Prometheus:
sum(rate(http_requests_total[5m]))
Note: Prometheus requires an explicit time window, such as [5m].
Count (total over time)
Datadog:
sum:requests.count{*}.as_count()
Prometheus:
sum(increase(http_requests_total[1h]))
Derivative (change over time)
Datadog:
derivative(avg:system.disk.used{*})
Prometheus:
deriv((node_filesystem_size_bytes - node_filesystem_free_bytes)[5m:])
Percentiles
P50 (Median)
Datadog:
p50:request.duration{*}
Prometheus (requires histogram):
histogram_quantile(0.50, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
P95
Datadog:
p95:request.duration{*}
Prometheus:
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
P99
Datadog:
p99:request.duration{*}
Prometheus:
histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
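Because Prometheus estimates percentiles from bucket counts rather than raw samples, results depend on bucket boundaries and can differ from Datadog's values. The sketch below illustrates the linear interpolation that histogram_quantile() applies to classic histograms. It is a simplified model, assuming positive bucket bounds, a final +Inf bucket, and 0 < q <= 1; the sample bucket data is invented.

```python
import math


def histogram_quantile(q, buckets):
    """buckets: (upper_bound, cumulative_count) pairs sorted by bound, ending with math.inf."""
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # Find the first bucket whose cumulative count reaches the target rank.
    idx = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    if idx == len(buckets) - 1:
        # The rank falls in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    upper, count = buckets[idx]
    lower, prev_count = (0.0, 0.0) if idx == 0 else buckets[idx - 1]
    # Assume observations are spread evenly inside the bucket.
    return lower + (upper - lower) * (rank - prev_count) / (count - prev_count)


# Invented data: 100 requests, le="0.1" -> 50, le="0.5" -> 90, le="1" -> 100.
latency_buckets = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]
print(histogram_quantile(0.95, latency_buckets))
```

With these buckets the 95th percentile lands halfway through the 0.5 to 1.0 bucket, so finer buckets around the latencies you care about give tighter estimates.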
Time Windows
Last 5 minutes
Datadog:
avg(last_5m):system.cpu.user{*}
Prometheus:
avg_over_time(node_cpu_seconds_total{mode="user"}[5m])
Last 1 hour
Datadog:
avg(last_1h):system.cpu.user{*}
Prometheus:
avg_over_time(node_cpu_seconds_total{mode="user"}[1h])
Math Operations
Division
Datadog:
avg:system.mem.used{*} / avg:system.mem.total{*}
Prometheus:
(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes
Multiplication
Datadog:
avg:system.cpu.user{*} * 100
Prometheus:
avg(node_cpu_seconds_total{mode="user"}) * 100
Percentage Calculation
Datadog:
(sum:requests.errors{*} / sum:requests.count{*}) * 100
Prometheus:
(sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))) * 100
Common Use Cases
CPU Usage Percentage
Datadog:
100 - avg:system.cpu.idle{*}
Prometheus:
100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
Memory Usage Percentage
Datadog:
(avg:system.mem.used{*} / avg:system.mem.total{*}) * 100
Prometheus:
(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100
Disk Usage Percentage
Datadog:
(avg:system.disk.used{*} / avg:system.disk.total{*}) * 100
Prometheus:
(node_filesystem_size_bytes - node_filesystem_free_bytes) / node_filesystem_size_bytes * 100
Request Rate (requests/sec)
Datadog:
sum:requests.count{*}.as_rate()
Prometheus:
sum(rate(http_requests_total[5m]))
Error Rate Percentage
Datadog:
(sum:requests.errors{*}.as_rate() / sum:requests.count{*}.as_rate()) * 100
Prometheus:
(sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))) * 100
Request Latency (P95)
Datadog:
p95:request.duration{*}
Prometheus:
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
Top 5 Hosts by CPU
Datadog:
top(avg:system.cpu.user{*} by {host}, 5, 'mean', 'desc')
Prometheus:
topk(5, avg by (instance) (rate(node_cpu_seconds_total{mode="user"}[5m])))
Functions
Absolute Value
Datadog:
abs(diff(avg:system.cpu.user{*}))
Prometheus:
abs(delta(node_cpu_seconds_total{mode="user"}[5m]))
Ceiling/Floor
Datadog:
ceil(avg:system.cpu.user{*})
floor(avg:system.cpu.user{*})
Prometheus:
ceil(avg(node_cpu_seconds_total{mode="user"}))
floor(avg(node_cpu_seconds_total{mode="user"}))
Clamp (Limit Range)
Datadog:
clamp_min(avg:system.cpu.user{*}, 0)
clamp_max(avg:system.cpu.user{*}, 100)
Prometheus:
clamp_min(avg(node_cpu_seconds_total{mode="user"}), 0)
clamp_max(avg(node_cpu_seconds_total{mode="user"}), 100)
Moving Average
Datadog:
moving_rollup(avg:system.cpu.user{*}, 3600, 'avg')
Prometheus:
avg_over_time(node_cpu_seconds_total{mode="user"}[1h])
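Both forms compute a trailing average: each output point is the mean of the samples in the window that ends at that point. A minimal sketch of that idea follows, with invented sample data. How a sample exactly on the window's left edge is treated differs between Prometheus versions, so the example avoids that case.

```python
def avg_over_time(samples, eval_time, window):
    """Mean of the values whose timestamp lies in (eval_time - window, eval_time].

    samples: list of (timestamp_seconds, value); returns None for an empty window.
    """
    in_window = [v for t, v in samples if eval_time - window < t <= eval_time]
    return sum(in_window) / len(in_window) if in_window else None


# Invented gauge samples scraped every 60 seconds.
samples = [(0, 10.0), (60, 20.0), (120, 30.0), (180, 40.0)]
print(avg_over_time(samples, eval_time=180, window=150))
```

A longer window smooths more but reacts more slowly, which is the same trade-off as the interval argument of moving_rollup.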
Advanced Patterns
Compare to Previous Period
Datadog:
sum:requests.count{*}.as_rate() / timeshift(sum:requests.count{*}.as_rate(), 3600)
Prometheus:
sum(rate(http_requests_total[5m])) / sum(rate(http_requests_total[5m] offset 1h))
Forecast
Datadog:
forecast(avg:system.disk.used{*}, 'linear', 1)
Prometheus:
predict_linear((node_filesystem_size_bytes - node_filesystem_free_bytes)[1h:], 3600)
Note: Predicts the value 1 hour ahead from the trend over the last hour
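predict_linear() fits a straight line through the samples in the range and extends it forward. The sketch below shows that calculation with ordinary least squares. It is an illustration with invented data, and it assumes at least two samples with distinct timestamps.

```python
def predict_linear(samples, eval_time, seconds_ahead):
    """samples: list of (timestamp_seconds, value) taken from the range window."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    # Least-squares slope: covariance(t, v) / variance(t).
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    slope = cov / var
    intercept = mean_v - slope * mean_t
    # Evaluate the fitted line at the future timestamp.
    return intercept + slope * (eval_time + seconds_ahead)


# Invented disk usage in GB growing by 18 GB every 30 minutes.
disk_used = [(0, 100.0), (1800, 118.0), (3600, 136.0)]
print(predict_linear(disk_used, eval_time=3600, seconds_ahead=3600))
```

A linear fit only suits metrics that grow steadily. Datadog's forecast() also offers a seasonal algorithm, which has no direct PromQL counterpart.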
Anomaly Detection
Datadog:
anomalies(avg:system.cpu.user{*}, 'basic', 2)
Prometheus: No built-in function
- Use recording rules with stddev
- External tools like Robust Perception's anomaly detector
- Or use Grafana ML plugin
Outlier Detection
Datadog:
outliers(avg:system.cpu.user{*} by {host}, 'mad')
Prometheus: No built-in function
- Calculate manually with stddev:
abs(metric - scalar(avg(metric))) > 2 * scalar(stddev(metric))
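The stddev rule can be checked offline before it goes into an alert. This is a minimal sketch with invented per-host values and a hypothetical function name. It uses the population standard deviation, which is what PromQL's stddev() computes, and it is not equivalent to Datadog's 'mad' algorithm, which is based on the median absolute deviation.

```python
import statistics


def stddev_outliers(values_by_host, threshold=2.0):
    """Hosts whose value is more than `threshold` population stddevs from the mean."""
    values = list(values_by_host.values())
    mean = statistics.mean(values)
    stddev = statistics.pstdev(values)
    return sorted(
        host
        for host, value in values_by_host.items()
        if abs(value - mean) > threshold * stddev
    )


# Invented CPU percentages; web-06 is far above the rest.
cpu_by_host = {
    "web-01": 20, "web-02": 22, "web-03": 21,
    "web-04": 19, "web-05": 23, "web-06": 95,
}
print(stddev_outliers(cpu_by_host))
```

With few hosts a single extreme value inflates both the mean and the stddev, so a threshold of 2 can miss outliers that a median-based method would catch.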
Container & Kubernetes
Container CPU Usage
Datadog:
avg:docker.cpu.usage{*} by {container_name}
Prometheus:
avg by (container) (rate(container_cpu_usage_seconds_total[5m]))
Container Memory Usage
Datadog:
avg:docker.mem.rss{*} by {container_name}
Prometheus:
avg by (container) (container_memory_rss)
Pod Count by Status
Datadog:
sum:kubernetes.pods.running{*} by {kube_namespace}
Prometheus:
sum by (namespace) (kube_pod_status_phase{phase="Running"})
Database Queries
MySQL Queries Per Second
Datadog:
sum:mysql.performance.queries{*}.as_rate()
Prometheus:
sum(rate(mysql_global_status_queries[5m]))
PostgreSQL Active Connections
Datadog:
avg:postgresql.connections{*}
Prometheus:
avg(pg_stat_database_numbackends)
Redis Memory Usage
Datadog:
avg:redis.mem.used{*}
Prometheus:
avg(redis_memory_used_bytes)
Network Metrics
Network Bytes Sent
Datadog:
sum:system.net.bytes_sent{*}.as_rate()
Prometheus:
sum(rate(node_network_transmit_bytes_total[5m]))
Network Bytes Received
Datadog:
sum:system.net.bytes_rcvd{*}.as_rate()
Prometheus:
sum(rate(node_network_receive_bytes_total[5m]))
Key Differences
1. Time Windows
- Datadog: Optional, defaults to query time range
- Prometheus: Always required for rate/increase functions
2. Histograms
- Datadog: Percentiles available directly
- Prometheus: Requires histogram buckets + histogram_quantile()
3. Default Aggregation
- Datadog: Always combines series with a space aggregator (avg: when none is given)
- Prometheus: Returns all time series unless aggregated
4. Metric Types
- Datadog: Types (count, rate, gauge, histogram, distribution) are set at submission; query syntax is largely the same for all
- Prometheus: Explicit types (counter, gauge, histogram, summary)
5. Tag vs Label
- Datadog: Uses "tags" (key:value)
- Prometheus: Uses "labels" (key="value")
Migration Tips
- Start with dashboards: Convert most-used dashboards first
- Use recording rules: Pre-calculate expensive PromQL queries
- Test in parallel: Run both systems during migration
- Document mappings: Create team-specific translation guide
- Train team: PromQL has a learning curve; invest in training
Tools
- Datadog Dashboard Exporter: Export JSON dashboards
- Grafana Dashboard Linter: Validate converted dashboards
- PromQL Learning Resources: https://prometheus.io/docs/prometheus/latest/querying/basics/
Common Gotchas
Rate without Time Window
❌ Wrong:
rate(http_requests_total)
✅ Correct:
rate(http_requests_total[5m])
Aggregating Before Rate
❌ Wrong:
rate(sum(http_requests_total)[5m])
✅ Correct:
sum(rate(http_requests_total[5m]))
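The ordering matters because counter resets can only be detected per series. The wrong form above does not even parse, and the subquery variant rate(sum(http_requests_total)[5m:]) parses but hides resets inside the sum. The sketch below shows the effect with invented counter values and a simplified reset-aware increase (no extrapolation, unlike real PromQL).

```python
def increase(values):
    """Reset-aware growth of one counter series (simplified model)."""
    total = 0
    for prev, cur in zip(values, values[1:]):
        # A drop is treated as a counter reset, so the new value counts as fresh growth.
        total += cur - prev if cur >= prev else cur
    return total


# Two hypothetical instances; the second restarts before its last sample.
web_01 = [100, 110, 120]
web_02 = [50, 60, 5]

# Rate first, then sum: each reset is handled inside its own series.
per_series = increase(web_01) + increase(web_02)

# Sum first, then rate: the restart looks like a reset of the whole sum.
summed_first = increase([a + b for a, b in zip(web_01, web_02)])

print(per_series, summed_first)
```

The per-series result counts the 35 requests that actually happened, while the sum-first result reports 145 because the drop in the combined value is misread as a full reset.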
Histogram Quantile Without by (le)
❌ Wrong:
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])))
✅ Correct:
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
Quick Conversion Checklist
When converting a Datadog query to PromQL:
- Replace metric name (e.g., system.cpu.user → node_cpu_seconds_total)
- Convert tags to labels ({tag:value} → {label="value"})
- Add time window for rate/increase ([5m])
- Change aggregation syntax (avg: → avg())
- Convert percentiles to histogram_quantile if needed
- Test query in Prometheus before adding to dashboard
- Add by (label) for grouped aggregations
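For the simplest query shape, the mechanical steps of this checklist can be scripted. The sketch below handles only gauge queries of the form agg:metric{scope} by {tags} with exact-match filters. The function name and both mapping tables are hypothetical and must be filled in from your own exporters. Counters, percentiles, and wildcards still need the manual steps above.

```python
import re

# Hypothetical mappings; extend them from your own exporters.
METRIC_MAP = {"system.mem.free": "node_memory_MemFree_bytes"}
TAG_MAP = {"host": "instance"}

QUERY_RE = re.compile(
    r"^(?P<agg>avg|sum|min|max):(?P<metric>[\w.]+)\{(?P<scope>[^}]*)\}"
    r"(?:\s+by\s+\{(?P<by>[^}]*)\})?$"
)


def convert_simple_query(dd_query: str) -> str:
    match = QUERY_RE.match(dd_query.strip())
    if match is None:
        raise ValueError(f"unsupported query shape: {dd_query!r}")
    metric = METRIC_MAP[match["metric"]]  # step 1: replace the metric name
    scope = match["scope"].strip()
    selector = metric
    if scope not in ("", "*"):
        # step 2: tags become labels (exact matches only in this sketch)
        pairs = (item.strip().partition(":") for item in scope.split(","))
        labels = ", ".join(f'{TAG_MAP.get(k, k)}="{v}"' for k, _, v in pairs)
        selector = f"{metric}{{{labels}}}"
    if match["by"]:
        # grouped aggregation: 'by {tag}' becomes 'by (label)'
        tags = (t.strip() for t in match["by"].split(","))
        group = ", ".join(TAG_MAP.get(t, t) for t in tags)
        return f'{match["agg"]} by ({group}) ({selector})'
    return f'{match["agg"]}({selector})'  # aggregation syntax: 'avg:' becomes 'avg()'


print(convert_simple_query("avg:system.mem.free{env:production} by {host}"))
```

An unmapped metric raises KeyError on purpose, so gaps in the mapping surface during conversion instead of producing a query that silently returns nothing.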
Need More Help?
- See datadog_migration.md for full migration guide
- PromQL documentation: https://prometheus.io/docs/prometheus/latest/querying/
- Practice at: https://demo.promlens.com/