gh-ahmedasmar-devops-claude…/references/dql_promql_translation.md
2025-11-29 17:51:22 +08:00


DQL (Datadog Query Language) ↔ PromQL Translation Guide

Quick Reference

| Concept | Datadog (DQL) | Prometheus (PromQL) |
|---|---|---|
| Aggregation | avg:, sum:, min:, max: | avg(), sum(), min(), max() |
| Rate / count | .as_rate(), .as_count() | rate(), increase() |
| Percentile | p50:, p95:, p99: | histogram_quantile() |
| Filtering | {tag:value} | {label="value"} |
| Time window | last_5m, last_1h | [5m], [1h] |

Basic Queries

Simple Metric Query

Datadog:

system.cpu.user

Prometheus (node_exporter exposes CPU time as a cumulative counter in seconds, not as a percentage):

node_cpu_seconds_total{mode="user"}

Metric with Filter

Datadog:

system.cpu.user{host:web-01}

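Prometheus (assumes the instance label holds the bare hostname; by default it is the scrape address, usually host:port such as web-01:9100):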

node_cpu_seconds_total{mode="user", instance="web-01"}

Multiple Filters (AND)

Datadog:

system.cpu.user{host:web-01,env:production}

Prometheus:

node_cpu_seconds_total{mode="user", instance="web-01", env="production"}

Wildcard Filters

Datadog:

system.cpu.user{host:web-*}

Prometheus:

node_cpu_seconds_total{mode="user", instance=~"web-.*"}

OR Filters

Datadog:

system.cpu.user{host:web-01 OR host:web-02}

Prometheus:

node_cpu_seconds_total{mode="user", instance=~"web-01|web-02"}
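Because the filter rules above are mechanical, they can be scripted. The Python sketch below is not a full DQL parser: it assumes simple filters (commas for AND, OR only between values of one tag, `*` as the only wildcard), and `TAG_TO_LABEL` and `dql_filter_to_promql` are hypothetical names chosen for this example.

```python
import re

# Hypothetical mapping from Datadog tag keys to Prometheus label names.
# Tags that are not listed keep their name (e.g. env -> env).
TAG_TO_LABEL = {"host": "instance"}

# Regex metacharacters to escape in tag values ('-' is only special inside [...]).
_RE_META = re.compile(r"([\\.+?()\[\]{}|^$])")


def _value_to_regex(value: str) -> str:
    """Escape metacharacters, then turn Datadog '*' wildcards into '.*'."""
    return ".*".join(_RE_META.sub(r"\\\1", part) for part in value.split("*"))


def _quote(s: str) -> str:
    """Render s as a double-quoted PromQL string literal."""
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'


def dql_filter_to_promql(dql: str, tag_map: dict = TAG_TO_LABEL) -> str:
    """Translate a simple filter such as 'host:web-*,env:production' or
    'host:web-01 OR host:web-02' into PromQL label matchers."""
    dql = dql.strip()
    if dql in ("", "*"):
        return ""  # {*} means "no filter"
    matchers = []
    for group in dql.split(","):  # commas are AND
        key, values = None, []
        for term in group.split(" OR "):  # OR only within one tag key
            k, _, v = term.strip().partition(":")
            if key is not None and k != key:
                raise ValueError(f"OR across different tags is not supported: {group!r}")
            key = k
            values.append(v)
        label = tag_map.get(key, key)
        if len(values) == 1 and "*" not in values[0]:
            matchers.append(f"{label}={_quote(values[0])}")  # exact match
        else:
            regex = "|".join(_value_to_regex(v) for v in values)
            matchers.append(f"{label}=~{_quote(regex)}")  # regex match
    return "{" + ", ".join(matchers) + "}"
```

For example, `dql_filter_to_promql("host:web-01 OR host:web-02")` returns `{instance=~"web-01|web-02"}`. Prometheus anchors regex matchers at both ends, so `web-.*` matches whole values the way `web-*` does in Datadog.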

Aggregations

Average

Datadog:

avg:system.cpu.user{*}

Prometheus:

avg(rate(node_cpu_seconds_total{mode="user"}[5m]))

Sum

Datadog:

sum:requests.count{*}

Prometheus:

sum(http_requests_total)

Min/Max

Datadog:

min:system.mem.free{*}
max:system.mem.free{*}

Prometheus:

min(node_memory_MemFree_bytes)
max(node_memory_MemFree_bytes)

Aggregation by Tag/Label

Datadog:

avg:system.cpu.user{*} by {host}

Prometheus:

avg by (instance) (rate(node_cpu_seconds_total{mode="user"}[5m]))
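Grouped aggregation keeps one output series per distinct value of the `by` labels and drops every other label. As an illustration only, the hypothetical `avg_by` helper below applies that rule to one evaluation step of an instant vector, using made-up per-CPU values of the kind node_exporter reports for each host.

```python
from collections import defaultdict


def avg_by(series, by):
    """Average an instant vector per distinct combination of `by` label values.

    series: [(labels_dict, value), ...]. Labels not listed in `by` are dropped,
    and a missing label counts as the empty string, as in Prometheus.
    """
    groups = defaultdict(list)
    for labels, value in series:
        key = tuple(labels.get(name, "") for name in by)
        groups[key].append(value)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


# Made-up per-CPU user-mode rates for two hosts.
per_cpu = [
    ({"instance": "web-01", "cpu": "0"}, 0.2),
    ({"instance": "web-01", "cpu": "1"}, 0.4),
    ({"instance": "web-02", "cpu": "0"}, 0.1),
]
by_instance = avg_by(per_cpu, ["instance"])  # one value per instance; "cpu" is gone
```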

Rates and Counts

Rate (per second)

Datadog:

sum:requests.count{*}.as_rate()

Prometheus:

sum(rate(http_requests_total[5m]))

Note: Prometheus requires an explicit range window (here [5m]) inside rate().

Count (total over time)

Datadog:

sum:requests.count{*}.as_count()

Prometheus:

sum(increase(http_requests_total[1h]))
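`rate()` and `increase()` both read the raw counter samples in the window and treat any decrease as a counter reset. The Python sketch below shows that reset handling on invented samples; it deliberately skips the extrapolation to the window edges that Prometheus performs, so real results can differ slightly. `counter_increase` and `simple_rate` are hypothetical helper names.

```python
def counter_increase(samples):
    """Reset-aware increase of a counter.

    samples: [(unix_seconds, value), ...] sorted by time. A drop between two
    samples is treated as a counter reset, so the post-reset value counts
    as the increase for that step.
    """
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        total += cur - prev if cur >= prev else cur
    return total


def simple_rate(samples):
    """Per-second average rate between the first and last sample.

    Unlike Prometheus's rate(), this does NOT extrapolate to the edges of
    the range window.
    """
    if len(samples) < 2:
        return None  # rate() likewise returns nothing with fewer than 2 samples
    span = samples[-1][0] - samples[0][0]
    return counter_increase(samples) / span


samples = [(0, 100.0), (15, 130.0), (30, 160.0), (45, 10.0), (60, 40.0)]  # reset after t=30
increase = counter_increase(samples)  # 30 + 30 + 10 + 30 = 100.0
per_second = simple_rate(samples)     # 100 / 60 seconds
```

Per the Prometheus documentation, `increase(x[1h])` is syntactic sugar for `rate(x[1h]) * 3600`, so both queries in this section read the same counter.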

Derivative (change over time)

Datadog:

derivative(avg:system.disk.used{*})

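Prometheus (node_filesystem_size_bytes is the constant filesystem size, so derive used bytes as size - free; the [5m:] subquery turns that expression into a range vector):

deriv((node_filesystem_size_bytes - node_filesystem_free_bytes)[5m:])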
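`deriv()` is intended for gauges (use `rate()` for counters) and computes the per-second slope with simple linear regression over the samples in the range. A minimal Python sketch of that slope calculation, with made-up used-bytes samples; `deriv` here is a hypothetical local function, not a client-library call.

```python
def deriv(samples):
    """Per-second slope of a least-squares line through (timestamp, value) samples.

    samples: [(unix_seconds, value), ...] with at least two distinct timestamps.
    """
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    return cov / var


used = [(0, 100.0), (60, 220.0), (120, 340.0)]  # made-up used-bytes samples
slope = deriv(used)  # 2.0 bytes per second
```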

Percentiles

P50 (Median)

Datadog:

p50:request.duration{*}

Prometheus (requires a histogram metric, i.e. _bucket series carrying an le label):

histogram_quantile(0.50, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))

P95

Datadog:

p95:request.duration{*}

Prometheus:

histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))

P99

Datadog:

p99:request.duration{*}

Prometheus:

histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
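Because Prometheus stores only bucket counts, `histogram_quantile()` returns an estimate: it finds the bucket containing the requested rank and interpolates linearly inside it. The sketch below reproduces that logic in Python for a single, already aggregated histogram; the bucket values are invented, and edge cases such as negative bucket bounds and native histograms are left out.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate quantile q from cumulative histogram buckets.

    buckets: [(upper_bound, cumulative_count), ...] sorted by upper_bound and
    ending with (math.inf, total), e.g. sum by (le) of the bucket rates.
    Assumes 0 < q < 1, at least one finite bucket, and positive bounds.
    """
    if not 0 < q < 1:
        raise ValueError("this sketch only handles 0 < q < 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # First bucket whose cumulative count reaches the target rank.
    i = next(idx for idx, (_, count) in enumerate(buckets) if count >= rank)
    upper, count = buckets[i]
    if math.isinf(upper):
        # Rank falls in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    lower, below = (0.0, 0.0) if i == 0 else buckets[i - 1]
    # Linear interpolation, i.e. observations assumed spread evenly in the bucket.
    return lower + (upper - lower) * (rank - below) / (count - below)


latency = [(0.1, 50.0), (0.25, 80.0), (0.5, 95.0), (1.0, 100.0), (math.inf, 100.0)]
p90 = histogram_quantile(0.9, latency)  # about 0.417 seconds
```

The result can only be as precise as the bucket layout: every estimate between 0.25 and 0.5 seconds here is an interpolation, not an observed latency.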

Time Windows

Last 5 minutes

Datadog:

avg(last_5m):system.cpu.user{*}

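Prometheus (avg() cannot take a range vector; for a counter, the rate() window is the averaging window):

avg(rate(node_cpu_seconds_total{mode="user"}[5m])) * 100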

Last 1 hour

Datadog:

avg(last_1h):system.cpu.user{*}

Prometheus:

avg(rate(node_cpu_seconds_total{mode="user"}[1h])) * 100

For gauges, use avg_over_time() instead, e.g. avg_over_time(node_memory_MemAvailable_bytes[1h]).

Math Operations

Division

Datadog:

avg:system.mem.used{*} / avg:system.mem.total{*}

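Prometheus (node_exporter has no MemUsed metric; derive it from MemTotal and MemAvailable):

(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes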

Multiplication

Datadog:

avg:system.cpu.user{*} * 100

Prometheus:

avg(rate(node_cpu_seconds_total{mode="user"}[5m])) * 100

Percentage Calculation

Datadog:

(sum:requests.errors{*} / sum:requests.count{*}) * 100

Prometheus:

(sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))) * 100

Common Use Cases

CPU Usage Percentage

Datadog:

100 - avg:system.cpu.idle{*}

Prometheus:

100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
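node_exporter reports cumulative seconds per CPU and mode, and each CPU accumulates roughly one second per second across all modes, so the rate of the idle counter is the idle fraction. The Python sketch below works through that formula for one host with two made-up counter snapshots, assuming no counter reset between them; `cpu_usage_percent` is a hypothetical name.

```python
def cpu_usage_percent(idle_start, idle_end, seconds):
    """Compute 100 - avg(rate(idle)) * 100 for one host.

    idle_start, idle_end: {cpu_id: node_cpu_seconds_total{mode="idle"} value}
    at the start and end of a window `seconds` long. Assumes the counters
    did not reset in between.
    """
    idle_fractions = [(idle_end[cpu] - idle_start[cpu]) / seconds for cpu in idle_start]
    avg_idle = sum(idle_fractions) / len(idle_fractions)
    return 100 - avg_idle * 100


start = {"0": 1000.0, "1": 2000.0}  # made-up idle counters (seconds)
end = {"0": 1045.0, "1": 2057.0}    # 60 seconds later
usage = cpu_usage_percent(start, end, 60)  # idle fractions 0.75 and 0.95, so about 15.0
```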

Memory Usage Percentage

Datadog:

(avg:system.mem.used{*} / avg:system.mem.total{*}) * 100

Prometheus:

(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100

Disk Usage Percentage

Datadog:

(avg:system.disk.used{*} / avg:system.disk.total{*}) * 100

Prometheus:

(node_filesystem_size_bytes - node_filesystem_free_bytes) / node_filesystem_size_bytes * 100

Request Rate (requests/sec)

Datadog:

sum:requests.count{*}.as_rate()

Prometheus:

sum(rate(http_requests_total[5m]))

Error Rate Percentage

Datadog:

(sum:requests.errors{*}.as_rate() / sum:requests.count{*}.as_rate()) * 100

Prometheus:

(sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))) * 100

Request Latency (P95)

Datadog:

p95:request.duration{*}

Prometheus:

histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))

Top 5 Hosts by CPU

Datadog:

top(avg:system.cpu.user{*} by {host}, 5, 'mean', 'desc')

Prometheus:

topk(5, avg by (instance) (rate(node_cpu_seconds_total{mode="user"}[5m])))
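The two top-N functions rank differently. As we understand Datadog's top(), the 'mean' argument ranks each series once over the whole query window, while PromQL's topk() is evaluated independently at every step of a range query, so a graph can show more than 5 series over time. The Python sketch below (hypothetical helpers, invented values) illustrates the difference.

```python
def topk_per_step(series, k):
    """Like PromQL topk() in a range query: every step is ranked on its own.

    series: {name: [value_at_step_0, value_at_step_1, ...]}, equal lengths.
    Returns every series that is in the top k at *any* step.
    """
    steps = len(next(iter(series.values())))
    shown = set()
    for i in range(steps):
        ranked = sorted(series, key=lambda name: series[name][i], reverse=True)
        shown.update(ranked[:k])
    return shown


def top_by_mean(series, k):
    """Like Datadog top(..., k, 'mean', 'desc'): one ranking for the whole window."""
    ranked = sorted(series, key=lambda name: sum(series[name]) / len(series[name]), reverse=True)
    return set(ranked[:k])


cpu = {"a": [10.0, 2.0], "b": [1.0, 10.0], "c": [5.0, 5.0]}  # made-up values
```

Here `topk_per_step(cpu, 1)` yields `{'a', 'b'}` while `top_by_mean(cpu, 1)` yields `{'a'}`.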

Functions

Absolute Value

Datadog:

abs(diff(avg:system.mem.free{*}))

Prometheus (delta() is meant for gauges, not counters such as node_cpu_seconds_total):

abs(delta(node_memory_MemFree_bytes[5m]))

Ceiling/Floor

Datadog:

ceil(avg:system.cpu.user{*})
floor(avg:system.cpu.user{*})

Prometheus:

ceil(avg(rate(node_cpu_seconds_total{mode="user"}[5m])) * 100)
floor(avg(rate(node_cpu_seconds_total{mode="user"}[5m])) * 100)

Clamp (Limit Range)

Datadog:

clamp_min(avg:system.cpu.user{*}, 0)
clamp_max(avg:system.cpu.user{*}, 100)

Prometheus:

clamp_min(avg(rate(node_cpu_seconds_total{mode="user"}[5m])) * 100, 0)
clamp_max(avg(rate(node_cpu_seconds_total{mode="user"}[5m])) * 100, 100)

Moving Average

Datadog:

moving_rollup(avg:system.cpu.user{*}, 60, 'avg')

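Prometheus (for a counter, rate() over a 1m window is the 60-second moving average; use avg_over_time() for gauges):

avg(rate(node_cpu_seconds_total{mode="user"}[1m])) * 100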

Advanced Patterns

Compare to Previous Period

Datadog:

sum:requests.count{*}.as_rate() / timeshift(sum:requests.count{*}.as_rate(), 3600)

Prometheus:

sum(rate(http_requests_total[5m])) / sum(rate(http_requests_total[5m] offset 1h))

Forecast

Datadog:

forecast(avg:system.disk.used{*}, 'linear', 1)

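Prometheus (used bytes = size - free; node_filesystem_size_bytes alone is constant):

predict_linear((node_filesystem_size_bytes - node_filesystem_free_bytes)[1h:], 3600)

Note: predict_linear() fits a least-squares line to the last hour of samples and extrapolates it 3600 seconds (1 hour) past the evaluation time.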
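`predict_linear()` fits the same kind of least-squares line as `deriv()` and then evaluates it a given number of seconds after the query's evaluation time. A minimal Python sketch under that reading, with invented used-bytes samples and a hypothetical `predict_linear` function (not a client-library call):

```python
def predict_linear(samples, eval_time, horizon_seconds):
    """Fit a least-squares line to (timestamp, value) samples and evaluate it
    horizon_seconds after eval_time, the query's evaluation timestamp.

    samples: [(unix_seconds, value), ...] with at least two distinct timestamps.
    """
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    slope = (sum((t - mean_t) * (v - mean_v) for t, v in samples)
             / sum((t - mean_t) ** 2 for t, _ in samples))
    intercept = mean_v - slope * mean_t  # value of the fitted line at t = 0
    return intercept + slope * (eval_time + horizon_seconds)


used = [(0, 100.0), (1800, 190.0), (3600, 280.0)]  # made-up used bytes over 1h
forecast = predict_linear(used, eval_time=3600, horizon_seconds=3600)  # 460.0
```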


Anomaly Detection

Datadog:

anomalies(avg:system.cpu.user{*}, 'basic', 2)

Prometheus: No built-in function

  • Use recording rules that compute a z-score with avg_over_time() and stddev_over_time()
  • External tools like Robust Perception's anomaly detector
  • Or use the Grafana ML plugin

Outlier Detection

Datadog:

outliers(avg:system.cpu.user{*} by {host}, 'mad')

Prometheus: No built-in function

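  • Calculate manually with a standard-deviation rule (a stand-in for MAD, not an equivalent). Wrap avg() and stddev() in scalar(), because they drop every label and plain vector matching would return nothing:
abs(metric - scalar(avg(metric))) > scalar(2 * stddev(metric))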
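A standard-deviation rule flags series more than two population standard deviations from the mean of all series at each step (PromQL's stddev() is the population form). The Python sketch below applies that rule to one step of made-up per-host values; `stddev_outliers` is a hypothetical helper.

```python
import statistics


def stddev_outliers(values, k=2.0):
    """Names whose value is more than k population standard deviations from the
    mean of all values, i.e. abs(m - scalar(avg(m))) > scalar(k * stddev(m)).

    values: {series_name: value at one evaluation step}. PromQL's stddev() is
    the population standard deviation, hence statistics.pstdev.
    """
    vals = list(values.values())
    mean = statistics.fmean(vals)
    sd = statistics.pstdev(vals)
    return {name for name, v in values.items() if abs(v - mean) > k * sd}


eight_hosts = {"h1": 10, "h2": 11, "h3": 9, "h4": 10, "h5": 12, "h6": 10, "h7": 11, "h8": 50}
five_hosts = {"h1": 10, "h2": 11, "h3": 9, "h4": 10, "h5": 50}
flagged_eight = stddev_outliers(eight_hosts)  # {'h8'}
flagged_five = stddev_outliers(five_hosts)    # set(): 50 is just under 2 sd away
```

The five-host case is not a fluke: with n values, none can lie more than sqrt(n - 1) population standard deviations from the mean, so a strict 2-sigma rule cannot flag anything when there are five or fewer series.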

Container & Kubernetes

Container CPU Usage

Datadog:

avg:docker.cpu.usage{*} by {container_name}

Prometheus (container!="" drops the cAdvisor series that aggregate a whole pod cgroup):

avg by (container) (rate(container_cpu_usage_seconds_total{container!=""}[5m]))

Container Memory Usage

Datadog:

avg:docker.mem.rss{*} by {container_name}

Prometheus (container!="" drops the cAdvisor series that aggregate a whole pod cgroup):

avg by (container) (container_memory_rss{container!=""})

Pod Count by Status

Datadog:

sum:kubernetes.pods.running{*} by {kube_namespace}

Prometheus:

sum by (namespace) (kube_pod_status_phase{phase="Running"})

Database Queries

MySQL Queries Per Second

Datadog:

sum:mysql.performance.queries{*}.as_rate()

Prometheus:

sum(rate(mysql_global_status_queries[5m]))

PostgreSQL Active Connections

Datadog:

avg:postgresql.connections{*}

Prometheus:

avg(pg_stat_database_numbackends)

Redis Memory Usage

Datadog:

avg:redis.mem.used{*}

Prometheus:

avg(redis_memory_used_bytes)

Network Metrics

Network Bytes Sent

Datadog:

sum:system.net.bytes_sent{*}.as_rate()

Prometheus:

sum(rate(node_network_transmit_bytes_total[5m]))

Network Bytes Received

Datadog:

sum:system.net.bytes_rcvd{*}.as_rate()

Prometheus:

sum(rate(node_network_receive_bytes_total[5m]))

Key Differences

1. Time Windows

  • Datadog: Optional, defaults to query time range
  • Prometheus: Always required for rate/increase functions

2. Histograms

  • Datadog: Percentiles available directly
  • Prometheus: Requires histogram buckets + histogram_quantile()

3. Default Aggregation

  • Datadog: No default, must specify
  • Prometheus: Returns all time series unless aggregated

4. Metric Types

  • Datadog: Also typed (count, rate, gauge, histogram, distribution), but most query syntax is shared across types
  • Prometheus: Explicit types (counter, gauge, histogram, summary)

5. Tag vs Label

  • Datadog: Uses "tags" (key:value)
  • Prometheus: Uses "labels" (key="value")

Migration Tips

  1. Start with dashboards: Convert the most-used dashboards first
  2. Use recording rules: Pre-calculate expensive PromQL queries
  3. Test in parallel: Run both systems during the migration
  4. Document mappings: Create a team-specific translation guide
  5. Train the team: PromQL has a learning curve, so invest in training

Common Gotchas

Rate without Time Window

Wrong:

rate(http_requests_total)

Correct:

rate(http_requests_total[5m])

Aggregating Before Rate

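Wrong (summing first hides per-series counter resets from rate(), and without a subquery this does not even parse):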

rate(sum(http_requests_total)[5m])

Correct:

sum(rate(http_requests_total[5m]))
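Why the order matters: when one counter resets, the summed series drops, and rate() reads that drop as a reset of the whole sum. The Python sketch below compares the two orders on invented scrape values, using a hypothetical reset-aware `counter_increase` helper (same reset rule as PromQL, without window extrapolation).

```python
def counter_increase(samples):
    """Reset-aware increase: a drop between consecutive samples is treated as a
    counter reset, so the post-reset value counts as new increase."""
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        total += cur - prev if cur >= prev else cur
    return total


# Made-up scrapes of two counters at the same three timestamps;
# counter b restarts between the second and third scrape.
a = [100.0, 110.0, 120.0]
b = [500.0, 520.0, 10.0]

rate_then_sum = counter_increase(a) + counter_increase(b)        # 20 + 30 = 50.0
sum_then_rate = counter_increase([x + y for x, y in zip(a, b)])  # 30 + 130 = 160.0
```

The sum-first result over-reports because the drop from 630 to 130 in the summed series is counted as a full reset.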

Histogram Quantile Without by (le)

Wrong (sum() without by (le) drops the bucket label, so histogram_quantile() returns nothing):

histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])))

Correct:

histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))

Quick Conversion Checklist

When converting a Datadog query to PromQL:

  • Replace metric name (e.g., system.cpu.user → node_cpu_seconds_total)
  • Convert tags to labels ({tag:value} → {label="value"})
  • Add time window for rate/increase ([5m])
  • Change aggregation syntax (avg: → avg())
  • Convert percentiles to histogram_quantile if needed
  • Test query in Prometheus before adding to dashboard
  • Add by (label) for grouped aggregations
