68 lines
2.5 KiB
Markdown
68 lines
2.5 KiB
Markdown
# Observability Reference Documentation
|
|
|
|
Comprehensive reference guides for production observability patterns, PromQL queries, and SRE best practices.
|
|
|
|
## Reference Overview
|
|
|
|
### PromQL Query Language Guide
|
|
|
|
**File**: [promql-guide.md](promql-guide.md)
|
|
|
|
Complete PromQL reference for Prometheus queries:
|
|
- **Metric types**: Counter, Gauge, Histogram, Summary
|
|
- **PromQL functions**: rate(), irate(), increase(), sum(), avg(), histogram_quantile()
|
|
- **Recording rules**: Pre-aggregated metrics for performance
|
|
- **Alerting queries**: Burn rate calculations, threshold alerts
|
|
- **Performance tips**: Query optimization, avoiding cardinality explosions
|
|
|
|
**Use when**: Writing Prometheus queries, creating recording rules, debugging slow queries
|
|
|
|
---
|
|
|
|
### Golden Signals Reference
|
|
|
|
**File**: [golden-signals.md](golden-signals.md)
|
|
|
|
Google SRE Golden Signals implementation guide:
|
|
- **Request Rate (Traffic)**: RPS calculations, per-service breakdowns
|
|
- **Error Rate**: 5xx errors, client vs server errors, error budget impact
|
|
- **Latency (Duration)**: p50/p95/p99 percentiles, latency SLOs
|
|
- **Saturation**: CPU, memory, disk, connection pools
|
|
|
|
**Use when**: Designing monitoring dashboards, implementing SLIs, understanding system health
|
|
|
|
---
|
|
|
|
### SLO Best Practices
|
|
|
|
**File**: [slo-best-practices.md](slo-best-practices.md)
|
|
|
|
Google SRE SLO/SLI/Error Budget framework:
|
|
- **SLI selection**: Choosing meaningful indicators (availability, latency, throughput)
|
|
- **SLO targets**: Critical (99.95%), Essential (99.9%), Standard (99.5%)
|
|
- **Error budget policies**: Feature freeze thresholds, postmortem requirements
|
|
- **Multi-window burn rate alerts**: 1h, 6h, 24h windows
|
|
- **SLO review cadence**: Weekly reviews, quarterly adjustments
|
|
|
|
**Use when**: Implementing SLO framework, setting reliability targets, balancing velocity with reliability
|
|
|
|
---
|
|
|
|
## Quick Navigation
|
|
|
|
| Topic | File | Lines | Focus |
|
|
|-------|------|-------|-------|
|
|
| **PromQL** | [promql-guide.md](promql-guide.md) | ~450 | Query language reference |
|
|
| **Golden Signals** | [golden-signals.md](golden-signals.md) | ~380 | Four signals implementation |
|
|
| **SLO Practices** | [slo-best-practices.md](slo-best-practices.md) | ~420 | Google SRE framework |
|
|
|
|
## Related Documentation
|
|
|
|
- **Examples**: [Examples Index](../examples/INDEX.md) - Production implementations
|
|
- **Templates**: [Templates Index](../templates/INDEX.md) - Copy-paste configurations
|
|
- **Main Agent**: [observability-engineer.md](../observability-engineer.md) - Observability agent
|
|
|
|
---
|
|
|
|
Return to [main agent](../observability-engineer.md)
|