Initial commit
skills/observability-engineering/reference/INDEX.md
# Observability Reference Documentation

Comprehensive reference guides for production observability patterns, PromQL queries, and SRE best practices.

## Reference Overview
### PromQL Query Language Guide

**File**: [promql-guide.md](promql-guide.md)

Complete PromQL reference for Prometheus queries:
- **Metric types**: Counter, Gauge, Histogram, Summary
- **PromQL functions and aggregation operators**: rate(), irate(), increase(), histogram_quantile(), sum(), avg()
- **Recording rules**: Pre-aggregated metrics for performance
- **Alerting queries**: Burn rate calculations, threshold alerts
- **Performance tips**: Query optimization, avoiding cardinality explosions

**Use when**: Writing Prometheus queries, creating recording rules, debugging slow queries
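To make the counter functions listed above more concrete, the following Python sketch approximates what `increase()` and `rate()` compute over a window of counter samples. It is a simplified illustration under stated assumptions, not Prometheus's implementation: the real functions also extrapolate to the window boundaries, and the names `counter_increase` and `simple_rate` are hypothetical.

```python
from typing import Sequence, Tuple

# (unix timestamp in seconds, counter value); a hypothetical sample shape
Sample = Tuple[float, float]


def counter_increase(samples: Sequence[Sample]) -> float:
    """Total increase across the samples, treating any drop as a counter reset."""
    total = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        if curr >= prev:
            total += curr - prev
        else:
            # Counters only go up, so a lower value means the process restarted.
            # The new value is then the increase since the reset.
            total += curr
    return total


def simple_rate(samples: Sequence[Sample]) -> float:
    """Average per-second increase between the first and last sample."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to compute a rate")
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("samples must be ordered by increasing timestamp")
    return counter_increase(samples) / elapsed


# Example: a counter scraped every 15 s that resets between the 2nd and 3rd scrape.
window = [(0.0, 100.0), (15.0, 130.0), (30.0, 10.0), (45.0, 40.0)]
per_second = simple_rate(window)
```

The reset handling is the part that a plain "last minus first" subtraction gets wrong, which is why counters are usually queried through `rate()` or `increase()` instead of being read directly.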

---

### Golden Signals Reference

**File**: [golden-signals.md](golden-signals.md)

Google SRE Golden Signals implementation guide:
- **Request Rate (Traffic)**: RPS calculations, per-service breakdowns
- **Error Rate**: 5xx errors, client vs server errors, error budget impact
- **Latency (Duration)**: p50/p95/p99 percentiles, latency SLOs
- **Saturation**: CPU, memory, disk, connection pools

**Use when**: Designing monitoring dashboards, implementing SLIs, understanding system health
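As a rough illustration of how the first three signals relate to raw request data, the Python sketch below derives traffic, error ratio, and a latency percentile from a list of request records. The `Request` shape, the `summarize` helper, and the nearest-rank percentile method are assumptions made for this example; Prometheus histograms estimate quantiles from buckets instead. Saturation is left out because it comes from resource metrics, not request logs.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass(frozen=True)
class Request:
    """Hypothetical request record: HTTP status code and duration in seconds."""
    status: int
    duration_s: float


def percentile(values: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest value with at least p% of values at or below it."""
    if not values:
        raise ValueError("percentile of an empty sequence is undefined")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(requests: Sequence[Request], window_s: float) -> Dict[str, Optional[float]]:
    """Traffic, error ratio, and p95 latency for requests seen in one time window."""
    total = len(requests)
    # Only 5xx responses count as errors here; 4xx are treated as client mistakes.
    server_errors = sum(1 for r in requests if r.status >= 500)
    durations = [r.duration_s for r in requests]
    return {
        "rps": total / window_s,
        "error_ratio": server_errors / total if total else 0.0,
        "p95_s": percentile(durations, 95) if durations else None,
    }


# Example: 100 requests over a 50 s window, every 20th one failing with a 500.
sample = [Request(500 if i % 20 == 0 else 200, i / 1000) for i in range(1, 101)]
signals = summarize(sample, window_s=50.0)
```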

---

### SLO Best Practices

**File**: [slo-best-practices.md](slo-best-practices.md)

Google SRE SLO/SLI/Error Budget framework:
- **SLI selection**: Choosing meaningful indicators (availability, latency, throughput)
- **SLO targets**: Critical (99.95%), Essential (99.9%), Standard (99.5%)
- **Error budget policies**: Feature freeze thresholds, postmortem requirements
- **Multi-window burn rate alerts**: 1h, 6h, 24h windows
- **SLO review cadence**: Weekly reviews, quarterly adjustments

**Use when**: Implementing SLO framework, setting reliability targets, balancing velocity with reliability
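The SLO targets and burn-rate windows above can be tied together with a little arithmetic. The Python sketch below assumes a 30-day SLO period and, for the alert thresholds, an illustrative choice of how much of the budget each window may consume before alerting; neither value is stated in this index, and the function names are hypothetical.

```python
PERIOD_HOURS = 30 * 24  # assumption: a 30-day rolling SLO period


def error_budget_minutes(slo_percent: float, period_hours: float = PERIOD_HOURS) -> float:
    """Minutes of full outage the SLO tolerates over one period."""
    return (1 - slo_percent / 100) * period_hours * 60


def burn_rate(error_ratio: float, slo_percent: float) -> float:
    """How fast the budget is being spent; 1.0 means exactly on budget."""
    return error_ratio / (1 - slo_percent / 100)


def burn_rate_threshold(budget_fraction: float, window_hours: float,
                        period_hours: float = PERIOD_HOURS) -> float:
    """Burn rate that consumes `budget_fraction` of the budget within `window_hours`."""
    return budget_fraction * period_hours / window_hours


# Budgets for the three tiers named in the list above.
budgets = {tier: error_budget_minutes(slo)
           for tier, slo in [("critical", 99.95), ("essential", 99.9), ("standard", 99.5)]}

# Assumed budget fractions per alert window (illustrative, tune per service).
thresholds = {window: burn_rate_threshold(fraction, window)
              for window, fraction in [(1, 0.02), (6, 0.05), (24, 0.10)]}
```

Under these assumptions a 99.9% target leaves roughly 43 minutes of budget per period, and a 1% error ratio against that target burns the budget about ten times faster than sustainable. Shorter windows get higher thresholds so that they only fire on fast, severe burns.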

---

## Quick Navigation

| Topic | File | Lines | Focus |
|-------|------|-------|-------|
| **PromQL** | [promql-guide.md](promql-guide.md) | ~450 | Query language reference |
| **Golden Signals** | [golden-signals.md](golden-signals.md) | ~380 | Four signals implementation |
| **SLO Practices** | [slo-best-practices.md](slo-best-practices.md) | ~420 | Google SRE framework |
## Related Documentation

- **Examples**: [Examples Index](../examples/INDEX.md) - Production implementations
- **Templates**: [Templates Index](../templates/INDEX.md) - Copy-paste configurations
- **Main Agent**: [observability-engineer.md](../observability-engineer.md) - Observability agent

---

Return to [main agent](../observability-engineer.md)