1.2 KiB
1.2 KiB
description
| description |
|---|
| Create intelligent alerting rules |
Alerting Rule Creator
Create comprehensive alerting rules to detect performance issues proactively.
Alert Categories
- Latency Alerts: Response time degradation
- Error Rate Alerts: Elevated error percentages
- Throughput Alerts: Traffic anomalies
- Resource Alerts: CPU, memory, disk saturation
- Availability Alerts: Service downtime
- SLO Violation Alerts: Service level breach warnings
Alert Design Principles
- Avoid alert fatigue with proper thresholds
- Use multi-window alerts to reduce false positives
- Implement severity levels (critical, warning, info)
- Include actionable information in alert messages
- Set appropriate escalation policies
Process
- Identify critical metrics and failure modes
- Define alert conditions and thresholds
- Design alert routing and escalation
- Create alert rule configurations
- Implement runbooks for alert responses
Output
Provide:
- Alert rule definitions (Prometheus, Datadog, etc.)
- Threshold calculations with rationale
- Alert routing configuration
- Escalation policies
- Runbooks for common alerts
- Alert testing procedures