# Incident Timeline: [INCIDENT TITLE]

**Incident ID**: INC-YYYY-MM-DD-XXX
**Severity**: [SEV1 / SEV2 / SEV3]
**Status**: [Investigating / Mitigating / Resolved / Monitoring]
**Started**: [YYYY-MM-DD HH:MM UTC]

---

## Incident Overview

**Impact**:
- Customer Impact: [All users / X% of users / Specific feature]
- Services Affected: [List affected services]
- Error Rate: [X%]
- Revenue Impact: [$X estimated]

**Symptoms**:
- [User-facing symptom 1]
- [User-facing symptom 2]
- [Metric: baseline → current]

---

## Team

**Incident Commander**: @[name]
**Technical Lead**: @[name]
**Communications Lead**: @[name]
**Scribe**: @[name]
**SMEs**: @[name1], @[name2]

**Channels**:
- Slack: #incident-XXX
- Zoom: [link]
- Status Page: [link]

---

## Timeline

| Time (UTC) | Event | Action Taken | Owner | Status |
|------------|-------|--------------|-------|--------|
| [HH:MM] | [Alert fired / Issue detected] | [What was done] | @[name] | 🔴 Started |
| [HH:MM] | [IC joined] | [Declared severity, assigned roles] | @[IC] | 🔴 Investigating |
| [HH:MM] | [Discovery] | [What was found] | @[name] | 🔴 Investigating |
| [HH:MM] | [Root cause identified] | [What the root cause is] | @[name] | 🟡 Identified |
| [HH:MM] | [Mitigation started] | [What fix is being applied] | @[name] | 🟡 Mitigating |
| [HH:MM] | [Mitigation complete] | [Verification of fix] | @[name] | 🟢 Mitigated |
| [HH:MM] | [Incident resolved] | [All checks passing] | @[IC] | 🟢 Resolved |

**Total Duration**: [X] minutes/hours
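
If it helps the scribe, the Total Duration value can be derived from the first and last `Time (UTC)` entries in the timeline rather than worked out by hand. The following is a minimal sketch, not part of the template itself: the function names `total_duration` and `format_duration` are hypothetical, and it assumes the incident lasted less than 24 hours, so an end time earlier than the start time is read as falling on the next day.

```python
from datetime import datetime, timedelta


def total_duration(start: str, end: str) -> timedelta:
    """Elapsed time between two HH:MM (UTC) timeline entries.

    Assumption: the incident lasted under 24 hours, so an end time
    earlier than the start time is treated as the following day.
    """
    fmt = "%H:%M"
    t0 = datetime.strptime(start, fmt)
    t1 = datetime.strptime(end, fmt)
    if t1 < t0:
        t1 += timedelta(days=1)  # incident crossed midnight UTC
    return t1 - t0


def format_duration(delta: timedelta) -> str:
    """Render a duration in the template's minutes/hours style."""
    minutes = int(delta.total_seconds() // 60)
    hours, mins = divmod(minutes, 60)
    return f"{hours}h {mins}m" if hours else f"{mins} minutes"


if __name__ == "__main__":
    # Illustrative times only: alert fired at 14:05, resolved at 15:20.
    print(format_duration(total_duration("14:05", "15:20")))
```

For incidents that run longer than a day, full `YYYY-MM-DD HH:MM` timestamps (as in the **Started** field) would be needed instead of bare `HH:MM` values.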

---

## Status Updates

### Update #1 ([HH:MM UTC] - T+[X] min)

**Status**: [Investigating / Mitigating]
**Root Cause**: [Known / Unknown - investigating X]
**Current Actions**: [What team is doing]
**Impact**: [Current impact status]
**ETA**: [Estimated resolution time OR "Unknown"]
**Next Update**: [Time]

### Update #2 ([HH:MM UTC] - T+[X] min)

[Same format as Update #1]

### Final Update ([HH:MM UTC] - T+[X] min)

**Status**: Resolved
**Root Cause**: [Brief summary]
**Fix Applied**: [What was done]
**Impact**: Resolved
**Monitoring**: [Ongoing monitoring period]
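
The update headings above combine the current UTC time with the minutes elapsed since the **Started** timestamp (the `T+[X] min` part). As a rough sketch of how that heading could be generated, assuming both timestamps are written as `YYYY-MM-DD HH:MM` in UTC (the helper name `update_header` is hypothetical and not part of the template):

```python
from datetime import datetime


def update_header(number: int, started: str, now: str) -> str:
    """Build a heading like '### Update #1 (14:35 UTC - T+30 min)'.

    Assumption: `started` and `now` are both 'YYYY-MM-DD HH:MM' in UTC.
    """
    fmt = "%Y-%m-%d %H:%M"
    t0 = datetime.strptime(started, fmt)
    t1 = datetime.strptime(now, fmt)
    elapsed = int((t1 - t0).total_seconds() // 60)  # whole minutes since start
    return f"### Update #{number} ({t1:%H:%M} UTC - T+{elapsed} min)"


if __name__ == "__main__":
    # Illustrative timestamps only.
    print(update_header(1, "2024-01-15 14:05", "2024-01-15 14:35"))
```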

---

## Root Cause (Brief)

**Immediate Cause**: [What directly caused the issue]

**Contributing Factors**:
1. [Factor 1]
2. [Factor 2]
3. [Factor 3]

---

## Resolution Summary

**Temporary Fix** (if applicable):
- [What was done to quickly mitigate]
- [When it was applied]

**Permanent Fix**:
- [What was done for long-term solution]
- [When it was applied]

**Verification**:
- [How we confirmed the fix worked]
- [Metrics that returned to normal]

---

## Communications

### Internal

- [HH:MM] - SEV1 declared in #incidents
- [HH:MM] - Update #1 posted
- [HH:MM] - Update #2 posted
- [HH:MM] - Resolution announced

### External

- [HH:MM] - Status page: "Investigating"
- [HH:MM] - Status page: "Identified"
- [HH:MM] - Status page: "Monitoring"
- [HH:MM] - Status page: "Resolved"
- [HH:MM] - Customer email sent (if applicable)

### Executive

- [HH:MM] - Initial notification to CTO/CEO (SEV1 only)
- [HH:MM] - Resolution summary sent

---

## Next Steps

- [ ] Full postmortem scheduled: [Date/Time]
- [ ] Action items created in Linear
- [ ] Runbook updated with new learnings
- [ ] Monitoring improvements identified

---

## Notes

[Any additional context, observations, or learnings captured during the incident]

---

Return to [templates index](INDEX.md)