Incident Timeline: [INCIDENT TITLE]
Incident ID: INC-YYYY-MM-DD-XXX
Severity: [SEV1 / SEV2 / SEV3]
Status: [Investigating / Mitigating / Resolved / Monitoring]
Started: [YYYY-MM-DD HH:MM UTC]
Incident Overview
Impact:
- Customer Impact: [All users / X% of users / Specific feature]
- Services Affected: [List affected services]
- Error Rate: [X%]
- Revenue Impact: [$X estimated]
Symptoms:
- [User-facing symptom 1]
- [User-facing symptom 2]
- [Metric: baseline → current]
Team
Incident Commander: @[name]
Technical Lead: @[name]
Communications Lead: @[name]
Scribe: @[name]
SMEs: @[name1], @[name2]
Channels:
- Slack: #incident-XXX
- Zoom: [link]
- Status Page: [link]
Timeline
| Time (UTC) | Event | Action Taken | Owner | Status |
|---|---|---|---|---|
| [HH:MM] | [Alert fired / Issue detected] | [What was done] | @[name] | 🔴 Started |
| [HH:MM] | [IC joined] | [Declared severity, assigned roles] | @[IC] | 🔴 Investigating |
| [HH:MM] | [Discovery] | [What was found] | @[name] | 🔴 Investigating |
| [HH:MM] | [Root cause identified] | [What the root cause is] | @[name] | 🟡 Identified |
| [HH:MM] | [Mitigation started] | [What fix is being applied] | @[name] | 🟡 Mitigating |
| [HH:MM] | [Mitigation complete] | [Verification of fix] | @[name] | 🟢 Mitigated |
| [HH:MM] | [Incident resolved] | [All checks passing] | @[IC] | 🟢 Resolved |
Total Duration: [X] minutes/hours
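The total duration can be derived from the first and last timestamps in the timeline. The following is a minimal sketch, assuming both timestamps are recorded as `YYYY-MM-DD HH:MM` in UTC; the function name `total_duration` and the sample timestamps are illustrative and not part of the template.

```python
from datetime import datetime, timezone

FMT = "%Y-%m-%d %H:%M"  # assumed timestamp format, matching the "Started" field


def total_duration(started: str, resolved: str) -> str:
    """Return the elapsed time between two UTC timestamps as a readable string."""
    start = datetime.strptime(started, FMT).replace(tzinfo=timezone.utc)
    end = datetime.strptime(resolved, FMT).replace(tzinfo=timezone.utc)
    minutes = int((end - start).total_seconds() // 60)
    if minutes < 0:
        raise ValueError("resolved time precedes start time")
    hours, rem = divmod(minutes, 60)
    return f"{hours} h {rem} min" if hours else f"{rem} min"


# Hypothetical incident that crosses midnight UTC
print(total_duration("2024-01-01 23:50", "2024-01-02 01:05"))  # prints "1 h 15 min"
```

Using full dates rather than bare `HH:MM` values keeps the result correct when an incident spans midnight.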
Status Updates
Update #1 ([HH:MM UTC] - T+[X] min)
Status: [Investigating / Mitigating]
Root Cause: [Known / Unknown - investigating X]
Current Actions: [What team is doing]
Impact: [Current impact status]
ETA: [Estimated resolution time OR "Unknown"]
Next Update: [Time]
Update #2 ([HH:MM UTC] - T+[X] min)
[Same format as Update #1]
Final Update ([HH:MM UTC] - T+[X] min)
Status: Resolved
Root Cause: [Brief summary]
Fix Applied: [What was done]
Impact: Resolved
Monitoring: [Ongoing monitoring period]
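The `T+[X] min` offset in each update heading is the number of minutes elapsed since the incident's Started time. As a rough sketch, assuming `YYYY-MM-DD HH:MM` UTC timestamps, a heading could be generated as follows; `update_header` and the sample values are hypothetical helpers for illustration only.

```python
from datetime import datetime


def update_header(number: int, started: str, posted: str) -> str:
    """Build an update heading with the T+ offset in minutes since incident start."""
    fmt = "%Y-%m-%d %H:%M"  # assumed format; both values are taken to be UTC
    start = datetime.strptime(started, fmt)
    now = datetime.strptime(posted, fmt)
    offset = int((now - start).total_seconds() // 60)
    return f"Update #{number} ({now:%H:%M} UTC - T+{offset} min)"


# Hypothetical incident started at 14:02, first update posted at 14:17
print(update_header(1, "2024-03-05 14:02", "2024-03-05 14:17"))
# prints "Update #1 (14:17 UTC - T+15 min)"
```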
Root Cause (Brief)
Immediate Cause: [What directly caused the issue]
Contributing Factors:
- [Factor 1]
- [Factor 2]
- [Factor 3]
Resolution Summary
Temporary Fix (if applicable):
- [What was done to quickly mitigate]
- [When it was applied]
Permanent Fix:
- [What was done for long-term solution]
- [When it was applied]
Verification:
- [How we confirmed the fix worked]
- [Metrics that returned to normal]
Communications
Internal
- [HH:MM] - [SEV1 / SEV2 / SEV3] declared in #incidents
- [HH:MM] - Update #1 posted
- [HH:MM] - Update #2 posted
- [HH:MM] - Resolution announced
External
- [HH:MM] - Status page: "Investigating"
- [HH:MM] - Status page: "Identified"
- [HH:MM] - Status page: "Monitoring"
- [HH:MM] - Status page: "Resolved"
- [HH:MM] - Customer email sent (if applicable)
Executive
- [HH:MM] - Initial notification to CTO/CEO (SEV1 only)
- [HH:MM] - Resolution summary sent
Next Steps
- Full postmortem scheduled: [Date/Time]
- Action items created in Linear
- Runbook updated with new learnings
- Monitoring improvements identified
Notes
[Any additional context, observations, or learnings captured during the incident]