Initial commit
skills/incident-response/templates/postmortem-template.md
# Postmortem: [INCIDENT TITLE]

**Date**: [YYYY-MM-DD]
**Incident ID**: INC-YYYY-MM-DD-XXX
**Severity**: [SEV1 / SEV2 / SEV3]
**Author**: [Name]
**Reviewers**: [Names]
**Status**: [Draft / Final]

---

## Executive Summary

**What Happened**: [2-3 sentence summary of the incident]

**Impact**:
- **Duration**: [X] minutes/hours
- **Users Affected**: [All / X% / specific group]
- **Revenue Impact**: [$X estimated loss]
- **SLA Breach**: [Yes/No - details]

**Root Cause**: [1 sentence root cause]

**Resolution**: [1 sentence how it was fixed]

**Key Actions**: [3 most important action items]

---

## Timeline

| Time (UTC) | Event | Notes |
|------------|-------|-------|
| [HH:MM] | [Event] | [Context] |
| [HH:MM] | [Event] | [Context] |
| [HH:MM] | [Event] | [Context] |

**Duration Breakdown**:
- Detection → Identification: [X] minutes
- Identification → Mitigation: [X] minutes
- Mitigation → Full Resolution: [X] minutes
- **Total MTTR**: [X] minutes

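
As a worked example of the arithmetic behind the duration breakdown above, the sketch below derives the three phase durations and their total from four timeline timestamps. It is illustrative only: the function name `duration_breakdown`, the dictionary keys, and the sample timestamps are hypothetical, and it assumes each timestamp is recorded in UTC with its date so that incidents spanning midnight are handled.

```python
from datetime import datetime


def duration_breakdown(detected, identified, mitigated, resolved):
    """Return per-phase and total durations in whole minutes.

    All four arguments are UTC timestamps formatted as 'YYYY-MM-DD HH:MM'.
    """
    fmt = "%Y-%m-%d %H:%M"
    times = [datetime.strptime(s, fmt)
             for s in (detected, identified, mitigated, resolved)]

    # Reject timelines whose events are out of chronological order.
    if any(later < earlier for earlier, later in zip(times, times[1:])):
        raise ValueError("timestamps must be in chronological order")

    # Minutes elapsed between each consecutive pair of events.
    minutes = [int((b - a).total_seconds() // 60)
               for a, b in zip(times, times[1:])]

    return {
        "detection_to_identification": minutes[0],
        "identification_to_mitigation": minutes[1],
        "mitigation_to_resolution": minutes[2],
        "total": sum(minutes),
    }


# Hypothetical incident that crosses midnight UTC.
print(duration_breakdown(
    "2024-01-15 23:50",  # detected
    "2024-01-16 00:05",  # identified
    "2024-01-16 00:35",  # mitigated
    "2024-01-16 01:20",  # fully resolved
))
```
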
---

## Root Cause Analysis (5 Whys)

**Why 1**: Why did [problem] happen?
→ [Answer based on facts]

**Why 2**: Why did [previous answer] happen?
→ [Answer based on facts]

**Why 3**: Why did [previous answer] happen?
→ [Answer based on facts]

**Why 4**: Why did [previous answer] happen?
→ [Answer based on facts]

**Why 5**: Why did [previous answer] happen?
→ [Answer based on facts]

**ROOT CAUSE**: [Final systemic issue identified]

---

## Contributing Factors

### Immediate Cause
[Direct technical cause of the incident]

### Underlying Conditions
1. [Condition that enabled the immediate cause]
2. [Condition that enabled the immediate cause]

### Latent Failures
1. [Organizational/process weakness]
2. [Organizational/process weakness]

---

## What Went Well ✅

1. [Something that worked well during response]
2. [Something that worked well during response]
3. [Something that worked well during response]

---

## What Went Wrong ❌

1. [Something that didn't work or was missing]
2. [Something that didn't work or was missing]
3. [Something that didn't work or was missing]

---

## Action Items

| Priority | Action | Owner | Due Date | Status | Link |
|----------|--------|-------|----------|--------|------|
| P0 | [Critical - do immediately] | @[name] | [Date] | [ ] | [Link] |
| P1 | [Important - do within 1 week] | @[name] | [Date] | [ ] | [Link] |
| P2 | [Nice to have - do within 1 month] | @[name] | [Date] | [ ] | [Link] |

### P0 Actions (Immediate)
- [ ] [Action 1] - @[owner] - [due date]
- [ ] [Action 2] - @[owner] - [due date]

### P1 Actions (Short-Term)
- [ ] [Action 1] - @[owner] - [due date]
- [ ] [Action 2] - @[owner] - [due date]

### P2 Actions (Long-Term)
- [ ] [Action 1] - @[owner] - [due date]
- [ ] [Action 2] - @[owner] - [due date]

---

## Lessons Learned

### Technical Learnings
1. [Technical insight gained]
2. [Technical insight gained]

### Process Learnings
1. [Process improvement identified]
2. [Process improvement identified]

### Communication Learnings
1. [Communication improvement identified]
2. [Communication improvement identified]

---

## Prevention Measures

### Immediate (Completed)
- [x] [What was done same day]
- [x] [What was done same day]

### Short-Term (1-2 weeks)
- [ ] [What will be done soon]
- [ ] [What will be done soon]

### Long-Term (1-3 months)
- [ ] [What will be done eventually]
- [ ] [What will be done eventually]

---

## Related Incidents

- [INC-YYYY-MM-DD-XXX] - [Brief description] - [Link]
- [INC-YYYY-MM-DD-XXX] - [Brief description] - [Link]

---

## Appendix

### Relevant Logs
```
[Paste key log entries]
```

### Metrics/Graphs
[Links to Grafana dashboards, screenshots]

### Commands Run
```bash
[Commands that were used during investigation/mitigation]
```

---

## Sign-Off

**Incident Commander**: [Name] - [Date]
**Technical Lead**: [Name] - [Date]
**Engineering Manager**: [Name] - [Date]

**Postmortem Review**: [Date/Time]
**Attendees**: [List of people who reviewed]

---

Return to [templates index](INDEX.md)