# Postmortem: [INCIDENT TITLE]

**Date**: [YYYY-MM-DD]

**Incident ID**: INC-YYYY-MM-DD-XXX

**Severity**: [SEV1 / SEV2 / SEV3]

**Author**: [Name]

**Reviewers**: [Names]

**Status**: [Draft / Final]

---

## Executive Summary

**What Happened**: [2-3 sentence summary of the incident]

**Impact**:
- **Duration**: [X] minutes/hours
- **Users Affected**: [All / X% / specific group]
- **Revenue Impact**: [$X estimated loss]
- **SLA Breach**: [Yes/No - details]

**Root Cause**: [1 sentence root cause]

**Resolution**: [1 sentence how it was fixed]

**Key Actions**: [3 most important action items]

---

## Timeline

| Time (UTC) | Event | Notes |
|------------|-------|-------|
| [HH:MM] | [Event] | [Context] |
| [HH:MM] | [Event] | [Context] |
| [HH:MM] | [Event] | [Context] |

**Duration Breakdown**:
- Detection → Identification: [X] minutes
- Identification → Mitigation: [X] minutes
- Mitigation → Full Resolution: [X] minutes
- **Total MTTR**: [X] minutes

---

## Root Cause Analysis (5 Whys)

**Why 1**: Why did [problem] happen?
→ [Answer based on facts]

**Why 2**: Why did [previous answer] happen?
→ [Answer based on facts]

**Why 3**: Why did [previous answer] happen?
→ [Answer based on facts]

**Why 4**: Why did [previous answer] happen?
→ [Answer based on facts]

**Why 5**: Why did [previous answer] happen?
→ [Answer based on facts]

**ROOT CAUSE**: [Final systemic issue identified]

---

## Contributing Factors

### Immediate Cause

[Direct technical cause of the incident]

### Underlying Conditions

1. [Condition that enabled the immediate cause]
2. [Condition that enabled the immediate cause]

### Latent Failures

1. [Organizational/process weakness]
2. [Organizational/process weakness]

---

## What Went Well ✅

1. [Something that worked well during response]
2. [Something that worked well during response]
3. [Something that worked well during response]

---

## What Went Wrong ❌

1. [Something that didn't work or was missing]
2. [Something that didn't work or was missing]
3. [Something that didn't work or was missing]

---

## Action Items

| Priority | Action | Owner | Due Date | Status | Link |
|----------|--------|-------|----------|--------|------|
| P0 | [Critical - do immediately] | @[name] | [Date] | [ ] | [Link] |
| P1 | [Important - do within 1 week] | @[name] | [Date] | [ ] | [Link] |
| P2 | [Nice to have - do within 1 month] | @[name] | [Date] | [ ] | [Link] |

### P0 Actions (Immediate)

- [ ] [Action 1] - @[owner] - [due date]
- [ ] [Action 2] - @[owner] - [due date]

### P1 Actions (Short-Term)

- [ ] [Action 1] - @[owner] - [due date]
- [ ] [Action 2] - @[owner] - [due date]

### P2 Actions (Long-Term)

- [ ] [Action 1] - @[owner] - [due date]
- [ ] [Action 2] - @[owner] - [due date]

---

## Lessons Learned

### Technical Learnings

1. [Technical insight gained]
2. [Technical insight gained]

### Process Learnings

1. [Process improvement identified]
2. [Process improvement identified]

### Communication Learnings

1. [Communication improvement identified]
2. [Communication improvement identified]

---

## Prevention Measures

### Immediate (Completed)

- [x] [What was done same day]
- [x] [What was done same day]

### Short-Term (1-2 weeks)

- [ ] [What will be done soon]
- [ ] [What will be done soon]

### Long-Term (1-3 months)

- [ ] [What will be done eventually]
- [ ] [What will be done eventually]

---

## Related Incidents

- [INC-YYYY-MM-DD-XXX] - [Brief description] - [Link]
- [INC-YYYY-MM-DD-XXX] - [Brief description] - [Link]

---

## Appendix

### Relevant Logs

```
[Paste key log entries]
```

### Metrics/Graphs

[Links to Grafana dashboards, screenshots]

### Commands Run

```bash
[Commands that were used during investigation/mitigation]
```

---

## Sign-Off

**Incident Commander**: [Name] - [Date]

**Technical Lead**: [Name] - [Date]

**Engineering Manager**: [Name] - [Date]

**Postmortem Review**: [Date/Time]

**Attendees**: [List of people who reviewed]

---

Return to [templates index](INDEX.md)