---
description: Expedited emergency response workflow for critical production issues
---

<!--
ATTRIBUTION CHAIN:
1. Original methodology: spec-kit-extensions (https://github.com/MartyBonacci/spec-kit-extensions)
   by Marty Bonacci (2025)
2. Adapted: SpecLab plugin by Marty Bonacci & Claude Code (2025)
3. Based on: GitHub spec-kit (https://github.com/github/spec-kit)
   Copyright (c) GitHub, Inc. | MIT License
-->

## User Input

```text
$ARGUMENTS
```

You **MUST** consider the user input before proceeding (if not empty).

## Goal

Execute an expedited emergency-response workflow for critical production issues that require immediate resolution.

**Key Principles**:
1. **Speed First**: Minimal process overhead, maximum velocity
2. **Safety**: Despite speed, maintain essential safeguards
3. **Rollback Ready**: Always have a rollback plan
4. **Post-Mortem**: After the fire is out, analyze and learn
5. **Communication**: Keep stakeholders informed

**Use Cases**: Critical bugs, security vulnerabilities, data corruption, service outages

**Coverage**: Addresses ~10-15% of development work (emergencies)

---

## Smart Integration Detection

```bash
# Check for SpecSwarm (tech stack enforcement) - OPTIONAL in hotfix
SPECSWARM_INSTALLED=$(claude plugin list 2>/dev/null | grep -q "specswarm" && echo "true" || echo "false")

# Check for SpecTest (parallel execution, hooks, metrics)
SPECTEST_INSTALLED=$(claude plugin list 2>/dev/null | grep -q "spectest" && echo "true" || echo "false")

# Default both flags so later steps and templates never read an unset value
ENABLE_METRICS=false
TECH_VALIDATION_AVAILABLE=false

# In hotfix mode, integration is OPTIONAL - speed is priority
if [ "$SPECTEST_INSTALLED" = "true" ]; then
  ENABLE_METRICS=true
  echo "🎯 SpecTest detected (metrics enabled, but minimal overhead)"
fi

if [ "$SPECSWARM_INSTALLED" = "true" ]; then
  TECH_VALIDATION_AVAILABLE=true
  echo "🎯 SpecSwarm detected (tech validation available if time permits)"
fi
```
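
The two detection one-liners above can be folded into a single helper. This is a minimal sketch, not part of the plugin: `plugin_installed` is a hypothetical name, and it assumes (as the snippet above does) that `claude plugin list` prints the names of installed plugins. It returns false rather than erroring when the `claude` CLI is not on the `PATH`, and it defaults both flags so later steps never read an unset variable.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of SpecLab): succeed if the named plugin
# appears in the output of `claude plugin list`.
plugin_installed() {
  local name="$1"
  # If the CLI is missing, treat every plugin as not installed.
  command -v claude >/dev/null 2>&1 || return 1
  claude plugin list 2>/dev/null | grep -q -- "$name"
}

# Default both flags to false, then enable whichever integrations are found.
ENABLE_METRICS=false
TECH_VALIDATION_AVAILABLE=false
if plugin_installed "spectest"; then ENABLE_METRICS=true; fi
if plugin_installed "specswarm"; then TECH_VALIDATION_AVAILABLE=true; fi
echo "metrics=${ENABLE_METRICS} tech_validation=${TECH_VALIDATION_AVAILABLE}"
```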

**Note**: In hotfix mode, speed takes precedence. Tech validation and hooks are OPTIONAL.

---

## Pre-Workflow Hook (if SpecTest installed)

```bash
if [ "$ENABLE_METRICS" = "true" ]; then
  echo "🎣 Pre-Hotfix Hook (minimal overhead mode)"
  WORKFLOW_START_TIME=$(date +%s)
  echo "✓ Emergency metrics tracking initialized"
  echo ""
fi
```

---

## Execution Steps

### 1. Discover Hotfix Context

```bash
REPO_ROOT=$(git rev-parse --show-toplevel 2>/dev/null || pwd)
CURRENT_BRANCH=$(git rev-parse --abbrev-ref HEAD 2>/dev/null)

# Try to extract hotfix number from branch (hotfix/NNN-*)
HOTFIX_NUM=$(echo "$CURRENT_BRANCH" | grep -oE 'hotfix/([0-9]{3})' | grep -oE '[0-9]{3}')

if [ -z "$HOTFIX_NUM" ]; then
  echo "🚨 HOTFIX WORKFLOW - EMERGENCY MODE"
  echo ""
  echo "No hotfix branch detected. Provide hotfix number:"
  read -r -p "Hotfix number: " HOTFIX_NUM
  # Force base 10 so zero-padded input such as "008" is not parsed as octal
  HOTFIX_NUM=$(printf "%03d" "$((10#$HOTFIX_NUM))")
fi

# Source features location helper
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PLUGIN_DIR="$(dirname "$SCRIPT_DIR")"
source "$PLUGIN_DIR/lib/features-location.sh"

# Initialize features directory
ensure_features_dir "$REPO_ROOT"

FEATURE_DIR="${FEATURES_DIR}/${HOTFIX_NUM}-hotfix"
mkdir -p "$FEATURE_DIR"

HOTFIX_SPEC="${FEATURE_DIR}/hotfix.md"
TASKS_FILE="${FEATURE_DIR}/tasks.md"
```

Output:
```
🚨 Hotfix Workflow - EMERGENCY MODE - Hotfix ${HOTFIX_NUM}
✓ Branch: ${CURRENT_BRANCH}
✓ Feature directory: ${FEATURE_DIR}
⚡ Expedited process active (minimal overhead)
```
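
The branch parsing and zero-padding in this step can be sketched as two small functions. Both names (`hotfix_num_from_branch`, `normalize_hotfix_num`) are hypothetical and not part of the plugin. The sketch is deliberately stricter than a plain `grep`: it anchors the pattern at the start of the branch name, rejects non-numeric input, and forces base 10, because `printf '%03d'` would otherwise read a zero-padded value such as `008` as an invalid octal number.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the three-digit number from a branch named
# hotfix/NNN or hotfix/NNN-description. Prints nothing if there is no match.
hotfix_num_from_branch() {
  local branch="$1"
  local re='^hotfix/([0-9]{3})(-|$)'
  if [[ "$branch" =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"
  fi
}

# Hypothetical helper: zero-pad a user-supplied number to three digits.
# The 10# prefix forces base 10, so "008" is not parsed as octal.
normalize_hotfix_num() {
  local raw="$1"
  local re='^[0-9]{1,3}$'
  if [[ ! "$raw" =~ $re ]]; then
    echo "Invalid hotfix number: $raw" >&2
    return 1
  fi
  printf '%03d\n' "$((10#$raw))"
}

hotfix_num_from_branch "hotfix/042-login-crash"   # prints 042
normalize_hotfix_num "8"                          # prints 008
```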

---

### 2. Create Minimal Hotfix Specification

Create `$HOTFIX_SPEC` with ESSENTIAL details only:

```markdown
# Hotfix ${HOTFIX_NUM}: [Title]

**🚨 EMERGENCY**: [Critical/High Priority]
**Created**: YYYY-MM-DD HH:MM
**Status**: Active

---

## Emergency Summary

**What's Broken**: [Brief description]

**Impact**:
- Users affected: [scope]
- Service status: [degraded/down]
- Data at risk: [Yes/No]

**Urgency**: [Immediate/Within hours/Within day]

---

## Immediate Actions

**1. Mitigation** (if applicable):
- [Temporary mitigation in place? Describe]

**2. Hotfix Scope**:
- [What needs to be fixed immediately]

**3. Rollback Plan**:
- [How to roll back if the hotfix fails]

---

## Technical Details

**Root Cause** (if known):
- [Quick analysis]

**Fix Approach**:
- [Minimal changes to resolve emergency]

**Files Affected**:
- [List critical files]

**Testing Strategy**:
- [Minimal essential tests - what MUST pass?]

---

## Deployment Plan

**Target**: Production
**Rollout**: [Immediate/Phased]
**Rollback Trigger**: [When to roll back]

---

## Post-Mortem Required

**After Emergency Resolved**:
- [ ] Root cause analysis
- [ ] Permanent fix (if hotfix is temporary)
- [ ] Process improvements
- [ ] Documentation updates
- [ ] Team retrospective

---

## Metadata

**Workflow**: Hotfix (Emergency Response)
**Created By**: SpecLab Plugin v1.0.0
```

**Prompt user for:**
- Emergency summary
- Impact assessment
- Rollback plan

Write to `$HOTFIX_SPEC`.

Output:
```
📋 Hotfix Specification (Minimal)
✓ Created: ${HOTFIX_SPEC}
✓ Emergency documented
✓ Rollback plan defined
⚡ Ready for immediate action
```
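
One way to write the opening of the specification from the prompted answers is a here-document. The following is only a sketch under stated assumptions: `write_hotfix_spec` and its argument order are hypothetical, it emits a shortened subset of the template above, and the refusal to write a spec without a rollback plan is an illustration of the "Rollback Ready" principle rather than documented plugin behavior.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write a shortened hotfix spec (header, summary,
# rollback plan). Arguments: path, hotfix number, title, what is broken,
# rollback plan.
write_hotfix_spec() {
  local spec_path="$1" num="$2" title="$3" broken="$4" rollback="$5"
  local created

  # Assumption: a spec without a rollback plan is rejected outright.
  if [ -z "$rollback" ]; then
    echo "Rollback plan is required" >&2
    return 1
  fi

  created="$(date '+%Y-%m-%d %H:%M')"
  mkdir -p "$(dirname "$spec_path")"
  cat > "$spec_path" <<EOF
# Hotfix ${num}: ${title}

**Created**: ${created}
**Status**: Active

## Emergency Summary

**What's Broken**: ${broken}

## Immediate Actions

**3. Rollback Plan**:
- ${rollback}
EOF
}

# Example call (values are placeholders):
# write_hotfix_spec "$HOTFIX_SPEC" "$HOTFIX_NUM" "Login crash" \
#   "Login returns HTTP 500" "Redeploy the previous release"
```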

---

### 3. Generate Minimal Tasks

Create `$TASKS_FILE` with ESSENTIAL tasks only:

```markdown
# Tasks: Hotfix ${HOTFIX_NUM} - EMERGENCY

**Workflow**: Hotfix (Expedited)
**Status**: Active
**Created**: YYYY-MM-DD HH:MM

---

## ⚡ EMERGENCY MODE - MINIMAL PROCESS

**Speed Priority**: Essential tasks only
**Tech Validation**: ${TECH_VALIDATION_AVAILABLE} (OPTIONAL - use if time permits)
**Metrics**: ${ENABLE_METRICS} (lightweight tracking)

---

## Emergency Tasks

### T001: Implement Hotfix
**Description**: [Minimal fix to resolve emergency]
**Files**: [list]
**Validation**: [Essential test only]
**Parallel**: No (focused fix)

### T002: Essential Testing
**Description**: Verify hotfix resolves emergency
**Test Scope**: MINIMAL (critical path only)
**Expected**: Emergency resolved
**Parallel**: No

### T003: Deploy to Production
**Description**: Emergency deployment
**Rollback Plan**: ${ROLLBACK_PLAN}
**Validation**: Service restored
**Parallel**: No

### T004: Monitor Post-Deployment
**Description**: Watch metrics, error rates, user reports
**Duration**: [monitoring period]
**Escalation**: [when to roll back]
**Parallel**: No

---

## Post-Emergency Tasks (After Fire Out)

### T005: Root Cause Analysis
**Description**: Deep dive into why emergency occurred
**Output**: Root cause doc
**Timeline**: Within 24-48 hours

### T006: Permanent Fix (if hotfix is temporary)
**Description**: Replace hotfix with proper solution
**Workflow**: Use /speclab:bugfix for permanent fix
**Timeline**: [timeframe]

### T007: Post-Mortem
**Description**: Team retrospective and process improvements
**Timeline**: Within 1 week

---

## Summary

**Emergency Tasks**: 4 (T001-T004)
**Post-Emergency Tasks**: 3 (T005-T007)
**Estimated Time to Resolution**: 1-2 hours (emergency tasks only)

**Success Criteria**:
- ✅ Emergency resolved
- ✅ Service restored
- ✅ No data loss
- ✅ Rollback plan tested (if triggered)
```

Write to `$TASKS_FILE`.

Output:
```
📊 Emergency Tasks Generated
✓ Created: ${TASKS_FILE}
✓ 4 emergency tasks (immediate resolution)
✓ 3 post-emergency tasks (learning and improvement)
⚡ Estimated resolution time: 1-2 hours
```
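
The task counts reported in the output above could be derived from the generated file instead of being hard-coded. A minimal sketch, assuming task headings keep the `### TNNN:` form used in the template; `count_tasks` is a hypothetical helper, not part of the plugin.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count "### TNNN:" headings whose number lies in
# the inclusive range [from, to].
count_tasks() {
  local tasks_file="$1" from="$2" to="$3"
  local re='^### T([0-9]{3}):'
  local count=0 line id
  while IFS= read -r line; do
    if [[ "$line" =~ $re ]]; then
      id=$((10#${BASH_REMATCH[1]}))   # base 10, so "008" is not read as octal
      if (( id >= from && id <= to )); then
        count=$((count + 1))
      fi
    fi
  done < "$tasks_file"
  printf '%d\n' "$count"
}

# Example: for the template above these should report 4 and 3.
# count_tasks "$TASKS_FILE" 1 4
# count_tasks "$TASKS_FILE" 5 7
```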

---

### 4. Execute Emergency Tasks

**Execute T001-T004 with MAXIMUM SPEED**:

```
🚨 Executing Hotfix Workflow - EMERGENCY MODE

⚡ T001: Implement Hotfix
  [Execute minimal fix]
  ${TECH_VALIDATION_IF_TIME_PERMITS}

⚡ T002: Essential Testing
  [Run critical path tests only]

⚡ T003: Deploy to Production
  [Emergency deployment]
  ✓ Deployed
  ⏱️ Monitoring...

⚡ T004: Monitor Post-Deployment
  [Watch metrics for N minutes]
  ${MONITORING_RESULTS}

${EMERGENCY_RESOLVED_OR_ROLLBACK}
```

**If Emergency Resolved**:
```
✅ EMERGENCY RESOLVED

📊 Resolution:
- Time to fix: [duration]
- Service status: Restored
- Impact: Mitigated

📋 Post-Emergency Actions Required:
- T005: Root cause analysis (within 24-48h)
- T006: Permanent fix (if hotfix is temporary)
- T007: Post-mortem (within 1 week)

Schedule these using normal workflows when appropriate.
```

**If Rollback Triggered**:
```
🔄 ROLLBACK INITIATED

Reason: [rollback trigger hit]
Status: Rolling back hotfix

[Execute rollback plan from hotfix.md]

✅ Rollback Complete
Service status: [current status]

⚠️ Hotfix failed - need alternative approach
Recommend: Escalate to senior engineer / architect
```
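
The workflow leaves the rollback trigger to whoever fills in the spec. As an illustration only, a trigger based on error rate could be expressed as a small predicate. Everything here is an assumption: the name `should_rollback`, the idea of comparing an observed error rate against a percentage threshold, and the choice to hold (rather than roll back) when no requests were observed.

```bash
#!/usr/bin/env bash
# Hypothetical rollback trigger: succeed (exit 0) when the observed error
# rate exceeds the threshold. Arguments: error count, total requests,
# threshold in percent.
should_rollback() {
  local errors="$1" total="$2" threshold_pct="$3"
  # Assumption: with no traffic observed there is nothing to decide on yet.
  if (( total == 0 )); then
    return 1
  fi
  # errors/total > threshold/100, rearranged to avoid floating point.
  (( errors * 100 > threshold_pct * total ))
}

# 12 errors in 200 requests is 6%, which exceeds a 5% threshold.
if should_rollback 12 200 5; then
  echo "rollback"
else
  echo "hold"
fi
```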

---

## Post-Workflow Hook (if SpecTest installed)

```bash
if [ "$ENABLE_METRICS" = "true" ]; then
  echo ""
  echo "🎣 Post-Hotfix Hook"

  WORKFLOW_END_TIME=$(date +%s)
  WORKFLOW_DURATION=$((WORKFLOW_END_TIME - WORKFLOW_START_TIME))
  # Integer division in the shell; avoids a dependency on bc
  WORKFLOW_MINUTES=$((WORKFLOW_DURATION / 60))

  echo "✓ Emergency resolved"
  echo "⏱️ Time to Resolution: ${WORKFLOW_MINUTES} minutes"

  # Update metrics
  METRICS_FILE=".specswarm/workflow-metrics.json"
  echo "📊 Emergency metrics saved: ${METRICS_FILE}"

  echo ""
  echo "📋 Post-Emergency Actions:"
  echo "- Complete T005-T007 in normal hours"
  echo "- Schedule post-mortem"
  echo "- Document learnings"
fi
```
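
The hook above announces a metrics file but does not show how a record is written, and the format of `.specswarm/workflow-metrics.json` is not specified here. The sketch below is therefore purely illustrative: `format_duration` and `record_hotfix_metrics` are hypothetical helpers, and the record is appended as one JSON object per line to a caller-supplied path rather than to the real metrics file.

```bash
#!/usr/bin/env bash
# Hypothetical helper: render a duration in seconds as "Nm SSs".
format_duration() {
  local seconds="$1"
  printf '%dm %02ds\n' "$((seconds / 60))" "$((seconds % 60))"
}

# Hypothetical helper: append one JSON object per line to a metrics log.
# Arguments: log path, hotfix number, duration in seconds.
# The field names are assumptions, not the plugin's schema.
record_hotfix_metrics() {
  local metrics_file="$1" hotfix_num="$2" duration="$3"
  mkdir -p "$(dirname "$metrics_file")"
  printf '{"workflow":"hotfix","hotfix":"%s","duration_seconds":%d}\n' \
    "$hotfix_num" "$duration" >> "$metrics_file"
}

format_duration 125   # prints 2m 05s
```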

---

## Final Output

```
✅ Hotfix Workflow Complete - Hotfix ${HOTFIX_NUM}

🚨 Emergency Status: RESOLVED

📋 Artifacts Created:
- ${HOTFIX_SPEC}
- ${TASKS_FILE}

📊 Results:
- Emergency resolved in: ${RESOLUTION_TIME}
- Service status: Restored
- Rollback triggered: [Yes/No]

⏱️ Time to Resolution: ${WORKFLOW_DURATION}

📋 Post-Emergency Actions Required:
1. Root cause analysis (T005) - within 24-48h
2. Permanent fix (T006) - if hotfix is temporary
3. Post-mortem (T007) - within 1 week

📈 Next Steps:
- Monitor production metrics closely
- Schedule post-mortem meeting
- Plan permanent fix: /speclab:bugfix (if hotfix is temporary)
```

---

## Error Handling

**If hotfix fails**:
- Execute rollback plan immediately
- Escalate to senior engineer
- Document failure for post-mortem

**If rollback fails**:
- CRITICAL: Manual intervention required
- Alert on-call engineer
- Document all actions taken

**If emergency worsens**:
- Stop hotfix attempt
- Consider service shutdown / maintenance mode
- Escalate to incident commander

---

## Operating Principles

1. **Speed Over Process**: Minimize overhead, maximize velocity
2. **Essential Only**: Skip non-critical validations
3. **Rollback Ready**: Always have an escape hatch
4. **Monitor Closely**: Watch post-deployment metrics
5. **Learn After**: Post-mortem is mandatory
6. **Communicate**: Keep stakeholders informed
7. **Temporary OK**: Hotfix can be temporary (permanent fix later)

---

## Success Criteria

- ✅ Emergency resolved quickly (<2 hours)
- ✅ Service restored to normal operation
- ✅ No data loss or corruption
- ✅ Rollback plan tested (if needed)
- ✅ Post-emergency tasks scheduled
- ✅ Incident documented for learning

---

**Workflow Coverage**: Addresses ~10-15% of development work (emergencies)
**Speed**: ~45-90 minutes average time to resolution
**Integration**: Optional SpecSwarm/SpecTest (speed takes priority)
**Graduation Path**: Proven workflow will graduate to SpecSwarm stable