# How to Create a Milestone Specification

Milestone specifications define specific delivery targets within a project, including deliverables, success criteria, and timeline. They serve as checkpoints for verifying progress against the plan.

## Quick Start

```bash
# 1. Create a new milestone
scripts/generate-spec.sh milestone mls-001-descriptive-slug

# 2. Open the generated file and fill in deliverables and success criteria
# (The file will be created at: docs/specs/milestone/mls-001-descriptive-slug.md)

# 3. Validate the spec:
scripts/validate-spec.sh docs/specs/milestone/mls-001-descriptive-slug.md

# 4. Fix issues and check completeness:
scripts/check-completeness.sh docs/specs/milestone/mls-001-descriptive-slug.md
```

## When to Write a Milestone

Use a Milestone Spec when you need to:
- Define specific delivery checkpoints
- Communicate to stakeholders what's shipping when
- Track progress against concrete deliverables
- Set success criteria before building
- Manage dependencies between teams
- Celebrate progress and team achievements

## Research Phase

### 1. Research Related Specifications
Find the context for this milestone:

```bash
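# List specs that mention the plan this milestone belongs to
# (-l prints matching file names only, -i ignores case)
grep -rli "plan" docs/specs/ --include="*.md"

# List related requirements and design specs
# (-E enables "|" alternation without backslash escapes)
grep -rliE "brd|prd|design" docs/specs/ --include="*.md"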
```

### 2. Understand the Broader Plan
- What larger project is this part of?
- What comes before and after this milestone?
- What dependencies exist with other teams?
- What are the overall project goals?

### 3. Review Similar Milestones
- How were past milestones structured?
- What deliverables were tracked?
- How were success criteria defined?
- What worked and what didn't?

## Structure & Content Guide

### Title & Metadata
- **Title**: "Phase 1: Infrastructure Ready", "Beta Launch", etc.
- **Target Date**: Target completion date
- **Owner**: Team or person responsible
- **Status**: Planned | In Progress | Completed | At Risk

### Milestone Summary

```markdown
# Phase 1: Export Infrastructure Ready

**Target Date**: January 28, 2024
**Owner**: Backend Engineering Team
**Status**: In Progress

## Summary
Delivery of fully operational job queue infrastructure and worker processes
supporting the bulk export feature. Team demonstrates system can reliably
process 10+ jobs per second with monitoring and alerting in place.
```

### Deliverables Section

List what will be delivered:

```markdown
## Deliverables

### 1. Redis Job Queue (Production-Ready)
**Description**: Managed Redis cluster configured for job queuing
**Acceptance Criteria**:
- [ ] AWS ElastiCache Redis cluster deployed to staging
- [ ] Cluster sized for 10k requests/second capacity
- [ ] Backup and failover configured
- [ ] Monitoring and alerts in place
**Owner**: Infrastructure Team
**Status**: In Progress

### 2. Bull Job Queue Worker
**Description**: Node.js Bull queue implementation with workers
**Acceptance Criteria**:
- [ ] Bull queue initialized and processing jobs
- [ ] Worker processes handle 10+ jobs/second
- [ ] Graceful shutdown implemented
- [ ] Error handling and retry logic working
- [ ] Unit tests cover all worker functions
**Owner**: Backend Engineer (Alice)
**Status**: In Progress (code in feature branch, ready for review)

### 3. Kubernetes Deployment Manifests
**Description**: K8s manifests for deploying queue workers
**Acceptance Criteria**:
- [ ] Deployment manifest supports 1-10 replicas
- [ ] Health checks configured (liveness, readiness)
- [ ] Resource requests/limits defined
- [ ] Secrets management for Redis credentials
- [ ] Successfully deploys to staging cluster
**Owner**: DevOps Engineer (Bob)
**Status**: In Progress (ready for review)

### 4. Prometheus Metrics Integration
**Description**: Export metrics for job queue depth, worker status
**Acceptance Criteria**:
- [ ] Metrics scrape successfully every 15 seconds
- [ ] Dashboard shows queue depth over time
- [ ] Queue saturation alerts configured
- [ ] Grafana dashboard created for monitoring
**Owner**: Backend Engineer (Alice)
**Status**: In Progress

### 5. Documentation & Runbook
**Description**: Queue architecture docs and operational runbook
**Acceptance Criteria**:
- [ ] Architecture diagram showing queues and workers
- [ ] Configuration guide for different environments
- [ ] Runbook for common operations (scaling, debugging)
- [ ] Troubleshooting guide for common issues
**Owner**: Tech Lead (Charlie)
**Status**: Planned (starts after technical setup)

## Deliverables Summary

| Deliverable | Status | Owner | Target |
|------------|--------|-------|--------|
| Redis Cluster | In Progress | Infra | Jan 20 |
| Bull Worker | In Progress | Alice | Jan 22 |
| K8s Manifests | In Progress | Bob | Jan 22 |
| Prometheus Metrics | In Progress | Alice | Jan 25 |
| Documentation | Planned | Charlie | Jan 28 |
```
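Because each acceptance criterion is a Markdown task-list item, a rough progress count can be read straight from the spec file. The following is a minimal sketch, assuming criteria are written as `- [ ]` or `- [x]`; `count_criteria` is a hypothetical helper and not one of the repository's scripts.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count open and completed task-list items in a spec file.
# Assumes criteria are written as "- [ ] ..." (open) or "- [x] ..." (done).
count_criteria() {
  local file="$1"
  local unchecked checked
  # grep -c prints the count but exits non-zero when it is 0, hence "|| true".
  unchecked=$(grep -cE '^[[:space:]]*- \[ \]' "$file" || true)
  checked=$(grep -ciE '^[[:space:]]*- \[x\]' "$file" || true)
  echo "open=${unchecked} done=${checked}"
}

# Usage (path taken from the Quick Start example):
#   count_criteria docs/specs/milestone/mls-001-descriptive-slug.md
```

The count covers every checkbox in the file, so it mixes deliverable criteria with success criteria and testing checklists. Treat it as a quick indicator rather than a status report.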

### Success Criteria Section

Define what "done" means:

```markdown
## Success Criteria

### Technical Criteria (Must Pass)
- [ ] Job queue processes 100 jobs without errors
- [ ] Queue handles 10+ jobs/second sustained throughput
- [ ] Workers scale horizontally (add/remove replicas without data loss)
- [ ] Failed jobs retry with exponential backoff
- [ ] All health checks pass in staging environment

### Operational Criteria (Must Have)
- [ ] Prometheus metrics visible in Grafana dashboard
- [ ] Alerts fire correctly when queue depth exceeds threshold
- [ ] Monitoring documentation complete and understood by ops team
- [ ] Runbook covers: scaling, debugging, troubleshooting

### Quality Criteria (Must Meet)
- [ ] Code reviewed and approved by 2+ senior engineers
- [ ] Unit tests pass with 90%+ coverage
- [ ] Integration tests verify queue → worker → completion flow
- [ ] Load tests verify performance targets
- [ ] Security audit passed (no exposed credentials)

### Documentation Criteria (Must Have)
- [ ] Architecture documented with diagrams
- [ ] Configuration guide for different environments
- [ ] Troubleshooting guide covers common issues
- [ ] Operations team trained and signed off on operational readiness

## Sign-Off Criteria

Milestone is "done" when:
1. All deliverables accepted and deployed to staging
2. All technical, operational, and quality criteria pass
3. Tech lead, product owner, and operations lead approve
4. Documentation reviewed and accepted
```
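One way to make the "done" definition mechanical is a small gate that fails while any success criterion is still unticked. This is only a sketch: the function name `success_criteria_met` is hypothetical, and it assumes the section is headed exactly `## Success Criteria` and uses `- [ ]` checkboxes as in the example above.

```bash
#!/usr/bin/env bash
# Hypothetical sign-off gate: succeed only when every checkbox under
# "## Success Criteria" is ticked. Heading text and checkbox style are assumptions.
success_criteria_met() {
  local file="$1"
  local remaining
  remaining=$(awk '
    # Each H2 heading switches the section flag on or off; H3 headings leave it alone.
    /^## / { in_section = ($0 == "## Success Criteria") }
    # Count unticked task-list items inside the section.
    in_section && /^[ \t]*- \[ \]/ { n++ }
    END { print n + 0 }
  ' "$file")
  echo "unchecked success criteria: ${remaining}"
  [ "$remaining" -eq 0 ]
}

# Usage:
#   success_criteria_met docs/specs/milestone/mls-001-descriptive-slug.md || echo "not ready for sign-off"
```

The gate covers only the checkbox part of sign-off. Approvals from the tech lead, product owner, and operations lead still have to be collected by people.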

### Timeline & Dependencies Section

```markdown
## Timeline & Dependencies

### Critical Path
~~~
Start → Redis Setup → Bull Implementation → Testing → Documentation → Done
 (Jan 15)  (3 days)      (4 days)          (3 days)   (2 days)      (Jan 28)
~~~

### Phase Dependencies
- **Blocking this milestone**: None (can start immediately)
- **This milestone blocks**: Phase 2 (Export Service Development)
- **If delayed**: Phase 2 cannot start until this completes, so its start date slips by the same amount
- **Contingency**: Have spare capacity in next phase for any slippage

### Team Capacity
| Person | Allocation | Weeks | Notes |
|--------|-----------|-------|-------|
| Alice (Backend) | 100% | 2 | Queue + metrics |
| Bob (DevOps) | 100% | 1.5 | Infrastructure |
| Charlie (Lead) | 50% | 1.5 | Review + docs |

### Risks & Mitigation

| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|-----------|
| Redis provisioning delayed | Medium | High | Use managed service, start request early |
| Performance targets not met | Low | High | Load test early, optimize if needed |
| Team member unavailable | Low | Medium | Cross-train backup person |
| Documentation delayed | Low | Low | Defer non-critical docs to next phase |
```
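It is worth checking that the phase durations on the critical path actually fit between the start and target dates. The sketch below sums the durations from the example and reports the remaining slack. `critical_path_slack` is a hypothetical helper. It assumes calendar days and dates within a single month, passed as day-of-month numbers.

```bash
#!/usr/bin/env bash
# Hypothetical helper: sum critical-path durations and report slack.
# Arguments: start day-of-month, target day-of-month, then one duration per phase.
# Assumes calendar days and that all dates fall in the same month.
critical_path_slack() {
  local start_day="$1" target_day="$2"; shift 2
  local total=0 d
  for d in "$@"; do
    total=$((total + d))
  done
  # Day reached after all phase durations have elapsed from the start.
  local finish=$((start_day + total))
  echo "duration=${total}d finish_day=${finish} slack=$((target_day - finish))d"
}

# Example values from the critical path above (Jan 15 start, Jan 28 target):
#   critical_path_slack 15 28 3 4 3 2
```

With the example values the phases add up to 12 days, which leaves one day of slack before Jan 28. A negative slack value means the plan as written cannot meet the target.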

### Blockers & Issues Section

Track what could prevent delivery:

```markdown
## Current Blockers

### 1. AWS Infrastructure Approval (High Priority)
- **Issue**: Redis cluster requires infrastructure approval
- **Impact**: Blocks infrastructure setup (3-5 day delay if not approved)
- **Owner**: Infrastructure Lead
- **Action**: Sent approval request on Jan 10; follow up on Jan 12 if still unresolved
- **Target Resolution**: Jan 12

### 2. Node.js Bull Documentation Gap (Low Priority)
- **Issue**: Team unfamiliar with Bull library job prioritization
- **Impact**: Might need extra time for implementation
- **Owner**: Alice
- **Action**: Schedule Bull library workshop on Jan 16
- **Target Resolution**: Jan 16

## Dependencies Waiting

- AWS ElastiCache cluster approval (Infrastructure)
- IAM roles and security groups (Security team)
```

### Acceptance & Testing Section

```markdown
## Acceptance Procedures

### Manual Testing Checklist
- [ ] Queue accepts jobs from client
- [ ] Worker processes jobs without errors
- [ ] Queue depth monitoring works in Grafana
- [ ] Scaling up adds workers, scaling down removes them gracefully
- [ ] Failed job retry works with exponential backoff
- [ ] Restart worker and verify no jobs are lost

### Performance Testing
- [ ] Load test with 100 concurrent jobs
- [ ] Verify throughput ≥ 10 jobs/second
- [ ] Monitor memory and CPU during load test
- [ ] Document baseline metrics for future comparison

### Security Testing
- [ ] Credentials not exposed in logs or metrics
- [ ] Redis connection uses TLS
- [ ] Worker process runs with minimal permissions

### Sign-Off Process
1. Engineering team completes manual testing
2. Tech lead verifies all acceptance criteria pass
3. Operations team reviews runbook and documentation
4. Product owner confirms milestone meets business requirements
5. All three sign off: tech lead, ops lead, and product owner
```

### Rollback Plan Section

```markdown
## Rollback Plan

If this milestone fails or has critical issues:

### Rollback Steps
1. Revert worker deployment: `kubectl rollout undo deployment/<worker-deployment-name>`
2. Keep Redis cluster (non-breaking)
3. Disable alerts that reference new queue
4. Run post-mortem to understand failure

### Communication
- Notify stakeholders if the deadline is at risk
- Update project plan and re-estimate Phase 2
- Communicate revised timeline to customers

### Root Cause Analysis
- Conduct post-mortem within 2 days
- Document lessons learned
- Update processes/checklists to prevent recurrence
```

### Stakeholder Communication Section

```markdown
## Stakeholder Communication

### Who Needs to Know About This Milestone?
- **Engineering**: Build against completed infrastructure
- **Product**: Planning feature launch timeline
- **Operations**: Preparing to support new system
- **Executives**: Tracking project progress
- **Customers**: Waiting for export feature

### Communication Plan

| Stakeholder | Update Frequency | Content |
|-------------|-----------------|---------|
| Engineering Team | Daily standup | Progress, blockers |
| Tech Lead | 3x/week | Risk assessment, decisions |
| Product Owner | Weekly | Status, timeline impact |
| Ops Team | Twice/week | Operational readiness |
| Executives | On completion | Milestone achieved, next steps |

### Status Updates

**Current Status**: 60% complete (Jan 22)
- Redis setup: Complete
- Bull worker: Mostly done, 2 days of testing remaining
- K8s manifests: In review
- Metrics: Underway
- Documentation: Not yet started

**Next Update**: Jan 25 (on track for Jan 28 completion)

**Confidence Level**: High (85%) - minor risks, good progress
```

## Writing Tips

### Be Specific About Deliverables
- What exactly is being delivered?
- How will you verify it's done?
- Who owns each deliverable?
- What's the definition of "done"?

### Define Success Clearly
- Success criteria should be objective and testable
- Mix technical, operational, and quality criteria
- Include both must-haves and nice-to-haves
- Get stakeholder agreement on criteria upfront

### Think About the Bigger Picture
- How does this milestone fit into the overall project?
- What depends on this milestone?
- What changes if this milestone is delayed?
- What's the contingency plan?

### Track Progress
- Update the milestone spec regularly (weekly)
- Note what's actually happening vs. the plan
- Identify and communicate risks early
- Celebrate when the milestone completes!

### Link to Related Specs
- Reference the overall plan: `[PLN-001]`
- Reference related milestones: `[MLS-002]`
- Reference technical specs: `[CMP-001]`, `[API-001]`
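
To see which specs a milestone links to, the bracketed IDs can be extracted with `grep`. This is a minimal sketch, assuming references follow the `[PLN-001]` pattern shown above with only these four prefixes; `list_spec_refs` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list the unique spec IDs referenced in a file.
# Assumes references look like [PLN-001], [MLS-002], [CMP-001], or [API-001].
list_spec_refs() {
  grep -oE '\[(PLN|MLS|CMP|API)-[0-9]+\]' "$1" \
    | sed 's/[][]//g' \
    | sort -u
}

# Usage:
#   list_spec_refs docs/specs/milestone/mls-001-descriptive-slug.md
```

An empty result is a hint that the milestone is not yet linked to its plan or to any technical spec.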

## Validation & Fixing Issues

### Run the Validator
```bash
scripts/validate-spec.sh docs/specs/milestone/mls-001-your-spec.md
```

### Common Issues & Fixes

**Issue**: "Deliverables lack acceptance criteria"
- **Fix**: Add specific, testable criteria for each deliverable

**Issue**: "No success criteria defined"
- **Fix**: Document technical, operational, and quality criteria

**Issue**: "Owner/responsibilities not assigned"
- **Fix**: Assign each deliverable to a specific person or team

**Issue**: "Rollback plan missing"
- **Fix**: Document how you'd handle failure or critical issues
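
Several of these issues come down to a missing section, which can be caught with a quick local pre-check before running the validator. The sketch below is an assumption about what is worth checking, based on the section headings used in this guide. `missing_sections` is a hypothetical helper, and `scripts/validate-spec.sh` may check different or additional things.

```bash
#!/usr/bin/env bash
# Hypothetical pre-check: report which expected H2 sections are missing.
# Section names come from the examples in this guide, not from the validator.
missing_sections() {
  local file="$1" section missing=0
  for section in "Deliverables" "Success Criteria" "Rollback Plan"; do
    # -x matches the whole line, -F treats the pattern as a fixed string.
    if ! grep -qxF "## ${section}" "$file"; then
      echo "missing: ## ${section}"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}

# Usage:
#   missing_sections docs/specs/milestone/mls-001-your-spec.md
```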

## Decision-Making Framework

When defining a milestone:

1. **Scope**: What should be in this milestone?
   - Shippable chunk?
   - Dependencies resolved?
   - Tests passing?

2. **Success**: How will we know this is done?
   - Objective criteria?
   - Stakeholder agreement?
   - Testable outcomes?

3. **Schedule**: When is this realistically achievable?
   - Team capacity?
   - Dependency timelines?
   - Buffer for unknowns?

4. **Risks**: What could prevent delivery?
   - Technical unknowns?
   - Resource constraints?
   - External dependencies?

5. **Communication**: Who needs to know about this?
   - Stakeholder updates?
   - Sign-off process?
   - Celebration when done?

## Next Steps

1. **Create the spec**: `scripts/generate-spec.sh milestone mls-XXX-slug`
2. **Research**: Find the plan and related specs
3. **Define deliverables** with clear owners
4. **Set success criteria** that are testable
5. **Identify risks** and mitigation strategies
6. **Validate**: `scripts/validate-spec.sh docs/specs/milestone/mls-XXX-slug.md`
7. **Get stakeholder alignment** before kickoff
8. **Update regularly** to track progress