# How to Create a Milestone Specification

Milestone specifications define specific delivery targets within a project, including deliverables, success criteria, and timeline. They're checkpoints to verify progress.

## Quick Start

```bash
# 1. Create a new milestone spec
scripts/generate-spec.sh milestone mls-001-descriptive-slug

# 2. Open the generated file:
#    docs/specs/milestone/mls-001-descriptive-slug.md

# 3. Fill in deliverables and success criteria, then validate:
scripts/validate-spec.sh docs/specs/milestone/mls-001-descriptive-slug.md

# 4. Fix any reported issues, then check completeness:
scripts/check-completeness.sh docs/specs/milestone/mls-001-descriptive-slug.md
```

## When to Write a Milestone

Use a Milestone Spec when you need to:
- Define specific delivery checkpoints
- Communicate to stakeholders what's shipping when
- Track progress against concrete deliverables
- Set success criteria before building
- Manage dependencies between teams
- Celebrate progress and team achievements

## Research Phase

### 1. Research Related Specifications

Find the context for this milestone:

```bash
# Find the plan this milestone belongs to (list matching files, case-insensitive)
grep -rli --include="*.md" "plan" docs/specs/

# Find related requirements and design specs
grep -rliE --include="*.md" "brd|prd|design" docs/specs/
```

### 2. Understand the Broader Plan
- What larger project is this part of?
- What comes before and after this milestone?
- What dependencies exist with other teams?
- What are the overall project goals?

### 3. Review Similar Milestones
- How were past milestones structured?
- What deliverables were tracked?
- How were success criteria defined?
- What worked and what didn't?

## Structure & Content Guide

### Title & Metadata
- **Title**: "Phase 1: Infrastructure Ready", "Beta Launch", etc.
- **Date**: Target completion date
- **Owner**: Team or person responsible
- **Status**: Planned | In Progress | Completed | At Risk

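Because **Status** takes one of four fixed values, it can be checked mechanically. The sketch below uses a hypothetical helper, `check_milestone_status`, which is not one of the repository's scripts. It assumes the milestone-level status is the first line in the file that starts with `**Status**:`, as in the Milestone Summary example.

```bash
#!/usr/bin/env bash
# Hypothetical helper: verify the first **Status** line uses an allowed value.
# Assumes the milestone-level status appears before any per-deliverable status.
check_milestone_status() {
  local file="$1" status
  # Take the first "**Status**:" line, strip the label and trailing spaces.
  status=$(grep -m1 '^\*\*Status\*\*:' "$file" |
    sed -e 's/^\*\*Status\*\*:[[:space:]]*//' -e 's/[[:space:]]*$//')
  case "$status" in
    "Planned"|"In Progress"|"Completed"|"At Risk")
      return 0 ;;
    *)
      echo "Invalid or missing milestone status: '${status}'" >&2
      return 1 ;;
  esac
}
```

Per-deliverable statuses such as "Ready for review" are deliberately ignored. Only the first match is checked.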
### Milestone Summary

```markdown
# Phase 1: Export Infrastructure Ready

**Target Date**: January 28, 2024
**Owner**: Backend Engineering Team
**Status**: In Progress

## Summary
Delivery of fully operational job queue infrastructure and worker processes
supporting the bulk export feature. The team demonstrates that the system can
reliably process 10+ jobs per second with monitoring and alerting in place.
```

### Deliverables Section

List what will be delivered:

```markdown
## Deliverables

### 1. Redis Job Queue (Production-Ready)
**Description**: Managed Redis cluster configured for job queuing

**Acceptance Criteria**:
- [ ] AWS ElastiCache Redis cluster deployed to staging
- [ ] Cluster sized for 10k requests/second capacity
- [ ] Backup and failover configured
- [ ] Monitoring and alerts in place

**Owner**: Infrastructure Team
**Status**: In Progress

### 2. Bull Job Queue Worker
**Description**: Node.js Bull queue implementation with workers

**Acceptance Criteria**:
- [ ] Bull queue initialized and processing jobs
- [ ] Worker processes handle 10+ jobs/second
- [ ] Graceful shutdown implemented
- [ ] Error handling and retry logic working
- [ ] Unit tests cover all worker functions

**Owner**: Backend Engineer (Alice)
**Status**: In Progress (code in feature branch, ready for review)

### 3. Kubernetes Deployment Manifests
**Description**: K8s manifests for deploying queue workers

**Acceptance Criteria**:
- [ ] Deployment manifest supports 1-10 replicas
- [ ] Health checks configured (liveness, readiness)
- [ ] Resource requests/limits defined
- [ ] Secrets management for Redis credentials
- [ ] Successfully deploys to staging cluster

**Owner**: DevOps Engineer (Bob)
**Status**: In Progress (ready for review)

### 4. Prometheus Metrics Integration
**Description**: Export metrics for job queue depth and worker status

**Acceptance Criteria**:
- [ ] Metrics scrape successfully every 15 seconds
- [ ] Dashboard shows queue depth over time
- [ ] Queue saturation alerts configured
- [ ] Grafana dashboard created for monitoring

**Owner**: Backend Engineer (Alice)
**Status**: In Progress

### 5. Documentation & Runbook
**Description**: Queue architecture docs and operational runbook

**Acceptance Criteria**:
- [ ] Architecture diagram showing queues and workers
- [ ] Configuration guide for different environments
- [ ] Runbook for common operations (scaling, debugging)
- [ ] Troubleshooting guide for common issues

**Owner**: Tech Lead (Charlie)
**Status**: Planned (starts after technical setup)

## Deliverables Summary

| Deliverable | Status | Owner | Target |
|-------------|--------|-------|--------|
| Redis Cluster | In Progress | Infra | Jan 20 |
| Bull Worker | In Progress | Alice | Jan 22 |
| K8s Manifests | In Progress | Bob | Jan 22 |
| Prometheus Metrics | In Progress | Alice | Jan 25 |
| Documentation | Planned | Charlie | Jan 28 |
```

### Success Criteria Section

Define what "done" means:

```markdown
## Success Criteria

### Technical Criteria (Must Pass)
- [ ] Job queue processes 100 jobs without errors
- [ ] Queue handles 10+ jobs/second sustained throughput
- [ ] Workers scale horizontally (add/remove replicas without data loss)
- [ ] Failed jobs retry with exponential backoff
- [ ] All health checks pass in staging environment

### Operational Criteria (Must Have)
- [ ] Prometheus metrics visible in Grafana dashboard
- [ ] Alerts fire correctly when queue depth exceeds threshold
- [ ] Monitoring documentation complete and understood by ops team
- [ ] Runbook covers: scaling, debugging, troubleshooting

### Quality Criteria (Must Meet)
- [ ] Code reviewed and approved by 2+ senior engineers
- [ ] Unit tests pass with 90%+ coverage
- [ ] Integration tests verify queue → worker → completion flow
- [ ] Load tests verify performance targets
- [ ] Security audit passed (no exposed credentials)

### Documentation Criteria (Must Have)
- [ ] Architecture documented with diagrams
- [ ] Configuration guide for different environments
- [ ] Troubleshooting guide covers common issues
- [ ] Operations team trained on operating the new queue

## Sign-Off Criteria

Milestone is "done" when:
1. All deliverables accepted and deployed to staging
2. All technical criteria pass
3. Tech lead, product owner, and operations lead approve
4. Documentation reviewed and accepted
```

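A criterion such as "Failed jobs retry with exponential backoff" is easier to verify when the spec states the actual delay schedule. The sketch below uses a hypothetical helper that is not part of Bull or the repository scripts. It prints the delay before each retry, assuming the common form delay = base × 2^(attempt - 1), with no jitter and no cap.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print exponential backoff delays (ms), one per retry.
# Assumes delay(n) = base_ms * 2^(n-1), with no jitter and no upper cap.
backoff_schedule() {
  local base_ms="$1" attempts="$2" n
  for (( n = 1; n <= attempts; n++ )); do
    echo $(( base_ms * (1 << (n - 1)) ))
  done
}

# Example: a 1-second base delay over 4 retries
backoff_schedule 1000 4
```

With a 1-second base, the four retries wait 1000, 2000, 4000, and 8000 ms. A testable criterion could therefore read "retries after 1 s, 2 s, 4 s, and 8 s".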
### Timeline & Dependencies Section

````markdown
## Timeline & Dependencies

### Critical Path

```
Start    → Redis Setup → Bull Implementation → Testing  → Documentation → Done
(Jan 15)   (3 days)      (4 days)              (3 days)   (2 days)        (Jan 28)
```

### Phase Dependencies
- **Blocking this milestone**: None (can start immediately)
- **This milestone blocks**: Phase 2 (Export Service Development)
- **If delayed**: Phase 2 starts after this completes
- **Contingency**: Have spare capacity in next phase for any slippage

### Team Capacity
| Person | Allocation | Weeks | Notes |
|--------|------------|-------|-------|
| Alice (Backend) | 100% | 2 | Queue + metrics |
| Bob (DevOps) | 100% | 1.5 | Infrastructure |
| Charlie (Lead) | 50% | 1.5 | Review + docs |

### Risks & Mitigation

| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| Redis provisioning delayed | Medium | High | Use managed service, start request early |
| Performance targets not met | Low | High | Load test early, optimize if needed |
| Team member unavailable | Low | Medium | Cross-train backup person |
| Documentation delayed | Low | Low | Defer non-critical docs to next phase |
````

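The dates on the critical path should add up. As a sanity check, the sketch below sums the phase durations and computes the earliest finish date and the remaining slack against the target. The helpers are hypothetical. They assume GNU `date`, which is needed for `-d` date arithmetic, and they count calendar days rather than working days.

```bash
#!/usr/bin/env bash
# Hypothetical helper: earliest finish date for sequential phases.
# Usage: critical_path_finish START_DATE DURATION_DAYS...
# Requires GNU date; counts calendar days (weekends are not skipped).
critical_path_finish() {
  local start="$1" total=0 d
  shift
  for d in "$@"; do
    total=$(( total + d ))
  done
  date -u -d "$start + $total days" +%F
}

# Hypothetical helper: whole days of slack between a finish date and a target.
slack_days() {
  local finish target
  finish=$(date -u -d "$1" +%s)
  target=$(date -u -d "$2" +%s)
  echo $(( (target - finish) / 86400 ))
}

# Example with the critical path above: Jan 15 start, 3 + 4 + 3 + 2 days.
finish=$(critical_path_finish 2024-01-15 3 4 3 2)
# prints: Earliest finish: 2024-01-27
echo "Earliest finish: $finish"
# prints: Slack vs. target: 1 day(s)
echo "Slack vs. target: $(slack_days "$finish" 2024-01-28) day(s)"
```

Against the January 28 target, the 12 days of sequential work leave one calendar day of slack. Stating that figure explicitly in the spec makes the "Contingency" line concrete.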
### Blockers & Issues Section

Track what could prevent delivery:

```markdown
## Current Blockers

### 1. AWS Infrastructure Approval (High Priority)
- **Issue**: Redis cluster requires infrastructure approval
- **Impact**: Blocks infrastructure setup (3-5 day delay if not approved)
- **Owner**: Infrastructure Lead
- **Action**: Sent approval request on Jan 10, following up Jan 15
- **Target Resolution**: Jan 12

### 2. Bull Library Knowledge Gap (Low Priority)
- **Issue**: Team unfamiliar with Bull library job prioritization
- **Impact**: May need extra time for implementation
- **Owner**: Alice
- **Action**: Schedule Bull library workshop on Jan 16
- **Target Resolution**: Jan 16

## Dependencies Waiting

- AWS ElastiCache cluster approval (Infrastructure)
- IAM roles and security groups (Security team)
```

### Acceptance & Testing Section

```markdown
## Acceptance Procedures

### Manual Testing Checklist
- [ ] Queue accepts jobs from client
- [ ] Worker processes jobs without errors
- [ ] Queue depth monitoring works in Grafana
- [ ] Scaling up adds workers; scaling down removes them gracefully
- [ ] Failed job retry works with exponential backoff
- [ ] Restart a worker and verify that no jobs are lost

### Performance Testing
- [ ] Load test with 100 concurrent jobs
- [ ] Verify throughput ≥ 10 jobs/second
- [ ] Monitor memory and CPU during load test
- [ ] Document baseline metrics for future comparison

### Security Testing
- [ ] Credentials not exposed in logs or metrics
- [ ] Redis connection uses TLS
- [ ] Worker process runs with minimal permissions

### Sign-Off Process
1. Engineering team completes manual testing
2. Tech lead verifies all acceptance criteria pass
3. Operations team reviews runbook and documentation
4. Product owner confirms milestone meets business requirements
5. Final sign-off from tech lead, ops lead, and product owner
```

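The sketch below turns "Verify throughput ≥ 10 jobs/second" into a pass/fail result by dividing completed jobs by elapsed wall-clock time. It is a hypothetical helper that assumes you already have the job count and elapsed seconds from your load-test tool. It uses `awk` because Bash arithmetic is integer-only.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compare measured throughput against a minimum rate.
# Usage: check_throughput JOBS_COMPLETED ELAPSED_SECONDS MIN_JOBS_PER_SECOND
# Prints the measured rate; returns 0 if it meets the minimum, 1 if not,
# and 2 if the elapsed time is not positive.
check_throughput() {
  awk -v jobs="$1" -v secs="$2" -v min="$3" 'BEGIN {
    if (secs + 0 <= 0) { print "elapsed time must be positive"; exit 2 }
    rate = jobs / secs
    printf "%.2f jobs/second (target >= %s)\n", rate, min
    exit (rate >= min + 0 ? 0 : 1)
  }'
}

# Example: 100 jobs completed in 8 seconds
check_throughput 100 8 10
```

Record the printed rate alongside the baseline metrics so later load tests have something to compare against.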
### Rollback Plan Section

```markdown
## Rollback Plan

If this milestone fails or has critical issues:

### Rollback Steps
1. Revert the worker deployment: `kubectl rollout undo deployment/<worker-deployment>`
2. Keep the Redis cluster (non-breaking)
3. Disable alerts that reference the new queue
4. Run a post-mortem to understand the failure

### Communication
- Notify stakeholders if the deadline is at risk
- Update the project plan and re-estimate Phase 2
- Communicate the revised timeline to customers

### Root Cause Analysis
- Conduct the post-mortem within 2 days
- Document lessons learned
- Update processes/checklists to prevent recurrence
```

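Scripting the rollback steps makes them run the same way under pressure. The sketch below wraps two standard `kubectl` subcommands, `rollout undo` and `rollout status`. The deployment name and namespace are placeholders you supply, and the five-minute timeout is an arbitrary assumption.

```bash
#!/usr/bin/env bash
# Sketch: roll back the queue-worker Deployment and wait for it to settle.
# Usage: rollback_workers DEPLOYMENT_NAME [NAMESPACE]
# DEPLOYMENT_NAME and NAMESPACE are placeholders; the 5m timeout is an assumption.
rollback_workers() {
  local deployment="$1" namespace="${2:-default}"
  if [ -z "$deployment" ]; then
    echo "usage: rollback_workers DEPLOYMENT_NAME [NAMESPACE]" >&2
    return 2
  fi
  # Revert the Deployment to its previous revision.
  kubectl -n "$namespace" rollout undo "deployment/$deployment" || return 1
  # Block until the reverted pods are available, or fail after the timeout.
  kubectl -n "$namespace" rollout status "deployment/$deployment" --timeout=5m
}
```

For example, `rollback_workers export-worker exports` (both names hypothetical). Redis is intentionally left untouched, matching step 2.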
### Stakeholder Communication Section

```markdown
## Stakeholder Communication

### Who Needs to Know About This Milestone?
- **Engineering**: Building against the completed infrastructure
- **Product**: Planning the feature launch timeline
- **Operations**: Preparing to support the new system
- **Executives**: Tracking project progress
- **Customers**: Waiting for the export feature

### Communication Plan

| Stakeholder | Update Frequency | Content |
|-------------|------------------|---------|
| Engineering Team | Daily standup | Progress, blockers |
| Tech Lead | 3x/week | Risk assessment, decisions |
| Product Owner | Weekly | Status, timeline impact |
| Ops Team | 2x/week | Operational readiness |
| Executives | On completion | Milestone achieved, next steps |

### Status Updates

**Current Status**: 60% complete (Jan 22)
- Redis setup: Complete
- Bull worker: Mostly done, 2 days of testing remaining
- K8s manifests: In review
- Metrics: Underway
- Documentation: Not yet started

**Next Update**: Jan 25 (on track for Jan 28 completion)

**Confidence Level**: High (85%); minor risks, good progress
```

## Writing Tips

### Be Specific About Deliverables
- What exactly is being delivered?
- How will you verify it's done?
- Who owns each deliverable?
- What's the definition of "done"?

### Define Success Clearly
- Success criteria should be objective and testable
- Mix technical, operational, and quality criteria
- Include both must-haves and nice-to-haves
- Get stakeholder agreement on criteria upfront

### Think About the Bigger Picture
- How does this milestone fit into the overall project?
- What depends on this milestone?
- What changes if this milestone is delayed?
- What's the contingency plan?

### Track Progress
- Update the milestone spec regularly (weekly)
- Note what's actually happening vs. the plan
- Identify and communicate risks early
- Celebrate when the milestone completes!

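If acceptance criteria are written as Markdown task-list items (`- [ ]` / `- [x]`), you can compute a rough completion percentage from the spec itself. The sketch below uses a hypothetical helper that is not one of the repository's scripts. It weights every checkbox equally, so it is only a proxy for a status figure like "60% complete".

```bash
#!/usr/bin/env bash
# Hypothetical helper: count checked vs. total task-list boxes in a spec.
# Prints "checked/total (percent%)"; the percentage is rounded down.
checkbox_progress() {
  local file="$1" checked total
  [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
  # grep -c prints 0 but exits 1 when nothing matches, so ignore that status.
  checked=$(grep -cE '^[[:space:]]*[-*] \[[xX]\]' "$file" || true)
  total=$(grep -cE '^[[:space:]]*[-*] \[[ xX]\]' "$file" || true)
  if [ "$total" -eq 0 ]; then
    echo "0/0 (no checkboxes found)"
    return 0
  fi
  echo "$checked/$total ($(( checked * 100 / total ))%)"
}
```

For example, `checkbox_progress docs/specs/milestone/mls-001-descriptive-slug.md` could be run before each weekly update.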
### Link to Related Specs
- Reference the overall plan: `[PLN-001]`
- Reference related milestones: `[MLS-002]`
- Reference technical specs: `[CMP-001]`, `[API-001]`

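Bracketed IDs also make cross-references searchable. The sketch below lists every spec ID referenced under a directory. It assumes IDs follow the pattern shown above (an uppercase prefix, a hyphen, and three digits, such as `[PLN-001]`) and that specs live under `docs/specs/`. It relies on GNU `grep` for `-r`, `-o`, and `--include`.

```bash
#!/usr/bin/env bash
# Sketch: list the unique spec IDs referenced in Markdown files under a directory.
# Assumes IDs look like [ABC-123]: 2-4 uppercase letters, a hyphen, three digits.
list_spec_refs() {
  local dir="${1:-docs/specs}"
  # -r recurse, -h omit filenames, -o print only the match, -E extended regex
  grep -rhoE --include='*.md' '\[[A-Z]{2,4}-[0-9]{3}\]' "$dir" | sort -u
}
```

Comparing this list against the files that actually exist is a quick way to spot dangling references before review.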
## Validation & Fixing Issues

### Run the Validator
```bash
scripts/validate-spec.sh docs/specs/milestone/mls-001-your-spec.md
```

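To check every milestone spec at once, loop over the directory. This sketch assumes `scripts/validate-spec.sh` takes a single file path, as shown above, and exits non-zero when validation fails. That exit-code behavior is an assumption, so confirm it against the script. Because the validator command is a parameter, the same loop can also run `scripts/check-completeness.sh`.

```bash
#!/usr/bin/env bash
# Sketch: run a validator over every Markdown spec in a directory.
# Usage: validate_all [DIR] [VALIDATOR]
# Assumes VALIDATOR takes one file path and exits non-zero on failure.
validate_all() {
  local dir="${1:-docs/specs/milestone}" validator="${2:-scripts/validate-spec.sh}"
  local spec failed=0
  for spec in "$dir"/*.md; do
    [ -e "$spec" ] || continue   # the glob matched nothing
    if ! "$validator" "$spec"; then
      echo "FAILED: $spec" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

For example: `validate_all docs/specs/milestone scripts/check-completeness.sh`.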
### Common Issues & Fixes

**Issue**: "Deliverables lack acceptance criteria"
- **Fix**: Add specific, testable criteria for each deliverable

**Issue**: "No success criteria defined"
- **Fix**: Document technical, operational, and quality criteria

**Issue**: "Owner/responsibilities not assigned"
- **Fix**: Assign each deliverable to a specific person or team

**Issue**: "Rollback plan missing"
- **Fix**: Document how you'd handle failure or critical issues

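Some of these issues can be caught before running the validator. The hypothetical `awk` sketch below does not reproduce the validator's actual logic. It lists every `###` deliverable under `## Deliverables` that lacks a `**Owner**:` line, assuming the layout used in the Deliverables example.

```bash
#!/usr/bin/env bash
# Hypothetical pre-check: print deliverable headings that have no **Owner** line.
# Assumes deliverables are "### ..." headings inside the "## Deliverables" section.
missing_owners() {
  awk '
    # Report the pending heading if no Owner line was seen under it, then reset.
    function flush() { if (heading != "" && !has_owner) print heading; heading = ""; has_owner = 0 }
    /^## /  { flush(); in_deliverables = ($0 == "## Deliverables"); next }
    /^### / { flush(); if (in_deliverables) heading = $0; next }
    /^\*\*Owner\*\*:/ { has_owner = 1 }
    END     { flush() }
  ' "$1"
}
```

An empty result means every deliverable names an owner. `## Deliverables Summary` is not treated as a deliverables section, because the match is on the exact heading text.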
## Decision-Making Framework

When defining a milestone:

1. **Scope**: What should be in this milestone?
   - Shippable chunk?
   - Dependencies resolved?
   - Tests passing?

2. **Success**: How will we know this is done?
   - Objective criteria?
   - Stakeholder agreement?
   - Testable outcomes?

3. **Schedule**: When is this realistically achievable?
   - Team capacity?
   - Dependency timelines?
   - Buffer for unknowns?

4. **Risks**: What could prevent delivery?
   - Technical unknowns?
   - Resource constraints?
   - External dependencies?

5. **Communication**: Who needs to know about this?
   - Stakeholder updates?
   - Sign-off process?
   - Celebration when done?

## Next Steps

1. **Create the spec**: `scripts/generate-spec.sh milestone mls-XXX-slug`
2. **Research**: Find the plan and related specs
3. **Define deliverables** with clear owners
4. **Set success criteria** that are testable
5. **Identify risks** and mitigation strategies
6. **Validate**: `scripts/validate-spec.sh docs/specs/milestone/mls-XXX-slug.md`
7. **Get stakeholder alignment** before kickoff
8. **Update regularly** to track progress

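Step 1 needs the next free `mls-XXX` number. The sketch below derives it from existing filenames, assuming they follow the `mls-NNN-slug.md` pattern used throughout this guide. The helper name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the next unused milestone ID (e.g. mls-004).
# Assumes files are named mls-NNN-slug.md with a three-digit number.
next_milestone_id() {
  local dir="${1:-docs/specs/milestone}" max=0 num f base
  for f in "$dir"/mls-[0-9][0-9][0-9]-*.md; do
    [ -e "$f" ] || continue            # no existing milestones
    base=$(basename "$f")
    num=$(( 10#${base:4:3} ))          # force base 10 so "008" is not octal
    if (( num > max )); then max=$num; fi
  done
  printf 'mls-%03d\n' $(( max + 1 ))
}
```

For example: `scripts/generate-spec.sh milestone "$(next_milestone_id)-beta-launch"`.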