Initial commit

Zhongwei Li
2025-11-30 08:45:31 +08:00
commit ca9b85ccda
35 changed files with 10784 additions and 0 deletions

# Spec Author Instruction Guides
This directory contains comprehensive instruction guides for creating each type of specification document supported by the spec-author skill. Each guide includes:
- **Quick Start**: Get up and running quickly with basic commands
- **Research Phase**: Guidance on researching related specs and external documentation
- **Structure & Content Guide**: Detailed walkthrough of each section
- **Writing Tips**: Best practices and common pitfalls
- **Validation & Fixing Issues**: How to use the validation tools
- **Decision-Making Framework**: Questions to ask while writing
- **Next Steps**: Complete workflow from creation to completion
## Specification Types
### Business & Planning
#### [Business Requirement (brd-XXX)](./business-requirement.md)
Capture what problem you're solving and why it matters from a business perspective. Translate customer needs into requirements that engineering can build against.
**Use when**: Documenting new features, defining business value, creating stakeholder alignment
**Key sections**: Business value, user stories, acceptance criteria, success metrics
#### [Technical Requirement (prd-XXX)](./technical-requirement.md)
Translate business needs into specific, implementation-ready technical requirements. Bridge the gap between "what we want" and "how we'll build it."
**Use when**: Defining technical implementation details, mapping business requirements to solutions
**Key sections**: Technical requirements, design decisions, acceptance criteria, constraints
#### [Plan (pln-XXX)](./plan.md)
Document implementation roadmaps, project timelines, phases, and deliverables. Provide the "how and when" we'll build something.
**Use when**: Planning project execution, defining phases and timeline, identifying dependencies
**Key sections**: Phases, timeline, deliverables, dependencies, risks, resources
#### [Milestone (mls-XXX)](./milestone.md)
Define specific delivery checkpoints within a project, including deliverables, success criteria, and timeline.
**Use when**: Defining delivery targets, communicating progress, tracking against concrete deliverables
**Key sections**: Deliverables, success criteria, timeline, blockers, acceptance procedures
### Architecture & Design
#### [Design Document (des-XXX)](./design-document.md)
Provide the detailed architectural and technical design for a system, component, or significant feature.
**Use when**: Major system redesign, architectural decisions, technology choices
**Key sections**: Proposed solution, design decisions, technology stack, trade-offs, implementation plan
#### [Component (cmp-XXX)](./component.md)
Document individual system components or services, including responsibilities, interfaces, configuration, and deployment.
**Use when**: Documenting microservices, major system components, architectural pieces
**Key sections**: Responsibilities, interfaces, configuration, deployment, monitoring
#### [Flow Schematic (flow-XXX)](./flow-schematic.md)
Document business processes, workflows, and system flows visually and textually. Show how information moves through systems.
**Use when**: Documenting user workflows, system interactions, complex processes
**Key sections**: Flow diagram, step-by-step descriptions, decision points, error handling
### Data & Contracts
#### [Data Model (data-XXX)](./data-model.md)
Define entities, fields, relationships, and constraints for your application's data, capturing the "shape" of the data your system works with.
**Use when**: Planning database schema, documenting entity relationships, enabling API/UI teams
**Key sections**: Entity definitions, relationships, constraints, scaling considerations
#### [API Contract (api-XXX)](./api-contract.md)
Document the complete specification of REST/GraphQL endpoints, including request/response formats, error handling, and authentication.
**Use when**: Defining API endpoints, enabling parallel frontend/backend development, creating living documentation
**Key sections**: Authentication, endpoints, response formats, error handling, rate limiting
### Operations & Configuration
#### [Deployment Procedure (deploy-XXX)](./deployment-procedure.md)
Document step-by-step instructions for deploying systems to production, including prerequisites, procedures, rollback, and troubleshooting.
**Use when**: Deploying services, creating runbooks, enabling safe operations, ensuring repeatable deployments
**Key sections**: Prerequisites, deployment steps, rollback procedure, success criteria, troubleshooting
#### [Configuration Schema (config-XXX)](./configuration-schema.md)
Document all configurable parameters for a system, including types, valid values, defaults, and impact.
**Use when**: Documenting system configuration, enabling ops teams to configure safely, supporting multiple environments
**Key sections**: Configuration methods, field descriptions, validation rules, environment-specific examples
## How to Use These Guides
### For New Spec Types
1. **Choose your spec type** based on what you're documenting
2. **Go to the corresponding guide** (e.g., if creating a business requirement, read `business-requirement.md`)
3. **Follow the Quick Start** to generate a new spec from the template
4. **Work through the Research Phase** to understand context
5. **Use the Structure & Content Guide** to fill in each section
6. **Apply the Writing Tips** as you write
7. **Run validation** using the tools
8. **Follow the Decision-Making Framework** to reason through tough choices
### For Improving Existing Specs
1. **Check the appropriate guide** for what your spec should contain
2. **Review the "Validation & Fixing Issues"** section for common problems
3. **Use validation tools** to identify what's missing
4. **Fill in missing sections** using the structure guide
5. **Fix any issues** the validator identifies
### For Team Standards
1. **Each guide provides concrete standards** for each spec type
2. **Sections that are marked as "required"** should be in all specs
3. **Examples in each guide** show the expected quality and detail level
4. **Validation rules** ensure consistent structure across specs
## Quick Reference: CLI Commands
### Create a New Spec
```bash
# Generate a new spec from template
scripts/generate-spec.sh <spec-type> <spec-id>
# Examples:
scripts/generate-spec.sh business-requirement brd-001-user-export
scripts/generate-spec.sh design-document des-001-export-arch
scripts/generate-spec.sh api-contract api-001-export-endpoints
```
### Validate a Spec
```bash
# Check a spec for completeness and structure
scripts/validate-spec.sh docs/specs/business-requirement/brd-001-user-export.md
```
Returns:
- **PASS**: All required sections present and complete
- **WARNINGS**: Missing optional sections or incomplete TODOs
- **ERRORS**: Missing critical sections or structural issues
### Check Completeness
```bash
# See what's incomplete and what TODOs need attention
scripts/check-completeness.sh docs/specs/business-requirement/brd-001-user-export.md
```
Shows:
- Completion percentage
- Missing sections with descriptions
- TODO items that need completion
- Referenced documents
### List Available Templates
```bash
# See what spec types are available
scripts/list-templates.sh
```
## Workflow Example: Creating a Feature End-to-End
### Step 1: Business Requirements
**Create**: `scripts/generate-spec.sh business-requirement brd-001-bulk-export`
**Guide**: Follow [business-requirement.md](./business-requirement.md)
**Output**: `docs/specs/business-requirement/brd-001-bulk-export.md`
### Step 2: Technical Requirements
**Create**: `scripts/generate-spec.sh technical-requirement prd-001-export-api`
**Guide**: Follow [technical-requirement.md](./technical-requirement.md)
**Reference**: Link to BRD created in Step 1
**Output**: `docs/specs/technical-requirement/prd-001-export-api.md`
### Step 3: Design Document
**Create**: `scripts/generate-spec.sh design-document des-001-export-arch`
**Guide**: Follow [design-document.md](./design-document.md)
**Reference**: Link to PRD and BRD
**Output**: `docs/specs/design-document/des-001-export-arch.md`
### Step 4: Data Model
**Create**: `scripts/generate-spec.sh data-model data-001-export-schema`
**Guide**: Follow [data-model.md](./data-model.md)
**Reference**: Entities used in design
**Output**: `docs/specs/data-model/data-001-export-schema.md`
### Step 5: API Contract
**Create**: `scripts/generate-spec.sh api-contract api-001-export-endpoints`
**Guide**: Follow [api-contract.md](./api-contract.md)
**Reference**: Link to technical requirements
**Output**: `docs/specs/api-contract/api-001-export-endpoints.md`
### Step 6: Component Specs
**Create**: `scripts/generate-spec.sh component cmp-001-export-service`
**Guide**: Follow [component.md](./component.md)
**Reference**: Link to design and technical requirements
**Output**: `docs/specs/component/cmp-001-export-service.md`
### Step 7: Implementation Plan
**Create**: `scripts/generate-spec.sh plan pln-001-export-implementation`
**Guide**: Follow [plan.md](./plan.md)
**Reference**: Link to all related specs
**Output**: `docs/specs/plan/pln-001-export-implementation.md`
### Step 8: Define Milestones
**Create**: `scripts/generate-spec.sh milestone mls-001-export-phase1`
**Guide**: Follow [milestone.md](./milestone.md)
**Reference**: Link to plan created in Step 7
**Output**: `docs/specs/milestone/mls-001-export-phase1.md`
### Step 9: Document Workflows
**Create**: `scripts/generate-spec.sh flow-schematic flow-001-export-process`
**Guide**: Follow [flow-schematic.md](./flow-schematic.md)
**Reference**: Illustrate flows described in design and API
**Output**: `docs/specs/flow-schematic/flow-001-export-process.md`
### Step 10: Configuration Schema
**Create**: `scripts/generate-spec.sh configuration-schema config-001-export-service`
**Guide**: Follow [configuration-schema.md](./configuration-schema.md)
**Reference**: Used by component and deployment
**Output**: `docs/specs/configuration-schema/config-001-export-service.md`
### Step 11: Deployment Procedure
**Create**: `scripts/generate-spec.sh deployment-procedure deploy-001-export-production`
**Guide**: Follow [deployment-procedure.md](./deployment-procedure.md)
**Reference**: Link to component, configuration, and plan
**Output**: `docs/specs/deployment-procedure/deploy-001-export-production.md`
## Tips for Success
### 1. Research Thoroughly
- Use the Research Phase section in each guide
- Look for related specs that have already been created
- Research external docs using doc tools or web search
- Understand the context before writing
### 2. Use CLI Tools Effectively
- Start with `scripts/generate-spec.sh` to create from template (saves time)
- Use `scripts/validate-spec.sh` frequently while writing (catches issues early)
- Use `scripts/check-completeness.sh` to find TODOs that need attention
- Run validation before considering a spec "done"
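The commands above compose naturally into a single pass over the whole spec tree. A minimal sketch, assuming `validate-spec.sh` exits non-zero when a spec has errors:

```shell
# Validate every spec in one pass; count the ones that need attention
failed=0
for spec in docs/specs/*/*.md; do
  [ -e "$spec" ] || continue            # skip cleanly if the glob matched nothing
  scripts/validate-spec.sh "$spec" >/dev/null 2>&1 || failed=$((failed + 1))
done
echo "specs failing validation: $failed"
```

Running this before every commit keeps structural issues from accumulating.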
### 3. Complete All Sections
- "Required" sections should be in every spec
- "Optional" sections may be skipped if not applicable
- Never leave placeholder text or TODO items in final specs
- Incomplete specs cause confusion and rework
### 4. Link Specs Together
- Reference related specs using [ID] format (e.g., `[BRD-001]`)
- Show how specs depend on each other
- This creates a web of related documentation
- Makes specs more discoverable
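Because references use a consistent `[ID]` token, the link graph is greppable. A quick demonstration with a throwaway directory (the paths and file names here are hypothetical):

```shell
# Build a tiny spec tree and trace which files reference BRD-001
mkdir -p /tmp/spec-links-demo
printf 'Implements [BRD-001].\n' > /tmp/spec-links-demo/prd-001-export-api.md
printf 'No references here.\n'   > /tmp/spec-links-demo/des-001-export-arch.md
grep -rl '\[BRD-001\]' /tmp/spec-links-demo
```

The same `grep -rl '\[BRD-001\]' docs/specs/` works against a real spec tree.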
### 5. Use Concrete Examples
- Concrete examples are clearer than abstract descriptions
- Show actual data, requests, responses
- Include sample configurations
- Show before/after if describing changes
### 6. Get Feedback Early
- Share early drafts with stakeholders
- Use validation to catch structural issues
- Get domain experts to review for accuracy
- Iterate based on feedback
### 7. Keep Updating
- Specs should reflect current state, not initial design
- Update when important decisions change
- Mark what was decided and when
- Document why changes were made
## Common Patterns Across Guides
### Quick Start Pattern
Every guide starts with:
```bash
scripts/generate-spec.sh <type> <id>
# Edit file
scripts/validate-spec.sh docs/specs/...
scripts/check-completeness.sh docs/specs/...
```
### Research Phase Pattern
Every guide recommends:
1. Finding related specs
2. Understanding external context
3. Reviewing existing patterns
4. Understanding constraints/requirements
### Structure Pattern
Every guide provides:
- Detailed walkthrough of each section
- Purpose of each section
- What should be included
- How detailed to be
### Validation Pattern
Every guide includes:
- Running the validator
- Common issues and how to fix them
- Checking completeness
### Decision Pattern
Every guide encourages thinking through:
1. Scope/boundaries
2. Options and trade-offs
3. Specific decisions and rationale
4. Communication and approval
5. Evolution and change
## Getting Help
### For questions about a specific spec type:
- Read the corresponding guide in this directory
- Check the examples section for concrete examples
- Review the decision-making framework for guidance
### For validation issues:
- Run `./scripts/validate-spec.sh` to see what's missing
- Read the "Validation & Fixing Issues" section of the guide
- Check if required sections are present and complete
### For understanding the bigger picture:
- Read through related guides to see how specs connect
- Look at the Workflow Example to see the full flow
- Review the Common Patterns section
## Next Steps
1. **Pick a spec type** you need to create
2. **Read the corresponding guide** thoroughly
3. **Run the generate command** to create from template
4. **Follow the structure guide** to fill in sections
5. **Validate frequently** as you work
6. **Fix issues** the validator identifies
7. **Get feedback** from stakeholders
8. **Consider this "complete"** when validator passes ✓
Good luck writing great specs! Remember: clear, complete specifications save time and prevent mistakes later in the development process.

# How to Create an API Contract Specification
API Contracts document the complete specification of REST/GraphQL endpoints, including request/response formats, error handling, and authentication. They serve as the contract between frontend and backend teams.
## Quick Start
```bash
# 1. Create a new API contract
scripts/generate-spec.sh api-contract api-001-descriptive-slug
# 2. Open and fill in the file
# (The file will be created at: docs/specs/api-contract/api-001-descriptive-slug.md)
# 3. Fill in endpoints and specifications, then validate:
scripts/validate-spec.sh docs/specs/api-contract/api-001-descriptive-slug.md
# 4. Fix issues and check completeness:
scripts/check-completeness.sh docs/specs/api-contract/api-001-descriptive-slug.md
```
## When to Write an API Contract
Use an API Contract when you need to:
- Define REST API endpoints and their behavior
- Document request/response schemas in detail
- Specify error handling and status codes
- Clarify authentication and authorization
- Enable parallel frontend/backend development
- Create living documentation of your API
## Research Phase
### 1. Research Related Specifications
Find what this API needs to support:
```bash
# Find technical requirements this fulfills
grep -r "prd\|technical" docs/specs/ --include="*.md"
# Find data models this API exposes
grep -r "data\|model" docs/specs/ --include="*.md"
# Find existing APIs in the codebase
grep -r "api\|endpoint" docs/specs/ --include="*.md"
```
### 2. Research API Design Standards
Understand best practices and conventions:
- REST conventions: HTTP methods, status codes, URL structure
- Pagination: How to handle large result sets?
- Error handling: Standard error format for your org?
- Versioning: How do you version APIs?
- Naming conventions: camelCase vs. snake_case?
Research your tech stack's conventions if needed.
### 3. Review Existing APIs
- How are existing APIs in your codebase designed?
- What patterns does your team follow?
- Any shared infrastructure (API gateway, auth)?
- Error response format standards?
### 4. Understand Data Models
- What entities are exposed?
- Which fields are required vs. optional?
- See [DATA-001] or similar specs for schema details
## Structure & Content Guide
### Title & Metadata
- **Title**: "User Export API" or similar
- Include context about what endpoints are included
- Version number if this is an update to an existing API
### Overview Section
Provide context for the API:
```markdown
# User Export API
This API provides endpoints for initiating, tracking, and downloading user data exports.
Supports bulk export of user information in multiple formats (CSV, JSON).
Authenticated requests only.
**Base URL**: `https://api.example.com/v1`
**Authentication**: Bearer token (JWT)
```
### Authentication & Authorization Section
Describe how authentication works:
```markdown
## Authentication
**Method**: Bearer Token (JWT)
**Header**: `Authorization: Bearer {token}`
**Token Source**: Obtained from `/auth/login` endpoint
### Authorization
**Required**: All endpoints require valid JWT token
**Scopes** (if using OAuth/scope-based):
- `exports:read` - View export status
- `exports:create` - Create new exports
- `exports:download` - Download export files
**User Data**: Users can only access their own exports (enforced server-side)
```
### Endpoints Section
Document each endpoint thoroughly:
#### Endpoint: Create Export
````markdown
**POST /exports**
Creates a new export job for the authenticated user.
### Request
**Headers**
- `Authorization: Bearer {token}` (required)
- `Content-Type: application/json`
**Body**
```json
{
  "data_types": ["users", "transactions"],
  "format": "csv",
  "date_range": {
    "start": "2024-01-01",
    "end": "2024-01-31"
  }
}
```
**Parameters**
- `data_types` (array, required): Types of data to include
  - Allowed values: `users`, `transactions`, `settings`
  - At least one required
- `format` (string, required): Export file format
  - Allowed values: `csv`, `json`
- `date_range` (object, optional): Filter data by date range
  - `start` (string, ISO8601 format)
  - `end` (string, ISO8601 format)
### Response
**Status: 201 Created**
```json
{
  "id": "exp_1234567890",
  "user_id": "usr_9876543210",
  "status": "queued",
  "format": "csv",
  "data_types": ["users", "transactions"],
  "created_at": "2024-01-15T10:30:00Z",
  "estimated_completion": "2024-01-15T10:35:00Z"
}
```
**Status: 400 Bad Request**
```json
{
  "error": "invalid_request",
  "message": "data_types must include at least one type",
  "code": "VALIDATION_ERROR"
}
```
**Status: 401 Unauthorized**
```json
{
  "error": "unauthorized",
  "message": "Invalid or missing authorization token",
  "code": "AUTH_FAILED"
}
```
**Status: 429 Too Many Requests**
```json
{
  "error": "rate_limited",
  "message": "Too many requests. Try again after 60 seconds.",
  "retry_after": 60
}
```
### Details
**Rate Limiting**
- 10 exports per hour per user
- Returns `X-RateLimit-*` headers
  - `X-RateLimit-Limit: 10`
  - `X-RateLimit-Remaining: 5`
  - `X-RateLimit-Reset: 1705319400`
**Notes**
- Exports larger than 100MB are automatically gzipped
- User receives email notification when export is ready
- Export files retained for 7 days
````
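A request like the one specified above can be exercised from the command line. The token and base URL are the placeholders from this guide, so the `curl` call is shown commented out rather than run; the body is sanity-checked locally first:

```shell
# Assemble the Create Export request body and confirm it is well-formed JSON
body='{"data_types":["users","transactions"],"format":"csv"}'
echo "$body" | python3 -m json.tool >/dev/null && echo "body is valid JSON"
# curl -s -X POST https://api.example.com/v1/exports \
#   -H "Authorization: Bearer $TOKEN" \
#   -H "Content-Type: application/json" \
#   -d "$body"
```

Including a runnable request like this in the spec itself lets reviewers try the contract before any code exists.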
#### Endpoint: Get Export Status
````markdown
**GET /exports/{export_id}**
Retrieve the status of a specific export.
### Path Parameters
- `export_id` (string, required): Export ID (e.g., `exp_1234567890`)
### Response
**Status: 200 OK**
```json
{
  "id": "exp_1234567890",
  "user_id": "usr_9876543210",
  "status": "completed",
  "format": "csv",
  "data_types": ["users", "transactions"],
  "created_at": "2024-01-15T10:30:00Z",
  "completed_at": "2024-01-15T10:35:00Z",
  "file_size_bytes": 2048576,
  "download_url": "https://exports.example.com/exp_1234567890.csv.gz",
  "download_expires_at": "2024-01-22T10:35:00Z"
}
```
**Status: 404 Not Found**
```json
{
  "error": "not_found",
  "message": "Export not found",
  "code": "EXPORT_NOT_FOUND"
}
```
### Export Status Values
- `queued` - Job is waiting to be processed
- `processing` - Job is currently running
- `completed` - Export is ready for download
- `failed` - Export failed (see error field)
- `cancelled` - User cancelled the export
### Error Field (when status: failed)
```json
{
  "error": "export_failed",
  "message": "Database connection lost during export"
}
```
````
#### Endpoint: Download Export
````markdown
**GET /exports/{export_id}/download**
Download the export file.
### Path Parameters
- `export_id` (string, required): Export ID
### Response
**Status: 200 OK**
- Returns binary file content
- Content-Type: `text/csv` or `application/json`
- Headers include:
  - `Content-Disposition: attachment; filename=export.csv`
  - `Content-Length: 2048576`
**Status: 410 Gone**
```json
{
  "error": "gone",
  "message": "Export file expired (retention: 7 days)",
  "code": "FILE_EXPIRED"
}
```
````
### Response Formats Section
Define common response formats used across endpoints:
````markdown
## Common Response Formats
### Error Response
All errors follow this format (`request_id` is included for support/debugging):
```json
{
  "error": "error_code",
  "message": "Human-readable error message",
  "code": "ERROR_CODE",
  "request_id": "req_abc123"
}
```
### Pagination (for list endpoints)
`data` holds the array of items for the current page:
```json
{
  "data": [],
  "pagination": {
    "total": 150,
    "limit": 20,
    "offset": 0,
    "next": "https://api.example.com/v1/exports?limit=20&offset=20"
  }
}
```
````
### Error Handling Section
Document error scenarios and status codes:
```markdown
## Error Handling
### HTTP Status Codes
- **200 OK**: Request succeeded
- **201 Created**: Resource created successfully
- **400 Bad Request**: Invalid request format or parameters
- **401 Unauthorized**: Missing or invalid authentication
- **403 Forbidden**: Authenticated but not authorized (e.g., trying to access another user's export)
- **404 Not Found**: Resource doesn't exist
- **409 Conflict**: Request conflicts with current state (e.g., cancelling completed export)
- **429 Too Many Requests**: Rate limit exceeded
- **500 Internal Server Error**: Server error
- **503 Service Unavailable**: Service temporarily unavailable
### Error Codes
- `VALIDATION_ERROR` - Invalid input parameters
- `AUTH_FAILED` - Authentication failed
- `NOT_AUTHORIZED` - Insufficient permissions
- `NOT_FOUND` - Resource doesn't exist
- `CONFLICT` - Conflicting request state
- `RATE_LIMITED` - Rate limit exceeded
- `INTERNAL_ERROR` - Server error (retryable)
- `SERVICE_UNAVAILABLE` - Service temporarily down (retryable)
### Retry Strategy
**Retryable errors** (5xx, 429):
- Implement exponential backoff: 1s, 2s, 4s, 8s...
- Maximum 3 retries
**Non-retryable errors** (4xx except 429):
- Return error immediately to client
```
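The backoff schedule above (1s, 2s, 4s, 8s..., maximum 3 retries) can be sketched in shell; the real request would replace the `echo`, and the `sleep` is commented out to keep the sketch instant:

```shell
# Exponential backoff: double the delay after each failed attempt
attempt=0
max_retries=3
delay=1
while [ "$attempt" -lt "$max_retries" ]; do
  # replace this echo with the real call, and `break` on success
  echo "attempt $((attempt + 1)) failed; retrying in ${delay}s"
  # sleep "$delay"
  attempt=$((attempt + 1))
  delay=$((delay * 2))
done
```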
### Rate Limiting Section
```markdown
## Rate Limiting
### Limits per User
- Export creation: 10 per hour
- API calls: 1000 per hour
### Headers
All responses include rate limit information:
- `X-RateLimit-Limit`: Request quota
- `X-RateLimit-Remaining`: Requests remaining
- `X-RateLimit-Reset`: Unix timestamp when quota resets
### Handling Rate Limits
- If rate limited (429), client receives `Retry-After` header
- Retry after specified seconds
- Implement exponential backoff to avoid overwhelming API
```
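Since `X-RateLimit-Reset` is a Unix timestamp, the remaining wait is simple arithmetic (the reset value below is the sample from the headers above, which is in the past, so the wait comes out to zero):

```shell
# Seconds to wait until the quota resets; clamped so it is never negative
reset=1705319400            # X-RateLimit-Reset from the response headers
now=$(date +%s)
wait_s=$(( reset > now ? reset - now : 0 ))
echo "wait ${wait_s}s before retrying"
```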
### Data Types Section
If your API works with multiple data models, document them:
````markdown
## Data Types
### Export Object
```json
{
  "id": "string (export ID)",
  "user_id": "string (user ID)",
  "status": "string (queued|processing|completed|failed|cancelled)",
  "format": "string (csv|json)",
  "data_types": "string[] (users|transactions|settings)",
  "created_at": "string (ISO8601)",
  "completed_at": "string (ISO8601, null if not completed)",
  "file_size_bytes": "number (null if not completed)",
  "download_url": "string (null if not completed)",
  "download_expires_at": "string (ISO8601, null if expired)"
}
```
### User Object
```json
{
  "id": "string",
  "email": "string",
  "created_at": "string (ISO8601)"
}
```
````
### Versioning Section
```markdown
## API Versioning
**Current Version**: v1
### Versioning Strategy
- New major versions for breaking changes (v1, v2, etc.)
- Minor versions for additive changes (backwards compatible)
- Versions specified in URL path: `/v1/exports`
### Migration Timeline
- Old version support: Minimum 12 months after new version release
- Deprecation notice: 3 months before shutdown
### Breaking Changes
Examples of breaking changes requiring new version:
- Removing endpoints or fields
- Changing response format fundamentally
- Changing HTTP method of endpoint
```
## Writing Tips
### Be Specific About Request/Response
- Show actual JSON examples
- Document all fields (required vs. optional)
- Include data types and valid values
- Specify date/time formats (ISO8601)
### Document Error Scenarios
- List common error cases for each endpoint
- Show exact error response format
- Explain how client should handle each error
- Include HTTP status codes
### Think About Developer Experience
- Are endpoints intuitive?
- Is pagination consistent across endpoints?
- Are error messages helpful?
- Can a developer implement against this without asking questions?
### Link to Related Specs
- Reference data models: `[DATA-001]`
- Reference technical requirements: `[PRD-001]`
- Reference design docs: `[DES-001]`
### Version Your API
- Document versioning strategy
- Make it easy for clients to upgrade
- Provide migration path from old to new versions
## Validation & Fixing Issues
### Run the Validator
```bash
scripts/validate-spec.sh docs/specs/api-contract/api-001-your-spec.md
```
### Common Issues & Fixes
**Issue**: "Missing endpoint specifications"
- **Fix**: Document all endpoints with request/response examples
**Issue**: "Error handling not documented"
- **Fix**: Add status codes and error response formats
**Issue**: "No authentication section"
- **Fix**: Clearly document authentication method and authorization rules
**Issue**: "Incomplete endpoint details"
- **Fix**: Add request parameters, response examples, and error cases
## Decision-Making Framework
As you write the API spec, consider:
1. **Design**: Are endpoints intuitive and consistent?
- Consistent URL structure?
- Correct HTTP methods?
- Good naming?
2. **Data**: What fields are needed in requests/responses?
- Required vs. optional?
- Proper data types?
- Necessary for clients or redundant?
3. **Errors**: What can go wrong?
- Common error cases?
- Clear error messages?
- Actionable feedback for developers?
4. **Performance**: Are there efficiency considerations?
- Pagination for large result sets?
- Filtering/search capabilities?
- Rate limiting strategy?
5. **Evolution**: How will this API change?
- Versioning strategy?
- Backwards compatibility?
- Deprecation timeline?
## Next Steps
1. **Create the spec**: `scripts/generate-spec.sh api-contract api-XXX-slug`
2. **Research**: Find related data models and technical requirements
3. **Design endpoints**: Sketch out URL structure and HTTP methods
4. **Fill in details** for each endpoint using this guide
5. **Validate**: `scripts/validate-spec.sh docs/specs/api-contract/api-XXX-slug.md`
6. **Get review** from backend and frontend teams
7. **Share with implementation teams** for development

# How to Create a Business Requirement Specification
Business Requirements (BRD) capture what problem you're solving and why it matters from a business perspective. They translate customer needs into requirements that the technical team can build against.
## Quick Start
```bash
# 1. Create a new business requirement (auto-generates next ID)
scripts/generate-spec.sh business-requirement --next descriptive-slug
# This auto-assigns the next ID (e.g., brd-002-descriptive-slug)
# File created at: docs/specs/business-requirement/brd-002-descriptive-slug.md
# 2. Fill in the sections following the guide below
# 3. Validate completeness
scripts/validate-spec.sh docs/specs/business-requirement/brd-002-descriptive-slug.md
# 4. Fix any issues and check completeness
scripts/check-completeness.sh docs/specs/business-requirement/brd-002-descriptive-slug.md
```
**Pro tip:** Use `scripts/next-id.sh business-requirement` to see what the next ID will be before creating.
## When to Write a Business Requirement
Use a Business Requirement when you need to:
- Document a new feature or capability from the user's perspective
- Articulate the business value and expected outcomes
- Define acceptance criteria that stakeholders can verify
- Create alignment across product, engineering, and business teams
- Track success after implementation with specific metrics
## Research Phase
Before writing, gather context:
### 1. Research Related Specifications
Look for related specs that inform this requirement:
```bash
# Find related business requirements
grep -r "related\|similar\|user export" docs/specs/ --include="*.md" | head -20
# Or search for related technical requirements that might already exist
scripts/list-templates.sh # See what's already documented
```
### 2. Research External Documentation & Competitive Landscape
If available, research:
- Competitor features or how similar companies solve this problem
- Industry standards or best practices
- User research or survey data
- Customer feedback or support tickets
Use web tools if needed: Claude's web fetch capabilities can pull in external docs, API documentation, or competitive analysis.
### 3. Understand Existing Context
- Ask: What systems or processes does this impact?
- Find: Any existing specs in `docs/specs/` that are related
- Review: Recent PRs or commits related to this domain
## Structure & Content Guide
### Metadata Section
Fill in these required fields:
- **Document ID**: Use format `BRD-XXX-short-slug` (e.g., `BRD-001-user-export`)
- **Status**: Start with "Draft", moves to "In Review" → "Approved" → "Implemented"
- **Author**: Your name
- **Created**: Today's date (YYYY-MM-DD)
- **Stakeholders**: Key people involved (Product Manager, Engineering Lead, Customer Success, etc.)
- **Priority**: Critical | High | Medium | Low
### Description Section
Answer: "What is the problem and why does it matter?"
**Background**: Provide context
- What is the current situation?
- How did we identify this need?
- Who brought it up?
**Problem Statement**: Be concise and specific
- Example: "Users cannot export their data in bulk, forcing them to perform manual exports one at a time, which is time-consuming and error-prone."
### Business Value Section
Answer: "Why should we build this?"
**Expected Outcomes**: List 2-3 measurable outcomes
- Example: "Reduce manual export time by 80%"
- Example: "Increase user retention by enabling data portability"
**Strategic Alignment**: How does this support business goals?
- Example: "Aligns with our goal to improve user experience for enterprise customers"
### Stakeholders Section
Create a table identifying who needs to sign off or provide input:
- **Business Owner**: Makes final business decisions
- **Product Owner**: Gathers and prioritizes requirements
- **End Users**: The people who will use this feature
- **Technical Lead**: Ensures technical feasibility
### User Stories Section
Write 3-5 user stories following this format:
```
As a [user role],
I want to [capability],
so that [benefit/outcome]
```
**Tips for writing user stories:**
- Use real user roles from your product (not generic "user")
- Each story should be achievable in 1-3 days of work (rough estimate)
- Include acceptance criteria inline or in a separate section
**Example:**
| ID | As a... | I want to... | So that... | Priority |
|---|---|---|---|---|
| US-1 | Data Analyst | Export data as CSV | I can analyze it in Excel | High |
| US-2 | Enterprise Admin | Bulk export all user data | I can back it up and migrate to another system | High |
| US-3 | API Client | Get exports via webhook | I can automate reports | Medium |
### Assumptions Section
List what you're assuming to be true:
- "Users have stable internet connections"
- "Exported data will be less than 100MB"
- "We can leverage the existing database export functionality"
### Constraints Section
Identify limitations:
- **Business**: Budget, timeline, market windows
- **Technical**: System limitations, platform restrictions
- **Organizational**: Team capacity, skill gaps
### Dependencies Section
What needs to happen first?
- "Data privacy review must be completed"
- "Export API implementation (prd-XXX) must be finished"
### Risks Section
What could go wrong?
- Document: Risk description, likelihood (High/Med/Low), impact (High/Med/Low), and mitigation strategy
### Acceptance Criteria Section
Define "done" from a business perspective (3-5 criteria):
```
1. Users can select data types to export (users, transactions, settings)
2. Exports complete within 2 minutes for datasets up to 100MB
3. Exported data is usable in common formats (CSV, JSON)
4. Users receive email confirmation when export is ready
5. Exported data is securely deleted after 7 days
```
### Success Metrics Section
How will you measure success after launch?
| Metric | Current Baseline | Target | Measurement Method |
|--------|------------------|--------|-------------------|
| % of power users using export | 0% | 40% | Product analytics |
| Average export time | N/A | < 2 min | Server logs |
| Support tickets about exports | TBD | < 5/week | Support system |
| User satisfaction (export feature) | N/A | > 4/5 stars | In-app survey |
### Time to Value
When do you expect to see results?
- Example: "We expect 20% adoption within the first 2 weeks post-launch based on similar features"
### Approval Section
Track who has approved this requirement:
- Business Owner approval needed before engineering begins
- Product Owner approval to confirm alignment
- Technical Lead approval to confirm feasibility
## Writing Tips
### Be Specific, Not Vague
- ❌ Bad: "Users want to export their data"
- ✅ Good: "Users want to export their transaction history as CSV within a specific date range"
### Use Concrete Examples
- Describe what the feature looks like in action
- Include sample data or screenshots if possible
- Give edge cases (what about large datasets? special characters? time zones?)
### Consider the User's Perspective
- Think about: What problem are they solving with this?
- What would make them happy or frustrated?
- What alternatives might they use if you don't build this?
### Link to Other Specs
- Reference related technical requirements (if they exist): "See [prd-XXX] for implementation details"
- Reference related design docs: "See [des-XXX] for the export flow architecture"
### Complete All TODOs
- Don't leave placeholder text like "TODO: Add metrics"
- If something isn't known, explain why and what needs to happen to find out
## Validation & Fixing Issues
### Run the Validator
```bash
scripts/validate-spec.sh docs/specs/business-requirement/brd-001-your-spec.md
```
The validator checks for:
- Title and ID properly formatted
- All required sections present
- Minimum content in critical sections
- No incomplete TODO items
### Common Issues & Fixes
**Issue**: "Missing Acceptance Criteria section"
- **Fix**: Add 3-5 clear, measurable acceptance criteria that define what "done" means
**Issue**: "User Stories only has 1-2 items (minimum 3)"
- **Fix**: Add 1-2 more user stories representing different roles or scenarios
**Issue**: "TODO items in Business Value (3 items)"
- **Fix**: Complete the Business Value section with actual expected outcomes and strategic alignment
**Issue**: "No Success Metrics defined"
- **Fix**: Add a table with specific, measurable KPIs you'll track post-launch
### Check Completeness
```bash
scripts/check-completeness.sh docs/specs/business-requirement/brd-001-your-spec.md
```
This shows:
- Overall completion percentage
- Which sections still have TODOs
- Referenced documents (if any are broken, they show up here)
## Decision-Making Framework
When writing the BRD, reason through these questions:
1. **Problem**: Is this a real problem or a nice-to-have?
- Can you trace it back to actual user feedback?
- How many users are affected?
- How often do they encounter this problem?
2. **Scope**: What are we NOT building?
- Define boundaries clearly (what's in scope vs. out of scope)
- This helps prevent scope creep
3. **Trade-offs**: What are we accepting by building this?
- Engineering effort cost
- Opportunity cost (what else won't we build?)
- Maintenance burden
4. **Success**: How will we know if this was worth building?
- What metrics matter?
- What's the acceptable threshold for success?
5. **Risks**: What could prevent this from working?
- Technical risks
- User adoption risks
- Business/market risks
## Example: Complete Business Requirement
Here's how a complete BRD section might look:
```markdown
# [BRD-001] Bulk User Data Export
## Metadata
- **Document ID**: BRD-001-bulk-export
- **Status**: Approved
- **Author**: Jane Smith
- **Created**: 2024-01-15
- **Stakeholders**: Product Manager (Jane), Engineering Lead (Bob), Support (Maria)
- **Priority**: High
## Description
### Background
Our enterprise customers have requested the ability to bulk export user data.
Currently, they can only export one user at a time via the admin panel, which is
time-consuming for customers with hundreds of users.
### Problem Statement
Enterprise customers need to audit, back up, and migrate user data, but the
current one-at-a-time export process takes hours and is error-prone.
## Business Value
### Expected Outcomes
- Reduce manual export time for enterprise customers by 80%
- Enable customers to audit user data for compliance purposes
- Support customer data portability requests
### Strategic Alignment
Aligns with our enterprise expansion goal by improving features our target
customers need for large-scale deployments.
[... rest of sections follow template ...]
```
## Next Steps
1. **Create the spec**: `scripts/generate-spec.sh business-requirement brd-XXX-slug`
2. **Fill in each section** using this guide as reference
3. **Validate**: `scripts/validate-spec.sh docs/specs/business-requirement/brd-XXX-slug.md`
4. **Fix issues** identified by the validator
5. **Get stakeholder approval** (fill in the Approval section)
6. **Share with engineering** for technical requirement creation

# How to Create a Component Specification
Component specifications document individual system components or services, including their responsibilities, interfaces, configuration, and deployment characteristics.
## Quick Start
```bash
# 1. Create a new component spec
scripts/generate-spec.sh component cmp-001-descriptive-slug
# 2. Open and fill in the file
# (The file will be created at: docs/specs/component/cmp-001-descriptive-slug.md)
# 3. Fill in the sections, then validate:
scripts/validate-spec.sh docs/specs/component/cmp-001-descriptive-slug.md
# 4. Fix issues and check completeness:
scripts/check-completeness.sh docs/specs/component/cmp-001-descriptive-slug.md
```
## When to Write a Component Specification
Use a Component Spec when you need to:
- Document a microservice or major system component
- Specify component responsibilities and interfaces
- Define configuration requirements
- Document deployment procedures
- Enable teams to understand component behavior
- Plan for monitoring and observability
## Research Phase
### 1. Research Related Specifications
Find what informed this component:
```bash
# Find design documents that reference this component
grep -r "design\|architecture" docs/specs/ --include="*.md"
# Find API contracts this component implements
grep -r "api\|endpoint" docs/specs/ --include="*.md"
# Find data models this component uses
grep -r "data\|model" docs/specs/ --include="*.md"
```
### 2. Review Similar Components
- How are other components in your system designed?
- What patterns and conventions exist?
- How are they deployed and monitored?
- What's the standard for documentation?
### 3. Understand Dependencies
- What services or systems does this component depend on?
- What services depend on this component?
- What data flows through this component?
- What are the integration points?
## Structure & Content Guide
### Title & Metadata
- **Title**: "Export Service", "User Authentication Service", etc.
- **Type**: Microservice, Library, Worker, API Gateway, etc.
- **Version**: Current version number
### Component Description
```markdown
# Export Service
The Export Service is a microservice responsible for handling bulk user data exports.
It manages the export job lifecycle: queuing, processing, storage, and delivery.
**Type**: Microservice
**Language**: Node.js + TypeScript
**Deployment**: Kubernetes (3+ replicas)
**Status**: Stable (production)
```
### Purpose & Responsibilities Section
```markdown
## Purpose
Provide reliable, scalable handling of user data exports in multiple formats
while maintaining system stability and data security.
## Primary Responsibilities
1. **Job Queueing**: Accept export requests and queue them for processing
- Validate request parameters
- Create export job records
- Enqueue jobs for processing
- Return job ID to client
2. **Job Processing**: Execute export jobs asynchronously
- Query user data from database
- Transform data to requested format (CSV, JSON)
- Compress files for storage
- Handle processing errors and retries
3. **File Storage**: Manage exported file storage and lifecycle
- Store completed exports to S3
- Generate secure download URLs
- Implement TTL-based cleanup
- Maintain export audit logs
4. **Status Tracking**: Provide job status and progress information
- Track job state (queued, processing, completed, failed)
- Record completion time and file metadata
- Handle cancellation requests
5. **Error Handling**: Manage failures gracefully
- Retry failed jobs with exponential backoff
- Notify users of failures
- Log errors for debugging
- Preserve system stability during failures
```
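The retry behavior under Error Handling can be sketched in TypeScript as below. This is an illustrative sketch, not the service's actual code: the function names are hypothetical, and the base delay and attempt count mirror the `retry_delay_ms: 1000` / `max_retries: 3` defaults shown later in the Configuration section.

```typescript
// Hypothetical sketch of retry with exponential backoff.
// backoffDelayMs(1) = 1000ms, (2) = 2000ms, (3) = 4000ms, capped at 30s.
function backoffDelayMs(attempt: number, baseMs = 1000, capMs = 30_000): number {
  return Math.min(baseMs * 2 ** (attempt - 1), capMs);
}

// Run a job, retrying up to maxAttempts times before giving up.
async function processWithRetry<T>(job: () => Promise<T>, maxAttempts = 3): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await job();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
      }
    }
  }
  // Exhausted retries: this is where an export.failed event would be emitted.
  throw lastError;
}
```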
### Interfaces & APIs Section
```markdown
## Interfaces
### REST API Endpoints
The service exposes these HTTP endpoints:
#### POST /exports
**Purpose**: Create a new export job
**Authentication**: Required (Bearer token)
**Request Body**:
```json
{
"data_types": ["users", "transactions"],
"format": "csv",
"date_range": { "start": "2024-01-01", "end": "2024-01-31" }
}
```
**Response** (201 Created):
```json
{
"id": "exp_123456",
"status": "queued",
"created_at": "2024-01-15T10:00:00Z"
}
```
#### GET /exports/{id}
**Purpose**: Get export job status
**Response** (200 OK):
```json
{
"id": "exp_123456",
"status": "completed",
"download_url": "https://...",
"file_size_bytes": 2048576
}
```
### Event Publishing
The service publishes events to message queue:
**export.started**
```json
{
"event": "export.started",
"export_id": "exp_123456",
"user_id": "usr_789012",
"timestamp": "2024-01-15T10:00:00Z"
}
```
**export.completed**
```json
{
"event": "export.completed",
"export_id": "exp_123456",
"file_size_bytes": 2048576,
"format": "csv",
"timestamp": "2024-01-15T10:05:00Z"
}
```
**export.failed**
```json
{
"event": "export.failed",
"export_id": "exp_123456",
"error": "database_connection_timeout",
"timestamp": "2024-01-15T10:05:00Z"
}
```
### Dependencies (Consumed APIs)
- **User Service API**: GET /users/{id}, GET /users (for data export)
- **Auth Service**: JWT validation
- **Notification Service**: Send export completion notifications
```
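A consumer of these endpoints typically creates a job and polls until it reaches a terminal state. Here is an illustrative client sketch: the endpoint paths and response fields follow this spec, while `baseUrl`, the token, the request payload, and the 2-second poll interval are placeholder assumptions.

```typescript
// Illustrative client for POST /exports and GET /exports/{id}.
type ExportStatus = "queued" | "processing" | "completed" | "failed";

interface ExportJob {
  id: string;
  status: ExportStatus;
  download_url?: string;
}

// A job needs no further polling once it reaches a terminal state.
function isTerminal(status: ExportStatus): boolean {
  return status === "completed" || status === "failed";
}

async function createAndAwaitExport(baseUrl: string, token: string): Promise<ExportJob> {
  const headers = { Authorization: `Bearer ${token}`, "Content-Type": "application/json" };
  const created = await fetch(`${baseUrl}/exports`, {
    method: "POST",
    headers,
    body: JSON.stringify({ data_types: ["users"], format: "csv" }),
  });
  let job = (await created.json()) as ExportJob;
  while (!isTerminal(job.status)) {
    await new Promise((resolve) => setTimeout(resolve, 2000)); // poll every 2s
    const res = await fetch(`${baseUrl}/exports/${job.id}`, { headers });
    job = (await res.json()) as ExportJob;
  }
  return job;
}
```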
### Configuration Section
```markdown
## Configuration
### Environment Variables
| Variable | Type | Required | Description |
|----------|------|----------|-------------|
| NODE_ENV | string | Yes | Environment (dev, staging, production) |
| PORT | number | No | HTTP server port (default: 3000) |
| DATABASE_URL | string | Yes | PostgreSQL connection string |
| REDIS_URL | string | Yes | Redis connection for job queue |
| S3_BUCKET | string | Yes | S3 bucket for export files |
| S3_REGION | string | Yes | AWS region (e.g., us-east-1) |
| AWS_ACCESS_KEY_ID | string | Yes | AWS credentials |
| AWS_SECRET_ACCESS_KEY | string | Yes | AWS credentials |
| EXPORT_TTL_DAYS | number | No | Export file retention days (default: 7) |
| MAX_EXPORT_SIZE_MB | number | No | Maximum export file size (default: 500) |
| CONCURRENT_WORKERS | number | No | Number of concurrent job processors (default: 5) |
### Configuration File (config.json)
```json
{
"server": {
"port": 3000,
"timeout_ms": 30000
},
"jobs": {
"max_retries": 3,
"retry_delay_ms": 1000,
"timeout_ms": 300000
},
"export": {
"max_file_size_mb": 500,
"ttl_days": 7,
"formats": ["csv", "json"]
},
"storage": {
"type": "s3",
"cleanup_interval_hours": 24
}
}
```
### Runtime Requirements
- **Memory**: 512MB minimum, 2GB recommended
- **CPU**: 1 core minimum, 2 cores recommended
- **Disk**: 10GB for temporary files
- **Network**: Must reach PostgreSQL, Redis, S3, Auth Service
```
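At startup the service must turn a table like the one above into a validated config object, failing fast when a required variable is missing. A minimal sketch, covering a subset of the variables (the interface and function names are assumptions, not the service's actual code):

```typescript
// Hypothetical startup config loader: required vars throw if absent,
// optional vars fall back to the documented defaults.
interface ServiceConfig {
  port: number;
  databaseUrl: string;
  redisUrl: string;
  s3Bucket: string;
  exportTtlDays: number;
  maxExportSizeMb: number;
  concurrentWorkers: number;
}

function loadConfig(env: Record<string, string | undefined>): ServiceConfig {
  const required = (key: string): string => {
    const value = env[key];
    if (value === undefined || value === "") {
      throw new Error(`Missing required environment variable: ${key}`);
    }
    return value;
  };
  const numberOr = (key: string, fallback: number): number =>
    env[key] === undefined ? fallback : Number(env[key]);

  return {
    port: numberOr("PORT", 3000),
    databaseUrl: required("DATABASE_URL"),
    redisUrl: required("REDIS_URL"),
    s3Bucket: required("S3_BUCKET"),
    exportTtlDays: numberOr("EXPORT_TTL_DAYS", 7),
    maxExportSizeMb: numberOr("MAX_EXPORT_SIZE_MB", 500),
    concurrentWorkers: numberOr("CONCURRENT_WORKERS", 5),
  };
}
```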
### Data Dependencies Section
```markdown
## Data Dependencies
### Input Data
The service requires access to:
- **User data**: From User Service or User DB
- Fields: id, email, name, created_at, etc.
- Constraints: User must be authenticated
  - Volume: Scales with the size of the user dataset
- **Transaction data**: From Transaction DB
- Fields: id, user_id, amount, date, etc.
- Volume: Can be large (100k+ per user)
### Output Data
The service produces:
- **Export files**: CSV or JSON format
- Stored in S3
- Size: Up to 500MB per file
- Retention: 7 days
- **Export metadata**: Stored in PostgreSQL
- Export record with status, size, completion time
- Audit trail of all exports
```
### Deployment Section
```markdown
## Deployment
### Container Image
- **Base Image**: node:18-alpine
- **Build**: Dockerfile in repository root
- **Registry**: ECR (AWS Elastic Container Registry)
- **Tag**: Semver (e.g., v1.2.3, latest)
### Kubernetes Deployment
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: export-service
namespace: production
spec:
replicas: 3
selector:
matchLabels:
app: export-service
template:
metadata:
labels:
app: export-service
spec:
containers:
- name: export-service
image: 123456789.dkr.ecr.us-east-1.amazonaws.com/export-service:latest
ports:
- containerPort: 3000
resources:
requests:
memory: "512Mi"
cpu: "250m"
limits:
memory: "2Gi"
cpu: "1000m"
env:
- name: NODE_ENV
value: "production"
- name: DATABASE_URL
valueFrom:
secretKeyRef:
name: export-service-secrets
key: database-url
livenessProbe:
httpGet:
path: /health
port: 3000
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /ready
port: 3000
initialDelaySeconds: 10
periodSeconds: 5
```
### Deployment Steps
1. **Build**: `docker build -t export-service:v1.2.3 .`
2. **Push**: `docker push <registry>/export-service:v1.2.3`
3. **Update**: `kubectl set image deployment/export-service export-service=<registry>/export-service:v1.2.3`
4. **Verify**: `kubectl rollout status deployment/export-service`
### Rollback Procedure
```bash
# If deployment fails, rollback to previous version
kubectl rollout undo deployment/export-service
# Verify successful rollback
kubectl rollout status deployment/export-service
```
### Pre-Deployment Checklist
- [ ] All tests passing locally
- [ ] Database migrations run successfully
- [ ] Configuration environment variables set in staging
- [ ] Health check endpoints responding
- [ ] Metrics and logging verified
```
### Monitoring & Observability Section
```markdown
## Monitoring
### Health Checks
**Liveness Probe**: GET /health
- Returns 200 if service is running
- Used by Kubernetes to restart unhealthy pods
**Readiness Probe**: GET /ready
- Returns 200 if service is ready to receive traffic
- Checks database connectivity, Redis availability
- Used by Kubernetes for traffic routing
### Metrics
Export these Prometheus metrics:
| Metric | Type | Description |
|--------|------|-------------|
| exports_created_total | Counter | Total exports created |
| exports_completed_total | Counter | Total exports completed successfully |
| exports_failed_total | Counter | Total exports failed |
| export_duration_seconds | Histogram | Time to complete export (p50, p95, p99) |
| export_file_size_bytes | Histogram | Size of exported files |
| export_job_queue_depth | Gauge | Number of jobs awaiting processing |
| export_active_jobs | Gauge | Number of jobs currently processing |
### Alerts
Configure these alerts:
**Export Job Backlog Growing**
- Alert if `export_job_queue_depth > 100` for 5+ minutes
- Action: Scale up worker replicas
**Export Failures Increasing**
- Alert if `exports_failed_total` increases by > 10% in 1 hour
- Action: Investigate failure logs
**Service Unhealthy**
- Alert if liveness probe fails
- Action: Restart pod, check logs
### Logging
Log format (JSON):
```json
{
"timestamp": "2024-01-15T10:05:00Z",
"level": "info",
"service": "export-service",
"export_id": "exp_123456",
"event": "export_completed",
"duration_ms": 5000,
"file_size_bytes": 2048576,
"message": "Export completed successfully"
}
```
**Log Levels**
- `debug`: Detailed debugging information
- `info`: Important operational events
- `warn`: Warning conditions (retries, slow operations)
- `error`: Error conditions (failures, exceptions)
```
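The log format and level semantics above can be sketched as a small helper. The field layout follows the JSON example; `shouldLog` and `logLine` are hypothetical names, not the service's real logging library.

```typescript
// Hypothetical structured-logging helpers matching the JSON line format above.
const LEVELS = ["debug", "info", "warn", "error"] as const;
type Level = (typeof LEVELS)[number];

// A message is emitted only if its level is at or above the configured level.
function shouldLog(configured: Level, candidate: Level): boolean {
  return LEVELS.indexOf(candidate) >= LEVELS.indexOf(configured);
}

function logLine(
  level: Level,
  event: string,
  message: string,
  fields: Record<string, unknown> = {},
): string {
  return JSON.stringify({
    timestamp: new Date().toISOString(),
    level,
    service: "export-service",
    event,
    ...fields,
    message,
  });
}
```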
### Dependencies & Integration Section
```markdown
## Dependencies
### Service Dependencies
| Service | Purpose | Criticality | Failure Impact |
|---------|---------|-------------|----------------|
| PostgreSQL | Export job storage | Critical | Service down |
| Redis | Job queue | Critical | Exports won't process |
| S3 | Export file storage | Critical | Can't store exports |
| Auth Service | JWT validation | Critical | Can't validate requests |
| User Service | User data source | Critical | Can't export user data |
| Notification Service | Email notifications | Optional | Users won't get notification |
### External Dependencies
- **AWS S3**: For file storage and retrieval
- **PostgreSQL**: For export metadata
- **Redis**: For job queue
- **Kubernetes**: For orchestration
### Fallback Strategies
- Redis unavailable: Use in-memory queue (single instance only)
- User Service unavailable: Fail export with "upstream_error"
- S3 unavailable: Retry with exponential backoff, max 3 times
```
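The Redis fallback strategy can be sketched as a queue interface with an in-memory implementation. The names are illustrative, and (as the spec notes) the in-memory path is only safe with a single instance.

```typescript
// Illustrative queue abstraction with an in-memory fallback.
interface JobQueue {
  enqueue(jobId: string): void;
  dequeue(): string | undefined;
}

class InMemoryQueue implements JobQueue {
  private items: string[] = [];
  enqueue(jobId: string): void {
    this.items.push(jobId);
  }
  dequeue(): string | undefined {
    return this.items.shift(); // FIFO
  }
}

// Prefer Redis when healthy; otherwise degrade to the in-memory queue
// (single instance only).
function selectQueue(redisHealthy: boolean, redisQueue: JobQueue): JobQueue {
  return redisHealthy ? redisQueue : new InMemoryQueue();
}
```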
### Performance & SLA Section
```markdown
## Performance Characteristics
### Throughput
- Process up to 1000 exports per day
- Handle 100 concurrent job workers
- Worker capacity auto-scales based on queue depth
### Latency
- Create export job: < 100ms (p95)
- Process 100MB export: 3-5 minutes average
- Query export status: < 50ms (p95)
### Resource Usage
- Memory: 800MB average, peaks at 1.5GB
- CPU: 25% average, peaks at 60%
- Disk (temp): 50GB for concurrent exports
### Service Level Objectives (SLOs)
| Objective | Target |
|-----------|--------|
| Availability | 99.5% uptime |
| Error Rate | < 0.1% |
| p95 Latency (status query) | < 100ms |
| Export Completion | < 10 minutes for 100MB |
### Scalability
- Horizontal: Add more pods for higher throughput
- Vertical: Increase pod memory/CPU for larger exports
- Maximum tested: 10k exports/day on a 5-pod cluster
```
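Latency SLOs like the p95 targets above can be checked against raw request durations. A sketch using the nearest-rank percentile method (the method itself is an assumption for illustration):

```typescript
// Nearest-rank percentile over a list of request durations (ms).
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length) - 1; // 0-based index
  return sorted[Math.max(0, rank)];
}

// True if the observed p95 latency is within the SLO target.
function meetsSlo(durationsMs: number[], p95TargetMs: number): boolean {
  return percentile(durationsMs, 95) <= p95TargetMs;
}
```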
## Writing Tips
### Be Specific About Responsibilities
- What does this component do?
- What does it NOT do?
- Where do responsibilities start/stop?
### Document All Interfaces
- REST APIs? Document endpoints and schemas
- Message queues? Show event formats
- Database? Show schema and queries
- Dependencies? Show what's called and how
### Include Deployment Details
- How is it deployed (containers, VMs, serverless)?
- Configuration required?
- Health checks?
- Monitoring setup?
### Link to Related Specs
- Reference design docs: `[DES-001]`
- Reference API contracts: `[API-001]`
- Reference data models: `[DATA-001]`
- Reference deployment procedures: `[DEPLOY-001]`
### Document Failure Modes
- What happens if dependencies fail?
- How does the component recover?
- What alerts fire when things go wrong?
## Validation & Fixing Issues
### Run the Validator
```bash
scripts/validate-spec.sh docs/specs/component/cmp-001-your-spec.md
```
### Common Issues & Fixes
**Issue**: "Missing Interfaces section"
- **Fix**: Document all APIs, event formats, and data contracts
**Issue**: "Configuration incomplete"
- **Fix**: Add environment variables, configuration files, and runtime requirements
**Issue**: "No Monitoring section"
- **Fix**: Add health checks, metrics, alerts, and logging strategy
**Issue**: "Deployment steps unclear"
- **Fix**: Add step-by-step deployment and rollback procedures
## Decision-Making Framework
When writing a component spec, consider:
1. **Boundaries**: What is this component's responsibility?
- What does it own?
- What does it depend on?
- Where are boundaries clear?
2. **Interfaces**: How will others interact with this?
- REST, gRPC, events, direct calls?
- What contracts must be maintained?
- How do we evolve interfaces?
3. **Configuration**: What's configurable vs. hardcoded?
- Environment-specific settings?
- Runtime tuning parameters?
- Feature flags?
4. **Operations**: How will we run this in production?
- Deployment model?
- Monitoring and alerting?
- Failure recovery?
5. **Scale**: How much can this component handle?
- Throughput limits?
- Scaling strategy?
- Resource requirements?
## Next Steps
1. **Create the spec**: `scripts/generate-spec.sh component cmp-XXX-slug`
2. **Research**: Find design docs and existing components
3. **Define responsibilities** and boundaries
4. **Document interfaces** for all interactions
5. **Plan deployment** and monitoring
6. **Validate**: `scripts/validate-spec.sh docs/specs/component/cmp-XXX-slug.md`
7. **Share with architecture/ops** before implementation

# How to Create a Configuration Schema Specification
Configuration schema specifications document all configurable parameters for a system, including their types, valid values, defaults, and impact.
## Quick Start
```bash
# 1. Create a new configuration schema
scripts/generate-spec.sh configuration-schema config-001-descriptive-slug
# 2. Open and fill in the file
# (The file will be created at: docs/specs/configuration-schema/config-001-descriptive-slug.md)
# 3. Fill in configuration fields and validation rules, then validate:
scripts/validate-spec.sh docs/specs/configuration-schema/config-001-descriptive-slug.md
# 4. Fix issues and check completeness:
scripts/check-completeness.sh docs/specs/configuration-schema/config-001-descriptive-slug.md
```
## When to Write a Configuration Schema
Use a Configuration Schema when you need to:
- Document all configurable system parameters
- Specify environment variables and their meanings
- Define configuration file formats
- Document validation rules and constraints
- Enable operations teams to configure systems safely
- Provide examples for different environments
## Research Phase
### 1. Research Related Specifications
Find what you're configuring:
```bash
# Find component specs
grep -r "component" docs/specs/ --include="*.md"
# Find deployment procedures
grep -r "deploy" docs/specs/ --include="*.md"
# Find existing configuration specs
grep -r "config" docs/specs/ --include="*.md"
```
### 2. Understand Configuration Needs
- What aspects of the system need to be configurable?
- What differs between environments (dev, staging, prod)?
- What can change at runtime vs. requires restart?
- What's sensitive (secrets, credentials)?
### 3. Review Existing Configurations
- How are other services configured?
- What configuration format is used?
- What environment variables exist?
- What patterns should be followed?
## Structure & Content Guide
### Title & Metadata
- **Title**: "Export Service Configuration", "API Gateway Config", etc.
- **Component**: What component is being configured
- **Version**: Configuration format version
- **Status**: Current, Deprecated, etc.
### Overview Section
```markdown
# Export Service Configuration Schema
## Summary
Defines all configurable parameters for the Export Service microservice.
Configuration can be set via environment variables or JSON config file.
**Configuration Methods**:
- Environment variables (recommended for Docker/Kubernetes)
- config.json file (for monolithic deployments)
- Command-line arguments (for local development)
**Scope**: All settings that affect Export Service behavior
**Format**: JSON Schema compliant
```
### Configuration Methods Section
```markdown
## Configuration Methods
### Method 1: Environment Variables (Recommended for Production)
Used in containerized deployments (Docker, Kubernetes).
Set before starting the service.
**Syntax**: `EXPORT_SERVICE_KEY=value`
**Example**:
```bash
export EXPORT_SERVICE_PORT=3000
export EXPORT_SERVICE_LOG_LEVEL=info
export EXPORT_SERVICE_DATABASE_URL=postgresql://user:pass@host/db
```
### Method 2: Configuration File (config.json)
Used in monolithic or local deployments.
JSON format with hierarchical structure.
**Location**: `./config.json` in working directory
**Example**:
```json
{
"server": {
"port": 3000,
"timeout": 30000
},
"database": {
"url": "postgresql://user:pass@host/db",
"pool": 10
}
}
```
### Method 3: Command-Line Arguments
Used in local development. Takes precedence over environment variables and the config file.
**Syntax**: `--key value` or `--key=value`
**Example**:
```bash
node index.js --port 3000 --log-level debug
```
### Precedence (Priority Order)
1. Command-line arguments (highest priority)
2. Environment variables
3. config.json file
4. Default values (lowest priority)
```
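The precedence order can be made concrete with a small resolver sketch, where later sources simply override earlier ones (the names are illustrative, not the service's actual loader):

```typescript
// Hypothetical config resolver: later spreads win, so arguments go from
// lowest priority (defaults) to highest (command-line arguments).
type Config = Record<string, string | number | boolean>;

function resolveConfig(defaults: Config, file: Config, env: Config, cli: Config): Config {
  return { ...defaults, ...file, ...env, ...cli };
}
```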
### Configuration Fields Section
Document each configuration field:
```markdown
## Configuration Fields
### Server Section
#### PORT
- **Type**: integer
- **Default**: 3000
- **Range**: 1024-65535
- **Environment Variable**: `EXPORT_SERVICE_PORT`
- **Config File Key**: `server.port`
- **Description**: HTTP server listening port
- **Examples**:
- Development: 3000 (local machine, different services use different ports)
- Production: 3000 (behind load balancer, port not exposed)
- **Impact**: Service not reachable if port already in use
- **Can Change at Runtime**: No (requires restart)
#### TIMEOUT_MS
- **Type**: integer (milliseconds)
- **Default**: 30000 (30 seconds)
- **Range**: 5000-120000
- **Environment Variable**: `EXPORT_SERVICE_TIMEOUT_MS`
- **Config File Key**: `server.timeout_ms`
- **Description**: HTTP request timeout
- **Considerations**:
- Must be longer than longest export duration
- If too short: Long exports time out and fail
- If too long: Failed connections hang longer
- **Examples**:
- Development: 30000 (quick feedback on errors)
- Production: 120000 (accounts for large exports)
#### ENABLE_COMPRESSION
- **Type**: boolean
- **Default**: true
- **Environment Variable**: `EXPORT_SERVICE_ENABLE_COMPRESSION`
- **Config File Key**: `server.enable_compression`
- **Description**: Enable HTTP response compression (gzip)
- **Considerations**:
- Reduces bandwidth but increases CPU usage
- Should be true unless CPU constrained
- **Typical Value**: true (always)
### Database Section
#### DATABASE_URL
- **Type**: string (connection string)
- **Default**: None (required)
- **Environment Variable**: `EXPORT_SERVICE_DATABASE_URL`
- **Config File Key**: `database.url`
- **Format**: `postgresql://user:password@host:port/database`
- **Description**: PostgreSQL connection string
- **Examples**:
- Development: `postgresql://localhost/export_service`
- Staging: `postgresql://stage-db.example.com/export_stage`
- Production: `postgresql://prod-db.example.com/export_prod` (managed RDS)
- **Sensitive**: Yes (contains credentials - use secrets management)
- **Required**: Yes
- **Validation**:
- Must be valid PostgreSQL connection string
- Service fails to start if URL invalid or unreachable
#### DATABASE_POOL_SIZE
- **Type**: integer
- **Default**: 10
- **Range**: 1-100
- **Environment Variable**: `EXPORT_SERVICE_DATABASE_POOL_SIZE`
- **Config File Key**: `database.pool_size`
- **Description**: Number of database connections to maintain
- **Considerations**:
- More connections allow more concurrent queries
- Each connection uses memory and database slot
- Database has max_connections limit (typically 100-500)
- **Tuning**:
- 1 service instance: 5-10 connections
- 5 service instances: 2-4 connections each (25-40 total)
- Kubernetes auto-scaling: 2-3 per pod (auto-scaled)
#### DATABASE_QUERY_TIMEOUT_MS
- **Type**: integer (milliseconds)
- **Default**: 10000 (10 seconds)
- **Range**: 1000-60000
- **Environment Variable**: `EXPORT_SERVICE_DATABASE_QUERY_TIMEOUT_MS`
- **Config File Key**: `database.query_timeout_ms`
- **Description**: Timeout for individual database queries
- **Considerations**:
- Export queries can take several seconds for large datasets
- If too short: Queries fail prematurely
- If too long: Failed queries block connection pool
- **Typical Values**:
- Simple queries: 5000ms
- Large exports: 30000ms
### Redis (Job Queue) Section
#### REDIS_URL
- **Type**: string (connection string)
- **Default**: None (required)
- **Environment Variable**: `EXPORT_SERVICE_REDIS_URL`
- **Config File Key**: `redis.url`
- **Format**: `redis://user:password@host:port/db`
- **Description**: Redis connection string for job queue
- **Examples**:
- Development: `redis://localhost:6379/0`
- Staging: `redis://redis-stage.example.com:6379/0`
- Production: `redis://redis-prod.example.com:6379/0` (managed ElastiCache)
- **Sensitive**: Yes (may contain credentials)
- **Required**: Yes
#### REDIS_MAX_RETRIES
- **Type**: integer
- **Default**: 3
- **Range**: 1-10
- **Environment Variable**: `EXPORT_SERVICE_REDIS_MAX_RETRIES`
- **Config File Key**: `redis.max_retries`
- **Description**: Maximum retry attempts for Redis operations
- **Considerations**:
- More retries provide resilience but increase latency on failure
- Should be 3-5 for production
- **Typical Values**: 3
#### CONCURRENT_WORKERS
- **Type**: integer
- **Default**: 3
- **Range**: 1-20
- **Environment Variable**: `EXPORT_SERVICE_CONCURRENT_WORKERS`
- **Config File Key**: `redis.concurrent_workers`
- **Description**: Number of concurrent export workers
- **Considerations**:
- Each worker processes one export job at a time
- More workers process jobs faster but use more resources
- Limited by CPU and memory available
- Kubernetes scales pods, not this setting
- **Tuning**:
- Development: 1-2 (for debugging)
  - Production with 2 CPUs: 2-3 workers
  - Production with 4+ CPUs: 4-8 workers
### Export Section
#### MAX_EXPORT_SIZE_MB
- **Type**: integer
- **Default**: 500
- **Range**: 10-5000
- **Environment Variable**: `EXPORT_SERVICE_MAX_EXPORT_SIZE_MB`
- **Config File Key**: `export.max_export_size_mb`
- **Description**: Maximum size for an export file (in MB)
- **Considerations**:
- Files larger than this are rejected
- Limited by disk space and memory
- Should match S3 bucket policies
- **Typical Values**:
- Small deployments: 100MB
- Standard: 500MB
- Enterprise: 1000-5000MB
#### EXPORT_TTL_DAYS
- **Type**: integer (days)
- **Default**: 7
- **Range**: 1-365
- **Environment Variable**: `EXPORT_SERVICE_EXPORT_TTL_DAYS`
- **Config File Key**: `export.ttl_days`
- **Description**: How long to retain export files after completion
- **Considerations**:
- Files deleted after TTL expires
- Affects storage costs (shorter TTL = lower cost)
- Users must download before expiration
- **Typical Values**:
- Short retention: 3 days (reduce storage cost)
- Standard: 7 days (reasonable download window)
- Long retention: 30 days (enterprise customers)
#### EXPORT_FORMATS
- **Type**: array of strings
- **Default**: ["csv", "json"]
- **Valid Values**: "csv", "json", "parquet"
- **Environment Variable**: `EXPORT_SERVICE_EXPORT_FORMATS` (comma-separated)
- **Config File Key**: `export.formats`
- **Description**: Supported export file formats
- **Examples**:
- `["csv", "json"]` (most common)
- `["csv", "json", "parquet"]` (full support)
- **Configuration**:
- Environment: `EXPORT_SERVICE_EXPORT_FORMATS=csv,json`
- File: `"formats": ["csv", "json"]`
#### COMPRESSION_ENABLED
- **Type**: boolean
- **Default**: true
- **Environment Variable**: `EXPORT_SERVICE_COMPRESSION_ENABLED`
- **Config File Key**: `export.compression_enabled`
- **Description**: Enable gzip compression for export files
- **Considerations**:
- Reduces file size by 60-80% typically
- Increases CPU usage during export
- Should be enabled unless CPU is bottleneck
- **Typical Value**: true
### Storage Section
#### S3_BUCKET
- **Type**: string
- **Default**: None (required)
- **Environment Variable**: `EXPORT_SERVICE_S3_BUCKET`
- **Config File Key**: `storage.s3_bucket`
- **Description**: AWS S3 bucket for storing export files
- **Format**: `bucket-name` (no s3:// prefix)
- **Examples**:
- Development: `export-service-dev`
- Staging: `export-service-stage`
- Production: `export-service-prod`
- **Required**: Yes
- **IAM Requirements**: Service role must have `s3:PutObject` and `s3:GetObject`; add `s3:DeleteObject` if the service itself (rather than an S3 lifecycle rule) deletes expired exports
#### S3_REGION
- **Type**: string
- **Default**: `us-east-1`
- **Valid Values**: Any AWS region (us-east-1, eu-west-1, etc.)
- **Environment Variable**: `EXPORT_SERVICE_S3_REGION`
- **Config File Key**: `storage.s3_region`
- **Description**: AWS region for S3 bucket
- **Examples**:
- us-east-1 (US East - Virginia)
- eu-west-1 (EU - Ireland)
### Logging Section
#### LOG_LEVEL
- **Type**: string (enum)
- **Default**: "info"
- **Valid Values**: "debug", "info", "warn", "error"
- **Environment Variable**: `EXPORT_SERVICE_LOG_LEVEL`
- **Config File Key**: `logging.level`
- **Description**: Logging verbosity level
- **Examples**:
- Development: "debug" (verbose, detailed logs)
- Staging: "info" (normal level)
- Production: "info" or "warn" (minimal logs, better performance)
- **Considerations**:
- debug: Very verbose, affects performance
- info: Standard operational logs
- warn: Only warnings and errors
- error: Only errors
#### LOG_FORMAT
- **Type**: string (enum)
- **Default**: "json"
- **Valid Values**: "json", "text"
- **Environment Variable**: `EXPORT_SERVICE_LOG_FORMAT`
- **Config File Key**: `logging.format`
- **Description**: Log output format
- **Examples**:
- json: Machine-parseable JSON logs (recommended for production)
- text: Human-readable text (good for development)
### Feature Flags Section
#### FEATURE_PARQUET_EXPORT
- **Type**: boolean
- **Default**: false
- **Environment Variable**: `EXPORT_SERVICE_FEATURE_PARQUET_EXPORT`
- **Config File Key**: `features.parquet_export`
- **Description**: Enable experimental Parquet export format
- **Considerations**:
- Set to false for stable deployments
- Set to true in staging for testing
- Disabled by default in production
- **Typical Values**:
- Development: true (test new feature)
- Staging: true (validate before production)
- Production: false (disabled until stable)
```
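Environment variables arrive as flat strings (e.g. `EXPORT_SERVICE_EXPORT_FORMATS=csv,json`), so the service needs a small typed-parsing layer between the environment and the settings described above. A minimal sketch of that idea — function names here are illustrative, not the service's actual API:

```python
VALID_FORMATS = {"csv", "json", "parquet"}

def parse_formats(raw: str) -> list[str]:
    """Split a comma-separated formats value and reject unknown entries."""
    formats = [item.strip() for item in raw.split(",") if item.strip()]
    unknown = set(formats) - VALID_FORMATS
    if unknown:
        raise ValueError(
            f"Unknown formats {sorted(unknown)}; valid values: {sorted(VALID_FORMATS)}"
        )
    return formats

def parse_bool(raw: str) -> bool:
    """Accept the documented boolean spellings: true/false, 'true'/'false', 1/0."""
    text = str(raw).strip().lower()
    if text in ("true", "1"):
        return True
    if text in ("false", "0"):
        return False
    raise ValueError(f"Not a boolean: {raw!r}")
```

In practice these would be called on values read from `os.environ` (e.g. `parse_formats(os.environ["EXPORT_SERVICE_EXPORT_FORMATS"])`) before the service starts accepting work.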
### Validation Rules Section
```markdown
## Validation & Constraints
### Required Fields
These fields must be provided (no default value):
- `DATABASE_URL` - PostgreSQL connection string required
- `REDIS_URL` - Redis connection required
- `S3_BUCKET` - S3 bucket must be specified
### Type Validation
- Integers: Must be valid numeric values
- Booleans: Accept true, false, "true", "false", 1, 0
- Strings: Must not be empty (unless explicitly optional)
- Arrays: Must be comma-separated in environment, JSON array in file
### Range Validation
- PORT: 1024-65535 (avoid system ports)
- POOL_SIZE: 1-100 (reasonable connection pool)
- TIMEOUT_MS: 5000-120000 (between 5 seconds and 2 minutes)
- MAX_EXPORT_SIZE_MB: 10-5000 (reasonable file sizes)
### Format Validation
- DATABASE_URL: Must be valid PostgreSQL connection string
- S3_BUCKET: Must follow S3 naming rules (lowercase, hyphens only)
- S3_REGION: Must be valid AWS region code
### Interdependency Rules
- If COMPRESSION_ENABLED=true: MAX_EXPORT_SIZE_MB can be larger, since compressed files consume 60-80% less storage
- If MAX_EXPORT_SIZE_MB > 100: DATABASE_QUERY_TIMEOUT_MS should be > 10000, since larger exports run longer queries
- If CONCURRENT_WORKERS > 5: memory requirements increase significantly, since each worker holds an export in flight
### Error Cases
What happens if validation fails:
- Service fails to start with validation error
- Specific field and reason for validation failure logged
- Error message includes valid range/values
```
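These rules translate directly into a fail-fast startup check. A minimal sketch, assuming settings have already been parsed into a flat dict — the names and ranges mirror the template above, but this is not the service's actual code:

```python
# Required fields with no default, and the documented valid ranges
REQUIRED = ("DATABASE_URL", "REDIS_URL", "S3_BUCKET")
RANGES = {
    "PORT": (1024, 65535),
    "POOL_SIZE": (1, 100),
    "TIMEOUT_MS": (5000, 120000),
    "MAX_EXPORT_SIZE_MB": (10, 5000),
}

def validate_config(config: dict) -> list[str]:
    """Return human-readable errors; an empty list means the config is valid."""
    errors = []
    for field in REQUIRED:
        if not config.get(field):
            errors.append(f"{field} is required and has no default")
    for field, (lo, hi) in RANGES.items():
        if field in config and not lo <= int(config[field]) <= hi:
            errors.append(f"{field}={config[field]} out of range (valid: {lo}-{hi})")
    return errors
```

At startup the service would call this and refuse to boot if the list is non-empty, logging each message — which gives operators the field, the bad value, and the valid range, as the error-case rules require.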
### Environment-Specific Configurations Section
````markdown
## Environment-Specific Configurations
### Development Environment
```json
{
  "server": {
    "port": 3000,
    "timeout_ms": 30000
  },
  "database": {
    "url": "postgresql://localhost/export_service",
    "pool_size": 5
  },
  "redis": {
    "url": "redis://localhost:6379/0",
    "concurrent_workers": 1
  },
  "export": {
    "max_export_size_mb": 100,
    "ttl_days": 7,
    "formats": ["csv", "json"]
  },
  "logging": {
    "level": "debug",
    "format": "text"
  },
  "features": {
    "parquet_export": false
  }
}
```
**Notes**:
- Runs locally with minimal resources
- Verbose logging for debugging
- Limited concurrent workers (1)
- Smaller max export size for testing
### Staging Environment
```bash
EXPORT_SERVICE_PORT=3000
EXPORT_SERVICE_DATABASE_URL=postgresql://stage-db.example.com/export_stage
EXPORT_SERVICE_REDIS_URL=redis://redis-stage.example.com:6379/0
EXPORT_SERVICE_S3_BUCKET=export-service-stage
EXPORT_SERVICE_S3_REGION=us-east-1
EXPORT_SERVICE_LOG_LEVEL=info
EXPORT_SERVICE_LOG_FORMAT=json
EXPORT_SERVICE_CONCURRENT_WORKERS=3
EXPORT_SERVICE_MAX_EXPORT_SIZE_MB=500
EXPORT_SERVICE_FEATURE_PARQUET_EXPORT=true
```
**Notes**:
- Tests new features before production
- Similar resources to production
- Parquet export enabled for testing
### Production Environment
```bash
EXPORT_SERVICE_PORT=3000
EXPORT_SERVICE_DATABASE_URL=<from AWS Secrets Manager>
EXPORT_SERVICE_REDIS_URL=<from AWS Secrets Manager>
EXPORT_SERVICE_S3_BUCKET=export-service-prod
EXPORT_SERVICE_S3_REGION=us-east-1
EXPORT_SERVICE_LOG_LEVEL=info
EXPORT_SERVICE_LOG_FORMAT=json
EXPORT_SERVICE_CONCURRENT_WORKERS=4
EXPORT_SERVICE_DATABASE_POOL_SIZE=3
EXPORT_SERVICE_MAX_EXPORT_SIZE_MB=500
EXPORT_SERVICE_EXPORT_TTL_DAYS=7
EXPORT_SERVICE_FEATURE_PARQUET_EXPORT=false
```
**Notes**:
- Credentials from secrets manager
- Optimized for performance and reliability
- Experimental features disabled
- Standard deployment settings
````
### Configuration Examples Section
````markdown
## Complete Configuration Examples
### Minimal Configuration (Development)
```bash
# Minimal settings needed to run locally
export EXPORT_SERVICE_DATABASE_URL=postgresql://localhost/export_service
export EXPORT_SERVICE_REDIS_URL=redis://localhost:6379/0
export EXPORT_SERVICE_S3_BUCKET=export-service-local
export EXPORT_SERVICE_S3_REGION=us-east-1
```
### High-Throughput Configuration (Production)
```bash
# Optimized for maximum throughput
export EXPORT_SERVICE_CONCURRENT_WORKERS=8
export EXPORT_SERVICE_DATABASE_POOL_SIZE=5
export EXPORT_SERVICE_MAX_EXPORT_SIZE_MB=1000
export EXPORT_SERVICE_COMPRESSION_ENABLED=true
export EXPORT_SERVICE_EXPORT_TTL_DAYS=30
```
### Low-Resource Configuration (Cost-Optimized)
```bash
# Minimizes resource usage and cost
export EXPORT_SERVICE_CONCURRENT_WORKERS=1
export EXPORT_SERVICE_DATABASE_POOL_SIZE=2
export EXPORT_SERVICE_MAX_EXPORT_SIZE_MB=100
export EXPORT_SERVICE_EXPORT_TTL_DAYS=1
export EXPORT_SERVICE_LOG_LEVEL=warn
```
````
### Secrets Management Section
````markdown
## Handling Sensitive Configuration
### Sensitive Fields
These fields contain credentials or sensitive information:
- DATABASE_URL (contains password)
- REDIS_URL (may contain password)
- AWS credentials (if not using IAM roles)
### Security Best Practices
1. **Never commit secrets to git**
   - Use .gitignore to exclude config files with secrets
   - Use environment variables instead
2. **Use Secrets Management**
   - AWS Secrets Manager (recommended for production)
   - HashiCorp Vault (for multi-team deployments)
   - Kubernetes Secrets (for K8s deployments)
3. **Rotate Credentials**
   - Rotate database passwords regularly
   - Rotate AWS API keys
   - Update service after rotation
4. **Limit Access**
   - Only operations team can see production credentials
   - Audit logs track who accessed what credentials
   - Use IAM roles instead of static credentials when possible
### Example: Using AWS Secrets Manager
```bash
# In Kubernetes deployment, inject from AWS Secrets Manager
DATABASE_URL=$(aws secretsmanager get-secret-value \
  --secret-id export-service/db-url \
  --query SecretString --output text)
export EXPORT_SERVICE_DATABASE_URL=$DATABASE_URL
```
````
## Writing Tips
### Be Clear About Scope
- What can users configure?
- What's fixed/non-configurable and why?
- What requires restart vs. hot reload?
### Provide Realistic Examples
- Show real values, not placeholders
- Include examples for different environments
- Show both correct and incorrect formats
### Document Trade-offs
- Why choose certain defaults?
- What's the impact of changing values?
- What happens if value is too high/low?
### Include Validation
- What values are valid?
- What happens if invalid values provided?
- How do users know if config is wrong?
### Think About Operations
- What configuration might ops teams want to change?
- What parameters help troubleshoot issues?
- What can be tuned for performance?
## Validation & Fixing Issues
### Run the Validator
```bash
scripts/validate-spec.sh docs/specs/configuration-schema/config-001-your-spec.md
```
### Common Issues & Fixes
**Issue**: "Configuration fields lack descriptions"
- **Fix**: Add purpose, examples, and impact for each field
**Issue**: "No validation rules documented"
- **Fix**: Document valid ranges, formats, required fields
**Issue**: "No environment-specific examples"
- **Fix**: Add configurations for dev, staging, and production
**Issue**: "Sensitive fields not highlighted"
- **Fix**: Clearly mark sensitive fields and document secrets management
## Decision-Making Framework
When designing configuration schema:
1. **Scope**: What should be configurable?
- Environment-specific settings?
- Performance tuning parameters?
- Feature flags?
- Operational settings?
2. **Defaults**: What are good default values?
- Production-safe defaults?
- Development-friendly for new users?
- Documented reasoning?
3. **Flexibility**: How much should users configure?
- Too much: Confusing, hard to troubleshoot
- Too little: Can't adapt to needs
- Right amount: Common use cases covered
4. **Safety**: How do we prevent misconfiguration?
- Validation rules?
- Error messages?
- Documentation of constraints?
5. **Evolution**: How will configuration change?
- Backward compatibility?
- Migration path for old configs?
- Deprecation timeline?
## Next Steps
1. **Create the spec**: `scripts/generate-spec.sh configuration-schema config-XXX-slug`
2. **List fields**: What can be configured?
3. **Document each field** with type, default, range, impact
4. **Provide examples** for different environments
5. **Document validation** rules and constraints
6. **Validate**: `scripts/validate-spec.sh docs/specs/configuration-schema/config-XXX-slug.md`
7. **Share with operations team** for feedback

# How to Create a Data Model Specification
Data Model specifications document the entities, fields, relationships, and constraints for your application's data. They define the "shape" of data your system works with.
## Quick Start
```bash
# 1. Create a new data model
scripts/generate-spec.sh data-model data-001-descriptive-slug
# 2. Open and fill in the file
# (The file will be created at: docs/specs/data-model/data-001-descriptive-slug.md)
# 3. Fill in entities and relationships, then validate:
scripts/validate-spec.sh docs/specs/data-model/data-001-descriptive-slug.md
# 4. Fix issues and check completeness:
scripts/check-completeness.sh docs/specs/data-model/data-001-descriptive-slug.md
```
## When to Write a Data Model
Use a Data Model when you need to:
- Define database schema for new features
- Document entity relationships and constraints
- Establish consistent naming conventions
- Enable API/UI teams to understand data structure
- Plan data migrations or refactoring
- Document complex data relationships
## Research Phase
### 1. Research Related Specifications
Find what this data model supports:
```bash
# Find technical requirements this fulfills
grep -r "prd\|technical" docs/specs/ --include="*.md"
# Find existing data models that might be related
grep -r "data\|model" docs/specs/ --include="*.md"
# Find API contracts that expose this data
grep -r "api\|endpoint" docs/specs/ --include="*.md"
```
### 2. Review Existing Data Models
- What data modeling patterns does your codebase use?
- What database are you using (PostgreSQL, MongoDB, etc.)?
- How are relationships currently modeled?
- Naming conventions for fields and entities?
- Any legacy schema patterns to respect or migrate from?
### 3. Research Domain Models
- How do industry-standard models structure similar data?
- Are there existing standards (e.g., ISO, RFC) you should follow?
- What are best practices in this domain?
### 4. Understand Business Rules
- What constraints must the data satisfy?
- What are the cardinality rules (one-to-many, many-to-many)?
- What data must be unique or required?
- What's the expected scale/volume?
## Structure & Content Guide
### Title & Metadata
- **Title**: "User Data Model", "Transaction Model", etc.
- **Scope**: What entities does this model cover?
- **Version**: 1.0 for new models
### Overview Section
Provide context:
```markdown
# User & Profile Data Model
This data model defines the core entities for user management and profile
information. Covers user accounts, authentication data, and user preferences.
**Entities**: User, UserProfile, UserPreference
**Relationships**: User → UserProfile (1:1), User → UserPreference (1:many)
**Primary Database**: PostgreSQL
```
### Entity Definitions Section
Document each entity/table:
#### Entity: User
```markdown
### User
Core user account entity. Every user must have exactly one User record.
**Purpose**: Represents a user account in the system.
**Fields**
| Field | Type | Required | Unique | Default | Description |
|-------|------|----------|--------|---------|-------------|
| id | UUID | Yes | Yes | auto | Primary key, auto-generated |
| email | String(255) | Yes | Yes | - | User's email address, used for login |
| password_hash | String(255) | Yes | No | - | Bcrypt hash of password (cost=12) |
| first_name | String(100) | No | No | - | User's first name |
| last_name | String(100) | No | No | - | User's last name |
| status | Enum | Yes | No | active | Account status: active, inactive, suspended |
| created_at | Timestamp | Yes | No | now() | Account creation time (UTC) |
| updated_at | Timestamp | Yes | No | now() | Last update time (UTC) |
| deleted_at | Timestamp | No | No | NULL | Soft-delete timestamp, NULL if active |
**Indexes**
- Primary key: `id`
- Unique: `email` (fast lookups at login)
- Secondary: `created_at` (for user listing/pagination)
- Secondary: `status` (for filtering active users)
**Constraints**
- Email format must be valid (enforced in application)
- Password must be at least 8 characters (enforced in application)
- Email must be globally unique
- Status can only be: active, inactive, suspended
**Data Volume**
- Expected growth: 100 new users/day
- Estimated year 1: ~36k users
- Estimated year 3: ~150k users
**Archival Strategy**
- Deleted users (deleted_at != NULL) moved to archive after 1 year
- Soft deletes used for data recovery capability
```
#### Entity: UserProfile
```markdown
### UserProfile
Extended user profile information. One-to-one relationship with User.
**Purpose**: Stores optional user profile information separate from core account.
**Fields**
| Field | Type | Required | Unique | Description |
|-------|------|----------|--------|-------------|
| id | UUID | Yes | Yes | Primary key |
| user_id | UUID (FK) | Yes | Yes | Foreign key to User.id |
| avatar_url | String(500) | No | No | URL to user's avatar image |
| bio | String(500) | No | No | User bio/description |
| phone | String(20) | No | Yes | User phone number |
| timezone | String(50) | No | No | User's timezone (e.g., America/New_York) |
| language | String(5) | No | No | Preferred language (ISO 639-1, e.g., en, fr) |
| theme | Enum | No | No | UI theme preference: light, dark, auto |
| created_at | Timestamp | Yes | No | Creation time |
| updated_at | Timestamp | Yes | No | Last update time |
**Indexes**
- Primary key: `id`
- Unique: `user_id` (enforces the 1:1 relationship)
**Constraints**
- Foreign key: user_id references User(id) ON DELETE CASCADE
- Phone must be valid format (if provided)
- Timezone must be valid (e.g., from IANA timezone database)
- Language must be valid ISO 639-1 code
- Theme must be one of: light, dark, auto
**Notes**
- Soft-deleted with parent User (CASCADE delete)
- Profile is optional - some users may not have profile data
```
#### Entity: UserPreference
```markdown
### UserPreference
Key-value preferences for users. Flexible schema for future preference types.
**Purpose**: Stores user preferences without requiring schema changes.
**Fields**
| Field | Type | Required | Unique | Description |
|-------|------|----------|--------|-------------|
| id | UUID | Yes | Yes | Primary key |
| user_id | UUID (FK) | Yes | No | Foreign key to User.id |
| preference_key | String(100) | Yes | No | Preference identifier (e.g., notifications_email) |
| preference_value | String(1000) | Yes | No | Preference value as string |
| created_at | Timestamp | Yes | No | Creation time |
| updated_at | Timestamp | Yes | No | Last update time |
**Indexes**
- Primary key: `id`
- Composite unique: `(user_id, preference_key)` - efficient single-preference lookup; its leftmost column also serves "all preferences for a user" queries
**Constraints**
- Foreign key: user_id references User(id) ON DELETE CASCADE
- Composite unique: `(user_id, preference_key)` - One preference per key per user
- preference_key must match pattern: `[a-z_]+` (lowercase letters and underscores only)
- preference_value must be valid JSON or simple string
**Valid Preferences**
Examples of preference_key values:
- `notifications_email` → "true"/"false"
- `notifications_sms` → "true"/"false"
- `export_format` → "csv"/"json"
- `ui_columns_per_page` → "20"/"50"/"100"
**Notes**
- Flexible key-value design allows adding preferences without schema changes
- Values stored as strings for flexibility, parsed by application layer
```
### Relationships Section
Document how entities relate:
````markdown
## Entity Relationships
```
┌───────────┐          ┌──────────────┐       ┌─────────────┐
│   User    │          │ UserProfile  │       │  UserPref   │
├───────────┤          ├──────────────┤       ├─────────────┤
│ id (PK)   │          │ id (PK)      │       │ id (PK)     │
│ email     │◄───1:1───│ user_id (FK) │       │ user_id(FK) │
│ ...       │          │ avatar_url   │       │ pref_key    │
└───────────┘          │ ...          │       │ pref_value  │
      ▲                └──────────────┘       └─────────────┘
      │                                              │
      └──────────────────1:many──────────────────────┘
```
### Relationship: User → UserProfile (1:1)
- **Type**: One-to-One
- **Foreign Key**: UserProfile.user_id → User.id
- **Cardinality**: A User has at most one UserProfile; a UserProfile belongs to exactly one User
- **Delete Behavior**: CASCADE - Deleting User deletes UserProfile
- **Optional**: UserProfile is optional (some users may not have detailed profile)
### Relationship: User → UserPreference (1:many)
- **Type**: One-to-Many
- **Foreign Key**: UserPreference.user_id → User.id
- **Cardinality**: A User can have many UserPreferences; each UserPreference belongs to one User
- **Delete Behavior**: CASCADE - Deleting User deletes all preferences
- **Optional**: A User can have zero preferences
````
### Constraints & Validation Section
```markdown
## Data Constraints & Validation
### Business Logic Constraints
- Users cannot have duplicate emails (enforced at database + application)
- User phone numbers must be unique if provided
- Email is required on User, so every account retains at least one contact method even when profile fields are empty
### Data Integrity Rules
- password_hash must never be exposed in API responses
- deleted_at cannot be set retroactively (only forward through time)
- updated_at must be >= created_at
### Referential Integrity
- Foreign key constraints enforced at database level
- Cascade deletes on User deletion
- No orphaned UserProfile or UserPreference records
### Enumeration Values
**User.status**
- `active` - Account is active
- `inactive` - Account temporarily inactive
- `suspended` - Account suspended (admin action)
**UserProfile.theme**
- `light` - Light theme
- `dark` - Dark theme
- `auto` - Follow system settings
**UserPreference.preference_key**
- Must match pattern: `[a-z_]+`
- Examples: `notifications_email`, `export_format`, `ui_language`
```
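The `preference_key` pattern above can be enforced with a one-line check in the application layer. A sketch:

```python
import re

# Must match the documented pattern: lowercase letters and underscores only
KEY_PATTERN = re.compile(r"[a-z_]+")

def is_valid_preference_key(key: str) -> bool:
    """True if the key consists solely of lowercase letters and underscores."""
    return bool(KEY_PATTERN.fullmatch(key))
```

This rejects mixed case, hyphens, and empty strings, matching the constraint on `UserPreference.preference_key`.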
### Scaling Considerations Section
```markdown
## Scaling & Performance
### Expected Data Volume
- Users: 100-1000 per day growth
- Preferences: ~5-10 per user on average
- Year 1 estimate: 36k users, ~180k preference records
### Table Sizes
- User table: ~36MB (estimated year 1)
- UserProfile table: ~28MB
- UserPreference table: ~22MB
### Query Patterns & Indexes
- Find user by email: Indexed (UNIQUE index on email)
- Find all preferences for user: Indexed (composite on user_id, pref_key)
- List users by creation date: Indexed (on created_at)
- Filter users by status: Indexed (on status)
### Optimization Notes
- Composite index `(user_id, preference_key)` enables efficient preference lookups
- Email index enables fast login queries
- Consider partitioning UserPreference by user_id for very large scale (100M+ records)
```
### Migration & Change Management Section
````markdown
## Schema Evolution
### Creating These Tables
```sql
-- "user" is a reserved word in PostgreSQL, so the table name must be quoted
-- (many teams simply name the table "users" instead)
CREATE TABLE "user" (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    email VARCHAR(255) UNIQUE NOT NULL,
    password_hash VARCHAR(255) NOT NULL,
    first_name VARCHAR(100),
    last_name VARCHAR(100),
    status VARCHAR(50) DEFAULT 'active' NOT NULL
        CHECK (status IN ('active', 'inactive', 'suspended')),
    created_at TIMESTAMP DEFAULT now() NOT NULL,
    updated_at TIMESTAMP DEFAULT now() NOT NULL,
    deleted_at TIMESTAMP
);
CREATE TABLE user_profile (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID UNIQUE NOT NULL REFERENCES "user"(id) ON DELETE CASCADE,
    avatar_url VARCHAR(500),
    bio VARCHAR(500),
    phone VARCHAR(20) UNIQUE,
    timezone VARCHAR(50),
    language VARCHAR(5),
    theme VARCHAR(20) CHECK (theme IN ('light', 'dark', 'auto')),
    created_at TIMESTAMP DEFAULT now() NOT NULL,
    updated_at TIMESTAMP DEFAULT now() NOT NULL
);
CREATE TABLE user_preference (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL REFERENCES "user"(id) ON DELETE CASCADE,
    preference_key VARCHAR(100) NOT NULL,
    preference_value VARCHAR(1000) NOT NULL,
    created_at TIMESTAMP DEFAULT now() NOT NULL,
    updated_at TIMESTAMP DEFAULT now() NOT NULL,
    UNIQUE (user_id, preference_key)
);
-- The UNIQUE constraints on email and (user_id, preference_key) already
-- create indexes, so only the remaining lookup columns need explicit ones
CREATE INDEX idx_user_created_at ON "user"(created_at);
CREATE INDEX idx_user_status ON "user"(status);
```
### Future Migrations
- Q2 2024: Add `last_login_at` to User (nullable, new index)
- Q3 2024: Implement user archival (age > 1 year, no activity)
````
### Documentation & Examples Section
````markdown
## Example Queries
### Find user by email
```sql
SELECT * FROM "user" WHERE email = 'user@example.com';
```
### Get user with profile
```sql
SELECT u.*, p.*
FROM "user" u
LEFT JOIN user_profile p ON u.id = p.user_id
WHERE u.id = $1;
```
### Get user's preferences
```sql
SELECT preference_key, preference_value
FROM user_preference
WHERE user_id = $1
ORDER BY created_at DESC;
```
### Archive old inactive users
```sql
UPDATE "user"
SET deleted_at = now()
WHERE status = 'inactive' AND updated_at < now() - interval '1 year'
  AND deleted_at IS NULL;
```
````
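Queries like these can be smoke-tested without a live PostgreSQL instance. A sketch using Python's built-in `sqlite3` with a pared-down schema — illustration only, since SQLite lacks UUID columns and, unlike PostgreSQL, does not reserve the word `user`:

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (
    id TEXT PRIMARY KEY,
    email TEXT UNIQUE NOT NULL
);
CREATE TABLE user_preference (
    id TEXT PRIMARY KEY,
    user_id TEXT NOT NULL REFERENCES user(id) ON DELETE CASCADE,
    preference_key TEXT NOT NULL,
    preference_value TEXT NOT NULL,
    UNIQUE (user_id, preference_key)
);
""")

user_id = str(uuid.uuid4())
conn.execute("INSERT INTO user VALUES (?, ?)", (user_id, "user@example.com"))
conn.execute(
    "INSERT INTO user_preference VALUES (?, ?, ?, ?)",
    (str(uuid.uuid4()), user_id, "notifications_email", "true"),
)

# The "Get user's preferences" query shape
rows = conn.execute(
    "SELECT preference_key, preference_value FROM user_preference WHERE user_id = ?",
    (user_id,),
).fetchall()
print(rows)  # [('notifications_email', 'true')]

# The composite UNIQUE constraint rejects a second row for the same key
try:
    conn.execute(
        "INSERT INTO user_preference VALUES (?, ?, ?, ?)",
        (str(uuid.uuid4()), user_id, "notifications_email", "false"),
    )
except sqlite3.IntegrityError as exc:
    print("duplicate rejected:", exc)
```

This kind of throwaway harness is useful for checking constraint behavior (like the one-preference-per-key rule) before writing a real migration.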
## Writing Tips
### Document Constraints Clearly
- Why does each field have the constraints it does?
- What validation rules apply?
- What happens on constraint violations?
### Think About Scale
- How much data will this table store?
- What are the growth projections?
- What indexing strategy is needed?
- Will partitioning be needed in the future?
### Link to Related Specs
- Reference technical requirements: `[PRD-001]`
- Reference API contracts: `[API-001]` (what data is exposed)
- Reference design documents: `[DES-001]`
### Include Examples
- Sample SQL for common queries
- Sample JSON representations
- Example migration scripts
### Document Change Constraints
- What fields can't change after creation?
- What fields are immutable?
- How do we handle schema evolution?
## Validation & Fixing Issues
### Run the Validator
```bash
scripts/validate-spec.sh docs/specs/data-model/data-001-your-spec.md
```
### Common Issues & Fixes
**Issue**: "Missing entity field specifications"
- **Fix**: Complete the fields table for each entity with types, constraints, descriptions
**Issue**: "No relationships documented"
- **Fix**: Add a relationships section showing foreign keys and cardinality
**Issue**: "TODO items in Constraints (3 items)"
- **Fix**: Complete constraint definitions, validation rules, and enumeration values
**Issue**: "No scaling or performance information"
- **Fix**: Add data volume estimates, indexing strategy, and optimization notes
## Decision-Making Framework
As you write the data model, consider:
1. **Entity Design**: What entities do we need?
- What are distinct concepts?
- What are attributes vs. relationships?
- Should data be normalized or denormalized?
2. **Relationships**: How do entities relate?
- One-to-one, one-to-many, many-to-many?
- Should relationships be required or optional?
- How should deletions cascade?
3. **Constraints**: What rules must data satisfy?
- Uniqueness constraints?
- Required fields?
- Data type restrictions?
- Enumeration values?
4. **Performance**: How will data be queried?
- What indexes are needed?
- What's the expected scale?
- Are there bottlenecks?
5. **Evolution**: How will this model change?
- Can we add fields without migrations?
- Can we add entities without breaking things?
- How do we handle data migrations?
## Next Steps
1. **Create the spec**: `scripts/generate-spec.sh data-model data-XXX-slug`
2. **Define entities**: What are the main entities/tables?
3. **Specify fields** with types, constraints, descriptions
4. **Document relationships** between entities
5. **Plan indexes** for performance
6. **Validate**: `scripts/validate-spec.sh docs/specs/data-model/data-XXX-slug.md`
7. **Share with team** before implementation

# How to Create a Deployment Procedure Specification
Deployment procedures document step-by-step instructions for deploying systems to production, including prerequisites, procedures, rollback, and troubleshooting.
## Quick Start
```bash
# 1. Create a new deployment procedure
scripts/generate-spec.sh deployment-procedure deploy-001-descriptive-slug
# 2. Open and fill in the file
# (The file will be created at: docs/specs/deployment-procedure/deploy-001-descriptive-slug.md)
# 3. Fill in steps and checklists, then validate:
scripts/validate-spec.sh docs/specs/deployment-procedure/deploy-001-descriptive-slug.md
# 4. Fix issues and check completeness:
scripts/check-completeness.sh docs/specs/deployment-procedure/deploy-001-descriptive-slug.md
```
## When to Write a Deployment Procedure
Use a Deployment Procedure when you need to:
- Document how to deploy a new service or component
- Ensure consistent, repeatable deployments
- Provide runbooks for operations teams
- Document rollback procedures for failures
- Enable any team member to deploy safely
- Create an audit trail of deployments
## Research Phase
### 1. Research Related Specifications
Find what you're deploying:
```bash
# Find component specs
grep -r "component" docs/specs/ --include="*.md"
# Find design documents that mention infrastructure
grep -r "design\|infrastructure" docs/specs/ --include="*.md"
# Find existing deployment procedures
grep -r "deploy" docs/specs/ --include="*.md"
```
### 2. Understand Your Infrastructure
- What's the deployment target? (Kubernetes, serverless, VMs)
- What infrastructure does this component need?
- What access/permissions are required?
- What monitoring must be in place?
### 3. Review Past Deployments
- How have similar components been deployed?
- What issues arose? How were they resolved?
- What worked well? What didn't?
- Any patterns or templates to follow?
## Structure & Content Guide
### Title & Metadata
- **Title**: "Export Service Deployment to Production", "Database Migration", etc.
- **Component**: What's being deployed
- **Target**: Production, staging, canary, etc.
- **Owner**: Team responsible for deployment
### Prerequisites Section
Document what must be done before deployment:
```markdown
# Export Service Production Deployment
## Prerequisites
### Infrastructure Requirements
- [ ] AWS resources provisioned (see [CMP-001] for details)
- [ ] ElastiCache Redis cluster (export-service-queue)
- [ ] RDS PostgreSQL instance (export-db)
- [ ] S3 bucket (export-files-prod)
- [ ] IAM roles and policies configured
- [ ] Kubernetes cluster accessible
- [ ] kubectl configured with production cluster context
- [ ] Deployment manifests reviewed by tech lead
- [ ] Namespace `export-service-prod` created
### Code & Build Requirements
- [ ] All code merged to main branch
- [ ] Code reviewed by 2+ senior engineers
- [ ] All tests passing
- [ ] Unit tests (90%+ coverage)
- [ ] Integration tests
- [ ] Load tests pass at target throughput
- [ ] Docker image built and pushed to ECR
- [ ] Image tagged with version (e.g., v1.2.3)
- [ ] Image scanned for vulnerabilities
- [ ] Image verified to work (manual test in staging)
### Team & Access Requirements
- [ ] Deployment lead identified (typically tech lead or on-call eng)
- [ ] Access verified for:
- [ ] AWS console (ECR, S3, CloudWatch)
- [ ] Kubernetes cluster (kubectl access)
- [ ] Database (for running migrations if needed)
- [ ] Monitoring/alerting system (Grafana, PagerDuty)
- [ ] Communication channel open (Slack, war room)
- [ ] Runbook reviewed by both eng and ops team
### Pre-Deployment Verification Checklist
- [ ] Staging deployment successful (deployed 24+ hours ago, stable)
- [ ] Monitoring in place and verified working
- [ ] Rollback plan reviewed and tested
- [ ] Emergency contacts identified
- [ ] Stakeholders notified of deployment window
- [ ] Change log prepared (what's new in this version)
### Data/Database Requirements
- [ ] Database schema compatible with new version
- [ ] Backward compatible (no breaking changes)
- [ ] Migrations tested in staging
- [ ] Rollback plan for migrations documented
- [ ] No data conflicts or corruption risks
- [ ] Backup created (if applicable)
### Approval Checklist
- [ ] Tech Lead: Code and approach approved
- [ ] Product Owner: Feature approved, ready for launch
- [ ] Operations Lead: Deployment plan reviewed
- [ ] Security: Security review passed (if applicable)
```
### Deployment Steps Section
Provide step-by-step instructions:
```markdown
## Deployment Procedure
### Pre-Deployment (Validation Phase)
**Step 1: Verify Prerequisites**
- Command: Run pre-deployment checklist above
- Verify: All items checked ✓
- If any fail: Stop deployment, resolve issues
- Time: ~15 minutes
**Step 2: Create Deployment Record**
- Document: Who is deploying, when, what version
- Command: Log in to deployment tracking system
- Entry:
```
Deployment: export-service
Version: v1.2.3
Environment: production
Deployed By: Alice Smith
Time: 2024-01-15 14:30 UTC
Change Summary: Added bulk export feature, fixed queue processing
```
- Time: ~5 minutes
### Deployment Phase
**Step 3: Tag Database Migration (if applicable)**
- Check: Are there schema changes in this version?
- If YES:
```bash
# SSH to database server
ssh -i ~/.ssh/prod.pem admin@db.example.com
# Run migrations
psql -U export_service -d export_service -c \
"ALTER TABLE exports ADD COLUMN retry_count INT DEFAULT 0;"
# Verify migration
psql -U export_service -d export_service -c \
"SELECT column_name FROM information_schema.columns WHERE table_name='exports';"
```
- If NO: Skip this step
- Verify: All migrations complete without errors
- Time: ~10 minutes
**Step 4: Deploy to Kubernetes**
- Verify: You're deploying to PRODUCTION cluster
```bash
kubectl config current-context
# Should output: arn:aws:eks:us-east-1:123456789:cluster/prod
```
- If wrong context: STOP, switch to correct cluster
- Deploy new image version:
```bash
# Update deployment with new image
kubectl set image deployment/export-service \
export-service=123456789.dkr.ecr.us-east-1.amazonaws.com/export-service:v1.2.3 \
-n export-service-prod
```
- Verify: Deployment triggered
```bash
kubectl rollout status deployment/export-service -n export-service-prod
```
- Wait: For all pods to become ready (typically 2-3 minutes)
- Output should show: `deployment "export-service" successfully rolled out`
- Time: ~5 minutes
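The context check can be enforced by script rather than by eye. A sketch (the function name and expected-context string are assumptions):
```bash
# Hypothetical guard: abort unless kubectl points at the expected cluster.
require_context() {
  local expected="$1" current
  current=$(kubectl config current-context)
  if [ "$current" != "$expected" ]; then
    echo "wrong context: $current (expected $expected)" >&2
    return 1
  fi
}
```
Call it at the top of any deploy script so a wrong context fails fast instead of relying on a human reading the output.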
**Step 5: Verify Deployment Health**
- Check: Pod status
```bash
kubectl get pods -n export-service-prod
```
- All pods should show `Running` status
- If any show `CrashLoopBackOff`: Stop deployment, investigate
- Check: Service endpoints
```bash
kubectl get svc export-service -n export-service-prod
```
- Should show external IP/load balancer endpoint
- Check: Logs for errors
```bash
kubectl logs -n export-service-prod -l app=export-service --tail=50
```
- Should show startup logs, no ERROR level messages
- If errors present: Proceed to the Rollback Procedure below
- Check: Health endpoints
```bash
curl https://api.example.com/health
```
- Should return 200 OK
- If not: Service may still be starting (wait 30s and retry)
- Time: ~5 minutes
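The "wait 30s and retry" advice automates naturally into a retry loop. A sketch (function name, retry count, and delay defaults are assumptions):
```bash
# Hypothetical retry loop: polls the health endpoint until it returns 200.
wait_for_health() {
  local url="$1" retries="${2:-10}" delay="${3:-30}" i status
  for ((i = 1; i <= retries; i++)); do
    status=$(curl -s -o /dev/null -w '%{http_code}' "$url" || echo 000)
    if [ "$status" = "200" ]; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
  done
  echo "service did not become healthy after $retries attempts" >&2
  return 1
}
```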
### Post-Deployment (Verification Phase)
**Step 6: Monitor Metrics**
- Open: Grafana dashboard for export-service
- Check: Key metrics for 5 minutes
- Request latency: Should be stable (< 100ms p95)
- Error rate: Should remain < 0.1%
- CPU/Memory: Should be within normal ranges
- Queue depth: Should process jobs smoothly
- Look for: Any sudden spikes or anomalies
- If anomalies: Proceed to the Rollback Procedure below
- Time: ~5 minutes
**Step 7: Functional Testing**
- Manual test: Create export via API
```bash
curl -X POST https://api.example.com/exports \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"format": "csv",
"data_types": ["users"]
}'
```
- Response: Should return 201 Created with export_id
- Check status:
```bash
curl https://api.example.com/exports/{export_id} \
-H "Authorization: Bearer $TOKEN"
```
- Verify: Status transitions from queued → processing → completed
- Download: Successfully download export file
- Verify: File contents correct
- If any step fails: Proceed to the Rollback Procedure below
- Time: ~5 minutes
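Watching the queued → processing → completed transition by hand is error-prone; it can be polled instead. A sketch (the function name and the string-matching status extraction are assumptions — a real script might use jq):
```bash
# Hypothetical poller: waits until the export reports "completed" or gives up.
wait_for_export() {
  local export_id="$1" attempts="${2:-30}" delay="${3:-10}" i body
  for ((i = 1; i <= attempts; i++)); do
    body=$(curl -s "https://api.example.com/exports/$export_id" \
      -H "Authorization: Bearer $TOKEN")
    case "$body" in
      *'"status": "completed"'*|*'"status":"completed"'*)
        echo completed; return 0 ;;
      *'"status": "failed"'*|*'"status":"failed"'*)
        echo failed >&2; return 1 ;;
    esac
    sleep "$delay"
  done
  echo timeout >&2
  return 1
}
```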
**Step 8: Notify Stakeholders**
- Update: Deployment tracking system
```
Status: DEPLOYED
Completion Time: 14:45 UTC
Health: ✓ All checks passed
Metrics: ✓ Stable
Functional Tests: ✓ Passed
```
- Announce: Slack to #product-eng
```
@channel Export Service v1.2.3 deployed to production.
New feature: Bulk data exports now available.
Status: Monitoring.
```
- Notify: On-call engineer (monitoring for 2 hours post-deployment)
### Rollback Procedure (If Issues Found)
**Step 9: Rollback (Only if Step 6 or 7 Fails)**
- Decision: Is deployment safe to continue?
- YES → All checks pass, monitoring is good → Release complete
- NO → Issues found → Proceed with rollback
- Execute rollback:
```bash
# Revert to previous version
kubectl rollout undo deployment/export-service -n export-service-prod
# Verify rollback in progress
kubectl rollout status deployment/export-service -n export-service-prod
# Wait for rollback to complete
```
- Verify rollback successful:
```bash
# Check current image
kubectl describe deployment export-service -n export-service-prod | grep Image
# Should show previous version (e.g., v1.2.2)
# Verify service responding
curl https://api.example.com/health
```
- Notify: Update stakeholders
```
@channel Deployment rolled back due to [specific reason].
Current version: v1.2.2 (stable)
Investigating issue. Will retry deployment tomorrow.
```
- Document: Root cause analysis
- What went wrong?
- Why wasn't it caught in staging?
- How do we prevent this next time?
- Time: ~10 minutes
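Confirming the rollback landed on the intended version can also be scripted. A sketch (the Image-line parsing follows `kubectl describe` output; the function name and expected tag are assumptions):
```bash
# Hypothetical check: confirm the deployment now runs the expected image tag.
verify_rollback() {
  local expected_tag="$1" image
  image=$(kubectl describe deployment export-service -n export-service-prod \
    | grep 'Image:' | awk '{print $2}')
  case "$image" in
    *:"$expected_tag") echo "rollback OK: $image"; return 0 ;;
    *) echo "unexpected image: $image (wanted tag $expected_tag)" >&2; return 1 ;;
  esac
}
```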
```
### Success Criteria Section
```markdown
## Deployment Success Criteria
The deployment is successful if ALL of these are true:
### Technical Criteria
- [ ] All pods running and healthy (0 CrashLoopBackOff)
- [ ] Service responding to health checks (200 OK)
- [ ] Metrics showing normal values (no spikes)
- [ ] Error rate < 0.1% (< 1 error per 1000 requests)
- [ ] Response latency p95 < 100ms
- [ ] No errors in application logs
### Functional Criteria
- [ ] Export API responds to requests
- [ ] Export jobs queue successfully
- [ ] Jobs process and complete
- [ ] Files upload to S3 correctly
- [ ] Users can download exported files
- [ ] File contents verified correct
### Operational Criteria
- [ ] Monitoring active and receiving metrics
- [ ] Alerting working (test alert fired)
- [ ] Logs aggregated and searchable
- [ ] Runbook tested and functional
- [ ] Team confident in operating system
```
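The numeric criteria above lend themselves to an automated gate. A sketch (the function name is hypothetical; in practice the inputs would come from your monitoring system's API):
```bash
# Hypothetical gate: pass/fail a deployment from observed error rate and p95 latency.
# Thresholds mirror the criteria above: error rate < 0.1%, p95 < 100 ms.
check_success_criteria() {
  local error_rate_pct="$1" p95_ms="$2"
  awk -v err="$error_rate_pct" -v p95="$p95_ms" 'BEGIN {
    if (err < 0.1 && p95 < 100) { print "PASS"; exit 0 }
    print "FAIL"; exit 1
  }'
}
```
A CI job could call this with values pulled from the metrics backend before declaring the release complete.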
### Monitoring & Alerting Section
```markdown
## Monitoring Setup
### Critical Alerts (Page on-call)
- Service down (health check fails)
- Error rate > 1% for 5 minutes
- Response latency p95 > 500ms for 5 minutes
- Queue depth > 1000 for 10 minutes
### Warning Alerts (Slack notification)
- Error rate > 0.5% for 5 minutes
- CPU > 80% for 10 minutes
- Memory > 85% for 10 minutes
- Export job timeout increasing
### Dashboard
- Service: export-service-prod
- Metrics: Latency, errors, throughput, queue depth
- Time range: Last 24 hours by default
- Alerts: Show current alert status
```
### Troubleshooting Section
```markdown
## Troubleshooting Common Issues
### Issue: Pods stuck in CrashLoopBackOff
**Symptoms**: Pods repeatedly crash and restart
**Diagnosis**:
```bash
# Check logs for errors
kubectl logs <pod-name> -n export-service-prod
```
**Common Causes**:
- Configuration error (check environment variables)
- Database connection failed (check credentials)
- Out of memory (check resource limits)
**Fix**: Review logs, check prerequisites, rollback if unclear
### Issue: Response latency spiking
**Symptoms**: p95 latency > 200ms, users report slow exports
**Diagnosis**:
```bash
# Check queue depth
kubectl exec -it <worker-pod> -n export-service-prod \
-- redis-cli -h redis.example.com LLEN export-queue
```
**Common Causes**:
- Too many concurrent exports (queue backlog)
- Database slow (check queries, indexes)
- Network issues (check connectivity)
**Fix**: Scale up workers, check database performance, verify network
### Issue: Export jobs failing
**Symptoms**: Job status shows `failed`, users can't export
**Diagnosis**:
```bash
# Check worker logs
kubectl logs -n export-service-prod -l app=export-service
```
**Common Causes**:
- S3 upload failing (check permissions, bucket exists)
- Database query error (schema mismatch)
- User doesn't have data to export
**Fix**: Review logs, verify S3 access, check schema version
### Issue: Database migration failed
**Symptoms**: Service won't start after deployment
**Diagnosis**:
```bash
# Check migration logs
psql -U export_service -d export_service -c \
"SELECT * FROM schema_migrations ORDER BY version DESC LIMIT 5;"
```
**Recovery**:
1. Identify failed migration
2. Rollback deployment (revert to previous version)
3. Debug migration issue in staging
4. Retry deployment after fix
```
### Post-Deployment Actions Section
```markdown
## After Deployment
### Immediate (Next 2 hours)
- [ ] On-call engineer monitoring
- [ ] Check metrics every 15 minutes
- [ ] Monitor error rate and latency
- [ ] Watch for user-reported issues in #support
### Short-term (Next 24 hours)
- [ ] Review deployment metrics
- [ ] Collect feedback from users
- [ ] Document any issues encountered
- [ ] Update runbook if needed
### Follow-up (Next week)
- [ ] Post-mortem if issues occurred
- [ ] Update deployment procedure based on lessons learned
- [ ] Plan performance improvements if needed
- [ ] Update documentation if system behavior changed
```
## Writing Tips
### Be Precise and Detailed
- Exact commands to run (copy-paste ready)
- Specific values (versions, endpoints, timeouts)
- Expected outputs for verification
- Time estimates for each step
### Think About Edge Cases
- What if something is already deployed?
- What if a prerequisite is missing?
- What if deployment partially succeeds?
- What if rollback is needed?
### Make Rollback Easy
- Document rollback procedure clearly
- Test rollback before using in production
- Make rollback faster than forward deployment
- Have quick communication plan for failures
### Document Monitoring
- What metrics indicate health?
- What should we watch during deployment?
- What thresholds trigger alerts?
- How do we validate success?
### Link to Related Specs
- Reference component specs: `[CMP-001]`
- Reference design documents: `[DES-001]`
- Reference operations runbooks
## Validation & Fixing Issues
### Run the Validator
```bash
scripts/validate-spec.sh docs/specs/deployment-procedure/deploy-001-your-spec.md
```
### Common Issues & Fixes
**Issue**: "Prerequisites section incomplete"
- **Fix**: Add all required infrastructure, code, access, and approvals
**Issue**: "Step-by-step procedures lack detail"
- **Fix**: Add actual commands, expected output, time estimates
**Issue**: "No rollback procedure"
- **Fix**: Document how to revert deployment if issues arise
**Issue**: "Monitoring and troubleshooting missing"
- **Fix**: Add success criteria, monitoring setup, and troubleshooting guide
## Decision-Making Framework
When writing a deployment procedure:
1. **Prerequisites**: What must be true before we start?
- Infrastructure ready?
- Code reviewed and tested?
- Team trained?
- Approvals obtained?
2. **Procedure**: What are the exact steps?
- Simple, repeatable steps?
- Verification at each step?
- Estimated timing?
3. **Safety**: How do we prevent/catch issues?
- Verification steps after each phase?
- Rollback procedure?
- Quick failure detection?
4. **Communication**: Who needs to know what?
- Stakeholders notified?
- On-call monitoring?
- Escalation path?
5. **Learning**: How do we improve next time?
- Monitoring enabled?
- Runbook updated?
- Issues documented?
## Next Steps
1. **Create the spec**: `scripts/generate-spec.sh deployment-procedure deploy-XXX-slug`
2. **Research**: Find component specs and existing procedures
3. **Document prerequisites**: What must be true before deployment?
4. **Write procedures**: Step-by-step, with commands and verification
5. **Plan rollback**: How do we undo this if needed?
6. **Validate**: `scripts/validate-spec.sh docs/specs/deployment-procedure/deploy-XXX-slug.md`
7. **Test procedure**: Walk through it in staging environment
8. **Get team review** before using in production

# How to Create a Design Document
Design Documents provide the detailed architectural and technical design for a system, component, or significant feature. They answer "How will we build this?" after business and technical requirements have been defined.
## Quick Start
```bash
# 1. Create a new design document
scripts/generate-spec.sh design-document des-001-descriptive-slug
# 2. Open and fill in the file
# (The file will be created at: docs/specs/design-document/des-001-descriptive-slug.md)
# 3. Fill in the sections, then validate:
scripts/validate-spec.sh docs/specs/design-document/des-001-descriptive-slug.md
# 4. Fix issues and check completeness:
scripts/check-completeness.sh docs/specs/design-document/des-001-descriptive-slug.md
```
## When to Write a Design Document
Use a Design Document when you need to:
- Define system architecture or redesign existing components
- Document major technical decisions and trade-offs
- Provide a blueprint for implementation teams
- Enable architectural review before coding begins
- Create shared understanding of complex systems
## Research Phase
### 1. Research Related Specifications
Find upstream specs that inform your design:
```bash
# Find related business requirements
grep -r "brd" docs/specs/ --include="*.md"
# Find related technical requirements
grep -r "prd\|technical" docs/specs/ --include="*.md"
# Find existing design patterns or similar designs
grep -r "design\|architecture" docs/specs/ --include="*.md"
```
### 2. Research External Documentation
Research existing architectures and patterns:
- Look up similar systems: "How do other companies solve this problem?"
- Research technologies and frameworks you're planning to use
- Review relevant design patterns or architecture styles
- Check for security, performance, or scalability best practices
Use tools to fetch external docs:
```bash
# Research the latest on your chosen technologies
# Example: Research distributed system patterns
# Example: Research microservices architecture best practices
```
### 3. Review Existing Codebase & Architecture
- What patterns does your codebase already follow?
- What technologies are you already using?
- How are similar features currently implemented?
- What architectural decisions have been made previously?
Ask: "Are we extending existing patterns or introducing new ones?"
### 4. Understand Constraints
- What are the performance requirements? (latency, throughput)
- What scalability targets exist?
- What security constraints apply?
- What infrastructure/budget constraints?
- Team expertise with chosen technologies?
## Structure & Content Guide
### Title & Metadata
- **Title**: "Microservices Architecture for User Service" or similar
- **Type**: Architecture | System Design | RFC | Technology Choice
- **Status**: Draft | Under Review | Accepted | Rejected
- **Version**: 1.0.0 (increment for significant revisions)
### Executive Summary
Write 3-4 sentences that answer:
- What problem does this solve?
- What's the proposed solution?
- What are the key tradeoffs?
Example:
```
This design proposes a microservices architecture to scale our user service.
We'll split user management, authentication, and profile service into separate
deployable services. This trades some operational complexity for independent
scaling and development velocity. Key trade-off: eventual consistency vs.
immediate consistency in cross-service operations.
```
### Problem Statement
Describe the current state and limitations:
```
Current monolithic architecture handles all user operations in a single service,
causing:
- Bottleneck: User service becomes bottleneck for entire system
- Scaling: Must scale entire service even if only auth needs capacity
- Deployment: Changes in one area risk entire user service
- Velocity: Teams block each other during development
This design solves these issues by enabling independent scaling and deployment.
```
### Goals & Success Criteria
**Primary Goals** (3-5 goals)
- Reduce deployment frequency to enable multiple daily deployments
- Enable independent scaling of auth and profile services
- Reduce time to market for new user features
**Success Criteria** (specific, measurable)
1. Auth service can scale independently to handle 10k requests/sec
2. Profile service deployment doesn't impact auth service
3. System reduces MTTR for user service incidents by 50%
4. Teams can deploy independently without coordination
5. P95 latency remains under 200ms across service boundaries
### Context & Background
Explain why this design is needed now:
```
Over the past 6 months, we've experienced:
- Auth service saturated at 5k requests/sec during peak hours
- Authentication changes blocked by profile service deployments
- High operational burden managing single monolithic service
Recent customer requests for higher throughput have revealed these bottlenecks.
This design addresses the most urgent scaling constraint (auth service).
```
### Proposed Solution
#### High-Level Overview
Provide a diagram showing major components and data flow:
```
┌─────────────┐
│ Client │
└──────┬──────┘
├─→ [API Gateway]
│ │
├─→ [Auth Service] - JWT validation, user login
├─→ [Profile Service] - User profile, preferences
└─→ [Data Layer]
├─ User DB (master)
├─ Cache (Redis)
└─ Message Queue (RabbitMQ)
```
Explain how components interact:
```
Client sends request to API Gateway, which routes based on endpoint.
Auth service handles login/JWT operations. Profile service handles profile
reads/writes. Both services consume user data from shared database with
eventual consistency via message queue.
```
#### Architecture Components
For each major component:
**Auth Service**
- **Purpose**: Handles authentication, token generation, validation
- **Technology**: Node.js with Express, Redis for session storage
- **Key Responsibilities**:
- User login/logout
- JWT token generation and validation
- Session management
- Password reset flows
- **Interactions**: Calls User DB for credential validation, publishes events to queue
**Profile Service**
- **Purpose**: Manages user profile data and preferences
- **Technology**: Node.js with Express, PostgreSQL for user data
- **Key Responsibilities**:
- Read/write user profile information
- Manage user preferences
- Handle profile search and filtering
- **Interactions**: Consumes user events from queue, calls shared User DB
**API Gateway**
- **Purpose**: Single entry point, routing, authentication enforcement
- **Technology**: Nginx or API Gateway (e.g., Kong)
- **Key Responsibilities**:
- Route requests to appropriate service
- Enforce API authentication
- Rate limiting
- Request/response transformation
- **Interactions**: Routes to Auth and Profile services
### Design Decisions
For each significant decision, document:
#### Decision 1: Microservices vs. Monolith
- **Decision**: Adopt microservices architecture
- **Rationale**:
- Independent scaling needed (auth bottleneck at 5k req/sec)
- Team velocity: Can deploy auth changes independently
- Loose coupling enables faster iteration
- **Alternatives Considered**:
- Monolith optimization: Caching, database optimization (rejected: can't solve scaling bottleneck)
- Modular monolith: Improves structure but doesn't enable independent scaling
- **Impact**:
- Gain: Independent scaling, deployment, team velocity
- Accept: Distributed system complexity, operational overhead, eventual consistency
#### Decision 2: Synchronous vs. Asynchronous Communication
- **Decision**: Use message queue for eventual consistency
- **Rationale**:
- Profile updates don't need to be immediately consistent across auth service
- Reduces coupling: Auth service doesn't wait for profile service
- Improves resilience: Profile service failure doesn't affect auth
- **Alternatives Considered**:
- Synchronous REST calls: Simpler but tight coupling, availability issues
- Event sourcing: Over-engineered for current needs
- **Impact**:
- Gain: Resilience, reduced coupling, independent scaling
- Accept: Eventual consistency, operational complexity (message queue)
### Technology Stack
**Language & Runtime**
- Node.js 18 LTS - Rationale: Existing expertise, good async support
- Express - Lightweight, flexible framework the team knows
**Data Layer**
- PostgreSQL (primary database) - Reliable, ACID transactions for user data
- Redis (cache layer) - Session storage, auth token cache
**Infrastructure**
- Kubernetes for orchestration - Running multiple services at scale
- Docker for containerization - Consistent deployment
**Key Libraries/Frameworks**
- Express (v4.18) - HTTP framework
- jsonwebtoken - JWT token handling
- @aws-sdk - AWS SDK for future integration
- Jest - Testing framework
### Data Model & Storage
**Storage Strategy**
- **Primary Database**: PostgreSQL with user table containing:
- id, email, password_hash, created_at, updated_at
- One-to-many relationship with user_preferences
- **Caching**: Redis stores JWT token metadata and session info with 1-hour TTL
- **Data Retention**: User data retained indefinitely; sessions cleaned up after TTL
**Schema Overview**
```
Users Table:
- id (primary key)
- email (unique index)
- password_hash
- created_at
- updated_at
User Preferences:
- id
- user_id (foreign key)
- key (e.g., theme, language)
- value
```
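The schema overview above translates directly into DDL. A sketch (column types, constraints, and the `DATABASE_URL` variable are assumptions, not settled decisions):
```bash
# Hypothetical DDL for the two tables, fed to psql via a here-doc.
# DATABASE_URL is assumed to point at the PostgreSQL instance.
create_schema() {
  psql "$DATABASE_URL" <<'SQL'
CREATE TABLE users (
  id            BIGSERIAL PRIMARY KEY,
  email         TEXT NOT NULL UNIQUE,
  password_hash TEXT NOT NULL,
  created_at    TIMESTAMPTZ NOT NULL DEFAULT now(),
  updated_at    TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE TABLE user_preferences (
  id      BIGSERIAL PRIMARY KEY,
  user_id BIGINT NOT NULL REFERENCES users(id),
  key     TEXT NOT NULL,
  value   TEXT,
  UNIQUE (user_id, key)
);
SQL
}
```
The composite `UNIQUE (user_id, key)` encodes the one-preference-per-key relationship implied by the overview.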
### API & Integration Points
**External Dependencies**
- Integrates with existing Payment Service for billing
- Consumes events from Billing Service (subscription changes)
- Publishes user events to event bus for downstream services
**Key Endpoints** (reference full API spec):
- POST /auth/login - User login
- POST /auth/logout - User logout
- GET /profile - Fetch user profile
- PUT /profile - Update user profile
(See [API-001] for complete endpoint specifications)
### Trade-offs
**Accepting**
- Operational complexity: Must manage multiple services, deployments, monitoring
- Eventual consistency: Changes propagate through message queue, not immediate
- Debugging complexity: Cross-service issues harder to debug
**Gaining**
- Independent scaling: Auth service can scale without scaling profile service
- Team autonomy: Teams can deploy independently without coordination
- Failure isolation: Auth service failure doesn't take down profile service
- Development velocity: Faster iteration, less blocking
### Implementation
**Approach**: Phased migration - Extract services incrementally without big-bang rewrite
**Phases**:
1. **Phase 1 (Week 1-2)**: Extract Auth Service
- Deliverables: Auth service running in parallel, API Gateway routing auth requests
- Testing: Canary traffic (10%) to new service
2. **Phase 2 (Week 3-4)**: Migrate Auth Traffic
- Deliverables: 100% auth traffic on new service, rollback plan tested
- Verification: Auth latency, error rates compared to baseline
3. **Phase 3 (Week 5-6)**: Extract Profile Service
- Deliverables: Profile service independent, event queue running
- Testing: Data consistency verification across message queue
**Migration Strategy**:
- Run both monolith and microservices in parallel initially
- Use API Gateway to route traffic, allow A/B testing
- Maintain ability to rollback quickly if issues arise
- Monitor closely for latency/error rate increases
(See [PLN-001] for detailed implementation roadmap)
### Performance & Scalability
**Performance Targets**
- **Latency**: Auth service p95 < 100ms, p99 < 200ms
- **Throughput**: Auth service handles 10k requests/second
- **Availability**: 99.9% uptime for auth service
**Scalability Strategy**
- **Scaling Approach**: Horizontal - Add more auth service instances behind load balancer
- **Bottlenecks**: Database connection pool size (limit 100 connections per service instance)
- Mitigation: PgBouncer connection pooling, read replicas for read operations
- **Auto-scaling**: Kubernetes HPA scales auth service from 3 to 20 replicas based on CPU
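The HPA computes the replica count as desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue), clamped to the configured bounds. A sketch of that arithmetic with the 3–20 replica range above (the 70% CPU target is an assumption):
```bash
# Kubernetes HPA scaling formula: desired = ceil(current * currentCPU / targetCPU),
# clamped to the configured min/max replica bounds (3 and 20 here).
hpa_desired_replicas() {
  local current="$1" cpu_pct="$2" target_pct="${3:-70}" min=3 max=20
  awk -v c="$current" -v u="$cpu_pct" -v t="$target_pct" -v lo="$min" -v hi="$max" \
    'BEGIN {
      d = c * u / t
      d = (d == int(d)) ? d : int(d) + 1   # ceil
      if (d < lo) d = lo
      if (d > hi) d = hi
      print d
    }'
}
```
For example, 5 replicas at 140% of target CPU scale to 10; sustained overload clamps at the 20-replica ceiling.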
**Monitoring & Observability**
- **Metrics**: Request latency (p50, p95, p99), error rate, service availability
- **Alerting**: Alert if auth latency p95 > 150ms, error rate > 0.5%
- **Logging**: Structured JSON logs with request ID for tracing across services
### Security
**Authentication**
- JWT tokens issued by Auth Service, validated by API Gateway
- Token expiration: 1 hour, refresh tokens for extended sessions
**Authorization**
- Role-based access control (RBAC) enforced at API Gateway
- Profile service doesn't repeat auth checks (trusts gateway)
**Data Protection**
- **Encryption at Rest**: PostgreSQL database encryption enabled
- **Encryption in Transit**: TLS 1.3 for all service-to-service communication
- **PII Handling**: Passwords hashed with bcrypt (cost factor 12)
**Secrets Management**
- Database credentials stored in Kubernetes secrets
- JWT signing key rotated quarterly
- Environment-based secret injection at runtime
**Compliance**
- GDPR: User data can be exported via profile service
- SOC2: Audit logging enabled for user data access
### Dependencies & Assumptions
**Dependencies**
- PostgreSQL database must be highly available (RTO 1 hour)
- Redis cache can tolerate data loss (non-critical)
- API Gateway (Nginx) must be deployed and operational
- Message queue (RabbitMQ) must be running
**Assumptions**
- Auth service will handle up to 10k requests/second (based on growth projections)
- User data size remains < 100GB (current: 5GB)
- Network latency between services < 10ms (co-located data center)
### Open Questions
- [ ] Should we use gRPC for service-to-service communication instead of REST?
- **Status**: Under investigation - benchmarking against REST
- [ ] How do we handle shared user data updates if both services write to DB?
- **Status**: Deferred to Phase 3 - will use event sourcing pattern
- [ ] Which message queue: RabbitMQ or Kafka?
- **Status**: RabbitMQ chosen, but revisit if we need audit trail of all changes
### Approvals
**Technical Review**
- Lead Backend Engineer - TBD
**Architecture Review**
- VP Engineering - TBD
**Security Review**
- Security Team - TBD
**Approved By**
- TBD
## Writing Tips
### Use Diagrams Effectively
- ASCII art is fine for design docs (easy to version control)
- Show data flow and component interactions
- Label arrows with what data/requests are flowing
### Be Explicit About Trade-offs
- Don't just say "microservices is better"
- Say "We're trading operational complexity for independent scaling because this addresses our 5k req/sec bottleneck"
### Link to Other Specs
- Reference related business requirements: `[BRD-001]`
- Reference technical requirements: `[PRD-001]`
- Reference data models: `[DATA-001]`
- Reference API contracts: `[API-001]`
### Document Rationale
- Each decision needs a "why"
- Explain what alternatives were considered and why they were rejected
- This helps future developers understand the context
### Be Specific About Performance
- Not: "Must be performant"
- Yes: "p95 latency under 100ms, p99 under 200ms, supporting 10k requests/second"
### Consider the Whole System
- Security implications
- Operational/monitoring requirements
- Data consistency model
- Failure modes and recovery
- Future scalability
## Validation & Fixing Issues
### Run the Validator
```bash
scripts/validate-spec.sh docs/specs/design-document/des-001-your-spec.md
```
### Common Issues & Fixes
**Issue**: "Missing Proposed Solution section"
- **Fix**: Add detailed architecture components, design decisions, tech stack
**Issue**: "TODO items in Architecture Components (4 items)"
- **Fix**: Complete descriptions for all components (purpose, technology, responsibilities)
**Issue**: "No Trade-offs documented"
- **Fix**: Explicitly document what you're accepting and what you're gaining
**Issue**: "Missing Performance & Scalability targets"
- **Fix**: Add specific latency, throughput, and availability targets
### Check Completeness
```bash
scripts/check-completeness.sh docs/specs/design-document/des-001-your-spec.md
```
## Decision-Making Framework
As you write the design doc, work through:
1. **Problem**: What are we designing for?
- Specific pain points or constraints?
- Performance targets, scalability requirements?
2. **Options**: What architectural approaches could work?
- Monolith vs. distributed?
- Synchronous vs. asynchronous?
- Technology choices?
3. **Evaluation**: How do options compare?
- Which best addresses the problem?
- What are the trade-offs?
- What does the team have experience with?
4. **Decision**: Which approach wins and why?
- What assumptions must hold?
- What trade-offs are we accepting?
5. **Implementation**: How do we build/migrate to this?
- Big bang or incremental?
- Parallel running period?
- Rollback plan?
## Next Steps
1. **Create the spec**: `scripts/generate-spec.sh design-document des-XXX-slug`
2. **Research**: Find related specs and understand architecture context
3. **Sketch**: Draw architecture diagrams before writing detailed components
4. **Fill in sections** using this guide
5. **Validate**: `scripts/validate-spec.sh docs/specs/design-document/des-XXX-slug.md`
6. **Get architectural review** before implementation begins
7. **Update related specs**: Create or update technical requirements and implementation plans

# How to Create a Flow Schematic Specification
Flow schematics document business processes, workflows, and system flows visually and textually. They show how information moves through systems and how users interact with features.
## Quick Start
```bash
# 1. Create a new flow schematic
scripts/generate-spec.sh flow-schematic flow-001-descriptive-slug
# 2. Open and fill in the file
# (The file will be created at: docs/specs/flow-schematic/flow-001-descriptive-slug.md)
# 3. Add diagram and flow descriptions, then validate:
scripts/validate-spec.sh docs/specs/flow-schematic/flow-001-descriptive-slug.md
# 4. Fix issues and check completeness:
scripts/check-completeness.sh docs/specs/flow-schematic/flow-001-descriptive-slug.md
```
## When to Write a Flow Schematic
Use a Flow Schematic when you need to:
- Document how users interact with a feature
- Show data flow through systems
- Illustrate decision points and branches
- Document error handling paths
- Clarify complex processes
- Enable team alignment on workflow
## Research Phase
### 1. Research Related Specifications
Find what this flow represents:
```bash
# Find business requirements this flow implements
grep -r "brd" docs/specs/ --include="*.md"
# Find design documents that mention this flow
grep -r "design" docs/specs/ --include="*.md"
# Find related components or APIs
grep -r "component\|api" docs/specs/ --include="*.md"
```
### 2. Understand the User/System
- Who are the actors in this flow? (users, systems, services)
- What are they trying to accomplish?
- What information flows between actors?
- Where are the decision points?
- What happens when things go wrong?
### 3. Review Similar Flows
- How are flows documented in your organization?
- What diagramming style is used?
- What level of detail is typical?
- What's been confusing about past flows?
## Structure & Content Guide
### Title & Metadata
- **Title**: "User Export Flow", "Payment Processing Flow", etc.
- **Actor**: Primary user or system
- **Scope**: What does this flow cover?
- **Status**: Draft | Current | Legacy
### Overview Section
```markdown
# User Bulk Export Flow
## Summary
Describes the complete workflow when a user initiates a bulk data export,
including queuing, processing, file storage, and download.
**Primary Actors**: User, Export Service, Database, S3
**Scope**: From export request to download
**Current**: Yes (live in production)
## Key Steps Overview
1. User requests export (website)
2. API queues export job
3. Worker processes export
4. File stored to S3
5. User notified and downloads
```
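The five steps above can be sketched as a pipeline of stub functions, which is a cheap way to agree on status transitions before any real service exists (all names here are hypothetical):
```bash
# Hypothetical end-to-end sketch of the export flow's status transitions.
request_export() { echo "queued"; }      # 1-2: user request, API queues job
process_export() { echo "processing"; }  # 3: worker picks up and runs the job
store_file()     { echo "stored"; }      # 4: file written to S3
notify_user()    { echo "completed"; }   # 5: email sent, download available

run_export_flow() {
  local status step
  for step in request_export process_export store_file notify_user; do
    status=$("$step")
  done
  echo "$status"
}
```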
### Flow Diagram Section
Create a visual representation:
User Export Flow (ASCII art):
```
┌─────────────┐
│ User │
│ (Website) │
└──────┬──────┘
│ 1. Click Export
┌─────────────────────────┐
│ Export API │
│ POST /exports │
├─────────────────────────┤
│ 2. Validate request │
│ 3. Create export record │
│ 4. Queue job │
└────────┬────────────────┘
├─→ 5. Return job_id to user
┌──────────────────────┐
│ Message Queue │
│ (Redis Bull) │
├──────────────────────┤
│ 6. Store export job │
└────────┬─────────────┘
├─→ 7. Worker picks up job
┌──────────────────────────────┐
│ Export Worker │
├──────────────────────────────┤
│ 8. Query user data │
│ 9. Format data (CSV/JSON) │
│ 10. Compress file │
└────────┬─────────────────────┘
├─→ 11. Update job status (processing)
┌──────────────────────────┐
│ AWS S3 │
├──────────────────────────┤
│ 12. Store file │
│ 13. Generate signed URL │
└────────┬─────────────────┘
├─→ 14. Send notification email to user
┌──────────────────────────┐
│ User Email │
├──────────────────────────┤
│ 15. Click download link │
└────────┬─────────────────┘
├─→ 16. Browser requests file from S3
┌──────────────────────────┐
│ File Downloaded │
└──────────────────────────┘
```
### Swimlane Diagram (Alternative Format)
```
User │ Frontend │ Export API │ Message Queue │ Worker │ S3
│ │ │ │ │
1. Clicks │ │ │ │ │
Export ─┼──────────────→│ │ │ │
│ 2. Form Data │ │ │ │
│ │ 3. Validate │ │ │
│ │ 4. Create Job│ │ │
│ │ 5. Queue Job ─┼──────────────→│ │
│ │ │ 6. Job Ready │ │
│ 7. Show Status│ │ │ │
│ (polling) ←┼───────────────│ (update DB) │ │
│ │ │ │ 8. Get Data │
│ │ │ │ 9. Format │
│ │ │ │ 10. Compress │
│ │ │ │ 11. Upload ─┼──→
│ │ │ │ │
│ 12. Email sent│ │ │ │
│←──────────────┼───────────────┼───────────────┤ │
│ │ │ │ │
14. Download │ │ │ │ │
Starts ─┼──────────────→│ │ │ │
│ │ │ │ │
│ │ 15. GET /file ┼───────────────┼──────────────→│
│ │ │ │ 16. Return URL
│ File Downloaded
```
### Step-by-Step Description Section
Document each step in detail:
```markdown
## Detailed Flow Steps
### Phase 1: Export Request
**Step 1: User Initiates Export**
- **Actor**: User
- **Action**: Clicks "Export Data" button on website
- **Input**: Export preferences (format, data types, date range)
- **Output**: Export request form submitted
**Step 2: Frontend Sends Request**
- **Actor**: Frontend/Browser
- **Action**: Submits POST request to /exports endpoint
- **Headers**: Authorization header with JWT token
- **Body**:
```json
{
"format": "csv",
"data_types": ["users", "transactions"],
"date_range": { "start": "2024-01-01", "end": "2024-01-31" }
}
```
**Step 3: API Validates Request**
- **Actor**: Export API
- **Action**: Validate request format and parameters
- **Checks**:
- User authenticated?
- Valid format type?
- Date range valid?
- User not already processing too many exports?
- **Success**: Continue to Step 4
- **Error**: Return 400 Bad Request with error details
**Step 4: Create Export Record**
- **Actor**: Export API
- **Action**: Store export metadata in database
- **Data Stored**:
```sql
INSERT INTO exports (
id, user_id, format, data_types, status,
created_at, updated_at
) VALUES (...)
```
- **Status**: `queued`
- **Response**: Return 201 with export_id
### Phase 2: Job Processing
**Step 5: Queue Export Job**
- **Actor**: Export API
- **Action**: Add job to Redis queue
- **Job Format**:
```json
{
"export_id": "exp_123456",
"user_id": "usr_789012",
"format": "csv",
"data_types": ["users", "transactions"]
}
```
- **Queue**: Bull job queue in Redis
- **TTL**: Job removed after 7 days
**Step 6: Return to User**
- **Actor**: Export API
- **Action**: Send response to frontend
- **Response**:
```json
{
"id": "exp_123456",
"status": "queued",
"created_at": "2024-01-15T10:00:00Z",
"estimated_completion": "2024-01-15T10:05:00Z"
}
```
### Phase 3: Data Export
**Step 7: Worker Picks Up Job**
- **Actor**: Export Worker
- **Action**: Poll Redis queue for jobs
- **Condition**: Worker checks every 100ms
- **Process**: Dequeues oldest job, marks as processing
- **Status Update**: Export marked as `processing` in database
**Step 8-10: Process Export**
- **Actor**: Export Worker
- **Actions**:
1. Query user data from database (user table, transaction table)
2. Validate and transform data to requested format
3. Write to temporary file on worker disk
4. Compress file with gzip
- **Error Handling**: If fails, retry up to 3 times with backoff
**Step 11: Upload to S3**
- **Actor**: Export Worker
- **Action**: Upload compressed file to S3
- **Filename**: `exports/exp_123456.csv.gz`
- **ACL**: Private (only accessible via signed URL)
- **Success**: Update export status to `completed` in database
### Phase 4: Notification & Download
**Step 12: Send Notification**
- **Actor**: Notification Service (triggered by export completion event)
- **Action**: Send email to user
- **Email Content**: "Your export is ready! [Click here to download]"
- **Link**: Includes signed URL (valid for 7 days)
**Step 13: User Receives Email**
- **Actor**: User
- **Action**: Receives email notification
- **Next**: Clicks download link
**Step 14-16: Download File**
- **Actor**: User browser
- **Action**: Follows download link
- **Request**: GET /exports/exp_123456/download
- **Response**: Browser initiates file download
- **File**: exp_123456.csv.gz is saved to user's computer
```
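The validation checks in Step 3 can be sketched as a small function. This is an illustrative sketch only: the field names mirror the request body above, but `MAX_ACTIVE_EXPORTS`, the error strings, and the return shape are assumptions, not the actual API implementation.

```javascript
// Hypothetical sketch of the Step 3 checks; limits and messages are assumed.
const SUPPORTED_FORMATS = new Set(["csv", "json"]);
const MAX_ACTIVE_EXPORTS = 5; // assumption: the spec only says "too many exports"

function validateExportRequest(req, activeExportCount) {
  const errors = [];
  if (!SUPPORTED_FORMATS.has(req.format)) {
    errors.push("Invalid format. Supported: csv, json");
  }
  const start = Date.parse(req.date_range?.start);
  const end = Date.parse(req.date_range?.end);
  if (Number.isNaN(start) || Number.isNaN(end) || start > end) {
    errors.push("Invalid date range");
  }
  if (activeExportCount >= MAX_ACTIVE_EXPORTS) {
    errors.push("Too many exports in progress");
  }
  // Success continues to Step 4; errors map to a 400 Bad Request response
  return errors.length === 0 ? { ok: true } : { ok: false, status: 400, errors };
}
```

A caller would run this before creating the export record, returning the collected errors in the 400 response body.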
### Decision Points Section
Document branching logic:
```markdown
## Decision Points
### Decision 1: Export Format Validation
**Question**: Is the requested export format supported?
**Options**:
- ✓ CSV: Continue to data export (Step 8)
- ✓ JSON: Continue to data export (Step 8)
- ✗ Other format: Return 400 error, user selects different format
### Decision 2: User Data Available?
**Question**: Can we successfully query user data?
**Options**:
- ✓ Yes: Continue with data transformation (Step 9)
- ✗ Database error: Retry job (up to 3 times)
- ✗ User data deleted: Return "no data" message to user
### Decision 3: File Size Check
**Question**: Is the export file within size limits?
**Options**:
- ✓ < 500MB: Proceed to upload (Step 11)
- ✗ > 500MB: Return error "export too large", offer data filtering options
### Decision 4: Export Status Check (User Polling)
**Question**: Has export job completed?
**Polling**: Frontend polls GET /exports/{id} every 5 seconds
**Options**:
- `queued`: Show "Waiting to process..."
- `processing`: Show "Processing... (40%)"
- `completed`: Show download link
- `failed`: Show error message, offer retry option
- `cancelled`: Show "Export was cancelled"
```
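Decision 4's status-to-message mapping can be expressed as a simple lookup on the frontend. The wording for `completed` and the unknown-status fallback are assumptions here; the spec example only describes the UI behavior for those states.

```javascript
// Illustrative mapping from export status to the UI messages in Decision 4.
const STATUS_MESSAGES = {
  queued: "Waiting to process...",
  processing: (pct) => `Processing... (${pct}%)`,
  completed: "Download ready", // assumption: spec says "show download link"
  failed: "Export failed. Retry?",
  cancelled: "Export was cancelled",
};

function renderStatus(status, progressPct = 0) {
  const entry = STATUS_MESSAGES[status];
  if (entry === undefined) return "Unknown status";
  return typeof entry === "function" ? entry(progressPct) : entry;
}
```

The polling loop would call this every 5 seconds with the latest `GET /exports/{id}` response.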
### Error Handling Section
```markdown
## Error Handling & Recovery
### Error 1: Invalid Request Format
**Trigger**: User submits invalid format parameter
**Response Code**: 400 Bad Request
**Message**: "Invalid format. Supported: csv, json"
**Recovery**: User submits corrected request
### Error 2: Database Connection Lost During Export
**Trigger**: Worker loses connection to database while querying data
**Response Code**: (internal, no response to user)
**Recovery**: Job retried automatically (backoff: 1s, 2s, 4s)
**Max Retries**: 3 times
**If Fails After Retries**: Export marked as `failed`, user notified
### Error 3: S3 Upload Failure
**Trigger**: S3 returns 500 error
**Recovery**: Retry with exponential backoff
**Fallback**: If retries exhausted, store to local backup, retry next hour
**User Impact**: Export shows "delayed", user can check status later
### Error 4: File Too Large
**Trigger**: Export file exceeds 500MB limit
**Response Code**: 413 Payload Too Large
**Message**: "Export data exceeds 500MB. Use date filtering to reduce size."
**Recovery**: User modifies date range and resubmits
### Timeout Handling
**Job Timeout**: If export takes > 5 minutes, job is killed
**User Notification**: "Export processing took too long. Please try again."
**Logs**: Timeout recorded for analysis
**Recovery**: User can request again (usually succeeds second time)
```
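The retry behavior described in Errors 2 and 3 (exponential backoff of 1s, 2s, 4s, then mark as `failed`) can be sketched generically. `runWithRetry` is a hypothetical helper, not the actual worker code; the base delay is parameterized so the sketch is testable without real waits.

```javascript
// Sketch of retry with exponential backoff: 1s, 2s, 4s, max 3 retries.
async function runWithRetry(job, { maxRetries = 3, baseDelayMs = 1000 } = {}) {
  let attempt = 0;
  for (;;) {
    try {
      return await job();
    } catch (err) {
      if (attempt >= maxRetries) throw err; // caller marks export as `failed`
      const delay = baseDelayMs * 2 ** attempt; // 1s, 2s, 4s progression
      attempt += 1;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

A transient database error would be retried transparently; only after the final attempt does the error propagate and trigger the failure notification.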
### Async/Event Section
Document asynchronous aspects:
```markdown
## Asynchronous Operations
### Event: Export Created
**Trigger**: POST /exports returns 201
**Event Published**: `export.created`
**Subscribers**: Analytics service (tracks export requests)
**Payload**:
```json
{
"export_id": "exp_123456",
"user_id": "usr_789012",
"format": "csv",
"timestamp": "2024-01-15T10:00:00Z"
}
```
### Event: Export Completed
**Trigger**: Worker successfully uploads to S3
**Event Published**: `export.completed`
**Subscribers**:
- Notification service (send email)
- Analytics service (track completion)
**Payload**:
```json
{
"export_id": "exp_123456",
"file_size_bytes": 2048576,
"processing_time_ms": 312000,
"timestamp": "2024-01-15T10:05:12Z"
}
```
### Event: Export Failed
**Trigger**: Job fails after max retries
**Event Published**: `export.failed`
**Subscribers**: Notification service (alert user)
**Payload**:
```json
{
"export_id": "exp_123456",
"error_code": "database_timeout",
"error_message": "Connection timeout after 3 retries",
"timestamp": "2024-01-15T10:06:00Z"
}
```
```
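The `export.*` events above follow a publish/subscribe pattern, sketched minimally below. A real deployment would use a message broker; the in-memory `EventBus` and its handler are purely illustrative.

```javascript
// Minimal in-memory publish/subscribe sketch for the export.* events.
class EventBus {
  constructor() {
    this.subscribers = new Map(); // topic -> array of handlers
  }
  subscribe(topic, handler) {
    if (!this.subscribers.has(topic)) this.subscribers.set(topic, []);
    this.subscribers.get(topic).push(handler);
  }
  publish(topic, payload) {
    for (const handler of this.subscribers.get(topic) ?? []) handler(payload);
  }
}

const bus = new EventBus();
// e.g. the notification service subscribing to completions
bus.subscribe("export.completed", (evt) => {
  console.log(`notify user: export ${evt.export_id} ready`);
});
bus.publish("export.completed", { export_id: "exp_123456", file_size_bytes: 2048576 });
// prints: notify user: export exp_123456 ready
```

Each subscriber (notification, analytics) registers independently, which is what lets the worker stay unaware of who consumes its events.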
### Performance & Timing Section
```markdown
## Performance Characteristics
### Typical Timings
- Request submission → queued: < 100ms
- Queued → processing starts: < 30 seconds (depends on queue load)
- Processing time:
- Small dataset (< 10MB): 1-2 minutes
- Medium dataset (10-100MB): 2-5 minutes
- Large dataset (100-500MB): 5-10 minutes
- Upload to S3: 30 seconds to 2 minutes
### Total End-to-End Time
- Average: 5-10 minutes from request to download ready
- Best case: 3-5 minutes (empty queue, small dataset)
- Worst case: 15+ minutes (high load, large dataset)
### Scaling Behavior
- 1 worker: Processes 1 export at a time
- 3 workers: Process 3 exports in parallel
- 10 workers: Can handle 10 concurrent exports
- Queue depth auto-scales workers up to 20 pods
```
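The timing bands above can be turned into a rough estimator for user-facing progress messages. The returned midpoints are assumptions derived from this section's upper bounds, not measured values.

```javascript
// Rough processing-time estimator based on the typical timings above.
function estimateProcessingMinutes(datasetMb) {
  if (datasetMb < 10) return 2;    // small dataset: 1-2 minutes
  if (datasetMb <= 100) return 5;  // medium dataset: 2-5 minutes
  if (datasetMb <= 500) return 10; // large dataset: 5-10 minutes
  return null; // over the 500MB limit: export is rejected (Decision 3)
}
```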
## Writing Tips
### Use Clear Diagrams
- ASCII art is fine and diffs cleanly under version control
- Show all actors and their interactions
- Label arrows with what's being transmitted
- Use swimlanes for multiple actors
### Be Specific About Data
- Show actual request/response formats
- Include field names and types
- Show error responses with codes
- Document data transformations
### Cover the Happy Path AND Error Paths
- What happens when everything works?
- What happens when things go wrong?
- What are the recovery mechanisms?
- Can users recover?
### Think About Timing
- What happens asynchronously?
- Where are synchronous waits?
- What are typical timings?
- Where are bottlenecks?
### Link to Related Specs
- Reference design documents: `[DES-001]`
- Reference API contracts: `[API-001]`
- Reference component specs: `[CMP-001]`
- Reference data models: `[DATA-001]`
## Validation & Fixing Issues
### Run the Validator
```bash
scripts/validate-spec.sh docs/specs/flow-schematic/flow-001-your-spec.md
```
### Common Issues & Fixes
**Issue**: "Flow diagram incomplete or missing"
- **Fix**: Add ASCII diagram or swimlane showing all steps
**Issue**: "Step descriptions lack detail"
- **Fix**: Add what happens, who's involved, input/output for each step
**Issue**: "No error handling documented"
- **Fix**: Document error cases and recovery mechanisms
**Issue**: "Async operations not clearly shown"
- **Fix**: Highlight asynchronous steps and show event flows
## Decision-Making Framework
When documenting a flow:
1. **Scope**: What does this flow cover?
- Where does it start/end?
- What's in scope vs. out?
2. **Actors**: Who/what are the main actors?
- Users, systems, services?
- External dependencies?
3. **Happy Path**: What's the ideal flow?
- Step-by-step happy path
- Minimal branching
4. **Edge Cases**: What can go wrong?
- Error scenarios
- Recovery mechanisms
- User impact
5. **Timing**: What's the performance profile?
- Synchronous waits?
- Asynchronous operations?
- Expected timings?
## Next Steps
1. **Create the spec**: `scripts/generate-spec.sh flow-schematic flow-XXX-slug`
2. **Research**: Find related specs and understand context
3. **Sketch diagram**: Draw initial flow with all actors
4. **Document steps**: Write detailed description for each step
5. **Add error handling**: Document failure scenarios
6. **Validate**: `scripts/validate-spec.sh docs/specs/flow-schematic/flow-XXX-slug.md`
7. **Get feedback** from team to refine flow

# How to Create a Milestone Specification
Milestone specifications define specific delivery targets within a project, including deliverables, success criteria, and timeline. They're checkpoints to verify progress.
## Quick Start
```bash
# 1. Create a new milestone
scripts/generate-spec.sh milestone mls-001-descriptive-slug
# 2. Open and fill in the file
# (The file will be created at: docs/specs/milestone/mls-001-descriptive-slug.md)
# 3. Fill in deliverables and criteria, then validate:
scripts/validate-spec.sh docs/specs/milestone/mls-001-descriptive-slug.md
# 4. Fix issues and check completeness:
scripts/check-completeness.sh docs/specs/milestone/mls-001-descriptive-slug.md
```
## When to Write a Milestone
Use a Milestone Spec when you need to:
- Define specific delivery checkpoints
- Communicate to stakeholders what's shipping when
- Track progress against concrete deliverables
- Set success criteria before building
- Manage dependencies between teams
- Celebrate progress and team achievements
## Research Phase
### 1. Research Related Specifications
Find the context for this milestone:
```bash
# Find the plan this milestone belongs to
grep -r "plan" docs/specs/ --include="*.md"
# Find related requirements and specs
grep -r "brd\|prd\|design" docs/specs/ --include="*.md"
```
### 2. Understand the Broader Plan
- What larger project is this part of?
- What comes before and after this milestone?
- What dependencies exist with other teams?
- What are the overall project goals?
### 3. Review Similar Milestones
- How were past milestones structured?
- What deliverables were tracked?
- How were success criteria defined?
- What worked and what didn't?
## Structure & Content Guide
### Title & Metadata
- **Title**: "Phase 1: Infrastructure Ready", "Beta Launch", etc.
- **Date**: Target completion date
- **Owner**: Team or person responsible
- **Status**: Planned | In Progress | Completed | At Risk
### Milestone Summary
```markdown
# Phase 1: Export Infrastructure Ready
**Target Date**: January 28, 2024
**Owner**: Backend Engineering Team
**Status**: In Progress
## Summary
Delivery of fully operational job queue infrastructure and worker processes
supporting the bulk export feature. Team demonstrates system can reliably
process 10+ jobs per second with monitoring and alerting in place.
```
### Deliverables Section
List what will be delivered:
```markdown
## Deliverables
### 1. Redis Job Queue (Production-Ready)
**Description**: Managed Redis cluster configured for job queuing
**Acceptance Criteria**:
- [ ] AWS ElastiCache Redis cluster deployed to staging
- [ ] Cluster sized for 10k requests/second capacity
- [ ] Backup and failover configured
- [ ] Monitoring and alerts in place
**Owner**: Infrastructure Team
**Status**: In Progress
### 2. Bull Job Queue Worker
**Description**: Node.js Bull queue implementation with workers
**Acceptance Criteria**:
- [ ] Bull queue initialized and processing jobs
- [ ] Worker processes handle 10+ jobs/second
- [ ] Graceful shutdown implemented
- [ ] Error handling and retry logic working
- [ ] Unit tests cover all worker functions
**Owner**: Backend Engineer (Alice)
**Status**: In Review (code in feature branch)
### 3. Kubernetes Deployment Manifests
**Description**: K8s manifests for deploying queue workers
**Acceptance Criteria**:
- [ ] Deployment manifest supports 1-10 replicas
- [ ] Health checks configured (liveness, readiness)
- [ ] Resource requests/limits defined
- [ ] Secrets management for Redis credentials
- [ ] Successfully deploys to staging cluster
**Owner**: DevOps Engineer (Bob)
**Status**: Ready for review
### 4. Prometheus Metrics Integration
**Description**: Export metrics for job queue depth, worker status
**Acceptance Criteria**:
- [ ] Metrics scrape successfully every 15 seconds
- [ ] Dashboard shows queue depth over time
- [ ] Queue saturation alerts configured
- [ ] Grafana dashboard created for monitoring
**Owner**: Backend Engineer (Alice)
**Status**: In Progress
### 5. Documentation & Runbook
**Description**: Queue architecture docs and operational runbook
**Acceptance Criteria**:
- [ ] Architecture diagram showing queues and workers
- [ ] Configuration guide for different environments
- [ ] Runbook for common operations (scaling, debugging)
- [ ] Troubleshooting guide for common issues
**Owner**: Tech Lead (Charlie)
**Status**: Planned (starts after technical setup)
## Deliverables Summary
| Deliverable | Status | Owner | Target |
|------------|--------|-------|--------|
| Redis Cluster | In Progress | Infra | Jan 20 |
| Bull Worker | In Progress | Alice | Jan 22 |
| K8s Manifests | In Progress | Bob | Jan 22 |
| Prometheus Metrics | In Progress | Alice | Jan 25 |
| Documentation | Planned | Charlie | Jan 28 |
```
### Success Criteria Section
Define what "done" means:
```markdown
## Success Criteria
### Technical Criteria (Must Pass)
- [ ] Job queue processes 100 jobs without errors
- [ ] Queue handles 10+ jobs/second sustained throughput
- [ ] Workers scale horizontally (add/remove replicas without data loss)
- [ ] Failed jobs retry with exponential backoff
- [ ] All health checks pass in staging environment
### Operational Criteria (Must Have)
- [ ] Prometheus metrics visible in Grafana dashboard
- [ ] Alerts fire correctly when queue depth exceeds threshold
- [ ] Monitoring documentation complete and understood by ops team
- [ ] Runbook covers: scaling, debugging, troubleshooting
### Quality Criteria (Must Meet)
- [ ] Code reviewed and approved by 2+ senior engineers
- [ ] Unit tests pass with 90%+ coverage
- [ ] Integration tests verify queue → worker → completion flow
- [ ] Load tests verify performance targets
- [ ] Security audit passed (no exposed credentials)
### Documentation Criteria (Must Have)
- [ ] Architecture documented with diagrams
- [ ] Configuration guide for different environments
- [ ] Troubleshooting guide covers common issues
- [ ] Operations team trained and confident in operations
## Sign-Off Criteria
Milestone is "done" when:
1. All deliverables accepted and deployed to staging
2. All technical criteria pass
3. Tech lead, product owner, and operations lead approve
4. Documentation reviewed and accepted
```
### Timeline & Dependencies Section
```markdown
## Timeline & Dependencies
### Critical Path
```
Start → Redis Setup → Bull Implementation → Testing → Documentation → Done
(Jan 15) (3 days) (4 days) (3 days) (2 days) (Jan 28)
```
### Phase Dependencies
- **Blocking this milestone**: None (can start immediately)
- **This milestone blocks**: Phase 2 (Export Service Development)
- **If delayed**: Phase 2 starts after this completes
- **Contingency**: Have spare capacity in next phase for any slippage
### Team Capacity
| Person | Allocation | Weeks | Notes |
|--------|-----------|-------|-------|
| Alice (Backend) | 100% | 2 | Queue + metrics |
| Bob (DevOps) | 100% | 1.5 | Infrastructure |
| Charlie (Lead) | 50% | 1.5 | Review + docs |
### Risks & Mitigation
| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|-----------|
| Redis provisioning delayed | Medium | High | Use managed service, start request early |
| Performance targets not met | Low | High | Load test early, optimize if needed |
| Team member unavailable | Low | Medium | Cross-train backup person |
| Documentation delayed | Low | Low | Defer non-critical docs to next phase |
```
### Blockers & Issues Section
Track what could prevent delivery:
```markdown
## Current Blockers
### 1. AWS Infrastructure Approval (High Priority)
- **Issue**: Redis cluster requires infrastructure approval
- **Impact**: Blocks infrastructure setup (3-5 day delay if not approved)
- **Owner**: Infrastructure Lead
- **Action**: Sent approval request on Jan 10; following up Jan 12 if not yet approved
- **Target Resolution**: Jan 12
### 2. Node.js Bull Documentation Gap (Low Priority)
- **Issue**: Team unfamiliar with Bull library job prioritization
- **Impact**: Might need extra time for implementation
- **Owner**: Alice
- **Action**: Schedule Bull library workshop on Jan 16
- **Target Resolution**: Jan 16
## Dependencies Waiting
- AWS ElastiCache cluster approval (Infrastructure)
- IAM roles and security groups (Security team)
```
### Acceptance & Testing Section
```markdown
## Acceptance Procedures
### Manual Testing Checklist
- [ ] Queue accepts jobs from client
- [ ] Worker processes jobs without errors
- [ ] Queue depth monitoring works in Grafana
- [ ] Scaling up adds workers, scaling down removes them gracefully
- [ ] Failed job retry works with exponential backoff
- [ ] Restart worker and verify no jobs are lost
### Performance Testing
- [ ] Load test with 100 concurrent jobs
- [ ] Verify throughput ≥ 10 jobs/second
- [ ] Monitor memory and CPU during load test
- [ ] Document baseline metrics for future comparison
### Security Testing
- [ ] Credentials not exposed in logs or metrics
- [ ] Redis connection uses TLS
- [ ] Worker process runs with minimal permissions
### Sign-Off Process
1. Engineering team completes manual testing
2. Tech lead verifies all acceptance criteria pass
3. Operations team reviews runbook and documentation
4. Product owner confirms milestone meets business requirements
5. All sign-off: tech lead, ops lead, product owner
```
### Rollback Plan Section
```markdown
## Rollback Plan
If this milestone fails or has critical issues:
### Rollback Steps
1. Revert worker deployment: `kubectl rollout undo`
2. Keep Redis cluster (non-breaking)
3. Disable alerts that reference new queue
4. Run post-mortem to understand failure
### Communication
- Notify stakeholders if deadline at risk
- Update project plan and re-estimate Phase 2
- Communicate revised timeline to customers
### Root Cause Analysis
- Conduct post-mortem within 2 days
- Document lessons learned
- Update processes/checklists to prevent recurrence
```
### Stakeholder Communication Section
```markdown
## Stakeholder Communication
### Who Needs to Know About This Milestone?
- **Engineering**: Build against completed infrastructure
- **Product**: Planning feature launch timeline
- **Operations**: Preparing to support new system
- **Executives**: Tracking project progress
- **Customers**: Waiting for export feature
### Communication Plan
| Stakeholder | Update Frequency | Content |
|-------------|-----------------|---------|
| Engineering Team | Daily standup | Progress, blockers |
| Tech Lead | 3x/week | Risk assessment, decisions |
| Product Owner | Weekly | Status, timeline impact |
| Ops Team | Twice/week | Operational readiness |
| Executives | On completion | Milestone achieved, next steps |
### Status Updates
**Current Status**: 60% complete (Jan 22)
- Redis setup: Complete
- Bull worker: Mostly done, 2 days of testing remaining
- K8s manifests: In review
- Metrics: Underway
- Documentation: Not yet started
**Next Update**: Jan 25 (on track for Jan 28 completion)
**Confidence Level**: High (85%) - minor risks, good progress
```
## Writing Tips
### Be Specific About Deliverables
- What exactly is being delivered?
- How will you verify it's done?
- Who owns each deliverable?
- What's the definition of "done"?
### Define Success Clearly
- Success criteria should be objective and testable
- Mix technical, operational, and quality criteria
- Include both must-haves and nice-to-haves
- Get stakeholder agreement on criteria upfront
### Think About the Bigger Picture
- How does this milestone fit into the overall project?
- What depends on this milestone?
- What changes if this milestone is delayed?
- What's the contingency plan?
### Track Progress
- Update the milestone spec regularly (weekly)
- Note what's actually happening vs. plan
- Identify and communicate risks early
- Celebrate when milestone completes!
### Link to Related Specs
- Reference the overall plan: `[PLN-001]`
- Reference related milestones: `[MLS-002]`
- Reference technical specs: `[CMP-001]`, `[API-001]`
## Validation & Fixing Issues
### Run the Validator
```bash
scripts/validate-spec.sh docs/specs/milestone/mls-001-your-spec.md
```
### Common Issues & Fixes
**Issue**: "Deliverables lack acceptance criteria"
- **Fix**: Add specific, testable criteria for each deliverable
**Issue**: "No success criteria defined"
- **Fix**: Document technical, operational, and quality criteria
**Issue**: "Owner/responsibilities not assigned"
- **Fix**: Assign each deliverable to a specific person or team
**Issue**: "Rollback plan missing"
- **Fix**: Document how you'd handle failure or critical issues
## Decision-Making Framework
When defining a milestone:
1. **Scope**: What should be in this milestone?
- Shippable chunk?
- Dependencies resolved?
- Tests passing?
2. **Success**: How will we know this is done?
- Objective criteria?
- Stakeholder agreement?
- Testable outcomes?
3. **Schedule**: When is this realistically achievable?
- Team capacity?
- Dependency timelines?
- Buffer for unknowns?
4. **Risks**: What could prevent delivery?
- Technical unknowns?
- Resource constraints?
- External dependencies?
5. **Communication**: Who needs to know about this?
- Stakeholder updates?
- Sign-off process?
- Celebration when done?
## Next Steps
1. **Create the spec**: `scripts/generate-spec.sh milestone mls-XXX-slug`
2. **Research**: Find the plan and related specs
3. **Define deliverables** with clear owners
4. **Set success criteria** that are testable
5. **Identify risks** and mitigation strategies
6. **Validate**: `scripts/validate-spec.sh docs/specs/milestone/mls-XXX-slug.md`
7. **Get stakeholder alignment** before kickoff
8. **Update regularly** to track progress

# How to Create a Plan Specification
Plan specifications document implementation roadmaps, project timelines, phases, and deliverables. They provide the "how and when" we'll build something.
## Quick Start
```bash
# 1. Create a new plan
scripts/generate-spec.sh plan pln-001-descriptive-slug
# 2. Open and fill in the file
# (The file will be created at: docs/specs/plan/pln-001-descriptive-slug.md)
# 3. Fill in phases and deliverables, then validate:
scripts/validate-spec.sh docs/specs/plan/pln-001-descriptive-slug.md
# 4. Fix issues and check completeness:
scripts/check-completeness.sh docs/specs/plan/pln-001-descriptive-slug.md
```
## When to Write a Plan
Use a Plan Spec when you need to:
- Document project phases and timeline
- Define deliverables and milestones
- Identify dependencies and blockers
- Track progress against plan
- Communicate timeline to stakeholders
- Align team on implementation sequence
## Research Phase
### 1. Research Related Specifications
Find what you're planning:
```bash
# Find business requirements
grep -r "brd" docs/specs/ --include="*.md"
# Find technical requirements and design docs
grep -r "prd\|design" docs/specs/ --include="*.md"
# Find existing plans that might be related
grep -r "plan" docs/specs/ --include="*.md"
```
### 2. Understand the Scope
- What are you building?
- What are the business priorities?
- What are the technical dependencies?
- What constraints exist (timeline, resources)?
### 3. Review Similar Projects
- How long did similar projects take?
- What teams are involved?
- What risks arose and how were they managed?
- What was the actual vs. planned timeline?
## Structure & Content Guide
### Title & Metadata
- **Title**: "Export Feature Implementation Plan", "Q1 2024 Roadmap", etc.
- **Timeline**: Start date through completion
- **Owner**: Project lead or team lead
- **Status**: Planning | In Progress | Completed
### Overview Section
```markdown
# Export Feature Implementation Plan
## Summary
Plan to implement bulk user data export feature across 8 weeks.
Includes job queue infrastructure, API endpoints, UI, and deployment.
**Timeline**: January 15 - March 10, 2024 (8 weeks)
**Team Size**: 3-4 engineers, 1 product manager
**Owner**: Engineering Lead
**Status**: Planning
## Key Objectives
1. Enable enterprise customers to bulk export their data
2. Build reusable async job processing infrastructure
3. Achieve 99% reliability for export system
4. Complete testing and documentation before production launch
```
### Phases Section
Document phases with timing:
```markdown
## Implementation Phases
### Phase 1: Infrastructure Setup (Weeks 1-2)
**Timeline**: Jan 15 - Jan 28 (2 weeks)
**Team**: 2 engineers
**Goals**
- Implement job queue infrastructure (Redis + Bull)
- Build worker process foundation
- Deploy to staging environment
**Deliverables**
- Redis job queue with worker processes
- Monitoring and alerting setup
- Health checks and graceful shutdown
- Documentation of queue architecture
**Tasks**
- [ ] Set up Redis cluster (managed service)
- [ ] Implement Bull queue with worker processors
- [ ] Add Prometheus metrics for queue depth
- [ ] Configure Kubernetes deployment manifests
- [ ] Create staging deployment
- [ ] Document queue architecture
**Dependencies**
- None (can start immediately)
**Success Criteria**
- Job queue processes 10 jobs/second without errors
- Workers can be scaled horizontally in Kubernetes
- Monitoring shows queue depth and worker status
```
```markdown
### Phase 2: Export Service Development (Weeks 3-5)
**Timeline**: Jan 29 - Feb 18 (3 weeks)
**Team**: 2-3 engineers
**Goals**
- Implement export processing logic
- Add database export functionality
- Support multiple export formats
**Deliverables**
- Export service that processes jobs from queue
- CSV and JSON export format support
- Data validation and error handling
- Comprehensive unit tests
**Tasks**
- [ ] Implement data query and export logic
- [ ] Add CSV formatter
- [ ] Add JSON formatter
- [ ] Implement error handling and retries
- [ ] Add data compression
- [ ] Write unit tests (target: 90%+ coverage)
- [ ] Performance test with 100MB+ files
**Dependencies**
- Phase 1 complete (queue infrastructure)
- Data model spec finalized ([DATA-001])
**Success Criteria**
- Export processes complete in < 5 minutes for 100MB files
- All data exports match source data exactly
- Error retry logic works correctly
- 90%+ test coverage
```
```markdown
### Phase 3: API & Storage (Weeks 4-6)
**Timeline**: Feb 5 - Feb 25 (3 weeks, 2 weeks overlap with Phase 2)
**Team**: 2 engineers
**Goals**
- Implement REST API for export management
- Set up S3 storage for export files
- Build export status tracking
**Deliverables**
- REST API endpoints (create, get status, download)
- S3 integration for file storage
- Export metadata storage in database
- API documentation
**Tasks**
- [ ] Implement POST /exports endpoint
- [ ] Implement GET /exports/{id} endpoint
- [ ] Implement GET /exports/{id}/download endpoint
- [ ] Add S3 integration for storage
- [ ] Create export metadata schema
- [ ] Implement TTL-based cleanup
- [ ] Add API rate limiting
- [ ] Create API documentation
**Dependencies**
- Phase 1 complete (queue infrastructure)
- Phase 2 progress (service processing)
- Data model spec finalized ([DATA-001])
**Success Criteria**
- API responds to requests in < 100ms (p95)
- Files stored and retrieved from S3 correctly
- Cleanup removes files after 7-day TTL
```
```markdown
### Phase 4: Testing & Optimization (Weeks 6-7)
**Timeline**: Feb 19 - Mar 3 (2 weeks, 1 week overlap with Phase 3)
**Team**: 2-3 engineers
**Goals**
- Comprehensive testing across all components
- Performance optimization
- Security audit
**Deliverables**
- Integration tests for full export flow
- Load tests verifying performance targets
- Security audit report
- Performance tuning applied
**Tasks**
- [ ] Write integration tests for full export flow
- [ ] Load test with 100 concurrent exports
- [ ] Security audit of data handling
- [ ] Performance profiling and optimization
- [ ] Test large file handling (500MB+)
- [ ] Test error scenarios and retries
- [ ] Document known limitations
**Dependencies**
- Phase 2, 3 complete (service and API)
**Success Criteria**
- 95%+ automated test coverage
- Load tests show < 500ms p95 latency
- Security audit finds no critical issues
- Performance meets targets (< 5 min for 100MB)
```
```markdown
### Phase 5: Documentation & Launch (Weeks 7-8)
**Timeline**: Feb 26 - Mar 10 (2 weeks, 1 week overlap with Phase 4)
**Team**: Full team (all 4)
**Goals**
- Complete documentation
- Customer communication
- Production deployment
**Deliverables**
- API documentation (for customers)
- Runbook for operations team
- Customer launch announcement
- Production deployment checklist
**Tasks**
- [ ] Create customer-facing API docs
- [ ] Create operational runbook
- [ ] Write troubleshooting guide
- [ ] Create launch announcement
- [ ] Train support team
- [ ] Deploy to production
- [ ] Monitor for issues
- [ ] Collect initial feedback
**Dependencies**
- All prior phases complete
- Security audit passed
**Success Criteria**
- Documentation is complete and clear
- Support team can operate system independently
- Launch goes smoothly with no incidents
- Users successfully export data
```
### Dependencies & Blocking Section
```markdown
## Dependencies & Blockers
### External Dependencies
- **Data Model Spec ([DATA-001])**: Required to understand data structure
- Status: Draft
- Timeline: Must be approved by Jan 20
- Owner: Data team
- **API Contract Spec ([API-001])**: API design must be finalized
- Status: In Review
- Timeline: Must be approved by Feb 5
- Owner: Product team
- **Infrastructure Resources**: Need S3 bucket and Redis cluster
- Status: Requested
- Timeline: Must be available by Jan 15
- Owner: Infrastructure team
### Internal Dependencies
- **Phase 1 → Phase 2**: Queue infrastructure must be stable
- **Phase 2 → Phase 3**: Service must process exports correctly
- **Phase 3 → Phase 4**: API and storage must be working
- **Phase 4 → Phase 5**: All testing must pass
### Known Blockers
- Infrastructure team is currently overloaded
- Mitigation: Request resources early, use managed services
- Data privacy review needed for export functionality
- Mitigation: Schedule review meeting in first week
```
### Timeline & Gantt Chart Section
````markdown
## Timeline
```
Phase 1: Infrastructure (2 wks)   [====================]
Phase 2: Export Service (3 wks)   ___[============================]
Phase 3: API & Storage (3 wks)    _______[============================]
Phase 4: Testing (2 wks)          ______________[====================]
Phase 5: Launch (2 wks)           __________________[====================]
Week:    1   2   3   4   5   6   7   8
         |_|_|_|_|_|_|_|_|
```
### Key Milestones
| Milestone | Target Date | Owner | Deliverable |
|-----------|------------|-------|-------------|
| Queue Infrastructure Ready | Jan 28 | Eng Lead | Staging deployment |
| Export Processing Works | Feb 18 | Eng Lead | Service passes tests |
| API Complete & Working | Feb 25 | Eng Lead | API docs + endpoints |
| Testing Complete | Mar 3 | QA Lead | Test report |
| Production Launch | Mar 10 | Eng Lead | Live feature |
````
### Resource & Team Section
```markdown
## Resources
### Team Composition
**Engineering Team**
- 2 Backend Engineers (Weeks 1-8): Infrastructure, export service, API
- 1 Backend Engineer (Weeks 4-8): Testing, optimization
- Optional: 1 Frontend Engineer (Weeks 7-8): Documentation, demos
**Support & Operations**
- Product Manager (all weeks): Requirements, prioritization
- QA Lead (Weeks 4-8): Testing coordination
### Skills Required
- Backend development (Node.js, PostgreSQL)
- Infrastructure/DevOps (Kubernetes, AWS)
- Performance testing and optimization
- Security best practices
### Training Needs
- Team review of job queue pattern
- S3 and AWS integration workshop
```
### Risk Management Section
```markdown
## Risks & Mitigation
### Technical Risks
**Risk: Job Queue Reliability Issues**
- **Likelihood**: Medium
- **Impact**: High (feature doesn't work)
- **Mitigation**:
- Use managed Redis service (AWS ElastiCache)
- Implement comprehensive error handling
- Load test thoroughly before production
- Have rollback plan
**Risk: Large File Performance Problems**
- **Likelihood**: Medium
- **Impact**: Medium (performance targets missed)
- **Mitigation**:
- Start performance testing early (Week 2)
- Profile and optimize in Phase 4
- Document performance constraints
- Set data size limits if needed
**Risk: Data Consistency Issues**
- **Likelihood**: Low
- **Impact**: High (data corruption)
- **Mitigation**:
- Implement data validation
- Use database transactions
- Test with edge cases
- Have data audit procedures
### Scheduling Risks
**Risk: Phase Dependencies Cause Delays**
- **Likelihood**: Medium
- **Impact**: High (slips launch date)
- **Mitigation**:
- Phase 2 and 3 overlap to parallelize
- Start Phase 4 testing early
- Have clear done criteria
**Risk: Data Model Spec Not Ready**
- **Likelihood**: Low
- **Impact**: High (blocks implementation)
- **Mitigation**:
- Confirm spec status before starting
- Have backup data model if needed
- Schedule early review meetings
```
### Success Metrics Section
```markdown
## Success Criteria
### Technical Metrics
- [ ] Export API processes 1000+ requests/day
- [ ] p95 latency < 100ms for status queries
- [ ] Export processing completes in < 5 minutes for 100MB files
- [ ] System reliability > 99.5%
- [ ] Zero data loss or corruption incidents
### Adoption Metrics
- [ ] 30%+ of enterprise users adopt feature in first month
- [ ] Average of 2+ exports per adopting user per month
- [ ] Support tickets about exports < 5/week
### Quality Metrics
- [ ] 90%+ test coverage
- [ ] Zero critical security issues
- [ ] Documentation completeness = 100%
- [ ] Team can operate independently
```
## Writing Tips
### Be Realistic About Timelines
- Include buffer time for unknowns (add 20-30%)
- Consider team capacity and interruptions
- Account for review and testing cycles
- Document assumptions about team size/availability
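The buffer rule above is simple arithmetic; a quick shell sketch makes it concrete (the 6-week estimate and 30% buffer are illustrative numbers, not a recommendation for your project):

```shell
# Apply a planning buffer to a raw estimate, rounding up to whole weeks
estimate_weeks=6
buffer_pct=30
buffered=$(( (estimate_weeks * (100 + buffer_pct) + 99) / 100 ))
echo "raw estimate: ${estimate_weeks} wks -> plan for: ${buffered} wks"
```

A 6-week estimate with a 30% buffer rounds up to 8 weeks of calendar time to plan against.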
### Break Down Phases Clearly
- Each phase should have clear deliverables
- Phases should be independent or clearly sequenced
- Dependencies should be explicit
- Success criteria should be measurable
### Link to Related Specs
- Reference business requirements: `[BRD-001]`
- Reference technical requirements: `[PRD-001]`
- Reference design documents: `[DES-001]`
- Reference component specs: `[CMP-001]`
### Identify Risks Early
- What could go wrong?
- What's outside your control?
- What mitigations exist?
- What's the contingency plan?
### Track Against Plan
- Update plan weekly with actual progress
- Note slippages and root causes
- Adjust future phases if needed
- Use as learning for future planning
## Validation & Fixing Issues
### Run the Validator
```bash
scripts/validate-spec.sh docs/specs/plan/pln-001-your-spec.md
```
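Before running the validator, a quick grep pre-check can catch missing top-level sections. The heading list here is an assumption drawn from this guide's examples, not the validator's actual rules, and the spec is written to a temp file purely for illustration:

```shell
# Pre-check a plan spec for the section headings this guide expects
spec=$(mktemp)
cat > "$spec" <<'EOF'
# [PLN-001] Example Plan
## Timeline
## Resources
## Risks & Mitigation
## Success Criteria
EOF
missing=0
for h in "## Timeline" "## Resources" "## Risks & Mitigation" "## Success Criteria"; do
  grep -qF "$h" "$spec" || { echo "MISSING: $h"; missing=$((missing + 1)); }
done
echo "missing sections: $missing"
rm -f "$spec"
```

Point `spec` at your real `docs/specs/plan/pln-XXX-slug.md` to use it in practice.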
### Common Issues & Fixes
**Issue**: "Phases lack clear deliverables"
- **Fix**: Add specific, measurable deliverables for each phase
**Issue**: "No timeline or dates specified"
- **Fix**: Add start/end dates and duration for each phase
**Issue**: "Dependencies not documented"
- **Fix**: Identify and document blocking dependencies between phases
**Issue**: "Resource allocation unclear"
- **Fix**: Specify team members, their roles, and time commitment per phase
## Decision-Making Framework
When planning implementation:
1. **Scope**: What exactly are we building?
- Must-haves vs. nice-to-haves?
- What can we defer?
2. **Sequence**: What must be done in order?
- What can happen in parallel?
- Where are critical path bottlenecks?
3. **Phases**: How do we break this into manageable chunks?
- 1-3 week phases work well
- Each should produce something shippable/testable
- Clear entry/exit criteria
4. **Resources**: What do we need?
- Team skills and capacity?
- Infrastructure and tools?
- External dependencies?
5. **Risk**: What could derail us?
- Technical risks?
- Timeline risks?
- Resource risks?
- Mitigation strategies?
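The framework above can be captured as a reusable phase skeleton, following the same layout as the phase examples earlier in this guide:

```markdown
### Phase N: Name (Weeks X-Y)
**Timeline**: start date - end date (duration)
**Team**: who is working on it
**Goals**
- One or two outcomes this phase exists to produce
**Deliverables**
- Something shippable or testable
**Dependencies**
- What must be complete before this phase can start
**Success Criteria**
- A measurable exit condition
```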
## Next Steps
1. **Create the spec**: `scripts/generate-spec.sh plan pln-XXX-slug`
2. **Research**: Find related specs and understand scope
3. **Define phases**: Break work into logical chunks
4. **Map dependencies**: Understand what blocks what
5. **Estimate effort**: How long will each phase take?
6. **Identify risks**: What could go wrong?
7. **Validate**: `scripts/validate-spec.sh docs/specs/plan/pln-XXX-slug.md`
8. **Share with team** for feedback and planning

# How to Create a Technical Requirement Specification
Technical Requirements (PRD or TRQ) translate business needs into specific, implementation-ready technical requirements. They bridge the gap between "what we want to build" (business requirements) and "how we'll build it" (design documents).
## Quick Start
```bash
# 1. Create a new technical requirement
scripts/generate-spec.sh technical-requirement prd-001-descriptive-slug
# 2. Open and fill in the file
# (The file will be created at: docs/specs/technical-requirement/prd-001-descriptive-slug.md)
# 3. Fill in the sections, then validate:
scripts/validate-spec.sh docs/specs/technical-requirement/prd-001-descriptive-slug.md
# 4. Fix any issues, then check completeness:
scripts/check-completeness.sh docs/specs/technical-requirement/prd-001-descriptive-slug.md
```
## When to Write a Technical Requirement
Use a Technical Requirement when you need to:
- Define specific technical implementation details for a feature
- Map business requirements to technical solutions
- Document design decisions and their rationale
- Create acceptance criteria that engineers can test against
- Specify external dependencies and constraints
## Research Phase
Before writing, do your homework:
### 1. Research Related Specifications
Look for upstream and downstream specs:
```bash
# Find the business requirement this fulfills
grep -r "brd\|business" docs/specs/ --include="*.md" | head -20
# Find any existing technical requirements in this domain
grep -r "prd\|technical" docs/specs/ --include="*.md" | head -20
# Find design documents that might inform this
grep -r "design\|architecture" docs/specs/ --include="*.md" | head -20
```
### 2. Research External Documentation
Research relevant technologies and patterns:
```bash
# For external libraries/frameworks:
# Use the doc tools to get latest official documentation
# Example: research React hooks if implementing a frontend component
# Example: research database indexing strategies if working with large datasets
```
Ask yourself:
- What technologies are most suitable for this?
- Are there industry standards we should follow?
- What does the existing codebase use for similar features?
- Are there performance benchmarks or best practices we should know about?
### 3. Review the Codebase
- How have similar features been implemented?
- What patterns does the team follow?
- What libraries/frameworks are already in use?
- Are there existing utilities or services we can reuse?
## Structure & Content Guide
### Title & Metadata
- **Title**: Clear, specific requirement (e.g., "Implement Real-Time Notification System")
- **Priority**: critical | high | medium | low
- **Document ID**: Use format `PRD-XXX-slug` (e.g., `PRD-001-export-api`)
### Description Section
Answer: "What technical problem are we solving?"
Describe:
- The technical challenge you're addressing
- Why this particular approach matters
- Current technical gaps
- How this impacts system architecture or performance
Example:
```
Currently, bulk exports run synchronously, blocking requests for up to 30 seconds.
This causes timeout errors for exports > 100MB. We need an asynchronous export
system that handles large datasets efficiently.
```
### Business Requirements Addressed Section
Reference the business requirements this fulfills:
```
- [BRD-001] Bulk User Data Export - This implementation enables the export feature
- [BRD-002] Enterprise Data Audit - This provides the data integrity requirements
```
Link each BRD to how your technical solution addresses it.
### Technical Requirements Section
List specific, measurable technical requirements:
```markdown
1. **[TR-001] Asynchronous Export Processing**
- Exports must complete within 5 minutes for datasets up to 500MB
- Must not block HTTP request threads
- Must handle job queue with at least 100 concurrent exports
2. **[TR-002] Data Format Support**
- Support CSV, JSON, and Parquet formats
- All formats must preserve data types accurately
- Handle special characters and encodings (UTF-8, etc.)
3. **[TR-003] Resilience & Retries**
- Failed exports must retry up to 3 times with exponential backoff
- Incomplete exports must be resumable or cleanly failed
```
**Tips:**
- Be specific: Use numbers, formats, standards
- Make it testable: Each requirement should be verifiable
- Reference technical specs: Link to API contracts, data models, etc.
- Include edge cases: What about edge cases or error conditions?
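A requirement like [TR-003] becomes easier to review when its behavior is sketched out. Here `attempt_export` is a stand-in that fails twice before succeeding; real code would call the export service, and the `sleep` is commented out only to keep the sketch fast:

```shell
# Retry up to 3 times with exponential backoff
attempts=0
attempt_export() {
  attempts=$((attempts + 1))
  [ "$attempts" -ge 3 ]  # simulate: fail on attempts 1-2, succeed on 3
}
delay=1
for try in 1 2 3; do
  if attempt_export; then
    echo "export succeeded on attempt $try"
    break
  fi
  echo "attempt $try failed; next retry in ${delay}s"
  # sleep "$delay"
  delay=$((delay * 2))
done
```

The doubling `delay` (1s, 2s, 4s) is the exponential backoff the requirement asks for; a production version would also cap the delay and mark the job cleanly failed after the final attempt.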
### Implementation Approach Section
Describe the high-level technical strategy:
```markdown
**Architecture Pattern**
We'll use a job queue pattern with async workers. HTTP requests will create
an export job and return immediately. Workers process jobs asynchronously
and notify users when complete.
**Key Technologies**
- Job Queue: Redis with Bull library
- Export Service: Node.js worker process
- Storage: S3 for export files
- Notifications: Email service
**Integration Points**
- Integrates with existing User Service API
- Uses auth middleware for permission checking
- Publishes completion events to event bus
```
### Key Design Decisions Section
Document important choices:
```markdown
**Decision 1: Asynchronous Export vs. Synchronous**
- **Decision**: Use async job queue instead of blocking requests
- **Rationale**: Synchronous approach causes timeouts for large exports;
async improves reliability and user experience
- **Tradeoffs**: Adds complexity (job queue, worker processes, status tracking)
but enables exports for datasets up to 500MB vs. 50MB limit
```
**Why this matters:**
- Explains the "why" behind technical choices
- Helps future developers understand constraints
- Documents tradeoffs explicitly
### Technical Acceptance Criteria Section
Define how you'll know this is implemented correctly:
```markdown
### [TAC-001] Export Job Creation
**Description**: When a user requests an export, a job is created and queued
**Verification**: Unit test verifies job is created with correct parameters;
integration test verifies job appears in queue
### [TAC-002] Async Processing
**Description**: Export job completes without blocking HTTP request
**Verification**: Load test shows HTTP response time < 100ms regardless of
export size; export job completes within target time
### [TAC-003] Export Format Accuracy
**Description**: Exported data matches source data exactly (no data loss)
**Verification**: Property-based tests verify format accuracy for various
data types and edge cases
```
**Tips for Acceptance Criteria:**
- Each should be testable (unit test, integration test, or manual test)
- Include both happy path and edge cases
- Reference specific metrics or standards
### Dependencies Section
**Technical Dependencies**
- What libraries, services, or systems must be in place?
- What versions are required?
- What's the risk if a dependency is unavailable?
```markdown
- **Redis** (v6.0+) - Job queue | Risk: Medium
- **Bull** (v3.0+) - Queue library | Risk: Low
- **S3** - Export file storage | Risk: Low
- **Email Service API** - User notifications | Risk: Medium
```
**Specification Dependencies**
- What other specs must be completed first?
- Why is this a blocker?
```markdown
- [API-001] Export Endpoints - Must be designed before implementation
- [DATA-001] User Data Model - Need schema for understanding export structure
```
### Constraints Section
Document technical limitations:
```markdown
**Performance**
- Exports must complete within 5 minutes
- p95 latency for export requests must be < 100ms
- System must handle 100 concurrent exports
**Scalability**
- Support up to 500MB export files
- Handle 1000+ daily exports
**Security**
- Only export user's own data (auth-based filtering)
- Encryption for files in transit and at rest
- Audit logs for all exports
**Compatibility**
- Support all major browsers (Chrome, Firefox, Safari, Edge)
- Works with existing authentication system
```
### Implementation Notes Section
**Key Considerations**
What should the implementation team watch out for?
```markdown
**Error Handling**
- Handle network interruptions during export
- Gracefully fail if S3 becomes unavailable
- Provide clear error messages to users
**Testing Strategy**
- Unit tests for export formatting logic
- Integration tests for job queue and workers
- Load tests for concurrent export handling
- Property-based tests for data accuracy
```
**Migration Strategy** (if applicable)
- How do we transition from old to new system?
- What about existing data or users?
## Writing Tips
### Make Requirements Testable
- ❌ Bad: "Export should be fast"
- ✅ Good: "Export must complete within 5 minutes for datasets up to 500MB, with p95 latency under 100ms"
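One way to make such a target verifiable is to measure against an explicit budget. The 100ms budget and the `sleep` stand-in are illustrative, and `date +%s%N` assumes GNU coreutils (it is not portable to macOS's BSD `date`):

```shell
# Time an operation and compare against a latency budget in milliseconds
start=$(date +%s%N)
sleep 0.01   # stand-in for the operation under test
end=$(date +%s%N)
elapsed_ms=$(( (end - start) / 1000000 ))
echo "elapsed: ${elapsed_ms}ms (budget: 100ms)"
if [ "$elapsed_ms" -lt 100 ]; then echo "within budget"; else echo "over budget"; fi
```

Checks like this belong in load tests rather than unit tests, since wall-clock timing varies with machine load.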
### Be Specific About Trade-offs
- Don't just say "we chose Redis"
- Explain: "We chose Redis over RabbitMQ because it's already in our stack and provides the job persistence we need"
### Link to Other Specs
- Reference business requirements this fulfills: `[BRD-001]`
- Reference data models: `[DATA-001]`
- Reference API contracts: `[API-001]`
- Reference design documents: `[DES-001]`
### Document Constraints Clearly
- Performance targets with specific numbers
- Scalability limits and assumptions
- Security and compliance requirements
- Browser/platform support
### Include Edge Cases
- What happens with extremely large datasets?
- How do we handle special characters, encoding issues, missing data?
- What about rate limiting and concurrent requests?
### Complete All TODOs
- Replace placeholder text with actual decisions
- If something is still undecided, explain what needs to happen to decide
## Validation & Fixing Issues
### Run the Validator
```bash
scripts/validate-spec.sh docs/specs/technical-requirement/prd-001-your-spec.md
```
### Common Issues & Fixes
**Issue**: "Missing Technical Acceptance Criteria"
- **Fix**: Add 3-5 criteria describing how you'll verify implementation correctness
**Issue**: "TODO items in Implementation Approach (2 items)"
- **Fix**: Complete the architecture pattern, technologies, and integration points
**Issue**: "No Performance constraints specified"
- **Fix**: Add specific latency, throughput, and availability targets
**Issue**: "Dependencies section incomplete"
- **Fix**: List all required libraries, services, and other specifications this depends on
### Check Completeness
```bash
scripts/check-completeness.sh docs/specs/technical-requirement/prd-001-your-spec.md
```
## Decision-Making Framework
As you write the technical requirement, reason through:
1. **Problem**: What technical problem are we solving?
- Is this a performance issue, reliability issue, or capability gap?
- What's the current cost of not solving this?
2. **Approach**: What are the viable technical approaches?
- Pros and cons of each?
- What's the simplest approach that solves the problem?
- What does the team have experience with?
3. **Trade-offs**: What are we accepting with this approach?
- Complexity vs. flexibility?
- Performance vs. maintainability?
- Immediate need vs. future extensibility?
4. **Measurability**: How will we know this works?
- What specific metrics define success?
- What's the threshold for "passing"?
5. **Dependencies**: What must happen first?
- Are there blockers we need to resolve?
- Can parts be parallelized?
## Example: Complete Technical Requirement
```markdown
# [PRD-001] Asynchronous Export Service
**Priority:** High
## Description
Currently, bulk exports run synchronously, blocking HTTP requests for up to
30 seconds, causing timeouts for exports > 100MB. We need an asynchronous
export system that handles large datasets efficiently and provides job status
tracking to users.
## Business Requirements Addressed
- [BRD-001] Bulk User Data Export - Enables the core export feature
- [BRD-002] Enterprise Audit Requirements - Provides reliable data export
## Technical Requirements
1. **[TR-001] Asynchronous Processing**
- Export jobs must not block HTTP requests
- Jobs complete within 5 minutes for datasets up to 500MB
- System handles 100 concurrent exports
2. **[TR-002] Format Support**
- Support CSV, JSON formats
- Preserve data types and handle special characters
3. **[TR-003] Job Status Tracking**
- Users can check export job status via API
- Job history retained for 30 days
... [rest of sections follow] ...
```
## Next Steps
1. **Create the spec**: `scripts/generate-spec.sh technical-requirement prd-XXX-slug`
2. **Research**: Find related BRD and understand the context
3. **Fill in sections** using this guide
4. **Validate**: `scripts/validate-spec.sh docs/specs/technical-requirement/prd-XXX-slug.md`
5. **Fix issues** identified by validator
6. **Share with architecture/design team** for design document creation