# Spec Author Instruction Guides

This directory contains comprehensive instruction guides for creating each type of specification document supported by the spec-author skill. Each guide includes:

- **Quick Start**: Get up and running quickly with basic commands
- **Research Phase**: Guidance on researching related specs and external documentation
- **Structure & Content Guide**: Detailed walkthrough of each section
- **Writing Tips**: Best practices and common pitfalls
- **Validation & Fixing Issues**: How to use the validation tools
- **Decision-Making Framework**: Questions to ask while writing
- **Next Steps**: Complete workflow from creation to completion

## Specification Types

### Business & Planning

#### [Business Requirement (brd-XXX)](./business-requirement.md)
Capture what problem you're solving and why it matters from a business perspective. Translate customer needs into requirements that engineering can build against.

**Use when**: Documenting new features, defining business value, creating stakeholder alignment
**Key sections**: Business value, user stories, acceptance criteria, success metrics

#### [Technical Requirement (prd-XXX)](./technical-requirement.md)
Translate business needs into specific, implementation-ready technical requirements. Bridge the gap between "what we want" and "how we'll build it."

**Use when**: Defining technical implementation details, mapping business requirements to solutions
**Key sections**: Technical requirements, design decisions, acceptance criteria, constraints

#### [Plan (pln-XXX)](./plan.md)
Document implementation roadmaps, project timelines, phases, and deliverables. Provide the "how and when" we'll build something.

**Use when**: Planning project execution, defining phases and timeline, identifying dependencies
**Key sections**: Phases, timeline, deliverables, dependencies, risks, resources

#### [Milestone (mls-XXX)](./milestone.md)
Define specific delivery checkpoints within a project, including deliverables, success criteria, and timeline.

**Use when**: Defining delivery targets, communicating progress, tracking against concrete deliverables
**Key sections**: Deliverables, success criteria, timeline, blockers, acceptance procedures

### Architecture & Design

#### [Design Document (des-XXX)](./design-document.md)
Provide the detailed architectural and technical design for a system, component, or significant feature.

**Use when**: Major system redesign, architectural decisions, technology choices
**Key sections**: Proposed solution, design decisions, technology stack, trade-offs, implementation plan

#### [Component (cmp-XXX)](./component.md)
Document individual system components or services, including responsibilities, interfaces, configuration, and deployment.

**Use when**: Documenting microservices, major system components, architectural pieces
**Key sections**: Responsibilities, interfaces, configuration, deployment, monitoring

#### [Flow Schematic (flow-XXX)](./flow-schematic.md)
Document business processes, workflows, and system flows visually and textually. Show how information moves through systems.

**Use when**: Documenting user workflows, system interactions, complex processes
**Key sections**: Flow diagram, step-by-step descriptions, decision points, error handling

### Data & Contracts

#### [Data Model (data-XXX)](./data-model.md)
Define entities, fields, relationships, and constraints for your application's data. Define the "shape" of data your system works with.

**Use when**: Planning database schema, documenting entity relationships, enabling API/UI teams
**Key sections**: Entity definitions, relationships, constraints, scaling considerations

#### [API Contract (api-XXX)](./api-contract.md)
Document the complete specification of REST/GraphQL endpoints, including request/response formats, error handling, and authentication.

**Use when**: Defining API endpoints, enabling parallel frontend/backend development, creating living documentation
**Key sections**: Authentication, endpoints, response formats, error handling, rate limiting

### Operations & Configuration

#### [Deployment Procedure (deploy-XXX)](./deployment-procedure.md)
Document step-by-step instructions for deploying systems to production, including prerequisites, procedures, rollback, and troubleshooting.

**Use when**: Deploying services, creating runbooks, enabling safe operations, ensuring repeatable deployments
**Key sections**: Prerequisites, deployment steps, rollback procedure, success criteria, troubleshooting

#### [Configuration Schema (config-XXX)](./configuration-schema.md)
Document all configurable parameters for a system, including types, valid values, defaults, and impact.

**Use when**: Documenting system configuration, enabling ops teams to configure safely, supporting multiple environments
**Key sections**: Configuration methods, field descriptions, validation rules, environment-specific examples

## How to Use These Guides

### For New Spec Types

1. **Choose your spec type** based on what you're documenting
2. **Go to the corresponding guide** (e.g., if creating a business requirement, read `business-requirement.md`)
3. **Follow the Quick Start** to generate a new spec from the template
4. **Work through the Research Phase** to understand context
5. **Use the Structure & Content Guide** to fill in each section
6. **Apply the Writing Tips** as you write
7. **Run validation** using the tools
8. **Follow the Decision-Making Framework** to reason through tough choices

### For Improving Existing Specs

1. **Check the appropriate guide** for what your spec should contain
2. **Review the "Validation & Fixing Issues"** section for common problems
3. **Use validation tools** to identify what's missing
4. **Fill in missing sections** using the structure guide
5. **Fix any issues** the validator identifies

### For Team Standards

1. **Each guide provides concrete standards** for each spec type
2. **Sections marked "required"** must appear in all specs
3. **Examples in each guide** show the expected quality and detail level
4. **Validation rules** ensure consistent structure across specs

## Quick Reference: CLI Commands

### Create a New Spec

```bash
# Generate a new spec from template
scripts/generate-spec.sh <spec-type> <spec-id>

# Examples:
scripts/generate-spec.sh business-requirement brd-001-user-export
scripts/generate-spec.sh design-document des-001-export-arch
scripts/generate-spec.sh api-contract api-001-export-endpoints
```

### Validate a Spec

```bash
# Check a spec for completeness and structure
scripts/validate-spec.sh docs/specs/business-requirement/brd-001-user-export.md
```

Returns:
- ✓ **PASS**: All required sections present and complete
- ⚠ **WARNINGS**: Missing optional sections or incomplete TODOs
- ✗ **ERRORS**: Missing critical sections or structural issues

### Check Completeness

```bash
# See what's incomplete and what TODOs need attention
scripts/check-completeness.sh docs/specs/business-requirement/brd-001-user-export.md
```

Shows:
- Completion percentage
- Missing sections with descriptions
- TODO items that need completion
- Referenced documents

### List Available Templates

```bash
# See what spec types are available
scripts/list-templates.sh
```

## Workflow Example: Creating a Feature End-to-End

### Step 1: Business Requirements
**Create**: `scripts/generate-spec.sh business-requirement brd-001-bulk-export`
**Guide**: Follow [business-requirement.md](./business-requirement.md)
**Output**: `docs/specs/business-requirement/brd-001-bulk-export.md`

### Step 2: Technical Requirements
**Create**: `scripts/generate-spec.sh technical-requirement prd-001-export-api`
**Guide**: Follow [technical-requirement.md](./technical-requirement.md)
**Reference**: Link to BRD created in Step 1
**Output**: `docs/specs/technical-requirement/prd-001-export-api.md`

### Step 3: Design Document
**Create**: `scripts/generate-spec.sh design-document des-001-export-arch`
**Guide**: Follow [design-document.md](./design-document.md)
**Reference**: Link to PRD and BRD
**Output**: `docs/specs/design-document/des-001-export-arch.md`

### Step 4: Data Model
**Create**: `scripts/generate-spec.sh data-model data-001-export-schema`
**Guide**: Follow [data-model.md](./data-model.md)
**Reference**: Entities used in design
**Output**: `docs/specs/data-model/data-001-export-schema.md`

### Step 5: API Contract
**Create**: `scripts/generate-spec.sh api-contract api-001-export-endpoints`
**Guide**: Follow [api-contract.md](./api-contract.md)
**Reference**: Link to technical requirements
**Output**: `docs/specs/api-contract/api-001-export-endpoints.md`

### Step 6: Component Specs
**Create**: `scripts/generate-spec.sh component cmp-001-export-service`
**Guide**: Follow [component.md](./component.md)
**Reference**: Link to design and technical requirements
**Output**: `docs/specs/component/cmp-001-export-service.md`

### Step 7: Implementation Plan
**Create**: `scripts/generate-spec.sh plan pln-001-export-implementation`
**Guide**: Follow [plan.md](./plan.md)
**Reference**: Link to all related specs
**Output**: `docs/specs/plan/pln-001-export-implementation.md`

### Step 8: Define Milestones
**Create**: `scripts/generate-spec.sh milestone mls-001-export-phase1`
**Guide**: Follow [milestone.md](./milestone.md)
**Reference**: Link to plan created in Step 7
**Output**: `docs/specs/milestone/mls-001-export-phase1.md`

### Step 9: Document Workflows
**Create**: `scripts/generate-spec.sh flow-schematic flow-001-export-process`
**Guide**: Follow [flow-schematic.md](./flow-schematic.md)
**Reference**: Illustrate flows described in design and API
**Output**: `docs/specs/flow-schematic/flow-001-export-process.md`

### Step 10: Configuration Schema
**Create**: `scripts/generate-spec.sh configuration-schema config-001-export-service`
**Guide**: Follow [configuration-schema.md](./configuration-schema.md)
**Reference**: Used by component and deployment
**Output**: `docs/specs/configuration-schema/config-001-export-service.md`

### Step 11: Deployment Procedure
**Create**: `scripts/generate-spec.sh deployment-procedure deploy-001-export-production`
**Guide**: Follow [deployment-procedure.md](./deployment-procedure.md)
**Reference**: Link to component, configuration, and plan
**Output**: `docs/specs/deployment-procedure/deploy-001-export-production.md`

## Tips for Success

### 1. Research Thoroughly
- Use the Research Phase section in each guide
- Look for related specs that have already been created
- Research external docs using doc tools or web search
- Understand the context before writing

### 2. Use CLI Tools Effectively
- Start with `scripts/generate-spec.sh` to create from template (saves time)
- Use `scripts/validate-spec.sh` frequently while writing (catches issues early)
- Use `scripts/check-completeness.sh` to find TODOs that need attention
- Run validation before considering a spec "done"

### 3. Complete All Sections
- "Required" sections should be in every spec
- "Optional" sections may be skipped if not applicable
- Never leave placeholder text or TODO items in final specs
- Incomplete specs cause confusion and rework

### 4. Link Specs Together
- Reference related specs using [ID] format (e.g., `[BRD-001]`)
- Show how specs depend on each other
- This creates a web of related documentation
- Makes specs more discoverable

### 5. Use Concrete Examples
- Concrete examples are clearer than abstract descriptions
- Show actual data, requests, responses
- Include sample configurations
- Show before/after if describing changes

### 6. Get Feedback Early
- Share early drafts with stakeholders
- Use validation to catch structural issues
- Get domain experts to review for accuracy
- Iterate based on feedback

### 7. Keep Updating
- Specs should reflect current state, not initial design
- Update when important decisions change
- Mark what was decided and when
- Document why changes were made

## Common Patterns Across Guides

### Quick Start Pattern
Every guide starts with:
```bash
scripts/generate-spec.sh <type> <id>
# Edit file
scripts/validate-spec.sh docs/specs/...
scripts/check-completeness.sh docs/specs/...
```

### Research Phase Pattern
Every guide recommends:
1. Finding related specs
2. Understanding external context
3. Reviewing existing patterns
4. Understanding constraints/requirements

### Structure Pattern
Every guide provides:
- Detailed walkthrough of each section
- Purpose of each section
- What should be included
- How detailed to be

### Validation Pattern
Every guide includes:
- Running the validator
- Common issues and how to fix them
- Checking completeness

### Decision Pattern
Every guide encourages thinking through:
1. Scope/boundaries
2. Options and trade-offs
3. Specific decisions and rationale
4. Communication and approval
5. Evolution and change

## Getting Help

### For questions about a specific spec type:
- Read the corresponding guide in this directory
- Check the examples section for concrete examples
- Review the decision-making framework for guidance

### For validation issues:
- Run `scripts/validate-spec.sh` to see what's missing
- Read the "Validation & Fixing Issues" section of the guide
- Check if required sections are present and complete

### For understanding the bigger picture:
- Read through related guides to see how specs connect
- Look at the Workflow Example to see the full flow
- Review the Common Patterns section

## Next Steps

1. **Pick a spec type** you need to create
2. **Read the corresponding guide** thoroughly
3. **Run the generate command** to create from template
4. **Follow the structure guide** to fill in sections
5. **Validate frequently** as you work
6. **Fix issues** the validator identifies
7. **Get feedback** from stakeholders
8. **Consider this "complete"** when validator passes ✓

Good luck writing great specs! Remember: clear, complete specifications save time and prevent mistakes later in the development process.

# How to Create an API Contract Specification

API Contracts document the complete specification of REST/GraphQL endpoints, including request/response formats, error handling, and authentication. They serve as the contract between frontend and backend teams.

## Quick Start

```bash
# 1. Create a new API contract
scripts/generate-spec.sh api-contract api-001-descriptive-slug

# 2. Open and fill in the file
# (The file will be created at: docs/specs/api-contract/api-001-descriptive-slug.md)

# 3. Fill in endpoints and specifications, then validate:
scripts/validate-spec.sh docs/specs/api-contract/api-001-descriptive-slug.md

# 4. Fix issues and check completeness:
scripts/check-completeness.sh docs/specs/api-contract/api-001-descriptive-slug.md
```

## When to Write an API Contract

Use an API Contract when you need to:
- Define REST API endpoints and their behavior
- Document request/response schemas in detail
- Specify error handling and status codes
- Clarify authentication and authorization
- Enable parallel frontend/backend development
- Create living documentation of your API

## Research Phase

### 1. Research Related Specifications
Find what this API needs to support:

```bash
# Find technical requirements this fulfills
grep -r "prd\|technical" docs/specs/ --include="*.md"

# Find data models this API exposes
grep -r "data\|model" docs/specs/ --include="*.md"

# Find existing APIs in the codebase
grep -r "api\|endpoint" docs/specs/ --include="*.md"
```

### 2. Research API Design Standards
Understand best practices and conventions:

- REST conventions: HTTP methods, status codes, URL structure
- Pagination: How to handle large result sets?
- Error handling: Standard error format for your org?
- Versioning: How do you version APIs?
- Naming conventions: camelCase vs. snake_case?

Research your tech stack's conventions if needed.

### 3. Review Existing APIs
- How are existing APIs in your codebase designed?
- What patterns does your team follow?
- Any shared infrastructure (API gateway, auth)?
- Error response format standards?

### 4. Understand Data Models
- What entities are exposed?
- Which fields are required vs. optional?
- See [DATA-001] or similar specs for schema details

## Structure & Content Guide

### Title & Metadata
- **Title**: "User Export API" or similar
- Include context about what endpoints are included
- Version number if this is an update to an existing API

### Overview Section
Provide context for the API:

```markdown
# User Export API

This API provides endpoints for initiating, tracking, and downloading user data exports.
Supports bulk export of user information in multiple formats (CSV, JSON).
Authenticated requests only.

**Base URL**: `https://api.example.com/v1`
**Authentication**: Bearer token (JWT)
```

### Authentication & Authorization Section

Describe how authentication works:

```markdown
## Authentication

**Method**: Bearer Token (JWT)
**Header**: `Authorization: Bearer {token}`
**Token Source**: Obtained from `/auth/login` endpoint

### Authorization

**Required**: All endpoints require valid JWT token

**Scopes** (if using OAuth/scope-based):
- `exports:read` - View export status
- `exports:create` - Create new exports
- `exports:download` - Download export files

**User Data**: Users can only access their own exports (enforced server-side)
```
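
As a concrete illustration, a client request under this scheme might look like the following TypeScript sketch. It is a minimal example, not part of the contract: the `createExport` helper name is ours, and the endpoint and payload are borrowed from the Create Export example later in this guide.

```typescript
// Minimal sketch: call the API with a Bearer token obtained from /auth/login.
async function createExport(token: string) {
  const res = await fetch("https://api.example.com/v1/exports", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`, // per the Authentication section
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ data_types: ["users"], format: "csv" }),
  });
  if (!res.ok) throw new Error(`Export request failed: ${res.status}`);
  return res.json(); // 201 Created with the new export job
}
```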

### Endpoints Section

Document each endpoint thoroughly:

#### Endpoint: Create Export
````markdown
**POST /exports**

Creates a new export job for the authenticated user.

### Request

**Headers**
- `Authorization: Bearer {token}` (required)
- `Content-Type: application/json`

**Body**
```json
{
  "data_types": ["users", "transactions"],
  "format": "csv",
  "date_range": {
    "start": "2024-01-01",
    "end": "2024-01-31"
  }
}
```

**Parameters**
- `data_types` (array, required): Types of data to include
  - Allowed values: `users`, `transactions`, `settings`
  - At least one required
- `format` (string, required): Export file format
  - Allowed values: `csv`, `json`
- `date_range` (object, optional): Filter data by date range
  - `start` (string, ISO8601 format)
  - `end` (string, ISO8601 format)

### Response

**Status: 201 Created**
```json
{
  "id": "exp_1234567890",
  "user_id": "usr_9876543210",
  "status": "queued",
  "format": "csv",
  "data_types": ["users", "transactions"],
  "created_at": "2024-01-15T10:30:00Z",
  "estimated_completion": "2024-01-15T10:35:00Z"
}
```

**Status: 400 Bad Request**
```json
{
  "error": "invalid_request",
  "message": "data_types must include at least one type",
  "code": "VALIDATION_ERROR"
}
```

**Status: 401 Unauthorized**
```json
{
  "error": "unauthorized",
  "message": "Invalid or missing authorization token",
  "code": "AUTH_FAILED"
}
```

**Status: 429 Too Many Requests**
```json
{
  "error": "rate_limited",
  "message": "Too many requests. Try again after 60 seconds.",
  "retry_after": 60
}
```

### Details

**Rate Limiting**
- 10 exports per hour per user
- Returns `X-RateLimit-*` headers
  - `X-RateLimit-Limit: 10`
  - `X-RateLimit-Remaining: 5`
  - `X-RateLimit-Reset: 1705319400`

**Notes**
- Exports larger than 100MB are automatically gzipped
- User receives email notification when export is ready
- Export files retained for 7 days
````

#### Endpoint: Get Export Status
````markdown
**GET /exports/{export_id}**

Retrieve the status of a specific export.

### Path Parameters
- `export_id` (string, required): Export ID (e.g., `exp_1234567890`)

### Response

**Status: 200 OK**
```json
{
  "id": "exp_1234567890",
  "user_id": "usr_9876543210",
  "status": "completed",
  "format": "csv",
  "data_types": ["users", "transactions"],
  "created_at": "2024-01-15T10:30:00Z",
  "completed_at": "2024-01-15T10:35:00Z",
  "file_size_bytes": 2048576,
  "download_url": "https://exports.example.com/exp_1234567890.csv.gz",
  "download_expires_at": "2024-01-22T10:35:00Z"
}
```

**Status: 404 Not Found**
```json
{
  "error": "not_found",
  "message": "Export not found",
  "code": "EXPORT_NOT_FOUND"
}
```

### Export Status Values
- `queued` - Job is waiting to be processed
- `processing` - Job is currently running
- `completed` - Export is ready for download
- `failed` - Export failed (see error field)
- `cancelled` - User cancelled the export

### Error Field (when status: failed)
```json
{
  "error": "export_failed",
  "message": "Database connection lost during export"
}
```
````

#### Endpoint: Download Export
````markdown
**GET /exports/{export_id}/download**

Download the export file.

### Path Parameters
- `export_id` (string, required): Export ID

### Response

**Status: 200 OK**
- Returns binary file content
- Content-Type: `text/csv` or `application/json`
- Headers include:
  - `Content-Disposition: attachment; filename=export.csv`
  - `Content-Length: 2048576`

**Status: 410 Gone**
```json
{
  "error": "gone",
  "message": "Export file expired (retention: 7 days)",
  "code": "FILE_EXPIRED"
}
```
````

### Response Formats Section

Define common response formats used across endpoints:

````markdown
## Common Response Formats

### Error Response
All errors follow this format:
```json
{
  "error": "error_code",
  "message": "Human-readable error message",
  "code": "ERROR_CODE",
  "request_id": "req_abc123" // For support/debugging
}
```

### Pagination (for list endpoints)
```json
{
  "data": [ /* array of items */ ],
  "pagination": {
    "total": 150,
    "limit": 20,
    "offset": 0,
    "next": "https://api.example.com/v1/exports?limit=20&offset=20"
  }
}
```
````
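
For client code, these shared formats can be transcribed into TypeScript types. This is an illustrative sketch: the field names come from the JSON examples above, while the type names (`ApiError`, `Paginated`) are ours.

```typescript
// Error envelope, mirroring the common error response above.
interface ApiError {
  error: string;       // machine-readable error, e.g. "invalid_request"
  message: string;     // human-readable description
  code: string;        // e.g. "VALIDATION_ERROR"
  request_id?: string; // included for support/debugging
}

// Generic wrapper for paginated list endpoints.
interface Paginated<T> {
  data: T[];
  pagination: {
    total: number;  // total matching items
    limit: number;  // page size
    offset: number; // offset of this page
    next: string;   // URL of the next page
  };
}
```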

### Error Handling Section

Document error scenarios and status codes:

```markdown
## Error Handling

### HTTP Status Codes

- **200 OK**: Request succeeded
- **201 Created**: Resource created successfully
- **400 Bad Request**: Invalid request format or parameters
- **401 Unauthorized**: Missing or invalid authentication
- **403 Forbidden**: Authenticated but not authorized (e.g., trying to access another user's export)
- **404 Not Found**: Resource doesn't exist
- **409 Conflict**: Request conflicts with current state (e.g., cancelling completed export)
- **429 Too Many Requests**: Rate limit exceeded
- **500 Internal Server Error**: Server error
- **503 Service Unavailable**: Service temporarily unavailable

### Error Codes

- `VALIDATION_ERROR` - Invalid input parameters
- `AUTH_FAILED` - Authentication failed
- `NOT_AUTHORIZED` - Insufficient permissions
- `NOT_FOUND` - Resource doesn't exist
- `CONFLICT` - Conflicting request state
- `RATE_LIMITED` - Rate limit exceeded
- `INTERNAL_ERROR` - Server error (retryable)
- `SERVICE_UNAVAILABLE` - Service temporarily down (retryable)

### Retry Strategy

**Retryable errors** (5xx, 429):
- Implement exponential backoff: 1s, 2s, 4s, 8s...
- Maximum 3 retries

**Non-retryable errors** (4xx except 429):
- Return error immediately to client
```
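
In client code, the retry policy above might be sketched like this. The `fetchWithRetry` helper is illustrative, not part of the contract; it also honors the `Retry-After` header described in the Rate Limiting section below.

```typescript
// Retry 5xx and 429 with exponential backoff (1s, 2s, 4s), up to 3 retries;
// all other 4xx responses are returned to the caller immediately.
async function fetchWithRetry(
  url: string,
  init: RequestInit,
  maxRetries = 3,
): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    const res = await fetch(url, init);
    const retryable = res.status >= 500 || res.status === 429;
    if (!retryable || attempt >= maxRetries) return res;
    // Prefer the server's Retry-After hint; otherwise back off exponentially.
    const retryAfterSec = Number(res.headers.get("Retry-After") ?? 0);
    const delayMs = retryAfterSec > 0 ? retryAfterSec * 1000 : 1000 * 2 ** attempt;
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
}
```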

### Rate Limiting Section

```markdown
## Rate Limiting

### Limits per User
- Export creation: 10 per hour
- API calls: 1000 per hour

### Headers
All responses include rate limit information:
- `X-RateLimit-Limit`: Request quota
- `X-RateLimit-Remaining`: Requests remaining
- `X-RateLimit-Reset`: Unix timestamp when quota resets

### Handling Rate Limits
- If rate limited (429), client receives `Retry-After` header
- Retry after specified seconds
- Implement exponential backoff to avoid overwhelming API
```

### Data Types Section

If your API works with multiple data models, document them:

````markdown
## Data Types

### Export Object
```json
{
  "id": "string (export ID)",
  "user_id": "string (user ID)",
  "status": "string (queued|processing|completed|failed|cancelled)",
  "format": "string (csv|json)",
  "data_types": "string[] (users|transactions|settings)",
  "created_at": "string (ISO8601)",
  "completed_at": "string (ISO8601, null if not completed)",
  "file_size_bytes": "number (null if not completed)",
  "download_url": "string (null if not completed)",
  "download_expires_at": "string (ISO8601, null if expired)"
}
```

### User Object
```json
{
  "id": "string",
  "email": "string",
  "created_at": "string (ISO8601)"
}
```
````
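
The Export Object also translates directly into a TypeScript type for consumers. This is a transcription of the fields documented above, not an addition to the contract:

```typescript
type ExportStatus = "queued" | "processing" | "completed" | "failed" | "cancelled";

// Export object, field for field as described in the Data Types section.
interface Export {
  id: string;
  user_id: string;
  status: ExportStatus;
  format: "csv" | "json";
  data_types: ("users" | "transactions" | "settings")[];
  created_at: string;                 // ISO8601
  completed_at: string | null;        // null if not completed
  file_size_bytes: number | null;     // null if not completed
  download_url: string | null;        // null if not completed
  download_expires_at: string | null; // null if expired
}
```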

### Versioning Section

```markdown
## API Versioning

**Current Version**: v1

### Versioning Strategy
- New major versions for breaking changes (v1, v2, etc.)
- Minor versions for additive changes (backwards compatible)
- Versions specified in URL path: `/v1/exports`

### Migration Timeline
- Old version support: Minimum 12 months after new version release
- Deprecation notice: 3 months before shutdown

### Breaking Changes
Examples of breaking changes requiring new version:
- Removing endpoints or fields
- Changing response format fundamentally
- Changing HTTP method of endpoint
```

## Writing Tips

### Be Specific About Request/Response
- Show actual JSON examples
- Document all fields (required vs. optional)
- Include data types and valid values
- Specify date/time formats (ISO8601)

### Document Error Scenarios
- List common error cases for each endpoint
- Show exact error response format
- Explain how client should handle each error
- Include HTTP status codes

### Think About Developer Experience
- Are endpoints intuitive?
- Is pagination consistent across endpoints?
- Are error messages helpful?
- Can a developer implement against this without asking questions?

### Link to Related Specs
- Reference data models: `[DATA-001]`
- Reference technical requirements: `[PRD-001]`
- Reference design docs: `[DES-001]`

### Version Your API
- Document versioning strategy
- Make it easy for clients to upgrade
- Provide migration path from old to new versions

## Validation & Fixing Issues

### Run the Validator
```bash
scripts/validate-spec.sh docs/specs/api-contract/api-001-your-spec.md
```

### Common Issues & Fixes

**Issue**: "Missing endpoint specifications"
- **Fix**: Document all endpoints with request/response examples

**Issue**: "Error handling not documented"
- **Fix**: Add status codes and error response formats

**Issue**: "No authentication section"
- **Fix**: Clearly document authentication method and authorization rules

**Issue**: "Incomplete endpoint details"
- **Fix**: Add request parameters, response examples, and error cases

## Decision-Making Framework

As you write the API spec, consider:

1. **Design**: Are endpoints intuitive and consistent?
   - Consistent URL structure?
   - Correct HTTP methods?
   - Good naming?

2. **Data**: What fields are needed in requests/responses?
   - Required vs. optional?
   - Proper data types?
   - Necessary for clients or redundant?

3. **Errors**: What can go wrong?
   - Common error cases?
   - Clear error messages?
   - Actionable feedback for developers?

4. **Performance**: Are there efficiency considerations?
   - Pagination for large result sets?
   - Filtering/search capabilities?
   - Rate limiting strategy?

5. **Evolution**: How will this API change?
   - Versioning strategy?
   - Backwards compatibility?
   - Deprecation timeline?

## Next Steps

1. **Create the spec**: `scripts/generate-spec.sh api-contract api-XXX-slug`
2. **Research**: Find related data models and technical requirements
3. **Design endpoints**: Sketch out URL structure and HTTP methods
4. **Fill in details** for each endpoint using this guide
5. **Validate**: `scripts/validate-spec.sh docs/specs/api-contract/api-XXX-slug.md`
6. **Get review** from backend and frontend teams
7. **Share with implementation teams** for development

# How to Create a Business Requirement Specification

Business Requirements (BRD) capture what problem you're solving and why it matters from a business perspective. They translate customer needs into requirements that the technical team can build against.

## Quick Start

```bash
# 1. Create a new business requirement (auto-generates next ID)
scripts/generate-spec.sh business-requirement --next descriptive-slug
# This auto-assigns the next ID (e.g., brd-002-descriptive-slug)
# File created at: docs/specs/business-requirement/brd-002-descriptive-slug.md

# 2. Fill in the sections following the guide below

# 3. Validate completeness
scripts/validate-spec.sh docs/specs/business-requirement/brd-002-descriptive-slug.md

# 4. Fix any issues and check completeness
scripts/check-completeness.sh docs/specs/business-requirement/brd-002-descriptive-slug.md
```

**Pro tip:** Use `scripts/next-id.sh business-requirement` to see what the next ID will be before creating.

## When to Write a Business Requirement

Use a Business Requirement when you need to:
- Document a new feature or capability from the user's perspective
- Articulate the business value and expected outcomes
- Define acceptance criteria that stakeholders can verify
- Create alignment across product, engineering, and business teams
- Track success after implementation with specific metrics

## Research Phase

Before writing, gather context:

### 1. Research Related Specifications
Look for related specs that inform this requirement:
```bash
# Find related business requirements
grep -r "related\|similar\|user export" docs/specs/ --include="*.md" | head -20

# Or search for related technical requirements that might already exist
scripts/list-templates.sh # See what's already documented
```

### 2. Research External Documentation & Competitive Landscape
If available, research:
- Competitor features or how similar companies solve this problem
- Industry standards or best practices
- User research or survey data
- Customer feedback or support tickets

Use web tools if needed:
```bash
# If researching web sources, use Claude's web fetch capabilities
# to pull in external docs, API documentation, or competitive analysis
```

### 3. Understand Existing Context
- Ask: What systems or processes does this impact?
- Find: Any existing specs in `docs/specs/` that are related
- Review: Recent PRs or commits related to this domain

## Structure & Content Guide

### Metadata Section
Fill in these required fields:
- **Document ID**: Use format `BRD-XXX-short-slug` (e.g., `BRD-001-user-export`)
- **Status**: Start with "Draft", moves to "In Review" → "Approved" → "Implemented"
- **Author**: Your name
- **Created**: Today's date (YYYY-MM-DD)
- **Stakeholders**: Key people involved (Product Manager, Engineering Lead, Customer Success, etc.)
- **Priority**: Critical | High | Medium | Low

### Description Section
Answer: "What is the problem and why does it matter?"

**Background**: Provide context
- What is the current situation?
- How did we identify this need?
- Who brought it up?

**Problem Statement**: Be concise and specific
- Example: "Users cannot export their data in bulk, forcing them to perform manual exports one at a time, which is time-consuming and error-prone."

### Business Value Section
Answer: "Why should we build this?"

**Expected Outcomes**: List 2-3 measurable outcomes
- Example: "Reduce manual export time by 80%"
- Example: "Increase user retention by enabling data portability"

**Strategic Alignment**: How does this support business goals?
- Example: "Aligns with our goal to improve user experience for enterprise customers"

### Stakeholders Section
Create a table identifying who needs to sign off or provide input:
- **Business Owner**: Makes final business decisions
- **Product Owner**: Gathers and prioritizes requirements
- **End Users**: The people who will use this feature
- **Technical Lead**: Ensures technical feasibility

### User Stories Section
Write 3-5 user stories following this format:
```
As a [user role],
I want to [capability],
so that [benefit/outcome]
```

**Tips for writing user stories:**
- Use real user roles from your product (not generic "user")
- Each story should be achievable in 1-3 days of work (rough estimate)
- Include acceptance criteria inline or in a separate section

**Example:**
| ID | As a... | I want to... | So that... | Priority |
|---|---|---|---|---|
| US-1 | Data Analyst | Export data as CSV | I can analyze it in Excel | High |
| US-2 | Enterprise Admin | Bulk export all user data | I can back it up and migrate to another system | High |
| US-3 | API Client | Get exports via webhook | I can automate reports | Medium |

### Assumptions Section
List what you're assuming to be true:
- "Users have stable internet connections"
- "Exported data will be less than 100MB"
- "We can leverage the existing database export functionality"

### Constraints Section
Identify limitations:
- **Business**: Budget, timeline, market windows
- **Technical**: System limitations, platform restrictions
- **Organizational**: Team capacity, skill gaps

### Dependencies Section
What needs to happen first?
- "Data privacy review must be completed"
- "Export API implementation (prd-XXX) must be finished"

### Risks Section
What could go wrong?
- Document: Risk description, likelihood (High/Med/Low), impact (High/Med/Low), and mitigation strategy

### Acceptance Criteria Section
Define "done" from a business perspective (3-5 criteria):
```
1. Users can select data types to export (users, transactions, settings)
2. Exports complete within 2 minutes for datasets up to 100MB
3. Exported data is usable in common formats (CSV, JSON)
4. Users receive email confirmation when export is ready
5. Exported data is securely deleted after 7 days
```

### Success Metrics Section
How will you measure success after launch?

| Metric | Current Baseline | Target | Measurement Method |
|--------|------------------|--------|-------------------|
| % of power users using export | 0% | 40% | Product analytics |
| Average export time | N/A | < 2 min | Server logs |
| Support tickets about exports | TBD | < 5/week | Support system |
| User satisfaction (export feature) | N/A | > 4/5 stars | In-app survey |

### Time to Value
When do you expect to see results?
- Example: "We expect 20% adoption within the first 2 weeks post-launch based on similar features"

### Approval Section
Track who has approved this requirement:
- Business Owner approval needed before engineering begins
- Product Owner approval to confirm alignment
- Technical Lead approval to confirm feasibility

## Writing Tips

### Be Specific, Not Vague
- ❌ Bad: "Users want to export their data"
- ✅ Good: "Users want to export their transaction history as CSV within a specific date range"

### Use Concrete Examples
- Describe what the feature looks like in action
- Include sample data or screenshots if possible
- Give edge cases (what about large datasets? special characters? time zones?)

### Consider the User's Perspective
- Think about: What problem are they solving with this?
- What would make them happy or frustrated?
- What alternatives might they use if you don't build this?

### Link to Other Specs
- Reference related technical requirements (if they exist): "See [prd-XXX] for implementation details"
- Reference related design docs: "See [des-XXX] for the export flow architecture"

### Complete All TODOs
- Don't leave placeholder text like "TODO: Add metrics"
- If something isn't known, explain why and what needs to happen to find out

## Validation & Fixing Issues

### Run the Validator
```bash
scripts/validate-spec.sh docs/specs/business-requirement/brd-001-your-spec.md
```

The validator checks for:
- Title and ID properly formatted
- All required sections present
- Minimum content in critical sections
- No incomplete TODO items

### Common Issues & Fixes

**Issue**: "Missing Acceptance Criteria section"
- **Fix**: Add 3-5 clear, measurable acceptance criteria that define what "done" means

**Issue**: "User Stories only has 1-2 items (minimum 3)"
- **Fix**: Add 1-2 more user stories representing different roles or scenarios

**Issue**: "TODO items in Business Value (3 items)"
- **Fix**: Complete the Business Value section with actual expected outcomes and strategic alignment

**Issue**: "No Success Metrics defined"
- **Fix**: Add a table with specific, measurable KPIs you'll track post-launch

### Check Completeness
```bash
scripts/check-completeness.sh docs/specs/business-requirement/brd-001-your-spec.md
```

This shows:
- Overall completion percentage
- Which sections still have TODOs
- Referenced documents (if any are broken, they show up here)

## Decision-Making Framework

When writing the BRD, reason through these questions:

1. **Problem**: Is this a real problem or a nice-to-have?
   - Can you trace it back to actual user feedback?
   - How many users are affected?
   - How often do they encounter this problem?

2. **Scope**: What are we NOT building?
   - Define boundaries clearly (what's in scope vs. out of scope)
   - This helps prevent scope creep

3. **Trade-offs**: What are we accepting by building this?
   - Engineering effort cost
   - Opportunity cost (what else won't we build?)
   - Maintenance burden

4. **Success**: How will we know if this was worth building?
   - What metrics matter?
   - What's the acceptable threshold for success?

5. **Risks**: What could prevent this from working?
   - Technical risks
   - User adoption risks
   - Business/market risks

## Example: Complete Business Requirement

Here's how a complete BRD section might look:

```markdown
# [BRD-001] Bulk User Data Export

## Metadata
- **Document ID**: BRD-001-bulk-export
- **Status**: Approved
- **Author**: Jane Smith
- **Created**: 2024-01-15
- **Stakeholders**: Product Manager (Jane), Engineering Lead (Bob), Support (Maria)
- **Priority**: High

## Description

### Background
Our enterprise customers have requested the ability to bulk export user data.
Currently, they can only export one user at a time via the admin panel, which is
time-consuming for customers with hundreds of users.

### Problem Statement
Enterprise customers need to audit, back up, and migrate user data, but the
current one-at-a-time export process takes hours and is error-prone.

## Business Value

### Expected Outcomes
- Reduce manual export time for enterprise customers by 80%
- Enable customers to audit user data for compliance purposes
- Support customer data portability requests

### Strategic Alignment
Aligns with our enterprise expansion goal by improving features our target
customers need for large-scale deployments.

[... rest of sections follow template ...]
```

## Next Steps

1. **Create the spec**: `scripts/generate-spec.sh business-requirement brd-XXX-slug`
2. **Fill in each section** using this guide as reference
3. **Validate**: `scripts/validate-spec.sh docs/specs/business-requirement/brd-XXX-slug.md`
4. **Fix issues** identified by the validator
5. **Get stakeholder approval** (fill in the Approval section)
6. **Share with engineering** for technical requirement creation

# How to Create a Component Specification

Component specifications document individual system components or services, including their responsibilities, interfaces, configuration, and deployment characteristics.

## Quick Start

```bash
# 1. Create a new component spec
scripts/generate-spec.sh component cmp-001-descriptive-slug

# 2. Open and fill in the file
# (The file will be created at: docs/specs/component/cmp-001-descriptive-slug.md)

# 3. Fill in the sections, then validate:
scripts/validate-spec.sh docs/specs/component/cmp-001-descriptive-slug.md

# 4. Fix issues and check completeness:
scripts/check-completeness.sh docs/specs/component/cmp-001-descriptive-slug.md
```

## When to Write a Component Specification

Use a Component Spec when you need to:
- Document a microservice or major system component
- Specify component responsibilities and interfaces
- Define configuration requirements
- Document deployment procedures
- Enable teams to understand component behavior
- Plan for monitoring and observability

## Research Phase

### 1. Research Related Specifications
Find what informed this component:

```bash
# Find design documents that reference this component
grep -r "design\|architecture" docs/specs/ --include="*.md"

# Find API contracts this component implements
grep -r "api\|endpoint" docs/specs/ --include="*.md"

# Find data models this component uses
grep -r "data\|model" docs/specs/ --include="*.md"
```

### 2. Review Similar Components
- How are other components in your system designed?
- What patterns and conventions exist?
- How are they deployed and monitored?
- What's the standard for documentation?

### 3. Understand Dependencies
- What services or systems does this component depend on?
- What services depend on this component?
- What data flows through this component?
- What are the integration points?

## Structure & Content Guide

### Title & Metadata
- **Title**: "Export Service", "User Authentication Service", etc.
- **Type**: Microservice, Library, Worker, API Gateway, etc.
- **Version**: Current version number

### Component Description

```markdown
# Export Service

The Export Service is a microservice responsible for handling bulk user data exports.
It manages the export job lifecycle: queuing, processing, storage, and delivery.

**Type**: Microservice
**Language**: Node.js + TypeScript
**Deployment**: Kubernetes (3+ replicas)
**Status**: Stable (production)
```

### Purpose & Responsibilities Section

```markdown
## Purpose

Provide reliable, scalable handling of user data exports in multiple formats
while maintaining system stability and data security.

## Primary Responsibilities

1. **Job Queueing**: Accept export requests and queue them for processing
   - Validate request parameters
   - Create export job records
   - Enqueue jobs for processing
   - Return job ID to client

2. **Job Processing**: Execute export jobs asynchronously
   - Query user data from database
   - Transform data to requested format (CSV, JSON)
   - Compress files for storage
   - Handle processing errors and retries

3. **File Storage**: Manage exported file storage and lifecycle
   - Store completed exports to S3
   - Generate secure download URLs
   - Implement TTL-based cleanup
   - Maintain export audit logs

4. **Status Tracking**: Provide job status and progress information
   - Track job state (queued, processing, completed, failed)
   - Record completion time and file metadata
   - Handle cancellation requests

5. **Error Handling**: Manage failures gracefully
   - Retry failed jobs with exponential backoff
   - Notify users of failures
   - Log errors for debugging
   - Preserve system stability during failures
```

### Interfaces & APIs Section

````markdown
## Interfaces

### REST API Endpoints

The service exposes these HTTP endpoints:

#### POST /exports
**Purpose**: Create a new export job
**Authentication**: Required (Bearer token)
**Request Body**:
```json
{
  "data_types": ["users", "transactions"],
  "format": "csv",
  "date_range": { "start": "2024-01-01", "end": "2024-01-31" }
}
```
**Response** (201 Created):
```json
{
  "id": "exp_123456",
  "status": "queued",
  "created_at": "2024-01-15T10:00:00Z"
}
```

#### GET /exports/{id}
**Purpose**: Get export job status
**Response** (200 OK):
```json
{
  "id": "exp_123456",
  "status": "completed",
  "download_url": "https://...",
  "file_size_bytes": 2048576
}
```

### Event Publishing

The service publishes events to a message queue:

**export.started**
```json
{
  "event": "export.started",
  "export_id": "exp_123456",
  "user_id": "usr_789012",
  "timestamp": "2024-01-15T10:00:00Z"
}
```

**export.completed**
```json
{
  "event": "export.completed",
  "export_id": "exp_123456",
  "file_size_bytes": 2048576,
  "format": "csv",
  "timestamp": "2024-01-15T10:05:00Z"
}
```

**export.failed**
```json
{
  "event": "export.failed",
  "export_id": "exp_123456",
  "error": "database_connection_timeout",
  "timestamp": "2024-01-15T10:05:00Z"
}
```

### Dependencies (Consumed APIs)

- **User Service API**: GET /users/{id}, GET /users (for data export)
- **Auth Service**: JWT validation
- **Notification Service**: Send export completion notifications
````

### Configuration Section

````markdown
## Configuration

### Environment Variables

| Variable | Type | Required | Description |
|----------|------|----------|-------------|
| NODE_ENV | string | Yes | Environment (dev, staging, production) |
| PORT | number | Yes | HTTP server port (default: 3000) |
| DATABASE_URL | string | Yes | PostgreSQL connection string |
| REDIS_URL | string | Yes | Redis connection for job queue |
| S3_BUCKET | string | Yes | S3 bucket for export files |
| S3_REGION | string | Yes | AWS region (e.g., us-east-1) |
| AWS_ACCESS_KEY_ID | string | Yes | AWS credentials |
| AWS_SECRET_ACCESS_KEY | string | Yes | AWS credentials |
| EXPORT_TTL_DAYS | number | No | Export file retention days (default: 7) |
| MAX_EXPORT_SIZE_MB | number | No | Maximum export file size (default: 500) |
| CONCURRENT_WORKERS | number | No | Number of concurrent job processors (default: 5) |

### Configuration File (config.json)

```json
{
  "server": {
    "port": 3000,
    "timeout_ms": 30000
  },
  "jobs": {
    "max_retries": 3,
    "retry_delay_ms": 1000,
    "timeout_ms": 300000
  },
  "export": {
    "max_file_size_mb": 500,
    "ttl_days": 7,
    "formats": ["csv", "json"]
  },
  "storage": {
    "type": "s3",
    "cleanup_interval_hours": 24
  }
}
```

### Runtime Requirements

- **Memory**: 512MB minimum, 2GB recommended
- **CPU**: 1 core minimum, 2 cores recommended
- **Disk**: 10GB for temporary files
- **Network**: Must reach PostgreSQL, Redis, S3, Auth Service
````
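
Since the component is described as Node.js + TypeScript, startup code might load and validate these variables along the following lines. This is a minimal sketch under that assumption; the `requireEnv` helper and the `config` shape are illustrative, not part of the service:

```typescript
// Fail fast at startup if a required environment variable is missing.
function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value) throw new Error(`Missing required environment variable: ${name}`);
  return value;
}

const config = {
  nodeEnv: requireEnv("NODE_ENV"),
  port: Number(process.env.PORT ?? 3000), // default from the table above
  databaseUrl: requireEnv("DATABASE_URL"),
  redisUrl: requireEnv("REDIS_URL"),
  s3Bucket: requireEnv("S3_BUCKET"),
  s3Region: requireEnv("S3_REGION"),
  // Optional variables fall back to the documented defaults.
  exportTtlDays: Number(process.env.EXPORT_TTL_DAYS ?? 7),
  maxExportSizeMb: Number(process.env.MAX_EXPORT_SIZE_MB ?? 500),
  concurrentWorkers: Number(process.env.CONCURRENT_WORKERS ?? 5),
};
```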
|
||||
|
||||
### Data Dependencies Section
|
||||
|
||||
```markdown
|
||||
## Data Dependencies
|
||||
|
||||
### Input Data
|
||||
|
||||
The service requires access to:
|
||||
- **User data**: From User Service or User DB
|
||||
- Fields: id, email, name, created_at, etc.
|
||||
- Constraints: User must be authenticated
|
||||
- Volume: Scale with user dataset
|
||||
|
||||
- **Transaction data**: From Transaction DB
|
||||
- Fields: id, user_id, amount, date, etc.
|
||||
- Volume: Can be large (100k+ per user)
|
||||
|
||||
### Output Data
|
||||
|
||||
The service produces:
|
||||
- **Export files**: CSV or JSON format
|
||||
- Stored in S3
|
||||
- Size: Up to 500MB per file
|
||||
- Retention: 7 days
|
||||
|
||||
- **Export metadata**: Stored in PostgreSQL
|
||||
- Export record with status, size, completion time
|
||||
- Audit trail of all exports
|
||||
```

### Deployment Section

```markdown
## Deployment

### Container Image

- **Base Image**: node:18-alpine
- **Build**: Dockerfile in repository root
- **Registry**: ECR (AWS Elastic Container Registry)
- **Tag**: Semver (e.g., v1.2.3), plus a `latest` convenience tag

### Kubernetes Deployment

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: export-service
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: export-service
  template:
    metadata:
      labels:
        app: export-service
    spec:
      containers:
        - name: export-service
          image: 123456789.dkr.ecr.us-east-1.amazonaws.com/export-service:v1.2.3
          ports:
            - containerPort: 3000
          resources:
            requests:
              memory: "512Mi"
              cpu: "250m"
            limits:
              memory: "2Gi"
              cpu: "1000m"
          env:
            - name: NODE_ENV
              value: "production"
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: export-service-secrets
                  key: database-url
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 3000
            initialDelaySeconds: 10
            periodSeconds: 5
```

### Deployment Steps

1. **Build**: `docker build -t export-service:v1.2.3 .`
2. **Push**: `docker push <registry>/export-service:v1.2.3`
3. **Update**: `kubectl set image deployment/export-service export-service=<registry>/export-service:v1.2.3`
4. **Verify**: `kubectl rollout status deployment/export-service`

### Rollback Procedure

```bash
# If the deployment fails, roll back to the previous version
kubectl rollout undo deployment/export-service

# Verify successful rollback
kubectl rollout status deployment/export-service
```

### Pre-Deployment Checklist

- [ ] All tests passing locally
- [ ] Database migrations run successfully
- [ ] Configuration environment variables set in staging
- [ ] Health check endpoints responding
- [ ] Metrics and logging verified
```

### Monitoring & Observability Section

```markdown
## Monitoring

### Health Checks

**Liveness Probe**: GET /health
- Returns 200 if service is running
- Used by Kubernetes to restart unhealthy pods

**Readiness Probe**: GET /ready
- Returns 200 if service is ready to receive traffic
- Checks database connectivity, Redis availability
- Used by Kubernetes for traffic routing

### Metrics

Export these Prometheus metrics:

| Metric | Type | Description |
|--------|------|-------------|
| exports_created_total | Counter | Total exports created |
| exports_completed_total | Counter | Total exports completed successfully |
| exports_failed_total | Counter | Total exports failed |
| export_duration_seconds | Histogram | Time to complete export (p50, p95, p99) |
| export_file_size_bytes | Histogram | Size of exported files |
| export_job_queue_depth | Gauge | Number of jobs awaiting processing |
| export_active_jobs | Gauge | Number of jobs currently processing |

### Alerts

Configure these alerts:

**Export Job Backlog Growing**
- Alert if `export_job_queue_depth > 100` for 5+ minutes
- Action: Scale up worker replicas

**Export Failures Increasing**
- Alert if `exports_failed_total` grows by more than 10% in 1 hour
- Action: Investigate failure logs

**Service Unhealthy**
- Alert if liveness probe fails
- Action: Restart pod, check logs

### Logging

Log format (JSON):
```json
{
  "timestamp": "2024-01-15T10:05:00Z",
  "level": "info",
  "service": "export-service",
  "export_id": "exp_123456",
  "event": "export_completed",
  "duration_ms": 5000,
  "file_size_bytes": 2048576,
  "message": "Export completed successfully"
}
```

**Log Levels**
- `debug`: Detailed debugging information
- `info`: Important operational events
- `warn`: Warning conditions (retries, slow operations)
- `error`: Error conditions (failures, exceptions)
```
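
To make the probe and metric contracts concrete, here is a sketch of how they might be wired in a Node service with Express and `prom-client`. The metric names come from the table above; the `checkPostgres`/`checkRedis` dependency checks are placeholders you would replace with real connectivity tests.

```typescript
import express from "express";
import client from "prom-client";

const app = express();

// A few of the metrics from the table above (names taken from this spec).
const exportsCreated = new client.Counter({
  name: "exports_created_total",
  help: "Total exports created",
});
const exportDuration = new client.Histogram({
  name: "export_duration_seconds",
  help: "Time to complete export",
});
const queueDepth = new client.Gauge({
  name: "export_job_queue_depth",
  help: "Number of jobs awaiting processing",
});

// Placeholder dependency checks; a real service would run SELECT 1 / PING.
async function checkPostgres(): Promise<boolean> { return true; }
async function checkRedis(): Promise<boolean> { return true; }

// Liveness: the process is up.
app.get("/health", (_req, res) => res.sendStatus(200));

// Readiness: dependencies are reachable.
app.get("/ready", async (_req, res) => {
  const ok = (await checkPostgres()) && (await checkRedis());
  res.sendStatus(ok ? 200 : 503);
});

// Prometheus scrape endpoint.
app.get("/metrics", async (_req, res) => {
  res.set("Content-Type", client.register.contentType);
  res.send(await client.register.metrics());
});
```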

### Dependencies & Integration Section

```markdown
## Dependencies

### Service Dependencies

| Service | Purpose | Criticality | Failure Impact |
|---------|---------|-------------|----------------|
| PostgreSQL | Export job storage | Critical | Service down |
| Redis | Job queue | Critical | Exports won't process |
| S3 | Export file storage | Critical | Can't store exports |
| Auth Service | JWT validation | Critical | Can't validate requests |
| User Service | User data source | Critical | Can't export user data |
| Notification Service | Email notifications | Optional | Users won't get notifications |

### External Dependencies

- **AWS S3**: For file storage and retrieval
- **PostgreSQL**: For export metadata
- **Redis**: For job queue
- **Kubernetes**: For orchestration

### Fallback Strategies

- Redis unavailable: Use in-memory queue (single instance only)
- User Service unavailable: Fail export with "upstream_error"
- S3 unavailable: Retry with exponential backoff, max 3 times
```
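
The S3 fallback strategy above translates directly into code. A minimal sketch using the AWS SDK v3; the spec fixes only the attempt count, so the 1s/2s/4s delays here are an assumption, as is reading "max 3 times" as three total attempts.

```typescript
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({ region: "us-east-1" });

// Retry an S3 upload with exponential backoff (1s, 2s between attempts).
async function uploadWithRetry(bucket: string, key: string, body: Buffer): Promise<void> {
  const maxAttempts = 3;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await s3.send(new PutObjectCommand({ Bucket: bucket, Key: key, Body: body }));
      return;
    } catch (err) {
      if (attempt === maxAttempts) throw err; // surfaces as an export failure
      const delayMs = 1000 * 2 ** (attempt - 1);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```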

### Performance & SLA Section

```markdown
## Performance Characteristics

### Throughput
- Process up to 1000 exports per day
- Handle 100 concurrent job workers
- Queue depth auto-scales based on load

### Latency
- Create export job: < 100ms (p95)
- Process 100MB export: 3-5 minutes average
- Query export status: < 50ms (p95)

### Resource Usage
- Memory: 800MB average, peaks at 1.5GB
- CPU: 25% average, peaks at 60%
- Disk (temp): 50GB for concurrent exports

### Service Level Objectives (SLOs)

| Objective | Target |
|-----------|--------|
| Availability | 99.5% uptime |
| Error Rate | < 0.1% |
| p95 Latency (status query) | < 100ms |
| Export Completion | < 10 minutes for 100MB |

### Scalability

- Horizontal: Add more pods for higher throughput
- Vertical: Increase pod memory/CPU for larger exports
- Maximum tested: 10k exports/day on a 5-pod cluster
```

## Writing Tips

### Be Specific About Responsibilities
- What does this component do?
- What does it NOT do?
- Where do responsibilities start/stop?

### Document All Interfaces
- REST APIs? Document endpoints and schemas
- Message queues? Show event formats
- Database? Show schema and queries
- Dependencies? Show what's called and how

### Include Deployment Details
- How is it deployed (containers, VMs, serverless)?
- Configuration required?
- Health checks?
- Monitoring setup?

### Link to Related Specs
- Reference design docs: `[DES-001]`
- Reference API contracts: `[API-001]`
- Reference data models: `[DATA-001]`
- Reference deployment procedures: `[DEPLOY-001]`

### Document Failure Modes
- What happens if dependencies fail?
- How does the component recover?
- What alerts fire when things go wrong?

## Validation & Fixing Issues

### Run the Validator
```bash
scripts/validate-spec.sh docs/specs/component/cmp-001-your-spec.md
```

### Common Issues & Fixes

**Issue**: "Missing Interfaces section"
- **Fix**: Document all APIs, event formats, and data contracts

**Issue**: "Configuration incomplete"
- **Fix**: Add environment variables, configuration files, and runtime requirements

**Issue**: "No Monitoring section"
- **Fix**: Add health checks, metrics, alerts, and logging strategy

**Issue**: "Deployment steps unclear"
- **Fix**: Add step-by-step deployment and rollback procedures

## Decision-Making Framework

When writing a component spec, consider:

1. **Boundaries**: What is this component's responsibility?
   - What does it own?
   - What does it depend on?
   - Where are boundaries clear?

2. **Interfaces**: How will others interact with this?
   - REST, gRPC, events, direct calls?
   - What contracts must be maintained?
   - How do we evolve interfaces?

3. **Configuration**: What's configurable vs. hardcoded?
   - Environment-specific settings?
   - Runtime tuning parameters?
   - Feature flags?

4. **Operations**: How will we run this in production?
   - Deployment model?
   - Monitoring and alerting?
   - Failure recovery?

5. **Scale**: How much can this component handle?
   - Throughput limits?
   - Scaling strategy?
   - Resource requirements?

## Next Steps

1. **Create the spec**: `scripts/generate-spec.sh component cmp-XXX-slug`
2. **Research**: Find design docs and existing components
3. **Define responsibilities** and boundaries
4. **Document interfaces** for all interactions
5. **Plan deployment** and monitoring
6. **Validate**: `scripts/validate-spec.sh docs/specs/component/cmp-XXX-slug.md`
7. **Share with architecture/ops** before implementation
707
skills/spec-author/guides/configuration-schema.md
Normal file
@@ -0,0 +1,707 @@

# How to Create a Configuration Schema Specification

Configuration schema specifications document all configurable parameters for a system, including their types, valid values, defaults, and impact.

## Quick Start

```bash
# 1. Create a new configuration schema
scripts/generate-spec.sh configuration-schema config-001-descriptive-slug

# 2. Open and fill in the file
# (The file will be created at: docs/specs/configuration-schema/config-001-descriptive-slug.md)

# 3. Fill in configuration fields and validation rules, then validate:
scripts/validate-spec.sh docs/specs/configuration-schema/config-001-descriptive-slug.md

# 4. Fix issues and check completeness:
scripts/check-completeness.sh docs/specs/configuration-schema/config-001-descriptive-slug.md
```

## When to Write a Configuration Schema

Use a Configuration Schema when you need to:
- Document all configurable system parameters
- Specify environment variables and their meanings
- Define configuration file formats
- Document validation rules and constraints
- Enable operations teams to configure systems safely
- Provide examples for different environments

## Research Phase

### 1. Research Related Specifications
Find what you're configuring:

```bash
# Find component specs
grep -r "component" docs/specs/ --include="*.md"

# Find deployment procedures
grep -r "deploy" docs/specs/ --include="*.md"

# Find existing configuration specs
grep -r "config" docs/specs/ --include="*.md"
```

### 2. Understand Configuration Needs
- What aspects of the system need to be configurable?
- What differs between environments (dev, staging, prod)?
- What can change at runtime vs. requires a restart?
- What's sensitive (secrets, credentials)?

### 3. Review Existing Configurations
- How are other services configured?
- What configuration format is used?
- What environment variables exist?
- What patterns should be followed?

## Structure & Content Guide

### Title & Metadata
- **Title**: "Export Service Configuration", "API Gateway Config", etc.
- **Component**: What component is being configured
- **Version**: Configuration format version
- **Status**: Current, Deprecated, etc.

### Overview Section

```markdown
# Export Service Configuration Schema

## Summary
Defines all configurable parameters for the Export Service microservice.
Configuration can be set via environment variables or a JSON config file.

**Configuration Methods**:
- Environment variables (recommended for Docker/Kubernetes)
- config.json file (for monolithic deployments)
- Command-line arguments (for local development)

**Scope**: All settings that affect Export Service behavior
**Format**: JSON Schema compliant
```

### Configuration Methods Section

```markdown
## Configuration Methods

### Method 1: Environment Variables (Recommended for Production)
Used in containerized deployments (Docker, Kubernetes).
Set before starting the service.

**Syntax**: `EXPORT_SERVICE_KEY=value`

**Example**:
```bash
export EXPORT_SERVICE_PORT=3000
export EXPORT_SERVICE_LOG_LEVEL=info
export EXPORT_SERVICE_DATABASE_URL=postgresql://user:pass@host/db
```

### Method 2: Configuration File (config.json)
Used in monolithic or local deployments.
JSON format with hierarchical structure.

**Location**: `./config.json` in working directory

**Example**:
```json
{
  "server": {
    "port": 3000,
    "timeout_ms": 30000
  },
  "database": {
    "url": "postgresql://user:pass@host/db",
    "pool_size": 10
  }
}
```

### Method 3: Command-Line Arguments
Used in local development. Takes precedence over environment and file config.

**Syntax**: `--key value` or `--key=value`

**Example**:
```bash
node index.js --port 3000 --log-level debug
```

### Precedence (Priority Order)
1. Command-line arguments (highest priority)
2. Environment variables
3. config.json file
4. Default values (lowest priority)
```
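
One way to implement this precedence is to check each source in priority order and fall through. The sketch below resolves a single setting (the port); the flag parsing is deliberately naive, and the file location mirrors the `./config.json` convention above.

```typescript
import { readFileSync } from "node:fs";

// Resolve one setting according to the precedence above:
// CLI flag > environment variable > config file > default.
function resolvePort(argv: string[]): number {
  const flagIndex = argv.indexOf("--port");
  if (flagIndex !== -1 && argv[flagIndex + 1]) {
    return Number(argv[flagIndex + 1]); // 1. command-line argument
  }
  if (process.env.EXPORT_SERVICE_PORT) {
    return Number(process.env.EXPORT_SERVICE_PORT); // 2. environment variable
  }
  try {
    const file = JSON.parse(readFileSync("./config.json", "utf8"));
    if (file.server?.port) return file.server.port; // 3. config.json
  } catch {
    // no config.json present; fall through to the default
  }
  return 3000; // 4. default value
}

console.log(resolvePort(process.argv.slice(2)));
```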

### Configuration Fields Section

Document each configuration field:

```markdown
## Configuration Fields

### Server Section

#### PORT
- **Type**: integer
- **Default**: 3000
- **Range**: 1024-65535
- **Environment Variable**: `EXPORT_SERVICE_PORT`
- **Config File Key**: `server.port`
- **Description**: HTTP server listening port
- **Examples**:
  - Development: 3000 (local machine, different services use different ports)
  - Production: 3000 (behind load balancer, port not exposed)
- **Impact**: Service not reachable if port already in use
- **Can Change at Runtime**: No (requires restart)

#### TIMEOUT_MS
- **Type**: integer (milliseconds)
- **Default**: 30000 (30 seconds)
- **Range**: 5000-120000
- **Environment Variable**: `EXPORT_SERVICE_TIMEOUT_MS`
- **Config File Key**: `server.timeout_ms`
- **Description**: HTTP request timeout
- **Considerations**:
  - Must be longer than the longest export duration
  - If too short: Long exports time out and fail
  - If too long: Failed connections hang longer
- **Examples**:
  - Development: 30000 (quick feedback on errors)
  - Production: 120000 (accounts for large exports)

#### ENABLE_COMPRESSION
- **Type**: boolean
- **Default**: true
- **Environment Variable**: `EXPORT_SERVICE_ENABLE_COMPRESSION`
- **Config File Key**: `server.enable_compression`
- **Description**: Enable HTTP response compression (gzip)
- **Considerations**:
  - Reduces bandwidth but increases CPU usage
  - Should be true unless CPU constrained
- **Typical Value**: true

### Database Section

#### DATABASE_URL
- **Type**: string (connection string)
- **Default**: None (required)
- **Environment Variable**: `EXPORT_SERVICE_DATABASE_URL`
- **Config File Key**: `database.url`
- **Format**: `postgresql://user:password@host:port/database`
- **Description**: PostgreSQL connection string
- **Examples**:
  - Development: `postgresql://localhost/export_service`
  - Staging: `postgresql://stage-db.example.com/export_stage`
  - Production: `postgresql://prod-db.example.com/export_prod` (managed RDS)
- **Sensitive**: Yes (contains credentials - use secrets management)
- **Required**: Yes
- **Validation**:
  - Must be a valid PostgreSQL connection string
  - Service fails to start if the URL is invalid or unreachable

#### DATABASE_POOL_SIZE
- **Type**: integer
- **Default**: 10
- **Range**: 1-100
- **Environment Variable**: `EXPORT_SERVICE_DATABASE_POOL_SIZE`
- **Config File Key**: `database.pool_size`
- **Description**: Number of database connections to maintain
- **Considerations**:
  - More connections allow more concurrent queries
  - Each connection uses memory and a database slot
  - The database has a max_connections limit (typically 100-500)
- **Tuning**:
  - 1 service instance: 5-10 connections
  - 5 service instances: 2-4 connections each (25-40 total)
  - Kubernetes auto-scaling: 2-3 per pod

#### DATABASE_QUERY_TIMEOUT_MS
- **Type**: integer (milliseconds)
- **Default**: 10000 (10 seconds)
- **Range**: 1000-60000
- **Environment Variable**: `EXPORT_SERVICE_DATABASE_QUERY_TIMEOUT_MS`
- **Config File Key**: `database.query_timeout_ms`
- **Description**: Timeout for individual database queries
- **Considerations**:
  - Export queries can take several seconds for large datasets
  - If too short: Queries fail prematurely
  - If too long: Failed queries block the connection pool
- **Typical Values**:
  - Simple queries: 5000ms
  - Large exports: 30000ms

### Redis (Job Queue) Section

#### REDIS_URL
- **Type**: string (connection string)
- **Default**: None (required)
- **Environment Variable**: `EXPORT_SERVICE_REDIS_URL`
- **Config File Key**: `redis.url`
- **Format**: `redis://user:password@host:port/db`
- **Description**: Redis connection string for job queue
- **Examples**:
  - Development: `redis://localhost:6379/0`
  - Staging: `redis://redis-stage.example.com:6379/0`
  - Production: `redis://redis-prod.example.com:6379/0` (managed ElastiCache)
- **Sensitive**: Yes (may contain credentials)
- **Required**: Yes

#### REDIS_MAX_RETRIES
- **Type**: integer
- **Default**: 3
- **Range**: 1-10
- **Environment Variable**: `EXPORT_SERVICE_REDIS_MAX_RETRIES`
- **Config File Key**: `redis.max_retries`
- **Description**: Maximum retry attempts for Redis operations
- **Considerations**:
  - More retries provide resilience but increase latency on failure
  - Should be 3-5 for production
- **Typical Value**: 3

#### CONCURRENT_WORKERS
- **Type**: integer
- **Default**: 3
- **Range**: 1-20
- **Environment Variable**: `EXPORT_SERVICE_CONCURRENT_WORKERS`
- **Config File Key**: `redis.concurrent_workers`
- **Description**: Number of concurrent export workers
- **Considerations**:
  - Each worker processes one export job at a time
  - More workers process jobs faster but use more resources
  - Limited by available CPU and memory
  - Kubernetes scales pods, not this setting
- **Tuning**:
  - Development: 1-2 (for debugging)
  - Production with 2 CPU: 2-3 workers
  - Production with 4+ CPU: 4-8 workers
### Export Section

#### MAX_EXPORT_SIZE_MB
- **Type**: integer
- **Default**: 500
- **Range**: 10-5000
- **Environment Variable**: `EXPORT_SERVICE_MAX_EXPORT_SIZE_MB`
- **Config File Key**: `export.max_export_size_mb`
- **Description**: Maximum size for an export file (in MB)
- **Considerations**:
  - Files larger than this are rejected
  - Limited by disk space and memory
  - Should match S3 bucket policies
- **Typical Values**:
  - Small deployments: 100MB
  - Standard: 500MB
  - Enterprise: 1000-5000MB

#### EXPORT_TTL_DAYS
- **Type**: integer (days)
- **Default**: 7
- **Range**: 1-365
- **Environment Variable**: `EXPORT_SERVICE_EXPORT_TTL_DAYS`
- **Config File Key**: `export.ttl_days`
- **Description**: How long to retain export files after completion
- **Considerations**:
  - Files are deleted after the TTL expires
  - Affects storage costs (shorter TTL = lower cost)
  - Users must download before expiration
- **Typical Values**:
  - Short retention: 3 days (reduce storage cost)
  - Standard: 7 days (reasonable download window)
  - Long retention: 30 days (enterprise customers)

#### EXPORT_FORMATS
- **Type**: array of strings
- **Default**: ["csv", "json"]
- **Valid Values**: "csv", "json", "parquet"
- **Environment Variable**: `EXPORT_SERVICE_EXPORT_FORMATS` (comma-separated)
- **Config File Key**: `export.formats`
- **Description**: Supported export file formats
- **Examples**:
  - `["csv", "json"]` (most common)
  - `["csv", "json", "parquet"]` (full support)
- **Configuration**:
  - Environment: `EXPORT_SERVICE_EXPORT_FORMATS=csv,json`
  - File: `"formats": ["csv", "json"]`

#### COMPRESSION_ENABLED
- **Type**: boolean
- **Default**: true
- **Environment Variable**: `EXPORT_SERVICE_COMPRESSION_ENABLED`
- **Config File Key**: `export.compression_enabled`
- **Description**: Enable gzip compression for export files
- **Considerations**:
  - Typically reduces file size by 60-80%
  - Increases CPU usage during export
  - Should be enabled unless CPU is the bottleneck
- **Typical Value**: true

### Storage Section

#### S3_BUCKET
- **Type**: string
- **Default**: None (required)
- **Environment Variable**: `EXPORT_SERVICE_S3_BUCKET`
- **Config File Key**: `storage.s3_bucket`
- **Description**: AWS S3 bucket for storing export files
- **Format**: `bucket-name` (no s3:// prefix)
- **Examples**:
  - Development: `export-service-dev`
  - Staging: `export-service-stage`
  - Production: `export-service-prod`
- **Required**: Yes
- **IAM Requirements**: Service role must have s3:PutObject, s3:GetObject

#### S3_REGION
- **Type**: string
- **Default**: `us-east-1`
- **Valid Values**: Any AWS region (us-east-1, eu-west-1, etc.)
- **Environment Variable**: `EXPORT_SERVICE_S3_REGION`
- **Config File Key**: `storage.s3_region`
- **Description**: AWS region for S3 bucket
- **Examples**:
  - us-east-1 (US East - Virginia)
  - eu-west-1 (EU - Ireland)

### Logging Section

#### LOG_LEVEL
- **Type**: string (enum)
- **Default**: "info"
- **Valid Values**: "debug", "info", "warn", "error"
- **Environment Variable**: `EXPORT_SERVICE_LOG_LEVEL`
- **Config File Key**: `logging.level`
- **Description**: Logging verbosity level
- **Examples**:
  - Development: "debug" (verbose, detailed logs)
  - Staging: "info" (normal level)
  - Production: "info" or "warn" (minimal logs, better performance)
- **Considerations**:
  - debug: Very verbose, affects performance
  - info: Standard operational logs
  - warn: Only warnings and errors
  - error: Only errors

#### LOG_FORMAT
- **Type**: string (enum)
- **Default**: "json"
- **Valid Values**: "json", "text"
- **Environment Variable**: `EXPORT_SERVICE_LOG_FORMAT`
- **Config File Key**: `logging.format`
- **Description**: Log output format
- **Examples**:
  - json: Machine-parseable JSON logs (recommended for production)
  - text: Human-readable text (good for development)

### Feature Flags Section

#### FEATURE_PARQUET_EXPORT
- **Type**: boolean
- **Default**: false
- **Environment Variable**: `EXPORT_SERVICE_FEATURE_PARQUET_EXPORT`
- **Config File Key**: `features.parquet_export`
- **Description**: Enable experimental Parquet export format
- **Considerations**:
  - Set to false for stable deployments
  - Set to true in staging for testing
  - Disabled by default in production
- **Typical Values**:
  - Development: true (test the new feature)
  - Staging: true (validate before production)
  - Production: false (disabled until stable)
```
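
Two of the conventions above deserve a concrete illustration: boolean coercion and comma-separated arrays in environment variables. The helpers below are hypothetical, but they match the formats this schema describes.

```typescript
// Illustrative helpers for typed environment values.
function boolEnv(name: string, fallback: boolean): boolean {
  const raw = process.env[name];
  if (raw === undefined) return fallback;
  return raw === "true" || raw === "1"; // matches the coercion rules below
}

function listEnv(name: string, fallback: string[]): string[] {
  const raw = process.env[name];
  if (!raw) return fallback;
  return raw.split(",").map((s) => s.trim()).filter(Boolean);
}

const compressionEnabled = boolEnv("EXPORT_SERVICE_COMPRESSION_ENABLED", true);
const formats = listEnv("EXPORT_SERVICE_EXPORT_FORMATS", ["csv", "json"]);
console.log({ compressionEnabled, formats });
```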

### Validation Rules Section

```markdown
## Validation & Constraints

### Required Fields
These fields must be provided (no default value):
- `DATABASE_URL` - PostgreSQL connection string required
- `REDIS_URL` - Redis connection required
- `S3_BUCKET` - S3 bucket must be specified

### Type Validation
- Integers: Must be valid numeric values
- Booleans: Accept true, false, "true", "false", 1, 0
- Strings: Must not be empty (unless explicitly optional)
- Arrays: Comma-separated in environment variables, JSON arrays in the config file

### Range Validation
- PORT: 1024-65535 (avoid system ports)
- POOL_SIZE: 1-100 (reasonable connection pool)
- TIMEOUT_MS: 5000-120000 (between 5 seconds and 2 minutes)
- MAX_EXPORT_SIZE_MB: 10-5000 (reasonable file sizes)

### Format Validation
- DATABASE_URL: Must be a valid PostgreSQL connection string
- S3_BUCKET: Must follow S3 naming rules (lowercase, hyphens only)
- S3_REGION: Must be a valid AWS region code

### Interdependency Rules
- If COMPRESSION_ENABLED=true: MAX_EXPORT_SIZE_MB can be larger
- If MAX_EXPORT_SIZE_MB > 100: DATABASE_QUERY_TIMEOUT_MS should be > 10000
- If CONCURRENT_WORKERS > 5: Memory requirements increase significantly

### Error Cases
What happens if validation fails:
- Service fails to start with a validation error
- The specific field and the reason for the failure are logged
- The error message includes the valid range/values
```
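
A startup validator for these rules might look like the following sketch. The `ExportConfig` shape and error strings are illustrative; the point is the fail-fast behavior the Error Cases section calls for (log every offending field, then refuse to start).

```typescript
// Hypothetical config shape; a real service would validate every field.
interface ExportConfig {
  port: number;
  databaseUrl: string;
  s3Bucket: string;
  maxExportSizeMb: number;
}

function validate(config: ExportConfig): string[] {
  const errors: string[] = [];
  if (!config.databaseUrl) errors.push("DATABASE_URL is required");
  if (!config.s3Bucket) errors.push("S3_BUCKET is required");
  if (config.port < 1024 || config.port > 65535)
    errors.push(`PORT must be 1024-65535, got ${config.port}`);
  if (config.maxExportSizeMb < 10 || config.maxExportSizeMb > 5000)
    errors.push(`MAX_EXPORT_SIZE_MB must be 10-5000, got ${config.maxExportSizeMb}`);
  return errors;
}

const errors = validate({ port: 3000, databaseUrl: "", s3Bucket: "exports", maxExportSizeMb: 500 });
if (errors.length > 0) {
  // Log each field and reason, then refuse to start.
  errors.forEach((e) => console.error(`config error: ${e}`));
  process.exit(1);
}
```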

### Environment-Specific Configurations Section

```markdown
## Environment-Specific Configurations

### Development Environment

```json
{
  "server": {
    "port": 3000,
    "timeout_ms": 30000
  },
  "database": {
    "url": "postgresql://localhost/export_service",
    "pool_size": 5
  },
  "redis": {
    "url": "redis://localhost:6379/0",
    "concurrent_workers": 1
  },
  "export": {
    "max_export_size_mb": 100,
    "ttl_days": 7,
    "formats": ["csv", "json"]
  },
  "logging": {
    "level": "debug",
    "format": "text"
  },
  "features": {
    "parquet_export": false
  }
}
```

**Notes**:
- Runs locally with minimal resources
- Verbose logging for debugging
- Limited concurrent workers (1)
- Smaller max export size for testing

### Staging Environment

```bash
EXPORT_SERVICE_PORT=3000
EXPORT_SERVICE_DATABASE_URL=postgresql://stage-db.example.com/export_stage
EXPORT_SERVICE_REDIS_URL=redis://redis-stage.example.com:6379/0
EXPORT_SERVICE_S3_BUCKET=export-service-stage
EXPORT_SERVICE_S3_REGION=us-east-1
EXPORT_SERVICE_LOG_LEVEL=info
EXPORT_SERVICE_LOG_FORMAT=json
EXPORT_SERVICE_CONCURRENT_WORKERS=3
EXPORT_SERVICE_MAX_EXPORT_SIZE_MB=500
EXPORT_SERVICE_FEATURE_PARQUET_EXPORT=true
```

**Notes**:
- Tests new features before production
- Similar resources to production
- Parquet export enabled for testing

### Production Environment

```bash
EXPORT_SERVICE_PORT=3000
EXPORT_SERVICE_DATABASE_URL=<from AWS Secrets Manager>
EXPORT_SERVICE_REDIS_URL=<from AWS Secrets Manager>
EXPORT_SERVICE_S3_BUCKET=export-service-prod
EXPORT_SERVICE_S3_REGION=us-east-1
EXPORT_SERVICE_LOG_LEVEL=info
EXPORT_SERVICE_LOG_FORMAT=json
EXPORT_SERVICE_CONCURRENT_WORKERS=4
EXPORT_SERVICE_DATABASE_POOL_SIZE=3
EXPORT_SERVICE_MAX_EXPORT_SIZE_MB=500
EXPORT_SERVICE_EXPORT_TTL_DAYS=7
EXPORT_SERVICE_FEATURE_PARQUET_EXPORT=false
```

**Notes**:
- Credentials from the secrets manager
- Optimized for performance and reliability
- Experimental features disabled
- Standard deployment settings
```

### Configuration Examples Section

```markdown
## Complete Configuration Examples

### Minimal Configuration (Development)
```bash
# Minimal settings needed to run locally
export EXPORT_SERVICE_DATABASE_URL=postgresql://localhost/export_service
export EXPORT_SERVICE_REDIS_URL=redis://localhost:6379/0
export EXPORT_SERVICE_S3_BUCKET=export-service-local
export EXPORT_SERVICE_S3_REGION=us-east-1
```

### High-Throughput Configuration (Production)
```bash
# Optimized for maximum throughput
export EXPORT_SERVICE_CONCURRENT_WORKERS=8
export EXPORT_SERVICE_DATABASE_POOL_SIZE=5
export EXPORT_SERVICE_MAX_EXPORT_SIZE_MB=1000
export EXPORT_SERVICE_COMPRESSION_ENABLED=true
export EXPORT_SERVICE_EXPORT_TTL_DAYS=30
```

### Low-Resource Configuration (Cost-Optimized)
```bash
# Minimizes resource usage and cost
export EXPORT_SERVICE_CONCURRENT_WORKERS=1
export EXPORT_SERVICE_DATABASE_POOL_SIZE=2
export EXPORT_SERVICE_MAX_EXPORT_SIZE_MB=100
export EXPORT_SERVICE_EXPORT_TTL_DAYS=1
export EXPORT_SERVICE_LOG_LEVEL=warn
```
```

### Secrets Management Section

```markdown
## Handling Sensitive Configuration

### Sensitive Fields
These fields contain credentials or sensitive information:
- DATABASE_URL (contains password)
- REDIS_URL (may contain password)
- AWS credentials (if not using IAM roles)

### Security Best Practices
1. **Never commit secrets to git**
   - Use .gitignore to exclude config files with secrets
   - Use environment variables instead

2. **Use Secrets Management**
   - AWS Secrets Manager (recommended for production)
   - HashiCorp Vault (for multi-team deployments)
   - Kubernetes Secrets (for K8s deployments)

3. **Rotate Credentials**
   - Rotate database passwords regularly
   - Rotate AWS API keys
   - Update the service after rotation

4. **Limit Access**
   - Only the operations team can see production credentials
   - Audit logs track who accessed which credentials
   - Use IAM roles instead of static credentials when possible

### Example: Using AWS Secrets Manager
```bash
# In a Kubernetes deployment, inject from AWS Secrets Manager
DATABASE_URL=$(aws secretsmanager get-secret-value \
  --secret-id export-service/db-url \
  --query SecretString --output text)

export EXPORT_SERVICE_DATABASE_URL=$DATABASE_URL
```
```
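
A service can also fetch secrets directly at startup instead of relying on an injected environment variable. Here is a sketch using the AWS SDK v3 Secrets Manager client, reusing the `export-service/db-url` secret id from the CLI example above; the region is an assumption.

```typescript
import {
  SecretsManagerClient,
  GetSecretValueCommand,
} from "@aws-sdk/client-secrets-manager";

// Fetch the database URL at startup rather than baking it into the environment.
async function loadDatabaseUrl(): Promise<string> {
  const client = new SecretsManagerClient({ region: "us-east-1" });
  const result = await client.send(
    new GetSecretValueCommand({ SecretId: "export-service/db-url" })
  );
  if (!result.SecretString) throw new Error("Secret has no string value");
  return result.SecretString;
}
```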

## Writing Tips

### Be Clear About Scope
- What can users configure?
- What's fixed/non-configurable and why?
- What requires a restart vs. hot reload?

### Provide Realistic Examples
- Show real values, not placeholders
- Include examples for different environments
- Show both correct and incorrect formats

### Document Trade-offs
- Why choose certain defaults?
- What's the impact of changing values?
- What happens if a value is too high/low?

### Include Validation
- What values are valid?
- What happens if invalid values are provided?
- How do users know if the config is wrong?

### Think About Operations
- What configuration might ops teams want to change?
- What parameters help troubleshoot issues?
- What can be tuned for performance?

## Validation & Fixing Issues

### Run the Validator
```bash
scripts/validate-spec.sh docs/specs/configuration-schema/config-001-your-spec.md
```

### Common Issues & Fixes

**Issue**: "Configuration fields lack descriptions"
- **Fix**: Add purpose, examples, and impact for each field

**Issue**: "No validation rules documented"
- **Fix**: Document valid ranges, formats, required fields

**Issue**: "No environment-specific examples"
- **Fix**: Add configurations for dev, staging, and production

**Issue**: "Sensitive fields not highlighted"
- **Fix**: Clearly mark sensitive fields and document secrets management

## Decision-Making Framework

When designing a configuration schema:

1. **Scope**: What should be configurable?
   - Environment-specific settings?
   - Performance tuning parameters?
   - Feature flags?
   - Operational settings?

2. **Defaults**: What are good default values?
   - Production-safe defaults?
   - Development-friendly for new users?
   - Documented reasoning?

3. **Flexibility**: How much should users configure?
   - Too much: Confusing, hard to troubleshoot
   - Too little: Can't adapt to needs
   - Right amount: Common use cases covered

4. **Safety**: How do we prevent misconfiguration?
   - Validation rules?
   - Error messages?
   - Documentation of constraints?

5. **Evolution**: How will configuration change?
   - Backward compatibility?
   - Migration path for old configs?
   - Deprecation timeline?

## Next Steps

1. **Create the spec**: `scripts/generate-spec.sh configuration-schema config-XXX-slug`
2. **List fields**: What can be configured?
3. **Document each field** with type, default, range, impact
4. **Provide examples** for different environments
5. **Document validation** rules and constraints
6. **Validate**: `scripts/validate-spec.sh docs/specs/configuration-schema/config-XXX-slug.md`
7. **Share with operations team** for feedback
490
skills/spec-author/guides/data-model.md
Normal file
@@ -0,0 +1,490 @@

# How to Create a Data Model Specification

Data Model specifications document the entities, fields, relationships, and constraints for your application's data. They define the "shape" of data your system works with.

## Quick Start

```bash
# 1. Create a new data model
scripts/generate-spec.sh data-model data-001-descriptive-slug

# 2. Open and fill in the file
# (The file will be created at: docs/specs/data-model/data-001-descriptive-slug.md)

# 3. Fill in entities and relationships, then validate:
scripts/validate-spec.sh docs/specs/data-model/data-001-descriptive-slug.md

# 4. Fix issues and check completeness:
scripts/check-completeness.sh docs/specs/data-model/data-001-descriptive-slug.md
```

## When to Write a Data Model

Use a Data Model when you need to:
- Define database schema for new features
- Document entity relationships and constraints
- Establish consistent naming conventions
- Enable API/UI teams to understand data structure
- Plan data migrations or refactoring
- Document complex data relationships

## Research Phase

### 1. Research Related Specifications
Find what this data model supports:

```bash
# Find technical requirements this fulfills
grep -r "prd\|technical" docs/specs/ --include="*.md"

# Find existing data models that might be related
grep -r "data\|model" docs/specs/ --include="*.md"

# Find API contracts that expose this data
grep -r "api\|endpoint" docs/specs/ --include="*.md"
```

### 2. Review Existing Data Models
- What data modeling patterns does your codebase use?
- What database are you using (PostgreSQL, MongoDB, etc.)?
- How are relationships currently modeled?
- Naming conventions for fields and entities?
- Any legacy schema patterns to respect or migrate from?

### 3. Research Domain Models
- How do industry-standard models structure similar data?
- Are there existing standards (e.g., ISO, RFC) you should follow?
- What are best practices in this domain?

### 4. Understand Business Rules
- What constraints must the data satisfy?
- What are the cardinality rules (one-to-many, many-to-many)?
- What data must be unique or required?
- What's the expected scale/volume?

## Structure & Content Guide

### Title & Metadata
- **Title**: "User Data Model", "Transaction Model", etc.
- **Scope**: What entities does this model cover?
- **Version**: 1.0 for new models

### Overview Section
Provide context:

```markdown
# User & Profile Data Model

This data model defines the core entities for user management and profile
information. Covers user accounts, authentication data, and user preferences.

**Entities**: User, UserProfile, UserPreference
**Relationships**: User → UserProfile (1:1), User → UserPreference (1:many)
**Primary Database**: PostgreSQL
```

### Entity Definitions Section

Document each entity/table:

#### Entity: User

```markdown
### User

Core user account entity. Every user must have exactly one User record.

**Purpose**: Represents a user account in the system.

**Fields**

| Field | Type | Required | Unique | Default | Description |
|-------|------|----------|--------|---------|-------------|
| id | UUID | Yes | Yes | auto | Primary key, auto-generated |
| email | String(255) | Yes | Yes | - | User's email address, used for login |
| password_hash | String(255) | Yes | No | - | Bcrypt hash of password (cost=12) |
| first_name | String(100) | No | No | - | User's first name |
| last_name | String(100) | No | No | - | User's last name |
| status | Enum | Yes | No | active | Account status: active, inactive, suspended |
| created_at | Timestamp | Yes | No | now() | Account creation time (UTC) |
| updated_at | Timestamp | Yes | No | now() | Last update time (UTC) |
| deleted_at | Timestamp | No | No | NULL | Soft-delete timestamp, NULL if active |

**Indexes**
- Unique: `email` (for fast login lookups)
- Secondary: `created_at` (for user listing/pagination)
- Secondary: `status` (for filtering active users)

**Constraints**
- Email format must be valid (enforced in application)
- Password must be at least 8 characters (enforced in application)
- Email must be globally unique
- Status can only be: active, inactive, suspended

**Data Volume**
- Expected growth: 100 new users/day
- Estimated year 1: ~36k users
- Estimated year 3: ~150k users

**Archival Strategy**
- Deleted users (deleted_at != NULL) moved to archive after 1 year
- Soft deletes used for data recovery capability
```
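
For the API/UI teams this spec is meant to serve, it can help to include a language-level mirror of the fields table. A TypeScript sketch (the interface is illustrative, not generated from the schema):

```typescript
// Illustrative TypeScript mirror of the User fields table above.
type UserStatus = "active" | "inactive" | "suspended";

interface User {
  id: string;           // UUID primary key
  email: string;        // unique, used for login
  passwordHash: string; // bcrypt (cost=12); never expose via API
  firstName?: string;
  lastName?: string;
  status: UserStatus;
  createdAt: Date;      // UTC
  updatedAt: Date;      // UTC
  deletedAt?: Date;     // soft-delete marker
}
```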

#### Entity: UserProfile

```markdown
### UserProfile

Extended user profile information. One-to-one relationship with User.

**Purpose**: Stores optional user profile information separate from the core account.

**Fields**

| Field | Type | Required | Unique | Description |
|-------|------|----------|--------|-------------|
| id | UUID | Yes | Yes | Primary key |
| user_id | UUID (FK) | Yes | Yes | Foreign key to User.id |
| avatar_url | String(500) | No | No | URL to user's avatar image |
| bio | String(500) | No | No | User bio/description |
| phone | String(20) | No | Yes | User phone number |
| timezone | String(50) | No | No | User's timezone (e.g., America/New_York) |
| language | String(5) | No | No | Preferred language (ISO 639-1, e.g., en, fr) |
| theme | Enum | No | No | UI theme preference: light, dark, auto |
| created_at | Timestamp | Yes | No | Creation time |
| updated_at | Timestamp | Yes | No | Last update time |

**Indexes**
- Unique: `user_id` (enforces the 1:1 relationship)

**Constraints**
- Foreign key: user_id references User(id) ON DELETE CASCADE
- Phone must be a valid format (if provided)
- Timezone must be valid (e.g., from the IANA timezone database)
- Language must be a valid ISO 639-1 code
- Theme must be one of: light, dark, auto

**Notes**
- Deleted with the parent User (CASCADE delete)
- Profile is optional - some users may not have profile data
```

#### Entity: UserPreference

```markdown
### UserPreference

Key-value preferences for users. Flexible schema for future preference types.

**Purpose**: Stores user preferences without requiring schema changes.

**Fields**

| Field | Type | Required | Unique | Description |
|-------|------|----------|--------|-------------|
| id | UUID | Yes | Yes | Primary key |
| user_id | UUID (FK) | Yes | No | Foreign key to User.id |
| preference_key | String(100) | Yes | No | Preference identifier (e.g., notifications_email) |
| preference_value | String(1000) | Yes | No | Preference value as string |
| created_at | Timestamp | Yes | No | Creation time |
| updated_at | Timestamp | Yes | No | Last update time |

**Indexes**
- Composite: `(user_id, preference_key)` - For efficient preference lookup
- Secondary: `user_id` - For finding all preferences for a user

**Constraints**
- Foreign key: user_id references User(id) ON DELETE CASCADE
- Composite unique: `(user_id, preference_key)` - One preference per key per user
- preference_key must match pattern: `[a-z_]+` (lowercase letters and underscores only)
- preference_value must be valid JSON or a simple string

**Valid Preferences**
Examples of preference_key values:
- `notifications_email` → "true"/"false"
- `notifications_sms` → "true"/"false"
- `export_format` → "csv"/"json"
- `ui_columns_per_page` → "20"/"50"/"100"

**Notes**
- Flexible key-value design allows adding preferences without schema changes
- Values stored as strings for flexibility, parsed by the application layer
```
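
Because preference values are stored as strings, the application layer owns the parsing. A sketch of a typed accessor, where `getPreferenceRow` stands in for whatever data-access layer you actually use:

```typescript
type PreferenceRow = { preference_key: string; preference_value: string };

// Read a boolean preference out of the string-valued key/value store.
async function getBooleanPreference(
  userId: string,
  key: string,
  fallback: boolean
): Promise<boolean> {
  const row = await getPreferenceRow(userId, key);
  if (!row) return fallback; // user never set this preference
  return row.preference_value === "true";
}

// Placeholder data-access function, shown for shape only.
async function getPreferenceRow(
  userId: string,
  key: string
): Promise<PreferenceRow | null> {
  // e.g. SELECT preference_key, preference_value FROM user_preference
  //      WHERE user_id = $1 AND preference_key = $2
  return null;
}
```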

### Relationships Section

Document how entities relate:

```markdown
## Entity Relationships

```
┌───────────┐         ┌──────────────┐         ┌─────────────┐
│   User    │         │ UserProfile  │         │  UserPref   │
├───────────┤         ├──────────────┤         ├─────────────┤
│ id (PK)   │◄──1:1───│ id (PK)      │         │ id (PK)     │
│ email     │         │ user_id (FK) │         │ user_id(FK) │
│ ...       │         │ avatar_url   │         │ pref_key    │
└───────────┘         │ ...          │         │ pref_value  │
      ▲               └──────────────┘         └─────────────┘
      │                                               │
      │                    1:many                     │
      └───────────────────────────────────────────────┘
```

### Relationship: User → UserProfile (1:1)
- **Type**: One-to-One
- **Foreign Key**: UserProfile.user_id → User.id
- **Cardinality**: A User has at most one UserProfile; a UserProfile belongs to exactly one User
- **Delete Behavior**: CASCADE - Deleting a User deletes its UserProfile
- **Optional**: UserProfile is optional (some users may not have a detailed profile)

### Relationship: User → UserPreference (1:many)
- **Type**: One-to-Many
- **Foreign Key**: UserPreference.user_id → User.id
- **Cardinality**: A User can have many UserPreferences; each UserPreference belongs to one User
- **Delete Behavior**: CASCADE - Deleting a User deletes all of its preferences
- **Optional**: A User can have zero preferences
```

### Constraints & Validation Section

```markdown
## Data Constraints & Validation

### Business Logic Constraints
- Users cannot have duplicate emails (enforced at database + application)
- User phone numbers must be unique if provided
- Email (on User) and phone (on UserProfile) cannot both be removed/NULL

### Data Integrity Rules
- password_hash must never be exposed in API responses
- deleted_at cannot be set retroactively (only forward through time)
- updated_at must be >= created_at

### Referential Integrity
- Foreign key constraints enforced at the database level
- Cascade deletes on User deletion
- No orphaned UserProfile or UserPreference records

### Enumeration Values

**User.status**
- `active` - Account is active
- `inactive` - Account temporarily inactive
- `suspended` - Account suspended (admin action)

**UserProfile.theme**
- `light` - Light theme
- `dark` - Dark theme
- `auto` - Follow system settings

**UserPreference.preference_key**
- Must match pattern: `[a-z_]+`
- Examples: `notifications_email`, `export_format`, `ui_language`
```

### Scaling Considerations Section

```markdown
## Scaling & Performance

### Expected Data Volume
- Users: 100-1000 per day growth
- Preferences: ~5-10 per user on average
- Year 1 estimate (at ~100 users/day): 36k users, ~180k preference records

### Table Sizes
- User table: ~36MB (estimated year 1)
- UserProfile table: ~28MB
- UserPreference table: ~22MB

### Query Patterns & Indexes
- Find user by email: Indexed (UNIQUE index on email)
- Find all preferences for a user: Indexed (composite on user_id, pref_key)
- List users by creation date: Indexed (on created_at)
- Filter users by status: Indexed (on status)

### Optimization Notes
- Composite index `(user_id, preference_key)` enables efficient preference lookups
- Email index enables fast login queries
- Consider partitioning UserPreference by user_id at very large scale (100M+ records)
```

### Migration & Change Management Section

```markdown
## Schema Evolution

### Creating These Tables
```sql
-- "user" is a reserved word in PostgreSQL, so the table is named "users"
CREATE TABLE users (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  email VARCHAR(255) UNIQUE NOT NULL,
  password_hash VARCHAR(255) NOT NULL,
  first_name VARCHAR(100),
  last_name VARCHAR(100),
  status VARCHAR(50) DEFAULT 'active' NOT NULL,
  created_at TIMESTAMP DEFAULT now() NOT NULL,
  updated_at TIMESTAMP DEFAULT now() NOT NULL,
  deleted_at TIMESTAMP
);

CREATE TABLE user_profile (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID UNIQUE NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  avatar_url VARCHAR(500),
  bio VARCHAR(500),
  phone VARCHAR(20) UNIQUE,
  timezone VARCHAR(50),
  language VARCHAR(5),
  theme VARCHAR(20),
  created_at TIMESTAMP DEFAULT now() NOT NULL,
  updated_at TIMESTAMP DEFAULT now() NOT NULL
);

CREATE TABLE user_preference (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  preference_key VARCHAR(100) NOT NULL,
  preference_value VARCHAR(1000) NOT NULL,
  created_at TIMESTAMP DEFAULT now() NOT NULL,
  updated_at TIMESTAMP DEFAULT now() NOT NULL,
  UNIQUE(user_id, preference_key)
);

CREATE INDEX idx_user_created_at ON users(created_at);
CREATE INDEX idx_user_status ON users(status);
CREATE INDEX idx_preference_lookup ON user_preference(user_id, preference_key);
```

### Future Migrations
- Q2 2024: Add `last_login_at` to User (nullable, new index)
- Q3 2024: Implement user archival (age > 1 year, no activity)
```

### Documentation & Examples Section

```markdown
## Example Queries

### Find user by email
```sql
SELECT * FROM users WHERE email = 'user@example.com';
```

### Get user with profile
```sql
SELECT u.*, p.*
FROM users u
LEFT JOIN user_profile p ON u.id = p.user_id
WHERE u.id = $1;
```

### Get user's preferences
```sql
SELECT preference_key, preference_value
FROM user_preference
WHERE user_id = $1
ORDER BY created_at DESC;
```

### Archive old inactive users
```sql
UPDATE users
SET deleted_at = now()
WHERE status = 'inactive' AND updated_at < now() - interval '1 year'
  AND deleted_at IS NULL;
```
```

## Writing Tips

### Document Constraints Clearly
- Why does each field have the constraints it does?
- What validation rules apply?
- What happens on constraint violations?

### Think About Scale
- How much data will this table store?
- What are the growth projections?
- What indexing strategy is needed?
- Will partitioning be needed in the future?

### Link to Related Specs
- Reference technical requirements: `[PRD-001]`
- Reference API contracts: `[API-001]` (what data is exposed)
- Reference design documents: `[DES-001]`

### Include Examples
- Sample SQL for common queries
- Sample JSON representations
- Example migration scripts

### Document Change Constraints
- What fields can't change after creation?
- What fields are immutable?
- How do we handle schema evolution?

## Validation & Fixing Issues

### Run the Validator
```bash
scripts/validate-spec.sh docs/specs/data-model/data-001-your-spec.md
```

### Common Issues & Fixes

**Issue**: "Missing entity field specifications"
- **Fix**: Complete the fields table for each entity with types, constraints, descriptions

**Issue**: "No relationships documented"
- **Fix**: Add a relationships section showing foreign keys and cardinality

**Issue**: "TODO items in Constraints (3 items)"
- **Fix**: Complete constraint definitions, validation rules, and enumeration values

**Issue**: "No scaling or performance information"
- **Fix**: Add data volume estimates, indexing strategy, and optimization notes

## Decision-Making Framework

As you write the data model, consider:

1. **Entity Design**: What entities do we need?
   - What are distinct concepts?
   - What are attributes vs. relationships?
   - Should data be normalized or denormalized?

2. **Relationships**: How do entities relate?
   - One-to-one, one-to-many, many-to-many?
   - Should relationships be required or optional?
   - How should deletions cascade?

3. **Constraints**: What rules must data satisfy?
   - Uniqueness constraints?
   - Required fields?
   - Data type restrictions?
   - Enumeration values?

4. **Performance**: How will data be queried?
   - What indexes are needed?
   - What's the expected scale?
   - Are there bottlenecks?

5. **Evolution**: How will this model change?
   - Can we add fields without migrations?
   - Can we add entities without breaking things?
   - How do we handle data migrations?

## Next Steps

1. **Create the spec**: `scripts/generate-spec.sh data-model data-XXX-slug`
2. **Define entities**: What are the main entities/tables?
3. **Specify fields** with types, constraints, descriptions
4. **Document relationships** between entities
5. **Plan indexes** for performance
6. **Validate**: `scripts/validate-spec.sh docs/specs/data-model/data-XXX-slug.md`
7. **Share with team** before implementation
561
skills/spec-author/guides/deployment-procedure.md
Normal file
@@ -0,0 +1,561 @@

# How to Create a Deployment Procedure Specification
|
||||
|
||||
Deployment procedures document step-by-step instructions for deploying systems to production, including prerequisites, procedures, rollback, and troubleshooting.
|
||||
|
||||
## Quick Start
|
||||
|
||||
```bash
|
||||
# 1. Create a new deployment procedure
|
||||
scripts/generate-spec.sh deployment-procedure deploy-001-descriptive-slug
|
||||
|
||||
# 2. Open and fill in the file
|
||||
# (The file will be created at: docs/specs/deployment-procedure/deploy-001-descriptive-slug.md)
|
||||
|
||||
# 3. Fill in steps and checklists, then validate:
|
||||
scripts/validate-spec.sh docs/specs/deployment-procedure/deploy-001-descriptive-slug.md
|
||||
|
||||
# 4. Fix issues and check completeness:
|
||||
scripts/check-completeness.sh docs/specs/deployment-procedure/deploy-001-descriptive-slug.md
|
||||
```
|
||||
|
||||
## When to Write a Deployment Procedure
|
||||
|
||||
Use a Deployment Procedure when you need to:
|
||||
- Document how to deploy a new service or component
|
||||
- Ensure consistent, repeatable deployments
|
||||
- Provide runbooks for operations teams
|
||||
- Document rollback procedures for failures
|
||||
- Enable any team member to deploy safely
|
||||
- Create an audit trail of deployments
|
||||
|
||||
## Research Phase
|
||||
|
||||
### 1. Research Related Specifications
|
||||
Find what you're deploying:
|
||||
|
||||
```bash
|
||||
# Find component specs
|
||||
grep -r "component" docs/specs/ --include="*.md"
|
||||
|
||||
# Find design documents that mention infrastructure
|
||||
grep -r "design\|infrastructure" docs/specs/ --include="*.md"
|
||||
|
||||
# Find existing deployment procedures
|
||||
grep -r "deploy" docs/specs/ --include="*.md"
|
||||
```
|
||||
|
||||
### 2. Understand Your Infrastructure
|
||||
- What's the deployment target? (Kubernetes, serverless, VMs)
|
||||
- What infrastructure does this component need?
|
||||
- What access/permissions are required?
|
||||
- What monitoring must be in place?
|
||||
|
||||
### 3. Review Past Deployments
|
||||
- How have similar components been deployed?
|
||||
- What issues arose? How were they resolved?
|
||||
- What worked well? What didn't?
|
||||
- Any patterns or templates to follow?
|
||||
|
||||
## Structure & Content Guide
|
||||
|
||||
### Title & Metadata
|
||||
- **Title**: "Export Service Deployment to Production", "Database Migration", etc.
|
||||
- **Component**: What's being deployed
|
||||
- **Target**: Production, staging, canary, etc.
|
||||
- **Owner**: Team responsible for deployment
|
||||
|
||||
### Prerequisites Section
|
||||
|
||||
Document what must be done before deployment:
|
||||
|
||||
```markdown
|
||||
# Export Service Production Deployment
|
||||
|
||||
## Prerequisites
|
||||
|
||||
### Infrastructure Requirements
|
||||
- [ ] AWS resources provisioned (see [CMP-001] for details)
|
||||
- [ ] ElastiCache Redis cluster (export-service-queue)
|
||||
- [ ] RDS PostgreSQL instance (export-db)
|
||||
- [ ] S3 bucket (export-files-prod)
|
||||
- [ ] IAM roles and policies configured
|
||||
- [ ] Kubernetes cluster accessible
|
||||
- [ ] kubectl configured with production cluster context
|
||||
- [ ] Deployment manifests reviewed by tech lead
|
||||
- [ ] Namespace `export-service-prod` created
|
||||
|
||||
### Code & Build Requirements
|
||||
- [ ] All code merged to main branch
|
||||
- [ ] Code reviewed by 2+ senior engineers
|
||||
- [ ] All tests passing
|
||||
- [ ] Unit tests (90%+ coverage)
|
||||
- [ ] Integration tests
|
||||
- [ ] Load tests pass at target throughput
|
||||
- [ ] Docker image built and pushed to ECR
|
||||
- [ ] Image tagged with version (e.g., v1.2.3)
|
||||
- [ ] Image scanned for vulnerabilities
|
||||
- [ ] Image verified to work (manual test in staging)
|
||||
|
||||
### Team & Access Requirements
|
||||
- [ ] Deployment lead identified (typically tech lead or on-call eng)
|
||||
- [ ] Access verified for:
|
||||
- [ ] AWS console (ECR, S3, CloudWatch)
|
||||
- [ ] Kubernetes cluster (kubectl access)
|
||||
- [ ] Database (for running migrations if needed)
|
||||
- [ ] Monitoring/alerting system (Grafana, PagerDuty)
|
||||
- [ ] Communication channel open (Slack, war room)
|
||||
- [ ] Runbook reviewed by both eng and ops team
|
||||
|
||||
### Pre-Deployment Verification Checklist
|
||||
- [ ] Staging deployment successful (deployed 24+ hours ago, stable)
|
||||
- [ ] Monitoring in place and verified working
|
||||
- [ ] Rollback plan reviewed and tested
|
||||
- [ ] Emergency contacts identified
|
||||
- [ ] Stakeholders notified of deployment window
|
||||
- [ ] Change log prepared (what's new in this version)
|
||||
|
||||
### Data/Database Requirements
|
||||
- [ ] Database schema compatible with new version
|
||||
- [ ] Backward compatible (no breaking changes)
|
||||
- [ ] Migrations tested in staging
|
||||
- [ ] Rollback plan for migrations documented
|
||||
- [ ] No data conflicts or corruption risks
|
||||
- [ ] Backup created (if applicable)
|
||||
|
||||
### Approval Checklist
|
||||
- [ ] Tech Lead: Code and approach approved
|
||||
- [ ] Product Owner: Feature approved, ready for launch
|
||||
- [ ] Operations Lead: Deployment plan reviewed
|
||||
- [ ] Security: Security review passed (if applicable)
|
||||
```
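Several of these checks can be scripted so they're cheap to rerun right before every deployment. A minimal pre-flight sketch, assuming kubectl and the AWS CLI are configured; the cluster ARN, namespace, repository, and version values are illustrative, not prescribed:

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check; adjust all names to your environment.
set -euo pipefail

EXPECTED_CONTEXT="arn:aws:eks:us-east-1:123456789:cluster/prod"
NAMESPACE="export-service-prod"
VERSION="v1.2.3"

# 1. Confirm we're pointed at the production cluster
[ "$(kubectl config current-context)" = "$EXPECTED_CONTEXT" ] \
  || { echo "Wrong kubectl context"; exit 1; }

# 2. Confirm the namespace exists
kubectl get namespace "$NAMESPACE" >/dev/null

# 3. Confirm the image tag exists in ECR
aws ecr describe-images --repository-name export-service \
  --image-ids imageTag="$VERSION" >/dev/null

echo "Pre-flight checks passed for $VERSION"
```
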
### Deployment Steps Section

Provide step-by-step instructions:

```markdown
## Deployment Procedure

### Pre-Deployment (Validation Phase)

**Step 1: Verify Prerequisites**
- Command: Run pre-deployment checklist above
- Verify: All items checked ✓
- If any fail: Stop deployment, resolve issues
- Time: ~15 minutes

**Step 2: Create Deployment Record**
- Document: Who is deploying, when, what version
- Command: Log in to deployment tracking system
- Entry:
  ```
  Deployment: export-service
  Version: v1.2.3
  Environment: production
  Deployed By: Alice Smith
  Time: 2024-01-15 14:30 UTC
  Change Summary: Added bulk export feature, fixed queue processing
  ```
- Time: ~5 minutes

### Deployment Phase

**Step 3: Run Database Migrations (if applicable)**
- Check: Are there schema changes in this version?
- If YES:
  ```bash
  # SSH to database server
  ssh -i ~/.ssh/prod.pem admin@db.example.com

  # Run migrations
  psql -U export_service -d export_service -c \
    "ALTER TABLE exports ADD COLUMN retry_count INT DEFAULT 0;"

  # Verify migration
  psql -U export_service -d export_service -c \
    "SELECT column_name FROM information_schema.columns WHERE table_name='exports';"
  ```
- If NO: Skip this step
- Verify: All migrations complete without errors
- Time: ~10 minutes

**Step 4: Deploy to Kubernetes**
- Verify: You're deploying to the PRODUCTION cluster
  ```bash
  kubectl config current-context
  # Should output: arn:aws:eks:us-east-1:123456789:cluster/prod
  ```
- If wrong context: STOP, switch to correct cluster
- Deploy new image version:
  ```bash
  # Update deployment with new image
  kubectl set image deployment/export-service \
    export-service=123456789.dkr.ecr.us-east-1.amazonaws.com/export-service:v1.2.3 \
    -n export-service-prod
  ```
- Verify: Deployment triggered
  ```bash
  kubectl rollout status deployment/export-service -n export-service-prod
  ```
- Wait: For all pods to become ready (typically 2-3 minutes)
- Output should show: `deployment "export-service" successfully rolled out`
- Time: ~5 minutes

**Step 5: Verify Deployment Health**
- Check: Pod status
  ```bash
  kubectl get pods -n export-service-prod
  ```
  - All pods should show `Running` status
  - If any show `CrashLoopBackOff`: Stop deployment, investigate

- Check: Service endpoints
  ```bash
  kubectl get svc export-service -n export-service-prod
  ```
  - Should show external IP/load balancer endpoint

- Check: Logs for errors
  ```bash
  kubectl logs -n export-service-prod -l app=export-service --tail=50
  ```
  - Should show startup logs, no ERROR level messages
  - If errors present: See Step 9 for rollback

- Check: Health endpoints
  ```bash
  curl https://api.example.com/health
  ```
  - Should return 200 OK
  - If not: Service may still be starting (wait 30s and retry)

- Time: ~5 minutes

### Post-Deployment (Verification Phase)

**Step 6: Monitor Metrics**
- Open: Grafana dashboard for export-service
- Check: Key metrics for 5 minutes
  - Request latency: Should be stable (< 100ms p95)
  - Error rate: Should remain < 0.1%
  - CPU/Memory: Should be within normal ranges
  - Queue depth: Should process jobs smoothly
- Look for: Any sudden spikes or anomalies
- If anomalies: Proceed to rollback (Step 9)
- Time: ~5 minutes

**Step 7: Functional Testing**
- Manual test: Create export via API
  ```bash
  curl -X POST https://api.example.com/exports \
    -H "Authorization: Bearer $TOKEN" \
    -H "Content-Type: application/json" \
    -d '{
      "format": "csv",
      "data_types": ["users"]
    }'
  ```
- Response: Should return 201 Created with export_id
- Check status:
  ```bash
  curl https://api.example.com/exports/{export_id} \
    -H "Authorization: Bearer $TOKEN"
  ```
- Verify: Status transitions from queued → processing → completed
- Download: Successfully download export file
- Verify: File contents correct
- If any step fails: Proceed to rollback (Step 9)
- Time: ~5 minutes

**Step 8: Notify Stakeholders**
- Update: Deployment tracking system
  ```
  Status: DEPLOYED
  Completion Time: 14:45 UTC
  Health: ✓ All checks passed
  Metrics: ✓ Stable
  Functional Tests: ✓ Passed
  ```
- Announce: Slack to #product-eng
  ```
  @channel Export Service v1.2.3 deployed to production.
  New feature: Bulk data exports now available.
  Status: Monitoring.
  ```
- Notify: On-call engineer (monitoring for 2 hours post-deployment)

### Rollback Procedure (If Issues Found)

**Step 9: Rollback (Only if Step 6 or 7 fails)**
- Decision: Is the deployment safe to continue?
  - YES → All checks pass, monitoring is good → Release complete
  - NO → Issues found → Proceed with rollback

- Execute rollback:
  ```bash
  # Revert to previous version
  kubectl rollout undo deployment/export-service -n export-service-prod

  # Verify rollback in progress
  kubectl rollout status deployment/export-service -n export-service-prod

  # Wait for rollback to complete
  ```

- Verify rollback successful:
  ```bash
  # Check current image
  kubectl describe deployment export-service -n export-service-prod | grep Image

  # Should show previous version (e.g., v1.2.2)

  # Verify service responding
  curl https://api.example.com/health
  ```

- Notify: Update stakeholders
  ```
  @channel Deployment rolled back due to [specific reason].
  Current version: v1.2.2 (stable)
  Investigating issue. Will retry deployment tomorrow.
  ```

- Document: Root cause analysis
  - What went wrong?
  - Why wasn't it caught in staging?
  - How do we prevent this next time?

- Time: ~10 minutes
```

### Success Criteria Section

```markdown
## Deployment Success Criteria

The deployment is successful if ALL of these are true:

### Technical Criteria
- [ ] All pods running and healthy (0 CrashLoopBackOff)
- [ ] Service responding to health checks (200 OK)
- [ ] Metrics showing normal values (no spikes)
- [ ] Error rate < 0.1% (< 1 error per 1000 requests)
- [ ] Response latency p95 < 100ms
- [ ] No errors in application logs

### Functional Criteria
- [ ] Export API responds to requests
- [ ] Export jobs queue successfully
- [ ] Jobs process and complete
- [ ] Files upload to S3 correctly
- [ ] Users can download exported files
- [ ] File contents verified correct

### Operational Criteria
- [ ] Monitoring active and receiving metrics
- [ ] Alerting working (test alert fired)
- [ ] Logs aggregated and searchable
- [ ] Runbook tested and functional
- [ ] Team confident in operating the system
```

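Most of the technical criteria can be checked by a single smoke-test script rather than by hand. A rough sketch, reusing the kubectl context and health endpoint from the examples above (all names are illustrative):

```bash
#!/usr/bin/env bash
# Hypothetical smoke test for the technical criteria above.
set -euo pipefail

NAMESPACE="export-service-prod"

# No pods stuck in CrashLoopBackOff
if kubectl get pods -n "$NAMESPACE" | grep -q CrashLoopBackOff; then
  echo "FAIL: crashing pods found"; exit 1
fi

# Health check returns 200
STATUS=$(curl -s -o /dev/null -w '%{http_code}' https://api.example.com/health)
[ "$STATUS" = "200" ] || { echo "FAIL: health check returned $STATUS"; exit 1; }

# No ERROR lines in recent logs
if kubectl logs -n "$NAMESPACE" -l app=export-service --tail=200 | grep -q "ERROR"; then
  echo "FAIL: errors present in logs"; exit 1
fi

echo "Smoke test passed"
```
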
### Monitoring & Alerting Section

```markdown
## Monitoring Setup

### Critical Alerts (Page on-call)
- Service down (health check fails)
- Error rate > 1% for 5 minutes
- Response latency p95 > 500ms for 5 minutes
- Queue depth > 1000 for 10 minutes

### Warning Alerts (Slack notification)
- Error rate > 0.5% for 5 minutes
- CPU > 80% for 10 minutes
- Memory > 85% for 10 minutes
- Export job timeouts increasing

### Dashboard
- Service: export-service-prod
- Metrics: Latency, errors, throughput, queue depth
- Time range: Last 24 hours by default
- Alerts: Show current alert status
```

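Each threshold above maps onto one alerting rule. If your stack happens to use Prometheus, a critical alert might look like the following sketch; the `http_requests_total` metric and its labels are assumptions, not part of this guide's stack:

```bash
# Hypothetical Prometheus rule for "error rate > 1% for 5 minutes".
# Metric and label names are illustrative; match them to your service.
cat <<'EOF' > export-service-alerts.yaml
groups:
  - name: export-service
    rules:
      - alert: ExportServiceHighErrorRate
        expr: |
          sum(rate(http_requests_total{job="export-service",status=~"5.."}[5m]))
            / sum(rate(http_requests_total{job="export-service"}[5m])) > 0.01
        for: 5m
        labels:
          severity: page
        annotations:
          summary: "export-service error rate above 1% for 5 minutes"
EOF

# Validate the rule file before loading it (requires promtool):
promtool check rules export-service-alerts.yaml
```
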
### Troubleshooting Section

```markdown
## Troubleshooting Common Issues

### Issue: Pods stuck in CrashLoopBackOff
**Symptoms**: Pods repeatedly crash and restart
**Diagnosis**:
```bash
# Check logs for errors
kubectl logs <pod-name> -n export-service-prod
```
**Common Causes**:
- Configuration error (check environment variables)
- Database connection failed (check credentials)
- Out of memory (check resource limits)
**Fix**: Review logs, check prerequisites, roll back if unclear

### Issue: Response latency spiking
**Symptoms**: p95 latency > 200ms, users report slow exports
**Diagnosis**:
```bash
# Check queue depth
kubectl exec -it <worker-pod> -n export-service-prod \
  -- redis-cli -h redis.example.com LLEN export-queue
```
**Common Causes**:
- Too many concurrent exports (queue backlog)
- Database slow (check queries, indexes)
- Network issues (check connectivity)
**Fix**: Scale up workers, check database performance, verify network

### Issue: Export jobs failing
**Symptoms**: Job status shows `failed`, users can't export
**Diagnosis**:
```bash
# Check worker logs
kubectl logs -n export-service-prod -l app=export-service
```
**Common Causes**:
- S3 upload failing (check permissions, bucket exists)
- Database query error (schema mismatch)
- User doesn't have data to export
**Fix**: Review logs, verify S3 access, check schema version

### Issue: Database migration failed
**Symptoms**: Service won't start after deployment
**Diagnosis**:
```bash
# Check migration logs
psql -U export_service -d export_service -c \
  "SELECT * FROM schema_migrations ORDER BY version DESC LIMIT 5;"
```
**Recovery**:
1. Identify the failed migration
2. Roll back the deployment (revert to previous version)
3. Debug the migration issue in staging
4. Retry deployment after the fix
```

### Post-Deployment Actions Section

```markdown
## After Deployment

### Immediate (Next 2 hours)
- [ ] On-call engineer monitoring
- [ ] Check metrics every 15 minutes
- [ ] Monitor error rate and latency
- [ ] Watch for user-reported issues in #support

### Short-term (Next 24 hours)
- [ ] Review deployment metrics
- [ ] Collect feedback from users
- [ ] Document any issues encountered
- [ ] Update runbook if needed

### Follow-up (Next week)
- [ ] Post-mortem if issues occurred
- [ ] Update deployment procedure based on lessons learned
- [ ] Plan performance improvements if needed
- [ ] Update documentation if system behavior changed
```

## Writing Tips

### Be Precise and Detailed
- Exact commands to run (copy-paste ready)
- Specific values (versions, endpoints, timeouts)
- Expected outputs for verification
- Time estimates for each step

### Think About Edge Cases
- What if something is already deployed?
- What if a prerequisite is missing?
- What if deployment partially succeeds?
- What if rollback is needed?

### Make Rollback Easy
- Document the rollback procedure clearly
- Test rollback before using it in production (see the drill sketch after this list)
- Make rollback faster than forward deployment
- Have a quick communication plan for failures

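One way to make "test rollback" concrete is a staging drill that deploys a candidate, rolls back, and verifies the previous image is restored. A minimal sketch, assuming the staging namespace mirrors production (all names and the candidate tag are illustrative):

```bash
# Hypothetical rollback drill for staging; adjust names to your setup.
set -euo pipefail
NAMESPACE="export-service-staging"
DEPLOY="deployment/export-service"

BEFORE=$(kubectl get "$DEPLOY" -n "$NAMESPACE" \
  -o jsonpath='{.spec.template.spec.containers[0].image}')

# Deploy the candidate image, then immediately roll back
kubectl set image "$DEPLOY" export-service=export-service:candidate -n "$NAMESPACE"
kubectl rollout status "$DEPLOY" -n "$NAMESPACE"
kubectl rollout undo "$DEPLOY" -n "$NAMESPACE"
kubectl rollout status "$DEPLOY" -n "$NAMESPACE"

AFTER=$(kubectl get "$DEPLOY" -n "$NAMESPACE" \
  -o jsonpath='{.spec.template.spec.containers[0].image}')

# The drill passes if rollback restored the original image
[ "$BEFORE" = "$AFTER" ] && echo "Rollback drill passed" || echo "Rollback drill FAILED"
```
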
### Document Monitoring
- What metrics indicate health?
- What should we watch during deployment?
- What thresholds trigger alerts?
- How do we validate success?

### Link to Related Specs
- Reference component specs: `[CMP-001]`
- Reference design documents: `[DES-001]`
- Reference operations runbooks

## Validation & Fixing Issues

### Run the Validator
```bash
scripts/validate-spec.sh docs/specs/deployment-procedure/deploy-001-your-spec.md
```

### Common Issues & Fixes

**Issue**: "Prerequisites section incomplete"
- **Fix**: Add all required infrastructure, code, access, and approvals

**Issue**: "Step-by-step procedures lack detail"
- **Fix**: Add actual commands, expected output, time estimates

**Issue**: "No rollback procedure"
- **Fix**: Document how to revert the deployment if issues arise

**Issue**: "Monitoring and troubleshooting missing"
- **Fix**: Add success criteria, monitoring setup, and a troubleshooting guide

## Decision-Making Framework

When writing a deployment procedure:

1. **Prerequisites**: What must be true before we start?
   - Infrastructure ready?
   - Code reviewed and tested?
   - Team trained?
   - Approvals obtained?

2. **Procedure**: What are the exact steps?
   - Simple, repeatable steps?
   - Verification at each step?
   - Estimated timing?

3. **Safety**: How do we prevent/catch issues?
   - Verification steps after each phase?
   - Rollback procedure?
   - Quick failure detection?

4. **Communication**: Who needs to know what?
   - Stakeholders notified?
   - On-call monitoring?
   - Escalation path?

5. **Learning**: How do we improve next time?
   - Monitoring enabled?
   - Runbook updated?
   - Issues documented?

## Next Steps

1. **Create the spec**: `scripts/generate-spec.sh deployment-procedure deploy-XXX-slug`
2. **Research**: Find component specs and existing procedures
3. **Document prerequisites**: What must be true before deployment?
4. **Write procedures**: Step-by-step, with commands and verification
5. **Plan rollback**: How do we undo this if needed?
6. **Validate**: `scripts/validate-spec.sh docs/specs/deployment-procedure/deploy-XXX-slug.md`
7. **Test the procedure**: Walk through it in a staging environment
8. **Get team review** before using in production

503
skills/spec-author/guides/design-document.md
Normal file
@@ -0,0 +1,503 @@

# How to Create a Design Document

Design Documents provide the detailed architectural and technical design for a system, component, or significant feature. They answer "How will we build this?" after business and technical requirements have been defined.

## Quick Start

```bash
# 1. Create a new design document
scripts/generate-spec.sh design-document des-001-descriptive-slug

# 2. Open and fill in the file
# (The file will be created at: docs/specs/design-document/des-001-descriptive-slug.md)

# 3. Fill in the sections, then validate:
scripts/validate-spec.sh docs/specs/design-document/des-001-descriptive-slug.md

# 4. Fix issues and check completeness:
scripts/check-completeness.sh docs/specs/design-document/des-001-descriptive-slug.md
```

## When to Write a Design Document

Use a Design Document when you need to:
- Define system architecture or redesign existing components
- Document major technical decisions and trade-offs
- Provide a blueprint for implementation teams
- Enable architectural review before coding begins
- Create shared understanding of complex systems

## Research Phase

### 1. Research Related Specifications
Find upstream specs that inform your design:

```bash
# Find related business requirements
grep -r "brd" docs/specs/ --include="*.md"

# Find related technical requirements
grep -r "prd\|technical" docs/specs/ --include="*.md"

# Find existing design patterns or similar designs
grep -r "design\|architecture" docs/specs/ --include="*.md"
```

### 2. Research External Documentation
Research existing architectures and patterns:

- Look up similar systems: "How do other companies solve this problem?"
- Research technologies and frameworks you're planning to use
- Review relevant design patterns or architecture styles
- Check for security, performance, or scalability best practices

Use tools to fetch external docs:
```bash
# Research the latest on your chosen technologies
# Example: Research distributed system patterns
# Example: Research microservices architecture best practices
```

### 3. Review Existing Codebase & Architecture
- What patterns does your codebase already follow?
- What technologies are you already using?
- How are similar features currently implemented?
- What architectural decisions have been made previously?

Ask: "Are we extending existing patterns or introducing new ones?"

### 4. Understand Constraints
- What are the performance requirements? (latency, throughput)
- What scalability targets exist?
- What security constraints apply?
- What infrastructure/budget constraints?
- Team expertise with chosen technologies?

## Structure & Content Guide

### Title & Metadata
- **Title**: "Microservices Architecture for User Service" or similar
- **Type**: Architecture | System Design | RFC | Technology Choice
- **Status**: Draft | Under Review | Accepted | Rejected
- **Version**: 1.0.0 (increment for significant revisions)

### Executive Summary
Write 3-4 sentences that answer:
- What problem does this solve?
- What's the proposed solution?
- What are the key trade-offs?

Example:
```
This design proposes a microservices architecture to scale our user service.
We'll split user management, authentication, and profile service into separate
deployable services. This trades some operational complexity for independent
scaling and development velocity. Key trade-off: eventual consistency vs.
immediate consistency in cross-service operations.
```

### Problem Statement
Describe the current state and limitations:

```
Current monolithic architecture handles all user operations in a single service,
causing:
- Bottleneck: User service becomes bottleneck for entire system
- Scaling: Must scale entire service even if only auth needs capacity
- Deployment: Changes in one area risk entire user service
- Velocity: Teams block each other during development

This design solves these issues by enabling independent scaling and deployment.
```

### Goals & Success Criteria

**Primary Goals** (3-5 goals)
- Increase deployment frequency to enable multiple daily deployments
- Enable independent scaling of auth and profile services
- Reduce time to market for new user features

**Success Criteria** (specific, measurable)
1. Auth service can scale independently to handle 10k requests/sec
2. Profile service deployment doesn't impact auth service
3. System reduces MTTR for user service incidents by 50%
4. Teams can deploy independently without coordination
5. P95 latency remains under 200ms across service boundaries

### Context & Background
Explain why this design is needed now:

```
Over the past 6 months, we've experienced:
- Auth service saturated at 5k requests/sec during peak hours
- Authentication changes blocked by profile service deployments
- High operational burden managing single monolithic service

Recent customer requests for higher throughput have revealed these bottlenecks.
This design addresses the most urgent scaling constraint (auth service).
```

### Proposed Solution

#### High-Level Overview
Provide a diagram showing major components and data flow:

```
┌─────────────┐
│   Client    │
└──────┬──────┘
       │
       ├─→ [API Gateway]
       │        │
       ├─→ [Auth Service] - JWT validation, user login
       │
       ├─→ [Profile Service] - User profile, preferences
       │
       └─→ [Data Layer]
            ├─ User DB (master)
            ├─ Cache (Redis)
            └─ Message Queue (RabbitMQ)
```

Explain how components interact:
```
Client sends request to API Gateway, which routes based on endpoint.
Auth service handles login/JWT operations. Profile service handles profile
reads/writes. Both services consume user data from shared database with
eventual consistency via message queue.
```

#### Architecture Components
For each major component:

**Auth Service**
- **Purpose**: Handles authentication, token generation, validation
- **Technology**: Node.js with Express, Redis for session storage
- **Key Responsibilities**:
  - User login/logout
  - JWT token generation and validation
  - Session management
  - Password reset flows
- **Interactions**: Calls User DB for credential validation, publishes events to queue

**Profile Service**
- **Purpose**: Manages user profile data and preferences
- **Technology**: Node.js with Express, PostgreSQL for user data
- **Key Responsibilities**:
  - Read/write user profile information
  - Manage user preferences
  - Handle profile search and filtering
- **Interactions**: Consumes user events from queue, calls shared User DB

**API Gateway**
- **Purpose**: Single entry point, routing, authentication enforcement
- **Technology**: Nginx or an API gateway product (e.g., Kong)
- **Key Responsibilities**:
  - Route requests to appropriate service
  - Enforce API authentication
  - Rate limiting
  - Request/response transformation
- **Interactions**: Routes to Auth and Profile services

### Design Decisions

For each significant decision, document:

#### Decision 1: Microservices vs. Monolith
- **Decision**: Adopt microservices architecture
- **Rationale**:
  - Independent scaling needed (auth bottleneck at 5k req/sec)
  - Team velocity: Can deploy auth changes independently
  - Loose coupling enables faster iteration
- **Alternatives Considered**:
  - Monolith optimization: Caching, database optimization (rejected: can't solve scaling bottleneck)
  - Modular monolith: Improves structure but doesn't enable independent scaling
- **Impact**:
  - Gain: Independent scaling, deployment, team velocity
  - Accept: Distributed system complexity, operational overhead, eventual consistency

#### Decision 2: Synchronous vs. Asynchronous Communication
- **Decision**: Use message queue for eventual consistency
- **Rationale**:
  - Profile updates don't need to be immediately consistent across auth service
  - Reduces coupling: Auth service doesn't wait for profile service
  - Improves resilience: Profile service failure doesn't affect auth
- **Alternatives Considered**:
  - Synchronous REST calls: Simpler but tight coupling, availability issues
  - Event sourcing: Over-engineered for current needs
- **Impact**:
  - Gain: Resilience, reduced coupling, independent scaling
  - Accept: Eventual consistency, operational complexity (message queue)

### Technology Stack

**Language & Runtime**
- Node.js 18 LTS - Rationale: Existing expertise, good async support
- Express - Lightweight, flexible framework the team knows

**Data Layer**
- PostgreSQL (primary database) - Reliable, ACID transactions for user data
- Redis (cache layer) - Session storage, auth token cache

**Infrastructure**
- Kubernetes for orchestration - Running multiple services at scale
- Docker for containerization - Consistent deployment

**Key Libraries/Frameworks**
- Express (v4.18) - HTTP framework
- jsonwebtoken - JWT token handling
- @aws-sdk - AWS SDK for future integration
- Jest - Testing framework

### Data Model & Storage

**Storage Strategy**
- **Primary Database**: PostgreSQL with user table containing:
  - id, email, password_hash, created_at, updated_at
  - One-to-many relationship with user_preferences
- **Caching**: Redis stores JWT token metadata and session info with 1-hour TTL
- **Data Retention**: User data retained indefinitely; sessions cleaned up after TTL

**Schema Overview**
```
Users Table:
- id (primary key)
- email (unique index)
- password_hash
- created_at
- updated_at

User Preferences:
- id
- user_id (foreign key)
- key (e.g., theme, language)
- value
```

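The schema overview translates directly into DDL. A sketch, assuming PostgreSQL; the exact types and constraints here are illustrative choices, not decisions made by this example design:

```bash
# Hypothetical DDL for the schema sketched above (PostgreSQL).
psql -U app -d users <<'SQL'
CREATE TABLE users (
    id            BIGSERIAL PRIMARY KEY,
    email         TEXT NOT NULL UNIQUE,
    password_hash TEXT NOT NULL,
    created_at    TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at    TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE TABLE user_preferences (
    id      BIGSERIAL PRIMARY KEY,
    user_id BIGINT NOT NULL REFERENCES users (id),
    key     TEXT NOT NULL,   -- e.g., theme, language
    value   TEXT NOT NULL,
    UNIQUE (user_id, key)    -- one value per preference per user
);
SQL
```
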
### API & Integration Points

**External Dependencies**
- Integrates with existing Payment Service for billing
- Consumes events from Billing Service (subscription changes)
- Publishes user events to event bus for downstream services

**Key Endpoints** (reference full API spec):
- POST /auth/login - User login
- POST /auth/logout - User logout
- GET /profile - Fetch user profile
- PUT /profile - Update user profile

(See [API-001] for complete endpoint specifications)

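For reviewers who want to see the contract at a glance, a request/response sketch for one endpoint can help. The field names below are assumptions for illustration; the referenced [API-001] spec would be authoritative:

```bash
# Hypothetical login request; see [API-001] for the real contract.
curl -X POST https://api.example.com/auth/login \
  -H "Content-Type: application/json" \
  -d '{"email": "user@example.com", "password": "hunter2"}'

# Expected shape of a successful response (illustrative):
# { "access_token": "<jwt>", "expires_in": 3600, "refresh_token": "<token>" }
```
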
### Trade-offs

**Accepting**
- Operational complexity: Must manage multiple services, deployments, monitoring
- Eventual consistency: Changes propagate through message queue, not immediately
- Debugging complexity: Cross-service issues harder to debug

**Gaining**
- Independent scaling: Auth service can scale without scaling profile service
- Team autonomy: Teams can deploy independently without coordination
- Failure isolation: Auth service failure doesn't take down profile service
- Development velocity: Faster iteration, less blocking

### Implementation

**Approach**: Phased migration - Extract services incrementally without a big-bang rewrite

**Phases**:
1. **Phase 1 (Week 1-2)**: Extract Auth Service
   - Deliverables: Auth service running in parallel, API Gateway routing auth requests
   - Testing: Canary traffic (10%) to new service

2. **Phase 2 (Week 3-4)**: Migrate Auth Traffic
   - Deliverables: 100% auth traffic on new service, rollback plan tested
   - Verification: Auth latency, error rates compared to baseline

3. **Phase 3 (Week 5-6)**: Extract Profile Service
   - Deliverables: Profile service independent, event queue running
   - Testing: Data consistency verification across message queue

**Migration Strategy**:
- Run both monolith and microservices in parallel initially
- Use API Gateway to route traffic, allow A/B testing
- Maintain ability to roll back quickly if issues arise
- Monitor closely for latency/error rate increases

(See [PLN-001] for detailed implementation roadmap)

### Performance & Scalability

**Performance Targets**
- **Latency**: Auth service p95 < 100ms, p99 < 200ms
- **Throughput**: Auth service handles 10k requests/second
- **Availability**: 99.9% uptime for auth service

**Scalability Strategy**
- **Scaling Approach**: Horizontal - Add more auth service instances behind load balancer
- **Bottlenecks**: Database connection pool size (limit 100 connections per service instance)
  - Mitigation: PgBouncer connection pooling, read replicas for read operations
- **Auto-scaling**: Kubernetes HPA scales auth service from 3 to 20 replicas based on CPU (see the sketch below)

**Monitoring & Observability**
- **Metrics**: Request latency (p50, p95, p99), error rate, service availability
- **Alerting**: Alert if auth latency p95 > 150ms, error rate > 0.5%
- **Logging**: Structured JSON logs with request ID for tracing across services

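The auto-scaling behavior described above corresponds to a standard HorizontalPodAutoscaler. A sketch using the Kubernetes autoscaling/v2 API and the 3-to-20 replica range named above; the namespace and the 70% CPU threshold are illustrative assumptions:

```bash
# Hypothetical HPA matching "scale auth service from 3 to 20 replicas on CPU".
cat <<'EOF' | kubectl apply -f -
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: auth-service
  namespace: auth-prod
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: auth-service
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # illustrative threshold
EOF
```
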
### Security

**Authentication**
- JWT tokens issued by Auth Service, validated by API Gateway
- Token expiration: 1 hour, refresh tokens for extended sessions

**Authorization**
- Role-based access control (RBAC) enforced at API Gateway
- Profile service doesn't repeat auth checks (trusts gateway)

**Data Protection**
- **Encryption at Rest**: PostgreSQL database encryption enabled
- **Encryption in Transit**: TLS 1.3 for all service-to-service communication
- **PII Handling**: Passwords hashed with bcrypt (cost factor 12)

**Secrets Management**
- Database credentials stored in Kubernetes secrets
- JWT signing key rotated quarterly
- Environment-based secret injection at runtime

**Compliance**
- GDPR: User data can be exported via profile service
- SOC2: Audit logging enabled for user data access

### Dependencies & Assumptions

**Dependencies**
- PostgreSQL database must be highly available (RTO 1 hour)
- Redis cache can tolerate data loss (non-critical)
- API Gateway (Nginx) must be deployed and operational
- Message queue (RabbitMQ) must be running

**Assumptions**
- Auth service will handle up to 10k requests/second (based on growth projections)
- User data size remains < 100GB (current: 5GB)
- Network latency between services < 10ms (co-located data center)

### Open Questions

- [ ] Should we use gRPC for service-to-service communication instead of REST?
  - **Status**: Under investigation - benchmarking against REST
- [ ] How do we handle shared user data updates if both services write to the DB?
  - **Status**: Deferred to Phase 3 - will use event sourcing pattern
- [ ] Which message queue (RabbitMQ vs. Kafka)?
  - **Status**: RabbitMQ chosen, but revisit if we need an audit trail of all changes

### Approvals

**Technical Review**
- Lead Backend Engineer - TBD

**Architecture Review**
- VP Engineering - TBD

**Security Review**
- Security Team - TBD

**Approved By**
- TBD

## Writing Tips

### Use Diagrams Effectively
- ASCII art is fine for design docs (easy to version control)
- Show data flow and component interactions
- Label arrows with what data/requests are flowing

### Be Explicit About Trade-offs
- Don't just say "microservices are better"
- Say "We're trading operational complexity for independent scaling because this addresses our 5k req/sec bottleneck"

### Link to Other Specs
- Reference related business requirements: `[BRD-001]`
- Reference technical requirements: `[PRD-001]`
- Reference data models: `[DATA-001]`
- Reference API contracts: `[API-001]`

### Document Rationale
- Each decision needs a "why"
- Explain what alternatives were considered and why they were rejected
- This helps future developers understand the context

### Be Specific About Performance
- Not: "Must be performant"
- Yes: "p95 latency under 100ms, p99 under 200ms, supporting 10k requests/second"

### Consider the Whole System
- Security implications
- Operational/monitoring requirements
- Data consistency model
- Failure modes and recovery
- Future scalability

## Validation & Fixing Issues

### Run the Validator
```bash
scripts/validate-spec.sh docs/specs/design-document/des-001-your-spec.md
```

### Common Issues & Fixes

**Issue**: "Missing Proposed Solution section"
- **Fix**: Add detailed architecture components, design decisions, tech stack

**Issue**: "TODO items in Architecture Components (4 items)"
- **Fix**: Complete descriptions for all components (purpose, technology, responsibilities)

**Issue**: "No Trade-offs documented"
- **Fix**: Explicitly document what you're accepting and what you're gaining

**Issue**: "Missing Performance & Scalability targets"
- **Fix**: Add specific latency, throughput, and availability targets

### Check Completeness
```bash
scripts/check-completeness.sh docs/specs/design-document/des-001-your-spec.md
```

## Decision-Making Framework

As you write the design doc, work through:

1. **Problem**: What are we designing for?
   - Specific pain points or constraints?
   - Performance targets, scalability requirements?

2. **Options**: What architectural approaches could work?
   - Monolith vs. distributed?
   - Synchronous vs. asynchronous?
   - Technology choices?

3. **Evaluation**: How do the options compare?
   - Which best addresses the problem?
   - What are the trade-offs?
   - What does the team have experience with?

4. **Decision**: Which approach wins and why?
   - What assumptions must hold?
   - What trade-offs are we accepting?

5. **Implementation**: How do we build/migrate to this?
   - Big bang or incremental?
   - Parallel running period?
   - Rollback plan?

## Next Steps

1. **Create the spec**: `scripts/generate-spec.sh design-document des-XXX-slug`
2. **Research**: Find related specs and understand architecture context
3. **Sketch**: Draw architecture diagrams before writing detailed components
4. **Fill in sections** using this guide
5. **Validate**: `scripts/validate-spec.sh docs/specs/design-document/des-XXX-slug.md`
6. **Get architectural review** before implementation begins
7. **Update related specs**: Create or update technical requirements and implementation plans

564
skills/spec-author/guides/flow-schematic.md
Normal file
@@ -0,0 +1,564 @@

# How to Create a Flow Schematic Specification

Flow schematics document business processes, workflows, and system flows visually and textually. They show how information moves through systems and how users interact with features.

## Quick Start

```bash
# 1. Create a new flow schematic
scripts/generate-spec.sh flow-schematic flow-001-descriptive-slug

# 2. Open and fill in the file
# (The file will be created at: docs/specs/flow-schematic/flow-001-descriptive-slug.md)

# 3. Add diagram and flow descriptions, then validate:
scripts/validate-spec.sh docs/specs/flow-schematic/flow-001-descriptive-slug.md

# 4. Fix issues and check completeness:
scripts/check-completeness.sh docs/specs/flow-schematic/flow-001-descriptive-slug.md
```

## When to Write a Flow Schematic

Use a Flow Schematic when you need to:
- Document how users interact with a feature
- Show data flow through systems
- Illustrate decision points and branches
- Document error handling paths
- Clarify complex processes
- Enable team alignment on workflow

## Research Phase

### 1. Research Related Specifications
Find what this flow represents:

```bash
# Find business requirements this flow implements
grep -r "brd" docs/specs/ --include="*.md"

# Find design documents that mention this flow
grep -r "design" docs/specs/ --include="*.md"

# Find related components or APIs
grep -r "component\|api" docs/specs/ --include="*.md"
```

### 2. Understand the User/System
- Who are the actors in this flow? (users, systems, services)
- What are they trying to accomplish?
- What information flows between actors?
- Where are the decision points?
- What happens when things go wrong?

### 3. Review Similar Flows
- How are flows documented in your organization?
- What diagramming style is used?
- What level of detail is typical?
- What's been confusing about past flows?

## Structure & Content Guide

### Title & Metadata
- **Title**: "User Export Flow", "Payment Processing Flow", etc.
- **Actor**: Primary user or system
- **Scope**: What does this flow cover?
- **Status**: Draft | Current | Legacy

### Overview Section

```markdown
# User Bulk Export Flow

## Summary
Describes the complete workflow when a user initiates a bulk data export,
including queuing, processing, file storage, and download.

**Primary Actors**: User, Export Service, Database, S3
**Scope**: From export request to download
**Current**: Yes (live in production)

## Key Steps Overview
1. User requests export (website)
2. API queues export job
3. Worker processes export
4. File stored to S3
5. User notified and downloads
```

### Flow Diagram Section

Create a visual representation:

```markdown
## Flow Diagram

### User Export Flow (ASCII Art)

```
┌─────────────┐
│    User     │
│  (Website)  │
└──────┬──────┘
       │ 1. Click Export
       ▼
┌─────────────────────────┐
│ Export API              │
│ POST /exports           │
├─────────────────────────┤
│ 2. Validate request     │
│ 3. Create export record │
│ 4. Queue job            │
└────────┬────────────────┘
         │
         ├─→ 5. Return job_id to user
         │
         ▼
┌──────────────────────┐
│ Message Queue        │
│ (Redis Bull)         │
├──────────────────────┤
│ 6. Store export job  │
└────────┬─────────────┘
         │
         ├─→ 7. Worker picks up job
         │
         ▼
┌──────────────────────────────┐
│ Export Worker                │
├──────────────────────────────┤
│ 8. Query user data           │
│ 9. Format data (CSV/JSON)    │
│ 10. Compress file            │
└────────┬─────────────────────┘
         │
         ├─→ 11. Update job status (processing)
         │
         ▼
┌──────────────────────────┐
│ AWS S3                   │
├──────────────────────────┤
│ 12. Store file           │
│ 13. Generate signed URL  │
└────────┬─────────────────┘
         │
         ├─→ 14. Send notification email to user
         │
         ▼
┌──────────────────────────┐
│ User Email               │
├──────────────────────────┤
│ 15. Click download link  │
└────────┬─────────────────┘
         │
         ├─→ 16. Browser requests file from S3
         │
         ▼
┌──────────────────────────┐
│ File Downloaded          │
└──────────────────────────┘
```
```

### Swimlane Diagram (Alternative Format)

```markdown
### Alternative: Swimlane Diagram

```
User         │ Frontend       │ Export API     │ Message Queue │ Worker        │ S3
             │                │                │               │               │
1. Clicks    │                │                │               │               │
   Export ───┼───────────────→│                │               │               │
             │ 2. Form Data   │                │               │               │
             │                │ 3. Validate    │               │               │
             │                │ 4. Create Job  │               │               │
             │                │ 5. Queue Job ──┼──────────────→│               │
             │                │                │ 6. Job Ready  │               │
             │ 7. Show Status │                │               │               │
             │    (polling) ←─┼────────────────│ (update DB)   │               │
             │                │                │               │ 8. Get Data   │
             │                │                │               │ 9. Format     │
             │                │                │               │ 10. Compress  │
             │                │                │               │ 11. Upload ───┼──→
             │                │                │               │               │
             │ 12. Email sent │                │               │               │
  ←──────────┼────────────────┼────────────────┼───────────────┤               │
             │                │                │               │               │
14. Download │                │                │               │               │
    Starts ──┼───────────────→│                │               │               │
             │                │                │               │               │
             │                │ 15. GET /file ─┼───────────────┼──────────────→│
             │                │                │               │ 16. Return URL│
             │ File Downloaded│                │               │               │
```
```

### Step-by-Step Description Section

Document each step in detail:

```markdown
## Detailed Flow Steps

### Phase 1: Export Request

**Step 1: User Initiates Export**
- **Actor**: User
- **Action**: Clicks "Export Data" button on website
- **Input**: Export preferences (format, data types, date range)
- **Output**: Export request form submitted

**Step 2: Frontend Sends Request**
- **Actor**: Frontend/Browser
- **Action**: Submits POST request to /exports endpoint
- **Headers**: Authorization header with JWT token
- **Body**:
  ```json
  {
    "format": "csv",
    "data_types": ["users", "transactions"],
    "date_range": { "start": "2024-01-01", "end": "2024-01-31" }
  }
  ```

**Step 3: API Validates Request**
- **Actor**: Export API
- **Action**: Validate request format and parameters
- **Checks**:
  - User authenticated?
  - Valid format type?
  - Date range valid?
  - User not already processing too many exports?
- **Success**: Continue to Step 4
- **Error**: Return 400 Bad Request with error details

**Step 4: Create Export Record**
- **Actor**: Export API
- **Action**: Store export metadata in database
- **Data Stored**:
  ```sql
  INSERT INTO exports (
    id, user_id, format, data_types, status,
    created_at, updated_at
  ) VALUES (...)
  ```
- **Status**: `queued`
- **Response**: Return 201 with export_id

### Phase 2: Job Processing

**Step 5: Queue Export Job**
- **Actor**: Export API
- **Action**: Add job to Redis queue
- **Job Format**:
  ```json
  {
    "export_id": "exp_123456",
    "user_id": "usr_789012",
    "format": "csv",
    "data_types": ["users", "transactions"]
  }
  ```
- **Queue**: Bull job queue in Redis
- **TTL**: Job removed after 7 days

**Step 6: Return to User**
- **Actor**: Export API
- **Action**: Send response to frontend
- **Response**:
  ```json
  {
    "id": "exp_123456",
    "status": "queued",
    "created_at": "2024-01-15T10:00:00Z",
    "estimated_completion": "2024-01-15T10:05:00Z"
  }
  ```

### Phase 3: Data Export

**Step 7: Worker Picks Up Job**
- **Actor**: Export Worker
- **Action**: Poll Redis queue for jobs
- **Condition**: Worker checks every 100ms
- **Process**: Dequeues oldest job, marks it as processing
- **Status Update**: Export marked as `processing` in database

**Steps 8-10: Process Export**
- **Actor**: Export Worker
- **Actions**:
  1. Query user data from database (user table, transaction table)
  2. Validate and transform data to requested format
  3. Write to temporary file on worker disk
  4. Compress file with gzip
- **Error Handling**: If a step fails, retry up to 3 times with backoff

**Step 11: Upload to S3**
- **Actor**: Export Worker
- **Action**: Upload compressed file to S3
- **Filename**: `exports/exp_123456.csv.gz`
- **ACL**: Private (only accessible via signed URL)
- **Success**: Update export status to `completed` in database

### Phase 4: Notification & Download

**Step 12: Send Notification**
- **Actor**: Notification Service (triggered by export completion event)
- **Action**: Send email to user
- **Email Content**: "Your export is ready! [Click here to download]"
- **Link**: Includes signed URL (valid for 7 days)

**Step 13: User Receives Email**
- **Actor**: User
- **Action**: Receives email notification
- **Next**: Clicks download link

**Steps 14-16: Download File**
- **Actor**: User browser
- **Action**: Follows download link
- **Request**: GET /exports/exp_123456/download
- **Response**: Browser initiates file download
- **File**: exp_123456.csv.gz is saved to the user's computer
```

### Decision Points Section

Document branching logic:

```markdown
## Decision Points

### Decision 1: Export Format Validation
**Question**: Is the requested export format supported?
**Options**:
- ✓ CSV: Continue to data export (Step 8)
- ✓ JSON: Continue to data export (Step 8)
- ✗ Other format: Return 400 error, user selects different format

### Decision 2: User Data Available?
**Question**: Can we successfully query user data?
**Options**:
- ✓ Yes: Continue with data transformation (Step 9)
- ✗ Database error: Retry job (up to 3 times)
- ✗ User data deleted: Return "no data" message to user

### Decision 3: File Size Check
**Question**: Is the export file within size limits?
**Options**:
- ✓ < 500MB: Proceed to upload (Step 11)
- ✗ > 500MB: Return error "export too large", offer data filtering options

### Decision 4: Export Status Check (User Polling)
**Question**: Has the export job completed?
**Polling**: Frontend polls GET /exports/{id} every 5 seconds
**Options**:
- `queued`: Show "Waiting to process..."
- `processing`: Show "Processing... (40%)"
- `completed`: Show download link
- `failed`: Show error message, offer retry option
- `cancelled`: Show "Export was cancelled"
```

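Decision 4's polling loop is simple to express in code. A sketch of the client-side behavior, assuming the GET /exports/{id} endpoint above returns a JSON `status` field; this is a hedged illustration, not the frontend's actual implementation:

```bash
# Hypothetical status-polling loop; requires curl and jq.
EXPORT_ID="exp_123456"

while true; do
  STATUS=$(curl -s https://api.example.com/exports/"$EXPORT_ID" \
    -H "Authorization: Bearer $TOKEN" | jq -r '.status')

  case "$STATUS" in
    completed) echo "Ready to download"; break ;;
    failed|cancelled) echo "Export ended: $STATUS"; break ;;
    *) echo "Still $STATUS..."; sleep 5 ;;   # matches the 5-second poll interval
  esac
done
```
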
### Error Handling Section

```markdown
## Error Handling & Recovery

### Error 1: Invalid Request Format
**Trigger**: User submits invalid format parameter
**Response Code**: 400 Bad Request
**Message**: "Invalid format. Supported: csv, json"
**Recovery**: User submits corrected request

### Error 2: Database Connection Lost During Export
**Trigger**: Worker loses connection to database while querying data
**Response Code**: (internal, no response to user)
**Recovery**: Job retried automatically (backoff: 1s, 2s, 4s)
**Max Retries**: 3 times
**If Fails After Retries**: Export marked as `failed`, user notified

### Error 3: S3 Upload Failure
**Trigger**: S3 returns 500 error
**Recovery**: Retry with exponential backoff
**Fallback**: If retries exhausted, store to local backup, retry next hour
**User Impact**: Export shows "delayed", user can check status later

### Error 4: File Too Large
**Trigger**: Export file exceeds 500MB limit
**Response Code**: 413 Payload Too Large
**Message**: "Export data exceeds 500MB. Use date filtering to reduce size."
**Recovery**: User modifies date range and resubmits

### Timeout Handling
**Job Timeout**: If export takes > 5 minutes, job is killed
**User Notification**: "Export processing took too long. Please try again."
**Logs**: Timeout recorded for analysis
**Recovery**: User can request again (usually succeeds second time)
```

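The retry behavior in Errors 2 and 3 follows a standard exponential-backoff pattern (1s, 2s, 4s, then give up). A generic sketch of that pattern; the command being retried is a placeholder:

```bash
# Hypothetical retry wrapper: 3 attempts with 1s, 2s, 4s backoff,
# matching the recovery policy described above.
retry_with_backoff() {
  local attempt=1 delay=1
  until "$@"; do
    if [ "$attempt" -ge 3 ]; then
      echo "Giving up after $attempt attempts" >&2
      return 1
    fi
    sleep "$delay"
    attempt=$((attempt + 1))
    delay=$((delay * 2))
  done
}

# Usage (placeholder command):
retry_with_backoff curl -sf https://api.example.com/health
```
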
### Async/Event Section

Document asynchronous aspects:

```markdown
## Asynchronous Operations

### Event: Export Created
**Trigger**: POST /exports returns 201
**Event Published**: `export.created`
**Subscribers**: Analytics service (tracks export requests)
**Payload**:
```json
{
  "export_id": "exp_123456",
  "user_id": "usr_789012",
  "format": "csv",
  "timestamp": "2024-01-15T10:00:00Z"
}
```

### Event: Export Completed
**Trigger**: Worker successfully uploads to S3
**Event Published**: `export.completed`
**Subscribers**:
- Notification service (send email)
- Analytics service (track completion)
**Payload**:
```json
{
  "export_id": "exp_123456",
  "file_size_bytes": 2048576,
  "processing_time_ms": 312000,
  "timestamp": "2024-01-15T10:05:12Z"
}
```

### Event: Export Failed
**Trigger**: Job fails after max retries
**Event Published**: `export.failed`
**Subscribers**: Notification service (alert user)
**Payload**:
```json
{
  "export_id": "exp_123456",
  "error_code": "database_timeout",
  "error_message": "Connection timeout after 3 retries",
  "timestamp": "2024-01-15T10:06:00Z"
}
```
```

### Performance & Timing Section

```markdown
## Performance Characteristics

### Typical Timings
- Request submission → queued: < 100ms
- Queued → processing starts: < 30 seconds (depends on queue load)
- Processing time:
  - Small dataset (< 10MB): 1-2 minutes
  - Medium dataset (10-100MB): 2-5 minutes
  - Large dataset (100-500MB): 5-10 minutes
- Upload to S3: 30 seconds to 2 minutes

### Total End-to-End Time
- Average: 5-10 minutes from request to download ready
- Best case: 3-5 minutes (empty queue, small dataset)
- Worst case: 15+ minutes (high load, large dataset)

### Scaling Behavior
- 1 worker: Processes 1 export at a time
- 3 workers: Process 3 exports in parallel
- 10 workers: Can handle 10 concurrent exports
- Queue depth auto-scales workers up to 20 pods
```

## Writing Tips

### Use Clear Diagrams
- ASCII art is fine and version-control friendly
- Show all actors and their interactions
- Label arrows with what's being transmitted
- Use swimlanes for multiple actors

### Be Specific About Data
- Show actual request/response formats
- Include field names and types
- Show error responses with codes
- Document data transformations

### Cover the Happy Path AND Error Paths
- What happens when everything works?
- What happens when things go wrong?
- What are the recovery mechanisms?
- Can users recover?

### Think About Timing
- What happens asynchronously?
- Where are synchronous waits?
- What are typical timings?
- Where are bottlenecks?

### Link to Related Specs
- Reference design documents: `[DES-001]`
- Reference API contracts: `[API-001]`
- Reference component specs: `[CMP-001]`
- Reference data models: `[DATA-001]`

## Validation & Fixing Issues

### Run the Validator
```bash
scripts/validate-spec.sh docs/specs/flow-schematic/flow-001-your-spec.md
```

### Common Issues & Fixes

**Issue**: "Flow diagram incomplete or missing"
- **Fix**: Add an ASCII diagram or swimlane showing all steps

**Issue**: "Step descriptions lack detail"
- **Fix**: Add what happens, who's involved, and input/output for each step

**Issue**: "No error handling documented"
- **Fix**: Document error cases and recovery mechanisms

**Issue**: "Async operations not clearly shown"
- **Fix**: Highlight asynchronous steps and show event flows

## Decision-Making Framework

When documenting a flow:

1. **Scope**: What does this flow cover?
   - Where does it start/end?
   - What's in scope vs. out?

2. **Actors**: Who/what are the main actors?
   - Users, systems, services?
   - External dependencies?

3. **Happy Path**: What's the ideal flow?
   - Step-by-step happy path
   - Minimal branching

4. **Edge Cases**: What can go wrong?
   - Error scenarios
   - Recovery mechanisms
   - User impact

5. **Timing**: What's the performance profile?
   - Synchronous waits?
   - Asynchronous operations?
   - Expected timings?

## Next Steps

1. **Create the spec**: `scripts/generate-spec.sh flow-schematic flow-XXX-slug`
2. **Research**: Find related specs and understand context
3. **Sketch diagram**: Draw initial flow with all actors
4. **Document steps**: Write detailed description for each step
5. **Add error handling**: Document failure scenarios
6. **Validate**: `scripts/validate-spec.sh docs/specs/flow-schematic/flow-XXX-slug.md`
7. **Get feedback** from team to refine flow

434 skills/spec-author/guides/milestone.md Normal file
@@ -0,0 +1,434 @@
# How to Create a Milestone Specification

Milestone specifications define specific delivery targets within a project, including deliverables, success criteria, and timeline. They serve as checkpoints for verifying progress.

## Quick Start

```bash
# 1. Create a new milestone
scripts/generate-spec.sh milestone mls-001-descriptive-slug

# 2. Open and fill in the file
# (The file will be created at: docs/specs/milestone/mls-001-descriptive-slug.md)

# 3. Fill in deliverables and criteria, then validate:
scripts/validate-spec.sh docs/specs/milestone/mls-001-descriptive-slug.md

# 4. Fix issues and check completeness:
scripts/check-completeness.sh docs/specs/milestone/mls-001-descriptive-slug.md
```

## When to Write a Milestone

Use a Milestone Spec when you need to:
- Define specific delivery checkpoints
- Communicate to stakeholders what's shipping when
- Track progress against concrete deliverables
- Set success criteria before building
- Manage dependencies between teams
- Celebrate progress and team achievements

## Research Phase

### 1. Research Related Specifications
Find the context for this milestone:

```bash
# Find the plan this milestone belongs to
grep -r "plan" docs/specs/ --include="*.md"

# Find related requirements and specs
grep -r "brd\|prd\|design" docs/specs/ --include="*.md"
```

### 2. Understand the Broader Plan
- What larger project is this part of?
- What comes before and after this milestone?
- What dependencies exist with other teams?
- What are the overall project goals?

### 3. Review Similar Milestones
- How were past milestones structured?
- What deliverables were tracked?
- How were success criteria defined?
- What worked and what didn't?

## Structure & Content Guide

### Title & Metadata
- **Title**: "Phase 1: Infrastructure Ready", "Beta Launch", etc.
- **Date**: Target completion date
- **Owner**: Team or person responsible
- **Status**: Planned | In Progress | Completed | At Risk

### Milestone Summary

```markdown
# Phase 1: Export Infrastructure Ready

**Target Date**: January 28, 2024
**Owner**: Backend Engineering Team
**Status**: In Progress

## Summary
Delivery of fully operational job queue infrastructure and worker processes
supporting the bulk export feature. The team demonstrates that the system can
reliably process 10+ jobs per second with monitoring and alerting in place.
```

### Deliverables Section

List what will be delivered:

```markdown
## Deliverables

### 1. Redis Job Queue (Production-Ready)
**Description**: Managed Redis cluster configured for job queuing
**Acceptance Criteria**:
- [ ] AWS ElastiCache Redis cluster deployed to staging
- [ ] Cluster sized for 10k requests/second capacity
- [ ] Backup and failover configured
- [ ] Monitoring and alerts in place
**Owner**: Infrastructure Team
**Status**: In Progress

### 2. Bull Job Queue Worker
**Description**: Node.js Bull queue implementation with workers
**Acceptance Criteria**:
- [ ] Bull queue initialized and processing jobs
- [ ] Worker processes handle 10+ jobs/second
- [ ] Graceful shutdown implemented
- [ ] Error handling and retry logic working
- [ ] Unit tests cover all worker functions
**Owner**: Backend Engineer (Alice)
**Status**: In Progress (code in feature branch, ready for review)

### 3. Kubernetes Deployment Manifests
**Description**: K8s manifests for deploying queue workers
**Acceptance Criteria**:
- [ ] Deployment manifest supports 1-10 replicas
- [ ] Health checks configured (liveness, readiness)
- [ ] Resource requests/limits defined
- [ ] Secrets management for Redis credentials
- [ ] Successfully deploys to staging cluster
**Owner**: DevOps Engineer (Bob)
**Status**: Ready for review

### 4. Prometheus Metrics Integration
**Description**: Export metrics for job queue depth, worker status
**Acceptance Criteria**:
- [ ] Metrics scrape successfully every 15 seconds
- [ ] Dashboard shows queue depth over time
- [ ] Queue saturation alerts configured
- [ ] Grafana dashboard created for monitoring
**Owner**: Backend Engineer (Alice)
**Status**: In Progress

### 5. Documentation & Runbook
**Description**: Queue architecture docs and operational runbook
**Acceptance Criteria**:
- [ ] Architecture diagram showing queues and workers
- [ ] Configuration guide for different environments
- [ ] Runbook for common operations (scaling, debugging)
- [ ] Troubleshooting guide for common issues
**Owner**: Tech Lead (Charlie)
**Status**: Planned (starts after technical setup)

## Deliverables Summary

| Deliverable | Status | Owner | Target |
|------------|--------|-------|--------|
| Redis Cluster | In Progress | Infra | Jan 20 |
| Bull Worker | In Progress | Alice | Jan 22 |
| K8s Manifests | In Progress | Bob | Jan 22 |
| Prometheus Metrics | In Progress | Alice | Jan 25 |
| Documentation | Planned | Charlie | Jan 28 |
```
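
To make deliverable 2's acceptance criteria concrete (retry with backoff, graceful shutdown), here is a minimal sketch assuming Bull on Node.js; the queue name, Redis URL, and `processExport` helper are illustrative, not part of the milestone itself:

```typescript
// Minimal sketch of a Bull worker (all names hypothetical).
import Queue from 'bull';

const exportQueue = new Queue('exports', process.env.REDIS_URL ?? 'redis://localhost:6379');

// Enqueue with a per-job retry policy: up to 3 attempts, exponential backoff.
export function enqueueExport(userId: string) {
  return exportQueue.add(
    { userId },
    { attempts: 3, backoff: { type: 'exponential', delay: 5000 } },
  );
}

// Worker loop: process up to 5 jobs concurrently.
exportQueue.process(5, async (job) => {
  await processExport(job.data.userId); // placeholder for the real export logic
});

// Graceful shutdown: queue.close() waits for active jobs before exiting.
process.on('SIGTERM', async () => {
  await exportQueue.close();
  process.exit(0);
});

async function processExport(userId: string): Promise<void> {
  // ... real export logic would live here ...
}
```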

### Success Criteria Section

Define what "done" means:

```markdown
## Success Criteria

### Technical Criteria (Must Pass)
- [ ] Job queue processes 100 jobs without errors
- [ ] Queue handles 10+ jobs/second sustained throughput
- [ ] Workers scale horizontally (add/remove replicas without data loss)
- [ ] Failed jobs retry with exponential backoff
- [ ] All health checks pass in staging environment

### Operational Criteria (Must Have)
- [ ] Prometheus metrics visible in Grafana dashboard
- [ ] Alerts fire correctly when queue depth exceeds threshold
- [ ] Monitoring documentation complete and understood by ops team
- [ ] Runbook covers: scaling, debugging, troubleshooting

### Quality Criteria (Must Meet)
- [ ] Code reviewed and approved by 2+ senior engineers
- [ ] Unit tests pass with 90%+ coverage
- [ ] Integration tests verify queue → worker → completion flow
- [ ] Load tests verify performance targets
- [ ] Security audit passed (no exposed credentials)

### Documentation Criteria (Must Have)
- [ ] Architecture documented with diagrams
- [ ] Configuration guide for different environments
- [ ] Troubleshooting guide covers common issues
- [ ] Operations team trained and confident operating the system

## Sign-Off Criteria

The milestone is "done" when:
1. All deliverables accepted and deployed to staging
2. All technical criteria pass
3. Tech lead, product owner, and operations lead approve
4. Documentation reviewed and accepted
```

### Timeline & Dependencies Section

```markdown
## Timeline & Dependencies

### Critical Path
```
Start → Redis Setup → Bull Implementation → Testing → Documentation → Done
(Jan 15)  (3 days)        (4 days)        (3 days)    (2 days)    (Jan 28)
```

### Phase Dependencies
- **Blocking this milestone**: None (can start immediately)
- **This milestone blocks**: Phase 2 (Export Service Development)
- **If delayed**: Phase 2 starts after this completes
- **Contingency**: Keep spare capacity in the next phase to absorb slippage

### Team Capacity
| Person | Allocation | Weeks | Notes |
|--------|-----------|-------|-------|
| Alice (Backend) | 100% | 2 | Queue + metrics |
| Bob (DevOps) | 100% | 1.5 | Infrastructure |
| Charlie (Lead) | 50% | 1.5 | Review + docs |

### Risks & Mitigation

| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|-----------|
| Redis provisioning delayed | Medium | High | Use managed service, start request early |
| Performance targets not met | Low | High | Load test early, optimize if needed |
| Team member unavailable | Low | Medium | Cross-train a backup person |
| Documentation delayed | Low | Low | Defer non-critical docs to next phase |
```

### Blockers & Issues Section

Track what could prevent delivery:

```markdown
## Current Blockers

### 1. AWS Infrastructure Approval (High Priority)
- **Issue**: Redis cluster requires infrastructure approval
- **Impact**: Blocks infrastructure setup (3-5 day delay if not approved)
- **Owner**: Infrastructure Lead
- **Action**: Sent approval request on Jan 10; following up Jan 12
- **Target Resolution**: Jan 12

### 2. Node.js Bull Documentation Gap (Low Priority)
- **Issue**: Team unfamiliar with Bull library job prioritization
- **Impact**: Might need extra time for implementation
- **Owner**: Alice
- **Action**: Schedule Bull library workshop on Jan 16
- **Target Resolution**: Jan 16

## Dependencies Waiting

- AWS ElastiCache cluster approval (Infrastructure)
- IAM roles and security groups (Security team)
```

### Acceptance & Testing Section

```markdown
## Acceptance Procedures

### Manual Testing Checklist
- [ ] Queue accepts jobs from client
- [ ] Worker processes jobs without errors
- [ ] Queue depth monitoring works in Grafana
- [ ] Scaling up adds workers; scaling down removes them gracefully
- [ ] Failed-job retry works with exponential backoff
- [ ] Restart a worker and verify no jobs are lost

### Performance Testing
- [ ] Load test with 100 concurrent jobs
- [ ] Verify throughput ≥ 10 jobs/second
- [ ] Monitor memory and CPU during load test
- [ ] Document baseline metrics for future comparison

### Security Testing
- [ ] Credentials not exposed in logs or metrics
- [ ] Redis connection uses TLS
- [ ] Worker process runs with minimal permissions

### Sign-Off Process
1. Engineering team completes manual testing
2. Tech lead verifies all acceptance criteria pass
3. Operations team reviews runbook and documentation
4. Product owner confirms milestone meets business requirements
5. All sign off: tech lead, ops lead, product owner
```
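
The performance checklist above ("100 concurrent jobs, ≥ 10 jobs/second") can be scripted. A rough sketch under the same Bull assumptions as earlier (names illustrative, and it assumes a worker is already running):

```typescript
// Rough load-test sketch (hypothetical names).
import Queue from 'bull';

async function loadTest(): Promise<void> {
  const queue = new Queue('exports', process.env.REDIS_URL ?? 'redis://localhost:6379');
  const start = Date.now();

  // Enqueue 100 jobs at once.
  await Promise.all(
    Array.from({ length: 100 }, (_, i) => queue.add({ userId: `load-test-${i}` })),
  );

  // Poll until the queue drains, then report observed throughput.
  while ((await queue.getWaitingCount()) + (await queue.getActiveCount()) > 0) {
    await new Promise((resolve) => setTimeout(resolve, 500));
  }

  const seconds = (Date.now() - start) / 1000;
  console.log(`Throughput: ${(100 / seconds).toFixed(1)} jobs/second`);
  await queue.close();
}

loadTest().catch(console.error);
```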

### Rollback Plan Section

```markdown
## Rollback Plan

If this milestone fails or has critical issues:

### Rollback Steps
1. Revert the worker deployment: `kubectl rollout undo deployment/<worker-deployment>`
2. Keep the Redis cluster (non-breaking)
3. Disable alerts that reference the new queue
4. Run a post-mortem to understand the failure

### Communication
- Notify stakeholders if the deadline is at risk
- Update the project plan and re-estimate Phase 2
- Communicate the revised timeline to customers

### Root Cause Analysis
- Conduct a post-mortem within 2 days
- Document lessons learned
- Update processes/checklists to prevent recurrence
```

### Stakeholder Communication Section

```markdown
## Stakeholder Communication

### Who Needs to Know About This Milestone?
- **Engineering**: Build against completed infrastructure
- **Product**: Planning feature launch timeline
- **Operations**: Preparing to support the new system
- **Executives**: Tracking project progress
- **Customers**: Waiting for the export feature

### Communication Plan

| Stakeholder | Update Frequency | Content |
|-------------|-----------------|---------|
| Engineering Team | Daily standup | Progress, blockers |
| Tech Lead | 3x/week | Risk assessment, decisions |
| Product Owner | Weekly | Status, timeline impact |
| Ops Team | 2x/week | Operational readiness |
| Executives | On completion | Milestone achieved, next steps |

### Status Updates

**Current Status**: 60% complete (Jan 22)
- Redis setup: Complete
- Bull worker: Mostly done, 2 days of testing remaining
- K8s manifests: In review
- Metrics: Underway
- Documentation: Not yet started

**Next Update**: Jan 25 (on track for Jan 28 completion)

**Confidence Level**: High (85%) - minor risks, good progress
```

## Writing Tips

### Be Specific About Deliverables
- What exactly is being delivered?
- How will you verify it's done?
- Who owns each deliverable?
- What's the definition of "done"?

### Define Success Clearly
- Success criteria should be objective and testable
- Mix technical, operational, and quality criteria
- Include both must-haves and nice-to-haves
- Get stakeholder agreement on criteria upfront

### Think About the Bigger Picture
- How does this milestone fit into the overall project?
- What depends on this milestone?
- What changes if this milestone is delayed?
- What's the contingency plan?

### Track Progress
- Update the milestone spec regularly (weekly)
- Note what's actually happening vs. the plan
- Identify and communicate risks early
- Celebrate when the milestone completes!

### Link to Related Specs
- Reference the overall plan: `[PLN-001]`
- Reference related milestones: `[MLS-002]`
- Reference technical specs: `[CMP-001]`, `[API-001]`

## Validation & Fixing Issues

### Run the Validator
```bash
scripts/validate-spec.sh docs/specs/milestone/mls-001-your-spec.md
```

### Common Issues & Fixes

**Issue**: "Deliverables lack acceptance criteria"
- **Fix**: Add specific, testable criteria for each deliverable

**Issue**: "No success criteria defined"
- **Fix**: Document technical, operational, and quality criteria

**Issue**: "Owner/responsibilities not assigned"
- **Fix**: Assign each deliverable to a specific person or team

**Issue**: "Rollback plan missing"
- **Fix**: Document how you'd handle failure or critical issues

## Decision-Making Framework

When defining a milestone:

1. **Scope**: What should be in this milestone?
   - Shippable chunk?
   - Dependencies resolved?
   - Tests passing?

2. **Success**: How will we know this is done?
   - Objective criteria?
   - Stakeholder agreement?
   - Testable outcomes?

3. **Schedule**: When is this realistically achievable?
   - Team capacity?
   - Dependency timelines?
   - Buffer for unknowns?

4. **Risks**: What could prevent delivery?
   - Technical unknowns?
   - Resource constraints?
   - External dependencies?

5. **Communication**: Who needs to know about this?
   - Stakeholder updates?
   - Sign-off process?
   - Celebration when done?

## Next Steps

1. **Create the spec**: `scripts/generate-spec.sh milestone mls-XXX-slug`
2. **Research**: Find the plan and related specs
3. **Define deliverables** with clear owners
4. **Set success criteria** that are testable
5. **Identify risks** and mitigation strategies
6. **Validate**: `scripts/validate-spec.sh docs/specs/milestone/mls-XXX-slug.md`
7. **Get stakeholder alignment** before kickoff
8. **Update regularly** to track progress
523 skills/spec-author/guides/plan.md Normal file
@@ -0,0 +1,523 @@
# How to Create a Plan Specification

Plan specifications document implementation roadmaps, project timelines, phases, and deliverables. They provide the "how and when" we'll build something.

## Quick Start

```bash
# 1. Create a new plan
scripts/generate-spec.sh plan pln-001-descriptive-slug

# 2. Open and fill in the file
# (The file will be created at: docs/specs/plan/pln-001-descriptive-slug.md)

# 3. Fill in phases and deliverables, then validate:
scripts/validate-spec.sh docs/specs/plan/pln-001-descriptive-slug.md

# 4. Fix issues and check completeness:
scripts/check-completeness.sh docs/specs/plan/pln-001-descriptive-slug.md
```

## When to Write a Plan

Use a Plan Spec when you need to:
- Document project phases and timeline
- Define deliverables and milestones
- Identify dependencies and blockers
- Track progress against the plan
- Communicate the timeline to stakeholders
- Align the team on implementation sequence

## Research Phase

### 1. Research Related Specifications
Find what you're planning:

```bash
# Find business requirements
grep -r "brd" docs/specs/ --include="*.md"

# Find technical requirements and design docs
grep -r "prd\|design" docs/specs/ --include="*.md"

# Find existing plans that might be related
grep -r "plan" docs/specs/ --include="*.md"
```

### 2. Understand the Scope
- What are you building?
- What are the business priorities?
- What are the technical dependencies?
- What constraints exist (timeline, resources)?

### 3. Review Similar Projects
- How long did similar projects take?
- What teams are involved?
- What risks arose and how were they managed?
- What was the actual vs. planned timeline?

## Structure & Content Guide

### Title & Metadata
- **Title**: "Export Feature Implementation Plan", "Q1 2024 Roadmap", etc.
- **Timeline**: Start date through completion
- **Owner**: Project lead or team lead
- **Status**: Planning | In Progress | Completed

### Overview Section

```markdown
# Export Feature Implementation Plan

## Summary
Plan to implement the bulk user data export feature over 8 weeks.
Includes job queue infrastructure, API endpoints, UI, and deployment.

**Timeline**: January 15 - March 10, 2024 (8 weeks)
**Team Size**: 3-4 engineers, 1 product manager
**Owner**: Engineering Lead
**Status**: Planning

## Key Objectives
1. Enable enterprise customers to bulk export their data
2. Build reusable async job processing infrastructure
3. Achieve 99% reliability for the export system
4. Complete testing and documentation before production launch
```

### Phases Section

Document phases with timing:

```markdown
## Implementation Phases

### Phase 1: Infrastructure Setup (Weeks 1-2)
**Timeline**: Jan 15 - Jan 28 (2 weeks)
**Team**: 2 engineers

**Goals**
- Implement job queue infrastructure (Redis + Bull)
- Build worker process foundation
- Deploy to staging environment

**Deliverables**
- Redis job queue with worker processes
- Monitoring and alerting setup
- Health checks and graceful shutdown
- Documentation of queue architecture

**Tasks**
- [ ] Set up Redis cluster (managed service)
- [ ] Implement Bull queue with worker processors
- [ ] Add Prometheus metrics for queue depth
- [ ] Configure Kubernetes deployment manifests
- [ ] Create staging deployment
- [ ] Document queue architecture

**Dependencies**
- None (can start immediately)

**Success Criteria**
- Job queue processes 10 jobs/second without errors
- Workers can be scaled horizontally in Kubernetes
- Monitoring shows queue depth and worker status
```
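
The "Prometheus metrics for queue depth" task is small enough to sketch while planning. One possible shape, assuming Bull plus prom-client on Node.js (metric name, queue name, and port are illustrative):

```typescript
// Hypothetical sketch: expose Bull queue depth as a Prometheus gauge.
import http from 'http';
import Queue from 'bull';
import { Gauge, register } from 'prom-client';

const exportQueue = new Queue('exports', process.env.REDIS_URL ?? 'redis://localhost:6379');

// The gauge is sampled on every scrape via its collect() hook.
new Gauge({
  name: 'export_queue_waiting_jobs',
  help: 'Jobs waiting in the export queue',
  async collect() {
    this.set(await exportQueue.getWaitingCount());
  },
});

// Minimal /metrics endpoint for the Prometheus scraper.
http
  .createServer(async (_req, res) => {
    res.setHeader('Content-Type', register.contentType);
    res.end(await register.metrics());
  })
  .listen(9464);
```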

```markdown
### Phase 2: Export Service Development (Weeks 3-5)
**Timeline**: Jan 29 - Feb 18 (3 weeks)
**Team**: 2-3 engineers

**Goals**
- Implement export processing logic
- Add database export functionality
- Support multiple export formats

**Deliverables**
- Export service that processes jobs from the queue
- CSV and JSON export format support
- Data validation and error handling
- Comprehensive unit tests

**Tasks**
- [ ] Implement data query and export logic
- [ ] Add CSV formatter
- [ ] Add JSON formatter
- [ ] Implement error handling and retries
- [ ] Add data compression
- [ ] Write unit tests (target: 90%+ coverage)
- [ ] Performance test with 100MB+ files

**Dependencies**
- Phase 1 complete (queue infrastructure)
- Data model spec finalized ([DATA-001])

**Success Criteria**
- Exports complete in < 5 minutes for 100MB files
- All data exports match source data exactly
- Error retry logic works correctly
- 90%+ test coverage
```

```markdown
### Phase 3: API & Storage (Weeks 4-6)
**Timeline**: Feb 5 - Feb 25 (3 weeks, 1 week overlap with Phase 2)
**Team**: 2 engineers

**Goals**
- Implement REST API for export management
- Set up S3 storage for export files
- Build export status tracking

**Deliverables**
- REST API endpoints (create, get status, download)
- S3 integration for file storage
- Export metadata storage in database
- API documentation

**Tasks**
- [ ] Implement POST /exports endpoint
- [ ] Implement GET /exports/{id} endpoint
- [ ] Implement GET /exports/{id}/download endpoint
- [ ] Add S3 integration for storage
- [ ] Create export metadata schema
- [ ] Implement TTL-based cleanup
- [ ] Add API rate limiting
- [ ] Create API documentation

**Dependencies**
- Phase 1 complete (queue infrastructure)
- Phase 2 in progress (service processing)
- Data model spec finalized ([DATA-001])

**Success Criteria**
- API responds to requests in < 100ms (p95)
- Files stored and retrieved from S3 correctly
- Cleanup removes files after the 7-day TTL
```
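
For the "TTL-based cleanup" task, one low-effort option is an S3 lifecycle rule rather than a custom cron job. A sketch with the AWS SDK v3; the bucket name and prefix are illustrative assumptions:

```typescript
// Hypothetical sketch: expire export files automatically after 7 days
// via an S3 lifecycle rule (bucket and prefix are illustrative).
import { S3Client, PutBucketLifecycleConfigurationCommand } from '@aws-sdk/client-s3';

const s3 = new S3Client({});

await s3.send(new PutBucketLifecycleConfigurationCommand({
  Bucket: 'acme-export-files',
  LifecycleConfiguration: {
    Rules: [{
      ID: 'expire-exports-after-7-days',
      Status: 'Enabled',
      Filter: { Prefix: 'exports/' },
      Expiration: { Days: 7 }, // S3 deletes matching objects 7 days after creation
    }],
  },
}));
```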

```markdown
### Phase 4: Testing & Optimization (Weeks 6-7)
**Timeline**: Feb 19 - Mar 3 (2 weeks, overlap with Phase 3)
**Team**: 2-3 engineers

**Goals**
- Comprehensive testing across all components
- Performance optimization
- Security audit

**Deliverables**
- Integration tests for the full export flow
- Load tests verifying performance targets
- Security audit report
- Performance tuning applied

**Tasks**
- [ ] Write integration tests for the full export flow
- [ ] Load test with 100 concurrent exports
- [ ] Security audit of data handling
- [ ] Performance profiling and optimization
- [ ] Test large file handling (500MB+)
- [ ] Test error scenarios and retries
- [ ] Document known limitations

**Dependencies**
- Phases 2 and 3 complete (service and API)

**Success Criteria**
- 95%+ automated test coverage
- Load tests show < 500ms p95 latency
- Security audit finds no critical issues
- Performance meets targets (< 5 min for 100MB)
```

```markdown
### Phase 5: Documentation & Launch (Weeks 7-8)
**Timeline**: Feb 26 - Mar 10 (2 weeks, 1 week overlap with Phase 4)
**Team**: Full team (all 4)

**Goals**
- Complete documentation
- Customer communication
- Production deployment

**Deliverables**
- API documentation (for customers)
- Runbook for operations team
- Customer launch announcement
- Production deployment checklist

**Tasks**
- [ ] Create customer-facing API docs
- [ ] Create operational runbook
- [ ] Write troubleshooting guide
- [ ] Create launch announcement
- [ ] Train support team
- [ ] Deploy to production
- [ ] Monitor for issues
- [ ] Collect initial feedback

**Dependencies**
- All prior phases complete
- Security audit passed

**Success Criteria**
- Documentation is complete and clear
- Support team can operate the system independently
- Launch goes smoothly with no incidents
- Users successfully export data
```

### Dependencies & Blockers Section

```markdown
## Dependencies & Blockers

### External Dependencies
- **Data Model Spec ([DATA-001])**: Required to understand the data structure
  - Status: Draft
  - Timeline: Must be approved by Jan 20
  - Owner: Data team

- **API Contract Spec ([API-001])**: API design must be finalized
  - Status: In Review
  - Timeline: Must be approved by Feb 5
  - Owner: Product team

- **Infrastructure Resources**: Need S3 bucket and Redis cluster
  - Status: Requested
  - Timeline: Must be available by Jan 15
  - Owner: Infrastructure team

### Internal Dependencies
- **Phase 1 → Phase 2**: Queue infrastructure must be stable
- **Phase 2 → Phase 3**: Service must process exports correctly
- **Phase 3 → Phase 4**: API and storage must be working
- **Phase 4 → Phase 5**: All testing must pass

### Known Blockers
- Infrastructure team is currently overloaded
  - Mitigation: Request resources early, use managed services
- Data privacy review needed for export functionality
  - Mitigation: Schedule the review meeting in the first week
```

### Timeline & Gantt Chart Section

```markdown
## Timeline

```
Phase 1: Infrastructure (2 wks) [====]
Phase 2: Export Service (3 wks)       [=======]
Phase 3: API & Storage  (3 wks)          [=======]
Phase 4: Testing        (2 wks)                [====]
Phase 5: Launch         (2 wks)                   [====]

Week:                            1  2  3  4  5  6  7  8
```

### Key Milestones

| Milestone | Target Date | Owner | Deliverable |
|-----------|------------|-------|-------------|
| Queue Infrastructure Ready | Jan 28 | Eng Lead | Staging deployment |
| Export Processing Works | Feb 18 | Eng Lead | Service passes tests |
| API Complete & Working | Feb 25 | Eng Lead | API docs + endpoints |
| Testing Complete | Mar 3 | QA Lead | Test report |
| Production Launch | Mar 10 | Eng Lead | Live feature |
```

### Resource & Team Section

```markdown
## Resources

### Team Composition

**Engineering Team**
- 2 Backend Engineers (Weeks 1-8): Infrastructure, export service, API
- 1 Backend Engineer (Weeks 4-8): Testing, optimization
- Optional: 1 Frontend Engineer (Weeks 7-8): Documentation, demos

**Support & Operations**
- Product Manager (all weeks): Requirements, prioritization
- QA Lead (Weeks 4-8): Testing coordination

### Skills Required
- Backend development (Node.js, PostgreSQL)
- Infrastructure/DevOps (Kubernetes, AWS)
- Performance testing and optimization
- Security best practices

### Training Needs
- Team review of the job queue pattern
- S3 and AWS integration workshop
```

### Risk Management Section

```markdown
## Risks & Mitigation

### Technical Risks

**Risk: Job Queue Reliability Issues**
- **Likelihood**: Medium
- **Impact**: High (feature doesn't work)
- **Mitigation**:
  - Use a managed Redis service (AWS ElastiCache)
  - Implement comprehensive error handling
  - Load test thoroughly before production
  - Have a rollback plan

**Risk: Large File Performance Problems**
- **Likelihood**: Medium
- **Impact**: Medium (performance targets missed)
- **Mitigation**:
  - Start performance testing early (Week 2)
  - Profile and optimize in Phase 4
  - Document performance constraints
  - Set data size limits if needed

**Risk: Data Consistency Issues**
- **Likelihood**: Low
- **Impact**: High (data corruption)
- **Mitigation**:
  - Implement data validation
  - Use database transactions
  - Test with edge cases
  - Have data audit procedures

### Scheduling Risks

**Risk: Phase Dependencies Cause Delays**
- **Likelihood**: Medium
- **Impact**: High (slips launch date)
- **Mitigation**:
  - Overlap Phases 2 and 3 to parallelize work
  - Start Phase 4 testing early
  - Have clear "done" criteria

**Risk: Data Model Spec Not Ready**
- **Likelihood**: Low
- **Impact**: High (blocks implementation)
- **Mitigation**:
  - Confirm spec status before starting
  - Have a backup data model if needed
  - Schedule early review meetings
```

### Success Metrics Section

```markdown
## Success Criteria

### Technical Metrics
- [ ] Export API processes 1000+ requests/day
- [ ] p95 latency < 100ms for status queries
- [ ] Export processing completes in < 5 minutes for 100MB files
- [ ] System reliability > 99.5%
- [ ] Zero data loss or corruption incidents

### Adoption Metrics
- [ ] 30%+ of enterprise users adopt the feature in the first month
- [ ] Average of 2+ exports per adopting user per month
- [ ] Support tickets about exports < 5/week

### Quality Metrics
- [ ] 90%+ test coverage
- [ ] Zero critical security issues
- [ ] Documentation 100% complete
- [ ] Team can operate the system independently
```

## Writing Tips

### Be Realistic About Timelines
- Include buffer time for unknowns (add 20-30%)
- Consider team capacity and interruptions
- Account for review and testing cycles
- Document assumptions about team size/availability

### Break Down Phases Clearly
- Each phase should have clear deliverables
- Phases should be independent or clearly sequenced
- Dependencies should be explicit
- Success criteria should be measurable

### Link to Related Specs
- Reference business requirements: `[BRD-001]`
- Reference technical requirements: `[PRD-001]`
- Reference design documents: `[DES-001]`
- Reference component specs: `[CMP-001]`

### Identify Risks Early
- What could go wrong?
- What's outside your control?
- What mitigations exist?
- What's the contingency plan?

### Track Against the Plan
- Update the plan weekly with actual progress
- Note slippages and root causes
- Adjust future phases if needed
- Use it as learning for future planning

## Validation & Fixing Issues

### Run the Validator
```bash
scripts/validate-spec.sh docs/specs/plan/pln-001-your-spec.md
```

### Common Issues & Fixes

**Issue**: "Phases lack clear deliverables"
- **Fix**: Add specific, measurable deliverables for each phase

**Issue**: "No timeline or dates specified"
- **Fix**: Add start/end dates and duration for each phase

**Issue**: "Dependencies not documented"
- **Fix**: Identify and document blocking dependencies between phases

**Issue**: "Resource allocation unclear"
- **Fix**: Specify team members, their roles, and time commitment per phase

## Decision-Making Framework

When planning implementation:

1. **Scope**: What exactly are we building?
   - Must-haves vs. nice-to-haves?
   - What can we defer?

2. **Sequence**: What must be done in order?
   - What can happen in parallel?
   - Where are critical-path bottlenecks?

3. **Phases**: How do we break this into manageable chunks?
   - 1-3 week phases work well
   - Each should produce something shippable/testable
   - Clear entry/exit criteria

4. **Resources**: What do we need?
   - Team skills and capacity?
   - Infrastructure and tools?
   - External dependencies?

5. **Risk**: What could derail us?
   - Technical risks?
   - Timeline risks?
   - Resource risks?
   - Mitigation strategies?

## Next Steps

1. **Create the spec**: `scripts/generate-spec.sh plan pln-XXX-slug`
2. **Research**: Find related specs and understand scope
3. **Define phases**: Break work into logical chunks
4. **Map dependencies**: Understand what blocks what
5. **Estimate effort**: How long will each phase take?
6. **Identify risks**: What could go wrong?
7. **Validate**: `scripts/validate-spec.sh docs/specs/plan/pln-XXX-slug.md`
8. **Share with the team** for feedback and planning
382 skills/spec-author/guides/technical-requirement.md Normal file
@@ -0,0 +1,382 @@
# How to Create a Technical Requirement Specification

Technical Requirements (PRD or TRQ) translate business needs into specific, implementation-ready technical requirements. They bridge the gap between "what we want to build" (business requirements) and "how we'll build it" (design documents).

## Quick Start

```bash
# 1. Create a new technical requirement
scripts/generate-spec.sh technical-requirement prd-001-descriptive-slug

# 2. Open and fill in the file
# (The file will be created at: docs/specs/technical-requirement/prd-001-descriptive-slug.md)

# 3. Fill in the sections, then validate:
scripts/validate-spec.sh docs/specs/technical-requirement/prd-001-descriptive-slug.md

# 4. Fix any issues, then check completeness:
scripts/check-completeness.sh docs/specs/technical-requirement/prd-001-descriptive-slug.md
```

## When to Write a Technical Requirement

Use a Technical Requirement when you need to:
- Define specific technical implementation details for a feature
- Map business requirements to technical solutions
- Document design decisions and their rationale
- Create acceptance criteria that engineers can test against
- Specify external dependencies and constraints

## Research Phase

Before writing, do your homework:

### 1. Research Related Specifications
Look for upstream and downstream specs:
```bash
# Find the business requirement this fulfills
grep -r "brd\|business" docs/specs/ --include="*.md" | head -20

# Find any existing technical requirements in this domain
grep -r "prd\|technical" docs/specs/ --include="*.md" | head -20

# Find design documents that might inform this
grep -r "design\|architecture" docs/specs/ --include="*.md" | head -20
```

### 2. Research External Documentation
Research relevant technologies and patterns:

```bash
# For external libraries/frameworks:
# Use the doc tools to get latest official documentation
# Example: research React hooks if implementing a frontend component
# Example: research database indexing strategies if working with large datasets
```

Ask yourself:
- What technologies are most suitable for this?
- Are there industry standards we should follow?
- What does the existing codebase use for similar features?
- Are there performance benchmarks or best practices we should know about?

### 3. Review the Codebase
- How have similar features been implemented?
- What patterns does the team follow?
- What libraries/frameworks are already in use?
- Are there existing utilities or services we can reuse?

## Structure & Content Guide

### Title & Metadata
- **Title**: Clear, specific requirement (e.g., "Implement Real-Time Notification System")
- **Priority**: critical | high | medium | low
- **Document ID**: Use the format `PRD-XXX-slug` (e.g., `PRD-001-export-api`)

### Description Section
Answer: "What technical problem are we solving?"

Describe:
- The technical challenge you're addressing
- Why this particular approach matters
- Current technical gaps
- How this impacts system architecture or performance

Example:
```
Currently, bulk exports run synchronously, blocking requests for up to 30 seconds.
This causes timeout errors for exports > 100MB. We need an asynchronous export
system that handles large datasets efficiently.
```

### Business Requirements Addressed Section
Reference the business requirements this fulfills:
```
- [BRD-001] Bulk User Data Export - This implementation enables the export feature
- [BRD-002] Enterprise Data Audit - This provides the data integrity requirements
```

Link each BRD to how your technical solution addresses it.

### Technical Requirements Section
List specific, measurable technical requirements:

```markdown
1. **[TR-001] Asynchronous Export Processing**
   - Exports must complete within 5 minutes for datasets up to 500MB
   - Must not block HTTP request threads
   - Must handle a job queue with at least 100 concurrent exports

2. **[TR-002] Data Format Support**
   - Support CSV, JSON, and Parquet formats
   - All formats must preserve data types accurately
   - Handle special characters and encodings (UTF-8, etc.)

3. **[TR-003] Resilience & Retries**
   - Failed exports must retry up to 3 times with exponential backoff
   - Incomplete exports must be resumable or cleanly failed
```

**Tips:**
- Be specific: use numbers, formats, standards
- Make it testable: each requirement should be verifiable
- Reference technical specs: link to API contracts, data models, etc.
- Include edge cases: what happens at boundaries or on error conditions?

### Implementation Approach Section
Describe the high-level technical strategy:

```markdown
**Architecture Pattern**
We'll use a job queue pattern with async workers. HTTP requests will create
an export job and return immediately. Workers process jobs asynchronously
and notify users when complete.

**Key Technologies**
- Job Queue: Redis with Bull library
- Export Service: Node.js worker process
- Storage: S3 for export files
- Notifications: Email service

**Integration Points**
- Integrates with existing User Service API
- Uses auth middleware for permission checking
- Publishes completion events to event bus
```
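
A minimal sketch of the pattern described above: the HTTP handler only enqueues a job and returns immediately, leaving the work to the queue's workers. This assumes Express and Bull; the route, queue name, and payload fields are illustrative, not a prescribed API:

```typescript
// Hypothetical sketch: enqueue-and-return-immediately export endpoint.
import express from 'express';
import Queue from 'bull';

const app = express();
const exportQueue = new Queue('exports', process.env.REDIS_URL ?? 'redis://localhost:6379');

app.post('/exports', express.json(), async (req, res) => {
  const job = await exportQueue.add({
    userId: req.body.user_id,
    format: req.body.format, // e.g. "csv" or "json"
  });
  // Respond right away; clients poll GET /exports/{id} for status.
  res.status(202).json({ export_id: job.id, status: 'queued' });
});

app.listen(3000);
```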

### Key Design Decisions Section
Document important choices:

```markdown
**Decision 1: Asynchronous Export vs. Synchronous**
- **Decision**: Use an async job queue instead of blocking requests
- **Rationale**: The synchronous approach causes timeouts for large exports;
  async improves reliability and user experience
- **Tradeoffs**: Adds complexity (job queue, worker processes, status tracking)
  but enables exports for datasets up to 500MB vs. the 50MB limit
```

**Why this matters:**
- Explains the "why" behind technical choices
- Helps future developers understand constraints
- Documents tradeoffs explicitly

### Technical Acceptance Criteria Section
Define how you'll know this is implemented correctly:

```markdown
### [TAC-001] Export Job Creation
**Description**: When a user requests an export, a job is created and queued
**Verification**: Unit test verifies the job is created with correct parameters;
integration test verifies the job appears in the queue

### [TAC-002] Async Processing
**Description**: Export job completes without blocking the HTTP request
**Verification**: Load test shows HTTP response time < 100ms regardless of
export size; export job completes within the target time

### [TAC-003] Export Format Accuracy
**Description**: Exported data matches source data exactly (no data loss)
**Verification**: Property-based tests verify format accuracy for various
data types and edge cases
```

**Tips for Acceptance Criteria:**
- Each should be testable (unit test, integration test, or manual test)
- Include both the happy path and edge cases
- Reference specific metrics or standards

### Dependencies Section

**Technical Dependencies**
- What libraries, services, or systems must be in place?
- What versions are required?
- What's the risk if a dependency is unavailable?

```markdown
- **Redis** (v6.0+) - Job queue | Risk: Medium
- **Bull** (v3.0+) - Queue library | Risk: Low
- **S3** - Export file storage | Risk: Low
- **Email Service API** - User notifications | Risk: Medium
```

**Specification Dependencies**
- What other specs must be completed first?
- Why is this a blocker?

```markdown
- [API-001] Export Endpoints - Must be designed before implementation
- [DATA-001] User Data Model - Need the schema to understand the export structure
```

### Constraints Section
Document technical limitations:

```markdown
**Performance**
- Exports must complete within 5 minutes
- p95 latency for export requests must be < 100ms
- System must handle 100 concurrent exports

**Scalability**
- Support up to 500MB export files
- Handle 1000+ daily exports

**Security**
- Only export the user's own data (auth-based filtering)
- Encrypt files in transit and at rest
- Audit logs for all exports

**Compatibility**
- Support all major browsers (Chrome, Firefox, Safari, Edge)
- Works with the existing authentication system
```

### Implementation Notes Section

**Key Considerations**
What should the implementation team watch out for?

```markdown
**Error Handling**
- Handle network interruptions during export
- Fail gracefully if S3 becomes unavailable
- Provide clear error messages to users

**Testing Strategy**
- Unit tests for export formatting logic
- Integration tests for the job queue and workers
- Load tests for concurrent export handling
- Property-based tests for data accuracy
```
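
As one illustration of the last bullet, a property-based round-trip test might look like the sketch below, assuming Jest with fast-check; `formatCsv` and `parseCsv` are placeholders for the formatter under test:

```typescript
// Hypothetical sketch: property-based round-trip test for a CSV formatter.
import fc from 'fast-check';
import { formatCsv, parseCsv } from './csv'; // assumed module under test

test('CSV export preserves rows, including special characters', () => {
  fc.assert(
    fc.property(
      fc.array(fc.record({ id: fc.nat(), name: fc.string() })),
      (rows) => {
        // Whatever we format, parsing it back must yield the same rows.
        expect(parseCsv(formatCsv(rows))).toEqual(rows);
      },
    ),
  );
});
```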

**Migration Strategy** (if applicable)
- How do we transition from the old system to the new one?
- What about existing data or users?

## Writing Tips

### Make Requirements Testable
- ❌ Bad: "Export should be fast"
- ✅ Good: "Export must complete within 5 minutes for datasets up to 500MB, with p95 latency under 100ms"

### Be Specific About Trade-offs
- Don't just say "we chose Redis"
- Explain: "We chose Redis over RabbitMQ because it's already in our stack and provides the job persistence we need"

### Link to Other Specs
- Reference business requirements this fulfills: `[BRD-001]`
- Reference data models: `[DATA-001]`
- Reference API contracts: `[API-001]`
- Reference design documents: `[DES-001]`

### Document Constraints Clearly
- Performance targets with specific numbers
- Scalability limits and assumptions
- Security and compliance requirements
- Browser/platform support

### Include Edge Cases
- What happens with extremely large datasets?
- How do we handle special characters, encoding issues, missing data?
- What about rate limiting and concurrent requests?

### Complete All TODOs
- Replace placeholder text with actual decisions
- If something is still undecided, explain what needs to happen to decide it

## Validation & Fixing Issues

### Run the Validator
```bash
scripts/validate-spec.sh docs/specs/technical-requirement/prd-001-your-spec.md
```

### Common Issues & Fixes

**Issue**: "Missing Technical Acceptance Criteria"
- **Fix**: Add 3-5 criteria describing how you'll verify implementation correctness

**Issue**: "TODO items in Implementation Approach (2 items)"
- **Fix**: Complete the architecture pattern, technologies, and integration points

**Issue**: "No Performance constraints specified"
- **Fix**: Add specific latency, throughput, and availability targets

**Issue**: "Dependencies section incomplete"
- **Fix**: List all required libraries, services, and other specifications this depends on

### Check Completeness
```bash
scripts/check-completeness.sh docs/specs/technical-requirement/prd-001-your-spec.md
```

## Decision-Making Framework

As you write the technical requirement, reason through:

1. **Problem**: What technical problem are we solving?
   - Is this a performance issue, reliability issue, or capability gap?
   - What's the current cost of not solving this?

2. **Approach**: What are the viable technical approaches?
   - Pros and cons of each?
   - What's the simplest approach that solves the problem?
   - What does the team have experience with?

3. **Trade-offs**: What are we accepting with this approach?
   - Complexity vs. flexibility?
   - Performance vs. maintainability?
   - Immediate need vs. future extensibility?

4. **Measurability**: How will we know this works?
   - What specific metrics define success?
   - What's the threshold for "passing"?

5. **Dependencies**: What must happen first?
   - Are there blockers we need to resolve?
   - Can parts be parallelized?

## Example: Complete Technical Requirement

```markdown
# [PRD-001] Asynchronous Export Service

**Priority:** High

## Description
Currently, bulk exports run synchronously, blocking HTTP requests for up to
30 seconds, causing timeouts for exports > 100MB. We need an asynchronous
export system that handles large datasets efficiently and provides job status
tracking to users.

## Business Requirements Addressed
- [BRD-001] Bulk User Data Export - Enables the core export feature
- [BRD-002] Enterprise Audit Requirements - Provides reliable data export

## Technical Requirements

1. **[TR-001] Asynchronous Processing**
   - Export jobs must not block HTTP requests
   - Jobs complete within 5 minutes for datasets up to 500MB
   - System handles 100 concurrent exports

2. **[TR-002] Format Support**
   - Support CSV and JSON formats
   - Preserve data types and handle special characters

3. **[TR-003] Job Status Tracking**
   - Users can check export job status via API
   - Job history retained for 30 days

... [rest of sections follow] ...
```

## Next Steps

1. **Create the spec**: `scripts/generate-spec.sh technical-requirement prd-XXX-slug`
2. **Research**: Find the related BRD and understand the context
3. **Fill in sections** using this guide
4. **Validate**: `scripts/validate-spec.sh docs/specs/technical-requirement/prd-XXX-slug.md`
5. **Fix issues** identified by the validator
6. **Share with the architecture/design team** for design document creation