Initial commit

Zhongwei Li
2025-11-30 08:45:31 +08:00
commit ca9b85ccda
35 changed files with 10784 additions and 0 deletions

# Spec Author Instruction Guides
This directory contains comprehensive instruction guides for creating each type of specification document supported by the spec-author skill. Each guide includes:
- **Quick Start**: Get up and running quickly with basic commands
- **Research Phase**: Guidance on researching related specs and external documentation
- **Structure & Content Guide**: Detailed walkthrough of each section
- **Writing Tips**: Best practices and common pitfalls
- **Validation & Fixing Issues**: How to use the validation tools
- **Decision-Making Framework**: Questions to ask while writing
- **Next Steps**: Complete workflow from creation to completion
## Specification Types
### Business & Planning
#### [Business Requirement (brd-XXX)](./business-requirement.md)
Capture what problem you're solving and why it matters from a business perspective. Translate customer needs into requirements that engineering can build against.
**Use when**: Documenting new features, defining business value, creating stakeholder alignment
**Key sections**: Business value, user stories, acceptance criteria, success metrics
#### [Technical Requirement (prd-XXX)](./technical-requirement.md)
Translate business needs into specific, implementation-ready technical requirements. Bridge the gap between "what we want" and "how we'll build it."
**Use when**: Defining technical implementation details, mapping business requirements to solutions
**Key sections**: Technical requirements, design decisions, acceptance criteria, constraints
#### [Plan (pln-XXX)](./plan.md)
Document implementation roadmaps, project timelines, phases, and deliverables. Provide the "how and when" we'll build something.
**Use when**: Planning project execution, defining phases and timeline, identifying dependencies
**Key sections**: Phases, timeline, deliverables, dependencies, risks, resources
#### [Milestone (mls-XXX)](./milestone.md)
Define specific delivery checkpoints within a project, including deliverables, success criteria, and timeline.
**Use when**: Defining delivery targets, communicating progress, tracking against concrete deliverables
**Key sections**: Deliverables, success criteria, timeline, blockers, acceptance procedures
### Architecture & Design
#### [Design Document (des-XXX)](./design-document.md)
Provide the detailed architectural and technical design for a system, component, or significant feature.
**Use when**: Major system redesign, architectural decisions, technology choices
**Key sections**: Proposed solution, design decisions, technology stack, trade-offs, implementation plan
#### [Component (cmp-XXX)](./component.md)
Document individual system components or services, including responsibilities, interfaces, configuration, and deployment.
**Use when**: Documenting microservices, major system components, architectural pieces
**Key sections**: Responsibilities, interfaces, configuration, deployment, monitoring
#### [Flow Schematic (flow-XXX)](./flow-schematic.md)
Document business processes, workflows, and system flows visually and textually. Show how information moves through systems.
**Use when**: Documenting user workflows, system interactions, complex processes
**Key sections**: Flow diagram, step-by-step descriptions, decision points, error handling
### Data & Contracts
#### [Data Model (data-XXX)](./data-model.md)
Define entities, fields, relationships, and constraints for your application's data, capturing the "shape" of the data your system works with.
**Use when**: Planning database schema, documenting entity relationships, enabling API/UI teams
**Key sections**: Entity definitions, relationships, constraints, scaling considerations
#### [API Contract (api-XXX)](./api-contract.md)
Document the complete specification of REST/GraphQL endpoints, including request/response formats, error handling, and authentication.
**Use when**: Defining API endpoints, enabling parallel frontend/backend development, creating living documentation
**Key sections**: Authentication, endpoints, response formats, error handling, rate limiting
### Operations & Configuration
#### [Deployment Procedure (deploy-XXX)](./deployment-procedure.md)
Document step-by-step instructions for deploying systems to production, including prerequisites, procedures, rollback, and troubleshooting.
**Use when**: Deploying services, creating runbooks, enabling safe operations, ensuring repeatable deployments
**Key sections**: Prerequisites, deployment steps, rollback procedure, success criteria, troubleshooting
#### [Configuration Schema (config-XXX)](./configuration-schema.md)
Document all configurable parameters for a system, including types, valid values, defaults, and impact.
**Use when**: Documenting system configuration, enabling ops teams to configure safely, supporting multiple environments
**Key sections**: Configuration methods, field descriptions, validation rules, environment-specific examples
## How to Use These Guides
### For New Spec Types
1. **Choose your spec type** based on what you're documenting
2. **Go to the corresponding guide** (e.g., if creating a business requirement, read `business-requirement.md`)
3. **Follow the Quick Start** to generate a new spec from the template
4. **Work through the Research Phase** to understand context
5. **Use the Structure & Content Guide** to fill in each section
6. **Apply the Writing Tips** as you write
7. **Run validation** using the tools
8. **Follow the Decision-Making Framework** to reason through tough choices
### For Improving Existing Specs
1. **Check the appropriate guide** for what your spec should contain
2. **Review the "Validation & Fixing Issues"** section for common problems
3. **Use validation tools** to identify what's missing
4. **Fill in missing sections** using the structure guide
5. **Fix any issues** the validator identifies
### For Team Standards
1. **Each guide provides concrete standards** for each spec type
2. **Sections that are marked as "required"** should be in all specs
3. **Examples in each guide** show the expected quality and detail level
4. **Validation rules** ensure consistent structure across specs
## Quick Reference: CLI Commands
### Create a New Spec
```bash
# Generate a new spec from template
scripts/generate-spec.sh <spec-type> <spec-id>
# Examples:
scripts/generate-spec.sh business-requirement brd-001-user-export
scripts/generate-spec.sh design-document des-001-export-arch
scripts/generate-spec.sh api-contract api-001-export-endpoints
```
### Validate a Spec
```bash
# Check a spec for completeness and structure
scripts/validate-spec.sh docs/specs/business-requirement/brd-001-user-export.md
```
Returns:
- **PASS**: All required sections present and complete
- **WARNINGS**: Missing optional sections or incomplete TODOs
- **ERRORS**: Missing critical sections or structural issues
### Check Completeness
```bash
# See what's incomplete and what TODOs need attention
scripts/check-completeness.sh docs/specs/business-requirement/brd-001-user-export.md
```
Shows:
- Completion percentage
- Missing sections with descriptions
- TODO items that need completion
- Referenced documents
### List Available Templates
```bash
# See what spec types are available
scripts/list-templates.sh
```
## Workflow Example: Creating a Feature End-to-End
### Step 1: Business Requirements
**Create**: `scripts/generate-spec.sh business-requirement brd-001-bulk-export`
**Guide**: Follow [business-requirement.md](./business-requirement.md)
**Output**: `docs/specs/business-requirement/brd-001-bulk-export.md`
### Step 2: Technical Requirements
**Create**: `scripts/generate-spec.sh technical-requirement prd-001-export-api`
**Guide**: Follow [technical-requirement.md](./technical-requirement.md)
**Reference**: Link to BRD created in Step 1
**Output**: `docs/specs/technical-requirement/prd-001-export-api.md`
### Step 3: Design Document
**Create**: `scripts/generate-spec.sh design-document des-001-export-arch`
**Guide**: Follow [design-document.md](./design-document.md)
**Reference**: Link to PRD and BRD
**Output**: `docs/specs/design-document/des-001-export-arch.md`
### Step 4: Data Model
**Create**: `scripts/generate-spec.sh data-model data-001-export-schema`
**Guide**: Follow [data-model.md](./data-model.md)
**Reference**: Entities used in design
**Output**: `docs/specs/data-model/data-001-export-schema.md`
### Step 5: API Contract
**Create**: `scripts/generate-spec.sh api-contract api-001-export-endpoints`
**Guide**: Follow [api-contract.md](./api-contract.md)
**Reference**: Link to technical requirements
**Output**: `docs/specs/api-contract/api-001-export-endpoints.md`
### Step 6: Component Specs
**Create**: `scripts/generate-spec.sh component cmp-001-export-service`
**Guide**: Follow [component.md](./component.md)
**Reference**: Link to design and technical requirements
**Output**: `docs/specs/component/cmp-001-export-service.md`
### Step 7: Implementation Plan
**Create**: `scripts/generate-spec.sh plan pln-001-export-implementation`
**Guide**: Follow [plan.md](./plan.md)
**Reference**: Link to all related specs
**Output**: `docs/specs/plan/pln-001-export-implementation.md`
### Step 8: Define Milestones
**Create**: `scripts/generate-spec.sh milestone mls-001-export-phase1`
**Guide**: Follow [milestone.md](./milestone.md)
**Reference**: Link to plan created in Step 7
**Output**: `docs/specs/milestone/mls-001-export-phase1.md`
### Step 9: Document Workflows
**Create**: `scripts/generate-spec.sh flow-schematic flow-001-export-process`
**Guide**: Follow [flow-schematic.md](./flow-schematic.md)
**Reference**: Illustrate flows described in design and API
**Output**: `docs/specs/flow-schematic/flow-001-export-process.md`
### Step 10: Configuration Schema
**Create**: `scripts/generate-spec.sh configuration-schema config-001-export-service`
**Guide**: Follow [configuration-schema.md](./configuration-schema.md)
**Reference**: Used by component and deployment
**Output**: `docs/specs/configuration-schema/config-001-export-service.md`
### Step 11: Deployment Procedure
**Create**: `scripts/generate-spec.sh deployment-procedure deploy-001-export-production`
**Guide**: Follow [deployment-procedure.md](./deployment-procedure.md)
**Reference**: Link to component, configuration, and plan
**Output**: `docs/specs/deployment-procedure/deploy-001-export-production.md`
## Tips for Success
### 1. Research Thoroughly
- Use the Research Phase section in each guide
- Look for related specs that have already been created
- Research external docs using doc tools or web search
- Understand the context before writing
### 2. Use CLI Tools Effectively
- Start with `scripts/generate-spec.sh` to create from template (saves time)
- Use `scripts/validate-spec.sh` frequently while writing (catches issues early)
- Use `scripts/check-completeness.sh` to find TODOs that need attention
- Run validation before considering a spec "done"
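The commands above compose naturally into a single pass over the whole spec tree. A minimal sketch, assuming `validate-spec.sh` exits non-zero when a spec has errors:

```shell
# Validate every spec in one pass; count the ones that need attention
failed=0
for spec in docs/specs/*/*.md; do
  [ -e "$spec" ] || continue            # skip cleanly if the glob matched nothing
  scripts/validate-spec.sh "$spec" >/dev/null 2>&1 || failed=$((failed + 1))
done
echo "specs failing validation: $failed"
```

Running this before every commit keeps structural issues from accumulating.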
### 3. Complete All Sections
- "Required" sections should be in every spec
- "Optional" sections may be skipped if not applicable
- Never leave placeholder text or TODO items in final specs
- Incomplete specs cause confusion and rework
### 4. Link Specs Together
- Reference related specs using [ID] format (e.g., `[BRD-001]`)
- Show how specs depend on each other
- This creates a web of related documentation
- Makes specs more discoverable
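Because references use a consistent `[ID]` token, the link graph is greppable. A quick demonstration with a throwaway directory (the paths and file names here are hypothetical):

```shell
# Build a tiny spec tree and trace which files reference BRD-001
mkdir -p /tmp/spec-links-demo
printf 'Implements [BRD-001].\n' > /tmp/spec-links-demo/prd-001-export-api.md
printf 'No references here.\n'   > /tmp/spec-links-demo/des-001-export-arch.md
grep -rl '\[BRD-001\]' /tmp/spec-links-demo
```

The same `grep -rl '\[BRD-001\]' docs/specs/` works against a real spec tree.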
### 5. Use Concrete Examples
- Concrete examples are clearer than abstract descriptions
- Show actual data, requests, responses
- Include sample configurations
- Show before/after if describing changes
### 6. Get Feedback Early
- Share early drafts with stakeholders
- Use validation to catch structural issues
- Get domain experts to review for accuracy
- Iterate based on feedback
### 7. Keep Updating
- Specs should reflect current state, not initial design
- Update when important decisions change
- Mark what was decided and when
- Document why changes were made
## Common Patterns Across Guides
### Quick Start Pattern
Every guide starts with:
```bash
scripts/generate-spec.sh <type> <id>
# Edit file
scripts/validate-spec.sh docs/specs/...
scripts/check-completeness.sh docs/specs/...
```
### Research Phase Pattern
Every guide recommends:
1. Finding related specs
2. Understanding external context
3. Reviewing existing patterns
4. Understanding constraints/requirements
### Structure Pattern
Every guide provides:
- Detailed walkthrough of each section
- Purpose of each section
- What should be included
- How detailed to be
### Validation Pattern
Every guide includes:
- Running the validator
- Common issues and how to fix them
- Checking completeness
### Decision Pattern
Every guide encourages thinking through:
1. Scope/boundaries
2. Options and trade-offs
3. Specific decisions and rationale
4. Communication and approval
5. Evolution and change
## Getting Help
### For questions about a specific spec type:
- Read the corresponding guide in this directory
- Check the examples section for concrete examples
- Review the decision-making framework for guidance
### For validation issues:
- Run `./scripts/validate-spec.sh` to see what's missing
- Read the "Validation & Fixing Issues" section of the guide
- Check if required sections are present and complete
### For understanding the bigger picture:
- Read through related guides to see how specs connect
- Look at the Workflow Example to see the full flow
- Review the Common Patterns section
## Next Steps
1. **Pick a spec type** you need to create
2. **Read the corresponding guide** thoroughly
3. **Run the generate command** to create from template
4. **Follow the structure guide** to fill in sections
5. **Validate frequently** as you work
6. **Fix issues** the validator identifies
7. **Get feedback** from stakeholders
8. **Consider this "complete"** when validator passes ✓
Good luck writing great specs! Remember: clear, complete specifications save time and prevent mistakes later in the development process.

# How to Create an API Contract Specification
API Contracts document the complete specification of REST/GraphQL endpoints, including request/response formats, error handling, and authentication. They serve as the contract between frontend and backend teams.
## Quick Start
```bash
# 1. Create a new API contract
scripts/generate-spec.sh api-contract api-001-descriptive-slug
# 2. Open and fill in the file
# (The file will be created at: docs/specs/api-contract/api-001-descriptive-slug.md)
# 3. Fill in endpoints and specifications, then validate:
scripts/validate-spec.sh docs/specs/api-contract/api-001-descriptive-slug.md
# 4. Fix issues and check completeness:
scripts/check-completeness.sh docs/specs/api-contract/api-001-descriptive-slug.md
```
## When to Write an API Contract
Use an API Contract when you need to:
- Define REST API endpoints and their behavior
- Document request/response schemas in detail
- Specify error handling and status codes
- Clarify authentication and authorization
- Enable parallel frontend/backend development
- Create living documentation of your API
## Research Phase
### 1. Research Related Specifications
Find what this API needs to support:
```bash
# Find technical requirements this fulfills
grep -r "prd\|technical" docs/specs/ --include="*.md"
# Find data models this API exposes
grep -r "data\|model" docs/specs/ --include="*.md"
# Find existing APIs in the codebase
grep -r "api\|endpoint" docs/specs/ --include="*.md"
```
### 2. Research API Design Standards
Understand best practices and conventions:
- REST conventions: HTTP methods, status codes, URL structure
- Pagination: How to handle large result sets?
- Error handling: Standard error format for your org?
- Versioning: How do you version APIs?
- Naming conventions: camelCase vs. snake_case?
Research your tech stack's conventions if needed.
### 3. Review Existing APIs
- How are existing APIs in your codebase designed?
- What patterns does your team follow?
- Any shared infrastructure (API gateway, auth)?
- Error response format standards?
### 4. Understand Data Models
- What entities are exposed?
- Which fields are required vs. optional?
- See [DATA-001] or similar specs for schema details
## Structure & Content Guide
### Title & Metadata
- **Title**: "User Export API" or similar
- Include context about what endpoints are included
- Version number if this is an update to an existing API
### Overview Section
Provide context for the API:
```markdown
# User Export API
This API provides endpoints for initiating, tracking, and downloading user data exports.
Supports bulk export of user information in multiple formats (CSV, JSON).
Authenticated requests only.
**Base URL**: `https://api.example.com/v1`
**Authentication**: Bearer token (JWT)
```
### Authentication & Authorization Section
Describe how authentication works:
```markdown
## Authentication
**Method**: Bearer Token (JWT)
**Header**: `Authorization: Bearer {token}`
**Token Source**: Obtained from `/auth/login` endpoint
### Authorization
**Required**: All endpoints require valid JWT token
**Scopes** (if using OAuth/scope-based):
- `exports:read` - View export status
- `exports:create` - Create new exports
- `exports:download` - Download export files
**User Data**: Users can only access their own exports (enforced server-side)
```
### Endpoints Section
Document each endpoint thoroughly:
#### Endpoint: Create Export
````markdown
**POST /exports**
Creates a new export job for the authenticated user.
### Request
**Headers**
- `Authorization: Bearer {token}` (required)
- `Content-Type: application/json`
**Body**
```json
{
  "data_types": ["users", "transactions"],
  "format": "csv",
  "date_range": {
    "start": "2024-01-01",
    "end": "2024-01-31"
  }
}
```
**Parameters**
- `data_types` (array, required): Types of data to include
  - Allowed values: `users`, `transactions`, `settings`
  - At least one required
- `format` (string, required): Export file format
  - Allowed values: `csv`, `json`
- `date_range` (object, optional): Filter data by date range
  - `start` (string, ISO8601 format)
  - `end` (string, ISO8601 format)
### Response
**Status: 201 Created**
```json
{
  "id": "exp_1234567890",
  "user_id": "usr_9876543210",
  "status": "queued",
  "format": "csv",
  "data_types": ["users", "transactions"],
  "created_at": "2024-01-15T10:30:00Z",
  "estimated_completion": "2024-01-15T10:35:00Z"
}
```
**Status: 400 Bad Request**
```json
{
  "error": "invalid_request",
  "message": "data_types must include at least one type",
  "code": "VALIDATION_ERROR"
}
```
**Status: 401 Unauthorized**
```json
{
  "error": "unauthorized",
  "message": "Invalid or missing authorization token",
  "code": "AUTH_FAILED"
}
```
**Status: 429 Too Many Requests**
```json
{
  "error": "rate_limited",
  "message": "Too many requests. Try again after 60 seconds.",
  "retry_after": 60
}
```
### Details
**Rate Limiting**
- 10 exports per hour per user
- Returns `X-RateLimit-*` headers
  - `X-RateLimit-Limit: 10`
  - `X-RateLimit-Remaining: 5`
  - `X-RateLimit-Reset: 1705319400`
**Notes**
- Exports larger than 100MB are automatically gzipped
- User receives email notification when export is ready
- Export files retained for 7 days
````
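A request like the one specified above can be exercised from the command line. The token and base URL are the placeholders from this guide, so the `curl` call is shown commented out rather than run; the body is sanity-checked locally first:

```shell
# Assemble the Create Export request body and confirm it is well-formed JSON
body='{"data_types":["users","transactions"],"format":"csv"}'
echo "$body" | python3 -m json.tool >/dev/null && echo "body is valid JSON"
# curl -s -X POST https://api.example.com/v1/exports \
#   -H "Authorization: Bearer $TOKEN" \
#   -H "Content-Type: application/json" \
#   -d "$body"
```

Including a runnable request like this in the spec itself lets reviewers try the contract before any code exists.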
#### Endpoint: Get Export Status
````markdown
**GET /exports/{export_id}**
Retrieve the status of a specific export.
### Path Parameters
- `export_id` (string, required): Export ID (e.g., `exp_1234567890`)
### Response
**Status: 200 OK**
```json
{
  "id": "exp_1234567890",
  "user_id": "usr_9876543210",
  "status": "completed",
  "format": "csv",
  "data_types": ["users", "transactions"],
  "created_at": "2024-01-15T10:30:00Z",
  "completed_at": "2024-01-15T10:35:00Z",
  "file_size_bytes": 2048576,
  "download_url": "https://exports.example.com/exp_1234567890.csv.gz",
  "download_expires_at": "2024-01-22T10:35:00Z"
}
```
**Status: 404 Not Found**
```json
{
  "error": "not_found",
  "message": "Export not found",
  "code": "EXPORT_NOT_FOUND"
}
```
### Export Status Values
- `queued` - Job is waiting to be processed
- `processing` - Job is currently running
- `completed` - Export is ready for download
- `failed` - Export failed (see error field)
- `cancelled` - User cancelled the export
### Error Field (when status: failed)
```json
{
  "error": "export_failed",
  "message": "Database connection lost during export"
}
```
````
#### Endpoint: Download Export
````markdown
**GET /exports/{export_id}/download**
Download the export file.
### Path Parameters
- `export_id` (string, required): Export ID
### Response
**Status: 200 OK**
- Returns binary file content
- Content-Type: `text/csv` or `application/json`
- Headers include:
  - `Content-Disposition: attachment; filename=export.csv`
  - `Content-Length: 2048576`
**Status: 410 Gone**
```json
{
  "error": "gone",
  "message": "Export file expired (retention: 7 days)",
  "code": "FILE_EXPIRED"
}
```
````
### Response Formats Section
Define common response formats used across endpoints:
````markdown
## Common Response Formats
### Error Response
All errors follow this format (`request_id` is included for support/debugging):
```json
{
  "error": "error_code",
  "message": "Human-readable error message",
  "code": "ERROR_CODE",
  "request_id": "req_abc123"
}
```
### Pagination (for list endpoints)
`data` holds the array of items for the current page:
```json
{
  "data": [],
  "pagination": {
    "total": 150,
    "limit": 20,
    "offset": 0,
    "next": "https://api.example.com/v1/exports?limit=20&offset=20"
  }
}
```
````
### Error Handling Section
Document error scenarios and status codes:
```markdown
## Error Handling
### HTTP Status Codes
- **200 OK**: Request succeeded
- **201 Created**: Resource created successfully
- **400 Bad Request**: Invalid request format or parameters
- **401 Unauthorized**: Missing or invalid authentication
- **403 Forbidden**: Authenticated but not authorized (e.g., trying to access another user's export)
- **404 Not Found**: Resource doesn't exist
- **409 Conflict**: Request conflicts with current state (e.g., cancelling completed export)
- **429 Too Many Requests**: Rate limit exceeded
- **500 Internal Server Error**: Server error
- **503 Service Unavailable**: Service temporarily unavailable
### Error Codes
- `VALIDATION_ERROR` - Invalid input parameters
- `AUTH_FAILED` - Authentication failed
- `NOT_AUTHORIZED` - Insufficient permissions
- `NOT_FOUND` - Resource doesn't exist
- `CONFLICT` - Conflicting request state
- `RATE_LIMITED` - Rate limit exceeded
- `INTERNAL_ERROR` - Server error (retryable)
- `SERVICE_UNAVAILABLE` - Service temporarily down (retryable)
### Retry Strategy
**Retryable errors** (5xx, 429):
- Implement exponential backoff: 1s, 2s, 4s, 8s...
- Maximum 3 retries
**Non-retryable errors** (4xx except 429):
- Return error immediately to client
```
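The backoff schedule above (1s, 2s, 4s, 8s..., maximum 3 retries) can be sketched in shell; the real request would replace the `echo`, and the `sleep` is commented out to keep the sketch instant:

```shell
# Exponential backoff: double the delay after each failed attempt
attempt=0
max_retries=3
delay=1
while [ "$attempt" -lt "$max_retries" ]; do
  # replace this echo with the real call, and `break` on success
  echo "attempt $((attempt + 1)) failed; retrying in ${delay}s"
  # sleep "$delay"
  attempt=$((attempt + 1))
  delay=$((delay * 2))
done
```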
### Rate Limiting Section
```markdown
## Rate Limiting
### Limits per User
- Export creation: 10 per hour
- API calls: 1000 per hour
### Headers
All responses include rate limit information:
- `X-RateLimit-Limit`: Request quota
- `X-RateLimit-Remaining`: Requests remaining
- `X-RateLimit-Reset`: Unix timestamp when quota resets
### Handling Rate Limits
- If rate limited (429), client receives `Retry-After` header
- Retry after specified seconds
- Implement exponential backoff to avoid overwhelming API
```
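Since `X-RateLimit-Reset` is a Unix timestamp, the remaining wait is simple arithmetic (the reset value below is the sample from the headers above, which is in the past, so the wait comes out to zero):

```shell
# Seconds to wait until the quota resets; clamped so it is never negative
reset=1705319400            # X-RateLimit-Reset from the response headers
now=$(date +%s)
wait_s=$(( reset > now ? reset - now : 0 ))
echo "wait ${wait_s}s before retrying"
```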
### Data Types Section
If your API works with multiple data models, document them:
````markdown
## Data Types
### Export Object
```json
{
  "id": "string (export ID)",
  "user_id": "string (user ID)",
  "status": "string (queued|processing|completed|failed|cancelled)",
  "format": "string (csv|json)",
  "data_types": "string[] (users|transactions|settings)",
  "created_at": "string (ISO8601)",
  "completed_at": "string (ISO8601, null if not completed)",
  "file_size_bytes": "number (null if not completed)",
  "download_url": "string (null if not completed)",
  "download_expires_at": "string (ISO8601, null if expired)"
}
```
### User Object
```json
{
  "id": "string",
  "email": "string",
  "created_at": "string (ISO8601)"
}
```
````
### Versioning Section
```markdown
## API Versioning
**Current Version**: v1
### Versioning Strategy
- New major versions for breaking changes (v1, v2, etc.)
- Minor versions for additive changes (backwards compatible)
- Versions specified in URL path: `/v1/exports`
### Migration Timeline
- Old version support: Minimum 12 months after new version release
- Deprecation notice: 3 months before shutdown
### Breaking Changes
Examples of breaking changes requiring new version:
- Removing endpoints or fields
- Changing response format fundamentally
- Changing HTTP method of endpoint
```
## Writing Tips
### Be Specific About Request/Response
- Show actual JSON examples
- Document all fields (required vs. optional)
- Include data types and valid values
- Specify date/time formats (ISO8601)
### Document Error Scenarios
- List common error cases for each endpoint
- Show exact error response format
- Explain how client should handle each error
- Include HTTP status codes
### Think About Developer Experience
- Are endpoints intuitive?
- Is pagination consistent across endpoints?
- Are error messages helpful?
- Can a developer implement against this without asking questions?
### Link to Related Specs
- Reference data models: `[DATA-001]`
- Reference technical requirements: `[PRD-001]`
- Reference design docs: `[DES-001]`
### Version Your API
- Document versioning strategy
- Make it easy for clients to upgrade
- Provide migration path from old to new versions
## Validation & Fixing Issues
### Run the Validator
```bash
scripts/validate-spec.sh docs/specs/api-contract/api-001-your-spec.md
```
### Common Issues & Fixes
**Issue**: "Missing endpoint specifications"
- **Fix**: Document all endpoints with request/response examples
**Issue**: "Error handling not documented"
- **Fix**: Add status codes and error response formats
**Issue**: "No authentication section"
- **Fix**: Clearly document authentication method and authorization rules
**Issue**: "Incomplete endpoint details"
- **Fix**: Add request parameters, response examples, and error cases
## Decision-Making Framework
As you write the API spec, consider:
1. **Design**: Are endpoints intuitive and consistent?
- Consistent URL structure?
- Correct HTTP methods?
- Good naming?
2. **Data**: What fields are needed in requests/responses?
- Required vs. optional?
- Proper data types?
- Necessary for clients or redundant?
3. **Errors**: What can go wrong?
- Common error cases?
- Clear error messages?
- Actionable feedback for developers?
4. **Performance**: Are there efficiency considerations?
- Pagination for large result sets?
- Filtering/search capabilities?
- Rate limiting strategy?
5. **Evolution**: How will this API change?
- Versioning strategy?
- Backwards compatibility?
- Deprecation timeline?
## Next Steps
1. **Create the spec**: `scripts/generate-spec.sh api-contract api-XXX-slug`
2. **Research**: Find related data models and technical requirements
3. **Design endpoints**: Sketch out URL structure and HTTP methods
4. **Fill in details** for each endpoint using this guide
5. **Validate**: `scripts/validate-spec.sh docs/specs/api-contract/api-XXX-slug.md`
6. **Get review** from backend and frontend teams
7. **Share with implementation teams** for development

# How to Create a Business Requirement Specification
Business Requirements (BRD) capture what problem you're solving and why it matters from a business perspective. They translate customer needs into requirements that the technical team can build against.
## Quick Start
```bash
# 1. Create a new business requirement (auto-generates next ID)
scripts/generate-spec.sh business-requirement --next descriptive-slug
# This auto-assigns the next ID (e.g., brd-002-descriptive-slug)
# File created at: docs/specs/business-requirement/brd-002-descriptive-slug.md
# 2. Fill in the sections following the guide below
# 3. Validate completeness
scripts/validate-spec.sh docs/specs/business-requirement/brd-002-descriptive-slug.md
# 4. Fix any issues and check completeness
scripts/check-completeness.sh docs/specs/business-requirement/brd-002-descriptive-slug.md
```
**Pro tip:** Use `scripts/next-id.sh business-requirement` to see what the next ID will be before creating.
## When to Write a Business Requirement
Use a Business Requirement when you need to:
- Document a new feature or capability from the user's perspective
- Articulate the business value and expected outcomes
- Define acceptance criteria that stakeholders can verify
- Create alignment across product, engineering, and business teams
- Track success after implementation with specific metrics
## Research Phase
Before writing, gather context:
### 1. Research Related Specifications
Look for related specs that inform this requirement:
```bash
# Find related business requirements
grep -r "related\|similar\|user export" docs/specs/ --include="*.md" | head -20
# Or search for related technical requirements that might already exist
scripts/list-templates.sh # See what's already documented
```
### 2. Research External Documentation & Competitive Landscape
If available, research:
- Competitor features or how similar companies solve this problem
- Industry standards or best practices
- User research or survey data
- Customer feedback or support tickets
Use web tools if needed: Claude's web fetch capabilities can pull in external docs, API documentation, or competitive analysis.
### 3. Understand Existing Context
- Ask: What systems or processes does this impact?
- Find: Any existing specs in `docs/specs/` that are related
- Review: Recent PRs or commits related to this domain
## Structure & Content Guide
### Metadata Section
Fill in these required fields:
- **Document ID**: Use format `BRD-XXX-short-slug` (e.g., `BRD-001-user-export`)
- **Status**: Start with "Draft", moves to "In Review" → "Approved" → "Implemented"
- **Author**: Your name
- **Created**: Today's date (YYYY-MM-DD)
- **Stakeholders**: Key people involved (Product Manager, Engineering Lead, Customer Success, etc.)
- **Priority**: Critical | High | Medium | Low
### Description Section
Answer: "What is the problem and why does it matter?"
**Background**: Provide context
- What is the current situation?
- How did we identify this need?
- Who brought it up?
**Problem Statement**: Be concise and specific
- Example: "Users cannot export their data in bulk, forcing them to perform manual exports one at a time, which is time-consuming and error-prone."
### Business Value Section
Answer: "Why should we build this?"
**Expected Outcomes**: List 2-3 measurable outcomes
- Example: "Reduce manual export time by 80%"
- Example: "Increase user retention by enabling data portability"
**Strategic Alignment**: How does this support business goals?
- Example: "Aligns with our goal to improve user experience for enterprise customers"
### Stakeholders Section
Create a table identifying who needs to sign off or provide input:
- **Business Owner**: Makes final business decisions
- **Product Owner**: Gathers and prioritizes requirements
- **End Users**: The people who will use this feature
- **Technical Lead**: Ensures technical feasibility
### User Stories Section
Write 3-5 user stories following this format:
```
As a [user role],
I want to [capability],
so that [benefit/outcome]
```
**Tips for writing user stories:**
- Use real user roles from your product (not generic "user")
- Each story should be achievable in 1-3 days of work (rough estimate)
- Include acceptance criteria inline or in a separate section
**Example:**
| ID | As a... | I want to... | So that... | Priority |
|---|---|---|---|---|
| US-1 | Data Analyst | Export data as CSV | I can analyze it in Excel | High |
| US-2 | Enterprise Admin | Bulk export all user data | I can back it up and migrate to another system | High |
| US-3 | API Client | Get exports via webhook | I can automate reports | Medium |
### Assumptions Section
List what you're assuming to be true:
- "Users have stable internet connections"
- "Exported data will be less than 100MB"
- "We can leverage the existing database export functionality"
### Constraints Section
Identify limitations:
- **Business**: Budget, timeline, market windows
- **Technical**: System limitations, platform restrictions
- **Organizational**: Team capacity, skill gaps
### Dependencies Section
What needs to happen first?
- "Data privacy review must be completed"
- "Export API implementation (prd-XXX) must be finished"
### Risks Section
What could go wrong?
- Document: Risk description, likelihood (High/Med/Low), impact (High/Med/Low), and mitigation strategy
### Acceptance Criteria Section
Define "done" from a business perspective (3-5 criteria):
```
1. Users can select data types to export (users, transactions, settings)
2. Exports complete within 2 minutes for datasets up to 100MB
3. Exported data is usable in common formats (CSV, JSON)
4. Users receive email confirmation when export is ready
5. Exported data is securely deleted after 7 days
```
### Success Metrics Section
How will you measure success after launch?
| Metric | Current Baseline | Target | Measurement Method |
|--------|------------------|--------|-------------------|
| % of power users using export | 0% | 40% | Product analytics |
| Average export time | N/A | < 2 min | Server logs |
| Support tickets about exports | TBD | < 5/week | Support system |
| User satisfaction (export feature) | N/A | > 4/5 stars | In-app survey |
### Time to Value
When do you expect to see results?
- Example: "We expect 20% adoption within the first 2 weeks post-launch based on similar features"
### Approval Section
Track who has approved this requirement:
- Business Owner approval needed before engineering begins
- Product Owner approval to confirm alignment
- Technical Lead approval to confirm feasibility
## Writing Tips
### Be Specific, Not Vague
- ❌ Bad: "Users want to export their data"
- ✅ Good: "Users want to export their transaction history as CSV within a specific date range"
### Use Concrete Examples
- Describe what the feature looks like in action
- Include sample data or screenshots if possible
- Give edge cases (what about large datasets? special characters? time zones?)
### Consider the User's Perspective
- Think about: What problem are they solving with this?
- What would make them happy or frustrated?
- What alternatives might they use if you don't build this?
### Link to Other Specs
- Reference related technical requirements (if they exist): "See [prd-XXX] for implementation details"
- Reference related design docs: "See [des-XXX] for the export flow architecture"
### Complete All TODOs
- Don't leave placeholder text like "TODO: Add metrics"
- If something isn't known, explain why and what needs to happen to find out
## Validation & Fixing Issues
### Run the Validator
```bash
scripts/validate-spec.sh docs/specs/business-requirement/brd-001-your-spec.md
```
The validator checks for:
- Title and ID properly formatted
- All required sections present
- Minimum content in critical sections
- No incomplete TODO items
### Common Issues & Fixes
**Issue**: "Missing Acceptance Criteria section"
- **Fix**: Add 3-5 clear, measurable acceptance criteria that define what "done" means
**Issue**: "User Stories only has 1-2 items (minimum 3)"
- **Fix**: Add 1-2 more user stories representing different roles or scenarios
**Issue**: "TODO items in Business Value (3 items)"
- **Fix**: Complete the Business Value section with actual expected outcomes and strategic alignment
**Issue**: "No Success Metrics defined"
- **Fix**: Add a table with specific, measurable KPIs you'll track post-launch
### Check Completeness
```bash
scripts/check-completeness.sh docs/specs/business-requirement/brd-001-your-spec.md
```
This shows:
- Overall completion percentage
- Which sections still have TODOs
- Referenced documents (if any are broken, they show up here)
## Decision-Making Framework
When writing the BRD, reason through these questions:
1. **Problem**: Is this a real problem or a nice-to-have?
- Can you trace it back to actual user feedback?
- How many users are affected?
- How often do they encounter this problem?
2. **Scope**: What are we NOT building?
- Define boundaries clearly (what's in scope vs. out of scope)
- This helps prevent scope creep
3. **Trade-offs**: What are we accepting by building this?
- Engineering effort cost
- Opportunity cost (what else won't we build?)
- Maintenance burden
4. **Success**: How will we know if this was worth building?
- What metrics matter?
- What's the acceptable threshold for success?
5. **Risks**: What could prevent this from working?
- Technical risks
- User adoption risks
- Business/market risks
## Example: Complete Business Requirement
Here's how a complete BRD section might look:
```markdown
# [BRD-001] Bulk User Data Export
## Metadata
- **Document ID**: BRD-001-bulk-export
- **Status**: Approved
- **Author**: Jane Smith
- **Created**: 2024-01-15
- **Stakeholders**: Product Manager (Jane), Engineering Lead (Bob), Support (Maria)
- **Priority**: High
## Description
### Background
Our enterprise customers have requested the ability to bulk export user data.
Currently, they can only export one user at a time via the admin panel, which is
time-consuming for customers with hundreds of users.
### Problem Statement
Enterprise customers need to audit, back up, and migrate user data, but the
current one-at-a-time export process takes hours and is error-prone.
## Business Value
### Expected Outcomes
- Reduce manual export time for enterprise customers by 80%
- Enable customers to audit user data for compliance purposes
- Support customer data portability requests
### Strategic Alignment
Aligns with our enterprise expansion goal by improving features our target
customers need for large-scale deployments.
[... rest of sections follow template ...]
```
## Next Steps
1. **Create the spec**: `scripts/generate-spec.sh business-requirement brd-XXX-slug`
2. **Fill in each section** using this guide as reference
3. **Validate**: `scripts/validate-spec.sh docs/specs/business-requirement/brd-XXX-slug.md`
4. **Fix issues** identified by the validator
5. **Get stakeholder approval** (fill in the Approval section)
6. **Share with engineering** for technical requirement creation

# How to Create a Component Specification
Component specifications document individual system components or services, including their responsibilities, interfaces, configuration, and deployment characteristics.
## Quick Start
```bash
# 1. Create a new component spec
scripts/generate-spec.sh component cmp-001-descriptive-slug
# 2. Open and fill in the file
# (The file will be created at: docs/specs/component/cmp-001-descriptive-slug.md)
# 3. Fill in the sections, then validate:
scripts/validate-spec.sh docs/specs/component/cmp-001-descriptive-slug.md
# 4. Fix issues and check completeness:
scripts/check-completeness.sh docs/specs/component/cmp-001-descriptive-slug.md
```
## When to Write a Component Specification
Use a Component Spec when you need to:
- Document a microservice or major system component
- Specify component responsibilities and interfaces
- Define configuration requirements
- Document deployment procedures
- Enable teams to understand component behavior
- Plan for monitoring and observability
## Research Phase
### 1. Research Related Specifications
Find what informed this component:
```bash
# Find design documents that reference this component
grep -r "design\|architecture" docs/specs/ --include="*.md"
# Find API contracts this component implements
grep -r "api\|endpoint" docs/specs/ --include="*.md"
# Find data models this component uses
grep -r "data\|model" docs/specs/ --include="*.md"
```
### 2. Review Similar Components
- How are other components in your system designed?
- What patterns and conventions exist?
- How are they deployed and monitored?
- What's the standard for documentation?
### 3. Understand Dependencies
- What services or systems does this component depend on?
- What services depend on this component?
- What data flows through this component?
- What are the integration points?
## Structure & Content Guide
### Title & Metadata
- **Title**: "Export Service", "User Authentication Service", etc.
- **Type**: Microservice, Library, Worker, API Gateway, etc.
- **Version**: Current version number
### Component Description
```markdown
# Export Service
The Export Service is a microservice responsible for handling bulk user data exports.
It manages the export job lifecycle: queuing, processing, storage, and delivery.
**Type**: Microservice
**Language**: Node.js + TypeScript
**Deployment**: Kubernetes (3+ replicas)
**Status**: Stable (production)
```
### Purpose & Responsibilities Section
```markdown
## Purpose
Provide reliable, scalable handling of user data exports in multiple formats
while maintaining system stability and data security.
## Primary Responsibilities
1. **Job Queueing**: Accept export requests and queue them for processing
- Validate request parameters
- Create export job records
- Enqueue jobs for processing
- Return job ID to client
2. **Job Processing**: Execute export jobs asynchronously
- Query user data from database
- Transform data to requested format (CSV, JSON)
- Compress files for storage
- Handle processing errors and retries
3. **File Storage**: Manage exported file storage and lifecycle
- Store completed exports to S3
- Generate secure download URLs
- Implement TTL-based cleanup
- Maintain export audit logs
4. **Status Tracking**: Provide job status and progress information
- Track job state (queued, processing, completed, failed)
- Record completion time and file metadata
- Handle cancellation requests
5. **Error Handling**: Manage failures gracefully
- Retry failed jobs with exponential backoff
- Notify users of failures
- Log errors for debugging
- Preserve system stability during failures
```
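The retry behavior under Error Handling can be sketched in TypeScript as below. This is an illustrative sketch, not the service's actual code: the function names are hypothetical, and the base delay and attempt count mirror the `retry_delay_ms: 1000` / `max_retries: 3` defaults shown later in the Configuration section.

```typescript
// Hypothetical sketch of retry with exponential backoff.
// backoffDelayMs(1) = 1000ms, (2) = 2000ms, (3) = 4000ms, capped at 30s.
function backoffDelayMs(attempt: number, baseMs = 1000, capMs = 30_000): number {
  return Math.min(baseMs * 2 ** (attempt - 1), capMs);
}

// Run a job, retrying up to maxAttempts times before giving up.
async function processWithRetry<T>(job: () => Promise<T>, maxAttempts = 3): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await job();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
      }
    }
  }
  // Exhausted retries: this is where an export.failed event would be emitted.
  throw lastError;
}
```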
### Interfaces & APIs Section
```markdown
## Interfaces
### REST API Endpoints
The service exposes these HTTP endpoints:
#### POST /exports
**Purpose**: Create a new export job
**Authentication**: Required (Bearer token)
**Request Body**:
```json
{
"data_types": ["users", "transactions"],
"format": "csv",
"date_range": { "start": "2024-01-01", "end": "2024-01-31" }
}
```
**Response** (201 Created):
```json
{
"id": "exp_123456",
"status": "queued",
"created_at": "2024-01-15T10:00:00Z"
}
```
#### GET /exports/{id}
**Purpose**: Get export job status
**Response** (200 OK):
```json
{
"id": "exp_123456",
"status": "completed",
"download_url": "https://...",
"file_size_bytes": 2048576
}
```
### Event Publishing
The service publishes events to message queue:
**export.started**
```json
{
"event": "export.started",
"export_id": "exp_123456",
"user_id": "usr_789012",
"timestamp": "2024-01-15T10:00:00Z"
}
```
**export.completed**
```json
{
"event": "export.completed",
"export_id": "exp_123456",
"file_size_bytes": 2048576,
"format": "csv",
"timestamp": "2024-01-15T10:05:00Z"
}
```
**export.failed**
```json
{
"event": "export.failed",
"export_id": "exp_123456",
"error": "database_connection_timeout",
"timestamp": "2024-01-15T10:05:00Z"
}
```
### Dependencies (Consumed APIs)
- **User Service API**: GET /users/{id}, GET /users (for data export)
- **Auth Service**: JWT validation
- **Notification Service**: Send export completion notifications
```
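A consumer of these endpoints typically creates a job and polls until it reaches a terminal state. Here is an illustrative client sketch: the endpoint paths and response fields follow this spec, while `baseUrl`, the token, the request payload, and the 2-second poll interval are placeholder assumptions.

```typescript
// Illustrative client for POST /exports and GET /exports/{id}.
type ExportStatus = "queued" | "processing" | "completed" | "failed";

interface ExportJob {
  id: string;
  status: ExportStatus;
  download_url?: string;
}

// A job needs no further polling once it reaches a terminal state.
function isTerminal(status: ExportStatus): boolean {
  return status === "completed" || status === "failed";
}

async function createAndAwaitExport(baseUrl: string, token: string): Promise<ExportJob> {
  const headers = { Authorization: `Bearer ${token}`, "Content-Type": "application/json" };
  const created = await fetch(`${baseUrl}/exports`, {
    method: "POST",
    headers,
    body: JSON.stringify({ data_types: ["users"], format: "csv" }),
  });
  let job = (await created.json()) as ExportJob;
  while (!isTerminal(job.status)) {
    await new Promise((resolve) => setTimeout(resolve, 2000)); // poll every 2s
    const res = await fetch(`${baseUrl}/exports/${job.id}`, { headers });
    job = (await res.json()) as ExportJob;
  }
  return job;
}
```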
### Configuration Section
```markdown
## Configuration
### Environment Variables
| Variable | Type | Required | Description |
|----------|------|----------|-------------|
| NODE_ENV | string | Yes | Environment (dev, staging, production) |
| PORT | number | No | HTTP server port (default: 3000) |
| DATABASE_URL | string | Yes | PostgreSQL connection string |
| REDIS_URL | string | Yes | Redis connection for job queue |
| S3_BUCKET | string | Yes | S3 bucket for export files |
| S3_REGION | string | Yes | AWS region (e.g., us-east-1) |
| AWS_ACCESS_KEY_ID | string | Yes | AWS credentials |
| AWS_SECRET_ACCESS_KEY | string | Yes | AWS credentials |
| EXPORT_TTL_DAYS | number | No | Export file retention days (default: 7) |
| MAX_EXPORT_SIZE_MB | number | No | Maximum export file size (default: 500) |
| CONCURRENT_WORKERS | number | No | Number of concurrent job processors (default: 5) |
### Configuration File (config.json)
```json
{
"server": {
"port": 3000,
"timeout_ms": 30000
},
"jobs": {
"max_retries": 3,
"retry_delay_ms": 1000,
"timeout_ms": 300000
},
"export": {
"max_file_size_mb": 500,
"ttl_days": 7,
"formats": ["csv", "json"]
},
"storage": {
"type": "s3",
"cleanup_interval_hours": 24
}
}
```
### Runtime Requirements
- **Memory**: 512MB minimum, 2GB recommended
- **CPU**: 1 core minimum, 2 cores recommended
- **Disk**: 10GB for temporary files
- **Network**: Must reach PostgreSQL, Redis, S3, Auth Service
```
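At startup the service must turn a table like the one above into a validated config object, failing fast when a required variable is missing. A minimal sketch, covering a subset of the variables (the interface and function names are assumptions, not the service's actual code):

```typescript
// Hypothetical startup config loader: required vars throw if absent,
// optional vars fall back to the documented defaults.
interface ServiceConfig {
  port: number;
  databaseUrl: string;
  redisUrl: string;
  s3Bucket: string;
  exportTtlDays: number;
  maxExportSizeMb: number;
  concurrentWorkers: number;
}

function loadConfig(env: Record<string, string | undefined>): ServiceConfig {
  const required = (key: string): string => {
    const value = env[key];
    if (value === undefined || value === "") {
      throw new Error(`Missing required environment variable: ${key}`);
    }
    return value;
  };
  const numberOr = (key: string, fallback: number): number =>
    env[key] === undefined ? fallback : Number(env[key]);

  return {
    port: numberOr("PORT", 3000),
    databaseUrl: required("DATABASE_URL"),
    redisUrl: required("REDIS_URL"),
    s3Bucket: required("S3_BUCKET"),
    exportTtlDays: numberOr("EXPORT_TTL_DAYS", 7),
    maxExportSizeMb: numberOr("MAX_EXPORT_SIZE_MB", 500),
    concurrentWorkers: numberOr("CONCURRENT_WORKERS", 5),
  };
}
```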
### Data Dependencies Section
```markdown
## Data Dependencies
### Input Data
The service requires access to:
- **User data**: From User Service or User DB
- Fields: id, email, name, created_at, etc.
- Constraints: User must be authenticated
  - Volume: Scales with the size of the user dataset
- **Transaction data**: From Transaction DB
- Fields: id, user_id, amount, date, etc.
- Volume: Can be large (100k+ per user)
### Output Data
The service produces:
- **Export files**: CSV or JSON format
- Stored in S3
- Size: Up to 500MB per file
- Retention: 7 days
- **Export metadata**: Stored in PostgreSQL
- Export record with status, size, completion time
- Audit trail of all exports
```
### Deployment Section
```markdown
## Deployment
### Container Image
- **Base Image**: node:18-alpine
- **Build**: Dockerfile in repository root
- **Registry**: ECR (AWS Elastic Container Registry)
- **Tag**: Semver (e.g., v1.2.3, latest)
### Kubernetes Deployment
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: export-service
namespace: production
spec:
replicas: 3
selector:
matchLabels:
app: export-service
template:
metadata:
labels:
app: export-service
spec:
containers:
- name: export-service
image: 123456789.dkr.ecr.us-east-1.amazonaws.com/export-service:latest
ports:
- containerPort: 3000
resources:
requests:
memory: "512Mi"
cpu: "250m"
limits:
memory: "2Gi"
cpu: "1000m"
env:
- name: NODE_ENV
value: "production"
- name: DATABASE_URL
valueFrom:
secretKeyRef:
name: export-service-secrets
key: database-url
livenessProbe:
httpGet:
path: /health
port: 3000
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /ready
port: 3000
initialDelaySeconds: 10
periodSeconds: 5
```
### Deployment Steps
1. **Build**: `docker build -t export-service:v1.2.3 .`
2. **Push**: `docker push <registry>/export-service:v1.2.3`
3. **Update**: `kubectl set image deployment/export-service export-service=<registry>/export-service:v1.2.3`
4. **Verify**: `kubectl rollout status deployment/export-service`
### Rollback Procedure
```bash
# If deployment fails, rollback to previous version
kubectl rollout undo deployment/export-service
# Verify successful rollback
kubectl rollout status deployment/export-service
```
### Pre-Deployment Checklist
- [ ] All tests passing locally
- [ ] Database migrations run successfully
- [ ] Configuration environment variables set in staging
- [ ] Health check endpoints responding
- [ ] Metrics and logging verified
```
### Monitoring & Observability Section
```markdown
## Monitoring
### Health Checks
**Liveness Probe**: GET /health
- Returns 200 if service is running
- Used by Kubernetes to restart unhealthy pods
**Readiness Probe**: GET /ready
- Returns 200 if service is ready to receive traffic
- Checks database connectivity, Redis availability
- Used by Kubernetes for traffic routing
### Metrics
Export these Prometheus metrics:
| Metric | Type | Description |
|--------|------|-------------|
| exports_created_total | Counter | Total exports created |
| exports_completed_total | Counter | Total exports completed successfully |
| exports_failed_total | Counter | Total exports failed |
| export_duration_seconds | Histogram | Time to complete export (p50, p95, p99) |
| export_file_size_bytes | Histogram | Size of exported files |
| export_job_queue_depth | Gauge | Number of jobs awaiting processing |
| export_active_jobs | Gauge | Number of jobs currently processing |
### Alerts
Configure these alerts:
**Export Job Backlog Growing**
- Alert if `export_job_queue_depth > 100` for 5+ minutes
- Action: Scale up worker replicas
**Export Failures Increasing**
- Alert if `exports_failed_total` increases by > 10% in 1 hour
- Action: Investigate failure logs
**Service Unhealthy**
- Alert if liveness probe fails
- Action: Restart pod, check logs
### Logging
Log format (JSON):
```json
{
"timestamp": "2024-01-15T10:05:00Z",
"level": "info",
"service": "export-service",
"export_id": "exp_123456",
"event": "export_completed",
"duration_ms": 5000,
"file_size_bytes": 2048576,
"message": "Export completed successfully"
}
```
**Log Levels**
- `debug`: Detailed debugging information
- `info`: Important operational events
- `warn`: Warning conditions (retries, slow operations)
- `error`: Error conditions (failures, exceptions)
```
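The log format and level semantics above can be sketched as a small helper. The field layout follows the JSON example; `shouldLog` and `logLine` are hypothetical names, not the service's real logging library.

```typescript
// Hypothetical structured-logging helpers matching the JSON line format above.
const LEVELS = ["debug", "info", "warn", "error"] as const;
type Level = (typeof LEVELS)[number];

// A message is emitted only if its level is at or above the configured level.
function shouldLog(configured: Level, candidate: Level): boolean {
  return LEVELS.indexOf(candidate) >= LEVELS.indexOf(configured);
}

function logLine(
  level: Level,
  event: string,
  message: string,
  fields: Record<string, unknown> = {},
): string {
  return JSON.stringify({
    timestamp: new Date().toISOString(),
    level,
    service: "export-service",
    event,
    ...fields,
    message,
  });
}
```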
### Dependencies & Integration Section
```markdown
## Dependencies
### Service Dependencies
| Service | Purpose | Criticality | Failure Impact |
|---------|---------|-------------|----------------|
| PostgreSQL | Export job storage | Critical | Service down |
| Redis | Job queue | Critical | Exports won't process |
| S3 | Export file storage | Critical | Can't store exports |
| Auth Service | JWT validation | Critical | Can't validate requests |
| User Service | User data source | Critical | Can't export user data |
| Notification Service | Email notifications | Optional | Users won't get notification |
### External Dependencies
- **AWS S3**: For file storage and retrieval
- **PostgreSQL**: For export metadata
- **Redis**: For job queue
- **Kubernetes**: For orchestration
### Fallback Strategies
- Redis unavailable: Use in-memory queue (single instance only)
- User Service unavailable: Fail export with "upstream_error"
- S3 unavailable: Retry with exponential backoff, max 3 times
```
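The Redis fallback strategy can be sketched as a queue interface with an in-memory implementation. The names are illustrative, and (as the spec notes) the in-memory path is only safe with a single instance.

```typescript
// Illustrative queue abstraction with an in-memory fallback.
interface JobQueue {
  enqueue(jobId: string): void;
  dequeue(): string | undefined;
}

class InMemoryQueue implements JobQueue {
  private items: string[] = [];
  enqueue(jobId: string): void {
    this.items.push(jobId);
  }
  dequeue(): string | undefined {
    return this.items.shift(); // FIFO
  }
}

// Prefer Redis when healthy; otherwise degrade to the in-memory queue
// (single instance only).
function selectQueue(redisHealthy: boolean, redisQueue: JobQueue): JobQueue {
  return redisHealthy ? redisQueue : new InMemoryQueue();
}
```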
### Performance & SLA Section
```markdown
## Performance Characteristics
### Throughput
- Process up to 1000 exports per day
- Handle 100 concurrent job workers
- Worker capacity auto-scales based on queue depth
### Latency
- Create export job: < 100ms (p95)
- Process 100MB export: 3-5 minutes average
- Query export status: < 50ms (p95)
### Resource Usage
- Memory: 800MB average, peaks at 1.5GB
- CPU: 25% average, peaks at 60%
- Disk (temp): 50GB for concurrent exports
### Service Level Objectives (SLOs)
| Objective | Target |
|-----------|--------|
| Availability | 99.5% uptime |
| Error Rate | < 0.1% |
| p95 Latency (status query) | < 100ms |
| Export Completion | < 10 minutes for 100MB |
### Scalability
- Horizontal: Add more pods for higher throughput
- Vertical: Increase pod memory/CPU for larger exports
- Maximum tested: 10k exports/day on a 5-pod cluster
```
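Latency SLOs like the p95 targets above can be checked against raw request durations. A sketch using the nearest-rank percentile method (the method itself is an assumption for illustration):

```typescript
// Nearest-rank percentile over a list of request durations (ms).
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length) - 1; // 0-based index
  return sorted[Math.max(0, rank)];
}

// True if the observed p95 latency is within the SLO target.
function meetsSlo(durationsMs: number[], p95TargetMs: number): boolean {
  return percentile(durationsMs, 95) <= p95TargetMs;
}
```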
## Writing Tips
### Be Specific About Responsibilities
- What does this component do?
- What does it NOT do?
- Where do responsibilities start/stop?
### Document All Interfaces
- REST APIs? Document endpoints and schemas
- Message queues? Show event formats
- Database? Show schema and queries
- Dependencies? Show what's called and how
### Include Deployment Details
- How is it deployed (containers, VMs, serverless)?
- Configuration required?
- Health checks?
- Monitoring setup?
### Link to Related Specs
- Reference design docs: `[DES-001]`
- Reference API contracts: `[API-001]`
- Reference data models: `[DATA-001]`
- Reference deployment procedures: `[DEPLOY-001]`
### Document Failure Modes
- What happens if dependencies fail?
- How does the component recover?
- What alerts fire when things go wrong?
## Validation & Fixing Issues
### Run the Validator
```bash
scripts/validate-spec.sh docs/specs/component/cmp-001-your-spec.md
```
### Common Issues & Fixes
**Issue**: "Missing Interfaces section"
- **Fix**: Document all APIs, event formats, and data contracts
**Issue**: "Configuration incomplete"
- **Fix**: Add environment variables, configuration files, and runtime requirements
**Issue**: "No Monitoring section"
- **Fix**: Add health checks, metrics, alerts, and logging strategy
**Issue**: "Deployment steps unclear"
- **Fix**: Add step-by-step deployment and rollback procedures
## Decision-Making Framework
When writing a component spec, consider:
1. **Boundaries**: What is this component's responsibility?
- What does it own?
- What does it depend on?
- Where are boundaries clear?
2. **Interfaces**: How will others interact with this?
- REST, gRPC, events, direct calls?
- What contracts must be maintained?
- How do we evolve interfaces?
3. **Configuration**: What's configurable vs. hardcoded?
- Environment-specific settings?
- Runtime tuning parameters?
- Feature flags?
4. **Operations**: How will we run this in production?
- Deployment model?
- Monitoring and alerting?
- Failure recovery?
5. **Scale**: How much can this component handle?
- Throughput limits?
- Scaling strategy?
- Resource requirements?
## Next Steps
1. **Create the spec**: `scripts/generate-spec.sh component cmp-XXX-slug`
2. **Research**: Find design docs and existing components
3. **Define responsibilities** and boundaries
4. **Document interfaces** for all interactions
5. **Plan deployment** and monitoring
6. **Validate**: `scripts/validate-spec.sh docs/specs/component/cmp-XXX-slug.md`
7. **Share with architecture/ops** before implementation

# How to Create a Configuration Schema Specification
Configuration schema specifications document all configurable parameters for a system, including their types, valid values, defaults, and impact.
## Quick Start
```bash
# 1. Create a new configuration schema
scripts/generate-spec.sh configuration-schema config-001-descriptive-slug
# 2. Open and fill in the file
# (The file will be created at: docs/specs/configuration-schema/config-001-descriptive-slug.md)
# 3. Fill in configuration fields and validation rules, then validate:
scripts/validate-spec.sh docs/specs/configuration-schema/config-001-descriptive-slug.md
# 4. Fix issues and check completeness:
scripts/check-completeness.sh docs/specs/configuration-schema/config-001-descriptive-slug.md
```
## When to Write a Configuration Schema
Use a Configuration Schema when you need to:
- Document all configurable system parameters
- Specify environment variables and their meanings
- Define configuration file formats
- Document validation rules and constraints
- Enable operations teams to configure systems safely
- Provide examples for different environments
## Research Phase
### 1. Research Related Specifications
Find what you're configuring:
```bash
# Find component specs
grep -r "component" docs/specs/ --include="*.md"
# Find deployment procedures
grep -r "deploy" docs/specs/ --include="*.md"
# Find existing configuration specs
grep -r "config" docs/specs/ --include="*.md"
```
### 2. Understand Configuration Needs
- What aspects of the system need to be configurable?
- What differs between environments (dev, staging, prod)?
- What can change at runtime vs. requires restart?
- What's sensitive (secrets, credentials)?
### 3. Review Existing Configurations
- How are other services configured?
- What configuration format is used?
- What environment variables exist?
- What patterns should be followed?
## Structure & Content Guide
### Title & Metadata
- **Title**: "Export Service Configuration", "API Gateway Config", etc.
- **Component**: What component is being configured
- **Version**: Configuration format version
- **Status**: Current, Deprecated, etc.
### Overview Section
```markdown
# Export Service Configuration Schema
## Summary
Defines all configurable parameters for the Export Service microservice.
Configuration can be set via environment variables or JSON config file.
**Configuration Methods**:
- Environment variables (recommended for Docker/Kubernetes)
- config.json file (for monolithic deployments)
- Command-line arguments (for local development)
**Scope**: All settings that affect Export Service behavior
**Format**: JSON Schema compliant
```
### Configuration Methods Section
```markdown
## Configuration Methods
### Method 1: Environment Variables (Recommended for Production)
Used in containerized deployments (Docker, Kubernetes).
Set before starting the service.
**Syntax**: `EXPORT_SERVICE_KEY=value`
**Example**:
```bash
export EXPORT_SERVICE_PORT=3000
export EXPORT_SERVICE_LOG_LEVEL=info
export EXPORT_SERVICE_DATABASE_URL=postgresql://user:pass@host/db
```
### Method 2: Configuration File (config.json)
Used in monolithic or local deployments.
JSON format with hierarchical structure.
**Location**: `./config.json` in working directory
**Example**:
```json
{
"server": {
"port": 3000,
"timeout": 30000
},
"database": {
"url": "postgresql://user:pass@host/db",
"pool": 10
}
}
```
### Method 3: Command-Line Arguments
Used in local development. Takes precedence over environment variables and the config file.
**Syntax**: `--key value` or `--key=value`
**Example**:
```bash
node index.js --port 3000 --log-level debug
```
### Precedence (Priority Order)
1. Command-line arguments (highest priority)
2. Environment variables
3. config.json file
4. Default values (lowest priority)
```
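The precedence order can be made concrete with a small resolver sketch, where later sources simply override earlier ones (the names are illustrative, not the service's actual loader):

```typescript
// Hypothetical config resolver: later spreads win, so arguments go from
// lowest priority (defaults) to highest (command-line arguments).
type Config = Record<string, string | number | boolean>;

function resolveConfig(defaults: Config, file: Config, env: Config, cli: Config): Config {
  return { ...defaults, ...file, ...env, ...cli };
}
```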
### Configuration Fields Section
Document each configuration field:
```markdown
## Configuration Fields
### Server Section
#### PORT
- **Type**: integer
- **Default**: 3000
- **Range**: 1024-65535
- **Environment Variable**: `EXPORT_SERVICE_PORT`
- **Config File Key**: `server.port`
- **Description**: HTTP server listening port
- **Examples**:
- Development: 3000 (local machine, different services use different ports)
- Production: 3000 (behind load balancer, port not exposed)
- **Impact**: Service not reachable if port already in use
- **Can Change at Runtime**: No (requires restart)
#### TIMEOUT_MS
- **Type**: integer (milliseconds)
- **Default**: 30000 (30 seconds)
- **Range**: 5000-120000
- **Environment Variable**: `EXPORT_SERVICE_TIMEOUT_MS`
- **Config File Key**: `server.timeout_ms`
- **Description**: HTTP request timeout
- **Considerations**:
- Must be longer than longest export duration
- If too short: Long exports time out and fail
- If too long: Failed connections hang longer
- **Examples**:
- Development: 30000 (quick feedback on errors)
- Production: 120000 (accounts for large exports)
#### ENABLE_COMPRESSION
- **Type**: boolean
- **Default**: true
- **Environment Variable**: `EXPORT_SERVICE_ENABLE_COMPRESSION`
- **Config File Key**: `server.enable_compression`
- **Description**: Enable HTTP response compression (gzip)
- **Considerations**:
- Reduces bandwidth but increases CPU usage
- Should be true unless CPU constrained
- **Typical Value**: true (always)
### Database Section
#### DATABASE_URL
- **Type**: string (connection string)
- **Default**: None (required)
- **Environment Variable**: `EXPORT_SERVICE_DATABASE_URL`
- **Config File Key**: `database.url`
- **Format**: `postgresql://user:password@host:port/database`
- **Description**: PostgreSQL connection string
- **Examples**:
- Development: `postgresql://localhost/export_service`
- Staging: `postgresql://stage-db.example.com/export_stage`
- Production: `postgresql://prod-db.example.com/export_prod` (managed RDS)
- **Sensitive**: Yes (contains credentials - use secrets management)
- **Required**: Yes
- **Validation**:
- Must be valid PostgreSQL connection string
- Service fails to start if URL invalid or unreachable
#### DATABASE_POOL_SIZE
- **Type**: integer
- **Default**: 10
- **Range**: 1-100
- **Environment Variable**: `EXPORT_SERVICE_DATABASE_POOL_SIZE`
- **Config File Key**: `database.pool_size`
- **Description**: Number of database connections to maintain
- **Considerations**:
- More connections allow more concurrent queries
- Each connection uses memory and database slot
- Database has max_connections limit (typically 100-500)
- **Tuning**:
- 1 service instance: 5-10 connections
- 5 service instances: 2-4 connections each (25-40 total)
- Kubernetes auto-scaling: 2-3 per pod (auto-scaled)
#### DATABASE_QUERY_TIMEOUT_MS
- **Type**: integer (milliseconds)
- **Default**: 10000 (10 seconds)
- **Range**: 1000-60000
- **Environment Variable**: `EXPORT_SERVICE_DATABASE_QUERY_TIMEOUT_MS`
- **Config File Key**: `database.query_timeout_ms`
- **Description**: Timeout for individual database queries
- **Considerations**:
- Export queries can take several seconds for large datasets
- If too short: Queries fail prematurely
- If too long: Failed queries block connection pool
- **Typical Values**:
- Simple queries: 5000ms
- Large exports: 30000ms
### Redis (Job Queue) Section
#### REDIS_URL
- **Type**: string (connection string)
- **Default**: None (required)
- **Environment Variable**: `EXPORT_SERVICE_REDIS_URL`
- **Config File Key**: `redis.url`
- **Format**: `redis://user:password@host:port/db`
- **Description**: Redis connection string for job queue
- **Examples**:
- Development: `redis://localhost:6379/0`
- Staging: `redis://redis-stage.example.com:6379/0`
- Production: `redis://redis-prod.example.com:6379/0` (managed ElastiCache)
- **Sensitive**: Yes (may contain credentials)
- **Required**: Yes
#### REDIS_MAX_RETRIES
- **Type**: integer
- **Default**: 3
- **Range**: 1-10
- **Environment Variable**: `EXPORT_SERVICE_REDIS_MAX_RETRIES`
- **Config File Key**: `redis.max_retries`
- **Description**: Maximum retry attempts for Redis operations
- **Considerations**:
- More retries provide resilience but increase latency on failure
- Should be 3-5 for production
- **Typical Values**: 3
#### CONCURRENT_WORKERS
- **Type**: integer
- **Default**: 3
- **Range**: 1-20
- **Environment Variable**: `EXPORT_SERVICE_CONCURRENT_WORKERS`
- **Config File Key**: `redis.concurrent_workers`
- **Description**: Number of concurrent export workers
- **Considerations**:
- Each worker processes one export job at a time
- More workers process jobs faster but use more resources
- Limited by CPU and memory available
- Kubernetes scales pods, not this setting
- **Tuning**:
- Development: 1-2 (for debugging)
  - Production with 2 CPUs: 2-3 workers
  - Production with 4+ CPUs: 4-8 workers
### Export Section
#### MAX_EXPORT_SIZE_MB
- **Type**: integer
- **Default**: 500
- **Range**: 10-5000
- **Environment Variable**: `EXPORT_SERVICE_MAX_EXPORT_SIZE_MB`
- **Config File Key**: `export.max_export_size_mb`
- **Description**: Maximum size for an export file (in MB)
- **Considerations**:
- Files larger than this are rejected
- Limited by disk space and memory
- Should match S3 bucket policies
- **Typical Values**:
- Small deployments: 100MB
- Standard: 500MB
- Enterprise: 1000-5000MB
#### EXPORT_TTL_DAYS
- **Type**: integer (days)
- **Default**: 7
- **Range**: 1-365
- **Environment Variable**: `EXPORT_SERVICE_EXPORT_TTL_DAYS`
- **Config File Key**: `export.ttl_days`
- **Description**: How long to retain export files after completion
- **Considerations**:
- Files deleted after TTL expires
- Affects storage costs (shorter TTL = lower cost)
- Users must download before expiration
- **Typical Values**:
- Short retention: 3 days (reduce storage cost)
- Standard: 7 days (reasonable download window)
- Long retention: 30 days (enterprise customers)
#### EXPORT_FORMATS
- **Type**: array of strings
- **Default**: ["csv", "json"]
- **Valid Values**: "csv", "json", "parquet"
- **Environment Variable**: `EXPORT_SERVICE_EXPORT_FORMATS` (comma-separated)
- **Config File Key**: `export.formats`
- **Description**: Supported export file formats
- **Examples**:
- `["csv", "json"]` (most common)
- `["csv", "json", "parquet"]` (full support)
- **Configuration**:
- Environment: `EXPORT_SERVICE_EXPORT_FORMATS=csv,json`
- File: `"formats": ["csv", "json"]`
#### COMPRESSION_ENABLED
- **Type**: boolean
- **Default**: true
- **Environment Variable**: `EXPORT_SERVICE_COMPRESSION_ENABLED`
- **Config File Key**: `export.compression_enabled`
- **Description**: Enable gzip compression for export files
- **Considerations**:
- Reduces file size by 60-80% typically
- Increases CPU usage during export
- Should be enabled unless CPU is bottleneck
- **Typical Value**: true
### Storage Section
#### S3_BUCKET
- **Type**: string
- **Default**: None (required)
- **Environment Variable**: `EXPORT_SERVICE_S3_BUCKET`
- **Config File Key**: `storage.s3_bucket`
- **Description**: AWS S3 bucket for storing export files
- **Format**: `bucket-name` (no s3:// prefix)
- **Examples**:
- Development: `export-service-dev`
- Staging: `export-service-stage`
- Production: `export-service-prod`
- **Required**: Yes
- **IAM Requirements**: Service role must have `s3:PutObject` and `s3:GetObject`; add `s3:DeleteObject` if the service itself (rather than an S3 lifecycle rule) deletes expired exports
#### S3_REGION
- **Type**: string
- **Default**: `us-east-1`
- **Valid Values**: Any AWS region (us-east-1, eu-west-1, etc.)
- **Environment Variable**: `EXPORT_SERVICE_S3_REGION`
- **Config File Key**: `storage.s3_region`
- **Description**: AWS region for S3 bucket
- **Examples**:
- us-east-1 (US East - Virginia)
- eu-west-1 (EU - Ireland)
### Logging Section
#### LOG_LEVEL
- **Type**: string (enum)
- **Default**: "info"
- **Valid Values**: "debug", "info", "warn", "error"
- **Environment Variable**: `EXPORT_SERVICE_LOG_LEVEL`
- **Config File Key**: `logging.level`
- **Description**: Logging verbosity level
- **Examples**:
- Development: "debug" (verbose, detailed logs)
- Staging: "info" (normal level)
- Production: "info" or "warn" (minimal logs, better performance)
- **Considerations**:
- debug: Very verbose, affects performance
- info: Standard operational logs
- warn: Only warnings and errors
- error: Only errors
#### LOG_FORMAT
- **Type**: string (enum)
- **Default**: "json"
- **Valid Values**: "json", "text"
- **Environment Variable**: `EXPORT_SERVICE_LOG_FORMAT`
- **Config File Key**: `logging.format`
- **Description**: Log output format
- **Examples**:
- json: Machine-parseable JSON logs (recommended for production)
- text: Human-readable text (good for development)
### Feature Flags Section
#### FEATURE_PARQUET_EXPORT
- **Type**: boolean
- **Default**: false
- **Environment Variable**: `EXPORT_SERVICE_FEATURE_PARQUET_EXPORT`
- **Config File Key**: `features.parquet_export`
- **Description**: Enable experimental Parquet export format
- **Considerations**:
- Set to false for stable deployments
- Set to true in staging for testing
- Disabled by default in production
- **Typical Values**:
- Development: true (test new feature)
- Staging: true (validate before production)
- Production: false (disabled until stable)
```
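Environment variables arrive as flat strings (e.g. `EXPORT_SERVICE_EXPORT_FORMATS=csv,json`), so the service needs a small typed-parsing layer between the environment and the settings described above. A minimal sketch of that idea — function names here are illustrative, not the service's actual API:

```python
VALID_FORMATS = {"csv", "json", "parquet"}

def parse_formats(raw: str) -> list[str]:
    """Split a comma-separated formats value and reject unknown entries."""
    formats = [item.strip() for item in raw.split(",") if item.strip()]
    unknown = set(formats) - VALID_FORMATS
    if unknown:
        raise ValueError(
            f"Unknown formats {sorted(unknown)}; valid values: {sorted(VALID_FORMATS)}"
        )
    return formats

def parse_bool(raw: str) -> bool:
    """Accept the documented boolean spellings: true/false, 'true'/'false', 1/0."""
    text = str(raw).strip().lower()
    if text in ("true", "1"):
        return True
    if text in ("false", "0"):
        return False
    raise ValueError(f"Not a boolean: {raw!r}")
```

In practice these would be called on values read from `os.environ` (e.g. `parse_formats(os.environ["EXPORT_SERVICE_EXPORT_FORMATS"])`) before the service starts accepting work.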
### Validation Rules Section
```markdown
## Validation & Constraints
### Required Fields
These fields must be provided (no default value):
- `DATABASE_URL` - PostgreSQL connection string required
- `REDIS_URL` - Redis connection required
- `S3_BUCKET` - S3 bucket must be specified
### Type Validation
- Integers: Must be valid numeric values
- Booleans: Accept true, false, "true", "false", 1, 0
- Strings: Must not be empty (unless explicitly optional)
- Arrays: Must be comma-separated in environment, JSON array in file
### Range Validation
- PORT: 1024-65535 (avoid system ports)
- POOL_SIZE: 1-100 (reasonable connection pool)
- TIMEOUT_MS: 5000-120000 (between 5 seconds and 2 minutes)
- MAX_EXPORT_SIZE_MB: 10-5000 (reasonable file sizes)
### Format Validation
- DATABASE_URL: Must be valid PostgreSQL connection string
- S3_BUCKET: Must follow S3 naming rules (lowercase, hyphens only)
- S3_REGION: Must be valid AWS region code
### Interdependency Rules
- If COMPRESSION_ENABLED=true: MAX_EXPORT_SIZE_MB can be larger, since compressed files consume 60-80% less storage
- If MAX_EXPORT_SIZE_MB > 100: DATABASE_QUERY_TIMEOUT_MS should be > 10000, since larger exports run longer queries
- If CONCURRENT_WORKERS > 5: memory requirements increase significantly, since each worker holds an export in flight
### Error Cases
What happens if validation fails:
- Service fails to start with validation error
- Specific field and reason for validation failure logged
- Error message includes valid range/values
```
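These rules translate directly into a fail-fast startup check. A minimal sketch, assuming settings have already been parsed into a flat dict — the names and ranges mirror the template above, but this is not the service's actual code:

```python
# Required fields with no default, and the documented valid ranges
REQUIRED = ("DATABASE_URL", "REDIS_URL", "S3_BUCKET")
RANGES = {
    "PORT": (1024, 65535),
    "POOL_SIZE": (1, 100),
    "TIMEOUT_MS": (5000, 120000),
    "MAX_EXPORT_SIZE_MB": (10, 5000),
}

def validate_config(config: dict) -> list[str]:
    """Return human-readable errors; an empty list means the config is valid."""
    errors = []
    for field in REQUIRED:
        if not config.get(field):
            errors.append(f"{field} is required and has no default")
    for field, (lo, hi) in RANGES.items():
        if field in config and not lo <= int(config[field]) <= hi:
            errors.append(f"{field}={config[field]} out of range (valid: {lo}-{hi})")
    return errors
```

At startup the service would call this and refuse to boot if the list is non-empty, logging each message — which gives operators the field, the bad value, and the valid range, as the error-case rules require.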
### Environment-Specific Configurations Section
````markdown
## Environment-Specific Configurations
### Development Environment
```json
{
  "server": {
    "port": 3000,
    "timeout_ms": 30000
  },
  "database": {
    "url": "postgresql://localhost/export_service",
    "pool_size": 5
  },
  "redis": {
    "url": "redis://localhost:6379/0",
    "concurrent_workers": 1
  },
  "export": {
    "max_export_size_mb": 100,
    "ttl_days": 7,
    "formats": ["csv", "json"]
  },
  "logging": {
    "level": "debug",
    "format": "text"
  },
  "features": {
    "parquet_export": false
  }
}
```
**Notes**:
- Runs locally with minimal resources
- Verbose logging for debugging
- Limited concurrent workers (1)
- Smaller max export size for testing
### Staging Environment
```bash
EXPORT_SERVICE_PORT=3000
EXPORT_SERVICE_DATABASE_URL=postgresql://stage-db.example.com/export_stage
EXPORT_SERVICE_REDIS_URL=redis://redis-stage.example.com:6379/0
EXPORT_SERVICE_S3_BUCKET=export-service-stage
EXPORT_SERVICE_S3_REGION=us-east-1
EXPORT_SERVICE_LOG_LEVEL=info
EXPORT_SERVICE_LOG_FORMAT=json
EXPORT_SERVICE_CONCURRENT_WORKERS=3
EXPORT_SERVICE_MAX_EXPORT_SIZE_MB=500
EXPORT_SERVICE_FEATURE_PARQUET_EXPORT=true
```
**Notes**:
- Tests new features before production
- Similar resources to production
- Parquet export enabled for testing
### Production Environment
```bash
EXPORT_SERVICE_PORT=3000
EXPORT_SERVICE_DATABASE_URL=<from AWS Secrets Manager>
EXPORT_SERVICE_REDIS_URL=<from AWS Secrets Manager>
EXPORT_SERVICE_S3_BUCKET=export-service-prod
EXPORT_SERVICE_S3_REGION=us-east-1
EXPORT_SERVICE_LOG_LEVEL=info
EXPORT_SERVICE_LOG_FORMAT=json
EXPORT_SERVICE_CONCURRENT_WORKERS=4
EXPORT_SERVICE_DATABASE_POOL_SIZE=3
EXPORT_SERVICE_MAX_EXPORT_SIZE_MB=500
EXPORT_SERVICE_EXPORT_TTL_DAYS=7
EXPORT_SERVICE_FEATURE_PARQUET_EXPORT=false
```
**Notes**:
- Credentials from secrets manager
- Optimized for performance and reliability
- Experimental features disabled
- Standard deployment settings
````
### Configuration Examples Section
````markdown
## Complete Configuration Examples
### Minimal Configuration (Development)
```bash
# Minimal settings needed to run locally
export EXPORT_SERVICE_DATABASE_URL=postgresql://localhost/export_service
export EXPORT_SERVICE_REDIS_URL=redis://localhost:6379/0
export EXPORT_SERVICE_S3_BUCKET=export-service-local
export EXPORT_SERVICE_S3_REGION=us-east-1
```
### High-Throughput Configuration (Production)
```bash
# Optimized for maximum throughput
export EXPORT_SERVICE_CONCURRENT_WORKERS=8
export EXPORT_SERVICE_DATABASE_POOL_SIZE=5
export EXPORT_SERVICE_MAX_EXPORT_SIZE_MB=1000
export EXPORT_SERVICE_COMPRESSION_ENABLED=true
export EXPORT_SERVICE_EXPORT_TTL_DAYS=30
```
### Low-Resource Configuration (Cost-Optimized)
```bash
# Minimizes resource usage and cost
export EXPORT_SERVICE_CONCURRENT_WORKERS=1
export EXPORT_SERVICE_DATABASE_POOL_SIZE=2
export EXPORT_SERVICE_MAX_EXPORT_SIZE_MB=100
export EXPORT_SERVICE_EXPORT_TTL_DAYS=1
export EXPORT_SERVICE_LOG_LEVEL=warn
```
````
### Secrets Management Section
````markdown
## Handling Sensitive Configuration
### Sensitive Fields
These fields contain credentials or sensitive information:
- DATABASE_URL (contains password)
- REDIS_URL (may contain password)
- AWS credentials (if not using IAM roles)
### Security Best Practices
1. **Never commit secrets to git**
   - Use .gitignore to exclude config files with secrets
   - Use environment variables instead
2. **Use Secrets Management**
   - AWS Secrets Manager (recommended for production)
   - HashiCorp Vault (for multi-team deployments)
   - Kubernetes Secrets (for K8s deployments)
3. **Rotate Credentials**
   - Rotate database passwords regularly
   - Rotate AWS API keys
   - Update service after rotation
4. **Limit Access**
   - Only operations team can see production credentials
   - Audit logs track who accessed what credentials
   - Use IAM roles instead of static credentials when possible
### Example: Using AWS Secrets Manager
```bash
# In Kubernetes deployment, inject from AWS Secrets Manager
DATABASE_URL=$(aws secretsmanager get-secret-value \
  --secret-id export-service/db-url \
  --query SecretString --output text)
export EXPORT_SERVICE_DATABASE_URL=$DATABASE_URL
```
````
## Writing Tips
### Be Clear About Scope
- What can users configure?
- What's fixed/non-configurable and why?
- What requires restart vs. hot reload?
### Provide Realistic Examples
- Show real values, not placeholders
- Include examples for different environments
- Show both correct and incorrect formats
### Document Trade-offs
- Why choose certain defaults?
- What's the impact of changing values?
- What happens if value is too high/low?
### Include Validation
- What values are valid?
- What happens if invalid values provided?
- How do users know if config is wrong?
### Think About Operations
- What configuration might ops teams want to change?
- What parameters help troubleshoot issues?
- What can be tuned for performance?
## Validation & Fixing Issues
### Run the Validator
```bash
scripts/validate-spec.sh docs/specs/configuration-schema/config-001-your-spec.md
```
### Common Issues & Fixes
**Issue**: "Configuration fields lack descriptions"
- **Fix**: Add purpose, examples, and impact for each field
**Issue**: "No validation rules documented"
- **Fix**: Document valid ranges, formats, required fields
**Issue**: "No environment-specific examples"
- **Fix**: Add configurations for dev, staging, and production
**Issue**: "Sensitive fields not highlighted"
- **Fix**: Clearly mark sensitive fields and document secrets management
## Decision-Making Framework
When designing configuration schema:
1. **Scope**: What should be configurable?
- Environment-specific settings?
- Performance tuning parameters?
- Feature flags?
- Operational settings?
2. **Defaults**: What are good default values?
- Production-safe defaults?
- Development-friendly for new users?
- Documented reasoning?
3. **Flexibility**: How much should users configure?
- Too much: Confusing, hard to troubleshoot
- Too little: Can't adapt to needs
- Right amount: Common use cases covered
4. **Safety**: How do we prevent misconfiguration?
- Validation rules?
- Error messages?
- Documentation of constraints?
5. **Evolution**: How will configuration change?
- Backward compatibility?
- Migration path for old configs?
- Deprecation timeline?
## Next Steps
1. **Create the spec**: `scripts/generate-spec.sh configuration-schema config-XXX-slug`
2. **List fields**: What can be configured?
3. **Document each field** with type, default, range, impact
4. **Provide examples** for different environments
5. **Document validation** rules and constraints
6. **Validate**: `scripts/validate-spec.sh docs/specs/configuration-schema/config-XXX-slug.md`
7. **Share with operations team** for feedback

# How to Create a Data Model Specification
Data Model specifications document the entities, fields, relationships, and constraints for your application's data. They define the "shape" of data your system works with.
## Quick Start
```bash
# 1. Create a new data model
scripts/generate-spec.sh data-model data-001-descriptive-slug
# 2. Open and fill in the file
# (The file will be created at: docs/specs/data-model/data-001-descriptive-slug.md)
# 3. Fill in entities and relationships, then validate:
scripts/validate-spec.sh docs/specs/data-model/data-001-descriptive-slug.md
# 4. Fix issues and check completeness:
scripts/check-completeness.sh docs/specs/data-model/data-001-descriptive-slug.md
```
## When to Write a Data Model
Use a Data Model when you need to:
- Define database schema for new features
- Document entity relationships and constraints
- Establish consistent naming conventions
- Enable API/UI teams to understand data structure
- Plan data migrations or refactoring
- Document complex data relationships
## Research Phase
### 1. Research Related Specifications
Find what this data model supports:
```bash
# Find technical requirements this fulfills
grep -r "prd\|technical" docs/specs/ --include="*.md"
# Find existing data models that might be related
grep -r "data\|model" docs/specs/ --include="*.md"
# Find API contracts that expose this data
grep -r "api\|endpoint" docs/specs/ --include="*.md"
```
### 2. Review Existing Data Models
- What data modeling patterns does your codebase use?
- What database are you using (PostgreSQL, MongoDB, etc.)?
- How are relationships currently modeled?
- Naming conventions for fields and entities?
- Any legacy schema patterns to respect or migrate from?
### 3. Research Domain Models
- How do industry-standard models structure similar data?
- Are there existing standards (e.g., ISO, RFC) you should follow?
- What are best practices in this domain?
### 4. Understand Business Rules
- What constraints must the data satisfy?
- What are the cardinality rules (one-to-many, many-to-many)?
- What data must be unique or required?
- What's the expected scale/volume?
## Structure & Content Guide
### Title & Metadata
- **Title**: "User Data Model", "Transaction Model", etc.
- **Scope**: What entities does this model cover?
- **Version**: 1.0 for new models
### Overview Section
Provide context:
```markdown
# User & Profile Data Model
This data model defines the core entities for user management and profile
information. Covers user accounts, authentication data, and user preferences.
**Entities**: User, UserProfile, UserPreference
**Relationships**: User → UserProfile (1:1), User → UserPreference (1:many)
**Primary Database**: PostgreSQL
```
### Entity Definitions Section
Document each entity/table:
#### Entity: User
```markdown
### User
Core user account entity. Every user must have exactly one User record.
**Purpose**: Represents a user account in the system.
**Fields**
| Field | Type | Required | Unique | Default | Description |
|-------|------|----------|--------|---------|-------------|
| id | UUID | Yes | Yes | auto | Primary key, auto-generated |
| email | String(255) | Yes | Yes | - | User's email address, used for login |
| password_hash | String(255) | Yes | No | - | Bcrypt hash of password (cost=12) |
| first_name | String(100) | No | No | - | User's first name |
| last_name | String(100) | No | No | - | User's last name |
| status | Enum | Yes | No | active | Account status: active, inactive, suspended |
| created_at | Timestamp | Yes | No | now() | Account creation time (UTC) |
| updated_at | Timestamp | Yes | No | now() | Last update time (UTC) |
| deleted_at | Timestamp | No | No | NULL | Soft-delete timestamp, NULL if active |
**Indexes**
- Primary key: `id`
- Unique: `email` (fast lookups at login)
- Secondary: `created_at` (for user listing/pagination)
- Secondary: `status` (for filtering active users)
**Constraints**
- Email format must be valid (enforced in application)
- Password must be at least 8 characters (enforced in application)
- Email must be globally unique
- Status can only be: active, inactive, suspended
**Data Volume**
- Expected growth: 100 new users/day
- Estimated year 1: ~36k users
- Estimated year 3: ~150k users
**Archival Strategy**
- Deleted users (deleted_at != NULL) moved to archive after 1 year
- Soft deletes used for data recovery capability
```
#### Entity: UserProfile
```markdown
### UserProfile
Extended user profile information. One-to-one relationship with User.
**Purpose**: Stores optional user profile information separate from core account.
**Fields**
| Field | Type | Required | Unique | Description |
|-------|------|----------|--------|-------------|
| id | UUID | Yes | Yes | Primary key |
| user_id | UUID (FK) | Yes | Yes | Foreign key to User.id |
| avatar_url | String(500) | No | No | URL to user's avatar image |
| bio | String(500) | No | No | User bio/description |
| phone | String(20) | No | Yes | User phone number |
| timezone | String(50) | No | No | User's timezone (e.g., America/New_York) |
| language | String(5) | No | No | Preferred language (ISO 639-1, e.g., en, fr) |
| theme | Enum | No | No | UI theme preference: light, dark, auto |
| created_at | Timestamp | Yes | No | Creation time |
| updated_at | Timestamp | Yes | No | Last update time |
**Indexes**
- Primary key: `id`
- Unique: `user_id` (enforces the 1:1 relationship)
**Constraints**
- Foreign key: user_id references User(id) ON DELETE CASCADE
- Phone must be valid format (if provided)
- Timezone must be valid (e.g., from IANA timezone database)
- Language must be valid ISO 639-1 code
- Theme must be one of: light, dark, auto
**Notes**
- Soft-deleted with parent User (CASCADE delete)
- Profile is optional - some users may not have profile data
```
#### Entity: UserPreference
```markdown
### UserPreference
Key-value preferences for users. Flexible schema for future preference types.
**Purpose**: Stores user preferences without requiring schema changes.
**Fields**
| Field | Type | Required | Unique | Description |
|-------|------|----------|--------|-------------|
| id | UUID | Yes | Yes | Primary key |
| user_id | UUID (FK) | Yes | No | Foreign key to User.id |
| preference_key | String(100) | Yes | No | Preference identifier (e.g., notifications_email) |
| preference_value | String(1000) | Yes | No | Preference value as string |
| created_at | Timestamp | Yes | No | Creation time |
| updated_at | Timestamp | Yes | No | Last update time |
**Indexes**
- Primary key: `id`
- Composite unique: `(user_id, preference_key)` - efficient single-preference lookup; its leftmost column also serves "all preferences for a user" queries
**Constraints**
- Foreign key: user_id references User(id) ON DELETE CASCADE
- Composite unique: `(user_id, preference_key)` - One preference per key per user
- preference_key must match pattern: `[a-z_]+` (lowercase letters and underscores only)
- preference_value must be valid JSON or simple string
**Valid Preferences**
Examples of preference_key values:
- `notifications_email` → "true"/"false"
- `notifications_sms` → "true"/"false"
- `export_format` → "csv"/"json"
- `ui_columns_per_page` → "20"/"50"/"100"
**Notes**
- Flexible key-value design allows adding preferences without schema changes
- Values stored as strings for flexibility, parsed by application layer
```
### Relationships Section
Document how entities relate:
````markdown
## Entity Relationships
```
┌───────────┐          ┌──────────────┐       ┌─────────────┐
│   User    │          │ UserProfile  │       │  UserPref   │
├───────────┤          ├──────────────┤       ├─────────────┤
│ id (PK)   │          │ id (PK)      │       │ id (PK)     │
│ email     │◄───1:1───│ user_id (FK) │       │ user_id(FK) │
│ ...       │          │ avatar_url   │       │ pref_key    │
└───────────┘          │ ...          │       │ pref_value  │
      ▲                └──────────────┘       └─────────────┘
      │                                              │
      └──────────────────1:many──────────────────────┘
```
### Relationship: User → UserProfile (1:1)
- **Type**: One-to-One
- **Foreign Key**: UserProfile.user_id → User.id
- **Cardinality**: A User has at most one UserProfile; a UserProfile belongs to exactly one User
- **Delete Behavior**: CASCADE - Deleting User deletes UserProfile
- **Optional**: UserProfile is optional (some users may not have detailed profile)
### Relationship: User → UserPreference (1:many)
- **Type**: One-to-Many
- **Foreign Key**: UserPreference.user_id → User.id
- **Cardinality**: A User can have many UserPreferences; each UserPreference belongs to one User
- **Delete Behavior**: CASCADE - Deleting User deletes all preferences
- **Optional**: A User can have zero preferences
````
### Constraints & Validation Section
```markdown
## Data Constraints & Validation
### Business Logic Constraints
- Users cannot have duplicate emails (enforced at database + application)
- User phone numbers must be unique if provided
- Email is required on User, so every account retains at least one contact method even when profile fields are empty
### Data Integrity Rules
- password_hash must never be exposed in API responses
- deleted_at cannot be set retroactively (only forward through time)
- updated_at must be >= created_at
### Referential Integrity
- Foreign key constraints enforced at database level
- Cascade deletes on User deletion
- No orphaned UserProfile or UserPreference records
### Enumeration Values
**User.status**
- `active` - Account is active
- `inactive` - Account temporarily inactive
- `suspended` - Account suspended (admin action)
**UserProfile.theme**
- `light` - Light theme
- `dark` - Dark theme
- `auto` - Follow system settings
**UserPreference.preference_key**
- Must match pattern: `[a-z_]+`
- Examples: `notifications_email`, `export_format`, `ui_language`
```
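The `preference_key` pattern above can be enforced with a one-line check in the application layer. A sketch:

```python
import re

# Must match the documented pattern: lowercase letters and underscores only
KEY_PATTERN = re.compile(r"[a-z_]+")

def is_valid_preference_key(key: str) -> bool:
    """True if the key consists solely of lowercase letters and underscores."""
    return bool(KEY_PATTERN.fullmatch(key))
```

This rejects mixed case, hyphens, and empty strings, matching the constraint on `UserPreference.preference_key`.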
### Scaling Considerations Section
```markdown
## Scaling & Performance
### Expected Data Volume
- Users: 100-1000 per day growth
- Preferences: ~5-10 per user on average
- Year 1 estimate: 36k users, ~180k preference records
### Table Sizes
- User table: ~36MB (estimated year 1)
- UserProfile table: ~28MB
- UserPreference table: ~22MB
### Query Patterns & Indexes
- Find user by email: Indexed (UNIQUE index on email)
- Find all preferences for user: Indexed (composite on user_id, pref_key)
- List users by creation date: Indexed (on created_at)
- Filter users by status: Indexed (on status)
### Optimization Notes
- Composite index `(user_id, preference_key)` enables efficient preference lookups
- Email index enables fast login queries
- Consider partitioning UserPreference by user_id for very large scale (100M+ records)
```
### Migration & Change Management Section
````markdown
## Schema Evolution
### Creating These Tables
```sql
-- "user" is a reserved word in PostgreSQL, so the table name must be quoted
-- (many teams simply name the table "users" instead)
CREATE TABLE "user" (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    email VARCHAR(255) UNIQUE NOT NULL,
    password_hash VARCHAR(255) NOT NULL,
    first_name VARCHAR(100),
    last_name VARCHAR(100),
    status VARCHAR(50) DEFAULT 'active' NOT NULL
        CHECK (status IN ('active', 'inactive', 'suspended')),
    created_at TIMESTAMP DEFAULT now() NOT NULL,
    updated_at TIMESTAMP DEFAULT now() NOT NULL,
    deleted_at TIMESTAMP
);
CREATE TABLE user_profile (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID UNIQUE NOT NULL REFERENCES "user"(id) ON DELETE CASCADE,
    avatar_url VARCHAR(500),
    bio VARCHAR(500),
    phone VARCHAR(20) UNIQUE,
    timezone VARCHAR(50),
    language VARCHAR(5),
    theme VARCHAR(20) CHECK (theme IN ('light', 'dark', 'auto')),
    created_at TIMESTAMP DEFAULT now() NOT NULL,
    updated_at TIMESTAMP DEFAULT now() NOT NULL
);
CREATE TABLE user_preference (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL REFERENCES "user"(id) ON DELETE CASCADE,
    preference_key VARCHAR(100) NOT NULL,
    preference_value VARCHAR(1000) NOT NULL,
    created_at TIMESTAMP DEFAULT now() NOT NULL,
    updated_at TIMESTAMP DEFAULT now() NOT NULL,
    UNIQUE (user_id, preference_key)
);
-- The UNIQUE constraints on email and (user_id, preference_key) already
-- create indexes, so only the remaining lookup columns need explicit ones
CREATE INDEX idx_user_created_at ON "user"(created_at);
CREATE INDEX idx_user_status ON "user"(status);
```
### Future Migrations
- Q2 2024: Add `last_login_at` to User (nullable, new index)
- Q3 2024: Implement user archival (age > 1 year, no activity)
````
### Documentation & Examples Section
````markdown
## Example Queries
### Find user by email
```sql
SELECT * FROM "user" WHERE email = 'user@example.com';
```
### Get user with profile
```sql
SELECT u.*, p.*
FROM "user" u
LEFT JOIN user_profile p ON u.id = p.user_id
WHERE u.id = $1;
```
### Get user's preferences
```sql
SELECT preference_key, preference_value
FROM user_preference
WHERE user_id = $1
ORDER BY created_at DESC;
```
### Archive old inactive users
```sql
UPDATE "user"
SET deleted_at = now()
WHERE status = 'inactive' AND updated_at < now() - interval '1 year'
  AND deleted_at IS NULL;
```
````
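Queries like these can be smoke-tested without a live PostgreSQL instance. A sketch using Python's built-in `sqlite3` with a pared-down schema — illustration only, since SQLite lacks UUID columns and, unlike PostgreSQL, does not reserve the word `user`:

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (
    id TEXT PRIMARY KEY,
    email TEXT UNIQUE NOT NULL
);
CREATE TABLE user_preference (
    id TEXT PRIMARY KEY,
    user_id TEXT NOT NULL REFERENCES user(id) ON DELETE CASCADE,
    preference_key TEXT NOT NULL,
    preference_value TEXT NOT NULL,
    UNIQUE (user_id, preference_key)
);
""")

user_id = str(uuid.uuid4())
conn.execute("INSERT INTO user VALUES (?, ?)", (user_id, "user@example.com"))
conn.execute(
    "INSERT INTO user_preference VALUES (?, ?, ?, ?)",
    (str(uuid.uuid4()), user_id, "notifications_email", "true"),
)

# The "Get user's preferences" query shape
rows = conn.execute(
    "SELECT preference_key, preference_value FROM user_preference WHERE user_id = ?",
    (user_id,),
).fetchall()
print(rows)  # [('notifications_email', 'true')]

# The composite UNIQUE constraint rejects a second row for the same key
try:
    conn.execute(
        "INSERT INTO user_preference VALUES (?, ?, ?, ?)",
        (str(uuid.uuid4()), user_id, "notifications_email", "false"),
    )
except sqlite3.IntegrityError as exc:
    print("duplicate rejected:", exc)
```

This kind of throwaway harness is useful for checking constraint behavior (like the one-preference-per-key rule) before writing a real migration.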
## Writing Tips
### Document Constraints Clearly
- Why does each field have the constraints it does?
- What validation rules apply?
- What happens on constraint violations?
### Think About Scale
- How much data will this table store?
- What are the growth projections?
- What indexing strategy is needed?
- Will partitioning be needed in the future?
### Link to Related Specs
- Reference technical requirements: `[PRD-001]`
- Reference API contracts: `[API-001]` (what data is exposed)
- Reference design documents: `[DES-001]`
### Include Examples
- Sample SQL for common queries
- Sample JSON representations
- Example migration scripts
### Document Change Constraints
- What fields can't change after creation?
- What fields are immutable?
- How do we handle schema evolution?
## Validation & Fixing Issues
### Run the Validator
```bash
scripts/validate-spec.sh docs/specs/data-model/data-001-your-spec.md
```
### Common Issues & Fixes
**Issue**: "Missing entity field specifications"
- **Fix**: Complete the fields table for each entity with types, constraints, descriptions
**Issue**: "No relationships documented"
- **Fix**: Add a relationships section showing foreign keys and cardinality
**Issue**: "TODO items in Constraints (3 items)"
- **Fix**: Complete constraint definitions, validation rules, and enumeration values
**Issue**: "No scaling or performance information"
- **Fix**: Add data volume estimates, indexing strategy, and optimization notes
## Decision-Making Framework
As you write the data model, consider:
1. **Entity Design**: What entities do we need?
- What are distinct concepts?
- What are attributes vs. relationships?
- Should data be normalized or denormalized?
2. **Relationships**: How do entities relate?
- One-to-one, one-to-many, many-to-many?
- Should relationships be required or optional?
- How should deletions cascade?
3. **Constraints**: What rules must data satisfy?
- Uniqueness constraints?
- Required fields?
- Data type restrictions?
- Enumeration values?
4. **Performance**: How will data be queried?
- What indexes are needed?
- What's the expected scale?
- Are there bottlenecks?
5. **Evolution**: How will this model change?
- Can we add fields without migrations?
- Can we add entities without breaking things?
- How do we handle data migrations?
## Next Steps
1. **Create the spec**: `scripts/generate-spec.sh data-model data-XXX-slug`
2. **Define entities**: What are the main entities/tables?
3. **Specify fields** with types, constraints, descriptions
4. **Document relationships** between entities
5. **Plan indexes** for performance
6. **Validate**: `scripts/validate-spec.sh docs/specs/data-model/data-XXX-slug.md`
7. **Share with team** before implementation

# How to Create a Deployment Procedure Specification
Deployment procedures document step-by-step instructions for deploying systems to production, including prerequisites, procedures, rollback, and troubleshooting.
## Quick Start
```bash
# 1. Create a new deployment procedure
scripts/generate-spec.sh deployment-procedure deploy-001-descriptive-slug
# 2. Open and fill in the file
# (The file will be created at: docs/specs/deployment-procedure/deploy-001-descriptive-slug.md)
# 3. Fill in steps and checklists, then validate:
scripts/validate-spec.sh docs/specs/deployment-procedure/deploy-001-descriptive-slug.md
# 4. Fix issues and check completeness:
scripts/check-completeness.sh docs/specs/deployment-procedure/deploy-001-descriptive-slug.md
```
## When to Write a Deployment Procedure
Use a Deployment Procedure when you need to:
- Document how to deploy a new service or component
- Ensure consistent, repeatable deployments
- Provide runbooks for operations teams
- Document rollback procedures for failures
- Enable any team member to deploy safely
- Create an audit trail of deployments
## Research Phase
### 1. Research Related Specifications
Find what you're deploying:
```bash
# Find component specs
grep -r "component" docs/specs/ --include="*.md"
# Find design documents that mention infrastructure
grep -r "design\|infrastructure" docs/specs/ --include="*.md"
# Find existing deployment procedures
grep -r "deploy" docs/specs/ --include="*.md"
```
### 2. Understand Your Infrastructure
- What's the deployment target? (Kubernetes, serverless, VMs)
- What infrastructure does this component need?
- What access/permissions are required?
- What monitoring must be in place?
### 3. Review Past Deployments
- How have similar components been deployed?
- What issues arose? How were they resolved?
- What worked well? What didn't?
- Any patterns or templates to follow?
## Structure & Content Guide
### Title & Metadata
- **Title**: "Export Service Deployment to Production", "Database Migration", etc.
- **Component**: What's being deployed
- **Target**: Production, staging, canary, etc.
- **Owner**: Team responsible for deployment
### Prerequisites Section
Document what must be done before deployment:
```markdown
# Export Service Production Deployment
## Prerequisites
### Infrastructure Requirements
- [ ] AWS resources provisioned (see [CMP-001] for details)
- [ ] ElastiCache Redis cluster (export-service-queue)
- [ ] RDS PostgreSQL instance (export-db)
- [ ] S3 bucket (export-files-prod)
- [ ] IAM roles and policies configured
- [ ] Kubernetes cluster accessible
- [ ] kubectl configured with production cluster context
- [ ] Deployment manifests reviewed by tech lead
- [ ] Namespace `export-service-prod` created
### Code & Build Requirements
- [ ] All code merged to main branch
- [ ] Code reviewed by 2+ senior engineers
- [ ] All tests passing
- [ ] Unit tests (90%+ coverage)
- [ ] Integration tests
- [ ] Load tests pass at target throughput
- [ ] Docker image built and pushed to ECR
- [ ] Image tagged with version (e.g., v1.2.3)
- [ ] Image scanned for vulnerabilities
- [ ] Image verified to work (manual test in staging)
### Team & Access Requirements
- [ ] Deployment lead identified (typically tech lead or on-call eng)
- [ ] Access verified for:
- [ ] AWS console (ECR, S3, CloudWatch)
- [ ] Kubernetes cluster (kubectl access)
- [ ] Database (for running migrations if needed)
- [ ] Monitoring/alerting system (Grafana, PagerDuty)
- [ ] Communication channel open (Slack, war room)
- [ ] Runbook reviewed by both eng and ops team
### Pre-Deployment Verification Checklist
- [ ] Staging deployment successful (deployed 24+ hours ago, stable)
- [ ] Monitoring in place and verified working
- [ ] Rollback plan reviewed and tested
- [ ] Emergency contacts identified
- [ ] Stakeholders notified of deployment window
- [ ] Change log prepared (what's new in this version)
### Data/Database Requirements
- [ ] Database schema compatible with new version
- [ ] Backward compatible (no breaking changes)
- [ ] Migrations tested in staging
- [ ] Rollback plan for migrations documented
- [ ] No data conflicts or corruption risks
- [ ] Backup created (if applicable)
### Approval Checklist
- [ ] Tech Lead: Code and approach approved
- [ ] Product Owner: Feature approved, ready for launch
- [ ] Operations Lead: Deployment plan reviewed
- [ ] Security: Security review passed (if applicable)
```
### Deployment Steps Section
Provide step-by-step instructions:
```markdown
## Deployment Procedure
### Pre-Deployment (Validation Phase)
**Step 1: Verify Prerequisites**
- Command: Run pre-deployment checklist above
- Verify: All items checked ✓
- If any fail: Stop deployment, resolve issues
- Time: ~15 minutes
**Step 2: Create Deployment Record**
- Document: Who is deploying, when, what version
- Command: Log in to deployment tracking system
- Entry:
```
Deployment: export-service
Version: v1.2.3
Environment: production
Deployed By: Alice Smith
Time: 2024-01-15 14:30 UTC
Change Summary: Added bulk export feature, fixed queue processing
```
- Time: ~5 minutes
### Deployment Phase
**Step 3: Tag Database Migration (if applicable)**
- Check: Are there schema changes in this version?
- If YES:
```bash
# SSH to database server
ssh -i ~/.ssh/prod.pem admin@db.example.com
# Run migrations
psql -U export_service -d export_service -c \
"ALTER TABLE exports ADD COLUMN retry_count INT DEFAULT 0;"
# Verify migration
psql -U export_service -d export_service -c \
"SELECT column_name FROM information_schema.columns WHERE table_name='exports';"
```
- If NO: Skip this step
- Verify: All migrations complete without errors
- Time: ~10 minutes
**Step 4: Deploy to Kubernetes**
- Verify: You're deploying to PRODUCTION cluster
```bash
kubectl config current-context
# Should output: arn:aws:eks:us-east-1:123456789:cluster/prod
```
- If wrong context: STOP, switch to correct cluster
- Deploy new image version:
```bash
# Update deployment with new image
kubectl set image deployment/export-service \
export-service=123456789.dkr.ecr.us-east-1.amazonaws.com/export-service:v1.2.3 \
-n export-service-prod
```
- Verify: Deployment triggered
```bash
kubectl rollout status deployment/export-service -n export-service-prod
```
- Wait: For all pods to become ready (typically 2-3 minutes)
- Output should show: `deployment "export-service" successfully rolled out`
- Time: ~5 minutes
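The context check can be enforced by script rather than by eye. A sketch (the function name and expected-context string are assumptions):
```bash
# Hypothetical guard: abort unless kubectl points at the expected cluster.
require_context() {
  local expected="$1" current
  current=$(kubectl config current-context)
  if [ "$current" != "$expected" ]; then
    echo "wrong context: $current (expected $expected)" >&2
    return 1
  fi
}
```
Call it at the top of any deploy script so a wrong context fails fast instead of relying on a human reading the output.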
**Step 5: Verify Deployment Health**
- Check: Pod status
```bash
kubectl get pods -n export-service-prod
```
- All pods should show `Running` status
- If any show `CrashLoopBackOff`: Stop deployment, investigate
- Check: Service endpoints
```bash
kubectl get svc export-service -n export-service-prod
```
- Should show external IP/load balancer endpoint
- Check: Logs for errors
```bash
kubectl logs -n export-service-prod -l app=export-service --tail=50
```
- Should show startup logs, no ERROR level messages
- If errors present: Proceed to the Rollback Procedure below
- Check: Health endpoints
```bash
curl https://api.example.com/health
```
- Should return 200 OK
- If not: Service may still be starting (wait 30s and retry)
- Time: ~5 minutes
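The "wait 30s and retry" advice automates naturally into a retry loop. A sketch (function name, retry count, and delay defaults are assumptions):
```bash
# Hypothetical retry loop: polls the health endpoint until it returns 200.
wait_for_health() {
  local url="$1" retries="${2:-10}" delay="${3:-30}" i status
  for ((i = 1; i <= retries; i++)); do
    status=$(curl -s -o /dev/null -w '%{http_code}' "$url" || echo 000)
    if [ "$status" = "200" ]; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
  done
  echo "service did not become healthy after $retries attempts" >&2
  return 1
}
```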
### Post-Deployment (Verification Phase)
**Step 6: Monitor Metrics**
- Open: Grafana dashboard for export-service
- Check: Key metrics for 5 minutes
- Request latency: Should be stable (< 100ms p95)
- Error rate: Should remain < 0.1%
- CPU/Memory: Should be within normal ranges
- Queue depth: Should process jobs smoothly
- Look for: Any sudden spikes or anomalies
- If anomalies: Proceed to the Rollback Procedure below
- Time: ~5 minutes
**Step 7: Functional Testing**
- Manual test: Create export via API
```bash
curl -X POST https://api.example.com/exports \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"format": "csv",
"data_types": ["users"]
}'
```
- Response: Should return 201 Created with export_id
- Check status:
```bash
curl https://api.example.com/exports/{export_id} \
-H "Authorization: Bearer $TOKEN"
```
- Verify: Status transitions from queued → processing → completed
- Download: Successfully download export file
- Verify: File contents correct
- If any step fails: Proceed to the Rollback Procedure below
- Time: ~5 minutes
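Watching the queued → processing → completed transition by hand is error-prone; it can be polled instead. A sketch (the function name and the string-matching status extraction are assumptions — a real script might use jq):
```bash
# Hypothetical poller: waits until the export reports "completed" or gives up.
wait_for_export() {
  local export_id="$1" attempts="${2:-30}" delay="${3:-10}" i body
  for ((i = 1; i <= attempts; i++)); do
    body=$(curl -s "https://api.example.com/exports/$export_id" \
      -H "Authorization: Bearer $TOKEN")
    case "$body" in
      *'"status": "completed"'*|*'"status":"completed"'*)
        echo completed; return 0 ;;
      *'"status": "failed"'*|*'"status":"failed"'*)
        echo failed >&2; return 1 ;;
    esac
    sleep "$delay"
  done
  echo timeout >&2
  return 1
}
```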
**Step 8: Notify Stakeholders**
- Update: Deployment tracking system
```
Status: DEPLOYED
Completion Time: 14:45 UTC
Health: ✓ All checks passed
Metrics: ✓ Stable
Functional Tests: ✓ Passed
```
- Announce: Slack to #product-eng
```
@channel Export Service v1.2.3 deployed to production.
New feature: Bulk data exports now available.
Status: Monitoring.
```
- Notify: On-call engineer (monitoring for 2 hours post-deployment)
### Rollback Procedure (If Issues Found)
**Step 9: Rollback (Only if Step 6 or 7 Fails)**
- Decision: Is deployment safe to continue?
- YES → All checks pass, monitoring is good → Release complete
- NO → Issues found → Proceed with rollback
- Execute rollback:
```bash
# Revert to previous version
kubectl rollout undo deployment/export-service -n export-service-prod
# Verify rollback in progress
kubectl rollout status deployment/export-service -n export-service-prod
# Wait for rollback to complete
```
- Verify rollback successful:
```bash
# Check current image
kubectl describe deployment export-service -n export-service-prod | grep Image
# Should show previous version (e.g., v1.2.2)
# Verify service responding
curl https://api.example.com/health
```
- Notify: Update stakeholders
```
@channel Deployment rolled back due to [specific reason].
Current version: v1.2.2 (stable)
Investigating issue. Will retry deployment tomorrow.
```
- Document: Root cause analysis
- What went wrong?
- Why wasn't it caught in staging?
- How do we prevent this next time?
- Time: ~10 minutes
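Confirming the rollback landed on the intended version can also be scripted. A sketch (the Image-line parsing follows `kubectl describe` output; the function name and expected tag are assumptions):
```bash
# Hypothetical check: confirm the deployment now runs the expected image tag.
verify_rollback() {
  local expected_tag="$1" image
  image=$(kubectl describe deployment export-service -n export-service-prod \
    | grep 'Image:' | awk '{print $2}')
  case "$image" in
    *:"$expected_tag") echo "rollback OK: $image"; return 0 ;;
    *) echo "unexpected image: $image (wanted tag $expected_tag)" >&2; return 1 ;;
  esac
}
```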
```
### Success Criteria Section
```markdown
## Deployment Success Criteria
The deployment is successful if ALL of these are true:
### Technical Criteria
- [ ] All pods running and healthy (0 CrashLoopBackOff)
- [ ] Service responding to health checks (200 OK)
- [ ] Metrics showing normal values (no spikes)
- [ ] Error rate < 0.1% (< 1 error per 1000 requests)
- [ ] Response latency p95 < 100ms
- [ ] No errors in application logs
### Functional Criteria
- [ ] Export API responds to requests
- [ ] Export jobs queue successfully
- [ ] Jobs process and complete
- [ ] Files upload to S3 correctly
- [ ] Users can download exported files
- [ ] File contents verified correct
### Operational Criteria
- [ ] Monitoring active and receiving metrics
- [ ] Alerting working (test alert fired)
- [ ] Logs aggregated and searchable
- [ ] Runbook tested and functional
- [ ] Team confident in operating system
```
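The numeric criteria above lend themselves to an automated gate. A sketch (the function name is hypothetical; in practice the inputs would come from your monitoring system's API):
```bash
# Hypothetical gate: pass/fail a deployment from observed error rate and p95 latency.
# Thresholds mirror the criteria above: error rate < 0.1%, p95 < 100 ms.
check_success_criteria() {
  local error_rate_pct="$1" p95_ms="$2"
  awk -v err="$error_rate_pct" -v p95="$p95_ms" 'BEGIN {
    if (err < 0.1 && p95 < 100) { print "PASS"; exit 0 }
    print "FAIL"; exit 1
  }'
}
```
A CI job could call this with values pulled from the metrics backend before declaring the release complete.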
### Monitoring & Alerting Section
```markdown
## Monitoring Setup
### Critical Alerts (Page on-call)
- Service down (health check fails)
- Error rate > 1% for 5 minutes
- Response latency p95 > 500ms for 5 minutes
- Queue depth > 1000 for 10 minutes
### Warning Alerts (Slack notification)
- Error rate > 0.5% for 5 minutes
- CPU > 80% for 10 minutes
- Memory > 85% for 10 minutes
- Export job timeout increasing
### Dashboard
- Service: export-service-prod
- Metrics: Latency, errors, throughput, queue depth
- Time range: Last 24 hours by default
- Alerts: Show current alert status
```
### Troubleshooting Section
```markdown
## Troubleshooting Common Issues
### Issue: Pods stuck in CrashLoopBackOff
**Symptoms**: Pods repeatedly crash and restart
**Diagnosis**:
```bash
# Check logs for errors
kubectl logs <pod-name> -n export-service-prod
```
**Common Causes**:
- Configuration error (check environment variables)
- Database connection failed (check credentials)
- Out of memory (check resource limits)
**Fix**: Review logs, check prerequisites, rollback if unclear
### Issue: Response latency spiking
**Symptoms**: p95 latency > 200ms, users report slow exports
**Diagnosis**:
```bash
# Check queue depth
kubectl exec -it <worker-pod> -n export-service-prod \
-- redis-cli -h redis.example.com LLEN export-queue
```
**Common Causes**:
- Too many concurrent exports (queue backlog)
- Database slow (check queries, indexes)
- Network issues (check connectivity)
**Fix**: Scale up workers, check database performance, verify network
### Issue: Export jobs failing
**Symptoms**: Job status shows `failed`, users can't export
**Diagnosis**:
```bash
# Check worker logs
kubectl logs -n export-service-prod -l app=export-service
```
**Common Causes**:
- S3 upload failing (check permissions, bucket exists)
- Database query error (schema mismatch)
- User doesn't have data to export
**Fix**: Review logs, verify S3 access, check schema version
### Issue: Database migration failed
**Symptoms**: Service won't start after deployment
**Diagnosis**:
```bash
# Check migration logs
psql -U export_service -d export_service -c \
"SELECT * FROM schema_migrations ORDER BY version DESC LIMIT 5;"
```
**Recovery**:
1. Identify failed migration
2. Rollback deployment (revert to previous version)
3. Debug migration issue in staging
4. Retry deployment after fix
```
### Post-Deployment Actions Section
```markdown
## After Deployment
### Immediate (Next 2 hours)
- [ ] On-call engineer monitoring
- [ ] Check metrics every 15 minutes
- [ ] Monitor error rate and latency
- [ ] Watch for user-reported issues in #support
### Short-term (Next 24 hours)
- [ ] Review deployment metrics
- [ ] Collect feedback from users
- [ ] Document any issues encountered
- [ ] Update runbook if needed
### Follow-up (Next week)
- [ ] Post-mortem if issues occurred
- [ ] Update deployment procedure based on lessons learned
- [ ] Plan performance improvements if needed
- [ ] Update documentation if system behavior changed
```
## Writing Tips
### Be Precise and Detailed
- Exact commands to run (copy-paste ready)
- Specific values (versions, endpoints, timeouts)
- Expected outputs for verification
- Time estimates for each step
### Think About Edge Cases
- What if something is already deployed?
- What if a prerequisite is missing?
- What if deployment partially succeeds?
- What if rollback is needed?
### Make Rollback Easy
- Document rollback procedure clearly
- Test rollback before using in production
- Make rollback faster than forward deployment
- Have quick communication plan for failures
### Document Monitoring
- What metrics indicate health?
- What should we watch during deployment?
- What thresholds trigger alerts?
- How do we validate success?
### Link to Related Specs
- Reference component specs: `[CMP-001]`
- Reference design documents: `[DES-001]`
- Reference operations runbooks
## Validation & Fixing Issues
### Run the Validator
```bash
scripts/validate-spec.sh docs/specs/deployment-procedure/deploy-001-your-spec.md
```
### Common Issues & Fixes
**Issue**: "Prerequisites section incomplete"
- **Fix**: Add all required infrastructure, code, access, and approvals
**Issue**: "Step-by-step procedures lack detail"
- **Fix**: Add actual commands, expected output, time estimates
**Issue**: "No rollback procedure"
- **Fix**: Document how to revert deployment if issues arise
**Issue**: "Monitoring and troubleshooting missing"
- **Fix**: Add success criteria, monitoring setup, and troubleshooting guide
## Decision-Making Framework
When writing a deployment procedure:
1. **Prerequisites**: What must be true before we start?
- Infrastructure ready?
- Code reviewed and tested?
- Team trained?
- Approvals obtained?
2. **Procedure**: What are the exact steps?
- Simple, repeatable steps?
- Verification at each step?
- Estimated timing?
3. **Safety**: How do we prevent/catch issues?
- Verification steps after each phase?
- Rollback procedure?
- Quick failure detection?
4. **Communication**: Who needs to know what?
- Stakeholders notified?
- On-call monitoring?
- Escalation path?
5. **Learning**: How do we improve next time?
- Monitoring enabled?
- Runbook updated?
- Issues documented?
## Next Steps
1. **Create the spec**: `scripts/generate-spec.sh deployment-procedure deploy-XXX-slug`
2. **Research**: Find component specs and existing procedures
3. **Document prerequisites**: What must be true before deployment?
4. **Write procedures**: Step-by-step, with commands and verification
5. **Plan rollback**: How do we undo this if needed?
6. **Validate**: `scripts/validate-spec.sh docs/specs/deployment-procedure/deploy-XXX-slug.md`
7. **Test procedure**: Walk through it in staging environment
8. **Get team review** before using in production

# How to Create a Design Document
Design Documents provide the detailed architectural and technical design for a system, component, or significant feature. They answer "How will we build this?" after business and technical requirements have been defined.
## Quick Start
```bash
# 1. Create a new design document
scripts/generate-spec.sh design-document des-001-descriptive-slug
# 2. Open and fill in the file
# (The file will be created at: docs/specs/design-document/des-001-descriptive-slug.md)
# 3. Fill in the sections, then validate:
scripts/validate-spec.sh docs/specs/design-document/des-001-descriptive-slug.md
# 4. Fix issues and check completeness:
scripts/check-completeness.sh docs/specs/design-document/des-001-descriptive-slug.md
```
## When to Write a Design Document
Use a Design Document when you need to:
- Define system architecture or redesign existing components
- Document major technical decisions and trade-offs
- Provide a blueprint for implementation teams
- Enable architectural review before coding begins
- Create shared understanding of complex systems
## Research Phase
### 1. Research Related Specifications
Find upstream specs that inform your design:
```bash
# Find related business requirements
grep -r "brd" docs/specs/ --include="*.md"
# Find related technical requirements
grep -r "prd\|technical" docs/specs/ --include="*.md"
# Find existing design patterns or similar designs
grep -r "design\|architecture" docs/specs/ --include="*.md"
```
### 2. Research External Documentation
Research existing architectures and patterns:
- Look up similar systems: "How do other companies solve this problem?"
- Research technologies and frameworks you're planning to use
- Review relevant design patterns or architecture styles
- Check for security, performance, or scalability best practices
Use tools to fetch external docs:
```bash
# Research the latest on your chosen technologies
# Example: Research distributed system patterns
# Example: Research microservices architecture best practices
```
### 3. Review Existing Codebase & Architecture
- What patterns does your codebase already follow?
- What technologies are you already using?
- How are similar features currently implemented?
- What architectural decisions have been made previously?
Ask: "Are we extending existing patterns or introducing new ones?"
### 4. Understand Constraints
- What are the performance requirements? (latency, throughput)
- What scalability targets exist?
- What security constraints apply?
- What infrastructure/budget constraints?
- Team expertise with chosen technologies?
## Structure & Content Guide
### Title & Metadata
- **Title**: "Microservices Architecture for User Service" or similar
- **Type**: Architecture | System Design | RFC | Technology Choice
- **Status**: Draft | Under Review | Accepted | Rejected
- **Version**: 1.0.0 (increment for significant revisions)
### Executive Summary
Write 3-4 sentences that answer:
- What problem does this solve?
- What's the proposed solution?
- What are the key tradeoffs?
Example:
```
This design proposes a microservices architecture to scale our user service.
We'll split user management, authentication, and profile service into separate
deployable services. This trades some operational complexity for independent
scaling and development velocity. Key trade-off: eventual consistency vs.
immediate consistency in cross-service operations.
```
### Problem Statement
Describe the current state and limitations:
```
Current monolithic architecture handles all user operations in a single service,
causing:
- Bottleneck: User service becomes bottleneck for entire system
- Scaling: Must scale entire service even if only auth needs capacity
- Deployment: Changes in one area risk entire user service
- Velocity: Teams block each other during development
This design solves these issues by enabling independent scaling and deployment.
```
### Goals & Success Criteria
**Primary Goals** (3-5 goals)
- Reduce deployment frequency to enable multiple daily deployments
- Enable independent scaling of auth and profile services
- Reduce time to market for new user features
**Success Criteria** (specific, measurable)
1. Auth service can scale independently to handle 10k requests/sec
2. Profile service deployment doesn't impact auth service
3. System reduces MTTR for user service incidents by 50%
4. Teams can deploy independently without coordination
5. P95 latency remains under 200ms across service boundaries
### Context & Background
Explain why this design is needed now:
```
Over the past 6 months, we've experienced:
- Auth service saturated at 5k requests/sec during peak hours
- Authentication changes blocked by profile service deployments
- High operational burden managing single monolithic service
Recent customer requests for higher throughput have revealed these bottlenecks.
This design addresses the most urgent scaling constraint (auth service).
```
### Proposed Solution
#### High-Level Overview
Provide a diagram showing major components and data flow:
```
┌─────────────┐
│ Client │
└──────┬──────┘
├─→ [API Gateway]
│ │
├─→ [Auth Service] - JWT validation, user login
├─→ [Profile Service] - User profile, preferences
└─→ [Data Layer]
├─ User DB (master)
├─ Cache (Redis)
└─ Message Queue (RabbitMQ)
```
Explain how components interact:
```
Client sends request to API Gateway, which routes based on endpoint.
Auth service handles login/JWT operations. Profile service handles profile
reads/writes. Both services consume user data from shared database with
eventual consistency via message queue.
```
#### Architecture Components
For each major component:
**Auth Service**
- **Purpose**: Handles authentication, token generation, validation
- **Technology**: Node.js with Express, Redis for session storage
- **Key Responsibilities**:
- User login/logout
- JWT token generation and validation
- Session management
- Password reset flows
- **Interactions**: Calls User DB for credential validation, publishes events to queue
**Profile Service**
- **Purpose**: Manages user profile data and preferences
- **Technology**: Node.js with Express, PostgreSQL for user data
- **Key Responsibilities**:
- Read/write user profile information
- Manage user preferences
- Handle profile search and filtering
- **Interactions**: Consumes user events from queue, calls shared User DB
**API Gateway**
- **Purpose**: Single entry point, routing, authentication enforcement
- **Technology**: Nginx or API Gateway (e.g., Kong)
- **Key Responsibilities**:
- Route requests to appropriate service
- Enforce API authentication
- Rate limiting
- Request/response transformation
- **Interactions**: Routes to Auth and Profile services
### Design Decisions
For each significant decision, document:
#### Decision 1: Microservices vs. Monolith
- **Decision**: Adopt microservices architecture
- **Rationale**:
- Independent scaling needed (auth bottleneck at 5k req/sec)
- Team velocity: Can deploy auth changes independently
- Loose coupling enables faster iteration
- **Alternatives Considered**:
- Monolith optimization: Caching, database optimization (rejected: can't solve scaling bottleneck)
- Modular monolith: Improves structure but doesn't enable independent scaling
- **Impact**:
- Gain: Independent scaling, deployment, team velocity
- Accept: Distributed system complexity, operational overhead, eventual consistency
#### Decision 2: Synchronous vs. Asynchronous Communication
- **Decision**: Use message queue for eventual consistency
- **Rationale**:
- Profile updates don't need to be immediately consistent across auth service
- Reduces coupling: Auth service doesn't wait for profile service
- Improves resilience: Profile service failure doesn't affect auth
- **Alternatives Considered**:
- Synchronous REST calls: Simpler but tight coupling, availability issues
- Event sourcing: Over-engineered for current needs
- **Impact**:
- Gain: Resilience, reduced coupling, independent scaling
- Accept: Eventual consistency, operational complexity (message queue)
### Technology Stack
**Language & Runtime**
- Node.js 18 LTS - Rationale: Existing expertise, good async support
- Express - Lightweight, flexible framework the team knows
**Data Layer**
- PostgreSQL (primary database) - Reliable, ACID transactions for user data
- Redis (cache layer) - Session storage, auth token cache
**Infrastructure**
- Kubernetes for orchestration - Running multiple services at scale
- Docker for containerization - Consistent deployment
**Key Libraries/Frameworks**
- Express (v4.18) - HTTP framework
- jsonwebtoken - JWT token handling
- @aws-sdk - AWS SDK for future integration
- Jest - Testing framework
### Data Model & Storage
**Storage Strategy**
- **Primary Database**: PostgreSQL with user table containing:
- id, email, password_hash, created_at, updated_at
- One-to-many relationship with user_preferences
- **Caching**: Redis stores JWT token metadata and session info with 1-hour TTL
- **Data Retention**: User data retained indefinitely; sessions cleaned up after TTL
**Schema Overview**
```
Users Table:
- id (primary key)
- email (unique index)
- password_hash
- created_at
- updated_at
User Preferences:
- id
- user_id (foreign key)
- key (e.g., theme, language)
- value
```
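The schema overview above translates directly into DDL. A sketch (column types, constraints, and the `DATABASE_URL` variable are assumptions, not settled decisions):
```bash
# Hypothetical DDL for the two tables, fed to psql via a here-doc.
# DATABASE_URL is assumed to point at the PostgreSQL instance.
create_schema() {
  psql "$DATABASE_URL" <<'SQL'
CREATE TABLE users (
  id            BIGSERIAL PRIMARY KEY,
  email         TEXT NOT NULL UNIQUE,
  password_hash TEXT NOT NULL,
  created_at    TIMESTAMPTZ NOT NULL DEFAULT now(),
  updated_at    TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE TABLE user_preferences (
  id      BIGSERIAL PRIMARY KEY,
  user_id BIGINT NOT NULL REFERENCES users(id),
  key     TEXT NOT NULL,
  value   TEXT,
  UNIQUE (user_id, key)
);
SQL
}
```
The composite `UNIQUE (user_id, key)` encodes the one-preference-per-key relationship implied by the overview.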
### API & Integration Points
**External Dependencies**
- Integrates with existing Payment Service for billing
- Consumes events from Billing Service (subscription changes)
- Publishes user events to event bus for downstream services
**Key Endpoints** (reference full API spec):
- POST /auth/login - User login
- POST /auth/logout - User logout
- GET /profile - Fetch user profile
- PUT /profile - Update user profile
(See [API-001] for complete endpoint specifications)
### Trade-offs
**Accepting**
- Operational complexity: Must manage multiple services, deployments, monitoring
- Eventual consistency: Changes propagate through message queue, not immediate
- Debugging complexity: Cross-service issues harder to debug
**Gaining**
- Independent scaling: Auth service can scale without scaling profile service
- Team autonomy: Teams can deploy independently without coordination
- Failure isolation: Auth service failure doesn't take down profile service
- Development velocity: Faster iteration, less blocking
### Implementation
**Approach**: Phased migration - Extract services incrementally without big-bang rewrite
**Phases**:
1. **Phase 1 (Week 1-2)**: Extract Auth Service
- Deliverables: Auth service running in parallel, API Gateway routing auth requests
- Testing: Canary traffic (10%) to new service
2. **Phase 2 (Week 3-4)**: Migrate Auth Traffic
- Deliverables: 100% auth traffic on new service, rollback plan tested
- Verification: Auth latency, error rates compared to baseline
3. **Phase 3 (Week 5-6)**: Extract Profile Service
- Deliverables: Profile service independent, event queue running
- Testing: Data consistency verification across message queue
**Migration Strategy**:
- Run both monolith and microservices in parallel initially
- Use API Gateway to route traffic, allow A/B testing
- Maintain ability to rollback quickly if issues arise
- Monitor closely for latency/error rate increases
(See [PLN-001] for detailed implementation roadmap)
### Performance & Scalability
**Performance Targets**
- **Latency**: Auth service p95 < 100ms, p99 < 200ms
- **Throughput**: Auth service handles 10k requests/second
- **Availability**: 99.9% uptime for auth service
**Scalability Strategy**
- **Scaling Approach**: Horizontal - Add more auth service instances behind load balancer
- **Bottlenecks**: Database connection pool size (limit 100 connections per service instance)
- Mitigation: PgBouncer connection pooling, read replicas for read operations
- **Auto-scaling**: Kubernetes HPA scales auth service from 3 to 20 replicas based on CPU
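The HPA computes the replica count as desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue), clamped to the configured bounds. A sketch of that arithmetic with the 3–20 replica range above (the 70% CPU target is an assumption):
```bash
# Kubernetes HPA scaling formula: desired = ceil(current * currentCPU / targetCPU),
# clamped to the configured min/max replica bounds (3 and 20 here).
hpa_desired_replicas() {
  local current="$1" cpu_pct="$2" target_pct="${3:-70}" min=3 max=20
  awk -v c="$current" -v u="$cpu_pct" -v t="$target_pct" -v lo="$min" -v hi="$max" \
    'BEGIN {
      d = c * u / t
      d = (d == int(d)) ? d : int(d) + 1   # ceil
      if (d < lo) d = lo
      if (d > hi) d = hi
      print d
    }'
}
```
For example, 5 replicas at 140% of target CPU scale to 10; sustained overload clamps at the 20-replica ceiling.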
**Monitoring & Observability**
- **Metrics**: Request latency (p50, p95, p99), error rate, service availability
- **Alerting**: Alert if auth latency p95 > 150ms, error rate > 0.5%
- **Logging**: Structured JSON logs with request ID for tracing across services
### Security
**Authentication**
- JWT tokens issued by Auth Service, validated by API Gateway
- Token expiration: 1 hour, refresh tokens for extended sessions
**Authorization**
- Role-based access control (RBAC) enforced at API Gateway
- Profile service doesn't repeat auth checks (trusts gateway)
**Data Protection**
- **Encryption at Rest**: PostgreSQL database encryption enabled
- **Encryption in Transit**: TLS 1.3 for all service-to-service communication
- **PII Handling**: Passwords hashed with bcrypt (cost factor 12)
**Secrets Management**
- Database credentials stored in Kubernetes secrets
- JWT signing key rotated quarterly
- Environment-based secret injection at runtime
**Compliance**
- GDPR: User data can be exported via profile service
- SOC2: Audit logging enabled for user data access
### Dependencies & Assumptions
**Dependencies**
- PostgreSQL database must be highly available (RTO 1 hour)
- Redis cache can tolerate data loss (non-critical)
- API Gateway (Nginx) must be deployed and operational
- Message queue (RabbitMQ) must be running
**Assumptions**
- Auth service will handle up to 10k requests/second (based on growth projections)
- User data size remains < 100GB (current: 5GB)
- Network latency between services < 10ms (co-located data center)
### Open Questions
- [ ] Should we use gRPC for service-to-service communication instead of REST?
- **Status**: Under investigation - benchmarking against REST
- [ ] How do we handle shared user data updates if both services write to DB?
- **Status**: Deferred to Phase 3 - will use event sourcing pattern
- [ ] Which message queue: RabbitMQ or Kafka?
- **Status**: RabbitMQ chosen, but revisit if we need audit trail of all changes
### Approvals
**Technical Review**
- Lead Backend Engineer - TBD
**Architecture Review**
- VP Engineering - TBD
**Security Review**
- Security Team - TBD
**Approved By**
- TBD
## Writing Tips
### Use Diagrams Effectively
- ASCII art is fine for design docs (easy to version control)
- Show data flow and component interactions
- Label arrows with what data/requests are flowing
### Be Explicit About Trade-offs
- Don't just say "microservices is better"
- Say "We're trading operational complexity for independent scaling because this addresses our 5k req/sec bottleneck"
### Link to Other Specs
- Reference related business requirements: `[BRD-001]`
- Reference technical requirements: `[PRD-001]`
- Reference data models: `[DATA-001]`
- Reference API contracts: `[API-001]`
### Document Rationale
- Each decision needs a "why"
- Explain what alternatives were considered and why they were rejected
- This helps future developers understand the context
### Be Specific About Performance
- Not: "Must be performant"
- Yes: "p95 latency under 100ms, p99 under 200ms, supporting 10k requests/second"
### Consider the Whole System
- Security implications
- Operational/monitoring requirements
- Data consistency model
- Failure modes and recovery
- Future scalability
## Validation & Fixing Issues
### Run the Validator
```bash
scripts/validate-spec.sh docs/specs/design-document/des-001-your-spec.md
```
### Common Issues & Fixes
**Issue**: "Missing Proposed Solution section"
- **Fix**: Add detailed architecture components, design decisions, tech stack
**Issue**: "TODO items in Architecture Components (4 items)"
- **Fix**: Complete descriptions for all components (purpose, technology, responsibilities)
**Issue**: "No Trade-offs documented"
- **Fix**: Explicitly document what you're accepting and what you're gaining
**Issue**: "Missing Performance & Scalability targets"
- **Fix**: Add specific latency, throughput, and availability targets
### Check Completeness
```bash
scripts/check-completeness.sh docs/specs/design-document/des-001-your-spec.md
```
## Decision-Making Framework
As you write the design doc, work through:
1. **Problem**: What are we designing for?
- Specific pain points or constraints?
- Performance targets, scalability requirements?
2. **Options**: What architectural approaches could work?
- Monolith vs. distributed?
- Synchronous vs. asynchronous?
- Technology choices?
3. **Evaluation**: How do options compare?
- Which best addresses the problem?
- What are the trade-offs?
- What does the team have experience with?
4. **Decision**: Which approach wins and why?
- What assumptions must hold?
- What trade-offs are we accepting?
5. **Implementation**: How do we build/migrate to this?
- Big bang or incremental?
- Parallel running period?
- Rollback plan?
## Next Steps
1. **Create the spec**: `scripts/generate-spec.sh design-document des-XXX-slug`
2. **Research**: Find related specs and understand architecture context
3. **Sketch**: Draw architecture diagrams before writing detailed components
4. **Fill in sections** using this guide
5. **Validate**: `scripts/validate-spec.sh docs/specs/design-document/des-XXX-slug.md`
6. **Get architectural review** before implementation begins
7. **Update related specs**: Create or update technical requirements and implementation plans

# How to Create a Flow Schematic Specification
Flow schematics document business processes, workflows, and system flows visually and textually. They show how information moves through systems and how users interact with features.
## Quick Start
```bash
# 1. Create a new flow schematic
scripts/generate-spec.sh flow-schematic flow-001-descriptive-slug
# 2. Open and fill in the file
# (The file will be created at: docs/specs/flow-schematic/flow-001-descriptive-slug.md)
# 3. Add diagram and flow descriptions, then validate:
scripts/validate-spec.sh docs/specs/flow-schematic/flow-001-descriptive-slug.md
# 4. Fix issues and check completeness:
scripts/check-completeness.sh docs/specs/flow-schematic/flow-001-descriptive-slug.md
```
## When to Write a Flow Schematic
Use a Flow Schematic when you need to:
- Document how users interact with a feature
- Show data flow through systems
- Illustrate decision points and branches
- Document error handling paths
- Clarify complex processes
- Enable team alignment on workflow
## Research Phase
### 1. Research Related Specifications
Find what this flow represents:
```bash
# Find business requirements this flow implements
grep -r "brd" docs/specs/ --include="*.md"
# Find design documents that mention this flow
grep -r "design" docs/specs/ --include="*.md"
# Find related components or APIs
grep -r "component\|api" docs/specs/ --include="*.md"
```
### 2. Understand the User/System
- Who are the actors in this flow? (users, systems, services)
- What are they trying to accomplish?
- What information flows between actors?
- Where are the decision points?
- What happens when things go wrong?
### 3. Review Similar Flows
- How are flows documented in your organization?
- What diagramming style is used?
- What level of detail is typical?
- What's been confusing about past flows?
## Structure & Content Guide
### Title & Metadata
- **Title**: "User Export Flow", "Payment Processing Flow", etc.
- **Actor**: Primary user or system
- **Scope**: What does this flow cover?
- **Status**: Draft | Current | Legacy
### Overview Section
```markdown
# User Bulk Export Flow
## Summary
Describes the complete workflow when a user initiates a bulk data export,
including queuing, processing, file storage, and download.
**Primary Actors**: User, Export Service, Database, S3
**Scope**: From export request to download
**Current**: Yes (live in production)
## Key Steps Overview
1. User requests export (website)
2. API queues export job
3. Worker processes export
4. File stored to S3
5. User notified and downloads
```
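The five steps above can be sketched as a pipeline of stub functions, which is a cheap way to agree on status transitions before any real service exists (all names here are hypothetical):
```bash
# Hypothetical end-to-end sketch of the export flow's status transitions.
request_export() { echo "queued"; }      # 1-2: user request, API queues job
process_export() { echo "processing"; }  # 3: worker picks up and runs the job
store_file()     { echo "stored"; }      # 4: file written to S3
notify_user()    { echo "completed"; }   # 5: email sent, download available

run_export_flow() {
  local status step
  for step in request_export process_export store_file notify_user; do
    status=$("$step")
  done
  echo "$status"
}
```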
### Flow Diagram Section
Create a visual representation:
User Export Flow (ASCII art):
```
┌─────────────┐
│ User │
│ (Website) │
└──────┬──────┘
│ 1. Click Export
┌─────────────────────────┐
│ Export API │
│ POST /exports │
├─────────────────────────┤
│ 2. Validate request │
│ 3. Create export record │
│ 4. Queue job │
└────────┬────────────────┘
├─→ 5. Return job_id to user
┌──────────────────────┐
│ Message Queue │
│ (Redis Bull) │
├──────────────────────┤
│ 6. Store export job │
└────────┬─────────────┘
├─→ 7. Worker picks up job
┌──────────────────────────────┐
│ Export Worker │
├──────────────────────────────┤
│ 8. Query user data │
│ 9. Format data (CSV/JSON) │
│ 10. Compress file │
└────────┬─────────────────────┘
├─→ 11. Update job status (processing)
┌──────────────────────────┐
│ AWS S3 │
├──────────────────────────┤
│ 12. Store file │
│ 13. Generate signed URL │
└────────┬─────────────────┘
├─→ 14. Send notification email to user
┌──────────────────────────┐
│ User Email │
├──────────────────────────┤
│ 15. Click download link │
└────────┬─────────────────┘
├─→ 16. Browser requests file from S3
┌──────────────────────────┐
│ File Downloaded │
└──────────────────────────┘
```
### Swimlane Diagram (Alternative Format)
```
User │ Frontend │ Export API │ Message Queue │ Worker │ S3
│ │ │ │ │
1. Clicks │ │ │ │ │
Export ─┼──────────────→│ │ │ │
│ 2. Form Data │ │ │ │
│ │ 3. Validate │ │ │
│ │ 4. Create Job│ │ │
│ │ 5. Queue Job ─┼──────────────→│ │
│ │ │ 6. Job Ready │ │
│ 7. Show Status│ │ │ │
│ (polling) ←┼───────────────│ (update DB) │ │
│ │ │ │ 8. Get Data │
│ │ │ │ 9. Format │
│ │ │ │ 10. Compress │
│ │ │ │ 11. Upload ─┼──→
│ │ │ │ │
│ 12. Email sent│ │ │ │
│←──────────────┼───────────────┼───────────────┤ │
│ │ │ │ │
14. Download │ │ │ │ │
Starts ─┼──────────────→│ │ │ │
│ │ │ │ │
│ │ 15. GET /file ┼───────────────┼──────────────→│
│ │ │ │ 16. Return URL
│ File Downloaded
```
### Step-by-Step Description Section
Document each step in detail:
```markdown
## Detailed Flow Steps
### Phase 1: Export Request
**Step 1: User Initiates Export**
- **Actor**: User
- **Action**: Clicks "Export Data" button on website
- **Input**: Export preferences (format, data types, date range)
- **Output**: Export request form submitted
**Step 2: Frontend Sends Request**
- **Actor**: Frontend/Browser
- **Action**: Submits POST request to /exports endpoint
- **Headers**: Authorization header with JWT token
- **Body**:
```json
{
"format": "csv",
"data_types": ["users", "transactions"],
"date_range": { "start": "2024-01-01", "end": "2024-01-31" }
}
```
**Step 3: API Validates Request**
- **Actor**: Export API
- **Action**: Validate request format and parameters
- **Checks**:
- User authenticated?
- Valid format type?
- Date range valid?
- User not already processing too many exports?
- **Success**: Continue to Step 4
- **Error**: Return 400 Bad Request with error details
**Step 4: Create Export Record**
- **Actor**: Export API
- **Action**: Store export metadata in database
- **Data Stored**:
```sql
INSERT INTO exports (
id, user_id, format, data_types, status,
created_at, updated_at
) VALUES (...)
```
- **Status**: `queued`
- **Response**: Return 201 with export_id
### Phase 2: Job Processing
**Step 5: Queue Export Job**
- **Actor**: Export API
- **Action**: Add job to Redis queue
- **Job Format**:
```json
{
"export_id": "exp_123456",
"user_id": "usr_789012",
"format": "csv",
"data_types": ["users", "transactions"]
}
```
- **Queue**: Bull job queue in Redis
- **TTL**: Job removed after 7 days
**Step 6: Return to User**
- **Actor**: Export API
- **Action**: Send response to frontend
- **Response**:
```json
{
"id": "exp_123456",
"status": "queued",
"created_at": "2024-01-15T10:00:00Z",
"estimated_completion": "2024-01-15T10:05:00Z"
}
```
### Phase 3: Data Export
**Step 7: Worker Picks Up Job**
- **Actor**: Export Worker
- **Action**: Poll Redis queue for jobs
- **Condition**: Worker checks every 100ms
- **Process**: Dequeues oldest job, marks as processing
- **Status Update**: Export marked as `processing` in database
**Step 8-10: Process Export**
- **Actor**: Export Worker
- **Actions**:
1. Query user data from database (user table, transaction table)
2. Validate and transform data to requested format
3. Write to temporary file on worker disk
4. Compress file with gzip
- **Error Handling**: If fails, retry up to 3 times with backoff
**Step 11: Upload to S3**
- **Actor**: Export Worker
- **Action**: Upload compressed file to S3
- **Filename**: `exports/exp_123456.csv.gz`
- **ACL**: Private (only accessible via signed URL)
- **Success**: Update export status to `completed` in database
### Phase 4: Notification & Download
**Step 12: Send Notification**
- **Actor**: Notification Service (triggered by export completion event)
- **Action**: Send email to user
- **Email Content**: "Your export is ready! [Click here to download]"
- **Link**: Includes signed URL (valid for 7 days)
**Step 13: User Receives Email**
- **Actor**: User
- **Action**: Receives email notification
- **Next**: Clicks download link
**Step 14-16: Download File**
- **Actor**: User browser
- **Action**: Follows download link
- **Request**: GET /exports/exp_123456/download
- **Response**: Browser initiates file download
- **File**: exp_123456.csv.gz is saved to user's computer
```
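The validation checks in Step 3 can be sketched as a small function. This is an illustrative sketch only: the field names mirror the request body above, but `MAX_ACTIVE_EXPORTS`, the error strings, and the return shape are assumptions, not the actual API implementation.

```javascript
// Hypothetical sketch of the Step 3 checks; limits and messages are assumed.
const SUPPORTED_FORMATS = new Set(["csv", "json"]);
const MAX_ACTIVE_EXPORTS = 5; // assumption: the spec only says "too many exports"

function validateExportRequest(req, activeExportCount) {
  const errors = [];
  if (!SUPPORTED_FORMATS.has(req.format)) {
    errors.push("Invalid format. Supported: csv, json");
  }
  const start = Date.parse(req.date_range?.start);
  const end = Date.parse(req.date_range?.end);
  if (Number.isNaN(start) || Number.isNaN(end) || start > end) {
    errors.push("Invalid date range");
  }
  if (activeExportCount >= MAX_ACTIVE_EXPORTS) {
    errors.push("Too many exports in progress");
  }
  // Success continues to Step 4; errors map to a 400 Bad Request response
  return errors.length === 0 ? { ok: true } : { ok: false, status: 400, errors };
}
```

A caller would run this before creating the export record, returning the collected errors in the 400 response body.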
### Decision Points Section
Document branching logic:
```markdown
## Decision Points
### Decision 1: Export Format Validation
**Question**: Is the requested export format supported?
**Options**:
- ✓ CSV: Continue to data export (Step 8)
- ✓ JSON: Continue to data export (Step 8)
- ✗ Other format: Return 400 error, user selects different format
### Decision 2: User Data Available?
**Question**: Can we successfully query user data?
**Options**:
- ✓ Yes: Continue with data transformation (Step 9)
- ✗ Database error: Retry job (up to 3 times)
- ✗ User data deleted: Return "no data" message to user
### Decision 3: File Size Check
**Question**: Is the export file within size limits?
**Options**:
- ✓ < 500MB: Proceed to upload (Step 11)
- ✗ > 500MB: Return error "export too large", offer data filtering options
### Decision 4: Export Status Check (User Polling)
**Question**: Has export job completed?
**Polling**: Frontend polls GET /exports/{id} every 5 seconds
**Options**:
- `queued`: Show "Waiting to process..."
- `processing`: Show "Processing... (40%)"
- `completed`: Show download link
- `failed`: Show error message, offer retry option
- `cancelled`: Show "Export was cancelled"
```
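Decision 4's status-to-message mapping can be expressed as a simple lookup on the frontend. The wording for `completed` and the unknown-status fallback are assumptions here; the spec example only describes the UI behavior for those states.

```javascript
// Illustrative mapping from export status to the UI messages in Decision 4.
const STATUS_MESSAGES = {
  queued: "Waiting to process...",
  processing: (pct) => `Processing... (${pct}%)`,
  completed: "Download ready", // assumption: spec says "show download link"
  failed: "Export failed. Retry?",
  cancelled: "Export was cancelled",
};

function renderStatus(status, progressPct = 0) {
  const entry = STATUS_MESSAGES[status];
  if (entry === undefined) return "Unknown status";
  return typeof entry === "function" ? entry(progressPct) : entry;
}
```

The polling loop would call this every 5 seconds with the latest `GET /exports/{id}` response.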
### Error Handling Section
```markdown
## Error Handling & Recovery
### Error 1: Invalid Request Format
**Trigger**: User submits invalid format parameter
**Response Code**: 400 Bad Request
**Message**: "Invalid format. Supported: csv, json"
**Recovery**: User submits corrected request
### Error 2: Database Connection Lost During Export
**Trigger**: Worker loses connection to database while querying data
**Response Code**: (internal, no response to user)
**Recovery**: Job retried automatically (backoff: 1s, 2s, 4s)
**Max Retries**: 3 times
**If Fails After Retries**: Export marked as `failed`, user notified
### Error 3: S3 Upload Failure
**Trigger**: S3 returns 500 error
**Recovery**: Retry with exponential backoff
**Fallback**: If retries exhausted, store to local backup, retry next hour
**User Impact**: Export shows "delayed", user can check status later
### Error 4: File Too Large
**Trigger**: Export file exceeds 500MB limit
**Response Code**: 413 Payload Too Large
**Message**: "Export data exceeds 500MB. Use date filtering to reduce size."
**Recovery**: User modifies date range and resubmits
### Timeout Handling
**Job Timeout**: If export takes > 5 minutes, job is killed
**User Notification**: "Export processing took too long. Please try again."
**Logs**: Timeout recorded for analysis
**Recovery**: User can request again (usually succeeds second time)
```
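The retry behavior described in Errors 2 and 3 (exponential backoff of 1s, 2s, 4s, then mark as `failed`) can be sketched generically. `runWithRetry` is a hypothetical helper, not the actual worker code; the base delay is parameterized so the sketch is testable without real waits.

```javascript
// Sketch of retry with exponential backoff: 1s, 2s, 4s, max 3 retries.
async function runWithRetry(job, { maxRetries = 3, baseDelayMs = 1000 } = {}) {
  let attempt = 0;
  for (;;) {
    try {
      return await job();
    } catch (err) {
      if (attempt >= maxRetries) throw err; // caller marks export as `failed`
      const delay = baseDelayMs * 2 ** attempt; // 1s, 2s, 4s progression
      attempt += 1;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

A transient database error would be retried transparently; only after the final attempt does the error propagate and trigger the failure notification.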
### Async/Event Section
Document asynchronous aspects:
```markdown
## Asynchronous Operations
### Event: Export Created
**Trigger**: POST /exports returns 201
**Event Published**: `export.created`
**Subscribers**: Analytics service (tracks export requests)
**Payload**:
```json
{
"export_id": "exp_123456",
"user_id": "usr_789012",
"format": "csv",
"timestamp": "2024-01-15T10:00:00Z"
}
```
### Event: Export Completed
**Trigger**: Worker successfully uploads to S3
**Event Published**: `export.completed`
**Subscribers**:
- Notification service (send email)
- Analytics service (track completion)
**Payload**:
```json
{
"export_id": "exp_123456",
"file_size_bytes": 2048576,
"processing_time_ms": 312000,
"timestamp": "2024-01-15T10:05:12Z"
}
```
### Event: Export Failed
**Trigger**: Job fails after max retries
**Event Published**: `export.failed`
**Subscribers**: Notification service (alert user)
**Payload**:
```json
{
"export_id": "exp_123456",
"error_code": "database_timeout",
"error_message": "Connection timeout after 3 retries",
"timestamp": "2024-01-15T10:06:00Z"
}
```
```
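The `export.*` events above follow a publish/subscribe pattern, sketched minimally below. A real deployment would use a message broker; the in-memory `EventBus` and its handler are purely illustrative.

```javascript
// Minimal in-memory publish/subscribe sketch for the export.* events.
class EventBus {
  constructor() {
    this.subscribers = new Map(); // topic -> array of handlers
  }
  subscribe(topic, handler) {
    if (!this.subscribers.has(topic)) this.subscribers.set(topic, []);
    this.subscribers.get(topic).push(handler);
  }
  publish(topic, payload) {
    for (const handler of this.subscribers.get(topic) ?? []) handler(payload);
  }
}

const bus = new EventBus();
// e.g. the notification service subscribing to completions
bus.subscribe("export.completed", (evt) => {
  console.log(`notify user: export ${evt.export_id} ready`);
});
bus.publish("export.completed", { export_id: "exp_123456", file_size_bytes: 2048576 });
// prints: notify user: export exp_123456 ready
```

Each subscriber (notification, analytics) registers independently, which is what lets the worker stay unaware of who consumes its events.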
### Performance & Timing Section
```markdown
## Performance Characteristics
### Typical Timings
- Request submission → queued: < 100ms
- Queued → processing starts: < 30 seconds (depends on queue load)
- Processing time:
- Small dataset (< 10MB): 1-2 minutes
- Medium dataset (10-100MB): 2-5 minutes
- Large dataset (100-500MB): 5-10 minutes
- Upload to S3: 30 seconds to 2 minutes
### Total End-to-End Time
- Average: 5-10 minutes from request to download ready
- Best case: 3-5 minutes (empty queue, small dataset)
- Worst case: 15+ minutes (high load, large dataset)
### Scaling Behavior
- 1 worker: Processes 1 export at a time
- 3 workers: Process 3 exports in parallel
- 10 workers: Can handle 10 concurrent exports
- Queue depth auto-scales workers up to 20 pods
```
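The timing bands above can be turned into a rough estimator for user-facing progress messages. The returned midpoints are assumptions derived from this section's upper bounds, not measured values.

```javascript
// Rough processing-time estimator based on the typical timings above.
function estimateProcessingMinutes(datasetMb) {
  if (datasetMb < 10) return 2;    // small dataset: 1-2 minutes
  if (datasetMb <= 100) return 5;  // medium dataset: 2-5 minutes
  if (datasetMb <= 500) return 10; // large dataset: 5-10 minutes
  return null; // over the 500MB limit: export is rejected (Decision 3)
}
```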
## Writing Tips
### Use Clear Diagrams
- ASCII art is fine and diffs cleanly under version control
- Show all actors and their interactions
- Label arrows with what's being transmitted
- Use swimlanes for multiple actors
### Be Specific About Data
- Show actual request/response formats
- Include field names and types
- Show error responses with codes
- Document data transformations
### Cover the Happy Path AND Error Paths
- What happens when everything works?
- What happens when things go wrong?
- What are the recovery mechanisms?
- Can users recover?
### Think About Timing
- What happens asynchronously?
- Where are synchronous waits?
- What are typical timings?
- Where are bottlenecks?
### Link to Related Specs
- Reference design documents: `[DES-001]`
- Reference API contracts: `[API-001]`
- Reference component specs: `[CMP-001]`
- Reference data models: `[DATA-001]`
## Validation & Fixing Issues
### Run the Validator
```bash
scripts/validate-spec.sh docs/specs/flow-schematic/flow-001-your-spec.md
```
### Common Issues & Fixes
**Issue**: "Flow diagram incomplete or missing"
- **Fix**: Add ASCII diagram or swimlane showing all steps
**Issue**: "Step descriptions lack detail"
- **Fix**: Add what happens, who's involved, input/output for each step
**Issue**: "No error handling documented"
- **Fix**: Document error cases and recovery mechanisms
**Issue**: "Async operations not clearly shown"
- **Fix**: Highlight asynchronous steps and show event flows
## Decision-Making Framework
When documenting a flow:
1. **Scope**: What does this flow cover?
- Where does it start/end?
- What's in scope vs. out?
2. **Actors**: Who/what are the main actors?
- Users, systems, services?
- External dependencies?
3. **Happy Path**: What's the ideal flow?
- Step-by-step happy path
- Minimal branching
4. **Edge Cases**: What can go wrong?
- Error scenarios
- Recovery mechanisms
- User impact
5. **Timing**: What's the performance profile?
- Synchronous waits?
- Asynchronous operations?
- Expected timings?
## Next Steps
1. **Create the spec**: `scripts/generate-spec.sh flow-schematic flow-XXX-slug`
2. **Research**: Find related specs and understand context
3. **Sketch diagram**: Draw initial flow with all actors
4. **Document steps**: Write detailed description for each step
5. **Add error handling**: Document failure scenarios
6. **Validate**: `scripts/validate-spec.sh docs/specs/flow-schematic/flow-XXX-slug.md`
7. **Get feedback** from team to refine flow

# How to Create a Milestone Specification
Milestone specifications define specific delivery targets within a project, including deliverables, success criteria, and timeline. They're checkpoints to verify progress.
## Quick Start
```bash
# 1. Create a new milestone
scripts/generate-spec.sh milestone mls-001-descriptive-slug
# 2. Open and fill in the file
# (The file will be created at: docs/specs/milestone/mls-001-descriptive-slug.md)
# 3. Fill in deliverables and criteria, then validate:
scripts/validate-spec.sh docs/specs/milestone/mls-001-descriptive-slug.md
# 4. Fix issues and check completeness:
scripts/check-completeness.sh docs/specs/milestone/mls-001-descriptive-slug.md
```
## When to Write a Milestone
Use a Milestone Spec when you need to:
- Define specific delivery checkpoints
- Communicate to stakeholders what's shipping when
- Track progress against concrete deliverables
- Set success criteria before building
- Manage dependencies between teams
- Celebrate progress and team achievements
## Research Phase
### 1. Research Related Specifications
Find the context for this milestone:
```bash
# Find the plan this milestone belongs to
grep -r "plan" docs/specs/ --include="*.md"
# Find related requirements and specs
grep -r "brd\|prd\|design" docs/specs/ --include="*.md"
```
### 2. Understand the Broader Plan
- What larger project is this part of?
- What comes before and after this milestone?
- What dependencies exist with other teams?
- What are the overall project goals?
### 3. Review Similar Milestones
- How were past milestones structured?
- What deliverables were tracked?
- How were success criteria defined?
- What worked and what didn't?
## Structure & Content Guide
### Title & Metadata
- **Title**: "Phase 1: Infrastructure Ready", "Beta Launch", etc.
- **Date**: Target completion date
- **Owner**: Team or person responsible
- **Status**: Planned | In Progress | Completed | At Risk
### Milestone Summary
```markdown
# Phase 1: Export Infrastructure Ready
**Target Date**: January 28, 2024
**Owner**: Backend Engineering Team
**Status**: In Progress
## Summary
Delivery of fully operational job queue infrastructure and worker processes
supporting the bulk export feature. Team demonstrates system can reliably
process 10+ jobs per second with monitoring and alerting in place.
```
### Deliverables Section
List what will be delivered:
```markdown
## Deliverables
### 1. Redis Job Queue (Production-Ready)
**Description**: Managed Redis cluster configured for job queuing
**Acceptance Criteria**:
- [ ] AWS ElastiCache Redis cluster deployed to staging
- [ ] Cluster sized for 10k requests/second capacity
- [ ] Backup and failover configured
- [ ] Monitoring and alerts in place
**Owner**: Infrastructure Team
**Status**: In Progress
### 2. Bull Job Queue Worker
**Description**: Node.js Bull queue implementation with workers
**Acceptance Criteria**:
- [ ] Bull queue initialized and processing jobs
- [ ] Worker processes handle 10+ jobs/second
- [ ] Graceful shutdown implemented
- [ ] Error handling and retry logic working
- [ ] Unit tests cover all worker functions
**Owner**: Backend Engineer (Alice)
**Status**: In Review (code in feature branch)
### 3. Kubernetes Deployment Manifests
**Description**: K8s manifests for deploying queue workers
**Acceptance Criteria**:
- [ ] Deployment manifest supports 1-10 replicas
- [ ] Health checks configured (liveness, readiness)
- [ ] Resource requests/limits defined
- [ ] Secrets management for Redis credentials
- [ ] Successfully deploys to staging cluster
**Owner**: DevOps Engineer (Bob)
**Status**: Ready for review
### 4. Prometheus Metrics Integration
**Description**: Export metrics for job queue depth, worker status
**Acceptance Criteria**:
- [ ] Metrics scrape successfully every 15 seconds
- [ ] Dashboard shows queue depth over time
- [ ] Queue saturation alerts configured
- [ ] Grafana dashboard created for monitoring
**Owner**: Backend Engineer (Alice)
**Status**: In Progress
### 5. Documentation & Runbook
**Description**: Queue architecture docs and operational runbook
**Acceptance Criteria**:
- [ ] Architecture diagram showing queues and workers
- [ ] Configuration guide for different environments
- [ ] Runbook for common operations (scaling, debugging)
- [ ] Troubleshooting guide for common issues
**Owner**: Tech Lead (Charlie)
**Status**: Planned (starts after technical setup)
## Deliverables Summary
| Deliverable | Status | Owner | Target |
|------------|--------|-------|--------|
| Redis Cluster | In Progress | Infra | Jan 20 |
| Bull Worker | In Progress | Alice | Jan 22 |
| K8s Manifests | In Progress | Bob | Jan 22 |
| Prometheus Metrics | In Progress | Alice | Jan 25 |
| Documentation | Planned | Charlie | Jan 28 |
```
### Success Criteria Section
Define what "done" means:
```markdown
## Success Criteria
### Technical Criteria (Must Pass)
- [ ] Job queue processes 100 jobs without errors
- [ ] Queue handles 10+ jobs/second sustained throughput
- [ ] Workers scale horizontally (add/remove replicas without data loss)
- [ ] Failed jobs retry with exponential backoff
- [ ] All health checks pass in staging environment
### Operational Criteria (Must Have)
- [ ] Prometheus metrics visible in Grafana dashboard
- [ ] Alerts fire correctly when queue depth exceeds threshold
- [ ] Monitoring documentation complete and understood by ops team
- [ ] Runbook covers: scaling, debugging, troubleshooting
### Quality Criteria (Must Meet)
- [ ] Code reviewed and approved by 2+ senior engineers
- [ ] Unit tests pass with 90%+ coverage
- [ ] Integration tests verify queue → worker → completion flow
- [ ] Load tests verify performance targets
- [ ] Security audit passed (no exposed credentials)
### Documentation Criteria (Must Have)
- [ ] Architecture documented with diagrams
- [ ] Configuration guide for different environments
- [ ] Troubleshooting guide covers common issues
- [ ] Operations team trained and confident in operations
## Sign-Off Criteria
Milestone is "done" when:
1. All deliverables accepted and deployed to staging
2. All technical criteria pass
3. Tech lead, product owner, and operations lead approve
4. Documentation reviewed and accepted
```
### Timeline & Dependencies Section
```markdown
## Timeline & Dependencies
### Critical Path
```
Start → Redis Setup → Bull Implementation → Testing → Documentation → Done
(Jan 15) (3 days) (4 days) (3 days) (2 days) (Jan 28)
```
### Phase Dependencies
- **Blocking this milestone**: None (can start immediately)
- **This milestone blocks**: Phase 2 (Export Service Development)
- **If delayed**: Phase 2 starts after this completes
- **Contingency**: Have spare capacity in next phase for any slippage
### Team Capacity
| Person | Allocation | Weeks | Notes |
|--------|-----------|-------|-------|
| Alice (Backend) | 100% | 2 | Queue + metrics |
| Bob (DevOps) | 100% | 1.5 | Infrastructure |
| Charlie (Lead) | 50% | 1.5 | Review + docs |
### Risks & Mitigation
| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|-----------|
| Redis provisioning delayed | Medium | High | Use managed service, start request early |
| Performance targets not met | Low | High | Load test early, optimize if needed |
| Team member unavailable | Low | Medium | Cross-train backup person |
| Documentation delayed | Low | Low | Defer non-critical docs to next phase |
```
### Blockers & Issues Section
Track what could prevent delivery:
```markdown
## Current Blockers
### 1. AWS Infrastructure Approval (High Priority)
- **Issue**: Redis cluster requires infrastructure approval
- **Impact**: Blocks infrastructure setup (3-5 day delay if not approved)
- **Owner**: Infrastructure Lead
- **Action**: Sent approval request on Jan 10; following up Jan 12 if not yet approved
- **Target Resolution**: Jan 12
### 2. Node.js Bull Documentation Gap (Low Priority)
- **Issue**: Team unfamiliar with Bull library job prioritization
- **Impact**: Might need extra time for implementation
- **Owner**: Alice
- **Action**: Schedule Bull library workshop on Jan 16
- **Target Resolution**: Jan 16
## Dependencies Waiting
- AWS ElastiCache cluster approval (Infrastructure)
- IAM roles and security groups (Security team)
```
### Acceptance & Testing Section
```markdown
## Acceptance Procedures
### Manual Testing Checklist
- [ ] Queue accepts jobs from client
- [ ] Worker processes jobs without errors
- [ ] Queue depth monitoring works in Grafana
- [ ] Scaling up adds workers, scaling down removes them gracefully
- [ ] Failed job retry works with exponential backoff
- [ ] Restart worker and verify no jobs are lost
### Performance Testing
- [ ] Load test with 100 concurrent jobs
- [ ] Verify throughput ≥ 10 jobs/second
- [ ] Monitor memory and CPU during load test
- [ ] Document baseline metrics for future comparison
### Security Testing
- [ ] Credentials not exposed in logs or metrics
- [ ] Redis connection uses TLS
- [ ] Worker process runs with minimal permissions
### Sign-Off Process
1. Engineering team completes manual testing
2. Tech lead verifies all acceptance criteria pass
3. Operations team reviews runbook and documentation
4. Product owner confirms milestone meets business requirements
5. All sign-off: tech lead, ops lead, product owner
```
### Rollback Plan Section
```markdown
## Rollback Plan
If this milestone fails or has critical issues:
### Rollback Steps
1. Revert worker deployment: `kubectl rollout undo`
2. Keep Redis cluster (non-breaking)
3. Disable alerts that reference new queue
4. Run post-mortem to understand failure
### Communication
- Notify stakeholders if deadline at risk
- Update project plan and re-estimate Phase 2
- Communicate revised timeline to customers
### Root Cause Analysis
- Conduct post-mortem within 2 days
- Document lessons learned
- Update processes/checklists to prevent recurrence
```
### Stakeholder Communication Section
```markdown
## Stakeholder Communication
### Who Needs to Know About This Milestone?
- **Engineering**: Build against completed infrastructure
- **Product**: Planning feature launch timeline
- **Operations**: Preparing to support new system
- **Executives**: Tracking project progress
- **Customers**: Waiting for export feature
### Communication Plan
| Stakeholder | Update Frequency | Content |
|-------------|-----------------|---------|
| Engineering Team | Daily standup | Progress, blockers |
| Tech Lead | 3x/week | Risk assessment, decisions |
| Product Owner | Weekly | Status, timeline impact |
| Ops Team | Twice/week | Operational readiness |
| Executives | On completion | Milestone achieved, next steps |
### Status Updates
**Current Status**: 60% complete (Jan 22)
- Redis setup: Complete
- Bull worker: Mostly done, 2 days of testing remaining
- K8s manifests: In review
- Metrics: Underway
- Documentation: Not yet started
**Next Update**: Jan 25 (on track for Jan 28 completion)
**Confidence Level**: High (85%) - minor risks, good progress
```
## Writing Tips
### Be Specific About Deliverables
- What exactly is being delivered?
- How will you verify it's done?
- Who owns each deliverable?
- What's the definition of "done"?
### Define Success Clearly
- Success criteria should be objective and testable
- Mix technical, operational, and quality criteria
- Include both must-haves and nice-to-haves
- Get stakeholder agreement on criteria upfront
### Think About the Bigger Picture
- How does this milestone fit into the overall project?
- What depends on this milestone?
- What changes if this milestone is delayed?
- What's the contingency plan?
### Track Progress
- Update the milestone spec regularly (weekly)
- Note what's actually happening vs. plan
- Identify and communicate risks early
- Celebrate when milestone completes!
### Link to Related Specs
- Reference the overall plan: `[PLN-001]`
- Reference related milestones: `[MLS-002]`
- Reference technical specs: `[CMP-001]`, `[API-001]`
## Validation & Fixing Issues
### Run the Validator
```bash
scripts/validate-spec.sh docs/specs/milestone/mls-001-your-spec.md
```
### Common Issues & Fixes
**Issue**: "Deliverables lack acceptance criteria"
- **Fix**: Add specific, testable criteria for each deliverable
**Issue**: "No success criteria defined"
- **Fix**: Document technical, operational, and quality criteria
**Issue**: "Owner/responsibilities not assigned"
- **Fix**: Assign each deliverable to a specific person or team
**Issue**: "Rollback plan missing"
- **Fix**: Document how you'd handle failure or critical issues
## Decision-Making Framework
When defining a milestone:
1. **Scope**: What should be in this milestone?
- Shippable chunk?
- Dependencies resolved?
- Tests passing?
2. **Success**: How will we know this is done?
- Objective criteria?
- Stakeholder agreement?
- Testable outcomes?
3. **Schedule**: When is this realistically achievable?
- Team capacity?
- Dependency timelines?
- Buffer for unknowns?
4. **Risks**: What could prevent delivery?
- Technical unknowns?
- Resource constraints?
- External dependencies?
5. **Communication**: Who needs to know about this?
- Stakeholder updates?
- Sign-off process?
- Celebration when done?
## Next Steps
1. **Create the spec**: `scripts/generate-spec.sh milestone mls-XXX-slug`
2. **Research**: Find the plan and related specs
3. **Define deliverables** with clear owners
4. **Set success criteria** that are testable
5. **Identify risks** and mitigation strategies
6. **Validate**: `scripts/validate-spec.sh docs/specs/milestone/mls-XXX-slug.md`
7. **Get stakeholder alignment** before kickoff
8. **Update regularly** to track progress

# How to Create a Plan Specification
Plan specifications document implementation roadmaps, project timelines, phases, and deliverables. They provide the "how and when" we'll build something.
## Quick Start
```bash
# 1. Create a new plan
scripts/generate-spec.sh plan pln-001-descriptive-slug
# 2. Open and fill in the file
# (The file will be created at: docs/specs/plan/pln-001-descriptive-slug.md)
# 3. Fill in phases and deliverables, then validate:
scripts/validate-spec.sh docs/specs/plan/pln-001-descriptive-slug.md
# 4. Fix issues and check completeness:
scripts/check-completeness.sh docs/specs/plan/pln-001-descriptive-slug.md
```
## When to Write a Plan
Use a Plan Spec when you need to:
- Document project phases and timeline
- Define deliverables and milestones
- Identify dependencies and blockers
- Track progress against plan
- Communicate timeline to stakeholders
- Align team on implementation sequence
## Research Phase
### 1. Research Related Specifications
Find what you're planning:
```bash
# Find business requirements
grep -r "brd" docs/specs/ --include="*.md"
# Find technical requirements and design docs
grep -r "prd\|design" docs/specs/ --include="*.md"
# Find existing plans that might be related
grep -r "plan" docs/specs/ --include="*.md"
```
### 2. Understand the Scope
- What are you building?
- What are the business priorities?
- What are the technical dependencies?
- What constraints exist (timeline, resources)?
### 3. Review Similar Projects
- How long did similar projects take?
- What teams are involved?
- What risks arose and how were they managed?
- What was the actual vs. planned timeline?
## Structure & Content Guide
### Title & Metadata
- **Title**: "Export Feature Implementation Plan", "Q1 2024 Roadmap", etc.
- **Timeline**: Start date through completion
- **Owner**: Project lead or team lead
- **Status**: Planning | In Progress | Completed
### Overview Section
```markdown
# Export Feature Implementation Plan
## Summary
Plan to implement bulk user data export feature across 8 weeks.
Includes job queue infrastructure, API endpoints, UI, and deployment.
**Timeline**: January 15 - March 10, 2024 (8 weeks)
**Team Size**: 3-4 engineers, 1 product manager
**Owner**: Engineering Lead
**Status**: Planning
## Key Objectives
1. Enable enterprise customers to bulk export their data
2. Build reusable async job processing infrastructure
3. Achieve 99% reliability for export system
4. Complete testing and documentation before production launch
```
### Phases Section
Document phases with timing:
```markdown
## Implementation Phases
### Phase 1: Infrastructure Setup (Weeks 1-2)
**Timeline**: Jan 15 - Jan 28 (2 weeks)
**Team**: 2 engineers
**Goals**
- Implement job queue infrastructure (Redis + Bull)
- Build worker process foundation
- Deploy to staging environment
**Deliverables**
- Redis job queue with worker processes
- Monitoring and alerting setup
- Health checks and graceful shutdown
- Documentation of queue architecture
**Tasks**
- [ ] Set up Redis cluster (managed service)
- [ ] Implement Bull queue with worker processors
- [ ] Add Prometheus metrics for queue depth
- [ ] Configure Kubernetes deployment manifests
- [ ] Create staging deployment
- [ ] Document queue architecture
**Dependencies**
- None (can start immediately)
**Success Criteria**
- Job queue processes 10 jobs/second without errors
- Workers can be scaled horizontally in Kubernetes
- Monitoring shows queue depth and worker status
```
```markdown
### Phase 2: Export Service Development (Weeks 3-5)
**Timeline**: Jan 29 - Feb 18 (3 weeks)
**Team**: 2-3 engineers
**Goals**
- Implement export processing logic
- Add database export functionality
- Support multiple export formats
**Deliverables**
- Export service that processes jobs from queue
- CSV and JSON export format support
- Data validation and error handling
- Comprehensive unit tests
**Tasks**
- [ ] Implement data query and export logic
- [ ] Add CSV formatter
- [ ] Add JSON formatter
- [ ] Implement error handling and retries
- [ ] Add data compression
- [ ] Write unit tests (target: 90%+ coverage)
- [ ] Performance test with 100MB+ files
**Dependencies**
- Phase 1 complete (queue infrastructure)
- Data model spec finalized ([DATA-001])
**Success Criteria**
- Export processes complete in < 5 minutes for 100MB files
- All data exports match source data exactly
- Error retry logic works correctly
- 90%+ test coverage
```
```markdown
### Phase 3: API & Storage (Weeks 4-6)
**Timeline**: Feb 5 - Feb 25 (3 weeks, 2 weeks overlap with Phase 2)
**Team**: 2 engineers
**Goals**
- Implement REST API for export management
- Set up S3 storage for export files
- Build export status tracking
**Deliverables**
- REST API endpoints (create, get status, download)
- S3 integration for file storage
- Export metadata storage in database
- API documentation
**Tasks**
- [ ] Implement POST /exports endpoint
- [ ] Implement GET /exports/{id} endpoint
- [ ] Implement GET /exports/{id}/download endpoint
- [ ] Add S3 integration for storage
- [ ] Create export metadata schema
- [ ] Implement TTL-based cleanup
- [ ] Add API rate limiting
- [ ] Create API documentation
**Dependencies**
- Phase 1 complete (queue infrastructure)
- Phase 2 progress (service processing)
- Data model spec finalized ([DATA-001])
**Success Criteria**
- API responds to requests in < 100ms (p95)
- Files stored and retrieved from S3 correctly
- Cleanup removes files after 7-day TTL
```
```markdown
### Phase 4: Testing & Optimization (Weeks 6-7)
**Timeline**: Feb 19 - Mar 3 (2 weeks, 1 week overlap with Phase 3)
**Team**: 2-3 engineers
**Goals**
- Comprehensive testing across all components
- Performance optimization
- Security audit
**Deliverables**
- Integration tests for full export flow
- Load tests verifying performance targets
- Security audit report
- Performance tuning applied
**Tasks**
- [ ] Write integration tests for full export flow
- [ ] Load test with 100 concurrent exports
- [ ] Security audit of data handling
- [ ] Performance profiling and optimization
- [ ] Test large file handling (500MB+)
- [ ] Test error scenarios and retries
- [ ] Document known limitations
**Dependencies**
- Phase 2, 3 complete (service and API)
**Success Criteria**
- 95%+ automated test coverage
- Load tests show < 500ms p95 latency
- Security audit finds no critical issues
- Performance meets targets (< 5 min for 100MB)
```
```markdown
### Phase 5: Documentation & Launch (Weeks 7-8)
**Timeline**: Feb 26 - Mar 10 (2 weeks, 1 week overlap with Phase 4)
**Team**: Full team (all 4)
**Goals**
- Complete documentation
- Customer communication
- Production deployment
**Deliverables**
- API documentation (for customers)
- Runbook for operations team
- Customer launch announcement
- Production deployment checklist
**Tasks**
- [ ] Create customer-facing API docs
- [ ] Create operational runbook
- [ ] Write troubleshooting guide
- [ ] Create launch announcement
- [ ] Train support team
- [ ] Deploy to production
- [ ] Monitor for issues
- [ ] Collect initial feedback
**Dependencies**
- All prior phases complete
- Security audit passed
**Success Criteria**
- Documentation is complete and clear
- Support team can operate system independently
- Launch goes smoothly with no incidents
- Users successfully export data
```
### Dependencies & Blocking Section
```markdown
## Dependencies & Blockers
### External Dependencies
- **Data Model Spec ([DATA-001])**: Required to understand data structure
- Status: Draft
- Timeline: Must be approved by Jan 20
- Owner: Data team
- **API Contract Spec ([API-001])**: API design must be finalized
- Status: In Review
- Timeline: Must be approved by Feb 5
- Owner: Product team
- **Infrastructure Resources**: Need S3 bucket and Redis cluster
- Status: Requested
- Timeline: Must be available by Jan 15
- Owner: Infrastructure team
### Internal Dependencies
- **Phase 1 → Phase 2**: Queue infrastructure must be stable
- **Phase 2 → Phase 3**: Service must process exports correctly
- **Phase 3 → Phase 4**: API and storage must be working
- **Phase 4 → Phase 5**: All testing must pass
### Known Blockers
- Infrastructure team is currently overloaded
- Mitigation: Request resources early, use managed services
- Data privacy review needed for export functionality
- Mitigation: Schedule review meeting in first week
```
### Timeline & Gantt Chart Section
````markdown
## Timeline
```
Phase 1: Infrastructure (2 wks)   [====================]
Phase 2: Export Service (3 wks)   ___[============================]
Phase 3: API & Storage (3 wks)    _______[============================]
Phase 4: Testing (2 wks)          ______________[====================]
Phase 5: Launch (2 wks)           __________________[====================]
Week:    1   2   3   4   5   6   7   8
         |_|_|_|_|_|_|_|_|
```
### Key Milestones
| Milestone | Target Date | Owner | Deliverable |
|-----------|------------|-------|-------------|
| Queue Infrastructure Ready | Jan 28 | Eng Lead | Staging deployment |
| Export Processing Works | Feb 18 | Eng Lead | Service passes tests |
| API Complete & Working | Feb 25 | Eng Lead | API docs + endpoints |
| Testing Complete | Mar 3 | QA Lead | Test report |
| Production Launch | Mar 10 | Eng Lead | Live feature |
````
### Resource & Team Section
```markdown
## Resources
### Team Composition
**Engineering Team**
- 2 Backend Engineers (Weeks 1-8): Infrastructure, export service, API
- 1 Backend Engineer (Weeks 4-8): Testing, optimization
- Optional: 1 Frontend Engineer (Weeks 7-8): Documentation, demos
**Support & Operations**
- Product Manager (all weeks): Requirements, prioritization
- QA Lead (Weeks 4-8): Testing coordination
### Skills Required
- Backend development (Node.js, PostgreSQL)
- Infrastructure/DevOps (Kubernetes, AWS)
- Performance testing and optimization
- Security best practices
### Training Needs
- Team review of job queue pattern
- S3 and AWS integration workshop
```
### Risk Management Section
```markdown
## Risks & Mitigation
### Technical Risks
**Risk: Job Queue Reliability Issues**
- **Likelihood**: Medium
- **Impact**: High (feature doesn't work)
- **Mitigation**:
- Use managed Redis service (AWS ElastiCache)
- Implement comprehensive error handling
- Load test thoroughly before production
- Have rollback plan
**Risk: Large File Performance Problems**
- **Likelihood**: Medium
- **Impact**: Medium (performance targets missed)
- **Mitigation**:
- Start performance testing early (Week 2)
- Profile and optimize in Phase 4
- Document performance constraints
- Set data size limits if needed
**Risk: Data Consistency Issues**
- **Likelihood**: Low
- **Impact**: High (data corruption)
- **Mitigation**:
- Implement data validation
- Use database transactions
- Test with edge cases
- Have data audit procedures
### Scheduling Risks
**Risk: Phase Dependencies Cause Delays**
- **Likelihood**: Medium
- **Impact**: High (slips launch date)
- **Mitigation**:
- Phase 2 and 3 overlap to parallelize
- Start Phase 4 testing early
- Have clear done criteria
**Risk: Data Model Spec Not Ready**
- **Likelihood**: Low
- **Impact**: High (blocks implementation)
- **Mitigation**:
- Confirm spec status before starting
- Have backup data model if needed
- Schedule early review meetings
```
### Success Metrics Section
```markdown
## Success Criteria
### Technical Metrics
- [ ] Export API processes 1000+ requests/day
- [ ] p95 latency < 100ms for status queries
- [ ] Export processing completes in < 5 minutes for 100MB files
- [ ] System reliability > 99.5%
- [ ] Zero data loss or corruption incidents
### Adoption Metrics
- [ ] 30%+ of enterprise users adopt feature in first month
- [ ] Average of 2+ exports per adopting user per month
- [ ] Support tickets about exports < 5/week
### Quality Metrics
- [ ] 90%+ test coverage
- [ ] Zero critical security issues
- [ ] Documentation completeness = 100%
- [ ] Team can operate independently
```
## Writing Tips
### Be Realistic About Timelines
- Include buffer time for unknowns (add 20-30%)
- Consider team capacity and interruptions
- Account for review and testing cycles
- Document assumptions about team size/availability
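The buffer rule above is simple arithmetic; a quick shell sketch makes it concrete (the 6-week estimate and 30% buffer are illustrative numbers, not a recommendation for your project):

```shell
# Apply a planning buffer to a raw estimate, rounding up to whole weeks
estimate_weeks=6
buffer_pct=30
buffered=$(( (estimate_weeks * (100 + buffer_pct) + 99) / 100 ))
echo "raw estimate: ${estimate_weeks} wks -> plan for: ${buffered} wks"
```

A 6-week estimate with a 30% buffer rounds up to 8 weeks of calendar time to plan against.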
### Break Down Phases Clearly
- Each phase should have clear deliverables
- Phases should be independent or clearly sequenced
- Dependencies should be explicit
- Success criteria should be measurable
### Link to Related Specs
- Reference business requirements: `[BRD-001]`
- Reference technical requirements: `[PRD-001]`
- Reference design documents: `[DES-001]`
- Reference component specs: `[CMP-001]`
### Identify Risks Early
- What could go wrong?
- What's outside your control?
- What mitigations exist?
- What's the contingency plan?
### Track Against Plan
- Update plan weekly with actual progress
- Note slippages and root causes
- Adjust future phases if needed
- Use as learning for future planning
## Validation & Fixing Issues
### Run the Validator
```bash
scripts/validate-spec.sh docs/specs/plan/pln-001-your-spec.md
```
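Before running the validator, a quick grep pre-check can catch missing top-level sections. The heading list here is an assumption drawn from this guide's examples, not the validator's actual rules, and the spec is written to a temp file purely for illustration:

```shell
# Pre-check a plan spec for the section headings this guide expects
spec=$(mktemp)
cat > "$spec" <<'EOF'
# [PLN-001] Example Plan
## Timeline
## Resources
## Risks & Mitigation
## Success Criteria
EOF
missing=0
for h in "## Timeline" "## Resources" "## Risks & Mitigation" "## Success Criteria"; do
  grep -qF "$h" "$spec" || { echo "MISSING: $h"; missing=$((missing + 1)); }
done
echo "missing sections: $missing"
rm -f "$spec"
```

Point `spec` at your real `docs/specs/plan/pln-XXX-slug.md` to use it in practice.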
### Common Issues & Fixes
**Issue**: "Phases lack clear deliverables"
- **Fix**: Add specific, measurable deliverables for each phase
**Issue**: "No timeline or dates specified"
- **Fix**: Add start/end dates and duration for each phase
**Issue**: "Dependencies not documented"
- **Fix**: Identify and document blocking dependencies between phases
**Issue**: "Resource allocation unclear"
- **Fix**: Specify team members, their roles, and time commitment per phase
## Decision-Making Framework
When planning implementation:
1. **Scope**: What exactly are we building?
- Must-haves vs. nice-to-haves?
- What can we defer?
2. **Sequence**: What must be done in order?
- What can happen in parallel?
- Where are critical path bottlenecks?
3. **Phases**: How do we break this into manageable chunks?
- 1-3 week phases work well
- Each should produce something shippable/testable
- Clear entry/exit criteria
4. **Resources**: What do we need?
- Team skills and capacity?
- Infrastructure and tools?
- External dependencies?
5. **Risk**: What could derail us?
- Technical risks?
- Timeline risks?
- Resource risks?
- Mitigation strategies?
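The framework above can be captured as a reusable phase skeleton, following the same layout as the phase examples earlier in this guide:

```markdown
### Phase N: Name (Weeks X-Y)
**Timeline**: start date - end date (duration)
**Team**: who is working on it
**Goals**
- One or two outcomes this phase exists to produce
**Deliverables**
- Something shippable or testable
**Dependencies**
- What must be complete before this phase can start
**Success Criteria**
- A measurable exit condition
```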
## Next Steps
1. **Create the spec**: `scripts/generate-spec.sh plan pln-XXX-slug`
2. **Research**: Find related specs and understand scope
3. **Define phases**: Break work into logical chunks
4. **Map dependencies**: Understand what blocks what
5. **Estimate effort**: How long will each phase take?
6. **Identify risks**: What could go wrong?
7. **Validate**: `scripts/validate-spec.sh docs/specs/plan/pln-XXX-slug.md`
8. **Share with team** for feedback and planning

# How to Create a Technical Requirement Specification
Technical Requirements (PRD or TRQ) translate business needs into specific, implementation-ready technical requirements. They bridge the gap between "what we want to build" (business requirements) and "how we'll build it" (design documents).
## Quick Start
```bash
# 1. Create a new technical requirement
scripts/generate-spec.sh technical-requirement prd-001-descriptive-slug
# 2. Open and fill in the file
# (The file will be created at: docs/specs/technical-requirement/prd-001-descriptive-slug.md)
# 3. Fill in the sections, then validate:
scripts/validate-spec.sh docs/specs/technical-requirement/prd-001-descriptive-slug.md
# 4. Fix any issues, then check completeness:
scripts/check-completeness.sh docs/specs/technical-requirement/prd-001-descriptive-slug.md
```
## When to Write a Technical Requirement
Use a Technical Requirement when you need to:
- Define specific technical implementation details for a feature
- Map business requirements to technical solutions
- Document design decisions and their rationale
- Create acceptance criteria that engineers can test against
- Specify external dependencies and constraints
## Research Phase
Before writing, do your homework:
### 1. Research Related Specifications
Look for upstream and downstream specs:
```bash
# Find the business requirement this fulfills
grep -r "brd\|business" docs/specs/ --include="*.md" | head -20
# Find any existing technical requirements in this domain
grep -r "prd\|technical" docs/specs/ --include="*.md" | head -20
# Find design documents that might inform this
grep -r "design\|architecture" docs/specs/ --include="*.md" | head -20
```
### 2. Research External Documentation
Research relevant technologies and patterns:
```bash
# For external libraries/frameworks:
# Use the doc tools to get latest official documentation
# Example: research React hooks if implementing a frontend component
# Example: research database indexing strategies if working with large datasets
```
Ask yourself:
- What technologies are most suitable for this?
- Are there industry standards we should follow?
- What does the existing codebase use for similar features?
- Are there performance benchmarks or best practices we should know about?
### 3. Review the Codebase
- How have similar features been implemented?
- What patterns does the team follow?
- What libraries/frameworks are already in use?
- Are there existing utilities or services we can reuse?
## Structure & Content Guide
### Title & Metadata
- **Title**: Clear, specific requirement (e.g., "Implement Real-Time Notification System")
- **Priority**: critical | high | medium | low
- **Document ID**: Use format `PRD-XXX-slug` (e.g., `PRD-001-export-api`)
### Description Section
Answer: "What technical problem are we solving?"
Describe:
- The technical challenge you're addressing
- Why this particular approach matters
- Current technical gaps
- How this impacts system architecture or performance
Example:
```
Currently, bulk exports run synchronously, blocking requests for up to 30 seconds.
This causes timeout errors for exports > 100MB. We need an asynchronous export
system that handles large datasets efficiently.
```
### Business Requirements Addressed Section
Reference the business requirements this fulfills:
```
- [BRD-001] Bulk User Data Export - This implementation enables the export feature
- [BRD-002] Enterprise Data Audit - This provides the data integrity requirements
```
Link each BRD to how your technical solution addresses it.
### Technical Requirements Section
List specific, measurable technical requirements:
```markdown
1. **[TR-001] Asynchronous Export Processing**
- Exports must complete within 5 minutes for datasets up to 500MB
- Must not block HTTP request threads
- Must handle job queue with at least 100 concurrent exports
2. **[TR-002] Data Format Support**
- Support CSV, JSON, and Parquet formats
- All formats must preserve data types accurately
- Handle special characters and encodings (UTF-8, etc.)
3. **[TR-003] Resilience & Retries**
- Failed exports must retry up to 3 times with exponential backoff
- Incomplete exports must be resumable or cleanly failed
```
**Tips:**
- Be specific: Use numbers, formats, standards
- Make it testable: Each requirement should be verifiable
- Reference technical specs: Link to API contracts, data models, etc.
- Include edge cases: What about edge cases or error conditions?
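A requirement like [TR-003] becomes easier to review when its behavior is sketched out. Here `attempt_export` is a stand-in that fails twice before succeeding; real code would call the export service, and the `sleep` is commented out only to keep the sketch fast:

```shell
# Retry up to 3 times with exponential backoff
attempts=0
attempt_export() {
  attempts=$((attempts + 1))
  [ "$attempts" -ge 3 ]  # simulate: fail on attempts 1-2, succeed on 3
}
delay=1
for try in 1 2 3; do
  if attempt_export; then
    echo "export succeeded on attempt $try"
    break
  fi
  echo "attempt $try failed; next retry in ${delay}s"
  # sleep "$delay"
  delay=$((delay * 2))
done
```

The doubling `delay` (1s, 2s, 4s) is the exponential backoff the requirement asks for; a production version would also cap the delay and mark the job cleanly failed after the final attempt.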
### Implementation Approach Section
Describe the high-level technical strategy:
```markdown
**Architecture Pattern**
We'll use a job queue pattern with async workers. HTTP requests will create
an export job and return immediately. Workers process jobs asynchronously
and notify users when complete.
**Key Technologies**
- Job Queue: Redis with Bull library
- Export Service: Node.js worker process
- Storage: S3 for export files
- Notifications: Email service
**Integration Points**
- Integrates with existing User Service API
- Uses auth middleware for permission checking
- Publishes completion events to event bus
```
### Key Design Decisions Section
Document important choices:
```markdown
**Decision 1: Asynchronous Export vs. Synchronous**
- **Decision**: Use async job queue instead of blocking requests
- **Rationale**: Synchronous approach causes timeouts for large exports;
async improves reliability and user experience
- **Tradeoffs**: Adds complexity (job queue, worker processes, status tracking)
but enables exports for datasets up to 500MB vs. 50MB limit
```
**Why this matters:**
- Explains the "why" behind technical choices
- Helps future developers understand constraints
- Documents tradeoffs explicitly
### Technical Acceptance Criteria Section
Define how you'll know this is implemented correctly:
```markdown
### [TAC-001] Export Job Creation
**Description**: When a user requests an export, a job is created and queued
**Verification**: Unit test verifies job is created with correct parameters;
integration test verifies job appears in queue
### [TAC-002] Async Processing
**Description**: Export job completes without blocking HTTP request
**Verification**: Load test shows HTTP response time < 100ms regardless of
export size; export job completes within target time
### [TAC-003] Export Format Accuracy
**Description**: Exported data matches source data exactly (no data loss)
**Verification**: Property-based tests verify format accuracy for various
data types and edge cases
```
**Tips for Acceptance Criteria:**
- Each should be testable (unit test, integration test, or manual test)
- Include both happy path and edge cases
- Reference specific metrics or standards
### Dependencies Section
**Technical Dependencies**
- What libraries, services, or systems must be in place?
- What versions are required?
- What's the risk if a dependency is unavailable?
```markdown
- **Redis** (v6.0+) - Job queue | Risk: Medium
- **Bull** (v3.0+) - Queue library | Risk: Low
- **S3** - Export file storage | Risk: Low
- **Email Service API** - User notifications | Risk: Medium
```
**Specification Dependencies**
- What other specs must be completed first?
- Why is this a blocker?
```markdown
- [API-001] Export Endpoints - Must be designed before implementation
- [DATA-001] User Data Model - Need schema for understanding export structure
```
### Constraints Section
Document technical limitations:
```markdown
**Performance**
- Exports must complete within 5 minutes
- p95 latency for export requests must be < 100ms
- System must handle 100 concurrent exports
**Scalability**
- Support up to 500MB export files
- Handle 1000+ daily exports
**Security**
- Only export user's own data (auth-based filtering)
- Encryption for files in transit and at rest
- Audit logs for all exports
**Compatibility**
- Support all major browsers (Chrome, Firefox, Safari, Edge)
- Works with existing authentication system
```
### Implementation Notes Section
**Key Considerations**
What should the implementation team watch out for?
```markdown
**Error Handling**
- Handle network interruptions during export
- Gracefully fail if S3 becomes unavailable
- Provide clear error messages to users
**Testing Strategy**
- Unit tests for export formatting logic
- Integration tests for job queue and workers
- Load tests for concurrent export handling
- Property-based tests for data accuracy
```
**Migration Strategy** (if applicable)
- How do we transition from old to new system?
- What about existing data or users?
## Writing Tips
### Make Requirements Testable
- ❌ Bad: "Export should be fast"
- ✅ Good: "Export must complete within 5 minutes for datasets up to 500MB, with p95 latency under 100ms"
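One way to make such a target verifiable is to measure against an explicit budget. The 100ms budget and the `sleep` stand-in are illustrative, and `date +%s%N` assumes GNU coreutils (it is not portable to macOS's BSD `date`):

```shell
# Time an operation and compare against a latency budget in milliseconds
start=$(date +%s%N)
sleep 0.01   # stand-in for the operation under test
end=$(date +%s%N)
elapsed_ms=$(( (end - start) / 1000000 ))
echo "elapsed: ${elapsed_ms}ms (budget: 100ms)"
if [ "$elapsed_ms" -lt 100 ]; then echo "within budget"; else echo "over budget"; fi
```

Checks like this belong in load tests rather than unit tests, since wall-clock timing varies with machine load.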
### Be Specific About Trade-offs
- Don't just say "we chose Redis"
- Explain: "We chose Redis over RabbitMQ because it's already in our stack and provides the job persistence we need"
### Link to Other Specs
- Reference business requirements this fulfills: `[BRD-001]`
- Reference data models: `[DATA-001]`
- Reference API contracts: `[API-001]`
- Reference design documents: `[DES-001]`
### Document Constraints Clearly
- Performance targets with specific numbers
- Scalability limits and assumptions
- Security and compliance requirements
- Browser/platform support
### Include Edge Cases
- What happens with extremely large datasets?
- How do we handle special characters, encoding issues, missing data?
- What about rate limiting and concurrent requests?
### Complete All TODOs
- Replace placeholder text with actual decisions
- If something is still undecided, explain what needs to happen to decide
## Validation & Fixing Issues
### Run the Validator
```bash
scripts/validate-spec.sh docs/specs/technical-requirement/prd-001-your-spec.md
```
### Common Issues & Fixes
**Issue**: "Missing Technical Acceptance Criteria"
- **Fix**: Add 3-5 criteria describing how you'll verify implementation correctness
**Issue**: "TODO items in Implementation Approach (2 items)"
- **Fix**: Complete the architecture pattern, technologies, and integration points
**Issue**: "No Performance constraints specified"
- **Fix**: Add specific latency, throughput, and availability targets
**Issue**: "Dependencies section incomplete"
- **Fix**: List all required libraries, services, and other specifications this depends on
### Check Completeness
```bash
scripts/check-completeness.sh docs/specs/technical-requirement/prd-001-your-spec.md
```
## Decision-Making Framework
As you write the technical requirement, reason through:
1. **Problem**: What technical problem are we solving?
- Is this a performance issue, reliability issue, or capability gap?
- What's the current cost of not solving this?
2. **Approach**: What are the viable technical approaches?
- Pros and cons of each?
- What's the simplest approach that solves the problem?
- What does the team have experience with?
3. **Trade-offs**: What are we accepting with this approach?
- Complexity vs. flexibility?
- Performance vs. maintainability?
- Immediate need vs. future extensibility?
4. **Measurability**: How will we know this works?
- What specific metrics define success?
- What's the threshold for "passing"?
5. **Dependencies**: What must happen first?
- Are there blockers we need to resolve?
- Can parts be parallelized?
## Example: Complete Technical Requirement
```markdown
# [PRD-001] Asynchronous Export Service
**Priority:** High
## Description
Currently, bulk exports run synchronously, blocking HTTP requests for up to
30 seconds, causing timeouts for exports > 100MB. We need an asynchronous
export system that handles large datasets efficiently and provides job status
tracking to users.
## Business Requirements Addressed
- [BRD-001] Bulk User Data Export - Enables the core export feature
- [BRD-002] Enterprise Audit Requirements - Provides reliable data export
## Technical Requirements
1. **[TR-001] Asynchronous Processing**
- Export jobs must not block HTTP requests
- Jobs complete within 5 minutes for datasets up to 500MB
- System handles 100 concurrent exports
2. **[TR-002] Format Support**
- Support CSV, JSON formats
- Preserve data types and handle special characters
3. **[TR-003] Job Status Tracking**
- Users can check export job status via API
- Job history retained for 30 days
... [rest of sections follow] ...
```
## Next Steps
1. **Create the spec**: `scripts/generate-spec.sh technical-requirement prd-XXX-slug`
2. **Research**: Find related BRD and understand the context
3. **Fill in sections** using this guide
4. **Validate**: `scripts/validate-spec.sh docs/specs/technical-requirement/prd-XXX-slug.md`
5. **Fix issues** identified by validator
6. **Share with architecture/design team** for design document creation