# How to Create a Configuration Schema Specification

Configuration schema specifications document all configurable parameters for a system, including their types, valid values, defaults, and impact.

## Quick Start

```bash
# 1. Create a new configuration schema
scripts/generate-spec.sh configuration-schema config-001-descriptive-slug

# 2. Open and fill in the file
# (The file will be created at: docs/specs/configuration-schema/config-001-descriptive-slug.md)

# 3. Fill in configuration fields and validation rules, then validate:
scripts/validate-spec.sh docs/specs/configuration-schema/config-001-descriptive-slug.md

# 4. Fix issues and check completeness:
scripts/check-completeness.sh docs/specs/configuration-schema/config-001-descriptive-slug.md
```

## When to Write a Configuration Schema

Use a Configuration Schema when you need to:

- Document all configurable system parameters
- Specify environment variables and their meanings
- Define configuration file formats
- Document validation rules and constraints
- Enable operations teams to configure systems safely
- Provide examples for different environments

## Research Phase

### 1. Research Related Specifications

Find what you're configuring:

```bash
# Find component specs
grep -r "component" docs/specs/ --include="*.md"

# Find deployment procedures
grep -r "deploy" docs/specs/ --include="*.md"

# Find existing configuration specs
grep -r "config" docs/specs/ --include="*.md"
```

### 2. Understand Configuration Needs

- What aspects of the system need to be configurable?
- What differs between environments (dev, staging, prod)?
- What can change at runtime vs. requires restart?
- What's sensitive (secrets, credentials)?

### 3. Review Existing Configurations

- How are other services configured?
- What configuration format is used?
- What environment variables exist?
- What patterns should be followed?

## Structure & Content Guide

### Title & Metadata

- **Title**: "Export Service Configuration", "API Gateway Config", etc.
- **Component**: What component is being configured
- **Version**: Configuration format version
- **Status**: Current, Deprecated, etc.

### Overview Section

````markdown
# Export Service Configuration Schema

## Summary

Defines all configurable parameters for the Export Service microservice. Configuration can be set via environment variables or a JSON config file.

**Configuration Methods**:
- Environment variables (recommended for Docker/Kubernetes)
- config.json file (for monolithic deployments)
- Command-line arguments (for local development)

**Scope**: All settings that affect Export Service behavior
**Format**: JSON Schema compliant
````

### Configuration Methods Section

````markdown
## Configuration Methods

### Method 1: Environment Variables (Recommended for Production)

Used in containerized deployments (Docker, Kubernetes). Set before starting the service.

**Syntax**: `EXPORT_SERVICE_KEY=value`

**Example**:
```bash
export EXPORT_SERVICE_PORT=3000
export EXPORT_SERVICE_LOG_LEVEL=info
export EXPORT_SERVICE_DATABASE_URL=postgresql://user:pass@host/db
```

### Method 2: Configuration File (config.json)

Used in monolithic or local deployments. JSON format with a hierarchical structure.

**Location**: `./config.json` in the working directory

**Example**:
```json
{
  "server": {
    "port": 3000,
    "timeout_ms": 30000
  },
  "database": {
    "url": "postgresql://user:pass@host/db",
    "pool_size": 10
  }
}
```

### Method 3: Command-Line Arguments

Used in local development. Takes precedence over file config.

**Syntax**: `--key value` or `--key=value`

**Example**:
```bash
node index.js --port 3000 --log-level debug
```

### Precedence (Priority Order)

1. Command-line arguments (highest priority)
2. Environment variables
3. config.json file
4. Default values (lowest priority)
````

### Configuration Fields Section

Document each configuration field:

````markdown
## Configuration Fields

### Server Section

#### PORT
- **Type**: integer
- **Default**: 3000
- **Range**: 1024-65535
- **Environment Variable**: `EXPORT_SERVICE_PORT`
- **Config File Key**: `server.port`
- **Description**: HTTP server listening port
- **Examples**:
  - Development: 3000 (local machine; different services use different ports)
  - Production: 3000 (behind load balancer, port not exposed)
- **Impact**: Service not reachable if the port is already in use
- **Can Change at Runtime**: No (requires restart)

#### TIMEOUT_MS
- **Type**: integer (milliseconds)
- **Default**: 30000 (30 seconds)
- **Range**: 5000-120000
- **Environment Variable**: `EXPORT_SERVICE_TIMEOUT_MS`
- **Config File Key**: `server.timeout_ms`
- **Description**: HTTP request timeout
- **Considerations**:
  - Must be longer than the longest export duration
  - If too short: long exports time out and fail
  - If too long: failed connections hang longer
- **Examples**:
  - Development: 30000 (quick feedback on errors)
  - Production: 120000 (accounts for large exports)

#### ENABLE_COMPRESSION
- **Type**: boolean
- **Default**: true
- **Environment Variable**: `EXPORT_SERVICE_ENABLE_COMPRESSION`
- **Config File Key**: `server.enable_compression`
- **Description**: Enable HTTP response compression (gzip)
- **Considerations**:
  - Reduces bandwidth but increases CPU usage
  - Should be true unless CPU constrained
- **Typical Value**: true (always)

### Database Section

#### DATABASE_URL
- **Type**: string (connection string)
- **Default**: None (required)
- **Environment Variable**: `EXPORT_SERVICE_DATABASE_URL`
- **Config File Key**: `database.url`
- **Format**: `postgresql://user:password@host:port/database`
- **Description**: PostgreSQL connection string
- **Examples**:
  - Development: `postgresql://localhost/export_service`
  - Staging: `postgresql://stage-db.example.com/export_stage`
  - Production: `postgresql://prod-db.example.com/export_prod` (managed RDS)
- **Sensitive**: Yes (contains credentials; use secrets management)
- **Required**: Yes
- **Validation**:
  - Must be a valid PostgreSQL connection string
  - Service fails to start if the URL is invalid or unreachable

#### DATABASE_POOL_SIZE
- **Type**: integer
- **Default**: 10
- **Range**: 1-100
- **Environment Variable**: `EXPORT_SERVICE_DATABASE_POOL_SIZE`
- **Config File Key**: `database.pool_size`
- **Description**: Number of database connections to maintain
- **Considerations**:
  - More connections allow more concurrent queries
  - Each connection uses memory and a database slot
  - The database has a max_connections limit (typically 100-500)
- **Tuning**:
  - 1 service instance: 5-10 connections
  - 5 service instances: 2-4 connections each (25-40 total)
  - Kubernetes auto-scaling: 2-3 per pod

#### DATABASE_QUERY_TIMEOUT_MS
- **Type**: integer (milliseconds)
- **Default**: 10000 (10 seconds)
- **Range**: 1000-60000
- **Environment Variable**: `EXPORT_SERVICE_DATABASE_QUERY_TIMEOUT_MS`
- **Config File Key**: `database.query_timeout_ms`
- **Description**: Timeout for individual database queries
- **Considerations**:
  - Export queries can take several seconds for large datasets
  - If too short: queries fail prematurely
  - If too long: failed queries block the connection pool
- **Typical Values**:
  - Simple queries: 5000ms
  - Large exports: 30000ms

### Redis (Job Queue) Section

#### REDIS_URL
- **Type**: string (connection string)
- **Default**: None (required)
- **Environment Variable**: `EXPORT_SERVICE_REDIS_URL`
- **Config File Key**: `redis.url`
- **Format**: `redis://user:password@host:port/db`
- **Description**: Redis connection string for the job queue
- **Examples**:
  - Development: `redis://localhost:6379/0`
  - Staging: `redis://redis-stage.example.com:6379/0`
  - Production: `redis://redis-prod.example.com:6379/0` (managed ElastiCache)
- **Sensitive**: Yes (may contain credentials)
- **Required**: Yes

#### REDIS_MAX_RETRIES
- **Type**: integer
- **Default**: 3
- **Range**: 1-10
- **Environment Variable**: `EXPORT_SERVICE_REDIS_MAX_RETRIES`
- **Config File Key**: `redis.max_retries`
- **Description**: Maximum retry attempts for Redis operations
- **Considerations**:
  - More retries provide resilience but increase latency on failure
  - Should be 3-5 for production
- **Typical Value**: 3

#### CONCURRENT_WORKERS
- **Type**: integer
- **Default**: 3
- **Range**: 1-20
- **Environment Variable**: `EXPORT_SERVICE_CONCURRENT_WORKERS`
- **Config File Key**: `redis.concurrent_workers`
- **Description**: Number of concurrent export workers
- **Considerations**:
  - Each worker processes one export job at a time
  - More workers process jobs faster but use more resources
  - Limited by available CPU and memory
  - Kubernetes scales pods, not this setting
- **Tuning**:
  - Development: 1-2 (for debugging)
  - Production with 2 CPUs: 2-3 workers
  - Production with 4+ CPUs: 4-8 workers

### Export Section

#### MAX_EXPORT_SIZE_MB
- **Type**: integer
- **Default**: 500
- **Range**: 10-5000
- **Environment Variable**: `EXPORT_SERVICE_MAX_EXPORT_SIZE_MB`
- **Config File Key**: `export.max_export_size_mb`
- **Description**: Maximum size for an export file (in MB)
- **Considerations**:
  - Files larger than this are rejected
  - Limited by disk space and memory
  - Should match S3 bucket policies
- **Typical Values**:
  - Small deployments: 100MB
  - Standard: 500MB
  - Enterprise: 1000-5000MB

#### EXPORT_TTL_DAYS
- **Type**: integer (days)
- **Default**: 7
- **Range**: 1-365
- **Environment Variable**: `EXPORT_SERVICE_EXPORT_TTL_DAYS`
- **Config File Key**: `export.ttl_days`
- **Description**: How long to retain export files after completion
- **Considerations**:
  - Files are deleted after the TTL expires
  - Affects storage costs (shorter TTL = lower cost)
  - Users must download before expiration
- **Typical Values**:
  - Short retention: 3 days (reduces storage cost)
  - Standard: 7 days (reasonable download window)
  - Long retention: 30 days (enterprise customers)

#### EXPORT_FORMATS
- **Type**: array of strings
- **Default**: ["csv", "json"]
- **Valid Values**: "csv", "json", "parquet"
- **Environment Variable**: `EXPORT_SERVICE_EXPORT_FORMATS` (comma-separated)
- **Config File Key**: `export.formats`
- **Description**: Supported export file formats
- **Examples**:
  - `["csv", "json"]` (most common)
  - `["csv", "json", "parquet"]` (full support)
- **Configuration**:
  - Environment: `EXPORT_SERVICE_EXPORT_FORMATS=csv,json`
  - File: `"formats": ["csv", "json"]`

#### COMPRESSION_ENABLED
- **Type**: boolean
- **Default**: true
- **Environment Variable**: `EXPORT_SERVICE_COMPRESSION_ENABLED`
- **Config File Key**: `export.compression_enabled`
- **Description**: Enable gzip compression for export files
- **Considerations**:
  - Typically reduces file size by 60-80%
  - Increases CPU usage during export
  - Should be enabled unless CPU is the bottleneck
- **Typical Value**: true

### Storage Section

#### S3_BUCKET
- **Type**: string
- **Default**: None (required)
- **Environment Variable**: `EXPORT_SERVICE_S3_BUCKET`
- **Config File Key**: `storage.s3_bucket`
- **Description**: AWS S3 bucket for storing export files
- **Format**: `bucket-name` (no s3:// prefix)
- **Examples**:
  - Development: `export-service-dev`
  - Staging: `export-service-stage`
  - Production: `export-service-prod`
- **Required**: Yes
- **IAM Requirements**: Service role must have s3:PutObject and s3:GetObject

#### S3_REGION
- **Type**: string
- **Default**: `us-east-1`
- **Valid Values**: Any AWS region (us-east-1, eu-west-1, etc.)
- **Environment Variable**: `EXPORT_SERVICE_S3_REGION`
- **Config File Key**: `storage.s3_region`
- **Description**: AWS region for the S3 bucket
- **Examples**:
  - us-east-1 (US East - Virginia)
  - eu-west-1 (EU - Ireland)

### Logging Section

#### LOG_LEVEL
- **Type**: string (enum)
- **Default**: "info"
- **Valid Values**: "debug", "info", "warn", "error"
- **Environment Variable**: `EXPORT_SERVICE_LOG_LEVEL`
- **Config File Key**: `logging.level`
- **Description**: Logging verbosity level
- **Examples**:
  - Development: "debug" (verbose, detailed logs)
  - Staging: "info" (normal level)
  - Production: "info" or "warn" (minimal logs, better performance)
- **Considerations**:
  - debug: very verbose, affects performance
  - info: standard operational logs
  - warn: only warnings and errors
  - error: only errors

#### LOG_FORMAT
- **Type**: string (enum)
- **Default**: "json"
- **Valid Values**: "json", "text"
- **Environment Variable**: `EXPORT_SERVICE_LOG_FORMAT`
- **Config File Key**: `logging.format`
- **Description**: Log output format
- **Examples**:
  - json: machine-parseable JSON logs (recommended for production)
  - text: human-readable text (good for development)

### Feature Flags Section

#### FEATURE_PARQUET_EXPORT
- **Type**: boolean
- **Default**: false
- **Environment Variable**: `EXPORT_SERVICE_FEATURE_PARQUET_EXPORT`
- **Config File Key**: `features.parquet_export`
- **Description**: Enable the experimental Parquet export format
- **Considerations**:
  - Set to false for stable deployments
  - Set to true in staging for testing
  - Disabled by default in production
- **Typical Values**:
  - Development: true (test the new feature)
  - Staging: true (validate before production)
  - Production: false (disabled until stable)
````

### Validation Rules Section

````markdown
## Validation & Constraints

### Required Fields

These fields must be provided (no default value):

- `DATABASE_URL` - PostgreSQL connection string required
- `REDIS_URL` - Redis connection required
- `S3_BUCKET` - S3 bucket must be specified

### Type Validation

- Integers: must be valid numeric values
- Booleans: accept true, false, "true", "false", 1, 0
- Strings: must not be empty (unless explicitly optional)
- Arrays: comma-separated in environment variables, JSON arrays in the config file

### Range Validation

- PORT: 1024-65535 (avoid system ports)
- POOL_SIZE: 1-100 (reasonable connection pool)
- TIMEOUT_MS: 5000-120000 (between 5 seconds and 2 minutes)
- MAX_EXPORT_SIZE_MB: 10-5000 (reasonable file sizes)

### Format Validation

- DATABASE_URL: must be a valid PostgreSQL connection string
- S3_BUCKET: must follow S3 naming rules (lowercase, hyphens only)
- S3_REGION: must be a valid AWS region code

### Interdependency Rules

- If COMPRESSION_ENABLED=true: MAX_EXPORT_SIZE_MB can be larger
- If MAX_EXPORT_SIZE_MB > 100: DATABASE_QUERY_TIMEOUT_MS should be > 10000
- If CONCURRENT_WORKERS > 5: memory requirements increase significantly

### Error Cases

What happens if validation fails:

- Service fails to start with a validation error
- The specific field and the reason for the validation failure are logged
- Error message includes the valid range/values
````

### Environment-Specific Configurations Section

````markdown
## Environment-Specific Configurations

### Development Environment

```json
{
  "server": {
    "port": 3000,
    "timeout_ms": 30000
  },
  "database": {
    "url": "postgresql://localhost/export_service",
    "pool_size": 5
  },
  "redis": {
    "url": "redis://localhost:6379/0",
    "concurrent_workers": 1
  },
  "export": {
    "max_export_size_mb": 100,
    "ttl_days": 7,
    "formats": ["csv", "json"]
  },
  "logging": {
    "level": "debug",
    "format": "text"
  },
  "features": {
    "parquet_export": false
  }
}
```

**Notes**:
- Runs locally with minimal resources
- Verbose logging for debugging
- Limited concurrent workers (1)
- Smaller max export size for testing

### Staging Environment

```bash
EXPORT_SERVICE_PORT=3000
EXPORT_SERVICE_DATABASE_URL=postgresql://stage-db.example.com/export_stage
EXPORT_SERVICE_REDIS_URL=redis://redis-stage.example.com:6379/0
EXPORT_SERVICE_S3_BUCKET=export-service-stage
EXPORT_SERVICE_S3_REGION=us-east-1
EXPORT_SERVICE_LOG_LEVEL=info
EXPORT_SERVICE_LOG_FORMAT=json
EXPORT_SERVICE_CONCURRENT_WORKERS=3
EXPORT_SERVICE_MAX_EXPORT_SIZE_MB=500
EXPORT_SERVICE_FEATURE_PARQUET_EXPORT=true
```

**Notes**:
- Tests new features before production
- Similar resources to production
- Parquet export enabled for testing

### Production Environment

```bash
EXPORT_SERVICE_PORT=3000
EXPORT_SERVICE_DATABASE_URL=    # injected from secrets manager
EXPORT_SERVICE_REDIS_URL=       # injected from secrets manager
EXPORT_SERVICE_S3_BUCKET=export-service-prod
EXPORT_SERVICE_S3_REGION=us-east-1
EXPORT_SERVICE_LOG_LEVEL=info
EXPORT_SERVICE_LOG_FORMAT=json
EXPORT_SERVICE_CONCURRENT_WORKERS=4
EXPORT_SERVICE_DATABASE_POOL_SIZE=3
EXPORT_SERVICE_MAX_EXPORT_SIZE_MB=500
EXPORT_SERVICE_EXPORT_TTL_DAYS=7
EXPORT_SERVICE_FEATURE_PARQUET_EXPORT=false
```

**Notes**:
- Credentials come from the secrets manager
- Optimized for performance and reliability
- Experimental features disabled
- Standard deployment settings
````

### Configuration Examples Section

````markdown
## Complete Configuration Examples

### Minimal Configuration (Development)

```bash
# Minimal settings needed to run locally
export EXPORT_SERVICE_DATABASE_URL=postgresql://localhost/export_service
export EXPORT_SERVICE_REDIS_URL=redis://localhost:6379/0
export EXPORT_SERVICE_S3_BUCKET=export-service-local
export EXPORT_SERVICE_S3_REGION=us-east-1
```

### High-Throughput Configuration (Production)

```bash
# Optimized for maximum throughput
export EXPORT_SERVICE_CONCURRENT_WORKERS=8
export EXPORT_SERVICE_DATABASE_POOL_SIZE=5
export EXPORT_SERVICE_MAX_EXPORT_SIZE_MB=1000
export EXPORT_SERVICE_COMPRESSION_ENABLED=true
export EXPORT_SERVICE_EXPORT_TTL_DAYS=30
```

### Low-Resource Configuration (Cost-Optimized)

```bash
# Minimizes resource usage and cost
export EXPORT_SERVICE_CONCURRENT_WORKERS=1
export EXPORT_SERVICE_DATABASE_POOL_SIZE=2
export EXPORT_SERVICE_MAX_EXPORT_SIZE_MB=100
export EXPORT_SERVICE_EXPORT_TTL_DAYS=1
export EXPORT_SERVICE_LOG_LEVEL=warn
```
````

### Secrets Management Section

````markdown
## Handling Sensitive Configuration

### Sensitive Fields

These fields contain credentials or sensitive information:

- DATABASE_URL (contains password)
- REDIS_URL (may contain password)
- AWS credentials (if not using IAM roles)

### Security Best Practices

1. **Never commit secrets to git**
   - Use .gitignore to exclude config files with secrets
   - Use environment variables instead
2. **Use secrets management**
   - AWS Secrets Manager (recommended for production)
   - HashiCorp Vault (for multi-team deployments)
   - Kubernetes Secrets (for K8s deployments)
3. **Rotate credentials**
   - Rotate database passwords regularly
   - Rotate AWS API keys
   - Update the service after rotation
4. **Limit access**
   - Only the operations team can see production credentials
   - Audit logs track who accessed which credentials
   - Use IAM roles instead of static credentials when possible

### Example: Using AWS Secrets Manager

```bash
# In a Kubernetes deployment, inject from AWS Secrets Manager
DATABASE_URL=$(aws secretsmanager get-secret-value \
  --secret-id export-service/db-url \
  --query SecretString --output text)
export EXPORT_SERVICE_DATABASE_URL=$DATABASE_URL
```
````

## Writing Tips

### Be Clear About Scope
- What can users configure?
- What's fixed/non-configurable, and why?
- What requires a restart vs. hot reload?

### Provide Realistic Examples
- Show real values, not placeholders
- Include examples for different environments
- Show both correct and incorrect formats

### Document Trade-offs
- Why choose certain defaults?
- What's the impact of changing values?
- What happens if a value is too high or too low?

### Include Validation
- What values are valid?
- What happens if invalid values are provided?
- How do users know if the config is wrong?

### Think About Operations
- What configuration might ops teams want to change?
- What parameters help troubleshoot issues?
- What can be tuned for performance?
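Precedence rules are a common source of operator confusion, so a spec benefits from showing how the documented order (CLI args > environment > config file > defaults) resolves in practice. The following Python sketch illustrates that resolution for the example schema above; `load_config` and the flat key set are hypothetical simplifications, not part of any real service.

```python
import json
import os

# Illustrative defaults for a few fields from the example schema above.
DEFAULTS = {"port": 3000, "log_level": "info", "timeout_ms": 30000}
ENV_PREFIX = "EXPORT_SERVICE_"


def load_config(cli_args=None, env=None, config_path=None):
    """Resolve configuration with the documented precedence:
    CLI arguments > environment variables > config.json > defaults."""
    config = dict(DEFAULTS)  # 4. default values (lowest priority)

    # 3. config.json file, if present
    if config_path and os.path.exists(config_path):
        with open(config_path) as f:
            file_cfg = json.load(f)
        # Flatten one level of nesting: {"server": {"port": 3000}} -> {"port": 3000}
        for section in file_cfg.values():
            config.update(section)

    # 2. environment variables (note: values arrive as strings and still
    #    need type coercion before validation)
    env = os.environ if env is None else env
    for key in list(config):
        env_key = ENV_PREFIX + key.upper()
        if env_key in env:
            config[key] = env[env_key]

    # 1. command-line arguments (highest priority)
    if cli_args:
        config.update(cli_args)
    return config
```

For example, `load_config(cli_args={"port": 8080}, env={"EXPORT_SERVICE_PORT": "4000"})` resolves `port` to 8080, because CLI arguments win over the environment.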
## Validation & Fixing Issues

### Run the Validator

```bash
scripts/validate-spec.sh docs/specs/configuration-schema/config-001-your-spec.md
```

### Common Issues & Fixes

**Issue**: "Configuration fields lack descriptions"
- **Fix**: Add purpose, examples, and impact for each field

**Issue**: "No validation rules documented"
- **Fix**: Document valid ranges, formats, and required fields

**Issue**: "No environment-specific examples"
- **Fix**: Add configurations for dev, staging, and production

**Issue**: "Sensitive fields not highlighted"
- **Fix**: Clearly mark sensitive fields and document secrets management

## Decision-Making Framework

When designing a configuration schema:

1. **Scope**: What should be configurable?
   - Environment-specific settings?
   - Performance tuning parameters?
   - Feature flags?
   - Operational settings?
2. **Defaults**: What are good default values?
   - Production-safe defaults?
   - Development-friendly for new users?
   - Documented reasoning?
3. **Flexibility**: How much should users configure?
   - Too much: confusing, hard to troubleshoot
   - Too little: can't adapt to needs
   - Right amount: common use cases covered
4. **Safety**: How do we prevent misconfiguration?
   - Validation rules?
   - Error messages?
   - Documentation of constraints?
5. **Evolution**: How will configuration change?
   - Backward compatibility?
   - Migration path for old configs?
   - Deprecation timeline?

## Next Steps

1. **Create the spec**: `scripts/generate-spec.sh configuration-schema config-XXX-slug`
2. **List fields**: What can be configured?
3. **Document each field** with type, default, range, and impact
4. **Provide examples** for different environments
5. **Document validation** rules and constraints
6. **Validate**: `scripts/validate-spec.sh docs/specs/configuration-schema/config-XXX-slug.md`
7. **Share with the operations team** for feedback
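The validation rules a schema documents (boolean coercion, comma-separated arrays in env vars, range checks, error messages that name the field and its valid range) can be prototyped before the real validator exists, which is a useful way to sanity-check the spec while drafting it. A minimal Python sketch, assuming the type and range rules from the example schema in this guide; the helper names are hypothetical:

```python
def coerce_bool(value):
    """Accept true/false, "true"/"false", 1/0, per the type-validation rules."""
    if isinstance(value, bool):
        return value
    text = str(value).strip().lower()
    if text in ("true", "1"):
        return True
    if text in ("false", "0"):
        return False
    raise ValueError(f"not a boolean: {value!r}")


def coerce_list(value):
    """Arrays are comma-separated in env vars, JSON arrays in the config file."""
    if isinstance(value, list):
        return value
    return [item.strip() for item in str(value).split(",")]


def check_range(name, value, low, high):
    """Fail fast with the field name and valid range, per the Error Cases rules."""
    number = int(value)
    if not low <= number <= high:
        raise ValueError(f"{name}={number} is out of range [{low}, {high}]")
    return number
```

For example, `check_range("PORT", "3000", 1024, 65535)` returns 3000, while an out-of-range value raises an error that names the field and its valid range, matching the documented "service fails to start with a validation error" behavior.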