# How to Create a Configuration Schema Specification

Configuration schema specifications document all configurable parameters for a system, including their types, valid values, defaults, and impact.

## Quick Start

```bash
# 1. Create a new configuration schema
scripts/generate-spec.sh configuration-schema config-001-descriptive-slug

# 2. Open and fill in the file
# (The file will be created at: docs/specs/configuration-schema/config-001-descriptive-slug.md)

# 3. Fill in configuration fields and validation rules, then validate:
scripts/validate-spec.sh docs/specs/configuration-schema/config-001-descriptive-slug.md

# 4. Fix issues and check completeness:
scripts/check-completeness.sh docs/specs/configuration-schema/config-001-descriptive-slug.md
```

## When to Write a Configuration Schema

Use a Configuration Schema when you need to:

- Document all configurable system parameters
- Specify environment variables and their meanings
- Define configuration file formats
- Document validation rules and constraints
- Enable operations teams to configure systems safely
- Provide examples for different environments

## Research Phase

### 1. Find What You're Configuring

```bash
# Find component specs
grep -r "component" docs/specs/ --include="*.md"

# Find deployment procedures
grep -r "deploy" docs/specs/ --include="*.md"

# Find existing configuration specs
grep -r "config" docs/specs/ --include="*.md"
```

### 2. Understand Configuration Needs

- What aspects of the system need to be configurable?
- What differs between environments (dev, staging, prod)?
- What can change at runtime, and what requires a restart?
- What's sensitive (secrets, credentials)?

### 3. Review Existing Configurations

- How are other services configured?
- What configuration format is used?
- What environment variables exist?
- What patterns should be followed?

## Structure & Content Guide

### Title & Metadata

- **Title**: "Export Service Configuration", "API Gateway Config", etc.
- **Component**: What component is being configured
- **Version**: Configuration format version
- **Status**: Current, Deprecated, etc.

### Overview Section

```markdown
# Export Service Configuration Schema

## Summary
Defines all configurable parameters for the Export Service microservice.
Configuration can be set via environment variables or a JSON config file.

**Configuration Methods**:
- Environment variables (recommended for Docker/Kubernetes)
- config.json file (for monolithic deployments)
- Command-line arguments (for local development)

**Scope**: All settings that affect Export Service behavior
**Format**: JSON Schema compliant
```

### Configuration Methods Section

````markdown
## Configuration Methods

### Method 1: Environment Variables (Recommended for Production)
Used in containerized deployments (Docker, Kubernetes).
Set before starting the service.

**Syntax**: `EXPORT_SERVICE_KEY=value`

**Example**:
```bash
export EXPORT_SERVICE_PORT=3000
export EXPORT_SERVICE_LOG_LEVEL=info
export EXPORT_SERVICE_DATABASE_URL=postgresql://user:pass@host/db
```

### Method 2: Configuration File (config.json)
Used in monolithic or local deployments. JSON format with hierarchical structure.

**Location**: `./config.json` in the working directory

**Example**:
```json
{
  "server": {
    "port": 3000,
    "timeout_ms": 30000
  },
  "database": {
    "url": "postgresql://user:pass@host/db",
    "pool_size": 10
  }
}
```

### Method 3: Command-Line Arguments
Used in local development. Takes precedence over file config.

**Syntax**: `--key value` or `--key=value`

**Example**:
```bash
node index.js --port 3000 --log-level debug
```

### Precedence (Priority Order)
1. Command-line arguments (highest priority)
2. Environment variables
3. config.json file
4. Default values (lowest priority)
````
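The precedence order is where hand-rolled loaders most often go wrong, so it can help to pair the spec with a reference sketch. The following is a minimal illustration, not the Export Service's actual loader: it covers just two fields, handles only the `--key=value` argument form, and the helper names (`fromFile`, `fromEnv`, `fromArgs`, `resolve`) are hypothetical.

```typescript
import * as fs from "node:fs";

type Config = { port: number; logLevel: string };

// Default values (lowest priority).
const defaults: Config = { port: 3000, logLevel: "info" };

// config.json, if present (third priority).
function fromFile(): Partial<Config> {
  if (!fs.existsSync("./config.json")) return {};
  const raw = JSON.parse(fs.readFileSync("./config.json", "utf8"));
  return { port: raw.server?.port, logLevel: raw.logging?.level };
}

// Environment variables (second priority).
function fromEnv(): Partial<Config> {
  return {
    port: process.env.EXPORT_SERVICE_PORT
      ? Number(process.env.EXPORT_SERVICE_PORT)
      : undefined,
    logLevel: process.env.EXPORT_SERVICE_LOG_LEVEL,
  };
}

// Command-line arguments (highest priority), e.g. --port=3001.
// Only the --key=value form is handled here, for brevity.
function fromArgs(): Partial<Config> {
  const out: Partial<Config> = {};
  for (const arg of process.argv.slice(2)) {
    const [key, value] = arg.replace(/^--/, "").split("=");
    if (key === "port" && value) out.port = Number(value);
    if (key === "log-level" && value) out.logLevel = value;
  }
  return out;
}

// Later layers win, but only where they actually define a value;
// an undefined entry never overrides a lower-priority layer.
function resolve(...layers: Partial<Config>[]): Config {
  const merged = { ...defaults };
  for (const layer of layers) {
    for (const [key, value] of Object.entries(layer)) {
      if (value !== undefined) (merged as any)[key] = value;
    }
  }
  return merged;
}

const config = resolve(fromFile(), fromEnv(), fromArgs());
```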

### Configuration Fields Section

Document each configuration field:

```markdown
## Configuration Fields

### Server Section

#### PORT
- **Type**: integer
- **Default**: 3000
- **Range**: 1024-65535
- **Environment Variable**: `EXPORT_SERVICE_PORT`
- **Config File Key**: `server.port`
- **Description**: HTTP server listening port
- **Examples**:
  - Development: 3000 (local machine, different services use different ports)
  - Production: 3000 (behind load balancer, port not exposed)
- **Impact**: Service not reachable if port already in use
- **Can Change at Runtime**: No (requires restart)

#### TIMEOUT_MS
- **Type**: integer (milliseconds)
- **Default**: 30000 (30 seconds)
- **Range**: 5000-120000
- **Environment Variable**: `EXPORT_SERVICE_TIMEOUT_MS`
- **Config File Key**: `server.timeout_ms`
- **Description**: HTTP request timeout
- **Considerations**:
  - Must be longer than longest export duration
  - If too short: Long exports time out and fail
  - If too long: Failed connections hang longer
- **Examples**:
  - Development: 30000 (quick feedback on errors)
  - Production: 120000 (accounts for large exports)

#### ENABLE_COMPRESSION
- **Type**: boolean
- **Default**: true
- **Environment Variable**: `EXPORT_SERVICE_ENABLE_COMPRESSION`
- **Config File Key**: `server.enable_compression`
- **Description**: Enable HTTP response compression (gzip)
- **Considerations**:
  - Reduces bandwidth but increases CPU usage
  - Should be true unless CPU constrained
- **Typical Value**: true (disable only if CPU-constrained)

### Database Section

#### DATABASE_URL
- **Type**: string (connection string)
- **Default**: None (required)
- **Environment Variable**: `EXPORT_SERVICE_DATABASE_URL`
- **Config File Key**: `database.url`
- **Format**: `postgresql://user:password@host:port/database`
- **Description**: PostgreSQL connection string
- **Examples**:
  - Development: `postgresql://localhost/export_service`
  - Staging: `postgresql://stage-db.example.com/export_stage`
  - Production: `postgresql://prod-db.example.com/export_prod` (managed RDS)
- **Sensitive**: Yes (contains credentials - use secrets management)
- **Required**: Yes
- **Validation**:
  - Must be valid PostgreSQL connection string
  - Service fails to start if URL invalid or unreachable

#### DATABASE_POOL_SIZE
- **Type**: integer
- **Default**: 10
- **Range**: 1-100
- **Environment Variable**: `EXPORT_SERVICE_DATABASE_POOL_SIZE`
- **Config File Key**: `database.pool_size`
- **Description**: Number of database connections to maintain
- **Considerations**:
  - More connections allow more concurrent queries
  - Each connection uses memory and database slot
  - Database has max_connections limit (typically 100-500)
- **Tuning**:
  - 1 service instance: 5-10 connections
  - 5 service instances: 2-4 connections each (10-20 total)
  - Kubernetes auto-scaling: 2-3 per pod (auto-scaled)

#### DATABASE_QUERY_TIMEOUT_MS
- **Type**: integer (milliseconds)
- **Default**: 10000 (10 seconds)
- **Range**: 1000-60000
- **Environment Variable**: `EXPORT_SERVICE_DATABASE_QUERY_TIMEOUT_MS`
- **Config File Key**: `database.query_timeout_ms`
- **Description**: Timeout for individual database queries
- **Considerations**:
  - Export queries can take several seconds for large datasets
  - If too short: Queries fail prematurely
  - If too long: Failed queries block connection pool
- **Typical Values**:
  - Simple queries: 5000ms
  - Large exports: 30000ms

### Redis (Job Queue) Section

#### REDIS_URL
- **Type**: string (connection string)
- **Default**: None (required)
- **Environment Variable**: `EXPORT_SERVICE_REDIS_URL`
- **Config File Key**: `redis.url`
- **Format**: `redis://user:password@host:port/db`
- **Description**: Redis connection string for job queue
- **Examples**:
  - Development: `redis://localhost:6379/0`
  - Staging: `redis://redis-stage.example.com:6379/0`
  - Production: `redis://redis-prod.example.com:6379/0` (managed ElastiCache)
- **Sensitive**: Yes (may contain credentials)
- **Required**: Yes

#### REDIS_MAX_RETRIES
- **Type**: integer
- **Default**: 3
- **Range**: 1-10
- **Environment Variable**: `EXPORT_SERVICE_REDIS_MAX_RETRIES`
- **Config File Key**: `redis.max_retries`
- **Description**: Maximum retry attempts for Redis operations
- **Considerations**:
  - More retries provide resilience but increase latency on failure
  - Should be 3-5 for production
- **Typical Values**: 3

#### CONCURRENT_WORKERS
- **Type**: integer
- **Default**: 3
- **Range**: 1-20
- **Environment Variable**: `EXPORT_SERVICE_CONCURRENT_WORKERS`
- **Config File Key**: `redis.concurrent_workers`
- **Description**: Number of concurrent export workers
- **Considerations**:
  - Each worker processes one export job at a time
  - More workers process jobs faster but use more resources
  - Limited by CPU and memory available
  - Kubernetes scales pods, not this setting
- **Tuning**:
  - Development: 1-2 (for debugging)
  - Production with 2 CPU: 2-3 workers
  - Production with 4+ CPU: 4-8 workers

### Export Section

#### MAX_EXPORT_SIZE_MB
- **Type**: integer
- **Default**: 500
- **Range**: 10-5000
- **Environment Variable**: `EXPORT_SERVICE_MAX_EXPORT_SIZE_MB`
- **Config File Key**: `export.max_export_size_mb`
- **Description**: Maximum size for an export file (in MB)
- **Considerations**:
  - Files larger than this are rejected
  - Limited by disk space and memory
  - Should match S3 bucket policies
- **Typical Values**:
  - Small deployments: 100MB
  - Standard: 500MB
  - Enterprise: 1000-5000MB

#### EXPORT_TTL_DAYS
- **Type**: integer (days)
- **Default**: 7
- **Range**: 1-365
- **Environment Variable**: `EXPORT_SERVICE_EXPORT_TTL_DAYS`
- **Config File Key**: `export.ttl_days`
- **Description**: How long to retain export files after completion
- **Considerations**:
  - Files deleted after TTL expires
  - Affects storage costs (shorter TTL = lower cost)
  - Users must download before expiration
- **Typical Values**:
  - Short retention: 3 days (reduce storage cost)
  - Standard: 7 days (reasonable download window)
  - Long retention: 30 days (enterprise customers)

#### EXPORT_FORMATS
- **Type**: array of strings
- **Default**: ["csv", "json"]
- **Valid Values**: "csv", "json", "parquet"
- **Environment Variable**: `EXPORT_SERVICE_EXPORT_FORMATS` (comma-separated)
- **Config File Key**: `export.formats`
- **Description**: Supported export file formats
- **Examples**:
  - `["csv", "json"]` (most common)
  - `["csv", "json", "parquet"]` (full support)
- **Configuration**:
  - Environment: `EXPORT_SERVICE_EXPORT_FORMATS=csv,json`
  - File: `"formats": ["csv", "json"]`

#### COMPRESSION_ENABLED
- **Type**: boolean
- **Default**: true
- **Environment Variable**: `EXPORT_SERVICE_COMPRESSION_ENABLED`
- **Config File Key**: `export.compression_enabled`
- **Description**: Enable gzip compression for export files
- **Considerations**:
  - Reduces file size by 60-80% typically
  - Increases CPU usage during export
  - Should be enabled unless CPU is bottleneck
- **Typical Value**: true

### Storage Section

#### S3_BUCKET
- **Type**: string
- **Default**: None (required)
- **Environment Variable**: `EXPORT_SERVICE_S3_BUCKET`
- **Config File Key**: `storage.s3_bucket`
- **Description**: AWS S3 bucket for storing export files
- **Format**: `bucket-name` (no s3:// prefix)
- **Examples**:
  - Development: `export-service-dev`
  - Staging: `export-service-stage`
  - Production: `export-service-prod`
- **Required**: Yes
- **IAM Requirements**: Service role must have s3:PutObject, s3:GetObject

#### S3_REGION
- **Type**: string
- **Default**: `us-east-1`
- **Valid Values**: Any AWS region (us-east-1, eu-west-1, etc.)
- **Environment Variable**: `EXPORT_SERVICE_S3_REGION`
- **Config File Key**: `storage.s3_region`
- **Description**: AWS region for S3 bucket
- **Examples**:
  - us-east-1 (US East - Virginia)
  - eu-west-1 (EU - Ireland)

### Logging Section

#### LOG_LEVEL
- **Type**: string (enum)
- **Default**: "info"
- **Valid Values**: "debug", "info", "warn", "error"
- **Environment Variable**: `EXPORT_SERVICE_LOG_LEVEL`
- **Config File Key**: `logging.level`
- **Description**: Logging verbosity level
- **Examples**:
  - Development: "debug" (verbose, detailed logs)
  - Staging: "info" (normal level)
  - Production: "info" or "warn" (minimal logs, better performance)
- **Considerations**:
  - debug: Very verbose, affects performance
  - info: Standard operational logs
  - warn: Only warnings and errors
  - error: Only errors

#### LOG_FORMAT
- **Type**: string (enum)
- **Default**: "json"
- **Valid Values**: "json", "text"
- **Environment Variable**: `EXPORT_SERVICE_LOG_FORMAT`
- **Config File Key**: `logging.format`
- **Description**: Log output format
- **Examples**:
  - json: Machine-parseable JSON logs (recommended for production)
  - text: Human-readable text (good for development)

### Feature Flags Section

#### FEATURE_PARQUET_EXPORT
- **Type**: boolean
- **Default**: false
- **Environment Variable**: `EXPORT_SERVICE_FEATURE_PARQUET_EXPORT`
- **Config File Key**: `features.parquet_export`
- **Description**: Enable experimental Parquet export format
- **Considerations**:
  - Set to false for stable deployments
  - Set to true in staging for testing
  - Disabled by default in production
- **Typical Values**:
  - Development: true (test new feature)
  - Staging: true (validate before production)
  - Production: false (disabled until stable)
```

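Dual representations, like `EXPORT_SERVICE_EXPORT_FORMATS=csv,json` versus `"formats": ["csv", "json"]`, are a common source of loader bugs, so it can be worth showing the normalization in the spec. Here is a minimal sketch; the `VALID_FORMATS` set and the function name are illustrative, not the service's actual code.

```typescript
const VALID_FORMATS = new Set(["csv", "json", "parquet"]);

// Accepts either the comma-separated environment form ("csv,json")
// or the JSON-array config-file form (["csv", "json"]).
function normalizeFormats(value: string | string[]): string[] {
  const formats =
    typeof value === "string"
      ? value.split(",").map((f) => f.trim().toLowerCase())
      : value;
  for (const format of formats) {
    if (!VALID_FORMATS.has(format)) {
      throw new Error(`Unsupported export format: ${format}`);
    }
  }
  return formats;
}

normalizeFormats("csv,json");         // ["csv", "json"]
normalizeFormats(["csv", "parquet"]); // ["csv", "parquet"]
```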
### Validation Rules Section

```markdown
## Validation & Constraints

### Required Fields
These fields must be provided (no default value):
- `DATABASE_URL` - PostgreSQL connection string required
- `REDIS_URL` - Redis connection required
- `S3_BUCKET` - S3 bucket must be specified

### Type Validation
- Integers: Must be valid numeric values
- Booleans: Accept true, false, "true", "false", 1, 0
- Strings: Must not be empty (unless explicitly optional)
- Arrays: Must be comma-separated in environment, JSON array in file

### Range Validation
- PORT: 1024-65535 (avoid system ports)
- POOL_SIZE: 1-100 (reasonable connection pool)
- TIMEOUT_MS: 5000-120000 (between 5 seconds and 2 minutes)
- MAX_EXPORT_SIZE_MB: 10-5000 (reasonable file sizes)

### Format Validation
- DATABASE_URL: Must be valid PostgreSQL connection string
- S3_BUCKET: Must follow S3 naming rules (lowercase, hyphens only)
- S3_REGION: Must be valid AWS region code

### Interdependency Rules
- If COMPRESSION_ENABLED=true: MAX_EXPORT_SIZE_MB can be larger
- If MAX_EXPORT_SIZE_MB > 100: DATABASE_QUERY_TIMEOUT_MS should be > 10000
- If CONCURRENT_WORKERS > 5: Memory requirements increase significantly

### Error Cases
What happens if validation fails:
- Service fails to start with validation error
- Specific field and reason for validation failure logged
- Error message includes valid range/values
```

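To make the fail-fast behavior concrete, here is a hedged sketch of startup validation against these rules. Field names follow the schema above, but the `Config` shape, function name, and message formats are illustrative:

```typescript
// Illustrative shape of the resolved configuration.
interface Config {
  port: number;
  databaseUrl?: string;
  maxExportSizeMb: number;
  databaseQueryTimeoutMs: number;
}

// Fail fast: collect every violation, then refuse to start with a
// message that names the field and its valid range, as the spec requires.
function validate(config: Config): void {
  const errors: string[] = [];

  if (!config.databaseUrl) {
    errors.push("DATABASE_URL is required (no default)");
  }
  if (config.port < 1024 || config.port > 65535) {
    errors.push(`PORT must be 1024-65535, got ${config.port}`);
  }
  if (config.maxExportSizeMb < 10 || config.maxExportSizeMb > 5000) {
    errors.push(
      `MAX_EXPORT_SIZE_MB must be 10-5000, got ${config.maxExportSizeMb}`,
    );
  }
  // Interdependency rule: large exports need a longer query timeout.
  if (config.maxExportSizeMb > 100 && config.databaseQueryTimeoutMs <= 10000) {
    errors.push(
      "DATABASE_QUERY_TIMEOUT_MS should exceed 10000 when MAX_EXPORT_SIZE_MB > 100",
    );
  }

  if (errors.length > 0) {
    // Service fails to start; each error names the field and valid values.
    throw new Error(`Invalid configuration:\n- ${errors.join("\n- ")}`);
  }
}
```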
### Environment-Specific Configurations Section

````markdown
## Environment-Specific Configurations

### Development Environment

```json
{
  "server": {
    "port": 3000,
    "timeout_ms": 30000
  },
  "database": {
    "url": "postgresql://localhost/export_service",
    "pool_size": 5
  },
  "redis": {
    "url": "redis://localhost:6379/0",
    "concurrent_workers": 1
  },
  "export": {
    "max_export_size_mb": 100,
    "ttl_days": 7,
    "formats": ["csv", "json"]
  },
  "logging": {
    "level": "debug",
    "format": "text"
  },
  "features": {
    "parquet_export": false
  }
}

```

**Notes**:
- Runs locally with minimal resources
- Verbose logging for debugging
- Limited concurrent workers (1)
- Smaller max export size for testing

### Staging Environment

```bash
EXPORT_SERVICE_PORT=3000
EXPORT_SERVICE_DATABASE_URL=postgresql://stage-db.example.com/export_stage
EXPORT_SERVICE_REDIS_URL=redis://redis-stage.example.com:6379/0
EXPORT_SERVICE_S3_BUCKET=export-service-stage
EXPORT_SERVICE_S3_REGION=us-east-1
EXPORT_SERVICE_LOG_LEVEL=info
EXPORT_SERVICE_LOG_FORMAT=json
EXPORT_SERVICE_CONCURRENT_WORKERS=3
EXPORT_SERVICE_MAX_EXPORT_SIZE_MB=500
EXPORT_SERVICE_FEATURE_PARQUET_EXPORT=true

```

**Notes**:
- Tests new features before production
- Similar resources to production
- Parquet export enabled for testing

### Production Environment

```bash
EXPORT_SERVICE_PORT=3000
EXPORT_SERVICE_DATABASE_URL=<from AWS Secrets Manager>
EXPORT_SERVICE_REDIS_URL=<from AWS Secrets Manager>
EXPORT_SERVICE_S3_BUCKET=export-service-prod
EXPORT_SERVICE_S3_REGION=us-east-1
EXPORT_SERVICE_LOG_LEVEL=info
EXPORT_SERVICE_LOG_FORMAT=json
EXPORT_SERVICE_CONCURRENT_WORKERS=4
EXPORT_SERVICE_DATABASE_POOL_SIZE=3
EXPORT_SERVICE_MAX_EXPORT_SIZE_MB=500
EXPORT_SERVICE_EXPORT_TTL_DAYS=7
EXPORT_SERVICE_FEATURE_PARQUET_EXPORT=false

```

**Notes**:
- Credentials from secrets manager
- Optimized for performance and reliability
- Experimental features disabled
- Standard deployment settings
````

### Configuration Examples Section

````markdown
## Complete Configuration Examples

### Minimal Configuration (Development)
```bash
# Minimal settings needed to run locally
export EXPORT_SERVICE_DATABASE_URL=postgresql://localhost/export_service
export EXPORT_SERVICE_REDIS_URL=redis://localhost:6379/0
export EXPORT_SERVICE_S3_BUCKET=export-service-local
export EXPORT_SERVICE_S3_REGION=us-east-1

```

### High-Throughput Configuration (Production)

```bash
# Optimized for maximum throughput
export EXPORT_SERVICE_CONCURRENT_WORKERS=8
export EXPORT_SERVICE_DATABASE_POOL_SIZE=5
export EXPORT_SERVICE_MAX_EXPORT_SIZE_MB=1000
export EXPORT_SERVICE_COMPRESSION_ENABLED=true
export EXPORT_SERVICE_EXPORT_TTL_DAYS=30

```

### Low-Resource Configuration (Cost-Optimized)

```bash
# Minimizes resource usage and cost
export EXPORT_SERVICE_CONCURRENT_WORKERS=1
export EXPORT_SERVICE_DATABASE_POOL_SIZE=2
export EXPORT_SERVICE_MAX_EXPORT_SIZE_MB=100
export EXPORT_SERVICE_EXPORT_TTL_DAYS=1
export EXPORT_SERVICE_LOG_LEVEL=warn
```
````

### Secrets Management Section

````markdown
## Handling Sensitive Configuration

### Sensitive Fields
These fields contain credentials or sensitive information:
- DATABASE_URL (contains password)
- REDIS_URL (may contain password)
- AWS credentials (if not using IAM roles)

### Security Best Practices
1. **Never commit secrets to git**
   - Use .gitignore to exclude config files with secrets
   - Use environment variables instead

2. **Use Secrets Management**
   - AWS Secrets Manager (recommended for production)
   - HashiCorp Vault (for multi-team deployments)
   - Kubernetes Secrets (for K8s deployments)

3. **Rotate Credentials**
   - Rotate database passwords regularly
   - Rotate AWS API keys
   - Update service after rotation

4. **Limit Access**
   - Only operations team can see production credentials
   - Audit logs track who accessed what credentials
   - Use IAM roles instead of static credentials when possible

### Example: Using AWS Secrets Manager
```bash
# In Kubernetes deployment, inject from AWS Secrets Manager
DATABASE_URL=$(aws secretsmanager get-secret-value \
  --secret-id export-service/db-url \
  --query SecretString --output text)

export EXPORT_SERVICE_DATABASE_URL=$DATABASE_URL
```
````

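Where the service fetches secrets itself rather than having them injected by the shell, a sketch using the AWS SDK for JavaScript v3 could look like the following. The secret id matches the example above; retry, caching, and error-handling concerns are omitted, and this is an illustration rather than the service's actual startup code.

```typescript
import {
  SecretsManagerClient,
  GetSecretValueCommand,
} from "@aws-sdk/client-secrets-manager";

// Resolve the database URL at startup. Assumes the runtime role has
// secretsmanager:GetSecretValue on export-service/db-url; the region
// comes from the default provider chain (e.g. AWS_REGION).
async function loadDatabaseUrl(): Promise<string> {
  const client = new SecretsManagerClient({});
  const result = await client.send(
    new GetSecretValueCommand({ SecretId: "export-service/db-url" }),
  );
  if (!result.SecretString) {
    throw new Error("export-service/db-url has no SecretString value");
  }
  return result.SecretString;
}
```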
## Writing Tips

### Be Clear About Scope
- What can users configure?
- What's fixed/non-configurable and why?
- What requires restart vs. hot reload?

### Provide Realistic Examples
- Show real values, not placeholders
- Include examples for different environments
- Show both correct and incorrect formats

### Document Trade-offs
- Why choose certain defaults?
- What's the impact of changing values?
- What happens if value is too high/low?

### Include Validation
- What values are valid?
- What happens if invalid values provided?
- How do users know if config is wrong?

### Think About Operations
- What configuration might ops teams want to change?
- What parameters help troubleshoot issues?
- What can be tuned for performance?

## Validation & Fixing Issues

### Run the Validator
```bash
scripts/validate-spec.sh docs/specs/configuration-schema/config-001-your-spec.md
```

### Common Issues & Fixes

**Issue**: "Configuration fields lack descriptions"
- **Fix**: Add purpose, examples, and impact for each field

**Issue**: "No validation rules documented"
- **Fix**: Document valid ranges, formats, required fields

**Issue**: "No environment-specific examples"
- **Fix**: Add configurations for dev, staging, and production

**Issue**: "Sensitive fields not highlighted"
- **Fix**: Clearly mark sensitive fields and document secrets management

## Decision-Making Framework

When designing a configuration schema:

1. **Scope**: What should be configurable?
   - Environment-specific settings?
   - Performance tuning parameters?
   - Feature flags?
   - Operational settings?
2. **Defaults**: What are good default values?
   - Production-safe defaults?
   - Development-friendly for new users?
   - Documented reasoning?
3. **Flexibility**: How much should users configure?
   - Too much: Confusing, hard to troubleshoot
   - Too little: Can't adapt to needs
   - Right amount: Common use cases covered
4. **Safety**: How do we prevent misconfiguration?
   - Validation rules?
   - Error messages?
   - Documentation of constraints?
5. **Evolution**: How will configuration change?
   - Backward compatibility?
   - Migration path for old configs?
   - Deprecation timeline?

## Next Steps

1. Create the spec: `scripts/generate-spec.sh configuration-schema config-XXX-slug`
2. List fields: What can be configured?
3. Document each field with type, default, range, and impact
4. Provide examples for different environments
5. Document validation rules and constraints
6. Validate: `scripts/validate-spec.sh docs/specs/configuration-schema/config-XXX-slug.md`
7. Share with the operations team for feedback