# How to Create a Data Model Specification Data Model specifications document the entities, fields, relationships, and constraints for your application's data. They define the "shape" of data your system works with. ## Quick Start ```bash # 1. Create a new data model scripts/generate-spec.sh data-model data-001-descriptive-slug # 2. Open and fill in the file # (The file will be created at: docs/specs/data-model/data-001-descriptive-slug.md) # 3. Fill in entities and relationships, then validate: scripts/validate-spec.sh docs/specs/data-model/data-001-descriptive-slug.md # 4. Fix issues and check completeness: scripts/check-completeness.sh docs/specs/data-model/data-001-descriptive-slug.md ``` ## When to Write a Data Model Use a Data Model when you need to: - Define database schema for new features - Document entity relationships and constraints - Establish consistent naming conventions - Enable API/UI teams to understand data structure - Plan data migrations or refactoring - Document complex data relationships ## Research Phase ### 1. Research Related Specifications Find what this data model supports: ```bash # Find technical requirements this fulfills grep -r "prd\|technical" docs/specs/ --include="*.md" # Find existing data models that might be related grep -r "data\|model" docs/specs/ --include="*.md" # Find API contracts that expose this data grep -r "api\|endpoint" docs/specs/ --include="*.md" ``` ### 2. Review Existing Data Models - What data modeling patterns does your codebase use? - What database are you using (PostgreSQL, MongoDB, etc.)? - How are relationships currently modeled? - Naming conventions for fields and entities? - Any legacy schema patterns to respect or migrate from? ### 3. Research Domain Models - How do industry-standard models structure similar data? - Are there existing standards (e.g., ISO, RFC) you should follow? - What are best practices in this domain? ### 4. Understand Business Rules - What constraints must the data satisfy? - What are the cardinality rules (one-to-many, many-to-many)? - What data must be unique or required? - What's the expected scale/volume? ## Structure & Content Guide ### Title & Metadata - **Title**: "User Data Model", "Transaction Model", etc. - **Scope**: What entities does this model cover? - **Version**: 1.0 for new models ### Overview Section Provide context: ```markdown # User & Profile Data Model This data model defines the core entities for user management and profile information. Covers user accounts, authentication data, and user preferences. **Entities**: User, UserProfile, UserPreference **Relationships**: User → UserProfile (1:1), User → UserPreference (1:many) **Primary Database**: PostgreSQL ``` ### Entity Definitions Section Document each entity/table: #### Entity: User ```markdown ### User Core user account entity. Every user must have exactly one User record. **Purpose**: Represents a user account in the system. **Fields** | Field | Type | Required | Unique | Default | Description | |-------|------|----------|--------|---------|-------------| | id | UUID | Yes | Yes | auto | Primary key, auto-generated | | email | String(255) | Yes | Yes | - | User's email address, used for login | | password_hash | String(255) | Yes | No | - | Bcrypt hash of password (cost=12) | | first_name | String(100) | No | No | - | User's first name | | last_name | String(100) | No | No | - | User's last name | | status | Enum | Yes | No | active | Account status: active, inactive, suspended | | created_at | Timestamp | Yes | No | now() | Account creation time (UTC) | | updated_at | Timestamp | Yes | No | now() | Last update time (UTC) | | deleted_at | Timestamp | No | No | NULL | Soft-delete timestamp, NULL if active | **Indexes** - Primary: `email` (unique for quick lookups) - Secondary: `created_at` (for user listing/pagination) - Secondary: `status` (for filtering active users) **Constraints** - Email format must be valid (enforced in application) - Password must be at least 8 characters (enforced in application) - Email must be globally unique - Status can only be: active, inactive, suspended **Data Volume** - Expected growth: 100 new users/day - Estimated year 1: ~36k users - Estimated year 3: ~150k users **Archival Strategy** - Deleted users (deleted_at != NULL) moved to archive after 1 year - Soft deletes used for data recovery capability ``` #### Entity: UserProfile ```markdown ### UserProfile Extended user profile information. One-to-one relationship with User. **Purpose**: Stores optional user profile information separate from core account. **Fields** | Field | Type | Required | Unique | Description | |-------|------|----------|--------|-------------| | id | UUID | Yes | Yes | Primary key | | user_id | UUID (FK) | Yes | Yes | Foreign key to User.id | | avatar_url | String(500) | No | No | URL to user's avatar image | | bio | String(500) | No | No | User bio/description | | phone | String(20) | No | Yes | User phone number | | timezone | String(50) | No | No | User's timezone (e.g., America/New_York) | | language | String(5) | No | No | Preferred language (ISO 639-1, e.g., en, fr) | | theme | Enum | No | No | UI theme preference: light, dark, auto | | created_at | Timestamp | Yes | No | Creation time | | updated_at | Timestamp | Yes | No | Last update time | **Indexes** - Primary: `user_id` (unique for 1:1 relationship) **Constraints** - Foreign key: user_id references User(id) ON DELETE CASCADE - Phone must be valid format (if provided) - Timezone must be valid (e.g., from IANA timezone database) - Language must be valid ISO 639-1 code - Theme must be one of: light, dark, auto **Notes** - Soft-deleted with parent User (CASCADE delete) - Profile is optional - some users may not have profile data ``` #### Entity: UserPreference ```markdown ### UserPreference Key-value preferences for users. Flexible schema for future preference types. **Purpose**: Stores user preferences without requiring schema changes. **Fields** | Field | Type | Required | Unique | Description | |-------|------|----------|--------|-------------| | id | UUID | Yes | Yes | Primary key | | user_id | UUID (FK) | Yes | No | Foreign key to User.id | | preference_key | String(100) | Yes | No | Preference identifier (e.g., notifications_email) | | preference_value | String(1000) | Yes | No | Preference value as string | | created_at | Timestamp | Yes | No | Creation time | | updated_at | Timestamp | Yes | No | Last update time | **Indexes** - Composite: `(user_id, preference_key)` - For efficient preference lookup - Primary: `user_id` - For finding all preferences for a user **Constraints** - Foreign key: user_id references User(id) ON DELETE CASCADE - Composite unique: `(user_id, preference_key)` - One preference per key per user - preference_key must match pattern: `[a-z_]+` (lowercase letters and underscores only) - preference_value must be valid JSON or simple string **Valid Preferences** Examples of preference_key values: - `notifications_email` → "true"/"false" - `notifications_sms` → "true"/"false" - `export_format` → "csv"/"json" - `ui_columns_per_page` → "20"/"50"/"100" **Notes** - Flexible key-value design allows adding preferences without schema changes - Values stored as strings for flexibility, parsed by application layer ``` ### Relationships Section Document how entities relate: ```markdown ## Entity Relationships ``` ┌───────────┐ ┌──────────────┐ ┌─────────────┐ │ User │ │ UserProfile │ │ UserPref │ ├───────────┤ ├──────────────┤ ├─────────────┤ │ id (PK) │ │ id (PK) │ │ id (PK) │ │ email │◄───1:1──│ user_id (FK) │ │ user_id(FK) │ │ ... │ │ avatar_url │ │ pref_key │ └───────────┘ │ ... │ │ pref_value │ └──────────────┘ └─────────────┘ ▲ │ 1:many ``` ### Relationship: User → UserProfile (1:1) - **Type**: One-to-One - **Foreign Key**: UserProfile.user_id → User.id - **Cardinality**: A User has exactly one UserProfile; a UserProfile belongs to exactly one User - **Delete Behavior**: CASCADE - Deleting User deletes UserProfile - **Optional**: UserProfile is optional (some users may not have detailed profile) ### Relationship: User → UserPreference (1:many) - **Type**: One-to-Many - **Foreign Key**: UserPreference.user_id → User.id - **Cardinality**: A User can have many UserPreferences; each UserPreference belongs to one User - **Delete Behavior**: CASCADE - Deleting User deletes all preferences - **Optional**: A User can have zero preferences ``` ### Constraints & Validation Section ```markdown ## Data Constraints & Validation ### Business Logic Constraints - Users cannot have duplicate emails (enforced at database + application) - User phone numbers must be unique if provided - Email and phone cannot both be deleted/NULL in UserProfile ### Data Integrity Rules - password_hash must never be exposed in API responses - deleted_at cannot be set retroactively (only forward through time) - updated_at must be >= created_at ### Referential Integrity - Foreign key constraints enforced at database level - Cascade deletes on User deletion - No orphaned UserProfile or UserPreference records ### Enumeration Values **User.status** - `active` - Account is active - `inactive` - Account temporarily inactive - `suspended` - Account suspended (admin action) **UserProfile.theme** - `light` - Light theme - `dark` - Dark theme - `auto` - Follow system settings **UserPreference.preference_key** - Must match pattern: `[a-z_]+` - Examples: `notifications_email`, `export_format`, `ui_language` ``` ### Scaling Considerations Section ```markdown ## Scaling & Performance ### Expected Data Volume - Users: 100-1000 per day growth - Preferences: ~5-10 per user on average - Year 1 estimate: 36k users, ~180k preference records ### Table Sizes - User table: ~36MB (estimated year 1) - UserProfile table: ~28MB - UserPreference table: ~22MB ### Query Patterns & Indexes - Find user by email: Indexed (UNIQUE index on email) - Find all preferences for user: Indexed (composite on user_id, pref_key) - List users by creation date: Indexed (on created_at) - Filter users by status: Indexed (on status) ### Optimization Notes - Composite index `(user_id, preference_key)` enables efficient preference lookups - Email index enables fast login queries - Consider partitioning UserPreference by user_id for very large scale (100M+ records) ``` ### Migration & Change Management Section ```markdown ## Schema Evolution ### Creating These Tables ```sql CREATE TABLE user ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), email VARCHAR(255) UNIQUE NOT NULL, password_hash VARCHAR(255) NOT NULL, first_name VARCHAR(100), last_name VARCHAR(100), status VARCHAR(50) DEFAULT 'active' NOT NULL, created_at TIMESTAMP DEFAULT now() NOT NULL, updated_at TIMESTAMP DEFAULT now() NOT NULL, deleted_at TIMESTAMP ); CREATE TABLE user_profile ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), user_id UUID UNIQUE NOT NULL REFERENCES user(id) ON DELETE CASCADE, avatar_url VARCHAR(500), bio VARCHAR(500), phone VARCHAR(20) UNIQUE, timezone VARCHAR(50), language VARCHAR(5), theme VARCHAR(20), created_at TIMESTAMP DEFAULT now() NOT NULL, updated_at TIMESTAMP DEFAULT now() NOT NULL ); CREATE TABLE user_preference ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), user_id UUID NOT NULL REFERENCES user(id) ON DELETE CASCADE, preference_key VARCHAR(100) NOT NULL, preference_value VARCHAR(1000) NOT NULL, created_at TIMESTAMP DEFAULT now() NOT NULL, updated_at TIMESTAMP DEFAULT now() NOT NULL, UNIQUE(user_id, preference_key) ); CREATE INDEX idx_user_email ON user(email); CREATE INDEX idx_user_created_at ON user(created_at); CREATE INDEX idx_user_status ON user(status); CREATE INDEX idx_preference_lookup ON user_preference(user_id, preference_key); ``` ### Future Migrations - Q2 2024: Add `last_login_at` to User (nullable, new index) - Q3 2024: Implement user archival (age > 1 year, no activity) ``` ### Documentation & Examples Section ```markdown ## Example Queries ### Find user by email ```sql SELECT * FROM user WHERE email = 'user@example.com'; ``` ### Get user with profile ```sql SELECT u.*, p.* FROM user u LEFT JOIN user_profile p ON u.id = p.user_id WHERE u.id = $1; ``` ### Get user's preferences ```sql SELECT preference_key, preference_value FROM user_preference WHERE user_id = $1 ORDER BY created_at DESC; ``` ### Archive old inactive users ```sql UPDATE user SET deleted_at = now() WHERE status = 'inactive' AND updated_at < now() - interval '1 year' AND deleted_at IS NULL; ``` ``` ## Writing Tips ### Document Constraints Clearly - Why does each field have the constraints it does? - What validation rules apply? - What happens on constraint violations? ### Think About Scale - How much data will this table store? - What are the growth projections? - What indexing strategy is needed? - Will partitioning be needed in the future? ### Link to Related Specs - Reference technical requirements: `[PRD-001]` - Reference API contracts: `[API-001]` (what data is exposed) - Reference design documents: `[DES-001]` ### Include Examples - Sample SQL for common queries - Sample JSON representations - Example migration scripts ### Document Change Constraints - What fields can't change after creation? - What fields are immutable? - How do we handle schema evolution? ## Validation & Fixing Issues ### Run the Validator ```bash scripts/validate-spec.sh docs/specs/data-model/data-001-your-spec.md ``` ### Common Issues & Fixes **Issue**: "Missing entity field specifications" - **Fix**: Complete the fields table for each entity with types, constraints, descriptions **Issue**: "No relationships documented" - **Fix**: Add a relationships section showing foreign keys and cardinality **Issue**: "TODO items in Constraints (3 items)" - **Fix**: Complete constraint definitions, validation rules, and enumeration values **Issue**: "No scaling or performance information" - **Fix**: Add data volume estimates, indexing strategy, and optimization notes ## Decision-Making Framework As you write the data model, consider: 1. **Entity Design**: What entities do we need? - What are distinct concepts? - What are attributes vs. relationships? - Should data be normalized or denormalized? 2. **Relationships**: How do entities relate? - One-to-one, one-to-many, many-to-many? - Should relationships be required or optional? - How should deletions cascade? 3. **Constraints**: What rules must data satisfy? - Uniqueness constraints? - Required fields? - Data type restrictions? - Enumeration values? 4. **Performance**: How will data be queried? - What indexes are needed? - What's the expected scale? - Are there bottlenecks? 5. **Evolution**: How will this model change? - Can we add fields without migrations? - Can we add entities without breaking things? - How do we handle data migrations? ## Next Steps 1. **Create the spec**: `scripts/generate-spec.sh data-model data-XXX-slug` 2. **Define entities**: What are the main entities/tables? 3. **Specify fields** with types, constraints, descriptions 4. **Document relationships** between entities 5. **Plan indexes** for performance 6. **Validate**: `scripts/validate-spec.sh docs/specs/data-model/data-XXX-slug.md` 7. **Share with team** before implementation