Files
2025-11-30 08:45:31 +08:00

16 KiB

How to Create a Data Model Specification

Data Model specifications document the entities, fields, relationships, and constraints for your application's data. They define the "shape" of data your system works with.

Quick Start

# 1. Create a new data model
scripts/generate-spec.sh data-model data-001-descriptive-slug

# 2. Open and fill in the file
# (The file will be created at: docs/specs/data-model/data-001-descriptive-slug.md)

# 3. Fill in entities and relationships, then validate:
scripts/validate-spec.sh docs/specs/data-model/data-001-descriptive-slug.md

# 4. Fix issues and check completeness:
scripts/check-completeness.sh docs/specs/data-model/data-001-descriptive-slug.md

When to Write a Data Model

Use a Data Model when you need to:

  • Define database schema for new features
  • Document entity relationships and constraints
  • Establish consistent naming conventions
  • Enable API/UI teams to understand data structure
  • Plan data migrations or refactoring
  • Document complex data relationships

Research Phase

Find what this data model supports:

# Find technical requirements this fulfills
grep -r "prd\|technical" docs/specs/ --include="*.md"

# Find existing data models that might be related
grep -r "data\|model" docs/specs/ --include="*.md"

# Find API contracts that expose this data
grep -r "api\|endpoint" docs/specs/ --include="*.md"

2. Review Existing Data Models

  • What data modeling patterns does your codebase use?
  • What database are you using (PostgreSQL, MongoDB, etc.)?
  • How are relationships currently modeled?
  • Naming conventions for fields and entities?
  • Any legacy schema patterns to respect or migrate from?

3. Research Domain Models

  • How do industry-standard models structure similar data?
  • Are there existing standards (e.g., ISO, RFC) you should follow?
  • What are best practices in this domain?

4. Understand Business Rules

  • What constraints must the data satisfy?
  • What are the cardinality rules (one-to-many, many-to-many)?
  • What data must be unique or required?
  • What's the expected scale/volume?

Structure & Content Guide

Title & Metadata

  • Title: "User Data Model", "Transaction Model", etc.
  • Scope: What entities does this model cover?
  • Version: 1.0 for new models

Overview Section

Provide context:

# User & Profile Data Model

This data model defines the core entities for user management and profile
information. Covers user accounts, authentication data, and user preferences.

**Entities**: User, UserProfile, UserPreference
**Relationships**: User → UserProfile (1:1), User → UserPreference (1:many)
**Primary Database**: PostgreSQL

Entity Definitions Section

Document each entity/table:

Entity: User

### User

Core user account entity. Every user must have exactly one User record.

**Purpose**: Represents a user account in the system.

**Fields**

| Field | Type | Required | Unique | Default | Description |
|-------|------|----------|--------|---------|-------------|
| id | UUID | Yes | Yes | auto | Primary key, auto-generated |
| email | String(255) | Yes | Yes | - | User's email address, used for login |
| password_hash | String(255) | Yes | No | - | Bcrypt hash of password (cost=12) |
| first_name | String(100) | No | No | - | User's first name |
| last_name | String(100) | No | No | - | User's last name |
| status | Enum | Yes | No | active | Account status: active, inactive, suspended |
| created_at | Timestamp | Yes | No | now() | Account creation time (UTC) |
| updated_at | Timestamp | Yes | No | now() | Last update time (UTC) |
| deleted_at | Timestamp | No | No | NULL | Soft-delete timestamp, NULL if active |

**Indexes**
- Primary: `email` (unique for quick lookups)
- Secondary: `created_at` (for user listing/pagination)
- Secondary: `status` (for filtering active users)

**Constraints**
- Email format must be valid (enforced in application)
- Password must be at least 8 characters (enforced in application)
- Email must be globally unique
- Status can only be: active, inactive, suspended

**Data Volume**
- Expected growth: 100 new users/day
- Estimated year 1: ~36k users
- Estimated year 3: ~150k users

**Archival Strategy**
- Deleted users (deleted_at != NULL) moved to archive after 1 year
- Soft deletes used for data recovery capability

Entity: UserProfile

### UserProfile

Extended user profile information. One-to-one relationship with User.

**Purpose**: Stores optional user profile information separate from core account.

**Fields**

| Field | Type | Required | Unique | Description |
|-------|------|----------|--------|-------------|
| id | UUID | Yes | Yes | Primary key |
| user_id | UUID (FK) | Yes | Yes | Foreign key to User.id |
| avatar_url | String(500) | No | No | URL to user's avatar image |
| bio | String(500) | No | No | User bio/description |
| phone | String(20) | No | Yes | User phone number |
| timezone | String(50) | No | No | User's timezone (e.g., America/New_York) |
| language | String(5) | No | No | Preferred language (ISO 639-1, e.g., en, fr) |
| theme | Enum | No | No | UI theme preference: light, dark, auto |
| created_at | Timestamp | Yes | No | Creation time |
| updated_at | Timestamp | Yes | No | Last update time |

**Indexes**
- Primary: `user_id` (unique for 1:1 relationship)

**Constraints**
- Foreign key: user_id references User(id) ON DELETE CASCADE
- Phone must be valid format (if provided)
- Timezone must be valid (e.g., from IANA timezone database)
- Language must be valid ISO 639-1 code
- Theme must be one of: light, dark, auto

**Notes**
- Soft-deleted with parent User (CASCADE delete)
- Profile is optional - some users may not have profile data

Entity: UserPreference

### UserPreference

Key-value preferences for users. Flexible schema for future preference types.

**Purpose**: Stores user preferences without requiring schema changes.

**Fields**

| Field | Type | Required | Unique | Description |
|-------|------|----------|--------|-------------|
| id | UUID | Yes | Yes | Primary key |
| user_id | UUID (FK) | Yes | No | Foreign key to User.id |
| preference_key | String(100) | Yes | No | Preference identifier (e.g., notifications_email) |
| preference_value | String(1000) | Yes | No | Preference value as string |
| created_at | Timestamp | Yes | No | Creation time |
| updated_at | Timestamp | Yes | No | Last update time |

**Indexes**
- Composite: `(user_id, preference_key)` - For efficient preference lookup
- Primary: `user_id` - For finding all preferences for a user

**Constraints**
- Foreign key: user_id references User(id) ON DELETE CASCADE
- Composite unique: `(user_id, preference_key)` - One preference per key per user
- preference_key must match pattern: `[a-z_]+` (lowercase letters and underscores only)
- preference_value must be valid JSON or simple string

**Valid Preferences**
Examples of preference_key values:
- `notifications_email` → "true"/"false"
- `notifications_sms` → "true"/"false"
- `export_format` → "csv"/"json"
- `ui_columns_per_page` → "20"/"50"/"100"

**Notes**
- Flexible key-value design allows adding preferences without schema changes
- Values stored as strings for flexibility, parsed by application layer

Relationships Section

Document how entities relate:

## Entity Relationships

┌───────────┐ ┌──────────────┐ ┌─────────────┐ │ User │ │ UserProfile │ │ UserPref │ ├───────────┤ ├──────────────┤ ├─────────────┤ │ id (PK) │ │ id (PK) │ │ id (PK) │ │ email │◄───1:1──│ user_id (FK) │ │ user_id(FK) │ │ ... │ │ avatar_url │ │ pref_key │ └───────────┘ │ ... │ │ pref_value │ └──────────────┘ └─────────────┘ ▲ │ 1:many


### Relationship: User → UserProfile (1:1)
- **Type**: One-to-One
- **Foreign Key**: UserProfile.user_id → User.id
- **Cardinality**: A User has exactly one UserProfile; a UserProfile belongs to exactly one User
- **Delete Behavior**: CASCADE - Deleting User deletes UserProfile
- **Optional**: UserProfile is optional (some users may not have detailed profile)

### Relationship: User → UserPreference (1:many)
- **Type**: One-to-Many
- **Foreign Key**: UserPreference.user_id → User.id
- **Cardinality**: A User can have many UserPreferences; each UserPreference belongs to one User
- **Delete Behavior**: CASCADE - Deleting User deletes all preferences
- **Optional**: A User can have zero preferences

Constraints & Validation Section

## Data Constraints & Validation

### Business Logic Constraints
- Users cannot have duplicate emails (enforced at database + application)
- User phone numbers must be unique if provided
- Email and phone cannot both be deleted/NULL in UserProfile

### Data Integrity Rules
- password_hash must never be exposed in API responses
- deleted_at cannot be set retroactively (only forward through time)
- updated_at must be >= created_at

### Referential Integrity
- Foreign key constraints enforced at database level
- Cascade deletes on User deletion
- No orphaned UserProfile or UserPreference records

### Enumeration Values

**User.status**
- `active` - Account is active
- `inactive` - Account temporarily inactive
- `suspended` - Account suspended (admin action)

**UserProfile.theme**
- `light` - Light theme
- `dark` - Dark theme
- `auto` - Follow system settings

**UserPreference.preference_key**
- Must match pattern: `[a-z_]+`
- Examples: `notifications_email`, `export_format`, `ui_language`

Scaling Considerations Section

## Scaling & Performance

### Expected Data Volume
- Users: 100-1000 per day growth
- Preferences: ~5-10 per user on average
- Year 1 estimate: 36k users, ~180k preference records

### Table Sizes
- User table: ~36MB (estimated year 1)
- UserProfile table: ~28MB
- UserPreference table: ~22MB

### Query Patterns & Indexes
- Find user by email: Indexed (UNIQUE index on email)
- Find all preferences for user: Indexed (composite on user_id, pref_key)
- List users by creation date: Indexed (on created_at)
- Filter users by status: Indexed (on status)

### Optimization Notes
- Composite index `(user_id, preference_key)` enables efficient preference lookups
- Email index enables fast login queries
- Consider partitioning UserPreference by user_id for very large scale (100M+ records)

Migration & Change Management Section

## Schema Evolution

### Creating These Tables
```sql
CREATE TABLE user (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  email VARCHAR(255) UNIQUE NOT NULL,
  password_hash VARCHAR(255) NOT NULL,
  first_name VARCHAR(100),
  last_name VARCHAR(100),
  status VARCHAR(50) DEFAULT 'active' NOT NULL,
  created_at TIMESTAMP DEFAULT now() NOT NULL,
  updated_at TIMESTAMP DEFAULT now() NOT NULL,
  deleted_at TIMESTAMP
);

CREATE TABLE user_profile (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID UNIQUE NOT NULL REFERENCES user(id) ON DELETE CASCADE,
  avatar_url VARCHAR(500),
  bio VARCHAR(500),
  phone VARCHAR(20) UNIQUE,
  timezone VARCHAR(50),
  language VARCHAR(5),
  theme VARCHAR(20),
  created_at TIMESTAMP DEFAULT now() NOT NULL,
  updated_at TIMESTAMP DEFAULT now() NOT NULL
);

CREATE TABLE user_preference (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID NOT NULL REFERENCES user(id) ON DELETE CASCADE,
  preference_key VARCHAR(100) NOT NULL,
  preference_value VARCHAR(1000) NOT NULL,
  created_at TIMESTAMP DEFAULT now() NOT NULL,
  updated_at TIMESTAMP DEFAULT now() NOT NULL,
  UNIQUE(user_id, preference_key)
);

CREATE INDEX idx_user_email ON user(email);
CREATE INDEX idx_user_created_at ON user(created_at);
CREATE INDEX idx_user_status ON user(status);
CREATE INDEX idx_preference_lookup ON user_preference(user_id, preference_key);

Future Migrations

  • Q2 2024: Add last_login_at to User (nullable, new index)
  • Q3 2024: Implement user archival (age > 1 year, no activity)

### Documentation & Examples Section

```markdown
## Example Queries

### Find user by email
```sql
SELECT * FROM user WHERE email = 'user@example.com';

Get user with profile

SELECT u.*, p.*
FROM user u
LEFT JOIN user_profile p ON u.id = p.user_id
WHERE u.id = $1;

Get user's preferences

SELECT preference_key, preference_value
FROM user_preference
WHERE user_id = $1
ORDER BY created_at DESC;

Archive old inactive users

UPDATE user
SET deleted_at = now()
WHERE status = 'inactive' AND updated_at < now() - interval '1 year'
AND deleted_at IS NULL;

## Writing Tips

### Document Constraints Clearly
- Why does each field have the constraints it does?
- What validation rules apply?
- What happens on constraint violations?

### Think About Scale
- How much data will this table store?
- What are the growth projections?
- What indexing strategy is needed?
- Will partitioning be needed in the future?

### Link to Related Specs
- Reference technical requirements: `[PRD-001]`
- Reference API contracts: `[API-001]` (what data is exposed)
- Reference design documents: `[DES-001]`

### Include Examples
- Sample SQL for common queries
- Sample JSON representations
- Example migration scripts

### Document Change Constraints
- What fields can't change after creation?
- What fields are immutable?
- How do we handle schema evolution?

## Validation & Fixing Issues

### Run the Validator
```bash
scripts/validate-spec.sh docs/specs/data-model/data-001-your-spec.md

Common Issues & Fixes

Issue: "Missing entity field specifications"

  • Fix: Complete the fields table for each entity with types, constraints, descriptions

Issue: "No relationships documented"

  • Fix: Add a relationships section showing foreign keys and cardinality

Issue: "TODO items in Constraints (3 items)"

  • Fix: Complete constraint definitions, validation rules, and enumeration values

Issue: "No scaling or performance information"

  • Fix: Add data volume estimates, indexing strategy, and optimization notes

Decision-Making Framework

As you write the data model, consider:

  1. Entity Design: What entities do we need?

    • What are distinct concepts?
    • What are attributes vs. relationships?
    • Should data be normalized or denormalized?
  2. Relationships: How do entities relate?

    • One-to-one, one-to-many, many-to-many?
    • Should relationships be required or optional?
    • How should deletions cascade?
  3. Constraints: What rules must data satisfy?

    • Uniqueness constraints?
    • Required fields?
    • Data type restrictions?
    • Enumeration values?
  4. Performance: How will data be queried?

    • What indexes are needed?
    • What's the expected scale?
    • Are there bottlenecks?
  5. Evolution: How will this model change?

    • Can we add fields without migrations?
    • Can we add entities without breaking things?
    • How do we handle data migrations?

Next Steps

  1. Create the spec: scripts/generate-spec.sh data-model data-XXX-slug
  2. Define entities: What are the main entities/tables?
  3. Specify fields with types, constraints, descriptions
  4. Document relationships between entities
  5. Plan indexes for performance
  6. Validate: scripts/validate-spec.sh docs/specs/data-model/data-XXX-slug.md
  7. Share with team before implementation