Files
gh-onezerocompany-claude-pr…/skills/spec-author/guides/data-model.md
2025-11-30 08:45:31 +08:00

491 lines
16 KiB
Markdown

# How to Create a Data Model Specification
Data Model specifications document the entities, fields, relationships, and constraints for your application's data. They define the "shape" of data your system works with.
## Quick Start
```bash
# 1. Create a new data model
scripts/generate-spec.sh data-model data-001-descriptive-slug
# 2. Open and fill in the file
# (The file will be created at: docs/specs/data-model/data-001-descriptive-slug.md)
# 3. Fill in entities and relationships, then validate:
scripts/validate-spec.sh docs/specs/data-model/data-001-descriptive-slug.md
# 4. Fix issues and check completeness:
scripts/check-completeness.sh docs/specs/data-model/data-001-descriptive-slug.md
```
## When to Write a Data Model
Use a Data Model when you need to:
- Define database schema for new features
- Document entity relationships and constraints
- Establish consistent naming conventions
- Enable API/UI teams to understand data structure
- Plan data migrations or refactoring
- Document complex data relationships
## Research Phase
### 1. Research Related Specifications
Find what this data model supports:
```bash
# Find technical requirements this fulfills
grep -r "prd\|technical" docs/specs/ --include="*.md"
# Find existing data models that might be related
grep -r "data\|model" docs/specs/ --include="*.md"
# Find API contracts that expose this data
grep -r "api\|endpoint" docs/specs/ --include="*.md"
```
### 2. Review Existing Data Models
- What data modeling patterns does your codebase use?
- What database are you using (PostgreSQL, MongoDB, etc.)?
- How are relationships currently modeled?
- Naming conventions for fields and entities?
- Any legacy schema patterns to respect or migrate from?
### 3. Research Domain Models
- How do industry-standard models structure similar data?
- Are there existing standards (e.g., ISO, RFC) you should follow?
- What are best practices in this domain?
### 4. Understand Business Rules
- What constraints must the data satisfy?
- What are the cardinality rules (one-to-many, many-to-many)?
- What data must be unique or required?
- What's the expected scale/volume?
## Structure & Content Guide
### Title & Metadata
- **Title**: "User Data Model", "Transaction Model", etc.
- **Scope**: What entities does this model cover?
- **Version**: 1.0 for new models
### Overview Section
Provide context:
```markdown
# User & Profile Data Model
This data model defines the core entities for user management and profile
information. Covers user accounts, authentication data, and user preferences.
**Entities**: User, UserProfile, UserPreference
**Relationships**: User → UserProfile (1:1), User → UserPreference (1:many)
**Primary Database**: PostgreSQL
```
### Entity Definitions Section
Document each entity/table:
#### Entity: User
```markdown
### User
Core user account entity. Every user must have exactly one User record.
**Purpose**: Represents a user account in the system.
**Fields**
| Field | Type | Required | Unique | Default | Description |
|-------|------|----------|--------|---------|-------------|
| id | UUID | Yes | Yes | auto | Primary key, auto-generated |
| email | String(255) | Yes | Yes | - | User's email address, used for login |
| password_hash | String(255) | Yes | No | - | Bcrypt hash of password (cost=12) |
| first_name | String(100) | No | No | - | User's first name |
| last_name | String(100) | No | No | - | User's last name |
| status | Enum | Yes | No | active | Account status: active, inactive, suspended |
| created_at | Timestamp | Yes | No | now() | Account creation time (UTC) |
| updated_at | Timestamp | Yes | No | now() | Last update time (UTC) |
| deleted_at | Timestamp | No | No | NULL | Soft-delete timestamp, NULL if active |
**Indexes**
- Primary: `email` (unique for quick lookups)
- Secondary: `created_at` (for user listing/pagination)
- Secondary: `status` (for filtering active users)
**Constraints**
- Email format must be valid (enforced in application)
- Password must be at least 8 characters (enforced in application)
- Email must be globally unique
- Status can only be: active, inactive, suspended
**Data Volume**
- Expected growth: 100 new users/day
- Estimated year 1: ~36k users
- Estimated year 3: ~150k users
**Archival Strategy**
- Deleted users (deleted_at != NULL) moved to archive after 1 year
- Soft deletes used for data recovery capability
```
#### Entity: UserProfile
```markdown
### UserProfile
Extended user profile information. One-to-one relationship with User.
**Purpose**: Stores optional user profile information separate from core account.
**Fields**
| Field | Type | Required | Unique | Description |
|-------|------|----------|--------|-------------|
| id | UUID | Yes | Yes | Primary key |
| user_id | UUID (FK) | Yes | Yes | Foreign key to User.id |
| avatar_url | String(500) | No | No | URL to user's avatar image |
| bio | String(500) | No | No | User bio/description |
| phone | String(20) | No | Yes | User phone number |
| timezone | String(50) | No | No | User's timezone (e.g., America/New_York) |
| language | String(5) | No | No | Preferred language (ISO 639-1, e.g., en, fr) |
| theme | Enum | No | No | UI theme preference: light, dark, auto |
| created_at | Timestamp | Yes | No | Creation time |
| updated_at | Timestamp | Yes | No | Last update time |
**Indexes**
- Primary: `user_id` (unique for 1:1 relationship)
**Constraints**
- Foreign key: user_id references User(id) ON DELETE CASCADE
- Phone must be valid format (if provided)
- Timezone must be valid (e.g., from IANA timezone database)
- Language must be valid ISO 639-1 code
- Theme must be one of: light, dark, auto
**Notes**
- Soft-deleted with parent User (CASCADE delete)
- Profile is optional - some users may not have profile data
```
#### Entity: UserPreference
```markdown
### UserPreference
Key-value preferences for users. Flexible schema for future preference types.
**Purpose**: Stores user preferences without requiring schema changes.
**Fields**
| Field | Type | Required | Unique | Description |
|-------|------|----------|--------|-------------|
| id | UUID | Yes | Yes | Primary key |
| user_id | UUID (FK) | Yes | No | Foreign key to User.id |
| preference_key | String(100) | Yes | No | Preference identifier (e.g., notifications_email) |
| preference_value | String(1000) | Yes | No | Preference value as string |
| created_at | Timestamp | Yes | No | Creation time |
| updated_at | Timestamp | Yes | No | Last update time |
**Indexes**
- Composite: `(user_id, preference_key)` - For efficient preference lookup
- Primary: `user_id` - For finding all preferences for a user
**Constraints**
- Foreign key: user_id references User(id) ON DELETE CASCADE
- Composite unique: `(user_id, preference_key)` - One preference per key per user
- preference_key must match pattern: `[a-z_]+` (lowercase letters and underscores only)
- preference_value must be valid JSON or simple string
**Valid Preferences**
Examples of preference_key values:
- `notifications_email` → "true"/"false"
- `notifications_sms` → "true"/"false"
- `export_format` → "csv"/"json"
- `ui_columns_per_page` → "20"/"50"/"100"
**Notes**
- Flexible key-value design allows adding preferences without schema changes
- Values stored as strings for flexibility, parsed by application layer
```
### Relationships Section
Document how entities relate:
```markdown
## Entity Relationships
```
┌───────────┐ ┌──────────────┐ ┌─────────────┐
│ User │ │ UserProfile │ │ UserPref │
├───────────┤ ├──────────────┤ ├─────────────┤
│ id (PK) │ │ id (PK) │ │ id (PK) │
│ email │◄───1:1──│ user_id (FK) │ │ user_id(FK) │
│ ... │ │ avatar_url │ │ pref_key │
└───────────┘ │ ... │ │ pref_value │
└──────────────┘ └─────────────┘
1:many
```
### Relationship: User → UserProfile (1:1)
- **Type**: One-to-One
- **Foreign Key**: UserProfile.user_id → User.id
- **Cardinality**: A User has exactly one UserProfile; a UserProfile belongs to exactly one User
- **Delete Behavior**: CASCADE - Deleting User deletes UserProfile
- **Optional**: UserProfile is optional (some users may not have detailed profile)
### Relationship: User → UserPreference (1:many)
- **Type**: One-to-Many
- **Foreign Key**: UserPreference.user_id → User.id
- **Cardinality**: A User can have many UserPreferences; each UserPreference belongs to one User
- **Delete Behavior**: CASCADE - Deleting User deletes all preferences
- **Optional**: A User can have zero preferences
```
### Constraints & Validation Section
```markdown
## Data Constraints & Validation
### Business Logic Constraints
- Users cannot have duplicate emails (enforced at database + application)
- User phone numbers must be unique if provided
- Email and phone cannot both be deleted/NULL in UserProfile
### Data Integrity Rules
- password_hash must never be exposed in API responses
- deleted_at cannot be set retroactively (only forward through time)
- updated_at must be >= created_at
### Referential Integrity
- Foreign key constraints enforced at database level
- Cascade deletes on User deletion
- No orphaned UserProfile or UserPreference records
### Enumeration Values
**User.status**
- `active` - Account is active
- `inactive` - Account temporarily inactive
- `suspended` - Account suspended (admin action)
**UserProfile.theme**
- `light` - Light theme
- `dark` - Dark theme
- `auto` - Follow system settings
**UserPreference.preference_key**
- Must match pattern: `[a-z_]+`
- Examples: `notifications_email`, `export_format`, `ui_language`
```
### Scaling Considerations Section
```markdown
## Scaling & Performance
### Expected Data Volume
- Users: 100-1000 per day growth
- Preferences: ~5-10 per user on average
- Year 1 estimate: 36k users, ~180k preference records
### Table Sizes
- User table: ~36MB (estimated year 1)
- UserProfile table: ~28MB
- UserPreference table: ~22MB
### Query Patterns & Indexes
- Find user by email: Indexed (UNIQUE index on email)
- Find all preferences for user: Indexed (composite on user_id, pref_key)
- List users by creation date: Indexed (on created_at)
- Filter users by status: Indexed (on status)
### Optimization Notes
- Composite index `(user_id, preference_key)` enables efficient preference lookups
- Email index enables fast login queries
- Consider partitioning UserPreference by user_id for very large scale (100M+ records)
```
### Migration & Change Management Section
```markdown
## Schema Evolution
### Creating These Tables
```sql
CREATE TABLE user (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
email VARCHAR(255) UNIQUE NOT NULL,
password_hash VARCHAR(255) NOT NULL,
first_name VARCHAR(100),
last_name VARCHAR(100),
status VARCHAR(50) DEFAULT 'active' NOT NULL,
created_at TIMESTAMP DEFAULT now() NOT NULL,
updated_at TIMESTAMP DEFAULT now() NOT NULL,
deleted_at TIMESTAMP
);
CREATE TABLE user_profile (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID UNIQUE NOT NULL REFERENCES user(id) ON DELETE CASCADE,
avatar_url VARCHAR(500),
bio VARCHAR(500),
phone VARCHAR(20) UNIQUE,
timezone VARCHAR(50),
language VARCHAR(5),
theme VARCHAR(20),
created_at TIMESTAMP DEFAULT now() NOT NULL,
updated_at TIMESTAMP DEFAULT now() NOT NULL
);
CREATE TABLE user_preference (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID NOT NULL REFERENCES user(id) ON DELETE CASCADE,
preference_key VARCHAR(100) NOT NULL,
preference_value VARCHAR(1000) NOT NULL,
created_at TIMESTAMP DEFAULT now() NOT NULL,
updated_at TIMESTAMP DEFAULT now() NOT NULL,
UNIQUE(user_id, preference_key)
);
CREATE INDEX idx_user_email ON user(email);
CREATE INDEX idx_user_created_at ON user(created_at);
CREATE INDEX idx_user_status ON user(status);
CREATE INDEX idx_preference_lookup ON user_preference(user_id, preference_key);
```
### Future Migrations
- Q2 2024: Add `last_login_at` to User (nullable, new index)
- Q3 2024: Implement user archival (age > 1 year, no activity)
```
### Documentation & Examples Section
```markdown
## Example Queries
### Find user by email
```sql
SELECT * FROM user WHERE email = 'user@example.com';
```
### Get user with profile
```sql
SELECT u.*, p.*
FROM user u
LEFT JOIN user_profile p ON u.id = p.user_id
WHERE u.id = $1;
```
### Get user's preferences
```sql
SELECT preference_key, preference_value
FROM user_preference
WHERE user_id = $1
ORDER BY created_at DESC;
```
### Archive old inactive users
```sql
UPDATE user
SET deleted_at = now()
WHERE status = 'inactive' AND updated_at < now() - interval '1 year'
AND deleted_at IS NULL;
```
```
## Writing Tips
### Document Constraints Clearly
- Why does each field have the constraints it does?
- What validation rules apply?
- What happens on constraint violations?
### Think About Scale
- How much data will this table store?
- What are the growth projections?
- What indexing strategy is needed?
- Will partitioning be needed in the future?
### Link to Related Specs
- Reference technical requirements: `[PRD-001]`
- Reference API contracts: `[API-001]` (what data is exposed)
- Reference design documents: `[DES-001]`
### Include Examples
- Sample SQL for common queries
- Sample JSON representations
- Example migration scripts
### Document Change Constraints
- What fields can't change after creation?
- What fields are immutable?
- How do we handle schema evolution?
## Validation & Fixing Issues
### Run the Validator
```bash
scripts/validate-spec.sh docs/specs/data-model/data-001-your-spec.md
```
### Common Issues & Fixes
**Issue**: "Missing entity field specifications"
- **Fix**: Complete the fields table for each entity with types, constraints, descriptions
**Issue**: "No relationships documented"
- **Fix**: Add a relationships section showing foreign keys and cardinality
**Issue**: "TODO items in Constraints (3 items)"
- **Fix**: Complete constraint definitions, validation rules, and enumeration values
**Issue**: "No scaling or performance information"
- **Fix**: Add data volume estimates, indexing strategy, and optimization notes
## Decision-Making Framework
As you write the data model, consider:
1. **Entity Design**: What entities do we need?
- What are distinct concepts?
- What are attributes vs. relationships?
- Should data be normalized or denormalized?
2. **Relationships**: How do entities relate?
- One-to-one, one-to-many, many-to-many?
- Should relationships be required or optional?
- How should deletions cascade?
3. **Constraints**: What rules must data satisfy?
- Uniqueness constraints?
- Required fields?
- Data type restrictions?
- Enumeration values?
4. **Performance**: How will data be queried?
- What indexes are needed?
- What's the expected scale?
- Are there bottlenecks?
5. **Evolution**: How will this model change?
- Can we add fields without migrations?
- Can we add entities without breaking things?
- How do we handle data migrations?
## Next Steps
1. **Create the spec**: `scripts/generate-spec.sh data-model data-XXX-slug`
2. **Define entities**: What are the main entities/tables?
3. **Specify fields** with types, constraints, descriptions
4. **Document relationships** between entities
5. **Plan indexes** for performance
6. **Validate**: `scripts/validate-spec.sh docs/specs/data-model/data-XXX-slug.md`
7. **Share with team** before implementation