zhongwei/gh-onezerocompany-claude-project-basics

Files

Zhongwei Li ca9b85ccda Initial commit

2025-11-30 08:45:31 +08:00

16 KiB

Raw Permalink Blame History

How to Create a Data Model Specification

Data Model specifications document the entities, fields, relationships, and constraints for your application's data. They define the "shape" of data your system works with.

Quick Start

# 1. Create a new data model
scripts/generate-spec.sh data-model data-001-descriptive-slug

# 2. Open and fill in the file
# (The file will be created at: docs/specs/data-model/data-001-descriptive-slug.md)

# 3. Fill in entities and relationships, then validate:
scripts/validate-spec.sh docs/specs/data-model/data-001-descriptive-slug.md

# 4. Fix issues and check completeness:
scripts/check-completeness.sh docs/specs/data-model/data-001-descriptive-slug.md

When to Write a Data Model

Use a Data Model when you need to:

Define database schema for new features
Document entity relationships and constraints
Establish consistent naming conventions
Enable API/UI teams to understand data structure
Plan data migrations or refactoring
Document complex data relationships

Research Phase

Find what this data model supports:

# Find technical requirements this fulfills
grep -r "prd\|technical" docs/specs/ --include="*.md"

# Find existing data models that might be related
grep -r "data\|model" docs/specs/ --include="*.md"

# Find API contracts that expose this data
grep -r "api\|endpoint" docs/specs/ --include="*.md"

2. Review Existing Data Models

What data modeling patterns does your codebase use?
What database are you using (PostgreSQL, MongoDB, etc.)?
How are relationships currently modeled?
Naming conventions for fields and entities?
Any legacy schema patterns to respect or migrate from?

3. Research Domain Models

How do industry-standard models structure similar data?
Are there existing standards (e.g., ISO, RFC) you should follow?
What are best practices in this domain?

4. Understand Business Rules

What constraints must the data satisfy?
What are the cardinality rules (one-to-many, many-to-many)?
What data must be unique or required?
What's the expected scale/volume?

Structure & Content Guide

Title & Metadata

Title: "User Data Model", "Transaction Model", etc.
Scope: What entities does this model cover?
Version: 1.0 for new models

Overview Section

Provide context:

# User & Profile Data Model

This data model defines the core entities for user management and profile
information. Covers user accounts, authentication data, and user preferences.

**Entities**: User, UserProfile, UserPreference
**Relationships**: User → UserProfile (1:1), User → UserPreference (1:many)
**Primary Database**: PostgreSQL

Entity Definitions Section

Document each entity/table:

Entity: User

### User

Core user account entity. Every user must have exactly one User record.

**Purpose**: Represents a user account in the system.

**Fields**

| Field | Type | Required | Unique | Default | Description |
|-------|------|----------|--------|---------|-------------|
| id | UUID | Yes | Yes | auto | Primary key, auto-generated |
| email | String(255) | Yes | Yes | - | User's email address, used for login |
| password_hash | String(255) | Yes | No | - | Bcrypt hash of password (cost=12) |
| first_name | String(100) | No | No | - | User's first name |
| last_name | String(100) | No | No | - | User's last name |
| status | Enum | Yes | No | active | Account status: active, inactive, suspended |
| created_at | Timestamp | Yes | No | now() | Account creation time (UTC) |
| updated_at | Timestamp | Yes | No | now() | Last update time (UTC) |
| deleted_at | Timestamp | No | No | NULL | Soft-delete timestamp, NULL if active |

**Indexes**
- Primary: `email` (unique for quick lookups)
- Secondary: `created_at` (for user listing/pagination)
- Secondary: `status` (for filtering active users)

**Constraints**
- Email format must be valid (enforced in application)
- Password must be at least 8 characters (enforced in application)
- Email must be globally unique
- Status can only be: active, inactive, suspended

**Data Volume**
- Expected growth: 100 new users/day
- Estimated year 1: ~36k users
- Estimated year 3: ~150k users

**Archival Strategy**
- Deleted users (deleted_at != NULL) moved to archive after 1 year
- Soft deletes used for data recovery capability

Entity: UserProfile

### UserProfile

Extended user profile information. One-to-one relationship with User.

**Purpose**: Stores optional user profile information separate from core account.

**Fields**

| Field | Type | Required | Unique | Description |
|-------|------|----------|--------|-------------|
| id | UUID | Yes | Yes | Primary key |
| user_id | UUID (FK) | Yes | Yes | Foreign key to User.id |
| avatar_url | String(500) | No | No | URL to user's avatar image |
| bio | String(500) | No | No | User bio/description |
| phone | String(20) | No | Yes | User phone number |
| timezone | String(50) | No | No | User's timezone (e.g., America/New_York) |
| language | String(5) | No | No | Preferred language (ISO 639-1, e.g., en, fr) |
| theme | Enum | No | No | UI theme preference: light, dark, auto |
| created_at | Timestamp | Yes | No | Creation time |
| updated_at | Timestamp | Yes | No | Last update time |

**Indexes**
- Primary: `user_id` (unique for 1:1 relationship)

**Constraints**
- Foreign key: user_id references User(id) ON DELETE CASCADE
- Phone must be valid format (if provided)
- Timezone must be valid (e.g., from IANA timezone database)
- Language must be valid ISO 639-1 code
- Theme must be one of: light, dark, auto

**Notes**
- Soft-deleted with parent User (CASCADE delete)
- Profile is optional - some users may not have profile data

Entity: UserPreference

### UserPreference

Key-value preferences for users. Flexible schema for future preference types.

**Purpose**: Stores user preferences without requiring schema changes.

**Fields**

| Field | Type | Required | Unique | Description |
|-------|------|----------|--------|-------------|
| id | UUID | Yes | Yes | Primary key |
| user_id | UUID (FK) | Yes | No | Foreign key to User.id |
| preference_key | String(100) | Yes | No | Preference identifier (e.g., notifications_email) |
| preference_value | String(1000) | Yes | No | Preference value as string |
| created_at | Timestamp | Yes | No | Creation time |
| updated_at | Timestamp | Yes | No | Last update time |

**Indexes**
- Composite: `(user_id, preference_key)` - For efficient preference lookup
- Primary: `user_id` - For finding all preferences for a user

**Constraints**
- Foreign key: user_id references User(id) ON DELETE CASCADE
- Composite unique: `(user_id, preference_key)` - One preference per key per user
- preference_key must match pattern: `[a-z_]+` (lowercase letters and underscores only)
- preference_value must be valid JSON or simple string

**Valid Preferences**
Examples of preference_key values:
- `notifications_email` → "true"/"false"
- `notifications_sms` → "true"/"false"
- `export_format` → "csv"/"json"
- `ui_columns_per_page` → "20"/"50"/"100"

**Notes**
- Flexible key-value design allows adding preferences without schema changes
- Values stored as strings for flexibility, parsed by application layer

Relationships Section

Document how entities relate:

## Entity Relationships

┌───────────┐ ┌──────────────┐ ┌─────────────┐ │ User │ │ UserProfile │ │ UserPref │ ├───────────┤ ├──────────────┤ ├─────────────┤ │ id (PK) │ │ id (PK) │ │ id (PK) │ │ email │◄───1:1──│ user_id (FK) │ │ user_id(FK) │ │ ... │ │ avatar_url │ │ pref_key │ └───────────┘ │ ... │ │ pref_value │ └──────────────┘ └─────────────┘ ▲ │ 1:many


### Relationship: User → UserProfile (1:1)
- **Type**: One-to-One
- **Foreign Key**: UserProfile.user_id → User.id
- **Cardinality**: A User has exactly one UserProfile; a UserProfile belongs to exactly one User
- **Delete Behavior**: CASCADE - Deleting User deletes UserProfile
- **Optional**: UserProfile is optional (some users may not have detailed profile)

### Relationship: User → UserPreference (1:many)
- **Type**: One-to-Many
- **Foreign Key**: UserPreference.user_id → User.id
- **Cardinality**: A User can have many UserPreferences; each UserPreference belongs to one User
- **Delete Behavior**: CASCADE - Deleting User deletes all preferences
- **Optional**: A User can have zero preferences

Constraints & Validation Section

## Data Constraints & Validation

### Business Logic Constraints
- Users cannot have duplicate emails (enforced at database + application)
- User phone numbers must be unique if provided
- Email and phone cannot both be deleted/NULL in UserProfile

### Data Integrity Rules
- password_hash must never be exposed in API responses
- deleted_at cannot be set retroactively (only forward through time)
- updated_at must be >= created_at

### Referential Integrity
- Foreign key constraints enforced at database level
- Cascade deletes on User deletion
- No orphaned UserProfile or UserPreference records

### Enumeration Values

**User.status**
- `active` - Account is active
- `inactive` - Account temporarily inactive
- `suspended` - Account suspended (admin action)

**UserProfile.theme**
- `light` - Light theme
- `dark` - Dark theme
- `auto` - Follow system settings

**UserPreference.preference_key**
- Must match pattern: `[a-z_]+`
- Examples: `notifications_email`, `export_format`, `ui_language`

Scaling Considerations Section

## Scaling & Performance

### Expected Data Volume
- Users: 100-1000 per day growth
- Preferences: ~5-10 per user on average
- Year 1 estimate: 36k users, ~180k preference records

### Table Sizes
- User table: ~36MB (estimated year 1)
- UserProfile table: ~28MB
- UserPreference table: ~22MB

### Query Patterns & Indexes
- Find user by email: Indexed (UNIQUE index on email)
- Find all preferences for user: Indexed (composite on user_id, pref_key)
- List users by creation date: Indexed (on created_at)
- Filter users by status: Indexed (on status)

### Optimization Notes
- Composite index `(user_id, preference_key)` enables efficient preference lookups
- Email index enables fast login queries
- Consider partitioning UserPreference by user_id for very large scale (100M+ records)

Migration & Change Management Section

## Schema Evolution

### Creating These Tables
```sql
CREATE TABLE user (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  email VARCHAR(255) UNIQUE NOT NULL,
  password_hash VARCHAR(255) NOT NULL,
  first_name VARCHAR(100),
  last_name VARCHAR(100),
  status VARCHAR(50) DEFAULT 'active' NOT NULL,
  created_at TIMESTAMP DEFAULT now() NOT NULL,
  updated_at TIMESTAMP DEFAULT now() NOT NULL,
  deleted_at TIMESTAMP
);

CREATE TABLE user_profile (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID UNIQUE NOT NULL REFERENCES user(id) ON DELETE CASCADE,
  avatar_url VARCHAR(500),
  bio VARCHAR(500),
  phone VARCHAR(20) UNIQUE,
  timezone VARCHAR(50),
  language VARCHAR(5),
  theme VARCHAR(20),
  created_at TIMESTAMP DEFAULT now() NOT NULL,
  updated_at TIMESTAMP DEFAULT now() NOT NULL
);

CREATE TABLE user_preference (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID NOT NULL REFERENCES user(id) ON DELETE CASCADE,
  preference_key VARCHAR(100) NOT NULL,
  preference_value VARCHAR(1000) NOT NULL,
  created_at TIMESTAMP DEFAULT now() NOT NULL,
  updated_at TIMESTAMP DEFAULT now() NOT NULL,
  UNIQUE(user_id, preference_key)
);

CREATE INDEX idx_user_email ON user(email);
CREATE INDEX idx_user_created_at ON user(created_at);
CREATE INDEX idx_user_status ON user(status);
CREATE INDEX idx_preference_lookup ON user_preference(user_id, preference_key);

Future Migrations

Q2 2024: Add last_login_at to User (nullable, new index)
Q3 2024: Implement user archival (age > 1 year, no activity)


### Documentation & Examples Section

```markdown
## Example Queries

### Find user by email
```sql
SELECT * FROM user WHERE email = 'user@example.com';

Get user with profile

SELECT u.*, p.*
FROM user u
LEFT JOIN user_profile p ON u.id = p.user_id
WHERE u.id = $1;

Get user's preferences

SELECT preference_key, preference_value
FROM user_preference
WHERE user_id = $1
ORDER BY created_at DESC;

Archive old inactive users

UPDATE user
SET deleted_at = now()
WHERE status = 'inactive' AND updated_at < now() - interval '1 year'
AND deleted_at IS NULL;


## Writing Tips

### Document Constraints Clearly
- Why does each field have the constraints it does?
- What validation rules apply?
- What happens on constraint violations?

### Think About Scale
- How much data will this table store?
- What are the growth projections?
- What indexing strategy is needed?
- Will partitioning be needed in the future?

### Link to Related Specs
- Reference technical requirements: `[PRD-001]`
- Reference API contracts: `[API-001]` (what data is exposed)
- Reference design documents: `[DES-001]`

### Include Examples
- Sample SQL for common queries
- Sample JSON representations
- Example migration scripts

### Document Change Constraints
- What fields can't change after creation?
- What fields are immutable?
- How do we handle schema evolution?

## Validation & Fixing Issues

### Run the Validator
```bash
scripts/validate-spec.sh docs/specs/data-model/data-001-your-spec.md

Common Issues & Fixes

Issue: "Missing entity field specifications"

Fix: Complete the fields table for each entity with types, constraints, descriptions

Issue: "No relationships documented"

Fix: Add a relationships section showing foreign keys and cardinality

Issue: "TODO items in Constraints (3 items)"

Fix: Complete constraint definitions, validation rules, and enumeration values

Issue: "No scaling or performance information"

Fix: Add data volume estimates, indexing strategy, and optimization notes

Decision-Making Framework

As you write the data model, consider:

Entity Design: What entities do we need?
- What are distinct concepts?
- What are attributes vs. relationships?
- Should data be normalized or denormalized?
Relationships: How do entities relate?
- One-to-one, one-to-many, many-to-many?
- Should relationships be required or optional?
- How should deletions cascade?
Constraints: What rules must data satisfy?
- Uniqueness constraints?
- Required fields?
- Data type restrictions?
- Enumeration values?
Performance: How will data be queried?
- What indexes are needed?
- What's the expected scale?
- Are there bottlenecks?
Evolution: How will this model change?
- Can we add fields without migrations?
- Can we add entities without breaking things?
- How do we handle data migrations?

Next Steps

Create the spec: scripts/generate-spec.sh data-model data-XXX-slug
Define entities: What are the main entities/tables?
Specify fields with types, constraints, descriptions
Document relationships between entities
Plan indexes for performance
Validate: scripts/validate-spec.sh docs/specs/data-model/data-XXX-slug.md
Share with team before implementation

16 KiB Raw Permalink Blame History