8.6 KiB
Data Schema & Knowledge Modeling Template
Workflow
Copy this checklist and track your progress:
Data Schema & Knowledge Modeling Progress:
- [ ] Step 1: Gather domain requirements and scope
- [ ] Step 2: Identify entities and attributes systematically
- [ ] Step 3: Define relationships and cardinality
- [ ] Step 4: Specify constraints and invariants
- [ ] Step 5: Validate against use cases and document
Step 1: Gather domain requirements and scope
Ask user for domain description, core use cases, existing data sources, scale requirements, and technology stack. Use Input Questions.
Step 2: Identify entities and attributes systematically
Extract entities from requirements using Entity Identification. Define attributes with types and nullability using Attribute Guide.
Step 3: Define relationships and cardinality
Map entity connections using Relationship Mapping. Specify cardinality (1:1, 1:N, M:N) and optionality.
Step 4: Specify constraints and invariants
Define business rules and constraints using Constraint Specification. Document domain invariants.
Step 5: Validate against use cases and document
Create data-schema-knowledge-modeling.md using Template. Verify using Validation Checklist.
Input Questions
Domain & Scope:
- What domain? (e-commerce, healthcare, social network)
- Boundaries? In/out of scope?
- Existing schemas to integrate/migrate from?
Core Use Cases:
- Primary operations? (CRUD for which entities?)
- Required queries/reports?
- Access patterns? (read-heavy, write-heavy, mixed)
Scale & Performance:
- Data volume? (rows per table, storage)
- Growth rate? (daily/monthly)
- Performance SLAs?
Technology:
- Database? (PostgreSQL, MongoDB, Neo4j, etc.)
- Compliance? (GDPR, HIPAA, SOC2)
- Evolution needs? (schema versioning, migrations)
Entity Identification
Step 1: Extract nouns
List nouns from requirements = candidate entities.
Step 2: Validate
For each, check:
- Distinct identity? (can point to "this specific X")
- Independent lifecycle?
- Multiple attributes beyond name?
- Track multiple instances?
Keep if yes to most. Reject if just an attribute.
Step 3: Entity vs Value Object
- Entity: Has ID, mutable (User, Order)
- Value Object: No ID, immutable (Address, Money)
Step 4: Document
### Entity: [Name]
**Purpose:** [What it represents]
**Examples:** [2-3 concrete cases]
**Lifecycle:** [Creation → deletion]
**Invariants:** [Rules that must hold]
Attribute Guide
Template:
attributeName: DataType [NULL|NOT NULL] [DEFAULT value]
- Description: [What it represents]
- Validation: [Constraints]
- Examples: [Sample values]
Standard attributes:
id: Primary key (UUID/BIGINT)createdAt: TIMESTAMP NOT NULLupdatedAt: TIMESTAMP NOT NULLdeletedAt: TIMESTAMP NULL (soft deletes)
Data types:
| Data | SQL | NoSQL | Notes |
|---|---|---|---|
| Short text | VARCHAR(N) | String | Specify max |
| Long text | TEXT | String | No limit |
| Integer | INT/BIGINT | Number | Choose size |
| Decimal | DECIMAL(P,S) | Number | Fixed precision |
| Money | DECIMAL(19,4) | {amount,currency} | Never FLOAT |
| Boolean | BOOLEAN | Boolean | Not nullable |
| Date/Time | TIMESTAMP | ISODate | With timezone |
| UUID | UUID/CHAR(36) | String | Distributed IDs |
| JSON | JSON/JSONB | Object | Flexible |
| Enum | ENUM/VARCHAR | String | Fixed values |
Nullability:
- NOT NULL if required
- NULL if optional/unknown at creation
- Avoid NULL for booleans
Relationship Mapping
Cardinality:
1:1 - User has one Profile
- SQL:
Profile.userId UNIQUE NOT NULL REFERENCES User(id)
1:N - User has many Orders
- SQL:
Order.userId NOT NULL REFERENCES User(id)
M:N - Order contains Products
- Junction table:
OrderItem ( orderId REFERENCES Order(id), productId REFERENCES Product(id), quantity INT NOT NULL, PRIMARY KEY (orderId, productId) )
Optionality:
- Required: NOT NULL
- Optional: NULL
Naming: Use verbs: User owns Order, Product belongs to Category
Constraint Specification
Primary Keys:
id BIGINT PRIMARY KEY AUTO_INCREMENT
-- or --
id UUID PRIMARY KEY DEFAULT gen_random_uuid()
Unique:
email VARCHAR(255) UNIQUE NOT NULL
UNIQUE (userId, productId) -- composite
Foreign Keys:
userId BIGINT NOT NULL REFERENCES User(id) ON DELETE CASCADE
-- Options: CASCADE, SET NULL, RESTRICT
Check Constraints:
price DECIMAL(10,2) CHECK (price >= 0)
status VARCHAR(20) CHECK (status IN ('draft','pending','completed'))
Domain Invariants:
Document business rules:
### Invariant: Order total = sum of items
Order.total = SUM(OrderItem.quantity * OrderItem.price)
### Invariant: Unique email
No duplicate emails (case-insensitive)
Enforce via: DB constraints (preferred), application logic, or triggers.
Schema Documentation Template
Create: data-schema-knowledge-modeling.md
Required sections:
- Domain Overview - Purpose, scope, technology
- Use Cases - Primary operations, query patterns
- Entity Definitions - For each entity:
- Purpose, examples, lifecycle
- Attributes table (name, type, null, default, constraints, description)
- Relationships (cardinality, FK, optionality)
- Invariants
- ERD - Visual/text diagram showing relationships
- Constraints - DB constraints, domain invariants
- Normalization - Level, denormalization decisions
- Implementation - SQL DDL / JSON Schema / Graph schema as appropriate
- Validation - Check each use case is supported
- Open Questions - Unresolved decisions
Example entity definition:
### Entity: Order
**Purpose:** Represents customer purchase transaction
**Examples:** Amazon order #123, Shopify order #456
**Lifecycle:** Created on checkout → Updated during fulfillment → Completed on delivery
#### Attributes
| Attribute | Type | Null? | Default | Constraints | Description |
|---|---|---|---|---|---|
| id | BIGINT | NO | auto | PK | Unique identifier |
| userId | BIGINT | NO | - | FK→User | Customer who placed order |
| status | VARCHAR(20) | NO | 'pending' | CHECK IN(...) | Order status |
| total | DECIMAL(10,2) | NO | - | CHECK >= 0 | Order total |
#### Relationships
- **belongs to:** 1:N with User (Order.userId → User.id)
- **contains:** 1:N with OrderItem junction table
#### Invariants
- total = SUM(OrderItem.quantity * OrderItem.price)
- status transitions: pending → confirmed → shipped → delivered
Validation Checklist
Completeness:
- All entities identified
- All attributes defined (types, nullability)
- All relationships mapped (cardinality)
- All constraints specified
- All invariants documented
Correctness:
- Each entity distinct purpose
- No redundant entities
- Attributes in correct entities
- Cardinality reflects reality
- Constraints enforce rules
Use Case Coverage:
- Supports all CRUD operations
- All queries answerable
- Indexes planned
- No missing joins
Normalization:
- No partial dependencies (2NF)
- No transitive dependencies (3NF)
- Denormalization documented
- No update anomalies
Technical Quality:
- Consistent naming
- Appropriate data types
- Primary keys defined
- Foreign keys maintain integrity
- Soft delete strategy (if needed)
- Audit fields (if needed)
Future-Proofing:
- Schema extensible
- Migration path (if applicable)
- Versioning strategy
- No technical debt
Common Pitfalls
1. Modeling implementation, not domain
- Symptom: Entities like "UserSession", "Cache"
- Fix: Model real-world concepts only
2. God entities
- Symptom: User with 50+ attributes
- Fix: Extract to separate entities
3. Missing junction tables
- Symptom: M:N with FKs
- Fix: Always use junction table
4. Nullable FKs without reason
- Symptom: All relationships optional
- Fix: NOT NULL unless truly optional
5. Not enforcing invariants
- Symptom: Rules in docs only
- Fix: CHECK constraints, triggers, app validation
6. Premature denormalization
- Symptom: Duplicating without measurement
- Fix: Normalize first, denormalize after profiling
7. Wrong data types
- Symptom: Money as VARCHAR
- Fix: DECIMAL for money, proper types for all
8. No migration strategy
- Symptom: Can't change schema
- Fix: Versioning, backward-compat changes