331 lines
8.6 KiB
Markdown
331 lines
8.6 KiB
Markdown
# Data Schema & Knowledge Modeling Template
|
|
|
|
## Workflow
|
|
|
|
Copy this checklist and track your progress:
|
|
|
|
```
|
|
Data Schema & Knowledge Modeling Progress:
|
|
- [ ] Step 1: Gather domain requirements and scope
|
|
- [ ] Step 2: Identify entities and attributes systematically
|
|
- [ ] Step 3: Define relationships and cardinality
|
|
- [ ] Step 4: Specify constraints and invariants
|
|
- [ ] Step 5: Validate against use cases and document
|
|
```
|
|
|
|
**Step 1: Gather domain requirements and scope**
|
|
|
|
Ask user for domain description, core use cases, existing data sources, scale requirements, and technology stack. Use [Input Questions](#input-questions).
|
|
|
|
**Step 2: Identify entities and attributes systematically**
|
|
|
|
Extract entities from requirements using [Entity Identification](#entity-identification). Define attributes with types and nullability using [Attribute Guide](#attribute-guide).
|
|
|
|
**Step 3: Define relationships and cardinality**
|
|
|
|
Map entity connections using [Relationship Mapping](#relationship-mapping). Specify cardinality (1:1, 1:N, M:N) and optionality.
|
|
|
|
**Step 4: Specify constraints and invariants**
|
|
|
|
Define business rules and constraints using [Constraint Specification](#constraint-specification). Document domain invariants.
|
|
|
|
**Step 5: Validate against use cases and document**
|
|
|
|
Create `data-schema-knowledge-modeling.md` using [Template](#schema-documentation-template). Verify using [Validation Checklist](#validation-checklist).
|
|
|
|
---
|
|
|
|
## Input Questions
|
|
|
|
**Domain & Scope:**
|
|
- What domain? (e-commerce, healthcare, social network)
|
|
- Boundaries? In/out of scope?
|
|
- Existing schemas to integrate/migrate from?
|
|
|
|
**Core Use Cases:**
|
|
- Primary operations? (CRUD for which entities?)
|
|
- Required queries/reports?
|
|
- Access patterns? (read-heavy, write-heavy, mixed)
|
|
|
|
**Scale & Performance:**
|
|
- Data volume? (rows per table, storage)
|
|
- Growth rate? (daily/monthly)
|
|
- Performance SLAs?
|
|
|
|
**Technology:**
|
|
- Database? (PostgreSQL, MongoDB, Neo4j, etc.)
|
|
- Compliance? (GDPR, HIPAA, SOC2)
|
|
- Evolution needs? (schema versioning, migrations)
|
|
|
|
---
|
|
|
|
## Entity Identification
|
|
|
|
**Step 1: Extract nouns**
|
|
|
|
List nouns from requirements = candidate entities.
|
|
|
|
**Step 2: Validate**
|
|
|
|
For each, check:
|
|
- [ ] Distinct identity? (can point to "this specific X")
|
|
- [ ] Independent lifecycle?
|
|
- [ ] Multiple attributes beyond name?
|
|
- [ ] Track multiple instances?
|
|
|
|
**Keep** if yes to most. **Reject** if just an attribute.
|
|
|
|
**Step 3: Entity vs Value Object**
|
|
|
|
- **Entity**: Has ID, mutable (User, Order)
|
|
- **Value Object**: No ID, immutable (Address, Money)
|
|
|
|
**Step 4: Document**
|
|
|
|
```markdown
|
|
### Entity: [Name]
|
|
**Purpose:** [What it represents]
|
|
**Examples:** [2-3 concrete cases]
|
|
**Lifecycle:** [Creation → deletion]
|
|
**Invariants:** [Rules that must hold]
|
|
```
|
|
|
|
---
|
|
|
|
## Attribute Guide
|
|
|
|
**Template:**
|
|
```
|
|
attributeName: DataType [NULL|NOT NULL] [DEFAULT value]
|
|
- Description: [What it represents]
|
|
- Validation: [Constraints]
|
|
- Examples: [Sample values]
|
|
```
|
|
|
|
**Standard attributes:**
|
|
- `id`: Primary key (UUID/BIGINT)
|
|
- `createdAt`: TIMESTAMP NOT NULL
|
|
- `updatedAt`: TIMESTAMP NOT NULL
|
|
- `deletedAt`: TIMESTAMP NULL (soft deletes)
|
|
|
|
**Data types:**
|
|
|
|
| Data | SQL | NoSQL | Notes |
|
|
|------|-----|-------|-------|
|
|
| Short text | VARCHAR(N) | String | Specify max |
|
|
| Long text | TEXT | String | No limit |
|
|
| Integer | INT/BIGINT | Number | Choose size |
|
|
| Decimal | DECIMAL(P,S) | Number | Fixed precision |
|
|
| Money | DECIMAL(19,4) | {amount,currency} | Never FLOAT |
|
|
| Boolean | BOOLEAN | Boolean | Not nullable |
|
|
| Date/Time | TIMESTAMP | ISODate | With timezone |
|
|
| UUID | UUID/CHAR(36) | String | Distributed IDs |
|
|
| JSON | JSON/JSONB | Object | Flexible |
|
|
| Enum | ENUM/VARCHAR | String | Fixed values |
|
|
|
|
**Nullability:**
|
|
- NOT NULL if required
|
|
- NULL if optional/unknown at creation
|
|
- Avoid NULL for booleans
|
|
|
|
---
|
|
|
|
## Relationship Mapping
|
|
|
|
**Cardinality:**
|
|
|
|
**1:1** - User has one Profile
|
|
- SQL: `Profile.userId UNIQUE NOT NULL REFERENCES User(id)`
|
|
|
|
**1:N** - User has many Orders
|
|
- SQL: `Order.userId NOT NULL REFERENCES User(id)`
|
|
|
|
**M:N** - Order contains Products
|
|
- Junction table:
|
|
```sql
|
|
OrderItem (
|
|
orderId REFERENCES Order(id),
|
|
productId REFERENCES Product(id),
|
|
quantity INT NOT NULL,
|
|
PRIMARY KEY (orderId, productId)
|
|
)
|
|
```
|
|
|
|
**Optionality:**
|
|
- Required: NOT NULL
|
|
- Optional: NULL
|
|
|
|
**Naming:**
|
|
Use verbs: User **owns** Order, Product **belongs to** Category
|
|
|
|
---
|
|
|
|
## Constraint Specification
|
|
|
|
**Primary Keys:**
|
|
```sql
|
|
id BIGINT PRIMARY KEY AUTO_INCREMENT
|
|
-- or --
|
|
id UUID PRIMARY KEY DEFAULT gen_random_uuid()
|
|
```
|
|
|
|
**Unique:**
|
|
```sql
|
|
email VARCHAR(255) UNIQUE NOT NULL
|
|
UNIQUE (userId, productId) -- composite
|
|
```
|
|
|
|
**Foreign Keys:**
|
|
```sql
|
|
userId BIGINT NOT NULL REFERENCES User(id) ON DELETE CASCADE
|
|
-- Options: CASCADE, SET NULL, RESTRICT
|
|
```
|
|
|
|
**Check Constraints:**
|
|
```sql
|
|
price DECIMAL(10,2) CHECK (price >= 0)
|
|
status VARCHAR(20) CHECK (status IN ('draft','pending','completed'))
|
|
```
|
|
|
|
**Domain Invariants:**
|
|
|
|
Document business rules:
|
|
```markdown
|
|
### Invariant: Order total = sum of items
|
|
Order.total = SUM(OrderItem.quantity * OrderItem.price)
|
|
|
|
### Invariant: Unique email
|
|
No duplicate emails (case-insensitive)
|
|
```
|
|
|
|
Enforce via: DB constraints (preferred), application logic, or triggers.
|
|
|
|
---
|
|
|
|
## Schema Documentation Template
|
|
|
|
Create: `data-schema-knowledge-modeling.md`
|
|
|
|
**Required sections:**
|
|
|
|
1. **Domain Overview** - Purpose, scope, technology
|
|
2. **Use Cases** - Primary operations, query patterns
|
|
3. **Entity Definitions** - For each entity:
|
|
- Purpose, examples, lifecycle
|
|
- Attributes table (name, type, null, default, constraints, description)
|
|
- Relationships (cardinality, FK, optionality)
|
|
- Invariants
|
|
4. **ERD** - Visual/text diagram showing relationships
|
|
5. **Constraints** - DB constraints, domain invariants
|
|
6. **Normalization** - Level, denormalization decisions
|
|
7. **Implementation** - SQL DDL / JSON Schema / Graph schema as appropriate
|
|
8. **Validation** - Check each use case is supported
|
|
9. **Open Questions** - Unresolved decisions
|
|
|
|
**Example entity definition:**
|
|
|
|
```markdown
|
|
### Entity: Order
|
|
|
|
**Purpose:** Represents customer purchase transaction
|
|
**Examples:** Amazon order #123, Shopify order #456
|
|
**Lifecycle:** Created on checkout → Updated during fulfillment → Completed on delivery
|
|
|
|
#### Attributes
|
|
|
|
| Attribute | Type | Null? | Default | Constraints | Description |
|
|
|---|---|---|---|---|---|
|
|
| id | BIGINT | NO | auto | PK | Unique identifier |
|
|
| userId | BIGINT | NO | - | FK→User | Customer who placed order |
|
|
| status | VARCHAR(20) | NO | 'pending' | CHECK IN(...) | Order status |
|
|
| total | DECIMAL(10,2) | NO | - | CHECK >= 0 | Order total |
|
|
|
|
#### Relationships
|
|
- **belongs to:** 1:N with User (Order.userId → User.id)
|
|
- **contains:** 1:N with OrderItem junction table
|
|
|
|
#### Invariants
|
|
- total = SUM(OrderItem.quantity * OrderItem.price)
|
|
- status transitions: pending → confirmed → shipped → delivered
|
|
```
|
|
|
|
---
|
|
|
|
## Validation Checklist
|
|
|
|
**Completeness:**
|
|
- [ ] All entities identified
|
|
- [ ] All attributes defined (types, nullability)
|
|
- [ ] All relationships mapped (cardinality)
|
|
- [ ] All constraints specified
|
|
- [ ] All invariants documented
|
|
|
|
**Correctness:**
|
|
- [ ] Each entity distinct purpose
|
|
- [ ] No redundant entities
|
|
- [ ] Attributes in correct entities
|
|
- [ ] Cardinality reflects reality
|
|
- [ ] Constraints enforce rules
|
|
|
|
**Use Case Coverage:**
|
|
- [ ] Supports all CRUD operations
|
|
- [ ] All queries answerable
|
|
- [ ] Indexes planned
|
|
- [ ] No missing joins
|
|
|
|
**Normalization:**
|
|
- [ ] No partial dependencies (2NF)
|
|
- [ ] No transitive dependencies (3NF)
|
|
- [ ] Denormalization documented
|
|
- [ ] No update anomalies
|
|
|
|
**Technical Quality:**
|
|
- [ ] Consistent naming
|
|
- [ ] Appropriate data types
|
|
- [ ] Primary keys defined
|
|
- [ ] Foreign keys maintain integrity
|
|
- [ ] Soft delete strategy (if needed)
|
|
- [ ] Audit fields (if needed)
|
|
|
|
**Future-Proofing:**
|
|
- [ ] Schema extensible
|
|
- [ ] Migration path (if applicable)
|
|
- [ ] Versioning strategy
|
|
- [ ] No technical debt
|
|
|
|
---
|
|
|
|
## Common Pitfalls
|
|
|
|
**1. Modeling implementation, not domain**
|
|
- Symptom: Entities like "UserSession", "Cache"
|
|
- Fix: Model real-world concepts only
|
|
|
|
**2. God entities**
|
|
- Symptom: User with 50+ attributes
|
|
- Fix: Extract to separate entities
|
|
|
|
**3. Missing junction tables**
|
|
- Symptom: M:N with FKs
|
|
- Fix: Always use junction table
|
|
|
|
**4. Nullable FKs without reason**
|
|
- Symptom: All relationships optional
|
|
- Fix: NOT NULL unless truly optional
|
|
|
|
**5. Not enforcing invariants**
|
|
- Symptom: Rules in docs only
|
|
- Fix: CHECK constraints, triggers, app validation
|
|
|
|
**6. Premature denormalization**
|
|
- Symptom: Duplicating without measurement
|
|
- Fix: Normalize first, denormalize after profiling
|
|
|
|
**7. Wrong data types**
|
|
- Symptom: Money as VARCHAR
|
|
- Fix: DECIMAL for money, proper types for all
|
|
|
|
**8. No migration strategy**
|
|
- Symptom: Can't change schema
|
|
- Fix: Versioning, backward-compat changes
|