Files
2025-11-30 08:38:26 +08:00

331 lines
8.6 KiB
Markdown

# Data Schema & Knowledge Modeling Template
## Workflow
Copy this checklist and track your progress:
```
Data Schema & Knowledge Modeling Progress:
- [ ] Step 1: Gather domain requirements and scope
- [ ] Step 2: Identify entities and attributes systematically
- [ ] Step 3: Define relationships and cardinality
- [ ] Step 4: Specify constraints and invariants
- [ ] Step 5: Validate against use cases and document
```
**Step 1: Gather domain requirements and scope**
Ask user for domain description, core use cases, existing data sources, scale requirements, and technology stack. Use [Input Questions](#input-questions).
**Step 2: Identify entities and attributes systematically**
Extract entities from requirements using [Entity Identification](#entity-identification). Define attributes with types and nullability using [Attribute Guide](#attribute-guide).
**Step 3: Define relationships and cardinality**
Map entity connections using [Relationship Mapping](#relationship-mapping). Specify cardinality (1:1, 1:N, M:N) and optionality.
**Step 4: Specify constraints and invariants**
Define business rules and constraints using [Constraint Specification](#constraint-specification). Document domain invariants.
**Step 5: Validate against use cases and document**
Create `data-schema-knowledge-modeling.md` using [Template](#schema-documentation-template). Verify using [Validation Checklist](#validation-checklist).
---
## Input Questions
**Domain & Scope:**
- What domain? (e-commerce, healthcare, social network)
- Boundaries? In/out of scope?
- Existing schemas to integrate/migrate from?
**Core Use Cases:**
- Primary operations? (CRUD for which entities?)
- Required queries/reports?
- Access patterns? (read-heavy, write-heavy, mixed)
**Scale & Performance:**
- Data volume? (rows per table, storage)
- Growth rate? (daily/monthly)
- Performance SLAs?
**Technology:**
- Database? (PostgreSQL, MongoDB, Neo4j, etc.)
- Compliance? (GDPR, HIPAA, SOC2)
- Evolution needs? (schema versioning, migrations)
---
## Entity Identification
**Step 1: Extract nouns**
List nouns from requirements = candidate entities.
**Step 2: Validate**
For each, check:
- [ ] Distinct identity? (can point to "this specific X")
- [ ] Independent lifecycle?
- [ ] Multiple attributes beyond name?
- [ ] Track multiple instances?
**Keep** if yes to most. **Reject** if just an attribute.
**Step 3: Entity vs Value Object**
- **Entity**: Has ID, mutable (User, Order)
- **Value Object**: No ID, immutable (Address, Money)
**Step 4: Document**
```markdown
### Entity: [Name]
**Purpose:** [What it represents]
**Examples:** [2-3 concrete cases]
**Lifecycle:** [Creation → deletion]
**Invariants:** [Rules that must hold]
```
---
## Attribute Guide
**Template:**
```
attributeName: DataType [NULL|NOT NULL] [DEFAULT value]
- Description: [What it represents]
- Validation: [Constraints]
- Examples: [Sample values]
```
**Standard attributes:**
- `id`: Primary key (UUID/BIGINT)
- `createdAt`: TIMESTAMP NOT NULL
- `updatedAt`: TIMESTAMP NOT NULL
- `deletedAt`: TIMESTAMP NULL (soft deletes)
**Data types:**
| Data | SQL | NoSQL | Notes |
|------|-----|-------|-------|
| Short text | VARCHAR(N) | String | Specify max |
| Long text | TEXT | String | No limit |
| Integer | INT/BIGINT | Number | Choose size |
| Decimal | DECIMAL(P,S) | Number | Fixed precision |
| Money | DECIMAL(19,4) | {amount,currency} | Never FLOAT |
| Boolean | BOOLEAN | Boolean | Not nullable |
| Date/Time | TIMESTAMP | ISODate | With timezone |
| UUID | UUID/CHAR(36) | String | Distributed IDs |
| JSON | JSON/JSONB | Object | Flexible |
| Enum | ENUM/VARCHAR | String | Fixed values |
**Nullability:**
- NOT NULL if required
- NULL if optional/unknown at creation
- Avoid NULL for booleans
---
## Relationship Mapping
**Cardinality:**
**1:1** - User has one Profile
- SQL: `Profile.userId UNIQUE NOT NULL REFERENCES User(id)`
**1:N** - User has many Orders
- SQL: `Order.userId NOT NULL REFERENCES User(id)`
**M:N** - Order contains Products
- Junction table:
```sql
OrderItem (
orderId REFERENCES Order(id),
productId REFERENCES Product(id),
quantity INT NOT NULL,
PRIMARY KEY (orderId, productId)
)
```
**Optionality:**
- Required: NOT NULL
- Optional: NULL
**Naming:**
Use verbs: User **owns** Order, Product **belongs to** Category
---
## Constraint Specification
**Primary Keys:**
```sql
id BIGINT PRIMARY KEY AUTO_INCREMENT
-- or --
id UUID PRIMARY KEY DEFAULT gen_random_uuid()
```
**Unique:**
```sql
email VARCHAR(255) UNIQUE NOT NULL
UNIQUE (userId, productId) -- composite
```
**Foreign Keys:**
```sql
userId BIGINT NOT NULL REFERENCES User(id) ON DELETE CASCADE
-- Options: CASCADE, SET NULL, RESTRICT
```
**Check Constraints:**
```sql
price DECIMAL(10,2) CHECK (price >= 0)
status VARCHAR(20) CHECK (status IN ('draft','pending','completed'))
```
**Domain Invariants:**
Document business rules:
```markdown
### Invariant: Order total = sum of items
Order.total = SUM(OrderItem.quantity * OrderItem.price)
### Invariant: Unique email
No duplicate emails (case-insensitive)
```
Enforce via: DB constraints (preferred), application logic, or triggers.
---
## Schema Documentation Template
Create: `data-schema-knowledge-modeling.md`
**Required sections:**
1. **Domain Overview** - Purpose, scope, technology
2. **Use Cases** - Primary operations, query patterns
3. **Entity Definitions** - For each entity:
- Purpose, examples, lifecycle
- Attributes table (name, type, null, default, constraints, description)
- Relationships (cardinality, FK, optionality)
- Invariants
4. **ERD** - Visual/text diagram showing relationships
5. **Constraints** - DB constraints, domain invariants
6. **Normalization** - Level, denormalization decisions
7. **Implementation** - SQL DDL / JSON Schema / Graph schema as appropriate
8. **Validation** - Check each use case is supported
9. **Open Questions** - Unresolved decisions
**Example entity definition:**
```markdown
### Entity: Order
**Purpose:** Represents customer purchase transaction
**Examples:** Amazon order #123, Shopify order #456
**Lifecycle:** Created on checkout → Updated during fulfillment → Completed on delivery
#### Attributes
| Attribute | Type | Null? | Default | Constraints | Description |
|---|---|---|---|---|---|
| id | BIGINT | NO | auto | PK | Unique identifier |
| userId | BIGINT | NO | - | FK→User | Customer who placed order |
| status | VARCHAR(20) | NO | 'pending' | CHECK IN(...) | Order status |
| total | DECIMAL(10,2) | NO | - | CHECK >= 0 | Order total |
#### Relationships
- **belongs to:** 1:N with User (Order.userId → User.id)
- **contains:** 1:N with OrderItem junction table
#### Invariants
- total = SUM(OrderItem.quantity * OrderItem.price)
- status transitions: pending → confirmed → shipped → delivered
```
---
## Validation Checklist
**Completeness:**
- [ ] All entities identified
- [ ] All attributes defined (types, nullability)
- [ ] All relationships mapped (cardinality)
- [ ] All constraints specified
- [ ] All invariants documented
**Correctness:**
- [ ] Each entity distinct purpose
- [ ] No redundant entities
- [ ] Attributes in correct entities
- [ ] Cardinality reflects reality
- [ ] Constraints enforce rules
**Use Case Coverage:**
- [ ] Supports all CRUD operations
- [ ] All queries answerable
- [ ] Indexes planned
- [ ] No missing joins
**Normalization:**
- [ ] No partial dependencies (2NF)
- [ ] No transitive dependencies (3NF)
- [ ] Denormalization documented
- [ ] No update anomalies
**Technical Quality:**
- [ ] Consistent naming
- [ ] Appropriate data types
- [ ] Primary keys defined
- [ ] Foreign keys maintain integrity
- [ ] Soft delete strategy (if needed)
- [ ] Audit fields (if needed)
**Future-Proofing:**
- [ ] Schema extensible
- [ ] Migration path (if applicable)
- [ ] Versioning strategy
- [ ] No technical debt
---
## Common Pitfalls
**1. Modeling implementation, not domain**
- Symptom: Entities like "UserSession", "Cache"
- Fix: Model real-world concepts only
**2. God entities**
- Symptom: User with 50+ attributes
- Fix: Extract to separate entities
**3. Missing junction tables**
- Symptom: M:N with FKs
- Fix: Always use junction table
**4. Nullable FKs without reason**
- Symptom: All relationships optional
- Fix: NOT NULL unless truly optional
**5. Not enforcing invariants**
- Symptom: Rules in docs only
- Fix: CHECK constraints, triggers, app validation
**6. Premature denormalization**
- Symptom: Duplicating without measurement
- Fix: Normalize first, denormalize after profiling
**7. Wrong data types**
- Symptom: Money as VARCHAR
- Fix: DECIMAL for money, proper types for all
**8. No migration strategy**
- Symptom: Can't change schema
- Fix: Versioning, backward-compat changes