Initial commit

Zhongwei Li
2025-11-30 08:38:26 +08:00
commit 41d9f6b189
304 changed files with 98322 additions and 0 deletions

{
"criteria": [
{
"name": "Entity Identification & Completeness",
"description": "Are all domain entities identified, each with a clear purpose, distinct identity, and no redundancy?",
"scoring": {
"1": "Missing critical entities. Entities poorly defined or overlapping. No clear distinction between entities and attributes.",
"2": "Some entities identified but gaps in coverage. Some entity purposes unclear. Minor redundancy.",
"3": "Most entities identified with clear purposes. Reasonable coverage. Entities generally distinct.",
"4": "All required entities identified with clear, documented purposes. Good examples provided. No redundancy.",
"5": "Complete entity coverage validated against all use cases. Each entity has purpose, examples, lifecycle documented. Entity vs value object distinction clear. No overlap or redundancy."
}
},
{
"name": "Attribute Definition Quality",
"description": "Are attributes complete with appropriate data types, nullability, and constraints?",
"scoring": {
"1": "Attributes missing or poorly typed. Wrong data types (e.g., money as VARCHAR). Nullability ignored.",
"2": "Basic attributes present but some types questionable. Nullability inconsistent. Some constraints missing.",
"3": "Attributes defined with reasonable types. Nullability specified. Core constraints present.",
"4": "All attributes well-typed (DECIMAL for money, proper VARCHAR lengths). Nullability correctly specified. Constraints documented.",
"5": "Comprehensive attribute definitions with justification for types, nullability, defaults, and constraints. Audit fields (createdAt, updatedAt) included where appropriate. No technical debt."
}
},
{
"name": "Relationship Modeling Accuracy",
"description": "Are relationships correctly identified with proper cardinality, optionality, and implementation?",
"scoring": {
"1": "Relationships missing or incorrect. Cardinality wrong. M:N modeled without junction table.",
"2": "Some relationships identified but cardinality questionable. Missing junction tables or unclear optionality.",
"3": "Most relationships mapped with cardinality. Junction tables for M:N. Optionality specified.",
"4": "All relationships correctly modeled. Proper cardinality (1:1, 1:N, M:N). Junction tables where needed. Clear optionality.",
"5": "Comprehensive relationship documentation with bidirectional naming, implementation details (FKs, ON DELETE actions), and validation that relationships support all use cases. Complex patterns (polymorphic, hierarchical) correctly handled."
}
},
{
"name": "Constraint & Invariant Specification",
"description": "Are business rules enforced via constraints? Are domain invariants documented and validated?",
"scoring": {
"1": "No constraints beyond primary keys. Business rules not documented. Invariants missing.",
"2": "Basic constraints (NOT NULL, UNIQUE) present but business rules not enforced. Invariants mentioned but not validated.",
"3": "Good constraint coverage (PK, FK, UNIQUE, NOT NULL). Some business rules enforced. Invariants documented.",
"4": "Comprehensive constraints including CHECK constraints for business rules. All invariants documented with enforcement strategy.",
"5": "All constraints documented with rationale. Domain invariants clearly stated and enforced via DB constraints where possible, application logic where not. Validation strategy for complex multi-table invariants. Examples of enforcement code provided."
}
},
{
"name": "Normalization & Data Integrity",
"description": "Is schema properly normalized (or deliberately denormalized with rationale)?",
"scoring": {
"1": "Severe normalization violations. Redundant data. Update anomalies likely.",
"2": "Some normalization but violations present (partial or transitive dependencies). Some redundancy.",
"3": "Generally normalized to 2NF-3NF. Minimal redundancy. Rationale for exceptions provided.",
"4": "Proper normalization to 3NF. Any denormalization documented with performance justification. No update anomalies.",
"5": "Exemplary normalization with clear explanation of level achieved (1NF/2NF/3NF/BCNF). Strategic denormalization only where measured performance gains justify it. Trade-offs explicitly documented. No data integrity risks."
}
},
{
"name": "Use Case Coverage & Validation",
"description": "Does schema support all required use cases? Can all queries be answered?",
"scoring": {
"1": "Schema doesn't support core use cases. Critical queries impossible or require workarounds.",
"2": "Supports some use cases but gaps exist. Some queries difficult or inefficient.",
"3": "Supports most use cases. Required queries possible though some may be complex.",
"4": "All use cases supported. Validation checklist shows each use case can be satisfied. Query paths identified.",
"5": "Comprehensive validation against all use cases with example queries. Indexes planned for performance. Edge cases considered. Future use cases accommodated by design."
}
},
{
"name": "Technology Appropriateness",
"description": "Is the schema type (relational, document, graph) appropriate for the domain?",
"scoring": {
"1": "Wrong technology choice (e.g., relational for graph problem, or vice versa). Implementation doesn't match paradigm.",
"2": "Technology choice questionable. Implementation awkward for chosen paradigm.",
"3": "Reasonable technology choice. Implementation follows paradigm conventions.",
"4": "Good technology choice with justification. Implementation leverages paradigm strengths.",
"5": "Optimal technology choice with clear rationale comparing alternatives. Implementation exemplifies paradigm best practices. Schema leverages technology-specific features appropriately (e.g., JSONB in PostgreSQL, graph traversal in Neo4j)."
}
},
{
"name": "Documentation Quality & Clarity",
"description": "Is schema well-documented with ERD, implementation code, and clear explanations?",
"scoring": {
"1": "Minimal documentation. No diagram. Entity definitions incomplete.",
"2": "Basic documentation present but gaps. Diagram missing or unclear. Some entities poorly explained.",
"3": "Good documentation with most sections complete. Diagram present. Entities explained.",
"4": "Comprehensive documentation following template. ERD clear. All entities, relationships, constraints documented. Implementation code provided.",
"5": "Exemplary documentation that could serve as reference. ERD/diagram clear and complete. All sections filled thoroughly. Implementation code (SQL DDL / JSON Schema / Cypher) executable. Examples aid understanding. Could be handed to developer for immediate implementation."
}
},
{
"name": "Evolution & Migration Strategy",
"description": "Is there a plan for schema changes? Migration path from existing systems considered?",
"scoring": {
"1": "No evolution strategy. If migrating from an existing system, no plan for existing data.",
"2": "Evolution mentioned but no concrete strategy. Migration path vague.",
"3": "Basic evolution strategy (versioning or backward-compat approach). Migration considered if applicable.",
"4": "Clear evolution strategy documented. Migration path defined with phases if migrating from legacy.",
"5": "Comprehensive evolution strategy with versioning, backward-compatibility approach, and detailed migration plan if applicable. Rollback strategy considered. Zero-downtime deployment approach specified. Future extensibility designed in."
}
},
{
"name": "Advanced Pattern Application",
"description": "Are advanced patterns (temporal, hierarchies, polymorphic) correctly applied when needed?",
"scoring": {
"1": "Complex patterns needed but missing or incorrectly implemented.",
"2": "Attempted advanced patterns but implementation flawed or overly complex.",
"3": "Advanced patterns applied where needed with reasonable implementation.",
"4": "Advanced patterns correctly implemented with good trade-off decisions (e.g., hierarchy approach chosen based on use case).",
"5": "Sophisticated pattern usage with clear rationale for choices. Temporal modeling, hierarchies, polymorphic associations, or graph patterns implemented optimally for domain. Trade-offs explicit and justified."
}
}
],
"schema_type_guidance": {
"Relational (SQL)": {
"target_score": 3.5,
"focus_criteria": [
"Normalization & Data Integrity",
"Constraint & Invariant Specification",
"Relationship Modeling Accuracy"
],
"key_patterns": [
"Proper normalization (3NF typical)",
"Foreign key relationships with CASCADE/RESTRICT",
"CHECK constraints for business rules",
"Junction tables for M:N relationships"
]
},
"Document/NoSQL": {
"target_score": 3.5,
"focus_criteria": [
"Entity Identification & Completeness",
"Use Case Coverage & Validation",
"Technology Appropriateness"
],
"key_patterns": [
"Embed vs reference decision documented",
"Denormalization for read performance justified",
"Document structure matches query patterns",
"JSON schema validation if available"
]
},
"Graph Database": {
"target_score": 4.0,
"focus_criteria": [
"Relationship Modeling Accuracy",
"Technology Appropriateness",
"Advanced Pattern Application"
],
"key_patterns": [
"Nodes for entities, edges for relationships",
"Properties on edges for context",
"Traversal patterns optimized (< 3 hops typical)",
"Index on frequently filtered properties"
]
},
"Data Warehouse (OLAP)": {
"target_score": 3.5,
"focus_criteria": [
"Use Case Coverage & Validation",
"Normalization & Data Integrity",
"Technology Appropriateness"
],
"key_patterns": [
"Star or snowflake schema",
"Fact tables with foreign keys to dimensions",
"Dimensional attributes denormalized",
"Slowly changing dimensions handled"
]
}
},
"domain_complexity_guidance": {
"Simple Domain (< 10 entities, straightforward relationships)": {
"target_score": 3.0,
"acceptable_shortcuts": [
"ERD can be simple text diagram",
"Fewer implementation details needed",
"Basic constraints sufficient"
],
"key_quality_gates": [
"All entities identified",
"Relationships correct",
"Supports use cases"
]
},
"Standard Domain (10-30 entities, moderate complexity)": {
"target_score": 3.5,
"required_elements": [
"Complete entity definitions",
"ERD diagram",
"All relationships mapped",
"Constraints documented",
"Implementation code (DDL/schema)"
],
"key_quality_gates": [
"All 10 criteria evaluated",
"Minimum score of 3 on each",
"Average ≥ 3.5"
]
},
"Complex Domain (30+ entities, hierarchies, temporal, polymorphic)": {
"target_score": 4.0,
"required_elements": [
"Comprehensive documentation",
"Multiple diagrams (ERD + detail views)",
"Advanced pattern usage documented",
"Migration strategy if applicable",
"Performance considerations",
"Example queries for complex patterns"
],
"key_quality_gates": [
"All 10 criteria evaluated",
"Minimum score of 3.5 on each",
"Average ≥ 4.0",
"Score 5 on Advanced Pattern Application"
]
}
},
"common_failure_modes": {
"1. God Entities": {
"symptom": "User table with 50+ attributes, or single entity handling multiple concerns",
"why_it_fails": "Violates single responsibility, hard to query, update anomalies",
"fix": "Extract related concerns into separate entities (UserProfile, UserPreferences, UserAddress)",
"related_criteria": ["Entity Identification & Completeness", "Normalization & Data Integrity"]
},
"2. Missing Junction Tables": {
"symptom": "Attempting M:N relationship with direct foreign keys or comma-separated IDs",
"why_it_fails": "Can't properly model M:N, violates 1NF, query complexity",
"fix": "Always use junction table with composite primary key for M:N relationships",
"related_criteria": ["Relationship Modeling Accuracy", "Normalization & Data Integrity"]
},
"3. Wrong Data Types": {
"symptom": "Money as FLOAT, dates as VARCHAR, booleans as CHAR(1)",
"why_it_fails": "Precision loss (money), format inconsistency (dates), unclear semantics (booleans)",
"fix": "Use DECIMAL for money, DATE/TIMESTAMP for dates, BOOLEAN for flags",
"related_criteria": ["Attribute Definition Quality"]
},
"4. No Constraints": {
"symptom": "Business rules in documentation but not enforced in schema",
"why_it_fails": "Application bugs can corrupt data, no database-level guarantees",
"fix": "Use CHECK constraints, NOT NULL, UNIQUE, FK constraints to enforce rules",
"related_criteria": ["Constraint & Invariant Specification"]
},
"5. Premature Denormalization": {
"symptom": "Duplicating data for \"performance\" without measuring",
"why_it_fails": "Update anomalies, data inconsistency, wasted effort if not bottleneck",
"fix": "Normalize first (3NF), denormalize only after profiling shows actual bottleneck",
"related_criteria": ["Normalization & Data Integrity", "Use Case Coverage & Validation"]
},
"6. Ignoring Use Cases": {
"symptom": "Schema designed in isolation, doesn't support required queries",
"why_it_fails": "Schema can't answer business questions, requires redesign",
"fix": "Validate schema against ALL use cases. Write example queries to verify.",
"related_criteria": ["Use Case Coverage & Validation"]
},
"7. Modeling Implementation": {
"symptom": "Entities like \"UserSession\", \"Cache\", \"Queue\" in domain model",
"why_it_fails": "Confuses domain concepts with technical infrastructure",
"fix": "Model real-world domain entities only. Infrastructure is separate concern.",
"related_criteria": ["Entity Identification & Completeness", "Technology Appropriateness"]
},
"8. No Evolution Strategy": {
"symptom": "Can't change schema without breaking production",
"why_it_fails": "Schema ossifies, can't adapt to business changes",
"fix": "Plan for evolution: versioning, backward-compat changes, or expand-contract migrations",
"related_criteria": ["Evolution & Migration Strategy"]
}
},
"scale": {
"description": "Each criterion scored 1-5",
"min_score": 1,
"max_score": 5,
"passing_threshold": 3.5,
"excellence_threshold": 4.5
},
"usage_notes": {
"when_to_score": "After completing schema design, before delivering to user",
"minimum_standard": "Average score ≥ 3.5 across all criteria (standard domain). Simple domains: ≥ 3.0. Complex domains: ≥ 4.0.",
"how_to_improve": "If scoring < threshold, identify lowest-scoring criteria and iterate. Common fixes: add missing entities, specify constraints, validate against use cases, improve documentation.",
"self_assessment": "Score honestly. Schema flaws are expensive to fix in production. Better to iterate now."
}
}

# Data Schema & Knowledge Modeling: Advanced Methodology
## Workflow
```
Advanced Schema Modeling:
- [ ] Step 1: Analyze complex domain patterns
- [ ] Step 2: Design advanced relationship structures
- [ ] Step 3: Apply normalization or strategic denormalization
- [ ] Step 4: Model temporal/historical aspects
- [ ] Step 5: Plan schema evolution strategy
```
**Steps:** (1) Identify patterns in [Advanced Relationships](#1-advanced-relationship-patterns), (2) Apply [Hierarchy](#2-hierarchy-modeling) and [Polymorphic](#3-polymorphic-associations) patterns, (3) Use [Normalization](#5-normalization-levels) then [Denormalization](#6-strategic-denormalization), (4) Add [Temporal](#7-temporal--historical-modeling) if needed, (5) Plan [Evolution](#8-schema-evolution).
---
## 1. Advanced Relationship Patterns
### Self-Referential
Entity relates to itself (org charts, categories, social networks).
```sql
CREATE TABLE Employee (
id BIGINT PRIMARY KEY,
managerId BIGINT NULL REFERENCES Employee(id),
CONSTRAINT no_self_ref CHECK (id != managerId)
);
```
Query with recursive CTE for full hierarchy.
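A sketch of that query, assuming the `Employee` table above (the root `id = 1` is illustrative):
```sql
-- All reports of employee 1, direct and indirect
WITH RECURSIVE subordinates AS (
    SELECT id FROM Employee WHERE managerId = 1
    UNION ALL
    SELECT e.id
    FROM Employee e
    JOIN subordinates s ON e.managerId = s.id
)
SELECT id FROM subordinates;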
### Conditional
Relationship exists only under conditions.
```sql
CREATE TABLE Order (
id BIGINT PRIMARY KEY,
status VARCHAR(20),
paymentId BIGINT NULL REFERENCES Payment(id),
CONSTRAINT payment_when_paid CHECK (
(status IN ('paid','completed') AND paymentId IS NOT NULL) OR
(status NOT IN ('paid','completed'))
)
);
```
### Multi-Parent
Entity has multiple parents (document in folders).
```sql
CREATE TABLE DocumentFolder (
documentId BIGINT REFERENCES Document(id),
folderId BIGINT REFERENCES Folder(id),
PRIMARY KEY (documentId, folderId)
);
```
---
## 2. Hierarchy Modeling
Four approaches with trade-offs:
| Approach | Implementation | Read | Write | Best For |
|----------|---------------|------|-------|----------|
| **Adjacency List** | `parentId` column | Slow (recursive) | Fast | Shallow trees, frequent updates |
| **Path Enumeration** | `path VARCHAR` ('/1/5/12/') | Fast | Medium | Read-heavy, moderate depth |
| **Nested Sets** | `lft, rgt INT` | Fastest | Slow | Read-heavy, rare writes |
| **Closure Table** | Separate ancestor/descendant table | Fastest | Medium | Complex queries, any depth |
**Adjacency List:**
```sql
CREATE TABLE Category (
id BIGINT PRIMARY KEY,
parentId BIGINT NULL REFERENCES Category(id)
);
```
**Closure Table:**
```sql
CREATE TABLE CategoryClosure (
ancestor BIGINT,
descendant BIGINT,
depth INT, -- 0=self, 1=child, 2+=deeper
PRIMARY KEY (ancestor, descendant)
);
```
**Recommendation:** Adjacency for < 5 levels, Closure for complex queries.
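Closure-table queries stay flat at any depth. A sketch against the `CategoryClosure` table above (ids are illustrative):
```sql
-- All descendants of category 5, with distance from it
SELECT descendant, depth
FROM CategoryClosure
WHERE ancestor = 5 AND depth > 0;

-- Insert node 12 under parent 5: copy the parent's ancestor rows, plus a self-row
INSERT INTO CategoryClosure (ancestor, descendant, depth)
SELECT ancestor, 12, depth + 1 FROM CategoryClosure WHERE descendant = 5
UNION ALL
SELECT 12, 12, 0;
```
Writes touch O(depth) rows, which is the "Medium" write cost in the table above.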
---
## 3. Polymorphic Associations
Entity relates to multiple types (Comment on Post/Photo/Video).
### Approach 1: Separate FKs (Recommended for SQL)
```sql
CREATE TABLE Comment (
id BIGINT PRIMARY KEY,
postId BIGINT NULL REFERENCES Post(id),
photoId BIGINT NULL REFERENCES Photo(id),
videoId BIGINT NULL REFERENCES Video(id),
CONSTRAINT one_parent CHECK (
(postId IS NOT NULL)::int +
(photoId IS NOT NULL)::int +
(videoId IS NOT NULL)::int = 1
)
);
```
**Pros:** Type-safe, referential integrity
**Cons:** Schema grows with types
### Approach 2: Supertype/Subtype
```sql
CREATE TABLE Commentable (id BIGINT PRIMARY KEY, type VARCHAR(50));
CREATE TABLE Post (id BIGINT PRIMARY KEY REFERENCES Commentable(id), ...);
CREATE TABLE Photo (id BIGINT PRIMARY KEY REFERENCES Commentable(id), ...);
CREATE TABLE Comment (commentableId BIGINT REFERENCES Commentable(id));
```
**Use when:** Shared attributes across types.
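Queries can then join through the supertype regardless of concrete type (a sketch against the tables above):
```sql
-- Each comment with the concrete type of its parent
SELECT c.commentableId, ct.type
FROM Comment c
JOIN Commentable ct ON ct.id = c.commentableId;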
---
## 4. Graph & Ontology Design
### Property Graph
**Nodes** = entities, **Edges** = relationships, both have properties.
```cypher
CREATE (u:User {id: 1, name: 'Alice'})
CREATE (p:Product {id: 100, name: 'Widget'})
CREATE (u)-[:PURCHASED {date: '2024-01-15', quantity: 2}]->(p)
```
**Schema:**
```
Nodes: User, Product, Category
Edges: PURCHASED (User→Product, {date, quantity})
REVIEWED (User→Product, {rating, comment})
BELONGS_TO (Product→Category)
```
**Design principles:**
- Nodes for entities with identity
- Edges for relationships
- Properties on edges for context
- Avoid deep traversals (< 3 hops)
### RDF Triples (Semantic Web)
Subject-Predicate-Object:
```turtle
ex:Alice rdf:type ex:User .
ex:Alice ex:purchased ex:Widget .
```
**Use RDF when:** Standards compliance, semantic reasoning, linked data
**Use Property Graph when:** Performance, complex traversals
---
## 5. Normalization Levels
### 1NF: Atomic Values
**Violation:** Multiple phones in one column
**Fix:** Separate UserPhone table
### 2NF: No Partial Dependencies
**Violation:** In OrderItem(orderId, productId, productName), productName depends only on productId
**Fix:** productName lives in Product table
### 3NF: No Transitive Dependencies
**Violation:** In Address(id, zipCode, city, state), city/state depend on zipCode
**Fix:** Separate ZipCode table
**When to normalize to 3NF:** OLTP, frequent updates, consistency required
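The 2NF fix above as DDL (a sketch; column sizes are illustrative):
```sql
-- productName lives only in Product; OrderItem references it
CREATE TABLE Product (
    id BIGINT PRIMARY KEY,
    productName VARCHAR(255) NOT NULL
);
CREATE TABLE OrderItem (
    orderId BIGINT NOT NULL REFERENCES Order(id),
    productId BIGINT NOT NULL REFERENCES Product(id),
    quantity INT NOT NULL,
    PRIMARY KEY (orderId, productId)
);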
---
## 6. Strategic Denormalization
**Only after profiling shows bottleneck.**
### Pattern 1: Computed Aggregates
Store `Order.total` instead of summing OrderItems on every query.
**Trade-off:** Faster reads, slower writes, consistency risk (use triggers/app logic)
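One way to contain the consistency risk is a trigger that recomputes the stored total. A PostgreSQL-flavored sketch (assumes an `OrderItem(orderId, quantity, price)` table; syntax varies by vendor):
```sql
CREATE FUNCTION refresh_order_total() RETURNS trigger AS $$
BEGIN
    -- Recompute the denormalized total whenever items change
    UPDATE "Order" SET total = (
        SELECT COALESCE(SUM(quantity * price), 0)
        FROM OrderItem
        WHERE orderId = COALESCE(NEW.orderId, OLD.orderId)
    )
    WHERE id = COALESCE(NEW.orderId, OLD.orderId);
    RETURN COALESCE(NEW, OLD);
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orderitem_changed
AFTER INSERT OR UPDATE OR DELETE ON OrderItem
FOR EACH ROW EXECUTE FUNCTION refresh_order_total();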
### Pattern 2: Frequent Joins
Embed address fields in User table to avoid join.
**Trade-off:** No join, but updates must maintain both
### Pattern 3: Historical Snapshots
```sql
CREATE TABLE OrderSnapshot (
orderId BIGINT,
snapshotDate DATE,
userName VARCHAR(255), -- denormalized from User
userEmail VARCHAR(255),
PRIMARY KEY (orderId, snapshotDate)
);
```
**Use when:** Need point-in-time data (e.g., user's name at time of order)
---
## 7. Temporal & Historical Modeling
### Pattern 1: Effective Dating
```sql
CREATE TABLE Price (
productId BIGINT,
price DECIMAL(10,2),
effectiveFrom DATE NOT NULL,
effectiveTo DATE NULL, -- NULL = current
PRIMARY KEY (productId, effectiveFrom)
);
```
**Query current:** `WHERE effectiveFrom <= CURRENT_DATE AND (effectiveTo IS NULL OR effectiveTo > CURRENT_DATE)`
### Pattern 2: History Table
```sql
CREATE TABLE UserHistory (
id BIGINT AUTO_INCREMENT PRIMARY KEY,
userId BIGINT,
email VARCHAR(255),
name VARCHAR(255),
validFrom TIMESTAMP DEFAULT NOW(),
validTo TIMESTAMP NULL,
changeType VARCHAR(20) -- 'INSERT', 'UPDATE', 'DELETE'
);
```
Trigger on User table inserts into UserHistory on changes.
### Pattern 3: Event Sourcing
```sql
CREATE TABLE OrderEvent (
id BIGINT AUTO_INCREMENT PRIMARY KEY,
orderId BIGINT,
eventType VARCHAR(50), -- 'CREATED', 'ITEM_ADDED', 'SHIPPED'
eventData JSON,
occurredAt TIMESTAMP DEFAULT NOW()
);
```
Reconstruct state by replaying events.
**Trade-offs:**
**Pros:** Complete audit, time travel
**Cons:** Query complexity, storage
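Reconstruction is a fold over the ordered stream, applied in application code. Fetching the stream for one order (the id is illustrative):
```sql
-- Replay in occurrence order; id breaks timestamp ties
SELECT eventType, eventData
FROM OrderEvent
WHERE orderId = 42
ORDER BY occurredAt, id;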
---
## 8. Schema Evolution
### Strategy 1: Backward-Compatible
Safe changes (no app changes):
- Add nullable column
- Add table (not referenced)
- Add index
- Widen column (VARCHAR(100) → VARCHAR(255))
```sql
ALTER TABLE User ADD COLUMN phoneNumber VARCHAR(20) NULL;
```
### Strategy 2: Expand-Contract
For breaking changes:
1. **Expand:** Add new alongside old
```sql
ALTER TABLE User ADD COLUMN newEmail VARCHAR(255) NULL;
```
2. **Migrate:** Copy data
```sql
UPDATE User SET newEmail = email WHERE newEmail IS NULL;
```
3. **Contract:** Remove old
```sql
ALTER TABLE User DROP COLUMN email;
ALTER TABLE User RENAME COLUMN newEmail TO email;
```
### Strategy 3: Versioned Schemas (NoSQL)
```json
{"_schemaVersion": "2.0", "email": "alice@example.com"}
```
App handles multiple versions.
### Strategy 4: Blue-Green
Run old and new schemas simultaneously, dual-write, migrate, switch reads, remove old.
**Best for:** Major redesigns, zero downtime
---
## 9. Multi-Tenancy
### Pattern 1: Separate Databases
```
tenant1_db, tenant2_db, tenant3_db
```
**Pros:** Strong isolation
**Cons:** High overhead
### Pattern 2: Separate Schemas
```sql
CREATE SCHEMA tenant1;
CREATE TABLE tenant1.User (...);
```
**Pros:** Better than separate DBs
**Cons:** Still some overhead
### Pattern 3: Shared Schema + Tenant ID
```sql
CREATE TABLE User (
id BIGINT PRIMARY KEY,
tenantId BIGINT NOT NULL,
email VARCHAR(255),
UNIQUE (tenantId, email)
);
```
**Pros:** Most efficient
**Cons:** Must filter ALL queries by tenantId
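If the database supports it, row-level security can enforce the tenant filter centrally instead of trusting every query. A PostgreSQL sketch (`app.tenant_id` is an assumed per-session setting):
```sql
ALTER TABLE "User" ENABLE ROW LEVEL SECURITY;

-- Every query on "User" is implicitly filtered to the current tenant
CREATE POLICY tenant_isolation ON "User"
    USING (tenantId = current_setting('app.tenant_id')::BIGINT);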
**Recommendation:** Pattern 3 for SaaS, Pattern 1 for regulated industries
---
## 10. Performance
### Indexes
**Covering index** (includes all query columns):
```sql
CREATE INDEX idx_user_status ON User(status) INCLUDE (name, email); -- PostgreSQL 11+/SQL Server
```
**Composite index** (order matters):
```sql
-- Good for: WHERE tenantId = X AND createdAt > Y
CREATE INDEX idx_tenant_date ON Order(tenantId, createdAt);
```
**Partial index** (reduce size):
```sql
CREATE INDEX idx_active ON User(email) WHERE deletedAt IS NULL;
```
### Partitioning
**Horizontal (partitioning; sharding when split across servers):**
```sql
CREATE TABLE Order (...) PARTITION BY RANGE (createdAt);
CREATE TABLE Order_2024_Q1 PARTITION OF Order
FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');
```
**Vertical:** Split hot/cold data into separate tables.
---
## 11. Common Advanced Patterns
### Soft Deletes
```sql
ALTER TABLE User ADD COLUMN deletedAt TIMESTAMP NULL;
-- Query: WHERE deletedAt IS NULL
```
### Audit Columns
```sql
createdAt TIMESTAMP DEFAULT NOW()
updatedAt TIMESTAMP DEFAULT NOW() ON UPDATE NOW() -- MySQL syntax; use a trigger elsewhere
createdBy BIGINT REFERENCES User(id)
updatedBy BIGINT REFERENCES User(id)
```
### State Machines
```sql
CREATE TABLE OrderState (
orderId BIGINT REFERENCES Order(id),
state VARCHAR(20),
transitionedAt TIMESTAMP DEFAULT NOW(),
PRIMARY KEY (orderId, transitionedAt)
);
-- Track: draft → pending → confirmed → shipped → delivered
```
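The current state is simply the latest transition (the id is illustrative):
```sql
-- Latest transition wins
SELECT state
FROM OrderState
WHERE orderId = 42
ORDER BY transitionedAt DESC
LIMIT 1;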
### Idempotency Keys
```sql
CREATE TABLE Request (
idempotencyKey UUID PRIMARY KEY,
payload JSON,
result JSON,
processedAt TIMESTAMP
);
-- Prevents duplicate processing
```
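A typical usage sketch (PostgreSQL `ON CONFLICT`; key and payload are illustrative):
```sql
-- First request claims the key; retries with the same key insert nothing
INSERT INTO Request (idempotencyKey, payload)
VALUES ('a1b2c3d4-0000-4000-8000-000000000001', '{"action": "charge"}')
ON CONFLICT (idempotencyKey) DO NOTHING;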

# Data Schema & Knowledge Modeling Template
## Workflow
Copy this checklist and track your progress:
```
Data Schema & Knowledge Modeling Progress:
- [ ] Step 1: Gather domain requirements and scope
- [ ] Step 2: Identify entities and attributes systematically
- [ ] Step 3: Define relationships and cardinality
- [ ] Step 4: Specify constraints and invariants
- [ ] Step 5: Validate against use cases and document
```
**Step 1: Gather domain requirements and scope**
Ask user for domain description, core use cases, existing data sources, scale requirements, and technology stack. Use [Input Questions](#input-questions).
**Step 2: Identify entities and attributes systematically**
Extract entities from requirements using [Entity Identification](#entity-identification). Define attributes with types and nullability using [Attribute Guide](#attribute-guide).
**Step 3: Define relationships and cardinality**
Map entity connections using [Relationship Mapping](#relationship-mapping). Specify cardinality (1:1, 1:N, M:N) and optionality.
**Step 4: Specify constraints and invariants**
Define business rules and constraints using [Constraint Specification](#constraint-specification). Document domain invariants.
**Step 5: Validate against use cases and document**
Create `data-schema-knowledge-modeling.md` using [Template](#schema-documentation-template). Verify using [Validation Checklist](#validation-checklist).
---
## Input Questions
**Domain & Scope:**
- What domain? (e-commerce, healthcare, social network)
- Boundaries? In/out of scope?
- Existing schemas to integrate/migrate from?
**Core Use Cases:**
- Primary operations? (CRUD for which entities?)
- Required queries/reports?
- Access patterns? (read-heavy, write-heavy, mixed)
**Scale & Performance:**
- Data volume? (rows per table, storage)
- Growth rate? (daily/monthly)
- Performance SLAs?
**Technology:**
- Database? (PostgreSQL, MongoDB, Neo4j, etc.)
- Compliance? (GDPR, HIPAA, SOC2)
- Evolution needs? (schema versioning, migrations)
---
## Entity Identification
**Step 1: Extract nouns**
List nouns from requirements = candidate entities.
**Step 2: Validate**
For each, check:
- [ ] Distinct identity? (can point to "this specific X")
- [ ] Independent lifecycle?
- [ ] Multiple attributes beyond name?
- [ ] Track multiple instances?
**Keep** if yes to most. **Reject** if just an attribute.
**Step 3: Entity vs Value Object**
- **Entity**: Has ID, mutable (User, Order)
- **Value Object**: No ID, immutable (Address, Money)
**Step 4: Document**
```markdown
### Entity: [Name]
**Purpose:** [What it represents]
**Examples:** [2-3 concrete cases]
**Lifecycle:** [Creation → deletion]
**Invariants:** [Rules that must hold]
```
---
## Attribute Guide
**Template:**
```
attributeName: DataType [NULL|NOT NULL] [DEFAULT value]
- Description: [What it represents]
- Validation: [Constraints]
- Examples: [Sample values]
```
**Standard attributes:**
- `id`: Primary key (UUID/BIGINT)
- `createdAt`: TIMESTAMP NOT NULL
- `updatedAt`: TIMESTAMP NOT NULL
- `deletedAt`: TIMESTAMP NULL (soft deletes)
**Data types:**
| Data | SQL | NoSQL | Notes |
|------|-----|-------|-------|
| Short text | VARCHAR(N) | String | Specify max |
| Long text | TEXT | String | No limit |
| Integer | INT/BIGINT | Number | Choose size |
| Decimal | DECIMAL(P,S) | Number | Fixed precision |
| Money | DECIMAL(19,4) | {amount,currency} | Never FLOAT |
| Boolean | BOOLEAN | Boolean | Not nullable |
| Date/Time | TIMESTAMP | ISODate | With timezone |
| UUID | UUID/CHAR(36) | String | Distributed IDs |
| JSON | JSON/JSONB | Object | Flexible |
| Enum | ENUM/VARCHAR | String | Fixed values |
**Nullability:**
- NOT NULL if required
- NULL if optional/unknown at creation
- Avoid NULL for booleans
---
## Relationship Mapping
**Cardinality:**
**1:1** - User has one Profile
- SQL: `Profile.userId UNIQUE NOT NULL REFERENCES User(id)`
**1:N** - User has many Orders
- SQL: `Order.userId NOT NULL REFERENCES User(id)`
**M:N** - Order contains Products
- Junction table:
```sql
CREATE TABLE OrderItem (
  orderId BIGINT REFERENCES Order(id),
  productId BIGINT REFERENCES Product(id),
  quantity INT NOT NULL,
  PRIMARY KEY (orderId, productId)
);
```
**Optionality:**
- Required: NOT NULL
- Optional: NULL
**Naming:**
Use verbs: User **owns** Order, Product **belongs to** Category
---
## Constraint Specification
**Primary Keys:**
```sql
id BIGINT PRIMARY KEY AUTO_INCREMENT
-- or --
id UUID PRIMARY KEY DEFAULT gen_random_uuid()
```
**Unique:**
```sql
email VARCHAR(255) UNIQUE NOT NULL
UNIQUE (userId, productId) -- composite
```
**Foreign Keys:**
```sql
userId BIGINT NOT NULL REFERENCES User(id) ON DELETE CASCADE
-- Options: CASCADE, SET NULL, RESTRICT
```
**Check Constraints:**
```sql
price DECIMAL(10,2) CHECK (price >= 0)
status VARCHAR(20) CHECK (status IN ('draft','pending','completed'))
```
**Domain Invariants:**
Document business rules:
```markdown
### Invariant: Order total = sum of items
Order.total = SUM(OrderItem.quantity * OrderItem.price)
### Invariant: Unique email
No duplicate emails (case-insensitive)
```
Enforce via: DB constraints (preferred), application logic, or triggers.
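For example, the unique-email invariant above can be pushed into the database with a functional index (a PostgreSQL sketch):
```sql
-- Case-insensitive uniqueness: 'Alice@x.com' and 'alice@x.com' collide
CREATE UNIQUE INDEX uq_user_email_ci ON "User" (LOWER(email));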
---
## Schema Documentation Template
Create: `data-schema-knowledge-modeling.md`
**Required sections:**
1. **Domain Overview** - Purpose, scope, technology
2. **Use Cases** - Primary operations, query patterns
3. **Entity Definitions** - For each entity:
- Purpose, examples, lifecycle
- Attributes table (name, type, null, default, constraints, description)
- Relationships (cardinality, FK, optionality)
- Invariants
4. **ERD** - Visual/text diagram showing relationships
5. **Constraints** - DB constraints, domain invariants
6. **Normalization** - Level, denormalization decisions
7. **Implementation** - SQL DDL / JSON Schema / Graph schema as appropriate
8. **Validation** - Check each use case is supported
9. **Open Questions** - Unresolved decisions
**Example entity definition:**
```markdown
### Entity: Order
**Purpose:** Represents customer purchase transaction
**Examples:** Amazon order #123, Shopify order #456
**Lifecycle:** Created on checkout → Updated during fulfillment → Completed on delivery
#### Attributes
| Attribute | Type | Null? | Default | Constraints | Description |
|---|---|---|---|---|---|
| id | BIGINT | NO | auto | PK | Unique identifier |
| userId | BIGINT | NO | - | FK→User | Customer who placed order |
| status | VARCHAR(20) | NO | 'pending' | CHECK IN(...) | Order status |
| total | DECIMAL(10,2) | NO | - | CHECK >= 0 | Order total |
#### Relationships
- **belongs to:** 1:N with User (Order.userId → User.id)
- **contains:** 1:N with OrderItem junction table
#### Invariants
- total = SUM(OrderItem.quantity * OrderItem.price)
- status transitions: pending → confirmed → shipped → delivered
```
---
## Validation Checklist
**Completeness:**
- [ ] All entities identified
- [ ] All attributes defined (types, nullability)
- [ ] All relationships mapped (cardinality)
- [ ] All constraints specified
- [ ] All invariants documented
**Correctness:**
- [ ] Each entity distinct purpose
- [ ] No redundant entities
- [ ] Attributes in correct entities
- [ ] Cardinality reflects reality
- [ ] Constraints enforce rules
**Use Case Coverage:**
- [ ] Supports all CRUD operations
- [ ] All queries answerable
- [ ] Indexes planned
- [ ] No missing joins
**Normalization:**
- [ ] No partial dependencies (2NF)
- [ ] No transitive dependencies (3NF)
- [ ] Denormalization documented
- [ ] No update anomalies
**Technical Quality:**
- [ ] Consistent naming
- [ ] Appropriate data types
- [ ] Primary keys defined
- [ ] Foreign keys maintain integrity
- [ ] Soft delete strategy (if needed)
- [ ] Audit fields (if needed)
**Future-Proofing:**
- [ ] Schema extensible
- [ ] Migration path (if applicable)
- [ ] Versioning strategy
- [ ] No technical debt
---
## Common Pitfalls
**1. Modeling implementation, not domain**
- Symptom: Entities like "UserSession", "Cache"
- Fix: Model real-world concepts only
**2. God entities**
- Symptom: User with 50+ attributes
- Fix: Extract to separate entities
**3. Missing junction tables**
- Symptom: M:N with FKs
- Fix: Always use junction table
**4. Nullable FKs without reason**
- Symptom: All relationships optional
- Fix: NOT NULL unless truly optional
**5. Not enforcing invariants**
- Symptom: Rules in docs only
- Fix: CHECK constraints, triggers, app validation
**6. Premature denormalization**
- Symptom: Duplicating without measurement
- Fix: Normalize first, denormalize after profiling
**7. Wrong data types**
- Symptom: Money as VARCHAR
- Fix: DECIMAL for money, proper types for all
**8. No migration strategy**
- Symptom: Can't change schema
- Fix: Versioning, backward-compat changes