Initial commit
@@ -0,0 +1,282 @@
{
  "criteria": [
    {
      "name": "Entity Identification & Completeness",
      "description": "Are all domain entities identified? Each with clear purpose, distinct identity, and no redundancy?",
      "scoring": {
        "1": "Missing critical entities. Entities poorly defined or overlapping. No clear distinction between entities and attributes.",
        "2": "Some entities identified but gaps in coverage. Some entity purposes unclear. Minor redundancy.",
        "3": "Most entities identified with clear purposes. Reasonable coverage. Entities generally distinct.",
        "4": "All required entities identified with clear, documented purposes. Good examples provided. No redundancy.",
        "5": "Complete entity coverage validated against all use cases. Each entity has its purpose, examples, and lifecycle documented. Entity vs value object distinction clear. No overlap or redundancy."
      }
    },
    {
      "name": "Attribute Definition Quality",
      "description": "Are attributes complete with appropriate data types, nullability, and constraints?",
      "scoring": {
        "1": "Attributes missing or poorly typed. Wrong data types (e.g., money as VARCHAR). Nullability ignored.",
        "2": "Basic attributes present but some types questionable. Nullability inconsistent. Some constraints missing.",
        "3": "Attributes defined with reasonable types. Nullability specified. Core constraints present.",
        "4": "All attributes well-typed (DECIMAL for money, proper VARCHAR lengths). Nullability correctly specified. Constraints documented.",
        "5": "Comprehensive attribute definitions with justification for types, nullability, defaults, and constraints. Audit fields (createdAt, updatedAt) included where appropriate. No technical debt."
      }
    },
    {
      "name": "Relationship Modeling Accuracy",
      "description": "Are relationships correctly identified with proper cardinality, optionality, and implementation?",
      "scoring": {
        "1": "Relationships missing or incorrect. Cardinality wrong. M:N modeled without junction table.",
        "2": "Some relationships identified but cardinality questionable. Missing junction tables or unclear optionality.",
        "3": "Most relationships mapped with cardinality. Junction tables for M:N. Optionality specified.",
        "4": "All relationships correctly modeled. Proper cardinality (1:1, 1:N, M:N). Junction tables where needed. Clear optionality.",
        "5": "Comprehensive relationship documentation with bidirectional naming, implementation details (FKs, ON DELETE actions), and validation that relationships support all use cases. Complex patterns (polymorphic, hierarchical) correctly handled."
      }
    },
    {
      "name": "Constraint & Invariant Specification",
      "description": "Are business rules enforced via constraints? Are domain invariants documented and validated?",
      "scoring": {
        "1": "No constraints beyond primary keys. Business rules not documented. Invariants missing.",
        "2": "Basic constraints (NOT NULL, UNIQUE) present but business rules not enforced. Invariants mentioned but not validated.",
        "3": "Good constraint coverage (PK, FK, UNIQUE, NOT NULL). Some business rules enforced. Invariants documented.",
        "4": "Comprehensive constraints including CHECK constraints for business rules. All invariants documented with enforcement strategy.",
        "5": "All constraints documented with rationale. Domain invariants clearly stated and enforced via DB constraints where possible, application logic where not. Validation strategy for complex multi-table invariants. Examples of enforcement code provided."
      }
    },
    {
      "name": "Normalization & Data Integrity",
      "description": "Is schema properly normalized (or deliberately denormalized with rationale)?",
      "scoring": {
        "1": "Severe normalization violations. Redundant data. Update anomalies likely.",
        "2": "Some normalization but violations present (partial or transitive dependencies). Some redundancy.",
        "3": "Generally normalized to 2NF-3NF. Minimal redundancy. Rationale for exceptions provided.",
        "4": "Proper normalization to 3NF. Any denormalization documented with performance justification. No update anomalies.",
        "5": "Exemplary normalization with clear explanation of level achieved (1NF/2NF/3NF/BCNF). Strategic denormalization only where measured performance gains justify it. Trade-offs explicitly documented. No data integrity risks."
      }
    },
    {
      "name": "Use Case Coverage & Validation",
      "description": "Does schema support all required use cases? Can all queries be answered?",
      "scoring": {
        "1": "Schema doesn't support core use cases. Critical queries impossible or require workarounds.",
        "2": "Supports some use cases but gaps exist. Some queries difficult or inefficient.",
        "3": "Supports most use cases. Required queries possible though some may be complex.",
        "4": "All use cases supported. Validation checklist shows each use case can be satisfied. Query paths identified.",
        "5": "Comprehensive validation against all use cases with example queries. Indexes planned for performance. Edge cases considered. Future use cases accommodated by design."
      }
    },
    {
      "name": "Technology Appropriateness",
      "description": "Is the schema type (relational, document, graph) appropriate for the domain?",
      "scoring": {
        "1": "Wrong technology choice (e.g., relational for graph problem, or vice versa). Implementation doesn't match paradigm.",
        "2": "Technology choice questionable. Implementation awkward for chosen paradigm.",
        "3": "Reasonable technology choice. Implementation follows paradigm conventions.",
        "4": "Good technology choice with justification. Implementation leverages paradigm strengths.",
        "5": "Optimal technology choice with clear rationale comparing alternatives. Implementation exemplifies paradigm best practices. Schema leverages technology-specific features appropriately (e.g., JSONB in PostgreSQL, graph traversal in Neo4j)."
      }
    },
    {
      "name": "Documentation Quality & Clarity",
      "description": "Is schema well-documented with ERD, implementation code, and clear explanations?",
      "scoring": {
        "1": "Minimal documentation. No diagram. Entity definitions incomplete.",
        "2": "Basic documentation present but gaps. Diagram missing or unclear. Some entities poorly explained.",
        "3": "Good documentation with most sections complete. Diagram present. Entities explained.",
        "4": "Comprehensive documentation following template. ERD clear. All entities, relationships, constraints documented. Implementation code provided.",
        "5": "Exemplary documentation that could serve as reference. ERD/diagram clear and complete. All sections filled thoroughly. Implementation code (SQL DDL / JSON Schema / Cypher) executable. Examples aid understanding. Could be handed to developer for immediate implementation."
      }
    },
    {
      "name": "Evolution & Migration Strategy",
      "description": "Is there a plan for schema changes? Migration path from existing systems considered?",
      "scoring": {
        "1": "No evolution strategy. If migrating, no plan for existing data.",
        "2": "Evolution mentioned but no concrete strategy. Migration path vague.",
        "3": "Basic evolution strategy (versioning or backward-compat approach). Migration considered if applicable.",
        "4": "Clear evolution strategy documented. Migration path defined with phases if migrating from legacy.",
        "5": "Comprehensive evolution strategy with versioning, backward-compatibility approach, and detailed migration plan if applicable. Rollback strategy considered. Zero-downtime deployment approach specified. Future extensibility designed in."
      }
    },
    {
      "name": "Advanced Pattern Application",
      "description": "Are advanced patterns (temporal, hierarchies, polymorphic) correctly applied when needed?",
      "scoring": {
        "1": "Complex patterns needed but missing or incorrectly implemented.",
        "2": "Attempted advanced patterns but implementation flawed or overly complex.",
        "3": "Advanced patterns applied where needed with reasonable implementation.",
        "4": "Advanced patterns correctly implemented with good trade-off decisions (e.g., hierarchy approach chosen based on use case).",
        "5": "Sophisticated pattern usage with clear rationale for choices. Temporal modeling, hierarchies, polymorphic associations, or graph patterns implemented optimally for domain. Trade-offs explicit and justified."
      }
    }
  ],
  "schema_type_guidance": {
    "Relational (SQL)": {
      "target_score": 3.5,
      "focus_criteria": [
        "Normalization & Data Integrity",
        "Constraint & Invariant Specification",
        "Relationship Modeling Accuracy"
      ],
      "key_patterns": [
        "Proper normalization (3NF typical)",
        "Foreign key relationships with CASCADE/RESTRICT",
        "CHECK constraints for business rules",
        "Junction tables for M:N relationships"
      ]
    },
    "Document/NoSQL": {
      "target_score": 3.5,
      "focus_criteria": [
        "Entity Identification & Completeness",
        "Use Case Coverage & Validation",
        "Technology Appropriateness"
      ],
      "key_patterns": [
        "Embed vs reference decision documented",
        "Denormalization for read performance justified",
        "Document structure matches query patterns",
        "JSON Schema validation if available"
      ]
    },
    "Graph Database": {
      "target_score": 4.0,
      "focus_criteria": [
        "Relationship Modeling Accuracy",
        "Technology Appropriateness",
        "Advanced Pattern Application"
      ],
      "key_patterns": [
        "Nodes for entities, edges for relationships",
        "Properties on edges for context",
        "Traversal patterns optimized (< 3 hops typical)",
        "Index on frequently filtered properties"
      ]
    },
    "Data Warehouse (OLAP)": {
      "target_score": 3.5,
      "focus_criteria": [
        "Use Case Coverage & Validation",
        "Normalization & Data Integrity",
        "Technology Appropriateness"
      ],
      "key_patterns": [
        "Star or snowflake schema",
        "Fact tables with foreign keys to dimensions",
        "Dimensional attributes denormalized",
        "Slowly changing dimensions handled"
      ]
    }
  },
  "domain_complexity_guidance": {
    "Simple Domain (< 10 entities, straightforward relationships)": {
      "target_score": 3.0,
      "acceptable_shortcuts": [
        "ERD can be a simple text diagram",
        "Fewer implementation details needed",
        "Basic constraints sufficient"
      ],
      "key_quality_gates": [
        "All entities identified",
        "Relationships correct",
        "Supports use cases"
      ]
    },
    "Standard Domain (10-30 entities, moderate complexity)": {
      "target_score": 3.5,
      "required_elements": [
        "Complete entity definitions",
        "ERD diagram",
        "All relationships mapped",
        "Constraints documented",
        "Implementation code (DDL/schema)"
      ],
      "key_quality_gates": [
        "All 10 criteria evaluated",
        "Minimum score of 3 on each",
        "Average ≥ 3.5"
      ]
    },
    "Complex Domain (30+ entities, hierarchies, temporal, polymorphic)": {
      "target_score": 4.0,
      "required_elements": [
        "Comprehensive documentation",
        "Multiple diagrams (ERD + detail views)",
        "Advanced pattern usage documented",
        "Migration strategy if applicable",
        "Performance considerations",
        "Example queries for complex patterns"
      ],
      "key_quality_gates": [
        "All 10 criteria evaluated",
        "Minimum score of 3.5 on each",
        "Average ≥ 4.0",
        "Score 5 on Advanced Pattern Application"
      ]
    }
  },
  "common_failure_modes": {
    "1. God Entities": {
      "symptom": "User table with 50+ attributes, or single entity handling multiple concerns",
      "why_it_fails": "Violates single responsibility, hard to query, update anomalies",
      "fix": "Extract related concerns into separate entities (UserProfile, UserPreferences, UserAddress)",
      "related_criteria": ["Entity Identification & Completeness", "Normalization & Data Integrity"]
    },
    "2. Missing Junction Tables": {
      "symptom": "Attempting M:N relationship with direct foreign keys or comma-separated IDs",
      "why_it_fails": "Can't properly model M:N, violates 1NF, query complexity",
      "fix": "Always use a junction table with a composite primary key for M:N relationships",
      "related_criteria": ["Relationship Modeling Accuracy", "Normalization & Data Integrity"]
    },
    "3. Wrong Data Types": {
      "symptom": "Money as FLOAT, dates as VARCHAR, booleans as CHAR(1)",
      "why_it_fails": "Precision loss (money), format inconsistency (dates), unclear semantics (booleans)",
      "fix": "Use DECIMAL for money, DATE/TIMESTAMP for dates, BOOLEAN for flags",
      "related_criteria": ["Attribute Definition Quality"]
    },
    "4. No Constraints": {
      "symptom": "Business rules in documentation but not enforced in schema",
      "why_it_fails": "Application bugs can corrupt data, no database-level guarantees",
      "fix": "Use CHECK constraints, NOT NULL, UNIQUE, FK constraints to enforce rules",
      "related_criteria": ["Constraint & Invariant Specification"]
    },
    "5. Premature Denormalization": {
      "symptom": "Duplicating data for \"performance\" without measuring",
      "why_it_fails": "Update anomalies, data inconsistency, wasted effort if not bottleneck",
      "fix": "Normalize first (3NF), denormalize only after profiling shows actual bottleneck",
      "related_criteria": ["Normalization & Data Integrity", "Use Case Coverage & Validation"]
    },
    "6. Ignoring Use Cases": {
      "symptom": "Schema designed in isolation, doesn't support required queries",
      "why_it_fails": "Schema can't answer business questions, requires redesign",
      "fix": "Validate schema against ALL use cases. Write example queries to verify.",
      "related_criteria": ["Use Case Coverage & Validation"]
    },
    "7. Modeling Implementation": {
      "symptom": "Entities like \"UserSession\", \"Cache\", \"Queue\" in domain model",
      "why_it_fails": "Confuses domain concepts with technical infrastructure",
      "fix": "Model real-world domain entities only. Infrastructure is a separate concern.",
      "related_criteria": ["Entity Identification & Completeness", "Technology Appropriateness"]
    },
    "8. No Evolution Strategy": {
      "symptom": "Can't change schema without breaking production",
      "why_it_fails": "Schema ossifies, can't adapt to business changes",
      "fix": "Plan for evolution: versioning, backward-compat changes, or expand-contract migrations",
      "related_criteria": ["Evolution & Migration Strategy"]
    }
  },
  "scale": {
    "description": "Each criterion scored 1-5",
    "min_score": 1,
    "max_score": 5,
    "passing_threshold": 3.5,
    "excellence_threshold": 4.5
  },
  "usage_notes": {
    "when_to_score": "After completing schema design, before delivering to user",
    "minimum_standard": "Average score ≥ 3.5 across all criteria (standard domain). Simple domains: ≥ 3.0. Complex domains: ≥ 4.0.",
    "how_to_improve": "If scoring < threshold, identify lowest-scoring criteria and iterate. Common fixes: add missing entities, specify constraints, validate against use cases, improve documentation.",
    "self_assessment": "Score honestly. Schema flaws are expensive to fix in production. Better to iterate now."
  }
}
439
skills/data-schema-knowledge-modeling/resources/methodology.md
Normal file
@@ -0,0 +1,439 @@
# Data Schema & Knowledge Modeling: Advanced Methodology

## Workflow

```
Advanced Schema Modeling:
- [ ] Step 1: Analyze complex domain patterns
- [ ] Step 2: Design advanced relationship structures
- [ ] Step 3: Apply normalization or strategic denormalization
- [ ] Step 4: Model temporal/historical aspects
- [ ] Step 5: Plan schema evolution strategy
```

**Steps:** (1) Identify patterns in [Advanced Relationships](#1-advanced-relationship-patterns), (2) Apply [Hierarchy](#2-hierarchy-modeling) and [Polymorphic](#3-polymorphic-associations) patterns, (3) Use [Normalization](#5-normalization-levels) then [Denormalization](#6-strategic-denormalization), (4) Add [Temporal](#7-temporal--historical-modeling) modeling if needed, (5) Plan [Evolution](#8-schema-evolution).

---

## 1. Advanced Relationship Patterns

### Self-Referential

An entity relates to itself (org charts, category trees, social networks).

```sql
CREATE TABLE Employee (
  id BIGINT PRIMARY KEY,
  managerId BIGINT NULL REFERENCES Employee(id),
  CONSTRAINT no_self_ref CHECK (id != managerId)
);
```

Query the full hierarchy with a recursive CTE. The CHECK blocks only direct self-reference; longer cycles (A manages B, B manages A) must be prevented by application logic or a trigger.
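
As a minimal sketch, the recursive query can be tried with Python's built-in `sqlite3` module (SQLite has supported `WITH RECURSIVE` since 3.8.3). The `name` column and the sample rows are hypothetical. The CTE seeds itself with the manager's direct reports, then repeatedly joins back to `employee` to walk down the tree, tracking depth as it goes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
  id INTEGER PRIMARY KEY,
  name TEXT NOT NULL,
  managerId INTEGER NULL REFERENCES employee(id),
  CONSTRAINT no_self_ref CHECK (id <> managerId)
);
-- Hypothetical org chart: Alice manages Bob and Carol; Bob manages Dave.
INSERT INTO employee VALUES (1, 'Alice', NULL), (2, 'Bob', 1), (3, 'Carol', 1), (4, 'Dave', 2);
""")

def subordinates(manager_id):
    """Return (name, depth) for everyone below manager_id, at any depth."""
    return conn.execute("""
        WITH RECURSIVE chain(id, name, depth) AS (
            SELECT id, name, 1 FROM employee WHERE managerId = ?   -- direct reports
            UNION ALL
            SELECT e.id, e.name, c.depth + 1                       -- one level further down
            FROM employee e JOIN chain c ON e.managerId = c.id
        )
        SELECT name, depth FROM chain ORDER BY depth, name
    """, (manager_id,)).fetchall()

print(subordinates(1))  # [('Bob', 1), ('Carol', 1), ('Dave', 2)]
```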

### Conditional

The relationship is required only in certain states. Here, an order must reference a payment once it is paid.

```sql
CREATE TABLE Order (
  id BIGINT PRIMARY KEY,
  status VARCHAR(20) NOT NULL, -- NOT NULL matters: a NULL status makes the CHECK unknown, which passes
  paymentId BIGINT NULL REFERENCES Payment(id),
  CONSTRAINT payment_when_paid CHECK (
    (status IN ('paid','completed') AND paymentId IS NOT NULL) OR
    (status NOT IN ('paid','completed'))
  )
);
```
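
To see the CHECK in action, the sketch below recreates it in SQLite through Python's `sqlite3`. The table is named `orders` because ORDER is a reserved word in SQL. `try_insert` is a hypothetical helper that reports whether the database accepted a row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE payment (id INTEGER PRIMARY KEY);
CREATE TABLE orders (
  id INTEGER PRIMARY KEY,
  status TEXT NOT NULL,
  paymentId INTEGER NULL REFERENCES payment(id),
  CONSTRAINT payment_when_paid CHECK (
    (status IN ('paid', 'completed') AND paymentId IS NOT NULL) OR
    (status NOT IN ('paid', 'completed'))
  )
);
INSERT INTO payment VALUES (10);
""")

def try_insert(order_id, status, payment_id):
    """Insert an order; return True if the CHECK constraint accepted it."""
    try:
        conn.execute("INSERT INTO orders VALUES (?, ?, ?)", (order_id, status, payment_id))
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert(1, "pending", None))  # True: no payment needed yet
print(try_insert(2, "paid", 10))       # True: paid and linked to a payment
print(try_insert(3, "paid", None))     # False: paid without a payment
```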

### Multi-Parent

An entity can have several parents, such as a document filed in multiple folders. Model this with a junction table.

```sql
CREATE TABLE DocumentFolder (
  documentId BIGINT REFERENCES Document(id),
  folderId BIGINT REFERENCES Folder(id),
  PRIMARY KEY (documentId, folderId)
);
```

---

## 2. Hierarchy Modeling

Four approaches with trade-offs:

| Approach | Implementation | Read | Write | Best For |
|----------|---------------|------|-------|----------|
| **Adjacency List** | `parentId` column | Slow (recursive) | Fast | Shallow trees, frequent updates |
| **Path Enumeration** | `path VARCHAR` ('/1/5/12/') | Fast | Medium | Read-heavy, moderate depth |
| **Nested Sets** | `lft, rgt INT` | Fastest | Slow | Read-heavy, rare writes |
| **Closure Table** | Separate ancestor/descendant table | Fastest | Medium | Complex queries, any depth |

**Adjacency List:**
```sql
CREATE TABLE Category (
  id BIGINT PRIMARY KEY,
  parentId BIGINT NULL REFERENCES Category(id)
);
```

**Closure Table:**
```sql
CREATE TABLE CategoryClosure (
  ancestor BIGINT NOT NULL REFERENCES Category(id),
  descendant BIGINT NOT NULL REFERENCES Category(id),
  depth INT NOT NULL, -- 0 = self, 1 = child, 2+ = deeper
  PRIMARY KEY (ancestor, descendant)
);
```

**Recommendation:** Use an adjacency list for shallow trees (fewer than about 5 levels) and a closure table when you need complex ancestor/descendant queries.
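
A closure table stays correct if each insert follows one rule: add a self-row at depth 0, then copy every ancestor row of the new node's parent with depth + 1. The sketch below shows that rule with `sqlite3`, using the table and column names from the DDL above in lowercase. The `name` column and the sample tree are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE category_closure (
  ancestor INTEGER NOT NULL REFERENCES category(id),
  descendant INTEGER NOT NULL REFERENCES category(id),
  depth INTEGER NOT NULL,
  PRIMARY KEY (ancestor, descendant)
);
""")

def add_category(cat_id, name, parent_id=None):
    """Insert a node plus its closure rows."""
    conn.execute("INSERT INTO category VALUES (?, ?)", (cat_id, name))
    conn.execute("INSERT INTO category_closure VALUES (?, ?, 0)", (cat_id, cat_id))
    if parent_id is not None:
        # Every ancestor of the parent (including the parent itself) is an ancestor of the new node.
        conn.execute("""
            INSERT INTO category_closure (ancestor, descendant, depth)
            SELECT ancestor, ?, depth + 1 FROM category_closure WHERE descendant = ?
        """, (cat_id, parent_id))

def descendants(cat_id):
    """All nodes below cat_id at any depth, found with a single non-recursive query."""
    rows = conn.execute("""
        SELECT c.name FROM category_closure cc JOIN category c ON c.id = cc.descendant
        WHERE cc.ancestor = ? AND cc.depth > 0
        ORDER BY cc.depth, c.name
    """, (cat_id,))
    return [name for (name,) in rows]

# Hypothetical tree: Electronics > Computers > Laptops, and Electronics > Phones
add_category(1, "Electronics")
add_category(2, "Computers", 1)
add_category(3, "Laptops", 2)
add_category(4, "Phones", 1)
print(descendants(1))  # ['Computers', 'Phones', 'Laptops']
```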

---

## 3. Polymorphic Associations

An entity relates to one of several types (a Comment on a Post, Photo, or Video).

### Approach 1: Separate FKs (Recommended for SQL)

```sql
CREATE TABLE Comment (
  id BIGINT PRIMARY KEY,
  postId BIGINT NULL REFERENCES Post(id),
  photoId BIGINT NULL REFERENCES Photo(id),
  videoId BIGINT NULL REFERENCES Video(id),
  -- Exactly one parent must be set (::int is PostgreSQL cast syntax)
  CONSTRAINT one_parent CHECK (
    (postId IS NOT NULL)::int +
    (photoId IS NOT NULL)::int +
    (videoId IS NOT NULL)::int = 1
  )
);
```

**Pros:** Type-safe; the database enforces referential integrity
**Cons:** Schema grows with each new parent type (one more column and CHECK term)

### Approach 2: Supertype/Subtype

```sql
CREATE TABLE Commentable (id BIGINT PRIMARY KEY, type VARCHAR(50) NOT NULL);
CREATE TABLE Post (id BIGINT PRIMARY KEY REFERENCES Commentable(id), ...);
CREATE TABLE Photo (id BIGINT PRIMARY KEY REFERENCES Commentable(id), ...);
CREATE TABLE Comment (id BIGINT PRIMARY KEY, commentableId BIGINT NOT NULL REFERENCES Commentable(id));
```

**Use when:** The types share attributes, which can then live on the supertype table.

---

## 4. Graph & Ontology Design

### Property Graph

**Nodes** = entities, **Edges** = relationships; both can carry properties.

```cypher
CREATE (u:User {id: 1, name: 'Alice'})
CREATE (p:Product {id: 100, name: 'Widget'})
CREATE (u)-[:PURCHASED {date: date('2024-01-15'), quantity: 2}]->(p)
```

**Schema:**
```
Nodes: User, Product, Category
Edges: PURCHASED (User→Product, {date, quantity})
       REVIEWED (User→Product, {rating, comment})
       BELONGS_TO (Product→Category)
```

**Design principles:**
- Nodes for entities with identity
- Edges for relationships
- Properties on edges for context
- Keep common traversals shallow (< 3 hops)

### RDF Triples (Semantic Web)

Subject-Predicate-Object:
```turtle
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ex: <http://example.org/> .

ex:Alice rdf:type ex:User .
ex:Alice ex:purchased ex:Widget .
```

**Use RDF when:** Standards compliance, semantic reasoning, linked data
**Use Property Graph when:** Performance, complex traversals

---

## 5. Normalization Levels

### 1NF: Atomic Values

**Violation:** Multiple phone numbers in one column
**Fix:** Separate UserPhone table

### 2NF: No Partial Dependencies

**Violation:** In OrderItem(orderId, productId, productName), productName depends only on productId
**Fix:** productName lives in the Product table

### 3NF: No Transitive Dependencies

**Violation:** In Address(id, zipCode, city, state), city/state depend on zipCode
**Fix:** Separate ZipCode table

**When to normalize to 3NF:** OLTP, frequent updates, consistency required
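
A dependency X → Y holds when every X value maps to exactly one Y value. Sample data can refute a dependency but never prove one. Even so, a quick check like the hypothetical `fd_holds` helper below is a handy way to spot the partial and transitive dependencies described above before you split a table. The rows are made-up `OrderItem` data.

```python
from collections import defaultdict

def fd_holds(rows, lhs, rhs):
    """True if, in these rows, each combination of lhs values maps to a single combination of rhs values."""
    seen = defaultdict(set)
    for row in rows:
        key = tuple(row[c] for c in lhs)
        seen[key].add(tuple(row[c] for c in rhs))
    return all(len(values) == 1 for values in seen.values())

# Hypothetical OrderItem rows exhibiting the 2NF violation above; the key is (orderId, productId).
order_items = [
    {"orderId": 1, "productId": 7, "productName": "Widget", "quantity": 2},
    {"orderId": 2, "productId": 7, "productName": "Widget", "quantity": 5},
    {"orderId": 2, "productId": 9, "productName": "Gadget", "quantity": 1},
]

# productName depends on productId alone, a proper subset of the key: a partial dependency.
print(fd_holds(order_items, ["productId"], ["productName"]))  # True
# quantity needs the full key, so it correctly stays in OrderItem.
print(fd_holds(order_items, ["productId"], ["quantity"]))     # False
```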

---

## 6. Strategic Denormalization

**Denormalize only after profiling shows a bottleneck.**

### Pattern 1: Computed Aggregates

Store `Order.total` instead of summing OrderItems on every query.

**Trade-off:** Faster reads, slower writes, and a consistency risk; keep the stored value in sync with triggers or application logic.
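
One way to keep `Order.total` consistent is to let triggers apply every item change to the stored total. The sketch below does this in SQLite via `sqlite3`. The table names are lowercase, `orders` avoids the reserved word ORDER, and amounts are integer cents (an assumption, because SQLite has no exact DECIMAL type). `stored_vs_computed` is a hypothetical helper for the consistency check you would run periodically in production.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, totalCents INTEGER NOT NULL DEFAULT 0);
CREATE TABLE order_item (
  id INTEGER PRIMARY KEY,
  orderId INTEGER NOT NULL REFERENCES orders(id),
  amountCents INTEGER NOT NULL
);
-- Each trigger adjusts the stored total by the change made to order_item.
CREATE TRIGGER item_ins AFTER INSERT ON order_item BEGIN
  UPDATE orders SET totalCents = totalCents + NEW.amountCents WHERE id = NEW.orderId;
END;
CREATE TRIGGER item_del AFTER DELETE ON order_item BEGIN
  UPDATE orders SET totalCents = totalCents - OLD.amountCents WHERE id = OLD.orderId;
END;
CREATE TRIGGER item_upd AFTER UPDATE ON order_item BEGIN
  UPDATE orders SET totalCents = totalCents - OLD.amountCents WHERE id = OLD.orderId;
  UPDATE orders SET totalCents = totalCents + NEW.amountCents WHERE id = NEW.orderId;
END;
""")

def stored_vs_computed(order_id):
    """Return (denormalized total, total recomputed from the items)."""
    stored = conn.execute("SELECT totalCents FROM orders WHERE id = ?", (order_id,)).fetchone()[0]
    computed = conn.execute(
        "SELECT COALESCE(SUM(amountCents), 0) FROM order_item WHERE orderId = ?", (order_id,)
    ).fetchone()[0]
    return stored, computed

conn.execute("INSERT INTO orders (id) VALUES (1)")
conn.executemany("INSERT INTO order_item (orderId, amountCents) VALUES (?, ?)", [(1, 1999), (1, 500)])
conn.execute("UPDATE order_item SET amountCents = 750 WHERE amountCents = 500")
print(stored_vs_computed(1))  # (2749, 2749)
```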

### Pattern 2: Frequent Joins

Embed address fields in the User table to avoid a join.

**Trade-off:** No join, but every address change must update both copies.

### Pattern 3: Historical Snapshots

```sql
CREATE TABLE OrderSnapshot (
  orderId BIGINT,
  snapshotDate DATE,
  userName VARCHAR(255), -- denormalized from User
  userEmail VARCHAR(255),
  PRIMARY KEY (orderId, snapshotDate)
);
```

**Use when:** You need point-in-time data (e.g., the user's name at the time of the order).

---

## 7. Temporal & Historical Modeling

### Pattern 1: Effective Dating

```sql
CREATE TABLE Price (
  productId BIGINT,
  price DECIMAL(10,2),
  effectiveFrom DATE NOT NULL,
  effectiveTo DATE NULL, -- NULL = current
  PRIMARY KEY (productId, effectiveFrom)
);
```

**Query current:** `WHERE effectiveFrom <= CURRENT_DATE AND (effectiveTo IS NULL OR effectiveTo > CURRENT_DATE)`
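
If you pass the date as a parameter instead of using `CURRENT_DATE`, the same query also answers historical "as of" questions. The sketch below uses `sqlite3`. Dates are stored as ISO-8601 text and prices as integer cents, both assumptions made because SQLite has no native DATE or DECIMAL type. The price history is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE price (
  productId INTEGER NOT NULL,
  priceCents INTEGER NOT NULL,
  effectiveFrom TEXT NOT NULL,  -- ISO-8601 dates compare correctly as text
  effectiveTo TEXT NULL,        -- NULL = still current
  PRIMARY KEY (productId, effectiveFrom)
);
-- Hypothetical history: 9.99 until March 2024, then 12.49 onward.
INSERT INTO price VALUES (1, 999, '2024-01-01', '2024-03-01'), (1, 1249, '2024-03-01', NULL);
""")

def price_as_of(product_id, day):
    """Price in effect on `day`, using the half-open interval effectiveFrom <= day < effectiveTo."""
    row = conn.execute("""
        SELECT priceCents FROM price
        WHERE productId = ? AND effectiveFrom <= ?
          AND (effectiveTo IS NULL OR effectiveTo > ?)
    """, (product_id, day, day)).fetchone()
    return row[0] if row else None

print(price_as_of(1, "2024-02-15"))  # 999
print(price_as_of(1, "2024-03-01"))  # 1249 (the boundary day belongs to the new price)
print(price_as_of(1, "2023-12-31"))  # None (no price yet)
```

The exclusive `effectiveTo` lets one row's end date equal the next row's start date without the two ranges overlapping.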

### Pattern 2: History Table

```sql
CREATE TABLE UserHistory (
  id BIGINT AUTO_INCREMENT PRIMARY KEY, -- MySQL syntax; PostgreSQL uses GENERATED ALWAYS AS IDENTITY
  userId BIGINT,
  email VARCHAR(255),
  name VARCHAR(255),
  validFrom TIMESTAMP DEFAULT NOW(),
  validTo TIMESTAMP NULL,
  changeType VARCHAR(20) -- 'INSERT', 'UPDATE', 'DELETE'
);
```

A trigger on the User table writes a row to UserHistory whenever a user is inserted, updated, or deleted.
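
A minimal sketch of those triggers in SQLite, run through `sqlite3`. PostgreSQL and MySQL trigger syntax differs. Each change closes the user's open history row by setting `validTo`, then records the new state. The user table is called `users` because USER is reserved in PostgreSQL, and the sample data is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL, name TEXT NOT NULL);
CREATE TABLE user_history (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  userId INTEGER NOT NULL,
  email TEXT,
  name TEXT,
  validFrom TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
  validTo TEXT NULL,
  changeType TEXT NOT NULL
);
CREATE TRIGGER users_ai AFTER INSERT ON users BEGIN
  INSERT INTO user_history (userId, email, name, changeType)
  VALUES (NEW.id, NEW.email, NEW.name, 'INSERT');
END;
CREATE TRIGGER users_au AFTER UPDATE ON users BEGIN
  -- Close the currently open version, then open a new one.
  UPDATE user_history SET validTo = CURRENT_TIMESTAMP WHERE userId = OLD.id AND validTo IS NULL;
  INSERT INTO user_history (userId, email, name, changeType)
  VALUES (NEW.id, NEW.email, NEW.name, 'UPDATE');
END;
CREATE TRIGGER users_ad AFTER DELETE ON users BEGIN
  UPDATE user_history SET validTo = CURRENT_TIMESTAMP WHERE userId = OLD.id AND validTo IS NULL;
  INSERT INTO user_history (userId, email, name, validTo, changeType)
  VALUES (OLD.id, OLD.email, OLD.name, CURRENT_TIMESTAMP, 'DELETE');
END;
""")

conn.execute("INSERT INTO users VALUES (1, 'alice@example.com', 'Alice')")
conn.execute("UPDATE users SET email = 'alice@example.org' WHERE id = 1")
conn.execute("DELETE FROM users WHERE id = 1")

for row in conn.execute("SELECT changeType, email, validTo IS NULL FROM user_history ORDER BY id"):
    print(row)
# ('INSERT', 'alice@example.com', 0)
# ('UPDATE', 'alice@example.org', 0)
# ('DELETE', 'alice@example.org', 0)
```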

### Pattern 3: Event Sourcing

```sql
CREATE TABLE OrderEvent (
  id BIGINT AUTO_INCREMENT PRIMARY KEY,
  orderId BIGINT,
  eventType VARCHAR(50), -- 'CREATED', 'ITEM_ADDED', 'SHIPPED'
  eventData JSON,
  occurredAt TIMESTAMP DEFAULT NOW()
);
```

Reconstruct an order's current state by replaying its events in order.

**Trade-offs:**
**Pros:** Complete audit, time travel
**Cons:** Query complexity, storage
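
Replaying events is a left fold over the event stream. The pure-Python sketch below uses hypothetical event payloads. It shows that the current state and an earlier "time travel" state both come out of the same fold. It orders events by the auto-increment `id` rather than `occurredAt`, a design choice that avoids ties when two events share a timestamp.

```python
import json

# Hypothetical OrderEvent rows: (id, orderId, eventType, eventData)
events = [
    (1, 42, "CREATED", json.dumps({"customerId": 7})),
    (2, 42, "ITEM_ADDED", json.dumps({"sku": "W-1", "qty": 2})),
    (3, 42, "ITEM_ADDED", json.dumps({"sku": "G-9", "qty": 1})),
    (4, 42, "SHIPPED", json.dumps({"carrier": "UPS"})),
]

def apply(state, event_type, data):
    """Return the next state; each event type is a pure transition."""
    if event_type == "CREATED":
        return {"customerId": data["customerId"], "items": {}, "status": "created"}
    if event_type == "ITEM_ADDED":
        items = dict(state["items"])
        items[data["sku"]] = items.get(data["sku"], 0) + data["qty"]
        return {**state, "items": items}
    if event_type == "SHIPPED":
        return {**state, "status": "shipped"}
    raise ValueError(f"unknown event type: {event_type}")

def replay(events, order_id, up_to_id=None):
    """Fold one order's events in id order; up_to_id rebuilds the state at an earlier point."""
    state = None
    for event_id, oid, event_type, data in sorted(events):
        if oid != order_id or (up_to_id is not None and event_id > up_to_id):
            continue
        state = apply(state, event_type, json.loads(data))
    return state

print(replay(events, 42))
# {'customerId': 7, 'items': {'W-1': 2, 'G-9': 1}, 'status': 'shipped'}
print(replay(events, 42, up_to_id=2)["status"])  # created
```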

---

## 8. Schema Evolution

### Strategy 1: Backward-Compatible

Safe changes (existing application code keeps working):
- Add a nullable column
- Add a table (not yet referenced)
- Add an index
- Widen a column (VARCHAR(100) → VARCHAR(255))

```sql
ALTER TABLE User ADD COLUMN phoneNumber VARCHAR(20) NULL;
```

### Strategy 2: Expand-Contract

Use this for breaking changes, such as changing a column's type or constraints:

1. **Expand:** Add the new column alongside the old one
   ```sql
   ALTER TABLE User ADD COLUMN newEmail VARCHAR(255) NULL;
   ```

2. **Migrate:** Deploy application code that writes both columns, backfill existing rows, then switch reads to the new column
   ```sql
   UPDATE User SET newEmail = email WHERE newEmail IS NULL;
   ```

3. **Contract:** Once nothing reads or writes the old column, remove it
   ```sql
   ALTER TABLE User DROP COLUMN email;
   ALTER TABLE User RENAME COLUMN newEmail TO email;
   ```

### Strategy 3: Versioned Schemas (NoSQL)

```json
{"_schemaVersion": "2.0", "email": "alice@example.com"}
```

The application must handle every version still stored, typically by upgrading older documents when it reads them.
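
A minimal upgrade-on-read sketch in Python. The version 1.0 shape, where the address was stored as `emailAddress`, is a hypothetical example. The assumption that unversioned documents are 1.0 is labeled in the code. Chaining one upgrade function per version lets very old documents step forward one version at a time.

```python
CURRENT_VERSION = "2.0"

def upgrade_1_to_2(doc):
    """Hypothetical change: version 2.0 renamed 'emailAddress' to 'email'."""
    doc = dict(doc)  # work on a copy; never mutate the caller's document
    doc["email"] = doc.pop("emailAddress")
    doc["_schemaVersion"] = "2.0"
    return doc

UPGRADES = {"1.0": upgrade_1_to_2}

def load(doc):
    """Upgrade a stored document step by step until it matches CURRENT_VERSION."""
    # Assumption: documents written before versioning was introduced are 1.0.
    version = doc.get("_schemaVersion", "1.0")
    while version != CURRENT_VERSION:
        doc = UPGRADES[version](doc)
        version = doc["_schemaVersion"]
    return doc

print(load({"emailAddress": "alice@example.com"}))
# {'email': 'alice@example.com', '_schemaVersion': '2.0'}
```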

### Strategy 4: Blue-Green

Run the old and new schemas side by side: dual-write to both, migrate existing data, switch reads to the new schema, then remove the old one.

**Best for:** Major redesigns that require zero downtime

---

## 9. Multi-Tenancy

### Pattern 1: Separate Databases

```
tenant1_db, tenant2_db, tenant3_db
```

**Pros:** Strong isolation
**Cons:** High operational overhead (one database per tenant)

### Pattern 2: Separate Schemas

```sql
CREATE SCHEMA tenant1;
CREATE TABLE tenant1.User (...);
```

**Pros:** Good isolation with less overhead than separate databases
**Cons:** Still some per-tenant overhead (e.g., every migration runs once per schema)

### Pattern 3: Shared Schema + Tenant ID

```sql
CREATE TABLE User (
  id BIGINT PRIMARY KEY,
  tenantId BIGINT NOT NULL,
  email VARCHAR(255),
  UNIQUE (tenantId, email)
);
```

**Pros:** Most resource-efficient
**Cons:** Must filter ALL queries by tenantId

**Recommendation:** Pattern 3 for SaaS, Pattern 1 for regulated industries
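
Routing all data access through a tenant-scoped object is one common way to make the "filter every query" rule hard to forget. PostgreSQL row-level security is a database-side alternative. The sketch below uses `sqlite3`. `TenantScope` is a hypothetical helper, not a library API. The sketch also shows that `UNIQUE (tenantId, email)` makes email unique per tenant rather than globally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE users (
  id INTEGER PRIMARY KEY,
  tenantId INTEGER NOT NULL,
  email TEXT,
  UNIQUE (tenantId, email)
)""")

class TenantScope:
    """Hypothetical data-access helper: every statement it runs is pinned to one tenantId."""

    def __init__(self, conn, tenant_id):
        self.conn, self.tenant_id = conn, tenant_id

    def add_user(self, email):
        self.conn.execute("INSERT INTO users (tenantId, email) VALUES (?, ?)", (self.tenant_id, email))

    def emails(self):
        rows = self.conn.execute(
            "SELECT email FROM users WHERE tenantId = ? ORDER BY email", (self.tenant_id,))
        return [email for (email,) in rows]

acme, globex = TenantScope(conn, 1), TenantScope(conn, 2)
acme.add_user("alice@example.com")
globex.add_user("alice@example.com")   # allowed: uniqueness is per tenant
try:
    acme.add_user("alice@example.com")  # rejected: duplicate within tenant 1
except sqlite3.IntegrityError as err:
    print("rejected:", err)
print(acme.emails(), globex.emails())  # ['alice@example.com'] ['alice@example.com']
```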

---

## 10. Performance

### Indexes

**Covering index** (includes all query columns):
```sql
-- INCLUDE is PostgreSQL 11+ / SQL Server syntax
CREATE INDEX idx_user_status ON User(status) INCLUDE (name, email);
```

**Composite index** (column order matters):
```sql
-- Good for: WHERE tenantId = X AND createdAt > Y, and for WHERE tenantId = X alone
-- Much less useful for WHERE createdAt > Y alone, because tenantId is the leading column
CREATE INDEX idx_tenant_date ON Order(tenantId, createdAt);
```

**Partial index** (smaller: covers only matching rows):
```sql
CREATE INDEX idx_active ON User(email) WHERE deletedAt IS NULL;
```
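
You can confirm that the planner actually uses an index by inspecting the query plan. The sketch below does this with SQLite's `EXPLAIN QUERY PLAN` through `sqlite3`, using lowercase versions of the tables above. The exact wording of the plan output varies between SQLite versions, so the sketch prints it rather than claiming specific text. For the partial index to be eligible, the query must repeat the index's `WHERE deletedAt IS NULL` condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, tenantId INTEGER NOT NULL,
                     createdAt TEXT NOT NULL, totalCents INTEGER NOT NULL);
CREATE INDEX idx_tenant_date ON orders(tenantId, createdAt);
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL, deletedAt TEXT NULL);
CREATE INDEX idx_active ON users(email) WHERE deletedAt IS NULL;
""")

def plan(sql, params=()):
    """Join the 'detail' column (the last column) of EXPLAIN QUERY PLAN output."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)

# Equality on the leading column plus a range on the second: the composite index applies.
composite_sql = "SELECT * FROM orders WHERE tenantId = ? AND createdAt > ?"
print(plan(composite_sql, (1, "2024-01-01")))

# The query implies the index's WHERE clause, so the partial index is eligible.
partial_sql = "SELECT * FROM users WHERE email = ? AND deletedAt IS NULL"
print(plan(partial_sql, ("alice@example.com",)))
```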

### Partitioning

**Horizontal (range partitioning):** Rows are split by key range into child tables of one database (PostgreSQL declarative syntax shown). Sharding applies the same idea across servers.
```sql
CREATE TABLE Order (...) PARTITION BY RANGE (createdAt);
CREATE TABLE Order_2024_Q1 PARTITION OF Order
  FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');
```

**Vertical:** Split hot and cold data into separate tables, e.g., move rarely read columns to a table that shares the same primary key.

---

## 11. Common Advanced Patterns

### Soft Deletes

```sql
ALTER TABLE User ADD COLUMN deletedAt TIMESTAMP NULL;
-- Query: WHERE deletedAt IS NULL
```

### Audit Columns

```sql
-- Column definitions to include in each CREATE TABLE
createdAt TIMESTAMP NOT NULL DEFAULT NOW(),
updatedAt TIMESTAMP NOT NULL DEFAULT NOW() ON UPDATE NOW(), -- ON UPDATE is MySQL-only; PostgreSQL needs a trigger
createdBy BIGINT REFERENCES User(id),
updatedBy BIGINT REFERENCES User(id)
```

### State Machines

```sql
CREATE TABLE OrderState (
  orderId BIGINT REFERENCES Order(id),
  state VARCHAR(20),
  transitionedAt TIMESTAMP DEFAULT NOW(),
  PRIMARY KEY (orderId, transitionedAt)
);
-- Track: draft → pending → confirmed → shipped → delivered
```
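
`OrderState` records transitions but does not stop invalid ones, such as jumping from confirmed straight to delivered. The minimal application-side check below, written in Python, keeps the allowed moves in a lookup table. In the database, a transitions table plus a trigger could play the same role. The in-memory `history` list stands in for one order's `OrderState` rows, oldest first.

```python
# Allowed moves for the lifecycle above: draft → pending → confirmed → shipped → delivered
TRANSITIONS = {
    "draft": {"pending"},
    "pending": {"confirmed"},
    "confirmed": {"shipped"},
    "shipped": {"delivered"},
    "delivered": set(),  # terminal state
}

class InvalidTransition(Exception):
    pass

def transition(history, new_state):
    """Return history plus new_state if the move from the latest state is allowed."""
    if not history:
        if new_state != "draft":
            raise InvalidTransition(f"orders must start in 'draft', not {new_state!r}")
    elif new_state not in TRANSITIONS[history[-1]]:
        raise InvalidTransition(f"{history[-1]!r} -> {new_state!r} is not allowed")
    return history + [new_state]

states = []
for s in ["draft", "pending", "confirmed"]:
    states = transition(states, s)
print(states)  # ['draft', 'pending', 'confirmed']

try:
    transition(states, "delivered")  # skips 'shipped'
except InvalidTransition as err:
    print(err)  # 'confirmed' -> 'delivered' is not allowed
```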

### Idempotency Keys

```sql
CREATE TABLE Request (
  idempotencyKey UUID PRIMARY KEY,
  payload JSON,
  result JSON,
  processedAt TIMESTAMP
);
-- A retried request with the same key gets the stored result instead of being processed again
```
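
A minimal single-process sketch of idempotent handling with `sqlite3`. The first request claims the key with `INSERT OR IGNORE`, and replays read back the stored result. The `charges` list stands in for a side effect that must happen at most once, and the payload shape is hypothetical. A real multi-process service must also handle a key that has been claimed but not yet finished (result still NULL); this sketch ignores that case.

```python
import json
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE request (
  idempotencyKey TEXT PRIMARY KEY,  -- SQLite has no UUID type; store the text form
  payload TEXT NOT NULL,
  result TEXT,
  processedAt TEXT
)""")

charges = []  # stands in for a side effect that must happen at most once

def handle(key, payload):
    """Process payload once per key; replays with the same key return the stored result."""
    with conn:  # one transaction: claim the key and record the result together
        claimed = conn.execute(
            "INSERT OR IGNORE INTO request (idempotencyKey, payload) VALUES (?, ?)",
            (key, json.dumps(payload)),
        ).rowcount == 1
        if not claimed:
            stored = conn.execute(
                "SELECT result FROM request WHERE idempotencyKey = ?", (key,)).fetchone()[0]
            return json.loads(stored)
        charges.append(payload["amountCents"])
        result = {"status": "charged", "chargeNo": len(charges)}
        conn.execute(
            "UPDATE request SET result = ?, processedAt = CURRENT_TIMESTAMP WHERE idempotencyKey = ?",
            (json.dumps(result), key),
        )
        return result

key = str(uuid.uuid4())
first = handle(key, {"amountCents": 1999})
retry = handle(key, {"amountCents": 1999})
print(first == retry, len(charges))  # True 1
```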
330
skills/data-schema-knowledge-modeling/resources/template.md
Normal file
@@ -0,0 +1,330 @@
|
||||
# Data Schema & Knowledge Modeling Template
|
||||
|
||||
## Workflow
|
||||
|
||||
Copy this checklist and track your progress:
|
||||
|
||||
```
|
||||
Data Schema & Knowledge Modeling Progress:
|
||||
- [ ] Step 1: Gather domain requirements and scope
|
||||
- [ ] Step 2: Identify entities and attributes systematically
|
||||
- [ ] Step 3: Define relationships and cardinality
|
||||
- [ ] Step 4: Specify constraints and invariants
|
||||
- [ ] Step 5: Validate against use cases and document
|
||||
```
|
||||
|
||||
**Step 1: Gather domain requirements and scope**
|
||||
|
||||
Ask user for domain description, core use cases, existing data sources, scale requirements, and technology stack. Use [Input Questions](#input-questions).
|
||||
|
||||
**Step 2: Identify entities and attributes systematically**
|
||||
|
||||
Extract entities from requirements using [Entity Identification](#entity-identification). Define attributes with types and nullability using [Attribute Guide](#attribute-guide).
|
||||
|
||||
**Step 3: Define relationships and cardinality**
|
||||
|
||||
Map entity connections using [Relationship Mapping](#relationship-mapping). Specify cardinality (1:1, 1:N, M:N) and optionality.
|
||||
|
||||
**Step 4: Specify constraints and invariants**
|
||||
|
||||
Define business rules and constraints using [Constraint Specification](#constraint-specification). Document domain invariants.
|
||||
|
||||
**Step 5: Validate against use cases and document**
|
||||
|
||||
Create `data-schema-knowledge-modeling.md` using [Template](#schema-documentation-template). Verify using [Validation Checklist](#validation-checklist).
|
||||
|
||||
---
|
||||
|
||||
## Input Questions
|
||||
|
||||
**Domain & Scope:**
|
||||
- What domain? (e-commerce, healthcare, social network)
|
||||
- Boundaries? In/out of scope?
|
||||
- Existing schemas to integrate/migrate from?
|
||||
|
||||
**Core Use Cases:**
|
||||
- Primary operations? (CRUD for which entities?)
|
||||
- Required queries/reports?
|
||||
- Access patterns? (read-heavy, write-heavy, mixed)
|
||||
|
||||
**Scale & Performance:**
|
||||
- Data volume? (rows per table, storage)
|
||||
- Growth rate? (daily/monthly)
|
||||
- Performance SLAs?
|
||||
|
||||
**Technology:**
|
||||
- Database? (PostgreSQL, MongoDB, Neo4j, etc.)
|
||||
- Compliance? (GDPR, HIPAA, SOC2)
|
||||
- Evolution needs? (schema versioning, migrations)

---

## Entity Identification

**Step 1: Extract nouns**

List the nouns in the requirements; each one is a candidate entity.

**Step 2: Validate**

For each candidate, check:
- [ ] Distinct identity? (can point to "this specific X")
- [ ] Independent lifecycle?
- [ ] Multiple attributes beyond name?
- [ ] Track multiple instances?

**Keep** if the answer to most is yes. **Reject** if it is just an attribute of another entity.
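
The Step 2 checklist can be applied mechanically once each candidate noun has been answered. A minimal Python sketch, assuming a candidate is kept when at least three of the four questions are answered yes (one reading of "most"); the `Candidate` class, `classify` function, and sample nouns are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A noun from the requirements plus its answers to the Step 2 checklist."""
    name: str
    distinct_identity: bool      # can point to "this specific X"?
    independent_lifecycle: bool  # created/deleted on its own?
    multiple_attributes: bool    # more than just a name?
    multiple_instances: bool     # need to track many of them?

    def yes_count(self) -> int:
        # bool is a subclass of int, so sum() counts the True answers.
        return sum([self.distinct_identity, self.independent_lifecycle,
                    self.multiple_attributes, self.multiple_instances])


def classify(candidates: list[Candidate], threshold: int = 3) -> dict[str, list[str]]:
    """Keep candidates with at least `threshold` yes answers (assumed meaning of "most")."""
    result: dict[str, list[str]] = {"keep": [], "reject": []}
    for c in candidates:
        result["keep" if c.yes_count() >= threshold else "reject"].append(c.name)
    return result


candidates = [
    Candidate("Order", True, True, True, True),
    Candidate("Customer", True, True, True, True),
    Candidate("email", False, False, False, True),  # really an attribute of Customer
]
print(classify(candidates))  # {'keep': ['Order', 'Customer'], 'reject': ['email']}
```

The threshold is a judgment call; borderline candidates (score 2) usually deserve a manual look rather than automatic rejection.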

**Step 3: Entity vs Value Object**

- **Entity**: Has an ID; identity persists while attributes change (User, Order)
- **Value Object**: No ID; defined entirely by its values and immutable (Address, Money)
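
To make the distinction concrete, here is a hedged Python sketch using standard-library dataclasses: `Money` is a frozen value object compared by its values, while `Order` is an entity compared only by `id`. The class shapes and the ISO 4217 currency field are illustrative assumptions, not a required design.

```python
from dataclasses import dataclass, replace
from decimal import Decimal


@dataclass(frozen=True)
class Money:
    """Value object: no ID, immutable, equal when all fields are equal."""
    amount: Decimal
    currency: str  # assumed ISO 4217 code, e.g. "USD"


@dataclass(eq=False)
class Order:
    """Entity: identified by id; its attributes may change over its lifecycle."""
    id: int
    status: str
    total: Money

    def __eq__(self, other: object) -> bool:
        # Two Order objects are the same entity iff their ids match.
        return isinstance(other, Order) and self.id == other.id

    def __hash__(self) -> int:
        return hash(self.id)


a = Money(Decimal("10.00"), "USD")
b = Money(Decimal("10.00"), "USD")
assert a == b  # same values, so the same value object

order = Order(1, "pending", a)
# "Changing" a value object means building a new one with replace().
same_order_later = Order(1, "shipped", replace(a, amount=Decimal("12.50")))
assert order == same_order_later  # same identity despite changed attributes
```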

**Step 4: Document**

```markdown
### Entity: [Name]
**Purpose:** [What it represents]
**Examples:** [2-3 concrete cases]
**Lifecycle:** [Creation → deletion]
**Invariants:** [Rules that must hold]
```

---

## Attribute Guide

**Template:**
```
attributeName: DataType [NULL|NOT NULL] [DEFAULT value]
- Description: [What it represents]
- Validation: [Constraints]
- Examples: [Sample values]
```

**Standard attributes:**
- `id`: Primary key (UUID/BIGINT)
- `createdAt`: TIMESTAMP NOT NULL (set on insert)
- `updatedAt`: TIMESTAMP NOT NULL (set on every update)
- `deletedAt`: TIMESTAMP NULL (soft deletes; NULL means the row is live)
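
A soft delete sets `deletedAt` instead of removing the row, so every normal read must filter on `deletedAt IS NULL`. The sketch below uses Python's built-in `sqlite3` as a stand-in database (SQLite stores `CURRENT_TIMESTAMP` as text, hence the `TEXT` columns); the `product` table and helper functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE product (
        id        INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        createdAt TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        updatedAt TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        deletedAt TEXT NULL  -- NULL = live row; a timestamp = soft-deleted
    )
""")
conn.executemany("INSERT INTO product (name) VALUES (?)", [("Lamp",), ("Desk",)])


def soft_delete(product_id: int) -> None:
    # Mark the row deleted instead of removing it; history stays queryable.
    conn.execute(
        "UPDATE product SET deletedAt = CURRENT_TIMESTAMP, updatedAt = CURRENT_TIMESTAMP "
        "WHERE id = ? AND deletedAt IS NULL",
        (product_id,),
    )


def live_products() -> list[str]:
    # Every normal read must exclude soft-deleted rows.
    rows = conn.execute("SELECT name FROM product WHERE deletedAt IS NULL ORDER BY id")
    return [name for (name,) in rows]


soft_delete(1)
print(live_products())  # ['Desk']
```

The trade-off is that unique constraints and foreign keys still see soft-deleted rows, which is worth deciding explicitly per entity.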

**Data types:**

| Data | SQL | NoSQL | Notes |
|------|-----|-------|-------|
| Short text | VARCHAR(N) | String | Specify max length |
| Long text | TEXT | String | No declared max (engine limits apply) |
| Integer | INT/BIGINT | Number | Choose size for expected range |
| Decimal | DECIMAL(P,S) | Number | Fixed precision |
| Money | DECIMAL(19,4) + currency CHAR(3) | {amount,currency} | Never FLOAT |
| Boolean | BOOLEAN | Boolean | NOT NULL, with a DEFAULT |
| Date/Time | TIMESTAMP WITH TIME ZONE | ISODate | Store UTC; PostgreSQL: TIMESTAMPTZ |
| UUID | UUID/CHAR(36) | String | Distributed IDs |
| JSON | JSON/JSONB | Object | Flexible |
| Enum | ENUM/VARCHAR + CHECK | String | Fixed set of values |
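
The "Never FLOAT" rule for money follows from binary floating point being unable to represent most decimal fractions exactly. A small Python illustration using the standard `decimal` module, assuming a DECIMAL(19,4) column (4 decimal places) and banker's rounding as the chosen rounding mode:

```python
from decimal import Decimal, ROUND_HALF_EVEN

# Binary floating point cannot represent 0.10 exactly, so sums drift.
float_total = sum([0.10] * 3)
print(float_total)          # 0.30000000000000004
print(float_total == 0.30)  # False

# Decimal keeps exact base-10 values, matching a DECIMAL column.
decimal_total = sum([Decimal("0.10")] * 3, Decimal("0"))
print(decimal_total)                     # 0.30
print(decimal_total == Decimal("0.30"))  # True

# Quantize to the column's scale (4 places for DECIMAL(19,4)) before storing.
stored = (Decimal("19.99") / 3).quantize(Decimal("0.0001"), rounding=ROUND_HALF_EVEN)
print(stored)  # 6.6633
```

The rounding mode is a business decision; `ROUND_HALF_UP` is also common.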

**Nullability:**
- NOT NULL if required
- NULL if optional/unknown at creation
- Avoid NULL for booleans (use NOT NULL with a DEFAULT)

---

## Relationship Mapping

**Cardinality:**

**1:1** - User has one Profile
- SQL: `Profile.userId UNIQUE NOT NULL REFERENCES User(id)`

**1:N** - User has many Orders
- SQL: `Order.userId NOT NULL REFERENCES User(id)`

**M:N** - Order contains Products
- Junction table:
```sql
CREATE TABLE OrderItem (
    orderId   BIGINT        NOT NULL REFERENCES "Order"(id),  -- ORDER is reserved: quote it (backticks in MySQL)
    productId BIGINT        NOT NULL REFERENCES Product(id),
    quantity  INT           NOT NULL CHECK (quantity > 0),
    price     DECIMAL(10,2) NOT NULL CHECK (price >= 0),      -- unit price at time of order
    PRIMARY KEY (orderId, productId)
);
```

**Optionality:**
- Required: NOT NULL foreign key
- Optional: nullable foreign key

**Naming:**
Use verbs: User **owns** Order, Product **belongs to** Category
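
The three cardinality patterns can be checked end to end against a real engine. This sketch uses Python's `sqlite3` as a stand-in for the production database (SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`); table and column names follow the examples above, while the sample rows and the `violates` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite ignores FK clauses unless this is on
conn.executescript("""
CREATE TABLE "User"  (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
-- 1:1: UNIQUE on the FK allows at most one Profile per User
CREATE TABLE Profile (id INTEGER PRIMARY KEY,
                      userId INTEGER NOT NULL UNIQUE REFERENCES "User"(id));
-- 1:N: plain NOT NULL FK; many Orders may share one userId
CREATE TABLE "Order" (id INTEGER PRIMARY KEY,
                      userId INTEGER NOT NULL REFERENCES "User"(id));
CREATE TABLE Product (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- M:N: junction table; the composite PK blocks duplicate (order, product) pairs
CREATE TABLE OrderItem (orderId   INTEGER NOT NULL REFERENCES "Order"(id),
                        productId INTEGER NOT NULL REFERENCES Product(id),
                        quantity  INTEGER NOT NULL CHECK (quantity > 0),
                        price     NUMERIC NOT NULL CHECK (price >= 0),
                        PRIMARY KEY (orderId, productId));
""")


def violates(sql: str) -> bool:
    """Return True if the statement is rejected by a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


conn.execute("""INSERT INTO "User" (id, email) VALUES (1, 'ann@example.com')""")
conn.execute("INSERT INTO Profile (userId) VALUES (1)")
conn.execute("""INSERT INTO "Order" (id, userId) VALUES (10, 1), (11, 1)""")  # 1:N
conn.execute("INSERT INTO Product (id, name) VALUES (100, 'Lamp')")
conn.execute("INSERT INTO OrderItem VALUES (10, 100, 2, 19.99)")

print(violates("INSERT INTO Profile (userId) VALUES (1)"))               # True: second profile
print(violates("INSERT INTO OrderItem VALUES (10, 100, 1, 19.99)"))      # True: duplicate pair
print(violates("INSERT INTO OrderItem VALUES (11, 100, 1, 19.99)"))      # False: new pair
print(violates("""INSERT INTO "Order" (id, userId) VALUES (12, 99)"""))  # True: no such user
```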

---

## Constraint Specification

**Primary Keys:**
```sql
id BIGINT PRIMARY KEY AUTO_INCREMENT                -- MySQL
-- or --
id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY  -- PostgreSQL 10+ (SQL standard)
-- or --
id UUID PRIMARY KEY DEFAULT gen_random_uuid()       -- PostgreSQL 13+ (older: pgcrypto)
```

**Unique:**
```sql
email VARCHAR(255) UNIQUE NOT NULL
UNIQUE (userId, productId)  -- composite
```

**Foreign Keys:**
```sql
userId BIGINT NOT NULL REFERENCES User(id) ON DELETE CASCADE
-- Options: CASCADE, SET NULL (column must be nullable), RESTRICT
```

**Check Constraints:**
```sql
price DECIMAL(10,2) CHECK (price >= 0)
status VARCHAR(20) CHECK (status IN ('draft','pending','completed'))
```
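
A quick way to confirm that constraints actually enforce the rules is to try inserting bad rows. The following Python sketch, again using `sqlite3` as a stand-in, exercises a CHECK on `total` and `status`, a UNIQUE email, and `ON DELETE CASCADE`; the schema is a simplified hypothetical, not the full Order model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for ON DELETE actions in SQLite
conn.executescript("""
CREATE TABLE "User"  (id    INTEGER PRIMARY KEY,
                      email TEXT NOT NULL UNIQUE);
CREATE TABLE "Order" (id     INTEGER PRIMARY KEY,
                      userId INTEGER NOT NULL REFERENCES "User"(id) ON DELETE CASCADE,
                      status TEXT NOT NULL DEFAULT 'draft'
                             CHECK (status IN ('draft','pending','completed')),
                      total  NUMERIC NOT NULL CHECK (total >= 0));
""")


def rejected(sql: str) -> bool:
    """True if SQLite refuses the statement because a constraint fails."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


conn.execute("""INSERT INTO "User" (id, email) VALUES (1, 'ann@example.com')""")
conn.execute("""INSERT INTO "Order" (id, userId, total) VALUES (1, 1, 25.00)""")

print(rejected("""INSERT INTO "Order" (userId, total) VALUES (1, -5)"""))                # True (CHECK)
print(rejected("""INSERT INTO "Order" (userId, status, total) VALUES (1, 'lost', 5)"""))  # True (CHECK)
print(rejected("""INSERT INTO "User" (email) VALUES ('ann@example.com')"""))             # True (UNIQUE)

# ON DELETE CASCADE: removing the user also removes the user's orders.
conn.execute('DELETE FROM "User" WHERE id = 1')
print(conn.execute('SELECT COUNT(*) FROM "Order"').fetchone()[0])  # 0
```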

**Domain Invariants:**

Document business rules:
```markdown
### Invariant: Order total = sum of items
Order.total = SUM(OrderItem.quantity * OrderItem.price)

### Invariant: Unique email
No duplicate emails (case-insensitive)
```

Enforce via: DB constraints (preferred), application logic, or triggers.
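
Both sample invariants can be enforced, one by the database and one by application logic. In this Python/`sqlite3` sketch, the case-insensitive email rule uses a unique index on `lower(email)` (the same index form works in PostgreSQL), and the order-total rule is a hypothetical `order_total_holds` check. Amounts are stored as TEXT and summed with `Decimal` because SQLite has no exact DECIMAL type; note that SQLite's built-in `lower()` folds ASCII letters only.

```python
import sqlite3
from decimal import Decimal

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "User" (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
-- Invariant "unique email (case-insensitive)" enforced by an expression index
CREATE UNIQUE INDEX user_email_ci ON "User" (lower(email));

CREATE TABLE "Order"   (id INTEGER PRIMARY KEY, total TEXT NOT NULL);
CREATE TABLE OrderItem (orderId INTEGER NOT NULL, productId INTEGER NOT NULL,
                        quantity INTEGER NOT NULL, price TEXT NOT NULL,
                        PRIMARY KEY (orderId, productId));
""")


def insert_user(email: str) -> bool:
    """Return False when the case-insensitive uniqueness invariant rejects the row."""
    try:
        conn.execute('INSERT INTO "User" (email) VALUES (?)', (email,))
        return True
    except sqlite3.IntegrityError:
        return False


def order_total_holds(order_id: int) -> bool:
    """Application-level check: Order.total == SUM(quantity * price)."""
    (total,) = conn.execute('SELECT total FROM "Order" WHERE id = ?', (order_id,)).fetchone()
    items = conn.execute(
        "SELECT quantity, price FROM OrderItem WHERE orderId = ?", (order_id,)
    ).fetchall()
    expected = sum((q * Decimal(p) for q, p in items), Decimal("0"))
    return Decimal(total) == expected


print(insert_user("Ann@Example.com"))  # True
print(insert_user("ann@example.com"))  # False: same email ignoring case

conn.execute("""INSERT INTO "Order" VALUES (1, '44.97')""")
conn.executemany("INSERT INTO OrderItem VALUES (1, ?, ?, ?)",
                 [(100, 2, "19.99"), (101, 1, "4.99")])
print(order_total_holds(1))  # True: 2 * 19.99 + 1 * 4.99 = 44.97
```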

---

## Schema Documentation Template

Create: `data-schema-knowledge-modeling.md`

**Required sections:**

1. **Domain Overview** - Purpose, scope, technology
2. **Use Cases** - Primary operations, query patterns
3. **Entity Definitions** - For each entity:
   - Purpose, examples, lifecycle
   - Attributes table (name, type, null, default, constraints, description)
   - Relationships (cardinality, FK, optionality)
   - Invariants
4. **ERD** - Visual/text diagram showing relationships
5. **Constraints** - DB constraints, domain invariants
6. **Normalization** - Level, denormalization decisions
7. **Implementation** - SQL DDL / JSON Schema / Graph schema as appropriate
8. **Validation** - Check each use case is supported
9. **Open Questions** - Unresolved decisions

**Example entity definition:**

```markdown
### Entity: Order

**Purpose:** Represents a customer's purchase transaction
**Examples:** Amazon order #123, Shopify order #456
**Lifecycle:** Created on checkout → Updated during fulfillment → Completed on delivery

#### Attributes

| Attribute | Type | Null? | Default | Constraints | Description |
|---|---|---|---|---|---|
| id | BIGINT | NO | auto | PK | Unique identifier |
| userId | BIGINT | NO | - | FK→User | Customer who placed order |
| status | VARCHAR(20) | NO | 'pending' | CHECK IN(...) | Order status |
| total | DECIMAL(10,2) | NO | - | CHECK >= 0 | Order total |

#### Relationships
- **belongs to:** 1:N with User (Order.userId → User.id)
- **contains:** 1:N with OrderItem junction table

#### Invariants
- total = SUM(OrderItem.quantity * OrderItem.price)
- status transitions: pending → confirmed → shipped → delivered
```
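
The status-transition invariant is easiest to enforce as an explicit table of allowed moves. A minimal Python sketch, assuming only the forward path listed above (a real workflow with cancellations or returns would add entries); `ALLOWED_TRANSITIONS`, `transition`, and `InvalidTransition` are hypothetical names:

```python
# Allowed moves for Order.status, taken from the invariant
# pending → confirmed → shipped → delivered.
ALLOWED_TRANSITIONS: dict[str, set[str]] = {
    "pending": {"confirmed"},
    "confirmed": {"shipped"},
    "shipped": {"delivered"},
    "delivered": set(),  # terminal state
}


class InvalidTransition(ValueError):
    """Raised when a status change would break the invariant."""


def transition(current: str, new: str) -> str:
    """Return the new status, or raise if the move is not allowed."""
    if new not in ALLOWED_TRANSITIONS.get(current, set()):
        raise InvalidTransition(f"{current} -> {new} is not allowed")
    return new


status = "pending"
for step in ["confirmed", "shipped", "delivered"]:
    status = transition(status, step)
print(status)  # delivered

try:
    transition("pending", "shipped")  # skips confirmation
except InvalidTransition as exc:
    print(exc)  # pending -> shipped is not allowed
```

Keeping the table as data makes it easy to render into the ERD or documentation and to mirror in a database trigger if the rule must hold for every writer.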

---

## Validation Checklist

**Completeness:**
- [ ] All entities identified
- [ ] All attributes defined (types, nullability)
- [ ] All relationships mapped (cardinality)
- [ ] All constraints specified
- [ ] All invariants documented

**Correctness:**
- [ ] Each entity has a distinct purpose
- [ ] No redundant entities
- [ ] Attributes in correct entities
- [ ] Cardinality reflects reality
- [ ] Constraints enforce rules

**Use Case Coverage:**
- [ ] Supports all CRUD operations
- [ ] All queries answerable
- [ ] Indexes planned
- [ ] No missing joins
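
Part of the Use Case Coverage check can be automated: compile every required query against an empty copy of the schema and inspect the plan. The sketch below uses `sqlite3` and `EXPLAIN QUERY PLAN` as a stand-in; a query referencing a missing table or column fails to compile, and a plan line starting with `SEARCH` indicates a key or index lookup rather than a full `SCAN`. The schema and query list are hypothetical, and plan text can vary between SQLite versions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE "User"  (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
CREATE TABLE "Order" (id INTEGER PRIMARY KEY,
                      userId INTEGER NOT NULL REFERENCES "User"(id),
                      total NUMERIC NOT NULL);
CREATE INDEX order_user ON "Order" (userId);
"""

# Hypothetical queries collected from the Use Cases section (one ? each).
USE_CASE_QUERIES = {
    "orders for a customer": 'SELECT id, total FROM "Order" WHERE userId = ?',
    "customer by email": 'SELECT id FROM "User" WHERE email = ?',
    "orders by coupon": 'SELECT id FROM "Order" WHERE couponCode = ?',  # never modeled
}


def check_coverage(schema: str, queries: dict[str, str]) -> dict[str, str]:
    conn = sqlite3.connect(":memory:")
    conn.executescript(schema)
    report: dict[str, str] = {}
    for name, sql in queries.items():
        try:
            # EXPLAIN QUERY PLAN compiles the query without running it;
            # missing tables or columns raise OperationalError here.
            plan = conn.execute("EXPLAIN QUERY PLAN " + sql, (None,)).fetchall()
        except sqlite3.OperationalError:
            report[name] = "NOT ANSWERABLE"
            continue
        # The last column of each plan row is a human-readable description.
        uses_index = any(str(row[-1]).startswith("SEARCH") for row in plan)
        report[name] = "ok (indexed)" if uses_index else "ok (full scan: index planned?)"
    conn.close()
    return report


for name, result in check_coverage(SCHEMA, USE_CASE_QUERIES).items():
    print(f"{name}: {result}")
```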

**Normalization:**
- [ ] No partial dependencies (2NF)
- [ ] No transitive dependencies (3NF)
- [ ] Denormalization documented
- [ ] No update anomalies

**Technical Quality:**
- [ ] Consistent naming
- [ ] Appropriate data types
- [ ] Primary keys defined
- [ ] Foreign keys maintain integrity
- [ ] Soft delete strategy (if needed)
- [ ] Audit fields (if needed)

**Future-Proofing:**
- [ ] Schema extensible
- [ ] Migration path (if applicable)
- [ ] Versioning strategy
- [ ] No technical debt

---

## Common Pitfalls

**1. Modeling implementation, not domain**
- Symptom: Entities like "UserSession", "Cache"
- Fix: Model domain concepts; keep infrastructure state such as sessions and caches out of the domain model

**2. God entities**
- Symptom: User with 50+ attributes
- Fix: Extract cohesive attribute groups into separate entities

**3. Missing junction tables**
- Symptom: M:N modeled as a single FK or a list of IDs on one side
- Fix: Always use a junction table

**4. Nullable FKs without reason**
- Symptom: All relationships optional
- Fix: NOT NULL unless truly optional

**5. Not enforcing invariants**
- Symptom: Rules in docs only
- Fix: CHECK constraints, triggers, app validation

**6. Premature denormalization**
- Symptom: Duplicating data without measurement
- Fix: Normalize first, denormalize after profiling

**7. Wrong data types**
- Symptom: Money as VARCHAR
- Fix: DECIMAL for money, appropriate types everywhere

**8. No migration strategy**
- Symptom: Schema can't be changed safely
- Fix: Versioned migrations, backward-compatible changes
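
A minimal versioned-migration runner, sketched in Python with `sqlite3`: migrations form an ordered, append-only list, and SQLite's `PRAGMA user_version` records how many have been applied. Each step and its version bump run in one explicit transaction, which is why the connection is opened with `isolation_level=None`. The migration contents are hypothetical; production projects often use a dedicated migration tool instead.

```python
import sqlite3

# Ordered, append-only list: position + 1 is the schema version.
# Hypothetical steps; each is additive, so older code keeps working.
MIGRATIONS: list[str] = [
    'CREATE TABLE "User" (id INTEGER PRIMARY KEY, email TEXT NOT NULL)',
    'ALTER TABLE "User" ADD COLUMN createdAt TEXT',  # nullable add is backward-compatible
    'CREATE UNIQUE INDEX user_email_ci ON "User" (lower(email))',
]


def migrate(conn: sqlite3.Connection) -> int:
    """Apply every migration newer than the stored version; return the new version.

    Expects a connection opened with isolation_level=None so BEGIN/COMMIT are explicit.
    """
    (current,) = conn.execute("PRAGMA user_version").fetchone()
    for version, ddl in enumerate(MIGRATIONS[current:], start=current + 1):
        conn.execute("BEGIN")
        try:
            conn.execute(ddl)
            # PRAGMA values cannot be bound with ?, so format the int directly.
            conn.execute(f"PRAGMA user_version = {version}")
            conn.execute("COMMIT")
        except sqlite3.Error:
            conn.execute("ROLLBACK")
            raise
    return conn.execute("PRAGMA user_version").fetchone()[0]


conn = sqlite3.connect(":memory:", isolation_level=None)
print(migrate(conn))  # 3
print(migrate(conn))  # 3 (already up to date; nothing re-applied)
```

Running `migrate` again is a no-op, so it can be called safely on every deploy.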