skills/data-schema-knowledge-modeling/SKILL.md
---
name: data-schema-knowledge-modeling
description: Use when designing database schemas, modeling domain entities and relationships, building knowledge graphs or ontologies, creating API data models, defining system boundaries and invariants, migrating between data models, or establishing taxonomies and hierarchies; when the user mentions "schema", "data model", "entities", "relationships", "ontology", or "knowledge graph"; or when scattered or inconsistent data structures need formalization.
---

# Data Schema & Knowledge Modeling

## Table of Contents

- [Purpose](#purpose)
- [When to Use](#when-to-use)
- [What Is It](#what-is-it)
- [Workflow](#workflow)
- [Schema Types](#schema-types)
- [Common Patterns](#common-patterns)
- [Guardrails](#guardrails)
- [Quick Reference](#quick-reference)

## Purpose

Create rigorous, validated models of entities, relationships, and constraints that enable correct system implementation, knowledge representation, and semantic reasoning.

## When to Use

**Invoke this skill when you need to:**

- Design a database schema (SQL, NoSQL, or graph) for a new application
- Model a complex domain with many entities and relationships
- Build a knowledge graph or ontology for semantic search or reasoning
- Define API data models and contracts
- Create taxonomies or classification hierarchies
- Establish data governance and canonical models
- Migrate legacy schemas to modern architectures
- Resolve ambiguity in domain concepts and relationships
- Enable data integration across systems
- Document system invariants and business rules

**Common trigger phrases:**

- "Design a schema for..."
- "Model the entities and relationships"
- "Create a knowledge graph"
- "What's the data model?"
- "Define the ontology"
- "How should we structure this data?"
- "Map relationships between..."
- "Design the API data model"

## What Is It

**Data schema & knowledge modeling** is the process of formally defining:

1. **Entities** - Things that exist (User, Product, Order, Organization)
2. **Attributes** - Properties of entities (name, price, status, createdAt)
3. **Relationships** - Connections between entities (User owns Order, Product belongsTo Category)
4. **Constraints** - Rules and invariants (unique email, price > 0, one primary address)
5. **Cardinality** - How many instances participate on each side of a relationship (one-to-one, one-to-many, many-to-many)

**Quick example:** E-commerce schema:

- **Entities**: User, Product, Order, OrderItem, Cart, Payment
- **Relationships**: User has many Orders; Order contains many Products through OrderItem (a junction entity); User has one Cart
- **Constraints**: Email must be unique; Order total equals the sum of its OrderItems; Payment amount equals Order total
- **Result**: An unambiguous model that prevents data inconsistencies

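The quick example can be turned into concrete DDL. The sketch below runs it through Python's standard `sqlite3` module. The table and column names are illustrative assumptions. Plural table names sidestep reserved words such as `ORDER`, and money is kept in integer cents because SQLite has no exact decimal type (in PostgreSQL or MySQL you would use `DECIMAL`).

```python
import sqlite3

# Hypothetical DDL for the e-commerce quick example.
# Plural table names avoid reserved words such as ORDER; money is integer cents.
SCHEMA = """
CREATE TABLE users (
    id    INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE                       -- Email must be unique
);
CREATE TABLE products (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    price_cents INTEGER NOT NULL CHECK (price_cents > 0)
);
CREATE TABLE orders (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id)    -- User has many Orders
);
-- Junction entity: resolves the Order <-> Product many-to-many relationship
CREATE TABLE order_items (
    order_id         INTEGER NOT NULL REFERENCES orders(id),
    product_id       INTEGER NOT NULL REFERENCES products(id),
    quantity         INTEGER NOT NULL CHECK (quantity > 0),
    unit_price_cents INTEGER NOT NULL CHECK (unit_price_cents > 0),
    PRIMARY KEY (order_id, product_id)
);
"""


def build_demo() -> sqlite3.Connection:
    """Create an in-memory database with the schema and one sample order."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO users (id, email) VALUES (1, 'alice@example.com')")
    conn.executemany(
        "INSERT INTO products (id, name, price_cents) VALUES (?, ?, ?)",
        [(10, "Widget", 250), (11, "Gadget", 1000)],
    )
    conn.execute("INSERT INTO orders (id, user_id) VALUES (100, 1)")
    conn.executemany(
        "INSERT INTO order_items VALUES (?, ?, ?, ?)",
        [(100, 10, 2, 250), (100, 11, 1, 1000)],
    )
    return conn


def order_total_cents(conn: sqlite3.Connection, order_id: int) -> int:
    """Derive the order total from its items instead of storing it."""
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(quantity * unit_price_cents), 0) FROM order_items WHERE order_id = ?",
        (order_id,),
    ).fetchone()
    return total
```

Because the total is derived by a query, it cannot drift from the items. If you store it instead, "Order total matches sum of OrderItems" becomes an invariant you must enforce yourself.
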
## Workflow

Copy this checklist and track your progress:

```
Data Schema & Knowledge Modeling Progress:
- [ ] Step 1: Gather domain requirements and scope
- [ ] Step 2: Identify entities and attributes
- [ ] Step 3: Define relationships and cardinality
- [ ] Step 4: Specify constraints and invariants
- [ ] Step 5: Validate and document the model
```

**Step 1: Gather domain requirements and scope**

Ask the user for a domain description, core use cases (which queries and operations the schema must support), existing data (for migration or integration), performance and scale requirements, and technology constraints (SQL vs NoSQL vs graph database). Use cases shape the model: OLTP, OLAP, and graph-traversal workloads require different designs. See [Schema Types](#schema-types) for guidance.

**Step 2: Identify entities and attributes**

Extract nouns from the requirements; these are candidate entities. For each entity, list its attributes with types and nullability. Use [resources/template.md](resources/template.md) for systematic entity identification. Verify that each entity represents a distinct concept with an independent lifecycle. Document each entity's purpose and give examples.

**Step 3: Define relationships and cardinality**

Map the connections between entities (one-to-one, one-to-many, many-to-many). For many-to-many relationships, identify the junction tables or entities. Specify each relationship's direction and optionality (can X exist without Y?). Use [resources/methodology.md](resources/methodology.md) for complex relationship patterns such as hierarchies, polymorphic associations, and temporal relationships.

**Step 4: Specify constraints and invariants**

Define uniqueness constraints, foreign keys, check constraints, and business rules. Document domain invariants (rules that must ALWAYS hold). Decide which attributes are derived (computed) and which are stored. Use [resources/methodology.md](resources/methodology.md) for advanced constraint patterns and validation strategies.

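Multi-table invariants from the quick example, such as "Order total matches sum of OrderItems" and "Payment amount equals Order total", generally cannot be written as a single-row `CHECK` constraint. The sketch below shows one way to check them in application code. It assumes a hypothetical in-memory `Order`/`OrderItem` model with `Decimal` money and a stored (denormalized) total; database triggers are another enforcement option.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import List, Optional


@dataclass
class OrderItem:
    product_id: int
    quantity: int
    unit_price: Decimal


@dataclass
class Order:
    id: int
    total: Decimal                      # stored (denormalized) copy of a derived value
    items: List[OrderItem] = field(default_factory=list)
    payment_amount: Optional[Decimal] = None


def invariant_violations(order: Order) -> List[str]:
    """Describe each broken invariant; an empty list means the order is valid."""
    problems = []
    items_sum = sum((i.quantity * i.unit_price for i in order.items), Decimal("0"))
    if order.total != items_sum:
        problems.append(f"order {order.id}: total {order.total} != items {items_sum}")
    if order.payment_amount is not None and order.payment_amount != order.total:
        problems.append(f"order {order.id}: payment {order.payment_amount} != total {order.total}")
    return problems
```

Call a check like this before committing a unit of work, or run it periodically as an audit.
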
**Step 5: Validate and document the model**

Create a `data-schema-knowledge-modeling.md` file with the complete schema definition. Validate it against the use cases: can the schema support every required query and operation? Check the normalization level: normalize to eliminate redundancy, and denormalize only where a specific query needs it. Self-assess using [resources/evaluators/rubric_data_schema_knowledge_modeling.json](resources/evaluators/rubric_data_schema_knowledge_modeling.json). Minimum standard: average score ≥ 3.5 (the rubric relaxes this to ≥ 3.0 for simple domains and raises it to ≥ 4.0 for complex ones).

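The rubric's quality gates are mechanical enough to check in code. The sketch below assumes scores are collected as a `{criterion name: score}` dict. The thresholds are transcribed from the rubric's `domain_complexity_guidance` section, and `assess` is a hypothetical helper, not part of the skill's resources.

```python
from statistics import mean

# Criterion names as they appear in the rubric JSON.
CRITERIA = (
    "Entity Identification & Completeness",
    "Attribute Definition Quality",
    "Relationship Modeling Accuracy",
    "Constraint & Invariant Specification",
    "Normalization & Data Integrity",
    "Use Case Coverage & Validation",
    "Technology Appropriateness",
    "Documentation Quality & Clarity",
    "Evolution & Migration Strategy",
    "Advanced Pattern Application",
)

# Quality gates transcribed from the rubric's domain_complexity_guidance section.
GATES = {
    "simple": {"min_average": 3.0, "min_each": None, "require_all": False},
    "standard": {"min_average": 3.5, "min_each": 3.0, "require_all": True},
    "complex": {"min_average": 4.0, "min_each": 3.5, "require_all": True},
}


def assess(scores: dict, complexity: str = "standard") -> tuple:
    """Return (passed, reasons) for a {criterion: score} dict under one complexity tier."""
    gate = GATES[complexity]
    reasons = []
    if gate["require_all"]:
        missing = [c for c in CRITERIA if c not in scores]
        if missing:
            reasons.append(f"not evaluated: {', '.join(missing)}")
    if not scores:
        return False, reasons + ["no scores given"]
    average = mean(scores.values())
    if average < gate["min_average"]:
        reasons.append(f"average {average:.2f} < {gate['min_average']}")
    if gate["min_each"] is not None:
        reasons += [f"{c}: {s} < {gate['min_each']}" for c, s in scores.items() if s < gate["min_each"]]
    if complexity == "complex" and scores.get("Advanced Pattern Application") != 5:
        reasons.append("complex domains require 5 on Advanced Pattern Application")
    return not reasons, reasons
```

The returned reasons point at the lowest-scoring criteria, which is where the rubric's `how_to_improve` note says to iterate first.
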
## Schema Types

Choose based on use case and technology:

**Relational (SQL) Schema**
- **Best for:** Transactional systems (OLTP), strong consistency, complex queries with joins
- **Pattern:** Normalized tables, foreign keys, ACID transactions
- **Example use cases:** E-commerce orders, banking transactions, HR systems
- **Key decision:** Normalization level (3NF for consistency vs denormalized for read performance)

**Document/NoSQL Schema**
- **Best for:** Flexible or evolving structure, high write throughput, denormalized reads
- **Pattern:** Nested documents, embedded relationships, few or no joins
- **Example use cases:** Content management, user profiles, event logs
- **Key decision:** Embed vs reference (embed for one-to-few, reference for one-to-many or unbounded growth)

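Below is a minimal sketch of the embed vs reference decision, with plain Python dicts standing in for documents. The field names (`addresses`, `userId`, `totalCents`) are hypothetical. A bounded one-to-few collection is embedded and read together with its parent. An unbounded one-to-many collection stores a reference instead, and that reference is resolved by a second lookup.

```python
# Embedded: a user's few addresses live inside the user document (one-to-few).
user_doc = {
    "_id": "u1",
    "email": "alice@example.com",
    "addresses": [
        {"label": "home", "city": "Lisbon"},
        {"label": "work", "city": "Porto"},
    ],
}

# Referenced: orders grow without bound, so each order stores the user's id (one-to-many).
order_docs = [
    {"_id": "o1", "userId": "u1", "totalCents": 1500},
    {"_id": "o2", "userId": "u1", "totalCents": 900},
    {"_id": "o3", "userId": "u2", "totalCents": 400},
]


def orders_for(user_id, orders):
    """Resolve the reference with a separate lookup, as a second query would."""
    return [o for o in orders if o["userId"] == user_id]
```

A single read of `user_doc` returns the addresses, while the orders need the extra lookup. That is the trade-off the key decision describes.
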
**Graph Schema (Ontology)**
- **Best for:** Complex relationships, traversal queries, semantic reasoning, knowledge graphs
- **Pattern:** Nodes (entities), edges (relationships), properties on both
- **Example use cases:** Social networks, fraud detection, recommendation engines, scientific research
- **Key decision:** Property graph vs RDF triples

**Event/Time-Series Schema**
- **Best for:** Audit logs, metrics, IoT data, append-only data
- **Pattern:** Immutable events, time-based partitioning, aggregation tables
- **Example use cases:** User activity tracking, monitoring, financial transactions
- **Key decision:** Raw events vs pre-aggregated summaries

**Dimensional (Data Warehouse) Schema**
- **Best for:** Analytics (OLAP), aggregations, historical reporting
- **Pattern:** Fact tables + dimension tables (star/snowflake schema)
- **Example use cases:** Business intelligence, sales analytics, customer 360
- **Key decision:** Star schema (denormalized) vs snowflake (normalized dimensions)

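The star schema pattern can be sketched in SQLite through Python's `sqlite3` module. The table names (`fact_sales`, `dim_product`, `dim_date`) and the integer `YYYYMMDD` date key are illustrative assumptions. The fact table holds foreign keys plus numeric measures, and a typical analytical query joins it to its dimensions and aggregates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
-- Fact table: one row per sale, a foreign key to each dimension plus numeric measures
CREATE TABLE fact_sales (
    date_key      INTEGER REFERENCES dim_date(date_key),
    product_key   INTEGER REFERENCES dim_product(product_key),
    quantity      INTEGER,
    revenue_cents INTEGER
);
INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Ebook', 'Media');
INSERT INTO dim_date VALUES (20240115, 2024, 1), (20240210, 2024, 2);
INSERT INTO fact_sales VALUES (20240115, 1, 2, 500), (20240210, 1, 1, 250), (20240210, 2, 3, 900);
""")


def revenue_by_category(conn, year):
    """Aggregate a measure from the fact table, sliced by two dimensions."""
    return conn.execute("""
        SELECT p.category, SUM(f.revenue_cents)
        FROM fact_sales f
        JOIN dim_product p ON p.product_key = f.product_key
        JOIN dim_date d ON d.date_key = f.date_key
        WHERE d.year = ?
        GROUP BY p.category
        ORDER BY p.category
    """, (year,)).fetchall()
```

In a snowflake variant, `category` would move to its own table referenced from `dim_product`, adding one more join to this query.
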
## Common Patterns

**Pattern: Entity Lifecycle Modeling**
Track entity state changes explicitly. Example: Order (draft → pending → confirmed → shipped → delivered → completed/cancelled). Include a status field, a timestamp for each state, and a transitions table if you need the history.

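One way to make the lifecycle explicit is to keep a table of allowed transitions and check every status change against it. This is a sketch under assumptions. The transition map (including cancellation before shipping), the dict-based order, and the `<status>At` timestamp keys are all hypothetical, and the `history` list stands in for a transitions table.

```python
from datetime import datetime, timezone

# Allowed transitions; allowing cancellation before shipping is an assumption for illustration.
TRANSITIONS = {
    "draft": {"pending", "cancelled"},
    "pending": {"confirmed", "cancelled"},
    "confirmed": {"shipped", "cancelled"},
    "shipped": {"delivered"},
    "delivered": {"completed"},
    "completed": set(),
    "cancelled": set(),
}


def transition(order: dict, new_status: str, history: list) -> None:
    """Apply a status change, rejecting illegal moves and recording a history row."""
    current = order["status"]
    if new_status not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current} -> {new_status}")
    now = datetime.now(timezone.utc)
    history.append({"orderId": order["id"], "from": current, "to": new_status, "at": now})
    order["status"] = new_status
    order[f"{new_status}At"] = now  # per-state timestamp, e.g. shippedAt
```

Keeping the map as data also lets you document the lifecycle, or generate a CHECK on the status column, from the same source.
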
**Pattern: Soft Deletes**
Instead of physically deleting records, set a `deletedAt` timestamp. This allows data recovery, supports audit compliance, and keeps foreign-key references valid. Every query on live data must then filter `WHERE deletedAt IS NULL`.

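Below is a minimal SQLite sketch of the soft-delete pattern, run through Python's `sqlite3`. The `document` table and `active_document` view are hypothetical names. Routing reads through a view is one way (an assumption, not the only way) to make sure the `deletedAt IS NULL` filter is never forgotten.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE document (
    id        INTEGER PRIMARY KEY,
    title     TEXT NOT NULL,
    deletedAt TEXT NULL            -- timestamp text; NULL means the row is live
);
-- Application reads go through this view, so the filter is applied in one place
CREATE VIEW active_document AS SELECT id, title FROM document WHERE deletedAt IS NULL;
INSERT INTO document (id, title) VALUES (1, 'Spec'), (2, 'Draft');
""")


def soft_delete(conn, doc_id):
    """Mark a live row as deleted instead of removing it."""
    conn.execute(
        "UPDATE document SET deletedAt = datetime('now') WHERE id = ? AND deletedAt IS NULL",
        (doc_id,),
    )


def restore(conn, doc_id):
    """Undo a soft delete."""
    conn.execute("UPDATE document SET deletedAt = NULL WHERE id = ?", (doc_id,))
```
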
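**Pattern: Polymorphic Associations**
An entity relates to several other types. Example: a Comment can belong to a Post or a Photo. Options: (1) separate nullable foreign keys per parent type (postId, photoId) with a check that exactly one is set, or a generic type/id pair (commentableType + commentableId), which the database cannot enforce as a foreign key; (2) junction tables per type; (3) single table inheritance.
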
**Pattern: Temporal/Historical Data**
Track changes over time. Options: (1) effective/expiry dates on each record, (2) a separate history table, (3) event sourcing (store every change as an event). Choose based on query patterns.

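The first option, effective/expiry dates, can be sketched as a price history in SQLite. The table name, the half-open `[valid_from, valid_to)` interval convention, and the ISO-8601 date strings (which compare correctly as text) are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_price (
    product_id  INTEGER NOT NULL,
    price_cents INTEGER NOT NULL,
    valid_from  TEXT NOT NULL,        -- inclusive, ISO date
    valid_to    TEXT NULL,            -- exclusive; NULL = still current
    PRIMARY KEY (product_id, valid_from),
    CHECK (valid_to IS NULL OR valid_to > valid_from)
);
INSERT INTO product_price VALUES
    (10, 200, '2024-01-01', '2024-03-01'),
    (10, 250, '2024-03-01', NULL);
""")


def price_as_of(conn, product_id, day):
    """Return the price in effect on a given ISO date, or None if none applies."""
    row = conn.execute("""
        SELECT price_cents FROM product_price
        WHERE product_id = ? AND valid_from <= ? AND (valid_to IS NULL OR valid_to > ?)
    """, (product_id, day, day)).fetchone()
    return row[0] if row else None
```

This schema does not stop two periods for the same product from overlapping. Preventing that needs application logic or a database feature such as PostgreSQL exclusion constraints.
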
**Pattern: Multi-tenancy**
Isolate data per customer. Options: (1) separate databases (strongest isolation), (2) a shared schema with a tenantId column (most efficient), (3) separate schemas in the same database (a balance of the two). With a shared schema, every query must filter by tenantId.

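For the shared-schema option, the sketch below wraps the connection in a repository that adds the tenant filter to every query it issues. `TenantScope` and the `project` table are hypothetical. Making `tenant_id` the leading primary-key column keeps tenant-scoped lookups on the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE project (
    tenant_id INTEGER NOT NULL,
    id        INTEGER NOT NULL,
    name      TEXT NOT NULL,
    PRIMARY KEY (tenant_id, id)      -- tenant first, so per-tenant queries use the key
);
INSERT INTO project VALUES (1, 1, 'Apollo'), (1, 2, 'Gemini'), (2, 1, 'Zephyr');
""")


class TenantScope:
    """Hypothetical repository wrapper: every query it runs is filtered by one tenant."""

    def __init__(self, conn, tenant_id):
        self.conn = conn
        self.tenant_id = tenant_id

    def projects(self):
        rows = self.conn.execute(
            "SELECT name FROM project WHERE tenant_id = ? ORDER BY id", (self.tenant_id,)
        )
        return [name for (name,) in rows]
```

PostgreSQL can also enforce the same rule inside the database with row-level security policies.
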
**Pattern: Hierarchies**
Model trees and nested structures. Options: (1) adjacency list (parentId), (2) nested sets (left/right values), (3) path enumeration (materialized path), (4) closure table (all ancestor-descendant pairs). Each option trades read performance against write performance differently.

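Path enumeration, option (3), stores each node's ancestry as a string, so fetching a whole subtree is a single prefix match. Below is a minimal SQLite sketch. The `category` table and the `/id/id/` path format are assumptions, and the trailing slash keeps `/1/` from matching `/10/`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    path TEXT NOT NULL            -- ids from the root down, e.g. '/1/5/12/'
);
INSERT INTO category VALUES
    (1, 'Electronics', '/1/'),
    (5, 'Phones', '/1/5/'),
    (12, 'Android', '/1/5/12/'),
    (7, 'Books', '/7/');
""")


def subtree(conn, category_id):
    """A node and all its descendants, found by a prefix match on path."""
    (prefix,) = conn.execute("SELECT path FROM category WHERE id = ?", (category_id,)).fetchone()
    rows = conn.execute(
        "SELECT name FROM category WHERE path LIKE ? || '%' ORDER BY path", (prefix,)
    )
    return [name for (name,) in rows]
```

Moving a subtree means rewriting the path of every descendant. That rewrite is the write cost in the trade-off.
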
## Guardrails

**✓ Do:**
- Start with use cases: the schema serves queries and operations
- Normalize first, then denormalize for specific performance needs
- Document all constraints and invariants explicitly
- Use meaningful, consistent naming conventions
- Consider future evolution and design for extensibility
- Validate the model against ALL required use cases
- Model the real world accurately (don't force-fit it to the technology)

**✗ Don't:**
- Design the schema in isolation from use cases
- Optimize prematurely (denormalizing before measuring)
- Skip constraint definitions (this leads to data corruption)
- Use generic names (data, value, thing); be specific
- Ignore cardinality and nullability
- Model implementation details in domain entities
- Forget the data migration path from existing systems
- Create circular dependencies between entities

## Quick Reference

**Resources:**
- `resources/template.md` - Structured process for entity identification, relationship mapping, and constraint definition
- `resources/methodology.md` - Advanced patterns: temporal modeling, graph ontologies, schema evolution, normalization strategies
- `resources/examples/` - Worked examples showing complete schema designs with validation
- `resources/evaluators/rubric_data_schema_knowledge_modeling.json` - Quality assessment before delivery

**When to choose which resource:**
- Simple domain (< 10 entities) → Start with the template
- Complex domain or graph/ontology → Study the methodology for advanced patterns
- Need to see examples → Review the examples folder
- Before delivering to the user → Always validate with the rubric

**Expected deliverable:**
A `data-schema-knowledge-modeling.md` file containing: domain description, complete entity definitions with attributes and types, relationship mappings with cardinality, constraint specifications, a diagram (ERD or graph visualization), validation against use cases, and implementation notes.

**Common schema notations:**
- **ERD** (Entity-Relationship Diagram): Visual representation of entities and relationships
- **UML Class Diagram**: Object-oriented view with inheritance and associations
- **Graph Diagram**: Nodes and edges for graph databases
- **JSON Schema**: API/document structure with validation rules
- **SQL DDL**: Executable CREATE TABLE statements
- **Ontology (OWL/RDF)**: Semantic web knowledge representation
{
  "criteria": [
    {
      "name": "Entity Identification & Completeness",
      "description": "Are all domain entities identified? Each with clear purpose, distinct identity, and no redundancy?",
      "scoring": {
        "1": "Missing critical entities. Entities poorly defined or overlapping. No clear distinction between entities and attributes.",
        "2": "Some entities identified but gaps in coverage. Some entity purposes unclear. Minor redundancy.",
        "3": "Most entities identified with clear purposes. Reasonable coverage. Entities generally distinct.",
        "4": "All required entities identified with clear, documented purposes. Good examples provided. No redundancy.",
        "5": "Complete entity coverage validated against all use cases. Each entity has purpose, examples, lifecycle documented. Entity vs value object distinction clear. No overlap or redundancy."
      }
    },
    {
      "name": "Attribute Definition Quality",
      "description": "Are attributes complete with appropriate data types, nullability, and constraints?",
      "scoring": {
        "1": "Attributes missing or poorly typed. Wrong data types (e.g., money as VARCHAR). Nullability ignored.",
        "2": "Basic attributes present but some types questionable. Nullability inconsistent. Some constraints missing.",
        "3": "Attributes defined with reasonable types. Nullability specified. Core constraints present.",
        "4": "All attributes well-typed (DECIMAL for money, proper VARCHAR lengths). Nullability correctly specified. Constraints documented.",
        "5": "Comprehensive attribute definitions with justification for types, nullability, defaults, and constraints. Audit fields (createdAt, updatedAt) included where appropriate. No technical debt."
      }
    },
    {
      "name": "Relationship Modeling Accuracy",
      "description": "Are relationships correctly identified with proper cardinality, optionality, and implementation?",
      "scoring": {
        "1": "Relationships missing or incorrect. Cardinality wrong. M:N modeled without junction table.",
        "2": "Some relationships identified but cardinality questionable. Missing junction tables or unclear optionality.",
        "3": "Most relationships mapped with cardinality. Junction tables for M:N. Optionality specified.",
        "4": "All relationships correctly modeled. Proper cardinality (1:1, 1:N, M:N). Junction tables where needed. Clear optionality.",
        "5": "Comprehensive relationship documentation with bidirectional naming, implementation details (FKs, ON DELETE actions), and validation that relationships support all use cases. Complex patterns (polymorphic, hierarchical) correctly handled."
      }
    },
    {
      "name": "Constraint & Invariant Specification",
      "description": "Are business rules enforced via constraints? Are domain invariants documented and validated?",
      "scoring": {
        "1": "No constraints beyond primary keys. Business rules not documented. Invariants missing.",
        "2": "Basic constraints (NOT NULL, UNIQUE) present but business rules not enforced. Invariants mentioned but not validated.",
        "3": "Good constraint coverage (PK, FK, UNIQUE, NOT NULL). Some business rules enforced. Invariants documented.",
        "4": "Comprehensive constraints including CHECK constraints for business rules. All invariants documented with enforcement strategy.",
        "5": "All constraints documented with rationale. Domain invariants clearly stated and enforced via DB constraints where possible, application logic where not. Validation strategy for complex multi-table invariants. Examples of enforcement code provided."
      }
    },
    {
      "name": "Normalization & Data Integrity",
      "description": "Is schema properly normalized (or deliberately denormalized with rationale)?",
      "scoring": {
        "1": "Severe normalization violations. Redundant data. Update anomalies likely.",
        "2": "Some normalization but violations present (partial or transitive dependencies). Some redundancy.",
        "3": "Generally normalized to 2NF-3NF. Minimal redundancy. Rationale for exceptions provided.",
        "4": "Proper normalization to 3NF. Any denormalization documented with performance justification. No update anomalies.",
        "5": "Exemplary normalization with clear explanation of level achieved (1NF/2NF/3NF/BCNF). Strategic denormalization only where measured performance gains justify it. Trade-offs explicitly documented. No data integrity risks."
      }
    },
    {
      "name": "Use Case Coverage & Validation",
      "description": "Does schema support all required use cases? Can all queries be answered?",
      "scoring": {
        "1": "Schema doesn't support core use cases. Critical queries impossible or require workarounds.",
        "2": "Supports some use cases but gaps exist. Some queries difficult or inefficient.",
        "3": "Supports most use cases. Required queries possible though some may be complex.",
        "4": "All use cases supported. Validation checklist shows each use case can be satisfied. Query paths identified.",
        "5": "Comprehensive validation against all use cases with example queries. Indexes planned for performance. Edge cases considered. Future use cases accommodated by design."
      }
    },
    {
      "name": "Technology Appropriateness",
      "description": "Is the schema type (relational, document, graph) appropriate for the domain?",
      "scoring": {
        "1": "Wrong technology choice (e.g., relational for graph problem, or vice versa). Implementation doesn't match paradigm.",
        "2": "Technology choice questionable. Implementation awkward for chosen paradigm.",
        "3": "Reasonable technology choice. Implementation follows paradigm conventions.",
        "4": "Good technology choice with justification. Implementation leverages paradigm strengths.",
        "5": "Optimal technology choice with clear rationale comparing alternatives. Implementation exemplifies paradigm best practices. Schema leverages technology-specific features appropriately (e.g., JSONB in PostgreSQL, graph traversal in Neo4j)."
      }
    },
    {
      "name": "Documentation Quality & Clarity",
      "description": "Is schema well-documented with ERD, implementation code, and clear explanations?",
      "scoring": {
        "1": "Minimal documentation. No diagram. Entity definitions incomplete.",
        "2": "Basic documentation present but gaps. Diagram missing or unclear. Some entities poorly explained.",
        "3": "Good documentation with most sections complete. Diagram present. Entities explained.",
        "4": "Comprehensive documentation following template. ERD clear. All entities, relationships, constraints documented. Implementation code provided.",
        "5": "Exemplary documentation that could serve as reference. ERD/diagram clear and complete. All sections filled thoroughly. Implementation code (SQL DDL / JSON Schema / Cypher) executable. Examples aid understanding. Could be handed to developer for immediate implementation."
      }
    },
    {
      "name": "Evolution & Migration Strategy",
      "description": "Is there a plan for schema changes? Migration path from existing systems considered?",
      "scoring": {
        "1": "No evolution strategy. If migration, no plan for existing data.",
        "2": "Evolution mentioned but no concrete strategy. Migration path vague.",
        "3": "Basic evolution strategy (versioning or backward-compat approach). Migration considered if applicable.",
        "4": "Clear evolution strategy documented. Migration path defined with phases if migrating from legacy.",
        "5": "Comprehensive evolution strategy with versioning, backward-compatibility approach, and detailed migration plan if applicable. Rollback strategy considered. Zero-downtime deployment approach specified. Future extensibility designed in."
      }
    },
    {
      "name": "Advanced Pattern Application",
      "description": "Are advanced patterns (temporal, hierarchies, polymorphic) correctly applied when needed?",
      "scoring": {
        "1": "Complex patterns needed but missing or incorrectly implemented.",
        "2": "Attempted advanced patterns but implementation flawed or overly complex.",
        "3": "Advanced patterns applied where needed with reasonable implementation.",
        "4": "Advanced patterns correctly implemented with good trade-off decisions (e.g., hierarchy approach chosen based on use case).",
        "5": "Sophisticated pattern usage with clear rationale for choices. Temporal modeling, hierarchies, polymorphic associations, or graph patterns implemented optimally for domain. Trade-offs explicit and justified."
      }
    }
  ],
  "schema_type_guidance": {
    "Relational (SQL)": {
      "target_score": 3.5,
      "focus_criteria": [
        "Normalization & Data Integrity",
        "Constraint & Invariant Specification",
        "Relationship Modeling Accuracy"
      ],
      "key_patterns": [
        "Proper normalization (3NF typical)",
        "Foreign key relationships with CASCADE/RESTRICT",
        "CHECK constraints for business rules",
        "Junction tables for M:N relationships"
      ]
    },
    "Document/NoSQL": {
      "target_score": 3.5,
      "focus_criteria": [
        "Entity Identification & Completeness",
        "Use Case Coverage & Validation",
        "Technology Appropriateness"
      ],
      "key_patterns": [
        "Embed vs reference decision documented",
        "Denormalization for read performance justified",
        "Document structure matches query patterns",
        "JSON schema validation if available"
      ]
    },
    "Graph Database": {
      "target_score": 4.0,
      "focus_criteria": [
        "Relationship Modeling Accuracy",
        "Technology Appropriateness",
        "Advanced Pattern Application"
      ],
      "key_patterns": [
        "Nodes for entities, edges for relationships",
        "Properties on edges for context",
        "Traversal patterns optimized (< 3 hops typical)",
        "Index on frequently filtered properties"
      ]
    },
    "Data Warehouse (OLAP)": {
      "target_score": 3.5,
      "focus_criteria": [
        "Use Case Coverage & Validation",
        "Normalization & Data Integrity",
        "Technology Appropriateness"
      ],
      "key_patterns": [
        "Star or snowflake schema",
        "Fact tables with foreign keys to dimensions",
        "Dimensional attributes denormalized",
        "Slowly changing dimensions handled"
      ]
    }
  },
  "domain_complexity_guidance": {
    "Simple Domain (< 10 entities, straightforward relationships)": {
      "target_score": 3.0,
      "acceptable_shortcuts": [
        "ERD can be simple text diagram",
        "Fewer implementation details needed",
        "Basic constraints sufficient"
      ],
      "key_quality_gates": [
        "All entities identified",
        "Relationships correct",
        "Supports use cases"
      ]
    },
    "Standard Domain (10-30 entities, moderate complexity)": {
      "target_score": 3.5,
      "required_elements": [
        "Complete entity definitions",
        "ERD diagram",
        "All relationships mapped",
        "Constraints documented",
        "Implementation code (DDL/schema)"
      ],
      "key_quality_gates": [
        "All 10 criteria evaluated",
        "Minimum score of 3 on each",
        "Average ≥ 3.5"
      ]
    },
    "Complex Domain (30+ entities, hierarchies, temporal, polymorphic)": {
      "target_score": 4.0,
      "required_elements": [
        "Comprehensive documentation",
        "Multiple diagrams (ERD + detail views)",
        "Advanced pattern usage documented",
        "Migration strategy if applicable",
        "Performance considerations",
        "Example queries for complex patterns"
      ],
      "key_quality_gates": [
        "All 10 criteria evaluated",
        "Minimum score of 3.5 on each",
        "Average ≥ 4.0",
        "Score 5 on Advanced Pattern Application"
      ]
    }
  },
  "common_failure_modes": {
    "1. God Entities": {
      "symptom": "User table with 50+ attributes, or single entity handling multiple concerns",
      "why_it_fails": "Violates single responsibility, hard to query, update anomalies",
      "fix": "Extract related concerns into separate entities (UserProfile, UserPreferences, UserAddress)",
      "related_criteria": ["Entity Identification & Completeness", "Normalization & Data Integrity"]
    },
    "2. Missing Junction Tables": {
      "symptom": "Attempting M:N relationship with direct foreign keys or comma-separated IDs",
      "why_it_fails": "Can't properly model M:N, violates 1NF, query complexity",
      "fix": "Always use junction table with composite primary key for M:N relationships",
      "related_criteria": ["Relationship Modeling Accuracy", "Normalization & Data Integrity"]
    },
    "3. Wrong Data Types": {
      "symptom": "Money as FLOAT, dates as VARCHAR, booleans as CHAR(1)",
      "why_it_fails": "Precision loss (money), format inconsistency (dates), unclear semantics (booleans)",
      "fix": "Use DECIMAL for money, DATE/TIMESTAMP for dates, BOOLEAN for flags",
      "related_criteria": ["Attribute Definition Quality"]
    },
    "4. No Constraints": {
      "symptom": "Business rules in documentation but not enforced in schema",
      "why_it_fails": "Application bugs can corrupt data, no database-level guarantees",
      "fix": "Use CHECK constraints, NOT NULL, UNIQUE, FK constraints to enforce rules",
      "related_criteria": ["Constraint & Invariant Specification"]
    },
    "5. Premature Denormalization": {
      "symptom": "Duplicating data for \"performance\" without measuring",
      "why_it_fails": "Update anomalies, data inconsistency, wasted effort if not bottleneck",
      "fix": "Normalize first (3NF), denormalize only after profiling shows actual bottleneck",
      "related_criteria": ["Normalization & Data Integrity", "Use Case Coverage & Validation"]
    },
    "6. Ignoring Use Cases": {
      "symptom": "Schema designed in isolation, doesn't support required queries",
      "why_it_fails": "Schema can't answer business questions, requires redesign",
      "fix": "Validate schema against ALL use cases. Write example queries to verify.",
      "related_criteria": ["Use Case Coverage & Validation"]
    },
    "7. Modeling Implementation": {
      "symptom": "Entities like \"UserSession\", \"Cache\", \"Queue\" in domain model",
      "why_it_fails": "Confuses domain concepts with technical infrastructure",
      "fix": "Model real-world domain entities only. Infrastructure is separate concern.",
      "related_criteria": ["Entity Identification & Completeness", "Technology Appropriateness"]
    },
    "8. No Evolution Strategy": {
      "symptom": "Can't change schema without breaking production",
      "why_it_fails": "Schema ossifies, can't adapt to business changes",
      "fix": "Plan for evolution: versioning, backward-compat changes, or expand-contract migrations",
      "related_criteria": ["Evolution & Migration Strategy"]
    }
  },
  "scale": {
    "description": "Each criterion scored 1-5",
    "min_score": 1,
    "max_score": 5,
    "passing_threshold": 3.5,
    "excellence_threshold": 4.5
  },
  "usage_notes": {
    "when_to_score": "After completing schema design, before delivering to user",
    "minimum_standard": "Average score ≥ 3.5 across all criteria (standard domain). Simple domains: ≥ 3.0. Complex domains: ≥ 4.0.",
    "how_to_improve": "If scoring < threshold, identify lowest-scoring criteria and iterate. Common fixes: add missing entities, specify constraints, validate against use cases, improve documentation.",
    "self_assessment": "Score honestly. Schema flaws are expensive to fix in production. Better to iterate now."
  }
}
skills/data-schema-knowledge-modeling/resources/methodology.md
# Data Schema & Knowledge Modeling: Advanced Methodology

## Workflow

```
Advanced Schema Modeling:
- [ ] Step 1: Analyze complex domain patterns
- [ ] Step 2: Design advanced relationship structures
- [ ] Step 3: Apply normalization or strategic denormalization
- [ ] Step 4: Model temporal/historical aspects
- [ ] Step 5: Plan schema evolution strategy
```

**Steps:** (1) Identify patterns in [Advanced Relationships](#1-advanced-relationship-patterns), (2) Apply [Hierarchy](#2-hierarchy-modeling) and [Polymorphic](#3-polymorphic-associations) patterns, (3) Use [Normalization](#4-normalization-levels) then [Denormalization](#5-strategic-denormalization), (4) Add [Temporal](#6-temporal--historical-modeling) if needed, (5) Plan [Evolution](#7-schema-evolution).

---

## 1. Advanced Relationship Patterns

### Self-Referential

An entity relates to itself (org charts, categories, social networks).

```sql
CREATE TABLE Employee (
    id BIGINT PRIMARY KEY,
    managerId BIGINT NULL REFERENCES Employee(id),
    -- Blocks direct self-management only; longer cycles need application checks
    CONSTRAINT no_self_ref CHECK (id <> managerId)
);
```

Query the full hierarchy with a recursive CTE.

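Below is a sketch of that recursive CTE, run against SQLite through Python's `sqlite3`. The `name` column and the sample rows are hypothetical additions to the `Employee` table above. The anchor member selects the starting manager. The recursive member then joins on `managerId` repeatedly, walking down one level per step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    managerId INTEGER NULL REFERENCES Employee(id),
    CONSTRAINT no_self_ref CHECK (id <> managerId)
);
INSERT INTO Employee VALUES (1, 'Ada', NULL), (2, 'Ben', 1), (3, 'Cy', 2), (4, 'Di', 1);
""")

REPORTS = """
WITH RECURSIVE chain(id, name, depth) AS (
    SELECT id, name, 0 FROM Employee WHERE id = ?      -- anchor: the starting manager
    UNION ALL
    SELECT e.id, e.name, c.depth + 1                   -- step: direct reports of the previous level
    FROM Employee e JOIN chain c ON e.managerId = c.id
)
SELECT name, depth FROM chain WHERE depth > 0 ORDER BY depth, name
"""


def all_reports(conn, manager_id):
    """Every direct and indirect report of a manager, with its depth below them."""
    return conn.execute(REPORTS, (manager_id,)).fetchall()
```

`no_self_ref` only blocks one-node cycles. If longer cycles could slip in, bound the recursion (for example with `WHERE c.depth < 50` in the recursive member) so the query always terminates.
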
### Conditional

A relationship exists only under certain conditions.

```sql
-- ORDER is a reserved word: quote the name (standard SQL double quotes) or rename the table
CREATE TABLE "Order" (
    id BIGINT PRIMARY KEY,
    status VARCHAR(20) NOT NULL,
    paymentId BIGINT NULL REFERENCES Payment(id),
    -- A paid or completed order must reference a payment; other statuses may leave it NULL
    CONSTRAINT payment_when_paid CHECK (
        status NOT IN ('paid','completed') OR paymentId IS NOT NULL
    )
);
```

### Multi-Parent

An entity has multiple parents (for example, a document that appears in several folders).

```sql
CREATE TABLE DocumentFolder (
    documentId BIGINT REFERENCES Document(id),
    folderId BIGINT REFERENCES Folder(id),
    PRIMARY KEY (documentId, folderId)
);
```

---

## 2. Hierarchy Modeling

Four approaches with trade-offs:

| Approach | Implementation | Read | Write | Best For |
|----------|---------------|------|-------|----------|
| **Adjacency List** | `parentId` column | Slow (recursive) | Fast | Shallow trees, frequent updates |
| **Path Enumeration** | `path VARCHAR` ('/1/5/12/') | Fast | Medium | Read-heavy, moderate depth |
| **Nested Sets** | `lft, rgt INT` | Fastest | Slow | Read-heavy, rare writes |
| **Closure Table** | Separate ancestor/descendant table | Fastest | Medium | Complex queries, any depth |

**Adjacency List:**
```sql
CREATE TABLE Category (
    id BIGINT PRIMARY KEY,
    parentId BIGINT NULL REFERENCES Category(id)
);
```

**Closure Table:**
```sql
CREATE TABLE CategoryClosure (
    ancestor BIGINT NOT NULL REFERENCES Category(id),
    descendant BIGINT NOT NULL REFERENCES Category(id),
    depth INT NOT NULL, -- 0=self, 1=child, 2+=deeper
    PRIMARY KEY (ancestor, descendant)
);
```

**Recommendation:** Use an adjacency list for shallow trees (< 5 levels) and a closure table for complex queries.

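Keeping a closure table current is the main cost of the approach. The sketch below uses SQLite via Python's `sqlite3` and adds a hypothetical `name` column to `Category`. It applies the usual insert rule: a new node gets a depth-0 row for itself, plus one row per ancestor of its parent at one level deeper. Reading a subtree is then a single lookup on `ancestor`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Category (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE CategoryClosure (
    ancestor   INTEGER NOT NULL REFERENCES Category(id),
    descendant INTEGER NOT NULL REFERENCES Category(id),
    depth      INTEGER NOT NULL,
    PRIMARY KEY (ancestor, descendant)
);
""")


def add_category(conn, cat_id, name, parent_id=None):
    """Insert a node and the closure rows that describe its position."""
    conn.execute("INSERT INTO Category (id, name) VALUES (?, ?)", (cat_id, name))
    # Every node is its own ancestor at depth 0.
    conn.execute("INSERT INTO CategoryClosure VALUES (?, ?, 0)", (cat_id, cat_id))
    if parent_id is not None:
        # Copy each ancestor of the parent (including the parent itself), one level deeper.
        conn.execute("""
            INSERT INTO CategoryClosure (ancestor, descendant, depth)
            SELECT ancestor, ?, depth + 1 FROM CategoryClosure WHERE descendant = ?
        """, (cat_id, parent_id))


def descendants(conn, cat_id):
    """All nodes below cat_id, nearest first."""
    rows = conn.execute("""
        SELECT c.name FROM CategoryClosure cc JOIN Category c ON c.id = cc.descendant
        WHERE cc.ancestor = ? AND cc.depth > 0 ORDER BY cc.depth, c.name
    """, (cat_id,))
    return [name for (name,) in rows]
```
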
---
|
||||
|
||||
## 3. Polymorphic Associations
|
||||
|
||||
Entity relates to multiple types (Comment on Post/Photo/Video).
|
||||
|
||||
### Approach 1: Separate FKs (Recommended for SQL)
|
||||
|
||||
```sql
|
||||
CREATE TABLE Comment (
|
||||
id BIGINT PRIMARY KEY,
|
||||
postId BIGINT NULL REFERENCES Post(id),
|
||||
photoId BIGINT NULL REFERENCES Photo(id),
|
||||
videoId BIGINT NULL REFERENCES Video(id),
|
||||
CONSTRAINT one_parent CHECK (
|
||||
(postId IS NOT NULL)::int +
|
||||
(photoId IS NOT NULL)::int +
|
||||
(videoId IS NOT NULL)::int = 1
|
||||
)
|
||||
);
|
||||
```
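
The sketch below exercises the `one_parent` rule, assuming SQLite via Python's `sqlite3`. In SQLite, `IS NOT NULL` already evaluates to 0 or 1, so the PostgreSQL `::int` cast is dropped. The FK references to Post/Photo/Video are omitted for brevity, and `add_comment` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Comment (
      id      INTEGER PRIMARY KEY,
      postId  INTEGER NULL,
      photoId INTEGER NULL,
      videoId INTEGER NULL,
      -- Exactly one of the three parent columns may be non-NULL.
      CONSTRAINT one_parent CHECK (
        (postId IS NOT NULL) + (photoId IS NOT NULL) + (videoId IS NOT NULL) = 1
      )
    );
""")

def add_comment(conn, comment_id, post=None, photo=None, video=None) -> bool:
    """Return True if accepted, False if one_parent rejects the row."""
    try:
        with conn:
            conn.execute("INSERT INTO Comment VALUES (?, ?, ?, ?)",
                         (comment_id, post, photo, video))
        return True
    except sqlite3.IntegrityError:
        return False

print(add_comment(conn, 1, post=7))           # True: exactly one parent
print(add_comment(conn, 2))                   # False: no parent
print(add_comment(conn, 3, post=7, photo=9))  # False: two parents
```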

**Pros:** Type-safe, enforces referential integrity

**Cons:** Schema grows with every new commentable type

### Approach 2: Supertype/Subtype

```sql
CREATE TABLE Commentable (id BIGINT PRIMARY KEY, type VARCHAR(50) NOT NULL);
CREATE TABLE Post  (id BIGINT PRIMARY KEY REFERENCES Commentable(id), ...);
CREATE TABLE Photo (id BIGINT PRIMARY KEY REFERENCES Commentable(id), ...);
CREATE TABLE Comment (
  id            BIGINT PRIMARY KEY,
  commentableId BIGINT NOT NULL REFERENCES Commentable(id)
);
```

**Use when:** Types share common attributes (stored once on the supertype) and you want a single FK target.

---

## 4. Graph & Ontology Design

### Property Graph

**Nodes** represent entities and **edges** represent relationships; both can carry properties.

```cypher
CREATE (u:User {id: 1, name: 'Alice'})
CREATE (p:Product {id: 100, name: 'Widget'})
CREATE (u)-[:PURCHASED {date: '2024-01-15', quantity: 2}]->(p)
```

**Schema:**

```
Nodes: User, Product, Category
Edges: PURCHASED  (User→Product, {date, quantity})
       REVIEWED   (User→Product, {rating, comment})
       BELONGS_TO (Product→Category)
```

**Design principles:**

- Use nodes for entities with identity
- Use edges for relationships
- Put properties on edges for relationship context (e.g., purchase date)
- Avoid deep traversals; keep common queries under 3 hops

### RDF Triples (Semantic Web)

Facts are stored as subject-predicate-object triples:

```turtle
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ex:  <http://example.org/> .
ex:Alice rdf:type ex:User .
ex:Alice ex:purchased ex:Widget .
```

**Use RDF when:** You need standards compliance, semantic reasoning, or linked data.

**Use Property Graph when:** You need traversal performance or complex multi-hop queries.

---

## 5. Normalization Levels

### 1NF: Atomic Values

**Violation:** Multiple phone numbers stored in one column

**Fix:** Move them to a separate UserPhone table

### 2NF: No Partial Dependencies

**Violation:** In OrderItem(orderId, productId, productName) with key (orderId, productId), productName depends only on productId

**Fix:** Store productName in the Product table

### 3NF: No Transitive Dependencies

**Violation:** In Address(id, zipCode, city, state), city and state depend on zipCode rather than directly on id

**Fix:** Move city and state to a separate ZipCode table

**When to normalize to 3NF:** OLTP workloads, frequent updates, strong consistency requirements

---

## 6. Strategic Denormalization

**Denormalize only after profiling shows a bottleneck.**

### Pattern 1: Computed Aggregates

Store `Order.total` instead of summing OrderItems on every query.

**Trade-off:** Faster reads, slower writes, and a consistency risk; keep the stored value in sync with triggers or application logic.
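
The sketch below keeps a stored order total in sync with triggers. It assumes SQLite via Python's `sqlite3`, and money is held as integer cents because SQLite has no DECIMAL type. The column names `totalCents` and `priceCents` and the `total` helper are hypothetical. The update trigger assumes items never move between orders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "Order" (
      id         INTEGER PRIMARY KEY,
      totalCents INTEGER NOT NULL DEFAULT 0   -- denormalized aggregate
    );
    CREATE TABLE OrderItem (
      orderId    INTEGER NOT NULL REFERENCES "Order"(id),
      productId  INTEGER NOT NULL,
      quantity   INTEGER NOT NULL,
      priceCents INTEGER NOT NULL,
      PRIMARY KEY (orderId, productId)
    );
    -- Recompute the stored total whenever an order's items change.
    CREATE TRIGGER item_ins AFTER INSERT ON OrderItem BEGIN
      UPDATE "Order" SET totalCents = (SELECT COALESCE(SUM(quantity * priceCents), 0)
        FROM OrderItem WHERE orderId = NEW.orderId) WHERE id = NEW.orderId;
    END;
    CREATE TRIGGER item_upd AFTER UPDATE ON OrderItem BEGIN
      UPDATE "Order" SET totalCents = (SELECT COALESCE(SUM(quantity * priceCents), 0)
        FROM OrderItem WHERE orderId = NEW.orderId) WHERE id = NEW.orderId;
    END;
    CREATE TRIGGER item_del AFTER DELETE ON OrderItem BEGIN
      UPDATE "Order" SET totalCents = (SELECT COALESCE(SUM(quantity * priceCents), 0)
        FROM OrderItem WHERE orderId = OLD.orderId) WHERE id = OLD.orderId;
    END;
""")

def total(conn, order_id):
    return conn.execute('SELECT totalCents FROM "Order" WHERE id = ?', (order_id,)).fetchone()[0]

with conn:
    conn.execute('INSERT INTO "Order" (id) VALUES (1)')
    conn.execute("INSERT INTO OrderItem VALUES (1, 100, 2, 250)")  # 2 x 2.50
    conn.execute("INSERT INTO OrderItem VALUES (1, 200, 1, 999)")  # 1 x 9.99
print(total(conn, 1))  # 1499
with conn:
    conn.execute("DELETE FROM OrderItem WHERE productId = 100")
print(total(conn, 1))  # 999
```

Reads now come from `totalCents` directly, at the cost of an extra UPDATE on every item write.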

### Pattern 2: Frequent Joins

Embed address fields in the User table to avoid a join.

**Trade-off:** Reads skip the join, but if the address also lives in its own table, every update must keep both copies in sync.

### Pattern 3: Historical Snapshots

```sql
CREATE TABLE OrderSnapshot (
  orderId      BIGINT NOT NULL,
  snapshotDate DATE   NOT NULL,
  userName     VARCHAR(255),  -- denormalized from User at snapshot time
  userEmail    VARCHAR(255),  -- denormalized from User at snapshot time
  PRIMARY KEY (orderId, snapshotDate)
);
```

**Use when:** You need point-in-time data (e.g., the user's name at the time of the order).

---

## 7. Temporal & Historical Modeling

### Pattern 1: Effective Dating

```sql
CREATE TABLE Price (
  productId     BIGINT        NOT NULL,
  price         DECIMAL(10,2) NOT NULL,
  effectiveFrom DATE          NOT NULL,
  effectiveTo   DATE          NULL,  -- NULL = current; exclusive upper bound
  PRIMARY KEY (productId, effectiveFrom)
);
```

**Query current price:** `WHERE effectiveFrom <= CURRENT_DATE AND (effectiveTo IS NULL OR effectiveTo > CURRENT_DATE)`
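
The sketch below generalizes the current-price filter to any "as of" date. It assumes SQLite via Python's `sqlite3`, where ISO-8601 date strings compare correctly as text, and prices are stored as integer cents. The `price_as_of` helper and the sample prices are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Price (
      productId     INTEGER NOT NULL,
      priceCents    INTEGER NOT NULL,
      effectiveFrom TEXT    NOT NULL,   -- ISO-8601 dates sort correctly as text
      effectiveTo   TEXT    NULL,       -- NULL = still current; exclusive bound
      PRIMARY KEY (productId, effectiveFrom)
    );
    INSERT INTO Price VALUES
      (1, 1000, '2024-01-01', '2024-03-01'),
      (1, 1200, '2024-03-01', NULL);
""")

def price_as_of(conn, product_id, as_of):
    """Return the price in effect on as_of, or None if none applies."""
    row = conn.execute("""
        SELECT priceCents FROM Price
        WHERE productId = ?
          AND effectiveFrom <= ?
          AND (effectiveTo IS NULL OR effectiveTo > ?)
    """, (product_id, as_of, as_of)).fetchone()
    return row[0] if row else None

print(price_as_of(conn, 1, "2024-02-15"))  # 1000
print(price_as_of(conn, 1, "2024-03-01"))  # 1200 (effectiveTo is exclusive)
print(price_as_of(conn, 1, "2023-12-31"))  # None
```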

### Pattern 2: History Table

```sql
CREATE TABLE UserHistory (
  id         BIGINT AUTO_INCREMENT PRIMARY KEY,
  userId     BIGINT NOT NULL,
  email      VARCHAR(255),
  name       VARCHAR(255),
  validFrom  TIMESTAMP NOT NULL DEFAULT NOW(),
  validTo    TIMESTAMP NULL,      -- NULL = current version
  changeType VARCHAR(20) NOT NULL  -- 'INSERT', 'UPDATE', 'DELETE'
);
```

A trigger on the User table inserts a row into UserHistory on every change.
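
The sketch below shows such a history trigger. It assumes SQLite via Python's `sqlite3`; PostgreSQL and MySQL use different trigger syntax. On insert it opens a history row. On update it closes the open row and records the new version. The DELETE case is omitted but follows the same pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "User" (
      id    INTEGER PRIMARY KEY,
      email TEXT NOT NULL,
      name  TEXT NOT NULL
    );
    CREATE TABLE UserHistory (
      id         INTEGER PRIMARY KEY AUTOINCREMENT,
      userId     INTEGER NOT NULL,
      email      TEXT,
      name       TEXT,
      validFrom  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
      validTo    TEXT NULL,               -- NULL = current version
      changeType TEXT NOT NULL
    );
    -- New user: open the first history row.
    CREATE TRIGGER user_ins AFTER INSERT ON "User" BEGIN
      INSERT INTO UserHistory (userId, email, name, changeType)
      VALUES (NEW.id, NEW.email, NEW.name, 'INSERT');
    END;
    -- Changed user: close the open row, then record the new version.
    CREATE TRIGGER user_upd AFTER UPDATE ON "User" BEGIN
      UPDATE UserHistory SET validTo = CURRENT_TIMESTAMP
      WHERE userId = OLD.id AND validTo IS NULL;
      INSERT INTO UserHistory (userId, email, name, changeType)
      VALUES (NEW.id, NEW.email, NEW.name, 'UPDATE');
    END;
""")

with conn:
    conn.execute('INSERT INTO "User" VALUES (1, ?, ?)', ("alice@old.example", "Alice"))
    conn.execute('UPDATE "User" SET email = ? WHERE id = 1', ("alice@new.example",))

for row in conn.execute("SELECT changeType, email, validTo IS NULL FROM UserHistory ORDER BY id"):
    print(row)
# ('INSERT', 'alice@old.example', 0)
# ('UPDATE', 'alice@new.example', 1)
```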

### Pattern 3: Event Sourcing

```sql
CREATE TABLE OrderEvent (
  id         BIGINT AUTO_INCREMENT PRIMARY KEY,
  orderId    BIGINT NOT NULL,
  eventType  VARCHAR(50) NOT NULL,  -- 'CREATED', 'ITEM_ADDED', 'SHIPPED'
  eventData  JSON,
  occurredAt TIMESTAMP DEFAULT NOW()
);
```

The event log is the source of truth: current state is reconstructed by replaying an order's events in order.
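
The sketch below shows replay as a left fold over an order's events, read in `id` order. It is plain Python and the shapes are hypothetical: the `OrderState` read model, and `eventData` payloads shaped like `{"productId": ..., "quantity": ...}`.

```python
import json
from dataclasses import dataclass, field

@dataclass
class OrderState:
    """Hypothetical read model rebuilt from events."""
    order_id: int
    status: str = "new"
    items: dict = field(default_factory=dict)  # productId -> quantity

def apply(state: OrderState, event_type: str, data: dict) -> OrderState:
    """Apply one event to the state; unknown event types are rejected."""
    if event_type == "CREATED":
        state.status = "created"
    elif event_type == "ITEM_ADDED":
        pid = data["productId"]
        state.items[pid] = state.items.get(pid, 0) + data["quantity"]
    elif event_type == "SHIPPED":
        state.status = "shipped"
    else:
        raise ValueError(f"unknown event type: {event_type}")
    return state

def replay(order_id: int, events) -> OrderState:
    """events: (eventType, eventData JSON string or None) pairs, already ordered by id."""
    state = OrderState(order_id)
    for event_type, raw in events:
        state = apply(state, event_type, json.loads(raw) if raw else {})
    return state

log = [
    ("CREATED", None),
    ("ITEM_ADDED", '{"productId": 100, "quantity": 2}'),
    ("ITEM_ADDED", '{"productId": 100, "quantity": 1}'),
    ("SHIPPED", None),
]
s = replay(1, log)
print(s.status, s.items)  # shipped {100: 3}
```

Replaying only the events up to a chosen `occurredAt` gives the "time travel" view of the order.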

**Trade-offs:**

- **Pros:** Complete audit trail, time travel (rebuild state as of any point)
- **Cons:** Query complexity, storage growth

---

## 8. Schema Evolution

### Strategy 1: Backward-Compatible

Safe changes (no application changes required):

- Add a nullable column
- Add a table (not yet referenced)
- Add an index
- Widen a column (VARCHAR(100) → VARCHAR(255))

```sql
ALTER TABLE "User" ADD COLUMN phoneNumber VARCHAR(20) NULL;
```

### Strategy 2: Expand-Contract

For breaking changes, split the change into steps so old and new application code both keep working:

1. **Expand:** Add the new column alongside the old one.

   ```sql
   ALTER TABLE "User" ADD COLUMN newEmail VARCHAR(255) NULL;
   ```

2. **Migrate:** Backfill existing rows.

   ```sql
   UPDATE "User" SET newEmail = email WHERE newEmail IS NULL;
   ```

3. **Contract:** Once nothing reads the old column, remove it.

   ```sql
   ALTER TABLE "User" DROP COLUMN email;
   ALTER TABLE "User" RENAME COLUMN newEmail TO email;
   ```

### Strategy 3: Versioned Schemas (NoSQL)

```json
{"_schemaVersion": "2.0", "email": "alice@example.com"}
```

Each document records its schema version, and the application must handle every version it may read, typically by upgrading older documents on read.
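
The sketch below shows one way to handle multiple versions: a chain of upgrade functions applied when a document is read. It is plain Python. The v1 shape (email stored under `emailAddress`, no version field) and the names `UPGRADERS` and `read_document` are hypothetical.

```python
def upgrade_v1_to_v2(doc: dict) -> dict:
    """Hypothetical v1 shape: email stored under 'emailAddress', no version field."""
    new = {k: v for k, v in doc.items() if k != "emailAddress"}
    new["email"] = doc["emailAddress"]
    new["_schemaVersion"] = "2.0"
    return new

# Each old version maps to the function that lifts it one step forward.
UPGRADERS = {"1.0": upgrade_v1_to_v2}
CURRENT_VERSION = "2.0"

def read_document(doc: dict) -> dict:
    """Upgrade a stored document step by step until it matches the current schema."""
    version = doc.get("_schemaVersion", "1.0")  # unversioned documents count as 1.0
    while version != CURRENT_VERSION:
        if version not in UPGRADERS:
            raise ValueError(f"no upgrade path from schema version {version}")
        doc = UPGRADERS[version](doc)
        version = doc["_schemaVersion"]
    return doc

print(read_document({"emailAddress": "alice@example.com"}))
# {'email': 'alice@example.com', '_schemaVersion': '2.0'}
```

Adding a version 3 then means adding one `"2.0"` entry to `UPGRADERS`; older documents pass through every step in sequence.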

### Strategy 4: Blue-Green

Run the old and new schemas side by side: dual-write to both, migrate existing data, switch reads to the new schema, then remove the old one.

**Best for:** Major redesigns that require zero downtime

---

## 9. Multi-Tenancy

### Pattern 1: Separate Databases

```
tenant1_db, tenant2_db, tenant3_db
```

**Pros:** Strongest isolation

**Cons:** High operational overhead (one database per tenant to provision, back up, and migrate)

### Pattern 2: Separate Schemas

```sql
CREATE SCHEMA tenant1;
CREATE TABLE tenant1."User" (...);
```

**Pros:** Less overhead than separate databases

**Cons:** Still some overhead (every migration runs once per tenant schema)

### Pattern 3: Shared Schema + Tenant ID

```sql
CREATE TABLE "User" (
  id       BIGINT PRIMARY KEY,
  tenantId BIGINT NOT NULL,
  email    VARCHAR(255),
  UNIQUE (tenantId, email)  -- unique per tenant, not globally
);
```

**Pros:** Most resource-efficient

**Cons:** Every query must filter by tenantId; one missed filter leaks data across tenants
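
One way to make the tenant filter hard to forget is to route every query through an object that binds `tenantId` itself. The sketch below assumes SQLite via Python's `sqlite3`. The `TenantScopedUsers` class and the sample tenants are hypothetical. Database features such as PostgreSQL row-level security are an alternative, not shown here.

```python
import sqlite3

class TenantScopedUsers:
    """Hypothetical repository: every query is forced through the tenant filter."""

    def __init__(self, conn: sqlite3.Connection, tenant_id: int):
        self.conn = conn
        self.tenant_id = tenant_id

    def add(self, user_id: int, email: str) -> None:
        with self.conn:
            self.conn.execute(
                'INSERT INTO "User" (id, tenantId, email) VALUES (?, ?, ?)',
                (user_id, self.tenant_id, email))

    def find_by_email(self, email: str):
        # tenantId always comes from the repository, never from the caller.
        row = self.conn.execute(
            'SELECT id FROM "User" WHERE tenantId = ? AND email = ?',
            (self.tenant_id, email)).fetchone()
        return row[0] if row else None

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE "User" (
      id       INTEGER PRIMARY KEY,
      tenantId INTEGER NOT NULL,
      email    TEXT,
      UNIQUE (tenantId, email)
    )
""")
acme, globex = TenantScopedUsers(conn, 1), TenantScopedUsers(conn, 2)
acme.add(10, "alice@example.com")
globex.add(20, "alice@example.com")               # same email, other tenant: allowed
print(acme.find_by_email("alice@example.com"))    # 10
print(globex.find_by_email("alice@example.com"))  # 20
```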

**Recommendation:** Pattern 3 for typical SaaS; Pattern 1 for regulated industries that require strict data isolation.

---

## 10. Performance

### Indexes

**Covering index** (includes every column the query reads, so the query can be answered from the index alone):

```sql
-- INCLUDE is PostgreSQL 11+ / SQL Server syntax
CREATE INDEX idx_user_status ON "User"(status) INCLUDE (name, email);
```

**Composite index** (column order matters: put equality-filtered columns before range-filtered ones):

```sql
-- Good for: WHERE tenantId = X AND createdAt > Y
CREATE INDEX idx_tenant_date ON "Order"(tenantId, createdAt);
```

**Partial index** (indexes only matching rows, reducing size):

```sql
CREATE INDEX idx_active ON "User"(email) WHERE deletedAt IS NULL;
```

### Partitioning

**Horizontal** (split rows by range within one database; sharding applies the same idea across servers):

```sql
-- PostgreSQL declarative partitioning; the TO bound is exclusive
CREATE TABLE "Order" (...) PARTITION BY RANGE (createdAt);
CREATE TABLE Order_2024_Q1 PARTITION OF "Order"
  FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');
```

**Vertical:** Split hot and cold columns into separate tables.

---

## 11. Common Advanced Patterns

### Soft Deletes

```sql
ALTER TABLE "User" ADD COLUMN deletedAt TIMESTAMP NULL;
-- Every query for live rows must add: WHERE deletedAt IS NULL
```

### Audit Columns

```sql
-- Column definitions to add to a table. ON UPDATE is MySQL syntax;
-- PostgreSQL needs a trigger to maintain updatedAt.
createdAt TIMESTAMP NOT NULL DEFAULT NOW(),
updatedAt TIMESTAMP NOT NULL DEFAULT NOW() ON UPDATE NOW(),
createdBy BIGINT REFERENCES User(id),
updatedBy BIGINT REFERENCES User(id)
```

### State Machines

```sql
CREATE TABLE OrderState (
  orderId        BIGINT      NOT NULL REFERENCES "Order"(id),
  state          VARCHAR(20) NOT NULL,
  transitionedAt TIMESTAMP   NOT NULL DEFAULT NOW(),
  PRIMARY KEY (orderId, transitionedAt)
);
-- Track: draft → pending → confirmed → shipped → delivered
```
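
OrderState records which states an order passed through, but the table alone does not stop illegal jumps such as draft → shipped. The sketch below validates transitions in application code, assuming the linear lifecycle in the comment above. `ALLOWED` and `transition` are hypothetical names, and `history` stands in for an order's OrderState rows in time order.

```python
# Allowed moves for the lifecycle draft → pending → confirmed → shipped → delivered.
ALLOWED = {
    "draft": {"pending"},
    "pending": {"confirmed"},
    "confirmed": {"shipped"},
    "shipped": {"delivered"},
    "delivered": set(),  # terminal state
}

def transition(history: list, new_state: str) -> list:
    """Return history with new_state appended, or raise ValueError if the move is illegal."""
    current = history[-1] if history else None
    if current is None:
        if new_state != "draft":
            raise ValueError("orders must start in 'draft'")
    elif new_state not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current} -> {new_state}")
    return history + [new_state]

h = []
for s in ["draft", "pending", "confirmed"]:
    h = transition(h, s)
print(h)  # ['draft', 'pending', 'confirmed']
```

Keeping the rules in one table-like mapping makes adding a state (for example, a cancellation path) a one-line change.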

### Idempotency Keys

```sql
CREATE TABLE Request (
  idempotencyKey UUID PRIMARY KEY,  -- client-supplied; the PK rejects duplicates
  payload        JSON,
  result         JSON,
  processedAt    TIMESTAMP
);
-- Prevents duplicate processing: a retry with the same key gets the stored result
```
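
The sketch below uses a "claim the key first" approach. It inserts the key before doing any work, so a duplicate request fails on the primary key and receives the stored result instead of triggering the side effect again. It assumes SQLite via Python's `sqlite3`, which has no UUID type, so keys are stored as text. `charge` and `handle` are hypothetical. Failure handling is omitted: a crash after claiming would leave the key marked in progress.

```python
import json
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Request (
      idempotencyKey TEXT PRIMARY KEY,
      payload        TEXT,
      result         TEXT,   -- NULL while the request is in progress
      processedAt    TEXT
    )
""")

calls = []  # records how many times the side effect actually ran

def charge(payload: dict) -> dict:
    """Hypothetical side-effecting operation that must run at most once per key."""
    calls.append(payload)
    return {"status": "charged", "amount": payload["amount"]}

def handle(key: str, payload: dict) -> dict:
    try:
        with conn:
            # Claim the key first: a duplicate request fails on the primary key.
            conn.execute("INSERT INTO Request (idempotencyKey, payload) VALUES (?, ?)",
                         (key, json.dumps(payload)))
    except sqlite3.IntegrityError:
        (stored,) = conn.execute(
            "SELECT result FROM Request WHERE idempotencyKey = ?", (key,)).fetchone()
        if stored is None:
            raise RuntimeError("a request with this key is still in progress")
        return json.loads(stored)  # replay the stored result; no second side effect
    result = charge(payload)
    with conn:
        conn.execute(
            "UPDATE Request SET result = ?, processedAt = CURRENT_TIMESTAMP "
            "WHERE idempotencyKey = ?", (json.dumps(result), key))
    return result

key = str(uuid.uuid4())
first = handle(key, {"amount": 500})
retry = handle(key, {"amount": 500})
print(first == retry, len(calls))  # True 1
```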
330
skills/data-schema-knowledge-modeling/resources/template.md
Normal file
@@ -0,0 +1,330 @@

# Data Schema & Knowledge Modeling Template

## Workflow

Copy this checklist and track your progress:

```
Data Schema & Knowledge Modeling Progress:
- [ ] Step 1: Gather domain requirements and scope
- [ ] Step 2: Identify entities and attributes systematically
- [ ] Step 3: Define relationships and cardinality
- [ ] Step 4: Specify constraints and invariants
- [ ] Step 5: Validate against use cases and document
```

**Step 1: Gather domain requirements and scope**

Ask the user for a domain description, core use cases, existing data sources, scale requirements, and technology stack. Use [Input Questions](#input-questions).

**Step 2: Identify entities and attributes systematically**

Extract entities from the requirements using [Entity Identification](#entity-identification). Define attributes with types and nullability using [Attribute Guide](#attribute-guide).

**Step 3: Define relationships and cardinality**

Map entity connections using [Relationship Mapping](#relationship-mapping). Specify cardinality (1:1, 1:N, M:N) and optionality.

**Step 4: Specify constraints and invariants**

Define business rules and constraints using [Constraint Specification](#constraint-specification). Document domain invariants.

**Step 5: Validate against use cases and document**

Create `data-schema-knowledge-modeling.md` using [Template](#schema-documentation-template). Verify it with the [Validation Checklist](#validation-checklist).

---

## Input Questions

**Domain & Scope:**
- What domain? (e-commerce, healthcare, social network)
- What is in and out of scope?
- Existing schemas to integrate or migrate from?

**Core Use Cases:**
- Primary operations? (CRUD for which entities?)
- Required queries and reports?
- Access patterns? (read-heavy, write-heavy, mixed)

**Scale & Performance:**
- Data volume? (rows per table, storage)
- Growth rate? (daily/monthly)
- Performance SLAs?

**Technology:**
- Database? (PostgreSQL, MongoDB, Neo4j, etc.)
- Compliance? (GDPR, HIPAA, SOC 2)
- Evolution needs? (schema versioning, migrations)

---

## Entity Identification

**Step 1: Extract nouns**

List the nouns in the requirements; each is a candidate entity.

**Step 2: Validate**

For each candidate, check:
- [ ] Distinct identity? (can you point to "this specific X"?)
- [ ] Independent lifecycle?
- [ ] Multiple attributes beyond a name?
- [ ] Need to track multiple instances?

**Keep** it if most answers are yes. **Reject** it if it is really just an attribute of another entity.

**Step 3: Entity vs Value Object**

- **Entity**: Has an ID that stays the same while its attributes change (User, Order)
- **Value Object**: No ID, immutable, defined entirely by its values (Address, Money)
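
The sketch below illustrates the distinction with Python dataclasses. Equality follows values for a value object and follows identity for an entity. The `Money` and `User` classes are hypothetical illustrations, not a prescribed implementation.

```python
from dataclasses import dataclass
from decimal import Decimal

@dataclass(frozen=True)
class Money:
    """Value object: no ID, immutable, equal whenever its values are equal."""
    amount: Decimal
    currency: str

    def add(self, other: "Money") -> "Money":
        if self.currency != other.currency:
            raise ValueError("currency mismatch")
        return Money(self.amount + other.amount, self.currency)  # a new object, never mutation

@dataclass(eq=False)
class User:
    """Entity: identified by id; other attributes may change over its lifecycle."""
    id: int
    email: str

    def __eq__(self, other):
        return isinstance(other, User) and self.id == other.id

    def __hash__(self):
        return hash(self.id)

print(Money(Decimal("5.00"), "USD") == Money(Decimal("5.00"), "USD"))  # True: same values
a = User(1, "alice@example.com")
b = User(1, "alice@new.example")
print(a == b)  # True: same identity despite a different email
```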

**Step 4: Document**

```markdown
### Entity: [Name]
**Purpose:** [What it represents]
**Examples:** [2-3 concrete cases]
**Lifecycle:** [Creation → deletion]
**Invariants:** [Rules that must hold]
```

---

## Attribute Guide

**Template:**
```
attributeName: DataType [NULL|NOT NULL] [DEFAULT value]
- Description: [What it represents]
- Validation: [Constraints]
- Examples: [Sample values]
```

**Standard attributes:**
- `id`: Primary key (UUID or BIGINT)
- `createdAt`: TIMESTAMP NOT NULL
- `updatedAt`: TIMESTAMP NOT NULL
- `deletedAt`: TIMESTAMP NULL (soft deletes)

**Data types:**

| Data | SQL | NoSQL | Notes |
|------|-----|-------|-------|
| Short text | VARCHAR(N) | String | Specify a max length |
| Long text | TEXT | String | Large, DB-dependent maximum |
| Integer | INT/BIGINT | Number | Size for the expected range |
| Decimal | DECIMAL(P,S) | Number | Fixed precision |
| Money | DECIMAL(19,4) | {amount, currency} | Never FLOAT |
| Boolean | BOOLEAN | Boolean | Prefer NOT NULL |
| Date/Time | TIMESTAMP WITH TIME ZONE | ISODate | Store with time zone |
| UUID | UUID/CHAR(36) | String | Distributed IDs |
| JSON | JSON/JSONB | Object | Flexible structure |
| Enum | ENUM/VARCHAR | String | Fixed set of values |

**Nullability:**
- NOT NULL if required
- NULL if optional or unknown at creation
- Avoid NULL for booleans (it adds a third "unknown" state)

## Relationship Mapping

**Cardinality:**

**1:1** - User has one Profile
- SQL: `Profile.userId UNIQUE NOT NULL REFERENCES User(id)`

**1:N** - User has many Orders
- SQL: `Order.userId NOT NULL REFERENCES User(id)`

**M:N** - Order contains Products
- Junction table:
```sql
CREATE TABLE OrderItem (
  orderId   BIGINT        NOT NULL REFERENCES "Order"(id),
  productId BIGINT        NOT NULL REFERENCES Product(id),
  quantity  INT           NOT NULL,
  price     DECIMAL(10,2) NOT NULL,  -- unit price at time of order
  PRIMARY KEY (orderId, productId)
);
```

**Optionality:**
- Required: NOT NULL FK
- Optional: nullable FK

**Naming:**
Use verbs: User **owns** Order, Product **belongs to** Category

---

## Constraint Specification

**Primary Keys:**
```sql
id BIGINT PRIMARY KEY AUTO_INCREMENT           -- MySQL
-- or --
id UUID PRIMARY KEY DEFAULT gen_random_uuid()  -- PostgreSQL 13+
```

**Unique:**
```sql
email VARCHAR(255) UNIQUE NOT NULL
UNIQUE (userId, productId)  -- composite
```

**Foreign Keys:**
```sql
userId BIGINT NOT NULL REFERENCES "User"(id) ON DELETE CASCADE
-- Options: CASCADE, SET NULL, RESTRICT
```

**Check Constraints:**
```sql
price DECIMAL(10,2) CHECK (price >= 0)
status VARCHAR(20) CHECK (status IN ('draft','pending','completed'))
```

**Domain Invariants:**

Document business rules:
```markdown
### Invariant: Order total = sum of items
Order.total = SUM(OrderItem.quantity * OrderItem.price)

### Invariant: Unique email
No duplicate emails (case-insensitive)
```

Enforce via DB constraints (preferred), triggers, or application logic.
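
The case-insensitive "Unique email" invariant can be enforced in the database with a unique index on `lower(email)`. The sketch below assumes SQLite via Python's `sqlite3`; SQLite 3.9+ supports indexes on expressions, and PostgreSQL accepts the same `CREATE UNIQUE INDEX ... (lower(email))` form. `register` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "User" (
      id    INTEGER PRIMARY KEY,
      email TEXT NOT NULL
    );
    -- Expression index: uniqueness is checked on the lowercased email.
    CREATE UNIQUE INDEX uq_user_email_ci ON "User"(lower(email));
""")

def register(conn, user_id: int, email: str) -> bool:
    """Return True if the email is new (ignoring case), False otherwise."""
    try:
        with conn:
            conn.execute('INSERT INTO "User" (id, email) VALUES (?, ?)', (user_id, email))
        return True
    except sqlite3.IntegrityError:
        return False

print(register(conn, 1, "Alice@Example.com"))  # True
print(register(conn, 2, "alice@example.com"))  # False: same email ignoring case
```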

---

## Schema Documentation Template

Create: `data-schema-knowledge-modeling.md`

**Required sections:**

1. **Domain Overview** - Purpose, scope, technology
2. **Use Cases** - Primary operations, query patterns
3. **Entity Definitions** - For each entity:
   - Purpose, examples, lifecycle
   - Attributes table (name, type, null, default, constraints, description)
   - Relationships (cardinality, FK, optionality)
   - Invariants
4. **ERD** - Visual or text diagram showing relationships
5. **Constraints** - DB constraints, domain invariants
6. **Normalization** - Level, denormalization decisions
7. **Implementation** - SQL DDL / JSON Schema / graph schema as appropriate
8. **Validation** - Check that each use case is supported
9. **Open Questions** - Unresolved decisions

**Example entity definition:**

```markdown
### Entity: Order

**Purpose:** Represents a customer purchase transaction
**Examples:** Amazon order #123, Shopify order #456
**Lifecycle:** Created on checkout → Updated during fulfillment → Completed on delivery

#### Attributes

| Attribute | Type | Null? | Default | Constraints | Description |
|---|---|---|---|---|---|
| id | BIGINT | NO | auto | PK | Unique identifier |
| userId | BIGINT | NO | - | FK→User | Customer who placed the order |
| status | VARCHAR(20) | NO | 'pending' | CHECK IN(...) | Order status |
| total | DECIMAL(10,2) | NO | - | CHECK >= 0 | Order total |

#### Relationships
- **belongs to:** N:1 with User (Order.userId → User.id; a User has many Orders)
- **contains:** 1:N with OrderItem (junction table between Order and Product)

#### Invariants
- total = SUM(OrderItem.quantity * OrderItem.price)
- status transitions: pending → confirmed → shipped → delivered
```

---

## Validation Checklist

**Completeness:**
- [ ] All entities identified
- [ ] All attributes defined (types, nullability)
- [ ] All relationships mapped (cardinality)
- [ ] All constraints specified
- [ ] All invariants documented

**Correctness:**
- [ ] Each entity has a distinct purpose
- [ ] No redundant entities
- [ ] Attributes placed in the correct entities
- [ ] Cardinality reflects reality
- [ ] Constraints enforce the business rules

**Use Case Coverage:**
- [ ] Supports all required CRUD operations
- [ ] All required queries answerable
- [ ] Indexes planned
- [ ] No missing joins

**Normalization:**
- [ ] No partial dependencies (2NF)
- [ ] No transitive dependencies (3NF)
- [ ] Denormalization documented
- [ ] No update anomalies

**Technical Quality:**
- [ ] Consistent naming
- [ ] Appropriate data types
- [ ] Primary keys defined
- [ ] Foreign keys maintain integrity
- [ ] Soft delete strategy (if needed)
- [ ] Audit fields (if needed)

**Future-Proofing:**
- [ ] Schema extensible
- [ ] Migration path (if applicable)
- [ ] Versioning strategy
- [ ] No undocumented technical debt

---

## Common Pitfalls

**1. Modeling implementation, not domain**
- Symptom: Entities like "UserSession" or "Cache"
- Fix: Model real-world domain concepts only

**2. God entities**
- Symptom: User with 50+ attributes
- Fix: Extract cohesive groups of attributes into separate entities

**3. Missing junction tables**
- Symptom: M:N relationships stored as FKs on one side
- Fix: Always use a junction table

**4. Nullable FKs without reason**
- Symptom: All relationships optional
- Fix: Use NOT NULL unless the relationship is truly optional

**5. Not enforcing invariants**
- Symptom: Rules exist only in documentation
- Fix: Enforce with CHECK constraints, triggers, or application validation

**6. Premature denormalization**
- Symptom: Duplicating data without measurement
- Fix: Normalize first; denormalize only after profiling

**7. Wrong data types**
- Symptom: Money stored as VARCHAR
- Fix: DECIMAL for money and appropriate types everywhere else

**8. No migration strategy**
- Symptom: Schema can't change without breaking the application
- Fix: Versioning and backward-compatible changes