Files
gh-lyndonkl-claude/skills/data-schema-knowledge-modeling/resources/template.md
2025-11-30 08:38:26 +08:00

8.6 KiB

Data Schema & Knowledge Modeling Template

Workflow

Copy this checklist and track your progress:

Data Schema & Knowledge Modeling Progress:
- [ ] Step 1: Gather domain requirements and scope
- [ ] Step 2: Identify entities and attributes systematically
- [ ] Step 3: Define relationships and cardinality
- [ ] Step 4: Specify constraints and invariants
- [ ] Step 5: Validate against use cases and document

Step 1: Gather domain requirements and scope

Ask user for domain description, core use cases, existing data sources, scale requirements, and technology stack. Use Input Questions.

Step 2: Identify entities and attributes systematically

Extract entities from requirements using Entity Identification. Define attributes with types and nullability using Attribute Guide.

Step 3: Define relationships and cardinality

Map entity connections using Relationship Mapping. Specify cardinality (1:1, 1:N, M:N) and optionality.

Step 4: Specify constraints and invariants

Define business rules and constraints using Constraint Specification. Document domain invariants.

Step 5: Validate against use cases and document

Create data-schema-knowledge-modeling.md using Template. Verify using Validation Checklist.


Input Questions

Domain & Scope:

  • What domain? (e-commerce, healthcare, social network)
  • Boundaries? In/out of scope?
  • Existing schemas to integrate/migrate from?

Core Use Cases:

  • Primary operations? (CRUD for which entities?)
  • Required queries/reports?
  • Access patterns? (read-heavy, write-heavy, mixed)

Scale & Performance:

  • Data volume? (rows per table, storage)
  • Growth rate? (daily/monthly)
  • Performance SLAs?

Technology:

  • Database? (PostgreSQL, MongoDB, Neo4j, etc.)
  • Compliance? (GDPR, HIPAA, SOC2)
  • Evolution needs? (schema versioning, migrations)

Entity Identification

Step 1: Extract nouns

List nouns from requirements = candidate entities.

Step 2: Validate

For each, check:

  • Distinct identity? (can point to "this specific X")
  • Independent lifecycle?
  • Multiple attributes beyond name?
  • Track multiple instances?

Keep if yes to most. Reject if just an attribute.

Step 3: Entity vs Value Object

  • Entity: Has ID, mutable (User, Order)
  • Value Object: No ID, immutable (Address, Money)

Step 4: Document

### Entity: [Name]
**Purpose:** [What it represents]
**Examples:** [2-3 concrete cases]
**Lifecycle:** [Creation → deletion]
**Invariants:** [Rules that must hold]

Attribute Guide

Template:

attributeName: DataType [NULL|NOT NULL] [DEFAULT value]
  - Description: [What it represents]
  - Validation: [Constraints]
  - Examples: [Sample values]

Standard attributes:

  • id: Primary key (UUID/BIGINT)
  • createdAt: TIMESTAMP NOT NULL
  • updatedAt: TIMESTAMP NOT NULL
  • deletedAt: TIMESTAMP NULL (soft deletes)

Data types:

Data SQL NoSQL Notes
Short text VARCHAR(N) String Specify max
Long text TEXT String No limit
Integer INT/BIGINT Number Choose size
Decimal DECIMAL(P,S) Number Fixed precision
Money DECIMAL(19,4) {amount,currency} Never FLOAT
Boolean BOOLEAN Boolean Not nullable
Date/Time TIMESTAMP ISODate With timezone
UUID UUID/CHAR(36) String Distributed IDs
JSON JSON/JSONB Object Flexible
Enum ENUM/VARCHAR String Fixed values

Nullability:

  • NOT NULL if required
  • NULL if optional/unknown at creation
  • Avoid NULL for booleans

Relationship Mapping

Cardinality:

1:1 - User has one Profile

  • SQL: Profile.userId UNIQUE NOT NULL REFERENCES User(id)

1:N - User has many Orders

  • SQL: Order.userId NOT NULL REFERENCES User(id)

M:N - Order contains Products

  • Junction table:
    OrderItem (
      orderId REFERENCES Order(id),
      productId REFERENCES Product(id),
      quantity INT NOT NULL,
      PRIMARY KEY (orderId, productId)
    )
    

Optionality:

  • Required: NOT NULL
  • Optional: NULL

Naming: Use verbs: User owns Order, Product belongs to Category


Constraint Specification

Primary Keys:

id BIGINT PRIMARY KEY AUTO_INCREMENT
-- or --
id UUID PRIMARY KEY DEFAULT gen_random_uuid()

Unique:

email VARCHAR(255) UNIQUE NOT NULL
UNIQUE (userId, productId)  -- composite

Foreign Keys:

userId BIGINT NOT NULL REFERENCES User(id) ON DELETE CASCADE
-- Options: CASCADE, SET NULL, RESTRICT

Check Constraints:

price DECIMAL(10,2) CHECK (price >= 0)
status VARCHAR(20) CHECK (status IN ('draft','pending','completed'))

Domain Invariants:

Document business rules:

### Invariant: Order total = sum of items
Order.total = SUM(OrderItem.quantity * OrderItem.price)

### Invariant: Unique email
No duplicate emails (case-insensitive)

Enforce via: DB constraints (preferred), application logic, or triggers.


Schema Documentation Template

Create: data-schema-knowledge-modeling.md

Required sections:

  1. Domain Overview - Purpose, scope, technology
  2. Use Cases - Primary operations, query patterns
  3. Entity Definitions - For each entity:
    • Purpose, examples, lifecycle
    • Attributes table (name, type, null, default, constraints, description)
    • Relationships (cardinality, FK, optionality)
    • Invariants
  4. ERD - Visual/text diagram showing relationships
  5. Constraints - DB constraints, domain invariants
  6. Normalization - Level, denormalization decisions
  7. Implementation - SQL DDL / JSON Schema / Graph schema as appropriate
  8. Validation - Check each use case is supported
  9. Open Questions - Unresolved decisions

Example entity definition:

### Entity: Order

**Purpose:** Represents customer purchase transaction
**Examples:** Amazon order #123, Shopify order #456
**Lifecycle:** Created on checkout → Updated during fulfillment → Completed on delivery

#### Attributes

| Attribute | Type | Null? | Default | Constraints | Description |
|---|---|---|---|---|---|
| id | BIGINT | NO | auto | PK | Unique identifier |
| userId | BIGINT | NO | - | FK→User | Customer who placed order |
| status | VARCHAR(20) | NO | 'pending' | CHECK IN(...) | Order status |
| total | DECIMAL(10,2) | NO | - | CHECK >= 0 | Order total |

#### Relationships
- **belongs to:** 1:N with User (Order.userId → User.id)
- **contains:** 1:N with OrderItem junction table

#### Invariants
- total = SUM(OrderItem.quantity * OrderItem.price)
- status transitions: pending → confirmed → shipped → delivered

Validation Checklist

Completeness:

  • All entities identified
  • All attributes defined (types, nullability)
  • All relationships mapped (cardinality)
  • All constraints specified
  • All invariants documented

Correctness:

  • Each entity distinct purpose
  • No redundant entities
  • Attributes in correct entities
  • Cardinality reflects reality
  • Constraints enforce rules

Use Case Coverage:

  • Supports all CRUD operations
  • All queries answerable
  • Indexes planned
  • No missing joins

Normalization:

  • No partial dependencies (2NF)
  • No transitive dependencies (3NF)
  • Denormalization documented
  • No update anomalies

Technical Quality:

  • Consistent naming
  • Appropriate data types
  • Primary keys defined
  • Foreign keys maintain integrity
  • Soft delete strategy (if needed)
  • Audit fields (if needed)

Future-Proofing:

  • Schema extensible
  • Migration path (if applicable)
  • Versioning strategy
  • No technical debt

Common Pitfalls

1. Modeling implementation, not domain

  • Symptom: Entities like "UserSession", "Cache"
  • Fix: Model real-world concepts only

2. God entities

  • Symptom: User with 50+ attributes
  • Fix: Extract to separate entities

3. Missing junction tables

  • Symptom: M:N with FKs
  • Fix: Always use junction table

4. Nullable FKs without reason

  • Symptom: All relationships optional
  • Fix: NOT NULL unless truly optional

5. Not enforcing invariants

  • Symptom: Rules in docs only
  • Fix: CHECK constraints, triggers, app validation

6. Premature denormalization

  • Symptom: Duplicating without measurement
  • Fix: Normalize first, denormalize after profiling

7. Wrong data types

  • Symptom: Money as VARCHAR
  • Fix: DECIMAL for money, proper types for all

8. No migration strategy

  • Symptom: Can't change schema
  • Fix: Versioning, backward-compat changes