Files
2025-11-29 17:59:49 +08:00

815 lines
21 KiB
Markdown
Raw Permalink Blame History

This file contains invisible Unicode characters
This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Backend Architecture
**Comprehensive patterns for APIs, databases, and server-side systems**
Consolidated from:
- backend-architect skills
- api-developer skills
- database-architect skills
---
# Schema Design Skill
Expert patterns for relational database schema design, normalization, and constraint management.
## Core Principles
### 1. Normalization Levels
**1NF (First Normal Form)**:
- Atomic values only (no arrays, no comma-separated lists)
- Each column contains single value
- No repeating groups
**2NF (Second Normal Form)**:
- Must be in 1NF
- No partial dependencies on composite primary keys
- Every non-key column depends on the entire primary key
**3NF (Third Normal Form)**:
- Must be in 2NF
- No transitive dependencies
- Every non-key column depends only on the primary key
**BCNF (Boyce-Codd Normal Form)**:
- Must be in 3NF
- Every determinant is a candidate key
**Strategic Denormalization**:
- Only denormalize with performance data justification
- Document the trade-off
- Consider materialized views instead
- Plan for data consistency maintenance
### 2. Primary Key Selection
**UUID (Recommended for distributed systems)**:
```sql
id UUID PRIMARY KEY DEFAULT uuid_generate_v4()
```
- Pros: Globally unique, no coordination needed, harder to enumerate
- Cons: Larger storage (16 bytes), random order (index fragmentation)
**Auto-increment Integer**:
```sql
id SERIAL PRIMARY KEY -- PostgreSQL
id INT AUTO_INCREMENT PRIMARY KEY -- MySQL
id INTEGER PRIMARY KEY AUTOINCREMENT -- SQLite
```
- Pros: Small storage (4-8 bytes), sequential (better index performance)
- Cons: Coordination needed, easy to enumerate, not globally unique
**Composite Keys** (for junction tables):
```sql
PRIMARY KEY (user_id, role_id)
```
### 3. Foreign Key Constraints
**Always define foreign keys** for referential integrity:
```sql
CONSTRAINT fk_orders_customer
FOREIGN KEY (customer_id)
REFERENCES customers(id)
ON DELETE CASCADE -- or RESTRICT, SET NULL
ON UPDATE CASCADE
```
**ON DELETE options**:
- `CASCADE`: Delete child rows when parent deleted
- `RESTRICT`: Prevent delete if children exist
- `SET NULL`: Set foreign key to NULL
- `NO ACTION`: Similar to RESTRICT (database-specific)
### 4. Check Constraints
**Use check constraints for business rules**:
```sql
-- Email format validation
CONSTRAINT email_format CHECK (
email ~* '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}$'
)
-- Positive values
CONSTRAINT total_positive CHECK (total >= 0)
-- Enum-like values
CONSTRAINT valid_status CHECK (
status IN ('pending', 'processing', 'completed', 'cancelled')
)
-- Date ranges
CONSTRAINT valid_date_range CHECK (end_date > start_date)
```
### 5. Index Strategy
**Index Types and When to Use**:
**B-Tree (Default)**:
- WHERE clauses: `WHERE status = 'active'`
- ORDER BY: `ORDER BY created_at DESC`
- Range queries: `WHERE price BETWEEN 10 AND 100`
- Joins: Foreign key columns
**GIN (PostgreSQL - Generalized Inverted Index)**:
- JSONB columns: `WHERE data @> '{"key": "value"}'`
- Arrays: `WHERE tags @> ARRAY['postgresql']`
- Full-text search: `WHERE to_tsvector(text) @@ to_tsquery('search')`
**GiST (PostgreSQL - Generalized Search Tree)**:
- Geometric data: `WHERE location && box '((0,0),(1,1))'`
- Full-text search: Alternative to GIN
- Range types: `WHERE daterange && '[2025-01-01, 2025-12-31]'`
**Hash (Limited use)**:
- Equality only: `WHERE id = 123`
- Not recommended (B-tree usually better)
**Composite Index Column Order**:
```sql
-- Rule: Most selective column first, or most commonly filtered
CREATE INDEX idx_orders_status_created ON orders(status, created_at DESC);
-- Works for:
-- WHERE status = 'pending' ✅
-- WHERE status = 'pending' AND created_at > NOW() - INTERVAL '7 days' ✅
-- WHERE status = 'pending' ORDER BY created_at DESC ✅
-- Does NOT work for:
-- WHERE created_at > NOW() - INTERVAL '7 days' ❌ (doesn't start with status)
```
## Schema Patterns
### Pattern 1: Soft Delete
```sql
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
email VARCHAR(255) NOT NULL,
name VARCHAR(255) NOT NULL,
deleted_at TIMESTAMP NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Partial unique index (only for non-deleted rows)
CREATE UNIQUE INDEX idx_users_email_active
ON users(email)
WHERE deleted_at IS NULL;
-- Query pattern: Always filter deleted
SELECT * FROM users WHERE deleted_at IS NULL;
```
### Pattern 2: Audit Trail
```sql
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
email VARCHAR(255) NOT NULL,
name VARCHAR(255) NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
created_by UUID REFERENCES users(id),
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_by UUID REFERENCES users(id)
);
-- Separate audit log table for full history
CREATE TABLE users_audit (
audit_id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
user_id UUID NOT NULL,
operation VARCHAR(10) NOT NULL, -- INSERT, UPDATE, DELETE
old_values JSONB,
new_values JSONB,
changed_by UUID REFERENCES users(id),
changed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```
### Pattern 3: Many-to-Many with Metadata
```sql
-- Junction table with additional attributes
CREATE TABLE user_roles (
user_id UUID REFERENCES users(id) ON DELETE CASCADE,
role_id UUID REFERENCES roles(id) ON DELETE CASCADE,
granted_by UUID REFERENCES users(id),
granted_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
expires_at TIMESTAMP,
PRIMARY KEY (user_id, role_id)
);
CREATE INDEX idx_user_roles_user ON user_roles(user_id);
CREATE INDEX idx_user_roles_role ON user_roles(role_id);
CREATE INDEX idx_user_roles_expires ON user_roles(expires_at)
WHERE expires_at IS NOT NULL;
```
### Pattern 4: Hierarchical Data (Adjacency List)
```sql
CREATE TABLE categories (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
name VARCHAR(255) NOT NULL,
parent_id UUID REFERENCES categories(id),
path TEXT, -- Materialized path: /electronics/computers/laptops
level INT, -- Denormalized for performance
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_categories_parent ON categories(parent_id);
CREATE INDEX idx_categories_path ON categories(path);
```
### Pattern 5: Polymorphic Associations (Avoid if Possible)
**❌ Problematic Approach**:
```sql
-- Weak referential integrity
CREATE TABLE comments (
id UUID PRIMARY KEY,
content TEXT NOT NULL,
commentable_type VARCHAR(50), -- 'Post' or 'Photo'
commentable_id UUID, -- No real foreign key!
created_at TIMESTAMP
);
```
**✅ Better Approach (Exclusive Arcs)**:
```sql
CREATE TABLE comments (
id UUID PRIMARY KEY,
content TEXT NOT NULL,
post_id UUID REFERENCES posts(id) ON DELETE CASCADE,
photo_id UUID REFERENCES photos(id) ON DELETE CASCADE,
created_at TIMESTAMP,
-- Exactly one must be set
CONSTRAINT one_commentable CHECK (
(post_id IS NOT NULL AND photo_id IS NULL) OR
(post_id IS NULL AND photo_id IS NOT NULL)
)
);
CREATE INDEX idx_comments_post ON comments(post_id);
CREATE INDEX idx_comments_photo ON comments(photo_id);
```
## Naming Conventions
**Tables**: Plural nouns, lowercase, underscores
```
users, orders, order_items, user_preferences
```
**Columns**: Singular nouns, lowercase, underscores
```
id, email, first_name, created_at, customer_id
```
**Primary Keys**: Always `id`
```
id UUID PRIMARY KEY
```
**Foreign Keys**: `{referenced_table_singular}_id`
```
customer_id, product_id, user_id
```
**Indexes**: `idx_{table}_{column(s)}[_{condition}]`
```
idx_users_email
idx_orders_customer_id
idx_orders_status_created
idx_users_email_active (partial index)
```
**Constraints**: `{type}_{table}_{description}`
```
pk_users (primary key)
fk_orders_customer (foreign key)
uq_users_email (unique)
ck_orders_total_positive (check)
```
## Common Anti-Patterns to Avoid
**❌ Generic JSON Columns (EAV Pattern)**:
```sql
-- Bad: No schema, no constraints, no indexes
CREATE TABLE entities (
id UUID PRIMARY KEY,
type VARCHAR(50),
attributes JSONB
);
```
**❌ Comma-Separated Lists**:
```sql
-- Bad: Violates 1NF, can't join efficiently
CREATE TABLE users (
id UUID PRIMARY KEY,
tags TEXT -- 'javascript,python,sql'
);
```
**✅ Use junction table instead**:
```sql
CREATE TABLE user_tags (
user_id UUID REFERENCES users(id),
tag_id UUID REFERENCES tags(id),
PRIMARY KEY (user_id, tag_id)
);
```
**❌ Nullable Boolean Columns**:
```sql
-- Bad: Three states (true, false, null) - ambiguous
is_active BOOLEAN NULL
```
**✅ Be explicit**:
```sql
-- Good: Two clear states
is_active BOOLEAN NOT NULL DEFAULT true
```
## ER Diagram Notation (Mermaid)
```mermaid
erDiagram
CUSTOMER ||--o{ ORDER : places
CUSTOMER {
uuid id PK "Primary Key"
string email UK "Unique Key"
string name
timestamp created_at
}
ORDER ||--|{ LINE_ITEM : contains
ORDER {
uuid id PK
uuid customer_id FK
decimal total
string status
}
PRODUCT ||--o{ LINE_ITEM : "ordered in"
LINE_ITEM {
uuid id PK
uuid order_id FK
uuid product_id FK
int quantity
decimal unit_price
}
```
**Cardinality Symbols**:
- `||--||` : One to exactly one
- `||--o|` : One to zero or one
- `||--o{` : One to zero or more
- `}|--|{` : One or more to one or more
## Database-Specific Best Practices
### PostgreSQL
```sql
-- Enable UUID extension
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
-- Use JSONB (not JSON) for better performance
metadata JSONB
-- Use array types when appropriate
tags TEXT[]
-- Use full-text search
CREATE INDEX idx_products_search ON products
USING GIN (to_tsvector('english', name || ' ' || description));
-- Use enums for fixed sets
CREATE TYPE order_status AS ENUM ('pending', 'processing', 'completed', 'cancelled');
```
### MySQL
```sql
-- Use InnoDB engine (default in 8.0+)
ENGINE=InnoDB
-- Use UTF8MB4 for full Unicode support (including emoji)
DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
-- Use generated columns for computed values
price_with_tax DECIMAL(10,2) GENERATED ALWAYS AS (price * 1.20) STORED
-- Partition large tables
PARTITION BY RANGE (YEAR(created_at)) (
PARTITION p2023 VALUES LESS THAN (2024),
PARTITION p2024 VALUES LESS THAN (2025),
PARTITION p2025 VALUES LESS THAN (2026)
);
```
### SQLite
```sql
-- Use STRICT tables for type enforcement (3.37+)
CREATE TABLE users (
id INTEGER PRIMARY KEY,
email TEXT NOT NULL,
age INTEGER NOT NULL
) STRICT;
-- Use WITHOUT ROWID for space efficiency
CREATE TABLE user_settings (
user_id INTEGER PRIMARY KEY,
theme TEXT NOT NULL,
locale TEXT NOT NULL
) WITHOUT ROWID;
-- Use triggers for complex constraints
CREATE TRIGGER check_age_before_insert
BEFORE INSERT ON users
FOR EACH ROW
WHEN NEW.age < 18
BEGIN
SELECT RAISE(ABORT, 'Users must be 18 or older');
END;
```
## Quality Checklist
**Schema Completeness**:
- [ ] All tables have primary keys
- [ ] All relationships have foreign keys
- [ ] Appropriate NOT NULL constraints
- [ ] Check constraints for business rules
- [ ] Default values where appropriate
- [ ] Created_at/updated_at timestamps
**Normalization**:
- [ ] Schema is at least 3NF
- [ ] No repeating groups
- [ ] No partial dependencies
- [ ] No transitive dependencies
- [ ] Denormalization justified and documented
**Performance**:
- [ ] Indexes on all foreign keys
- [ ] Indexes on commonly filtered columns
- [ ] Composite indexes for multi-column queries
- [ ] Covering indexes for frequent queries
- [ ] Partial indexes where appropriate
**Maintainability**:
- [ ] Consistent naming conventions
- [ ] Clear table and column names
- [ ] Comments on complex structures
- [ ] ER diagram provided
- [ ] Design decisions documented
---
## MCP-Enhanced Schema Design
### PostgreSQL MCP for Schema Validation
When PostgreSQL MCP is available, validate schema designs directly against production databases:
```typescript
// Runtime detection - no configuration needed
const hasPostgres = typeof mcp__postgres__query !== 'undefined';
if (hasPostgres) {
console.log("✓ Using PostgreSQL MCP for schema design validation");
// Validate schema against existing database
const schemaCheck = await mcp__postgres__query({
sql: `
SELECT
table_name,
column_name,
data_type,
is_nullable,
column_default,
character_maximum_length
FROM information_schema.columns
WHERE table_schema = 'public'
AND table_name IN ('users', 'orders', 'products')
ORDER BY table_name, ordinal_position
`
});
console.log(`✓ Retrieved schema for ${schemaCheck.rows.length} columns`);
// Check constraints
const constraints = await mcp__postgres__query({
sql: `
SELECT
tc.table_name,
tc.constraint_name,
tc.constraint_type,
kcu.column_name,
ccu.table_name AS foreign_table_name,
ccu.column_name AS foreign_column_name
FROM information_schema.table_constraints tc
LEFT JOIN information_schema.key_column_usage kcu
ON tc.constraint_name = kcu.constraint_name
LEFT JOIN information_schema.constraint_column_usage ccu
ON tc.constraint_name = ccu.constraint_name
WHERE tc.table_schema = 'public'
ORDER BY tc.table_name, tc.constraint_type
`
});
console.log(`✓ Found ${constraints.rows.length} constraints`);
// Validate foreign key relationships
const orphanedRecords = await mcp__postgres__query({
sql: `
SELECT
'orders' as table_name,
COUNT(*) as orphaned_count
FROM orders o
LEFT JOIN users u ON o.user_id = u.id
WHERE u.id IS NULL
`
});
if (orphanedRecords.rows[0].orphaned_count > 0) {
console.log(`⚠️ Found ${orphanedRecords.rows[0].orphaned_count} orphaned records`);
} else {
console.log("✓ All foreign key relationships valid");
}
// Test DDL before executing
const ddlTest = await mcp__postgres__query({
sql: `
BEGIN;
-- Test adding new column
ALTER TABLE users ADD COLUMN test_column VARCHAR(100);
-- Check table size after change
SELECT pg_size_pretty(pg_relation_size('users')) as table_size;
ROLLBACK;
`
});
console.log(`✓ DDL validated (would not break existing data)`);
} else {
console.log(" PostgreSQL MCP not available");
console.log(" Install for schema validation:");
console.log(" npm install -g @modelcontextprotocol/server-postgres");
}
```
### Benefits Comparison
| Aspect | With PostgreSQL MCP | Without MCP (Traditional) |
|--------|-------------------|--------------------------|
| **Schema Exploration** | Query information_schema instantly | Request schema dump → wait |
| **Constraint Validation** | Check FK relationships on real data | Assume constraints work |
| **DDL Testing** | Test ALTER statements with ROLLBACK | Deploy and hope |
| **Data Distribution** | Analyze with pg_stats | Guess cardinality |
| **Impact Analysis** | Query actual table sizes | Estimate impact |
| **Normalization Check** | Find duplicates in production | Theoretical analysis |
| **Migration Safety** | Test on production replica | Cross fingers |
**When to use PostgreSQL MCP:**
- Designing schema for existing database
- Validating normalization against real data
- Testing DDL changes before deployment
- Analyzing data distribution for index design
- Finding schema anomalies
- Planning migrations
- Reverse engineering existing schemas
**When traditional approach needed:**
- Greenfield database design
- Designing for future data
- Theoretical schema modeling
- No database access
### Real-World Example: Adding User Preferences
**With PostgreSQL MCP (15 minutes):**
```typescript
// 1. Analyze current users table
const currentSchema = await mcp__postgres__query({
sql: `
SELECT
column_name,
data_type,
is_nullable
FROM information_schema.columns
WHERE table_name = 'users'
ORDER BY ordinal_position
`
});
console.log(`✓ Users table has ${currentSchema.rows.length} columns`);
// 2. Check for existing preference data
const preferencesCheck = await mcp__postgres__query({
sql: `
SELECT
COUNT(DISTINCT user_id) as users_with_prefs,
COUNT(*) as total_prefs,
AVG(array_length(string_to_array(preferences, ','), 1)) as avg_prefs_per_user
FROM user_metadata
WHERE preferences IS NOT NULL
`
});
console.log(`${preferencesCheck.rows[0].users_with_prefs} users have preferences`);
// 3. Design decision: Separate table vs JSONB column
const tableSize = await mcp__postgres__query({
sql: `
SELECT
pg_size_pretty(pg_relation_size('users')) as current_size,
pg_size_pretty(pg_relation_size('users') * 1.2) as estimated_with_jsonb
`
});
console.log(`✓ Adding JSONB column would increase size to ${tableSize.rows[0].estimated_with_jsonb}`);
// 4. Test the migration (with ROLLBACK)
const migrationTest = await mcp__postgres__query({
sql: `
BEGIN;
-- Add preferences column
ALTER TABLE users ADD COLUMN preferences JSONB DEFAULT '{}'::jsonb;
-- Add GIN index for JSONB queries
CREATE INDEX idx_users_preferences ON users USING GIN (preferences);
-- Test query performance
EXPLAIN ANALYZE
SELECT * FROM users
WHERE preferences @> '{"theme": "dark"}'::jsonb;
ROLLBACK;
`
});
console.log("✓ Migration tested successfully");
// Decision: Use JSONB column (flexible, good performance with GIN index)
```
**Without MCP (2 hours):**
1. Request schema documentation (15 min wait)
2. Analyze schema manually (20 min)
3. Make design decision based on assumptions (15 min)
4. Write migration script (15 min)
5. Deploy to test database (10 min)
6. Load test data (20 min)
7. Test queries (10 min)
8. Find issues (15 min)
9. Revise and redeploy (15 min)
### Schema Validation Patterns
```typescript
// Comprehensive schema validation
async function validateSchema() {
const hasPostgres = typeof mcp__postgres__query !== 'undefined';
if (hasPostgres) {
// 1. Check for missing indexes on foreign keys
const missingIndexes = await mcp__postgres__query({
sql: `
SELECT
tc.table_name,
kcu.column_name
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kcu
ON tc.constraint_name = kcu.constraint_name
WHERE tc.constraint_type = 'FOREIGN KEY'
AND NOT EXISTS (
SELECT 1
FROM pg_indexes
WHERE tablename = tc.table_name
AND indexdef LIKE '%' || kcu.column_name || '%'
)
`
});
if (missingIndexes.rows.length > 0) {
console.log("⚠️ Missing indexes on foreign keys:");
missingIndexes.rows.forEach(row => {
console.log(` ${row.table_name}.${row.column_name}`);
});
}
// 2. Check for columns that should be NOT NULL
const nullableColumns = await mcp__postgres__query({
sql: `
SELECT
table_name,
column_name,
COUNT(*) FILTER (WHERE value IS NULL) as null_count,
COUNT(*) as total_count
FROM (
SELECT 'users' as table_name, 'email' as column_name, email as value FROM users
UNION ALL
SELECT 'orders' as table_name, 'user_id' as column_name, user_id::text as value FROM orders
) data
GROUP BY table_name, column_name
HAVING COUNT(*) FILTER (WHERE value IS NULL) = 0
AND EXISTS (
SELECT 1
FROM information_schema.columns
WHERE table_name = data.table_name
AND column_name = data.column_name
AND is_nullable = 'YES'
)
`
});
if (nullableColumns.rows.length > 0) {
console.log("⚠️ Columns that could be NOT NULL:");
nullableColumns.rows.forEach(row => {
console.log(` ${row.table_name}.${row.column_name} (0 NULLs in ${row.total_count} rows)`);
});
}
// 3. Check for denormalization opportunities
const duplicateData = await mcp__postgres__query({
sql: `
SELECT
user_id,
email,
COUNT(*) as duplicate_count
FROM users
GROUP BY user_id, email
HAVING COUNT(*) > 1
`
});
return {
missingIndexes: missingIndexes.rows,
nullableColumns: nullableColumns.rows,
duplicates: duplicateData.rows
};
}
}
```
### PostgreSQL MCP Installation
```bash
# Install PostgreSQL MCP
npm install -g @modelcontextprotocol/server-postgres
# Configure for schema design
{
"mcpServers": {
"postgres": {
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-postgres"],
"env": {
"POSTGRES_CONNECTION_STRING": "postgresql://schema_designer:pass@db.company.com:5432/production"
}
}
}
}
```
Once installed, all agents reading this skill automatically validate schemas against live databases.
### Schema Design Workflow with MCP
1. **Explore Existing Schema**: Query information_schema
2. **Analyze Data Distribution**: Use pg_stats
3. **Check Constraints**: Validate FK relationships
4. **Test DDL Changes**: Use BEGIN...ROLLBACK
5. **Estimate Impact**: Query table/index sizes
6. **Validate Normalization**: Find duplicates
7. **Plan Indexes**: Analyze query patterns
8. **Generate Migration**: Create safe DDL scripts
---
**Version**: 1.0
**Last Updated**: January 2025
**MCP Enhancement**: PostgreSQL for data-driven schema design
**Best Practices**: Industry-proven schema design patterns