Backend Architecture
Comprehensive patterns for APIs, databases, and server-side systems
Consolidated from:
- backend-architect skills
- api-developer skills
- database-architect skills
Schema Design Skill
Expert patterns for relational database schema design, normalization, and constraint management.
Core Principles
1. Normalization Levels
1NF (First Normal Form):
- Atomic values only (no arrays, no comma-separated lists)
- Each column contains single value
- No repeating groups
2NF (Second Normal Form):
- Must be in 1NF
- No partial dependencies on composite primary keys
- Every non-key column depends on the entire primary key
3NF (Third Normal Form):
- Must be in 2NF
- No transitive dependencies
- Every non-key column depends only on the primary key
BCNF (Boyce-Codd Normal Form):
- Must be in 3NF
- Every determinant is a candidate key
Strategic Denormalization:
- Only denormalize with performance data justification
- Document the trade-off
- Consider materialized views instead
- Plan for data consistency maintenance
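Partial and transitive dependencies are easier to spot when you test sample data for functional dependencies. The following is a minimal sketch, not a full normalization tool: the helper name `holdsFunctionalDependency` and the sample rows are hypothetical, and a dependency that holds in a sample is only a hint that it holds in the domain.

```javascript
// Returns true when every distinct combination of `determinant` values
// maps to exactly one combination of `dependent` values in `rows`.
function holdsFunctionalDependency(rows, determinant, dependent) {
  const seen = new Map();
  for (const row of rows) {
    const key = JSON.stringify(determinant.map((col) => row[col]));
    const value = JSON.stringify(dependent.map((col) => row[col]));
    if (seen.has(key) && seen.get(key) !== value) {
      return false;
    }
    seen.set(key, value);
  }
  return true;
}

// Hypothetical sample: zip_code determines city, so keeping both on the
// same table is a transitive dependency (id -> zip_code -> city).
const sampleOrders = [
  { id: 1, zip_code: "10001", city: "New York" },
  { id: 2, zip_code: "10001", city: "New York" },
  { id: 3, zip_code: "94105", city: "San Francisco" },
];

const zipDeterminesCity = holdsFunctionalDependency(sampleOrders, ["zip_code"], ["city"]);
const cityDeterminesId = holdsFunctionalDependency(sampleOrders, ["city"], ["id"]);
```

If a non-key column such as `zip_code` determines another non-key column, that pair is a candidate for its own table under 3NF.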
2. Primary Key Selection
UUID (Recommended for distributed systems):
id UUID PRIMARY KEY DEFAULT uuid_generate_v4() -- PostgreSQL; needs the uuid-ossp extension (13+ also has built-in gen_random_uuid())
- Pros: Globally unique, no coordination needed, harder to enumerate
- Cons: Larger storage (16 bytes), random order (index fragmentation)
Auto-increment Integer:
id SERIAL PRIMARY KEY -- PostgreSQL
id INT AUTO_INCREMENT PRIMARY KEY -- MySQL
id INTEGER PRIMARY KEY AUTOINCREMENT -- SQLite
- Pros: Small storage (4-8 bytes), sequential (better index performance)
- Cons: Needs a single sequence source (coordination in distributed systems), easy to enumerate, not globally unique
Composite Keys (for junction tables):
PRIMARY KEY (user_id, role_id)
3. Foreign Key Constraints
Always define foreign keys for referential integrity:
CONSTRAINT fk_orders_customer
FOREIGN KEY (customer_id)
REFERENCES customers(id)
ON DELETE CASCADE -- or RESTRICT, SET NULL
ON UPDATE CASCADE
ON DELETE options:
- CASCADE: Delete child rows when parent deleted
- RESTRICT: Prevent delete if children exist
- SET NULL: Set foreign key to NULL
- NO ACTION: Similar to RESTRICT (database-specific)
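The difference between the ON DELETE actions can be seen in a small in-memory model. This is only an illustrative sketch: `deleteCustomer` and the `sampleDb` object are hypothetical and stand in for what the database does internally; NO ACTION is left out because its timing is database-specific.

```javascript
// In-memory model of ON DELETE actions. `db` is a hypothetical
// { customers: [...], orders: [...] } object, not a database handle.
function deleteCustomer(db, customerId, onDelete) {
  const hasChildren = db.orders.some((o) => o.customer_id === customerId);
  let orders;
  switch (onDelete) {
    case "CASCADE": // child rows disappear with the parent
      orders = db.orders.filter((o) => o.customer_id !== customerId);
      break;
    case "SET NULL": // child rows stay, the reference is cleared
      orders = db.orders.map((o) =>
        o.customer_id === customerId ? { ...o, customer_id: null } : o
      );
      break;
    case "RESTRICT": // the delete is rejected while children exist
      if (hasChildren) {
        throw new Error(`customer ${customerId} is still referenced by orders`);
      }
      orders = db.orders;
      break;
    default:
      throw new Error(`unsupported ON DELETE action: ${onDelete}`);
  }
  return {
    customers: db.customers.filter((c) => c.id !== customerId),
    orders,
  };
}

const sampleDb = {
  customers: [{ id: 1 }, { id: 2 }],
  orders: [
    { id: 10, customer_id: 1 },
    { id: 11, customer_id: 2 },
  ],
};
const afterCascade = deleteCustomer(sampleDb, 1, "CASCADE");
const afterSetNull = deleteCustomer(sampleDb, 1, "SET NULL");
```

SET NULL only works when the foreign key column is nullable, which is worth checking before choosing it.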
4. Check Constraints
Use check constraints for business rules:
-- Email format validation
CONSTRAINT email_format CHECK (
email ~* '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'
)
-- Non-negative values
CONSTRAINT total_positive CHECK (total >= 0)
-- Enum-like values
CONSTRAINT valid_status CHECK (
status IN ('pending', 'processing', 'completed', 'cancelled')
)
-- Date ranges
CONSTRAINT valid_date_range CHECK (end_date > start_date)
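Application code often mirrors these checks so that users get a readable error before the database rejects the row. The sketch below assumes a hypothetical row object that carries all four fields and ISO-formatted date strings; `violatedConstraints` is an illustrative name. Unlike SQL, where a CHECK passes when the expression evaluates to NULL, this sketch treats missing values as violations.

```javascript
const EMAIL_PATTERN = /^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$/;
const VALID_STATUSES = ["pending", "processing", "completed", "cancelled"];

// Returns the names of the check constraints a row would violate.
function violatedConstraints(row) {
  const violations = [];
  if (!EMAIL_PATTERN.test(row.email)) violations.push("email_format");
  if (!(row.total >= 0)) violations.push("total_positive");
  if (!VALID_STATUSES.includes(row.status)) violations.push("valid_status");
  // ISO date strings (YYYY-MM-DD) compare correctly as plain strings
  if (!(row.end_date > row.start_date)) violations.push("valid_date_range");
  return violations;
}

const goodRow = {
  email: "ada@example.com",
  total: 0,
  status: "pending",
  start_date: "2025-01-01",
  end_date: "2025-12-31",
};
const badRow = {
  email: "not-an-email",
  total: -5,
  status: "shipped",
  start_date: "2025-06-01",
  end_date: "2025-01-01",
};
const goodViolations = violatedConstraints(goodRow);
const badViolations = violatedConstraints(badRow);
```

The database constraints stay the source of truth; the application check is a convenience layer on top.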
5. Index Strategy
Index Types and When to Use:
B-Tree (Default):
- WHERE clauses: WHERE status = 'active'
- ORDER BY: ORDER BY created_at DESC
- Range queries: WHERE price BETWEEN 10 AND 100
- Joins: Foreign key columns
GIN (PostgreSQL - Generalized Inverted Index):
- JSONB columns: WHERE data @> '{"key": "value"}'
- Arrays: WHERE tags @> ARRAY['postgresql']
- Full-text search: WHERE to_tsvector(text) @@ to_tsquery('search')
GiST (PostgreSQL - Generalized Search Tree):
- Geometric data: WHERE location && box '((0,0),(1,1))'
- Full-text search: Alternative to GIN
- Range types: WHERE daterange && '[2025-01-01, 2025-12-31]'
Hash (Limited use):
- Equality only: WHERE id = 123
- Not recommended (B-tree usually better)
Composite Index Column Order:
-- Rule: Equality-filtered columns first (most commonly filtered leading), then the range or sort column
CREATE INDEX idx_orders_status_created ON orders(status, created_at DESC);
-- Works for:
-- WHERE status = 'pending' ✅
-- WHERE status = 'pending' AND created_at > NOW() - INTERVAL '7 days' ✅
-- WHERE status = 'pending' ORDER BY created_at DESC ✅
-- Does NOT work for:
-- WHERE created_at > NOW() - INTERVAL '7 days' ❌ (doesn't start with status)
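The leftmost-prefix behavior behind these examples can be modeled in a few lines. This is a deliberately simplified sketch: `usableIndexColumns` is a hypothetical helper, and real query planners have more options (full index scans, bitmap combinations) than this model captures.

```javascript
// Simplified leftmost-prefix model: walk the index columns in order, keep
// consuming columns matched by equality, and allow one trailing range column.
function usableIndexColumns(indexColumns, equalityColumns, rangeColumns = []) {
  const used = [];
  for (const column of indexColumns) {
    if (equalityColumns.includes(column)) {
      used.push(column);
      continue;
    }
    if (rangeColumns.includes(column)) {
      used.push(column);
    }
    break; // a range match or an unfiltered column ends the usable prefix
  }
  return used;
}

const ordersIndex = ["status", "created_at"];
const statusOnly = usableIndexColumns(ordersIndex, ["status"]);
const statusAndRange = usableIndexColumns(ordersIndex, ["status"], ["created_at"]);
const rangeOnly = usableIndexColumns(ordersIndex, [], ["created_at"]);
```

An empty result, as for `rangeOnly`, corresponds to the query that cannot seek into the index because it skips the leading column.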
Schema Patterns
Pattern 1: Soft Delete
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
email VARCHAR(255) NOT NULL,
name VARCHAR(255) NOT NULL,
deleted_at TIMESTAMP NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Partial unique index (only for non-deleted rows)
CREATE UNIQUE INDEX idx_users_email_active
ON users(email)
WHERE deleted_at IS NULL;
-- Query pattern: Always filter deleted
SELECT * FROM users WHERE deleted_at IS NULL;
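The effect of the partial unique index is that a soft-deleted email becomes available again. A small in-memory sketch of that rule follows; `emailIsAvailable` and `softDelete` are hypothetical helpers operating on plain arrays, not on a database.

```javascript
// Mirrors the partial unique index: only rows with deleted_at === null
// take part in the uniqueness check.
function emailIsAvailable(users, email) {
  return !users.some((u) => u.deleted_at === null && u.email === email);
}

// Soft delete: mark the row instead of removing it.
function softDelete(users, id, deletedAt) {
  return users.map((u) => (u.id === id ? { ...u, deleted_at: deletedAt } : u));
}

const usersBefore = [{ id: 1, email: "ada@example.com", deleted_at: null }];
const availableBefore = emailIsAvailable(usersBefore, "ada@example.com");

const usersAfter = softDelete(usersBefore, 1, "2025-01-15T00:00:00Z");
const availableAfter = emailIsAvailable(usersAfter, "ada@example.com");
```

A plain unique constraint on email would instead block re-registration forever, because the soft-deleted row still holds the value.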
Pattern 2: Audit Trail
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
email VARCHAR(255) NOT NULL,
name VARCHAR(255) NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
created_by UUID REFERENCES users(id),
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_by UUID REFERENCES users(id)
);
-- Separate audit log table for full history
CREATE TABLE users_audit (
audit_id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
user_id UUID NOT NULL,
operation VARCHAR(10) NOT NULL, -- INSERT, UPDATE, DELETE
old_values JSONB,
new_values JSONB,
changed_by UUID REFERENCES users(id),
changed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
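A row for the audit table can be derived from the old and new versions of a record. The sketch below is one possible shape, not a prescribed one: `buildAuditEntry` is a hypothetical helper, and storing only the changed columns in old_values and new_values is an assumption made to keep the log small.

```javascript
// Builds a users_audit-shaped record. Only columns whose value changed are
// stored; for INSERT pass oldRow = null, for DELETE pass newRow = null.
function buildAuditEntry(operation, oldRow, newRow, changedBy, changedAt) {
  const columns = new Set([
    ...Object.keys(oldRow ?? {}),
    ...Object.keys(newRow ?? {}),
  ]);
  const oldValues = {};
  const newValues = {};
  for (const column of columns) {
    const before = oldRow ? oldRow[column] : undefined;
    const after = newRow ? newRow[column] : undefined;
    if (before !== after) {
      if (oldRow) oldValues[column] = before;
      if (newRow) newValues[column] = after;
    }
  }
  return {
    user_id: (newRow ?? oldRow).id,
    operation,
    old_values: oldRow ? oldValues : null,
    new_values: newRow ? newValues : null,
    changed_by: changedBy,
    changed_at: changedAt,
  };
}

const updateEntry = buildAuditEntry(
  "UPDATE",
  { id: "u1", email: "ada@example.com", name: "Ada" },
  { id: "u1", email: "ada@example.com", name: "Ada L." },
  "admin-1",
  "2025-01-15T00:00:00Z"
);
```

In a database the same record is usually produced by a trigger, so that changes made outside the application are captured as well.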
Pattern 3: Many-to-Many with Metadata
-- Junction table with additional attributes
CREATE TABLE user_roles (
user_id UUID REFERENCES users(id) ON DELETE CASCADE,
role_id UUID REFERENCES roles(id) ON DELETE CASCADE,
granted_by UUID REFERENCES users(id),
granted_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
expires_at TIMESTAMP,
PRIMARY KEY (user_id, role_id)
);
-- Lookups by user_id are already served by the primary key (user_id, role_id); only role_id needs its own index
CREATE INDEX idx_user_roles_role ON user_roles(role_id);
CREATE INDEX idx_user_roles_expires ON user_roles(expires_at)
WHERE expires_at IS NOT NULL;
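The expires_at metadata only helps if every permission check honors it. A minimal sketch of that filter, assuming grants are loaded as plain objects with ISO timestamp strings; `activeRoleIds` is a hypothetical helper name.

```javascript
// A grant is active when it has no expiry or the expiry lies in the future.
function activeRoleIds(userRoles, userId, now) {
  return userRoles
    .filter((grant) => grant.user_id === userId)
    .filter((grant) => grant.expires_at === null || new Date(grant.expires_at) > now)
    .map((grant) => grant.role_id);
}

const sampleGrants = [
  { user_id: "u1", role_id: "admin", expires_at: null },
  { user_id: "u1", role_id: "editor", expires_at: "2025-01-01T00:00:00Z" },
  { user_id: "u1", role_id: "viewer", expires_at: "2026-01-01T00:00:00Z" },
  { user_id: "u2", role_id: "admin", expires_at: null },
];
const activeForU1 = activeRoleIds(sampleGrants, "u1", new Date("2025-06-01T00:00:00Z"));
```

Passing `now` as a parameter keeps the check testable and avoids clock differences between application servers.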
Pattern 4: Hierarchical Data (Adjacency List)
CREATE TABLE categories (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
name VARCHAR(255) NOT NULL,
parent_id UUID REFERENCES categories(id),
path TEXT, -- Materialized path: /electronics/computers/laptops
level INT, -- Denormalized for performance
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_categories_parent ON categories(parent_id);
CREATE INDEX idx_categories_path ON categories(path);
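Because path and level are derived from parent_id, they must be recomputed whenever a category moves. The sketch below shows one way to derive them; `withPaths` is a hypothetical helper, and treating root categories as level 0 and building paths from names are assumptions the schema above does not fix.

```javascript
// Derives path and level for every category from its parent_id chain.
function withPaths(categories) {
  const byId = new Map(categories.map((c) => [c.id, c]));

  function resolve(category, seen = new Set()) {
    if (seen.has(category.id)) {
      throw new Error(`cycle detected at category ${category.id}`);
    }
    seen.add(category.id);
    if (category.parent_id === null) {
      return { path: `/${category.name}`, level: 0 }; // assumption: roots are level 0
    }
    const parentRow = byId.get(category.parent_id);
    if (!parentRow) {
      throw new Error(`missing parent ${category.parent_id}`);
    }
    const parent = resolve(parentRow, seen);
    return { path: `${parent.path}/${category.name}`, level: parent.level + 1 };
  }

  return categories.map((c) => ({ ...c, ...resolve(c) }));
}

const sampleCategories = [
  { id: 1, name: "electronics", parent_id: null },
  { id: 2, name: "computers", parent_id: 1 },
  { id: 3, name: "laptops", parent_id: 2 },
];
const categoriesWithPaths = withPaths(sampleCategories);
```

The cycle check matters because a foreign key on parent_id alone does not stop a category from becoming its own ancestor.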
Pattern 5: Polymorphic Associations (Avoid if Possible)
❌ Problematic Approach:
-- Weak referential integrity
CREATE TABLE comments (
id UUID PRIMARY KEY,
content TEXT NOT NULL,
commentable_type VARCHAR(50), -- 'Post' or 'Photo'
commentable_id UUID, -- No real foreign key!
created_at TIMESTAMP
);
✅ Better Approach (Exclusive Arcs):
CREATE TABLE comments (
id UUID PRIMARY KEY,
content TEXT NOT NULL,
post_id UUID REFERENCES posts(id) ON DELETE CASCADE,
photo_id UUID REFERENCES photos(id) ON DELETE CASCADE,
created_at TIMESTAMP,
-- Exactly one must be set
CONSTRAINT one_commentable CHECK (
(post_id IS NOT NULL AND photo_id IS NULL) OR
(post_id IS NULL AND photo_id IS NOT NULL)
)
);
CREATE INDEX idx_comments_post ON comments(post_id);
CREATE INDEX idx_comments_photo ON comments(photo_id);
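The one_commentable rule generalizes to "exactly one parent column is set", which is easier to express as a count once a third parent type appears. A small sketch of the same rule in application code; `hasExactlyOneParent` is a hypothetical helper.

```javascript
// Counts non-null parent references; the exclusive-arc rule requires exactly one.
function hasExactlyOneParent(comment, parentColumns = ["post_id", "photo_id"]) {
  const populated = parentColumns.filter(
    (column) => comment[column] !== null && comment[column] !== undefined
  );
  return populated.length === 1;
}

const onPost = hasExactlyOneParent({ post_id: "p1", photo_id: null });
const onBoth = hasExactlyOneParent({ post_id: "p1", photo_id: "ph1" });
const onNeither = hasExactlyOneParent({ post_id: null, photo_id: null });
```

Adding a new parent type then means adding one nullable foreign key column and one entry in the column list.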
Naming Conventions
Tables: Plural nouns, lowercase, underscores
users, orders, order_items, user_preferences
Columns: Singular nouns, lowercase, underscores
id, email, first_name, created_at, customer_id
Primary Keys: Always id
id UUID PRIMARY KEY
Foreign Keys: {referenced_table_singular}_id
customer_id, product_id, user_id
Indexes: idx_{table}_{column(s)}[_{condition}]
idx_users_email
idx_orders_customer_id
idx_orders_status_created
idx_users_email_active (partial index)
Constraints: {type}_{table}_{description}
pk_users (primary key)
fk_orders_customer (foreign key)
uq_users_email (unique)
ck_orders_total_positive (check)
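Conventions like these hold up better when names are generated rather than typed by hand. The helpers below are a sketch of that idea; `indexName`, `constraintName`, and the type keys are hypothetical and follow only the patterns listed above.

```javascript
const CONSTRAINT_PREFIXES = { primary: "pk", foreign: "fk", unique: "uq", check: "ck" };

// idx_{table}_{column(s)}[_{condition}]
function indexName(table, columns, condition) {
  const parts = ["idx", table, ...columns];
  if (condition) parts.push(condition);
  return parts.join("_");
}

// {type}_{table}_{description}; the description is optional for primary keys
function constraintName(type, table, description) {
  const prefix = CONSTRAINT_PREFIXES[type];
  if (!prefix) throw new Error(`unknown constraint type: ${type}`);
  return [prefix, table, description].filter(Boolean).join("_");
}

const emailIndex = indexName("users", ["email"]);
const activeEmailIndex = indexName("users", ["email"], "active");
const usersPk = constraintName("primary", "users");
const totalCheck = constraintName("check", "orders", "total_positive");
```

Some databases limit identifier length (PostgreSQL truncates names longer than 63 bytes), so long generated names may need abbreviating.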
Common Anti-Patterns to Avoid
❌ Generic JSON Columns (EAV Pattern):
-- Bad: No enforced schema, no per-attribute constraints, no foreign keys
CREATE TABLE entities (
id UUID PRIMARY KEY,
type VARCHAR(50),
attributes JSONB
);
❌ Comma-Separated Lists:
-- Bad: Violates 1NF, can't join efficiently
CREATE TABLE users (
id UUID PRIMARY KEY,
tags TEXT -- 'javascript,python,sql'
);
✅ Use junction table instead:
CREATE TABLE user_tags (
user_id UUID REFERENCES users(id),
tag_id UUID REFERENCES tags(id),
PRIMARY KEY (user_id, tag_id)
);
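Migrating an existing comma-separated column means splitting each value into rows for the tag table and the junction table. A minimal sketch of that transformation, assuming rows are loaded into memory; `toJunctionRows` is a hypothetical helper, and it uses small integer ids for readability where the schema above uses UUIDs.

```javascript
// Splits a comma-separated tags column into tags rows and user_tags rows.
function toJunctionRows(users) {
  const tagIds = new Map(); // tag name -> surrogate id
  const userTags = [];
  for (const user of users) {
    const names = (user.tags ?? "")
      .split(",")
      .map((name) => name.trim())
      .filter(Boolean);
    for (const name of new Set(names)) { // Set drops duplicates within one user
      if (!tagIds.has(name)) tagIds.set(name, tagIds.size + 1);
      userTags.push({ user_id: user.id, tag_id: tagIds.get(name) });
    }
  }
  const tags = [...tagIds].map(([name, id]) => ({ id, name }));
  return { tags, userTags };
}

const migrated = toJunctionRows([
  { id: 1, tags: "javascript,python,sql" },
  { id: 2, tags: "python, sql" },
  { id: 3, tags: null },
]);
```

Trimming and de-duplicating during the split matters because free-text lists tend to contain stray spaces and repeated entries that would otherwise violate the composite primary key.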
❌ Nullable Boolean Columns:
-- Bad: Three states (true, false, null) - ambiguous
is_active BOOLEAN NULL
✅ Be explicit:
-- Good: Two clear states
is_active BOOLEAN NOT NULL DEFAULT true
ER Diagram Notation (Mermaid)
erDiagram
CUSTOMER ||--o{ ORDER : places
CUSTOMER {
uuid id PK "Primary Key"
string email UK "Unique Key"
string name
timestamp created_at
}
ORDER ||--|{ LINE_ITEM : contains
ORDER {
uuid id PK
uuid customer_id FK
decimal total
string status
}
PRODUCT ||--o{ LINE_ITEM : "ordered in"
LINE_ITEM {
uuid id PK
uuid order_id FK
uuid product_id FK
int quantity
decimal unit_price
}
Cardinality Symbols:
- ||--||: One to exactly one
- ||--o|: One to zero or one
- ||--o{: One to zero or more
- }|--|{: One or more to one or more
Database-Specific Best Practices
PostgreSQL
-- Enable UUID extension
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
-- Use JSONB (not JSON) for better performance
metadata JSONB
-- Use array types when appropriate
tags TEXT[]
-- Use full-text search
CREATE INDEX idx_products_search ON products
USING GIN (to_tsvector('english', name || ' ' || coalesce(description, '')));
-- Use enums for fixed sets
CREATE TYPE order_status AS ENUM ('pending', 'processing', 'completed', 'cancelled');
MySQL
-- Use InnoDB engine (the default since MySQL 5.5)
ENGINE=InnoDB
-- Use UTF8MB4 for full Unicode support (including emoji)
DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
-- Use generated columns for computed values
price_with_tax DECIMAL(10,2) GENERATED ALWAYS AS (price * 1.20) STORED
-- Partition large tables
PARTITION BY RANGE (YEAR(created_at)) (
PARTITION p2023 VALUES LESS THAN (2024),
PARTITION p2024 VALUES LESS THAN (2025),
PARTITION p2025 VALUES LESS THAN (2026)
);
SQLite
-- Use STRICT tables for type enforcement (3.37+)
CREATE TABLE users (
id INTEGER PRIMARY KEY,
email TEXT NOT NULL,
age INTEGER NOT NULL
) STRICT;
-- Use WITHOUT ROWID for space efficiency (the gain is mainly for non-integer or composite primary keys)
CREATE TABLE user_settings (
user_id INTEGER PRIMARY KEY,
theme TEXT NOT NULL,
locale TEXT NOT NULL
) WITHOUT ROWID;
-- Use triggers for complex constraints
CREATE TRIGGER check_age_before_insert
BEFORE INSERT ON users
FOR EACH ROW
WHEN NEW.age < 18
BEGIN
SELECT RAISE(ABORT, 'Users must be 18 or older');
END;
Quality Checklist
Schema Completeness:
- All tables have primary keys
- All relationships have foreign keys
- Appropriate NOT NULL constraints
- Check constraints for business rules
- Default values where appropriate
- created_at/updated_at timestamps
Normalization:
- Schema is at least 3NF
- No repeating groups
- No partial dependencies
- No transitive dependencies
- Denormalization justified and documented
Performance:
- Indexes on all foreign keys
- Indexes on commonly filtered columns
- Composite indexes for multi-column queries
- Covering indexes for frequent queries
- Partial indexes where appropriate
Maintainability:
- Consistent naming conventions
- Clear table and column names
- Comments on complex structures
- ER diagram provided
- Design decisions documented
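Several checklist items can be checked mechanically once the schema is described as data. The sketch below covers three of them (primary keys, indexes on foreign keys, timestamps); `auditSchema` and the shape of the `schema` object are hypothetical and not the output of any particular tool.

```javascript
// `schema` maps table names to { columns, primaryKey, indexes, foreignKeys },
// where primaryKey and each index are arrays of column names.
function auditSchema(schema) {
  const findings = [];
  for (const [tableName, table] of Object.entries(schema)) {
    if (!table.primaryKey || table.primaryKey.length === 0) {
      findings.push(`${tableName}: missing primary key`);
    }
    // A foreign key counts as indexed when it leads the primary key or an index
    const leadingColumns = [table.primaryKey ?? [], ...(table.indexes ?? [])]
      .map((columns) => columns[0]);
    for (const fk of table.foreignKeys ?? []) {
      if (!leadingColumns.includes(fk)) {
        findings.push(`${tableName}.${fk}: foreign key without index`);
      }
    }
    for (const timestamp of ["created_at", "updated_at"]) {
      if (!table.columns.includes(timestamp)) {
        findings.push(`${tableName}: missing ${timestamp}`);
      }
    }
  }
  return findings;
}

const sampleSchema = {
  users: {
    columns: ["id", "email", "created_at", "updated_at"],
    primaryKey: ["id"],
    indexes: [["email"]],
    foreignKeys: [],
  },
  orders: {
    columns: ["id", "customer_id", "total", "created_at"],
    primaryKey: ["id"],
    indexes: [],
    foreignKeys: ["customer_id"],
  },
};
const findings = auditSchema(sampleSchema);
```

Items such as "denormalization justified and documented" still need human review; a script like this only catches the structural gaps.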
MCP-Enhanced Schema Design
PostgreSQL MCP for Schema Validation
When PostgreSQL MCP is available, validate schema designs directly against production databases:
// Runtime detection - no configuration needed
const hasPostgres = typeof mcp__postgres__query !== 'undefined';
if (hasPostgres) {
console.log("✓ Using PostgreSQL MCP for schema design validation");
// Validate schema against existing database
const schemaCheck = await mcp__postgres__query({
sql: `
SELECT
table_name,
column_name,
data_type,
is_nullable,
column_default,
character_maximum_length
FROM information_schema.columns
WHERE table_schema = 'public'
AND table_name IN ('users', 'orders', 'products')
ORDER BY table_name, ordinal_position
`
});
console.log(`✓ Retrieved schema for ${schemaCheck.rows.length} columns`);
// Check constraints
const constraints = await mcp__postgres__query({
sql: `
SELECT
tc.table_name,
tc.constraint_name,
tc.constraint_type,
kcu.column_name,
ccu.table_name AS foreign_table_name,
ccu.column_name AS foreign_column_name
FROM information_schema.table_constraints tc
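-- Constraint names can repeat across schemas, so join on the schema as well
LEFT JOIN information_schema.key_column_usage kcu
ON tc.constraint_name = kcu.constraint_name
AND tc.constraint_schema = kcu.constraint_schema
LEFT JOIN information_schema.constraint_column_usage ccu
ON tc.constraint_name = ccu.constraint_name
AND tc.constraint_schema = ccu.constraint_schema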
WHERE tc.table_schema = 'public'
ORDER BY tc.table_name, tc.constraint_type
`
});
console.log(`✓ Found ${constraints.rows.length} constraints`);
// Validate foreign key relationships
const orphanedRecords = await mcp__postgres__query({
sql: `
SELECT
'orders' as table_name,
COUNT(*) as orphaned_count
FROM orders o
LEFT JOIN users u ON o.user_id = u.id
WHERE u.id IS NULL
`
});
if (orphanedRecords.rows[0].orphaned_count > 0) {
console.log(`⚠️ Found ${orphanedRecords.rows[0].orphaned_count} orphaned records`);
} else {
console.log("✓ All foreign key relationships valid");
}
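// Test DDL before executing (assumes the MCP server permits DDL inside a transaction;
// a server restricted to read-only queries will reject the ALTER)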
const ddlTest = await mcp__postgres__query({
sql: `
BEGIN;
-- Test adding new column
ALTER TABLE users ADD COLUMN test_column VARCHAR(100);
-- Check table size after change
SELECT pg_size_pretty(pg_relation_size('users')) as table_size;
ROLLBACK;
`
});
console.log("✓ DDL applied and rolled back without errors");
} else {
console.log("ℹ️ PostgreSQL MCP not available");
console.log(" Install for schema validation:");
console.log(" npm install -g @modelcontextprotocol/server-postgres");
}
Benefits Comparison
| Aspect | With PostgreSQL MCP | Without MCP (Traditional) |
|---|---|---|
| Schema Exploration | Query information_schema instantly | Request schema dump → wait |
| Constraint Validation | Check FK relationships on real data | Assume constraints work |
| DDL Testing | Test ALTER statements with ROLLBACK | Deploy and hope |
| Data Distribution | Analyze with pg_stats | Guess cardinality |
| Impact Analysis | Query actual table sizes | Estimate impact |
| Normalization Check | Find duplicates in production | Theoretical analysis |
| Migration Safety | Test on production replica | Cross fingers |
When to use PostgreSQL MCP:
- Designing schema for existing database
- Validating normalization against real data
- Testing DDL changes before deployment
- Analyzing data distribution for index design
- Finding schema anomalies
- Planning migrations
- Reverse engineering existing schemas
When traditional approach needed:
- Greenfield database design
- Designing for future data
- Theoretical schema modeling
- No database access
Real-World Example: Adding User Preferences
With PostgreSQL MCP (15 minutes):
// 1. Analyze current users table
const currentSchema = await mcp__postgres__query({
sql: `
SELECT
column_name,
data_type,
is_nullable
FROM information_schema.columns
WHERE table_name = 'users'
ORDER BY ordinal_position
`
});
console.log(`✓ Users table has ${currentSchema.rows.length} columns`);
// 2. Check for existing preference data
const preferencesCheck = await mcp__postgres__query({
sql: `
SELECT
COUNT(DISTINCT user_id) as users_with_prefs,
COUNT(*) as total_prefs,
AVG(array_length(string_to_array(preferences, ','), 1)) as avg_prefs_per_user
FROM user_metadata
WHERE preferences IS NOT NULL
`
});
console.log(`✓ ${preferencesCheck.rows[0].users_with_prefs} users have preferences`);
// 3. Design decision: Separate table vs JSONB column
// The 1.2 multiplier is a rough 20% growth assumption, not a measurement
const tableSize = await mcp__postgres__query({
sql: `
SELECT
pg_size_pretty(pg_relation_size('users')) as current_size,
pg_size_pretty(pg_relation_size('users') * 1.2) as estimated_with_jsonb
`
});
console.log(`✓ Estimated size with JSONB column (assuming 20% growth): ${tableSize.rows[0].estimated_with_jsonb}`);
// 4. Test the migration (with ROLLBACK)
const migrationTest = await mcp__postgres__query({
sql: `
BEGIN;
-- Add preferences column
ALTER TABLE users ADD COLUMN preferences JSONB DEFAULT '{}'::jsonb;
-- Add GIN index for JSONB queries
CREATE INDEX idx_users_preferences ON users USING GIN (preferences);
-- Test query performance
EXPLAIN ANALYZE
SELECT * FROM users
WHERE preferences @> '{"theme": "dark"}'::jsonb;
ROLLBACK;
`
});
console.log("✓ Migration tested successfully");
// Decision: Use JSONB column (flexible, good performance with GIN index)
Without MCP (about 2 hours):
- Request schema documentation (15 min wait)
- Analyze schema manually (20 min)
- Make design decision based on assumptions (15 min)
- Write migration script (15 min)
- Deploy to test database (10 min)
- Load test data (20 min)
- Test queries (10 min)
- Find issues (15 min)
- Revise and redeploy (15 min)
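Adding up the itemized steps gives the total for the manual route. The snippet below only restates the figures from the list above; the variable names are illustrative.

```javascript
// Minutes per manual step, copied from the list above.
const manualSteps = [
  ["Request schema documentation", 15],
  ["Analyze schema manually", 20],
  ["Make design decision based on assumptions", 15],
  ["Write migration script", 15],
  ["Deploy to test database", 10],
  ["Load test data", 20],
  ["Test queries", 10],
  ["Find issues", 15],
  ["Revise and redeploy", 15],
];

const totalMinutes = manualSteps.reduce((sum, [, minutes]) => sum + minutes, 0);
const formattedTotal = `${Math.floor(totalMinutes / 60)} h ${totalMinutes % 60} min`;
```

The steps sum to 135 minutes, slightly above the headline figure, so the comparison with the 15-minute MCP route is roughly a factor of nine.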
Schema Validation Patterns
// Comprehensive schema validation
async function validateSchema() {
const hasPostgres = typeof mcp__postgres__query !== 'undefined';
if (hasPostgres) {
// 1. Check for missing indexes on foreign keys
const missingIndexes = await mcp__postgres__query({
sql: `
SELECT
tc.table_name,
kcu.column_name
FROM information_schema.table_constraints tc
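JOIN information_schema.key_column_usage kcu
ON tc.constraint_name = kcu.constraint_name
AND tc.constraint_schema = kcu.constraint_schema
WHERE tc.constraint_type = 'FOREIGN KEY'
AND NOT EXISTS (
SELECT 1
FROM pg_indexes
WHERE schemaname = tc.table_schema
AND tablename = tc.table_name
-- Heuristic: substring match on the index definition, so a short
-- column name such as "id" can hide a genuinely missing index
AND indexdef LIKE '%' || kcu.column_name || '%'
)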
`
});
if (missingIndexes.rows.length > 0) {
console.log("⚠️ Missing indexes on foreign keys:");
missingIndexes.rows.forEach(row => {
console.log(` ${row.table_name}.${row.column_name}`);
});
}
// 2. Check for columns that should be NOT NULL
const nullableColumns = await mcp__postgres__query({
sql: `
SELECT
table_name,
column_name,
COUNT(*) FILTER (WHERE value IS NULL) as null_count,
COUNT(*) as total_count
FROM (
SELECT 'users' as table_name, 'email' as column_name, email as value FROM users
UNION ALL
SELECT 'orders' as table_name, 'user_id' as column_name, user_id::text as value FROM orders
) data
GROUP BY table_name, column_name
HAVING COUNT(*) FILTER (WHERE value IS NULL) = 0
AND EXISTS (
SELECT 1
FROM information_schema.columns
WHERE table_name = data.table_name
AND column_name = data.column_name
AND is_nullable = 'YES'
)
`
});
if (nullableColumns.rows.length > 0) {
console.log("⚠️ Columns that could be NOT NULL:");
nullableColumns.rows.forEach(row => {
console.log(` ${row.table_name}.${row.column_name} (0 NULLs in ${row.total_count} rows)`);
});
}
// 3. Check for duplicate data that points to a normalization or uniqueness problem
const duplicateData = await mcp__postgres__query({
sql: `
SELECT
email,
COUNT(*) as duplicate_count
FROM users
GROUP BY email
HAVING COUNT(*) > 1
`
});
return {
missingIndexes: missingIndexes.rows,
nullableColumns: nullableColumns.rows,
duplicates: duplicateData.rows
};
}
}
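The foreign-key index check above matches column names as substrings of the index definition, which is imprecise. A stricter comparison can be done on the client side by parsing the column list out of indexdef. The sketch below assumes the usual `... USING method (col, ...)` shape of pg_indexes.indexdef and does not handle expression indexes; `indexedColumns` and `leadsWithColumn` are hypothetical helpers.

```javascript
// Extracts plain column names from an indexdef string such as
// "CREATE INDEX idx ON public.orders USING btree (status, created_at DESC)".
function indexedColumns(indexdef) {
  const match = indexdef.match(/USING \w+ \(([^()]*)\)/);
  if (!match) return []; // expression indexes are not handled by this sketch
  return match[1]
    .split(",")
    .map((part) => part.trim().split(/\s+/)[0].replace(/"/g, ""));
}

// A foreign key lookup benefits most when the column leads the index.
function leadsWithColumn(indexdef, column) {
  return indexedColumns(indexdef)[0] === column;
}

const fkIndexDef =
  "CREATE INDEX idx_orders_customer_id ON public.orders USING btree (customer_id)";
const partialIndexDef =
  "CREATE UNIQUE INDEX idx_users_email_active ON public.users USING btree (email) WHERE (deleted_at IS NULL)";

const coversCustomerId = leadsWithColumn(fkIndexDef, "customer_id");
const coversId = leadsWithColumn(fkIndexDef, "id"); // a substring match would wrongly accept this
const partialColumns = indexedColumns(partialIndexDef);
```

Applied to the rows of pg_indexes, this replaces the LIKE heuristic with an exact comparison on the leading column.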
PostgreSQL MCP Installation
# Install PostgreSQL MCP
npm install -g @modelcontextprotocol/server-postgres
# Configure for schema design
{
"mcpServers": {
"postgres": {
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-postgres"],
"env": {
"POSTGRES_CONNECTION_STRING": "postgresql://schema_designer:pass@db.company.com:5432/production"
}
}
}
}
Once the server is configured, agents that follow this skill can validate schemas against the live database whenever the mcp__postgres__query tool is available.
Schema Design Workflow with MCP
- Explore Existing Schema: Query information_schema
- Analyze Data Distribution: Use pg_stats
- Check Constraints: Validate FK relationships
- Test DDL Changes: Use BEGIN...ROLLBACK
- Estimate Impact: Query table/index sizes
- Validate Normalization: Find duplicates
- Plan Indexes: Analyze query patterns
- Generate Migration: Create safe DDL scripts
Version: 1.0
Last Updated: January 2025
MCP Enhancement: PostgreSQL for data-driven schema design
Best Practices: Industry-proven schema design patterns