# Database Optimization Operation

You are executing the **database** operation to optimize database queries, schema, indexes, and connection management.

## Parameters

**Received**: `$ARGUMENTS` (after removing the 'database' operation name)

Expected format: `target:"queries|schema|indexes|connections|all" [context:"specific-details"] [threshold:"time-in-ms"] [environment:"prod|staging|dev"]`

**Parameter definitions**:

- `target` (required): What to optimize - `queries`, `schema`, `indexes`, `connections`, or `all`
- `context` (optional): Specific context such as table names, query patterns, or a problem description
- `threshold` (optional): Time threshold for slow queries in milliseconds (default: 500ms)
- `environment` (optional): Target environment (default: development)

## Workflow

### 1. Identify Database Technology

Detect the database type from the codebase:

```bash
# Check for database configuration
grep -r "DATABASE_URL\|DB_CONNECTION\|database" .env* config/ 2>/dev/null | head -5

# Check package dependencies
grep -E '"(pg|mysql2?|mongodb|mongoose|sqlite3?)"' package.json 2>/dev/null
```

Common patterns:

- **PostgreSQL**: `pg`, `pg_stat_statements`, `.pgpass`
- **MySQL**: `mysql2`, `mysql`, `.my.cnf`
- **MongoDB**: `mongoose`, `mongodb`
- **SQLite**: `sqlite3`, `.db` files

### 2. Enable Performance Monitoring

**PostgreSQL**:

```sql
-- Enable pg_stat_statements extension (if not already)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Reset statistics for a fresh baseline
SELECT pg_stat_statements_reset();

-- Enable slow query logging
ALTER SYSTEM SET log_min_duration_statement = 500; -- 500ms threshold
SELECT pg_reload_conf();
```

**MySQL**:

```sql
-- Enable slow query log
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 0.5; -- 500ms threshold
SET GLOBAL log_queries_not_using_indexes = 'ON';
```

**MongoDB**:

```javascript
// Enable profiling
db.setProfilingLevel(1, { slowms: 500 });

// View profiler status
db.getProfilingStatus();
```
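Whichever engine is detected, the `threshold` parameter should drive these settings rather than the hard-coded 500ms shown above. Below is a minimal sketch, assuming a PostgreSQL target, the node-postgres (`pg`) client, and a role permitted to run `ALTER SYSTEM`; the helper name and `THRESHOLD_MS` wiring are illustrative, not part of the operation.

```javascript
// Hypothetical helper: apply the operation's `threshold` parameter (in ms)
// when enabling PostgreSQL slow-query logging. Assumes DATABASE_URL is set.
const { Client } = require('pg');

async function enableSlowQueryLogging(thresholdMs) {
  const client = new Client({ connectionString: process.env.DATABASE_URL });
  await client.connect();
  try {
    await client.query('CREATE EXTENSION IF NOT EXISTS pg_stat_statements');
    await client.query('SELECT pg_stat_statements_reset()');
    // ALTER SYSTEM does not accept bind parameters, so interpolate a validated integer.
    const ms = Number.parseInt(thresholdMs, 10) || 500;
    await client.query(`ALTER SYSTEM SET log_min_duration_statement = ${ms}`);
    await client.query('SELECT pg_reload_conf()');
  } finally {
    await client.end();
  }
}

enableSlowQueryLogging(process.env.THRESHOLD_MS);
```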
### 3. Analyze Slow Queries

**PostgreSQL - Find Slow Queries**:

```sql
-- Top 20 slow queries by average time
SELECT
  substring(query, 1, 100) AS short_query,
  round(mean_exec_time::numeric, 2) AS avg_time_ms,
  calls,
  round(total_exec_time::numeric, 2) AS total_time_ms,
  round((100 * total_exec_time / sum(total_exec_time) OVER ())::numeric, 2) AS percentage_cpu
FROM pg_stat_statements
WHERE query NOT LIKE '%pg_stat_statements%'
ORDER BY mean_exec_time DESC
LIMIT 20;

-- Queries with most calls (potential optimization targets)
SELECT
  substring(query, 1, 100) AS short_query,
  calls,
  round(mean_exec_time::numeric, 2) AS avg_time_ms,
  round(total_exec_time::numeric, 2) AS total_time_ms
FROM pg_stat_statements
WHERE query NOT LIKE '%pg_stat_statements%'
ORDER BY calls DESC
LIMIT 20;

-- Most time-consuming queries
SELECT
  substring(query, 1, 100) AS short_query,
  round(total_exec_time::numeric, 2) AS total_time_ms,
  calls,
  round(mean_exec_time::numeric, 2) AS avg_time_ms
FROM pg_stat_statements
WHERE query NOT LIKE '%pg_stat_statements%'
ORDER BY total_exec_time DESC
LIMIT 20;
```

**MySQL - Find Slow Queries**:

```sql
-- Summarize statement digests from the performance schema (timers are in picoseconds)
SELECT
  DIGEST_TEXT AS query,
  COUNT_STAR AS exec_count,
  AVG_TIMER_WAIT/1000000000 AS avg_time_ms,
  SUM_TIMER_WAIT/1000000000 AS total_time_ms
FROM performance_schema.events_statements_summary_by_digest
ORDER BY AVG_TIMER_WAIT DESC
LIMIT 20;
```

**MongoDB - Find Slow Queries**:

```javascript
// View slow operations
db.system.profile.find({ millis: { $gt: 500 } }).sort({ ts: -1 }).limit(20).pretty();

// Aggregate slow operations by type
db.system.profile.aggregate([
  { $match: { millis: { $gt: 500 } } },
  { $group: { _id: "$command", count: { $sum: 1 }, avgTime: { $avg: "$millis" } } },
  { $sort: { avgTime: -1 } }
]);
```

### 4. Analyze Query Execution Plans

For each slow query, analyze the execution plan.

**PostgreSQL - EXPLAIN ANALYZE**:

```sql
-- Replace [SLOW_QUERY] with the actual query
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) [SLOW_QUERY];

-- Human-readable format
EXPLAIN (ANALYZE, BUFFERS)
SELECT u.id, u.email, COUNT(p.id) AS post_count
FROM users u
LEFT JOIN posts p ON p.user_id = u.id
WHERE u.created_at > NOW() - INTERVAL '30 days'
GROUP BY u.id, u.email;
```

Look for these indicators:

- **Seq Scan** - Full table scan (bad for large tables; consider an index)
- **Index Scan** - Using an index (good)
- **Nested Loop** - Join method (may be slow for large datasets)
- **Hash Join** / **Merge Join** - Usually better for large datasets
- **High execution time** - Optimization opportunity

**MySQL - EXPLAIN**:

```sql
EXPLAIN FORMAT=JSON
SELECT u.id, u.email, COUNT(p.id) AS post_count
FROM users u
LEFT JOIN posts p ON p.user_id = u.id
WHERE u.created_at > DATE_SUB(NOW(), INTERVAL 30 DAY)
GROUP BY u.id, u.email;
```

Look for:

- `type: ALL` - Full table scan (bad)
- `type: index` or `type: range` - Using an index (good)
- `rows: high_number` - A large estimated row count suggests optimization is needed

**MongoDB - Explain**:

```javascript
db.users.find({
  createdAt: { $gte: new Date(Date.now() - 30*24*60*60*1000) }
}).explain("executionStats");
```

Look for:

- `COLLSCAN` - Collection scan (bad; add an index)
- `IXSCAN` - Index scan (good)
- `executionTimeMillis` - Total execution time
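When many candidate queries need this treatment, the plan inspection can be scripted. A minimal sketch, assuming PostgreSQL and the node-postgres (`pg`) client: it walks the JSON plan and reports sequential scans, and because `EXPLAIN ANALYZE` actually executes the statement, it is wrapped in a rolled-back transaction.

```javascript
// Hypothetical plan checker: run EXPLAIN (ANALYZE, FORMAT JSON) on a query and
// list any Seq Scan nodes in the plan tree. Run against a non-production copy.
const { Client } = require('pg');

function collectSeqScans(node, found = []) {
  if (node['Node Type'] === 'Seq Scan') {
    found.push({ relation: node['Relation Name'], ms: node['Actual Total Time'] });
  }
  for (const child of node.Plans || []) collectSeqScans(child, found);
  return found;
}

async function checkQuery(sql) {
  const client = new Client({ connectionString: process.env.DATABASE_URL });
  await client.connect();
  try {
    await client.query('BEGIN');
    const { rows } = await client.query(`EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) ${sql}`);
    const raw = rows[0]['QUERY PLAN'];
    const plan = (typeof raw === 'string' ? JSON.parse(raw) : raw)[0];
    console.log('Execution time (ms):', plan['Execution Time']);
    console.log('Seq scans:', collectSeqScans(plan.Plan));
  } finally {
    await client.query('ROLLBACK');
    await client.end();
  }
}

checkQuery(`SELECT id, email FROM users WHERE email = 'user@example.com'`);
```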
### 5. Index Analysis and Optimization

**PostgreSQL - Missing Indexes**:

```sql
-- Find tables with frequent sequential scans (possible missing indexes)
SELECT
  schemaname,
  relname,
  seq_scan,
  seq_tup_read,
  idx_scan,
  seq_tup_read / seq_scan AS avg_seq_read
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_tup_read DESC
LIMIT 20;

-- Find unused indexes (candidates for removal)
SELECT
  schemaname,
  relname,
  indexrelname,
  idx_scan,
  pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND indexrelname NOT LIKE '%_pkey'
ORDER BY pg_relation_size(indexrelid) DESC;

-- Check duplicate indexes
SELECT
  pg_size_pretty(SUM(pg_relation_size(idx))::BIGINT) AS total_size,
  (array_agg(idx))[1] AS idx1,
  (array_agg(idx))[2] AS idx2,
  (array_agg(idx))[3] AS idx3,
  (array_agg(idx))[4] AS idx4
FROM (
  SELECT
    indexrelid::regclass AS idx,
    (indrelid::text ||E'\n'|| indclass::text ||E'\n'|| indkey::text ||E'\n'||
     COALESCE(indexprs::text,'')||E'\n'|| COALESCE(indpred::text,'')) AS key
  FROM pg_index
) sub
GROUP BY key
HAVING COUNT(*) > 1
ORDER BY SUM(pg_relation_size(idx)) DESC;
```

**Index Creation Examples**:

```sql
-- Simple index (single column)
CREATE INDEX CONCURRENTLY idx_users_email ON users(email);

-- Composite index (multiple columns) - order matters!
CREATE INDEX CONCURRENTLY idx_posts_user_created ON posts(user_id, created_at DESC);

-- Partial index (filtered)
CREATE INDEX CONCURRENTLY idx_users_active_email ON users(email) WHERE status = 'active';

-- Expression index
CREATE INDEX CONCURRENTLY idx_users_lower_email ON users(LOWER(email));

-- GIN index for full-text search
CREATE INDEX CONCURRENTLY idx_posts_search ON posts
USING GIN(to_tsvector('english', title || ' ' || content));
```

**MySQL - Index Analysis**:

```sql
-- Check indexes on a table
SHOW INDEXES FROM users;

-- Find unused indexes (sys schema, MySQL 5.7+)
SELECT * FROM sys.schema_unused_indexes;

-- Create index
CREATE INDEX idx_users_email ON users(email);

-- Create composite index
CREATE INDEX idx_posts_user_created ON posts(user_id, created_at);
```

**MongoDB - Index Analysis**:

```javascript
// List all indexes on a collection
db.users.getIndexes();

// Check index usage
db.users.aggregate([ { $indexStats: {} } ]);

// Create single field index
db.users.createIndex({ email: 1 });

// Create compound index
db.posts.createIndex({ userId: 1, createdAt: -1 });

// Create text index for search
db.posts.createIndex({ title: "text", content: "text" });

// Create partial index
db.users.createIndex(
  { email: 1 },
  { partialFilterExpression: { status: "active" } }
);
```
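When these index changes are applied from application tooling rather than a psql session, keep in mind that `CREATE INDEX CONCURRENTLY` cannot run inside a transaction block. A minimal sketch, assuming PostgreSQL and the node-postgres (`pg`) client; the index matches the composite example above.

```javascript
// Hypothetical migration step: build an index concurrently on a plain
// connection (no BEGIN/COMMIT), since CONCURRENTLY forbids transactions.
const { Client } = require('pg');

async function createIndexConcurrently() {
  const client = new Client({ connectionString: process.env.DATABASE_URL });
  await client.connect();
  try {
    // Disable the statement timeout so a long index build is not cut off.
    await client.query('SET statement_timeout = 0');
    await client.query(
      'CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_posts_user_created ON posts (user_id, created_at DESC)'
    );
  } finally {
    await client.end();
  }
}

createIndexConcurrently().catch((err) => {
  console.error('Index build failed; check pg_stat_activity, drop any invalid index, and retry', err);
  process.exit(1);
});
```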
### 6. Query Optimization Examples

**Example 1: N+1 Query Problem**

```javascript
// BEFORE (N+1 problem)
async function getUsersWithPosts() {
  const users = await User.findAll(); // 1 query
  for (const user of users) {
    user.posts = await Post.findAll({ // N queries (one per user)
      where: { userId: user.id }
    });
  }
  return users;
}

// AFTER (eager loading)
async function getUsersWithPosts() {
  const users = await User.findAll({ // 1 query with join
    include: [{ model: Post, as: 'posts' }]
  });
  return users;
}

// SQL generated:
// SELECT u.*, p.* FROM users u LEFT JOIN posts p ON p.user_id = u.id;
```

**Example 2: SELECT * Optimization**

```sql
-- BEFORE (fetches all columns)
SELECT * FROM users WHERE email = 'user@example.com';

-- AFTER (fetch only needed columns)
SELECT id, email, name, created_at FROM users WHERE email = 'user@example.com';
```

**Example 3: Inefficient JOIN**

```sql
-- BEFORE (subquery for each row)
SELECT
  u.id,
  u.name,
  (SELECT COUNT(*) FROM posts WHERE user_id = u.id) AS post_count
FROM users u;

-- AFTER (single join with aggregation)
SELECT
  u.id,
  u.name,
  COUNT(p.id) AS post_count
FROM users u
LEFT JOIN posts p ON p.user_id = u.id
GROUP BY u.id, u.name;
```

**Example 4: Pagination with OFFSET**

```sql
-- BEFORE (inefficient for large offsets)
SELECT * FROM posts
ORDER BY created_at DESC
LIMIT 20 OFFSET 10000;

-- AFTER (cursor-based pagination)
SELECT * FROM posts
WHERE created_at < '2025-10-01T00:00:00Z' -- cursor from last result
ORDER BY created_at DESC
LIMIT 20;
```

**Example 5: OR to UNION Optimization**

```sql
-- BEFORE (prevents index usage)
SELECT * FROM users
WHERE email = 'test@example.com' OR username = 'testuser';

-- AFTER (allows index usage on both columns)
SELECT * FROM users WHERE email = 'test@example.com'
UNION
SELECT * FROM users WHERE username = 'testuser';
```

### 7. Schema Optimization

**Normalization vs. Denormalization**:

```sql
-- Normalized (3NF) - reduces redundancy but requires joins
CREATE TABLE users (
  id SERIAL PRIMARY KEY,
  name VARCHAR(255),
  email VARCHAR(255)
);

CREATE TABLE user_profiles (
  user_id INTEGER PRIMARY KEY REFERENCES users(id),
  bio TEXT,
  avatar_url VARCHAR(500)
);

-- Denormalized - faster reads, some redundancy
CREATE TABLE users (
  id SERIAL PRIMARY KEY,
  name VARCHAR(255),
  email VARCHAR(255),
  bio TEXT,
  avatar_url VARCHAR(500)
);
```

**Partitioning Large Tables**:

```sql
-- PostgreSQL table partitioning by date
CREATE TABLE posts (
  id BIGSERIAL,
  user_id INTEGER,
  content TEXT,
  created_at TIMESTAMP NOT NULL,
  PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at);

-- Create partitions
CREATE TABLE posts_2025_q1 PARTITION OF posts
  FOR VALUES FROM ('2025-01-01') TO ('2025-04-01');

CREATE TABLE posts_2025_q2 PARTITION OF posts
  FOR VALUES FROM ('2025-04-01') TO ('2025-07-01');
```

**Column Type Optimization**:

```sql
-- BEFORE (inefficient types)
CREATE TABLE users (
  id BIGSERIAL,
  email VARCHAR(500),
  status VARCHAR(50),
  age NUMERIC,
  is_verified CHAR(1)
);

-- AFTER (optimized types)
CREATE TABLE users (
  id SERIAL,          -- Use SERIAL if < 2 billion records
  email VARCHAR(255), -- Right-sized
  status VARCHAR(20) CHECK (status IN ('active', 'inactive', 'suspended')), -- Constrained
  age SMALLINT CHECK (age >= 0 AND age <= 150), -- Appropriate int size
  is_verified BOOLEAN -- Native boolean
);
```
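Range partitions only help if future partitions exist before rows arrive for them. Below is a minimal sketch of a scheduled job that pre-creates the next quarterly partition for the `posts` table above, assuming PostgreSQL and the node-postgres (`pg`) client; the naming scheme follows the `posts_2025_q1` convention and quarter boundaries are computed in UTC.

```javascript
// Hypothetical maintenance job: create the next quarter's partition if missing.
const { Client } = require('pg');

function quarterRange(date) {
  const startMonth = Math.floor(date.getUTCMonth() / 3) * 3;
  const from = new Date(Date.UTC(date.getUTCFullYear(), startMonth, 1));
  const to = new Date(Date.UTC(date.getUTCFullYear(), startMonth + 3, 1));
  return { from, to };
}

async function ensureNextQuarterPartition() {
  const now = new Date();
  // Adding three months always lands inside the next quarter.
  const next = new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth() + 3, 1));
  const { from, to } = quarterRange(next);
  const name = `posts_${from.getUTCFullYear()}_q${Math.floor(from.getUTCMonth() / 3) + 1}`;

  const client = new Client({ connectionString: process.env.DATABASE_URL });
  await client.connect();
  try {
    await client.query(
      `CREATE TABLE IF NOT EXISTS ${name} PARTITION OF posts
       FOR VALUES FROM ('${from.toISOString().slice(0, 10)}') TO ('${to.toISOString().slice(0, 10)}')`
    );
  } finally {
    await client.end();
  }
}

ensureNextQuarterPartition().catch(console.error);
```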
### 8. Connection Pool Optimization

**Node.js (pg) Example**:

```javascript
// BEFORE (default settings)
const pool = new Pool({
  connectionString: process.env.DATABASE_URL
});

// AFTER (optimized for the application)
const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  max: 20,                       // Maximum pool size (based on workload)
  min: 5,                        // Minimum idle connections
  idleTimeoutMillis: 30000,      // Remove idle connections after 30s
  connectionTimeoutMillis: 2000, // Fail fast if no connection available
  statement_timeout: 5000,       // Query timeout (5s)
  query_timeout: 5000
});

// Monitor pool health
pool.on('connect', () => {
  console.log('Database connection established');
});

pool.on('error', (err) => {
  console.error('Unexpected database error', err);
});

// Check pool status
setInterval(() => {
  console.log({
    total: pool.totalCount,
    idle: pool.idleCount,
    waiting: pool.waitingCount
  });
}, 60000);
```

**Connection Pool Sizing Formula**:

```
Optimal Pool Size = (Core Count × 2) + Effective Spindle Count

Example for a 4-core server with an SSD:
Pool Size = (4 × 2) + 1 = 9 connections
```

### 9. Query Caching

**Application-Level Caching (Redis)**:

```javascript
// BEFORE (no caching)
async function getUser(userId) {
  return await User.findByPk(userId);
}

// AFTER (with Redis cache)
async function getUser(userId) {
  const cacheKey = `user:${userId}`;

  // Try cache first
  const cached = await redis.get(cacheKey);
  if (cached) {
    return JSON.parse(cached);
  }

  // Cache miss - query database
  const user = await User.findByPk(userId);

  // Store in cache (TTL: 5 minutes)
  await redis.setex(cacheKey, 300, JSON.stringify(user));

  return user;
}

// Invalidate cache on update
async function updateUser(userId, data) {
  const user = await User.update(data, { where: { id: userId } });

  // Invalidate cache
  await redis.del(`user:${userId}`);

  return user;
}
```

**Database-Level Caching**:

```sql
-- PostgreSQL materialized view (cached aggregate)
CREATE MATERIALIZED VIEW user_stats AS
SELECT
  user_id,
  COUNT(*) AS post_count,
  MAX(created_at) AS last_post_at
FROM posts
GROUP BY user_id;

-- Create unique index (required for REFRESH ... CONCURRENTLY)
CREATE UNIQUE INDEX idx_user_stats_user_id ON user_stats(user_id);

-- Refresh periodically (in a cron job)
REFRESH MATERIALIZED VIEW CONCURRENTLY user_stats;
```

### 10. Measure Impact

After implementing optimizations:

```sql
-- PostgreSQL: Compare before/after query times
SELECT
  query,
  calls,
  mean_exec_time,
  total_exec_time
FROM pg_stat_statements
WHERE query LIKE '%[your_query_pattern]%'
ORDER BY mean_exec_time DESC;

-- Check index usage after creating indexes
SELECT
  schemaname,
  relname,
  indexrelname,
  idx_scan,
  idx_tup_read,
  idx_tup_fetch
FROM pg_stat_user_indexes
WHERE indexrelname IN ('idx_users_email', 'idx_posts_user_created')
ORDER BY idx_scan DESC;
```
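To make the before/after comparison concrete, the pg_stat_statements snapshot can be captured programmatically around an optimization. A minimal sketch, assuming `pg_stat_statements` is enabled and the node-postgres (`pg`) client; the query pattern argument and the interactive pause are illustrative only.

```javascript
// Hypothetical before/after comparison for a single query pattern.
const { Pool } = require('pg');
const pool = new Pool({ connectionString: process.env.DATABASE_URL });

async function snapshot(pattern) {
  const { rows } = await pool.query(
    `SELECT calls, round(mean_exec_time::numeric, 2) AS avg_ms
       FROM pg_stat_statements
      WHERE query ILIKE $1
      ORDER BY total_exec_time DESC
      LIMIT 1`,
    [`%${pattern}%`]
  );
  return rows[0] || null;
}

async function compare(pattern) {
  const before = await snapshot(pattern);
  console.log('Apply the optimization, replay traffic, then press Enter...');
  await new Promise((resolve) => process.stdin.once('data', resolve));
  const after = await snapshot(pattern);
  console.log({ pattern, before, after });
  await pool.end();
}

compare('FROM users WHERE email');
```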
## Output Format

````markdown
# Database Optimization Report: [Context]

**Optimization Date**: [Date]
**Database**: [PostgreSQL/MySQL/MongoDB version]
**Environment**: [production/staging/development]
**Threshold**: [X]ms for slow queries

## Executive Summary

[2-3 paragraph summary of findings and optimizations applied]

## Baseline Metrics

### Slow Queries Identified

| Query Pattern | Avg Time | Calls | Total Time | % CPU |
|---------------|----------|-------|------------|-------|
| SELECT users WHERE email = ... | 450ms | 1,250 | 562s | 12.3% |
| SELECT posts with user JOIN | 820ms | 450 | 369s | 8.1% |
| SELECT COUNT(*) FROM activity_logs | 2,100ms | 120 | 252s | 5.5% |

### Index Analysis

**Missing Indexes**: 3 tables with frequent sequential scans
**Unused Indexes**: 2 indexes with 0 scans (candidates for removal)
**Duplicate Indexes**: 1 set of duplicate indexes found

### Connection Pool Metrics

- **Total Connections**: 15
- **Idle Connections**: 3
- **Active Connections**: 12
- **Waiting Requests**: 5 (indicates pool exhaustion)

## Optimizations Implemented

### 1. Added Missing Indexes

#### Index: idx_users_email

```sql
CREATE INDEX CONCURRENTLY idx_users_email ON users(email);
```

**Impact**:

- **Before**: 450ms avg, 1,250 calls, Seq Scan on 500K rows
- **After**: 8ms avg, 1,250 calls, Index Scan
- **Improvement**: 98.2% faster (442ms saved per query)
- **Total Time Saved**: 552s per analysis period

**Execution Plan Comparison**:

```
BEFORE:
Seq Scan on users (cost=0.00..15234.50 rows=1 width=124) (actual time=442.231..448.891 rows=1)
  Filter: (email = 'user@example.com')
  Rows Removed by Filter: 499999

AFTER:
Index Scan using idx_users_email on users (cost=0.42..8.44 rows=1 width=124) (actual time=0.031..0.033 rows=1)
  Index Cond: (email = 'user@example.com')
```

#### Index: idx_posts_user_created

```sql
CREATE INDEX CONCURRENTLY idx_posts_user_created ON posts(user_id, created_at DESC);
```

**Impact**:

- **Before**: 820ms avg, Nested Loop + Seq Scan
- **After**: 45ms avg, Index Scan with sorted results
- **Improvement**: 94.5% faster (775ms saved per query)

### 2. Query Optimizations

#### Optimization: Fixed N+1 Query in User Posts Endpoint

**Before**:

```javascript
const users = await User.findAll();
for (const user of users) {
  user.posts = await Post.findAll({ where: { userId: user.id } });
}
// Result: 1 + N queries (251 queries for 250 users)
```

**After**:

```javascript
const users = await User.findAll({
  include: [{ model: Post, as: 'posts' }]
});
// Result: 1 query with JOIN
```

**Impact**:

- **Before**: 2,100ms for 250 users (1 + 250 queries)
- **After**: 180ms for 250 users (1 query)
- **Improvement**: 91.4% faster

#### Optimization: Cursor-Based Pagination

**Before**:

```sql
SELECT * FROM posts ORDER BY created_at DESC LIMIT 20 OFFSET 10000;
-- Execution time: 1,200ms (must scan and skip 10,000 rows)
```

**After**:

```sql
SELECT * FROM posts
WHERE created_at < '2025-09-01T12:00:00Z'
ORDER BY created_at DESC
LIMIT 20;
-- Execution time: 15ms (index seek directly to position)
```

**Impact**: 98.8% faster pagination for deep pages
### 3. Schema Optimizations

#### Denormalized User Activity Counts

**Before**:

```sql
-- Expensive aggregation on every query
SELECT u.*, COUNT(p.id) AS post_count
FROM users u
LEFT JOIN posts p ON p.user_id = u.id
GROUP BY u.id;
```

**After**:

```sql
-- Added cached column with trigger updates
ALTER TABLE users ADD COLUMN post_count INTEGER DEFAULT 0;

-- Trigger to maintain count
CREATE OR REPLACE FUNCTION update_user_post_count()
RETURNS TRIGGER AS $$
BEGIN
  IF TG_OP = 'INSERT' THEN
    UPDATE users SET post_count = post_count + 1 WHERE id = NEW.user_id;
  ELSIF TG_OP = 'DELETE' THEN
    UPDATE users SET post_count = post_count - 1 WHERE id = OLD.user_id;
  END IF;
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER update_post_count
AFTER INSERT OR DELETE ON posts
FOR EACH ROW EXECUTE FUNCTION update_user_post_count();

-- Simple query now
SELECT * FROM users;
```

**Impact**:

- **Before**: 340ms (aggregation query)
- **After**: 12ms (simple select)
- **Improvement**: 96.5% faster

### 4. Connection Pool Optimization

**Before**:

```javascript
const pool = new Pool(); // Default settings
// Max: 10, Min: 0
// Frequent connection exhaustion under load
```

**After**:

```javascript
const pool = new Pool({
  max: 20, // Increased for higher concurrency
  min: 5,  // Keep warm connections
  idleTimeoutMillis: 30000,
  connectionTimeoutMillis: 2000,
  statement_timeout: 5000
});
```

**Impact**:

- **Before**: 45 connection timeout errors per hour under load
- **After**: 0 connection timeout errors
- **Improvement**: Eliminated connection pool exhaustion

### 5. Query Result Caching

**Implementation**:

```javascript
async function getUserProfile(userId) {
  const cacheKey = `user:${userId}:profile`;

  const cached = await redis.get(cacheKey);
  if (cached) return JSON.parse(cached);

  const profile = await User.findByPk(userId, {
    include: ['profile', 'settings']
  });

  await redis.setex(cacheKey, 300, JSON.stringify(profile));
  return profile;
}
```

**Impact**:

- **Cache Hit Rate**: 87% (after 24 hours)
- **Avg Response Time (cached)**: 3ms
- **Avg Response Time (uncached)**: 45ms
- **Database Load Reduction**: 87%

## Results Summary

### Overall Performance Improvements

| Metric | Before | After | Improvement |
|--------|--------|-------|-------------|
| Avg Query Time | 285ms | 34ms | 88% faster |
| Slow Query Count (>500ms) | 23 queries | 2 queries | 91% reduction |
| Database CPU Usage | 68% | 32% | 53% reduction |
| Connection Pool Timeouts | 45/hour | 0/hour | 100% elimination |
| Cache Hit Rate | N/A | 87% | New capability |

### Query-Specific Improvements

| Query | Before | After | Improvement |
|-------|--------|-------|-------------|
| User lookup by email | 450ms | 8ms | 98.2% |
| User posts listing | 820ms | 45ms | 94.5% |
| User activity with posts | 2,100ms | 180ms | 91.4% |
| Deep pagination | 1,200ms | 15ms | 98.8% |

### Index Impact

| Index | Scans | Rows Read | Impact |
|-------|-------|-----------|--------|
| idx_users_email | 1,250 | 1,250 | Direct lookups |
| idx_posts_user_created | 450 | 9,000 | User posts queries |

## Monitoring Recommendations

### Key Metrics to Track

1. **Query Performance**:

   ```sql
   -- Weekly query performance review
   SELECT
     substring(query, 1, 100) AS query,
     calls,
     mean_exec_time,
     total_exec_time
   FROM pg_stat_statements
   WHERE mean_exec_time > 100
   ORDER BY mean_exec_time DESC
   LIMIT 20;
   ```

2. **Index Usage**:

   ```sql
   -- Monitor new index effectiveness
   SELECT * FROM pg_stat_user_indexes
   WHERE indexrelname LIKE 'idx_%'
   ORDER BY idx_scan DESC;
   ```
3. **Connection Pool Health**:

   ```javascript
   // Log pool metrics every minute
   setInterval(() => {
     console.log('Pool:', pool.totalCount, 'Idle:', pool.idleCount);
   }, 60000);
   ```

4. **Cache Hit Rates**:

   ```javascript
   // Track Redis cache effectiveness
   const stats = await redis.info('stats');
   // Monitor keyspace_hits vs keyspace_misses
   ```

### Alerts to Configure

- Slow query count > 10 per hour
- Connection pool utilization > 85%
- Cache hit rate < 70%
- Database CPU > 80%

## Trade-offs and Considerations

**Denormalization Trade-offs**:

- **Benefit**: Faster reads (96.5% improvement)
- **Cost**: Increased storage (minimal), trigger overhead on writes
- **Conclusion**: Worth it for read-heavy workloads

**Connection Pool Size**:

- **Benefit**: Eliminated timeouts
- **Cost**: Increased memory usage (~20MB)
- **Consideration**: Monitor database connection limits

**Caching Strategy**:

- **Benefit**: 87% reduction in database load
- **Cost**: Cache invalidation complexity, Redis dependency
- **Consideration**: Implement cache warming for critical data

## Next Steps

1. **Monitor** new indexes and query performance for 1 week
2. **Implement** additional caching for frequently accessed data
3. **Consider** table partitioning for `activity_logs` (2M+ rows)
4. **Schedule** VACUUM ANALYZE for optimized tables
5. **Review** remaining 2 slow queries for further optimization

## Maintenance Recommendations

**Weekly**:

- Review pg_stat_statements for new slow queries
- Check index usage statistics

**Monthly**:

- Analyze table statistics: `VACUUM ANALYZE`
- Review and remove unused indexes
- Check for table bloat

**Quarterly**:

- Review schema design for optimization opportunities
- Evaluate partitioning strategy for large tables
- Update connection pool settings based on usage patterns
````