Database Optimization Operation
You are executing the database operation to optimize queries, schema, indexes, and connection management.
Parameters
Received: $ARGUMENTS (after removing 'database' operation name)
Expected format: target:"queries|schema|indexes|connections|all" [context:"specific-details"] [threshold:"time-in-ms"] [environment:"prod|staging|dev"]
Parameter definitions:
- target (required): What to optimize: queries, schema, indexes, connections, or all
- context (optional): Specific context such as table names, query patterns, or a problem description
- threshold (optional): Time threshold for slow queries in milliseconds (default: 500ms)
- environment (optional): Target environment (default: development)
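Example invocation (values are illustrative):
target:"queries" context:"orders table lookups" threshold:"300" environment:"staging"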
Workflow
1. Identify Database Technology
Detect database type from codebase:
# Check for database configuration
grep -r "DATABASE_URL\|DB_CONNECTION\|database" .env* config/ 2>/dev/null | head -5
# Check package dependencies
grep -E "pg|mysql|mongodb|sqlite" package.json 2>/dev/null
Common patterns:
- PostgreSQL: pg, pg_stat_statements, .pgpass
- MySQL: mysql2, mysql, .my.cnf
- MongoDB: mongoose, mongodb
- SQLite: sqlite3, .db files
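ORM configuration files are another signal worth checking; a quick sketch (these are the usual file-name conventions and vary by project):
```bash
# Look for common ORM/config files (Prisma, Knex, TypeORM, Rails, Alembic)
ls prisma/schema.prisma knexfile.js ormconfig.json config/database.yml alembic.ini 2>/dev/null
```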
2. Enable Performance Monitoring
PostgreSQL:
-- Enable pg_stat_statements extension (if not already)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- Reset statistics for fresh baseline
SELECT pg_stat_statements_reset();
-- Enable slow query logging
ALTER SYSTEM SET log_min_duration_statement = 500; -- 500ms threshold
SELECT pg_reload_conf();
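Note that CREATE EXTENSION alone is not enough: pg_stat_statements must also be listed in shared_preload_libraries, which only takes effect after a server restart. A quick check:
```sql
-- Should include 'pg_stat_statements'; if not, add it and restart the server
SHOW shared_preload_libraries;
```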
MySQL:
-- Enable slow query log
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 0.5; -- 500ms threshold
SET GLOBAL log_queries_not_using_indexes = 'ON';
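The digest query used in step 3 relies on the Performance Schema, which can only be enabled at server startup. Confirm it is on before depending on those views:
```sql
-- Must report ON; performance_schema cannot be toggled at runtime
SHOW VARIABLES LIKE 'performance_schema';
```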
MongoDB:
// Enable profiling
db.setProfilingLevel(1, { slowms: 500 });
// View profiler status
db.getProfilingStatus();
3. Analyze Slow Queries
PostgreSQL - Find Slow Queries:
-- Top 20 slow queries by average time
SELECT
substring(query, 1, 100) AS short_query,
round(mean_exec_time::numeric, 2) AS avg_time_ms,
calls,
round(total_exec_time::numeric, 2) AS total_time_ms,
round((100 * total_exec_time / sum(total_exec_time) OVER ())::numeric, 2) AS percentage_cpu
FROM pg_stat_statements
WHERE query NOT LIKE '%pg_stat_statements%'
ORDER BY mean_exec_time DESC
LIMIT 20;
-- Queries with most calls (potential optimization targets)
SELECT
substring(query, 1, 100) AS short_query,
calls,
round(mean_exec_time::numeric, 2) AS avg_time_ms,
round(total_exec_time::numeric, 2) AS total_time_ms
FROM pg_stat_statements
WHERE query NOT LIKE '%pg_stat_statements%'
ORDER BY calls DESC
LIMIT 20;
-- Most time-consuming queries
SELECT
substring(query, 1, 100) AS short_query,
round(total_exec_time::numeric, 2) AS total_time_ms,
calls,
round(mean_exec_time::numeric, 2) AS avg_time_ms
FROM pg_stat_statements
WHERE query NOT LIKE '%pg_stat_statements%'
ORDER BY total_exec_time DESC
LIMIT 20;
MySQL - Find Slow Queries:
-- Analyze slow query log
SELECT
DIGEST_TEXT AS query,
COUNT_STAR AS exec_count,
AVG_TIMER_WAIT/1000000000 AS avg_time_ms,
SUM_TIMER_WAIT/1000000000 AS total_time_ms
FROM performance_schema.events_statements_summary_by_digest
ORDER BY AVG_TIMER_WAIT DESC
LIMIT 20;
MongoDB - Find Slow Queries:
// View slow operations
db.system.profile.find({
millis: { $gt: 500 }
}).sort({ ts: -1 }).limit(20).pretty();
// Aggregate slow operations by type
db.system.profile.aggregate([
{ $match: { millis: { $gt: 500 } } },
{ $group: {
_id: "$command",
count: { $sum: 1 },
avgTime: { $avg: "$millis" }
}},
{ $sort: { avgTime: -1 } }
]);
4. Analyze Query Execution Plans
For each slow query, analyze the execution plan:
PostgreSQL - EXPLAIN ANALYZE:
-- Replace [SLOW_QUERY] with actual query
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
[SLOW_QUERY];
-- Human-readable format
EXPLAIN (ANALYZE, BUFFERS)
SELECT u.id, u.email, COUNT(p.id) AS post_count
FROM users u
LEFT JOIN posts p ON p.user_id = u.id
WHERE u.created_at > NOW() - INTERVAL '30 days'
GROUP BY u.id, u.email;
Look for these indicators:
- Seq Scan - Full table scan (bad for large tables, consider index)
- Index Scan - Using index (good)
- Nested Loop - Join method (may be slow for large datasets)
- Hash Join / Merge Join - Usually better for large datasets
- High execution time - Optimization opportunity
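To test whether the planner can use an index at all, you can discourage sequential scans for the current session and re-run EXPLAIN; a sketch (a planner experiment only, not something to set globally):
```sql
SET enable_seqscan = off;   -- session-level planner setting, for testing only
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, email FROM users WHERE email = 'user@example.com';
RESET enable_seqscan;       -- restore the default
```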
MySQL - EXPLAIN:
EXPLAIN FORMAT=JSON
SELECT u.id, u.email, COUNT(p.id) AS post_count
FROM users u
LEFT JOIN posts p ON p.user_id = u.id
WHERE u.created_at > DATE_SUB(NOW(), INTERVAL 30 DAY)
GROUP BY u.id, u.email;
Look for:
- type: ALL - full table scan (bad)
- type: index or type: range - using an index (good)
- rows: a high row estimate suggests optimization is needed
MongoDB - Explain:
db.users.find({
createdAt: { $gte: new Date(Date.now() - 30*24*60*60*1000) }
}).explain("executionStats");
Look for:
- COLLSCAN - collection scan (bad; add an index)
- IXSCAN - index scan (good)
- executionTimeMillis - total execution time
5. Index Analysis and Optimization
PostgreSQL - Missing Indexes:
-- Find tables with missing indexes (frequent seq scans)
SELECT
schemaname,
relname AS tablename,
seq_scan,
seq_tup_read,
idx_scan,
seq_tup_read / seq_scan AS avg_seq_read
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_tup_read DESC
LIMIT 20;
-- Find unused indexes (candidates for removal)
SELECT
schemaname,
relname AS tablename,
indexrelname AS indexname,
idx_scan,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
AND indexrelname NOT LIKE '%_pkey'
ORDER BY pg_relation_size(indexrelid) DESC;
-- Check duplicate indexes
SELECT
pg_size_pretty(SUM(pg_relation_size(idx))::BIGINT) AS total_size,
(array_agg(idx))[1] AS idx1,
(array_agg(idx))[2] AS idx2,
(array_agg(idx))[3] AS idx3,
(array_agg(idx))[4] AS idx4
FROM (
SELECT
indexrelid::regclass AS idx,
(indrelid::text ||E'\n'|| indclass::text ||E'\n'|| indkey::text ||E'\n'|| COALESCE(indexprs::text,'')||E'\n' || COALESCE(indpred::text,'')) AS key
FROM pg_index
) sub
GROUP BY key
HAVING COUNT(*) > 1
ORDER BY SUM(pg_relation_size(idx)) DESC;
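Once an index is confirmed unused (and is not backing a primary key or unique constraint), it can be dropped without blocking writes; the index name below is illustrative:
```sql
-- Cannot run inside a transaction block
DROP INDEX CONCURRENTLY IF EXISTS idx_orders_legacy_status;
```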
Index Creation Examples:
-- Simple index (single column)
CREATE INDEX CONCURRENTLY idx_users_email ON users(email);
-- Composite index (multiple columns) - order matters!
CREATE INDEX CONCURRENTLY idx_posts_user_created
ON posts(user_id, created_at DESC);
-- Partial index (filtered)
CREATE INDEX CONCURRENTLY idx_users_active_email
ON users(email)
WHERE status = 'active';
-- Expression index
CREATE INDEX CONCURRENTLY idx_users_lower_email
ON users(LOWER(email));
-- GIN index for full-text search (GiST also works, but GIN is usually faster to query)
CREATE INDEX CONCURRENTLY idx_posts_search
ON posts USING GIN(to_tsvector('english', title || ' ' || content));
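CREATE INDEX CONCURRENTLY can leave an INVALID index behind if the build fails; such indexes consume space without being used and should be dropped and rebuilt. A quick check:
```sql
-- Indexes left INVALID by a failed concurrent build
SELECT indexrelid::regclass AS index_name, indrelid::regclass AS table_name
FROM pg_index
WHERE NOT indisvalid;
```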
MySQL - Index Analysis:
-- Check indexes on a table
SHOW INDEXES FROM users;
-- Approximate check: indexes with zero cardinality (often unused or never analyzed);
-- MySQL 5.7+ also provides sys.schema_unused_indexes for actual usage data
SELECT
TABLE_NAME,
INDEX_NAME,
SUM(CARDINALITY) AS cardinality
FROM information_schema.STATISTICS
WHERE TABLE_SCHEMA = DATABASE()
GROUP BY TABLE_NAME, INDEX_NAME
HAVING SUM(CARDINALITY) = 0;
-- Create index
CREATE INDEX idx_users_email ON users(email);
-- Create composite index
CREATE INDEX idx_posts_user_created ON posts(user_id, created_at);
MongoDB - Index Analysis:
// List all indexes on collection
db.users.getIndexes();
// Check index usage
db.users.aggregate([
{ $indexStats: {} }
]);
// Create single field index
db.users.createIndex({ email: 1 });
// Create compound index
db.posts.createIndex({ userId: 1, createdAt: -1 });
// Create text index for search
db.posts.createIndex({ title: "text", content: "text" });
// Create partial index
db.users.createIndex(
{ email: 1 },
{ partialFilterExpression: { status: "active" } }
);
6. Query Optimization Examples
Example 1: N+1 Query Problem
// BEFORE (N+1 problem)
async function getUsersWithPosts() {
const users = await User.findAll(); // 1 query
for (const user of users) {
user.posts = await Post.findAll({ // N queries (one per user)
where: { userId: user.id }
});
}
return users;
}
// AFTER (eager loading)
async function getUsersWithPosts() {
const users = await User.findAll({ // 1 query with join
include: [{ model: Post, as: 'posts' }]
});
return users;
}
// SQL generated:
// SELECT u.*, p.* FROM users u LEFT JOIN posts p ON p.user_id = u.id;
Example 2: SELECT * Optimization
-- BEFORE (fetches all columns)
SELECT * FROM users WHERE email = 'user@example.com';
-- AFTER (fetch only needed columns)
SELECT id, email, name, created_at FROM users WHERE email = 'user@example.com';
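If the same narrow column list is read very frequently, a covering index (PostgreSQL 11+ INCLUDE) can let the planner answer the query from the index alone; a sketch assuming the column list above:
```sql
CREATE INDEX CONCURRENTLY idx_users_email_covering
ON users(email) INCLUDE (id, name, created_at);
```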
Example 3: Inefficient JOIN
-- BEFORE (subquery for each row)
SELECT
u.id,
u.name,
(SELECT COUNT(*) FROM posts WHERE user_id = u.id) AS post_count
FROM users u;
-- AFTER (single join with aggregation)
SELECT
u.id,
u.name,
COUNT(p.id) AS post_count
FROM users u
LEFT JOIN posts p ON p.user_id = u.id
GROUP BY u.id, u.name;
Example 4: Pagination with OFFSET
-- BEFORE (inefficient for large offsets)
SELECT * FROM posts ORDER BY created_at DESC LIMIT 20 OFFSET 10000;
-- AFTER (cursor-based pagination)
SELECT * FROM posts
WHERE created_at < '2025-10-01T00:00:00Z' -- cursor from last result
ORDER BY created_at DESC
LIMIT 20;
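If created_at is not unique, cursor pages can skip or repeat rows at the boundary; adding a tie-breaker column keeps the cursor stable. A sketch (cursor values are illustrative; assumes an index on (created_at, id)):
```sql
SELECT * FROM posts
WHERE (created_at, id) < ('2025-10-01T00:00:00Z', 12345)
ORDER BY created_at DESC, id DESC
LIMIT 20;
```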
Example 5: OR to UNION Optimization
-- BEFORE (prevents index usage)
SELECT * FROM users WHERE email = 'test@example.com' OR username = 'testuser';
-- AFTER (allows index usage on both columns)
SELECT * FROM users WHERE email = 'test@example.com'
UNION
SELECT * FROM users WHERE username = 'testuser';
7. Schema Optimization
Normalization vs. Denormalization:
-- Normalized (3NF) - reduces redundancy but requires joins
CREATE TABLE users (
id SERIAL PRIMARY KEY,
name VARCHAR(255),
email VARCHAR(255)
);
CREATE TABLE user_profiles (
user_id INTEGER PRIMARY KEY REFERENCES users(id),
bio TEXT,
avatar_url VARCHAR(500)
);
-- Denormalized - faster reads, some redundancy
CREATE TABLE users (
id SERIAL PRIMARY KEY,
name VARCHAR(255),
email VARCHAR(255),
bio TEXT,
avatar_url VARCHAR(500)
);
Partitioning Large Tables:
-- PostgreSQL table partitioning by date
CREATE TABLE posts (
id BIGSERIAL,
user_id INTEGER,
content TEXT,
created_at TIMESTAMP NOT NULL,
PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at);
-- Create partitions
CREATE TABLE posts_2025_q1 PARTITION OF posts
FOR VALUES FROM ('2025-01-01') TO ('2025-04-01');
CREATE TABLE posts_2025_q2 PARTITION OF posts
FOR VALUES FROM ('2025-04-01') TO ('2025-07-01');
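A DEFAULT partition (PostgreSQL 11+) catches rows that fall outside every defined range, so inserts do not fail when a new quarter begins before its partition has been created:
```sql
CREATE TABLE posts_default PARTITION OF posts DEFAULT;
```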
Column Type Optimization:
-- BEFORE (inefficient types)
CREATE TABLE users (
id BIGSERIAL,
email VARCHAR(500),
status VARCHAR(50),
age NUMERIC,
is_verified CHAR(1)
);
-- AFTER (optimized types)
CREATE TABLE users (
id SERIAL, -- Use SERIAL if < 2 billion records
email VARCHAR(255), -- Right-sized
status VARCHAR(20) CHECK (status IN ('active', 'inactive', 'suspended')), -- Constrained
age SMALLINT CHECK (age >= 0 AND age <= 150), -- Appropriate int size
is_verified BOOLEAN -- Native boolean
);
8. Connection Pool Optimization
Node.js (pg) Example:
// BEFORE (default settings)
const pool = new Pool({
connectionString: process.env.DATABASE_URL
});
// AFTER (optimized for application)
const pool = new Pool({
connectionString: process.env.DATABASE_URL,
max: 20, // Maximum pool size (based on workload)
min: 5, // Minimum idle connections
idleTimeoutMillis: 30000, // Remove idle connections after 30s
connectionTimeoutMillis: 2000, // Fail fast if no connection available
statement_timeout: 5000, // Query timeout (5s)
query_timeout: 5000
});
// Monitor pool health
pool.on('connect', () => {
console.log('Database connection established');
});
pool.on('error', (err) => {
console.error('Unexpected database error', err);
});
// Check pool status
setInterval(() => {
console.log({
total: pool.totalCount,
idle: pool.idleCount,
waiting: pool.waitingCount
});
}, 60000);
Connection Pool Sizing Formula:
Optimal Pool Size = (Core Count × 2) + Effective Spindle Count
Example for 4-core server with SSD:
Pool Size = (4 × 2) + 1 = 9 connections
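A minimal sketch of applying the formula at startup (assumes Node.js and SSD storage, counted as one effective spindle; treat the result as a starting point and tune under real load):
```javascript
const os = require('os');
const { Pool } = require('pg');

// (core count × 2) + effective spindle count (1 for SSDs)
const suggestedMax = os.cpus().length * 2 + 1;

const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  max: suggestedMax
});
```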
9. Query Caching
Application-Level Caching (Redis):
// BEFORE (no caching)
async function getUser(userId) {
return await User.findByPk(userId);
}
// AFTER (with Redis cache)
async function getUser(userId) {
const cacheKey = `user:${userId}`;
// Try cache first
const cached = await redis.get(cacheKey);
if (cached) {
return JSON.parse(cached);
}
// Cache miss - query database
const user = await User.findByPk(userId);
// Store in cache (TTL: 5 minutes)
await redis.setex(cacheKey, 300, JSON.stringify(user));
return user;
}
// Invalidate cache on update
async function updateUser(userId, data) {
const user = await User.update(data, { where: { id: userId } });
// Invalidate cache
await redis.del(`user:${userId}`);
return user;
}
Database-Level Caching:
-- PostgreSQL materialized view (cached aggregate)
CREATE MATERIALIZED VIEW user_stats AS
SELECT
user_id,
COUNT(*) AS post_count,
MAX(created_at) AS last_post_at
FROM posts
GROUP BY user_id;
-- A UNIQUE index is required for REFRESH ... CONCURRENTLY
CREATE UNIQUE INDEX idx_user_stats_user_id ON user_stats(user_id);
-- Refresh periodically (in cron job)
REFRESH MATERIALIZED VIEW CONCURRENTLY user_stats;
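If the pg_cron extension is available, the refresh can be scheduled inside the database instead of an external cron job; a sketch (the 15-minute interval is illustrative):
```sql
SELECT cron.schedule('*/15 * * * *',
  'REFRESH MATERIALIZED VIEW CONCURRENTLY user_stats');
```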
10. Measure Impact
After implementing optimizations:
-- PostgreSQL: Compare before/after query times
SELECT
query,
calls,
mean_exec_time,
total_exec_time
FROM pg_stat_statements
WHERE query LIKE '%[your_query_pattern]%'
ORDER BY mean_exec_time DESC;
-- Check index usage after creating indexes
SELECT
schemaname,
relname AS tablename,
indexrelname AS indexname,
idx_scan,
idx_tup_read,
idx_tup_fetch
FROM pg_stat_user_indexes
WHERE indexrelname IN ('idx_users_email', 'idx_posts_user_created')
ORDER BY idx_scan DESC;
Output Format
# Database Optimization Report: [Context]
**Optimization Date**: [Date]
**Database**: [PostgreSQL/MySQL/MongoDB version]
**Environment**: [production/staging/development]
**Threshold**: [X]ms for slow queries
## Executive Summary
[2-3 paragraph summary of findings and optimizations applied]
## Baseline Metrics
### Slow Queries Identified
| Query Pattern | Avg Time | Calls | Total Time | % CPU |
|---------------|----------|-------|------------|-------|
| SELECT users WHERE email = ... | 450ms | 1,250 | 562s | 12.3% |
| SELECT posts with user JOIN | 820ms | 450 | 369s | 8.1% |
| SELECT COUNT(*) FROM activity_logs | 2,100ms | 120 | 252s | 5.5% |
### Index Analysis
**Missing Indexes**: 3 tables with frequent sequential scans
**Unused Indexes**: 2 indexes with 0 scans (candidates for removal)
**Duplicate Indexes**: 1 set of duplicate indexes found
### Connection Pool Metrics
- **Total Connections**: 15
- **Idle Connections**: 3
- **Active Connections**: 12
- **Waiting Requests**: 5 (indicates pool exhaustion)
## Optimizations Implemented
### 1. Added Missing Indexes
#### Index: idx_users_email
```sql
CREATE INDEX CONCURRENTLY idx_users_email ON users(email);
```
Impact:
- Before: 450ms avg, 1,250 calls, Seq Scan on 500K rows
- After: 8ms avg, 1,250 calls, Index Scan
- Improvement: 98.2% faster (442ms saved per query)
- Total Time Saved: 552s per analysis period
Execution Plan Comparison:
BEFORE:
Seq Scan on users (cost=0.00..15234.50 rows=1 width=124) (actual time=442.231..448.891 rows=1)
Filter: (email = 'user@example.com')
Rows Removed by Filter: 499999
AFTER:
Index Scan using idx_users_email on users (cost=0.42..8.44 rows=1 width=124) (actual time=0.031..0.033 rows=1)
Index Cond: (email = 'user@example.com')
#### Index: idx_posts_user_created
```sql
CREATE INDEX CONCURRENTLY idx_posts_user_created ON posts(user_id, created_at DESC);
```
Impact:
- Before: 820ms avg, Nested Loop + Seq Scan
- After: 45ms avg, Index Scan with sorted results
- Improvement: 94.5% faster (775ms saved per query)
### 2. Query Optimizations
#### Optimization: Fixed N+1 Query in User Posts Endpoint
Before:
const users = await User.findAll();
for (const user of users) {
user.posts = await Post.findAll({ where: { userId: user.id } });
}
// Result: 1 + N queries (251 queries for 250 users)
After:
const users = await User.findAll({
include: [{ model: Post, as: 'posts' }]
});
// Result: 1 query with JOIN
Impact:
- Before: 2,100ms for 250 users (1 + 250 queries)
- After: 180ms for 250 users (1 query)
- Improvement: 91.4% faster
#### Optimization: Cursor-Based Pagination
Before:
SELECT * FROM posts ORDER BY created_at DESC LIMIT 20 OFFSET 10000;
-- Execution time: 1,200ms (must scan and skip 10,000 rows)
After:
SELECT * FROM posts
WHERE created_at < '2025-09-01T12:00:00Z'
ORDER BY created_at DESC
LIMIT 20;
-- Execution time: 15ms (index seek directly to position)
Impact: 98.8% faster pagination for deep pages
### 3. Schema Optimizations
#### Denormalized User Activity Counts
Before:
-- Expensive aggregation on every query
SELECT u.*, COUNT(p.id) AS post_count
FROM users u
LEFT JOIN posts p ON p.user_id = u.id
GROUP BY u.id;
After:
-- Added cached column with trigger updates
ALTER TABLE users ADD COLUMN post_count INTEGER DEFAULT 0;
-- Trigger to maintain count
CREATE OR REPLACE FUNCTION update_user_post_count()
RETURNS TRIGGER AS $$
BEGIN
IF TG_OP = 'INSERT' THEN
UPDATE users SET post_count = post_count + 1 WHERE id = NEW.user_id;
ELSIF TG_OP = 'DELETE' THEN
UPDATE users SET post_count = post_count - 1 WHERE id = OLD.user_id;
END IF;
RETURN NULL;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER update_post_count
AFTER INSERT OR DELETE ON posts
FOR EACH ROW EXECUTE FUNCTION update_user_post_count();
-- Simple query now
SELECT * FROM users;
Impact:
- Before: 340ms (aggregation query)
- After: 12ms (simple select)
- Improvement: 96.5% faster
### 4. Connection Pool Optimization
Before:
const pool = new Pool(); // Default settings
// Max: 10, Min: 0
// Frequent connection exhaustion under load
After:
const pool = new Pool({
max: 20, // Increased for higher concurrency
min: 5, // Keep warm connections
idleTimeoutMillis: 30000,
connectionTimeoutMillis: 2000,
statement_timeout: 5000
});
Impact:
- Before: 45 connection timeout errors per hour under load
- After: 0 connection timeout errors
- Improvement: Eliminated connection pool exhaustion
### 5. Query Result Caching
Implementation:
async function getUserProfile(userId) {
const cacheKey = `user:${userId}:profile`;
const cached = await redis.get(cacheKey);
if (cached) return JSON.parse(cached);
const profile = await User.findByPk(userId, {
include: ['profile', 'settings']
});
await redis.setex(cacheKey, 300, JSON.stringify(profile));
return profile;
}
Impact:
- Cache Hit Rate: 87% (after 24 hours)
- Avg Response Time (cached): 3ms
- Avg Response Time (uncached): 45ms
- Database Load Reduction: 87%
## Results Summary
### Overall Performance Improvements
| Metric | Before | After | Improvement |
|---|---|---|---|
| Avg Query Time | 285ms | 34ms | 88% faster |
| Slow Query Count (>500ms) | 23 queries | 2 queries | 91% reduction |
| Database CPU Usage | 68% | 32% | 53% reduction |
| Connection Pool Timeouts | 45/hour | 0/hour | 100% elimination |
| Cache Hit Rate | N/A | 87% | New capability |
### Query-Specific Improvements
| Query | Before | After | Improvement |
|---|---|---|---|
| User lookup by email | 450ms | 8ms | 98.2% |
| User posts listing | 820ms | 45ms | 94.5% |
| User activity with posts | 2,100ms | 180ms | 91.4% |
| Deep pagination | 1,200ms | 15ms | 98.8% |
### Index Impact
| Index | Scans | Rows Read | Impact |
|---|---|---|---|
| idx_users_email | 1,250 | 1,250 | Direct lookups |
| idx_posts_user_created | 450 | 9,000 | User posts queries |
## Monitoring Recommendations
### Key Metrics to Track
- Query Performance:
  -- Weekly query performance review
  SELECT substring(query, 1, 100) AS query, calls, mean_exec_time, total_exec_time
  FROM pg_stat_statements
  WHERE mean_exec_time > 100
  ORDER BY mean_exec_time DESC
  LIMIT 20;
- Index Usage:
  -- Monitor new index effectiveness
  SELECT * FROM pg_stat_user_indexes
  WHERE indexrelname LIKE 'idx_%'
  ORDER BY idx_scan DESC;
- Connection Pool Health:
  // Log pool metrics every minute
  setInterval(() => {
    console.log('Pool:', pool.totalCount, 'Idle:', pool.idleCount);
  }, 60000);
- Cache Hit Rates:
  // Track Redis cache effectiveness: monitor keyspace_hits vs keyspace_misses
  const stats = await redis.info('stats');
### Alerts to Configure
- Slow query count > 10 per hour
- Connection pool utilization > 85%
- Cache hit rate < 70%
- Database CPU > 80%
## Trade-offs and Considerations
Denormalization Trade-offs:
- Benefit: Faster reads (96.5% improvement)
- Cost: Increased storage (minimal), trigger overhead on writes
- Conclusion: Worth it for read-heavy workloads
Connection Pool Size:
- Benefit: Eliminated timeouts
- Cost: Increased memory usage (~20MB)
- Consideration: Monitor database connection limits
Caching Strategy:
- Benefit: 87% reduction in database load
- Cost: Cache invalidation complexity, Redis dependency
- Consideration: Implement cache warming for critical data
## Next Steps
- Monitor new indexes and query performance for 1 week
- Implement additional caching for frequently accessed data
- Consider table partitioning for activity_logs (2M+ rows)
- Schedule VACUUM ANALYZE for optimized tables
- Review remaining 2 slow queries for further optimization
## Maintenance Recommendations
Weekly:
- Review pg_stat_statements for new slow queries
- Check index usage statistics
Monthly:
- Analyze table statistics: VACUUM ANALYZE
- Review and remove unused indexes
- Check for table bloat
Quarterly:
- Review schema design for optimization opportunities
- Evaluate partitioning strategy for large tables
- Update connection pool settings based on usage patterns