528 lines
11 KiB
Markdown
528 lines
11 KiB
Markdown
# PostgreSQL Performance Optimization
|
|
|
|
Query optimization, indexing strategies, EXPLAIN analysis, and performance tuning for PostgreSQL.
|
|
|
|
## EXPLAIN Command
|
|
|
|
### Basic EXPLAIN
|
|
```sql
|
|
-- Show query plan
|
|
EXPLAIN SELECT * FROM users WHERE id = 1;
|
|
|
|
-- Output shows:
|
|
-- - Execution plan nodes
|
|
-- - Estimated costs
|
|
-- - Estimated rows
|
|
```
|
|
|
|
### EXPLAIN ANALYZE
|
|
```sql
|
|
-- Execute query and show actual performance
|
|
EXPLAIN ANALYZE SELECT * FROM users WHERE age > 18;
|
|
|
|
-- Shows:
|
|
-- - Actual execution time
|
|
-- - Actual rows returned
|
|
-- - Planning time
|
|
-- - Execution time
|
|
```
|
|
|
|
### EXPLAIN Options
|
|
```sql
|
|
-- Verbose output
|
|
EXPLAIN (VERBOSE) SELECT * FROM users;
|
|
|
|
-- Show buffer usage
|
|
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM users WHERE active = true;
|
|
|
|
-- JSON format
|
|
EXPLAIN (FORMAT JSON, ANALYZE) SELECT * FROM users;
|
|
|
|
-- All options
|
|
EXPLAIN (ANALYZE, BUFFERS, VERBOSE, TIMING, COSTS)
|
|
SELECT * FROM users WHERE id = 1;
|
|
```
|
|
|
|
## Understanding Query Plans
|
|
|
|
### Scan Methods
|
|
|
|
#### Sequential Scan
|
|
```sql
|
|
-- Full table scan (reads all rows)
|
|
EXPLAIN SELECT * FROM users WHERE name = 'Alice';
|
|
|
|
-- Output: Seq Scan on users
|
|
-- Indicates: no suitable index or small table
|
|
```
|
|
|
|
#### Index Scan
|
|
```sql
|
|
-- Uses index to find rows
|
|
EXPLAIN SELECT * FROM users WHERE id = 1;
|
|
|
|
-- Output: Index Scan using users_pkey on users
|
|
-- Best for: selective queries, small result sets
|
|
```
|
|
|
|
#### Index Only Scan
|
|
```sql
|
|
-- Query covered by index (no table access)
|
|
CREATE INDEX idx_users_email_name ON users(email, name);
|
|
EXPLAIN SELECT email, name FROM users WHERE email = 'alice@example.com';
|
|
|
|
-- Output: Index Only Scan using idx_users_email_name
|
|
-- Best performance: no heap fetch needed
|
|
```
|
|
|
|
#### Bitmap Scan
|
|
```sql
|
|
-- Combines multiple indexes or handles large result sets
|
|
EXPLAIN SELECT * FROM users WHERE age > 18 AND status = 'active';
|
|
|
|
-- Output:
|
|
-- Bitmap Heap Scan on users
|
|
-- Recheck Cond: ...
|
|
-- -> Bitmap Index Scan on idx_age
|
|
|
|
-- Good for: moderate selectivity
|
|
```
|
|
|
|
### Join Methods
|
|
|
|
#### Nested Loop
|
|
```sql
|
|
-- For each row in outer table, scan inner table
|
|
EXPLAIN SELECT * FROM orders o
|
|
JOIN customers c ON o.customer_id = c.id
|
|
WHERE c.id = 1;
|
|
|
|
-- Output: Nested Loop
|
|
-- Best for: small outer table, indexed inner table
|
|
```
|
|
|
|
#### Hash Join
|
|
```sql
|
|
-- Build hash table from smaller table
|
|
EXPLAIN SELECT * FROM orders o
|
|
JOIN customers c ON o.customer_id = c.id;
|
|
|
|
-- Output: Hash Join
|
|
-- Best for: large tables, equality conditions
|
|
```
|
|
|
|
#### Merge Join
|
|
```sql
|
|
-- Both inputs sorted on join key
|
|
EXPLAIN SELECT * FROM orders o
|
|
JOIN customers c ON o.customer_id = c.id
|
|
ORDER BY o.customer_id;
|
|
|
|
-- Output: Merge Join
|
|
-- Best for: pre-sorted data, large sorted inputs
|
|
```
|
|
|
|
## Indexing Strategies
|
|
|
|
### B-tree Index (Default)
|
|
```sql
|
|
-- General purpose index
|
|
CREATE INDEX idx_users_email ON users(email);
|
|
CREATE INDEX idx_orders_date ON orders(order_date);
|
|
|
|
-- Supports: =, <, <=, >, >=, BETWEEN, IN, IS NULL
|
|
-- Supports: ORDER BY, MIN/MAX
|
|
```
|
|
|
|
### Composite Index
|
|
```sql
|
|
-- Multiple columns (order matters!)
|
|
CREATE INDEX idx_users_status_created ON users(status, created_at);
|
|
|
|
-- Supports queries on:
|
|
-- - status
|
|
-- - status, created_at
|
|
-- Does NOT support: created_at alone
|
|
|
|
-- Column order: most selective first
|
|
-- Exception: match query WHERE/ORDER BY patterns
|
|
```
|
|
|
|
### Partial Index
|
|
```sql
|
|
-- Index subset of rows
|
|
CREATE INDEX idx_active_users ON users(email)
|
|
WHERE status = 'active';
|
|
|
|
-- Smaller index, faster queries with matching WHERE clause
|
|
-- Query must include WHERE status = 'active' to use index
|
|
```
|
|
|
|
### Expression Index
|
|
```sql
|
|
-- Index on computed value
|
|
CREATE INDEX idx_users_lower_email ON users(LOWER(email));
|
|
|
|
-- Query must use same expression
|
|
SELECT * FROM users WHERE LOWER(email) = 'alice@example.com';
|
|
```
|
|
|
|
### GIN Index (Generalized Inverted Index)
|
|
```sql
|
|
-- For array, JSONB, full-text search
|
|
CREATE INDEX idx_products_tags ON products USING GIN(tags);
|
|
CREATE INDEX idx_documents_data ON documents USING GIN(data);
|
|
|
|
-- Array queries
|
|
SELECT * FROM products WHERE tags @> ARRAY['featured'];
|
|
|
|
-- JSONB queries
|
|
SELECT * FROM documents WHERE data @> '{"status": "active"}';
|
|
```
|
|
|
|
### GiST Index (Generalized Search Tree)
|
|
```sql
|
|
-- For geometric data, range types, full-text
|
|
CREATE INDEX idx_locations_geom ON locations USING GiST(geom);
|
|
|
|
-- Geometric queries
|
|
SELECT * FROM locations WHERE geom && ST_MakeEnvelope(...);
|
|
```
|
|
|
|
### Hash Index
|
|
```sql
|
|
-- Equality comparisons only
|
|
CREATE INDEX idx_users_hash_email ON users USING HASH(email);
|
|
|
|
-- Only supports: =
|
|
-- Rarely used (B-tree usually better)
|
|
```
|
|
|
|
### BRIN Index (Block Range Index)
|
|
```sql
|
|
-- For very large tables with natural clustering
|
|
CREATE INDEX idx_logs_brin_created ON logs USING BRIN(created_at);
|
|
|
|
-- Tiny index size, good for append-only data
|
|
-- Best for: time-series, logging, large tables
|
|
```
|
|
|
|
## Query Optimization Techniques
|
|
|
|
### Avoid SELECT *
|
|
```sql
|
|
-- Bad
|
|
SELECT * FROM users WHERE id = 1;
|
|
|
|
-- Good (only needed columns)
|
|
SELECT id, name, email FROM users WHERE id = 1;
|
|
```
|
|
|
|
### Use LIMIT
|
|
```sql
|
|
-- Limit result set
|
|
SELECT * FROM users ORDER BY created_at DESC LIMIT 10;
|
|
|
|
-- PostgreSQL can stop early with LIMIT
|
|
```
|
|
|
|
### Index for ORDER BY
|
|
```sql
|
|
-- Create index matching sort order
|
|
CREATE INDEX idx_users_created_desc ON users(created_at DESC);
|
|
|
|
-- Query uses index for sorting
|
|
SELECT * FROM users ORDER BY created_at DESC LIMIT 10;
|
|
```
|
|
|
|
### Covering Index
|
|
```sql
|
|
-- Include all queried columns in index
|
|
CREATE INDEX idx_users_email_name_status ON users(email, name, status);
|
|
|
|
-- Query covered by index (no table access)
|
|
SELECT name, status FROM users WHERE email = 'alice@example.com';
|
|
```
|
|
|
|
### EXISTS vs IN
|
|
```sql
|
|
-- Prefer EXISTS for large subqueries
|
|
-- Bad
|
|
SELECT * FROM customers
|
|
WHERE id IN (SELECT customer_id FROM orders WHERE total > 1000);
|
|
|
|
-- Good
|
|
SELECT * FROM customers c
|
|
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id AND o.total > 1000);
|
|
```
|
|
|
|
### JOIN Order
|
|
```sql
|
|
-- Filter before joining
|
|
-- Bad
|
|
SELECT * FROM orders o
|
|
JOIN customers c ON o.customer_id = c.id
|
|
WHERE o.status = 'completed' AND c.country = 'USA';
|
|
|
|
-- Good (filter in subquery)
|
|
SELECT * FROM (
|
|
SELECT * FROM orders WHERE status = 'completed'
|
|
) o
|
|
JOIN (
|
|
SELECT * FROM customers WHERE country = 'USA'
|
|
) c ON o.customer_id = c.id;
|
|
|
|
-- Or use CTE
|
|
WITH filtered_orders AS (
|
|
SELECT * FROM orders WHERE status = 'completed'
|
|
),
|
|
filtered_customers AS (
|
|
SELECT * FROM customers WHERE country = 'USA'
|
|
)
|
|
SELECT * FROM filtered_orders o
|
|
JOIN filtered_customers c ON o.customer_id = c.id;
|
|
```
|
|
|
|
### Avoid Functions in WHERE
|
|
```sql
|
|
-- Bad (index not used)
|
|
SELECT * FROM users WHERE LOWER(email) = 'alice@example.com';
|
|
|
|
-- Good (create expression index)
|
|
CREATE INDEX idx_users_lower_email ON users(LOWER(email));
|
|
-- Then query uses index
|
|
|
|
-- Or store lowercase separately
|
|
ALTER TABLE users ADD COLUMN email_lower TEXT;
|
|
UPDATE users SET email_lower = LOWER(email);
|
|
CREATE INDEX idx_users_email_lower ON users(email_lower);
|
|
```
|
|
|
|
## Statistics and ANALYZE
|
|
|
|
### Update Statistics
|
|
```sql
|
|
-- Analyze table (update statistics)
|
|
ANALYZE users;
|
|
|
|
-- Analyze specific columns
|
|
ANALYZE users(email, status);
|
|
|
|
-- Analyze all tables
|
|
ANALYZE;
|
|
|
|
-- Auto-analyze (configured in postgresql.conf)
|
|
autovacuum_analyze_threshold = 50
|
|
autovacuum_analyze_scale_factor = 0.1
|
|
```
|
|
|
|
### Check Statistics
|
|
```sql
|
|
-- Last analyze time
|
|
SELECT schemaname, tablename, last_analyze, last_autoanalyze
|
|
FROM pg_stat_user_tables;
|
|
|
|
-- Statistics targets (adjust for important columns)
|
|
ALTER TABLE users ALTER COLUMN email SET STATISTICS 1000;
|
|
```
|
|
|
|
## VACUUM and Maintenance
|
|
|
|
### VACUUM
|
|
```sql
|
|
-- Reclaim storage, update statistics
|
|
VACUUM users;
|
|
|
|
-- Verbose output
|
|
VACUUM VERBOSE users;
|
|
|
|
-- Full vacuum (rewrites table, locks table)
|
|
VACUUM FULL users;
|
|
|
|
-- Analyze after vacuum
|
|
VACUUM ANALYZE users;
|
|
```
|
|
|
|
### Auto-Vacuum
|
|
```sql
|
|
-- Check autovacuum status
|
|
SELECT schemaname, tablename, last_vacuum, last_autovacuum
|
|
FROM pg_stat_user_tables;
|
|
|
|
-- Configure in postgresql.conf
|
|
autovacuum = on
|
|
autovacuum_vacuum_threshold = 50
|
|
autovacuum_vacuum_scale_factor = 0.2
|
|
```
|
|
|
|
### REINDEX
|
|
```sql
|
|
-- Rebuild index
|
|
REINDEX INDEX idx_users_email;
|
|
|
|
-- Rebuild all indexes on table
|
|
REINDEX TABLE users;
|
|
|
|
-- Rebuild all indexes in schema
|
|
REINDEX SCHEMA public;
|
|
```
|
|
|
|
## Monitoring Queries
|
|
|
|
### Active Queries
|
|
```sql
|
|
-- Current queries
|
|
SELECT pid, usename, state, query, query_start
|
|
FROM pg_stat_activity
|
|
WHERE state != 'idle';
|
|
|
|
-- Long-running queries
|
|
SELECT pid, now() - query_start AS duration, query
|
|
FROM pg_stat_activity
|
|
WHERE state != 'idle' AND now() - query_start > interval '5 minutes'
|
|
ORDER BY duration DESC;
|
|
```
|
|
|
|
### Slow Query Log
|
|
```sql
|
|
-- Enable slow query logging (postgresql.conf)
|
|
log_min_duration_statement = 100 -- milliseconds
|
|
|
|
-- Or per session
|
|
SET log_min_duration_statement = 100;
|
|
|
|
-- Logs appear in PostgreSQL log files
|
|
```
|
|
|
|
### pg_stat_statements Extension
|
|
```sql
|
|
-- Enable extension
|
|
CREATE EXTENSION pg_stat_statements;
|
|
|
|
-- View query statistics
|
|
SELECT query, calls, total_exec_time, mean_exec_time, rows
|
|
FROM pg_stat_statements
|
|
ORDER BY mean_exec_time DESC
|
|
LIMIT 10;
|
|
|
|
-- Reset statistics
|
|
SELECT pg_stat_statements_reset();
|
|
```
|
|
|
|
## Index Usage Analysis
|
|
|
|
### Check Index Usage
|
|
```sql
|
|
-- Index usage statistics
|
|
SELECT schemaname, tablename, indexname, idx_scan, idx_tup_read, idx_tup_fetch
|
|
FROM pg_stat_user_indexes
|
|
ORDER BY idx_scan;
|
|
|
|
-- Unused indexes (idx_scan = 0)
|
|
SELECT schemaname, tablename, indexname
|
|
FROM pg_stat_user_indexes
|
|
WHERE idx_scan = 0 AND indexname NOT LIKE '%_pkey';
|
|
```
|
|
|
|
### Index Size
|
|
```sql
|
|
-- Index sizes
|
|
SELECT schemaname, tablename, indexname,
|
|
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
|
|
FROM pg_stat_user_indexes
|
|
ORDER BY pg_relation_size(indexrelid) DESC;
|
|
```
|
|
|
|
### Missing Indexes
|
|
```sql
|
|
-- Tables with sequential scans
|
|
SELECT schemaname, tablename, seq_scan, seq_tup_read
|
|
FROM pg_stat_user_tables
|
|
WHERE seq_scan > 0
|
|
ORDER BY seq_tup_read DESC;
|
|
|
|
-- Consider adding indexes to high seq_scan tables
|
|
```
|
|
|
|
## Configuration Tuning
|
|
|
|
### Memory Settings (postgresql.conf)
|
|
```conf
|
|
# Shared buffers (25% of RAM)
|
|
shared_buffers = 4GB
|
|
|
|
# Work memory (per operation)
|
|
work_mem = 64MB
|
|
|
|
# Maintenance work memory (VACUUM, CREATE INDEX)
|
|
maintenance_work_mem = 512MB
|
|
|
|
# Effective cache size (estimate of OS cache)
|
|
effective_cache_size = 12GB
|
|
```
|
|
|
|
### Query Planner Settings
|
|
```conf
|
|
# Random page cost (lower for SSD)
|
|
random_page_cost = 1.1
|
|
|
|
# Effective IO concurrency (number of concurrent disk operations)
|
|
effective_io_concurrency = 200
|
|
|
|
# Cost of parallel query startup
|
|
parallel_setup_cost = 1000
|
|
parallel_tuple_cost = 0.1
|
|
```
|
|
|
|
### Connection Settings
|
|
```conf
|
|
# Max connections
|
|
max_connections = 100
|
|
|
|
# Connection pooling recommended (pgBouncer)
|
|
```
|
|
|
|
## Best Practices
|
|
|
|
1. **Index strategy**
|
|
- Index foreign keys
|
|
- Index WHERE clause columns
|
|
- Index ORDER BY columns
|
|
- Use composite indexes for multi-column queries
|
|
- Keep index count reasonable (5-10 per table)
|
|
|
|
2. **Query optimization**
|
|
- Use EXPLAIN ANALYZE
|
|
- Avoid SELECT *
|
|
- Use LIMIT when possible
|
|
- Filter before joining
|
|
- Use appropriate join type
|
|
|
|
3. **Statistics**
|
|
- Regular ANALYZE
|
|
- Increase statistics target for skewed distributions
|
|
- Monitor autovacuum
|
|
|
|
4. **Monitoring**
|
|
- Enable pg_stat_statements
|
|
- Log slow queries
|
|
- Monitor index usage
|
|
- Check table bloat
|
|
|
|
5. **Maintenance**
|
|
- Regular VACUUM
|
|
- REINDEX periodically
|
|
- Update PostgreSQL version
|
|
- Monitor disk space
|
|
|
|
6. **Configuration**
|
|
- Tune memory settings
|
|
- Adjust for workload (OLTP vs OLAP)
|
|
- Use connection pooling
|
|
- Enable query logging
|
|
|
|
7. **Testing**
|
|
- Test queries with production-like data volume
|
|
- Benchmark before/after changes
|
|
- Monitor production metrics
|