---
description: Design and implement table partitioning strategies for massive datasets
shortcut: partition
---

# Database Partition Manager

Design, implement, and manage table partitioning strategies for massive datasets with automated partition maintenance, query optimization, and data lifecycle management.

## When to Use This Command

Use `/partition` when you need to:

- Manage tables exceeding 100GB with slow query performance
- Implement time-series data archival strategies (IoT, logs, metrics)
- Optimize queries that filter by date ranges or specific values
- Reduce maintenance windows for VACUUM, INDEX, and ANALYZE operations
- Implement efficient data retention policies (delete old partitions)
- Improve parallel query performance across multiple partitions

DON'T use this when:

- Tables are small (<10GB) and perform well
- Queries don't filter by the partition key (partition pruning can never kick in)
- The application can't be updated to handle partition-aware queries
- The database doesn't support native partitioning (use application-level sharding instead)

## Design Decisions

This command implements **declarative partitioning** because:

- Native database support provides optimal query performance
- Automatic partition pruning can reduce query execution time by 90%+ when queries filter on the partition key
- Constraint exclusion ensures only relevant partitions are scanned
- Partition-wise joins improve multi-table query performance
- Automated partition management reduces operational overhead

**Alternative considered: Application-level sharding**

- Full control over data distribution
- Requires application code changes
- No automatic query optimization
- Recommended for multi-tenant applications with tenant-based isolation

**Alternative considered: Inheritance-based partitioning (legacy)**

- Available in older PostgreSQL versions (<10)
- Manual trigger maintenance required
- No automatic partition pruning
- Recommended only for legacy systems

## Prerequisites

Before running this command:

1. Identify the partition key (typically a timestamp or category column)
2. Analyze query patterns to ensure they filter by the partition key
3. Estimate partition size (target: 10-50GB per partition)
4. Plan the partition retention policy (e.g., keep 90 days, archive the rest)
5. Test the partition migration on a development database

## Implementation Process

### Step 1: Analyze Table and Query Patterns

Review table size and query patterns, and identify the optimal partition strategy.

### Step 2: Design Partition Schema

Choose the partitioning method (range, list, hash) and partition key based on access patterns.

### Step 3: Create Partitioned Table

Convert the existing table to a partitioned table with minimal downtime using pg_partman or a manual migration.

### Step 4: Implement Automated Partition Maintenance

Set up automated partition creation, archival, and cleanup processes.

### Step 5: Optimize Queries for Partition Pruning

Ensure queries include the partition key in WHERE clauses so the planner can prune partitions automatically, as shown in the sketch below.
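To make Step 5 concrete, here is a minimal sketch using the `sensor_readings` schema from Example 1 below. The first query exposes the partition key to the planner; the second wraps the key in a function, which hides the range and forces a scan of every partition:

```sql
-- Pruning-friendly: the literal range on reading_time lets the planner
-- skip every partition outside January 2024.
EXPLAIN (ANALYZE)
SELECT sensor_id, AVG(reading_value)
FROM sensor_readings
WHERE reading_time >= '2024-01-01' AND reading_time < '2024-02-01'
GROUP BY sensor_id;

-- Pruning-defeating: applying a function to the partition key hides the
-- range from the planner, so all partitions are scanned.
EXPLAIN (ANALYZE)
SELECT sensor_id, AVG(reading_value)
FROM sensor_readings
WHERE DATE_TRUNC('month', reading_time) = '2024-01-01'
GROUP BY sensor_id;
```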
## Output Format

The command generates:

- `schema/partitioned_table.sql` - Partitioned table definition
- `maintenance/partition_manager.sql` - Automated partition management functions
- `scripts/partition_maintenance.sh` - Cron job for partition operations
- `migration/convert_to_partitioned.sql` - Zero-downtime migration script
- `monitoring/partition_health.sql` - Partition size and performance monitoring

## Code Examples

### Example 1: PostgreSQL Range Partitioning for Time-Series Data

```sql
-- Create partitioned table for time-series sensor data
CREATE TABLE sensor_readings (
    id BIGSERIAL,
    sensor_id INTEGER NOT NULL,
    reading_value NUMERIC(10,2) NOT NULL,
    reading_time TIMESTAMP NOT NULL,
    metadata JSONB,
    PRIMARY KEY (id, reading_time)
) PARTITION BY RANGE (reading_time);

-- Create indexes on the partitioned table (inherited by all partitions)
CREATE INDEX idx_sensor_readings_sensor_id ON sensor_readings (sensor_id);
CREATE INDEX idx_sensor_readings_time ON sensor_readings (reading_time);
CREATE INDEX idx_sensor_readings_metadata ON sensor_readings USING GIN (metadata);

-- Create initial partitions (monthly strategy)
CREATE TABLE sensor_readings_2024_01 PARTITION OF sensor_readings
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE sensor_readings_2024_02 PARTITION OF sensor_readings
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
CREATE TABLE sensor_readings_2024_03 PARTITION OF sensor_readings
    FOR VALUES FROM ('2024-03-01') TO ('2024-04-01');

-- Create default partition for out-of-range data
CREATE TABLE sensor_readings_default PARTITION OF sensor_readings DEFAULT;

-- Automated partition management function
CREATE OR REPLACE FUNCTION create_monthly_partitions(
    p_table_name TEXT,
    p_months_ahead INTEGER DEFAULT 3
) RETURNS VOID AS $$
DECLARE
    v_start_date DATE;
    v_end_date DATE;
    v_partition_name TEXT;
    v_sql TEXT;
    v_month INTEGER;
BEGIN
    -- Create partitions for the next N months
    FOR v_month IN 1..p_months_ahead LOOP
        v_start_date := DATE_TRUNC('month', CURRENT_DATE + (v_month || ' months')::INTERVAL);
        v_end_date := v_start_date + INTERVAL '1 month';
        v_partition_name := p_table_name || '_' || TO_CHAR(v_start_date, 'YYYY_MM');

        -- Skip if the partition already exists
        IF NOT EXISTS (
            SELECT 1 FROM pg_class WHERE relname = v_partition_name
        ) THEN
            v_sql := FORMAT(
                'CREATE TABLE %I PARTITION OF %I FOR VALUES FROM (%L) TO (%L)',
                v_partition_name, p_table_name, v_start_date, v_end_date
            );
            RAISE NOTICE 'Creating partition: %', v_partition_name;
            EXECUTE v_sql;

            -- Analyze the new partition
            EXECUTE FORMAT('ANALYZE %I', v_partition_name);
        END IF;
    END LOOP;
END;
$$ LANGUAGE plpgsql;

-- Automated partition archival and cleanup
CREATE OR REPLACE FUNCTION archive_old_partitions(
    p_table_name TEXT,
    p_retention_months INTEGER DEFAULT 12,
    p_archive_table TEXT DEFAULT NULL
) RETURNS VOID AS $$
DECLARE
    v_partition RECORD;
    v_cutoff_date DATE;
    v_sql TEXT;
BEGIN
    v_cutoff_date := DATE_TRUNC('month', CURRENT_DATE - (p_retention_months || ' months')::INTERVAL);

    FOR v_partition IN
        SELECT c.relname AS partition_name,
               pg_get_expr(c.relpartbound, c.oid) AS partition_bounds
        FROM pg_class c
        JOIN pg_inherits i ON i.inhrelid = c.oid
        JOIN pg_class p ON p.oid = i.inhparent
        WHERE p.relname = p_table_name
          AND c.relname LIKE p_table_name || '_%'
          AND c.relname != p_table_name || '_default'
        ORDER BY c.relname
    LOOP
        -- Extract the partition start date from its name
        IF v_partition.partition_name ~ '_\d{4}_\d{2}$' THEN
            DECLARE
                v_partition_date DATE;
            BEGIN
                v_partition_date := TO_DATE(
                    SUBSTRING(v_partition.partition_name FROM '\d{4}_\d{2}$'),
                    'YYYY_MM'
                );

                IF v_partition_date < v_cutoff_date THEN
                    RAISE NOTICE 'Archiving partition: %', v_partition.partition_name;

                    IF p_archive_table IS NOT NULL THEN
                        -- Move data to the archive table
                        v_sql := FORMAT(
                            'INSERT INTO %I SELECT * FROM %I',
                            p_archive_table, v_partition.partition_name
                        );
                        EXECUTE v_sql;
                    END IF;

                    -- Detach and drop the old partition
                    v_sql := FORMAT(
                        'ALTER TABLE %I DETACH PARTITION %I',
                        p_table_name, v_partition.partition_name
                    );
                    EXECUTE v_sql;

                    v_sql := FORMAT('DROP TABLE %I', v_partition.partition_name);
                    EXECUTE v_sql;

                    RAISE NOTICE 'Dropped partition: %', v_partition.partition_name;
                END IF;
            END;
        END IF;
    END LOOP;
END;
$$ LANGUAGE plpgsql;

-- Partition health monitoring
-- (pg_stat_user_tables exposes relname, not tablename)
CREATE OR REPLACE VIEW partition_health AS
SELECT
    schemaname,
    relname AS partition_name,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||relname)) AS total_size,
    pg_size_pretty(pg_relation_size(schemaname||'.'||relname)) AS table_size,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||relname)
                   - pg_relation_size(schemaname||'.'||relname)) AS index_size,
    n_live_tup AS row_count,
    n_dead_tup AS dead_rows,
    ROUND(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 2) AS dead_row_percent,
    last_vacuum,
    last_autovacuum,
    last_analyze,
    last_autoanalyze
FROM pg_stat_user_tables
WHERE relname LIKE '%_202%'  -- filter for partitions named by year/month
ORDER BY schemaname, relname;

-- Show whether a query benefits from partition pruning
CREATE OR REPLACE FUNCTION explain_partition_pruning(p_query TEXT)
RETURNS TABLE (plan_line TEXT) AS $$
BEGIN
    RETURN QUERY EXECUTE 'EXPLAIN (ANALYZE, BUFFERS) ' || p_query;
END;
$$ LANGUAGE plpgsql;
```
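A quick usage sketch for the functions and view above. The archive table name `sensor_readings_archive` is illustrative; the call assumes such a table already exists with a matching schema:

```sql
-- Pre-create partitions for the next 3 months
SELECT create_monthly_partitions('sensor_readings', 3);

-- Move data older than 12 months into sensor_readings_archive
-- (hypothetical table), then detach and drop the old partitions
SELECT archive_old_partitions('sensor_readings', 12, 'sensor_readings_archive');

-- Check partition sizes and bloat
SELECT * FROM partition_health WHERE partition_name LIKE 'sensor_readings_%';
```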
```bash
#!/bin/bash
# scripts/partition_maintenance.sh - Automated Partition Management

set -euo pipefail

# Configuration
DB_NAME="mydb"
DB_USER="postgres"
DB_HOST="localhost"
RETENTION_MONTHS=12
CREATE_AHEAD_MONTHS=3
LOG_FILE="/var/log/partition_maintenance.log"

log() {
    echo "[$(date +'%Y-%m-%d %H:%M:%S')] $1" | tee -a "$LOG_FILE"
}

# Create future partitions
create_partitions() {
    log "Creating partitions for next $CREATE_AHEAD_MONTHS months..."
    psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -v ON_ERROR_STOP=1 <<EOF
SELECT create_monthly_partitions('sensor_readings', $CREATE_AHEAD_MONTHS);
EOF
    log "Partition creation completed"
}

# Archive and drop partitions past the retention window
cleanup_partitions() {
    log "Cleaning up partitions older than $RETENTION_MONTHS months..."
    psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -v ON_ERROR_STOP=1 <<EOF
SELECT archive_old_partitions('sensor_readings', $RETENTION_MONTHS);
EOF
    log "Partition cleanup completed"
}

# Refresh planner statistics across all partitions
analyze_partitions() {
    log "Analyzing partitions..."
    psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -v ON_ERROR_STOP=1 <<EOF
ANALYZE sensor_readings;
EOF
    log "Analyze completed"
}

# Report partitions with excessive dead rows
health_report() {
    log "Generating partition health report..."
    psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -v ON_ERROR_STOP=1 <<EOF
SELECT partition_name, total_size, row_count, dead_row_percent
FROM partition_health
WHERE dead_row_percent > 10
ORDER BY dead_row_percent DESC;
EOF
    log "Health report generated"
}

# Main execution
main() {
    log "=== Starting Partition Maintenance ==="
    create_partitions
    cleanup_partitions
    analyze_partitions
    health_report
    log "=== Partition Maintenance Completed ==="
}

main "$@"
```
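If installing cron jobs on the database host isn't an option, the same schedule can live inside the database. A sketch using the pg_cron extension (an assumption: it must be installed and preloaded; job names and times are illustrative):

```sql
-- Nightly at 03:00: pre-create upcoming partitions
SELECT cron.schedule(
    'partition-create',
    '0 3 * * *',
    $$SELECT create_monthly_partitions('sensor_readings', 3)$$
);

-- Nightly at 03:30: archive and drop expired partitions
SELECT cron.schedule(
    'partition-cleanup',
    '30 3 * * *',
    $$SELECT archive_old_partitions('sensor_readings', 12)$$
);
```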
### Example 2: List Partitioning by Category with Hash Sub-Partitioning

```sql
-- Multi-level partitioning: LIST (by region) → HASH (by customer_id)
CREATE TABLE orders (
    order_id BIGSERIAL,
    customer_id INTEGER NOT NULL,
    region VARCHAR(10) NOT NULL,
    order_date TIMESTAMP NOT NULL,
    total_amount NUMERIC(10,2),
    PRIMARY KEY (order_id, region, customer_id)
) PARTITION BY LIST (region);

-- Create regional partitions
CREATE TABLE orders_us PARTITION OF orders
    FOR VALUES IN ('US', 'CA', 'MX')
    PARTITION BY HASH (customer_id);

CREATE TABLE orders_eu PARTITION OF orders
    FOR VALUES IN ('UK', 'FR', 'DE', 'ES', 'IT')
    PARTITION BY HASH (customer_id);

CREATE TABLE orders_asia PARTITION OF orders
    FOR VALUES IN ('JP', 'CN', 'IN', 'SG')
    PARTITION BY HASH (customer_id);

-- Create hash sub-partitions (4 buckets per region for parallel processing)
CREATE TABLE orders_us_0 PARTITION OF orders_us FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE orders_us_1 PARTITION OF orders_us FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE orders_us_2 PARTITION OF orders_us FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE orders_us_3 PARTITION OF orders_us FOR VALUES WITH (MODULUS 4, REMAINDER 3);

CREATE TABLE orders_eu_0 PARTITION OF orders_eu FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE orders_eu_1 PARTITION OF orders_eu FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE orders_eu_2 PARTITION OF orders_eu FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE orders_eu_3 PARTITION OF orders_eu FOR VALUES WITH (MODULUS 4, REMAINDER 3);

CREATE TABLE orders_asia_0 PARTITION OF orders_asia FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE orders_asia_1 PARTITION OF orders_asia FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE orders_asia_2 PARTITION OF orders_asia FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE orders_asia_3 PARTITION OF orders_asia FOR VALUES WITH (MODULUS 4, REMAINDER 3);

-- Query optimization with partition-wise join
SET enable_partitionwise_join = on;
SET enable_partitionwise_aggregate = on;

-- Demonstrate partition pruning
EXPLAIN (ANALYZE, BUFFERS)
SELECT customer_id, SUM(total_amount) AS total_spent
FROM orders
WHERE region = 'US'
  AND order_date >= '2024-01-01'
  AND order_date < '2024-02-01'
GROUP BY customer_id
ORDER BY total_spent DESC
LIMIT 100;
-- Output shows only orders_us partitions are scanned (not EU or Asia)
```
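A row whose `region` value appears in none of the lists above fails with "no partition of relation found for row" (see the Error Handling table below). A sketch of a hypothetical catch-all partition that avoids this, with an illustrative insert:

```sql
-- Hypothetical catch-all for region codes not listed above
CREATE TABLE orders_other PARTITION OF orders DEFAULT;

-- Without the default partition this insert errors;
-- with it, the row lands in orders_other
INSERT INTO orders (customer_id, region, order_date, total_amount)
VALUES (42, 'BR', '2024-01-15', 99.90);
```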
```python
# scripts/partition_migration.py - Zero-downtime partition migration

import logging
from datetime import timedelta

import psycopg2
from psycopg2 import sql

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class PartitionMigrator:
    """Migrate existing table to partitioned table with minimal downtime."""

    def __init__(self, connection_string: str):
        self.conn_string = connection_string

    def connect(self):
        return psycopg2.connect(self.conn_string)

    def migrate_to_partitioned(
        self,
        table_name: str,
        partition_column: str,
        partition_type: str = 'RANGE',
        partition_interval: str = 'MONTHLY'
    ):
        """
        Migrate table to partitioned table with zero downtime.

        Strategy:
        1. Create new partitioned table
        2. Copy existing data in batches
        3. Rename tables atomically
        4. Update application to use new table
        """
        conn = self.connect()
        conn.autocommit = False
        try:
            with conn.cursor() as cur:
                # Step 1: Create new partitioned table.
                # NOTE: unique indexes copied by INCLUDING ALL must contain
                # the partition column, or this statement will fail.
                logger.info(f"Creating partitioned table {table_name}_new...")
                cur.execute(f"""
                    CREATE TABLE {table_name}_new (
                        LIKE {table_name} INCLUDING ALL
                    ) PARTITION BY {partition_type} ({partition_column});
                """)

                # Step 2: Create initial partitions covering the existing data
                logger.info("Creating initial partitions...")
                if partition_interval == 'MONTHLY':
                    cur.execute(f"""
                        SELECT
                            DATE_TRUNC('month', MIN({partition_column})) AS min_date,
                            DATE_TRUNC('month', MAX({partition_column})) AS max_date
                        FROM {table_name};
                    """)
                    min_date, max_date = cur.fetchone()
                    logger.info(f"Data range: {min_date} to {max_date}")

                    current_date = min_date
                    while current_date <= max_date:
                        # Advance to the first day of the next month
                        next_date = (current_date + timedelta(days=32)).replace(day=1)
                        partition_name = f"{table_name}_{current_date.strftime('%Y_%m')}"

                        cur.execute(sql.SQL("""
                            CREATE TABLE {} PARTITION OF {}
                            FOR VALUES FROM (%s) TO (%s);
                        """).format(
                            sql.Identifier(partition_name),
                            sql.Identifier(f"{table_name}_new")
                        ), (current_date, next_date))

                        logger.info(f"Created partition: {partition_name}")
                        current_date = next_date

                # Step 3: Copy data in batches.
                # Assumes no concurrent writes to the source table; for live
                # tables, add a sync step (e.g., triggers) before the rename.
                logger.info("Copying data in batches...")
                batch_size = 10000
                offset = 0
                while True:
                    cur.execute(f"""
                        INSERT INTO {table_name}_new
                        SELECT * FROM {table_name}
                        ORDER BY {partition_column}
                        LIMIT {batch_size} OFFSET {offset};
                    """)
                    rows_copied = cur.rowcount
                    if rows_copied == 0:
                        break
                    offset += batch_size
                    logger.info(f"Copied {offset} rows...")
                    # Commit each batch to avoid long-running transactions
                    conn.commit()

                # Step 4: Verify row counts
                logger.info("Verifying row counts...")
                cur.execute(f"SELECT COUNT(*) FROM {table_name};")
                original_count = cur.fetchone()[0]
                cur.execute(f"SELECT COUNT(*) FROM {table_name}_new;")
                new_count = cur.fetchone()[0]

                if original_count != new_count:
                    raise Exception(
                        f"Row count mismatch! "
                        f"Original: {original_count}, New: {new_count}"
                    )
                logger.info(f"Row count verified: {original_count} rows")

                # Step 5: Rename tables atomically. Both renames run in the
                # same transaction and commit together in conn.commit() below.
                logger.info("Renaming tables...")
                cur.execute(f"ALTER TABLE {table_name} RENAME TO {table_name}_old;")
                cur.execute(f"ALTER TABLE {table_name}_new RENAME TO {table_name};")

                logger.info("Migration completed successfully!")
                logger.info(f"Old table preserved as {table_name}_old")

            conn.commit()
        except Exception as e:
            conn.rollback()
            logger.error(f"Migration failed: {e}")
            raise
        finally:
            conn.close()

    def verify_partition_pruning(self, query: str):
        """Test if query benefits from partition pruning."""
        conn = self.connect()
        try:
            with conn.cursor() as cur:
                cur.execute(f"EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) {query}")
                plan = cur.fetchone()[0][0]

                # Extract partition pruning info
                pruned = self._count_pruned_partitions(plan)
                logger.info("Partition pruning analysis:")
                logger.info(f"  Total partitions: {pruned['total']}")
                logger.info(f"  Scanned partitions: {pruned['scanned']}")
                logger.info(f"  Pruned partitions: {pruned['pruned']}")
                logger.info(f"  Pruning effectiveness: {pruned['effectiveness']:.1f}%")
                return pruned
        finally:
            conn.close()

    def _count_pruned_partitions(self, plan: dict) -> dict:
        """Recursively count partitions in explain plan."""
        total = 0
        scanned = 0

        def traverse(node):
            nonlocal total, scanned
            # Heuristic: partition names carry a year suffix like '_2024';
            # a partition counts as scanned if it produced rows or subplans
            if 'Relation Name' in node and '_202' in node['Relation Name']:
                total += 1
                if 'Plans' in node or node.get('Actual Rows', 0) > 0:
                    scanned += 1
            if 'Plans' in node:
                for child in node['Plans']:
                    traverse(child)

        traverse(plan['Plan'])
        pruned = total - scanned
        effectiveness = (pruned / total * 100) if total > 0 else 0
        return {
            'total': total,
            'scanned': scanned,
            'pruned': pruned,
            'effectiveness': effectiveness
        }


# Usage
if __name__ == "__main__":
    migrator = PartitionMigrator(
        "postgresql://user:password@localhost/mydb"
    )

    # Migrate sensor_readings table
    migrator.migrate_to_partitioned(
        table_name='sensor_readings',
        partition_column='reading_time',
        partition_type='RANGE',
        partition_interval='MONTHLY'
    )

    # Verify partition pruning works
    test_query = """
        SELECT * FROM sensor_readings
        WHERE reading_time >= '2024-10-01'
          AND reading_time < '2024-11-01'
        LIMIT 1000;
    """
    migrator.verify_partition_pruning(test_query)
```
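After the migration, a quick catalog check confirms the new parent and its partitions. This is a sketch using the same `pg_inherits` join as `archive_old_partitions` above:

```sql
-- List partitions of sensor_readings with their bounds
SELECT c.relname AS partition_name,
       pg_get_expr(c.relpartbound, c.oid) AS bounds
FROM pg_inherits i
JOIN pg_class c ON c.oid = i.inhrelid
JOIN pg_class p ON p.oid = i.inhparent
WHERE p.relname = 'sensor_readings'
ORDER BY c.relname;
```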
## Error Handling

| Error | Cause | Solution |
|-------|-------|----------|
| "No partition of relation ... found for row" | Data outside partition ranges | Create default partition or extend partition range |
| "Partition constraint violated" | Invalid data for partition | Fix data or adjust partition bounds |
| "Cannot create partition of temporary table" | Partitioning temp tables unsupported | Use regular tables or application-level sharding |
| "Too many partitions (>1000)" | Excessive partition count | Increase partition interval (daily → weekly → monthly) |
| "Constraint exclusion not working" | Query doesn't filter by partition key | Rewrite query to include partition key in WHERE clause |

## Configuration Options

**Partition Planning**

- `partition_type`: RANGE (dates), LIST (categories), HASH (distribution)
- `partition_interval`: DAILY, WEEKLY, MONTHLY, YEARLY
- `retention_policy`: How long to keep old partitions
- `partition_size_target`: Target 10-50GB per partition

**Query Optimization** (applied in the sketch after this list)

- `enable_partition_pruning = on`: Enable automatic partition elimination
- `constraint_exclusion = partition`: Enable constraint-based pruning
- `enable_partitionwise_join = on`: Join matching partitions directly
- `enable_partitionwise_aggregate = on`: Aggregate per partition, then combine
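These planner settings can be applied per session for ad-hoc analysis or persisted cluster-wide. A sketch (defaults noted are for recent PostgreSQL versions; `ALTER SYSTEM` requires superuser):

```sql
-- Per session, e.g., for one analytics connection
SET enable_partitionwise_aggregate = on;  -- default is off

-- Cluster-wide: persist the setting and reload the configuration
ALTER SYSTEM SET enable_partitionwise_join = on;
SELECT pg_reload_conf();

-- Confirm the effective values
SHOW enable_partition_pruning;
SHOW constraint_exclusion;
```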
found for row" | Data outside partition ranges | Create default partition or extend partition range | | "Partition constraint violated" | Invalid data for partition | Fix data or adjust partition bounds | | "Cannot create partition of temporary table" | Partitioning temp tables unsupported | Use regular tables or application-level sharding | | "Too many partitions (>1000)" | Excessive partition count | Increase partition interval (daily → weekly → monthly) | | "Constraint exclusion not working" | Query doesn't filter by partition key | Rewrite query to include partition key in WHERE clause | ## Configuration Options **Partition Planning** - `partition_type`: RANGE (dates), LIST (categories), HASH (distribution) - `partition_interval`: DAILY, WEEKLY, MONTHLY, YEARLY - `retention_policy`: How long to keep old partitions - `partition_size_target`: Target 10-50GB per partition **Query Optimization** - `enable_partition_pruning = on`: Enable automatic partition elimination - `constraint_exclusion = partition`: Enable constraint-based pruning - `enable_partitionwise_join = on`: Join matching partitions directly - `enable_partitionwise_aggregate = on`: Aggregate per-partition then combine ## Best Practices DO: - Always include partition key in WHERE clauses for pruning - Target 10-50GB per partition (not too large, not too small) - Use RANGE partitioning for time-series data - Use LIST partitioning for categorical data (regions, types) - Use HASH partitioning for even distribution without natural key - Automate partition creation 3+ months ahead - Monitor partition sizes and adjust strategy if needed DON'T: - Create thousands of tiny partitions (overhead > benefit) - Partition tables < 10GB (overhead not justified) - Use partition key that changes over time - Query without partition key filter (scans all partitions) - Forget to analyze partitions after bulk inserts - Mix partition strategies without clear reason ## Performance Considerations - Partition pruning can reduce query time by 90%+ on large tables - Each partition adds ~8KB overhead in PostgreSQL catalogs - INSERT performance unchanged for single-row inserts - Bulk INSERT benefits from partition-wise parallelism - VACUUM and ANALYZE run faster on smaller partitions - Index creation can be parallelized across partitions ## Related Commands - `/database-migration-manager` - Schema migrations with partition support - `/database-backup-automator` - Per-partition backup strategies - `/database-index-advisor` - Optimize indexes for partitioned tables - `/sql-query-optimizer` - Ensure queries leverage partition pruning ## Version History - v1.0.0 (2024-10): Initial implementation with PostgreSQL declarative partitioning - Planned v1.1.0: Add MySQL partitioning support and automated partition rebalancing