description: Design and implement table partitioning strategies for massive datasets
shortcut: partition

Database Partition Manager

Design, implement, and manage table partitioning strategies for massive datasets with automated partition maintenance, query optimization, and data lifecycle management.

When to Use This Command

Use /partition when you need to:

  • Manage tables exceeding 100GB with slow query performance
  • Implement time-series data archival strategies (IoT, logs, metrics)
  • Optimize queries that filter by date ranges or specific values
  • Reduce maintenance windows for VACUUM, REINDEX, and ANALYZE operations
  • Implement efficient data retention policies (delete old partitions)
  • Improve parallel query performance across multiple partitions

DON'T use this when:

  • Tables are small (<10GB) and perform well
  • Queries don't filter by the partition key (pruning never kicks in, so every partition is scanned)
  • Application can't be updated to handle partition-aware queries
  • Database doesn't support native partitioning (use application-level sharding instead)

Design Decisions

This command implements declarative partitioning because:

  • Native database support provides optimal query performance
  • Automatic partition pruning can cut query execution time by 90%+ when queries filter on the partition key
  • Constraint exclusion ensures only relevant partitions are scanned
  • Partition-wise joins improve multi-table query performance
  • Automated partition management reduces operational overhead

Alternative considered: Application-level sharding

  • Full control over data distribution
  • Requires application code changes
  • No automatic query optimization
  • Recommended for multi-tenant applications with tenant-based isolation

Alternative considered: Inheritance-based partitioning (legacy)

  • Available in older PostgreSQL versions (<10)
  • Manual trigger maintenance required
  • No automatic partition pruning
  • Recommended only for legacy systems; a minimal sketch follows below
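
For reference, a minimal sketch of the legacy pattern (hypothetical measurements table; PostgreSQL < 10 syntax, hence EXECUTE PROCEDURE):

-- Parent table plus one child carrying a CHECK constraint so
-- constraint_exclusion can skip it
CREATE TABLE measurements (
    id BIGSERIAL,
    recorded_at TIMESTAMP NOT NULL,
    value NUMERIC
);

CREATE TABLE measurements_2024_01 (
    CHECK (recorded_at >= '2024-01-01' AND recorded_at < '2024-02-01')
) INHERITS (measurements);

-- Trigger routes INSERTs on the parent to the right child by hand
CREATE OR REPLACE FUNCTION measurements_insert_router()
RETURNS TRIGGER AS $$
BEGIN
    IF NEW.recorded_at >= '2024-01-01' AND NEW.recorded_at < '2024-02-01' THEN
        INSERT INTO measurements_2024_01 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'No child table for %', NEW.recorded_at;
    END IF;
    RETURN NULL;  -- suppress the insert into the parent itself
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER measurements_insert_trigger
    BEFORE INSERT ON measurements
    FOR EACH ROW EXECUTE PROCEDURE measurements_insert_router();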

Prerequisites

Before running this command:

  1. Identify partition key (typically timestamp or category column)
  2. Analyze query patterns to ensure they filter by partition key
  3. Estimate partition size (target: 10-50GB per partition; see the sizing query after this list)
  4. Plan partition retention policy (e.g., keep 90 days, archive rest)
  5. Test partition migration on development database
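
For step 3, a quick sizing query (using the sensor_readings example from below; substitute your own table and timestamp column) shows total size and how rows spread across candidate intervals:

-- Total size and per-month row distribution; aim for an interval
-- that yields roughly 10-50GB per partition
SELECT pg_size_pretty(pg_total_relation_size('sensor_readings')) AS total_size;

SELECT DATE_TRUNC('month', reading_time) AS month,
       COUNT(*) AS row_count
FROM sensor_readings
GROUP BY 1
ORDER BY 1;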

Implementation Process

Step 1: Analyze Table and Query Patterns

Review table size, query patterns, and identify optimal partition strategy.
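
A sketch of the analysis queries (pg_stat_statements must be enabled; the statistics column names shown are for PostgreSQL 13+):

-- Largest tables: the partitioning candidates
SELECT relname,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
       n_live_tup
FROM pg_stat_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 10;

-- Most expensive statements touching a candidate table
SELECT query, calls, mean_exec_time
FROM pg_stat_statements
WHERE query ILIKE '%sensor_readings%'
ORDER BY total_exec_time DESC
LIMIT 10;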

Step 2: Design Partition Schema

Choose partitioning method (range, list, hash) and partition key based on access patterns.

Step 3: Create Partitioned Table

Convert existing table to partitioned table with minimal downtime using pg_partman or manual migration.
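
If pg_partman is used, the setup boilerplate shrinks to a single registration call; a sketch against the pg_partman 4.x API (argument names differ in 5.x, which drops p_type and takes intervals like '1 month'):

CREATE SCHEMA IF NOT EXISTS partman;
CREATE EXTENSION IF NOT EXISTS pg_partman SCHEMA partman;

-- Register the parent; pg_partman then pre-creates and retires partitions
SELECT partman.create_parent(
    p_parent_table := 'public.sensor_readings',
    p_control      := 'reading_time',
    p_type         := 'native',
    p_interval     := 'monthly',
    p_premake      := 3
);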

Step 4: Implement Automated Partition Maintenance

Set up automated partition creation, archival, and cleanup processes.

Step 5: Optimize Queries for Partition Pruning

Ensure queries include partition key in WHERE clauses for automatic pruning.
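
Pruning depends on the query shape: the planner must see a direct comparison against the partition key, not an expression wrapping it. Using the sensor_readings example:

-- Prunes: direct range comparison on the partition key
SELECT * FROM sensor_readings
WHERE reading_time >= '2024-01-01' AND reading_time < '2024-02-01';

-- Does NOT prune: the key is wrapped in a function, so every partition
-- is scanned
SELECT * FROM sensor_readings
WHERE DATE_TRUNC('month', reading_time) = '2024-01-01';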

Output Format

The command generates:

  • schema/partitioned_table.sql - Partitioned table definition
  • maintenance/partition_manager.sql - Automated partition management functions
  • scripts/partition_maintenance.sh - Cron job for partition operations
  • migration/convert_to_partitioned.sql - Zero-downtime migration script
  • monitoring/partition_health.sql - Partition size and performance monitoring

Code Examples

Example 1: PostgreSQL Range Partitioning for Time-Series Data

-- Create partitioned table for time-series sensor data
CREATE TABLE sensor_readings (
    id BIGSERIAL,
    sensor_id INTEGER NOT NULL,
    reading_value NUMERIC(10,2) NOT NULL,
    reading_time TIMESTAMP NOT NULL,
    metadata JSONB,
    PRIMARY KEY (id, reading_time)
) PARTITION BY RANGE (reading_time);

-- Create indexes on the partitioned table (automatically created on every partition, PostgreSQL 11+)
CREATE INDEX idx_sensor_readings_sensor_id ON sensor_readings (sensor_id);
CREATE INDEX idx_sensor_readings_time ON sensor_readings (reading_time);
CREATE INDEX idx_sensor_readings_metadata ON sensor_readings USING GIN (metadata);

-- Create initial partitions (monthly strategy)
CREATE TABLE sensor_readings_2024_01 PARTITION OF sensor_readings
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

CREATE TABLE sensor_readings_2024_02 PARTITION OF sensor_readings
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

CREATE TABLE sensor_readings_2024_03 PARTITION OF sensor_readings
    FOR VALUES FROM ('2024-03-01') TO ('2024-04-01');

-- Create default partition for out-of-range data
CREATE TABLE sensor_readings_default PARTITION OF sensor_readings DEFAULT;

-- Automated partition management function
CREATE OR REPLACE FUNCTION create_monthly_partitions(
    p_table_name TEXT,
    p_months_ahead INTEGER DEFAULT 3
)
RETURNS VOID AS $$
DECLARE
    v_start_date DATE;
    v_end_date DATE;
    v_partition_name TEXT;
    v_sql TEXT;
    v_month INTEGER;
BEGIN
    -- Create partitions for next N months
    FOR v_month IN 1..p_months_ahead LOOP
        v_start_date := DATE_TRUNC('month', CURRENT_DATE + (v_month || ' months')::INTERVAL);
        v_end_date := v_start_date + INTERVAL '1 month';
        v_partition_name := p_table_name || '_' || TO_CHAR(v_start_date, 'YYYY_MM');

        -- Check if partition already exists
        IF NOT EXISTS (
            SELECT 1 FROM pg_class
            WHERE relname = v_partition_name
        ) THEN
            v_sql := FORMAT(
                'CREATE TABLE %I PARTITION OF %I FOR VALUES FROM (%L) TO (%L)',
                v_partition_name,
                p_table_name,
                v_start_date,
                v_end_date
            );

            RAISE NOTICE 'Creating partition: %', v_partition_name;
            EXECUTE v_sql;

            -- Analyze new partition
            EXECUTE FORMAT('ANALYZE %I', v_partition_name);
        END IF;
    END LOOP;
END;
$$ LANGUAGE plpgsql;

-- Automated partition archival and cleanup
CREATE OR REPLACE FUNCTION archive_old_partitions(
    p_table_name TEXT,
    p_retention_months INTEGER DEFAULT 12,
    p_archive_table TEXT DEFAULT NULL
)
RETURNS VOID AS $$
DECLARE
    v_partition RECORD;
    v_cutoff_date DATE;
    v_sql TEXT;
BEGIN
    v_cutoff_date := DATE_TRUNC('month', CURRENT_DATE - (p_retention_months || ' months')::INTERVAL);

    FOR v_partition IN
        SELECT
            c.relname AS partition_name,
            pg_get_expr(c.relpartbound, c.oid) AS partition_bounds
        FROM pg_class c
        JOIN pg_inherits i ON i.inhrelid = c.oid
        JOIN pg_class p ON p.oid = i.inhparent
        WHERE p.relname = p_table_name
          AND c.relname LIKE p_table_name || '_%'
          AND c.relname != p_table_name || '_default'
        ORDER BY c.relname
    LOOP
        -- Extract partition start date from name
        IF v_partition.partition_name ~ '_\d{4}_\d{2}$' THEN
            DECLARE
                v_partition_date DATE;
            BEGIN
                v_partition_date := TO_DATE(
                    SUBSTRING(v_partition.partition_name FROM '\d{4}_\d{2}$'),
                    'YYYY_MM'
                );

                IF v_partition_date < v_cutoff_date THEN
                    RAISE NOTICE 'Archiving partition: %', v_partition.partition_name;

                    IF p_archive_table IS NOT NULL THEN
                        -- Move data to archive table
                        v_sql := FORMAT(
                            'INSERT INTO %I SELECT * FROM %I',
                            p_archive_table,
                            v_partition.partition_name
                        );
                        EXECUTE v_sql;
                    END IF;

                    -- Detach and drop old partition
                    v_sql := FORMAT(
                        'ALTER TABLE %I DETACH PARTITION %I',
                        p_table_name,
                        v_partition.partition_name
                    );
                    EXECUTE v_sql;

                    v_sql := FORMAT('DROP TABLE %I', v_partition.partition_name);
                    EXECUTE v_sql;

                    RAISE NOTICE 'Dropped partition: %', v_partition.partition_name;
                END IF;
            END;
        END IF;
    END LOOP;
END;
$$ LANGUAGE plpgsql;

-- Partition health monitoring
CREATE OR REPLACE VIEW partition_health AS
SELECT
    schemaname,
    tablename AS partition_name,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS total_size,
    pg_size_pretty(pg_relation_size(schemaname||'.'||tablename)) AS table_size,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename) -
                  pg_relation_size(schemaname||'.'||tablename)) AS index_size,
    n_live_tup AS row_count,
    n_dead_tup AS dead_rows,
    ROUND(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 2) AS dead_row_percent,
    last_vacuum,
    last_autovacuum,
    last_analyze,
    last_autoanalyze
FROM pg_stat_user_tables
WHERE tablename LIKE '%_202%'  -- Filter for partitions
ORDER BY schemaname, tablename;

-- Query to show partition pruning effectiveness
CREATE OR REPLACE FUNCTION explain_partition_pruning(p_query TEXT)
RETURNS TABLE (plan_line TEXT) AS $$
BEGIN
    RETURN QUERY EXECUTE 'EXPLAIN (ANALYZE, BUFFERS) ' || p_query;
END;
$$ LANGUAGE plpgsql;
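
A quick manual check of the functions above before automating them (the archive table must already exist):

-- Pre-create the next three monthly partitions
SELECT create_monthly_partitions('sensor_readings', 3);

-- Retire partitions older than twelve months into the archive
SELECT archive_old_partitions('sensor_readings', 12, 'sensor_readings_archive');

The shell script below wraps the same calls into a scheduled maintenance job:
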
#!/bin/bash
# scripts/partition_maintenance.sh - Automated Partition Management

set -euo pipefail

# Configuration
DB_NAME="mydb"
DB_USER="postgres"
DB_HOST="localhost"
RETENTION_MONTHS=12
CREATE_AHEAD_MONTHS=3
LOG_FILE="/var/log/partition_maintenance.log"

log() {
    echo "[$(date +'%Y-%m-%d %H:%M:%S')] $1" | tee -a "$LOG_FILE"
}

# Create future partitions
create_partitions() {
    log "Creating partitions for next $CREATE_AHEAD_MONTHS months..."

    psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -v ON_ERROR_STOP=1 <<EOF
        SELECT create_monthly_partitions('sensor_readings', $CREATE_AHEAD_MONTHS);
        SELECT create_monthly_partitions('audit_logs', $CREATE_AHEAD_MONTHS);
        SELECT create_monthly_partitions('user_events', $CREATE_AHEAD_MONTHS);
EOF

    log "Partition creation completed"
}

# Archive and cleanup old partitions
cleanup_partitions() {
    log "Archiving partitions older than $RETENTION_MONTHS months..."

    psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -v ON_ERROR_STOP=1 <<EOF
        SELECT archive_old_partitions('sensor_readings', $RETENTION_MONTHS, 'sensor_readings_archive');
        SELECT archive_old_partitions('audit_logs', $RETENTION_MONTHS, 'audit_logs_archive');
        SELECT archive_old_partitions('user_events', $RETENTION_MONTHS, NULL);  -- No archival, just drop
EOF

    log "Partition cleanup completed"
}

# Analyze partitions for query optimization
analyze_partitions() {
    log "Analyzing partitions..."

    psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -v ON_ERROR_STOP=1 <<EOF
        SELECT tablename, pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
        FROM pg_tables
        WHERE tablename LIKE '%_202%'
        ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;

        -- Analyze all partitions
        DO \$\$
        DECLARE
            r RECORD;
        BEGIN
            FOR r IN
                SELECT schemaname, tablename
                FROM pg_tables
                WHERE tablename LIKE '%_202%'
            LOOP
                EXECUTE FORMAT('ANALYZE %I.%I', r.schemaname, r.tablename);
            END LOOP;
        END;
        \$\$;
EOF

    log "Partition analysis completed"
}

# Generate health report
health_report() {
    log "Generating partition health report..."

    psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -v ON_ERROR_STOP=1 <<EOF | tee -a "$LOG_FILE"
        \echo '=== Partition Health Report ==='
        \echo ''

        SELECT * FROM partition_health
        WHERE total_size != '0 bytes'
        ORDER BY partition_name DESC
        LIMIT 20;

        \echo ''
        \echo '=== Partitions Needing VACUUM ==='
        SELECT partition_name, dead_row_percent, row_count, dead_rows
        FROM partition_health
        WHERE dead_row_percent > 10
        ORDER BY dead_row_percent DESC;
EOF

    log "Health report generated"
}

# Main execution
main() {
    log "=== Starting Partition Maintenance ==="

    create_partitions
    cleanup_partitions
    analyze_partitions
    health_report

    log "=== Partition Maintenance Completed ==="
}

main "$@"

Example 2: List Partitioning by Category with Hash Sub-Partitioning

-- Multi-level partitioning: LIST (by region) → HASH (by customer_id)
CREATE TABLE orders (
    order_id BIGSERIAL,
    customer_id INTEGER NOT NULL,
    region VARCHAR(10) NOT NULL,
    order_date TIMESTAMP NOT NULL,
    total_amount NUMERIC(10,2),
    PRIMARY KEY (order_id, region, customer_id)
) PARTITION BY LIST (region);

-- Create regional partitions
CREATE TABLE orders_us PARTITION OF orders
    FOR VALUES IN ('US', 'CA', 'MX')
    PARTITION BY HASH (customer_id);

CREATE TABLE orders_eu PARTITION OF orders
    FOR VALUES IN ('UK', 'FR', 'DE', 'ES', 'IT')
    PARTITION BY HASH (customer_id);

CREATE TABLE orders_asia PARTITION OF orders
    FOR VALUES IN ('JP', 'CN', 'IN', 'SG')
    PARTITION BY HASH (customer_id);

-- Create hash sub-partitions (4 buckets per region for parallel processing)
CREATE TABLE orders_us_0 PARTITION OF orders_us FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE orders_us_1 PARTITION OF orders_us FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE orders_us_2 PARTITION OF orders_us FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE orders_us_3 PARTITION OF orders_us FOR VALUES WITH (MODULUS 4, REMAINDER 3);

CREATE TABLE orders_eu_0 PARTITION OF orders_eu FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE orders_eu_1 PARTITION OF orders_eu FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE orders_eu_2 PARTITION OF orders_eu FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE orders_eu_3 PARTITION OF orders_eu FOR VALUES WITH (MODULUS 4, REMAINDER 3);

CREATE TABLE orders_asia_0 PARTITION OF orders_asia FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE orders_asia_1 PARTITION OF orders_asia FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE orders_asia_2 PARTITION OF orders_asia FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE orders_asia_3 PARTITION OF orders_asia FOR VALUES WITH (MODULUS 4, REMAINDER 3);

-- Query optimization with partition-wise join
SET enable_partitionwise_join = on;
SET enable_partitionwise_aggregate = on;

-- Demonstrate partition pruning
EXPLAIN (ANALYZE, BUFFERS)
SELECT customer_id, SUM(total_amount) AS total_spent
FROM orders
WHERE region = 'US'
  AND order_date >= '2024-01-01'
  AND order_date < '2024-02-01'
GROUP BY customer_id
ORDER BY total_spent DESC
LIMIT 100;

-- Output shows only orders_us partitions are scanned (not EU or Asia)
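
Converting an existing, non-partitioned table is the harder problem: the Python helper below builds the new partitioned table, backfills it in batches, and swaps the names.
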
# scripts/partition_migration.py - Zero-downtime partition migration
import psycopg2
from psycopg2 import sql
import logging
from datetime import datetime, timedelta

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

class PartitionMigrator:
    """Migrate existing table to partitioned table with minimal downtime."""

    def __init__(self, connection_string: str):
        self.conn_string = connection_string

    def connect(self):
        return psycopg2.connect(self.conn_string)

    def migrate_to_partitioned(
        self,
        table_name: str,
        partition_column: str,
        partition_type: str = 'RANGE',
        partition_interval: str = 'MONTHLY'
    ):
        """
        Migrate table to partitioned table with zero downtime.

        Strategy:
        1. Create new partitioned table
        2. Copy existing data in batches
        3. Rename tables atomically
        4. Update application to use new table
        """
        conn = self.connect()
        conn.autocommit = False

        try:
            with conn.cursor() as cur:
                # Step 1: Create new partitioned table.
                # NOTE: any PRIMARY KEY or UNIQUE constraint copied by
                # INCLUDING ALL must contain the partition column, or this
                # CREATE fails; adjust constraints on the source table first.
                logger.info(f"Creating partitioned table {table_name}_new...")

                cur.execute(f"""
                    CREATE TABLE {table_name}_new (
                        LIKE {table_name} INCLUDING ALL
                    ) PARTITION BY {partition_type} ({partition_column});
                """)

                # Step 2: Create initial partitions based on existing data
                logger.info("Creating initial partitions...")

                if partition_interval == 'MONTHLY':
                    cur.execute(f"""
                        SELECT
                            DATE_TRUNC('month', MIN({partition_column})) AS min_date,
                            DATE_TRUNC('month', MAX({partition_column})) AS max_date
                        FROM {table_name};
                    """)

                    min_date, max_date = cur.fetchone()
                    logger.info(f"Data range: {min_date} to {max_date}")

                    current_date = min_date
                    while current_date <= max_date:
                        next_date = current_date + timedelta(days=32)
                        next_date = next_date.replace(day=1)  # First day of next month

                        partition_name = f"{table_name}_{current_date.strftime('%Y_%m')}"

                        cur.execute(sql.SQL("""
                            CREATE TABLE {} PARTITION OF {} FOR VALUES FROM (%s) TO (%s);
                        """).format(
                            sql.Identifier(partition_name),
                            sql.Identifier(f"{table_name}_new")
                        ), (current_date, next_date))

                        logger.info(f"Created partition: {partition_name}")
                        current_date = next_date

                # Step 3: Copy data in batches.
                # NOTE: OFFSET pagination re-reads skipped rows, so cost grows
                # with each batch; prefer keyset pagination on the partition
                # column for very large tables. Writes to the source table
                # must be paused (or captured for replay) during the copy.
                logger.info("Copying data in batches...")

                batch_size = 10000
                offset = 0

                while True:
                    cur.execute(f"""
                        INSERT INTO {table_name}_new
                        SELECT * FROM {table_name}
                        ORDER BY {partition_column}
                        LIMIT {batch_size} OFFSET {offset};
                    """)

                    rows_copied = cur.rowcount
                    if rows_copied == 0:
                        break

                    offset += batch_size
                    logger.info(f"Copied {offset} rows...")

                    # Commit each batch to avoid long-running transactions
                    conn.commit()

                # Step 4: Verify row counts
                logger.info("Verifying row counts...")

                cur.execute(f"SELECT COUNT(*) FROM {table_name};")
                original_count = cur.fetchone()[0]

                cur.execute(f"SELECT COUNT(*) FROM {table_name}_new;")
                new_count = cur.fetchone()[0]

                if original_count != new_count:
                    raise Exception(
                        f"Row count mismatch! Original: {original_count}, New: {new_count}"
                    )

                logger.info(f"Row count verified: {original_count} rows")

                # Step 5: Rename tables atomically. Both renames run in the
                # current transaction, so the final commit swaps them as one
                # atomic step (an explicit BEGIN/COMMIT inside execute() would
                # fight psycopg2's own transaction management).
                logger.info("Renaming tables...")

                cur.execute(f"""
                    ALTER TABLE {table_name} RENAME TO {table_name}_old;
                    ALTER TABLE {table_name}_new RENAME TO {table_name};
                """)

                logger.info("Migration completed successfully!")
                logger.info(f"Old table preserved as {table_name}_old")

            conn.commit()

        except Exception as e:
            conn.rollback()
            logger.error(f"Migration failed: {e}")
            raise
        finally:
            conn.close()

    def verify_partition_pruning(self, query: str):
        """Test if query benefits from partition pruning."""
        conn = self.connect()

        try:
            with conn.cursor() as cur:
                cur.execute(f"EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) {query}")
                plan = cur.fetchone()[0][0]

                # Extract partition pruning info
                pruned = self._count_pruned_partitions(plan)

                logger.info(f"Partition pruning analysis:")
                logger.info(f"  Total partitions: {pruned['total']}")
                logger.info(f"  Scanned partitions: {pruned['scanned']}")
                logger.info(f"  Pruned partitions: {pruned['pruned']}")
                logger.info(f"  Pruning effectiveness: {pruned['effectiveness']:.1f}%")

                return pruned

        finally:
            conn.close()

    def _count_pruned_partitions(self, plan: dict) -> dict:
        """Recursively count partitions in explain plan."""
        total = 0
        scanned = 0

        def traverse(node):
            nonlocal total, scanned

            # Partitions pruned at plan time never appear in the plan, so this
            # counts only nodes the planner kept; a node that EXPLAIN ANALYZE
            # reports as "never executed" has Actual Loops = 0.
            if 'Relation Name' in node and '_202' in node['Relation Name']:
                total += 1
                if node.get('Actual Loops', 0) > 0:
                    scanned += 1

            if 'Plans' in node:
                for child in node['Plans']:
                    traverse(child)

        traverse(plan['Plan'])

        pruned = total - scanned
        effectiveness = (pruned / total * 100) if total > 0 else 0

        return {
            'total': total,
            'scanned': scanned,
            'pruned': pruned,
            'effectiveness': effectiveness
        }

# Usage
if __name__ == "__main__":
    migrator = PartitionMigrator(
        "postgresql://user:password@localhost/mydb"
    )

    # Migrate sensor_readings table
    migrator.migrate_to_partitioned(
        table_name='sensor_readings',
        partition_column='reading_time',
        partition_type='RANGE',
        partition_interval='MONTHLY'
    )

    # Verify partition pruning works
    test_query = """
        SELECT * FROM sensor_readings
        WHERE reading_time >= '2024-10-01'
          AND reading_time < '2024-11-01'
        LIMIT 1000;
    """

    migrator.verify_partition_pruning(test_query)

Error Handling

| Error | Cause | Solution |
|-------|-------|----------|
| "No partition of relation ... found for row" | Data outside partition ranges | Create a default partition or extend the partition range |
| "Partition constraint violated" | Invalid data for partition | Fix the data or adjust the partition bounds |
| "Cannot create partition of temporary table" | Partitioning temp tables unsupported | Use regular tables or application-level sharding |
| Too many partitions (>1000) | Excessive partition count | Increase the partition interval (daily → weekly → monthly) |
| Constraint exclusion not working | Query doesn't filter by partition key | Rewrite the query to include the partition key in the WHERE clause |

Configuration Options

Partition Planning

  • partition_type: RANGE (dates), LIST (categories), HASH (distribution)
  • partition_interval: DAILY, WEEKLY, MONTHLY, YEARLY
  • retention_policy: How long to keep old partitions
  • partition_size_target: Target 10-50GB per partition

Query Optimization

  • enable_partition_pruning = on: Enable automatic partition elimination
  • constraint_exclusion = partition: Enable constraint-based pruning
  • enable_partitionwise_join = on: Join matching partitions directly
  • enable_partitionwise_aggregate = on: Aggregate per-partition then combine
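
These GUCs can be toggled per session for testing or persisted cluster-wide; a minimal sketch:

-- Session-level, for experimentation
SET enable_partitionwise_join = on;
SET enable_partitionwise_aggregate = on;

-- Cluster-wide, written to postgresql.auto.conf
ALTER SYSTEM SET enable_partition_pruning = on;
SELECT pg_reload_conf();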

Best Practices

DO:

  • Always include partition key in WHERE clauses for pruning
  • Target 10-50GB per partition (not too large, not too small)
  • Use RANGE partitioning for time-series data
  • Use LIST partitioning for categorical data (regions, types)
  • Use HASH partitioning for even distribution without natural key
  • Automate partition creation 3+ months ahead
  • Monitor partition sizes and adjust strategy if needed

DON'T:

  • Create thousands of tiny partitions (overhead > benefit)
  • Partition tables < 10GB (overhead not justified)
  • Use partition key that changes over time
  • Query without partition key filter (scans all partitions)
  • Forget to analyze partitions after bulk inserts
  • Mix partition strategies without clear reason

Performance Considerations

  • Partition pruning can reduce query time by 90%+ on large tables
  • Each partition adds ~8KB overhead in PostgreSQL catalogs
  • INSERT performance unchanged for single-row inserts
  • Bulk INSERT benefits from partition-wise parallelism
  • VACUUM and ANALYZE run faster on smaller partitions
  • Index creation can be parallelized across partitions (see the sketch below)
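
The last point relies on building each partition's index separately and attaching it to an initially invalid parent index; a sketch using the sensor_readings example:

-- Parent index shell only; stays INVALID until every partition is attached
CREATE INDEX idx_readings_value ON ONLY sensor_readings (reading_value);

-- Per-partition builds can run concurrently in separate sessions
CREATE INDEX CONCURRENTLY idx_readings_2024_01_value
    ON sensor_readings_2024_01 (reading_value);
ALTER INDEX idx_readings_value ATTACH PARTITION idx_readings_2024_01_value;

-- Repeat for each partition; the parent index becomes valid automatically
-- once all partition indexes are attached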

Related Commands

  • /database-migration-manager - Schema migrations with partition support
  • /database-backup-automator - Per-partition backup strategies
  • /database-index-advisor - Optimize indexes for partitioned tables
  • /sql-query-optimizer - Ensure queries leverage partition pruning

Version History

  • v1.0.0 (2024-10): Initial implementation with PostgreSQL declarative partitioning
  • Planned v1.1.0: Add MySQL partitioning support and automated partition rebalancing