Initial commit

Zhongwei Li
2025-11-30 08:18:31 +08:00
commit bcabeddc75
8 changed files with 914 additions and 0 deletions

15
.claude-plugin/plugin.json Normal file

@@ -0,0 +1,15 @@
{
"name": "database-replication-manager",
"description": "Manage database replication, failover, and high availability configurations",
"version": "1.0.0",
"author": {
"name": "Claude Code Plugins",
"email": "[email protected]"
},
"skills": [
"./skills"
],
"commands": [
"./commands"
]
}

3
README.md Normal file

@@ -0,0 +1,3 @@
# database-replication-manager
Manage database replication, failover, and high availability configurations

759
commands/replication.md Normal file

@@ -0,0 +1,759 @@
---
description: Comprehensive database replication management with streaming replication, failover automation, and lag monitoring
shortcut: replication
---
# Database Replication Manager
Implement production-grade database replication for PostgreSQL and MySQL: streaming (physical) replication, logical replication for selective tables, synchronous and asynchronous modes, automatic failover, lag monitoring, conflict resolution, and read scaling across multiple replicas. The target is 99.99% availability with an RPO under 5 seconds and an RTO under 30 seconds for automated failover.
## When to Use This Command
Use `/replication` when you need to:
- Implement high availability with automatic failover (99.99%+ uptime)
- Scale read workloads across multiple replicas (10x read capacity)
- Create disaster recovery instances in different regions
- Enable zero-downtime database migrations and upgrades
- Implement read-heavy application architectures
- Meet compliance requirements for data redundancy
DON'T use this when:
- Single server handles all load comfortably (<50% CPU)
- Database size is small (<10GB) and backup/restore is fast
- Application doesn't support read replica routing
- Network latency between regions is high (>100ms for sync replication)
- You lack monitoring infrastructure for replication lag
- Write workload is too heavy for replication to keep up
## Design Decisions
This command implements **automated replication with failover** because:
- Streaming replication provides real-time data synchronization
- Automatic failover reduces RTO from hours to seconds
- Read replicas enable horizontal read scaling
- Logical replication allows selective table replication
- Synchronous mode ensures zero data loss for critical transactions
**Alternative considered: Application-level read/write splitting**
- No replication overhead at database level
- Requires application changes for every database interaction
- More complex error handling and retry logic
- Recommended when replication infrastructure unavailable
**Alternative considered: Database clustering (Patroni, Galera)**
- Multi-master with automatic failover
- More complex setup and maintenance
- Better for write-heavy workloads
- Recommended for high-write applications requiring HA
## Prerequisites
Before running this command:
1. Primary and replica servers with network connectivity
2. Sufficient disk space for WAL archiving (30-50% of database size)
3. Monitoring system for replication lag alerts
4. Understanding of RPO (Recovery Point Objective) and RTO requirements
5. Tested failover procedures and runbooks
## Implementation Process
### Step 1: Configure Primary for Replication
Enable WAL archiving, set max_wal_senders, and create replication user.
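A minimal sketch of the primary-side settings using `ALTER SYSTEM` (assuming PostgreSQL 12+ and superuser access); Example 1 below applies the equivalent settings by editing the configuration files directly, and the role, password, and slot names here are the same placeholders used there:
```bash
# Minimal primary-side setup via ALTER SYSTEM (written to postgresql.auto.conf)
sudo -u postgres psql <<'EOF'
ALTER SYSTEM SET wal_level = 'replica';
ALTER SYSTEM SET max_wal_senders = 10;
ALTER SYSTEM SET max_replication_slots = 10;
ALTER SYSTEM SET wal_keep_size = '1GB';   -- PostgreSQL 13+
CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'changeme';
SELECT pg_create_physical_replication_slot('replica1_slot');
EOF
sudo systemctl restart postgresql@14-main   # changing wal_level requires a restart
```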
### Step 2: Initialize Replica with Base Backup
Use pg_basebackup to clone primary database to replica server.
### Step 3: Configure Replica Connection
Set primary_conninfo and start replica in standby mode.
### Step 4: Verify Replication Status
Check replication lag and ensure WAL streaming is active.
### Step 5: Implement Monitoring and Failover
Deploy replication lag alerts and automatic failover scripts.
## Output Format
The command generates:
- `replication/primary_setup.sql` - Primary configuration and replication user
- `replication/replica_setup.sh` - Automated replica initialization script
- `replication/failover.py` - Automatic failover orchestration
- `replication/monitoring.yml` - Prometheus/Grafana replication metrics
- `replication/recovery.conf` - Replica recovery configuration
## Code Examples
### Example 1: PostgreSQL Streaming Replication Setup
```bash
#!/bin/bash
#
# Production-ready PostgreSQL streaming replication setup
# with automatic failover and monitoring integration
#
set -e
# Configuration
PRIMARY_HOST="${PRIMARY_HOST:-primary.example.com}"
REPLICA_HOST="${REPLICA_HOST:-replica.example.com}"
REPLICATION_USER="${REPLICATION_USER:-replicator}"
REPLICATION_PASSWORD="${REPLICATION_PASSWORD:-changeme}"
POSTGRES_DATA_DIR="/var/lib/postgresql/14/main"
echo "========================================="
echo "PostgreSQL Streaming Replication Setup"
echo "========================================="
echo ""
# ===== PRIMARY SERVER CONFIGURATION =====
setup_primary() {
echo "Configuring PRIMARY server: $PRIMARY_HOST"
echo ""
# 1. Configure postgresql.conf for replication
cat >> /etc/postgresql/14/main/postgresql.conf <<EOF
# ========== REPLICATION SETTINGS ==========
# Added by replication setup script
# Write-Ahead Log (WAL) settings
wal_level = replica # Enable WAL for replication
max_wal_senders = 10 # Max concurrent replication connections
wal_keep_size = 1024 # Keep 1GB of WAL segments (PostgreSQL 13+)
max_replication_slots = 10 # For replication slots (recommended)
# Synchronous replication (optional - for zero data loss)
# synchronous_standby_names = 'replica1' # Uncomment for sync replication
synchronous_commit = local # Options: off, local, remote_write, remote_apply, on
# Archive WAL for point-in-time recovery (optional)
archive_mode = on
archive_command = 'test ! -f /var/lib/postgresql/wal_archive/%f && cp %p /var/lib/postgresql/wal_archive/%f'
archive_timeout = 300 # Force WAL switch every 5 minutes
# Hot standby settings
hot_standby = on # Allow reads on replica
hot_standby_feedback = on # Prevent query conflicts
# ========================================
EOF
# 2. Create WAL archive directory
mkdir -p /var/lib/postgresql/wal_archive
chown postgres:postgres /var/lib/postgresql/wal_archive
chmod 700 /var/lib/postgresql/wal_archive
# 3. Configure pg_hba.conf for replication connections
cat >> /etc/postgresql/14/main/pg_hba.conf <<EOF
# Replication connections (added by setup script)
# Use the replica's IP address or a CIDR range here; a bare hostname must not carry a /32 suffix
host replication $REPLICATION_USER $REPLICA_HOST/32 scram-sha-256
# For additional replicas, add per-host lines or a narrow subnet; avoid 0.0.0.0/0 in production
# host replication $REPLICATION_USER 10.0.0.0/24 scram-sha-256
EOF
# 4. Create replication user
sudo -u postgres psql <<EOF
-- Create replication user with strong password
CREATE ROLE $REPLICATION_USER WITH REPLICATION LOGIN PASSWORD '$REPLICATION_PASSWORD';
-- Grant necessary permissions
GRANT CONNECT ON DATABASE postgres TO $REPLICATION_USER;
-- Create replication slot (recommended for reliability)
SELECT * FROM pg_create_physical_replication_slot('replica1_slot');
-- Verify replication user
\du $REPLICATION_USER
EOF
# 5. Restart PostgreSQL to apply changes
systemctl restart postgresql@14-main
echo ""
echo "✅ Primary server configured successfully"
echo ""
echo "Replication Status:"
sudo -u postgres psql -c "SELECT * FROM pg_replication_slots;"
sudo -u postgres psql -c "SELECT usename, application_name, client_addr, state, sync_state FROM pg_stat_replication;"
}
# ===== REPLICA SERVER CONFIGURATION =====
setup_replica() {
echo "Configuring REPLICA server: $REPLICA_HOST"
echo ""
# 1. Stop PostgreSQL on replica (will be rebuilt)
systemctl stop postgresql@14-main
# 2. Backup existing data (safety)
if [ -d "$POSTGRES_DATA_DIR" ]; then
mv "$POSTGRES_DATA_DIR" "${POSTGRES_DATA_DIR}.backup.$(date +%Y%m%d-%H%M%S)"
fi
# 3. Create base backup from primary using pg_basebackup
echo "Creating base backup from primary (this may take several minutes)..."
sudo -u postgres pg_basebackup \
-h $PRIMARY_HOST \
-D $POSTGRES_DATA_DIR \
-U $REPLICATION_USER \
-P \
-v \
-R \
-X stream \
-S replica1_slot
# -R: Creates standby.signal and writes recovery parameters to postgresql.auto.conf
# -X stream: Stream WAL while the backup is in progress
# -S: Stream from the replication slot already created on the primary ('replica1_slot');
#     add -C only if the slot does not exist yet (pg_basebackup fails if it already exists)
# 4. Configure replica-specific settings (optional)
# Note: -R already wrote primary_conninfo/primary_slot_name; the lines below override them
cat >> $POSTGRES_DATA_DIR/postgresql.auto.conf <<EOF
# Replica-specific configuration
primary_conninfo = 'host=$PRIMARY_HOST port=5432 user=$REPLICATION_USER password=$REPLICATION_PASSWORD application_name=replica1'
primary_slot_name = 'replica1_slot'
# Recovery settings (only if the WAL archive is reachable from this replica, e.g. shared or synced storage)
restore_command = 'cp /var/lib/postgresql/wal_archive/%f %p'
recovery_target_timeline = 'latest'
# Hot standby settings (allow read queries on replica)
hot_standby = on
max_standby_streaming_delay = 30s # Max delay before canceling conflicting queries
EOF
# 5. Set proper permissions
chown -R postgres:postgres $POSTGRES_DATA_DIR
chmod 700 $POSTGRES_DATA_DIR
# 6. Start replica
systemctl start postgresql@14-main
echo ""
echo "✅ Replica server configured successfully"
echo ""
echo "Replica Status:"
sudo -u postgres psql -c "SELECT pg_is_in_recovery();" # Should return 't'
sudo -u postgres psql -c "SELECT pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn(), pg_last_xact_replay_timestamp();"
}
# ===== REPLICATION VERIFICATION =====
verify_replication() {
echo "Verifying replication setup..."
echo ""
# Check primary replication status
echo "=== PRIMARY SERVER STATUS ==="
sudo -u postgres psql -h $PRIMARY_HOST -U postgres postgres <<EOF
SELECT
client_addr,
application_name,
state,
sync_state,
sent_lsn,
write_lsn,
flush_lsn,
replay_lsn,
sync_priority,
EXTRACT(EPOCH FROM replay_lag) AS lag_seconds
FROM pg_stat_replication;
EOF
echo ""
echo "=== REPLICA SERVER STATUS ==="
sudo -u postgres psql -h $REPLICA_HOST postgres <<EOF
SELECT
pg_is_in_recovery() AS is_replica,
pg_last_wal_receive_lsn() AS receive_lsn,
pg_last_wal_replay_lsn() AS replay_lsn,
pg_last_xact_replay_timestamp() AS last_replay_timestamp,
EXTRACT(EPOCH FROM (NOW() - pg_last_xact_replay_timestamp())) AS lag_seconds;
EOF
echo ""
echo "=== REPLICATION LAG ==="
# Acceptable lag: <1 second for local replicas, <5 seconds for remote
LAG=$(sudo -u postgres psql -h $PRIMARY_HOST -U postgres postgres -t -A -c \
"SELECT COALESCE(EXTRACT(EPOCH FROM replay_lag), 0) FROM pg_stat_replication LIMIT 1;")
LAG=${LAG:-0}  # default to 0 if no replicas are connected
if (( $(echo "$LAG < 5" | bc -l) )); then
echo "✅ Replication lag: ${LAG}s (healthy)"
else
echo "⚠️ Replication lag: ${LAG}s (high)"
fi
}
# ===== MAIN =====
case "${1:-}" in
primary)
setup_primary
;;
replica)
setup_replica
;;
verify)
verify_replication
;;
*)
echo "Usage: $0 {primary|replica|verify}"
echo ""
echo " primary - Configure primary server for replication"
echo " replica - Set up replica from primary"
echo " verify - Verify replication status"
exit 1
;;
esac
```
### Example 2: Automated Failover Script with Prometheus Integration
```python
#!/usr/bin/env python3
"""
Production-ready PostgreSQL automatic failover script with
health checks, monitoring integration, and rollback capability.
"""
import psycopg2
import time
import logging
import subprocess
from typing import Optional, Dict
from dataclasses import dataclass
from enum import Enum
import requests
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)
class ServerRole(Enum):
"""Database server role."""
PRIMARY = "primary"
REPLICA = "replica"
UNKNOWN = "unknown"
@dataclass
class ReplicationStatus:
"""Replication status information."""
is_primary: bool
is_replica: bool
replication_lag_seconds: Optional[float]
wal_receive_lsn: Optional[str]
wal_replay_lsn: Optional[str]
connected_replicas: int
class PostgreSQLFailoverManager:
"""
Manages automatic failover for PostgreSQL streaming replication.
"""
def __init__(
self,
primary_host: str,
replica_host: str,
postgres_user: str = "postgres",
postgres_password: str = "",
failover_threshold_seconds: int = 30,
alert_webhook: Optional[str] = None
):
"""
Initialize failover manager.
Args:
primary_host: Primary server hostname
replica_host: Replica server hostname
postgres_user: PostgreSQL superuser
postgres_password: PostgreSQL password
failover_threshold_seconds: Trigger failover after this many seconds down
alert_webhook: Slack/PagerDuty webhook for alerts
"""
self.primary_host = primary_host
self.replica_host = replica_host
self.postgres_user = postgres_user
self.postgres_password = postgres_password
self.failover_threshold = failover_threshold_seconds
self.alert_webhook = alert_webhook
self.primary_down_since: Optional[float] = None
def check_server_health(self, host: str) -> bool:
"""
Check if PostgreSQL server is healthy.
Args:
host: Server hostname
Returns:
True if server is healthy, False otherwise
"""
try:
conn = psycopg2.connect(
host=host,
user=self.postgres_user,
password=self.postgres_password,
database="postgres",
connect_timeout=5
)
conn.close()
return True
except Exception as e:
logger.error(f"Health check failed for {host}: {e}")
return False
def get_replication_status(self, host: str) -> Optional[ReplicationStatus]:
"""
Get replication status from a server.
Args:
host: Server hostname
Returns:
ReplicationStatus or None if unreachable
"""
try:
conn = psycopg2.connect(
host=host,
user=self.postgres_user,
password=self.postgres_password,
database="postgres",
connect_timeout=5
)
with conn.cursor() as cur:
# Check if primary or replica
cur.execute("SELECT pg_is_in_recovery()")
is_replica = cur.fetchone()[0]
is_primary = not is_replica
# Get replication lag (for replicas)
replication_lag = None
wal_receive_lsn = None
wal_replay_lsn = None
if is_replica:
cur.execute("""
SELECT
EXTRACT(EPOCH FROM (NOW() - pg_last_xact_replay_timestamp())) AS lag_seconds,
pg_last_wal_receive_lsn()::text AS receive_lsn,
pg_last_wal_replay_lsn()::text AS replay_lsn
""")
row = cur.fetchone()
replication_lag = row[0]
wal_receive_lsn = row[1]
wal_replay_lsn = row[2]
# Count connected replicas (for primary)
connected_replicas = 0
if is_primary:
cur.execute("SELECT COUNT(*) FROM pg_stat_replication")
connected_replicas = cur.fetchone()[0]
conn.close()
return ReplicationStatus(
is_primary=is_primary,
is_replica=is_replica,
replication_lag_seconds=replication_lag,
wal_receive_lsn=wal_receive_lsn,
wal_replay_lsn=wal_replay_lsn,
connected_replicas=connected_replicas
)
except Exception as e:
logger.error(f"Failed to get replication status from {host}: {e}")
return None
def promote_replica_to_primary(self, replica_host: str) -> bool:
"""
Promote replica to primary.
Args:
replica_host: Replica server to promote
Returns:
True if promotion successful
"""
logger.info(f"Promoting replica {replica_host} to primary...")
try:
# Execute pg_promote() via SSH or local command
# (Assuming replica is on same machine for this example)
conn = psycopg2.connect(
host=replica_host,
user=self.postgres_user,
password=self.postgres_password,
database="postgres"
)
with conn.cursor() as cur:
# Promote replica to primary
cur.execute("SELECT pg_promote()")
conn.commit()
conn.close()
# Wait for promotion to complete
time.sleep(5)
# Verify promotion
status = self.get_replication_status(replica_host)
if status and status.is_primary:
logger.info(f"✅ Successfully promoted {replica_host} to primary")
return True
else:
logger.error(f"❌ Promotion failed - server is still replica")
return False
except Exception as e:
logger.error(f"Promotion failed: {e}")
return False
def send_alert(self, message: str, severity: str = "error") -> None:
"""
Send alert to configured webhook.
Args:
message: Alert message
severity: Alert severity (info, warning, error, critical)
"""
if not self.alert_webhook:
return
emoji_map = {
'info': 'ℹ️',
'warning': '⚠️',
'error': '❌',
'critical': '🚨'
}
payload = {
'text': f"{emoji_map.get(severity, '')} PostgreSQL Failover Alert",
'attachments': [{
'color': 'danger' if severity in ['error', 'critical'] else 'warning',
'text': message,
'footer': 'PostgreSQL Failover Manager',
'ts': int(time.time())
}]
}
try:
requests.post(self.alert_webhook, json=payload, timeout=5)
except Exception as e:
logger.error(f"Failed to send alert: {e}")
def monitor_and_failover(self) -> None:
"""
Continuously monitor replication and perform automatic failover.
"""
logger.info("Starting replication monitoring...")
while True:
try:
# Check primary health
primary_healthy = self.check_server_health(self.primary_host)
if not primary_healthy:
if self.primary_down_since is None:
# Primary just went down
self.primary_down_since = time.time()
logger.warning(f"⚠️ Primary {self.primary_host} is DOWN")
self.send_alert(
f"Primary database {self.primary_host} is unreachable. "
f"Failover will trigger in {self.failover_threshold} seconds.",
severity='warning'
)
# Check if primary has been down long enough to trigger failover
down_duration = time.time() - self.primary_down_since
if down_duration >= self.failover_threshold:
logger.critical(
f"🚨 Primary down for {down_duration:.0f}s - "
f"TRIGGERING FAILOVER"
)
self.send_alert(
f"PRIMARY FAILURE: {self.primary_host} down for "
f"{down_duration:.0f}s. Initiating automatic failover to "
f"{self.replica_host}",
severity='critical'
)
# Perform failover
success = self.promote_replica_to_primary(self.replica_host)
if success:
self.send_alert(
f"✅ FAILOVER SUCCESSFUL: {self.replica_host} is now PRIMARY. "
f"Update application connection strings immediately.",
severity='error' # Still an error situation
)
# Stop monitoring (manual intervention required)
break
else:
self.send_alert(
f"❌ FAILOVER FAILED: Could not promote {self.replica_host}. "
f"Manual intervention required immediately.",
severity='critical'
)
break
else:
# Primary is healthy
if self.primary_down_since is not None:
# Primary recovered
logger.info(f"✅ Primary {self.primary_host} recovered")
self.send_alert(
f"Primary database {self.primary_host} has recovered.",
severity='info'
)
self.primary_down_since = None
# Check replication lag
replica_status = self.get_replication_status(self.replica_host)
if replica_status:
lag = replica_status.replication_lag_seconds or 0
if lag > 60:
logger.warning(f"⚠️ High replication lag: {lag:.1f}s")
self.send_alert(
f"High replication lag detected: {lag:.1f} seconds",
severity='warning'
)
else:
logger.info(
f"Replication healthy: lag={lag:.1f}s, "
f"primary={self.primary_host}, replica={self.replica_host}"
)
# Sleep before next check
time.sleep(10)
except KeyboardInterrupt:
logger.info("Monitoring stopped by user")
break
except Exception as e:
logger.error(f"Monitoring error: {e}")
time.sleep(10)
# CLI usage
if __name__ == "__main__":
import argparse
parser = argparse.ArgumentParser(description="PostgreSQL Failover Manager")
parser.add_argument("--primary", required=True, help="Primary host")
parser.add_argument("--replica", required=True, help="Replica host")
parser.add_argument("--user", default="postgres", help="PostgreSQL user")
parser.add_argument("--password", default="", help="PostgreSQL password")
parser.add_argument("--threshold", type=int, default=30, help="Failover threshold (seconds)")
parser.add_argument("--webhook", help="Alert webhook URL")
args = parser.parse_args()
manager = PostgreSQLFailoverManager(
primary_host=args.primary,
replica_host=args.replica,
postgres_user=args.user,
postgres_password=args.password,
failover_threshold_seconds=args.threshold,
alert_webhook=args.webhook
)
manager.monitor_and_failover()
```
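### Example 3: MySQL GTID Replication (Sketch)
The command description also covers MySQL. This is a minimal GTID-based sketch for MySQL 8.0.23+, assuming `server_id`, `gtid_mode=ON`, and `enforce_gtid_consistency=ON` are already configured on both servers, the replica was seeded from a consistent copy of the source, and the hostnames, `repl` user, and `MYSQL_ROOT_PASSWORD` variable are placeholders:
```bash
#!/bin/bash
# Minimal MySQL 8.0 GTID replication sketch (not generated output)
set -e
PRIMARY_HOST="${PRIMARY_HOST:-primary.example.com}"
REPL_USER="${REPL_USER:-repl}"
REPL_PASSWORD="${REPL_PASSWORD:-changeme}"
# On the source: create the replication account
mysql -h "$PRIMARY_HOST" -u root -p"$MYSQL_ROOT_PASSWORD" <<EOF
CREATE USER IF NOT EXISTS '$REPL_USER'@'%' IDENTIFIED BY '$REPL_PASSWORD';
GRANT REPLICATION SLAVE ON *.* TO '$REPL_USER'@'%';
EOF
# On the replica: point it at the source using GTID auto-positioning and start replication
mysql -u root -p"$MYSQL_ROOT_PASSWORD" <<EOF
CHANGE REPLICATION SOURCE TO
  SOURCE_HOST='$PRIMARY_HOST',
  SOURCE_USER='$REPL_USER',
  SOURCE_PASSWORD='$REPL_PASSWORD',
  SOURCE_AUTO_POSITION=1;
START REPLICA;
SHOW REPLICA STATUS\G
EOF
```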
## Error Handling
| Error | Cause | Solution |
|-------|-------|----------|
| "could not connect to server" | Replica cannot reach primary | Check network connectivity, firewall rules, pg_hba.conf |
| "requested WAL segment already removed" | WAL files deleted before replica could receive them | Increase wal_keep_size or use replication slots |
| "replication slot does not exist" | Replica trying to use non-existent slot | Create slot on primary: `SELECT pg_create_physical_replication_slot('slot_name')` |
| "hot standby conflict" | Query on replica conflicts with recovery | Increase max_standby_streaming_delay or tune query cancellation |
| "timeline history file missing" | Replica and primary have diverged after failover | Rebuild replica from new primary using pg_basebackup |
## Configuration Options
**Replication Modes**
- **Asynchronous** (default): Best performance, small data loss risk
- **Synchronous** (`synchronous_commit=on` with `synchronous_standby_names` set): Zero data loss, slower writes
- **Remote write** (`synchronous_commit=remote_write`): Balanced approach
- **Remote apply** (`synchronous_commit=remote_apply`): Strongest consistency
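A minimal sketch of switching between these modes on the primary, assuming a standby whose `application_name` is `replica1` as in Example 1; without `synchronous_standby_names`, the stricter `synchronous_commit` values behave like local durability only:
```bash
# Name which standbys must confirm before a commit returns (enables synchronous replication)
sudo -u postgres psql <<'EOF'
ALTER SYSTEM SET synchronous_standby_names = 'FIRST 1 (replica1)';
ALTER SYSTEM SET synchronous_commit = 'remote_apply';   -- or on / remote_write
SELECT pg_reload_conf();
EOF
# Relax durability for a single non-critical transaction without changing the global default
sudo -u postgres psql <<'EOF'
BEGIN;
SET LOCAL synchronous_commit = 'local';
-- ... bulk or low-value writes here ...
COMMIT;
EOF
```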
**Replication Methods**
- **Streaming replication**: Binary WAL streaming (physical replication)
- **Logical replication**: Selective table replication (PostgreSQL 10+)
- **WAL shipping**: Archive-based replication (for backups)
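Logical replication is only described here, not generated by the examples above; a minimal sketch, assuming PostgreSQL 10+, `wal_level = logical` on the publisher, the table definitions already present on the subscriber, and placeholder names (`appdb`, `orders`, `customers`, `selective_pub`, `selective_sub`):
```bash
# On the publisher: publish only the tables you want to replicate
sudo -u postgres psql -d appdb <<'EOF'
-- requires wal_level = logical (restart needed if it was changed)
CREATE PUBLICATION selective_pub FOR TABLE orders, customers;
EOF
# On the subscriber: subscribe to the publication (this creates its own replication slot);
# the connecting role needs appropriate privileges on the published tables
sudo -u postgres psql -d appdb <<'EOF'
CREATE SUBSCRIPTION selective_sub
  CONNECTION 'host=primary.example.com port=5432 dbname=appdb user=replicator password=changeme'
  PUBLICATION selective_pub;
EOF
# Check subscription state on the subscriber
sudo -u postgres psql -d appdb -c "SELECT subname, received_lsn, latest_end_lsn, last_msg_receipt_time FROM pg_stat_subscription;"
```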
**Failover Strategies**
- **Manual failover**: DBA triggers promotion (safest)
- **Automatic failover**: Scripted promotion after health check failure
- **Patroni/repmgr**: Cluster management with automatic failover
- **Cloud-managed**: RDS/CloudSQL automatic failover
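For the manual strategy, promotion itself is a single command on the replica (Debian/Ubuntu cluster layout as in Example 1):
```bash
# Either via the Debian/Ubuntu cluster wrapper on the replica host...
sudo pg_ctlcluster 14 main promote
# ...or via SQL (PostgreSQL 12+); wait => true blocks until promotion completes
sudo -u postgres psql -c "SELECT pg_promote(wait => true);"
```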
## Best Practices
DO:
- Use replication slots to prevent WAL deletion before replica receives it
- Monitor replication lag continuously (alert at >10 seconds; see the query sketch after this list)
- Test failover procedures quarterly (disaster recovery drills)
- Use synchronous replication for critical write transactions only
- Implement connection pooling to handle failover reconnections
- Document failover runbooks with step-by-step procedures
- Keep replicas on same PostgreSQL major version as primary
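A sketch of the lag query behind that monitoring item: it reads the per-standby lag columns on the primary (PostgreSQL 10+), and a scheduled check or the Python monitor in Example 2 can alert when any value crosses the 10-second threshold:
```bash
# Per-standby lag as seen from the primary; alert when any value exceeds ~10 seconds
sudo -u postgres psql <<'EOF'
SELECT application_name,
       state,
       sync_state,
       EXTRACT(EPOCH FROM write_lag)  AS write_lag_s,
       EXTRACT(EPOCH FROM flush_lag)  AS flush_lag_s,
       EXTRACT(EPOCH FROM replay_lag) AS replay_lag_s
FROM pg_stat_replication;
EOF
```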
DON'T:
- Run long-running queries on replicas without tuning conflict resolution
- Forget to update application connection strings after failover
- Use synchronous replication over high-latency networks (>50ms)
- Skip monitoring replication lag (unnoticed lag means stale reads and larger data loss if you fail over)
- Promote replica without verifying it's up-to-date
- Delete replication slots without stopping replicas first
- Ignore hot standby conflicts (causes query cancellations)
## Performance Considerations
- **Replication lag**: <1s for local replicas, <5s for cross-region
- **Write overhead**: 5-10% for asynchronous, 20-50% for synchronous
- **Network bandwidth**: 10-50 Mbps per active replica
- **Disk I/O**: Replica writes same data as primary (similar load)
- **Failover time (RTO)**: 10-60 seconds for automatic, 5-15 minutes for manual
- **Data loss (RPO)**: 0 seconds (sync), 0-5 seconds (async)
## Security Considerations
- Use strong passwords for replication user (20+ characters)
- Encrypt replication traffic with SSL/TLS (`sslmode=require`)
- Restrict replication connections in pg_hba.conf to specific IPs
- Audit all failover operations for compliance
- Rotate replication credentials quarterly
- Use dedicated replication user (not superuser)
- Enable connection logging for replication connections
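A sketch combining the TLS and IP-restriction points above; the certificate path and replica address are placeholders:
```bash
# Replica side: require TLS and verify the primary's certificate in primary_conninfo
cat >> "$POSTGRES_DATA_DIR/postgresql.auto.conf" <<'EOF'
primary_conninfo = 'host=primary.example.com port=5432 user=replicator password=changeme sslmode=verify-full sslrootcert=/etc/postgresql/ssl/root.crt application_name=replica1'
EOF
# Primary side: accept replication connections only from the replica's address, over TLS
cat >> /etc/postgresql/14/main/pg_hba.conf <<'EOF'
hostssl replication replicator 10.0.1.20/32 scram-sha-256
EOF
```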
## Related Commands
- `/database-backup-automator` - Backup before major replication changes
- `/database-health-monitor` - Monitor replication lag and health
- `/database-recovery-manager` - PITR using WAL archives from replication
- `/database-connection-pooler` - Handle connection routing after failover
## Version History
- v1.0.0 (2024-10): Initial implementation with streaming replication and automatic failover
- Planned v1.1.0: Add logical replication support, Patroni integration

61
plugin.lock.json Normal file

@@ -0,0 +1,61 @@
{
"$schema": "internal://schemas/plugin.lock.v1.json",
"pluginId": "gh:jeremylongshore/claude-code-plugins-plus:plugins/database/database-replication-manager",
"normalized": {
"repo": null,
"ref": "refs/tags/v20251128.0",
"commit": "97ae5d64e8c06b38e8716238998aab19812e69e9",
"treeHash": "da5f5a034a7629bd22d3978b916295393b676c7a98873cdb2b6431232b16c504",
"generatedAt": "2025-11-28T10:18:21.192078Z",
"toolVersion": "publish_plugins.py@0.2.0"
},
"origin": {
"remote": "git@github.com:zhongweili/42plugin-data.git",
"branch": "master",
"commit": "aa1497ed0949fd50e99e70d6324a29c5b34f9390",
"repoRoot": "/Users/zhongweili/projects/openmind/42plugin-data"
},
"manifest": {
"name": "database-replication-manager",
"description": "Manage database replication, failover, and high availability configurations",
"version": "1.0.0"
},
"content": {
"files": [
{
"path": "README.md",
"sha256": "3f30b26bdc6895ac101de9ef4745188554a76c4c10318197441534d57eba2191"
},
{
"path": ".claude-plugin/plugin.json",
"sha256": "8264a4ac2bb7ecbcf87a1c1d2fd467af191222b97173156be933e93db6081821"
},
{
"path": "commands/replication.md",
"sha256": "2de263b472c1793b2cc6ae7fc3e47fce5b876968f377957e57a0becbcd9237b6"
},
{
"path": "skills/database-replication-manager/SKILL.md",
"sha256": "59f96e915fadc1fa39f2cb8f3310d321e405b86adb80a8b749056a4154bd19b3"
},
{
"path": "skills/database-replication-manager/references/README.md",
"sha256": "720f6e0356268b97a40bc71569cfcb315894d4833059d52dd27be8063f41ab1a"
},
{
"path": "skills/database-replication-manager/scripts/README.md",
"sha256": "223f0497201f24e609b54d987df9f5afc41731069e21e2309cd0f6ba0b340bde"
},
{
"path": "skills/database-replication-manager/assets/README.md",
"sha256": "24cfb70ef86f6762c67597ed06440f4ab8d6cd22e8bfdac707ad12925d21c49d"
}
],
"dirSha256": "da5f5a034a7629bd22d3978b916295393b676c7a98873cdb2b6431232b16c504"
},
"security": {
"scannedAt": null,
"scannerVersion": null,
"flags": []
}
}

55
skills/database-replication-manager/SKILL.md Normal file

@@ -0,0 +1,55 @@
---
name: managing-database-replication
description: |
This skill enables Claude to manage database replication, failover, and high availability configurations using the database-replication-manager plugin. It is designed to assist with tasks such as setting up master-slave replication, configuring automatic failover, monitoring replication lag, and implementing read scaling. Use this skill when the user requests help with "database replication", "failover configuration", "high availability", "replication lag", or "read scaling" for databases like PostgreSQL or MySQL. The plugin facilitates both physical and logical replication strategies.
allowed-tools: Read, Write, Edit, Grep, Glob, Bash
version: 1.0.0
---
## Overview
This skill empowers Claude to automate and streamline database replication processes, ensuring high availability and data consistency across multiple database instances. It simplifies the configuration and management of complex replication topologies.
## How It Works
1. **Initialization**: The skill activates the database-replication-manager plugin upon detecting relevant keywords.
2. **Configuration**: The skill prompts the user for database connection details, replication type (physical/logical), and desired configuration parameters (e.g., failover settings, replication lag thresholds).
3. **Implementation**: The plugin generates and executes the necessary commands to configure database replication based on the user's specifications.
## When to Use This Skill
This skill activates when you need to:
- Set up a new database replication environment.
- Configure automatic failover for a database cluster.
- Monitor replication lag and trigger alerts based on defined thresholds.
- Implement read scaling by distributing read queries across multiple replicas.
## Examples
### Example 1: Setting up Master-Slave Replication
User request: "Set up master-slave replication for my PostgreSQL database with automatic failover."
The skill will:
1. Activate the database-replication-manager plugin.
2. Guide the user through the configuration process, prompting for connection details and failover settings.
3. Generate and execute the necessary PostgreSQL commands to establish master-slave replication and configure automatic failover.
### Example 2: Monitoring Replication Lag
User request: "Monitor replication lag on my MySQL replica and alert me if it exceeds 5 seconds."
The skill will:
1. Activate the database-replication-manager plugin.
2. Configure replication lag monitoring for the specified MySQL replica.
3. Set up alerts that trigger when the replication lag exceeds the defined threshold of 5 seconds.
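As an illustration of the generated check (a sketch, assuming MySQL 8.0.22+ and a monitoring account; `ALERT_CMD` and the hostnames are placeholders):
```bash
#!/bin/bash
# Alert when replica lag exceeds 5 seconds (Seconds_Behind_Source is NULL while replication is broken)
THRESHOLD=5
LAG=$(mysql -h replica.example.com -u monitor -p"$MONITOR_PASSWORD" -e "SHOW REPLICA STATUS\G" \
  | awk '/Seconds_Behind_Source/ {print $2}')
if [ -z "$LAG" ] || [ "$LAG" = "NULL" ] || [ "$LAG" -gt "$THRESHOLD" ]; then
  ${ALERT_CMD:-echo} "MySQL replication lag alert: ${LAG:-unknown} seconds"
fi
```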
## Best Practices
- **Security**: Always encrypt database credentials and use secure communication channels for replication traffic.
- **Monitoring**: Implement comprehensive monitoring of replication status, lag, and resource utilization.
- **Testing**: Regularly test failover procedures to ensure they function correctly in a disaster recovery scenario.
## Integration
This skill can be integrated with other monitoring and alerting tools to provide comprehensive database management capabilities. It can also be used in conjunction with infrastructure-as-code tools to automate the deployment and configuration of database replication environments.

6
skills/database-replication-manager/assets/README.md Normal file

@@ -0,0 +1,6 @@
# Assets
Bundled resources for database-replication-manager skill
- [ ] replication_template.conf: A template configuration file for database replication.
- [ ] monitoring_dashboard.json: A sample dashboard configuration for monitoring replication metrics.

8
skills/database-replication-manager/references/README.md Normal file

@@ -0,0 +1,8 @@
# References
Bundled resources for database-replication-manager skill
- [ ] replication_best_practices.md: A document detailing best practices for database replication.
- [ ] failover_configuration.md: Detailed instructions on configuring automatic failover.
- [ ] database_schema.md: Schema definitions for the databases being replicated.
- [ ] troubleshooting_replication.md: A guide to troubleshooting common replication issues.

7
skills/database-replication-manager/scripts/README.md Normal file

@@ -0,0 +1,7 @@
# Scripts
Bundled resources for database-replication-manager skill
- [ ] setup_replication.sh: Automates the setup of master-slave replication.
- [ ] failover.sh: Simulates a failover scenario and tests the failover configuration.
- [ ] monitor_replication.py: Monitors replication lag and sends alerts if it exceeds a threshold.