Initial commit
15 .claude-plugin/plugin.json Normal file
@@ -0,0 +1,15 @@
{
  "name": "database-sharding-manager",
  "description": "Database plugin for database-sharding-manager",
  "version": "1.0.0",
  "author": {
    "name": "Claude Code Plugins",
    "email": "[email protected]"
  },
  "skills": [
    "./skills"
  ],
  "commands": [
    "./commands"
  ]
}
3 README.md Normal file
@@ -0,0 +1,3 @@
# database-sharding-manager

Database plugin for database-sharding-manager
741 commands/sharding.md Normal file
@@ -0,0 +1,741 @@
---
description: Implement horizontal database sharding for massive-scale applications
shortcut: sharding
---

# Database Sharding Manager

Design and implement horizontal database sharding strategies that distribute data across multiple database instances, enabling applications to scale beyond single-server limits with consistent hashing, automatic rebalancing, and cross-shard query coordination.

## When to Use This Command

Use `/sharding` when you need to:
- Scale beyond a single database server's capacity (>10 TB or >100k QPS)
- Distribute write load across multiple database servers
- Improve query performance through data locality
- Implement geographic data distribution for GDPR/data residency
- Reduce the blast radius of database failures (isolate tenant data)
- Support multi-tenant SaaS with tenant-level isolation

DON'T use this when:
- The database is small (<1 TB) and performing well
- Read replicas and caching would solve the problem instead
- The application can't handle the complexity of distributed transactions
- The team lacks expertise in distributed systems
- Cross-shard queries make up the majority of the workload (use partitioning instead)

## Design Decisions

This command implements **consistent hashing with virtual nodes** because it:
- Minimizes data movement when adding or removing shards (only K/n keys move)
- Distributes load evenly across shards via virtual nodes
- Supports gradual shard addition without downtime
- Enables geographic routing for data residency compliance
- Provides automatic failover with shard replica promotion

**Alternative considered: Range-based sharding** (see the sketch below)
- Simple to implement and understand
- Predictable data distribution
- Prone to hotspots if the key distribution is uneven
- Recommended for time-series data with sequential IDs

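For comparison, a range-based router keeps a sorted list of upper-bound keys and picks the first shard whose range covers the lookup key. This is a minimal, hypothetical sketch (the `RangeShardRouter` name and the boundary values are illustrative, not part of the generated output):

```python
# Hypothetical range-based routing sketch (illustrative only)
import bisect

class RangeShardRouter:
    """Route keys to shards by sorted, exclusive upper-bound ranges."""

    def __init__(self, boundaries, shard_ids):
        # boundaries[i] is the exclusive upper bound handled by shard_ids[i];
        # the last shard takes everything at or above the last boundary.
        self.boundaries = boundaries
        self.shard_ids = shard_ids

    def get_shard(self, key: int) -> int:
        idx = bisect.bisect_right(self.boundaries, key)
        return self.shard_ids[min(idx, len(self.shard_ids) - 1)]

# Example: IDs 0-999999 on shard 1, 1000000-1999999 on shard 2, the rest on shard 3
router = RangeShardRouter([1_000_000, 2_000_000], [1, 2, 3])
print(router.get_shard(1_500_000))  # -> 2
```

Sequential IDs concentrate writes on the newest range, which is why the command defaults to consistent hashing.
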
**Alternative considered: Directory-based sharding**
- Flexible shard assignment via a lookup table
- Easy to move individual records
- The directory lookup is a single point of failure
- Recommended for small-scale or initial implementations

## Prerequisites

Before running this command:
1. The application supports sharding-aware database connections
2. A clear understanding of the sharding key (immutable, high cardinality)
3. A strategy for handling cross-shard queries and joins
4. Monitoring infrastructure for shard health
5. A migration plan from the single database to the sharded architecture

## Implementation Process

### Step 1: Choose Sharding Strategy
Select a sharding approach based on data access patterns and scale requirements.

### Step 2: Design Shard Key
Choose an immutable, high-cardinality key that distributes data evenly (user_id, tenant_id).

### Step 3: Implement Shard Routing Layer
Build connection pooling and routing logic to direct queries to the correct shard.

### Step 4: Migrate Data to Shards
Perform a zero-downtime migration from the monolithic database to the sharded architecture. A common approach is a dual-write phase, sketched below.

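The following is a minimal, hypothetical sketch of the dual-write phase (the `legacy_db` and `router` objects and the table layout are illustrative, not the generated migration scripts): writes go to both the old database and the target shard until backfill and verification complete, after which reads cut over shard by shard.

```python
# Hypothetical dual-write sketch for the migration phase (illustrative only)
def save_user(user, legacy_db, router):
    """Write to the legacy database and the target shard during migration."""
    # 1. The legacy database remains the source of truth until cutover.
    legacy_db.execute(
        "INSERT INTO users (user_id, email) VALUES (%s, %s) "
        "ON CONFLICT (user_id) DO UPDATE SET email = EXCLUDED.email",
        (user["user_id"], user["email"]),
    )

    # 2. Mirror the write to the shard chosen by the router; failures are
    #    logged and repaired by the backfill job instead of failing the request.
    try:
        shard = router.get_shard(user["user_id"])
        shard.execute(
            "INSERT INTO users (user_id, email) VALUES (%s, %s) "
            "ON CONFLICT (user_id) DO UPDATE SET email = EXCLUDED.email",
            (user["user_id"], user["email"]),
        )
    except Exception as exc:  # keep the legacy write authoritative
        print(f"Dual-write to shard failed, will be backfilled: {exc}")
```
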
### Step 5: Monitor and Rebalance
Track shard load distribution and rebalance data as needed.

## Output Format

The command generates:
- `sharding/shard_router.py` - Consistent hashing router implementation
- `sharding/shard_manager.js` - Shard connection pool manager
- `migration/shard_migration.sql` - Data migration scripts per shard
- `monitoring/shard_health.sql` - Per-shard metrics and health checks
- `docs/sharding_architecture.md` - Architecture documentation and runbooks

## Code Examples

### Example 1: Consistent Hashing Shard Router with Virtual Nodes

```python
# sharding/consistent_hash_router.py
import hashlib
import bisect
from typing import List, Dict, Optional, Any
from dataclasses import dataclass
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

@dataclass
class ShardConfig:
    """Configuration for a database shard."""
    shard_id: int
    host: str
    port: int
    database: str
    weight: int = 1  # Relative weight for load distribution
    status: str = 'active'  # active, readonly, maintenance

class ConsistentHashRouter:
    """
    Consistent hashing implementation with virtual nodes.

    Virtual nodes ensure even distribution even with heterogeneous shard sizes.
    Adding/removing shards only affects K/n keys where n = number of shards.
    """

    def __init__(self, virtual_nodes: int = 150):
        """
        Initialize consistent hash ring.

        Args:
            virtual_nodes: Number of virtual nodes per physical shard.
                More nodes = better distribution, higher memory usage.
        """
        self.virtual_nodes = virtual_nodes
        self.ring: List[int] = []  # Sorted hash values
        self.ring_map: Dict[int, ShardConfig] = {}  # Hash -> Shard mapping
        self.shards: Dict[int, ShardConfig] = {}  # Shard ID -> Config

    def add_shard(self, shard: ShardConfig) -> None:
        """Add shard to consistent hash ring with virtual nodes."""
        self.shards[shard.shard_id] = shard

        # Create virtual nodes weighted by shard capacity
        num_vnodes = self.virtual_nodes * shard.weight

        for i in range(num_vnodes):
            # Create unique hash for each virtual node
            vnode_key = f"{shard.shard_id}:{shard.host}:{i}"
            hash_value = self._hash(vnode_key)

            # Insert into sorted ring
            bisect.insort(self.ring, hash_value)
            self.ring_map[hash_value] = shard

        logger.info(
            f"Added shard {shard.shard_id} ({shard.host}) with {num_vnodes} virtual nodes"
        )

    def remove_shard(self, shard_id: int) -> None:
        """Remove shard from hash ring."""
        if shard_id not in self.shards:
            raise ValueError(f"Shard {shard_id} not found")

        shard = self.shards[shard_id]

        # Remove all virtual nodes for this shard
        num_vnodes = self.virtual_nodes * shard.weight
        removed_count = 0

        for i in range(num_vnodes):
            vnode_key = f"{shard.shard_id}:{shard.host}:{i}"
            hash_value = self._hash(vnode_key)

            if hash_value in self.ring_map:
                self.ring.remove(hash_value)
                del self.ring_map[hash_value]
                removed_count += 1

        del self.shards[shard_id]

        logger.info(
            f"Removed shard {shard_id} ({removed_count} virtual nodes)"
        )

    def get_shard(self, key: str) -> Optional[ShardConfig]:
        """
        Find shard for given key using consistent hashing.

        Args:
            key: Sharding key (user_id, tenant_id, etc.)

        Returns:
            ShardConfig for the shard responsible for this key
        """
        if not self.ring:
            raise ValueError("No shards available in hash ring")

        key_hash = self._hash(key)

        # Find the first ring position past key_hash (clockwise search)
        idx = bisect.bisect_right(self.ring, key_hash)

        # Wrap around to beginning if at end of ring
        if idx == len(self.ring):
            idx = 0

        shard = self.ring_map[self.ring[idx]]

        # Skip if shard is in maintenance
        if shard.status == 'maintenance':
            logger.warning(f"Shard {shard.shard_id} in maintenance, finding alternate")
            return self._find_next_active_shard(idx)

        return shard

    def _find_next_active_shard(self, start_idx: int) -> Optional[ShardConfig]:
        """Find next active shard in ring, skipping maintenance shards."""
        for i in range(len(self.ring)):
            idx = (start_idx + i) % len(self.ring)
            shard = self.ring_map[self.ring[idx]]

            if shard.status == 'active':
                return shard

        raise ValueError("No active shards available")

    def _hash(self, key: str) -> int:
        """
        Generate consistent hash value for key.

        Uses MD5 for speed. SHA256 is more secure but slower.
        """
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def get_shard_distribution(self) -> Dict[int, int]:
        """Analyze key distribution across shards (for testing)."""
        distribution = {shard_id: 0 for shard_id in self.shards}

        # Sample 10000 keys to estimate distribution
        for i in range(10000):
            shard = self.get_shard(str(i))
            distribution[shard.shard_id] += 1

        return distribution

    def rebalance_check(self) -> Dict[str, Any]:
        """
        Check if shards are balanced and recommend rebalancing.

        Returns:
            Dict with balance metrics and recommendations
        """
        distribution = self.get_shard_distribution()

        total = sum(distribution.values())
        expected_per_shard = total / len(self.shards)

        imbalance = {}
        for shard_id, count in distribution.items():
            deviation = abs(count - expected_per_shard) / expected_per_shard * 100
            imbalance[shard_id] = {
                'count': count,
                'expected': expected_per_shard,
                'deviation_percent': round(deviation, 2)
            }

        max_deviation = max(s['deviation_percent'] for s in imbalance.values())

        return {
            'balanced': max_deviation < 10,  # <10% deviation is acceptable
            'max_deviation_percent': max_deviation,
            'shard_distribution': imbalance,
            'recommendation': (
                'Rebalancing recommended' if max_deviation > 20
                else 'Distribution acceptable'
            )
        }

# Usage example
if __name__ == "__main__":
    # Initialize router
    router = ConsistentHashRouter(virtual_nodes=150)

    # Add shards
    router.add_shard(ShardConfig(
        shard_id=1,
        host='shard1.db.example.com',
        port=5432,
        database='myapp_shard1',
        weight=1
    ))

    router.add_shard(ShardConfig(
        shard_id=2,
        host='shard2.db.example.com',
        port=5432,
        database='myapp_shard2',
        weight=2  # Double capacity
    ))

    router.add_shard(ShardConfig(
        shard_id=3,
        host='shard3.db.example.com',
        port=5432,
        database='myapp_shard3',
        weight=1
    ))

    # Route queries
    user_id = "user_12345"
    shard = router.get_shard(user_id)
    print(f"User {user_id} → Shard {shard.shard_id} ({shard.host})")

    # Check balance
    balance_report = router.rebalance_check()
    print("\nBalance report:")
    print(f"  Balanced: {balance_report['balanced']}")
    print(f"  Max deviation: {balance_report['max_deviation_percent']}%")
```

### Example 2: Shard-Aware Database Connection Pool

```javascript
// sharding/shard_connection_pool.js
const { Pool } = require('pg');
const crypto = require('crypto');

class ShardConnectionPool {
  constructor(shardConfigs) {
    this.shards = new Map();
    this.virtualNodes = 150;
    this.ring = [];
    this.ringMap = new Map();

    // Initialize connection pools for each shard
    shardConfigs.forEach(config => {
      const pool = new Pool({
        host: config.host,
        port: config.port,
        database: config.database,
        user: config.user,
        password: config.password,
        max: 20, // Max connections per shard
        idleTimeoutMillis: 30000,
        connectionTimeoutMillis: 2000
      });

      this.shards.set(config.shardId, {
        config,
        pool,
        stats: {
          queries: 0,
          errors: 0,
          avgLatency: 0
        }
      });

      this.addToRing(config);
    });

    console.log(`Initialized ${this.shards.size} shards with ${this.ring.length} virtual nodes`);
  }

  addToRing(config) {
    const numVNodes = this.virtualNodes * (config.weight || 1);

    for (let i = 0; i < numVNodes; i++) {
      const vnodeKey = `${config.shardId}:${config.host}:${i}`;
      const hash = this.hash(vnodeKey);

      this.ring.push(hash);
      this.ringMap.set(hash, config.shardId);
    }

    // Keep the ring sorted so lookups walk it in hash order
    this.ring.sort((a, b) => a - b);
  }

  hash(key) {
    return parseInt(
      crypto.createHash('md5').update(key).digest('hex').substring(0, 8),
      16
    );
  }

  getShardId(key) {
    if (this.ring.length === 0) {
      throw new Error('No shards available');
    }

    const keyHash = this.hash(key);

    // Linear scan for the next hash >= keyHash (a binary search would be faster)
    let idx = this.ring.findIndex(h => h >= keyHash);

    if (idx === -1) {
      idx = 0; // Wrap around
    }

    return this.ringMap.get(this.ring[idx]);
  }

  async query(shardKey, sql, params = []) {
    const shardId = this.getShardId(shardKey);
    const shard = this.shards.get(shardId);

    if (!shard) {
      throw new Error(`Shard ${shardId} not found`);
    }

    const startTime = Date.now();

    try {
      const result = await shard.pool.query(sql, params);

      // Update stats
      shard.stats.queries++;
      const latency = Date.now() - startTime;
      shard.stats.avgLatency =
        (shard.stats.avgLatency * (shard.stats.queries - 1) + latency) /
        shard.stats.queries;

      return result;

    } catch (error) {
      shard.stats.errors++;
      console.error(`Query error on shard ${shardId}:`, error);
      throw error;
    }
  }

  async queryMultipleShards(sql, params = []) {
    /**
     * Execute query across all shards and merge results.
     * Use sparingly - cross-shard queries are expensive.
     */
    const promises = Array.from(this.shards.values()).map(async shard => {
      try {
        const result = await shard.pool.query(sql, params);
        return {
          shardId: shard.config.shardId,
          rows: result.rows,
          success: true
        };
      } catch (error) {
        return {
          shardId: shard.config.shardId,
          error: error.message,
          success: false
        };
      }
    });

    const results = await Promise.all(promises);

    // Merge rows from all shards
    const allRows = results
      .filter(r => r.success)
      .flatMap(r => r.rows);

    return {
      rows: allRows,
      shardResults: results
    };
  }

  async transaction(shardKey, callback) {
    /**
     * Execute transaction on specific shard.
     * Cross-shard transactions require 2PC (not implemented).
     */
    const shardId = this.getShardId(shardKey);
    const shard = this.shards.get(shardId);

    const client = await shard.pool.connect();

    try {
      await client.query('BEGIN');
      const result = await callback(client);
      await client.query('COMMIT');
      return result;
    } catch (error) {
      await client.query('ROLLBACK');
      throw error;
    } finally {
      client.release();
    }
  }

  getStats() {
    const stats = {};

    for (const [shardId, shard] of this.shards) {
      stats[shardId] = {
        ...shard.stats,
        poolSize: shard.pool.totalCount,
        idleConnections: shard.pool.idleCount,
        waitingClients: shard.pool.waitingCount
      };
    }

    return stats;
  }

  async close() {
    for (const shard of this.shards.values()) {
      await shard.pool.end();
    }
  }
}

// Usage example
const shardPool = new ShardConnectionPool([
  {
    shardId: 1,
    host: 'shard1.db.example.com',
    port: 5432,
    database: 'myapp_shard1',
    user: 'app_user',
    password: 'password',
    weight: 1
  },
  {
    shardId: 2,
    host: 'shard2.db.example.com',
    port: 5432,
    database: 'myapp_shard2',
    user: 'app_user',
    password: 'password',
    weight: 2
  }
]);

// Queries use await, so run them inside an async context
(async () => {
  // Single-shard query
  const userId = 'user_12345';
  const user = await shardPool.query(
    userId,
    'SELECT * FROM users WHERE user_id = $1',
    [userId]
  );

  // Cross-shard query (expensive - avoid if possible)
  const allActiveUsers = await shardPool.queryMultipleShards(
    'SELECT * FROM users WHERE status = $1',
    ['active']
  );

  console.log(`Found ${allActiveUsers.rows.length} active users across all shards`);

  // Transaction on specific shard
  await shardPool.transaction(userId, async (client) => {
    await client.query(
      'UPDATE users SET balance = balance - $1 WHERE user_id = $2',
      [100, userId]
    );

    await client.query(
      'INSERT INTO transactions (user_id, amount, type) VALUES ($1, $2, $3)',
      [userId, -100, 'withdrawal']
    );
  });
})();

// Monitor shard health
setInterval(() => {
  const stats = shardPool.getStats();
  console.log('Shard statistics:', JSON.stringify(stats, null, 2));
}, 60000);
```

### Example 3: Geographic Sharding with Data Residency

```python
# sharding/geo_shard_router.py
from typing import Dict, List, Optional
from dataclasses import dataclass
from enum import Enum

class Region(Enum):
    """Geographic regions for data residency compliance."""
    US_EAST = 'us-east'
    US_WEST = 'us-west'
    EU_WEST = 'eu-west'
    ASIA_PACIFIC = 'asia-pacific'

@dataclass
class GeoShardConfig:
    region: Region
    shard_id: int
    host: str
    port: int
    database: str
    data_residency_compliant: bool = True

class GeographicShardRouter:
    """
    Route queries to region-specific shards for GDPR/data residency compliance.

    Each user/tenant is assigned to a geographic region and all their data
    resides in shards within that region.
    """

    def __init__(self):
        self.region_shards: Dict[Region, List[GeoShardConfig]] = {}
        self.user_region_map: Dict[str, Region] = {}  # user_id -> region

    def add_region_shard(self, shard: GeoShardConfig) -> None:
        """Add shard for specific geographic region."""
        if shard.region not in self.region_shards:
            self.region_shards[shard.region] = []

        self.region_shards[shard.region].append(shard)
        print(f"Added shard {shard.shard_id} for region {shard.region.value}")

    def assign_user_region(self, user_id: str, region: Region) -> None:
        """Assign user to geographic region (permanent assignment)."""
        if user_id in self.user_region_map:
            raise ValueError(
                f"User {user_id} already assigned to {self.user_region_map[user_id]}"
            )

        self.user_region_map[user_id] = region
        print(f"Assigned user {user_id} to region {region.value}")

    def get_shard_for_user(self, user_id: str) -> Optional[GeoShardConfig]:
        """Get shard for user based on regional assignment."""
        region = self.user_region_map.get(user_id)

        if not region:
            raise ValueError(f"User {user_id} not assigned to any region")

        shards = self.region_shards.get(region)

        if not shards:
            raise ValueError(f"No shards available for region {region.value}")

        # Deterministic assignment across shards in the region.
        # Note: built-in hash() is not stable across processes; use a stable
        # hash (e.g. consistent hashing as in Example 1) in production.
        shard_idx = hash(user_id) % len(shards)
        return shards[shard_idx]

    def validate_data_residency(self, user_id: str, shard: GeoShardConfig) -> bool:
        """Ensure data residency compliance before query execution."""
        user_region = self.user_region_map.get(user_id)

        if user_region is None:
            raise ValueError(f"User {user_id} not assigned to any region")

        if user_region != shard.region:
            raise ValueError(
                f"Data residency violation: User {user_id} in {user_region.value} "
                f"attempting access to shard in {shard.region.value}"
            )

        return True

# Usage
geo_router = GeographicShardRouter()

# Add region-specific shards
geo_router.add_region_shard(GeoShardConfig(
    region=Region.US_EAST,
    shard_id=1,
    host='us-east-shard1.db.example.com',
    port=5432,
    database='myapp_us_east'
))

geo_router.add_region_shard(GeoShardConfig(
    region=Region.EU_WEST,
    shard_id=2,
    host='eu-west-shard1.db.example.com',
    port=5432,
    database='myapp_eu_west',
    data_residency_compliant=True
))

# Assign users to regions (based on signup location)
geo_router.assign_user_region('user_us_12345', Region.US_EAST)
geo_router.assign_user_region('user_eu_67890', Region.EU_WEST)

# Route queries to correct regional shard
us_user_shard = geo_router.get_shard_for_user('user_us_12345')
print(f"US user → {us_user_shard.host}")

eu_user_shard = geo_router.get_shard_for_user('user_eu_67890')
print(f"EU user → {eu_user_shard.host}")
```

## Error Handling

| Error | Cause | Solution |
|-------|-------|----------|
| "No shards available" | All shards offline or empty ring | Add at least one shard, check shard health |
| "Cross-shard foreign key violation" | Reference to data on different shard | Denormalize data or use application-level joins |
| "Shard rebalancing in progress" | Data migration active | Retry query or route to new shard |
| "Distributed transaction failure" | 2PC coordinator unreachable | Implement saga pattern or idempotent operations |
| "Hotspot detected on shard" | Uneven key distribution | Rebalance with more virtual nodes or reshard |

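When a cross-shard write cannot rely on 2PC, the saga pattern replaces it with a sequence of local transactions plus compensating actions, as the table above suggests. A minimal, hypothetical sketch (the step and compensation callables are illustrative):

```python
# Hypothetical saga sketch for cross-shard writes (illustrative only)
def run_saga(steps):
    """Run (action, compensation) pairs; on failure, undo completed steps in reverse."""
    completed = []
    try:
        for action, compensation in steps:
            action()                      # local transaction on one shard
            completed.append(compensation)
    except Exception:
        for compensation in reversed(completed):
            compensation()                # best-effort compensating transaction
        raise

# Example: debit on the payer's shard, credit on the payee's shard
# run_saga([
#     (lambda: debit(payer_id, 100), lambda: credit(payer_id, 100)),
#     (lambda: credit(payee_id, 100), lambda: debit(payee_id, 100)),
# ])
```
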
## Configuration Options

A sketch of how these options fit together follows the lists below.

**Sharding Strategies**
- `consistent_hash`: Best for even distribution, minimal rebalancing
- `range`: Simple, good for time-series, prone to hotspots
- `directory`: Flexible, requires lookup table maintenance
- `geographic`: Data residency compliance, region isolation

**Virtual Nodes**
- 50-100: Faster routing, less even distribution
- 150-200: Balanced (recommended for production)
- 300+: Most even distribution, higher memory usage

**Connection Pooling**
- `max_connections_per_shard`: 10-50 depending on load
- `idle_timeout`: 30-60 seconds
- `connection_timeout`: 2-5 seconds

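A minimal, hypothetical example of these options combined into one configuration object (the key names are illustrative; the generated files may use a different layout):

```python
# Hypothetical sharding configuration (key names are illustrative)
SHARDING_CONFIG = {
    "strategy": "consistent_hash",   # consistent_hash | range | directory | geographic
    "virtual_nodes": 150,            # balanced default for production
    "shards": [
        {"shard_id": 1, "host": "shard1.db.example.com", "port": 5432,
         "database": "myapp_shard1", "weight": 1},
        {"shard_id": 2, "host": "shard2.db.example.com", "port": 5432,
         "database": "myapp_shard2", "weight": 2},
    ],
    "pool": {
        "max_connections_per_shard": 20,
        "idle_timeout_seconds": 30,
        "connection_timeout_seconds": 2,
    },
}
```
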
## Best Practices

DO:
- Use immutable, high-cardinality shard keys (user_id, tenant_id)
- Implement connection pooling per shard
- Monitor shard load distribution continuously
- Design to minimize cross-shard queries
- Use read replicas within shards for read scale
- Plan shard capacity for 2-3 years of growth

DON'T:
- Use mutable shard keys (email and username can change)
- Perform JOINs across shards (denormalize instead)
- Ignore shard imbalance (it leads to hotspots)
- Add shards without capacity planning
- Skip monitoring per-shard metrics
- Use distributed transactions without strong justification

## Performance Considerations

- Shard routing adds ~1-5 ms of latency per query
- Cross-shard queries are 10-100x slower than single-shard queries
- Adding a shard affects K/n keys, where K = total keys and n = shard count
- Virtual nodes increase routing time to O(log(v*n)) but improve distribution
- A connection pool per shard adds memory overhead (~10 MB per pool)
- Rebalancing requires a dual-write period (5-10% overhead)

## Related Commands

- `/database-partition-manager` - Partition tables within shards
- `/database-replication-manager` - Set up replicas per shard
- `/database-migration-manager` - Migrate data between shards
- `/database-health-monitor` - Monitor per-shard health metrics

## Version History

- v1.0.0 (2024-10): Initial implementation with consistent hashing and geographic routing
- Planned v1.1.0: Add automatic shard rebalancing and distributed transaction support
61 plugin.lock.json Normal file
@@ -0,0 +1,61 @@
{
  "$schema": "internal://schemas/plugin.lock.v1.json",
  "pluginId": "gh:jeremylongshore/claude-code-plugins-plus:plugins/database/database-sharding-manager",
  "normalized": {
    "repo": null,
    "ref": "refs/tags/v20251128.0",
    "commit": "0313f61e2428178a2d7324284abd1c78bbe44928",
    "treeHash": "e34a397acfec31c7bc45f9ee35f580c5822bd05875be5a30d17712802f009729",
    "generatedAt": "2025-11-28T10:18:21.823678Z",
    "toolVersion": "publish_plugins.py@0.2.0"
  },
  "origin": {
    "remote": "git@github.com:zhongweili/42plugin-data.git",
    "branch": "master",
    "commit": "aa1497ed0949fd50e99e70d6324a29c5b34f9390",
    "repoRoot": "/Users/zhongweili/projects/openmind/42plugin-data"
  },
  "manifest": {
    "name": "database-sharding-manager",
    "description": "Database plugin for database-sharding-manager",
    "version": "1.0.0"
  },
  "content": {
    "files": [
      {
        "path": "README.md",
        "sha256": "b5ee974de3c8061e1d4878cebeb9faa9ff914b7bfba30b64b6ae025de488efdc"
      },
      {
        "path": ".claude-plugin/plugin.json",
        "sha256": "7afaf3dfc37eb5d9ba72d46123f640f04ef04923605d87cdd6820bc0625221de"
      },
      {
        "path": "commands/sharding.md",
        "sha256": "930eea29f7fd0e721c4885793ecec3d3d28082307497490b713b05f887b9aea9"
      },
      {
        "path": "skills/database-sharding-manager/SKILL.md",
        "sha256": "4d25b5792c07564387403b989f7ee6e549ba90c80db444d1d4845a011e576e1f"
      },
      {
        "path": "skills/database-sharding-manager/references/README.md",
        "sha256": "c6562801f8dc741600dc90086f314623ba0c462eb4ddb61e6b88613f11fd7a47"
      },
      {
        "path": "skills/database-sharding-manager/scripts/README.md",
        "sha256": "3ad8a7fddd2eaf4d96876d5b8e83551c77510b171d2cf19176e7467216749b3b"
      },
      {
        "path": "skills/database-sharding-manager/assets/README.md",
        "sha256": "7db047463743a3eec9d31a5d4b570cc832bbc70c98e1dec3ef77ac73117320aa"
      }
    ],
    "dirSha256": "e34a397acfec31c7bc45f9ee35f580c5822bd05875be5a30d17712802f009729"
  },
  "security": {
    "scannedAt": null,
    "scannerVersion": null,
    "flags": []
  }
}
54 skills/database-sharding-manager/SKILL.md Normal file
@@ -0,0 +1,54 @@
---
name: managing-database-sharding
description: |
  This skill assists with managing database sharding strategies. It is activated when the user needs to implement horizontal database sharding to scale beyond single-server limitations. The skill supports designing sharding strategies, distributing data across multiple database instances, and implementing consistent hashing, automatic rebalancing, and cross-shard query coordination. Use this skill when the user mentions "database sharding", "sharding implementation", "scale database", or "horizontal partitioning". The plugin helps design and implement sharding for high-scale applications.
allowed-tools: Read, Write, Edit, Grep, Glob, Bash
version: 1.0.0
---

## Overview

This skill empowers Claude to design and implement horizontal database sharding strategies. It guides the user through the process of distributing data across multiple database instances, ensuring scalability and performance for applications handling large datasets and high query loads.

## How It Works

1. **Strategy Design**: Analyzes the application's data model and access patterns to determine the optimal sharding key and sharding strategy (e.g., range-based, hash-based).
2. **Implementation Planning**: Generates a detailed plan for implementing the chosen sharding strategy, including database schema modifications, data migration procedures, and application code changes.
3. **Cross-Shard Query Coordination**: Provides guidance on implementing cross-shard query coordination mechanisms to ensure data consistency and accuracy across multiple shards.

## When to Use This Skill

This skill activates when you need to:
- Scale a database beyond the capacity of a single server.
- Distribute write load across multiple database servers.
- Improve database performance by reducing contention.

## Examples

### Example 1: Scaling an E-commerce Product Catalog

User request: "Implement database sharding for my e-commerce product catalog to handle increased traffic and product listings."

The skill will:
1. Analyze the product catalog's data model and access patterns.
2. Recommend a hash-based sharding strategy based on product ID.
3. Generate a plan for migrating the product catalog data to the sharded database.

### Example 2: Sharding a Social Media Activity Feed

User request: "Design a sharding strategy for a social media activity feed to handle millions of users and billions of activities."

The skill will:
1. Evaluate the activity feed's data model and query patterns.
2. Suggest a time-based sharding strategy combined with user ID sharding.
3. Outline the steps for implementing cross-shard queries to retrieve activities across multiple shards.

## Best Practices

- **Data Modeling**: Carefully consider the sharding key and its impact on query performance.
- **Data Migration**: Plan the data migration process thoroughly to minimize downtime and ensure data integrity.
- **Monitoring**: Implement robust monitoring to track shard performance and identify potential issues.

## Integration

This skill can be integrated with other database management tools and plugins to automate tasks such as schema creation, data migration, and monitoring. It complements plugins focused on database deployment and performance tuning.
7 skills/database-sharding-manager/assets/README.md Normal file
@@ -0,0 +1,7 @@
# Assets

Bundled resources for database-sharding-manager skill

- [ ] sharding_config_template.yaml: Template for the sharding configuration file.
- [ ] example_data_distribution.json: Example of data distribution across shards.
- [ ] monitoring_dashboard.json: Example dashboard configuration for monitoring sharding performance.
10 skills/database-sharding-manager/references/README.md Normal file
@@ -0,0 +1,10 @@
# References

Bundled resources for database-sharding-manager skill

- [ ] sharding_strategies.md: Documentation on different database sharding strategies (e.g., range-based, hash-based).
- [ ] consistent_hashing.md: Detailed explanation of consistent hashing and its implementation.
- [ ] auto_rebalancing.md: Guide on implementing automatic rebalancing of shards.
- [ ] cross_shard_query_patterns.md: Best practices for querying data across multiple shards.
- [ ] sharding_api_reference.md: API documentation for interacting with the sharding manager.
- [ ] error_handling_sharding.md: Best practices for error handling in sharded environments.
8 skills/database-sharding-manager/scripts/README.md Normal file
@@ -0,0 +1,8 @@
# Scripts

Bundled resources for database-sharding-manager skill

- [ ] init_sharding.py: Script to initialize database sharding based on a given strategy.
- [ ] rebalance_shards.py: Script to rebalance data across shards automatically.
- [ ] cross_shard_query.py: Script to execute queries across multiple shards and aggregate results.
- [ ] validate_sharding_config.py: Script to validate the sharding configuration file.