Proxmox Clustering Reference
Cluster Benefits
- Centralized web management
- Live VM migration between nodes
- High availability (HA) with automatic failover
- Shared configuration
Cluster Requirements
| Requirement | Details |
|---|---|
| Version | Same major/minor Proxmox version |
| Time | NTP synchronized |
| Network | Low-latency cluster network |
| Names | Unique node hostnames |
| Storage | Shared storage for HA |
Cluster Commands
# Check cluster status
pvecm status
# List cluster nodes
pvecm nodes
# Add node to cluster (run on new node)
pvecm add <existing-node>
# Remove node (run on remaining node)
pvecm delnode <node-name>
# Override expected votes (quorum-loss recovery, use with care)
pvecm expected <votes>
Quorum
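The cluster needs a majority of votes (more than half) to stay quorate. Without quorum, /etc/pve becomes read-only and configuration changes are blocked.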
| Nodes | Quorum | Can Lose |
|---|---|---|
| 2 | 2 | 0 (use QDevice) |
| 3 | 2 | 1 |
| 4 | 3 | 1 |
| 5 | 3 | 2 |
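The table follows the majority rule: with one vote per node, quorum is floor(N/2) + 1, and the cluster tolerates N minus quorum failures. A minimal sketch of that arithmetic, assuming one vote per node; `quorum_for` and `can_lose` are illustrative helper names, not Proxmox commands:

```shell
#!/usr/bin/env bash
# Illustrative helpers (not part of Proxmox); assume one vote per node.

# Votes needed for quorum: floor(n / 2) + 1
quorum_for() {
    local nodes=$1
    echo $(( nodes / 2 + 1 ))
}

# Nodes that can fail while the remainder stays quorate
can_lose() {
    local nodes=$1
    echo $(( nodes - (nodes / 2 + 1) ))
}

for n in 2 3 4 5; do
    printf '%s nodes: quorum=%s, can lose=%s\n' \
        "$n" "$(quorum_for "$n")" "$(can_lose "$n")"
done
```

Going from 3 to 4 nodes raises quorum without raising the number of tolerated failures, which is why odd node counts (or an even count plus a QDevice) are usually preferred.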
QDevice
External quorum device for even-node clusters:
- Prevents split-brain in 2-node clusters
- Runs on separate machine
- Provides tie-breaking vote
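A QDevice setup usually comes down to a few commands. The sketch below only prints them rather than running anything; it assumes a Debian-based external host, and `QDEVICE_IP` (with its example address) and the `qdevice_steps` helper are placeholders introduced for illustration:

```shell
#!/usr/bin/env bash
# Sketch only: prints the usual QDevice setup steps instead of running them.
# QDEVICE_IP is a placeholder for the external host's address.
QDEVICE_IP="${QDEVICE_IP:-192.0.2.10}"

qdevice_steps() {
    cat <<EOF
# On the external host (not a cluster node):
apt install corosync-qnetd
# On every cluster node:
apt install corosync-qdevice
# On one cluster node:
pvecm qdevice setup ${QDEVICE_IP}
# Verify the extra vote:
pvecm status
EOF
}

qdevice_steps
```

Review the printed steps against the documentation for the installed release before running them by hand.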
High Availability (HA)
Automatic VM restart on healthy node if host fails.
Requirements
- Shared storage (Ceph, NFS, iSCSI)
- Fencing enabled (watchdog)
- HA group configured (optional)
- VM added to HA
HA States
| State | Description |
|---|---|
| started | VM running, managed by HA |
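| stopped | VM stopped intentionally and kept stopped by HA |
| migrate | Live migration to another node in progress |
| relocate | Offline move (stop, move, start) to another node |
| error | Service disabled after errors; needs manual intervention |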
HA Configuration
- Enable fencing (watchdog device)
- Create HA group (optional)
- Add VM to HA: Datacenter → HA → Add
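The GUI steps have a CLI counterpart in `ha-manager`. The following is a hedged sketch, assuming a release where HA groups are still available; the VMID `100`, the group name `prefer-fast`, the node names, and the `run`/`enable_ha` helpers are placeholders. With `DRY_RUN=1` (the default here) the commands are printed, not executed:

```shell
#!/usr/bin/env bash
# Sketch: CLI equivalent of the HA configuration steps.
# VMID, group name, and node names are placeholders.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

enable_ha() {
    local vmid=$1 group=$2
    # Optional: create an HA group limited to preferred nodes
    run ha-manager groupadd "$group" --nodes "node1,node2"
    # Put the VM under HA control and request that it be running
    run ha-manager add "vm:${vmid}" --state started --group "$group"
    # Show the resulting HA status
    run ha-manager status
}

enable_ha 100 prefer-fast
```

Setting `DRY_RUN=0` on a cluster node would run the commands for real; the group step can be dropped when no node preference is needed.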
Fencing
Prevents split-brain by forcing failed node to stop:
# Check the watchdog multiplexer that HA fencing relies on
systemctl status watchdog-mux
# Watchdog module selection (softdog is used if none is set)
/etc/default/pve-ha-manager
Live Migration
Move a running VM between nodes with minimal downtime (a brief pause at the final switchover).
Requirements
- Shared storage OR local-to-local migration
- Compatible CPUs (same architecture, and a VM CPU type that both nodes support)
- Network connectivity
- Sufficient resources on target
Migration Types
| Type | Downtime | Requirements |
|---|---|---|
| Live | Minimal | Shared storage |
| Offline | Full | VM stopped; any storage |
| Local storage | Minimal when live, but longer transfer | Disks are copied to the target node |
Migration Command
# Live migrate (VM keeps running)
qm migrate <vmid> <target-node> --online
# Offline migrate (VM must be stopped)
qm migrate <vmid> <target-node>
# Live migrate with local disks
qm migrate <vmid> <target-node> --online --with-local-disks
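Which flags apply depends on the VM's state: a running VM is moved live with `--online`, and disks on node-local storage need `--with-local-disks`. A minimal sketch of that decision; `migrate_cmd` is a hypothetical helper that only prints the command line it would use:

```shell
#!/usr/bin/env bash
# Sketch: build a qm migrate command line from the VM's state.
# migrate_cmd is a hypothetical helper; it prints, it does not migrate.

# Args: vmid, target node, status ("running"/"stopped"), local disks ("yes"/"no")
migrate_cmd() {
    local vmid=$1 target=$2 status=$3 local_disks=${4:-no}
    local cmd="qm migrate ${vmid} ${target}"
    # A running VM needs --online to be moved live
    if [ "$status" = "running" ]; then
        cmd+=" --online"
    fi
    # Disks on node-local storage must be copied along
    if [ "$local_disks" = "yes" ]; then
        cmd+=" --with-local-disks"
    fi
    echo "$cmd"
}

migrate_cmd 100 node2 running no
migrate_cmd 101 node2 stopped yes
```

On a real node the status argument could be taken from `qm status <vmid>`, which reports a line such as `status: running`.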
Cluster Network
Corosync Network
Cluster communication (UDP, default ports 5405-5412):
- Low-latency required
- Dedicated VLAN recommended
- Redundant links for HA
Configuration
# /etc/pve/corosync.conf (excerpt)
nodelist {
  node {
    name: node1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.10.1
  }
  node {
    name: node2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.10.2
  }
}
Troubleshooting
Quorum Lost
# Check status
pvecm status
# Force expected votes (DANGEROUS: only if the missing nodes are confirmed offline)
pvecm expected 1
# Then: bring the missing nodes back or remove them so normal quorum returns
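Before forcing expected votes, it helps to confirm that quorum is really lost. A small sketch that reads `pvecm status` output on stdin and checks the `Quorate:` line; `is_quorate` is a hypothetical helper, and the sample text is an illustrative excerpt with made-up values:

```shell
#!/usr/bin/env bash
# Sketch: decide from pvecm status output whether the cluster is quorate.
# is_quorate is a hypothetical helper; it reads the status text on stdin.
is_quorate() {
    grep -Eq '^Quorate:[[:space:]]+Yes'
}

# Illustrative excerpt in the shape pvecm status prints (values made up)
sample='Quorum information
------------------
Nodes:            3
Quorate:          Yes'

if printf '%s\n' "$sample" | is_quorate; then
    echo "cluster is quorate"
else
    echo "quorum lost"
fi
```

On a cluster node the same check would be `pvecm status | is_quorate`.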
Node Won't Join
- Check network connectivity
- Verify time sync
- Check Proxmox versions match
- Review the cluster logs (journalctl -u corosync -u pve-cluster)
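The version check can be scripted by comparing the major.minor part of each node's `pveversion` output. A minimal sketch, assuming the usual `pve-manager/<version>/<hash>` output shape; `pve_major_minor` and `same_release` are hypothetical helpers, and the version strings below are example values:

```shell
#!/usr/bin/env bash
# Sketch: compare the major.minor release of two nodes.
# Helper names are hypothetical; version strings are examples.

# Extract "major.minor" from a pveversion line such as
# pve-manager/8.1.4/ec5affc9e41f1d79 (running kernel: 6.5.11-8-pve)
pve_major_minor() {
    echo "$1" | sed -E 's|^pve-manager/([0-9]+\.[0-9]+).*|\1|'
}

# Succeeds when both lines report the same major.minor release
same_release() {
    [ "$(pve_major_minor "$1")" = "$(pve_major_minor "$2")" ]
}

existing='pve-manager/8.1.4/ec5affc9e41f1d79 (running kernel: 6.5.11-8-pve)'
joining='pve-manager/8.1.10/4b06efb5db453f29 (running kernel: 6.5.13-5-pve)'

if same_release "$existing" "$joining"; then
    echo "versions match"
else
    echo "version mismatch"
fi
```

The two input lines would come from running `pveversion` on the existing node and on the node that fails to join.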
Split Brain Recovery
- Identify authoritative node
- Stop cluster services on other nodes
- Set expected votes on the authoritative node (pvecm expected <votes>)
- Restart and rejoin nodes