Initial commit

Zhongwei Li
2025-11-30 08:47:38 +08:00
commit 18faa0569e
47 changed files with 7969 additions and 0 deletions

@@ -0,0 +1,181 @@
# Proxmox Clustering Reference
## Cluster Benefits
- Centralized web management
- Live VM migration between nodes
- High availability (HA) with automatic failover
- Shared configuration
## Cluster Requirements
| Requirement | Details |
|-------------|---------|
| Version | Same major/minor Proxmox version |
| Time | NTP synchronized |
| Network | Low-latency cluster network |
| Names | Unique node hostnames |
| Storage | Shared storage for HA |
## Cluster Commands
```bash
# Check cluster status
pvecm status
# List cluster nodes
pvecm nodes
# Add node to cluster (run on new node)
pvecm add <existing-node>
# Remove node (run on remaining node)
pvecm delnode <node-name>
# Set expected votes (temporary; use only to regain quorum after node loss)
pvecm expected <votes>
```
## Quorum
A cluster needs a majority of votes online to operate (one vote per node by default). Without quorum, `/etc/pve` becomes read-only.
| Nodes | Quorum | Can Lose |
|-------|--------|----------|
| 2 | 2 | 0 (use QDevice) |
| 3 | 2 | 1 |
| 4 | 3 | 1 |
| 5 | 3 | 2 |
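
The table rows follow from simple majority arithmetic. A minimal sketch, assuming one vote per node and no QDevice; the function names `quorum_for` and `can_lose` are made up for illustration:

```bash
# quorum_for N: votes needed for a majority in an N-node cluster
quorum_for() {
    echo $(( $1 / 2 + 1 ))
}

# can_lose N: nodes that may fail while the remainder still holds quorum
can_lose() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

# Reproduce the table for 2 to 5 nodes
for n in 2 3 4 5; do
    echo "nodes=$n quorum=$(quorum_for "$n") can_lose=$(can_lose "$n")"
done
```

Because of the integer division, a 4-node cluster tolerates no more failures than a 3-node one, which is why odd node counts (or a QDevice) are preferred.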
### QDevice
External quorum device for even-node clusters:
- Prevents split-brain in 2-node clusters
- Runs on separate machine
- Provides tie-breaking vote
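
A minimal sketch of wiring up a QDevice, assuming Debian-based hosts and the `corosync-qnetd` / `corosync-qdevice` packages. The `run` helper and the `DRY_RUN` variable are hypothetical conveniences for previewing commands, and `192.0.2.10` is a placeholder address:

```bash
# run CMD...: execute CMD, or only print it when DRY_RUN=1 (hypothetical helper)
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# On the external QDevice host (not a cluster member)
setup_qnetd_host() {
    run apt install -y corosync-qnetd
}

# On every cluster node
setup_qdevice_on_node() {
    run apt install -y corosync-qdevice
}

# Once, from any one cluster node; $1 is the QDevice host address
register_qdevice() {
    run pvecm qdevice setup "$1"
}
```

With `DRY_RUN=1` set, `register_qdevice 192.0.2.10` only prints the command it would run. After a real setup, `pvecm status` should list the extra vote.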
## High Availability (HA)
Automatic VM restart on healthy node if host fails.
### Requirements
- Shared storage (Ceph, NFS, iSCSI)
- Fencing enabled (watchdog)
- HA group configured (optional)
- VM added to HA
### HA States
| State | Description |
|-------|-------------|
| started | VM running, managed by HA |
| stopped | VM stopped intentionally |
| migrate | Migration in progress |
| relocate | Moving to different node |
| error | Problem detected |
### HA Configuration
1. Enable fencing (watchdog device)
2. Create HA group (optional)
3. Add VM to HA: Datacenter → HA → Add
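
The same steps can likely be done from the CLI with `ha-manager`. A hedged sketch, assuming a release that still uses HA groups; the group name `prod`, the node names, and VM ID `100` are placeholders, and `run` / `DRY_RUN` are hypothetical helpers for previewing commands:

```bash
# run CMD...: execute CMD, or only print it when DRY_RUN=1 (hypothetical helper)
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# ha_create_group NAME NODES: NODES is a comma-separated node list
ha_create_group() {
    run ha-manager groupadd "$1" --nodes "$2"
}

# ha_enable_vm VMID [GROUP]: put a VM under HA management
ha_enable_vm() {
    vmid=$1
    group=${2:-}
    if [ -n "$group" ]; then
        run ha-manager add "vm:$vmid" --state started --group "$group"
    else
        run ha-manager add "vm:$vmid" --state started
    fi
}
```

Typical use would be `ha_create_group prod node1,node2` followed by `ha_enable_vm 100 prod`, then `ha-manager status` to confirm the resource state.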
### Fencing
Prevents split-brain by forcing failed node to stop:
```bash
# Check the watchdog multiplexer used by the HA stack
systemctl status watchdog-mux.service
# Watchdog module config (softdog is the default)
cat /etc/default/pve-ha-manager
```
## Live Migration
Move a running VM between nodes with minimal downtime.
### Requirements
- Shared storage OR local-to-local migration
- Same CPU architecture
- Network connectivity
- Sufficient resources on target
### Migration Types
| Type | Downtime | Requirements |
|------|----------|--------------|
| Live | Minimal | Shared storage |
| Offline | Full | Any storage |
| Live with local disks | Minimal (longer transfer) | Copies disks to target |
### Migration Command
```bash
# Live migrate (VM keeps running)
qm migrate <vmid> <target-node> --online
# Offline migrate (VM must be stopped; this is the default mode)
qm migrate <vmid> <target-node>
# Live migrate a VM that has local disks
qm migrate <vmid> <target-node> --online --with-local-disks
```
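
`qm migrate` moves a running VM live only when `--online` is passed, so a wrapper can choose the mode from the VM state. A minimal sketch, assuming `qm status <vmid>` prints a line such as `status: running`; `is_running` and `migrate_vm` are hypothetical helper names:

```bash
# is_running STATUS_LINE: succeed if the `qm status` output reports a running VM
is_running() {
    case "$1" in
        *"status: running"*) return 0 ;;
        *)                   return 1 ;;
    esac
}

# migrate_vm VMID TARGET: live-migrate running VMs, offline-migrate stopped ones
migrate_vm() {
    if is_running "$(qm status "$1")"; then
        qm migrate "$1" "$2" --online
    else
        qm migrate "$1" "$2"
    fi
}
```

For example, `migrate_vm 100 node2` would pick the right mode without the operator checking the VM state first.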
## Cluster Network
### Corosync Network
Cluster communication (UDP ports 5405-5412 by default):
- Low-latency required
- Dedicated VLAN recommended
- Redundant links for HA
### Configuration
```
# /etc/pve/corosync.conf
nodelist {
  node {
    name: node1
    ring0_addr: 192.168.10.1
  }
  node {
    name: node2
    ring0_addr: 192.168.10.2
  }
}
```
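
A redundant link could be sketched by giving each node a second address and declaring a second interface in the `totem` section. This is a fragment under assumptions: the `10.10.10.x` addresses are hypothetical, other `totem` keys are left out, and `config_version` in `totem` has to be incremented whenever the file is edited:

```
# /etc/pve/corosync.conf (fragment, second link added)
nodelist {
  node {
    name: node1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.10.1
    ring1_addr: 10.10.10.1
  }
  node {
    name: node2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.10.2
    ring1_addr: 10.10.10.2
  }
}
totem {
  interface {
    linknumber: 0
  }
  interface {
    linknumber: 1
  }
}
```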
## Troubleshooting
### Quorum Lost
```bash
# Check status
pvecm status
# Force expected votes (DANGEROUS)
pvecm expected 1
# Then: recover remaining nodes
```
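
Before forcing expected votes, it helps to confirm what the cluster actually reports. A minimal sketch that parses `pvecm status` output from stdin, assuming the usual `Quorate:`, `Expected votes:`, `Total votes:` and `Quorum:` lines; the function names are made up for illustration:

```bash
# is_quorate: read `pvecm status` output on stdin; succeed only if quorate
is_quorate() {
    grep -Eq '^Quorate:[[:space:]]+Yes'
}

# votes_summary: print "expected=<n> total=<n> quorum=<n>" from the same output
votes_summary() {
    awk -F': *' '
        /^Expected votes:/ { e = $2 + 0 }
        /^Total votes:/    { t = $2 + 0 }
        /^Quorum:/         { q = $2 + 0 }
        END { printf "expected=%d total=%d quorum=%d\n", e, t, q }
    '
}
```

Usage on a node: `pvecm status | is_quorate || echo "quorum lost"` and `pvecm status | votes_summary`. If `total` is below `quorum`, the node is in the minority partition.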
### Node Won't Join
- Check network connectivity
- Verify time sync
- Check Proxmox versions match
- Review logs: `journalctl -u corosync -u pve-cluster`
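
The version check can be scripted. A minimal sketch, assuming `pveversion` prints a line beginning with `pve-manager/<major>.<minor>.<patch>/`; `pve_major_minor` and `same_release` are hypothetical helper names:

```bash
# pve_major_minor: read `pveversion` output on stdin, print "<major>.<minor>"
pve_major_minor() {
    sed -n 's|^pve-manager/\([0-9][0-9]*\)\.\([0-9][0-9]*\).*|\1.\2|p'
}

# same_release A B: succeed if two pveversion strings share major.minor
same_release() {
    a=$(printf '%s\n' "$1" | pve_major_minor)
    b=$(printf '%s\n' "$2" | pve_major_minor)
    # An unparsable version string never counts as a match
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

One way to use it, with `node2` as a placeholder host: `same_release "$(pveversion)" "$(ssh root@node2 pveversion)" || echo "version mismatch"`.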
### Split Brain Recovery
1. Identify authoritative node
2. Stop cluster services on other nodes
3. Set expected votes
4. Restart and rejoin nodes