Proxmox Troubleshooting Reference
Common Errors
| Error | Cause | Solution |
|---|---|---|
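| VM won't start | Stale lock, unavailable storage, insufficient resources | qm unlock <vmid>, check storage status, verify free RAM and disk |
| Migration failed | No shared storage, insufficient resources on target | Verify shared storage, check target node capacity |
| Cluster issues | Lost quorum, network problems, clock skew | pvecm status, check NTP sync and the corosync network |
| Storage unavailable | Mount failed, storage server unreachable | Check the mount and network access to the server |
| High load | Resource contention | Identify the bottleneck (CPU, RAM, I/O), rebalance VMs |
| Network issues | Bridge, VLAN or firewall misconfiguration | brctl show, check VLAN tags, review firewall rules |
| Backup failed | Insufficient space on target, VM state (e.g. locked) | Check free space and access to the backup storage |
| Template not found | Template not downloaded to the storage | pveam update, then pveam download <storage> <template> |
| API errors | Invalid token or missing permissions | Check the API token and the user/token permissions |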
Diagnostic Commands
Cluster Health
pvecm status # Quorum and node status
pvecm nodes # List cluster members
systemctl status pve-cluster # Cluster service
systemctl status corosync # Corosync service
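To use the quorum check in a script, the Quorate line of pvecm status can be tested directly. The sketch below assumes that line reads Quorate: Yes on a healthy cluster (the format corosync's votequorum prints); the function name is_quorate is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if "pvecm status" text on stdin reports quorum.
# Assumes a "Quorate: Yes" line as printed by corosync's votequorum.
is_quorate() {
  awk -F': *' '
    $1 == "Quorate" { found = 1; q = $2 }   # remember the value of the Quorate field
    END { exit !(found && q ~ /^Yes/) }     # exit 0 only when Quorate is Yes
  '
}

# Usage on a cluster node:
#   pvecm status | is_quorate || echo "WARNING: cluster is not quorate"
```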
Node Health
pveversion -v # Proxmox version info
uptime # Load and uptime
free -h # Memory usage
df -h # Disk space
top -bn1 | head -20 # Process overview
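For unattended checks, the disk-space part can be reduced to a threshold test. A minimal sketch, assuming df -P output (one line per filesystem, capacity in the fifth column, mount point in the sixth); df_over is a hypothetical name, and mount points containing spaces are not handled.

```shell
#!/bin/sh
# Hypothetical helper: list mount points whose usage is at or above a percentage.
# Reads "df -P" output on stdin (POSIX format: 5th column is the capacity such
# as "85%", 6th is the mount point).
df_over() {
  awk -v limit="$1" '
    NR == 1 { next }                      # skip the header row
    {
      pct = $5; sub(/%$/, "", pct)        # "85%" -> "85"
      if (pct + 0 >= limit + 0) print $6 " " $5
    }
  '
}

# Usage:
#   df -P | df_over 90
```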
VM Diagnostics
qm status <vmid> # VM state
qm config <vmid> # VM configuration
qm showcmd <vmid> # QEMU command line
qm unlock <vmid> # Clear locks
qm monitor <vmid> # QEMU monitor access
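Before running qm unlock, it helps to know which operation left the lock behind, since a lock still held by a running backup or migration should not be cleared. A lock appears as a lock: line in qm config output (pct config shows the same key for containers). lock_reason is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: print the lock reason (e.g. "backup", "migrate") from
# "qm config <vmid>" or "pct config <ctid>" text on stdin; prints nothing if unlocked.
lock_reason() {
  awk -F': *' '$1 == "lock" { print $2; exit }'
}

# Usage (VM 100 is a placeholder):
#   reason=$(qm config 100 | lock_reason)
#   [ -n "$reason" ] && echo "VM 100 is locked ($reason)"
```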
Container Diagnostics
pct status <ctid> # Container state
pct config <ctid> # Container configuration
pct enter <ctid> # Enter container shell
pct unlock <ctid> # Clear locks
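qm status and pct status print a single status: line such as status: running, which makes them easy to poll after a start or stop. A sketch, assuming that output format; wait_for_state and its argument order are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: run a status command up to <tries> times (1 s apart)
# until its "status: <state>" line matches the wanted state.
# Arguments: <wanted-state> <tries> <command> [args...]
wait_for_state() {
  want=$1; tries=$2; shift 2
  while [ "$tries" -gt 0 ]; do
    state=$("$@" | awk -F': *' '$1 == "status" { print $2 }')
    if [ "$state" = "$want" ]; then
      return 0
    fi
    tries=$((tries - 1))
    if [ "$tries" -gt 0 ]; then
      sleep 1
    fi
  done
  return 1
}

# Usage (container 101 is a placeholder):
#   pct start 101 && wait_for_state running 30 pct status 101
```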
Storage Diagnostics
pvesm status # Storage status
df -h # Disk space
mount | grep -E 'nfs|ceph' # Mounted storage
zpool status # ZFS pool status (if using ZFS)
ceph -s # Ceph status (if using Ceph)
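pvesm status prints a table whose third column is the storage status. A minimal sketch that lists every storage not reported as active, assuming the default column layout; inactive_storages is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: print storages that "pvesm status" does not report as active.
# Assumes a header row followed by "Name Type Status Total Used Available %" columns.
inactive_storages() {
  awk 'NR > 1 && $3 != "active" { print $1 " (" $3 ")" }'
}

# Usage:
#   pvesm status | inactive_storages
```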
Network Diagnostics
brctl show # Bridge configuration
ip link # Network interfaces
ip addr # IP addresses
ip route # Routing table
bridge vlan show # VLAN configuration
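To confirm that a bridge referenced in a VM's network configuration actually exists, ip -o prints one interface per line. The sketch assumes the usual "<index>: <name>: <flags> ..." layout of that output; list_bridges is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: list bridge names from "ip -o link show type bridge" on stdin.
# With -o each interface is one line: "<index>: <name>: <flags> ...".
list_bridges() {
  awk -F': ' '{ print $2 }'
}

# Usage: check that vmbr0 (a placeholder) exists
#   ip -o link show type bridge | list_bridges | grep -qx vmbr0 || echo "vmbr0 missing"
```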
Log Files
# Cluster logs
journalctl -u pve-cluster
journalctl -u corosync
# VM/Container logs
journalctl | grep -w <vmid>   # -w: whole-word match, so 100 does not also match 1001
tail -f /var/log/pve/tasks/*
# Firewall logs
journalctl -u pve-firewall
# Web interface logs
journalctl -u pveproxy
Troubleshooting Workflows
VM Won't Start
- Check for locks: qm config <vmid> | grep lock; clear a stale lock with qm unlock <vmid>
- Verify storage: pvesm status
- Check resources: free -h, df -h
- Review config: qm config <vmid>
- Check logs: journalctl | grep -w <vmid>
- Try a manual start and read the error it reports: qm start <vmid>
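The "Check resources" step can be made concrete by comparing the VM's configured memory with what the host has available. A sketch, assuming qm config prints a plain memory: <MiB> line and that free -m shows the available column as the seventh field of the Mem: row (procps-ng layout); both helper names are hypothetical, and ballooning or overcommit settings are ignored.

```shell
#!/bin/sh
# Hypothetical helper: configured RAM (MiB) from "qm config <vmid>" text on stdin.
# Prints nothing if the config has no memory line.
vm_memory_mib() {
  awk -F': *' '$1 == "memory" { print $2; exit }'
}

# Hypothetical helper: the "available" column (7th field of the Mem: row) from "free -m".
host_available_mib() {
  awk '$1 == "Mem:" { print $7; exit }'
}

# Usage (VM 100 is a placeholder):
#   need=$(qm config 100 | vm_memory_mib)
#   have=$(free -m | host_available_mib)
#   [ "${need:-0}" -le "$have" ] || echo "VM needs ${need} MiB, only ${have} MiB available"
```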
Migration Failure
- Verify shared storage: pvesm status
- Check target resources: pvesh get /nodes/<target>/status
- Verify network: ping <target-node>
- Check version match: run pveversion on both nodes
- Review the migration task log
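The version check can be scripted by extracting the pve-manager version from pveversion on each node. A sketch, assuming the first line of pveversion has the form pve-manager/<version>/<build> (running kernel: ...) and that root SSH between cluster nodes works, as it normally does in a Proxmox VE cluster; manager_version is a hypothetical name, and a mismatch is a lead to investigate rather than proof of the failure cause.

```shell
#!/bin/sh
# Hypothetical helper: extract the pve-manager version from "pveversion" output,
# e.g. "pve-manager/8.1.4/ec5affc9e41f1d79 (running kernel: 6.5.11-8-pve)" -> "8.1.4".
manager_version() {
  sed -n 's|^pve-manager/\([^/]*\)/.*|\1|p'
}

# Usage (<target-node> is a placeholder):
#   local_v=$(pveversion | manager_version)
#   target_v=$(ssh root@<target-node> pveversion | manager_version)
#   [ "$local_v" = "$target_v" ] || echo "version mismatch: $local_v vs $target_v"
```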
Cluster Quorum Lost
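- Check status: pvecm status
- Identify which nodes are online
- If the majority is lost, lower the expected votes on the surviving nodes: pvecm expected <n> (only when the missing nodes are really down, otherwise both sides may act independently)
- Recover the remaining nodes
- Rejoin lost nodes when they become available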
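Losing quorum is a matter of arithmetic: with corosync votequorum and one vote per node, quorum is a strict majority, floor(expected_votes / 2) + 1. A small sketch of that rule; it deliberately ignores a QDevice, the two_node option and non-default vote counts, and the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: votes needed for quorum under the default majority rule,
# floor(expected_votes / 2) + 1 (shell arithmetic truncates the division).
quorum_needed() {
  echo $(( $1 / 2 + 1 ))
}

# Hypothetical helper: how many one-vote nodes can fail before quorum is lost.
tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}
```

For example, quorum_needed 4 prints 3, so a four-node cluster survives only one node failure, the same as a three-node cluster.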
Storage Mount Failed
- Check network: ping <storage-server>
- Verify the mount: mount | grep <storage>
- Try mounting manually
- Check permissions on the storage server
- Review /var/log/syslog, or journalctl on nodes without rsyslog
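To confirm whether the storage is mounted at all, check the kernel's mount table. Proxmox VE mounts NFS and CIFS storages under /mnt/pve/<storage-id>; the sketch assumes that layout, and is_mounted is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given path is a mount point in
# /proc/mounts-style text on stdin (fields: device, mount point, type, options, ...).
is_mounted() {
  awk -v target="$1" '$2 == target { found = 1 } END { exit !found }'
}

# Usage ("nas" is a placeholder storage ID):
#   is_mounted /mnt/pve/nas < /proc/mounts || echo "nas storage is not mounted"
```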
High CPU/Memory Usage
- Identify the culprit: top, htop
- Check VM memory ballooning: run qm monitor <vmid>, then enter info balloon at the monitor prompt
- Review resource allocation across the cluster
- Consider migration or resource limits
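top lists QEMU processes by PID only, so mapping them back to VMs takes one more step. Proxmox VE starts each VM with -id <vmid> on the QEMU command line (visible in qm showcmd output); the sketch below relies on that and on ps reporting RSS in KiB. qemu_rss_by_vmid is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: map QEMU processes to VMIDs and report resident memory.
# Reads "ps -eo rss=,args=" output (RSS in KiB, then the full command line) on
# stdin and relies on the "-id <vmid>" argument passed to QEMU.
qemu_rss_by_vmid() {
  awk '{
    for (i = 2; i < NF; i++)
      if ($i == "-id") { printf "%s %d\n", $(i + 1), $1 / 1024; break }   # vmid, RSS in MiB
  }' | sort -k2,2nr                                                         # largest first
}

# Usage:
#   ps -eo rss=,args= | qemu_rss_by_vmid | head
```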
Recovery Procedures
Remove Failed Node
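# On a healthy node; the failed node must stay powered off and must not
# rejoin the cluster network with its old configuration
pvecm delnode <failed-node>
# Clean up node-specific configs (copy any guest configs you still need first)
rm -rf /etc/pve/nodes/<failed-node>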
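Because rm -rf under /etc/pve is irreversible and /etc/pve is replicated to every node, a guard that confirms the node is no longer a cluster member is cheap insurance. A sketch, assuming pvecm nodes prints a header followed by Nodeid Votes Name rows; node_in_cluster is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical guard: succeed if a node name still appears in "pvecm nodes" output.
# Member rows have a numeric Votes column (2nd field) and the node name in the 3rd;
# the header and separator lines do not.
node_in_cluster() {
  awk -v name="$1" '$2 ~ /^[0-9]+$/ && $3 == name { found = 1 } END { exit !found }'
}

# Usage ("pve3" is a placeholder node name):
#   if pvecm nodes | node_in_cluster pve3; then
#     echo "pve3 is still listed; run pvecm delnode first"
#   fi
```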
Force Stop Locked VM
# Remove lock
qm unlock <vmid>
# If still stuck, find and kill the QEMU process
# (qemu-server records its PID in /var/run/qemu-server/<vmid>.pid)
cat /var/run/qemu-server/<vmid>.pid
kill <pid>
# Force cleanup
qm stop <vmid> --skiplock
Recover from Corrupt Config
# Backup current config
cp /etc/pve/qemu-server/<vmid>.conf /root/<vmid>.conf.bak
# Edit config manually
nano /etc/pve/qemu-server/<vmid>.conf
# Or restore from backup (qmrestore only overwrites an existing VMID with its --force option)
qmrestore <backup> <vmid>
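Before putting a hand-edited config back in place, a quick syntax pass can catch obvious damage. A rough sketch, assuming the usual layout of these files: key: value lines, # comment lines (used for the description) and [name] headers that start snapshot sections. It flags malformed lines and keys repeated within a section but does not validate values, so a clean result is necessary, not sufficient. check_vm_conf is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical linter for a VM config file on stdin; exits 1 if problems are found.
# Assumes "key: value" lines, "#" comment lines and "[section]" snapshot headers.
check_vm_conf() {
  awk '
    /^[ \t]*$/ || /^#/ { next }                  # blank lines and description comments
    /^\[.+\]$/ { split("", seen); next }         # new snapshot section: reset seen keys
    /^[a-z][a-z0-9_-]*: / {
      key = substr($0, 1, index($0, ":") - 1)    # text before the first colon
      if (key in seen) { printf "line %d: duplicate key %s\n", NR, key; bad = 1 }
      seen[key] = 1
      next
    }
    { printf "line %d: malformed: %s\n", NR, $0; bad = 1 }
    END { exit bad ? 1 : 0 }
  '
}

# Usage:
#   check_vm_conf < /etc/pve/qemu-server/<vmid>.conf
```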
Health Check Script
#!/bin/bash
echo "=== Cluster Status ==="
pvecm status
echo -e "\n=== Node Resources ==="
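# Member rows have a numeric Votes column; header and separator lines do not
for node in $(pvecm nodes | awk '$2 ~ /^[0-9]+$/ {print $3}'); do
echo "--- $node ---"
# Print cpu and the nested memory block (memory is a map, not a single line)
pvesh get /nodes/$node/status --output-format yaml | awk '/^(cpu|memory):/ {p=1; print; next} /^[^ ]/ {p=0} p'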
done
echo -e "\n=== Storage Status ==="
pvesm status
echo -e "\n=== Running VMs (this node) ==="
qm list | grep running
echo -e "\n=== Running Containers (this node) ==="
pct list | grep running