# Proxmox Troubleshooting Reference

## Common Errors

| Error | Cause | Solution |
|-------|-------|----------|
| VM won't start | Lock, storage, resources | `qm unlock`, check storage, verify resources |
| Migration failed | No shared storage, resources | Verify shared storage, check target capacity |
| Cluster issues | Quorum, network, time | `pvecm status`, check NTP, network |
| Storage unavailable | Mount failed, network | Check mount, network access |
| High load | Resource contention | Identify bottleneck, rebalance VMs |
| Network issues | Bridge, VLAN, firewall | `brctl show`, check tags, firewall rules |
| Backup failed | Disk space, VM state | Check space, storage access |
| Template not found | Not downloaded | `pveam update`, then `pveam download <storage> <template>` |
| API errors | Auth, permissions | Check token, user permissions |

## Diagnostic Commands

### Cluster Health

```bash
pvecm status                    # Quorum and node status
pvecm nodes                     # List cluster members
systemctl status pve-cluster    # Cluster filesystem (pmxcfs) service
systemctl status corosync       # Corosync service
```

### Node Health

```bash
pveversion -v                   # Proxmox version info
uptime                          # Load and uptime
free -h                         # Memory usage
df -h                           # Disk space
top -bn1 | head -20             # Process overview
```

### VM Diagnostics

```bash
qm status <vmid>                # VM state
qm config <vmid>                # VM configuration
qm showcmd <vmid>               # QEMU command line
qm unlock <vmid>                # Clear locks
qm monitor <vmid>               # QEMU monitor access
```

### Container Diagnostics

```bash
pct status <vmid>               # Container state
pct config <vmid>               # Container configuration
pct enter <vmid>                # Enter container shell
pct unlock <vmid>               # Clear locks
```

### Storage Diagnostics

```bash
pvesm status                    # Storage status
df -h                           # Disk space
mount | grep -E 'nfs|ceph'      # Mounted storage
zpool status                    # ZFS pool status (if using ZFS)
ceph -s                         # Ceph status (if using Ceph)
```

### Network Diagnostics

```bash
brctl show                      # Bridge configuration
ip link                         # Network interfaces
ip addr                         # IP addresses
ip route                        # Routing table
bridge vlan show                # VLAN configuration
```

### Log Files

```bash
# Cluster logs
journalctl -u pve-cluster
journalctl -u corosync

# VM/Container logs
journalctl | grep <vmid>
tail -f /var/log/pve/tasks/*

# Firewall logs
journalctl -u pve-firewall

# Web interface logs
journalctl -u pveproxy
```

## Troubleshooting Workflows

### VM Won't Start

1. Check for locks (a `lock:` line in `qm config <vmid>`) and clear a stale one: `qm unlock <vmid>`
2. Verify storage: `pvesm status`
3. Check resources: `free -h`, `df -h`
4. Review config: `qm config <vmid>`
5. Check logs: `journalctl | grep <vmid>`
6. Try a manual start from the shell to see the full error: `qm start <vmid>` (the QEMU command line it uses is printed by `qm showcmd <vmid>`)

### Migration Failure

1. Verify shared storage: `pvesm status`
2. Check target resources: `pvesh get /nodes/<target>/status`
3. Verify network: `ping <target>`
4. Check version match: `pveversion` on both nodes
5. Review the migration task log

### Cluster Quorum Lost

1. Check status: `pvecm status`
2. Identify online nodes
3. If the majority is lost, lower the expected votes so the remaining nodes become quorate: `pvecm expected <votes>` (only when the missing nodes are confirmed offline)
4. Recover remaining nodes
5. Rejoin lost nodes when available

### Storage Mount Failed

1. Check network: `ping <storage-server>`
2. Verify mount: `mount | grep <storage>`
3. Try a manual mount
4. Check permissions on the storage server
5. Review `/var/log/syslog` (or `journalctl -b` if rsyslog is not installed)

### High CPU/Memory Usage

1. Identify the culprit: `top`, `htop`
2. Check VM balloon state: `qm monitor <vmid>`, then `info balloon`
3. Review resource allocation across the cluster
4. Consider migration or resource limits

## Recovery Procedures

### Remove Failed Node

```bash
# On a healthy node; the failed node must already be powered off
# and must not rejoin the network as-is
pvecm delnode <nodename>

# Clean up node-specific configs
rm -rf /etc/pve/nodes/<nodename>
```

### Force Stop Locked VM

```bash
# Remove lock
qm unlock <vmid>

# If still stuck, find and kill the QEMU process
ps aux | grep <vmid>
kill <pid>

# Force cleanup
qm stop <vmid> --skiplock
```

### Recover from Corrupt Config

```bash
# Backup current config
cp /etc/pve/qemu-server/<vmid>.conf /root/<vmid>.conf.bak

# Edit config manually
nano /etc/pve/qemu-server/<vmid>.conf

# Or restore from backup (add --force to overwrite an existing VM)
qmrestore <backup-file> <vmid>
```

## Health Check Script

```bash
#!/bin/bash
echo "=== Cluster Status ==="
pvecm status

echo -e "\n=== Node Resources ==="
# Keep only membership rows (first field is a numeric node ID); field 3 is the name
for node in $(pvecm nodes | awk '$1 ~ /^[0-9]+$/ {print $3}'); do
    echo "--- $node ---"
    # cpu is a 0-1 fraction; memory used/total are in bytes
    pvesh get "/nodes/$node/status" --output-format json |
        perl -MJSON::PP -0777 -ne '$s = decode_json($_);
            printf "cpu: %.1f%%  memory: %d/%d MiB\n",
                $s->{cpu} * 100, $s->{memory}{used} / 2**20, $s->{memory}{total} / 2**20'
done

echo -e "\n=== Storage Status ==="
pvesm status

echo -e "\n=== Running VMs (local node) ==="
qm list | grep running

echo -e "\n=== Running Containers (local node) ==="
pct list | grep running
```
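For step 3 of *VM Won't Start* and the disk checks under *Node Health*, a small filter can pick out nearly full filesystems instead of reading the whole `df` table. This is a sketch, assuming POSIX `df -P` output (column 5 is the capacity percentage, column 6 the mount point) and mount points without spaces; `full_mounts` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print mount points at or above a usage threshold.
# Expects `df -P` output on stdin: column 5 is "Capacity" (e.g. "95%"),
# column 6 is the mount point. Mount points containing spaces are not handled.
full_mounts() {
  local threshold="$1"
  awk -v t="$threshold" 'NR > 1 {
    pct = $5
    sub(/%$/, "", pct)          # strip the trailing percent sign
    if (pct + 0 >= t + 0) print $6
  }'
}
```

Usage: `df -P | full_mounts 90` lists every filesystem that is at least 90% full.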
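Before clearing a lock in *VM Won't Start*, it can help to see which lock is set, since a `backup` or `migrate` lock may belong to a task that is still running. A minimal sketch, assuming the lock appears as a `lock: <type>` line in `qm config` (or `pct config`) output; `config_lock` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the lock type from `qm config <vmid>` or
# `pct config <vmid>` output on stdin. Prints nothing if there is no lock line.
config_lock() {
  awk -F': ' '$1 == "lock" { print $2 }'
}
```

Usage: `qm config <vmid> | config_lock`. If it prints `backup`, check whether a backup job is still active before running `qm unlock`.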
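For step 4 of *Migration Failure*, comparing the `pve-manager` version string is quicker than reading two `pveversion` outputs side by side. A sketch, assuming `pveversion` prints a line of the form `pve-manager/<version>/<hash> (running kernel: ...)`; both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: extract the version from a line such as
# "pve-manager/8.1.4/0123abcd (running kernel: 6.5.11-8-pve)".
pve_manager_version() {
  sed -n 's|^pve-manager/\([^/]*\)/.*|\1|p'
}

# Hypothetical helper: succeed only if both pveversion strings carry
# the same, non-empty pve-manager version.
versions_match() {
  local a b
  a=$(printf '%s\n' "$1" | pve_manager_version)
  b=$(printf '%s\n' "$2" | pve_manager_version)
  [[ -n "$a" && "$a" == "$b" ]]
}
```

Usage, with `<target>` as a placeholder for the other node: `versions_match "$(pveversion)" "$(ssh root@<target> pveversion)" && echo "versions match"`.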
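The *Cluster Quorum Lost* workflow hinges on majority arithmetic: corosync's votequorum treats a partition as quorate when it holds at least floor(expected / 2) + 1 votes. For example, a 5-node cluster needs 3 votes, so with only 2 nodes online `pvecm expected` is required. The sketch below assumes the default of one vote per node and no QDevice or `two_node` special case; the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: votes needed for quorum (strict majority),
# i.e. floor(total / 2) + 1 using integer division.
quorum_votes() {
  local total="$1"
  echo $(( total / 2 + 1 ))
}

# Hypothetical helper: report whether the online votes form a majority.
has_quorum() {
  local total="$1" online="$2"
  if (( online >= $(quorum_votes "$total") )); then
    echo "quorate"
  else
    echo "not quorate"
  fi
}
```

Usage: `has_quorum 5 2` prints `not quorate`, which is the situation step 3 of the workflow addresses.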
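Under *Storage Diagnostics* and *Storage Mount Failed*, `pvesm status` lists every storage along with its state. A filter that shows only storages not reported as `active` can shorten the search. This assumes the column order `Name Type Status Total Used Available %`; `inactive_storages` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list storages whose Status column is not "active".
# Expects `pvesm status` output on stdin (header row first, Status in column 3).
inactive_storages() {
  awk 'NR > 1 && $3 != "active" { print $1 " (" $3 ")" }'
}
```

Usage: `pvesm status | inactive_storages`.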
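For *High CPU/Memory Usage*, the figure that usually matters is memory that is no longer available, not the `used` column alone, because page cache is reclaimable. A sketch that computes that share from `free -b`, assuming the procps-ng layout where the `Mem:` row has `total` in column 2 and `available` in column 7; `mem_used_pct` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the percentage of memory that is not available,
# computed as (total - available) / total from the "Mem:" row of `free -b`.
mem_used_pct() {
  awk '$1 == "Mem:" { printf "%.1f\n", ($2 - $7) * 100 / $2 }'
}
```

Usage: `free -b | mem_used_pct`.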
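In *Force Stop Locked VM*, `ps aux | grep <vmid>` can also match unrelated processes whose arguments contain the same digits. A more targeted sketch reads the QEMU pidfile instead. It assumes qemu-server keeps the PID in `/var/run/qemu-server/<vmid>.pid` (the directory is a parameter so it can be overridden); `qemu_pid` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the QEMU PID for a VM from its pidfile.
# Assumption: the pidfile lives at <dir>/<vmid>.pid, with dir defaulting
# to /var/run/qemu-server. Returns 1 if the pidfile is missing.
qemu_pid() {
  local vmid="$1" dir="${2:-/var/run/qemu-server}"
  local pidfile="$dir/$vmid.pid"
  if [[ -r "$pidfile" ]]; then
    cat "$pidfile"
  else
    echo "no pidfile for VM $vmid in $dir" >&2
    return 1
  fi
}
```

Usage: `pid=$(qemu_pid <vmid>) && kill "$pid"`, so `kill` only runs when a PID was found.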
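*Recover from Corrupt Config* copies the config to a fixed `.bak` name, which a second attempt would overwrite. A sketch that adds a timestamp instead; `backup_guest_conf` and its default destination `/root` are assumptions for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy a guest config to <destdir>/<name>.<timestamp>.bak
# and print the path of the copy. destdir defaults to /root.
backup_guest_conf() {
  local conf="$1" destdir="${2:-/root}"
  local dest
  dest="$destdir/$(basename "$conf").$(date +%Y%m%d-%H%M%S).bak"
  cp -- "$conf" "$dest" && printf '%s\n' "$dest"
}
```

Usage: `backup_guest_conf /etc/pve/qemu-server/<vmid>.conf` before editing the file.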
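Scripts that loop over cluster members, like the health check, need only the node names from `pvecm nodes`, whose output begins with a title, a separator line, and a column header. A sketch that keeps only rows starting with a numeric node ID, assuming the `Nodeid Votes Name` column layout; `list_cluster_nodes` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print node names from `pvecm nodes` output on stdin.
# Only rows whose first field is a numeric node ID are kept; field 3 is the
# name (a trailing "(local)" marker lands in field 4 and is ignored).
list_cluster_nodes() {
  awk '$1 ~ /^[0-9]+$/ { print $3 }'
}
```

Usage: `for node in $(pvecm nodes | list_cluster_nodes); do echo "$node"; done`.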
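`qm list | grep running` in the health check also matches a stopped VM whose name contains `running`. A sketch that tests the STATUS column instead, assuming the `qm list` layout `VMID NAME STATUS MEM(MB) BOOTDISK(GB) PID` (VM names cannot contain spaces, so the columns stay aligned); `running_vmids` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the VMIDs of running VMs from `qm list` output
# on stdin, matching on the STATUS column (field 3) rather than the whole line.
running_vmids() {
  awk 'NR > 1 && $3 == "running" { print $1 }'
}
```

Usage: `qm list | running_vmids`. The same idea would need a different field for `pct list`, whose status is in column 2.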