Initial commit

Zhongwei Li
2025-11-30 08:47:38 +08:00
commit 18faa0569e
47 changed files with 7969 additions and 0 deletions


@@ -0,0 +1,179 @@
# Proxmox Automation Tools
Integration patterns for managing Proxmox with Terraform and Ansible.
## Tool Selection Guide
| Task | Recommended Tool | Rationale |
|------|-----------------|-----------|
| VM/LXC provisioning | Terraform | Declarative state, idempotent, handles dependencies |
| Template creation | Packer | Repeatable builds, version-controlled |
| Post-boot configuration | Ansible | Agentless, procedural, good for drift correction |
| One-off VM operations | Ansible | Quick tasks, no state file needed |
| Dynamic inventory | Ansible | Query running VMs for configuration |
| Bulk VM creation | Terraform | count/for_each, parallel creation |
| Snapshot management | Either | Terraform for lifecycle, Ansible for ad-hoc |
| Cluster administration | CLI/API | Direct access for maintenance tasks |
## Terraform Integration
### Provider
```hcl
terraform {
  required_providers {
    proxmox = {
      source  = "telmate/proxmox"
      version = "~> 3.0"
    }
  }
}

provider "proxmox" {
  pm_api_url          = "https://proxmox.example.com:8006/api2/json"
  pm_api_token_id     = "terraform@pve!mytoken"
  pm_api_token_secret = var.pm_api_token_secret
}
```
### Common Patterns
```hcl
# Clone from template
resource "proxmox_vm_qemu" "vm" {
  name        = "myvm"
  target_node = "joseph"
  clone       = "tmpl-ubuntu-2404-standard"
  full_clone  = true
  cores       = 2
  memory      = 4096

  disks {
    scsi {
      scsi0 {
        disk {
          storage = "local-lvm"
          size    = "50G"
        }
      }
    }
  }
}
```
### Skill Reference
Load terraform skill for detailed patterns:
- `terraform/references/proxmox/gotchas.md` - Critical issues
- `terraform/references/proxmox/vm-qemu.md` - VM resource patterns
- `terraform/references/proxmox/authentication.md` - API setup
## Ansible Integration
### Collection
```bash
ansible-galaxy collection install community.general
```
### Common Patterns
```yaml
# Clone VM
- name: Clone from template
  community.general.proxmox_kvm:
    api_host: proxmox.example.com
    api_user: ansible@pve
    api_token_id: mytoken
    api_token_secret: "{{ proxmox_token_secret }}"
    node: joseph
    vmid: 300
    name: myvm
    clone: tmpl-ubuntu-2404-standard
    full: true
    timeout: 500

# Start VM
- name: Start VM
  community.general.proxmox_kvm:
    # ... auth ...
    vmid: 300
    state: started
```
### Skill Reference
Load ansible skill for detailed patterns:
- `ansible/references/proxmox/modules.md` - All Proxmox modules
- `ansible/references/proxmox/gotchas.md` - Common issues
- `ansible/references/proxmox/dynamic-inventory.md` - Auto-discovery
## Terraform vs Ansible Decision
### Use Terraform When
- Creating infrastructure from scratch
- Managing VM lifecycle (create, update, destroy)
- Need state tracking and drift detection
- Deploying multiple similar VMs (for_each)
- Complex dependencies between resources
- Team collaboration with state locking
### Use Ansible When
- Configuring VMs after creation
- Ad-hoc operations (start/stop specific VMs)
- Dynamic inventory needed for other playbooks
- Quick one-off tasks
- No state file management desired
- Integration with existing Ansible workflows
### Use Both When
- Terraform provisions VMs
- Ansible configures them post-boot
- Ansible uses Proxmox dynamic inventory to find Terraform-created VMs
## Hybrid Workflow Example
```
1. Packer builds VM template
   └── packer build ubuntu-2404.pkr.hcl

2. Terraform provisions VMs from template
   ├── terraform apply
   └── Outputs: VM IPs, hostnames

3. Ansible configures VMs
   ├── Uses Proxmox dynamic inventory, OR
   └── Uses Terraform output as inventory

4. Ongoing management
   ├── Terraform for infrastructure changes
   └── Ansible for configuration drift
```
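One hedged way to glue steps 2 and 3 together is to turn a Terraform output into a throwaway Ansible inventory. The sketch below assumes an output named `vm_ips` (a list of address strings) and a playbook called `site.yml`; both names are examples, not part of this repo.
```bash
# Sketch: hand Terraform-created VMs to Ansible without dynamic inventory.
# "vm_ips" and "site.yml" are assumed names.
terraform output -json vm_ips | jq -r '.[]' > inventory.ini
ansible-playbook -i inventory.ini site.yml
```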
## API Token Sharing
Both tools can share the same API token:
```bash
# Create shared token
pveum user add automation@pve
pveum aclmod / -user automation@pve -role PVEAdmin
pveum user token add automation@pve shared --privsep 0
```
Store in shared secrets management (1Password, Vault, etc.).
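For example, the secret can be fetched once and consumed by both tools from the environment; the 1Password item path and playbook name below are purely illustrative.
```bash
# Illustrative sketch: fetch the shared secret once, feed it to both tools.
# The op:// path and site.yml are made-up examples.
export PM_TOKEN_SECRET="$(op read 'op://Infra/proxmox-automation/credential')"
export TF_VAR_pm_api_token_secret="$PM_TOKEN_SECRET"   # consumed by var.pm_api_token_secret
ansible-playbook site.yml -e proxmox_token_secret="$PM_TOKEN_SECRET"
```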
## Common Gotchas
| Issue | Terraform | Ansible |
|-------|-----------|---------|
| VMID | Auto-assigns if not specified | Must specify manually |
| Cloud-init changes | Use replace_triggered_by | Limited support, use API |
| State tracking | Yes (tfstate) | No state file |
| Parallel operations | Yes (configurable) | Yes (forks) |
| Template name vs ID | Supports both | Supports both |
| Timeout handling | Provider config | Module parameter |


@@ -0,0 +1,162 @@
# Proxmox Backup Reference
## vzdump Overview
Built-in backup tool for VMs and containers.
```bash
# Basic backup
vzdump <vmid>
# With options
vzdump <vmid> --mode snapshot --storage backup-nfs --compress zstd
# Backup all VMs
vzdump --all --compress zstd
```
## Backup Modes
| Mode | Downtime | Method | Use Case |
|------|----------|--------|----------|
| stop | Full | Shutdown, backup, start | Consistent, any storage |
| suspend | Brief | Pause, backup, resume | Running state preserved |
| snapshot | None | LVM/ZFS/Ceph snapshot | Production, requires snapshot storage |
### Mode Selection
```bash
# Stop mode (most consistent)
vzdump <vmid> --mode stop
# Suspend mode (preserves RAM state)
vzdump <vmid> --mode suspend
# Snapshot mode (live, requires compatible storage)
vzdump <vmid> --mode snapshot
```
## Backup Formats
| Format | Type | Notes |
|--------|------|-------|
| VMA | VMs | Native Proxmox format |
| tar | Containers | Standard tar archive |
## Compression Options
| Type | Speed | Ratio | CPU |
|------|-------|-------|-----|
| none | Fastest | 1:1 | Low |
| lzo | Fast | Good | Low |
| gzip | Moderate | Better | Medium |
| zstd | Fast | Best | Medium |
Recommendation: `zstd` for best balance.
```bash
vzdump <vmid> --compress zstd
```
## Storage Configuration
```bash
# Backup to specific storage
vzdump <vmid> --storage backup-nfs
# Check available backup storage
pvesm status | grep backup
```
## Scheduled Backups
Configure in Datacenter → Backup:
- Schedule (cron format)
- Selection (all, pool, specific VMs)
- Storage destination
- Mode and compression
- Retention policy
### Retention Policy
```
keep-last: 3 # Keep last N backups
keep-daily: 7 # Keep daily for N days
keep-weekly: 4 # Keep weekly for N weeks
keep-monthly: 6 # Keep monthly for N months
```
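The same retention keys can also be applied to ad-hoc runs via vzdump's `--prune-backups` property string (available on current PVE releases):
```bash
# Ad-hoc backup that prunes old backups according to the policy above
vzdump <vmid> --storage backup-nfs --compress zstd \
  --prune-backups keep-last=3,keep-daily=7,keep-weekly=4,keep-monthly=6
```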
## Restore Operations
### Full Restore
```bash
# Restore VM
qmrestore <backup-file> <vmid>
# Restore to different VMID
qmrestore <backup-file> <new-vmid>
# Restore container
pct restore <ctid> <backup-file>
```
### Restore Options
```bash
# Restore to different storage
qmrestore <backup> <vmid> --storage local-lvm
# Force overwrite existing VM
qmrestore <backup> <vmid> --force
```
### File-Level Restore
```bash
# Mount backup for file extraction
# (Use web UI: Backup → Restore → File Restore)
```
## Proxmox Backup Server (PBS)
Dedicated backup server with deduplication.
### Benefits
- Deduplication across backups
- Encryption at rest
- Verification and integrity checks
- Efficient incremental backups
- Remote backup sync
### Integration
Add PBS storage:
```bash
pvesm add pbs <storage-id> \
--server <pbs-server> \
--datastore <datastore> \
--username <user>@pbs \
--fingerprint <fingerprint>
```
## Backup Best Practices
- Store backups on separate storage from VMs
- Use snapshot mode for production VMs
- Test restores regularly
- Offsite backup copy for disaster recovery
- Monitor backup job completion
- Set appropriate retention policy
## Troubleshooting
| Issue | Check |
|-------|-------|
| Backup fails | Storage space, VM state, permissions |
| Slow backup | Mode (snapshot faster), compression, network |
| Restore fails | Storage compatibility, VMID conflicts |
| Snapshot fails | Storage doesn't support snapshots |


@@ -0,0 +1,178 @@
# Proxmox CLI Tools Reference
## qm - VM Management
```bash
# List and status
qm list # List all VMs
qm status <vmid> # VM status
qm config <vmid> # Show VM config
# Power operations
qm start <vmid> # Start VM
qm stop <vmid> # Force stop
qm shutdown <vmid> # ACPI shutdown
qm reboot <vmid> # ACPI reboot
qm reset <vmid> # Hard reset
qm suspend <vmid> # Suspend to RAM
qm resume <vmid> # Resume from suspend
# Configuration
qm set <vmid> --memory 4096 # Set memory
qm set <vmid> --cores 4 # Set CPU cores
qm set <vmid> --name newname # Rename VM
# Disk operations
qm resize <vmid> scsi0 +10G # Extend disk
qm move-disk <vmid> scsi0 <storage> # Move disk
# Snapshots
qm snapshot <vmid> <snapname> # Create snapshot
qm listsnapshot <vmid> # List snapshots
qm rollback <vmid> <snapname> # Rollback
qm delsnapshot <vmid> <snapname> # Delete snapshot
# Templates and clones
qm template <vmid> # Convert to template
qm clone <vmid> <newid> # Clone VM
# Migration
qm migrate <vmid> <target-node> # Live migrate
# Troubleshooting
qm unlock <vmid> # Remove lock
qm showcmd <vmid> # Show QEMU command
qm monitor <vmid> # QEMU monitor
qm guest cmd <vmid> <command> # Guest agent command
```
## pct - Container Management
```bash
# List and status
pct list # List all containers
pct status <ctid> # Container status
pct config <ctid> # Show config
# Power operations
pct start <ctid> # Start container
pct stop <ctid> # Stop container
pct shutdown <ctid> # Graceful shutdown
pct reboot <ctid> # Reboot
# Access
pct enter <ctid> # Enter shell
pct exec <ctid> -- <command> # Run command
pct console <ctid> # Attach console
# Configuration
pct set <ctid> --memory 2048 # Set memory
pct set <ctid> --cores 2 # Set CPU cores
pct set <ctid> --hostname name # Set hostname
# Disk operations
pct resize <ctid> rootfs +5G # Extend rootfs
pct move-volume <ctid> <vol> <storage> # Move volume
# Snapshots
pct snapshot <ctid> <snapname> # Create snapshot
pct listsnapshot <ctid> # List snapshots
pct rollback <ctid> <snapname> # Rollback
# Templates
pct template <ctid> # Convert to template
pct clone <ctid> <newid> # Clone container
# Migration
pct migrate <ctid> <target-node> # Migrate container
# Troubleshooting
pct unlock <ctid> # Remove lock
pct push <ctid> <src> <dst> # Copy file to container
pct pull <ctid> <src> <dst> # Copy file from container
```
## pvecm - Cluster Management
```bash
# Status
pvecm status # Cluster status
pvecm nodes # List nodes
pvecm qdevice # QDevice status
# Node operations
pvecm add <node> # Join cluster
pvecm delnode <node> # Remove node
pvecm updatecerts # Update SSL certs
# Recovery
pvecm expected <votes> # Set expected votes
```
## pvesh - API Shell
```bash
# GET requests
pvesh get /nodes # List nodes
pvesh get /nodes/<node>/status # Node status
pvesh get /nodes/<node>/qemu # List VMs on node
pvesh get /nodes/<node>/qemu/<vmid>/status/current # VM status
pvesh get /storage # List storage
pvesh get /cluster/resources # All cluster resources
# POST/PUT requests
pvesh create /nodes/<node>/qemu -vmid <id> ... # Create VM
pvesh set /nodes/<node>/qemu/<vmid>/config ... # Modify VM
# DELETE requests
pvesh delete /nodes/<node>/qemu/<vmid> # Delete VM
```
## vzdump - Backup
```bash
# Basic backup
vzdump <vmid> # Backup VM
vzdump <ctid> # Backup container
# Options
vzdump <vmid> --mode snapshot # Snapshot mode
vzdump <vmid> --compress zstd # With compression
vzdump <vmid> --storage backup # To specific storage
vzdump <vmid> --mailto admin@example.com # Email notification
# Backup all
vzdump --all # All VMs and containers
vzdump --pool <pool> # All in pool
```
## qmrestore / pct restore
```bash
# Restore VM
qmrestore <backup.vma> <vmid>
qmrestore <backup.vma> <vmid> --storage local-lvm
# Restore container
pct restore <ctid> <backup.tar>
pct restore <ctid> <backup.tar> --storage local-lvm
```
## Useful Combinations
```bash
# Check resources on all nodes
for node in joseph maxwell everette; do
echo "=== $node ==="
pvesh get /nodes/$node/status --output-format json | jq '{cpu: .cpu, memory: .memory}'
done
# Stop all VMs on a node
qm list | awk 'NR>1 {print $1}' | xargs -I {} qm stop {}
# List VMs with their IPs (requires guest agent)
for vmid in $(qm list | awk 'NR>1 {print $1}'); do
echo -n "$vmid: "
qm guest cmd $vmid network-get-interfaces 2>/dev/null | jq -r '.[].["ip-addresses"][]?.["ip-address"]' | head -1
done
```


@@ -0,0 +1,181 @@
# Proxmox Clustering Reference
## Cluster Benefits
- Centralized web management
- Live VM migration between nodes
- High availability (HA) with automatic failover
- Shared configuration
## Cluster Requirements
| Requirement | Details |
|-------------|---------|
| Version | Same major/minor Proxmox version |
| Time | NTP synchronized |
| Network | Low-latency cluster network |
| Names | Unique node hostnames |
| Storage | Shared storage for HA |
## Cluster Commands
```bash
# Check cluster status
pvecm status
# List cluster nodes
pvecm nodes
# Add node to cluster (run on new node)
pvecm add <existing-node>
# Remove node (run on remaining node)
pvecm delnode <node-name>
# Expected votes (split-brain recovery)
pvecm expected <votes>
```
## Quorum
Cluster requires majority of nodes online to operate.
| Nodes | Quorum | Can Lose |
|-------|--------|----------|
| 2 | 2 | 0 (use QDevice) |
| 3 | 2 | 1 |
| 4 | 3 | 1 |
| 5 | 3 | 2 |
### QDevice
External quorum device for even-node clusters:
- Prevents split-brain in 2-node clusters
- Runs on separate machine
- Provides tie-breaking vote
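A rough setup sketch (the address 192.0.2.10 stands in for your QDevice host):
```bash
# On the external tie-breaker host (not a cluster member):
apt install corosync-qnetd
# On every cluster node:
apt install corosync-qdevice
# Register the QDevice from any one cluster node:
pvecm qdevice setup 192.0.2.10
pvecm status   # expected votes should now include the QDevice
```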
## High Availability (HA)
Automatic VM restart on healthy node if host fails.
### Requirements
- Shared storage (Ceph, NFS, iSCSI)
- Fencing enabled (watchdog)
- HA group configured
- VM added to HA
### HA States
| State | Description |
|-------|-------------|
| started | VM running, managed by HA |
| stopped | VM stopped intentionally |
| migrate | Migration in progress |
| relocate | Moving to different node |
| error | Problem detected |
### HA Configuration
1. Enable fencing (watchdog device)
2. Create HA group (optional)
3. Add VM to HA: Datacenter → HA → Add
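The same can be done from the shell with `ha-manager`; the group name, node names, and VMID below are examples:
```bash
# Optional HA group restricted to two nodes, then add VM 100 to HA management
ha-manager groupadd prod --nodes "node1,node2"
ha-manager add vm:100 --state started --group prod
ha-manager status
```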
### Fencing
Prevents split-brain by forcing failed node to stop:
```bash
# Check watchdog status
cat /proc/sys/kernel/watchdog
# Watchdog config
/etc/pve/ha/fence.cfg
```
## Live Migration
Move running VM between nodes without downtime.
### Requirements
- Shared storage OR local-to-local migration
- Same CPU architecture
- Network connectivity
- Sufficient resources on target
### Migration Types
| Type | Downtime | Requirements |
|------|----------|--------------|
| Live | Minimal | Shared storage |
| Offline | Full | Any storage |
| Local storage | Moderate | Copies disk |
### Migration Command
```bash
# Live migrate
qm migrate <vmid> <target-node>
# Offline migrate
qm migrate <vmid> <target-node> --offline
# With local disk
qm migrate <vmid> <target-node> --with-local-disks
```
## Cluster Network
### Corosync Network
Cluster communication (default port 5405):
- Low-latency required
- Dedicated VLAN recommended
- Redundant links for HA
### Configuration
```
# /etc/pve/corosync.conf
nodelist {
  node {
    name: node1
    ring0_addr: 192.168.10.1
  }
  node {
    name: node2
    ring0_addr: 192.168.10.2
  }
}
```
## Troubleshooting
### Quorum Lost
```bash
# Check status
pvecm status
# Force expected votes (DANGEROUS)
pvecm expected 1
# Then: recover remaining nodes
```
### Node Won't Join
- Check network connectivity
- Verify time sync
- Check Proxmox versions match
- Review /var/log/pve-cluster/
### Split Brain Recovery
1. Identify authoritative node
2. Stop cluster services on other nodes
3. Set expected votes
4. Restart and rejoin nodes
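A rough command sketch of that sequence (adapt carefully, and only run it on nodes you are deliberately demoting):
```bash
# Step 2: on each non-authoritative node, stop cluster services
systemctl stop pve-cluster corosync
# Step 3: on the authoritative node, accept the reduced vote count
pvecm expected 1
# Step 4: bring the demoted nodes back up and let them rejoin
systemctl start corosync pve-cluster
```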


@@ -0,0 +1,202 @@
# Docker Workloads on Proxmox
Best practices for hosting Docker containers on Proxmox VE.
## Hosting Options
| Option | Isolation | Overhead | Complexity | Recommendation |
|--------|-----------|----------|------------|----------------|
| VM + Docker | Full | Higher | Low | **Recommended** |
| LXC + Docker | Shared kernel | Lower | High | Avoid |
| Bare metal Docker | None | Lowest | N/A | Not on Proxmox |
## VM for Docker (Recommended)
### Template Selection
Use Docker-ready templates (102+):
| Template | Docker Pre-installed |
|----------|---------------------|
| 102 (docker) | Yes |
| 103 (github-runner) | Yes |
| 104 (pihole) | Yes |
### VM Sizing
| Workload | CPU | RAM | Disk |
|----------|-----|-----|------|
| Light (1-3 containers) | 2 | 4 GB | 50 GB |
| Medium (4-10 containers) | 4 | 8 GB | 100 GB |
| Heavy (10+ containers) | 8+ | 16+ GB | 200+ GB |
### Storage Backend
| Proxmox Storage | Docker Suitability | Notes |
|-----------------|-------------------|-------|
| local-lvm | Good | Default, fast |
| ZFS | Best | Snapshots, compression |
| Ceph | Good | Distributed, HA |
| NFS | Moderate | Shared access, slower |
### Network Configuration
```
Proxmox Node
├── vmbr0 (bridge) → VM eth0 → Docker bridge network
└── vmbr12 (high-speed) → VM eth1 → Docker macvlan (optional)
```
## Docker in LXC (Not Recommended)
If you must run Docker in LXC:
### Requirements
1. **Privileged container** or nesting enabled
2. **AppArmor** profile unconfined
3. **Keyctl** feature enabled
### LXC Options
```bash
# Proxmox GUI: Options → Features
nesting: 1
keyctl: 1
# Or in /etc/pve/lxc/<vmid>.conf
features: keyctl=1,nesting=1
lxc.apparmor.profile: unconfined
```
### Known Issues
- Some Docker storage drivers don't work
- Overlay filesystem may have issues
- Reduced security isolation
- Complex debugging (two container layers)
## Resource Allocation
### CPU
```bash
# VM config - dedicate cores to Docker host
cores: 4
cpu: host # Pass through CPU features
```
### Memory
```bash
# VM config - allow some overcommit for containers
memory: 8192
balloon: 4096 # Minimum memory
```
### Disk I/O
For I/O intensive containers (databases):
```bash
# VM disk options
cache: none # Direct I/O for consistency
iothread: 1 # Dedicated I/O thread
ssd: 1 # If on SSD storage
```
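These flags can be applied to an existing VM by re-specifying the volume with `qm set`; the VMID and volume name below are examples:
```bash
# Attach cache/iothread/ssd/discard flags to an existing disk
qm set 101 --scsi0 local-lvm:vm-101-disk-0,cache=none,iothread=1,ssd=1,discard=on
```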
## GPU Passthrough for Containers
For transcoding (Plex) or ML workloads:
### 1. Proxmox: Pass GPU to VM
```bash
# /etc/pve/qemu-server/<vmid>.conf
hostpci0: 0000:01:00.0,pcie=1
```
### 2. VM: Install NVIDIA Container Toolkit
```bash
# In VM
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update && sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```
### 3. Docker Compose
```yaml
services:
plex:
image: linuxserver/plex
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: 1
capabilities: [gpu]
```
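Before pointing Plex at the GPU, a quick smoke test from inside the VM confirms the toolkit is wired up (the image tag is just an example):
```bash
# Should print the nvidia-smi GPU table if passthrough and the toolkit both work
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```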
## Backup Strategy
### VM-level (Recommended)
Proxmox vzdump backs up entire Docker host including all containers:
```bash
vzdump <vmid> --mode snapshot --storage backup --compress zstd
```
### Application-level
For consistent database backups, stop or flush before VM backup:
```bash
# Pre-backup hook
docker exec postgres pg_dump -U user db > /backup/db.sql
```
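vzdump can run such commands automatically through its hook-script mechanism. A minimal sketch, registered with `vzdump <vmid> --script /usr/local/bin/pg-hook.sh` (the path, container name, and database are examples):
```bash
#!/bin/bash
# Minimal vzdump hook sketch: dump Postgres right before the backup starts.
# The phase names come from vzdump; the docker/pg_dump details are examples.
phase="$1"
if [ "$phase" = "backup-start" ]; then
    docker exec postgres pg_dump -U user db > /backup/db.sql
fi
```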
## Monitoring
### From Proxmox
- VM CPU, memory, network, disk via Proxmox UI
- No visibility into individual containers
### From Docker Host
```bash
# Resource usage per container
docker stats
# System-wide
docker system df
```
### Recommended Stack
```yaml
# On Docker host
services:
prometheus:
image: prom/prometheus
cadvisor:
image: gcr.io/cadvisor/cadvisor
grafana:
image: grafana/grafana
```
## Skill References
For Docker-specific patterns:
- `docker/references/compose.md` - Compose file structure
- `docker/references/networking.md` - Network modes
- `docker/references/volumes.md` - Data persistence
- `docker/references/proxmox/hosting.md` - Detailed hosting guide


@@ -0,0 +1,153 @@
# Proxmox Networking Reference
## Linux Bridges
Default networking method for Proxmox VMs and containers.
### Bridge Configuration
```
# /etc/network/interfaces example
auto vmbr0
iface vmbr0 inet static
address 192.168.1.10/24
gateway 192.168.1.1
bridge-ports eno1
bridge-stp off
bridge-fd 0
bridge-vlan-aware yes
```
### VLAN-Aware Bridge
Enable VLAN tagging at VM level instead of separate bridges:
- Set `bridge-vlan-aware yes` on bridge
- Configure VLAN tag in VM network config
- Simpler management, fewer bridges needed
### Separate Bridges (Alternative)
One bridge per VLAN:
- vmbr0: Untagged/native VLAN
- vmbr1: VLAN 10
- vmbr5: VLAN 5
More bridges but explicit network separation.
## VLAN Configuration
### At VM Level (VLAN-aware bridge)
```
net0: virtio=XX:XX:XX:XX:XX:XX,bridge=vmbr0,tag=20
```
### At Bridge Level (Separate bridges)
```
net0: virtio=XX:XX:XX:XX:XX:XX,bridge=vmbr20
```
## Firewall
Three levels of firewall rules:
| Level | Scope | Use Case |
|-------|-------|----------|
| Datacenter | Cluster-wide | Default policies |
| Node | Per-node | Node-specific rules |
| VM/Container | Per-VM | Application-specific |
### Default Policy
- Input: DROP (only allow explicit rules)
- Output: ACCEPT
- Enable firewall per VM in Options
### Common Rules
```
# Allow SSH
IN ACCEPT -p tcp --dport 22
# Allow HTTP/HTTPS
IN ACCEPT -p tcp --dport 80
IN ACCEPT -p tcp --dport 443
# Allow ICMP (ping)
IN ACCEPT -p icmp
```
## SDN (Software Defined Networking)
Advanced networking for complex multi-tenant setups.
### Zone Types
| Type | Use Case |
|------|----------|
| Simple | Basic L2 network |
| VLAN | VLAN-based isolation |
| VXLAN | Overlay networking |
| EVPN | BGP-based routing |
### When to Use SDN
- Multi-tenant environments
- Complex routing requirements
- Cross-node L2 networks
- VXLAN overlay needs
For homelab: Standard bridges usually sufficient.
## Network Performance
### Jumbo Frames
Enable on storage network for better throughput:
```
# Set MTU 9000 on bridge
auto vmbr40
iface vmbr40 inet static
mtu 9000
...
```
Requires: All devices in path support jumbo frames.
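A quick end-to-end check is a maximum-size, don't-fragment ping across the path (8972 bytes of payload = 9000 MTU minus 28 bytes of IP/ICMP headers; the address is an example storage host):
```bash
# Must succeed without fragmentation if every hop really supports MTU 9000
ping -M do -s 8972 -c 3 192.168.40.20
```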
### VirtIO Multiqueue
Enable parallel network processing for high-throughput VMs:
```
net0: virtio=XX:XX:XX:XX:XX:XX,bridge=vmbr0,queues=4
```
## Troubleshooting
### Check Bridge Status
```bash
brctl show # List bridges and attached interfaces
ip link show vmbr0 # Bridge interface details
bridge vlan show # VLAN configuration
```
### Check VM Network
```bash
qm config <vmid> | grep net # VM network config
ip addr # From inside VM
```
### Common Issues
| Problem | Check |
|---------|-------|
| No connectivity | Bridge exists, interface attached |
| Wrong VLAN | Tag matches switch config |
| Slow network | MTU mismatch, driver type |
| Firewall blocking | Rules, policy, enabled status |


@@ -0,0 +1,150 @@
# Proxmox Storage Reference
## Storage Types
### Local Storage
| Type | Features | Use Case |
|------|----------|----------|
| Directory | Simple, any filesystem | Basic storage |
| LVM | Block device, raw performance | Performance |
| LVM-thin | Thin provisioning, snapshots | Efficient space |
| ZFS | Compression, snapshots, high perf | Production |
Limitations: No live migration, single node only.
### Shared Storage
| Type | Features | Use Case |
|------|----------|----------|
| NFS | File-based, simple | Shared access |
| Ceph RBD | Distributed block, HA | Production HA |
| iSCSI | Network block | SAN integration |
| GlusterFS | Distributed file | File sharing |
Benefits: Live migration, HA, shared access.
## Content Types
Configure what each storage can hold:
| Content | Description | File Types |
|---------|-------------|------------|
| images | VM disk images | .raw, .qcow2 |
| iso | ISO images for install | .iso |
| vztmpl | Container templates | .tar.gz |
| backup | Backup files | .vma, .tar |
| rootdir | Container root FS | directories |
| snippets | Cloud-init, hooks | .yaml, scripts |
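Content types are assigned per storage with `pvesm set`; the storage ID below is an example:
```bash
# Allow this storage to hold ISOs, container templates, backups and snippets
pvesm set local --content iso,vztmpl,backup,snippets
pvesm status --content backup   # verify which storages accept backups
```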
## Storage Configuration
### Add NFS Storage
```bash
pvesm add nfs <storage-id> \
--server <nfs-server> \
--export <export-path> \
--content images,iso,backup
```
### Add Ceph RBD
```bash
pvesm add rbd <storage-id> \
--monhost <mon1>,<mon2>,<mon3> \
--pool <pool-name> \
--content images,rootdir
```
### Check Storage Status
```bash
pvesm status # All storage status
pvesh get /storage # API query
df -h # Disk space
```
## Disk Formats
| Format | Features | Performance |
|--------|----------|-------------|
| raw | No overhead, full allocation | Fastest |
| qcow2 | Snapshots, thin provisioning | Moderate |
Recommendation: Use `raw` for production, `qcow2` for dev/snapshots.
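An existing qcow2 disk can typically be converted while moving it, by giving `qm move-disk` an explicit target format (VMID, disk, and storage are examples; the target storage must support raw):
```bash
# Move scsi0 to local-lvm, converting to raw, and drop the old copy on success
qm move-disk 101 scsi0 local-lvm --format raw --delete 1
```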
## Disk Cache Modes
| Mode | Safety | Performance | Use Case |
|------|--------|-------------|----------|
| none | Safe | Good | Default, recommended |
| writeback | Unsafe | Best | Non-critical, battery backup |
| writethrough | Safe | Moderate | Compatibility |
| directsync | Safest | Slow | Critical data |
## Storage Performance
### Enable Discard (TRIM)
For SSD thin provisioning:
```
scsi0: local-lvm:vm-100-disk-0,discard=on
```
### I/O Thread
Dedicated I/O thread per disk:
```
scsi0: local-lvm:vm-100-disk-0,iothread=1
```
### I/O Limits
Throttle disk bandwidth:
```
# In VM config
bwlimit: <KiB/s>
iops_rd: <iops>
iops_wr: <iops>
```
## Cloud-Init Storage
Cloud-init configs stored in `snippets` content type:
```bash
# Upload cloud-init files
scp user-data.yaml root@proxmox:/var/lib/vz/snippets/
# Or to named storage
scp user-data.yaml root@proxmox:/mnt/pve/<storage>/snippets/
```
Reference in VM:
```
cicustom: user=<storage>:snippets/user-data.yaml
```
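The same reference can be set from the CLI:
```bash
# Point the VM's cloud-init user data at the uploaded snippet
qm set <vmid> --cicustom "user=<storage>:snippets/user-data.yaml"
```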
## Backup Storage
### Recommended Configuration
- Separate storage for backups
- NFS or dedicated backup server
- Sufficient space for retention policy
### Backup Retention
Configure in Datacenter → Backup:
```
keep-last: 3
keep-daily: 7
keep-weekly: 4
keep-monthly: 6
```


@@ -0,0 +1,197 @@
# Proxmox Troubleshooting Reference
## Common Errors
| Error | Cause | Solution |
|-------|-------|----------|
| VM won't start | Lock, storage, resources | `qm unlock`, check storage, verify resources |
| Migration failed | No shared storage, resources | Verify shared storage, check target capacity |
| Cluster issues | Quorum, network, time | `pvecm status`, check NTP, network |
| Storage unavailable | Mount failed, network | Check mount, network access |
| High load | Resource contention | Identify bottleneck, rebalance VMs |
| Network issues | Bridge, VLAN, firewall | `brctl show`, check tags, firewall rules |
| Backup failed | Disk space, VM state | Check space, storage access |
| Template not found | Not downloaded | Download from Proxmox repo |
| API errors | Auth, permissions | Check token, user permissions |
## Diagnostic Commands
### Cluster Health
```bash
pvecm status # Quorum and node status
pvecm nodes # List cluster members
systemctl status pve-cluster # Cluster service
systemctl status corosync # Corosync service
```
### Node Health
```bash
pveversion -v # Proxmox version info
uptime # Load and uptime
free -h # Memory usage
df -h # Disk space
top -bn1 | head -20 # Process overview
```
### VM Diagnostics
```bash
qm status <vmid> # VM state
qm config <vmid> # VM configuration
qm showcmd <vmid> # QEMU command line
qm unlock <vmid> # Clear locks
qm monitor <vmid> # QEMU monitor access
```
### Container Diagnostics
```bash
pct status <ctid> # Container state
pct config <ctid> # Container configuration
pct enter <ctid> # Enter container shell
pct unlock <ctid> # Clear locks
```
### Storage Diagnostics
```bash
pvesm status # Storage status
df -h # Disk space
mount | grep -E 'nfs|ceph' # Mounted storage
zpool status # ZFS pool status (if using ZFS)
ceph -s # Ceph status (if using Ceph)
```
### Network Diagnostics
```bash
brctl show # Bridge configuration
ip link # Network interfaces
ip addr # IP addresses
ip route # Routing table
bridge vlan show # VLAN configuration
```
### Log Files
```bash
# Cluster logs
journalctl -u pve-cluster
journalctl -u corosync
# VM/Container logs
journalctl | grep <vmid>
tail -f /var/log/pve/tasks/*
# Firewall logs
journalctl -u pve-firewall
# Web interface logs
journalctl -u pveproxy
```
## Troubleshooting Workflows
### VM Won't Start
1. Check for locks: `qm unlock <vmid>`
2. Verify storage: `pvesm status`
3. Check resources: `free -h`, `df -h`
4. Review config: `qm config <vmid>`
5. Check logs: `journalctl | grep <vmid>`
6. Try manual start: `qm start <vmid> --debug`
### Migration Failure
1. Verify shared storage: `pvesm status`
2. Check target resources: `pvesh get /nodes/<target>/status`
3. Verify network: `ping <target-node>`
4. Check version match: `pveversion` on both nodes
5. Review migration logs
### Cluster Quorum Lost
1. Check status: `pvecm status`
2. Identify online nodes
3. If majority lost, set expected: `pvecm expected <n>`
4. Recover remaining nodes
5. Rejoin lost nodes when available
### Storage Mount Failed
1. Check network: `ping <storage-server>`
2. Verify mount: `mount | grep <storage>`
3. Try manual mount
4. Check permissions on storage server
5. Review `/var/log/syslog`
### High CPU/Memory Usage
1. Identify culprit: `top`, `htop`
2. Check VM resources: `qm monitor <vmid>`, then `info balloon`
3. Review resource allocation across cluster
4. Consider migration or resource limits
## Recovery Procedures
### Remove Failed Node
```bash
# On healthy node
pvecm delnode <failed-node>
# Clean up node-specific configs
rm -rf /etc/pve/nodes/<failed-node>
```
### Force Stop Locked VM
```bash
# Remove lock
qm unlock <vmid>
# If still stuck, find and kill QEMU process
ps aux | grep <vmid>
kill <pid>
# Force cleanup
qm stop <vmid> --skiplock
```
### Recover from Corrupt Config
```bash
# Backup current config
cp /etc/pve/qemu-server/<vmid>.conf /root/<vmid>.conf.bak
# Edit config manually
nano /etc/pve/qemu-server/<vmid>.conf
# Or restore from backup
qmrestore <backup> <vmid>
```
## Health Check Script
```bash
#!/bin/bash
echo "=== Cluster Status ==="
pvecm status
echo -e "\n=== Node Resources ==="
for node in $(pvecm nodes | awk '/^[[:space:]]*[0-9]/ {print $3}'); do
echo "--- $node ---"
pvesh get /nodes/$node/status --output-format yaml | grep -E '^(cpu|memory):'
done
echo -e "\n=== Storage Status ==="
pvesm status
echo -e "\n=== Running VMs ==="
qm list | grep running
echo -e "\n=== Running Containers ==="
pct list | grep running
```


@@ -0,0 +1,103 @@
# VM vs LXC Reference
## Decision Matrix
### Use VM (QEMU/KVM) When
- Running Windows or non-Linux OS
- Need full kernel isolation
- Running untrusted workloads
- Complex hardware passthrough needed
- Different kernel version required
- GPU passthrough required
### Use LXC When
- Running Linux services
- Need lightweight, fast startup
- Comfortable with shared kernel
- Want better density/performance
- Simple application containers
- Development environments
## QEMU/KVM VMs
Full hardware virtualization with any OS support.
### Hardware Configuration
| Setting | Options | Recommendation |
|---------|---------|----------------|
| CPU type | host, kvm64, custom | `host` for performance |
| Boot | UEFI, BIOS | UEFI for modern OS |
| Display | VNC, SPICE, NoVNC | NoVNC for web access |
### Storage Controllers
| Type | Performance | Use Case |
|------|-------------|----------|
| VirtIO | Fastest | Linux, Windows with drivers |
| SCSI | Fast | General purpose |
| SATA | Moderate | Compatibility |
| IDE | Slow | Legacy OS |
### Network Adapters
| Type | Performance | Use Case |
|------|-------------|----------|
| VirtIO | Fastest | Linux, Windows with drivers |
| E1000 | Good | Compatibility |
| RTL8139 | Slow | Legacy OS |
### Features
- Snapshots (requires compatible storage)
- Templates for rapid cloning
- Live migration (requires shared storage)
- Hardware passthrough (GPU, USB, PCI)
## LXC Containers
OS-level virtualization with shared kernel.
### Container Types
| Type | Security | Use Case |
|------|----------|----------|
| Unprivileged | Higher (recommended) | Production workloads |
| Privileged | Lower | Docker-in-LXC, NFS mounts |
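For reference, an unprivileged container can be created from the CLI like this (IDs, template file name, storage, and bridge are examples):
```bash
# Create an unprivileged Debian container from a downloaded template
pct create 200 local:vztmpl/debian-12-standard_12.7-1_amd64.tar.zst \
  --unprivileged 1 --hostname app01 \
  --cores 2 --memory 2048 \
  --rootfs local-lvm:8 \
  --net0 name=eth0,bridge=vmbr0,ip=dhcp
```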
### Resource Controls
- CPU cores and limits
- Memory hard/soft limits
- Disk I/O throttling
- Network bandwidth limits
### Storage Options
- Bind mounts from host
- Volume storage
- ZFS datasets
### Features
- Fast startup (seconds)
- Lower memory overhead
- Higher density per host
- Templates from Proxmox repo
## Migration Considerations
### VM Migration Requirements
- Shared storage (Ceph, NFS, iSCSI)
- Same CPU architecture
- Compatible Proxmox versions
- Network connectivity between nodes
### LXC Migration Requirements
- Shared storage for live migration
- Same architecture
- Unprivileged preferred for portability