Initial commit

skills/proxmox/references/automation-tools.md

# Proxmox Automation Tools

Integration patterns for managing Proxmox with Terraform and Ansible.

## Tool Selection Guide

| Task | Recommended Tool | Rationale |
|------|-----------------|-----------|
| VM/LXC provisioning | Terraform | Declarative state, idempotent, handles dependencies |
| Template creation | Packer | Repeatable builds, version-controlled |
| Post-boot configuration | Ansible | Agentless, procedural, good for drift |
| One-off VM operations | Ansible | Quick tasks, no state file needed |
| Dynamic inventory | Ansible | Query running VMs for configuration |
| Bulk VM creation | Terraform | count/for_each, parallel creation |
| Snapshot management | Either | Terraform for lifecycle, Ansible for ad-hoc |
| Cluster administration | CLI/API | Direct access for maintenance tasks |

## Terraform Integration

### Provider

```hcl
terraform {
  required_providers {
    proxmox = {
      source  = "telmate/proxmox"
      version = "~> 3.0"
    }
  }
}

provider "proxmox" {
  pm_api_url          = "https://proxmox.example.com:8006/api2/json"
  pm_api_token_id     = "terraform@pve!mytoken"
  pm_api_token_secret = var.pm_api_token_secret
}
```

### Common Patterns

```hcl
# Clone from template
resource "proxmox_vm_qemu" "vm" {
  name        = "myvm"
  target_node = "joseph"
  clone       = "tmpl-ubuntu-2404-standard"
  full_clone  = true

  cores  = 2
  memory = 4096

  disks {
    scsi {
      scsi0 {
        disk {
          storage = "local-lvm"
          size    = "50G"
        }
      }
    }
  }
}
```

### Skill Reference

Load terraform skill for detailed patterns:

- `terraform/references/proxmox/gotchas.md` - Critical issues
- `terraform/references/proxmox/vm-qemu.md` - VM resource patterns
- `terraform/references/proxmox/authentication.md` - API setup

## Ansible Integration

### Collection

```bash
ansible-galaxy collection install community.general
```

### Common Patterns

```yaml
# Clone VM
- name: Clone from template
  community.general.proxmox_kvm:
    api_host: proxmox.example.com
    api_user: ansible@pve
    api_token_id: mytoken
    api_token_secret: "{{ proxmox_token_secret }}"
    node: joseph
    vmid: 300
    name: myvm
    clone: tmpl-ubuntu-2404-standard
    full: true
    timeout: 500

# Start VM
- name: Start VM
  community.general.proxmox_kvm:
    # ... auth ...
    vmid: 300
    state: started
```

### Skill Reference

Load ansible skill for detailed patterns:

- `ansible/references/proxmox/modules.md` - All Proxmox modules
- `ansible/references/proxmox/gotchas.md` - Common issues
- `ansible/references/proxmox/dynamic-inventory.md` - Auto-discovery

## Terraform vs Ansible Decision

### Use Terraform When

- Creating infrastructure from scratch
- Managing VM lifecycle (create, update, destroy)
- Need state tracking and drift detection
- Deploying multiple similar VMs (for_each; see the sketch below)
- Complex dependencies between resources
- Team collaboration with state locking
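
A minimal sketch of bulk creation with `for_each`, assuming the telmate provider configured above; the `var.vms` variable and its shape are illustrative:

```hcl
variable "vms" {
  type = map(object({ cores = number, memory = number }))
  # e.g. { web = { cores = 2, memory = 4096 }, db = { cores = 4, memory = 8192 } }
}

resource "proxmox_vm_qemu" "bulk" {
  for_each    = var.vms
  name        = each.key
  target_node = "joseph"
  clone       = "tmpl-ubuntu-2404-standard"
  full_clone  = true
  cores       = each.value.cores
  memory      = each.value.memory
}
```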

### Use Ansible When

- Configuring VMs after creation
- Ad-hoc operations (start/stop specific VMs)
- Dynamic inventory needed for other playbooks
- Quick one-off tasks
- No state file management desired
- Integration with existing Ansible workflows

### Use Both When

- Terraform provisions VMs
- Ansible configures them post-boot
- Ansible uses Proxmox dynamic inventory to find Terraform-created VMs (see the inventory sketch below)
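
A sketch of the `community.general.proxmox` dynamic inventory plugin, reusing the token from above; the plugin expects the config filename to end in `proxmox.yml` or `proxmox.yaml`:

```yaml
# inventory.proxmox.yml
plugin: community.general.proxmox
url: https://proxmox.example.com:8006
user: ansible@pve
token_id: mytoken
token_secret: "<secret>"   # load from vault/env in practice
validate_certs: false
want_facts: true
```

Verify what it discovers with `ansible-inventory -i inventory.proxmox.yml --list`.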

## Hybrid Workflow Example

```
1. Packer builds VM template
   └── packer build ubuntu-2404.pkr.hcl

2. Terraform provisions VMs from template
   └── terraform apply
       └── Outputs: VM IPs, hostnames

3. Ansible configures VMs
   └── Uses Proxmox dynamic inventory OR
   └── Uses Terraform output as inventory

4. Ongoing management
   └── Terraform for infrastructure changes
   └── Ansible for configuration drift
```

## API Token Sharing

Both tools can share the same API token:

```bash
# Create shared token
pveum user add automation@pve
pveum aclmod / -user automation@pve -role PVEAdmin
pveum user token add automation@pve shared --privsep 0
```

Store it in shared secrets management (1Password, Vault, etc.).

## Common Gotchas

| Issue | Terraform | Ansible |
|-------|-----------|---------|
| VMID | Auto-assigns if not specified | Must specify manually |
| Cloud-init changes | Use replace_triggered_by | Limited support, use API |
| State tracking | Yes (tfstate) | No state file |
| Parallel operations | Yes (configurable) | Yes (forks) |
| Template name vs ID | Supports both | Supports both |
| Timeout handling | Provider config | Module parameter |

skills/proxmox/references/backup.md

# Proxmox Backup Reference

## vzdump Overview

Built-in backup tool for VMs and containers.

```bash
# Basic backup
vzdump <vmid>

# With options
vzdump <vmid> --mode snapshot --storage backup-nfs --compress zstd

# Backup all VMs
vzdump --all --compress zstd
```

## Backup Modes

| Mode | Downtime | Method | Use Case |
|------|----------|--------|----------|
| stop | Full | Shutdown, backup, start | Consistent, any storage |
| suspend | Brief | Pause, backup, resume | Running state preserved |
| snapshot | None | LVM/ZFS/Ceph snapshot | Production, requires snapshot storage |

### Mode Selection

```bash
# Stop mode (most consistent)
vzdump <vmid> --mode stop

# Suspend mode (preserves RAM state)
vzdump <vmid> --mode suspend

# Snapshot mode (live, requires compatible storage)
vzdump <vmid> --mode snapshot
```

## Backup Formats

| Format | Used For | Description |
|--------|----------|-------------|
| VMA | VMs | Native Proxmox archive format |
| tar | Containers | Standard tar archive |

## Compression Options

| Type | Speed | Ratio | CPU |
|------|-------|-------|-----|
| none | Fastest | 1:1 | Low |
| lzo | Fast | Good | Low |
| gzip | Moderate | Better | Medium |
| zstd | Fast | Best | Medium |

Recommendation: `zstd` for the best balance.

```bash
vzdump <vmid> --compress zstd
```

## Storage Configuration

```bash
# Backup to specific storage
vzdump <vmid> --storage backup-nfs

# Check available backup storage
pvesm status | grep backup
```

## Scheduled Backups

Configure in Datacenter → Backup:

- Schedule (cron format)
- Selection (all, pool, specific VMs)
- Storage destination
- Mode and compression
- Retention policy

### Retention Policy

```
keep-last: 3      # Keep last N backups
keep-daily: 7     # Keep daily for N days
keep-weekly: 4    # Keep weekly for N weeks
keep-monthly: 6   # Keep monthly for N months
```
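
The same retention settings can be applied per-run from the CLI; a sketch assuming PVE 6.3 or newer, where `vzdump` accepts `--prune-backups`:

```bash
vzdump <vmid> --storage backup-nfs --compress zstd \
  --prune-backups keep-last=3,keep-daily=7,keep-weekly=4,keep-monthly=6
```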

## Restore Operations

### Full Restore

```bash
# Restore VM
qmrestore <backup-file> <vmid>

# Restore to a different VMID
qmrestore <backup-file> <new-vmid>

# Restore container
pct restore <ctid> <backup-file>
```

### Restore Options

```bash
# Restore to different storage
qmrestore <backup> <vmid> --storage local-lvm

# Force overwrite existing VM
qmrestore <backup> <vmid> --force
```

### File-Level Restore

```bash
# Mount backup for file extraction
# (Use web UI: Backup → Restore → File Restore)
```

## Proxmox Backup Server (PBS)

Dedicated backup server with deduplication.

### Benefits

- Deduplication across backups
- Encryption at rest
- Verification and integrity checks
- Efficient incremental backups
- Remote backup sync

### Integration

Add PBS storage:

```bash
pvesm add pbs <storage-id> \
  --server <pbs-server> \
  --datastore <datastore> \
  --username <user>@pbs \
  --fingerprint <fingerprint>
```

## Backup Best Practices

- Store backups on separate storage from VMs
- Use snapshot mode for production VMs
- Test restores regularly
- Keep an offsite backup copy for disaster recovery
- Monitor backup job completion
- Set an appropriate retention policy

## Troubleshooting

| Issue | Check |
|-------|-------|
| Backup fails | Storage space, VM state, permissions |
| Slow backup | Mode (snapshot is faster), compression, network |
| Restore fails | Storage compatibility, VMID conflicts |
| Snapshot fails | Storage doesn't support snapshots |

skills/proxmox/references/cli-tools.md

# Proxmox CLI Tools Reference

## qm - VM Management

```bash
# List and status
qm list                          # List all VMs
qm status <vmid>                 # VM status
qm config <vmid>                 # Show VM config

# Power operations
qm start <vmid>                  # Start VM
qm stop <vmid>                   # Force stop
qm shutdown <vmid>               # ACPI shutdown
qm reboot <vmid>                 # ACPI reboot
qm reset <vmid>                  # Hard reset
qm suspend <vmid>                # Suspend to RAM
qm resume <vmid>                 # Resume from suspend

# Configuration
qm set <vmid> --memory 4096      # Set memory
qm set <vmid> --cores 4          # Set CPU cores
qm set <vmid> --name newname     # Rename VM

# Disk operations
qm resize <vmid> scsi0 +10G      # Extend disk
qm move-disk <vmid> scsi0 <storage>  # Move disk

# Snapshots
qm snapshot <vmid> <snapname>    # Create snapshot
qm listsnapshot <vmid>           # List snapshots
qm rollback <vmid> <snapname>    # Rollback
qm delsnapshot <vmid> <snapname> # Delete snapshot

# Templates and clones
qm template <vmid>               # Convert to template
qm clone <vmid> <newid>          # Clone VM

# Migration
qm migrate <vmid> <target-node>  # Live migrate

# Troubleshooting
qm unlock <vmid>                 # Remove lock
qm showcmd <vmid>                # Show QEMU command
qm monitor <vmid>                # QEMU monitor
qm guest cmd <vmid> <command>    # Guest agent command
```

## pct - Container Management

```bash
# List and status
pct list                         # List all containers
pct status <ctid>                # Container status
pct config <ctid>                # Show config

# Power operations
pct start <ctid>                 # Start container
pct stop <ctid>                  # Stop container
pct shutdown <ctid>              # Graceful shutdown
pct reboot <ctid>                # Reboot

# Access
pct enter <ctid>                 # Enter shell
pct exec <ctid> -- <command>     # Run command
pct console <ctid>               # Attach console

# Configuration
pct set <ctid> --memory 2048     # Set memory
pct set <ctid> --cores 2         # Set CPU cores
pct set <ctid> --hostname name   # Set hostname

# Disk operations
pct resize <ctid> rootfs +5G     # Extend rootfs
pct move-volume <ctid> <vol> <storage>  # Move volume

# Snapshots
pct snapshot <ctid> <snapname>   # Create snapshot
pct listsnapshot <ctid>          # List snapshots
pct rollback <ctid> <snapname>   # Rollback

# Templates
pct template <ctid>              # Convert to template
pct clone <ctid> <newid>         # Clone container

# Migration
pct migrate <ctid> <target-node> # Migrate container

# Troubleshooting
pct unlock <ctid>                # Remove lock
pct push <ctid> <src> <dst>      # Copy file to container
pct pull <ctid> <src> <dst>      # Copy file from container
```

## pvecm - Cluster Management

```bash
# Status
pvecm status                     # Cluster status
pvecm nodes                      # List nodes
pvecm qdevice                    # QDevice status

# Node operations
pvecm add <node>                 # Join cluster
pvecm delnode <node>             # Remove node
pvecm updatecerts                # Update SSL certs

# Recovery
pvecm expected <votes>           # Set expected votes
```

## pvesh - API Shell

```bash
# GET requests
pvesh get /nodes                 # List nodes
pvesh get /nodes/<node>/status   # Node status
pvesh get /nodes/<node>/qemu     # List VMs on node
pvesh get /nodes/<node>/qemu/<vmid>/status/current  # VM status
pvesh get /storage               # List storage
pvesh get /cluster/resources     # All cluster resources

# POST/PUT requests
pvesh create /nodes/<node>/qemu -vmid <id> ...      # Create VM
pvesh set /nodes/<node>/qemu/<vmid>/config ...      # Modify VM

# DELETE requests
pvesh delete /nodes/<node>/qemu/<vmid>              # Delete VM
```

## vzdump - Backup

```bash
# Basic backup
vzdump <vmid>                    # Backup VM
vzdump <ctid>                    # Backup container

# Options
vzdump <vmid> --mode snapshot    # Snapshot mode
vzdump <vmid> --compress zstd    # With compression
vzdump <vmid> --storage backup   # To specific storage
vzdump <vmid> --mailto admin@example.com  # Email notification

# Backup all
vzdump --all                     # All VMs and containers
vzdump --pool <pool>             # All in pool
```

## qmrestore / pct restore

```bash
# Restore VM
qmrestore <backup.vma> <vmid>
qmrestore <backup.vma> <vmid> --storage local-lvm

# Restore container
pct restore <ctid> <backup.tar>
pct restore <ctid> <backup.tar> --storage local-lvm
```

## Useful Combinations

```bash
# Check resources on all nodes
for node in joseph maxwell everette; do
  echo "=== $node ==="
  pvesh get /nodes/$node/status --output-format json | jq '{cpu: .cpu, memory: .memory}'
done

# Stop all VMs on a node
qm list | awk 'NR>1 {print $1}' | xargs -I {} qm stop {}

# List VMs with their IPs (requires guest agent)
for vmid in $(qm list | awk 'NR>1 {print $1}'); do
  echo -n "$vmid: "
  qm guest cmd $vmid network-get-interfaces 2>/dev/null \
    | jq -r '.[] | .["ip-addresses"][]? | .["ip-address"]' \
    | grep -v -e '^127' -e '^::1' | head -1
done
```

skills/proxmox/references/clustering.md

# Proxmox Clustering Reference

## Cluster Benefits

- Centralized web management
- Live VM migration between nodes
- High availability (HA) with automatic failover
- Shared configuration

## Cluster Requirements

| Requirement | Details |
|-------------|---------|
| Version | Same major/minor Proxmox version |
| Time | NTP synchronized |
| Network | Low-latency cluster network |
| Names | Unique node hostnames |
| Storage | Shared storage for HA |

## Cluster Commands

```bash
# Check cluster status
pvecm status

# List cluster nodes
pvecm nodes

# Add node to cluster (run on new node)
pvecm add <existing-node>

# Remove node (run on remaining node)
pvecm delnode <node-name>

# Expected votes (split-brain recovery)
pvecm expected <votes>
```

## Quorum

Cluster requires a majority of nodes online to operate.

| Nodes | Quorum | Can Lose |
|-------|--------|----------|
| 2 | 2 | 0 (use QDevice) |
| 3 | 2 | 1 |
| 4 | 3 | 1 |
| 5 | 3 | 2 |

### QDevice

External quorum device for even-node clusters:

- Prevents split-brain in 2-node clusters
- Runs on separate machine
- Provides tie-breaking vote
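
Setup is two steps; a sketch assuming a separate Debian host at 192.168.10.5 (address illustrative) running the QNetd daemon:

```bash
# On the external host
apt install corosync-qnetd

# On every cluster node
apt install corosync-qdevice

# On one node (pushes config to all members)
pvecm qdevice setup 192.168.10.5
```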

## High Availability (HA)

Automatic VM restart on a healthy node if the host fails.

### Requirements

- Shared storage (Ceph, NFS, iSCSI)
- Fencing enabled (watchdog)
- HA group configured
- VM added to HA

### HA States

| State | Description |
|-------|-------------|
| started | VM running, managed by HA |
| stopped | VM stopped intentionally |
| migrate | Migration in progress |
| relocate | Moving to different node |
| error | Problem detected |

### HA Configuration

1. Enable fencing (watchdog device)
2. Create HA group (optional)
3. Add VM to HA: Datacenter → HA → Add (or via the CLI, below)
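
The same steps with `ha-manager`; VMID 100 and the group name `prod` are placeholders:

```bash
ha-manager groupadd prod --nodes "node1,node2"       # optional HA group
ha-manager add vm:100 --state started --group prod   # put the VM under HA
ha-manager status                                    # verify
```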

### Fencing

Prevents split-brain by forcing a failed node to stop:

```bash
# Check watchdog status
cat /proc/sys/kernel/watchdog

# Watchdog config
/etc/pve/ha/fence.cfg
```

## Live Migration

Move a running VM between nodes without downtime.

### Requirements

- Shared storage OR local-to-local migration
- Same CPU architecture
- Network connectivity
- Sufficient resources on target

### Migration Types

| Type | Downtime | Requirements |
|------|----------|--------------|
| Live | Minimal | Shared storage |
| Offline | Full | Any storage |
| Local storage | Moderate | Copies disk |

### Migration Command

```bash
# Live migrate
qm migrate <vmid> <target-node>

# Offline migrate
qm migrate <vmid> <target-node> --offline

# With local disk
qm migrate <vmid> <target-node> --with-local-disks
```

## Cluster Network

### Corosync Network

Cluster communication (default port 5405):

- Low latency required
- Dedicated VLAN recommended
- Redundant links for HA

### Configuration

```
# /etc/pve/corosync.conf
nodelist {
  node {
    name: node1
    ring0_addr: 192.168.10.1
  }
  node {
    name: node2
    ring0_addr: 192.168.10.2
  }
}
```

## Troubleshooting

### Quorum Lost

```bash
# Check status
pvecm status

# Force expected votes (DANGEROUS)
pvecm expected 1

# Then: recover remaining nodes
```

### Node Won't Join

- Check network connectivity
- Verify time sync
- Check that Proxmox versions match
- Review /var/log/pve-cluster/

### Split Brain Recovery

1. Identify the authoritative node
2. Stop cluster services on other nodes
3. Set expected votes
4. Restart and rejoin nodes

skills/proxmox/references/docker-hosting.md

# Docker Workloads on Proxmox

Best practices for hosting Docker containers on Proxmox VE.

## Hosting Options

| Option | Isolation | Overhead | Complexity | Recommendation |
|--------|-----------|----------|------------|----------------|
| VM + Docker | Full | Higher | Low | **Recommended** |
| LXC + Docker | Shared kernel | Lower | High | Avoid |
| Bare metal Docker | None | Lowest | N/A | Not on Proxmox |

## VM for Docker (Recommended)

### Template Selection

Use Docker-ready templates (102+):

| Template | Docker Pre-installed |
|----------|---------------------|
| 102 (docker) | Yes |
| 103 (github-runner) | Yes |
| 104 (pihole) | Yes |

### VM Sizing

| Workload | CPU | RAM | Disk |
|----------|-----|-----|------|
| Light (1-3 containers) | 2 | 4 GB | 50 GB |
| Medium (4-10 containers) | 4 | 8 GB | 100 GB |
| Heavy (10+ containers) | 8+ | 16+ GB | 200+ GB |

### Storage Backend

| Proxmox Storage | Docker Suitability | Notes |
|-----------------|-------------------|-------|
| local-lvm | Good | Default, fast |
| ZFS | Best | Snapshots, compression |
| Ceph | Good | Distributed, HA |
| NFS | Moderate | Shared access, slower |

### Network Configuration

```
Proxmox Node
├── vmbr0 (bridge) → VM eth0 → Docker bridge network
└── vmbr12 (high-speed) → VM eth1 → Docker macvlan (optional)
```
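
If the second NIC is used for macvlan, a sketch of the Docker side; subnet, gateway, and parent interface are assumptions for this layout:

```bash
docker network create -d macvlan \
  --subnet=192.168.12.0/24 --gateway=192.168.12.1 \
  -o parent=eth1 lan_macvlan
```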

## Docker in LXC (Not Recommended)

If you must run Docker in LXC:

### Requirements

1. **Privileged container** or nesting enabled
2. **AppArmor** profile unconfined
3. **Keyctl** feature enabled

### LXC Options

```bash
# Proxmox GUI: Options → Features
nesting: 1
keyctl: 1

# Or in /etc/pve/lxc/<vmid>.conf
features: keyctl=1,nesting=1
lxc.apparmor.profile: unconfined
```

### Known Issues

- Some Docker storage drivers don't work
- Overlay filesystem may have issues
- Reduced security isolation
- Complex debugging (two container layers)

## Resource Allocation

### CPU

```bash
# VM config - dedicate cores to the Docker host
cores: 4
cpu: host        # Pass through CPU features
```

### Memory

```bash
# VM config - allow some overcommit for containers
memory: 8192
balloon: 4096    # Minimum memory
```

### Disk I/O

For I/O-intensive containers (databases):

```bash
# VM disk options
cache: none      # Direct I/O for consistency
iothread: 1      # Dedicated I/O thread
ssd: 1           # If on SSD storage
```

## GPU Passthrough for Containers

For transcoding (Plex) or ML workloads:

### 1. Proxmox: Pass GPU to VM

```bash
# /etc/pve/qemu-server/<vmid>.conf
hostpci0: 0000:01:00.0,pcie=1
```

### 2. VM: Install NVIDIA Container Toolkit

```bash
# In VM
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update && sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```

### 3. Docker Compose

```yaml
services:
  plex:
    image: linuxserver/plex
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```

## Backup Strategy

### VM-level (Recommended)

Proxmox vzdump backs up the entire Docker host, including all containers:

```bash
vzdump <vmid> --mode snapshot --storage backup --compress zstd
```

### Application-level

For consistent database backups, stop or flush before the VM backup:

```bash
# Pre-backup hook
docker exec postgres pg_dump -U user db > /backup/db.sql
```

## Monitoring

### From Proxmox

- VM CPU, memory, network, disk via Proxmox UI
- No visibility into individual containers

### From Docker Host

```bash
# Resource usage per container
docker stats

# System-wide
docker system df
```

### Recommended Stack

```yaml
# On Docker host
services:
  prometheus:
    image: prom/prometheus
  cadvisor:
    image: gcr.io/cadvisor/cadvisor
  grafana:
    image: grafana/grafana
```

## Skill References

For Docker-specific patterns:

- `docker/references/compose.md` - Compose file structure
- `docker/references/networking.md` - Network modes
- `docker/references/volumes.md` - Data persistence
- `docker/references/proxmox/hosting.md` - Detailed hosting guide

skills/proxmox/references/networking.md

# Proxmox Networking Reference

## Linux Bridges

Default networking method for Proxmox VMs and containers.

### Bridge Configuration

```
# /etc/network/interfaces example
auto vmbr0
iface vmbr0 inet static
    address 192.168.1.10/24
    gateway 192.168.1.1
    bridge-ports eno1
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
```

### VLAN-Aware Bridge

Enable VLAN tagging at the VM level instead of using separate bridges:

- Set `bridge-vlan-aware yes` on the bridge
- Configure the VLAN tag in the VM network config
- Simpler management, fewer bridges needed

### Separate Bridges (Alternative)

One bridge per VLAN:

- vmbr0: Untagged/native VLAN
- vmbr1: VLAN 10
- vmbr5: VLAN 5

More bridges, but explicit network separation.

## VLAN Configuration

### At VM Level (VLAN-aware bridge)

```
net0: virtio=XX:XX:XX:XX:XX:XX,bridge=vmbr0,tag=20
```
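
To apply this from the CLI (a MAC address is auto-generated when none is given); `<vmid>` and the tag are placeholders:

```bash
qm set <vmid> --net0 virtio,bridge=vmbr0,tag=20
```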

### At Bridge Level (Separate bridges)

```
net0: virtio=XX:XX:XX:XX:XX:XX,bridge=vmbr20
```

## Firewall

Three levels of firewall rules:

| Level | Scope | Use Case |
|-------|-------|----------|
| Datacenter | Cluster-wide | Default policies |
| Node | Per-node | Node-specific rules |
| VM/Container | Per-VM | Application-specific |

### Default Policy

- Input: DROP (only allow explicit rules)
- Output: ACCEPT
- Enable the firewall per VM in Options

### Common Rules

```
# Allow SSH
IN ACCEPT -p tcp --dport 22

# Allow HTTP/HTTPS
IN ACCEPT -p tcp --dport 80
IN ACCEPT -p tcp --dport 443

# Allow ICMP (ping)
IN ACCEPT -p icmp
```

## SDN (Software Defined Networking)

Advanced networking for complex multi-tenant setups.

### Zone Types

| Type | Use Case |
|------|----------|
| Simple | Basic L2 network |
| VLAN | VLAN-based isolation |
| VXLAN | Overlay networking |
| EVPN | BGP-based routing |

### When to Use SDN

- Multi-tenant environments
- Complex routing requirements
- Cross-node L2 networks
- VXLAN overlay needs

For homelab use, standard bridges are usually sufficient.

## Network Performance

### Jumbo Frames

Enable on the storage network for better throughput:

```
# Set MTU 9000 on bridge
auto vmbr40
iface vmbr40 inet static
    mtu 9000
    ...
```

Requires all devices in the path to support jumbo frames.

### VirtIO Multiqueue

Enable parallel network processing for high-throughput VMs:

```
net0: virtio=XX:XX:XX:XX:XX:XX,bridge=vmbr0,queues=4
```

## Troubleshooting

### Check Bridge Status

```bash
brctl show          # List bridges and attached interfaces
ip link show vmbr0  # Bridge interface details
bridge vlan show    # VLAN configuration
```

### Check VM Network

```bash
qm config <vmid> | grep net  # VM network config
ip addr                      # From inside VM
```

### Common Issues

| Problem | Check |
|---------|-------|
| No connectivity | Bridge exists, interface attached |
| Wrong VLAN | Tag matches switch config |
| Slow network | MTU mismatch, driver type |
| Firewall blocking | Rules, policy, enabled status |

skills/proxmox/references/storage.md

# Proxmox Storage Reference

## Storage Types

### Local Storage

| Type | Features | Use Case |
|------|----------|----------|
| Directory | Simple, any filesystem | Basic storage |
| LVM | Block device, raw performance | Performance |
| LVM-thin | Thin provisioning, snapshots | Efficient space |
| ZFS | Compression, snapshots, high perf | Production |

Limitations: no live migration, single node only.

### Shared Storage

| Type | Features | Use Case |
|------|----------|----------|
| NFS | File-based, simple | Shared access |
| Ceph RBD | Distributed block, HA | Production HA |
| iSCSI | Network block | SAN integration |
| GlusterFS | Distributed file | File sharing |

Benefits: live migration, HA, shared access.

## Content Types

Configure what each storage can hold:

| Content | Description | File Types |
|---------|-------------|------------|
| images | VM disk images | .raw, .qcow2 |
| iso | ISO images for install | .iso |
| vztmpl | Container templates | .tar.gz |
| backup | Backup files | .vma, .tar |
| rootdir | Container root FS | directories |
| snippets | Cloud-init, hooks | .yaml, scripts |
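
Content types can be adjusted per storage from the CLI; a sketch against the default `local` storage:

```bash
pvesm set local --content images,iso,vztmpl,backup,snippets
```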

## Storage Configuration

### Add NFS Storage

```bash
pvesm add nfs <storage-id> \
  --server <nfs-server> \
  --export <export-path> \
  --content images,iso,backup
```

### Add Ceph RBD

```bash
pvesm add rbd <storage-id> \
  --monhost <mon1>,<mon2>,<mon3> \
  --pool <pool-name> \
  --content images,rootdir
```

### Check Storage Status

```bash
pvesm status        # All storage status
pvesh get /storage  # API query
df -h               # Disk space
```

## Disk Formats

| Format | Features | Performance |
|--------|----------|-------------|
| raw | No overhead, full allocation | Fastest |
| qcow2 | Snapshots, thin provisioning | Moderate |

Recommendation: use `raw` for production, `qcow2` for dev/snapshots.

## Disk Cache Modes

| Mode | Safety | Performance | Use Case |
|------|--------|-------------|----------|
| none | Safe | Good | Default, recommended |
| writeback | Unsafe | Best | Non-critical, battery backup |
| writethrough | Safe | Moderate | Compatibility |
| directsync | Safest | Slow | Critical data |

## Storage Performance

### Enable Discard (TRIM)

For SSD thin provisioning:

```
scsi0: local-lvm:vm-100-disk-0,discard=on
```

### I/O Thread

Dedicated I/O thread per disk:

```
scsi0: local-lvm:vm-100-disk-0,iothread=1
```

### I/O Limits

Throttle bandwidth and IOPS per disk in the VM config:

```
scsi0: local-lvm:vm-100-disk-0,mbps_rd=100,mbps_wr=100,iops_rd=500,iops_wr=500
```

## Cloud-Init Storage

Cloud-init configs are stored in the `snippets` content type:

```bash
# Upload cloud-init files
scp user-data.yaml root@proxmox:/var/lib/vz/snippets/

# Or to named storage
scp user-data.yaml root@proxmox:/mnt/pve/<storage>/snippets/
```

Reference in VM:

```
cicustom: user=<storage>:snippets/user-data.yaml
```
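
Applied from the CLI; the `local` storage id is an assumption, and snippets must be enabled as a content type on that storage:

```bash
qm set <vmid> --cicustom "user=local:snippets/user-data.yaml"
```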

## Backup Storage

### Recommended Configuration

- Separate storage for backups
- NFS or a dedicated backup server
- Sufficient space for the retention policy

### Backup Retention

Configure in Datacenter → Backup:

```
keep-last: 3
keep-daily: 7
keep-weekly: 4
keep-monthly: 6
```

skills/proxmox/references/troubleshooting.md

# Proxmox Troubleshooting Reference

## Common Errors

| Error | Cause | Solution |
|-------|-------|----------|
| VM won't start | Lock, storage, resources | `qm unlock`, check storage, verify resources |
| Migration failed | No shared storage, resources | Verify shared storage, check target capacity |
| Cluster issues | Quorum, network, time | `pvecm status`, check NTP, network |
| Storage unavailable | Mount failed, network | Check mount, network access |
| High load | Resource contention | Identify bottleneck, rebalance VMs |
| Network issues | Bridge, VLAN, firewall | `brctl show`, check tags, firewall rules |
| Backup failed | Disk space, VM state | Check space, storage access |
| Template not found | Not downloaded | Download from Proxmox repo |
| API errors | Auth, permissions | Check token, user permissions |

## Diagnostic Commands

### Cluster Health

```bash
pvecm status                  # Quorum and node status
pvecm nodes                   # List cluster members
systemctl status pve-cluster  # Cluster service
systemctl status corosync     # Corosync service
```

### Node Health

```bash
pveversion -v        # Proxmox version info
uptime               # Load and uptime
free -h              # Memory usage
df -h                # Disk space
top -bn1 | head -20  # Process overview
```

### VM Diagnostics

```bash
qm status <vmid>   # VM state
qm config <vmid>   # VM configuration
qm showcmd <vmid>  # QEMU command line
qm unlock <vmid>   # Clear locks
qm monitor <vmid>  # QEMU monitor access
```

### Container Diagnostics

```bash
pct status <ctid>  # Container state
pct config <ctid>  # Container configuration
pct enter <ctid>   # Enter container shell
pct unlock <ctid>  # Clear locks
```

### Storage Diagnostics

```bash
pvesm status                # Storage status
df -h                       # Disk space
mount | grep -E 'nfs|ceph'  # Mounted storage
zpool status                # ZFS pool status (if using ZFS)
ceph -s                     # Ceph status (if using Ceph)
```

### Network Diagnostics

```bash
brctl show        # Bridge configuration
ip link           # Network interfaces
ip addr           # IP addresses
ip route          # Routing table
bridge vlan show  # VLAN configuration
```

### Log Files

```bash
# Cluster logs
journalctl -u pve-cluster
journalctl -u corosync

# VM/Container logs
journalctl | grep <vmid>
tail -f /var/log/pve/tasks/*

# Firewall logs
journalctl -u pve-firewall

# Web interface logs
journalctl -u pveproxy
```

## Troubleshooting Workflows

### VM Won't Start

1. Check for locks: `qm unlock <vmid>`
2. Verify storage: `pvesm status`
3. Check resources: `free -h`, `df -h`
4. Review config: `qm config <vmid>`
5. Check logs: `journalctl | grep <vmid>`
6. Try manual start: `qm start <vmid> --debug`

### Migration Failure

1. Verify shared storage: `pvesm status`
2. Check target resources: `pvesh get /nodes/<target>/status`
3. Verify network: `ping <target-node>`
4. Check version match: `pveversion` on both nodes
5. Review migration logs

### Cluster Quorum Lost

1. Check status: `pvecm status`
2. Identify online nodes
3. If majority lost, set expected votes: `pvecm expected <n>`
4. Recover remaining nodes
5. Rejoin lost nodes when available

### Storage Mount Failed

1. Check network: `ping <storage-server>`
2. Verify mount: `mount | grep <storage>`
3. Try a manual mount
4. Check permissions on the storage server
5. Review `/var/log/syslog`

### High CPU/Memory Usage

1. Identify the culprit: `top`, `htop` (or rank VMs via the API, below)
2. Check VM resources: `qm monitor <vmid>` → `info balloon`
3. Review resource allocation across the cluster
4. Consider migration or resource limits
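
A quick way to rank VMs by current CPU usage across the cluster; a sketch that assumes `jq` is installed:

```bash
# Top 5 VMs by CPU fraction, cluster-wide
pvesh get /cluster/resources --type vm --output-format json \
  | jq -r 'sort_by(-.cpu) | .[:5][] | "\(.vmid)\t\(.name)\tcpu=\(.cpu)"'
```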

## Recovery Procedures

### Remove Failed Node

```bash
# On healthy node
pvecm delnode <failed-node>

# Clean up node-specific configs
rm -rf /etc/pve/nodes/<failed-node>
```

### Force Stop Locked VM

```bash
# Remove lock
qm unlock <vmid>

# If still stuck, find and kill the QEMU process
ps aux | grep <vmid>
kill <pid>

# Force cleanup
qm stop <vmid> --skiplock
```

### Recover from Corrupt Config

```bash
# Backup current config
cp /etc/pve/qemu-server/<vmid>.conf /root/<vmid>.conf.bak

# Edit config manually
nano /etc/pve/qemu-server/<vmid>.conf

# Or restore from backup
qmrestore <backup> <vmid>
```

## Health Check Script

```bash
#!/bin/bash
echo "=== Cluster Status ==="
pvecm status

echo -e "\n=== Node Resources ==="
# Node names are in column 3; match only the numbered member rows
# to skip the header lines of `pvecm nodes` output
for node in $(pvecm nodes | awk '/^[[:space:]]*[0-9]/ {print $3}'); do
  echo "--- $node ---"
  pvesh get /nodes/$node/status --output-format yaml | grep -E '^(cpu|memory):'
done

echo -e "\n=== Storage Status ==="
pvesm status

echo -e "\n=== Running VMs ==="
qm list | grep running

echo -e "\n=== Running Containers ==="
pct list | grep running
```

skills/proxmox/references/vm-lxc.md

# VM vs LXC Reference

## Decision Matrix

### Use VM (QEMU/KVM) When

- Running Windows or non-Linux OS
- Need full kernel isolation
- Running untrusted workloads
- Complex hardware passthrough needed
- Different kernel version required
- GPU passthrough required

### Use LXC When

- Running Linux services
- Need lightweight, fast startup
- Comfortable with shared kernel
- Want better density/performance
- Simple application containers
- Development environments

## QEMU/KVM VMs

Full hardware virtualization with support for any OS.

### Hardware Configuration

| Setting | Options | Recommendation |
|---------|---------|----------------|
| CPU type | host, kvm64, custom | `host` for performance |
| Boot | UEFI, BIOS | UEFI for modern OS |
| Display | VNC, SPICE, NoVNC | NoVNC for web access |
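
A minimal creation sketch reflecting these recommendations; the VMID, name, and storage are placeholders:

```bash
qm create 100 --name myvm --memory 4096 --cores 2 --cpu host \
  --net0 virtio,bridge=vmbr0 --scsi0 local-lvm:32 --scsihw virtio-scsi-single \
  --bios ovmf --efidisk0 local-lvm:1
```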

### Storage Controllers

| Type | Performance | Use Case |
|------|-------------|----------|
| VirtIO | Fastest | Linux, Windows with drivers |
| SCSI | Fast | General purpose |
| SATA | Moderate | Compatibility |
| IDE | Slow | Legacy OS |

### Network Adapters

| Type | Performance | Use Case |
|------|-------------|----------|
| VirtIO | Fastest | Linux, Windows with drivers |
| E1000 | Good | Compatibility |
| RTL8139 | Slow | Legacy OS |

### Features

- Snapshots (requires compatible storage)
- Templates for rapid cloning
- Live migration (requires shared storage)
- Hardware passthrough (GPU, USB, PCI)

## LXC Containers

OS-level virtualization with a shared kernel.

### Container Types

| Type | Security | Use Case |
|------|----------|----------|
| Unprivileged | Higher (recommended) | Production workloads |
| Privileged | Lower | Docker-in-LXC, NFS mounts |
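
A minimal unprivileged container sketch; the template filename is illustrative and must already be downloaded (templates are managed with `pveam`):

```bash
pct create 200 local:vztmpl/ubuntu-24.04-standard_24.04-2_amd64.tar.zst \
  --unprivileged 1 --cores 2 --memory 2048 \
  --rootfs local-lvm:8 --net0 name=eth0,bridge=vmbr0,ip=dhcp
```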

### Resource Controls

- CPU cores and limits
- Memory hard/soft limits
- Disk I/O throttling
- Network bandwidth limits

### Storage Options

- Bind mounts from host
- Volume storage
- ZFS datasets

### Features

- Fast startup (seconds)
- Lower memory overhead
- Higher density per host
- Templates from Proxmox repo

## Migration Considerations

### VM Migration Requirements

- Shared storage (Ceph, NFS, iSCSI)
- Same CPU architecture
- Compatible Proxmox versions
- Network connectivity between nodes

### LXC Migration Requirements

- Shared storage for live migration
- Same architecture
- Unprivileged preferred for portability