Initial commit

This commit is contained in:
Zhongwei Li
2025-11-29 18:00:27 +08:00
commit 0c6988a884
19 changed files with 5729 additions and 0 deletions


@@ -0,0 +1,378 @@
# Proxmox API Reference
## Overview
The Proxmox API enables programmatic management of the cluster via REST. This reference focuses on common patterns for Python (proxmoxer) and Terraform/Ansible usage.
## Authentication Methods
### API Tokens (Recommended)
**Create API token via CLI:**
```bash
pveum user token add <user>@<realm> <token-id> --privsep 0
```
**Environment variables:**
```bash
export PROXMOX_VE_API_TOKEN="user@realm!token-id=secret"
export PROXMOX_VE_ENDPOINT="https://192.168.3.5:8006"
```
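For a quick sanity check of the token outside any tooling, the API expects an `Authorization: PVEAPIToken=<user>@<realm>!<token-id>=<secret>` header. A minimal curl call against the `/version` endpoint, reusing the variables above (`-k` only because the lab endpoint uses a self-signed certificate):
```bash
# Expect a JSON response with the PVE version if the token is valid
curl -k -H "Authorization: PVEAPIToken=${PROXMOX_VE_API_TOKEN}" \
  "${PROXMOX_VE_ENDPOINT}/api2/json/version"
```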
### Password Authentication
```bash
export PROXMOX_VE_USERNAME="root@pam"
export PROXMOX_VE_PASSWORD="password"
export PROXMOX_VE_ENDPOINT="https://192.168.3.5:8006"
```
## Python API Usage (proxmoxer)
### Installation
```bash
# Option 1: install with pip
pip install proxmoxer requests
# Option 2: uv inline script metadata at the top of the script (run with: uv run script.py)
# /// script
# dependencies = ["proxmoxer", "requests"]
# ///
```
### Basic Connection
```python
#!/usr/bin/env python3
# /// script
# dependencies = ["proxmoxer", "requests"]
# ///
from proxmoxer import ProxmoxAPI
import os
# Connect using an API token. PROXMOX_VE_USERNAME is the user@realm part; the
# TOKEN_NAME/TOKEN_VALUE values are the parts after "!" and "=" in the combined
# PROXMOX_VE_API_TOKEN string shown above.
proxmox = ProxmoxAPI(
    os.getenv("PROXMOX_VE_ENDPOINT").replace("https://", "").replace(":8006", ""),
    user=os.getenv("PROXMOX_VE_USERNAME"),
    token_name=os.getenv("PROXMOX_VE_TOKEN_NAME"),
    token_value=os.getenv("PROXMOX_VE_TOKEN_VALUE"),
    verify_ssl=False  # use True with a proper CA certificate in production
)
# OR using password authentication
proxmox = ProxmoxAPI(
    '192.168.3.5',
    user='root@pam',
    password=os.getenv("PROXMOX_VE_PASSWORD"),
    verify_ssl=False
)
```
### Common Operations
**List VMs:**
```python
# Get all VMs across cluster
for node in proxmox.nodes.get():
node_name = node['node']
for vm in proxmox.nodes(node_name).qemu.get():
print(f"VM {vm['vmid']}: {vm['name']} on {node_name} - {vm['status']}")
```
**Get VM Configuration:**
```python
vmid = 101
node = "foxtrot"
vm_config = proxmox.nodes(node).qemu(vmid).config.get()
print(f"VM {vmid} config: {vm_config}")
```
**Clone Template:**
```python
template_id = 9000
new_vmid = 101
node = "foxtrot"
# Clone the template; the API returns a task UPID that can be polled
upid = proxmox.nodes(node).qemu(template_id).clone.post(
    newid=new_vmid,
    name="docker-01-nexus",
    full=1,  # Full clone (not linked)
    storage="local-lvm"
)
# Wait for the clone task to complete
import time
while True:
    task = proxmox.nodes(node).tasks(upid).status.get()
    if task['status'] == 'stopped':
        if task.get('exitstatus') != 'OK':
            raise RuntimeError(f"Clone failed: {task.get('exitstatus')}")
        break
    time.sleep(2)
```
**Update VM Configuration:**
```python
# Set cloud-init parameters
# Note: the API expects the sshkeys value to be URL-encoded
import urllib.parse
proxmox.nodes(node).qemu(vmid).config.put(
    ipconfig0="ip=192.168.3.100/24,gw=192.168.3.1",
    nameserver="192.168.3.1",
    searchdomain="spaceships.work",
    sshkeys=urllib.parse.quote("ssh-rsa AAAA...", safe='')
)
```
**Start/Stop VM:**
```python
# Start VM
proxmox.nodes(node).qemu(vmid).status.start.post()
# Stop VM (graceful)
proxmox.nodes(node).qemu(vmid).status.shutdown.post()
# Force stop
proxmox.nodes(node).qemu(vmid).status.stop.post()
```
**Delete VM:**
```python
proxmox.nodes(node).qemu(vmid).delete()
```
### Cluster Operations
**Get Cluster Status:**
```python
cluster_status = proxmox.cluster.status.get()
for node in cluster_status:
if node['type'] == 'node':
print(f"Node: {node['name']} - {node['online']}")
```
**Get Node Resources:**
```python
node_status = proxmox.nodes(node).status.get()
print(f"CPU: {node_status['cpu']*100:.1f}%")
print(f"Memory: {node_status['memory']['used']/1024**3:.1f}GB / {node_status['memory']['total']/1024**3:.1f}GB")
```
### Storage Operations
**List Storage:**
```python
# Per-node storage status includes the 'active' flag and usage
for storage in proxmox.nodes(node).storage.get():
    print(f"Storage: {storage['storage']} - Type: {storage['type']} - Active: {storage['active']}")
```
**Get Storage Content:**
```python
storage = "local-lvm"
content = proxmox.storage(storage).content.get()
for item in content:
print(f"{item['volid']} - {item.get('vmid', 'N/A')} - {item['size']/1024**3:.1f}GB")
```
## Terraform Provider Patterns
### Basic Resource (VM from Clone)
```hcl
resource "proxmox_vm_qemu" "docker_host" {
name = "docker-01-nexus"
target_node = "foxtrot"
vmid = 101
clone = "ubuntu-template"
full_clone = true
cores = 4
memory = 8192
sockets = 1
network {
bridge = "vmbr0"
model = "virtio"
tag = 30 # VLAN 30
}
disk {
storage = "local-lvm"
type = "scsi"
size = "50G"
}
ipconfig0 = "ip=192.168.3.100/24,gw=192.168.3.1"
sshkeys = file("~/.ssh/id_rsa.pub")
}
```
### Data Sources
```hcl
# Get template information
data "proxmox_vm_qemu" "template" {
name = "ubuntu-template"
target_node = "foxtrot"
}
# Get storage information
data "proxmox_storage" "local_lvm" {
node = "foxtrot"
storage = "local-lvm"
}
```
## Ansible Module Patterns
### Create VM from Template
```yaml
- name: Clone template to create VM
community.proxmox.proxmox_kvm:
api_host: "{{ proxmox_api_host }}"
api_user: "{{ proxmox_api_user }}"
api_token_id: "{{ proxmox_token_id }}"
api_token_secret: "{{ proxmox_token_secret }}"
node: foxtrot
vmid: 101
name: docker-01-nexus
clone: ubuntu-template
full: true
storage: local-lvm
net:
net0: 'virtio,bridge=vmbr0,tag=30'
ipconfig:
ipconfig0: 'ip=192.168.3.100/24,gw=192.168.3.1'
cores: 4
memory: 8192
agent: 1
state: present
```
### Start VM
```yaml
- name: Start VM
community.proxmox.proxmox_kvm:
api_host: "{{ proxmox_api_host }}"
api_user: "{{ proxmox_api_user }}"
api_token_id: "{{ proxmox_token_id }}"
api_token_secret: "{{ proxmox_token_secret }}"
node: foxtrot
vmid: 101
state: started
```
## Matrix Cluster Specifics
### Node IP Addresses
```python
MATRIX_NODES = {
"foxtrot": "192.168.3.5",
"golf": "192.168.3.6",
"hotel": "192.168.3.7"
}
```
### Storage Pools
```python
STORAGE_POOLS = {
"local": "dir", # Local directory
"local-lvm": "lvmthin", # LVM thin on boot disk
"ceph-pool": "rbd" # CEPH RBD (when configured)
}
```
### Network Bridges
```python
BRIDGES = {
"vmbr0": "192.168.3.0/24", # Management + VLAN 9 (Corosync)
"vmbr1": "192.168.5.0/24", # CEPH Public (MTU 9000)
"vmbr2": "192.168.7.0/24" # CEPH Private (MTU 9000)
}
```
## Error Handling
### Python Example
```python
from proxmoxer import ProxmoxAPI, ResourceException
import sys
try:
proxmox = ProxmoxAPI('192.168.3.5', user='root@pam', password='pass', verify_ssl=False)
vm_config = proxmox.nodes('foxtrot').qemu(101).config.get()
except ResourceException as e:
print(f"API Error: {e}", file=sys.stderr)
sys.exit(1)
except Exception as e:
print(f"Unexpected error: {e}", file=sys.stderr)
sys.exit(1)
```
### Ansible Example
```yaml
- name: Clone VM with error handling
  community.proxmox.proxmox_kvm:
    api_host: "{{ proxmox_api_host }}"
    # ... config ...
  register: clone_result
  ignore_errors: true  # failed_when: false would mask the failure; ignore_errors keeps clone_result.failed usable
- name: Check clone result
  ansible.builtin.fail:
    msg: "Failed to clone VM: {{ clone_result.msg | default('unknown error') }}"
  when: clone_result.failed
```
## API Endpoints Reference
### Common Endpoints
```text
GET /api2/json/nodes # List nodes
GET /api2/json/nodes/{node}/qemu # List VMs on node
GET /api2/json/nodes/{node}/qemu/{vmid} # Get VM status
POST /api2/json/nodes/{node}/qemu/{vmid}/clone # Clone VM
PUT /api2/json/nodes/{node}/qemu/{vmid}/config # Update config
POST /api2/json/nodes/{node}/qemu/{vmid}/status/start # Start VM
POST /api2/json/nodes/{node}/qemu/{vmid}/status/shutdown # Stop VM
DELETE /api2/json/nodes/{node}/qemu/{vmid} # Delete VM
GET /api2/json/cluster/status # Cluster status
GET /api2/json/storage # List storage
```
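On a cluster node, the same endpoints are reachable through `pvesh`, which maps HTTP verbs to `get`/`create`/`set`/`delete`. A couple of sketches using the node and VMID from this document's examples:
```bash
# GET /api2/json/nodes
pvesh get /nodes
# GET /api2/json/nodes/{node}/qemu
pvesh get /nodes/foxtrot/qemu
# POST /api2/json/nodes/{node}/qemu/{vmid}/status/start
pvesh create /nodes/foxtrot/qemu/101/status/start
```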
## Best Practices
1. **Use API tokens** - More secure than password authentication
2. **Handle SSL properly** - Use `verify_ssl=True` with proper CA cert in production
3. **Check task completion** - Clone/migrate operations are async, poll for completion
4. **Error handling** - Always catch ResourceException and provide meaningful errors
5. **Rate limiting** - Don't hammer the API, add delays in loops
6. **Idempotency** - Check whether a resource already exists before creating it (see the sketch after this list)
7. **Use VMID ranges** - Reserve ranges for different purposes (templates: 9000-9999, VMs: 100-999)
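A minimal idempotency guard in shell, assuming `qm` access on the target node (the VMIDs are the illustrative ones used throughout this document):
```bash
# Only clone if VM 101 does not exist yet
if qm status 101 >/dev/null 2>&1; then
    echo "VM 101 already exists, skipping clone"
else
    qm clone 9000 101 --name docker-01-nexus --full 1 --storage local-lvm
fi
```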
## Further Reading
- [Proxmox VE API Documentation](https://pve.proxmox.com/pve-docs/api-viewer/)
- [proxmoxer GitHub](https://github.com/proxmoxer/proxmoxer)
- [community.proxmox Collection](https://docs.ansible.com/ansible/latest/collections/community/proxmox/)


@@ -0,0 +1,163 @@
# Cloud-Init Patterns for Proxmox VE
*Source: <https://pve.proxmox.com/wiki/Cloud-Init_Support>*
## Overview
Cloud-Init is the de facto multi-distribution package that handles early initialization of virtual machines. When a VM starts for the first time, Cloud-Init applies network and SSH key settings configured on the hypervisor.
## Template Creation Workflow
### Download and Import Cloud Image
```bash
# Download Ubuntu cloud image
wget https://cloud-images.ubuntu.com/bionic/current/bionic-server-cloudimg-amd64.img
# Create VM with VirtIO SCSI controller
qm create 9000 --memory 2048 --net0 virtio,bridge=vmbr0 --scsihw virtio-scsi-pci
# Import disk to storage
qm set 9000 --scsi0 local-lvm:0,import-from=/path/to/bionic-server-cloudimg-amd64.img
```
**Important**: Ubuntu Cloud-Init images require `virtio-scsi-pci` controller type for SCSI drives.
### Configure Cloud-Init Components
```bash
# Add Cloud-Init CD-ROM drive
qm set 9000 --ide2 local-lvm:cloudinit
# Set boot order (speeds up boot)
qm set 9000 --boot order=scsi0
# Configure serial console (required for many cloud images)
qm set 9000 --serial0 socket --vga serial0
# Convert to template
qm template 9000
```
## Deploying from Templates
### Clone Template
```bash
# Clone template to new VM
qm clone 9000 123 --name ubuntu2
```
### Configure VM
```bash
# Set SSH public key
qm set 123 --sshkeys ~/.ssh/id_rsa.pub
# Configure network
qm set 123 --ipconfig0 ip=10.0.10.123/24,gw=10.0.10.1
```
## Custom Cloud-Init Configuration
### Using Custom Config Files
Proxmox allows custom cloud-init configurations via the `cicustom` option:
```bash
qm set 9000 --cicustom "user=<volume>,network=<volume>,meta=<volume>"
```
Example using local snippets storage:
```bash
qm set 9000 --cicustom "user=local:snippets/userconfig.yaml"
```
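What goes into such a snippet is ordinary cloud-init user data. A minimal sketch, assuming the `local` storage has the `snippets` content type enabled and lives at `/var/lib/vz` (the `ops` user and key are placeholders):
```bash
# Write a user-data snippet that cloud-init applies on first boot
cat > /var/lib/vz/snippets/userconfig.yaml <<'EOF'
#cloud-config
users:
  - name: ops
    ssh_authorized_keys:
      - ssh-ed25519 AAAA... ops@example
    sudo: ALL=(ALL) NOPASSWD:ALL
packages:
  - qemu-guest-agent
EOF
qm set 9000 --cicustom "user=local:snippets/userconfig.yaml"
```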
### Dump Generated Config
Use as a base for custom configurations:
```bash
qm cloudinit dump 9000 user
qm cloudinit dump 9000 network
qm cloudinit dump 9000 meta
```
## Cloud-Init Options Reference
### cicustom
Specify custom files to replace automatically generated ones:
- `meta=<volume>` - Meta data (provider specific)
- `network=<volume>` - Network data
- `user=<volume>` - User data
- `vendor=<volume>` - Vendor data
### cipassword
Password for the user. **Not recommended** - use SSH keys instead.
### citype
Configuration format: `configdrive2 | nocloud | opennebula`
- Default: `nocloud` for Linux, `configdrive2` for Windows
### ciupgrade
Automatic package upgrade after first boot (default: `true`)
### ciuser
Username to configure (instead of image's default user)
### ipconfig[n]
IP addresses and gateways for network interfaces.
Format: `[gw=<GatewayIPv4>] [,gw6=<GatewayIPv6>] [,ip=<IPv4Format/CIDR>] [,ip6=<IPv6Format/CIDR>]`
Special values:
- `ip=dhcp` - Use DHCP for IPv4
- `ip6=auto` - Use stateless autoconfiguration (requires cloud-init 19.4+)
### sshkeys
Public SSH keys (one per line, OpenSSH format)
### nameserver
DNS server IP address
### searchdomain
DNS search domains
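Pulled together, a hedged `qm set` that exercises most of these options on a cloned VM (values are placeholders, mostly reused from this document's examples; `--ciupgrade` needs a reasonably recent PVE release):
```bash
qm set 123 \
  --ciuser ops \
  --ciupgrade 1 \
  --nameserver 192.168.3.1 \
  --searchdomain spaceships.work \
  --ipconfig0 ip=192.168.3.100/24,gw=192.168.3.1 \
  --sshkeys ~/.ssh/id_rsa.pub
```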
## Best Practices
1. **Use SSH keys** instead of passwords for authentication
2. **Configure serial console** for cloud images (many require it)
3. **Set boot order** to speed up boot process
4. **Convert to template** for fast linked clone deployment
5. **Store custom configs in snippets** storage (must be on all nodes for migration)
6. **Test with a clone** before modifying template
## Troubleshooting
### Template Won't Boot
- Check if serial console is configured: `qm set <vmid> --serial0 socket --vga serial0`
- Verify boot order: `qm set <vmid> --boot order=scsi0`
### Network Not Configured
- Ensure cloud-init CD-ROM is attached: `qm set <vmid> --ide2 local-lvm:cloudinit`
- Check IP configuration: `qm config <vmid> | grep ipconfig`
### SSH Keys Not Working
- Verify sshkeys format (OpenSSH format, one per line)
- Check cloud-init logs in VM: `cat /var/log/cloud-init.log`


@@ -0,0 +1,373 @@
# Proxmox Network Configuration
*Source: <https://pve.proxmox.com/wiki/Network_Configuration>*
## Key Concepts
### Configuration File
All network configuration is in `/etc/network/interfaces`. GUI changes write to `/etc/network/interfaces.new` for safety.
### Applying Changes
**ifupdown2 (recommended):**
```bash
# Apply from GUI or run:
ifreload -a
```
**Reboot method:**
The `pvenetcommit` service activates the staging file (`/etc/network/interfaces.new`) on boot, before the `networking` service applies the configuration.
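A typical edit-and-apply cycle with ifupdown2 (interface names will differ per host):
```bash
# Edit the configuration, then apply and verify without rebooting
vi /etc/network/interfaces
ifreload -a
ip -br addr   # confirm addresses came up as expected
ip -br link   # confirm link state
```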
## Naming Conventions
### Current (Proxmox VE 5.0+)
- Ethernet: `en*` (systemd predictable names)
- `eno1` - first on-board NIC
- `enp3s0f1` - function 1 of NIC on PCI bus 3, slot 0
- Bridges: `vmbr[0-4094]`
- Bonds: `bond[N]`
- VLANs: Add VLAN number after period: `eno1.50`, `bond1.30`
### Legacy (pre-5.0)
- Ethernet: `eth[N]` (eth0, eth1, ...)
### Pinning Naming Scheme Version
Add to kernel command line to prevent name changes:
```bash
net.naming-scheme=v252
```
### Overriding Device Names
**Automatic tool:**
```bash
# Generate .link files for all interfaces
pve-network-interface-pinning generate
# With custom prefix
pve-network-interface-pinning generate --prefix myprefix
# Pin specific interface
pve-network-interface-pinning generate --interface enp1s0 --target-name if42
```
**Manual method** (`/etc/systemd/network/10-enwan0.link`):
```ini
[Match]
MACAddress=aa:bb:cc:dd:ee:ff
Type=ether
[Link]
Name=enwan0
```
After creating link files:
```bash
update-initramfs -u -k all
# Then reboot
```
## Network Setups
### Default Bridged Configuration
```bash
auto lo
iface lo inet loopback
iface eno1 inet manual
auto vmbr0
iface vmbr0 inet static
address 192.168.10.2/24
gateway 192.168.10.1
bridge-ports eno1
bridge-stp off
bridge-fd 0
```
VMs behave as if directly connected to physical network.
### Routed Configuration
For hosting providers that block multiple MACs:
```bash
auto lo
iface lo inet loopback
auto eno0
iface eno0 inet static
address 198.51.100.5/29
gateway 198.51.100.1
post-up echo 1 > /proc/sys/net/ipv4/ip_forward
post-up echo 1 > /proc/sys/net/ipv4/conf/eno0/proxy_arp
auto vmbr0
iface vmbr0 inet static
address 203.0.113.17/28
bridge-ports none
bridge-stp off
bridge-fd 0
```
### Masquerading (NAT)
For VMs with private IPs:
```bash
auto lo
iface lo inet loopback
auto eno1
iface eno1 inet static
address 198.51.100.5/24
gateway 198.51.100.1
auto vmbr0
iface vmbr0 inet static
address 10.10.10.1/24
bridge-ports none
bridge-stp off
bridge-fd 0
post-up echo 1 > /proc/sys/net/ipv4/ip_forward
post-up iptables -t nat -A POSTROUTING -s '10.10.10.0/24' -o eno1 -j MASQUERADE
post-down iptables -t nat -D POSTROUTING -s '10.10.10.0/24' -o eno1 -j MASQUERADE
```
**Conntrack zones fix** (if firewall blocks outgoing):
```bash
post-up iptables -t raw -I PREROUTING -i fwbr+ -j CT --zone 1
post-down iptables -t raw -D PREROUTING -i fwbr+ -j CT --zone 1
```
## Linux Bonding
### Bond Modes
1. **balance-rr** - Round-robin (load balancing + fault tolerance)
2. **active-backup** - Only one active NIC (fault tolerance only)
3. **balance-xor** - XOR selection (load balancing + fault tolerance)
4. **broadcast** - Transmit on all slaves (fault tolerance)
5. **802.3ad (LACP)** - IEEE 802.3ad Dynamic link aggregation (requires switch support)
6. **balance-tlb** - Adaptive transmit load balancing
7. **balance-alb** - Adaptive load balancing (balance-tlb + receive balancing)
**Recommendation:**
- If switch supports LACP → use 802.3ad
- Otherwise → use active-backup
### Bond with Fixed IP
```bash
auto lo
iface lo inet loopback
iface eno1 inet manual
iface eno2 inet manual
auto bond0
iface bond0 inet static
bond-slaves eno1 eno2
address 192.168.1.2/24
bond-miimon 100
bond-mode 802.3ad
bond-xmit-hash-policy layer2+3
auto vmbr0
iface vmbr0 inet static
address 10.10.10.2/24
gateway 10.10.10.1
bridge-ports eno3
bridge-stp off
bridge-fd 0
```
### Bond as Bridge Port
For fault-tolerant guest network:
```bash
auto lo
iface lo inet loopback
iface eno1 inet manual
iface eno2 inet manual
auto bond0
iface bond0 inet manual
bond-slaves eno1 eno2
bond-miimon 100
bond-mode 802.3ad
bond-xmit-hash-policy layer2+3
auto vmbr0
iface vmbr0 inet static
address 10.10.10.2/24
gateway 10.10.10.1
bridge-ports bond0
bridge-stp off
bridge-fd 0
```
## VLAN Configuration (802.1Q)
### VLAN Awareness on Bridge
**Guest VLANs** - Set the VLAN tag on the VM's network device; the VLAN-aware bridge handles the tagging transparently (see the example after the bridge config below).
**Bridge with VLAN awareness:**
```bash
auto vmbr0
iface vmbr0 inet manual
bridge-ports eno1
bridge-stp off
bridge-fd 0
bridge-vlan-aware yes
bridge-vids 2-4094
```
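With a VLAN-aware `vmbr0` like the one above, the tag is set on the guest NIC rather than on a separate per-VLAN bridge. A short example for an existing VM (VMID 101 and VLAN 30 reuse this document's examples):
```bash
# Put VM 101's first NIC on VLAN 30 via the VLAN-aware bridge
qm set 101 --net0 virtio,bridge=vmbr0,tag=30
# Confirm the tag landed in the VM config
qm config 101 | grep net0
```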
### Host Management on VLAN
**With VLAN-aware bridge:**
```bash
auto lo
iface lo inet loopback
iface eno1 inet manual
auto vmbr0.5
iface vmbr0.5 inet static
address 10.10.10.2/24
gateway 10.10.10.1
auto vmbr0
iface vmbr0 inet manual
bridge-ports eno1
bridge-stp off
bridge-fd 0
bridge-vlan-aware yes
bridge-vids 2-4094
```
**Traditional VLAN:**
```bash
auto lo
iface lo inet loopback
iface eno1 inet manual
iface eno1.5 inet manual
auto vmbr0v5
iface vmbr0v5 inet static
address 10.10.10.2/24
gateway 10.10.10.1
bridge-ports eno1.5
bridge-stp off
bridge-fd 0
auto vmbr0
iface vmbr0 inet manual
bridge-ports eno1
bridge-stp off
bridge-fd 0
```
### VLAN with Bonding
```bash
auto lo
iface lo inet loopback
iface eno1 inet manual
iface eno2 inet manual
auto bond0
iface bond0 inet manual
bond-slaves eno1 eno2
bond-miimon 100
bond-mode 802.3ad
bond-xmit-hash-policy layer2+3
iface bond0.5 inet manual
auto vmbr0v5
iface vmbr0v5 inet static
address 10.10.10.2/24
gateway 10.10.10.1
bridge-ports bond0.5
bridge-stp off
bridge-fd 0
auto vmbr0
iface vmbr0 inet manual
bridge-ports bond0
bridge-stp off
bridge-fd 0
```
## Advanced Features
### Disable MAC Learning
Available since Proxmox VE 7.3:
```bash
auto vmbr0
iface vmbr0 inet static
address 10.10.10.2/24
gateway 10.10.10.1
bridge-ports ens18
bridge-stp off
bridge-fd 0
bridge-disable-mac-learning 1
```
With MAC learning disabled, Proxmox VE itself adds the MAC addresses of VMs and containers to the bridge's forwarding database.
### Disable IPv6
Create `/etc/sysctl.d/disable-ipv6.conf`:
```ini
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
```
Then: `sysctl -p /etc/sysctl.d/disable-ipv6.conf`
## Troubleshooting
### Avoid ifup/ifdown
**Don't use** `ifup`/`ifdown` on bridges as they interrupt guest traffic without reconnecting.
**Use instead:**
- GUI "Apply Configuration" button
- `ifreload -a` command
- Reboot
### Network Changes Not Applied
1. Check `/etc/network/interfaces.new` exists
2. Click "Apply Configuration" in GUI or run `ifreload -a`
3. If issues persist, reboot
### Bond Not Working with Corosync
Some bond modes are problematic for Corosync. Use multiple networks instead of bonding for cluster traffic.


@@ -0,0 +1,467 @@
# QEMU Guest Agent Integration
## Overview
The QEMU Guest Agent (`qemu-guest-agent`) is a service running inside VMs that enables communication between Proxmox and the guest OS. It provides IP address detection, graceful shutdowns, filesystem freezing for snapshots, and more.
## Why Use QEMU Guest Agent?
**Without Guest Agent:**
- VM IP address unknown to Proxmox
- Shutdown = hard power off
- Snapshots don't freeze filesystem (risk of corruption)
- No guest-level monitoring
**With Guest Agent:**
- Automatic IP address detection
- Graceful shutdown/reboot
- Consistent snapshots with filesystem freeze
- Execute commands inside VM
- Query guest information (hostname, users, OS details)
## Installation in Guest VM
### Ubuntu/Debian
```bash
sudo apt update
sudo apt install qemu-guest-agent
sudo systemctl enable qemu-guest-agent
sudo systemctl start qemu-guest-agent
```
### RHEL/Rocky/AlmaLinux
```bash
sudo dnf install qemu-guest-agent
sudo systemctl enable qemu-guest-agent
sudo systemctl start qemu-guest-agent
```
### Verify Installation
```bash
systemctl status qemu-guest-agent
```
**Expected output:**
```text
● qemu-guest-agent.service - QEMU Guest Agent
Loaded: loaded (/lib/systemd/system/qemu-guest-agent.service; enabled)
Active: active (running)
```
## Enable in VM Configuration
### Via Proxmox Web UI
**VM → Hardware → Add → QEMU Guest Agent**
OR edit VM options:
**VM → Options → QEMU Guest Agent → Edit → Check "Use QEMU Guest Agent"**
### Via CLI
```bash
qm set <vmid> --agent 1
```
**With custom options:**
```bash
# Enable the agent and run fstrim after cloning/moving disks
qm set <vmid> --agent enabled=1,fstrim_cloned_disks=1
```
### Via Terraform
```hcl
resource "proxmox_vm_qemu" "vm" {
name = "my-vm"
# ... other config ...
agent = 1 # Enable guest agent
}
```
### Via Ansible
```yaml
- name: Enable QEMU guest agent
community.proxmox.proxmox_kvm:
api_host: "{{ proxmox_api_host }}"
api_user: "{{ proxmox_api_user }}"
api_token_id: "{{ proxmox_token_id }}"
api_token_secret: "{{ proxmox_token_secret }}"
node: foxtrot
vmid: 101
agent: 1
update: true
```
## Using Guest Agent
### Check Agent Status
**Via CLI:**
```bash
# Test if agent is responding
qm agent 101 ping
# Get guest info
qm agent 101 info
# Get network interfaces
qm agent 101 network-get-interfaces
# Get guest OS information
qm agent 101 get-osinfo
```
**Example output:**
```json
{
"result": {
"id": "ubuntu",
"kernel-release": "5.15.0-91-generic",
"kernel-version": "#101-Ubuntu SMP",
"machine": "x86_64",
"name": "Ubuntu",
"pretty-name": "Ubuntu 22.04.3 LTS",
"version": "22.04",
"version-id": "22.04"
}
}
```
### Execute Commands
**Via CLI:**
```bash
# Execute command in guest
qm guest exec 101 -- whoami
# With arguments
qm guest exec 101 -- ls -la /tmp
```
**Via Python API:**
```python
from proxmoxer import ProxmoxAPI
proxmox = ProxmoxAPI('192.168.3.5', user='root@pam', password='pass')
# Execute command
result = proxmox.nodes('foxtrot').qemu(101).agent.exec.post(
command=['whoami']
)
# Get execution result
pid = result['pid']
exec_status = proxmox.nodes('foxtrot').qemu(101).agent('exec-status').get(pid=pid)
print(exec_status)
```
### Graceful Shutdown/Reboot
**Shutdown (graceful with agent):**
```bash
# With the agent enabled, the shutdown request goes through the agent; otherwise ACPI is used
qm shutdown 101
# Force shutdown if doesn't complete in 60s
qm shutdown 101 --timeout 60 --forceStop 1
```
**Reboot:**
```bash
qm reboot 101
```
## Snapshot Integration
### Filesystem Freeze for Consistent Snapshots
When guest agent is enabled, Proxmox can freeze the filesystem before taking a snapshot, ensuring consistency.
**Create snapshot with FS freeze:**
```bash
# Guest agent automatically freezes filesystem
qm snapshot 101 before-upgrade --vmstate 0 --description "Before upgrade"
```
**Rollback to snapshot:**
```bash
qm rollback 101 before-upgrade
```
**Delete snapshot:**
```bash
qm delsnapshot 101 before-upgrade
```
## IP Address Detection
### Automatic IP Assignment
With guest agent, Proxmox automatically detects VM IP addresses.
**View in Web UI:**
VM → Summary → IPs section shows detected IPs
**Via CLI:**
```bash
qm agent 101 network-get-interfaces | jq '.result[] | select(.name=="eth0") | ."ip-addresses"'
```
**Via Python:**
```python
interfaces = proxmox.nodes('foxtrot').qemu(101).agent('network-get-interfaces').get()
for iface in interfaces['result']:
if iface['name'] == 'eth0':
for ip in iface.get('ip-addresses', []):
if ip['ip-address-type'] == 'ipv4':
print(f"IPv4: {ip['ip-address']}")
```
## Advanced Configuration
### Guest Agent Options
**Full options syntax:**
```bash
qm set <vmid> --agent [enabled=]<1|0>[,fstrim_cloned_disks=<1|0>][,type=<virtio|isa>]
```
**Parameters:**
- `enabled` - Enable/disable guest agent communication (default: 0, i.e. disabled unless set)
- `fstrim_cloned_disks` - Run fstrim after cloning disk (default: 0)
- `type` - Agent communication type: virtio or isa (default: virtio)
**Example:**
```bash
# Enable with fstrim on cloned disks
qm set 101 --agent enabled=1,fstrim_cloned_disks=1
```
### Filesystem Trim (fstrim)
For VMs on thin-provisioned storage (LVM-thin, CEPH), fstrim helps reclaim unused space.
**Manual fstrim:**
```bash
# Inside VM
sudo fstrim -av
```
**Automatic on clone:**
```bash
qm set <vmid> --agent enabled=1,fstrim_cloned_disks=1
```
**Scheduled fstrim (inside VM):**
```bash
# Enable weekly fstrim timer
sudo systemctl enable fstrim.timer
sudo systemctl start fstrim.timer
```
## Cloud-Init Integration
### Include in Cloud-Init Template
**During template creation:**
```bash
# Install agent package
virt-customize -a ubuntu-22.04.img \
--install qemu-guest-agent \
--run-command "systemctl enable qemu-guest-agent"
# Create VM from image
qm create 9000 --name ubuntu-template --memory 2048 --cores 2 --net0 virtio,bridge=vmbr0
qm importdisk 9000 ubuntu-22.04.img local-lvm
qm set 9000 --scsihw virtio-scsi-pci --scsi0 local-lvm:vm-9000-disk-0
qm set 9000 --agent 1 # Enable guest agent
qm set 9000 --ide2 local-lvm:cloudinit
qm template 9000
```
### Cloud-Init User Data
**Include in cloud-init config:**
```yaml
#cloud-config
packages:
- qemu-guest-agent
runcmd:
- systemctl enable qemu-guest-agent
- systemctl start qemu-guest-agent
```
## Troubleshooting
### Guest Agent Not Responding
**1. Check if service is running in guest:**
```bash
# Inside VM
systemctl status qemu-guest-agent
journalctl -u qemu-guest-agent
```
**2. Check if agent is enabled in VM config:**
```bash
# On Proxmox host
qm config 101 | grep agent
```
**3. Check virtio serial device:**
```bash
# Inside VM
ls -l /dev/virtio-ports/
# Should show: org.qemu.guest_agent.0
```
**4. Restart agent:**
```bash
# Inside VM
sudo systemctl restart qemu-guest-agent
```
**5. Check Proxmox can communicate:**
```bash
# On Proxmox host
qm agent 101 ping
```
### IP Address Not Detected
**Possible causes:**
1. Guest agent not running
2. Network interface not configured
3. DHCP not assigning IP
4. Firewall blocking communication
**Debug:**
```bash
# Check all interfaces
qm agent 101 network-get-interfaces | jq
# Verify cloud-init completed
# Inside VM
cloud-init status
```
### Filesystem Freeze Timeout
**Symptoms:**
Snapshot creation hangs or times out.
**Solution:**
```bash
# Inside the VM: restart the agent if the freeze request hangs, then retry
sudo systemctl restart qemu-guest-agent
# On the host: take the snapshot without saving RAM state
qm snapshot 101 test --vmstate 0
# For backups, filesystem freeze can also be disabled via the agent option
# (freeze-fs-on-backup=0, available on recent PVE releases):
# qm set 101 --agent enabled=1,freeze-fs-on-backup=0
```
### Agent Installed but Not Enabled
**Check VM config:**
```bash
qm config 101 | grep agent
```
**If missing, enable:**
```bash
qm set 101 --agent 1
```
**Restart VM for changes to take effect:**
```bash
qm reboot 101
```
## Best Practices
1. **Always install in templates** - Include qemu-guest-agent in VM templates
2. **Enable during provisioning** - Set `--agent 1` when creating VMs
3. **Use for production VMs** - Critical for graceful shutdowns and monitoring
4. **Enable fstrim for thin storage** - Helps reclaim space on LVM-thin and CEPH
5. **Test before snapshots** - Verify agent works: `qm agent <vmid> ping`
6. **Cloud-init integration** - Automate installation via cloud-init packages
7. **Monitor agent status** - Check that the agent is running from your monitoring tooling (see the sketch after this list)
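A small loop that pings the agent on every running VM on the local node, as referenced in the list above. Treat it as a sketch, not a monitoring solution; it assumes the usual `qm list` column layout (VMID, name, status, ...):
```bash
# Ping the guest agent of every running VM on this node
for vmid in $(qm list | awk 'NR>1 && $3=="running" {print $1}'); do
    if qm agent "$vmid" ping >/dev/null 2>&1; then
        echo "VM $vmid: agent responding"
    else
        echo "VM $vmid: agent NOT responding"
    fi
done
```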
## Ansible Automation Example
```yaml
---
- name: Ensure QEMU guest agent is configured
hosts: proxmox_vms
become: true
tasks:
- name: Install qemu-guest-agent
ansible.builtin.apt:
name: qemu-guest-agent
state: present
when: ansible_os_family == "Debian"
- name: Enable and start qemu-guest-agent
ansible.builtin.systemd:
name: qemu-guest-agent
enabled: true
state: started
- name: Verify agent is running
ansible.builtin.systemd:
name: qemu-guest-agent
register: agent_status
- name: Report agent status
ansible.builtin.debug:
msg: "Guest agent is {{ agent_status.status.ActiveState }}"
```
## Further Reading
- [Proxmox QEMU Guest Agent Documentation](https://pve.proxmox.com/wiki/Qemu-guest-agent)
- [QEMU Guest Agent Protocol](https://www.qemu.org/docs/master/interop/qemu-ga.html)


@@ -0,0 +1,486 @@
# Proxmox Storage Management
## Overview
Proxmox VE supports multiple storage backends. This guide focuses on the storage architecture of the Matrix cluster: LVM-thin for boot disks and CEPH for distributed storage.
## Matrix Cluster Storage Architecture
### Hardware Configuration
**Per Node (Foxtrot, Golf, Hotel):**
```text
nvme0n1 - 1TB Crucial P3 → Boot disk + LVM
nvme1n1 - 4TB Samsung 990 PRO → CEPH OSD (2 OSDs)
nvme2n1 - 4TB Samsung 990 PRO → CEPH OSD (2 OSDs)
```
**Total Cluster:**
- 3× 1TB boot disks (LVM local storage)
- 6× 4TB NVMe drives (24TB raw CEPH capacity)
- 12 CEPH OSDs total (2 per NVMe drive)
### Storage Pools
```text
Storage Pool Type Backend Purpose
------------- ---- ------- -------
local dir Directory ISO images, templates, backups
local-lvm lvmthin LVM-thin VM disks (local)
ceph-pool rbd CEPH RBD VM disks (distributed, HA)
ceph-fs cephfs CephFS Shared filesystem
```
## LVM Storage
### LVM-thin Configuration
**Advantages:**
- Thin provisioning (overcommit storage)
- Fast snapshots
- Local to each node (low latency)
- No network overhead
**Disadvantages:**
- No HA (tied to single node)
- No live migration with storage
- Limited to node's local disk size
**Check LVM usage:**
```bash
# View volume groups
vgs
# View logical volumes
lvs
# View thin pool usage
lvs -a | grep thin
```
**Example output:**
```text
LV VG Attr LSize Pool Origin Data%
data pve twi-aotz-- 850.00g 45.23
vm-101-disk-0 pve Vwi-aotz-- 50.00g data 12.45
```
### Managing LVM Storage
**Extend thin pool (if boot disk has space):**
```bash
# Check free space in VG
vgs pve
# Extend thin pool
lvextend -L +100G pve/data
```
**Create VM disk manually:**
```bash
# Create 50GB disk for VM 101
lvcreate -V 50G -T pve/data -n vm-101-disk-0
```
## CEPH Storage
### CEPH Architecture for Matrix
**Network Configuration:**
```text
vmbr1 (192.168.5.0/24, MTU 9000) → CEPH Public Network
vmbr2 (192.168.7.0/24, MTU 9000) → CEPH Private Network
```
**OSD Distribution:**
```text
Node NVMe OSDs Capacity
------- ------ ---- --------
foxtrot nvme1n1 2 4TB
foxtrot nvme2n1 2 4TB
golf nvme1n1 2 4TB
golf nvme2n1 2 4TB
hotel nvme1n1 2 4TB
hotel nvme2n1 2 4TB
------- ------ ---- --------
Total 12 24TB raw
```
**Usable capacity (replica 3):** ~8TB
### CEPH Deployment Commands
**Install CEPH:**
```bash
# On first node (foxtrot)
pveceph install --version reef
# Initialize cluster
pveceph init --network 192.168.5.0/24 --cluster-network 192.168.7.0/24
```
**Create Monitors (3 for quorum):**
```bash
# On each node
pveceph mon create
```
**Create Manager (on each node):**
```bash
pveceph mgr create
```
**Create OSDs:**
```bash
# On each node. Each pveceph command below creates one OSD; to get the
# 2-OSDs-per-NVMe layout described above, split each drive instead
# (e.g. ceph-volume lvm batch --osds-per-device 2 /dev/nvme1n1)
# For nvme1n1 (4TB)
pveceph osd create /dev/nvme1n1 --crush-device-class nvme
# For nvme2n1 (4TB)
pveceph osd create /dev/nvme2n1 --crush-device-class nvme
```
**Create CEPH Pool:**
```bash
# Create RBD pool for VMs
pveceph pool create ceph-pool --add_storages
# Create CephFS for shared storage
pveceph fs create --name cephfs --add-storage
```
### CEPH Configuration Best Practices
**Optimize for NVMe:**
```bash
# /etc/pve/ceph.conf
[global]
public_network = 192.168.5.0/24
cluster_network = 192.168.7.0/24
osd_pool_default_size = 3
osd_pool_default_min_size = 2
[osd]
osd_memory_target = 4294967296 # 4GB per OSD
osd_max_backfills = 1
osd_recovery_max_active = 1
```
**Restart CEPH services after config change:**
```bash
systemctl restart ceph-osd@*.service
```
### CEPH Monitoring
**Check cluster health:**
```bash
ceph status
ceph health detail
```
**Example healthy output:**
```text
cluster:
id: a1b2c3d4-e5f6-7890-abcd-ef1234567890
health: HEALTH_OK
services:
mon: 3 daemons, quorum foxtrot,golf,hotel
mgr: foxtrot(active), standbys: golf, hotel
osd: 12 osds: 12 up, 12 in
data:
pools: 2 pools, 128 pgs
objects: 1.23k objects, 45 GiB
usage: 135 GiB used, 23.8 TiB / 24 TiB avail
pgs: 128 active+clean
```
**Check OSD performance:**
```bash
ceph osd df
ceph osd perf
```
**Check pool usage:**
```bash
ceph df
rados df
```
## Storage Configuration in Proxmox
### Add Storage via Web UI
**Datacenter → Storage → Add:**
1. **Directory** - For ISOs and backups
2. **LVM-Thin** - For local VM disks
3. **RBD** - For CEPH VM disks
4. **CephFS** - For shared files
### Add Storage via CLI
**CEPH RBD:**
```bash
pvesm add rbd ceph-pool \
--pool ceph-pool \
--content images,rootdir \
--nodes foxtrot,golf,hotel
```
**CephFS:**
```bash
pvesm add cephfs cephfs \
--path /mnt/pve/cephfs \
--content backup,iso,vztmpl \
--nodes foxtrot,golf,hotel
```
**NFS (if using external NAS):**
```bash
pvesm add nfs nas-storage \
--server 192.168.3.10 \
--export /mnt/tank/proxmox \
--content images,backup,iso \
--nodes foxtrot,golf,hotel
```
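After adding a storage entry, confirm the node actually sees it as active:
```bash
# Show all storages with status and usage as seen from this node
pvesm status
# Limit the output to a single storage
pvesm status --storage ceph-pool
```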
## VM Disk Management
### Create VM Disk on CEPH
**Via CLI:**
```bash
# Create 100GB disk for VM 101 on CEPH
qm set 101 --scsi1 ceph-pool:100
```
**Via API (Python):**
```python
from proxmoxer import ProxmoxAPI
proxmox = ProxmoxAPI('192.168.3.5', user='root@pam', password='pass')
proxmox.nodes('foxtrot').qemu(101).config.put(scsi1='ceph-pool:100')
```
### Move VM Disk Between Storage
**Move from local-lvm to CEPH:**
```bash
qm move-disk 101 scsi0 ceph-pool --delete 1
```
**Move while the VM is running:**
```bash
# Disk moves on a running VM happen live (QEMU drive mirror); no extra flag is needed
qm move-disk 101 scsi0 ceph-pool --delete 1
```
### Resize VM Disk
**Grow disk (can't shrink):**
```bash
# Grow VM 101's scsi0 by 50GB
qm resize 101 scsi0 +50G
```
**Inside VM (grow the partition, then the filesystem):**
```bash
# Grow partition 1 to fill the disk (growpart is part of cloud-guest-utils)
sudo growpart /dev/sda 1
# For ext4
sudo resize2fs /dev/sda1
# For XFS
sudo xfs_growfs /
```
## Backup and Restore
### Backup to Storage
**Create backup:**
```bash
# Backup VM 101 to local storage
vzdump 101 --storage local --mode snapshot --compress zstd
# Backup to CephFS
vzdump 101 --storage cephfs --mode snapshot --compress zstd
```
**Scheduled backups (via Web UI):**
Datacenter → Backup → Add:
- Schedule: Daily at 2 AM
- Storage: cephfs
- Mode: Snapshot
- Compression: ZSTD
- Retention: Keep last 7
### Restore from Backup
**List backups:**
```bash
ls /var/lib/vz/dump/
# OR
ls /mnt/pve/cephfs/dump/
```
**Restore:**
```bash
# Restore to same VMID
qmrestore /var/lib/vz/dump/vzdump-qemu-101-2024_01_15-02_00_00.vma.zst 101
# Restore to new VMID
qmrestore /var/lib/vz/dump/vzdump-qemu-101-2024_01_15-02_00_00.vma.zst 102 --storage ceph-pool
```
## Performance Tuning
### CEPH Performance
**For NVMe OSDs:**
```bash
# Set proper device class
ceph osd crush set-device-class nvme osd.0
ceph osd crush set-device-class nvme osd.1
# ... repeat for all OSDs
```
**Create performance pool:**
```bash
ceph osd pool create fast-pool 128 128
ceph osd pool application enable fast-pool rbd
```
**Enable RBD cache:**
```bash
# /etc/pve/ceph.conf
[client]
rbd_cache = true
rbd_cache_size = 134217728 # 128MB
rbd_cache_writethrough_until_flush = false
```
### LVM Performance
**Use SSD discard:**
```bash
# Enable discard on VM disk
qm set 101 --scsi0 local-lvm:vm-101-disk-0,discard=on,ssd=1
```
## Troubleshooting
### CEPH Not Healthy
**Check OSD status:**
```bash
ceph osd tree
ceph osd stat
```
**Restart stuck OSD:**
```bash
systemctl restart ceph-osd@0.service
```
**Check network connectivity:**
```bash
# From one node to another
ping -c 3 -M do -s 8972 192.168.5.6 # Test MTU 9000
```
### LVM Out of Space
**Check thin pool usage:**
```bash
lvs pve/data -o lv_name,data_percent,metadata_percent
```
**If thin pool > 90% full:**
```bash
# Extend if VG has space
lvextend -L +100G pve/data
# OR delete unused VM disks
lvremove pve/vm-XXX-disk-0
```
### Storage Performance Issues
**Test disk I/O:**
```bash
# Test sequential write
dd if=/dev/zero of=/tmp/test bs=1M count=1024 oflag=direct
# Test CEPH RBD performance (create the scratch image first: rbd create -s 10G ceph-pool/test-image)
rbd bench --io-type write ceph-pool/test-image
```
**Monitor CEPH latency:**
```bash
ceph osd perf
```
## Best Practices
1. **Use CEPH for HA VMs** - Store critical VM disks on CEPH for live migration
2. **Use LVM for performance** - Non-critical VMs get better performance on local LVM
3. **MTU 9000 for CEPH** - Always use jumbo frames on CEPH networks
4. **Separate networks** - Public and private CEPH networks on different interfaces
5. **Monitor CEPH health** - Set up alerts for HEALTH_WARN/HEALTH_ERR (see the sketch after this list)
6. **Regular backups** - Automated daily backups to CephFS or external NAS
7. **Plan for growth** - Leave 20% free space in CEPH for rebalancing
8. **Use replica 3** - Essential for data safety, especially with only 3 nodes
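A throwaway health-check sketch for the alerting item above, assuming local mail delivery works (swap the `mail` call and recipient for whatever notification path the cluster actually uses):
```bash
#!/usr/bin/env bash
# Cron-friendly CEPH health check; only notifies when the cluster is not HEALTH_OK
STATUS=$(ceph health)
if [ "$STATUS" != "HEALTH_OK" ]; then
    echo "$(date): $STATUS" | mail -s "CEPH health warning on $(hostname)" admin@example.com
fi
```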
## Further Reading
- [Proxmox VE Storage Documentation](https://pve.proxmox.com/wiki/Storage)
- [CEPH Documentation](https://docs.ceph.com/)
- [Proxmox CEPH Guide](https://pve.proxmox.com/wiki/Deploy_Hyper-Converged_Ceph_Cluster)