Initial commit

skills/proxmox-infrastructure/reference/api-reference.md

# Proxmox API Reference

## Overview

The Proxmox API enables programmatic management of the cluster via REST. This reference focuses on common patterns for Python (proxmoxer) and Terraform/Ansible usage.

## Authentication Methods

### API Tokens (Recommended)

**Create API token via CLI:**

```bash
pveum user token add <user>@<realm> <token-id> --privsep 0
```

**Environment variables:**

```bash
export PROXMOX_VE_API_TOKEN="user@realm!token-id=secret"
export PROXMOX_VE_ENDPOINT="https://192.168.3.5:8006"
```
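
A quick way to confirm a token works is a raw REST call. A minimal sketch using `requests` against the version endpoint (the `PVEAPIToken` header format is the documented token scheme; `verify=False` matches the self-signed-certificate setup used throughout this guide):

```python
#!/usr/bin/env python3
"""Verify an API token by querying the cluster version."""
import os

import requests

endpoint = os.environ["PROXMOX_VE_ENDPOINT"]   # e.g. https://192.168.3.5:8006
token = os.environ["PROXMOX_VE_API_TOKEN"]     # user@realm!token-id=secret

resp = requests.get(
    f"{endpoint}/api2/json/version",
    headers={"Authorization": f"PVEAPIToken={token}"},
    verify=False,  # self-signed cert; use a proper CA bundle in production
)
resp.raise_for_status()
print(resp.json()["data"])
```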

### Password Authentication

```bash
export PROXMOX_VE_USERNAME="root@pam"
export PROXMOX_VE_PASSWORD="password"
export PROXMOX_VE_ENDPOINT="https://192.168.3.5:8006"
```

## Python API Usage (proxmoxer)

### Installation

Install with `pip install proxmoxer requests`, or declare the dependencies with uv inline script metadata as the scripts below do:

```python
# /// script
# dependencies = ["proxmoxer", "requests"]
# ///
```

### Basic Connection

```python
#!/usr/bin/env python3
# /// script
# dependencies = ["proxmoxer", "requests"]
# ///

import os

from proxmoxer import ProxmoxAPI

# Connect using an API token. proxmoxer wants a bare hostname, so strip the
# scheme and port from the endpoint URL. PROXMOX_VE_TOKEN_NAME and
# PROXMOX_VE_TOKEN_VALUE are the token-id and secret components of the
# combined PROXMOX_VE_API_TOKEN string shown above.
proxmox = ProxmoxAPI(
    os.getenv("PROXMOX_VE_ENDPOINT").replace("https://", "").replace(":8006", ""),
    user=os.getenv("PROXMOX_VE_USERNAME"),
    token_name=os.getenv("PROXMOX_VE_TOKEN_NAME"),
    token_value=os.getenv("PROXMOX_VE_TOKEN_VALUE"),
    verify_ssl=False
)

# OR using password
proxmox = ProxmoxAPI(
    '192.168.3.5',
    user='root@pam',
    password=os.getenv("PROXMOX_VE_PASSWORD"),
    verify_ssl=False
)
```

### Common Operations

**List VMs:**

```python
# Get all VMs across cluster
for node in proxmox.nodes.get():
    node_name = node['node']
    for vm in proxmox.nodes(node_name).qemu.get():
        print(f"VM {vm['vmid']}: {vm['name']} on {node_name} - {vm['status']}")
```

**Get VM Configuration:**

```python
vmid = 101
node = "foxtrot"

vm_config = proxmox.nodes(node).qemu(vmid).config.get()
print(f"VM {vmid} config: {vm_config}")
```

**Clone Template:**

```python
import time

template_id = 9000
new_vmid = 101
node = "foxtrot"

# Clone template; the API returns the task's UPID
upid = proxmox.nodes(node).qemu(template_id).clone.post(
    newid=new_vmid,
    name="docker-01-nexus",
    full=1,  # Full clone (not linked)
    storage="local-lvm"
)

# Wait for the clone task to complete. Scanning the task list by VMID is
# racy (the task may not be listed yet); polling the returned UPID is reliable.
while True:
    status = proxmox.nodes(node).tasks(upid).status.get()
    if status['status'] == 'stopped':
        if status.get('exitstatus') != 'OK':
            raise RuntimeError(f"Clone failed: {status.get('exitstatus')}")
        break
    time.sleep(2)
```

**Update VM Configuration:**

```python
from urllib.parse import quote

# Set cloud-init parameters. SSH keys must be URL-encoded when set via the API.
proxmox.nodes(node).qemu(vmid).config.put(
    ipconfig0="ip=192.168.3.100/24,gw=192.168.3.1",
    nameserver="192.168.3.1",
    searchdomain="spaceships.work",
    sshkeys=quote("ssh-rsa AAAA...", safe="")
)
```

**Start/Stop VM:**

```python
# Start VM
proxmox.nodes(node).qemu(vmid).status.start.post()

# Stop VM (graceful)
proxmox.nodes(node).qemu(vmid).status.shutdown.post()

# Force stop
proxmox.nodes(node).qemu(vmid).status.stop.post()
```

**Delete VM:**

```python
proxmox.nodes(node).qemu(vmid).delete()
```

### Cluster Operations

**Get Cluster Status:**

```python
cluster_status = proxmox.cluster.status.get()
for node in cluster_status:
    if node['type'] == 'node':
        print(f"Node: {node['name']} - online={node['online']}")
```

**Get Node Resources:**

```python
node_status = proxmox.nodes(node).status.get()
print(f"CPU: {node_status['cpu']*100:.1f}%")
print(f"Memory: {node_status['memory']['used']/1024**3:.1f}GB / {node_status['memory']['total']/1024**3:.1f}GB")
```
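
A common follow-on is choosing a placement target. A small sketch using the `mem`/`maxmem` fields from the node list (a simple heuristic, not a Proxmox scheduler feature):

```python
# Pick the online node with the most free memory
nodes = [n for n in proxmox.nodes.get() if n.get('status') == 'online']
target = max(nodes, key=lambda n: n.get('maxmem', 0) - n.get('mem', 0))
print(f"Least loaded node: {target['node']}")
```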

### Storage Operations

**List Storage:**

```python
# Datacenter-level storage definitions (configuration only)
for storage in proxmox.storage.get():
    print(f"Storage: {storage['storage']} - Type: {storage['type']}")

# Node-level storage status includes usage and the active flag
for storage in proxmox.nodes(node).storage.get():
    print(f"{storage['storage']}: active={storage.get('active')}")
```

**Get Storage Content:**

```python
storage = "local-lvm"

# Content listings are node-scoped: /nodes/{node}/storage/{storage}/content
content = proxmox.nodes(node).storage(storage).content.get()
for item in content:
    print(f"{item['volid']} - {item.get('vmid', 'N/A')} - {item['size']/1024**3:.1f}GB")
```

## Terraform Provider Patterns

### Basic Resource (VM from Clone)

```hcl
resource "proxmox_vm_qemu" "docker_host" {
  name        = "docker-01-nexus"
  target_node = "foxtrot"
  vmid        = 101

  clone      = "ubuntu-template"
  full_clone = true

  cores   = 4
  memory  = 8192
  sockets = 1

  network {
    bridge = "vmbr0"
    model  = "virtio"
    tag    = 30 # VLAN 30
  }

  disk {
    storage = "local-lvm"
    type    = "scsi"
    size    = "50G"
  }

  ipconfig0 = "ip=192.168.3.100/24,gw=192.168.3.1"

  # file() does not expand "~", so wrap the path in pathexpand()
  sshkeys = file(pathexpand("~/.ssh/id_rsa.pub"))
}
```

### Data Sources

Data source names and availability differ between the community providers (e.g. Telmate/proxmox vs bpg/proxmox); check your provider's documentation.

```hcl
# Get template information
data "proxmox_vm_qemu" "template" {
  name        = "ubuntu-template"
  target_node = "foxtrot"
}

# Get storage information
data "proxmox_storage" "local_lvm" {
  node    = "foxtrot"
  storage = "local-lvm"
}
```

## Ansible Module Patterns

### Create VM from Template

```yaml
- name: Clone template to create VM
  community.proxmox.proxmox_kvm:
    api_host: "{{ proxmox_api_host }}"
    api_user: "{{ proxmox_api_user }}"
    api_token_id: "{{ proxmox_token_id }}"
    api_token_secret: "{{ proxmox_token_secret }}"
    node: foxtrot
    vmid: 101
    name: docker-01-nexus
    clone: ubuntu-template
    full: true
    storage: local-lvm
    net:
      net0: 'virtio,bridge=vmbr0,tag=30'
    ipconfig:
      ipconfig0: 'ip=192.168.3.100/24,gw=192.168.3.1'
    cores: 4
    memory: 8192
    agent: 1
    state: present
```

### Start VM

```yaml
- name: Start VM
  community.proxmox.proxmox_kvm:
    api_host: "{{ proxmox_api_host }}"
    api_user: "{{ proxmox_api_user }}"
    api_token_id: "{{ proxmox_token_id }}"
    api_token_secret: "{{ proxmox_token_secret }}"
    node: foxtrot
    vmid: 101
    state: started
```

## Matrix Cluster Specifics

### Node IP Addresses

```python
MATRIX_NODES = {
    "foxtrot": "192.168.3.5",
    "golf": "192.168.3.6",
    "hotel": "192.168.3.7"
}
```

### Storage Pools

```python
STORAGE_POOLS = {
    "local": "dir",          # Local directory
    "local-lvm": "lvmthin",  # LVM thin on boot disk
    "ceph-pool": "rbd"       # CEPH RBD (when configured)
}
```

### Network Bridges

```python
BRIDGES = {
    "vmbr0": "192.168.3.0/24",  # Management + VLAN 9 (Corosync)
    "vmbr1": "192.168.5.0/24",  # CEPH Public (MTU 9000)
    "vmbr2": "192.168.7.0/24"   # CEPH Private (MTU 9000)
}
```
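
A small helper tying these constants to the connection pattern from earlier (a sketch; it assumes the token environment variables defined in the authentication section):

```python
import os

from proxmoxer import ProxmoxAPI

def connect(node_name: str) -> ProxmoxAPI:
    """Connect to a specific Matrix node by name."""
    return ProxmoxAPI(
        MATRIX_NODES[node_name],
        user=os.getenv("PROXMOX_VE_USERNAME"),
        token_name=os.getenv("PROXMOX_VE_TOKEN_NAME"),
        token_value=os.getenv("PROXMOX_VE_TOKEN_VALUE"),
        verify_ssl=False,
    )

proxmox = connect("foxtrot")
```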

## Error Handling

### Python Example

```python
import sys

from proxmoxer import ProxmoxAPI, ResourceException

try:
    proxmox = ProxmoxAPI('192.168.3.5', user='root@pam', password='pass', verify_ssl=False)
    vm_config = proxmox.nodes('foxtrot').qemu(101).config.get()
except ResourceException as e:
    print(f"API Error: {e}", file=sys.stderr)
    sys.exit(1)
except Exception as e:
    print(f"Unexpected error: {e}", file=sys.stderr)
    sys.exit(1)
```

### Ansible Example

```yaml
- name: Clone VM with error handling
  community.proxmox.proxmox_kvm:
    api_host: "{{ proxmox_api_host }}"
    # ... config ...
  register: clone_result
  ignore_errors: true  # failed_when: false would mask the failure entirely

- name: Check clone result
  ansible.builtin.fail:
    msg: "Failed to clone VM: {{ clone_result.msg }}"
  when: clone_result.failed
```

## API Endpoints Reference

### Common Endpoints

```text
GET    /api2/json/nodes                                     # List nodes
GET    /api2/json/nodes/{node}/qemu                         # List VMs on node
GET    /api2/json/nodes/{node}/qemu/{vmid}/status/current   # Get VM status
POST   /api2/json/nodes/{node}/qemu/{vmid}/clone            # Clone VM
PUT    /api2/json/nodes/{node}/qemu/{vmid}/config           # Update config
POST   /api2/json/nodes/{node}/qemu/{vmid}/status/start     # Start VM
POST   /api2/json/nodes/{node}/qemu/{vmid}/status/shutdown  # Shut down VM
DELETE /api2/json/nodes/{node}/qemu/{vmid}                  # Delete VM

GET    /api2/json/cluster/status                            # Cluster status
GET    /api2/json/storage                                   # List storage
```

## Best Practices

1. **Use API tokens** - More secure than password authentication
2. **Handle SSL properly** - Use `verify_ssl=True` with a proper CA cert in production
3. **Check task completion** - Clone/migrate operations are async; poll the returned UPID for completion
4. **Error handling** - Always catch `ResourceException` and provide meaningful errors
5. **Rate limiting** - Don't hammer the API; add delays in loops
6. **Idempotency** - Check if a resource exists before creating it (see the sketch below)
7. **Use VMID ranges** - Reserve ranges for different purposes (templates: 9000-9999, VMs: 100-999)
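
A minimal existence check for the idempotency point above, using the cluster resources endpoint (a sketch, not a substitute for proper state management):

```python
def vm_exists(proxmox, vmid: int) -> bool:
    """Return True if the VMID is already in use anywhere in the cluster."""
    return any(int(r['vmid']) == vmid
               for r in proxmox.cluster.resources.get(type='vm'))

if not vm_exists(proxmox, 101):
    # safe to clone/create VM 101 here
    pass
```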

## Further Reading

- [Proxmox VE API Documentation](https://pve.proxmox.com/pve-docs/api-viewer/)
- [proxmoxer GitHub](https://github.com/proxmoxer/proxmoxer)
- [community.proxmox Collection](https://docs.ansible.com/ansible/latest/collections/community/proxmox/)

skills/proxmox-infrastructure/reference/cloud-init-patterns.md

# Cloud-Init Patterns for Proxmox VE

*Source: <https://pve.proxmox.com/wiki/Cloud-Init_Support>*

## Overview

Cloud-Init is the de facto multi-distribution package that handles early initialization of virtual machines. When a VM starts for the first time, Cloud-Init applies the network and SSH key settings configured on the hypervisor.

## Template Creation Workflow

### Download and Import Cloud Image

```bash
# Download Ubuntu cloud image
wget https://cloud-images.ubuntu.com/bionic/current/bionic-server-cloudimg-amd64.img

# Create VM with VirtIO SCSI controller
qm create 9000 --memory 2048 --net0 virtio,bridge=vmbr0 --scsihw virtio-scsi-pci

# Import disk to storage
qm set 9000 --scsi0 local-lvm:0,import-from=/path/to/bionic-server-cloudimg-amd64.img
```

**Important**: Ubuntu Cloud-Init images require the `virtio-scsi-pci` controller type for SCSI drives.

### Configure Cloud-Init Components

```bash
# Add Cloud-Init CD-ROM drive
qm set 9000 --ide2 local-lvm:cloudinit

# Set boot order (speeds up boot)
qm set 9000 --boot order=scsi0

# Configure serial console (required for many cloud images)
qm set 9000 --serial0 socket --vga serial0

# Convert to template
qm template 9000
```

## Deploying from Templates

### Clone Template

```bash
# Clone template to new VM
qm clone 9000 123 --name ubuntu2
```

### Configure VM

```bash
# Set SSH public key (the option is plural: --sshkeys)
qm set 123 --sshkeys ~/.ssh/id_rsa.pub

# Configure network
qm set 123 --ipconfig0 ip=10.0.10.123/24,gw=10.0.10.1
```
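
The same deployment can be scripted with proxmoxer. A minimal sketch (connection details as in the API reference; note that the API expects the SSH key material URL-encoded):

```python
from urllib.parse import quote

from proxmoxer import ProxmoxAPI

proxmox = ProxmoxAPI('192.168.3.5', user='root@pam', password='...', verify_ssl=False)

with open('/root/.ssh/id_rsa.pub') as f:
    pubkey = f.read().strip()

proxmox.nodes('foxtrot').qemu(123).config.put(
    ipconfig0='ip=10.0.10.123/24,gw=10.0.10.1',
    sshkeys=quote(pubkey, safe=''),
)
```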

## Custom Cloud-Init Configuration

### Using Custom Config Files

Proxmox allows custom cloud-init configurations via the `cicustom` option:

```bash
qm set 9000 --cicustom "user=<volume>,network=<volume>,meta=<volume>"
```

Example using local snippets storage:

```bash
qm set 9000 --cicustom "user=local:snippets/userconfig.yaml"
```

### Dump Generated Config

Use the generated config as a base for custom configurations:

```bash
qm cloudinit dump 9000 user
qm cloudinit dump 9000 network
qm cloudinit dump 9000 meta
```

## Cloud-Init Options Reference

### cicustom

Specify custom files to replace the automatically generated ones:

- `meta=<volume>` - Meta data (provider specific)
- `network=<volume>` - Network data
- `user=<volume>` - User data
- `vendor=<volume>` - Vendor data

### cipassword

Password for the user. **Not recommended** - use SSH keys instead.

### citype

Configuration format: `configdrive2 | nocloud | opennebula`

- Default: `nocloud` for Linux, `configdrive2` for Windows

### ciupgrade

Automatic package upgrade after first boot (default: `true`)

### ciuser

Username to configure (instead of the image's default user)

### ipconfig[n]

IP addresses and gateways for network interfaces.

Format: `[gw=<GatewayIPv4>] [,gw6=<GatewayIPv6>] [,ip=<IPv4Format/CIDR>] [,ip6=<IPv6Format/CIDR>]`

Special values:

- `ip=dhcp` - Use DHCP for IPv4
- `ip6=auto` - Use stateless autoconfiguration (requires cloud-init 19.4+)

### sshkeys

Public SSH keys (one per line, OpenSSH format)

### nameserver

DNS server IP address

### searchdomain

DNS search domains

## Best Practices

1. **Use SSH keys** instead of passwords for authentication
2. **Configure serial console** for cloud images (many require it)
3. **Set boot order** to speed up the boot process
4. **Convert to template** for fast linked clone deployment
5. **Store custom configs in snippets** storage (must be available on all nodes for migration)
6. **Test with a clone** before modifying a template

## Troubleshooting

### Template Won't Boot

- Check if serial console is configured: `qm set <vmid> --serial0 socket --vga serial0`
- Verify boot order: `qm set <vmid> --boot order=scsi0`

### Network Not Configured

- Ensure cloud-init CD-ROM is attached: `qm set <vmid> --ide2 local-lvm:cloudinit`
- Check IP configuration: `qm config <vmid> | grep ipconfig`

### SSH Keys Not Working

- Verify sshkeys format (OpenSSH format, one per line)
- Check cloud-init logs in VM: `cat /var/log/cloud-init.log`

skills/proxmox-infrastructure/reference/networking.md

# Proxmox Network Configuration

*Source: <https://pve.proxmox.com/wiki/Network_Configuration>*

## Key Concepts

### Configuration File

All network configuration lives in `/etc/network/interfaces`. GUI changes are written to `/etc/network/interfaces.new` for safety.

### Applying Changes

**ifupdown2 (recommended):**

```bash
# Apply from the GUI or run:
ifreload -a
```

**Reboot method:**
The `pvenetcommit` service activates the staged file before the `networking` service applies it.

## Naming Conventions

### Current (Proxmox VE 5.0+)

- Ethernet: `en*` (systemd predictable names)
  - `eno1` - first on-board NIC
  - `enp3s0f1` - function 1 of the NIC on PCI bus 3, slot 0
- Bridges: `vmbr[0-4094]`
- Bonds: `bond[N]`
- VLANs: add the VLAN number after a period: `eno1.50`, `bond1.30`

### Legacy (pre-5.0)

- Ethernet: `eth[N]` (eth0, eth1, ...)

### Pinning Naming Scheme Version

Add to the kernel command line to prevent name changes:

```bash
net.naming-scheme=v252
```

### Overriding Device Names

**Automatic tool:**

```bash
# Generate .link files for all interfaces
pve-network-interface-pinning generate

# With custom prefix
pve-network-interface-pinning generate --prefix myprefix

# Pin specific interface
pve-network-interface-pinning generate --interface enp1s0 --target-name if42
```

**Manual method** (`/etc/systemd/network/10-enwan0.link`):

```ini
[Match]
MACAddress=aa:bb:cc:dd:ee:ff
Type=ether

[Link]
Name=enwan0
```

After creating link files:

```bash
update-initramfs -u -k all
# Then reboot
```

## Network Setups

### Default Bridged Configuration

```bash
auto lo
iface lo inet loopback

iface eno1 inet manual

auto vmbr0
iface vmbr0 inet static
    address 192.168.10.2/24
    gateway 192.168.10.1
    bridge-ports eno1
    bridge-stp off
    bridge-fd 0
```

VMs behave as if directly connected to the physical network.

### Routed Configuration

For hosting providers that block multiple MACs:

```bash
auto lo
iface lo inet loopback

auto eno0
iface eno0 inet static
    address 198.51.100.5/29
    gateway 198.51.100.1
    post-up echo 1 > /proc/sys/net/ipv4/ip_forward
    post-up echo 1 > /proc/sys/net/ipv4/conf/eno0/proxy_arp

auto vmbr0
iface vmbr0 inet static
    address 203.0.113.17/28
    bridge-ports none
    bridge-stp off
    bridge-fd 0
```

### Masquerading (NAT)

For VMs with private IPs:

```bash
auto lo
iface lo inet loopback

auto eno1
iface eno1 inet static
    address 198.51.100.5/24
    gateway 198.51.100.1

auto vmbr0
iface vmbr0 inet static
    address 10.10.10.1/24
    bridge-ports none
    bridge-stp off
    bridge-fd 0
    post-up echo 1 > /proc/sys/net/ipv4/ip_forward
    post-up iptables -t nat -A POSTROUTING -s '10.10.10.0/24' -o eno1 -j MASQUERADE
    post-down iptables -t nat -D POSTROUTING -s '10.10.10.0/24' -o eno1 -j MASQUERADE
```

**Conntrack zones fix** (if the firewall blocks outgoing traffic):

```bash
post-up iptables -t raw -I PREROUTING -i fwbr+ -j CT --zone 1
post-down iptables -t raw -D PREROUTING -i fwbr+ -j CT --zone 1
```

## Linux Bonding

### Bond Modes

1. **balance-rr** - Round-robin (load balancing + fault tolerance)
2. **active-backup** - Only one active NIC (fault tolerance only)
3. **balance-xor** - XOR selection (load balancing + fault tolerance)
4. **broadcast** - Transmit on all slaves (fault tolerance)
5. **802.3ad (LACP)** - IEEE 802.3ad Dynamic link aggregation (requires switch support)
6. **balance-tlb** - Adaptive transmit load balancing
7. **balance-alb** - Adaptive load balancing (balance-tlb + receive balancing)

**Recommendation:**

- If switch supports LACP → use 802.3ad
- Otherwise → use active-backup

### Bond with Fixed IP

```bash
auto lo
iface lo inet loopback

iface eno1 inet manual
iface eno2 inet manual

auto bond0
iface bond0 inet static
    bond-slaves eno1 eno2
    address 192.168.1.2/24
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer2+3

auto vmbr0
iface vmbr0 inet static
    address 10.10.10.2/24
    gateway 10.10.10.1
    bridge-ports eno3
    bridge-stp off
    bridge-fd 0
```

### Bond as Bridge Port

For a fault-tolerant guest network:

```bash
auto lo
iface lo inet loopback

iface eno1 inet manual
iface eno2 inet manual

auto bond0
iface bond0 inet manual
    bond-slaves eno1 eno2
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer2+3

auto vmbr0
iface vmbr0 inet static
    address 10.10.10.2/24
    gateway 10.10.10.1
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
```

## VLAN Configuration (802.1Q)

### VLAN Awareness on Bridge

**Guest VLANs** - Configure the VLAN tag in the VM settings; the bridge handles tagging transparently.

**Bridge with VLAN awareness:**

```bash
auto vmbr0
iface vmbr0 inet manual
    bridge-ports eno1
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094
```

### Host Management on VLAN

**With VLAN-aware bridge:**

```bash
auto lo
iface lo inet loopback

iface eno1 inet manual

auto vmbr0.5
iface vmbr0.5 inet static
    address 10.10.10.2/24
    gateway 10.10.10.1

auto vmbr0
iface vmbr0 inet manual
    bridge-ports eno1
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094
```

**Traditional VLAN:**

```bash
auto lo
iface lo inet loopback

iface eno1 inet manual
iface eno1.5 inet manual

auto vmbr0v5
iface vmbr0v5 inet static
    address 10.10.10.2/24
    gateway 10.10.10.1
    bridge-ports eno1.5
    bridge-stp off
    bridge-fd 0

auto vmbr0
iface vmbr0 inet manual
    bridge-ports eno1
    bridge-stp off
    bridge-fd 0
```

### VLAN with Bonding

```bash
auto lo
iface lo inet loopback

iface eno1 inet manual
iface eno2 inet manual

auto bond0
iface bond0 inet manual
    bond-slaves eno1 eno2
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer2+3

iface bond0.5 inet manual

auto vmbr0v5
iface vmbr0v5 inet static
    address 10.10.10.2/24
    gateway 10.10.10.1
    bridge-ports bond0.5
    bridge-stp off
    bridge-fd 0

auto vmbr0
iface vmbr0 inet manual
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
```

## Advanced Features

### Disable MAC Learning

Available since Proxmox VE 7.3:

```bash
auto vmbr0
iface vmbr0 inet static
    address 10.10.10.2/24
    gateway 10.10.10.1
    bridge-ports ens18
    bridge-stp off
    bridge-fd 0
    bridge-disable-mac-learning 1
```

Proxmox VE manually adds VM/CT MAC addresses to the forwarding database.

### Disable IPv6

Create `/etc/sysctl.d/disable-ipv6.conf`:

```ini
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
```

Then: `sysctl -p /etc/sysctl.d/disable-ipv6.conf`

## Troubleshooting

### Avoid ifup/ifdown

**Don't use** `ifup`/`ifdown` on bridges: they interrupt guest traffic without reconnecting it.

**Use instead:**

- The GUI "Apply Configuration" button
- `ifreload -a`
- A reboot

### Network Changes Not Applied

1. Check that `/etc/network/interfaces.new` exists
2. Click "Apply Configuration" in the GUI or run `ifreload -a`
3. If issues persist, reboot

### Bond Not Working with Corosync

Some bond modes are problematic for Corosync. If the cluster link must be bonded, active-backup is the safest mode; otherwise prefer multiple Corosync links over bonding for cluster traffic.

skills/proxmox-infrastructure/reference/qemu-guest-agent.md

# QEMU Guest Agent Integration

## Overview

The QEMU Guest Agent (`qemu-guest-agent`) is a service running inside VMs that enables communication between Proxmox and the guest OS. It provides IP address detection, graceful shutdowns, filesystem freezing for snapshots, and more.

## Why Use QEMU Guest Agent?

**Without Guest Agent:**

- VM IP address unknown to Proxmox
- Shutdown = hard power off
- Snapshots don't freeze the filesystem (risk of corruption)
- No guest-level monitoring

**With Guest Agent:**

- Automatic IP address detection
- Graceful shutdown/reboot
- Consistent snapshots with filesystem freeze
- Execute commands inside the VM
- Query guest information (hostname, users, OS details)

## Installation in Guest VM

### Ubuntu/Debian

```bash
sudo apt update
sudo apt install qemu-guest-agent
sudo systemctl enable qemu-guest-agent
sudo systemctl start qemu-guest-agent
```

### RHEL/Rocky/AlmaLinux

```bash
sudo dnf install qemu-guest-agent
sudo systemctl enable qemu-guest-agent
sudo systemctl start qemu-guest-agent
```

### Verify Installation

```bash
systemctl status qemu-guest-agent
```

**Expected output:**

```text
● qemu-guest-agent.service - QEMU Guest Agent
   Loaded: loaded (/lib/systemd/system/qemu-guest-agent.service; enabled)
   Active: active (running)
```

## Enable in VM Configuration

### Via Proxmox Web UI

**VM → Hardware → Add → QEMU Guest Agent**

OR edit VM options:

**VM → Options → QEMU Guest Agent → Edit → Check "Use QEMU Guest Agent"**

### Via CLI

```bash
qm set <vmid> --agent 1
```

**With custom options:**

```bash
# Enable the agent and run fstrim after cloning a disk
qm set <vmid> --agent enabled=1,fstrim_cloned_disks=1
```

### Via Terraform

```hcl
resource "proxmox_vm_qemu" "vm" {
  name = "my-vm"
  # ... other config ...

  agent = 1 # Enable guest agent
}
```

### Via Ansible

```yaml
- name: Enable QEMU guest agent
  community.proxmox.proxmox_kvm:
    api_host: "{{ proxmox_api_host }}"
    api_user: "{{ proxmox_api_user }}"
    api_token_id: "{{ proxmox_token_id }}"
    api_token_secret: "{{ proxmox_token_secret }}"
    node: foxtrot
    vmid: 101
    agent: 1
    update: true
```

## Using Guest Agent

### Check Agent Status

**Via CLI:**

```bash
# Test if agent is responding
qm agent 101 ping

# Get guest info
qm agent 101 info

# Get network interfaces
qm agent 101 network-get-interfaces

# Get OS information
qm agent 101 get-osinfo
```

**Example output:**

```json
{
  "result": {
    "id": "ubuntu",
    "kernel-release": "5.15.0-91-generic",
    "kernel-version": "#101-Ubuntu SMP",
    "machine": "x86_64",
    "name": "Ubuntu",
    "pretty-name": "Ubuntu 22.04.3 LTS",
    "version": "22.04",
    "version-id": "22.04"
  }
}
```

### Execute Commands

**Via CLI:**

```bash
# Execute command in guest
qm guest exec 101 -- whoami

# With arguments
qm guest exec 101 -- ls -la /tmp
```

**Via Python API:**

```python
from proxmoxer import ProxmoxAPI

proxmox = ProxmoxAPI('192.168.3.5', user='root@pam', password='pass')

# Execute command
result = proxmox.nodes('foxtrot').qemu(101).agent.exec.post(
    command=['whoami']
)

# Get execution result
pid = result['pid']
exec_status = proxmox.nodes('foxtrot').qemu(101).agent('exec-status').get(pid=pid)
print(exec_status)
```
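
`exec` runs asynchronously, so the status may not be final on the first query. A small polling sketch (field names follow the agent exec-status response: `exited`, `exitcode`, `out-data`):

```python
import time

# Poll until the guest reports the process has exited
while True:
    exec_status = proxmox.nodes('foxtrot').qemu(101).agent('exec-status').get(pid=pid)
    if exec_status.get('exited'):
        break
    time.sleep(1)

print(f"exit code: {exec_status.get('exitcode')}")
print(exec_status.get('out-data', ''))
```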

### Graceful Shutdown/Reboot

**Shutdown (graceful with agent):**

```bash
# Sends ACPI shutdown to the guest and waits for the agent to shut down the OS
qm shutdown 101

# Force shutdown if it doesn't complete in 60s
qm shutdown 101 --timeout 60 --forceStop 1
```

**Reboot:**

```bash
qm reboot 101
```
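
The equivalents via proxmoxer (a sketch; `timeout` and `forceStop` are the documented shutdown parameters):

```python
# Graceful shutdown with a 60 s deadline, then hard stop
proxmox.nodes('foxtrot').qemu(101).status.shutdown.post(timeout=60, forceStop=1)

# Reboot
proxmox.nodes('foxtrot').qemu(101).status.reboot.post()
```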

## Snapshot Integration

### Filesystem Freeze for Consistent Snapshots

When the guest agent is enabled, Proxmox can freeze the filesystem before taking a snapshot, ensuring consistency.

**Create snapshot with FS freeze:**

```bash
# Guest agent automatically freezes the filesystem
qm snapshot 101 before-upgrade --vmstate 0 --description "Before upgrade"
```

**Rollback to snapshot:**

```bash
qm rollback 101 before-upgrade
```

**Delete snapshot:**

```bash
qm delsnapshot 101 before-upgrade
```

## IP Address Detection

### Automatic IP Assignment

With the guest agent, Proxmox automatically detects VM IP addresses.

**View in Web UI:**

VM → Summary → the IPs section shows detected IPs

**Via CLI:**

```bash
qm agent 101 network-get-interfaces | jq '.result[] | select(.name=="eth0") | ."ip-addresses"'
```

**Via Python:**

```python
interfaces = proxmox.nodes('foxtrot').qemu(101).agent('network-get-interfaces').get()

for iface in interfaces['result']:
    if iface['name'] == 'eth0':
        for ip in iface.get('ip-addresses', []):
            if ip['ip-address-type'] == 'ipv4':
                print(f"IPv4: {ip['ip-address']}")
```

## Advanced Configuration

### Guest Agent Options

**Full options syntax:**

```bash
qm set <vmid> --agent [enabled=]<1|0>[,freeze-fs-on-backup=<1|0>][,fstrim_cloned_disks=<1|0>][,type=<virtio|isa>]
```

**Parameters:**

- `enabled` - Enable/disable guest agent (default: 1)
- `freeze-fs-on-backup` - Freeze guest filesystems during backup (default: 1)
- `fstrim_cloned_disks` - Run fstrim after cloning a disk (default: 0)
- `type` - Agent communication type: virtio or isa (default: virtio)

**Example:**

```bash
# Enable with fstrim on cloned disks
qm set 101 --agent enabled=1,fstrim_cloned_disks=1
```

### Filesystem Trim (fstrim)

For VMs on thin-provisioned storage (LVM-thin, CEPH), fstrim helps reclaim unused space.

**Manual fstrim:**

```bash
# Inside VM
sudo fstrim -av
```

**Automatic on clone:**

```bash
qm set <vmid> --agent enabled=1,fstrim_cloned_disks=1
```

**Scheduled fstrim (inside VM):**

```bash
# Enable weekly fstrim timer
sudo systemctl enable fstrim.timer
sudo systemctl start fstrim.timer
```

## Cloud-Init Integration

### Include in Cloud-Init Template

**During template creation:**

```bash
# Install agent package
virt-customize -a ubuntu-22.04.img \
  --install qemu-guest-agent \
  --run-command "systemctl enable qemu-guest-agent"

# Create VM from image
qm create 9000 --name ubuntu-template --memory 2048 --cores 2 --net0 virtio,bridge=vmbr0
qm importdisk 9000 ubuntu-22.04.img local-lvm
qm set 9000 --scsihw virtio-scsi-pci --scsi0 local-lvm:vm-9000-disk-0
qm set 9000 --agent 1 # Enable guest agent
qm set 9000 --ide2 local-lvm:cloudinit
qm template 9000
```

### Cloud-Init User Data

**Include in cloud-init config:**

```yaml
#cloud-config
packages:
  - qemu-guest-agent

runcmd:
  - systemctl enable qemu-guest-agent
  - systemctl start qemu-guest-agent
```

## Troubleshooting

### Guest Agent Not Responding

**1. Check if service is running in guest:**

```bash
# Inside VM
systemctl status qemu-guest-agent
journalctl -u qemu-guest-agent
```

**2. Check if agent is enabled in VM config:**

```bash
# On Proxmox host
qm config 101 | grep agent
```

**3. Check virtio serial device:**

```bash
# Inside VM
ls -l /dev/virtio-ports/
# Should show: org.qemu.guest_agent.0
```

**4. Restart agent:**

```bash
# Inside VM
sudo systemctl restart qemu-guest-agent
```

**5. Check Proxmox can communicate:**

```bash
# On Proxmox host
qm agent 101 ping
```

### IP Address Not Detected

**Possible causes:**

1. Guest agent not running
2. Network interface not configured
3. DHCP not assigning IP
4. Firewall blocking communication

**Debug:**

```bash
# Check all interfaces
qm agent 101 network-get-interfaces | jq

# Verify cloud-init completed
# Inside VM
cloud-init status
```

### Filesystem Freeze Timeout

**Symptoms:**

Snapshot creation hangs or times out.

**Solution:**

```bash
# Skip the automatic filesystem freeze during backups
qm set 101 --agent enabled=1,freeze-fs-on-backup=0

# For snapshots, the freeze itself is what times out: check for hung I/O
# or a stuck qemu-guest-agent inside the guest, then retry
qm snapshot 101 test --vmstate 0
```

### Agent Installed but Not Enabled

**Check VM config:**

```bash
qm config 101 | grep agent
```

**If missing, enable:**

```bash
qm set 101 --agent 1
```

**Power-cycle the VM so the new virtio device appears (a guest-initiated reboot is not enough):**

```bash
qm shutdown 101 && qm start 101
```

## Best Practices

1. **Always install in templates** - Include qemu-guest-agent in VM templates
2. **Enable during provisioning** - Set `--agent 1` when creating VMs
3. **Use for production VMs** - Critical for graceful shutdowns and monitoring
4. **Enable fstrim for thin storage** - Helps reclaim space on LVM-thin and CEPH
5. **Test before snapshots** - Verify the agent works: `qm agent <vmid> ping`
6. **Cloud-init integration** - Automate installation via cloud-init packages
7. **Monitor agent status** - Check the agent is running in monitoring tools

## Ansible Automation Example

```yaml
---
- name: Ensure QEMU guest agent is configured
  hosts: proxmox_vms
  become: true
  tasks:
    - name: Install qemu-guest-agent
      ansible.builtin.apt:
        name: qemu-guest-agent
        state: present
      when: ansible_os_family == "Debian"

    - name: Enable and start qemu-guest-agent
      ansible.builtin.systemd:
        name: qemu-guest-agent
        enabled: true
        state: started

    - name: Verify agent is running
      ansible.builtin.systemd:
        name: qemu-guest-agent
      register: agent_status

    - name: Report agent status
      ansible.builtin.debug:
        msg: "Guest agent is {{ agent_status.status.ActiveState }}"
```

## Further Reading

- [Proxmox QEMU Guest Agent Documentation](https://pve.proxmox.com/wiki/Qemu-guest-agent)
- [QEMU Guest Agent Protocol](https://www.qemu.org/docs/master/interop/qemu-ga.html)

skills/proxmox-infrastructure/reference/storage-management.md

# Proxmox Storage Management

## Overview

Proxmox VE supports multiple storage backends. This guide focuses on the storage architecture of the Matrix cluster: LVM-thin for boot disks and CEPH for distributed storage.

## Matrix Cluster Storage Architecture

### Hardware Configuration

**Per Node (Foxtrot, Golf, Hotel):**

```text
nvme0n1 - 1TB Crucial P3       → Boot disk + LVM
nvme1n1 - 4TB Samsung 990 PRO  → CEPH OSD (2 OSDs)
nvme2n1 - 4TB Samsung 990 PRO  → CEPH OSD (2 OSDs)
```

**Total Cluster:**

- 3× 1TB boot disks (LVM local storage)
- 6× 4TB NVMe drives (24TB raw CEPH capacity)
- 12 CEPH OSDs total (2 per NVMe drive)

### Storage Pools

```text
Storage Pool   Type      Backend     Purpose
------------   -------   ---------   -------
local          dir       Directory   ISO images, templates, backups
local-lvm      lvmthin   LVM-thin    VM disks (local)
ceph-pool      rbd       CEPH RBD    VM disks (distributed, HA)
ceph-fs        cephfs    CephFS      Shared filesystem
```

## LVM Storage

### LVM-thin Configuration

**Advantages:**

- Thin provisioning (overcommit storage)
- Fast snapshots
- Local to each node (low latency)
- No network overhead

**Disadvantages:**

- No HA (tied to a single node)
- No live migration with storage
- Limited to the node's local disk size

**Check LVM usage:**

```bash
# View volume groups
vgs

# View logical volumes
lvs

# View thin pool usage
lvs -a | grep thin
```

**Example output:**

```text
LV             VG   Attr        LSize    Pool  Origin  Data%
data           pve  twi-aotz--  850.00g                45.23
vm-101-disk-0  pve  Vwi-aotz--   50.00g  data          12.45
```

### Managing LVM Storage

**Extend thin pool (if the boot disk has space):**

```bash
# Check free space in VG
vgs pve

# Extend thin pool
lvextend -L +100G pve/data
```

**Create VM disk manually:**

```bash
# Create 50GB disk for VM 101
lvcreate -V 50G -T pve/data -n vm-101-disk-0
```

## CEPH Storage

### CEPH Architecture for Matrix

**Network Configuration:**

```text
vmbr1 (192.168.5.0/24, MTU 9000) → CEPH Public Network
vmbr2 (192.168.7.0/24, MTU 9000) → CEPH Private Network
```

**OSD Distribution:**

```text
Node     NVMe     OSDs  Capacity
-------  -------  ----  --------
foxtrot  nvme1n1  2     4TB
foxtrot  nvme2n1  2     4TB
golf     nvme1n1  2     4TB
golf     nvme2n1  2     4TB
hotel    nvme1n1  2     4TB
hotel    nvme2n1  2     4TB
-------  -------  ----  --------
Total             12    24TB raw
```

**Usable capacity (replica 3):** ~8TB

### CEPH Deployment Commands

**Install CEPH:**

```bash
# On first node (foxtrot)
pveceph install --version reef

# Initialize cluster
pveceph init --network 192.168.5.0/24 --cluster-network 192.168.7.0/24
```

**Create Monitors (3 for quorum):**

```bash
# On each node
pveceph mon create
```

**Create Manager (on each node):**

```bash
pveceph mgr create
```

**Create OSDs:**

```bash
# On each node. pveceph creates one OSD per device; splitting a drive into
# two OSDs (as in the layout above) needs ceph-volume instead, e.g.:
#   ceph-volume lvm batch --osds-per-device 2 /dev/nvme1n1

# For nvme1n1 (4TB)
pveceph osd create /dev/nvme1n1 --crush-device-class nvme

# For nvme2n1 (4TB)
pveceph osd create /dev/nvme2n1 --crush-device-class nvme
```

**Create CEPH Pool:**

```bash
# Create RBD pool for VMs
pveceph pool create ceph-pool --add_storages

# Create CephFS for shared storage
pveceph fs create --name cephfs --add-storage
```

### CEPH Configuration Best Practices

**Optimize for NVMe** (`/etc/pve/ceph.conf`):

```ini
[global]
public_network = 192.168.5.0/24
cluster_network = 192.168.7.0/24
osd_pool_default_size = 3
osd_pool_default_min_size = 2

[osd]
osd_memory_target = 4294967296  # 4GB per OSD
osd_max_backfills = 1
osd_recovery_max_active = 1
```

**Restart CEPH services after a config change:**

```bash
# Quote the glob so the shell doesn't expand it
systemctl restart 'ceph-osd@*.service'
```

### CEPH Monitoring

**Check cluster health:**

```bash
ceph status
ceph health detail
```

**Example healthy output:**

```text
cluster:
  id:     a1b2c3d4-e5f6-7890-abcd-ef1234567890
  health: HEALTH_OK

services:
  mon: 3 daemons, quorum foxtrot,golf,hotel
  mgr: foxtrot(active), standbys: golf, hotel
  osd: 12 osds: 12 up, 12 in

data:
  pools:   2 pools, 128 pgs
  objects: 1.23k objects, 45 GiB
  usage:   135 GiB used, 23.8 TiB / 24 TiB avail
  pgs:     128 active+clean
```

**Check OSD performance:**

```bash
ceph osd df
ceph osd perf
```

**Check pool usage:**

```bash
ceph df
rados df
```

## Storage Configuration in Proxmox

### Add Storage via Web UI

**Datacenter → Storage → Add:**

1. **Directory** - For ISOs and backups
2. **LVM-Thin** - For local VM disks
3. **RBD** - For CEPH VM disks
4. **CephFS** - For shared files

### Add Storage via CLI

**CEPH RBD:**

```bash
pvesm add rbd ceph-pool \
  --pool ceph-pool \
  --content images,rootdir \
  --nodes foxtrot,golf,hotel
```

**CephFS:**

```bash
pvesm add cephfs cephfs \
  --path /mnt/pve/cephfs \
  --content backup,iso,vztmpl \
  --nodes foxtrot,golf,hotel
```

**NFS (if using an external NAS):**

```bash
pvesm add nfs nas-storage \
  --server 192.168.3.10 \
  --export /mnt/tank/proxmox \
  --content images,backup,iso \
  --nodes foxtrot,golf,hotel
```

## VM Disk Management

### Create VM Disk on CEPH

**Via CLI:**

```bash
# Create 100GB disk for VM 101 on CEPH
qm set 101 --scsi1 ceph-pool:100
```

**Via API (Python):**

```python
from proxmoxer import ProxmoxAPI

proxmox = ProxmoxAPI('192.168.3.5', user='root@pam', password='pass')
proxmox.nodes('foxtrot').qemu(101).config.put(scsi1='ceph-pool:100')
```

### Move VM Disk Between Storage

**Move from local-lvm to CEPH:**

```bash
qm move-disk 101 scsi0 ceph-pool --delete 1
```

The same command works while the VM is running: Proxmox mirrors the disk live (QEMU drive-mirror) and switches over once the copy is complete.
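
Via the API, the corresponding endpoint is `move_disk` (a proxmoxer sketch reusing the connection above; it returns a task UPID):

```python
upid = proxmox.nodes('foxtrot').qemu(101).move_disk.post(
    disk='scsi0',
    storage='ceph-pool',
    delete=1,  # remove the source volume after a successful move
)
print(f"Move task: {upid}")
```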

### Resize VM Disk

**Grow disk (can't shrink):**

```bash
# Grow VM 101's scsi0 by 50GB
qm resize 101 scsi0 +50G
```

**Inside VM (expand filesystem):**

```bash
# If the disk is partitioned, grow the partition first
# (growpart is in the cloud-guest-utils package)
sudo growpart /dev/sda 1

# For ext4
sudo resize2fs /dev/sda1

# For XFS
sudo xfs_growfs /
```

## Backup and Restore

### Backup to Storage

**Create backup:**

```bash
# Backup VM 101 to local storage
vzdump 101 --storage local --mode snapshot --compress zstd

# Backup to CephFS
vzdump 101 --storage cephfs --mode snapshot --compress zstd
```
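
Backups can also be triggered through the API. A proxmoxer sketch (POST to the node's `vzdump` endpoint; it returns a task UPID that can be polled as shown in the API reference):

```python
upid = proxmox.nodes('foxtrot').vzdump.post(
    vmid=101,
    storage='local',
    mode='snapshot',
    compress='zstd',
)
print(f"Backup task: {upid}")
```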

**Scheduled backups (via Web UI):**

Datacenter → Backup → Add:

- Schedule: Daily at 2 AM
- Storage: cephfs
- Mode: Snapshot
- Compression: ZSTD
- Retention: Keep last 7

### Restore from Backup

**List backups:**

```bash
ls /var/lib/vz/dump/
# OR
ls /mnt/pve/cephfs/dump/
```

**Restore:**

```bash
# Restore to same VMID
qmrestore /var/lib/vz/dump/vzdump-qemu-101-2024_01_15-02_00_00.vma.zst 101

# Restore to new VMID
qmrestore /var/lib/vz/dump/vzdump-qemu-101-2024_01_15-02_00_00.vma.zst 102 --storage ceph-pool
```

## Performance Tuning

### CEPH Performance

**For NVMe OSDs:**

```bash
# Set the proper device class (if a class is already assigned,
# clear it first with: ceph osd crush rm-device-class osd.N)
ceph osd crush set-device-class nvme osd.0
ceph osd crush set-device-class nvme osd.1
# ... repeat for all OSDs
```

**Create performance pool:**

```bash
ceph osd pool create fast-pool 128 128
ceph osd pool application enable fast-pool rbd
```

**Enable RBD cache** (`/etc/pve/ceph.conf`):

```ini
[client]
rbd_cache = true
rbd_cache_size = 134217728  # 128MB
# false favors performance over safety; keep the default (true) if unsure
rbd_cache_writethrough_until_flush = false
```

### LVM Performance

**Use SSD discard:**

```bash
# Enable discard on VM disk
qm set 101 --scsi0 local-lvm:vm-101-disk-0,discard=on,ssd=1
```

## Troubleshooting

### CEPH Not Healthy

**Check OSD status:**

```bash
ceph osd tree
ceph osd stat
```

**Restart stuck OSD:**

```bash
systemctl restart ceph-osd@0.service
```

**Check network connectivity:**

```bash
# From one node to another
ping -c 3 -M do -s 8972 192.168.5.6  # Test MTU 9000 (8972 + 28 header bytes)
```

### LVM Out of Space

**Check thin pool usage:**

```bash
lvs pve/data -o lv_name,data_percent,metadata_percent
```

**If thin pool > 90% full:**

```bash
# Extend if VG has space
lvextend -L +100G pve/data

# OR delete unused VM disks
lvremove pve/vm-XXX-disk-0
```

### Storage Performance Issues

**Test disk I/O:**

```bash
# Test sequential write
dd if=/dev/zero of=/tmp/test bs=1M count=1024 oflag=direct

# Test CEPH RBD performance
rbd bench --io-type write ceph-pool/test-image
```

**Monitor CEPH latency:**

```bash
ceph osd perf
```

## Best Practices

1. **Use CEPH for HA VMs** - Store critical VM disks on CEPH for live migration
2. **Use LVM for performance** - Non-critical VMs get better performance on local LVM
3. **MTU 9000 for CEPH** - Always use jumbo frames on CEPH networks
4. **Separate networks** - Public and private CEPH networks on different interfaces
5. **Monitor CEPH health** - Set up alerts for HEALTH_WARN/HEALTH_ERR
6. **Regular backups** - Automated daily backups to CephFS or an external NAS
7. **Plan for growth** - Leave 20% free space in CEPH for rebalancing
8. **Use replica 3** - Essential for data safety, especially with only 3 nodes

## Further Reading

- [Proxmox VE Storage Documentation](https://pve.proxmox.com/wiki/Storage)
- [CEPH Documentation](https://docs.ceph.com/)
- [Proxmox CEPH Guide](https://pve.proxmox.com/wiki/Deploy_Hyper-Converged_Ceph_Cluster)