Initial commit

Zhongwei Li
2025-11-30 08:47:38 +08:00
commit 18faa0569e
47 changed files with 7969 additions and 0 deletions


@@ -0,0 +1,179 @@
# Proxmox Automation Tools
Integration patterns for managing Proxmox with Terraform and Ansible.
## Tool Selection Guide
| Task | Recommended Tool | Rationale |
|------|-----------------|-----------|
| VM/LXC provisioning | Terraform | Declarative state, idempotent, handles dependencies |
| Template creation | Packer | Repeatable builds, version-controlled |
| Post-boot configuration | Ansible | Agentless, procedural, good for drift correction |
| One-off VM operations | Ansible | Quick tasks, no state file needed |
| Dynamic inventory | Ansible | Query running VMs for configuration |
| Bulk VM creation | Terraform | count/for_each, parallel creation |
| Snapshot management | Either | Terraform for lifecycle, Ansible for ad-hoc |
| Cluster administration | CLI/API | Direct access for maintenance tasks |
## Terraform Integration
### Provider
```hcl
terraform {
  required_providers {
    proxmox = {
      source  = "telmate/proxmox"
      version = "~> 3.0"
    }
  }
}

provider "proxmox" {
  pm_api_url          = "https://proxmox.example.com:8006/api2/json"
  pm_api_token_id     = "terraform@pve!mytoken"
  pm_api_token_secret = var.pm_api_token_secret
}
```
### Common Patterns
```hcl
# Clone from template
resource "proxmox_vm_qemu" "vm" {
  name        = "myvm"
  target_node = "joseph"
  clone       = "tmpl-ubuntu-2404-standard"
  full_clone  = true
  cores       = 2
  memory      = 4096

  disks {
    scsi {
      scsi0 {
        disk {
          storage = "local-lvm"
          size    = "50G"
        }
      }
    }
  }
}
```
### Skill Reference
Load terraform skill for detailed patterns:
- `terraform/references/proxmox/gotchas.md` - Critical issues
- `terraform/references/proxmox/vm-qemu.md` - VM resource patterns
- `terraform/references/proxmox/authentication.md` - API setup
## Ansible Integration
### Collection
```bash
ansible-galaxy collection install community.general
```
### Common Patterns
```yaml
# Clone VM
- name: Clone from template
  community.general.proxmox_kvm:
    api_host: proxmox.example.com
    api_user: ansible@pve
    api_token_id: mytoken
    api_token_secret: "{{ proxmox_token_secret }}"
    node: joseph
    vmid: 300
    name: myvm
    clone: tmpl-ubuntu-2404-standard
    full: true
    timeout: 500

# Start VM
- name: Start VM
  community.general.proxmox_kvm:
    # ... auth ...
    vmid: 300
    state: started
```
### Skill Reference
Load ansible skill for detailed patterns:
- `ansible/references/proxmox/modules.md` - All Proxmox modules
- `ansible/references/proxmox/gotchas.md` - Common issues
- `ansible/references/proxmox/dynamic-inventory.md` - Auto-discovery
## Terraform vs Ansible Decision
### Use Terraform When
- Creating infrastructure from scratch
- Managing VM lifecycle (create, update, destroy)
- Need state tracking and drift detection
- Deploying multiple similar VMs (for_each)
- Complex dependencies between resources
- Team collaboration with state locking
### Use Ansible When
- Configuring VMs after creation
- Ad-hoc operations (start/stop specific VMs)
- Dynamic inventory needed for other playbooks
- Quick one-off tasks
- No state file management desired
- Integration with existing Ansible workflows
### Use Both When
- Terraform provisions VMs
- Ansible configures them post-boot
- Ansible uses Proxmox dynamic inventory to find Terraform-created VMs
## Hybrid Workflow Example
```
1. Packer builds VM template
   └── packer build ubuntu-2404.pkr.hcl

2. Terraform provisions VMs from template
   ├── terraform apply
   └── Outputs: VM IPs, hostnames

3. Ansible configures VMs
   ├── Uses Proxmox dynamic inventory, OR
   └── Uses Terraform output as inventory

4. Ongoing management
   ├── Terraform for infrastructure changes
   └── Ansible for configuration drift
```
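One hedged way to glue steps 2 and 3 together is to turn a Terraform output into a throwaway Ansible inventory. The sketch below assumes an output named `vm_ips` (a list of address strings) and a playbook called `site.yml`; both names are examples, not part of this repo.
```bash
# Sketch: hand Terraform-created VMs to Ansible without dynamic inventory.
# "vm_ips" and "site.yml" are assumed names.
terraform output -json vm_ips | jq -r '.[]' > inventory.ini
ansible-playbook -i inventory.ini site.yml
```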
## API Token Sharing
Both tools can share the same API token:
```bash
# Create shared token
pveum user add automation@pve
pveum aclmod / -user automation@pve -role PVEAdmin
pveum user token add automation@pve shared --privsep 0
```
Store in shared secrets management (1Password, Vault, etc.).
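For example, the secret can be fetched once and consumed by both tools from the environment; the 1Password item path and playbook name below are purely illustrative.
```bash
# Illustrative sketch: fetch the shared secret once, feed it to both tools.
# The op:// path and site.yml are made-up examples.
export PM_TOKEN_SECRET="$(op read 'op://Infra/proxmox-automation/credential')"
export TF_VAR_pm_api_token_secret="$PM_TOKEN_SECRET"   # consumed by var.pm_api_token_secret
ansible-playbook site.yml -e proxmox_token_secret="$PM_TOKEN_SECRET"
```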
## Common Gotchas
| Issue | Terraform | Ansible |
|-------|-----------|---------|
| VMID | Auto-assigns if not specified | Must specify manually |
| Cloud-init changes | Use replace_triggered_by | Limited support, use API |
| State tracking | Yes (tfstate) | No state file |
| Parallel operations | Yes (configurable) | Yes (forks) |
| Template name vs ID | Supports both | Supports both |
| Timeout handling | Provider config | Module parameter |


@@ -0,0 +1,162 @@
# Proxmox Backup Reference
## vzdump Overview
Built-in backup tool for VMs and containers.
```bash
# Basic backup
vzdump <vmid>
# With options
vzdump <vmid> --mode snapshot --storage backup-nfs --compress zstd
# Backup all VMs
vzdump --all --compress zstd
```
## Backup Modes
| Mode | Downtime | Method | Use Case |
|------|----------|--------|----------|
| stop | Full | Shutdown, backup, start | Consistent, any storage |
| suspend | Brief | Pause, backup, resume | Running state preserved |
| snapshot | None | LVM/ZFS/Ceph snapshot | Production, requires snapshot storage |
### Mode Selection
```bash
# Stop mode (most consistent)
vzdump <vmid> --mode stop
# Suspend mode (preserves RAM state)
vzdump <vmid> --mode suspend
# Snapshot mode (live, requires compatible storage)
vzdump <vmid> --mode snapshot
```
## Backup Formats
| Format | Type | Notes |
|--------|------|-------|
| VMA | VMs | Native Proxmox format |
| tar | Containers | Standard tar archive |
## Compression Options
| Type | Speed | Ratio | CPU |
|------|-------|-------|-----|
| none | Fastest | 1:1 | Low |
| lzo | Fast | Good | Low |
| gzip | Moderate | Better | Medium |
| zstd | Fast | Best | Medium |
Recommendation: `zstd` for best balance.
```bash
vzdump <vmid> --compress zstd
```
## Storage Configuration
```bash
# Backup to specific storage
vzdump <vmid> --storage backup-nfs
# Check available backup storage
pvesm status | grep backup
```
## Scheduled Backups
Configure in Datacenter → Backup:
- Schedule (cron format)
- Selection (all, pool, specific VMs)
- Storage destination
- Mode and compression
- Retention policy
### Retention Policy
```
keep-last: 3 # Keep last N backups
keep-daily: 7 # Keep daily for N days
keep-weekly: 4 # Keep weekly for N weeks
keep-monthly: 6 # Keep monthly for N months
```
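The same retention keys can also be applied to ad-hoc runs via vzdump's `--prune-backups` property string (available on current PVE releases):
```bash
# Ad-hoc backup that prunes old backups according to the policy above
vzdump <vmid> --storage backup-nfs --compress zstd \
  --prune-backups keep-last=3,keep-daily=7,keep-weekly=4,keep-monthly=6
```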
## Restore Operations
### Full Restore
```bash
# Restore VM
qmrestore <backup-file> <vmid>
# Restore to different VMID
qmrestore <backup-file> <new-vmid>
# Restore container
pct restore <ctid> <backup-file>
```
### Restore Options
```bash
# Restore to different storage
qmrestore <backup> <vmid> --storage local-lvm
# Force overwrite existing VM
qmrestore <backup> <vmid> --force
```
### File-Level Restore
```bash
# Mount backup for file extraction
# (Use web UI: Backup → Restore → File Restore)
```
## Proxmox Backup Server (PBS)
Dedicated backup server with deduplication.
### Benefits
- Deduplication across backups
- Encryption at rest
- Verification and integrity checks
- Efficient incremental backups
- Remote backup sync
### Integration
Add PBS storage:
```bash
pvesm add pbs <storage-id> \
--server <pbs-server> \
--datastore <datastore> \
--username <user>@pbs \
--fingerprint <fingerprint>
```
## Backup Best Practices
- Store backups on separate storage from VMs
- Use snapshot mode for production VMs
- Test restores regularly
- Offsite backup copy for disaster recovery
- Monitor backup job completion
- Set appropriate retention policy
## Troubleshooting
| Issue | Check |
|-------|-------|
| Backup fails | Storage space, VM state, permissions |
| Slow backup | Mode (snapshot faster), compression, network |
| Restore fails | Storage compatibility, VMID conflicts |
| Snapshot fails | Storage doesn't support snapshots |


@@ -0,0 +1,178 @@
# Proxmox CLI Tools Reference
## qm - VM Management
```bash
# List and status
qm list # List all VMs
qm status <vmid> # VM status
qm config <vmid> # Show VM config
# Power operations
qm start <vmid> # Start VM
qm stop <vmid> # Force stop
qm shutdown <vmid> # ACPI shutdown
qm reboot <vmid> # ACPI reboot
qm reset <vmid> # Hard reset
qm suspend <vmid> # Suspend to RAM
qm resume <vmid> # Resume from suspend
# Configuration
qm set <vmid> --memory 4096 # Set memory
qm set <vmid> --cores 4 # Set CPU cores
qm set <vmid> --name newname # Rename VM
# Disk operations
qm resize <vmid> scsi0 +10G # Extend disk
qm move-disk <vmid> scsi0 <storage> # Move disk
# Snapshots
qm snapshot <vmid> <snapname> # Create snapshot
qm listsnapshot <vmid> # List snapshots
qm rollback <vmid> <snapname> # Rollback
qm delsnapshot <vmid> <snapname> # Delete snapshot
# Templates and clones
qm template <vmid> # Convert to template
qm clone <vmid> <newid> # Clone VM
# Migration
qm migrate <vmid> <target-node> # Live migrate
# Troubleshooting
qm unlock <vmid> # Remove lock
qm showcmd <vmid> # Show QEMU command
qm monitor <vmid> # QEMU monitor
qm guest cmd <vmid> <command> # Guest agent command
```
## pct - Container Management
```bash
# List and status
pct list # List all containers
pct status <ctid> # Container status
pct config <ctid> # Show config
# Power operations
pct start <ctid> # Start container
pct stop <ctid> # Stop container
pct shutdown <ctid> # Graceful shutdown
pct reboot <ctid> # Reboot
# Access
pct enter <ctid> # Enter shell
pct exec <ctid> -- <command> # Run command
pct console <ctid> # Attach console
# Configuration
pct set <ctid> --memory 2048 # Set memory
pct set <ctid> --cores 2 # Set CPU cores
pct set <ctid> --hostname name # Set hostname
# Disk operations
pct resize <ctid> rootfs +5G # Extend rootfs
pct move-volume <ctid> <vol> <storage> # Move volume
# Snapshots
pct snapshot <ctid> <snapname> # Create snapshot
pct listsnapshot <ctid> # List snapshots
pct rollback <ctid> <snapname> # Rollback
# Templates
pct template <ctid> # Convert to template
pct clone <ctid> <newid> # Clone container
# Migration
pct migrate <ctid> <target-node> # Migrate container
# Troubleshooting
pct unlock <ctid> # Remove lock
pct push <ctid> <src> <dst> # Copy file to container
pct pull <ctid> <src> <dst> # Copy file from container
```
## pvecm - Cluster Management
```bash
# Status
pvecm status # Cluster status
pvecm nodes # List nodes
pvecm qdevice # QDevice status
# Node operations
pvecm add <node> # Join cluster
pvecm delnode <node> # Remove node
pvecm updatecerts # Update SSL certs
# Recovery
pvecm expected <votes> # Set expected votes
```
## pvesh - API Shell
```bash
# GET requests
pvesh get /nodes # List nodes
pvesh get /nodes/<node>/status # Node status
pvesh get /nodes/<node>/qemu # List VMs on node
pvesh get /nodes/<node>/qemu/<vmid>/status/current # VM status
pvesh get /storage # List storage
pvesh get /cluster/resources # All cluster resources
# POST/PUT requests
pvesh create /nodes/<node>/qemu -vmid <id> ... # Create VM
pvesh set /nodes/<node>/qemu/<vmid>/config ... # Modify VM
# DELETE requests
pvesh delete /nodes/<node>/qemu/<vmid> # Delete VM
```
## vzdump - Backup
```bash
# Basic backup
vzdump <vmid> # Backup VM
vzdump <ctid> # Backup container
# Options
vzdump <vmid> --mode snapshot # Snapshot mode
vzdump <vmid> --compress zstd # With compression
vzdump <vmid> --storage backup # To specific storage
vzdump <vmid> --mailto admin@example.com # Email notification
# Backup all
vzdump --all # All VMs and containers
vzdump --pool <pool> # All in pool
```
## qmrestore / pct restore
```bash
# Restore VM
qmrestore <backup.vma> <vmid>
qmrestore <backup.vma> <vmid> --storage local-lvm
# Restore container
pct restore <ctid> <backup.tar>
pct restore <ctid> <backup.tar> --storage local-lvm
```
## Useful Combinations
```bash
# Check resources on all nodes
for node in joseph maxwell everette; do
echo "=== $node ==="
pvesh get /nodes/$node/status --output-format json | jq '{cpu: .cpu, memory: .memory}'
done
# Stop all VMs on a node
qm list | awk 'NR>1 {print $1}' | xargs -I {} qm stop {}
# List VMs with their IPs (requires guest agent)
for vmid in $(qm list | awk 'NR>1 {print $1}'); do
echo -n "$vmid: "
qm guest cmd $vmid network-get-interfaces 2>/dev/null | jq -r '.[].["ip-addresses"][]?.["ip-address"]' | head -1
done
```


@@ -0,0 +1,181 @@
# Proxmox Clustering Reference
## Cluster Benefits
- Centralized web management
- Live VM migration between nodes
- High availability (HA) with automatic failover
- Shared configuration
## Cluster Requirements
| Requirement | Details |
|-------------|---------|
| Version | Same major/minor Proxmox version |
| Time | NTP synchronized |
| Network | Low-latency cluster network |
| Names | Unique node hostnames |
| Storage | Shared storage for HA |
## Cluster Commands
```bash
# Check cluster status
pvecm status
# List cluster nodes
pvecm nodes
# Add node to cluster (run on new node)
pvecm add <existing-node>
# Remove node (run on remaining node)
pvecm delnode <node-name>
# Expected votes (split-brain recovery)
pvecm expected <votes>
```
## Quorum
Cluster requires majority of nodes online to operate.
| Nodes | Quorum | Can Lose |
|-------|--------|----------|
| 2 | 2 | 0 (use QDevice) |
| 3 | 2 | 1 |
| 4 | 3 | 1 |
| 5 | 3 | 2 |
### QDevice
External quorum device for even-node clusters:
- Prevents split-brain in 2-node clusters
- Runs on separate machine
- Provides tie-breaking vote
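A rough setup sketch (the address 192.0.2.10 stands in for your QDevice host):
```bash
# On the external tie-breaker host (not a cluster member):
apt install corosync-qnetd
# On every cluster node:
apt install corosync-qdevice
# Register the QDevice from any one cluster node:
pvecm qdevice setup 192.0.2.10
pvecm status   # expected votes should now include the QDevice
```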
## High Availability (HA)
Automatic VM restart on healthy node if host fails.
### Requirements
- Shared storage (Ceph, NFS, iSCSI)
- Fencing enabled (watchdog)
- HA group configured
- VM added to HA
### HA States
| State | Description |
|-------|-------------|
| started | VM running, managed by HA |
| stopped | VM stopped intentionally |
| migrate | Migration in progress |
| relocate | Moving to different node |
| error | Problem detected |
### HA Configuration
1. Enable fencing (watchdog device)
2. Create HA group (optional)
3. Add VM to HA: Datacenter → HA → Add
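The same can be done from the shell with `ha-manager`; the group name, node names, and VMID below are examples:
```bash
# Optional HA group restricted to two nodes, then add VM 100 to HA management
ha-manager groupadd prod --nodes "node1,node2"
ha-manager add vm:100 --state started --group prod
ha-manager status
```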
### Fencing
Prevents split-brain by forcing failed node to stop:
```bash
# Check watchdog status
cat /proc/sys/kernel/watchdog
# Watchdog config
/etc/pve/ha/fence.cfg
```
## Live Migration
Move running VM between nodes without downtime.
### Requirements
- Shared storage OR local-to-local migration
- Same CPU architecture
- Network connectivity
- Sufficient resources on target
### Migration Types
| Type | Downtime | Requirements |
|------|----------|--------------|
| Live | Minimal | Shared storage |
| Offline | Full | Any storage |
| Local storage | Moderate | Copies disk |
### Migration Command
```bash
# Live migrate
qm migrate <vmid> <target-node>
# Offline migrate
qm migrate <vmid> <target-node> --offline
# With local disk
qm migrate <vmid> <target-node> --with-local-disks
```
## Cluster Network
### Corosync Network
Cluster communication (default port 5405):
- Low-latency required
- Dedicated VLAN recommended
- Redundant links for HA
### Configuration
```
# /etc/pve/corosync.conf
nodelist {
  node {
    name: node1
    ring0_addr: 192.168.10.1
  }
  node {
    name: node2
    ring0_addr: 192.168.10.2
  }
}
```
## Troubleshooting
### Quorum Lost
```bash
# Check status
pvecm status
# Force expected votes (DANGEROUS)
pvecm expected 1
# Then: recover remaining nodes
```
### Node Won't Join
- Check network connectivity
- Verify time sync
- Check Proxmox versions match
- Review /var/log/pve-cluster/
### Split Brain Recovery
1. Identify authoritative node
2. Stop cluster services on other nodes
3. Set expected votes
4. Restart and rejoin nodes
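A rough command sketch of that sequence (adapt carefully, and only run it on nodes you are deliberately demoting):
```bash
# Step 2: on each non-authoritative node, stop cluster services
systemctl stop pve-cluster corosync
# Step 3: on the authoritative node, accept the reduced vote count
pvecm expected 1
# Step 4: bring the demoted nodes back up and let them rejoin
systemctl start corosync pve-cluster
```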


@@ -0,0 +1,202 @@
# Docker Workloads on Proxmox
Best practices for hosting Docker containers on Proxmox VE.
## Hosting Options
| Option | Isolation | Overhead | Complexity | Recommendation |
|--------|-----------|----------|------------|----------------|
| VM + Docker | Full | Higher | Low | **Recommended** |
| LXC + Docker | Shared kernel | Lower | High | Avoid |
| Bare metal Docker | None | Lowest | N/A | Not on Proxmox |
## VM for Docker (Recommended)
### Template Selection
Use Docker-ready templates (102+):
| Template | Docker Pre-installed |
|----------|---------------------|
| 102 (docker) | Yes |
| 103 (github-runner) | Yes |
| 104 (pihole) | Yes |
### VM Sizing
| Workload | CPU | RAM | Disk |
|----------|-----|-----|------|
| Light (1-3 containers) | 2 | 4 GB | 50 GB |
| Medium (4-10 containers) | 4 | 8 GB | 100 GB |
| Heavy (10+ containers) | 8+ | 16+ GB | 200+ GB |
### Storage Backend
| Proxmox Storage | Docker Suitability | Notes |
|-----------------|-------------------|-------|
| local-lvm | Good | Default, fast |
| ZFS | Best | Snapshots, compression |
| Ceph | Good | Distributed, HA |
| NFS | Moderate | Shared access, slower |
### Network Configuration
```
Proxmox Node
├── vmbr0 (bridge) → VM eth0 → Docker bridge network
└── vmbr12 (high-speed) → VM eth1 → Docker macvlan (optional)
```
## Docker in LXC (Not Recommended)
If you must run Docker in LXC:
### Requirements
1. **Privileged container** or nesting enabled
2. **AppArmor** profile unconfined
3. **Keyctl** feature enabled
### LXC Options
```bash
# Proxmox GUI: Options → Features
nesting: 1
keyctl: 1
# Or in /etc/pve/lxc/<vmid>.conf
features: keyctl=1,nesting=1
lxc.apparmor.profile: unconfined
```
### Known Issues
- Some Docker storage drivers don't work
- Overlay filesystem may have issues
- Reduced security isolation
- Complex debugging (two container layers)
## Resource Allocation
### CPU
```bash
# VM config - dedicate cores to Docker host
cores: 4
cpu: host # Pass through CPU features
```
### Memory
```bash
# VM config - allow some overcommit for containers
memory: 8192
balloon: 4096 # Minimum memory
```
### Disk I/O
For I/O intensive containers (databases):
```bash
# VM disk options
cache: none # Direct I/O for consistency
iothread: 1 # Dedicated I/O thread
ssd: 1 # If on SSD storage
```
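These flags can be applied to an existing VM by re-specifying the volume with `qm set`; the VMID and volume name below are examples:
```bash
# Attach cache/iothread/ssd/discard flags to an existing disk
qm set 101 --scsi0 local-lvm:vm-101-disk-0,cache=none,iothread=1,ssd=1,discard=on
```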
## GPU Passthrough for Containers
For transcoding (Plex) or ML workloads:
### 1. Proxmox: Pass GPU to VM
```bash
# /etc/pve/qemu-server/<vmid>.conf
hostpci0: 0000:01:00.0,pcie=1
```
### 2. VM: Install NVIDIA Container Toolkit
```bash
# In VM
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update && sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```
### 3. Docker Compose
```yaml
services:
plex:
image: linuxserver/plex
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: 1
capabilities: [gpu]
```
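Before pointing Plex at the GPU, a quick smoke test from inside the VM confirms the toolkit is wired up (the image tag is just an example):
```bash
# Should print the nvidia-smi GPU table if passthrough and the toolkit both work
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```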
## Backup Strategy
### VM-level (Recommended)
Proxmox vzdump backs up entire Docker host including all containers:
```bash
vzdump <vmid> --mode snapshot --storage backup --compress zstd
```
### Application-level
For consistent database backups, stop or flush before VM backup:
```bash
# Pre-backup hook
docker exec postgres pg_dump -U user db > /backup/db.sql
```
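vzdump can run such commands automatically through its hook-script mechanism. A minimal sketch, registered with `vzdump <vmid> --script /usr/local/bin/pg-hook.sh` (the path, container name, and database are examples):
```bash
#!/bin/bash
# Minimal vzdump hook sketch: dump Postgres right before the backup starts.
# The phase names come from vzdump; the docker/pg_dump details are examples.
phase="$1"
if [ "$phase" = "backup-start" ]; then
    docker exec postgres pg_dump -U user db > /backup/db.sql
fi
```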
## Monitoring
### From Proxmox
- VM CPU, memory, network, disk via Proxmox UI
- No visibility into individual containers
### From Docker Host
```bash
# Resource usage per container
docker stats
# System-wide
docker system df
```
### Recommended Stack
```yaml
# On Docker host
services:
prometheus:
image: prom/prometheus
cadvisor:
image: gcr.io/cadvisor/cadvisor
grafana:
image: grafana/grafana
```
## Skill References
For Docker-specific patterns:
- `docker/references/compose.md` - Compose file structure
- `docker/references/networking.md` - Network modes
- `docker/references/volumes.md` - Data persistence
- `docker/references/proxmox/hosting.md` - Detailed hosting guide


@@ -0,0 +1,153 @@
# Proxmox Networking Reference
## Linux Bridges
Default networking method for Proxmox VMs and containers.
### Bridge Configuration
```
# /etc/network/interfaces example
auto vmbr0
iface vmbr0 inet static
address 192.168.1.10/24
gateway 192.168.1.1
bridge-ports eno1
bridge-stp off
bridge-fd 0
bridge-vlan-aware yes
```
### VLAN-Aware Bridge
Enable VLAN tagging at VM level instead of separate bridges:
- Set `bridge-vlan-aware yes` on bridge
- Configure VLAN tag in VM network config
- Simpler management, fewer bridges needed
### Separate Bridges (Alternative)
One bridge per VLAN:
- vmbr0: Untagged/native VLAN
- vmbr1: VLAN 10
- vmbr5: VLAN 5
More bridges but explicit network separation.
## VLAN Configuration
### At VM Level (VLAN-aware bridge)
```
net0: virtio=XX:XX:XX:XX:XX:XX,bridge=vmbr0,tag=20
```
### At Bridge Level (Separate bridges)
```
net0: virtio=XX:XX:XX:XX:XX:XX,bridge=vmbr20
```
## Firewall
Three levels of firewall rules:
| Level | Scope | Use Case |
|-------|-------|----------|
| Datacenter | Cluster-wide | Default policies |
| Node | Per-node | Node-specific rules |
| VM/Container | Per-VM | Application-specific |
### Default Policy
- Input: DROP (only allow explicit rules)
- Output: ACCEPT
- Enable firewall per VM in Options
### Common Rules
```
# Allow SSH
IN ACCEPT -p tcp --dport 22
# Allow HTTP/HTTPS
IN ACCEPT -p tcp --dport 80
IN ACCEPT -p tcp --dport 443
# Allow ICMP (ping)
IN ACCEPT -p icmp
```
## SDN (Software Defined Networking)
Advanced networking for complex multi-tenant setups.
### Zone Types
| Type | Use Case |
|------|----------|
| Simple | Basic L2 network |
| VLAN | VLAN-based isolation |
| VXLAN | Overlay networking |
| EVPN | BGP-based routing |
### When to Use SDN
- Multi-tenant environments
- Complex routing requirements
- Cross-node L2 networks
- VXLAN overlay needs
For homelab: Standard bridges usually sufficient.
## Network Performance
### Jumbo Frames
Enable on storage network for better throughput:
```
# Set MTU 9000 on bridge
auto vmbr40
iface vmbr40 inet static
mtu 9000
...
```
Requires: All devices in path support jumbo frames.
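A quick end-to-end check is a maximum-size, don't-fragment ping across the path (8972 bytes of payload = 9000 MTU minus 28 bytes of IP/ICMP headers; the address is an example storage host):
```bash
# Must succeed without fragmentation if every hop really supports MTU 9000
ping -M do -s 8972 -c 3 192.168.40.20
```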
### VirtIO Multiqueue
Enable parallel network processing for high-throughput VMs:
```
net0: virtio=XX:XX:XX:XX:XX:XX,bridge=vmbr0,queues=4
```
## Troubleshooting
### Check Bridge Status
```bash
brctl show # List bridges and attached interfaces
ip link show vmbr0 # Bridge interface details
bridge vlan show # VLAN configuration
```
### Check VM Network
```bash
qm config <vmid> | grep net # VM network config
ip addr # From inside VM
```
### Common Issues
| Problem | Check |
|---------|-------|
| No connectivity | Bridge exists, interface attached |
| Wrong VLAN | Tag matches switch config |
| Slow network | MTU mismatch, driver type |
| Firewall blocking | Rules, policy, enabled status |


@@ -0,0 +1,150 @@
# Proxmox Storage Reference
## Storage Types
### Local Storage
| Type | Features | Use Case |
|------|----------|----------|
| Directory | Simple, any filesystem | Basic storage |
| LVM | Block device, raw performance | Performance |
| LVM-thin | Thin provisioning, snapshots | Efficient space |
| ZFS | Compression, snapshots, high perf | Production |
Limitations: No live migration, single node only.
### Shared Storage
| Type | Features | Use Case |
|------|----------|----------|
| NFS | File-based, simple | Shared access |
| Ceph RBD | Distributed block, HA | Production HA |
| iSCSI | Network block | SAN integration |
| GlusterFS | Distributed file | File sharing |
Benefits: Live migration, HA, shared access.
## Content Types
Configure what each storage can hold:
| Content | Description | File Types |
|---------|-------------|------------|
| images | VM disk images | .raw, .qcow2 |
| iso | ISO images for install | .iso |
| vztmpl | Container templates | .tar.gz |
| backup | Backup files | .vma, .tar |
| rootdir | Container root FS | directories |
| snippets | Cloud-init, hooks | .yaml, scripts |
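Content types are assigned per storage with `pvesm set`; the storage ID below is an example:
```bash
# Allow this storage to hold ISOs, container templates, backups and snippets
pvesm set local --content iso,vztmpl,backup,snippets
pvesm status --content backup   # verify which storages accept backups
```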
## Storage Configuration
### Add NFS Storage
```bash
pvesm add nfs <storage-id> \
--server <nfs-server> \
--export <export-path> \
--content images,iso,backup
```
### Add Ceph RBD
```bash
pvesm add rbd <storage-id> \
--monhost <mon1>,<mon2>,<mon3> \
--pool <pool-name> \
--content images,rootdir
```
### Check Storage Status
```bash
pvesm status # All storage status
pvesh get /storage # API query
df -h # Disk space
```
## Disk Formats
| Format | Features | Performance |
|--------|----------|-------------|
| raw | No overhead, full allocation | Fastest |
| qcow2 | Snapshots, thin provisioning | Moderate |
Recommendation: Use `raw` for production, `qcow2` for dev/snapshots.
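An existing qcow2 disk can typically be converted while moving it, by giving `qm move-disk` an explicit target format (VMID, disk, and storage are examples; the target storage must support raw):
```bash
# Move scsi0 to local-lvm, converting to raw, and drop the old copy on success
qm move-disk 101 scsi0 local-lvm --format raw --delete 1
```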
## Disk Cache Modes
| Mode | Safety | Performance | Use Case |
|------|--------|-------------|----------|
| none | Safe | Good | Default, recommended |
| writeback | Unsafe | Best | Non-critical, battery backup |
| writethrough | Safe | Moderate | Compatibility |
| directsync | Safest | Slow | Critical data |
## Storage Performance
### Enable Discard (TRIM)
For SSD thin provisioning:
```
scsi0: local-lvm:vm-100-disk-0,discard=on
```
### I/O Thread
Dedicated I/O thread per disk:
```
scsi0: local-lvm:vm-100-disk-0,iothread=1
```
### I/O Limits
Throttle disk bandwidth:
```
# In VM config
bwlimit: <KiB/s>
iops_rd: <iops>
iops_wr: <iops>
```
## Cloud-Init Storage
Cloud-init configs stored in `snippets` content type:
```bash
# Upload cloud-init files
scp user-data.yaml root@proxmox:/var/lib/vz/snippets/
# Or to named storage
scp user-data.yaml root@proxmox:/mnt/pve/<storage>/snippets/
```
Reference in VM:
```
cicustom: user=<storage>:snippets/user-data.yaml
```
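The same reference can be set from the CLI:
```bash
# Point the VM's cloud-init user data at the uploaded snippet
qm set <vmid> --cicustom "user=<storage>:snippets/user-data.yaml"
```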
## Backup Storage
### Recommended Configuration
- Separate storage for backups
- NFS or dedicated backup server
- Sufficient space for retention policy
### Backup Retention
Configure in Datacenter → Backup:
```
keep-last: 3
keep-daily: 7
keep-weekly: 4
keep-monthly: 6
```


@@ -0,0 +1,197 @@
# Proxmox Troubleshooting Reference
## Common Errors
| Error | Cause | Solution |
|-------|-------|----------|
| VM won't start | Lock, storage, resources | `qm unlock`, check storage, verify resources |
| Migration failed | No shared storage, resources | Verify shared storage, check target capacity |
| Cluster issues | Quorum, network, time | `pvecm status`, check NTP, network |
| Storage unavailable | Mount failed, network | Check mount, network access |
| High load | Resource contention | Identify bottleneck, rebalance VMs |
| Network issues | Bridge, VLAN, firewall | `brctl show`, check tags, firewall rules |
| Backup failed | Disk space, VM state | Check space, storage access |
| Template not found | Not downloaded | Download from Proxmox repo |
| API errors | Auth, permissions | Check token, user permissions |
## Diagnostic Commands
### Cluster Health
```bash
pvecm status # Quorum and node status
pvecm nodes # List cluster members
systemctl status pve-cluster # Cluster service
systemctl status corosync # Corosync service
```
### Node Health
```bash
pveversion -v # Proxmox version info
uptime # Load and uptime
free -h # Memory usage
df -h # Disk space
top -bn1 | head -20 # Process overview
```
### VM Diagnostics
```bash
qm status <vmid> # VM state
qm config <vmid> # VM configuration
qm showcmd <vmid> # QEMU command line
qm unlock <vmid> # Clear locks
qm monitor <vmid> # QEMU monitor access
```
### Container Diagnostics
```bash
pct status <ctid> # Container state
pct config <ctid> # Container configuration
pct enter <ctid> # Enter container shell
pct unlock <ctid> # Clear locks
```
### Storage Diagnostics
```bash
pvesm status # Storage status
df -h # Disk space
mount | grep -E 'nfs|ceph' # Mounted storage
zpool status # ZFS pool status (if using ZFS)
ceph -s # Ceph status (if using Ceph)
```
### Network Diagnostics
```bash
brctl show # Bridge configuration
ip link # Network interfaces
ip addr # IP addresses
ip route # Routing table
bridge vlan show # VLAN configuration
```
### Log Files
```bash
# Cluster logs
journalctl -u pve-cluster
journalctl -u corosync
# VM/Container logs
journalctl | grep <vmid>
tail -f /var/log/pve/tasks/*
# Firewall logs
journalctl -u pve-firewall
# Web interface logs
journalctl -u pveproxy
```
## Troubleshooting Workflows
### VM Won't Start
1. Check for locks: `qm unlock <vmid>`
2. Verify storage: `pvesm status`
3. Check resources: `free -h`, `df -h`
4. Review config: `qm config <vmid>`
5. Check logs: `journalctl | grep <vmid>`
6. Try manual start: `qm start <vmid> --debug`
### Migration Failure
1. Verify shared storage: `pvesm status`
2. Check target resources: `pvesh get /nodes/<target>/status`
3. Verify network: `ping <target-node>`
4. Check version match: `pveversion` on both nodes
5. Review migration logs
### Cluster Quorum Lost
1. Check status: `pvecm status`
2. Identify online nodes
3. If majority lost, set expected: `pvecm expected <n>`
4. Recover remaining nodes
5. Rejoin lost nodes when available
### Storage Mount Failed
1. Check network: `ping <storage-server>`
2. Verify mount: `mount | grep <storage>`
3. Try manual mount
4. Check permissions on storage server
5. Review `/var/log/syslog`
### High CPU/Memory Usage
1. Identify culprit: `top`, `htop`
2. Check VM resources: `qm monitor <vmid>`, then `info balloon`
3. Review resource allocation across cluster
4. Consider migration or resource limits
## Recovery Procedures
### Remove Failed Node
```bash
# On healthy node
pvecm delnode <failed-node>
# Clean up node-specific configs
rm -rf /etc/pve/nodes/<failed-node>
```
### Force Stop Locked VM
```bash
# Remove lock
qm unlock <vmid>
# If still stuck, find and kill QEMU process
ps aux | grep <vmid>
kill <pid>
# Force cleanup
qm stop <vmid> --skiplock
```
### Recover from Corrupt Config
```bash
# Backup current config
cp /etc/pve/qemu-server/<vmid>.conf /root/<vmid>.conf.bak
# Edit config manually
nano /etc/pve/qemu-server/<vmid>.conf
# Or restore from backup
qmrestore <backup> <vmid>
```
## Health Check Script
```bash
#!/bin/bash
echo "=== Cluster Status ==="
pvecm status
echo -e "\n=== Node Resources ==="
for node in $(pvecm nodes | awk '/^[[:space:]]*[0-9]/ {print $3}'); do
echo "--- $node ---"
pvesh get /nodes/$node/status --output-format yaml | grep -E '^(cpu|memory):'
done
echo -e "\n=== Storage Status ==="
pvesm status
echo -e "\n=== Running VMs ==="
qm list | grep running
echo -e "\n=== Running Containers ==="
pct list | grep running
```


@@ -0,0 +1,103 @@
# VM vs LXC Reference
## Decision Matrix
### Use VM (QEMU/KVM) When
- Running Windows or non-Linux OS
- Need full kernel isolation
- Running untrusted workloads
- Complex hardware passthrough needed
- Different kernel version required
- GPU passthrough required
### Use LXC When
- Running Linux services
- Need lightweight, fast startup
- Comfortable with shared kernel
- Want better density/performance
- Simple application containers
- Development environments
## QEMU/KVM VMs
Full hardware virtualization with any OS support.
### Hardware Configuration
| Setting | Options | Recommendation |
|---------|---------|----------------|
| CPU type | host, kvm64, custom | `host` for performance |
| Boot | UEFI, BIOS | UEFI for modern OS |
| Display | VNC, SPICE, NoVNC | NoVNC for web access |
### Storage Controllers
| Type | Performance | Use Case |
|------|-------------|----------|
| VirtIO | Fastest | Linux, Windows with drivers |
| SCSI | Fast | General purpose |
| SATA | Moderate | Compatibility |
| IDE | Slow | Legacy OS |
### Network Adapters
| Type | Performance | Use Case |
|------|-------------|----------|
| VirtIO | Fastest | Linux, Windows with drivers |
| E1000 | Good | Compatibility |
| RTL8139 | Slow | Legacy OS |
### Features
- Snapshots (requires compatible storage)
- Templates for rapid cloning
- Live migration (requires shared storage)
- Hardware passthrough (GPU, USB, PCI)
## LXC Containers
OS-level virtualization with shared kernel.
### Container Types
| Type | Security | Use Case |
|------|----------|----------|
| Unprivileged | Higher (recommended) | Production workloads |
| Privileged | Lower | Docker-in-LXC, NFS mounts |
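For reference, an unprivileged container can be created from the CLI like this (IDs, template file name, storage, and bridge are examples):
```bash
# Create an unprivileged Debian container from a downloaded template
pct create 200 local:vztmpl/debian-12-standard_12.7-1_amd64.tar.zst \
  --unprivileged 1 --hostname app01 \
  --cores 2 --memory 2048 \
  --rootfs local-lvm:8 \
  --net0 name=eth0,bridge=vmbr0,ip=dhcp
```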
### Resource Controls
- CPU cores and limits
- Memory hard/soft limits
- Disk I/O throttling
- Network bandwidth limits
### Storage Options
- Bind mounts from host
- Volume storage
- ZFS datasets
### Features
- Fast startup (seconds)
- Lower memory overhead
- Higher density per host
- Templates from Proxmox repo
## Migration Considerations
### VM Migration Requirements
- Shared storage (Ceph, NFS, iSCSI)
- Same CPU architecture
- Compatible Proxmox versions
- Network connectivity between nodes
### LXC Migration Requirements
- Shared storage for live migration
- Same architecture
- Unprivileged preferred for portability