Common Mistakes and Anti-Patterns
Lessons learned from real-world Proxmox deployments. Avoid these pitfalls to save time and frustration.
VM Provisioning with OpenTofu
Note: Use tofu CLI (not terraform). All examples use OpenTofu.
❌ Cloud-Init File Not on Target Node
Problem: tofu plan succeeds but VM fails to start or configure properly.
# BAD - Cloud-init file only exists locally
resource "proxmox_virtual_environment_vm" "example" {
  initialization {
    user_data_file_id = "local:snippets/user-data.yaml" # File doesn't exist on node!
  }
}
Solution: The cloud-init YAML file MUST exist on the target Proxmox node's datastore before tofu apply runs.
# Upload to Proxmox node first
scp user-data.yaml root@foxtrot:/var/lib/vz/snippets/
# Or use Ansible to deploy it
ansible proxmox_nodes -m copy -a "src=user-data.yaml dest=/var/lib/vz/snippets/"
Reference: See terraform/netbox-template/user-data.yaml.example for the required format.
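A pre-flight check can catch the missing file before tofu apply. The sketch below maps a volume ID to a path on the node. It assumes the "local" datastore is the default directory storage rooted at /var/lib/vz, as in the scp example above; the function name snippet_path_for is hypothetical.

```shell
# Map a Proxmox volume ID such as "local:snippets/user-data.yaml" to the
# path it would have on the node.
# ASSUMPTION: "local" is the default directory storage at /var/lib/vz.
snippet_path_for() {
    volid="$1"
    storage="${volid%%:*}"   # text before the first ':'
    relpath="${volid#*:}"    # text after the first ':'
    # Reject other storages and IDs that contain no ':' at all.
    if [ "$storage" != "local" ] || [ "$storage" = "$volid" ]; then
        echo "unsupported volume ID: $volid" >&2
        return 1
    fi
    printf '/var/lib/vz/%s\n' "$relpath"
}

# Possible use before `tofu apply` (node name taken from the examples above):
#   path="$(snippet_path_for 'local:snippets/user-data.yaml')"
#   ssh root@foxtrot "test -f '$path'" || echo "snippet missing on foxtrot"
```

Other datastores keep snippets under their own mount points, so the mapping would need extending for them.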
❌ Template Missing on Target Node
Problem: tofu apply fails with "template not found" error.
# BAD - Template referenced but doesn't exist
resource "proxmox_virtual_environment_vm" "example" {
  node_name = "foxtrot"
  clone {
    vm_id = 9000 # Template doesn't exist on foxtrot!
  }
}
Solution: Ensure the template exists on the specific node you're deploying to.
# Check template exists
ssh root@foxtrot "qm list | grep 9000"
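# Template on a different node? Clone it to foxtrot under a new, unused VM ID.
# (A clone cannot reuse ID 9000; --target requires the source disks to be on shared storage.)
ssh root@<source-node> "qm clone 9000 <new-id> --target foxtrot --pool templates"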
Better: Use Ansible playbook to create templates consistently across nodes:
cd ansible && uv run ansible-playbook playbooks/proxmox-build-template.yml
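A plain grep 9000 over qm list output also matches VM IDs such as 19000, or a 9000 in the memory column. A stricter check compares only the first column. This is a minimal sketch; the function name has_vmid is hypothetical, and it assumes the VMID is the first whitespace-separated field of each qm list row.

```shell
# Succeed only if the given VMID appears in the first column of
# `qm list` output supplied on stdin.
has_vmid() {
    awk -v id="$1" '$1 == id { found = 1 } END { exit (found ? 0 : 1) }'
}

# Possible use (node name taken from the examples above):
#   ssh root@foxtrot "qm list" | has_vmid 9000 || echo "template 9000 missing on foxtrot"
```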
❌ Remote Backend Configuration Errors
Problem: OpenTofu fails to authenticate with Proxmox when using Scalr remote backend.
# BAD - Incorrect provider config for remote backend
provider "proxmox" {
  endpoint = var.proxmox_api_url
  ssh {
    agent = true # ❌ Doesn't work with remote backend!
  }
}
Solution (Remote Backend - Scalr):
provider "proxmox" {
  endpoint = var.proxmox_api_url
  username = var.proxmox_username # Must use variables
  password = var.proxmox_password # Must use variables
  ssh {
    agent    = false # Critical: false for remote backend
    username = var.ssh_username
  }
}
Required environment variables:
export SCALR_HOSTNAME="your-scalr-host"
export SCALR_TOKEN="your-scalr-token"
export TF_VAR_proxmox_username="root@pam"
export TF_VAR_proxmox_password="your-password"
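Since a missing variable tends to surface only as an authentication failure, it may help to check the four variables above before running tofu. A minimal sketch; the helper name missing_vars is hypothetical:

```shell
# Print the name of every listed variable that is unset or empty.
# Returns non-zero if at least one is missing.
missing_vars() {
    status=0
    for name in "$@"; do
        # Indirect lookup of the variable called $name (names are trusted input).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name"
            status=1
        fi
    done
    return "$status"
}

# Possible use before `tofu plan`:
#   missing_vars SCALR_HOSTNAME SCALR_TOKEN \
#       TF_VAR_proxmox_username TF_VAR_proxmox_password || exit 1
```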
Solution (Local Testing):
provider "proxmox" {
  endpoint = var.proxmox_api_url
  ssh {
    agent    = true # Use SSH agent for local testing
    username = "root"
  }
}
Reference Architecture:
- Local examples: terraform/examples/
- Versioned root modules: basher83/Triangulum-Prime/terraform-bgp-vm
Template Creation
❌ Cloud Image Not Downloaded to Target Node
Problem: Ansible playbook fails when creating template from cloud image.
# BAD - Assuming image exists
- name: Create VM from cloud image
  ansible.builtin.command: >
    qm importdisk {{ template_id }} ubuntu-22.04.img local-lvm
  # Fails: ubuntu-22.04.img doesn't exist!
Solution: Download cloud image to target node first.
# GOOD - Download first
- name: Download Ubuntu cloud image
  ansible.builtin.get_url:
    url: https://cloud-images.ubuntu.com/jammy/current/jammy-server-cloudimg-amd64.img
    dest: /tmp/ubuntu-22.04.img
    checksum: sha256:...
- name: Import disk to VM
  ansible.builtin.command: >
    qm importdisk {{ template_id }} /tmp/ubuntu-22.04.img local-lvm
Reference: See ansible/playbooks/proxmox-build-template.yml for complete workflow.
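When an image is already sitting on the node, the same integrity check that get_url's checksum option performs can be done by hand before importing. A minimal sketch, assuming sha256sum is available on the node; the function name verify_sha256 is hypothetical and the expected digest must come from the image publisher:

```shell
# Succeed only if FILE's SHA-256 digest equals EXPECTED (lowercase hex).
verify_sha256() {
    file="$1"
    expected="$2"
    actual="$(sha256sum "$file" | awk '{ print $1 }')"
    [ "$actual" = "$expected" ]
}

# Possible use on the node (digest left as a placeholder):
#   verify_sha256 /tmp/ubuntu-22.04.img "<published-sha256>" \
#       || echo "checksum mismatch, do not import"
```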
❌ Cloud-Init Snippet Format Violations
Problem: VM boots but cloud-init doesn't configure properly.
# BAD - Wrong format
#cloud-config
users:
  - name: admin
    sudo: ALL=(ALL) NOPASSWD:ALL
# Missing critical fields!
Solution: Use the standardized snippet format pre-configured for Ansible.
# GOOD - Complete format
#cloud-config
users:
  - name: ansible
    groups: sudo
    shell: /bin/bash
    sudo: ALL=(ALL) NOPASSWD:ALL
    ssh_authorized_keys:
      - ssh-ed25519 AAAA...
package_update: true
package_upgrade: false
packages:
  - qemu-guest-agent
  - python3
  - python3-pip
runcmd:
  - systemctl enable qemu-guest-agent
  - systemctl start qemu-guest-agent
Critical Requirements:
- ✅ MUST include qemu-guest-agent package
- ✅ MUST include python3 for Ansible compatibility
- ✅ MUST configure SSH key for Ansible user
- ✅ MUST enable qemu-guest-agent service
Reference Format: terraform/netbox-template/user-data.yaml.example
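The requirements above can be roughly checked before a snippet is uploaded. The sketch below does plain text matching only: it does not parse YAML, and a match on python3 is also satisfied by python3-pip. The function name lint_user_data is hypothetical.

```shell
# Report which required elements are missing from a cloud-init
# user-data file. Returns non-zero if anything is missing.
lint_user_data() {
    file="$1"
    status=0
    # cloud-init only treats the file as cloud-config if this is line 1.
    if [ "$(head -n 1 "$file")" != "#cloud-config" ]; then
        echo "first line must be #cloud-config"
        status=1
    fi
    for needle in qemu-guest-agent python3 ssh_authorized_keys \
                  "systemctl enable qemu-guest-agent"; do
        if ! grep -qF -- "$needle" "$file"; then
            echo "missing: $needle"
            status=1
        fi
    done
    return "$status"
}

# Possible use:
#   lint_user_data user-data.yaml && scp user-data.yaml root@foxtrot:/var/lib/vz/snippets/
```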
❌ Mixing OpenTofu and Ansible Provisioning
Problem: Confusion about which tool is responsible for what.
Anti-Pattern:
# BAD - Complex provisioning in OpenTofu
resource "proxmox_virtual_environment_vm" "example" {
  initialization {
    user_data_file_id = "local:snippets/complex-setup.yaml"
    # Hundreds of lines of cloud-init doing app setup
  }
}
Best Practice: Clear separation of concerns.
OpenTofu Responsibility:
- VM resource allocation (CPU, memory, disk)
- Network configuration
- Basic cloud-init (user, SSH keys, qemu-guest-agent)
- Infrastructure provisioning
Ansible Responsibility:
- Application installation
- Configuration management
- Service orchestration
- Ongoing management
Pattern:
- OpenTofu: Provision VM with minimal cloud-init
- Cloud-init: Create ansible user, install qemu-guest-agent, python3
- Ansible: Configure everything else
Reference Architecture:
- Template creation: basher83/Triangulum-Prime/deployments/homelab/templates
- OpenTofu examples: terraform/examples/
Best Practices Summary
Template Creation
- ✅ Download cloud images to target node before import
- ✅ Use standardized cloud-init snippet format
- ✅ Always include qemu-guest-agent
- ✅ Keep cloud-init minimal - let Ansible handle configuration
- ✅ Reference: basher83/Triangulum-Prime/deployments/homelab/templates
OpenTofu Provisioning
- ✅ Verify template exists on target node
- ✅ Upload cloud-init snippets before referencing
- ✅ Use ssh.agent = false for remote backends (Scalr)
- ✅ Use ssh.agent = true for local testing
- ✅ Set credentials via OpenTofu variables, not hardcoded
- ✅ Reference: terraform/examples/ and basher83/Triangulum-Prime
Workflow
- ✅ Create template once per node (or sync across nodes)
- ✅ Upload cloud-init snippets to /var/lib/vz/snippets/
- ✅ Provision VM via OpenTofu (infrastructure)
- ✅ Configure VM via Ansible (applications/services)
Quick Troubleshooting
VM Won't Start After tofu apply
Check:
- Does template exist? qm list | grep <template-id>
- Does cloud-init file exist? ls -la /var/lib/vz/snippets/
- Is qemu-guest-agent installed? qm agent <vmid> ping
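Because cloud-init installs qemu-guest-agent during first boot, a single qm agent ping shortly after tofu apply may fail even on a healthy VM. Retrying for a while gives a more reliable answer. A minimal sketch; the retry helper and the attempt and delay values are assumptions, not part of the repository:

```shell
# Run a command until it succeeds, at most ATTEMPTS times,
# sleeping DELAY seconds between tries.
retry() {
    attempts="$1"
    delay="$2"
    shift 2
    n=1
    while ! "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            return 1
        fi
        n=$((n + 1))
        sleep "$delay"
    done
}

# Possible use on the Proxmox node (VM ID 101 is a placeholder):
#   retry 30 5 qm agent 101 ping || echo "guest agent never answered"
```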
tofu Can't Connect to Proxmox
Remote Backend:
- ssh.agent = false? ✅
- SCALR_HOSTNAME and SCALR_TOKEN set? ✅
- Using OpenTofu variables for credentials? ✅
Local Testing:
- ssh.agent = true? ✅
- SSH key in agent? ssh-add -l ✅
- Can you SSH to node? ssh root@foxtrot ✅
Cloud-Init Didn't Configure VM
Check:
- File format matches user-data.yaml.example? ✅
- Includes qemu-guest-agent? ✅
- Includes python3? ✅
- VM console logs: qm terminal <vmid> then check /var/log/cloud-init.log