Initial commit

This commit is contained in:
Zhongwei Li
2025-11-29 18:00:27 +08:00
commit 0c6988a884
19 changed files with 5729 additions and 0 deletions

# Common Mistakes and Anti-Patterns
Lessons learned from real-world Proxmox deployments. Avoid these pitfalls to save time and frustration.
## VM Provisioning with OpenTofu
**Note**: Use `tofu` CLI (not `terraform`). All examples use OpenTofu.
### ❌ Cloud-Init File Not on Target Node
**Problem**: `tofu plan` succeeds but VM fails to start or configure properly.
```hcl
# BAD - Cloud-init file only exists locally
resource "proxmox_virtual_environment_vm" "example" {
  initialization {
    user_data_file_id = "local:snippets/user-data.yaml" # File doesn't exist on node!
  }
}
```
**Solution**: The cloud-init YAML file MUST exist on the target Proxmox node's datastore, and that datastore must have the `snippets` content type enabled.
```bash
# Upload to Proxmox node first
scp user-data.yaml root@foxtrot:/var/lib/vz/snippets/
# Or use Ansible to deploy it
ansible proxmox_nodes -m copy -a "src=user-data.yaml dest=/var/lib/vz/snippets/"
```
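A pre-flight check before `tofu apply` catches a missing snippet early. The sketch below is hypothetical (`snippet_volid_ok` and `check_snippet` are illustrative names, not part of any tool). It assumes root SSH access to the node and uses `pvesm path`, which resolves a volume ID such as `local:snippets/user-data.yaml` to its filesystem path on that node.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helpers; names are illustrative only.

# Check the shape of a snippet volume ID: "<storage>:snippets/<file>".
snippet_volid_ok() {
  local volid="$1"
  [[ "$volid" =~ ^[A-Za-z0-9_.-]+:snippets/[^/]+$ ]]
}

# Resolve the volume ID to a path on the node, then test that the file exists.
# Returns 2 for a malformed ID, non-zero if the lookup or the file check fails.
# Assumes root SSH access and that `pvesm` is on the node's PATH.
check_snippet() {
  local node="$1" volid="$2" path
  if ! snippet_volid_ok "$volid"; then
    echo "bad volume ID: $volid" >&2
    return 2
  fi
  path=$(ssh "root@${node}" "pvesm path '${volid}'") || return 1
  ssh "root@${node}" "test -f '${path}'"
}

# Usage sketch:
#   check_snippet foxtrot local:snippets/user-data.yaml && tofu apply
```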
**Reference**: See `terraform/netbox-template/user-data.yaml.example` for the required format.

---
### ❌ Template Missing on Target Node
**Problem**: `tofu apply` fails with "template not found" error.
```hcl
# BAD - Template referenced but doesn't exist
resource "proxmox_virtual_environment_vm" "example" {
  node_name = "foxtrot"

  clone {
    vm_id = 9000 # Template doesn't exist on foxtrot!
  }
}
```
**Solution**: Ensure template exists on the specific node you're deploying to.
```bash
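# Check the template exists on the target node (qm list shows only local VMs)
ssh root@foxtrot "qm list | grep 9000"
# VMIDs are unique cluster-wide, so a copy on foxtrot needs a new ID (e.g. 9001).
# --target requires the source template to be on shared storage.
ssh root@<source-node> "qm clone 9000 9001 --full --target foxtrot"
ssh root@foxtrot "qm template 9001"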
```
**Better**: Use Ansible playbook to create templates consistently across nodes:
```bash
cd ansible && uv run ansible-playbook playbooks/proxmox-build-template.yml
```
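`qm list | grep 9000` is a quick check, but `grep` also matches VMID 19000 or any other line containing that string, and `qm list` does not show whether the VM is actually a template. A stricter sketch, assuming you pipe in output captured from the target node; the function names are hypothetical, and the template check relies on `qm config` printing `template: 1` for templates.

```bash
#!/usr/bin/env bash
# Hypothetical helpers that read qm output captured from the target node.

# Succeed if VMID $1 is in the VMID column of `qm list` output on stdin.
# Matching the first field exactly avoids grep hits on 19000, names, etc.
vmid_listed() {
  awk -v id="$1" 'NR > 1 && $1 == id { found = 1 } END { exit !found }'
}

# Succeed if `qm config <vmid>` output on stdin marks the VM as a template.
is_template() {
  grep -qx 'template: 1'
}

# Usage sketch (assumes root SSH access to the node):
#   ssh root@foxtrot qm list | vmid_listed 9000 &&
#     ssh root@foxtrot qm config 9000 | is_template
```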
---
### ❌ Remote Backend Configuration Errors
**Problem**: OpenTofu cannot authenticate to Proxmox when runs execute on the Scalr remote backend.
```hcl
# BAD - Provider config that only works from your workstation
provider "proxmox" {
  endpoint = var.proxmox_api_url

  ssh {
    agent = true # ❌ Remote runs cannot reach your local SSH agent!
  }
}
```
**Solution (Remote Backend - Scalr)**:
```hcl
provider "proxmox" {
  endpoint = var.proxmox_api_url
  username = var.proxmox_username # Must use variables
  password = var.proxmox_password # Must use variables

  ssh {
    agent    = false # Critical: false for remote backend
    username = var.ssh_username
  }
}
```
Required environment variables:
```bash
export SCALR_HOSTNAME="your-scalr-host"
export SCALR_TOKEN="your-scalr-token"
export TF_VAR_proxmox_username="root@pam"
export TF_VAR_proxmox_password="your-password"
```
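Missing credentials tend to surface as confusing provider errors, so it can help to check them before planning. A minimal sketch, assuming bash 4+ for indirect expansion; `require_env` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: fail fast if any named variable is unset or empty.
require_env() {
  local missing=() name
  for name in "$@"; do
    # ${!name} expands the variable whose name is stored in $name
    [[ -n "${!name:-}" ]] || missing+=("$name")
  done
  if (( ${#missing[@]} > 0 )); then
    echo "missing: ${missing[*]}" >&2
    return 1
  fi
}

# Usage sketch, run before `tofu plan`:
#   require_env SCALR_HOSTNAME SCALR_TOKEN \
#     TF_VAR_proxmox_username TF_VAR_proxmox_password && tofu plan
```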
**Solution (Local Testing)**:
```hcl
provider "proxmox" {
  endpoint = var.proxmox_api_url

  ssh {
    agent    = true # Use SSH agent for local testing
    username = "root"
  }
}
```
**Reference Architecture**:
- Local examples: `terraform/examples/`
- Versioned root modules: `basher83/Triangulum-Prime/terraform-bgp-vm`
---
## Template Creation
### ❌ Cloud Image Not Downloaded to Target Node
**Problem**: The Ansible playbook fails when creating a template from a cloud image.
```yaml
# BAD - Assuming image exists
- name: Create VM from cloud image
  ansible.builtin.command: >
    qm importdisk {{ template_id }} ubuntu-22.04.img local-lvm
  # Fails: ubuntu-22.04.img doesn't exist!
```
**Solution**: Download cloud image to target node first.
```yaml
# GOOD - Download first
- name: Download Ubuntu cloud image
  ansible.builtin.get_url:
    url: https://cloud-images.ubuntu.com/jammy/current/jammy-server-cloudimg-amd64.img
    dest: /tmp/ubuntu-22.04.img
    checksum: sha256:...

- name: Import disk to VM
  ansible.builtin.command: >
    qm importdisk {{ template_id }} /tmp/ubuntu-22.04.img local-lvm
```
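The `checksum: sha256:...` value has to come from somewhere. Ubuntu publishes a `SHA256SUMS` file in the same `current/` directory as the image; the sketch below (hypothetical helper names, assuming GNU `sha256sum` is installed) looks up the expected hash there and verifies a downloaded image against it, which also works as a manual check before `qm importdisk`.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; names are illustrative.

# Print the hash listed for file name $2 in SHA256SUMS file $1.
# Entries look like "<hash> *<name>"; a leading "*" (binary mode) is stripped.
expected_sha256() {
  awk -v f="$2" '{ name = $2; sub(/^\*/, "", name) } name == f { print $1; exit }' "$1"
}

# Succeed if image $1 matches the entry named $3 in SHA256SUMS file $2.
# $3 is separate because the image may have been saved under another name.
verify_image() {
  local img="$1" sums="$2" name="$3" want have
  want=$(expected_sha256 "$sums" "$name")
  if [[ -z "$want" ]]; then
    echo "no entry for $name in $sums" >&2
    return 1
  fi
  have=$(sha256sum "$img" | awk '{ print $1 }')
  [[ "$have" == "$want" ]]
}

# Usage sketch:
#   curl -fsSLo /tmp/SHA256SUMS https://cloud-images.ubuntu.com/jammy/current/SHA256SUMS
#   verify_image /tmp/ubuntu-22.04.img /tmp/SHA256SUMS jammy-server-cloudimg-amd64.img
```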
**Reference**: See `ansible/playbooks/proxmox-build-template.yml` for complete workflow.

---
### ❌ Cloud-Init Snippet Format Violations
**Problem**: VM boots but cloud-init doesn't configure properly.
```yaml
#cloud-config
# BAD - Incomplete format
users:
  - name: admin
    sudo: ALL=(ALL) NOPASSWD:ALL
    # Missing: SSH key, shell, qemu-guest-agent, python3!
```
**Solution**: Use the standardized snippet format pre-configured for Ansible. `#cloud-config` must be the very first line of the file; cloud-init uses that header to recognize the content as cloud-config.
```yaml
#cloud-config
# GOOD - Complete format
users:
  - name: ansible
    groups: sudo
    shell: /bin/bash
    sudo: ALL=(ALL) NOPASSWD:ALL
    ssh_authorized_keys:
      - ssh-ed25519 AAAA...
package_update: true
package_upgrade: false
packages:
  - qemu-guest-agent
  - python3
  - python3-pip
runcmd:
  - systemctl enable qemu-guest-agent
  - systemctl start qemu-guest-agent
```
**Critical Requirements**:
- ✅ MUST include `qemu-guest-agent` package
- ✅ MUST include `python3` for Ansible compatibility
- ✅ MUST configure SSH key for Ansible user
- ✅ MUST enable qemu-guest-agent service
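
These requirements are easy to check mechanically before uploading a snippet. This is a rough, hypothetical lint (`lint_user_data` is an illustrative name): it only pattern-matches lines and does not parse YAML, so it is no substitute for a real schema check such as `cloud-init schema --config-file <file>` on a machine with a recent cloud-init installed.

```bash
#!/usr/bin/env bash
# Hypothetical lint for a cloud-init user-data snippet. It pattern-matches
# lines only (no YAML parsing), reports every problem found, then fails.
lint_user_data() {
  local f="$1" rc=0 pkg
  if [[ "$(head -n 1 "$f")" != "#cloud-config" ]]; then
    echo "$f: line 1 must be #cloud-config" >&2; rc=1
  fi
  # Expect list entries such as "  - qemu-guest-agent" and "  - python3"
  for pkg in qemu-guest-agent python3; do
    if ! grep -Eq "^[[:space:]]*-[[:space:]]+${pkg}[[:space:]]*$" "$f"; then
      echo "$f: package $pkg not listed" >&2; rc=1
    fi
  done
  if ! grep -q 'ssh_authorized_keys:' "$f"; then
    echo "$f: no ssh_authorized_keys for the Ansible user" >&2; rc=1
  fi
  if ! grep -Eq 'systemctl enable.*qemu-guest-agent' "$f"; then
    echo "$f: qemu-guest-agent service not enabled" >&2; rc=1
  fi
  return "$rc"
}

# Usage sketch:
#   lint_user_data user-data.yaml && scp user-data.yaml root@foxtrot:/var/lib/vz/snippets/
```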
**Reference Format**: `terraform/netbox-template/user-data.yaml.example`

---
### ❌ Mixing Terraform and Ansible Provisioning
**Problem**: Confusion about which tool is responsible for what.
**Anti-Pattern**:
```hcl
# BAD - Complex provisioning in Terraform
resource "proxmox_virtual_environment_vm" "example" {
  initialization {
    user_data_file_id = "local:snippets/complex-setup.yaml"
    # Hundreds of lines of cloud-init doing app setup
  }
}
```
**Best Practice**: Clear separation of concerns.

**OpenTofu Responsibility**:
- VM resource allocation (CPU, memory, disk)
- Network configuration
- Basic cloud-init (user, SSH keys, qemu-guest-agent)
- Infrastructure provisioning
**Ansible Responsibility**:
- Application installation
- Configuration management
- Service orchestration
- Ongoing management
**Pattern**:
1. OpenTofu: Provision VM with minimal cloud-init
2. Cloud-init: Create ansible user, install qemu-guest-agent, python3
3. Ansible: Configure everything else
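
Between steps 2 and 3 there is a race: cloud-init may still be installing `qemu-guest-agent` and `python3` when Ansible first connects. A generic retry loop can bridge it. `retry` is a hypothetical helper, and the usage line assumes VMID 101 on foxtrot, the `qm agent <vmid> ping` check used in Quick Troubleshooting, and a placeholder playbook path.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command until it succeeds, up to N attempts,
# sleeping DELAY seconds between attempts. Returns 1 if every attempt fails.
retry() {
  local attempts="$1" delay="$2" i
  shift 2
  for (( i = 1; i <= attempts; i++ )); do
    "$@" && return 0
    (( i < attempts )) && sleep "$delay"
  done
  return 1
}

# Usage sketch: wait up to ~5 minutes for the guest agent, then hand off.
#   retry 30 10 ssh root@foxtrot qm agent 101 ping &&
#     uv run ansible-playbook playbooks/<configure-playbook>.yml
```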
**Reference Architecture**:
- Template creation: `basher83/Triangulum-Prime/deployments/homelab/templates`
- OpenTofu examples: `terraform/examples/`
---
## Best Practices Summary
### Template Creation
1. ✅ Download cloud images to target node before import
2. ✅ Use standardized cloud-init snippet format
3. ✅ Always include qemu-guest-agent
4. ✅ Keep cloud-init minimal - let Ansible handle configuration
5. ✅ Reference: `basher83/Triangulum-Prime/deployments/homelab/templates`
### OpenTofu Provisioning
1. ✅ Verify template exists on target node
2. ✅ Upload cloud-init snippets before referencing
3. ✅ Use `ssh.agent = false` for remote backends (Scalr)
4. ✅ Use `ssh.agent = true` for local testing
5. ✅ Set credentials via OpenTofu variables, not hardcoded
6. ✅ Reference: `terraform/examples/` and `basher83/Triangulum-Prime`
### Workflow
1. ✅ Create template once per node (or sync across nodes)
2. ✅ Upload cloud-init snippets to `/var/lib/vz/snippets/`
3. ✅ Provision VM via OpenTofu (infrastructure)
4. ✅ Configure VM via Ansible (applications/services)
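
The workflow can be scripted end to end. A minimal sketch, assuming the template already exists on the node (step 1) and that `tofu apply` runs from your OpenTofu root module directory; `run` and `deploy` are hypothetical helpers, the node, snippet, and playbook names are placeholders, and `DRY_RUN=1` prints the commands instead of executing them.

```bash
#!/usr/bin/env bash
# Hypothetical end-to-end wrapper; every path and name is a placeholder.

# With DRY_RUN=1, print each command instead of executing it.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

deploy() {
  local node="$1" snippet="$2" playbook="$3"
  run scp "$snippet" "root@${node}:/var/lib/vz/snippets/"  # step 2: snippet
  run tofu apply                                           # step 3: infrastructure
  run uv run ansible-playbook "$playbook"                  # step 4: configuration
}

# Usage sketch (review first, then rerun without DRY_RUN):
#   DRY_RUN=1 deploy foxtrot user-data.yaml playbooks/<configure-playbook>.yml
```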
---
## Quick Troubleshooting
### VM Won't Start After tofu apply
**Check**:
1. Does template exist? `qm list | grep <template-id>`
2. Does cloud-init file exist? `ls -la /var/lib/vz/snippets/`
3. Is qemu-guest-agent installed? `qm agent <vmid> ping`
### tofu Can't Connect to Proxmox
**Remote Backend**:
1. `ssh.agent = false`? ✅
2. `SCALR_HOSTNAME` and `SCALR_TOKEN` set? ✅
3. Using OpenTofu variables for credentials? ✅
**Local Testing**:
1. `ssh.agent = true`? ✅
2. SSH key in agent? `ssh-add -l`
3. Can you SSH to node? `ssh root@foxtrot`
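
`ssh-add -l` reports the agent's state through its exit status: 0 when keys are loaded, 1 when the agent has no identities, and 2 when no agent can be reached. A small hypothetical wrapper (`agent_status`) turns that status into a hint.

```bash
#!/usr/bin/env bash
# Hypothetical helper: interpret the exit status of `ssh-add -l`.
# Prints a hint and succeeds only when the agent has keys loaded.
agent_status() {
  case "$1" in
    0) echo "ok: agent has keys loaded"; return 0 ;;
    1) echo "agent running but empty: load a key with ssh-add"; return 1 ;;
    2) echo "no agent reachable: check SSH_AUTH_SOCK or start ssh-agent"; return 1 ;;
    *) echo "unexpected ssh-add status: $1"; return 1 ;;
  esac
}

# Usage sketch:
#   ssh-add -l >/dev/null 2>&1; agent_status $?
```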
### Cloud-Init Didn't Configure VM
**Check**:
1. File format matches `user-data.yaml.example`? ✅
2. Includes qemu-guest-agent? ✅
3. Includes python3? ✅
4. VM console logs: `qm terminal <vmid>` (the VM needs a serial device, e.g. `serial0: socket`), then check `/var/log/cloud-init.log` and `/var/log/cloud-init-output.log`
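
Once on the console, filtering the logs narrows things down quickly. A sketch, assuming it runs inside the guest; `cloud_init_problems` is a hypothetical name, and `cloud-init status --long` on recent releases also gives a short summary.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print WARNING/ERROR lines and Python tracebacks from
# the given log files as "file:line:text". Exit status is 0 if any problem
# line was found, 1 otherwise. Unreadable files are reported and skipped.
cloud_init_problems() {
  local f found=1
  for f in "$@"; do
    if [[ ! -r "$f" ]]; then
      echo "$f: not readable" >&2
      continue
    fi
    grep -HnE 'WARNING|ERROR|Traceback' "$f" && found=0
  done
  return "$found"
}

# Usage sketch, inside the guest:
#   cloud_init_problems /var/log/cloud-init.log /var/log/cloud-init-output.log
```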