Initial commit

This commit is contained in:
Zhongwei Li
2025-11-30 08:47:38 +08:00
commit 18faa0569e
47 changed files with 7969 additions and 0 deletions

skills/terraform/SKILL.md Normal file

@@ -0,0 +1,85 @@
---
name: terraform
description: |
Terraform infrastructure-as-code reference for HCL syntax, state management,
module design, and provider configuration. Use when working with Terraform
configurations (.tf files), running terraform commands, troubleshooting state
issues, or designing modules. Includes Telmate Proxmox provider patterns.
Triggers: terraform, tfstate, .tf files, HCL, modules, providers, proxmox_vm_qemu.
---
# Terraform Skill
Infrastructure-as-code reference for Terraform configurations, state management, and provider patterns.
## Quick Reference
```bash
# Core workflow
terraform init # Initialize, download providers
terraform validate # Syntax validation
terraform fmt -recursive # Format HCL files
terraform plan # Preview changes
terraform apply # Apply changes
# Inspection
terraform state list # List resources in state
terraform state show <resource> # Show resource details
terraform graph | dot -Tsvg > graph.svg # Dependency graph
# Debug
TF_LOG=DEBUG terraform plan 2>debug.log
```
## Core Workflow
```
init → validate → fmt → plan → apply
```
1. **init**: Download providers, initialize backend
2. **validate**: Check syntax and configuration validity
3. **fmt**: Ensure consistent formatting
4. **plan**: Preview what will change (review carefully)
5. **apply**: Execute changes
## Reference Files
Load on-demand based on task:
| Topic | File | When to Load |
|-------|------|--------------|
| Proxmox Gotchas | [proxmox/gotchas.md](references/proxmox/gotchas.md) | Critical provider issues, workarounds |
| Proxmox Auth | [proxmox/authentication.md](references/proxmox/authentication.md) | Provider config, API tokens |
| Proxmox VMs | [proxmox/vm-qemu.md](references/proxmox/vm-qemu.md) | proxmox_vm_qemu resource patterns |
| Proxmox Errors | [proxmox/troubleshooting.md](references/proxmox/troubleshooting.md) | Common errors, debugging |
| State | [state-management.md](references/state-management.md) | Backends, locking, operations |
| Modules | [module-design.md](references/module-design.md) | Module patterns, composition |
| Security | [security.md](references/security.md) | Secrets, state security |
| External | [external-resources.md](references/external-resources.md) | Official docs, links |
## Validation Checklist
Before `terraform apply`:
- [ ] `terraform init` completed successfully
- [ ] `terraform validate` passes
- [ ] `terraform fmt` applied
- [ ] `terraform plan` reviewed (check destroy/replace operations)
- [ ] Backend configured correctly (for team environments)
- [ ] State locking enabled (if remote backend)
- [ ] Sensitive variables marked `sensitive = true`
- [ ] Provider versions pinned in `terraform.tf`
- [ ] No secrets in version control
- [ ] Blast radius assessed (what could break?)
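## Variable Precedence
(highest to lowest)
1. `-var` and `-var-file` flags share the same level and are applied in the order given, so the last one on the command line wins: `terraform apply -var-file=prod.tfvars -var="name=value"`
2. `*.auto.tfvars` and `*.auto.tfvars.json` files (processed in lexical order of filename)
3. `terraform.tfvars.json` file
4. `terraform.tfvars` file
5. `TF_VAR_*` environment variables
6. Variable defaults in `variables.tf`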
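
The precedence list above can be tried out with a single variable. This is a minimal sketch: the variable name `instance_name` and all values are hypothetical and exist only to make the order visible.

```hcl
# variables.tf (hypothetical variable, for illustration only)
variable "instance_name" {
  description = "Name used to demonstrate variable precedence"
  type        = string
  default     = "from-default" # lowest precedence
}

output "instance_name" {
  value = var.instance_name
}

# Sources that override the default, from lowest to highest precedence:
#   TF_VAR_instance_name=from-env                  (environment variable)
#   terraform.tfvars:   instance_name = "from-tfvars"
#   dev.auto.tfvars:    instance_name = "from-auto"
#   terraform apply -var="instance_name=from-cli"  (command line)
```

Running `terraform plan` after adding each source in turn should show the output value change to the most recently added, higher-precedence source.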


@@ -0,0 +1,66 @@
# External Resources
Pointers to official documentation and community resources.
## Official HashiCorp Documentation
| Resource | URL | Use For |
|----------|-----|---------|
| Terraform Docs | https://developer.hashicorp.com/terraform/docs | Language reference, CLI commands |
| Terraform Tutorials | https://developer.hashicorp.com/terraform/tutorials | Step-by-step learning paths |
| Language Reference | https://developer.hashicorp.com/terraform/language | HCL syntax, expressions, functions |
| CLI Reference | https://developer.hashicorp.com/terraform/cli | Command options and usage |
| Best Practices | https://developer.hashicorp.com/terraform/cloud-docs/recommended-practices | Official workflow recommendations |
## Terraform Registry
| Resource | URL | Use For |
|----------|-----|---------|
| Provider Registry | https://registry.terraform.io/browse/providers | Find and explore providers |
| Module Registry | https://registry.terraform.io/browse/modules | Pre-built modules |
| Telmate Proxmox | https://registry.terraform.io/providers/Telmate/proxmox/latest/docs | Proxmox provider docs |
| AWS Provider | https://registry.terraform.io/providers/hashicorp/aws/latest/docs | AWS resource reference |
## Proxmox Resources
| Resource | URL | Use For |
|----------|-----|---------|
| Telmate Provider Docs | https://registry.terraform.io/providers/Telmate/proxmox/latest/docs | Resource configuration |
| Telmate GitHub | https://github.com/Telmate/terraform-provider-proxmox | Source, issues, examples |
| Proxmox VE API | https://pve.proxmox.com/pve-docs/api-viewer/ | Understanding API calls |
| Proxmox Wiki | https://pve.proxmox.com/wiki/Main_Page | Proxmox concepts and setup |
## Community Resources
| Resource | URL | Use For |
|----------|-----|---------|
| Terraform Best Practices | https://www.terraform-best-practices.com | Community-maintained guide |
| Awesome Terraform | https://github.com/shuaibiyy/awesome-terraform | Curated list of resources |
| Terraform Weekly | https://www.yourdevopsmentor.com/terraform-weekly | News and updates |
## Learning Resources
| Resource | URL | Use For |
|----------|-----|---------|
| HashiCorp Learn | https://developer.hashicorp.com/terraform/tutorials | Official tutorials |
| Terraform Up & Running | https://www.terraformupandrunning.com/ | Comprehensive book |
## Tools
| Tool | URL | Use For |
|------|-----|---------|
| TFLint | https://github.com/terraform-linters/tflint | Linting and best practices |
| Checkov | https://github.com/bridgecrewio/checkov | Security scanning |
| Infracost | https://github.com/infracost/infracost | Cost estimation |
| Terragrunt | https://terragrunt.gruntwork.io/ | DRY Terraform configurations |
| tfenv | https://github.com/tfutils/tfenv | Terraform version management |
## Quick Links
**Most commonly needed:**
1. **HCL Syntax**: https://developer.hashicorp.com/terraform/language/syntax/configuration
2. **Functions**: https://developer.hashicorp.com/terraform/language/functions
3. **Expressions**: https://developer.hashicorp.com/terraform/language/expressions
4. **Backend Configuration**: https://developer.hashicorp.com/terraform/language/settings/backends
5. **Proxmox VM Resource**: https://registry.terraform.io/providers/Telmate/proxmox/latest/docs/resources/vm_qemu


@@ -0,0 +1,165 @@
# Module Design
## Standard Structure
```
modules/<name>/
├── main.tf       # Resources
├── variables.tf  # Inputs
├── outputs.tf    # Outputs
└── versions.tf   # Provider constraints
```
## Module Example
```hcl
# modules/vm/variables.tf
variable "name" {
  description = "VM name"
  type        = string
}
variable "target_node" {
  description = "Proxmox node"
  type        = string
}
variable "specs" {
  type = object({
    cores  = number
    memory = number
    disk   = optional(string, "50G")
  })
}
```
```hcl
# modules/vm/main.tf
resource "proxmox_vm_qemu" "vm" {
  name        = var.name
  target_node = var.target_node
  cores       = var.specs.cores
  memory      = var.specs.memory
}
```
```hcl
# modules/vm/outputs.tf
output "ip" {
  value = proxmox_vm_qemu.vm.default_ipv4_address
}
```
```hcl
# Usage
module "web" {
  source      = "./modules/vm"
  name        = "web-01"
  target_node = "pve1"
  specs       = { cores = 4, memory = 8192 }
}
```
```
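
The module example declares `specs.disk` with a default of `"50G"` but never reads it. The following is a sketch of how the value could feed a disk block; the `disks` layout follows the provider's nested-block style, and the storage pool name `local-lvm` is an assumption.

```hcl
# modules/vm/main.tf (sketch: the same resource, extended with a disk)
resource "proxmox_vm_qemu" "vm" {
  name        = var.name
  target_node = var.target_node
  cores       = var.specs.cores
  memory      = var.specs.memory

  disks {
    scsi {
      scsi0 {
        disk {
          storage = "local-lvm"    # assumed pool name
          size    = var.specs.disk # "50G" unless the caller sets specs.disk
        }
      }
    }
  }
}
```

With this in place, the `module "web"` call from the usage example gets a 50G disk without mentioning `disk` at all.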
## Complex Variable Types
```hcl
# Map of objects
variable "vms" {
  type = map(object({
    node   = string
    cores  = number
    memory = number
  }))
}
# Object with optional fields
variable "network" {
  type = object({
    bridge = string
    vlan   = optional(number)
    ip     = optional(string, "dhcp")
  })
}
```
```
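
To make the behavior of the optional fields concrete, here is a hedged sketch of a value for `network` and one way to consume it. The bridge name and the `ipconfig0` local are illustrative assumptions, not part of the original module.

```hcl
# terraform.tfvars (hypothetical values)
# network = {
#   bridge = "vmbr0"
#   # vlan omitted -> null
#   # ip omitted   -> "dhcp" (the declared default)
# }

# main.tf: turn the object into a cloud-init ipconfig string
locals {
  ipconfig0 = var.network.ip == "dhcp" ? "ip=dhcp" : "ip=${var.network.ip}"
}
```

Omitted optional attributes without a default become `null`, so a consumer should handle `var.network.vlan == null` explicitly.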
## Variable Validation
```hcl
variable "environment" {
  type = string
  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "Must be dev, staging, or prod."
  }
}
variable "cores" {
  type = number
  validation {
    condition     = var.cores >= 1 && var.cores <= 32
    error_message = "Cores must be 1-32."
  }
}
```
```
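
Pattern checks follow the same shape as the examples above. This is a sketch using `can(regex(...))`; the variable name `vm_name` and the naming rule itself are assumptions chosen for illustration.

```hcl
variable "vm_name" {
  description = "Hostname-style VM name (hypothetical rule)"
  type        = string

  validation {
    # can() turns a regex() failure into false instead of an error
    condition     = can(regex("^[a-z][a-z0-9-]{0,62}$", var.vm_name))
    error_message = "Name must start with a lowercase letter and contain only lowercase letters, digits, and hyphens."
  }
}
```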
## Module Composition
```hcl
module "network" {
  source = "../../modules/network"
  # ...
}
module "web" {
  source     = "../../modules/vm"
  network_id = module.network.id # Implicit dependency
}
module "database" {
  source     = "../../modules/vm"
  depends_on = [module.network] # Explicit dependency
}
```
```
## for_each vs count
```hcl
# count - index-based (0, 1, 2)
module "worker" {
  source = "./modules/vm"
  count  = 3
  name   = "worker-${count.index}"
  # ...
}
# Access: module.worker[0]
# for_each - key-based (preferred: adding or removing a key leaves the other instances untouched)
module "vm" {
  source      = "./modules/vm"
  for_each    = var.vms
  name        = each.key
  target_node = each.value.node
  specs = {
    cores  = each.value.cores
    memory = each.value.memory
  }
}
# Access: module.vm["web"]
```
```
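
Once a module uses `for_each`, its instances form a map that can be iterated in outputs. A minimal sketch, assuming the `ip` output from the module example above; the output names are hypothetical.

```hcl
# Map of VM name to IP address, built from every for_each instance
output "vm_ips" {
  description = "IP address per VM, keyed by the for_each key"
  value       = { for name, vm in module.vm : name => vm.ip }
}

# count-based instances form a list and are addressed by index
output "first_worker_ip" {
  value = module.worker[0].ip
}
```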
## Version Constraints
```hcl
# modules/vm/versions.tf
terraform {
  required_version = ">= 1.3" # optional() defaults in variables.tf need 1.3+
  required_providers {
    proxmox = {
      source  = "telmate/proxmox"
      version = "~> 3.0"
    }
  }
}
```
```hcl
# Pin module version
module "vm" {
  source = "git::https://github.com/org/modules.git//vm?ref=v2.1.0"
}
```


@@ -0,0 +1,44 @@
# Proxmox Provider Authentication
## Provider Configuration
```hcl
terraform {
  required_providers {
    proxmox = {
      source  = "telmate/proxmox"
      version = "~> 3.0"
    }
  }
}
provider "proxmox" {
  pm_api_url          = "https://proxmox.example.com:8006/api2/json"
  pm_api_token_id     = "terraform@pve!mytoken"
  pm_api_token_secret = var.pm_api_token_secret
  pm_tls_insecure     = false # true for self-signed certs
  pm_parallel         = 4     # concurrent operations
  pm_timeout          = 600   # API timeout seconds
}
```
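
The provider block reads `var.pm_api_token_secret`, which the excerpt does not declare. A minimal sketch of a matching declaration follows; the description text is an assumption.

```hcl
# variables.tf
variable "pm_api_token_secret" {
  description = "Secret part of the Proxmox API token"
  type        = string
  sensitive   = true # redacted in plan/apply output
}
```

The value can then be supplied outside version control, for example by exporting `TF_VAR_pm_api_token_secret` before running Terraform.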
## Create API Token
```bash
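pveum user add terraform@pve
pveum aclmod / -user terraform@pve -role PVEAdmin
# --privsep 0: the token inherits the user's permissions. With the default
# (privsep=1) the token needs its own ACL entry before it can do anything.
pveum user token add terraform@pve mytoken --privsep 0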
```
## Environment Variables
```bash
export PM_API_TOKEN_ID="terraform@pve!mytoken"
export PM_API_TOKEN_SECRET="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
```
## Official Resources
- [Provider Docs](https://registry.terraform.io/providers/Telmate/proxmox/latest/docs)
- [GitHub](https://github.com/Telmate/terraform-provider-proxmox)
- [Proxmox API](https://pve.proxmox.com/pve-docs/api-viewer/)


@@ -0,0 +1,86 @@
# Proxmox Provider Gotchas
Critical issues when using Telmate Proxmox provider with Terraform.
## 1. Cloud-Init Changes Not Tracked
Terraform does **not** detect changes to cloud-init snippet file contents.
```hcl
# PROBLEM: Changing vendor-data.yml won't trigger replacement
resource "proxmox_vm_qemu" "vm" {
  cicustom = "vendor=local:snippets/vendor-data.yml"
}
# SOLUTION: Use replace_triggered_by
resource "local_file" "vendor_data" {
  filename = "vendor-data.yml"
  content  = templatefile("vendor-data.yml.tftpl", { ... })
}
resource "proxmox_vm_qemu" "vm" {
  cicustom = "vendor=local:snippets/vendor-data.yml"
  lifecycle {
    replace_triggered_by = [
      local_file.vendor_data.content_base64sha256
    ]
  }
}
```
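
When the snippet is a static file rather than a rendered template, the same trigger can be built without `local_file`. This is a sketch using the built-in `terraform_data` resource (Terraform 1.4+); it assumes `vendor-data.yml` sits next to the configuration, and the resource name `vendor_data_hash` is hypothetical.

```hcl
# Replaced whenever the content hash of the snippet changes
resource "terraform_data" "vendor_data_hash" {
  triggers_replace = filesha256("${path.module}/vendor-data.yml")
}

resource "proxmox_vm_qemu" "vm" {
  # ... other arguments ...
  cicustom = "vendor=local:snippets/vendor-data.yml"

  lifecycle {
    # Replacing the hash holder forces replacement of the VM
    replace_triggered_by = [terraform_data.vendor_data_hash]
  }
}
```

Either way, Terraform only tracks the local copy; uploading the file to the snippets storage on the Proxmox node remains a separate step.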
## 2. Storage Type vs Storage Pool
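The `storage` argument names a storage pool. The disk bus and the SCSI controller are separate settings, so don't confuse them:
```hcl
disks {
  scsi {
    scsi0 {
      disk {
        storage = "local-lvm" # Pool NAME (from Proxmox datacenter)
        size    = "50G"
      }
    }
  }
}
scsihw = "virtio-scsi-single" # Controller TYPE
```
- **Storage pool** = Where the disk data is stored (local-lvm, ceph-pool, nfs-share)
- **Disk type** = Bus the disk attaches to (scsi, virtio, ide, sata)
- **`scsihw`** = SCSI controller model presented to the guest (e.g. virtio-scsi-single)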
## 3. Network Interface Naming
Linux guests that use predictable interface naming typically name Proxmox NICs by device order:
| NIC Order | Guest Name |
|-----------|------------|
| First | ens18 |
| Second | ens19 |
| Third | ens20 |
**NOT** eth0, eth1. Configure cloud-init netplan matching `ens*`. Exact names depend on the guest OS and machine type, so confirm with `ip link` inside the VM.
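
A network config that matches `ens*` can be generated from HCL with `yamlencode`. This is a sketch under assumptions: it uses cloud-init network config version 2 (netplan-style) syntax, the interface ID `primary` is arbitrary, and whether your image expects a top-level `network:` key should be checked against its cloud-init version.

```hcl
locals {
  # Netplan-style config: every interface whose name matches ens* uses DHCP
  network_config = yamlencode({
    version = 2
    ethernets = {
      primary = {
        match = { name = "ens*" }
        dhcp4 = true
      }
    }
  })
}

# Inspect the rendered YAML with `terraform output network_config`
output "network_config" {
  value = local.network_config
}
```

The glob applies to all matching NICs, so a VM with several interfaces would need one entry per interface name instead.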
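## 4. API Timeouts and Credential Lifetime
Long operations (20+ VMs) can outlast the provider's API timeout, and password logins rely on session tickets that expire. Raise `pm_timeout` for large runs: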
```hcl
provider "proxmox" {
  pm_api_token_id     = "terraform@pve!mytoken"
  pm_api_token_secret = var.pm_api_token_secret
  pm_timeout          = 1200 # 20 minutes for large operations
}
```
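Use API tokens rather than passwords: a token has no expiry unless one is set, while a password login depends on a session ticket with a limited lifetime.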
## 5. Full Clone vs Linked Clone
```hcl
full_clone = true  # Independent copy - safe, slower, more storage
full_clone = false # Linked clone - shares the template's base disk, so the clone depends on the template staying intact
```
**Always use `full_clone = true` for production.** Linked clones only for disposable test VMs.


@@ -0,0 +1,66 @@
# Proxmox Troubleshooting
## VM Creation Stuck
```
Timeout waiting for VM to be created
```
**Causes**: Template missing, storage full, network unreachable
**Debug**: Check Proxmox task log in web UI
## Clone Failed
```
VM template not found
```
**Check**: `qm list | grep template-name`
**Causes**: Template doesn't exist, wrong node, permission issue
## SSH Timeout
```
Timeout waiting for SSH
```
**Debug**:
1. VM console in Proxmox UI
2. `cloud-init status` on VM
3. `ip addr` to verify network
**Causes**: Cloud-init failed, network misconfigured, firewall
## State Drift
```
Plan shows changes for unchanged resources
```
**Causes**: Manual changes in Proxmox UI, provider bug
**Fix**:
```bash
terraform apply -refresh-only # Sync state with real infrastructure (`terraform refresh` is deprecated)
terraform plan # Verify
```
## API Errors
```
500 Internal Server Error
```
**Causes**: Invalid config, resource constraints, API timeout
**Debug**: Check `/var/log/pveproxy/access.log` on Proxmox node
## Permission Denied
```
Permission check failed
```
**Fix**: Verify API token has required permissions:
```bash
pveum acl list
pveum user permissions terraform@pve
```


@@ -0,0 +1,86 @@
# proxmox_vm_qemu Resource
## Basic VM from Template
```hcl
resource "proxmox_vm_qemu" "vm" {
  name        = "my-vm"
  target_node = "pve1"
  clone       = "ubuntu-template"
  full_clone  = true
  cores       = 4
  sockets     = 1
  memory      = 8192
  cpu         = "host"
  onboot      = true
  agent       = 1 # QEMU guest agent
  scsihw      = "virtio-scsi-single"
  disks {
    scsi {
      scsi0 {
        disk {
          storage = "local-lvm"
          size    = "50G"
        }
      }
    }
  }
  network {
    bridge = "vmbr0"
    model  = "virtio"
  }
  # Cloud-init
  os_type   = "cloud-init"
  ciuser    = "ubuntu"
  sshkeys   = var.ssh_public_key
  ipconfig0 = "ip=dhcp"
  # Static: ipconfig0 = "ip=192.168.1.10/24,gw=192.168.1.1"
  # Custom cloud-init
  cicustom = "vendor=local:snippets/vendor-data.yml"
}
```
## Lifecycle Management
```hcl
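lifecycle {
  # Options are shown together for reference; pick the ones you need.
  # prevent_destroy also blocks replacement, so it conflicts with
  # replace_triggered_by on the same resource.
  prevent_destroy = true # Block accidental deletion
  ignore_changes = [
    network, # Ignore manual changes
  ]
  replace_triggered_by = [
    local_file.cloud_init.content_base64sha256
  ]
  create_before_destroy = true # Blue-green deployment
}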
```
## Multiple VMs with for_each
```hcl
variable "vms" {
  type = map(object({
    node   = string
    cores  = number
    memory = number
  }))
}
resource "proxmox_vm_qemu" "vm" {
  for_each    = var.vms
  name        = each.key
  target_node = each.value.node
  cores       = each.value.cores
  memory      = each.value.memory
  # ...
}
```
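
For reference, a value for `var.vms` could look like the sketch below. The VM names, node names, and sizes are hypothetical.

```hcl
# terraform.tfvars (hypothetical values)
vms = {
  web = { node = "pve1", cores = 2, memory = 4096 }
  db  = { node = "pve2", cores = 4, memory = 8192 }
}
```

With this input the resource expands to `proxmox_vm_qemu.vm["web"]` and `proxmox_vm_qemu.vm["db"]`; removing the `db` entry later plans the destruction of only that instance.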


@@ -0,0 +1,92 @@
# Security
## Secrets Management
### Environment Variables (Recommended)
```bash
export TF_VAR_proxmox_password="secret"
export TF_VAR_api_token="xxxxx"
terraform apply
```
### Sensitive Variables
```hcl
variable "database_password" {
  type      = string
  sensitive = true # Redacted in plan/apply output; still stored in plaintext in state
}
```
### External Secrets Managers
**HashiCorp Vault**:
```hcl
data "vault_generic_secret" "db" {
  path = "secret/database"
}
resource "some_resource" "x" {
  password = data.vault_generic_secret.db.data["password"]
}
```
**1Password CLI**:
```bash
export TF_VAR_password="$(op read 'op://vault/item/password')"
terraform apply
```
## State Security
**CRITICAL**: State contains secrets in plaintext.
### Encrypt at Rest
```hcl
terraform {
  backend "s3" {
    # bucket, key, and region omitted here
    encrypt    = true              # Server-side encryption of the state object
    kms_key_id = "arn:aws:kms:..." # Optional KMS
  }
}
```
### Restrict Access
- IAM/RBAC on backend storage
- Enable state locking
- Never commit state to git
## Provider Credentials
```hcl
provider "proxmox" {
  pm_api_token_id     = "terraform@pve!mytoken"
  pm_api_token_secret = var.pm_api_token_secret # From env
}
```
Create minimal-permission API user:
```bash
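pveum user add terraform@pve
pveum aclmod / -user terraform@pve -role PVEVMAdmin
# --privsep 0 lets the token inherit the user's ACL. For tighter scoping,
# omit it and grant the token its own ACL entry instead.
pveum user token add terraform@pve mytoken --privsep 0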
```
## Sensitive Outputs
```hcl
output "db_password" {
  value     = random_password.db.result
  sensitive = true
}
```
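
The output above refers to `random_password.db`, which the excerpt does not define. A minimal sketch of such a resource, assuming the `hashicorp/random` provider; the length is an arbitrary choice.

```hcl
terraform {
  required_providers {
    random = {
      source = "hashicorp/random"
    }
  }
}

# Generated once and kept in state; the result is treated as sensitive
resource "random_password" "db" {
  length  = 24
  special = true
}
```

Marking the output `sensitive` only hides it in CLI output. The generated password is still stored in state, so the state security measures above still apply.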
## Checklist
- [ ] Sensitive vars marked `sensitive = true`
- [ ] Secrets via env vars or secrets manager
- [ ] State backend encryption enabled
- [ ] State locking enabled
- [ ] No credentials in .tf files
- [ ] Provider credentials minimal permissions


@@ -0,0 +1,112 @@
# State Management
## Remote Backend (Recommended)
```hcl
terraform {
  backend "s3" {
    bucket         = "terraform-state"
    key            = "project/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks" # State locking
  }
}
```
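
Backend settings can also be kept out of the main configuration with a partial backend block. The sketch below shows two files in one listing; the file name `backend.hcl` is an assumption.

```hcl
# main.tf: empty backend block, filled in at init time
terraform {
  backend "s3" {}
}

# backend.hcl (separate file, hypothetical name):
# bucket  = "terraform-state"
# key     = "project/terraform.tfstate"
# region  = "us-east-1"
# encrypt = true
```

Initialize with `terraform init -backend-config=backend.hcl`. This keeps environment-specific values in one file per environment.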
### S3-Compatible (MinIO, Ceph)
```hcl
terraform {
  backend "s3" {
    bucket = "terraform-state"
    key    = "project/terraform.tfstate"
    region = "us-east-1" # Required but ignored
    # Terraform 1.6+ deprecates `endpoint` and `force_path_style` in favor of
    # `endpoints = { s3 = "..." }` and `use_path_style`.
    endpoint                    = "https://minio.example.com"
    skip_credentials_validation = true
    skip_metadata_api_check     = true
    skip_region_validation      = true
    force_path_style            = true
  }
}
```
## State Operations
```bash
# List resources
terraform state list
terraform state list | grep '^proxmox_vm_qemu\.'
# Show resource details
terraform state show proxmox_vm_qemu.web
# Rename resource
terraform state mv proxmox_vm_qemu.old proxmox_vm_qemu.new
# Move to module
terraform state mv proxmox_vm_qemu.web module.web.proxmox_vm_qemu.main
# Remove from state (doesn't destroy)
terraform state rm proxmox_vm_qemu.orphaned
# Import existing resource
terraform import proxmox_vm_qemu.web pve1/qemu/100
# Update state from infrastructure
terraform apply -refresh-only # `terraform refresh` is deprecated
```
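
Several of the state commands above have declarative counterparts that go through the normal plan and review cycle. This is a sketch using the same resource addresses as the commands; it assumes resource blocks for `proxmox_vm_qemu.new` and `proxmox_vm_qemu.web` exist in the configuration and that the block for `proxmox_vm_qemu.orphaned` has been deleted.

```hcl
# Rename without `terraform state mv` (Terraform 1.1+)
moved {
  from = proxmox_vm_qemu.old
  to   = proxmox_vm_qemu.new
}

# Import an existing VM as part of plan/apply (Terraform 1.5+)
import {
  to = proxmox_vm_qemu.web
  id = "pve1/qemu/100"
}

# Drop a resource from state without destroying it (Terraform 1.7+)
removed {
  from = proxmox_vm_qemu.orphaned

  lifecycle {
    destroy = false
  }
}
```

Because these blocks live in version control, teammates see the same refactoring in their own plans instead of relying on someone having run a one-off command.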
## State Migration
```bash
# Change backend: update the backend block in the terraform block, then:
terraform init -migrate-state
# Reinitialize without migration
terraform init -reconfigure
```
## State Locking
Prevents concurrent modifications. Enable via backend config:
- S3: `dynamodb_table`
- Consul: Built-in
- HTTP: `lock_address`
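
Recent Terraform releases also offer S3-native locking through a lock file stored next to the state object, which removes the DynamoDB table. A sketch, assuming Terraform 1.10 or newer and the same bucket as the earlier example:

```hcl
terraform {
  backend "s3" {
    bucket       = "terraform-state"
    key          = "project/terraform.tfstate"
    region       = "us-east-1"
    encrypt      = true
    use_lockfile = true # S3-native lock file instead of dynamodb_table
  }
}
```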
### Force Unlock (Emergency)
```bash
# Only when you are certain no other operation is running
terraform force-unlock LOCK_ID
```
## Troubleshooting
### State Lock Timeout
```
Error: Error acquiring state lock
```
1. Wait for the other operation to finish
2. Verify that no Terraform process is still running
3. Run `terraform force-unlock LOCK_ID` if it is safe
### State Drift
```
Plan shows unexpected changes
```
```bash
terraform apply -refresh-only # Update state from real infra (`terraform refresh` is deprecated)
terraform plan # Review changes
```
### Corrupted State
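1. Save the current state before changing anything: `terraform state pull > backup.tfstate`
2. Restore a known-good copy from backup, then verify with `terraform plan`
3. Last resort: `terraform state rm` and re-import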