Initial commit

Zhongwei Li
2025-11-30 08:47:38 +08:00
commit 18faa0569e
47 changed files with 7969 additions and 0 deletions

skills/ansible/SKILL.md
---
name: ansible
description: |
Ansible automation reference for playbooks, roles, inventory, variables, and modules.
Includes Proxmox VE and Docker integration via community.general and community.docker collections.
Use when writing playbooks, troubleshooting Ansible runs, or designing automation workflows.
Triggers: ansible, playbook, inventory, role, task, handler, vars, jinja2, galaxy, proxmox_kvm, proxmox_lxc, docker_container, docker_compose.
---
# Ansible Skill
Ansible automation reference for configuration management and application deployment.
## Quick Reference
```bash
# Test connectivity
ansible all -m ping
ansible <group> -m ping
# Run playbook
ansible-playbook playbook.yml
ansible-playbook playbook.yml -l <host> # Limit to host
ansible-playbook playbook.yml --check # Dry-run
ansible-playbook playbook.yml -vvv # Verbose
# Tags
ansible-playbook playbook.yml --tags "deploy"
ansible-playbook playbook.yml --skip-tags "backup"
ansible-playbook playbook.yml --list-tags
# Variables
ansible-playbook playbook.yml -e "var=value"
ansible-playbook playbook.yml -e "@vars.yml"
# Ad-hoc commands
ansible <group> -m shell -a "command"
ansible <group> -m copy -a "src=file dest=/path"
ansible <group> -m apt -a "name=package state=present" -b  # apt needs become
# Galaxy
ansible-galaxy collection install -r requirements.yml
ansible-galaxy role install <role>
```
## Reference Files
Load on-demand based on task:
| Topic | File | When to Load |
|-------|------|--------------|
| Playbook Structure | [playbooks.md](references/playbooks.md) | Writing playbooks |
| Inventory | [inventory.md](references/inventory.md) | Host/group configuration |
| Variables | [variables.md](references/variables.md) | Variable precedence, facts |
| Modules | [modules.md](references/modules.md) | Common module reference |
| Troubleshooting | [troubleshooting.md](references/troubleshooting.md) | Common errors, debugging |
### Proxmox Integration
| Topic | File | When to Load |
|-------|------|--------------|
| Proxmox Modules | [proxmox/modules.md](references/proxmox/modules.md) | VM/LXC management via API |
| Proxmox Auth | [proxmox/authentication.md](references/proxmox/authentication.md) | API tokens, credentials |
| Proxmox Gotchas | [proxmox/gotchas.md](references/proxmox/gotchas.md) | Common issues, workarounds |
| Dynamic Inventory | [proxmox/dynamic-inventory.md](references/proxmox/dynamic-inventory.md) | Auto-discover VMs/containers |
### Docker Integration
| Topic | File | When to Load |
|-------|------|--------------|
| Docker Deployment | [docker/deployment.md](references/docker/deployment.md) | Containers, images, networks, volumes |
| Compose Patterns | [docker/compose-patterns.md](references/docker/compose-patterns.md) | Roles, templates, multi-service stacks |
| Docker Troubleshooting | [docker/troubleshooting.md](references/docker/troubleshooting.md) | Common errors, debugging |
## Playbook Quick Reference
```yaml
---
- name: Deploy application
hosts: webservers
become: true
vars:
app_port: 8080
pre_tasks:
- name: Validate requirements
ansible.builtin.assert:
that:
- app_secret is defined
tasks:
- name: Install packages
ansible.builtin.apt:
name: "{{ item }}"
state: present
loop:
- nginx
- python3
- name: Deploy config
ansible.builtin.template:
src: app.conf.j2
dest: /etc/app/app.conf
notify: Restart app
handlers:
- name: Restart app
ansible.builtin.service:
name: app
state: restarted
post_tasks:
- name: Verify deployment
ansible.builtin.uri:
url: "http://localhost:{{ app_port }}/health"
```
## Variable Precedence (High to Low)
1. Extra vars (`-e "var=value"`)
2. Task vars
3. Block vars
4. Role/include vars
5. Play vars
6. Host facts
7. host_vars/
8. group_vars/
9. Role defaults
## Directory Structure
```text
ansible/
├── ansible.cfg # Configuration
├── inventory/
│ └── hosts.yml # Inventory
├── group_vars/
│ ├── all.yml # All hosts
│ └── webservers.yml # Group-specific
├── host_vars/
│ └── server1.yml # Host-specific
├── roles/
│ └── app/
│ ├── tasks/
│ ├── handlers/
│ ├── templates/
│ ├── files/
│ └── defaults/
├── playbooks/
│ └── deploy.yml
├── templates/
│ └── config.j2
└── requirements.yml # Galaxy dependencies
```
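The `requirements.yml` above pins Galaxy dependencies. A minimal sketch (version bounds are illustrative, not prescriptive):
```yaml
---
collections:
  - name: community.general   # Proxmox modules
    version: ">=8.0.0"
  - name: community.docker    # Docker modules
    version: ">=3.6.0"
```
Install with `ansible-galaxy collection install -r requirements.yml`.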
## Idempotency Checklist
- [ ] Tasks produce same result on repeated runs
- [ ] No `changed_when: true` unless necessary
- [ ] Use `state: present/absent` not `shell` commands
- [ ] Check mode (`--check`) shows accurate changes
- [ ] Second run shows all "ok" (no changes)

skills/ansible/references/docker/compose-patterns.md
# Ansible Docker Compose Patterns
Common patterns for managing Docker Compose stacks with Ansible.
## Project Structure
```text
roles/
└── docker_app/
├── tasks/
│ └── main.yml
├── templates/
│ ├── docker-compose.yml.j2
│ └── .env.j2
├── defaults/
│ └── main.yml
└── handlers/
└── main.yml
```
## Role Template
### defaults/main.yml
```yaml
app_name: myapp
app_version: latest
app_port: 8080
app_data_dir: "/opt/{{ app_name }}"
# Compose settings
compose_pull: always
compose_recreate: auto # auto, always, never
# Resource limits
app_memory_limit: 512M
app_cpu_limit: 1.0
```
### templates/docker-compose.yml.j2
```yaml
name: {{ app_name }}
services:
app:
image: {{ app_image }}:{{ app_version }}
container_name: {{ app_name }}
restart: unless-stopped
ports:
- "{{ app_port }}:{{ app_internal_port | default(app_port) }}"
volumes:
- {{ app_data_dir }}/data:/app/data
{% if app_config_file is defined %}
- {{ app_data_dir }}/config:/app/config:ro
{% endif %}
environment:
TZ: {{ timezone | default('UTC') }}
{% for key, value in (app_env | default({})).items() %}
{{ key }}: "{{ value }}"
{% endfor %}
{% if app_memory_limit is defined or app_cpu_limit is defined %}
deploy:
resources:
limits:
{% if app_memory_limit is defined %}
memory: {{ app_memory_limit }}
{% endif %}
{% if app_cpu_limit is defined %}
cpus: '{{ app_cpu_limit }}'
{% endif %}
{% endif %}
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:{{ app_internal_port | default(app_port) }}/health"]
interval: 30s
timeout: 10s
retries: 3
start_period: 40s
networks:
- {{ app_network | default('default') }}
{% if app_network is defined %}
networks:
{{ app_network }}:
external: true
{% endif %}
```
### tasks/main.yml
```yaml
---
- name: Create application directory
ansible.builtin.file:
path: "{{ app_data_dir }}"
state: directory
owner: "{{ ansible_user }}"
group: "{{ ansible_user }}"
mode: '0755'
- name: Create data directories
ansible.builtin.file:
path: "{{ app_data_dir }}/{{ item }}"
state: directory
owner: "{{ ansible_user }}"
mode: '0755'
loop:
- data
- config
- name: Deploy compose file
ansible.builtin.template:
src: docker-compose.yml.j2
dest: "{{ app_data_dir }}/docker-compose.yml"
owner: "{{ ansible_user }}"
mode: '0644'
notify: Redeploy stack
- name: Deploy environment file
ansible.builtin.template:
src: .env.j2
dest: "{{ app_data_dir }}/.env"
owner: "{{ ansible_user }}"
mode: '0600'
notify: Redeploy stack
when: app_secrets is defined
- name: Ensure stack is running
community.docker.docker_compose_v2:
project_src: "{{ app_data_dir }}"
state: present
pull: "{{ compose_pull }}"
recreate: "{{ compose_recreate }}"
register: compose_result
- name: Show deployment result
ansible.builtin.debug:
msg: "Deployed {{ compose_result.containers | length }} containers"
when: compose_result is changed
```
### handlers/main.yml
```yaml
---
- name: Redeploy stack
community.docker.docker_compose_v2:
project_src: "{{ app_data_dir }}"
state: present
pull: always
recreate: always
```
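A play that applies the role might look like this (a sketch; the host group and image name are placeholders):
```yaml
---
- name: Deploy myapp via the docker_app role
  hosts: docker_hosts
  become: true
  roles:
    - role: docker_app
      vars:
        app_name: myapp
        app_image: ghcr.io/example/myapp  # placeholder image
        app_version: "1.2.3"
        app_env:
          LOG_LEVEL: info
```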
## Multi-Service Stack
### templates/docker-compose.yml.j2 (full stack)
```yaml
name: {{ stack_name }}
services:
app:
image: {{ app_image }}:{{ app_version }}
restart: unless-stopped
depends_on:
db:
condition: service_healthy
redis:
condition: service_started
environment:
DATABASE_URL: "postgres://{{ db_user }}:{{ db_password }}@db:5432/{{ db_name }}"
REDIS_URL: "redis://redis:6379"
networks:
- internal
- web
db:
image: postgres:15
restart: unless-stopped
volumes:
- db_data:/var/lib/postgresql/data
environment:
POSTGRES_USER: {{ db_user }}
POSTGRES_PASSWORD: {{ db_password }}
POSTGRES_DB: {{ db_name }}
healthcheck:
test: ["CMD-SHELL", "pg_isready -U {{ db_user }}"]
interval: 5s
timeout: 5s
retries: 5
networks:
- internal
redis:
image: redis:7-alpine
restart: unless-stopped
volumes:
- redis_data:/data
networks:
- internal
nginx:
image: nginx:alpine
restart: unless-stopped
ports:
- "{{ http_port | default(80) }}:80"
- "{{ https_port | default(443) }}:443"
volumes:
- {{ app_data_dir }}/nginx/conf.d:/etc/nginx/conf.d:ro
- {{ app_data_dir }}/nginx/ssl:/etc/nginx/ssl:ro
depends_on:
- app
networks:
- web
networks:
internal:
driver: bridge
web:
driver: bridge
volumes:
db_data:
redis_data:
```
## Zero-Downtime Update
```yaml
- name: Zero-downtime update
hosts: docker_hosts
serial: 1 # One host at a time
tasks:
- name: Pull new image
community.docker.docker_image:
name: "{{ app_image }}"
tag: "{{ app_version }}"
source: pull
    - name: Drain connections (if load balanced)
      ansible.builtin.debug:  # Placeholder: replace with your LB removal step
        msg: "Remove {{ inventory_hostname }} from the load balancer"
- name: Update stack
community.docker.docker_compose_v2:
project_src: "{{ app_data_dir }}"
state: present
recreate: always
- name: Wait for health
ansible.builtin.uri:
url: "http://localhost:{{ app_port }}/health"
status_code: 200
register: health
until: health.status == 200
retries: 30
delay: 2
    - name: Restore to load balancer
      ansible.builtin.debug:  # Placeholder: replace with your LB re-add step
        msg: "Add {{ inventory_hostname }} back to the load balancer"
```
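To abort the rollout as soon as any host fails instead of continuing through the remaining batches, cap the failure percentage at the play level (a sketch):
```yaml
- name: Zero-downtime update
  hosts: docker_hosts
  serial: 1
  max_fail_percentage: 0  # Any failed host stops the remaining batches
```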
## Secrets Management
### With ansible-vault
```yaml
# group_vars/secrets.yml (encrypted)
app_secrets:
DB_PASSWORD: supersecret
API_KEY: abc123
JWT_SECRET: longsecret
```
```yaml
# templates/.env.j2
{% for key, value in app_secrets.items() %}
{{ key }}={{ value }}
{% endfor %}
```
### With external secrets
```yaml
- name: Fetch secret from 1Password
ansible.builtin.set_fact:
db_password: "{{ lookup('community.general.onepassword', 'database', field='password') }}"
- name: Deploy with secret
community.docker.docker_compose_v2:
project_src: "{{ app_data_dir }}"
env_files:
- "{{ app_data_dir }}/.env"
state: present
```

skills/ansible/references/docker/deployment.md
# Docker Deployment with Ansible
Managing Docker containers and compose stacks via Ansible.
## Collection Setup
```bash
ansible-galaxy collection install community.docker
```
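Most `community.docker` modules (e.g. `docker_container`, `docker_image`) also need the Docker SDK for Python on the managed host; `docker_compose_v2` is an exception, since it drives the `docker compose` CLI. A prerequisite sketch:
```yaml
- name: Install Docker SDK for Python (needed by docker_container and friends)
  ansible.builtin.pip:
    name: docker
    state: present
```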
## Compose Deployment (Recommended)
### Deploy from local compose file
```yaml
- name: Deploy application stack
hosts: docker_hosts
become: true
tasks:
- name: Create project directory
ansible.builtin.file:
path: /opt/myapp
state: directory
owner: "{{ ansible_user }}"
mode: '0755'
- name: Copy compose file
ansible.builtin.template:
src: docker-compose.yml.j2
dest: /opt/myapp/docker-compose.yml
owner: "{{ ansible_user }}"
mode: '0644'
- name: Copy environment file
ansible.builtin.template:
src: .env.j2
dest: /opt/myapp/.env
owner: "{{ ansible_user }}"
mode: '0600'
- name: Deploy with compose
community.docker.docker_compose_v2:
project_src: /opt/myapp
state: present
pull: always
register: deploy_result
- name: Show deployed services
ansible.builtin.debug:
var: deploy_result.containers
```
### Compose operations
```yaml
# Pull latest images and recreate
- name: Update stack
community.docker.docker_compose_v2:
project_src: /opt/myapp
state: present
pull: always
recreate: always
# Stop stack (keep volumes)
- name: Stop stack
community.docker.docker_compose_v2:
project_src: /opt/myapp
state: stopped
# Remove stack
- name: Remove stack
community.docker.docker_compose_v2:
project_src: /opt/myapp
state: absent
remove_volumes: false # Keep data volumes
```
## Container Deployment (Individual)
### Run container
```yaml
- name: Run nginx container
community.docker.docker_container:
name: nginx
image: nginx:1.25
state: started
restart_policy: unless-stopped
ports:
- "80:80"
- "443:443"
volumes:
- /opt/nginx/html:/usr/share/nginx/html:ro
- /opt/nginx/conf.d:/etc/nginx/conf.d:ro
env:
TZ: "America/Los_Angeles"
labels:
app: web
env: production
- name: Run database
community.docker.docker_container:
name: postgres
image: postgres:15
state: started
restart_policy: unless-stopped
ports:
- "5432:5432"
volumes:
- postgres_data:/var/lib/postgresql/data
env:
POSTGRES_USER: "{{ db_user }}"
POSTGRES_PASSWORD: "{{ db_password }}"
POSTGRES_DB: "{{ db_name }}"
healthcheck:
test: ["CMD-SHELL", "pg_isready -U {{ db_user }}"]
interval: 10s
timeout: 5s
retries: 5
```
### Container lifecycle
```yaml
# Stop container
- name: Stop container
community.docker.docker_container:
name: myapp
state: stopped
# Restart container
- name: Restart container
community.docker.docker_container:
name: myapp
state: started
restart: true
# Remove container
- name: Remove container
community.docker.docker_container:
name: myapp
state: absent
# Force recreate
- name: Recreate container
community.docker.docker_container:
name: myapp
image: myapp:latest
state: started
recreate: true
```
## Image Management
```yaml
# Pull image
- name: Pull latest image
community.docker.docker_image:
name: myapp
tag: latest
source: pull
force_source: true # Always check for updates
# Build from Dockerfile
- name: Build image
community.docker.docker_image:
name: myapp
tag: "{{ version }}"
source: build
build:
path: /opt/myapp
dockerfile: Dockerfile
pull: true # Pull base image updates
# Remove image
- name: Remove old images
community.docker.docker_image:
name: myapp
tag: old
state: absent
```
## Network Management
```yaml
# Create network
- name: Create app network
community.docker.docker_network:
name: app_network
driver: bridge
ipam_config:
- subnet: 172.20.0.0/16
gateway: 172.20.0.1
# Create macvlan network
- name: Create macvlan network
community.docker.docker_network:
name: lan
driver: macvlan
driver_options:
parent: eth0
ipam_config:
- subnet: 192.168.1.0/24
gateway: 192.168.1.1
# Attach container to network
- name: Run container on network
community.docker.docker_container:
name: myapp
image: myapp:latest
networks:
- name: app_network
ipv4_address: 172.20.0.10
```
## Volume Management
```yaml
# Create named volume
- name: Create data volume
community.docker.docker_volume:
name: app_data
driver: local
# Create volume with options
- name: Create NFS volume
community.docker.docker_volume:
name: shared_data
driver: local
driver_options:
type: nfs
device: ":/exports/data"
o: "addr=192.168.1.10,rw"
# Backup volume
- name: Backup volume
community.docker.docker_container:
name: backup
image: alpine
command: tar czf /backup/data.tar.gz /data
volumes:
- app_data:/data:ro
- /opt/backups:/backup
auto_remove: true
```
## Common Patterns
### Wait for service health
```yaml
- name: Deploy database
community.docker.docker_container:
name: postgres
image: postgres:15
# ... config ...
- name: Wait for database
community.docker.docker_container_info:
name: postgres
register: db_info
until: db_info.container.State.Health.Status == "healthy"
retries: 30
delay: 2
```
### Rolling update
```yaml
- name: Pull new image
community.docker.docker_image:
name: myapp
tag: "{{ new_version }}"
source: pull
- name: Update container
community.docker.docker_container:
name: myapp
image: "myapp:{{ new_version }}"
state: started
recreate: true
restart_policy: unless-stopped
```
### Cleanup
```yaml
- name: Remove stopped containers
community.docker.docker_prune:
containers: true
containers_filters:
status: exited
- name: Remove unused images
community.docker.docker_prune:
images: true
images_filters:
dangling: true
- name: Full cleanup (careful!)
community.docker.docker_prune:
containers: true
images: true
networks: true
volumes: false # Don't remove data!
builder_cache: true
```

skills/ansible/references/docker/troubleshooting.md
# Ansible Docker Troubleshooting
Common issues and debugging patterns.
## Module Issues
### "Could not find docker-compose"
```yaml
# docker_compose_v2 requires Docker Compose V2 (plugin)
# NOT standalone docker-compose binary
# Check on target host:
# docker compose version # V2 (plugin)
# docker-compose version # V1 (standalone) - won't work
```
Fix: Install Docker Compose V2:
```yaml
- name: Install Docker Compose plugin
ansible.builtin.apt:
name: docker-compose-plugin
state: present
```
### "Permission denied"
```yaml
# User not in docker group
- name: Add user to docker group
ansible.builtin.user:
name: "{{ ansible_user }}"
groups: docker
append: true
become: true
# Then reconnect or use become
- name: Run with become
community.docker.docker_container:
name: myapp
# ...
become: true
```
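Group membership only applies to new login sessions, so drop the current SSH connection before the next Docker task (sketch):
```yaml
- name: Re-login so the docker group membership takes effect
  ansible.builtin.meta: reset_connection
```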
### "Cannot connect to Docker daemon"
```yaml
# Docker not running
- name: Ensure Docker is running
ansible.builtin.service:
name: docker
state: started
enabled: true
become: true
# Socket permission issue
# Add become: true to docker tasks
```
## Container Issues
### Get container logs
```yaml
- name: Get logs
community.docker.docker_container_exec:
container: myapp
command: cat /var/log/app.log
register: logs
ignore_errors: true
- name: Alternative - docker logs
ansible.builtin.command: docker logs --tail 100 myapp
register: docker_logs
changed_when: false
- name: Show logs
ansible.builtin.debug:
var: docker_logs.stdout_lines
```
### Container keeps restarting
```yaml
- name: Get container info
community.docker.docker_container_info:
name: myapp
register: container_info
- name: Show restart count
ansible.builtin.debug:
msg: "Restart count: {{ container_info.container.RestartCount }}"
- name: Show last exit code
ansible.builtin.debug:
msg: "Exit code: {{ container_info.container.State.ExitCode }}"
- name: Get logs from dead container
ansible.builtin.command: docker logs myapp
register: crash_logs
changed_when: false
- name: Show crash logs
ansible.builtin.debug:
var: crash_logs.stderr_lines
```
### Health check failing
```yaml
- name: Check health status
community.docker.docker_container_info:
name: myapp
register: info
- name: Show health
ansible.builtin.debug:
msg: |
Status: {{ info.container.State.Health.Status }}
Failing: {{ info.container.State.Health.FailingStreak }}
Log: {{ info.container.State.Health.Log | last }}
# Manual health check
- name: Test health endpoint
ansible.builtin.command: >
docker exec myapp curl -f http://localhost:8080/health
register: health
ignore_errors: true
changed_when: false
```
## Network Issues
### Container can't reach external network
```yaml
- name: Test DNS from container
ansible.builtin.command: docker exec myapp nslookup google.com
register: dns_test
changed_when: false
ignore_errors: true
- name: Test connectivity
ansible.builtin.command: docker exec myapp ping -c 1 8.8.8.8
register: ping_test
changed_when: false
ignore_errors: true
# Check iptables
- name: Check IP forwarding
ansible.builtin.command: sysctl net.ipv4.ip_forward
register: ip_forward
changed_when: false
- name: Enable IP forwarding
ansible.posix.sysctl:
name: net.ipv4.ip_forward
value: '1'
state: present
become: true
when: "'0' in ip_forward.stdout"
```
### Containers can't communicate
```yaml
- name: List networks
community.docker.docker_network_info:
name: "{{ network_name }}"
register: network_info
- name: Show connected containers
ansible.builtin.debug:
var: network_info.network.Containers
# Verify both containers on same network
- name: Test inter-container connectivity
ansible.builtin.command: >
docker exec app ping -c 1 db
register: ping_result
changed_when: false
```
## Compose Issues
### Services not starting in order
```yaml
# depends_on only waits for container start, not readiness
# Use healthcheck + condition
# In compose template:
services:
app:
depends_on:
db:
condition: service_healthy # Wait for health check
db:
healthcheck:
test: ["CMD-SHELL", "pg_isready"]
interval: 5s
timeout: 5s
retries: 5
```
### Orphaned containers
```yaml
# Containers from old compose runs
- name: Remove orphans
community.docker.docker_compose_v2:
project_src: /opt/myapp
state: present
remove_orphans: true
```
### Volume data not persisting
```yaml
# Check volume exists
- name: List volumes
ansible.builtin.command: docker volume ls
register: volumes
changed_when: false
# Check volume contents
- name: Inspect volume
ansible.builtin.command: docker volume inspect myapp_data
register: volume_info
changed_when: false
- name: Show volume mountpoint
ansible.builtin.debug:
msg: "{{ (volume_info.stdout | from_json)[0].Mountpoint }}"
```
## Debug Playbook
```yaml
---
- name: Docker debug
hosts: docker_hosts
tasks:
- name: Docker version
ansible.builtin.command: docker version
register: docker_version
changed_when: false
- name: Compose version
ansible.builtin.command: docker compose version
register: compose_version
changed_when: false
- name: List containers
ansible.builtin.command: docker ps -a
register: containers
changed_when: false
- name: List images
ansible.builtin.command: docker images
register: images
changed_when: false
- name: Disk usage
ansible.builtin.command: docker system df
register: disk
changed_when: false
- name: Show all
ansible.builtin.debug:
msg: |
Docker: {{ docker_version.stdout_lines[0] }}
Compose: {{ compose_version.stdout }}
Containers:
{{ containers.stdout }}
Images:
{{ images.stdout }}
Disk:
{{ disk.stdout }}
```
## Common Error Reference
| Error | Cause | Fix |
|-------|-------|-----|
| `docker.errors.DockerException` | Docker not running | Start docker service |
| `docker.errors.APIError: 404` | Container/image not found | Check name/tag |
| `docker.errors.APIError: 409` | Container name conflict | Remove or rename |
| `PermissionError` | Not in docker group | Add user or use become |
| `requests.exceptions.ConnectionError` | Docker socket inaccessible | Check socket permissions |
| `FileNotFoundError: docker-compose` | Legacy `docker_compose` module needs the V1 binary | Switch to `docker_compose_v2` with the Compose plugin |

skills/ansible/references/inventory.md
# Ansible Inventory Reference
## YAML Inventory Format
```yaml
all:
children:
webservers:
hosts:
web1:
ansible_host: 192.168.1.10
web2:
ansible_host: 192.168.1.11
vars:
http_port: 80
databases:
hosts:
db1:
ansible_host: 192.168.1.20
db_port: 5432
db2:
ansible_host: 192.168.1.21
production:
children:
webservers:
databases:
vars:
ansible_user: ubuntu
ansible_ssh_private_key_file: ~/.ssh/id_rsa
```
## INI Inventory Format
```ini
[webservers]
web1 ansible_host=192.168.1.10
web2 ansible_host=192.168.1.11
[webservers:vars]
http_port=80
[databases]
db1 ansible_host=192.168.1.20 db_port=5432
db2 ansible_host=192.168.1.21
[production:children]
webservers
databases
[all:vars]
ansible_user=ubuntu
```
## Host Variables
Common host variables:
| Variable | Purpose |
|----------|---------|
| `ansible_host` | IP or hostname to connect |
| `ansible_port` | SSH port (default: 22) |
| `ansible_user` | SSH username |
| `ansible_ssh_private_key_file` | SSH key path |
| `ansible_become` | Enable sudo |
| `ansible_become_user` | Sudo target user |
| `ansible_python_interpreter` | Python path |
## Group Variables
```yaml
# group_vars/webservers.yml
http_port: 80
document_root: /var/www/html
# group_vars/all.yml
ntp_server: time.example.com
dns_servers:
- 8.8.8.8
- 8.8.4.4
```
## Host Variables Files
```yaml
# host_vars/web1.yml
site_name: production-web1
ssl_cert_path: /etc/ssl/certs/web1.crt
```
## Dynamic Groups
```yaml
# In playbook
- hosts: "{{ target_group | default('all') }}"
```
Run with:
```bash
ansible-playbook playbook.yml -e "target_group=webservers"
```
## Patterns
```bash
# All hosts
ansible all -m ping
# Single host
ansible web1 -m ping
# Group
ansible webservers -m ping
# Multiple groups
ansible 'webservers:databases' -m ping
# Intersection (AND)
ansible 'webservers:&production' -m ping
# Exclusion
ansible 'webservers:!web1' -m ping
# Regex
ansible '~web[0-9]+' -m ping
```
## Limit
```bash
# Limit to specific hosts
ansible-playbook playbook.yml -l web1
ansible-playbook playbook.yml --limit web1,web2
ansible-playbook playbook.yml --limit 'webservers:!web3'
```
## Inventory Check
```bash
# List hosts
ansible-inventory --list
ansible-inventory --graph
# Host info
ansible-inventory --host web1
# Validate
ansible all --list-hosts
```
## Multiple Inventories
```bash
# Multiple files
ansible-playbook -i inventory/production -i inventory/staging playbook.yml
# Directory of inventories
ansible-playbook -i inventory/ playbook.yml
```
## Special Groups
| Group | Contains |
|-------|----------|
| `all` | All hosts |
| `ungrouped` | Hosts not in any group |
## Local Connection
```yaml
localhost:
ansible_host: 127.0.0.1
ansible_connection: local
```
Or in inventory:
```ini
localhost ansible_connection=local
```

skills/ansible/references/modules.md
# Ansible Modules Reference
## File Operations
### copy
```yaml
- name: Copy file
ansible.builtin.copy:
src: files/config.conf
dest: /etc/app/config.conf
owner: root
group: root
mode: '0644'
backup: true
```
### template
```yaml
- name: Template config
ansible.builtin.template:
src: templates/config.j2
dest: /etc/app/config.conf
owner: root
group: root
mode: '0644'
notify: Restart app
```
### file
```yaml
# Create directory
- name: Create directory
ansible.builtin.file:
path: /opt/app
state: directory
owner: app
group: app
mode: '0755'
# Create symlink
- name: Create symlink
ansible.builtin.file:
src: /opt/app/current
dest: /opt/app/release
state: link
# Delete file
- name: Remove file
ansible.builtin.file:
path: /tmp/old-file
state: absent
```
### lineinfile
```yaml
- name: Ensure line in file
ansible.builtin.lineinfile:
path: /etc/hosts
line: "192.168.1.10 myhost"
state: present
- name: Replace line
ansible.builtin.lineinfile:
path: /etc/config
regexp: '^PORT='
line: 'PORT=8080'
```
## Package Management
### apt (Debian/Ubuntu)
```yaml
- name: Install package
ansible.builtin.apt:
name: nginx
state: present
update_cache: true
- name: Install multiple
ansible.builtin.apt:
name:
- nginx
- python3
state: present
- name: Remove package
ansible.builtin.apt:
name: nginx
state: absent
```
### package (Generic)
```yaml
- name: Install package
ansible.builtin.package:
name: httpd
state: present
```
## Service Management
### service
```yaml
- name: Start and enable
ansible.builtin.service:
name: nginx
state: started
enabled: true
- name: Restart
ansible.builtin.service:
name: nginx
state: restarted
- name: Reload
ansible.builtin.service:
name: nginx
state: reloaded
```
### systemd
```yaml
- name: Daemon reload
ansible.builtin.systemd:
daemon_reload: true
- name: Enable and start
ansible.builtin.systemd:
name: myapp
state: started
enabled: true
```
## Command Execution
### command
```yaml
- name: Run command
ansible.builtin.command: /bin/mycommand arg1 arg2
register: result
changed_when: "'changed' in result.stdout"
```
### shell
```yaml
- name: Run shell command
ansible.builtin.shell: |
cd /opt/app
./setup.sh && ./configure.sh
args:
executable: /bin/bash
```
### script
```yaml
- name: Run local script on remote
ansible.builtin.script: scripts/setup.sh
args:
creates: /opt/app/.installed
```
## User Management
### user
```yaml
- name: Create user
ansible.builtin.user:
name: appuser
groups: docker,sudo
shell: /bin/bash
create_home: true
state: present
- name: Remove user
ansible.builtin.user:
name: olduser
state: absent
remove: true
```
### group
```yaml
- name: Create group
ansible.builtin.group:
name: appgroup
state: present
```
## Docker (community.docker)
### docker_container
```yaml
- name: Run container
community.docker.docker_container:
name: myapp
image: myapp:latest
state: started
restart_policy: unless-stopped
ports:
- "8080:80"
volumes:
- /data:/app/data
env:
DB_HOST: database
```
### docker_compose_v2
```yaml
- name: Deploy with compose
community.docker.docker_compose_v2:
project_src: /opt/app
project_name: myapp
state: present
pull: always
env_files:
- /opt/app/.env
```
### docker_image
```yaml
- name: Pull image
community.docker.docker_image:
name: nginx
tag: "1.25"
source: pull
```
## Networking
### uri
```yaml
- name: API call
ansible.builtin.uri:
url: "http://localhost:8080/api/health"
method: GET
return_content: true
register: response
- name: POST request
ansible.builtin.uri:
url: "http://api.example.com/data"
method: POST
body_format: json
body:
key: value
```
### wait_for
```yaml
- name: Wait for port
ansible.builtin.wait_for:
host: localhost
port: 8080
timeout: 300
- name: Wait for file
ansible.builtin.wait_for:
path: /var/log/app.log
search_regex: "Server started"
```
## Debug/Assert
### debug
```yaml
- name: Print variable
ansible.builtin.debug:
msg: "Value: {{ my_var }}"
- name: Print var directly
ansible.builtin.debug:
var: my_var
```
### assert
```yaml
- name: Validate conditions
ansible.builtin.assert:
that:
- my_var is defined
- my_var | length > 0
fail_msg: "my_var must be defined and non-empty"
success_msg: "Validation passed"
```
### fail
```yaml
- name: Fail with message
ansible.builtin.fail:
msg: "Required condition not met"
when: condition
```
## Misc
### pause
```yaml
- name: Wait 10 seconds
ansible.builtin.pause:
seconds: 10
- name: Wait for user
ansible.builtin.pause:
prompt: "Press enter to continue"
```
### stat
```yaml
- name: Check file exists
ansible.builtin.stat:
path: /etc/config
register: config_file
- name: Use result
ansible.builtin.debug:
msg: "File exists"
when: config_file.stat.exists
```

skills/ansible/references/playbooks.md
# Ansible Playbook Reference
## Basic Structure
```yaml
---
- name: Playbook description
hosts: target_group
become: true # Run as root
gather_facts: true # Collect system info
vars:
my_var: value
vars_files:
- vars/secrets.yml
pre_tasks:
- name: Pre-task
ansible.builtin.debug:
msg: "Running before main tasks"
roles:
- role_name
tasks:
- name: Main task
ansible.builtin.debug:
msg: "Main task"
handlers:
- name: Handler name
ansible.builtin.service:
name: service
state: restarted
post_tasks:
- name: Post-task
ansible.builtin.debug:
msg: "Running after main tasks"
```
## Task Options
```yaml
tasks:
- name: Task with common options
ansible.builtin.command: /bin/command
become: true # Privilege escalation
become_user: www-data # Run as specific user
when: condition # Conditional execution
register: result # Store output
ignore_errors: true # Continue on failure
changed_when: false # Override change detection
failed_when: result.rc != 0 # Custom failure condition
tags:
- deploy
- config
notify: Handler name # Trigger handler
```
## Loops
```yaml
# Simple loop
- name: Install packages
ansible.builtin.apt:
name: "{{ item }}"
state: present
loop:
- nginx
- python3
# Loop with dict
- name: Create users
ansible.builtin.user:
name: "{{ item.name }}"
groups: "{{ item.groups }}"
loop:
- { name: 'user1', groups: 'admin' }
- { name: 'user2', groups: 'users' }
# Loop over dict
- name: Process items
ansible.builtin.debug:
msg: "{{ item.key }}: {{ item.value }}"
loop: "{{ my_dict | dict2items }}"
# Loop with index
- name: With index
ansible.builtin.debug:
msg: "{{ index }}: {{ item }}"
loop: "{{ my_list }}"
loop_control:
index_var: index
```
## Conditionals
```yaml
# Simple when
- name: Only on Ubuntu
ansible.builtin.apt:
name: package
when: ansible_distribution == "Ubuntu"
# Multiple conditions
- name: Complex condition
ansible.builtin.command: /bin/something
when:
- ansible_os_family == "Debian"
- ansible_distribution_version is version('20.04', '>=')
# Or conditions
- name: Or condition
ansible.builtin.command: /bin/something
when: condition1 or condition2
# Check variable
- name: If defined
ansible.builtin.debug:
msg: "{{ my_var }}"
when: my_var is defined
```
## Blocks
```yaml
- name: Block example
block:
- name: Task 1
ansible.builtin.command: /bin/task1
- name: Task 2
ansible.builtin.command: /bin/task2
rescue:
- name: Handle failure
ansible.builtin.debug:
msg: "Block failed"
always:
- name: Always run
ansible.builtin.debug:
msg: "Cleanup"
```
## Handlers
```yaml
tasks:
- name: Update config
ansible.builtin.template:
src: config.j2
dest: /etc/app/config
notify:
- Restart service
- Reload config
handlers:
- name: Restart service
ansible.builtin.service:
name: app
state: restarted
- name: Reload config
ansible.builtin.service:
name: app
state: reloaded
```
Handlers run once at end of play, even if notified multiple times.
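To fire pending handlers mid-play instead of waiting for the end (sketch):
```yaml
- name: Run notified handlers now
  ansible.builtin.meta: flush_handlers
```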
## Including Tasks
```yaml
# Include tasks file
- name: Include tasks
ansible.builtin.include_tasks: tasks/setup.yml
# Import tasks (static)
- name: Import tasks
ansible.builtin.import_tasks: tasks/setup.yml
# Include with variables
- name: Include with vars
ansible.builtin.include_tasks: tasks/deploy.yml
vars:
    deploy_env: production  # "environment" is reserved in Ansible; use another name
```
## Tags
```yaml
tasks:
- name: Tagged task
ansible.builtin.command: /bin/command
tags:
- deploy
- always # Always runs regardless of tag selection
- name: Never runs by default
ansible.builtin.command: /bin/command
tags: never # Only runs when explicitly tagged
```
Run with tags:
```bash
ansible-playbook playbook.yml --tags "deploy"
ansible-playbook playbook.yml --skip-tags "slow"
```
## Check Mode
```yaml
# Override check-mode behavior per task
- name: Run for real even during --check
  ansible.builtin.command: /bin/command
  check_mode: false  # Executes even when the play runs in check mode
- name: Always dry-run
  ansible.builtin.command: /bin/command
  check_mode: true  # Runs in check mode even on a normal run
```
## Delegation
```yaml
# Run on different host
- name: Update load balancer
ansible.builtin.command: /bin/update-lb
delegate_to: loadbalancer
# Run locally
- name: Local action
ansible.builtin.command: /bin/local-command
delegate_to: localhost
# Run once for all hosts
- name: Single execution
ansible.builtin.command: /bin/command
run_once: true
```

skills/ansible/references/proxmox/authentication.md
# Ansible Proxmox Authentication
## API Token Setup
Create a dedicated Ansible user and API token on Proxmox:
```bash
# On Proxmox node
pveum user add ansible@pve
pveum aclmod / -user ansible@pve -role PVEAdmin
pveum user token add ansible@pve mytoken --privsep 0
```
**Note:** `--privsep 0` gives the token the same permissions as the user.
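To confirm the token exists (PVE 7+; output format varies by version):
```bash
pveum user token list ansible@pve
```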
## Playbook Variables
### Direct in playbook (NOT recommended)
```yaml
vars:
proxmox_api_host: proxmox.example.com
proxmox_api_user: ansible@pve
proxmox_api_token_id: mytoken
proxmox_api_token_secret: "{{ vault_proxmox_token }}"
```
### Group vars with vault
```yaml
# group_vars/all.yml
proxmox_api_host: proxmox.example.com
proxmox_api_user: ansible@pve
proxmox_api_token_id: mytoken
# group_vars/secrets.yml (ansible-vault encrypted)
proxmox_api_token_secret: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
```
### Environment variables
```bash
export PROXMOX_HOST=proxmox.example.com
export PROXMOX_USER=ansible@pve
export PROXMOX_TOKEN_ID=mytoken
export PROXMOX_TOKEN_SECRET=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
```
```yaml
# In playbook
vars:
proxmox_api_host: "{{ lookup('env', 'PROXMOX_HOST') }}"
proxmox_api_user: "{{ lookup('env', 'PROXMOX_USER') }}"
proxmox_api_token_id: "{{ lookup('env', 'PROXMOX_TOKEN_ID') }}"
proxmox_api_token_secret: "{{ lookup('env', 'PROXMOX_TOKEN_SECRET') }}"
```
## Reusable Auth Block
Define once, reuse across tasks:
```yaml
vars:
proxmox_auth: &proxmox_auth
api_host: "{{ proxmox_api_host }}"
api_user: "{{ proxmox_api_user }}"
api_token_id: "{{ proxmox_api_token_id }}"
api_token_secret: "{{ proxmox_api_token_secret }}"
validate_certs: false # For self-signed certs
tasks:
- name: Create VM
community.general.proxmox_kvm:
<<: *proxmox_auth
node: joseph
vmid: 300
name: myvm
state: present
- name: Start VM
community.general.proxmox_kvm:
<<: *proxmox_auth
vmid: 300
state: started
```
## TLS Certificate Handling
### Self-signed certificates
```yaml
community.general.proxmox_kvm:
# ... auth params ...
validate_certs: false
```
### Custom CA
```bash
export SSL_CERT_FILE=/path/to/ca-bundle.crt
```
Or scoped to a task via the `environment` keyword:
```yaml
- name: Proxmox task behind a custom CA
  community.general.proxmox_kvm:
    # ... auth params ...
  environment:
    REQUESTS_CA_BUNDLE: /path/to/ca-bundle.crt
```
## Minimum Required Permissions
For full VM/container management:
| Permission | Path | Purpose |
|------------|------|---------|
| VM.Allocate | / | Create VMs |
| VM.Clone | / | Clone templates |
| VM.Config.* | / | Modify VM config |
| VM.PowerMgmt | / | Start/stop VMs |
| VM.Snapshot | / | Create snapshots |
| Datastore.AllocateSpace | / | Allocate disk space |
| Datastore.Audit | / | List storage |
Or use the built-in `PVEAdmin` role for full access.
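If you prefer a scoped custom role over `PVEAdmin`, something like this should work (a sketch; the role name is arbitrary):
```bash
# On a Proxmox node
pveum role add AnsibleProvisioner \
  -privs "VM.Allocate,VM.Clone,VM.Config.Disk,VM.Config.CPU,VM.Config.Memory,VM.Config.Network,VM.Config.Options,VM.PowerMgmt,VM.Snapshot,VM.Audit,Datastore.AllocateSpace,Datastore.Audit"
pveum aclmod / -user ansible@pve -role AnsibleProvisioner
```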
## Troubleshooting Auth Issues
```yaml
# Debug task to test connection
- name: Test Proxmox API connection
community.general.proxmox_kvm:
api_host: "{{ proxmox_api_host }}"
api_user: "{{ proxmox_api_user }}"
api_token_id: "{{ proxmox_api_token_id }}"
api_token_secret: "{{ proxmox_api_token_secret }}"
validate_certs: false
vmid: 100
state: current
register: result
ignore_errors: true
- name: Show result
ansible.builtin.debug:
var: result
```
Common errors:
| Error | Cause | Fix |
|-------|-------|-----|
| 401 Unauthorized | Bad token | Check `api_user` and `api_token_id` (the module combines them into `user@realm!tokenname`) plus the secret |
| 403 Forbidden | Insufficient permissions | Check user ACLs with `pveum user permissions ansible@pve` |
| SSL certificate problem | Self-signed cert | Set `validate_certs: false` |
| Connection refused | Wrong host/port | Verify API URL (port 8006) |

skills/ansible/references/proxmox/dynamic-inventory.md
# Ansible Proxmox Dynamic Inventory
Query Proxmox API for automatic inventory generation.
## Plugin Setup
### Requirements
```bash
pip install proxmoxer requests
ansible-galaxy collection install community.general
```
### Inventory File
Create `inventory/proxmox.yml`:
```yaml
plugin: community.general.proxmox
url: https://proxmox.example.com:8006
user: ansible@pve
token_id: mytoken
token_secret: "{{ lookup('env', 'PROXMOX_TOKEN_SECRET') }}"
validate_certs: false
# Include VMs and containers
want_facts: true
want_proxmox_nodes_ansible_host: false
# Filter by status
filters:
- status == "running"
# Group by various attributes
groups:
# By Proxmox node
node_joseph: proxmox_node == "joseph"
node_maxwell: proxmox_node == "maxwell"
node_everette: proxmox_node == "everette"
# By type
vms: proxmox_type == "qemu"
containers: proxmox_type == "lxc"
# By template naming convention
docker_hosts: "'docker' in proxmox_name"
pihole: "'pihole' in proxmox_name"
# Host variables from Proxmox
compose:
  ansible_host: proxmox_agent_interfaces[0]['ip-addresses'][0]['ip-address'] | default(proxmox_name)
ansible_user: "'ubuntu'"
proxmox_vmid: proxmox_vmid
proxmox_node: proxmox_node
```
### Enable in ansible.cfg
```ini
[inventory]
enable_plugins = community.general.proxmox, yaml, ini
```
## Testing Inventory
```bash
# List all hosts
ansible-inventory -i inventory/proxmox.yml --list
# Graph view
ansible-inventory -i inventory/proxmox.yml --graph
# Specific host details
ansible-inventory -i inventory/proxmox.yml --host myvm
```
## Common Patterns
### Filter by Tags
Proxmox 7+ supports VM tags:
```yaml
groups:
webservers: "'web' in proxmox_tags"
databases: "'db' in proxmox_tags"
production: "'prod' in proxmox_tags"
```
### Filter by VMID Range
```yaml
filters:
  - proxmox_vmid >= 200
  - proxmox_vmid < 300
groups:
dev_vms: proxmox_vmid >= 200 and proxmox_vmid < 300
prod_vms: proxmox_vmid >= 300 and proxmox_vmid < 400
```
### IP Address from QEMU Agent
Requires QEMU guest agent running in VM:
```yaml
compose:
# Primary IP from agent
ansible_host: >-
proxmox_agent_interfaces
| selectattr('name', 'equalto', 'eth0')
| map(attribute='ip-addresses')
| flatten
| selectattr('ip-address-type', 'equalto', 'ipv4')
| map(attribute='ip-address')
| first
| default(proxmox_name)
```
### Static + Dynamic Inventory
Combine with static inventory:
```bash
# inventory/
# static.yml # Static hosts
# proxmox.yml # Dynamic from Proxmox
ansible-playbook -i inventory/ playbook.yml
```
## Available Variables
Variables populated from Proxmox API:
| Variable | Description |
|----------|-------------|
| proxmox_vmid | VM/container ID |
| proxmox_name | VM/container name |
| proxmox_type | "qemu" or "lxc" |
| proxmox_status | running, stopped, etc. |
| proxmox_node | Proxmox node name |
| proxmox_pool | Resource pool (if any) |
| proxmox_tags | Tags (Proxmox 7+) |
| proxmox_template | Is template (bool) |
| proxmox_agent | QEMU agent enabled (bool) |
| proxmox_agent_interfaces | Network info from agent |
| proxmox_cpus | CPU count |
| proxmox_maxmem | Max memory bytes |
| proxmox_maxdisk | Max disk bytes |
## Caching
Enable caching for faster inventory:
```yaml
plugin: community.general.proxmox
# ... auth ...
cache: true
cache_plugin: jsonfile
cache_connection: /tmp/ansible_proxmox_cache
cache_timeout: 300 # 5 minutes
```
Clear cache:
```bash
rm -rf /tmp/ansible_proxmox_cache
```
## Troubleshooting
### No hosts returned
1. Check API connectivity:
```bash
curl -k "https://proxmox:8006/api2/json/cluster/resources" \
-H "Authorization: PVEAPIToken=ansible@pve!mytoken=secret"
```
2. Check filters aren't too restrictive - try removing them
3. Verify token permissions include `VM.Audit`
### QEMU agent data missing
- Agent must be installed and running in guest
- `want_facts: true` must be set
- May take a few seconds after VM boot
### Slow inventory queries
- Enable caching (see above)
- Use filters to reduce results
- Avoid `want_facts: true` if not needed

skills/ansible/references/proxmox/gotchas.md
# Ansible Proxmox Gotchas
Common issues when using Ansible with Proxmox VE.
## 1. Token ID Format
**Wrong:**
```yaml
api_token_id: ansible@pve!mytoken  # Full token identifier - the module rejects this
```
**Correct:**
```yaml
api_token_id: mytoken  # Just the token name, NOT user@realm!tokenname
```
The module combines `api_user` and `api_token_id` internally.
## 2. VMID Required for Most Operations
Unlike Terraform, you must always specify `vmid`:
```yaml
# Won't auto-generate VMID
- name: Create VM
community.general.proxmox_kvm:
# ... auth ...
vmid: 300 # REQUIRED - no auto-assignment
name: myvm
```
To find next available VMID:
```yaml
- name: Get cluster resources
ansible.builtin.uri:
url: "https://{{ proxmox_api_host }}:8006/api2/json/cluster/resources"
headers:
Authorization: "PVEAPIToken={{ proxmox_api_user }}!{{ proxmox_api_token_id }}={{ proxmox_api_token_secret }}"
validate_certs: false
register: resources
- name: Calculate next VMID
ansible.builtin.set_fact:
next_vmid: "{{ (resources.json.data | selectattr('vmid', 'defined') | map(attribute='vmid') | max) + 1 }}"
```
## 3. Node Parameter Required
Must specify which node to operate on:
```yaml
- name: Create VM
community.general.proxmox_kvm:
# ... auth ...
node: joseph # REQUIRED - which Proxmox node
vmid: 300
```
## 4. Clone vs Create
Cloning requires different parameters than creating:
```yaml
# CLONE from template
- name: Clone VM
community.general.proxmox_kvm:
# ... auth ...
node: joseph
vmid: 300
name: myvm
clone: tmpl-ubuntu-2404-standard # Template name or VMID
full: true
# CREATE new (less common)
- name: Create VM
community.general.proxmox_kvm:
# ... auth ...
node: joseph
vmid: 300
name: myvm
ostype: l26
scsihw: virtio-scsi-pci
bootdisk: scsi0
scsi:
scsi0: 'local-lvm:32,format=raw'
```
## 5. Async Operations
Large operations (clone, snapshot) can timeout. Use async:
```yaml
- name: Clone large VM
community.general.proxmox_kvm:
# ... auth ...
clone: large-template
vmid: 300
timeout: 600 # Module timeout
async: 900 # Ansible async timeout
poll: 10 # Check every 10 seconds
```
## 6. State Idempotency
`state: present` doesn't update existing VMs:
```yaml
# This WON'T change cores on existing VM
- name: Create/update VM
community.general.proxmox_kvm:
# ... auth ...
vmid: 300
cores: 4 # Ignored if VM exists
state: present
```
To modify existing VMs, use `proxmox_kvm` with `update: true` (supported in current community.general releases) or use the API directly.
## 7. Network Interface Format (LXC)
LXC containers use a specific JSON-like string format:
```yaml
# WRONG
netif:
net0:
bridge: vmbr0
ip: dhcp
# CORRECT
netif: '{"net0":"name=eth0,bridge=vmbr0,ip=dhcp"}'
# Multiple interfaces
netif: '{"net0":"name=eth0,bridge=vmbr0,ip=dhcp","net1":"name=eth1,bridge=vmbr12,ip=dhcp"}'
```
## 8. Disk Resize Only Grows
`proxmox_disk` resize only increases size:
```yaml
# This adds 20G to current size
- name: Grow disk
community.general.proxmox_disk:
# ... auth ...
vmid: 300
disk: scsi0
size: +20G # Relative increase
state: resized
# NOT possible to shrink
```
## 9. Template vs VM States
Templates don't support all states:
```yaml
# Can't start a template
- name: Start template
community.general.proxmox_kvm:
vmid: 100
state: started # FAILS - templates can't run
```
Convert template to VM first if needed.
## 10. Collection Version Matters
Module parameters change between versions. Check installed version:
```bash
ansible-galaxy collection list | grep community.general
```
Update if needed:
```bash
ansible-galaxy collection install community.general --upgrade
```
## 11. Limited Cloud-Init Support
Unlike Terraform's Proxmox provider, the Ansible modules offer only limited cloud-init support. For cloud-init VMs:
1. Clone template with cloud-init already configured
2. Use API calls to set cloud-init parameters
3. Or configure post-boot with Ansible
```yaml
# Workaround: Use URI module for cloud-init config
- name: Set cloud-init IP
ansible.builtin.uri:
url: "https://{{ proxmox_api_host }}:8006/api2/json/nodes/{{ node }}/qemu/{{ vmid }}/config"
method: PUT
headers:
Authorization: "PVEAPIToken={{ proxmox_api_user }}!{{ proxmox_api_token_id }}={{ proxmox_api_token_secret }}"
body_format: form-urlencoded
body:
ipconfig0: "ip=192.168.1.100/24,gw=192.168.1.1"
ciuser: ubuntu
validate_certs: false
```

skills/ansible/references/proxmox/modules.md
# Ansible Proxmox Modules
Proxmox VE management via `community.general` collection.
## Collection Setup
```bash
ansible-galaxy collection install community.general
```
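These modules talk to the Proxmox API through the `proxmoxer` Python library, which must be available on the host executing the module (usually the controller):
```bash
pip install proxmoxer requests
```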
## Core Modules
### proxmox (LXC Containers)
```yaml
- name: Create LXC container
community.general.proxmox:
api_host: proxmox.example.com
api_user: ansible@pve
api_token_id: mytoken
api_token_secret: "{{ proxmox_token_secret }}"
node: joseph
vmid: 200
hostname: mycontainer
ostemplate: local:vztmpl/ubuntu-22.04-standard_22.04-1_amd64.tar.zst
storage: local-lvm
cores: 2
memory: 2048
disk: 10
netif: '{"net0":"name=eth0,bridge=vmbr0,ip=dhcp"}'
state: present
- name: Start container
community.general.proxmox:
api_host: proxmox.example.com
api_user: ansible@pve
api_token_id: mytoken
api_token_secret: "{{ proxmox_token_secret }}"
node: joseph
vmid: 200
state: started
- name: Stop container
community.general.proxmox:
# ... auth params ...
vmid: 200
state: stopped
force: true # Force stop if graceful fails
- name: Remove container
community.general.proxmox:
# ... auth params ...
vmid: 200
state: absent
```
### proxmox_kvm (VMs)
```yaml
- name: Create VM from template
community.general.proxmox_kvm:
api_host: proxmox.example.com
api_user: ansible@pve
api_token_id: mytoken
api_token_secret: "{{ proxmox_token_secret }}"
node: joseph
vmid: 300
name: myvm
clone: tmpl-ubuntu-2404-standard
full: true # Full clone (not linked)
storage: local-lvm
format: raw
timeout: 500
- name: Start VM
community.general.proxmox_kvm:
# ... auth params ...
node: joseph
vmid: 300
state: started
- name: Stop VM (ACPI shutdown)
community.general.proxmox_kvm:
# ... auth params ...
vmid: 300
state: stopped
force: false # Graceful ACPI
- name: Force stop VM
community.general.proxmox_kvm:
# ... auth params ...
vmid: 300
state: stopped
force: true
- name: Current state (running/stopped/present/absent)
community.general.proxmox_kvm:
# ... auth params ...
vmid: 300
state: current
register: vm_state
```
### proxmox_template
Manages OS template files (LXC appliance archives) on node storage. It does not turn a VM into a template; for that, use `proxmox_kvm` with `template: true` or the API.
```yaml
- name: Upload container template to storage
  community.general.proxmox_template:
    api_host: proxmox.example.com
    api_user: ansible@pve
    api_token_id: mytoken
    api_token_secret: "{{ proxmox_token_secret }}"
    node: joseph
    src: ~/ubuntu-22.04-standard_22.04-1_amd64.tar.zst
    content_type: vztmpl
    storage: local
    state: present
- name: Delete template file
  community.general.proxmox_template:
    # ... auth params ...
    node: joseph
    template: ubuntu-22.04-standard_22.04-1_amd64.tar.zst
    content_type: vztmpl
    storage: local
    state: absent
```
### proxmox_snap
```yaml
- name: Create snapshot
community.general.proxmox_snap:
api_host: proxmox.example.com
api_user: ansible@pve
api_token_id: mytoken
api_token_secret: "{{ proxmox_token_secret }}"
vmid: 300
snapname: before-upgrade
description: "Snapshot before major upgrade"
vmstate: false # Don't include RAM
state: present
- name: Rollback to snapshot
community.general.proxmox_snap:
# ... auth params ...
vmid: 300
snapname: before-upgrade
state: rollback
- name: Remove snapshot
community.general.proxmox_snap:
# ... auth params ...
vmid: 300
snapname: before-upgrade
state: absent
```
### proxmox_nic
```yaml
- name: Add NIC to VM
community.general.proxmox_nic:
api_host: proxmox.example.com
api_user: ansible@pve
api_token_id: mytoken
api_token_secret: "{{ proxmox_token_secret }}"
vmid: 300
interface: net1
bridge: vmbr12
model: virtio
tag: 12 # VLAN tag
state: present
- name: Remove NIC
community.general.proxmox_nic:
# ... auth params ...
vmid: 300
interface: net1
state: absent
```
### proxmox_disk
```yaml
- name: Add disk to VM
community.general.proxmox_disk:
api_host: proxmox.example.com
api_user: ansible@pve
api_token_id: mytoken
api_token_secret: "{{ proxmox_token_secret }}"
vmid: 300
disk: scsi1
storage: local-lvm
size: 50G
format: raw
state: present
- name: Resize disk
community.general.proxmox_disk:
# ... auth params ...
vmid: 300
disk: scsi0
size: +20G # Increase by 20G
state: resized
- name: Detach disk (keeps it as an unused disk)
  community.general.proxmox_disk:
    # ... auth params ...
    vmid: 300
    disk: scsi1
    state: detached
- name: Remove disk entirely
  community.general.proxmox_disk:
    # ... auth params ...
    vmid: 300
    disk: scsi1
    state: absent
```
## State Reference
| Module | States |
|--------|--------|
| proxmox (LXC) | present, started, stopped, restarted, absent |
| proxmox_kvm | present, started, stopped, restarted, absent, current |
| proxmox_template | present, absent |
| proxmox_snap | present, absent, rollback |
| proxmox_nic | present, absent |
| proxmox_disk | present, absent, resized, detached, moved |
## Common Parameters
All modules share these authentication parameters:
| Parameter | Description |
|-----------|-------------|
| api_host | Proxmox hostname/IP |
| api_user | User (format: user@realm) |
| api_token_id | API token name |
| api_token_secret | API token value |
| validate_certs | Verify TLS (default: true) |
| timeout | API timeout seconds |

skills/ansible/references/troubleshooting.md
# Ansible Troubleshooting Reference
## Common Errors
| Error | Cause | Solution |
|-------|-------|----------|
| SSH connection failed | Wrong host/key/user | Check ansible_host, ansible_user, key |
| Permission denied | Need sudo/wrong user | Add `become: true`, check sudo config |
| Module not found | Collection not installed | `ansible-galaxy collection install` |
| Variable undefined | Missing var/typo | Check var name, define in vars |
| Syntax error | YAML/Jinja2 issue | Run `ansible-playbook --syntax-check` |
| Host unreachable | Network/SSH issue | `ansible host -m ping`, check firewall |
## Debug Commands
```bash
# Test connectivity
ansible all -m ping
ansible host -m ping -vvv
# Syntax check
ansible-playbook playbook.yml --syntax-check
# Dry run (check mode)
ansible-playbook playbook.yml --check
# Diff mode (show changes)
ansible-playbook playbook.yml --diff
# Verbose output
ansible-playbook playbook.yml -v # Minimal
ansible-playbook playbook.yml -vv # More
ansible-playbook playbook.yml -vvv # Connection debug
ansible-playbook playbook.yml -vvvv # Full debug
# List tasks without running
ansible-playbook playbook.yml --list-tasks
# List hosts
ansible-playbook playbook.yml --list-hosts
# Start at specific task
ansible-playbook playbook.yml --start-at-task="Task name"
# Step through tasks
ansible-playbook playbook.yml --step
```
## Connection Issues
### Test SSH
```bash
# Direct SSH test
ssh -i ~/.ssh/key user@host
# Ansible ping
ansible host -m ping -vvv
# Check SSH config
ansible host -m debug -a "var=ansible_ssh_private_key_file"
```
### Common SSH Fixes
```yaml
# Inventory host/group vars
ansible_ssh_private_key_file: ~/.ssh/mykey
ansible_user: ubuntu
ansible_host: 192.168.1.10
# In ansible.cfg [defaults], only for testing:
# host_key_checking = False
```
### SSH Connection Options
```yaml
# In inventory
host1:
ansible_host: 192.168.1.10
ansible_ssh_common_args: '-o StrictHostKeyChecking=no'
ansible_ssh_extra_args: '-o ConnectTimeout=10'
```
## Permission Issues
### Sudo Not Working
```yaml
# Enable become
- hosts: all
become: true
become_method: sudo
become_user: root
```
```bash
# On target host, check sudoers
sudo visudo
# User should have:
# ubuntu ALL=(ALL) NOPASSWD: ALL
```
### Ask for Sudo Password
```bash
ansible-playbook playbook.yml --ask-become-pass
```
## Variable Issues
### Debug Variables
```yaml
- name: Print all vars
ansible.builtin.debug:
var: vars
- name: Print specific var
ansible.builtin.debug:
var: my_var
- name: Print hostvars
ansible.builtin.debug:
var: hostvars[inventory_hostname]
- name: Print facts
ansible.builtin.debug:
var: ansible_facts
```
### Check Variable Precedence
```bash
# See where variable comes from
ansible-inventory --host hostname --yaml
```
### Undefined Variable
```yaml
# Provide default
value: "{{ my_var | default('fallback') }}"
# Check if defined
- name: Task
when: my_var is defined
# Fail early if required
- name: Validate
ansible.builtin.assert:
that: my_var is defined
fail_msg: "my_var must be set"
```
## Module Issues
### Module Not Found
```bash
# Install collection
ansible-galaxy collection install community.docker
# Check installed
ansible-galaxy collection list
# Update collections
ansible-galaxy collection install -r requirements.yml --force
```
### Module Arguments
```bash
# Get module documentation
ansible-doc ansible.builtin.copy
ansible-doc community.docker.docker_compose_v2
```
## Idempotency Issues
### Task Always Shows "changed"
```yaml
# Bad - always changed
- name: Run script
ansible.builtin.command: /bin/script.sh
# Good - check first
- name: Run script
ansible.builtin.command: /bin/script.sh
args:
creates: /opt/app/.installed
# Good - explicit changed_when
- name: Run script
ansible.builtin.command: /bin/script.sh
register: result
changed_when: "'Created' in result.stdout"
```
### Test Idempotency
```bash
# Run twice, second should show all "ok"
ansible-playbook playbook.yml
ansible-playbook playbook.yml # Should show "changed=0"
```
## Handler Issues
### Handler Not Running
- Handlers only run if task reports "changed"
- Handlers run at end of play, not immediately
- Force handler run: `ansible-playbook --force-handlers`
```yaml
# Force handler to run immediately
- name: Config change
ansible.builtin.template:
src: config.j2
dest: /etc/app/config
notify: Restart app
- name: Flush handlers
ansible.builtin.meta: flush_handlers
- name: Continue with restarted service
ansible.builtin.uri:
url: http://localhost:8080/health
```
## Performance Issues
### Slow Playbook
```yaml
# Disable fact gathering if not needed
- hosts: all
gather_facts: false
# Or gather specific facts
- hosts: all
gather_facts: true
gather_subset:
- network
```
```bash
# Increase parallelism
ansible-playbook playbook.yml -f 20 # 20 forks
# Use pipelining (add to ansible.cfg)
# [ssh_connection]
# pipelining = True
```
### Callback Timer
```ini
# ansible.cfg
[defaults]
callbacks_enabled = timer, profile_tasks
```
## Recovery
### Failed Playbook
```bash
# Retry failed hosts (needs retry_files_enabled = True in ansible.cfg)
ansible-playbook playbook.yml --limit @playbook.retry
# Start at failed task
ansible-playbook playbook.yml --start-at-task="Failed Task Name"
```
### Cleanup After Failure
```yaml
- name: Risky operation
block:
- name: Do something
ansible.builtin.command: /bin/risky
rescue:
- name: Cleanup on failure
ansible.builtin.file:
path: /tmp/incomplete
state: absent
always:
- name: Always cleanup
ansible.builtin.file:
path: /tmp/lock
state: absent
```

skills/ansible/references/variables.md
# Ansible Variables Reference
## Variable Precedence (High to Low)
1. **Extra vars** (`-e "var=value"`)
2. **set_facts / registered vars**
3. **Task vars** (in task)
4. **Block vars** (in block)
5. **Role vars / include_vars**
6. **Play vars_files**
7. **Play vars_prompt**
8. **Play vars**
9. **Host facts**
10. **Playbook host_vars/**
11. **Inventory host_vars/**
12. **Playbook group_vars/**
13. **Inventory group_vars/**
14. **Playbook group_vars/all**
15. **Inventory group_vars/all**
16. **Role defaults**
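A quick demonstration that extra vars beat play vars (a minimal sketch):
```yaml
# precedence-demo.yml
---
- hosts: localhost
  gather_facts: false
  vars:
    greeting: "from play vars"
  tasks:
    - name: Show which definition won
      ansible.builtin.debug:
        msg: "{{ greeting }}"
```
Running `ansible-playbook precedence-demo.yml` prints "from play vars"; adding `-e greeting="from extra vars"` makes the extra var win.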
## Defining Variables
### In Playbook
```yaml
- hosts: all
vars:
app_name: myapp
app_port: 8080
vars_files:
- vars/common.yml
- "vars/{{ environment }}.yml"
```
### In Tasks
```yaml
- name: Set variable
ansible.builtin.set_fact:
my_var: "value"
- name: Register output
ansible.builtin.command: whoami
register: user_result
- name: Use registered
ansible.builtin.debug:
msg: "User: {{ user_result.stdout }}"
```
### In Roles
```yaml
# roles/app/defaults/main.yml (low priority)
app_port: 8080
# roles/app/vars/main.yml (high priority)
internal_setting: value
```
## Variable Types
```yaml
# String
name: "value"
# Number
port: 8080
# Boolean
enabled: true
# List
packages:
- nginx
- python3
# Dictionary
user:
name: admin
groups:
- wheel
- docker
```
## Accessing Variables
```yaml
# Simple
msg: "{{ my_var }}"
# Dictionary
msg: "{{ user.name }}"
msg: "{{ user['name'] }}"
# List
msg: "{{ packages[0] }}"
msg: "{{ packages | first }}"
# Default value
msg: "{{ my_var | default('fallback') }}"
# Required (fail if undefined)
msg: "{{ my_var }}" # Fails if undefined
```
## Jinja2 Filters
```yaml
# Default
value: "{{ var | default('default') }}"
# Mandatory
value: "{{ var | mandatory }}"
# Type conversion
port: "{{ port_string | int }}"
flag: "{{ flag_string | bool }}"
# String operations
upper: "{{ name | upper }}"
lower: "{{ name | lower }}"
title: "{{ name | title }}"
# Lists
first: "{{ list | first }}"
last: "{{ list | last }}"
length: "{{ list | length }}"
joined: "{{ list | join(',') }}"
# JSON
json_str: "{{ dict | to_json }}"
yaml_str: "{{ dict | to_yaml }}"
# Path operations
basename: "{{ path | basename }}"
dirname: "{{ path | dirname }}"
```
## Facts
```yaml
# Accessing facts
os: "{{ ansible_distribution }}"
version: "{{ ansible_distribution_version }}"
ip: "{{ ansible_default_ipv4.address }}"
hostname: "{{ ansible_hostname }}"
memory_mb: "{{ ansible_memtotal_mb }}"
cpus: "{{ ansible_processor_vcpus }}"
```
### Gathering Facts
```yaml
- hosts: all
gather_facts: true # Default
# Or manually
- name: Gather facts
ansible.builtin.setup:
filter: ansible_*
# Specific facts
- name: Get network facts
ansible.builtin.setup:
gather_subset:
- network
```
## Environment Variables
```yaml
# Lookup
value: "{{ lookup('env', 'MY_VAR') }}"
# Set for task
- name: Run with env
ansible.builtin.command: /bin/command
environment:
MY_VAR: "{{ my_value }}"
```
## Secrets/Vault
```bash
# Create encrypted file
ansible-vault create secrets.yml
# Edit encrypted file
ansible-vault edit secrets.yml
# Encrypt existing file
ansible-vault encrypt vars.yml
# Run with vault password
ansible-playbook playbook.yml --ask-vault-pass
ansible-playbook playbook.yml --vault-password-file ~/.vault_pass
```
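Once decrypted at runtime, a vaulted file loads like any other vars file; a minimal sketch, assuming `secrets.yml` defines `db_password`:
```yaml
- hosts: db
  vars_files:
    - secrets.yml   # encrypted with ansible-vault
  tasks:
    - name: Use the secret without printing it
      ansible.builtin.debug:
        msg: "db_password is {{ db_password | length }} characters"
```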
## Prompt for Variables
```yaml
- hosts: all
vars_prompt:
- name: password
prompt: "Enter password"
private: true
    - name: deploy_env
      prompt: "Which environment?"
      default: "staging"
```
## Conditionals with Variables
```yaml
- name: Check defined
when: my_var is defined
- name: Check undefined
when: my_var is not defined
- name: Check truthy
when: my_var | bool
- name: Check falsy
when: not my_var | bool
- name: Check in list
when: item in my_list
- name: Version comparison
when: version is version('2.0', '>=')
```
## Hostvars
Access variables from other hosts:
```yaml
- name: Get from other host
ansible.builtin.debug:
msg: "{{ hostvars['web1']['ansible_host'] }}"
```

skills/docker/SKILL.md
---
name: docker
description: |
Docker and Docker Compose reference for container deployment, networking, volumes,
and orchestration. Includes Proxmox hosting and LXC comparison patterns.
Use when working with docker-compose.yaml, Dockerfiles, troubleshooting containers,
or planning container architecture.
Triggers: docker, compose, container, dockerfile, volume, network, service, lxc.
---
# Docker Skill
Docker and Docker Compose reference for containerized application deployment and management.
## Quick Reference
```bash
# Container operations
docker ps # List running containers
docker ps -a # List all containers
docker logs <container> # View logs
docker logs -f <container> # Follow logs
docker exec -it <container> sh # Shell into container
docker inspect <container> # Full container details
# Compose operations
docker compose up -d # Start services (detached)
docker compose down # Stop and remove
docker compose ps # List compose services
docker compose logs -f # Follow all logs
docker compose pull # Pull latest images
docker compose restart # Restart services
# Troubleshooting
docker stats # Resource usage
docker network ls # List networks
docker network inspect <net> # Network details
docker volume ls # List volumes
docker system df # Disk usage
docker system prune # Clean up unused resources
```
## Reference Files
Load on-demand based on task:
| Topic | File | When to Load |
|-------|------|--------------|
| Compose Structure | [compose.md](references/compose.md) | Writing docker-compose.yaml |
| Networking | [networking.md](references/networking.md) | Network modes, port mapping |
| Volumes | [volumes.md](references/volumes.md) | Data persistence, mounts |
| Dockerfile | [dockerfile.md](references/dockerfile.md) | Building images |
| Troubleshooting | [troubleshooting.md](references/troubleshooting.md) | Common errors, diagnostics |
### Proxmox Integration
| Topic | File | When to Load |
|-------|------|--------------|
| Docker on Proxmox | [proxmox/hosting.md](references/proxmox/hosting.md) | VM sizing, storage, GPU passthrough |
| LXC vs Docker | [proxmox/lxc-vs-docker.md](references/proxmox/lxc-vs-docker.md) | Choosing container type |
## Compose File Quick Reference
```yaml
name: myapp # Project name (optional)
services:
web:
image: nginx:alpine
ports:
- "80:80"
volumes:
- ./html:/usr/share/nginx/html:ro
networks:
- frontend
restart: unless-stopped
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost"]
interval: 30s
timeout: 10s
retries: 3
networks:
frontend:
driver: bridge
volumes:
data:
```
## Validation Checklist
Before deploying containers:
- [ ] Services defined with specific image tags (not :latest)
- [ ] Port mappings without conflicts
- [ ] Volumes for persistent data
- [ ] Networks configured appropriately
- [ ] Resource limits set (memory, CPU)
- [ ] Health checks for critical services
- [ ] Restart policy appropriate
- [ ] Secrets not in images or compose file
- [ ] .env file for environment variables
## Network Mode Quick Decision
| Mode | Use Case | Isolation |
|------|----------|-----------|
| bridge | Default, most services | Container isolated |
| host | Performance, network tools | No isolation |
| macvlan | Direct LAN access | Own MAC/IP |
| ipvlan | Like macvlan, shared MAC | Own IP |
| none | No networking | Full isolation |
## Volume Type Quick Decision
| Type | Use Case | Portability |
|------|----------|-------------|
| Named volume | Database, app data | Best |
| Bind mount | Config files, dev | Host-dependent |
| tmpfs | Secrets, cache | Memory only |

# Docker Compose Reference
## File Structure
```yaml
name: project-name # Optional, defaults to directory name
services:
service-name:
# Image or build
image: image:tag
build:
context: ./path
dockerfile: Dockerfile
# Networking
ports:
- "host:container"
networks:
- network-name
# Storage
volumes:
- named-volume:/path
- ./host-path:/container-path
# Environment
environment:
KEY: value
env_file:
- .env
# Dependencies
depends_on:
- other-service
# Lifecycle
restart: unless-stopped
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost"]
interval: 30s
timeout: 10s
retries: 3
# Resources
deploy:
resources:
limits:
cpus: '0.5'
memory: 512M
reservations:
memory: 256M
networks:
network-name:
driver: bridge
volumes:
named-volume:
```
## Service Options
### Image vs Build
```yaml
# Use existing image
image: nginx:1.25-alpine
# Build from Dockerfile
build:
context: .
dockerfile: Dockerfile
args:
BUILD_ARG: value
```
### Port Mapping
```yaml
ports:
- "80:80" # host:container
- "443:443"
- "127.0.0.1:8080:80" # localhost only
- "8080-8090:8080-8090" # range
```
### Environment Variables
```yaml
# Inline
environment:
DATABASE_URL: postgres://db:5432/app
DEBUG: "false"
# From file
env_file:
- .env
- .env.local
```
### Dependencies
```yaml
depends_on:
- db
- redis
# With conditions (db must define a healthcheck)
depends_on:
db:
condition: service_healthy
```
### Restart Policies
| Policy | Behavior |
|--------|----------|
| no | Never restart (default) |
| always | Always restart |
| unless-stopped | Restart unless manually stopped |
| on-failure | Restart only on error exit |
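In a service definition:
```yaml
services:
  app:
    restart: on-failure   # restart only when the process exits non-zero
```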
### Health Checks
```yaml
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost/health"]
interval: 30s # Time between checks
timeout: 10s # Check timeout
retries: 3 # Failures before unhealthy
start_period: 40s # Grace period on startup
```
### Resource Limits
```yaml
deploy:
resources:
limits:
cpus: '2'
memory: 1G
reservations:
cpus: '0.5'
memory: 256M
```
## Network Configuration
### Custom Network
```yaml
networks:
frontend:
driver: bridge
backend:
driver: bridge
internal: true # No external access
```
### External Network
```yaml
networks:
existing-network:
external: true
```
### Macvlan Network
```yaml
networks:
lan:
driver: macvlan
driver_opts:
parent: eth0
ipam:
config:
- subnet: 192.168.1.0/24
gateway: 192.168.1.1
```
## Volume Configuration
### Named Volume
```yaml
volumes:
data:
driver: local
services:
db:
volumes:
- data:/var/lib/mysql
```
### Bind Mount
```yaml
services:
web:
volumes:
- ./config:/etc/app/config:ro
- ./data:/app/data
```
### tmpfs Mount
```yaml
services:
app:
tmpfs:
- /tmp
- /run
```
## Multi-Environment Setup
### Using .env Files
```bash
# .env
COMPOSE_PROJECT_NAME=myapp
IMAGE_TAG=latest
```
```yaml
# docker-compose.yaml
services:
app:
image: myapp:${IMAGE_TAG}
```
### Override Files
```bash
# Base config
docker-compose.yaml
# Development overrides
docker-compose.override.yaml # Auto-loaded
# Production
docker compose -f docker-compose.yaml -f docker-compose.prod.yaml up
```
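A minimal override only carries the keys that differ from the base; for example (hypothetical values):
```yaml
# docker-compose.prod.yaml
services:
  app:
    image: myapp:1.4.2   # pin a release tag in production
    restart: always
```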
## Useful Commands
```bash
# Start with rebuild
docker compose up -d --build
# Scale service
docker compose up -d --scale web=3
# View config after variable substitution
docker compose config
# Execute command in service
docker compose exec web sh
# View service logs
docker compose logs -f web
# Restart single service
docker compose restart web
```

# Dockerfile Reference
## Basic Structure
```dockerfile
# Base image
FROM ubuntu:22.04
# Metadata
LABEL maintainer="team@example.com"
LABEL version="1.0"
# Environment
ENV APP_HOME=/app
WORKDIR $APP_HOME
# Install dependencies
RUN apt-get update && apt-get install -y \
package1 \
package2 \
&& rm -rf /var/lib/apt/lists/*
# Copy files
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
# Non-root user
RUN useradd -r -s /bin/false appuser
USER appuser
# Expose port
EXPOSE 8080
# Health check
HEALTHCHECK --interval=30s --timeout=3s \
CMD curl -f http://localhost:8080/health || exit 1
# Entry point
ENTRYPOINT ["python"]
CMD ["app.py"]
```
## Multi-Stage Builds
Reduce final image size by separating build and runtime:
```dockerfile
# Build stage
FROM golang:1.21 AS builder
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 go build -o myapp
# Static binary, so it runs on the musl-based alpine runtime stage
# Runtime stage
FROM alpine:3.18
COPY --from=builder /app/myapp /usr/local/bin/
CMD ["myapp"]
```
## Common Base Images
| Image | Size | Use Case |
|-------|------|----------|
| alpine | ~5MB | Minimal, production |
| debian:slim | ~80MB | Compatibility |
| ubuntu | ~75MB | Development |
| distroless | ~20MB | Security-focused |
| scratch | 0MB | Static binaries only |
## Instructions Reference
### FROM
```dockerfile
FROM image:tag
FROM image:tag AS builder
FROM --platform=linux/amd64 image:tag
```
### RUN
```dockerfile
# Shell form
RUN apt-get update && apt-get install -y package
# Exec form
RUN ["executable", "param1", "param2"]
```
### COPY vs ADD
```dockerfile
# COPY - preferred for local files
COPY ./src /app/src
COPY --chown=user:group files /app/
# ADD - can extract tars, fetch URLs (use sparingly)
ADD archive.tar.gz /app/
```
### ENV vs ARG
```dockerfile
# ARG - build-time only
ARG VERSION=1.0
# ENV - persists in image
ENV APP_VERSION=$VERSION
```
### EXPOSE
```dockerfile
EXPOSE 8080
EXPOSE 443/tcp
EXPOSE 53/udp
```
Documentation only - doesn't publish ports.
### ENTRYPOINT vs CMD
```dockerfile
# ENTRYPOINT - main executable
ENTRYPOINT ["python"]
# CMD - default arguments (can be overridden)
CMD ["app.py"]
# Combined: python app.py
# Override: docker run image other.py -> python other.py
```
### USER
```dockerfile
RUN useradd -r -s /bin/false appuser
USER appuser
```
### HEALTHCHECK
```dockerfile
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
CMD curl -f http://localhost/health || exit 1
```
## Best Practices
### Layer Caching
Order from least to most frequently changed:
```dockerfile
# Rarely changes - cached
FROM node:18-alpine
WORKDIR /app
# Changes when deps change
COPY package*.json ./
RUN npm install
# Changes frequently - rebuild each time
COPY . .
```
### Reduce Layers
Combine RUN commands:
```dockerfile
# Bad - 3 layers
RUN apt-get update
RUN apt-get install -y package
RUN rm -rf /var/lib/apt/lists/*
# Good - 1 layer
RUN apt-get update && \
apt-get install -y package && \
rm -rf /var/lib/apt/lists/*
```
### Security
```dockerfile
# Use specific tags, never :latest
FROM node:18.17.0-alpine
# Non-root user
USER nobody
# Read-only filesystem
# (Set at runtime with --read-only)
# No secrets in image
# (Use build args or runtime secrets)
```
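At runtime, the hardening noted above translates to flags like these (illustrative):
```bash
docker run --read-only --tmpfs /tmp --user nobody --cap-drop ALL myimage:1.0
```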
### .dockerignore
```
.git
.gitignore
node_modules
*.log
.env
Dockerfile
docker-compose.yaml
README.md
```
## Build Commands
```bash
# Basic build
docker build -t myimage:tag .
# With build args
docker build --build-arg VERSION=1.0 -t myimage .
# No cache
docker build --no-cache -t myimage .
# Specific Dockerfile
docker build -f Dockerfile.prod -t myimage .
# Multi-platform
docker buildx build --platform linux/amd64,linux/arm64 -t myimage .
```
## Debugging Builds
```bash
# Build with progress output
docker build --progress=plain -t myimage .
# Inspect layers
docker history myimage
# Check image size
docker images myimage
```

# Docker Networking Reference
## Network Drivers
### Bridge (Default)
Isolated container network with port mapping.
```yaml
networks:
app-network:
driver: bridge
```
- Containers get private IPs (172.17.0.0/16 default)
- Port mapping exposes services (`-p 80:80`)
- DNS resolution between containers by name
- Default for single-host deployments
### Host
Container shares host network stack.
```yaml
services:
app:
network_mode: host
```
- No network isolation
- No port mapping needed (container uses host ports)
- Best performance (no NAT overhead)
- Use for: Network tools, performance-critical apps
### Macvlan
Container gets own MAC address on physical network.
```yaml
networks:
lan:
driver: macvlan
driver_opts:
parent: eth0
ipam:
config:
- subnet: 192.168.1.0/24
gateway: 192.168.1.1
ip_range: 192.168.1.128/25
```
- Container appears as physical device on LAN
- Direct network access, no port mapping
- Use for: Services needing LAN presence (DNS, DHCP)
- Requires promiscuous mode on parent interface
### IPvlan
Like macvlan but shares host MAC address.
```yaml
networks:
lan:
driver: ipvlan
driver_opts:
parent: eth0
ipvlan_mode: l2 # or l3
```
- L2 mode: Same subnet as host
- L3 mode: Different subnet, requires routing
- Use when: Macvlan blocked by switch, cloud environments
### None
No networking.
```yaml
services:
isolated:
network_mode: none
```
## Port Mapping
```yaml
ports:
# Simple mapping
- "80:80"
# Different host port
- "8080:80"
# Localhost only
- "127.0.0.1:8080:80"
# UDP
- "53:53/udp"
# Range
- "8080-8090:8080-8090"
# Random host port
- "80"
```
## DNS and Service Discovery
### Automatic DNS
Containers on same network resolve each other by service name:
```yaml
services:
web:
networks:
- app
db:
networks:
- app
```
`web` can reach `db` at hostname `db`.
### Aliases
```yaml
services:
db:
networks:
app:
aliases:
- database
- mysql
```
### Custom DNS
```yaml
services:
app:
dns:
- 8.8.8.8
- 8.8.4.4
dns_search:
- example.com
```
## Network Isolation
### Internal Networks
No external connectivity:
```yaml
networks:
backend:
internal: true
```
### Multiple Networks
```yaml
services:
web:
networks:
- frontend
- backend
db:
networks:
- backend # Not on frontend
networks:
frontend:
backend:
internal: true
```
## Static IPs
```yaml
services:
app:
networks:
app-network:
ipv4_address: 172.20.0.10
networks:
app-network:
ipam:
config:
- subnet: 172.20.0.0/24
```
## Troubleshooting
### Inspect Network
```bash
docker network ls
docker network inspect <network>
```
### Container Network Info
```bash
docker inspect <container> --format '{{json .NetworkSettings.Networks}}'
```
### Test Connectivity
```bash
# From inside container
docker exec <container> ping <target>
docker exec <container> curl <url>
# Check DNS
docker exec <container> nslookup <hostname>
```
### Common Issues
| Problem | Check |
|---------|-------|
| Can't reach container | Port mapping, firewall, network attachment |
| DNS not working | Same network, container running |
| Slow network | Network mode, MTU settings |
| Port already in use | `lsof -i :<port>`, change mapping |

# Docker on Proxmox VMs
Best practices for running Docker workloads on Proxmox VE.
## Template Selection
Use Docker-ready templates (102+) which have Docker pre-installed:
| Template ID | Name | Docker? |
|-------------|------|---------|
| 100 | tmpl-ubuntu-2404-base | No |
| 101 | tmpl-ubuntu-2404-standard | No |
| 102 | tmpl-ubuntu-2404-docker | Yes |
| 103 | tmpl-ubuntu-2404-github-runner | Yes |
| 104 | tmpl-ubuntu-2404-pihole | Yes |
**DO NOT** install Docker via cloud-init on templates 102+.
## VM vs LXC for Docker
| Factor | VM (QEMU) | LXC Unprivileged | LXC Privileged |
|--------|-----------|------------------|----------------|
| Docker support | Full | Limited | Works but risky |
| Isolation | Complete | Shared kernel | Shared kernel |
| Overhead | Higher | Lower | Lower |
| Nested containers | Works | Requires config | Works |
| GPU passthrough | Yes | Limited | Limited |
| Security | Best | Good | Avoid |
**Recommendation:** Use VMs for Docker workloads. LXC adds complexity for marginal resource savings.
## VM Sizing for Docker
### Minimum for Docker host
```
CPU: 2 cores
RAM: 4 GB (2 GB for OS, 2 GB for containers)
Disk: 50 GB (20 GB OS, 30 GB images/volumes)
```
### Per-container overhead
```
Base: ~10 MB RAM per container
Image layers: Shared between containers
Volumes: Depends on data
```
### Sizing formula
```
Total RAM = 2 GB (OS) + sum(container memory limits) + 20% buffer
Total Disk = 20 GB (OS) + images + volumes + 20% buffer
```
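Worked example: five containers capped at 512 MB each gives 2 GB + 2.5 GB = 4.5 GB; a 20% buffer brings that to about 5.4 GB, so a 6 GB VM fits comfortably.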
## Storage Backend Selection
| Proxmox Storage | Docker Use Case | Performance |
|-----------------|-----------------|-------------|
| local-lvm | General workloads | Good |
| ZFS | Database containers | Better (snapshots) |
| Ceph | HA workloads | Good (distributed) |
| NFS | Shared config/data | Moderate |
### Volume mapping to Proxmox storage
```yaml
# docker-compose.yaml
volumes:
db_data:
driver: local
driver_opts:
type: none
device: /mnt/storage/mysql # Map to Proxmox storage mount
o: bind
```
## Network Considerations
### Bridge mode (default)
Container gets private IP, NAT to VM IP. Good for most workloads.
```yaml
services:
web:
ports:
- "80:80" # VM_IP:80 -> container:80
```
### Host mode
Container shares VM network stack. Use for network tools or performance.
```yaml
services:
pihole:
network_mode: host # Container uses VM's IPs directly
```
### Macvlan (direct LAN access)
Container gets own IP on Proxmox bridge.
```bash
# On Docker host (VM)
docker network create -d macvlan \
--subnet=192.168.1.0/24 \
--gateway=192.168.1.1 \
-o parent=eth0 \
lan
```
```yaml
services:
app:
networks:
lan:
ipv4_address: 192.168.1.50
networks:
lan:
external: true
```
**Note:** Requires a Proxmox bridge without VLAN tagging on that interface, or passing the VLAN-tagged interface through to the VM.
## Resource Limits
Always set limits to prevent container runaway affecting VM:
```yaml
services:
app:
deploy:
resources:
limits:
cpus: '2'
memory: 2G
reservations:
cpus: '0.5'
memory: 512M
```
## GPU Passthrough
For containers needing GPU (AI/ML, transcoding):
1. **Proxmox:** Pass GPU to VM
```
hostpci0: 0000:01:00.0,pcie=1
```
2. **VM:** Install NVIDIA drivers + nvidia-container-toolkit
3. **Compose:**
```yaml
services:
plex:
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: 1
capabilities: [gpu]
```
## Backup Considerations
### What to backup
| Data | Method | Location |
|------|--------|----------|
| VM disk | Proxmox vzdump | Includes everything |
| Docker volumes | docker run --volumes-from | Application-level |
| Compose files | Git | Version control |
### Proxmox backup includes Docker
When backing up the VM with vzdump, all Docker data (images, volumes, containers) is included.
```bash
vzdump <vmid> --mode snapshot --storage backup
```
### Application-consistent backups
For databases, quiesce writes before the snapshot. Note that `FLUSH TABLES WITH READ LOCK` only holds while the issuing client session stays open, so separate `docker exec` pre/post calls will not keep the lock across the backup. A logical dump avoids the problem:
```bash
# Dump before vzdump runs (consistent snapshot for InnoDB, no global lock)
docker exec mysql mysqldump --single-transaction --all-databases > /backup/pre-vzdump.sql
```
## Troubleshooting
### Container can't reach internet
1. Check VM can reach internet: `ping 8.8.8.8`
2. Check Docker DNS: `docker run --rm alpine nslookup google.com`
3. Check iptables forwarding: `sysctl net.ipv4.ip_forward`
### Port not accessible from LAN
1. Check Proxmox firewall allows port
2. Check VM firewall (ufw/iptables)
3. Check container is bound to 0.0.0.0 not 127.0.0.1
### Disk space issues
```bash
# Check Docker disk usage
docker system df
# Clean up
docker system prune -a --volumes # WARNING: removes all unused data
# Check VM disk
df -h
```

# LXC vs Docker Containers
Understanding when to use Proxmox LXC containers vs Docker containers.
## Fundamental Differences
| Aspect | LXC (Proxmox) | Docker |
|--------|---------------|--------|
| Abstraction | System container (full OS) | Application container |
| Init system | systemd, runit, etc. | Single process (PID 1) |
| Management | Proxmox (pct) | Docker daemon |
| Persistence | Stateful by default | Ephemeral by default |
| Updates | apt/yum inside container | Replace container |
| Networking | Proxmox managed | Docker managed |
## When to Use LXC
- **Long-running services** with traditional management (systemd, cron)
- **Multi-process applications** that expect init system
- **Legacy apps** not designed for containers
- **Dev/test environments** mimicking full VMs
- **Resource efficiency** when full VM isolation not needed
- **Direct Proxmox management** (backup, snapshots, migration)
```bash
# Create LXC
pct create 200 local:vztmpl/ubuntu-22.04-standard_22.04-1_amd64.tar.zst \
--hostname mycontainer \
--storage local-lvm \
--rootfs local-lvm:8 \
--cores 2 \
--memory 2048 \
--net0 name=eth0,bridge=vmbr0,ip=dhcp
```
## When to Use Docker
- **Microservices** with single responsibility
- **CI/CD pipelines** with reproducible builds
- **Rapid deployment** and scaling
- **Application isolation** within a host
- **Compose stacks** with multi-container apps
- **Ecosystem tooling** (registries, orchestration)
```yaml
# docker-compose.yaml
services:
app:
image: myapp:1.0
restart: unless-stopped
```
## Decision Matrix
| Scenario | Recommendation | Rationale |
|----------|---------------|-----------|
| Pi-hole | Docker on VM | Easy updates, compose ecosystem |
| Database server | LXC or VM | Stateful, traditional management |
| Web app microservice | Docker | Ephemeral, scalable |
| Development environment | LXC | Full OS, multiple services |
| CI runner | Docker on VM | Isolation, reproducibility |
| Network appliance | LXC | Direct network access, systemd |
| Home automation | Docker on VM | Compose stacks, easy backup |
## Hybrid Approach
Common pattern: **VM runs Docker**, managed by Proxmox.
```
Proxmox Node
├── VM: docker-host-1 (template 102)
│ ├── Container: nginx
│ ├── Container: app
│ └── Container: redis
├── VM: docker-host-2 (template 102)
│ ├── Container: postgres
│ └── Container: backup
└── LXC: pihole (direct network)
```
Benefits:
- Proxmox handles VM-level backup/migration
- Docker handles application deployment
- Clear separation of concerns
## Docker in LXC (Not Recommended)
Running Docker inside LXC is possible but adds complexity:
### Requirements
1. Privileged container OR nested containers enabled
2. AppArmor profile modifications
3. Keyctl feature enabled
```bash
# LXC config (Proxmox)
lxc.apparmor.profile: unconfined
lxc.cgroup.devices.allow: a
lxc.cap.drop:
features: keyctl=1,nesting=1
```
### Issues
- Security: Reduced isolation
- Compatibility: Some Docker features broken
- Debugging: Two container layers
- Backup: More complex
**Recommendation:** Use VM with Docker instead.
## Resource Comparison
For equivalent workload:
| Resource | VM + Docker | LXC | Docker in LXC |
|----------|-------------|-----|---------------|
| RAM overhead | ~500 MB | ~50 MB | ~100 MB |
| Disk overhead | ~5 GB | ~500 MB | ~1 GB |
| Boot time | 30-60s | 2-5s | 5-10s |
| Isolation | Full | Shared kernel | Shared kernel |
| Complexity | Low | Low | High |
## Migration Paths
### LXC to Docker
1. Export application config from LXC
2. Create Dockerfile/compose
3. Build image
4. Deploy to Docker host
5. Migrate data volumes
### Docker to LXC
1. Install service directly in LXC (apt/yum)
2. Configure with systemd
3. Migrate data
4. Update Proxmox firewall rules

# Docker Troubleshooting Reference
## Common Errors
| Error | Cause | Solution |
|-------|-------|----------|
| Container exits immediately | Bad entrypoint, missing deps | Check logs, verify CMD |
| Port already in use | Conflict with host/other container | `lsof -i :<port>`, change mapping |
| Volume permission denied | UID mismatch | Check ownership, use named volumes |
| Network not found | Network removed/not created | `docker network create` |
| Image pull failed | Registry/auth/name issue | Check registry, credentials, name |
| OOM killed | Exceeded memory limit | Increase limit or optimize app |
| DNS resolution failed | Network config issue | Check DNS settings, network mode |
| Health check failing | App not responding | Check command, increase timeout |
## Diagnostic Commands
### Container Status
```bash
# List all containers (including stopped)
docker ps -a
# Check exit code
docker inspect <container> --format '{{.State.ExitCode}}'
# Check restart count
docker inspect <container> --format '{{.RestartCount}}'
```
### Logs
```bash
# View logs
docker logs <container>
# Follow logs
docker logs -f <container>
# Last N lines
docker logs --tail 100 <container>
# With timestamps
docker logs -t <container>
# Since time
docker logs --since 10m <container>
```
### Resource Usage
```bash
# Real-time stats
docker stats
# Single container
docker stats <container>
# Disk usage
docker system df
docker system df -v # Verbose
```
### Container Details
```bash
# Full inspection
docker inspect <container>
# Specific fields
docker inspect <container> --format '{{.State.Status}}'
docker inspect <container> --format '{{json .NetworkSettings.Networks}}'
docker inspect <container> --format '{{.Mounts}}'
```
### Process and Network
```bash
# Running processes
docker top <container>
# Execute command
docker exec <container> ps aux
docker exec <container> netstat -tlnp
# Network connectivity
docker exec <container> ping <host>
docker exec <container> curl <url>
docker exec <container> nslookup <hostname>
```
## Troubleshooting Workflows
### Container Won't Start
1. Check logs: `docker logs <container>`
2. Check exit code: `docker inspect <container> --format '{{.State.ExitCode}}'`
3. Run interactively: `docker run -it <image> sh`
4. Check entrypoint/cmd: `docker inspect <image> --format '{{.Config.Cmd}}'`
### Container Keeps Restarting
1. Check logs for errors
2. Verify health check if configured
3. Check resource limits (OOM)
4. Test entrypoint manually
### Network Issues
1. Verify network exists: `docker network ls`
2. Check container attached: `docker inspect <container> --format '{{.NetworkSettings.Networks}}'`
3. Test DNS: `docker exec <container> nslookup <service>`
4. Check port mapping: `docker port <container>`
### Volume Issues
1. Check mount: `docker inspect <container> --format '{{.Mounts}}'`
2. Verify permissions inside: `docker exec <container> ls -la /path`
3. Check host path exists (bind mounts)
4. Try named volume instead
### Performance Issues
1. Check resource usage: `docker stats`
2. Review limits: `docker inspect <container> --format '{{.HostConfig.Memory}}'`
3. Check for resource contention
4. Profile application inside container
## Cleanup
```bash
# Remove stopped containers
docker container prune
# Remove unused images
docker image prune
# Remove unused volumes
docker volume prune
# Remove unused networks
docker network prune
# Remove everything unused
docker system prune -a --volumes
```
## Debugging Compose
```bash
# Validate compose file
docker compose config
# See what would run
docker compose config --services
# Check why service isn't starting
docker compose logs <service>
# Force recreate
docker compose up -d --force-recreate
# Rebuild images
docker compose up -d --build
```
## Common Compose Issues
| Problem | Check |
|---------|-------|
| Service not starting | `docker compose logs <service>` |
| depends_on not working | Service starts but app not ready (use healthcheck) |
| Volume not persisting | Check volume name, not recreated |
| Env vars not loading | Check .env file location, syntax |
| Network errors | Check network names, external networks |
## Health Check Debugging
```bash
# Check health status
docker inspect <container> --format '{{.State.Health.Status}}'
# View health log
docker inspect <container> --format '{{json .State.Health}}' | jq
# Test health command manually
docker exec <container> <health-command>
```
## Emergency Recovery
### Force Stop
```bash
docker kill <container>
```
### Remove Stuck Container
```bash
docker rm -f <container>
```
### Reset Docker
```bash
# Restart Docker daemon
sudo systemctl restart docker
# Or on macOS
# Restart Docker Desktop
```

# Docker Volumes Reference
## Volume Types
### Named Volumes (Recommended)
Managed by Docker, stored in `/var/lib/docker/volumes/`.
```yaml
volumes:
db-data:
services:
db:
volumes:
- db-data:/var/lib/mysql
```
Benefits:
- Portable across hosts
- Backup-friendly
- Fewer permission issues
- Can use volume drivers (NFS, etc.)
### Bind Mounts
Direct host path mapping.
```yaml
services:
web:
volumes:
- ./config:/etc/app/config:ro
- /host/data:/container/data
```
Benefits:
- Direct file access from host
- Development workflow (live reload)
- Access to host files
Drawbacks:
- Host-dependent paths
- Permission issues possible
- Less portable
### tmpfs Mounts
In-memory storage (Linux only).
```yaml
services:
app:
tmpfs:
- /tmp
- /run:size=100m
```
Benefits:
- Fast (RAM-based)
- Secure (not persisted)
- Good for secrets, cache
## Volume Options
### Read-Only
```yaml
volumes:
- ./config:/etc/app/config:ro
```
### Bind Propagation
```yaml
volumes:
- type: bind
source: ./data
target: /data
bind:
propagation: rslave
```
### Volume Driver Options
```yaml
volumes:
nfs-data:
driver: local
driver_opts:
type: nfs
o: addr=192.168.1.100,rw
device: ":/export/data"
```
## Common Patterns
### Database Data
```yaml
services:
postgres:
image: postgres:15
volumes:
- pgdata:/var/lib/postgresql/data
environment:
POSTGRES_PASSWORD: secret
volumes:
pgdata:
```
### Configuration Files
```yaml
services:
nginx:
image: nginx:alpine
volumes:
- ./nginx.conf:/etc/nginx/nginx.conf:ro
- ./html:/usr/share/nginx/html:ro
```
### Shared Data Between Services
```yaml
services:
app:
volumes:
- shared:/data
worker:
volumes:
- shared:/data
volumes:
shared:
```
### Log Persistence
```yaml
services:
app:
volumes:
- logs:/var/log/app
volumes:
logs:
```
## Backup and Restore
### Backup Named Volume
```bash
# Create backup
docker run --rm \
-v myvolume:/source:ro \
-v $(pwd):/backup \
alpine tar czf /backup/myvolume.tar.gz -C /source .
# Restore backup
docker run --rm \
-v myvolume:/target \
-v $(pwd):/backup \
alpine tar xzf /backup/myvolume.tar.gz -C /target
```
### Copy Files from Volume
```bash
docker cp <container>:/path/to/file ./local-file
```
## Volume Management
```bash
# List volumes
docker volume ls
# Inspect volume
docker volume inspect <volume>
# Remove unused volumes
docker volume prune
# Remove specific volume
docker volume rm <volume>
# Create volume manually
docker volume create --name myvolume
```
## Permissions
### Common Permission Issues
```bash
# Check container user
docker exec <container> id
# Check volume permissions
docker exec <container> ls -la /data
```
### Solutions
```yaml
# Run as specific user
services:
app:
user: "1000:1000"
volumes:
- ./data:/data
```
Or fix host permissions:
```bash
chown -R 1000:1000 ./data
```
## Best Practices
1. **Use named volumes for data** - More portable than bind mounts
2. **Read-only when possible** - Use `:ro` for config files
3. **Separate concerns** - Different volumes for data, config, logs
4. **Backup strategy** - Plan for volume backup/restore
5. **Don't store in image** - Data should be in volumes, not image layers
6. **Use .dockerignore** - Exclude data directories from build context

skills/proxmox/SKILL.md
---
name: proxmox
description: |
Proxmox VE virtualization platform reference for VM/LXC management, clustering,
storage, and networking. Includes Terraform and Ansible integration patterns.
Use when working with Proxmox configurations, CLI commands, troubleshooting
VMs/containers, or planning resource allocation.
Triggers: proxmox, qemu, kvm, lxc, pve, vm, container, cluster, vzdump, qm, pct.
---
# Proxmox Skill
Proxmox VE virtualization platform reference for VM management, containers, clustering, and homelab infrastructure.
## Quick Reference
```bash
# VM management (qm)
qm list # List all VMs
qm status <vmid> # Check VM status
qm start <vmid> # Start VM
qm stop <vmid> # Stop VM (graceful)
qm shutdown <vmid> # Shutdown VM (ACPI)
qm unlock <vmid> # Remove lock
qm config <vmid> # Show VM config
# Container management (pct)
pct list # List all containers
pct status <ctid> # Check container status
pct start <ctid> # Start container
pct stop <ctid> # Stop container
pct enter <ctid> # Enter container shell
# Cluster management (pvecm)
pvecm status # Cluster status and quorum
pvecm nodes # List cluster nodes
# API shell (pvesh)
pvesh get /nodes # List nodes via API
pvesh get /nodes/<node>/status # Node resource status
# Backup (vzdump)
vzdump <vmid> --mode snapshot --storage <storage>
vzdump --all --compress zstd
```
## Reference Files
Load on-demand based on task:
| Topic | File | When to Load |
|-------|------|--------------|
| VM vs LXC | [vm-lxc.md](references/vm-lxc.md) | Choosing virtualization type |
| Docker Hosting | [docker-hosting.md](references/docker-hosting.md) | Running Docker on Proxmox |
| Networking | [networking.md](references/networking.md) | Bridges, VLANs, SDN, firewall |
| Storage | [storage.md](references/storage.md) | Storage backends, content types |
| Clustering | [clustering.md](references/clustering.md) | HA, quorum, fencing |
| Backup | [backup.md](references/backup.md) | vzdump modes, restore |
| CLI Tools | [cli-tools.md](references/cli-tools.md) | qm, pct, pvecm, pvesh commands |
| Troubleshooting | [troubleshooting.md](references/troubleshooting.md) | Common errors, diagnostics |
| Automation Tools | [automation-tools.md](references/automation-tools.md) | Terraform/Ansible integration |
## Validation Checklist
Before deploying VMs/containers:
- [ ] Cluster status healthy (`pvecm status`)
- [ ] Node resources available (CPU, RAM, disk)
- [ ] Storage accessible and mounted
- [ ] Network bridges configured correctly
- [ ] VLAN tags match network design
- [ ] Resource allocation within node limits
- [ ] HA configuration correct (if enabled)
- [ ] Backup schedule in place
- [ ] Naming convention followed
## VM vs LXC Quick Decision
| Factor | Use VM | Use LXC |
|--------|--------|---------|
| OS | Windows, BSD, any | Linux only |
| Isolation | Full kernel isolation | Shared kernel |
| Performance | Good | Better (lighter) |
| Startup | Slower | Fast |
| Density | Lower | Higher |
| Complexity | Any workload | Simple services |
## Homelab Network VLANs
| VLAN | Purpose | Proxmox Bridge |
|------|---------|----------------|
| 5 | Management (Web UI, API, SSH) | vmbr5 |
| 1 | Trusted network | vmbr0 |
| 11 | Storage (NFS/Ceph, MTU 9000) | vmbr11 |
| 12 | High-speed transfers | vmbr12 |

# Proxmox Automation Tools
Integration patterns for managing Proxmox with Terraform and Ansible.
## Tool Selection Guide
| Task | Recommended Tool | Rationale |
|------|-----------------|-----------|
| VM/LXC provisioning | Terraform | Declarative state, idempotent, handles dependencies |
| Template creation | Packer | Repeatable builds, version-controlled |
| Post-boot configuration | Ansible | Agent-based, procedural, good for drift |
| One-off VM operations | Ansible | Quick tasks, no state file needed |
| Dynamic inventory | Ansible | Query running VMs for configuration |
| Bulk VM creation | Terraform | count/for_each, parallel creation |
| Snapshot management | Either | Terraform for lifecycle, Ansible for ad-hoc |
| Cluster administration | CLI/API | Direct access for maintenance tasks |
## Terraform Integration
### Provider
```hcl
terraform {
required_providers {
proxmox = {
source = "telmate/proxmox"
version = "~> 3.0"
}
}
}
provider "proxmox" {
pm_api_url = "https://proxmox.example.com:8006/api2/json"
pm_api_token_id = "terraform@pve!mytoken"
pm_api_token_secret = var.pm_api_token_secret
}
```
### Common Patterns
```hcl
# Clone from template
resource "proxmox_vm_qemu" "vm" {
name = "myvm"
target_node = "joseph"
clone = "tmpl-ubuntu-2404-standard"
full_clone = true
cores = 2
memory = 4096
disks {
scsi {
scsi0 {
disk {
storage = "local-lvm"
size = "50G"
}
}
}
}
}
```
### Skill Reference
Load terraform skill for detailed patterns:
- `terraform/references/proxmox/gotchas.md` - Critical issues
- `terraform/references/proxmox/vm-qemu.md` - VM resource patterns
- `terraform/references/proxmox/authentication.md` - API setup
## Ansible Integration
### Collection
```bash
ansible-galaxy collection install community.general
```
### Common Patterns
```yaml
# Clone VM
- name: Clone from template
community.general.proxmox_kvm:
api_host: proxmox.example.com
api_user: ansible@pve
api_token_id: mytoken
api_token_secret: "{{ proxmox_token_secret }}"
node: joseph
vmid: 300
name: myvm
clone: tmpl-ubuntu-2404-standard
full: true
timeout: 500
# Start VM
- name: Start VM
community.general.proxmox_kvm:
# ... auth ...
vmid: 300
state: started
```
### Skill Reference
Load ansible skill for detailed patterns:
- `ansible/references/proxmox/modules.md` - All Proxmox modules
- `ansible/references/proxmox/gotchas.md` - Common issues
- `ansible/references/proxmox/dynamic-inventory.md` - Auto-discovery
## Terraform vs Ansible Decision
### Use Terraform When
- Creating infrastructure from scratch
- Managing VM lifecycle (create, update, destroy)
- Need state tracking and drift detection
- Deploying multiple similar VMs (for_each)
- Complex dependencies between resources
- Team collaboration with state locking
### Use Ansible When
- Configuring VMs after creation
- Ad-hoc operations (start/stop specific VMs)
- Dynamic inventory needed for other playbooks
- Quick one-off tasks
- No state file management desired
- Integration with existing Ansible workflows
### Use Both When
- Terraform provisions VMs
- Ansible configures them post-boot
- Ansible uses Proxmox dynamic inventory to find Terraform-created VMs
## Hybrid Workflow Example
```
1. Packer builds VM template
└── packer build ubuntu-2404.pkr.hcl
2. Terraform provisions VMs from template
└── terraform apply
└── Outputs: VM IPs, hostnames
3. Ansible configures VMs
└── Uses Proxmox dynamic inventory OR
└── Uses Terraform output as inventory
4. Ongoing management
└── Terraform for infrastructure changes
└── Ansible for configuration drift
```
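One way to wire step 2 into step 3 is rendering a Terraform output into a static inventory; a sketch, assuming the config defines an output named `vm_ips`:
```bash
terraform output -json vm_ips | jq -r '.[]' > inventory.ini   # one IP per line
ansible-playbook -i inventory.ini site.yml
```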
## API Token Sharing
Both tools can share the same API token:
```bash
# Create shared token
pveum user add automation@pve
pveum aclmod / -user automation@pve -role PVEAdmin
pveum user token add automation@pve shared --privsep 0
```
Store in shared secrets management (1Password, Vault, etc.).
## Common Gotchas
| Issue | Terraform | Ansible |
|-------|-----------|---------|
| VMID | Auto-assigns if not specified | Must specify manually |
| Cloud-init changes | Use replace_triggered_by | Limited support, use API |
| State tracking | Yes (tfstate) | No state file |
| Parallel operations | Yes (configurable) | Yes (forks) |
| Template name vs ID | Supports both | Supports both |
| Timeout handling | Provider config | Module parameter |

# Proxmox Backup Reference
## vzdump Overview
Built-in backup tool for VMs and containers.
```bash
# Basic backup
vzdump <vmid>
# With options
vzdump <vmid> --mode snapshot --storage backup-nfs --compress zstd
# Backup all VMs
vzdump --all --compress zstd
```
## Backup Modes
| Mode | Downtime | Method | Use Case |
|------|----------|--------|----------|
| stop | Full | Shutdown, backup, start | Consistent, any storage |
| suspend | Brief | Pause, backup, resume | Running state preserved |
| snapshot | None | LVM/ZFS/Ceph snapshot | Production, requires snapshot storage |
### Mode Selection
```bash
# Stop mode (most consistent)
vzdump <vmid> --mode stop
# Suspend mode (preserves RAM state)
vzdump <vmid> --mode suspend
# Snapshot mode (live, requires compatible storage)
vzdump <vmid> --mode snapshot
```
## Backup Formats
| Format | Used For | Description |
|--------|----------|-------------|
| VMA | VMs | Native Proxmox archive format |
| tar | Containers | Standard tar archive |
## Compression Options
| Type | Speed | Ratio | CPU |
|------|-------|-------|-----|
| none | Fastest | 1:1 | Low |
| lzo | Fast | Good | Low |
| gzip | Moderate | Better | Medium |
| zstd | Fast | Best | Medium |
Recommendation: `zstd` for best balance.
```bash
vzdump <vmid> --compress zstd
```
## Storage Configuration
```bash
# Backup to specific storage
vzdump <vmid> --storage backup-nfs
# Check available backup storage
pvesm status | grep backup
```
## Scheduled Backups
Configure in Datacenter → Backup:
- Schedule (cron format)
- Selection (all, pool, specific VMs)
- Storage destination
- Mode and compression
- Retention policy
### Retention Policy
```
keep-last: 3 # Keep last N backups
keep-daily: 7 # Keep daily for N days
keep-weekly: 4 # Keep weekly for N weeks
keep-monthly: 6 # Keep monthly for N months
```
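The same policy can be set per job on the command line (supported on current PVE releases):
```bash
vzdump <vmid> --storage backup-nfs --prune-backups keep-last=3,keep-daily=7,keep-weekly=4
```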
## Restore Operations
### Full Restore
```bash
# Restore VM
qmrestore <backup-file> <vmid>
# Restore to different VMID
qmrestore <backup-file> <new-vmid>
# Restore container
pct restore <ctid> <backup-file>
```
### Restore Options
```bash
# Restore to different storage
qmrestore <backup> <vmid> --storage local-lvm
# Force overwrite existing VM
qmrestore <backup> <vmid> --force
```
### File-Level Restore
```bash
# Mount backup for file extraction
# (Use web UI: Backup → Restore → File Restore)
```
## Proxmox Backup Server (PBS)
Dedicated backup server with deduplication.
### Benefits
- Deduplication across backups
- Encryption at rest
- Verification and integrity checks
- Efficient incremental backups
- Remote backup sync
### Integration
Add PBS storage:
```bash
pvesm add pbs <storage-id> \
--server <pbs-server> \
--datastore <datastore> \
--username <user>@pbs \
--fingerprint <fingerprint>
```
## Backup Best Practices
- Store backups on separate storage from VMs
- Use snapshot mode for production VMs
- Test restores regularly
- Offsite backup copy for disaster recovery
- Monitor backup job completion
- Set appropriate retention policy
## Troubleshooting
| Issue | Check |
|-------|-------|
| Backup fails | Storage space, VM state, permissions |
| Slow backup | Mode (snapshot faster), compression, network |
| Restore fails | Storage compatibility, VMID conflicts |
| Snapshot fails | Storage doesn't support snapshots |

# Proxmox CLI Tools Reference
## qm - VM Management
```bash
# List and status
qm list # List all VMs
qm status <vmid> # VM status
qm config <vmid> # Show VM config
# Power operations
qm start <vmid> # Start VM
qm stop <vmid> # Force stop
qm shutdown <vmid> # ACPI shutdown
qm reboot <vmid> # ACPI reboot
qm reset <vmid> # Hard reset
qm suspend <vmid> # Suspend to RAM
qm resume <vmid> # Resume from suspend
# Configuration
qm set <vmid> --memory 4096 # Set memory
qm set <vmid> --cores 4 # Set CPU cores
qm set <vmid> --name newname # Rename VM
# Disk operations
qm resize <vmid> scsi0 +10G # Extend disk
qm move-disk <vmid> scsi0 <storage> # Move disk
# Snapshots
qm snapshot <vmid> <snapname> # Create snapshot
qm listsnapshot <vmid> # List snapshots
qm rollback <vmid> <snapname> # Rollback
qm delsnapshot <vmid> <snapname> # Delete snapshot
# Templates and clones
qm template <vmid> # Convert to template
qm clone <vmid> <newid> # Clone VM
# Migration
qm migrate <vmid> <target-node> # Live migrate
# Troubleshooting
qm unlock <vmid> # Remove lock
qm showcmd <vmid> # Show QEMU command
qm monitor <vmid> # QEMU monitor
qm guest cmd <vmid> <command> # Guest agent command
```
## pct - Container Management
```bash
# List and status
pct list # List all containers
pct status <ctid> # Container status
pct config <ctid> # Show config
# Power operations
pct start <ctid> # Start container
pct stop <ctid> # Stop container
pct shutdown <ctid> # Graceful shutdown
pct reboot <ctid> # Reboot
# Access
pct enter <ctid> # Enter shell
pct exec <ctid> -- <command> # Run command
pct console <ctid> # Attach console
# Configuration
pct set <ctid> --memory 2048 # Set memory
pct set <ctid> --cores 2 # Set CPU cores
pct set <ctid> --hostname name # Set hostname
# Disk operations
pct resize <ctid> rootfs +5G # Extend rootfs
pct move-volume <ctid> <vol> <storage> # Move volume
# Snapshots
pct snapshot <ctid> <snapname> # Create snapshot
pct listsnapshot <ctid> # List snapshots
pct rollback <ctid> <snapname> # Rollback
# Templates
pct template <ctid> # Convert to template
pct clone <ctid> <newid> # Clone container
# Migration
pct migrate <ctid> <target-node> # Migrate container
# Troubleshooting
pct unlock <ctid> # Remove lock
pct push <ctid> <src> <dst> # Copy file to container
pct pull <ctid> <src> <dst> # Copy file from container
```
## pvecm - Cluster Management
```bash
# Status
pvecm status # Cluster status
pvecm nodes # List nodes
pvecm qdevice # QDevice status
# Node operations
pvecm add <node> # Join cluster
pvecm delnode <node> # Remove node
pvecm updatecerts # Update SSL certs
# Recovery
pvecm expected <votes> # Set expected votes
```
## pvesh - API Shell
```bash
# GET requests
pvesh get /nodes # List nodes
pvesh get /nodes/<node>/status # Node status
pvesh get /nodes/<node>/qemu # List VMs on node
pvesh get /nodes/<node>/qemu/<vmid>/status/current # VM status
pvesh get /storage # List storage
pvesh get /cluster/resources # All cluster resources
# POST/PUT requests
pvesh create /nodes/<node>/qemu -vmid <id> ... # Create VM
pvesh set /nodes/<node>/qemu/<vmid>/config ... # Modify VM
# DELETE requests
pvesh delete /nodes/<node>/qemu/<vmid> # Delete VM
```
## vzdump - Backup
```bash
# Basic backup
vzdump <vmid> # Backup VM
vzdump <ctid> # Backup container
# Options
vzdump <vmid> --mode snapshot # Snapshot mode
vzdump <vmid> --compress zstd # With compression
vzdump <vmid> --storage backup # To specific storage
vzdump <vmid> --mailto admin@example.com # Email notification
# Backup all
vzdump --all # All VMs and containers
vzdump --pool <pool> # All in pool
```
## qmrestore / pct restore
```bash
# Restore VM
qmrestore <backup.vma> <vmid>
qmrestore <backup.vma> <vmid> --storage local-lvm
# Restore container
pct restore <ctid> <backup.tar>
pct restore <ctid> <backup.tar> --storage local-lvm
```
## Useful Combinations
```bash
# Check resources on all nodes
for node in joseph maxwell everette; do
echo "=== $node ==="
pvesh get /nodes/$node/status | jq '{cpu:.cpu, memory:.memory}'
done
# Stop all VMs on a node
qm list | awk 'NR>1 {print $1}' | xargs -I {} qm stop {}
# List VMs with their IPs (requires guest agent)
for vmid in $(qm list | awk 'NR>1 {print $1}'); do
echo -n "$vmid: "
qm guest cmd $vmid network-get-interfaces 2>/dev/null | jq -r '.[].["ip-addresses"][]?.["ip-address"]' | head -1
done
```

# Proxmox Clustering Reference
## Cluster Benefits
- Centralized web management
- Live VM migration between nodes
- High availability (HA) with automatic failover
- Shared configuration
## Cluster Requirements
| Requirement | Details |
|-------------|---------|
| Version | Same major/minor Proxmox version |
| Time | NTP synchronized |
| Network | Low-latency cluster network |
| Names | Unique node hostnames |
| Storage | Shared storage for HA |
## Cluster Commands
```bash
# Check cluster status
pvecm status
# List cluster nodes
pvecm nodes
# Add node to cluster (run on new node)
pvecm add <existing-node>
# Remove node (run on remaining node)
pvecm delnode <node-name>
# Expected votes (split-brain recovery)
pvecm expected <votes>
```
## Quorum
Cluster requires majority of nodes online to operate.
| Nodes | Quorum | Can Lose |
|-------|--------|----------|
| 2 | 2 | 0 (use QDevice) |
| 3 | 2 | 1 |
| 4 | 3 | 1 |
| 5 | 3 | 2 |
### QDevice
External quorum device for even-node clusters:
- Prevents split-brain in 2-node clusters
- Runs on separate machine
- Provides tie-breaking vote
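Setup sketch, assuming corosync-qnetd runs on the external host and corosync-qdevice is installed on the cluster nodes:
```bash
# Run on one cluster node; configuration propagates to all members
pvecm qdevice setup <qdevice-ip>
```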
## High Availability (HA)
Automatic VM restart on healthy node if host fails.
### Requirements
- Shared storage (Ceph, NFS, iSCSI)
- Fencing enabled (watchdog)
- HA group configured
- VM added to HA
### HA States
| State | Description |
|-------|-------------|
| started | VM running, managed by HA |
| stopped | VM stopped intentionally |
| migrate | Migration in progress |
| relocate | Moving to different node |
| error | Problem detected |
### HA Configuration
1. Enable fencing (watchdog device)
2. Create HA group (optional)
3. Add VM to HA: Datacenter → HA → Add
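The same registration works from the shell with ha-manager (group name illustrative):
```bash
ha-manager add vm:100 --state started --group prod
ha-manager status
```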
### Fencing
Prevents split-brain by forcing failed node to stop:
```bash
# Check watchdog status
cat /proc/sys/kernel/watchdog
# Watchdog config
/etc/pve/ha/fence.cfg
```
## Live Migration
Move running VM between nodes without downtime.
### Requirements
- Shared storage OR local-to-local migration
- Same CPU architecture
- Network connectivity
- Sufficient resources on target
### Migration Types
| Type | Downtime | Requirements |
|------|----------|--------------|
| Live | Minimal | Shared storage |
| Offline | Full | Any storage |
| Local storage | Moderate | Copies disk |
### Migration Command
```bash
# Live migrate
qm migrate <vmid> <target-node>
# Offline migrate
qm migrate <vmid> <target-node> --offline
# With local disk
qm migrate <vmid> <target-node> --with-local-disks
```
## Cluster Network
### Corosync Network
Cluster communication (default port 5405):
- Low-latency required
- Dedicated VLAN recommended
- Redundant links for HA
### Configuration
```
# /etc/pve/corosync.conf
nodelist {
node {
name: node1
ring0_addr: 192.168.10.1
}
node {
name: node2
ring0_addr: 192.168.10.2
}
}
```
## Troubleshooting
### Quorum Lost
```bash
# Check status
pvecm status
# Force expected votes (DANGEROUS)
pvecm expected 1
# Then: recover remaining nodes
```
### Node Won't Join
- Check network connectivity
- Verify time sync
- Check Proxmox versions match
- Review /var/log/pve-cluster/
### Split Brain Recovery
1. Identify authoritative node
2. Stop cluster services on other nodes
3. Set expected votes
4. Restart and rejoin nodes

# Docker Workloads on Proxmox
Best practices for hosting Docker containers on Proxmox VE.
## Hosting Options
| Option | Isolation | Overhead | Complexity | Recommendation |
|--------|-----------|----------|------------|----------------|
| VM + Docker | Full | Higher | Low | **Recommended** |
| LXC + Docker | Shared kernel | Lower | High | Avoid |
| Bare metal Docker | None | Lowest | N/A | Not on Proxmox |
## VM for Docker (Recommended)
### Template Selection
Use Docker-ready templates (102+):
| Template | Docker Pre-installed |
|----------|---------------------|
| 102 (docker) | Yes |
| 103 (github-runner) | Yes |
| 104 (pihole) | Yes |
### VM Sizing
| Workload | CPU | RAM | Disk |
|----------|-----|-----|------|
| Light (1-3 containers) | 2 | 4 GB | 50 GB |
| Medium (4-10 containers) | 4 | 8 GB | 100 GB |
| Heavy (10+ containers) | 8+ | 16+ GB | 200+ GB |
### Storage Backend
| Proxmox Storage | Docker Suitability | Notes |
|-----------------|-------------------|-------|
| local-lvm | Good | Default, fast |
| ZFS | Best | Snapshots, compression |
| Ceph | Good | Distributed, HA |
| NFS | Moderate | Shared access, slower |
### Network Configuration
```
Proxmox Node
├── vmbr0 (bridge) → VM eth0 → Docker bridge network
└── vmbr12 (high-speed) → VM eth1 → Docker macvlan (optional)
```
## Docker in LXC (Not Recommended)
If you must run Docker in LXC:
### Requirements
1. **Privileged container** or nesting enabled
2. **AppArmor** profile unconfined
3. **Keyctl** feature enabled
### LXC Options
```bash
# Proxmox GUI: Options → Features
nesting: 1
keyctl: 1
# Or in /etc/pve/lxc/<vmid>.conf
features: keyctl=1,nesting=1
lxc.apparmor.profile: unconfined
```
### Known Issues
- Some Docker storage drivers don't work
- Overlay filesystem may have issues
- Reduced security isolation
- Complex debugging (two container layers)
## Resource Allocation
### CPU
```bash
# VM config - dedicate cores to Docker host
cores: 4
cpu: host # Pass through CPU features
```
### Memory
```bash
# VM config - allow some overcommit for containers
memory: 8192
balloon: 4096 # Minimum memory
```
### Disk I/O
For I/O intensive containers (databases):
```bash
# VM disk options
cache: none # Direct I/O for consistency
iothread: 1 # Dedicated I/O thread
ssd: 1 # If on SSD storage
```
## GPU Passthrough for Containers
For transcoding (Plex) or ML workloads:
### 1. Proxmox: Pass GPU to VM
```bash
# /etc/pve/qemu-server/<vmid>.conf
hostpci0: 0000:01:00.0,pcie=1
```
### 2. VM: Install NVIDIA Container Toolkit
```bash
# In VM
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update && sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```
### 3. Docker Compose
```yaml
services:
plex:
image: linuxserver/plex
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: 1
capabilities: [gpu]
```
## Backup Strategy
### VM-level (Recommended)
Proxmox vzdump backs up entire Docker host including all containers:
```bash
vzdump <vmid> --mode snapshot --storage backup --compress zstd
```
### Application-level
For consistent database backups, stop or flush before VM backup:
```bash
# Pre-backup hook
docker exec postgres pg_dump -U user db > /backup/db.sql
```
## Monitoring
### From Proxmox
- VM CPU, memory, network, disk via Proxmox UI
- No visibility into individual containers
### From Docker Host
```bash
# Resource usage per container
docker stats
# System-wide
docker system df
```
### Recommended Stack
```yaml
# On Docker host
services:
prometheus:
image: prom/prometheus
cadvisor:
image: gcr.io/cadvisor/cadvisor
grafana:
image: grafana/grafana
```
## Skill References
For Docker-specific patterns:
- `docker/references/compose.md` - Compose file structure
- `docker/references/networking.md` - Network modes
- `docker/references/volumes.md` - Data persistence
- `docker/references/proxmox/hosting.md` - Detailed hosting guide

# Proxmox Networking Reference
## Linux Bridges
Default networking method for Proxmox VMs and containers.
### Bridge Configuration
```
# /etc/network/interfaces example
auto vmbr0
iface vmbr0 inet static
address 192.168.1.10/24
gateway 192.168.1.1
bridge-ports eno1
bridge-stp off
bridge-fd 0
bridge-vlan-aware yes
```
### VLAN-Aware Bridge
Enable VLAN tagging at VM level instead of separate bridges:
- Set `bridge-vlan-aware yes` on bridge
- Configure VLAN tag in VM network config
- Simpler management, fewer bridges needed
### Separate Bridges (Alternative)
One bridge per VLAN:
- vmbr0: Untagged/native VLAN
- vmbr1: VLAN 10
- vmbr5: VLAN 5
More bridges but explicit network separation.
## VLAN Configuration
### At VM Level (VLAN-aware bridge)
```
net0: virtio=XX:XX:XX:XX:XX:XX,bridge=vmbr0,tag=20
```
### At Bridge Level (Separate bridges)
```
net0: virtio=XX:XX:XX:XX:XX:XX,bridge=vmbr20
```
## Firewall
Three levels of firewall rules:
| Level | Scope | Use Case |
|-------|-------|----------|
| Datacenter | Cluster-wide | Default policies |
| Node | Per-node | Node-specific rules |
| VM/Container | Per-VM | Application-specific |
### Default Policy
- Input: DROP (only allow explicit rules)
- Output: ACCEPT
- Enable firewall per VM in Options
### Common Rules
```
# Allow SSH
IN ACCEPT -p tcp --dport 22
# Allow HTTP/HTTPS
IN ACCEPT -p tcp --dport 80
IN ACCEPT -p tcp --dport 443
# Allow ICMP (ping)
IN ACCEPT -p icmp
```
## SDN (Software Defined Networking)
Advanced networking for complex multi-tenant setups.
### Zone Types
| Type | Use Case |
|------|----------|
| Simple | Basic L2 network |
| VLAN | VLAN-based isolation |
| VXLAN | Overlay networking |
| EVPN | BGP-based routing |
### When to Use SDN
- Multi-tenant environments
- Complex routing requirements
- Cross-node L2 networks
- VXLAN overlay needs
For homelab: Standard bridges usually sufficient.
## Network Performance
### Jumbo Frames
Enable on storage network for better throughput:
```
# Set MTU 9000 on bridge
auto vmbr40
iface vmbr40 inet static
mtu 9000
...
```
Requires: All devices in path support jumbo frames.
### VirtIO Multiqueue
Enable parallel network processing for high-throughput VMs:
```
net0: virtio=XX:XX:XX:XX:XX:XX,bridge=vmbr0,queues=4
```
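On Linux guests the extra queues typically need to be enabled with ethtool; a sketch matching the 4-queue config above (interface name is illustrative):
```bash
# Inside the guest
ethtool -L ens18 combined 4
```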
## Troubleshooting
### Check Bridge Status
```bash
brctl show # List bridges and attached interfaces
ip link show vmbr0 # Bridge interface details
bridge vlan show # VLAN configuration
```
### Check VM Network
```bash
qm config <vmid> | grep net # VM network config
ip addr # From inside VM
```
### Common Issues
| Problem | Check |
|---------|-------|
| No connectivity | Bridge exists, interface attached |
| Wrong VLAN | Tag matches switch config |
| Slow network | MTU mismatch, driver type |
| Firewall blocking | Rules, policy, enabled status |

# Proxmox Storage Reference
## Storage Types
### Local Storage
| Type | Features | Use Case |
|------|----------|----------|
| Directory | Simple, any filesystem | Basic storage |
| LVM | Block device, raw performance | Performance |
| LVM-thin | Thin provisioning, snapshots | Efficient space |
| ZFS | Compression, snapshots, high perf | Production |
Limitations: No live migration, single node only.
### Shared Storage
| Type | Features | Use Case |
|------|----------|----------|
| NFS | File-based, simple | Shared access |
| Ceph RBD | Distributed block, HA | Production HA |
| iSCSI | Network block | SAN integration |
| GlusterFS | Distributed file | File sharing |
Benefits: Live migration, HA, shared access.
## Content Types
Configure what each storage can hold:
| Content | Description | File Types |
|---------|-------------|------------|
| images | VM disk images | .raw, .qcow2 |
| iso | ISO images for install | .iso |
| vztmpl | Container templates | .tar.gz |
| backup | Backup files | .vma, .tar |
| rootdir | Container root FS | directories |
| snippets | Cloud-init, hooks | .yaml, scripts |
## Storage Configuration
### Add NFS Storage
```bash
pvesm add nfs <storage-id> \
--server <nfs-server> \
--export <export-path> \
--content images,iso,backup
```
### Add Ceph RBD
```bash
pvesm add rbd <storage-id> \
--monhost <mon1>,<mon2>,<mon3> \
--pool <pool-name> \
--content images,rootdir
```
### Check Storage Status
```bash
pvesm status # All storage status
pvesh get /storage # API query
df -h # Disk space
```
## Disk Formats
| Format | Features | Performance |
|--------|----------|-------------|
| raw | No overhead, full allocation | Fastest |
| qcow2 | Snapshots, thin provisioning | Moderate |
Recommendation: Use `raw` for production, `qcow2` for dev/snapshots.
## Disk Cache Modes
| Mode | Safety | Performance | Use Case |
|------|--------|-------------|----------|
| none | Safe | Good | Default, recommended |
| writeback | Unsafe | Best | Non-critical, battery backup |
| writethrough | Safe | Moderate | Compatibility |
| directsync | Safest | Slow | Critical data |
## Storage Performance
### Enable Discard (TRIM)
For SSD thin provisioning:
```
scsi0: local-lvm:vm-100-disk-0,discard=on
```
### I/O Thread
Dedicated I/O thread per disk:
```
scsi0: local-lvm:vm-100-disk-0,iothread=1
```
### I/O Limits
Throttle bandwidth and IOPS per disk, as options on the disk line:
```
scsi0: local-lvm:vm-100-disk-0,mbps_rd=100,mbps_wr=100,iops_rd=1000,iops_wr=1000
```
## Cloud-Init Storage
Cloud-init configs stored in `snippets` content type:
```bash
# Upload cloud-init files
scp user-data.yaml root@proxmox:/var/lib/vz/snippets/
# Or to named storage
scp user-data.yaml root@proxmox:/mnt/pve/<storage>/snippets/
```
Reference in VM:
```
cicustom: user=<storage>:snippets/user-data.yaml
```
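Attaching the snippet from the CLI, with illustrative IDs:
```bash
qm set 9000 --cicustom "user=local:snippets/user-data.yaml"
```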
## Backup Storage
### Recommended Configuration
- Separate storage for backups
- NFS or dedicated backup server
- Sufficient space for retention policy
### Backup Retention
Configure in Datacenter → Backup:
```
keep-last: 3
keep-daily: 7
keep-weekly: 4
keep-monthly: 6
```
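The same retention can be applied to a one-off backup via `--prune-backups` (VMID and storage name are illustrative):
```bash
vzdump 100 --storage backup \
  --prune-backups keep-last=3,keep-daily=7,keep-weekly=4,keep-monthly=6
```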

# Proxmox Troubleshooting Reference
## Common Errors
| Error | Cause | Solution |
|-------|-------|----------|
| VM won't start | Lock, storage, resources | `qm unlock`, check storage, verify resources |
| Migration failed | No shared storage, resources | Verify shared storage, check target capacity |
| Cluster issues | Quorum, network, time | `pvecm status`, check NTP, network |
| Storage unavailable | Mount failed, network | Check mount, network access |
| High load | Resource contention | Identify bottleneck, rebalance VMs |
| Network issues | Bridge, VLAN, firewall | `brctl show`, check tags, firewall rules |
| Backup failed | Disk space, VM state | Check space, storage access |
| Template not found | Not downloaded | Download from Proxmox repo |
| API errors | Auth, permissions | Check token, user permissions |
## Diagnostic Commands
### Cluster Health
```bash
pvecm status # Quorum and node status
pvecm nodes # List cluster members
systemctl status pve-cluster # Cluster service
systemctl status corosync # Corosync service
```
### Node Health
```bash
pveversion -v # Proxmox version info
uptime # Load and uptime
free -h # Memory usage
df -h # Disk space
top -bn1 | head -20 # Process overview
```
### VM Diagnostics
```bash
qm status <vmid> # VM state
qm config <vmid> # VM configuration
qm showcmd <vmid> # QEMU command line
qm unlock <vmid> # Clear locks
qm monitor <vmid> # QEMU monitor access
```
### Container Diagnostics
```bash
pct status <ctid> # Container state
pct config <ctid> # Container configuration
pct enter <ctid> # Enter container shell
pct unlock <ctid> # Clear locks
```
### Storage Diagnostics
```bash
pvesm status # Storage status
df -h # Disk space
mount | grep -E 'nfs|ceph' # Mounted storage
zpool status # ZFS pool status (if using ZFS)
ceph -s # Ceph status (if using Ceph)
```
### Network Diagnostics
```bash
brctl show # Bridge configuration
ip link # Network interfaces
ip addr # IP addresses
ip route # Routing table
bridge vlan show # VLAN configuration
```
### Log Files
```bash
# Cluster logs
journalctl -u pve-cluster
journalctl -u corosync
# VM/Container logs
journalctl | grep <vmid>
tail -f /var/log/pve/tasks/*
# Firewall logs
journalctl -u pve-firewall
# Web interface logs
journalctl -u pveproxy
```
## Troubleshooting Workflows
### VM Won't Start
1. Check for locks: `qm unlock <vmid>`
2. Verify storage: `pvesm status`
3. Check resources: `free -h`, `df -h`
4. Review config: `qm config <vmid>`
5. Check logs: `journalctl | grep <vmid>`
6. Try manual start: `qm start <vmid> --debug`
### Migration Failure
1. Verify shared storage: `pvesm status`
2. Check target resources: `pvesh get /nodes/<target>/status`
3. Verify network: `ping <target-node>`
4. Check version match: `pveversion` on both nodes
5. Review migration logs
### Cluster Quorum Lost
1. Check status: `pvecm status`
2. Identify online nodes
3. If majority lost, set expected: `pvecm expected <n>`
4. Recover remaining nodes
5. Rejoin lost nodes when available
### Storage Mount Failed
1. Check network: `ping <storage-server>`
2. Verify mount: `mount | grep <storage>`
3. Try manual mount
4. Check permissions on storage server
5. Review `/var/log/syslog`
### High CPU/Memory Usage
1. Identify culprit: `top`, `htop`
2. Check VM resources: `qm monitor <vmid>`, then `info balloon`
3. Review resource allocation across cluster
4. Consider migration or resource limits
## Recovery Procedures
### Remove Failed Node
```bash
# On healthy node
pvecm delnode <failed-node>
# Clean up node-specific configs
rm -rf /etc/pve/nodes/<failed-node>
```
### Force Stop Locked VM
```bash
# Remove lock
qm unlock <vmid>
# If still stuck, find and kill QEMU process
ps aux | grep <vmid>
kill <pid>
# Force cleanup
qm stop <vmid> --skiplock
```
### Recover from Corrupt Config
```bash
# Backup current config
cp /etc/pve/qemu-server/<vmid>.conf /root/<vmid>.conf.bak
# Edit config manually
nano /etc/pve/qemu-server/<vmid>.conf
# Or restore from backup
qmrestore <backup> <vmid>
```
## Health Check Script
```bash
#!/bin/bash
echo "=== Cluster Status ==="
pvecm status
echo -e "\n=== Node Resources ==="
for node in $(pvecm nodes | awk '/^[[:space:]]*[0-9]/ {print $3}'); do
echo "--- $node ---"
pvesh get /nodes/$node/status --output-format yaml | grep -E '^(cpu|memory):'
done
echo -e "\n=== Storage Status ==="
pvesm status
echo -e "\n=== Running VMs ==="
qm list | grep running
echo -e "\n=== Running Containers ==="
pct list | grep running
```

# VM vs LXC Reference
## Decision Matrix
### Use VM (QEMU/KVM) When
- Running Windows or non-Linux OS
- Need full kernel isolation
- Running untrusted workloads
- Complex hardware passthrough needed
- Different kernel version required
- GPU passthrough required
### Use LXC When
- Running Linux services
- Need lightweight, fast startup
- Comfortable with shared kernel
- Want better density/performance
- Simple application containers
- Development environments
## QEMU/KVM VMs
Full hardware virtualization with any OS support.
### Hardware Configuration
| Setting | Options | Recommendation |
|---------|---------|----------------|
| CPU type | host, kvm64, custom | `host` for performance |
| Boot | UEFI, BIOS | UEFI for modern OS |
| Display | VNC, SPICE, NoVNC | NoVNC for web access |
### Storage Controllers
| Type | Performance | Use Case |
|------|-------------|----------|
| VirtIO | Fastest | Linux, Windows with drivers |
| SCSI | Fast | General purpose |
| SATA | Moderate | Compatibility |
| IDE | Slow | Legacy OS |
### Network Adapters
| Type | Performance | Use Case |
|------|-------------|----------|
| VirtIO | Fastest | Linux, Windows with drivers |
| E1000 | Good | Compatibility |
| RTL8139 | Slow | Legacy OS |
### Features
- Snapshots (requires compatible storage)
- Templates for rapid cloning
- Live migration (requires shared storage)
- Hardware passthrough (GPU, USB, PCI)
## LXC Containers
OS-level virtualization with shared kernel.
### Container Types
| Type | Security | Use Case |
|------|----------|----------|
| Unprivileged | Higher (recommended) | Production workloads |
| Privileged | Lower | Docker-in-LXC, NFS mounts |
### Resource Controls
- CPU cores and limits
- Memory hard/soft limits
- Disk I/O throttling
- Network bandwidth limits
### Storage Options
- Bind mounts from host
- Volume storage
- ZFS datasets
### Features
- Fast startup (seconds)
- Lower memory overhead
- Higher density per host
- Templates from Proxmox repo (see the sketch below)
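A minimal end-to-end sketch: download a template and create an unprivileged container (template file name and IDs are illustrative):
```bash
pveam update
pveam available --section system    # List downloadable templates
pveam download local ubuntu-22.04-standard_22.04-1_amd64.tar.zst
pct create 200 local:vztmpl/ubuntu-22.04-standard_22.04-1_amd64.tar.zst \
  --hostname app01 --memory 1024 --cores 2 \
  --net0 name=eth0,bridge=vmbr0,ip=dhcp \
  --unprivileged 1
pct start 200
```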
## Migration Considerations
### VM Migration Requirements
- Shared storage (Ceph, NFS, iSCSI)
- Same CPU architecture
- Compatible Proxmox versions
- Network connectivity between nodes
### LXC Migration Requirements
- Shared storage for live migration
- Same architecture
- Unprivileged preferred for portability

skills/terraform/SKILL.md
---
name: terraform
description: |
Terraform infrastructure-as-code reference for HCL syntax, state management,
module design, and provider configuration. Use when working with Terraform
configurations (.tf files), running terraform commands, troubleshooting state
issues, or designing modules. Includes Telmate Proxmox provider patterns.
Triggers: terraform, tfstate, .tf files, HCL, modules, providers, proxmox_vm_qemu.
---
# Terraform Skill
Infrastructure-as-code reference for Terraform configurations, state management, and provider patterns.
## Quick Reference
```bash
# Core workflow
terraform init # Initialize, download providers
terraform validate # Syntax validation
terraform fmt -recursive # Format HCL files
terraform plan # Preview changes
terraform apply # Apply changes
# Inspection
terraform state list # List resources in state
terraform state show <resource> # Show resource details
terraform graph | dot -Tsvg > graph.svg # Dependency graph
# Debug
TF_LOG=DEBUG terraform plan 2>debug.log
```
## Core Workflow
```
init → validate → fmt → plan → apply
```
1. **init**: Download providers, initialize backend
2. **validate**: Check syntax and configuration validity
3. **fmt**: Ensure consistent formatting
4. **plan**: Preview what will change (review carefully)
5. **apply**: Execute changes
## Reference Files
Load on-demand based on task:
| Topic | File | When to Load |
|-------|------|--------------|
| Proxmox Gotchas | [proxmox/gotchas.md](references/proxmox/gotchas.md) | Critical provider issues, workarounds |
| Proxmox Auth | [proxmox/authentication.md](references/proxmox/authentication.md) | Provider config, API tokens |
| Proxmox VMs | [proxmox/vm-qemu.md](references/proxmox/vm-qemu.md) | proxmox_vm_qemu resource patterns |
| Proxmox Errors | [proxmox/troubleshooting.md](references/proxmox/troubleshooting.md) | Common errors, debugging |
| State | [state-management.md](references/state-management.md) | Backends, locking, operations |
| Modules | [module-design.md](references/module-design.md) | Module patterns, composition |
| Security | [security.md](references/security.md) | Secrets, state security |
| External | [external-resources.md](references/external-resources.md) | Official docs, links |
## Validation Checklist
Before `terraform apply`:
- [ ] `terraform init` completed successfully
- [ ] `terraform validate` passes
- [ ] `terraform fmt` applied
- [ ] `terraform plan` reviewed (check destroy/replace operations)
- [ ] Backend configured correctly (for team environments)
- [ ] State locking enabled (if remote backend)
- [ ] Sensitive variables marked `sensitive = true`
- [ ] Provider versions pinned in `terraform.tf`
- [ ] No secrets in version control
- [ ] Blast radius assessed (what could break?)
## Variable Precedence
(highest to lowest)
1. `-var` and `-var-file` flags, in command-line order (later wins): `terraform apply -var="name=value" -var-file=prod.tfvars`
2. `*.auto.tfvars` files (alphabetically)
3. `terraform.tfvars` file
4. `TF_VAR_*` environment variables
5. Variable defaults in `variables.tf`

# External Resources
Pointers to official documentation and community resources.
## Official HashiCorp Documentation
| Resource | URL | Use For |
|----------|-----|---------|
| Terraform Docs | https://developer.hashicorp.com/terraform/docs | Language reference, CLI commands |
| Terraform Tutorials | https://developer.hashicorp.com/terraform/tutorials | Step-by-step learning paths |
| Language Reference | https://developer.hashicorp.com/terraform/language | HCL syntax, expressions, functions |
| CLI Reference | https://developer.hashicorp.com/terraform/cli | Command options and usage |
| Best Practices | https://developer.hashicorp.com/terraform/cloud-docs/recommended-practices | Official workflow recommendations |
## Terraform Registry
| Resource | URL | Use For |
|----------|-----|---------|
| Provider Registry | https://registry.terraform.io/browse/providers | Find and explore providers |
| Module Registry | https://registry.terraform.io/browse/modules | Pre-built modules |
| Telmate Proxmox | https://registry.terraform.io/providers/Telmate/proxmox/latest/docs | Proxmox provider docs |
| AWS Provider | https://registry.terraform.io/providers/hashicorp/aws/latest/docs | AWS resource reference |
## Proxmox Resources
| Resource | URL | Use For |
|----------|-----|---------|
| Telmate Provider Docs | https://registry.terraform.io/providers/Telmate/proxmox/latest/docs | Resource configuration |
| Telmate GitHub | https://github.com/Telmate/terraform-provider-proxmox | Source, issues, examples |
| Proxmox VE API | https://pve.proxmox.com/pve-docs/api-viewer/ | Understanding API calls |
| Proxmox Wiki | https://pve.proxmox.com/wiki/Main_Page | Proxmox concepts and setup |
## Community Resources
| Resource | URL | Use For |
|----------|-----|---------|
| Terraform Best Practices | https://www.terraform-best-practices.com | Community-maintained guide |
| Awesome Terraform | https://github.com/shuaibiyy/awesome-terraform | Curated list of resources |
| Terraform Weekly | https://www.yourdevopsmentor.com/terraform-weekly | News and updates |
## Learning Resources
| Resource | URL | Use For |
|----------|-----|---------|
| HashiCorp Learn | https://developer.hashicorp.com/terraform/tutorials | Official tutorials |
| Terraform Up & Running | https://www.terraformupandrunning.com/ | Comprehensive book |
## Tools
| Tool | URL | Use For |
|------|-----|---------|
| TFLint | https://github.com/terraform-linters/tflint | Linting and best practices |
| Checkov | https://github.com/bridgecrewio/checkov | Security scanning |
| Infracost | https://github.com/infracost/infracost | Cost estimation |
| Terragrunt | https://terragrunt.gruntwork.io/ | DRY Terraform configurations |
| tfenv | https://github.com/tfutils/tfenv | Terraform version management |
## Quick Links
**Most commonly needed:**
1. **HCL Syntax**: https://developer.hashicorp.com/terraform/language/syntax/configuration
2. **Functions**: https://developer.hashicorp.com/terraform/language/functions
3. **Expressions**: https://developer.hashicorp.com/terraform/language/expressions
4. **Backend Configuration**: https://developer.hashicorp.com/terraform/language/settings/backends
5. **Proxmox VM Resource**: https://registry.terraform.io/providers/Telmate/proxmox/latest/docs/resources/vm_qemu

# Module Design
## Standard Structure
```
modules/<name>/
├── main.tf # Resources
├── variables.tf # Inputs
├── outputs.tf # Outputs
└── versions.tf    # Provider constraints
```
## Module Example
```hcl
# modules/vm/variables.tf
variable "name" {
description = "VM name"
type = string
}
variable "target_node" {
description = "Proxmox node"
type = string
}
variable "specs" {
type = object({
cores = number
memory = number
disk = optional(string, "50G")
})
}
```
```hcl
# modules/vm/main.tf
resource "proxmox_vm_qemu" "vm" {
name = var.name
target_node = var.target_node
cores = var.specs.cores
memory = var.specs.memory
}
```
```hcl
# modules/vm/outputs.tf
output "ip" {
value = proxmox_vm_qemu.vm.default_ipv4_address
}
```
```hcl
# Usage
module "web" {
source = "./modules/vm"
name = "web-01"
target_node = "pve1"
specs = { cores = 4, memory = 8192 }
}
```
## Complex Variable Types
```hcl
# Map of objects
variable "vms" {
type = map(object({
node = string
cores = number
memory = number
}))
}
# Object with optional fields
variable "network" {
type = object({
bridge = string
vlan = optional(number)
ip = optional(string, "dhcp")
})
}
```
## Variable Validation
```hcl
variable "environment" {
type = string
validation {
condition = contains(["dev", "staging", "prod"], var.environment)
error_message = "Must be dev, staging, or prod."
}
}
variable "cores" {
type = number
validation {
condition = var.cores >= 1 && var.cores <= 32
error_message = "Cores must be 1-32."
}
}
```
## Module Composition
```hcl
module "network" {
source = "../../modules/network"
# ...
}
module "web" {
source = "../../modules/vm"
network_id = module.network.id # Implicit dependency
}
module "database" {
source = "../../modules/vm"
depends_on = [module.network] # Explicit dependency
}
```
## for_each vs count
```hcl
# count - index-based (0, 1, 2)
module "worker" {
source = "./modules/vm"
count = 3
name = "worker-${count.index}"
}
# Access: module.worker[0]
# for_each - key-based (preferred)
module "vm" {
source = "./modules/vm"
for_each = var.vms
name = each.key
specs = each.value
}
# Access: module.vm["web"]
```
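A matching `terraform.tfvars` for the `for_each` pattern above (values are illustrative):
```hcl
vms = {
  web = { node = "pve1", cores = 4, memory = 8192 }
  db  = { node = "pve2", cores = 8, memory = 16384 }
}
```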
## Version Constraints
```hcl
# modules/vm/versions.tf
terraform {
required_version = ">= 1.0"
required_providers {
proxmox = {
source = "telmate/proxmox"
version = "~> 3.0"
}
}
}
```
```hcl
# Pin module version
module "vm" {
source = "git::https://github.com/org/modules.git//vm?ref=v2.1.0"
}
```

# Proxmox Provider Authentication
## Provider Configuration
```hcl
terraform {
required_providers {
proxmox = {
source = "telmate/proxmox"
version = "~> 3.0"
}
}
}
provider "proxmox" {
pm_api_url = "https://proxmox.example.com:8006/api2/json"
pm_api_token_id = "terraform@pve!mytoken"
pm_api_token_secret = var.pm_api_token_secret
pm_tls_insecure = false # true for self-signed certs
pm_parallel = 4 # concurrent operations
pm_timeout = 600 # API timeout seconds
}
```
## Create API Token
```bash
pveum user add terraform@pve
pveum aclmod / -user terraform@pve -role PVEAdmin
pveum user token add terraform@pve mytoken
```
## Environment Variables
```bash
export PM_API_TOKEN_ID="terraform@pve!mytoken"
export PM_API_TOKEN_SECRET="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
```
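Alternatively, route the secret through a Terraform variable; a sketch of the declaration the provider block above expects:
```hcl
variable "pm_api_token_secret" {
  type      = string
  sensitive = true
}
```
Then `export TF_VAR_pm_api_token_secret="..."` keeps the value out of the configuration files.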
## Official Resources
- [Provider Docs](https://registry.terraform.io/providers/Telmate/proxmox/latest/docs)
- [GitHub](https://github.com/Telmate/terraform-provider-proxmox)
- [Proxmox API](https://pve.proxmox.com/pve-docs/api-viewer/)

# Proxmox Provider Gotchas
Critical issues when using Telmate Proxmox provider with Terraform.
## 1. Cloud-Init Changes Not Tracked
Terraform does **not** detect changes to cloud-init snippet file contents.
```hcl
# PROBLEM: Changing vendor-data.yml won't trigger replacement
resource "proxmox_vm_qemu" "vm" {
cicustom = "vendor=local:snippets/vendor-data.yml"
}
# SOLUTION: Use replace_triggered_by
resource "local_file" "vendor_data" {
filename = "vendor-data.yml"
content = templatefile("vendor-data.yml.tftpl", { ... })
}
resource "proxmox_vm_qemu" "vm" {
cicustom = "vendor=local:snippets/vendor-data.yml"
lifecycle {
replace_triggered_by = [
local_file.vendor_data.content_base64sha256
]
}
}
```
## 2. Storage Type vs Storage Pool
Different concepts - don't confuse:
```hcl
disks {
scsi {
scsi0 {
disk {
storage = "local-lvm" # Pool NAME (from Proxmox datacenter)
size = "50G"
}
}
}
}
scsihw = "virtio-scsi-single" # Controller TYPE
```
- **Storage pool** = Where data stored (local-lvm, ceph-pool, nfs-share)
- **Disk type** = Interface (scsi, virtio, ide, sata)
## 3. Network Interface Naming
Proxmox VMs get predictable names by device order:
| NIC Order | Guest Name |
|-----------|------------|
| First | ens18 |
| Second | ens19 |
| Third | ens20 |
**NOT** eth0, eth1. Configure cloud-init netplan matching `ens*`.
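A netplan fragment for cloud-init that tolerates this naming, as a sketch:
```yaml
network:
  version: 2
  ethernets:
    lan:                # Arbitrary ID; the match block does the work
      match:
        name: "ens*"
      dhcp4: true
```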
## 4. API Token Expiration
Long operations (20+ VMs) can exceed token lifetime.
```hcl
provider "proxmox" {
pm_api_token_id = "terraform@pve!mytoken"
pm_api_token_secret = var.pm_api_token_secret
pm_timeout = 1200 # 20 minutes for large operations
}
```
Use API tokens (longer-lived) not passwords.
## 5. Full Clone vs Linked Clone
```hcl
full_clone = true # Independent copy - safe, slower, more storage
full_clone = false # References template - BREAKS if template modified
```
**Always use `full_clone = true` for production.** Linked clones only for disposable test VMs.

# Proxmox Troubleshooting
## VM Creation Stuck
```
Timeout waiting for VM to be created
```
**Causes**: Template missing, storage full, network unreachable
**Debug**: Check Proxmox task log in web UI
## Clone Failed
```
VM template not found
```
**Check**: `qm list | grep template-name`
**Causes**: Template doesn't exist, wrong node, permission issue
## SSH Timeout
```
Timeout waiting for SSH
```
**Debug**:
1. VM console in Proxmox UI
2. `cloud-init status` on VM
3. `ip addr` to verify network
**Causes**: Cloud-init failed, network misconfigured, firewall
## State Drift
```
Plan shows changes for unchanged resources
```
**Causes**: Manual changes in Proxmox UI, provider bug
**Fix**:
```bash
terraform refresh
terraform plan # Verify
```
## API Errors
```
500 Internal Server Error
```
**Causes**: Invalid config, resource constraints, API timeout
**Debug**: Check `/var/log/pveproxy/access.log` on Proxmox node
## Permission Denied
```
Permission check failed
```
**Fix**: Verify API token has required permissions:
```bash
pveum acl list
pveum user permissions terraform@pve
```

# proxmox_vm_qemu Resource
## Basic VM from Template
```hcl
resource "proxmox_vm_qemu" "vm" {
name = "my-vm"
target_node = "pve1"
clone = "ubuntu-template"
full_clone = true
cores = 4
sockets = 1
memory = 8192
cpu = "host"
onboot = true
agent = 1 # QEMU guest agent
scsihw = "virtio-scsi-single"
disks {
scsi {
scsi0 {
disk {
storage = "local-lvm"
size = "50G"
}
}
}
}
network {
bridge = "vmbr0"
model = "virtio"
}
# Cloud-init
os_type = "cloud-init"
ciuser = "ubuntu"
sshkeys = var.ssh_public_key
ipconfig0 = "ip=dhcp"
# Static: ipconfig0 = "ip=192.168.1.10/24,gw=192.168.1.1"
# Custom cloud-init
cicustom = "vendor=local:snippets/vendor-data.yml"
}
```
## Lifecycle Management
```hcl
lifecycle {
prevent_destroy = true # Block accidental deletion
ignore_changes = [
network, # Ignore manual changes
]
replace_triggered_by = [
local_file.cloud_init.content_base64sha256
]
create_before_destroy = true # Blue-green deployment
}
```
## Multiple VMs with for_each
```hcl
variable "vms" {
type = map(object({
node = string
cores = number
memory = number
}))
}
resource "proxmox_vm_qemu" "vm" {
for_each = var.vms
name = each.key
target_node = each.value.node
cores = each.value.cores
memory = each.value.memory
# ...
}
```

# Security
## Secrets Management
### Environment Variables (Recommended)
```bash
export TF_VAR_proxmox_password="secret"
export TF_VAR_api_token="xxxxx"
terraform apply
```
### Sensitive Variables
```hcl
variable "database_password" {
type = string
sensitive = true # Hidden in logs/plan
}
```
### External Secrets Managers
**HashiCorp Vault**:
```hcl
data "vault_generic_secret" "db" {
path = "secret/database"
}
resource "some_resource" "x" {
password = data.vault_generic_secret.db.data["password"]
}
```
**1Password CLI**:
```bash
export TF_VAR_password="$(op read 'op://vault/item/password')"
terraform apply
```
## State Security
**CRITICAL**: State contains secrets in plaintext.
### Encrypt at Rest
```hcl
backend "s3" {
encrypt = true
kms_key_id = "arn:aws:kms:..." # Optional KMS
}
```
### Restrict Access
- IAM/RBAC on backend storage
- Enable state locking
- Never commit state to git
## Provider Credentials
```hcl
provider "proxmox" {
pm_api_token_id = "terraform@pve!mytoken"
pm_api_token_secret = var.pm_api_token_secret # From env
}
```
Create minimal-permission API user:
```bash
pveum user add terraform@pve
pveum aclmod / -user terraform@pve -role PVEVMAdmin
pveum user token add terraform@pve terraform-token
```
## Sensitive Outputs
```hcl
output "db_password" {
value = random_password.db.result
sensitive = true
}
```
## Checklist
- [ ] Sensitive vars marked `sensitive = true`
- [ ] Secrets via env vars or secrets manager
- [ ] State backend encryption enabled
- [ ] State locking enabled
- [ ] No credentials in .tf files
- [ ] Provider credentials minimal permissions

# State Management
## Remote Backend (Recommended)
```hcl
terraform {
backend "s3" {
bucket = "terraform-state"
key = "project/terraform.tfstate"
region = "us-east-1"
encrypt = true
dynamodb_table = "terraform-locks" # State locking
}
}
```
### S3-Compatible (MinIO, Ceph)
```hcl
terraform {
backend "s3" {
bucket = "terraform-state"
key = "project/terraform.tfstate"
region = "us-east-1" # Required but ignored
endpoint = "https://minio.example.com"
skip_credentials_validation = true
skip_metadata_api_check = true
skip_region_validation = true
force_path_style = true
}
}
```
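Note: Terraform 1.6+ deprecates the flat endpoint settings above; an equivalent sketch in the newer form (verify option names against your Terraform version):
```hcl
terraform {
  backend "s3" {
    bucket = "terraform-state"
    key    = "project/terraform.tfstate"
    region = "us-east-1"
    endpoints = {
      s3 = "https://minio.example.com"
    }
    use_path_style              = true
    skip_credentials_validation = true
    skip_requesting_account_id  = true
    skip_region_validation      = true
  }
}
```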
## State Operations
```bash
# List resources
terraform state list
terraform state list proxmox_vm_qemu.vm    # Instances of one resource
# Show resource details
terraform state show proxmox_vm_qemu.web
# Rename resource
terraform state mv proxmox_vm_qemu.old proxmox_vm_qemu.new
# Move to module
terraform state mv proxmox_vm_qemu.web module.web.proxmox_vm_qemu.main
# Remove from state (doesn't destroy)
terraform state rm proxmox_vm_qemu.orphaned
# Import existing resource
terraform import proxmox_vm_qemu.web pve1/qemu/100
# Update state from infrastructure (deprecated; prefer: terraform apply -refresh-only)
terraform refresh
```
## State Migration
```bash
# Change backend - updates terraform block, then:
terraform init -migrate-state
# Reinitialize without migration
terraform init -reconfigure
```
## State Locking
Prevents concurrent modifications. Enable via backend config:
- S3: `dynamodb_table`
- Consul: Built-in
- HTTP: `lock_address`
### Force Unlock (Emergency)
```bash
# Only when certain no operation running
terraform force-unlock LOCK_ID
```
## Troubleshooting
### State Lock Timeout
```
Error: Error acquiring state lock
```
1. Wait for other operation
2. Verify no process running
3. `terraform force-unlock LOCK_ID` if safe
### State Drift
```
Plan shows unexpected changes
```
```bash
terraform refresh # Update state from real infra
terraform plan # Review changes
```
### Corrupted State
1. Snapshot what is there first: `terraform state pull > corrupt.tfstate`
2. Restore a known-good backup: `terraform state push backup.tfstate`
3. Last resort: `terraform state rm` and re-import