Initial commit

skills/ansible/SKILL.md (new file, 162 lines)

---
name: ansible
description: |
  Ansible automation reference for playbooks, roles, inventory, variables, and modules.
  Includes Proxmox VE and Docker integration via community.general and community.docker collections.
  Use when writing playbooks, troubleshooting Ansible runs, or designing automation workflows.
  Triggers: ansible, playbook, inventory, role, task, handler, vars, jinja2, galaxy, proxmox_kvm, proxmox_lxc, docker_container, docker_compose.
---

# Ansible Skill

Ansible automation reference for configuration management and application deployment.

## Quick Reference

```bash
# Test connectivity
ansible all -m ping
ansible <group> -m ping

# Run playbook
ansible-playbook playbook.yml
ansible-playbook playbook.yml -l <host>    # Limit to host
ansible-playbook playbook.yml --check      # Dry run
ansible-playbook playbook.yml -vvv         # Verbose

# Tags
ansible-playbook playbook.yml --tags "deploy"
ansible-playbook playbook.yml --skip-tags "backup"
ansible-playbook playbook.yml --list-tags

# Variables
ansible-playbook playbook.yml -e "var=value"
ansible-playbook playbook.yml -e "@vars.yml"

# Ad-hoc commands
ansible <group> -m shell -a "command"
ansible <group> -m copy -a "src=file dest=/path"
ansible <group> -m apt -a "name=package state=present"

# Galaxy
ansible-galaxy collection install -r requirements.yml
ansible-galaxy role install <role>
```

## Reference Files

Load on demand, based on the task:

| Topic | File | When to Load |
|-------|------|--------------|
| Playbook Structure | [playbooks.md](references/playbooks.md) | Writing playbooks |
| Inventory | [inventory.md](references/inventory.md) | Host/group configuration |
| Variables | [variables.md](references/variables.md) | Variable precedence, facts |
| Modules | [modules.md](references/modules.md) | Common module reference |
| Troubleshooting | [troubleshooting.md](references/troubleshooting.md) | Common errors, debugging |

### Proxmox Integration

| Topic | File | When to Load |
|-------|------|--------------|
| Proxmox Modules | [proxmox/modules.md](references/proxmox/modules.md) | VM/LXC management via API |
| Proxmox Auth | [proxmox/authentication.md](references/proxmox/authentication.md) | API tokens, credentials |
| Proxmox Gotchas | [proxmox/gotchas.md](references/proxmox/gotchas.md) | Common issues, workarounds |
| Dynamic Inventory | [proxmox/dynamic-inventory.md](references/proxmox/dynamic-inventory.md) | Auto-discover VMs/containers |

### Docker Integration

| Topic | File | When to Load |
|-------|------|--------------|
| Docker Deployment | [docker/deployment.md](references/docker/deployment.md) | Containers, images, networks, volumes |
| Compose Patterns | [docker/compose-patterns.md](references/docker/compose-patterns.md) | Roles, templates, multi-service stacks |
| Docker Troubleshooting | [docker/troubleshooting.md](references/docker/troubleshooting.md) | Common errors, debugging |

## Playbook Quick Reference

```yaml
---
- name: Deploy application
  hosts: webservers
  become: true
  vars:
    app_port: 8080

  pre_tasks:
    - name: Validate requirements
      ansible.builtin.assert:
        that:
          - app_secret is defined

  tasks:
    - name: Install packages
      ansible.builtin.apt:
        name: "{{ item }}"
        state: present
      loop:
        - nginx
        - python3

    - name: Deploy config
      ansible.builtin.template:
        src: app.conf.j2
        dest: /etc/app/app.conf
      notify: Restart app

  handlers:
    - name: Restart app
      ansible.builtin.service:
        name: app
        state: restarted

  post_tasks:
    - name: Verify deployment
      ansible.builtin.uri:
        url: "http://localhost:{{ app_port }}/health"
```

## Variable Precedence (High to Low)

1. Extra vars (`-e "var=value"`)
2. Task vars
3. Block vars
4. Role/include vars
5. Play vars
6. Host facts
7. host_vars/
8. group_vars/
9. Role defaults
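
As a quick illustration (the variable name and values are hypothetical), the same variable defined at several levels resolves to the highest-precedence definition:

```bash
# roles/app/defaults/main.yml -> app_port: 8080   (lowest precedence)
# group_vars/webservers.yml   -> app_port: 8081
# play vars                   -> app_port: 8082
ansible-playbook playbook.yml -e "app_port=9090"   # tasks see app_port == 9090
```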

## Directory Structure

```text
ansible/
├── ansible.cfg              # Configuration
├── inventory/
│   └── hosts.yml            # Inventory
├── group_vars/
│   ├── all.yml              # All hosts
│   └── webservers.yml       # Group-specific
├── host_vars/
│   └── server1.yml          # Host-specific
├── roles/
│   └── app/
│       ├── tasks/
│       ├── handlers/
│       ├── templates/
│       ├── files/
│       └── defaults/
├── playbooks/
│   └── deploy.yml
├── templates/
│   └── config.j2
└── requirements.yml         # Galaxy dependencies
```

## Idempotency Checklist

- [ ] Tasks produce the same result on repeated runs
- [ ] No `changed_when: true` unless necessary
- [ ] Use `state: present/absent` modules rather than raw `shell` commands (see the sketch below)
- [ ] Check mode (`--check`) shows accurate changes
- [ ] A second run reports every task as "ok" (no changes)
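
A minimal sketch of the `shell`-vs-module point (paths are illustrative): the first task reports "changed" on every run, while the second is skipped once its `creates` target exists.

```yaml
# Non-idempotent: re-extracts and reports "changed" every run
- name: Extract release (shell)
  ansible.builtin.shell: tar xzf /tmp/app.tar.gz -C /opt/app

# Idempotent: skipped once the creates marker exists
- name: Extract release
  ansible.builtin.unarchive:
    src: /tmp/app.tar.gz
    dest: /opt/app
    remote_src: true
    creates: /opt/app/bin/app
```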

skills/ansible/references/docker/compose-patterns.md (new file, 294 lines)

# Ansible Docker Compose Patterns

Common patterns for managing Docker Compose stacks with Ansible.

## Project Structure

```
roles/
└── docker_app/
    ├── tasks/
    │   └── main.yml
    ├── templates/
    │   ├── docker-compose.yml.j2
    │   └── .env.j2
    ├── defaults/
    │   └── main.yml
    └── handlers/
        └── main.yml
```

## Role Template

### defaults/main.yml

```yaml
app_name: myapp
app_version: latest
app_port: 8080
app_data_dir: "/opt/{{ app_name }}"

# Compose settings
compose_pull: always
compose_recreate: auto  # auto, always, never

# Resource limits
app_memory_limit: 512M
app_cpu_limit: 1.0
```

### templates/docker-compose.yml.j2

```yaml
name: {{ app_name }}

services:
  app:
    image: {{ app_image }}:{{ app_version }}
    container_name: {{ app_name }}
    restart: unless-stopped
    ports:
      - "{{ app_port }}:{{ app_internal_port | default(app_port) }}"
    volumes:
      - {{ app_data_dir }}/data:/app/data
{% if app_config_file is defined %}
      - {{ app_data_dir }}/config:/app/config:ro
{% endif %}
    environment:
      TZ: {{ timezone | default('UTC') }}
{% for key, value in app_env.items() %}
      {{ key }}: "{{ value }}"
{% endfor %}
{% if app_memory_limit is defined or app_cpu_limit is defined %}
    deploy:
      resources:
        limits:
{% if app_memory_limit is defined %}
          memory: {{ app_memory_limit }}
{% endif %}
{% if app_cpu_limit is defined %}
          cpus: '{{ app_cpu_limit }}'
{% endif %}
{% endif %}
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:{{ app_internal_port | default(app_port) }}/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    networks:
      - {{ app_network | default('default') }}

{% if app_network is defined %}
networks:
  {{ app_network }}:
    external: true
{% endif %}
```

### tasks/main.yml

```yaml
---
- name: Create application directory
  ansible.builtin.file:
    path: "{{ app_data_dir }}"
    state: directory
    owner: "{{ ansible_user }}"
    group: "{{ ansible_user }}"
    mode: '0755'

- name: Create data directories
  ansible.builtin.file:
    path: "{{ app_data_dir }}/{{ item }}"
    state: directory
    owner: "{{ ansible_user }}"
    mode: '0755'
  loop:
    - data
    - config

- name: Deploy compose file
  ansible.builtin.template:
    src: docker-compose.yml.j2
    dest: "{{ app_data_dir }}/docker-compose.yml"
    owner: "{{ ansible_user }}"
    mode: '0644'
  notify: Redeploy stack

- name: Deploy environment file
  ansible.builtin.template:
    src: .env.j2
    dest: "{{ app_data_dir }}/.env"
    owner: "{{ ansible_user }}"
    mode: '0600'
  notify: Redeploy stack
  when: app_secrets is defined

- name: Ensure stack is running
  community.docker.docker_compose_v2:
    project_src: "{{ app_data_dir }}"
    state: present
    pull: "{{ compose_pull }}"
    recreate: "{{ compose_recreate }}"
  register: compose_result

- name: Show deployment result
  ansible.builtin.debug:
    msg: "Deployed {{ compose_result.containers | length }} containers"
  when: compose_result is changed
```

### handlers/main.yml

```yaml
---
- name: Redeploy stack
  community.docker.docker_compose_v2:
    project_src: "{{ app_data_dir }}"
    state: present
    pull: always
    recreate: always
```

## Multi-Service Stack

### templates/docker-compose.yml.j2 (full stack)

```yaml
name: {{ stack_name }}

services:
  app:
    image: {{ app_image }}:{{ app_version }}
    restart: unless-stopped
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_started
    environment:
      DATABASE_URL: "postgres://{{ db_user }}:{{ db_password }}@db:5432/{{ db_name }}"
      REDIS_URL: "redis://redis:6379"
    networks:
      - internal
      - web

  db:
    image: postgres:15
    restart: unless-stopped
    volumes:
      - db_data:/var/lib/postgresql/data
    environment:
      POSTGRES_USER: {{ db_user }}
      POSTGRES_PASSWORD: {{ db_password }}
      POSTGRES_DB: {{ db_name }}
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U {{ db_user }}"]
      interval: 5s
      timeout: 5s
      retries: 5
    networks:
      - internal

  redis:
    image: redis:7-alpine
    restart: unless-stopped
    volumes:
      - redis_data:/data
    networks:
      - internal

  nginx:
    image: nginx:alpine
    restart: unless-stopped
    ports:
      - "{{ http_port | default(80) }}:80"
      - "{{ https_port | default(443) }}:443"
    volumes:
      - {{ app_data_dir }}/nginx/conf.d:/etc/nginx/conf.d:ro
      - {{ app_data_dir }}/nginx/ssl:/etc/nginx/ssl:ro
    depends_on:
      - app
    networks:
      - web

networks:
  internal:
    driver: bridge
  web:
    driver: bridge

volumes:
  db_data:
  redis_data:
```

## Zero-Downtime Update

```yaml
- name: Zero-downtime update
  hosts: docker_hosts
  serial: 1  # One host at a time
  tasks:
    - name: Pull new image
      community.docker.docker_image:
        name: "{{ app_image }}"
        tag: "{{ app_version }}"
        source: pull

    - name: Drain connections (if load balanced)
      # ... remove from load balancer ...

    - name: Update stack
      community.docker.docker_compose_v2:
        project_src: "{{ app_data_dir }}"
        state: present
        recreate: always

    - name: Wait for health
      ansible.builtin.uri:
        url: "http://localhost:{{ app_port }}/health"
        status_code: 200
      register: health
      until: health.status == 200
      retries: 30
      delay: 2

    - name: Restore to load balancer
      # ... add back to load balancer ...
```

## Secrets Management

### With ansible-vault

```yaml
# group_vars/secrets.yml (encrypted)
app_secrets:
  DB_PASSWORD: supersecret
  API_KEY: abc123
  JWT_SECRET: longsecret
```

```yaml
# templates/.env.j2
{% for key, value in app_secrets.items() %}
{{ key }}={{ value }}
{% endfor %}
```

### With external secrets

```yaml
- name: Fetch secret from 1Password
  ansible.builtin.set_fact:
    db_password: "{{ lookup('community.general.onepassword', 'database', field='password') }}"

- name: Deploy with secret
  community.docker.docker_compose_v2:
    project_src: "{{ app_data_dir }}"
    env_files:
      - "{{ app_data_dir }}/.env"
    state: present
```

skills/ansible/references/docker/deployment.md (new file, 307 lines)

# Docker Deployment with Ansible

Managing Docker containers and compose stacks via Ansible.

## Collection Setup

```bash
ansible-galaxy collection install community.docker
```

## Compose Deployment (Recommended)

### Deploy from local compose file

```yaml
- name: Deploy application stack
  hosts: docker_hosts
  become: true
  tasks:
    - name: Create project directory
      ansible.builtin.file:
        path: /opt/myapp
        state: directory
        owner: "{{ ansible_user }}"
        mode: '0755'

    - name: Copy compose file
      ansible.builtin.template:
        src: docker-compose.yml.j2
        dest: /opt/myapp/docker-compose.yml
        owner: "{{ ansible_user }}"
        mode: '0644'

    - name: Copy environment file
      ansible.builtin.template:
        src: .env.j2
        dest: /opt/myapp/.env
        owner: "{{ ansible_user }}"
        mode: '0600'

    - name: Deploy with compose
      community.docker.docker_compose_v2:
        project_src: /opt/myapp
        state: present
        pull: always
      register: deploy_result

    - name: Show deployed services
      ansible.builtin.debug:
        var: deploy_result.containers
```

### Compose operations

```yaml
# Pull latest images and recreate
- name: Update stack
  community.docker.docker_compose_v2:
    project_src: /opt/myapp
    state: present
    pull: always
    recreate: always

# Stop stack (keep volumes)
- name: Stop stack
  community.docker.docker_compose_v2:
    project_src: /opt/myapp
    state: stopped

# Remove stack
- name: Remove stack
  community.docker.docker_compose_v2:
    project_src: /opt/myapp
    state: absent
    remove_volumes: false  # Keep data volumes
```

## Container Deployment (Individual)

### Run container

```yaml
- name: Run nginx container
  community.docker.docker_container:
    name: nginx
    image: nginx:1.25
    state: started
    restart_policy: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - /opt/nginx/html:/usr/share/nginx/html:ro
      - /opt/nginx/conf.d:/etc/nginx/conf.d:ro
    env:
      TZ: "America/Los_Angeles"
    labels:
      app: web
      env: production

- name: Run database
  community.docker.docker_container:
    name: postgres
    image: postgres:15
    state: started
    restart_policy: unless-stopped
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data
    env:
      POSTGRES_USER: "{{ db_user }}"
      POSTGRES_PASSWORD: "{{ db_password }}"
      POSTGRES_DB: "{{ db_name }}"
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U {{ db_user }}"]
      interval: 10s
      timeout: 5s
      retries: 5
```

### Container lifecycle

```yaml
# Stop container
- name: Stop container
  community.docker.docker_container:
    name: myapp
    state: stopped

# Restart container
- name: Restart container
  community.docker.docker_container:
    name: myapp
    state: started
    restart: true

# Remove container
- name: Remove container
  community.docker.docker_container:
    name: myapp
    state: absent

# Force recreate
- name: Recreate container
  community.docker.docker_container:
    name: myapp
    image: myapp:latest
    state: started
    recreate: true
```

## Image Management

```yaml
# Pull image
- name: Pull latest image
  community.docker.docker_image:
    name: myapp
    tag: latest
    source: pull
    force_source: true  # Always check for updates

# Build from Dockerfile
- name: Build image
  community.docker.docker_image:
    name: myapp
    tag: "{{ version }}"
    source: build
    build:
      path: /opt/myapp
      dockerfile: Dockerfile
      pull: true  # Pull base image updates

# Remove image
- name: Remove old images
  community.docker.docker_image:
    name: myapp
    tag: old
    state: absent
```

## Network Management

```yaml
# Create network
- name: Create app network
  community.docker.docker_network:
    name: app_network
    driver: bridge
    ipam_config:
      - subnet: 172.20.0.0/16
        gateway: 172.20.0.1

# Create macvlan network
- name: Create macvlan network
  community.docker.docker_network:
    name: lan
    driver: macvlan
    driver_options:
      parent: eth0
    ipam_config:
      - subnet: 192.168.1.0/24
        gateway: 192.168.1.1

# Attach container to network
- name: Run container on network
  community.docker.docker_container:
    name: myapp
    image: myapp:latest
    networks:
      - name: app_network
        ipv4_address: 172.20.0.10
```

## Volume Management

```yaml
# Create named volume
- name: Create data volume
  community.docker.docker_volume:
    name: app_data
    driver: local

# Create volume with options
- name: Create NFS volume
  community.docker.docker_volume:
    name: shared_data
    driver: local
    driver_options:
      type: nfs
      device: ":/exports/data"
      o: "addr=192.168.1.10,rw"

# Backup volume
- name: Backup volume
  community.docker.docker_container:
    name: backup
    image: alpine
    command: tar czf /backup/data.tar.gz /data
    volumes:
      - app_data:/data:ro
      - /opt/backups:/backup
    auto_remove: true
```

## Common Patterns

### Wait for service health

```yaml
- name: Deploy database
  community.docker.docker_container:
    name: postgres
    image: postgres:15
    # ... config ...

- name: Wait for database
  community.docker.docker_container_info:
    name: postgres
  register: db_info
  until: db_info.container.State.Health.Status == "healthy"
  retries: 30
  delay: 2
```

### Rolling update

```yaml
- name: Pull new image
  community.docker.docker_image:
    name: myapp
    tag: "{{ new_version }}"
    source: pull

- name: Update container
  community.docker.docker_container:
    name: myapp
    image: "myapp:{{ new_version }}"
    state: started
    recreate: true
    restart_policy: unless-stopped
```

### Cleanup

```yaml
- name: Remove stopped containers
  community.docker.docker_prune:
    containers: true
    containers_filters:
      status: exited

- name: Remove unused images
  community.docker.docker_prune:
    images: true
    images_filters:
      dangling: true

- name: Full cleanup (careful!)
  community.docker.docker_prune:
    containers: true
    images: true
    networks: true
    volumes: false  # Don't remove data!
    builder_cache: true
```

skills/ansible/references/docker/troubleshooting.md (new file, 292 lines)

# Ansible Docker Troubleshooting

Common issues and debugging patterns.

## Module Issues

### "Could not find docker-compose"

```yaml
# docker_compose_v2 requires Docker Compose V2 (plugin)
# NOT the standalone docker-compose binary

# Check on the target host:
#   docker compose version    # V2 (plugin)
#   docker-compose version    # V1 (standalone) - won't work
```

Fix: Install Docker Compose V2:
```yaml
- name: Install Docker Compose plugin
  ansible.builtin.apt:
    name: docker-compose-plugin
    state: present
```

### "Permission denied"

```yaml
# User not in docker group
- name: Add user to docker group
  ansible.builtin.user:
    name: "{{ ansible_user }}"
    groups: docker
    append: true
  become: true

# Then reconnect or use become
- name: Run with become
  community.docker.docker_container:
    name: myapp
    # ...
  become: true
```
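
Group membership only applies to new sessions, so after adding the user you can force Ansible to reconnect before the next Docker task. A minimal sketch using the built-in `meta: reset_connection` action:

```yaml
- name: Add user to docker group
  ansible.builtin.user:
    name: "{{ ansible_user }}"
    groups: docker
    append: true
  become: true

- name: Reset the SSH connection so the new group membership takes effect
  ansible.builtin.meta: reset_connection
```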

### "Cannot connect to Docker daemon"

```yaml
# Docker not running
- name: Ensure Docker is running
  ansible.builtin.service:
    name: docker
    state: started
    enabled: true
  become: true

# Socket permission issue
# Add become: true to docker tasks
```

## Container Issues

### Get container logs

```yaml
- name: Get logs
  community.docker.docker_container_exec:
    container: myapp
    command: cat /var/log/app.log
  register: logs
  ignore_errors: true

- name: Alternative - docker logs
  ansible.builtin.command: docker logs --tail 100 myapp
  register: docker_logs
  changed_when: false

- name: Show logs
  ansible.builtin.debug:
    var: docker_logs.stdout_lines
```

### Container keeps restarting

```yaml
- name: Get container info
  community.docker.docker_container_info:
    name: myapp
  register: container_info

- name: Show restart count
  ansible.builtin.debug:
    msg: "Restart count: {{ container_info.container.RestartCount }}"

- name: Show last exit code
  ansible.builtin.debug:
    msg: "Exit code: {{ container_info.container.State.ExitCode }}"

- name: Get logs from dead container
  ansible.builtin.command: docker logs myapp
  register: crash_logs
  changed_when: false

- name: Show crash logs
  ansible.builtin.debug:
    var: crash_logs.stderr_lines
```

### Health check failing

```yaml
- name: Check health status
  community.docker.docker_container_info:
    name: myapp
  register: info

- name: Show health
  ansible.builtin.debug:
    msg: |
      Status: {{ info.container.State.Health.Status }}
      Failing: {{ info.container.State.Health.FailingStreak }}
      Log: {{ info.container.State.Health.Log | last }}

# Manual health check
- name: Test health endpoint
  ansible.builtin.command: >
    docker exec myapp curl -f http://localhost:8080/health
  register: health
  ignore_errors: true
  changed_when: false
```

## Network Issues

### Container can't reach external network

```yaml
- name: Test DNS from container
  ansible.builtin.command: docker exec myapp nslookup google.com
  register: dns_test
  changed_when: false
  ignore_errors: true

- name: Test connectivity
  ansible.builtin.command: docker exec myapp ping -c 1 8.8.8.8
  register: ping_test
  changed_when: false
  ignore_errors: true

# Check iptables
- name: Check IP forwarding
  ansible.builtin.command: sysctl net.ipv4.ip_forward
  register: ip_forward
  changed_when: false

- name: Enable IP forwarding
  ansible.posix.sysctl:
    name: net.ipv4.ip_forward
    value: '1'
    state: present
  become: true
  when: "'0' in ip_forward.stdout"
```

### Containers can't communicate

```yaml
- name: List networks
  community.docker.docker_network_info:
    name: "{{ network_name }}"
  register: network_info

- name: Show connected containers
  ansible.builtin.debug:
    var: network_info.network.Containers

# Verify both containers are on the same network
- name: Test inter-container connectivity
  ansible.builtin.command: >
    docker exec app ping -c 1 db
  register: ping_result
  changed_when: false
```

## Compose Issues

### Services not starting in order

```yaml
# depends_on only waits for container start, not readiness
# Use healthcheck + condition

# In compose template:
services:
  app:
    depends_on:
      db:
        condition: service_healthy  # Wait for health check

  db:
    healthcheck:
      test: ["CMD-SHELL", "pg_isready"]
      interval: 5s
      timeout: 5s
      retries: 5
```

### Orphaned containers

```yaml
# Containers from old compose runs
- name: Remove orphans
  community.docker.docker_compose_v2:
    project_src: /opt/myapp
    state: present
    remove_orphans: true
```

### Volume data not persisting

```yaml
# Check volume exists
- name: List volumes
  ansible.builtin.command: docker volume ls
  register: volumes
  changed_when: false

# Check volume contents
- name: Inspect volume
  ansible.builtin.command: docker volume inspect myapp_data
  register: volume_info
  changed_when: false

- name: Show volume mountpoint
  ansible.builtin.debug:
    msg: "{{ (volume_info.stdout | from_json)[0].Mountpoint }}"
```

## Debug Playbook

```yaml
---
- name: Docker debug
  hosts: docker_hosts
  tasks:
    - name: Docker version
      ansible.builtin.command: docker version
      register: docker_version
      changed_when: false

    - name: Compose version
      ansible.builtin.command: docker compose version
      register: compose_version
      changed_when: false

    - name: List containers
      ansible.builtin.command: docker ps -a
      register: containers
      changed_when: false

    - name: List images
      ansible.builtin.command: docker images
      register: images
      changed_when: false

    - name: Disk usage
      ansible.builtin.command: docker system df
      register: disk
      changed_when: false

    - name: Show all
      ansible.builtin.debug:
        msg: |
          Docker: {{ docker_version.stdout_lines[0] }}
          Compose: {{ compose_version.stdout }}
          Containers:
          {{ containers.stdout }}
          Images:
          {{ images.stdout }}
          Disk:
          {{ disk.stdout }}
```

## Common Error Reference

| Error | Cause | Fix |
|-------|-------|-----|
| `docker.errors.DockerException` | Docker not running | Start docker service |
| `docker.errors.APIError: 404` | Container/image not found | Check name/tag |
| `docker.errors.APIError: 409` | Container name conflict | Remove or rename |
| `PermissionError` | Not in docker group | Add user or use become |
| `requests.exceptions.ConnectionError` | Docker socket inaccessible | Check socket permissions |
| `FileNotFoundError: docker-compose` | V1 compose not installed | Use docker_compose_v2 |

skills/ansible/references/inventory.md (new file, 181 lines)

# Ansible Inventory Reference

## YAML Inventory Format

```yaml
all:
  children:
    webservers:
      hosts:
        web1:
          ansible_host: 192.168.1.10
        web2:
          ansible_host: 192.168.1.11
      vars:
        http_port: 80

    databases:
      hosts:
        db1:
          ansible_host: 192.168.1.20
          db_port: 5432
        db2:
          ansible_host: 192.168.1.21

    production:
      children:
        webservers:
        databases:

  vars:
    ansible_user: ubuntu
    ansible_ssh_private_key_file: ~/.ssh/id_rsa
```

## INI Inventory Format

```ini
[webservers]
web1 ansible_host=192.168.1.10
web2 ansible_host=192.168.1.11

[webservers:vars]
http_port=80

[databases]
db1 ansible_host=192.168.1.20 db_port=5432
db2 ansible_host=192.168.1.21

[production:children]
webservers
databases

[all:vars]
ansible_user=ubuntu
```

## Host Variables

Common host variables:

| Variable | Purpose |
|----------|---------|
| `ansible_host` | IP or hostname to connect |
| `ansible_port` | SSH port (default: 22) |
| `ansible_user` | SSH username |
| `ansible_ssh_private_key_file` | SSH key path |
| `ansible_become` | Enable sudo |
| `ansible_become_user` | Sudo target user |
| `ansible_python_interpreter` | Python path |
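
For example, a single host entry combining several of these variables (hypothetical host `web3`, illustrative values):

```yaml
web3:
  ansible_host: 192.168.1.12
  ansible_port: 2222
  ansible_user: deploy
  ansible_become: true
  ansible_python_interpreter: /usr/bin/python3
```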

## Group Variables

```yaml
# group_vars/webservers.yml
http_port: 80
document_root: /var/www/html

# group_vars/all.yml
ntp_server: time.example.com
dns_servers:
  - 8.8.8.8
  - 8.8.4.4
```

## Host Variables Files

```yaml
# host_vars/web1.yml
site_name: production-web1
ssl_cert_path: /etc/ssl/certs/web1.crt
```

## Dynamic Groups

```yaml
# In playbook
- hosts: "{{ target_group | default('all') }}"
```

Run with:
```bash
ansible-playbook playbook.yml -e "target_group=webservers"
```

## Patterns

```bash
# All hosts
ansible all -m ping

# Single host
ansible web1 -m ping

# Group
ansible webservers -m ping

# Multiple groups
ansible 'webservers:databases' -m ping

# Intersection (AND)
ansible 'webservers:&production' -m ping

# Exclusion
ansible 'webservers:!web1' -m ping

# Regex
ansible '~web[0-9]+' -m ping
```

## Limit

```bash
# Limit to specific hosts
ansible-playbook playbook.yml -l web1
ansible-playbook playbook.yml --limit web1,web2
ansible-playbook playbook.yml --limit 'webservers:!web3'
```

## Inventory Check

```bash
# List hosts
ansible-inventory --list
ansible-inventory --graph

# Host info
ansible-inventory --host web1

# Validate
ansible all --list-hosts
```

## Multiple Inventories

```bash
# Multiple files
ansible-playbook -i inventory/production -i inventory/staging playbook.yml

# Directory of inventories
ansible-playbook -i inventory/ playbook.yml
```

## Special Groups

| Group | Contains |
|-------|----------|
| `all` | All hosts |
| `ungrouped` | Hosts not in any group |

## Local Connection

```yaml
localhost:
  ansible_host: 127.0.0.1
  ansible_connection: local
```

Or in inventory:
```ini
localhost ansible_connection=local
```

skills/ansible/references/modules.md (new file, 341 lines)

# Ansible Modules Reference

## File Operations

### copy

```yaml
- name: Copy file
  ansible.builtin.copy:
    src: files/config.conf
    dest: /etc/app/config.conf
    owner: root
    group: root
    mode: '0644'
    backup: true
```

### template

```yaml
- name: Template config
  ansible.builtin.template:
    src: templates/config.j2
    dest: /etc/app/config.conf
    owner: root
    group: root
    mode: '0644'
  notify: Restart app
```

### file

```yaml
# Create directory
- name: Create directory
  ansible.builtin.file:
    path: /opt/app
    state: directory
    owner: app
    group: app
    mode: '0755'

# Create symlink
- name: Create symlink
  ansible.builtin.file:
    src: /opt/app/current
    dest: /opt/app/release
    state: link

# Delete file
- name: Remove file
  ansible.builtin.file:
    path: /tmp/old-file
    state: absent
```

### lineinfile

```yaml
- name: Ensure line in file
  ansible.builtin.lineinfile:
    path: /etc/hosts
    line: "192.168.1.10 myhost"
    state: present

- name: Replace line
  ansible.builtin.lineinfile:
    path: /etc/config
    regexp: '^PORT='
    line: 'PORT=8080'
```

## Package Management

### apt (Debian/Ubuntu)

```yaml
- name: Install package
  ansible.builtin.apt:
    name: nginx
    state: present
    update_cache: true

- name: Install multiple
  ansible.builtin.apt:
    name:
      - nginx
      - python3
    state: present

- name: Remove package
  ansible.builtin.apt:
    name: nginx
    state: absent
```

### package (Generic)

```yaml
- name: Install package
  ansible.builtin.package:
    name: httpd
    state: present
```

## Service Management

### service

```yaml
- name: Start and enable
  ansible.builtin.service:
    name: nginx
    state: started
    enabled: true

- name: Restart
  ansible.builtin.service:
    name: nginx
    state: restarted

- name: Reload
  ansible.builtin.service:
    name: nginx
    state: reloaded
```

### systemd

```yaml
- name: Daemon reload
  ansible.builtin.systemd:
    daemon_reload: true

- name: Enable and start
  ansible.builtin.systemd:
    name: myapp
    state: started
    enabled: true
```

## Command Execution

### command

```yaml
- name: Run command
  ansible.builtin.command: /bin/mycommand arg1 arg2
  register: result
  changed_when: "'changed' in result.stdout"
```

### shell

```yaml
- name: Run shell command
  ansible.builtin.shell: |
    cd /opt/app
    ./setup.sh && ./configure.sh
  args:
    executable: /bin/bash
```

### script

```yaml
- name: Run local script on remote
  ansible.builtin.script: scripts/setup.sh
  args:
    creates: /opt/app/.installed
```

## User Management

### user

```yaml
- name: Create user
  ansible.builtin.user:
    name: appuser
    groups: docker,sudo
    shell: /bin/bash
    create_home: true
    state: present

- name: Remove user
  ansible.builtin.user:
    name: olduser
    state: absent
    remove: true
```

### group

```yaml
- name: Create group
  ansible.builtin.group:
    name: appgroup
    state: present
```

## Docker (community.docker)

### docker_container

```yaml
- name: Run container
  community.docker.docker_container:
    name: myapp
    image: myapp:latest
    state: started
    restart_policy: unless-stopped
    ports:
      - "8080:80"
    volumes:
      - /data:/app/data
    env:
      DB_HOST: database
```

### docker_compose_v2

```yaml
- name: Deploy with compose
  community.docker.docker_compose_v2:
    project_src: /opt/app
    project_name: myapp
    state: present
    pull: always
    env_files:
      - /opt/app/.env
```

### docker_image

```yaml
- name: Pull image
  community.docker.docker_image:
    name: nginx
    tag: "1.25"
    source: pull
```

## Networking

### uri

```yaml
- name: API call
  ansible.builtin.uri:
    url: "http://localhost:8080/api/health"
    method: GET
    return_content: true
  register: response

- name: POST request
  ansible.builtin.uri:
    url: "http://api.example.com/data"
    method: POST
    body_format: json
    body:
      key: value
```

### wait_for

```yaml
- name: Wait for port
  ansible.builtin.wait_for:
    host: localhost
    port: 8080
    timeout: 300

- name: Wait for file
  ansible.builtin.wait_for:
    path: /var/log/app.log
    search_regex: "Server started"
```

## Debug/Assert

### debug

```yaml
- name: Print variable
  ansible.builtin.debug:
    msg: "Value: {{ my_var }}"

- name: Print var directly
  ansible.builtin.debug:
    var: my_var
```

### assert

```yaml
- name: Validate conditions
  ansible.builtin.assert:
    that:
      - my_var is defined
      - my_var | length > 0
    fail_msg: "my_var must be defined and non-empty"
    success_msg: "Validation passed"
```

### fail

```yaml
- name: Fail with message
  ansible.builtin.fail:
    msg: "Required condition not met"
  when: condition
```

## Misc

### pause

```yaml
- name: Wait 10 seconds
  ansible.builtin.pause:
    seconds: 10

- name: Wait for user
  ansible.builtin.pause:
    prompt: "Press enter to continue"
```

### stat

```yaml
- name: Check file exists
  ansible.builtin.stat:
    path: /etc/config
  register: config_file

- name: Use result
  ansible.builtin.debug:
    msg: "File exists"
  when: config_file.stat.exists
```

skills/ansible/references/playbooks.md (new file, 243 lines)

# Ansible Playbook Reference

## Basic Structure

```yaml
---
- name: Playbook description
  hosts: target_group
  become: true         # Run as root
  gather_facts: true   # Collect system info

  vars:
    my_var: value

  vars_files:
    - vars/secrets.yml

  pre_tasks:
    - name: Pre-task
      ansible.builtin.debug:
        msg: "Running before main tasks"

  roles:
    - role_name

  tasks:
    - name: Main task
      ansible.builtin.debug:
        msg: "Main task"

  handlers:
    - name: Handler name
      ansible.builtin.service:
        name: service
        state: restarted

  post_tasks:
    - name: Post-task
      ansible.builtin.debug:
        msg: "Running after main tasks"
```

## Task Options

```yaml
tasks:
  - name: Task with common options
    ansible.builtin.command: /bin/command
    become: true                  # Privilege escalation
    become_user: www-data         # Run as specific user
    when: condition               # Conditional execution
    register: result              # Store output
    ignore_errors: true           # Continue on failure
    changed_when: false           # Override change detection
    failed_when: result.rc != 0   # Custom failure condition
    tags:
      - deploy
      - config
    notify: Handler name          # Trigger handler
```

## Loops

```yaml
# Simple loop
- name: Install packages
  ansible.builtin.apt:
    name: "{{ item }}"
    state: present
  loop:
    - nginx
    - python3

# Loop with dict
- name: Create users
  ansible.builtin.user:
    name: "{{ item.name }}"
    groups: "{{ item.groups }}"
  loop:
    - { name: 'user1', groups: 'admin' }
    - { name: 'user2', groups: 'users' }

# Loop over dict
- name: Process items
  ansible.builtin.debug:
    msg: "{{ item.key }}: {{ item.value }}"
  loop: "{{ my_dict | dict2items }}"

# Loop with index
- name: With index
  ansible.builtin.debug:
    msg: "{{ index }}: {{ item }}"
  loop: "{{ my_list }}"
  loop_control:
    index_var: index
```

## Conditionals

```yaml
# Simple when
- name: Only on Ubuntu
  ansible.builtin.apt:
    name: package
  when: ansible_distribution == "Ubuntu"

# Multiple conditions
- name: Complex condition
  ansible.builtin.command: /bin/something
  when:
    - ansible_os_family == "Debian"
    - ansible_distribution_version is version('20.04', '>=')

# Or conditions
- name: Or condition
  ansible.builtin.command: /bin/something
  when: condition1 or condition2

# Check variable
- name: If defined
  ansible.builtin.debug:
    msg: "{{ my_var }}"
  when: my_var is defined
```

## Blocks

```yaml
- name: Block example
  block:
    - name: Task 1
      ansible.builtin.command: /bin/task1

    - name: Task 2
      ansible.builtin.command: /bin/task2

  rescue:
    - name: Handle failure
      ansible.builtin.debug:
        msg: "Block failed"

  always:
    - name: Always run
      ansible.builtin.debug:
        msg: "Cleanup"
```

## Handlers

```yaml
tasks:
  - name: Update config
    ansible.builtin.template:
      src: config.j2
      dest: /etc/app/config
    notify:
      - Restart service
      - Reload config

handlers:
  - name: Restart service
    ansible.builtin.service:
      name: app
      state: restarted

  - name: Reload config
    ansible.builtin.service:
      name: app
      state: reloaded
```

Handlers run once, at the end of the play, even if they are notified multiple times. To run them earlier, flush them explicitly, as shown below.
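
A minimal sketch: `ansible.builtin.meta: flush_handlers` forces any pending handlers to run at that point in the task list instead of waiting for the end of the play (the health-check command here is illustrative).

```yaml
tasks:
  - name: Update config
    ansible.builtin.template:
      src: config.j2
      dest: /etc/app/config
    notify: Restart service

  - name: Run notified handlers now, before continuing
    ansible.builtin.meta: flush_handlers

  - name: Tasks after this point see the restarted service
    ansible.builtin.command: /bin/healthcheck
```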

## Including Tasks

```yaml
# Include tasks file
- name: Include tasks
  ansible.builtin.include_tasks: tasks/setup.yml

# Import tasks (static)
- name: Import tasks
  ansible.builtin.import_tasks: tasks/setup.yml

# Include with variables
- name: Include with vars
  ansible.builtin.include_tasks: tasks/deploy.yml
  vars:
    environment: production
```

## Tags

```yaml
tasks:
  - name: Tagged task
    ansible.builtin.command: /bin/command
    tags:
      - deploy
      - always   # Always runs regardless of tag selection

  - name: Never runs by default
    ansible.builtin.command: /bin/command
    tags: never  # Only runs when explicitly tagged
```

Run with tags:
```bash
ansible-playbook playbook.yml --tags "deploy"
ansible-playbook playbook.yml --skip-tags "slow"
```

## Check Mode

```yaml
# Force check mode behavior per task
- name: Run for real even under --check
  ansible.builtin.command: /bin/command
  check_mode: false   # Executes even when the play runs with --check

- name: Always treat as a dry run
  ansible.builtin.command: /bin/command
  check_mode: true    # Runs in check mode even on a normal run (never makes changes)
```

## Delegation

```yaml
# Run on different host
- name: Update load balancer
  ansible.builtin.command: /bin/update-lb
  delegate_to: loadbalancer

# Run locally
- name: Local action
  ansible.builtin.command: /bin/local-command
  delegate_to: localhost

# Run once for all hosts
- name: Single execution
  ansible.builtin.command: /bin/command
  run_once: true
```

skills/ansible/references/proxmox/authentication.md (new file, 155 lines)

# Ansible Proxmox Authentication

## API Token Setup

Create a dedicated Ansible user and API token on Proxmox:

```bash
# On Proxmox node
pveum user add ansible@pve
pveum aclmod / -user ansible@pve -role PVEAdmin
pveum user token add ansible@pve mytoken --privsep 0
```

**Note:** `--privsep 0` gives the token the same permissions as the user.

## Playbook Variables

### Direct in playbook (NOT recommended)

```yaml
vars:
  proxmox_api_host: proxmox.example.com
  proxmox_api_user: ansible@pve
  proxmox_api_token_id: mytoken
  proxmox_api_token_secret: "{{ vault_proxmox_token }}"
```

### Group vars with vault

```yaml
# group_vars/all.yml
proxmox_api_host: proxmox.example.com
proxmox_api_user: ansible@pve
proxmox_api_token_id: mytoken

# group_vars/secrets.yml (ansible-vault encrypted)
proxmox_api_token_secret: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
```

### Environment variables

```bash
export PROXMOX_HOST=proxmox.example.com
export PROXMOX_USER=ansible@pve
export PROXMOX_TOKEN_ID=mytoken
export PROXMOX_TOKEN_SECRET=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
```

```yaml
# In playbook
vars:
  proxmox_api_host: "{{ lookup('env', 'PROXMOX_HOST') }}"
  proxmox_api_user: "{{ lookup('env', 'PROXMOX_USER') }}"
  proxmox_api_token_id: "{{ lookup('env', 'PROXMOX_TOKEN_ID') }}"
  proxmox_api_token_secret: "{{ lookup('env', 'PROXMOX_TOKEN_SECRET') }}"
```

## Reusable Auth Block

Define once, reuse across tasks:

```yaml
vars:
  proxmox_auth: &proxmox_auth
    api_host: "{{ proxmox_api_host }}"
    api_user: "{{ proxmox_api_user }}"
    api_token_id: "{{ proxmox_api_token_id }}"
    api_token_secret: "{{ proxmox_api_token_secret }}"
    validate_certs: false  # For self-signed certs

tasks:
  - name: Create VM
    community.general.proxmox_kvm:
      <<: *proxmox_auth
      node: joseph
      vmid: 300
      name: myvm
      state: present

  - name: Start VM
    community.general.proxmox_kvm:
      <<: *proxmox_auth
      vmid: 300
      state: started
```

## TLS Certificate Handling

### Self-signed certificates

```yaml
community.general.proxmox_kvm:
  # ... auth params ...
  validate_certs: false
```

### Custom CA

```bash
export SSL_CERT_FILE=/path/to/ca-bundle.crt
```

Or in ansible.cfg:

```ini
[defaults]
# For urllib3/requests
ca_cert = /path/to/ca-bundle.crt
```

## Minimum Required Permissions

For full VM/container management:

| Permission | Path | Purpose |
|------------|------|---------|
| VM.Allocate | / | Create VMs |
| VM.Clone | / | Clone templates |
| VM.Config.* | / | Modify VM config |
| VM.PowerMgmt | / | Start/stop VMs |
| VM.Snapshot | / | Create snapshots |
| Datastore.AllocateSpace | / | Allocate disk space |
| Datastore.Audit | / | List storage |

Or use the built-in `PVEAdmin` role for full access.
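
As a rough sketch of a least-privilege alternative to `PVEAdmin` (the role name is illustrative; verify the exact privilege names against `pveum` on your PVE version):

```bash
# On the Proxmox node: define a custom role and grant it to the ansible user
pveum role add AnsibleProvision \
  -privs "VM.Allocate,VM.Clone,VM.Config.Disk,VM.Config.CPU,VM.Config.Memory,VM.Config.Network,VM.Config.Options,VM.PowerMgmt,VM.Snapshot,VM.Audit,Datastore.AllocateSpace,Datastore.Audit"
pveum aclmod / -user ansible@pve -role AnsibleProvision
```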

## Troubleshooting Auth Issues

```yaml
# Debug task to test connection
- name: Test Proxmox API connection
  community.general.proxmox_kvm:
    api_host: "{{ proxmox_api_host }}"
    api_user: "{{ proxmox_api_user }}"
    api_token_id: "{{ proxmox_api_token_id }}"
    api_token_secret: "{{ proxmox_api_token_secret }}"
    validate_certs: false
    vmid: 100
    state: current
  register: result
  ignore_errors: true

- name: Show result
  ansible.builtin.debug:
    var: result
```

Common errors:

| Error | Cause | Fix |
|-------|-------|-----|
| 401 Unauthorized | Bad token | Verify the token; the full identifier is `user@realm!tokenname`, but `api_token_id` takes only the token name |
| 403 Forbidden | Insufficient permissions | Check user ACLs with `pveum user permissions ansible@pve` |
| SSL certificate problem | Self-signed cert | Set `validate_certs: false` |
| Connection refused | Wrong host/port | Verify API URL (port 8006) |

skills/ansible/references/proxmox/dynamic-inventory.md (new file, 195 lines)

# Ansible Proxmox Dynamic Inventory

Query the Proxmox API to generate inventory automatically.

## Plugin Setup

### Requirements

```bash
pip install proxmoxer requests
ansible-galaxy collection install community.general
```

### Inventory File

Create `inventory/proxmox.yml`:

```yaml
plugin: community.general.proxmox
url: https://proxmox.example.com:8006
user: ansible@pve
token_id: mytoken
token_secret: "{{ lookup('env', 'PROXMOX_TOKEN_SECRET') }}"
validate_certs: false

# Include VMs and containers
want_facts: true
want_proxmox_nodes_ansible_host: false

# Filter by status
filters:
  - status == "running"

# Group by various attributes
groups:
  # By Proxmox node
  node_joseph: proxmox_node == "joseph"
  node_maxwell: proxmox_node == "maxwell"
  node_everette: proxmox_node == "everette"

  # By type
  vms: proxmox_type == "qemu"
  containers: proxmox_type == "lxc"

  # By template naming convention
  docker_hosts: "'docker' in proxmox_name"
  pihole: "'pihole' in proxmox_name"

# Host variables from Proxmox
compose:
  ansible_host: proxmox_agent_interfaces[0]['ip-addresses'][0]['ip-address'] | default(proxmox_name)
  ansible_user: "'ubuntu'"
  proxmox_vmid: proxmox_vmid
  proxmox_node: proxmox_node
```

### Enable in ansible.cfg

```ini
[inventory]
enable_plugins = community.general.proxmox, yaml, ini
```

## Testing Inventory

```bash
# List all hosts
ansible-inventory -i inventory/proxmox.yml --list

# Graph view
ansible-inventory -i inventory/proxmox.yml --graph

# Specific host details
ansible-inventory -i inventory/proxmox.yml --host myvm
```

## Common Patterns

### Filter by Tags

Proxmox 7+ supports VM tags:

```yaml
groups:
  webservers: "'web' in proxmox_tags"
  databases: "'db' in proxmox_tags"
  production: "'prod' in proxmox_tags"
```

### Filter by VMID Range

```yaml
filters:
  - vmid >= 200
  - vmid < 300

groups:
  dev_vms: proxmox_vmid >= 200 and proxmox_vmid < 300
  prod_vms: proxmox_vmid >= 300 and proxmox_vmid < 400
```

### IP Address from QEMU Agent

Requires the QEMU guest agent to be running in the VM:

```yaml
compose:
  # Primary IP from agent
  ansible_host: >-
    proxmox_agent_interfaces
    | selectattr('name', 'equalto', 'eth0')
    | map(attribute='ip-addresses')
    | flatten
    | selectattr('ip-address-type', 'equalto', 'ipv4')
    | map(attribute='ip-address')
    | first
    | default(proxmox_name)
```

### Static + Dynamic Inventory

Combine with static inventory:

```bash
# inventory/
#   static.yml    # Static hosts
#   proxmox.yml   # Dynamic from Proxmox

ansible-playbook -i inventory/ playbook.yml
```

## Available Variables

Variables populated from the Proxmox API:

| Variable | Description |
|----------|-------------|
| proxmox_vmid | VM/container ID |
| proxmox_name | VM/container name |
| proxmox_type | "qemu" or "lxc" |
| proxmox_status | running, stopped, etc. |
| proxmox_node | Proxmox node name |
| proxmox_pool | Resource pool (if any) |
| proxmox_tags | Tags (Proxmox 7+) |
| proxmox_template | Is template (bool) |
| proxmox_agent | QEMU agent enabled (bool) |
| proxmox_agent_interfaces | Network info from agent |
| proxmox_cpus | CPU count |
| proxmox_maxmem | Max memory bytes |
| proxmox_maxdisk | Max disk bytes |

## Caching

Enable caching for faster inventory:

```yaml
plugin: community.general.proxmox
# ... auth ...

cache: true
cache_plugin: jsonfile
cache_connection: /tmp/ansible_proxmox_cache
cache_timeout: 300  # 5 minutes
```

Clear cache:
```bash
rm -rf /tmp/ansible_proxmox_cache
```

## Troubleshooting

### No hosts returned

1. Check API connectivity:

   ```bash
   curl -k "https://proxmox:8006/api2/json/cluster/resources" \
     -H "Authorization: PVEAPIToken=ansible@pve!mytoken=secret"
   ```

2. Check that the filters aren't too restrictive; try removing them

3. Verify token permissions include `VM.Audit`

### QEMU agent data missing

- The agent must be installed and running in the guest
- `want_facts: true` must be set
- Data may take a few seconds to appear after VM boot

### Slow inventory queries

- Enable caching (see above)
- Use filters to reduce results
- Avoid `want_facts: true` if not needed

skills/ansible/references/proxmox/gotchas.md (new file, 202 lines)

# Ansible Proxmox Gotchas
|
||||
|
||||
Common issues when using Ansible with Proxmox VE.
|
||||
|
||||
## 1. Token ID Format
|
||||
|
||||
**Wrong:**
|
||||
```yaml
|
||||
api_token_id: ansible@pve!mytoken   # full user@realm!tokenname string
|
||||
```
|
||||
|
||||
**Correct:**
|
||||
```yaml
|
||||
api_token_id: mytoken # Just the token name, NOT user@realm!tokenname
|
||||
```
|
||||
|
||||
The module combines `api_user` and `api_token_id` internally.
|
||||
|
||||
## 2. VMID Required for Most Operations
|
||||
|
||||
Unlike Terraform, you must always specify `vmid`:
|
||||
|
||||
```yaml
|
||||
# Won't auto-generate VMID
|
||||
- name: Create VM
|
||||
community.general.proxmox_kvm:
|
||||
# ... auth ...
|
||||
vmid: 300 # REQUIRED - no auto-assignment
|
||||
name: myvm
|
||||
```
|
||||
|
||||
To find next available VMID:
|
||||
```yaml
|
||||
- name: Get cluster resources
|
||||
ansible.builtin.uri:
|
||||
url: "https://{{ proxmox_api_host }}:8006/api2/json/cluster/resources"
|
||||
headers:
|
||||
Authorization: "PVEAPIToken={{ proxmox_api_user }}!{{ proxmox_api_token_id }}={{ proxmox_api_token_secret }}"
|
||||
validate_certs: false
|
||||
register: resources
|
||||
|
||||
- name: Calculate next VMID
|
||||
ansible.builtin.set_fact:
|
||||
next_vmid: "{{ (resources.json.data | selectattr('vmid', 'defined') | map(attribute='vmid') | max) + 1 }}"
|
||||
```
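
Alternatively, Proxmox exposes a dedicated endpoint for this; a sketch using the same auth header as above:

```yaml
- name: Ask Proxmox for the next free VMID
  ansible.builtin.uri:
    url: "https://{{ proxmox_api_host }}:8006/api2/json/cluster/nextid"
    headers:
      Authorization: "PVEAPIToken={{ proxmox_api_user }}!{{ proxmox_api_token_id }}={{ proxmox_api_token_secret }}"
    validate_certs: false
  register: nextid

- name: Store it
  ansible.builtin.set_fact:
    next_vmid: "{{ nextid.json.data }}"
```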
|
||||
|
||||
## 3. Node Parameter Required
|
||||
|
||||
Must specify which node to operate on:
|
||||
|
||||
```yaml
|
||||
- name: Create VM
|
||||
community.general.proxmox_kvm:
|
||||
# ... auth ...
|
||||
node: joseph # REQUIRED - which Proxmox node
|
||||
vmid: 300
|
||||
```
|
||||
|
||||
## 4. Clone vs Create
|
||||
|
||||
Cloning requires different parameters than creating:
|
||||
|
||||
```yaml
|
||||
# CLONE from template
|
||||
- name: Clone VM
|
||||
community.general.proxmox_kvm:
|
||||
# ... auth ...
|
||||
node: joseph
|
||||
vmid: 300
|
||||
name: myvm
|
||||
clone: tmpl-ubuntu-2404-standard # Template name or VMID
|
||||
full: true
|
||||
|
||||
# CREATE new (less common)
|
||||
- name: Create VM
|
||||
community.general.proxmox_kvm:
|
||||
# ... auth ...
|
||||
node: joseph
|
||||
vmid: 300
|
||||
name: myvm
|
||||
ostype: l26
|
||||
scsihw: virtio-scsi-pci
|
||||
bootdisk: scsi0
|
||||
scsi:
|
||||
scsi0: 'local-lvm:32,format=raw'
|
||||
```
|
||||
|
||||
## 5. Async Operations
|
||||
|
||||
Large operations (clone, snapshot) can timeout. Use async:
|
||||
|
||||
```yaml
|
||||
- name: Clone large VM
|
||||
community.general.proxmox_kvm:
|
||||
# ... auth ...
|
||||
clone: large-template
|
||||
vmid: 300
|
||||
timeout: 600 # Module timeout
|
||||
async: 900 # Ansible async timeout
|
||||
poll: 10 # Check every 10 seconds
|
||||
```
|
||||
|
||||
## 6. State Idempotency
|
||||
|
||||
`state: present` doesn't update existing VMs:
|
||||
|
||||
```yaml
|
||||
# This WON'T change cores on existing VM
|
||||
- name: Create/update VM
|
||||
community.general.proxmox_kvm:
|
||||
# ... auth ...
|
||||
vmid: 300
|
||||
cores: 4 # Ignored if VM exists
|
||||
state: present
|
||||
```
|
||||
|
||||
To modify an existing VM, use `proxmox_kvm` with `update: true` (supported in recent `community.general` releases, with some parameters excluded from updates) or drive the API directly.
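
A sketch of the update path, assuming your installed collection version supports `update` for the setting you want to change:

```yaml
- name: Change cores on an existing VM
  community.general.proxmox_kvm:
    # ... auth ...
    node: joseph
    vmid: 300
    cores: 4
    update: true
    state: present
```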
|
||||
|
||||
## 7. Network Interface Format (LXC)
|
||||
|
||||
LXC containers use a specific JSON-like string format:
|
||||
|
||||
```yaml
|
||||
# WRONG
|
||||
netif:
|
||||
net0:
|
||||
bridge: vmbr0
|
||||
ip: dhcp
|
||||
|
||||
# CORRECT
|
||||
netif: '{"net0":"name=eth0,bridge=vmbr0,ip=dhcp"}'
|
||||
|
||||
# Multiple interfaces
|
||||
netif: '{"net0":"name=eth0,bridge=vmbr0,ip=dhcp","net1":"name=eth1,bridge=vmbr12,ip=dhcp"}'
|
||||
```
|
||||
|
||||
## 8. Disk Resize Only Grows
|
||||
|
||||
`proxmox_disk` resize only increases size:
|
||||
|
||||
```yaml
|
||||
# This adds 20G to current size
|
||||
- name: Grow disk
|
||||
community.general.proxmox_disk:
|
||||
# ... auth ...
|
||||
vmid: 300
|
||||
disk: scsi0
|
||||
size: +20G # Relative increase
|
||||
state: resized
|
||||
|
||||
# NOT possible to shrink
|
||||
```
|
||||
|
||||
## 9. Template vs VM States
|
||||
|
||||
Templates don't support all states:
|
||||
|
||||
```yaml
|
||||
# Can't start a template
|
||||
- name: Start template
|
||||
community.general.proxmox_kvm:
|
||||
vmid: 100
|
||||
state: started # FAILS - templates can't run
|
||||
```
|
||||
|
||||
Convert template to VM first if needed.
|
||||
|
||||
## 10. Collection Version Matters
|
||||
|
||||
Module parameters change between versions. Check installed version:
|
||||
|
||||
```bash
|
||||
ansible-galaxy collection list | grep community.general
|
||||
```
|
||||
|
||||
Update if needed:
|
||||
```bash
|
||||
ansible-galaxy collection install community.general --upgrade
|
||||
```
|
||||
|
||||
## 11. Limited Cloud-Init Support
|
||||
|
||||
Unlike Terraform's Proxmox provider, the Ansible modules have limited cloud-init support. For cloud-init VMs:
|
||||
|
||||
1. Clone template with cloud-init already configured
|
||||
2. Use API calls to set cloud-init parameters
|
||||
3. Or configure post-boot with Ansible
|
||||
|
||||
```yaml
|
||||
# Workaround: Use URI module for cloud-init config
|
||||
- name: Set cloud-init IP
|
||||
ansible.builtin.uri:
|
||||
url: "https://{{ proxmox_api_host }}:8006/api2/json/nodes/{{ node }}/qemu/{{ vmid }}/config"
|
||||
method: PUT
|
||||
headers:
|
||||
Authorization: "PVEAPIToken={{ proxmox_api_user }}!{{ proxmox_api_token_id }}={{ proxmox_api_token_secret }}"
|
||||
body_format: form-urlencoded
|
||||
body:
|
||||
ipconfig0: "ip=192.168.1.100/24,gw=192.168.1.1"
|
||||
ciuser: ubuntu
|
||||
validate_certs: false
|
||||
```
|
||||
232
skills/ansible/references/proxmox/modules.md
Normal file
@@ -0,0 +1,232 @@
|
||||
# Ansible Proxmox Modules
|
||||
|
||||
Proxmox VE management via `community.general` collection.
|
||||
|
||||
## Collection Setup
|
||||
|
||||
```bash
|
||||
ansible-galaxy collection install community.general
|
||||
```
|
||||
|
||||
## Core Modules
|
||||
|
||||
### proxmox (LXC Containers)
|
||||
|
||||
```yaml
|
||||
- name: Create LXC container
|
||||
community.general.proxmox:
|
||||
api_host: proxmox.example.com
|
||||
api_user: ansible@pve
|
||||
api_token_id: mytoken
|
||||
api_token_secret: "{{ proxmox_token_secret }}"
|
||||
node: joseph
|
||||
vmid: 200
|
||||
hostname: mycontainer
|
||||
ostemplate: local:vztmpl/ubuntu-22.04-standard_22.04-1_amd64.tar.zst
|
||||
storage: local-lvm
|
||||
cores: 2
|
||||
memory: 2048
|
||||
disk: 10
|
||||
netif: '{"net0":"name=eth0,bridge=vmbr0,ip=dhcp"}'
|
||||
state: present
|
||||
|
||||
- name: Start container
|
||||
community.general.proxmox:
|
||||
api_host: proxmox.example.com
|
||||
api_user: ansible@pve
|
||||
api_token_id: mytoken
|
||||
api_token_secret: "{{ proxmox_token_secret }}"
|
||||
node: joseph
|
||||
vmid: 200
|
||||
state: started
|
||||
|
||||
- name: Stop container
|
||||
community.general.proxmox:
|
||||
# ... auth params ...
|
||||
vmid: 200
|
||||
state: stopped
|
||||
force: true # Force stop if graceful fails
|
||||
|
||||
- name: Remove container
|
||||
community.general.proxmox:
|
||||
# ... auth params ...
|
||||
vmid: 200
|
||||
state: absent
|
||||
```
|
||||
|
||||
### proxmox_kvm (VMs)
|
||||
|
||||
```yaml
|
||||
- name: Create VM from template
|
||||
community.general.proxmox_kvm:
|
||||
api_host: proxmox.example.com
|
||||
api_user: ansible@pve
|
||||
api_token_id: mytoken
|
||||
api_token_secret: "{{ proxmox_token_secret }}"
|
||||
node: joseph
|
||||
vmid: 300
|
||||
name: myvm
|
||||
clone: tmpl-ubuntu-2404-standard
|
||||
full: true # Full clone (not linked)
|
||||
storage: local-lvm
|
||||
format: raw
|
||||
timeout: 500
|
||||
|
||||
- name: Start VM
|
||||
community.general.proxmox_kvm:
|
||||
# ... auth params ...
|
||||
node: joseph
|
||||
vmid: 300
|
||||
state: started
|
||||
|
||||
- name: Stop VM (ACPI shutdown)
|
||||
community.general.proxmox_kvm:
|
||||
# ... auth params ...
|
||||
vmid: 300
|
||||
state: stopped
|
||||
force: false # Graceful ACPI
|
||||
|
||||
- name: Force stop VM
|
||||
community.general.proxmox_kvm:
|
||||
# ... auth params ...
|
||||
vmid: 300
|
||||
state: stopped
|
||||
force: true
|
||||
|
||||
- name: Current state (running/stopped/present/absent)
|
||||
community.general.proxmox_kvm:
|
||||
# ... auth params ...
|
||||
vmid: 300
|
||||
state: current
|
||||
register: vm_state
|
||||
```
|
||||
|
||||
### proxmox_template

Manages OS template and ISO files in Proxmox storage. It does not turn a VM into a template; for that, set `template: true` via `proxmox_kvm` or call the API directly.

```yaml
- name: Upload LXC/OS template to storage
  community.general.proxmox_template:
    api_host: proxmox.example.com
    api_user: ansible@pve
    api_token_id: mytoken
    api_token_secret: "{{ proxmox_token_secret }}"
    node: joseph
    storage: local
    content_type: vztmpl
    src: /tmp/ubuntu-22.04-standard_22.04-1_amd64.tar.zst
    state: present

- name: Delete template file from storage
  community.general.proxmox_template:
    # ... auth params ...
    node: joseph
    storage: local
    content_type: vztmpl
    template: ubuntu-22.04-standard_22.04-1_amd64.tar.zst
    state: absent
```
|
||||
|
||||
### proxmox_snap
|
||||
|
||||
```yaml
|
||||
- name: Create snapshot
|
||||
community.general.proxmox_snap:
|
||||
api_host: proxmox.example.com
|
||||
api_user: ansible@pve
|
||||
api_token_id: mytoken
|
||||
api_token_secret: "{{ proxmox_token_secret }}"
|
||||
vmid: 300
|
||||
snapname: before-upgrade
|
||||
description: "Snapshot before major upgrade"
|
||||
vmstate: false # Don't include RAM
|
||||
state: present
|
||||
|
||||
- name: Rollback to snapshot
|
||||
community.general.proxmox_snap:
|
||||
# ... auth params ...
|
||||
vmid: 300
|
||||
snapname: before-upgrade
|
||||
state: rollback
|
||||
|
||||
- name: Remove snapshot
|
||||
community.general.proxmox_snap:
|
||||
# ... auth params ...
|
||||
vmid: 300
|
||||
snapname: before-upgrade
|
||||
state: absent
|
||||
```
|
||||
|
||||
### proxmox_nic
|
||||
|
||||
```yaml
|
||||
- name: Add NIC to VM
|
||||
community.general.proxmox_nic:
|
||||
api_host: proxmox.example.com
|
||||
api_user: ansible@pve
|
||||
api_token_id: mytoken
|
||||
api_token_secret: "{{ proxmox_token_secret }}"
|
||||
vmid: 300
|
||||
interface: net1
|
||||
bridge: vmbr12
|
||||
model: virtio
|
||||
tag: 12 # VLAN tag
|
||||
state: present
|
||||
|
||||
- name: Remove NIC
|
||||
community.general.proxmox_nic:
|
||||
# ... auth params ...
|
||||
vmid: 300
|
||||
interface: net1
|
||||
state: absent
|
||||
```
|
||||
|
||||
### proxmox_disk
|
||||
|
||||
```yaml
|
||||
- name: Add disk to VM
|
||||
community.general.proxmox_disk:
|
||||
api_host: proxmox.example.com
|
||||
api_user: ansible@pve
|
||||
api_token_id: mytoken
|
||||
api_token_secret: "{{ proxmox_token_secret }}"
|
||||
vmid: 300
|
||||
disk: scsi1
|
||||
storage: local-lvm
|
||||
size: 50G
|
||||
format: raw
|
||||
state: present
|
||||
|
||||
- name: Resize disk
|
||||
community.general.proxmox_disk:
|
||||
# ... auth params ...
|
||||
vmid: 300
|
||||
disk: scsi0
|
||||
size: +20G # Increase by 20G
|
||||
state: resized
|
||||
|
||||
- name: Detach disk
|
||||
community.general.proxmox_disk:
|
||||
# ... auth params ...
|
||||
vmid: 300
|
||||
disk: scsi1
|
||||
state: absent
|
||||
```
|
||||
|
||||
## State Reference
|
||||
|
||||
| Module | States |
|
||||
|--------|--------|
|
||||
| proxmox (LXC) | present, started, stopped, restarted, absent |
|
||||
| proxmox_kvm | present, started, stopped, restarted, absent, current |
|
||||
| proxmox_template | present, absent |
|
||||
| proxmox_snap | present, absent, rollback |
|
||||
| proxmox_nic | present, absent |
|
||||
| proxmox_disk | present, absent, resized |
|
||||
|
||||
## Common Parameters
|
||||
|
||||
All modules share these authentication parameters:
|
||||
|
||||
| Parameter | Description |
|
||||
|-----------|-------------|
|
||||
| api_host | Proxmox hostname/IP |
|
||||
| api_user | User (format: user@realm) |
|
||||
| api_token_id | API token name |
|
||||
| api_token_secret | API token value |
|
||||
| validate_certs | Verify TLS (default: true) |
|
||||
| timeout | API timeout seconds |
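
To avoid repeating the auth block in every task, `module_defaults` can set it once per play; a sketch that lists only the modules in use (extend the mapping for each module you call):

```yaml
- hosts: localhost
  gather_facts: false
  module_defaults:
    community.general.proxmox_kvm: &proxmox_auth
      api_host: proxmox.example.com
      api_user: ansible@pve
      api_token_id: mytoken
      api_token_secret: "{{ proxmox_token_secret }}"
    community.general.proxmox_snap: *proxmox_auth
  tasks:
    - name: Start VM without repeating credentials
      community.general.proxmox_kvm:
        node: joseph
        vmid: 300
        state: started
```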
|
||||
295
skills/ansible/references/troubleshooting.md
Normal file
@@ -0,0 +1,295 @@
|
||||
# Ansible Troubleshooting Reference
|
||||
|
||||
## Common Errors
|
||||
|
||||
| Error | Cause | Solution |
|
||||
|-------|-------|----------|
|
||||
| SSH connection failed | Wrong host/key/user | Check ansible_host, ansible_user, key |
|
||||
| Permission denied | Need sudo/wrong user | Add `become: true`, check sudo config |
|
||||
| Module not found | Collection not installed | `ansible-galaxy collection install` |
|
||||
| Variable undefined | Missing var/typo | Check var name, define in vars |
|
||||
| Syntax error | YAML/Jinja2 issue | Run `ansible-playbook --syntax-check` |
|
||||
| Host unreachable | Network/SSH issue | `ansible host -m ping`, check firewall |
|
||||
|
||||
## Debug Commands
|
||||
|
||||
```bash
|
||||
# Test connectivity
|
||||
ansible all -m ping
|
||||
ansible host -m ping -vvv
|
||||
|
||||
# Syntax check
|
||||
ansible-playbook playbook.yml --syntax-check
|
||||
|
||||
# Dry run (check mode)
|
||||
ansible-playbook playbook.yml --check
|
||||
|
||||
# Diff mode (show changes)
|
||||
ansible-playbook playbook.yml --diff
|
||||
|
||||
# Verbose output
|
||||
ansible-playbook playbook.yml -v # Minimal
|
||||
ansible-playbook playbook.yml -vv # More
|
||||
ansible-playbook playbook.yml -vvv # Connection debug
|
||||
ansible-playbook playbook.yml -vvvv # Full debug
|
||||
|
||||
# List tasks without running
|
||||
ansible-playbook playbook.yml --list-tasks
|
||||
|
||||
# List hosts
|
||||
ansible-playbook playbook.yml --list-hosts
|
||||
|
||||
# Start at specific task
|
||||
ansible-playbook playbook.yml --start-at-task="Task name"
|
||||
|
||||
# Step through tasks
|
||||
ansible-playbook playbook.yml --step
|
||||
```
|
||||
|
||||
## Connection Issues
|
||||
|
||||
### Test SSH
|
||||
|
||||
```bash
|
||||
# Direct SSH test
|
||||
ssh -i ~/.ssh/key user@host
|
||||
|
||||
# Ansible ping
|
||||
ansible host -m ping -vvv
|
||||
|
||||
# Check SSH config
|
||||
ansible host -m debug -a "var=ansible_ssh_private_key_file"
|
||||
```
|
||||
|
||||
### Common SSH Fixes
|
||||
|
||||
```yaml
|
||||
# In inventory or ansible.cfg
|
||||
ansible_ssh_private_key_file: ~/.ssh/mykey
|
||||
ansible_user: ubuntu
|
||||
ansible_host: 192.168.1.10
|
||||
# host_key_checking = False   # ansible.cfg [defaults] or ANSIBLE_HOST_KEY_CHECKING - only for testing
|
||||
```
|
||||
|
||||
### SSH Connection Options
|
||||
|
||||
```yaml
|
||||
# In inventory
|
||||
host1:
|
||||
ansible_host: 192.168.1.10
|
||||
ansible_ssh_common_args: '-o StrictHostKeyChecking=no'
|
||||
ansible_ssh_extra_args: '-o ConnectTimeout=10'
|
||||
```
|
||||
|
||||
## Permission Issues
|
||||
|
||||
### Sudo Not Working
|
||||
|
||||
```yaml
|
||||
# Enable become
|
||||
- hosts: all
|
||||
become: true
|
||||
become_method: sudo
|
||||
become_user: root
|
||||
```
|
||||
|
||||
```bash
|
||||
# On target host, check sudoers
|
||||
sudo visudo
|
||||
|
||||
# User should have:
|
||||
# ubuntu ALL=(ALL) NOPASSWD: ALL
|
||||
```
|
||||
|
||||
### Ask for Sudo Password
|
||||
|
||||
```bash
|
||||
ansible-playbook playbook.yml --ask-become-pass
|
||||
```
|
||||
|
||||
## Variable Issues
|
||||
|
||||
### Debug Variables
|
||||
|
||||
```yaml
|
||||
- name: Print all vars
|
||||
ansible.builtin.debug:
|
||||
var: vars
|
||||
|
||||
- name: Print specific var
|
||||
ansible.builtin.debug:
|
||||
var: my_var
|
||||
|
||||
- name: Print hostvars
|
||||
ansible.builtin.debug:
|
||||
var: hostvars[inventory_hostname]
|
||||
|
||||
- name: Print facts
|
||||
ansible.builtin.debug:
|
||||
var: ansible_facts
|
||||
```
|
||||
|
||||
### Check Variable Precedence
|
||||
|
||||
```bash
|
||||
# See where variable comes from
|
||||
ansible-inventory --host hostname --yaml
|
||||
```
|
||||
|
||||
### Undefined Variable
|
||||
|
||||
```yaml
|
||||
# Provide default
|
||||
value: "{{ my_var | default('fallback') }}"
|
||||
|
||||
# Check if defined
|
||||
- name: Task
|
||||
when: my_var is defined
|
||||
|
||||
# Fail early if required
|
||||
- name: Validate
|
||||
ansible.builtin.assert:
|
||||
that: my_var is defined
|
||||
fail_msg: "my_var must be set"
|
||||
```
|
||||
|
||||
## Module Issues
|
||||
|
||||
### Module Not Found
|
||||
|
||||
```bash
|
||||
# Install collection
|
||||
ansible-galaxy collection install community.docker
|
||||
|
||||
# Check installed
|
||||
ansible-galaxy collection list
|
||||
|
||||
# Update collections
|
||||
ansible-galaxy collection install -r requirements.yml --force
|
||||
```
|
||||
|
||||
### Module Arguments
|
||||
|
||||
```bash
|
||||
# Get module documentation
|
||||
ansible-doc ansible.builtin.copy
|
||||
ansible-doc community.docker.docker_compose_v2
|
||||
```
|
||||
|
||||
## Idempotency Issues
|
||||
|
||||
### Task Always Shows "changed"
|
||||
|
||||
```yaml
|
||||
# Bad - always changed
|
||||
- name: Run script
|
||||
ansible.builtin.command: /bin/script.sh
|
||||
|
||||
# Good - check first
|
||||
- name: Run script
|
||||
ansible.builtin.command: /bin/script.sh
|
||||
args:
|
||||
creates: /opt/app/.installed
|
||||
|
||||
# Good - explicit changed_when
|
||||
- name: Run script
|
||||
ansible.builtin.command: /bin/script.sh
|
||||
register: result
|
||||
changed_when: "'Created' in result.stdout"
|
||||
```
|
||||
|
||||
### Test Idempotency
|
||||
|
||||
```bash
|
||||
# Run twice, second should show all "ok"
|
||||
ansible-playbook playbook.yml
|
||||
ansible-playbook playbook.yml # Should show "changed=0"
|
||||
```
|
||||
|
||||
## Handler Issues
|
||||
|
||||
### Handler Not Running
|
||||
|
||||
- Handlers only run if task reports "changed"
|
||||
- Handlers run at end of play, not immediately
|
||||
- Force handler run: `ansible-playbook --force-handlers`
|
||||
|
||||
```yaml
|
||||
# Force handler to run immediately
|
||||
- name: Config change
|
||||
ansible.builtin.template:
|
||||
src: config.j2
|
||||
dest: /etc/app/config
|
||||
notify: Restart app
|
||||
|
||||
- name: Flush handlers
|
||||
ansible.builtin.meta: flush_handlers
|
||||
|
||||
- name: Continue with restarted service
|
||||
ansible.builtin.uri:
|
||||
url: http://localhost:8080/health
|
||||
```
|
||||
|
||||
## Performance Issues
|
||||
|
||||
### Slow Playbook
|
||||
|
||||
```yaml
|
||||
# Disable fact gathering if not needed
|
||||
- hosts: all
|
||||
gather_facts: false
|
||||
|
||||
# Or gather specific facts
|
||||
- hosts: all
|
||||
gather_facts: true
|
||||
gather_subset:
|
||||
- network
|
||||
```
|
||||
|
||||
```bash
|
||||
# Increase parallelism
|
||||
ansible-playbook playbook.yml -f 20 # 20 forks
|
||||
|
||||
# Use pipelining (add to ansible.cfg)
|
||||
# [ssh_connection]
|
||||
# pipelining = True
|
||||
```
|
||||
|
||||
### Callback Timer
|
||||
|
||||
```ini
|
||||
# ansible.cfg
|
||||
[defaults]
|
||||
callbacks_enabled = timer, profile_tasks
|
||||
```
|
||||
|
||||
## Recovery
|
||||
|
||||
### Failed Playbook
|
||||
|
||||
```bash
|
||||
# Retry failed hosts (requires retry_files_enabled = True in ansible.cfg)
ansible-playbook playbook.yml --limit @playbook.retry
|
||||
|
||||
# Start at failed task
|
||||
ansible-playbook playbook.yml --start-at-task="Failed Task Name"
|
||||
```
|
||||
|
||||
### Cleanup After Failure
|
||||
|
||||
```yaml
|
||||
- name: Risky operation
|
||||
block:
|
||||
- name: Do something
|
||||
ansible.builtin.command: /bin/risky
|
||||
rescue:
|
||||
- name: Cleanup on failure
|
||||
ansible.builtin.file:
|
||||
path: /tmp/incomplete
|
||||
state: absent
|
||||
always:
|
||||
- name: Always cleanup
|
||||
ansible.builtin.file:
|
||||
path: /tmp/lock
|
||||
state: absent
|
||||
```
|
||||
246
skills/ansible/references/variables.md
Normal file
@@ -0,0 +1,246 @@
|
||||
# Ansible Variables Reference
|
||||
|
||||
## Variable Precedence (High to Low)
|
||||
|
||||
1. **Extra vars** (`-e "var=value"`)
2. **set_facts / registered vars**
3. **Task vars** (in task)
4. **Block vars** (in block)
5. **Role/include vars**
6. **Play vars_files**
7. **Play vars_prompt**
8. **Play vars**
9. **Host facts**
10. **Playbook host_vars/**
11. **Inventory host_vars/**
12. **Playbook group_vars/**
13. **Inventory group_vars/**
14. **Playbook group_vars/all**
15. **Inventory group_vars/all**
16. **Role defaults**
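
To see the top of the list in action, a tiny play (hypothetical variable name) shows extra vars winning over play vars:

```yaml
# precedence-demo.yml - a minimal sketch
- hosts: localhost
  gather_facts: false
  vars:
    greeting: "from play vars"
  tasks:
    - name: Show which definition wins
      ansible.builtin.debug:
        msg: "{{ greeting }}"
```

Running it plainly prints "from play vars"; running `ansible-playbook precedence-demo.yml -e greeting="from extra vars"` prints the extra-vars value instead.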
|
||||
|
||||
## Defining Variables
|
||||
|
||||
### In Playbook
|
||||
|
||||
```yaml
|
||||
- hosts: all
|
||||
vars:
|
||||
app_name: myapp
|
||||
app_port: 8080
|
||||
|
||||
vars_files:
|
||||
- vars/common.yml
|
||||
- "vars/{{ environment }}.yml"
|
||||
```
|
||||
|
||||
### In Tasks
|
||||
|
||||
```yaml
|
||||
- name: Set variable
|
||||
ansible.builtin.set_fact:
|
||||
my_var: "value"
|
||||
|
||||
- name: Register output
|
||||
ansible.builtin.command: whoami
|
||||
register: user_result
|
||||
|
||||
- name: Use registered
|
||||
ansible.builtin.debug:
|
||||
msg: "User: {{ user_result.stdout }}"
|
||||
```
|
||||
|
||||
### In Roles
|
||||
|
||||
```yaml
|
||||
# roles/app/defaults/main.yml (low priority)
|
||||
app_port: 8080
|
||||
|
||||
# roles/app/vars/main.yml (high priority)
|
||||
internal_setting: value
|
||||
```
|
||||
|
||||
## Variable Types
|
||||
|
||||
```yaml
|
||||
# String
|
||||
name: "value"
|
||||
|
||||
# Number
|
||||
port: 8080
|
||||
|
||||
# Boolean
|
||||
enabled: true
|
||||
|
||||
# List
|
||||
packages:
|
||||
- nginx
|
||||
- python3
|
||||
|
||||
# Dictionary
|
||||
user:
|
||||
name: admin
|
||||
groups:
|
||||
- wheel
|
||||
- docker
|
||||
```
|
||||
|
||||
## Accessing Variables
|
||||
|
||||
```yaml
|
||||
# Simple
|
||||
msg: "{{ my_var }}"
|
||||
|
||||
# Dictionary
|
||||
msg: "{{ user.name }}"
|
||||
msg: "{{ user['name'] }}"
|
||||
|
||||
# List
|
||||
msg: "{{ packages[0] }}"
|
||||
msg: "{{ packages | first }}"
|
||||
|
||||
# Default value
|
||||
msg: "{{ my_var | default('fallback') }}"
|
||||
|
||||
# Required (fail if undefined)
|
||||
msg: "{{ my_var }}" # Fails if undefined
|
||||
```
|
||||
|
||||
## Jinja2 Filters
|
||||
|
||||
```yaml
|
||||
# Default
|
||||
value: "{{ var | default('default') }}"
|
||||
|
||||
# Mandatory
|
||||
value: "{{ var | mandatory }}"
|
||||
|
||||
# Type conversion
|
||||
port: "{{ port_string | int }}"
|
||||
flag: "{{ flag_string | bool }}"
|
||||
|
||||
# String operations
|
||||
upper: "{{ name | upper }}"
|
||||
lower: "{{ name | lower }}"
|
||||
title: "{{ name | title }}"
|
||||
|
||||
# Lists
|
||||
first: "{{ list | first }}"
|
||||
last: "{{ list | last }}"
|
||||
length: "{{ list | length }}"
|
||||
joined: "{{ list | join(',') }}"
|
||||
|
||||
# JSON
|
||||
json_str: "{{ dict | to_json }}"
|
||||
yaml_str: "{{ dict | to_yaml }}"
|
||||
|
||||
# Path operations
|
||||
basename: "{{ path | basename }}"
|
||||
dirname: "{{ path | dirname }}"
|
||||
```
|
||||
|
||||
## Facts
|
||||
|
||||
```yaml
|
||||
# Accessing facts
|
||||
os: "{{ ansible_distribution }}"
|
||||
version: "{{ ansible_distribution_version }}"
|
||||
ip: "{{ ansible_default_ipv4.address }}"
|
||||
hostname: "{{ ansible_hostname }}"
|
||||
memory_mb: "{{ ansible_memtotal_mb }}"
|
||||
cpus: "{{ ansible_processor_vcpus }}"
|
||||
```
|
||||
|
||||
### Gathering Facts
|
||||
|
||||
```yaml
|
||||
- hosts: all
|
||||
gather_facts: true # Default
|
||||
|
||||
# Or manually
|
||||
- name: Gather facts
|
||||
ansible.builtin.setup:
|
||||
filter: ansible_*
|
||||
|
||||
# Specific facts
|
||||
- name: Get network facts
|
||||
ansible.builtin.setup:
|
||||
gather_subset:
|
||||
- network
|
||||
```
|
||||
|
||||
## Environment Variables
|
||||
|
||||
```yaml
|
||||
# Lookup
|
||||
value: "{{ lookup('env', 'MY_VAR') }}"
|
||||
|
||||
# Set for task
|
||||
- name: Run with env
|
||||
ansible.builtin.command: /bin/command
|
||||
environment:
|
||||
MY_VAR: "{{ my_value }}"
|
||||
```
|
||||
|
||||
## Secrets/Vault
|
||||
|
||||
```bash
|
||||
# Create encrypted file
|
||||
ansible-vault create secrets.yml
|
||||
|
||||
# Edit encrypted file
|
||||
ansible-vault edit secrets.yml
|
||||
|
||||
# Encrypt existing file
|
||||
ansible-vault encrypt vars.yml
|
||||
|
||||
# Run with vault password
|
||||
ansible-playbook playbook.yml --ask-vault-pass
|
||||
ansible-playbook playbook.yml --vault-password-file ~/.vault_pass
|
||||
```
|
||||
|
||||
## Prompt for Variables
|
||||
|
||||
```yaml
|
||||
- hosts: all
|
||||
vars_prompt:
|
||||
- name: password
|
||||
prompt: "Enter password"
|
||||
private: true
|
||||
|
||||
- name: environment
|
||||
prompt: "Which environment?"
|
||||
default: "staging"
|
||||
```
|
||||
|
||||
## Conditionals with Variables
|
||||
|
||||
```yaml
|
||||
- name: Check defined
|
||||
when: my_var is defined
|
||||
|
||||
- name: Check undefined
|
||||
when: my_var is not defined
|
||||
|
||||
- name: Check truthy
|
||||
when: my_var | bool
|
||||
|
||||
- name: Check falsy
|
||||
when: not my_var | bool
|
||||
|
||||
- name: Check in list
|
||||
when: item in my_list
|
||||
|
||||
- name: Version comparison
|
||||
when: version is version('2.0', '>=')
|
||||
```
|
||||
|
||||
## Hostvars
|
||||
|
||||
Access variables from other hosts:
|
||||
|
||||
```yaml
|
||||
- name: Get from other host
|
||||
ansible.builtin.debug:
|
||||
msg: "{{ hostvars['web1']['ansible_host'] }}"
|
||||
```
|
||||
121
skills/docker/SKILL.md
Normal file
@@ -0,0 +1,121 @@
|
||||
---
|
||||
name: docker
|
||||
description: |
|
||||
Docker and Docker Compose reference for container deployment, networking, volumes,
|
||||
and orchestration. Includes Proxmox hosting and LXC comparison patterns.
|
||||
Use when working with docker-compose.yaml, Dockerfiles, troubleshooting containers,
|
||||
or planning container architecture.
|
||||
Triggers: docker, compose, container, dockerfile, volume, network, service, lxc.
|
||||
---
|
||||
|
||||
# Docker Skill
|
||||
|
||||
Docker and Docker Compose reference for containerized application deployment and management.
|
||||
|
||||
## Quick Reference
|
||||
|
||||
```bash
|
||||
# Container operations
|
||||
docker ps # List running containers
|
||||
docker ps -a # List all containers
|
||||
docker logs <container> # View logs
|
||||
docker logs -f <container> # Follow logs
|
||||
docker exec -it <container> sh # Shell into container
|
||||
docker inspect <container> # Full container details
|
||||
|
||||
# Compose operations
|
||||
docker compose up -d # Start services (detached)
|
||||
docker compose down # Stop and remove
|
||||
docker compose ps # List compose services
|
||||
docker compose logs -f # Follow all logs
|
||||
docker compose pull # Pull latest images
|
||||
docker compose restart # Restart services
|
||||
|
||||
# Troubleshooting
|
||||
docker stats # Resource usage
|
||||
docker network ls # List networks
|
||||
docker network inspect <net> # Network details
|
||||
docker volume ls # List volumes
|
||||
docker system df # Disk usage
|
||||
docker system prune # Clean up unused resources
|
||||
```
|
||||
|
||||
## Reference Files
|
||||
|
||||
Load on-demand based on task:
|
||||
|
||||
| Topic | File | When to Load |
|
||||
|-------|------|--------------|
|
||||
| Compose Structure | [compose.md](references/compose.md) | Writing docker-compose.yaml |
|
||||
| Networking | [networking.md](references/networking.md) | Network modes, port mapping |
|
||||
| Volumes | [volumes.md](references/volumes.md) | Data persistence, mounts |
|
||||
| Dockerfile | [dockerfile.md](references/dockerfile.md) | Building images |
|
||||
| Troubleshooting | [troubleshooting.md](references/troubleshooting.md) | Common errors, diagnostics |
|
||||
|
||||
### Proxmox Integration
|
||||
|
||||
| Topic | File | When to Load |
|
||||
|-------|------|--------------|
|
||||
| Docker on Proxmox | [proxmox/hosting.md](references/proxmox/hosting.md) | VM sizing, storage, GPU passthrough |
|
||||
| LXC vs Docker | [proxmox/lxc-vs-docker.md](references/proxmox/lxc-vs-docker.md) | Choosing container type |
|
||||
|
||||
## Compose File Quick Reference
|
||||
|
||||
```yaml
|
||||
name: myapp # Project name (optional)
|
||||
|
||||
services:
|
||||
web:
|
||||
image: nginx:alpine
|
||||
ports:
|
||||
- "80:80"
|
||||
volumes:
|
||||
- ./html:/usr/share/nginx/html:ro
|
||||
networks:
|
||||
- frontend
|
||||
restart: unless-stopped
|
||||
healthcheck:
|
||||
test: ["CMD", "curl", "-f", "http://localhost"]
|
||||
interval: 30s
|
||||
timeout: 10s
|
||||
retries: 3
|
||||
|
||||
networks:
|
||||
frontend:
|
||||
driver: bridge
|
||||
|
||||
volumes:
|
||||
data:
|
||||
```
|
||||
|
||||
## Validation Checklist
|
||||
|
||||
Before deploying containers:
|
||||
|
||||
- [ ] Services defined with specific image tags (not :latest)
|
||||
- [ ] Port mappings without conflicts
|
||||
- [ ] Volumes for persistent data
|
||||
- [ ] Networks configured appropriately
|
||||
- [ ] Resource limits set (memory, CPU)
|
||||
- [ ] Health checks for critical services
|
||||
- [ ] Restart policy appropriate
|
||||
- [ ] Secrets not in images or compose file
|
||||
- [ ] .env file for environment variables
|
||||
|
||||
## Network Mode Quick Decision
|
||||
|
||||
| Mode | Use Case | Isolation |
|
||||
|------|----------|-----------|
|
||||
| bridge | Default, most services | Container isolated |
|
||||
| host | Performance, network tools | No isolation |
|
||||
| macvlan | Direct LAN access | Own MAC/IP |
|
||||
| ipvlan | Like macvlan, shared MAC | Own IP |
|
||||
| none | No networking | Full isolation |
|
||||
|
||||
## Volume Type Quick Decision
|
||||
|
||||
| Type | Use Case | Portability |
|
||||
|------|----------|-------------|
|
||||
| Named volume | Database, app data | Best |
|
||||
| Bind mount | Config files, dev | Host-dependent |
|
||||
| tmpfs | Secrets, cache | Memory only |
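
A compact sketch combining the three types in one service (image name and paths are illustrative):

```yaml
services:
  app:
    image: myapp:1.0
    volumes:
      - app_data:/var/lib/app                 # named volume
      - ./config.yml:/etc/app/config.yml:ro   # bind mount
    tmpfs:
      - /tmp                                  # memory only

volumes:
  app_data:
```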
|
||||
268
skills/docker/references/compose.md
Normal file
@@ -0,0 +1,268 @@
|
||||
# Docker Compose Reference
|
||||
|
||||
## File Structure
|
||||
|
||||
```yaml
|
||||
name: project-name # Optional, defaults to directory name
|
||||
|
||||
services:
|
||||
service-name:
|
||||
# Image or build
|
||||
image: image:tag
|
||||
build:
|
||||
context: ./path
|
||||
dockerfile: Dockerfile
|
||||
|
||||
# Networking
|
||||
ports:
|
||||
- "host:container"
|
||||
networks:
|
||||
- network-name
|
||||
|
||||
# Storage
|
||||
volumes:
|
||||
- named-volume:/path
|
||||
- ./host-path:/container-path
|
||||
|
||||
# Environment
|
||||
environment:
|
||||
KEY: value
|
||||
env_file:
|
||||
- .env
|
||||
|
||||
# Dependencies
|
||||
depends_on:
|
||||
- other-service
|
||||
|
||||
# Lifecycle
|
||||
restart: unless-stopped
|
||||
healthcheck:
|
||||
test: ["CMD", "curl", "-f", "http://localhost"]
|
||||
interval: 30s
|
||||
timeout: 10s
|
||||
retries: 3
|
||||
|
||||
# Resources
|
||||
deploy:
|
||||
resources:
|
||||
limits:
|
||||
cpus: '0.5'
|
||||
memory: 512M
|
||||
reservations:
|
||||
memory: 256M
|
||||
|
||||
networks:
|
||||
network-name:
|
||||
driver: bridge
|
||||
|
||||
volumes:
|
||||
named-volume:
|
||||
```
|
||||
|
||||
## Service Options
|
||||
|
||||
### Image vs Build
|
||||
|
||||
```yaml
|
||||
# Use existing image
|
||||
image: nginx:1.25-alpine
|
||||
|
||||
# Build from Dockerfile
|
||||
build:
|
||||
context: .
|
||||
dockerfile: Dockerfile
|
||||
args:
|
||||
BUILD_ARG: value
|
||||
```
|
||||
|
||||
### Port Mapping
|
||||
|
||||
```yaml
|
||||
ports:
|
||||
- "80:80" # host:container
|
||||
- "443:443"
|
||||
- "127.0.0.1:8080:80" # localhost only
|
||||
- "8080-8090:8080-8090" # range
|
||||
```
|
||||
|
||||
### Environment Variables
|
||||
|
||||
```yaml
|
||||
# Inline
|
||||
environment:
|
||||
DATABASE_URL: postgres://db:5432/app
|
||||
DEBUG: "false"
|
||||
|
||||
# From file
|
||||
env_file:
|
||||
- .env
|
||||
- .env.local
|
||||
```
|
||||
|
||||
### Dependencies
|
||||
|
||||
```yaml
|
||||
depends_on:
|
||||
- db
|
||||
- redis
|
||||
|
||||
# With conditions (compose v2.1+)
|
||||
depends_on:
|
||||
db:
|
||||
condition: service_healthy
|
||||
```
|
||||
|
||||
### Restart Policies
|
||||
|
||||
| Policy | Behavior |
|
||||
|--------|----------|
|
||||
| no | Never restart (default) |
|
||||
| always | Always restart |
|
||||
| unless-stopped | Restart unless manually stopped |
|
||||
| on-failure | Restart only on error exit |
|
||||
|
||||
### Health Checks
|
||||
|
||||
```yaml
|
||||
healthcheck:
|
||||
test: ["CMD", "curl", "-f", "http://localhost/health"]
|
||||
interval: 30s # Time between checks
|
||||
timeout: 10s # Check timeout
|
||||
retries: 3 # Failures before unhealthy
|
||||
start_period: 40s # Grace period on startup
|
||||
```
|
||||
|
||||
### Resource Limits
|
||||
|
||||
```yaml
|
||||
deploy:
|
||||
resources:
|
||||
limits:
|
||||
cpus: '2'
|
||||
memory: 1G
|
||||
reservations:
|
||||
cpus: '0.5'
|
||||
memory: 256M
|
||||
```
|
||||
|
||||
## Network Configuration
|
||||
|
||||
### Custom Network
|
||||
|
||||
```yaml
|
||||
networks:
|
||||
frontend:
|
||||
driver: bridge
|
||||
backend:
|
||||
driver: bridge
|
||||
internal: true # No external access
|
||||
```
|
||||
|
||||
### External Network
|
||||
|
||||
```yaml
|
||||
networks:
|
||||
existing-network:
|
||||
external: true
|
||||
```
|
||||
|
||||
### Macvlan Network
|
||||
|
||||
```yaml
|
||||
networks:
|
||||
lan:
|
||||
driver: macvlan
|
||||
driver_opts:
|
||||
parent: eth0
|
||||
ipam:
|
||||
config:
|
||||
- subnet: 192.168.1.0/24
|
||||
gateway: 192.168.1.1
|
||||
```
|
||||
|
||||
## Volume Configuration
|
||||
|
||||
### Named Volume
|
||||
|
||||
```yaml
|
||||
volumes:
|
||||
data:
|
||||
driver: local
|
||||
|
||||
services:
|
||||
db:
|
||||
volumes:
|
||||
- data:/var/lib/mysql
|
||||
```
|
||||
|
||||
### Bind Mount
|
||||
|
||||
```yaml
|
||||
services:
|
||||
web:
|
||||
volumes:
|
||||
- ./config:/etc/app/config:ro
|
||||
- ./data:/app/data
|
||||
```
|
||||
|
||||
### tmpfs Mount
|
||||
|
||||
```yaml
|
||||
services:
|
||||
app:
|
||||
tmpfs:
|
||||
- /tmp
|
||||
- /run
|
||||
```
|
||||
|
||||
## Multi-Environment Setup
|
||||
|
||||
### Using .env Files
|
||||
|
||||
```bash
|
||||
# .env
|
||||
COMPOSE_PROJECT_NAME=myapp
|
||||
IMAGE_TAG=latest
|
||||
```
|
||||
|
||||
```yaml
|
||||
# docker-compose.yaml
|
||||
services:
|
||||
app:
|
||||
image: myapp:${IMAGE_TAG}
|
||||
```
|
||||
|
||||
### Override Files
|
||||
|
||||
```bash
|
||||
# Base config
|
||||
docker-compose.yaml
|
||||
|
||||
# Development overrides
|
||||
docker-compose.override.yaml # Auto-loaded
|
||||
|
||||
# Production
|
||||
docker compose -f docker-compose.yaml -f docker-compose.prod.yaml up
|
||||
```
|
||||
|
||||
## Useful Commands
|
||||
|
||||
```bash
|
||||
# Start with rebuild
|
||||
docker compose up -d --build
|
||||
|
||||
# Scale service
|
||||
docker compose up -d --scale web=3
|
||||
|
||||
# View config after variable substitution
|
||||
docker compose config
|
||||
|
||||
# Execute command in service
|
||||
docker compose exec web sh
|
||||
|
||||
# View service logs
|
||||
docker compose logs -f web
|
||||
|
||||
# Restart single service
|
||||
docker compose restart web
|
||||
```
|
||||
243
skills/docker/references/dockerfile.md
Normal file
@@ -0,0 +1,243 @@
|
||||
# Dockerfile Reference
|
||||
|
||||
## Basic Structure
|
||||
|
||||
```dockerfile
|
||||
# Base image
|
||||
FROM ubuntu:22.04
|
||||
|
||||
# Metadata
|
||||
LABEL maintainer="team@example.com"
|
||||
LABEL version="1.0"
|
||||
|
||||
# Environment
|
||||
ENV APP_HOME=/app
|
||||
WORKDIR $APP_HOME
|
||||
|
||||
# Install dependencies
|
||||
RUN apt-get update && apt-get install -y \
|
||||
package1 \
|
||||
package2 \
|
||||
&& rm -rf /var/lib/apt/lists/*
|
||||
|
||||
# Copy files
|
||||
COPY requirements.txt .
|
||||
RUN pip install -r requirements.txt
|
||||
COPY . .
|
||||
|
||||
# Non-root user
|
||||
RUN useradd -r -s /bin/false appuser
|
||||
USER appuser
|
||||
|
||||
# Expose port
|
||||
EXPOSE 8080
|
||||
|
||||
# Health check
|
||||
HEALTHCHECK --interval=30s --timeout=3s \
|
||||
CMD curl -f http://localhost:8080/health || exit 1
|
||||
|
||||
# Entry point
|
||||
ENTRYPOINT ["python"]
|
||||
CMD ["app.py"]
|
||||
```
|
||||
|
||||
## Multi-Stage Builds
|
||||
|
||||
Reduce final image size by separating build and runtime:
|
||||
|
||||
```dockerfile
|
||||
# Build stage
|
||||
FROM golang:1.21 AS builder
|
||||
WORKDIR /app
|
||||
COPY . .
|
||||
RUN go build -o myapp
|
||||
|
||||
# Runtime stage
|
||||
FROM alpine:3.18
|
||||
COPY --from=builder /app/myapp /usr/local/bin/
|
||||
CMD ["myapp"]
|
||||
```
|
||||
|
||||
## Common Base Images
|
||||
|
||||
| Image | Size | Use Case |
|
||||
|-------|------|----------|
|
||||
| alpine | ~5MB | Minimal, production |
|
||||
| debian:slim | ~80MB | Compatibility |
|
||||
| ubuntu | ~75MB | Development |
|
||||
| distroless | ~20MB | Security-focused |
|
||||
| scratch | 0MB | Static binaries only |
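
As a sketch of the `scratch` row: a statically linked binary (Go here, names illustrative) can ship with no OS layer at all:

```dockerfile
FROM golang:1.21 AS builder
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 go build -o myapp   # static binary, no libc needed

FROM scratch
COPY --from=builder /app/myapp /myapp
ENTRYPOINT ["/myapp"]
```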
|
||||
|
||||
## Instructions Reference
|
||||
|
||||
### FROM
|
||||
|
||||
```dockerfile
|
||||
FROM image:tag
|
||||
FROM image:tag AS builder
|
||||
FROM --platform=linux/amd64 image:tag
|
||||
```
|
||||
|
||||
### RUN
|
||||
|
||||
```dockerfile
|
||||
# Shell form
|
||||
RUN apt-get update && apt-get install -y package
|
||||
|
||||
# Exec form
|
||||
RUN ["executable", "param1", "param2"]
|
||||
```
|
||||
|
||||
### COPY vs ADD
|
||||
|
||||
```dockerfile
|
||||
# COPY - preferred for local files
|
||||
COPY ./src /app/src
|
||||
COPY --chown=user:group files /app/
|
||||
|
||||
# ADD - can extract tars, fetch URLs (use sparingly)
|
||||
ADD archive.tar.gz /app/
|
||||
```
|
||||
|
||||
### ENV vs ARG
|
||||
|
||||
```dockerfile
|
||||
# ARG - build-time only
|
||||
ARG VERSION=1.0
|
||||
|
||||
# ENV - persists in image
|
||||
ENV APP_VERSION=$VERSION
|
||||
```
|
||||
|
||||
### EXPOSE
|
||||
|
||||
```dockerfile
|
||||
EXPOSE 8080
|
||||
EXPOSE 443/tcp
|
||||
EXPOSE 53/udp
|
||||
```
|
||||
|
||||
Documentation only - doesn't publish ports.
|
||||
|
||||
### ENTRYPOINT vs CMD
|
||||
|
||||
```dockerfile
|
||||
# ENTRYPOINT - main executable
|
||||
ENTRYPOINT ["python"]
|
||||
|
||||
# CMD - default arguments (can be overridden)
|
||||
CMD ["app.py"]
|
||||
|
||||
# Combined: python app.py
|
||||
# Override: docker run image other.py -> python other.py
|
||||
```
|
||||
|
||||
### USER
|
||||
|
||||
```dockerfile
|
||||
RUN useradd -r -s /bin/false appuser
|
||||
USER appuser
|
||||
```
|
||||
|
||||
### HEALTHCHECK
|
||||
|
||||
```dockerfile
|
||||
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
|
||||
CMD curl -f http://localhost/health || exit 1
|
||||
```
|
||||
|
||||
## Best Practices
|
||||
|
||||
### Layer Caching
|
||||
|
||||
Order from least to most frequently changed:
|
||||
|
||||
```dockerfile
|
||||
# Rarely changes - cached
|
||||
FROM node:18-alpine
|
||||
WORKDIR /app
|
||||
|
||||
# Changes when deps change
|
||||
COPY package*.json ./
|
||||
RUN npm install
|
||||
|
||||
# Changes frequently - rebuild each time
|
||||
COPY . .
|
||||
```
|
||||
|
||||
### Reduce Layers
|
||||
|
||||
Combine RUN commands:
|
||||
|
||||
```dockerfile
|
||||
# Bad - 3 layers
|
||||
RUN apt-get update
|
||||
RUN apt-get install -y package
|
||||
RUN rm -rf /var/lib/apt/lists/*
|
||||
|
||||
# Good - 1 layer
|
||||
RUN apt-get update && \
|
||||
apt-get install -y package && \
|
||||
rm -rf /var/lib/apt/lists/*
|
||||
```
|
||||
|
||||
### Security
|
||||
|
||||
```dockerfile
|
||||
# Use specific tags
|
||||
FROM node:18.17.0-alpine # Not :latest
|
||||
|
||||
# Non-root user
|
||||
USER nobody
|
||||
|
||||
# Read-only filesystem
|
||||
# (Set at runtime with --read-only)
|
||||
|
||||
# No secrets in image
|
||||
# (Use build args or runtime secrets)
|
||||
```
|
||||
|
||||
### .dockerignore
|
||||
|
||||
```
|
||||
.git
|
||||
.gitignore
|
||||
node_modules
|
||||
*.log
|
||||
.env
|
||||
Dockerfile
|
||||
docker-compose.yaml
|
||||
README.md
|
||||
```
|
||||
|
||||
## Build Commands
|
||||
|
||||
```bash
|
||||
# Basic build
|
||||
docker build -t myimage:tag .
|
||||
|
||||
# With build args
|
||||
docker build --build-arg VERSION=1.0 -t myimage .
|
||||
|
||||
# No cache
|
||||
docker build --no-cache -t myimage .
|
||||
|
||||
# Specific Dockerfile
|
||||
docker build -f Dockerfile.prod -t myimage .
|
||||
|
||||
# Multi-platform
|
||||
docker buildx build --platform linux/amd64,linux/arm64 -t myimage .
|
||||
```
|
||||
|
||||
## Debugging Builds
|
||||
|
||||
```bash
|
||||
# Build with progress output
|
||||
docker build --progress=plain -t myimage .
|
||||
|
||||
# Inspect layers
|
||||
docker history myimage
|
||||
|
||||
# Check image size
|
||||
docker images myimage
|
||||
```
|
||||
229
skills/docker/references/networking.md
Normal file
@@ -0,0 +1,229 @@
|
||||
# Docker Networking Reference
|
||||
|
||||
## Network Drivers
|
||||
|
||||
### Bridge (Default)
|
||||
|
||||
Isolated container network with port mapping.
|
||||
|
||||
```yaml
|
||||
networks:
|
||||
app-network:
|
||||
driver: bridge
|
||||
```
|
||||
|
||||
- Containers get private IPs (172.17.0.0/16 default)
|
||||
- Port mapping exposes services (`-p 80:80`)
|
||||
- DNS resolution between containers by name
|
||||
- Default for single-host deployments
|
||||
|
||||
### Host
|
||||
|
||||
Container shares host network stack.
|
||||
|
||||
```yaml
|
||||
services:
|
||||
app:
|
||||
network_mode: host
|
||||
```
|
||||
|
||||
- No network isolation
|
||||
- No port mapping needed (container uses host ports)
|
||||
- Best performance (no NAT overhead)
|
||||
- Use for: Network tools, performance-critical apps
|
||||
|
||||
### Macvlan
|
||||
|
||||
Container gets own MAC address on physical network.
|
||||
|
||||
```yaml
|
||||
networks:
|
||||
lan:
|
||||
driver: macvlan
|
||||
driver_opts:
|
||||
parent: eth0
|
||||
ipam:
|
||||
config:
|
||||
- subnet: 192.168.1.0/24
|
||||
gateway: 192.168.1.1
|
||||
ip_range: 192.168.1.128/25
|
||||
```
|
||||
|
||||
- Container appears as physical device on LAN
|
||||
- Direct network access, no port mapping
|
||||
- Use for: Services needing LAN presence (DNS, DHCP)
|
||||
- Requires promiscuous mode on parent interface
|
||||
|
||||
### IPvlan
|
||||
|
||||
Like macvlan but shares host MAC address.
|
||||
|
||||
```yaml
|
||||
networks:
|
||||
lan:
|
||||
driver: ipvlan
|
||||
driver_opts:
|
||||
parent: eth0
|
||||
ipvlan_mode: l2 # or l3
|
||||
```
|
||||
|
||||
- L2 mode: Same subnet as host
|
||||
- L3 mode: Different subnet, requires routing
|
||||
- Use when: Macvlan blocked by switch, cloud environments
|
||||
|
||||
### None
|
||||
|
||||
No networking.
|
||||
|
||||
```yaml
|
||||
services:
|
||||
isolated:
|
||||
network_mode: none
|
||||
```
|
||||
|
||||
## Port Mapping
|
||||
|
||||
```yaml
|
||||
ports:
|
||||
# Simple mapping
|
||||
- "80:80"
|
||||
|
||||
# Different host port
|
||||
- "8080:80"
|
||||
|
||||
# Localhost only
|
||||
- "127.0.0.1:8080:80"
|
||||
|
||||
# UDP
|
||||
- "53:53/udp"
|
||||
|
||||
# Range
|
||||
- "8080-8090:8080-8090"
|
||||
|
||||
# Random host port
|
||||
- "80"
|
||||
```
|
||||
|
||||
## DNS and Service Discovery
|
||||
|
||||
### Automatic DNS
|
||||
|
||||
Containers on same network resolve each other by service name:
|
||||
|
||||
```yaml
|
||||
services:
|
||||
web:
|
||||
networks:
|
||||
- app
|
||||
db:
|
||||
networks:
|
||||
- app
|
||||
```
|
||||
|
||||
`web` can reach `db` at hostname `db`.
|
||||
|
||||
### Aliases
|
||||
|
||||
```yaml
|
||||
services:
|
||||
db:
|
||||
networks:
|
||||
app:
|
||||
aliases:
|
||||
- database
|
||||
- mysql
|
||||
```
|
||||
|
||||
### Custom DNS
|
||||
|
||||
```yaml
|
||||
services:
|
||||
app:
|
||||
dns:
|
||||
- 8.8.8.8
|
||||
- 8.8.4.4
|
||||
dns_search:
|
||||
- example.com
|
||||
```
|
||||
|
||||
## Network Isolation
|
||||
|
||||
### Internal Networks
|
||||
|
||||
No external connectivity:
|
||||
|
||||
```yaml
|
||||
networks:
|
||||
backend:
|
||||
internal: true
|
||||
```
|
||||
|
||||
### Multiple Networks
|
||||
|
||||
```yaml
|
||||
services:
|
||||
web:
|
||||
networks:
|
||||
- frontend
|
||||
- backend
|
||||
|
||||
db:
|
||||
networks:
|
||||
- backend # Not on frontend
|
||||
|
||||
networks:
|
||||
frontend:
|
||||
backend:
|
||||
internal: true
|
||||
```
|
||||
|
||||
## Static IPs
|
||||
|
||||
```yaml
|
||||
services:
|
||||
app:
|
||||
networks:
|
||||
app-network:
|
||||
ipv4_address: 172.20.0.10
|
||||
|
||||
networks:
|
||||
app-network:
|
||||
ipam:
|
||||
config:
|
||||
- subnet: 172.20.0.0/24
|
||||
```
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Inspect Network
|
||||
|
||||
```bash
|
||||
docker network ls
|
||||
docker network inspect <network>
|
||||
```
|
||||
|
||||
### Container Network Info
|
||||
|
||||
```bash
|
||||
docker inspect <container> --format '{{json .NetworkSettings.Networks}}'
|
||||
```
|
||||
|
||||
### Test Connectivity
|
||||
|
||||
```bash
|
||||
# From inside container
|
||||
docker exec <container> ping <target>
|
||||
docker exec <container> curl <url>
|
||||
|
||||
# Check DNS
|
||||
docker exec <container> nslookup <hostname>
|
||||
```
|
||||
|
||||
### Common Issues
|
||||
|
||||
| Problem | Check |
|
||||
|---------|-------|
|
||||
| Can't reach container | Port mapping, firewall, network attachment |
|
||||
| DNS not working | Same network, container running |
|
||||
| Slow network | Network mode, MTU settings |
|
||||
| Port already in use | `lsof -i :<port>`, change mapping |
|
||||
227
skills/docker/references/proxmox/hosting.md
Normal file
@@ -0,0 +1,227 @@
|
||||
# Docker on Proxmox VMs
|
||||
|
||||
Best practices for running Docker workloads on Proxmox VE.
|
||||
|
||||
## Template Selection
|
||||
|
||||
Use Docker-ready templates (102+) which have Docker pre-installed:
|
||||
|
||||
| Template ID | Name | Docker? |
|
||||
|-------------|------|---------|
|
||||
| 100 | tmpl-ubuntu-2404-base | No |
|
||||
| 101 | tmpl-ubuntu-2404-standard | No |
|
||||
| 102 | tmpl-ubuntu-2404-docker | Yes |
|
||||
| 103 | tmpl-ubuntu-2404-github-runner | Yes |
|
||||
| 104 | tmpl-ubuntu-2404-pihole | Yes |
|
||||
|
||||
**DO NOT** install Docker via cloud-init on templates 102+.
|
||||
|
||||
## VM vs LXC for Docker
|
||||
|
||||
| Factor | VM (QEMU) | LXC Unprivileged | LXC Privileged |
|
||||
|--------|-----------|------------------|----------------|
|
||||
| Docker support | Full | Limited | Works but risky |
|
||||
| Isolation | Complete | Shared kernel | Shared kernel |
|
||||
| Overhead | Higher | Lower | Lower |
|
||||
| Nested containers | Works | Requires config | Works |
|
||||
| GPU passthrough | Yes | Limited | Limited |
|
||||
| Security | Best | Good | Avoid |
|
||||
|
||||
**Recommendation:** Use VMs for Docker workloads. LXC adds complexity for marginal resource savings.
|
||||
|
||||
## VM Sizing for Docker
|
||||
|
||||
### Minimum for Docker host
|
||||
|
||||
```
|
||||
CPU: 2 cores
|
||||
RAM: 4 GB (2 GB for OS, 2 GB for containers)
|
||||
Disk: 50 GB (20 GB OS, 30 GB images/volumes)
|
||||
```
|
||||
|
||||
### Per-container overhead
|
||||
|
||||
```
|
||||
Base: ~10 MB RAM per container
|
||||
Image layers: Shared between containers
|
||||
Volumes: Depends on data
|
||||
```
|
||||
|
||||
### Sizing formula
|
||||
|
||||
```
|
||||
Total RAM = 2 GB (OS) + sum(container memory limits) + 20% buffer
|
||||
Total Disk = 20 GB (OS) + images + volumes + 20% buffer
|
||||
```
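
For example, three containers limited to 512 MB, 1 GB and 2 GB work out to 2 GB + 3.5 GB = 5.5 GB, plus a 20% buffer ≈ 6.6 GB, so an 8 GB VM is a comfortable fit.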
|
||||
|
||||
## Storage Backend Selection
|
||||
|
||||
| Proxmox Storage | Docker Use Case | Performance |
|
||||
|-----------------|-----------------|-------------|
|
||||
| local-lvm | General workloads | Good |
|
||||
| ZFS | Database containers | Better (snapshots) |
|
||||
| Ceph | HA workloads | Good (distributed) |
|
||||
| NFS | Shared config/data | Moderate |
|
||||
|
||||
### Volume mapping to Proxmox storage
|
||||
|
||||
```yaml
|
||||
# docker-compose.yaml
|
||||
volumes:
|
||||
db_data:
|
||||
driver: local
|
||||
driver_opts:
|
||||
type: none
|
||||
device: /mnt/storage/mysql # Map to Proxmox storage mount
|
||||
o: bind
|
||||
```
|
||||
|
||||
## Network Considerations
|
||||
|
||||
### Bridge mode (default)
|
||||
|
||||
Container gets private IP, NAT to VM IP. Good for most workloads.
|
||||
|
||||
```yaml
|
||||
services:
|
||||
web:
|
||||
ports:
|
||||
- "80:80" # VM_IP:80 -> container:80
|
||||
```
|
||||
|
||||
### Host mode
|
||||
|
||||
Container shares VM network stack. Use for network tools or performance.
|
||||
|
||||
```yaml
|
||||
services:
|
||||
pihole:
|
||||
network_mode: host # Container uses VM's IPs directly
|
||||
```
|
||||
|
||||
### Macvlan (direct LAN access)
|
||||
|
||||
Container gets own IP on Proxmox bridge.
|
||||
|
||||
```bash
|
||||
# On Docker host (VM)
|
||||
docker network create -d macvlan \
|
||||
--subnet=192.168.1.0/24 \
|
||||
--gateway=192.168.1.1 \
|
||||
-o parent=eth0 \
|
||||
lan
|
||||
```
|
||||
|
||||
```yaml
|
||||
services:
|
||||
app:
|
||||
networks:
|
||||
lan:
|
||||
ipv4_address: 192.168.1.50
|
||||
|
||||
networks:
|
||||
lan:
|
||||
external: true
|
||||
```
|
||||
|
||||
**Note:** Requires Proxmox bridge without VLAN tagging on that interface, or pass-through the VLAN-tagged interface to VM.
|
||||
|
||||
## Resource Limits
|
||||
|
||||
Always set limits to prevent container runaway affecting VM:
|
||||
|
||||
```yaml
|
||||
services:
|
||||
app:
|
||||
deploy:
|
||||
resources:
|
||||
limits:
|
||||
cpus: '2'
|
||||
memory: 2G
|
||||
reservations:
|
||||
cpus: '0.5'
|
||||
memory: 512M
|
||||
```
|
||||
|
||||
## GPU Passthrough
|
||||
|
||||
For containers needing GPU (AI/ML, transcoding):
|
||||
|
||||
1. **Proxmox:** Pass GPU to VM
|
||||
```
|
||||
hostpci0: 0000:01:00.0,pcie=1
|
||||
```
|
||||
|
||||
2. **VM:** Install NVIDIA drivers + nvidia-container-toolkit
|
||||
|
||||
3. **Compose:**
|
||||
```yaml
|
||||
services:
|
||||
plex:
|
||||
deploy:
|
||||
resources:
|
||||
reservations:
|
||||
devices:
|
||||
- driver: nvidia
|
||||
count: 1
|
||||
capabilities: [gpu]
|
||||
```
|
||||
|
||||
## Backup Considerations
|
||||
|
||||
### What to backup
|
||||
|
||||
| Data | Method | Location |
|
||||
|------|--------|----------|
|
||||
| VM disk | Proxmox vzdump | Includes everything |
|
||||
| Docker volumes | docker run --volumes-from | Application-level |
|
||||
| Compose files | Git | Version control |
|
||||
|
||||
### Proxmox backup includes Docker
|
||||
|
||||
When backing up the VM with vzdump, all Docker data (images, volumes, containers) is included.
|
||||
|
||||
```bash
|
||||
vzdump <vmid> --mode snapshot --storage backup
|
||||
```
|
||||
|
||||
### Application-consistent backups
|
||||
|
||||
For databases, use pre/post scripts:
|
||||
|
||||
```bash
|
||||
# Pre-backup: flush and lock
|
||||
docker exec mysql mysql -e "FLUSH TABLES WITH READ LOCK;"
|
||||
|
||||
# vzdump runs...
|
||||
|
||||
# Post-backup: unlock
|
||||
docker exec mysql mysql -e "UNLOCK TABLES;"
|
||||
```
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Container can't reach internet
|
||||
|
||||
1. Check VM can reach internet: `ping 8.8.8.8`
|
||||
2. Check Docker DNS: `docker run --rm alpine nslookup google.com`
|
||||
3. Check iptables forwarding: `sysctl net.ipv4.ip_forward`
|
||||
|
||||
### Port not accessible from LAN
|
||||
|
||||
1. Check Proxmox firewall allows port
|
||||
2. Check VM firewall (ufw/iptables)
|
||||
3. Check container is bound to 0.0.0.0 not 127.0.0.1
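
Two quick checks from inside the VM for the binding question (assuming `ss` is available):

```bash
docker port <container>    # published ports and the address they bind to
ss -tlnp | grep <port>     # confirm something listens on 0.0.0.0:<port>, not 127.0.0.1
```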
|
||||
|
||||
### Disk space issues
|
||||
|
||||
```bash
|
||||
# Check Docker disk usage
|
||||
docker system df
|
||||
|
||||
# Clean up
|
||||
docker system prune -a --volumes # WARNING: removes all unused data
|
||||
|
||||
# Check VM disk
|
||||
df -h
|
||||
```
|
||||
140
skills/docker/references/proxmox/lxc-vs-docker.md
Normal file
@@ -0,0 +1,140 @@
|
||||
# LXC vs Docker Containers
|
||||
|
||||
Understanding when to use Proxmox LXC containers vs Docker containers.
|
||||
|
||||
## Fundamental Differences
|
||||
|
||||
| Aspect | LXC (Proxmox) | Docker |
|
||||
|--------|---------------|--------|
|
||||
| Abstraction | System container (full OS) | Application container |
|
||||
| Init system | systemd, runit, etc. | Single process (PID 1) |
|
||||
| Management | Proxmox (pct) | Docker daemon |
|
||||
| Persistence | Stateful by default | Ephemeral by default |
|
||||
| Updates | apt/yum inside container | Replace container |
|
||||
| Networking | Proxmox managed | Docker managed |
|
||||
|
||||
## When to Use LXC
|
||||
|
||||
- **Long-running services** with traditional management (systemd, cron)
|
||||
- **Multi-process applications** that expect init system
|
||||
- **Legacy apps** not designed for containers
|
||||
- **Dev/test environments** mimicking full VMs
|
||||
- **Resource efficiency** when full VM isolation not needed
|
||||
- **Direct Proxmox management** (backup, snapshots, migration)
|
||||
|
||||
```bash
|
||||
# Create LXC
|
||||
pct create 200 local:vztmpl/ubuntu-22.04-standard_22.04-1_amd64.tar.zst \
|
||||
--hostname mycontainer \
|
||||
--storage local-lvm \
|
||||
--rootfs local-lvm:8 \
|
||||
--cores 2 \
|
||||
--memory 2048 \
|
||||
--net0 name=eth0,bridge=vmbr0,ip=dhcp
|
||||
```
|
||||
|
||||
## When to Use Docker
|
||||
|
||||
- **Microservices** with single responsibility
|
||||
- **CI/CD pipelines** with reproducible builds
|
||||
- **Rapid deployment** and scaling
|
||||
- **Application isolation** within a host
|
||||
- **Compose stacks** with multi-container apps
|
||||
- **Ecosystem tooling** (registries, orchestration)
|
||||
|
||||
```yaml
|
||||
# docker-compose.yaml
|
||||
services:
|
||||
app:
|
||||
image: myapp:1.0
|
||||
restart: unless-stopped
|
||||
```
|
||||
|
||||
## Decision Matrix
|
||||
|
||||
| Scenario | Recommendation | Rationale |
|
||||
|----------|---------------|-----------|
|
||||
| Pi-hole | Docker on VM | Easy updates, compose ecosystem |
|
||||
| Database server | LXC or VM | Stateful, traditional management |
|
||||
| Web app microservice | Docker | Ephemeral, scalable |
|
||||
| Development environment | LXC | Full OS, multiple services |
|
||||
| CI runner | Docker on VM | Isolation, reproducibility |
|
||||
| Network appliance | LXC | Direct network access, systemd |
|
||||
| Home automation | Docker on VM | Compose stacks, easy backup |
|
||||
|
||||
## Hybrid Approach
|
||||
|
||||
Common pattern: **VM runs Docker**, managed by Proxmox.
|
||||
|
||||
```
|
||||
Proxmox Node
|
||||
├── VM: docker-host-1 (template 102)
|
||||
│ ├── Container: nginx
|
||||
│ ├── Container: app
|
||||
│ └── Container: redis
|
||||
├── VM: docker-host-2 (template 102)
|
||||
│ ├── Container: postgres
|
||||
│ └── Container: backup
|
||||
└── LXC: pihole (direct network)
|
||||
```
|
||||
|
||||
Benefits:
|
||||
- Proxmox handles VM-level backup/migration
|
||||
- Docker handles application deployment
|
||||
- Clear separation of concerns
|
||||
|
||||
## Docker in LXC (Not Recommended)
|
||||
|
||||
Running Docker inside LXC is possible but adds complexity:
|
||||
|
||||
### Requirements
|
||||
|
||||
1. Privileged container OR nested containers enabled
|
||||
2. AppArmor profile modifications
|
||||
3. Keyctl feature enabled
|
||||
|
||||
```bash
|
||||
# LXC config (Proxmox)
|
||||
lxc.apparmor.profile: unconfined
|
||||
lxc.cgroup.devices.allow: a
|
||||
lxc.cap.drop:
|
||||
features: keyctl=1,nesting=1
|
||||
```
|
||||
|
||||
### Issues
|
||||
|
||||
- Security: Reduced isolation
|
||||
- Compatibility: Some Docker features broken
|
||||
- Debugging: Two container layers
|
||||
- Backup: More complex
|
||||
|
||||
**Recommendation:** Use VM with Docker instead.
|
||||
|
||||
## Resource Comparison
|
||||
|
||||
For equivalent workload:
|
||||
|
||||
| Resource | VM + Docker | LXC | Docker in LXC |
|
||||
|----------|-------------|-----|---------------|
|
||||
| RAM overhead | ~500 MB | ~50 MB | ~100 MB |
|
||||
| Disk overhead | ~5 GB | ~500 MB | ~1 GB |
|
||||
| Boot time | 30-60s | 2-5s | 5-10s |
|
||||
| Isolation | Full | Shared kernel | Shared kernel |
|
||||
| Complexity | Low | Low | High |
|
||||
|
||||
## Migration Paths
|
||||
|
||||
### LXC to Docker
|
||||
|
||||
1. Export application config from LXC
|
||||
2. Create Dockerfile/compose
|
||||
3. Build image
|
||||
4. Deploy to Docker host
|
||||
5. Migrate data volumes
|
||||
|
||||
### Docker to LXC
|
||||
|
||||
1. Install service directly in LXC (apt/yum)
|
||||
2. Configure with systemd
|
||||
3. Migrate data
|
||||
4. Update Proxmox firewall rules
|
||||
212
skills/docker/references/troubleshooting.md
Normal file
@@ -0,0 +1,212 @@
|
||||
# Docker Troubleshooting Reference
|
||||
|
||||
## Common Errors
|
||||
|
||||
| Error | Cause | Solution |
|
||||
|-------|-------|----------|
|
||||
| Container exits immediately | Bad entrypoint, missing deps | Check logs, verify CMD |
|
||||
| Port already in use | Conflict with host/other container | `lsof -i :<port>`, change mapping |
|
||||
| Volume permission denied | UID mismatch | Check ownership, use named volumes |
|
||||
| Network not found | Network removed/not created | `docker network create` |
|
||||
| Image pull failed | Registry/auth/name issue | Check registry, credentials, name |
|
||||
| OOM killed | Exceeded memory limit | Increase limit or optimize app |
|
||||
| DNS resolution failed | Network config issue | Check DNS settings, network mode |
|
||||
| Health check failing | App not responding | Check command, increase timeout |
|
||||
|
||||
## Diagnostic Commands
|
||||
|
||||
### Container Status
|
||||
|
||||
```bash
|
||||
# List all containers (including stopped)
|
||||
docker ps -a
|
||||
|
||||
# Check exit code
|
||||
docker inspect <container> --format '{{.State.ExitCode}}'
|
||||
|
||||
# Check restart count
|
||||
docker inspect <container> --format '{{.RestartCount}}'
|
||||
```
|
||||
|
||||
### Logs
|
||||
|
||||
```bash
|
||||
# View logs
|
||||
docker logs <container>
|
||||
|
||||
# Follow logs
|
||||
docker logs -f <container>
|
||||
|
||||
# Last N lines
|
||||
docker logs --tail 100 <container>
|
||||
|
||||
# With timestamps
|
||||
docker logs -t <container>
|
||||
|
||||
# Since time
|
||||
docker logs --since 10m <container>
|
||||
```
|
||||
|
||||
### Resource Usage
|
||||
|
||||
```bash
|
||||
# Real-time stats
|
||||
docker stats
|
||||
|
||||
# Single container
|
||||
docker stats <container>
|
||||
|
||||
# Disk usage
|
||||
docker system df
|
||||
docker system df -v # Verbose
|
||||
```
|
||||
|
||||
### Container Details
|
||||
|
||||
```bash
|
||||
# Full inspection
|
||||
docker inspect <container>
|
||||
|
||||
# Specific fields
|
||||
docker inspect <container> --format '{{.State.Status}}'
|
||||
docker inspect <container> --format '{{json .NetworkSettings.Networks}}'
|
||||
docker inspect <container> --format '{{.Mounts}}'
|
||||
```
|
||||
|
||||
### Process and Network
|
||||
|
||||
```bash
|
||||
# Running processes
|
||||
docker top <container>
|
||||
|
||||
# Execute command
|
||||
docker exec <container> ps aux
|
||||
docker exec <container> netstat -tlnp
|
||||
|
||||
# Network connectivity
|
||||
docker exec <container> ping <host>
|
||||
docker exec <container> curl <url>
|
||||
docker exec <container> nslookup <hostname>
|
||||
```
|
||||
|
||||
## Troubleshooting Workflows
|
||||
|
||||
### Container Won't Start
|
||||
|
||||
1. Check logs: `docker logs <container>`
|
||||
2. Check exit code: `docker inspect <container> --format '{{.State.ExitCode}}'`
|
||||
3. Run interactively: `docker run -it <image> sh`
|
||||
4. Check entrypoint/cmd: `docker inspect <image> --format '{{.Config.Cmd}}'`
|
||||
|
||||
### Container Keeps Restarting
|
||||
|
||||
1. Check logs for errors
|
||||
2. Verify health check if configured
|
||||
3. Check resource limits (OOM; see the sketch below)
|
||||
4. Test entrypoint manually
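
A quick sketch of checks 1 and 3, assuming a container named `myapp`:

```bash
# Restart count, last exit code, and whether the kernel OOM-killed it
docker inspect myapp --format '{{.RestartCount}} {{.State.ExitCode}} {{.State.OOMKilled}}'
# Recent log output around the crashes
docker logs --tail 50 myapp
```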
|
||||
|
||||
### Network Issues
|
||||
|
||||
1. Verify network exists: `docker network ls`
|
||||
2. Check container attached: `docker inspect <container> --format '{{.NetworkSettings.Networks}}'`
|
||||
3. Test DNS: `docker exec <container> nslookup <service>`
|
||||
4. Check port mapping: `docker port <container>`
|
||||
|
||||
### Volume Issues
|
||||
|
||||
1. Check mount: `docker inspect <container> --format '{{.Mounts}}'`
|
||||
2. Verify permissions inside: `docker exec <container> ls -la /path`
|
||||
3. Check host path exists (bind mounts)
|
||||
4. Try named volume instead
|
||||
|
||||
### Performance Issues
|
||||
|
||||
1. Check resource usage: `docker stats`
|
||||
2. Review limits: `docker inspect <container> --format '{{.HostConfig.Memory}}'`
|
||||
3. Check for resource contention
|
||||
4. Profile application inside container
|
||||
|
||||
## Cleanup
|
||||
|
||||
```bash
|
||||
# Remove stopped containers
|
||||
docker container prune
|
||||
|
||||
# Remove unused images
|
||||
docker image prune
|
||||
|
||||
# Remove unused volumes
|
||||
docker volume prune
|
||||
|
||||
# Remove unused networks
|
||||
docker network prune
|
||||
|
||||
# Remove everything unused
|
||||
docker system prune -a --volumes
|
||||
```
|
||||
|
||||
## Debugging Compose
|
||||
|
||||
```bash
|
||||
# Validate compose file
|
||||
docker compose config
|
||||
|
||||
# See what would run
|
||||
docker compose config --services
|
||||
|
||||
# Check why service isn't starting
|
||||
docker compose logs <service>
|
||||
|
||||
# Force recreate
|
||||
docker compose up -d --force-recreate
|
||||
|
||||
# Rebuild images
|
||||
docker compose up -d --build
|
||||
```
|
||||
|
||||
## Common Compose Issues
|
||||
|
||||
| Problem | Check |
|
||||
|---------|-------|
|
||||
| Service not starting | `docker compose logs <service>` |
|
||||
| depends_on not working | Service starts but app not ready (use healthcheck) |
|
||||
| Volume not persisting | Check the volume name and that it is not recreated between runs |
|
||||
| Env vars not loading | Check .env file location, syntax |
|
||||
| Network errors | Check network names, external networks |
|
||||
|
||||
## Health Check Debugging
|
||||
|
||||
```bash
|
||||
# Check health status
|
||||
docker inspect <container> --format '{{.State.Health.Status}}'
|
||||
|
||||
# View health log
|
||||
docker inspect <container> --format '{{json .State.Health}}' | jq
|
||||
|
||||
# Test health command manually
|
||||
docker exec <container> <health-command>
|
||||
```
|
||||
|
||||
## Emergency Recovery
|
||||
|
||||
### Force Stop
|
||||
|
||||
```bash
|
||||
docker kill <container>
|
||||
```
|
||||
|
||||
### Remove Stuck Container
|
||||
|
||||
```bash
|
||||
docker rm -f <container>
|
||||
```
|
||||
|
||||
### Reset Docker
|
||||
|
||||
```bash
|
||||
# Restart Docker daemon
|
||||
sudo systemctl restart docker
|
||||
|
||||
# Or on macOS
|
||||
# Restart Docker Desktop
|
||||
```
|
||||
230
skills/docker/references/volumes.md
Normal file
@@ -0,0 +1,230 @@
|
||||
# Docker Volumes Reference
|
||||
|
||||
## Volume Types
|
||||
|
||||
### Named Volumes (Recommended)
|
||||
|
||||
Managed by Docker, stored in `/var/lib/docker/volumes/`.
|
||||
|
||||
```yaml
|
||||
volumes:
|
||||
db-data:
|
||||
|
||||
services:
|
||||
db:
|
||||
volumes:
|
||||
- db-data:/var/lib/mysql
|
||||
```
|
||||
|
||||
Benefits:
|
||||
- Portable across hosts
|
||||
- Backup-friendly
|
||||
- Fewer permission issues than bind mounts
|
||||
- Can use volume drivers (NFS, etc.)
|
||||
|
||||
### Bind Mounts
|
||||
|
||||
Direct host path mapping.
|
||||
|
||||
```yaml
|
||||
services:
|
||||
web:
|
||||
volumes:
|
||||
- ./config:/etc/app/config:ro
|
||||
- /host/data:/container/data
|
||||
```
|
||||
|
||||
Benefits:
|
||||
- Direct file access from host
|
||||
- Development workflow (live reload)
|
||||
- Access to host files
|
||||
|
||||
Drawbacks:
|
||||
- Host-dependent paths
|
||||
- Permission issues possible
|
||||
- Less portable
|
||||
|
||||
### tmpfs Mounts
|
||||
|
||||
In-memory storage (Linux only).
|
||||
|
||||
```yaml
|
||||
services:
|
||||
app:
|
||||
tmpfs:
|
||||
- /tmp
|
||||
- /run:size=100m
|
||||
```
|
||||
|
||||
Benefits:
|
||||
- Fast (RAM-based)
|
||||
- Secure (not persisted)
|
||||
- Good for secrets, cache
|
||||
|
||||
## Volume Options
|
||||
|
||||
### Read-Only
|
||||
|
||||
```yaml
|
||||
volumes:
|
||||
- ./config:/etc/app/config:ro
|
||||
```
|
||||
|
||||
### Bind Propagation
|
||||
|
||||
```yaml
|
||||
volumes:
|
||||
- type: bind
|
||||
source: ./data
|
||||
target: /data
|
||||
bind:
|
||||
propagation: rslave
|
||||
```
|
||||
|
||||
### Volume Driver Options
|
||||
|
||||
```yaml
|
||||
volumes:
|
||||
nfs-data:
|
||||
driver: local
|
||||
driver_opts:
|
||||
type: nfs
|
||||
o: addr=192.168.1.100,rw
|
||||
device: ":/export/data"
|
||||
```
|
||||
|
||||
## Common Patterns
|
||||
|
||||
### Database Data
|
||||
|
||||
```yaml
|
||||
services:
|
||||
postgres:
|
||||
image: postgres:15
|
||||
volumes:
|
||||
- pgdata:/var/lib/postgresql/data
|
||||
environment:
|
||||
POSTGRES_PASSWORD: secret
|
||||
|
||||
volumes:
|
||||
pgdata:
|
||||
```
|
||||
|
||||
### Configuration Files
|
||||
|
||||
```yaml
|
||||
services:
|
||||
nginx:
|
||||
image: nginx:alpine
|
||||
volumes:
|
||||
- ./nginx.conf:/etc/nginx/nginx.conf:ro
|
||||
- ./html:/usr/share/nginx/html:ro
|
||||
```
|
||||
|
||||
### Shared Data Between Services
|
||||
|
||||
```yaml
|
||||
services:
|
||||
app:
|
||||
volumes:
|
||||
- shared:/data
|
||||
|
||||
worker:
|
||||
volumes:
|
||||
- shared:/data
|
||||
|
||||
volumes:
|
||||
shared:
|
||||
```
|
||||
|
||||
### Log Persistence
|
||||
|
||||
```yaml
|
||||
services:
|
||||
app:
|
||||
volumes:
|
||||
- logs:/var/log/app
|
||||
|
||||
volumes:
|
||||
logs:
|
||||
```
|
||||
|
||||
## Backup and Restore
|
||||
|
||||
### Backup Named Volume
|
||||
|
||||
```bash
|
||||
# Create backup
|
||||
docker run --rm \
|
||||
-v myvolume:/source:ro \
|
||||
-v $(pwd):/backup \
|
||||
alpine tar czf /backup/myvolume.tar.gz -C /source .
|
||||
|
||||
# Restore backup
|
||||
docker run --rm \
|
||||
-v myvolume:/target \
|
||||
-v $(pwd):/backup \
|
||||
alpine tar xzf /backup/myvolume.tar.gz -C /target
|
||||
```
|
||||
|
||||
### Copy Files from Volume
|
||||
|
||||
```bash
|
||||
docker cp <container>:/path/to/file ./local-file
|
||||
```
|
||||
|
||||
## Volume Management
|
||||
|
||||
```bash
|
||||
# List volumes
|
||||
docker volume ls
|
||||
|
||||
# Inspect volume
|
||||
docker volume inspect <volume>
|
||||
|
||||
# Remove unused volumes
|
||||
docker volume prune
|
||||
|
||||
# Remove specific volume
|
||||
docker volume rm <volume>
|
||||
|
||||
# Create volume manually
|
||||
docker volume create --name myvolume
|
||||
```
|
||||
|
||||
## Permissions
|
||||
|
||||
### Common Permission Issues
|
||||
|
||||
```bash
|
||||
# Check container user
|
||||
docker exec <container> id
|
||||
|
||||
# Check volume permissions
|
||||
docker exec <container> ls -la /data
|
||||
```
|
||||
|
||||
### Solutions
|
||||
|
||||
```yaml
|
||||
# Run as specific user
|
||||
services:
|
||||
app:
|
||||
user: "1000:1000"
|
||||
volumes:
|
||||
- ./data:/data
|
||||
```
|
||||
|
||||
Or fix host permissions:
|
||||
```bash
|
||||
chown -R 1000:1000 ./data
|
||||
```
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Use named volumes for data** - More portable than bind mounts
|
||||
2. **Read-only when possible** - Use `:ro` for config files
|
||||
3. **Separate concerns** - Different volumes for data, config, logs
|
||||
4. **Backup strategy** - Plan for volume backup/restore
|
||||
5. **Don't store in image** - Data should be in volumes, not image layers
|
||||
6. **Use .dockerignore** - Exclude data directories from build context
|
||||
95
skills/proxmox/SKILL.md
Normal file
@@ -0,0 +1,95 @@
|
||||
---
|
||||
name: proxmox
|
||||
description: |
|
||||
Proxmox VE virtualization platform reference for VM/LXC management, clustering,
|
||||
storage, and networking. Includes Terraform and Ansible integration patterns.
|
||||
Use when working with Proxmox configurations, CLI commands, troubleshooting
|
||||
VMs/containers, or planning resource allocation.
|
||||
Triggers: proxmox, qemu, kvm, lxc, pve, vm, container, cluster, vzdump, qm, pct.
|
||||
---
|
||||
|
||||
# Proxmox Skill
|
||||
|
||||
Proxmox VE virtualization platform reference for VM management, containers, clustering, and homelab infrastructure.
|
||||
|
||||
## Quick Reference
|
||||
|
||||
```bash
|
||||
# VM management (qm)
|
||||
qm list # List all VMs
|
||||
qm status <vmid> # Check VM status
|
||||
qm start <vmid> # Start VM
|
||||
qm stop <vmid> # Stop VM (graceful)
|
||||
qm shutdown <vmid> # Shutdown VM (ACPI)
|
||||
qm unlock <vmid> # Remove lock
|
||||
qm config <vmid> # Show VM config
|
||||
|
||||
# Container management (pct)
|
||||
pct list # List all containers
|
||||
pct status <ctid> # Check container status
|
||||
pct start <ctid> # Start container
|
||||
pct stop <ctid> # Stop container
|
||||
pct enter <ctid> # Enter container shell
|
||||
|
||||
# Cluster management (pvecm)
|
||||
pvecm status # Cluster status and quorum
|
||||
pvecm nodes # List cluster nodes
|
||||
|
||||
# API shell (pvesh)
|
||||
pvesh get /nodes # List nodes via API
|
||||
pvesh get /nodes/<node>/status # Node resource status
|
||||
|
||||
# Backup (vzdump)
|
||||
vzdump <vmid> --mode snapshot --storage <storage>
|
||||
vzdump --all --compress zstd
|
||||
```
|
||||
|
||||
## Reference Files
|
||||
|
||||
Load on-demand based on task:
|
||||
|
||||
| Topic | File | When to Load |
|
||||
|-------|------|--------------|
|
||||
| VM vs LXC | [vm-lxc.md](references/vm-lxc.md) | Choosing virtualization type |
|
||||
| Docker Hosting | [docker-hosting.md](references/docker-hosting.md) | Running Docker on Proxmox |
|
||||
| Networking | [networking.md](references/networking.md) | Bridges, VLANs, SDN, firewall |
|
||||
| Storage | [storage.md](references/storage.md) | Storage backends, content types |
|
||||
| Clustering | [clustering.md](references/clustering.md) | HA, quorum, fencing |
|
||||
| Backup | [backup.md](references/backup.md) | vzdump modes, restore |
|
||||
| CLI Tools | [cli-tools.md](references/cli-tools.md) | qm, pct, pvecm, pvesh commands |
|
||||
| Troubleshooting | [troubleshooting.md](references/troubleshooting.md) | Common errors, diagnostics |
|
||||
| Automation Tools | [automation-tools.md](references/automation-tools.md) | Terraform/Ansible integration |
|
||||
|
||||
## Validation Checklist
|
||||
|
||||
Before deploying VMs/containers:
|
||||
|
||||
- [ ] Cluster status healthy (`pvecm status`)
|
||||
- [ ] Node resources available (CPU, RAM, disk)
|
||||
- [ ] Storage accessible and mounted
|
||||
- [ ] Network bridges configured correctly
|
||||
- [ ] VLAN tags match network design
|
||||
- [ ] Resource allocation within node limits
|
||||
- [ ] HA configuration correct (if enabled)
|
||||
- [ ] Backup schedule in place
|
||||
- [ ] Naming convention followed
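
Most of these checks map directly to CLI commands; a quick pre-flight sketch, assuming a node named `joseph`:

```bash
pvecm status                     # cluster health and quorum
pvesh get /nodes/joseph/status   # CPU, RAM, and load on the target node
pvesm status                     # storage availability
bridge vlan show                 # VLAN tags on the bridges
```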
|
||||
|
||||
## VM vs LXC Quick Decision
|
||||
|
||||
| Factor | Use VM | Use LXC |
|
||||
|--------|--------|---------|
|
||||
| OS | Windows, BSD, any | Linux only |
|
||||
| Isolation | Full kernel isolation | Shared kernel |
|
||||
| Performance | Good | Better (lighter) |
|
||||
| Startup | Slower | Fast |
|
||||
| Density | Lower | Higher |
|
||||
| Complexity | Any workload | Simple services |
|
||||
|
||||
## Homelab Network VLANs
|
||||
|
||||
| VLAN | Purpose | Proxmox Bridge |
|
||||
|------|---------|----------------|
|
||||
| 5 | Management (Web UI, API, SSH) | vmbr5 |
|
||||
| 1 | Trusted network | vmbr0 |
|
||||
| 11 | Storage (NFS/Ceph, MTU 9000) | vmbr11 |
|
||||
| 12 | High-speed transfers | vmbr12 |
|
||||
179
skills/proxmox/references/automation-tools.md
Normal file
@@ -0,0 +1,179 @@
|
||||
# Proxmox Automation Tools
|
||||
|
||||
Integration patterns for managing Proxmox with Terraform and Ansible.
|
||||
|
||||
## Tool Selection Guide
|
||||
|
||||
| Task | Recommended Tool | Rationale |
|
||||
|------|-----------------|-----------|
|
||||
| VM/LXC provisioning | Terraform | Declarative state, idempotent, handles dependencies |
|
||||
| Template creation | Packer | Repeatable builds, version-controlled |
|
||||
| Post-boot configuration | Ansible | Agent-based, procedural, good for drift |
|
||||
| One-off VM operations | Ansible | Quick tasks, no state file needed |
|
||||
| Dynamic inventory | Ansible | Query running VMs for configuration |
|
||||
| Bulk VM creation | Terraform | count/for_each, parallel creation |
|
||||
| Snapshot management | Either | Terraform for lifecycle, Ansible for ad-hoc |
|
||||
| Cluster administration | CLI/API | Direct access for maintenance tasks |
|
||||
|
||||
## Terraform Integration
|
||||
|
||||
### Provider
|
||||
|
||||
```hcl
|
||||
terraform {
|
||||
required_providers {
|
||||
proxmox = {
|
||||
source = "telmate/proxmox"
|
||||
version = "~> 3.0"
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
provider "proxmox" {
|
||||
pm_api_url = "https://proxmox.example.com:8006/api2/json"
|
||||
pm_api_token_id = "terraform@pve!mytoken"
|
||||
pm_api_token_secret = var.pm_api_token_secret
|
||||
}
|
||||
```
|
||||
|
||||
### Common Patterns
|
||||
|
||||
```hcl
|
||||
# Clone from template
|
||||
resource "proxmox_vm_qemu" "vm" {
|
||||
name = "myvm"
|
||||
target_node = "joseph"
|
||||
clone = "tmpl-ubuntu-2404-standard"
|
||||
full_clone = true
|
||||
|
||||
cores = 2
|
||||
memory = 4096
|
||||
|
||||
disks {
|
||||
scsi {
|
||||
scsi0 {
|
||||
disk {
|
||||
storage = "local-lvm"
|
||||
size = "50G"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Skill Reference
|
||||
|
||||
Load terraform skill for detailed patterns:
|
||||
- `terraform/references/proxmox/gotchas.md` - Critical issues
|
||||
- `terraform/references/proxmox/vm-qemu.md` - VM resource patterns
|
||||
- `terraform/references/proxmox/authentication.md` - API setup
|
||||
|
||||
## Ansible Integration
|
||||
|
||||
### Collection
|
||||
|
||||
```bash
|
||||
ansible-galaxy collection install community.general
|
||||
```
|
||||
|
||||
### Common Patterns
|
||||
|
||||
```yaml
|
||||
# Clone VM
|
||||
- name: Clone from template
|
||||
community.general.proxmox_kvm:
|
||||
api_host: proxmox.example.com
|
||||
api_user: ansible@pve
|
||||
api_token_id: mytoken
|
||||
api_token_secret: "{{ proxmox_token_secret }}"
|
||||
node: joseph
|
||||
vmid: 300
|
||||
name: myvm
|
||||
clone: tmpl-ubuntu-2404-standard
|
||||
full: true
|
||||
timeout: 500
|
||||
|
||||
# Start VM
|
||||
- name: Start VM
|
||||
community.general.proxmox_kvm:
|
||||
# ... auth ...
|
||||
vmid: 300
|
||||
state: started
|
||||
```
|
||||
|
||||
### Skill Reference
|
||||
|
||||
Load ansible skill for detailed patterns:
|
||||
- `ansible/references/proxmox/modules.md` - All Proxmox modules
|
||||
- `ansible/references/proxmox/gotchas.md` - Common issues
|
||||
- `ansible/references/proxmox/dynamic-inventory.md` - Auto-discovery
|
||||
|
||||
## Terraform vs Ansible Decision
|
||||
|
||||
### Use Terraform When
|
||||
|
||||
- Creating infrastructure from scratch
|
||||
- Managing VM lifecycle (create, update, destroy)
|
||||
- Need state tracking and drift detection
|
||||
- Deploying multiple similar VMs (for_each)
|
||||
- Complex dependencies between resources
|
||||
- Team collaboration with state locking
|
||||
|
||||
### Use Ansible When
|
||||
|
||||
- Configuring VMs after creation
|
||||
- Ad-hoc operations (start/stop specific VMs)
|
||||
- Dynamic inventory needed for other playbooks
|
||||
- Quick one-off tasks
|
||||
- No state file management desired
|
||||
- Integration with existing Ansible workflows
|
||||
|
||||
### Use Both When
|
||||
|
||||
- Terraform provisions VMs
|
||||
- Ansible configures them post-boot
|
||||
- Ansible uses Proxmox dynamic inventory to find Terraform-created VMs
|
||||
|
||||
## Hybrid Workflow Example
|
||||
|
||||
```
|
||||
1. Packer builds VM template
|
||||
└── packer build ubuntu-2404.pkr.hcl
|
||||
|
||||
2. Terraform provisions VMs from template
|
||||
└── terraform apply
|
||||
└── Outputs: VM IPs, hostnames
|
||||
|
||||
3. Ansible configures VMs
|
||||
└── Uses Proxmox dynamic inventory OR
|
||||
└── Uses Terraform output as inventory
|
||||
|
||||
4. Ongoing management
|
||||
└── Terraform for infrastructure changes
|
||||
└── Ansible for configuration drift
|
||||
```
|
||||
|
||||
## API Token Sharing
|
||||
|
||||
Both tools can share the same API token:
|
||||
|
||||
```bash
|
||||
# Create shared token
|
||||
pveum user add automation@pve
|
||||
pveum aclmod / -user automation@pve -role PVEAdmin
|
||||
pveum user token add automation@pve shared --privsep 0
|
||||
```
|
||||
|
||||
Store in shared secrets management (1Password, Vault, etc.).
|
||||
|
||||
## Common Gotchas
|
||||
|
||||
| Issue | Terraform | Ansible |
|
||||
|-------|-----------|---------|
|
||||
| VMID | Auto-assigns if not specified | Must specify manually |
|
||||
| Cloud-init changes | Use replace_triggered_by | Limited support, use API |
|
||||
| State tracking | Yes (tfstate) | No state file |
|
||||
| Parallel operations | Yes (configurable) | Yes (forks) |
|
||||
| Template name vs ID | Supports both | Supports both |
|
||||
| Timeout handling | Provider config | Module parameter |
|
||||
162
skills/proxmox/references/backup.md
Normal file
@@ -0,0 +1,162 @@
|
||||
# Proxmox Backup Reference
|
||||
|
||||
## vzdump Overview
|
||||
|
||||
Built-in backup tool for VMs and containers.
|
||||
|
||||
```bash
|
||||
# Basic backup
|
||||
vzdump <vmid>
|
||||
|
||||
# With options
|
||||
vzdump <vmid> --mode snapshot --storage backup-nfs --compress zstd
|
||||
|
||||
# Backup all VMs
|
||||
vzdump --all --compress zstd
|
||||
```
|
||||
|
||||
## Backup Modes
|
||||
|
||||
| Mode | Downtime | Method | Use Case |
|
||||
|------|----------|--------|----------|
|
||||
| stop | Full | Shutdown, backup, start | Consistent, any storage |
|
||||
| suspend | Brief | Pause, backup, resume | Running state preserved |
|
||||
| snapshot | None | LVM/ZFS/Ceph snapshot | Production, requires snapshot storage |
|
||||
|
||||
### Mode Selection
|
||||
|
||||
```bash
|
||||
# Stop mode (most consistent)
|
||||
vzdump <vmid> --mode stop
|
||||
|
||||
# Suspend mode (preserves RAM state)
|
||||
vzdump <vmid> --mode suspend
|
||||
|
||||
# Snapshot mode (live, requires compatible storage)
|
||||
vzdump <vmid> --mode snapshot
|
||||
```
|
||||
|
||||
## Backup Formats
|
||||
|
||||
| Format | Type | Notes |
|
||||
|--------|------|-------------|
|
||||
| VMA | VMs | Native Proxmox format |
|
||||
| tar | Containers | Standard tar archive |
|
||||
|
||||
## Compression Options
|
||||
|
||||
| Type | Speed | Ratio | CPU |
|
||||
|------|-------|-------|-----|
|
||||
| none | Fastest | 1:1 | Low |
|
||||
| lzo | Fast | Good | Low |
|
||||
| gzip | Moderate | Better | Medium |
|
||||
| zstd | Fast | Best | Medium |
|
||||
|
||||
Recommendation: `zstd` for best balance.
|
||||
|
||||
```bash
|
||||
vzdump <vmid> --compress zstd
|
||||
```
|
||||
|
||||
## Storage Configuration
|
||||
|
||||
```bash
|
||||
# Backup to specific storage
|
||||
vzdump <vmid> --storage backup-nfs
|
||||
|
||||
# Check available backup storage
|
||||
pvesm status | grep backup
|
||||
```
|
||||
|
||||
## Scheduled Backups
|
||||
|
||||
Configure in Datacenter → Backup:
|
||||
|
||||
- Schedule (cron format)
|
||||
- Selection (all, pool, specific VMs)
|
||||
- Storage destination
|
||||
- Mode and compression
|
||||
- Retention policy
|
||||
|
||||
### Retention Policy
|
||||
|
||||
```
|
||||
keep-last: 3 # Keep last N backups
|
||||
keep-daily: 7 # Keep daily for N days
|
||||
keep-weekly: 4 # Keep weekly for N weeks
|
||||
keep-monthly: 6 # Keep monthly for N months
|
||||
```
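
The same retention settings can be applied on a one-off run, assuming vzdump's `--prune-backups` option (available on recent PVE releases) and a storage named `backup-nfs`:

```bash
vzdump 200 --mode snapshot --storage backup-nfs --compress zstd \
  --prune-backups keep-last=3,keep-daily=7,keep-weekly=4,keep-monthly=6
```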
|
||||
|
||||
## Restore Operations
|
||||
|
||||
### Full Restore
|
||||
|
||||
```bash
|
||||
# Restore VM
|
||||
qmrestore <backup-file> <vmid>
|
||||
|
||||
# Restore to different VMID
|
||||
qmrestore <backup-file> <new-vmid>
|
||||
|
||||
# Restore container
|
||||
pct restore <ctid> <backup-file>
|
||||
```
|
||||
|
||||
### Restore Options
|
||||
|
||||
```bash
|
||||
# Restore to different storage
|
||||
qmrestore <backup> <vmid> --storage local-lvm
|
||||
|
||||
# Force overwrite existing VM
|
||||
qmrestore <backup> <vmid> --force
|
||||
```
|
||||
|
||||
### File-Level Restore
|
||||
|
||||
```bash
|
||||
# Mount backup for file extraction
|
||||
# (Use web UI: Backup → Restore → File Restore)
|
||||
```
|
||||
|
||||
## Proxmox Backup Server (PBS)
|
||||
|
||||
Dedicated backup server with deduplication.
|
||||
|
||||
### Benefits
|
||||
|
||||
- Deduplication across backups
|
||||
- Encryption at rest
|
||||
- Verification and integrity checks
|
||||
- Efficient incremental backups
|
||||
- Remote backup sync
|
||||
|
||||
### Integration
|
||||
|
||||
Add PBS storage:
|
||||
|
||||
```bash
|
||||
pvesm add pbs <storage-id> \
|
||||
--server <pbs-server> \
|
||||
--datastore <datastore> \
|
||||
--username <user>@pbs \
|
||||
--fingerprint <fingerprint>
|
||||
```
|
||||
|
||||
## Backup Best Practices
|
||||
|
||||
- Store backups on separate storage from VMs
|
||||
- Use snapshot mode for production VMs
|
||||
- Test restores regularly
|
||||
- Offsite backup copy for disaster recovery
|
||||
- Monitor backup job completion
|
||||
- Set appropriate retention policy
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
| Issue | Check |
|
||||
|-------|-------|
|
||||
| Backup fails | Storage space, VM state, permissions |
|
||||
| Slow backup | Mode (snapshot faster), compression, network |
|
||||
| Restore fails | Storage compatibility, VMID conflicts |
|
||||
| Snapshot fails | Storage doesn't support snapshots |
|
||||
178
skills/proxmox/references/cli-tools.md
Normal file
@@ -0,0 +1,178 @@
|
||||
# Proxmox CLI Tools Reference
|
||||
|
||||
## qm - VM Management
|
||||
|
||||
```bash
|
||||
# List and status
|
||||
qm list # List all VMs
|
||||
qm status <vmid> # VM status
|
||||
qm config <vmid> # Show VM config
|
||||
|
||||
# Power operations
|
||||
qm start <vmid> # Start VM
|
||||
qm stop <vmid> # Force stop
|
||||
qm shutdown <vmid> # ACPI shutdown
|
||||
qm reboot <vmid> # ACPI reboot
|
||||
qm reset <vmid> # Hard reset
|
||||
qm suspend <vmid> # Suspend to RAM
|
||||
qm resume <vmid> # Resume from suspend
|
||||
|
||||
# Configuration
|
||||
qm set <vmid> --memory 4096 # Set memory
|
||||
qm set <vmid> --cores 4 # Set CPU cores
|
||||
qm set <vmid> --name newname # Rename VM
|
||||
|
||||
# Disk operations
|
||||
qm resize <vmid> scsi0 +10G # Extend disk
|
||||
qm move-disk <vmid> scsi0 <storage> # Move disk
|
||||
|
||||
# Snapshots
|
||||
qm snapshot <vmid> <snapname> # Create snapshot
|
||||
qm listsnapshot <vmid> # List snapshots
|
||||
qm rollback <vmid> <snapname> # Rollback
|
||||
qm delsnapshot <vmid> <snapname> # Delete snapshot
|
||||
|
||||
# Templates and clones
|
||||
qm template <vmid> # Convert to template
|
||||
qm clone <vmid> <newid> # Clone VM
|
||||
|
||||
# Migration
|
||||
qm migrate <vmid> <target-node> # Live migrate
|
||||
|
||||
# Troubleshooting
|
||||
qm unlock <vmid> # Remove lock
|
||||
qm showcmd <vmid> # Show QEMU command
|
||||
qm monitor <vmid> # QEMU monitor
|
||||
qm guest cmd <vmid> <command> # Guest agent command
|
||||
```
|
||||
|
||||
## pct - Container Management
|
||||
|
||||
```bash
|
||||
# List and status
|
||||
pct list # List all containers
|
||||
pct status <ctid> # Container status
|
||||
pct config <ctid> # Show config
|
||||
|
||||
# Power operations
|
||||
pct start <ctid> # Start container
|
||||
pct stop <ctid> # Stop container
|
||||
pct shutdown <ctid> # Graceful shutdown
|
||||
pct reboot <ctid> # Reboot
|
||||
|
||||
# Access
|
||||
pct enter <ctid> # Enter shell
|
||||
pct exec <ctid> -- <command> # Run command
|
||||
pct console <ctid> # Attach console
|
||||
|
||||
# Configuration
|
||||
pct set <ctid> --memory 2048 # Set memory
|
||||
pct set <ctid> --cores 2 # Set CPU cores
|
||||
pct set <ctid> --hostname name # Set hostname
|
||||
|
||||
# Disk operations
|
||||
pct resize <ctid> rootfs +5G # Extend rootfs
|
||||
pct move-volume <ctid> <vol> <storage> # Move volume
|
||||
|
||||
# Snapshots
|
||||
pct snapshot <ctid> <snapname> # Create snapshot
|
||||
pct listsnapshot <ctid> # List snapshots
|
||||
pct rollback <ctid> <snapname> # Rollback
|
||||
|
||||
# Templates
|
||||
pct template <ctid> # Convert to template
|
||||
pct clone <ctid> <newid> # Clone container
|
||||
|
||||
# Migration
|
||||
pct migrate <ctid> <target-node> # Migrate container
|
||||
|
||||
# Troubleshooting
|
||||
pct unlock <ctid> # Remove lock
|
||||
pct push <ctid> <src> <dst> # Copy file to container
|
||||
pct pull <ctid> <src> <dst> # Copy file from container
|
||||
```
|
||||
|
||||
## pvecm - Cluster Management
|
||||
|
||||
```bash
|
||||
# Status
|
||||
pvecm status # Cluster status
|
||||
pvecm nodes # List nodes
|
||||
pvecm qdevice # QDevice status
|
||||
|
||||
# Node operations
|
||||
pvecm add <node> # Join cluster
|
||||
pvecm delnode <node> # Remove node
|
||||
pvecm updatecerts # Update SSL certs
|
||||
|
||||
# Recovery
|
||||
pvecm expected <votes> # Set expected votes
|
||||
```
|
||||
|
||||
## pvesh - API Shell
|
||||
|
||||
```bash
|
||||
# GET requests
|
||||
pvesh get /nodes # List nodes
|
||||
pvesh get /nodes/<node>/status # Node status
|
||||
pvesh get /nodes/<node>/qemu # List VMs on node
|
||||
pvesh get /nodes/<node>/qemu/<vmid>/status/current # VM status
|
||||
pvesh get /storage # List storage
|
||||
pvesh get /cluster/resources # All cluster resources
|
||||
|
||||
# POST/PUT requests
|
||||
pvesh create /nodes/<node>/qemu -vmid <id> ... # Create VM
|
||||
pvesh set /nodes/<node>/qemu/<vmid>/config ... # Modify VM
|
||||
|
||||
# DELETE requests
|
||||
pvesh delete /nodes/<node>/qemu/<vmid> # Delete VM
|
||||
```
|
||||
|
||||
## vzdump - Backup
|
||||
|
||||
```bash
|
||||
# Basic backup
|
||||
vzdump <vmid> # Backup VM
|
||||
vzdump <ctid> # Backup container
|
||||
|
||||
# Options
|
||||
vzdump <vmid> --mode snapshot # Snapshot mode
|
||||
vzdump <vmid> --compress zstd # With compression
|
||||
vzdump <vmid> --storage backup # To specific storage
|
||||
vzdump <vmid> --mailto admin@example.com # Email notification
|
||||
|
||||
# Backup all
|
||||
vzdump --all # All VMs and containers
|
||||
vzdump --pool <pool> # All in pool
|
||||
```
|
||||
|
||||
## qmrestore / pct restore
|
||||
|
||||
```bash
|
||||
# Restore VM
|
||||
qmrestore <backup.vma> <vmid>
|
||||
qmrestore <backup.vma> <vmid> --storage local-lvm
|
||||
|
||||
# Restore container
|
||||
pct restore <ctid> <backup.tar>
|
||||
pct restore <ctid> <backup.tar> --storage local-lvm
|
||||
```
|
||||
|
||||
## Useful Combinations
|
||||
|
||||
```bash
|
||||
# Check resources on all nodes
|
||||
for node in joseph maxwell everette; do
|
||||
echo "=== $node ==="
|
||||
pvesh get /nodes/$node/status | jq '{cpu:.cpu, memory:.memory}'
|
||||
done
|
||||
|
||||
# Stop all VMs on a node
|
||||
qm list | awk 'NR>1 {print $1}' | xargs -I {} qm stop {}
|
||||
|
||||
# List VMs with their IPs (requires guest agent)
|
||||
for vmid in $(qm list | awk 'NR>1 {print $1}'); do
|
||||
echo -n "$vmid: "
|
||||
qm guest cmd $vmid network-get-interfaces 2>/dev/null | jq -r '.[].["ip-addresses"][]?.["ip-address"]' | head -1
|
||||
done
|
||||
```
|
||||
181
skills/proxmox/references/clustering.md
Normal file
@@ -0,0 +1,181 @@
|
||||
# Proxmox Clustering Reference
|
||||
|
||||
## Cluster Benefits
|
||||
|
||||
- Centralized web management
|
||||
- Live VM migration between nodes
|
||||
- High availability (HA) with automatic failover
|
||||
- Shared configuration
|
||||
|
||||
## Cluster Requirements
|
||||
|
||||
| Requirement | Details |
|
||||
|-------------|---------|
|
||||
| Version | Same major/minor Proxmox version |
|
||||
| Time | NTP synchronized |
|
||||
| Network | Low-latency cluster network |
|
||||
| Names | Unique node hostnames |
|
||||
| Storage | Shared storage for HA |
|
||||
|
||||
## Cluster Commands
|
||||
|
||||
```bash
|
||||
# Check cluster status
|
||||
pvecm status
|
||||
|
||||
# List cluster nodes
|
||||
pvecm nodes
|
||||
|
||||
# Add node to cluster (run on new node)
|
||||
pvecm add <existing-node>
|
||||
|
||||
# Remove node (run on remaining node)
|
||||
pvecm delnode <node-name>
|
||||
|
||||
# Expected votes (split-brain recovery)
|
||||
pvecm expected <votes>
|
||||
```
|
||||
|
||||
## Quorum
|
||||
|
||||
Cluster requires majority of nodes online to operate.
|
||||
|
||||
| Nodes | Quorum | Can Lose |
|
||||
|-------|--------|----------|
|
||||
| 2 | 2 | 0 (use QDevice) |
|
||||
| 3 | 2 | 1 |
|
||||
| 4 | 3 | 1 |
|
||||
| 5 | 3 | 2 |
|
||||
|
||||
### QDevice
|
||||
|
||||
External quorum device for even-node clusters:
|
||||
|
||||
- Prevents split-brain in 2-node clusters
|
||||
- Runs on separate machine
|
||||
- Provides tie-breaking vote
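
A minimal setup sketch, assuming an external host at 192.168.10.5 that already runs `corosync-qnetd`:

```bash
apt install -y corosync-qdevice    # on every cluster node
pvecm qdevice setup 192.168.10.5   # run once from one node
pvecm status                       # verify the extra vote
```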
|
||||
|
||||
## High Availability (HA)
|
||||
|
||||
Automatic VM restart on healthy node if host fails.
|
||||
|
||||
### Requirements
|
||||
|
||||
- Shared storage (Ceph, NFS, iSCSI)
|
||||
- Fencing enabled (watchdog)
|
||||
- HA group configured
|
||||
- VM added to HA
|
||||
|
||||
### HA States
|
||||
|
||||
| State | Description |
|
||||
|-------|-------------|
|
||||
| started | VM running, managed by HA |
|
||||
| stopped | VM stopped intentionally |
|
||||
| migrate | Migration in progress |
|
||||
| relocate | Moving to different node |
|
||||
| error | Problem detected |
|
||||
|
||||
### HA Configuration
|
||||
|
||||
1. Enable fencing (watchdog device)
|
||||
2. Create HA group (optional)
|
||||
3. Add VM to HA: Datacenter → HA → Add (or from the CLI, as sketched below)
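
The same steps from the CLI, assuming VMID 300 and a group named `prod`:

```bash
ha-manager groupadd prod --nodes "joseph,maxwell"    # optional HA group
ha-manager add vm:300 --state started --group prod   # put the VM under HA
ha-manager status
```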
|
||||
|
||||
### Fencing
|
||||
|
||||
Prevents split-brain by forcing failed node to stop:
|
||||
|
||||
```bash
|
||||
# Check watchdog status
|
||||
cat /proc/sys/kernel/watchdog
|
||||
|
||||
# Watchdog config
|
||||
/etc/pve/ha/fence.cfg
|
||||
```
|
||||
|
||||
## Live Migration
|
||||
|
||||
Move running VM between nodes without downtime.
|
||||
|
||||
### Requirements
|
||||
|
||||
- Shared storage OR local-to-local migration
|
||||
- Same CPU architecture
|
||||
- Network connectivity
|
||||
- Sufficient resources on target
|
||||
|
||||
### Migration Types
|
||||
|
||||
| Type | Downtime | Requirements |
|
||||
|------|----------|--------------|
|
||||
| Live | Minimal | Shared storage |
|
||||
| Offline | Full | Any storage |
|
||||
| Local storage | Moderate | Copies disk |
|
||||
|
||||
### Migration Command
|
||||
|
||||
```bash
|
||||
# Live migrate
|
||||
qm migrate <vmid> <target-node>
|
||||
|
||||
# Offline migrate
|
||||
qm migrate <vmid> <target-node> --offline
|
||||
|
||||
# With local disk
|
||||
qm migrate <vmid> <target-node> --with-local-disks
|
||||
```
|
||||
|
||||
## Cluster Network
|
||||
|
||||
### Corosync Network
|
||||
|
||||
Cluster communication (default port 5405):
|
||||
|
||||
- Low-latency required
|
||||
- Dedicated VLAN recommended
|
||||
- Redundant links for HA
|
||||
|
||||
### Configuration
|
||||
|
||||
```
|
||||
# /etc/pve/corosync.conf
|
||||
nodelist {
|
||||
node {
|
||||
name: node1
|
||||
ring0_addr: 192.168.10.1
|
||||
}
|
||||
node {
|
||||
name: node2
|
||||
ring0_addr: 192.168.10.2
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Quorum Lost
|
||||
|
||||
```bash
|
||||
# Check status
|
||||
pvecm status
|
||||
|
||||
# Force expected votes (DANGEROUS)
|
||||
pvecm expected 1
|
||||
|
||||
# Then: recover remaining nodes
|
||||
```
|
||||
|
||||
### Node Won't Join
|
||||
|
||||
- Check network connectivity
|
||||
- Verify time sync
|
||||
- Check Proxmox versions match
|
||||
- Review /var/log/pve-cluster/
|
||||
|
||||
### Split Brain Recovery
|
||||
|
||||
1. Identify authoritative node
|
||||
2. Stop cluster services on other nodes
|
||||
3. Set expected votes
|
||||
4. Restart and rejoin nodes
|
||||
202
skills/proxmox/references/docker-hosting.md
Normal file
@@ -0,0 +1,202 @@
|
||||
# Docker Workloads on Proxmox
|
||||
|
||||
Best practices for hosting Docker containers on Proxmox VE.
|
||||
|
||||
## Hosting Options
|
||||
|
||||
| Option | Isolation | Overhead | Complexity | Recommendation |
|
||||
|--------|-----------|----------|------------|----------------|
|
||||
| VM + Docker | Full | Higher | Low | **Recommended** |
|
||||
| LXC + Docker | Shared kernel | Lower | High | Avoid |
|
||||
| Bare metal Docker | None | Lowest | N/A | Not on Proxmox |
|
||||
|
||||
## VM for Docker (Recommended)
|
||||
|
||||
### Template Selection
|
||||
|
||||
Use Docker-ready templates (102+):
|
||||
|
||||
| Template | Docker Pre-installed |
|
||||
|----------|---------------------|
|
||||
| 102 (docker) | Yes |
|
||||
| 103 (github-runner) | Yes |
|
||||
| 104 (pihole) | Yes |
|
||||
|
||||
### VM Sizing
|
||||
|
||||
| Workload | CPU | RAM | Disk |
|
||||
|----------|-----|-----|------|
|
||||
| Light (1-3 containers) | 2 | 4 GB | 50 GB |
|
||||
| Medium (4-10 containers) | 4 | 8 GB | 100 GB |
|
||||
| Heavy (10+ containers) | 8+ | 16+ GB | 200+ GB |
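
As a sketch, resizing an existing Docker host to the medium profile (VMID 201 is an assumption):

```bash
qm set 201 --cores 4 --memory 8192
qm resize 201 scsi0 +50G   # grow the disk, then extend the filesystem inside the VM
```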
|
||||
|
||||
### Storage Backend
|
||||
|
||||
| Proxmox Storage | Docker Suitability | Notes |
|
||||
|-----------------|-------------------|-------|
|
||||
| local-lvm | Good | Default, fast |
|
||||
| ZFS | Best | Snapshots, compression |
|
||||
| Ceph | Good | Distributed, HA |
|
||||
| NFS | Moderate | Shared access, slower |
|
||||
|
||||
### Network Configuration
|
||||
|
||||
```
|
||||
Proxmox Node
|
||||
├── vmbr0 (bridge) → VM eth0 → Docker bridge network
|
||||
└── vmbr12 (high-speed) → VM eth1 → Docker macvlan (optional)
|
||||
```
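
A sketch of the optional macvlan network on the VM's second NIC, assuming subnet 192.168.12.0/24 and an interface named `eth1`:

```bash
docker network create -d macvlan \
  --subnet 192.168.12.0/24 --gateway 192.168.12.1 \
  -o parent=eth1 vlan12
```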
|
||||
|
||||
## Docker in LXC (Not Recommended)
|
||||
|
||||
If you must run Docker in LXC:
|
||||
|
||||
### Requirements
|
||||
|
||||
1. **Privileged container** or nesting enabled
|
||||
2. **AppArmor** profile unconfined
|
||||
3. **Keyctl** feature enabled
|
||||
|
||||
### LXC Options
|
||||
|
||||
```bash
|
||||
# Proxmox GUI: Options → Features
|
||||
nesting: 1
|
||||
keyctl: 1
|
||||
|
||||
# Or in /etc/pve/lxc/<vmid>.conf
|
||||
features: keyctl=1,nesting=1
|
||||
lxc.apparmor.profile: unconfined
|
||||
```
|
||||
|
||||
### Known Issues
|
||||
|
||||
- Some Docker storage drivers don't work
|
||||
- Overlay filesystem may have issues
|
||||
- Reduced security isolation
|
||||
- Complex debugging (two container layers)
|
||||
|
||||
## Resource Allocation
|
||||
|
||||
### CPU
|
||||
|
||||
```bash
|
||||
# VM config - dedicate cores to Docker host
|
||||
cores: 4
|
||||
cpu: host # Pass through CPU features
|
||||
```
|
||||
|
||||
### Memory
|
||||
|
||||
```bash
|
||||
# VM config - allow some overcommit for containers
|
||||
memory: 8192
|
||||
balloon: 4096 # Minimum memory
|
||||
```
|
||||
|
||||
### Disk I/O
|
||||
|
||||
For I/O intensive containers (databases):
|
||||
|
||||
```bash
|
||||
# VM disk options
|
||||
cache: none # Direct I/O for consistency
|
||||
iothread: 1 # Dedicated I/O thread
|
||||
ssd: 1 # If on SSD storage
|
||||
```
|
||||
|
||||
## GPU Passthrough for Containers
|
||||
|
||||
For transcoding (Plex) or ML workloads:
|
||||
|
||||
### 1. Proxmox: Pass GPU to VM
|
||||
|
||||
```bash
|
||||
# /etc/pve/qemu-server/<vmid>.conf
|
||||
hostpci0: 0000:01:00.0,pcie=1
|
||||
```
|
||||
|
||||
### 2. VM: Install NVIDIA Container Toolkit
|
||||
|
||||
```bash
|
||||
# In VM
|
||||
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
|
||||
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
|
||||
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
|
||||
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
|
||||
sudo apt update && sudo apt install -y nvidia-container-toolkit
|
||||
sudo nvidia-ctk runtime configure --runtime=docker
|
||||
sudo systemctl restart docker
|
||||
```
|
||||
|
||||
### 3. Docker Compose
|
||||
|
||||
```yaml
|
||||
services:
|
||||
plex:
|
||||
image: linuxserver/plex
|
||||
deploy:
|
||||
resources:
|
||||
reservations:
|
||||
devices:
|
||||
- driver: nvidia
|
||||
count: 1
|
||||
capabilities: [gpu]
|
||||
```
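
A quick check that containers can see the GPU (the CUDA image tag is an assumption):

```bash
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```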
|
||||
|
||||
## Backup Strategy
|
||||
|
||||
### VM-level (Recommended)
|
||||
|
||||
Proxmox vzdump backs up entire Docker host including all containers:
|
||||
|
||||
```bash
|
||||
vzdump <vmid> --mode snapshot --storage backup --compress zstd
|
||||
```
|
||||
|
||||
### Application-level
|
||||
|
||||
For consistent database backups, stop or flush before VM backup:
|
||||
|
||||
```bash
|
||||
# Pre-backup hook
|
||||
docker exec postgres pg_dump -U user db > /backup/db.sql
|
||||
```
|
||||
|
||||
## Monitoring
|
||||
|
||||
### From Proxmox
|
||||
|
||||
- VM CPU, memory, network, disk via Proxmox UI
|
||||
- No visibility into individual containers
|
||||
|
||||
### From Docker Host
|
||||
|
||||
```bash
|
||||
# Resource usage per container
|
||||
docker stats
|
||||
|
||||
# System-wide
|
||||
docker system df
|
||||
```
|
||||
|
||||
### Recommended Stack
|
||||
|
||||
```yaml
|
||||
# On Docker host
|
||||
services:
|
||||
prometheus:
|
||||
image: prom/prometheus
|
||||
cadvisor:
|
||||
image: gcr.io/cadvisor/cadvisor
|
||||
grafana:
|
||||
image: grafana/grafana
|
||||
```
|
||||
|
||||
## Skill References
|
||||
|
||||
For Docker-specific patterns:
|
||||
- `docker/references/compose.md` - Compose file structure
|
||||
- `docker/references/networking.md` - Network modes
|
||||
- `docker/references/volumes.md` - Data persistence
|
||||
- `docker/references/proxmox/hosting.md` - Detailed hosting guide
|
||||
153
skills/proxmox/references/networking.md
Normal file
@@ -0,0 +1,153 @@
|
||||
# Proxmox Networking Reference
|
||||
|
||||
## Linux Bridges
|
||||
|
||||
Default networking method for Proxmox VMs and containers.
|
||||
|
||||
### Bridge Configuration
|
||||
|
||||
```
|
||||
# /etc/network/interfaces example
|
||||
auto vmbr0
|
||||
iface vmbr0 inet static
|
||||
address 192.168.1.10/24
|
||||
gateway 192.168.1.1
|
||||
bridge-ports eno1
|
||||
bridge-stp off
|
||||
bridge-fd 0
|
||||
bridge-vlan-aware yes
|
||||
```
|
||||
|
||||
### VLAN-Aware Bridge
|
||||
|
||||
Enable VLAN tagging at VM level instead of separate bridges:
|
||||
|
||||
- Set `bridge-vlan-aware yes` on bridge
|
||||
- Configure VLAN tag in VM network config
|
||||
- Simpler management, fewer bridges needed
|
||||
|
||||
### Separate Bridges (Alternative)
|
||||
|
||||
One bridge per VLAN:
|
||||
|
||||
- vmbr0: Untagged/native VLAN
|
||||
- vmbr1: VLAN 10
|
||||
- vmbr5: VLAN 5
|
||||
|
||||
More bridges but explicit network separation.
|
||||
|
||||
## VLAN Configuration
|
||||
|
||||
### At VM Level (VLAN-aware bridge)
|
||||
|
||||
```
|
||||
net0: virtio=XX:XX:XX:XX:XX:XX,bridge=vmbr0,tag=20
|
||||
```
|
||||
|
||||
### At Bridge Level (Separate bridges)
|
||||
|
||||
```
|
||||
net0: virtio=XX:XX:XX:XX:XX:XX,bridge=vmbr20
|
||||
```
|
||||
|
||||
## Firewall
|
||||
|
||||
Three levels of firewall rules:
|
||||
|
||||
| Level | Scope | Use Case |
|
||||
|-------|-------|----------|
|
||||
| Datacenter | Cluster-wide | Default policies |
|
||||
| Node | Per-node | Node-specific rules |
|
||||
| VM/Container | Per-VM | Application-specific |
|
||||
|
||||
### Default Policy
|
||||
|
||||
- Input: DROP (only allow explicit rules)
|
||||
- Output: ACCEPT
|
||||
- Enable firewall per VM in Options
|
||||
|
||||
### Common Rules
|
||||
|
||||
```
|
||||
# Allow SSH
|
||||
IN ACCEPT -p tcp --dport 22
|
||||
|
||||
# Allow HTTP/HTTPS
|
||||
IN ACCEPT -p tcp --dport 80
|
||||
IN ACCEPT -p tcp --dport 443
|
||||
|
||||
# Allow ICMP (ping)
|
||||
IN ACCEPT -p icmp
|
||||
```
|
||||
|
||||
## SDN (Software Defined Networking)
|
||||
|
||||
Advanced networking for complex multi-tenant setups.
|
||||
|
||||
### Zone Types
|
||||
|
||||
| Type | Use Case |
|
||||
|------|----------|
|
||||
| Simple | Basic L2 network |
|
||||
| VLAN | VLAN-based isolation |
|
||||
| VXLAN | Overlay networking |
|
||||
| EVPN | BGP-based routing |
|
||||
|
||||
### When to Use SDN
|
||||
|
||||
- Multi-tenant environments
|
||||
- Complex routing requirements
|
||||
- Cross-node L2 networks
|
||||
- VXLAN overlay needs
|
||||
|
||||
For homelab: Standard bridges usually sufficient.
|
||||
|
||||
## Network Performance
|
||||
|
||||
### Jumbo Frames
|
||||
|
||||
Enable on storage network for better throughput:
|
||||
|
||||
```
|
||||
# Set MTU 9000 on bridge
|
||||
auto vmbr40
|
||||
iface vmbr40 inet static
|
||||
mtu 9000
|
||||
...
|
||||
```
|
||||
|
||||
Requires: All devices in path support jumbo frames.
|
||||
|
||||
### VirtIO Multiqueue
|
||||
|
||||
Enable parallel network processing for high-throughput VMs:
|
||||
|
||||
```
|
||||
net0: virtio=XX:XX:XX:XX:XX:XX,bridge=vmbr0,queues=4
|
||||
```
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Check Bridge Status
|
||||
|
||||
```bash
|
||||
brctl show # List bridges and attached interfaces
|
||||
ip link show vmbr0 # Bridge interface details
|
||||
bridge vlan show # VLAN configuration
|
||||
```
|
||||
|
||||
### Check VM Network
|
||||
|
||||
```bash
|
||||
qm config <vmid> | grep net # VM network config
|
||||
ip addr # From inside VM
|
||||
```
|
||||
|
||||
### Common Issues
|
||||
|
||||
| Problem | Check |
|
||||
|---------|-------|
|
||||
| No connectivity | Bridge exists, interface attached |
|
||||
| Wrong VLAN | Tag matches switch config |
|
||||
| Slow network | MTU mismatch, driver type |
|
||||
| Firewall blocking | Rules, policy, enabled status |
|
||||
150
skills/proxmox/references/storage.md
Normal file
@@ -0,0 +1,150 @@
|
||||
# Proxmox Storage Reference
|
||||
|
||||
## Storage Types
|
||||
|
||||
### Local Storage
|
||||
|
||||
| Type | Features | Use Case |
|
||||
|------|----------|----------|
|
||||
| Directory | Simple, any filesystem | Basic storage |
|
||||
| LVM | Block device, raw performance | Performance |
|
||||
| LVM-thin | Thin provisioning, snapshots | Efficient space |
|
||||
| ZFS | Compression, snapshots, high perf | Production |
|
||||
|
||||
Limitations: No live migration, single node only.
|
||||
|
||||
### Shared Storage
|
||||
|
||||
| Type | Features | Use Case |
|
||||
|------|----------|----------|
|
||||
| NFS | File-based, simple | Shared access |
|
||||
| Ceph RBD | Distributed block, HA | Production HA |
|
||||
| iSCSI | Network block | SAN integration |
|
||||
| GlusterFS | Distributed file | File sharing |
|
||||
|
||||
Benefits: Live migration, HA, shared access.
|
||||
|
||||
## Content Types
|
||||
|
||||
Configure what each storage can hold:
|
||||
|
||||
| Content | Description | File Types |
|
||||
|---------|-------------|------------|
|
||||
| images | VM disk images | .raw, .qcow2 |
|
||||
| iso | ISO images for install | .iso |
|
||||
| vztmpl | Container templates | .tar.gz |
|
||||
| backup | Backup files | .vma, .tar |
|
||||
| rootdir | Container root FS | directories |
|
||||
| snippets | Cloud-init, hooks | .yaml, scripts |
|
||||
|
||||
## Storage Configuration
|
||||
|
||||
### Add NFS Storage
|
||||
|
||||
```bash
|
||||
pvesm add nfs <storage-id> \
|
||||
--server <nfs-server> \
|
||||
--export <export-path> \
|
||||
--content images,iso,backup
|
||||
```
|
||||
|
||||
### Add Ceph RBD
|
||||
|
||||
```bash
|
||||
pvesm add rbd <storage-id> \
|
||||
--monhost <mon1>,<mon2>,<mon3> \
|
||||
--pool <pool-name> \
|
||||
--content images,rootdir
|
||||
```
|
||||
|
||||
### Check Storage Status
|
||||
|
||||
```bash
|
||||
pvesm status # All storage status
|
||||
pvesh get /storage # API query
|
||||
df -h # Disk space
|
||||
```
|
||||
|
||||
## Disk Formats
|
||||
|
||||
| Format | Features | Performance |
|
||||
|--------|----------|-------------|
|
||||
| raw | No overhead, full allocation | Fastest |
|
||||
| qcow2 | Snapshots, thin provisioning | Moderate |
|
||||
|
||||
Recommendation: Use `raw` for production, `qcow2` for dev/snapshots.
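
Existing images can be converted between the two formats; a sketch with assumed file names:

```bash
qemu-img convert -p -f qcow2 -O raw vm-disk.qcow2 vm-disk.raw
```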
|
||||
|
||||
## Disk Cache Modes
|
||||
|
||||
| Mode | Safety | Performance | Use Case |
|
||||
|------|--------|-------------|----------|
|
||||
| none | Safe | Good | Default, recommended |
|
||||
| writeback | Unsafe | Best | Non-critical, battery backup |
|
||||
| writethrough | Safe | Moderate | Compatibility |
|
||||
| directsync | Safest | Slow | Critical data |
|
||||
|
||||
## Storage Performance
|
||||
|
||||
### Enable Discard (TRIM)
|
||||
|
||||
For SSD thin provisioning:
|
||||
|
||||
```
|
||||
scsi0: local-lvm:vm-100-disk-0,discard=on
|
||||
```
|
||||
|
||||
### I/O Thread
|
||||
|
||||
Dedicated I/O thread per disk:
|
||||
|
||||
```
|
||||
scsi0: local-lvm:vm-100-disk-0,iothread=1
|
||||
```
|
||||
|
||||
### I/O Limits
|
||||
|
||||
Throttle disk bandwidth:
|
||||
|
||||
```
# Throttle disk bandwidth/IOPS via options on the disk line in the VM config
scsi0: local-lvm:vm-100-disk-0,iops_rd=500,iops_wr=500,mbps_rd=100,mbps_wr=100
```
|
||||
|
||||
## Cloud-Init Storage
|
||||
|
||||
Cloud-init configs stored in `snippets` content type:
|
||||
|
||||
```bash
|
||||
# Upload cloud-init files
|
||||
scp user-data.yaml root@proxmox:/var/lib/vz/snippets/
|
||||
|
||||
# Or to named storage
|
||||
scp user-data.yaml root@proxmox:/mnt/pve/<storage>/snippets/
|
||||
```
|
||||
|
||||
Reference in VM:
|
||||
|
||||
```
|
||||
cicustom: user=<storage>:snippets/user-data.yaml
|
||||
```
|
||||
|
||||
## Backup Storage
|
||||
|
||||
### Recommended Configuration
|
||||
|
||||
- Separate storage for backups
|
||||
- NFS or dedicated backup server
|
||||
- Sufficient space for retention policy
|
||||
|
||||
### Backup Retention
|
||||
|
||||
Configure in Datacenter → Backup:
|
||||
|
||||
```
|
||||
keep-last: 3
|
||||
keep-daily: 7
|
||||
keep-weekly: 4
|
||||
keep-monthly: 6
|
||||
```
|
||||
197
skills/proxmox/references/troubleshooting.md
Normal file
@@ -0,0 +1,197 @@
|
||||
# Proxmox Troubleshooting Reference
|
||||
|
||||
## Common Errors
|
||||
|
||||
| Error | Cause | Solution |
|
||||
|-------|-------|----------|
|
||||
| VM won't start | Lock, storage, resources | `qm unlock`, check storage, verify resources |
|
||||
| Migration failed | No shared storage, resources | Verify shared storage, check target capacity |
|
||||
| Cluster issues | Quorum, network, time | `pvecm status`, check NTP, network |
|
||||
| Storage unavailable | Mount failed, network | Check mount, network access |
|
||||
| High load | Resource contention | Identify bottleneck, rebalance VMs |
|
||||
| Network issues | Bridge, VLAN, firewall | `brctl show`, check tags, firewall rules |
|
||||
| Backup failed | Disk space, VM state | Check space, storage access |
|
||||
| Template not found | Not downloaded | Download from Proxmox repo |
|
||||
| API errors | Auth, permissions | Check token, user permissions |
|
||||
|
||||
## Diagnostic Commands
|
||||
|
||||
### Cluster Health
|
||||
|
||||
```bash
|
||||
pvecm status # Quorum and node status
|
||||
pvecm nodes # List cluster members
|
||||
systemctl status pve-cluster # Cluster service
|
||||
systemctl status corosync # Corosync service
|
||||
```
|
||||
|
||||
### Node Health
|
||||
|
||||
```bash
|
||||
pveversion -v # Proxmox version info
|
||||
uptime # Load and uptime
|
||||
free -h # Memory usage
|
||||
df -h # Disk space
|
||||
top -bn1 | head -20 # Process overview
|
||||
```
|
||||
|
||||
### VM Diagnostics
|
||||
|
||||
```bash
|
||||
qm status <vmid> # VM state
|
||||
qm config <vmid> # VM configuration
|
||||
qm showcmd <vmid> # QEMU command line
|
||||
qm unlock <vmid> # Clear locks
|
||||
qm monitor <vmid> # QEMU monitor access
|
||||
```
|
||||
|
||||
### Container Diagnostics
|
||||
|
||||
```bash
|
||||
pct status <ctid> # Container state
|
||||
pct config <ctid> # Container configuration
|
||||
pct enter <ctid> # Enter container shell
|
||||
pct unlock <ctid> # Clear locks
|
||||
```
|
||||
|
||||
### Storage Diagnostics
|
||||
|
||||
```bash
|
||||
pvesm status # Storage status
|
||||
df -h # Disk space
|
||||
mount | grep -E 'nfs|ceph' # Mounted storage
|
||||
zpool status # ZFS pool status (if using ZFS)
|
||||
ceph -s # Ceph status (if using Ceph)
|
||||
```
|
||||
|
||||
### Network Diagnostics
|
||||
|
||||
```bash
|
||||
brctl show # Bridge configuration
|
||||
ip link # Network interfaces
|
||||
ip addr # IP addresses
|
||||
ip route # Routing table
|
||||
bridge vlan show # VLAN configuration
|
||||
```
|
||||
|
||||
### Log Files
|
||||
|
||||
```bash
|
||||
# Cluster logs
|
||||
journalctl -u pve-cluster
|
||||
journalctl -u corosync
|
||||
|
||||
# VM/Container logs
|
||||
journalctl | grep <vmid>
|
||||
tail -f /var/log/pve/tasks/*
|
||||
|
||||
# Firewall logs
|
||||
journalctl -u pve-firewall
|
||||
|
||||
# Web interface logs
|
||||
journalctl -u pveproxy
|
||||
```
|
||||
|
||||
## Troubleshooting Workflows
|
||||
|
||||
### VM Won't Start
|
||||
|
||||
1. Check for locks: `qm unlock <vmid>`
|
||||
2. Verify storage: `pvesm status`
|
||||
3. Check resources: `free -h`, `df -h`
|
||||
4. Review config: `qm config <vmid>`
|
||||
5. Check logs: `journalctl | grep <vmid>`
|
||||
6. Try manual start: `qm start <vmid> --debug`
|
||||
|
||||
### Migration Failure
|
||||
|
||||
1. Verify shared storage: `pvesm status`
|
||||
2. Check target resources: `pvesh get /nodes/<target>/status`
|
||||
3. Verify network: `ping <target-node>`
|
||||
4. Check version match: `pveversion` on both nodes
|
||||
5. Review migration logs
|
||||
|
||||
### Cluster Quorum Lost
|
||||
|
||||
1. Check status: `pvecm status`
|
||||
2. Identify online nodes
|
||||
3. If majority lost, set expected: `pvecm expected <n>`
|
||||
4. Recover remaining nodes
|
||||
5. Rejoin lost nodes when available
|
||||
|
||||
### Storage Mount Failed
|
||||
|
||||
1. Check network: `ping <storage-server>`
|
||||
2. Verify mount: `mount | grep <storage>`
|
||||
3. Try manual mount
|
||||
4. Check permissions on storage server
|
||||
5. Review `/var/log/syslog`
|
||||
|
||||
### High CPU/Memory Usage
|
||||
|
||||
1. Identify culprit: `top`, `htop`
|
||||
2. Check VM resources: `qm monitor <vmid>` → `info balloon`
|
||||
3. Review resource allocation across cluster
|
||||
4. Consider migration or resource limits
|
||||
|
||||
## Recovery Procedures
|
||||
|
||||
### Remove Failed Node
|
||||
|
||||
```bash
|
||||
# On healthy node
|
||||
pvecm delnode <failed-node>
|
||||
|
||||
# Clean up node-specific configs
|
||||
rm -rf /etc/pve/nodes/<failed-node>
|
||||
```
|
||||
|
||||
### Force Stop Locked VM
|
||||
|
||||
```bash
|
||||
# Remove lock
|
||||
qm unlock <vmid>
|
||||
|
||||
# If still stuck, find and kill QEMU process
|
||||
ps aux | grep <vmid>
|
||||
kill <pid>
|
||||
|
||||
# Force cleanup
|
||||
qm stop <vmid> --skiplock
|
||||
```
|
||||
|
||||
### Recover from Corrupt Config
|
||||
|
||||
```bash
|
||||
# Backup current config
|
||||
cp /etc/pve/qemu-server/<vmid>.conf /root/<vmid>.conf.bak
|
||||
|
||||
# Edit config manually
|
||||
nano /etc/pve/qemu-server/<vmid>.conf
|
||||
|
||||
# Or restore from backup
|
||||
qmrestore <backup> <vmid>
|
||||
```
|
||||
|
||||
## Health Check Script
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
echo "=== Cluster Status ==="
|
||||
pvecm status
|
||||
|
||||
echo -e "\n=== Node Resources ==="
|
||||
for node in $(pvecm nodes | awk '/^[[:space:]]*[0-9]/ {print $3}'); do
|
||||
echo "--- $node ---"
|
||||
pvesh get /nodes/$node/status --output-format yaml | grep -E '^(cpu|memory):'
|
||||
done
|
||||
|
||||
echo -e "\n=== Storage Status ==="
|
||||
pvesm status
|
||||
|
||||
echo -e "\n=== Running VMs ==="
|
||||
qm list | grep running
|
||||
|
||||
echo -e "\n=== Running Containers ==="
|
||||
pct list | grep running
|
||||
```
|
||||
103
skills/proxmox/references/vm-lxc.md
Normal file
@@ -0,0 +1,103 @@
|
||||
# VM vs LXC Reference
|
||||
|
||||
## Decision Matrix
|
||||
|
||||
### Use VM (QEMU/KVM) When
|
||||
|
||||
- Running Windows or non-Linux OS
|
||||
- Need full kernel isolation
|
||||
- Running untrusted workloads
|
||||
- Complex hardware passthrough needed
|
||||
- Different kernel version required
|
||||
- GPU passthrough required
|
||||
|
||||
### Use LXC When
|
||||
|
||||
- Running Linux services
|
||||
- Need lightweight, fast startup
|
||||
- Comfortable with shared kernel
|
||||
- Want better density/performance
|
||||
- Simple application containers
|
||||
- Development environments
|
||||
|
||||
## QEMU/KVM VMs

Full hardware virtualization with support for any guest OS.

### Hardware Configuration

| Setting | Options | Recommendation |
|---------|---------|----------------|
| CPU type | host, kvm64, custom | `host` for performance |
| Boot | UEFI, BIOS | UEFI for modern OS |
| Display | VNC, SPICE, NoVNC | NoVNC for web access |

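Applying the recommendations to an existing VM (a sketch; VMID 100 and the `local-lvm` pool are placeholders):

```bash
qm set 100 --cpu host                          # pass through host CPU features
qm set 100 --bios ovmf --efidisk0 local-lvm:1  # UEFI boot needs an EFI disk
qm set 100 --agent enabled=1                   # enable the QEMU guest agent
```
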
### Storage Controllers

| Type | Performance | Use Case |
|------|-------------|----------|
| VirtIO | Fastest | Linux, Windows with drivers |
| SCSI | Fast | General purpose |
| SATA | Moderate | Compatibility |
| IDE | Slow | Legacy OS |

### Network Adapters

| Type | Performance | Use Case |
|------|-------------|----------|
| VirtIO | Fastest | Linux, Windows with drivers |
| E1000 | Good | Compatibility |
| RTL8139 | Slow | Legacy OS |

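Attaching a VirtIO-backed disk and NIC (a sketch; VMID, pool, and bridge names are placeholders):

```bash
# VirtIO SCSI controller plus a new 50G disk on local-lvm
qm set 100 --scsihw virtio-scsi-single --scsi0 local-lvm:50

# VirtIO network adapter on bridge vmbr0
qm set 100 --net0 virtio,bridge=vmbr0
```
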
### Features

- Snapshots (requires compatible storage)
- Templates for rapid cloning
- Live migration (requires shared storage)
- Hardware passthrough (GPU, USB, PCI)

## LXC Containers

OS-level virtualization with shared kernel.

### Container Types

| Type | Security | Use Case |
|------|----------|----------|
| Unprivileged | Higher (recommended) | Production workloads |
| Privileged | Lower | Docker-in-LXC, NFS mounts |

### Resource Controls

- CPU cores and limits
- Memory hard/soft limits
- Disk I/O throttling
- Network bandwidth limits

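A sketch of applying these limits to an existing container (CTID 200 and the values are placeholders):

```bash
# CPU: 2 cores, capped at 1.5 cores of actual CPU time
pct set 200 --cores 2 --cpulimit 1.5

# Memory: 2 GiB hard limit plus 512 MiB swap
pct set 200 --memory 2048 --swap 512

# Network rate limit in MB/s on the first interface
pct set 200 --net0 name=eth0,bridge=vmbr0,ip=dhcp,rate=50
```
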
### Storage Options

- Bind mounts from host
- Volume storage
- ZFS datasets

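For example, a host bind mount plus a storage-backed volume (paths, sizes, and CTID are placeholders):

```bash
# Bind-mount a host directory into the container at /data
# (unprivileged containers may need UID/GID mapping for write access)
pct set 200 --mp0 /srv/shared,mp=/data

# Add an 8G volume from the local-lvm pool at /var/lib/app
pct set 200 --mp1 local-lvm:8,mp=/var/lib/app
```
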
### Features

- Fast startup (seconds)
- Lower memory overhead
- Higher density per host
- Templates from Proxmox repo

## Migration Considerations

### VM Migration Requirements

- Shared storage (Ceph, NFS, iSCSI)
- Same CPU architecture
- Compatible Proxmox versions
- Network connectivity between nodes

### LXC Migration Requirements

- Shared storage (containers migrate offline or in restart mode; live migration is not supported)
- Same architecture
- Unprivileged preferred for portability

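The corresponding commands, sketched with placeholder IDs and a target node `pve2`:

```bash
# Live-migrate a VM (shared storage required for --online)
qm migrate 100 pve2 --online

# Migrate a container; --restart stops it, moves it, and starts it again
pct migrate 200 pve2 --restart
```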
85
skills/terraform/SKILL.md
Normal file
@@ -0,0 +1,85 @@
---
name: terraform
description: |
  Terraform infrastructure-as-code reference for HCL syntax, state management,
  module design, and provider configuration. Use when working with Terraform
  configurations (.tf files), running terraform commands, troubleshooting state
  issues, or designing modules. Includes Telmate Proxmox provider patterns.
  Triggers: terraform, tfstate, .tf files, HCL, modules, providers, proxmox_vm_qemu.
---

# Terraform Skill

Infrastructure-as-code reference for Terraform configurations, state management, and provider patterns.

## Quick Reference

```bash
# Core workflow
terraform init            # Initialize, download providers
terraform validate        # Syntax validation
terraform fmt -recursive  # Format HCL files
terraform plan            # Preview changes
terraform apply           # Apply changes

# Inspection
terraform state list             # List resources in state
terraform state show <resource>  # Show resource details
terraform graph | dot -Tsvg > graph.svg  # Dependency graph

# Debug
TF_LOG=DEBUG terraform plan 2>debug.log
```

## Core Workflow

```
init → validate → fmt → plan → apply
```

1. **init**: Download providers, initialize backend
2. **validate**: Check syntax and configuration validity
3. **fmt**: Ensure consistent formatting
4. **plan**: Preview what will change (review carefully)
5. **apply**: Execute changes

## Reference Files

Load on-demand based on task:

| Topic | File | When to Load |
|-------|------|--------------|
| Proxmox Gotchas | [proxmox/gotchas.md](references/proxmox/gotchas.md) | Critical provider issues, workarounds |
| Proxmox Auth | [proxmox/authentication.md](references/proxmox/authentication.md) | Provider config, API tokens |
| Proxmox VMs | [proxmox/vm-qemu.md](references/proxmox/vm-qemu.md) | proxmox_vm_qemu resource patterns |
| Proxmox Errors | [proxmox/troubleshooting.md](references/proxmox/troubleshooting.md) | Common errors, debugging |
| State | [state-management.md](references/state-management.md) | Backends, locking, operations |
| Modules | [module-design.md](references/module-design.md) | Module patterns, composition |
| Security | [security.md](references/security.md) | Secrets, state security |
| External | [external-resources.md](references/external-resources.md) | Official docs, links |

## Validation Checklist

Before `terraform apply`:

- [ ] `terraform init` completed successfully
- [ ] `terraform validate` passes
- [ ] `terraform fmt` applied
- [ ] `terraform plan` reviewed (check destroy/replace operations)
- [ ] Backend configured correctly (for team environments)
- [ ] State locking enabled (if remote backend)
- [ ] Sensitive variables marked `sensitive = true`
- [ ] Provider versions pinned in `terraform.tf`
- [ ] No secrets in version control
- [ ] Blast radius assessed (what could break?)

## Variable Precedence

From highest to lowest:

1. `-var` flag: `terraform apply -var="name=value"`
2. `-var-file` flag: `terraform apply -var-file=prod.tfvars`
3. `*.auto.tfvars` files (alphabetically)
4. `terraform.tfvars` file
5. `TF_VAR_*` environment variables
6. Variable defaults in `variables.tf`

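A quick way to see the precedence in action (the variable name `instance_count` is illustrative):

```bash
# variables.tf default: instance_count = 1
export TF_VAR_instance_count=2                 # overrides the default
echo 'instance_count = 3' > terraform.tfvars   # overrides the env var
terraform plan -var="instance_count=5"         # -var wins: the plan sees 5
```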
66
skills/terraform/references/external-resources.md
Normal file
@@ -0,0 +1,66 @@
# External Resources

Pointers to official documentation and community resources.

## Official HashiCorp Documentation

| Resource | URL | Use For |
|----------|-----|---------|
| Terraform Docs | https://developer.hashicorp.com/terraform/docs | Language reference, CLI commands |
| Terraform Tutorials | https://developer.hashicorp.com/terraform/tutorials | Step-by-step learning paths |
| Language Reference | https://developer.hashicorp.com/terraform/language | HCL syntax, expressions, functions |
| CLI Reference | https://developer.hashicorp.com/terraform/cli | Command options and usage |
| Best Practices | https://developer.hashicorp.com/terraform/cloud-docs/recommended-practices | Official workflow recommendations |

## Terraform Registry

| Resource | URL | Use For |
|----------|-----|---------|
| Provider Registry | https://registry.terraform.io/browse/providers | Find and explore providers |
| Module Registry | https://registry.terraform.io/browse/modules | Pre-built modules |
| Telmate Proxmox | https://registry.terraform.io/providers/Telmate/proxmox/latest/docs | Proxmox provider docs |
| AWS Provider | https://registry.terraform.io/providers/hashicorp/aws/latest/docs | AWS resource reference |

## Proxmox Resources

| Resource | URL | Use For |
|----------|-----|---------|
| Telmate Provider Docs | https://registry.terraform.io/providers/Telmate/proxmox/latest/docs | Resource configuration |
| Telmate GitHub | https://github.com/Telmate/terraform-provider-proxmox | Source, issues, examples |
| Proxmox VE API | https://pve.proxmox.com/pve-docs/api-viewer/ | Understanding API calls |
| Proxmox Wiki | https://pve.proxmox.com/wiki/Main_Page | Proxmox concepts and setup |

## Community Resources

| Resource | URL | Use For |
|----------|-----|---------|
| Terraform Best Practices | https://www.terraform-best-practices.com | Community-maintained guide |
| Awesome Terraform | https://github.com/shuaibiyy/awesome-terraform | Curated list of resources |
| Terraform Weekly | https://www.yourdevopsmentor.com/terraform-weekly | News and updates |

## Learning Resources

| Resource | URL | Use For |
|----------|-----|---------|
| HashiCorp Learn | https://developer.hashicorp.com/terraform/tutorials | Official tutorials |
| Terraform Up & Running | https://www.terraformupandrunning.com/ | Comprehensive book |

## Tools

| Tool | URL | Use For |
|------|-----|---------|
| TFLint | https://github.com/terraform-linters/tflint | Linting and best practices |
| Checkov | https://github.com/bridgecrewio/checkov | Security scanning |
| Infracost | https://github.com/infracost/infracost | Cost estimation |
| Terragrunt | https://terragrunt.gruntwork.io/ | DRY Terraform configurations |
| tfenv | https://github.com/tfutils/tfenv | Terraform version management |

## Quick Links

**Most commonly needed:**

1. **HCL Syntax**: https://developer.hashicorp.com/terraform/language/syntax/configuration
2. **Functions**: https://developer.hashicorp.com/terraform/language/functions
3. **Expressions**: https://developer.hashicorp.com/terraform/language/expressions
4. **Backend Configuration**: https://developer.hashicorp.com/terraform/language/settings/backends
5. **Proxmox VM Resource**: https://registry.terraform.io/providers/Telmate/proxmox/latest/docs/resources/vm_qemu
165
skills/terraform/references/module-design.md
Normal file
@@ -0,0 +1,165 @@
# Module Design

## Standard Structure

```
modules/<name>/
├── main.tf       # Resources
├── variables.tf  # Inputs
├── outputs.tf    # Outputs
└── versions.tf   # Provider constraints
```

## Module Example

```hcl
# modules/vm/variables.tf
variable "name" {
  description = "VM name"
  type        = string
}

variable "target_node" {
  description = "Proxmox node"
  type        = string
}

variable "specs" {
  type = object({
    cores  = number
    memory = number
    disk   = optional(string, "50G")
  })
}
```

```hcl
# modules/vm/main.tf
resource "proxmox_vm_qemu" "vm" {
  name        = var.name
  target_node = var.target_node
  cores       = var.specs.cores
  memory      = var.specs.memory
}
```

```hcl
# modules/vm/outputs.tf
output "ip" {
  value = proxmox_vm_qemu.vm.default_ipv4_address
}
```

```hcl
# Usage
module "web" {
  source      = "./modules/vm"
  name        = "web-01"
  target_node = "pve1"
  specs       = { cores = 4, memory = 8192 }
}
```

## Complex Variable Types

```hcl
# Map of objects
variable "vms" {
  type = map(object({
    node   = string
    cores  = number
    memory = number
  }))
}

# Object with optional fields
variable "network" {
  type = object({
    bridge = string
    vlan   = optional(number)
    ip     = optional(string, "dhcp")
  })
}
```

## Variable Validation

```hcl
variable "environment" {
  type = string
  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "Must be dev, staging, or prod."
  }
}

variable "cores" {
  type = number
  validation {
    condition     = var.cores >= 1 && var.cores <= 32
    error_message = "Cores must be 1-32."
  }
}
```

## Module Composition

```hcl
module "network" {
  source = "../../modules/network"
  # ...
}

module "web" {
  source     = "../../modules/vm"
  network_id = module.network.id # Implicit dependency
}

module "database" {
  source     = "../../modules/vm"
  depends_on = [module.network] # Explicit dependency
}
```

## for_each vs count

```hcl
# count - index-based (0, 1, 2)
module "worker" {
  source = "./modules/vm"
  count  = 3
  name   = "worker-${count.index}"
}
# Access: module.worker[0]

# for_each - key-based (preferred)
module "vm" {
  source   = "./modules/vm"
  for_each = var.vms
  name     = each.key
  specs    = each.value
}
# Access: module.vm["web"]
```

## Version Constraints

```hcl
# modules/vm/versions.tf
terraform {
  required_version = ">= 1.0"
  required_providers {
    proxmox = {
      source  = "telmate/proxmox"
      version = "~> 3.0"
    }
  }
}
```

```hcl
# Pin module version
module "vm" {
  source = "git::https://github.com/org/modules.git//vm?ref=v2.1.0"
}
```
44
skills/terraform/references/proxmox/authentication.md
Normal file
@@ -0,0 +1,44 @@
# Proxmox Provider Authentication

## Provider Configuration

```hcl
terraform {
  required_providers {
    proxmox = {
      source  = "telmate/proxmox"
      version = "~> 3.0"
    }
  }
}

provider "proxmox" {
  pm_api_url          = "https://proxmox.example.com:8006/api2/json"
  pm_api_token_id     = "terraform@pve!mytoken"
  pm_api_token_secret = var.pm_api_token_secret
  pm_tls_insecure     = false # true for self-signed certs
  pm_parallel         = 4     # concurrent operations
  pm_timeout          = 600   # API timeout seconds
}
```

## Create API Token

```bash
pveum user add terraform@pve
pveum aclmod / -user terraform@pve -role PVEAdmin
pveum user token add terraform@pve mytoken
```

## Environment Variables

```bash
export PM_API_TOKEN_ID="terraform@pve!mytoken"
export PM_API_TOKEN_SECRET="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
```

## Official Resources

- [Provider Docs](https://registry.terraform.io/providers/Telmate/proxmox/latest/docs)
- [GitHub](https://github.com/Telmate/terraform-provider-proxmox)
- [Proxmox API](https://pve.proxmox.com/pve-docs/api-viewer/)
86
skills/terraform/references/proxmox/gotchas.md
Normal file
@@ -0,0 +1,86 @@
# Proxmox Provider Gotchas

Critical issues when using the Telmate Proxmox provider with Terraform.

## 1. Cloud-Init Changes Not Tracked

Terraform does **not** detect changes to cloud-init snippet file contents.

```hcl
# PROBLEM: Changing vendor-data.yml won't trigger replacement
resource "proxmox_vm_qemu" "vm" {
  cicustom = "vendor=local:snippets/vendor-data.yml"
}

# SOLUTION: Use replace_triggered_by
resource "local_file" "vendor_data" {
  filename = "vendor-data.yml"
  content  = templatefile("vendor-data.yml.tftpl", { ... })
}

resource "proxmox_vm_qemu" "vm" {
  cicustom = "vendor=local:snippets/vendor-data.yml"

  lifecycle {
    replace_triggered_by = [
      local_file.vendor_data.content_base64sha256
    ]
  }
}
```

## 2. Storage Type vs Storage Pool

These are different concepts; don't confuse them:

```hcl
disks {
  scsi {
    scsi0 {
      disk {
        storage = "local-lvm" # Pool NAME (from Proxmox datacenter)
        size    = "50G"
      }
    }
  }
}
scsihw = "virtio-scsi-single" # Controller TYPE
```

- **Storage pool** = where the data is stored (local-lvm, ceph-pool, nfs-share)
- **Disk type** = interface (scsi, virtio, ide, sata)

## 3. Network Interface Naming

Proxmox VMs get predictable interface names based on NIC device order:

| NIC Order | Guest Name |
|-----------|------------|
| First | ens18 |
| Second | ens19 |
| Third | ens20 |

**NOT** eth0, eth1. Configure cloud-init netplan matching `ens*`.

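A quick check from inside the guest (a sketch; names depend on device order):

```bash
# Show predictable interface names as the guest sees them
ip -brief link show

# Confirm which interface actually carries the address
ip -brief address show
```
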
## 4. API Token Expiration

Long operations (20+ VMs) can exceed the token lifetime.

```hcl
provider "proxmox" {
  pm_api_token_id     = "terraform@pve!mytoken"
  pm_api_token_secret = var.pm_api_token_secret
  pm_timeout          = 1200 # 20 minutes for large operations
}
```

Use API tokens (longer-lived), not passwords.

## 5. Full Clone vs Linked Clone

```hcl
full_clone = true  # Independent copy - safe, slower, more storage
full_clone = false # References template - BREAKS if template modified
```

**Always use `full_clone = true` for production.** Use linked clones only for disposable test VMs.
66
skills/terraform/references/proxmox/troubleshooting.md
Normal file
@@ -0,0 +1,66 @@
# Proxmox Troubleshooting

## VM Creation Stuck

```
Timeout waiting for VM to be created
```

**Causes**: Template missing, storage full, network unreachable
**Debug**: Check the Proxmox task log in the web UI

## Clone Failed

```
VM template not found
```

**Check**: `qm list | grep template-name`
**Causes**: Template doesn't exist, wrong node, permission issue

## SSH Timeout

```
Timeout waiting for SSH
```

**Debug**:
1. VM console in Proxmox UI
2. `cloud-init status` on the VM
3. `ip addr` to verify network

**Causes**: Cloud-init failed, network misconfigured, firewall

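A few commands to run from the VM console when SSH never comes up (a sketch):

```bash
# Did cloud-init finish, and with what result?
cloud-init status --long

# Look for errors in the cloud-init logs
grep -i error /var/log/cloud-init.log /var/log/cloud-init-output.log

# Confirm the NIC has an address and a default route
ip addr
ip route
```
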
## State Drift

```
Plan shows changes for unchanged resources
```

**Causes**: Manual changes in the Proxmox UI, provider bug
**Fix**:
```bash
terraform refresh
terraform plan # Verify
```

## API Errors

```
500 Internal Server Error
```

**Causes**: Invalid config, resource constraints, API timeout
**Debug**: Check `/var/log/pveproxy/access.log` on the Proxmox node

## Permission Denied

```
Permission check failed
```

**Fix**: Verify the API token has the required permissions:
```bash
pveum acl list
pveum user permissions terraform@pve
```
86
skills/terraform/references/proxmox/vm-qemu.md
Normal file
@@ -0,0 +1,86 @@
# proxmox_vm_qemu Resource

## Basic VM from Template

```hcl
resource "proxmox_vm_qemu" "vm" {
  name        = "my-vm"
  target_node = "pve1"
  clone       = "ubuntu-template"
  full_clone  = true

  cores   = 4
  sockets = 1
  memory  = 8192
  cpu     = "host"

  onboot = true
  agent  = 1 # QEMU guest agent

  scsihw = "virtio-scsi-single"
  disks {
    scsi {
      scsi0 {
        disk {
          storage = "local-lvm"
          size    = "50G"
        }
      }
    }
  }

  network {
    bridge = "vmbr0"
    model  = "virtio"
  }

  # Cloud-init
  os_type   = "cloud-init"
  ciuser    = "ubuntu"
  sshkeys   = var.ssh_public_key
  ipconfig0 = "ip=dhcp"
  # Static: ipconfig0 = "ip=192.168.1.10/24,gw=192.168.1.1"

  # Custom cloud-init
  cicustom = "vendor=local:snippets/vendor-data.yml"
}
```

## Lifecycle Management

```hcl
lifecycle {
  prevent_destroy = true # Block accidental deletion

  ignore_changes = [
    network, # Ignore manual changes
  ]

  replace_triggered_by = [
    local_file.cloud_init.content_base64sha256
  ]

  create_before_destroy = true # Blue-green deployment
}
```

## Multiple VMs with for_each

```hcl
variable "vms" {
  type = map(object({
    node   = string
    cores  = number
    memory = number
  }))
}

resource "proxmox_vm_qemu" "vm" {
  for_each    = var.vms
  name        = each.key
  target_node = each.value.node
  cores       = each.value.cores
  memory      = each.value.memory
  # ...
}
```
92
skills/terraform/references/security.md
Normal file
@@ -0,0 +1,92 @@
# Security

## Secrets Management

### Environment Variables (Recommended)

```bash
export TF_VAR_proxmox_password="secret"
export TF_VAR_api_token="xxxxx"
terraform apply
```

### Sensitive Variables

```hcl
variable "database_password" {
  type      = string
  sensitive = true # Hidden in logs/plan
}
```

### External Secrets Managers

**HashiCorp Vault**:
```hcl
data "vault_generic_secret" "db" {
  path = "secret/database"
}

resource "some_resource" "x" {
  password = data.vault_generic_secret.db.data["password"]
}
```

**1Password CLI**:
```bash
export TF_VAR_password="$(op read 'op://vault/item/password')"
terraform apply
```

## State Security

**CRITICAL**: State contains secrets in plaintext.

### Encrypt at Rest

```hcl
backend "s3" {
  encrypt    = true
  kms_key_id = "arn:aws:kms:..." # Optional KMS
}
```

### Restrict Access

- IAM/RBAC on backend storage
- Enable state locking
- Never commit state to git

## Provider Credentials

```hcl
provider "proxmox" {
  pm_api_token_id     = "terraform@pve!mytoken"
  pm_api_token_secret = var.pm_api_token_secret # From env
}
```

Create a minimal-permission API user:
```bash
pveum user add terraform@pve
pveum aclmod / -user terraform@pve -role PVEVMAdmin
pveum user token add terraform@pve terraform-token
```

## Sensitive Outputs

```hcl
output "db_password" {
  value     = random_password.db.result
  sensitive = true
}
```

## Checklist

- [ ] Sensitive vars marked `sensitive = true`
- [ ] Secrets via env vars or a secrets manager
- [ ] State backend encryption enabled
- [ ] State locking enabled
- [ ] No credentials in .tf files
- [ ] Provider credentials use minimal permissions
112
skills/terraform/references/state-management.md
Normal file
@@ -0,0 +1,112 @@
# State Management

## Remote Backend (Recommended)

```hcl
terraform {
  backend "s3" {
    bucket         = "terraform-state"
    key            = "project/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks" # State locking
  }
}
```

### S3-Compatible (MinIO, Ceph)

```hcl
terraform {
  backend "s3" {
    bucket = "terraform-state"
    key    = "project/terraform.tfstate"
    region = "us-east-1" # Required but ignored

    endpoint                    = "https://minio.example.com"
    skip_credentials_validation = true
    skip_metadata_api_check     = true
    skip_region_validation      = true
    force_path_style            = true
  }
}
```

## State Operations

```bash
# List resources
terraform state list
terraform state list proxmox_vm_qemu.*

# Show resource details
terraform state show proxmox_vm_qemu.web

# Rename resource
terraform state mv proxmox_vm_qemu.old proxmox_vm_qemu.new

# Move to module
terraform state mv proxmox_vm_qemu.web module.web.proxmox_vm_qemu.main

# Remove from state (doesn't destroy)
terraform state rm proxmox_vm_qemu.orphaned

# Import existing resource
terraform import proxmox_vm_qemu.web pve1/qemu/100

# Update state from infrastructure
terraform refresh
```

## State Migration

```bash
# Change backend: update the terraform block, then:
terraform init -migrate-state

# Reinitialize without migration
terraform init -reconfigure
```

## State Locking

Prevents concurrent modifications. Enable via backend config:
- S3: `dynamodb_table`
- Consul: Built-in
- HTTP: `lock_address`

### Force Unlock (Emergency)

```bash
# Only when certain no other operation is running
terraform force-unlock LOCK_ID
```

## Troubleshooting

### State Lock Timeout

```
Error: Error acquiring state lock
```

1. Wait for the other operation to finish
2. Verify no Terraform process is still running
3. `terraform force-unlock LOCK_ID` if safe

### State Drift

```
Plan shows unexpected changes
```

```bash
terraform refresh # Update state from real infra
terraform plan    # Review changes
```

### Corrupted State

1. Restore from backup
2. `terraform state pull > backup.tfstate`
3. Last resort: `terraform state rm` and re-import

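One recovery path, sketched with the standard state commands (the resource address and import ID are placeholders):

```bash
# Snapshot whatever state you still have
terraform state pull > corrupt-backup.tfstate

# Drop the broken entry, then re-import the real resource
terraform state rm proxmox_vm_qemu.web
terraform import proxmox_vm_qemu.web pve1/qemu/100

# Confirm state and reality agree again
terraform plan
```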