Files
gh-basher83-lunar-claude-pl…/skills/ansible-best-practices/SKILL.md
2025-11-29 18:00:24 +08:00

392 lines
12 KiB
Markdown

---
name: ansible-best-practices
description: >
Ansible playbook and role patterns using ansible.builtin modules, community.general,
community.proxmox, ansible.posix collections, molecule testing, ansible-lint validation,
and Infisical secrets management. Covers idempotency patterns (changed_when, failed_when,
register), YAML playbook structure, Jinja2 templating, handler patterns, and variable
precedence rules. This skill should be used when writing Ansible playbooks, developing
Ansible roles, testing with molecule/ansible-lint, managing secrets with Infisical,
implementing idempotent task patterns with changed_when/failed_when directives, or
configuring Proxmox/network automation.
---
# Ansible Playbook Best Practices
Expert guidance for writing maintainable, idempotent, and testable Ansible playbooks based on
real-world patterns from this repository.
## Quick Reference
### Pattern Decision Guide
| Need | Use Pattern | Details |
|------|-------------|---------|
| **Use secrets?** | Infisical Secret Management | [patterns/secrets-management.md](patterns/secrets-management.md) |
| **Resource management?** | State-Based Playbooks | [patterns/playbook-role-patterns.md](patterns/playbook-role-patterns.md) |
| **No native module?** | Hybrid Module Approach | See Hybrid Module section below |
| **Task failing?** | Proper Error Handling | [patterns/error-handling.md](patterns/error-handling.md) |
| **Repeating blocks?** | Task Organization | [patterns/task-organization.md](patterns/task-organization.md) |
| **Network config?** | Network Automation | [patterns/network-automation.md](patterns/network-automation.md) |
| **Tasks show 'changed'?** | Idempotency Patterns | [reference/idempotency-patterns.md](reference/idempotency-patterns.md) |
### Golden Rules
1. **Use `uv run` prefix** - Always: `uv run ansible-playbook`
2. **Fully qualify modules** - `ansible.builtin.copy` not `copy`
3. **Secrets via Infisical** - Use reusable task pattern
4. **Control `command`/`shell`** - Always use `changed_when`, `failed_when`
5. **Use `set -euo pipefail`** - In all shell scripts
6. **Tag sensitive tasks** - Use `no_log: true`
7. **Idempotency first** - Check before create, verify after
### Common Commands
```bash
# Lint
mise run ansible-lint
# Analyze complexity
./tools/analyze_playbook.py ansible/playbooks/my-playbook.yml
# Check idempotency
./tools/check_idempotency.py ansible/playbooks/my-playbook.yml
# Run with secrets
cd ansible && uv run ansible-playbook playbooks/my-playbook.yml
```
## Core Patterns from This Repository
### 1. Infisical Secret Management
This repository uses **Infisical** for centralized secrets management.
**Quick Pattern:**
```yaml
- name: Retrieve Proxmox credentials
ansible.builtin.include_tasks: tasks/infisical-secret-lookup.yml
vars:
secret_name: 'PROXMOX_PASSWORD'
secret_var_name: 'proxmox_password'
fallback_env_var: 'PROXMOX_PASSWORD' # Optional
```
**Key Features:** Validates authentication, proper `no_log`, fallback to env vars, reusable across playbooks.
See [patterns/secrets-management.md](patterns/secrets-management.md) for complete guide including
authentication methods, security best practices, and CI/CD integration.
### 2. State-Based Playbooks
**Pattern:** Single playbook handles both create and remove via `state` variable.
```yaml
# Create user (default)
uv run ansible-playbook playbooks/create-admin-user.yml \
-e "admin_name=alice" -e "admin_ssh_key='ssh-ed25519 ...'"
# Remove user (add state=absent)
uv run ansible-playbook playbooks/create-admin-user.yml \
-e "admin_name=alice" -e "admin_state=absent"
```
**Why:** Follows community role patterns, single source of truth, consistent interface, less duplication.
See [patterns/playbook-role-patterns.md](patterns/playbook-role-patterns.md) for complete implementation details and advanced patterns.
### 3. Hybrid Module Approach
**Pattern:** Use native modules where available, fall back to `command` when needed.
```yaml
# GOOD: Native module
- name: Create Linux system user
ansible.builtin.user:
name: "{{ system_username }}"
state: present
# ACCEPTABLE: Command when no native module exists
- name: Create Proxmox API token
ansible.builtin.command: >
pveum user token add {{ system_username }}@{{ proxmox_user_realm }}
register: token_result
changed_when: "'already exists' not in token_result.stderr"
failed_when:
- token_result.rc != 0
- "'already exists' not in token_result.stderr"
```
**Key:** `changed_when` and `failed_when` make `command` module idempotent.
### 4. Proper Error Handling
```yaml
- name: Check if resource exists
ansible.builtin.command: check-resource {{ resource_id }}
register: resource_check
changed_when: false # Read-only operation
failed_when: false # Don't fail, check in next task
- name: Fail if resource missing
ansible.builtin.fail:
msg: "Resource {{ resource_id }} not found"
when: resource_check.rc != 0
```
See [patterns/error-handling.md](patterns/error-handling.md) for comprehensive patterns.
### 5. Task Organization
**Reusable Tasks Pattern:**
```yaml
# In playbook
- name: Get database password
ansible.builtin.include_tasks: "{{ playbook_dir }}/../tasks/infisical-secret-lookup.yml"
vars:
secret_name: 'DB_PASSWORD'
secret_var_name: 'db_password'
```
Extract common patterns to `tasks/` directory, use `include_tasks` with clear variable contracts.
See [patterns/task-organization.md](patterns/task-organization.md) and [patterns/reusable-tasks.md](patterns/reusable-tasks.md).
### 6. Network Automation
**Pattern:** Use `community.general.interfaces_file` for network configuration.
```yaml
- name: Enable VLAN-aware bridging
community.general.interfaces_file:
iface: vmbr1
option: bridge-vlan-aware
value: "yes"
backup: true
state: present
notify: Reload network interfaces
```
Declarative config, automatic backup, handler pattern for reload.
See [patterns/network-automation.md](patterns/network-automation.md) for advanced patterns including VLAN, bonding, and verification.
### 7. Idempotency Patterns
**Use `changed_when` and `failed_when`:**
```yaml
# Check before create
- name: Check if VM exists
ansible.builtin.shell: |
set -o pipefail
qm list | awk '{print $1}' | grep -q "^{{ template_id }}$"
args:
executable: /bin/bash
register: vm_exists
changed_when: false # Checking doesn't change anything
failed_when: false # Don't fail if not found
# Conditional create
- name: Create VM
ansible.builtin.command: qm create {{ template_id }} ...
when: vm_exists.rc != 0
```
See [reference/idempotency-patterns.md](reference/idempotency-patterns.md) for comprehensive patterns.
## Variable Organization
### Quick Summary
**Precedence:** Extra vars (`-e`) > Role vars > Defaults
**Organization:**
```text
ansible/
├── group_vars/all.yml # Variables for ALL hosts
├── group_vars/proxmox.yml # Group-specific
├── host_vars/foxtrot.yml # Host-specific
└── playbooks/
└── my-playbook.yml # Use vars: for playbook-specific
```
**Key principle:** Use `defaults/main.yml` for configurable options, `vars/main.yml` for constants.
See [reference/variable-precedence.md](reference/variable-precedence.md) for complete precedence
rules (22 levels) and
[patterns/variable-management-patterns.md](patterns/variable-management-patterns.md) for
advanced patterns.
## Module Selection
### Prefer ansible.builtin
**Always use fully qualified collection names (FQCN):**
```yaml
# GOOD
- name: Ping hosts
ansible.builtin.ping:
# BAD (deprecated short names)
- name: Ping hosts
ping:
```
### Community Collections in Use
- `community.general` - General utilities (interfaces_file, etc.)
- `community.proxmox` - Proxmox VE management
- `infisical.vault` - Secrets management
- `ansible.posix` - POSIX system management
- `community.docker` - Docker management
See [../../ansible/requirements.yml](../../ansible/requirements.yml) and [reference/collections-guide.md](reference/collections-guide.md).
## Testing
### With ansible-lint
```bash
# Run all linters
mise run lint-all
# Just Ansible
mise run ansible-lint
```
**Common Issues:** Missing `name:` on tasks, using `shell` instead of `command`, not using
`changed_when`, deprecated short names, missing `no_log` on sensitive tasks.
### With Molecule
```bash
cd tools/molecule/default
molecule create # Create test environment
molecule converge # Run playbook
molecule verify # Run tests
molecule destroy # Clean up
```
See [reference/testing-guide.md](reference/testing-guide.md) and [patterns/testing-comprehensive.md](patterns/testing-comprehensive.md) for CI/CD integration.
## Common Anti-Patterns
See [anti-patterns/common-mistakes.md](anti-patterns/common-mistakes.md) for detailed examples.
### Quick List
**1. Not Using `set -euo pipefail`**
```yaml
# GOOD
- name: Run script
ansible.builtin.shell: |
set -euo pipefail
command1 | command2
args:
executable: /bin/bash
```
**2. Missing `no_log` on Secrets**
```yaml
# GOOD
- name: Set password
ansible.builtin.command: set-password {{ password }}
no_log: true
```
**3. Using `shell` When `command` Suffices**
Use `shell` ONLY when you need shell features (pipes, redirects, etc.).
```yaml
# GOOD: No shell features needed
- name: List files
ansible.builtin.command: ls -la
```
See [anti-patterns/common-mistakes.md](anti-patterns/common-mistakes.md) for complete list and
[anti-patterns/refactoring-guide.md](anti-patterns/refactoring-guide.md) for improvement
strategies.
## Tools Available
### Python Analysis Tools (uv)
```bash
# Complexity metrics
./tools/analyze_playbook.py playbook.yml
# Find non-idempotent patterns
./tools/check_idempotency.py playbook.yml
# Variable organization helper
./tools/extract_variables.py playbook.yml
```
### Linting
```bash
# Run all linters
./tools/lint-all.sh
```
### Testing
```bash
# Molecule test scenarios
./tools/molecule/default/
```
## Progressive Disclosure
Start here, drill down as needed:
### Quick Reference (Read First)
- [Playbook & Role Patterns](patterns/playbook-role-patterns.md) - State-based playbooks, public API variables, validation
- [Secrets Management](patterns/secrets-management.md) - Infisical integration, authentication, security
### Deep Patterns (Read When Needed)
- [Testing Comprehensive](patterns/testing-comprehensive.md) - Molecule, CI/CD, test strategies
- [Role Structure Standards](patterns/role-structure-standards.md) - Directory org, naming conventions
- [Documentation Templates](patterns/documentation-templates.md) - README structure, variable docs
- [Variable Management Patterns](patterns/variable-management-patterns.md) - defaults vs vars, naming
- [Handler Best Practices](patterns/handler-best-practices.md) - Handler usage patterns
- [Meta Dependencies](patterns/meta-dependencies.md) - galaxy_info, dependencies
### Advanced Automation (from ProxSpray Analysis)
- [Cluster Automation](patterns/cluster-automation.md) - Proxmox cluster formation with idempotency
- [Network Automation](patterns/network-automation.md) - Declarative network configuration
- [CEPH Automation](patterns/ceph-automation.md) - Complete CEPH storage deployment
### Core Reference
- [Roles vs Playbooks](reference/roles-vs-playbooks.md) - Organization patterns
- [Variable Precedence](reference/variable-precedence.md) - Complete precedence rules (22 levels)
- [Idempotency Patterns](reference/idempotency-patterns.md) - Advanced idempotency techniques
- [Module Selection](reference/module-selection.md) - Builtin vs community decision guide
- [Testing Guide](reference/testing-guide.md) - Molecule and ansible-lint deep dive
- [Collections Guide](reference/collections-guide.md) - Using and managing collections
- [Production Repos](reference/production-repos.md) - Studied geerlingguy roles index
### Patterns & Anti-Patterns
- [Error Handling](patterns/error-handling.md) - Proper error handling patterns
- [Task Organization](patterns/task-organization.md) - Reusable tasks and includes
- [Common Mistakes](anti-patterns/common-mistakes.md) - What to avoid
- [Refactoring Guide](anti-patterns/refactoring-guide.md) - How to improve existing playbooks
## Related Skills
- **Proxmox Infrastructure** - Playbooks for template creation and network config
- **NetBox + PowerDNS** - Dynamic inventory and secrets management patterns