# Error Handling Patterns ## Overview Proper error handling in Ansible ensures playbooks are robust, idempotent, and provide clear failure messages. This guide covers patterns from the Virgo-Core repository. ## Core Concepts ### changed_when Controls when Ansible reports a task as "changed". Critical for idempotency with `command` and `shell` modules. **Syntax:** ```yaml changed_when: ``` ### failed_when Controls when Ansible considers a task as failed. Allows graceful handling of expected errors. **Syntax:** ```yaml failed_when: ``` ### register Captures task output for later inspection and conditional logic. **Syntax:** ```yaml register: variable_name ``` ## Pattern 1: Idempotent Command Execution ### Problem `command` and `shell` modules always report "changed" even if nothing changed. ### Solution Use `changed_when` to detect actual changes: **Example from repository:** ```yaml - name: Create Proxmox API token ansible.builtin.command: > pveum user token add {{ system_username }}@{{ proxmox_user_realm }} {{ proxmox_token_name }} register: token_result changed_when: "'already exists' not in token_result.stderr" failed_when: - token_result.rc != 0 - "'already exists' not in token_result.stderr" no_log: true ``` **Explanation:** 1. `register: token_result` - Captures command output 2. `changed_when: "'already exists' not in token_result.stderr"` - Only report "changed" if token didn't already exist 3. `failed_when` - Don't fail if token already exists (expected scenario) ## Pattern 2: Check Before Create ### Problem Creating resources that may already exist causes unnecessary errors. ### Solution Check for existence first, create conditionally: **Example:** ```yaml - name: Check if VM template exists ansible.builtin.shell: | set -o pipefail qm list | awk '{print $1}' | grep -q "^{{ template_id }}$" args: executable: /bin/bash register: template_exists changed_when: false # Checking doesn't change anything failed_when: false # Don't fail if template not found - name: Create VM template ansible.builtin.command: > qm create {{ template_id }} --name {{ template_name }} --memory 2048 --cores 2 when: template_exists.rc != 0 # Only create if check failed (doesn't exist) register: create_result ``` **Key points:** - `changed_when: false` - Read-only operation - `failed_when: false` - Expected that template might not exist - `when: template_exists.rc != 0` - Conditional creation ## Pattern 3: Verify After Create ### Problem Resource creation appears to succeed but may have failed silently. ### Solution Verify resource exists after creation: **Example:** ```yaml - name: Create VM ansible.builtin.command: > qm create {{ vmid }} --name {{ vm_name }} --memory 4096 register: create_result - name: Verify VM was created ansible.builtin.shell: | set -o pipefail qm list | grep "{{ vmid }}" args: executable: /bin/bash register: verify_result changed_when: false failed_when: verify_result.rc != 0 ``` ## Pattern 4: Graceful Failure Handling ### Problem Task failures may be expected in certain scenarios. ### Solution Use `failed_when` with specific conditions: **Example:** ```yaml - name: Try to stop service ansible.builtin.systemd: name: myservice state: stopped register: stop_result failed_when: - stop_result.failed - "'not found' not in stop_result.msg" # Allow failure if service doesn't exist ``` **Multiple failure conditions:** ```yaml - name: Run migration ansible.builtin.command: /usr/bin/migrate-database register: migrate_result failed_when: - migrate_result.rc != 0 - "'already applied' not in migrate_result.stdout" - "'no changes' not in migrate_result.stdout" # Success if: rc=0, OR "already applied", OR "no changes" ``` ## Pattern 5: Block with Rescue ### Problem Need to handle failures and perform cleanup. ### Solution Use `block`/`rescue`/`always`: **Example:** ```yaml - name: Deploy application block: - name: Stop application ansible.builtin.systemd: name: myapp state: stopped - name: Deploy new version ansible.builtin.copy: src: myapp-v2.0 dest: /usr/bin/myapp - name: Start application ansible.builtin.systemd: name: myapp state: started rescue: - name: Rollback to previous version ansible.builtin.copy: src: myapp-backup dest: /usr/bin/myapp - name: Start application (rollback) ansible.builtin.systemd: name: myapp state: started - name: Report failure ansible.builtin.fail: msg: "Deployment failed, rolled back to previous version" always: - name: Cleanup temp files ansible.builtin.file: path: /tmp/deploy-* state: absent ``` **Explanation:** - `block:` - Main tasks - `rescue:` - Runs if any task in block fails - `always:` - Runs regardless of success/failure ## Pattern 6: Retry with Until ### Problem Transient failures need retries before giving up. ### Solution Use `until`, `retries`, `delay`: **Example:** ```yaml - name: Wait for service to be ready ansible.builtin.uri: url: http://localhost:8080/health status_code: 200 register: health_check until: health_check.status == 200 retries: 30 delay: 10 # Retry every 10 seconds, up to 30 times (5 minutes total) ``` **With command:** ```yaml - name: Wait for VM to get IP address ansible.builtin.command: qm agent {{ vmid }} network-get-interfaces register: vm_network until: vm_network.rc == 0 retries: 12 delay: 5 changed_when: false ``` ## Pattern 7: Conditional Failure Messages ### Problem Generic failure messages don't help with troubleshooting. ### Solution Use `ansible.builtin.fail` with conditional messages: **Example:** ```yaml - name: Check prerequisites ansible.builtin.command: which docker register: docker_check changed_when: false failed_when: false - name: Fail if Docker not installed ansible.builtin.fail: msg: | Docker is not installed on {{ inventory_hostname }} Please install Docker before running this playbook. Installation: sudo apt install docker.io when: docker_check.rc != 0 - name: Check Docker version ansible.builtin.command: docker --version register: docker_version changed_when: false - name: Validate Docker version ansible.builtin.fail: msg: | Docker version is too old: {{ docker_version.stdout }} Minimum required version: 20.10 when: docker_version.stdout is version('20.10', '<') ``` ## Pattern 8: Assert for Validation ### Problem Need to validate multiple conditions with clear error messages. ### Solution Use `ansible.builtin.assert`: **Example from repository:** ```yaml - name: Validate required variables ansible.builtin.assert: that: - secret_name is defined and secret_name|trim|length > 0 - secret_var_name is defined and secret_var_name|trim|length > 0 fail_msg: "secret_name and secret_var_name must be provided and non-empty" success_msg: "All required variables present" quiet: true no_log: true ``` **Multiple assertions:** ```yaml - name: Validate VM configuration ansible.builtin.assert: that: - vm_memory >= 2048 - vm_cores >= 2 - vm_disk_size >= 20 - vm_name is match('^[a-z0-9-]+$') fail_msg: | Invalid VM configuration: - Memory must be >= 2048 MB (got: {{ vm_memory }}) - Cores must be >= 2 (got: {{ vm_cores }}) - Disk must be >= 20 GB (got: {{ vm_disk_size }}) - Name must be lowercase alphanumeric with hyphens (got: {{ vm_name }}) ``` ## Pattern 9: Ignore Errors Temporarily ### Problem Task may fail but playbook should continue. ### Solution Use `ignore_errors` (sparingly!): **Example:** ```yaml - name: Try to remove old backup ansible.builtin.file: path: /backup/old-backup.tar.gz state: absent ignore_errors: true # OK if file doesn't exist register: cleanup_result - name: Report cleanup result ansible.builtin.debug: msg: "Cleanup {{ 'successful' if not cleanup_result.failed else 'skipped (file not found)' }}" ``` **Better approach with failed_when:** ```yaml - name: Remove old backup ansible.builtin.file: path: /backup/old-backup.tar.gz state: absent register: cleanup_result failed_when: - cleanup_result.failed - "'does not exist' not in cleanup_result.msg" ``` ## Pattern 10: Task Delegation ### Problem Need to run task locally or on a different host. ### Solution Use `delegate_to`: **Example:** ```yaml - name: Check API endpoint from controller ansible.builtin.uri: url: "https://{{ inventory_hostname }}:8006/api2/json/version" validate_certs: false delegate_to: localhost register: api_check failed_when: api_check.status != 200 ``` ## Complete Example: Robust VM Creation **Combining multiple patterns:** ```yaml --- - name: Create Proxmox VM with robust error handling hosts: proxmox_nodes gather_facts: false vars: vmid: 101 vm_name: docker-01-nexus tasks: - name: Validate VM configuration ansible.builtin.assert: that: - vmid is defined and vmid >= 100 - vm_name is match('^[a-z0-9-]+$') fail_msg: "Invalid VM configuration" - name: Check if VM already exists ansible.builtin.shell: | set -o pipefail qm list | awk '{print $1}' | grep -q "^{{ vmid }}$" args: executable: /bin/bash register: vm_exists changed_when: false failed_when: false - name: Create VM block: - name: Clone template ansible.builtin.command: > qm clone 9000 {{ vmid }} --name {{ vm_name }} --full --storage local-lvm when: vm_exists.rc != 0 register: clone_result changed_when: true - name: Wait for clone to complete ansible.builtin.pause: seconds: 5 when: clone_result is changed - name: Verify VM exists ansible.builtin.shell: | set -o pipefail qm list | grep "{{ vmid }}" args: executable: /bin/bash register: verify_vm changed_when: false failed_when: verify_vm.rc != 0 retries: 3 delay: 5 until: verify_vm.rc == 0 - name: Configure VM ansible.builtin.command: > qm set {{ vmid }} --memory 4096 --cores 4 --ipconfig0 ip=192.168.1.100/24,gw=192.168.1.1 register: config_result changed_when: true - name: Start VM ansible.builtin.command: qm start {{ vmid }} register: start_result changed_when: true rescue: - name: Cleanup failed VM ansible.builtin.command: qm destroy {{ vmid }} when: vm_exists.rc != 0 # Only destroy if we created it ignore_errors: true - name: Report failure ansible.builtin.fail: msg: | Failed to create VM {{ vmid }} Clone result: {{ clone_result.stderr | default('N/A') }} Config result: {{ config_result.stderr | default('N/A') }} Start result: {{ start_result.stderr | default('N/A') }} - name: Report success ansible.builtin.debug: msg: "VM {{ vmid }} ({{ vm_name }}) created successfully" when: vm_exists.rc != 0 ``` ## Best Practices Summary 1. **Use `changed_when: false` for checks** - Read-only operations don't change state 2. **Use `failed_when` for expected errors** - Don't fail on "already exists" scenarios 3. **Always `register` command output** - Needed for `changed_when` and `failed_when` 4. **Use `set -euo pipefail` in shell** - Catch errors in pipes 5. **Validate inputs with assert** - Clear failure messages for bad config 6. **Use blocks for complex operations** - Enable rollback with rescue 7. **Add retries for transient failures** - Network calls, service startup 8. **Verify critical operations** - Check resource exists after creation 9. **Use `no_log` with secrets** - Never log sensitive data 10. **Provide clear error messages** - Help troubleshooting with context ## Anti-Patterns to Avoid ### ❌ Bad: Silent Failures ```yaml - name: Important task ansible.builtin.command: critical-operation ignore_errors: true # Hides failures! ``` ### ❌ Bad: No Error Context ```yaml - name: Deploy ansible.builtin.command: deploy.sh # No register, no error handling, no context ``` ### ❌ Bad: Always Changed ```yaml - name: Check if exists ansible.builtin.command: check-resource # Missing: changed_when: false ``` ### ✅ Good: Explicit Error Handling ```yaml - name: Critical operation ansible.builtin.command: critical-operation register: result changed_when: "'created' in result.stdout" failed_when: - result.rc != 0 - "'already exists' not in result.stderr" - name: Verify operation ansible.builtin.command: verify-operation changed_when: false failed_when: false register: verify - name: Report result ansible.builtin.fail: msg: "Operation failed: {{ result.stderr }}" when: verify.rc != 0 ``` ## Further Reading - [Ansible Error Handling](https://docs.ansible.com/ansible/latest/user_guide/playbooks_error_handling.html) - [Ansible Conditionals](https://docs.ansible.com/ansible/latest/user_guide/playbooks_conditionals.html) - [Ansible Blocks](https://docs.ansible.com/ansible/latest/user_guide/playbooks_blocks.html)