Initial commit

2025-11-29 18:00:27 +08:00
commit 0c6988a884
19 changed files with 5729 additions and 0 deletions
--- a/skills/proxmox-infrastructure/workflows/ceph-deployment.md
+++ b/skills/proxmox-infrastructure/workflows/ceph-deployment.md
@@ -0,0 +1,782 @@
+# CEPH Storage Deployment Workflow
+
+Complete guide to deploying CEPH storage on a Proxmox VE cluster with automated OSD creation, pool
+configuration, and health verification.
+
+## Overview
+
+This workflow automates CEPH deployment with:
+
+- CEPH package installation
+- Cluster initialization with proper network configuration
+- Monitor and manager creation across all nodes
+- Automated OSD creation with partition support
+- Pool configuration with replication and compression
+- Comprehensive health verification
+
+## Prerequisites
+
+Before deploying CEPH:
+
+1. **Cluster must be formed:**
+   - Proxmox cluster already initialized and healthy
+   - All nodes showing quorum
+   - See [Cluster Formation](cluster-formation.md) first
+
+2. **Network requirements:**
+   - Dedicated CEPH public network (192.168.5.0/24 for Matrix)
+   - Dedicated CEPH private/cluster network (192.168.7.0/24 for Matrix)
+   - MTU 9000 (jumbo frames) configured on CEPH networks
+   - Bridges configured: vmbr1 (public), vmbr2 (private)
+
+3. **Storage requirements:**
+   - Dedicated disks for OSDs (not boot disks)
+   - All OSD disks should be the same type (SSD/NVMe)
+   - Matrix: 2× 4TB Samsung 990 PRO NVMe per node (3 nodes × 2 × 4TB = 24TB raw)
+
+4. **System requirements:**
+   - Minimum 3 nodes for production (replication factor 3)
+   - At least 4GB RAM per OSD
+   - Fast network (10GbE recommended for CEPH networks)
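+
+The RAM guideline can be enforced before anything is installed. This is a sketch, not part of the role shown below: it assumes facts are gathered and that `ceph_osds` is laid out as in the Matrix example later in this guide. `planned_osd_count` is a hypothetical fact name, and 4096 MB per OSD is only a floor, because the host OS and any guests need memory too.
+
+```yaml
+# Hypothetical pre-flight tasks (not part of roles/proxmox_ceph)
+- name: Count OSDs planned for this node
+  ansible.builtin.set_fact:
+    planned_osd_count: >-
+      {{ ceph_osds[inventory_hostname_short] | default([])
+         | map(attribute='partitions', default=1) | map('int') | sum }}
+
+- name: Verify at least 4 GB RAM per planned OSD
+  ansible.builtin.assert:
+    that:
+      - ansible_facts['memtotal_mb'] | int >= (planned_osd_count | int) * 4096
+    fail_msg: >-
+      {{ inventory_hostname }} has {{ ansible_facts['memtotal_mb'] }} MB RAM but
+      {{ planned_osd_count }} OSDs need at least {{ (planned_osd_count | int) * 4096 }} MB
+```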
+
+## Phase 1: Install CEPH Packages
+
+### Step 1: Install CEPH
+
+```yaml
+# roles/proxmox_ceph/tasks/install.yml
+---
+- name: Check if CEPH is already installed
+  ansible.builtin.stat:
+    path: /etc/pve/ceph.conf
+  register: ceph_conf_check
+
+- name: Check CEPH server packages
+  # ceph-common (client tools) ships with Proxmox VE by default, so probe a
+  # server-side package that only `pveceph install` brings in
+  ansible.builtin.command:
+    cmd: dpkg-query -W -f='${Status}' ceph-osd
+  register: ceph_package_check
+  failed_when: false
+  changed_when: false
+
+- name: Install CEPH packages via pveceph
+  ansible.builtin.command:
+    cmd: "pveceph install --repository {{ ceph_repository }}"
+  when: "'install ok installed' not in ceph_package_check.stdout"
+  register: ceph_install
+  changed_when: ceph_install.rc == 0
+
+- name: Verify CEPH installation
+  ansible.builtin.command:
+    cmd: ceph --version
+  register: ceph_version
+  changed_when: false
+  failed_when: ceph_version.rc != 0
+
+- name: Display CEPH version
+  ansible.builtin.debug:
+    msg: "Installed CEPH version: {{ ceph_version.stdout }}"
+```
+
+## Phase 2: Initialize CEPH Cluster
+
+### Step 2: Initialize CEPH (First Node Only)
+
+```yaml
+# roles/proxmox_ceph/tasks/init.yml
+---
+- name: Check if CEPH cluster is initialized
+  ansible.builtin.stat:
+    path: /etc/pve/ceph.conf
+  register: ceph_conf_stat
+
+- name: Set CEPH initialization facts
+  ansible.builtin.set_fact:
+    # pveceph init only writes /etc/pve/ceph.conf; no monitor exists yet, so
+    # `ceph status` cannot detect (or verify) initialization at this stage
+    ceph_initialized: "{{ ceph_conf_stat.stat.exists }}"
+    is_ceph_first_node: "{{ inventory_hostname == groups[cluster_group | default('matrix_cluster')][0] }}"
+
+- name: Initialize CEPH cluster on first node
+  ansible.builtin.command:
+    cmd: >
+      pveceph init
+      --network {{ ceph_network }}
+      --cluster-network {{ ceph_cluster_network }}
+  when:
+    - is_ceph_first_node
+    - not ceph_initialized
+  register: ceph_init
+  changed_when: ceph_init.rc == 0
+
+- name: Verify CEPH configuration was written
+  ansible.builtin.stat:
+    path: /etc/pve/ceph.conf
+  register: ceph_init_verify
+  failed_when: not ceph_init_verify.stat.exists
+  when: is_ceph_first_node
+
+- name: Display initialization result
+  ansible.builtin.debug:
+    msg: "Created /etc/pve/ceph.conf (public {{ ceph_network }}, cluster {{ ceph_cluster_network }})"
+  when:
+    - is_ceph_first_node
+    - ceph_init.changed
+```
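+
+Because `pveceph init` only writes configuration, a cheap follow-up check is to confirm that both networks landed in the shared `/etc/pve/ceph.conf`. This sketch assumes pveceph writes the keys as `public_network = <cidr>` and `cluster_network = <cidr>`; adjust the match if your version formats them differently. The register and variable names are hypothetical.
+
+```yaml
+# Hypothetical verification tasks for roles/proxmox_ceph/tasks/init.yml
+- name: Read generated ceph.conf
+  ansible.builtin.slurp:
+    src: /etc/pve/ceph.conf
+  register: ceph_conf_raw
+  run_once: true
+
+- name: Verify public and cluster networks in ceph.conf
+  ansible.builtin.assert:
+    that:
+      - ('public_network = ' ~ ceph_network) in ceph_conf_text
+      - ('cluster_network = ' ~ ceph_cluster_network) in ceph_conf_text
+    fail_msg: "ceph.conf is missing the expected public/cluster network settings"
+  vars:
+    ceph_conf_text: "{{ ceph_conf_raw.content | b64decode }}"
+  run_once: true
+```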
+
+## Phase 3: Create Monitors and Managers
+
+### Step 3: Create CEPH Monitors
+
+```yaml
+# roles/proxmox_ceph/tasks/monitors.yml
+---
+- name: Check existing CEPH monitors
+  ansible.builtin.command:
+    cmd: ceph mon dump --format json
+  register: mon_dump
+  delegate_to: "{{ groups[cluster_group | default('matrix_cluster')][0] }}"
+  run_once: true
+  failed_when: false
+  changed_when: false
+
+- name: Parse monitor list
+  ansible.builtin.set_fact:
+    existing_monitors: "{{ (mon_dump.stdout | from_json).mons | map(attribute='name') | list }}"
+  when: mon_dump.rc == 0
+
+- name: Set monitor facts
+  ansible.builtin.set_fact:
+    has_monitor: "{{ inventory_hostname_short in existing_monitors | default([]) }}"
+
+- name: Create CEPH monitor on first node
+  ansible.builtin.command:
+    cmd: pveceph mon create
+  when:
+    - is_ceph_first_node
+    - not has_monitor
+  register: mon_create_first
+  changed_when: mon_create_first.rc == 0
+
+- name: Wait for first monitor to stabilize
+  ansible.builtin.pause:
+    seconds: 10
+  when: mon_create_first.changed
+
+- name: Create CEPH monitors on other nodes
+  ansible.builtin.command:
+    cmd: pveceph mon create
+  when:
+    - not is_ceph_first_node
+    - not has_monitor
+  register: mon_create_others
+  changed_when: mon_create_others.rc == 0
+
+- name: Verify monitor quorum
+  ansible.builtin.command:
+    cmd: ceph quorum_status --format json
+  register: quorum_status
+  changed_when: false
+  delegate_to: "{{ groups[cluster_group | default('matrix_cluster')][0] }}"
+  run_once: true
+
+- name: Check monitor quorum size
+  ansible.builtin.assert:
+    that:
+      - (quorum_status.stdout | from_json).quorum | length >= ((groups[cluster_group | default('matrix_cluster')] | length // 2) + 1)
+    fail_msg: "Monitor quorum not established"
+  delegate_to: "{{ groups[cluster_group | default('matrix_cluster')][0] }}"
+  run_once: true
+```
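+
+The majority check above passes as soon as enough monitors agree. If the intent is one monitor per node, a stricter per-host check can look for the node in `quorum_names` from the same `ceph quorum_status` output. This sketch assumes monitors are named after the short hostname, which is the `pveceph mon create` default and the same assumption `has_monitor` makes.
+
+```yaml
+# Hypothetical addition to roles/proxmox_ceph/tasks/monitors.yml
+- name: Verify this node's monitor is in quorum
+  ansible.builtin.assert:
+    that:
+      - inventory_hostname_short in (quorum_status.stdout | from_json).quorum_names
+    fail_msg: "Monitor {{ inventory_hostname_short }} is not in quorum"
+```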
+
+### Step 4: Create CEPH Managers
+
+```yaml
+# roles/proxmox_ceph/tasks/managers.yml
+---
+- name: Check existing CEPH managers
+  ansible.builtin.command:
+    cmd: ceph mgr dump --format json
+  register: mgr_dump
+  delegate_to: "{{ groups[cluster_group | default('matrix_cluster')][0] }}"
+  run_once: true
+  failed_when: false
+  changed_when: false
+
+- name: Parse manager list
+  ansible.builtin.set_fact:
+    existing_managers: "{{ [(mgr_dump.stdout | from_json).active_name] + ((mgr_dump.stdout | from_json).standbys | map(attribute='name') | list) }}"
+  when: mgr_dump.rc == 0
+
+- name: Initialize empty manager list if check failed
+  ansible.builtin.set_fact:
+    existing_managers: []
+  when: mgr_dump.rc != 0
+
+- name: Set manager facts
+  ansible.builtin.set_fact:
+    has_manager: "{{ inventory_hostname_short in (existing_managers | default([])) }}"
+
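+- name: Install CEPH dashboard module package
+  # Not pulled in by pveceph install; it must be present before the managers
+  # start, otherwise enabling the dashboard module below fails
+  ansible.builtin.apt:
+    name: ceph-mgr-dashboard
+    state: present
+
+- name: Create CEPH manager
+  ansible.builtin.command:
+    cmd: pveceph mgr create
+  when: not has_manager
+  register: mgr_create
+  changed_when: mgr_create.rc == 0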
+
+- name: Wait for managers to stabilize
+  ansible.builtin.pause:
+    seconds: 5
+  when: mgr_create.changed
+
+- name: Enable CEPH dashboard module
+  ansible.builtin.command:
+    cmd: ceph mgr module enable dashboard
+  delegate_to: "{{ groups[cluster_group | default('matrix_cluster')][0] }}"
+  run_once: true
+  register: dashboard_enable
+  changed_when: "'already enabled' not in dashboard_enable.stderr"
+  failed_when:
+    - dashboard_enable.rc != 0
+    - "'already enabled' not in dashboard_enable.stderr"
+
+- name: Enable Prometheus module
+  ansible.builtin.command:
+    cmd: ceph mgr module enable prometheus
+  delegate_to: "{{ groups[cluster_group | default('matrix_cluster')][0] }}"
+  run_once: true
+  register: prometheus_enable
+  changed_when: "'already enabled' not in prometheus_enable.stderr"
+  failed_when:
+    - prometheus_enable.rc != 0
+    - "'already enabled' not in prometheus_enable.stderr"
+```
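+
+The fixed 5-second pause may be too short for every standby to register. A sketch that polls the manager map instead, assuming one manager per node as created above (one active, the rest standby); `mgr_dump_after` is a hypothetical register name.
+
+```yaml
+# Hypothetical addition to roles/proxmox_ceph/tasks/managers.yml
+- name: Wait for one active manager plus standbys on the other nodes
+  ansible.builtin.command:
+    cmd: ceph mgr dump --format json
+  register: mgr_dump_after
+  changed_when: false
+  delegate_to: "{{ groups[cluster_group | default('matrix_cluster')][0] }}"
+  run_once: true
+  until: >-
+    (mgr_dump_after.stdout | from_json).available
+    and ((mgr_dump_after.stdout | from_json).standbys | length)
+    == (groups[cluster_group | default('matrix_cluster')] | length) - 1
+  retries: 12
+  delay: 5
+```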
+
+## Phase 4: Create OSDs
+
+### Step 5: Prepare and Create OSDs
+
+```yaml
+# roles/proxmox_ceph/tasks/osd_create.yml
+---
+- name: Get list of existing OSDs
+  ansible.builtin.command:
+    cmd: ceph osd ls
+  register: existing_osds
+  changed_when: false
+  failed_when: false
+  delegate_to: "{{ groups[cluster_group | default('matrix_cluster')][0] }}"
+  run_once: true
+
+- name: Check OSD devices availability
+  ansible.builtin.command:
+    cmd: "lsblk -ndo NAME,SIZE,TYPE {{ item.device }}"
+  register: device_check
+  failed_when: device_check.rc != 0
+  changed_when: false
+  loop: "{{ ceph_osds[inventory_hostname_short] | default([]) }}"
+  loop_control:
+    label: "{{ item.device }}"
+
+- name: Display device information
+  ansible.builtin.debug:
+    msg: "Device {{ item.item.device }}: {{ item.stdout }}"
+  loop: "{{ device_check.results }}"
+  loop_control:
+    label: "{{ item.item.device }}"
+  when: ansible_verbosity > 0
+
+- name: Wipe existing partitions on OSD devices
+  ansible.builtin.command:
+    cmd: "wipefs -a {{ item.device }}"
+  when:
+    - ceph_wipe_disks | default(false)
+  loop: "{{ ceph_osds[inventory_hostname_short] | default([]) }}"
+  loop_control:
+    label: "{{ item.device }}"
+  register: wipe_result
+  changed_when: wipe_result.rc == 0
+
+- name: Create OSDs (one or more per device)
+  # pveceph has no per-partition size option; several OSDs on one NVMe device
+  # are requested with --osds-per-device in a single call per device
+  ansible.builtin.command:
+    cmd: >
+      pveceph osd create {{ item.device }}
+      {% if (item.partitions | default(1) | int) > 1 %}--osds-per-device {{ item.partitions }}{% endif %}
+      {% if item.crush_device_class is defined and item.crush_device_class %}--crush-device-class {{ item.crush_device_class }}{% endif %}
+      {% if item.db_device is defined and item.db_device %}--db_dev {{ item.db_device }}{% endif %}
+      {% if item.wal_device is defined and item.wal_device %}--wal_dev {{ item.wal_device }}{% endif %}
+  loop: "{{ ceph_osds[inventory_hostname_short] | default([]) }}"
+  loop_control:
+    label: "{{ item.device }}"
+  register: osd_create
+  changed_when: osd_create.rc == 0
+  failed_when:
+    - osd_create.rc != 0
+    - "'already in use' not in osd_create.stderr | default('')"
+
+- name: Wait for OSDs to come up
+  ansible.builtin.command:
+    cmd: ceph osd tree --format json
+  register: osd_tree
+  changed_when: false
+  delegate_to: "{{ groups[cluster_group | default('matrix_cluster')][0] }}"
+  run_once: true
+  until: >
+    (osd_tree.stdout | from_json).nodes
+    | selectattr('type', 'equalto', 'osd')
+    | selectattr('status', 'equalto', 'up')
+    | list | length >= expected_osd_count | int
+  retries: 20
+  delay: 10
+  vars:
+    # Flatten the per-node device lists, then sum OSDs per device (default 1)
+    expected_osd_count: >-
+      {{
+        ceph_osds.values() | list | flatten(levels=1)
+        | map(attribute='partitions', default=1)
+        | map('int') | sum
+      }}
+```
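+
+Ceph assigns a device class automatically when none is passed, and it may not be the one you intended. A sketch that compares the `crush_device_class` values from group_vars with what CRUSH actually knows about; `wanted_classes` and `missing_classes` are hypothetical variable names.
+
+```yaml
+# Hypothetical addition to roles/proxmox_ceph/tasks/osd_create.yml
+- name: List CRUSH device classes
+  ansible.builtin.command:
+    cmd: ceph osd crush class ls --format json
+  register: crush_classes
+  changed_when: false
+  delegate_to: "{{ groups[cluster_group | default('matrix_cluster')][0] }}"
+  run_once: true
+
+- name: Verify every configured device class exists in CRUSH
+  ansible.builtin.assert:
+    that:
+      - missing_classes | length == 0
+    fail_msg: "Device classes missing from CRUSH: {{ missing_classes }}"
+  vars:
+    wanted_classes: >-
+      {{ ceph_osds.values() | list | flatten(levels=1)
+         | selectattr('crush_device_class', 'defined')
+         | map(attribute='crush_device_class') | unique | list }}
+    missing_classes: "{{ wanted_classes | difference(crush_classes.stdout | from_json) }}"
+  run_once: true
+```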
+
+## Phase 5: Create and Configure Pools
+
+### Step 6: Create CEPH Pools
+
+```yaml
+# roles/proxmox_ceph/tasks/pools.yml
+---
+- name: Get existing CEPH pools
+  ansible.builtin.command:
+    cmd: ceph osd pool ls
+  register: existing_pools
+  changed_when: false
+  run_once: true
+
+- name: Create CEPH pools
+  ansible.builtin.command:
+    cmd: >
+      ceph osd pool create {{ item.name }}
+      {{ item.pg_num }}
+      {{ item.pgp_num | default(item.pg_num) }}
+  when: item.name not in existing_pools.stdout_lines
+  loop: "{{ ceph_pools }}"
+  loop_control:
+    label: "{{ item.name }}"
+  register: pool_create
+  changed_when: pool_create.rc == 0
+  run_once: true
+
+# Pool settings are cluster-wide, so each task runs once. The ceph CLI prints
+# "set pool ..." and "enabled application ..." confirmations on stderr.
+- name: Set pool replication size
+  ansible.builtin.command:
+    cmd: "ceph osd pool set {{ item.name }} size {{ item.size }}"
+  loop: "{{ ceph_pools }}"
+  loop_control:
+    label: "{{ item.name }}"
+  register: pool_size
+  changed_when: "'set pool' in pool_size.stderr"
+  run_once: true
+
+- name: Set pool minimum replication size
+  ansible.builtin.command:
+    cmd: "ceph osd pool set {{ item.name }} min_size {{ item.min_size }}"
+  loop: "{{ ceph_pools }}"
+  loop_control:
+    label: "{{ item.name }}"
+  register: pool_min_size
+  changed_when: "'set pool' in pool_min_size.stderr"
+  run_once: true
+
+- name: Set pool application
+  ansible.builtin.command:
+    cmd: "ceph osd pool application enable {{ item.name }} {{ item.application }}"
+  when: item.application is defined
+  loop: "{{ ceph_pools }}"
+  loop_control:
+    label: "{{ item.name }}"
+  register: pool_app
+  changed_when: "'enabled application' in pool_app.stderr"
+  failed_when:
+    - pool_app.rc != 0
+    - "'already enabled' not in pool_app.stderr"
+  run_once: true
+
+- name: Enable compression on pools
+  ansible.builtin.command:
+    cmd: "ceph osd pool set {{ item.name }} compression_mode aggressive"
+  when: item.compression | default(false)
+  loop: "{{ ceph_pools }}"
+  loop_control:
+    label: "{{ item.name }}"
+  register: pool_compression
+  changed_when: "'set pool' in pool_compression.stderr"
+  run_once: true
+
+- name: Set compression algorithm
+  ansible.builtin.command:
+    cmd: "ceph osd pool set {{ item.name }} compression_algorithm {{ item.compression_algorithm | default('zstd') }}"
+  when: item.compression | default(false)
+  loop: "{{ ceph_pools }}"
+  loop_control:
+    label: "{{ item.name }}"
+  register: pool_compression_algo
+  changed_when: "'set pool' in pool_compression_algo.stderr"
+  run_once: true
+```
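+
+The `set` tasks above re-apply every value on each run. To report a change only when something actually differs, one option is to read the current value first. This sketch does that for `size`, relying on `ceph osd pool get <pool> size --format json` returning an object with a `size` key; the register name is hypothetical, and the same pattern applies to `min_size`.
+
+```yaml
+# Hypothetical idempotent variant of "Set pool replication size"
+- name: Read current pool replication size
+  ansible.builtin.command:
+    cmd: "ceph osd pool get {{ item.name }} size --format json"
+  register: pool_size_current
+  changed_when: false
+  loop: "{{ ceph_pools }}"
+  loop_control:
+    label: "{{ item.name }}"
+  run_once: true
+
+- name: Set pool replication size only when it differs
+  ansible.builtin.command:
+    cmd: "ceph osd pool set {{ item.item.name }} size {{ item.item.size }}"
+  when: (item.stdout | from_json).size | int != item.item.size | int
+  loop: "{{ pool_size_current.results }}"
+  loop_control:
+    label: "{{ item.item.name }}"
+  changed_when: true
+  run_once: true
+```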
+
+## Phase 6: Verify CEPH Health
+
+### Step 7: Health Verification
+
+```yaml
+# roles/proxmox_ceph/tasks/verify.yml
+---
+- name: Wait for CEPH to stabilize
+  ansible.builtin.pause:
+    seconds: 30
+
+- name: Check CEPH cluster health
+  ansible.builtin.command:
+    cmd: ceph health
+  register: ceph_health
+  changed_when: false
+  delegate_to: "{{ groups[cluster_group | default('matrix_cluster')][0] }}"
+  run_once: true
+
+- name: Get CEPH status
+  ansible.builtin.command:
+    cmd: ceph status --format json
+  register: ceph_status
+  changed_when: false
+  delegate_to: "{{ groups[cluster_group | default('matrix_cluster')][0] }}"
+  run_once: true
+
+- name: Parse CEPH status
+  ansible.builtin.set_fact:
+    ceph_status_data: "{{ ceph_status.stdout | from_json }}"
+
+- name: Calculate expected OSD count
+  ansible.builtin.set_fact:
+    # Flatten the per-node device lists, then sum OSDs per device (default 1)
+    expected_osd_count: >-
+      {{
+        ceph_osds.values() | list | flatten(levels=1)
+        | map(attribute='partitions', default=1)
+        | map('int') | sum
+      }}
+  delegate_to: "{{ groups[cluster_group | default('matrix_cluster')][0] }}"
+  run_once: true
+
+- name: Verify OSD count
+  ansible.builtin.assert:
+    that:
+      - ceph_status_data.osdmap.num_osds | int == expected_osd_count | int
+    fail_msg: "Expected {{ expected_osd_count }} OSDs but found {{ ceph_status_data.osdmap.num_osds }}"
+  delegate_to: "{{ groups[cluster_group | default('matrix_cluster')][0] }}"
+  run_once: true
+
+- name: Verify all OSDs are up
+  ansible.builtin.assert:
+    that:
+      - ceph_status_data.osdmap.num_up_osds == ceph_status_data.osdmap.num_osds
+    fail_msg: "Not all OSDs are up: {{ ceph_status_data.osdmap.num_up_osds }}/{{ ceph_status_data.osdmap.num_osds }}"
+  delegate_to: "{{ groups[cluster_group | default('matrix_cluster')][0] }}"
+  run_once: true
+
+- name: Verify all OSDs are in
+  ansible.builtin.assert:
+    that:
+      - ceph_status_data.osdmap.num_in_osds == ceph_status_data.osdmap.num_osds
+    fail_msg: "Not all OSDs are in cluster: {{ ceph_status_data.osdmap.num_in_osds }}/{{ ceph_status_data.osdmap.num_osds }}"
+  delegate_to: "{{ groups[cluster_group | default('matrix_cluster')][0] }}"
+  run_once: true
+
+- name: Wait for PGs to become active+clean
+  ansible.builtin.command:
+    cmd: ceph status --format json
+  register: pg_status
+  changed_when: false
+  delegate_to: "{{ groups[cluster_group | default('matrix_cluster')][0] }}"
+  run_once: true
+  # pgmap.pgs_by_state lists {"state_name": ..., "count": ...} entries
+  until: >
+    (pg_status.stdout | from_json).pgmap.pgs_by_state
+    | selectattr('state_name', 'equalto', 'active+clean')
+    | map(attribute='count')
+    | sum == (pg_status.stdout | from_json).pgmap.num_pgs
+  retries: 60
+  delay: 10
+
+- name: Display CEPH cluster summary
+  ansible.builtin.debug:
+    msg: |
+      CEPH Cluster Health: {{ ceph_health.stdout }}
+      Total OSDs: {{ ceph_status_data.osdmap.num_osds }}
+      OSDs Up: {{ ceph_status_data.osdmap.num_up_osds }}
+      OSDs In: {{ ceph_status_data.osdmap.num_in_osds }}
+      PGs: {{ ceph_status_data.pgmap.num_pgs }}
+      Raw used: {{ ceph_status_data.pgmap.bytes_used | default(0) | human_readable }}
+      Available: {{ ceph_status_data.pgmap.bytes_avail | default(0) | human_readable }}
+  delegate_to: "{{ groups[cluster_group | default('matrix_cluster')][0] }}"
+  run_once: true
+```
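+
+`ceph health` is recorded above but never asserted on. If the run should fail unless the cluster settles at `HEALTH_OK`, a stricter gate could poll `ceph health --format json`, whose `status` field holds the overall state. Treat this sketch as optional: warnings such as clock skew or recently crashed daemons will block it until resolved, and `ceph health detail` shows what is outstanding.
+
+```yaml
+# Hypothetical addition to roles/proxmox_ceph/tasks/verify.yml
+- name: Wait for HEALTH_OK
+  ansible.builtin.command:
+    cmd: ceph health --format json
+  register: ceph_health_json
+  changed_when: false
+  delegate_to: "{{ groups[cluster_group | default('matrix_cluster')][0] }}"
+  run_once: true
+  until: (ceph_health_json.stdout | from_json).status == 'HEALTH_OK'
+  retries: 30
+  delay: 10
+```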
+
+## Matrix Cluster Configuration Example
+
+```yaml
+# group_vars/matrix_cluster.yml (CEPH section)
+---
+# CEPH configuration
+ceph_enabled: true
+ceph_repository: "no-subscription"  # or "enterprise" with subscription
+ceph_network: "192.168.5.0/24"          # vmbr1 - Public network
+ceph_cluster_network: "192.168.7.0/24"  # vmbr2 - Private network
+
+# OSD configuration (4 OSDs per node = 12 total)
+ceph_osds:
+  foxtrot:
+    - device: /dev/nvme1n1
+      partitions: 2  # Create 2 OSDs per 4TB NVMe
+      device_size_gb: 4000
+      partition_indices: [0, 1]
+      db_device: null
+      wal_device: null
+      crush_device_class: nvme
+    - device: /dev/nvme2n1
+      partitions: 2
+      device_size_gb: 4000
+      partition_indices: [0, 1]
+      db_device: null
+      wal_device: null
+      crush_device_class: nvme
+
+  golf:
+    - device: /dev/nvme1n1
+      partitions: 2
+      device_size_gb: 4000
+      partition_indices: [0, 1]
+      crush_device_class: nvme
+    - device: /dev/nvme2n1
+      partitions: 2
+      device_size_gb: 4000
+      partition_indices: [0, 1]
+      crush_device_class: nvme
+
+  hotel:
+    - device: /dev/nvme1n1
+      partitions: 2
+      device_size_gb: 4000
+      partition_indices: [0, 1]
+      crush_device_class: nvme
+    - device: /dev/nvme2n1
+      partitions: 2
+      device_size_gb: 4000
+      partition_indices: [0, 1]
+      crush_device_class: nvme
+
+# Pool configuration
+ceph_pools:
+  - name: vm_ssd
+    pg_num: 128
+    pgp_num: 128
+    size: 3           # Replicate across 3 nodes
+    min_size: 2       # Minimum 2 replicas required
+    application: rbd
+    compression: false
+
+  - name: vm_containers
+    pg_num: 64
+    pgp_num: 64
+    size: 3
+    min_size: 2
+    application: rbd
+    compression: true
+    compression_algorithm: zstd
+
+# Safety flags
+ceph_wipe_disks: false  # Set to true for fresh deployment (DESTRUCTIVE!)
+```
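+
+A quick way to sanity-check these values is to estimate usable capacity per replicated pool from `device_size_gb`. For the Matrix values this gives 24000 GB raw and about 8000 GB usable at `size: 3`; both pools draw on the same raw capacity, so the per-pool figures are not additive. This debug task is a hypothetical sketch, and `raw_gb` is an illustrative variable name.
+
+```yaml
+# Hypothetical sanity check based on device_size_gb in ceph_osds
+- name: Estimate usable capacity per replicated pool
+  ansible.builtin.debug:
+    msg: >-
+      {{ item.name }}: {{ raw_gb }} GB raw, about
+      {{ ((raw_gb | int) / (item.size | int)) | round | int }} GB usable
+      at size {{ item.size }} (before nearfull/full headroom)
+  vars:
+    raw_gb: >-
+      {{ ceph_osds.values() | list | flatten(levels=1)
+         | map(attribute='device_size_gb', default=0) | map('int') | sum }}
+  loop: "{{ ceph_pools }}"
+  loop_control:
+    label: "{{ item.name }}"
+  run_once: true
+```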
+
+## Complete Playbook Example
+
+```yaml
+# playbooks/ceph-deploy.yml
+---
+- name: Deploy CEPH Storage on Proxmox Cluster
+  hosts: "{{ cluster_group | default('matrix_cluster') }}"
+  become: true
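+  # No `serial` here: the monitor quorum, OSD count and PG health checks in
+  # the role expect every node to be processed in the same batch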
+
+  pre_tasks:
+    - name: Verify cluster is healthy
+      ansible.builtin.command:
+        cmd: pvecm status
+      register: cluster_check
+      changed_when: false
+      # pvecm status pads the value with spaces ("Quorate:          Yes")
+      failed_when: "cluster_check.stdout is not search('Quorate: +Yes')"
+
+    - name: Verify CEPH networks MTU
+      ansible.builtin.command:
+        cmd: "ip link show {{ item }}"
+      register: mtu_check
+      changed_when: false
+      failed_when: "'mtu 9000' not in mtu_check.stdout"
+      loop:
+        - vmbr1  # CEPH public
+        - vmbr2  # CEPH private
+
+    - name: Display CEPH configuration
+      ansible.builtin.debug:
+        msg: |
+          Deploying CEPH to cluster: {{ cluster_name }}
+          Public network: {{ ceph_network }}
+          Cluster network: {{ ceph_cluster_network }}
+          Expected OSDs: {{ ceph_osds.values() | list | flatten(levels=1) | map(attribute='partitions', default=1) | map('int') | sum }}
+      run_once: true
+
+  roles:
+    - role: proxmox_ceph
+
+  post_tasks:
+    - name: Display CEPH OSD tree
+      ansible.builtin.command:
+        cmd: ceph osd tree
+      register: osd_tree_final
+      changed_when: false
+      delegate_to: "{{ groups[cluster_group | default('matrix_cluster')][0] }}"
+      run_once: true
+
+    - name: Show OSD tree
+      ansible.builtin.debug:
+        var: osd_tree_final.stdout_lines
+      run_once: true
+
+    - name: Display pool information
+      ansible.builtin.command:
+        cmd: ceph osd pool ls detail
+      register: pool_info
+      changed_when: false
+      delegate_to: "{{ groups[cluster_group | default('matrix_cluster')][0] }}"
+      run_once: true
+
+    - name: Show pool details
+      ansible.builtin.debug:
+        var: pool_info.stdout_lines
+      run_once: true
+```
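+
+The playbook relies on `roles/proxmox_ceph/tasks/main.yml`, which is not shown in this guide. A minimal sketch that runs the task files from the phases above in order could look like this; the layout is an assumption. Order matters, because `init.yml` sets `is_ceph_first_node`, which `monitors.yml` reads.
+
+```yaml
+# roles/proxmox_ceph/tasks/main.yml (hypothetical layout)
+---
+- name: Install CEPH packages
+  ansible.builtin.import_tasks: install.yml
+
+- name: Initialize CEPH cluster
+  ansible.builtin.import_tasks: init.yml
+
+- name: Create monitors
+  ansible.builtin.import_tasks: monitors.yml
+
+- name: Create managers
+  ansible.builtin.import_tasks: managers.yml
+
+- name: Create OSDs
+  ansible.builtin.import_tasks: osd_create.yml
+
+- name: Create and configure pools
+  ansible.builtin.import_tasks: pools.yml
+
+- name: Verify CEPH health
+  ansible.builtin.import_tasks: verify.yml
+```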
+
+## Usage
+
+### Deploy CEPH to Matrix Cluster
+
+```bash
+# Check syntax
+ansible-playbook playbooks/ceph-deploy.yml --syntax-check
+
+# Deploy CEPH
+ansible-playbook playbooks/ceph-deploy.yml --limit matrix_cluster
+
+# Verify CEPH status
+ansible -i inventory/proxmox.yml foxtrot -m shell -a "ceph status"
+ansible -i inventory/proxmox.yml foxtrot -m shell -a "ceph osd tree"
+ansible -i inventory/proxmox.yml foxtrot -m shell -a "ceph df"
+```
+
+### Add mise Tasks
+
+```toml
+# .mise.toml
+[tasks."ceph:deploy"]
+description = "Deploy CEPH storage on cluster"
+run = """
+cd ansible
+uv run ansible-playbook playbooks/ceph-deploy.yml
+"""
+
+[tasks."ceph:status"]
+description = "Show CEPH cluster status"
+run = """
+ansible -i ansible/inventory/proxmox.yml foxtrot -m shell -a "ceph -s"
+"""
+
+[tasks."ceph:health"]
+description = "Show CEPH health detail"
+run = """
+ansible -i ansible/inventory/proxmox.yml foxtrot -m shell -a "ceph health detail"
+"""
+```
+
+## Troubleshooting
+
+### OSDs Won't Create
+
+**Symptoms:**
+
+- `pveceph osd create` fails with "already in use" error
+
+**Solutions:**
+
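+1. Check if the disk has existing partitions: `lsblk /dev/nvme1n1`
+2. Check for leftover LVM from a previous OSD: `pvdisplay`, `lvdisplay`
+3. Wipe the disk: `wipefs -a /dev/nvme1n1`, or `ceph-volume lvm zap --destroy /dev/nvme1n1` if LVM is present (DESTRUCTIVE!)
+4. Alternatively, set `ceph_wipe_disks: true` in group_vars and re-run the playbook (also DESTRUCTIVE)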
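+
+When leftover LVM volumes from an earlier OSD keep a device busy, `wipefs -a` alone is not enough; `ceph-volume lvm zap --destroy` also removes the logical volumes, volume group and physical volume. A guarded sketch, where `ceph_zap_devices` is a hypothetical list of device paths you set explicitly:
+
+```yaml
+# Hypothetical task; ceph_zap_devices is an illustrative variable name
+- name: Zap leftover LVM and ceph-volume metadata (DESTRUCTIVE)
+  ansible.builtin.command:
+    cmd: "ceph-volume lvm zap --destroy {{ item }}"
+  loop: "{{ ceph_zap_devices | default([]) }}"
+  when: ceph_wipe_disks | default(false) | bool
+  changed_when: true
+```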
+
+### PGs Stuck in Creating
+
+**Symptoms:**
+
+- PGs stay in "creating" state for extended period
+
+**Solutions:**
+
+1. Check OSD status: `ceph osd tree`
+2. Verify all OSDs are up and in: `ceph osd stat`
+3. Check mon/mgr status: `ceph mon stat`, `ceph mgr stat`
+4. Review logs: `journalctl -u 'ceph-osd@*' -n 100` (quote the pattern so the shell does not expand it)
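+
+To capture which PGs are stuck from the same Ansible run, a sketch using `ceph pg dump_stuck`, which also accepts `unclean`, `stale`, `undersized` and `degraded` in place of `inactive`:
+
+```yaml
+# Hypothetical troubleshooting tasks
+- name: List PGs stuck inactive
+  ansible.builtin.command:
+    cmd: ceph pg dump_stuck inactive
+  register: stuck_pgs
+  changed_when: false
+  delegate_to: "{{ groups[cluster_group | default('matrix_cluster')][0] }}"
+  run_once: true
+
+- name: Show stuck PGs
+  ansible.builtin.debug:
+    var: stuck_pgs.stdout_lines
+  run_once: true
+```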
+
+### Poor CEPH Performance
+
+**Symptoms:**
+
+- Slow VM disk I/O
+
+**Solutions:**
+
+1. Verify MTU 9000: `ip link show vmbr1 | grep mtu`
+2. Test network throughput: `iperf3` between nodes
+3. Check OSD utilization: `ceph osd df`
+4. Verify the OSDs report the expected device class (`CLASS` column of `ceph osd tree`)
+5. Check for rebalancing: `ceph -s` (look for "recovery")
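+
+A bridge showing `mtu 9000` does not prove that the switch path passes jumbo frames. Pinging each peer with fragmentation disabled and an 8972-byte payload (9000 minus the 20-byte IPv4 header and the 8-byte ICMP header) does. In this sketch, `ceph_public_ip` is a hypothetical per-host variable holding each node's address on 192.168.5.0/24.
+
+```yaml
+# Hypothetical check; ceph_public_ip is an illustrative per-host variable
+- name: Verify jumbo frames pass end to end on the CEPH public network
+  ansible.builtin.command:
+    cmd: "ping -M do -s 8972 -c 3 -W 2 {{ hostvars[item].ceph_public_ip }}"
+  loop: "{{ groups[cluster_group | default('matrix_cluster')] | difference([inventory_hostname]) }}"
+  changed_when: false
+```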
+
+## Related Workflows
+
+- [Cluster Formation](cluster-formation.md) - Form cluster before CEPH
+- [Network Configuration](../reference/networking.md) - Configure CEPH networks
+- [Storage Management](../reference/storage-management.md) - Manage CEPH pools and OSDs
+
+## References
+
+- ProxSpray analysis: `docs/proxspray-analysis.md` (lines 1431-1562)
+- Proxmox VE CEPH documentation
+- CEPH deployment best practices
+- [Ansible CEPH automation pattern](../../.claude/skills/ansible-best-practices/patterns/ceph-automation.md)
--- a/skills/proxmox-infrastructure/workflows/cluster-formation.md
+++ b/skills/proxmox-infrastructure/workflows/cluster-formation.md
@@ -0,0 +1,646 @@
+# Proxmox Cluster Formation Workflow
+
+Complete guide to forming a Proxmox VE cluster using Ansible automation with idempotent patterns.
+
+## Overview
+
+This workflow automates the creation of a Proxmox VE cluster with:
+
+- Hostname resolution configuration
+- SSH key distribution for cluster operations
+- Idempotent cluster initialization
+- Corosync network configuration
+- Quorum and health verification
+
+## Prerequisites
+
+Before forming a cluster:
+
+1. **All nodes must have:**
+   - Proxmox VE 8.x or 9.x installed
+   - Network connectivity on management network
+   - Dedicated corosync network configured (VLAN 9 for Matrix)
+   - Unique hostnames
+   - Synchronized time (NTP configured)
+
+2. **Minimum requirements:**
+   - At least 3 nodes for quorum (production)
+   - 1 node for development/testing (not recommended)
+
+3. **Network requirements:**
+   - All nodes must be able to resolve each other's hostnames
+   - Corosync network must be isolated (no VM traffic)
+   - Low latency between nodes (<2ms recommended)
+   - MTU 1500 on management network
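+
+The latency guideline can be checked before the cluster is formed. This sketch pings each peer's `corosync_ip` from `cluster_nodes` (the same structure used by the hosts_config tasks in Step 2) and parses the average from the ping summary line; `avg_rtt_ms` is a hypothetical variable name, and the 2 ms threshold mirrors the recommendation above.
+
+```yaml
+# Hypothetical pre-flight tasks; cluster_nodes as used in hosts_config.yml
+- name: Measure round-trip time to peers on the corosync network
+  ansible.builtin.command:
+    cmd: "ping -c 10 -q {{ item.corosync_ip }}"
+  register: corosync_rtt
+  changed_when: false
+  when: item.short_name != inventory_hostname_short
+  loop: "{{ cluster_nodes }}"
+  loop_control:
+    label: "{{ item.short_name }}"
+
+- name: Assert average round-trip time is below 2 ms
+  ansible.builtin.assert:
+    that:
+      - avg_rtt_ms | float < 2.0
+    fail_msg: "Average RTT to {{ item.item.short_name }} is {{ avg_rtt_ms }} ms"
+  vars:
+    # Summary line looks like: "rtt min/avg/max/mdev = 0.101/0.150/0.210/0.030 ms"
+    avg_rtt_ms: "{{ item.stdout | regex_search('= [0-9.]+/[0-9.]+') | split('/') | last }}"
+  when: item is not skipped
+  loop: "{{ corosync_rtt.results }}"
+  loop_control:
+    label: "{{ item.item.short_name }}"
+```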
+
+## Phase 1: Prepare Cluster Nodes
+
+### Step 1: Verify Prerequisites
+
+```yaml
+# roles/proxmox_cluster/tasks/prerequisites.yml
+---
+- name: Check Proxmox VE is installed
+  ansible.builtin.stat:
+    path: /usr/bin/pvecm
+  register: pvecm_binary
+  failed_when: not pvecm_binary.stat.exists
+
+- name: Get Proxmox VE version
+  ansible.builtin.command:
+    cmd: pveversion
+  register: pve_version
+  changed_when: false
+
+- name: Verify minimum Proxmox VE version
+  ansible.builtin.assert:
+    that:
+      - "'pve-manager/9' in pve_version.stdout or 'pve-manager/8' in pve_version.stdout"
+    fail_msg: "Proxmox VE 8.x or 9.x required"
+
+- name: Verify minimum node count for production
+  ansible.builtin.assert:
+    that:
+      - groups[cluster_group] | length >= 3
+    fail_msg: "Production cluster requires at least 3 nodes for quorum"
+  when: cluster_environment == 'production'
+
+- name: Check no existing cluster membership
+  ansible.builtin.command:
+    cmd: pvecm status
+  register: existing_cluster
+  failed_when: false
+  changed_when: false
+
+- name: Fail if node already belongs to a different cluster
+  ansible.builtin.fail:
+    msg: |
+      Node {{ inventory_hostname }} is already a member of another cluster.
+      Current cluster: {{ existing_cluster.stdout }}
+      pvecm add cannot move a node between clusters; remove it from its
+      current cluster before running this playbook.
+  when:
+    - existing_cluster.rc == 0
+    - cluster_name not in existing_cluster.stdout
+```
+
+### Step 2: Configure Hostname Resolution
+
+```yaml
+# roles/proxmox_cluster/tasks/hosts_config.yml
+---
+- name: Ensure cluster nodes in /etc/hosts (management IP)
+  ansible.builtin.lineinfile:
+    path: /etc/hosts
+    regexp: "^{{ item.management_ip | regex_escape }}\\s+"
+    line: "{{ item.management_ip }} {{ item.fqdn }} {{ item.short_name }}"
+    state: present
+  loop: "{{ cluster_nodes }}"
+  loop_control:
+    label: "{{ item.short_name }}"
+
+- name: Ensure corosync IPs in /etc/hosts
+  ansible.builtin.lineinfile:
+    path: /etc/hosts
+    regexp: "^{{ item.corosync_ip | regex_escape }}\\s+"
+    line: "{{ item.corosync_ip }} {{ item.short_name }}-corosync"
+    state: present
+  loop: "{{ cluster_nodes }}"
+  loop_control:
+    label: "{{ item.short_name }}"
+
+- name: Verify hostname resolution (forward)
+  ansible.builtin.command:
+    cmd: "getent hosts {{ item.fqdn }}"
+  register: host_lookup
+  failed_when: host_lookup.rc != 0
+  changed_when: false
+  loop: "{{ cluster_nodes }}"
+  loop_control:
+    label: "{{ item.fqdn }}"
+
+- name: Verify hostname resolution (reverse)
+  ansible.builtin.command:
+    cmd: "getent hosts {{ item.management_ip }}"
+  register: reverse_lookup
+  failed_when:
+    - reverse_lookup.rc != 0
+  changed_when: false
+  loop: "{{ cluster_nodes }}"
+  loop_control:
+    label: "{{ item.management_ip }}"
+
+- name: Test corosync network connectivity
+  ansible.builtin.command:
+    cmd: "ping -c 3 -W 2 {{ item.corosync_ip }}"
+  register: corosync_ping
+  changed_when: false
+  when: item.short_name != inventory_hostname_short
+  loop: "{{ cluster_nodes }}"
+  loop_control:
+    label: "{{ item.short_name }}"
+```
+
+### Step 3: Distribute SSH Keys
+
+```yaml
+# roles/proxmox_cluster/tasks/ssh_keys.yml
+---
+- name: Generate SSH key for root (if not exists)
+  ansible.builtin.user:
+    name: root
+    generate_ssh_key: true
+    ssh_key_type: ed25519
+    # ssh_key_file defaults to .ssh/id_rsa regardless of ssh_key_type
+    ssh_key_file: .ssh/id_ed25519
+    ssh_key_comment: "root@{{ inventory_hostname }}"
+  register: root_ssh_key
+
+- name: Fetch public keys from all nodes
+  ansible.builtin.slurp:
+    src: /root/.ssh/id_ed25519.pub
+  register: node_public_keys
+
+- name: Distribute SSH keys to all nodes
+  ansible.posix.authorized_key:
+    user: root
+    state: present
+    key: "{{ hostvars[item].node_public_keys.content | b64decode }}"
+    comment: "cluster-{{ item }}"
+  loop: "{{ groups[cluster_group] }}"
+  when: item != inventory_hostname
+
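+- name: Populate known_hosts with node SSH keys (only when missing)
+  ansible.builtin.shell:
+    cmd: |
+      if ! ssh-keygen -F {{ item }} -f /root/.ssh/known_hosts >/dev/null 2>&1; then
+        ssh-keyscan -H {{ item }} >> /root/.ssh/known_hosts && echo added
+      fi
+  register: known_hosts_scan
+  when: item != inventory_hostname
+  loop: "{{ groups[cluster_group] }}"
+  loop_control:
+    label: "{{ item }}"
+  changed_when: "'added' in known_hosts_scan.stdout"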
+
+- name: Test SSH connectivity to all nodes
+  ansible.builtin.command:
+    cmd: "ssh -o BatchMode=yes -o ConnectTimeout=5 {{ item }} hostname"
+  register: ssh_test
+  changed_when: false
+  when: item != inventory_hostname
+  loop: "{{ groups[cluster_group] }}"
+  loop_control:
+    label: "{{ item }}"
+```
+
+## Phase 2: Initialize Cluster
+
+### Step 4: Create Cluster (First Node Only)
+
+```yaml
+# roles/proxmox_cluster/tasks/cluster_init.yml
+---
+- name: Check existing cluster status
+  ansible.builtin.command:
+    cmd: pvecm status
+  register: cluster_status
+  failed_when: false
+  changed_when: false
+
+- name: Get cluster nodes list
+  ansible.builtin.command:
+    cmd: pvecm nodes
+  register: cluster_nodes_check
+  failed_when: false
+  changed_when: false
+
+- name: Set cluster facts
+  ansible.builtin.set_fact:
+    in_target_cluster: "{{ cluster_status.rc == 0 and cluster_name in cluster_status.stdout }}"
+
+- name: Create new cluster on first node
+  ansible.builtin.command:
+    cmd: "pvecm create {{ cluster_name }} --link0 {{ corosync_link0_address }}"
+  when: not in_target_cluster
+  register: cluster_create
+  changed_when: cluster_create.rc == 0
+
+- name: Wait for cluster to initialize
+  ansible.builtin.pause:
+    seconds: 10
+  when: cluster_create.changed
+
+- name: Verify cluster creation
+  ansible.builtin.command:
+    cmd: pvecm status
+  register: cluster_verify
+  changed_when: false
+  failed_when: cluster_name not in cluster_verify.stdout
+
+- name: Display cluster status
+  ansible.builtin.debug:
+    var: cluster_verify.stdout_lines
+  when: cluster_create.changed or ansible_verbosity > 0
+```
+
+### Step 5: Join Nodes to Cluster
+
+```yaml
+# roles/proxmox_cluster/tasks/cluster_join.yml
+---
+- name: Check if already in cluster
+  ansible.builtin.command:
+    cmd: pvecm status
+  register: cluster_status
+  failed_when: false
+  changed_when: false
+
+- name: Set membership facts
+  ansible.builtin.set_fact:
+    is_cluster_member: "{{ cluster_status.rc == 0 }}"
+    in_target_cluster: "{{ cluster_status.rc == 0 and cluster_name in cluster_status.stdout }}"
+
+- name: Get first node hostname
+  ansible.builtin.set_fact:
+    first_node_hostname: "{{ hostvars[groups[cluster_group][0]].inventory_hostname }}"
+
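+- name: Join cluster
+  ansible.builtin.command:
+    # --use_ssh joins over the SSH trust set up in Step 3; without it,
+    # pvecm add prompts interactively for the root password
+    cmd: >
+      pvecm add {{ first_node_hostname }}
+      --use_ssh 1
+      --link0 {{ corosync_link0_address }}
+  when:
+    - not is_cluster_member or not in_target_cluster
+  throttle: 1  # join one node at a time
+  register: cluster_join
+  changed_when: cluster_join.rc == 0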
+  failed_when:
+    - cluster_join.rc != 0
+    - "'already in a cluster' not in cluster_join.stderr"
+
+- name: Wait for node to join cluster
+  ansible.builtin.pause:
+    seconds: 10
+  when: cluster_join.changed
+
+- name: Verify cluster membership
+  ansible.builtin.command:
+    cmd: pvecm status
+  register: join_verify
+  changed_when: false
+  # pvecm status pads its fields ("Quorate:          Yes"), so match with a regex
+  failed_when: join_verify.stdout is not search('Quorate:\s+Yes')
+```
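+
+`pvecm status` aligns its values with runs of spaces (for example `Quorate:          Yes`), so a literal `'Quorate: Yes'` substring test will not match real output. A minimal Python sketch that parses the `Key: value` lines instead. The sample is an abbreviated excerpt, and its layout is assumed from current Proxmox VE releases:
+
+```python
+import re
+from typing import Dict
+
+# "Key:   value" lines; pvecm status aligns the values with runs of spaces.
+_FIELD_RE = re.compile(r"^([A-Za-z][A-Za-z ]*?):[ \t]+(.+?)[ \t]*$", re.MULTILINE)
+
+
+def parse_pvecm_status(stdout: str) -> Dict[str, str]:
+    """Map each 'Key: value' line of `pvecm status` to a dict entry."""
+    return {key: value for key, value in _FIELD_RE.findall(stdout)}
+
+
+def is_quorate(stdout: str) -> bool:
+    """True when the Quorate field reads 'Yes', regardless of padding."""
+    return parse_pvecm_status(stdout).get("Quorate") == "Yes"
+
+
+sample = """\
+Quorum information
+------------------
+Nodes:            3
+Quorate:          Yes
+
+Votequorum information
+----------------------
+Expected votes:   3
+Total votes:      3
+"""
+
+fields = parse_pvecm_status(sample)
+print(is_quorate(sample), fields["Total votes"])  # True 3
+```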
+
+## Phase 3: Configure Corosync
+
+### Step 6: Corosync Network Configuration
+
+```yaml
+# roles/proxmox_cluster/tasks/corosync.yml
+---
+- name: Get current corosync configuration
+  ansible.builtin.slurp:
+    src: /etc/pve/corosync.conf
+  register: corosync_conf_current
+
+- name: Parse current corosync config
+  ansible.builtin.set_fact:
+    current_corosync: "{{ corosync_conf_current.content | b64decode }}"
+
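+- name: Check if corosync config needs update
+  ansible.builtin.set_fact:
+    # corosync.conf stores per-node addresses, not the CIDR, so check that
+    # every node's corosync_ip is already present in the file
+    corosync_needs_update: "{{ cluster_nodes | map(attribute='corosync_ip') | reject('in', current_corosync) | list | length > 0 }}"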
+
+- name: Backup corosync.conf
+  ansible.builtin.copy:
+    src: /etc/pve/corosync.conf
+    dest: "/etc/pve/corosync.conf.{{ ansible_date_time.epoch }}.bak"
+    remote_src: true
+    mode: '0640'
+  when: corosync_needs_update
+  delegate_to: "{{ groups[cluster_group][0] }}"
+  run_once: true
+
+- name: Update corosync configuration
+  ansible.builtin.template:
+    src: corosync.conf.j2
+    dest: /etc/pve/corosync.conf.new
+    validate: corosync-cfgtool -c %s
+    mode: '0640'
+  when: corosync_needs_update
+  delegate_to: "{{ groups[cluster_group][0] }}"
+  run_once: true
+
+- name: Apply new corosync configuration
+  ansible.builtin.copy:
+    src: /etc/pve/corosync.conf.new
+    dest: /etc/pve/corosync.conf
+    remote_src: true
+    mode: '0640'
+  when: corosync_needs_update
+  notify:
+    - reload corosync
+  delegate_to: "{{ groups[cluster_group][0] }}"
+  run_once: true
+```
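+
+`corosync.conf` records each node's address as `ring0_addr`, not as the subnet in CIDR form, so a search of the file for `corosync_network` (`192.168.8.0/24`) will not normally find it. Comparing the configured addresses with the intended ones gives a usable change signal. A Python sketch of that comparison; the function names are hypothetical:
+
+```python
+import re
+from typing import Iterable, Set
+
+# Captures the address from lines like "    ring0_addr: 192.168.8.5".
+_RING0_RE = re.compile(r"^[ \t]*ring0_addr:[ \t]*(\S+)[ \t]*$", re.MULTILINE)
+
+
+def ring0_addresses(corosync_conf: str) -> Set[str]:
+    """Return every ring0_addr value found in a corosync.conf body."""
+    return set(_RING0_RE.findall(corosync_conf))
+
+
+def needs_update(corosync_conf: str, desired: Iterable[str]) -> bool:
+    """True when the configured link0 addresses differ from the desired set."""
+    return ring0_addresses(corosync_conf) != set(desired)
+
+
+current = """\
+nodelist {
+  node {
+    name: foxtrot
+    ring0_addr: 192.168.3.5
+  }
+}
+"""
+print(needs_update(current, ["192.168.8.5"]))  # True: still on the old network
+```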
+
+**Corosync Template Example:**
+
+```jinja2
+# templates/corosync.conf.j2
+totem {
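+  version: 2
+  cluster_name: {{ cluster_name }}
+  # Proxmox expects config_version to increase with every change to corosync.conf;
+  # current_corosync holds the decoded file read in corosync.yml
+  config_version: {{ (current_corosync | regex_search('config_version: *([0-9]+)', '\\1') | first | int) + 1 }}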
+  transport: knet
+  crypto_cipher: aes256
+  crypto_hash: sha256
+
+  interface {
+    linknumber: 0
+    knet_link_priority: 255
+  }
+}
+
+nodelist {
+{% for node in cluster_nodes %}
+  node {
+    name: {{ node.short_name }}
+    nodeid: {{ node.node_id }}
+    quorum_votes: 1
+    ring0_addr: {{ node.corosync_ip }}
+  }
+{% endfor %}
+}
+
+quorum {
+  provider: corosync_votequorum
+{% if cluster_nodes | length == 2 %}
+  two_node: 1
+{% endif %}
+}
+
+logging {
+  to_logfile: yes
+  logfile: /var/log/corosync/corosync.log
+  to_syslog: yes
+  timestamp: on
+}
+```
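+
+The Proxmox VE documentation asks that `config_version` in the `totem` section be incremented after every change to `corosync.conf`, and it warns that skipping this can cause problems. A Python sketch that reads the current value and writes the next one. The helper names are hypothetical, and the sketch edits only the first `config_version` line it finds:
+
+```python
+import re
+
+# Matches "config_version: <n>" at the start of a line (normally inside totem {}).
+_VERSION_RE = re.compile(r"^([ \t]*config_version:[ \t]*)(\d+)", re.MULTILINE)
+
+
+def current_config_version(corosync_conf: str) -> int:
+    """Return the config_version value, or 0 when the key is missing."""
+    match = _VERSION_RE.search(corosync_conf)
+    return int(match.group(2)) if match else 0
+
+
+def bump_config_version(corosync_conf: str) -> str:
+    """Return the text with config_version increased by one (unchanged if absent)."""
+    new_version = current_config_version(corosync_conf) + 1
+    return _VERSION_RE.sub(lambda m: f"{m.group(1)}{new_version}", corosync_conf, count=1)
+
+
+conf = "totem {\n  version: 2\n  config_version: 3\n}\n"
+print(current_config_version(bump_config_version(conf)))  # 4
+```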
+
+## Phase 4: Verify Cluster Health
+
+### Step 7: Health Checks
+
+```yaml
+# roles/proxmox_cluster/tasks/verify.yml
+---
+- name: Wait for cluster to stabilize
+  ansible.builtin.pause:
+    seconds: 15
+
+- name: Check cluster quorum
+  ansible.builtin.command:
+    cmd: pvecm status
+  register: cluster_health
+  changed_when: false
+  failed_when: cluster_health.stdout is not search('Quorate:\s+Yes')
+
+- name: Get cluster node count
+  ansible.builtin.command:
+    cmd: pvecm nodes
+  register: cluster_nodes_final
+  changed_when: false
+
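+- name: Count node rows in pvecm nodes output
+  ansible.builtin.set_fact:
+    # Skip the header lines; node rows look like "  1  1 foxtrot (local)"
+    cluster_member_count: "{{ cluster_nodes_final.stdout_lines | select('match', '^\\s*[0-9]+\\s+[0-9]+\\s') | list | length }}"
+
+- name: Verify expected node count
+  ansible.builtin.assert:
+    that:
+      - cluster_member_count | int >= groups[cluster_group] | length
+    fail_msg: "Expected {{ groups[cluster_group] | length }} nodes but found {{ cluster_member_count }}"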
+
+- name: Check corosync ring status
+  ansible.builtin.command:
+    cmd: corosync-cfgtool -s
+  register: corosync_status
+  changed_when: false
+
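+- name: Verify no corosync links are down
+  ansible.builtin.assert:
+    that:
+      # With knet, each peer is listed as "connected" or "disconnected"
+      - "'disconnected' not in corosync_status.stdout"
+    fail_msg: "Corosync link issues detected: {{ corosync_status.stdout }}"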
+
+- name: Get cluster configuration version
+  ansible.builtin.command:
+    cmd: corosync-cmapctl -b totem.config_version
+  register: config_version
+  changed_when: false
+
+- name: Display cluster health summary
+  ansible.builtin.debug:
+    msg: |
+      Cluster: {{ cluster_name }}
+      Quorum: {{ 'Yes' if cluster_health.stdout is search('Quorate:\s+Yes') else 'No' }}
+      Nodes: {{ cluster_nodes_final.stdout_lines | select('match', '^\s*[0-9]+\s+[0-9]+\s') | list | length }}
+      Config Version: {{ config_version.stdout.split('=') | last | trim }}
+```
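+
+`pvecm nodes` prints a title, an underline and a column header before the node rows, so counting `stdout_lines` overstates the number of members. A Python sketch that counts only the data rows. The sample layout follows the example output in the Proxmox VE documentation:
+
+```python
+import re
+from typing import List
+
+# A data row looks like "         2          1 golf" or "1 1 foxtrot (local)".
+_ROW_RE = re.compile(r"^\s*\d+\s+\d+\s+\S+")
+
+
+def member_names(pvecm_nodes_stdout: str) -> List[str]:
+    """Return node names from `pvecm nodes`, skipping header lines."""
+    names = []
+    for line in pvecm_nodes_stdout.splitlines():
+        if _ROW_RE.match(line):
+            names.append(line.split()[2])  # third column is the node name
+    return names
+
+
+sample = """
+Membership information
+----------------------
+    Nodeid      Votes Name
+         1          1 foxtrot (local)
+         2          1 golf
+         3          1 hotel
+"""
+print(len(sample.splitlines()), len(member_names(sample)))  # 7 3
+```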
+
+## Matrix Cluster Example Configuration
+
+```yaml
+# group_vars/matrix_cluster.yml
+---
+cluster_name: "Matrix"
+cluster_group: "matrix_cluster"
+cluster_environment: "production"
+
+# Corosync configuration
+corosync_network: "192.168.8.0/24"  # VLAN 9
+
+# Node configuration
+cluster_nodes:
+  - short_name: foxtrot
+    fqdn: foxtrot.matrix.spaceships.work
+    management_ip: 192.168.3.5
+    corosync_ip: 192.168.8.5
+    node_id: 1
+
+  - short_name: golf
+    fqdn: golf.matrix.spaceships.work
+    management_ip: 192.168.3.6
+    corosync_ip: 192.168.8.6
+    node_id: 2
+
+  - short_name: hotel
+    fqdn: hotel.matrix.spaceships.work
+    management_ip: 192.168.3.7
+    corosync_ip: 192.168.8.7
+    node_id: 3
+
+# Set per-node corosync address
+corosync_link0_address: "{{ cluster_nodes | selectattr('short_name', 'equalto', inventory_hostname_short) | map(attribute='corosync_ip') | first }}"
+```
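+
+Errors in `cluster_nodes`, such as a repeated `node_id` or a `corosync_ip` outside `corosync_network`, can go unnoticed until cluster creation or a corosync reload fails. A pre-flight sketch in Python using the standard `ipaddress` module. The dictionaries mirror the group_vars above, and the choice of checks is an assumption, not a Proxmox requirement. `link0_address` mirrors the `corosync_link0_address` lookup:
+
+```python
+import ipaddress
+from typing import Dict, List
+
+
+def validate_nodes(nodes: List[Dict], corosync_network: str) -> List[str]:
+    """Return human-readable problems found in a cluster_nodes list."""
+    network = ipaddress.ip_network(corosync_network)
+    problems = []
+    for field in ("short_name", "node_id", "corosync_ip"):
+        values = [node[field] for node in nodes]
+        if len(values) != len(set(values)):
+            problems.append(f"duplicate {field} values: {values}")
+    for node in nodes:
+        if ipaddress.ip_address(node["corosync_ip"]) not in network:
+            problems.append(f"{node['short_name']}: {node['corosync_ip']} not in {network}")
+    return problems
+
+
+def link0_address(nodes: List[Dict], short_name: str) -> str:
+    """Return the corosync_ip of `short_name`; raises StopIteration if absent."""
+    return next(n["corosync_ip"] for n in nodes if n["short_name"] == short_name)
+
+
+matrix = [
+    {"short_name": "foxtrot", "node_id": 1, "corosync_ip": "192.168.8.5"},
+    {"short_name": "golf", "node_id": 2, "corosync_ip": "192.168.8.6"},
+    {"short_name": "hotel", "node_id": 3, "corosync_ip": "192.168.8.7"},
+]
+print(validate_nodes(matrix, "192.168.8.0/24"), link0_address(matrix, "golf"))
+# [] 192.168.8.6
+```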
+
+## Complete Playbook Example
+
+```yaml
+# playbooks/cluster-init.yml
+---
+# Play 1: create the cluster on the first node, then join the others.
+# serial: 1 finishes each node before the next starts, so the joiners need
+# no fixed pause while the first node initializes.
+- name: Form Proxmox cluster
+  hosts: "{{ cluster_group | default('matrix_cluster') }}"
+  become: true
+  serial: 1  # One node at a time for safety
+
+  pre_tasks:
+    - name: Validate cluster group is defined
+      ansible.builtin.assert:
+        that:
+          - cluster_group is defined
+          - cluster_name is defined
+          - cluster_nodes is defined
+        fail_msg: "Required variables not defined in group_vars"
+
+    - name: Display cluster configuration
+      ansible.builtin.debug:
+        msg: |
+          Forming cluster: {{ cluster_name }}
+          Nodes: {{ cluster_nodes | map(attribute='short_name') | join(', ') }}
+          Corosync network: {{ corosync_network }}
+      run_once: true  # With serial, run_once fires once per batch
+
+  tasks:
+    # role_path is only defined while a role is running, so load the
+    # proxmox_cluster task files with include_role rather than include_tasks
+    - name: Verify prerequisites
+      ansible.builtin.include_role:
+        name: proxmox_cluster
+        tasks_from: prerequisites
+
+    - name: Configure /etc/hosts
+      ansible.builtin.include_role:
+        name: proxmox_cluster
+        tasks_from: hosts_config
+
+    - name: Distribute SSH keys
+      ansible.builtin.include_role:
+        name: proxmox_cluster
+        tasks_from: ssh_keys
+
+    # First node creates cluster
+    - name: Initialize cluster on first node
+      ansible.builtin.include_role:
+        name: proxmox_cluster
+        tasks_from: cluster_init
+      when: inventory_hostname == groups[cluster_group][0]
+
+    # Other nodes join
+    - name: Join cluster on other nodes
+      ansible.builtin.include_role:
+        name: proxmox_cluster
+        tasks_from: cluster_join
+      when: inventory_hostname != groups[cluster_group][0]
+
+# Play 2: cluster-wide configuration and checks once every node has joined.
+# Writing a corosync.conf that lists all nodes while only the first has joined
+# would raise its expected votes and cost it quorum.
+- name: Configure and verify Proxmox cluster
+  hosts: "{{ cluster_group | default('matrix_cluster') }}"
+  become: true
+
+  tasks:
+    - name: Configure corosync
+      ansible.builtin.include_role:
+        name: proxmox_cluster
+        tasks_from: corosync
+
+    - name: Reload corosync before verifying
+      ansible.builtin.meta: flush_handlers
+
+    - name: Verify cluster health
+      ansible.builtin.include_role:
+        name: proxmox_cluster
+        tasks_from: verify
+
+  post_tasks:
+    - name: Display final cluster status
+      ansible.builtin.command:
+        cmd: pvecm status
+      register: final_status
+      changed_when: false
+      delegate_to: "{{ groups[cluster_group][0] }}"
+      run_once: true
+
+    - name: Show cluster status
+      ansible.builtin.debug:
+        var: final_status.stdout_lines
+      run_once: true
+
+  handlers:
+    - name: reload corosync
+      ansible.builtin.systemd:
+        name: corosync
+        state: reloaded
+      throttle: 1
+```
+
+## Usage
+
+### Initialize Matrix Cluster
+
+```bash
+# Check syntax
+ansible-playbook playbooks/cluster-init.yml --syntax-check
+
+# Dry run (partial: command tasks are skipped in check mode, so tasks that read their output may fail)
+ansible-playbook playbooks/cluster-init.yml --check --diff
+
+# Initialize cluster
+ansible-playbook playbooks/cluster-init.yml --limit matrix_cluster
+
+# Verify cluster status
+ansible -i inventory/proxmox.yml foxtrot -m shell -a "pvecm status"
+ansible -i inventory/proxmox.yml foxtrot -m shell -a "pvecm nodes"
+```
+
+### Add mise Task
+
+```toml
+# .mise.toml
+[tasks."cluster:init"]
+description = "Initialize Proxmox cluster"
+run = """
+cd ansible
+uv run ansible-playbook playbooks/cluster-init.yml
+"""
+
+[tasks."cluster:status"]
+description = "Show cluster status"
+run = """
+cd ansible
+uv run ansible -i inventory/proxmox.yml foxtrot -m shell -a "pvecm status"
+"""
+```
+
+## Troubleshooting
+
+### Node Won't Join Cluster
+
+**Symptoms:**
+
+- `pvecm add` fails with timeout or connection error
+
+**Solutions:**
+
+1. Verify SSH connectivity: `ssh root@first-node hostname`
+2. Check /etc/hosts: `getent hosts first-node`
+3. Verify corosync network: `ping -c 3 192.168.8.5`
+4. Check that the firewall allows corosync traffic (UDP 5405-5412): `iptables -nL | grep -E '540[5-9]|541[0-2]'`
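+
+The name-resolution and reachability checks above can be scripted from the joining node. This Python sketch only probes TCP services: SSH on port 22 and the Proxmox VE API on port 8006, which `pvecm add` uses unless told to join over SSH. Corosync traffic is UDP, which a plain connect cannot test, so the firewall check is still needed. The 3-second timeout is an assumption:
+
+```python
+import socket
+from typing import Dict
+
+
+def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
+    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
+    try:
+        with socket.create_connection((host, port), timeout=timeout):
+            return True
+    except OSError:  # covers refused connections, timeouts and DNS failures
+        return False
+
+
+def join_preflight(first_node: str) -> Dict[str, bool]:
+    """Resolve the first node, then probe SSH (22) and the PVE API (8006)."""
+    try:
+        socket.getaddrinfo(first_node, None)
+        resolves = True
+    except socket.gaierror:
+        resolves = False
+    return {
+        "resolves": resolves,
+        "ssh_22": resolves and tcp_reachable(first_node, 22),
+        "api_8006": resolves and tcp_reachable(first_node, 8006),
+    }
+```
+
+For example, running `join_preflight("foxtrot")` on golf should report `True` for all three checks before `pvecm add` is attempted.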
+
+### Cluster Shows No Quorum
+
+**Symptoms:**
+
+- `pvecm status` shows `Quorate: No`
+
+**Solutions:**
+
+1. Check node count: more than half of the expected votes must be online (2 of 3, 3 of 4, 3 of 5)
+2. Verify corosync: `systemctl status corosync`
+3. Check corosync ring: `corosync-cfgtool -s`
+4. Review logs: `journalctl -u corosync -n 50`
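+
+The majority rule comes from corosync's votequorum. With one vote per node, quorum needs `floor(expected_votes / 2) + 1` votes, and the `two_node: 1` setting in the template above lowers that to a single vote for two-node clusters. A small Python helper for sanity-checking the `Expected votes` and `Quorum` values that `pvecm status` reports. It assumes one vote per node, as in this workflow:
+
+```python
+def quorum_threshold(expected_votes: int, two_node: bool = False) -> int:
+    """Votes needed for quorum under corosync votequorum (one vote per node)."""
+    if expected_votes < 1:
+        raise ValueError("expected_votes must be positive")
+    if two_node and expected_votes == 2:
+        return 1  # two_node mode keeps a lone survivor quorate
+    return expected_votes // 2 + 1
+
+
+def tolerated_failures(expected_votes: int, two_node: bool = False) -> int:
+    """How many nodes can be lost before quorum is gone."""
+    return expected_votes - quorum_threshold(expected_votes, two_node)
+
+
+for nodes in (2, 3, 4, 5):
+    print(nodes, quorum_threshold(nodes), tolerated_failures(nodes))
+# 2 2 0
+# 3 2 1
+# 4 3 1
+# 5 3 2
+```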
+
+### Configuration Sync Issues
+
+**Symptoms:**
+
+- Changes on one node don't appear on others
+
+**Solutions:**
+
+1. Verify pmxcfs: `systemctl status pve-cluster`
+2. Check that `/etc/pve` is mounted and lists every node: `ls /etc/pve/nodes`
+3. Restart cluster filesystem: `systemctl restart pve-cluster`
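+
+`/etc/pve` is the pmxcfs mount shared by all members, and each node has a directory under `/etc/pve/nodes`. When pmxcfs is not running on a node, `/etc/pve` there is typically empty. A Python sketch that lists expected nodes with no directory. The node names come from the Matrix example, and the function name is hypothetical:
+
+```python
+from pathlib import Path
+from typing import Iterable, List
+
+
+def missing_node_dirs(expected: Iterable[str], pve_root: str = "/etc/pve") -> List[str]:
+    """Return expected node names that have no directory under <pve_root>/nodes."""
+    nodes_dir = Path(pve_root) / "nodes"
+    present = {p.name for p in nodes_dir.iterdir() if p.is_dir()} if nodes_dir.is_dir() else set()
+    return [name for name in expected if name not in present]
+
+
+if __name__ == "__main__":
+    # Prints [] on a node where pmxcfs is mounted and all three nodes are present
+    print(missing_node_dirs(["foxtrot", "golf", "hotel"]))
+```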
+
+## Related Workflows
+
+- [CEPH Deployment](ceph-deployment.md) - Deploy CEPH after cluster formation
+- [Network Configuration](../reference/networking.md) - Configure cluster networking
+- [Cluster Maintenance](cluster-maintenance.md) - Add/remove nodes, upgrades
+
+## References
+
+- ProxSpray analysis: `docs/proxspray-analysis.md` (lines 1318-1428)
+- Proxmox VE Cluster Manager documentation
+- Corosync configuration guide
+- [Ansible cluster automation pattern](../../ansible-best-practices/patterns/cluster-automation.md)