# CEPH Storage Deployment Workflow

Complete guide to deploying CEPH storage on a Proxmox VE cluster with automated OSD creation, pool configuration, and health verification.

## Overview

This workflow automates CEPH deployment with:

- CEPH package installation
- Cluster initialization with proper network configuration
- Monitor and manager creation across all nodes
- Automated OSD creation with partition support
- Pool configuration with replication and compression
- Comprehensive health verification

## Prerequisites

Before deploying CEPH:

1. **Cluster must be formed:**
   - Proxmox cluster already initialized and healthy
   - All nodes showing quorum
   - See [Cluster Formation](cluster-formation.md) first

2. **Network requirements:**
   - Dedicated CEPH public network (192.168.5.0/24 for Matrix)
   - Dedicated CEPH private/cluster network (192.168.7.0/24 for Matrix)
   - MTU 9000 (jumbo frames) configured on CEPH networks
   - Bridges configured: vmbr1 (public), vmbr2 (private)

3. **Storage requirements:**
   - Dedicated disks for OSDs (not boot disks)
   - All OSD disks should be the same type (SSD/NVMe)
   - Matrix: 2× 4TB Samsung 990 PRO NVMe per node = 24TB raw

4. **System requirements:**
   - Minimum 3 nodes for production (replication factor 3)
   - At least 4GB RAM per OSD
   - Fast network (10GbE recommended for CEPH networks)

## Phase 1: Install CEPH Packages

### Step 1: Install CEPH

```yaml
# roles/proxmox_ceph/tasks/install.yml
---
- name: Check if CEPH is already installed
  ansible.builtin.stat:
    path: /etc/pve/ceph.conf
  register: ceph_conf_check

- name: Check CEPH packages
  ansible.builtin.command:
    cmd: dpkg -l ceph-common
  register: ceph_package_check
  failed_when: false
  changed_when: false

- name: Install CEPH packages via pveceph
  ansible.builtin.command:
    cmd: "pveceph install --repository {{ ceph_repository }}"
  when: ceph_package_check.rc != 0
  register: ceph_install
  changed_when: "'installed' in ceph_install.stdout | default('')"

- name: Verify CEPH installation
  ansible.builtin.command:
    cmd: ceph --version
  register: ceph_version
  changed_when: false
  failed_when: ceph_version.rc != 0

- name: Display CEPH version
  ansible.builtin.debug:
    msg: "Installed CEPH version: {{ ceph_version.stdout }}"
```

## Phase 2: Initialize CEPH Cluster

### Step 2: Initialize CEPH (First Node Only)

```yaml
# roles/proxmox_ceph/tasks/init.yml
---
- name: Check if CEPH cluster is initialized
  ansible.builtin.command:
    cmd: ceph status
  register: ceph_status_check
  failed_when: false
  changed_when: false

- name: Set CEPH initialization facts
  ansible.builtin.set_fact:
    ceph_initialized: "{{ ceph_status_check.rc == 0 }}"
    is_ceph_first_node: "{{ inventory_hostname == groups[cluster_group | default('matrix_cluster')][0] }}"

- name: Initialize CEPH cluster on first node
  ansible.builtin.command:
    cmd: >
      pveceph init
      --network {{ ceph_network }}
      --cluster-network {{ ceph_cluster_network }}
  when:
    - is_ceph_first_node
    - not ceph_initialized
  register: ceph_init
  changed_when: ceph_init.rc == 0

- name: Wait for CEPH cluster to initialize
  ansible.builtin.pause:
    seconds: 15
  when: ceph_init.changed

- name: Verify CEPH initialization
  ansible.builtin.command:
    cmd: ceph status
  register: ceph_init_verify
  changed_when: false
  when:
    - is_ceph_first_node
  failed_when:
    - ceph_init_verify.rc != 0

- name: Display initial CEPH status
  ansible.builtin.debug:
    var: ceph_init_verify.stdout_lines
  when:
    - is_ceph_first_node
    - ceph_init.changed or ansible_verbosity > 0
```
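The install and init tasks above assume a handful of role variables: `ceph_repository`, `ceph_network`, `ceph_cluster_network`, and the `cluster_group` used to pick the first node. A minimal `defaults/main.yml` sketch is shown below; this file is not part of the original guide, so treat the fallback values as illustrative placeholders that the Matrix `group_vars` later in this guide override.

```yaml
# roles/proxmox_ceph/defaults/main.yml (illustrative sketch -- adjust to your environment)
---
# Inventory group that makes up the cluster; its first host acts as the CEPH "first node"
cluster_group: matrix_cluster

# Repository flavour passed to `pveceph install --repository`
ceph_repository: no-subscription

# Public and cluster (replication) networks passed to `pveceph init`
ceph_network: "192.168.5.0/24"
ceph_cluster_network: "192.168.7.0/24"

# Per-node OSD layout and pool definitions are expected from group_vars
ceph_osds: {}
ceph_pools: []

# Never wipe disks unless explicitly requested
ceph_wipe_disks: false
```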
## Phase 3: Create Monitors and Managers

### Step 3: Create CEPH Monitors

```yaml
# roles/proxmox_ceph/tasks/monitors.yml
---
- name: Check existing CEPH monitors
  ansible.builtin.command:
    cmd: ceph mon dump --format json
  register: mon_dump
  delegate_to: "{{ groups[cluster_group | default('matrix_cluster')][0] }}"
  run_once: true
  failed_when: false
  changed_when: false

- name: Parse monitor list
  ansible.builtin.set_fact:
    existing_monitors: "{{ (mon_dump.stdout | from_json).mons | map(attribute='name') | list }}"
  when: mon_dump.rc == 0

- name: Set monitor facts
  ansible.builtin.set_fact:
    has_monitor: "{{ inventory_hostname_short in existing_monitors | default([]) }}"

- name: Create CEPH monitor on first node
  ansible.builtin.command:
    cmd: pveceph mon create
  when:
    - is_ceph_first_node
    - not has_monitor
  register: mon_create_first
  changed_when: mon_create_first.rc == 0

- name: Wait for first monitor to stabilize
  ansible.builtin.pause:
    seconds: 10
  when: mon_create_first.changed

- name: Create CEPH monitors on other nodes
  ansible.builtin.command:
    cmd: pveceph mon create
  when:
    - not is_ceph_first_node
    - not has_monitor
  register: mon_create_others
  changed_when: mon_create_others.rc == 0

- name: Verify monitor quorum
  ansible.builtin.command:
    cmd: ceph quorum_status --format json
  register: quorum_status
  changed_when: false
  delegate_to: "{{ groups[cluster_group | default('matrix_cluster')][0] }}"
  run_once: true

- name: Check monitor quorum size
  ansible.builtin.assert:
    that:
      - (quorum_status.stdout | from_json).quorum | length >= ((groups[cluster_group | default('matrix_cluster')] | length // 2) + 1)
    fail_msg: "Monitor quorum not established"
  delegate_to: "{{ groups[cluster_group | default('matrix_cluster')][0] }}"
  run_once: true
```

### Step 4: Create CEPH Managers

```yaml
# roles/proxmox_ceph/tasks/managers.yml
---
- name: Check existing CEPH managers
  ansible.builtin.command:
    cmd: ceph mgr dump --format json
  register: mgr_dump
  delegate_to: "{{ groups[cluster_group | default('matrix_cluster')][0] }}"
  run_once: true
  failed_when: false
  changed_when: false

- name: Parse manager list
  ansible.builtin.set_fact:
    existing_managers: "{{ [(mgr_dump.stdout | from_json).active_name] + ((mgr_dump.stdout | from_json).standbys | map(attribute='name') | list) }}"
  when: mgr_dump.rc == 0

- name: Initialize empty manager list if check failed
  ansible.builtin.set_fact:
    existing_managers: []
  when: mgr_dump.rc != 0

- name: Set manager facts
  ansible.builtin.set_fact:
    has_manager: "{{ inventory_hostname_short in (existing_managers | default([])) }}"

- name: Create CEPH manager
  ansible.builtin.command:
    cmd: pveceph mgr create
  when: not has_manager
  register: mgr_create
  changed_when: mgr_create.rc == 0

- name: Wait for managers to stabilize
  ansible.builtin.pause:
    seconds: 5
  when: mgr_create.changed

- name: Enable CEPH dashboard module
  ansible.builtin.command:
    cmd: ceph mgr module enable dashboard
  delegate_to: "{{ groups[cluster_group | default('matrix_cluster')][0] }}"
  run_once: true
  register: dashboard_enable
  changed_when: "'already enabled' not in dashboard_enable.stderr"
  failed_when:
    - dashboard_enable.rc != 0
    - "'already enabled' not in dashboard_enable.stderr"

- name: Enable Prometheus module
  ansible.builtin.command:
    cmd: ceph mgr module enable prometheus
  delegate_to: "{{ groups[cluster_group | default('matrix_cluster')][0] }}"
  run_once: true
  register: prometheus_enable
  changed_when: "'already enabled' not in prometheus_enable.stderr"
  failed_when:
    - prometheus_enable.rc != 0
    - "'already enabled' not in prometheus_enable.stderr"
```
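Once the dashboard and Prometheus modules are enabled, the active manager publishes their endpoints via `ceph mgr services`. The tasks below are an optional sketch (not part of the original role) that surfaces those URLs for a quick sanity check after this step.

```yaml
# Optional sketch: could be appended to roles/proxmox_ceph/tasks/managers.yml
- name: Query manager service endpoints
  ansible.builtin.command:
    cmd: ceph mgr services --format json
  register: mgr_services
  changed_when: false
  delegate_to: "{{ groups[cluster_group | default('matrix_cluster')][0] }}"
  run_once: true

- name: Display dashboard and Prometheus URLs
  ansible.builtin.debug:
    msg: "{{ mgr_services.stdout | from_json }}"
  delegate_to: "{{ groups[cluster_group | default('matrix_cluster')][0] }}"
  run_once: true
```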
## Phase 4: Create OSDs

### Step 5: Prepare and Create OSDs

```yaml
# roles/proxmox_ceph/tasks/osd_create.yml
---
- name: Get list of existing OSDs
  ansible.builtin.command:
    cmd: ceph osd ls
  register: existing_osds
  changed_when: false
  failed_when: false
  delegate_to: "{{ groups[cluster_group | default('matrix_cluster')][0] }}"
  run_once: true

- name: Check OSD devices availability
  ansible.builtin.command:
    cmd: "lsblk -ndo NAME,SIZE,TYPE {{ item.device }}"
  register: device_check
  failed_when: device_check.rc != 0
  changed_when: false
  loop: "{{ ceph_osds[inventory_hostname_short] | default([]) }}"
  loop_control:
    label: "{{ item.device }}"

- name: Display device information
  ansible.builtin.debug:
    msg: "Device {{ item.item.device }}: {{ item.stdout }}"
  loop: "{{ device_check.results }}"
  loop_control:
    label: "{{ item.item.device }}"
  when: ansible_verbosity > 0

- name: Wipe existing partitions on OSD devices
  ansible.builtin.command:
    cmd: "wipefs -a {{ item.device }}"
  when:
    - ceph_wipe_disks | default(false)
  loop: "{{ ceph_osds[inventory_hostname_short] | default([]) }}"
  loop_control:
    label: "{{ item.device }}"
  register: wipe_result
  changed_when: wipe_result.rc == 0

- name: Create OSDs from whole devices (no partitioning)
  ansible.builtin.command:
    cmd: >
      pveceph osd create {{ item.device }}
      {% if item.db_device is defined and item.db_device %}--db_dev {{ item.db_device }}{% endif %}
      {% if item.wal_device is defined and item.wal_device %}--wal_dev {{ item.wal_device }}{% endif %}
  when:
    - item.partitions | default(1) == 1
  loop: "{{ ceph_osds[inventory_hostname_short] | default([]) }}"
  loop_control:
    label: "{{ item.device }}"
  register: osd_create_whole
  changed_when: "'successfully created' in osd_create_whole.stdout | default('')"
  failed_when:
    - osd_create_whole.rc != 0
    - "'already in use' not in osd_create_whole.stderr | default('')"
    - "'ceph-volume' not in osd_create_whole.stderr | default('')"

- name: Create multiple OSDs per device (with partitioning)
  ansible.builtin.command:
    cmd: >
      pveceph osd create {{ item.0.device }}
      --size {{ (item.0.device_size_gb | default(4000) / item.0.partitions) | int }}G
      {% if item.0.db_device is defined and item.0.db_device %}--db_dev {{ item.0.db_device }}{% endif %}
      {% if item.0.wal_device is defined and item.0.wal_device %}--wal_dev {{ item.0.wal_device }}{% endif %}
  when:
    - item.0.partitions > 1
  with_subelements:
    - "{{ ceph_osds[inventory_hostname_short] | default([]) }}"
    - partition_indices
    - skip_missing: true
  loop_control:
    label: "{{ item.0.device }} partition {{ item.1 }}"
  register: osd_create_partition
  changed_when: "'successfully created' in osd_create_partition.stdout | default('')"
  failed_when:
    - osd_create_partition.rc != 0
    - "'already in use' not in osd_create_partition.stderr | default('')"

- name: Wait for OSDs to come up
  ansible.builtin.command:
    cmd: ceph osd tree --format json
  register: osd_tree
  changed_when: false
  delegate_to: "{{ groups[cluster_group | default('matrix_cluster')][0] }}"
  run_once: true
  until: >
    (osd_tree.stdout | from_json).nodes
    | selectattr('type', 'equalto', 'osd')
    | selectattr('status', 'equalto', 'up')
    | list | length >= expected_osd_count | int
  retries: 20
  delay: 10
  vars:
    expected_osd_count: >-
      {{ ceph_osds.values() | list | flatten | map(attribute='partitions', default=1) | list | sum }}
```
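The Matrix configuration later in this guide tags every device with `crush_device_class: nvme`, but the creation tasks above never apply it. If your `pveceph` release accepts a `--crush-device-class` option (an assumption to verify with `pveceph osd create --help` first), the whole-device task could pass it through, roughly as sketched here:

```yaml
# Sketch only: set the CRUSH device class at creation time.
# Assumes `pveceph osd create` supports --crush-device-class on your Proxmox VE release.
- name: Create OSDs from whole devices with an explicit CRUSH device class
  ansible.builtin.command:
    cmd: >
      pveceph osd create {{ item.device }}
      {% if item.crush_device_class is defined and item.crush_device_class %}--crush-device-class {{ item.crush_device_class }}{% endif %}
  when:
    - item.partitions | default(1) == 1
  loop: "{{ ceph_osds[inventory_hostname_short] | default([]) }}"
  loop_control:
    label: "{{ item.device }}"
```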
## Phase 5: Create and Configure Pools

### Step 6: Create CEPH Pools

```yaml
# roles/proxmox_ceph/tasks/pools.yml
---
- name: Get existing CEPH pools
  ansible.builtin.command:
    cmd: ceph osd pool ls
  register: existing_pools
  changed_when: false

- name: Create CEPH pools
  ansible.builtin.command:
    cmd: >
      ceph osd pool create
      {{ item.name }}
      {{ item.pg_num }}
      {{ item.pgp_num | default(item.pg_num) }}
  when: item.name not in existing_pools.stdout_lines
  loop: "{{ ceph_pools }}"
  loop_control:
    label: "{{ item.name }}"
  register: pool_create
  changed_when: pool_create.rc == 0

- name: Set pool replication size
  ansible.builtin.command:
    cmd: "ceph osd pool set {{ item.name }} size {{ item.size }}"
  loop: "{{ ceph_pools }}"
  loop_control:
    label: "{{ item.name }}"
  register: pool_size
  changed_when: "'set pool' in pool_size.stdout"

- name: Set pool minimum replication size
  ansible.builtin.command:
    cmd: "ceph osd pool set {{ item.name }} min_size {{ item.min_size }}"
  loop: "{{ ceph_pools }}"
  loop_control:
    label: "{{ item.name }}"
  register: pool_min_size
  changed_when: "'set pool' in pool_min_size.stdout"

- name: Set pool application
  ansible.builtin.command:
    cmd: "ceph osd pool application enable {{ item.name }} {{ item.application }}"
  when: item.application is defined
  loop: "{{ ceph_pools }}"
  loop_control:
    label: "{{ item.name }}"
  register: pool_app
  changed_when: "'enabled application' in pool_app.stdout"
  failed_when:
    - pool_app.rc != 0
    - "'already enabled' not in pool_app.stderr"

- name: Enable compression on pools
  ansible.builtin.command:
    cmd: "ceph osd pool set {{ item.name }} compression_mode aggressive"
  when: item.compression | default(false)
  loop: "{{ ceph_pools }}"
  loop_control:
    label: "{{ item.name }}"
  register: pool_compression
  changed_when: "'set pool' in pool_compression.stdout"

- name: Set compression algorithm
  ansible.builtin.command:
    cmd: "ceph osd pool set {{ item.name }} compression_algorithm {{ item.compression_algorithm | default('zstd') }}"
  when: item.compression | default(false)
  loop: "{{ ceph_pools }}"
  loop_control:
    label: "{{ item.name }}"
  register: pool_compression_algo
  changed_when: "'set pool' in pool_compression_algo.stdout"
```
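`ceph osd pool create` makes the pools, but it does not register them as Proxmox storage, so VMs cannot use them yet. A hedged sketch of a follow-up task using `pvesm` is shown below; the storage IDs simply reuse the pool names, and both the `--content` types and the `'already defined'` idempotency match are assumptions to adjust for your setup.

```yaml
# Optional sketch: register each RBD pool as Proxmox storage (run once; /etc/pve is cluster-wide).
- name: Register CEPH pools as Proxmox RBD storage
  ansible.builtin.command:
    cmd: "pvesm add rbd {{ item.name }} --pool {{ item.name }} --content images,rootdir"
  when: item.application | default('') == 'rbd'
  loop: "{{ ceph_pools }}"
  loop_control:
    label: "{{ item.name }}"
  register: pvesm_add
  changed_when: pvesm_add.rc == 0
  failed_when:
    - pvesm_add.rc != 0
    - "'already defined' not in pvesm_add.stderr | default('')"
  delegate_to: "{{ groups[cluster_group | default('matrix_cluster')][0] }}"
  run_once: true
```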
## Phase 6: Verify CEPH Health

### Step 7: Health Verification

```yaml
# roles/proxmox_ceph/tasks/verify.yml
---
- name: Wait for CEPH to stabilize
  ansible.builtin.pause:
    seconds: 30

- name: Check CEPH cluster health
  ansible.builtin.command:
    cmd: ceph health
  register: ceph_health
  changed_when: false
  delegate_to: "{{ groups[cluster_group | default('matrix_cluster')][0] }}"
  run_once: true

- name: Get CEPH status
  ansible.builtin.command:
    cmd: ceph status --format json
  register: ceph_status
  changed_when: false
  delegate_to: "{{ groups[cluster_group | default('matrix_cluster')][0] }}"
  run_once: true

- name: Parse CEPH status
  ansible.builtin.set_fact:
    ceph_status_data: "{{ ceph_status.stdout | from_json }}"

- name: Calculate expected OSD count
  ansible.builtin.set_fact:
    expected_osd_count: >-
      {{ ceph_osds.values() | list | flatten | map(attribute='partitions', default=1) | list | sum }}
  delegate_to: "{{ groups[cluster_group | default('matrix_cluster')][0] }}"
  run_once: true

- name: Verify OSD count
  ansible.builtin.assert:
    that:
      - ceph_status_data.osdmap.num_osds | int == expected_osd_count | int
    fail_msg: "Expected {{ expected_osd_count }} OSDs but found {{ ceph_status_data.osdmap.num_osds }}"
  delegate_to: "{{ groups[cluster_group | default('matrix_cluster')][0] }}"
  run_once: true

- name: Verify all OSDs are up
  ansible.builtin.assert:
    that:
      - ceph_status_data.osdmap.num_up_osds == ceph_status_data.osdmap.num_osds
    fail_msg: "Not all OSDs are up: {{ ceph_status_data.osdmap.num_up_osds }}/{{ ceph_status_data.osdmap.num_osds }}"
  delegate_to: "{{ groups[cluster_group | default('matrix_cluster')][0] }}"
  run_once: true

- name: Verify all OSDs are in
  ansible.builtin.assert:
    that:
      - ceph_status_data.osdmap.num_in_osds == ceph_status_data.osdmap.num_osds
    fail_msg: "Not all OSDs are in cluster: {{ ceph_status_data.osdmap.num_in_osds }}/{{ ceph_status_data.osdmap.num_osds }}"
  delegate_to: "{{ groups[cluster_group | default('matrix_cluster')][0] }}"
  run_once: true

- name: Wait for PGs to become active+clean
  ansible.builtin.command:
    cmd: ceph pg stat --format json
  register: pg_stat
  changed_when: false
  delegate_to: "{{ groups[cluster_group | default('matrix_cluster')][0] }}"
  run_once: true
  until: >
    (pg_stat.stdout | from_json).num_pg_by_state
    | selectattr('name', 'equalto', 'active+clean')
    | map(attribute='num') | sum
    == (pg_stat.stdout | from_json).num_pgs
  retries: 60
  delay: 10

- name: Display CEPH cluster summary
  ansible.builtin.debug:
    msg: |
      CEPH Cluster Health: {{ ceph_health.stdout }}
      Total OSDs: {{ ceph_status_data.osdmap.num_osds }}
      OSDs Up: {{ ceph_status_data.osdmap.num_up_osds }}
      OSDs In: {{ ceph_status_data.osdmap.num_in_osds }}
      PGs: {{ ceph_status_data.pgmap.num_pgs }}
      Data: {{ ceph_status_data.pgmap.bytes_used | default(0) | human_readable }}
      Available: {{ ceph_status_data.pgmap.bytes_avail | default(0) | human_readable }}
  delegate_to: "{{ groups[cluster_group | default('matrix_cluster')][0] }}"
  run_once: true
```

## Matrix Cluster Configuration Example

```yaml
# group_vars/matrix_cluster.yml (CEPH section)
---
# CEPH configuration
ceph_enabled: true
ceph_repository: "no-subscription"      # or "enterprise" with subscription
ceph_network: "192.168.5.0/24"          # vmbr1 - Public network
ceph_cluster_network: "192.168.7.0/24"  # vmbr2 - Private network

# OSD configuration (4 OSDs per node = 12 total)
ceph_osds:
  foxtrot:
    - device: /dev/nvme1n1
      partitions: 2          # Create 2 OSDs per 4TB NVMe
      device_size_gb: 4000
      partition_indices: [0, 1]
      db_device: null
      wal_device: null
      crush_device_class: nvme
    - device: /dev/nvme2n1
      partitions: 2
      device_size_gb: 4000
      partition_indices: [0, 1]
      db_device: null
      wal_device: null
      crush_device_class: nvme
  golf:
    - device: /dev/nvme1n1
      partitions: 2
      device_size_gb: 4000
      partition_indices: [0, 1]
      crush_device_class: nvme
    - device: /dev/nvme2n1
      partitions: 2
      device_size_gb: 4000
      partition_indices: [0, 1]
      crush_device_class: nvme
  hotel:
    - device: /dev/nvme1n1
      partitions: 2
      device_size_gb: 4000
      partition_indices: [0, 1]
      crush_device_class: nvme
    - device: /dev/nvme2n1
      partitions: 2
      device_size_gb: 4000
      partition_indices: [0, 1]
      crush_device_class: nvme

# Pool configuration
ceph_pools:
  - name: vm_ssd
    pg_num: 128
    pgp_num: 128
    size: 3       # Replicate across 3 nodes
    min_size: 2   # Minimum 2 replicas required
    application: rbd
    compression: false
  - name: vm_containers
    pg_num: 64
    pgp_num: 64
    size: 3
    min_size: 2
    application: rbd
    compression: true
    compression_algorithm: zstd

# Safety flags
ceph_wipe_disks: false  # Set to true for fresh deployment (DESTRUCTIVE!)
```
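The phases above live in separate task files under the `proxmox_ceph` role, but the role's entry point itself is not shown in this guide. A minimal `tasks/main.yml` sketch that imports them in the order used here might look like this:

```yaml
# roles/proxmox_ceph/tasks/main.yml (sketch -- import order follows Phases 1-6)
---
- ansible.builtin.import_tasks: install.yml
- ansible.builtin.import_tasks: init.yml
- ansible.builtin.import_tasks: monitors.yml
- ansible.builtin.import_tasks: managers.yml
- ansible.builtin.import_tasks: osd_create.yml
- ansible.builtin.import_tasks: pools.yml
- ansible.builtin.import_tasks: verify.yml
```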
## Complete Playbook Example

```yaml
# playbooks/ceph-deploy.yml
---
- name: Deploy CEPH Storage on Proxmox Cluster
  hosts: "{{ cluster_group | default('matrix_cluster') }}"
  become: true
  serial: 1  # Deploy one node at a time

  pre_tasks:
    - name: Verify cluster is healthy
      ansible.builtin.command:
        cmd: pvecm status
      register: cluster_check
      changed_when: false
      failed_when: "cluster_check.stdout is not search('Quorate:\\s+Yes')"

    - name: Verify CEPH networks MTU
      ansible.builtin.command:
        cmd: "ip link show {{ item }}"
      register: mtu_check
      changed_when: false
      failed_when: "'mtu 9000' not in mtu_check.stdout"
      loop:
        - vmbr1  # CEPH public
        - vmbr2  # CEPH private

    - name: Display CEPH configuration
      ansible.builtin.debug:
        msg: |
          Deploying CEPH to cluster: {{ cluster_name }}
          Public network: {{ ceph_network }}
          Cluster network: {{ ceph_cluster_network }}
          Expected OSDs: {{ ceph_osds.values() | list | flatten | map(attribute='partitions', default=1) | list | sum }}
      run_once: true

  roles:
    - role: proxmox_ceph

  post_tasks:
    - name: Display CEPH OSD tree
      ansible.builtin.command:
        cmd: ceph osd tree
      register: osd_tree_final
      changed_when: false
      delegate_to: "{{ groups[cluster_group | default('matrix_cluster')][0] }}"
      run_once: true

    - name: Show OSD tree
      ansible.builtin.debug:
        var: osd_tree_final.stdout_lines
      run_once: true

    - name: Display pool information
      ansible.builtin.command:
        cmd: ceph osd pool ls detail
      register: pool_info
      changed_when: false
      delegate_to: "{{ groups[cluster_group | default('matrix_cluster')][0] }}"
      run_once: true

    - name: Show pool details
      ansible.builtin.debug:
        var: pool_info.stdout_lines
      run_once: true
```

## Usage

### Deploy CEPH to Matrix Cluster

```bash
# Check syntax
ansible-playbook playbooks/ceph-deploy.yml --syntax-check

# Deploy CEPH
ansible-playbook playbooks/ceph-deploy.yml --limit matrix_cluster

# Verify CEPH status
ansible -i inventory/proxmox.yml foxtrot -m shell -a "ceph status"
ansible -i inventory/proxmox.yml foxtrot -m shell -a "ceph osd tree"
ansible -i inventory/proxmox.yml foxtrot -m shell -a "ceph df"
```

### Add mise Tasks

```toml
# .mise.toml
[tasks."ceph:deploy"]
description = "Deploy CEPH storage on cluster"
run = """
cd ansible
uv run ansible-playbook playbooks/ceph-deploy.yml
"""

[tasks."ceph:status"]
description = "Show CEPH cluster status"
run = """
ansible -i ansible/inventory/proxmox.yml foxtrot -m shell -a "ceph -s"
"""

[tasks."ceph:health"]
description = "Show CEPH health detail"
run = """
ansible -i ansible/inventory/proxmox.yml foxtrot -m shell -a "ceph health detail"
"""
```
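Beyond the status commands above, a quick functional smoke test is to create and delete a small RBD image in one of the new pools. The sketch below could be appended to `verify.yml`; the `vm_ssd` pool name comes from the example configuration and the image name is arbitrary.

```yaml
# Optional sketch: RBD smoke test against the vm_ssd pool from the example configuration.
- name: Create a small test image in the vm_ssd pool
  ansible.builtin.command:
    cmd: rbd create vm_ssd/ansible-smoke-test --size 1024
  register: rbd_smoke_create
  changed_when: rbd_smoke_create.rc == 0
  delegate_to: "{{ groups[cluster_group | default('matrix_cluster')][0] }}"
  run_once: true

- name: Remove the test image again
  ansible.builtin.command:
    cmd: rbd rm vm_ssd/ansible-smoke-test
  changed_when: true
  delegate_to: "{{ groups[cluster_group | default('matrix_cluster')][0] }}"
  run_once: true
```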
## Troubleshooting

### OSDs Won't Create

**Symptoms:**

- `pveceph osd create` fails with "already in use" error

**Solutions:**

1. Check if disk has existing partitions: `lsblk /dev/nvme1n1`
2. Wipe disk: `wipefs -a /dev/nvme1n1` (DESTRUCTIVE!)
3. Set `ceph_wipe_disks: true` in group_vars
4. Check for existing LVM: `pvdisplay`, `lvdisplay`

### PGs Stuck in Creating

**Symptoms:**

- PGs stay in "creating" state for an extended period

**Solutions:**

1. Check OSD status: `ceph osd tree`
2. Verify all OSDs are up and in: `ceph osd stat`
3. Check mon/mgr status: `ceph mon stat`, `ceph mgr stat`
4. Review logs: `journalctl -u ceph-osd@*.service -n 100`

### Poor CEPH Performance

**Symptoms:**

- Slow VM disk I/O

**Solutions:**

1. Verify MTU 9000: `ip link show vmbr1 | grep mtu`
2. Test network throughput: `iperf3` between nodes
3. Check OSD utilization: `ceph osd df`
4. Verify SSD/NVMe is being used: `ceph osd tree`
5. Check for rebalancing: `ceph -s` (look for "recovery")

## Related Workflows

- [Cluster Formation](cluster-formation.md) - Form cluster before CEPH
- [Network Configuration](../reference/networking.md) - Configure CEPH networks
- [Storage Management](../reference/storage-management.md) - Manage CEPH pools and OSDs

## References

- ProxSpray analysis: `docs/proxspray-analysis.md` (lines 1431-1562)
- Proxmox VE CEPH documentation
- CEPH deployment best practices
- [Ansible CEPH automation pattern](../../.claude/skills/ansible-best-practices/patterns/ceph-automation.md)