# Proxmox Storage Management

## Overview

Proxmox VE supports multiple storage backends. This guide focuses on the storage architecture of the Matrix cluster: LVM-thin for boot disks and CEPH for distributed storage.

## Matrix Cluster Storage Architecture

### Hardware Configuration

**Per Node (Foxtrot, Golf, Hotel):**

```text
nvme0n1 - 1TB Crucial P3      → Boot disk + LVM
nvme1n1 - 4TB Samsung 990 PRO → CEPH OSD (2 OSDs)
nvme2n1 - 4TB Samsung 990 PRO → CEPH OSD (2 OSDs)
```

**Total Cluster:**

- 3× 1TB boot disks (LVM local storage)
- 6× 4TB NVMe drives (24TB raw CEPH capacity)
- 12 CEPH OSDs total (2 per NVMe drive)

### Storage Pools

```text
Storage Pool  Type     Backend    Purpose
------------  -------  ---------  ------------------------------
local         dir      Directory  ISO images, templates, backups
local-lvm     lvmthin  LVM-thin   VM disks (local)
ceph-pool     rbd      CEPH RBD   VM disks (distributed, HA)
ceph-fs       cephfs   CephFS     Shared filesystem
```
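
Once the pools are defined, `pvesm status` gives a quick sanity check that each one is active and sized as expected (the storage IDs above follow this cluster's naming; adjust to your own):

```bash
# List all configured storage pools with type, status, and usage
pvesm status

# The underlying definitions live in the cluster-wide config
cat /etc/pve/storage.cfg
```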

## LVM Storage

### LVM-thin Configuration

**Advantages:**

- Thin provisioning (overcommit storage)
- Fast snapshots
- Local to each node (low latency)
- No network overhead

**Disadvantages:**

- No HA (tied to single node)
- No live migration with storage
- Limited to node's local disk size

**Check LVM usage:**

```bash
# View volume groups
vgs

# View logical volumes
lvs

# View thin pool usage
lvs -a | grep thin
```

**Example output:**

```text
LV             VG   Attr        LSize    Pool  Origin  Data%
data           pve  twi-aotz--  850.00g                45.23
vm-101-disk-0  pve  Vwi-aotz--   50.00g  data          12.45
```

### Managing LVM Storage

**Extend thin pool (if boot disk has space):**

```bash
# Check free space in VG
vgs pve

# Extend thin pool
lvextend -L +100G pve/data
```
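
When a thin pool is extended repeatedly, its metadata volume can fill up before the data volume does. A quick check, and an extension if needed (sizes here are illustrative):

```bash
# Data% and Meta% for the thin pool
lvs pve/data -o lv_name,data_percent,metadata_percent

# Grow the pool's metadata LV if Meta% is running high
lvextend --poolmetadatasize +1G pve/data
```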

**Create VM disk manually:**

```bash
# Create 50GB disk for VM 101
lvcreate -V 50G -T pve/data -n vm-101-disk-0
```
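
Creating the logical volume only allocates the space; the disk still has to be referenced in the VM's configuration. A minimal follow-up, assuming the `local-lvm` storage ID from the pool table above and a free `scsi1` slot:

```bash
# Attach the existing volume to VM 101 as scsi1
qm set 101 --scsi1 local-lvm:vm-101-disk-0

# Confirm it appears in the VM config
qm config 101 | grep scsi1
```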

## CEPH Storage

### CEPH Architecture for Matrix

**Network Configuration:**

```text
vmbr1 (192.168.5.0/24, MTU 9000) → CEPH Public Network
vmbr2 (192.168.7.0/24, MTU 9000) → CEPH Private Network
```
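
Since CEPH throughput depends on jumbo frames actually being in effect end to end, it is worth confirming the bridge MTUs before deployment (interface names as used on this cluster):

```bash
# Both bridges should report mtu 9000
ip link show vmbr1 | grep -o 'mtu [0-9]*'
ip link show vmbr2 | grep -o 'mtu [0-9]*'
```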

**OSD Distribution:**

```text
Node     NVMe     OSDs  Capacity
-------  -------  ----  --------
foxtrot  nvme1n1  2     4TB
foxtrot  nvme2n1  2     4TB
golf     nvme1n1  2     4TB
golf     nvme2n1  2     4TB
hotel    nvme1n1  2     4TB
hotel    nvme2n1  2     4TB
-------  -------  ----  --------
Total             12    24TB raw
```

**Usable capacity (replica 3):** ~8TB (24TB raw ÷ 3 replicas; keeping ~20% free for rebalancing, per the best practices below, leaves roughly 6.4TB of comfortable working space)

### CEPH Deployment Commands

**Install CEPH:**

```bash
# Install the CEPH packages on every node (foxtrot, golf, hotel)
pveceph install --version reef

# Initialize the cluster once, on the first node (foxtrot)
pveceph init --network 192.168.5.0/24 --cluster-network 192.168.7.0/24
```

**Create Monitors (3 for quorum):**

```bash
# On each node
pveceph mon create
```

**Create Manager (on each node):**

```bash
pveceph mgr create
```
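
With the monitors and managers up, a quick check that all three nodes have joined quorum and that a manager is active avoids chasing problems later:

```bash
# Expect 3 monitors in quorum (foxtrot, golf, hotel)
ceph mon stat

# Overall view: mon quorum, active/standby mgr daemons
ceph status
```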

**Create OSDs:**

```bash
# On each node. Note: each invocation below creates ONE OSD per drive;
# see the variant after this block for splitting a drive into 2 OSDs
# to reach the 12-OSD layout described above.

# For nvme1n1 (4TB)
pveceph osd create /dev/nvme1n1 --crush-device-class nvme

# For nvme2n1 (4TB)
pveceph osd create /dev/nvme2n1 --crush-device-class nvme
```
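
To actually get two OSDs per NVMe drive (12 OSDs cluster-wide), the device has to be split at creation time. A sketch of the two common approaches; the `--osds-per-device` parameter only exists on newer Proxmox VE releases, so treat the exact flags as something to verify against your `pveceph` version:

```bash
# Newer pveceph releases: create 2 OSDs on one physical NVMe device
pveceph osd create /dev/nvme1n1 --crush-device-class nvme --osds-per-device 2

# Alternative via ceph-volume, which pveceph wraps under the hood
ceph-volume lvm batch --osds-per-device 2 --crush-device-class nvme /dev/nvme1n1
```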

**Create CEPH Pool:**

```bash
# Create RBD pool for VMs
pveceph pool create ceph-pool --add_storages

# Create CephFS for shared storage
pveceph fs create --name cephfs --add-storage
```
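
A brief verification that the pool picked up the intended replication settings and that both storage entries were registered in Proxmox:

```bash
# size should be 3 and min_size 2, matching the ceph.conf defaults below
ceph osd pool get ceph-pool size
ceph osd pool get ceph-pool min_size

# ceph-pool (RBD) and cephfs should now show up as active storages
pvesm status
```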

### CEPH Configuration Best Practices

**Optimize for NVMe:**

```ini
# /etc/pve/ceph.conf
[global]
public_network = 192.168.5.0/24
cluster_network = 192.168.7.0/24
osd_pool_default_size = 3
osd_pool_default_min_size = 2

[osd]
osd_memory_target = 4294967296  # 4GB per OSD
osd_max_backfills = 1
osd_recovery_max_active = 1
```

**Restart CEPH services after config change:**

```bash
# Restart OSDs one node at a time so the pool stays above min_size
systemctl restart ceph-osd@*.service
```
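
To confirm an OSD actually picked up the new value after the restart, the admin socket can be queried on the node that hosts it (osd.0 here is just an example ID):

```bash
# Run on the node hosting osd.0; expect 4294967296
ceph daemon osd.0 config get osd_memory_target
```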

### CEPH Monitoring

**Check cluster health:**

```bash
ceph status
ceph health detail
```

**Example healthy output:**

```text
cluster:
  id:     a1b2c3d4-e5f6-7890-abcd-ef1234567890
  health: HEALTH_OK

services:
  mon: 3 daemons, quorum foxtrot,golf,hotel
  mgr: foxtrot(active), standbys: golf, hotel
  osd: 12 osds: 12 up, 12 in

data:
  pools:   2 pools, 128 pgs
  objects: 1.23k objects, 45 GiB
  usage:   135 GiB used, 23.8 TiB / 24 TiB avail
  pgs:     128 active+clean
```

**Check OSD performance:**

```bash
ceph osd df
ceph osd perf
```

**Check pool usage:**

```bash
ceph df
rados df
```

## Storage Configuration in Proxmox

### Add Storage via Web UI

**Datacenter → Storage → Add:**

1. **Directory** - For ISOs and backups
2. **LVM-Thin** - For local VM disks
3. **RBD** - For CEPH VM disks
4. **CephFS** - For shared files

### Add Storage via CLI

**CEPH RBD:**

```bash
pvesm add rbd ceph-pool \
  --pool ceph-pool \
  --content images,rootdir \
  --nodes foxtrot,golf,hotel
```

**CephFS:**

```bash
pvesm add cephfs cephfs \
  --path /mnt/pve/cephfs \
  --content backup,iso,vztmpl \
  --nodes foxtrot,golf,hotel
```

**NFS (if using external NAS):**

```bash
pvesm add nfs nas-storage \
  --server 192.168.3.10 \
  --export /mnt/tank/proxmox \
  --content images,backup,iso \
  --nodes foxtrot,golf,hotel
```

## VM Disk Management

### Create VM Disk on CEPH

**Via CLI:**

```bash
# Create 100GB disk for VM 101 on CEPH
qm set 101 --scsi1 ceph-pool:100
```

**Via API (Python):**

```python
from proxmoxer import ProxmoxAPI

# verify_ssl=False is typical for the default self-signed PVE certificate
proxmox = ProxmoxAPI('192.168.3.5', user='root@pam', password='pass', verify_ssl=False)

# Allocate a new 100GB scsi1 disk for VM 101 on the ceph-pool storage
proxmox.nodes('foxtrot').qemu(101).config.put(scsi1='ceph-pool:100')
```

### Move VM Disk Between Storage

**Move from local-lvm to CEPH:**

```bash
qm move-disk 101 scsi0 ceph-pool --delete 1
```

**Move while the VM is running:**

No separate flag is needed: `qm move-disk` copies the disk live and switches a running VM over to the new volume once the copy completes.

```bash
# Same command; the storage migration happens online for a running VM
qm move-disk 101 scsi0 ceph-pool --delete 1
```

### Resize VM Disk

**Grow disk (can't shrink):**

```bash
# Grow VM 101's scsi0 by 50GB
qm resize 101 scsi0 +50G
```

**Inside VM (expand filesystem):**

```bash
# For ext4
sudo resize2fs /dev/sda1

# For XFS
sudo xfs_growfs /
```
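
If the guest uses a partition table rather than the bare disk, the partition usually has to be grown before the filesystem can see the new space. A common approach inside the guest, assuming `cloud-guest-utils` (or equivalent) is installed and the root filesystem sits on partition 1 of /dev/sda:

```bash
# Extend partition 1 of /dev/sda to fill the enlarged disk
sudo growpart /dev/sda 1

# Then run resize2fs / xfs_growfs as above
```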

## Backup and Restore

### Backup to Storage

**Create backup:**

```bash
# Backup VM 101 to local storage
vzdump 101 --storage local --mode snapshot --compress zstd

# Backup to CephFS
vzdump 101 --storage cephfs --mode snapshot --compress zstd
```

**Scheduled backups (via Web UI):**

Datacenter → Backup → Add:

- Schedule: Daily at 2 AM
- Storage: cephfs
- Mode: Snapshot
- Compression: ZSTD
- Retention: Keep last 7

### Restore from Backup

**List backups:**

```bash
ls /var/lib/vz/dump/
# OR
ls /mnt/pve/cephfs/dump/
```

**Restore:**

```bash
# Restore to same VMID
qmrestore /var/lib/vz/dump/vzdump-qemu-101-2024_01_15-02_00_00.vma.zst 101

# Restore to new VMID
qmrestore /var/lib/vz/dump/vzdump-qemu-101-2024_01_15-02_00_00.vma.zst 102 --storage ceph-pool
```

## Performance Tuning

### CEPH Performance

**For NVMe OSDs:**

```bash
# Set proper device class (only needed if it was not already set at OSD
# creation; clear an existing class first with `ceph osd crush rm-device-class osd.N`)
ceph osd crush set-device-class nvme osd.0
ceph osd crush set-device-class nvme osd.1
# ... repeat for all OSDs
```

**Create performance pool:**

```bash
ceph osd pool create fast-pool 128 128
ceph osd pool application enable fast-pool rbd
```
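
Device classes only pay off once a pool is bound to a CRUSH rule that uses them. A sketch of tying the new pool to NVMe-class OSDs (the rule name `fast-nvme` is arbitrary; on this all-NVMe cluster it is mostly future-proofing for when slower devices are added):

```bash
# Replicated rule that selects only nvme-class OSDs, one copy per host
ceph osd crush rule create-replicated fast-nvme default host nvme

# Point the pool at the new rule
ceph osd pool set fast-pool crush_rule fast-nvme
```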

**Enable RBD cache:**

```ini
# /etc/pve/ceph.conf
[client]
rbd_cache = true
rbd_cache_size = 134217728  # 128MB
# false skips the writethrough safety phase; only do this if every guest
# reliably sends flushes (virtio/SCSI disks with a modern OS)
rbd_cache_writethrough_until_flush = false
```

### LVM Performance

**Use SSD discard:**

```bash
# Enable discard on VM disk
qm set 101 --scsi0 local-lvm:vm-101-disk-0,discard=on,ssd=1
```
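
With `discard=on` set, freed space is only returned to the thin pool when the guest actually issues discards. A quick way to trigger that on demand inside the guest (most distributions also ship a weekly `fstrim.timer`):

```bash
# Inside the VM: trim all mounted filesystems that support it
sudo fstrim -av
```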

## Troubleshooting

### CEPH Not Healthy

**Check OSD status:**

```bash
ceph osd tree
ceph osd stat
```

**Restart stuck OSD:**

```bash
systemctl restart ceph-osd@0.service
```

**Check network connectivity:**

```bash
# From one node to another
ping -c 3 -M do -s 8972 192.168.5.6   # Test MTU 9000 (8972 payload + 28 bytes of headers)
```

### LVM Out of Space

**Check thin pool usage:**

```bash
lvs pve/data -o lv_name,data_percent,metadata_percent
```

**If thin pool > 90% full:**

```bash
# Extend if VG has space
lvextend -L +100G pve/data

# OR delete unused VM disks
lvremove pve/vm-XXX-disk-0
```

### Storage Performance Issues

**Test disk I/O:**

```bash
# Test sequential write
dd if=/dev/zero of=/tmp/test bs=1M count=1024 oflag=direct

# Test CEPH RBD performance (the bench needs an existing image)
rbd create ceph-pool/test-image --size 10G
rbd bench --io-type write ceph-pool/test-image
rbd rm ceph-pool/test-image
```

**Monitor CEPH latency:**

```bash
ceph osd perf
```

## Best Practices

1. **Use CEPH for HA VMs** - Store critical VM disks on CEPH for live migration
2. **Use LVM for performance** - Non-critical VMs get better performance on local LVM
3. **MTU 9000 for CEPH** - Always use jumbo frames on CEPH networks
4. **Separate networks** - Public and private CEPH networks on different interfaces
5. **Monitor CEPH health** - Set up alerts for HEALTH_WARN/HEALTH_ERR
6. **Regular backups** - Automated daily backups to CephFS or external NAS
7. **Plan for growth** - Leave 20% free space in CEPH for rebalancing
8. **Use replica 3** - Essential for data safety, especially with only 3 nodes

## Further Reading

- [Proxmox VE Storage Documentation](https://pve.proxmox.com/wiki/Storage)
- [CEPH Documentation](https://docs.ceph.com/)
- [Proxmox CEPH Guide](https://pve.proxmox.com/wiki/Deploy_Hyper-Converged_Ceph_Cluster)