---
name: flox-cuda
description: CUDA and GPU development with Flox. Use for NVIDIA CUDA setup, GPU computing, deep learning frameworks, cuDNN, and cross-platform GPU/CPU development.
---
# Flox CUDA Development Guide
## Prerequisites & Authentication
- Sign up for early access at https://flox.dev
- Authenticate with `flox auth login`
- **Linux-only**: CUDA packages only work on `["aarch64-linux", "x86_64-linux"]`
- All CUDA packages are prefixed with `flox-cuda/` in the catalog
- **No macOS support**: Use Metal alternatives on Darwin
## Core Commands
```bash
# Search for CUDA packages
flox search cudatoolkit --all | grep flox-cuda
flox search nvcc --all | grep 12_8
# Show available versions
flox show flox-cuda/cudaPackages.cudatoolkit
# Install CUDA packages
flox install flox-cuda/cudaPackages_12_8.cuda_nvcc
flox install flox-cuda/cudaPackages.cuda_cudart
# Verify installation
nvcc --version
nvidia-smi
```
## Package Discovery
```bash
# Search for CUDA toolkit
flox search cudatoolkit --all | grep flox-cuda
# Search for specific versions
flox search nvcc --all | grep 12_8
# Show all available versions
flox show flox-cuda/cudaPackages.cudatoolkit
# Search for CUDA libraries
flox search libcublas --all | grep flox-cuda
flox search cudnn --all | grep flox-cuda
```
## Essential CUDA Packages
| Package Pattern | Purpose | Example |
|-----------------|---------|---------|
| `cudaPackages_X_Y.cudatoolkit` | Main CUDA Toolkit | `cudaPackages_12_8.cudatoolkit` |
| `cudaPackages_X_Y.cuda_nvcc` | NVIDIA C++ Compiler | `cudaPackages_12_8.cuda_nvcc` |
| `cudaPackages.cuda_cudart` | CUDA Runtime API | `cuda_cudart` |
| `cudaPackages_X_Y.libcublas` | Linear algebra | `cudaPackages_12_8.libcublas` |
| `cudaPackages_X_Y.libcufft` | Fast Fourier Transform | `cudaPackages_12_8.libcufft` |
| `cudaPackages_X_Y.libcurand` | Random number generation | `cudaPackages_12_8.libcurand` |
| `cudaPackages_X_Y.cudnn_9_11` | Deep neural networks | `cudaPackages_12_8.cudnn_9_11` |
| `cudaPackages_X_Y.nccl` | Multi-GPU communication | `cudaPackages_12_8.nccl` |
## Critical: Conflict Resolution
**CUDA packages have LICENSE file conflicts requiring explicit priorities:**
```toml
[install]
cuda_nvcc.pkg-path = "flox-cuda/cudaPackages_12_8.cuda_nvcc"
cuda_nvcc.systems = ["aarch64-linux", "x86_64-linux"]
cuda_nvcc.priority = 1 # Highest priority
cuda_cudart.pkg-path = "flox-cuda/cudaPackages.cuda_cudart"
cuda_cudart.systems = ["aarch64-linux", "x86_64-linux"]
cuda_cudart.priority = 2
cudatoolkit.pkg-path = "flox-cuda/cudaPackages_12_8.cudatoolkit"
cudatoolkit.systems = ["aarch64-linux", "x86_64-linux"]
cudatoolkit.priority = 3 # Lower for LICENSE conflicts
gcc.pkg-path = "gcc"
gcc-unwrapped.pkg-path = "gcc-unwrapped" # For libstdc++
gcc-unwrapped.priority = 5
```
## CUDA Version Selection
### CUDA 12.x (Current)
```toml
[install]
cuda_nvcc.pkg-path = "flox-cuda/cudaPackages_12_8.cuda_nvcc"
cuda_nvcc.priority = 1
cuda_nvcc.systems = ["aarch64-linux", "x86_64-linux"]
cudatoolkit.pkg-path = "flox-cuda/cudaPackages_12_8.cudatoolkit"
cudatoolkit.priority = 3
cudatoolkit.systems = ["aarch64-linux", "x86_64-linux"]
```
### CUDA 11.x (Legacy Support)
```toml
[install]
cuda_nvcc.pkg-path = "flox-cuda/cudaPackages_11_8.cuda_nvcc"
cuda_nvcc.priority = 1
cuda_nvcc.systems = ["aarch64-linux", "x86_64-linux"]
cudatoolkit.pkg-path = "flox-cuda/cudaPackages_11_8.cudatoolkit"
cudatoolkit.priority = 3
cudatoolkit.systems = ["aarch64-linux", "x86_64-linux"]
```
## Cross-Platform GPU Development
Dual CUDA/CPU packages for portability (Linux gets CUDA, macOS gets CPU fallback):
```toml
[install]
## CUDA packages (Linux only)
cuda-pytorch.pkg-path = "flox-cuda/python3Packages.torch"
cuda-pytorch.systems = ["x86_64-linux", "aarch64-linux"]
cuda-pytorch.priority = 1
## CPU fallback packages (macOS)
pytorch.pkg-path = "python313Packages.pytorch"
pytorch.systems = ["x86_64-darwin", "aarch64-darwin"]
pytorch.priority = 6 # Lower priority
```
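The same selection logic is useful when a setup script has to pick a PyTorch wheel index at runtime. A minimal sketch in plain Python (the function name and the `cu129` tag are illustrative assumptions; adjust the CUDA tag to match your toolkit):

```python
def torch_index_url(system: str, has_nvidia_gpu: bool) -> str:
    """Pick a PyTorch wheel index: CUDA wheels on Linux with an
    NVIDIA GPU, CPU wheels everywhere else (including macOS)."""
    if system.endswith("-linux") and has_nvidia_gpu:
        return "https://download.pytorch.org/whl/cu129"
    return "https://download.pytorch.org/whl/cpu"

print(torch_index_url("x86_64-linux", True))     # CUDA wheel index
print(torch_index_url("aarch64-darwin", False))  # CPU wheel index
```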
## GPU Detection Pattern
**Dynamic CPU/GPU package installation in hooks:**
```bash
setup_gpu_packages() {
  venv="$FLOX_ENV_CACHE/venv"
  if [ ! -f "$FLOX_ENV_CACHE/.deps_installed" ]; then
    if lspci 2>/dev/null | grep -E 'NVIDIA|AMD' > /dev/null; then
      echo "GPU detected, installing CUDA packages"
      uv pip install --python "$venv/bin/python" \
        torch torchvision --index-url https://download.pytorch.org/whl/cu129
    else
      echo "No GPU detected, installing CPU packages"
      uv pip install --python "$venv/bin/python" \
        torch torchvision --index-url https://download.pytorch.org/whl/cpu
    fi
    touch "$FLOX_ENV_CACHE/.deps_installed"
  fi
}
```
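Note that `lspci` isn't always available (minimal containers often omit pciutils). A hedged variant can try `nvidia-smi` first, since it only succeeds when a working driver is present, and fall back to the PCI scan. A sketch, assuming the same hook context:

```shell
detect_gpu() {
  # Prefer nvidia-smi: it only succeeds when a working NVIDIA driver is present.
  if command -v nvidia-smi > /dev/null 2>&1 && nvidia-smi > /dev/null 2>&1; then
    echo "gpu"
  # Fall back to scanning the PCI bus when lspci exists.
  elif command -v lspci > /dev/null 2>&1 && lspci 2>/dev/null | grep -qE 'NVIDIA|AMD'; then
    echo "gpu"
  else
    echo "cpu"
  fi
}

detect_gpu
```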
## Complete CUDA Environment Examples
### Basic CUDA Development
```toml
[install]
cuda_nvcc.pkg-path = "flox-cuda/cudaPackages_12_8.cuda_nvcc"
cuda_nvcc.priority = 1
cuda_nvcc.systems = ["aarch64-linux", "x86_64-linux"]
cuda_cudart.pkg-path = "flox-cuda/cudaPackages.cuda_cudart"
cuda_cudart.priority = 2
cuda_cudart.systems = ["aarch64-linux", "x86_64-linux"]
gcc.pkg-path = "gcc"
gcc-unwrapped.pkg-path = "gcc-unwrapped"
gcc-unwrapped.priority = 5
[vars]
CUDA_VERSION = "12.8"
CUDA_HOME = "$FLOX_ENV"
[hook]
on-activate = '''
  echo "CUDA $CUDA_VERSION environment ready"
  echo "nvcc: $(nvcc --version | grep release)"
'''
```
### Deep Learning with PyTorch
```toml
[install]
cuda_nvcc.pkg-path = "flox-cuda/cudaPackages_12_8.cuda_nvcc"
cuda_nvcc.priority = 1
cuda_nvcc.systems = ["aarch64-linux", "x86_64-linux"]
cuda_cudart.pkg-path = "flox-cuda/cudaPackages.cuda_cudart"
cuda_cudart.priority = 2
cuda_cudart.systems = ["aarch64-linux", "x86_64-linux"]
libcublas.pkg-path = "flox-cuda/cudaPackages_12_8.libcublas"
libcublas.priority = 2
libcublas.systems = ["aarch64-linux", "x86_64-linux"]
cudnn.pkg-path = "flox-cuda/cudaPackages_12_8.cudnn_9_11"
cudnn.priority = 2
cudnn.systems = ["aarch64-linux", "x86_64-linux"]
python313Full.pkg-path = "python313Full"
uv.pkg-path = "uv"
gcc-unwrapped.pkg-path = "gcc-unwrapped"
gcc-unwrapped.priority = 5
[vars]
CUDA_VERSION = "12.8"
PYTORCH_CUDA_ALLOC_CONF = "max_split_size_mb:128"
[hook]
on-activate = '''
  setup_pytorch_cuda() {
    venv="$FLOX_ENV_CACHE/venv"
    if [ ! -d "$venv" ]; then
      uv venv "$venv" --python python3
    fi
    if [ -f "$venv/bin/activate" ]; then
      source "$venv/bin/activate"
    fi
    if [ ! -f "$FLOX_ENV_CACHE/.deps_installed" ]; then
      uv pip install --python "$venv/bin/python" \
        torch torchvision torchaudio \
        --index-url https://download.pytorch.org/whl/cu129
      touch "$FLOX_ENV_CACHE/.deps_installed"
    fi
  }
  setup_pytorch_cuda
'''
```
### TensorFlow with CUDA
```toml
[install]
cuda_nvcc.pkg-path = "flox-cuda/cudaPackages_12_8.cuda_nvcc"
cuda_nvcc.priority = 1
cuda_nvcc.systems = ["aarch64-linux", "x86_64-linux"]
cuda_cudart.pkg-path = "flox-cuda/cudaPackages.cuda_cudart"
cuda_cudart.priority = 2
cuda_cudart.systems = ["aarch64-linux", "x86_64-linux"]
cudnn.pkg-path = "flox-cuda/cudaPackages_12_8.cudnn_9_11"
cudnn.priority = 2
cudnn.systems = ["aarch64-linux", "x86_64-linux"]
python313Full.pkg-path = "python313Full"
uv.pkg-path = "uv"
[hook]
on-activate = '''
  setup_tensorflow() {
    venv="$FLOX_ENV_CACHE/venv"
    [ ! -d "$venv" ] && uv venv "$venv" --python python3
    [ -f "$venv/bin/activate" ] && source "$venv/bin/activate"
    if [ ! -f "$FLOX_ENV_CACHE/.tf_installed" ]; then
      uv pip install --python "$venv/bin/python" "tensorflow[and-cuda]"
      touch "$FLOX_ENV_CACHE/.tf_installed"
    fi
  }
  setup_tensorflow
'''
```
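After activation you can confirm TensorFlow actually sees the GPU. A small check script, guarded so it degrades gracefully when TensorFlow isn't installed yet (the helper name is illustrative):

```python
def gpu_count():
    """Return the number of GPUs TensorFlow can see, or None if
    TensorFlow isn't installed in this environment."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return len(tf.config.list_physical_devices("GPU"))

n = gpu_count()
print("TensorFlow not installed" if n is None else f"GPUs visible: {n}")
```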
### Multi-GPU Development
```toml
[install]
cuda_nvcc.pkg-path = "flox-cuda/cudaPackages_12_8.cuda_nvcc"
cuda_nvcc.priority = 1
cuda_nvcc.systems = ["aarch64-linux", "x86_64-linux"]
nccl.pkg-path = "flox-cuda/cudaPackages_12_8.nccl"
nccl.priority = 2
nccl.systems = ["aarch64-linux", "x86_64-linux"]
libcublas.pkg-path = "flox-cuda/cudaPackages_12_8.libcublas"
libcublas.priority = 2
libcublas.systems = ["aarch64-linux", "x86_64-linux"]
[vars]
CUDA_VISIBLE_DEVICES = "0,1,2,3" # All GPUs
NCCL_DEBUG = "INFO"
```
## Modular CUDA Environments
### Base CUDA Environment
```toml
# team/cuda-base
[install]
cuda_nvcc.pkg-path = "flox-cuda/cudaPackages_12_8.cuda_nvcc"
cuda_nvcc.priority = 1
cuda_nvcc.systems = ["aarch64-linux", "x86_64-linux"]
cuda_cudart.pkg-path = "flox-cuda/cudaPackages.cuda_cudart"
cuda_cudart.priority = 2
cuda_cudart.systems = ["aarch64-linux", "x86_64-linux"]
gcc.pkg-path = "gcc"
gcc-unwrapped.pkg-path = "gcc-unwrapped"
gcc-unwrapped.priority = 5
[vars]
CUDA_VERSION = "12.8"
CUDA_HOME = "$FLOX_ENV"
```
### CUDA Math Libraries
```toml
# team/cuda-math
[include]
environments = [{ remote = "team/cuda-base" }]
[install]
libcublas.pkg-path = "flox-cuda/cudaPackages_12_8.libcublas"
libcublas.priority = 2
libcublas.systems = ["aarch64-linux", "x86_64-linux"]
libcufft.pkg-path = "flox-cuda/cudaPackages_12_8.libcufft"
libcufft.priority = 2
libcufft.systems = ["aarch64-linux", "x86_64-linux"]
libcurand.pkg-path = "flox-cuda/cudaPackages_12_8.libcurand"
libcurand.priority = 2
libcurand.systems = ["aarch64-linux", "x86_64-linux"]
```
### CUDA Debugging Tools
```toml
# team/cuda-debug
[install]
cuda-gdb.pkg-path = "flox-cuda/cudaPackages_12_8.cuda-gdb"
cuda-gdb.systems = ["aarch64-linux", "x86_64-linux"]
nsight-systems.pkg-path = "flox-cuda/cudaPackages_12_8.nsight-systems"
nsight-systems.systems = ["aarch64-linux", "x86_64-linux"]
[vars]
CUDA_LAUNCH_BLOCKING = "1" # Synchronous kernel launches for debugging
```
### Layer for Development
```bash
# Base CUDA environment
flox activate -r team/cuda-base
# Add debugging tools when needed
flox activate -r team/cuda-base -- flox activate -r team/cuda-debug
```
## Testing CUDA Installation
### Verify CUDA Compiler
```bash
nvcc --version
```
### Check GPU Availability
```bash
nvidia-smi
```
### Compile Test Program
```bash
cat > hello_cuda.cu << 'EOF'
#include <stdio.h>

__global__ void hello() {
  printf("Hello from GPU!\n");
}

int main() {
  hello<<<1,1>>>();
  cudaDeviceSynchronize();
  return 0;
}
EOF
nvcc hello_cuda.cu -o hello_cuda
./hello_cuda
```
### Test PyTorch CUDA
```python
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA version: {torch.version.cuda}")
print(f"GPU count: {torch.cuda.device_count()}")
if torch.cuda.is_available():
    print(f"GPU name: {torch.cuda.get_device_name(0)}")
```
## Best Practices
### Always Use Priority Values
CUDA packages have predictable file conflicts; assign explicit priorities to every CUDA package.
### Version Consistency
Pin specific versions (e.g., `_12_8`) for reproducibility, and don't mix CUDA versions.
### Modular Design
Split base CUDA, math libraries, and debugging tools into separate environments for flexibility.
### Test Compilation
Verify that `nvcc hello.cu -o hello` works after setup.
### Platform Constraints
Always include `systems = ["aarch64-linux", "x86_64-linux"]` on CUDA packages.
### Memory Management
Set appropriate CUDA memory allocator configs:
```toml
[vars]
PYTORCH_CUDA_ALLOC_CONF = "max_split_size_mb:128"
CUDA_LAUNCH_BLOCKING = "0" # Async by default
```
## Common CUDA Gotchas
### CUDA Toolkit ≠ Complete Toolkit
The `cudatoolkit` package doesn't include every library. Add what you need:
- `libcublas` for linear algebra
- `libcufft` for FFT
- `cudnn` for deep learning
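Following the install pattern used elsewhere in this guide, the extra libraries can be declared alongside the toolkit. A sketch; pin the version suffix to match your toolkit:

```toml
[install]
cudatoolkit.pkg-path = "flox-cuda/cudaPackages_12_8.cudatoolkit"
cudatoolkit.priority = 3
cudatoolkit.systems = ["aarch64-linux", "x86_64-linux"]
libcublas.pkg-path = "flox-cuda/cudaPackages_12_8.libcublas"
libcublas.priority = 2
libcublas.systems = ["aarch64-linux", "x86_64-linux"]
cudnn.pkg-path = "flox-cuda/cudaPackages_12_8.cudnn_9_11"
cudnn.priority = 2
cudnn.systems = ["aarch64-linux", "x86_64-linux"]
```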
### License Conflicts
Every CUDA package may need an explicit priority due to LICENSE file conflicts.
### No macOS Support
CUDA is Linux-only. Use Metal-accelerated packages on Darwin when available.
### Version Mixing
Don't mix CUDA versions; use a consistent `_X_Y` suffix across all CUDA packages.
### Python Virtual Environments
Install CUDA Python packages (PyTorch, TensorFlow) in a virtual environment, built against the matching CUDA version.
### Driver Requirements
Ensure the NVIDIA driver supports your CUDA version; check with `nvidia-smi`.
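`nvidia-smi` reports the maximum CUDA version the driver supports in its header (e.g. `CUDA Version: 12.8`). A small helper that parses that header and compares it against a required toolkit version (a sketch; the sample header line and function name are illustrative):

```python
import re

def driver_supports(smi_header: str, required: str) -> bool:
    """Given nvidia-smi header text, check whether the driver's
    reported CUDA version is >= the toolkit version we need."""
    m = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_header)
    if not m:
        return False
    driver = (int(m.group(1)), int(m.group(2)))
    need = tuple(int(x) for x in required.split("."))
    return driver >= need

# Illustrative header line in the style nvidia-smi prints:
header = "| NVIDIA-SMI 570.86.10   Driver Version: 570.86.10   CUDA Version: 12.8 |"
print(driver_supports(header, "12.8"))  # True
print(driver_supports(header, "13.0"))  # False
```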
## Troubleshooting
### CUDA Not Found
```bash
# Check CUDA_HOME
echo $CUDA_HOME
# Check nvcc
which nvcc
nvcc --version
# Check library paths
echo $LD_LIBRARY_PATH
```
### PyTorch Not Using GPU
```python
import torch
print(torch.cuda.is_available()) # Should be True
print(torch.version.cuda) # Should match your CUDA version
# If False, reinstall with correct CUDA version
# uv pip install torch --index-url https://download.pytorch.org/whl/cu129
```
### Compilation Errors
```bash
# Check gcc/g++ version
gcc --version
g++ --version
# Ensure gcc-unwrapped is installed
flox list | grep gcc-unwrapped
# Check include paths
echo $CPATH
echo $LIBRARY_PATH
```
### Runtime Errors
```bash
# Check GPU visibility
echo $CUDA_VISIBLE_DEVICES
# Check for GPU
nvidia-smi
# Run with debug output
CUDA_LAUNCH_BLOCKING=1 python my_script.py
```
## Related Skills
- **flox-environments** - Setting up development environments
- **flox-sharing** - Composing CUDA base with project environments
- **flox-containers** - Containerizing CUDA environments for deployment
- **flox-services** - Running CUDA workloads as services