--- name: flox-cuda description: CUDA and GPU development with Flox. Use for NVIDIA CUDA setup, GPU computing, deep learning frameworks, cuDNN, and cross-platform GPU/CPU development. --- # Flox CUDA Development Guide ## Prerequisites & Authentication - Sign up for early access at https://flox.dev - Authenticate with `flox auth login` - **Linux-only**: CUDA packages only work on `["aarch64-linux", "x86_64-linux"]` - All CUDA packages are prefixed with `flox-cuda/` in the catalog - **No macOS support**: Use Metal alternatives on Darwin ## Core Commands ```bash # Search for CUDA packages flox search cudatoolkit --all | grep flox-cuda flox search nvcc --all | grep 12_8 # Show available versions flox show flox-cuda/cudaPackages.cudatoolkit # Install CUDA packages flox install flox-cuda/cudaPackages_12_8.cuda_nvcc flox install flox-cuda/cudaPackages.cuda_cudart # Verify installation nvcc --version nvidia-smi ``` ## Package Discovery ```bash # Search for CUDA toolkit flox search cudatoolkit --all | grep flox-cuda # Search for specific versions flox search nvcc --all | grep 12_8 # Show all available versions flox show flox-cuda/cudaPackages.cudatoolkit # Search for CUDA libraries flox search libcublas --all | grep flox-cuda flox search cudnn --all | grep flox-cuda ``` ## Essential CUDA Packages | Package Pattern | Purpose | Example | |-----------------|---------|---------| | `cudaPackages_X_Y.cudatoolkit` | Main CUDA Toolkit | `cudaPackages_12_8.cudatoolkit` | | `cudaPackages_X_Y.cuda_nvcc` | NVIDIA C++ Compiler | `cudaPackages_12_8.cuda_nvcc` | | `cudaPackages.cuda_cudart` | CUDA Runtime API | `cuda_cudart` | | `cudaPackages_X_Y.libcublas` | Linear algebra | `cudaPackages_12_8.libcublas` | | `cudaPackages_X_Y.libcufft` | Fast Fourier Transform | `cudaPackages_12_8.libcufft` | | `cudaPackages_X_Y.libcurand` | Random number generation | `cudaPackages_12_8.libcurand` | | `cudaPackages_X_Y.cudnn_9_11` | Deep neural networks | `cudaPackages_12_8.cudnn_9_11` | | `cudaPackages_X_Y.nccl` | Multi-GPU communication | `cudaPackages_12_8.nccl` | ## Critical: Conflict Resolution **CUDA packages have LICENSE file conflicts requiring explicit priorities:** ```toml [install] cuda_nvcc.pkg-path = "flox-cuda/cudaPackages_12_8.cuda_nvcc" cuda_nvcc.systems = ["aarch64-linux", "x86_64-linux"] cuda_nvcc.priority = 1 # Highest priority cuda_cudart.pkg-path = "flox-cuda/cudaPackages.cuda_cudart" cuda_cudart.systems = ["aarch64-linux", "x86_64-linux"] cuda_cudart.priority = 2 cudatoolkit.pkg-path = "flox-cuda/cudaPackages_12_8.cudatoolkit" cudatoolkit.systems = ["aarch64-linux", "x86_64-linux"] cudatoolkit.priority = 3 # Lower for LICENSE conflicts gcc.pkg-path = "gcc" gcc-unwrapped.pkg-path = "gcc-unwrapped" # For libstdc++ gcc-unwrapped.priority = 5 ``` ## CUDA Version Selection ### CUDA 12.x (Current) ```toml [install] cuda_nvcc.pkg-path = "flox-cuda/cudaPackages_12_8.cuda_nvcc" cuda_nvcc.priority = 1 cuda_nvcc.systems = ["aarch64-linux", "x86_64-linux"] cudatoolkit.pkg-path = "flox-cuda/cudaPackages_12_8.cudatoolkit" cudatoolkit.priority = 3 cudatoolkit.systems = ["aarch64-linux", "x86_64-linux"] ``` ### CUDA 11.x (Legacy Support) ```toml [install] cuda_nvcc.pkg-path = "flox-cuda/cudaPackages_11_8.cuda_nvcc" cuda_nvcc.priority = 1 cuda_nvcc.systems = ["aarch64-linux", "x86_64-linux"] cudatoolkit.pkg-path = "flox-cuda/cudaPackages_11_8.cudatoolkit" cudatoolkit.priority = 3 cudatoolkit.systems = ["aarch64-linux", "x86_64-linux"] ``` ## Cross-Platform GPU Development Dual CUDA/CPU packages for portability (Linux gets CUDA, macOS gets CPU fallback): ```toml [install] ## CUDA packages (Linux only) cuda-pytorch.pkg-path = "flox-cuda/python3Packages.torch" cuda-pytorch.systems = ["x86_64-linux", "aarch64-linux"] cuda-pytorch.priority = 1 ## Non-CUDA packages (macOS + Linux fallback) pytorch.pkg-path = "python313Packages.pytorch" pytorch.systems = ["x86_64-darwin", "aarch64-darwin"] pytorch.priority = 6 # Lower priority ``` ## GPU Detection Pattern **Dynamic CPU/GPU package installation in hooks:** ```bash setup_gpu_packages() { venv="$FLOX_ENV_CACHE/venv" if [ ! -f "$FLOX_ENV_CACHE/.deps_installed" ]; then if lspci 2>/dev/null | grep -E 'NVIDIA|AMD' > /dev/null; then echo "GPU detected, installing CUDA packages" uv pip install --python "$venv/bin/python" \ torch torchvision --index-url https://download.pytorch.org/whl/cu129 else echo "No GPU detected, installing CPU packages" uv pip install --python "$venv/bin/python" \ torch torchvision --index-url https://download.pytorch.org/whl/cpu fi touch "$FLOX_ENV_CACHE/.deps_installed" fi } ``` ## Complete CUDA Environment Examples ### Basic CUDA Development ```toml [install] cuda_nvcc.pkg-path = "flox-cuda/cudaPackages_12_8.cuda_nvcc" cuda_nvcc.priority = 1 cuda_nvcc.systems = ["aarch64-linux", "x86_64-linux"] cuda_cudart.pkg-path = "flox-cuda/cudaPackages.cuda_cudart" cuda_cudart.priority = 2 cuda_cudart.systems = ["aarch64-linux", "x86_64-linux"] gcc.pkg-path = "gcc" gcc-unwrapped.pkg-path = "gcc-unwrapped" gcc-unwrapped.priority = 5 [vars] CUDA_VERSION = "12.8" CUDA_HOME = "$FLOX_ENV" [hook] echo "CUDA $CUDA_VERSION environment ready" echo "nvcc: $(nvcc --version | grep release)" ``` ### Deep Learning with PyTorch ```toml [install] cuda_nvcc.pkg-path = "flox-cuda/cudaPackages_12_8.cuda_nvcc" cuda_nvcc.priority = 1 cuda_nvcc.systems = ["aarch64-linux", "x86_64-linux"] cuda_cudart.pkg-path = "flox-cuda/cudaPackages.cuda_cudart" cuda_cudart.priority = 2 cuda_cudart.systems = ["aarch64-linux", "x86_64-linux"] libcublas.pkg-path = "flox-cuda/cudaPackages_12_8.libcublas" libcublas.priority = 2 libcublas.systems = ["aarch64-linux", "x86_64-linux"] cudnn.pkg-path = "flox-cuda/cudaPackages_12_8.cudnn_9_11" cudnn.priority = 2 cudnn.systems = ["aarch64-linux", "x86_64-linux"] python313Full.pkg-path = "python313Full" uv.pkg-path = "uv" gcc-unwrapped.pkg-path = "gcc-unwrapped" gcc-unwrapped.priority = 5 [vars] CUDA_VERSION = "12.8" PYTORCH_CUDA_ALLOC_CONF = "max_split_size_mb:128" [hook] setup_pytorch_cuda() { venv="$FLOX_ENV_CACHE/venv" if [ ! -d "$venv" ]; then uv venv "$venv" --python python3 fi if [ -f "$venv/bin/activate" ]; then source "$venv/bin/activate" fi if [ ! -f "$FLOX_ENV_CACHE/.deps_installed" ]; then uv pip install --python "$venv/bin/python" \ torch torchvision torchaudio \ --index-url https://download.pytorch.org/whl/cu129 touch "$FLOX_ENV_CACHE/.deps_installed" fi } setup_pytorch_cuda ``` ### TensorFlow with CUDA ```toml [install] cuda_nvcc.pkg-path = "flox-cuda/cudaPackages_12_8.cuda_nvcc" cuda_nvcc.priority = 1 cuda_nvcc.systems = ["aarch64-linux", "x86_64-linux"] cuda_cudart.pkg-path = "flox-cuda/cudaPackages.cuda_cudart" cuda_cudart.priority = 2 cuda_cudart.systems = ["aarch64-linux", "x86_64-linux"] cudnn.pkg-path = "flox-cuda/cudaPackages_12_8.cudnn_9_11" cudnn.priority = 2 cudnn.systems = ["aarch64-linux", "x86_64-linux"] python313Full.pkg-path = "python313Full" uv.pkg-path = "uv" [hook] setup_tensorflow() { venv="$FLOX_ENV_CACHE/venv" [ ! -d "$venv" ] && uv venv "$venv" --python python3 [ -f "$venv/bin/activate" ] && source "$venv/bin/activate" if [ ! -f "$FLOX_ENV_CACHE/.tf_installed" ]; then uv pip install --python "$venv/bin/python" tensorflow[and-cuda] touch "$FLOX_ENV_CACHE/.tf_installed" fi } setup_tensorflow ``` ### Multi-GPU Development ```toml [install] cuda_nvcc.pkg-path = "flox-cuda/cudaPackages_12_8.cuda_nvcc" cuda_nvcc.priority = 1 cuda_nvcc.systems = ["aarch64-linux", "x86_64-linux"] nccl.pkg-path = "flox-cuda/cudaPackages_12_8.nccl" nccl.priority = 2 nccl.systems = ["aarch64-linux", "x86_64-linux"] libcublas.pkg-path = "flox-cuda/cudaPackages_12_8.libcublas" libcublas.priority = 2 libcublas.systems = ["aarch64-linux", "x86_64-linux"] [vars] CUDA_VISIBLE_DEVICES = "0,1,2,3" # All GPUs NCCL_DEBUG = "INFO" ``` ## Modular CUDA Environments ### Base CUDA Environment ```toml # team/cuda-base [install] cuda_nvcc.pkg-path = "flox-cuda/cudaPackages_12_8.cuda_nvcc" cuda_nvcc.priority = 1 cuda_nvcc.systems = ["aarch64-linux", "x86_64-linux"] cuda_cudart.pkg-path = "flox-cuda/cudaPackages.cuda_cudart" cuda_cudart.priority = 2 cuda_cudart.systems = ["aarch64-linux", "x86_64-linux"] gcc.pkg-path = "gcc" gcc-unwrapped.pkg-path = "gcc-unwrapped" gcc-unwrapped.priority = 5 [vars] CUDA_VERSION = "12.8" CUDA_HOME = "$FLOX_ENV" ``` ### CUDA Math Libraries ```toml # team/cuda-math [include] environments = [{ remote = "team/cuda-base" }] [install] libcublas.pkg-path = "flox-cuda/cudaPackages_12_8.libcublas" libcublas.priority = 2 libcublas.systems = ["aarch64-linux", "x86_64-linux"] libcufft.pkg-path = "flox-cuda/cudaPackages_12_8.libcufft" libcufft.priority = 2 libcufft.systems = ["aarch64-linux", "x86_64-linux"] libcurand.pkg-path = "flox-cuda/cudaPackages_12_8.libcurand" libcurand.priority = 2 libcurand.systems = ["aarch64-linux", "x86_64-linux"] ``` ### CUDA Debugging Tools ```toml # team/cuda-debug [install] cuda-gdb.pkg-path = "flox-cuda/cudaPackages_12_8.cuda-gdb" cuda-gdb.systems = ["aarch64-linux", "x86_64-linux"] nsight-systems.pkg-path = "flox-cuda/cudaPackages_12_8.nsight-systems" nsight-systems.systems = ["aarch64-linux", "x86_64-linux"] [vars] CUDA_LAUNCH_BLOCKING = "1" # Synchronous kernel launches for debugging ``` ### Layer for Development ```bash # Base CUDA environment flox activate -r team/cuda-base # Add debugging tools when needed flox activate -r team/cuda-base -- flox activate -r team/cuda-debug ``` ## Testing CUDA Installation ### Verify CUDA Compiler ```bash nvcc --version ``` ### Check GPU Availability ```bash nvidia-smi ``` ### Compile Test Program ```bash cat > hello_cuda.cu << 'EOF' #include __global__ void hello() { printf("Hello from GPU!\n"); } int main() { hello<<<1,1>>>(); cudaDeviceSynchronize(); return 0; } EOF nvcc hello_cuda.cu -o hello_cuda ./hello_cuda ``` ### Test PyTorch CUDA ```python import torch print(f"CUDA available: {torch.cuda.is_available()}") print(f"CUDA version: {torch.version.cuda}") print(f"GPU count: {torch.cuda.device_count()}") if torch.cuda.is_available(): print(f"GPU name: {torch.cuda.get_device_name(0)}") ``` ## Best Practices ### Always Use Priority Values CUDA packages have predictable conflicts - assign explicit priorities ### Version Consistency Use specific versions (e.g., `_12_8`) for reproducibility. Don't mix CUDA versions. ### Modular Design Split base CUDA, math libs, and debugging into separate environments for flexibility ### Test Compilation Verify `nvcc hello.cu -o hello` works after setup ### Platform Constraints Always include `systems = ["aarch64-linux", "x86_64-linux"]` ### Memory Management Set appropriate CUDA memory allocator configs: ```toml [vars] PYTORCH_CUDA_ALLOC_CONF = "max_split_size_mb:128" CUDA_LAUNCH_BLOCKING = "0" # Async by default ``` ## Common CUDA Gotchas ### CUDA Toolkit ≠ Complete Toolkit The cudatoolkit package doesn't include all libraries. Add what you need: - libcublas for linear algebra - libcufft for FFT - cudnn for deep learning ### License Conflicts Every CUDA package may need explicit priority due to LICENSE file conflicts ### No macOS Support CUDA is Linux-only. Use Metal-accelerated packages on Darwin when available ### Version Mixing Don't mix CUDA versions. Use consistent `_X_Y` suffixes across all CUDA packages ### Python Virtual Environments CUDA Python packages (PyTorch, TensorFlow) should be installed in venv with correct CUDA version ### Driver Requirements Ensure NVIDIA driver supports your CUDA version. Check with `nvidia-smi` ## Troubleshooting ### CUDA Not Found ```bash # Check CUDA_HOME echo $CUDA_HOME # Check nvcc which nvcc nvcc --version # Check library paths echo $LD_LIBRARY_PATH ``` ### PyTorch Not Using GPU ```python import torch print(torch.cuda.is_available()) # Should be True print(torch.version.cuda) # Should match your CUDA version # If False, reinstall with correct CUDA version # uv pip install torch --index-url https://download.pytorch.org/whl/cu129 ``` ### Compilation Errors ```bash # Check gcc/g++ version gcc --version g++ --version # Ensure gcc-unwrapped is installed flox list | grep gcc-unwrapped # Check include paths echo $CPATH echo $LIBRARY_PATH ``` ### Runtime Errors ```bash # Check GPU visibility echo $CUDA_VISIBLE_DEVICES # Check for GPU nvidia-smi # Run with debug output CUDA_LAUNCH_BLOCKING=1 python my_script.py ``` ## Related Skills - **flox-environments** - Setting up development environments - **flox-sharing** - Composing CUDA base with project environments - **flox-containers** - Containerizing CUDA environments for deployment - **flox-services** - Running CUDA workloads as services