---
name: flox-cuda
description: CUDA and GPU development with Flox. Use for NVIDIA CUDA setup, GPU computing, deep learning frameworks, cuDNN, and cross-platform GPU/CPU development.
---

# Flox CUDA Development Guide

## Prerequisites & Authentication

- Sign up for early access at https://flox.dev
- Authenticate with `flox auth login`
- **Linux-only**: CUDA packages only work on `["aarch64-linux", "x86_64-linux"]`
- All CUDA packages are prefixed with `flox-cuda/` in the catalog
- **No macOS support**: Use Metal alternatives on Darwin

## Core Commands

```bash
# Search for CUDA packages
flox search cudatoolkit --all | grep flox-cuda
flox search nvcc --all | grep 12_8

# Show available versions
flox show flox-cuda/cudaPackages.cudatoolkit

# Install CUDA packages
flox install flox-cuda/cudaPackages_12_8.cuda_nvcc
flox install flox-cuda/cudaPackages.cuda_cudart

# Verify installation
nvcc --version
nvidia-smi
```

## Package Discovery

```bash
# Search for CUDA toolkit
flox search cudatoolkit --all | grep flox-cuda

# Search for specific versions
flox search nvcc --all | grep 12_8

# Show all available versions
flox show flox-cuda/cudaPackages.cudatoolkit

# Search for CUDA libraries
flox search libcublas --all | grep flox-cuda
flox search cudnn --all | grep flox-cuda
```

## Essential CUDA Packages

| Package Pattern | Purpose | Example |
|-----------------|---------|---------|
| `cudaPackages_X_Y.cudatoolkit` | Main CUDA Toolkit | `cudaPackages_12_8.cudatoolkit` |
| `cudaPackages_X_Y.cuda_nvcc` | NVIDIA CUDA compiler | `cudaPackages_12_8.cuda_nvcc` |
| `cudaPackages.cuda_cudart` | CUDA Runtime API | `cuda_cudart` |
| `cudaPackages_X_Y.libcublas` | Linear algebra | `cudaPackages_12_8.libcublas` |
| `cudaPackages_X_Y.libcufft` | Fast Fourier Transform | `cudaPackages_12_8.libcufft` |
| `cudaPackages_X_Y.libcurand` | Random number generation | `cudaPackages_12_8.libcurand` |
| `cudaPackages_X_Y.cudnn_9_11` | Deep neural networks | `cudaPackages_12_8.cudnn_9_11` |
| `cudaPackages_X_Y.nccl` | Multi-GPU communication | `cudaPackages_12_8.nccl` |

## Critical: Conflict Resolution

**CUDA packages have LICENSE file conflicts requiring explicit priorities:**

```toml
[install]
cuda_nvcc.pkg-path = "flox-cuda/cudaPackages_12_8.cuda_nvcc"
cuda_nvcc.systems = ["aarch64-linux", "x86_64-linux"]
cuda_nvcc.priority = 1 # Highest priority

cuda_cudart.pkg-path = "flox-cuda/cudaPackages.cuda_cudart"
cuda_cudart.systems = ["aarch64-linux", "x86_64-linux"]
cuda_cudart.priority = 2

cudatoolkit.pkg-path = "flox-cuda/cudaPackages_12_8.cudatoolkit"
cudatoolkit.systems = ["aarch64-linux", "x86_64-linux"]
cudatoolkit.priority = 3 # Lower priority; yields on LICENSE conflicts

gcc.pkg-path = "gcc"
gcc-unwrapped.pkg-path = "gcc-unwrapped" # For libstdc++
gcc-unwrapped.priority = 5
```
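
Lower numbers win: when two packages provide the same file, the entry with the smaller `priority` keeps it (Flox's default priority is 5). A minimal sketch of that resolution rule, with hypothetical file lists for illustration:

```python
def resolve_conflicts(packages: dict[str, dict]) -> dict[str, str]:
    """Map each provided file path to the package that wins it.
    `packages` maps a package name to {"priority": int, "files": [...]};
    the lowest priority number wins a contested path."""
    owner: dict[str, str] = {}
    best: dict[str, int] = {}
    for name, meta in packages.items():
        for path in meta["files"]:
            if path not in best or meta["priority"] < best[path]:
                best[path] = meta["priority"]
                owner[path] = name
    return owner

# Both packages ship a LICENSE file; cuda_nvcc (priority 1) wins it.
pkgs = {
    "cuda_nvcc":   {"priority": 1, "files": ["LICENSE", "bin/nvcc"]},
    "cudatoolkit": {"priority": 3, "files": ["LICENSE", "bin/cuobjdump"]},
}
```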

## CUDA Version Selection

### CUDA 12.x (Current)

```toml
[install]
cuda_nvcc.pkg-path = "flox-cuda/cudaPackages_12_8.cuda_nvcc"
cuda_nvcc.priority = 1
cuda_nvcc.systems = ["aarch64-linux", "x86_64-linux"]

cudatoolkit.pkg-path = "flox-cuda/cudaPackages_12_8.cudatoolkit"
cudatoolkit.priority = 3
cudatoolkit.systems = ["aarch64-linux", "x86_64-linux"]
```

### CUDA 11.x (Legacy Support)

```toml
[install]
cuda_nvcc.pkg-path = "flox-cuda/cudaPackages_11_8.cuda_nvcc"
cuda_nvcc.priority = 1
cuda_nvcc.systems = ["aarch64-linux", "x86_64-linux"]

cudatoolkit.pkg-path = "flox-cuda/cudaPackages_11_8.cudatoolkit"
cudatoolkit.priority = 3
cudatoolkit.systems = ["aarch64-linux", "x86_64-linux"]
```

## Cross-Platform GPU Development

Declare dual CUDA/CPU packages for portability: Linux systems get the CUDA build, macOS gets a CPU fallback.

```toml
[install]
# CUDA packages (Linux only)
cuda-pytorch.pkg-path = "flox-cuda/python3Packages.torch"
cuda-pytorch.systems = ["x86_64-linux", "aarch64-linux"]
cuda-pytorch.priority = 1

# Non-CUDA packages (macOS fallback)
pytorch.pkg-path = "python313Packages.pytorch"
pytorch.systems = ["x86_64-darwin", "aarch64-darwin"]
pytorch.priority = 6 # Lower priority
```
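
The effect of the two `systems` lists can be sketched as a selection function (illustrative only; Flox itself performs this resolution from the manifest):

```python
def packages_for(system: str, manifest: dict[str, dict]) -> list[str]:
    """Return the install entries whose `systems` list covers `system`."""
    return [name for name, meta in manifest.items()
            if system in meta["systems"]]

manifest = {
    "cuda-pytorch": {"systems": ["x86_64-linux", "aarch64-linux"]},
    "pytorch":      {"systems": ["x86_64-darwin", "aarch64-darwin"]},
}
```

On `x86_64-linux` only the CUDA build is selected; on `aarch64-darwin` only the CPU fallback is.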

## GPU Detection Pattern

**Dynamic CPU/GPU package installation in hooks:**

```bash
setup_gpu_packages() {
  venv="$FLOX_ENV_CACHE/venv"

  if [ ! -f "$FLOX_ENV_CACHE/.deps_installed" ]; then
    if lspci 2>/dev/null | grep -E 'NVIDIA|AMD' > /dev/null; then
      echo "GPU detected, installing CUDA packages"
      uv pip install --python "$venv/bin/python" \
        torch torchvision --index-url https://download.pytorch.org/whl/cu129
    else
      echo "No GPU detected, installing CPU packages"
      uv pip install --python "$venv/bin/python" \
        torch torchvision --index-url https://download.pytorch.org/whl/cpu
    fi
    touch "$FLOX_ENV_CACHE/.deps_installed"
  fi
}
```
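
The branch above reduces to choosing a wheel index from a single boolean. A hypothetical helper mirroring that logic (the `cu129`/`cpu` tags are taken from the hook above):

```python
PYTORCH_WHEEL_BASE = "https://download.pytorch.org/whl"

def torch_index_url(gpu_present: bool, cuda_tag: str = "cu129") -> str:
    """Pick the PyTorch wheel index: a CUDA build when a GPU was
    detected, otherwise the CPU-only build."""
    return f"{PYTORCH_WHEEL_BASE}/{cuda_tag if gpu_present else 'cpu'}"
```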

## Complete CUDA Environment Examples

### Basic CUDA Development

```toml
[install]
cuda_nvcc.pkg-path = "flox-cuda/cudaPackages_12_8.cuda_nvcc"
cuda_nvcc.priority = 1
cuda_nvcc.systems = ["aarch64-linux", "x86_64-linux"]

cuda_cudart.pkg-path = "flox-cuda/cudaPackages.cuda_cudart"
cuda_cudart.priority = 2
cuda_cudart.systems = ["aarch64-linux", "x86_64-linux"]

gcc.pkg-path = "gcc"
gcc-unwrapped.pkg-path = "gcc-unwrapped"
gcc-unwrapped.priority = 5

[vars]
CUDA_VERSION = "12.8"
CUDA_HOME = "$FLOX_ENV"

[hook]
on-activate = '''
  echo "CUDA $CUDA_VERSION environment ready"
  echo "nvcc: $(nvcc --version | grep release)"
'''
```

### Deep Learning with PyTorch

```toml
[install]
cuda_nvcc.pkg-path = "flox-cuda/cudaPackages_12_8.cuda_nvcc"
cuda_nvcc.priority = 1
cuda_nvcc.systems = ["aarch64-linux", "x86_64-linux"]

cuda_cudart.pkg-path = "flox-cuda/cudaPackages.cuda_cudart"
cuda_cudart.priority = 2
cuda_cudart.systems = ["aarch64-linux", "x86_64-linux"]

libcublas.pkg-path = "flox-cuda/cudaPackages_12_8.libcublas"
libcublas.priority = 2
libcublas.systems = ["aarch64-linux", "x86_64-linux"]

cudnn.pkg-path = "flox-cuda/cudaPackages_12_8.cudnn_9_11"
cudnn.priority = 2
cudnn.systems = ["aarch64-linux", "x86_64-linux"]

python313Full.pkg-path = "python313Full"
uv.pkg-path = "uv"
gcc-unwrapped.pkg-path = "gcc-unwrapped"
gcc-unwrapped.priority = 5

[vars]
CUDA_VERSION = "12.8"
PYTORCH_CUDA_ALLOC_CONF = "max_split_size_mb:128"

[hook]
on-activate = '''
  setup_pytorch_cuda() {
    venv="$FLOX_ENV_CACHE/venv"

    if [ ! -d "$venv" ]; then
      uv venv "$venv" --python python3
    fi

    if [ -f "$venv/bin/activate" ]; then
      source "$venv/bin/activate"
    fi

    if [ ! -f "$FLOX_ENV_CACHE/.deps_installed" ]; then
      uv pip install --python "$venv/bin/python" \
        torch torchvision torchaudio \
        --index-url https://download.pytorch.org/whl/cu129
      touch "$FLOX_ENV_CACHE/.deps_installed"
    fi
  }

  setup_pytorch_cuda
'''
```

### TensorFlow with CUDA

```toml
[install]
cuda_nvcc.pkg-path = "flox-cuda/cudaPackages_12_8.cuda_nvcc"
cuda_nvcc.priority = 1
cuda_nvcc.systems = ["aarch64-linux", "x86_64-linux"]

cuda_cudart.pkg-path = "flox-cuda/cudaPackages.cuda_cudart"
cuda_cudart.priority = 2
cuda_cudart.systems = ["aarch64-linux", "x86_64-linux"]

cudnn.pkg-path = "flox-cuda/cudaPackages_12_8.cudnn_9_11"
cudnn.priority = 2
cudnn.systems = ["aarch64-linux", "x86_64-linux"]

python313Full.pkg-path = "python313Full"
uv.pkg-path = "uv"

[hook]
on-activate = '''
  setup_tensorflow() {
    venv="$FLOX_ENV_CACHE/venv"
    [ ! -d "$venv" ] && uv venv "$venv" --python python3
    [ -f "$venv/bin/activate" ] && source "$venv/bin/activate"

    if [ ! -f "$FLOX_ENV_CACHE/.tf_installed" ]; then
      uv pip install --python "$venv/bin/python" "tensorflow[and-cuda]"
      touch "$FLOX_ENV_CACHE/.tf_installed"
    fi
  }

  setup_tensorflow
'''
```

### Multi-GPU Development

```toml
[install]
cuda_nvcc.pkg-path = "flox-cuda/cudaPackages_12_8.cuda_nvcc"
cuda_nvcc.priority = 1
cuda_nvcc.systems = ["aarch64-linux", "x86_64-linux"]

nccl.pkg-path = "flox-cuda/cudaPackages_12_8.nccl"
nccl.priority = 2
nccl.systems = ["aarch64-linux", "x86_64-linux"]

libcublas.pkg-path = "flox-cuda/cudaPackages_12_8.libcublas"
libcublas.priority = 2
libcublas.systems = ["aarch64-linux", "x86_64-linux"]

[vars]
CUDA_VISIBLE_DEVICES = "0,1,2,3" # All GPUs
NCCL_DEBUG = "INFO"
```

## Modular CUDA Environments

### Base CUDA Environment

```toml
# team/cuda-base
[install]
cuda_nvcc.pkg-path = "flox-cuda/cudaPackages_12_8.cuda_nvcc"
cuda_nvcc.priority = 1
cuda_nvcc.systems = ["aarch64-linux", "x86_64-linux"]

cuda_cudart.pkg-path = "flox-cuda/cudaPackages.cuda_cudart"
cuda_cudart.priority = 2
cuda_cudart.systems = ["aarch64-linux", "x86_64-linux"]

gcc.pkg-path = "gcc"
gcc-unwrapped.pkg-path = "gcc-unwrapped"
gcc-unwrapped.priority = 5

[vars]
CUDA_VERSION = "12.8"
CUDA_HOME = "$FLOX_ENV"
```

### CUDA Math Libraries

```toml
# team/cuda-math
[include]
environments = [{ remote = "team/cuda-base" }]

[install]
libcublas.pkg-path = "flox-cuda/cudaPackages_12_8.libcublas"
libcublas.priority = 2
libcublas.systems = ["aarch64-linux", "x86_64-linux"]

libcufft.pkg-path = "flox-cuda/cudaPackages_12_8.libcufft"
libcufft.priority = 2
libcufft.systems = ["aarch64-linux", "x86_64-linux"]

libcurand.pkg-path = "flox-cuda/cudaPackages_12_8.libcurand"
libcurand.priority = 2
libcurand.systems = ["aarch64-linux", "x86_64-linux"]
```

### CUDA Debugging Tools

```toml
# team/cuda-debug
[install]
cuda-gdb.pkg-path = "flox-cuda/cudaPackages_12_8.cuda-gdb"
cuda-gdb.systems = ["aarch64-linux", "x86_64-linux"]

nsight-systems.pkg-path = "flox-cuda/cudaPackages_12_8.nsight-systems"
nsight-systems.systems = ["aarch64-linux", "x86_64-linux"]

[vars]
CUDA_LAUNCH_BLOCKING = "1" # Synchronous kernel launches for debugging
```

### Layer for Development

```bash
# Base CUDA environment
flox activate -r team/cuda-base

# Add debugging tools when needed
flox activate -r team/cuda-base -- flox activate -r team/cuda-debug
```

## Testing CUDA Installation

### Verify CUDA Compiler

```bash
nvcc --version
```

### Check GPU Availability

```bash
nvidia-smi
```

### Compile Test Program

```bash
cat > hello_cuda.cu << 'EOF'
#include <stdio.h>

__global__ void hello() {
    printf("Hello from GPU!\n");
}

int main() {
    hello<<<1,1>>>();
    cudaDeviceSynchronize();
    return 0;
}
EOF

nvcc hello_cuda.cu -o hello_cuda
./hello_cuda
```

### Test PyTorch CUDA

```python
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA version: {torch.version.cuda}")
print(f"GPU count: {torch.cuda.device_count()}")
if torch.cuda.is_available():
    print(f"GPU name: {torch.cuda.get_device_name(0)}")
```

## Best Practices

### Always Use Priority Values
CUDA packages have predictable conflicts; assign explicit priorities.

### Version Consistency
Use specific versions (e.g., `_12_8`) for reproducibility. Don't mix CUDA versions.

### Modular Design
Split base CUDA, math libs, and debugging into separate environments for flexibility.

### Test Compilation
Verify `nvcc hello.cu -o hello` works after setup.

### Platform Constraints
Always include `systems = ["aarch64-linux", "x86_64-linux"]`.

### Memory Management
Set appropriate CUDA memory allocator configs:

```toml
[vars]
PYTORCH_CUDA_ALLOC_CONF = "max_split_size_mb:128"
CUDA_LAUNCH_BLOCKING = "0" # Async by default
```
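
`PYTORCH_CUDA_ALLOC_CONF` is a comma-separated list of `option:value` pairs. A small parser showing how such a string decomposes (illustrative, not PyTorch's own parser):

```python
def parse_alloc_conf(raw: str) -> dict[str, str]:
    """Split an allocator-config string such as
    "max_split_size_mb:128,expandable_segments:True" into a dict."""
    conf: dict[str, str] = {}
    for pair in raw.split(","):
        if pair.strip():
            key, _, value = pair.partition(":")
            conf[key.strip()] = value.strip()
    return conf
```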

## Common CUDA Gotchas

### CUDA Toolkit ≠ Complete Toolkit
The cudatoolkit package doesn't include all libraries. Add what you need:
- libcublas for linear algebra
- libcufft for FFT
- cudnn for deep learning

### License Conflicts
Every CUDA package may need an explicit priority due to LICENSE file conflicts.

### No macOS Support
CUDA is Linux-only. Use Metal-accelerated packages on Darwin when available.

### Version Mixing
Don't mix CUDA versions. Use consistent `_X_Y` suffixes across all CUDA packages.
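
A quick way to enforce that rule is to collect the `_X_Y` suffixes from a manifest's pkg-paths; a hypothetical lint helper:

```python
import re

def cuda_versions_used(pkg_paths: list[str]) -> set[str]:
    """Collect the _X_Y version suffixes appearing in flox-cuda
    pkg-paths, e.g. {"12_8"}; more than one entry means CUDA
    versions are being mixed. Unversioned cudaPackages entries
    are ignored."""
    pattern = re.compile(r"cudaPackages_(\d+_\d+)\.")
    return {m.group(1) for p in pkg_paths for m in pattern.finditer(p)}

paths = [
    "flox-cuda/cudaPackages_12_8.cuda_nvcc",
    "flox-cuda/cudaPackages_12_8.libcublas",
]
```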

### Python Virtual Environments
CUDA Python packages (PyTorch, TensorFlow) should be installed in a venv built against the matching CUDA version.

### Driver Requirements
Ensure the NVIDIA driver supports your CUDA version. Check with `nvidia-smi`.

## Troubleshooting

### CUDA Not Found

```bash
# Check CUDA_HOME
echo $CUDA_HOME

# Check nvcc
which nvcc
nvcc --version

# Check library paths
echo $LD_LIBRARY_PATH
```

### PyTorch Not Using GPU

```python
import torch
print(torch.cuda.is_available())  # Should be True
print(torch.version.cuda)         # Should match your CUDA version

# If False, reinstall with the correct CUDA version:
# uv pip install torch --index-url https://download.pytorch.org/whl/cu129
```

### Compilation Errors

```bash
# Check gcc/g++ version
gcc --version
g++ --version

# Ensure gcc-unwrapped is installed
flox list | grep gcc-unwrapped

# Check include paths
echo $CPATH
echo $LIBRARY_PATH
```

### Runtime Errors

```bash
# Check GPU visibility
echo $CUDA_VISIBLE_DEVICES

# Check for GPU
nvidia-smi

# Run with debug output
CUDA_LAUNCH_BLOCKING=1 python my_script.py
```

## Related Skills

- **flox-environments** - Setting up development environments
- **flox-sharing** - Composing CUDA base with project environments
- **flox-containers** - Containerizing CUDA environments for deployment
- **flox-services** - Running CUDA workloads as services