gh-josiahsiegel-claude-code…/skills/container-apps-gpu-2025.md

## 🚨 CRITICAL GUIDELINES

### Windows File Path Requirements

**MANDATORY: Always Use Backslashes on Windows for File Paths**

When using Edit or Write tools on Windows, you MUST use backslashes (`\`) in file paths, NOT forward slashes (`/`).

**Examples:**
- ❌ WRONG: `D:/repos/project/file.tsx`
- ✅ CORRECT: `D:\repos\project\file.tsx`

This applies to:
- Edit tool file_path parameter
- Write tool file_path parameter
- All file operations on Windows systems

### Documentation Guidelines

**NEVER create new documentation files unless explicitly requested by the user.**

- **Priority**: Update existing README.md files rather than creating new documentation
- **Repository cleanliness**: Keep repository root clean - only README.md unless user requests otherwise
- **Style**: Documentation should be concise, direct, and professional - avoid AI-generated tone
- **User preference**: Only create additional .md files when user specifically asks for documentation

---


# Azure Container Apps GPU Support - 2025 Features

Complete knowledge base for Azure Container Apps with GPU support, serverless capabilities, and Dapr integration (2025 GA features).

## Overview

Azure Container Apps is a serverless container platform with native GPU support, Dapr integration, and scale-to-zero capabilities for cost-efficient AI/ML workloads.

## Key 2025 Features (Build Announcements)

### 1. Serverless GPU (GA)
- **Automatic scaling**: Scale GPU workloads based on demand
- **Scale-to-zero**: Pay only when GPU is actively used
- **Per-second billing**: Granular cost control
- **Optimized cold start**: Fast initialization for AI models
- **Reduced operational overhead**: No infrastructure management

### 2. Dedicated GPU (GA)
- **Consistent performance**: Dedicated GPU resources
- **Simplified AI deployment**: Easy model hosting
- **Long-running workloads**: Ideal for training and continuous inference
- **Multiple GPU types**: NVIDIA A100, T4, and more

### 3. Dynamic Sessions with GPU (Early Access)
- **Sandboxed execution**: Run untrusted AI-generated code
- **Hyper-V isolation**: Enhanced security
- **GPU-powered Python interpreter**: Handle compute-intensive AI workloads
- **Scale at runtime**: Dynamic resource allocation

### 4. Foundry Models Integration
- **Deploy AI models directly**: During container app creation
- **Ready-to-use models**: Pre-configured inference endpoints
- **Azure AI Foundry**: Seamless integration

### 5. Workflow with Durable Task Scheduler (Preview)
- **Long-running workflows**: Reliable orchestration
- **State management**: Automatic persistence
- **Event-driven**: Trigger workflows from events

### 6. Native Azure Functions Support
- **Functions runtime**: Run Azure Functions in Container Apps
- **Consistent development**: Same code, serverless execution
- **Event triggers**: All Functions triggers supported

### 7. Dapr Integration (GA)
- **Service discovery**: Built-in DNS-based discovery
- **State management**: Distributed state stores
- **Pub/sub messaging**: Reliable messaging patterns
- **Service invocation**: Resilient service-to-service calls
- **Observability**: Integrated tracing and metrics

## Creating Container Apps with GPU

### Basic Container App with Serverless GPU

```bash
# Create Container Apps environment
az containerapp env create \
  --name myenv \
  --resource-group MyRG \
  --location eastus \
  --logs-workspace-id <workspace-id> \
  --logs-workspace-key <workspace-key>

# Create Container App with GPU
az containerapp create \
  --name myapp-gpu \
  --resource-group MyRG \
  --environment myenv \
  --image myregistry.azurecr.io/ai-model:latest \
  --cpu 4 \
  --memory 8Gi \
  --gpu-type nvidia-a100 \
  --gpu-count 1 \
  --min-replicas 0 \
  --max-replicas 10 \
  --ingress external \
  --target-port 8080
```

### Production-Ready Container App with GPU

```bash
az containerapp create \
  --name myapp-gpu-prod \
  --resource-group MyRG \
  --environment myenv \
  \
  # Container configuration
  --image myregistry.azurecr.io/ai-model:latest \
  --registry-server myregistry.azurecr.io \
  --registry-identity system \
  \
  # Resources
  --cpu 4 \
  --memory 8Gi \
  --gpu-type nvidia-a100 \
  --gpu-count 1 \
  \
  # Scaling
  --min-replicas 0 \
  --max-replicas 20 \
  --scale-rule-name http-scaling \
  --scale-rule-type http \
  --scale-rule-http-concurrency 10 \
  \
  # Networking
  --ingress external \
  --target-port 8080 \
  --transport http2 \
  --exposed-port 8080 \
  \
  # Security
  --registry-identity system \
  --env-vars "AZURE_CLIENT_ID=secretref:client-id" \
  \
  # Monitoring
  --dapr-app-id myapp \
  --dapr-app-port 8080 \
  --dapr-app-protocol http \
  --enable-dapr \
  \
  # Identity
  --system-assigned
```

## Container Apps Environment Configuration

### Environment with Zone Redundancy

```bash
az containerapp env create \
  --name myenv-prod \
  --resource-group MyRG \
  --location eastus \
  --logs-workspace-id <workspace-id> \
  --logs-workspace-key <workspace-key> \
  --zone-redundant true \
  --enable-workload-profiles true
```

### Workload Profiles (Dedicated GPU)

```bash
# Create environment with workload profiles
az containerapp env create \
  --name myenv-gpu \
  --resource-group MyRG \
  --location eastus \
  --enable-workload-profiles true

# Add GPU workload profile
az containerapp env workload-profile add \
  --name myenv-gpu \
  --resource-group MyRG \
  --workload-profile-name gpu-profile \
  --workload-profile-type GPU-A100 \
  --min-nodes 0 \
  --max-nodes 10

# Create container app with GPU profile
az containerapp create \
  --name myapp-dedicated-gpu \
  --resource-group MyRG \
  --environment myenv-gpu \
  --workload-profile-name gpu-profile \
  --image myregistry.azurecr.io/training-job:latest \
  --cpu 8 \
  --memory 16Gi \
  --min-replicas 1 \
  --max-replicas 5
```

## GPU Scaling Rules

### Custom Prometheus Scaling

```bash
az containerapp create \
  --name myapp-gpu-prometheus \
  --resource-group MyRG \
  --environment myenv \
  --image myregistry.azurecr.io/ai-model:latest \
  --cpu 4 \
  --memory 8Gi \
  --gpu-type nvidia-a100 \
  --gpu-count 1 \
  --min-replicas 0 \
  --max-replicas 10 \
  --scale-rule-name gpu-utilization \
  --scale-rule-type custom \
  --scale-rule-custom-type prometheus \
  --scale-rule-metadata \
    serverAddress=http://prometheus.monitoring.svc.cluster.local:9090 \
    metricName=gpu_utilization \
    threshold=80 \
    query="avg(nvidia_gpu_utilization{app='myapp'})"
```

### Queue-Based Scaling (Azure Service Bus)

```bash
az containerapp create \
  --name myapp-queue-processor \
  --resource-group MyRG \
  --environment myenv \
  --image myregistry.azurecr.io/batch-processor:latest \
  --cpu 4 \
  --memory 8Gi \
  --gpu-type nvidia-t4 \
  --gpu-count 1 \
  --min-replicas 0 \
  --max-replicas 50 \
  --scale-rule-name queue-scaling \
  --scale-rule-type azure-servicebus \
  --scale-rule-metadata \
    queueName=ai-jobs \
    namespace=myservicebus \
    messageCount=5 \
  --scale-rule-auth connection=servicebus-connection
```

## Dapr Integration

### Enable Dapr on Container App

```bash
az containerapp create \
  --name myapp-dapr \
  --resource-group MyRG \
  --environment myenv \
  --image myregistry.azurecr.io/myapp:latest \
  --enable-dapr \
  --dapr-app-id myapp \
  --dapr-app-port 8080 \
  --dapr-app-protocol http \
  --dapr-http-max-request-size 4 \
  --dapr-http-read-buffer-size 4 \
  --dapr-log-level info \
  --dapr-enable-api-logging true
```

### Dapr State Store (Azure Cosmos DB)

```yaml
# Create Dapr component for state store
apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: statestore
spec:
  type: state.azure.cosmosdb
  version: v1
  metadata:
    - name: url
      value: "https://mycosmosdb.documents.azure.com:443/"
    - name: masterKey
      secretRef: cosmosdb-key
    - name: database
      value: "mydb"
    - name: collection
      value: "state"
```

```bash
# Create the component
az containerapp env dapr-component set \
  --name myenv \
  --resource-group MyRG \
  --dapr-component-name statestore \
  --yaml component.yaml
```

### Dapr Pub/Sub (Azure Service Bus)

```yaml
apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: pubsub
spec:
  type: pubsub.azure.servicebus.topics
  version: v1
  metadata:
    - name: connectionString
      secretRef: servicebus-connection
    - name: consumerID
      value: "myapp"
```

### Service-to-Service Invocation

```python
# Python example using Dapr SDK
from dapr.clients import DaprClient

with DaprClient() as client:
    # Invoke another service
    response = client.invoke_method(
        app_id='other-service',
        method_name='process',
        data='{"input": "data"}'
    )

    # Save state
    client.save_state(
        store_name='statestore',
        key='mykey',
        value='myvalue'
    )

    # Publish message
    client.publish_event(
        pubsub_name='pubsub',
        topic_name='orders',
        data='{"orderId": "123"}'
    )
```

## AI Model Deployment Patterns

### OpenAI-Compatible Endpoint

```dockerfile
# Dockerfile for vLLM model serving
FROM vllm/vllm-openai:latest

ENV MODEL_NAME="meta-llama/Llama-3.1-8B-Instruct"
ENV GPU_MEMORY_UTILIZATION=0.9
ENV MAX_MODEL_LEN=4096

CMD ["--model", "${MODEL_NAME}", \
     "--gpu-memory-utilization", "${GPU_MEMORY_UTILIZATION}", \
     "--max-model-len", "${MAX_MODEL_LEN}", \
     "--port", "8080"]
```

```bash
# Deploy vLLM model
az containerapp create \
  --name llama-inference \
  --resource-group MyRG \
  --environment myenv \
  --image vllm/vllm-openai:latest \
  --cpu 8 \
  --memory 32Gi \
  --gpu-type nvidia-a100 \
  --gpu-count 1 \
  --min-replicas 1 \
  --max-replicas 5 \
  --target-port 8080 \
  --ingress external \
  --env-vars \
    MODEL_NAME="meta-llama/Llama-3.1-8B-Instruct" \
    GPU_MEMORY_UTILIZATION="0.9" \
    HF_TOKEN=secretref:huggingface-token
```

### Stable Diffusion Image Generation

```bash
az containerapp create \
  --name stable-diffusion \
  --resource-group MyRG \
  --environment myenv \
  --image myregistry.azurecr.io/stable-diffusion:latest \
  --cpu 4 \
  --memory 16Gi \
  --gpu-type nvidia-a100 \
  --gpu-count 1 \
  --min-replicas 0 \
  --max-replicas 10 \
  --target-port 7860 \
  --ingress external \
  --scale-rule-name http-scaling \
  --scale-rule-type http \
  --scale-rule-http-concurrency 1
```

### Batch Processing Job

```bash
az containerapp job create \
  --name batch-training-job \
  --resource-group MyRG \
  --environment myenv \
  --trigger-type Manual \
  --image myregistry.azurecr.io/training:latest \
  --cpu 8 \
  --memory 32Gi \
  --gpu-type nvidia-a100 \
  --gpu-count 2 \
  --parallelism 1 \
  --replica-timeout 7200 \
  --replica-retry-limit 3 \
  --env-vars \
    DATASET_URL="https://mystorage.blob.core.windows.net/datasets/train.csv" \
    MODEL_OUTPUT="https://mystorage.blob.core.windows.net/models/" \
    EPOCHS="100"

# Execute job
az containerapp job start \
  --name batch-training-job \
  --resource-group MyRG
```

## Monitoring and Observability

### Application Insights Integration

```bash
az containerapp create \
  --name myapp-monitored \
  --resource-group MyRG \
  --environment myenv \
  --image myregistry.azurecr.io/myapp:latest \
  --env-vars \
    APPLICATIONINSIGHTS_CONNECTION_STRING=secretref:appinsights-connection
```

### Query Logs

```bash
# Stream logs
az containerapp logs show \
  --name myapp-gpu \
  --resource-group MyRG \
  --follow

# Query with Log Analytics
az monitor log-analytics query \
  --workspace <workspace-id> \
  --analytics-query "ContainerAppConsoleLogs_CL | where ContainerAppName_s == 'myapp-gpu' | take 100"
```

### Metrics and Alerts

```bash
# Create metric alert for GPU usage
az monitor metrics alert create \
  --name high-gpu-usage \
  --resource-group MyRG \
  --scopes $(az containerapp show -g MyRG -n myapp-gpu --query id -o tsv) \
  --condition "avg Requests > 100" \
  --window-size 5m \
  --evaluation-frequency 1m \
  --action <action-group-id>
```

## Security Best Practices

### Managed Identity

```bash
# Create with system-assigned identity
az containerapp create \
  --name myapp-identity \
  --resource-group MyRG \
  --environment myenv \
  --system-assigned \
  --image myregistry.azurecr.io/myapp:latest

# Get identity principal ID
IDENTITY_ID=$(az containerapp show -g MyRG -n myapp-identity --query identity.principalId -o tsv)

# Assign role to access Key Vault
az role assignment create \
  --assignee $IDENTITY_ID \
  --role "Key Vault Secrets User" \
  --scope /subscriptions/<sub-id>/resourceGroups/MyRG/providers/Microsoft.KeyVault/vaults/mykeyvault

# Use user-assigned identity
az identity create --name myapp-identity --resource-group MyRG
IDENTITY_RESOURCE_ID=$(az identity show -g MyRG -n myapp-identity --query id -o tsv)

az containerapp create \
  --name myapp-user-identity \
  --resource-group MyRG \
  --environment myenv \
  --user-assigned $IDENTITY_RESOURCE_ID \
  --image myregistry.azurecr.io/myapp:latest
```

### Secret Management

```bash
# Add secrets
az containerapp secret set \
  --name myapp-gpu \
  --resource-group MyRG \
  --secrets \
    huggingface-token="<token>" \
    api-key="<key>"

# Reference secrets in environment variables
az containerapp update \
  --name myapp-gpu \
  --resource-group MyRG \
  --set-env-vars \
    HF_TOKEN=secretref:huggingface-token \
    API_KEY=secretref:api-key
```

## Cost Optimization

### Scale-to-Zero Configuration

```bash
az containerapp create \
  --name myapp-scale-zero \
  --resource-group MyRG \
  --environment myenv \
  --image myregistry.azurecr.io/myapp:latest \
  --min-replicas 0 \
  --max-replicas 10 \
  --scale-rule-name http-scaling \
  --scale-rule-type http \
  --scale-rule-http-concurrency 10
```

**Cost savings**: Pay only when requests are being processed. GPU costs are per-second when active.

### Right-Sizing Resources

```bash
# Start with minimal resources
--cpu 2 --memory 4Gi --gpu-count 1

# Monitor and adjust based on actual usage
az monitor metrics list \
  --resource $(az containerapp show -g MyRG -n myapp-gpu --query id -o tsv) \
  --metric "CpuPercentage,MemoryPercentage"
```

### Use Spot/Preemptible GPUs (Future Feature)

When available, configure spot instances for non-critical workloads to save up to 80% on GPU costs.

## Troubleshooting

### Check Revision Status

```bash
az containerapp revision list \
  --name myapp-gpu \
  --resource-group MyRG \
  --output table
```

### View Revision Details

```bash
az containerapp revision show \
  --name <revision-name> \
  --app myapp-gpu \
  --resource-group MyRG
```

### Restart Container App

```bash
az containerapp update \
  --name myapp-gpu \
  --resource-group MyRG \
  --force-restart
```

### GPU Not Available

If GPU is not provisioning:
1. Check region availability: Not all regions support GPU
2. Verify quota: Request quota increase if needed
3. Check workload profile: Ensure GPU workload profile is created

## Best Practices

✓ Use scale-to-zero for intermittent workloads
✓ Implement health probes (liveness and readiness)
✓ Use managed identities for authentication
✓ Store secrets in Azure Key Vault
✓ Enable Dapr for microservices patterns
✓ Configure appropriate scaling rules
✓ Monitor GPU utilization and adjust resources
✓ Use Container Apps jobs for batch processing
✓ Implement retry logic for transient failures
✓ Use Application Insights for observability

## References

- [Container Apps GPU Documentation](https://learn.microsoft.com/en-us/azure/container-apps/gpu-support)
- [Dapr Integration](https://learn.microsoft.com/en-us/azure/container-apps/dapr-overview)
- [Scaling Rules](https://learn.microsoft.com/en-us/azure/container-apps/scale-app)
- [Build 2025 Announcements](https://azure.microsoft.com/en-us/blog/container-apps-build-2025/)

Azure Container Apps with GPU support provides the ultimate serverless platform for AI/ML workloads!