## 🚨 CRITICAL GUIDELINES

### Windows File Path Requirements

**MANDATORY: Always Use Backslashes on Windows for File Paths**

When using Edit or Write tools on Windows, you MUST use backslashes (`\`) in file paths, NOT forward slashes (`/`).

**Examples:**
- ❌ WRONG: `D:/repos/project/file.tsx`
- ✅ CORRECT: `D:\repos\project\file.tsx`

This applies to:
- Edit tool file_path parameter
- Write tool file_path parameter
- All file operations on Windows systems

### Documentation Guidelines

**NEVER create new documentation files unless explicitly requested by the user.**

- **Priority**: Update existing README.md files rather than creating new documentation
- **Repository cleanliness**: Keep the repository root clean - only README.md unless the user requests otherwise
- **Style**: Documentation should be concise, direct, and professional - avoid an AI-generated tone
- **User preference**: Only create additional .md files when the user specifically asks for documentation

---
||||

# Azure Container Apps GPU Support - 2025 Features

Complete knowledge base for Azure Container Apps with GPU support, serverless capabilities, and Dapr integration (2025 GA and preview features).

## Overview

Azure Container Apps is a serverless container platform with native GPU support, Dapr integration, and scale-to-zero capabilities for cost-efficient AI/ML workloads.

## Key 2025 Features (Build Announcements)

### 1. Serverless GPU (GA)
- **Automatic scaling**: Scale GPU workloads based on demand
- **Scale-to-zero**: Pay only when the GPU is actively used
- **Per-second billing**: Granular cost control
- **Optimized cold start**: Fast initialization for AI models
- **Reduced operational overhead**: No infrastructure management

### 2. Dedicated GPU (GA)
- **Consistent performance**: Dedicated GPU resources
- **Simplified AI deployment**: Easy model hosting
- **Long-running workloads**: Ideal for training and continuous inference
- **Multiple GPU types**: NVIDIA A100, T4, and more

### 3. Dynamic Sessions with GPU (Early Access)
- **Sandboxed execution**: Run untrusted AI-generated code (see the sketch below)
- **Hyper-V isolation**: Enhanced security
- **GPU-powered Python interpreter**: Handle compute-intensive AI workloads
- **Scale at runtime**: Dynamic resource allocation
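
A rough sketch of what creating a code-interpreter session pool looks like. This feature is in early access, and the flag names below come from the preview `sessionpool` CLI surface, so they may change before GA:

```bash
# Sketch: create a Python code-interpreter session pool (preview).
# Names and flags are illustrative and may differ in the shipped CLI.
az containerapp sessionpool create \
--name my-session-pool \
--resource-group MyRG \
--location eastus \
--container-type PythonLTS \
--max-sessions 50 \
--cooldown-period 300 \
--network-status EgressDisabled
```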

### 4. Foundry Models Integration
- **Deploy AI models directly**: During container app creation
- **Ready-to-use models**: Pre-configured inference endpoints
- **Azure AI Foundry**: Seamless integration

### 5. Workflow with Durable Task Scheduler (Preview)
- **Long-running workflows**: Reliable orchestration
- **State management**: Automatic persistence
- **Event-driven**: Trigger workflows from events

### 6. Native Azure Functions Support
- **Functions runtime**: Run Azure Functions in Container Apps (see the sketch below)
- **Consistent development**: Same code, serverless execution
- **Event triggers**: All Functions triggers supported
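
As a sketch, a Functions app can target a Container Apps environment at creation time. This assumes the storage account and container image already exist; the names below are placeholders:

```bash
# Sketch: host an Azure Functions app on a Container Apps environment.
# mystorageacct and the image name are assumed to exist already.
az functionapp create \
--name myfunctions-aca \
--resource-group MyRG \
--environment myenv \
--storage-account mystorageacct \
--functions-version 4 \
--image myregistry.azurecr.io/my-functions:latest
```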

### 7. Dapr Integration (GA)
- **Service discovery**: Built-in DNS-based discovery
- **State management**: Distributed state stores
- **Pub/sub messaging**: Reliable messaging patterns
- **Service invocation**: Resilient service-to-service calls
- **Observability**: Integrated tracing and metrics

Full configuration examples appear in the Dapr Integration section below.

## Creating Container Apps with GPU

### Basic Container App with Serverless GPU

```bash
# Create the Container Apps environment
az containerapp env create \
--name myenv \
--resource-group MyRG \
--location eastus \
--logs-workspace-id <workspace-id> \
--logs-workspace-key <workspace-key>

# Create the Container App with a GPU
az containerapp create \
--name myapp-gpu \
--resource-group MyRG \
--environment myenv \
--image myregistry.azurecr.io/ai-model:latest \
--cpu 4 \
--memory 8Gi \
--gpu-type nvidia-a100 \
--gpu-count 1 \
--min-replicas 0 \
--max-replicas 10 \
--ingress external \
--target-port 8080
```

### Production-Ready Container App with GPU

Bash line continuations cannot contain comment or blank lines, so the flags below run as a single command, ordered by concern: container and registry, resources, scaling, networking, secrets, Dapr, and identity.

```bash
az containerapp create \
--name myapp-gpu-prod \
--resource-group MyRG \
--environment myenv \
--image myregistry.azurecr.io/ai-model:latest \
--registry-server myregistry.azurecr.io \
--registry-identity system \
--cpu 4 \
--memory 8Gi \
--gpu-type nvidia-a100 \
--gpu-count 1 \
--min-replicas 0 \
--max-replicas 20 \
--scale-rule-name http-scaling \
--scale-rule-type http \
--scale-rule-http-concurrency 10 \
--ingress external \
--target-port 8080 \
--transport http2 \
--env-vars "AZURE_CLIENT_ID=secretref:client-id" \
--enable-dapr \
--dapr-app-id myapp \
--dapr-app-port 8080 \
--dapr-app-protocol http \
--system-assigned
```

## Container Apps Environment Configuration

### Environment with Zone Redundancy

```bash
az containerapp env create \
--name myenv-prod \
--resource-group MyRG \
--location eastus \
--logs-workspace-id <workspace-id> \
--logs-workspace-key <workspace-key> \
--zone-redundant \
--enable-workload-profiles
```

### Workload Profiles (Dedicated GPU)

```bash
# Create an environment with workload profiles enabled
az containerapp env create \
--name myenv-gpu \
--resource-group MyRG \
--location eastus \
--enable-workload-profiles

# Add a GPU workload profile
az containerapp env workload-profile add \
--name myenv-gpu \
--resource-group MyRG \
--workload-profile-name gpu-profile \
--workload-profile-type GPU-A100 \
--min-nodes 0 \
--max-nodes 10

# Create a container app on the GPU profile
az containerapp create \
--name myapp-dedicated-gpu \
--resource-group MyRG \
--environment myenv-gpu \
--workload-profile-name gpu-profile \
--image myregistry.azurecr.io/training-job:latest \
--cpu 8 \
--memory 16Gi \
--min-replicas 1 \
--max-replicas 5
```

## GPU Scaling Rules

### Custom Prometheus Scaling

```bash
az containerapp create \
--name myapp-gpu-prometheus \
--resource-group MyRG \
--environment myenv \
--image myregistry.azurecr.io/ai-model:latest \
--cpu 4 \
--memory 8Gi \
--gpu-type nvidia-a100 \
--gpu-count 1 \
--min-replicas 0 \
--max-replicas 10 \
--scale-rule-name gpu-utilization \
--scale-rule-type prometheus \
--scale-rule-metadata \
serverAddress=http://prometheus.monitoring.svc.cluster.local:9090 \
metricName=gpu_utilization \
threshold=80 \
query="avg(nvidia_gpu_utilization{app='myapp'})"
```

### Queue-Based Scaling (Azure Service Bus)

```bash
az containerapp create \
--name myapp-queue-processor \
--resource-group MyRG \
--environment myenv \
--image myregistry.azurecr.io/batch-processor:latest \
--cpu 4 \
--memory 8Gi \
--gpu-type nvidia-t4 \
--gpu-count 1 \
--min-replicas 0 \
--max-replicas 50 \
--scale-rule-name queue-scaling \
--scale-rule-type azure-servicebus \
--scale-rule-metadata \
queueName=ai-jobs \
namespace=myservicebus \
messageCount=5 \
--scale-rule-auth connection=servicebus-connection
```

## Dapr Integration

### Enable Dapr on Container App

```bash
az containerapp create \
--name myapp-dapr \
--resource-group MyRG \
--environment myenv \
--image myregistry.azurecr.io/myapp:latest \
--enable-dapr \
--dapr-app-id myapp \
--dapr-app-port 8080 \
--dapr-app-protocol http \
--dapr-http-max-request-size 4 \
--dapr-http-read-buffer-size 4 \
--dapr-log-level info \
--dapr-enable-api-logging true
```

### Dapr State Store (Azure Cosmos DB)

Note that `az containerapp env dapr-component set` expects the Container Apps component schema (`componentType`/`version` at the top level), not the Kubernetes-style Dapr CRD:

```yaml
# component.yaml - state store component (Container Apps schema)
componentType: state.azure.cosmosdb
version: v1
metadata:
- name: url
  value: "https://mycosmosdb.documents.azure.com:443/"
- name: masterKey
  secretRef: cosmosdb-key
- name: database
  value: "mydb"
- name: collection
  value: "state"
secrets:
- name: cosmosdb-key
  value: "<cosmosdb-master-key>"
scopes:
- myapp
```

```bash
# Create the component
az containerapp env dapr-component set \
--name myenv \
--resource-group MyRG \
--dapr-component-name statestore \
--yaml component.yaml
```

### Dapr Pub/Sub (Azure Service Bus)

```yaml
# pubsub.yaml - pub/sub component (Container Apps schema)
componentType: pubsub.azure.servicebus.topics
version: v1
metadata:
- name: connectionString
  secretRef: servicebus-connection
- name: consumerID
  value: "myapp"
secrets:
- name: servicebus-connection
  value: "<service-bus-connection-string>"
scopes:
- myapp
```

### Service-to-Service Invocation

```python
# Python example using the Dapr SDK
from dapr.clients import DaprClient

with DaprClient() as client:
    # Invoke another service (POST, since we send a body)
    response = client.invoke_method(
        app_id='other-service',
        method_name='process',
        data='{"input": "data"}',
        http_verb='POST'
    )

    # Save state
    client.save_state(
        store_name='statestore',
        key='mykey',
        value='myvalue'
    )

    # Publish a message
    client.publish_event(
        pubsub_name='pubsub',
        topic_name='orders',
        data='{"orderId": "123"}'
    )
```

## AI Model Deployment Patterns

### OpenAI-Compatible Endpoint

```dockerfile
# Dockerfile for vLLM model serving.
# Note: exec-form CMD does not expand environment variables,
# so values are inlined here rather than referenced via ${...}.
FROM vllm/vllm-openai:latest

CMD ["--model", "meta-llama/Llama-3.1-8B-Instruct", \
     "--gpu-memory-utilization", "0.9", \
     "--max-model-len", "4096", \
     "--port", "8080"]
```

```bash
# Deploy the vLLM model server (the secret must be defined on the app
# before it can be referenced with secretref)
az containerapp create \
--name llama-inference \
--resource-group MyRG \
--environment myenv \
--image vllm/vllm-openai:latest \
--cpu 8 \
--memory 32Gi \
--gpu-type nvidia-a100 \
--gpu-count 1 \
--min-replicas 1 \
--max-replicas 5 \
--target-port 8080 \
--ingress external \
--secrets huggingface-token=<hf-token> \
--env-vars \
MODEL_NAME="meta-llama/Llama-3.1-8B-Instruct" \
GPU_MEMORY_UTILIZATION="0.9" \
HF_TOKEN=secretref:huggingface-token
```

### Stable Diffusion Image Generation

```bash
az containerapp create \
--name stable-diffusion \
--resource-group MyRG \
--environment myenv \
--image myregistry.azurecr.io/stable-diffusion:latest \
--cpu 4 \
--memory 16Gi \
--gpu-type nvidia-a100 \
--gpu-count 1 \
--min-replicas 0 \
--max-replicas 10 \
--target-port 7860 \
--ingress external \
--scale-rule-name http-scaling \
--scale-rule-type http \
--scale-rule-http-concurrency 1
```

### Batch Processing Job

```bash
az containerapp job create \
--name batch-training-job \
--resource-group MyRG \
--environment myenv \
--trigger-type Manual \
--image myregistry.azurecr.io/training:latest \
--cpu 8 \
--memory 32Gi \
--gpu-type nvidia-a100 \
--gpu-count 2 \
--parallelism 1 \
--replica-timeout 7200 \
--replica-retry-limit 3 \
--env-vars \
DATASET_URL="https://mystorage.blob.core.windows.net/datasets/train.csv" \
MODEL_OUTPUT="https://mystorage.blob.core.windows.net/models/" \
EPOCHS="100"

# Execute the job
az containerapp job start \
--name batch-training-job \
--resource-group MyRG
```

## Monitoring and Observability

### Application Insights Integration

```bash
# The secret must be defined on the app before secretref can resolve it
az containerapp create \
--name myapp-monitored \
--resource-group MyRG \
--environment myenv \
--image myregistry.azurecr.io/myapp:latest \
--secrets appinsights-connection=<connection-string> \
--env-vars \
APPLICATIONINSIGHTS_CONNECTION_STRING=secretref:appinsights-connection
```

### Query Logs

```bash
# Stream logs
az containerapp logs show \
--name myapp-gpu \
--resource-group MyRG \
--follow

# Query with Log Analytics
az monitor log-analytics query \
--workspace <workspace-id> \
--analytics-query "ContainerAppConsoleLogs_CL | where ContainerAppName_s == 'myapp-gpu' | take 100"
```

### Metrics and Alerts

```bash
# Alert on sustained request volume (Container Apps does not expose a
# built-in GPU-utilization metric, so Requests serves as a load proxy)
az monitor metrics alert create \
--name high-request-volume \
--resource-group MyRG \
--scopes $(az containerapp show -g MyRG -n myapp-gpu --query id -o tsv) \
--condition "avg Requests > 100" \
--window-size 5m \
--evaluation-frequency 1m \
--action <action-group-id>
```

## Security Best Practices

### Managed Identity

```bash
# Create with a system-assigned identity
az containerapp create \
--name myapp-identity \
--resource-group MyRG \
--environment myenv \
--system-assigned \
--image myregistry.azurecr.io/myapp:latest

# Get the identity principal ID
IDENTITY_ID=$(az containerapp show -g MyRG -n myapp-identity --query identity.principalId -o tsv)

# Assign a role to access Key Vault
az role assignment create \
--assignee $IDENTITY_ID \
--role "Key Vault Secrets User" \
--scope /subscriptions/<sub-id>/resourceGroups/MyRG/providers/Microsoft.KeyVault/vaults/mykeyvault

# Or use a user-assigned identity (named distinctly from the
# container app above to avoid confusion)
az identity create --name myapp-uami --resource-group MyRG
IDENTITY_RESOURCE_ID=$(az identity show -g MyRG -n myapp-uami --query id -o tsv)

az containerapp create \
--name myapp-user-identity \
--resource-group MyRG \
--environment myenv \
--user-assigned $IDENTITY_RESOURCE_ID \
--image myregistry.azurecr.io/myapp:latest
```

### Secret Management

```bash
# Add secrets
az containerapp secret set \
--name myapp-gpu \
--resource-group MyRG \
--secrets \
huggingface-token="<token>" \
api-key="<key>"

# Reference secrets in environment variables
az containerapp update \
--name myapp-gpu \
--resource-group MyRG \
--set-env-vars \
HF_TOKEN=secretref:huggingface-token \
API_KEY=secretref:api-key
```

## Cost Optimization

### Scale-to-Zero Configuration

```bash
az containerapp create \
--name myapp-scale-zero \
--resource-group MyRG \
--environment myenv \
--image myregistry.azurecr.io/myapp:latest \
--min-replicas 0 \
--max-replicas 10 \
--scale-rule-name http-scaling \
--scale-rule-type http \
--scale-rule-http-concurrency 10
```

**Cost savings**: Pay only while requests are being processed. GPU time is billed per second while replicas are active; at zero replicas the app accrues no compute charges.

### Right-Sizing Resources

```bash
# Start with minimal resources (flags passed to az containerapp create)
--cpu 2 --memory 4Gi --gpu-count 1

# Monitor and adjust based on actual usage
az monitor metrics list \
--resource $(az containerapp show -g MyRG -n myapp-gpu --query id -o tsv) \
--metric "CpuPercentage,MemoryPercentage"
```

### Use Spot/Preemptible GPUs (Future Feature)

When available, configure spot instances for non-critical workloads; spot pricing on other Azure compute services has historically offered discounts of up to roughly 80%, so similar GPU savings are plausible.

## Troubleshooting

### Check Revision Status

```bash
az containerapp revision list \
--name myapp-gpu \
--resource-group MyRG \
--output table
```

### View Revision Details

```bash
az containerapp revision show \
--name myapp-gpu \
--revision <revision-name> \
--resource-group MyRG
```

### Restart Container App

```bash
az containerapp revision restart \
--name myapp-gpu \
--revision <revision-name> \
--resource-group MyRG
```

### GPU Not Available

If a GPU is not provisioning:
1. Check region availability: not all regions support GPUs (see the command below)
2. Verify quota: request a quota increase if needed
3. Check the workload profile: ensure a GPU workload profile has been added to the environment
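
To see which workload profiles (including GPU profiles) a given region supports, as referenced in step 1:

```bash
# GPU profile types appear here only in regions with GPU capacity
az containerapp env workload-profile list-supported \
--location eastus \
--output table
```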

## Best Practices

✓ Use scale-to-zero for intermittent workloads
✓ Implement health probes (liveness and readiness) - see the sketch after this list
✓ Use managed identities for authentication
✓ Store secrets in Azure Key Vault
✓ Enable Dapr for microservices patterns
✓ Configure appropriate scaling rules
✓ Monitor GPU utilization and adjust resources
✓ Use Container Apps jobs for batch processing
✓ Implement retry logic for transient failures
✓ Use Application Insights for observability
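
Health probes have no dedicated CLI flags; they are configured through the app's YAML. A minimal sketch, assuming the app exposes `/healthz` and `/ready` endpoints (both placeholders):

```bash
# Sketch: add liveness/readiness probes via a YAML update.
# /healthz and /ready are hypothetical endpoints in your app.
cat > probes.yaml <<'EOF'
properties:
  template:
    containers:
    - name: myapp-gpu
      image: myregistry.azurecr.io/ai-model:latest
      probes:
      - type: Liveness
        httpGet:
          path: /healthz
          port: 8080
        initialDelaySeconds: 10
        periodSeconds: 10
      - type: Readiness
        httpGet:
          path: /ready
          port: 8080
        initialDelaySeconds: 5
        periodSeconds: 5
EOF

az containerapp update \
--name myapp-gpu \
--resource-group MyRG \
--yaml probes.yaml
```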

## References

- [Container Apps GPU Documentation](https://learn.microsoft.com/en-us/azure/container-apps/gpu-support)
- [Dapr Integration](https://learn.microsoft.com/en-us/azure/container-apps/dapr-overview)
- [Scaling Rules](https://learn.microsoft.com/en-us/azure/container-apps/scale-app)
- [Build 2025 Announcements](https://azure.microsoft.com/en-us/blog/container-apps-build-2025/)

Azure Container Apps with GPU support delivers a fully serverless platform for AI/ML workloads.