Initial commit
620
skills/aks-automatic-2025.md
Normal file
@@ -0,0 +1,620 @@
|
||||
## 🚨 CRITICAL GUIDELINES
|
||||
|
||||
### Windows File Path Requirements
|
||||
|
||||
**MANDATORY: Always Use Backslashes on Windows for File Paths**
|
||||
|
||||
When using Edit or Write tools on Windows, you MUST use backslashes (`\`) in file paths, NOT forward slashes (`/`).
|
||||
|
||||
**Examples:**
|
||||
- ❌ WRONG: `D:/repos/project/file.tsx`
|
||||
- ✅ CORRECT: `D:\repos\project\file.tsx`
|
||||
|
||||
This applies to:
|
||||
- Edit tool file_path parameter
|
||||
- Write tool file_path parameter
|
||||
- All file operations on Windows systems
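The rule above can be applied mechanically before a path is handed to a tool. The following is a minimal sketch, assuming Python is available; `to_windows_path` is a hypothetical helper name and is not part of the Edit or Write tools.

```python
from pathlib import PureWindowsPath


def to_windows_path(path: str) -> str:
    """Return the path rendered with Windows separators (backslashes)."""
    # PureWindowsPath accepts both "/" and "\" as separators and always
    # renders the result with "\", regardless of the host operating system.
    return str(PureWindowsPath(path))


print(to_windows_path("D:/repos/project/file.tsx"))
```

Because `PureWindowsPath` performs no file system access, the helper behaves the same on Linux, macOS, and Windows.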
|
||||
|
||||
### Documentation Guidelines
|
||||
|
||||
**NEVER create new documentation files unless explicitly requested by the user.**
|
||||
|
||||
- **Priority**: Update existing README.md files rather than creating new documentation
|
||||
- **Repository cleanliness**: Keep repository root clean - only README.md unless user requests otherwise
|
||||
- **Style**: Documentation should be concise, direct, and professional - avoid AI-generated tone
|
||||
- **User preference**: Only create additional .md files when user specifically asks for documentation
|
||||
|
||||
---
|
||||
|
||||
|
||||
# AKS Automatic - 2025 GA Features
|
||||
|
||||
Complete knowledge base for Azure Kubernetes Service Automatic mode (GA October 2025).
|
||||
|
||||
## Overview
|
||||
|
||||
AKS Automatic is a managed Kubernetes offering that reduces operational overhead by preconfiguring node management, scaling, security, and networking with recommended defaults.
|
||||
|
||||
## Key Features (GA October 2025)
|
||||
|
||||
### 1. Zero Operational Overhead
|
||||
- Fully-managed control plane and worker nodes
|
||||
- Automatic OS patching and security updates
|
||||
- Built-in monitoring and diagnostics
|
||||
- Integrated security and compliance
|
||||
|
||||
### 2. Karpenter Integration
|
||||
- Dynamic node provisioning based on real-time demand
|
||||
- Intelligent bin-packing for cost optimization
|
||||
- Automatic node consolidation and deprovisioning
|
||||
- Support for multiple node pools and instance types
|
||||
|
||||
### 3. Auto-Scaling (Enabled by Default)
|
||||
- **Horizontal Pod Autoscaler (HPA)**: Scale pods based on CPU/memory
|
||||
- **Vertical Pod Autoscaler (VPA)**: Adjust pod resource requests/limits
|
||||
- **KEDA**: Event-driven autoscaling for external triggers
|
||||
|
||||
### 4. Enhanced Security
|
||||
- Microsoft Entra ID integration for authentication
|
||||
- Azure RBAC for Kubernetes authorization
|
||||
- Network policies enabled by default
|
||||
- Automatic security patches
|
||||
- Workload identity for pod-level authentication
|
||||
|
||||
### 5. Advanced Networking
|
||||
- Azure CNI Overlay for efficient IP usage
|
||||
- Cilium dataplane for high-performance networking
|
||||
- Network policies for microsegmentation
|
||||
- Private clusters supported
|
||||
|
||||
### 6. New Billing Model (Effective October 19, 2025)
|
||||
- Hosted control plane fee: **$0.16/cluster/hour**
|
||||
- Compute charges based on actual node usage
|
||||
- No separate cluster management fee
|
||||
- Cost savings from Karpenter optimization
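As a rough illustration of the control plane fee listed above, the sketch below multiplies the quoted hourly rate by a number of hours. The 730-hour month is an assumption (a common approximation), `control_plane_cost` is a hypothetical helper, and node compute charges are not included.

```python
CONTROL_PLANE_RATE_USD_PER_HOUR = 0.16  # rate quoted in the list above


def control_plane_cost(hours: float, clusters: int = 1) -> float:
    """Control plane fee in USD for the given hours, excluding node compute."""
    return round(CONTROL_PLANE_RATE_USD_PER_HOUR * hours * clusters, 2)


# Assumption: 730 hours approximates one month.
print(control_plane_cost(730))
```

Passing `clusters=3` gives the fee for three clusters over the same period.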
|
||||
|
||||
### 7. Node Operating System
|
||||
- Ubuntu 22.04 for Kubernetes < 1.34
|
||||
- Ubuntu 24.04 for Kubernetes >= 1.34
|
||||
- Automatic OS upgrades with node image channel
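The version rule above can be expressed as a small lookup. This is only a sketch of the rule as stated in this section; `node_os_for` is a hypothetical helper, not an Azure API.

```python
def node_os_for(kubernetes_version: str) -> str:
    """Map a Kubernetes version string to the node OS listed above."""
    # Compare only major.minor, so "1.34.2" is treated like "1.34".
    major, minor = (int(part) for part in kubernetes_version.split(".")[:2])
    return "Ubuntu 24.04" if (major, minor) >= (1, 34) else "Ubuntu 22.04"


print(node_os_for("1.33"))
print(node_os_for("1.34"))
```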
|
||||
|
||||
## Creating AKS Automatic Cluster
|
||||
|
||||
### Basic Creation
|
||||
|
||||
```bash
|
||||
az aks create \
|
||||
--resource-group MyRG \
|
||||
--name MyAKSAutomatic \
|
||||
--sku automatic \
|
||||
--kubernetes-version 1.34 \
|
||||
--location eastus
|
||||
```
|
||||
|
||||
### Production-Ready Configuration
|
||||
|
||||
```bash
# Flag groups, in order: SKU and tier, Kubernetes version, node provisioning
# (Karpenter, the default in automatic mode), networking, optional custom VNet,
# availability zones, authentication and authorization, auto-upgrade,
# security, monitoring, tags.
# Comments cannot sit between backslash-continued lines in bash, so they are
# collected here. Values containing <placeholders> are quoted so the shell
# does not treat "<" and ">" as redirections.
az aks create \
  --resource-group MyRG \
  --name MyAKSAutomatic \
  --location eastus \
  --sku automatic \
  --tier standard \
  --kubernetes-version 1.34 \
  --node-provisioning-mode Auto \
  --network-plugin azure \
  --network-plugin-mode overlay \
  --network-dataplane cilium \
  --service-cidr 10.0.0.0/16 \
  --dns-service-ip 10.0.0.10 \
  --load-balancer-sku standard \
  --vnet-subnet-id "/subscriptions/<sub-id>/resourceGroups/MyRG/providers/Microsoft.Network/virtualNetworks/MyVNet/subnets/AKSSubnet" \
  --zones 1 2 3 \
  --enable-managed-identity \
  --enable-aad \
  --enable-azure-rbac \
  --aad-admin-group-object-ids "<group-object-id>" \
  --auto-upgrade-channel stable \
  --node-os-upgrade-channel NodeImage \
  --enable-defender \
  --enable-workload-identity \
  --enable-oidc-issuer \
  --enable-addons monitoring \
  --workspace-resource-id "/subscriptions/<sub-id>/resourceGroups/MyRG/providers/Microsoft.OperationalInsights/workspaces/MyWorkspace" \
  --tags Environment=Production ManagedBy=AKSAutomatic
```
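The `--dns-service-ip` value has to fall inside the `--service-cidr` range. A minimal sketch of checking that pairing before running the command, using only the Python standard library; `dns_ip_in_service_cidr` is a hypothetical helper, and it checks range membership only.

```python
import ipaddress


def dns_ip_in_service_cidr(service_cidr: str, dns_service_ip: str) -> bool:
    """Return True when the DNS service IP lies inside the service CIDR."""
    network = ipaddress.ip_network(service_cidr)
    address = ipaddress.ip_address(dns_service_ip)
    return address in network


# Values taken from the command above.
print(dns_ip_in_service_cidr("10.0.0.0/16", "10.0.0.10"))
```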
|
||||
|
||||
### With Azure Policy Add-on
|
||||
|
||||
```bash
|
||||
az aks create \
|
||||
--resource-group MyRG \
|
||||
--name MyAKSAutomatic \
|
||||
--sku automatic \
|
||||
--enable-addons azure-policy \
|
||||
--kubernetes-version 1.34
|
||||
```
|
||||
|
||||
## Karpenter Configuration
|
||||
|
||||
AKS Automatic uses Karpenter for intelligent node provisioning. Customize node provisioning with AKSNodeClass and NodePool CRDs.
|
||||
|
||||
### Default AKSNodeClass
|
||||
|
||||
```yaml
apiVersion: karpenter.azure.com/v1alpha1
kind: AKSNodeClass
metadata:
  name: default
spec:
  # OS Image - Ubuntu 24.04 for K8s 1.34+
  osImage:
    sku: Ubuntu
    version: "24.04"

  # VM Series
  vmSeries:
    - Standard_D
    - Standard_E

  # Max pods per node
  maxPodsPerNode: 110

  # Security
  securityProfile:
    sshAccess: Disabled
    securityType: Standard
```
|
||||
|
||||
### Custom NodePool
|
||||
|
||||
```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general-purpose
spec:
  template:
    metadata:
      # Node labels
      labels:
        workload-type: general
    spec:
      # Constraints
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: kubernetes.azure.com/agentpool
          operator: In
          values: ["general"]

      # Taints (optional)
      taints:
        - key: "dedicated"
          value: "general"
          effect: "NoSchedule"

      # NodeClass reference
      nodeClassRef:
        group: karpenter.azure.com
        kind: AKSNodeClass
        name: default

      # Node lifetime; in the karpenter.sh/v1 API this field sits on the node template
      expireAfter: 720h # 30 days

  # Limits
  limits:
    cpu: "1000"
    memory: 4000Gi

  # Disruption budget
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 30s
    budgets:
      # At most 10% of nodes disrupted at once. A `duration` is only valid
      # together with a `schedule`, so it is omitted here.
      - nodes: "10%"
```
|
||||
|
||||
### GPU NodePool for AI Workloads
|
||||
|
||||
```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu-workloads
spec:
  template:
    metadata:
      labels:
        workload-type: gpu
        gpu-type: nvidia-v100
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["Standard_NC6s_v3", "Standard_NC12s_v3", "Standard_NC24s_v3"]

      taints:
        - key: "nvidia.com/gpu"
          value: "true"
          effect: "NoSchedule"

      # Assumes an AKSNodeClass named gpu-nodeclass has been created separately
      nodeClassRef:
        group: karpenter.azure.com
        kind: AKSNodeClass
        name: gpu-nodeclass

  limits:
    cpu: "200"
    memory: 800Gi
    nvidia.com/gpu: "16"

  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 300s
```
|
||||
|
||||
## Autoscaling with HPA, VPA, and KEDA
|
||||
|
||||
### Horizontal Pod Autoscaler (HPA)
|
||||
|
||||
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
        - type: Pods
          value: 4
          periodSeconds: 15
      selectPolicy: Max
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 50
          periodSeconds: 15
```
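To make the numbers in this manifest concrete, the sketch below applies the standard HPA scaling rule, `ceil(currentReplicas * currentMetric / targetMetric)`, and the scale-up cap implied by `selectPolicy: Max`. It is a simplified model: the controller's tolerance band, stabilization windows, and multi-metric selection are not modeled, and both function names are hypothetical.

```python
import math


def desired_replicas(current: int, current_util: float, target_util: float,
                     min_replicas: int = 2, max_replicas: int = 50) -> int:
    """Replica count the HPA formula asks for, clamped to min/max."""
    raw = math.ceil(current * current_util / target_util)
    return max(min_replicas, min(max_replicas, raw))


def scale_up_cap(current: int, percent: int = 100, pods: int = 4) -> int:
    """Largest replica count reachable in one 15s period.

    With selectPolicy: Max, the more permissive of the two policies
    (add `percent` of current replicas, or add `pods` replicas) applies.
    """
    return current + max(math.ceil(current * percent / 100), pods)


print(desired_replicas(current=4, current_util=90, target_util=70))
print(scale_up_cap(current=2))
```

With 4 replicas at 90% CPU against a 70% target, the formula asks for 6 replicas; starting from 2 replicas, the `Pods` policy dominates and allows up to 6 in one period.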
|
||||
|
||||
### Vertical Pod Autoscaler (VPA)
|
||||
|
||||
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: myapp-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  updatePolicy:
    updateMode: "Auto" # Auto, Recreate, Initial, Off
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 4
          memory: 8Gi
        controlledResources: ["cpu", "memory"]
        controlledValues: RequestsAndLimits
```
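The `minAllowed` and `maxAllowed` bounds act as a clamp on whatever the recommender proposes. A minimal sketch of that clamp for CPU, expressed in millicores (100m to 4 cores, as in the manifest above); `clamp_cpu_recommendation` is a hypothetical name and the recommender itself is not modeled.

```python
def clamp_cpu_recommendation(recommended_millicores: int,
                             min_allowed: int = 100,
                             max_allowed: int = 4000) -> int:
    """Bound a CPU recommendation to the minAllowed/maxAllowed range."""
    return max(min_allowed, min(max_allowed, recommended_millicores))


print(clamp_cpu_recommendation(50))
print(clamp_cpu_recommendation(6000))
```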
|
||||
|
||||
### KEDA ScaledObject (Event-Driven)
|
||||
|
||||
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: myapp-queue-scaler
spec:
  scaleTargetRef:
    name: myapp
  minReplicaCount: 0 # Scale to zero
  maxReplicaCount: 100
  pollingInterval: 30
  cooldownPeriod: 300
  triggers:
    # Azure Service Bus Queue
    - type: azure-servicebus
      metadata:
        queueName: myqueue
        namespace: myservicebus
        messageCount: "5"
      authenticationRef:
        name: azure-servicebus-auth

    # Azure Storage Queue
    - type: azure-queue
      metadata:
        queueName: myqueue
        queueLength: "10"
        accountName: mystorageaccount
      authenticationRef:
        name: azure-storage-auth

    # Prometheus metrics
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc.cluster.local:9090
        metricName: http_requests_per_second
        threshold: "100"
        query: sum(rate(http_requests_total[2m]))
```
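For a queue trigger, the target value (`messageCount: "5"` for Service Bus above) is the number of messages each replica is expected to handle. The sketch below shows the resulting replica count for a single trigger under that reading. It is a simplification: polling, cooldown, and the interaction of multiple triggers are not modeled, and `queue_replicas` is a hypothetical helper.

```python
import math


def queue_replicas(queue_length: int, target_per_replica: int = 5,
                   min_replicas: int = 0, max_replicas: int = 100) -> int:
    """Replicas needed so each handles about `target_per_replica` messages."""
    if queue_length <= 0:
        return min_replicas  # empty queue: fall back to the floor (scale to zero)
    raw = math.ceil(queue_length / target_per_replica)
    return max(min_replicas, min(max_replicas, raw))


print(queue_replicas(42))
```

With 42 queued messages and a target of 5 per replica, this asks for 9 replicas; a backlog of 1,000 messages is capped at `maxReplicaCount`.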
|
||||
|
||||
## Workload Identity (Replaces AAD Pod Identity)
|
||||
|
||||
### Setup
|
||||
|
||||
```bash
|
||||
# Workload identity is enabled by default in AKS Automatic
|
||||
|
||||
# Create managed identity
|
||||
az identity create \
|
||||
--name myapp-identity \
|
||||
--resource-group MyRG
|
||||
|
||||
# Get identity details
|
||||
export IDENTITY_CLIENT_ID=$(az identity show -g MyRG -n myapp-identity --query clientId -o tsv)
|
||||
export IDENTITY_OBJECT_ID=$(az identity show -g MyRG -n myapp-identity --query principalId -o tsv)
|
||||
|
||||
# Assign role to identity
|
||||
az role assignment create \
|
||||
--assignee $IDENTITY_OBJECT_ID \
|
||||
--role "Storage Blob Data Contributor" \
|
||||
--scope /subscriptions/<sub-id>/resourceGroups/MyRG/providers/Microsoft.Storage/storageAccounts/mystorage
|
||||
|
||||
# Create federated credential
|
||||
export AKS_OIDC_ISSUER=$(az aks show -g MyRG -n MyAKSAutomatic --query oidcIssuerProfile.issuerUrl -o tsv)
|
||||
|
||||
az identity federated-credential create \
|
||||
--name myapp-federated-credential \
|
||||
--identity-name myapp-identity \
|
||||
--resource-group MyRG \
|
||||
--issuer $AKS_OIDC_ISSUER \
|
||||
--subject system:serviceaccount:default:myapp-sa
|
||||
```
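The `--subject` value must match the namespace and name of the Kubernetes service account exactly, or token exchange fails. A small sketch of building that string; `federated_subject` is a hypothetical helper, and the format is the one used in the command above.

```python
def federated_subject(namespace: str, service_account: str) -> str:
    """Build the federated credential subject for a Kubernetes service account."""
    return f"system:serviceaccount:{namespace}:{service_account}"


print(federated_subject("default", "myapp-sa"))
```

If the service account later moves to another namespace, the federated credential has to be recreated with the new subject.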
|
||||
|
||||
### Kubernetes Resources
|
||||
|
||||
```yaml
# Service Account
apiVersion: v1
kind: ServiceAccount
metadata:
  name: myapp-sa
  namespace: default
  annotations:
    azure.workload.identity/client-id: "<IDENTITY_CLIENT_ID>"

---
# Deployment using workload identity
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
        azure.workload.identity/use: "true" # Enable workload identity
    spec:
      serviceAccountName: myapp-sa
      containers:
        - name: myapp
          image: myregistry.azurecr.io/myapp:latest
          env:
            - name: AZURE_CLIENT_ID
              value: "<IDENTITY_CLIENT_ID>"
            - name: AZURE_TENANT_ID
              value: "<TENANT_ID>"
            - name: AZURE_FEDERATED_TOKEN_FILE
              value: /var/run/secrets/azure/tokens/azure-identity-token
          volumeMounts:
            - name: azure-identity-token
              mountPath: /var/run/secrets/azure/tokens
              readOnly: true
      volumes:
        - name: azure-identity-token
          projected:
            sources:
              - serviceAccountToken:
                  path: azure-identity-token
                  expirationSeconds: 3600
                  audience: api://AzureADTokenExchange
```
|
||||
|
||||
## Monitoring and Observability
|
||||
|
||||
### Enable Container Insights
|
||||
|
||||
```bash
|
||||
# Already enabled with --enable-addons monitoring
|
||||
# Query logs using Azure Monitor
|
||||
|
||||
# Get cluster logs
|
||||
az monitor log-analytics query \
|
||||
--workspace <workspace-id> \
|
||||
--analytics-query "KubePodInventory | where ClusterName == 'MyAKSAutomatic' | take 10" \
|
||||
--output table
|
||||
|
||||
# Get Karpenter logs
|
||||
kubectl logs -n kube-system -l app.kubernetes.io/name=karpenter
|
||||
```
|
||||
|
||||
### Prometheus and Grafana
|
||||
|
||||
```bash
|
||||
# Enable managed Prometheus
|
||||
az aks update \
|
||||
--resource-group MyRG \
|
||||
--name MyAKSAutomatic \
|
||||
--enable-azure-monitor-metrics
|
||||
|
||||
# Access Grafana dashboards through Azure Portal
|
||||
```
|
||||
|
||||
## Cost Optimization
|
||||
|
||||
### Billing Model (October 2025)
|
||||
- **Control plane**: $0.16/hour per cluster
|
||||
- **Compute**: Pay for actual node usage
|
||||
- **Karpenter**: Automatic bin-packing and consolidation
|
||||
- **Scale-to-zero**: Possible with KEDA and Karpenter
|
||||
|
||||
### Cost-Saving Tips
|
||||
|
||||
1. **Use Spot Instances for Non-Critical Workloads**
|
||||
```yaml
- key: karpenter.sh/capacity-type
  operator: In
  values: ["spot"]
```
|
||||
|
||||
2. **Configure Aggressive Consolidation**
|
||||
```yaml
disruption:
  # karpenter.sh/v1 accepts WhenEmpty or WhenEmptyOrUnderutilized
  consolidationPolicy: WhenEmptyOrUnderutilized
  consolidateAfter: 30s
```
|
||||
|
||||
3. **Implement Pod Disruption Budgets**
|
||||
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: myapp
```
|
||||
|
||||
4. **Use VPA for Right-Sizing**
|
||||
- VPA automatically adjusts resource requests based on actual usage
|
||||
|
||||
## Migration from Standard AKS to Automatic
|
||||
|
||||
AKS Automatic is a new cluster mode - in-place migration is not supported. Follow these steps:
|
||||
|
||||
1. **Create new AKS Automatic cluster**
|
||||
2. **Install workloads in new cluster**
|
||||
3. **Validate functionality**
|
||||
4. **Switch traffic** (DNS, load balancer)
|
||||
5. **Decommission old cluster**
|
||||
|
||||
## Best Practices
|
||||
|
||||
✓ Use AKS Automatic for new production clusters
|
||||
✓ Enable workload identity for pod authentication
|
||||
✓ Configure custom NodePools for specific workload types
|
||||
✓ Implement HPA, VPA, and KEDA for comprehensive scaling
|
||||
✓ Use spot instances for batch and fault-tolerant workloads
|
||||
✓ Enable Container Insights and Managed Prometheus
|
||||
✓ Configure Pod Disruption Budgets for critical apps
|
||||
✓ Use network policies for microsegmentation
|
||||
✓ Enable Azure Policy add-on for compliance
|
||||
✓ Implement GitOps with Flux or Argo CD
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Check Karpenter Status
|
||||
```bash
|
||||
kubectl logs -n kube-system -l app.kubernetes.io/name=karpenter --tail=100
|
||||
kubectl get nodepools
|
||||
kubectl get nodeclaims
|
||||
```
|
||||
|
||||
### View Node Provisioning Events
|
||||
```bash
|
||||
kubectl get events --field-selector involvedObject.kind=NodePool -A
|
||||
```
|
||||
|
||||
### Debug Workload Identity Issues
|
||||
```bash
|
||||
# Check service account annotation
|
||||
kubectl get sa myapp-sa -o yaml
|
||||
|
||||
# Check pod labels
|
||||
kubectl get pod <pod-name> -o yaml | grep azure.workload.identity
|
||||
|
||||
# Check federated credential
|
||||
az identity federated-credential show \
|
||||
--identity-name myapp-identity \
|
||||
--resource-group MyRG \
|
||||
--name myapp-federated-credential
|
||||
```
|
||||
|
||||
## References
|
||||
|
||||
- [AKS Automatic Documentation](https://learn.microsoft.com/en-us/azure/aks/automatic)
|
||||
- [Karpenter on Azure](https://karpenter.sh)
|
||||
- [Workload Identity](https://learn.microsoft.com/en-us/azure/aks/workload-identity-overview)
|
||||
- [AKS Release Notes](https://github.com/Azure/AKS/releases)
|
||||
|
||||
AKS Automatic trades fine-grained control for managed defaults: node provisioning, scaling, upgrades, and security are preconfigured.
|
||||
718
skills/azure-openai-2025.md
Normal file
@@ -0,0 +1,718 @@
|
||||
## 🚨 CRITICAL GUIDELINES
|
||||
|
||||
### Windows File Path Requirements
|
||||
|
||||
**MANDATORY: Always Use Backslashes on Windows for File Paths**
|
||||
|
||||
When using Edit or Write tools on Windows, you MUST use backslashes (`\`) in file paths, NOT forward slashes (`/`).
|
||||
|
||||
**Examples:**
|
||||
- ❌ WRONG: `D:/repos/project/file.tsx`
|
||||
- ✅ CORRECT: `D:\repos\project\file.tsx`
|
||||
|
||||
This applies to:
|
||||
- Edit tool file_path parameter
|
||||
- Write tool file_path parameter
|
||||
- All file operations on Windows systems
|
||||
|
||||
### Documentation Guidelines
|
||||
|
||||
**NEVER create new documentation files unless explicitly requested by the user.**
|
||||
|
||||
- **Priority**: Update existing README.md files rather than creating new documentation
|
||||
- **Repository cleanliness**: Keep repository root clean - only README.md unless user requests otherwise
|
||||
- **Style**: Documentation should be concise, direct, and professional - avoid AI-generated tone
|
||||
- **User preference**: Only create additional .md files when user specifically asks for documentation
|
||||
|
||||
---
|
||||
|
||||
|
||||
# Azure OpenAI Service - 2025 Models and Features
|
||||
|
||||
Complete knowledge base for Azure OpenAI Service with latest 2025 models including GPT-5, GPT-4.1, reasoning models, and Azure AI Foundry integration.
|
||||
|
||||
## Overview
|
||||
|
||||
Azure OpenAI Service provides REST API access to OpenAI's most powerful models with enterprise-grade security, compliance, and regional availability.
|
||||
|
||||
## Latest Models (2025)
|
||||
|
||||
### GPT-5 Series (GA August 2025)
|
||||
|
||||
**Registration Required Models:**
|
||||
- `gpt-5-pro`: Highest capability, complex reasoning
|
||||
- `gpt-5`: Balanced performance and cost
|
||||
- `gpt-5-codex`: Optimized for code generation
|
||||
|
||||
**No Registration Required:**
|
||||
- `gpt-5-mini`: Faster, more affordable
|
||||
- `gpt-5-nano`: Ultra-fast for simple tasks
|
||||
- `gpt-5-chat`: Optimized for conversational use
|
||||
|
||||
### GPT-4.1 Series
|
||||
|
||||
- `gpt-4.1`: 1 million token context window
|
||||
- `gpt-4.1-mini`: Efficient version with 1M context
|
||||
- `gpt-4.1-nano`: Fastest variant
|
||||
|
||||
**Key Improvements:**
|
||||
- 1,000,000 token context (vs 128K in GPT-4 Turbo)
|
||||
- Better instruction following
|
||||
- Reduced hallucinations
|
||||
- Improved multilingual support
|
||||
|
||||
### Reasoning Models
|
||||
|
||||
**o4-mini**: Lightweight reasoning model
|
||||
- Faster inference
|
||||
- Lower cost
|
||||
- Suitable for structured reasoning tasks
|
||||
|
||||
**o3**: Advanced reasoning model
|
||||
- Complex problem solving
|
||||
- Mathematical reasoning
|
||||
- Scientific analysis
|
||||
|
||||
**o1**: Original reasoning model
|
||||
- General-purpose reasoning
|
||||
- Step-by-step explanations
|
||||
|
||||
**o1-mini**: Efficient reasoning
|
||||
- Balanced cost and performance
|
||||
|
||||
### Image Generation
|
||||
|
||||
**GPT-image-1 (2025-04-15)**
|
||||
- DALL-E 3 successor
|
||||
- Higher quality images
|
||||
- Better prompt understanding
|
||||
- Improved safety filters
|
||||
|
||||
### Video Generation
|
||||
|
||||
**Sora (2025-05-02)**
|
||||
- Text-to-video generation
|
||||
- Realistic and imaginative scenes
|
||||
- Up to 60 seconds of video
|
||||
- Multiple camera angles and styles
|
||||
|
||||
### Audio Models
|
||||
|
||||
**gpt-4o-transcribe**: Speech-to-text powered by GPT-4o
|
||||
- High accuracy transcription
|
||||
- Multiple languages
|
||||
- Speaker diarization
|
||||
|
||||
**gpt-4o-mini-transcribe**: Faster, more affordable transcription
|
||||
- Good accuracy
|
||||
- Lower latency
|
||||
- Cost-effective
|
||||
|
||||
## Deploying Azure OpenAI
|
||||
|
||||
### Create Azure OpenAI Resource
|
||||
|
||||
```bash
|
||||
# Create OpenAI account
|
||||
az cognitiveservices account create \
|
||||
--name myopenai \
|
||||
--resource-group MyRG \
|
||||
--kind OpenAI \
|
||||
--sku S0 \
|
||||
--location eastus \
|
||||
--custom-domain myopenai \
|
||||
--public-network-access Disabled \
|
||||
--identity-type SystemAssigned
|
||||
|
||||
# Get endpoint and key
|
||||
az cognitiveservices account show \
|
||||
--name myopenai \
|
||||
--resource-group MyRG \
|
||||
--query "properties.endpoint" \
|
||||
--output tsv
|
||||
|
||||
az cognitiveservices account keys list \
|
||||
--name myopenai \
|
||||
--resource-group MyRG \
|
||||
--query "key1" \
|
||||
--output tsv
|
||||
```
|
||||
|
||||
### Deploy GPT-5 Model
|
||||
|
||||
```bash
|
||||
# Deploy gpt-5
|
||||
az cognitiveservices account deployment create \
|
||||
--resource-group MyRG \
|
||||
--name myopenai \
|
||||
--deployment-name gpt-5 \
|
||||
--model-name gpt-5 \
|
||||
--model-version latest \
|
||||
--model-format OpenAI \
|
||||
--sku-name Standard \
|
||||
--sku-capacity 100 \
|
||||
--scale-type Standard
|
||||
|
||||
# Deploy gpt-5-pro (requires registration)
|
||||
az cognitiveservices account deployment create \
|
||||
--resource-group MyRG \
|
||||
--name myopenai \
|
||||
--deployment-name gpt-5-pro \
|
||||
--model-name gpt-5-pro \
|
||||
--model-version latest \
|
||||
--model-format OpenAI \
|
||||
--sku-name Standard \
|
||||
--sku-capacity 50
|
||||
```
|
||||
|
||||
### Deploy Reasoning Models
|
||||
|
||||
```bash
|
||||
# Deploy o3 reasoning model
|
||||
az cognitiveservices account deployment create \
|
||||
--resource-group MyRG \
|
||||
--name myopenai \
|
||||
--deployment-name o3-reasoning \
|
||||
--model-name o3 \
|
||||
--model-version latest \
|
||||
--model-format OpenAI \
|
||||
--sku-name Standard \
|
||||
--sku-capacity 50
|
||||
|
||||
# Deploy o4-mini
|
||||
az cognitiveservices account deployment create \
|
||||
--resource-group MyRG \
|
||||
--name myopenai \
|
||||
--deployment-name o4-mini \
|
||||
--model-name o4-mini \
|
||||
--model-version latest \
|
||||
--model-format OpenAI \
|
||||
--sku-name Standard \
|
||||
--sku-capacity 100
|
||||
```
|
||||
|
||||
### Deploy GPT-4.1 with 1M Context
|
||||
|
||||
```bash
|
||||
az cognitiveservices account deployment create \
|
||||
--resource-group MyRG \
|
||||
--name myopenai \
|
||||
--deployment-name gpt-4-1 \
|
||||
--model-name gpt-4.1 \
|
||||
--model-version latest \
|
||||
--model-format OpenAI \
|
||||
--sku-name Standard \
|
||||
--sku-capacity 100
|
||||
```
|
||||
|
||||
### Deploy Image Generation Model
|
||||
|
||||
```bash
|
||||
az cognitiveservices account deployment create \
|
||||
--resource-group MyRG \
|
||||
--name myopenai \
|
||||
--deployment-name image-gen \
|
||||
--model-name gpt-image-1 \
|
||||
--model-version 2025-04-15 \
|
||||
--model-format OpenAI \
|
||||
--sku-name Standard \
|
||||
--sku-capacity 10
|
||||
```
|
||||
|
||||
### Deploy Sora Video Generation
|
||||
|
||||
```bash
|
||||
az cognitiveservices account deployment create \
|
||||
--resource-group MyRG \
|
||||
--name myopenai \
|
||||
--deployment-name sora \
|
||||
--model-name sora \
|
||||
--model-version 2025-05-02 \
|
||||
--model-format OpenAI \
|
||||
--sku-name Standard \
|
||||
--sku-capacity 5
|
||||
```
|
||||
|
||||
## Using Azure OpenAI Models
|
||||
|
||||
### Python SDK (GPT-5)
|
||||
|
||||
```python
import os

from openai import AzureOpenAI

# Initialize client
client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version="2025-02-01-preview",
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
)

# GPT-5 completion
response = client.chat.completions.create(
    model="gpt-5",  # deployment name
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."},
    ],
    # GPT-5 is served as a reasoning model: it takes max_completion_tokens
    # instead of max_tokens, and sampling parameters such as temperature
    # and top_p are not supported.
    max_completion_tokens=1000,
)

print(response.choices[0].message.content)
```
|
||||
|
||||
### Python SDK (o3 Reasoning Model)
|
||||
|
||||
```python
# o3 reasoning with chain-of-thought
# Reuses the `client` created in the GPT-5 example.
response = client.chat.completions.create(
    model="o3-reasoning",  # deployment name
    messages=[
        {"role": "system", "content": "You are an expert problem solver. Show your reasoning step-by-step."},
        {"role": "user", "content": "If a train travels 120 km in 2 hours, then speeds up to travel 180 km in the next 2 hours, what is the average speed for the entire journey?"},
    ],
    # o-series models take max_completion_tokens (which also covers hidden
    # reasoning tokens) and do not accept a custom temperature.
    max_completion_tokens=2000,
)

print(response.choices[0].message.content)
```
|
||||
|
||||
### Python SDK (GPT-4.1 with 1M Context)
|
||||
|
||||
```python
# Read a large document
with open("large_document.txt", "r", encoding="utf-8") as f:
    document = f.read()

# GPT-4.1 can handle up to 1M tokens
# Reuses the `client` created in the GPT-5 example.
response = client.chat.completions.create(
    model="gpt-4-1",  # deployment name
    messages=[
        {"role": "system", "content": "You are a document analysis expert."},
        {"role": "user", "content": f"Analyze this document and provide key insights:\n\n{document}"},
    ],
    max_tokens=4000,
)

print(response.choices[0].message.content)
```
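Before sending a very large document, it can help to estimate whether it fits in the context window. The sketch below uses a rough heuristic of about 4 characters per token, which is an assumption that varies with language and content; a real tokenizer gives exact counts. `fits_context` is a hypothetical helper.

```python
def fits_context(text: str, context_limit: int = 1_000_000,
                 reserved_for_output: int = 4000,
                 chars_per_token: float = 4.0) -> bool:
    """Rough check that a prompt plus the reserved output fits the window."""
    estimated_prompt_tokens = len(text) / chars_per_token
    return estimated_prompt_tokens + reserved_for_output <= context_limit


print(fits_context("a" * 4000))
print(fits_context("a" * 4_000_000))
```

The second call fails the check because the estimated prompt alone already fills the window, leaving no room for the 4,000 reserved output tokens.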
|
||||
|
||||
### Image Generation (GPT-image-1)
|
||||
|
||||
```python
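import base64

# Generate image with DALL-E 3 successor
# Reuses the `client` created in the GPT-5 example.
response = client.images.generate(
    model="image-gen",  # deployment name
    prompt="A futuristic city with flying cars and vertical gardens, cyberpunk style, highly detailed, 4K",
    size="1024x1024",
    quality="high",  # gpt-image-1 accepts low, medium, or high ("hd" is a DALL-E 3 value)
    n=1,
)

# gpt-image-1 returns base64-encoded image data rather than a URL
image_bytes = base64.b64decode(response.data[0].b64_json)
with open("generated_image.png", "wb") as f:
    f.write(image_bytes)

print("Generated image saved to generated_image.png")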
```
|
||||
|
||||
### Video Generation (Sora)
|
||||
|
||||
```python
|
||||
# Generate video with Sora
|
||||
response = client.videos.generate(
|
||||
model="sora",
|
||||
prompt="A serene lakeside at sunset with birds flying overhead and gentle waves on the shore",
|
||||
duration=10, # seconds
|
||||
resolution="1080p",
|
||||
fps=30
|
||||
)
|
||||
|
||||
video_url = response.data[0].url
|
||||
print(f"Generated video: {video_url}")
|
||||
```
|
||||
|
||||
### Audio Transcription
|
||||
|
||||
```python
# Transcribe audio file
# Reuses the `client` created in the GPT-5 example.
with open("meeting_recording.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",  # deployment name
        file=audio_file,
        language="en",
        # verbose_json adds duration and segments; confirm that the deployed
        # model accepts it, since some models support only json or text.
        response_format="verbose_json",
    )

print(f"Transcription: {response.text}")
print(f"Duration: {response.duration}s")

# Timestamped segments (these carry no speaker labels)
for segment in response.segments:
    print(f"[{segment.start}s - {segment.end}s] {segment.text}")
```
|
||||
|
||||
## Azure AI Foundry Integration
|
||||
|
||||
### Model Router (Automatic Model Selection)
|
||||
|
||||
```python
|
||||
from azure.ai.foundry import ModelRouter
|
||||
|
||||
# Initialize model router
|
||||
router = ModelRouter(
|
||||
endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
|
||||
credential=os.getenv("AZURE_OPENAI_API_KEY")
|
||||
)
|
||||
|
||||
# Automatically select optimal model
|
||||
response = router.complete(
|
||||
prompt="Analyze this complex scientific paper...",
|
||||
optimization_goals=["quality", "cost"],
|
||||
available_models=["gpt-5", "gpt-5-mini", "gpt-4-1"]
|
||||
)
|
||||
|
||||
print(f"Selected model: {response.model_used}")
|
||||
print(f"Response: {response.content}")
|
||||
print(f"Cost: ${response.cost}")
|
||||
```
|
||||
|
||||
**Benefits:**
|
||||
- Automatic model selection based on prompt complexity
|
||||
- Balance quality vs cost
|
||||
- Reduce costs by up to 40% while maintaining quality
|
||||
|
||||
### Agentic Retrieval (Azure AI Search Integration)
|
||||
|
||||
```python
import json
import os

from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

# Initialize search client
search_client = SearchClient(
    endpoint=os.getenv("SEARCH_ENDPOINT"),
    index_name="documents",
    credential=AzureKeyCredential(os.getenv("SEARCH_KEY")),
)

# Agentic retrieval with Azure OpenAI
# Reuses the `client` created in the GPT-5 example.
response = client.chat.completions.create(
    model="gpt-5",
    messages=[
        {"role": "system", "content": "You have access to a document search system."},
        {"role": "user", "content": "What are the company's revenue projections for Q3?"},
    ],
    tools=[{
        "type": "function",
        "function": {
            "name": "search_documents",
            "description": "Search company documents",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query"},
                },
                "required": ["query"],
            },
        },
    }],
    tool_choice="auto",
)

# Process tool calls
if response.choices[0].message.tool_calls:
    for tool_call in response.choices[0].message.tool_calls:
        if tool_call.function.name == "search_documents":
            query = json.loads(tool_call.function.arguments)["query"]
            results = search_client.search(query)
            # Feed results back to model for final answer
```
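The last step, feeding results back to the model, follows the Chat Completions tool-calling convention: the assistant message containing the tool calls is appended to the history, followed by one `tool` message per call that echoes its `tool_call_id`. A minimal sketch using plain dictionaries; `tool_result_messages` and the sample data are hypothetical, and the search results are assumed to be JSON-serializable.

```python
import json


def tool_result_messages(assistant_message: dict, results_by_call_id: dict) -> list:
    """Build the messages to append after the model requested tool calls."""
    messages = [assistant_message]  # the assistant turn must precede its tool results
    for call in assistant_message["tool_calls"]:
        messages.append({
            "role": "tool",
            "tool_call_id": call["id"],  # ties the result to the request
            "content": json.dumps(results_by_call_id[call["id"]]),
        })
    return messages


# Hypothetical assistant turn and search hits, for illustration only.
assistant_turn = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "search_documents", "arguments": '{"query": "Q3 revenue"}'},
    }],
}
follow_up = tool_result_messages(assistant_turn, {"call_1": [{"title": "Q3 forecast"}]})
print(follow_up[1]["role"], follow_up[1]["tool_call_id"])
```

The returned list is appended to the original `messages` and sent in a second `chat.completions.create` call so the model can produce the final answer.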
|
||||
|
||||
**Improvements:**
|
||||
- 40% better on complex, multi-part questions
|
||||
- Automatic query decomposition
|
||||
- Relevance ranking
|
||||
- Citation generation
|
||||
|
||||
### Foundry Observability (Preview)
|
||||
|
||||
```python
from azure.ai.foundry import FoundryObservability

# Enable observability
observability = FoundryObservability(
    workspace_id=os.getenv("AI_FOUNDRY_WORKSPACE_ID"),
    enable_tracing=True,
    enable_metrics=True,
)

# Monitor agent execution
# `client` and `messages` come from the earlier examples.
with observability.trace_agent("customer_support_agent") as trace:
    response = client.chat.completions.create(
        model="gpt-5",
        messages=messages,
    )

    trace.log_tool_call("search_kb", {"query": "refund policy"})
    trace.log_reasoning_step("Retrieved refund policy document")
    trace.log_token_usage(response.usage.total_tokens)

# View in Azure AI Foundry portal:
# - End-to-end trace logs
# - Reasoning steps and tool calls
# - Performance metrics
# - Cost analysis
```
|
||||
|
||||
## Capacity and Quota Management
|
||||
|
||||
### Check Quota
|
||||
|
||||
```bash
|
||||
# List deployments with usage
|
||||
az cognitiveservices account deployment list \
|
||||
--resource-group MyRG \
|
||||
--name myopenai \
|
||||
--output table
|
||||
|
||||
# Check usage metrics
|
||||
az monitor metrics list \
|
||||
--resource $(az cognitiveservices account show -g MyRG -n myopenai --query id -o tsv) \
|
||||
--metric "TokenTransaction" \
|
||||
--start-time 2025-01-01T00:00:00Z \
|
||||
--end-time 2025-01-31T23:59:59Z \
|
||||
--interval PT1H \
|
||||
--aggregation Total
|
||||
```
|
||||
|
||||
### Update Capacity
|
||||
|
||||
```bash
|
||||
# Scale up deployment capacity
|
||||
az cognitiveservices account deployment update \
|
||||
--resource-group MyRG \
|
||||
--name myopenai \
|
||||
--deployment-name gpt-5 \
|
||||
--sku-capacity 200
|
||||
|
||||
# Scale down during off-peak
|
||||
az cognitiveservices account deployment update \
|
||||
--resource-group MyRG \
|
||||
--name myopenai \
|
||||
--deployment-name gpt-5 \
|
||||
--sku-capacity 50
|
||||
```
|
||||
|
||||
### Request Quota Increase
|
||||
|
||||
1. Navigate to Azure Portal → Azure OpenAI resource
|
||||
2. Go to "Quotas" blade
|
||||
3. Select model and region
|
||||
4. Click "Request quota increase"
|
||||
5. Provide justification and target capacity
|
||||
|
||||
## Security and Networking
|
||||
|
||||
### Private Endpoint
|
||||
|
||||
```bash
|
||||
# Create private endpoint
|
||||
az network private-endpoint create \
|
||||
--name openai-private-endpoint \
|
||||
--resource-group MyRG \
|
||||
--vnet-name MyVNet \
|
||||
--subnet PrivateEndpointSubnet \
|
||||
--private-connection-resource-id $(az cognitiveservices account show -g MyRG -n myopenai --query id -o tsv) \
|
||||
--group-id account \
|
||||
--connection-name openai-connection
|
||||
|
||||
# Create private DNS zone
|
||||
az network private-dns zone create \
|
||||
--resource-group MyRG \
|
||||
--name privatelink.openai.azure.com
|
||||
|
||||
# Link to VNet
|
||||
az network private-dns link vnet create \
|
||||
--resource-group MyRG \
|
||||
--zone-name privatelink.openai.azure.com \
|
||||
--name openai-dns-link \
|
||||
--virtual-network MyVNet \
|
||||
--registration-enabled false
|
||||
|
||||
# Create DNS zone group
|
||||
az network private-endpoint dns-zone-group create \
|
||||
--resource-group MyRG \
|
||||
--endpoint-name openai-private-endpoint \
|
||||
--name default \
|
||||
--private-dns-zone privatelink.openai.azure.com \
|
||||
--zone-name privatelink.openai.azure.com
|
||||
```
|
||||
|
||||
### Managed Identity Access
|
||||
|
||||
```bash
|
||||
# Enable system-assigned identity
|
||||
az cognitiveservices account identity assign \
|
||||
--name myopenai \
|
||||
--resource-group MyRG
|
||||
|
||||
# Grant role to managed identity
|
||||
PRINCIPAL_ID=$(az cognitiveservices account show -g MyRG -n myopenai --query identity.principalId -o tsv)
|
||||
|
||||
az role assignment create \
|
||||
--assignee $PRINCIPAL_ID \
|
||||
--role "Cognitive Services OpenAI User" \
|
||||
--scope /subscriptions/<sub-id>/resourceGroups/MyRG
|
||||
```
|
||||
|
||||
### Content Filtering
|
||||
|
||||
```bash
|
||||
# Configure content filtering
|
||||
az cognitiveservices account update \
|
||||
--name myopenai \
|
||||
--resource-group MyRG \
|
||||
--set properties.customContentFilter='{
|
||||
"hate": {"severity": "medium", "enabled": true},
|
||||
"violence": {"severity": "medium", "enabled": true},
|
||||
"sexual": {"severity": "medium", "enabled": true},
|
||||
"selfHarm": {"severity": "high", "enabled": true}
|
||||
}'
|
||||
```
|
||||
|
||||
## Cost Optimization
|
||||
|
||||
### Model Selection Strategy
|
||||
|
||||
**Use GPT-5-mini or GPT-5-nano for:**
|
||||
- Simple questions
|
||||
- Classification tasks
|
||||
- Content moderation
|
||||
- Summarization
|
||||
|
||||
**Use GPT-5 or GPT-4.1 for:**
|
||||
- Complex reasoning
|
||||
- Long-form content generation
|
||||
- Document analysis
|
||||
- Code generation
|
||||
|
||||
**Use Reasoning Models (o3, o4-mini) for:**
|
||||
- Mathematical problems
|
||||
- Scientific analysis
|
||||
- Step-by-step reasoning
|
||||
- Logic puzzles
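
The selection strategy above can be expressed as a small lookup. This is only a sketch: the task category names and the tier labels are hypothetical, and the deployment names assume deployments called `gpt-5-mini`, `gpt-5`, and `o4-mini` exist in your resource.

```python
# Hypothetical task categories mapped to the three tiers described above.
TIER_BY_TASK = {
    "simple_question": "light",
    "classification": "light",
    "content_moderation": "light",
    "summarization": "light",
    "complex_reasoning": "general",
    "long_form_generation": "general",
    "document_analysis": "general",
    "code_generation": "general",
    "math": "reasoning",
    "scientific_analysis": "reasoning",
    "step_by_step_reasoning": "reasoning",
    "logic_puzzle": "reasoning",
}

# Assumed deployment names; replace with the names used in your resource.
DEPLOYMENT_BY_TIER = {
    "light": "gpt-5-mini",
    "general": "gpt-5",
    "reasoning": "o4-mini",
}


def choose_deployment(task_type: str) -> str:
    """Pick a deployment for a task, defaulting to the general-purpose tier."""
    tier = TIER_BY_TASK.get(task_type, "general")
    return DEPLOYMENT_BY_TIER[tier]


print(choose_deployment("classification"))
print(choose_deployment("math"))
```

Defaulting unknown task types to the general tier trades some cost for a lower risk of under-powered answers; a stricter setup might raise an error instead.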
|
||||
|
||||
### Implement Caching
|
||||
|
||||
```python
# Use semantic cache to reduce duplicate requests
# (SemanticCache stands for whichever semantic-cache library you use;
# confirm the import path for your environment)
from azure.ai.cache import SemanticCache

cache = SemanticCache(
    similarity_threshold=0.95,
    ttl_seconds=3600
)


def get_completion(user_query, messages):
    # Check cache before API call
    cached_response = cache.get(user_query)
    if cached_response:
        return cached_response

    response = client.chat.completions.create(
        model="gpt-5",
        messages=messages
    )

    cache.set(user_query, response)
    return response
```
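
To make the `similarity_threshold` and `ttl_seconds` parameters concrete, here is a standard-library sketch of how such a cache might behave. `TinySemanticCache` and `toy_embed` are hypothetical names, and the bag-of-words vector is a stand-in for a real embedding model, so the similarity scores are only illustrative.

```python
import math
import time
from collections import Counter


def toy_embed(text):
    """Toy bag-of-words vector; a real system would call an embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class TinySemanticCache:
    def __init__(self, similarity_threshold=0.95, ttl_seconds=3600,
                 embed=toy_embed, clock=time.monotonic):
        self.threshold = similarity_threshold
        self.ttl = ttl_seconds
        self.embed = embed
        self.clock = clock
        self.entries = []  # list of (vector, value, expires_at)

    def get(self, query):
        now = self.clock()
        # Drop expired entries before searching.
        self.entries = [e for e in self.entries if e[2] > now]
        vector = self.embed(query)
        best = max(self.entries, key=lambda e: cosine(vector, e[0]), default=None)
        if best is not None and cosine(vector, best[0]) >= self.threshold:
            return best[1]
        return None

    def set(self, query, value):
        self.entries.append((self.embed(query), value, self.clock() + self.ttl))


cache = TinySemanticCache()
cache.set("What is the refund policy?", "sample cached answer")
print(cache.get("what is the refund policy?"))   # same words, different case
print(cache.get("How do I reset my password?"))  # unrelated query
```

A high threshold such as 0.95 keeps near-duplicates only; lowering it raises the hit rate at the risk of returning an answer to a different question.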
|
||||
|
||||
### Token Management
|
||||
|
||||
```python
import tiktoken

# Count tokens before API call
# (o200k_base is the encoding used by recent OpenAI models; confirm for your deployment)
encoding = tiktoken.get_encoding("o200k_base")
tokens = len(encoding.encode(prompt))

if tokens > 100000:
    print(f"Warning: Prompt has {tokens} tokens, this will be expensive!")

# Cap output length when appropriate
# (reasoning models take max_completion_tokens, which also counts reasoning tokens)
response = client.chat.completions.create(
    model="gpt-5",
    messages=messages,
    max_completion_tokens=500  # Limit output tokens
)
```
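
A token count becomes more useful once it is turned into a cost estimate. The sketch below shows the arithmetic only; `estimate_cost` is a hypothetical helper and the per-million-token prices are placeholders, not actual Azure OpenAI pricing.

```python
def estimate_cost(input_tokens, output_tokens,
                  input_price_per_million, output_price_per_million):
    """Estimate request cost from token counts and per-million-token prices."""
    total = (input_tokens * input_price_per_million
             + output_tokens * output_price_per_million)
    return total / 1_000_000


# Placeholder prices: look up the current rates for your model and region.
cost = estimate_cost(
    input_tokens=100_000,
    output_tokens=500,
    input_price_per_million=1.25,
    output_price_per_million=10.0,
)
print(f"Estimated cost: ${cost:.4f}")
```

With these placeholder rates the prompt dominates the cost, which is why trimming long prompts usually saves more than capping the output.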
|
||||
|
||||
## Monitoring and Alerts
|
||||
|
||||
### Set Up Cost Alerts
|
||||
|
||||
```bash
|
||||
# Create budget alert
|
||||
az consumption budget create \
|
||||
--budget-name openai-monthly-budget \
|
||||
--resource-group MyRG \
|
||||
--amount 1000 \
|
||||
--category Cost \
|
||||
--time-grain Monthly \
|
||||
--start-date 2025-01-01 \
|
||||
--end-date 2025-12-31 \
|
||||
--notifications '{
|
||||
"actual_GreaterThan_80_Percent": {
|
||||
"enabled": true,
|
||||
"operator": "GreaterThan",
|
||||
"threshold": 80,
|
||||
"contactEmails": ["billing@example.com"]
|
||||
}
|
||||
}'
|
||||
```
|
||||
|
||||
### Application Insights Integration
|
||||
|
||||
```python
import logging
import os

from opencensus.ext.azure.log_exporter import AzureLogHandler

# Configure logging
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)  # Without this, info-level records are dropped
logger.addHandler(AzureLogHandler(
    connection_string=os.getenv("APPLICATIONINSIGHTS_CONNECTION_STRING")
))

# Log API calls
# `calculate_cost` is your own pricing helper; `latency_ms` is the request
# duration you measured around the API call
logger.info("OpenAI API call", extra={
    "custom_dimensions": {
        "model": "gpt-5",
        "tokens": response.usage.total_tokens,
        "cost": calculate_cost(response.usage.total_tokens),
        "latency_ms": latency_ms
    }
})
```
|
||||
|
||||
## Best Practices
|
||||
|
||||
✓ **Use Model Router** for automatic cost optimization
|
||||
✓ **Implement caching** to reduce duplicate requests
|
||||
✓ **Monitor token usage** and set budgets
|
||||
✓ **Use private endpoints** for production workloads
|
||||
✓ **Enable managed identity** instead of API keys
|
||||
✓ **Configure content filtering** for safety
|
||||
✓ **Right-size capacity** based on actual demand
|
||||
✓ **Use Foundry Observability** for monitoring
|
||||
✓ **Implement retry logic** with exponential backoff
|
||||
✓ **Choose appropriate models** for task complexity
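
The retry recommendation in the list above can be sketched as follows. This is a minimal standard-library illustration: `call_with_retry` is a hypothetical helper, and the attempt count and delay values are assumptions to tune against your rate limits. It uses exponential backoff with full jitter.

```python
import random
import time


def call_with_retry(func, max_attempts=5, base_delay=1.0, max_delay=30.0,
                    retryable=(Exception,), sleep=time.sleep):
    """Call func(), retrying on retryable errors with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retryable:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error.
            # Delay ceiling doubles each attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter spreads retries so clients do not retry in lockstep.
            sleep(random.uniform(0, ceiling))


# Example with a function that fails twice before succeeding.
state = {"calls": 0}


def flaky():
    state["calls"] += 1
    if state["calls"] < 3:
        raise TimeoutError("transient failure")
    return "ok"


print(call_with_retry(flaky, sleep=lambda seconds: None))
```

In practice `retryable` would be narrowed to rate-limit and timeout errors so that permanent failures such as authentication errors are not retried.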
|
||||
|
||||
## References
|
||||
|
||||
- [Azure OpenAI Documentation](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/)
|
||||
- [What's New in Azure OpenAI](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/whats-new)
|
||||
- [GPT-5 Announcement](https://azure.microsoft.com/en-us/blog/gpt-5-azure/)
|
||||
- [Azure AI Foundry](https://learn.microsoft.com/en-us/azure/ai-foundry/)
|
||||
- [Model Pricing](https://azure.microsoft.com/en-us/pricing/details/cognitive-services/openai-service/)
|
||||
|
||||
Azure OpenAI Service with GPT-5 and reasoning models brings enterprise-grade AI to your applications!
|
||||
435
skills/azure-well-architected-framework.md
Normal file
@@ -0,0 +1,435 @@
|
||||
---
|
||||
name: azure-well-architected-framework
|
||||
description: "Comprehensive Azure Well-Architected Framework knowledge covering the five pillars: Reliability, Security, Cost Optimization, Operational Excellence, and Performance Efficiency. Provides design principles, best practices, and implementation guidance for building robust Azure solutions."
|
||||
---
|
||||
|
||||
## 🚨 CRITICAL GUIDELINES
|
||||
|
||||
### Windows File Path Requirements
|
||||
|
||||
**MANDATORY: Always Use Backslashes on Windows for File Paths**
|
||||
|
||||
When using Edit or Write tools on Windows, you MUST use backslashes (`\`) in file paths, NOT forward slashes (`/`).
|
||||
|
||||
**Examples:**
|
||||
- ❌ WRONG: `D:/repos/project/file.tsx`
|
||||
- ✅ CORRECT: `D:\repos\project\file.tsx`
|
||||
|
||||
This applies to:
|
||||
- Edit tool file_path parameter
|
||||
- Write tool file_path parameter
|
||||
- All file operations on Windows systems
|
||||
|
||||
|
||||
### Documentation Guidelines
|
||||
|
||||
**NEVER create new documentation files unless explicitly requested by the user.**
|
||||
|
||||
- **Priority**: Update existing README.md files rather than creating new documentation
|
||||
- **Repository cleanliness**: Keep repository root clean - only README.md unless user requests otherwise
|
||||
- **Style**: Documentation should be concise, direct, and professional - avoid AI-generated tone
|
||||
- **User preference**: Only create additional .md files when user specifically asks for documentation
|
||||
|
||||
|
||||
---
|
||||
|
||||
# Azure Well-Architected Framework
|
||||
|
||||
The Azure Well-Architected Framework is a set of guiding tenets for building high-quality cloud solutions. It consists of five pillars of architectural excellence.
|
||||
|
||||
## Overview
|
||||
|
||||
**Purpose**: Help architects and engineers build secure, high-performing, resilient, and efficient infrastructure for applications.
|
||||
|
||||
**The Five Pillars**:
|
||||
1. Reliability
|
||||
2. Security
|
||||
3. Cost Optimization
|
||||
4. Operational Excellence
|
||||
5. Performance Efficiency
|
||||
|
||||
## Pillar 1: Reliability
|
||||
|
||||
**Definition**: The ability of a system to recover from failures and continue to function.
|
||||
|
||||
**Key Principles**:
|
||||
- Design for failure
|
||||
- Use availability zones and regions
|
||||
- Implement redundancy
|
||||
- Monitor and respond to failures
|
||||
- Test disaster recovery
|
||||
|
||||
**Best Practices**:
|
||||
|
||||
**Availability Zones:**
|
||||
```bash
|
||||
# Deploy a VM into availability zone 1 (create peers in zones 2 and 3 for zone redundancy)
|
||||
az vm create \
|
||||
--resource-group MyRG \
|
||||
--name MyVM \
|
||||
--zone 1 \
|
||||
--image Ubuntu2204 \
|
||||
--size Standard_D2s_v3
|
||||
|
||||
# Availability SLAs:
|
||||
# - Single VM (Premium SSD): 99.9%
# - Two or more VMs in an Availability Set: 99.95%
# - Two or more VMs across Availability Zones: 99.99%
|
||||
```
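
The SLA percentages above translate into concrete downtime budgets, and SLAs of components in series multiply. The sketch below works through that arithmetic; the function names are hypothetical and a 30-day month is assumed.

```python
def composite_availability(*slas):
    """Availability of components that must all be up (serial dependency)."""
    result = 1.0
    for sla in slas:
        result *= sla
    return result


def monthly_downtime_minutes(availability, days=30):
    """Maximum downtime per month that still meets the given availability."""
    return (1 - availability) * days * 24 * 60


for label, sla in [("Single VM", 0.999),
                   ("Availability Set", 0.9995),
                   ("Availability Zones", 0.9999)]:
    print(f"{label}: {monthly_downtime_minutes(sla):.2f} minutes/month")

# Two tiers in series, each with its own SLA.
print(f"Composite: {composite_availability(0.9999, 0.9995):.6f}")
```

The composite figure is lower than either input, which is why every additional dependency in the request path reduces the achievable end-to-end SLA.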
|
||||
|
||||
**Backup and Disaster Recovery:**
|
||||
```bash
|
||||
# Enable Azure Backup
|
||||
az backup protection enable-for-vm \
|
||||
--resource-group MyRG \
|
||||
--vault-name MyVault \
|
||||
--vm MyVM \
|
||||
--policy-name DefaultPolicy
|
||||
|
||||
# Recovery Point Objective (RPO): How much data loss is acceptable
|
||||
# Recovery Time Objective (RTO): How long can system be down
|
||||
```
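
RPO and RTO are only useful when checked against the actual backup schedule and a measured restore time. A minimal sketch of that check, with a hypothetical helper name and example values that are assumptions:

```python
from datetime import timedelta


def check_recovery_objectives(backup_interval, restore_duration, rpo, rto):
    """Compare a backup schedule and measured restore time with RPO/RTO targets."""
    return {
        # Worst case: the failure happens just before the next backup runs,
        # so up to one full backup interval of data is lost.
        "rpo_met": backup_interval <= rpo,
        "rto_met": restore_duration <= rto,
    }


result = check_recovery_objectives(
    backup_interval=timedelta(hours=24),  # daily backup
    restore_duration=timedelta(hours=2),  # measured in a restore drill
    rpo=timedelta(hours=4),
    rto=timedelta(hours=4),
)
print(result)
```

Here a daily backup cannot satisfy a 4-hour RPO even though the restore is fast enough, so the fix is a more frequent backup policy rather than a faster restore.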
|
||||
|
||||
**Health Probes:**
|
||||
- Application Gateway health probes
|
||||
- Load Balancer probes
|
||||
- Traffic Manager endpoint monitoring
|
||||
|
||||
## Pillar 2: Security
|
||||
|
||||
**Definition**: Protecting applications and data from threats.
|
||||
|
||||
**Key Principles**:
|
||||
- Defense in depth
|
||||
- Least privilege access
|
||||
- Secure the network
|
||||
- Protect data at rest and in transit
|
||||
- Monitor and audit
|
||||
|
||||
**Best Practices**:
|
||||
|
||||
**Identity and Access:**
|
||||
```bash
|
||||
# Use managed identities (no credentials in code)
|
||||
az vm identity assign \
|
||||
--resource-group MyRG \
|
||||
--name MyVM
|
||||
|
||||
# RBAC assignment
|
||||
az role assignment create \
|
||||
--assignee <principal-id> \
|
||||
--role "Contributor" \
|
||||
--scope /subscriptions/<subscription-id>/resourceGroups/MyRG
|
||||
```
|
||||
|
||||
**Network Security:**
|
||||
- Use Network Security Groups (NSGs)
|
||||
- Implement Azure Firewall or Application Gateway WAF
|
||||
- Use Private Endpoints for PaaS services
|
||||
- Enable Azure DDoS Network Protection (formerly DDoS Protection Standard) for public-facing apps
|
||||
|
||||
**Data Protection:**
|
||||
```bash
|
||||
# Enable encryption at rest (automatic for most services)
|
||||
# Enable TLS 1.2+ for data in transit
|
||||
|
||||
# Azure Storage encryption
|
||||
az storage account update \
|
||||
--name mystorageaccount \
|
||||
--resource-group MyRG \
|
||||
--min-tls-version TLS1_2 \
|
||||
--https-only true
|
||||
```
|
||||
|
||||
**Security Monitoring:**
|
||||
```bash
|
||||
# Enable Microsoft Defender for Cloud
|
||||
az security pricing create \
|
||||
--name VirtualMachines \
|
||||
--tier Standard
|
||||
|
||||
# Enable Microsoft Sentinel (formerly Azure Sentinel)
|
||||
az sentinel onboard \
|
||||
--resource-group MyRG \
|
||||
--workspace-name MyWorkspace
|
||||
```
|
||||
|
||||
## Pillar 3: Cost Optimization
|
||||
|
||||
**Definition**: Managing costs to maximize the value delivered.
|
||||
|
||||
**Key Principles**:
|
||||
- Plan and estimate costs
|
||||
- Provision with optimization
|
||||
- Use monitoring and analytics
|
||||
- Maximize efficiency of cloud spend
|
||||
|
||||
**Best Practices**:
|
||||
|
||||
**Right-Sizing:**
|
||||
```bash
|
||||
# Use Azure Advisor recommendations
|
||||
az advisor recommendation list \
|
||||
--category Cost \
|
||||
--output table
|
||||
|
||||
# Common optimizations:
|
||||
# 1. Shut down dev/test VMs when not in use
|
||||
# 2. Use Azure Hybrid Benefit for Windows/SQL
|
||||
# 3. Purchase reservations for consistent workloads
|
||||
# 4. Use autoscaling to match demand
|
||||
```
|
||||
|
||||
**Reserved Instances:**
|
||||
- 1-year or 3-year commitment
|
||||
- Save up to 72% vs pay-as-you-go
|
||||
- Available for VMs, SQL Database, Cosmos DB, Synapse, Storage
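
Whether a reservation pays off depends on how many hours the resource actually runs, because a reservation is billed for every hour while pay-as-you-go is billed only for hours used. The sketch below shows the break-even arithmetic; the function names are hypothetical, and 72% is the upper bound quoted above, not a typical discount.

```python
def break_even_utilization(discount):
    """Fraction of hours a resource must run before a reservation is cheaper."""
    return 1.0 - discount


def cheaper_option(hourly_rate, discount, utilization):
    """Compare average hourly cost of pay-as-you-go and a reservation."""
    pay_as_you_go = hourly_rate * utilization  # billed only while running
    reserved = hourly_rate * (1 - discount)    # billed for every hour
    return "reservation" if reserved < pay_as_you_go else "pay-as-you-go"


print(f"Break-even at {break_even_utilization(0.72):.0%} utilization")
print(cheaper_option(hourly_rate=1.0, discount=0.72, utilization=0.5))
print(cheaper_option(hourly_rate=1.0, discount=0.72, utilization=0.2))
```

With a smaller discount the break-even point rises, so a dev/test VM that runs only during working hours may still be cheaper on pay-as-you-go.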
|
||||
|
||||
**Azure Hybrid Benefit:**
|
||||
```bash
|
||||
# Apply Windows license to VM
|
||||
az vm update \
|
||||
--resource-group MyRG \
|
||||
--name MyVM \
|
||||
--license-type Windows_Server
|
||||
|
||||
# SQL Server Hybrid Benefit
|
||||
az sql vm create \
|
||||
--resource-group MyRG \
|
||||
--name MySQLVM \
|
||||
--license-type AHUB
|
||||
```
|
||||
|
||||
**Cost Management:**
|
||||
```bash
|
||||
# Create budget
|
||||
az consumption budget create \
|
||||
--budget-name MyBudget \
|
||||
--category cost \
|
||||
--amount 1000 \
|
||||
--time-grain monthly \
|
||||
--start-date 2025-01-01 \
|
||||
--end-date 2025-12-31
|
||||
|
||||
# Set up alerts at 80%, 100%, 120% of budget
|
||||
```
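
The 80%, 100%, and 120% alert levels mentioned in the block above amount to a simple threshold check. A sketch of that logic, with a hypothetical function name and example spend figures:

```python
def crossed_thresholds(spend, budget, thresholds=(80, 100, 120)):
    """Return the alert thresholds (in percent of budget) that spend has reached."""
    percent = spend / budget * 100
    return [t for t in thresholds if percent >= t]


print(crossed_thresholds(spend=850, budget=1000))
print(crossed_thresholds(spend=1250, budget=1000))
```

The 120% level only makes sense because a budget is a notification mechanism: it does not stop resources when the amount is exceeded.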
|
||||
|
||||
## Pillar 4: Operational Excellence
|
||||
|
||||
**Definition**: Operations processes that keep a system running in production.
|
||||
|
||||
**Key Principles**:
|
||||
- Automate operations
|
||||
- Monitor and gain insights
|
||||
- Refine operations procedures
|
||||
- Anticipate failure
|
||||
- Stay current with updates
|
||||
|
||||
**Best Practices**:
|
||||
|
||||
**Infrastructure as Code:**
|
||||
```bash
|
||||
# Use ARM, Bicep, or Terraform
|
||||
# Version control all infrastructure
|
||||
# Implement CI/CD for infrastructure
|
||||
|
||||
# Example: Bicep deployment
|
||||
az deployment group create \
|
||||
--resource-group MyRG \
|
||||
--template-file main.bicep \
|
||||
--parameters @parameters.json
|
||||
```
|
||||
|
||||
**Monitoring and Alerting:**
|
||||
```bash
|
||||
# Application Insights for apps
|
||||
az monitor app-insights component create \
|
||||
--app MyApp \
|
||||
--location eastus \
|
||||
--resource-group MyRG
|
||||
|
||||
# Log Analytics for infrastructure
|
||||
az monitor log-analytics workspace create \
|
||||
--resource-group MyRG \
|
||||
--workspace-name MyWorkspace
|
||||
|
||||
# Create alerts
|
||||
az monitor metrics alert create \
|
||||
--name HighCPU \
|
||||
--resource-group MyRG \
|
||||
--scopes <vm-id> \
|
||||
--condition "avg Percentage CPU > 80" \
|
||||
--description "CPU usage is above 80%"
|
||||
```
|
||||
|
||||
**DevOps Practices:**
|
||||
- Continuous Integration/Continuous Deployment (CI/CD)
|
||||
- Blue-green deployments
|
||||
- Canary releases
|
||||
- Feature flags
|
||||
- Automated testing
|
||||
|
||||
## Pillar 5: Performance Efficiency
|
||||
|
||||
**Definition**: The ability of a system to adapt to changes in load.
|
||||
|
||||
**Key Principles**:
|
||||
- Scale horizontally
|
||||
- Choose the right resources
|
||||
- Monitor performance
|
||||
- Optimize network and data access
|
||||
|
||||
**Best Practices**:
|
||||
|
||||
**Scaling:**
|
||||
```bash
|
||||
# Horizontal scaling (preferred)
|
||||
# VM Scale Sets
|
||||
az vmss create \
|
||||
--resource-group MyRG \
|
||||
--name MyVMSS \
|
||||
--image Ubuntu2204 \
|
||||
--instance-count 3 \
|
||||
--vm-sku Standard_D2s_v3
|
||||
|
||||
# Autoscaling
|
||||
az monitor autoscale create \
|
||||
--resource-group MyRG \
|
||||
--resource MyVMSS \
|
||||
--resource-type Microsoft.Compute/virtualMachineScaleSets \
|
||||
--name MyAutoscale \
|
||||
--min-count 2 \
|
||||
--max-count 10
|
||||
```
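
The autoscale setting above defines only the instance bounds; scale rules decide movement within them. The sketch below illustrates one evaluation step of a CPU-based rule pair. The 70% and 30% thresholds and the one-instance step are assumptions for illustration, not values from the commands above.

```python
def next_instance_count(current, avg_cpu, min_count=2, max_count=10,
                        scale_out_above=70, scale_in_below=30):
    """Apply one scale-out/scale-in evaluation and clamp to the allowed range."""
    if avg_cpu > scale_out_above:
        desired = current + 1
    elif avg_cpu < scale_in_below:
        desired = current - 1
    else:
        desired = current
    return max(min_count, min(max_count, desired))


print(next_instance_count(current=3, avg_cpu=85))   # scale out
print(next_instance_count(current=10, avg_cpu=95))  # already at max
print(next_instance_count(current=2, avg_cpu=10))   # already at min
```

Keeping a gap between the scale-out and scale-in thresholds avoids flapping, where the scale set repeatedly adds and removes the same instance.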
|
||||
|
||||
**Caching:**
|
||||
- Azure Cache for Redis
|
||||
- Azure CDN for static content
|
||||
- Application-level caching
|
||||
|
||||
**Data Access:**
|
||||
- Use indexes on databases
|
||||
- Implement caching strategies
|
||||
- Use CDN for global content delivery
|
||||
- Optimize queries (SQL, Cosmos DB)
|
||||
|
||||
**Networking:**
|
||||
```bash
|
||||
# Use Azure Front Door for global apps
|
||||
az afd profile create \
|
||||
--profile-name MyFrontDoor \
|
||||
--resource-group MyRG \
|
||||
--sku Premium_AzureFrontDoor
|
||||
|
||||
# Features:
|
||||
# - Global load balancing
|
||||
# - CDN capabilities
|
||||
# - Web Application Firewall
|
||||
# - SSL offloading
|
||||
# - Caching
|
||||
```
|
||||
|
||||
## Assessment and Tools
|
||||
|
||||
**Azure Well-Architected Review:**
|
||||
```bash
|
||||
# Self-assessment tool in Azure Portal
|
||||
# Generates recommendations per pillar
|
||||
# Provides actionable guidance
|
||||
```
|
||||
|
||||
**Azure Advisor:**
|
||||
```bash
|
||||
# Get recommendations
|
||||
az advisor recommendation list --output table
|
||||
|
||||
# Categories:
|
||||
# - Reliability (High Availability)
|
||||
# - Security
|
||||
# - Performance
|
||||
# - Cost
|
||||
# - Operational Excellence
|
||||
```
|
||||
|
||||
## Implementation Checklist
|
||||
|
||||
**Reliability:**
|
||||
- [ ] Deploy across availability zones
|
||||
- [ ] Implement backup strategy
|
||||
- [ ] Define RTO and RPO
|
||||
- [ ] Test disaster recovery
|
||||
- [ ] Implement health monitoring
|
||||
|
||||
**Security:**
|
||||
- [ ] Enable Microsoft Entra ID (formerly Azure AD) authentication
|
||||
- [ ] Implement RBAC (least privilege)
|
||||
- [ ] Encrypt data at rest and in transit
|
||||
- [ ] Enable Microsoft Defender for Cloud
|
||||
- [ ] Implement network segmentation (NSGs, Firewall)
|
||||
- [ ] Use Key Vault for secrets
|
||||
|
||||
**Cost Optimization:**
|
||||
- [ ] Right-size resources
|
||||
- [ ] Purchase reservations for predictable workloads
|
||||
- [ ] Enable autoscaling
|
||||
- [ ] Use Azure Hybrid Benefit
|
||||
- [ ] Implement budget alerts
|
||||
- [ ] Review Azure Advisor cost recommendations
|
||||
|
||||
**Operational Excellence:**
|
||||
- [ ] Implement Infrastructure as Code
|
||||
- [ ] Set up CI/CD pipelines
|
||||
- [ ] Enable comprehensive monitoring
|
||||
- [ ] Create operational runbooks
|
||||
- [ ] Implement automated alerting
|
||||
- [ ] Use tags for resource organization
|
||||
|
||||
**Performance Efficiency:**
|
||||
- [ ] Choose appropriate resource SKUs
|
||||
- [ ] Implement autoscaling
|
||||
- [ ] Use caching (Redis, CDN)
|
||||
- [ ] Optimize database queries
|
||||
- [ ] Implement load balancing
|
||||
- [ ] Monitor performance metrics
|
||||
|
||||
## Common Patterns
|
||||
|
||||
**Highly Available Web Application:**
|
||||
- Application Gateway (WAF enabled)
|
||||
- App Service (Premium tier, multiple instances)
|
||||
- Azure SQL Database (Zone-redundant)
|
||||
- Azure Cache for Redis
|
||||
- Application Insights
|
||||
- Azure Front Door (global distribution)
|
||||
|
||||
**Mission-Critical Application:**
|
||||
- Multi-region deployment
|
||||
- Traffic Manager or Front Door (global routing)
|
||||
- Availability Zones in each region
|
||||
- Geo-redundant storage (GRS or RA-GRS)
|
||||
- Automated backups with geo-replication
|
||||
- Comprehensive monitoring and alerting
|
||||
|
||||
**Cost-Optimized Dev/Test:**
|
||||
- Auto-shutdown for VMs
|
||||
- B-series (burstable) VMs
|
||||
- Dev/Test pricing tiers
|
||||
- Shared App Service plans
|
||||
- Azure DevTest Labs
|
||||
|
||||
## References
|
||||
|
||||
- **Official Framework**: https://learn.microsoft.com/en-us/azure/well-architected/
|
||||
- **Azure Advisor**: https://portal.azure.com/#blade/Microsoft_Azure_Expert/AdvisorMenuBlade/overview
|
||||
- **Well-Architected Review**: https://learn.microsoft.com/en-us/assessments/azure-architecture-review/
|
||||
- **Architecture Center**: https://learn.microsoft.com/en-us/azure/architecture/
|
||||
|
||||
## Key Takeaways
|
||||
|
||||
1. **Balance the Pillars**: Trade-offs exist between pillars (e.g., cost vs. reliability)
|
||||
2. **Continuous Improvement**: Architecture is not static; revisit it regularly
|
||||
3. **Measure and Monitor**: Use data to drive decisions
|
||||
4. **Automation**: Automate repetitive tasks to improve reliability and reduce costs
|
||||
5. **Security First**: Integrate security into every layer of architecture
|
||||
|
||||
The Well-Architected Framework provides a consistent approach to evaluating architectures and implementing designs that scale over time.
|
||||
624
skills/container-apps-gpu-2025.md
Normal file
@@ -0,0 +1,624 @@
|
||||
## 🚨 CRITICAL GUIDELINES
|
||||
|
||||
### Windows File Path Requirements
|
||||
|
||||
**MANDATORY: Always Use Backslashes on Windows for File Paths**
|
||||
|
||||
When using Edit or Write tools on Windows, you MUST use backslashes (`\`) in file paths, NOT forward slashes (`/`).
|
||||
|
||||
**Examples:**
|
||||
- ❌ WRONG: `D:/repos/project/file.tsx`
|
||||
- ✅ CORRECT: `D:\repos\project\file.tsx`
|
||||
|
||||
This applies to:
|
||||
- Edit tool file_path parameter
|
||||
- Write tool file_path parameter
|
||||
- All file operations on Windows systems
|
||||
|
||||
### Documentation Guidelines
|
||||
|
||||
**NEVER create new documentation files unless explicitly requested by the user.**
|
||||
|
||||
- **Priority**: Update existing README.md files rather than creating new documentation
|
||||
- **Repository cleanliness**: Keep repository root clean - only README.md unless user requests otherwise
|
||||
- **Style**: Documentation should be concise, direct, and professional - avoid AI-generated tone
|
||||
- **User preference**: Only create additional .md files when user specifically asks for documentation
|
||||
|
||||
---
|
||||
|
||||
|
||||
# Azure Container Apps GPU Support - 2025 Features
|
||||
|
||||
Complete knowledge base for Azure Container Apps with GPU support, serverless capabilities, and Dapr integration (2025 GA features).
|
||||
|
||||
## Overview
|
||||
|
||||
Azure Container Apps is a serverless container platform with native GPU support, Dapr integration, and scale-to-zero capabilities for cost-efficient AI/ML workloads.
|
||||
|
||||
## Key 2025 Features (Build Announcements)
|
||||
|
||||
### 1. Serverless GPU (GA)
|
||||
- **Automatic scaling**: Scale GPU workloads based on demand
|
||||
- **Scale-to-zero**: Pay only when GPU is actively used
|
||||
- **Per-second billing**: Granular cost control
|
||||
- **Optimized cold start**: Fast initialization for AI models
|
||||
- **Reduced operational overhead**: No infrastructure management
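
The effect of scale-to-zero with per-second billing can be estimated with simple arithmetic. In this sketch the function name is hypothetical and the per-second price is a placeholder, not an actual Azure rate.

```python
def gpu_cost(active_seconds, price_per_second):
    """Cost when billing applies only to seconds the GPU replica is running."""
    return active_seconds * price_per_second


PRICE_PER_SECOND = 0.001  # placeholder rate; check current pricing

# Two hours of inference per day versus a replica that never scales down.
scale_to_zero = gpu_cost(2 * 3600 * 30, PRICE_PER_SECOND)
always_on = gpu_cost(24 * 3600 * 30, PRICE_PER_SECOND)

print(f"Scale-to-zero: {scale_to_zero:.2f} per month")
print(f"Always on:     {always_on:.2f} per month")
```

The saving is proportional to idle time, so workloads with steady traffic gain little and may be better served by a dedicated GPU profile.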
|
||||
|
||||
### 2. Dedicated GPU (GA)
|
||||
- **Consistent performance**: Dedicated GPU resources
|
||||
- **Simplified AI deployment**: Easy model hosting
|
||||
- **Long-running workloads**: Ideal for training and continuous inference
|
||||
- **Multiple GPU types**: NVIDIA A100, T4, and more
|
||||
|
||||
### 3. Dynamic Sessions with GPU (Early Access)
|
||||
- **Sandboxed execution**: Run untrusted AI-generated code
|
||||
- **Hyper-V isolation**: Enhanced security
|
||||
- **GPU-powered Python interpreter**: Handle compute-intensive AI workloads
|
||||
- **Scale at runtime**: Dynamic resource allocation
|
||||
|
||||
### 4. Foundry Models Integration
|
||||
- **Deploy AI models directly**: During container app creation
|
||||
- **Ready-to-use models**: Pre-configured inference endpoints
|
||||
- **Azure AI Foundry**: Seamless integration
|
||||
|
||||
### 5. Workflow with Durable Task Scheduler (Preview)
|
||||
- **Long-running workflows**: Reliable orchestration
|
||||
- **State management**: Automatic persistence
|
||||
- **Event-driven**: Trigger workflows from events
|
||||
|
||||
### 6. Native Azure Functions Support
|
||||
- **Functions runtime**: Run Azure Functions in Container Apps
|
||||
- **Consistent development**: Same code, serverless execution
|
||||
- **Event triggers**: All Functions triggers supported
|
||||
|
||||
### 7. Dapr Integration (GA)
|
||||
- **Service discovery**: Built-in DNS-based discovery
|
||||
- **State management**: Distributed state stores
|
||||
- **Pub/sub messaging**: Reliable messaging patterns
|
||||
- **Service invocation**: Resilient service-to-service calls
|
||||
- **Observability**: Integrated tracing and metrics
|
||||
|
||||
## Creating Container Apps with GPU
|
||||
|
||||
### Basic Container App with Serverless GPU
|
||||
|
||||
```bash
|
||||
# Create Container Apps environment
|
||||
az containerapp env create \
|
||||
--name myenv \
|
||||
--resource-group MyRG \
|
||||
--location eastus \
|
||||
--logs-workspace-id <workspace-id> \
|
||||
--logs-workspace-key <workspace-key>
|
||||
|
||||
# Create Container App with GPU
|
||||
az containerapp create \
|
||||
--name myapp-gpu \
|
||||
--resource-group MyRG \
|
||||
--environment myenv \
|
||||
--image myregistry.azurecr.io/ai-model:latest \
|
||||
--cpu 4 \
|
||||
--memory 8Gi \
|
||||
--gpu-type nvidia-a100 \
|
||||
--gpu-count 1 \
|
||||
--min-replicas 0 \
|
||||
--max-replicas 10 \
|
||||
--ingress external \
|
||||
--target-port 8080
|
||||
```
|
||||
|
||||
### Production-Ready Container App with GPU
|
||||
|
||||
```bash
# Flag groups, in order: container image and registry, resources, scaling,
# networking, secrets, Dapr sidecar, identity.
# Comments cannot sit between continuation lines: a "#" after "\" ends the command.
az containerapp create \
  --name myapp-gpu-prod \
  --resource-group MyRG \
  --environment myenv \
  --image myregistry.azurecr.io/ai-model:latest \
  --registry-server myregistry.azurecr.io \
  --registry-identity system \
  --cpu 4 \
  --memory 8Gi \
  --gpu-type nvidia-a100 \
  --gpu-count 1 \
  --min-replicas 0 \
  --max-replicas 20 \
  --scale-rule-name http-scaling \
  --scale-rule-type http \
  --scale-rule-http-concurrency 10 \
  --ingress external \
  --target-port 8080 \
  --transport http2 \
  --exposed-port 8080 \
  --env-vars "AZURE_CLIENT_ID=secretref:client-id" \
  --dapr-app-id myapp \
  --dapr-app-port 8080 \
  --dapr-app-protocol http \
  --enable-dapr \
  --system-assigned
```
|
||||
|
||||
## Container Apps Environment Configuration
|
||||
|
||||
### Environment with Zone Redundancy
|
||||
|
||||
```bash
|
||||
az containerapp env create \
|
||||
--name myenv-prod \
|
||||
--resource-group MyRG \
|
||||
--location eastus \
|
||||
--logs-workspace-id <workspace-id> \
|
||||
--logs-workspace-key <workspace-key> \
|
||||
--zone-redundant true \
|
||||
--enable-workload-profiles true
|
||||
```
|
||||
|
||||
### Workload Profiles (Dedicated GPU)
|
||||
|
||||
```bash
|
||||
# Create environment with workload profiles
|
||||
az containerapp env create \
|
||||
--name myenv-gpu \
|
||||
--resource-group MyRG \
|
||||
--location eastus \
|
||||
--enable-workload-profiles true
|
||||
|
||||
# Add GPU workload profile
|
||||
az containerapp env workload-profile add \
|
||||
--name myenv-gpu \
|
||||
--resource-group MyRG \
|
||||
--workload-profile-name gpu-profile \
|
||||
--workload-profile-type GPU-A100 \
|
||||
--min-nodes 0 \
|
||||
--max-nodes 10
|
||||
|
||||
# Create container app with GPU profile
|
||||
az containerapp create \
|
||||
--name myapp-dedicated-gpu \
|
||||
--resource-group MyRG \
|
||||
--environment myenv-gpu \
|
||||
--workload-profile-name gpu-profile \
|
||||
--image myregistry.azurecr.io/training-job:latest \
|
||||
--cpu 8 \
|
||||
--memory 16Gi \
|
||||
--min-replicas 1 \
|
||||
--max-replicas 5
|
||||
```
|
||||
|
||||
## GPU Scaling Rules
|
||||
|
||||
### Custom Prometheus Scaling
|
||||
|
||||
```bash
|
||||
az containerapp create \
|
||||
--name myapp-gpu-prometheus \
|
||||
--resource-group MyRG \
|
||||
--environment myenv \
|
||||
--image myregistry.azurecr.io/ai-model:latest \
|
||||
--cpu 4 \
|
||||
--memory 8Gi \
|
||||
--gpu-type nvidia-a100 \
|
||||
--gpu-count 1 \
|
||||
--min-replicas 0 \
|
||||
--max-replicas 10 \
|
||||
--scale-rule-name gpu-utilization \
|
||||
--scale-rule-type custom \
|
||||
--scale-rule-custom-type prometheus \
|
||||
--scale-rule-metadata \
|
||||
serverAddress=http://prometheus.monitoring.svc.cluster.local:9090 \
|
||||
metricName=gpu_utilization \
|
||||
threshold=80 \
|
||||
query="avg(nvidia_gpu_utilization{app='myapp'})"
|
||||
```
|
||||
|
||||
### Queue-Based Scaling (Azure Service Bus)
|
||||
|
||||
```bash
|
||||
az containerapp create \
|
||||
--name myapp-queue-processor \
|
||||
--resource-group MyRG \
|
||||
--environment myenv \
|
||||
--image myregistry.azurecr.io/batch-processor:latest \
|
||||
--cpu 4 \
|
||||
--memory 8Gi \
|
||||
--gpu-type nvidia-t4 \
|
||||
--gpu-count 1 \
|
||||
--min-replicas 0 \
|
||||
--max-replicas 50 \
|
||||
--scale-rule-name queue-scaling \
|
||||
--scale-rule-type azure-servicebus \
|
||||
--scale-rule-metadata \
|
||||
queueName=ai-jobs \
|
||||
namespace=myservicebus \
|
||||
messageCount=5 \
|
||||
--scale-rule-auth connection=servicebus-connection
|
||||
```
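
The `messageCount=5` setting above means roughly one replica per five queued messages, within the replica bounds. The sketch below approximates that calculation; the function name is hypothetical, and the real scaler also applies polling intervals and stabilization that are not modeled here.

```python
import math


def desired_replicas(queue_length, messages_per_replica=5,
                     min_replicas=0, max_replicas=50):
    """Approximate replica count for a queue-length scale rule."""
    if queue_length <= 0:
        return min_replicas  # empty queue: fall back to the minimum (scale to zero)
    wanted = math.ceil(queue_length / messages_per_replica)
    return max(min_replicas, min(max_replicas, wanted))


for backlog in (0, 1, 12, 1000):
    print(backlog, "->", desired_replicas(backlog))
```

A lower `messageCount` drains the queue faster but starts more GPU replicas, so it is effectively a trade-off between latency and cost.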
|
||||
|
||||
## Dapr Integration
|
||||
|
||||
### Enable Dapr on Container App
|
||||
|
||||
```bash
|
||||
az containerapp create \
|
||||
--name myapp-dapr \
|
||||
--resource-group MyRG \
|
||||
--environment myenv \
|
||||
--image myregistry.azurecr.io/myapp:latest \
|
||||
--enable-dapr \
|
||||
--dapr-app-id myapp \
|
||||
--dapr-app-port 8080 \
|
||||
--dapr-app-protocol http \
|
||||
--dapr-http-max-request-size 4 \
|
||||
--dapr-http-read-buffer-size 4 \
|
||||
--dapr-log-level info \
|
||||
--dapr-enable-api-logging true
|
||||
```
|
||||
|
||||
### Dapr State Store (Azure Cosmos DB)
|
||||
|
||||
```yaml
|
||||
# component.yaml uses the Container Apps component schema, not the Kubernetes
# CRD form: there is no apiVersion/kind wrapper, and the component name comes
# from --dapr-component-name
componentType: state.azure.cosmosdb
version: v1
metadata:
  - name: url
    value: "https://mycosmosdb.documents.azure.com:443/"
  - name: masterKey
    secretRef: cosmosdb-key
  - name: database
    value: "mydb"
  - name: collection
    value: "state"
secrets:
  - name: cosmosdb-key
    value: "<cosmosdb-master-key>"
|
||||
```
|
||||
|
||||
```bash
|
||||
# Create the component
|
||||
az containerapp env dapr-component set \
|
||||
--name myenv \
|
||||
--resource-group MyRG \
|
||||
--dapr-component-name statestore \
|
||||
--yaml component.yaml
|
||||
```
|
||||
|
||||
### Dapr Pub/Sub (Azure Service Bus)
|
||||
|
||||
```yaml
|
||||
# Container Apps component schema; pass the name with --dapr-component-name pubsub
componentType: pubsub.azure.servicebus.topics
version: v1
metadata:
  - name: connectionString
    secretRef: servicebus-connection
  - name: consumerID
    value: "myapp"
secrets:
  - name: servicebus-connection
    value: "<service-bus-connection-string>"
|
||||
```
|
||||
|
||||
### Service-to-Service Invocation
|
||||
|
||||
```python
|
||||
# Python example using the Dapr SDK
from dapr.clients import DaprClient

with DaprClient() as client:
    # Invoke another service
    response = client.invoke_method(
        app_id='other-service',
        method_name='process',
        data='{"input": "data"}'
    )

    # Save state
    client.save_state(
        store_name='statestore',
        key='mykey',
        value='myvalue'
    )

    # Publish message
    client.publish_event(
        pubsub_name='pubsub',
        topic_name='orders',
        data='{"orderId": "123"}'
    )
|
||||
```
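
The `publish_event` call above sends to the `orders` topic; something has to receive it. One way, sketched here with only the standard library, is Dapr's programmatic subscription: the sidecar calls `GET /dapr/subscribe` on the app, then POSTs each event as a CloudEvent to the declared route. The route `/orders`, the handler names, and port 8080 (chosen to match `--dapr-app-port 8080`) are assumptions for illustration, not part of the original examples.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Returned to the sidecar when it asks which topics this app wants.
SUBSCRIPTIONS = [{"pubsubname": "pubsub", "topic": "orders", "route": "/orders"}]


def handle_order(event: dict) -> dict:
    """Process one CloudEvent delivered by the sidecar and acknowledge it."""
    data = event.get("data", {})
    if isinstance(data, str):  # payload may arrive as a JSON string
        data = json.loads(data)
    print(f"received order {data.get('orderId')}")
    return {"status": "SUCCESS"}


class Handler(BaseHTTPRequestHandler):
    def _send_json(self, body) -> None:
        payload = json.dumps(body).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def do_GET(self) -> None:
        if self.path == "/dapr/subscribe":
            self._send_json(SUBSCRIPTIONS)
        else:
            self.send_error(404)

    def do_POST(self) -> None:
        if self.path == "/orders":
            length = int(self.headers.get("Content-Length", 0))
            event = json.loads(self.rfile.read(length))
            self._send_json(handle_order(event))
        else:
            self.send_error(404)


def serve(port: int = 8080) -> None:
    """Start the subscriber; blocks until interrupted."""
    HTTPServer(("0.0.0.0", port), Handler).serve_forever()
```

Calling `serve()` starts the listener. A production service would more likely use a web framework or the Dapr SDK's app extensions; the standard-library version only keeps the protocol visible.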
|
||||
|
||||
## AI Model Deployment Patterns
|
||||
|
||||
### OpenAI-Compatible Endpoint
|
||||
|
||||
```dockerfile
|
||||
# Dockerfile for vLLM model serving
|
||||
FROM vllm/vllm-openai:latest
|
||||
|
||||
ENV MODEL_NAME="meta-llama/Llama-3.1-8B-Instruct"
|
||||
ENV GPU_MEMORY_UTILIZATION=0.9
|
||||
ENV MAX_MODEL_LEN=4096
|
||||
|
||||
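# Exec-form CMD/ENTRYPOINT does not expand ${VAR}, so launch through a shell
# to have the ENV values above substituted at container start
ENTRYPOINT ["/bin/sh", "-c", "exec python3 -m vllm.entrypoints.openai.api_server --model \"$MODEL_NAME\" --gpu-memory-utilization \"$GPU_MEMORY_UTILIZATION\" --max-model-len \"$MAX_MODEL_LEN\" --port 8080"]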
|
||||
```
|
||||
|
||||
```bash
|
||||
# Deploy vLLM model
|
||||
az containerapp create \
|
||||
--name llama-inference \
|
||||
--resource-group MyRG \
|
||||
--environment myenv \
|
||||
--image vllm/vllm-openai:latest \
|
||||
--cpu 8 \
|
||||
--memory 32Gi \
|
||||
--gpu-type nvidia-a100 \
|
||||
--gpu-count 1 \
|
||||
--min-replicas 1 \
|
||||
--max-replicas 5 \
|
||||
--target-port 8080 \
|
||||
--ingress external \
|
||||
--env-vars \
|
||||
MODEL_NAME="meta-llama/Llama-3.1-8B-Instruct" \
|
||||
GPU_MEMORY_UTILIZATION="0.9" \
|
||||
HF_TOKEN=secretref:huggingface-token
|
||||
```
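
Once the app is running, clients talk to it like any OpenAI-style API. The sketch below builds a chat completion request with the standard library; it assumes the server exposes the usual `/v1/chat/completions` route, and the helper names and any hostname you pass in are hypothetical placeholders for your app's ingress FQDN.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str,
                       max_tokens: int = 128) -> urllib.request.Request:
    """Build a POST to the OpenAI-style /v1/chat/completions route."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the first choice's message text."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        reply = json.load(resp)
    return reply["choices"][0]["message"]["content"]
```

A call would look like `send(build_chat_request("https://<app-fqdn>", "meta-llama/Llama-3.1-8B-Instruct", "Say hello."))`. With `--ingress external` the endpoint is reachable from the internet, so some form of authentication in front of it is worth considering.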
|
||||
|
||||
### Stable Diffusion Image Generation
|
||||
|
||||
```bash
|
||||
az containerapp create \
|
||||
--name stable-diffusion \
|
||||
--resource-group MyRG \
|
||||
--environment myenv \
|
||||
--image myregistry.azurecr.io/stable-diffusion:latest \
|
||||
--cpu 4 \
|
||||
--memory 16Gi \
|
||||
--gpu-type nvidia-a100 \
|
||||
--gpu-count 1 \
|
||||
--min-replicas 0 \
|
||||
--max-replicas 10 \
|
||||
--target-port 7860 \
|
||||
--ingress external \
|
||||
--scale-rule-name http-scaling \
|
||||
--scale-rule-type http \
|
||||
--scale-rule-http-concurrency 1
|
||||
```
|
||||
|
||||
### Batch Processing Job
|
||||
|
||||
```bash
|
||||
az containerapp job create \
|
||||
--name batch-training-job \
|
||||
--resource-group MyRG \
|
||||
--environment myenv \
|
||||
--trigger-type Manual \
|
||||
--image myregistry.azurecr.io/training:latest \
|
||||
--cpu 8 \
|
||||
--memory 32Gi \
|
||||
--gpu-type nvidia-a100 \
|
||||
--gpu-count 2 \
|
||||
--parallelism 1 \
|
||||
--replica-timeout 7200 \
|
||||
--replica-retry-limit 3 \
|
||||
--env-vars \
|
||||
DATASET_URL="https://mystorage.blob.core.windows.net/datasets/train.csv" \
|
||||
MODEL_OUTPUT="https://mystorage.blob.core.windows.net/models/" \
|
||||
EPOCHS="100"
|
||||
|
||||
# Execute job
|
||||
az containerapp job start \
|
||||
--name batch-training-job \
|
||||
--resource-group MyRG
|
||||
```
|
||||
|
||||
## Monitoring and Observability
|
||||
|
||||
### Application Insights Integration
|
||||
|
||||
```bash
|
||||
az containerapp create \
|
||||
--name myapp-monitored \
|
||||
--resource-group MyRG \
|
||||
--environment myenv \
|
||||
--image myregistry.azurecr.io/myapp:latest \
|
||||
--env-vars \
|
||||
APPLICATIONINSIGHTS_CONNECTION_STRING=secretref:appinsights-connection
|
||||
```
|
||||
|
||||
### Query Logs
|
||||
|
||||
```bash
|
||||
# Stream logs
|
||||
az containerapp logs show \
|
||||
--name myapp-gpu \
|
||||
--resource-group MyRG \
|
||||
--follow
|
||||
|
||||
# Query with Log Analytics
|
||||
az monitor log-analytics query \
|
||||
--workspace <workspace-id> \
|
||||
--analytics-query "ContainerAppConsoleLogs_CL | where ContainerAppName_s == 'myapp-gpu' | take 100"
|
||||
```
|
||||
|
||||
### Metrics and Alerts
|
||||
|
||||
```bash
|
||||
# Create a metric alert (the condition below uses the Requests metric as a load proxy, not a GPU metric)
|
||||
az monitor metrics alert create \
|
||||
--name high-gpu-usage \
|
||||
--resource-group MyRG \
|
||||
--scopes $(az containerapp show -g MyRG -n myapp-gpu --query id -o tsv) \
|
||||
--condition "avg Requests > 100" \
|
||||
--window-size 5m \
|
||||
--evaluation-frequency 1m \
|
||||
--action <action-group-id>
|
||||
```
|
||||
|
||||
## Security Best Practices
|
||||
|
||||
### Managed Identity
|
||||
|
||||
```bash
|
||||
# Create with system-assigned identity
|
||||
az containerapp create \
|
||||
--name myapp-identity \
|
||||
--resource-group MyRG \
|
||||
--environment myenv \
|
||||
--system-assigned \
|
||||
--image myregistry.azurecr.io/myapp:latest
|
||||
|
||||
# Get identity principal ID
|
||||
IDENTITY_ID=$(az containerapp show -g MyRG -n myapp-identity --query identity.principalId -o tsv)
|
||||
|
||||
# Assign role to access Key Vault
|
||||
az role assignment create \
|
||||
--assignee $IDENTITY_ID \
|
||||
--role "Key Vault Secrets User" \
|
||||
--scope /subscriptions/<sub-id>/resourceGroups/MyRG/providers/Microsoft.KeyVault/vaults/mykeyvault
|
||||
|
||||
# Use user-assigned identity
|
||||
az identity create --name myapp-identity --resource-group MyRG
|
||||
IDENTITY_RESOURCE_ID=$(az identity show -g MyRG -n myapp-identity --query id -o tsv)
|
||||
|
||||
az containerapp create \
|
||||
--name myapp-user-identity \
|
||||
--resource-group MyRG \
|
||||
--environment myenv \
|
||||
--user-assigned $IDENTITY_RESOURCE_ID \
|
||||
--image myregistry.azurecr.io/myapp:latest
|
||||
```
|
||||
|
||||
### Secret Management
|
||||
|
||||
```bash
|
||||
# Add secrets
|
||||
az containerapp secret set \
|
||||
--name myapp-gpu \
|
||||
--resource-group MyRG \
|
||||
--secrets \
|
||||
huggingface-token="<token>" \
|
||||
api-key="<key>"
|
||||
|
||||
# Reference secrets in environment variables
|
||||
az containerapp update \
|
||||
--name myapp-gpu \
|
||||
--resource-group MyRG \
|
||||
--set-env-vars \
|
||||
HF_TOKEN=secretref:huggingface-token \
|
||||
API_KEY=secretref:api-key
|
||||
```
|
||||
|
||||
## Cost Optimization
|
||||
|
||||
### Scale-to-Zero Configuration
|
||||
|
||||
```bash
|
||||
az containerapp create \
|
||||
--name myapp-scale-zero \
|
||||
--resource-group MyRG \
|
||||
--environment myenv \
|
||||
--image myregistry.azurecr.io/myapp:latest \
|
||||
--min-replicas 0 \
|
||||
--max-replicas 10 \
|
||||
--scale-rule-name http-scaling \
|
||||
--scale-rule-type http \
|
||||
--scale-rule-http-concurrency 10
|
||||
```
|
||||
|
||||
**Cost savings**: Pay only when requests are being processed. GPU costs are per-second when active.
|
||||
|
||||
### Right-Sizing Resources
|
||||
|
||||
```bash
|
||||
# Start with minimal resources (these are flags for `az containerapp create`, not a standalone command)
|
||||
--cpu 2 --memory 4Gi --gpu-count 1
|
||||
|
||||
# Monitor and adjust based on actual usage
|
||||
az monitor metrics list \
|
||||
--resource $(az containerapp show -g MyRG -n myapp-gpu --query id -o tsv) \
|
||||
--metric "CpuPercentage,MemoryPercentage"
|
||||
```
|
||||
|
||||
### Use Spot/Preemptible GPUs (Future Feature)
|
||||
|
||||
When available, configure spot instances for non-critical workloads to save up to 80% on GPU costs.
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Check Revision Status
|
||||
|
||||
```bash
|
||||
az containerapp revision list \
|
||||
--name myapp-gpu \
|
||||
--resource-group MyRG \
|
||||
--output table
|
||||
```
|
||||
|
||||
### View Revision Details
|
||||
|
||||
```bash
|
||||
az containerapp revision show \
|
||||
--name <revision-name> \
|
||||
--app myapp-gpu \
|
||||
--resource-group MyRG
|
||||
```
|
||||
|
||||
### Restart Container App
|
||||
|
||||
```bash
|
||||
az containerapp revision restart \
  --name myapp-gpu \
  --resource-group MyRG \
  --revision <revision-name>
|
||||
```
|
||||
|
||||
### GPU Not Available
|
||||
|
||||
If GPU is not provisioning:
|
||||
1. Check region availability: Not all regions support GPU
|
||||
2. Verify quota: Request quota increase if needed
|
||||
3. Check workload profile: Ensure GPU workload profile is created
|
||||
|
||||
## Best Practices
|
||||
|
||||
✓ Use scale-to-zero for intermittent workloads
|
||||
✓ Implement health probes (liveness and readiness)
|
||||
✓ Use managed identities for authentication
|
||||
✓ Store secrets in Azure Key Vault
|
||||
✓ Enable Dapr for microservices patterns
|
||||
✓ Configure appropriate scaling rules
|
||||
✓ Monitor GPU utilization and adjust resources
|
||||
✓ Use Container Apps jobs for batch processing
|
||||
✓ Implement retry logic for transient failures
|
||||
✓ Use Application Insights for observability
|
||||
|
||||
## References
|
||||
|
||||
- [Container Apps GPU Documentation](https://learn.microsoft.com/en-us/azure/container-apps/gpu-support)
|
||||
- [Dapr Integration](https://learn.microsoft.com/en-us/azure/container-apps/dapr-overview)
|
||||
- [Scaling Rules](https://learn.microsoft.com/en-us/azure/container-apps/scale-app)
|
||||
- [Build 2025 Announcements](https://azure.microsoft.com/en-us/blog/container-apps-build-2025/)
|
||||
|
||||
Azure Container Apps with GPU support is a serverless option for running AI/ML workloads.
|
||||
796
skills/deployment-stacks-2025.md
Normal file
@@ -0,0 +1,796 @@
|
||||
## 🚨 CRITICAL GUIDELINES
|
||||
|
||||
### Windows File Path Requirements
|
||||
|
||||
**MANDATORY: Always Use Backslashes on Windows for File Paths**
|
||||
|
||||
When using Edit or Write tools on Windows, you MUST use backslashes (`\`) in file paths, NOT forward slashes (`/`).
|
||||
|
||||
**Examples:**
|
||||
- ❌ WRONG: `D:/repos/project/file.tsx`
|
||||
- ✅ CORRECT: `D:\repos\project\file.tsx`
|
||||
|
||||
This applies to:
|
||||
- Edit tool file_path parameter
|
||||
- Write tool file_path parameter
|
||||
- All file operations on Windows systems
|
||||
|
||||
### Documentation Guidelines
|
||||
|
||||
**NEVER create new documentation files unless explicitly requested by the user.**
|
||||
|
||||
- **Priority**: Update existing README.md files rather than creating new documentation
|
||||
- **Repository cleanliness**: Keep repository root clean - only README.md unless user requests otherwise
|
||||
- **Style**: Documentation should be concise, direct, and professional - avoid AI-generated tone
|
||||
- **User preference**: Only create additional .md files when user specifically asks for documentation
|
||||
|
||||
---
|
||||
|
||||
|
||||
# Azure Deployment Stacks - 2025 GA Features
|
||||
|
||||
Complete knowledge base for Azure Deployment Stacks, the successor to Azure Blueprints (GA 2024, best practices 2025).
|
||||
|
||||
## Overview
|
||||
|
||||
Azure Deployment Stacks is a resource type for managing a collection of Azure resources as a single, atomic unit. It provides unified lifecycle management, resource protection, and automatic cleanup capabilities.
|
||||
|
||||
## Key Features
|
||||
|
||||
### 1. Unified Resource Management
|
||||
- Manage multiple resources as a single entity
|
||||
- Update, export, and delete operations on the entire stack
|
||||
- Track all managed resources in one place
|
||||
- Consistent deployment across environments
|
||||
|
||||
### 2. Deny Settings (Resource Protection)
|
||||
Prevent unauthorized modifications to managed resources:
|
||||
- **None**: No restrictions (default)
|
||||
- **DenyDelete**: Prevent resource deletion
|
||||
- **DenyWriteAndDelete**: Prevent updates and deletions
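
The three modes can be read as a small decision table. The following sketch encodes that table as described in this document (excluded principals bypass the deny settings); it is an illustration of the semantics only, the names `DenyMode` and `is_denied` are hypothetical, and it does not call any Azure API.

```python
from enum import Enum


class DenyMode(Enum):
    NONE = "None"
    DENY_DELETE = "DenyDelete"
    DENY_WRITE_AND_DELETE = "DenyWriteAndDelete"


def is_denied(mode: DenyMode, operation: str, principal_id: str,
              excluded_principals: frozenset = frozenset()) -> bool:
    """Return True if `operation` ('write' or 'delete') would be blocked."""
    if operation not in ("write", "delete"):
        raise ValueError("operation must be 'write' or 'delete'")
    if principal_id in excluded_principals:
        return False  # excluded principals bypass the deny settings
    if mode is DenyMode.DENY_WRITE_AND_DELETE:
        return True
    if mode is DenyMode.DENY_DELETE:
        return operation == "delete"
    return False  # DenyMode.NONE places no restrictions
```

For example, `is_denied(DenyMode.DENY_DELETE, "write", "user-1")` is `False`, while the same call with `"delete"` is `True` unless `"user-1"` is in the excluded set.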
|
||||
|
||||
### 3. ActionOnUnmanage (Cleanup Policies)
|
||||
Control what happens to resources that are no longer in the template:
- **detachAll**: Remove from stack management, keep resources
- **deleteAll**: Delete resources, resource groups, and management groups that are no longer in the template
- **deleteResources**: Delete resources that are no longer in the template, but detach (keep) their resource groups
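
To make the difference between the three values concrete, the sketch below maps each item that has dropped out of the template to either `detach` or `delete`. It models only the behavior described in the list above; `plan_unmanage` and the `(name, kind)` tuple shape are hypothetical and nothing here talks to Azure.

```python
def plan_unmanage(action: str, unmanaged: list[tuple[str, str]]) -> dict[str, str]:
    """Map each no-longer-templated item to 'detach' or 'delete'.

    `unmanaged` holds (name, kind) pairs where kind is 'resource' or
    'resourceGroup'.
    """
    if action not in ("detachAll", "deleteResources", "deleteAll"):
        raise ValueError(f"unknown action: {action}")
    plan = {}
    for name, kind in unmanaged:
        if action == "detachAll":
            plan[name] = "detach"
        elif action == "deleteResources" and kind == "resourceGroup":
            plan[name] = "detach"  # groups are kept, only resources are deleted
        else:
            plan[name] = "delete"
    return plan
```

Running a plan like this by hand before switching a production stack to `deleteAll` is a cheap way to see which resource groups would disappear.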
|
||||
|
||||
### 4. Scope Flexibility
|
||||
Deploy stacks at:
|
||||
- Resource group scope
|
||||
- Subscription scope
|
||||
- Management group scope
|
||||
|
||||
### 5. Replaces Azure Blueprints
|
||||
Azure Blueprints will be deprecated in **July 2026**. Deployment Stacks is the recommended replacement.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
### Azure CLI Version
|
||||
```bash
|
||||
# Requires Azure CLI 2.61.0 or later
|
||||
az version
|
||||
|
||||
# Upgrade if needed
|
||||
az upgrade
|
||||
```
|
||||
|
||||
### Azure PowerShell Version
|
||||
```powershell
|
||||
# Requires Azure PowerShell 12.0.0 or later
|
||||
Get-InstalledModule -Name Az
|
||||
Update-Module -Name Az
|
||||
```
|
||||
|
||||
## Creating Deployment Stacks
|
||||
|
||||
### Subscription Scope Stack
|
||||
|
||||
```bash
|
||||
# Create deployment stack at subscription level
|
||||
az stack sub create \
|
||||
--name MyProductionStack \
|
||||
--location eastus \
|
||||
--template-file main.bicep \
|
||||
--parameters @parameters.json \
|
||||
--deny-settings-mode DenyWriteAndDelete \
|
||||
--deny-settings-excluded-principals <devops-service-principal-id> <admin-group-id> \
|
||||
--action-on-unmanage deleteAll \
|
||||
--description "Production infrastructure managed by deployment stack" \
|
||||
--tags Environment=Production ManagedBy=DeploymentStack CostCenter=Engineering
|
||||
|
||||
# What-if analysis before deployment
|
||||
az stack sub what-if \
|
||||
--name MyProductionStack \
|
||||
--location eastus \
|
||||
--template-file main.bicep \
|
||||
--parameters @parameters.json
|
||||
|
||||
# Create with confirmation prompt disabled
|
||||
az stack sub create \
|
||||
--name MyDevStack \
|
||||
--location eastus \
|
||||
--template-file main.bicep \
|
||||
--deny-settings-mode None \
|
||||
--action-on-unmanage detachAll \
|
||||
--yes
|
||||
```
|
||||
|
||||
### Resource Group Scope Stack
|
||||
|
||||
```bash
|
||||
# Create resource group
|
||||
az group create \
|
||||
--name MyRG \
|
||||
--location eastus \
|
||||
--tags Environment=Production
|
||||
|
||||
# Create deployment stack
|
||||
az stack group create \
|
||||
--name MyAppStack \
|
||||
--resource-group MyRG \
|
||||
--template-file main.bicep \
|
||||
--parameters environment=production \
|
||||
--deny-settings-mode DenyDelete \
|
||||
--action-on-unmanage deleteAll \
|
||||
--description "Application infrastructure stack"
|
||||
```
|
||||
|
||||
### Management Group Scope Stack
|
||||
|
||||
```bash
|
||||
# Create stack at management group level
|
||||
az stack mg create \
|
||||
--name MyEnterpriseStack \
|
||||
--management-group-id MyMgmtGroup \
|
||||
--location eastus \
|
||||
--template-file main.bicep \
|
||||
--deny-settings-mode DenyWriteAndDelete \
|
||||
--action-on-unmanage detachAll
|
||||
```
|
||||
|
||||
## Bicep Template for Deployment Stack
|
||||
|
||||
### Production Stack Template
|
||||
|
||||
```bicep
|
||||
// main.bicep
|
||||
targetScope = 'subscription'
|
||||
|
||||
@description('Environment name')
|
||||
@allowed([
|
||||
'dev'
|
||||
'staging'
|
||||
'production'
|
||||
])
|
||||
param environment string = 'production'
|
||||
|
||||
@description('Primary location')
|
||||
param location string = 'eastus'
|
||||
|
||||
@description('Secondary location for geo-replication')
|
||||
param secondaryLocation string = 'westus'
|
||||
|
||||
// Resource naming
|
||||
var namingPrefix = 'myapp-${environment}'
|
||||
|
||||
// Resource Group for core infrastructure
|
||||
resource coreRG 'Microsoft.Resources/resourceGroups@2024-03-01' = {
|
||||
name: '${namingPrefix}-core-rg'
|
||||
location: location
|
||||
tags: {
|
||||
Environment: environment
|
||||
ManagedBy: 'DeploymentStack'
|
||||
Purpose: 'Core Infrastructure'
|
||||
}
|
||||
}
|
||||
|
||||
// Resource Group for data services
|
||||
resource dataRG 'Microsoft.Resources/resourceGroups@2024-03-01' = {
|
||||
name: '${namingPrefix}-data-rg'
|
||||
location: location
|
||||
tags: {
|
||||
Environment: environment
|
||||
ManagedBy: 'DeploymentStack'
|
||||
Purpose: 'Data Services'
|
||||
}
|
||||
}
|
||||
|
||||
// Log Analytics Workspace
|
||||
module logAnalytics 'modules/log-analytics.bicep' = {
|
||||
name: 'logAnalyticsDeploy'
|
||||
scope: coreRG
|
||||
params: {
|
||||
name: '${namingPrefix}-logs'
|
||||
location: location
|
||||
retentionInDays: environment == 'production' ? 90 : 30
|
||||
}
|
||||
}
|
||||
|
||||
// AKS Automatic Cluster
|
||||
module aksCluster 'modules/aks-automatic.bicep' = {
|
||||
name: 'aksClusterDeploy'
|
||||
scope: coreRG
|
||||
params: {
|
||||
name: '${namingPrefix}-aks'
|
||||
location: location
|
||||
kubernetesVersion: '1.34'
|
||||
workspaceId: logAnalytics.outputs.workspaceId
|
||||
enableZoneRedundancy: environment == 'production'
|
||||
}
|
||||
}
|
||||
|
||||
// Container Apps Environment
|
||||
module containerEnv 'modules/container-env.bicep' = {
|
||||
name: 'containerEnvDeploy'
|
||||
scope: coreRG
|
||||
params: {
|
||||
name: '${namingPrefix}-containerenv'
|
||||
location: location
|
||||
workspaceId: logAnalytics.outputs.workspaceId
|
||||
zoneRedundant: environment == 'production'
|
||||
}
|
||||
}
|
||||
|
||||
// Azure OpenAI
|
||||
module openAI 'modules/openai.bicep' = {
|
||||
name: 'openAIDeploy'
|
||||
scope: dataRG
|
||||
params: {
|
||||
name: '${namingPrefix}-openai'
|
||||
location: location
|
||||
deployGPT5: environment == 'production'
|
||||
}
|
||||
}
|
||||
|
||||
// Cosmos DB with geo-replication
|
||||
module cosmosDB 'modules/cosmos-db.bicep' = {
|
||||
name: 'cosmosDBDeploy'
|
||||
scope: dataRG
|
||||
params: {
|
||||
name: '${namingPrefix}-cosmos'
|
||||
primaryLocation: location
|
||||
secondaryLocation: secondaryLocation
|
||||
enableAutomaticFailover: environment == 'production'
|
||||
}
|
||||
}
|
||||
|
||||
// Key Vault
|
||||
module keyVault 'modules/key-vault.bicep' = {
|
||||
name: 'keyVaultDeploy'
|
||||
scope: coreRG
|
||||
params: {
|
||||
name: '${namingPrefix}-kv'
|
||||
location: location
|
||||
enablePurgeProtection: environment == 'production'
|
||||
}
|
||||
}
|
||||
|
||||
// Outputs
|
||||
output aksClusterName string = aksCluster.outputs.clusterName
|
||||
output containerEnvId string = containerEnv.outputs.environmentId
|
||||
output openAIEndpoint string = openAI.outputs.endpoint
|
||||
output cosmosDBEndpoint string = cosmosDB.outputs.endpoint
|
||||
output keyVaultUri string = keyVault.outputs.vaultUri
|
||||
```
|
||||
|
||||
### AKS Automatic Module
|
||||
|
||||
```bicep
|
||||
// modules/aks-automatic.bicep
|
||||
@description('Cluster name')
|
||||
param name string
|
||||
|
||||
@description('Location')
|
||||
param location string
|
||||
|
||||
@description('Kubernetes version')
|
||||
param kubernetesVersion string = '1.34'
|
||||
|
||||
@description('Log Analytics workspace ID')
|
||||
param workspaceId string
|
||||
|
||||
@description('Enable zone redundancy')
|
||||
param enableZoneRedundancy bool = true
|
||||
|
||||
resource aksCluster 'Microsoft.ContainerService/managedClusters@2025-01-01' = {
|
||||
name: name
|
||||
location: location
|
||||
sku: {
|
||||
name: 'Automatic'
|
||||
tier: 'Standard'
|
||||
}
|
||||
identity: {
|
||||
type: 'SystemAssigned'
|
||||
}
|
||||
properties: {
|
||||
kubernetesVersion: kubernetesVersion
|
||||
dnsPrefix: '${name}-dns'
|
||||
enableRBAC: true
|
||||
aadProfile: {
|
||||
managed: true
|
||||
enableAzureRBAC: true
|
||||
}
|
||||
networkProfile: {
|
||||
networkPlugin: 'azure'
|
||||
networkPluginMode: 'overlay'
|
||||
networkDataplane: 'cilium'
|
||||
serviceCidr: '10.0.0.0/16'
|
||||
dnsServiceIP: '10.0.0.10'
|
||||
}
|
||||
autoScalerProfile: {
|
||||
'balance-similar-node-groups': 'true'
|
||||
expander: 'least-waste'
|
||||
}
|
||||
autoUpgradeProfile: {
|
||||
upgradeChannel: 'stable'
|
||||
nodeOSUpgradeChannel: 'NodeImage'
|
||||
}
|
||||
securityProfile: {
|
||||
defender: {
|
||||
securityMonitoring: {
|
||||
enabled: true
|
||||
}
|
||||
}
|
||||
workloadIdentity: {
|
||||
enabled: true
|
||||
}
|
||||
}
|
||||
oidcIssuerProfile: {
|
||||
enabled: true
|
||||
}
|
||||
addonProfiles: {
|
||||
omsagent: {
|
||||
enabled: true
|
||||
config: {
|
||||
logAnalyticsWorkspaceResourceID: workspaceId
|
||||
}
|
||||
}
|
||||
azurePolicy: {
|
||||
enabled: true
|
||||
}
|
||||
}
|
||||
}
|
||||
zones: enableZoneRedundancy ? ['1', '2', '3'] : null
|
||||
}
|
||||
|
||||
output clusterName string = aksCluster.name
|
||||
output clusterId string = aksCluster.id
|
||||
output oidcIssuerUrl string = aksCluster.properties.oidcIssuerProfile.issuerUrl
|
||||
output kubeletIdentity string = aksCluster.properties.identityProfile.kubeletidentity.objectId
|
||||
```
|
||||
|
||||
## Managing Deployment Stacks
|
||||
|
||||
### Update Stack
|
||||
|
||||
```bash
|
||||
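# Update with a new template version by re-running create against the same stack name
az stack sub create \
  --name MyProductionStack \
  --location eastus \
  --template-file main.bicep \
  --parameters @parameters.json \
  --deny-settings-mode DenyWriteAndDelete \
  --action-on-unmanage deleteAll \
  --yes

# Update deny settings the same way: re-run create with the new values
az stack sub create \
  --name MyProductionStack \
  --location eastus \
  --template-file main.bicep \
  --parameters @parameters.json \
  --deny-settings-mode DenyWriteAndDelete \
  --deny-settings-excluded-principals <new-principal-id> \
  --action-on-unmanage deleteAll \
  --yes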
|
||||
```
|
||||
|
||||
### View Stack Details
|
||||
|
||||
```bash
|
||||
# Show stack information
|
||||
az stack sub show \
|
||||
--name MyProductionStack \
|
||||
--output json
|
||||
|
||||
# List all stacks in subscription
|
||||
az stack sub list --output table
|
||||
|
||||
# List stacks in resource group
|
||||
az stack group list \
|
||||
--resource-group MyRG \
|
||||
--output table
|
||||
```
|
||||
|
||||
### Export Stack Template
|
||||
|
||||
```bash
|
||||
# Export template from deployed stack
|
||||
az stack sub export \
|
||||
--name MyProductionStack \
|
||||
--output-file exported-stack.json
|
||||
|
||||
# Export and save parameters
|
||||
az stack sub show \
|
||||
--name MyProductionStack \
|
||||
--query "parameters" \
|
||||
--output json > parameters-backup.json
|
||||
```
|
||||
|
||||
### Delete Stack
|
||||
|
||||
```bash
|
||||
# Delete stack and all managed resources
|
||||
az stack sub delete \
|
||||
--name MyProductionStack \
|
||||
--action-on-unmanage deleteAll \
|
||||
--yes
|
||||
|
||||
# Delete stack but keep resources
|
||||
az stack sub delete \
|
||||
--name MyProductionStack \
|
||||
--action-on-unmanage detachAll \
|
||||
--yes
|
||||
|
||||
# Delete with confirmation prompt
|
||||
az stack sub delete --name MyProductionStack
|
||||
```
|
||||
|
||||
## Deny Settings in Detail
|
||||
|
||||
### DenyDelete Mode
|
||||
|
||||
Prevents deletion but allows updates:
|
||||
|
||||
```bash
|
||||
az stack sub create \
|
||||
--name MyStack \
|
||||
--location eastus \
|
||||
--template-file main.bicep \
|
||||
--deny-settings-mode DenyDelete \
|
||||
--deny-settings-excluded-principals \
|
||||
<emergency-access-principal-id> \
|
||||
<devops-service-principal-id>
|
||||
```
|
||||
|
||||
**Use cases:**
|
||||
- Protect production databases
|
||||
- Prevent accidental resource deletion
|
||||
- Allow configuration updates
|
||||
|
||||
### DenyWriteAndDelete Mode
|
||||
|
||||
Prevents both updates and deletions:
|
||||
|
||||
```bash
|
||||
az stack sub create \
|
||||
--name MyStack \
|
||||
--location eastus \
|
||||
--template-file main.bicep \
|
||||
--deny-settings-mode DenyWriteAndDelete \
|
||||
--deny-settings-excluded-principals <break-glass-principal-id>
|
||||
```
|
||||
|
||||
**Use cases:**
|
||||
- Immutable infrastructure
|
||||
- Compliance requirements
|
||||
- Critical production workloads
|
||||
|
||||
### Excluded Principals
|
||||
|
||||
Bypass deny settings for specific identities:
|
||||
|
||||
```bash
|
||||
# Get principal IDs
|
||||
SERVICE_PRINCIPAL_ID=$(az ad sp show --id <app-id> --query id -o tsv)
|
||||
ADMIN_GROUP_ID=$(az ad group show --group "Cloud Admins" --query id -o tsv)
|
||||
|
||||
# Apply with exclusions
|
||||
az stack sub create \
|
||||
--name MyStack \
|
||||
--location eastus \
|
||||
--template-file main.bicep \
|
||||
--deny-settings-mode DenyWriteAndDelete \
|
||||
--deny-settings-excluded-principals $SERVICE_PRINCIPAL_ID $ADMIN_GROUP_ID
|
||||
```
|
||||
|
||||
## ActionOnUnmanage Policies
|
||||
|
||||
### detachAll
|
||||
|
||||
Resources are removed from stack management but not deleted:
|
||||
|
||||
```bash
|
||||
az stack sub create \
|
||||
--name MyStack \
|
||||
--location eastus \
|
||||
--template-file main.bicep \
|
||||
--action-on-unmanage detachAll
|
||||
```
|
||||
|
||||
**Use when:**
|
||||
- Testing deployment changes
|
||||
- Migrating resources to another stack
|
||||
- Temporary stack management
|
||||
|
||||
### deleteAll
|
||||
|
||||
All unmanaged resources are deleted:
|
||||
|
||||
```bash
|
||||
az stack sub create \
|
||||
--name MyStack \
|
||||
--location eastus \
|
||||
--template-file main.bicep \
|
||||
--action-on-unmanage deleteAll
|
||||
```
|
||||
|
||||
**Use when:**
|
||||
- Ephemeral environments (dev, test)
|
||||
- Clean slate deployments
|
||||
- Strict infrastructure-as-code enforcement
|
||||
|
||||
### deleteResources
|
||||
|
||||
Delete resources but keep resource groups:
|
||||
|
||||
```bash
|
||||
az stack sub create \
|
||||
--name MyStack \
|
||||
--location eastus \
|
||||
--template-file main.bicep \
|
||||
--action-on-unmanage deleteResources
|
||||
```
|
||||
|
||||
## RBAC for Deployment Stacks
|
||||
|
||||
### Built-in Roles
|
||||
|
||||
**Azure Deployment Stack Contributor**
|
||||
- Manage deployment stacks
|
||||
- Cannot create or delete deny-assignments
|
||||
|
||||
**Azure Deployment Stack Owner**
|
||||
- Full stack management
|
||||
- Can create and delete deny-assignments
|
||||
|
||||
### Assign Roles
|
||||
|
||||
```bash
|
||||
# Assign Stack Contributor role
|
||||
az role assignment create \
|
||||
--assignee <user-or-service-principal-id> \
|
||||
--role "Azure Deployment Stack Contributor" \
|
||||
--scope /subscriptions/<subscription-id>
|
||||
|
||||
# Assign Stack Owner role
|
||||
az role assignment create \
|
||||
--assignee <admin-principal-id> \
|
||||
--role "Azure Deployment Stack Owner" \
|
||||
--scope /subscriptions/<subscription-id>
|
||||
```
|
||||
|
||||
## CI/CD Integration
|
||||
|
||||
### GitHub Actions
|
||||
|
||||
```yaml
|
||||
name: Deploy Deployment Stack

on:
  push:
    branches: [main]
  workflow_dispatch:

permissions:
  id-token: write
  contents: read

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Azure Login
        uses: azure/login@v2
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}

      - name: What-if Analysis
        run: |
          az stack sub what-if \
            --name MyProductionStack \
            --location eastus \
            --template-file main.bicep \
            --parameters @parameters.json

      - name: Deploy Stack
        run: |
          az stack sub create \
            --name MyProductionStack \
            --location eastus \
            --template-file main.bicep \
            --parameters @parameters.json \
            --deny-settings-mode DenyWriteAndDelete \
            --deny-settings-excluded-principals ${{ secrets.DEVOPS_PRINCIPAL_ID }} \
            --action-on-unmanage deleteAll \
            --yes
|
||||
```
|
||||
|
||||
### Azure DevOps Pipeline
|
||||
|
||||
```yaml
|
||||
trigger:
  branches:
    include:
      - main

pool:
  vmImage: 'ubuntu-latest'

variables:
  azureSubscription: 'MyAzureConnection'
  stackName: 'MyProductionStack'
  location: 'eastus'

steps:
  - task: AzureCLI@2
    displayName: 'What-if Analysis'
    inputs:
      azureSubscription: $(azureSubscription)
      scriptType: 'bash'
      scriptLocation: 'inlineScript'
      inlineScript: |
        az stack sub what-if \
          --name $(stackName) \
          --location $(location) \
          --template-file main.bicep \
          --parameters @parameters.json

  - task: AzureCLI@2
    displayName: 'Deploy Stack'
    inputs:
      azureSubscription: $(azureSubscription)
      scriptType: 'bash'
      scriptLocation: 'inlineScript'
      inlineScript: |
        az stack sub create \
          --name $(stackName) \
          --location $(location) \
          --template-file main.bicep \
          --parameters @parameters.json \
          --deny-settings-mode DenyWriteAndDelete \
          --action-on-unmanage deleteAll \
          --yes
|
||||
```
|
||||
|
||||
## Monitoring and Auditing
|
||||
|
||||
### View Stack Events
|
||||
|
||||
```bash
|
||||
# Get deployment operations
|
||||
az stack sub show \
|
||||
--name MyProductionStack \
|
||||
--query "deploymentId" \
|
||||
--output tsv | \
|
||||
xargs -I {} basename {} | xargs -I {} az deployment sub show --name {}
|
||||
|
||||
# List managed resources
|
||||
az stack sub show \
|
||||
--name MyProductionStack \
|
||||
--query "resources[].id" \
|
||||
--output table
|
||||
```
|
||||
|
||||
### Activity Logs
|
||||
|
||||
```bash
|
||||
# Query stack operations
|
||||
az monitor activity-log list \
|
||||
--resource-group MyRG \
|
||||
--namespace Microsoft.Resources \
|
||||
--start-time 2025-01-01T00:00:00Z \
|
||||
--query "[?contains(authorization.action, 'Microsoft.Resources/deploymentStacks')]" \
|
||||
--output table
|
||||
```
|
||||
|
||||
## Migration from Azure Blueprints
|
||||
|
||||
### Assessment
|
||||
|
||||
1. **Inventory Blueprints**: List all blueprints and assignments
|
||||
2. **Document Parameters**: Export parameters and configurations
|
||||
3. **Plan Conversion**: Map blueprints to deployment stacks
|
||||
4. **Test in Dev**: Validate converted templates
|
||||
|
||||
### Conversion Steps
|
||||
|
||||
```bash
|
||||
# 1. Export Blueprint as ARM template
|
||||
# (Use Azure Portal or PowerShell)
|
||||
|
||||
# 2. Convert ARM to Bicep
|
||||
az bicep decompile --file blueprint-template.json
|
||||
|
||||
# 3. Create Deployment Stack
|
||||
az stack sub create \
|
||||
--name ConvertedFromBlueprint \
|
||||
--location eastus \
|
||||
--template-file blueprint-template.bicep \
|
||||
--parameters @blueprint-parameters.json \
|
||||
--deny-settings-mode DenyWriteAndDelete \
|
||||
--action-on-unmanage detachAll
|
||||
|
||||
# 4. Validate resources
|
||||
az stack sub show --name ConvertedFromBlueprint
|
||||
|
||||
# 5. Delete Blueprint assignment (after validation)
|
||||
# Remove-AzBlueprintAssignment -Name MyBlueprintAssignment
|
||||
```
|
## Best Practices

✓ **Use Deployment Stacks for all new infrastructure**
✓ **Always run what-if analysis before deployment**
✓ **Use DenyWriteAndDelete for production stacks**
✓ **Exclude break-glass principals from deny settings**
✓ **Tag stacks with Environment, CostCenter, Owner**
✓ **Use deleteAll for ephemeral environments**
✓ **Use detachAll for migration scenarios**
✓ **Implement CI/CD pipelines for stack deployment**
✓ **Monitor stack operations via activity logs**
✓ **Document stack architecture and dependencies**
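Several of these practices can be combined in one deployment step. The sketch below is illustrative only: the function name, `main.bicep`, `parameters.json`, and the tag values are assumptions, and the preview uses `az deployment sub what-if` against the same template rather than a stack-specific preview.

```bash
# Sketch: preview the template, then create a tagged, locked production stack.
# File names and tag values are hypothetical placeholders.
deploy_prod_stack() {
  local stack_name="$1"
  local break_glass_principal_id="$2"   # object ID of an emergency-access principal

  # Preview the changes the template would make; stop if the preview fails
  az deployment sub what-if \
    --location eastus \
    --template-file main.bicep \
    --parameters @parameters.json || return 1

  # Create (or update) the stack with deny settings, an exclusion, and tags
  az stack sub create \
    --name "$stack_name" \
    --location eastus \
    --template-file main.bicep \
    --parameters @parameters.json \
    --deny-settings-mode denyWriteAndDelete \
    --deny-settings-excluded-principals "$break_glass_principal_id" \
    --action-on-unmanage detachAll \
    --tags Environment=Production CostCenter=1234 Owner=platform-team
}
```

A call such as `deploy_prod_stack ProdStack <principal-object-id>` could run as a single CI/CD pipeline step, with the what-if output reviewed before the stack is created.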
## Troubleshooting

### Stack Creation Fails

```bash
# Check deployment errors
az stack sub show \
  --name MyStack \
  --query "error" \
  --output json

# Validate template
az deployment sub validate \
  --location eastus \
  --template-file main.bicep \
  --parameters @parameters.json
```
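### Deny Settings Blocking Operations

```bash
# Inspect the stack's deny settings (mode and excluded principals)
az stack sub show --name MyStack --query "denySettings" --output json

# List deny assignments at subscription scope
# (`az role assignment list` does not return deny assignments, so call the REST API)
az rest --method get \
  --url "/subscriptions/<subscription-id>/providers/Microsoft.Authorization/denyAssignments?api-version=2022-04-01"

# Add principal to exclusions by re-running create with the same stack name
# (there is no `az stack sub update`; list existing exclusions too so they are kept)
az stack sub create \
  --name MyStack \
  --location eastus \
  --template-file main.bicep \
  --parameters @parameters.json \
  --deny-settings-mode DenyWriteAndDelete \
  --action-on-unmanage detachAll \
  --deny-settings-excluded-principals <existing-principal-id> <new-principal-id>
```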
### Resources Not Deleted

```bash
# Check action-on-unmanage setting
az stack sub show \
  --name MyStack \
  --query "actionOnUnmanage" \
  --output json

# Switch to deleteAll by re-running create with the same stack name
# (there is no `az stack sub update`; create updates an existing stack)
az stack sub create \
  --name MyStack \
  --location eastus \
  --template-file main.bicep \
  --parameters @parameters.json \
  --deny-settings-mode <current-deny-settings-mode> \
  --action-on-unmanage deleteAll
```
## References

- [Deployment Stacks Documentation](https://learn.microsoft.com/en-us/azure/azure-resource-manager/bicep/deployment-stacks)
- [Deployment Stacks Quickstart](https://learn.microsoft.com/en-us/azure/azure-resource-manager/bicep/quickstart-create-deployment-stacks)
- [Migrate from Blueprints](https://learn.microsoft.com/en-us/azure/governance/blueprints/how-to/migrate-to-deployment-stacks)
Deployment Stacks manage a set of Azure resources as a single lifecycle unit and are the migration target Microsoft recommends for Azure Blueprints.