## 🚨 CRITICAL GUIDELINES

### Windows File Path Requirements

**MANDATORY: Always Use Backslashes on Windows for File Paths**

When using Edit or Write tools on Windows, you MUST use backslashes (`\`) in file paths, NOT forward slashes (`/`).

**Examples:**
- ❌ WRONG: `D:/repos/project/file.tsx`
- ✅ CORRECT: `D:\repos\project\file.tsx`

This applies to:
- Edit tool file_path parameter
- Write tool file_path parameter
- All file operations on Windows systems

### Documentation Guidelines

**NEVER create new documentation files unless explicitly requested by the user.**

- **Priority**: Update existing README.md files rather than creating new documentation
- **Repository cleanliness**: Keep repository root clean - only README.md unless user requests otherwise
- **Style**: Documentation should be concise, direct, and professional - avoid AI-generated tone
- **User preference**: Only create additional .md files when user specifically asks for documentation

---

# AKS Automatic - 2025 GA Features

Complete knowledge base for Azure Kubernetes Service Automatic mode (GA October 2025).

## Overview

AKS Automatic is a fully managed Kubernetes offering in which Azure handles node provisioning, scaling, patching, and upgrades using preconfigured best-practice defaults.

## Key Features (GA October 2025)

### 1. Zero Operational Overhead
- Fully managed control plane and worker nodes
- Automatic OS patching and security updates
- Built-in monitoring and diagnostics
- Integrated security and compliance

### 2. Karpenter Integration
- Dynamic node provisioning based on real-time demand
- Intelligent bin-packing for cost optimization
- Automatic node consolidation and deprovisioning
- Support for multiple node pools and instance types

### 3. Auto-Scaling (Enabled by Default)
- **Horizontal Pod Autoscaler (HPA)**: Scale pods based on CPU/memory
- **Vertical Pod Autoscaler (VPA)**: Adjust pod resource requests/limits
- **KEDA**: Event-driven autoscaling for external triggers

### 4. Enhanced Security
- Microsoft Entra ID integration for authentication
- Azure RBAC for Kubernetes authorization
- Network policies enabled by default
- Automatic security patches
- Workload identity for pod-level authentication

### 5. Advanced Networking
- Azure CNI Overlay for efficient IP usage
- Cilium dataplane for high-performance networking
- Network policies for microsegmentation
- Private clusters supported

### 6. New Billing Model (Effective October 19, 2025)
- Hosted control plane fee: **$0.16/cluster/hour**
- Compute charges based on actual node usage
- No separate cluster management fee
- Cost savings from Karpenter optimization

### 7. Node Operating System
- Ubuntu 22.04 for Kubernetes < 1.34
- Ubuntu 24.04 for Kubernetes >= 1.34
- Automatic OS upgrades with node image channel

## Creating AKS Automatic Cluster

### Basic Creation

```bash
az aks create \
  --resource-group MyRG \
  --name MyAKSAutomatic \
  --sku automatic \
  --kubernetes-version 1.34 \
  --location eastus
```

### Production-Ready Configuration

```bash
# Arguments are collected in a bash array because comments cannot be
# interleaved with backslash line continuations.
args=(
  --resource-group MyRG
  --name MyAKSAutomatic
  --location eastus
  --sku automatic
  --tier standard

  # Kubernetes version
  --kubernetes-version 1.34

  # Karpenter / node auto-provisioning (default in automatic mode)
  --node-provisioning-mode Auto

  # Networking
  --network-plugin azure
  --network-plugin-mode overlay
  --network-dataplane cilium
  --service-cidr 10.0.0.0/16
  --dns-service-ip 10.0.0.10
  --load-balancer-sku standard

  # Use custom VNet (optional)
  --vnet-subnet-id /subscriptions//resourceGroups/MyRG/providers/Microsoft.Network/virtualNetworks/MyVNet/subnets/AKSSubnet

  # Availability zones
  --zones 1 2 3

  # Authentication and authorization
  --enable-managed-identity
  --enable-aad
  --enable-azure-rbac
  --aad-admin-group-object-ids

  # Auto-upgrade
  --auto-upgrade-channel stable
  --node-os-upgrade-channel NodeImage

  # Security
  --enable-defender
  --enable-workload-identity
  --enable-oidc-issuer

  # Monitoring
  --enable-addons monitoring
  --workspace-resource-id /subscriptions//resourceGroups/MyRG/providers/Microsoft.OperationalInsights/workspaces/MyWorkspace

  # Tags
  --tags Environment=Production ManagedBy=AKSAutomatic
)

az aks create "${args[@]}"
```

### With Azure Policy Add-on

```bash
az aks create \
  --resource-group MyRG \
  --name MyAKSAutomatic \
  --sku automatic \
  --enable-addons azure-policy \
  --kubernetes-version 1.34
```

## Karpenter Configuration

AKS Automatic uses Karpenter for node provisioning. Customize node provisioning with the AKSNodeClass and NodePool CRDs.

### Default AKSNodeClass

The API version and field names below depend on the Karpenter provider release installed in the cluster; confirm them with `kubectl explain aksnodeclass.spec` before applying.

```yaml
apiVersion: karpenter.azure.com/v1alpha1
kind: AKSNodeClass
metadata:
  name: default
spec:
  # OS Image - Ubuntu 24.04 for K8s 1.34+
  osImage:
    sku: Ubuntu
    version: "24.04"

  # VM Series
  vmSeries:
    - Standard_D
    - Standard_E

  # Max pods per node
  maxPodsPerNode: 110

  # Security
  securityProfile:
    sshAccess: Disabled
    securityType: Standard
```

### Custom NodePool

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general-purpose
spec:
  template:
    metadata:
      # Node labels
      labels:
        workload-type: general
    spec:
      # Constraints
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: kubernetes.azure.com/agentpool
          operator: In
          values: ["general"]

      # Taints (optional)
      taints:
        - key: "dedicated"
          value: "general"
          effect: "NoSchedule"

      # NodeClass reference
      nodeClassRef:
        group: karpenter.azure.com
        kind: AKSNodeClass
        name: default

      # Node lifetime (set on the node template in karpenter.sh/v1)
      expireAfter: 720h  # 30 days

  # Limits
  limits:
    cpu: "1000"
    memory: 4000Gi

  # Disruption budget
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 30s
    budgets:
      # A budget `duration` is only valid together with a `schedule`,
      # so this budget is always active.
      - nodes: "10%"
```

### GPU NodePool for AI Workloads

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu-workloads
spec:
  template:
    metadata:
      labels:
        workload-type: gpu
        gpu-type: nvidia-v100
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["Standard_NC6s_v3", "Standard_NC12s_v3", "Standard_NC24s_v3"]

      taints:
        - key: "nvidia.com/gpu"
          value: "true"
          effect: "NoSchedule"

      nodeClassRef:
        group: karpenter.azure.com
        kind: AKSNodeClass
        name: gpu-nodeclass

  limits:
    cpu: "200"
    memory: 800Gi
    nvidia.com/gpu: "16"

  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 300s
```

## Autoscaling with HPA, VPA, and KEDA

### Horizontal Pod Autoscaler (HPA)

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
        - type: Pods
          value: 4
          periodSeconds: 15
      selectPolicy: Max
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 50
          periodSeconds: 15
```

### Vertical Pod Autoscaler (VPA)

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: myapp-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  updatePolicy:
    updateMode: "Auto"  # Auto, Recreate, Initial, Off
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 4
          memory: 8Gi
        controlledResources: ["cpu", "memory"]
        controlledValues: RequestsAndLimits
```

### KEDA ScaledObject (Event-Driven)

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: myapp-queue-scaler
spec:
  scaleTargetRef:
    name: myapp
  minReplicaCount: 0  # Scale to zero
  maxReplicaCount: 100
  pollingInterval: 30
  cooldownPeriod: 300
  triggers:
    # Azure Service Bus Queue
    - type: azure-servicebus
      metadata:
        queueName: myqueue
        namespace: myservicebus
        messageCount: "5"
      authenticationRef:
        name: azure-servicebus-auth

    # Azure Storage Queue
    - type: azure-queue
      metadata:
        queueName: myqueue
        queueLength: "10"
        accountName: mystorageaccount
      authenticationRef:
        name: azure-storage-auth

    # Prometheus metrics
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc.cluster.local:9090
        metricName: http_requests_per_second
        threshold: "100"
        query: sum(rate(http_requests_total[2m]))
```

## Workload Identity (Replaces AAD Pod Identity)

### Setup

```bash
# Workload identity is enabled by default in AKS Automatic

# Create managed identity
az identity create \
  --name myapp-identity \
  --resource-group MyRG

# Get identity details
export IDENTITY_CLIENT_ID=$(az identity show -g MyRG -n myapp-identity --query clientId -o tsv)
export IDENTITY_OBJECT_ID=$(az identity show -g MyRG -n myapp-identity --query principalId -o tsv)

# Assign role to identity
az role assignment create \
  --assignee "$IDENTITY_OBJECT_ID" \
  --role "Storage Blob Data Contributor" \
  --scope /subscriptions//resourceGroups/MyRG/providers/Microsoft.Storage/storageAccounts/mystorage

# Create federated credential
export AKS_OIDC_ISSUER=$(az aks show -g MyRG -n MyAKSAutomatic --query oidcIssuerProfile.issuerUrl -o tsv)

az identity federated-credential create \
  --name myapp-federated-credential \
  --identity-name myapp-identity \
  --resource-group MyRG \
  --issuer "$AKS_OIDC_ISSUER" \
  --subject system:serviceaccount:default:myapp-sa
```

### Kubernetes Resources

```yaml
# Service Account
apiVersion: v1
kind: ServiceAccount
metadata:
  name: myapp-sa
  namespace: default
  annotations:
    azure.workload.identity/client-id: ""
---
# Deployment using workload identity
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
        azure.workload.identity/use: "true"  # Enable workload identity
    spec:
      serviceAccountName: myapp-sa
      containers:
        - name: myapp
          image: myregistry.azurecr.io/myapp:latest
          # The workload identity webhook normally injects the env vars and
          # token volume below; they are written out here for reference.
          env:
            - name: AZURE_CLIENT_ID
              value: ""
            - name: AZURE_TENANT_ID
              value: ""
            - name: AZURE_FEDERATED_TOKEN_FILE
              value: /var/run/secrets/azure/tokens/azure-identity-token
          volumeMounts:
            - name: azure-identity-token
              mountPath: /var/run/secrets/azure/tokens
              readOnly: true
      volumes:
        - name: azure-identity-token
          projected:
            sources:
              - serviceAccountToken:
                  path: azure-identity-token
                  expirationSeconds: 3600
                  audience: api://AzureADTokenExchange
```

## Monitoring and Observability

### Enable Container Insights

```bash
# Already enabled with --enable-addons monitoring

# Query logs using Azure Monitor
# Get cluster logs
az monitor log-analytics query \
  --workspace \
  --analytics-query "KubePodInventory | where ClusterName == 'MyAKSAutomatic' | take 10" \
  --output table

# Get Karpenter logs
kubectl logs -n kube-system -l app.kubernetes.io/name=karpenter
```

### Prometheus and Grafana

```bash
# Enable managed Prometheus
az aks update \
  --resource-group MyRG \
  --name MyAKSAutomatic \
  --enable-azure-monitor-metrics

# Access Grafana dashboards through Azure Portal
```

## Cost Optimization

### Billing Model (October 2025)

- **Control plane**: $0.16/hour per cluster
- **Compute**: Pay for actual node usage
- **Karpenter**: Automatic bin-packing and consolidation
- **Scale-to-zero**: Possible with KEDA and Karpenter

### Cost-Saving Tips

1. **Use Spot Instances for Non-Critical Workloads**

   ```yaml
   - key: karpenter.sh/capacity-type
     operator: In
     values: ["spot"]
   ```

2. **Configure Aggressive Consolidation**

   ```yaml
   disruption:
     # karpenter.sh/v1 name for the former WhenUnderutilized policy
     consolidationPolicy: WhenEmptyOrUnderutilized
     consolidateAfter: 30s
   ```

3. **Implement Pod Disruption Budgets**

   ```yaml
   apiVersion: policy/v1
   kind: PodDisruptionBudget
   metadata:
     name: myapp-pdb
   spec:
     minAvailable: 1
     selector:
       matchLabels:
         app: myapp
   ```

4. **Use VPA for Right-Sizing**
   - VPA automatically adjusts resource requests based on actual usage

## Migration from Standard AKS to Automatic

AKS Automatic is a new cluster mode - in-place migration is not supported. Follow these steps:

1. **Create new AKS Automatic cluster**
2. **Install workloads in new cluster**
3. **Validate functionality**
4. **Switch traffic** (DNS, load balancer)
5. **Decommission old cluster**

## Best Practices

- ✓ Use AKS Automatic for new production clusters
- ✓ Enable workload identity for pod authentication
- ✓ Configure custom NodePools for specific workload types
- ✓ Combine HPA, VPA, and KEDA for scaling, but do not let HPA and VPA act on the same CPU/memory metric for one workload
- ✓ Use spot instances for batch and fault-tolerant workloads
- ✓ Enable Container Insights and Managed Prometheus
- ✓ Configure Pod Disruption Budgets for critical apps
- ✓ Use network policies for microsegmentation
- ✓ Enable Azure Policy add-on for compliance
- ✓ Implement GitOps with Flux or Argo CD

## Troubleshooting

### Check Karpenter Status

```bash
kubectl logs -n kube-system -l app.kubernetes.io/name=karpenter --tail=100
kubectl get nodepools
kubectl get nodeclaims

# If no Karpenter pods are visible in the cluster (managed node
# auto-provisioning), check its events instead
kubectl get events -A --field-selector source=karpenter
```

### View Node Provisioning Events

```bash
kubectl get events --field-selector involvedObject.kind=NodePool -A
```

### Debug Workload Identity Issues

```bash
# Check service account annotation
kubectl get sa myapp-sa -o yaml

# Check pod labels
kubectl get pod -o yaml | grep azure.workload.identity

# Check federated credential
az identity federated-credential show \
  --identity-name myapp-identity \
  --resource-group MyRG \
  --name myapp-federated-credential
```

## References

- [AKS Automatic Documentation](https://learn.microsoft.com/en-us/azure/aks/automatic)
- [Karpenter on Azure](https://karpenter.sh)
- [Workload Identity](https://learn.microsoft.com/en-us/azure/aks/workload-identity-overview)
- [AKS Release Notes](https://github.com/Azure/AKS/releases)

AKS Automatic trades manual cluster operations for platform-managed provisioning, scaling, patching, and security defaults.
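As a rough illustration of the billing model described in this document, the sketch below multiplies the hosted control plane rate ($0.16 per cluster per hour, from the billing section) by a cluster count and a number of hours. The function name `control_plane_cost` and the 730-hour average month are assumptions made for this example, not part of any Azure tooling, and the result excludes compute charges.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate the hosted control plane fee only.
#   $1 = number of clusters
#   $2 = hours (assumed default: 730, an average month)
#   $3 = hourly rate in USD (default: 0.16, the rate quoted in this document)
control_plane_cost() {
  local clusters="$1" hours="${2:-730}" rate="${3:-0.16}"
  # awk handles the floating-point arithmetic that bash lacks
  awk -v c="$clusters" -v h="$hours" -v r="$rate" \
    'BEGIN { printf "%.2f\n", c * h * r }'
}

control_plane_cost 1      # one cluster for an assumed 730-hour month
control_plane_cost 3 24   # three clusters for one day
```

Node compute is billed separately and usually dominates the total, so treat this figure as a fixed floor per cluster rather than a full estimate.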