
## 🚨 CRITICAL GUIDELINES

### Windows File Path Requirements

**MANDATORY: Always Use Backslashes on Windows for File Paths**

When using Edit or Write tools on Windows, you MUST use backslashes (`\`) in file paths, NOT forward slashes (`/`).

**Examples:**
- ❌ WRONG: `D:/repos/project/file.tsx`
- ✅ CORRECT: `D:\repos\project\file.tsx`

This applies to:
- Edit tool file_path parameter
- Write tool file_path parameter
- All file operations on Windows systems

### Documentation Guidelines

**NEVER create new documentation files unless explicitly requested by the user.**

- **Priority**: Update existing README.md files rather than creating new documentation
- **Repository cleanliness**: Keep repository root clean - only README.md unless user requests otherwise
- **Style**: Documentation should be concise, direct, and professional - avoid AI-generated tone
- **User preference**: Only create additional .md files when user specifically asks for documentation

---


# AKS Automatic - 2025 GA Features

Complete knowledge base for Azure Kubernetes Service Automatic mode (GA October 2025).

## Overview

AKS Automatic is a managed Kubernetes offering. It comes preconfigured with production defaults for node provisioning, scaling, networking, security, and monitoring, and it automates routine operations such as upgrades and patching.

## Key Features (GA October 2025)

### 1. Reduced Operational Overhead
- Fully-managed control plane and worker nodes
- Automatic OS patching and security updates
- Built-in monitoring and diagnostics
- Integrated security and compliance

### 2. Karpenter Integration
- Dynamic node provisioning based on real-time demand
- Intelligent bin-packing for cost optimization
- Automatic node consolidation and deprovisioning
- Support for multiple node pools and instance types

### 3. Auto-Scaling (Enabled by Default)
- **Horizontal Pod Autoscaler (HPA)**: Scale pods based on CPU/memory
- **Vertical Pod Autoscaler (VPA)**: Adjust pod resource requests/limits
- **KEDA**: Event-driven autoscaling for external triggers

### 4. Enhanced Security
- Microsoft Entra ID integration for authentication
- Azure RBAC for Kubernetes authorization
- Network policies enabled by default
- Automatic security patches
- Workload identity for pod-level authentication

### 5. Advanced Networking
- Azure CNI Overlay for efficient IP usage
- Cilium dataplane for high-performance networking
- Network policies for microsegmentation
- Private clusters supported

### 6. New Billing Model (Effective October 19, 2025)
- Hosted control plane fee: **$0.16/cluster/hour**
- Compute charges based on actual node usage
- No separate cluster management fee
- Cost savings from Karpenter optimization
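To estimate the fixed part of the bill, you can multiply the control-plane fee out per month. The sketch below assumes the $0.16/cluster/hour rate listed above and roughly 730 hours per month. `control_plane_monthly_cost` is a hypothetical helper, and node compute charges are not included.

```bash
#!/usr/bin/env bash
# Rough monthly AKS Automatic control-plane fee (sketch).
# Assumptions: $0.16 per cluster per hour (rate listed above) and
# ~730 hours in an average month. Compute (node) charges are NOT included.
control_plane_monthly_cost() {
  local clusters="$1"
  local hours="${2:-730}"   # optional: override hours per month
  awk -v c="$clusters" -v h="$hours" -v r="0.16" \
    'BEGIN { printf "%.2f\n", c * h * r }'
}

control_plane_monthly_cost 1   # one cluster for a full month -> 116.80
control_plane_monthly_cost 3   # three clusters (dev, staging, prod) -> 350.40
```

The fee scales linearly with cluster count. Consolidating environments into fewer clusters therefore affects this line item directly, while Karpenter affects only the compute portion.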

### 7. Node Operating System
- Ubuntu 22.04 for Kubernetes < 1.34
- Ubuntu 24.04 for Kubernetes >= 1.34
- Automatic OS upgrades with node image channel

## Creating AKS Automatic Cluster

### Basic Creation

```bash
az aks create \
  --resource-group MyRG \
  --name MyAKSAutomatic \
  --sku automatic \
  --kubernetes-version 1.34 \
  --location eastus
```

### Production-Ready Configuration

```bash
# Node auto-provisioning (Karpenter) is enabled by default with --sku automatic.
# Flags are grouped as: version; networking (Azure CNI Overlay + Cilium);
# optional custom VNet; availability zones; Entra ID + Azure RBAC;
# auto-upgrade; security; monitoring; tags.
# Keep comments above the command: a '#' after a line continuation ends it.
az aks create \
  --resource-group MyRG \
  --name MyAKSAutomatic \
  --location eastus \
  --sku automatic \
  --tier standard \
  --kubernetes-version 1.34 \
  --network-plugin azure \
  --network-plugin-mode overlay \
  --network-dataplane cilium \
  --service-cidr 10.0.0.0/16 \
  --dns-service-ip 10.0.0.10 \
  --load-balancer-sku standard \
  --vnet-subnet-id /subscriptions/<sub-id>/resourceGroups/MyRG/providers/Microsoft.Network/virtualNetworks/MyVNet/subnets/AKSSubnet \
  --zones 1 2 3 \
  --enable-managed-identity \
  --enable-aad \
  --enable-azure-rbac \
  --aad-admin-group-object-ids <group-object-id> \
  --auto-upgrade-channel stable \
  --node-os-upgrade-channel NodeImage \
  --enable-defender \
  --enable-workload-identity \
  --enable-oidc-issuer \
  --enable-addons monitoring \
  --workspace-resource-id /subscriptions/<sub-id>/resourceGroups/MyRG/providers/Microsoft.OperationalInsights/workspaces/MyWorkspace \
  --tags Environment=Production ManagedBy=AKSAutomatic
```

### With Azure Policy Add-on

```bash
az aks create \
  --resource-group MyRG \
  --name MyAKSAutomatic \
  --sku automatic \
  --enable-addons azure-policy \
  --kubernetes-version 1.34
```

## Karpenter Configuration

AKS Automatic uses Karpenter for intelligent node provisioning. Customize node provisioning with AKSNodeClass and NodePool CRDs.

### Default AKSNodeClass

```yaml
apiVersion: karpenter.azure.com/v1beta1
kind: AKSNodeClass
metadata:
  name: default
spec:
  # Image family, e.g. Ubuntu2204 (pins Ubuntu 22.04) or AzureLinux.
  # Supported values vary by provider version (newer versions may add
  # Ubuntu 24.04 options); check: kubectl explain aksnodeclass.spec.imageFamily
  imageFamily: Ubuntu2204

  # OS disk size in GB
  osDiskSizeGB: 128

  # VM series (D, E, ...) are selected in the NodePool, for example with a
  # requirement on karpenter.azure.com/sku-family
```

### Custom NodePool

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general-purpose
spec:
  # Constraints
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: kubernetes.azure.com/agentpool
          operator: In
          values: ["general"]

      # Node labels
      labels:
        workload-type: general

      # Taints (optional)
      taints:
        - key: "dedicated"
          value: "general"
          effect: "NoSchedule"

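      # NodeClass reference
      nodeClassRef:
        group: karpenter.azure.com
        kind: AKSNodeClass
        name: default

      # Maximum node lifetime (a template.spec field in karpenter.sh/v1)
      expireAfter: 720h # 30 days

  # Limits: total capacity this NodePool may provision
  limits:
    cpu: "1000"
    memory: 4000Gi

  # Disruption settings
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 30s
    budgets:
      # Disrupt at most 10% of this pool's nodes at a time; a budget
      # 'duration' is only valid together with a 'schedule'
      - nodes: "10%"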
```
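The `limits` and `budgets` fields above translate into simple arithmetic. The hypothetical helpers below sketch it under these assumptions:

- All nodes in the pool have the same size, which Karpenter does not guarantee. The 16 vCPU / 64 GiB node is an illustrative size.
- Percentage budgets follow the formula in the Karpenter disruption documentation: `roundup(total * percentage) - deleting - notready`.

```bash
#!/usr/bin/env bash
# Back-of-the-envelope NodePool math (sketch; assumes homogeneous node sizes).

# Upper bound on nodes a single limit allows: floor(limit / per_node).
nodes_under_limit() {
  local limit=$1 per_node=$2
  echo $(( limit / per_node ))
}

# The pool stops growing at whichever limit (CPU or memory) is hit first.
max_nodes() {
  local cpu_limit=$1 mem_limit_gi=$2 node_cpu=$3 node_mem_gi=$4
  local by_cpu by_mem
  by_cpu=$(nodes_under_limit "$cpu_limit" "$node_cpu")
  by_mem=$(nodes_under_limit "$mem_limit_gi" "$node_mem_gi")
  echo $(( by_cpu < by_mem ? by_cpu : by_mem ))
}

# Nodes a percentage budget lets Karpenter disrupt at once:
# roundup(total * pct / 100) - deleting - notready, never below zero.
allowed_disruptions() {
  local total=$1 pct=$2 deleting=${3:-0} notready=${4:-0}
  local allowed=$(( (total * pct + 99) / 100 - deleting - notready ))
  if (( allowed < 0 )); then allowed=0; fi
  echo "$allowed"
}

max_nodes 1000 4000 16 64   # hypothetical 16 vCPU / 64 GiB nodes -> 62
nodes_under_limit 16 4      # GPU pool: nvidia.com/gpu "16", 4 GPUs per node -> 4
allowed_disruptions 25 10   # 25 nodes with nodes: "10%" -> 3
```

Because of the round-up, even a small pool can always disrupt at least one node when none are already deleting or not ready.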

### GPU NodePool for AI Workloads

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu-workloads
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["Standard_NC6s_v3", "Standard_NC12s_v3", "Standard_NC24s_v3"]

      labels:
        workload-type: gpu
        gpu-type: nvidia-v100

      taints:
        - key: "nvidia.com/gpu"
          value: "true"
          effect: "NoSchedule"

      nodeClassRef:
        group: karpenter.azure.com
        kind: AKSNodeClass
        name: gpu-nodeclass

  limits:
    cpu: "200"
    memory: 800Gi
    nvidia.com/gpu: "16"

  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 300s
```

## Autoscaling with HPA, VPA, and KEDA

### Horizontal Pod Autoscaler (HPA)

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
        - type: Pods
          value: 4
          periodSeconds: 15
      selectPolicy: Max
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 50
          periodSeconds: 15
```
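The Kubernetes HPA documentation gives the core rule as `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`. The rule is evaluated per metric, and the largest proposal wins. The hypothetical helpers below sketch that rule and the scale-up cap implied by the `behavior` block above. They approximate the controller rather than reproduce it: integer percentages, the default 10% tolerance, and no handling of unready pods or missing metrics.

```bash
#!/usr/bin/env bash
# Sketch of the HPA replica calculation for a Utilization target.
hpa_desired_replicas() {
  local replicas=$1 current=$2 target=$3 min=$4 max=$5
  local desired
  # Within the default 10% tolerance (0.9 <= current/target <= 1.1): no change.
  if (( current * 100 >= target * 90 && current * 100 <= target * 110 )); then
    desired=$replicas
  else
    desired=$(( (replicas * current + target - 1) / target ))  # integer ceil
  fi
  if (( desired < min )); then desired=$min; fi
  if (( desired > max )); then desired=$max; fi
  echo "$desired"
}

# With several metrics, the HPA uses the largest per-metric proposal.
max_of() {
  local best=$1 v
  shift
  for v in "$@"; do
    if (( v > best )); then best=$v; fi
  done
  echo "$best"
}

# Largest replica count one 15s scale-up step can reach under the behavior
# above: Percent 100 allows +replicas, Pods 4 allows +4, selectPolicy Max.
hpa_scale_up_cap() {
  local replicas=$1
  local by_percent=$(( replicas * 2 )) by_pods=$(( replicas + 4 ))
  echo $(( by_percent > by_pods ? by_percent : by_pods ))
}

cpu=$(hpa_desired_replicas 4 90 70 2 50)   # CPU at 90% vs 70% target -> 6
mem=$(hpa_desired_replicas 4 60 80 2 50)   # memory at 60% vs 80% target -> 3
max_of "$cpu" "$mem"                       # HPA picks -> 6
hpa_scale_up_cap 4                         # one step from 4 replicas -> 8
```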

### Vertical Pod Autoscaler (VPA)

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: myapp-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  updatePolicy:
    updateMode: "Auto"  # Auto, Recreate, Initial, Off
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 4
          memory: 8Gi
        controlledResources: ["cpu", "memory"]
        controlledValues: RequestsAndLimits
```
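The `resourcePolicy` above bounds what VPA may set. With `controlledValues: RequestsAndLimits`, VPA scales limits in proportion to requests, which preserves the original limit:request ratio.

The hypothetical helpers below sketch that shaping for CPU in millicores. This is a simplification: the real recommender also applies safety margins, histogram-based estimates, and LimitRange constraints.

```bash
#!/usr/bin/env bash
# Sketch of how the VPA policy above shapes a CPU recommendation (millicores).

# Clamp a raw recommendation into [minAllowed, maxAllowed] = [100m, 4000m].
vpa_clamp() {
  local rec=$1 min=${2:-100} max=${3:-4000}
  if (( rec < min )); then rec=$min; fi
  if (( rec > max )); then rec=$max; fi
  echo "$rec"
}

# RequestsAndLimits: new limit = new request * (original limit / original request).
vpa_scaled_limit() {
  local orig_request=$1 orig_limit=$2 new_request=$3
  echo $(( new_request * orig_limit / orig_request ))
}

vpa_clamp 50                   # -> 100 (raised to minAllowed)
vpa_clamp 6000                 # -> 4000 (capped at maxAllowed)
vpa_scaled_limit 250 500 400   # -> 800 (2:1 limit:request ratio kept)
```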

### KEDA ScaledObject (Event-Driven)

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: myapp-queue-scaler
spec:
  scaleTargetRef:
    name: myapp
  minReplicaCount: 0  # Scale to zero
  maxReplicaCount: 100
  pollingInterval: 30
  cooldownPeriod: 300
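  # Each authenticationRef must name a TriggerAuthentication in the same
  # namespace (for example, one using podIdentity provider azure-workload)
  triggers: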
    # Azure Service Bus Queue
    - type: azure-servicebus
      metadata:
        queueName: myqueue
        namespace: myservicebus
        messageCount: "5"
      authenticationRef:
        name: azure-servicebus-auth

    # Azure Storage Queue
    - type: azure-queue
      metadata:
        queueName: myqueue
        queueLength: "10"
        accountName: mystorageaccount
      authenticationRef:
        name: azure-storage-auth

    # Prometheus metrics
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc.cluster.local:9090
        threshold: "100"
        query: sum(rate(http_requests_total[2m]))
```
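KEDA exposes each trigger's value to an HPA it manages, using an average-value target by default. For a queue trigger, the replica count therefore tends toward `ceil(queueLength / messageCount)`, capped at `maxReplicaCount`. The hypothetical helper below sketches this for the Service Bus (`messageCount: "5"`) and Storage queue (`queueLength: "10"`) triggers above. It ignores HPA tolerance, activation thresholds, and the `cooldownPeriod` KEDA waits before scaling an idle workload to zero.

```bash
#!/usr/bin/env bash
# Sketch of the replica count a KEDA queue trigger drives toward.
keda_queue_replicas() {
  local queue_len=$1 per_replica=$2 max=$3
  if (( queue_len == 0 )); then
    echo 0   # idle: eligible for scale-to-zero after cooldownPeriod
    return
  fi
  local desired=$(( (queue_len + per_replica - 1) / per_replica ))  # ceil
  if (( desired > max )); then desired=$max; fi
  echo "$desired"
}

sb=$(keda_queue_replicas 42 5 100)     # Service Bus, messageCount "5"   -> 9
sq=$(keda_queue_replicas 120 10 100)   # Storage queue, queueLength "10" -> 12
# With several triggers, the underlying HPA uses the largest proposal.
echo $(( sb > sq ? sb : sq ))          # -> 12
```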

## Workload Identity (Replaces AAD Pod Identity)

### Setup

```bash
# Workload identity is enabled by default in AKS Automatic

# Create managed identity
az identity create \
  --name myapp-identity \
  --resource-group MyRG

# Get identity details
export IDENTITY_CLIENT_ID=$(az identity show -g MyRG -n myapp-identity --query clientId -o tsv)
export IDENTITY_OBJECT_ID=$(az identity show -g MyRG -n myapp-identity --query principalId -o tsv)

# Assign role to identity
az role assignment create \
  --assignee $IDENTITY_OBJECT_ID \
  --role "Storage Blob Data Contributor" \
  --scope /subscriptions/<sub-id>/resourceGroups/MyRG/providers/Microsoft.Storage/storageAccounts/mystorage

# Create federated credential
export AKS_OIDC_ISSUER=$(az aks show -g MyRG -n MyAKSAutomatic --query oidcIssuerProfile.issuerUrl -o tsv)

az identity federated-credential create \
  --name myapp-federated-credential \
  --identity-name myapp-identity \
  --resource-group MyRG \
  --issuer $AKS_OIDC_ISSUER \
  --subject system:serviceaccount:default:myapp-sa
```

### Kubernetes Resources

```yaml
# Service Account
apiVersion: v1
kind: ServiceAccount
metadata:
  name: myapp-sa
  namespace: default
  annotations:
    azure.workload.identity/client-id: "<IDENTITY_CLIENT_ID>"

---
# Deployment using workload identity
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
        azure.workload.identity/use: "true"  # Enable workload identity
    spec:
      serviceAccountName: myapp-sa
      containers:
        - name: myapp
          image: myregistry.azurecr.io/myapp:latest
          # No manual token setup is needed: for pods labeled
          # azure.workload.identity/use=true, the workload identity webhook
          # injects AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_AUTHORITY_HOST and
          # AZURE_FEDERATED_TOKEN_FILE, and mounts the projected token at
          # /var/run/secrets/azure/tokens/azure-identity-token
```

## Monitoring and Observability

### Enable Container Insights

```bash
# Already enabled with --enable-addons monitoring
# Query logs using Azure Monitor

# Get cluster logs
az monitor log-analytics query \
  --workspace <workspace-id> \
  --analytics-query "KubePodInventory | where ClusterName == 'MyAKSAutomatic' | take 10" \
  --output table

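# Karpenter (node auto-provisioning) runs as a managed component in AKS,
# so its pods may not appear in kube-system; watch its events instead
kubectl get events -A --field-selector source=karpenter -w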
```

### Prometheus and Grafana

```bash
# Enable managed Prometheus
az aks update \
  --resource-group MyRG \
  --name MyAKSAutomatic \
  --enable-azure-monitor-metrics

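# To use Grafana dashboards, link an Azure Managed Grafana instance by adding
# --grafana-resource-id <grafana-resource-id>, then open it from the Azure portal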
```

## Cost Optimization

### Billing Model (October 2025)
- **Control plane**: $0.16/hour per cluster
- **Compute**: Pay for actual node usage
- **Karpenter**: Automatic bin-packing and consolidation
- **Scale-to-zero**: Possible with KEDA and Karpenter

### Cost-Saving Tips

1. **Use Spot Instances for Non-Critical Workloads**
```yaml
- key: karpenter.sh/capacity-type
  operator: In
  values: ["spot"]
```

2. **Configure Aggressive Consolidation**
```yaml
disruption:
  consolidationPolicy: WhenEmptyOrUnderutilized
  consolidateAfter: 30s
```

3. **Implement Pod Disruption Budgets**
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: myapp
```

4. **Use VPA for Right-Sizing**
- VPA automatically adjusts resource requests based on actual usage
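Karpenter drains nodes through the eviction API, so the PodDisruptionBudget above decides how many `myapp` pods consolidation can remove at once. A PDB allows `healthy - desiredHealthy` disruptions, and a percentage `minAvailable` is rounded up against the expected pod count. `pdb_disruptions_allowed` is a hypothetical helper that sketches this arithmetic.

```bash
#!/usr/bin/env bash
# Sketch of PodDisruptionBudget math: allowed = healthy - desired_healthy.
pdb_disruptions_allowed() {
  local expected=$1 healthy=$2 min_available=$3
  local desired
  if [[ $min_available == *% ]]; then
    local pct=${min_available%\%}
    desired=$(( (expected * pct + 99) / 100 ))   # percentages round up
  else
    desired=$min_available
  fi
  local allowed=$(( healthy - desired ))
  if (( allowed < 0 )); then allowed=0; fi
  echo "$allowed"
}

pdb_disruptions_allowed 2 2 1       # myapp-pdb with 2 replicas -> 1
pdb_disruptions_allowed 1 1 1       # single replica: eviction blocked -> 0
pdb_disruptions_allowed 5 5 "50%"   # ceil(2.5) = 3 must stay up -> 2
```

A single-replica workload with `minAvailable: 1` blocks voluntary eviction entirely, which can keep Karpenter from consolidating the node it runs on.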

## Migration from Standard AKS to Automatic

AKS Automatic is a separate cluster mode, and existing clusters cannot be converted in place. Migrate side by side:

1. **Create new AKS Automatic cluster**
2. **Install workloads in new cluster**
3. **Validate functionality**
4. **Switch traffic** (DNS, load balancer)
5. **Decommission old cluster**

## Best Practices

✓ Use AKS Automatic for new production clusters
✓ Enable workload identity for pod authentication
✓ Configure custom NodePools for specific workload types
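✓ Combine HPA, VPA, and KEDA without overlap: don't let HPA and VPA both act on CPU/memory for the same workload, and don't pair a KEDA ScaledObject with a separate HPA on the same Deployment (KEDA manages its own HPA)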
✓ Use spot instances for batch and fault-tolerant workloads
✓ Enable Container Insights and Managed Prometheus
✓ Configure Pod Disruption Budgets for critical apps
✓ Use network policies for microsegmentation
✓ Enable Azure Policy add-on for compliance
✓ Implement GitOps with Flux or Argo CD

## Troubleshooting

### Check Karpenter Status
```bash
# Karpenter runs as a managed component, so check its events rather than pod logs
kubectl get events -A --field-selector source=karpenter
kubectl get nodepools
kubectl get nodeclaims
```

### View Node Provisioning Events
```bash
kubectl get events --field-selector involvedObject.kind=NodePool -A
```

### Debug Workload Identity Issues
```bash
# Check service account annotation
kubectl get sa myapp-sa -o yaml

# Check pod labels
kubectl get pod <pod-name> -o yaml | grep azure.workload.identity

# Check federated credential
az identity federated-credential show \
  --identity-name myapp-identity \
  --resource-group MyRG \
  --name myapp-federated-credential
```
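A common cause of workload identity failures is a mismatch between the projected token's claims and the federated credential. The `iss` claim should equal the cluster's OIDC issuer URL. The `sub` claim should be `system:serviceaccount:<namespace>:<service-account>`.

The hypothetical helpers below decode the token payload locally for comparison. This is a sketch only: it does not verify the signature, and it assumes compact JSON, which is what the API server emits.

```bash
#!/usr/bin/env bash
# Decode the payload segment of a JWT (base64url, padding stripped).
jwt_payload() {
  local payload
  payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  case $(( ${#payload} % 4 )) in   # restore base64 padding
    2) payload="${payload}==" ;;
    3) payload="${payload}=" ;;
  esac
  printf '%s' "$payload" | base64 -d
}

# Crude extraction of a string claim such as "iss" or "sub".
jwt_claim() {
  jwt_payload "$1" | grep -o "\"$2\":\"[^\"]*\"" | cut -d'"' -f4
}

# Subject format the federated credential must match.
expected_subject() {
  printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}

# Usage against a running pod (needs cluster access and 'cat' in the image):
# token=$(kubectl exec <pod-name> -- cat /var/run/secrets/azure/tokens/azure-identity-token)
# [ "$(jwt_claim "$token" sub)" = "$(expected_subject default myapp-sa)" ] && echo "subject matches"
# jwt_claim "$token" iss   # compare with: az aks show ... --query oidcIssuerProfile.issuerUrl
```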

## References

- [AKS Automatic Documentation](https://learn.microsoft.com/en-us/azure/aks/automatic)
- [Karpenter on Azure](https://karpenter.sh)
- [Workload Identity](https://learn.microsoft.com/en-us/azure/aks/workload-identity-overview)
- [AKS Release Notes](https://github.com/Azure/AKS/releases)

AKS Automatic reduces day-to-day cluster operations by automating node provisioning, scaling, upgrades, and security defaults.