# Kubernetes Performance Troubleshooting

A systematic approach to diagnosing and resolving Kubernetes performance issues.

## Table of Contents

1. [High Latency Issues](#high-latency-issues)
2. [CPU Performance](#cpu-performance)
3. [Memory Performance](#memory-performance)
4. [Network Performance](#network-performance)
5. [Storage I/O Performance](#storage-io-performance)
6. [Application-Level Metrics](#application-level-metrics)
7. [Cluster-Wide Performance](#cluster-wide-performance)

---

## High Latency Issues

### Symptoms

- Slow API response times
- Increased request latency
- Timeouts
- Degraded user experience

### Investigation Workflow

**1. Identify the layer with latency:**

```bash
# Check service mesh metrics (if using Istio/Linkerd)
kubectl top pods -n <namespace>

# Check ingress controller metrics
kubectl logs -n ingress-nginx <ingress-controller-pod> | grep "request_time"

# Check application logs for slow requests
kubectl logs <pod-name> -n <namespace> | grep -i "slow\|timeout\|latency"
```

**2. Profile application performance:**

```bash
# Get pod metrics
kubectl top pod <pod-name> -n <namespace>

# Check CPU requests/limits (a low limit is the usual cause of throttling)
kubectl get pod <pod-name> -n <namespace> -o json | \
  jq '.spec.containers[].resources'

# Exec into the pod and check application-specific metrics
kubectl exec -it <pod-name> -n <namespace> -- /bin/sh
# Then: curl localhost:8080/metrics (if Prometheus metrics are available)
```

**3. Check dependencies:**

```bash
# Test connectivity to downstream services
kubectl exec -it <pod-name> -n <namespace> -- \
  curl -w "@curl-format.txt" -o /dev/null -s http://backend-service

# curl-format.txt content:
#   time_namelookup:    %{time_namelookup}\n
#   time_connect:       %{time_connect}\n
#   time_appconnect:    %{time_appconnect}\n
#   time_pretransfer:   %{time_pretransfer}\n
#   time_redirect:      %{time_redirect}\n
#   time_starttransfer: %{time_starttransfer}\n
#   time_total:         %{time_total}\n
```

### Common Causes and Solutions

**CPU Throttling:**

```yaml
# Increase CPU limits or remove limits for bursty workloads
resources:
  requests:
    cpu: "500m"    # What the pod typically needs
  limits:
    cpu: "2000m"   # Burst capacity (or remove for unlimited)
```

**Insufficient Replicas:**

```bash
# Scale up the deployment
kubectl scale deployment <deployment-name> -n <namespace> --replicas=5

# Or enable HPA
kubectl autoscale deployment <deployment-name> \
  --cpu-percent=70 \
  --min=2 \
  --max=10
```

**Slow Dependencies:**

```yaml
# Implement circuit breakers and timeouts in the application,
# or use service mesh policies (Istio example):
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: backend-circuit-breaker
spec:
  host: backend-service
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        http1MaxPendingRequests: 50
        http2MaxRequests: 100
    outlierDetection:
      consecutiveErrors: 5
      interval: 30s
      baseEjectionTime: 30s
```

---

## CPU Performance

### Symptoms

- High CPU usage
- Throttling
- Slow processing
- Queue buildup

### Investigation Commands

```bash
# Check CPU usage
kubectl top nodes
kubectl top pods -n <namespace>

# Check CPU requests/limits for throttling risk
kubectl get pod <pod-name> -n <namespace> -o json | \
  jq '.spec.containers[].resources'

# Get detailed CPU metrics (requires metrics-server)
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/<namespace>/pods/<pod-name>" | jq

# Check container-level CPU from the node (SSH to node)
ssh <node> "docker stats --no-stream"   # or "crictl stats" on containerd/CRI-O nodes
```

### Advanced CPU Profiling

**Enable CPU profiling in the application:**

```bash
# For Go applications with pprof
kubectl port-forward <pod-name> 6060:6060 -n <namespace>

# Capture a CPU profile
curl "http://localhost:6060/debug/pprof/profile?seconds=30" > cpu.prof

# Analyze with pprof
go tool pprof -http=:8080 cpu.prof
```

**For Java applications:**

```bash
# Use async-profiler (profile PID 1 for 30s, write a flamegraph)
kubectl exec -it <pod-name> -n <namespace> -- \
  /profiler.sh -d 30 -f /tmp/flamegraph.html 1

# Copy the flamegraph locally
kubectl cp <namespace>/<pod-name>:/tmp/flamegraph.html ./flamegraph.html
```
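Whichever runtime you profile, it helps to confirm that CFS throttling is actually occurring before applying the solutions below. A minimal sketch that reads the kernel's throttling counters from the container's cgroup; it assumes a cgroup v2 node and an image with a shell (on cgroup v1 the file is `/sys/fs/cgroup/cpu/cpu.stat` and the time counter is `throttled_time`, in nanoseconds). If cAdvisor metrics are scraped, `container_cpu_cfs_throttled_periods_total` shows the same signal.

```bash
# Read CFS throttling counters from inside the container (cgroup v2 path assumed)
#   nr_periods     - enforcement periods elapsed
#   nr_throttled   - periods in which the container hit its CPU quota
#   throttled_usec - total time spent throttled
kubectl exec <pod-name> -n <namespace> -- cat /sys/fs/cgroup/cpu.stat

# Sample twice, 30s apart; a steadily rising nr_throttled means the CPU limit is too low
kubectl exec <pod-name> -n <namespace> -- \
  sh -c 'cat /sys/fs/cgroup/cpu.stat; sleep 30; echo ---; cat /sys/fs/cgroup/cpu.stat'
```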
### Solutions

**Vertical Scaling:**

```yaml
resources:
  requests:
    cpu: "1000m"   # Increased from 500m
  limits:
    cpu: "2000m"   # Increased from 1000m
```

**Horizontal Scaling:**

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```

**Remove CPU Limits for Bursty Workloads:**

```yaml
# Allow bursting into available CPU on the node
resources:
  requests:
    cpu: "500m"
  # No limits - the container can use any idle CPU
```

---

## Memory Performance

### Symptoms

- OOMKilled pods
- Memory leaks
- Slow garbage collection
- Swap usage (if enabled)

### Investigation Commands

```bash
# Check memory usage
kubectl top nodes
kubectl top pods -n <namespace>

# Check memory limits and requests
kubectl describe pod <pod-name> -n <namespace> | grep -A 5 "Limits\|Requests"

# Find pods that were OOMKilled
kubectl get pods -n <namespace> -o json | \
  jq '.items[] | select(.status.containerStatuses[]?.lastState.terminated.reason == "OOMKilled") | .metadata.name'

# Detailed memory breakdown (requires metrics-server)
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/<namespace>/pods/<pod-name>" | \
  jq '.containers[] | {name, usage: .usage.memory}'
```

### Memory Profiling

**Heap dump for Java:**

```bash
# Capture a heap dump of PID 1
kubectl exec <pod-name> -n <namespace> -- \
  jmap -dump:format=b,file=/tmp/heapdump.hprof 1

# Copy the heap dump locally
kubectl cp <namespace>/<pod-name>:/tmp/heapdump.hprof ./heapdump.hprof

# Analyze with Eclipse MAT or VisualVM
```

**Memory profiling for Go:**

```bash
# Capture a heap profile
kubectl port-forward <pod-name> 6060:6060 -n <namespace>
curl http://localhost:6060/debug/pprof/heap > heap.prof

# Analyze
go tool pprof -http=:8080 heap.prof
```

### Solutions

**Increase Memory Limits:**

```yaml
resources:
  requests:
    memory: "512Mi"
  limits:
    memory: "2Gi"   # Increased from 1Gi
```

**Optimize the Application:**

- Fix memory leaks
- Implement connection pooling
- Optimize caching strategies
- Tune garbage collection

**Use Memory-Optimized Node Pools:**

```yaml
# Node affinity for memory-intensive workloads
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: workload-type
          operator: In
          values:
          - memory-optimized
```

---

## Network Performance

### Symptoms

- High network latency
- Packet loss
- Connection timeouts
- Bandwidth saturation

### Investigation Commands

```bash
# Check pod network statistics
kubectl exec <pod-name> -n <namespace> -- netstat -s

# Test network performance between pods: deploy a netperf client
kubectl run netperf-client --image=networkstatic/netperf --rm -it -- /bin/bash
# From the client, run:
netperf -H <target-pod-ip> -t TCP_STREAM   # Throughput
netperf -H <target-pod-ip> -t TCP_RR       # Request-response latency

# Check DNS resolution time
kubectl exec <pod-name> -n <namespace> -- \
  time nslookup service-name.namespace.svc.cluster.local

# Check service mesh overhead (if using Istio)
kubectl exec <pod-name> -n <namespace> -c istio-proxy -- \
  curl -s localhost:15000/stats | grep "http.inbound\|http.outbound"
```

### Check Network Policies

```bash
# List network policies
kubectl get networkpolicies -n <namespace>

# Check whether a policy is blocking traffic
kubectl describe networkpolicy <policy-name> -n <namespace>

# Temporarily remove a policy to test (non-production only)
kubectl delete networkpolicy <policy-name> -n <namespace>
```

### Solutions

**DNS Optimization:**

```bash
# Use CoreDNS caching (the cache plugin) and increase CoreDNS replicas
kubectl scale deployment coredns -n kube-system --replicas=5

# Or use NodeLocal DNSCache:
# https://kubernetes.io/docs/tasks/administer-cluster/nodelocaldns/
```
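Before and after scaling CoreDNS, its own metrics show whether DNS is really the bottleneck. A minimal sketch, assuming the default Corefile keeps the `prometheus` plugin listening on port 9153 (exact metric names can differ slightly between CoreDNS versions):

```bash
# In one terminal: expose the CoreDNS metrics endpoint locally
kubectl -n kube-system port-forward deployment/coredns 9153:9153

# In another terminal: request latency and cache effectiveness
curl -s http://localhost:9153/metrics | \
  grep -E 'coredns_dns_request_duration_seconds|coredns_cache_(hits|misses)_total'
```

A low cache hit ratio or rising request latency points at CoreDNS capacity; otherwise the slowness is more likely in the application's resolver configuration or the network path.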
**Optimize Service Mesh:**

```yaml
# Reduce Istio sidecar resources if they are over-provisioned (pod annotations)
sidecar.istio.io/proxyCPU: "100m"
sidecar.istio.io/proxyMemory: "128Mi"

# Or disable injection for internal, trusted services
sidecar.istio.io/inject: "false"
```

**Use HostNetwork for Network-Intensive Pods:**

```yaml
# Use with caution - bypasses pod networking
spec:
  hostNetwork: true
  dnsPolicy: ClusterFirstWithHostNet
```

**Enable Bandwidth Limits (QoS):**

```yaml
metadata:
  annotations:
    kubernetes.io/ingress-bandwidth: "10M"
    kubernetes.io/egress-bandwidth: "10M"
```

---

## Storage I/O Performance

### Symptoms

- Slow read/write operations
- High I/O wait
- Application timeouts during disk operations
- Database performance issues

### Investigation Commands

```bash
# Check I/O metrics on the node
ssh <node> "iostat -x 1 10"

# Check disk usage
kubectl exec <pod-name> -n <namespace> -- df -h

# Check I/O wait from the pod
kubectl exec <pod-name> -n <namespace> -- top

# Test storage write performance
kubectl exec <pod-name> -n <namespace> -- \
  dd if=/dev/zero of=/data/test bs=1M count=1024 conv=fdatasync

# Check the PV's storage class
kubectl get pv -o yaml | grep storageClassName
kubectl describe storageclass <storage-class-name>
```

### Storage Benchmarking

**Deploy fio for benchmarking:**

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: fio-benchmark
spec:
  containers:
  - name: fio
    image: ljishen/fio
    command: ["/bin/sh", "-c"]
    args:
    - |
      fio --name=seqread --rw=read --bs=1M --size=1G --runtime=60 --filename=/data/test
      fio --name=seqwrite --rw=write --bs=1M --size=1G --runtime=60 --filename=/data/test
      fio --name=randread --rw=randread --bs=4k --size=1G --runtime=60 --filename=/data/test
      fio --name=randwrite --rw=randwrite --bs=4k --size=1G --runtime=60 --filename=/data/test
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: test-pvc
```

### Solutions

**Use a Higher-Performance Storage Class:**

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: high-performance-pvc
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: gp3   # or io2, premium-rwo (GKE), etc.
  resources:
    requests:
      storage: 100Gi
```
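Before switching classes, confirm what actually backs the slow volume; a claim silently bound to a default, low-IOPS class is a common surprise. A small sketch, where `<pvc-name>` and `<storage-class-name>` are placeholders for your objects:

```bash
# Which storage class and volume is the PVC actually using?
kubectl get pvc <pvc-name> -n <namespace> \
  -o jsonpath='{.spec.storageClassName}{" -> "}{.spec.volumeName}{"\n"}'

# Which provisioner and parameters does that class map to?
kubectl get storageclass <storage-class-name> \
  -o jsonpath='{.provisioner}{"  "}{.parameters}{"\n"}'
```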
**Provision IOPS (AWS EBS io2):**

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: io2-high-iops
provisioner: ebs.csi.aws.com
parameters:
  type: io2
  iops: "10000"
  fsType: ext4
volumeBindingMode: WaitForFirstConsumer
```

**Use Local NVMe for Ultra-Low Latency:**

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-nvme
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv
spec:
  capacity:
    storage: 100Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-nvme
  local:
    path: /mnt/disks/nvme0n1
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - node-with-nvme
```

---

## Application-Level Metrics

### Expose Prometheus Metrics

**Add a metrics endpoint to the application:**

```yaml
apiVersion: v1
kind: Service
metadata:
  name: app-metrics
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"
    prometheus.io/path: "/metrics"
spec:
  selector:
    app: myapp
  ports:
  - name: metrics
    port: 8080
    targetPort: 8080
```

### Key Metrics to Monitor

**Application metrics:**

- Request rate
- Request latency (p50, p95, p99)
- Error rate
- Active connections
- Queue depth
- Cache hit rate

**Example Prometheus queries:**

```promql
# P95 latency
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))

# Error rate
sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))

# Request rate
sum(rate(http_requests_total[5m]))
```

### Distributed Tracing

**Implement OpenTelemetry:**

```yaml
# Deploy Jaeger (all-in-one, suitable for development/testing)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jaeger
spec:
  selector:
    matchLabels:
      app: jaeger
  template:
    metadata:
      labels:
        app: jaeger
    spec:
      containers:
      - name: jaeger
        image: jaegertracing/all-in-one:latest
        ports:
        - containerPort: 16686   # UI
        - containerPort: 14268   # Collector
```

**Instrument the application:**

- Add the OpenTelemetry SDK to the application
- Configure trace export to Jaeger
- Analyze end-to-end request traces to identify bottlenecks

---

## Cluster-Wide Performance

### Cluster Resource Utilization

```bash
# Overall cluster capacity
kubectl top nodes

# Allocated resources per node
kubectl describe nodes | grep -A 5 "Allocated resources"

# Resource requests vs limits across the cluster
kubectl get pods --all-namespaces -o json | \
  jq -r '.items[] | "\(.metadata.namespace)/\(.metadata.name) \(.spec.containers[].resources)"'
```

### Control Plane Performance

```bash
# Check API server latency
kubectl get --raw /metrics | grep apiserver_request_duration_seconds

# Check etcd performance
kubectl exec -it -n kube-system etcd-<node-name> -- \
  etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  check perf

# Controller manager metrics
kubectl get --raw /metrics | grep workqueue_depth
```

### Scheduler Performance

```bash
# Check scheduler latency
kubectl get --raw /metrics | grep scheduler_scheduling_duration_seconds

# Check pending pods
kubectl get pods --all-namespaces --field-selector status.phase=Pending

# Scheduler logs
kubectl logs -n kube-system kube-scheduler-<node-name>
```

### Solutions for Cluster-Wide Issues

**Scale Control Plane:**

- Add more control plane nodes
- Increase API server replicas
- Tune etcd (increase memory, use SSD)

**Optimize Scheduling:**

- Use pod priority and preemption
- Implement pod topology spread constraints
- Use node affinity/anti-affinity appropriately (check the current spread first, as sketched below)
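Before adding affinity or spread rules, it is worth seeing how replicas are currently distributed; a skewed spread often explains hot nodes on its own. A small sketch, assuming the workload's pods carry an `app=<label>` selector (hypothetical here) and nodes use the standard `topology.kubernetes.io/zone` label:

```bash
# Where are the replicas running right now?
kubectl get pods -n <namespace> -l app=<label> -o wide

# Show each node's zone to spot zonal skew
kubectl get nodes -L topology.kubernetes.io/zone

# Count pods per node for the workload
kubectl get pods -n <namespace> -l app=<label> \
  -o custom-columns=NODE:.spec.nodeName --no-headers | sort | uniq -c | sort -rn
```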
**Resource Management:**

- Set appropriate resource requests and limits
- Use LimitRanges and ResourceQuotas
- Implement VerticalPodAutoscaler for right-sizing

---

## Performance Optimization Checklist

### Application Level

- [ ] Implement connection pooling
- [ ] Enable response caching
- [ ] Optimize database queries
- [ ] Use async/non-blocking I/O
- [ ] Implement circuit breakers
- [ ] Profile and optimize hot paths

### Kubernetes Level

- [ ] Set appropriate resource requests/limits
- [ ] Use HPA for auto-scaling
- [ ] Implement readiness/liveness probes correctly
- [ ] Use anti-affinity for high availability
- [ ] Optimize container image size
- [ ] Use multi-stage builds

### Infrastructure Level

- [ ] Use appropriate instance/node types
- [ ] Enable cluster autoscaling
- [ ] Use high-performance storage classes
- [ ] Optimize network topology
- [ ] Implement monitoring and alerting
- [ ] Run regular performance tests

---

## Monitoring Tools

**Essential tools** (a quick health check is sketched below):

- **Prometheus + Grafana**: Metrics and dashboards
- **Jaeger/Zipkin**: Distributed tracing
- **kube-state-metrics**: Kubernetes object metrics
- **node-exporter**: Node-level metrics
- **cAdvisor**: Container metrics
- **kubectl-flamegraph**: CPU profiling

**Commercial options:**

- Datadog
- New Relic
- Dynatrace
- Elastic APM
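Several commands in this guide depend on metrics-server and the scrape targets above actually being installed and healthy. A quick sanity check; the pod and deployment names are assumptions that vary by installation method (Helm chart, operator, managed add-on):

```bash
# metrics-server backs `kubectl top` and the metrics.k8s.io API
kubectl get apiservices v1beta1.metrics.k8s.io
kubectl top nodes > /dev/null && echo "metrics-server OK"

# Check that the rest of the open-source stack is running
# (workload names differ between installs)
kubectl get pods --all-namespaces | grep -E 'kube-state-metrics|node-exporter|prometheus'
```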