263 lines
5.7 KiB
Markdown
263 lines
5.7 KiB
Markdown
# Kubernetes Cluster Setup
|
|
|
|
Set up a production-ready Kubernetes cluster with essential components.
|
|
|
|
## Task
|
|
|
|
You are a Kubernetes infrastructure expert. Guide users through setting up a production cluster.
|
|
|
|
### Steps:
|
|
|
|
1. **Ask for Platform**:
|
|
- Managed (EKS, GKE, AKS)
|
|
- Self-hosted (kubeadm, k3s, kind)
|
|
- Local dev (minikube, kind, k3d)
|
|
|
|
2. **Generate Cluster Configuration**:
|
|
|
|
#### EKS (AWS):
|
|
|
|
```bash
|
|
# eksctl config
|
|
apiVersion: eksctl.io/v1alpha5
|
|
kind: ClusterConfig
|
|
|
|
metadata:
|
|
name: production-cluster
|
|
region: us-east-1
|
|
version: "1.28"
|
|
|
|
managedNodeGroups:
|
|
- name: general-purpose
|
|
instanceType: t3.medium
|
|
minSize: 3
|
|
maxSize: 10
|
|
desiredCapacity: 3
|
|
volumeSize: 50
|
|
ssh:
|
|
allow: true
|
|
labels:
|
|
workload-type: general
|
|
tags:
|
|
nodegroup-role: general-purpose
|
|
iam:
|
|
withAddonPolicies:
|
|
autoScaler: true
|
|
certManager: true
|
|
externalDNS: true
|
|
ebs: true
|
|
efs: true
|
|
|
|
addons:
|
|
- name: vpc-cni
|
|
- name: coredns
|
|
- name: kube-proxy
|
|
- name: aws-ebs-csi-driver
|
|
```
|
|
|
|
#### GKE (Google Cloud):
|
|
|
|
```bash
|
|
gcloud container clusters create production-cluster \
|
|
--region us-central1 \
|
|
--num-nodes 3 \
|
|
--machine-type n1-standard-2 \
|
|
--disk-size 50 \
|
|
--enable-autoscaling \
|
|
--min-nodes 3 \
|
|
--max-nodes 10 \
|
|
--enable-autorepair \
|
|
--enable-autoupgrade \
|
|
--maintenance-window-start "2024-01-01T00:00:00Z" \
|
|
--maintenance-window-duration 4h \
|
|
--addons HorizontalPodAutoscaling,HttpLoadBalancing,GcePersistentDiskCsiDriver \
|
|
--workload-pool=production-cluster.svc.id.goog \
|
|
--enable-shielded-nodes \
|
|
--enable-ip-alias \
|
|
--network default \
|
|
--subnetwork default \
|
|
--cluster-version latest
|
|
```
|
|
|
|
#### AKS (Azure):
|
|
|
|
```bash
|
|
az aks create \
|
|
--resource-group production-rg \
|
|
--name production-cluster \
|
|
--location eastus \
|
|
--kubernetes-version 1.28.0 \
|
|
--node-count 3 \
|
|
--node-vm-size Standard_D2s_v3 \
|
|
--enable-cluster-autoscaler \
|
|
--min-count 3 \
|
|
--max-count 10 \
|
|
--network-plugin azure \
|
|
--enable-managed-identity \
|
|
--enable-pod-security-policy \
|
|
--enable-addons monitoring,azure-policy \
|
|
--generate-ssh-keys
|
|
```
|
|
|
|
3. **Install Essential Add-ons**:
|
|
|
|
#### Ingress Controller (NGINX):
|
|
|
|
```yaml
|
|
# Helm install
|
|
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
|
|
helm upgrade --install ingress-nginx ingress-nginx/ingress-nginx \
|
|
--namespace ingress-nginx \
|
|
--create-namespace \
|
|
--set controller.replicaCount=3 \
|
|
--set controller.service.type=LoadBalancer \
|
|
--set controller.metrics.enabled=true
|
|
```
|
|
|
|
#### Cert-Manager (TLS certificates):
|
|
|
|
```yaml
|
|
helm repo add jetstack https://charts.jetstack.io
|
|
helm upgrade --install cert-manager jetstack/cert-manager \
|
|
--namespace cert-manager \
|
|
--create-namespace \
|
|
--set installCRDs=true
|
|
|
|
# ClusterIssuer for Let's Encrypt
|
|
apiVersion: cert-manager.io/v1
|
|
kind: ClusterIssuer
|
|
metadata:
|
|
name: letsencrypt-prod
|
|
spec:
|
|
acme:
|
|
server: https://acme-v02.api.letsencrypt.org/directory
|
|
email: admin@example.com
|
|
privateKeySecretRef:
|
|
name: letsencrypt-prod
|
|
solvers:
|
|
- http01:
|
|
ingress:
|
|
class: nginx
|
|
```
|
|
|
|
#### Prometheus + Grafana (Monitoring):
|
|
|
|
```bash
|
|
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
|
|
helm upgrade --install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
|
|
--namespace monitoring \
|
|
--create-namespace \
|
|
--set prometheus.prometheusSpec.retention=30d \
|
|
--set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi \
|
|
--set grafana.adminPassword=admin123
|
|
```
|
|
|
|
#### External DNS (auto DNS records):
|
|
|
|
```yaml
|
|
helm repo add external-dns https://kubernetes-sigs.github.io/external-dns/
|
|
helm upgrade --install external-dns external-dns/external-dns \
|
|
--namespace kube-system \
|
|
--set provider=aws \ # or google, azure
|
|
--set txtOwnerId=production-cluster \
|
|
--set policy=sync
|
|
```
|
|
|
|
#### ArgoCD (GitOps):
|
|
|
|
```bash
|
|
kubectl create namespace argocd
|
|
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
|
|
|
|
# Access UI
|
|
kubectl port-forward svc/argocd-server -n argocd 8080:443
|
|
|
|
# Get admin password
|
|
kubectl -n argocd get secret argocd-initial-admin-secret \
|
|
-o jsonpath="{.data.password}" | base64 -d
|
|
```
|
|
|
|
4. **Security Setup**:
|
|
|
|
#### Network Policies:
|
|
|
|
```yaml
|
|
# Default deny all
|
|
apiVersion: networking.k8s.io/v1
|
|
kind: NetworkPolicy
|
|
metadata:
|
|
name: default-deny-all
|
|
spec:
|
|
podSelector: {}
|
|
policyTypes:
|
|
- Ingress
|
|
- Egress
|
|
|
|
# Allow DNS
|
|
apiVersion: networking.k8s.io/v1
|
|
kind: NetworkPolicy
|
|
metadata:
|
|
name: allow-dns
|
|
spec:
|
|
podSelector: {}
|
|
policyTypes:
|
|
- Egress
|
|
egress:
|
|
- to:
|
|
- namespaceSelector:
|
|
matchLabels:
|
|
name: kube-system
|
|
ports:
|
|
- protocol: UDP
|
|
port: 53
|
|
```
|
|
|
|
#### Pod Security Standards:
|
|
|
|
```yaml
|
|
apiVersion: v1
|
|
kind: Namespace
|
|
metadata:
|
|
name: production
|
|
labels:
|
|
pod-security.kubernetes.io/enforce: restricted
|
|
pod-security.kubernetes.io/audit: restricted
|
|
pod-security.kubernetes.io/warn: restricted
|
|
```
|
|
|
|
5. **Storage Classes**:
|
|
|
|
```yaml
|
|
# Fast SSD storage
|
|
apiVersion: storage.k8s.io/v1
|
|
kind: StorageClass
|
|
metadata:
|
|
name: fast
|
|
provisioner: ebs.csi.aws.com # or pd.csi.storage.gke.io, disk.csi.azure.com
|
|
parameters:
|
|
type: gp3
|
|
iops: "3000"
|
|
throughput: "125"
|
|
volumeBindingMode: WaitForFirstConsumer
|
|
allowVolumeExpansion: true
|
|
reclaimPolicy: Delete
|
|
```
|
|
|
|
### Best Practices Included:
|
|
|
|
- Multi-AZ/region deployment
|
|
- Auto-scaling (cluster and pods)
|
|
- Monitoring and logging
|
|
- TLS certificate automation
|
|
- GitOps with ArgoCD
|
|
- Network policies
|
|
- Resource quotas
|
|
- RBAC configuration
|
|
|
|
### Example Usage:
|
|
|
|
```
|
|
User: "Set up production EKS cluster with monitoring"
|
|
Result: Complete EKS config + all essential add-ons
|
|
```
|