gh-anton-abyzov-specweave-p…/commands/cluster-setup.md
2025-11-29 17:56:51 +08:00


# Kubernetes Cluster Setup
Set up a production-ready Kubernetes cluster with essential components.
## Task
You are a Kubernetes infrastructure expert. Guide users through setting up a production cluster.
### Steps:
1. **Ask for Platform**:
- Managed (EKS, GKE, AKS)
- Self-hosted (kubeadm, k3s)
- Local dev (minikube, kind, k3d)
2. **Generate Cluster Configuration**:
#### EKS (AWS):
```yaml
# eksctl config: save as cluster.yaml, then run `eksctl create cluster -f cluster.yaml`
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: production-cluster
  region: us-east-1
  version: "1.28"
managedNodeGroups:
  - name: general-purpose
    instanceType: t3.medium
    minSize: 3
    maxSize: 10
    desiredCapacity: 3
    volumeSize: 50
    ssh:
      allow: true
    labels:
      workload-type: general
    tags:
      nodegroup-role: general-purpose
    iam:
      withAddonPolicies:
        autoScaler: true
        certManager: true
        externalDNS: true
        ebs: true
        efs: true
addons:
  - name: vpc-cni
  - name: coredns
  - name: kube-proxy
  - name: aws-ebs-csi-driver
```
#### GKE (Google Cloud):
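```bash
# Assumes PROJECT_ID is set to your GCP project ID (the workload pool must be PROJECT_ID.svc.id.goog).
# On a regional cluster, --num-nodes and the autoscaling limits apply per zone.
# The maintenance window is defined by a start, an end, and a recurrence rule.
gcloud container clusters create production-cluster \
  --region us-central1 \
  --num-nodes 3 \
  --machine-type n1-standard-2 \
  --disk-size 50 \
  --enable-autoscaling \
  --min-nodes 3 \
  --max-nodes 10 \
  --enable-autorepair \
  --enable-autoupgrade \
  --maintenance-window-start "2024-01-01T00:00:00Z" \
  --maintenance-window-end "2024-01-01T04:00:00Z" \
  --maintenance-window-recurrence "FREQ=DAILY" \
  --addons HorizontalPodAutoscaling,HttpLoadBalancing,GcePersistentDiskCsiDriver \
  --workload-pool="${PROJECT_ID}.svc.id.goog" \
  --enable-shielded-nodes \
  --enable-ip-alias \
  --network default \
  --subnetwork default \
  --cluster-version latest
```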
#### AKS (Azure):
```bash
# Pod Security Policy was removed in Kubernetes 1.25, so the PSP flag is omitted here;
# use the Pod Security Standards namespace labels from the Security Setup step instead.
az aks create \
  --resource-group production-rg \
  --name production-cluster \
  --location eastus \
  --kubernetes-version 1.28.0 \
  --node-count 3 \
  --node-vm-size Standard_D2s_v3 \
  --enable-cluster-autoscaler \
  --min-count 3 \
  --max-count 10 \
  --network-plugin azure \
  --enable-managed-identity \
  --enable-addons monitoring,azure-policy \
  --generate-ssh-keys
```
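#### kind (local dev):
Step 1 also offers local options such as kind, which the managed-platform examples do not cover. The following is a minimal sketch of a kind cluster config, assuming kind is installed; the file name `kind-config.yaml` and the cluster name `dev` in the comment are illustrative, not part of the original setup.

```yaml
# Hypothetical file: kind-config.yaml
# Create the cluster with: kind create cluster --name dev --config kind-config.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  # One control-plane node and two workers, enough to exercise scheduling locally
  - role: control-plane
  - role: worker
  - role: worker
```

A local cluster like this is meant for testing manifests and add-ons, not for production workloads.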
3. **Install Essential Add-ons**:
#### Ingress Controller (NGINX):
```bash
# Helm install
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm upgrade --install ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx \
  --create-namespace \
  --set controller.replicaCount=3 \
  --set controller.service.type=LoadBalancer \
  --set controller.metrics.enabled=true
```
#### Cert-Manager (TLS certificates):
```bash
helm repo add jetstack https://charts.jetstack.io
# Newer cert-manager charts (v1.15+) prefer `crds.enabled=true`; `installCRDs=true` is the older equivalent.
helm upgrade --install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --set installCRDs=true

# ClusterIssuer for Let's Encrypt (replace the email address with a real contact)
kubectl apply -f - <<'EOF'
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
      - http01:
          ingress:
            class: nginx
EOF
```
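To show how the `letsencrypt-prod` issuer is consumed, here is a minimal sketch of an Ingress that requests a certificate through it. The app name `example-app`, the host `app.example.com`, the Secret name, and the `production` namespace placement are assumptions for illustration.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-app                # hypothetical application name
  namespace: production
  annotations:
    # Tells cert-manager which ClusterIssuer should sign the certificate
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - app.example.com          # hypothetical hostname
      secretName: example-app-tls  # cert-manager stores the issued certificate here
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: example-app  # hypothetical Service
                port:
                  number: 80
```

The HTTP-01 solver only succeeds if the hostname already resolves to the ingress controller's load balancer.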
#### Prometheus + Grafana (Monitoring):
```bash
# Set GRAFANA_ADMIN_PASSWORD to a strong secret before running; do not hardcode it.
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm upgrade --install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace \
  --set prometheus.prometheusSpec.retention=30d \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi \
  --set grafana.adminPassword="${GRAFANA_ADMIN_PASSWORD}"
```
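Once the stack is running, application metrics are usually scraped by declaring a ServiceMonitor. This is a minimal sketch, assuming the Helm release is named `kube-prometheus-stack` as above and that the chart's default selector (matching the `release` label) is unchanged; the app label, namespace, and port name are hypothetical.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app                  # hypothetical
  namespace: monitoring
  labels:
    release: kube-prometheus-stack   # assumed to match the chart's default ServiceMonitor selector
spec:
  namespaceSelector:
    matchNames:
      - production                   # namespace where the target Service lives (assumption)
  selector:
    matchLabels:
      app: example-app               # must match the labels on the target Service
  endpoints:
    - port: metrics                  # name of the Service port that exposes /metrics
      interval: 30s
```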
#### External DNS (auto DNS records):
```bash
# provider can be aws, google, or azure
helm repo add external-dns https://kubernetes-sigs.github.io/external-dns/
helm upgrade --install external-dns external-dns/external-dns \
  --namespace kube-system \
  --set provider=aws \
  --set txtOwnerId=production-cluster \
  --set policy=sync
```
#### ArgoCD (GitOps):
```bash
kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
# Access UI
kubectl port-forward svc/argocd-server -n argocd 8080:443
# Get admin password
kubectl -n argocd get secret argocd-initial-admin-secret \
  -o jsonpath="{.data.password}" | base64 -d
```
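With ArgoCD installed, workloads are typically registered as Application resources. The sketch below assumes a hypothetical Git repository and path; only the `argocd` namespace and the in-cluster API server address come from a standard install.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: example-app                  # hypothetical
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/example-repo.git   # hypothetical repository
    targetRevision: main
    path: k8s/overlays/production                              # hypothetical path
  destination:
    server: https://kubernetes.default.svc   # the cluster ArgoCD itself runs in
    namespace: production
  syncPolicy:
    automated:
      prune: true      # delete resources that were removed from Git
      selfHeal: true   # revert manual changes made in the cluster
    syncOptions:
      - CreateNamespace=true
```

Automated sync with pruning is a deliberate choice here; leave `automated` out if changes should be applied manually from the UI or CLI.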
4. **Security Setup**:
#### Network Policies:
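```yaml
# Default deny all (applies to the namespace the policy is created in)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
---
# Allow DNS
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              # Label set automatically on every namespace (Kubernetes 1.21+)
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        # DNS falls back to TCP for large responses
        - protocol: TCP
          port: 53
```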
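With default deny in place, traffic from the ingress controller must be allowed explicitly. A minimal sketch, assuming the controller runs in the `ingress-nginx` namespace as installed earlier; the `app: example-app` label, the `production` namespace, and port 8080 are hypothetical.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-ingress-nginx
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: example-app               # hypothetical pod label
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx
      ports:
        - protocol: TCP
          port: 8080                 # hypothetical container port
```

Network policies only take effect when the cluster's CNI plugin enforces them.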
#### Pod Security Standards:
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
```
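Pods in a namespace that enforces `restricted` are rejected unless they set a hardened security context. The following is a minimal sketch of a compliant Pod; the pod name, image, and user ID are placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-app                  # hypothetical
  namespace: production
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 10001                 # any non-zero UID the image supports (assumption)
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: app
      image: registry.example.com/example-app:1.0.0   # hypothetical image
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]
        readOnlyRootFilesystem: true # not required by the profile, but a common hardening step
```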
5. **Storage Classes**:
```yaml
# Fast SSD storage
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast
provisioner: ebs.csi.aws.com # or pd.csi.storage.gke.io, disk.csi.azure.com
parameters:
  # These parameters are specific to the AWS EBS CSI driver
  type: gp3
  iops: "3000"
  throughput: "125"
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
reclaimPolicy: Delete
```
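Workloads request storage from this class through a PersistentVolumeClaim. A minimal sketch, with a hypothetical claim name and size:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-data                 # hypothetical
  namespace: production
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast             # the StorageClass defined above
  resources:
    requests:
      storage: 20Gi                  # hypothetical size
```

Because the class uses `WaitForFirstConsumer`, the claim stays `Pending` until a pod that mounts it is scheduled.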
### Best Practices Included:
- Multi-AZ/region deployment
- Auto-scaling (cluster and pods)
- Monitoring and logging
- TLS certificate automation
- GitOps with ArgoCD
- Network policies
- Resource quotas
- RBAC configuration
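Resource quotas and RBAC appear in this list without a manifest in the steps above. The following is a minimal sketch of both for the `production` namespace; the quota values, role name, and the `example-developers` group are assumptions to adapt.

```yaml
# Cap the total resources the namespace can request (values are illustrative)
apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: production
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
    persistentvolumeclaims: "20"
---
# Read-only access to common workload resources
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: app-viewer                   # hypothetical
  namespace: production
rules:
  - apiGroups: ["", "apps"]
    resources: ["pods", "services", "deployments"]
    verbs: ["get", "list", "watch"]
---
# Grant the role to a group supplied by the cluster's identity provider
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: app-viewer-binding
  namespace: production
subjects:
  - kind: Group
    name: example-developers         # hypothetical group name
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: app-viewer
  apiGroup: rbac.authorization.k8s.io
```

Once a quota covers CPU and memory, every pod in the namespace must declare requests and limits, or be given defaults through a LimitRange.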
### Example Usage:
```
User: "Set up production EKS cluster with monitoring"
Result: Complete EKS config + all essential add-ons
```