Initial commit

commands/k8s-deploy.md (new file, 529 lines)

---
description: Deploy to Kubernetes cluster
argument-hint: Optional deployment details
---

# Kubernetes Deployment

You are deploying applications to a Kubernetes cluster using the k8s-cluster-manager agent.

## Workflow

### 1. Gather Deployment Information

If not specified, ask for:
- **What to deploy**:
  - Path to YAML manifests
  - Helm chart name/path
  - Kustomize directory
  - Docker image (for quick deployment)
- **Target cluster**:
  - Cluster context name
  - Namespace (created if it doesn't exist)
  - Environment type (dev/staging/production)
- **Deployment strategy**:
  - RollingUpdate (default, zero downtime)
  - Recreate (stop old, start new)
  - Blue-Green (switch the service selector)
  - Canary (gradual traffic shift)
- **Requirements**:
  - Resource requests/limits
  - Replica count
  - Health check configuration (see the probe sketch below)
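
Where health checks come up, a minimal probe block to adapt; the paths, port, and timings here are illustrative placeholders, not values from any particular app:

```yaml
# Hypothetical probes for a container serving /healthz and /ready on port 8080
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
```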

### 2. Pre-Deployment Validation

Before deploying, verify:

**Cluster connectivity**:
```bash
kubectl cluster-info
kubectl get nodes
```

**Namespace exists (create it if not)**:
```bash
kubectl get namespace [namespace]
# If it doesn't exist:
kubectl create namespace [namespace]
```

**Context verification**:
```bash
kubectl config current-context
# Switch if needed:
kubectl config use-context [cluster-name]
```

**Manifest validation** (for YAML files):
```bash
# Dry run to validate
kubectl apply -f [manifest.yaml] --dry-run=client

# Validate all files in a directory
kubectl apply -f [directory]/ --dry-run=client

# Server-side validation
kubectl apply -f [manifest.yaml] --dry-run=server
```

### 3. Execute Deployment

Launch the **k8s-cluster-manager** agent with one of the following deployment methods:

#### Option A: Direct YAML Manifests

```bash
# Single file
kubectl apply -f deployment.yaml -n [namespace]

# Multiple files
kubectl apply -f deployment.yaml -f service.yaml -f ingress.yaml -n [namespace]

# Entire directory
kubectl apply -f k8s/ -n [namespace]

# Recursive directory
kubectl apply -f k8s/ -n [namespace] --recursive
```

#### Option B: Helm Chart

```bash
# Add repository (if needed)
helm repo add [repo-name] [repo-url]
helm repo update

# Install new release
helm install [release-name] [chart] -n [namespace] \
  --create-namespace \
  --set replicas=3 \
  --set image.tag=v1.2.3 \
  --values values.yaml

# Upgrade existing release
helm upgrade [release-name] [chart] -n [namespace] \
  --reuse-values \
  --set image.tag=v1.2.4

# Install or upgrade (idempotent)
helm upgrade --install [release-name] [chart] -n [namespace]
```

#### Option C: Kustomize

```bash
# Apply with kustomize
kubectl apply -k overlays/[environment]/ -n [namespace]

# Preview what will be applied
kubectl kustomize overlays/[environment]/
```

#### Option D: Quick Deployment (Image Only)

```bash
# Create deployment from image
kubectl create deployment [name] \
  --image=[image:tag] \
  --replicas=3 \
  -n [namespace]

# Expose as service
kubectl expose deployment [name] \
  --port=80 \
  --target-port=8080 \
  --type=LoadBalancer \
  -n [namespace]
```

### 4. Monitor Deployment Progress

**Watch rollout status**:
```bash
# For Deployments
kubectl rollout status deployment/[name] -n [namespace]

# For StatefulSets
kubectl rollout status statefulset/[name] -n [namespace]

# For DaemonSets
kubectl rollout status daemonset/[name] -n [namespace]
```

**Watch pods coming up**:
```bash
# Watch pods in real time
kubectl get pods -n [namespace] -w

# Watch pods matching a label
kubectl get pods -n [namespace] -l app=[name] -w

# Detailed view
kubectl get pods -n [namespace] -o wide
```

**Check events**:
```bash
kubectl get events -n [namespace] \
  --sort-by='.lastTimestamp' \
  --watch
```

### 5. Verify Deployment Health

**Pod status checks**:
```bash
# Are all pods running?
kubectl get pods -n [namespace]

# Check a specific deployment
kubectl get deployment [name] -n [namespace]

# Detailed pod info
kubectl describe pod [pod-name] -n [namespace]
```

**Health check verification**:
```bash
# Check whether pods are ready
kubectl get pods -n [namespace] -o json | \
  jq '.items[] | {name: .metadata.name, ready: .status.conditions[] | select(.type=="Ready") | .status}'

# Check readiness probes
kubectl describe pod [pod-name] -n [namespace] | grep -A5 "Readiness"
```

**Service connectivity**:
```bash
# Check service endpoints
kubectl get endpoints [service-name] -n [namespace]

# Describe the service
kubectl describe service [service-name] -n [namespace]

# Test the service from within the cluster
kubectl run test-pod --image=curlimages/curl -i --rm -- \
  curl http://[service-name].[namespace].svc.cluster.local
```

**Resource usage**:
```bash
# Pod resource usage
kubectl top pods -n [namespace]

# Specific deployment
kubectl top pods -n [namespace] -l app=[name]
```

### 6. Post-Deployment Validation

**Application health checks**:
```bash
# Check application logs
kubectl logs -n [namespace] deployment/[name] --tail=50

# Follow logs
kubectl logs -n [namespace] -f deployment/[name]

# Logs from all pods
kubectl logs -n [namespace] -l app=[name] --all-containers=true
```

**Ingress/Route verification** (if applicable):
```bash
# Check ingress
kubectl get ingress -n [namespace]

# Test external access
curl https://[domain]
```

**ConfigMap/Secret verification**:
```bash
# Verify ConfigMaps exist
kubectl get configmap -n [namespace]

# Verify Secrets exist
kubectl get secrets -n [namespace]
```

### 7. Update Deployment Records

Document the deployment details (the annotation sketch below keeps the change cause visible in rollout history):
- Deployment timestamp
- Image versions deployed
- Configuration changes
- Any issues encountered
- Rollback plan (previous version info)
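
One low-friction way to record part of this in the cluster itself is the `kubernetes.io/change-cause` annotation, which `kubectl rollout history` displays; the message format below is just a sketch:

```bash
# Record why this rollout happened; shows up in `kubectl rollout history`
kubectl annotate deployment/[name] -n [namespace] \
  kubernetes.io/change-cause="deploy [image:tag] $(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --overwrite
```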

## Deployment Strategies

### Rolling Update (Default)

**Configuration**:
```yaml
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # Max pods above desired count
      maxUnavailable: 0  # Max pods below desired count
```

**Deploy**:
```bash
kubectl set image deployment/[name] \
  [container]=[image:new-tag] \
  -n [namespace]
```

### Recreate Strategy

**Configuration**:
```yaml
spec:
  strategy:
    type: Recreate
```

**Use case**: when you can afford downtime or must avoid running two versions side by side.

### Blue-Green Deployment

**Steps**:
```bash
# 1. Deploy the green version
kubectl apply -f deployment-green.yaml -n [namespace]

# 2. Verify green is healthy
kubectl get pods -n [namespace] -l version=green

# 3. Switch the service selector
kubectl patch service [name] -n [namespace] \
  -p '{"spec":{"selector":{"version":"green"}}}'

# 4. Remove the blue version
kubectl delete deployment [name]-blue -n [namespace]
```

### Canary Deployment

**Steps**:
```bash
# 1. Deploy the canary with 1 replica
kubectl apply -f deployment-canary.yaml -n [namespace]

# 2. Monitor metrics (error rate, latency)
kubectl logs -n [namespace] -l version=canary

# 3. Gradually increase canary replicas
kubectl scale deployment [name]-canary --replicas=3 -n [namespace]

# 4. If successful, update the main deployment
kubectl set image deployment/[name] [container]=[new-image] -n [namespace]

# 5. Remove the canary
kubectl delete deployment [name]-canary -n [namespace]
```

## Output Format

### Deployment Summary

**Deployment Information**:
- **Name**: [deployment-name]
- **Namespace**: [namespace]
- **Environment**: [dev/staging/production]
- **Strategy**: [RollingUpdate/Recreate/Blue-Green/Canary]
- **Timestamp**: [YYYY-MM-DD HH:MM:SS UTC]

**Resources Deployed**:
```
Deployments:
  ✓ [name]: 3/3 replicas ready
    - Image: [image:tag]
    - CPU: 100m request, 500m limit
    - Memory: 128Mi request, 512Mi limit

Services:
  ✓ [name]: ClusterIP 10.96.1.10:80 → 8080
  ✓ [name]-lb: LoadBalancer [external-ip]:80 → 8080

Ingress:
  ✓ [name]: https://[domain] → [service]:80

ConfigMaps:
  ✓ [name]-config

Secrets:
  ✓ [name]-secrets
```

**Health Status**:
- Pods: 3/3 Running
- Ready: 3/3
- Restarts: 0
- Age: 2m30s

**Access Information**:
- Internal: http://[service].[namespace].svc.cluster.local:80
- External: https://[domain]
- Load Balancer: http://[external-ip]:80

### Verification Commands

Run these commands to verify the deployment:
```bash
# Check deployment status
kubectl get deployment [name] -n [namespace]

# Check pod health
kubectl get pods -n [namespace] -l app=[name]

# View logs
kubectl logs -n [namespace] -l app=[name] --tail=20

# Test the service
kubectl run test --image=curlimages/curl -i --rm -- \
  curl http://[service].[namespace].svc.cluster.local

# Check resource usage
kubectl top pods -n [namespace] -l app=[name]
```

### Rollback Information

If issues occur, roll back with:
```bash
# View rollout history
kubectl rollout history deployment/[name] -n [namespace]

# Roll back to the previous version
kubectl rollout undo deployment/[name] -n [namespace]

# Roll back to a specific revision
kubectl rollout undo deployment/[name] -n [namespace] --to-revision=[num]
```

**Previous Version**:
- Revision: [number]
- Image: [previous-image:tag]
- Change cause: [previous-deployment-reason]

## Troubleshooting

### Pods Not Starting

**ImagePullBackOff**:
```bash
# Check image pull errors
kubectl describe pod [pod-name] -n [namespace] | grep -A10 "Events:"

# Verify the image exists
docker pull [image:tag]

# Check imagePullSecrets
kubectl get secrets -n [namespace]
```

**CrashLoopBackOff**:
```bash
# Check logs from the previous (crashed) container
kubectl logs [pod-name] -n [namespace] --previous

# Check the startup command
kubectl describe pod [pod-name] -n [namespace] | grep -A5 "Command:"

# Check resource limits
kubectl describe pod [pod-name] -n [namespace] | grep -A10 "Limits:"
```

**Pending Status**:
```bash
# Check why the pod is pending
kubectl describe pod [pod-name] -n [namespace] | grep -A10 "Events:"

# Check node resources
kubectl top nodes

# Check PVC status (if using persistent volumes)
kubectl get pvc -n [namespace]
```

### Rollout Stuck

```bash
# Check rollout status
kubectl rollout status deployment/[name] -n [namespace]

# Check deployment events
kubectl describe deployment [name] -n [namespace]

# Check replica sets
kubectl get rs -n [namespace]

# Restart the rollout
kubectl rollout restart deployment/[name] -n [namespace]
```

### Service Not Accessible

```bash
# Check that the service selector matches pod labels
kubectl get service [name] -n [namespace] -o yaml | grep selector -A5
kubectl get pods -n [namespace] --show-labels

# Check endpoints
kubectl get endpoints [name] -n [namespace]

# Check network policies
kubectl get networkpolicies -n [namespace]

# Test from a debug pod
kubectl run debug --image=nicolaka/netshoot -i --rm -- \
  curl http://[service].[namespace].svc.cluster.local
```

### High Resource Usage

```bash
# Check resource usage
kubectl top pods -n [namespace]

# Check for OOMKilled containers
kubectl get pods -n [namespace] -o json | \
  jq '.items[] | select(.status.containerStatuses[].lastState.terminated.reason=="OOMKilled") | .metadata.name'

# Increase resources
kubectl set resources deployment [name] -n [namespace] \
  --limits=cpu=1000m,memory=1Gi \
  --requests=cpu=200m,memory=256Mi
```

## Best Practices

**Pre-deployment**:
- Always run with `--dry-run=client` first
- Test in dev/staging before production
- Review resource limits
- Pin image tags (avoid `:latest` in production)

**During deployment**:
- Monitor rollout status
- Watch logs for errors
- Check pod health continuously
- Verify endpoints are ready

**Post-deployment**:
- Document what was deployed
- Monitor for 10-15 minutes
- Keep previous version info for rollback
- Update monitoring dashboards

**Production deployments**:
- Use blue-green or canary for critical services
- Set PodDisruptionBudgets (sketch below)
- Configure a HorizontalPodAutoscaler
- Enable auto-rollback on failure
- Schedule during maintenance windows
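
Minimal sketches of the PodDisruptionBudget and HorizontalPodAutoscaler mentioned above, assuming a Deployment labeled `app: myapp`; names and thresholds are placeholders to adapt:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb
spec:
  minAvailable: 2          # Never voluntarily evict below 2 ready pods
  selector:
    matchLabels:
      app: myapp
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # Scale out above 70% average CPU
```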

commands/k8s-full-stack-deploy.md (new file, 134 lines)

---
description: Orchestrated end-to-end deployment workflow
argument-hint: Optional stack description
---

# Full-Stack Kubernetes Deployment

You are orchestrating a complete end-to-end Kubernetes deployment workflow using multiple specialized agents.

## Workflow

### 1. Gather Requirements

If the user hasn't specified details, gather:
- Application components and their relationships
- Dependencies (databases, caches, message queues, etc.)
- Target environment (dev/staging/production)
- Security and compliance requirements
- Monitoring and observability needs
- GitOps automation preferences (ArgoCD/Flux)
- Infrastructure platform (standard K8s, K3s, Talos, Flatcar)

### 2. Phase 1 - Configuration Generation

Launch the appropriate configuration agent(s):
- **k8s-config-developer**: for standard Kubernetes YAML manifests
- **helm-chart-developer**: if packaging as a Helm chart
- **cdk8s-engineer**: if using code-based configuration

Pass the complete requirements to generate:
- Application Deployments/StatefulSets
- Database StatefulSets with persistence
- Service definitions
- Ingress configurations
- ConfigMaps and Secrets
- RBAC resources

### 3. Phase 2 - Security Review

Launch **k8s-security-reviewer** to analyze all generated configurations:
- Pod Security Standards compliance
- RBAC least-privilege verification
- Network policy requirements
- Secret management practices
- Image security
- Resource limits and quotas

**Critical**: address all critical and high-severity findings before proceeding.

### 4. Phase 3 - Deployment

Launch **k8s-cluster-manager** to deploy in the proper sequence:

1. Deploy the infrastructure layer (databases, caches)
2. Verify infrastructure health
3. Deploy the application layer
4. Verify application health
5. Configure ingress and networking

Monitor rollout status and handle any failures with an automatic rollback, as in the sketch below.
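
A minimal shape for that check, assuming a standard Deployment and the `[name]`/`[namespace]` placeholders used elsewhere in this document:

```bash
# Wait up to 5 minutes for the rollout; undo it if it never becomes healthy
if ! kubectl rollout status deployment/[name] -n [namespace] --timeout=5m; then
  echo "Rollout failed; rolling back" >&2
  kubectl rollout undo deployment/[name] -n [namespace]
fi
```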

### 5. Phase 4 - Monitoring Setup

Launch **k8s-monitoring-analyst** to:
- Configure Prometheus ServiceMonitors (example below)
- Create Grafana dashboards
- Set up alerts for critical metrics
- Establish baseline performance metrics
- Configure log aggregation
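
For reference, a minimal ServiceMonitor (a Prometheus Operator CRD); the names, labels, and port below are assumptions to adapt to your Prometheus installation:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myapp
  labels:
    release: prometheus    # Must match the Prometheus serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: myapp           # Selects the Service that exposes the metrics port
  endpoints:
    - port: metrics        # Named port on the Service
      interval: 30s
      path: /metrics
```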

### 6. Phase 5 - GitOps Automation

Launch **k8s-cicd-engineer** to establish GitOps:
- Configure an ArgoCD Application or Flux Kustomization
- Set up automatic sync policies
- Configure deployment notifications
- Establish progressive delivery if needed

## Output Format

Provide a comprehensive deployment report:

### Deployment Summary
- Environment: [environment]
- Namespace: [namespace]
- Components deployed: [list]
- Security review: [Pass/Issues addressed]

### Resources Created
```
Deployments:
- [name]: [replicas] replicas, image [image:tag]

StatefulSets:
- [name]: [replicas] replicas, [storage]

Services:
- [name]: [type], port [port]

Ingress:
- [domain]: → [service]:[port]
```

### Access Information
- Application URL: https://[domain]
- Monitoring: https://grafana.[domain]/d/[dashboard]
- GitOps: https://argocd.[domain]/applications/[app]

### Next Steps
1. Verify the application at [URL]
2. Check monitoring dashboards
3. Review GitOps sync status
4. Test the rollback procedure

### Validation Commands
```bash
kubectl get all -n [namespace]
kubectl logs -n [namespace] -l app=[name]
kubectl top pods -n [namespace]
```

## Troubleshooting

If deployment fails:
1. Check pod status: `kubectl get pods -n [namespace]`
2. Review events: `kubectl get events -n [namespace] --sort-by='.lastTimestamp'`
3. Check logs: `kubectl logs -n [namespace] [pod-name]`
4. Verify resources: `kubectl describe pod -n [namespace] [pod-name]`

If the security review fails:
1. Review critical findings
2. Update configurations to address the issues
3. Re-run the security review
4. Proceed only when all critical issues are resolved

commands/k8s-security-review.md (new file, 184 lines)

---
description: Security review of Kubernetes configurations
argument-hint: Optional configurations to review
---

# Kubernetes Security Review

You are conducting a comprehensive security review of Kubernetes configurations and deployments using the k8s-security-reviewer agent.

## Workflow

### 1. Identify Review Scope

Determine what needs to be reviewed:
- **New configurations**: YAML manifests before deployment
- **Existing deployments**: running workloads in the cluster
- **Helm charts**: chart templates and values
- **Entire namespace**: all resources in a namespace
- **Cluster-wide**: cluster roles, policies, admission controllers

If the user hasn't specified, ask for:
- Target configurations or namespace
- Environment criticality (dev/staging/production)
- Compliance requirements (CIS, PCI-DSS, SOC 2, HIPAA)
- Specific security concerns or focus areas

### 2. Gather Configuration Files

For file-based review:
- Use the `Read` tool to access manifest files
- Use `Glob` to find all YAML files in a directory
- Use `Bash` with `kubectl` to extract running configurations

For cluster review:
```bash
kubectl get all -n [namespace] -o yaml
kubectl get networkpolicies -n [namespace] -o yaml
kubectl get rolebindings,clusterrolebindings -o yaml
kubectl get pdb -n [namespace] -o yaml
# Note: PodSecurityPolicy was removed in Kubernetes v1.25; on current
# clusters, check the namespace's Pod Security Admission labels instead:
kubectl get namespace [namespace] --show-labels
```

### 3. Launch Security Review Agent

Launch the **k8s-security-reviewer** agent with:
- All configuration files or the cluster export
- Environment context (production requires stricter standards)
- Compliance requirements
- Specific focus areas, if any

### 4. Analyze Security Findings

The agent will assess:
- **Pod Security**: privileged containers, security contexts, capabilities
- **RBAC**: overly permissive roles, cluster-admin usage
- **Network Policies**: segmentation, default deny, egress control
- **Secrets Management**: hardcoded secrets, proper encryption
- **Image Security**: tag usage, registry sources, vulnerability scanning
- **Resource Limits**: DoS prevention, resource quotas
- **Admission Control**: PSS/PSP enforcement

### 5. Categorize Issues

Organize findings by severity:

**Critical** (block deployment):
- Privileged containers in production
- Hardcoded secrets or credentials
- Missing network policies in production (see the default-deny example below)
- Overly permissive RBAC (cluster-admin for apps)
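
For context, a default-deny policy of the kind that finding refers to: it blocks all ingress and egress for every pod in the namespace until explicit allow rules are added (the namespace name is a placeholder):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production    # Placeholder namespace
spec:
  podSelector: {}          # Empty selector = all pods in the namespace
  policyTypes:
    - Ingress
    - Egress
```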

**High** (fix before deployment):
- Running as root
- Missing resource limits
- No PodDisruptionBudgets in production
- Missing security contexts

**Medium** (address soon):
- Using the `:latest` tag
- Missing readiness/liveness probes
- Insufficient RBAC granularity

**Low** (best practice):
- Missing labels
- No pod anti-affinity
- Verbose logging

### 6. Provide Remediation Guidance

For each critical and high finding:
1. Explain the security risk
2. Show the problematic configuration
3. Provide the fixed configuration
4. Include verification steps

## Output Format

### Security Review Report

#### Executive Summary
- **Overall Risk Level**: [Critical/High/Medium/Low]
- **Critical Issues**: [count] - MUST fix before deployment
- **High Issues**: [count] - fix before production
- **Medium Issues**: [count] - address within the sprint
- **Low Issues**: [count] - best-practice improvements

#### Critical Findings

**[CRITICAL] Privileged Container**
- **Location**: `deployment/myapp` container `app`
- **Risk**: full host access, container escape, kernel exploits
- **Current Config**:
  ```yaml
  securityContext:
    privileged: true  # DANGEROUS
  ```
- **Recommended Fix**:
  ```yaml
  securityContext:
    privileged: false
    allowPrivilegeEscalation: false
    readOnlyRootFilesystem: true
    runAsNonRoot: true
    runAsUser: 1000
    capabilities:
      drop: [ALL]
  ```
- **Verification**: `kubectl describe pod [pod] | grep "Privileged:"`

#### High Priority Findings

[Similar format for each high-priority issue]

#### Compliance Assessment

- **CIS Kubernetes Benchmark**: [Pass/Fail items]
- **Pod Security Standards**: [Baseline/Restricted]
- **Industry Requirements**: [specific to the requested compliance regime]

#### Recommended Actions

Priority 1 (before deployment):
1. [Action with file:line reference]
2. [Action with file:line reference]

Priority 2 (this sprint):
1. [Action]
2. [Action]

Priority 3 (backlog):
1. [Action]
2. [Action]

### Validation Commands

After applying fixes:
```bash
# Verify security contexts
kubectl get pods -n [namespace] -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].securityContext}{"\n"}{end}'

# Check for privileged pods
kubectl get pods -n [namespace] -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].securityContext.privileged}{"\n"}{end}'

# Verify network policies exist
kubectl get networkpolicies -n [namespace]

# Check RBAC
kubectl auth can-i --list -n [namespace]
```

## Decision Matrix

**When to block deployment:**
- Any CRITICAL findings in production
- Multiple HIGH findings in production
- Compliance requirement violations

**When to allow with warnings:**
- Only MEDIUM/LOW findings
- HIGH findings in dev/staging with a remediation plan

**When to require re-review:**
- After fixing CRITICAL issues
- After major configuration changes
- Before production promotion

commands/k8s-setup-flatcar.md (new file, 14 lines)

---
description: Configure Flatcar Linux-based cluster
argument-hint: Optional cluster requirements
---

You are initiating Flatcar Container Linux cluster setup. Use the flatcar-linux-expert agent.

If the user specifies requirements, pass them to the agent. Otherwise, ask for:
- Node configuration
- Ignition config requirements (see the Butane sketch below)
- Update strategy
- Container runtime preference

Launch the flatcar-linux-expert agent to configure the Flatcar-based Kubernetes cluster.
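
For orientation, a minimal Butane config (Flatcar variant) that transpiles to the Ignition config applied on first boot; the hostname and SSH key are placeholders:

```yaml
# Transpile with: butane --strict config.bu > config.ign
variant: flatcar
version: 1.0.0
passwd:
  users:
    - name: core
      ssh_authorized_keys:
        - ssh-ed25519 AAAA... placeholder-key
storage:
  files:
    - path: /etc/hostname
      mode: 0644
      contents:
        inline: flatcar-node-1
```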

commands/k8s-setup-gitops.md (new file, 342 lines)

---
description: Setup GitOps CI/CD with ArgoCD or Flux
argument-hint: Optional GitOps tool preference
---

# GitOps CI/CD Setup

You are setting up GitOps-based continuous deployment using the k8s-cicd-engineer agent.

## Workflow

### 1. Choose GitOps Tool

If not specified, help the user choose:

**ArgoCD** - best for:
- UI-driven workflows
- Multi-cluster management
- RBAC and SSO integration
- Helm and Kustomize support

**Flux** - best for:
- Pure GitOps (no UI needed)
- Kubernetes-native resources
- Helm controller integration
- Multi-tenancy

### 2. Gather Requirements

Ask for:
- **Git repository**:
  - Repository URL
  - Branch strategy (main, environment branches, or directories)
  - Authentication method (SSH key, token)
- **Applications**:
  - List of applications to manage
  - Manifest locations in the repo
  - Dependencies between apps
- **Environments**:
  - dev, staging, production
  - Separate clusters or namespaces
- **Sync policy**:
  - Automatic or manual sync
  - Auto-pruning of resources
  - Self-healing enabled
- **Progressive delivery**:
  - Canary deployments
  - Blue-green deployments
  - Flagger integration

### 3. Install GitOps Tool

Launch **k8s-cicd-engineer** to install:

**For ArgoCD**:
```bash
kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
```

**For Flux**:
```bash
flux bootstrap github \
  --owner=[org] \
  --repository=[repo] \
  --branch=main \
  --path=clusters/production \
  --personal
```

### 4. Configure Git Repository Access

**ArgoCD**:
```bash
argocd repo add https://github.com/org/repo \
  --username [user] \
  --password [token]
```

**Flux**:
- Flux bootstrap automatically creates a deploy key
- Verify it under GitHub Settings > Deploy keys

### 5. Create Application Definitions

**ArgoCD Application**:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/org/repo
    targetRevision: HEAD
    path: k8s/overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
```

**Flux Kustomization**:
```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: myapp
  namespace: flux-system
spec:
  interval: 5m
  path: ./k8s/overlays/production
  prune: true
  sourceRef:
    kind: GitRepository
    name: myapp
```

### 6. Setup App-of-Apps Pattern (Optional)

For managing multiple applications:

**ArgoCD**:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: apps
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/org/repo
    targetRevision: HEAD
    path: argocd/applications
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated: {}
```

**Flux**: use hierarchical Kustomizations

### 7. Configure Progressive Delivery (Optional)

If requested, install and configure Flagger:

```bash
helm repo add flagger https://flagger.app
helm install flagger flagger/flagger \
  --namespace flagger-system
```

Create a Canary resource:
```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: myapp
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  analysis:
    interval: 1m
    threshold: 5
    maxWeight: 50
    stepWeight: 10
```

### 8. Setup Notifications

**ArgoCD**:
- Configure Slack/Teams webhooks
- Set up notification triggers

**Flux**:
- Configure the notification-controller
- Create Alerts for Git events (example below)
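
A minimal notification-controller pair as a sketch: a Slack `Provider` plus an `Alert` that forwards Kustomization events (the Secret holding the webhook address is assumed to already exist):

```yaml
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Provider
metadata:
  name: slack
  namespace: flux-system
spec:
  type: slack
  channel: deployments          # Placeholder channel
  secretRef:
    name: slack-webhook-url     # Secret with key `address`
---
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Alert
metadata:
  name: myapp
  namespace: flux-system
spec:
  providerRef:
    name: slack
  eventSeverity: info
  eventSources:
    - kind: Kustomization
      name: myapp
```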

### 9. Verify GitOps Workflow

1. Make a change in the Git repository
2. Commit and push
3. Verify the automatic sync
4. Check application health

## Output Format

### GitOps Setup Summary

**GitOps Tool**: [ArgoCD/Flux]
**Version**: [version]
**Installation**: [namespace]

**Git Repository**:
- URL: [repo-url]
- Branch: [branch]
- Path: [path]
- Authentication: [Configured ✓]

**Applications Configured**:
1. [app-name]
   - Source: [path]
   - Destination: [namespace]
   - Sync: [Auto/Manual]
   - Status: [Synced/OutOfSync]

2. [app-name]
   - Source: [path]
   - Destination: [namespace]
   - Sync: [Auto/Manual]
   - Status: [Synced/OutOfSync]

**Access Information**:
- **ArgoCD UI**: https://argocd.[domain]
  - Username: admin
  - Password: [use `kubectl get secret` to retrieve]
- **Flux**: `flux get all`

### Next Steps

**For ArgoCD**:
```bash
# Access the UI
kubectl port-forward svc/argocd-server -n argocd 8080:443

# Get the admin password
kubectl -n argocd get secret argocd-initial-admin-secret \
  -o jsonpath="{.data.password}" | base64 -d

# Sync an application
argocd app sync myapp

# Check status
argocd app list
```

**For Flux**:
```bash
# Check GitOps status
flux get all

# Reconcile immediately
flux reconcile source git myapp
flux reconcile kustomization myapp

# Check logs
flux logs
```

### Testing GitOps Workflow

1. **Make a change**:
   ```bash
   git clone [repo]
   cd [repo]
   # Edit manifests
   git add .
   git commit -m "Update deployment replicas"
   git push
   ```

2. **Watch the sync** (ArgoCD):
   ```bash
   argocd app wait myapp --sync
   ```

   Or, for Flux:
   ```bash
   flux reconcile kustomization myapp --with-source
   watch flux get kustomizations
   ```

3. **Verify the changes**:
   ```bash
   kubectl get deployment myapp -n production
   ```

## Best Practices

**Repository Structure**:
```
repo/
├── base/                # Base manifests
│   ├── deployment.yaml
│   └── service.yaml
├── overlays/
│   ├── dev/             # Dev environment
│   ├── staging/         # Staging environment
│   └── production/      # Production environment
└── argocd/              # Application definitions
    └── applications/
```

**Security**:
- Use SSH keys for Git access
- Enable RBAC in ArgoCD
- Encrypt secrets (Sealed Secrets, External Secrets)
- Review before auto-sync in production

**Workflow**:
- Use pull requests for changes
- Require code review
- Test in dev/staging first
- Enable auto-sync only after testing

## Troubleshooting

**Application not syncing (ArgoCD)**:
```bash
# Check application status
argocd app get myapp

# Force sync
argocd app sync myapp --force

# Check events
kubectl get events -n argocd
```

**Kustomization failing (Flux)**:
```bash
# Check status
flux get kustomizations

# Check logs
flux logs --kind=Kustomization --name=myapp

# Force reconcile
flux reconcile kustomization myapp --with-source
```

**Git authentication failing**:
- Verify deploy key permissions (read/write)
- Check that the token hasn't expired
- Verify the repository URL is correct
- Check that network policies allow Git access

commands/k8s-setup-talos.md (new file, 216 lines)

---
description: Configure Talos Linux-based cluster
argument-hint: Optional cluster requirements
---

# Talos Linux Cluster Setup

You are setting up a Kubernetes cluster on Talos Linux using the talos-linux-expert agent.

## Workflow

### 1. Gather Cluster Requirements

If not specified, ask for:
- **Node configuration**:
  - Number of control plane nodes (1, or 3+ for HA)
  - Number of worker nodes
  - IP addresses for each node
  - Hostnames
- **Network configuration**:
  - Control plane endpoint (load balancer IP for HA)
  - CNI preference (none/Cilium/Calico - recommend installing separately)
  - Pod and service CIDR ranges
- **High availability**:
  - Load balancer for the control plane (required for HA)
  - Distributed storage requirements
- **Talos version**: latest stable or a specific version

### 2. Generate Machine Configurations

Launch **talos-linux-expert** to generate configs:

```bash
talosctl gen config cluster-name https://[endpoint]:6443
```

This creates:
- `controlplane.yaml` - for control plane nodes
- `worker.yaml` - for worker nodes
- `talosconfig` - for the talosctl client

### 3. Customize Configurations

Apply the necessary patches for:
- **Network settings**: static IPs, routes, VLANs
- **CNI**: disable the built-in CNI if using Cilium/Calico
- **Install disk**: specify the correct disk path
- **Certificate SANs**: add the load balancer IP/hostname
- **Cluster discovery**: configure if needed

Example patch:
```yaml
machine:
  network:
    interfaces:
      - interface: eth0
        addresses:
          - 192.168.1.10/24
        routes:
          - network: 0.0.0.0/0
            gateway: 192.168.1.1
cluster:
  network:
    cni:
      name: none  # Install Cilium separately
```
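
Patches like this can be applied at generation time via the `talosctl gen config` patch flags; the patch file names here are placeholders:

```bash
# Apply a patch to all nodes, or target control plane / workers separately
talosctl gen config cluster-name https://[endpoint]:6443 \
  --config-patch @patch.yaml \
  --config-patch-control-plane @cp-patch.yaml \
  --config-patch-worker @worker-patch.yaml
```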

### 4. Apply Configurations to Nodes

For each node:
```bash
# Control plane nodes
talosctl apply-config --insecure --nodes [IP] --file controlplane.yaml

# Worker nodes
talosctl apply-config --insecure --nodes [IP] --file worker.yaml
```

Wait for the nodes to boot and apply their configurations.

### 5. Bootstrap Kubernetes

On the first control plane node only:
```bash
talosctl bootstrap --nodes [first-controlplane-IP]
```

This initializes etcd and starts Kubernetes.

### 6. Retrieve kubeconfig

```bash
talosctl kubeconfig --nodes [controlplane-IP]
```

### 7. Verify Cluster

```bash
# Check Talos health
talosctl health --nodes [all-nodes]

# Check Kubernetes nodes
kubectl get nodes

# Verify etcd
talosctl etcd members --nodes [controlplane-IP]
```

### 8. Install CNI (if using Cilium/Calico)

If the CNI was set to `none`, launch **k8s-network-engineer** to install one:
```bash
helm repo add cilium https://helm.cilium.io
helm install cilium cilium/cilium --namespace kube-system
```

### 9. Post-Installation Tasks

- Configure storage (if needed)
- Set up monitoring
- Apply security policies
- Configure backups (etcd snapshots)

## Output Format

### Talos Cluster Configuration Summary

**Cluster Information:**
- Name: [cluster-name]
- Talos Version: [version]
- Kubernetes Version: [version]
- Endpoint: https://[endpoint]:6443

**Control Plane Nodes:**
- [hostname]: [IP] - [status]
- [hostname]: [IP] - [status]
- [hostname]: [IP] - [status]

**Worker Nodes:**
- [hostname]: [IP] - [status]
- [hostname]: [IP] - [status]

**Network Configuration:**
- CNI: [Cilium/Calico/None]
- Pod CIDR: [range]
- Service CIDR: [range]

**Configuration Files:**
```
✓ controlplane.yaml - apply to control plane nodes
✓ worker.yaml - apply to worker nodes
✓ talosconfig - configure the talosctl client
```

### Next Steps

1. **Configure talosctl**:
   ```bash
   export TALOSCONFIG=$PWD/talosconfig
   talosctl config endpoint [controlplane-IPs]
   talosctl config node [any-controlplane-IP]
   ```

2. **Verify the cluster**:
   ```bash
   kubectl get nodes
   kubectl get pods -A
   ```

3. **Install a CNI** (if needed):
   ```bash
   helm install cilium cilium/cilium -n kube-system
   ```

4. **Deploy workloads**:
   ```bash
   kubectl apply -f your-manifests/
   ```

### Useful talosctl Commands

```bash
# Check node status
talosctl dashboard --nodes [IP]

# View logs
talosctl logs --nodes [IP] kubelet

# Upgrade Talos
talosctl upgrade --nodes [IP] --image ghcr.io/siderolabs/installer:v1.6.0

# Upgrade Kubernetes
talosctl upgrade-k8s --nodes [IP] --to 1.29.0

# Restart services
talosctl service kubelet restart --nodes [IP]

# etcd operations (snapshot etcd to a local file)
talosctl etcd snapshot db.snapshot --nodes [IP]
```

## Troubleshooting

**Nodes not joining:**
- Verify network connectivity
- Check firewall rules (ports 6443, 50000, 50001)
- Verify the machine config was applied correctly

**etcd not starting:**
- Ensure the bootstrap command was run only once
- Check time synchronization
- Verify disk space

**CNI not working:**
- Verify the CNI is set to `none` in the config
- Check the Cilium/Calico installation
- Verify network policies aren't blocking traffic