Initial commit

Zhongwei Li
2025-11-30 08:47:13 +08:00
commit 9529eaebeb
20 changed files with 3382 additions and 0 deletions

commands/k8s-deploy.md

@@ -0,0 +1,529 @@
---
description: Deploy to Kubernetes cluster
argument-hint: Optional deployment details
---
# Kubernetes Deployment
You are deploying applications to a Kubernetes cluster using the k8s-cluster-manager agent.
## Workflow
### 1. Gather Deployment Information
If not specified, ask for:
- **What to deploy**:
- Path to YAML manifests
- Helm chart name/path
- Kustomize directory
- Docker image (for quick deployment)
- **Target cluster**:
- Cluster context name
- Namespace (create it if it doesn't exist)
- Environment type (dev/staging/production)
- **Deployment strategy**:
- RollingUpdate (default, zero downtime)
- Recreate (stop old, start new)
- Blue-Green (switch service selector)
- Canary (gradual traffic shift)
- **Requirements** (see the sketch after this list):
- Resource requests/limits
- Replica count
- Health check configuration
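To make these requirements concrete, here is a minimal Deployment sketch; the name, image, port, and probe path are placeholders to adapt:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp                 # placeholder name
spec:
  replicas: 3                 # replica count
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: app
          image: registry.example.com/myapp:v1.2.3   # pin a tag; avoid :latest
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi
          readinessProbe:
            httpGet:
              path: /healthz                          # assumed health endpoint
              port: 8080
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 15
```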
### 2. Pre-Deployment Validation
Before deploying, verify:
**Cluster connectivity**:
```bash
kubectl cluster-info
kubectl get nodes
```
**Namespace exists (or create it)**:
```bash
kubectl get namespace [namespace]
# If it doesn't exist:
kubectl create namespace [namespace]
```
**Context verification**:
```bash
kubectl config current-context
# Switch if needed:
kubectl config use-context [cluster-name]
```
**Manifest validation** (for YAML files):
```bash
# Dry run to validate
kubectl apply -f [manifest.yaml] --dry-run=client
# Validate all files in directory
kubectl apply -f [directory]/ --dry-run=client
# Server-side validation
kubectl apply -f [manifest.yaml] --dry-run=server
```
### 3. Execute Deployment
Launch **k8s-cluster-manager** agent with deployment method:
#### Option A: Direct YAML Manifests
```bash
# Single file
kubectl apply -f deployment.yaml -n [namespace]
# Multiple files
kubectl apply -f deployment.yaml -f service.yaml -f ingress.yaml -n [namespace]
# Entire directory
kubectl apply -f k8s/ -n [namespace]
# Recursive directory
kubectl apply -f k8s/ -n [namespace] --recursive
```
#### Option B: Helm Chart
```bash
# Add repository (if needed)
helm repo add [repo-name] [repo-url]
helm repo update
# Install new release
helm install [release-name] [chart] -n [namespace] \
--create-namespace \
--set replicas=3 \
--set image.tag=v1.2.3 \
--values values.yaml
# Upgrade existing release
helm upgrade [release-name] [chart] -n [namespace] \
--reuse-values \
--set image.tag=v1.2.4
# Install or upgrade (idempotent)
helm upgrade --install [release-name] [chart] -n [namespace]
```
#### Option C: Kustomize
```bash
# Apply with kustomize
kubectl apply -k overlays/[environment]/ -n [namespace]
# Preview what will be applied
kubectl kustomize overlays/[environment]/
```
#### Option D: Quick Deployment (Image Only)
```bash
# Create deployment from image
kubectl create deployment [name] \
--image=[image:tag] \
--replicas=3 \
-n [namespace]
# Expose as service
kubectl expose deployment [name] \
--port=80 \
--target-port=8080 \
--type=LoadBalancer \
-n [namespace]
```
### 4. Monitor Deployment Progress
**Watch rollout status**:
```bash
# For Deployments
kubectl rollout status deployment/[name] -n [namespace]
# For StatefulSets
kubectl rollout status statefulset/[name] -n [namespace]
# For DaemonSets
kubectl rollout status daemonset/[name] -n [namespace]
```
**Watch pods coming up**:
```bash
# Watch pods in real-time
kubectl get pods -n [namespace] -w
# Watch with labels
kubectl get pods -n [namespace] -l app=[name] -w
# Detailed view
kubectl get pods -n [namespace] -o wide
```
**Check events**:
```bash
kubectl get events -n [namespace] \
--sort-by='.lastTimestamp' \
--watch
```
### 5. Verify Deployment Health
**Pod status checks**:
```bash
# All pods running?
kubectl get pods -n [namespace]
# Check specific deployment
kubectl get deployment [name] -n [namespace]
# Detailed pod info
kubectl describe pod [pod-name] -n [namespace]
```
**Health check verification**:
```bash
# Check if pods are ready
kubectl get pods -n [namespace] -o json | \
jq '.items[] | {name: .metadata.name, ready: .status.conditions[] | select(.type=="Ready") | .status}'
# Check readiness probes
kubectl describe pod [pod-name] -n [namespace] | grep -A5 "Readiness"
```
**Service connectivity**:
```bash
# Check service endpoints
kubectl get endpoints [service-name] -n [namespace]
# Describe service
kubectl describe service [service-name] -n [namespace]
# Test service from within cluster
kubectl run test-pod --image=curlimages/curl -i --rm -- \
curl http://[service-name].[namespace].svc.cluster.local
```
**Resource usage**:
```bash
# Pod resource usage
kubectl top pods -n [namespace]
# Specific deployment
kubectl top pods -n [namespace] -l app=[name]
```
### 6. Post-Deployment Validation
**Application health checks**:
```bash
# Check application logs
kubectl logs -n [namespace] deployment/[name] --tail=50
# Follow logs
kubectl logs -n [namespace] -f deployment/[name]
# Logs from all pods
kubectl logs -n [namespace] -l app=[name] --all-containers=true
```
**Ingress/Route verification** (if applicable):
```bash
# Check ingress
kubectl get ingress -n [namespace]
# Test external access
curl https://[domain]
```
**ConfigMap/Secret verification**:
```bash
# Verify ConfigMaps mounted
kubectl get configmap -n [namespace]
# Verify Secrets exist
kubectl get secrets -n [namespace]
```
### 7. Update Deployment Records
Document deployment details:
- Deployment timestamp
- Image versions deployed
- Configuration changes
- Any issues encountered
- Rollback plan (previous version info)
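A lightweight way to record the change cause on the cluster itself (so it appears under CHANGE-CAUSE in `kubectl rollout history`) is an annotation; a sketch with placeholder values:
```bash
# Record why this rollout happened; shows under CHANGE-CAUSE in rollout history
kubectl annotate deployment/[name] -n [namespace] \
  kubernetes.io/change-cause="Deploy [image:tag] - [reason]" --overwrite
# Confirm it was recorded
kubectl rollout history deployment/[name] -n [namespace]
```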
## Deployment Strategies
### Rolling Update (Default)
**Configuration**:
```yaml
spec:
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1 # Max pods above desired count
maxUnavailable: 0 # Max pods below desired count
```
**Deploy**:
```bash
kubectl set image deployment/[name] \
[container]=[image:new-tag] \
-n [namespace]
```
### Recreate Strategy
**Configuration**:
```yaml
spec:
strategy:
type: Recreate
```
**Use case**: When you can afford downtime or need to avoid version mixing
### Blue-Green Deployment
**Steps**:
```bash
# 1. Deploy green version
kubectl apply -f deployment-green.yaml -n [namespace]
# 2. Verify green is healthy
kubectl get pods -n [namespace] -l version=green
# 3. Switch service selector
kubectl patch service [name] -n [namespace] \
-p '{"spec":{"selector":{"version":"green"}}}'
# 4. Remove blue version
kubectl delete deployment [name]-blue -n [namespace]
```
### Canary Deployment
**Steps**:
```bash
# 1. Deploy canary with 1 replica
kubectl apply -f deployment-canary.yaml -n [namespace]
# 2. Monitor metrics (error rate, latency)
kubectl logs -n [namespace] -l version=canary
# 3. Gradually increase canary replicas
kubectl scale deployment [name]-canary --replicas=3 -n [namespace]
# 4. If successful, update main deployment
kubectl set image deployment/[name] [container]=[new-image] -n [namespace]
# 5. Remove canary
kubectl delete deployment [name]-canary -n [namespace]
```
## Output Format
### Deployment Summary
**Deployment Information**:
- **Name**: [deployment-name]
- **Namespace**: [namespace]
- **Environment**: [dev/staging/production]
- **Strategy**: [RollingUpdate/Recreate/Blue-Green/Canary]
- **Timestamp**: [YYYY-MM-DD HH:MM:SS UTC]
**Resources Deployed**:
```
Deployments:
✓ [name]: 3/3 replicas ready
- Image: [image:tag]
- CPU: 100m request, 500m limit
- Memory: 128Mi request, 512Mi limit
Services:
✓ [name]: ClusterIP 10.96.1.10:80 → 8080
✓ [name]-lb: LoadBalancer [external-ip]:80 → 8080
Ingress:
✓ [name]: https://[domain] → [service]:80
ConfigMaps:
✓ [name]-config
Secrets:
✓ [name]-secrets
```
**Health Status**:
- Pods: 3/3 Running
- Ready: 3/3
- Restarts: 0
- Age: 2m30s
**Access Information**:
- Internal: http://[service].[namespace].svc.cluster.local:80
- External: https://[domain]
- Load Balancer: http://[external-ip]:80
### Verification Commands
Run these commands to verify deployment:
```bash
# Check deployment status
kubectl get deployment [name] -n [namespace]
# Check pod health
kubectl get pods -n [namespace] -l app=[name]
# View logs
kubectl logs -n [namespace] -l app=[name] --tail=20
# Test service
kubectl run test --image=curlimages/curl -i --rm -- \
curl http://[service].[namespace].svc.cluster.local
# Check resource usage
kubectl top pods -n [namespace] -l app=[name]
```
### Rollback Information
If issues occur, rollback with:
```bash
# View rollout history
kubectl rollout history deployment/[name] -n [namespace]
# Rollback to previous version
kubectl rollout undo deployment/[name] -n [namespace]
# Rollback to specific revision
kubectl rollout undo deployment/[name] -n [namespace] --to-revision=[num]
```
**Previous Version**:
- Revision: [number]
- Image: [previous-image:tag]
- Change cause: [previous-deployment-reason]
## Troubleshooting
### Pods Not Starting
**ImagePullBackOff**:
```bash
# Check image pull errors
kubectl describe pod [pod-name] -n [namespace] | grep -A10 "Events:"
# Verify image exists
docker pull [image:tag]
# Check imagePullSecrets
kubectl get secrets -n [namespace]
```
**CrashLoopBackOff**:
```bash
# Check application logs
kubectl logs [pod-name] -n [namespace] --previous
# Check startup command
kubectl describe pod [pod-name] -n [namespace] | grep -A5 "Command:"
# Check resource limits
kubectl describe pod [pod-name] -n [namespace] | grep -A10 "Limits:"
```
**Pending Status**:
```bash
# Check why pod is pending
kubectl describe pod [pod-name] -n [namespace] | grep -A10 "Events:"
# Check node resources
kubectl top nodes
# Check PVC status (if using persistent volumes)
kubectl get pvc -n [namespace]
```
### Rollout Stuck
```bash
# Check rollout status
kubectl rollout status deployment/[name] -n [namespace]
# Check deployment events
kubectl describe deployment [name] -n [namespace]
# Check replica sets
kubectl get rs -n [namespace]
# Force rollout
kubectl rollout restart deployment/[name] -n [namespace]
```
### Service Not Accessible
```bash
# Check service selector matches pod labels
kubectl get service [name] -n [namespace] -o yaml | grep selector -A5
kubectl get pods -n [namespace] --show-labels
# Check endpoints
kubectl get endpoints [name] -n [namespace]
# Check network policies
kubectl get networkpolicies -n [namespace]
# Test from debug pod
kubectl run debug --image=nicolaka/netshoot -i --rm -- \
curl http://[service].[namespace].svc.cluster.local
```
### High Resource Usage
```bash
# Check resource usage
kubectl top pods -n [namespace]
# Check for OOMKilled
kubectl get pods -n [namespace] -o json | \
jq '.items[] | select(.status.containerStatuses[]?.lastState.terminated.reason=="OOMKilled") | .metadata.name'
# Increase resources
kubectl set resources deployment [name] -n [namespace] \
--limits=cpu=1000m,memory=1Gi \
--requests=cpu=200m,memory=256Mi
```
## Best Practices
**Pre-deployment**:
- Always use `--dry-run=client` first
- Test in dev/staging before production
- Review resource limits
- Verify image tags (avoid :latest in production)
**During deployment**:
- Monitor rollout status
- Watch logs for errors
- Check pod health continuously
- Verify endpoints are ready
**Post-deployment**:
- Document what was deployed
- Monitor for 10-15 minutes
- Keep previous version info for rollback
- Update monitoring dashboards
**Production deployments** (a PodDisruptionBudget/HPA sketch follows this list):
- Use blue-green or canary for critical services
- Set PodDisruptionBudgets
- Configure HorizontalPodAutoscaler
- Enable auto-rollback on failure
- Schedule during maintenance windows
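As a starting point for the PodDisruptionBudget and HorizontalPodAutoscaler items above, a minimal sketch; names and thresholds are placeholders to tune:
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb
spec:
  minAvailable: 2              # keep at least 2 pods during voluntary disruptions
  selector:
    matchLabels:
      app: myapp
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out above 70% average CPU
```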


@@ -0,0 +1,134 @@
---
description: Orchestrated end-to-end deployment workflow
argument-hint: Optional stack description
---
# Full-Stack Kubernetes Deployment
You are orchestrating a complete end-to-end Kubernetes deployment workflow using multiple specialized agents.
## Workflow
### 1. Gather Requirements
If the user hasn't specified details, gather:
- Application components and their relationships
- Dependencies (databases, caches, message queues, etc.)
- Target environment (dev/staging/production)
- Security and compliance requirements
- Monitoring and observability needs
- GitOps automation preferences (ArgoCD/Flux)
- Infrastructure platform (standard K8s, K3s, Talos, Flatcar)
### 2. Phase 1 - Configuration Generation
Launch the appropriate configuration agent(s):
- **k8s-config-developer**: For standard Kubernetes YAML manifests
- **helm-chart-developer**: If packaging as Helm chart
- **cdk8s-engineer**: If using code-based configuration
Pass complete requirements to generate:
- Application deployments/statefulsets
- Database statefulsets with persistence
- Service definitions
- Ingress configurations
- ConfigMaps and Secrets
- RBAC resources
### 3. Phase 2 - Security Review
Launch **k8s-security-reviewer** to analyze all generated configurations:
- Pod Security Standards compliance
- RBAC least privilege verification
- Network policy requirements
- Secret management practices
- Image security
- Resource limits and quotas
**Critical**: Address all critical and high-severity findings before proceeding.
### 4. Phase 3 - Deployment
Launch **k8s-cluster-manager** to deploy in proper sequence:
1. Deploy infrastructure layer (databases, caches)
2. Verify infrastructure health
3. Deploy application layer
4. Verify application health
5. Configure ingress and networking
Monitor rollout status and handle any failures with automatic rollback.
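A sketch of how this sequencing can look with plain kubectl (paths and resource names are placeholders; the agent may use Helm or Kustomize instead):
```bash
# 1. Infrastructure layer first (databases, caches)
kubectl apply -f k8s/infrastructure/ -n [namespace]
kubectl rollout status statefulset/[db-name] -n [namespace] --timeout=5m
# 2. Application layer once dependencies are healthy
kubectl apply -f k8s/application/ -n [namespace]
if ! kubectl rollout status deployment/[app-name] -n [namespace] --timeout=5m; then
  echo "Rollout failed - rolling back"
  kubectl rollout undo deployment/[app-name] -n [namespace]
fi
# 3. Ingress and networking last
kubectl apply -f k8s/ingress.yaml -n [namespace]
```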
### 5. Phase 4 - Monitoring Setup
Launch **k8s-monitoring-analyst** to:
- Configure Prometheus ServiceMonitors
- Create Grafana dashboards
- Set up alerts for critical metrics
- Establish baseline performance metrics
- Configure log aggregation
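As one example of what the agent would generate, a minimal Prometheus Operator ServiceMonitor; namespaces, labels, and the port name are placeholders:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myapp
  namespace: monitoring          # often the Prometheus namespace; depends on your setup
spec:
  namespaceSelector:
    matchNames:
      - production
  selector:
    matchLabels:
      app: myapp
  endpoints:
    - port: metrics              # must match a named port on the Service
      interval: 30s
      path: /metrics
```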
### 6. Phase 5 - GitOps Automation
Launch **k8s-cicd-engineer** to establish GitOps:
- Configure ArgoCD Application or Flux Kustomization
- Set up automatic sync policies
- Configure deployment notifications
- Establish progressive delivery if needed
## Output Format
Provide a comprehensive deployment report:
### Deployment Summary
- Environment: [environment]
- Namespace: [namespace]
- Components deployed: [list]
- Security review: [Pass/Issues addressed]
### Resources Created
```
Deployments:
- [name]: [replicas] replicas, image [image:tag]
StatefulSets:
- [name]: [replicas] replicas, [storage]
Services:
- [name]: [type], port [port]
Ingress:
- [domain]: → [service]:[port]
```
### Access Information
- Application URL: https://[domain]
- Monitoring: https://grafana.[domain]/d/[dashboard]
- GitOps: https://argocd.[domain]/applications/[app]
### Next Steps
1. Verify application at [URL]
2. Check monitoring dashboards
3. Review GitOps sync status
4. Test rollback procedure
### Validation Commands
```bash
kubectl get all -n [namespace]
kubectl logs -n [namespace] -l app=[name]
kubectl top pods -n [namespace]
```
## Troubleshooting
If deployment fails:
1. Check pod status: `kubectl get pods -n [namespace]`
2. Review events: `kubectl get events -n [namespace] --sort-by='.lastTimestamp'`
3. Check logs: `kubectl logs -n [namespace] [pod-name]`
4. Verify resources: `kubectl describe pod -n [namespace] [pod-name]`
If security review fails:
1. Review critical findings
2. Update configurations to address issues
3. Re-run security review
4. Proceed only when critical issues resolved


@@ -0,0 +1,184 @@
---
description: Security review of Kubernetes configurations
argument-hint: Optional configurations to review
---
# Kubernetes Security Review
You are conducting a comprehensive security review of Kubernetes configurations and deployments using the k8s-security-reviewer agent.
## Workflow
### 1. Identify Review Scope
Determine what needs to be reviewed:
- **New configurations**: YAML manifests before deployment
- **Existing deployments**: Running workloads in cluster
- **Helm charts**: Chart templates and values
- **Entire namespace**: All resources in a namespace
- **Cluster-wide**: Cluster roles, policies, admission controllers
If the user hasn't specified, ask for:
- Target configurations or namespace
- Environment criticality (dev/staging/production)
- Compliance requirements (CIS, PCI-DSS, SOC 2, HIPAA)
- Specific security concerns or focus areas
### 2. Gather Configuration Files
For file-based review:
- Use `Read` tool to access manifest files
- Use `Glob` to find all YAML files in directory
- Use `Bash` with `kubectl` to extract running configurations
For cluster review:
```bash
kubectl get all -n [namespace] -o yaml
kubectl get networkpolicies -n [namespace] -o yaml
kubectl get rolebindings,clusterrolebindings -o yaml
kubectl get pdb -n [namespace] -o yaml
# PodSecurityPolicy (psp) was removed in Kubernetes 1.25; check Pod Security Admission labels instead
kubectl get namespace [namespace] --show-labels
```
### 3. Launch Security Review Agent
Launch **k8s-security-reviewer** agent with:
- All configuration files or cluster export
- Environment context (production requires stricter standards)
- Compliance requirements
- Specific focus areas if any
### 4. Analyze Security Findings
The agent will assess:
- **Pod Security**: privileged containers, security contexts, capabilities
- **RBAC**: overly permissive roles, cluster-admin usage
- **Network Policies**: segmentation, default deny, egress control
- **Secrets Management**: hardcoded secrets, proper encryption
- **Image Security**: tag usage, registry sources, vulnerability scanning
- **Resource Limits**: DoS prevention, resource quotas
- **Admission Control**: PSS/PSP enforcement
### 5. Categorize Issues
Organize findings by severity:
**Critical** (Block deployment):
- Privileged containers in production
- Hardcoded secrets or credentials
- Missing network policies in production (see the default-deny sketch after these lists)
- Overly permissive RBAC (cluster-admin for apps)
**High** (Fix before deployment):
- Running as root
- Missing resource limits
- No Pod Disruption Budgets in production
- Missing security contexts
**Medium** (Address soon):
- Using :latest tag
- Missing readiness/liveness probes
- Insufficient RBAC granularity
**Low** (Best practice):
- Missing labels
- No pod anti-affinity
- Verbose logging
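For the network-policy findings above, remediation typically starts from a default-deny baseline plus explicit allows; a minimal sketch (namespace and port values are placeholders):
```yaml
# Deny all ingress and egress by default for the namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
spec:
  podSelector: {}                # selects every pod in the namespace
  policyTypes:
    - Ingress
    - Egress
---
# Then allow only what the application needs, e.g. ingress from the ingress controller
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ingress-controller
spec:
  podSelector:
    matchLabels:
      app: myapp
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx   # assumed ingress controller namespace
      ports:
        - protocol: TCP
          port: 8080
```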
### 6. Provide Remediation Guidance
For each critical and high finding:
1. Explain the security risk
2. Show the problematic configuration
3. Provide fixed configuration
4. Include verification steps
## Output Format
### Security Review Report
#### Executive Summary
- **Overall Risk Level**: [Critical/High/Medium/Low]
- **Critical Issues**: [count] - MUST fix before deployment
- **High Issues**: [count] - Fix before production
- **Medium Issues**: [count] - Address within sprint
- **Low Issues**: [count] - Best practice improvements
#### Critical Findings
**[CRITICAL] Privileged Container**
- **Location**: `deployment/myapp` container `app`
- **Risk**: Full host access, container escape, kernel exploits
- **Current Config**:
```yaml
securityContext:
privileged: true # DANGEROUS
```
- **Recommended Fix**:
```yaml
securityContext:
privileged: false
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
runAsNonRoot: true
runAsUser: 1000
capabilities:
drop: [ALL]
```
- **Verification**: `kubectl describe pod [pod] | grep "Privileged:"`
#### High Priority Findings
[Similar format for each high-priority issue]
#### Compliance Assessment
- **CIS Kubernetes Benchmark**: [Pass/Fail items]
- **Pod Security Standards**: [Baseline/Restricted]
- **Industry Requirements**: [Specific to requested compliance]
#### Recommended Actions
Priority 1 (Before Deployment):
1. [Action with file:line reference]
2. [Action with file:line reference]
Priority 2 (This Sprint):
1. [Action]
2. [Action]
Priority 3 (Backlog):
1. [Action]
2. [Action]
### Validation Commands
After applying fixes:
```bash
# Verify security contexts
kubectl get pods -n [namespace] -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].securityContext}{"\n"}{end}'
# Check for privileged pods
kubectl get pods -n [namespace] -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].securityContext.privileged}{"\n"}{end}'
# Verify network policies exist
kubectl get networkpolicies -n [namespace]
# Check RBAC
kubectl auth can-i --list -n [namespace]
```
## Decision Matrix
**When to block deployment:**
- Any CRITICAL findings in production
- Multiple HIGH findings in production
- Compliance requirement violations
**When to allow with warnings:**
- Only MEDIUM/LOW findings
- HIGH findings in dev/staging with remediation plan
**When to require re-review:**
- After fixing CRITICAL issues
- After major configuration changes
- Before production promotion


@@ -0,0 +1,14 @@
---
description: Configure Flatcar Linux-based cluster
argument-hint: Optional cluster requirements
---
You are initiating Flatcar Container Linux cluster setup. Use the flatcar-linux-expert agent.
If the user specifies requirements, pass them to the agent. Otherwise, ask for:
- Node configuration
- Ignition config requirements
- Update strategy
- Container runtime preference
Launch the flatcar-linux-expert agent to configure a Flatcar-based Kubernetes cluster.
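For the Ignition requirements, Flatcar machine configs are typically authored as Butane YAML and transpiled to Ignition JSON; a minimal sketch (the spec version and field values are assumptions to verify against the Butane/Flatcar docs):
```yaml
variant: flatcar
version: 1.0.0
passwd:
  users:
    - name: core
      ssh_authorized_keys:
        - ssh-ed25519 AAAA... admin@example.com   # placeholder key
storage:
  files:
    - path: /etc/hostname
      mode: 0644
      contents:
        inline: k8s-node-1
```
Transpile it with the `butane` CLI to produce the `.ign` file the node consumes at first boot.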


@@ -0,0 +1,342 @@
---
description: Setup GitOps CI/CD with ArgoCD or Flux
argument-hint: Optional GitOps tool preference
---
# GitOps CI/CD Setup
You are setting up GitOps-based continuous deployment using the k8s-cicd-engineer agent.
## Workflow
### 1. Choose GitOps Tool
If not specified, help user choose:
**ArgoCD** - Best for:
- UI-driven workflows
- Multi-cluster management
- RBAC and SSO integration
- Helm and Kustomize support
**Flux** - Best for:
- Pure GitOps (no UI needed)
- Kubernetes-native resources
- Helm controller integration
- Multi-tenancy
### 2. Gather Requirements
Ask for:
- **Git repository**:
- Repository URL
- Branch strategy (main, env branches, or directories)
- Authentication method (SSH key, token)
- **Applications**:
- List of applications to manage
- Manifest locations in repo
- Dependencies between apps
- **Environments**:
- dev, staging, production
- Separate clusters or namespaces
- **Sync policy**:
- Automatic or manual sync
- Auto-pruning resources
- Self-healing enabled
- **Progressive delivery**:
- Canary deployments
- Blue-green deployments
- Flagger integration
### 3. Install GitOps Tool
Launch **k8s-cicd-engineer** to install:
**For ArgoCD**:
```bash
kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
```
**For Flux**:
```bash
flux bootstrap github \
--owner=[org] \
--repository=[repo] \
--branch=main \
--path=clusters/production \
--personal
```
### 4. Configure Git Repository Access
**ArgoCD**:
```bash
argocd repo add https://github.com/org/repo \
--username [user] \
--password [token]
```
**Flux**:
- Flux bootstrap automatically creates deploy key
- Verify in GitHub Settings > Deploy keys
### 5. Create Application Definitions
**ArgoCD Application**:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: myapp
namespace: argocd
spec:
project: default
source:
repoURL: https://github.com/org/repo
targetRevision: HEAD
path: k8s/overlays/production
destination:
server: https://kubernetes.default.svc
namespace: production
syncPolicy:
automated:
prune: true
selfHeal: true
syncOptions:
- CreateNamespace=true
```
**Flux Kustomization**:
```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
name: myapp
namespace: flux-system
spec:
interval: 5m
path: ./k8s/overlays/production
prune: true
sourceRef:
kind: GitRepository
name: myapp
```
### 6. Setup App-of-Apps Pattern (Optional)
For managing multiple applications:
**ArgoCD**:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: apps
namespace: argocd
spec:
source:
path: argocd/applications
destination:
namespace: argocd
syncPolicy:
automated: {}
```
**Flux**: Use hierarchical Kustomizations
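A sketch of the hierarchical pattern: a parent Kustomization points at a directory that contains one Kustomization manifest per application (paths and names are placeholders):
```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps
  namespace: flux-system
spec:
  interval: 10m
  path: ./clusters/production/apps   # directory holding one Kustomization per app
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
```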
### 7. Configure Progressive Delivery (Optional)
If requested, install and configure Flagger:
```bash
# Add the Flagger Helm repository first
helm repo add flagger https://flagger.app
helm repo update
helm install flagger flagger/flagger \
  --namespace flagger-system \
  --create-namespace
```
Create Canary resource:
```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
name: myapp
spec:
targetRef:
apiVersion: apps/v1
kind: Deployment
name: myapp
analysis:
interval: 1m
threshold: 5
maxWeight: 50
stepWeight: 10
```
### 8. Setup Notifications
**ArgoCD**:
- Configure Slack/Teams webhooks
- Setup notification triggers
**Flux**:
- Configure notification-controller
- Create Alerts for Git events
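For Flux, a minimal Provider/Alert pair might look like the following (the API version and Secret name are assumptions; adjust to your notification-controller version):
```yaml
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Provider
metadata:
  name: slack
  namespace: flux-system
spec:
  type: slack
  channel: deployments
  secretRef:
    name: slack-webhook-url        # Secret holding the webhook address
---
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Alert
metadata:
  name: myapp-alerts
  namespace: flux-system
spec:
  providerRef:
    name: slack
  eventSeverity: info
  eventSources:
    - kind: Kustomization
      name: myapp
```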
### 9. Verify GitOps Workflow
1. Make change in Git repository
2. Commit and push
3. Verify automatic sync
4. Check application health
## Output Format
### GitOps Setup Summary
**GitOps Tool**: [ArgoCD/Flux]
**Version**: [version]
**Installation**: [namespace]
**Git Repository**:
- URL: [repo-url]
- Branch: [branch]
- Path: [path]
- Authentication: [Configured ✓]
**Applications Configured**:
1. [app-name]
- Source: [path]
- Destination: [namespace]
- Sync: [Auto/Manual]
- Status: [Synced/OutOfSync]
2. [app-name]
- Source: [path]
- Destination: [namespace]
- Sync: [Auto/Manual]
- Status: [Synced/OutOfSync]
**Access Information**:
- **ArgoCD UI**: https://argocd.[domain]
- Username: admin
- Password: [Use `kubectl get secret` to retrieve]
- **Flux**: `flux get all`
### Next Steps
**For ArgoCD**:
```bash
# Access UI
kubectl port-forward svc/argocd-server -n argocd 8080:443
# Get admin password
kubectl -n argocd get secret argocd-initial-admin-secret \
-o jsonpath="{.data.password}" | base64 -d
# Sync application
argocd app sync myapp
# Check status
argocd app list
```
**For Flux**:
```bash
# Check GitOps status
flux get all
# Reconcile immediately
flux reconcile source git myapp
flux reconcile kustomization myapp
# Check logs
flux logs
```
### Testing GitOps Workflow
1. **Make a change**:
```bash
git clone [repo]
cd [repo]
# Edit manifests
git add .
git commit -m "Update deployment replicas"
git push
```
2. **Watch sync** (ArgoCD):
```bash
argocd app wait myapp --sync
```
2. **Watch sync** (Flux):
```bash
flux reconcile kustomization myapp --with-source
watch flux get kustomizations
```
3. **Verify changes**:
```bash
kubectl get deployment myapp -n production
```
## Best Practices
**Repository Structure**:
```
repo/
├── base/ # Base manifests
│ ├── deployment.yaml
│ └── service.yaml
├── overlays/
│ ├── dev/ # Dev environment
│ ├── staging/ # Staging environment
│ └── production/ # Production environment
└── argocd/ # Application definitions
└── applications/
```
**Security**:
- Use SSH keys for Git access
- Enable RBAC in ArgoCD
- Encrypt secrets (Sealed Secrets, External Secrets)
- Review before auto-sync in production
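As one concrete option for encrypted secrets in Git, a Sealed Secrets workflow sketch (assumes the sealed-secrets controller and the `kubeseal` CLI are installed; names are placeholders):
```bash
# Create the Secret locally (never commit this file)
kubectl create secret generic myapp-secrets \
  --from-literal=DB_PASSWORD='changeme' \
  --dry-run=client -o yaml > secret.yaml
# Encrypt it against the controller's public key; the output is safe to commit
kubeseal --format yaml < secret.yaml > sealed-secret.yaml
rm secret.yaml
git add sealed-secret.yaml
```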
**Workflow**:
- Use pull requests for changes
- Require code review
- Test in dev/staging first
- Enable auto-sync only after testing
## Troubleshooting
**Application not syncing (ArgoCD)**:
```bash
# Check application status
argocd app get myapp
# Force sync
argocd app sync myapp --force
# Check events
kubectl get events -n argocd
```
**Kustomization failing (Flux)**:
```bash
# Check status
flux get kustomizations
# Check logs
flux logs --kind=Kustomization --name=myapp
# Force reconcile
flux reconcile kustomization myapp --with-source
```
**Git authentication failing**:
- Verify deploy key permissions (read/write)
- Check token hasn't expired
- Verify repository URL correct
- Check network policies allow Git access

commands/k8s-setup-talos.md

@@ -0,0 +1,216 @@
---
description: Configure Talos Linux-based cluster
argument-hint: Optional cluster requirements
---
# Talos Linux Cluster Setup
You are setting up a Kubernetes cluster on Talos Linux using the talos-linux-expert agent.
## Workflow
### 1. Gather Cluster Requirements
If not specified, ask for:
- **Node configuration**:
- Number of control plane nodes (1 or 3+ for HA)
- Number of worker nodes
- IP addresses for each node
- Hostnames
- **Network configuration**:
- Control plane endpoint (load balancer IP for HA)
- CNI preference (none/Cilium/Calico - recommend installing separately)
- Pod and service CIDR ranges
- **High availability**:
- Load balancer for control plane (required for HA)
- Distributed storage requirements
- **Talos version**: Latest stable or specific version
### 2. Generate Machine Configurations
Launch **talos-linux-expert** to generate configs:
```bash
talosctl gen config cluster-name https://[endpoint]:6443
```
This creates:
- `controlplane.yaml` - For control plane nodes
- `worker.yaml` - For worker nodes
- `talosconfig` - For talosctl client
### 3. Customize Configurations
Apply necessary patches for:
- **Network settings**: Static IPs, routes, VLANs
- **CNI**: Disable built-in CNI if using Cilium/Calico
- **Install disk**: Specify correct disk path
- **Certificate SANs**: Add load balancer IP/hostname
- **Cluster discovery**: Configure if needed
Example patch:
```yaml
machine:
network:
interfaces:
- interface: eth0
addresses:
- 192.168.1.10/24
routes:
- network: 0.0.0.0/0
gateway: 192.168.1.1
cluster:
network:
cni:
name: none # Install Cilium separately
```
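Patches like the one above can also be applied at generation time; a sketch (flag names per recent talosctl releases, verify against your version):
```bash
# Apply a common patch to all nodes, plus a control-plane-only patch
talosctl gen config my-cluster https://[endpoint]:6443 \
  --config-patch @patch.yaml \
  --config-patch-control-plane @patch-controlplane.yaml
```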
### 4. Apply Configurations to Nodes
For each node:
```bash
# Control plane nodes
talosctl apply-config --insecure --nodes [IP] --file controlplane.yaml
# Worker nodes
talosctl apply-config --insecure --nodes [IP] --file worker.yaml
```
Wait for nodes to boot and apply configurations.
### 5. Bootstrap Kubernetes
On first control plane node only:
```bash
talosctl bootstrap --nodes [first-controlplane-IP]
```
This initializes etcd and starts Kubernetes.
### 6. Retrieve kubeconfig
```bash
talosctl kubeconfig --nodes [controlplane-IP]
```
### 7. Verify Cluster
```bash
# Check Talos health
talosctl health --nodes [all-nodes]
# Check Kubernetes nodes
kubectl get nodes
# Verify etcd
talosctl etcd members --nodes [controlplane-IP]
```
### 8. Install CNI (if using Cilium/Calico)
If the CNI was set to `none`, launch **k8s-network-engineer** to install it:
```bash
helm install cilium cilium/cilium --namespace kube-system
```
### 9. Post-Installation Tasks
- Configure storage (if needed)
- Set up monitoring
- Apply security policies
- Configure backups (etcd snapshots)
## Output Format
### Talos Cluster Configuration Summary
**Cluster Information:**
- Name: [cluster-name]
- Talos Version: [version]
- Kubernetes Version: [version]
- Endpoint: https://[endpoint]:6443
**Control Plane Nodes:**
- [hostname]: [IP] - [status]
- [hostname]: [IP] - [status]
- [hostname]: [IP] - [status]
**Worker Nodes:**
- [hostname]: [IP] - [status]
- [hostname]: [IP] - [status]
**Network Configuration:**
- CNI: [Cilium/Calico/None]
- Pod CIDR: [range]
- Service CIDR: [range]
**Configuration Files:**
```
✓ controlplane.yaml - Apply to control plane nodes
✓ worker.yaml - Apply to worker nodes
✓ talosconfig - Configure talosctl client
```
### Next Steps
1. **Configure talosctl**:
```bash
export TALOSCONFIG=$PWD/talosconfig
talosctl config endpoint [controlplane-IPs]
talosctl config node [any-controlplane-IP]
```
2. **Verify cluster**:
```bash
kubectl get nodes
kubectl get pods -A
```
3. **Install CNI** (if needed):
```bash
helm install cilium cilium/cilium -n kube-system
```
4. **Deploy workloads**:
```bash
kubectl apply -f your-manifests/
```
### Useful talosctl Commands
```bash
# Check node status
talosctl dashboard --nodes [IP]
# View logs
talosctl logs --nodes [IP] kubelet
# Upgrade Talos
talosctl upgrade --nodes [IP] --image ghcr.io/siderolabs/installer:v1.6.0
# Upgrade Kubernetes
talosctl upgrade-k8s --nodes [IP] --to 1.29.0
# Restart services
talosctl restart kubelet --nodes [IP]
# etcd operations
talosctl etcd snapshot --nodes [IP]
```
## Troubleshooting
**Nodes not joining:**
- Verify network connectivity
- Check firewall rules (6443, 50000, 50001)
- Verify machine config applied correctly
**etcd not starting:**
- Ensure the bootstrap command was run only once (on a single control plane node)
- Check time synchronization
- Verify disk space
**CNI not working:**
- Verify CNI set to `none` in config
- Check Cilium/Calico installation
- Verify network policies not blocking