Initial commit

Zhongwei Li
2025-11-30 08:47:13 +08:00
commit 9529eaebeb
20 changed files with 3382 additions and 0 deletions

commands/k8s-deploy.md

@@ -0,0 +1,529 @@
---
description: Deploy to Kubernetes cluster
argument-hint: Optional deployment details
---
# Kubernetes Deployment
You are deploying applications to a Kubernetes cluster using the k8s-cluster-manager agent.
## Workflow
### 1. Gather Deployment Information
If not specified, ask for:
- **What to deploy**:
- Path to YAML manifests
- Helm chart name/path
- Kustomize directory
- Docker image (for quick deployment)
- **Target cluster**:
- Cluster context name
- Namespace (create it if it doesn't exist)
- Environment type (dev/staging/production)
- **Deployment strategy**:
- RollingUpdate (default, zero downtime)
- Recreate (stop old, start new)
- Blue-Green (switch service selector)
- Canary (gradual traffic shift)
- **Requirements** (see the sketch after this list):
- Resource requests/limits
- Replica count
- Health check configuration
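To make these requirements concrete, here is a minimal Deployment sketch; the name, image, port, and probe path are placeholders to adapt:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp                 # placeholder name
spec:
  replicas: 3                 # replica count
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: app
          image: registry.example.com/myapp:v1.2.3   # pin a tag; avoid :latest
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi
          readinessProbe:
            httpGet:
              path: /healthz                          # assumed health endpoint
              port: 8080
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 15
```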
### 2. Pre-Deployment Validation
Before deploying, verify:
**Cluster connectivity**:
```bash
kubectl cluster-info
kubectl get nodes
```
**Namespace exists (or create it)**:
```bash
kubectl get namespace [namespace]
# If it doesn't exist:
kubectl create namespace [namespace]
```
**Context verification**:
```bash
kubectl config current-context
# Switch if needed:
kubectl config use-context [cluster-name]
```
**Manifest validation** (for YAML files):
```bash
# Dry run to validate
kubectl apply -f [manifest.yaml] --dry-run=client
# Validate all files in directory
kubectl apply -f [directory]/ --dry-run=client
# Server-side validation
kubectl apply -f [manifest.yaml] --dry-run=server
```
### 3. Execute Deployment
Launch **k8s-cluster-manager** agent with deployment method:
#### Option A: Direct YAML Manifests
```bash
# Single file
kubectl apply -f deployment.yaml -n [namespace]
# Multiple files
kubectl apply -f deployment.yaml -f service.yaml -f ingress.yaml -n [namespace]
# Entire directory
kubectl apply -f k8s/ -n [namespace]
# Recursive directory
kubectl apply -f k8s/ -n [namespace] --recursive
```
#### Option B: Helm Chart
```bash
# Add repository (if needed)
helm repo add [repo-name] [repo-url]
helm repo update
# Install new release
helm install [release-name] [chart] -n [namespace] \
--create-namespace \
--set replicas=3 \
--set image.tag=v1.2.3 \
--values values.yaml
# Upgrade existing release
helm upgrade [release-name] [chart] -n [namespace] \
--reuse-values \
--set image.tag=v1.2.4
# Install or upgrade (idempotent)
helm upgrade --install [release-name] [chart] -n [namespace]
```
#### Option C: Kustomize
```bash
# Apply with kustomize
kubectl apply -k overlays/[environment]/ -n [namespace]
# Preview what will be applied
kubectl kustomize overlays/[environment]/
```
#### Option D: Quick Deployment (Image Only)
```bash
# Create deployment from image
kubectl create deployment [name] \
--image=[image:tag] \
--replicas=3 \
-n [namespace]
# Expose as service
kubectl expose deployment [name] \
--port=80 \
--target-port=8080 \
--type=LoadBalancer \
-n [namespace]
```
### 4. Monitor Deployment Progress
**Watch rollout status**:
```bash
# For Deployments
kubectl rollout status deployment/[name] -n [namespace]
# For StatefulSets
kubectl rollout status statefulset/[name] -n [namespace]
# For DaemonSets
kubectl rollout status daemonset/[name] -n [namespace]
```
**Watch pods coming up**:
```bash
# Watch pods in real-time
kubectl get pods -n [namespace] -w
# Watch with labels
kubectl get pods -n [namespace] -l app=[name] -w
# Detailed view
kubectl get pods -n [namespace] -o wide
```
**Check events**:
```bash
kubectl get events -n [namespace] \
--sort-by='.lastTimestamp' \
--watch
```
### 5. Verify Deployment Health
**Pod status checks**:
```bash
# All pods running?
kubectl get pods -n [namespace]
# Check specific deployment
kubectl get deployment [name] -n [namespace]
# Detailed pod info
kubectl describe pod [pod-name] -n [namespace]
```
**Health check verification**:
```bash
# Check if pods are ready
kubectl get pods -n [namespace] -o json | \
jq '.items[] | {name: .metadata.name, ready: .status.conditions[] | select(.type=="Ready") | .status}'
# Check readiness probes
kubectl describe pod [pod-name] -n [namespace] | grep -A5 "Readiness"
```
**Service connectivity**:
```bash
# Check service endpoints
kubectl get endpoints [service-name] -n [namespace]
# Describe service
kubectl describe service [service-name] -n [namespace]
# Test service from within cluster
kubectl run test-pod --image=curlimages/curl -i --rm -- \
curl http://[service-name].[namespace].svc.cluster.local
```
**Resource usage**:
```bash
# Pod resource usage
kubectl top pods -n [namespace]
# Specific deployment
kubectl top pods -n [namespace] -l app=[name]
```
### 6. Post-Deployment Validation
**Application health checks**:
```bash
# Check application logs
kubectl logs -n [namespace] deployment/[name] --tail=50
# Follow logs
kubectl logs -n [namespace] -f deployment/[name]
# Logs from all pods
kubectl logs -n [namespace] -l app=[name] --all-containers=true
```
**Ingress/Route verification** (if applicable):
```bash
# Check ingress
kubectl get ingress -n [namespace]
# Test external access
curl https://[domain]
```
**ConfigMap/Secret verification**:
```bash
# Verify ConfigMaps mounted
kubectl get configmap -n [namespace]
# Verify Secrets exist
kubectl get secrets -n [namespace]
```
### 7. Update Deployment Records
Document deployment details:
- Deployment timestamp
- Image versions deployed
- Configuration changes
- Any issues encountered
- Rollback plan (previous version info)
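A lightweight way to record the change cause on the cluster itself (so it appears under CHANGE-CAUSE in `kubectl rollout history`) is an annotation; a sketch with placeholder values:
```bash
# Record why this rollout happened; shows under CHANGE-CAUSE in rollout history
kubectl annotate deployment/[name] -n [namespace] \
  kubernetes.io/change-cause="Deploy [image:tag] - [reason]" --overwrite
# Confirm it was recorded
kubectl rollout history deployment/[name] -n [namespace]
```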
## Deployment Strategies
### Rolling Update (Default)
**Configuration**:
```yaml
spec:
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1 # Max pods above desired count
maxUnavailable: 0 # Max pods below desired count
```
**Deploy**:
```bash
kubectl set image deployment/[name] \
[container]=[image:new-tag] \
-n [namespace]
```
### Recreate Strategy
**Configuration**:
```yaml
spec:
strategy:
type: Recreate
```
**Use case**: When you can afford downtime or need to avoid version mixing
### Blue-Green Deployment
**Steps**:
```bash
# 1. Deploy green version
kubectl apply -f deployment-green.yaml -n [namespace]
# 2. Verify green is healthy
kubectl get pods -n [namespace] -l version=green
# 3. Switch service selector
kubectl patch service [name] -n [namespace] \
-p '{"spec":{"selector":{"version":"green"}}}'
# 4. Remove blue version
kubectl delete deployment [name]-blue -n [namespace]
```
### Canary Deployment
**Steps**:
```bash
# 1. Deploy canary with 1 replica
kubectl apply -f deployment-canary.yaml -n [namespace]
# 2. Monitor metrics (error rate, latency)
kubectl logs -n [namespace] -l version=canary
# 3. Gradually increase canary replicas
kubectl scale deployment [name]-canary --replicas=3 -n [namespace]
# 4. If successful, update main deployment
kubectl set image deployment/[name] [container]=[new-image] -n [namespace]
# 5. Remove canary
kubectl delete deployment [name]-canary -n [namespace]
```
## Output Format
### Deployment Summary
**Deployment Information**:
- **Name**: [deployment-name]
- **Namespace**: [namespace]
- **Environment**: [dev/staging/production]
- **Strategy**: [RollingUpdate/Recreate/Blue-Green/Canary]
- **Timestamp**: [YYYY-MM-DD HH:MM:SS UTC]
**Resources Deployed**:
```
Deployments:
✓ [name]: 3/3 replicas ready
- Image: [image:tag]
- CPU: 100m request, 500m limit
- Memory: 128Mi request, 512Mi limit
Services:
✓ [name]: ClusterIP 10.96.1.10:80 → 8080
✓ [name]-lb: LoadBalancer [external-ip]:80 → 8080
Ingress:
✓ [name]: https://[domain] → [service]:80
ConfigMaps:
✓ [name]-config
Secrets:
✓ [name]-secrets
```
**Health Status**:
- Pods: 3/3 Running
- Ready: 3/3
- Restarts: 0
- Age: 2m30s
**Access Information**:
- Internal: http://[service].[namespace].svc.cluster.local:80
- External: https://[domain]
- Load Balancer: http://[external-ip]:80
### Verification Commands
Run these commands to verify deployment:
```bash
# Check deployment status
kubectl get deployment [name] -n [namespace]
# Check pod health
kubectl get pods -n [namespace] -l app=[name]
# View logs
kubectl logs -n [namespace] -l app=[name] --tail=20
# Test service
kubectl run test --image=curlimages/curl -i --rm -- \
curl http://[service].[namespace].svc.cluster.local
# Check resource usage
kubectl top pods -n [namespace] -l app=[name]
```
### Rollback Information
If issues occur, rollback with:
```bash
# View rollout history
kubectl rollout history deployment/[name] -n [namespace]
# Rollback to previous version
kubectl rollout undo deployment/[name] -n [namespace]
# Rollback to specific revision
kubectl rollout undo deployment/[name] -n [namespace] --to-revision=[num]
```
**Previous Version**:
- Revision: [number]
- Image: [previous-image:tag]
- Change cause: [previous-deployment-reason]
## Troubleshooting
### Pods Not Starting
**ImagePullBackOff**:
```bash
# Check image pull errors
kubectl describe pod [pod-name] -n [namespace] | grep -A10 "Events:"
# Verify image exists
docker pull [image:tag]
# Check imagePullSecrets
kubectl get secrets -n [namespace]
```
**CrashLoopBackOff**:
```bash
# Check application logs
kubectl logs [pod-name] -n [namespace] --previous
# Check startup command
kubectl describe pod [pod-name] -n [namespace] | grep -A5 "Command:"
# Check resource limits
kubectl describe pod [pod-name] -n [namespace] | grep -A10 "Limits:"
```
**Pending Status**:
```bash
# Check why pod is pending
kubectl describe pod [pod-name] -n [namespace] | grep -A10 "Events:"
# Check node resources
kubectl top nodes
# Check PVC status (if using persistent volumes)
kubectl get pvc -n [namespace]
```
### Rollout Stuck
```bash
# Check rollout status
kubectl rollout status deployment/[name] -n [namespace]
# Check deployment events
kubectl describe deployment [name] -n [namespace]
# Check replica sets
kubectl get rs -n [namespace]
# Force rollout
kubectl rollout restart deployment/[name] -n [namespace]
```
### Service Not Accessible
```bash
# Check service selector matches pod labels
kubectl get service [name] -n [namespace] -o yaml | grep selector -A5
kubectl get pods -n [namespace] --show-labels
# Check endpoints
kubectl get endpoints [name] -n [namespace]
# Check network policies
kubectl get networkpolicies -n [namespace]
# Test from debug pod
kubectl run debug --image=nicolaka/netshoot -i --rm -- \
curl http://[service].[namespace].svc.cluster.local
```
### High Resource Usage
```bash
# Check resource usage
kubectl top pods -n [namespace]
# Check for OOMKilled
kubectl get pods -n [namespace] -o json | \
jq '.items[] | select(.status.containerStatuses[]?.lastState.terminated.reason=="OOMKilled") | .metadata.name'
# Increase resources
kubectl set resources deployment [name] -n [namespace] \
--limits=cpu=1000m,memory=1Gi \
--requests=cpu=200m,memory=256Mi
```
## Best Practices
**Pre-deployment**:
- Always use `--dry-run=client` first
- Test in dev/staging before production
- Review resource limits
- Verify image tags (avoid :latest in production)
**During deployment**:
- Monitor rollout status
- Watch logs for errors
- Check pod health continuously
- Verify endpoints are ready
**Post-deployment**:
- Document what was deployed
- Monitor for 10-15 minutes
- Keep previous version info for rollback
- Update monitoring dashboards
**Production deployments** (a PodDisruptionBudget/HPA sketch follows this list):
- Use blue-green or canary for critical services
- Set PodDisruptionBudgets
- Configure HorizontalPodAutoscaler
- Enable auto-rollback on failure
- Schedule during maintenance windows
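As a starting point for the PodDisruptionBudget and HorizontalPodAutoscaler items above, a minimal sketch; names and thresholds are placeholders to tune:
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb
spec:
  minAvailable: 2              # keep at least 2 pods during voluntary disruptions
  selector:
    matchLabels:
      app: myapp
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out above 70% average CPU
```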


@@ -0,0 +1,134 @@
---
description: Orchestrated end-to-end deployment workflow
argument-hint: Optional stack description
---
# Full-Stack Kubernetes Deployment
You are orchestrating a complete end-to-end Kubernetes deployment workflow using multiple specialized agents.
## Workflow
### 1. Gather Requirements
If the user hasn't specified details, gather:
- Application components and their relationships
- Dependencies (databases, caches, message queues, etc.)
- Target environment (dev/staging/production)
- Security and compliance requirements
- Monitoring and observability needs
- GitOps automation preferences (ArgoCD/Flux)
- Infrastructure platform (standard K8s, K3s, Talos, Flatcar)
### 2. Phase 1 - Configuration Generation
Launch the appropriate configuration agent(s):
- **k8s-config-developer**: For standard Kubernetes YAML manifests
- **helm-chart-developer**: If packaging as Helm chart
- **cdk8s-engineer**: If using code-based configuration
Pass complete requirements to generate:
- Application deployments/statefulsets
- Database statefulsets with persistence
- Service definitions
- Ingress configurations
- ConfigMaps and Secrets
- RBAC resources
### 3. Phase 2 - Security Review
Launch **k8s-security-reviewer** to analyze all generated configurations:
- Pod Security Standards compliance
- RBAC least privilege verification
- Network policy requirements
- Secret management practices
- Image security
- Resource limits and quotas
**Critical**: Address all critical and high-severity findings before proceeding.
### 4. Phase 3 - Deployment
Launch **k8s-cluster-manager** to deploy in proper sequence:
1. Deploy infrastructure layer (databases, caches)
2. Verify infrastructure health
3. Deploy application layer
4. Verify application health
5. Configure ingress and networking
Monitor rollout status and handle any failures with automatic rollback.
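A sketch of how this sequencing can look with plain kubectl (paths and resource names are placeholders; the agent may use Helm or Kustomize instead):
```bash
# 1. Infrastructure layer first (databases, caches)
kubectl apply -f k8s/infrastructure/ -n [namespace]
kubectl rollout status statefulset/[db-name] -n [namespace] --timeout=5m
# 2. Application layer once dependencies are healthy
kubectl apply -f k8s/application/ -n [namespace]
if ! kubectl rollout status deployment/[app-name] -n [namespace] --timeout=5m; then
  echo "Rollout failed - rolling back"
  kubectl rollout undo deployment/[app-name] -n [namespace]
fi
# 3. Ingress and networking last
kubectl apply -f k8s/ingress.yaml -n [namespace]
```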
### 5. Phase 4 - Monitoring Setup
Launch **k8s-monitoring-analyst** to:
- Configure Prometheus ServiceMonitors
- Create Grafana dashboards
- Set up alerts for critical metrics
- Establish baseline performance metrics
- Configure log aggregation
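As one example of what the agent would generate, a minimal Prometheus Operator ServiceMonitor; namespaces, labels, and the port name are placeholders:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myapp
  namespace: monitoring          # often the Prometheus namespace; depends on your setup
spec:
  namespaceSelector:
    matchNames:
      - production
  selector:
    matchLabels:
      app: myapp
  endpoints:
    - port: metrics              # must match a named port on the Service
      interval: 30s
      path: /metrics
```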
### 6. Phase 5 - GitOps Automation
Launch **k8s-cicd-engineer** to establish GitOps:
- Configure ArgoCD Application or Flux Kustomization
- Set up automatic sync policies
- Configure deployment notifications
- Establish progressive delivery if needed
## Output Format
Provide a comprehensive deployment report:
### Deployment Summary
- Environment: [environment]
- Namespace: [namespace]
- Components deployed: [list]
- Security review: [Pass/Issues addressed]
### Resources Created
```
Deployments:
- [name]: [replicas] replicas, image [image:tag]
StatefulSets:
- [name]: [replicas] replicas, [storage]
Services:
- [name]: [type], port [port]
Ingress:
- [domain]: → [service]:[port]
```
### Access Information
- Application URL: https://[domain]
- Monitoring: https://grafana.[domain]/d/[dashboard]
- GitOps: https://argocd.[domain]/applications/[app]
### Next Steps
1. Verify application at [URL]
2. Check monitoring dashboards
3. Review GitOps sync status
4. Test rollback procedure
### Validation Commands
```bash
kubectl get all -n [namespace]
kubectl logs -n [namespace] -l app=[name]
kubectl top pods -n [namespace]
```
## Troubleshooting
If deployment fails:
1. Check pod status: `kubectl get pods -n [namespace]`
2. Review events: `kubectl get events -n [namespace] --sort-by='.lastTimestamp'`
3. Check logs: `kubectl logs -n [namespace] [pod-name]`
4. Verify resources: `kubectl describe pod -n [namespace] [pod-name]`
If security review fails:
1. Review critical findings
2. Update configurations to address issues
3. Re-run security review
4. Proceed only when critical issues resolved


@@ -0,0 +1,184 @@
---
description: Security review of Kubernetes configurations
argument-hint: Optional configurations to review
---
# Kubernetes Security Review
You are conducting a comprehensive security review of Kubernetes configurations and deployments using the k8s-security-reviewer agent.
## Workflow
### 1. Identify Review Scope
Determine what needs to be reviewed:
- **New configurations**: YAML manifests before deployment
- **Existing deployments**: Running workloads in cluster
- **Helm charts**: Chart templates and values
- **Entire namespace**: All resources in a namespace
- **Cluster-wide**: Cluster roles, policies, admission controllers
If the user hasn't specified, ask for:
- Target configurations or namespace
- Environment criticality (dev/staging/production)
- Compliance requirements (CIS, PCI-DSS, SOC 2, HIPAA)
- Specific security concerns or focus areas
### 2. Gather Configuration Files
For file-based review:
- Use `Read` tool to access manifest files
- Use `Glob` to find all YAML files in directory
- Use `Bash` with `kubectl` to extract running configurations
For cluster review:
```bash
kubectl get all -n [namespace] -o yaml
kubectl get networkpolicies -n [namespace] -o yaml
kubectl get rolebindings,clusterrolebindings -o yaml
kubectl get pdb -n [namespace] -o yaml
# PodSecurityPolicy (psp) was removed in Kubernetes 1.25; check Pod Security Admission labels instead
kubectl get namespace [namespace] --show-labels
```
### 3. Launch Security Review Agent
Launch **k8s-security-reviewer** agent with:
- All configuration files or cluster export
- Environment context (production requires stricter standards)
- Compliance requirements
- Specific focus areas if any
### 4. Analyze Security Findings
The agent will assess:
- **Pod Security**: privileged containers, security contexts, capabilities
- **RBAC**: overly permissive roles, cluster-admin usage
- **Network Policies**: segmentation, default deny, egress control
- **Secrets Management**: hardcoded secrets, proper encryption
- **Image Security**: tag usage, registry sources, vulnerability scanning
- **Resource Limits**: DoS prevention, resource quotas
- **Admission Control**: PSS/PSP enforcement
### 5. Categorize Issues
Organize findings by severity:
**Critical** (Block deployment):
- Privileged containers in production
- Hardcoded secrets or credentials
- Missing network policies in production (see the default-deny sketch after these lists)
- Overly permissive RBAC (cluster-admin for apps)
**High** (Fix before deployment):
- Running as root
- Missing resource limits
- No Pod Disruption Budgets in production
- Missing security contexts
**Medium** (Address soon):
- Using :latest tag
- Missing readiness/liveness probes
- Insufficient RBAC granularity
**Low** (Best practice):
- Missing labels
- No pod anti-affinity
- Verbose logging
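For the network-policy findings above, remediation typically starts from a default-deny baseline plus explicit allows; a minimal sketch (namespace and port values are placeholders):
```yaml
# Deny all ingress and egress by default for the namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
spec:
  podSelector: {}                # selects every pod in the namespace
  policyTypes:
    - Ingress
    - Egress
---
# Then allow only what the application needs, e.g. ingress from the ingress controller
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ingress-controller
spec:
  podSelector:
    matchLabels:
      app: myapp
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx   # assumed ingress controller namespace
      ports:
        - protocol: TCP
          port: 8080
```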
### 6. Provide Remediation Guidance
For each critical and high finding:
1. Explain the security risk
2. Show the problematic configuration
3. Provide fixed configuration
4. Include verification steps
## Output Format
### Security Review Report
#### Executive Summary
- **Overall Risk Level**: [Critical/High/Medium/Low]
- **Critical Issues**: [count] - MUST fix before deployment
- **High Issues**: [count] - Fix before production
- **Medium Issues**: [count] - Address within sprint
- **Low Issues**: [count] - Best practice improvements
#### Critical Findings
**[CRITICAL] Privileged Container**
- **Location**: `deployment/myapp` container `app`
- **Risk**: Full host access, container escape, kernel exploits
- **Current Config**:
```yaml
securityContext:
privileged: true # DANGEROUS
```
- **Recommended Fix**:
```yaml
securityContext:
privileged: false
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
runAsNonRoot: true
runAsUser: 1000
capabilities:
drop: [ALL]
```
- **Verification**: `kubectl describe pod [pod] | grep "Privileged:"`
#### High Priority Findings
[Similar format for each high-priority issue]
#### Compliance Assessment
- **CIS Kubernetes Benchmark**: [Pass/Fail items]
- **Pod Security Standards**: [Baseline/Restricted]
- **Industry Requirements**: [Specific to requested compliance]
#### Recommended Actions
Priority 1 (Before Deployment):
1. [Action with file:line reference]
2. [Action with file:line reference]
Priority 2 (This Sprint):
1. [Action]
2. [Action]
Priority 3 (Backlog):
1. [Action]
2. [Action]
### Validation Commands
After applying fixes:
```bash
# Verify security contexts
kubectl get pods -n [namespace] -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].securityContext}{"\n"}{end}'
# Check for privileged pods
kubectl get pods -n [namespace] -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].securityContext.privileged}{"\n"}{end}'
# Verify network policies exist
kubectl get networkpolicies -n [namespace]
# Check RBAC
kubectl auth can-i --list -n [namespace]
```
## Decision Matrix
**When to block deployment:**
- Any CRITICAL findings in production
- Multiple HIGH findings in production
- Compliance requirement violations
**When to allow with warnings:**
- Only MEDIUM/LOW findings
- HIGH findings in dev/staging with remediation plan
**When to require re-review:**
- After fixing CRITICAL issues
- After major configuration changes
- Before production promotion


@@ -0,0 +1,14 @@
---
description: Configure Flatcar Linux-based cluster
argument-hint: Optional cluster requirements
---
You are initiating Flatcar Container Linux cluster setup. Use the flatcar-linux-expert agent.
If the user specifies requirements, pass them to the agent. Otherwise, ask for:
- Node configuration
- Ignition config requirements
- Update strategy
- Container runtime preference
Launch the flatcar-linux-expert agent to configure a Flatcar-based Kubernetes cluster.
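For the Ignition requirements, Flatcar machine configs are typically authored as Butane YAML and transpiled to Ignition JSON; a minimal sketch (the spec version and field values are assumptions to verify against the Butane/Flatcar docs):
```yaml
variant: flatcar
version: 1.0.0
passwd:
  users:
    - name: core
      ssh_authorized_keys:
        - ssh-ed25519 AAAA... admin@example.com   # placeholder key
storage:
  files:
    - path: /etc/hostname
      mode: 0644
      contents:
        inline: k8s-node-1
```
Transpile it with the `butane` CLI to produce the `.ign` file the node consumes at first boot.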


@@ -0,0 +1,342 @@
---
description: Setup GitOps CI/CD with ArgoCD or Flux
argument-hint: Optional GitOps tool preference
---
# GitOps CI/CD Setup
You are setting up GitOps-based continuous deployment using the k8s-cicd-engineer agent.
## Workflow
### 1. Choose GitOps Tool
If not specified, help user choose:
**ArgoCD** - Best for:
- UI-driven workflows
- Multi-cluster management
- RBAC and SSO integration
- Helm and Kustomize support
**Flux** - Best for:
- Pure GitOps (no UI needed)
- Kubernetes-native resources
- Helm controller integration
- Multi-tenancy
### 2. Gather Requirements
Ask for:
- **Git repository**:
- Repository URL
- Branch strategy (main, env branches, or directories)
- Authentication method (SSH key, token)
- **Applications**:
- List of applications to manage
- Manifest locations in repo
- Dependencies between apps
- **Environments**:
- dev, staging, production
- Separate clusters or namespaces
- **Sync policy**:
- Automatic or manual sync
- Auto-pruning resources
- Self-healing enabled
- **Progressive delivery**:
- Canary deployments
- Blue-green deployments
- Flagger integration
### 3. Install GitOps Tool
Launch **k8s-cicd-engineer** to install:
**For ArgoCD**:
```bash
kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
```
**For Flux**:
```bash
flux bootstrap github \
--owner=[org] \
--repository=[repo] \
--branch=main \
--path=clusters/production \
--personal
```
### 4. Configure Git Repository Access
**ArgoCD**:
```bash
argocd repo add https://github.com/org/repo \
--username [user] \
--password [token]
```
**Flux**:
- Flux bootstrap automatically creates deploy key
- Verify in GitHub Settings > Deploy keys
### 5. Create Application Definitions
**ArgoCD Application**:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: myapp
namespace: argocd
spec:
project: default
source:
repoURL: https://github.com/org/repo
targetRevision: HEAD
path: k8s/overlays/production
destination:
server: https://kubernetes.default.svc
namespace: production
syncPolicy:
automated:
prune: true
selfHeal: true
syncOptions:
- CreateNamespace=true
```
**Flux Kustomization**:
```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
name: myapp
namespace: flux-system
spec:
interval: 5m
path: ./k8s/overlays/production
prune: true
sourceRef:
kind: GitRepository
name: myapp
```
### 6. Setup App-of-Apps Pattern (Optional)
For managing multiple applications:
**ArgoCD**:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: apps
namespace: argocd
spec:
source:
path: argocd/applications
destination:
namespace: argocd
syncPolicy:
automated: {}
```
**Flux**: Use hierarchical Kustomizations
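A sketch of the hierarchical pattern: a parent Kustomization points at a directory that contains one Kustomization manifest per application (paths and names are placeholders):
```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps
  namespace: flux-system
spec:
  interval: 10m
  path: ./clusters/production/apps   # directory holding one Kustomization per app
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
```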
### 7. Configure Progressive Delivery (Optional)
If requested, install and configure Flagger:
```bash
# Add the Flagger Helm repository first
helm repo add flagger https://flagger.app
helm repo update
helm install flagger flagger/flagger \
  --namespace flagger-system \
  --create-namespace
```
Create Canary resource:
```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
name: myapp
spec:
targetRef:
apiVersion: apps/v1
kind: Deployment
name: myapp
analysis:
interval: 1m
threshold: 5
maxWeight: 50
stepWeight: 10
```
### 8. Setup Notifications
**ArgoCD**:
- Configure Slack/Teams webhooks
- Setup notification triggers
**Flux**:
- Configure notification-controller
- Create Alerts for Git events
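For Flux, a minimal Provider/Alert pair might look like the following (the API version and Secret name are assumptions; adjust to your notification-controller version):
```yaml
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Provider
metadata:
  name: slack
  namespace: flux-system
spec:
  type: slack
  channel: deployments
  secretRef:
    name: slack-webhook-url        # Secret holding the webhook address
---
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Alert
metadata:
  name: myapp-alerts
  namespace: flux-system
spec:
  providerRef:
    name: slack
  eventSeverity: info
  eventSources:
    - kind: Kustomization
      name: myapp
```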
### 9. Verify GitOps Workflow
1. Make change in Git repository
2. Commit and push
3. Verify automatic sync
4. Check application health
## Output Format
### GitOps Setup Summary
**GitOps Tool**: [ArgoCD/Flux]
**Version**: [version]
**Installation**: [namespace]
**Git Repository**:
- URL: [repo-url]
- Branch: [branch]
- Path: [path]
- Authentication: [Configured ✓]
**Applications Configured**:
1. [app-name]
- Source: [path]
- Destination: [namespace]
- Sync: [Auto/Manual]
- Status: [Synced/OutOfSync]
2. [app-name]
- Source: [path]
- Destination: [namespace]
- Sync: [Auto/Manual]
- Status: [Synced/OutOfSync]
**Access Information**:
- **ArgoCD UI**: https://argocd.[domain]
- Username: admin
- Password: [Use `kubectl get secret` to retrieve]
- **Flux**: `flux get all`
### Next Steps
**For ArgoCD**:
```bash
# Access UI
kubectl port-forward svc/argocd-server -n argocd 8080:443
# Get admin password
kubectl -n argocd get secret argocd-initial-admin-secret \
-o jsonpath="{.data.password}" | base64 -d
# Sync application
argocd app sync myapp
# Check status
argocd app list
```
**For Flux**:
```bash
# Check GitOps status
flux get all
# Reconcile immediately
flux reconcile source git myapp
flux reconcile kustomization myapp
# Check logs
flux logs
```
### Testing GitOps Workflow
1. **Make a change**:
```bash
git clone [repo]
cd [repo]
# Edit manifests
git add .
git commit -m "Update deployment replicas"
git push
```
2. **Watch sync** (ArgoCD):
```bash
argocd app wait myapp --sync
```
2. **Watch sync** (Flux):
```bash
flux reconcile kustomization myapp --with-source
watch flux get kustomizations
```
3. **Verify changes**:
```bash
kubectl get deployment myapp -n production
```
## Best Practices
**Repository Structure**:
```
repo/
├── base/ # Base manifests
│ ├── deployment.yaml
│ └── service.yaml
├── overlays/
│ ├── dev/ # Dev environment
│ ├── staging/ # Staging environment
│ └── production/ # Production environment
└── argocd/ # Application definitions
└── applications/
```
**Security**:
- Use SSH keys for Git access
- Enable RBAC in ArgoCD
- Encrypt secrets (Sealed Secrets, External Secrets)
- Review before auto-sync in production
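As one concrete option for encrypted secrets in Git, a Sealed Secrets workflow sketch (assumes the sealed-secrets controller and the `kubeseal` CLI are installed; names are placeholders):
```bash
# Create the Secret locally (never commit this file)
kubectl create secret generic myapp-secrets \
  --from-literal=DB_PASSWORD='changeme' \
  --dry-run=client -o yaml > secret.yaml
# Encrypt it against the controller's public key; the output is safe to commit
kubeseal --format yaml < secret.yaml > sealed-secret.yaml
rm secret.yaml
git add sealed-secret.yaml
```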
**Workflow**:
- Use pull requests for changes
- Require code review
- Test in dev/staging first
- Enable auto-sync only after testing
## Troubleshooting
**Application not syncing (ArgoCD)**:
```bash
# Check application status
argocd app get myapp
# Force sync
argocd app sync myapp --force
# Check events
kubectl get events -n argocd
```
**Kustomization failing (Flux)**:
```bash
# Check status
flux get kustomizations
# Check logs
flux logs --kind=Kustomization --name=myapp
# Force reconcile
flux reconcile kustomization myapp --with-source
```
**Git authentication failing**:
- Verify deploy key permissions (read/write)
- Check token hasn't expired
- Verify repository URL correct
- Check network policies allow Git access

commands/k8s-setup-talos.md

@@ -0,0 +1,216 @@
---
description: Configure Talos Linux-based cluster
argument-hint: Optional cluster requirements
---
# Talos Linux Cluster Setup
You are setting up a Kubernetes cluster on Talos Linux using the talos-linux-expert agent.
## Workflow
### 1. Gather Cluster Requirements
If not specified, ask for:
- **Node configuration**:
- Number of control plane nodes (1 or 3+ for HA)
- Number of worker nodes
- IP addresses for each node
- Hostnames
- **Network configuration**:
- Control plane endpoint (load balancer IP for HA)
- CNI preference (none/Cilium/Calico - recommend installing separately)
- Pod and service CIDR ranges
- **High availability**:
- Load balancer for control plane (required for HA)
- Distributed storage requirements
- **Talos version**: Latest stable or specific version
### 2. Generate Machine Configurations
Launch **talos-linux-expert** to generate configs:
```bash
talosctl gen config cluster-name https://[endpoint]:6443
```
This creates:
- `controlplane.yaml` - For control plane nodes
- `worker.yaml` - For worker nodes
- `talosconfig` - For talosctl client
### 3. Customize Configurations
Apply necessary patches for:
- **Network settings**: Static IPs, routes, VLANs
- **CNI**: Disable built-in CNI if using Cilium/Calico
- **Install disk**: Specify correct disk path
- **Certificate SANs**: Add load balancer IP/hostname
- **Cluster discovery**: Configure if needed
Example patch:
```yaml
machine:
network:
interfaces:
- interface: eth0
addresses:
- 192.168.1.10/24
routes:
- network: 0.0.0.0/0
gateway: 192.168.1.1
cluster:
network:
cni:
name: none # Install Cilium separately
```
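Patches like the one above can also be applied at generation time; a sketch (flag names per recent talosctl releases, verify against your version):
```bash
# Apply a common patch to all nodes, plus a control-plane-only patch
talosctl gen config my-cluster https://[endpoint]:6443 \
  --config-patch @patch.yaml \
  --config-patch-control-plane @patch-controlplane.yaml
```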
### 4. Apply Configurations to Nodes
For each node:
```bash
# Control plane nodes
talosctl apply-config --insecure --nodes [IP] --file controlplane.yaml
# Worker nodes
talosctl apply-config --insecure --nodes [IP] --file worker.yaml
```
Wait for nodes to boot and apply configurations.
### 5. Bootstrap Kubernetes
On first control plane node only:
```bash
talosctl bootstrap --nodes [first-controlplane-IP]
```
This initializes etcd and starts Kubernetes.
### 6. Retrieve kubeconfig
```bash
talosctl kubeconfig --nodes [controlplane-IP]
```
### 7. Verify Cluster
```bash
# Check Talos health
talosctl health --nodes [all-nodes]
# Check Kubernetes nodes
kubectl get nodes
# Verify etcd
talosctl etcd members --nodes [controlplane-IP]
```
### 8. Install CNI (if using Cilium/Calico)
If the CNI was set to `none`, launch **k8s-network-engineer** to install it:
```bash
helm install cilium cilium/cilium --namespace kube-system
```
### 9. Post-Installation Tasks
- Configure storage (if needed)
- Set up monitoring
- Apply security policies
- Configure backups (etcd snapshots)
## Output Format
### Talos Cluster Configuration Summary
**Cluster Information:**
- Name: [cluster-name]
- Talos Version: [version]
- Kubernetes Version: [version]
- Endpoint: https://[endpoint]:6443
**Control Plane Nodes:**
- [hostname]: [IP] - [status]
- [hostname]: [IP] - [status]
- [hostname]: [IP] - [status]
**Worker Nodes:**
- [hostname]: [IP] - [status]
- [hostname]: [IP] - [status]
**Network Configuration:**
- CNI: [Cilium/Calico/None]
- Pod CIDR: [range]
- Service CIDR: [range]
**Configuration Files:**
```
✓ controlplane.yaml - Apply to control plane nodes
✓ worker.yaml - Apply to worker nodes
✓ talosconfig - Configure talosctl client
```
### Next Steps
1. **Configure talosctl**:
```bash
export TALOSCONFIG=$PWD/talosconfig
talosctl config endpoint [controlplane-IPs]
talosctl config node [any-controlplane-IP]
```
2. **Verify cluster**:
```bash
kubectl get nodes
kubectl get pods -A
```
3. **Install CNI** (if needed):
```bash
helm install cilium cilium/cilium -n kube-system
```
4. **Deploy workloads**:
```bash
kubectl apply -f your-manifests/
```
### Useful talosctl Commands
```bash
# Check node status
talosctl dashboard --nodes [IP]
# View logs
talosctl logs --nodes [IP] kubelet
# Upgrade Talos
talosctl upgrade --nodes [IP] --image ghcr.io/siderolabs/installer:v1.6.0
# Upgrade Kubernetes
talosctl upgrade-k8s --nodes [IP] --to 1.29.0
# Restart services
talosctl restart kubelet --nodes [IP]
# etcd operations
talosctl etcd snapshot --nodes [IP]
```
## Troubleshooting
**Nodes not joining:**
- Verify network connectivity
- Check firewall rules (6443, 50000, 50001)
- Verify machine config applied correctly
**etcd not starting:**
- Ensure the bootstrap command was run only once (on a single control plane node)
- Check time synchronization
- Verify disk space
**CNI not working:**
- Verify CNI set to `none` in config
- Check Cilium/Calico installation
- Verify network policies not blocking