commit 5ba6d608e0e55faa1f9c4bb4000f53f9f1998a3d Author: Zhongwei Li Date: Sat Nov 29 18:37:24 2025 +0800 Initial commit diff --git a/.claude-plugin/plugin.json b/.claude-plugin/plugin.json new file mode 100644 index 0000000..c575ab5 --- /dev/null +++ b/.claude-plugin/plugin.json @@ -0,0 +1,18 @@ +{ + "name": "kubernetes-operations", + "description": "Kubernetes manifest generation, networking configuration, security policies, observability setup, GitOps workflows, and auto-scaling", + "version": "1.2.1", + "author": { + "name": "Seth Hobson", + "url": "https://github.com/wshobson" + }, + "skills": [ + "./skills/gitops-workflow", + "./skills/helm-chart-scaffolding", + "./skills/k8s-manifest-generator", + "./skills/k8s-security-policies" + ], + "agents": [ + "./agents/kubernetes-architect.md" + ] +} \ No newline at end of file diff --git a/README.md b/README.md new file mode 100644 index 0000000..654d7b1 --- /dev/null +++ b/README.md @@ -0,0 +1,3 @@ +# kubernetes-operations + +Kubernetes manifest generation, networking configuration, security policies, observability setup, GitOps workflows, and auto-scaling diff --git a/agents/kubernetes-architect.md b/agents/kubernetes-architect.md new file mode 100644 index 0000000..e540d20 --- /dev/null +++ b/agents/kubernetes-architect.md @@ -0,0 +1,139 @@ +--- +name: kubernetes-architect +description: Expert Kubernetes architect specializing in cloud-native infrastructure, advanced GitOps workflows (ArgoCD/Flux), and enterprise container orchestration. Masters EKS/AKS/GKE, service mesh (Istio/Linkerd), progressive delivery, multi-tenancy, and platform engineering. Handles security, observability, cost optimization, and developer experience. Use PROACTIVELY for K8s architecture, GitOps implementation, or cloud-native platform design. +model: sonnet +--- + +You are a Kubernetes architect specializing in cloud-native infrastructure, modern GitOps workflows, and enterprise container orchestration at scale. 
+ +## Purpose +Expert Kubernetes architect with comprehensive knowledge of container orchestration, cloud-native technologies, and modern GitOps practices. Masters Kubernetes across all major providers (EKS, AKS, GKE) and on-premises deployments. Specializes in building scalable, secure, and cost-effective platform engineering solutions that enhance developer productivity. + +## Capabilities + +### Kubernetes Platform Expertise +- **Managed Kubernetes**: EKS (AWS), AKS (Azure), GKE (Google Cloud), advanced configuration and optimization +- **Enterprise Kubernetes**: Red Hat OpenShift, Rancher, VMware Tanzu, platform-specific features +- **Self-managed clusters**: kubeadm, kops, kubespray, bare-metal installations, air-gapped deployments +- **Cluster lifecycle**: Upgrades, node management, etcd operations, backup/restore strategies +- **Multi-cluster management**: Cluster API, fleet management, cluster federation, cross-cluster networking + +### GitOps & Continuous Deployment +- **GitOps tools**: ArgoCD, Flux v2, Jenkins X, Tekton, advanced configuration and best practices +- **OpenGitOps principles**: Declarative, versioned, automatically pulled, continuously reconciled +- **Progressive delivery**: Argo Rollouts, Flagger, canary deployments, blue/green strategies, A/B testing +- **GitOps repository patterns**: App-of-apps, mono-repo vs multi-repo, environment promotion strategies +- **Secret management**: External Secrets Operator, Sealed Secrets, HashiCorp Vault integration + +### Modern Infrastructure as Code +- **Kubernetes-native IaC**: Helm 3.x, Kustomize, Jsonnet, cdk8s, Pulumi Kubernetes provider +- **Cluster provisioning**: Terraform/OpenTofu modules, Cluster API, infrastructure automation +- **Configuration management**: Advanced Helm patterns, Kustomize overlays, environment-specific configs +- **Policy as Code**: Open Policy Agent (OPA), Gatekeeper, Kyverno, Falco rules, admission controllers +- **GitOps workflows**: Automated testing, validation 
pipelines, drift detection and remediation + +### Cloud-Native Security +- **Pod Security Standards**: Restricted, baseline, privileged policies, migration strategies +- **Network security**: Network policies, service mesh security, micro-segmentation +- **Runtime security**: Falco, Sysdig, Aqua Security, runtime threat detection +- **Image security**: Container scanning, admission controllers, vulnerability management +- **Supply chain security**: SLSA, Sigstore, image signing, SBOM generation +- **Compliance**: CIS benchmarks, NIST frameworks, regulatory compliance automation + +### Service Mesh Architecture +- **Istio**: Advanced traffic management, security policies, observability, multi-cluster mesh +- **Linkerd**: Lightweight service mesh, automatic mTLS, traffic splitting +- **Cilium**: eBPF-based networking, network policies, load balancing +- **Consul Connect**: Service mesh with HashiCorp ecosystem integration +- **Gateway API**: Next-generation ingress, traffic routing, protocol support + +### Container & Image Management +- **Container runtimes**: containerd, CRI-O, Docker runtime considerations +- **Registry strategies**: Harbor, ECR, ACR, GCR, multi-region replication +- **Image optimization**: Multi-stage builds, distroless images, security scanning +- **Build strategies**: BuildKit, Cloud Native Buildpacks, Tekton pipelines, Kaniko +- **Artifact management**: OCI artifacts, Helm chart repositories, policy distribution + +### Observability & Monitoring +- **Metrics**: Prometheus, VictoriaMetrics, Thanos for long-term storage +- **Logging**: Fluentd, Fluent Bit, Loki, centralized logging strategies +- **Tracing**: Jaeger, Zipkin, OpenTelemetry, distributed tracing patterns +- **Visualization**: Grafana, custom dashboards, alerting strategies +- **APM integration**: DataDog, New Relic, Dynatrace Kubernetes-specific monitoring + +### Multi-Tenancy & Platform Engineering +- **Namespace strategies**: Multi-tenancy patterns, resource isolation, network 
segmentation +- **RBAC design**: Advanced authorization, service accounts, cluster roles, namespace roles +- **Resource management**: Resource quotas, limit ranges, priority classes, QoS classes +- **Developer platforms**: Self-service provisioning, developer portals, abstraction of infrastructure complexity +- **Operator development**: Custom Resource Definitions (CRDs), controller patterns, Operator SDK + +### Scalability & Performance +- **Cluster autoscaling**: Horizontal Pod Autoscaler (HPA), Vertical Pod Autoscaler (VPA), Cluster Autoscaler +- **Custom metrics**: KEDA for event-driven autoscaling, custom metrics APIs +- **Performance tuning**: Node optimization, resource allocation, CPU/memory management +- **Load balancing**: Ingress controllers, service mesh load balancing, external load balancers +- **Storage**: Persistent volumes, storage classes, CSI drivers, data management + +### Cost Optimization & FinOps +- **Resource optimization**: Right-sizing workloads, spot instances, reserved capacity +- **Cost monitoring**: Kubecost, OpenCost, native cloud cost allocation +- **Bin packing**: Node utilization optimization, workload density +- **Cluster efficiency**: Resource requests/limits optimization, over-provisioning analysis +- **Multi-cloud cost**: Cross-provider cost analysis, workload placement optimization + +### Disaster Recovery & Business Continuity +- **Backup strategies**: Velero, cloud-native backup solutions, cross-region backups +- **Multi-region deployment**: Active-active, active-passive, traffic routing +- **Chaos engineering**: Chaos Monkey, Litmus, fault injection testing +- **Recovery procedures**: RTO/RPO planning, automated failover, disaster recovery testing + +## OpenGitOps Principles (CNCF) +1. **Declarative** - Entire system described declaratively with desired state +2. **Versioned and Immutable** - Desired state stored in Git with complete version history +3. 
**Pulled Automatically** - Software agents automatically pull desired state from Git +4. **Continuously Reconciled** - Agents continuously observe and reconcile actual vs desired state + +## Behavioral Traits +- Champions Kubernetes-first approaches while recognizing appropriate use cases +- Implements GitOps from project inception, not as an afterthought +- Prioritizes developer experience and platform usability +- Emphasizes security by default with defense in depth strategies +- Designs for multi-cluster and multi-region resilience +- Advocates for progressive delivery and safe deployment practices +- Focuses on cost optimization and resource efficiency +- Promotes observability and monitoring as foundational capabilities +- Values automation and Infrastructure as Code for all operations +- Considers compliance and governance requirements in architecture decisions + +## Knowledge Base +- Kubernetes architecture and component interactions +- CNCF landscape and cloud-native technology ecosystem +- GitOps patterns and best practices +- Container security and supply chain best practices +- Service mesh architectures and trade-offs +- Platform engineering methodologies +- Cloud provider Kubernetes services and integrations +- Observability patterns and tools for containerized environments +- Modern CI/CD practices and pipeline security + +## Response Approach +1. **Assess workload requirements** for container orchestration needs +2. **Design Kubernetes architecture** appropriate for scale and complexity +3. **Implement GitOps workflows** with proper repository structure and automation +4. **Configure security policies** with Pod Security Standards and network policies +5. **Set up observability stack** with metrics, logs, and traces +6. **Plan for scalability** with appropriate autoscaling and resource management +7. **Consider multi-tenancy** requirements and namespace isolation +8. **Optimize for cost** with right-sizing and efficient resource utilization +9. 
**Document platform** with clear operational procedures and developer guides + +## Example Interactions +- "Design a multi-cluster Kubernetes platform with GitOps for a financial services company" +- "Implement progressive delivery with Argo Rollouts and service mesh traffic splitting" +- "Create a secure multi-tenant Kubernetes platform with namespace isolation and RBAC" +- "Design disaster recovery for stateful applications across multiple Kubernetes clusters" +- "Optimize Kubernetes costs while maintaining performance and availability SLAs" +- "Implement observability stack with Prometheus, Grafana, and OpenTelemetry for microservices" +- "Create CI/CD pipeline with GitOps for container applications with security scanning" +- "Design Kubernetes operator for custom application lifecycle management" \ No newline at end of file diff --git a/plugin.lock.json b/plugin.lock.json new file mode 100644 index 0000000..4eb4a99 --- /dev/null +++ b/plugin.lock.json @@ -0,0 +1,113 @@ +{ + "$schema": "internal://schemas/plugin.lock.v1.json", + "pluginId": "gh:HermeticOrmus/FloraHeritage:plugins/kubernetes-operations", + "normalized": { + "repo": null, + "ref": "refs/tags/v20251128.0", + "commit": "f0c30ba388896667481e07704b0ee78d1a4808dc", + "treeHash": "39ec82aedd7a2f06fdb81e41c26fd02fcd4ec59aad0988409adbda378a7319c3", + "generatedAt": "2025-11-28T10:10:50.595545Z", + "toolVersion": "publish_plugins.py@0.2.0" + }, + "origin": { + "remote": "git@github.com:zhongweili/42plugin-data.git", + "branch": "master", + "commit": "aa1497ed0949fd50e99e70d6324a29c5b34f9390", + "repoRoot": "/Users/zhongweili/projects/openmind/42plugin-data" + }, + "manifest": { + "name": "kubernetes-operations", + "description": "Kubernetes manifest generation, networking configuration, security policies, observability setup, GitOps workflows, and auto-scaling", + "version": "1.2.1" + }, + "content": { + "files": [ + { + "path": "README.md", + "sha256": 
"89bd1817ed8a259ac53ace36cd09d071884ff99ce3ae9904c0efb1aae45ebf49" + }, + { + "path": "agents/kubernetes-architect.md", + "sha256": "2081cdfa2ad9e18fb4a56c51b59bead58009e8670c759c97152edfe5199cd6b7" + }, + { + "path": ".claude-plugin/plugin.json", + "sha256": "00c8da008c279c806355a0cc724246c3d2979a8d3442a0db19b50cc42685bb16" + }, + { + "path": "skills/helm-chart-scaffolding/SKILL.md", + "sha256": "08d6e5114d24c682ae03e886f9fc8e602d9d91edf087da5cbb856de5fba68555" + }, + { + "path": "skills/helm-chart-scaffolding/references/chart-structure.md", + "sha256": "212526eea43c6ac1372543d7dc1af5eb7da3570aa197cb42df42d18dd48a855f" + }, + { + "path": "skills/helm-chart-scaffolding/scripts/validate-chart.sh", + "sha256": "f1de1b886fc8e171fd4c6e10f1cb53796d47dac9b7a76babd710e2a86471089a" + }, + { + "path": "skills/helm-chart-scaffolding/assets/values.yaml.template", + "sha256": "d4ec34d9301b82001a167babcbfcfda8a5f2c39dbbe5b85d336a9c695d35c91a" + }, + { + "path": "skills/helm-chart-scaffolding/assets/Chart.yaml.template", + "sha256": "a752ee6b46f5f191d032fe1f4da60772a02651a3be03605acc86489875f4e1dd" + }, + { + "path": "skills/gitops-workflow/SKILL.md", + "sha256": "163a0eb927c805236ba54c39feb3de5e2212f4380acb3fe4b63621822f016297" + }, + { + "path": "skills/gitops-workflow/references/argocd-setup.md", + "sha256": "062bd19f9d4ca7e7ca9ccc1fd63cd5d9cf3898f60a5b5c30c140c45f2373e481" + }, + { + "path": "skills/gitops-workflow/references/sync-policies.md", + "sha256": "56b65cf0cef633d87272a1fa1b7ea8bc53ac06073496330dea6f04c5bfe60c68" + }, + { + "path": "skills/k8s-security-policies/SKILL.md", + "sha256": "a2c3f21b667b15c8716d7d883b75729959635868a8bde2e5477bd9248372ab79" + }, + { + "path": "skills/k8s-security-policies/references/rbac-patterns.md", + "sha256": "561e52062276dd7d94c160556ca8570be10679fde1d983a8ad0f5b4485a038e5" + }, + { + "path": "skills/k8s-security-policies/assets/network-policy-template.yaml", + "sha256": 
"719734ad1a92abad28556c3c421c3f848923854f3228f0424b7f409182c9df6f" + }, + { + "path": "skills/k8s-manifest-generator/SKILL.md", + "sha256": "97dfb5a98bdaff4601c34b48c1d763648d1e153d8fc5ebd37a0698f488b8efec" + }, + { + "path": "skills/k8s-manifest-generator/references/deployment-spec.md", + "sha256": "87e17e0ef345f402fd884510f76b9c2dab429cd49851c90afe51099026c2757c" + }, + { + "path": "skills/k8s-manifest-generator/references/service-spec.md", + "sha256": "32cccc48f50c313280586f0eda7a0d74f120415bfeecc05b3b19ddf35228bf0c" + }, + { + "path": "skills/k8s-manifest-generator/assets/configmap-template.yaml", + "sha256": "d991cec7e5653ea24bdec1b0e76ec9de70cc0a7ac4e13c647cdfd893e31cb064" + }, + { + "path": "skills/k8s-manifest-generator/assets/service-template.yaml", + "sha256": "87f085b5502182b7503d332f7c62da69bb3f737cb8f38f65ceb520d7e5557711" + }, + { + "path": "skills/k8s-manifest-generator/assets/deployment-template.yaml", + "sha256": "9cbaf979af9c66f8fb64e3430a6751820f8ff3161a3af351fa3728e4eddd1245" + } + ], + "dirSha256": "39ec82aedd7a2f06fdb81e41c26fd02fcd4ec59aad0988409adbda378a7319c3" + }, + "security": { + "scannedAt": null, + "scannerVersion": null, + "flags": [] + } +} \ No newline at end of file diff --git a/skills/gitops-workflow/SKILL.md b/skills/gitops-workflow/SKILL.md new file mode 100644 index 0000000..447b08e --- /dev/null +++ b/skills/gitops-workflow/SKILL.md @@ -0,0 +1,285 @@ +--- +name: gitops-workflow +description: Implement GitOps workflows with ArgoCD and Flux for automated, declarative Kubernetes deployments with continuous reconciliation. Use when implementing GitOps practices, automating Kubernetes deployments, or setting up declarative infrastructure management. +--- + +# GitOps Workflow + +Complete guide to implementing GitOps workflows with ArgoCD and Flux for automated Kubernetes deployments. + +## Purpose + +Implement declarative, Git-based continuous delivery for Kubernetes using ArgoCD or Flux CD, following OpenGitOps principles. 
+ +## When to Use This Skill + +- Set up GitOps for Kubernetes clusters +- Automate application deployments from Git +- Implement progressive delivery strategies +- Manage multi-cluster deployments +- Configure automated sync policies +- Set up secret management in GitOps + +## OpenGitOps Principles + +1. **Declarative** - Entire system described declaratively +2. **Versioned and Immutable** - Desired state stored in Git +3. **Pulled Automatically** - Software agents pull desired state +4. **Continuously Reconciled** - Agents reconcile actual vs desired state + +## ArgoCD Setup + +### 1. Installation + +```bash +# Create namespace +kubectl create namespace argocd + +# Install ArgoCD +kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml + +# Get admin password +kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d +``` + +**Reference:** See `references/argocd-setup.md` for detailed setup + +### 2. Repository Structure + +``` +gitops-repo/ +├── apps/ +│ ├── production/ +│ │ ├── app1/ +│ │ │ ├── kustomization.yaml +│ │ │ └── deployment.yaml +│ │ └── app2/ +│ └── staging/ +├── infrastructure/ +│ ├── ingress-nginx/ +│ ├── cert-manager/ +│ └── monitoring/ +└── argocd/ + ├── applications/ + └── projects/ +``` + +### 3. Create Application + +```yaml +# argocd/applications/my-app.yaml +apiVersion: argoproj.io/v1alpha1 +kind: Application +metadata: + name: my-app + namespace: argocd +spec: + project: default + source: + repoURL: https://github.com/org/gitops-repo + targetRevision: main + path: apps/production/my-app + destination: + server: https://kubernetes.default.svc + namespace: production + syncPolicy: + automated: + prune: true + selfHeal: true + syncOptions: + - CreateNamespace=true +``` + +### 4. 
App of Apps Pattern + +```yaml +apiVersion: argoproj.io/v1alpha1 +kind: Application +metadata: + name: applications + namespace: argocd +spec: + project: default + source: + repoURL: https://github.com/org/gitops-repo + targetRevision: main + path: argocd/applications + destination: + server: https://kubernetes.default.svc + namespace: argocd + syncPolicy: + automated: {} +``` + +## Flux CD Setup + +### 1. Installation + +```bash +# Install Flux CLI +curl -s https://fluxcd.io/install.sh | sudo bash + +# Bootstrap Flux +flux bootstrap github \ + --owner=org \ + --repository=gitops-repo \ + --branch=main \ + --path=clusters/production \ + --personal +``` + +### 2. Create GitRepository + +```yaml +apiVersion: source.toolkit.fluxcd.io/v1 +kind: GitRepository +metadata: + name: my-app + namespace: flux-system +spec: + interval: 1m + url: https://github.com/org/my-app + ref: + branch: main +``` + +### 3. Create Kustomization + +```yaml +apiVersion: kustomize.toolkit.fluxcd.io/v1 +kind: Kustomization +metadata: + name: my-app + namespace: flux-system +spec: + interval: 5m + path: ./deploy + prune: true + sourceRef: + kind: GitRepository + name: my-app +``` + +## Sync Policies + +### Auto-Sync Configuration + +**ArgoCD:** +```yaml +syncPolicy: + automated: + prune: true # Delete resources not in Git + selfHeal: true # Reconcile manual changes + allowEmpty: false + retry: + limit: 5 + backoff: + duration: 5s + factor: 2 + maxDuration: 3m +``` + +**Flux:** +```yaml +spec: + interval: 1m + prune: true + wait: true + timeout: 5m +``` + +**Reference:** See `references/sync-policies.md` + +## Progressive Delivery + +### Canary Deployment with Argo Rollouts + +```yaml +apiVersion: argoproj.io/v1alpha1 +kind: Rollout +metadata: + name: my-app +spec: + replicas: 5 + strategy: + canary: + steps: + - setWeight: 20 + - pause: {duration: 1m} + - setWeight: 50 + - pause: {duration: 2m} + - setWeight: 100 +``` + +### Blue-Green Deployment + +```yaml +strategy: + blueGreen: + 
activeService: my-app + previewService: my-app-preview + autoPromotionEnabled: false +``` + +## Secret Management + +### External Secrets Operator + +```yaml +apiVersion: external-secrets.io/v1beta1 +kind: ExternalSecret +metadata: + name: db-credentials +spec: + refreshInterval: 1h + secretStoreRef: + name: aws-secrets-manager + kind: SecretStore + target: + name: db-credentials + data: + - secretKey: password + remoteRef: + key: prod/db/password +``` + +### Sealed Secrets + +```bash +# Encrypt secret +kubeseal --format yaml < secret.yaml > sealed-secret.yaml + +# Commit sealed-secret.yaml to Git +``` + +## Best Practices + +1. **Use separate repos or branches** for different environments +2. **Implement RBAC** for Git repositories +3. **Enable notifications** for sync failures +4. **Use health checks** for custom resources +5. **Implement approval gates** for production +6. **Keep secrets out of Git** (use External Secrets) +7. **Use App of Apps pattern** for organization +8. **Tag releases** for easy rollback +9. **Monitor sync status** with alerts +10. **Test changes** in staging first + +## Troubleshooting + +**Sync failures:** +```bash +argocd app get my-app +argocd app sync my-app --prune +``` + +**Out of sync status:** +```bash +argocd app diff my-app +argocd app sync my-app --force +``` + +## Related Skills + +- `k8s-manifest-generator` - For creating manifests +- `helm-chart-scaffolding` - For packaging applications diff --git a/skills/gitops-workflow/references/argocd-setup.md b/skills/gitops-workflow/references/argocd-setup.md new file mode 100644 index 0000000..667dddd --- /dev/null +++ b/skills/gitops-workflow/references/argocd-setup.md @@ -0,0 +1,134 @@ +# ArgoCD Setup and Configuration + +## Installation Methods + +### 1. Standard Installation +```bash +kubectl create namespace argocd +kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml +``` + +### 2. 
High Availability Installation +```bash +kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/ha/install.yaml +``` + +### 3. Helm Installation +```bash +helm repo add argo https://argoproj.github.io/argo-helm +helm install argocd argo/argo-cd -n argocd --create-namespace +``` + +## Initial Configuration + +### Access ArgoCD UI +```bash +# Port forward +kubectl port-forward svc/argocd-server -n argocd 8080:443 + +# Get initial admin password +argocd admin initial-password -n argocd +``` + +### Configure Ingress +```yaml +apiVersion: networking.k8s.io/v1 +kind: Ingress +metadata: + name: argocd-server-ingress + namespace: argocd + annotations: + cert-manager.io/cluster-issuer: letsencrypt-prod + nginx.ingress.kubernetes.io/ssl-passthrough: "true" + nginx.ingress.kubernetes.io/backend-protocol: "HTTPS" +spec: + ingressClassName: nginx + rules: + - host: argocd.example.com + http: + paths: + - path: / + pathType: Prefix + backend: + service: + name: argocd-server + port: + number: 443 + tls: + - hosts: + - argocd.example.com + secretName: argocd-secret +``` + +## CLI Configuration + +### Login +```bash +argocd login argocd.example.com --username admin +``` + +### Add Repository +```bash +argocd repo add https://github.com/org/repo --username user --password token +``` + +### Create Application +```bash +argocd app create my-app \ + --repo https://github.com/org/repo \ + --path apps/my-app \ + --dest-server https://kubernetes.default.svc \ + --dest-namespace production +``` + +## SSO Configuration + +### GitHub OAuth +```yaml +apiVersion: v1 +kind: ConfigMap +metadata: + name: argocd-cm + namespace: argocd +data: + url: https://argocd.example.com + dex.config: | + connectors: + - type: github + id: github + name: GitHub + config: + clientID: $GITHUB_CLIENT_ID + clientSecret: $GITHUB_CLIENT_SECRET + orgs: + - name: my-org +``` + +## RBAC Configuration +```yaml +apiVersion: v1 +kind: ConfigMap +metadata: + name: argocd-rbac-cm + 
namespace: argocd +data: + policy.default: role:readonly + policy.csv: | + p, role:developers, applications, *, */dev, allow + p, role:operators, applications, *, */*, allow + g, my-org:devs, role:developers + g, my-org:ops, role:operators +``` + +## Best Practices + +1. Enable SSO for production +2. Implement RBAC policies +3. Use separate projects for teams +4. Enable audit logging +5. Configure notifications +6. Use ApplicationSets for multi-cluster +7. Implement resource hooks +8. Configure health checks +9. Use sync windows for maintenance +10. Monitor with Prometheus metrics diff --git a/skills/gitops-workflow/references/sync-policies.md b/skills/gitops-workflow/references/sync-policies.md new file mode 100644 index 0000000..c15307b --- /dev/null +++ b/skills/gitops-workflow/references/sync-policies.md @@ -0,0 +1,131 @@ +# GitOps Sync Policies + +## ArgoCD Sync Policies + +### Automated Sync +```yaml +syncPolicy: + automated: + prune: true # Delete resources removed from Git + selfHeal: true # Reconcile manual changes + allowEmpty: false # Prevent empty sync +``` + +### Manual Sync +```yaml +syncPolicy: + syncOptions: + - PrunePropagationPolicy=foreground + - CreateNamespace=true +``` + +### Sync Windows +```yaml +syncWindows: +- kind: allow + schedule: "0 8 * * *" + duration: 1h + applications: + - my-app +- kind: deny + schedule: "0 22 * * *" + duration: 8h + applications: + - '*' +``` + +### Retry Policy +```yaml +syncPolicy: + retry: + limit: 5 + backoff: + duration: 5s + factor: 2 + maxDuration: 3m +``` + +## Flux Sync Policies + +### Kustomization Sync +```yaml +apiVersion: kustomize.toolkit.fluxcd.io/v1 +kind: Kustomization +metadata: + name: my-app +spec: + interval: 5m + prune: true + wait: true + timeout: 5m + retryInterval: 1m + force: false +``` + +### Source Sync Interval +```yaml +apiVersion: source.toolkit.fluxcd.io/v1 +kind: GitRepository +metadata: + name: my-app +spec: + interval: 1m + timeout: 60s +``` + +## Health Assessment + +### Custom 
Health Checks +```yaml +# ArgoCD +apiVersion: v1 +kind: ConfigMap +metadata: + name: argocd-cm + namespace: argocd +data: + resource.customizations.health.MyCustomResource: | + hs = {} + if obj.status ~= nil then + if obj.status.conditions ~= nil then + for i, condition in ipairs(obj.status.conditions) do + if condition.type == "Ready" and condition.status == "False" then + hs.status = "Degraded" + hs.message = condition.message + return hs + end + if condition.type == "Ready" and condition.status == "True" then + hs.status = "Healthy" + hs.message = condition.message + return hs + end + end + end + end + hs.status = "Progressing" + hs.message = "Waiting for status" + return hs +``` + +## Sync Options + +### Common Sync Options +- `PrunePropagationPolicy=foreground` - Wait for pruned resources to be deleted +- `CreateNamespace=true` - Auto-create namespace +- `Validate=false` - Skip kubectl validation +- `PruneLast=true` - Prune resources after sync +- `RespectIgnoreDifferences=true` - Honor ignore differences +- `ApplyOutOfSyncOnly=true` - Only apply out-of-sync resources + +## Best Practices + +1. Use automated sync for non-production +2. Require manual approval for production +3. Configure sync windows for maintenance +4. Implement health checks for custom resources +5. Use selective sync for large applications +6. Configure appropriate retry policies +7. Monitor sync failures with alerts +8. Use prune with caution in production +9. Test sync policies in staging +10. Document sync behavior for teams diff --git a/skills/helm-chart-scaffolding/SKILL.md b/skills/helm-chart-scaffolding/SKILL.md new file mode 100644 index 0000000..db31ab1 --- /dev/null +++ b/skills/helm-chart-scaffolding/SKILL.md @@ -0,0 +1,544 @@ +--- +name: helm-chart-scaffolding +description: Design, organize, and manage Helm charts for templating and packaging Kubernetes applications with reusable configurations. 
Use when creating Helm charts, packaging Kubernetes applications, or implementing templated deployments. +--- + +# Helm Chart Scaffolding + +Comprehensive guidance for creating, organizing, and managing Helm charts for packaging and deploying Kubernetes applications. + +## Purpose + +This skill provides step-by-step instructions for building production-ready Helm charts, including chart structure, templating patterns, values management, and validation strategies. + +## When to Use This Skill + +Use this skill when you need to: +- Create new Helm charts from scratch +- Package Kubernetes applications for distribution +- Manage multi-environment deployments with Helm +- Implement templating for reusable Kubernetes manifests +- Set up Helm chart repositories +- Follow Helm best practices and conventions + +## Helm Overview + +**Helm** is the package manager for Kubernetes that: +- Templates Kubernetes manifests for reusability +- Manages application releases and rollbacks +- Handles dependencies between charts +- Provides version control for deployments +- Simplifies configuration management across environments + +## Step-by-Step Workflow + +### 1. Initialize Chart Structure + +**Create new chart:** +```bash +helm create my-app +``` + +**Standard chart structure:** +``` +my-app/ +├── Chart.yaml # Chart metadata +├── values.yaml # Default configuration values +├── charts/ # Chart dependencies +├── templates/ # Kubernetes manifest templates +│ ├── NOTES.txt # Post-install notes +│ ├── _helpers.tpl # Template helpers +│ ├── deployment.yaml +│ ├── service.yaml +│ ├── ingress.yaml +│ ├── serviceaccount.yaml +│ ├── hpa.yaml +│ └── tests/ +│ └── test-connection.yaml +└── .helmignore # Files to ignore +``` + +### 2. 
Configure Chart.yaml + +**Chart metadata defines the package:** + +```yaml +apiVersion: v2 +name: my-app +description: A Helm chart for My Application +type: application +version: 1.0.0 # Chart version +appVersion: "2.1.0" # Application version + +# Keywords for chart discovery +keywords: + - web + - api + - backend + +# Maintainer information +maintainers: + - name: DevOps Team + email: devops@example.com + url: https://github.com/example/my-app + +# Source code repository +sources: + - https://github.com/example/my-app + +# Homepage +home: https://example.com + +# Chart icon +icon: https://example.com/icon.png + +# Dependencies +dependencies: + - name: postgresql + version: "12.0.0" + repository: "https://charts.bitnami.com/bitnami" + condition: postgresql.enabled + - name: redis + version: "17.0.0" + repository: "https://charts.bitnami.com/bitnami" + condition: redis.enabled +``` + +**Reference:** See `assets/Chart.yaml.template` for complete example + +### 3. Design values.yaml Structure + +**Organize values hierarchically:** + +```yaml +# Image configuration +image: + repository: myapp + tag: "1.0.0" + pullPolicy: IfNotPresent + +# Number of replicas +replicaCount: 3 + +# Service configuration +service: + type: ClusterIP + port: 80 + targetPort: 8080 + +# Ingress configuration +ingress: + enabled: false + className: nginx + hosts: + - host: app.example.com + paths: + - path: / + pathType: Prefix + +# Resources +resources: + requests: + memory: "256Mi" + cpu: "250m" + limits: + memory: "512Mi" + cpu: "500m" + +# Autoscaling +autoscaling: + enabled: false + minReplicas: 2 + maxReplicas: 10 + targetCPUUtilizationPercentage: 80 + +# Environment variables +env: + - name: LOG_LEVEL + value: "info" + +# ConfigMap data +configMap: + data: + APP_MODE: production + +# Dependencies +postgresql: + enabled: true + auth: + database: myapp + username: myapp + +redis: + enabled: false +``` + +**Reference:** See `assets/values.yaml.template` for complete structure + +### 4. 
Create Template Files + +**Use Go templating with Helm functions:** + +**templates/deployment.yaml:** +```yaml +apiVersion: apps/v1 +kind: Deployment +metadata: + name: {{ include "my-app.fullname" . }} + labels: + {{- include "my-app.labels" . | nindent 4 }} +spec: + {{- if not .Values.autoscaling.enabled }} + replicas: {{ .Values.replicaCount }} + {{- end }} + selector: + matchLabels: + {{- include "my-app.selectorLabels" . | nindent 6 }} + template: + metadata: + labels: + {{- include "my-app.selectorLabels" . | nindent 8 }} + spec: + containers: + - name: {{ .Chart.Name }} + image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}" + imagePullPolicy: {{ .Values.image.pullPolicy }} + ports: + - name: http + containerPort: {{ .Values.service.targetPort }} + resources: + {{- toYaml .Values.resources | nindent 12 }} + env: + {{- toYaml .Values.env | nindent 12 }} +``` + +### 5. Create Template Helpers + +**templates/_helpers.tpl:** +```yaml +{{/* +Expand the name of the chart. +*/}} +{{- define "my-app.name" -}} +{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" }} +{{- end }} + +{{/* +Create a default fully qualified app name. +*/}} +{{- define "my-app.fullname" -}} +{{- if .Values.fullnameOverride }} +{{- .Values.fullnameOverride | trunc 63 | trimSuffix "-" }} +{{- else }} +{{- $name := default .Chart.Name .Values.nameOverride }} +{{- if contains $name .Release.Name }} +{{- .Release.Name | trunc 63 | trimSuffix "-" }} +{{- else }} +{{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" }} +{{- end }} +{{- end }} +{{- end }} + +{{/* +Common labels +*/}} +{{- define "my-app.labels" -}} +helm.sh/chart: {{ include "my-app.chart" . }} +{{ include "my-app.selectorLabels" . 
}} +{{- if .Chart.AppVersion }} +app.kubernetes.io/version: {{ .Chart.AppVersion | quote }} +{{- end }} +app.kubernetes.io/managed-by: {{ .Release.Service }} +{{- end }} + +{{/* +Selector labels +*/}} +{{- define "my-app.selectorLabels" -}} +app.kubernetes.io/name: {{ include "my-app.name" . }} +app.kubernetes.io/instance: {{ .Release.Name }} +{{- end }} +``` + +### 6. Manage Dependencies + +**Add dependencies in Chart.yaml:** +```yaml +dependencies: + - name: postgresql + version: "12.0.0" + repository: "https://charts.bitnami.com/bitnami" + condition: postgresql.enabled +``` + +**Update dependencies:** +```bash +helm dependency update +helm dependency build +``` + +**Override dependency values:** +```yaml +# values.yaml +postgresql: + enabled: true + auth: + database: myapp + username: myapp + password: changeme + primary: + persistence: + enabled: true + size: 10Gi +``` + +### 7. Test and Validate + +**Validation commands:** +```bash +# Lint the chart +helm lint my-app/ + +# Dry-run installation +helm install my-app ./my-app --dry-run --debug + +# Template rendering +helm template my-app ./my-app + +# Template with values +helm template my-app ./my-app -f values-prod.yaml + +# Show computed values +helm show values ./my-app +``` + +**Validation script:** +```bash +#!/bin/bash +set -e + +echo "Linting chart..." +helm lint . + +echo "Testing template rendering..." +helm template test-release . --dry-run + +echo "Checking for required values..." +helm template test-release . --validate + +echo "All validations passed!" +``` + +**Reference:** See `scripts/validate-chart.sh` + +### 8. Package and Distribute + +**Package the chart:** +```bash +helm package my-app/ +# Creates: my-app-1.0.0.tgz +``` + +**Create chart repository:** +```bash +# Create index +helm repo index . + +# Upload to repository +# AWS S3 example +aws s3 sync . 
s3://my-helm-charts/ --exclude "*" --include "*.tgz" --include "index.yaml" +``` + +**Use the chart:** +```bash +helm repo add my-repo https://charts.example.com +helm repo update +helm install my-app my-repo/my-app +``` + +### 9. Multi-Environment Configuration + +**Environment-specific values files:** + +``` +my-app/ +├── values.yaml # Defaults +├── values-dev.yaml # Development +├── values-staging.yaml # Staging +└── values-prod.yaml # Production +``` + +**values-prod.yaml:** +```yaml +replicaCount: 5 + +image: + tag: "2.1.0" + +resources: + requests: + memory: "512Mi" + cpu: "500m" + limits: + memory: "1Gi" + cpu: "1000m" + +autoscaling: + enabled: true + minReplicas: 3 + maxReplicas: 20 + +ingress: + enabled: true + hosts: + - host: app.example.com + paths: + - path: / + pathType: Prefix + +postgresql: + enabled: true + primary: + persistence: + size: 100Gi +``` + +**Install with environment:** +```bash +helm install my-app ./my-app -f values-prod.yaml --namespace production +``` + +### 10. Implement Hooks and Tests + +**Pre-install hook:** +```yaml +# templates/pre-install-job.yaml +apiVersion: batch/v1 +kind: Job +metadata: + name: {{ include "my-app.fullname" . }}-db-setup + annotations: + "helm.sh/hook": pre-install + "helm.sh/hook-weight": "-5" + "helm.sh/hook-delete-policy": hook-succeeded +spec: + template: + spec: + containers: + - name: db-setup + image: postgres:15 + command: ["psql", "-c", "CREATE DATABASE myapp"] + restartPolicy: Never +``` + +**Test connection:** +```yaml +# templates/tests/test-connection.yaml +apiVersion: v1 +kind: Pod +metadata: + name: "{{ include "my-app.fullname" . }}-test-connection" + annotations: + "helm.sh/hook": test +spec: + containers: + - name: wget + image: busybox + command: ['wget'] + args: ['{{ include "my-app.fullname" . 
}}:{{ .Values.service.port }}'] + restartPolicy: Never +``` + +**Run tests:** +```bash +helm test my-app +``` + +## Common Patterns + +### Pattern 1: Conditional Resources + +```yaml +{{- if .Values.ingress.enabled }} +apiVersion: networking.k8s.io/v1 +kind: Ingress +metadata: + name: {{ include "my-app.fullname" . }} +spec: + # ... +{{- end }} +``` + +### Pattern 2: Iterating Over Lists + +```yaml +env: +{{- range .Values.env }} +- name: {{ .name }} + value: {{ .value | quote }} +{{- end }} +``` + +### Pattern 3: Including Files + +```yaml +data: + config.yaml: | + {{- .Files.Get "config/application.yaml" | nindent 4 }} +``` + +### Pattern 4: Global Values + +```yaml +global: + imageRegistry: docker.io + imagePullSecrets: + - name: regcred + +# Use in templates: +image: {{ .Values.global.imageRegistry }}/{{ .Values.image.repository }} +``` + +## Best Practices + +1. **Use semantic versioning** for chart and app versions +2. **Document all values** in values.yaml with comments +3. **Use template helpers** for repeated logic +4. **Validate charts** before packaging +5. **Pin dependency versions** explicitly +6. **Use conditions** for optional resources +7. **Follow naming conventions** (lowercase, hyphens) +8. **Include NOTES.txt** with usage instructions +9. **Add labels** consistently using helpers +10. 
**Test installations** in all environments + +## Troubleshooting + +**Template rendering errors:** +```bash +helm template my-app ./my-app --debug +``` + +**Dependency issues:** +```bash +helm dependency update +helm dependency list +``` + +**Installation failures:** +```bash +helm install my-app ./my-app --dry-run --debug +kubectl get events --sort-by='.lastTimestamp' +``` + +## Reference Files + +- `assets/Chart.yaml.template` - Chart metadata template +- `assets/values.yaml.template` - Values structure template +- `scripts/validate-chart.sh` - Validation script +- `references/chart-structure.md` - Detailed chart organization + +## Related Skills + +- `k8s-manifest-generator` - For creating base Kubernetes manifests +- `gitops-workflow` - For automated Helm chart deployments diff --git a/skills/helm-chart-scaffolding/assets/Chart.yaml.template b/skills/helm-chart-scaffolding/assets/Chart.yaml.template new file mode 100644 index 0000000..74dfe6e --- /dev/null +++ b/skills/helm-chart-scaffolding/assets/Chart.yaml.template @@ -0,0 +1,42 @@ +apiVersion: v2 +name: +description: +type: application +version: 0.1.0 +appVersion: "1.0.0" + +keywords: + - + - + +home: https://github.com// + +sources: + - https://github.com// + +maintainers: + - name: + email: + url: https://github.com/ + +icon: https://example.com/icon.png + +kubeVersion: ">=1.24.0" + +dependencies: + - name: postgresql + version: "12.0.0" + repository: "https://charts.bitnami.com/bitnami" + condition: postgresql.enabled + tags: + - database + - name: redis + version: "17.0.0" + repository: "https://charts.bitnami.com/bitnami" + condition: redis.enabled + tags: + - cache + +annotations: + category: Application + licenses: Apache-2.0 diff --git a/skills/helm-chart-scaffolding/assets/values.yaml.template b/skills/helm-chart-scaffolding/assets/values.yaml.template new file mode 100644 index 0000000..117c1e5 --- /dev/null +++ b/skills/helm-chart-scaffolding/assets/values.yaml.template @@ -0,0 +1,185 @@ +# 
Global values shared with subcharts +global: + imageRegistry: docker.io + imagePullSecrets: [] + storageClass: "" + +# Image configuration +image: + registry: docker.io + repository: myapp/web + tag: "" # Defaults to .Chart.AppVersion + pullPolicy: IfNotPresent + +# Override chart name +nameOverride: "" +fullnameOverride: "" + +# Number of replicas +replicaCount: 3 +revisionHistoryLimit: 10 + +# ServiceAccount +serviceAccount: + create: true + annotations: {} + name: "" + +# Pod annotations +podAnnotations: + prometheus.io/scrape: "true" + prometheus.io/port: "9090" + prometheus.io/path: "/metrics" + +# Pod security context +podSecurityContext: + runAsNonRoot: true + runAsUser: 1000 + runAsGroup: 1000 + fsGroup: 1000 + seccompProfile: + type: RuntimeDefault + +# Container security context +securityContext: + allowPrivilegeEscalation: false + readOnlyRootFilesystem: true + capabilities: + drop: + - ALL + +# Service configuration +service: + type: ClusterIP + port: 80 + targetPort: http + annotations: {} + sessionAffinity: None + +# Ingress configuration +ingress: + enabled: false + className: nginx + annotations: {} + hosts: + - host: app.example.com + paths: + - path: / + pathType: Prefix + tls: [] + +# Resources +resources: + limits: + cpu: 500m + memory: 512Mi + requests: + cpu: 250m + memory: 256Mi + +# Liveness probe +livenessProbe: + httpGet: + path: /health/live + port: http + initialDelaySeconds: 30 + periodSeconds: 10 + +# Readiness probe +readinessProbe: + httpGet: + path: /health/ready + port: http + initialDelaySeconds: 5 + periodSeconds: 5 + +# Autoscaling +autoscaling: + enabled: false + minReplicas: 2 + maxReplicas: 10 + targetCPUUtilizationPercentage: 80 + targetMemoryUtilizationPercentage: 80 + +# Pod Disruption Budget +podDisruptionBudget: + enabled: true + minAvailable: 1 + +# Node selection +nodeSelector: {} +tolerations: [] +affinity: + podAntiAffinity: + preferredDuringSchedulingIgnoredDuringExecution: + - weight: 100 + podAffinityTerm: + 
labelSelector: + matchExpressions: + - key: app.kubernetes.io/name + operator: In + values: + - '{{ include "my-app.name" . }}' + topologyKey: kubernetes.io/hostname + +# Environment variables +env: [] +# - name: LOG_LEVEL +# value: "info" + +# ConfigMap data +configMap: + enabled: true + data: {} +# APP_MODE: production +# DATABASE_HOST: postgres.example.com + +# Secrets (use external secret management in production) +secrets: + enabled: false + data: {} + +# Persistent Volume +persistence: + enabled: false + storageClass: "" + accessMode: ReadWriteOnce + size: 10Gi + annotations: {} + +# PostgreSQL dependency +postgresql: + enabled: false + auth: + database: myapp + username: myapp + password: changeme + primary: + persistence: + enabled: true + size: 10Gi + +# Redis dependency +redis: + enabled: false + auth: + enabled: false + master: + persistence: + enabled: false + +# ServiceMonitor for Prometheus Operator +serviceMonitor: + enabled: false + interval: 30s + scrapeTimeout: 10s + labels: {} + +# Network Policy +networkPolicy: + enabled: false + policyTypes: + - Ingress + - Egress + ingress: [] + egress: [] diff --git a/skills/helm-chart-scaffolding/references/chart-structure.md b/skills/helm-chart-scaffolding/references/chart-structure.md new file mode 100644 index 0000000..2b8769a --- /dev/null +++ b/skills/helm-chart-scaffolding/references/chart-structure.md @@ -0,0 +1,500 @@ +# Helm Chart Structure Reference + +Complete guide to Helm chart organization, file conventions, and best practices. 
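+
+The layout below can also be bootstrapped by hand; a minimal sketch in plain shell (`my-app` is a placeholder chart name — in practice `helm create my-app` generates a richer default scaffold):
+
+```bash
+CHART=my-app
+# Create the required directories from the standard layout
+mkdir -p "$CHART"/{templates/tests,charts,crds,files/config}
+# Chart.yaml is the only file with mandatory content
+cat > "$CHART/Chart.yaml" <<'EOF'
+apiVersion: v2
+name: my-app
+description: A Helm chart for My Application
+type: application
+version: 0.1.0
+appVersion: "1.0.0"
+EOF
+# Remaining required files start empty
+touch "$CHART/values.yaml" "$CHART/.helmignore" "$CHART/templates/_helpers.tpl"
+```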
+ +## Standard Chart Directory Structure + +``` +my-app/ +├── Chart.yaml # Chart metadata (required) +├── Chart.lock # Dependency lock file (generated) +├── values.yaml # Default configuration values (required) +├── values.schema.json # JSON schema for values validation +├── .helmignore # Patterns to ignore when packaging +├── README.md # Chart documentation +├── LICENSE # Chart license +├── charts/ # Chart dependencies (bundled) +│ └── postgresql-12.0.0.tgz +├── crds/ # Custom Resource Definitions +│ └── my-crd.yaml +├── templates/ # Kubernetes manifest templates (required) +│ ├── NOTES.txt # Post-install instructions +│ ├── _helpers.tpl # Template helper functions +│ ├── deployment.yaml +│ ├── service.yaml +│ ├── ingress.yaml +│ ├── configmap.yaml +│ ├── secret.yaml +│ ├── serviceaccount.yaml +│ ├── hpa.yaml +│ ├── pdb.yaml +│ ├── networkpolicy.yaml +│ └── tests/ +│ └── test-connection.yaml +└── files/ # Additional files to include + └── config/ + └── app.conf +``` + +## Chart.yaml Specification + +### API Version v2 (Helm 3+) + +```yaml +apiVersion: v2 # Required: API version +name: my-application # Required: Chart name +version: 1.2.3 # Required: Chart version (SemVer) +appVersion: "2.5.0" # Application version +description: A Helm chart for my application # Required +type: application # Chart type: application or library +keywords: # Search keywords + - web + - api + - backend +home: https://example.com # Project home page +sources: # Source code URLs + - https://github.com/example/my-app +maintainers: # Maintainer list + - name: John Doe + email: john@example.com + url: https://github.com/johndoe +icon: https://example.com/icon.png # Chart icon URL +kubeVersion: ">=1.24.0" # Compatible Kubernetes versions +deprecated: false # Mark chart as deprecated +annotations: # Arbitrary annotations + example.com/release-notes: https://example.com/releases/v1.2.3 +dependencies: # Chart dependencies + - name: postgresql + version: "12.0.0" + repository: 
"https://charts.bitnami.com/bitnami" + condition: postgresql.enabled + tags: + - database + import-values: + - child: database + parent: database + alias: db +``` + +## Chart Types + +### Application Chart +```yaml +type: application +``` +- Standard Kubernetes applications +- Can be installed and managed +- Contains templates for K8s resources + +### Library Chart +```yaml +type: library +``` +- Shared template helpers +- Cannot be installed directly +- Used as dependency by other charts +- No templates/ directory + +## Values Files Organization + +### values.yaml (defaults) +```yaml +# Global values (shared with subcharts) +global: + imageRegistry: docker.io + imagePullSecrets: [] + +# Image configuration +image: + registry: docker.io + repository: myapp/web + tag: "" # Defaults to .Chart.AppVersion + pullPolicy: IfNotPresent + +# Deployment settings +replicaCount: 1 +revisionHistoryLimit: 10 + +# Pod configuration +podAnnotations: {} +podSecurityContext: + runAsNonRoot: true + runAsUser: 1000 + fsGroup: 1000 + +# Container security +securityContext: + allowPrivilegeEscalation: false + readOnlyRootFilesystem: true + capabilities: + drop: + - ALL + +# Service +service: + type: ClusterIP + port: 80 + targetPort: http + annotations: {} + +# Resources +resources: + limits: + cpu: 100m + memory: 128Mi + requests: + cpu: 100m + memory: 128Mi + +# Autoscaling +autoscaling: + enabled: false + minReplicas: 1 + maxReplicas: 100 + targetCPUUtilizationPercentage: 80 + +# Node selection +nodeSelector: {} +tolerations: [] +affinity: {} + +# Monitoring +serviceMonitor: + enabled: false + interval: 30s +``` + +### values.schema.json (validation) +```json +{ + "$schema": "https://json-schema.org/draft-07/schema#", + "type": "object", + "properties": { + "replicaCount": { + "type": "integer", + "minimum": 1 + }, + "image": { + "type": "object", + "required": ["repository"], + "properties": { + "repository": { + "type": "string" + }, + "tag": { + "type": "string" + }, + 
"pullPolicy": { + "type": "string", + "enum": ["Always", "IfNotPresent", "Never"] + } + } + } + }, + "required": ["image"] +} +``` + +## Template Files + +### Template Naming Conventions + +- **Lowercase with hyphens**: `deployment.yaml`, `service-account.yaml` +- **Partial templates**: Prefix with underscore `_helpers.tpl` +- **Tests**: Place in `templates/tests/` +- **CRDs**: Place in `crds/` (not templated) + +### Common Templates + +#### _helpers.tpl +```yaml +{{/* +Standard naming helpers +*/}} +{{- define "my-app.name" -}} +{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" -}} +{{- end -}} + +{{- define "my-app.fullname" -}} +{{- if .Values.fullnameOverride -}} +{{- .Values.fullnameOverride | trunc 63 | trimSuffix "-" -}} +{{- else -}} +{{- $name := default .Chart.Name .Values.nameOverride -}} +{{- if contains $name .Release.Name -}} +{{- .Release.Name | trunc 63 | trimSuffix "-" -}} +{{- else -}} +{{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" -}} +{{- end -}} +{{- end -}} +{{- end -}} + +{{- define "my-app.chart" -}} +{{- printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" | trunc 63 | trimSuffix "-" -}} +{{- end -}} + +{{/* +Common labels +*/}} +{{- define "my-app.labels" -}} +helm.sh/chart: {{ include "my-app.chart" . }} +{{ include "my-app.selectorLabels" . }} +{{- if .Chart.AppVersion }} +app.kubernetes.io/version: {{ .Chart.AppVersion | quote }} +{{- end }} +app.kubernetes.io/managed-by: {{ .Release.Service }} +{{- end -}} + +{{- define "my-app.selectorLabels" -}} +app.kubernetes.io/name: {{ include "my-app.name" . 
}}
+app.kubernetes.io/instance: {{ .Release.Name }}
+{{- end -}}
+
+{{/*
+Image name helper
+*/}}
+{{- define "my-app.image" -}}
+{{- $registry := .Values.global.imageRegistry | default .Values.image.registry -}}
+{{- $repository := .Values.image.repository -}}
+{{- $tag := .Values.image.tag | default .Chart.AppVersion -}}
+{{- printf "%s/%s:%s" $registry $repository $tag -}}
+{{- end -}}
+```
+
+#### NOTES.txt
+```
+Thank you for installing {{ .Chart.Name }}.
+
+Your release is named {{ .Release.Name }}.
+
+To learn more about the release, try:
+
+  $ helm status {{ .Release.Name }}
+  $ helm get all {{ .Release.Name }}
+
+{{- if .Values.ingress.enabled }}
+
+Application URL:
+{{- range $host := .Values.ingress.hosts }}
+{{- range $host.paths }}
+  http{{ if $.Values.ingress.tls }}s{{ end }}://{{ $host.host }}{{ .path }}
+{{- end }}
+{{- end }}
+{{- else }}
+
+Get the application URL by running:
+  export POD_NAME=$(kubectl get pods --namespace {{ .Release.Namespace }} -l "app.kubernetes.io/name={{ include "my-app.name" . }}" -o jsonpath="{.items[0].metadata.name}")
+  kubectl port-forward $POD_NAME 8080:80
+  echo "Visit http://127.0.0.1:8080"
+{{- end }}
+```
+
+## Dependencies Management
+
+### Declaring Dependencies
+
+```yaml
+# Chart.yaml
+dependencies:
+  - name: postgresql
+    version: "12.0.0"
+    repository: "https://charts.bitnami.com/bitnami"
+    condition: postgresql.enabled  # Enable/disable via values
+    tags:                          # Group dependencies
+      - database
+    import-values:                 # Import values from subchart
+      - child: database
+        parent: database
+    alias: db                      # Reference as .Values.db
+```
+
+### Managing Dependencies
+
+```bash
+# Update dependencies
+helm dependency update
+
+# List dependencies
+helm dependency list
+
+# Build dependencies
+helm dependency build
+```
+
+### Chart.lock
+
+Generated automatically by `helm dependency update`:
+
+```yaml
+dependencies:
+- name: postgresql
+  repository: https://charts.bitnami.com/bitnami
+  version: 12.0.0
+digest: sha256:abcd1234... 
+generated: "2024-01-01T00:00:00Z" +``` + +## .helmignore + +Exclude files from chart package: + +``` +# Development files +.git/ +.gitignore +*.md +docs/ + +# Build artifacts +*.swp +*.bak +*.tmp +*.orig + +# CI/CD +.travis.yml +.gitlab-ci.yml +Jenkinsfile + +# Testing +test/ +*.test + +# IDE +.vscode/ +.idea/ +*.iml +``` + +## Custom Resource Definitions (CRDs) + +Place CRDs in `crds/` directory: + +``` +crds/ +├── my-app-crd.yaml +└── another-crd.yaml +``` + +**Important CRD notes:** +- CRDs are installed before any templates +- CRDs are NOT templated (no `{{ }}` syntax) +- CRDs are NOT upgraded or deleted with chart +- Use `helm install --skip-crds` to skip installation + +## Chart Versioning + +### Semantic Versioning + +- **Chart Version**: Increment when chart changes + - MAJOR: Breaking changes + - MINOR: New features, backward compatible + - PATCH: Bug fixes + +- **App Version**: Application version being deployed + - Can be any string + - Not required to follow SemVer + +```yaml +version: 2.3.1 # Chart version +appVersion: "1.5.0" # Application version +``` + +## Chart Testing + +### Test Files + +```yaml +# templates/tests/test-connection.yaml +apiVersion: v1 +kind: Pod +metadata: + name: "{{ include "my-app.fullname" . }}-test-connection" + annotations: + "helm.sh/hook": test + "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded +spec: + containers: + - name: wget + image: busybox + command: ['wget'] + args: ['{{ include "my-app.fullname" . }}:{{ .Values.service.port }}'] + restartPolicy: Never +``` + +### Running Tests + +```bash +helm test my-release +helm test my-release --logs +``` + +## Hooks + +Helm hooks allow intervention at specific points: + +```yaml +apiVersion: batch/v1 +kind: Job +metadata: + name: {{ include "my-app.fullname" . 
}}-migration
+  annotations:
+    "helm.sh/hook": pre-upgrade,pre-install
+    "helm.sh/hook-weight": "-5"
+    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
+```
+
+### Hook Types
+
+- `pre-install`: After templates are rendered, before any resources are created
+- `post-install`: After all resources are loaded into Kubernetes
+- `pre-delete`: Before any resources are deleted
+- `post-delete`: After all resources are deleted
+- `pre-upgrade`: After templates are rendered, before any resources are updated
+- `post-upgrade`: After all resources have been upgraded
+- `pre-rollback`: After templates are rendered, before any resources are rolled back
+- `post-rollback`: After all resources have been rolled back
+- `test`: Runs with `helm test`
+
+### Hook Weight
+
+Controls hook execution order. Weights are integers encoded as strings (e.g. `"-5"`); lower weights run first.
+
+### Hook Deletion Policies
+
+- `before-hook-creation`: Delete previous hook before new one
+- `hook-succeeded`: Delete after successful execution
+- `hook-failed`: Delete if hook fails
+
+## Best Practices
+
+1. **Use helpers** for repeated template logic
+2. **Quote strings** in templates: `{{ .Values.name | quote }}`
+3. **Validate values** with values.schema.json
+4. **Document all values** in values.yaml
+5. **Use semantic versioning** for chart versions
+6. **Pin dependency versions** exactly
+7. **Include NOTES.txt** with usage instructions
+8. **Add tests** for critical functionality
+9. **Use hooks** for database migrations
+10. **Keep charts focused** - one application per chart
+
+## Chart Repository Structure
+
+```
+helm-charts/
+├── index.yaml
+├── my-app-1.0.0.tgz
+├── my-app-1.1.0.tgz
+├── my-app-1.2.0.tgz
+└── another-chart-2.0.0.tgz
+```
+
+### Creating Repository Index
+
+```bash
+helm repo index . 
--url https://charts.example.com +``` + +## Related Resources + +- [Helm Documentation](https://helm.sh/docs/) +- [Chart Template Guide](https://helm.sh/docs/chart_template_guide/) +- [Best Practices](https://helm.sh/docs/chart_best_practices/) diff --git a/skills/helm-chart-scaffolding/scripts/validate-chart.sh b/skills/helm-chart-scaffolding/scripts/validate-chart.sh new file mode 100644 index 0000000..b8d5b0f --- /dev/null +++ b/skills/helm-chart-scaffolding/scripts/validate-chart.sh @@ -0,0 +1,244 @@ +#!/bin/bash +set -e + +CHART_DIR="${1:-.}" +RELEASE_NAME="test-release" + +echo "═══════════════════════════════════════════════════════" +echo " Helm Chart Validation" +echo "═══════════════════════════════════════════════════════" +echo "" + +# Colors +GREEN='\033[0;32m' +YELLOW='\033[1;33m' +RED='\033[0;31m' +NC='\033[0m' # No Color + +success() { + echo -e "${GREEN}✓${NC} $1" +} + +warning() { + echo -e "${YELLOW}⚠${NC} $1" +} + +error() { + echo -e "${RED}✗${NC} $1" +} + +# Check if Helm is installed +if ! command -v helm &> /dev/null; then + error "Helm is not installed" + exit 1 +fi + +echo "📦 Chart directory: $CHART_DIR" +echo "" + +# 1. Check chart structure +echo "1️⃣ Checking chart structure..." +if [ ! -f "$CHART_DIR/Chart.yaml" ]; then + error "Chart.yaml not found" + exit 1 +fi +success "Chart.yaml exists" + +if [ ! -f "$CHART_DIR/values.yaml" ]; then + error "values.yaml not found" + exit 1 +fi +success "values.yaml exists" + +if [ ! -d "$CHART_DIR/templates" ]; then + error "templates/ directory not found" + exit 1 +fi +success "templates/ directory exists" +echo "" + +# 2. Lint the chart +echo "2️⃣ Linting chart..." +if helm lint "$CHART_DIR"; then + success "Chart passed lint" +else + error "Chart failed lint" + exit 1 +fi +echo "" + +# 3. Check Chart.yaml +echo "3️⃣ Validating Chart.yaml..." 
+CHART_NAME=$(grep "^name:" "$CHART_DIR/Chart.yaml" | awk '{print $2}') +CHART_VERSION=$(grep "^version:" "$CHART_DIR/Chart.yaml" | awk '{print $2}') +APP_VERSION=$(grep "^appVersion:" "$CHART_DIR/Chart.yaml" | awk '{print $2}' | tr -d '"') + +if [ -z "$CHART_NAME" ]; then + error "Chart name not found" + exit 1 +fi +success "Chart name: $CHART_NAME" + +if [ -z "$CHART_VERSION" ]; then + error "Chart version not found" + exit 1 +fi +success "Chart version: $CHART_VERSION" + +if [ -z "$APP_VERSION" ]; then + warning "App version not specified" +else + success "App version: $APP_VERSION" +fi +echo "" + +# 4. Test template rendering +echo "4️⃣ Testing template rendering..." +if helm template "$RELEASE_NAME" "$CHART_DIR" > /dev/null 2>&1; then + success "Templates rendered successfully" +else + error "Template rendering failed" + helm template "$RELEASE_NAME" "$CHART_DIR" + exit 1 +fi +echo "" + +# 5. Dry-run installation +echo "5️⃣ Testing dry-run installation..." +if helm install "$RELEASE_NAME" "$CHART_DIR" --dry-run --debug > /dev/null 2>&1; then + success "Dry-run installation successful" +else + error "Dry-run installation failed" + exit 1 +fi +echo "" + +# 6. Check for required Kubernetes resources +echo "6️⃣ Checking generated resources..." +MANIFESTS=$(helm template "$RELEASE_NAME" "$CHART_DIR") + +if echo "$MANIFESTS" | grep -q "kind: Deployment"; then + success "Deployment found" +else + warning "No Deployment found" +fi + +if echo "$MANIFESTS" | grep -q "kind: Service"; then + success "Service found" +else + warning "No Service found" +fi + +if echo "$MANIFESTS" | grep -q "kind: ServiceAccount"; then + success "ServiceAccount found" +else + warning "No ServiceAccount found" +fi +echo "" + +# 7. Check for security best practices +echo "7️⃣ Checking security best practices..." 
+if echo "$MANIFESTS" | grep -q "runAsNonRoot: true"; then + success "Running as non-root user" +else + warning "Not explicitly running as non-root" +fi + +if echo "$MANIFESTS" | grep -q "readOnlyRootFilesystem: true"; then + success "Using read-only root filesystem" +else + warning "Not using read-only root filesystem" +fi + +if echo "$MANIFESTS" | grep -q "allowPrivilegeEscalation: false"; then + success "Privilege escalation disabled" +else + warning "Privilege escalation not explicitly disabled" +fi +echo "" + +# 8. Check for resource limits +echo "8️⃣ Checking resource configuration..." +if echo "$MANIFESTS" | grep -q "resources:"; then + if echo "$MANIFESTS" | grep -q "limits:"; then + success "Resource limits defined" + else + warning "No resource limits defined" + fi + if echo "$MANIFESTS" | grep -q "requests:"; then + success "Resource requests defined" + else + warning "No resource requests defined" + fi +else + warning "No resources defined" +fi +echo "" + +# 9. Check for health probes +echo "9️⃣ Checking health probes..." +if echo "$MANIFESTS" | grep -q "livenessProbe:"; then + success "Liveness probe configured" +else + warning "No liveness probe found" +fi + +if echo "$MANIFESTS" | grep -q "readinessProbe:"; then + success "Readiness probe configured" +else + warning "No readiness probe found" +fi +echo "" + +# 10. Check dependencies +if [ -f "$CHART_DIR/Chart.yaml" ] && grep -q "^dependencies:" "$CHART_DIR/Chart.yaml"; then + echo "🔟 Checking dependencies..." + if helm dependency list "$CHART_DIR" > /dev/null 2>&1; then + success "Dependencies valid" + + if [ -f "$CHART_DIR/Chart.lock" ]; then + success "Chart.lock file present" + else + warning "Chart.lock file missing (run 'helm dependency update')" + fi + else + error "Dependencies check failed" + fi + echo "" +fi + +# 11. Check for values schema +if [ -f "$CHART_DIR/values.schema.json" ]; then + echo "1️⃣1️⃣ Validating values schema..." 
+ success "values.schema.json present" + + # Validate schema if jq is available + if command -v jq &> /dev/null; then + if jq empty "$CHART_DIR/values.schema.json" 2>/dev/null; then + success "values.schema.json is valid JSON" + else + error "values.schema.json contains invalid JSON" + exit 1 + fi + fi + echo "" +fi + +# Summary +echo "═══════════════════════════════════════════════════════" +echo " Validation Complete!" +echo "═══════════════════════════════════════════════════════" +echo "" +echo "Chart: $CHART_NAME" +echo "Version: $CHART_VERSION" +if [ -n "$APP_VERSION" ]; then + echo "App Version: $APP_VERSION" +fi +echo "" +success "All validations passed!" +echo "" +echo "Next steps:" +echo " • helm package $CHART_DIR" +echo " • helm install my-release $CHART_DIR" +echo " • helm test my-release" +echo "" diff --git a/skills/k8s-manifest-generator/SKILL.md b/skills/k8s-manifest-generator/SKILL.md new file mode 100644 index 0000000..3f5418c --- /dev/null +++ b/skills/k8s-manifest-generator/SKILL.md @@ -0,0 +1,511 @@ +--- +name: k8s-manifest-generator +description: Create production-ready Kubernetes manifests for Deployments, Services, ConfigMaps, and Secrets following best practices and security standards. Use when generating Kubernetes YAML manifests, creating K8s resources, or implementing production-grade Kubernetes configurations. +--- + +# Kubernetes Manifest Generator + +Step-by-step guidance for creating production-ready Kubernetes manifests including Deployments, Services, ConfigMaps, Secrets, and PersistentVolumeClaims. + +## Purpose + +This skill provides comprehensive guidance for generating well-structured, secure, and production-ready Kubernetes manifests following cloud-native best practices and Kubernetes conventions. 
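+
+The production-readiness criteria this skill enforces can also be smoke-tested mechanically before review. A minimal sketch in plain shell, similar in spirit to the grep-based checks in the chart validation script (the `manifests/` directory is an assumed layout):
+
+```bash
+# Flag manifests missing the fields the steps below require
+for f in manifests/*.yaml; do
+  grep -q '^kind:' "$f"             || echo "$f: missing kind"
+  grep -q 'resources:' "$f"         || echo "$f: no resource requests/limits"
+  grep -q 'livenessProbe:' "$f"     || echo "$f: no liveness probe"
+  grep -q 'runAsNonRoot: true' "$f" || echo "$f: non-root not enforced"
+done
+```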
+ +## When to Use This Skill + +Use this skill when you need to: +- Create new Kubernetes Deployment manifests +- Define Service resources for network connectivity +- Generate ConfigMap and Secret resources for configuration management +- Create PersistentVolumeClaim manifests for stateful workloads +- Follow Kubernetes best practices and naming conventions +- Implement resource limits, health checks, and security contexts +- Design manifests for multi-environment deployments + +## Step-by-Step Workflow + +### 1. Gather Requirements + +**Understand the workload:** +- Application type (stateless/stateful) +- Container image and version +- Environment variables and configuration needs +- Storage requirements +- Network exposure requirements (internal/external) +- Resource requirements (CPU, memory) +- Scaling requirements +- Health check endpoints + +**Questions to ask:** +- What is the application name and purpose? +- What container image and tag will be used? +- Does the application need persistent storage? +- What ports does the application expose? +- Are there any secrets or configuration files needed? +- What are the CPU and memory requirements? +- Does the application need to be exposed externally? + +### 2. 
Create Deployment Manifest
+
+**Follow this structure:**
+
+```yaml
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: <app-name>
+  namespace: <namespace>
+  labels:
+    app: <app-name>
+    version: <version>
+spec:
+  replicas: 3
+  selector:
+    matchLabels:
+      app: <app-name>
+  template:
+    metadata:
+      labels:
+        app: <app-name>
+        version: <version>
+    spec:
+      containers:
+      - name: <app-name>
+        image: <image>:<tag>
+        ports:
+        - containerPort: <port>
+          name: http
+        resources:
+          requests:
+            memory: "256Mi"
+            cpu: "250m"
+          limits:
+            memory: "512Mi"
+            cpu: "500m"
+        livenessProbe:
+          httpGet:
+            path: /health
+            port: http
+          initialDelaySeconds: 30
+          periodSeconds: 10
+        readinessProbe:
+          httpGet:
+            path: /ready
+            port: http
+          initialDelaySeconds: 5
+          periodSeconds: 5
+        env:
+        - name: ENV_VAR
+          value: "value"
+        envFrom:
+        - configMapRef:
+            name: <app-name>-config
+        - secretRef:
+            name: <app-name>-secret
+```
+
+**Best practices to apply:**
+- Always set resource requests and limits
+- Implement both liveness and readiness probes
+- Use specific image tags (never `:latest`)
+- Apply security context for non-root users
+- Use labels for organization and selection
+- Set appropriate replica count based on availability needs
+
+**Reference:** See `references/deployment-spec.md` for detailed deployment options
+
+### 3. Create Service Manifest
+
+**Choose the appropriate Service type:**
+
+**ClusterIP (internal only):**
+```yaml
+apiVersion: v1
+kind: Service
+metadata:
+  name: <app-name>
+  namespace: <namespace>
+  labels:
+    app: <app-name>
+spec:
+  type: ClusterIP
+  selector:
+    app: <app-name>
+  ports:
+  - name: http
+    port: 80
+    targetPort: 8080
+    protocol: TCP
+```
+
+**LoadBalancer (external access):**
+```yaml
+apiVersion: v1
+kind: Service
+metadata:
+  name: <app-name>
+  namespace: <namespace>
+  labels:
+    app: <app-name>
+  annotations:
+    service.beta.kubernetes.io/aws-load-balancer-type: nlb
+spec:
+  type: LoadBalancer
+  selector:
+    app: <app-name>
+  ports:
+  - name: http
+    port: 80
+    targetPort: 8080
+    protocol: TCP
+```
+
+**Reference:** See `references/service-spec.md` for service types and networking
+
+### 4. 
Create ConfigMap
+
+**For application configuration:**
+
+```yaml
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: <app-name>-config
+  namespace: <namespace>
+data:
+  APP_MODE: production
+  LOG_LEVEL: info
+  DATABASE_HOST: db.example.com
+  # For config files
+  app.properties: |
+    server.port=8080
+    server.host=0.0.0.0
+    logging.level=INFO
+```
+
+**Best practices:**
+- Use ConfigMaps for non-sensitive data only
+- Organize related configuration together
+- Use meaningful names for keys
+- Consider using one ConfigMap per component
+- Version ConfigMaps when making changes
+
+**Reference:** See `assets/configmap-template.yaml` for examples
+
+### 5. Create Secret
+
+**For sensitive data:**
+
+```yaml
+apiVersion: v1
+kind: Secret
+metadata:
+  name: <app-name>-secret
+  namespace: <namespace>
+type: Opaque
+stringData:
+  DATABASE_PASSWORD: "changeme"
+  API_KEY: "secret-api-key"
+  # For certificate files
+  tls.crt: |
+    -----BEGIN CERTIFICATE-----
+    ...
+    -----END CERTIFICATE-----
+  tls.key: |
+    -----BEGIN PRIVATE KEY-----
+    ...
+    -----END PRIVATE KEY-----
+```
+
+**Security considerations:**
+- Never commit secrets to Git in plain text
+- Use Sealed Secrets, External Secrets Operator, or Vault
+- Rotate secrets regularly
+- Use RBAC to limit secret access
+- Consider using Secret type: `kubernetes.io/tls` for TLS secrets
+
+### 6. 
Create PersistentVolumeClaim (if needed)

**For stateful applications:**

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: <app-name>-data
  namespace: <namespace>
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: gp3
  resources:
    requests:
      storage: 10Gi
```

**Mount in Deployment:**
```yaml
spec:
  template:
    spec:
      containers:
      - name: app
        volumeMounts:
        - name: data
          mountPath: /var/lib/app
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: <app-name>-data
```

**Storage considerations:**
- Choose an appropriate StorageClass for performance needs
- Use ReadWriteOnce for single-pod access
- Use ReadWriteMany for multi-pod shared storage
- Consider backup strategies
- Set appropriate retention policies

### 7. Apply Security Best Practices

**Add security context to Deployment:**

```yaml
spec:
  template:
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 1000
        seccompProfile:
          type: RuntimeDefault
      containers:
      - name: app
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          capabilities:
            drop:
            - ALL
```

**Security checklist:**
- [ ] Run as non-root user
- [ ] Drop all capabilities
- [ ] Use read-only root filesystem
- [ ] Disable privilege escalation
- [ ] Set seccomp profile
- [ ] Use Pod Security Standards

### 8. Add Labels and Annotations

**Standard labels (recommended):**

```yaml
metadata:
  labels:
    app.kubernetes.io/name: <app-name>
    app.kubernetes.io/instance: <instance-name>
    app.kubernetes.io/version: "1.0.0"
    app.kubernetes.io/component: backend
    app.kubernetes.io/part-of: <system-name>
    app.kubernetes.io/managed-by: kubectl
```

**Useful annotations:**

```yaml
metadata:
  annotations:
    description: "Application description"
    contact: "team@example.com"
    prometheus.io/scrape: "true"
    prometheus.io/port: "9090"
    prometheus.io/path: "/metrics"
```

### 9. 
Organize Multi-Resource Manifests

**File organization options:**

**Option 1: Single file with `---` separator**
```yaml
# app-name.yaml
---
apiVersion: v1
kind: ConfigMap
...
---
apiVersion: v1
kind: Secret
...
---
apiVersion: apps/v1
kind: Deployment
...
---
apiVersion: v1
kind: Service
...
```

**Option 2: Separate files**
```
manifests/
├── configmap.yaml
├── secret.yaml
├── deployment.yaml
├── service.yaml
└── pvc.yaml
```

**Option 3: Kustomize structure**
```
base/
├── kustomization.yaml
├── deployment.yaml
├── service.yaml
└── configmap.yaml
overlays/
├── dev/
│   └── kustomization.yaml
└── prod/
    └── kustomization.yaml
```

### 10. Validate and Test

**Validation steps:**

```bash
# Dry-run validation
kubectl apply -f manifest.yaml --dry-run=client

# Server-side validation
kubectl apply -f manifest.yaml --dry-run=server

# Validate with kubeval
kubeval manifest.yaml

# Validate with kube-score
kube-score score manifest.yaml

# Check with kube-linter
kube-linter lint manifest.yaml
```

**Testing checklist:**
- [ ] Manifest passes dry-run validation
- [ ] All required fields are present
- [ ] Resource limits are reasonable
- [ ] Health checks are configured
- [ ] Security context is set
- [ ] Labels follow conventions
- [ ] Namespace exists or is created

## Common Patterns

### Pattern 1: Simple Stateless Web Application

**Use case:** Standard web API or microservice

**Components needed:**
- Deployment (3 replicas for HA)
- ClusterIP Service
- ConfigMap for configuration
- Secret for API keys
- HorizontalPodAutoscaler (optional)

**Reference:** See `assets/deployment-template.yaml`

### Pattern 2: Stateful Database Application

**Use case:** Database or persistent storage application

**Components needed:**
- StatefulSet (not Deployment)
- Headless Service
- PersistentVolumeClaim template
- ConfigMap for DB configuration
- Secret for credentials

### 
Pattern 3: Background Job or Cron

**Use case:** Scheduled tasks or batch processing

**Components needed:**
- CronJob or Job
- ConfigMap for job parameters
- Secret for credentials
- ServiceAccount with RBAC

### Pattern 4: Multi-Container Pod

**Use case:** Application with sidecar containers

**Components needed:**
- Deployment with multiple containers
- Shared volumes between containers
- Init containers for setup
- Service (if needed)

## Templates

The following templates are available in the `assets/` directory:

- `deployment-template.yaml` - Standard deployment with best practices
- `service-template.yaml` - Service configurations (ClusterIP, LoadBalancer, NodePort)
- `configmap-template.yaml` - ConfigMap examples with different data types
- `secret-template.yaml` - Secret examples (to be generated, not committed)
- `pvc-template.yaml` - PersistentVolumeClaim templates

## Reference Documentation

- `references/deployment-spec.md` - Detailed Deployment specification
- `references/service-spec.md` - Service types and networking details

## Best Practices Summary

1. **Always set resource requests and limits** - Prevents resource starvation
2. **Implement health checks** - Ensures Kubernetes can manage your application
3. **Use specific image tags** - Avoid unpredictable deployments
4. **Apply security contexts** - Run as non-root, drop capabilities
5. **Use ConfigMaps and Secrets** - Separate config from code
6. **Label everything** - Enables filtering and organization
7. **Follow naming conventions** - Use standard Kubernetes labels
8. **Validate before applying** - Use dry-run and validation tools
9. **Version your manifests** - Keep them in Git under version control
10. 
**Document with annotations** - Add context for other developers

## Troubleshooting

**Pods not starting:**
- Check image pull errors: `kubectl describe pod <pod-name>`
- Verify resource availability: `kubectl get nodes`
- Check events: `kubectl get events --sort-by='.lastTimestamp'`

**Service not accessible:**
- Verify the selector matches pod labels: `kubectl get endpoints <service-name>`
- Check service type and port configuration
- Test from within the cluster: `kubectl run debug --rm -it --image=busybox -- sh`

**ConfigMap/Secret not loading:**
- Verify names match in the Deployment
- Check the namespace
- Ensure the resources exist: `kubectl get configmap,secret`

## Next Steps

After creating manifests:
1. Store them in a Git repository
2. Set up a CI/CD pipeline for deployment
3. Consider using Helm or Kustomize for templating
4. Implement GitOps with ArgoCD or Flux
5. Add monitoring and observability

## Related Skills

- `helm-chart-scaffolding` - For templating and packaging
- `gitops-workflow` - For automated deployments
- `k8s-security-policies` - For advanced security configurations
diff --git a/skills/k8s-manifest-generator/assets/configmap-template.yaml b/skills/k8s-manifest-generator/assets/configmap-template.yaml
new file mode 100644
index 0000000..c73ef74
--- /dev/null
+++ b/skills/k8s-manifest-generator/assets/configmap-template.yaml
@@ -0,0 +1,296 @@
# Kubernetes ConfigMap Templates

---
# Template 1: Simple Key-Value Configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: <app-name>-config
  namespace: <namespace>
  labels:
    app.kubernetes.io/name: <app-name>
    app.kubernetes.io/instance: <instance-name>
data:
  # Simple key-value pairs
  APP_ENV: "production"
  LOG_LEVEL: "info"
  DATABASE_HOST: "db.example.com"
  DATABASE_PORT: "5432"
  CACHE_TTL: "3600"
  MAX_CONNECTIONS: "100"

---
# Template 2: Configuration File
apiVersion: v1
kind: ConfigMap
metadata:
  name: <app-name>-config-file
  namespace: <namespace>
  labels:
    app.kubernetes.io/name: <app-name>
data:
  # Application configuration file
  application.yaml: 
| + server: + port: 8080 + host: 0.0.0.0 + + logging: + level: INFO + format: json + + database: + host: db.example.com + port: 5432 + pool_size: 20 + timeout: 30 + + cache: + enabled: true + ttl: 3600 + max_entries: 10000 + + features: + new_ui: true + beta_features: false + +--- +# Template 3: Multiple Configuration Files +apiVersion: v1 +kind: ConfigMap +metadata: + name: -multi-config + namespace: + labels: + app.kubernetes.io/name: +data: + # Nginx configuration + nginx.conf: | + user nginx; + worker_processes auto; + error_log /var/log/nginx/error.log warn; + pid /var/run/nginx.pid; + + events { + worker_connections 1024; + } + + http { + include /etc/nginx/mime.types; + default_type application/octet-stream; + + log_format main '$remote_addr - $remote_user [$time_local] "$request" ' + '$status $body_bytes_sent "$http_referer" ' + '"$http_user_agent" "$http_x_forwarded_for"'; + + access_log /var/log/nginx/access.log main; + sendfile on; + keepalive_timeout 65; + + include /etc/nginx/conf.d/*.conf; + } + + # Default site configuration + default.conf: | + server { + listen 80; + server_name _; + + location / { + proxy_pass http://backend:8080; + proxy_set_header Host $host; + proxy_set_header X-Real-IP $remote_addr; + proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; + proxy_set_header X-Forwarded-Proto $scheme; + } + + location /health { + access_log off; + return 200 "healthy\n"; + } + } + +--- +# Template 4: JSON Configuration +apiVersion: v1 +kind: ConfigMap +metadata: + name: -json-config + namespace: + labels: + app.kubernetes.io/name: +data: + config.json: | + { + "server": { + "port": 8080, + "host": "0.0.0.0", + "timeout": 30 + }, + "database": { + "host": "postgres.example.com", + "port": 5432, + "database": "myapp", + "pool": { + "min": 2, + "max": 20 + } + }, + "redis": { + "host": "redis.example.com", + "port": 6379, + "db": 0 + }, + "features": { + "auth": true, + "metrics": true, + "tracing": true + } + } + +--- +# Template 5: 
Environment-Specific Configuration +apiVersion: v1 +kind: ConfigMap +metadata: + name: -prod-config + namespace: production + labels: + app.kubernetes.io/name: + environment: production +data: + APP_ENV: "production" + LOG_LEVEL: "warn" + DEBUG: "false" + RATE_LIMIT: "1000" + CACHE_TTL: "3600" + DATABASE_POOL_SIZE: "50" + FEATURE_FLAG_NEW_UI: "true" + FEATURE_FLAG_BETA: "false" + +--- +# Template 6: Script Configuration +apiVersion: v1 +kind: ConfigMap +metadata: + name: -scripts + namespace: + labels: + app.kubernetes.io/name: +data: + # Initialization script + init.sh: | + #!/bin/bash + set -e + + echo "Running initialization..." + + # Wait for database + until nc -z $DATABASE_HOST $DATABASE_PORT; do + echo "Waiting for database..." + sleep 2 + done + + echo "Database is ready!" + + # Run migrations + if [ "$RUN_MIGRATIONS" = "true" ]; then + echo "Running database migrations..." + ./migrate up + fi + + echo "Initialization complete!" + + # Health check script + healthcheck.sh: | + #!/bin/bash + + # Check application health endpoint + response=$(curl -sf http://localhost:8080/health) + + if [ $? 
-eq 0 ]; then + echo "Health check passed" + exit 0 + else + echo "Health check failed" + exit 1 + fi + +--- +# Template 7: Prometheus Configuration +apiVersion: v1 +kind: ConfigMap +metadata: + name: prometheus-config + namespace: monitoring + labels: + app.kubernetes.io/name: prometheus +data: + prometheus.yml: | + global: + scrape_interval: 15s + evaluation_interval: 15s + external_labels: + cluster: 'production' + region: 'us-west-2' + + alerting: + alertmanagers: + - static_configs: + - targets: + - alertmanager:9093 + + rule_files: + - /etc/prometheus/rules/*.yml + + scrape_configs: + - job_name: 'kubernetes-pods' + kubernetes_sd_configs: + - role: pod + relabel_configs: + - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape] + action: keep + regex: true + - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path] + action: replace + target_label: __metrics_path__ + regex: (.+) + - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port] + action: replace + target_label: __address__ + regex: ([^:]+)(?::\d+)?;(\d+) + replacement: $1:$2 + +--- +# Usage Examples: +# +# 1. Mount as environment variables: +# envFrom: +# - configMapRef: +# name: -config +# +# 2. Mount as files: +# volumeMounts: +# - name: config +# mountPath: /etc/app +# volumes: +# - name: config +# configMap: +# name: -config-file +# +# 3. Mount specific keys as files: +# volumes: +# - name: nginx-config +# configMap: +# name: -multi-config +# items: +# - key: nginx.conf +# path: nginx.conf +# +# 4. 
Use individual environment variables: +# env: +# - name: LOG_LEVEL +# valueFrom: +# configMapKeyRef: +# name: -config +# key: LOG_LEVEL diff --git a/skills/k8s-manifest-generator/assets/deployment-template.yaml b/skills/k8s-manifest-generator/assets/deployment-template.yaml new file mode 100644 index 0000000..402be74 --- /dev/null +++ b/skills/k8s-manifest-generator/assets/deployment-template.yaml @@ -0,0 +1,203 @@ +# Production-Ready Kubernetes Deployment Template +# Replace all with actual values + +apiVersion: apps/v1 +kind: Deployment +metadata: + name: + namespace: + labels: + app.kubernetes.io/name: + app.kubernetes.io/instance: + app.kubernetes.io/version: "" + app.kubernetes.io/component: # backend, frontend, database, cache + app.kubernetes.io/part-of: + app.kubernetes.io/managed-by: kubectl + annotations: + description: "" + contact: "" +spec: + replicas: 3 # Minimum 3 for production HA + revisionHistoryLimit: 10 + + selector: + matchLabels: + app.kubernetes.io/name: + app.kubernetes.io/instance: + + strategy: + type: RollingUpdate + rollingUpdate: + maxSurge: 1 + maxUnavailable: 0 # Zero-downtime deployment + + minReadySeconds: 10 + progressDeadlineSeconds: 600 + + template: + metadata: + labels: + app.kubernetes.io/name: + app.kubernetes.io/instance: + app.kubernetes.io/version: "" + annotations: + prometheus.io/scrape: "true" + prometheus.io/port: "9090" + prometheus.io/path: "/metrics" + + spec: + serviceAccountName: + + # Pod-level security context + securityContext: + runAsNonRoot: true + runAsUser: 1000 + runAsGroup: 1000 + fsGroup: 1000 + seccompProfile: + type: RuntimeDefault + + # Init containers (optional) + initContainers: + - name: init-wait + image: busybox:1.36 + command: ['sh', '-c', 'echo "Initializing..."'] + securityContext: + allowPrivilegeEscalation: false + runAsNonRoot: true + runAsUser: 1000 + + containers: + - name: + image: /: # Never use :latest + imagePullPolicy: IfNotPresent + + ports: + - name: http + containerPort: 8080 + 
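          # Note: protocol defaults to TCP when omitted; naming ports ("http",
          # "metrics") lets a Service reference them by name in targetPort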
protocol: TCP + - name: metrics + containerPort: 9090 + protocol: TCP + + # Environment variables + env: + - name: POD_NAME + valueFrom: + fieldRef: + fieldPath: metadata.name + - name: POD_NAMESPACE + valueFrom: + fieldRef: + fieldPath: metadata.namespace + - name: POD_IP + valueFrom: + fieldRef: + fieldPath: status.podIP + + # Load from ConfigMap and Secret + envFrom: + - configMapRef: + name: -config + - secretRef: + name: -secret + + # Resource limits + resources: + requests: + memory: "256Mi" + cpu: "250m" + limits: + memory: "512Mi" + cpu: "500m" + + # Startup probe (for slow-starting apps) + startupProbe: + httpGet: + path: /health/startup + port: http + initialDelaySeconds: 0 + periodSeconds: 10 + timeoutSeconds: 3 + failureThreshold: 30 # 5 minutes to start + + # Liveness probe + livenessProbe: + httpGet: + path: /health/live + port: http + initialDelaySeconds: 30 + periodSeconds: 10 + timeoutSeconds: 5 + failureThreshold: 3 + + # Readiness probe + readinessProbe: + httpGet: + path: /health/ready + port: http + initialDelaySeconds: 5 + periodSeconds: 5 + timeoutSeconds: 3 + failureThreshold: 3 + + # Volume mounts + volumeMounts: + - name: tmp + mountPath: /tmp + - name: cache + mountPath: /app/cache + # - name: data + # mountPath: /var/lib/app + + # Container security context + securityContext: + allowPrivilegeEscalation: false + readOnlyRootFilesystem: true + runAsNonRoot: true + runAsUser: 1000 + capabilities: + drop: + - ALL + + # Lifecycle hooks + lifecycle: + preStop: + exec: + command: ["/bin/sh", "-c", "sleep 15"] # Graceful shutdown + + # Volumes + volumes: + - name: tmp + emptyDir: {} + - name: cache + emptyDir: + sizeLimit: 1Gi + # - name: data + # persistentVolumeClaim: + # claimName: -data + + # Scheduling + affinity: + podAntiAffinity: + preferredDuringSchedulingIgnoredDuringExecution: + - weight: 100 + podAffinityTerm: + labelSelector: + matchLabels: + app.kubernetes.io/name: + topologyKey: kubernetes.io/hostname + + 
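      # Stricter alternative (assumption: the cluster has at least as many
      # schedulable nodes as replicas): make the anti-affinity a hard
      # requirement so two replicas are never placed on the same node.
      #
      # podAntiAffinity:
      #   requiredDuringSchedulingIgnoredDuringExecution:
      #   - labelSelector:
      #       matchLabels:
      #         app.kubernetes.io/name: <app-name>
      #     topologyKey: kubernetes.io/hostname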
topologySpreadConstraints: + - maxSkew: 1 + topologyKey: topology.kubernetes.io/zone + whenUnsatisfiable: ScheduleAnyway + labelSelector: + matchLabels: + app.kubernetes.io/name: + + terminationGracePeriodSeconds: 30 + + # Image pull secrets (if using private registry) + # imagePullSecrets: + # - name: regcred diff --git a/skills/k8s-manifest-generator/assets/service-template.yaml b/skills/k8s-manifest-generator/assets/service-template.yaml new file mode 100644 index 0000000..e740d80 --- /dev/null +++ b/skills/k8s-manifest-generator/assets/service-template.yaml @@ -0,0 +1,171 @@ +# Kubernetes Service Templates + +--- +# Template 1: ClusterIP Service (Internal Only) +apiVersion: v1 +kind: Service +metadata: + name: + namespace: + labels: + app.kubernetes.io/name: + app.kubernetes.io/instance: + annotations: + description: "Internal service for " +spec: + type: ClusterIP + selector: + app.kubernetes.io/name: + app.kubernetes.io/instance: + ports: + - name: http + port: 80 + targetPort: http # Named port from container + protocol: TCP + sessionAffinity: None + +--- +# Template 2: LoadBalancer Service (External Access) +apiVersion: v1 +kind: Service +metadata: + name: -lb + namespace: + labels: + app.kubernetes.io/name: + annotations: + # AWS NLB annotations + service.beta.kubernetes.io/aws-load-balancer-type: "nlb" + service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing" + service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true" + # SSL certificate (optional) + # service.beta.kubernetes.io/aws-load-balancer-ssl-cert: "arn:aws:acm:..." 
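    # Optional (assumption: TLS is terminated at the NLB): tell the load
    # balancer which listener ports use the certificate from the ssl-cert
    # annotation above
    # service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "443"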
+spec: + type: LoadBalancer + externalTrafficPolicy: Local # Preserves client IP + selector: + app.kubernetes.io/name: + ports: + - name: http + port: 80 + targetPort: http + protocol: TCP + - name: https + port: 443 + targetPort: https + protocol: TCP + # Restrict access to specific IPs (optional) + # loadBalancerSourceRanges: + # - 203.0.113.0/24 + +--- +# Template 3: NodePort Service (Direct Node Access) +apiVersion: v1 +kind: Service +metadata: + name: -np + namespace: + labels: + app.kubernetes.io/name: +spec: + type: NodePort + selector: + app.kubernetes.io/name: + ports: + - name: http + port: 80 + targetPort: 8080 + nodePort: 30080 # Optional, 30000-32767 range + protocol: TCP + +--- +# Template 4: Headless Service (StatefulSet) +apiVersion: v1 +kind: Service +metadata: + name: -headless + namespace: + labels: + app.kubernetes.io/name: +spec: + clusterIP: None # Headless + selector: + app.kubernetes.io/name: + ports: + - name: client + port: 9042 + targetPort: 9042 + publishNotReadyAddresses: true # Include not-ready pods in DNS + +--- +# Template 5: Multi-Port Service with Metrics +apiVersion: v1 +kind: Service +metadata: + name: -multi + namespace: + labels: + app.kubernetes.io/name: + annotations: + prometheus.io/scrape: "true" + prometheus.io/port: "9090" + prometheus.io/path: "/metrics" +spec: + type: ClusterIP + selector: + app.kubernetes.io/name: + ports: + - name: http + port: 80 + targetPort: 8080 + protocol: TCP + - name: https + port: 443 + targetPort: 8443 + protocol: TCP + - name: grpc + port: 9090 + targetPort: 9090 + protocol: TCP + - name: metrics + port: 9091 + targetPort: 9091 + protocol: TCP + +--- +# Template 6: Service with Session Affinity +apiVersion: v1 +kind: Service +metadata: + name: -sticky + namespace: + labels: + app.kubernetes.io/name: +spec: + type: ClusterIP + selector: + app.kubernetes.io/name: + ports: + - name: http + port: 80 + targetPort: 8080 + protocol: TCP + sessionAffinity: ClientIP + sessionAffinityConfig: + 
clientIP: + timeoutSeconds: 10800 # 3 hours + +--- +# Template 7: ExternalName Service (External Service Mapping) +apiVersion: v1 +kind: Service +metadata: + name: external-db + namespace: +spec: + type: ExternalName + externalName: db.example.com + ports: + - port: 5432 + targetPort: 5432 + protocol: TCP diff --git a/skills/k8s-manifest-generator/references/deployment-spec.md b/skills/k8s-manifest-generator/references/deployment-spec.md new file mode 100644 index 0000000..2dfa7ee --- /dev/null +++ b/skills/k8s-manifest-generator/references/deployment-spec.md @@ -0,0 +1,753 @@ +# Kubernetes Deployment Specification Reference + +Comprehensive reference for Kubernetes Deployment resources, covering all key fields, best practices, and common patterns. + +## Overview + +A Deployment provides declarative updates for Pods and ReplicaSets. It manages the desired state of your application, handling rollouts, rollbacks, and scaling operations. + +## Complete Deployment Specification + +```yaml +apiVersion: apps/v1 +kind: Deployment +metadata: + name: my-app + namespace: production + labels: + app.kubernetes.io/name: my-app + app.kubernetes.io/version: "1.0.0" + app.kubernetes.io/component: backend + app.kubernetes.io/part-of: my-system + annotations: + description: "Main application deployment" + contact: "backend-team@example.com" +spec: + # Replica management + replicas: 3 + revisionHistoryLimit: 10 + + # Pod selection + selector: + matchLabels: + app: my-app + version: v1 + + # Update strategy + strategy: + type: RollingUpdate + rollingUpdate: + maxSurge: 1 + maxUnavailable: 0 + + # Minimum time for pod to be ready + minReadySeconds: 10 + + # Deployment will fail if it doesn't progress in this time + progressDeadlineSeconds: 600 + + # Pod template + template: + metadata: + labels: + app: my-app + version: v1 + annotations: + prometheus.io/scrape: "true" + prometheus.io/port: "9090" + spec: + # Service account for RBAC + serviceAccountName: my-app + + # Security context 
for the pod + securityContext: + runAsNonRoot: true + runAsUser: 1000 + fsGroup: 1000 + seccompProfile: + type: RuntimeDefault + + # Init containers run before main containers + initContainers: + - name: init-db + image: busybox:1.36 + command: ['sh', '-c', 'until nc -z db-service 5432; do sleep 1; done'] + securityContext: + allowPrivilegeEscalation: false + runAsNonRoot: true + runAsUser: 1000 + + # Main containers + containers: + - name: app + image: myapp:1.0.0 + imagePullPolicy: IfNotPresent + + # Container ports + ports: + - name: http + containerPort: 8080 + protocol: TCP + - name: metrics + containerPort: 9090 + protocol: TCP + + # Environment variables + env: + - name: POD_NAME + valueFrom: + fieldRef: + fieldPath: metadata.name + - name: POD_NAMESPACE + valueFrom: + fieldRef: + fieldPath: metadata.namespace + - name: DATABASE_URL + valueFrom: + secretKeyRef: + name: db-credentials + key: url + + # ConfigMap and Secret references + envFrom: + - configMapRef: + name: app-config + - secretRef: + name: app-secrets + + # Resource requests and limits + resources: + requests: + memory: "256Mi" + cpu: "250m" + limits: + memory: "512Mi" + cpu: "500m" + + # Liveness probe + livenessProbe: + httpGet: + path: /health/live + port: http + httpHeaders: + - name: Custom-Header + value: Awesome + initialDelaySeconds: 30 + periodSeconds: 10 + timeoutSeconds: 5 + successThreshold: 1 + failureThreshold: 3 + + # Readiness probe + readinessProbe: + httpGet: + path: /health/ready + port: http + initialDelaySeconds: 5 + periodSeconds: 5 + timeoutSeconds: 3 + successThreshold: 1 + failureThreshold: 3 + + # Startup probe (for slow-starting containers) + startupProbe: + httpGet: + path: /health/startup + port: http + initialDelaySeconds: 0 + periodSeconds: 10 + timeoutSeconds: 3 + successThreshold: 1 + failureThreshold: 30 + + # Volume mounts + volumeMounts: + - name: data + mountPath: /var/lib/app + - name: config + mountPath: /etc/app + readOnly: true + - name: tmp + mountPath: 
/tmp + + # Security context for container + securityContext: + allowPrivilegeEscalation: false + readOnlyRootFilesystem: true + runAsNonRoot: true + runAsUser: 1000 + capabilities: + drop: + - ALL + + # Lifecycle hooks + lifecycle: + postStart: + exec: + command: ["/bin/sh", "-c", "echo Container started > /tmp/started"] + preStop: + exec: + command: ["/bin/sh", "-c", "sleep 15"] + + # Volumes + volumes: + - name: data + persistentVolumeClaim: + claimName: app-data + - name: config + configMap: + name: app-config + - name: tmp + emptyDir: {} + + # DNS configuration + dnsPolicy: ClusterFirst + dnsConfig: + options: + - name: ndots + value: "2" + + # Scheduling + nodeSelector: + disktype: ssd + + affinity: + podAntiAffinity: + preferredDuringSchedulingIgnoredDuringExecution: + - weight: 100 + podAffinityTerm: + labelSelector: + matchExpressions: + - key: app + operator: In + values: + - my-app + topologyKey: kubernetes.io/hostname + + tolerations: + - key: "app" + operator: "Equal" + value: "my-app" + effect: "NoSchedule" + + # Termination + terminationGracePeriodSeconds: 30 + + # Image pull secrets + imagePullSecrets: + - name: regcred +``` + +## Field Reference + +### Metadata Fields + +#### Required Fields +- `apiVersion`: `apps/v1` (current stable version) +- `kind`: `Deployment` +- `metadata.name`: Unique name within namespace + +#### Recommended Metadata +- `metadata.namespace`: Target namespace (defaults to `default`) +- `metadata.labels`: Key-value pairs for organization +- `metadata.annotations`: Non-identifying metadata + +### Spec Fields + +#### Replica Management + +**`replicas`** (integer, default: 1) +- Number of desired pod instances +- Best practice: Use 3+ for production high availability +- Can be scaled manually or via HorizontalPodAutoscaler + +**`revisionHistoryLimit`** (integer, default: 10) +- Number of old ReplicaSets to retain for rollback +- Set to 0 to disable rollback capability +- Reduces storage overhead for long-running deployments + 
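When replicas are managed by a HorizontalPodAutoscaler rather than a fixed value, a minimal HPA sketch looks like this (names here match the `my-app` example above; assumes metrics-server is installed):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```

When an HPA owns the replica count, omit `spec.replicas` from the Deployment manifest so declarative re-applies do not fight the autoscaler.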
+#### Update Strategy + +**`strategy.type`** (string) +- `RollingUpdate` (default): Gradual pod replacement +- `Recreate`: Delete all pods before creating new ones + +**`strategy.rollingUpdate.maxSurge`** (int or percent, default: 25%) +- Maximum pods above desired replicas during update +- Example: With 3 replicas and maxSurge=1, up to 4 pods during update + +**`strategy.rollingUpdate.maxUnavailable`** (int or percent, default: 25%) +- Maximum pods below desired replicas during update +- Set to 0 for zero-downtime deployments +- Cannot be 0 if maxSurge is 0 + +**Best practices:** +```yaml +# Zero-downtime deployment +strategy: + type: RollingUpdate + rollingUpdate: + maxSurge: 1 + maxUnavailable: 0 + +# Fast deployment (can have brief downtime) +strategy: + type: RollingUpdate + rollingUpdate: + maxSurge: 2 + maxUnavailable: 1 + +# Complete replacement +strategy: + type: Recreate +``` + +#### Pod Template + +**`template.metadata.labels`** +- Must include labels matching `spec.selector.matchLabels` +- Add version labels for blue/green deployments +- Include standard Kubernetes labels + +**`template.spec.containers`** (required) +- Array of container specifications +- At least one container required +- Each container needs unique name + +#### Container Configuration + +**Image Management:** +```yaml +containers: +- name: app + image: registry.example.com/myapp:1.0.0 + imagePullPolicy: IfNotPresent # or Always, Never +``` + +Image pull policies: +- `IfNotPresent`: Pull if not cached (default for tagged images) +- `Always`: Always pull (default for :latest) +- `Never`: Never pull, fail if not cached + +**Port Declarations:** +```yaml +ports: +- name: http # Named for referencing in Service + containerPort: 8080 + protocol: TCP # TCP (default), UDP, or SCTP + hostPort: 8080 # Optional: Bind to host port (rarely used) +``` + +#### Resource Management + +**Requests vs Limits:** + +```yaml +resources: + requests: + memory: "256Mi" # Guaranteed resources + cpu: "250m" # 
0.25 CPU cores + limits: + memory: "512Mi" # Maximum allowed + cpu: "500m" # 0.5 CPU cores +``` + +**QoS Classes (determined automatically):** + +1. **Guaranteed**: requests = limits for all containers + - Highest priority + - Last to be evicted + +2. **Burstable**: requests < limits or only requests set + - Medium priority + - Evicted before Guaranteed + +3. **BestEffort**: No requests or limits set + - Lowest priority + - First to be evicted + +**Best practices:** +- Always set requests in production +- Set limits to prevent resource monopolization +- Memory limits should be 1.5-2x requests +- CPU limits can be higher for bursty workloads + +#### Health Checks + +**Probe Types:** + +1. **startupProbe** - For slow-starting applications + ```yaml + startupProbe: + httpGet: + path: /health/startup + port: 8080 + initialDelaySeconds: 0 + periodSeconds: 10 + failureThreshold: 30 # 5 minutes to start (10s * 30) + ``` + +2. **livenessProbe** - Restarts unhealthy containers + ```yaml + livenessProbe: + httpGet: + path: /health/live + port: 8080 + initialDelaySeconds: 30 + periodSeconds: 10 + timeoutSeconds: 5 + failureThreshold: 3 # Restart after 3 failures + ``` + +3. 
**readinessProbe** - Controls traffic routing + ```yaml + readinessProbe: + httpGet: + path: /health/ready + port: 8080 + initialDelaySeconds: 5 + periodSeconds: 5 + failureThreshold: 3 # Remove from service after 3 failures + ``` + +**Probe Mechanisms:** + +```yaml +# HTTP GET +httpGet: + path: /health + port: 8080 + httpHeaders: + - name: Authorization + value: Bearer token + +# TCP Socket +tcpSocket: + port: 3306 + +# Command execution +exec: + command: + - cat + - /tmp/healthy + +# gRPC (Kubernetes 1.24+) +grpc: + port: 9090 + service: my.service.health.v1.Health +``` + +**Probe Timing Parameters:** + +- `initialDelaySeconds`: Wait before first probe +- `periodSeconds`: How often to probe +- `timeoutSeconds`: Probe timeout +- `successThreshold`: Successes needed to mark healthy (1 for liveness/startup) +- `failureThreshold`: Failures before taking action + +#### Security Context + +**Pod-level security context:** +```yaml +spec: + securityContext: + runAsNonRoot: true + runAsUser: 1000 + runAsGroup: 1000 + fsGroup: 1000 + fsGroupChangePolicy: OnRootMismatch + seccompProfile: + type: RuntimeDefault +``` + +**Container-level security context:** +```yaml +containers: +- name: app + securityContext: + allowPrivilegeEscalation: false + readOnlyRootFilesystem: true + runAsNonRoot: true + runAsUser: 1000 + capabilities: + drop: + - ALL + add: + - NET_BIND_SERVICE # Only if needed +``` + +**Security best practices:** +- Always run as non-root (`runAsNonRoot: true`) +- Drop all capabilities and add only needed ones +- Use read-only root filesystem when possible +- Enable seccomp profile +- Disable privilege escalation + +#### Volumes + +**Volume Types:** + +```yaml +volumes: +# PersistentVolumeClaim +- name: data + persistentVolumeClaim: + claimName: app-data + +# ConfigMap +- name: config + configMap: + name: app-config + items: + - key: app.properties + path: application.properties + +# Secret +- name: secrets + secret: + secretName: app-secrets + defaultMode: 0400 + 
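
# Projected (combine sources into one mount; reuses the ConfigMap and
# Secret names from the examples above)
- name: combined
  projected:
    sources:
    - configMap:
        name: app-config
    - secret:
        name: app-secrets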
+# EmptyDir (ephemeral) +- name: cache + emptyDir: + sizeLimit: 1Gi + +# HostPath (avoid in production) +- name: host-data + hostPath: + path: /data + type: DirectoryOrCreate +``` + +#### Scheduling + +**Node Selection:** + +```yaml +# Simple node selector +nodeSelector: + disktype: ssd + zone: us-west-1a + +# Node affinity (more expressive) +affinity: + nodeAffinity: + requiredDuringSchedulingIgnoredDuringExecution: + nodeSelectorTerms: + - matchExpressions: + - key: kubernetes.io/arch + operator: In + values: + - amd64 + - arm64 +``` + +**Pod Affinity/Anti-Affinity:** + +```yaml +# Spread pods across nodes +affinity: + podAntiAffinity: + requiredDuringSchedulingIgnoredDuringExecution: + - labelSelector: + matchLabels: + app: my-app + topologyKey: kubernetes.io/hostname + +# Co-locate with database +affinity: + podAffinity: + preferredDuringSchedulingIgnoredDuringExecution: + - weight: 100 + podAffinityTerm: + labelSelector: + matchLabels: + app: database + topologyKey: kubernetes.io/hostname +``` + +**Tolerations:** + +```yaml +tolerations: +- key: "node.kubernetes.io/unreachable" + operator: "Exists" + effect: "NoExecute" + tolerationSeconds: 30 +- key: "dedicated" + operator: "Equal" + value: "database" + effect: "NoSchedule" +``` + +## Common Patterns + +### High Availability Deployment + +```yaml +spec: + replicas: 3 + strategy: + type: RollingUpdate + rollingUpdate: + maxSurge: 1 + maxUnavailable: 0 + template: + spec: + affinity: + podAntiAffinity: + requiredDuringSchedulingIgnoredDuringExecution: + - labelSelector: + matchLabels: + app: my-app + topologyKey: kubernetes.io/hostname + topologySpreadConstraints: + - maxSkew: 1 + topologyKey: topology.kubernetes.io/zone + whenUnsatisfiable: DoNotSchedule + labelSelector: + matchLabels: + app: my-app +``` + +### Sidecar Container Pattern + +```yaml +spec: + template: + spec: + containers: + - name: app + image: myapp:1.0.0 + volumeMounts: + - name: shared-logs + mountPath: /var/log + - name: log-forwarder + 
image: fluent-bit:2.0 + volumeMounts: + - name: shared-logs + mountPath: /var/log + readOnly: true + volumes: + - name: shared-logs + emptyDir: {} +``` + +### Init Container for Dependencies + +```yaml +spec: + template: + spec: + initContainers: + - name: wait-for-db + image: busybox:1.36 + command: + - sh + - -c + - | + until nc -z database-service 5432; do + echo "Waiting for database..." + sleep 2 + done + - name: run-migrations + image: myapp:1.0.0 + command: ["./migrate", "up"] + env: + - name: DATABASE_URL + valueFrom: + secretKeyRef: + name: db-credentials + key: url + containers: + - name: app + image: myapp:1.0.0 +``` + +## Best Practices + +### Production Checklist + +- [ ] Set resource requests and limits +- [ ] Implement all three probe types (startup, liveness, readiness) +- [ ] Use specific image tags (not :latest) +- [ ] Configure security context (non-root, read-only filesystem) +- [ ] Set replica count >= 3 for HA +- [ ] Configure pod anti-affinity for spread +- [ ] Set appropriate update strategy (maxUnavailable: 0 for zero-downtime) +- [ ] Use ConfigMaps and Secrets for configuration +- [ ] Add standard labels and annotations +- [ ] Configure graceful shutdown (preStop hook, terminationGracePeriodSeconds) +- [ ] Set revisionHistoryLimit for rollback capability +- [ ] Use ServiceAccount with minimal RBAC permissions + +### Performance Tuning + +**Fast startup:** +```yaml +spec: + minReadySeconds: 5 + strategy: + rollingUpdate: + maxSurge: 2 + maxUnavailable: 1 +``` + +**Zero-downtime updates:** +```yaml +spec: + minReadySeconds: 10 + strategy: + rollingUpdate: + maxSurge: 1 + maxUnavailable: 0 +``` + +**Graceful shutdown:** +```yaml +spec: + template: + spec: + terminationGracePeriodSeconds: 60 + containers: + - name: app + lifecycle: + preStop: + exec: + command: ["/bin/sh", "-c", "sleep 15 && kill -SIGTERM 1"] +``` + +## Troubleshooting + +### Common Issues + +**Pods not starting:** +```bash +kubectl describe deployment +kubectl get pods -l 
app=<app-label>
+kubectl describe pod <pod-name>
+kubectl logs <pod-name>
+```
+
+**ImagePullBackOff:**
+- Check image name and tag
+- Verify imagePullSecrets
+- Check registry credentials
+
+**CrashLoopBackOff:**
+- Check container logs
+- Verify liveness probe is not too aggressive
+- Check resource limits
+- Verify application dependencies
+
+**Deployment stuck in progress:**
+- Check progressDeadlineSeconds
+- Verify readiness probes
+- Check resource availability
+
+## Related Resources
+
+- [Kubernetes Deployment API Reference](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#deployment-v1-apps)
+- [Pod Security Standards](https://kubernetes.io/docs/concepts/security/pod-security-standards/)
+- [Resource Management](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/)
diff --git a/skills/k8s-manifest-generator/references/service-spec.md b/skills/k8s-manifest-generator/references/service-spec.md
new file mode 100644
index 0000000..65abbc4
--- /dev/null
+++ b/skills/k8s-manifest-generator/references/service-spec.md
@@ -0,0 +1,724 @@
+# Kubernetes Service Specification Reference
+
+Comprehensive reference for Kubernetes Service resources, covering service types, networking, load balancing, and service discovery patterns.
+
+## Overview
+
+A Service provides stable network endpoints for accessing Pods. Services enable loose coupling between microservices by providing service discovery and load balancing.
+
+## Service Types
+
+### 1. ClusterIP (Default)
+
+Exposes the service on an internal cluster IP. Only reachable from within the cluster.
+
+```yaml
+apiVersion: v1
+kind: Service
+metadata:
+  name: backend-service
+  namespace: production
+spec:
+  type: ClusterIP
+  selector:
+    app: backend
+  ports:
+  - name: http
+    port: 80
+    targetPort: 8080
+    protocol: TCP
+  sessionAffinity: None
+```
+
+**Use cases:**
+- Internal microservice communication
+- Database services
+- Internal APIs
+- Message queues
+
+### 2. 
NodePort + +Exposes the service on each Node's IP at a static port (30000-32767 range). + +```yaml +apiVersion: v1 +kind: Service +metadata: + name: frontend-service +spec: + type: NodePort + selector: + app: frontend + ports: + - name: http + port: 80 + targetPort: 8080 + nodePort: 30080 # Optional, auto-assigned if omitted + protocol: TCP +``` + +**Use cases:** +- Development/testing external access +- Small deployments without load balancer +- Direct node access requirements + +**Limitations:** +- Limited port range (30000-32767) +- Must handle node failures +- No built-in load balancing across nodes + +### 3. LoadBalancer + +Exposes the service using a cloud provider's load balancer. + +```yaml +apiVersion: v1 +kind: Service +metadata: + name: public-api + annotations: + service.beta.kubernetes.io/aws-load-balancer-type: "nlb" + service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing" +spec: + type: LoadBalancer + selector: + app: api + ports: + - name: https + port: 443 + targetPort: 8443 + protocol: TCP + loadBalancerSourceRanges: + - 203.0.113.0/24 +``` + +**Cloud-specific annotations:** + +**AWS:** +```yaml +annotations: + service.beta.kubernetes.io/aws-load-balancer-type: "nlb" # or "external" + service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing" + service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true" + service.beta.kubernetes.io/aws-load-balancer-ssl-cert: "arn:aws:acm:..." + service.beta.kubernetes.io/aws-load-balancer-backend-protocol: "http" +``` + +**Azure:** +```yaml +annotations: + service.beta.kubernetes.io/azure-load-balancer-internal: "true" + service.beta.kubernetes.io/azure-pip-name: "my-public-ip" +``` + +**GCP:** +```yaml +annotations: + cloud.google.com/load-balancer-type: "Internal" + cloud.google.com/backend-config: '{"default": "my-backend-config"}' +``` + +### 4. ExternalName + +Maps service to external DNS name (CNAME record). 
+ +```yaml +apiVersion: v1 +kind: Service +metadata: + name: external-db +spec: + type: ExternalName + externalName: db.external.example.com + ports: + - port: 5432 +``` + +**Use cases:** +- Accessing external services +- Service migration scenarios +- Multi-cluster service references + +## Complete Service Specification + +```yaml +apiVersion: v1 +kind: Service +metadata: + name: my-service + namespace: production + labels: + app: my-app + tier: backend + annotations: + description: "Main application service" + prometheus.io/scrape: "true" +spec: + # Service type + type: ClusterIP + + # Pod selector + selector: + app: my-app + version: v1 + + # Ports configuration + ports: + - name: http + port: 80 # Service port + targetPort: 8080 # Container port (or named port) + protocol: TCP # TCP, UDP, or SCTP + + # Session affinity + sessionAffinity: ClientIP + sessionAffinityConfig: + clientIP: + timeoutSeconds: 10800 + + # IP configuration + clusterIP: 10.0.0.10 # Optional: specific IP + clusterIPs: + - 10.0.0.10 + ipFamilies: + - IPv4 + ipFamilyPolicy: SingleStack + + # External traffic policy + externalTrafficPolicy: Local + + # Internal traffic policy + internalTrafficPolicy: Local + + # Health check + healthCheckNodePort: 30000 + + # Load balancer config (for type: LoadBalancer) + loadBalancerIP: 203.0.113.100 + loadBalancerSourceRanges: + - 203.0.113.0/24 + + # External IPs + externalIPs: + - 80.11.12.10 + + # Publishing strategy + publishNotReadyAddresses: false +``` + +## Port Configuration + +### Named Ports + +Use named ports in Pods for flexibility: + +**Deployment:** +```yaml +spec: + template: + spec: + containers: + - name: app + ports: + - name: http + containerPort: 8080 + - name: metrics + containerPort: 9090 +``` + +**Service:** +```yaml +spec: + ports: + - name: http + port: 80 + targetPort: http # References named port + - name: metrics + port: 9090 + targetPort: metrics +``` + +### Multiple Ports + +```yaml +spec: + ports: + - name: http + port: 80 + 
targetPort: 8080 + protocol: TCP + - name: https + port: 443 + targetPort: 8443 + protocol: TCP + - name: grpc + port: 9090 + targetPort: 9090 + protocol: TCP +``` + +## Session Affinity + +### None (Default) + +Distributes requests randomly across pods. + +```yaml +spec: + sessionAffinity: None +``` + +### ClientIP + +Routes requests from same client IP to same pod. + +```yaml +spec: + sessionAffinity: ClientIP + sessionAffinityConfig: + clientIP: + timeoutSeconds: 10800 # 3 hours +``` + +**Use cases:** +- Stateful applications +- Session-based applications +- WebSocket connections + +## Traffic Policies + +### External Traffic Policy + +**Cluster (Default):** +```yaml +spec: + externalTrafficPolicy: Cluster +``` +- Load balances across all nodes +- May add extra network hop +- Source IP is masked + +**Local:** +```yaml +spec: + externalTrafficPolicy: Local +``` +- Traffic goes only to pods on receiving node +- Preserves client source IP +- Better performance (no extra hop) +- May cause imbalanced load + +### Internal Traffic Policy + +```yaml +spec: + internalTrafficPolicy: Local # or Cluster +``` + +Controls traffic routing for cluster-internal clients. + +## Headless Services + +Service without cluster IP for direct pod access. 
+
+```yaml
+apiVersion: v1
+kind: Service
+metadata:
+  name: database
+spec:
+  clusterIP: None  # Headless
+  selector:
+    app: database
+  ports:
+  - port: 5432
+    targetPort: 5432
+```
+
+**Use cases:**
+- StatefulSet pod discovery
+- Direct pod-to-pod communication
+- Custom load balancing
+- Database clusters
+
+**DNS returns:**
+- Individual pod IPs instead of service IP
+- Format: `<pod-name>.<service-name>.<namespace>.svc.cluster.local`
+
+## Service Discovery
+
+### DNS
+
+**ClusterIP Service:**
+```
+<service-name>.<namespace>.svc.cluster.local
+```
+
+Example:
+```bash
+curl http://backend-service.production.svc.cluster.local
+```
+
+**Within same namespace:**
+```bash
+curl http://backend-service
+```
+
+**Headless Service (returns pod IPs):**
+```
+<pod-name>.<service-name>.<namespace>.svc.cluster.local
+```
+
+### Environment Variables
+
+Kubernetes injects service info into pods:
+
+```bash
+# Service host and port
+BACKEND_SERVICE_SERVICE_HOST=10.0.0.100
+BACKEND_SERVICE_SERVICE_PORT=80
+
+# For named ports
+BACKEND_SERVICE_SERVICE_PORT_HTTP=80
+```
+
+**Note:** Pods must be created after the service for env vars to be injected.
+
+## Load Balancing
+
+### Algorithms
+
+Kube-proxy selects backend pods effectively at random by default. 
For advanced load balancing: + +**Service Mesh (Istio example):** +```yaml +apiVersion: networking.istio.io/v1beta1 +kind: DestinationRule +metadata: + name: my-destination-rule +spec: + host: my-service + trafficPolicy: + loadBalancer: + simple: LEAST_REQUEST # or ROUND_ROBIN, RANDOM, PASSTHROUGH + connectionPool: + tcp: + maxConnections: 100 +``` + +### Connection Limits + +Use pod disruption budgets and resource limits: + +```yaml +apiVersion: policy/v1 +kind: PodDisruptionBudget +metadata: + name: my-app-pdb +spec: + minAvailable: 2 + selector: + matchLabels: + app: my-app +``` + +## Service Mesh Integration + +### Istio Virtual Service + +```yaml +apiVersion: networking.istio.io/v1beta1 +kind: VirtualService +metadata: + name: my-service +spec: + hosts: + - my-service + http: + - match: + - headers: + version: + exact: v2 + route: + - destination: + host: my-service + subset: v2 + - route: + - destination: + host: my-service + subset: v1 + weight: 90 + - destination: + host: my-service + subset: v2 + weight: 10 +``` + +## Common Patterns + +### Pattern 1: Internal Microservice + +```yaml +apiVersion: v1 +kind: Service +metadata: + name: user-service + namespace: backend + labels: + app: user-service + tier: backend +spec: + type: ClusterIP + selector: + app: user-service + ports: + - name: http + port: 8080 + targetPort: http + protocol: TCP + - name: grpc + port: 9090 + targetPort: grpc + protocol: TCP +``` + +### Pattern 2: Public API with Load Balancer + +```yaml +apiVersion: v1 +kind: Service +metadata: + name: api-gateway + annotations: + service.beta.kubernetes.io/aws-load-balancer-type: "nlb" + service.beta.kubernetes.io/aws-load-balancer-ssl-cert: "arn:aws:acm:..." 
+spec: + type: LoadBalancer + externalTrafficPolicy: Local + selector: + app: api-gateway + ports: + - name: https + port: 443 + targetPort: 8443 + protocol: TCP + loadBalancerSourceRanges: + - 0.0.0.0/0 +``` + +### Pattern 3: StatefulSet with Headless Service + +```yaml +apiVersion: v1 +kind: Service +metadata: + name: cassandra +spec: + clusterIP: None + selector: + app: cassandra + ports: + - port: 9042 + targetPort: 9042 +--- +apiVersion: apps/v1 +kind: StatefulSet +metadata: + name: cassandra +spec: + serviceName: cassandra + replicas: 3 + selector: + matchLabels: + app: cassandra + template: + metadata: + labels: + app: cassandra + spec: + containers: + - name: cassandra + image: cassandra:4.0 +``` + +### Pattern 4: External Service Mapping + +```yaml +apiVersion: v1 +kind: Service +metadata: + name: external-database +spec: + type: ExternalName + externalName: prod-db.cxyz.us-west-2.rds.amazonaws.com +--- +# Or with Endpoints for IP-based external service +apiVersion: v1 +kind: Service +metadata: + name: external-api +spec: + ports: + - port: 443 + targetPort: 443 + protocol: TCP +--- +apiVersion: v1 +kind: Endpoints +metadata: + name: external-api +subsets: +- addresses: + - ip: 203.0.113.100 + ports: + - port: 443 +``` + +### Pattern 5: Multi-Port Service with Metrics + +```yaml +apiVersion: v1 +kind: Service +metadata: + name: web-app + annotations: + prometheus.io/scrape: "true" + prometheus.io/port: "9090" + prometheus.io/path: "/metrics" +spec: + type: ClusterIP + selector: + app: web-app + ports: + - name: http + port: 80 + targetPort: 8080 + - name: metrics + port: 9090 + targetPort: 9090 +``` + +## Network Policies + +Control traffic to services: + +```yaml +apiVersion: networking.k8s.io/v1 +kind: NetworkPolicy +metadata: + name: allow-frontend-to-backend +spec: + podSelector: + matchLabels: + app: backend + policyTypes: + - Ingress + ingress: + - from: + - podSelector: + matchLabels: + app: frontend + ports: + - protocol: TCP + port: 8080 +``` + 
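Both Services and NetworkPolicies pick out Pods with `matchLabels`, and the semantics are subset matching: every key/value pair in the selector must appear in the pod's labels, while extra pod labels are ignored, and an empty selector matches every pod (which is why `podSelector: {}` in a default-deny policy covers the whole namespace). A minimal Python sketch of the matching rule (illustrative only, not Kubernetes source):

```python
def matches(selector: dict, pod_labels: dict) -> bool:
    """matchLabels semantics: a selector matches a pod when every
    key/value pair in the selector is present, with the same value,
    in the pod's labels. An empty selector matches every pod."""
    return all(pod_labels.get(key) == value for key, value in selector.items())

# The NetworkPolicy above selects pods labeled app=backend;
# extra labels on the pod do not prevent a match.
assert matches({"app": "backend"}, {"app": "backend", "tier": "api"})
assert not matches({"app": "backend"}, {"app": "frontend"})
# podSelector: {} (as in default-deny) matches all pods.
assert matches({}, {"any": "labels"})
```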
+
+## Best Practices
+
+### Service Configuration
+
+1. **Use named ports** for flexibility
+2. **Set appropriate service type** based on exposure needs
+3. **Use labels and selectors consistently** across Deployments and Services
+4. **Configure session affinity** for stateful apps
+5. **Set external traffic policy to Local** for IP preservation
+6. **Use headless services** for StatefulSets
+7. **Implement network policies** for security
+8. **Add monitoring annotations** for observability
+
+### Production Checklist
+
+- [ ] Service type appropriate for use case
+- [ ] Selector matches pod labels
+- [ ] Named ports used for clarity
+- [ ] Session affinity configured if needed
+- [ ] Traffic policy set appropriately
+- [ ] Load balancer annotations configured (if applicable)
+- [ ] Source IP ranges restricted (for public services)
+- [ ] Health check configuration validated
+- [ ] Monitoring annotations added
+- [ ] Network policies defined
+
+### Performance Tuning
+
+**For high traffic:**
+```yaml
+spec:
+  externalTrafficPolicy: Local
+  sessionAffinity: ClientIP
+  sessionAffinityConfig:
+    clientIP:
+      timeoutSeconds: 3600
+```
+
+**For WebSocket/long connections:**
+```yaml
+spec:
+  sessionAffinity: ClientIP
+  sessionAffinityConfig:
+    clientIP:
+      timeoutSeconds: 86400  # 24 hours
+```
+
+## Troubleshooting
+
+### Service not accessible
+
+```bash
+# Check service exists
+kubectl get service <service-name>
+
+# Check endpoints (should show pod IPs)
+kubectl get endpoints <service-name>
+
+# Describe service
+kubectl describe service <service-name>
+
+# Check if pods match selector
+kubectl get pods -l app=<app-label>
+```
+
+**Common issues:**
+- Selector doesn't match pod labels
+- No pods running (endpoints empty)
+- Ports misconfigured
+- Network policy blocking traffic
+
+### DNS resolution failing
+
+```bash
+# Test DNS from pod
+kubectl run debug --rm -it --image=busybox -- nslookup <service-name>
+
+# Check CoreDNS
+kubectl get pods -n kube-system -l k8s-app=kube-dns
+kubectl logs -n kube-system -l k8s-app=kube-dns
+```
+
+### 
Load balancer issues

```bash
# Check load balancer status
kubectl describe service <service-name>

# Check events
kubectl get events --sort-by='.lastTimestamp'

# Verify cloud provider configuration
kubectl describe node
```

## Related Resources

- [Kubernetes Service API Reference](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#service-v1-core)
- [Service Networking](https://kubernetes.io/docs/concepts/services-networking/service/)
- [DNS for Services and Pods](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/)
diff --git a/skills/k8s-security-policies/SKILL.md b/skills/k8s-security-policies/SKILL.md
new file mode 100644
index 0000000..1e37550
--- /dev/null
+++ b/skills/k8s-security-policies/SKILL.md
@@ -0,0 +1,334 @@
+---
+name: k8s-security-policies
+description: Implement Kubernetes security policies including NetworkPolicy, Pod Security Standards, and RBAC for production-grade security. Use when securing Kubernetes clusters, implementing network isolation, or enforcing pod security standards.
+---
+
+# Kubernetes Security Policies
+
+Comprehensive guide for implementing NetworkPolicy, Pod Security Standards, and RBAC in Kubernetes.
+
+## Purpose
+
+Implement defense-in-depth security for Kubernetes clusters using network policies, pod security standards, and RBAC.
+
+## When to Use This Skill
+
+- Implement network segmentation
+- Configure pod security standards
+- Set up RBAC for least-privilege access
+- Create security policies for compliance
+- Implement admission control
+- Secure multi-tenant clusters
+
+## Pod Security Standards
+
+### 1. Privileged (Unrestricted)
+```yaml
+apiVersion: v1
+kind: Namespace
+metadata:
+  name: privileged-ns
+  labels:
+    pod-security.kubernetes.io/enforce: privileged
+    pod-security.kubernetes.io/audit: privileged
+    pod-security.kubernetes.io/warn: privileged
+```
+
+### 2. 
Baseline (Minimally restrictive)
+```yaml
+apiVersion: v1
+kind: Namespace
+metadata:
+  name: baseline-ns
+  labels:
+    pod-security.kubernetes.io/enforce: baseline
+    pod-security.kubernetes.io/audit: baseline
+    pod-security.kubernetes.io/warn: baseline
+```
+
+### 3. Restricted (Most restrictive)
+```yaml
+apiVersion: v1
+kind: Namespace
+metadata:
+  name: restricted-ns
+  labels:
+    pod-security.kubernetes.io/enforce: restricted
+    pod-security.kubernetes.io/audit: restricted
+    pod-security.kubernetes.io/warn: restricted
+```
+
+## Network Policies
+
+### Default Deny All
+```yaml
+apiVersion: networking.k8s.io/v1
+kind: NetworkPolicy
+metadata:
+  name: default-deny-all
+  namespace: production
+spec:
+  podSelector: {}
+  policyTypes:
+  - Ingress
+  - Egress
+```
+
+### Allow Frontend to Backend
+```yaml
+apiVersion: networking.k8s.io/v1
+kind: NetworkPolicy
+metadata:
+  name: allow-frontend-to-backend
+  namespace: production
+spec:
+  podSelector:
+    matchLabels:
+      app: backend
+  policyTypes:
+  - Ingress
+  ingress:
+  - from:
+    - podSelector:
+        matchLabels:
+          app: frontend
+    ports:
+    - protocol: TCP
+      port: 8080
+```
+
+### Allow DNS
+```yaml
+apiVersion: networking.k8s.io/v1
+kind: NetworkPolicy
+metadata:
+  name: allow-dns
+  namespace: production
+spec:
+  podSelector: {}
+  policyTypes:
+  - Egress
+  egress:
+  - to:
+    - namespaceSelector:
+        matchLabels:
+          kubernetes.io/metadata.name: kube-system  # label set automatically on every namespace
+    ports:
+    - protocol: UDP
+      port: 53
+```
+
+**Reference:** See `assets/network-policy-template.yaml`
+
+## RBAC Configuration
+
+### Role (Namespace-scoped)
+```yaml
+apiVersion: rbac.authorization.k8s.io/v1
+kind: Role
+metadata:
+  name: pod-reader
+  namespace: production
+rules:
+- apiGroups: [""]
+  resources: ["pods"]
+  verbs: ["get", "watch", "list"]
+```
+
+### ClusterRole (Cluster-wide)
+```yaml
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRole
+metadata:
+  name: secret-reader
+rules:
+- apiGroups: [""]
+  resources: ["secrets"]
+  verbs: ["get", "watch", "list"]
+```
+
+### RoleBinding 
+```yaml +apiVersion: rbac.authorization.k8s.io/v1 +kind: RoleBinding +metadata: + name: read-pods + namespace: production +subjects: +- kind: User + name: jane + apiGroup: rbac.authorization.k8s.io +- kind: ServiceAccount + name: default + namespace: production +roleRef: + kind: Role + name: pod-reader + apiGroup: rbac.authorization.k8s.io +``` + +**Reference:** See `references/rbac-patterns.md` + +## Pod Security Context + +### Restricted Pod +```yaml +apiVersion: v1 +kind: Pod +metadata: + name: secure-pod +spec: + securityContext: + runAsNonRoot: true + runAsUser: 1000 + fsGroup: 1000 + seccompProfile: + type: RuntimeDefault + containers: + - name: app + image: myapp:1.0 + securityContext: + allowPrivilegeEscalation: false + readOnlyRootFilesystem: true + capabilities: + drop: + - ALL +``` + +## Policy Enforcement with OPA Gatekeeper + +### ConstraintTemplate +```yaml +apiVersion: templates.gatekeeper.sh/v1 +kind: ConstraintTemplate +metadata: + name: k8srequiredlabels +spec: + crd: + spec: + names: + kind: K8sRequiredLabels + validation: + openAPIV3Schema: + type: object + properties: + labels: + type: array + items: + type: string + targets: + - target: admission.k8s.gatekeeper.sh + rego: | + package k8srequiredlabels + violation[{"msg": msg, "details": {"missing_labels": missing}}] { + provided := {label | input.review.object.metadata.labels[label]} + required := {label | label := input.parameters.labels[_]} + missing := required - provided + count(missing) > 0 + msg := sprintf("missing required labels: %v", [missing]) + } +``` + +### Constraint +```yaml +apiVersion: constraints.gatekeeper.sh/v1beta1 +kind: K8sRequiredLabels +metadata: + name: require-app-label +spec: + match: + kinds: + - apiGroups: ["apps"] + kinds: ["Deployment"] + parameters: + labels: ["app", "environment"] +``` + +## Service Mesh Security (Istio) + +### PeerAuthentication (mTLS) +```yaml +apiVersion: security.istio.io/v1beta1 +kind: PeerAuthentication +metadata: + name: default + 
namespace: production +spec: + mtls: + mode: STRICT +``` + +### AuthorizationPolicy +```yaml +apiVersion: security.istio.io/v1beta1 +kind: AuthorizationPolicy +metadata: + name: allow-frontend + namespace: production +spec: + selector: + matchLabels: + app: backend + action: ALLOW + rules: + - from: + - source: + principals: ["cluster.local/ns/production/sa/frontend"] +``` + +## Best Practices + +1. **Implement Pod Security Standards** at namespace level +2. **Use Network Policies** for network segmentation +3. **Apply least-privilege RBAC** for all service accounts +4. **Enable admission control** (OPA Gatekeeper/Kyverno) +5. **Run containers as non-root** +6. **Use read-only root filesystem** +7. **Drop all capabilities** unless needed +8. **Implement resource quotas** and limit ranges +9. **Enable audit logging** for security events +10. **Regular security scanning** of images + +## Compliance Frameworks + +### CIS Kubernetes Benchmark +- Use RBAC authorization +- Enable audit logging +- Use Pod Security Standards +- Configure network policies +- Implement secrets encryption at rest +- Enable node authentication + +### NIST Cybersecurity Framework +- Implement defense in depth +- Use network segmentation +- Configure security monitoring +- Implement access controls +- Enable logging and monitoring + +## Troubleshooting + +**NetworkPolicy not working:** +```bash +# Check if CNI supports NetworkPolicy +kubectl get nodes -o wide +kubectl describe networkpolicy +``` + +**RBAC permission denied:** +```bash +# Check effective permissions +kubectl auth can-i list pods --as system:serviceaccount:default:my-sa +kubectl auth can-i '*' '*' --as system:serviceaccount:default:my-sa +``` + +## Reference Files + +- `assets/network-policy-template.yaml` - Network policy examples +- `assets/pod-security-template.yaml` - Pod security policies +- `references/rbac-patterns.md` - RBAC configuration patterns + +## Related Skills + +- `k8s-manifest-generator` - For creating secure 
manifests +- `gitops-workflow` - For automated policy deployment diff --git a/skills/k8s-security-policies/assets/network-policy-template.yaml b/skills/k8s-security-policies/assets/network-policy-template.yaml new file mode 100644 index 0000000..218da0c --- /dev/null +++ b/skills/k8s-security-policies/assets/network-policy-template.yaml @@ -0,0 +1,177 @@ +# Network Policy Templates + +--- +# Template 1: Default Deny All (Start Here) +apiVersion: networking.k8s.io/v1 +kind: NetworkPolicy +metadata: + name: default-deny-all + namespace: +spec: + podSelector: {} + policyTypes: + - Ingress + - Egress + +--- +# Template 2: Allow DNS (Essential) +apiVersion: networking.k8s.io/v1 +kind: NetworkPolicy +metadata: + name: allow-dns + namespace: +spec: + podSelector: {} + policyTypes: + - Egress + egress: + - to: + - namespaceSelector: + matchLabels: + name: kube-system + ports: + - protocol: UDP + port: 53 + +--- +# Template 3: Frontend to Backend +apiVersion: networking.k8s.io/v1 +kind: NetworkPolicy +metadata: + name: allow-frontend-to-backend + namespace: +spec: + podSelector: + matchLabels: + app: backend + tier: backend + policyTypes: + - Ingress + ingress: + - from: + - podSelector: + matchLabels: + app: frontend + tier: frontend + ports: + - protocol: TCP + port: 8080 + - protocol: TCP + port: 9090 + +--- +# Template 4: Allow Ingress Controller +apiVersion: networking.k8s.io/v1 +kind: NetworkPolicy +metadata: + name: allow-ingress-controller + namespace: +spec: + podSelector: + matchLabels: + app: web + policyTypes: + - Ingress + ingress: + - from: + - namespaceSelector: + matchLabels: + name: ingress-nginx + ports: + - protocol: TCP + port: 80 + - protocol: TCP + port: 443 + +--- +# Template 5: Allow Monitoring (Prometheus) +apiVersion: networking.k8s.io/v1 +kind: NetworkPolicy +metadata: + name: allow-prometheus-scraping + namespace: +spec: + podSelector: + matchLabels: + prometheus.io/scrape: "true" + policyTypes: + - Ingress + ingress: + - from: + - 
namespaceSelector: + matchLabels: + name: monitoring + ports: + - protocol: TCP + port: 9090 + +--- +# Template 6: Allow External HTTPS +apiVersion: networking.k8s.io/v1 +kind: NetworkPolicy +metadata: + name: allow-external-https + namespace: +spec: + podSelector: + matchLabels: + app: api-client + policyTypes: + - Egress + egress: + - to: + - ipBlock: + cidr: 0.0.0.0/0 + except: + - 169.254.169.254/32 # Block metadata service + ports: + - protocol: TCP + port: 443 + +--- +# Template 7: Database Access +apiVersion: networking.k8s.io/v1 +kind: NetworkPolicy +metadata: + name: allow-app-to-database + namespace: +spec: + podSelector: + matchLabels: + app: postgres + tier: database + policyTypes: + - Ingress + ingress: + - from: + - podSelector: + matchLabels: + tier: backend + ports: + - protocol: TCP + port: 5432 + +--- +# Template 8: Cross-Namespace Communication +apiVersion: networking.k8s.io/v1 +kind: NetworkPolicy +metadata: + name: allow-from-prod-namespace + namespace: +spec: + podSelector: + matchLabels: + app: api + policyTypes: + - Ingress + ingress: + - from: + - namespaceSelector: + matchLabels: + environment: production + podSelector: + matchLabels: + app: frontend + ports: + - protocol: TCP + port: 8080 diff --git a/skills/k8s-security-policies/references/rbac-patterns.md b/skills/k8s-security-policies/references/rbac-patterns.md new file mode 100644 index 0000000..11269c7 --- /dev/null +++ b/skills/k8s-security-policies/references/rbac-patterns.md @@ -0,0 +1,187 @@ +# RBAC Patterns and Best Practices + +## Common RBAC Patterns + +### Pattern 1: Read-Only Access +```yaml +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRole +metadata: + name: read-only +rules: +- apiGroups: ["", "apps", "batch"] + resources: ["*"] + verbs: ["get", "list", "watch"] +``` + +### Pattern 2: Namespace Admin +```yaml +apiVersion: rbac.authorization.k8s.io/v1 +kind: Role +metadata: + name: namespace-admin + namespace: production +rules: +- apiGroups: ["", "apps", 
"batch", "extensions"] + resources: ["*"] + verbs: ["*"] +``` + +### Pattern 3: Deployment Manager +```yaml +apiVersion: rbac.authorization.k8s.io/v1 +kind: Role +metadata: + name: deployment-manager + namespace: production +rules: +- apiGroups: ["apps"] + resources: ["deployments"] + verbs: ["get", "list", "watch", "create", "update", "patch", "delete"] +- apiGroups: [""] + resources: ["pods"] + verbs: ["get", "list", "watch"] +``` + +### Pattern 4: Secret Reader (ServiceAccount) +```yaml +apiVersion: rbac.authorization.k8s.io/v1 +kind: Role +metadata: + name: secret-reader + namespace: production +rules: +- apiGroups: [""] + resources: ["secrets"] + verbs: ["get"] + resourceNames: ["app-secrets"] # Specific secret only +--- +apiVersion: rbac.authorization.k8s.io/v1 +kind: RoleBinding +metadata: + name: app-secret-reader + namespace: production +subjects: +- kind: ServiceAccount + name: my-app + namespace: production +roleRef: + kind: Role + name: secret-reader + apiGroup: rbac.authorization.k8s.io +``` + +### Pattern 5: CI/CD Pipeline Access +```yaml +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRole +metadata: + name: cicd-deployer +rules: +- apiGroups: ["apps"] + resources: ["deployments", "replicasets"] + verbs: ["get", "list", "create", "update", "patch"] +- apiGroups: [""] + resources: ["services", "configmaps"] + verbs: ["get", "list", "create", "update", "patch"] +- apiGroups: [""] + resources: ["pods"] + verbs: ["get", "list"] +``` + +## ServiceAccount Best Practices + +### Create Dedicated ServiceAccounts +```yaml +apiVersion: v1 +kind: ServiceAccount +metadata: + name: my-app + namespace: production +--- +apiVersion: apps/v1 +kind: Deployment +metadata: + name: my-app +spec: + template: + spec: + serviceAccountName: my-app + automountServiceAccountToken: false # Disable if not needed +``` + +### Least-Privilege ServiceAccount +```yaml +apiVersion: rbac.authorization.k8s.io/v1 +kind: Role +metadata: + name: my-app-role + namespace: production 
+rules: +- apiGroups: [""] + resources: ["configmaps"] + verbs: ["get"] + resourceNames: ["my-app-config"] +``` + +## Security Best Practices + +1. **Use Roles over ClusterRoles** when possible +2. **Specify resourceNames** for fine-grained access +3. **Avoid wildcard permissions** (`*`) in production +4. **Create dedicated ServiceAccounts** for each app +5. **Disable token auto-mounting** if not needed +6. **Regular RBAC audits** to remove unused permissions +7. **Use groups** for user management +8. **Implement namespace isolation** +9. **Monitor RBAC usage** with audit logs +10. **Document role purposes** in metadata + +## Troubleshooting RBAC + +### Check User Permissions +```bash +kubectl auth can-i list pods --as john@example.com +kubectl auth can-i '*' '*' --as system:serviceaccount:default:my-app +``` + +### View Effective Permissions +```bash +kubectl describe clusterrole cluster-admin +kubectl describe rolebinding -n production +``` + +### Debug Access Issues +```bash +kubectl get rolebindings,clusterrolebindings --all-namespaces -o wide | grep my-user +``` + +## Common RBAC Verbs + +- `get` - Read a specific resource +- `list` - List all resources of a type +- `watch` - Watch for resource changes +- `create` - Create new resources +- `update` - Update existing resources +- `patch` - Partially update resources +- `delete` - Delete resources +- `deletecollection` - Delete multiple resources +- `*` - All verbs (avoid in production) + +## Resource Scope + +### Cluster-Scoped Resources +- Nodes +- PersistentVolumes +- ClusterRoles +- ClusterRoleBindings +- Namespaces + +### Namespace-Scoped Resources +- Pods +- Services +- Deployments +- ConfigMaps +- Secrets +- Roles +- RoleBindings
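The verb, resource, and apiGroup lists in the rules above combine with simple wildcard semantics: a request is allowed if some rule covers its apiGroup, its resource, and its verb, where `"*"` covers everything. A hedged Python sketch of that evaluation (illustrative only; the real RBAC authorizer also checks resourceNames, subresources, and nonResourceURLs):

```python
# Sketch of a single PolicyRule check; not the actual Kubernetes authorizer.
def rule_allows(rule: dict, api_group: str, resource: str, verb: str) -> bool:
    def covered(allowed: list, value: str) -> bool:
        # "*" in any list grants that whole dimension.
        return "*" in allowed or value in allowed
    return (covered(rule["apiGroups"], api_group)
            and covered(rule["resources"], resource)
            and covered(rule["verbs"], verb))

# The pod-reader Role from the patterns above:
pod_reader = {"apiGroups": [""], "resources": ["pods"], "verbs": ["get", "list", "watch"]}
assert rule_allows(pod_reader, "", "pods", "list")
assert not rule_allows(pod_reader, "", "pods", "delete")          # verb not granted
assert not rule_allows(pod_reader, "apps", "deployments", "get")  # wrong group/resource
```

This is also why wildcard rules are discouraged in production: a single `"*"` entry silently covers every current and future value in that dimension.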