# Helm Chart Best Practices

CNCF and Helm community standards for production-ready charts.

## Chart Metadata Standards

### Chart.yaml Requirements

- `apiVersion: v2` (Helm 3)
- Semantic versioning (version, appVersion)
- Meaningful description
- Keywords for discoverability
- Maintainer information

### Naming Conventions

- Chart names: lowercase, hyphens (no underscores)
- Resource names: `{{ include "chart.fullname" . }}`
- Avoid hardcoding names

## Kubernetes Label Standards

**Required labels (app.kubernetes.io/* namespace):**

```yaml
labels:
  app.kubernetes.io/name: {{ include "chart.name" . }}
  app.kubernetes.io/instance: {{ .Release.Name }}
  app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
  app.kubernetes.io/managed-by: {{ .Release.Service }}
  helm.sh/chart: {{ include "chart.chart" . }}
```

**Selector labels (immutable after creation, so keep them stable across upgrades):**

```yaml
selector:
  matchLabels:
    app.kubernetes.io/name: {{ include "chart.name" . }}
    app.kubernetes.io/instance: {{ .Release.Name }}
```

## Security Best Practices

### Pod Security Context

```yaml
podSecurityContext:
  runAsNonRoot: true
  runAsUser: 1000
  fsGroup: 1000
  seccompProfile:
    type: RuntimeDefault  # Production
```

### Container Security Context

```yaml
securityContext:
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  runAsNonRoot: true
  capabilities:
    drop:
      - ALL
```

### Security Guidelines

- Never run as root (UID 0)
- Drop all Linux capabilities by default
- Use a read-only root filesystem when possible
- Apply seccomp profiles in production
- Avoid privileged containers
- Don't expose host ports or namespaces

## Resource Management

### Always Define Resources

```yaml
resources:
  limits:
    cpu: 500m
    memory: 256Mi
  requests:
    cpu: 50m
    memory: 64Mi
```

### Resource Sizing Guidelines

- **Small apps**: 50m CPU / 64Mi memory (requests)
- **Medium apps**: 100m CPU / 128Mi memory (requests)
- **Large apps**: 250m+ CPU / 256Mi+ memory (requests)
- Limits should be 2-10x requests
- Monitor and adjust based on actual usage

## Health Checks

### Liveness Probe

Detects when a container needs a restart:

```yaml
livenessProbe:
  httpGet:
    path: /health
    port: http
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3
```

### Readiness Probe

Detects when a container can accept traffic:

```yaml
readinessProbe:
  httpGet:
    path: /ready
    port: http
  initialDelaySeconds: 5
  periodSeconds: 5
  timeoutSeconds: 3
  failureThreshold: 3
```

### Probe Best Practices

- Always define both liveness and readiness probes
- Use an appropriate `initialDelaySeconds` for slow-starting apps
- Health endpoints should be lightweight
- Don't use the same endpoint for liveness and readiness if startup is slow

## Values.yaml Organization

### Structure

```yaml
# 1. Replica configuration
replicaCount: 1

# 2. Image configuration
image:
  repository: example/app
  pullPolicy: IfNotPresent
  tag: ""  # Defaults to .Chart.AppVersion

# 3. Service account
serviceAccount:
  create: true
  name: ""

# 4. Security contexts
podSecurityContext: {}
securityContext: {}

# 5. Service configuration
service:
  type: ClusterIP
  port: 80

# 6. Resources
resources: {}

# 7. Autoscaling
autoscaling:
  enabled: false

# 8. Additional features (Ingress, ConfigMaps, etc.)
```

### Documentation

- Comment every major section
- Provide examples for complex values
- Document accepted value types
- Explain default behavior

## Template Best Practices

### Use Helper Functions

```yaml
# _helpers.tpl
{{- define "chart.fullname" -}}
{{- printf "%s-%s" .Release.Name .Chart.Name | trunc 63 | trimSuffix "-" }}
{{- end }}
```

### Conditional Resources

```yaml
{{- if .Values.ingress.enabled -}}
apiVersion: networking.k8s.io/v1
kind: Ingress
...
{{- end }}
```

### Checksum Annotations

Force a pod restart when the ConfigMap changes:

```yaml
annotations:
  checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}
```

## NOTES.txt Guidelines

Provide clear post-installation instructions:

```
1. How to access the application
2. Default credentials (if any)
3. Next steps for configuration
4. Links to documentation
5. Troubleshooting commands
```

## Multi-Environment Patterns

### Base + Override Pattern

- `values.yaml`: Base defaults
- `values-dev.yaml`: Development overrides
- `values-prod.yaml`: Production overrides

### Environment-Specific Settings

- **Dev**: Debug enabled, minimal resources, verbose logging
- **Staging**: Production-like, moderate resources
- **Prod**: HA, autoscaling, security hardening, monitoring

## Common Pitfalls to Avoid

❌ **Don't:**

- Hardcode values in templates
- Forget resource limits
- Run containers as root
- Skip health checks
- Use the `latest` image tag
- Expose secrets in values.yaml
- Create resources without labels
- Ignore security contexts

✅ **Do:**

- Use template functions
- Define resource requests and limits
- Use non-root users
- Configure probes
- Pin specific versions
- Reference external secrets
- Apply standard labels
- Enable security contexts

## Testing Checklist

Before deploying:

- [ ] `helm lint` passes
- [ ] `helm template` renders correctly
- [ ] All required labels present
- [ ] Security contexts configured
- [ ] Resource limits defined
- [ ] Health checks configured
- [ ] NOTES.txt provides clear instructions
- [ ] README documents all values
- [ ] Dry run succeeds
- [ ] Test deployment in dev environment

## Validation Commands

```bash
# Lint chart
helm lint .

# Template rendering
helm template myrelease .

# Dry run
helm install myrelease . --dry-run --debug

# Install to test namespace
kubectl create ns test
helm install myrelease . -n test

# Verify
kubectl get all -n test
helm test myrelease -n test

# Cleanup
helm uninstall myrelease -n test
kubectl delete ns test
```

## References

- [Helm Best Practices](https://helm.sh/docs/chart_best_practices/)
- [Kubernetes Labels](https://kubernetes.io/docs/concepts/overview/working-with-objects/common-labels/)
- [CNCF Security Whitepaper](https://github.com/cncf/tag-security)
- [Pod Security Standards](https://kubernetes.io/docs/concepts/security/pod-security-standards/)
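
## Example: Production Overrides

The Base + Override pattern and the production settings described above can be sketched as a single override file. This is a minimal illustration, not a definitive configuration: the image tag is a made-up pinned version, and `autoscaling.minReplicas` and `autoscaling.maxReplicas` are assumed keys (they match the `helm create` scaffold, but the chart's templates must actually read them). All other keys come from the values.yaml structure shown earlier.

```yaml
# values-prod.yaml (hypothetical production overrides, merged over values.yaml)
replicaCount: 3

image:
  tag: "1.4.2"  # hypothetical pinned version; avoid "latest"

podSecurityContext:
  runAsNonRoot: true
  runAsUser: 1000
  fsGroup: 1000
  seccompProfile:
    type: RuntimeDefault

securityContext:
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  runAsNonRoot: true
  capabilities:
    drop:
      - ALL

resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 256Mi

autoscaling:
  enabled: true
  minReplicas: 3   # assumed key; only effective if the HPA template uses it
  maxReplicas: 10  # assumed key; only effective if the HPA template uses it
```

Helm loads the chart's own `values.yaml` automatically, so passing `-f values-prod.yaml` to `helm install` or `helm upgrade` layers these overrides on top of the base defaults. Keys not listed here keep their values from `values.yaml`.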