2025-11-30 09:05:12 +08:00

# Helm Chart Best Practices
CNCF and Helm community standards for production-ready charts.
## Chart Metadata Standards
### Chart.yaml Requirements
- `apiVersion: v2` (Helm 3)
- Semantic versioning for `version` (required); `appVersion` records the version of the packaged application
- Meaningful description
- Keywords for discoverability
- Maintainer information
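A minimal `Chart.yaml` that covers the requirements above might look like the following sketch. The chart name, description, versions, keywords, and maintainer details are placeholders, not values from any real chart.

```yaml
# Chart.yaml (illustrative; every value below is a placeholder)
apiVersion: v2                 # Helm 3 chart format
name: my-app                   # lowercase, hyphen-separated
description: A Helm chart that deploys the my-app web service
type: application
version: 0.1.0                 # chart version, must be SemVer
appVersion: "1.0.0"            # version of the application being deployed
keywords:
  - web
  - example
maintainers:
  - name: Example Maintainer
    email: maintainer@example.com
```

Quoting `appVersion` keeps YAML from reading values such as `1.0` as a number.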
### Naming Conventions
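- Chart names: lowercase letters and digits, words separated by hyphens (no underscores or dots)
- Resource names: `{{ include "chart.fullname" . }}` (prefer `include` over `template` so the output can be piped to other functions)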
- Avoid hardcoding names
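As a sketch of what this looks like in a template, the Service below derives its name from a helper instead of a literal string. The helper name `chart.fullname` and the `service.*` values are assumptions; substitute whatever your chart's `_helpers.tpl` and `values.yaml` actually define.

```yaml
# templates/service.yaml (excerpt; helper and value names are assumed)
apiVersion: v1
kind: Service
metadata:
  # Derived from the release and chart names, so two releases of the
  # same chart in one namespace do not collide.
  name: {{ include "chart.fullname" . }}
spec:
  type: {{ .Values.service.type }}
  ports:
    - port: {{ .Values.service.port }}
      targetPort: http
```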
## Kubernetes Label Standards
**Required labels (app.kubernetes.io/* namespace):**
```yaml
labels:
  app.kubernetes.io/name: {{ include "chart.name" . }}
  app.kubernetes.io/instance: {{ .Release.Name }}
  app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
  app.kubernetes.io/managed-by: {{ .Release.Service }}
  helm.sh/chart: {{ include "chart.chart" . }}
```
**Selector labels (immutable once the workload is created, so use only values that never change between upgrades):**
```yaml
selector:
  matchLabels:
    app.kubernetes.io/name: {{ include "chart.name" . }}
    app.kubernetes.io/instance: {{ .Release.Name }}
```
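To avoid repeating these label sets in every template, they can be collected into named helpers. The sketch below follows the layout that `helm create` scaffolds; the helper names `chart.labels` and `chart.selectorLabels` are assumptions, and `chart.name` and `chart.chart` are expected to be defined elsewhere in `_helpers.tpl`.

```yaml
{{/* templates/_helpers.tpl (excerpt; helper names are assumed) */}}
{{- define "chart.selectorLabels" -}}
app.kubernetes.io/name: {{ include "chart.name" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
{{- end }}

{{- define "chart.labels" -}}
helm.sh/chart: {{ include "chart.chart" . }}
{{ include "chart.selectorLabels" . }}
app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
{{- end }}

# templates/deployment.yaml (excerpt) showing how the helpers are consumed
metadata:
  labels:
    {{- include "chart.labels" . | nindent 4 }}
spec:
  selector:
    matchLabels:
      {{- include "chart.selectorLabels" . | nindent 6 }}
```

Keeping the version and chart labels out of the selector helper matters: those values change on upgrade, and a changed selector would be rejected.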
## Security Best Practices
### Pod Security Context
```yaml
podSecurityContext:
  runAsNonRoot: true
  runAsUser: 1000
  fsGroup: 1000
  seccompProfile:
    type: RuntimeDefault # Production
```
### Container Security Context
```yaml
securityContext:
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  runAsNonRoot: true
  capabilities:
    drop:
      - ALL
```
### Security Guidelines
- Never run as root (UID 0)
- Drop all Linux capabilities by default
- Use read-only root filesystem when possible
- Apply seccomp profiles in production
- Avoid privileged containers
- Don't expose host ports or namespaces
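The two value blocks above only take effect once a template renders them. One common way to wire them in, sketched here under the assumption that the values are named `podSecurityContext` and `securityContext` as in this document, is to pass them through `toYaml`:

```yaml
# templates/deployment.yaml (excerpt; structure is illustrative)
spec:
  template:
    spec:
      {{- with .Values.podSecurityContext }}
      securityContext:            # pod-level settings
        {{- toYaml . | nindent 8 }}
      {{- end }}
      containers:
        - name: {{ .Chart.Name }}
          {{- with .Values.securityContext }}
          securityContext:        # container-level settings
            {{- toYaml . | nindent 12 }}
          {{- end }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
```

The `with` blocks omit the key entirely when the value is empty, so users who leave the defaults at `{}` get no `securityContext` field rather than an empty one.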
## Resource Management
### Always Define Resources
```yaml
resources:
  limits:
    cpu: 500m
    memory: 256Mi
  requests:
    cpu: 50m
    memory: 64Mi
```
### Resource Sizing Guidelines
- **Small apps**: 50m CPU / 64Mi memory (requests)
- **Medium apps**: 100m CPU / 128Mi memory (requests)
- **Large apps**: 250m+ CPU / 256Mi+ memory (requests)
- Limits should be 2-10x requests
- Monitor and adjust based on actual usage
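Applied to a "medium" app, the guidelines above could translate into values like these. The limits are illustrative picks inside the 2-10x range, not measured numbers; real values should come from observed usage.

```yaml
# values.yaml (excerpt) for a hypothetical medium-sized app
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m       # 5x the CPU request
    memory: 256Mi   # 2x the memory request
```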
## Health Checks
### Liveness Probe
Detects when a container needs to be restarted:
```yaml
livenessProbe:
  httpGet:
    path: /health
    port: http
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3
```
### Readiness Probe
Detects when a container can accept traffic:
```yaml
readinessProbe:
  httpGet:
    path: /ready
    port: http
  initialDelaySeconds: 5
  periodSeconds: 5
  timeoutSeconds: 3
  failureThreshold: 3
```
### Probe Best Practices
- Always define both liveness and readiness probes
- Set `initialDelaySeconds` high enough for slow-starting apps
- Health endpoints should be lightweight
- Don't use the same endpoint for liveness and readiness if startup is slow
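For slow-starting apps, Kubernetes also offers a `startupProbe`, which holds off the liveness and readiness probes until it succeeds. This is an alternative to a large `initialDelaySeconds` and is not something the guidelines above prescribe. A minimal sketch, assuming the same `/health` endpoint and `http` port name used earlier:

```yaml
startupProbe:
  httpGet:
    path: /health
    port: http
  periodSeconds: 10
  failureThreshold: 30   # allows up to 30 x 10s = 300s for startup
```

Once the startup probe passes, the liveness probe can use a short delay and tight thresholds without risking restarts during boot.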
## Values.yaml Organization
### Structure
```yaml
# 1. Replica configuration
replicaCount: 1
# 2. Image configuration
image:
  repository: example/app
  pullPolicy: IfNotPresent
  tag: "" # Defaults to Chart.appVersion
# 3. Service account
serviceAccount:
  create: true
  name: ""
# 4. Security contexts
podSecurityContext: {}
securityContext: {}
# 5. Service configuration
service:
  type: ClusterIP
  port: 80
# 6. Resources
resources: {}
# 7. Autoscaling
autoscaling:
  enabled: false
# 8. Additional features (Ingress, ConfigMaps, etc.)
```
### Documentation
- Comment every major section
- Provide examples for complex values
- Document accepted value types
- Explain default behavior
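A short sketch of what documented values can look like in practice is shown below. The comment wording is illustrative, and the note about `replicaCount` assumes the Deployment template skips `replicas` when autoscaling is on, which your chart may or may not do.

```yaml
# Number of pod replicas (integer).
# Assumed to be ignored by the template when autoscaling.enabled is true.
replicaCount: 1

image:
  # Container image repository (string), without a tag.
  repository: example/app
  # Image pull policy (string): Always, IfNotPresent, or Never.
  pullPolicy: IfNotPresent
  # Image tag (string). When empty, the chart's appVersion is used.
  # Example: tag: "1.4.2"
  tag: ""
```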
## Template Best Practices
### Use Helper Functions
```yaml
# _helpers.tpl
{{- define "app.fullname" -}}
{{- printf "%s-%s" .Release.Name .Chart.Name | trunc 63 | trimSuffix "-" }}
{{- end }}
```
### Conditional Resources
```yaml
{{- if .Values.ingress.enabled -}}
apiVersion: networking.k8s.io/v1
kind: Ingress
...
{{- end }}
```
### Checksum Annotations
Force pod restart on config changes:
```yaml
annotations:
  checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}
```
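The annotation only triggers a rollout when it sits on the pod template, not on the Deployment's own metadata. A sketch of the placement, assuming the chart has a `templates/configmap.yaml`:

```yaml
# templates/deployment.yaml (excerpt)
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    metadata:
      annotations:
        # Changes whenever the rendered ConfigMap changes, which alters
        # the pod template and causes the Deployment to roll its pods.
        checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}
```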
## NOTES.txt Guidelines
Provide clear post-installation instructions:
```
1. How to access the application
2. Default credentials (if any)
3. Next steps for configuration
4. Links to documentation
5. Troubleshooting commands
```
## Multi-Environment Patterns
### Base + Override Pattern
- `values.yaml`: Base defaults
- `values-dev.yaml`: Development overrides
- `values-prod.yaml`: Production overrides
### Environment-Specific Settings
- **Dev**: Debug enabled, minimal resources, verbose logging
- **Staging**: Production-like, moderate resources
- **Prod**: HA, autoscaling, security hardening, monitoring
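An override file should list only the keys that differ from the base. The production sketch below is hypothetical: `autoscaling.minReplicas` and `autoscaling.maxReplicas` are assumed keys, and the numbers are placeholders.

```yaml
# values-prod.yaml (illustrative overrides only)
# Apply on top of values.yaml with:
#   helm upgrade --install myrelease . -f values-prod.yaml
replicaCount: 3
autoscaling:
  enabled: true
  minReplicas: 3      # assumed key
  maxReplicas: 10     # assumed key
resources:
  requests:
    cpu: 250m
    memory: 256Mi
podSecurityContext:
  runAsNonRoot: true
  seccompProfile:
    type: RuntimeDefault
```

Helm always loads the chart's `values.yaml` first, and files passed with `-f` are merged over it, so anything not listed here keeps its base default.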
## Common Pitfalls to Avoid
**Don't:**
- Hardcode values in templates
- Forget resource limits
- Run containers as root
- Skip health checks
- Use `latest` image tag
- Expose secrets in values.yaml
- Create resources without labels
- Ignore security contexts
**Do:**
- Use template functions
- Define all resources
- Use non-root users
- Configure probes
- Pin specific versions
- Reference external secrets
- Apply standard labels
- Enable security contexts
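One way to "reference external secrets" instead of placing them in `values.yaml` is to let users name a Secret that already exists in the cluster. In this sketch the `existingSecret` value, the `DB_PASSWORD` variable, and the `password` key are all hypothetical names.

```yaml
# values.yaml (excerpt)
# Name of a Secret created outside the chart; no secret data lives here.
existingSecret: my-app-credentials

# templates/deployment.yaml (excerpt, container spec)
env:
  - name: DB_PASSWORD
    valueFrom:
      secretKeyRef:
        # `required` fails rendering with this message if the value is unset.
        name: {{ required "existingSecret must be set" .Values.existingSecret }}
        key: password
```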
## Testing Checklist
Before deploying:
- [ ] `helm lint` passes
- [ ] `helm template` renders correctly
- [ ] All required labels present
- [ ] Security contexts configured
- [ ] Resource limits defined
- [ ] Health checks configured
- [ ] NOTES.txt provides clear instructions
- [ ] README documents all values
- [ ] Dry run succeeds
- [ ] Test deployment in dev environment
## Validation Commands
```bash
# Lint chart
helm lint .
# Template rendering
helm template myrelease .
# Dry run
helm install myrelease . --dry-run --debug
# Install to test namespace
kubectl create ns test
helm install myrelease . -n test
# Verify
kubectl get all -n test
helm test myrelease -n test
# Cleanup
helm uninstall myrelease -n test
kubectl delete ns test
```
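`helm test` only runs resources that carry the test hook annotation, so the command above does nothing useful unless the chart ships at least one. A minimal sketch modeled on the `helm create` scaffolding follows; the `chart.fullname` helper and the `busybox:1.36` image tag are assumptions.

```yaml
# templates/tests/test-connection.yaml
apiVersion: v1
kind: Pod
metadata:
  name: "{{ include "chart.fullname" . }}-test-connection"
  annotations:
    "helm.sh/hook": test          # marks this Pod as a helm test
spec:
  restartPolicy: Never
  containers:
    - name: wget
      image: busybox:1.36         # pinned tag, assumed to be available
      command: ['wget']
      # Succeeds if the chart's Service answers on its configured port.
      args: ['{{ include "chart.fullname" . }}:{{ .Values.service.port }}']
```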
## References
- [Helm Best Practices](https://helm.sh/docs/chart_best_practices/)
- [Kubernetes Labels](https://kubernetes.io/docs/concepts/overview/working-with-objects/common-labels/)
- [CNCF Security Whitepaper](https://github.com/cncf/tag-security)
- [Pod Security Standards](https://kubernetes.io/docs/concepts/security/pod-security-standards/)