---
name: docker-best-practices
description: Comprehensive Docker best practices for images, containers, and production deployments
---
## 🚨 CRITICAL GUIDELINES
### Windows File Path Requirements
**MANDATORY: Always Use Backslashes on Windows for File Paths**
When using Edit or Write tools on Windows, you MUST use backslashes (`\`) in file paths, NOT forward slashes (`/`).
**Examples:**
- ❌ WRONG: `D:/repos/project/file.tsx`
- ✅ CORRECT: `D:\repos\project\file.tsx`
This applies to:
- Edit tool file_path parameter
- Write tool file_path parameter
- All file operations on Windows systems
### Documentation Guidelines
**NEVER create new documentation files unless explicitly requested by the user.**
- **Priority**: Update existing README.md files rather than creating new documentation
- **Repository cleanliness**: Keep repository root clean - only README.md unless user requests otherwise
- **Style**: Documentation should be concise, direct, and professional - avoid AI-generated tone
- **User preference**: Only create additional .md files when user specifically asks for documentation
---
# Docker Best Practices
This skill provides current Docker best practices across all aspects of container development, deployment, and operation.
## Image Best Practices
### Base Image Selection
**2025 Recommended Hierarchy:**
1. **Wolfi/Chainguard** (`cgr.dev/chainguard/*`) - Zero-CVE goal, SBOM included
2. **Alpine** (`alpine:3.19`) - ~7MB, minimal attack surface
3. **Distroless** (`gcr.io/distroless/*`) - ~2MB, no shell
4. **Slim variants** (`node:20-slim`) - ~70MB, balanced
**Key rules:**
- Always specify exact version tags, e.g. `node:20.11.0-alpine3.19`; for full reproducibility, pin a digest as sketched below
- Never use `latest` (unpredictable, breaks reproducibility)
- Use official images from trusted registries
- Match base image to actual needs
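Going one step further than version tags, the base image can be pinned by digest so a retagged upstream image cannot silently change your build. A minimal sketch; the digest shown is a placeholder to be replaced with the real value:
```dockerfile
# Resolve the digest once, outside the Dockerfile:
#   docker pull node:20.11.0-alpine3.19
#   docker inspect --format '{{index .RepoDigests 0}}' node:20.11.0-alpine3.19
# Then pin it (the digest below is a placeholder, substitute the real one):
FROM node@sha256:<digest-from-the-command-above>
```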
### Dockerfile Structure
**Optimal layer ordering** (least to most frequently changing):
1. Base image and system dependencies
2. Application dependencies (`package.json`, `requirements.txt`, etc.)
3. Application code
4. Configuration and metadata
**Rationale:** Docker caches layers. If code changes but dependencies don't, cached dependency layers are reused, speeding up builds.
**Example:**
```dockerfile
FROM python:3.12-slim
WORKDIR /app

# 1. System packages (rarely change)
RUN apt-get update && apt-get install -y --no-install-recommends \
        gcc \
    && rm -rf /var/lib/apt/lists/*

# 2. Dependencies (change occasionally)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# 3. Application code (changes frequently)
COPY . .

CMD ["python", "app.py"]
```
### Multi-Stage Builds
Use multi-stage builds to separate build dependencies from runtime:
```dockerfile
# Build stage
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
# Production stage
FROM node:20-alpine AS runtime
WORKDIR /app
# Only copy what's needed for runtime
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
USER node
CMD ["node", "dist/server.js"]
```
**Benefits:**
- Smaller final images (no build tools)
- Better security (fewer attack vectors)
- Faster deployment (smaller upload/download)
### Layer Optimization
**Combine commands** to reduce layers and image size:
```dockerfile
# Bad - 3 layers, cleanup doesn't reduce size
RUN apt-get update
RUN apt-get install -y curl
RUN rm -rf /var/lib/apt/lists/*

# Good - 1 layer, cleanup effective
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl && \
    rm -rf /var/lib/apt/lists/*
```
### .dockerignore
Always create `.dockerignore` to exclude unnecessary files:
```
# Version control
.git
.gitignore
# Dependencies
node_modules
__pycache__
*.pyc
# IDE
.vscode
.idea
# OS
.DS_Store
Thumbs.db
# Logs
*.log
logs/
# Testing
coverage/
.nyc_output
*.test.js
# Documentation
README.md
docs/
# Environment
.env
.env.local
*.local
```
## Container Runtime Best Practices
### Security
```bash
# Run as non-root, drop all capabilities (adding back only what is needed),
# use a read-only root filesystem with a writable tmpfs for /tmp,
# block privilege escalation, and set resource limits.
docker run \
  --user 1000:1000 \
  --cap-drop=ALL \
  --cap-add=NET_BIND_SERVICE \
  --read-only \
  --tmpfs /tmp:noexec,nosuid \
  --security-opt="no-new-privileges:true" \
  --memory="512m" \
  --cpus="1.0" \
  my-image
```
### Resource Management
Always set resource limits in production:
```yaml
# docker-compose.yml
services:
  app:
    deploy:
      resources:
        limits:
          cpus: '2.0'
          memory: 1G
        reservations:
          cpus: '1.0'
          memory: 512M
```
### Health Checks
Implement health checks for all long-running containers:
```dockerfile
HEALTHCHECK --interval=30s --timeout=3s --retries=3 --start-period=40s \
CMD curl -f http://localhost:3000/health || exit 1
```
Or in compose:
```yaml
services:
  app:
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost/health"]
      interval: 30s
      timeout: 3s
      retries: 3
      start_period: 40s
```
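Once a health check is in place, you can verify the container's reported status from the CLI; a quick check, assuming a container named `app`:
```bash
# Show the current health status (starting / healthy / unhealthy)
docker inspect --format '{{.State.Health.Status}}' app

# Show the recent health-check probe results
docker inspect --format '{{json .State.Health}}' app
```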
### Logging
Configure proper logging to prevent disk fill-up:
```yaml
services:
  app:
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
```
Or system-wide in `/etc/docker/daemon.json`:
```json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}
```
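To confirm which logging driver and options a running container actually picked up, one option is to inspect its host config; `my-container` is a placeholder name:
```bash
# Print the effective logging driver and options for a container
docker inspect --format '{{json .HostConfig.LogConfig}}' my-container
```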
### Restart Policies
```yaml
services:
  app:
    # For development:
    # restart: "no"

    # For production:
    restart: unless-stopped

    # Or with fine-grained control (Swarm mode)
    deploy:
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 3
        window: 120s
```
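The restart policy of an existing container can also be changed in place, without recreating it; `my-container` is a placeholder name:
```bash
# Switch a running container to restart automatically unless explicitly stopped
docker update --restart unless-stopped my-container
```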
## Docker Compose Best Practices
### File Structure
```yaml
# The top-level "version" field is obsolete in Compose v2 and can be omitted
services:
  # Service definitions
  web:
    # ...
  api:
    # ...
  database:
    # ...

networks:
  # Custom networks (preferred)
  frontend:
  backend:
    internal: true

volumes:
  # Named volumes (preferred for persistence)
  db-data:
  app-data:

configs:
  # Configuration files (Swarm mode)
  app-config:
    file: ./config/app.conf

secrets:
  # Secrets (Swarm mode)
  db-password:
    file: ./secrets/db_pass.txt
```
### Network Isolation
```yaml
networks:
  frontend:
    driver: bridge
  backend:
    driver: bridge
    internal: true  # No external access

services:
  web:
    networks:
      - frontend
  api:
    networks:
      - frontend
      - backend
  database:
    networks:
      - backend  # Not accessible from frontend
```
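To verify the isolation, you can check which containers ended up attached to each network; Compose prefixes network names with the project name, so `<project>` below is a placeholder:
```bash
# List the networks Compose created
docker network ls

# Show which containers are attached to the backend network
docker network inspect --format '{{range .Containers}}{{.Name}} {{end}}' <project>_backend
```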
### Environment Variables
```yaml
services:
  app:
    # Load from file (preferred for non-secrets)
    env_file:
      - .env

    # Inline for service-specific vars
    environment:
      - NODE_ENV=production
      - LOG_LEVEL=info

    # For Swarm mode secrets
    secrets:
      - db_password
```
**Important:**
- Add `.env` to `.gitignore`
- Provide `.env.example` as a template (see the sketch after this list)
- Never commit secrets to version control
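A minimal `.env.example` sketch; the variable names are illustrative placeholders, and real values stay out of version control:
```
# .env.example - copy to .env and fill in real values
NODE_ENV=production
LOG_LEVEL=info
DATABASE_URL=
API_KEY=
```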
### Dependency Management
```yaml
services:
  api:
    depends_on:
      database:
        condition: service_healthy  # Wait for health check
      redis:
        condition: service_started  # Just wait for start
```
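`service_healthy` only has an effect if the dependency defines its own health check. A sketch of what the `database` service could look like, assuming Postgres (image and check command are assumptions, and credentials are omitted):
```yaml
services:
  database:
    image: postgres:16-alpine
    # Credentials omitted - supply them via env_file or secrets
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5
```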
## Production Best Practices
### Image Tagging Strategy
```bash
# Use semantic versioning
my-app:1.2.3
my-app:1.2
my-app:1
my-app:latest
# Include git commit for traceability
my-app:1.2.3-abc123f
# Environment tags
my-app:1.2.3-production
my-app:1.2.3-staging
```
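One way to apply this scheme is to build once, apply the full tag set, and push all tags together; the registry host and version values below are placeholders:
```bash
# Build once with the full semantic version
docker build -t registry.example.com/my-app:1.2.3 .

# Add the coarser tags and a commit-pinned tag
docker tag registry.example.com/my-app:1.2.3 registry.example.com/my-app:1.2
docker tag registry.example.com/my-app:1.2.3 registry.example.com/my-app:1
docker tag registry.example.com/my-app:1.2.3 registry.example.com/my-app:1.2.3-$(git rev-parse --short HEAD)

# Push every tag for the repository
docker push --all-tags registry.example.com/my-app
```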
### Secrets Management
**Never do this:**
```dockerfile
# BAD - secret in layer history
ENV API_KEY=secret123
RUN echo "password" > /app/config
```
**Do this:**
```bash
# Use Docker secrets (Swarm) or external secret management
docker secret create db_password ./password.txt
# Or mount secrets at runtime
docker run -v /secure/secrets:/run/secrets:ro my-app
# Or use environment files (not in image)
docker run --env-file /secure/.env my-app
```
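For build-time secrets (for example a token needed while installing private dependencies), BuildKit secret mounts keep the value out of every layer. A sketch; `api_key` is an arbitrary id and `install-private-deps.sh` is a hypothetical helper script:
```dockerfile
# syntax=docker/dockerfile:1
# Build with: docker build --secret id=api_key,src=./api_key.txt .
# The secret is mounted at /run/secrets/api_key for this RUN step only
# and never ends up in an image layer.
# (install-private-deps.sh is a hypothetical helper script.)
RUN --mount=type=secret,id=api_key \
    API_KEY="$(cat /run/secrets/api_key)" ./install-private-deps.sh
```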
### Monitoring & Observability
```yaml
services:
  app:
    # Health checks
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost/health"]
      interval: 30s

    # Labels for monitoring tools
    labels:
      - "prometheus.io/scrape=true"
      - "prometheus.io/port=9090"
      - "com.company.team=backend"
      - "com.company.version=1.2.3"

    # Logging
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
```
### Backup Strategy
```bash
# Backup named volume
docker run --rm \
  -v VOLUME_NAME:/data \
  -v "$(pwd)":/backup \
  alpine tar czf /backup/backup-$(date +%Y%m%d).tar.gz -C /data .

# Restore volume
docker run --rm \
  -v VOLUME_NAME:/data \
  -v "$(pwd)":/backup \
  alpine tar xzf /backup/backup.tar.gz -C /data
```
### Update Strategy
```yaml
services:
  app:
    # For Swarm mode - rolling updates
    deploy:
      replicas: 3
      update_config:
        parallelism: 1    # Update 1 at a time
        delay: 10s        # Wait 10s between updates
        failure_action: rollback
        monitor: 60s
      rollback_config:
        parallelism: 1
        delay: 5s
```
## Platform-Specific Best Practices
### Linux
- Use user namespace remapping for added security
- Leverage native performance advantages
- Use Alpine for smallest images
- Configure SELinux/AppArmor profiles
- Use systemd for Docker daemon management
`/etc/docker/daemon.json`:
```json
{
  "userns-remap": "default",
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  },
  "storage-driver": "overlay2",
  "live-restore": true
}
```
### macOS
- Allocate sufficient resources in Docker Desktop
- Use `:delegated` or `:cached` for bind mounts
- Consider multi-platform builds for ARM (Apple Silicon); see the buildx sketch below
- Limit file sharing to necessary directories
```yaml
# Better volume performance on macOS
volumes:
  - ./src:/app/src:delegated    # Host writes are delayed
  - ./build:/app/build:cached   # Container writes are cached
```
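For the multi-platform point above, a sketch using `docker buildx`; the builder name, image name, and tag are placeholders:
```bash
# Create and select a builder that supports multi-platform builds
docker buildx create --name multiarch --use

# Build for both amd64 and arm64 and push the multi-arch manifest
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  -t registry.example.com/my-app:1.2.3 \
  --push .
```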
### Windows
- Choose container type: Windows or Linux
- Use forward slashes in paths
- Ensure drives are shared in Docker Desktop
- Be aware of line ending differences (CRLF vs LF)
- Consider WSL2 backend for better performance
```yaml
# Windows-compatible paths
volumes:
  - C:/Users/name/app:/app       # Forward slashes work
  # or
  - 'C:\Users\name\app:/app'     # Single-quote backslash paths; in double-quoted YAML they must be escaped
```
## Performance Best Practices
### Build Performance
BuildKit is the default builder in current Docker Engine releases; on older versions, enable it with `export DOCKER_BUILDKIT=1` before building. Cache mounts and bind mounts then speed up dependency installation:
```dockerfile
# syntax=docker/dockerfile:1

# Cache pip downloads across builds
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt

# Bind-mount the manifests and cache npm downloads across builds
RUN --mount=type=bind,source=package.json,target=package.json \
    --mount=type=bind,source=package-lock.json,target=package-lock.json \
    --mount=type=cache,target=/root/.npm \
    npm ci
```
### Image Size
- Use multi-stage builds
- Choose minimal base images
- Clean up in the same layer
- Use .dockerignore
- Remove build dependencies
```dockerfile
# Install and clean up in one layer
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        package1 \
        package2 && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
```
### Runtime Performance
```dockerfile
# Use exec form - no shell wrapper, signals reach the process directly
CMD ["node", "server.js"]

# Avoid shell form (CMD node server.js) - it spawns a shell as PID 1

# Optimize signals
STOPSIGNAL SIGTERM

# Run as non-root (slightly faster, much more secure)
USER appuser
```
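Related to signal handling: if the application does not forward signals or reap child processes itself, Docker's built-in init process (based on tini) can do it; `my-image` is a placeholder:
```bash
# Run a minimal init as PID 1 to forward signals and reap zombies
docker run --init my-image
```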
## Security Best Practices Summary
**Image Security:**
- Use official, minimal base images
- Scan for vulnerabilities (Docker Scout, Trivy); example commands below
- Don't include secrets in layers
- Run as non-root user
- Keep images updated
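Both scanners can run locally or in CI against a built image; `my-app:1.2.3` is a placeholder tag:
```bash
# Docker Scout (Docker CLI plugin, included with recent Docker Desktop)
docker scout cves my-app:1.2.3

# Trivy (standalone scanner)
trivy image my-app:1.2.3
```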
**Runtime Security:**
- Drop capabilities
- Use read-only filesystem
- Set resource limits
- Enable security options
- Isolate networks
- Use secrets management
**Compliance:**
- Follow CIS Docker Benchmark
- Implement container scanning in CI/CD
- Use signed images (Docker Content Trust)
- Maintain audit logs
- Regular security reviews
## Common Anti-Patterns to Avoid
**Don't:**
- Run as root
- Use `--privileged`
- Mount Docker socket
- Use `latest` tag
- Hardcode secrets
- Skip health checks
- Ignore resource limits
- Use huge base images
- Skip vulnerability scanning
- Expose unnecessary ports
- Use inefficient layer caching
- Commit secrets to Git
**Do:**
- Run as non-root
- Use minimal capabilities
- Isolate containers
- Tag with versions
- Use secrets management
- Implement health checks
- Set resource limits
- Use minimal images
- Scan regularly
- Apply least privilege
- Optimize build cache
- Use .env.example templates
## Checklist for Production-Ready Images
- [ ] Based on official, versioned, minimal image
- [ ] Multi-stage build (if applicable)
- [ ] Runs as non-root user
- [ ] No secrets in layers
- [ ] .dockerignore configured
- [ ] Vulnerability scan passed
- [ ] Health check implemented
- [ ] Proper labeling (version, description, etc.; see the OCI label sketch after this checklist)
- [ ] Efficient layer caching
- [ ] Resource limits defined
- [ ] Logging configured
- [ ] Signals handled correctly
- [ ] Security options set
- [ ] Documentation complete
- [ ] Tested on target platform(s)
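For the labeling item, the OCI image annotation keys are the usual convention; a sketch with placeholder values, typically injected as build args from CI:
```dockerfile
# Standard OCI annotation keys; values here are placeholders
LABEL org.opencontainers.image.title="my-app" \
      org.opencontainers.image.description="Example service" \
      org.opencontainers.image.version="1.2.3" \
      org.opencontainers.image.revision="abc123f" \
      org.opencontainers.image.source="https://example.com/org/my-app"
```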
This skill represents current Docker best practices. Always verify against official documentation for the latest recommendations, as Docker evolves continuously.