# CI/CD Best Practices

Comprehensive guide to CI/CD pipeline design, testing strategies, and deployment patterns.

## Table of Contents

- Pipeline Design Principles
- Testing in CI/CD
- Deployment Strategies
- Dependency Management
- Artifact & Release Management
- Platform Patterns
- Continuous Improvement
## Pipeline Design Principles

### Fast Feedback Loops

Design pipelines to provide feedback quickly, running the cheapest checks first.

Priority ordering:

1. Linting and code formatting (seconds)
2. Unit tests (1-5 minutes)
3. Integration tests (5-15 minutes)
4. E2E tests (15-30 minutes)
5. Deployment (varies)
Fail fast pattern:

```yaml
# GitHub Actions
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - run: npm run lint
  test:
    needs: lint  # Only run if lint passes
    runs-on: ubuntu-latest
    steps:
      - run: npm test
  e2e:
    needs: [lint, test]  # Run after basic checks
    runs-on: ubuntu-latest
    steps:
      - run: npm run test:e2e
```
### Job Parallelization

Run independent jobs concurrently:

GitHub Actions:

```yaml
jobs:
  lint:
    runs-on: ubuntu-latest
  test:
    runs-on: ubuntu-latest
    # No 'needs' - runs in parallel with lint
  build:
    needs: [lint, test]  # Wait for both
    runs-on: ubuntu-latest
```
GitLab CI:

```yaml
stages:
  - validate
  - test
  - build

# Jobs in the same stage run in parallel
unit-test:
  stage: test

integration-test:
  stage: test

e2e-test:
  stage: test
```
### Monorepo Strategies

Path-based triggers (GitHub):

```yaml
on:
  push:
    paths:
      - 'services/api/**'
      - 'shared/**'
```

Note that `paths` filters trigger the whole workflow. Job-level expressions like `contains(github.event.head_commit.modified, 'services/api/')` are unreliable: `contains()` matches whole array elements rather than path prefixes, and `head_commit.modified` only covers the final commit of a push. For per-job filtering, use a change-detection action, as sketched below.
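A per-job sketch using the third-party dorny/paths-filter action (the `api` filter name and test command are illustrative):

```yaml
jobs:
  changes:
    runs-on: ubuntu-latest
    outputs:
      api: ${{ steps.filter.outputs.api }}
    steps:
      - uses: actions/checkout@v4
      - uses: dorny/paths-filter@v3
        id: filter
        with:
          filters: |
            api:
              - 'services/api/**'
              - 'shared/**'

  api-test:
    needs: changes
    if: needs.changes.outputs.api == 'true'  # Skip when no API-related paths changed
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm test  # Hypothetical test command for the API service
```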
GitLab rules:

```yaml
api-test:
  rules:
    - changes:
        - services/api/**/*
        - shared/**/*

frontend-test:
  rules:
    - changes:
        - services/frontend/**/*
        - shared/**/*
```
### Matrix Builds

Test across multiple versions/platforms:

GitHub Actions:

```yaml
strategy:
  fail-fast: false  # See all results, even after one combination fails
  matrix:
    os: [ubuntu-latest, macos-latest, windows-latest]
    node: [18, 20, 22]
    include:
      - os: ubuntu-latest
        node: 22
        coverage: true
    exclude:
      - os: windows-latest
        node: 18
```
GitLab parallel:

```yaml
test:
  parallel:
    matrix:
      - NODE_VERSION: ['18', '20', '22']
        OS: ['ubuntu', 'alpine']
```
## Testing in CI/CD

### Test Pyramid Strategy

Maintain proper test distribution:

```
        /\
       /E2E\         10% - slow, expensive, flaky
      /------\
     /  Int   \      20% - medium speed
    /----------\
   /    Unit    \    70% - fast, reliable
  /--------------\
```
Implementation:

```yaml
jobs:
  unit-test:
    runs-on: ubuntu-latest
    steps:
      - run: npm run test:unit  # Fast, runs on every commit

  integration-test:
    runs-on: ubuntu-latest
    needs: unit-test
    steps:
      - run: npm run test:integration  # Medium, after unit tests

  e2e-test:
    runs-on: ubuntu-latest
    needs: [unit-test, integration-test]
    if: github.ref == 'refs/heads/main'  # Only on main branch
    steps:
      - run: npm run test:e2e  # Slow, only on main
```
### Test Splitting & Parallelization

Split large test suites across parallel shards:

GitHub Actions:

```yaml
strategy:
  matrix:
    shard: [1, 2, 3, 4]
steps:
  - run: npm test -- --shard=${{ matrix.shard }}/4
```
Playwright example:

```yaml
strategy:
  matrix:
    shardIndex: [1, 2, 3, 4]
    shardTotal: [4]
steps:
  - run: npx playwright test --shard=${{ matrix.shardIndex }}/${{ matrix.shardTotal }}
```
### Code Coverage

Track coverage trends and enforce a minimum threshold:

```yaml
- name: Run tests with coverage
  run: npm test -- --coverage

- name: Upload coverage
  uses: codecov/codecov-action@v4
  with:
    files: ./coverage/lcov.info
    token: ${{ secrets.CODECOV_TOKEN }}  # Required by v4 for most repos
    fail_ci_if_error: true  # Fail if upload fails

- name: Coverage check
  run: |
    COVERAGE=$(jq -r '.total.lines.pct' coverage/coverage-summary.json)
    if (( $(echo "$COVERAGE < 80" | bc -l) )); then
      echo "Coverage $COVERAGE% is below 80%"
      exit 1
    fi
```
### Test Environment Management

Docker Compose for backing services:

```yaml
jobs:
  integration-test:
    runs-on: ubuntu-latest
    steps:
      - name: Start services
        run: docker compose up -d postgres redis
      - name: Wait for services
        run: |
          timeout 30 bash -c 'until docker compose exec -T postgres pg_isready; do sleep 1; done'
      - name: Run tests
        run: npm run test:integration
      - name: Cleanup
        if: always()
        run: docker compose down
```
GitLab services:

```yaml
integration-test:
  services:
    - postgres:15
    - redis:7-alpine
  variables:
    POSTGRES_DB: testdb
    POSTGRES_PASSWORD: password
  script:
    - npm run test:integration
```
## Deployment Strategies

### Deployment Patterns

**1. Direct Deployment (Simple)**

```yaml
deploy:
  if: github.ref == 'refs/heads/main'
  steps:
    - run: |
        aws s3 sync dist/ s3://${{ secrets.S3_BUCKET }}
        aws cloudfront create-invalidation --distribution-id ${{ secrets.CF_DIST }} --paths "/*"
```
**2. Blue-Green Deployment**

This sketch assumes the new build has already been deployed to the staging slot:

```yaml
deploy:
  steps:
    - name: Swap staging slot into production
      run: az webapp deployment slot swap --slot staging --resource-group $RG --name $APP
    - name: Health check
      run: |
        for i in {1..10}; do
          if curl -f https://$APP.azurewebsites.net/health; then
            echo "Health check passed"
            exit 0
          fi
          sleep 10
        done
        exit 1
    - name: Rollback on failure
      if: failure()
      run: az webapp deployment slot swap --slot staging --resource-group $RG --name $APP  # Swap back
```
**3. Canary Deployment**

Here `app-canary` is a small separate deployment behind the same Service, so only a slice of traffic sees the new image before full rollout:

```yaml
deploy-canary:
  steps:
    - run: kubectl set image deployment/app-canary app=myapp:${{ github.sha }}  # Update canary pods only
    - run: sleep 300  # Monitor for 5 minutes
    - run: kubectl set image deployment/app app=myapp:${{ github.sha }}  # Promote to the full fleet
```
### Environment Management

GitHub Environments:

```yaml
jobs:
  deploy-staging:
    environment:
      name: staging
      url: https://staging.example.com
    steps:
      - run: ./deploy.sh staging

  deploy-production:
    needs: deploy-staging
    environment:
      name: production
      url: https://example.com
    steps:
      - run: ./deploy.sh production
```
Protection rules (configured on the environment itself, not in workflow YAML; see the sketch below):

- Require approval for production
- Restrict deployments to specific branches
- Add a deployment wait timer
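A hedged sketch of setting these rules through the GitHub REST API with the `gh` CLI (OWNER/REPO and the reviewer team ID are placeholders):

```bash
# Create or update the production environment with protection rules
gh api --method PUT "repos/OWNER/REPO/environments/production" \
  --input - <<'EOF'
{
  "wait_timer": 10,
  "reviewers": [{ "type": "Team", "id": 123456 }],
  "deployment_branch_policy": {
    "protected_branches": true,
    "custom_branch_policies": false
  }
}
EOF
```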
GitLab environments:

```yaml
deploy:staging:
  stage: deploy
  environment:
    name: staging
    url: https://staging.example.com
    on_stop: stop:staging
  only:
    - develop

deploy:production:
  stage: deploy
  environment:
    name: production
    url: https://example.com
  when: manual  # Require manual trigger
  only:
    - main
```
### Deployment Gates

Pre-deployment checks:

```yaml
pre-deploy-checks:
  steps:
    - name: Check migration status
      run: ./scripts/check-migrations.sh
    - name: Verify dependencies
      run: npm audit --audit-level=high
    - name: Check service health
      run: curl -f https://api.example.com/health
```
Post-deployment validation:

```yaml
post-deploy-validation:
  needs: deploy
  steps:
    - name: Smoke tests
      run: npm run test:smoke
    - name: Monitor errors
      run: |
        # Placeholder: query your monitoring system (Datadog, Sentry, etc.) for recent errors
        ERROR_COUNT=$(./scripts/recent-error-count.sh --since 5m)
        if [ "$ERROR_COUNT" -gt 10 ]; then
          echo "Error spike detected!"
          exit 1
        fi
```
## Dependency Management

### Lock Files

Always commit lock files:

- `package-lock.json` (npm)
- `yarn.lock` (Yarn)
- `pnpm-lock.yaml` (pnpm)
- `Cargo.lock` (Rust)
- `Gemfile.lock` (Ruby)
- `poetry.lock` (Python)

Use deterministic install commands:

```bash
# Good - installs exactly what the lock file specifies
npm ci                          # Not npm install
yarn install --frozen-lockfile  # Yarn 1; use --immutable on Yarn 2+
pnpm install --frozen-lockfile
pip install -r requirements.txt # Deterministic only if versions are fully pinned

# Bad - may update the lock file
npm install
```
### Dependency Caching

See optimization.md for detailed caching strategies.

Quick reference:

- Hash lock files for cache keys
- Include OS/platform in the cache key
- Use restore-keys for partial matches
- Keep separate caches for build artifacts vs dependencies
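A minimal npm example following those rules, using `actions/cache` (note that `actions/setup-node` with `cache: npm` wraps the same behavior in one step):

```yaml
- uses: actions/cache@v4
  with:
    path: ~/.npm  # npm's download cache, not node_modules
    key: ${{ runner.os }}-npm-${{ hashFiles('**/package-lock.json') }}
    restore-keys: |
      ${{ runner.os }}-npm-
```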
### Security Scanning

Automated vulnerability checks:

```yaml
security-scan:
  steps:
    - name: Dependency audit
      run: |
        npm audit --audit-level=high
        # Or: pip-audit, cargo audit, bundle audit
    - name: Initialize CodeQL  # analyze requires a prior init step
      uses: github/codeql-action/init@v3
      with:
        languages: javascript
    - name: SAST scanning
      uses: github/codeql-action/analyze@v3
    - name: Container scanning
      run: trivy image myapp:${{ github.sha }}
```
### Dependency Updates

Automated dependency update tools:

- Dependabot (GitHub)
- Renovate
- GitLab Dependency Scanning

Configuration example (Dependabot):

```yaml
# .github/dependabot.yml
version: 2
updates:
  - package-ecosystem: npm
    directory: "/"
    schedule:
      interval: weekly
    open-pull-requests-limit: 5
    groups:
      dev-dependencies:
        dependency-type: development
```
## Artifact & Release Management

### Artifact Strategy

Build once, deploy many:

```yaml
build:
  steps:
    - run: npm run build
    - uses: actions/upload-artifact@v4
      with:
        name: dist-${{ github.sha }}
        path: dist/
        retention-days: 7

deploy-staging:
  needs: build
  steps:
    - uses: actions/download-artifact@v4
      with:
        name: dist-${{ github.sha }}
    - run: ./deploy.sh staging

deploy-production:
  needs: [build, deploy-staging]
  steps:
    - uses: actions/download-artifact@v4
      with:
        name: dist-${{ github.sha }}
    - run: ./deploy.sh production
```
### Container Image Management

Multi-stage builds keep build tooling and dev dependencies out of the final image:

```dockerfile
# Build stage - full install so build tooling is available
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Production stage - runtime dependencies only
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY --from=builder /app/dist ./dist
USER node
CMD ["node", "dist/server.js"]
```
Image tagging strategy:

```yaml
- name: Build and tag images
  run: |
    docker build -t myapp:${{ github.sha }} .
    docker tag myapp:${{ github.sha }} myapp:latest
    docker tag myapp:${{ github.sha }} myapp:v1.2.3
```
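Tags only matter once pushed. A sketch pushing to GitHub Container Registry (the ghcr.io image path is illustrative):

```yaml
- uses: docker/login-action@v3
  with:
    registry: ghcr.io
    username: ${{ github.actor }}
    password: ${{ secrets.GITHUB_TOKEN }}
- run: |
    docker tag myapp:${{ github.sha }} ghcr.io/${{ github.repository }}:${{ github.sha }}
    docker push ghcr.io/${{ github.repository }}:${{ github.sha }}
```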
### Release Automation

Semantic versioning (`actions/create-release` is archived; the `gh` CLI is the current approach):

```yaml
release:
  if: startsWith(github.ref, 'refs/tags/v')
  permissions:
    contents: write
  steps:
    - uses: actions/checkout@v4
    - run: gh release create "$GITHUB_REF_NAME" --generate-notes
      env:
        GH_TOKEN: ${{ github.token }}
```
Changelog generation (requires full history, i.e. checkout with `fetch-depth: 0`):

```yaml
- name: Generate changelog
  run: |
    git log $(git describe --tags --abbrev=0)..HEAD \
      --pretty=format:"- %s (%h)" > CHANGELOG.md
```
## Platform Patterns

### GitHub Actions

Reusable workflows:

```yaml
# .github/workflows/reusable-test.yml
on:
  workflow_call:
    inputs:
      node-version:
        required: true
        type: string

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ inputs.node-version }}
      - run: npm test
```
Composite actions:

```yaml
# .github/actions/setup-app/action.yml
name: Setup Application
description: Install Node.js and project dependencies  # description is required in action metadata
runs:
  using: composite
  steps:
    - uses: actions/setup-node@v4
      with:
        node-version: 20
    - run: npm ci
      shell: bash
```
### GitLab CI

Templates & extends:

```yaml
.test_template:
  image: node:20
  before_script:
    - npm ci
  script:
    - npm test  # Default, overridden by the jobs below

unit-test:
  extends: .test_template
  script:
    - npm run test:unit

integration-test:
  extends: .test_template
  script:
    - npm run test:integration
```
Dynamic child pipelines:

```yaml
generate-pipeline:
  script:
    - ./generate-config.sh > pipeline.yml
  artifacts:
    paths:
      - pipeline.yml

trigger-pipeline:
  trigger:
    include:
      - artifact: pipeline.yml
        job: generate-pipeline
```
## Continuous Improvement

### Metrics to Track

- Build duration: target < 10 minutes
- Failure rate: target < 5%
- Time to recovery: target < 1 hour
- Deployment frequency: aim for multiple deploys per day
- Lead time: commit to production in < 1 day
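As a rough sketch, recent build durations can be pulled from the GitHub API with the `gh` CLI and `jq` (OWNER/REPO is a placeholder):

```bash
# Average duration (in seconds) of the last 50 completed workflow runs
gh api "repos/OWNER/REPO/actions/runs?status=completed&per_page=50" \
  --jq '[.workflow_runs[]
         | (.updated_at | fromdateiso8601) - (.run_started_at | fromdateiso8601)]
        | add / length'
```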
### Pipeline Optimization Checklist

- [ ] Jobs run in parallel where possible
- [ ] Dependencies are cached
- [ ] Test suite is properly split
- [ ] Linting fails fast
- [ ] Only necessary tests run on PRs
- [ ] Artifacts are reused across jobs
- [ ] Pipeline has appropriate timeouts (see the sketch below)
- [ ] Flaky tests are identified and fixed
- [ ] Security scanning is automated
- [ ] Deployment requires approval
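For the timeouts item, a minimal GitHub Actions example; without it, a hung job runs until the 6-hour default limit:

```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    timeout-minutes: 15  # Fail fast instead of waiting out the default
    steps:
      - run: npm test
```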
### Regular Reviews

Monthly:

- Review build duration trends
- Analyze failure patterns
- Update dependencies
- Review security scan results

Quarterly:

- Audit pipeline efficiency
- Review deployment frequency
- Update CI/CD tools and actions
- Hold a team retrospective on CI/CD pain points