CI/CD Best Practices

Comprehensive guide to CI/CD pipeline design, testing strategies, and deployment patterns.

Table of Contents

  • Pipeline Design Principles
  • Testing in CI/CD
  • Deployment Strategies
  • Dependency Management
  • Artifact & Release Management
  • Platform Patterns
  • Continuous Improvement

Pipeline Design Principles

Fast Feedback Loops

Design pipelines to provide feedback quickly:

Priority ordering:

  1. Linting and code formatting (seconds)
  2. Unit tests (1-5 minutes)
  3. Integration tests (5-15 minutes)
  4. E2E tests (15-30 minutes)
  5. Deployment (varies)

Fail fast pattern:

# GitHub Actions
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - run: npm run lint

  test:
    needs: lint  # Only run if lint passes
    runs-on: ubuntu-latest
    steps:
      - run: npm test

  e2e:
    needs: [lint, test]  # Run after basic checks

Job Parallelization

Run independent jobs concurrently:

GitHub Actions:

jobs:
  lint:
    runs-on: ubuntu-latest

  test:
    runs-on: ubuntu-latest
    # No 'needs' - runs in parallel with lint

  build:
    needs: [lint, test]  # Wait for both
    runs-on: ubuntu-latest

GitLab CI:

stages:
  - validate
  - test
  - build

# Jobs in same stage run in parallel
unit-test:
  stage: test

integration-test:
  stage: test

e2e-test:
  stage: test

Monorepo Strategies

Path-based triggers (GitHub):

on:
  push:
    paths:
      - 'services/api/**'
      - 'shared/**'

jobs:
  api-test:
    runs-on: ubuntu-latest
    # The workflow-level 'paths' filter above already limits when this workflow runs.
    # For per-job change filtering inside one workflow, use a change-detection action
    # such as dorny/paths-filter; contains(github.event.head_commit.modified, ...)
    # matches only exact file paths, not directory prefixes.

GitLab rules:

api-test:
  rules:
    - changes:
        - services/api/**/*
        - shared/**/*

frontend-test:
  rules:
    - changes:
        - services/frontend/**/*
        - shared/**/*

Matrix Builds

Test across multiple versions/platforms:

GitHub Actions:

strategy:
  matrix:
    os: [ubuntu-latest, macos-latest, windows-latest]
    node: [18, 20, 22]
    include:
      - os: ubuntu-latest
        node: 22
        coverage: true
    exclude:
      - os: windows-latest
        node: 18
  fail-fast: false  # See all results

GitLab parallel:

test:
  parallel:
    matrix:
      - NODE_VERSION: ['18', '20', '22']
        OS: ['ubuntu', 'alpine']

Testing in CI/CD

Test Pyramid Strategy

Maintain proper test distribution:

        /\
       /E2E\      10% - Slow, expensive, flaky
      /-----\
     /  Int  \    20% - Medium speed
    /--------\
   /   Unit   \   70% - Fast, reliable
  /------------\

Implementation:

jobs:
  unit-test:
    runs-on: ubuntu-latest
    steps:
      - run: npm run test:unit  # Fast, runs on every commit

  integration-test:
    runs-on: ubuntu-latest
    needs: unit-test
    steps:
      - run: npm run test:integration  # Medium, after unit tests

  e2e-test:
    runs-on: ubuntu-latest
    needs: [unit-test, integration-test]
    if: github.ref == 'refs/heads/main'  # Only on main branch
    steps:
      - run: npm run test:e2e  # Slow, only on main

Test Splitting & Parallelization

Split large test suites:

GitHub Actions:

strategy:
  matrix:
    shard: [1, 2, 3, 4]
steps:
  - run: npm test -- --shard=${{ matrix.shard }}/4

Playwright example:

strategy:
  matrix:
    shardIndex: [1, 2, 3, 4]
    shardTotal: [4]
steps:
  - run: npx playwright test --shard=${{ matrix.shardIndex }}/${{ matrix.shardTotal }}

Code Coverage

Track coverage trends:

- name: Run tests with coverage
  run: npm test -- --coverage

- name: Upload coverage
  uses: codecov/codecov-action@v4
  with:
    files: ./coverage/lcov.info
    fail_ci_if_error: true  # Fail if upload fails

- name: Coverage check
  run: |
    COVERAGE=$(jq -r '.total.lines.pct' coverage/coverage-summary.json)
    if (( $(echo "$COVERAGE < 80" | bc -l) )); then
      echo "Coverage $COVERAGE% is below 80%"
      exit 1
    fi

Test Environment Management

Docker Compose for services:

jobs:
  integration-test:
    runs-on: ubuntu-latest
    steps:
      - name: Start services
        run: docker-compose up -d postgres redis

      - name: Wait for services
        run: |
          timeout 30 bash -c 'until docker-compose exec -T postgres pg_isready; do sleep 1; done'

      - name: Run tests
        run: npm run test:integration

      - name: Cleanup
        if: always()
        run: docker-compose down

GitLab services:

integration-test:
  services:
    - postgres:15
    - redis:7-alpine
  variables:
    POSTGRES_DB: testdb
    POSTGRES_PASSWORD: password
  script:
    - npm run test:integration

Deployment Strategies

Deployment Patterns

1. Direct Deployment (Simple)

deploy:
  if: github.ref == 'refs/heads/main'
  steps:
    - run: |
        aws s3 sync dist/ s3://${{ secrets.S3_BUCKET }}
        aws cloudfront create-invalidation --distribution-id ${{ secrets.CF_DIST }} --paths "/*"

2. Blue-Green Deployment

deploy:
  steps:
    - name: Swap staging slot into production
      # Assumes the new build was already deployed to the staging slot
      run: az webapp deployment slot swap --slot staging --resource-group $RG --name $APP

    - name: Health check
      run: |
        for i in {1..10}; do
          if curl -f https://$APP.azurewebsites.net/health; then
            echo "Health check passed"
            exit 0
          fi
          sleep 10
        done
        exit 1

    - name: Rollback on failure
      if: failure()
      run: az webapp deployment slot swap --slot staging --resource-group $RG --name $APP

3. Canary Deployment

deploy-canary:
  steps:
    # Assumes a separate 'app-canary' Deployment behind the same Service,
    # so a single canary pod takes a small share of traffic
    - run: kubectl set image deployment/app-canary app=myapp:${{ github.sha }}
    - run: kubectl scale deployment app-canary --replicas=1
    - run: sleep 300  # Monitor the canary for 5 minutes
    - run: kubectl set image deployment/app app=myapp:${{ github.sha }}  # Promote to stable

Environment Management

GitHub Environments:

jobs:
  deploy-staging:
    environment:
      name: staging
      url: https://staging.example.com
    steps:
      - run: ./deploy.sh staging

  deploy-production:
    needs: deploy-staging
    environment:
      name: production
      url: https://example.com
    steps:
      - run: ./deploy.sh production

Protection rules (see the sketch after this list):

  • Require approval for production
  • Restrict to specific branches
  • Add deployment delay
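
These rules live in the repository settings rather than workflow YAML; a minimal sketch of applying them through the REST API (OWNER/REPO and the reviewer ID are placeholders, wait_timer is in minutes):

gh api --method PUT /repos/OWNER/REPO/environments/production --input - <<'EOF'
{
  "wait_timer": 30,
  "reviewers": [{ "type": "Team", "id": 123456 }],
  "deployment_branch_policy": { "protected_branches": true, "custom_branch_policies": false }
}
EOF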

GitLab environments:

deploy:staging:
  stage: deploy
  environment:
    name: staging
    url: https://staging.example.com
    on_stop: stop:staging
  only:
    - develop

deploy:production:
  stage: deploy
  environment:
    name: production
    url: https://example.com
  when: manual  # Require manual trigger
  only:
    - main

Deployment Gates

Pre-deployment checks:

pre-deploy-checks:
  steps:
    - name: Check migration status
      run: ./scripts/check-migrations.sh

    - name: Verify dependencies
      run: npm audit --audit-level=high

    - name: Check service health
      run: curl -f https://api.example.com/health

Post-deployment validation:

post-deploy-validation:
  needs: deploy
  steps:
    - name: Smoke tests
      run: npm run test:smoke

    - name: Monitor errors
      run: |
        # Placeholder: query your monitoring tool (Datadog, Sentry, etc.) for recent errors
        ERROR_COUNT=$(datadog-api errors --since 5m)
        if [ "$ERROR_COUNT" -gt 10 ]; then
          echo "Error spike detected!"
          exit 1
        fi

Dependency Management

Lock Files

Always commit lock files:

  • package-lock.json (npm)
  • yarn.lock (Yarn)
  • pnpm-lock.yaml (pnpm)
  • Cargo.lock (Rust)
  • Gemfile.lock (Ruby)
  • poetry.lock (Python)

Use deterministic install commands:

# Good - uses lock file
npm ci           # Not npm install
yarn install --frozen-lockfile
pnpm install --frozen-lockfile
pip install -r requirements.txt   # Only if requirements.txt pins exact versions

# Bad - updates lock file
npm install

Dependency Caching

See optimization.md for detailed caching strategies

Quick reference (see the sketch after this list):

  • Hash lock files for cache keys
  • Include OS/platform in cache key
  • Use restore-keys for partial matches
  • Separate cache for build artifacts vs dependencies
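
A minimal npm sketch of these points using actions/cache (path and key format vary by package manager):

- uses: actions/cache@v4
  with:
    path: ~/.npm
    key: ${{ runner.os }}-npm-${{ hashFiles('**/package-lock.json') }}
    restore-keys: |
      ${{ runner.os }}-npm-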

Security Scanning

Automated vulnerability checks:

security-scan:
  steps:
    - name: Dependency audit
      run: |
        npm audit --audit-level=high
        # Or: pip-audit, cargo audit, bundle audit

    - name: SAST scanning
      uses: github/codeql-action/analyze@v3  # Needs a prior github/codeql-action/init step

    - name: Container scanning
      run: trivy image myapp:${{ github.sha }}

Dependency Updates

Automated dependency updates:

  • Dependabot (GitHub)
  • Renovate
  • GitLab Dependency Scanning

Configuration example (Dependabot):

# .github/dependabot.yml
version: 2
updates:
  - package-ecosystem: npm
    directory: "/"
    schedule:
      interval: weekly
    open-pull-requests-limit: 5
    groups:
      dev-dependencies:
        dependency-type: development
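
Renovate supports similar scheduling and grouping; a minimal configuration sketch (preset names vary across Renovate versions):

# renovate.json
{
  "extends": ["config:recommended"],
  "prConcurrentLimit": 5
}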

Artifact & Release Management

Artifact Strategy

Build once, deploy many:

build:
  steps:
    - run: npm run build
    - uses: actions/upload-artifact@v4
      with:
        name: dist-${{ github.sha }}
        path: dist/
        retention-days: 7

deploy-staging:
  needs: build
  steps:
    - uses: actions/download-artifact@v4
      with:
        name: dist-${{ github.sha }}
    - run: ./deploy.sh staging

deploy-production:
  needs: [build, deploy-staging]
  steps:
    - uses: actions/download-artifact@v4
      with:
        name: dist-${{ github.sha }}
    - run: ./deploy.sh production

Container Image Management

Multi-stage builds:

# Build stage
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci              # Dev dependencies are needed to run the build
COPY . .
RUN npm run build

# Production stage
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev   # Runtime dependencies only
COPY --from=builder /app/dist ./dist
USER node
CMD ["node", "dist/server.js"]

Image tagging strategy:

- name: Build and tag images
  run: |
    docker build -t myapp:${{ github.sha }} .
    docker tag myapp:${{ github.sha }} myapp:latest
    docker tag myapp:${{ github.sha }} myapp:v1.2.3

Release Automation

Semantic versioning:

release:
  if: startsWith(github.ref, 'refs/tags/v')
  steps:
    - uses: actions/create-release@v1
      with:
        tag_name: ${{ github.ref }}
        release_name: Release ${{ github.ref }}
        body: |
          Changes in this release:
          ${{ github.event.head_commit.message }}

Changelog generation:

- name: Generate changelog
  run: |
    git log $(git describe --tags --abbrev=0)..HEAD \
      --pretty=format:"- %s (%h)" > CHANGELOG.md

Platform Patterns

GitHub Actions

Reusable workflows:

# .github/workflows/reusable-test.yml
on:
  workflow_call:
    inputs:
      node-version:
        required: true
        type: string

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ inputs.node-version }}
      - run: npm test
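
A caller workflow then consumes it at the job level (sketch; the path assumes the reusable workflow lives in the same repository):

jobs:
  call-tests:
    uses: ./.github/workflows/reusable-test.yml
    with:
      node-version: '20'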

Composite actions:

# .github/actions/setup-app/action.yml
name: Setup Application
description: Install Node and project dependencies
runs:
  using: composite
  steps:
    - uses: actions/setup-node@v4
      with:
        node-version: 20
    - run: npm ci
      shell: bash
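
Consuming workflows reference the action by path after checking out the repository (sketch):

steps:
  - uses: actions/checkout@v4
  - uses: ./.github/actions/setup-app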

GitLab CI

Templates & extends:

.test_template:
  image: node:20
  before_script:
    - npm ci
  script:
    - npm test

unit-test:
  extends: .test_template
  script:
    - npm run test:unit

integration-test:
  extends: .test_template
  script:
    - npm run test:integration

Dynamic child pipelines:

generate-pipeline:
  script:
    - ./generate-config.sh > pipeline.yml
  artifacts:
    paths:
      - pipeline.yml

trigger-pipeline:
  trigger:
    include:
      - artifact: pipeline.yml
        job: generate-pipeline

Continuous Improvement

Metrics to Track

  • Build duration: Target < 10 minutes
  • Failure rate: Target < 5%
  • Time to recovery: Target < 1 hour
  • Deployment frequency: Aim for multiple/day
  • Lead time: Commit to production < 1 day
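
One rough way to sample build duration and failure rate from recent runs is the GitHub API (sketch; OWNER/REPO is a placeholder, and updated_at only approximates the finish time):

gh api '/repos/OWNER/REPO/actions/runs?per_page=100' | jq '
  [.workflow_runs[]
    | {dur: ((.updated_at | fromdateiso8601) - (.created_at | fromdateiso8601)),
       failed: (.conclusion == "failure")}]
  | {avg_duration_s: ((map(.dur) | add) / length | floor),
     failure_rate_pct: ((map(select(.failed)) | length) * 100 / length)}'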

Pipeline Optimization Checklist

  • Jobs run in parallel where possible
  • Dependencies are cached
  • Test suite is properly split
  • Linting fails fast
  • Only necessary tests run on PRs
  • Artifacts are reused across jobs
  • Pipeline has appropriate timeouts
  • Flaky tests are identified and fixed
  • Security scanning is automated
  • Deployment requires approval
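
Two of the checklist items expressed in GitHub Actions syntax (a sketch; values are illustrative):

jobs:
  test:
    runs-on: ubuntu-latest
    timeout-minutes: 15              # appropriate timeouts
    concurrency:
      group: test-${{ github.ref }}
      cancel-in-progress: true       # drop superseded runs on the same branch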

Regular Reviews

Monthly:

  • Review build duration trends
  • Analyze failure patterns
  • Update dependencies
  • Review security scan results

Quarterly:

  • Audit pipeline efficiency
  • Review deployment frequency
  • Update CI/CD tools and actions
  • Team retrospective on CI/CD pain points