2025-11-29 17:51:12 +08:00


CI/CD Pipeline Optimization

Comprehensive guide to improving pipeline performance through caching, parallelization, and smart resource usage.


Caching Strategies

Dependency Caching

Impact: Can reduce build times by 50-90% when dependency installation dominates the build and the cache is warm

GitHub Actions

Node.js/npm:

- uses: actions/cache@v4
  with:
    path: ~/.npm
    key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}
    restore-keys: |
      ${{ runner.os }}-node-

- run: npm ci

Python/pip:

- uses: actions/cache@v4
  with:
    path: ~/.cache/pip
    key: ${{ runner.os }}-pip-${{ hashFiles('**/requirements.txt') }}
    restore-keys: |
      ${{ runner.os }}-pip-

- run: pip install -r requirements.txt

Go modules:

- uses: actions/cache@v4
  with:
    path: |
      ~/.cache/go-build
      ~/go/pkg/mod
    key: ${{ runner.os }}-go-${{ hashFiles('**/go.sum') }}
    restore-keys: |
      ${{ runner.os }}-go-

- run: go build

Rust/Cargo:

- uses: actions/cache@v4
  with:
    path: |
      ~/.cargo/bin/
      ~/.cargo/registry/index/
      ~/.cargo/registry/cache/
      ~/.cargo/git/db/
      target/
    key: ${{ runner.os }}-cargo-${{ hashFiles('**/Cargo.lock') }}
    restore-keys: |
      ${{ runner.os }}-cargo-

- run: cargo build --release

Maven:

- uses: actions/cache@v4
  with:
    path: ~/.m2/repository
    key: ${{ runner.os }}-maven-${{ hashFiles('**/pom.xml') }}
    restore-keys: |
      ${{ runner.os }}-maven-

- run: mvn clean install

GitLab CI

Global cache:

cache:
  key: ${CI_COMMIT_REF_SLUG}
  paths:
    - node_modules/
    - .npm/
    - vendor/

Job-specific cache:

build:
  cache:
    key: build-${CI_COMMIT_REF_SLUG}
    paths:
      - target/
    policy: push  # Upload only

test:
  cache:
    key: build-${CI_COMMIT_REF_SLUG}
    paths:
      - target/
    policy: pull  # Download only

Cache with files checksum:

cache:
  key:
    files:
      - package-lock.json
      - yarn.lock
  paths:
    - node_modules/

Build Artifact Caching

Docker layer caching (GitHub):

- uses: docker/setup-buildx-action@v3

- uses: docker/build-push-action@v5
  with:
    context: .
    cache-from: type=gha
    cache-to: type=gha,mode=max
    push: false
    tags: myapp:latest

Docker layer caching (GitLab):

build:
  image: docker:latest
  services:
    - docker:dind
  variables:
    DOCKER_DRIVER: overlay2
  script:
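    - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" "$CI_REGISTRY"
    - docker pull $CI_REGISTRY_IMAGE:latest || true
    # BuildKit only reuses layers from images built with inline cache metadata
    - docker build --build-arg BUILDKIT_INLINE_CACHE=1 --cache-from $CI_REGISTRY_IMAGE:latest -t $CI_REGISTRY_IMAGE:latest .
    - docker push $CI_REGISTRY_IMAGE:latest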

Gradle build cache:

- uses: actions/cache@v4
  with:
    path: |
      ~/.gradle/caches
      ~/.gradle/wrapper
    key: ${{ runner.os }}-gradle-${{ hashFiles('**/*.gradle*', '**/gradle-wrapper.properties') }}

- run: ./gradlew build --build-cache

Cache Best Practices

Key strategies:

  • Include OS/platform: ${{ runner.os }}- on GitHub; on GitLab, add a platform prefix to the key when runners differ by OS
  • Hash lock files: hashFiles('**/package-lock.json')
  • Use restore-keys for fallback matches
  • Separate caches for different purposes
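
The key strategies above can be sketched in a few lines of Python. This is only an illustrative model, not how either CI system is implemented: `cache_key` and `resolve` are hypothetical helper names, SHA-256 stands in for whatever hash the platform uses, and `stored` is assumed to be ordered newest first.

```python
import hashlib
from typing import List, Optional


def cache_key(os_name: str, tool: str, lockfile_contents: List[bytes]) -> str:
    """Build a key such as 'Linux-node-<digest>' from the OS and lock-file contents."""
    digest = hashlib.sha256()
    for content in lockfile_contents:
        digest.update(content)
    return f"{os_name}-{tool}-{digest.hexdigest()}"


def resolve(key: str, restore_keys: List[str], stored: List[str]) -> Optional[str]:
    """Pick the entry to restore: exact key first, then the first prefix match."""
    if key in stored:
        return key
    for prefix in restore_keys:
        for entry in stored:  # assumed newest first
            if entry.startswith(prefix):
                return entry
    return None


old_key = cache_key("Linux", "node", [b'{"packages": {}}'])
new_key = cache_key("Linux", "node", [b'{"packages": {"left-pad": {}}}'])
stored_entries = [old_key]

print(resolve(old_key, ["Linux-node-"], stored_entries) == old_key)  # exact hit
print(resolve(new_key, ["Linux-node-"], stored_entries) == old_key)  # prefix fallback
print(resolve(new_key, ["macOS-node-"], stored_entries))             # no usable cache
```

A changed lock file produces a new key, so the exact lookup misses, but the restore-key prefix still recovers the previous cache as a starting point.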

Cache invalidation:

# Version in cache key: bump the prefix (v1 -> v2) to discard existing caches
cache:
  key: v2-${CI_COMMIT_REF_SLUG}

Cache size management:

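  • GitHub: 10 GB per repository by default; entries not accessed for 7 days are removed, and the least recently used entries are evicted once the limit is exceeded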
  • GitLab: Configurable per runner

Parallelization Techniques

Job Parallelization

Remove unnecessary dependencies:

# Before - Sequential
jobs:
  lint:
  test:
    needs: lint
  build:
    needs: test

# After - Parallel
jobs:
  lint:
  test:
  build:
    needs: [lint, test]  # Only wait for what's needed
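
To see why dropping a dependency helps, the sketch below computes the wall-clock time of a job graph as its longest dependency chain. The job durations are hypothetical, and the model assumes enough runners that every ready job starts immediately.

```python
from functools import lru_cache
from typing import Dict, List


def pipeline_duration(durations: Dict[str, int], needs: Dict[str, List[str]]) -> int:
    """Wall-clock minutes for a job graph, assuming unlimited runners."""

    @lru_cache(maxsize=None)
    def finish(job: str) -> int:
        # A job starts when the slowest of its dependencies has finished.
        start = max((finish(dep) for dep in needs.get(job, [])), default=0)
        return start + durations[job]

    return max(finish(job) for job in durations)


durations = {"lint": 1, "test": 5, "build": 3}      # hypothetical minutes
sequential = {"test": ["lint"], "build": ["test"]}  # lint -> test -> build
parallel = {"build": ["lint", "test"]}              # lint and test run side by side

print(pipeline_duration(durations, sequential))  # 9
print(pipeline_duration(durations, parallel))    # 8
```

The saving equals the duration of whichever job no longer sits on the critical path, so the gain grows with the number and length of jobs that can overlap.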

Matrix Builds

GitHub Actions:

strategy:
  matrix:
    os: [ubuntu-latest, macos-latest, windows-latest]
    node: [18, 20, 22]
    include:
      - os: ubuntu-latest
        node: 22
        coverage: true
    exclude:
      - os: macos-latest
        node: 18
  fail-fast: false
  max-parallel: 10  # Limit concurrent jobs
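
The number of jobs this matrix produces is easy to misjudge, so here is a simplified Python model of the expansion: cross product of the axes, minus `exclude` matches, with `include` entries processed afterwards. `expand_matrix` is a hypothetical helper and does not cover every rule of the real implementation.

```python
from itertools import product


def expand_matrix(axes, include=(), exclude=()):
    """Simplified matrix expansion: cross product, minus excludes, plus includes."""
    keys = list(axes)
    combos = [dict(zip(keys, values)) for values in product(*axes.values())]
    # Drop every combination that matches all key/value pairs of an exclude entry.
    combos = [
        c for c in combos
        if not any(all(c.get(k) == v for k, v in ex.items()) for ex in exclude)
    ]
    for inc in include:
        # An include entry extends the combinations whose axis values it agrees with...
        matches = [
            c for c in combos
            if all(c.get(k) == v for k, v in inc.items() if k in axes)
        ]
        if matches:
            for c in matches:
                c.update(inc)
        else:
            # ...otherwise it becomes an additional job.
            combos.append(dict(inc))
    return combos


jobs = expand_matrix(
    {"os": ["ubuntu-latest", "macos-latest", "windows-latest"], "node": [18, 20, 22]},
    include=[{"os": "ubuntu-latest", "node": 22, "coverage": True}],
    exclude=[{"os": "macos-latest", "node": 18}],
)
print(len(jobs))                               # 8
print([j for j in jobs if j.get("coverage")])  # the single ubuntu-latest / node 22 job
```

Under this model the configuration above yields 8 jobs: 9 combinations, one excluded, and the `include` entry only attaches `coverage: true` to an existing combination rather than adding a ninth job.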

GitLab parallel:

test:
  parallel:
    matrix:
      - NODE_VERSION: ['18', '20', '22']
        TEST_SUITE: ['unit', 'integration']
  script:
    - nvm use $NODE_VERSION
    - npm run test:$TEST_SUITE

Test Splitting

Jest sharding:

strategy:
  matrix:
    shard: [1, 2, 3, 4]
steps:
  - run: npm test -- --shard=${{ matrix.shard }}/4

Playwright sharding:

strategy:
  matrix:
    shardIndex: [1, 2, 3, 4]
    shardTotal: [4]
steps:
  - run: npx playwright test --shard=${{ matrix.shardIndex }}/${{ matrix.shardTotal }}

Pytest splitting (with the pytest-split plugin):

strategy:
  matrix:
    group: [1, 2, 3, 4]
steps:
  - run: pytest --splits 4 --group ${{ matrix.group }}
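
Sharding pays off most when shards finish at about the same time. The sketch below shows one common balancing approach, a greedy split driven by recorded test durations; the tools above may use a different algorithm, and the file names and timings are made up for illustration.

```python
import heapq
from typing import Dict, List


def split_tests(durations: Dict[str, float], shards: int) -> List[List[str]]:
    """Greedy split: give the slowest remaining test to the currently lightest shard."""
    heap = [(0.0, index) for index in range(shards)]  # (total seconds, shard index)
    groups: List[List[str]] = [[] for _ in range(shards)]
    # Slowest first; ties are broken by name so the split is deterministic.
    for name, seconds in sorted(durations.items(), key=lambda item: (-item[1], item[0])):
        total, index = heapq.heappop(heap)
        groups[index].append(name)
        heapq.heappush(heap, (total + seconds, index))
    return groups


timings = {  # hypothetical durations in seconds
    "test_api.py": 40,
    "test_auth.py": 30,
    "test_db.py": 20,
    "test_ui.py": 20,
    "test_utils.py": 10,
}
for number, group in enumerate(split_tests(timings, 2), start=1):
    print(number, group, sum(timings[name] for name in group))
```

With these timings both shards end up with 60 seconds of work, whereas splitting the files alphabetically into two halves would leave one shard noticeably slower than the other.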

Conditional Execution

Path-based (workflow-level filters, one workflow file per area):

# .github/workflows/frontend.yml; repeat with 'backend/**' for the backend
on:
  push:
    paths: ['frontend/**']
  pull_request:
    paths: ['frontend/**']

jobs:
  frontend-test:

GitLab rules:

frontend-test:
  rules:
    - changes:
        - frontend/**/*

backend-test:
  rules:
    - changes:
        - backend/**/*
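
The decision behind path-based rules can be modeled as a lookup from changed files to jobs. This is a minimal sketch that uses plain directory prefixes instead of full glob matching; `JOB_PATHS` and `jobs_to_run` are hypothetical names.

```python
from typing import Dict, Iterable, List


def jobs_to_run(changed_files: Iterable[str], job_paths: Dict[str, List[str]]) -> List[str]:
    """Return the jobs whose watched directories contain at least one changed file."""
    changed = list(changed_files)
    return sorted(
        job
        for job, prefixes in job_paths.items()
        if any(path.startswith(prefix) for path in changed for prefix in prefixes)
    )


JOB_PATHS = {  # hypothetical mapping of jobs to watched directories
    "frontend-test": ["frontend/"],
    "backend-test": ["backend/"],
}

print(jobs_to_run(["frontend/src/App.tsx", "README.md"], JOB_PATHS))  # ['frontend-test']
print(jobs_to_run(["docs/index.md"], JOB_PATHS))                      # []
```

A documentation-only change triggers neither test job, which is where the time saving comes from.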

Build Optimization

Incremental Builds

Turborepo (monorepo):

- run: npx turbo run build test lint --filter=[HEAD^1]

Nx (monorepo):

- run: npx nx affected --target=build --base=origin/main
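
Both tools rest on the same idea: find the projects touched by a change, then add everything that depends on them. The sketch below illustrates that idea on a hypothetical project graph; it is not the tools' actual implementation.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def affected_projects(changed: Iterable[str], dependencies: Dict[str, List[str]]) -> List[str]:
    """Changed projects plus everything that depends on them, directly or transitively."""
    # Invert the graph: for each project, who depends on it?
    dependents: Dict[str, Set[str]] = {}
    for project, deps in dependencies.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(project)

    affected = set(changed)
    queue = deque(affected)
    while queue:
        for dependent in dependents.get(queue.popleft(), ()):
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return sorted(affected)


# Hypothetical monorepo: project -> projects it depends on.
graph = {"web": ["ui", "utils"], "api": ["utils"], "ui": [], "utils": [], "docs": []}

print(affected_projects(["ui"], graph))     # ['ui', 'web']
print(affected_projects(["utils"], graph))  # ['api', 'utils', 'web']
```

A change to a leaf such as `docs` rebuilds only that project, while a change to a shared library fans out to every consumer.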

Compiler Optimizations

TypeScript incremental:

{
  "compilerOptions": {
    "incremental": true,
    "tsBuildInfoFile": ".tsbuildinfo"
  }
}

Cache tsbuildinfo:

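- uses: actions/cache@v4
  with:
    path: .tsbuildinfo
    key: ts-build-${{ hashFiles('**/*.ts') }}
    # Any source change alters the key, so fall back to the latest earlier build info
    restore-keys: |
      ts-build-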

Multi-stage Docker Builds

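# Build stage (same base as the production stage so native modules stay compatible)
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
# The build needs devDependencies, so install everything here
RUN npm ci

COPY . .
RUN npm run build
# Drop devDependencies before node_modules is copied into the final image
RUN npm prune --omit=dev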

# Production stage
FROM node:20-alpine
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
CMD ["node", "dist/server.js"]

Build Tool Configuration

Webpack production mode:

module.exports = {
  mode: 'production',
  optimization: {
    minimize: true,
    splitChunks: {
      chunks: 'all'
    }
  }
}

Vite optimization:

export default {
  build: {
    minify: 'terser',
    rollupOptions: {
      output: {
        manualChunks(id) {
          if (id.includes('node_modules')) {
            return 'vendor';
          }
        }
      }
    }
  }
}

Test Optimization

Test Categorization

Run fast tests first:

jobs:
  unit-test:
    runs-on: ubuntu-latest
    steps:
      - run: npm run test:unit  # Fast (1-5 min)

  integration-test:
    needs: unit-test
    runs-on: ubuntu-latest
    steps:
      - run: npm run test:integration  # Medium (5-15 min)

  e2e-test:
    needs: [unit-test, integration-test]
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - run: npm run test:e2e  # Slow (15-30 min)

Selective Test Execution

Run only changed:

# Requires actions/checkout with fetch-depth: 0 so the base branch is available locally
- name: Get changed files
  id: changed
  run: |
    if [ "${{ github.event_name }}" == "pull_request" ]; then
      echo "files=$(git diff --name-only origin/${{ github.base_ref }}...HEAD | tr '\n' ' ')" >> $GITHUB_OUTPUT
    fi

- name: Run affected tests
  if: steps.changed.outputs.files
  run: npm test -- --findRelatedTests ${{ steps.changed.outputs.files }}

Test Fixtures & Data

Reuse test databases:

services:
  postgres:
    image: postgres:15
    env:
      POSTGRES_DB: testdb
      POSTGRES_PASSWORD: testpass
    options: >-
      --health-cmd pg_isready
      --health-interval 10s
      --health-timeout 5s
      --health-retries 5

steps:
  - run: npm test  # All tests share same DB

Snapshot testing:

// Faster than full rendering tests
expect(component).toMatchSnapshot();

Mock External Services

// Instead of hitting real APIs
jest.mock('./api', () => ({
  fetchData: jest.fn(() => Promise.resolve(mockData))
}));

Resource Management

Job Timeouts

Prevent hung jobs:

jobs:
  test:
    timeout-minutes: 30  # Default: 360 (6 hours)

  build:
    timeout-minutes: 15

GitLab:

test:
  timeout: 30m  # Default: 1h

Concurrency Control

GitHub Actions:

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true  # Cancel old runs
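
The effect of a concurrency group with cancellation can be modeled in a few lines. This sketch assumes each earlier run is still in progress when a newer run in the same group starts; the run ids and group names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def surviving_runs(runs: Iterable[Tuple[int, str]]) -> List[int]:
    """Given (run_id, group) pairs in start order, return the ids that are not cancelled.

    Models cancel-in-progress: a new run cancels any earlier run in the same group.
    """
    latest: Dict[str, int] = {}
    for run_id, group in runs:
        latest[group] = run_id  # the newer run replaces the older one in its group
    return sorted(latest.values())


runs = [
    (101, "ci-refs/heads/feature"),
    (102, "ci-refs/heads/main"),
    (103, "ci-refs/heads/feature"),  # second push to the same branch
]
print(surviving_runs(runs))  # [102, 103]
```

Because the group name includes the ref, pushes to different branches never cancel each other; only superseded runs on the same branch are dropped.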

GitLab:

workflow:
  auto_cancel:
    on_new_commit: interruptible

job:
  interruptible: true

Resource Allocation

GitLab runner tags:

build:
  tags:
    - high-memory
    - ssd

Kubernetes resource limits:

# GitLab Runner config
[[runners]]
  [runners.kubernetes]
    cpu_request = "1"
    cpu_limit = "2"
    memory_request = "2Gi"
    memory_limit = "4Gi"

Monitoring & Metrics

Track Key Metrics

Build duration:

- name: Track duration
  run: |
    START=$SECONDS
    npm run build
    DURATION=$((SECONDS - START))
    echo "Build took ${DURATION}s"

Cache hit rate:

- uses: actions/cache@v4
  id: cache
  with:
    path: node_modules
    key: ${{ runner.os }}-node-modules-${{ hashFiles('package-lock.json') }}

- name: Cache stats
  run: |
    if [ "${{ steps.cache.outputs.cache-hit }}" == "true" ]; then
      echo "Cache hit!"
    else
      echo "Cache miss"
    fi
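
Individual hit or miss messages become useful once they are aggregated. As a rough sketch, assuming the hit/miss outcomes of recent runs have been collected into a list (how they are collected is left open), the hit rate is:

```python
from typing import Sequence


def cache_hit_rate(results: Sequence[bool]) -> float:
    """Fraction of runs that restored the cache with an exact key match."""
    if not results:
        return 0.0
    return sum(1 for hit in results if hit) / len(results)


# Hypothetical outcomes of the last ten pipeline runs.
recent = [True, True, False, True, True, True, True, True, True, True]
rate = cache_hit_rate(recent)

print(f"{rate:.0%}")  # 90%
print(rate > 0.80)    # True
```

A rate that stays low usually means the key changes too often, for example because it hashes files that change on every commit.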

Performance Regression Detection

Compare against baseline:

- name: Benchmark
  # --silent keeps npm's own banner out of the JSON file
  run: npm run --silent benchmark > results.json

- name: Compare
  run: |
    CURRENT=$(jq '.duration | floor' results.json)  # integer seconds for the shell comparison
    BASELINE=120
    if [ "$CURRENT" -gt $((BASELINE * 120 / 100)) ]; then
      echo "Performance regression: ${CURRENT}s vs ${BASELINE}s baseline"
      exit 1
    fi
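
The comparison step allows a 20% margin over the baseline, so with a 120 s baseline the job fails above 144 s. The same arithmetic as a small Python function, where `is_regression` and its parameter names are illustrative:

```python
def is_regression(current: int, baseline: int, tolerance_percent: int = 20) -> bool:
    """True when `current` exceeds the baseline by more than the allowed percentage."""
    # Integer arithmetic mirrors the shell expression BASELINE * 120 / 100.
    return current * 100 > baseline * (100 + tolerance_percent)


print(is_regression(150, 120))  # True: above the 144 s threshold
print(is_regression(144, 120))  # False: exactly at the threshold
print(is_regression(130, 120))  # False: within the margin
```

Keeping the tolerance as a parameter makes it easy to tighten the margin once benchmark noise is understood.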

External Monitoring

DataDog CI Visibility:

- run: datadog-ci junit upload --service myapp junit-results.xml

BuildPulse (flaky test detection):

- uses: buildpulse/buildpulse-action@v0.11.0
  with:
    account: myaccount
    repository: myrepo
    path: test-results/*.xml

Optimization Checklist

Quick Wins

  • Enable dependency caching
  • Remove unnecessary job dependencies
  • Add job timeouts
  • Enable concurrency cancellation
  • Use npm ci instead of npm install

Medium Impact

  • Implement test sharding
  • Use Docker layer caching
  • Add path-based triggers
  • Split slow test suites
  • Use matrix builds for parallel execution

Advanced

  • Implement incremental builds (Nx, Turborepo)
  • Use remote caching
  • Optimize Docker images (multi-stage, distroless)
  • Implement test impact analysis
  • Set up distributed test execution

Monitoring

  • Track build duration trends
  • Monitor cache hit rates
  • Identify flaky tests
  • Measure test execution time
  • Set up performance regression alerts

Performance Targets

Build times:

  • Lint: < 1 minute
  • Unit tests: < 5 minutes
  • Integration tests: < 15 minutes
  • E2E tests: < 30 minutes
  • Full pipeline (excluding E2E tests that run only on main): < 20 minutes

Resource usage:

  • Cache hit rate: > 80%
  • Job success rate: > 95%
  • Concurrent jobs: Balanced across available runners
  • Queue time: < 2 minutes

Cost optimization:

  • Build minutes used: Monitor monthly trends
  • Storage: Keep artifacts < 7 days unless needed
  • Self-hosted runners: Monitor utilization (target 60-80%)
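
Runner utilization can be estimated as busy time divided by available time. The sketch below shows the arithmetic with made-up numbers; where the busy minutes come from (runner logs, a CI API, a metrics system) is left open.

```python
def runner_utilization(busy_minutes: float, runners: int, window_minutes: float) -> float:
    """Share of available runner time that was spent executing jobs."""
    return busy_minutes / (runners * window_minutes)


# Hypothetical day: 4 runners over an 8-hour window, 1344 busy runner-minutes.
utilization = runner_utilization(1344, runners=4, window_minutes=480)

print(f"{utilization:.0%}")         # 70%
print(0.60 <= utilization <= 0.80)  # True
```

Values well below the target band suggest over-provisioned runners, while values near 100% tend to show up as longer queue times.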