zhongwei/gh-ahmedasmar-devops-claude-skills-ci-cd

Zhongwei Li 1878d01517 Initial commit

2025-11-29 17:51:12 +08:00

CI/CD Pipeline Optimization

Comprehensive guide to improving pipeline performance through caching, parallelization, and smart resource usage.

Caching Strategies
Parallelization Techniques
Build Optimization
Test Optimization
Resource Management
Monitoring & Metrics

Caching Strategies

Dependency Caching

Impact: Can reduce build times by 50-90% when dependency installation dominates the build

GitHub Actions

Node.js/npm:

- uses: actions/cache@v4
  with:
    path: ~/.npm
    key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}
    restore-keys: |
      ${{ runner.os }}-node-

- run: npm ci

Python/pip:

- uses: actions/cache@v4
  with:
    path: ~/.cache/pip
    key: ${{ runner.os }}-pip-${{ hashFiles('**/requirements.txt') }}
    restore-keys: |
      ${{ runner.os }}-pip-

- run: pip install -r requirements.txt

Go modules:

- uses: actions/cache@v4
  with:
    path: |
      ~/.cache/go-build
      ~/go/pkg/mod
    key: ${{ runner.os }}-go-${{ hashFiles('**/go.sum') }}
    restore-keys: |
      ${{ runner.os }}-go-

- run: go build

Rust/Cargo:

- uses: actions/cache@v4
  with:
    path: |
      ~/.cargo/bin/
      ~/.cargo/registry/index/
      ~/.cargo/registry/cache/
      ~/.cargo/git/db/
      target/
    key: ${{ runner.os }}-cargo-${{ hashFiles('**/Cargo.lock') }}
    restore-keys: |
      ${{ runner.os }}-cargo-

- run: cargo build --release

Maven:

- uses: actions/cache@v4
  with:
    path: ~/.m2/repository
    key: ${{ runner.os }}-maven-${{ hashFiles('**/pom.xml') }}
    restore-keys: |
      ${{ runner.os }}-maven-

- run: mvn clean install

GitLab CI

Global cache:

cache:
  key: ${CI_COMMIT_REF_SLUG}
  paths:
    - node_modules/
    - .npm/
    - vendor/

Job-specific cache:

build:
  cache:
    key: build-${CI_COMMIT_REF_SLUG}
    paths:
      - target/
    policy: push  # Upload only

test:
  cache:
    key: build-${CI_COMMIT_REF_SLUG}
    paths:
      - target/
    policy: pull  # Download only

Cache with files checksum:

cache:
  key:
    files:
      - package-lock.json
      - yarn.lock
  paths:
    - node_modules/

Build Artifact Caching

Docker layer caching (GitHub):

- uses: docker/setup-buildx-action@v3

- uses: docker/build-push-action@v5
  with:
    context: .
    cache-from: type=gha
    cache-to: type=gha,mode=max
    push: false
    tags: myapp:latest

Docker layer caching (GitLab):

build:
  image: docker:latest
  services:
    - docker:dind
  variables:
    DOCKER_DRIVER: overlay2
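  script:
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"
    - docker pull $CI_REGISTRY_IMAGE:latest || true
    # BUILDKIT_INLINE_CACHE=1 embeds cache metadata so later builds can reuse the pushed layers
    - docker build --cache-from $CI_REGISTRY_IMAGE:latest --build-arg BUILDKIT_INLINE_CACHE=1 -t $CI_REGISTRY_IMAGE:latest .
    - docker push $CI_REGISTRY_IMAGE:latest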

Gradle build cache:

- uses: actions/cache@v4
  with:
    path: |
      ~/.gradle/caches
      ~/.gradle/wrapper
    key: ${{ runner.os }}-gradle-${{ hashFiles('**/*.gradle*', '**/gradle-wrapper.properties') }}

- run: ./gradlew build --build-cache

Cache Best Practices

Key strategies:

Include OS/platform: ${{ runner.os }}- (GitHub) or ${CI_RUNNER_EXECUTABLE_ARCH} (GitLab)
Hash lock files: hashFiles('**/package-lock.json')
Use restore-keys for fallback matches
Separate caches for different purposes
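
The key strategies above can be sketched in a few lines of Python. This is a simplified model, not the actual lookup used by GitHub Actions or GitLab (which also apply branch scoping and other rules); the function names `cache_key` and `lookup` are hypothetical.

```python
import hashlib
from typing import List, Optional, Tuple


def cache_key(os_name: str, tool: str, lockfile: bytes) -> str:
    """Build a key shaped like '<os>-<tool>-<sha256 of the lock file>'."""
    return f"{os_name}-{tool}-{hashlib.sha256(lockfile).hexdigest()}"


def lookup(key: str, restore_prefixes: List[str], stored: List[str]) -> Tuple[Optional[str], bool]:
    """Return (matched_key, exact_hit); `stored` is ordered oldest to newest."""
    if key in stored:
        return key, True
    for prefix in restore_prefixes:
        # Fall back to the newest stored key that shares the prefix.
        for candidate in reversed(stored):
            if candidate.startswith(prefix):
                return candidate, False
    return None, False


old = cache_key("Linux", "node", b"lock file, version 1")
new = cache_key("Linux", "node", b"lock file, version 2")
stored = [cache_key("macOS", "node", b"lock file, version 1"), old]

print(lookup(old, ["Linux-node-"], stored)[1])  # unchanged lock file: exact hit
print(lookup(new, ["Linux-node-"], stored)[1])  # changed lock file: partial restore
```

A changed lock file produces a new key, so the exact lookup misses, but the restore prefix still recovers the previous Linux cache and only the changed dependencies need to be downloaded.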

Cache invalidation:

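# Version prefix in cache key: bump it (v2 -> v3) to discard all existing caches.
# Avoid per-pipeline values such as CI_PIPELINE_ID, which make every pipeline start cold.
cache:
  key: v2-${CI_COMMIT_REF_SLUG}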

Cache size management:

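GitHub: 10 GB per repository by default; caches not accessed for 7 days are removed, and the least recently used caches are evicted once the limit is exceeded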
GitLab: Configurable per runner

Parallelization Techniques

Job Parallelization

Remove unnecessary dependencies:

# Before - Sequential
jobs:
  lint:
  test:
    needs: lint
  build:
    needs: test

# After - Parallel
jobs:
  lint:
  test:
  build:
    needs: [lint, test]  # Only wait for what's needed
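
To see why removing a dependency helps, the sketch below estimates pipeline wall time from job durations and their `needs` edges. It assumes unlimited runners and no queue time, and the durations are hypothetical minutes, not measurements.

```python
def wall_time(durations: dict, needs: dict) -> int:
    """Finish time of the last job when every job starts as soon as its needs are done."""
    finish_cache = {}

    def finish(job):
        if job not in finish_cache:
            start = max((finish(dep) for dep in needs.get(job, [])), default=0)
            finish_cache[job] = start + durations[job]
        return finish_cache[job]

    return max(finish(job) for job in durations)


durations = {"lint": 2, "test": 8, "build": 5}        # hypothetical minutes
sequential = {"test": ["lint"], "build": ["test"]}    # lint -> test -> build
parallel = {"build": ["lint", "test"]}                # lint and test run side by side

print(wall_time(durations, sequential))
print(wall_time(durations, parallel))
```

With these numbers the sequential layout takes the sum of all three jobs, while the parallel layout takes only the slower of lint and test plus the build.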

Matrix Builds

GitHub Actions:

strategy:
  matrix:
    os: [ubuntu-latest, macos-latest, windows-latest]
    node: [18, 20, 22]
    include:
      - os: ubuntu-latest
        node: 22
        coverage: true
    exclude:
      - os: macos-latest
        node: 18
  fail-fast: false
  max-parallel: 10  # Limit concurrent jobs

GitLab parallel:

test:
  parallel:
    matrix:
      - NODE_VERSION: ['18', '20', '22']
        TEST_SUITE: ['unit', 'integration']
  script:
    - nvm use $NODE_VERSION  # assumes nvm is installed and loaded in the job image
    - npm run test:$TEST_SUITE
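
A matrix expands to the Cartesian product of its axes, minus any excluded combinations. The sketch below counts the jobs produced by the two examples above. It is a simplified model: the helper `expand` is hypothetical, and `include` is not modeled (in the GitHub example it only adds `coverage: true` to an existing combination, so the job count is unchanged).

```python
from itertools import product


def expand(matrix: dict, exclude=()) -> list:
    """Return one dict per job: every combination of axis values not matched by `exclude`."""
    keys = list(matrix)
    combos = [dict(zip(keys, values)) for values in product(*(matrix[k] for k in keys))]
    return [
        combo for combo in combos
        if not any(all(combo.get(k) == v for k, v in rule.items()) for rule in exclude)
    ]


github = {"os": ["ubuntu-latest", "macos-latest", "windows-latest"], "node": [18, 20, 22]}
gitlab = {"NODE_VERSION": ["18", "20", "22"], "TEST_SUITE": ["unit", "integration"]}

print(len(expand(github, exclude=[{"os": "macos-latest", "node": 18}])))
print(len(expand(gitlab)))
```

Counting jobs this way before adding an axis is a cheap check against `max-parallel` and runner capacity, since each new axis multiplies the job count.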

Test Splitting

Jest sharding:

strategy:
  matrix:
    shard: [1, 2, 3, 4]
steps:
  - run: npm test -- --shard=${{ matrix.shard }}/4

Playwright sharding:

strategy:
  matrix:
    shardIndex: [1, 2, 3, 4]
    shardTotal: [4]
steps:
  - run: npx playwright test --shard=${{ matrix.shardIndex }}/${{ matrix.shardTotal }}

Pytest splitting (requires the pytest-split plugin):

strategy:
  matrix:
    group: [1, 2, 3, 4]
steps:
  - run: pytest --splits 4 --group ${{ matrix.group }}
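
Sharding pays off most when the shards finish at about the same time. The sketch below shows one way to balance shards by recorded test duration, using a greedy "slowest test onto the lightest shard" rule. It is an illustration only and is not how Jest, Playwright, or pytest-split assign tests internally; the test names and durations are made up.

```python
def split_by_duration(test_durations: dict, shards: int) -> list:
    """Greedy split: place the slowest remaining test on the currently lightest shard."""
    buckets = [{"tests": [], "total": 0} for _ in range(shards)]
    # Sort by duration (descending), then by name so the result is deterministic.
    ordered = sorted(test_durations.items(), key=lambda item: (-item[1], item[0]))
    for name, seconds in ordered:
        lightest = min(buckets, key=lambda bucket: bucket["total"])
        lightest["tests"].append(name)
        lightest["total"] += seconds
    return buckets


durations = {"a": 50, "b": 30, "c": 20, "d": 20, "e": 10}  # hypothetical seconds
for bucket in split_by_duration(durations, shards=2):
    print(bucket["total"], bucket["tests"])
```

The slowest shard determines the wall time of the whole test stage, so the goal is to minimize the largest `total` rather than to give each shard the same number of tests.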

Conditional Execution

Path-based:

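# The push payload exposed to workflows omits per-commit file lists, so filter on
# paths at the workflow level (use a second workflow with 'backend/**' for backend-test)
on:
  push:
    paths:
      - 'frontend/**'
  pull_request:
    paths:
      - 'frontend/**'

jobs:
  frontend-test: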

GitLab rules:

frontend-test:
  rules:
    - changes:
        - frontend/**/*

backend-test:
  rules:
    - changes:
        - backend/**/*
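
The decision behind path-based execution is a mapping from changed files to the jobs that cover them. A minimal sketch follows, using plain directory-prefix matching as a stand-in for the glob rules of either CI system; `jobs_to_run` and the file names are hypothetical.

```python
def jobs_to_run(changed_files: list, job_dirs: dict) -> list:
    """Select jobs whose directory prefix covers at least one changed file."""
    return sorted(
        job
        for job, prefix in job_dirs.items()
        if any(path.startswith(prefix) for path in changed_files)
    )


job_dirs = {"frontend-test": "frontend/", "backend-test": "backend/"}

print(jobs_to_run(["frontend/src/App.tsx", "README.md"], job_dirs))
print(jobs_to_run(["README.md"], job_dirs))
```

A change that touches neither directory selects no jobs, which is the saving this technique is after; shared files such as lock files usually need to be listed under every job that depends on them.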

Build Optimization

Incremental Builds

Turborepo (monorepo):

- run: npx turbo run build test lint --filter=[HEAD^1]

Nx (monorepo):

- run: npx nx affected --target=build --base=origin/main

Compiler Optimizations

TypeScript incremental:

{
  "compilerOptions": {
    "incremental": true,
    "tsBuildInfoFile": ".tsbuildinfo"
  }
}

Cache tsbuildinfo:

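- uses: actions/cache@v4
  with:
    path: .tsbuildinfo
    key: ts-build-${{ hashFiles('**/*.ts') }}
    # Fall back to the latest build info so a source change still compiles incrementally
    restore-keys: |
      ts-build-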

Multi-stage Docker Builds

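# Build stage
FROM node:20 AS builder
WORKDIR /app
COPY package*.json ./
# Install all dependencies; the build step usually needs devDependencies
RUN npm ci

COPY . .
RUN npm run build
# Drop devDependencies before node_modules is copied into the final image
RUN npm prune --omit=dev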

# Production stage
FROM node:20-alpine
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
CMD ["node", "dist/server.js"]

Build Tool Configuration

Webpack production mode:

module.exports = {
  mode: 'production',
  optimization: {
    minimize: true,
    splitChunks: {
      chunks: 'all'
    }
  }
}

Vite optimization:

export default {
  build: {
    minify: 'terser',  // requires the terser package; the default esbuild minifier is faster
    rollupOptions: {
      output: {
        manualChunks(id) {
          if (id.includes('node_modules')) {
            return 'vendor';
          }
        }
      }
    }
  }
}

Test Optimization

Test Categorization

Run fast tests first:

jobs:
  unit-test:
    runs-on: ubuntu-latest
    steps:
      - run: npm run test:unit  # Fast (1-5 min)

  integration-test:
    needs: unit-test
    runs-on: ubuntu-latest
    steps:
      - run: npm run test:integration  # Medium (5-15 min)

  e2e-test:
    needs: [unit-test, integration-test]
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - run: npm run test:e2e  # Slow (15-30 min)

Selective Test Execution

Run only changed:

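- name: Get changed files
  id: changed
  if: github.event_name == 'pull_request'
  run: |
    # Needs actions/checkout with fetch-depth: 0 so the base branch is available
    echo "files=$(git diff --name-only origin/${{ github.base_ref }}...HEAD | tr '\n' ' ')" >> "$GITHUB_OUTPUT"

- name: Run affected tests
  if: steps.changed.outputs.files
  env:
    # Pass file names through the environment instead of interpolating them into the script
    CHANGED_FILES: ${{ steps.changed.outputs.files }}
  run: npm test -- --findRelatedTests $CHANGED_FILES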

Test Fixtures & Data

Reuse test databases:

services:
  postgres:
    image: postgres:15
    env:
      POSTGRES_DB: testdb
      POSTGRES_PASSWORD: testpass
    options: >-
      --health-cmd pg_isready
      --health-interval 10s
      --health-timeout 5s
      --health-retries 5

steps:
  - run: npm test  # All tests share same DB

Snapshot testing:

// Faster than full rendering tests
expect(component).toMatchSnapshot();

Mock External Services

// Instead of hitting real APIs
jest.mock('./api', () => ({
  fetchData: jest.fn(() => Promise.resolve(mockData))
}));

Resource Management

Job Timeouts

Prevent hung jobs:

jobs:
  test:
    timeout-minutes: 30  # Default: 360 (6 hours)

  build:
    timeout-minutes: 15

GitLab:

test:
  timeout: 30m  # Default: 1h

Concurrency Control

GitHub Actions:

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true  # Cancel old runs

GitLab:

workflow:
  auto_cancel:
    on_new_commit: interruptible

job:
  interruptible: true
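
Both configurations above follow the same rule: within a concurrency group, a newer run supersedes older ones. The sketch below models that rule under the simplifying assumption that every earlier run in a group is still in progress when the next one starts; the run IDs and group names are hypothetical.

```python
def surviving_runs(runs: list) -> list:
    """runs: (run_id, group) pairs in start order. Keep only the newest run per group."""
    latest = {}
    for run_id, group in runs:
        latest[group] = run_id  # a newer run in the same group cancels the older one
    return sorted(latest.values())


runs = [
    (1, "ci-refs/heads/main"),
    (2, "ci-refs/heads/feature"),
    (3, "ci-refs/heads/feature"),
    (4, "ci-refs/heads/main"),
]
print(surviving_runs(runs))
```

Grouping by workflow and ref means pushes to different branches never cancel each other, while rapid pushes to the same branch leave only the latest run consuming runner time.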

Resource Allocation

GitLab runner tags:

build:
  tags:
    - high-memory
    - ssd

Kubernetes resource limits:

# GitLab Runner config
[[runners]]
  [runners.kubernetes]
    cpu_request = "1"
    cpu_limit = "2"
    memory_request = "2Gi"
    memory_limit = "4Gi"

Monitoring & Metrics

Track Key Metrics

Build duration:

- name: Track duration
  run: |
    START=$SECONDS
    npm run build
    DURATION=$((SECONDS - START))
    echo "Build took ${DURATION}s"

Cache hit rate:

- uses: actions/cache@v4
  id: cache
  with:
    path: node_modules
    key: ${{ runner.os }}-node-modules-${{ hashFiles('package-lock.json') }}

- name: Cache stats
  run: |
    if [ "${{ steps.cache.outputs.cache-hit }}" == "true" ]; then
      echo "Cache hit!"
    else
      echo "Cache miss"
    fi
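
The step above reports a hit or miss for a single run. To track the hit rate over time, those per-run outcomes would need to be collected somewhere and aggregated. A minimal sketch, assuming the outcomes are already available as a list of booleans (how they are collected is left open):

```python
def cache_hit_rate(outcomes: list) -> float:
    """Share of runs with an exact cache hit; `outcomes` holds one boolean per run."""
    if not outcomes:
        return 0.0
    return sum(outcomes) / len(outcomes)


recent_runs = [True] * 9 + [False]  # hypothetical: 9 hits, 1 miss
rate = cache_hit_rate(recent_runs)
print(f"hit rate: {rate:.0%}")
```

A persistently low rate often points at a key that changes too often, for example one that hashes files unrelated to the cached content.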

Performance Regression Detection

Compare against baseline:

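- name: Benchmark
  # --silent keeps npm's own banner lines out of the JSON file
  run: npm run --silent benchmark > results.json

- name: Compare
  run: |
    # Round down to whole seconds; the shell test below only handles integers
    CURRENT=$(jq '.duration | floor' results.json)
    BASELINE=120
    # Fail when the run is more than 20% slower than the baseline
    if [ "$CURRENT" -gt $((BASELINE * 120 / 100)) ]; then
      echo "Performance regression: ${CURRENT}s vs ${BASELINE}s baseline"
      exit 1
    fi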
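
The same threshold check can be written so that it handles fractional durations and a configurable tolerance. This is a sketch of the arithmetic only; `is_regression` is a hypothetical helper and the 20% tolerance mirrors the shell example.

```python
def is_regression(current_s: float, baseline_s: float, tolerance_pct: int = 20) -> bool:
    """True when `current_s` exceeds the baseline by more than `tolerance_pct` percent."""
    # Compare in scaled form to avoid rounding surprises from multiplying by 1.2.
    return current_s * 100 > baseline_s * (100 + tolerance_pct)


baseline = 120  # seconds, as in the shell example
print(is_regression(144, baseline))    # exactly at the threshold
print(is_regression(144.5, baseline))  # just over the threshold
```

With a 120 s baseline the threshold is 144 s: a run at exactly 144 s passes, and anything slower fails.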

External Monitoring

DataDog CI Visibility:

- run: datadog-ci junit upload --service myapp junit-results.xml

BuildPulse (flaky test detection):

- uses: buildpulse/buildpulse-action@v0.11.0
  with:
    account: myaccount
    repository: myrepo
    path: test-results/*.xml

Optimization Checklist

Quick Wins

Enable dependency caching
Remove unnecessary job dependencies
Add job timeouts
Enable concurrency cancellation
Use npm ci instead of npm install

Medium Impact

Implement test sharding
Use Docker layer caching
Add path-based triggers
Split slow test suites
Use matrix builds for parallel execution

Advanced

Implement incremental builds (Nx, Turborepo)
Use remote caching
Optimize Docker images (multi-stage, distroless)
Implement test impact analysis
Set up distributed test execution

Monitoring

Track build duration trends
Monitor cache hit rates
Identify flaky tests
Measure test execution time
Set up performance regression alerts

Performance Targets

Build times:

Lint: < 1 minute
Unit tests: < 5 minutes
Integration tests: < 15 minutes
E2E tests: < 30 minutes
Full pipeline: < 20 minutes (on branches where E2E tests are skipped)
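
These targets are easy to check automatically once stage durations are recorded. A small sketch, assuming the durations are available in minutes under the hypothetical stage names below:

```python
TARGET_MINUTES = {"lint": 1, "unit": 5, "integration": 15, "e2e": 30}


def over_budget(measured: dict) -> dict:
    """Return the stages whose measured minutes are not below their target."""
    return {
        stage: minutes
        for stage, minutes in measured.items()
        if stage in TARGET_MINUTES and minutes >= TARGET_MINUTES[stage]
    }


measured = {"lint": 0.7, "unit": 6.2, "integration": 11.0, "e2e": 24.5}  # hypothetical
print(over_budget(measured))
```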

Resource usage:

Cache hit rate: > 80%
Job success rate: > 95%
Concurrent jobs: Balanced across available runners
Queue time: < 2 minutes

Cost optimization:

Build minutes used: Monitor monthly trends
Storage: Keep artifacts < 7 days unless needed
Self-hosted runners: Monitor utilization (target 60-80%)
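
Runner utilization can be estimated as busy time divided by available capacity over a window. The sketch below checks a value against the 60-80% band mentioned above; the function names and the sample numbers are illustrative assumptions.

```python
def utilization(busy_minutes: float, runners: int, window_minutes: float) -> float:
    """Fraction of total runner capacity spent executing jobs."""
    capacity = runners * window_minutes
    return busy_minutes / capacity if capacity else 0.0


def in_target_band(value: float, low: float = 0.60, high: float = 0.80) -> bool:
    """True when utilization falls inside the target band (inclusive)."""
    return low <= value <= high


# Hypothetical: 4 runners over an 8-hour window, 1344 busy minutes in total
u = utilization(busy_minutes=1344, runners=4, window_minutes=480)
print(f"{u:.0%}", in_target_band(u))
```

Values below the band suggest paying for idle runners, while values above it tend to show up as longer queue times.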
