gh-ahmedasmar-devops-claude…/references/optimization.md

# CI/CD Pipeline Optimization

Comprehensive guide to improving pipeline performance through caching, parallelization, and smart resource usage.

## Table of Contents

- [Caching Strategies](#caching-strategies)
- [Parallelization Techniques](#parallelization-techniques)
- [Build Optimization](#build-optimization)
- [Test Optimization](#test-optimization)
- [Resource Management](#resource-management)
- [Monitoring & Metrics](#monitoring--metrics)

---

## Caching Strategies

### Dependency Caching

**Impact:** Can reduce build times by 50-90%

#### GitHub Actions

**Node.js/npm:**
```yaml
- uses: actions/cache@v4
  with:
    path: ~/.npm
    key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}
    restore-keys: |
      ${{ runner.os }}-node-

- run: npm ci
```

**Python/pip:**
```yaml
- uses: actions/cache@v4
  with:
    path: ~/.cache/pip
    key: ${{ runner.os }}-pip-${{ hashFiles('**/requirements.txt') }}
    restore-keys: |
      ${{ runner.os }}-pip-

- run: pip install -r requirements.txt
```

**Go modules:**
```yaml
- uses: actions/cache@v4
  with:
    path: |
      ~/.cache/go-build
      ~/go/pkg/mod
    key: ${{ runner.os }}-go-${{ hashFiles('**/go.sum') }}
    restore-keys: |
      ${{ runner.os }}-go-

- run: go build
```

**Rust/Cargo:**
```yaml
- uses: actions/cache@v4
  with:
    path: |
      ~/.cargo/bin/
      ~/.cargo/registry/index/
      ~/.cargo/registry/cache/
      ~/.cargo/git/db/
      target/
    key: ${{ runner.os }}-cargo-${{ hashFiles('**/Cargo.lock') }}
    restore-keys: |
      ${{ runner.os }}-cargo-

- run: cargo build --release
```

**Maven:**
```yaml
- uses: actions/cache@v4
  with:
    path: ~/.m2/repository
    key: ${{ runner.os }}-maven-${{ hashFiles('**/pom.xml') }}
    restore-keys: |
      ${{ runner.os }}-maven-

- run: mvn clean install
```

#### GitLab CI

**Global cache:**
```yaml
cache:
  key: ${CI_COMMIT_REF_SLUG}
  paths:
    - node_modules/
    - .npm/
    - vendor/
```

**Job-specific cache:**
```yaml
build:
  cache:
    key: build-${CI_COMMIT_REF_SLUG}
    paths:
      - target/
    policy: push  # Upload only

test:
  cache:
    key: build-${CI_COMMIT_REF_SLUG}
    paths:
      - target/
    policy: pull  # Download only
```

**Cache with files checksum:**
```yaml
cache:
  key:
    files:
      - package-lock.json
      - yarn.lock
  paths:
    - node_modules/
```

### Build Artifact Caching

**Docker layer caching (GitHub):**
```yaml
- uses: docker/setup-buildx-action@v3

- uses: docker/build-push-action@v5
  with:
    context: .
    cache-from: type=gha
    cache-to: type=gha,mode=max
    push: false
    tags: myapp:latest
```

**Docker layer caching (GitLab):**
```yaml
build:
  image: docker:latest
  services:
    - docker:dind
  variables:
    DOCKER_DRIVER: overlay2
  script:
    - docker pull $CI_REGISTRY_IMAGE:latest || true
    - docker build --cache-from $CI_REGISTRY_IMAGE:latest -t $CI_REGISTRY_IMAGE:latest .
    - docker push $CI_REGISTRY_IMAGE:latest
```

**Gradle build cache:**
```yaml
- uses: actions/cache@v4
  with:
    path: |
      ~/.gradle/caches
      ~/.gradle/wrapper
    key: ${{ runner.os }}-gradle-${{ hashFiles('**/*.gradle*', '**/gradle-wrapper.properties') }}

- run: ./gradlew build --build-cache
```

### Cache Best Practices

**Key strategies:**
- Include OS/platform: `${{ runner.os }}-` or `${CI_RUNNER_OS}`
- Hash lock files: `hashFiles('**/package-lock.json')`
- Use restore-keys for fallback matches
- Separate caches for different purposes

**Cache invalidation:**
```yaml
# Version in cache key
cache:
  key: v2-${CI_COMMIT_REF_SLUG}-${CI_PIPELINE_ID}
```

**Cache size management:**
- GitHub: 10GB per repository (LRU eviction after 7 days)
- GitLab: Configurable per runner

---

## Parallelization Techniques

### Job Parallelization

**Remove unnecessary dependencies:**
```yaml
# Before - Sequential
jobs:
  lint:
  test:
    needs: lint
  build:
    needs: test

# After - Parallel
jobs:
  lint:
  test:
  build:
    needs: [lint, test]  # Only wait for what's needed
```

### Matrix Builds

**GitHub Actions:**
```yaml
strategy:
  matrix:
    os: [ubuntu-latest, macos-latest, windows-latest]
    node: [18, 20, 22]
    include:
      - os: ubuntu-latest
        node: 22
        coverage: true
    exclude:
      - os: macos-latest
        node: 18
  fail-fast: false
  max-parallel: 10  # Limit concurrent jobs
```

**GitLab parallel:**
```yaml
test:
  parallel:
    matrix:
      - NODE_VERSION: ['18', '20', '22']
        TEST_SUITE: ['unit', 'integration']
  script:
    - nvm use $NODE_VERSION
    - npm run test:$TEST_SUITE
```

### Test Splitting

**Jest sharding:**
```yaml
strategy:
  matrix:
    shard: [1, 2, 3, 4]
steps:
  - run: npm test -- --shard=${{ matrix.shard }}/4
```

**Playwright sharding:**
```yaml
strategy:
  matrix:
    shardIndex: [1, 2, 3, 4]
    shardTotal: [4]
steps:
  - run: npx playwright test --shard=${{ matrix.shardIndex }}/${{ matrix.shardTotal }}
```

**Pytest splitting:**
```yaml
strategy:
  matrix:
    group: [1, 2, 3, 4]
steps:
  - run: pytest --splits 4 --group ${{ matrix.group }}
```

### Conditional Execution

**Path-based:**
```yaml
jobs:
  frontend-test:
    if: contains(github.event.head_commit.modified, 'frontend/')

  backend-test:
    if: contains(github.event.head_commit.modified, 'backend/')
```

**GitLab rules:**
```yaml
frontend-test:
  rules:
    - changes:
        - frontend/**/*

backend-test:
  rules:
    - changes:
        - backend/**/*
```

---

## Build Optimization

### Incremental Builds

**Turb

orepo (monorepo):**
```yaml
- run: npx turbo run build test lint --filter=[HEAD^1]
```

**Nx (monorepo):**
```yaml
- run: npx nx affected --target=build --base=origin/main
```

### Compiler Optimizations

**TypeScript incremental:**
```json
{
  "compilerOptions": {
    "incremental": true,
    "tsBuildInfoFile": ".tsbuildinfo"
  }
}
```

**Cache tsbuildinfo:**
```yaml
- uses: actions/cache@v4
  with:
    path: .tsbuildinfo
    key: ts-build-${{ hashFiles('**/*.ts') }}
```

### Multi-stage Docker Builds

```dockerfile
# Build stage
FROM node:20 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production

COPY . .
RUN npm run build

# Production stage
FROM node:20-alpine
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
CMD ["node", "dist/server.js"]
```

### Build Tool Configuration

**Webpack production mode:**
```javascript
module.exports = {
  mode: 'production',
  optimization: {
    minimize: true,
    splitChunks: {
      chunks: 'all'
    }
  }
}
```

**Vite optimization:**
```javascript
export default {
  build: {
    minify: 'terser',
    rollupOptions: {
      output: {
        manualChunks(id) {
          if (id.includes('node_modules')) {
            return 'vendor';
          }
        }
      }
    }
  }
}
```

---

## Test Optimization

### Test Categorization

**Run fast tests first:**
```yaml
jobs:
  unit-test:
    runs-on: ubuntu-latest
    steps:
      - run: npm run test:unit  # Fast (1-5 min)

  integration-test:
    needs: unit-test
    runs-on: ubuntu-latest
    steps:
      - run: npm run test:integration  # Medium (5-15 min)

  e2e-test:
    needs: [unit-test, integration-test]
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - run: npm run test:e2e  # Slow (15-30 min)
```

### Selective Test Execution

**Run only changed:**
```yaml
- name: Get changed files
  id: changed
  run: |
    if [ "${{ github.event_name }}" == "pull_request" ]; then
      echo "files=$(git diff --name-only origin/${{ github.base_ref }}...HEAD | tr '\n' ' ')" >> $GITHUB_OUTPUT
    fi

- name: Run affected tests
  if: steps.changed.outputs.files
  run: npm test -- --findRelatedTests ${{ steps.changed.outputs.files }}
```

### Test Fixtures & Data

**Reuse test databases:**
```yaml
services:
  postgres:
    image: postgres:15
    env:
      POSTGRES_DB: testdb
      POSTGRES_PASSWORD: testpass
    options: >-
      --health-cmd pg_isready
      --health-interval 10s
      --health-timeout 5s
      --health-retries 5

steps:
  - run: npm test  # All tests share same DB
```

**Snapshot testing:**
```javascript
// Faster than full rendering tests
expect(component).toMatchSnapshot();
```

### Mock External Services

```javascript
// Instead of hitting real APIs
jest.mock('./api', () => ({
  fetchData: jest.fn(() => Promise.resolve(mockData))
}));
```

---

## Resource Management

### Job Timeouts

**Prevent hung jobs:**
```yaml
jobs:
  test:
    timeout-minutes: 30  # Default: 360 (6 hours)

  build:
    timeout-minutes: 15
```

**GitLab:**
```yaml
test:
  timeout: 30m  # Default: 1h
```

### Concurrency Control

**GitHub Actions:**
```yaml
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true  # Cancel old runs
```

**GitLab:**
```yaml
workflow:
  auto_cancel:
    on_new_commit: interruptible

job:
  interruptible: true
```

### Resource Allocation

**GitLab runner tags:**
```yaml
build:
  tags:
    - high-memory
    - ssd
```

**Kubernetes resource limits:**
```yaml
# GitLab Runner config
[[runners]]
  [runners.kubernetes]
    cpu_request = "1"
    cpu_limit = "2"
    memory_request = "2Gi"
    memory_limit = "4Gi"
```

---

## Monitoring & Metrics

### Track Key Metrics

**Build duration:**
```yaml
- name: Track duration
  run: |
    START=$SECONDS
    npm run build
    DURATION=$((SECONDS - START))
    echo "Build took ${DURATION}s"
```

**Cache hit rate:**
```yaml
- uses: actions/cache@v4
  id: cache
  with:
    path: node_modules
    key: ${{ hashFiles('package-lock.json') }}

- name: Cache stats
  run: |
    if [ "${{ steps.cache.outputs.cache-hit }}" == "true" ]; then
      echo "Cache hit!"
    else
      echo "Cache miss"
    fi
```

### Performance Regression Detection

**Compare against baseline:**
```yaml
- name: Benchmark
  run: npm run benchmark > results.json

- name: Compare
  run: |
    CURRENT=$(jq '.duration' results.json)
    BASELINE=120
    if [ $CURRENT -gt $((BASELINE * 120 / 100)) ]; then
      echo "Performance regression: ${CURRENT}s vs ${BASELINE}s baseline"
      exit 1
    fi
```

### External Monitoring

**DataDog CI Visibility:**
```yaml
- run: datadog-ci junit upload --service myapp junit-results.xml
```

**BuildPulse (flaky test detection):**
```yaml
- uses: buildpulse/buildpulse-action@v0.11.0
  with:
    account: myaccount
    repository: myrepo
    path: test-results/*.xml
```

---

## Optimization Checklist

### Quick Wins
- [ ] Enable dependency caching
- [ ] Remove unnecessary job dependencies
- [ ] Add job timeouts
- [ ] Enable concurrency cancellation
- [ ] Use `npm ci` instead of `npm install`

### Medium Impact
- [ ] Implement test sharding
- [ ] Use Docker layer caching
- [ ] Add path-based triggers
- [ ] Split slow test suites
- [ ] Use matrix builds for parallel execution

### Advanced
- [ ] Implement incremental builds (Nx, Turborepo)
- [ ] Use remote caching
- [ ] Optimize Docker images (multi-stage, distroless)
- [ ] Implement test impact analysis
- [ ] Set up distributed test execution

### Monitoring
- [ ] Track build duration trends
- [ ] Monitor cache hit rates
- [ ] Identify flaky tests
- [ ] Measure test execution time
- [ ] Set up performance regression alerts

---

## Performance Targets

**Build times:**
- Lint: < 1 minute
- Unit tests: < 5 minutes
- Integration tests: < 15 minutes
- E2E tests: < 30 minutes
- Full pipeline: < 20 minutes

**Resource usage:**
- Cache hit rate: > 80%
- Job success rate: > 95%
- Concurrent jobs: Balanced across available runners
- Queue time: < 2 minutes

**Cost optimization:**
- Build minutes used: Monitor monthly trends
- Storage: Keep artifacts < 7 days unless needed
- Self-hosted runners: Monitor utilization (target 60-80%)