Files
gh-ahmedasmar-devops-claude…/references/optimization.md
2025-11-29 17:51:12 +08:00

652 lines
12 KiB
Markdown

# CI/CD Pipeline Optimization
Comprehensive guide to improving pipeline performance through caching, parallelization, and smart resource usage.
## Table of Contents
- [Caching Strategies](#caching-strategies)
- [Parallelization Techniques](#parallelization-techniques)
- [Build Optimization](#build-optimization)
- [Test Optimization](#test-optimization)
- [Resource Management](#resource-management)
- [Monitoring & Metrics](#monitoring--metrics)
---
## Caching Strategies
### Dependency Caching
**Impact:** Can reduce build times by 50-90%
#### GitHub Actions
**Node.js/npm:**
```yaml
- uses: actions/cache@v4
with:
path: ~/.npm
key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}
restore-keys: |
${{ runner.os }}-node-
- run: npm ci
```
**Python/pip:**
```yaml
- uses: actions/cache@v4
with:
path: ~/.cache/pip
key: ${{ runner.os }}-pip-${{ hashFiles('**/requirements.txt') }}
restore-keys: |
${{ runner.os }}-pip-
- run: pip install -r requirements.txt
```
**Go modules:**
```yaml
- uses: actions/cache@v4
with:
path: |
~/.cache/go-build
~/go/pkg/mod
key: ${{ runner.os }}-go-${{ hashFiles('**/go.sum') }}
restore-keys: |
${{ runner.os }}-go-
- run: go build
```
**Rust/Cargo:**
```yaml
- uses: actions/cache@v4
with:
path: |
~/.cargo/bin/
~/.cargo/registry/index/
~/.cargo/registry/cache/
~/.cargo/git/db/
target/
key: ${{ runner.os }}-cargo-${{ hashFiles('**/Cargo.lock') }}
restore-keys: |
${{ runner.os }}-cargo-
- run: cargo build --release
```
**Maven:**
```yaml
- uses: actions/cache@v4
with:
path: ~/.m2/repository
key: ${{ runner.os }}-maven-${{ hashFiles('**/pom.xml') }}
restore-keys: |
${{ runner.os }}-maven-
- run: mvn clean install
```
#### GitLab CI
**Global cache:**
```yaml
cache:
key: ${CI_COMMIT_REF_SLUG}
paths:
- node_modules/
- .npm/
- vendor/
```
**Job-specific cache:**
```yaml
build:
cache:
key: build-${CI_COMMIT_REF_SLUG}
paths:
- target/
policy: push # Upload only
test:
cache:
key: build-${CI_COMMIT_REF_SLUG}
paths:
- target/
policy: pull # Download only
```
**Cache with files checksum:**
```yaml
cache:
key:
files:
- package-lock.json
- yarn.lock
paths:
- node_modules/
```
### Build Artifact Caching
**Docker layer caching (GitHub):**
```yaml
- uses: docker/setup-buildx-action@v3
- uses: docker/build-push-action@v5
with:
context: .
cache-from: type=gha
cache-to: type=gha,mode=max
push: false
tags: myapp:latest
```
**Docker layer caching (GitLab):**
```yaml
build:
image: docker:latest
services:
- docker:dind
variables:
DOCKER_DRIVER: overlay2
script:
- docker pull $CI_REGISTRY_IMAGE:latest || true
- docker build --cache-from $CI_REGISTRY_IMAGE:latest -t $CI_REGISTRY_IMAGE:latest .
- docker push $CI_REGISTRY_IMAGE:latest
```
**Gradle build cache:**
```yaml
- uses: actions/cache@v4
with:
path: |
~/.gradle/caches
~/.gradle/wrapper
key: ${{ runner.os }}-gradle-${{ hashFiles('**/*.gradle*', '**/gradle-wrapper.properties') }}
- run: ./gradlew build --build-cache
```
### Cache Best Practices
**Key strategies:**
- Include OS/platform: `${{ runner.os }}-` or `${CI_RUNNER_OS}`
- Hash lock files: `hashFiles('**/package-lock.json')`
- Use restore-keys for fallback matches
- Separate caches for different purposes
**Cache invalidation:**
```yaml
# Version in cache key
cache:
key: v2-${CI_COMMIT_REF_SLUG}-${CI_PIPELINE_ID}
```
**Cache size management:**
- GitHub: 10GB per repository (LRU eviction after 7 days)
- GitLab: Configurable per runner
---
## Parallelization Techniques
### Job Parallelization
**Remove unnecessary dependencies:**
```yaml
# Before - Sequential
jobs:
lint:
test:
needs: lint
build:
needs: test
# After - Parallel
jobs:
lint:
test:
build:
needs: [lint, test] # Only wait for what's needed
```
### Matrix Builds
**GitHub Actions:**
```yaml
strategy:
matrix:
os: [ubuntu-latest, macos-latest, windows-latest]
node: [18, 20, 22]
include:
- os: ubuntu-latest
node: 22
coverage: true
exclude:
- os: macos-latest
node: 18
fail-fast: false
max-parallel: 10 # Limit concurrent jobs
```
**GitLab parallel:**
```yaml
test:
parallel:
matrix:
- NODE_VERSION: ['18', '20', '22']
TEST_SUITE: ['unit', 'integration']
script:
- nvm use $NODE_VERSION
- npm run test:$TEST_SUITE
```
### Test Splitting
**Jest sharding:**
```yaml
strategy:
matrix:
shard: [1, 2, 3, 4]
steps:
- run: npm test -- --shard=${{ matrix.shard }}/4
```
**Playwright sharding:**
```yaml
strategy:
matrix:
shardIndex: [1, 2, 3, 4]
shardTotal: [4]
steps:
- run: npx playwright test --shard=${{ matrix.shardIndex }}/${{ matrix.shardTotal }}
```
**Pytest splitting:**
```yaml
strategy:
matrix:
group: [1, 2, 3, 4]
steps:
- run: pytest --splits 4 --group ${{ matrix.group }}
```
### Conditional Execution
**Path-based:**
```yaml
jobs:
frontend-test:
if: contains(github.event.head_commit.modified, 'frontend/')
backend-test:
if: contains(github.event.head_commit.modified, 'backend/')
```
**GitLab rules:**
```yaml
frontend-test:
rules:
- changes:
- frontend/**/*
backend-test:
rules:
- changes:
- backend/**/*
```
---
## Build Optimization
### Incremental Builds
**Turb
orepo (monorepo):**
```yaml
- run: npx turbo run build test lint --filter=[HEAD^1]
```
**Nx (monorepo):**
```yaml
- run: npx nx affected --target=build --base=origin/main
```
### Compiler Optimizations
**TypeScript incremental:**
```json
{
"compilerOptions": {
"incremental": true,
"tsBuildInfoFile": ".tsbuildinfo"
}
}
```
**Cache tsbuildinfo:**
```yaml
- uses: actions/cache@v4
with:
path: .tsbuildinfo
key: ts-build-${{ hashFiles('**/*.ts') }}
```
### Multi-stage Docker Builds
```dockerfile
# Build stage
FROM node:20 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production
COPY . .
RUN npm run build
# Production stage
FROM node:20-alpine
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
CMD ["node", "dist/server.js"]
```
### Build Tool Configuration
**Webpack production mode:**
```javascript
module.exports = {
mode: 'production',
optimization: {
minimize: true,
splitChunks: {
chunks: 'all'
}
}
}
```
**Vite optimization:**
```javascript
export default {
build: {
minify: 'terser',
rollupOptions: {
output: {
manualChunks(id) {
if (id.includes('node_modules')) {
return 'vendor';
}
}
}
}
}
}
```
---
## Test Optimization
### Test Categorization
**Run fast tests first:**
```yaml
jobs:
unit-test:
runs-on: ubuntu-latest
steps:
- run: npm run test:unit # Fast (1-5 min)
integration-test:
needs: unit-test
runs-on: ubuntu-latest
steps:
- run: npm run test:integration # Medium (5-15 min)
e2e-test:
needs: [unit-test, integration-test]
if: github.ref == 'refs/heads/main'
runs-on: ubuntu-latest
steps:
- run: npm run test:e2e # Slow (15-30 min)
```
### Selective Test Execution
**Run only changed:**
```yaml
- name: Get changed files
id: changed
run: |
if [ "${{ github.event_name }}" == "pull_request" ]; then
echo "files=$(git diff --name-only origin/${{ github.base_ref }}...HEAD | tr '\n' ' ')" >> $GITHUB_OUTPUT
fi
- name: Run affected tests
if: steps.changed.outputs.files
run: npm test -- --findRelatedTests ${{ steps.changed.outputs.files }}
```
### Test Fixtures & Data
**Reuse test databases:**
```yaml
services:
postgres:
image: postgres:15
env:
POSTGRES_DB: testdb
POSTGRES_PASSWORD: testpass
options: >-
--health-cmd pg_isready
--health-interval 10s
--health-timeout 5s
--health-retries 5
steps:
- run: npm test # All tests share same DB
```
**Snapshot testing:**
```javascript
// Faster than full rendering tests
expect(component).toMatchSnapshot();
```
### Mock External Services
```javascript
// Instead of hitting real APIs
jest.mock('./api', () => ({
fetchData: jest.fn(() => Promise.resolve(mockData))
}));
```
---
## Resource Management
### Job Timeouts
**Prevent hung jobs:**
```yaml
jobs:
test:
timeout-minutes: 30 # Default: 360 (6 hours)
build:
timeout-minutes: 15
```
**GitLab:**
```yaml
test:
timeout: 30m # Default: 1h
```
### Concurrency Control
**GitHub Actions:**
```yaml
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true # Cancel old runs
```
**GitLab:**
```yaml
workflow:
auto_cancel:
on_new_commit: interruptible
job:
interruptible: true
```
### Resource Allocation
**GitLab runner tags:**
```yaml
build:
tags:
- high-memory
- ssd
```
**Kubernetes resource limits:**
```yaml
# GitLab Runner config
[[runners]]
[runners.kubernetes]
cpu_request = "1"
cpu_limit = "2"
memory_request = "2Gi"
memory_limit = "4Gi"
```
---
## Monitoring & Metrics
### Track Key Metrics
**Build duration:**
```yaml
- name: Track duration
run: |
START=$SECONDS
npm run build
DURATION=$((SECONDS - START))
echo "Build took ${DURATION}s"
```
**Cache hit rate:**
```yaml
- uses: actions/cache@v4
id: cache
with:
path: node_modules
key: ${{ hashFiles('package-lock.json') }}
- name: Cache stats
run: |
if [ "${{ steps.cache.outputs.cache-hit }}" == "true" ]; then
echo "Cache hit!"
else
echo "Cache miss"
fi
```
### Performance Regression Detection
**Compare against baseline:**
```yaml
- name: Benchmark
run: npm run benchmark > results.json
- name: Compare
run: |
CURRENT=$(jq '.duration' results.json)
BASELINE=120
if [ $CURRENT -gt $((BASELINE * 120 / 100)) ]; then
echo "Performance regression: ${CURRENT}s vs ${BASELINE}s baseline"
exit 1
fi
```
### External Monitoring
**DataDog CI Visibility:**
```yaml
- run: datadog-ci junit upload --service myapp junit-results.xml
```
**BuildPulse (flaky test detection):**
```yaml
- uses: buildpulse/buildpulse-action@v0.11.0
with:
account: myaccount
repository: myrepo
path: test-results/*.xml
```
---
## Optimization Checklist
### Quick Wins
- [ ] Enable dependency caching
- [ ] Remove unnecessary job dependencies
- [ ] Add job timeouts
- [ ] Enable concurrency cancellation
- [ ] Use `npm ci` instead of `npm install`
### Medium Impact
- [ ] Implement test sharding
- [ ] Use Docker layer caching
- [ ] Add path-based triggers
- [ ] Split slow test suites
- [ ] Use matrix builds for parallel execution
### Advanced
- [ ] Implement incremental builds (Nx, Turborepo)
- [ ] Use remote caching
- [ ] Optimize Docker images (multi-stage, distroless)
- [ ] Implement test impact analysis
- [ ] Set up distributed test execution
### Monitoring
- [ ] Track build duration trends
- [ ] Monitor cache hit rates
- [ ] Identify flaky tests
- [ ] Measure test execution time
- [ ] Set up performance regression alerts
---
## Performance Targets
**Build times:**
- Lint: < 1 minute
- Unit tests: < 5 minutes
- Integration tests: < 15 minutes
- E2E tests: < 30 minutes
- Full pipeline: < 20 minutes
**Resource usage:**
- Cache hit rate: > 80%
- Job success rate: > 95%
- Concurrent jobs: Balanced across available runners
- Queue time: < 2 minutes
**Cost optimization:**
- Build minutes used: Monitor monthly trends
- Storage: Keep artifacts < 7 days unless needed
- Self-hosted runners: Monitor utilization (target 60-80%)