# CI/CD Best Practices Comprehensive guide to CI/CD pipeline design, testing strategies, and deployment patterns. ## Table of Contents - [Pipeline Design Principles](#pipeline-design-principles) - [Testing in CI/CD](#testing-in-cicd) - [Deployment Strategies](#deployment-strategies) - [Dependency Management](#dependency-management) - [Artifact & Release Management](#artifact--release-management) - [Platform Patterns](#platform-patterns) --- ## Pipeline Design Principles ### Fast Feedback Loops Design pipelines to provide feedback quickly: **Priority ordering:** 1. Linting and code formatting (seconds) 2. Unit tests (1-5 minutes) 3. Integration tests (5-15 minutes) 4. E2E tests (15-30 minutes) 5. Deployment (varies) **Fail fast pattern:** ```yaml # GitHub Actions jobs: lint: runs-on: ubuntu-latest steps: - run: npm run lint test: needs: lint # Only run if lint passes runs-on: ubuntu-latest steps: - run: npm test e2e: needs: [lint, test] # Run after basic checks ``` ### Job Parallelization Run independent jobs concurrently: **GitHub Actions:** ```yaml jobs: lint: runs-on: ubuntu-latest test: runs-on: ubuntu-latest # No 'needs' - runs in parallel with lint build: needs: [lint, test] # Wait for both runs-on: ubuntu-latest ``` **GitLab CI:** ```yaml stages: - validate - test - build # Jobs in same stage run in parallel unit-test: stage: test integration-test: stage: test e2e-test: stage: test ``` ### Monorepo Strategies **Path-based triggers (GitHub):** ```yaml on: push: paths: - 'services/api/**' - 'shared/**' jobs: api-test: if: | contains(github.event.head_commit.modified, 'services/api/') || contains(github.event.head_commit.modified, 'shared/') ``` **GitLab rules:** ```yaml api-test: rules: - changes: - services/api/**/* - shared/**/* frontend-test: rules: - changes: - services/frontend/**/* - shared/**/* ``` ### Matrix Builds Test across multiple versions/platforms: **GitHub Actions:** ```yaml strategy: matrix: os: [ubuntu-latest, macos-latest, windows-latest] node: [18, 20, 22] include: - os: ubuntu-latest node: 22 coverage: true exclude: - os: windows-latest node: 18 fail-fast: false # See all results ``` **GitLab parallel:** ```yaml test: parallel: matrix: - NODE_VERSION: ['18', '20', '22'] OS: ['ubuntu', 'alpine'] ``` --- ## Testing in CI/CD ### Test Pyramid Strategy Maintain proper test distribution: ``` /\ /E2E\ 10% - Slow, expensive, flaky /-----\ / Int \ 20% - Medium speed /--------\ / Unit \ 70% - Fast, reliable /------------\ ``` **Implementation:** ```yaml jobs: unit-test: runs-on: ubuntu-latest steps: - run: npm run test:unit # Fast, runs on every commit integration-test: runs-on: ubuntu-latest needs: unit-test steps: - run: npm run test:integration # Medium, after unit tests e2e-test: runs-on: ubuntu-latest needs: [unit-test, integration-test] if: github.ref == 'refs/heads/main' # Only on main branch steps: - run: npm run test:e2e # Slow, only on main ``` ### Test Splitting & Parallelization Split large test suites: **GitHub Actions:** ```yaml strategy: matrix: shard: [1, 2, 3, 4] steps: - run: npm test -- --shard=${{ matrix.shard }}/4 ``` **Playwright example:** ```yaml strategy: matrix: shardIndex: [1, 2, 3, 4] shardTotal: [4] steps: - run: npx playwright test --shard=${{ matrix.shardIndex }}/${{ matrix.shardTotal }} ``` ### Code Coverage **Track coverage trends:** ```yaml - name: Run tests with coverage run: npm test -- --coverage - name: Upload coverage uses: codecov/codecov-action@v4 with: files: ./coverage/lcov.info fail_ci_if_error: true # Fail if upload fails - name: Coverage check run: | COVERAGE=$(jq -r '.total.lines.pct' coverage/coverage-summary.json) if (( $(echo "$COVERAGE < 80" | bc -l) )); then echo "Coverage $COVERAGE% is below 80%" exit 1 fi ``` ### Test Environment Management **Docker Compose for services:** ```yaml jobs: integration-test: runs-on: ubuntu-latest steps: - name: Start services run: docker-compose up -d postgres redis - name: Wait for services run: | timeout 30 bash -c 'until docker-compose exec -T postgres pg_isready; do sleep 1; done' - name: Run tests run: npm run test:integration - name: Cleanup if: always() run: docker-compose down ``` **GitLab services:** ```yaml integration-test: services: - postgres:15 - redis:7-alpine variables: POSTGRES_DB: testdb POSTGRES_PASSWORD: password script: - npm run test:integration ``` --- ## Deployment Strategies ### Deployment Patterns **1. Direct Deployment (Simple)** ```yaml deploy: if: github.ref == 'refs/heads/main' steps: - run: | aws s3 sync dist/ s3://${{ secrets.S3_BUCKET }} aws cloudfront create-invalidation --distribution-id ${{ secrets.CF_DIST }} ``` **2. Blue-Green Deployment** ```yaml deploy: steps: - name: Deploy to staging slot run: az webapp deployment slot swap --slot staging --resource-group $RG --name $APP - name: Health check run: | for i in {1..10}; do if curl -f https://$APP.azurewebsites.net/health; then echo "Health check passed" exit 0 fi sleep 10 done exit 1 - name: Rollback on failure if: failure() run: az webapp deployment slot swap --slot staging --resource-group $RG --name $APP ``` **3. Canary Deployment** ```yaml deploy-canary: steps: - run: kubectl set image deployment/app app=myapp:${{ github.sha }} - run: kubectl patch deployment app -p '{"spec":{"replicas":1}}' # 1 pod - run: sleep 300 # Monitor for 5 minutes - run: kubectl scale deployment app --replicas=10 # Scale to full ``` ### Environment Management **GitHub Environments:** ```yaml jobs: deploy-staging: environment: name: staging url: https://staging.example.com steps: - run: ./deploy.sh staging deploy-production: needs: deploy-staging environment: name: production url: https://example.com steps: - run: ./deploy.sh production ``` **Protection rules:** - Require approval for production - Restrict to specific branches - Add deployment delay **GitLab environments:** ```yaml deploy:staging: stage: deploy environment: name: staging url: https://staging.example.com on_stop: stop:staging only: - develop deploy:production: stage: deploy environment: name: production url: https://example.com when: manual # Require manual trigger only: - main ``` ### Deployment Gates **Pre-deployment checks:** ```yaml pre-deploy-checks: steps: - name: Check migration status run: ./scripts/check-migrations.sh - name: Verify dependencies run: npm audit --audit-level=high - name: Check service health run: curl -f https://api.example.com/health ``` **Post-deployment validation:** ```yaml post-deploy-validation: needs: deploy steps: - name: Smoke tests run: npm run test:smoke - name: Monitor errors run: | ERROR_COUNT=$(datadog-api errors --since 5m) if [ $ERROR_COUNT -gt 10 ]; then echo "Error spike detected!" exit 1 fi ``` --- ## Dependency Management ### Lock Files **Always commit lock files:** - `package-lock.json` (npm) - `yarn.lock` (Yarn) - `pnpm-lock.yaml` (pnpm) - `Cargo.lock` (Rust) - `Gemfile.lock` (Ruby) - `poetry.lock` (Python) **Use deterministic install commands:** ```bash # Good - uses lock file npm ci # Not npm install yarn install --frozen-lockfile pnpm install --frozen-lockfile pip install -r requirements.txt # Bad - updates lock file npm install ``` ### Dependency Caching **See optimization.md for detailed caching strategies** Quick reference: - Hash lock files for cache keys - Include OS/platform in cache key - Use restore-keys for partial matches - Separate cache for build artifacts vs dependencies ### Security Scanning **Automated vulnerability checks:** ```yaml security-scan: steps: - name: Dependency audit run: | npm audit --audit-level=high # Or: pip-audit, cargo audit, bundle audit - name: SAST scanning uses: github/codeql-action/analyze@v3 - name: Container scanning run: trivy image myapp:${{ github.sha }} ``` ### Dependency Updates **Automated dependency updates:** - Dependabot (GitHub) - Renovate - GitLab Dependency Scanning **Configuration example (Dependabot):** ```yaml # .github/dependabot.yml version: 2 updates: - package-ecosystem: npm directory: "/" schedule: interval: weekly open-pull-requests-limit: 5 groups: dev-dependencies: dependency-type: development ``` --- ## Artifact & Release Management ### Artifact Strategy **Build once, deploy many:** ```yaml build: steps: - run: npm run build - uses: actions/upload-artifact@v4 with: name: dist-${{ github.sha }} path: dist/ retention-days: 7 deploy-staging: needs: build steps: - uses: actions/download-artifact@v4 with: name: dist-${{ github.sha }} - run: ./deploy.sh staging deploy-production: needs: [build, deploy-staging] steps: - uses: actions/download-artifact@v4 with: name: dist-${{ github.sha }} - run: ./deploy.sh production ``` ### Container Image Management **Multi-stage builds:** ```dockerfile # Build stage FROM node:20-alpine AS builder WORKDIR /app COPY package*.json ./ RUN npm ci --only=production COPY . . RUN npm run build # Production stage FROM node:20-alpine WORKDIR /app COPY --from=builder /app/dist ./dist COPY --from=builder /app/node_modules ./node_modules USER node CMD ["node", "dist/server.js"] ``` **Image tagging strategy:** ```yaml - name: Build and tag images run: | docker build -t myapp:${{ github.sha }} . docker tag myapp:${{ github.sha }} myapp:latest docker tag myapp:${{ github.sha }} myapp:v1.2.3 ``` ### Release Automation **Semantic versioning:** ```yaml release: if: startsWith(github.ref, 'refs/tags/v') steps: - uses: actions/create-release@v1 with: tag_name: ${{ github.ref }} release_name: Release ${{ github.ref }} body: | Changes in this release: ${{ github.event.head_commit.message }} ``` **Changelog generation:** ```yaml - name: Generate changelog run: | git log $(git describe --tags --abbrev=0)..HEAD \ --pretty=format:"- %s (%h)" > CHANGELOG.md ``` --- ## Platform Patterns ### GitHub Actions **Reusable workflows:** ```yaml # .github/workflows/reusable-test.yml on: workflow_call: inputs: node-version: required: true type: string jobs: test: runs-on: ubuntu-latest steps: - uses: actions/setup-node@v4 with: node-version: ${{ inputs.node-version }} - run: npm test ``` **Composite actions:** ```yaml # .github/actions/setup-app/action.yml name: Setup Application runs: using: composite steps: - uses: actions/setup-node@v4 with: node-version: 20 - run: npm ci shell: bash ``` ### GitLab CI **Templates & extends:** ```yaml .test_template: image: node:20 before_script: - npm ci script: - npm test unit-test: extends: .test_template script: - npm run test:unit integration-test: extends: .test_template script: - npm run test:integration ``` **Dynamic child pipelines:** ```yaml generate-pipeline: script: - ./generate-config.sh > pipeline.yml artifacts: paths: - pipeline.yml trigger-pipeline: trigger: include: - artifact: pipeline.yml job: generate-pipeline ``` --- ## Continuous Improvement ### Metrics to Track - **Build duration:** Target < 10 minutes - **Failure rate:** Target < 5% - **Time to recovery:** Target < 1 hour - **Deployment frequency:** Aim for multiple/day - **Lead time:** Commit to production < 1 day ### Pipeline Optimization Checklist - [ ] Jobs run in parallel where possible - [ ] Dependencies are cached - [ ] Test suite is properly split - [ ] Linting fails fast - [ ] Only necessary tests run on PRs - [ ] Artifacts are reused across jobs - [ ] Pipeline has appropriate timeouts - [ ] Flaky tests are identified and fixed - [ ] Security scanning is automated - [ ] Deployment requires approval ### Regular Reviews **Monthly:** - Review build duration trends - Analyze failure patterns - Update dependencies - Review security scan results **Quarterly:** - Audit pipeline efficiency - Review deployment frequency - Update CI/CD tools and actions - Team retrospective on CI/CD pain points