--- name: ci-cd description: CI/CD pipeline design, optimization, DevSecOps security scanning, and troubleshooting. Use for creating workflows, debugging pipeline failures, implementing SAST/DAST/SCA, optimizing build performance, implementing caching strategies, setting up deployments, securing pipelines with OIDC/secrets management, and troubleshooting common issues across GitHub Actions, GitLab CI, and other platforms. --- # CI/CD Pipelines Comprehensive guide for CI/CD pipeline design, optimization, security, and troubleshooting across GitHub Actions, GitLab CI, and other platforms. ## When to Use This Skill Use this skill when: - Creating new CI/CD workflows or pipelines - Debugging pipeline failures or flaky tests - Optimizing slow builds or test suites - Implementing caching strategies - Setting up deployment workflows - Securing pipelines (secrets, OIDC, supply chain) - Implementing DevSecOps security scanning (SAST, DAST, SCA) - Troubleshooting platform-specific issues - Analyzing pipeline performance - Implementing matrix builds or test sharding - Configuring multi-environment deployments ## Core Workflows ### 1. Creating a New Pipeline **Decision tree:** ``` What are you building? ├── Node.js/Frontend → GitHub: templates/github-actions/node-ci.yml | GitLab: templates/gitlab-ci/node-ci.yml ├── Python → GitHub: templates/github-actions/python-ci.yml | GitLab: templates/gitlab-ci/python-ci.yml ├── Go → GitHub: templates/github-actions/go-ci.yml | GitLab: templates/gitlab-ci/go-ci.yml ├── Docker Image → GitHub: templates/github-actions/docker-build.yml | GitLab: templates/gitlab-ci/docker-build.yml ├── Other → Follow the pipeline design pattern below ``` **Basic pipeline structure:** ```yaml # 1. Fast feedback (lint, format) - <1 min # 2. Unit tests - 1-5 min # 3. Integration tests - 5-15 min # 4. Build artifacts # 5. E2E tests (optional, main branch only) - 15-30 min # 6. Deploy (with approval gates) ``` **Key principles:** - Fail fast: Run cheap validation first - Parallelize: Remove unnecessary job dependencies - Cache dependencies: Use `actions/cache` or GitLab cache - Use artifacts: Build once, deploy many times See [best_practices.md](references/best_practices.md) for comprehensive pipeline design patterns. ### 2. Optimizing Pipeline Performance **Quick wins checklist:** - [ ] Add dependency caching (50-90% faster builds) - [ ] Remove unnecessary `needs` dependencies - [ ] Add path filters to skip unnecessary runs - [ ] Use `npm ci` instead of `npm install` - [ ] Add job timeouts to prevent hung builds - [ ] Enable concurrency cancellation for duplicate runs **Analyze existing pipeline:** ```bash # Use the pipeline analyzer script python3 scripts/pipeline_analyzer.py --platform github --workflow .github/workflows/ci.yml ``` **Common optimizations:** - **Slow tests:** Shard tests with matrix builds - **Repeated dependency installs:** Add caching - **Sequential jobs:** Parallelize with proper `needs` - **Full test suite on every PR:** Use path filters or test impact analysis See [optimization.md](references/optimization.md) for detailed caching strategies, parallelization techniques, and performance tuning. ### 3. Securing Your Pipeline **Essential security checklist:** - [ ] Use OIDC instead of static credentials - [ ] Pin actions/includes to commit SHAs - [ ] Use minimal permissions - [ ] Enable secret scanning - [ ] Add vulnerability scanning (dependencies, containers) - [ ] Implement branch protection - [ ] Separate test from deploy workflows **Quick setup - OIDC authentication:** **GitHub Actions → AWS:** ```yaml permissions: id-token: write contents: read steps: - uses: aws-actions/configure-aws-credentials@v4 with: role-to-assume: arn:aws:iam::123456789:role/GitHubActionsRole aws-region: us-east-1 ``` **Secrets management:** - Store in platform secret stores (GitHub Secrets, GitLab CI/CD Variables) - Mark as "masked" in GitLab - Use environment-specific secrets - Rotate regularly (every 90 days) - Never log secrets See [security.md](references/security.md) for comprehensive security patterns, supply chain security, and secrets management. ### 4. Troubleshooting Pipeline Failures **Systematic approach:** **Step 1: Check pipeline health** ```bash python3 scripts/ci_health.py --platform github --repo owner/repo ``` **Step 2: Identify the failure type** | Error Pattern | Common Cause | Quick Fix | |---------------|--------------|-----------| | "Module not found" | Missing dependency or cache issue | Clear cache, run `npm ci` | | "Timeout" | Job taking too long | Add caching, increase timeout | | "Permission denied" | Missing permissions | Add to `permissions:` block | | "Cannot connect to Docker daemon" | Docker not available | Use correct runner or DinD | | Intermittent failures | Flaky tests or race conditions | Add retries, fix timing issues | **Step 3: Enable debug logging** GitHub Actions: ```yaml # Add repository secrets: # ACTIONS_RUNNER_DEBUG = true # ACTIONS_STEP_DEBUG = true ``` GitLab CI: ```yaml variables: CI_DEBUG_TRACE: "true" ``` **Step 4: Reproduce locally** ```bash # GitHub Actions - use act act -j build # Or Docker docker run -it ubuntu:latest bash # Then manually run the failing steps ``` See [troubleshooting.md](references/troubleshooting.md) for comprehensive issue diagnosis, platform-specific problems, and solutions. ### 5. Implementing Deployment Workflows **Deployment pattern selection:** | Pattern | Use Case | Complexity | Risk | |---------|----------|------------|------| | Direct | Simple apps, low traffic | Low | Medium | | Blue-Green | Zero downtime required | Medium | Low | | Canary | Gradual rollout, monitoring | High | Very Low | | Rolling | Kubernetes, containers | Medium | Low | **Basic deployment structure:** ```yaml deploy: needs: [build, test] if: github.ref == 'refs/heads/main' environment: name: production url: https://example.com steps: - name: Download artifacts - name: Deploy - name: Health check - name: Rollback on failure ``` **Multi-environment setup:** - **Development:** Auto-deploy on develop branch - **Staging:** Auto-deploy on main, requires passing tests - **Production:** Manual approval required, smoke tests mandatory See [best_practices.md](references/best_practices.md#deployment-strategies) for detailed deployment patterns and environment management. ### 6. Implementing DevSecOps Security Scanning **Security scanning types:** | Scan Type | Purpose | When to Run | Speed | Tools | |-----------|---------|-------------|-------|-------| | Secret Scanning | Find exposed credentials | Every commit | Fast (<1 min) | TruffleHog, Gitleaks | | SAST | Find code vulnerabilities | Every commit | Medium (5-15 min) | CodeQL, Semgrep, Bandit, Gosec | | SCA | Find dependency vulnerabilities | Every commit | Fast (1-5 min) | npm audit, pip-audit, Snyk | | Container Scanning | Find image vulnerabilities | After build | Medium (5-10 min) | Trivy, Grype | | DAST | Find runtime vulnerabilities | Scheduled/main only | Slow (15-60 min) | OWASP ZAP | **Quick setup - Add security to existing pipeline:** **GitHub Actions:** ```yaml jobs: # Add before build job secret-scan: runs-on: ubuntu-latest steps: - uses: actions/checkout@v4 with: fetch-depth: 0 - uses: trufflesecurity/trufflehog@main - uses: gitleaks/gitleaks-action@v2 sast: runs-on: ubuntu-latest permissions: security-events: write steps: - uses: actions/checkout@v4 - uses: github/codeql-action/init@v3 with: languages: javascript # or python, go - uses: github/codeql-action/analyze@v3 build: needs: [secret-scan, sast] # Add dependencies ``` **GitLab CI:** ```yaml stages: - security # Add before other stages - build - test # Secret scanning secret-scan: stage: security image: trufflesecurity/trufflehog:latest script: - trufflehog filesystem . --json --fail # SAST sast:semgrep: stage: security image: returntocorp/semgrep script: - semgrep scan --config=auto . # Use GitLab templates include: - template: Security/SAST.gitlab-ci.yml - template: Security/Dependency-Scanning.gitlab-ci.yml ``` **Comprehensive security pipeline templates:** - **GitHub Actions:** `templates/github-actions/security-scan.yml` - Complete DevSecOps pipeline with all scanning stages - **GitLab CI:** `templates/gitlab-ci/security-scan.yml` - Complete DevSecOps pipeline with GitLab security templates **Security gate pattern:** Add a security gate job that evaluates all security scan results and fails the pipeline if critical issues are found: ```yaml security-gate: needs: [secret-scan, sast, sca, container-scan] script: # Check for critical vulnerabilities # Parse JSON reports and evaluate thresholds # Fail if critical issues found ``` **Language-specific security tools:** - **Node.js:** CodeQL, Semgrep, npm audit, eslint-plugin-security - **Python:** CodeQL, Semgrep, Bandit, pip-audit, Safety - **Go:** CodeQL, Semgrep, Gosec, govulncheck All language-specific templates now include security scanning stages. See: - `templates/github-actions/node-ci.yml` - `templates/github-actions/python-ci.yml` - `templates/github-actions/go-ci.yml` - `templates/gitlab-ci/node-ci.yml` - `templates/gitlab-ci/python-ci.yml` - `templates/gitlab-ci/go-ci.yml` See [devsecops.md](references/devsecops.md) for comprehensive DevSecOps guide covering all security scanning types, tool comparisons, and implementation patterns. ## Quick Reference Commands ### GitHub Actions ```bash # List workflows gh workflow list # View recent runs gh run list --limit 20 # View specific run gh run view # Re-run failed jobs gh run rerun --failed # Download logs gh run view --log > logs.txt # Trigger workflow manually gh workflow run ci.yml # Check workflow status gh run watch ``` ### GitLab CI ```bash # View pipelines gl project-pipelines list # Pipeline status gl project-pipeline get # Retry failed jobs gl project-pipeline retry # Cancel pipeline gl project-pipeline cancel # Download artifacts gl project-job artifacts ``` ## Platform-Specific Patterns ### GitHub Actions **Reusable workflows:** ```yaml # .github/workflows/reusable-test.yml on: workflow_call: inputs: node-version: required: true type: string jobs: test: runs-on: ubuntu-latest steps: - uses: actions/setup-node@v4 with: node-version: ${{ inputs.node-version }} ``` **Call from another workflow:** ```yaml jobs: test: uses: ./.github/workflows/reusable-test.yml with: node-version: '20' ``` ### GitLab CI **Templates with extends:** ```yaml .test_template: image: node:20 before_script: - npm ci unit-test: extends: .test_template script: - npm run test:unit integration-test: extends: .test_template script: - npm run test:integration ``` **DAG pipelines with needs:** ```yaml build: stage: build test:unit: stage: test needs: [build] test:integration: stage: test needs: [build] deploy: stage: deploy needs: [test:unit, test:integration] ``` ## Diagnostic Scripts ### Pipeline Analyzer Analyzes workflow configuration for optimization opportunities: ```bash # GitHub Actions python3 scripts/pipeline_analyzer.py --platform github --workflow .github/workflows/ci.yml # GitLab CI python3 scripts/pipeline_analyzer.py --platform gitlab --config .gitlab-ci.yml ``` **Identifies:** - Missing caching opportunities - Unnecessary sequential execution - Outdated action versions - Unused artifacts - Overly broad triggers ### CI Health Checker Checks pipeline status and identifies issues: ```bash # GitHub Actions python3 scripts/ci_health.py --platform github --repo owner/repo --limit 20 # GitLab CI python3 scripts/ci_health.py --platform gitlab --project-id 12345 --token $GITLAB_TOKEN ``` **Provides:** - Success/failure rates - Recent failure patterns - Workflow-specific insights - Actionable recommendations ## Reference Documentation For deep-dive information on specific topics: - **[best_practices.md](references/best_practices.md)** - Pipeline design, testing strategies, deployment patterns, dependency management, artifact handling, platform-specific patterns - **[security.md](references/security.md)** - Secrets management, OIDC authentication, supply chain security, access control, vulnerability scanning, secure pipeline patterns - **[devsecops.md](references/devsecops.md)** - Comprehensive DevSecOps guide: SAST (CodeQL, Semgrep, Bandit, Gosec), DAST (OWASP ZAP), SCA (npm audit, pip-audit, Snyk), container security (Trivy, Grype, SBOM), secret scanning (TruffleHog, Gitleaks), security gates, license compliance - **[optimization.md](references/optimization.md)** - Caching strategies (dependencies, Docker layers, build artifacts), parallelization techniques, test splitting, build optimization, resource management - **[troubleshooting.md](references/troubleshooting.md)** - Common issues (workflow not triggering, flaky tests, timeouts, dependency errors), Docker problems, authentication issues, platform-specific debugging ## Templates Starter templates for common use cases: ### GitHub Actions - **`assets/templates/github-actions/node-ci.yml`** - Complete Node.js CI/CD with security scanning, caching, matrix testing, and multi-environment deployment - **`assets/templates/github-actions/python-ci.yml`** - Python pipeline with security scanning, pytest, coverage, PyPI deployment - **`assets/templates/github-actions/go-ci.yml`** - Go pipeline with security scanning, multi-platform builds, benchmarks, integration tests - **`assets/templates/github-actions/docker-build.yml`** - Docker build with multi-platform support, security scanning, SBOM generation, and signing - **`assets/templates/github-actions/security-scan.yml`** - Comprehensive DevSecOps pipeline with SAST, DAST, SCA, container scanning, and security gates ### GitLab CI - **`assets/templates/gitlab-ci/node-ci.yml`** - GitLab CI pipeline with security scanning, parallel execution, services, and deployment stages - **`assets/templates/gitlab-ci/python-ci.yml`** - Python pipeline with security scanning, parallel testing, Docker builds, PyPI and Cloud Run deployment - **`assets/templates/gitlab-ci/go-ci.yml`** - Go pipeline with security scanning, multi-platform builds, benchmarks, Kubernetes deployment - **`assets/templates/gitlab-ci/docker-build.yml`** - Docker build with DinD, multi-arch, Container Registry, security scanning - **`assets/templates/gitlab-ci/security-scan.yml`** - Comprehensive DevSecOps pipeline with SAST, DAST, SCA, container scanning, GitLab security templates, and security gates ## Common Patterns ### Caching Dependencies **GitHub Actions:** ```yaml - uses: actions/cache@v4 with: path: ~/.npm key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }} restore-keys: | ${{ runner.os }}-node- - run: npm ci ``` **GitLab CI:** ```yaml cache: key: files: - package-lock.json paths: - node_modules/ ``` ### Matrix Builds **GitHub Actions:** ```yaml strategy: matrix: os: [ubuntu-latest, macos-latest] node: [18, 20, 22] fail-fast: false ``` **GitLab CI:** ```yaml test: parallel: matrix: - NODE_VERSION: ['18', '20', '22'] ``` ### Conditional Execution **GitHub Actions:** ```yaml - name: Deploy if: github.ref == 'refs/heads/main' && github.event_name == 'push' ``` **GitLab CI:** ```yaml deploy: rules: - if: '$CI_COMMIT_BRANCH == "main"' when: manual ``` ## Best Practices Summary **Performance:** - Enable dependency caching - Parallelize independent jobs - Add path filters to reduce unnecessary runs - Use matrix builds for cross-platform testing **Security:** - Use OIDC for cloud authentication - Pin actions to commit SHAs - Enable secret scanning and vulnerability checks - Apply principle of least privilege **Reliability:** - Add timeouts to prevent hung jobs - Implement retry logic for flaky operations - Use health checks after deployments - Enable concurrency cancellation **Maintainability:** - Use reusable workflows/templates - Document non-obvious decisions - Keep workflows DRY with extends/includes - Regular dependency updates ## Getting Started 1. **New pipeline:** Start with a template from `assets/templates/` 2. **Add security scanning:** Use DevSecOps templates or add security stages to existing pipelines (see workflow 6 above) 3. **Optimize existing:** Run `scripts/pipeline_analyzer.py` 4. **Debug issues:** Check `references/troubleshooting.md` 5. **Improve security:** Review `references/security.md` and `references/devsecops.md` checklists 6. **Speed up builds:** See `references/optimization.md`