# DevOps Practices

**CI/CD, Infrastructure, Deployment, and Monitoring**

Consolidated from:
- devops-engineer skills
- cloud-architect skills
- site-reliability-engineer skills
- release-manager skills

# CI/CD Patterns Skill

**Expert-level CI/CD pipeline design patterns and best practices**

## Core Principles

1. **Pipeline as Code**: All pipeline configuration in version control
2. **Fast Feedback**: Fail fast, provide clear error messages
3. **Build Once**: Build artifacts once, deploy everywhere
4. **Idempotent**: Running the pipeline twice on the same commit produces the same result
5. **Secure by Default**: Security scanning integrated, not optional

## Multi-Stage Pipeline Pattern

```
┌─────────┐   ┌──────┐   ┌──────────┐   ┌────────┐   ┌────────┐
│  Build  │──>│ Test │──>│ Security │──>│ Deploy │──>│ Verify │
└─────────┘   └──────┘   └──────────┘   └────────┘   └────────┘
    Fast        Medium        Slow          Manual      Quick
   (<2 min)    (<5 min)    (<10 min)    (Approval)   (<2 min)
```

### Stage Ordering
1. **Build**: Compile code, create artifacts (fast fail)
2. **Test**: Unit → Integration → E2E (fastest first)
3. **Security**: SAST → Dependency scan → Container scan
4. **Deploy**: Dev → Staging → Prod (progressive)
5. **Verify**: Smoke tests, health checks
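This ordering is what makes "fail fast" work: a stage runs only if every earlier stage passed, so a cheap failure stops the expensive stages from running. A minimal sketch of that control flow, assuming each stage is modeled as a hypothetical async function that resolves to `true` on success (real CI platforms implement this for you):

```typescript
// Hypothetical stage model: a name plus an async check that resolves true on success.
interface Stage {
  name: string;
  run: () => Promise<boolean>;
}

interface PipelineResult {
  passed: boolean;
  completed: string[]; // stages that ran and passed
  failedAt?: string;   // first failing stage, if any
}

// Run stages strictly in order and stop at the first failure (fail fast).
async function runPipeline(stages: Stage[]): Promise<PipelineResult> {
  const completed: string[] = [];
  for (const stage of stages) {
    const ok = await stage.run();
    if (!ok) {
      return { passed: false, completed, failedAt: stage.name };
    }
    completed.push(stage.name);
  }
  return { passed: true, completed };
}

// Usage: Security fails, so Deploy and Verify never run.
const pass = async () => true;
const fail = async () => false;
runPipeline([
  { name: "Build", run: pass },
  { name: "Test", run: pass },
  { name: "Security", run: fail },
  { name: "Deploy", run: pass },
  { name: "Verify", run: pass },
]).then((result) => console.log(result));
```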

## Optimization Strategies

### 1. Caching
```yaml
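# Cache dependencies (GitLab only caches paths inside the project directory, so
# point tools there, e.g. PIP_CACHE_DIR=.pip, -Dmaven.repo.local=$CI_PROJECT_DIR/.m2/repository)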
cache:
  key: ${CI_COMMIT_REF_SLUG}
  paths:
    - node_modules/
    - .pip/
    - .m2/
    - .gradle/
```

**Impact**: can cut build time substantially; the often-quoted 50-80% depends on dependency size and cache hit rate
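Keying the cache on `${CI_COMMIT_REF_SLUG}` gives one cache per branch, so a stale cache can survive a dependency change. A common alternative is to derive the key from a hash of the lockfile, so the cache is rebuilt exactly when dependencies change (GitLab exposes this as `cache:key:files`, GitHub Actions as `hashFiles()` with `actions/cache`). A small sketch of the idea; the `node` prefix and the 16-character truncation are arbitrary choices for illustration:

```typescript
import { createHash } from "node:crypto";

// Derive a cache key from lockfile contents: identical lockfiles map to the
// same key (cache hit); any dependency change yields a new key (cache miss).
function cacheKey(prefix: string, lockfileContents: string[]): string {
  const hash = createHash("sha256");
  for (const contents of lockfileContents) {
    hash.update(contents);
    hash.update("\0"); // separator so ["ab", "c"] and ["a", "bc"] differ
  }
  // Truncated for readability; 16 hex chars is an arbitrary choice.
  return `${prefix}-${hash.digest("hex").slice(0, 16)}`;
}

const before = cacheKey("node", ['{"lockfileVersion":3,"packages":{}}']);
const same = cacheKey("node", ['{"lockfileVersion":3,"packages":{}}']);
const after = cacheKey("node", ['{"lockfileVersion":3,"packages":{"lodash":{}}}']);
console.log(before === same, before === after); // true false
```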

### 2. Parallelization
```yaml
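# Run tests in 4 parallel jobs; GitLab sets CI_NODE_INDEX (1-4) and CI_NODE_TOTAL (4).
# Jest's --shard flag requires Jest 28 or later.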
test:
  parallel: 4
  script:
    - npm test -- --shard=${CI_NODE_INDEX}/${CI_NODE_TOTAL}
```

**Impact**: up to ~4x faster test stage with 4 shards; per-job setup time and uneven shards reduce the real gain
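The `--shard=${CI_NODE_INDEX}/${CI_NODE_TOTAL}` argument tells each parallel job which slice of the suite to run. The sketch below illustrates the idea with a simple deterministic round-robin split over sorted file paths. It is not Jest's actual shard algorithm; it only shows that, when every job sorts the same list, shards are disjoint and together cover every test file:

```typescript
// Assign test files to shard `index` (1-based) of `total`, deterministically:
// every job sorts the same list, so shards never overlap and cover all files.
function shardFiles(files: string[], index: number, total: number): string[] {
  if (!Number.isInteger(total) || !Number.isInteger(index) || total < 1 || index < 1 || index > total) {
    throw new RangeError(`invalid shard ${index}/${total}`);
  }
  return [...files].sort().filter((_, i) => i % total === index - 1);
}

const files = ["b.test.ts", "a.test.ts", "d.test.ts", "c.test.ts", "e.test.ts"];
for (let i = 1; i <= 4; i++) {
  console.log(`shard ${i}/4:`, shardFiles(files, i, 4));
}
```

With 5 files and 4 shards, shard 1 receives two files while the others receive one, which is one reason the speedup stays below 4x.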

### 3. Conditional Execution
```yaml
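# Skip unnecessary steps: run only on main/release branches, and only
# when application code or the Dockerfile changed
deploy:
  only:
    refs:
      - main
      - /^release-.*$/
    changes:
      - src/**/*
      - Dockerfile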
```

**Impact**: fewer unnecessary runs; the savings (70% is sometimes quoted) depend on how many commits touch only excluded paths such as docs
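`changes:` runs the job only when at least one changed file matches one of the listed globs. A simplified sketch of that decision; the tiny glob translator here understands only `**`, `*`, and literal characters, an assumption far narrower than GitLab's real pattern matching:

```typescript
// Translate a minimal glob into a RegExp: "**" matches across directories,
// "*" matches within one path segment, everything else is literal.
function globToRegExp(glob: string): RegExp {
  let pattern = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob[i];
    if (ch === "*" && glob[i + 1] === "*") {
      pattern += ".*";
      i++; // consume the second '*'
    } else if (ch === "*") {
      pattern += "[^/]*";
    } else {
      pattern += ch.replace(/[.+?^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
    }
  }
  return new RegExp(`^${pattern}$`);
}

// Run the job if any changed file matches any pattern.
function shouldRun(changedFiles: string[], patterns: string[]): boolean {
  const regexes = patterns.map(globToRegExp);
  return changedFiles.some((file) => regexes.some((re) => re.test(file)));
}

const patterns = ["src/**", "Dockerfile"];
console.log(shouldRun(["src/api/server.ts"], patterns)); // true
console.log(shouldRun(["README.md", "docs/setup.md"], patterns)); // false
```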

### 4. Docker Layer Caching
```dockerfile
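# Multi-stage build: the dependency layer is reused until package*.json changes
FROM node:18-alpine AS builder
WORKDIR /app
COPY package*.json ./
# Install all dependencies; the build step usually needs devDependencies
RUN npm ci
COPY . .
RUN npm run build
# Drop devDependencies before node_modules is copied into the runtime image
RUN npm prune --omit=dev

# Same base image in both stages, so native modules built above still load
FROM node:18-alpine
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
# Entry point path is an assumption; adjust to the project's build output
CMD ["node", "dist/index.js"]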
```

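**Impact**: rebuilds that change only source code reuse the cached dependency layer, often several times faster than a cold build (the 10x sometimes quoted varies with dependency size)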

## Security Scanning Integration

### SAST (Static Application Security Testing)
```yaml
sast:
  stage: security
  image: returntocorp/semgrep
  script:
    - semgrep --config=auto --gitlab-sast --output=gl-sast-report.json .
  artifacts:
    reports:
      sast: gl-sast-report.json  # GitLab expects its own SAST report schema, not Semgrep's native JSON
```

**Tools**:
- **Semgrep**: Fast, customizable (free)
- **SonarQube**: Comprehensive code quality
- **CodeQL**: GitHub's semantic analysis

### Dependency Scanning
```yaml
dependency_scan:
  stage: security
  script:
    - npm audit --audit-level=high
    - snyk test --severity-threshold=high  # needs SNYK_TOKEN set as a CI/CD variable
  allow_failure: false  # Fail the job on high or critical findings (false is already the default)
```

**Tools**:
- **Snyk**: Comprehensive, auto-fix (free tier)
- **Dependabot**: GitHub native
- **npm audit**: Built into npm
- **safety**: Python packages

### Container Scanning
```yaml
container_scan:
  stage: security
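  image:
    name: aquasec/trivy
    entrypoint: [""]  # the image's entrypoint is trivy; GitLab needs a shell to run the script
  script:
    - trivy image --exit-code 1 --severity HIGH,CRITICAL myapp:${CI_COMMIT_SHA}  # without --exit-code 1, findings do not fail the job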
```

**Tools**:
- **Trivy**: Fast, accurate (free)
- **Clair**: Originally a CoreOS project, now maintained under Red Hat's Quay
- **Anchore**: Policy-based

### Secret Detection
```yaml
secrets_scan:
  stage: security
  image:
    name: zricethezav/gitleaks
    entrypoint: [""]  # override the gitleaks entrypoint so GitLab can run the script
  script:
    - gitleaks detect --source . --verbose
```

**Tools**:
- **Gitleaks**: Fast, configurable
- **TruffleHog**: High accuracy
- **git-secrets**: AWS focus

## Testing Strategies

### Test Pyramid
```
          /\
         /  \           E2E Tests (5%)
        /    \          Slow, brittle
       /------\
      /        \        Integration Tests (15%)
     /          \       Medium speed
    /------------\
   /              \     Unit Tests (80%)
  /                \    Fast, reliable
 /__________________\
```

### Test Execution Order
1. **Linting**: Fastest, catches syntax errors
2. **Unit tests**: Fast, isolated
3. **Integration tests**: Medium, database/API
4. **E2E tests**: Slow, full system

### Coverage Requirements
```yaml
test:
  script:
    - npm test -- --coverage --coverageThreshold='{"global":{"branches":80,"functions":80,"lines":80}}'
```

**Thresholds**:
- **Unit**: ≥80% coverage (enforce)
- **Integration**: ≥60% coverage (goal)
- **E2E**: Critical paths only
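With `--coverageThreshold`, Jest fails the run when any configured global metric falls below its floor. The check itself is simple; a sketch assuming a hypothetical `CoverageSummary` shape of percentages (Jest's own summary format differs):

```typescript
// Hypothetical coverage summary: percentages per metric (0-100).
type Metric = "branches" | "functions" | "lines" | "statements";
type CoverageSummary = Record<Metric, number>;

// Return the metrics that fall below their threshold; an empty list means pass.
// Metrics without a configured threshold are not enforced, as in the YAML above.
function coverageFailures(
  summary: CoverageSummary,
  thresholds: Partial<Record<Metric, number>>,
): string[] {
  const failures: string[] = [];
  for (const [metric, min] of Object.entries(thresholds) as [Metric, number][]) {
    if (summary[metric] < min) {
      failures.push(`${metric}: ${summary[metric]}% < ${min}%`);
    }
  }
  return failures;
}

const summary = { branches: 72.5, functions: 88, lines: 91.2, statements: 40 };
const failures = coverageFailures(summary, { branches: 80, functions: 80, lines: 80 });
console.log(failures); // [ 'branches: 72.5% < 80%' ]
```

A CI wrapper would exit non-zero whenever the returned list is non-empty; `statements` is ignored here because no threshold was set for it.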

## Artifact Management

### Build Artifacts
```yaml
build:
  script:
    - npm run build
  artifacts:
    name: "build-${CI_COMMIT_SHA}"
    paths:
      - dist/
    expire_in: 1 week
```

### Docker Images
```yaml
build_image:
  script:
    - docker build -t ${REGISTRY}/${IMAGE}:${CI_COMMIT_SHA} .
    - docker tag ${REGISTRY}/${IMAGE}:${CI_COMMIT_SHA} ${REGISTRY}/${IMAGE}:latest
    - docker push ${REGISTRY}/${IMAGE}:${CI_COMMIT_SHA}
    - docker push ${REGISTRY}/${IMAGE}:latest
```

**Tagging Strategy**:
- **Commit SHA**: Immutable, traceable
- **Semantic version**: v1.2.3 (releases)
- **Branch name**: develop, staging
- **latest**: Most recent (use with caution)
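The tagging strategy can be computed from a few pipeline facts. A sketch assuming hypothetical inputs (commit SHA, optional Git tag, branch name); the `v`-prefixed semver check is an illustrative choice, while the sanitizing rule follows Docker's tag character set (`[A-Za-z0-9_.-]`, no leading `.` or `-`, at most 128 characters). `latest` is deliberately not produced, in line with the advice below not to deploy it to production:

```typescript
interface BuildInfo {
  commitSha: string; // full commit SHA
  gitTag?: string;   // e.g. "v1.2.3" on release pipelines
  branch?: string;   // e.g. "develop" or "feature/login"
}

const SEMVER_TAG = /^v\d+\.\d+\.\d+$/;

// Replace characters Docker tags cannot contain and strip a leading '.' or '-'.
function sanitizeTag(raw: string): string {
  return raw.replace(/[^A-Za-z0-9_.-]/g, "-").replace(/^[.-]+/, "").slice(0, 128);
}

function imageTags(info: BuildInfo): string[] {
  const tags = [info.commitSha];                 // immutable, traceable
  if (info.gitTag && SEMVER_TAG.test(info.gitTag)) {
    tags.push(info.gitTag);                      // release version
  }
  if (info.branch) {
    const branchTag = sanitizeTag(info.branch);  // moving per-branch tag
    if (branchTag) tags.push(branchTag);
  }
  return tags;
}

// Example SHA is made up for illustration.
console.log(imageTags({ commitSha: "0123456789abcdef0123456789abcdef01234567", branch: "feature/login" }));
```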

## Deployment Patterns

### Environment Progression
```
Commit → Dev (auto) → Staging (auto) → Prod (manual)
```

### Deployment with Approval
```yaml
deploy_prod:
  stage: deploy
  environment:
    name: production
    url: https://app.example.com
  when: manual  # Require manual trigger
  only:
    - main
  script:
    - ./deploy.sh production
```

### Deployment with Verification
```yaml
deploy:
  script:
    - ./deploy.sh
    - |
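      # Poll the health endpoint for up to 5 minutes (30 attempts x 10 s).
      # seq is used because POSIX sh does not expand {1..30}.
      for i in $(seq 1 30); do
        if curl -fsS --max-time 5 https://app.example.com/health > /dev/null; then
          echo "Deployment successful!"
          exit 0
        fi
        echo "Health check $i/30 failed, retrying in 10 s..."
        sleep 10
      done
      echo "Deployment failed: health check never passed"
      exit 1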
```
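The shell loop above polls the health endpoint up to 30 times, 10 seconds apart. The same logic in TypeScript, with the probe and the sleep injected so it can be tested without a network. `https://app.example.com/health` is the document's placeholder URL; the 5-second per-request timeout and the reliance on Node 18+'s global `fetch` are assumptions:

```typescript
type Probe = () => Promise<boolean>;

// Poll `probe` until it reports healthy or attempts run out.
async function waitForHealthy(
  probe: Probe,
  attempts = 30,
  delayMs = 10_000,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<boolean> {
  for (let i = 1; i <= attempts; i++) {
    if (await probe()) return true;
    if (i < attempts) await sleep(delayMs); // no wait after the final attempt
  }
  return false;
}

// HTTP probe: 2xx counts as healthy (roughly like curl -f); network errors count as unhealthy.
function httpProbe(url: string, timeoutMs = 5_000): Probe {
  return async () => {
    try {
      const res = await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
      return res.ok;
    } catch {
      return false;
    }
  };
}

// Usage in a deploy script:
// waitForHealthy(httpProbe("https://app.example.com/health"))
//   .then((ok) => { if (!ok) process.exit(1); });
```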

### Rollback on Failure
```yaml
deploy:
  script:
    - ./deploy.sh || (./rollback.sh && exit 1)
```

## Notification Patterns

### Slack Notifications
```yaml
notify_slack:
  stage: .post
  when: on_failure
  script:
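    - |
      # CI_COMMIT_TITLE is the single-line subject; a double quote in it would still
      # break this hand-built JSON (build the payload with jq if that matters).
      curl -X POST -H 'Content-type: application/json' \
        --data "{
          \"text\": \"Pipeline failed for ${CI_PROJECT_NAME} on ${CI_COMMIT_REF_NAME}\",
          \"attachments\": [{
            \"color\": \"danger\",
            \"fields\": [{
              \"title\": \"Commit\",
              \"value\": \"${CI_COMMIT_SHORT_SHA}: ${CI_COMMIT_TITLE}\"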
            }, {
              \"title\": \"Author\",
              \"value\": \"${CI_COMMIT_AUTHOR}\"
            }, {
              \"title\": \"Pipeline\",
              \"value\": \"${CI_PIPELINE_URL}\"
            }]
          }]
        }" \
        ${SLACK_WEBHOOK_URL}
```
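Interpolating commit metadata into a hand-written JSON string breaks as soon as a commit message contains a double quote or a newline. Building the payload with a JSON serializer avoids that. A TypeScript sketch with the same fields, assuming the values are read from the GitLab variables and that Node 18+'s global `fetch` is available:

```typescript
interface PipelineInfo {
  project: string;
  branch: string;
  shortSha: string;
  message: string;
  author: string;
  pipelineUrl: string;
}

// Build the Slack attachment payload; JSON.stringify escapes quotes and newlines.
function slackPayload(p: PipelineInfo): string {
  return JSON.stringify({
    text: `Pipeline failed for ${p.project} on ${p.branch}`,
    attachments: [
      {
        color: "danger",
        fields: [
          { title: "Commit", value: `${p.shortSha}: ${p.message}` },
          { title: "Author", value: p.author },
          { title: "Pipeline", value: p.pipelineUrl },
        ],
      },
    ],
  });
}

// POST the payload to an incoming-webhook URL (e.g. the value of SLACK_WEBHOOK_URL).
async function notifySlack(webhookUrl: string, p: PipelineInfo): Promise<void> {
  const res = await fetch(webhookUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: slackPayload(p),
  });
  if (!res.ok) throw new Error(`Slack webhook returned ${res.status}`);
}
```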

### Email on Production Deploy
```yaml
notify_email:
  stage: .post
  only:
    - main
  script:
    - |
      echo "Deployed ${CI_COMMIT_SHORT_SHA} to production" | \
        mail -s "Production Deployment" team@example.com
```

## Branch Protection

### Required Checks
```yaml
# .github/workflows/required-checks.yml
name: Required Checks
on: [pull_request]

jobs:
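  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 18
          cache: 'npm'
      - run: npm ci  # install dependencies before linting
      - run: npm run lint

  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 18
          cache: 'npm'
      - run: npm ci
      - run: npm test

  security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm audit --audit-level=high  # reads package-lock.json; no install needed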
```

### GitHub Branch Protection Rules
- Require pull request reviews (1-2 reviewers)
- Require status checks to pass
- Require branches to be up to date
- Include administrators
- Restrict force pushes
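These rules map onto GitHub's REST endpoint `PUT /repos/{owner}/{repo}/branches/{branch}/protection`. A sketch that builds the request body; the check names are assumed to match the job names of the workflow above, and sending the request (with a token that has admin rights on the repository) is shown only as a commented, hypothetical usage:

```typescript
interface ProtectionOptions {
  reviewers: number;        // 1-2 per the rules above
  requiredChecks: string[]; // status check (job) names
}

// Request body for PUT /repos/{owner}/{repo}/branches/{branch}/protection.
function protectionBody(opts: ProtectionOptions) {
  return {
    required_status_checks: {
      strict: true,                  // branch must be up to date before merging
      contexts: opts.requiredChecks, // checks that must pass
    },
    enforce_admins: true,            // rules apply to administrators too
    required_pull_request_reviews: {
      required_approving_review_count: opts.reviewers,
    },
    restrictions: null,              // no extra push restrictions
    allow_force_pushes: false,
  };
}

// Hypothetical usage:
// await fetch(`https://api.github.com/repos/${owner}/${repo}/branches/main/protection`, {
//   method: "PUT",
//   headers: { Authorization: `Bearer ${token}`, Accept: "application/vnd.github+json" },
//   body: JSON.stringify(protectionBody({ reviewers: 1, requiredChecks: ["lint", "test", "security"] })),
// });
```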

## Common Patterns by Platform

### GitHub Actions
```yaml
name: CI/CD
on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

env:
  NODE_VERSION: 18
  REGISTRY: ghcr.io

jobs:
  build-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ env.NODE_VERSION }}
          cache: 'npm'
      - run: npm ci
      - run: npm run build
      - run: npm test -- --coverage
      - uses: codecov/codecov-action@v3
```

### GitLab CI
```yaml
stages:
  - build
  - test
  - security
  - deploy

variables:
  DOCKER_DRIVER: overlay2
  SECURE_ANALYZERS_PREFIX: "registry.gitlab.com/security-products"

build:
  stage: build
  script:
    - npm ci
    - npm run build
  artifacts:
    paths:
      - dist/
  cache:
    key: ${CI_COMMIT_REF_SLUG}
    paths:
      - node_modules/

test:
  stage: test
  script:
    - npm test -- --coverage
  coverage: '/All files[^|]*\|[^|]*\s+([\d\.]+)/'
```

### Jenkins
```groovy
pipeline {
    agent any

    environment {
        NODE_VERSION = '18'
        REGISTRY = 'registry.example.com'
    }

    stages {
        stage('Build') {
            steps {
                sh 'npm ci'
                sh 'npm run build'
            }
        }

        stage('Test') {
            parallel {
                stage('Unit') {
                    steps {
                        sh 'npm test'
                    }
                }
                stage('Lint') {
                    steps {
                        sh 'npm run lint'
                    }
                }
            }
        }

        stage('Security') {
            steps {
                sh 'npm audit'
                sh 'snyk test'
            }
        }

        stage('Deploy') {
            when {
                branch 'main'
            }
            steps {
                sh './deploy.sh'
            }
        }
    }

    post {
        always {
            junit 'reports/**/*.xml'
            publishHTML([
                reportDir: 'coverage',
                reportFiles: 'index.html',
                reportName: 'Coverage'
            ])
        }
        failure {
            emailext(
                subject: "Build Failed: ${env.JOB_NAME}",
                body: "Check ${env.BUILD_URL}",
                to: "${env.CHANGE_AUTHOR_EMAIL}"
            )
        }
    }
}
```

## Cost Optimization

### GitHub Actions
- Use caching (cache storage is free within the per-repository limit)
- Use matrix builds sparingly
- Self-hosted runners for private repos
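- **Cost**: free for public repositories on standard runners; $0.008/minute for Linux once a private repository exceeds the plan's included minutes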
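Estimating a monthly bill at the listed Linux rate is simple arithmetic, but GitHub rounds each job's duration up to the next whole minute, so parallel jobs add up. A sketch; the job durations and the run count are made-up inputs, and any included free minutes are ignored:

```typescript
const LINUX_RATE_USD_PER_MIN = 0.008; // Linux rate listed above

// Each job's duration is rounded up to a whole minute before billing.
function monthlyCostUsd(jobSecondsPerRun: number[], runsPerMonth: number): number {
  const minutesPerRun = jobSecondsPerRun.reduce((sum, s) => sum + Math.ceil(s / 60), 0);
  return minutesPerRun * runsPerMonth * LINUX_RATE_USD_PER_MIN;
}

// Example: lint (50 s), test (4 min 10 s), security (2 min), 400 runs/month
// → 1 + 5 + 2 = 8 billed minutes per run, 3,200 minutes per month.
console.log(monthlyCostUsd([50, 250, 120], 400).toFixed(2)); // 25.60
```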

### GitLab CI
- Use shared runners (Free tier: 400 compute minutes/month)
- Cache dependencies
- Limit parallel jobs
- **Cost**: Free tier available; Premium at $29/user/month (at time of writing)

### Jenkins
- Use spot instances for agents
- Shut down idle agents
- Containerized agents
- **Cost**: Infrastructure only

## Troubleshooting

### Slow Builds
1. Profile pipeline (which stage is slow?)
2. Add caching for dependencies
3. Parallelize independent jobs
4. Optimize Docker layers
5. Use smaller base images
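Profiling (step 1) amounts to averaging each stage's duration over recent runs and ranking the stages. A sketch over a hypothetical run-record shape; how per-stage durations are exported depends on the CI platform's API:

```typescript
// Hypothetical record: one pipeline run with per-stage durations in seconds.
type RunRecord = Record<string, number>;

interface StageStat {
  stage: string;
  avgSeconds: number;
  share: number; // fraction of the summed average stage time
}

// Average each stage across runs and sort slowest first.
function profileStages(runs: RunRecord[]): StageStat[] {
  const totals = new Map<string, { sum: number; count: number }>();
  for (const run of runs) {
    for (const [stage, seconds] of Object.entries(run)) {
      const t = totals.get(stage) ?? { sum: 0, count: 0 };
      t.sum += seconds;
      t.count += 1;
      totals.set(stage, t);
    }
  }
  const stats: StageStat[] = Array.from(totals, ([stage, t]) => ({
    stage,
    avgSeconds: t.sum / t.count,
    share: 0,
  }));
  const total = stats.reduce((s, x) => s + x.avgSeconds, 0);
  for (const s of stats) s.share = total > 0 ? s.avgSeconds / total : 0;
  return stats.sort((a, b) => b.avgSeconds - a.avgSeconds);
}

const runs = [
  { build: 90, test: 900, security: 120 },
  { build: 110, test: 960, security: 100 },
];
const [slowest] = profileStages(runs);
console.log(`${slowest.stage}: ${(slowest.share * 100).toFixed(0)}% of pipeline time`); // test: 82% of pipeline time
```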

### Flaky Tests
1. Identify flaky tests (rerun suspects many times, e.g. 100x, and measure the failure rate)
2. Wait for explicit conditions instead of fixed `sleep` calls
3. Mock external dependencies
4. Isolate test data
5. Retry failed tests (max 3x)
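Steps 1 and 5 can be sketched together: rerun a suspect test many times to estimate its failure rate, and retry a failing test a bounded number of times. Both helpers are hypothetical and framework-agnostic, and "max 3x" is interpreted here as three attempts in total; in practice Jest offers `jest.retryTimes()` and Playwright a `retries` setting for the retry half:

```typescript
type TestFn = () => Promise<void>; // rejects on failure

// Run a test `runs` times sequentially and report the fraction that failed.
async function failureRate(test: TestFn, runs = 100): Promise<number> {
  let failures = 0;
  for (let i = 0; i < runs; i++) {
    try {
      await test();
    } catch {
      failures++;
    }
  }
  return failures / runs;
}

// Try up to `maxAttempts` times; resolve with the attempt that passed,
// or rethrow the last error if every attempt failed.
async function withRetries(test: TestFn, maxAttempts = 3): Promise<number> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await test();
      return attempt;
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}

// Usage: a deterministic stand-in for a flaky test that fails every third call.
let calls = 0;
const everyThirdFails: TestFn = async () => {
  calls++;
  if (calls % 3 === 0) throw new Error("flaky failure");
};
failureRate(everyThirdFails, 99).then((rate) => console.log(rate.toFixed(2))); // 0.33
```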

### Failed Deployments
1. Check deployment logs
2. Verify health checks
3. Check resource constraints
4. Validate configuration
5. Rollback if needed

## Best Practices Summary

✅ **DO**:
- Keep pipelines fast (<10 min total)
- Fail fast (lint first, slow tests last)
- Cache dependencies
- Use semantic versioning
- Scan for vulnerabilities
- Require manual approval for prod
- Send notifications on failure
- Monitor pipeline performance

❌ **DON'T**:
- Hardcode secrets (use secrets management)
- Skip tests in CI
- Deploy without verification
- Deploy the `latest` tag to production
- Ignore security warnings
- Run unnecessary jobs
- Keep old artifacts indefinitely (set `expire_in` or a retention policy)

## Quick Reference

| Task | GitHub Actions | GitLab CI | Jenkins |
|------|----------------|-----------|---------|
| **Syntax** | YAML | YAML | Groovy |
| **Caching** | `actions/cache` (or the `cache:` input of `setup-*` actions) | `cache:` section | No built-in cache (plugins such as Job Cacher) |
| **Artifacts** | `actions/upload-artifact` | `artifacts:` section | `archiveArtifacts` |
| **Secrets** | Repository secrets | CI/CD variables | Credentials plugin |
| **Matrix** | `strategy: matrix:` | `parallel: matrix:` | `matrix {}` |
| **Conditions** | `if:` | `rules:` (or legacy `only:` / `except:`) | `when {}` |

---

## 🚀 MCP Integration: GitHub + Context7 for CI/CD Automation

### Runtime Detection & Usage

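The skill checks which MCP servers are available and adapts its CI/CD workflow accordingly. The TypeScript below is illustrative pseudocode: in Claude Code, MCP tools are invoked by the agent rather than exposed as JavaScript globals, and `generatedWorkflow` is a placeholder.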

```typescript
const hasGitHub = typeof mcp__github__create_or_update_file !== 'undefined';
const hasContext7 = typeof mcp__context7__get_library_docs !== 'undefined';

if (hasGitHub && hasContext7) {
  // Get latest CI/CD framework documentation
  const githubActionsDocs = await mcp__context7__get_library_docs({
    context7CompatibleLibraryID: "/actions/toolkit",
    topic: "GitHub Actions workflow syntax caching artifacts",
    tokens: 3000
  });

  // Create optimized workflow directly in repository
  await mcp__github__create_or_update_file({
    owner: "myorg",
    repo: "myapp",
    path: ".github/workflows/ci.yml",
    content: generatedWorkflow,
    message: "Add optimized CI/CD pipeline with caching"
  });
} else {
  console.log("ℹ️  GitHub/Context7 MCP not available");
  console.log("   GitHub:   claude mcp add github -- npx -y @modelcontextprotocol/server-github");
  console.log("   Context7: claude mcp add context7 -- npx -y @upstash/context7-mcp");
}
```

### Real-World Workflow Examples

**Example 1: Multi-Stage Pipeline Generation with Best Practices**

```typescript
// Without MCP: Manual workflow writing (3 hours)
// 1. Read GitHub Actions docs
// 2. Research caching strategies
// 3. Write YAML from scratch
// 4. Test and debug
// 5. Optimize

// With GitHub + Context7 MCP: AI-assisted generation (15 minutes)
const actionsDocs = await mcp__context7__get_library_docs({
  context7CompatibleLibraryID: "/actions/toolkit",
  topic: "caching dependencies matrix builds artifacts security scanning",
  tokens: 4000
});

const securityDocs = await mcp__context7__get_library_docs({
  context7CompatibleLibraryID: "/returntocorp/semgrep",
  topic: "CI integration security scanning",
  tokens: 2500
});

// Generate optimized workflow
const workflow = generateGitHubActionsWorkflow({
  language: "node",
  stages: ["build", "test", "security", "deploy"],
  patterns: actionsDocs,
  securityScan: securityDocs
});

// Deploy directly to repository
await mcp__github__create_or_update_file({
  owner: "myorg",
  repo: "myapp",
  path: ".github/workflows/ci.yml",
  content: workflow,
  message: `feat: add optimized CI/CD pipeline

- Multi-stage build with caching
- Parallel test execution
- Security scanning (SAST + dependency)
- Conditional deployment`
});

// ✅ 12x faster pipeline creation
// ✅ Latest best practices applied
// ✅ Automatic repository integration
```

**Example 2: GitLab CI to GitHub Actions Migration**

```typescript
// Analyze existing GitLab CI configuration
const gitlabConfig = await mcp__github__get_file_contents({
  owner: "myorg",
  repo: "legacy-app",
  path: ".gitlab-ci.yml"
});

// Get GitHub Actions patterns
const migrationDocs = await mcp__context7__get_library_docs({
  context7CompatibleLibraryID: "/actions/toolkit",
  topic: "GitLab CI migration GitHub Actions equivalents",
  tokens: 3500
});

// Convert GitLab CI → GitHub Actions
const convertedWorkflow = convertGitLabToGitHubActions({
  gitlabConfig: gitlabConfig.content,
  patterns: migrationDocs
});

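// Push the converted workflow to a new branch, then open a PR from it
// (create_pull_request takes no file list; the head branch must already hold the changes)
await mcp__github__create_branch({
  owner: "myorg",
  repo: "legacy-app",
  branch: "feat/github-actions-migration",
  from_branch: "main"
});

await mcp__github__push_files({
  owner: "myorg",
  repo: "legacy-app",
  branch: "feat/github-actions-migration",
  files: [
    { path: ".github/workflows/ci.yml", content: convertedWorkflow }
  ],
  message: "ci: migrate from GitLab CI to GitHub Actions"
});

await mcp__github__create_pull_request({
  owner: "myorg",
  repo: "legacy-app",
  title: "Migrate from GitLab CI to GitHub Actions",
  body: `## Migration Summary
- Converted all stages to GitHub Actions jobs
- Preserved caching strategy
- Maintained deployment logic
- Added security scanning

## Changes
- \`.gitlab-ci.yml\` → \`.github/workflows/ci.yml\`
- Updated cache paths for GitHub Actions
- Converted variables to GitHub secrets
`,
  head: "feat/github-actions-migration",
  base: "main"
});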

// ✅ Migration (1 hour vs 1 day)
// ✅ Automatic PR creation
// ✅ Best practices applied
```

**Example 3: CI/CD Performance Optimization**

```typescript
// Analyze current pipeline performance
const workflows = await mcp__github__list_workflows({
  owner: "myorg",
  repo: "myapp"
});

const runs = await mcp__github__list_workflow_runs({
  owner: "myorg",
  repo: "myapp",
  workflow_id: workflows[0].id,
  per_page: 100
});

// Get optimization patterns
const optimizationDocs = await mcp__context7__get_library_docs({
  context7CompatibleLibraryID: "/actions/toolkit",
  topic: "workflow optimization caching parallelization",
  tokens: 3500
});

// Analyze bottlenecks
const analysis = analyzeWorkflowPerformance(runs);
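// Example result: the test stage takes 15 of the 18 minutes (~83%)

// Fetch the current workflow file, then optimize it with parallelization
const currentWorkflow = await mcp__github__get_file_contents({
  owner: "myorg",
  repo: "myapp",
  path: ".github/workflows/ci.yml"
});

const optimizedWorkflow = await optimizeWorkflow({
  currentWorkflow: currentWorkflow.content,
  bottlenecks: analysis.bottlenecks,
  patterns: optimizationDocs,
  strategies: ["parallelize-tests", "cache-dependencies", "matrix-builds"]
});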

// Deploy optimized workflow
await mcp__github__create_or_update_file({
  owner: "myorg",
  repo: "myapp",
  path: ".github/workflows/ci.yml",
  content: optimizedWorkflow,
  message: `perf: optimize CI pipeline

- Parallelize tests across 4 runners
- Add dependency caching
- Use matrix strategy for multi-version testing

Reduces pipeline time: 18 min → 5 min (72% faster)`
});

// ✅ Pipeline time: 18 min → 5 min (72% faster)
// ✅ Wall-clock time cut ~3.6x (total runner minutes may not drop: tests now run on 4 runners)
// ✅ Faster feedback for developers
```

**Example 4: Automated Security Scanning Integration**

```typescript
// Get security tool documentation
const securityTools = [
  "/returntocorp/semgrep",
  "/aquasecurity/trivy",
  "/zricethezav/gitleaks"
];

const securityDocs = await Promise.all(
  securityTools.map(tool =>
    mcp__context7__get_library_docs({
      context7CompatibleLibraryID: tool,
      topic: "CI integration security scanning",
      tokens: 2500
    })
  )
);

// Generate comprehensive security workflow
const securityWorkflow = generateSecurityWorkflow({
  sast: securityDocs[0],        // Semgrep
  container: securityDocs[1],   // Trivy
  secrets: securityDocs[2]      // Gitleaks
});

// Add to repository
await mcp__github__create_or_update_file({
  owner: "myorg",
  repo: "myapp",
  path: ".github/workflows/security.yml",
  content: securityWorkflow,
  message: `security: add comprehensive security scanning

- SAST: Semgrep for code analysis
- Container: Trivy for image scanning
- Secrets: Gitleaks for credential detection

Fails pipeline on HIGH/CRITICAL vulnerabilities`
});

// ✅ Comprehensive security (30 min vs 4 hours)
// ✅ Multiple scan types integrated
// ✅ Production-ready thresholds
```

### Available Library IDs for CI/CD

**GitHub Actions**:
- `/actions/toolkit` - GitHub Actions core
- `/actions/cache` - Caching action
- `/actions/upload-artifact` - Artifact uploads
- `/actions/download-artifact` - Artifact downloads

**GitLab CI**:
- `/gitlab-org/gitlab` - GitLab CI/CD
- `/gitlab-org/gitlab-runner` - GitLab Runner

**Jenkins**:
- `/jenkinsci/jenkins` - Jenkins core
- `/jenkinsci/pipeline-plugin` - Pipeline as Code

**CI/CD Tools**:
- `/circleci/circleci-docs` - CircleCI
- `/travis-ci/travis-ci` - Travis CI
- `/drone/drone` - Drone CI

**Security Scanning**:
- `/returntocorp/semgrep` - SAST
- `/aquasecurity/trivy` - Container scanning
- `/zricethezav/gitleaks` - Secret detection
- `/snyk/cli` - Dependency scanning

**Build Tools**:
- `/docker/build-push-action` - Docker builds
- `/docker/metadata-action` - Docker metadata

### Benefits Comparison

| Task | Without MCP (estimate) | With GitHub + Context7 MCP (estimate) | Time Saved |
|------|-------------|---------------------------|------------|
| New pipeline creation | 3 hours (manual docs + trial/error) | 15 min (AI-assisted) | ~92% |
| GitLab → GitHub migration | 1 day (conversion + testing) | 1 hour (automated) | ~88% (8-hour day) |
| Pipeline optimization | 4 hours (profiling + research) | 30 min (analysis + apply) | ~88% |
| Security integration | 4 hours (tool research + setup) | 30 min (multi-tool setup) | ~88% |
| Branch protection setup | 30 min (manual clicking) | 2 min (API automation) | ~93% |

### When to Use GitHub + Context7 MCP

**Ideal for**:
- ✅ Creating new CI/CD pipelines from scratch
- ✅ Migrating between CI/CD platforms
- ✅ Optimizing existing pipeline performance
- ✅ Integrating security scanning tools
- ✅ Setting up branch protection rules
- ✅ Automating PR workflows
- ✅ Multi-repo pipeline standardization

**Not needed for**:
- ❌ Simple one-stage builds
- ❌ Basic linting workflows
- ❌ Trivial pipeline modifications

### Installation

```bash
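# Register the GitHub MCP server with Claude Code (needs a GitHub personal access token)
claude mcp add github -e GITHUB_PERSONAL_ACCESS_TOKEN=<your-token> -- npx -y @modelcontextprotocol/server-github

# Register the Context7 MCP server
claude mcp add context7 -- npx -y @upstash/context7-mcp

# Confirm that both servers are registered
claude mcp list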
```

### Security Best Practices

When using GitHub + Context7 MCP for CI/CD:
- Never commit secrets to workflow files (use GitHub Secrets)
- Validate all security scanner configurations
- Pin actions to a release tag or, more strictly, a full commit SHA (not `@main`)
- Review generated workflows before merging
- Enable branch protection on main branches
- Require status checks before merging
- Use least privilege for CI/CD service accounts

---

**Version**: 1.0.0
**Last Updated**: 2025-01-20
**Patterns**: 20+
**Best Practices**: Production-tested