Initial commit

Zhongwei Li
2025-11-29 17:59:49 +08:00
commit 6e1bba5e72
# DevOps Practices
**CI/CD, Infrastructure, Deployment, and Monitoring**
Consolidated from:
- devops-engineer skills
- cloud-architect skills
- site-reliability-engineer skills
- release-manager skills
# CI/CD Patterns Skill
**Expert-level CI/CD pipeline design patterns and best practices**
## Core Principles
1. **Pipeline as Code**: All pipeline configuration in version control
2. **Fast Feedback**: Fail fast, provide clear error messages
3. **Build Once**: Build artifacts once, deploy everywhere
4. **Idempotent**: Running twice produces same result
5. **Secure by Default**: Security scanning integrated, not optional
## Multi-Stage Pipeline Pattern
```
┌─────────┐   ┌──────┐   ┌──────────┐   ┌────────┐   ┌────────┐
│  Build  │──>│ Test │──>│ Security │──>│ Deploy │──>│ Verify │
└─────────┘   └──────┘   └──────────┘   └────────┘   └────────┘
   Fast        Medium        Slow         Manual       Quick
 (<2 min)     (<5 min)    (<10 min)     (Approval)    (<2 min)
```
### Stage Ordering
1. **Build**: Compile code, create artifacts (fast fail)
2. **Test**: Unit → Integration → E2E (fastest first)
3. **Security**: SAST → Dependency scan → Container scan
4. **Deploy**: Dev → Staging → Prod (progressive)
5. **Verify**: Smoke tests, health checks
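The fail-fast ordering above can be sketched as a tiny runner. It executes stages in sequence and stops at the first failure, so the later and slower stages never run. The `Stage` shape and the `runPipeline` function are hypothetical illustrations and are not part of any CI platform's API.

```typescript
// Hypothetical stage shape: a name plus a synchronous check that passes or fails.
interface Stage {
  name: string;
  run: () => boolean;
}

interface PipelineResult {
  passed: boolean;
  executed: string[]; // stages that actually ran, in order
  failedAt?: string;  // first failing stage, if any
}

// Run stages in order and stop at the first failure (fail fast).
function runPipeline(stages: Stage[]): PipelineResult {
  const executed: string[] = [];
  for (const stage of stages) {
    executed.push(stage.name);
    if (!stage.run()) {
      return { passed: false, executed, failedAt: stage.name };
    }
  }
  return { passed: true, executed };
}

// Usage: a failing Test stage means Security, Deploy, and Verify never run.
const result = runPipeline([
  { name: "Build", run: () => true },
  { name: "Test", run: () => false },
  { name: "Security", run: () => true },
  { name: "Deploy", run: () => true },
  { name: "Verify", run: () => true },
]);
console.log(result.executed.join(" -> ")); // Build -> Test
```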
## Optimization Strategies
### 1. Caching
```yaml
# Cache dependencies
cache:
  key: ${CI_COMMIT_REF_SLUG}
  paths:
    - node_modules/
    - .pip/
    - .m2/
    - .gradle/
```
**Impact**: Often 50-80% faster builds when dependency installation dominates build time
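Keying the cache on `${CI_COMMIT_REF_SLUG}` gives each branch its own cache. A common alternative is to derive the key from a hash of the lockfile. The cache is then shared across branches and is invalidated only when dependencies change; GitLab exposes this as `cache:key:files`. The sketch below shows the idea and assumes a lockfile such as `package-lock.json` exists. The `cacheKey` helper is hypothetical.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: derive a cache key from lockfile contents.
// Identical lockfiles -> identical key -> cache hit, regardless of branch.
function cacheKey(prefix: string, lockfileContents: string): string {
  const digest = createHash("sha256").update(lockfileContents).digest("hex");
  return `${prefix}-${digest.slice(0, 16)}`;
}

const lockA = '{"lockfileVersion":3,"packages":{"node_modules/left-pad":{"version":"1.3.0"}}}';
const lockB = '{"lockfileVersion":3,"packages":{"node_modules/left-pad":{"version":"1.3.1"}}}';

console.log(cacheKey("npm", lockA) === cacheKey("npm", lockA)); // true: same deps reuse the cache
console.log(cacheKey("npm", lockA) === cacheKey("npm", lockB)); // false: a dependency bump invalidates it
```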
### 2. Parallelization
```yaml
# Run tests in parallel
test:
  parallel: 4
  script:
    - npm test -- --shard=${CI_NODE_INDEX}/${CI_NODE_TOTAL}
```
**Impact**: Up to 4x faster test execution (the slowest shard sets the pace)
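With `parallel: 4`, GitLab starts four copies of the job and sets `CI_NODE_INDEX` (1-based) and `CI_NODE_TOTAL` in each one. The sketch below shows one simple way to split test files across shards using those two values. It uses round-robin assignment for clarity and is not Jest's actual sharding algorithm. `shardFiles` is a hypothetical helper.

```typescript
// Round-robin split: shard `index` (1-based, like CI_NODE_INDEX) of `total`
// takes every total-th file starting at position index - 1.
function shardFiles(files: string[], index: number, total: number): string[] {
  if (total < 1 || index < 1 || index > total) {
    throw new RangeError(`invalid shard ${index}/${total}`);
  }
  // Sort first so every parallel job sees the same ordering.
  return [...files].sort().filter((_, i) => i % total === index - 1);
}

const files = ["a.test.ts", "b.test.ts", "c.test.ts", "d.test.ts", "e.test.ts"];
for (let i = 1; i <= 4; i++) {
  console.log(i, shardFiles(files, i, 4));
}
// Every file lands in exactly one shard; shard 1 gets a.test.ts and e.test.ts.
```

Because shards are uneven (shard 1 has two files here), the wall-clock gain is bounded by the largest shard.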
### 3. Conditional Execution
```yaml
# Skip unnecessary steps
deploy:
only:
- main
- /^release-.*$/
changes:
- src/**
- Dockerfile
```
**Impact**: Skips runs for commits that touch only unrelated files (e.g. docs-only changes)
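`changes:` runs the job only when at least one changed file matches one of the listed patterns. The sketch below approximates that check with a small glob matcher. It supports `*` (within one path segment) and `**/` (zero or more directories). GitLab's real matcher supports more syntax, so treat `globToRegExp` and `shouldRun` as hypothetical simplifications.

```typescript
// Convert a simple glob to a RegExp. Supported syntax (a deliberate subset):
//   "**/" -> zero or more directories, "**" -> anything, "*" -> anything except "/".
function globToRegExp(glob: string): RegExp {
  let re = "";
  for (let i = 0; i < glob.length; i++) {
    const c = glob[i];
    if (c === "*" && glob[i + 1] === "*" && glob[i + 2] === "/") {
      re += "(?:.*/)?";
      i += 2;
    } else if (c === "*" && glob[i + 1] === "*") {
      re += ".*";
      i += 1;
    } else if (c === "*") {
      re += "[^/]*";
    } else if ("\\^$+?.()|{}[]".includes(c)) {
      re += "\\" + c; // escape regex metacharacters
    } else {
      re += c;
    }
  }
  return new RegExp(`^${re}$`);
}

// Run the job if any changed path matches any pattern.
function shouldRun(changedFiles: string[], patterns: string[]): boolean {
  const regexes = patterns.map(globToRegExp);
  return changedFiles.some((file) => regexes.some((re) => re.test(file)));
}

const patterns = ["src/**/*", "Dockerfile"];
console.log(shouldRun(["src/app/main.ts"], patterns)); // true
console.log(shouldRun(["README.md"], patterns));       // false
```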
### 4. Docker Layer Caching
```dockerfile
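# Multi-stage build
# Builder shares the alpine (musl) base so native modules in node_modules still load
FROM node:18-alpine AS builder
WORKDIR /app
# Copy only the manifests first: this layer and npm ci stay cached
# until package*.json changes
COPY package*.json ./
# Install all dependencies; the build step usually needs devDependencies
RUN npm ci
COPY . .
RUN npm run build
# Drop devDependencies before node_modules is copied to the runtime image
RUN npm prune --omit=dev
FROM node:18-alpine
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
# Entry point path is an assumption; adjust to your build output
CMD ["node", "dist/index.js"]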
```
**Impact**: Much faster rebuilds when only source files change (the dependency layers are reused), plus a smaller runtime image
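The speedup comes from how the layer cache is reused. A cached layer stays valid only while that step and every step before it are unchanged, so the first changed step invalidates everything after it. The sketch below models that rule in simplified form, reducing each step to a hypothetical input fingerprint. It shows why copying `package*.json` and installing dependencies before `COPY . .` keeps the expensive install cached when only source files change.

```typescript
// Simplified model of Docker's layer cache: each step has a fingerprint of its
// instruction plus the files it reads. The first mismatch invalidates that
// step and every step after it.
interface Step {
  instruction: string;
  inputHash: string; // hypothetical digest of the files the step depends on
}

function stepsToRebuild(previous: Step[], current: Step[]): string[] {
  const firstMiss = current.findIndex(
    (step, i) =>
      previous[i] === undefined ||
      previous[i].instruction !== step.instruction ||
      previous[i].inputHash !== step.inputHash,
  );
  if (firstMiss === -1) return [];
  return current.slice(firstMiss).map((s) => s.instruction);
}

// Only application source changed (src-v1 -> src-v2); package*.json did not.
const before: Step[] = [
  { instruction: "COPY package*.json ./", inputHash: "lock-v1" },
  { instruction: "RUN npm ci", inputHash: "" },
  { instruction: "COPY . .", inputHash: "src-v1" },
  { instruction: "RUN npm run build", inputHash: "" },
];
const after: Step[] = before.map((s) =>
  s.instruction === "COPY . ." ? { ...s, inputHash: "src-v2" } : s,
);

console.log(stepsToRebuild(before, after)); // [ 'COPY . .', 'RUN npm run build' ]
```

`RUN npm ci` stays cached in this scenario. If the source were copied before the install, every code change would also reinstall dependencies.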
## Security Scanning Integration
### SAST (Static Application Security Testing)
```yaml
sast:
  stage: security
  image: returntocorp/semgrep
  script:
    # --gitlab-sast writes the report schema GitLab expects for reports:sast
    - semgrep --config=auto --gitlab-sast --output=gl-sast-report.json .
  artifacts:
    reports:
      sast: gl-sast-report.json
```
**Tools**:
- **Semgrep**: Fast, customizable (free)
- **SonarQube**: Comprehensive code quality
- **CodeQL**: GitHub's semantic analysis
### Dependency Scanning
```yaml
dependency_scan:
  stage: security
  image: node:18
  script:
    - npm audit --audit-level=high
    # Snyk authenticates via a SNYK_TOKEN CI/CD variable
    - npx --yes snyk test --severity-threshold=high
  allow_failure: false # Fail the pipeline on high or critical vulnerabilities
```
**Tools**:
- **Snyk**: Comprehensive, auto-fix (free tier)
- **Dependabot**: GitHub native
- **npm audit**: Built-in Node.js
- **safety**: Python packages
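Flags such as `npm audit --audit-level=high` and `snyk test --severity-threshold=high` apply the same rule. Findings below the threshold are reported, and any finding at or above it makes the command exit non-zero, which fails the job. A minimal sketch of that gate, using a hypothetical `Finding` shape:

```typescript
// Severity levels in ascending order.
const SEVERITIES = ["low", "medium", "high", "critical"] as const;
type Severity = (typeof SEVERITIES)[number];

interface Finding {
  id: string; // hypothetical advisory identifier
  severity: Severity;
}

// Return the findings that should fail the build at the given threshold.
function blockingFindings(findings: Finding[], threshold: Severity): Finding[] {
  const min = SEVERITIES.indexOf(threshold);
  return findings.filter((f) => SEVERITIES.indexOf(f.severity) >= min);
}

const findings: Finding[] = [
  { id: "ADV-1", severity: "medium" },
  { id: "ADV-2", severity: "high" },
  { id: "ADV-3", severity: "critical" },
];

const blocking = blockingFindings(findings, "high");
console.log(blocking.map((f) => f.id)); // [ 'ADV-2', 'ADV-3' ]
// A CI wrapper would then set: process.exitCode = blocking.length > 0 ? 1 : 0;
```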
### Container Scanning
```yaml
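container_scan:
  stage: security
  image:
    name: aquasec/trivy
    entrypoint: [""] # the image's entrypoint is trivy itself; let GitLab run the script
  script:
    # --exit-code 1 fails the job when HIGH/CRITICAL findings exist (default is 0)
    - trivy image --exit-code 1 --severity HIGH,CRITICAL myapp:${CI_COMMIT_SHA}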
```
**Tools**:
- **Trivy**: Fast, accurate (free)
- **Clair**: CoreOS project
- **Anchore**: Policy-based
### Secret Detection
```yaml
secrets_scan:
  stage: security
  image:
    name: zricethezav/gitleaks
    entrypoint: [""] # override the gitleaks entrypoint so the script runs
  script:
    - gitleaks detect --source . --verbose
```
**Tools**:
- **Gitleaks**: Fast, configurable
- **TruffleHog**: Verifies many detected credentials against the issuing service
- **git-secrets**: AWS focus
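Scanners in this category combine pattern rules for known credential formats with an entropy check for random-looking strings. The sketch below shows both ideas. It uses one example rule (the well-known `AKIA` prefix format of AWS access key IDs) and a Shannon-entropy helper. The rule set, the threshold, and the token regex are illustrative assumptions and are far smaller than what Gitleaks or TruffleHog ship.

```typescript
// Shannon entropy in bits per character: random keys score high, prose scores low.
function shannonEntropy(s: string): number {
  const chars = Array.from(s);
  const counts = new Map<string, number>();
  chars.forEach((ch) => counts.set(ch, (counts.get(ch) ?? 0) + 1));
  let bits = 0;
  counts.forEach((n) => {
    const p = n / chars.length;
    bits -= p * Math.log2(p);
  });
  return bits;
}

interface Hit {
  line: number; // 1-based line number
  reason: "aws-access-key-id" | "high-entropy-string";
}

const AWS_KEY_ID = /\bAKIA[0-9A-Z]{16}\b/; // one example pattern rule
const TOKEN = /[A-Za-z0-9+\/=_-]{32,}/g;    // long token-like strings (assumed)
const ENTROPY_THRESHOLD = 4.0;             // assumed cut-off; tune per repository

function scan(text: string): Hit[] {
  const hits: Hit[] = [];
  text.split("\n").forEach((lineText, i) => {
    if (AWS_KEY_ID.test(lineText)) {
      hits.push({ line: i + 1, reason: "aws-access-key-id" });
    }
    for (const token of lineText.match(TOKEN) ?? []) {
      if (shannonEntropy(token) > ENTROPY_THRESHOLD) {
        hits.push({ line: i + 1, reason: "high-entropy-string" });
      }
    }
  });
  return hits;
}

const sample = [
  'const region = "us-east-1";',
  'const accessKeyId = "AKIAIOSFODNN7EXAMPLE";', // AWS's documented example key ID
].join("\n");
console.log(scan(sample)); // one hit: line 2, aws-access-key-id
```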
## Testing Strategies
### Test Pyramid
```
        /\
       /  \        E2E Tests (5%)
      /____\       Slow, brittle
     /      \
    /        \     Integration Tests (15%)
   /__________\
  /            \
 /              \  Unit Tests (80%)
/________________\ Fast, reliable
```
### Test Execution Order
1. **Linting**: Fastest, catches syntax errors
2. **Unit tests**: Fast, isolated
3. **Integration tests**: Medium, database/API
4. **E2E tests**: Slow, full system
### Coverage Requirements
```yaml
test:
  script:
    - npm test -- --coverage --coverageThreshold='{"global":{"branches":80,"functions":80,"lines":80}}'
```
**Thresholds**:
- **Unit**: ≥80% coverage (enforce)
- **Integration**: ≥60% coverage (goal)
- **E2E**: Critical paths only
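Jest's `coverageThreshold` compares each measured percentage with its minimum and fails the run if any metric falls short. A minimal sketch of that comparison follows. It assumes a coverage summary with the four metric names Jest reports (`branches`, `functions`, `lines`, `statements`) and handles positive percentage thresholds only. `coverageFailures` is a hypothetical helper.

```typescript
type Metric = "branches" | "functions" | "lines" | "statements";
type Summary = Record<Metric, number>;             // measured percentages
type Thresholds = Partial<Record<Metric, number>>; // minimum percentages

// Return a message for every metric below its minimum; empty means pass.
function coverageFailures(summary: Summary, thresholds: Thresholds): string[] {
  const failures: string[] = [];
  (Object.keys(thresholds) as Metric[]).forEach((metric) => {
    const min = thresholds[metric];
    if (min !== undefined && summary[metric] < min) {
      failures.push(`${metric}: ${summary[metric]}% < ${min}%`);
    }
  });
  return failures;
}

const measured: Summary = { branches: 72.5, functions: 85, lines: 88.1, statements: 87.9 };
console.log(coverageFailures(measured, { branches: 80, functions: 80, lines: 80 }));
// [ 'branches: 72.5% < 80%' ]
```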
## Artifact Management
### Build Artifacts
```yaml
build:
  script:
    - npm run build
  artifacts:
    name: "build-${CI_COMMIT_SHA}"
    paths:
      - dist/
    expire_in: 1 week
```
### Docker Images
```yaml
build_image:
  script:
    - docker build -t ${REGISTRY}/${IMAGE}:${CI_COMMIT_SHA} .
    - docker tag ${REGISTRY}/${IMAGE}:${CI_COMMIT_SHA} ${REGISTRY}/${IMAGE}:latest
    - docker push ${REGISTRY}/${IMAGE}:${CI_COMMIT_SHA}
    - docker push ${REGISTRY}/${IMAGE}:latest
```
**Tagging Strategy**:
- **Commit SHA**: Immutable, traceable
- **Semantic version**: v1.2.3 (releases)
- **Branch name**: develop, staging
- **latest**: Most recent (use with caution)
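One way to apply this tagging strategy in a build script is to compute the full tag list from the commit SHA, the branch, and an optional release version, and to add `latest` only for releases. The `imageTags` function and that `latest` policy are assumptions for illustration. The sanitizing step follows Docker's tag character rules.

```typescript
interface BuildInfo {
  sha: string;      // commit SHA
  branch: string;   // e.g. "main" or "feature/login"
  version?: string; // semantic version for releases, e.g. "1.2.3"
}

// Docker tags allow [A-Za-z0-9_.-], must not start with "." or "-", max 128 chars.
function sanitizeTag(raw: string): string {
  return raw.replace(/[^A-Za-z0-9_.-]/g, "-").replace(/^[.-]+/, "").slice(0, 128);
}

function imageTags(image: string, info: BuildInfo): string[] {
  const tags = [info.sha, sanitizeTag(info.branch)]; // immutable + moving branch tag
  if (info.version) {
    tags.push(`v${info.version}`, "latest"); // assumed policy: only releases move "latest"
  }
  return tags.map((t) => `${image}:${t}`);
}

// Feature branch build -> ":a1b2c3d4e5f6" and ":feature-login", no "latest".
console.log(imageTags("ghcr.io/myorg/myapp", { sha: "a1b2c3d4e5f6", branch: "feature/login" }));
```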
## Deployment Patterns
### Environment Progression
```
Commit → Dev (auto) → Staging (auto) → Prod (manual)
```
### Deployment with Approval
```yaml
deploy_prod:
  stage: deploy
  environment:
    name: production
    url: https://app.example.com
  when: manual # Require manual trigger
  only:
    - main
  script:
    - ./deploy.sh production
```
### Deployment with Verification
```yaml
deploy:
  script:
    - ./deploy.sh
    - |
      # Wait for the deployment: up to 30 checks, 10 s apart
      # (seq instead of {1..30}: the default job shell may not be bash)
      for i in $(seq 1 30); do
        if curl -fsS https://app.example.com/health; then
          echo "Deployment successful!"
          exit 0
        fi
        sleep 10
      done
      echo "Deployment failed!"
      exit 1
```
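The shell loop above polls the health endpoint up to 30 times at 10-second intervals (about five minutes) before declaring failure. The same retry logic is sketched below in TypeScript. The probe and the sleep are injected so the loop can be tested without a network. `waitForHealthy` is a hypothetical helper name.

```typescript
// Poll `probe` until it reports healthy or attempts run out.
async function waitForHealthy(
  probe: () => Promise<boolean>,
  attempts: number,
  delayMs: number,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((r) => setTimeout(r, ms)),
): Promise<{ healthy: boolean; tries: number }> {
  for (let i = 1; i <= attempts; i++) {
    // Treat a thrown error (e.g. connection refused) like an unhealthy response.
    const ok = await probe().catch(() => false);
    if (ok) return { healthy: true, tries: i };
    if (i < attempts) await sleep(delayMs);
  }
  return { healthy: false, tries: attempts };
}

// Usage against a real endpoint (Node 18+ provides a global fetch):
// const probe = () => fetch("https://app.example.com/health").then((res) => res.ok);
// const { healthy } = await waitForHealthy(probe, 30, 10_000);
```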
### Rollback on Failure
```yaml
deploy:
  script:
    - ./deploy.sh || (./rollback.sh && exit 1)
```
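The one-liner runs `rollback.sh` only when `deploy.sh` fails. It then still exits non-zero, so the pipeline is marked failed even if the rollback succeeded. The same control flow is sketched below in TypeScript; the `deployWithRollback` name and result shape are hypothetical.

```typescript
type Outcome = "deployed" | "rolled-back" | "rollback-failed";

// Run deploy; on failure run rollback. Any outcome other than "deployed"
// should still fail the pipeline so the broken release is visible.
async function deployWithRollback(
  deploy: () => Promise<void>,
  rollback: () => Promise<void>,
): Promise<Outcome> {
  try {
    await deploy();
    return "deployed";
  } catch {
    try {
      await rollback();
      return "rolled-back";
    } catch {
      return "rollback-failed";
    }
  }
}

// In a CI entry point: process.exitCode = outcome === "deployed" ? 0 : 1;
```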
## Notification Patterns
### Slack Notifications
```yaml
notify_slack:
  stage: .post
  when: on_failure
  script:
    - |
      # CI_COMMIT_TITLE (first line of the message) keeps raw newlines out of the JSON;
      # a double quote in the title would still break this hand-built payload
      curl -X POST -H 'Content-type: application/json' \
        --data "{
          \"text\": \"Pipeline failed for ${CI_PROJECT_NAME} on ${CI_COMMIT_BRANCH}\",
          \"attachments\": [{
            \"color\": \"danger\",
            \"fields\": [{
              \"title\": \"Commit\",
              \"value\": \"${CI_COMMIT_SHORT_SHA}: ${CI_COMMIT_TITLE}\"
            }, {
              \"title\": \"Author\",
              \"value\": \"${CI_COMMIT_AUTHOR}\"
            }, {
              \"title\": \"Pipeline\",
              \"value\": \"${CI_PIPELINE_URL}\"
            }]
          }]
        }" \
        "${SLACK_WEBHOOK_URL}"
```
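Building the JSON by hand inside a shell string breaks as soon as a commit message contains a double quote or a newline. Serializing a real object avoids that. The sketch below builds the same `text`/`attachments`/`fields` payload shape with `JSON.stringify` and reads the GitLab predefined variables from an environment map. The `slackPayload` helper is hypothetical.

```typescript
interface SlackField {
  title: string;
  value: string;
}

// Build the failure notification; JSON.stringify handles quoting and escaping.
function slackPayload(env: Record<string, string | undefined>): string {
  const fields: SlackField[] = [
    { title: "Commit", value: `${env.CI_COMMIT_SHORT_SHA ?? ""}: ${env.CI_COMMIT_MESSAGE ?? ""}` },
    { title: "Author", value: env.CI_COMMIT_AUTHOR ?? "" },
    { title: "Pipeline", value: env.CI_PIPELINE_URL ?? "" },
  ];
  return JSON.stringify({
    text: `Pipeline failed for ${env.CI_PROJECT_NAME ?? "?"} on ${env.CI_COMMIT_BRANCH ?? "?"}`,
    attachments: [{ color: "danger", fields }],
  });
}

// Usage inside the job (Node 18+ global fetch):
// await fetch(process.env.SLACK_WEBHOOK_URL!, {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: slackPayload(process.env),
// });
```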
### Email on Production Deploy
```yaml
notify_email:
  stage: .post
  only:
    - main
  script:
    - |
      echo "Deployed ${CI_COMMIT_SHORT_SHA} to production" | \
        mail -s "Production Deployment" team@example.com
```
## Branch Protection
### Required Checks
```yaml
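# .github/workflows/required-checks.yml
name: Required Checks
on: [pull_request]
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 18
          cache: 'npm'
      - run: npm ci # lint and test need installed dependencies
      - run: npm run lint
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 18
          cache: 'npm'
      - run: npm ci
      - run: npm test
  security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm audit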
```
### GitHub Branch Protection Rules
- Require pull request reviews (1-2 reviewers)
- Require status checks to pass
- Require branches to be up to date
- Include administrators
- Restrict force pushes
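These rules can also be applied programmatically through GitHub's REST endpoint `PUT /repos/{owner}/{repo}/branches/{branch}/protection`. The sketch below only builds the request body and maps each bullet above to its field. The owner, repository, and check names are placeholders. Sending the request requires a token with admin rights on the repository.

```typescript
interface BranchProtection {
  required_status_checks: { strict: boolean; contexts: string[] } | null;
  enforce_admins: boolean | null;
  required_pull_request_reviews: { required_approving_review_count: number } | null;
  restrictions: null; // no push restrictions to specific users/teams
  allow_force_pushes: boolean;
}

function protectionBody(checks: string[], reviewers: number): BranchProtection {
  return {
    required_status_checks: {
      strict: true,     // "Require branches to be up to date"
      contexts: checks, // "Require status checks to pass"
    },
    enforce_admins: true, // "Include administrators"
    required_pull_request_reviews: { required_approving_review_count: reviewers },
    restrictions: null,
    allow_force_pushes: false, // "Restrict force pushes"
  };
}

// Usage (placeholders; Node 18+ global fetch):
// await fetch("https://api.github.com/repos/myorg/myapp/branches/main/protection", {
//   method: "PUT",
//   headers: { Authorization: `Bearer ${token}`, Accept: "application/vnd.github+json" },
//   body: JSON.stringify(protectionBody(["lint", "test", "security"], 1)),
// });
```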
## Common Patterns by Platform
### GitHub Actions
```yaml
name: CI/CD
on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]
env:
  NODE_VERSION: 18
  REGISTRY: ghcr.io
jobs:
  build-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ env.NODE_VERSION }}
          cache: 'npm'
      - run: npm ci
      - run: npm run build
      - run: npm test -- --coverage
      - uses: codecov/codecov-action@v3
```
### GitLab CI
```yaml
stages:
  - build
  - test
  - security
  - deploy
variables:
  DOCKER_DRIVER: overlay2
  SECURE_ANALYZERS_PREFIX: "registry.gitlab.com/security-products"
default:
  image: node:18
build:
  stage: build
  script:
    - npm ci
    - npm run build
  artifacts:
    paths:
      - dist/
  cache:
    key: ${CI_COMMIT_REF_SLUG}
    paths:
      - node_modules/
test:
  stage: test
  script:
    - npm ci # each job starts from a fresh checkout; install before testing
    - npm test -- --coverage
  coverage: '/All files[^|]*\|[^|]*\s+([\d\.]+)/'
```
### Jenkins
```groovy
pipeline {
    agent any
    environment {
        NODE_VERSION = '18'
        REGISTRY = 'registry.example.com'
    }
    stages {
        stage('Build') {
            steps {
                sh 'npm ci'
                sh 'npm run build'
            }
        }
        stage('Test') {
            parallel {
                stage('Unit') {
                    steps {
                        sh 'npm test'
                    }
                }
                stage('Lint') {
                    steps {
                        sh 'npm run lint'
                    }
                }
            }
        }
        stage('Security') {
            steps {
                sh 'npm audit'
                sh 'snyk test'
            }
        }
        stage('Deploy') {
            when {
                branch 'main'
            }
            steps {
                sh './deploy.sh'
            }
        }
    }
    post {
        always {
            junit 'reports/**/*.xml'
            publishHTML([
                reportDir: 'coverage',
                reportFiles: 'index.html',
                reportName: 'Coverage'
            ])
        }
        failure {
            emailext(
                subject: "Build Failed: ${env.JOB_NAME}",
                body: "Check ${env.BUILD_URL}",
                to: "${env.CHANGE_AUTHOR_EMAIL}"
            )
        }
    }
}
```
## Cost Optimization
### GitHub Actions
- Use caching (free, and shorter jobs consume fewer billed minutes)
- Use matrix builds sparingly
- Self-hosted runners for private repos
- **Cost**: $0.008/minute (Linux)
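Per-minute billing makes these trade-offs easy to estimate. The sketch below assumes the $0.008/minute Linux rate listed above and GitHub's rounding of each job up to the next whole minute. It compares one 18-minute job with the same work split across four parallel jobs: wall-clock time drops, but billed minutes do not.

```typescript
const LINUX_RATE_USD_PER_MIN = 0.008; // rate quoted in this section; check current pricing

// GitHub rounds each job's duration up to a whole minute before billing.
function billedMinutes(jobMinutes: number[]): number {
  return jobMinutes.reduce((sum, m) => sum + Math.ceil(m), 0);
}

function costUsd(jobMinutes: number[]): number {
  return billedMinutes(jobMinutes) * LINUX_RATE_USD_PER_MIN;
}

// Parallel jobs run concurrently, so wall-clock time is the longest job.
function wallClockMinutes(jobMinutes: number[]): number {
  return Math.max(...jobMinutes);
}

const serial = [18];
const parallel = [4.5, 4.5, 4.5, 4.5];

console.log(wallClockMinutes(serial), billedMinutes(serial), costUsd(serial).toFixed(3));
// 18 18 0.144
console.log(wallClockMinutes(parallel), billedMinutes(parallel), costUsd(parallel).toFixed(3));
// 4.5 20 0.160
```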
### GitLab CI
- Use shared runners (free tier: 400 minutes/month)
- Cache dependencies
- Limit parallel jobs
- **Cost**: Free tier available; paid tiers are billed per user per month (Premium: $29/user/month)
### Jenkins
- Use spot instances for agents
- Shut down idle agents
- Containerized agents
- **Cost**: Infrastructure only
## Troubleshooting
### Slow Builds
1. Profile pipeline (which stage is slow?)
2. Add caching for dependencies
3. Parallelize independent jobs
4. Optimize Docker layers
5. Use smaller base images
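Step 1 (profiling) usually comes down to finding the stage with the largest share of total duration, because caching or parallelization pays off most there. A minimal sketch over hypothetical per-stage timings in minutes follows. It assumes stages run sequentially; stages that already run in parallel would need wall-clock data instead.

```typescript
interface StageTiming {
  stage: string;
  minutes: number;
}

// Return the slowest stage and its share of the total pipeline time.
function bottleneck(timings: StageTiming[]): { stage: string; share: number } {
  if (timings.length === 0) throw new Error("no timings");
  const total = timings.reduce((sum, t) => sum + t.minutes, 0);
  const slowest = timings.reduce((a, b) => (b.minutes > a.minutes ? b : a));
  return { stage: slowest.stage, share: slowest.minutes / total };
}

const { stage, share } = bottleneck([
  { stage: "build", minutes: 2 },
  { stage: "test", minutes: 15 },
  { stage: "deploy", minutes: 1 },
]);
console.log(`${stage}: ${(share * 100).toFixed(0)}% of total`); // test: 83% of total
```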
### Flaky Tests
1. Identify flaky tests (run 100x)
2. Add explicit waits (not sleep)
3. Mock external dependencies
4. Isolate test data
5. Retry failed tests (max 3x)
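Steps 1 and 5 can be combined into a small harness. It runs a test many times to classify it, and it separately caps retries so a genuinely broken test still fails. The sketch uses a deterministic fake test so the classification is reproducible. `classify` and `runWithRetries` are hypothetical helpers and are not a test framework's API. "Max 3x" is read here as at most three attempts in total.

```typescript
type Verdict = "stable-pass" | "stable-fail" | "flaky";

// Run a test `runs` times; mixed results mean the test is flaky.
function classify(test: () => boolean, runs: number): { verdict: Verdict; passes: number } {
  let passes = 0;
  for (let i = 0; i < runs; i++) if (test()) passes++;
  const verdict: Verdict =
    passes === runs ? "stable-pass" : passes === 0 ? "stable-fail" : "flaky";
  return { verdict, passes };
}

// Retry a failing test, up to `maxAttempts` attempts in total (assumed reading of "max 3x").
function runWithRetries(test: () => boolean, maxAttempts = 3): { passed: boolean; attempts: number } {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (test()) return { passed: true, attempts: attempt };
  }
  return { passed: false, attempts: maxAttempts };
}

// Deterministic stand-in for a flaky test: fails every third run.
let counter = 0;
const sometimesFails = () => ++counter % 3 !== 0;

console.log(classify(sometimesFails, 100)); // { verdict: 'flaky', passes: 67 }
```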
### Failed Deployments
1. Check deployment logs
2. Verify health checks
3. Check resource constraints
4. Validate configuration
5. Rollback if needed
## Best Practices Summary
**DO**:
- Keep pipelines fast (<10 min total)
- Fail fast (lint first, slow tests last)
- Cache dependencies
- Use semantic versioning
- Scan for vulnerabilities
- Require manual approval for prod
- Send notifications on failure
- Monitor pipeline performance
**DON'T**:
- Hardcode secrets (use secrets management)
- Skip tests in CI
- Deploy without verification
- Use latest tag in prod
- Ignore security warnings
- Run unnecessary jobs
- Leave old artifacts
## Quick Reference
| Task | GitHub Actions | GitLab CI | Jenkins |
|------|----------------|-----------|---------|
| **Syntax** | YAML | YAML | Groovy |
| **Caching** | `actions/cache` or `setup-*` `cache:` input | `cache:` section | Job Cacher plugin |
| **Artifacts** | `actions/upload-artifact` | `artifacts:` section | `archiveArtifacts` |
| **Secrets** | Repository secrets | CI/CD variables | Credentials plugin |
| **Matrix** | `strategy: matrix:` | `parallel: matrix:` | `matrix {}` |
| **Conditions** | `if:` | `rules:` (or legacy `only:` / `except:`) | `when {}` |
---
## 🚀 MCP Integration: GitHub + Context7 for CI/CD Automation
### Runtime Detection & Usage
The skill automatically detects available MCPs for CI/CD workflow enhancement:
```typescript
const hasGitHub = typeof mcp__github__create_or_update_file !== 'undefined';
const hasContext7 = typeof mcp__context7__get_library_docs !== 'undefined';
if (hasGitHub && hasContext7) {
  // Get latest CI/CD framework documentation
  const githubActionsDocs = await mcp__context7__get_library_docs({
    context7CompatibleLibraryID: "/actions/toolkit",
    topic: "GitHub Actions workflow syntax caching artifacts",
    tokens: 3000
  });
  // Create optimized workflow directly in repository
  // (generatedWorkflow: YAML built from githubActionsDocs; generation step not shown)
  await mcp__github__create_or_update_file({
    owner: "myorg",
    repo: "myapp",
    path: ".github/workflows/ci.yml",
    content: generatedWorkflow,
    message: "Add optimized CI/CD pipeline with caching",
    branch: "main"
  });
} else {
  console.log("GitHub/Context7 MCP not available");
  console.log("GitHub: npx -y @modelcontextprotocol/server-github");
  console.log("Context7: npx -y @upstash/context7-mcp");
}
```
### Real-World Workflow Examples
**Example 1: Multi-Stage Pipeline Generation with Best Practices**
```typescript
// Without MCP: Manual workflow writing (3 hours)
// 1. Read GitHub Actions docs
// 2. Research caching strategies
// 3. Write YAML from scratch
// 4. Test and debug
// 5. Optimize
// With GitHub + Context7 MCP: AI-assisted generation (15 minutes)
const actionsDocs = await mcp__context7__get_library_docs({
  context7CompatibleLibraryID: "/actions/toolkit",
  topic: "caching dependencies matrix builds artifacts security scanning",
  tokens: 4000
});
const securityDocs = await mcp__context7__get_library_docs({
  context7CompatibleLibraryID: "/returntocorp/semgrep",
  topic: "CI integration security scanning",
  tokens: 2500
});
// Generate optimized workflow (hypothetical helper, not an MCP tool)
const workflow = generateGitHubActionsWorkflow({
  language: "node",
  stages: ["build", "test", "security", "deploy"],
  patterns: actionsDocs,
  securityScan: securityDocs
});
// Deploy directly to repository
await mcp__github__create_or_update_file({
  owner: "myorg",
  repo: "myapp",
  path: ".github/workflows/ci.yml",
  content: workflow,
  message: `feat: add optimized CI/CD pipeline
- Multi-stage build with caching
- Parallel test execution
- Security scanning (SAST + dependency)
- Conditional deployment`,
  branch: "main"
});
// ✅ 12x faster pipeline creation
// ✅ Latest best practices applied
// ✅ Automatic repository integration
```
**Example 2: GitLab CI to GitHub Actions Migration**
```typescript
// Analyze existing GitLab CI configuration
const gitlabConfig = await mcp__github__get_file_contents({
  owner: "myorg",
  repo: "legacy-app",
  path: ".gitlab-ci.yml"
});
// Get GitHub Actions patterns
const migrationDocs = await mcp__context7__get_library_docs({
  context7CompatibleLibraryID: "/actions/toolkit",
  topic: "GitLab CI migration GitHub Actions equivalents",
  tokens: 3500
});
// Convert GitLab CI → GitHub Actions (hypothetical helper, not an MCP tool)
const convertedWorkflow = convertGitLabToGitHubActions({
  gitlabConfig: gitlabConfig.content,
  patterns: migrationDocs
});
// create_pull_request takes no file list: create a branch and push the file first
const branch = "feat/github-actions-migration";
await mcp__github__create_branch({
  owner: "myorg",
  repo: "legacy-app",
  branch,
  from_branch: "main"
});
await mcp__github__push_files({
  owner: "myorg",
  repo: "legacy-app",
  branch,
  files: [{ path: ".github/workflows/ci.yml", content: convertedWorkflow }],
  message: "ci: migrate from GitLab CI to GitHub Actions"
});
// Open the PR from the pushed branch
await mcp__github__create_pull_request({
  owner: "myorg",
  repo: "legacy-app",
  title: "Migrate from GitLab CI to GitHub Actions",
  body: `## Migration Summary
- Converted all stages to GitHub Actions jobs
- Preserved caching strategy
- Maintained deployment logic
- Added security scanning
## Changes
- \`.gitlab-ci.yml\` → \`.github/workflows/ci.yml\`
- Updated cache paths for GitHub Actions
- Converted variables to GitHub secrets
`,
  head: branch,
  base: "main"
});
// ✅ Migration (1 hour vs 1 day)
// ✅ Automatic PR creation
// ✅ Best practices applied
```
**Example 3: CI/CD Performance Optimization**
```typescript
// Analyze current pipeline performance
const workflows = await mcp__github__list_workflows({
  owner: "myorg",
  repo: "myapp"
});
const runs = await mcp__github__list_workflow_runs({
  owner: "myorg",
  repo: "myapp",
  workflow_id: workflows[0].id,
  per_page: 100
});
// Fetch the current workflow definition that will be optimized
const workflow = await mcp__github__get_file_contents({
  owner: "myorg",
  repo: "myapp",
  path: ".github/workflows/ci.yml"
});
// Get optimization patterns
const optimizationDocs = await mcp__context7__get_library_docs({
  context7CompatibleLibraryID: "/actions/toolkit",
  topic: "workflow optimization caching parallelization",
  tokens: 3500
});
// Analyze bottlenecks (hypothetical helper, not an MCP tool)
const analysis = analyzeWorkflowPerformance(runs);
// Example result: test stage takes 15 of 18 min (~83% of total time)
// Optimize with parallelization (hypothetical helper)
const optimizedWorkflow = await optimizeWorkflow({
  currentWorkflow: workflow.content,
  bottlenecks: analysis.bottlenecks,
  patterns: optimizationDocs,
  strategies: ["parallelize-tests", "cache-dependencies", "matrix-builds"]
});
// Deploy optimized workflow (updating an existing file requires its blob SHA)
await mcp__github__create_or_update_file({
  owner: "myorg",
  repo: "myapp",
  path: ".github/workflows/ci.yml",
  content: optimizedWorkflow,
  message: `perf: optimize CI pipeline
- Parallelize tests across 4 runners
- Add dependency caching
- Use matrix strategy for multi-version testing
Reduces pipeline time: 18 min → 5 min (72% faster)`,
  branch: "main",
  sha: workflow.sha
});
// ✅ Pipeline time: 18 min → 5 min (72% faster)
// ✅ Faster feedback for developers
// Note: parallel runners cut wall-clock time, not necessarily billed compute minutes
```
**Example 4: Automated Security Scanning Integration**
```typescript
// Get security tool documentation
const securityTools = [
  "/returntocorp/semgrep",
  "/aquasecurity/trivy",
  "/zricethezav/gitleaks"
];
const securityDocs = await Promise.all(
  securityTools.map(tool =>
    mcp__context7__get_library_docs({
      context7CompatibleLibraryID: tool,
      topic: "CI integration security scanning",
      tokens: 2500
    })
  )
);
// Generate comprehensive security workflow (hypothetical helper, not an MCP tool)
const securityWorkflow = generateSecurityWorkflow({
  sast: securityDocs[0],      // Semgrep
  container: securityDocs[1], // Trivy
  secrets: securityDocs[2]    // Gitleaks
});
// Add to repository
await mcp__github__create_or_update_file({
  owner: "myorg",
  repo: "myapp",
  path: ".github/workflows/security.yml",
  content: securityWorkflow,
  message: `security: add comprehensive security scanning
- SAST: Semgrep for code analysis
- Container: Trivy for image scanning
- Secrets: Gitleaks for credential detection
Fails pipeline on HIGH/CRITICAL vulnerabilities`,
  branch: "main"
});
// ✅ Comprehensive security (30 min vs 4 hours)
// ✅ Multiple scan types integrated
// ✅ Production-ready thresholds
```
### Available Library IDs for CI/CD
**GitHub Actions**:
- `/actions/toolkit` - GitHub Actions core
- `/actions/cache` - Caching action
- `/actions/upload-artifact` - Artifact uploads
- `/actions/download-artifact` - Artifact downloads
**GitLab CI**:
- `/gitlab-org/gitlab` - GitLab CI/CD
- `/gitlab-org/gitlab-runner` - GitLab Runner
**Jenkins**:
- `/jenkinsci/jenkins` - Jenkins core
- `/jenkinsci/pipeline-plugin` - Pipeline as Code
**CI/CD Tools**:
- `/circleci/circleci-docs` - CircleCI
- `/travis-ci/travis-ci` - Travis CI
- `/drone/drone` - Drone CI
**Security Scanning**:
- `/returntocorp/semgrep` - SAST
- `/aquasecurity/trivy` - Container scanning
- `/zricethezav/gitleaks` - Secret detection
- `/snyk/cli` - Dependency scanning
**Build Tools**:
- `/docker/build-push-action` - Docker builds
- `/docker/metadata-action` - Docker metadata
### Benefits Comparison
| Task | Without MCP | With GitHub + Context7 MCP | Time Saved (est.) |
|------|-------------|---------------------------|------------|
| New pipeline creation | 3 hours (manual docs + trial/error) | 15 min (AI-assisted) | 92% faster |
| GitLab → GitHub migration | 1 day (conversion + testing) | 1 hour (automated) | 88% faster |
| Pipeline optimization | 4 hours (profiling + research) | 30 min (analysis + apply) | 87% faster |
| Security integration | 4 hours (tool research + setup) | 30 min (multi-tool setup) | 87% faster |
| Branch protection setup | 30 min (manual clicking) | 2 min (API automation) | 93% faster |
### When to Use GitHub + Context7 MCP
**Ideal for**:
- ✅ Creating new CI/CD pipelines from scratch
- ✅ Migrating between CI/CD platforms
- ✅ Optimizing existing pipeline performance
- ✅ Integrating security scanning tools
- ✅ Setting up branch protection rules
- ✅ Automating PR workflows
- ✅ Multi-repo pipeline standardization
**Not needed for**:
- ❌ Simple one-stage builds
- ❌ Basic linting workflows
- ❌ Trivial pipeline modifications
### Installation
```bash
# Run the GitHub MCP server (reads GITHUB_PERSONAL_ACCESS_TOKEN)
npx -y @modelcontextprotocol/server-github
# Run the Context7 MCP server
npx -y @upstash/context7-mcp
# Register both servers in Claude Code's MCP settings;
# the detection check above then enables the integration
```
### Security Best Practices
When using GitHub + Context7 MCP for CI/CD:
- Never commit secrets to workflow files (use GitHub Secrets)
- Validate all security scanner configurations
- Pin actions to a release tag or, preferably, a full commit SHA (not `@main`)
- Review generated workflows before merging
- Enable branch protection on main branches
- Require status checks before merging
- Use least privilege for CI/CD service accounts
---
**Version**: 1.0.0
**Last Updated**: 2025-01-20
**Patterns**: 20+
**Best Practices**: Production-tested