# CI/CD Test Optimization and Feedback Loop Strategies
## Structuring Test Suites for Fast Feedback
Modern development velocity depends on rapid feedback from automated tests. The key is structuring test execution to provide the right feedback at the right time in the development workflow.
### The Feedback Loop Hierarchy
**Goal**: Provide fastest feedback for most common operations, comprehensive testing for critical gates.
```
Pre-Commit Hook → Ultra-fast (< 30s) → Run constantly
Pre-Push Hook → Fast (2-5 min) → Before pushing code
PR Pipeline → Quick (10-15 min) → For every PR
Post-Merge Pipeline → Moderate (20-45 min) → After merging
Nightly Build → Comprehensive (hrs) → Daily deep validation
```
### Pre-Commit Testing (< 30 seconds)
**Purpose**: Catch obvious issues before code is committed, maintaining flow state.
**What to Run**:
- Linting and code formatting
- Type checking (if applicable)
- Fast unit tests for changed files only
- Quick static analysis
**Example Git Hook**
```bash
#!/bin/bash
# .git/hooks/pre-commit

# Collect staged Python files (added, copied, or modified)
STAGED_PY=$(git diff --cached --name-only --diff-filter=ACM | grep '\.py$')
if [ -z "$STAGED_PY" ]; then
    exit 0  # Nothing to check
fi

# Run linter
echo "Running linter..."
pylint $STAGED_PY
if [ $? -ne 0 ]; then
    echo "Linting failed. Commit aborted."
    exit 1
fi

# Run type checker
echo "Running type checker..."
mypy $STAGED_PY
if [ $? -ne 0 ]; then
    echo "Type checking failed. Commit aborted."
    exit 1
fi

# Run fast unit tests (marked "quick"; see pytest.ini below).
# Narrow further with -k to target only the changed modules if needed.
echo "Running tests..."
pytest tests/unit/ -m quick
if [ $? -ne 0 ]; then
    echo "Tests failed. Commit aborted."
    exit 1
fi

echo "Pre-commit checks passed!"
```
**Configuration**
```ini
# pytest.ini
[pytest]
markers =
    quick: marks tests as quick (< 100ms each)
```

```python
# In test files
import pytest

@pytest.mark.quick
def test_calculate_total():
    assert calculate_total([1, 2, 3]) == 6
```
**Trade-offs**:
- ✅ Immediate feedback prevents committing broken code
- ✅ Maintains developer flow state
- ❌ Can slow down commits if not truly fast
- ❌ May be bypassed with `--no-verify` if too restrictive (a shared hook configuration, sketched below, mitigates this)
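One common mitigation is to manage hooks with the pre-commit framework instead of hand-edited scripts in `.git/hooks/`, so every developer runs the same checks. A minimal sketch of a `.pre-commit-config.yaml`, assuming the pylint/mypy/pytest setup above; the `rev` value is illustrative and should be pinned to a real release:
```yaml
# .pre-commit-config.yaml (sketch; pin rev to the release your team standardizes on)
repos:
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.10.0
    hooks:
      - id: mypy
  - repo: local
    hooks:
      - id: pylint
        name: pylint
        entry: pylint
        language: system
        types: [python]
      - id: quick-tests
        name: quick unit tests
        entry: pytest tests/unit/ -m quick
        language: system
        pass_filenames: false
```
Developers run `pre-commit install` once; after that, the same hooks run on every commit across the team.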
### Pre-Push Testing (2-5 minutes)
**Purpose**: Run comprehensive unit tests before code reaches CI, catching bugs locally.
**What to Run**:
- All unit tests (should complete in 2-5 minutes)
- Security scanning of dependencies
- More thorough static analysis
- License compliance checks (see the sketch after the hook example)
**Example Git Hook**
```bash
#!/bin/bash
# .git/hooks/pre-push

echo "Running full unit test suite..."
pytest tests/unit/ --maxfail=5 --tb=short
if [ $? -ne 0 ]; then
    echo "Unit tests failed. Push aborted."
    echo "Run 'pytest tests/unit/' to see details."
    exit 1
fi

echo "Checking for security vulnerabilities..."
safety check
if [ $? -ne 0 ]; then
    echo "Security vulnerabilities detected. Push aborted."
    exit 1
fi

echo "Pre-push checks passed!"
```
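The "License compliance checks" item above is not covered by the hook; one option is the `pip-licenses` tool, which can fail the run when a dependency's license falls outside an allow-list. A sketch to append to the pre-push hook (the allow-list below is only an example policy):
```bash
echo "Checking dependency licenses..."
# --allow-only takes a ;-separated list of acceptable licenses (example policy)
pip-licenses --allow-only="MIT License;BSD License;Apache Software License"
if [ $? -ne 0 ]; then
    echo "Disallowed dependency license detected. Push aborted."
    exit 1
fi
```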
**Optimization Techniques**:
```python
# conftest.py - Optimize test setup
import pytest

@pytest.fixture(scope="session")
def expensive_setup():
    """Run once per test session, not per test."""
    resource = create_expensive_resource()
    yield resource
    resource.cleanup()

# Run tests in parallel (requires pytest-xdist):
# pytest -n auto tests/unit/
```
### Pull Request Pipeline (10-15 minutes)
**Purpose**: Verify PR quality before merge, providing comprehensive feedback without excessive wait.
**What to Run**:
- All unit tests (parallelized)
- Critical integration tests
- Code coverage analysis
- Static security analysis (SAST)
- Dependency vulnerability scanning
- Documentation generation
- Container image build (if applicable)
**Example GitHub Actions Workflow**
```yaml
name: Pull Request Tests

on:
  pull_request:
    branches: [main, develop]

jobs:
  unit-tests:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.9", "3.10", "3.11"]
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}
      - name: Cache dependencies
        uses: actions/cache@v3
        with:
          path: ~/.cache/pip
          key: ${{ runner.os }}-pip-${{ hashFiles('**/requirements.txt') }}
      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install pytest pytest-cov pytest-xdist
      - name: Run unit tests
        run: pytest tests/unit/ -n auto --cov=src --cov-report=xml
      - name: Upload coverage
        uses: codecov/codecov-action@v3

  integration-tests:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:15
        env:
          POSTGRES_PASSWORD: postgres
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
    steps:
      - uses: actions/checkout@v3
      - name: Run critical integration tests
        run: pytest tests/integration/ -m "critical"

  security-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Run security scan
        uses: snyk/actions/python@master
        env:
          SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}
```
**Optimization Strategies**:
**1. Parallel Execution**
```yaml
# Run test suites in parallel
jobs:
  unit-tests:
    # ... as above
  integration-tests:
    # ... runs simultaneously
  lint:
    # ... runs simultaneously
```
**2. Fail Fast**
```bash
# Stop after first few failures
pytest --maxfail=3
# Or stop on first failure for quick feedback
pytest -x
```
**3. Smart Test Selection**
```bash
# Only run tests affected by changes
pytest --testmon
# Or use pytest-picked
pytest --picked
```
**4. Caching**
```yaml
- name: Cache dependencies
uses: actions/cache@v3
with:
path: |
~/.cache/pip
~/.npm
node_modules
key: ${{ runner.os }}-deps-${{ hashFiles('**/requirements.txt', '**/package-lock.json') }}
```
### Post-Merge Pipeline (20-45 minutes)
**Purpose**: Run comprehensive test suite on main branch, catching integration issues.
**What to Run**:
- All unit tests
- All integration tests
- Critical E2E tests
- Performance regression tests
- Database migration tests (see the sketch after the example configuration)
- Cross-browser testing (subset)
- Container security scanning
**Example Configuration**
```yaml
name: Post-Merge Comprehensive Tests

on:
  push:
    branches: [main]

jobs:
  comprehensive-tests:
    runs-on: ubuntu-latest
    timeout-minutes: 45
    steps:
      - uses: actions/checkout@v3
      - name: Run all tests
        run: |
          pytest tests/ \
            --cov=src \
            --cov-report=html \
            --cov-report=term \
            --junitxml=test-results.xml
      - name: Run E2E tests
        run: pytest tests/e2e/ -m "critical"
      - name: Performance tests
        run: pytest tests/performance/ --benchmark-only
      - name: Publish test results
        uses: EnricoMi/publish-unit-test-result-action@v2
        if: always()
        with:
          files: test-results.xml

  deploy-staging:
    needs: comprehensive-tests
    runs-on: ubuntu-latest
    steps:
      - name: Deploy to staging
        run: ./scripts/deploy-staging.sh
```
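The "Database migration tests" item from the list above is not shown in the example. A minimal sketch of one additional job to place under `jobs:`, assuming the project uses Alembic and that its migration environment reads `DATABASE_URL` (both assumptions; adapt to your migration tooling):
```yaml
  migration-tests:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:15
        env:
          POSTGRES_PASSWORD: postgres
        ports:
          - 5432:5432
    steps:
      - uses: actions/checkout@v3
      - name: Apply and roll back migrations
        env:
          DATABASE_URL: postgresql://postgres:postgres@localhost:5432/postgres
        run: |
          alembic upgrade head     # every migration applies cleanly
          alembic downgrade base   # and can be rolled back
          alembic upgrade head
```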
### Nightly/Scheduled Builds (Hours)
**Purpose**: Exhaustive testing including expensive, slow, or less critical tests.
**What to Run**:
- Full E2E test suite
- Cross-browser testing (all browsers)
- Performance and load tests
- Security penetration testing
- Accessibility testing
- Visual regression testing
- Long-running integration tests
- Compatibility matrix testing
**Example Schedule**
```yaml
name: Nightly Comprehensive Tests

on:
  schedule:
    - cron: '0 2 * * *'  # 2 AM daily
  workflow_dispatch:     # Manual trigger

jobs:
  full-e2e-suite:
    runs-on: ubuntu-latest
    timeout-minutes: 180
    steps:
      - uses: actions/checkout@v3
      - name: Run full E2E suite
        run: pytest tests/e2e/ --headed --slowmo=50

  cross-browser-tests:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        browser: [chrome, firefox, safari, edge]
    steps:
      - name: Run tests on ${{ matrix.browser }}
        run: pytest tests/e2e/ --browser=${{ matrix.browser }}

  performance-tests:
    runs-on: ubuntu-latest
    steps:
      - name: Load testing
        run: locust -f tests/performance/locustfile.py --headless

  security-scan:
    runs-on: ubuntu-latest
    steps:
      - name: DAST scanning
        run: zap-baseline.py -t https://staging.example.com
```
---
## Parallel Execution and Test Sharding
### Test Parallelization Strategies
**1. Process-Level Parallelization**
```bash
# pytest with xdist
pytest -n auto # Auto-detect CPU count
pytest -n 4 # Use 4 processes
# With load balancing
pytest -n auto --dist loadscope
```
**2. CI Matrix Parallelization**
```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        shard: [1, 2, 3, 4]
    steps:
      - name: Run shard ${{ matrix.shard }}
        # --splits/--group come from the pytest-split plugin
        run: |
          pytest tests/ \
            --splits 4 \
            --group ${{ matrix.shard }}
```
**3. Test File Sharding**
```python
# scripts/shard_tests.py
import os

def shard_tests(test_files, shard_index, total_shards):
    """Distribute test files across shards round-robin."""
    return [f for i, f in enumerate(test_files)
            if i % total_shards == shard_index]

if __name__ == "__main__":
    shard_index = int(os.environ.get('SHARD_INDEX', 0))
    total_shards = int(os.environ.get('TOTAL_SHARDS', 1))
    # collect_all_test_files() and run_pytest() are project-specific helpers
    all_tests = collect_all_test_files()
    selected = shard_tests(all_tests, shard_index, total_shards)
    run_pytest(selected)
```
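Example invocation, matching the environment variables the script reads (indices are 0-based):
```bash
# Run shard 2 of 4
TOTAL_SHARDS=4 SHARD_INDEX=1 python scripts/shard_tests.py
```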
### Ensuring Test Independence
**Requirements for Parallel Execution**:
- Tests must not share state
- Tests must not depend on execution order
- Tests must use unique data (e.g., unique emails, IDs)
- Tests must isolate database operations
**Database Isolation Strategies**
```python
import pytest
from uuid import uuid4

# Strategy 1: Transaction rollback per test
# (engine and Session are assumed to come from the project's SQLAlchemy setup)
@pytest.fixture
def db_session():
    connection = engine.connect()
    transaction = connection.begin()
    session = Session(bind=connection)
    yield session
    session.close()
    transaction.rollback()
    connection.close()

# Strategy 2: Unique database per worker (worker_id is provided by pytest-xdist)
@pytest.fixture(scope="session")
def db_url(worker_id):
    if worker_id == "master":
        return "postgresql://localhost/test_db"
    return f"postgresql://localhost/test_db_{worker_id}"

# Strategy 3: Unique test data per test
@pytest.fixture
def user():
    unique_email = f"test-{uuid4()}@example.com"
    user = User.create(email=unique_email)
    yield user
    user.delete()
```
---
## Flaky Test Management
Flaky tests—those that pass or fail non-deterministically—are worse than no tests. They train developers to ignore failures and erode trust.
### Detecting Flaky Tests
**1. Automatic Detection**
```bash
# Run tests multiple times to detect flakiness
# (--count comes from pytest-repeat, --randomly-seed from pytest-randomly)
pytest tests/ --count=10 --randomly-seed=12345
# Or use pytest-flakefinder
pytest --flake-finder --flake-runs=50
```
**2. CI-Based Detection**
```yaml
jobs:
  flaky-test-detection:
    runs-on: ubuntu-latest
    if: github.event_name == 'schedule'
    steps:
      - name: Run tests 20 times
        run: |
          for i in {1..20}; do
            pytest tests/ --junitxml=results/run-$i.xml || true
          done
      - name: Analyze results
        run: python scripts/detect_flaky_tests.py results/
```
**3. Tracking Test History**
```python
# scripts/detect_flaky_tests.py
import xml.etree.ElementTree as ET
from collections import defaultdict

def analyze_test_runs(result_files):
    test_results = defaultdict(list)
    for file in result_files:
        tree = ET.parse(file)
        for testcase in tree.findall('.//testcase'):
            test_name = f"{testcase.get('classname')}.{testcase.get('name')}"
            passed = testcase.find('failure') is None
            test_results[test_name].append(passed)

    flaky_tests = []
    for test, results in test_results.items():
        if not all(results) and any(results):
            pass_rate = sum(results) / len(results)
            flaky_tests.append((test, pass_rate))
    return sorted(flaky_tests, key=lambda x: x[1])
```
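The CI job above calls the script with a results directory, so it also needs an entry point; a minimal sketch (the report format and exit-code convention are assumptions):
```python
# Appended to scripts/detect_flaky_tests.py
import sys
from pathlib import Path

if __name__ == "__main__":
    results_dir = Path(sys.argv[1] if len(sys.argv) > 1 else "results")
    flaky = analyze_test_runs(sorted(results_dir.glob("*.xml")))
    for test, pass_rate in flaky:
        print(f"{pass_rate:6.1%}  {test}")
    # Non-zero exit lets the scheduled job flag newly flaky tests
    sys.exit(1 if flaky else 0)
```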
### Quarantine Process
**Purpose**: Isolate flaky tests without blocking development.
```python
# Mark flaky tests
import pytest

@pytest.mark.flaky
def test_sometimes_fails():
    # Test with known flakiness
    pass
```

```ini
# pytest.ini
[pytest]
markers =
    flaky: marks tests as flaky (quarantined)
```

```bash
# Skip flaky tests in PR pipeline
pytest tests/ -m "not flaky"
# Run only flaky tests in diagnostic mode (--reruns requires pytest-rerunfailures)
pytest tests/ -m "flaky" --reruns 5
```
**Quarantine Workflow**:
1. **Detect**: Identify flaky test through automated detection or manual observation
2. **Mark**: Add `@pytest.mark.flaky` marker
3. **Track**: Create ticket to fix or remove flaky test
4. **Fix**: Investigate root cause and stabilize
5. **Verify**: Run the fixed test 50+ times to confirm stability (see the sketch after this list)
6. **Release**: Remove flaky marker and return to normal suite
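One way to perform the verification step, assuming the pytest-repeat plugin is installed (the test path is illustrative):
```bash
# Re-run the previously flaky test 100 times; -x stops at the first failure
pytest tests/unit/test_payments.py::test_retry_on_timeout --count=100 -x
```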
### Common Flakiness Root Causes and Fixes
**1. Timing Issues**
```python
# ❌ BAD: Fixed sleep
def test_async_operation():
    start_operation()
    time.sleep(2)  # Hope 2 seconds is enough
    assert operation_complete()

# ✅ GOOD: Conditional wait
def test_async_operation():
    start_operation()
    wait_for(lambda: operation_complete(), timeout=5)
    assert operation_complete()
```
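`wait_for` above is not part of pytest or the standard library; a minimal polling helper, written as an assumption about the intended behavior:
```python
import time

def wait_for(condition, timeout=5.0, interval=0.05):
    """Poll `condition` until it returns truthy or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return
        time.sleep(interval)
    raise TimeoutError(f"Condition not met within {timeout} seconds")
```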
**2. Shared State**
```python
# ❌ BAD: Global state
cache = {}

def test_cache_operation():
    cache['key'] = 'value'
    assert get_from_cache('key') == 'value'

# ✅ GOOD: Isolated state
@pytest.fixture
def cache():
    cache = {}
    yield cache
    cache.clear()

def test_cache_operation(cache):
    cache['key'] = 'value'
    assert cache['key'] == 'value'
```
**3. External Service Dependencies**
```python
# ❌ BAD: Calling real API
def test_fetch_data():
    data = api_client.get('https://external-api.com/data')
    assert data is not None

# ✅ GOOD: Mock external service (using the `responses` library)
@responses.activate
def test_fetch_data():
    responses.add(
        responses.GET,
        'https://external-api.com/data',
        json={'result': 'data'},
        status=200
    )
    data = api_client.get('https://external-api.com/data')
    assert data['result'] == 'data'
```
**4. Test Order Dependencies**
```python
# ❌ BAD: Order-dependent tests
def test_create_user():
    global user
    user = create_user()

def test_update_user():
    update_user(user)  # Depends on test_create_user

# ✅ GOOD: Independent tests
@pytest.fixture
def user():
    user = create_user()
    yield user
    delete_user(user)

def test_create_user(user):
    assert user.id is not None

def test_update_user(user):
    update_user(user)
    assert user.updated_at is not None
```
---
## Test Health Metrics
Track these metrics to maintain test suite health and identify areas for improvement.
### Key Metrics
**1. Defect Detection Efficiency (DDE)**
```
DDE = Bugs Found by Tests / (Bugs Found by Tests + Bugs Found in Production)
Target: > 90%
```
Track monthly to identify test coverage gaps.
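A small sketch of the arithmetic (the bug counts are illustrative):
```python
def defect_detection_efficiency(found_by_tests: int, found_in_production: int) -> float:
    """DDE = bugs caught by tests / all bugs found in the period."""
    total = found_by_tests + found_in_production
    return found_by_tests / total if total else 1.0

# Example: 47 bugs caught by the suite, 3 escaped to production
print(f"DDE: {defect_detection_efficiency(47, 3):.0%}")  # DDE: 94%
```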
**2. Test Execution Time**
```
- Unit tests: < 5 minutes
- Integration tests: < 15 minutes
- Full suite: < 45 minutes
- Nightly comprehensive: < 3 hours
```
**3. Flakiness Rate**
```
Flakiness Rate = Flaky Test Runs / Total Test Runs
Target: < 5%
```
**4. Test Maintenance Burden**
```
Hours spent fixing broken tests / Total development hours
Target: < 10%
```
**5. Code Coverage**
```
- Unit test coverage: 70-90% (focusing on business logic)
- Integration coverage: Component boundaries covered
- E2E coverage: Critical paths covered
```
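Coverage targets hold up better when CI enforces them; one option with pytest-cov (the 80% threshold is an example, not a universal recommendation):
```bash
# Fail the PR pipeline if line coverage of src/ drops below 80%
pytest tests/unit/ --cov=src --cov-fail-under=80
```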
### Dashboarding and Tracking
**Example Metrics Dashboard**
```python
# scripts/test_metrics.py
import json
from datetime import datetime

def calculate_test_metrics():
    # The get_*/count_*/calculate_* helpers are project-specific collectors
    metrics = {
        'timestamp': datetime.now().isoformat(),
        'execution_time': {
            'unit_tests': get_unit_test_time(),
            'integration_tests': get_integration_test_time(),
            'e2e_tests': get_e2e_test_time()
        },
        'flakiness': {
            'flaky_tests': count_flaky_tests(),
            'total_tests': count_total_tests(),
            'flakiness_rate': calculate_flakiness_rate()
        },
        'coverage': {
            'line_coverage': get_line_coverage(),
            'branch_coverage': get_branch_coverage()
        },
        'failures': {
            'failed_builds_last_week': count_failed_builds(days=7),
            'flaky_failures_last_week': count_flaky_failures(days=7)
        }
    }

    with open('test_metrics.json', 'w') as f:
        json.dump(metrics, f, indent=2)
    return metrics
```
---
## Summary: CI/CD Test Optimization Principles
1. **Layer feedback**: Fast local tests, comprehensive CI tests, exhaustive nightly tests
2. **Parallelize aggressively**: Use all available CPU and runner capacity
3. **Fail fast**: Stop on first failures for rapid feedback
4. **Cache dependencies**: Don't rebuild the world every time
5. **Quarantine flaky tests**: Don't let them poison the suite
6. **Monitor metrics**: Track execution time, flakiness, and defect detection
7. **Optimize continuously**: Test suites slow down over time—invest in speed
**Goal**: Developers get feedback in seconds locally, minutes in PR, comprehensive validation post-merge, and exhaustive testing nightly—all without flaky failures eroding trust.