# CI/CD Test Optimization and Feedback Loop Strategies
## Structuring Test Suites for Fast Feedback
Modern development velocity depends on rapid feedback from automated tests. The key is structuring test execution to provide the right feedback at the right time in the development workflow.
### The Feedback Loop Hierarchy
**Goal**: Provide fastest feedback for most common operations, comprehensive testing for critical gates.
```
Pre-Commit Hook → Ultra-fast (< 30s) → Run constantly
Pre-Push Hook → Fast (2-5 min) → Before pushing code
PR Pipeline → Quick (10-15 min) → For every PR
Post-Merge Pipeline → Moderate (20-45 min) → After merging
Nightly Build → Comprehensive (hrs) → Daily deep validation
```
### Pre-Commit Testing (< 30 seconds)
**Purpose**: Catch obvious issues before code is committed, maintaining flow state.
**What to Run**:
- Linting and code formatting
- Type checking (if applicable)
- Fast unit tests for changed files only
- Quick static analysis
**Example Git Hook**
```bash
#!/bin/bash
# .git/hooks/pre-commit

# Collect staged Python files (added, copied, or modified)
STAGED_PY=$(git diff --cached --name-only --diff-filter=ACM | grep '\.py$')
if [ -z "$STAGED_PY" ]; then
    exit 0  # Nothing to check
fi

# Run linter
echo "Running linter..."
pylint $STAGED_PY
if [ $? -ne 0 ]; then
    echo "Linting failed. Commit aborted."
    exit 1
fi

# Run type checker
echo "Running type checker..."
mypy $STAGED_PY
if [ $? -ne 0 ]; then
    echo "Type checking failed. Commit aborted."
    exit 1
fi

# Run fast unit tests (marked "quick"; see pytest.ini below).
# Narrow further with -k to target only the changed modules if needed.
echo "Running tests..."
pytest tests/unit/ -m quick
if [ $? -ne 0 ]; then
    echo "Tests failed. Commit aborted."
    exit 1
fi

echo "Pre-commit checks passed!"
```
**Configuration**
```ini
# pytest.ini
[pytest]
markers =
    quick: marks tests as quick (< 100ms each)
```

```python
# In test files
import pytest

@pytest.mark.quick
def test_calculate_total():
    assert calculate_total([1, 2, 3]) == 6
```
**Trade-offs**:
- ✅ Immediate feedback prevents committing broken code
- ✅ Maintains developer flow state
- ❌ Can slow down commits if not truly fast
- ❌ May be bypassed with `--no-verify` if too restrictive (a shared hook configuration, sketched below, mitigates this)
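One common mitigation is to manage hooks with the pre-commit framework instead of hand-edited scripts in `.git/hooks/`, so every developer runs the same checks. A minimal sketch of a `.pre-commit-config.yaml`, assuming the pylint/mypy/pytest setup above; the `rev` value is illustrative and should be pinned to a real release:
```yaml
# .pre-commit-config.yaml (sketch; pin rev to the release your team standardizes on)
repos:
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.10.0
    hooks:
      - id: mypy
  - repo: local
    hooks:
      - id: pylint
        name: pylint
        entry: pylint
        language: system
        types: [python]
      - id: quick-tests
        name: quick unit tests
        entry: pytest tests/unit/ -m quick
        language: system
        pass_filenames: false
```
Developers run `pre-commit install` once; after that, the same hooks run on every commit across the team.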
### Pre-Push Testing (2-5 minutes)
**Purpose**: Run comprehensive unit tests before code reaches CI, catching bugs locally.
**What to Run**:
- All unit tests (should complete in 2-5 minutes)
- Security scanning of dependencies
- More thorough static analysis
- License compliance checks (see the sketch after the hook example)
**Example Git Hook**
```bash
#!/bin/bash
# .git/hooks/pre-push

echo "Running full unit test suite..."
pytest tests/unit/ --maxfail=5 --tb=short
if [ $? -ne 0 ]; then
    echo "Unit tests failed. Push aborted."
    echo "Run 'pytest tests/unit/' to see details."
    exit 1
fi

echo "Checking for security vulnerabilities..."
safety check
if [ $? -ne 0 ]; then
    echo "Security vulnerabilities detected. Push aborted."
    exit 1
fi

echo "Pre-push checks passed!"
```
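The "License compliance checks" item above is not covered by the hook; one option is the `pip-licenses` tool, which can fail the run when a dependency's license falls outside an allow-list. A sketch to append to the pre-push hook (the allow-list below is only an example policy):
```bash
echo "Checking dependency licenses..."
# --allow-only takes a ;-separated list of acceptable licenses (example policy)
pip-licenses --allow-only="MIT License;BSD License;Apache Software License"
if [ $? -ne 0 ]; then
    echo "Disallowed dependency license detected. Push aborted."
    exit 1
fi
```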
**Optimization Techniques**:
```python
# conftest.py - Optimize test setup
import pytest

@pytest.fixture(scope="session")
def expensive_setup():
    """Run once per test session, not per test."""
    resource = create_expensive_resource()
    yield resource
    resource.cleanup()

# Run tests in parallel (requires pytest-xdist):
# pytest -n auto tests/unit/
```
### Pull Request Pipeline (10-15 minutes)
**Purpose**: Verify PR quality before merge, providing comprehensive feedback without excessive wait.
**What to Run**:
- All unit tests (parallelized)
- Critical integration tests
- Code coverage analysis
- Static security analysis (SAST)
- Dependency vulnerability scanning
- Documentation generation
- Container image build (if applicable)
**Example GitHub Actions Workflow**
```yaml
name: Pull Request Tests

on:
  pull_request:
    branches: [main, develop]

jobs:
  unit-tests:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.9", "3.10", "3.11"]
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}
      - name: Cache dependencies
        uses: actions/cache@v3
        with:
          path: ~/.cache/pip
          key: ${{ runner.os }}-pip-${{ hashFiles('**/requirements.txt') }}
      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install pytest pytest-cov pytest-xdist
      - name: Run unit tests
        run: pytest tests/unit/ -n auto --cov=src --cov-report=xml
      - name: Upload coverage
        uses: codecov/codecov-action@v3

  integration-tests:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:15
        env:
          POSTGRES_PASSWORD: postgres
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
    steps:
      - uses: actions/checkout@v3
      - name: Run critical integration tests
        run: pytest tests/integration/ -m "critical"

  security-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Run security scan
        uses: snyk/actions/python@master
        env:
          SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}
```
**Optimization Strategies**:
**1. Parallel Execution**
```yaml
# Run test suites in parallel
jobs:
  unit-tests:
    # ... as above
  integration-tests:
    # ... runs simultaneously
  lint:
    # ... runs simultaneously
```
**2. Fail Fast**
```bash
# Stop after first few failures
pytest --maxfail=3
# Or stop on first failure for quick feedback
pytest -x
```
**3. Smart Test Selection**
```bash
# Only run tests affected by changes
pytest --testmon
# Or use pytest-picked
pytest --picked
```
**4. Caching**
```yaml
- name: Cache dependencies
uses: actions/cache@v3
with:
path: |
~/.cache/pip
~/.npm
node_modules
key: ${{ runner.os }}-deps-${{ hashFiles('**/requirements.txt', '**/package-lock.json') }}
```
### Post-Merge Pipeline (20-45 minutes)
**Purpose**: Run comprehensive test suite on main branch, catching integration issues.
**What to Run**:
- All unit tests
- All integration tests
- Critical E2E tests
- Performance regression tests
- Database migration tests (see the sketch after the example configuration)
- Cross-browser testing (subset)
- Container security scanning
**Example Configuration**
```yaml
name: Post-Merge Comprehensive Tests

on:
  push:
    branches: [main]

jobs:
  comprehensive-tests:
    runs-on: ubuntu-latest
    timeout-minutes: 45
    steps:
      - uses: actions/checkout@v3
      - name: Run all tests
        run: |
          pytest tests/ \
            --cov=src \
            --cov-report=html \
            --cov-report=term \
            --junitxml=test-results.xml
      - name: Run E2E tests
        run: pytest tests/e2e/ -m "critical"
      - name: Performance tests
        run: pytest tests/performance/ --benchmark-only
      - name: Publish test results
        uses: EnricoMi/publish-unit-test-result-action@v2
        if: always()
        with:
          files: test-results.xml

  deploy-staging:
    needs: comprehensive-tests
    runs-on: ubuntu-latest
    steps:
      - name: Deploy to staging
        run: ./scripts/deploy-staging.sh
```
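The "Database migration tests" item from the list above is not shown in the example. A minimal sketch of one additional job to place under `jobs:`, assuming the project uses Alembic and that its migration environment reads `DATABASE_URL` (both assumptions; adapt to your migration tooling):
```yaml
  migration-tests:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:15
        env:
          POSTGRES_PASSWORD: postgres
        ports:
          - 5432:5432
    steps:
      - uses: actions/checkout@v3
      - name: Apply and roll back migrations
        env:
          DATABASE_URL: postgresql://postgres:postgres@localhost:5432/postgres
        run: |
          alembic upgrade head     # every migration applies cleanly
          alembic downgrade base   # and can be rolled back
          alembic upgrade head
```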
### Nightly/Scheduled Builds (Hours)
**Purpose**: Exhaustive testing including expensive, slow, or less critical tests.
**What to Run**:
- Full E2E test suite
- Cross-browser testing (all browsers)
- Performance and load tests
- Security penetration testing
- Accessibility testing
- Visual regression testing
- Long-running integration tests
- Compatibility matrix testing
**Example Schedule**
```yaml
name: Nightly Comprehensive Tests

on:
  schedule:
    - cron: '0 2 * * *'  # 2 AM daily
  workflow_dispatch:     # Manual trigger

jobs:
  full-e2e-suite:
    runs-on: ubuntu-latest
    timeout-minutes: 180
    steps:
      - uses: actions/checkout@v3
      - name: Run full E2E suite
        run: pytest tests/e2e/ --headed --slowmo=50

  cross-browser-tests:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        browser: [chrome, firefox, safari, edge]
    steps:
      - name: Run tests on ${{ matrix.browser }}
        run: pytest tests/e2e/ --browser=${{ matrix.browser }}

  performance-tests:
    runs-on: ubuntu-latest
    steps:
      - name: Load testing
        run: locust -f tests/performance/locustfile.py --headless

  security-scan:
    runs-on: ubuntu-latest
    steps:
      - name: DAST scanning
        run: zap-baseline.py -t https://staging.example.com
```
---
## Parallel Execution and Test Sharding
### Test Parallelization Strategies
**1. Process-Level Parallelization**
```bash
# pytest with xdist
pytest -n auto # Auto-detect CPU count
pytest -n 4 # Use 4 processes
# With load balancing
pytest -n auto --dist loadscope
```
**2. CI Matrix Parallelization**
```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        shard: [1, 2, 3, 4]
    steps:
      - name: Run shard ${{ matrix.shard }}
        # --splits/--group come from the pytest-split plugin
        run: |
          pytest tests/ \
            --splits 4 \
            --group ${{ matrix.shard }}
```
**3. Test File Sharding**
```python
# scripts/shard_tests.py
import os

def shard_tests(test_files, shard_index, total_shards):
    """Distribute test files across shards round-robin."""
    return [f for i, f in enumerate(test_files)
            if i % total_shards == shard_index]

if __name__ == "__main__":
    shard_index = int(os.environ.get('SHARD_INDEX', 0))
    total_shards = int(os.environ.get('TOTAL_SHARDS', 1))
    # collect_all_test_files() and run_pytest() are project-specific helpers
    all_tests = collect_all_test_files()
    selected = shard_tests(all_tests, shard_index, total_shards)
    run_pytest(selected)
```
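Example invocation, matching the environment variables the script reads (indices are 0-based):
```bash
# Run shard 2 of 4
TOTAL_SHARDS=4 SHARD_INDEX=1 python scripts/shard_tests.py
```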
### Ensuring Test Independence
**Requirements for Parallel Execution**:
- Tests must not share state
- Tests must not depend on execution order
- Tests must use unique data (e.g., unique emails, IDs)
- Tests must isolate database operations
**Database Isolation Strategies**
```python
import pytest
from uuid import uuid4

# Strategy 1: Transaction rollback per test
# (engine and Session are assumed to come from the project's SQLAlchemy setup)
@pytest.fixture
def db_session():
    connection = engine.connect()
    transaction = connection.begin()
    session = Session(bind=connection)
    yield session
    session.close()
    transaction.rollback()
    connection.close()

# Strategy 2: Unique database per worker (worker_id is provided by pytest-xdist)
@pytest.fixture(scope="session")
def db_url(worker_id):
    if worker_id == "master":
        return "postgresql://localhost/test_db"
    return f"postgresql://localhost/test_db_{worker_id}"

# Strategy 3: Unique test data per test
@pytest.fixture
def user():
    unique_email = f"test-{uuid4()}@example.com"
    user = User.create(email=unique_email)
    yield user
    user.delete()
```
---
## Flaky Test Management
Flaky tests—those that pass or fail non-deterministically—are worse than no tests. They train developers to ignore failures and erode trust.
### Detecting Flaky Tests
**1. Automatic Detection**
```bash
# Run tests multiple times to detect flakiness
# (--count comes from pytest-repeat, --randomly-seed from pytest-randomly)
pytest tests/ --count=10 --randomly-seed=12345
# Or use pytest-flakefinder
pytest --flake-finder --flake-runs=50
```
**2. CI-Based Detection**
```yaml
jobs:
  flaky-test-detection:
    runs-on: ubuntu-latest
    if: github.event_name == 'schedule'
    steps:
      - name: Run tests 20 times
        run: |
          for i in {1..20}; do
            pytest tests/ --junitxml=results/run-$i.xml || true
          done
      - name: Analyze results
        run: python scripts/detect_flaky_tests.py results/
```
**3. Tracking Test History**
```python
# scripts/detect_flaky_tests.py
import xml.etree.ElementTree as ET
from collections import defaultdict

def analyze_test_runs(result_files):
    test_results = defaultdict(list)
    for file in result_files:
        tree = ET.parse(file)
        for testcase in tree.findall('.//testcase'):
            test_name = f"{testcase.get('classname')}.{testcase.get('name')}"
            passed = testcase.find('failure') is None
            test_results[test_name].append(passed)

    flaky_tests = []
    for test, results in test_results.items():
        if not all(results) and any(results):
            pass_rate = sum(results) / len(results)
            flaky_tests.append((test, pass_rate))
    return sorted(flaky_tests, key=lambda x: x[1])
```
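The CI job above calls the script with a results directory, so it also needs an entry point; a minimal sketch (the report format and exit-code convention are assumptions):
```python
# Appended to scripts/detect_flaky_tests.py
import sys
from pathlib import Path

if __name__ == "__main__":
    results_dir = Path(sys.argv[1] if len(sys.argv) > 1 else "results")
    flaky = analyze_test_runs(sorted(results_dir.glob("*.xml")))
    for test, pass_rate in flaky:
        print(f"{pass_rate:6.1%}  {test}")
    # Non-zero exit lets the scheduled job flag newly flaky tests
    sys.exit(1 if flaky else 0)
```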
### Quarantine Process
**Purpose**: Isolate flaky tests without blocking development.
```python
# Mark flaky tests
import pytest

@pytest.mark.flaky
def test_sometimes_fails():
    # Test with known flakiness
    pass
```

```ini
# pytest.ini
[pytest]
markers =
    flaky: marks tests as flaky (quarantined)
```

```bash
# Skip flaky tests in PR pipeline
pytest tests/ -m "not flaky"
# Run only flaky tests in diagnostic mode (--reruns requires pytest-rerunfailures)
pytest tests/ -m "flaky" --reruns 5
```
**Quarantine Workflow**:
1. **Detect**: Identify flaky test through automated detection or manual observation
2. **Mark**: Add `@pytest.mark.flaky` marker
3. **Track**: Create ticket to fix or remove flaky test
4. **Fix**: Investigate root cause and stabilize
5. **Verify**: Run the fixed test 50+ times to confirm stability (see the sketch after this list)
6. **Release**: Remove flaky marker and return to normal suite
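One way to perform the verification step, assuming the pytest-repeat plugin is installed (the test path is illustrative):
```bash
# Re-run the previously flaky test 100 times; -x stops at the first failure
pytest tests/unit/test_payments.py::test_retry_on_timeout --count=100 -x
```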
### Common Flakiness Root Causes and Fixes
**1. Timing Issues**
```python
# ❌ BAD: Fixed sleep
def test_async_operation():
    start_operation()
    time.sleep(2)  # Hope 2 seconds is enough
    assert operation_complete()

# ✅ GOOD: Conditional wait
def test_async_operation():
    start_operation()
    wait_for(lambda: operation_complete(), timeout=5)
    assert operation_complete()
```
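`wait_for` above is not part of pytest or the standard library; a minimal polling helper, written as an assumption about the intended behavior:
```python
import time

def wait_for(condition, timeout=5.0, interval=0.05):
    """Poll `condition` until it returns truthy or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return
        time.sleep(interval)
    raise TimeoutError(f"Condition not met within {timeout} seconds")
```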
**2. Shared State**
```python
# ❌ BAD: Global state
cache = {}

def test_cache_operation():
    cache['key'] = 'value'
    assert get_from_cache('key') == 'value'

# ✅ GOOD: Isolated state
@pytest.fixture
def cache():
    cache = {}
    yield cache
    cache.clear()

def test_cache_operation(cache):
    cache['key'] = 'value'
    assert cache['key'] == 'value'
```
**3. External Service Dependencies**
```python
# ❌ BAD: Calling real API
def test_fetch_data():
    data = api_client.get('https://external-api.com/data')
    assert data is not None

# ✅ GOOD: Mock external service (using the `responses` library)
@responses.activate
def test_fetch_data():
    responses.add(
        responses.GET,
        'https://external-api.com/data',
        json={'result': 'data'},
        status=200
    )
    data = api_client.get('https://external-api.com/data')
    assert data['result'] == 'data'
```
**4. Test Order Dependencies**
```python
# ❌ BAD: Order-dependent tests
def test_create_user():
    global user
    user = create_user()

def test_update_user():
    update_user(user)  # Depends on test_create_user

# ✅ GOOD: Independent tests
@pytest.fixture
def user():
    user = create_user()
    yield user
    delete_user(user)

def test_create_user(user):
    assert user.id is not None

def test_update_user(user):
    update_user(user)
    assert user.updated_at is not None
```
---
## Test Health Metrics
Track these metrics to maintain test suite health and identify areas for improvement.
### Key Metrics
**1. Defect Detection Efficiency (DDE)**
```
DDE = Bugs Found by Tests / (Bugs Found by Tests + Bugs Found in Production)
Target: > 90%
```
Track monthly to identify test coverage gaps.
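A small sketch of the arithmetic (the bug counts are illustrative):
```python
def defect_detection_efficiency(found_by_tests: int, found_in_production: int) -> float:
    """DDE = bugs caught by tests / all bugs found in the period."""
    total = found_by_tests + found_in_production
    return found_by_tests / total if total else 1.0

# Example: 47 bugs caught by the suite, 3 escaped to production
print(f"DDE: {defect_detection_efficiency(47, 3):.0%}")  # DDE: 94%
```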
**2. Test Execution Time**
```
- Unit tests: < 5 minutes
- Integration tests: < 15 minutes
- Full suite: < 45 minutes
- Nightly comprehensive: < 3 hours
```
**3. Flakiness Rate**
```
Flakiness Rate = Flaky Test Runs / Total Test Runs
Target: < 5%
```
**4. Test Maintenance Burden**
```
Hours spent fixing broken tests / Total development hours
Target: < 10%
```
**5. Code Coverage**
```
- Unit test coverage: 70-90% (focusing on business logic)
- Integration coverage: Component boundaries covered
- E2E coverage: Critical paths covered
```
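Coverage targets hold up better when CI enforces them; one option with pytest-cov (the 80% threshold is an example, not a universal recommendation):
```bash
# Fail the PR pipeline if line coverage of src/ drops below 80%
pytest tests/unit/ --cov=src --cov-fail-under=80
```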
### Dashboarding and Tracking
**Example Metrics Dashboard**
```python
# scripts/test_metrics.py
import json
from datetime import datetime

def calculate_test_metrics():
    # The get_*/count_*/calculate_* helpers are project-specific collectors
    metrics = {
        'timestamp': datetime.now().isoformat(),
        'execution_time': {
            'unit_tests': get_unit_test_time(),
            'integration_tests': get_integration_test_time(),
            'e2e_tests': get_e2e_test_time()
        },
        'flakiness': {
            'flaky_tests': count_flaky_tests(),
            'total_tests': count_total_tests(),
            'flakiness_rate': calculate_flakiness_rate()
        },
        'coverage': {
            'line_coverage': get_line_coverage(),
            'branch_coverage': get_branch_coverage()
        },
        'failures': {
            'failed_builds_last_week': count_failed_builds(days=7),
            'flaky_failures_last_week': count_flaky_failures(days=7)
        }
    }

    with open('test_metrics.json', 'w') as f:
        json.dump(metrics, f, indent=2)
    return metrics
```
---
## Summary: CI/CD Test Optimization Principles
1. **Layer feedback**: Fast local tests, comprehensive CI tests, exhaustive nightly tests
2. **Parallelize aggressively**: Use all available CPU and runner capacity
3. **Fail fast**: Stop on first failures for rapid feedback
4. **Cache dependencies**: Don't rebuild the world every time
5. **Quarantine flaky tests**: Don't let them poison the suite
6. **Monitor metrics**: Track execution time, flakiness, and defect detection
7. **Optimize continuously**: Test suites slow down over time—invest in speed
**Goal**: Developers get feedback in seconds locally, minutes in PR, comprehensive validation post-merge, and exhaustive testing nightly—all without flaky failures eroding trust.