
CI/CD Test Optimization and Feedback Loop Strategies

Structuring Test Suites for Fast Feedback

Modern development velocity depends on rapid feedback from automated tests. The key is structuring test execution to provide the right feedback at the right time in the development workflow.

The Feedback Loop Hierarchy

Goal: Provide the fastest feedback for the most common operations, and comprehensive testing at the critical gates.

Pre-Commit Hook      →  Ultra-fast (< 30s)   →  Run constantly
Pre-Push Hook        →  Fast (2-5 min)       →  Before pushing code
PR Pipeline          →  Quick (10-15 min)    →  For every PR
Post-Merge Pipeline  →  Moderate (20-45 min) →  After merging
Nightly Build        →  Comprehensive (hrs)  →  Daily deep validation

Pre-Commit Testing (< 30 seconds)

Purpose: Catch obvious issues before code is committed, maintaining flow state.

What to Run:

  • Linting and code formatting
  • Type checking (if applicable)
  • Fast unit tests for changed files only
  • Quick static analysis

Example Git Hook

#!/bin/bash
# .git/hooks/pre-commit

# Collect staged Python files once; skip the checks if none are staged
CHANGED_PY=$(git diff --cached --name-only --diff-filter=ACM | grep '\.py$')
if [ -z "$CHANGED_PY" ]; then
    echo "No Python files staged. Skipping pre-commit checks."
    exit 0
fi

# Run linter
echo "Running linter..."
pylint $CHANGED_PY
if [ $? -ne 0 ]; then
    echo "Linting failed. Commit aborted."
    exit 1
fi

# Run type checker
echo "Running type checker..."
mypy $CHANGED_PY
if [ $? -ne 0 ]; then
    echo "Type checking failed. Commit aborted."
    exit 1
fi

# Run fast unit tests matching the changed module names
# (-m quick uses the marker defined in the configuration below)
echo "Running tests..."
KEXPR=$(echo "$CHANGED_PY" | xargs -n1 basename | sed 's/\.py$//' | paste -sd, - | sed 's/,/ or /g')
pytest -m quick tests/unit/ -k "$KEXPR"
if [ $? -ne 0 ]; then
    echo "Tests failed. Commit aborted."
    exit 1
fi

echo "Pre-commit checks passed!"

Configuration

# pytest.ini
[pytest]
markers =
    quick: marks tests as quick (< 100ms each)

# In test files
import pytest

@pytest.mark.quick
def test_calculate_total():
    assert calculate_total([1, 2, 3]) == 6

Trade-offs:

  • Immediate feedback prevents committing broken code
  • Maintains developer flow state
  • Can slow down commits if not truly fast
  • May be bypassed with --no-verify if too restrictive
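A related practical concern is distribution: hooks in .git/hooks are not versioned, so each developer must opt in. One common approach (a sketch, assuming the hooks are kept in a tracked .githooks/ directory) is to point Git at that directory once per clone:

# Keep hooks in the repository and register them once per clone
git config core.hooksPath .githooks
chmod +x .githooks/pre-commit .githooks/pre-push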

Pre-Push Testing (2-5 minutes)

Purpose: Run comprehensive unit tests before code reaches CI, catching bugs locally.

What to Run:

  • All unit tests (should complete in 2-5 minutes)
  • Security scanning of dependencies
  • More thorough static analysis
  • License compliance checks

Example Git Hook

#!/bin/bash
# .git/hooks/pre-push

echo "Running full unit test suite..."
pytest tests/unit/ --maxfail=5 --tb=short

if [ $? -ne 0 ]; then
    echo "Unit tests failed. Push aborted."
    echo "Run 'pytest tests/unit/' to see details."
    exit 1
fi

echo "Checking for security vulnerabilities..."
safety check

if [ $? -ne 0 ]; then
    echo "Security vulnerabilities detected. Push aborted."
    exit 1
fi

echo "Pre-push checks passed!"

Optimization Techniques:

# conftest.py - Optimize test setup
import pytest

@pytest.fixture(scope="session")
def expensive_setup():
    """Run once per test session, not per test."""
    resource = create_expensive_resource()
    yield resource
    resource.cleanup()

# Run tests in parallel across CPU cores (requires the pytest-xdist plugin)
# pytest -n auto tests/unit/

Pull Request Pipeline (10-15 minutes)

Purpose: Verify PR quality before merge, providing comprehensive feedback without excessive wait.

What to Run:

  • All unit tests (parallelized)
  • Critical integration tests
  • Code coverage analysis
  • Static security analysis (SAST)
  • Dependency vulnerability scanning
  • Documentation generation
  • Container image build (if applicable)

Example GitHub Actions Workflow

name: Pull Request Tests

on:
  pull_request:
    branches: [main, develop]

jobs:
  unit-tests:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ['3.9', '3.10', '3.11']

    steps:
      - uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}

      - name: Cache dependencies
        uses: actions/cache@v3
        with:
          path: ~/.cache/pip
          key: ${{ runner.os }}-pip-${{ hashFiles('**/requirements.txt') }}

      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install pytest pytest-cov pytest-xdist

      - name: Run unit tests
        run: pytest tests/unit/ -n auto --cov=src --cov-report=xml

      - name: Upload coverage
        uses: codecov/codecov-action@v3

  integration-tests:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:15
        env:
          POSTGRES_PASSWORD: postgres
        ports:
          - 5432:5432
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5

    steps:
      - uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: pip install -r requirements.txt pytest

      - name: Run critical integration tests
        run: pytest tests/integration/ -m "critical"

  security-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Run security scan
        uses: snyk/actions/python@master
        env:
          SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}

Optimization Strategies:

1. Parallel Execution

# Run test suites in parallel
jobs:
  unit-tests:
    # ... as above

  integration-tests:
    # ... runs simultaneously

  lint:
    # ... runs simultaneously

2. Fail Fast

# Stop after first few failures
pytest --maxfail=3

# Or stop on first failure for quick feedback
pytest -x

3. Smart Test Selection

# Only run tests affected by code changes (requires the pytest-testmon plugin)
pytest --testmon

# Or run tests related to unstaged/branch changes (requires pytest-picked)
pytest --picked

4. Caching

- name: Cache dependencies
  uses: actions/cache@v3
  with:
    path: |
      ~/.cache/pip
      ~/.npm
      node_modules
    key: ${{ runner.os }}-deps-${{ hashFiles('**/requirements.txt', '**/package-lock.json') }}

Post-Merge Pipeline (20-45 minutes)

Purpose: Run comprehensive test suite on main branch, catching integration issues.

What to Run:

  • All unit tests
  • All integration tests
  • Critical E2E tests
  • Performance regression tests
  • Database migration tests
  • Cross-browser testing (subset)
  • Container security scanning

Example Configuration

name: Post-Merge Comprehensive Tests

on:
  push:
    branches: [main]

jobs:
  comprehensive-tests:
    runs-on: ubuntu-latest
    timeout-minutes: 45

    steps:
      - uses: actions/checkout@v3

      - name: Run all tests
        run: |
          pytest tests/ \
            --cov=src \
            --cov-report=html \
            --cov-report=term \
            --junitxml=test-results.xml

      - name: Run E2E tests
        run: pytest tests/e2e/ -m "critical"

      - name: Performance tests
        run: pytest tests/performance/ --benchmark-only

      - name: Publish test results
        uses: EnricoMi/publish-unit-test-result-action@v2
        if: always()
        with:
          files: test-results.xml

  deploy-staging:
    needs: comprehensive-tests
    runs-on: ubuntu-latest
    steps:
      - name: Deploy to staging
        run: ./scripts/deploy-staging.sh
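The post-merge checklist also calls out container security scanning, which the workflow above omits; a minimal additional job using the Trivy action is sketched below (the image tag myapp:latest and the build step are assumptions):

  container-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Build image
        run: docker build -t myapp:latest .

      - name: Scan image for vulnerabilities
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: myapp:latest
          severity: HIGH,CRITICAL
          exit-code: '1'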

Nightly/Scheduled Builds (Hours)

Purpose: Exhaustive testing including expensive, slow, or less critical tests.

What to Run:

  • Full E2E test suite
  • Cross-browser testing (all browsers)
  • Performance and load tests
  • Security penetration testing
  • Accessibility testing
  • Visual regression testing
  • Long-running integration tests
  • Compatibility matrix testing

Example Schedule

name: Nightly Comprehensive Tests

on:
  schedule:
    - cron: '0 2 * * *'  # 2 AM daily
  workflow_dispatch:  # Manual trigger

jobs:
  full-e2e-suite:
    runs-on: ubuntu-latest
    timeout-minutes: 180

    steps:
      - uses: actions/checkout@v3

      - name: Run full E2E suite
        run: pytest tests/e2e/ --headed --slowmo=50

  cross-browser-tests:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        browser: [chrome, firefox, safari, edge]

    steps:
      - uses: actions/checkout@v3

      - name: Run tests on ${{ matrix.browser }}
        run: pytest tests/e2e/ --browser=${{ matrix.browser }}

  performance-tests:
    runs-on: ubuntu-latest
    steps:
      - name: Load testing
        run: locust -f tests/performance/locustfile.py --headless

  security-scan:
    runs-on: ubuntu-latest
    steps:
      - name: DAST scanning
        run: zap-baseline.py -t https://staging.example.com
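The nightly checklist also lists accessibility testing, which the workflow above does not cover; one hedged option is a job that audits the staging site with the pa11y CLI (pa11y and the reuse of the staging URL are assumptions, not part of the original pipeline):

  accessibility-tests:
    runs-on: ubuntu-latest
    steps:
      - name: Audit staging with pa11y
        run: |
          npm install -g pa11y
          pa11y https://staging.example.com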

Parallel Execution and Test Sharding

Test Parallelization Strategies

1. Process-Level Parallelization

# pytest with xdist
pytest -n auto  # Auto-detect CPU count
pytest -n 4     # Use 4 processes

# With load balancing
pytest -n auto --dist loadscope

2. CI Matrix Parallelization

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        shard: [1, 2, 3, 4]

    steps:
      - name: Run shard ${{ matrix.shard }}
        run: |
          # --splits / --group come from the pytest-split plugin
          pytest tests/ \
            --splits 4 \
            --group ${{ matrix.shard }}

3. Test File Sharding

# scripts/shard_tests.py
import glob
import os
import subprocess
import sys

def shard_tests(test_files, shard_index, total_shards):
    """Distribute test files across shards round-robin."""
    return [f for i, f in enumerate(test_files)
            if i % total_shards == shard_index]

if __name__ == "__main__":
    shard_index = int(os.environ.get('SHARD_INDEX', 0))
    total_shards = int(os.environ.get('TOTAL_SHARDS', 1))

    all_tests = sorted(glob.glob('tests/**/test_*.py', recursive=True))
    selected = shard_tests(all_tests, shard_index, total_shards)

    sys.exit(subprocess.call(['pytest', *selected]))

Ensuring Test Independence

Requirements for Parallel Execution:

  • Tests must not share state
  • Tests must not depend on execution order
  • Tests must use unique data (e.g., unique emails, IDs)
  • Tests must isolate database operations

Database Isolation Strategies

import pytest
from uuid import uuid4
from sqlalchemy.orm import Session
# `engine` and `User` come from the application under test

# Strategy 1: Transaction rollback per test
@pytest.fixture
def db_session():
    connection = engine.connect()
    transaction = connection.begin()
    session = Session(bind=connection)
    yield session
    session.close()
    transaction.rollback()
    connection.close()

# Strategy 2: Unique database per worker (the worker_id fixture comes from pytest-xdist)
@pytest.fixture(scope="session")
def db_url(worker_id):
    if worker_id == "master":
        return "postgresql://localhost/test_db"
    return f"postgresql://localhost/test_db_{worker_id}"

# Strategy 3: Unique test data per test
@pytest.fixture
def user():
    unique_email = f"test-{uuid4()}@example.com"
    user = User.create(email=unique_email)
    yield user
    user.delete()

Flaky Test Management

Flaky tests—those that pass or fail non-deterministically—are worse than no tests. They train developers to ignore failures and erode trust.

Detecting Flaky Tests

1. Automatic Detection

# Run tests multiple times to detect flakiness
# (--count comes from pytest-repeat; --randomly-seed from pytest-randomly)
pytest tests/ --count=10 --randomly-seed=12345

# Or use pytest-flakefinder
pytest --flake-finder --flake-runs=50

2. CI-Based Detection

jobs:
  flaky-test-detection:
    runs-on: ubuntu-latest
    if: github.event_name == 'schedule'

    steps:
      - name: Run tests 20 times
        run: |
          for i in {1..20}; do
            pytest tests/ --junitxml=results/run-$i.xml || true
          done

      - name: Analyze results
        run: python scripts/detect_flaky_tests.py results/

3. Tracking Test History

# scripts/detect_flaky_tests.py
import sys
import xml.etree.ElementTree as ET
from collections import defaultdict
from pathlib import Path

def analyze_test_runs(result_files):
    test_results = defaultdict(list)

    for file in result_files:
        tree = ET.parse(file)
        for testcase in tree.findall('.//testcase'):
            test_name = f"{testcase.get('classname')}.{testcase.get('name')}"
            # A test passed if it recorded neither a failure nor an error
            passed = (testcase.find('failure') is None
                      and testcase.find('error') is None)
            test_results[test_name].append(passed)

    flaky_tests = []
    for test, results in test_results.items():
        # Flaky = at least one pass and at least one failure across runs
        if not all(results) and any(results):
            pass_rate = sum(results) / len(results)
            flaky_tests.append((test, pass_rate))

    return sorted(flaky_tests, key=lambda x: x[1])

if __name__ == "__main__":
    results_dir = Path(sys.argv[1])
    for test, pass_rate in analyze_test_runs(sorted(results_dir.glob('*.xml'))):
        print(f"{pass_rate:.0%}  {test}")

Quarantine Process

Purpose: Isolate flaky tests without blocking development.

# Mark flaky tests
import pytest

@pytest.mark.flaky
def test_sometimes_fails():
    # Test with known flakiness
    pass

# pytest.ini
[pytest]
markers =
    flaky: marks tests as flaky (quarantined)

# Skip flaky tests in the PR pipeline
pytest tests/ -m "not flaky"

# Run only flaky tests in diagnostic mode (--reruns requires pytest-rerunfailures)
pytest tests/ -m "flaky" --reruns 5

Quarantine Workflow:

  1. Detect: Identify flaky test through automated detection or manual observation
  2. Mark: Add @pytest.mark.flaky marker
  3. Track: Create ticket to fix or remove flaky test
  4. Fix: Investigate root cause and stabilize
  5. Verify: Run the fixed test 50+ times to confirm stability (see the command sketch after this list)
  6. Release: Remove flaky marker and return to normal suite
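For step 5, the fix can be verified by re-running the single test repeatedly with pytest-flakefinder (introduced above); the test path here is illustrative:

pytest tests/unit/test_payments.py::test_sometimes_fails --flake-finder --flake-runs=50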

Common Flakiness Root Causes and Fixes

1. Timing Issues

# ❌ BAD: Fixed sleep
def test_async_operation():
    start_operation()
    time.sleep(2)  # Hope 2 seconds is enough
    assert operation_complete()

# ✅ GOOD: Conditional wait
def test_async_operation():
    start_operation()
    wait_for(lambda: operation_complete(), timeout=5)
    assert operation_complete()
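wait_for above is not a standard-library function; a minimal polling implementation, assuming the caller passes a zero-argument callable, could look like this:

import time

def wait_for(condition, timeout=5, interval=0.1):
    """Poll `condition` until it returns truthy or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    raise TimeoutError(f"Condition not met within {timeout} seconds")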

2. Shared State

# ❌ BAD: Global state
cache = {}

def test_cache_operation():
    cache['key'] = 'value'
    assert get_from_cache('key') == 'value'

# ✅ GOOD: Isolated state
@pytest.fixture
def cache():
    cache = {}
    yield cache
    cache.clear()

def test_cache_operation(cache):
    cache['key'] = 'value'
    assert cache['key'] == 'value'

3. External Service Dependencies

# ❌ BAD: Calling real API
def test_fetch_data():
    data = api_client.get('https://external-api.com/data')
    assert data is not None

# ✅ GOOD: Mock the external service with the `responses` library
import responses

@responses.activate
def test_fetch_data():
    responses.add(
        responses.GET,
        'https://external-api.com/data',
        json={'result': 'data'},
        status=200
    )
    data = api_client.get('https://external-api.com/data')
    assert data['result'] == 'data'

4. Test Order Dependencies

# ❌ BAD: Order-dependent tests
def test_create_user():
    global user
    user = create_user()

def test_update_user():
    update_user(user)  # Depends on test_create_user

# ✅ GOOD: Independent tests
@pytest.fixture
def user():
    user = create_user()
    yield user
    delete_user(user)

def test_create_user(user):
    assert user.id is not None

def test_update_user(user):
    update_user(user)
    assert user.updated_at is not None

Test Health Metrics

Track these metrics to maintain test suite health and identify areas for improvement.

Key Metrics

1. Defect Detection Efficiency (DDE)

DDE = Bugs Found by Tests / (Bugs Found by Tests + Bugs Found in Production)

Target: > 90%

Track monthly to identify test coverage gaps. For example, 45 bugs caught by tests against 5 found in production gives a DDE of 45 / 50 = 90%.

2. Test Execution Time

- Unit tests: < 5 minutes
- Integration tests: < 15 minutes
- Full suite: < 45 minutes
- Nightly comprehensive: < 3 hours

3. Flakiness Rate

Flakiness Rate = Flaky Test Runs / Total Test Runs

Target: < 5%

4. Test Maintenance Burden

Hours spent fixing broken tests / Total development hours

Target: < 10%

5. Code Coverage

- Unit test coverage: 70-90% (focusing on business logic)
- Integration coverage: Component boundaries covered
- E2E coverage: Critical paths covered

Dashboarding and Tracking

Example Metrics Dashboard

# scripts/test_metrics.py
import json
from datetime import datetime

# The get_* / count_* / calculate_* helpers below are placeholders for whatever
# sources hold your CI data (JUnit XML files, coverage reports, build API).
def calculate_test_metrics():
    metrics = {
        'timestamp': datetime.now().isoformat(),
        'execution_time': {
            'unit_tests': get_unit_test_time(),
            'integration_tests': get_integration_test_time(),
            'e2e_tests': get_e2e_test_time()
        },
        'flakiness': {
            'flaky_tests': count_flaky_tests(),
            'total_tests': count_total_tests(),
            'flakiness_rate': calculate_flakiness_rate()
        },
        'coverage': {
            'line_coverage': get_line_coverage(),
            'branch_coverage': get_branch_coverage()
        },
        'failures': {
            'failed_builds_last_week': count_failed_builds(days=7),
            'flaky_failures_last_week': count_flaky_failures(days=7)
        }
    }

    with open('test_metrics.json', 'w') as f:
        json.dump(metrics, f, indent=2)

    return metrics
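One way to wire this into the nightly workflow above is a final step that runs the script and publishes the JSON as a build artifact; the step names are illustrative:

      - name: Collect test health metrics
        run: python scripts/test_metrics.py

      - name: Upload metrics
        uses: actions/upload-artifact@v3
        with:
          name: test-metrics
          path: test_metrics.json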

Summary: CI/CD Test Optimization Principles

  1. Layer feedback: Fast local tests, comprehensive CI tests, exhaustive nightly tests
  2. Parallelize aggressively: Use all available CPU and runner capacity
  3. Fail fast: Stop on first failures for rapid feedback
  4. Cache dependencies: Don't rebuild the world every time
  5. Quarantine flaky tests: Don't let them poison the suite
  6. Monitor metrics: Track execution time, flakiness, and defect detection
  7. Optimize continuously: Test suites slow down over time—invest in speed

Goal: Developers get feedback in seconds locally, minutes in PR, comprehensive validation post-merge, and exhaustive testing nightly—all without flaky failures eroding trust.