# CI/CD Test Optimization and Feedback Loop Strategies

## Structuring Test Suites for Fast Feedback

Modern development velocity depends on rapid feedback from automated tests. The key is structuring test execution so that it provides the right feedback at the right time in the development workflow.

### The Feedback Loop Hierarchy

**Goal**: Provide the fastest feedback for the most common operations, and comprehensive testing at critical gates.

```
Pre-Commit Hook      → Ultra-fast (< 30s)    → Run constantly
Pre-Push Hook        → Fast (2-5 min)        → Before pushing code
PR Pipeline          → Quick (10-15 min)     → For every PR
Post-Merge Pipeline  → Moderate (20-45 min)  → After merging
Nightly Build        → Comprehensive (hrs)   → Daily deep validation
```

### Pre-Commit Testing (< 30 seconds)

**Purpose**: Catch obvious issues before code is committed, maintaining flow state.

**What to Run**:
- Linting and code formatting
- Type checking (if applicable)
- Fast unit tests for changed files only
- Quick static analysis

**Example Git Hook**

```bash
#!/bin/bash
# .git/hooks/pre-commit

# Collect staged Python files (added, copied, or modified)
CHANGED_PY=$(git diff --cached --name-only --diff-filter=ACM | grep '\.py$')
if [ -z "$CHANGED_PY" ]; then
    echo "No Python files staged; skipping pre-commit checks."
    exit 0
fi

# Run linter
echo "Running linter..."
pylint $CHANGED_PY
if [ $? -ne 0 ]; then
    echo "Linting failed. Commit aborted."
    exit 1
fi

# Run type checker
echo "Running type checker..."
mypy $CHANGED_PY
if [ $? -ne 0 ]; then
    echo "Type checking failed. Commit aborted."
    exit 1
fi

# Run fast unit tests related to the changed modules
echo "Running tests..."
KEYWORDS=$(echo "$CHANGED_PY" | xargs -n1 basename | sed 's/\.py$//' | paste -sd'|' - | sed 's/|/ or /g')
pytest -m quick tests/unit/ -k "$KEYWORDS"
if [ $? -ne 0 ]; then
    echo "Tests failed. Commit aborted."
    exit 1
fi

echo "Pre-commit checks passed!"
```

**Configuration**

```ini
# pytest.ini
[pytest]
markers =
    quick: marks tests as quick (< 100ms each)
```

```python
# In test files
import pytest

@pytest.mark.quick
def test_calculate_total():
    assert calculate_total([1, 2, 3]) == 6
```

**Trade-offs**:
- ✅ Immediate feedback prevents committing broken code
- ✅ Maintains developer flow state
- ❌ Can slow down commits if not truly fast
- ❌ May be bypassed with `--no-verify` if too restrictive

### Pre-Push Testing (2-5 minutes)

**Purpose**: Run comprehensive unit tests before code reaches CI, catching bugs locally.

**What to Run**:
- All unit tests (should complete in 2-5 minutes)
- Security scanning of dependencies
- More thorough static analysis
- License compliance checks

**Example Git Hook**

```bash
#!/bin/bash
# .git/hooks/pre-push

echo "Running full unit test suite..."
pytest tests/unit/ --maxfail=5 --tb=short
if [ $? -ne 0 ]; then
    echo "Unit tests failed. Push aborted."
    echo "Run 'pytest tests/unit/' to see details."
    exit 1
fi

echo "Checking for security vulnerabilities..."
safety check
if [ $? -ne 0 ]; then
    echo "Security vulnerabilities detected. Push aborted."
    exit 1
fi

echo "Pre-push checks passed!"
```

**Optimization Techniques**:

```python
# conftest.py - Optimize test setup
import pytest

@pytest.fixture(scope="session")
def expensive_setup():
    """Run once per test session, not per test."""
    resource = create_expensive_resource()  # e.g. a test database or container
    yield resource
    resource.cleanup()

# Run tests in parallel (requires pytest-xdist):
# pytest -n auto tests/unit/
```
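To stay inside the 2-5 minute pre-push budget, it also helps to know which tests dominate the runtime. pytest's built-in duration report is a quick way to find candidates for optimization or for moving to a slower tier (the thresholds below are arbitrary examples):

```bash
# Show the 10 slowest test phases (setup/call/teardown)
pytest tests/unit/ --durations=10

# Only report phases slower than 0.5s (pytest 6.2+)
pytest tests/unit/ --durations=10 --durations-min=0.5
```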
### Pull Request Pipeline (10-15 minutes)

**Purpose**: Verify PR quality before merge, providing comprehensive feedback without excessive wait times.

**What to Run**:
- All unit tests (parallelized)
- Critical integration tests
- Code coverage analysis
- Static security analysis (SAST)
- Dependency vulnerability scanning
- Documentation generation
- Container image build (if applicable)

**Example GitHub Actions Workflow**

```yaml
name: Pull Request Tests

on:
  pull_request:
    branches: [main, develop]

jobs:
  unit-tests:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ['3.9', '3.10', '3.11']  # quote versions so YAML doesn't read 3.10 as 3.1
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}
      - name: Cache dependencies
        uses: actions/cache@v3
        with:
          path: ~/.cache/pip
          key: ${{ runner.os }}-pip-${{ hashFiles('**/requirements.txt') }}
      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install pytest pytest-cov pytest-xdist
      - name: Run unit tests
        run: pytest tests/unit/ -n auto --cov=src --cov-report=xml
      - name: Upload coverage
        uses: codecov/codecov-action@v3

  integration-tests:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:15
        env:
          POSTGRES_PASSWORD: postgres
        ports:
          - 5432:5432  # expose the service to the runner
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
    steps:
      - uses: actions/checkout@v3
      - name: Run critical integration tests
        run: pytest tests/integration/ -m "critical"

  security-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Run security scan
        uses: snyk/actions/python@master
        env:
          SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}
```

**Optimization Strategies**:

**1. Parallel Execution**

```yaml
# Define test suites as independent jobs so they run in parallel
jobs:
  unit-tests:
    # ... as above
  integration-tests:
    # ... runs simultaneously
  lint:
    # ... runs simultaneously
```

**2. Fail Fast**

```bash
# Stop after the first few failures
pytest --maxfail=3

# Or stop on the first failure for the quickest feedback
pytest -x
```

**3. Smart Test Selection**

```bash
# Only run tests affected by changes (pytest-testmon)
pytest --testmon

# Or run tests from changed files (pytest-picked)
pytest --picked
```

**4. Caching**

```yaml
- name: Cache dependencies
  uses: actions/cache@v3
  with:
    path: |
      ~/.cache/pip
      ~/.npm
      node_modules
    key: ${{ runner.os }}-deps-${{ hashFiles('**/requirements.txt', '**/package-lock.json') }}
```

### Post-Merge Pipeline (20-45 minutes)

**Purpose**: Run the comprehensive test suite on the main branch, catching integration issues.

**What to Run**:
- All unit tests
- All integration tests
- Critical E2E tests
- Performance regression tests
- Database migration tests
- Cross-browser testing (subset)
- Container security scanning

**Example Configuration**

```yaml
name: Post-Merge Comprehensive Tests

on:
  push:
    branches: [main]

jobs:
  comprehensive-tests:
    runs-on: ubuntu-latest
    timeout-minutes: 45
    steps:
      - uses: actions/checkout@v3
      - name: Run all tests
        run: |
          pytest tests/ \
            --cov=src \
            --cov-report=html \
            --cov-report=term \
            --junitxml=test-results.xml
      - name: Run E2E tests
        run: pytest tests/e2e/ -m "critical"
      - name: Performance tests
        run: pytest tests/performance/ --benchmark-only
      - name: Publish test results
        uses: EnricoMi/publish-unit-test-result-action@v2
        if: always()
        with:
          files: test-results.xml

  deploy-staging:
    needs: comprehensive-tests
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Deploy to staging
        run: ./scripts/deploy-staging.sh
```
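When merges land in quick succession, post-merge runs (and their staging deploys) can pile up or race each other. One option, sketched below with assumed group names, is GitHub Actions' top-level `concurrency` key: serialize post-merge runs, and cancel superseded runs in the PR workflow.

```yaml
# In the post-merge workflow: don't cancel an in-flight run (it may be deploying);
# a newer run waits for the group instead.
concurrency:
  group: post-merge-${{ github.ref }}
  cancel-in-progress: false

# In the PR workflow: cancel runs made obsolete by a newer push to the same PR.
# concurrency:
#   group: pr-tests-${{ github.head_ref }}
#   cancel-in-progress: true
```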
### Nightly/Scheduled Builds (Hours)

**Purpose**: Exhaustive testing, including expensive, slow, or less critical tests.

**What to Run**:
- Full E2E test suite
- Cross-browser testing (all browsers)
- Performance and load tests
- Security penetration testing
- Accessibility testing
- Visual regression testing
- Long-running integration tests
- Compatibility matrix testing

**Example Schedule**

```yaml
name: Nightly Comprehensive Tests

on:
  schedule:
    - cron: '0 2 * * *'  # 2 AM daily
  workflow_dispatch:     # Manual trigger

jobs:
  full-e2e-suite:
    runs-on: ubuntu-latest
    timeout-minutes: 180
    steps:
      - uses: actions/checkout@v3
      - name: Run full E2E suite
        run: pytest tests/e2e/ --headed --slowmo=50

  cross-browser-tests:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        browser: [chrome, firefox, safari, edge]
    steps:
      - name: Run tests on ${{ matrix.browser }}
        run: pytest tests/e2e/ --browser=${{ matrix.browser }}

  performance-tests:
    runs-on: ubuntu-latest
    steps:
      - name: Load testing
        run: locust -f tests/performance/locustfile.py --headless

  security-scan:
    runs-on: ubuntu-latest
    steps:
      - name: DAST scanning
        run: zap-baseline.py -t https://staging.example.com
```

---

## Parallel Execution and Test Sharding

### Test Parallelization Strategies

**1. Process-Level Parallelization**

```bash
# pytest with pytest-xdist
pytest -n auto    # Auto-detect CPU count
pytest -n 4       # Use 4 processes

# Keep tests from the same module on the same worker (useful for shared fixtures)
pytest -n auto --dist loadscope
```

**2. CI Matrix Parallelization**

```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        shard: [1, 2, 3, 4]
    steps:
      - name: Run shard ${{ matrix.shard }}
        run: |
          # --splits/--group are provided by the pytest-split plugin
          pytest tests/ \
            --splits 4 \
            --group ${{ matrix.shard }}
```

**3. Test File Sharding**

```python
# scripts/shard_tests.py
import os
import subprocess
from pathlib import Path

def shard_tests(test_files, shard_index, total_shards):
    """Distribute test files across shards round-robin."""
    return [f for i, f in enumerate(test_files) if i % total_shards == shard_index]

if __name__ == "__main__":
    shard_index = int(os.environ.get('SHARD_INDEX', 0))
    total_shards = int(os.environ.get('TOTAL_SHARDS', 1))
    # Sort for a deterministic order so every shard sees the same list
    all_tests = sorted(str(p) for p in Path('tests').rglob('test_*.py'))
    selected = shard_tests(all_tests, shard_index, total_shards)
    subprocess.run(['pytest', *selected], check=True)
```

### Ensuring Test Independence

**Requirements for Parallel Execution**:
- Tests must not share state
- Tests must not depend on execution order
- Tests must use unique data (e.g., unique emails, IDs)
- Tests must isolate database operations

**Database Isolation Strategies**

```python
import pytest
from uuid import uuid4

# `engine`, `Session`, and `User` come from the application under test.

# Strategy 1: Transaction rollback per test
@pytest.fixture
def db_session():
    connection = engine.connect()
    transaction = connection.begin()
    session = Session(bind=connection)
    yield session
    session.close()
    transaction.rollback()
    connection.close()

# Strategy 2: Unique database per worker
# (`worker_id` is provided by pytest-xdist)
@pytest.fixture(scope="session")
def db_url(worker_id):
    if worker_id == "master":
        return "postgresql://localhost/test_db"
    return f"postgresql://localhost/test_db_{worker_id}"

# Strategy 3: Unique test data per test
@pytest.fixture
def user():
    unique_email = f"test-{uuid4()}@example.com"
    user = User.create(email=unique_email)
    yield user
    user.delete()
```
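Strategy 2 only derives a per-worker URL; the database itself still has to exist. A minimal sketch of one way to provision it, assuming SQLAlchemy with a Postgres driver installed and a local Postgres the test user can administer without a password (the fixture name `db_engine` is illustrative, and the `master` branch assumes `test_db` already exists for non-parallel runs):

```python
import pytest
from sqlalchemy import create_engine, text

@pytest.fixture(scope="session")
def db_engine(db_url, worker_id):
    """Create this worker's database up front; dispose of the engine afterwards."""
    if worker_id != "master":  # parallel run: provision test_db_<worker_id>
        db_name = db_url.rsplit("/", 1)[-1]
        admin = create_engine("postgresql://localhost/postgres",
                              isolation_level="AUTOCOMMIT")
        with admin.connect() as conn:
            conn.execute(text(f'DROP DATABASE IF EXISTS "{db_name}"'))
            conn.execute(text(f'CREATE DATABASE "{db_name}"'))
        admin.dispose()
    engine = create_engine(db_url)
    yield engine
    engine.dispose()
```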
---

## Flaky Test Management

Flaky tests—those that pass or fail non-deterministically—are worse than no tests. They train developers to ignore failures and erode trust in the suite.

### Detecting Flaky Tests

**1. Automatic Detection**

```bash
# Run tests multiple times with a fixed shuffle seed to expose flakiness
# (--count comes from pytest-repeat, --randomly-seed from pytest-randomly)
pytest tests/ --count=10 --randomly-seed=12345

# Or use pytest-flakefinder
pytest --flake-finder --flake-runs=50
```

**2. CI-Based Detection**

```yaml
jobs:
  flaky-test-detection:
    runs-on: ubuntu-latest
    if: github.event_name == 'schedule'
    steps:
      - name: Run tests 20 times
        run: |
          for i in {1..20}; do
            pytest tests/ --junitxml=results/run-$i.xml || true
          done
      - name: Analyze results
        run: python scripts/detect_flaky_tests.py results/
```

**3. Tracking Test History**

```python
# scripts/detect_flaky_tests.py
import sys
import xml.etree.ElementTree as ET
from collections import defaultdict
from pathlib import Path

def analyze_test_runs(result_files):
    test_results = defaultdict(list)

    for file in result_files:
        tree = ET.parse(file)
        for testcase in tree.findall('.//testcase'):
            test_name = f"{testcase.get('classname')}.{testcase.get('name')}"
            # A test "passed" only if it neither failed nor errored
            passed = testcase.find('failure') is None and testcase.find('error') is None
            test_results[test_name].append(passed)

    # Flaky = passed in some runs and failed in others
    flaky_tests = []
    for test, results in test_results.items():
        if not all(results) and any(results):
            pass_rate = sum(results) / len(results)
            flaky_tests.append((test, pass_rate))

    return sorted(flaky_tests, key=lambda x: x[1])

if __name__ == "__main__":
    result_dir = Path(sys.argv[1])
    for test, pass_rate in analyze_test_runs(sorted(result_dir.glob('*.xml'))):
        print(f"{pass_rate:.0%}  {test}")
```

### Quarantine Process

**Purpose**: Isolate flaky tests without blocking development.

```python
# Mark flaky tests
import pytest

@pytest.mark.flaky
def test_sometimes_fails():
    # Test with known flakiness
    pass
```

```ini
# pytest.ini
[pytest]
markers =
    flaky: marks tests as flaky (quarantined)
```

```bash
# Skip flaky tests in the PR pipeline
pytest tests/ -m "not flaky"

# Run only flaky tests in diagnostic mode (--reruns requires pytest-rerunfailures)
pytest tests/ -m "flaky" --reruns 5
```

**Quarantine Workflow**:
1. **Detect**: Identify the flaky test through automated detection or manual observation
2. **Mark**: Add the `@pytest.mark.flaky` marker
3. **Track**: Create a ticket to fix or remove the flaky test
4. **Fix**: Investigate the root cause and stabilize
5. **Verify**: Run the fixed test 50+ times to confirm stability
6. **Release**: Remove the flaky marker and return the test to the normal suite

### Common Flakiness Root Causes and Fixes

**1. Timing Issues**

```python
# ❌ BAD: Fixed sleep
def test_async_operation():
    start_operation()
    time.sleep(2)  # Hope 2 seconds is enough
    assert operation_complete()

# ✅ GOOD: Conditional wait
def test_async_operation():
    start_operation()
    # wait_for is a small polling helper; see the sketch after these examples
    wait_for(lambda: operation_complete(), timeout=5)
    assert operation_complete()
```

**2. Shared State**

```python
# ❌ BAD: Module-level state shared across tests
cache = {}

def test_cache_operation():
    cache['key'] = 'value'  # leaks into every other test that uses `cache`
    assert cache['key'] == 'value'

# ✅ GOOD: Isolated state
@pytest.fixture
def cache():
    cache = {}
    yield cache
    cache.clear()

def test_cache_operation(cache):
    cache['key'] = 'value'
    assert cache['key'] == 'value'
```

**3. External Service Dependencies**

```python
# ❌ BAD: Calling the real API
def test_fetch_data():
    data = api_client.get('https://external-api.com/data')
    assert data is not None

# ✅ GOOD: Mock the external service (using the `responses` library)
@responses.activate
def test_fetch_data():
    responses.add(
        responses.GET,
        'https://external-api.com/data',
        json={'result': 'data'},
        status=200
    )
    data = api_client.get('https://external-api.com/data')
    assert data['result'] == 'data'
```

**4. Test Order Dependencies**

```python
# ❌ BAD: Order-dependent tests
def test_create_user():
    global user
    user = create_user()

def test_update_user():
    update_user(user)  # Depends on test_create_user running first

# ✅ GOOD: Independent tests
@pytest.fixture
def user():
    user = create_user()
    yield user
    delete_user(user)

def test_create_user(user):
    assert user.id is not None

def test_update_user(user):
    update_user(user)
    assert user.updated_at is not None
```
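The conditional-wait example above uses a `wait_for` helper that isn't a pytest built-in; a minimal polling sketch (the default timeout and interval are arbitrary):

```python
import time

def wait_for(condition, timeout=5.0, interval=0.1):
    """Poll `condition` until it returns a truthy value or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    raise TimeoutError(f"Condition not met within {timeout} seconds")
```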
---

## Test Health Metrics

Track these metrics to maintain test suite health and identify areas for improvement.

### Key Metrics

**1. Defect Detection Efficiency (DDE)**

```
DDE = Bugs Found by Tests / (Bugs Found by Tests + Bugs Found in Production)

Target: > 90%
```

Track monthly to identify test coverage gaps.

**2. Test Execution Time**

```
- Unit tests: < 5 minutes
- Integration tests: < 15 minutes
- Full suite: < 45 minutes
- Nightly comprehensive: < 3 hours
```

**3. Flakiness Rate**

```
Flakiness Rate = Flaky Test Runs / Total Test Runs

Target: < 5%
```

**4. Test Maintenance Burden**

```
Test Maintenance Burden = Hours Spent Fixing Broken Tests / Total Development Hours

Target: < 10%
```

**5. Code Coverage**

```
- Unit test coverage: 70-90% (focused on business logic)
- Integration coverage: component boundaries covered
- E2E coverage: critical paths covered
```

### Dashboarding and Tracking

**Example Metrics Dashboard**

```python
# scripts/test_metrics.py
import json
from datetime import datetime

def calculate_test_metrics():
    # The get_*/count_* helpers are placeholders wired to your CI,
    # coverage, and test-history data sources.
    metrics = {
        'timestamp': datetime.now().isoformat(),
        'execution_time': {
            'unit_tests': get_unit_test_time(),
            'integration_tests': get_integration_test_time(),
            'e2e_tests': get_e2e_test_time()
        },
        'flakiness': {
            'flaky_tests': count_flaky_tests(),
            'total_tests': count_total_tests(),
            'flakiness_rate': calculate_flakiness_rate()
        },
        'coverage': {
            'line_coverage': get_line_coverage(),
            'branch_coverage': get_branch_coverage()
        },
        'failures': {
            'failed_builds_last_week': count_failed_builds(days=7),
            'flaky_failures_last_week': count_flaky_failures(days=7)
        }
    }

    with open('test_metrics.json', 'w') as f:
        json.dump(metrics, f, indent=2)

    return metrics
```

---

## Summary: CI/CD Test Optimization Principles

1. **Layer feedback**: Fast local tests, comprehensive CI tests, exhaustive nightly tests
2. **Parallelize aggressively**: Use all available CPU and runner capacity
3. **Fail fast**: Stop on the first failures for rapid feedback
4. **Cache dependencies**: Don't rebuild the world every time
5. **Quarantine flaky tests**: Don't let them poison the suite
6. **Monitor metrics**: Track execution time, flakiness, and defect detection
7. **Optimize continuously**: Test suites slow down over time—invest in speed

**Goal**: Developers get feedback in seconds locally, minutes in a PR, comprehensive validation post-merge, and exhaustive testing nightly—all without flaky failures eroding trust.