# CI/CD Test Optimization and Feedback Loop Strategies

## Structuring Test Suites for Fast Feedback

Modern development velocity depends on rapid feedback from automated tests. The key is structuring test execution so that the right feedback arrives at the right time in the development workflow.

### The Feedback Loop Hierarchy

**Goal**: Provide the fastest feedback for the most common operations, and comprehensive testing at critical gates.

```
Pre-Commit Hook     → Ultra-fast (< 30 s)    → Run constantly
Pre-Push Hook       → Fast (2-5 min)         → Before pushing code
PR Pipeline         → Quick (10-15 min)      → For every PR
Post-Merge Pipeline → Moderate (20-45 min)   → After merging
Nightly Build       → Comprehensive (hours)  → Daily deep validation
```
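
The tiers map naturally onto pytest markers, so each stage can select only its slice of the suite. A minimal sketch is below; the `quick` and `critical` markers match the ones used later in this document, while `slow` is an assumed name for the nightly-only tier.

```python
# conftest.py -- register one marker per feedback tier (a sketch; adapt the names
# and descriptions to your own suite).
def pytest_configure(config):
    config.addinivalue_line("markers", "quick: pre-commit tier, < 100ms per test")
    config.addinivalue_line("markers", "critical: integration/E2E tests that gate PRs and merges")
    config.addinivalue_line("markers", "slow: exhaustive tests reserved for nightly runs")
```

Each stage then selects its tier with `-m`, for example `pytest -m quick` locally and `pytest -m "critical"` in the PR pipeline.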

### Pre-Commit Testing (< 30 seconds)

**Purpose**: Catch obvious issues before code is committed, maintaining flow state.

**What to Run**:
- Linting and code formatting
- Type checking (if applicable)
- Fast unit tests for changed files only
- Quick static analysis

**Example Git Hook**
```bash
#!/bin/bash
# .git/hooks/pre-commit

# Collect staged Python files; skip the checks if none are staged
PY_FILES=$(git diff --cached --name-only --diff-filter=ACM | grep '\.py$')
if [ -z "$PY_FILES" ]; then
    exit 0
fi

# Run linter
echo "Running linter..."
pylint $PY_FILES
if [ $? -ne 0 ]; then
    echo "Linting failed. Commit aborted."
    exit 1
fi

# Run type checker
echo "Running type checker..."
mypy $PY_FILES
if [ $? -ne 0 ]; then
    echo "Type checking failed. Commit aborted."
    exit 1
fi

# Run fast (quick-marked) tests whose names match the changed modules
echo "Running tests..."
pytest -m quick tests/unit/ -k "$(echo "$PY_FILES" | xargs -n1 basename | sed 's/\.py$//' | paste -sd ' ' - | sed 's/ / or /g')"
if [ $? -ne 0 ]; then
    echo "Tests failed. Commit aborted."
    exit 1
fi

echo "Pre-commit checks passed!"
```

**Configuration**
```ini
# pytest.ini
[pytest]
markers =
    quick: marks tests as quick (< 100ms each)
```

```python
# In test files
@pytest.mark.quick
def test_calculate_total():
    assert calculate_total([1, 2, 3]) == 6
```

**Trade-offs**:
- ✅ Immediate feedback prevents committing broken code
- ✅ Maintains developer flow state
- ❌ Can slow down commits if not truly fast
- ❌ May be bypassed with `--no-verify` if too restrictive

### Pre-Push Testing (2-5 minutes)

**Purpose**: Run comprehensive unit tests before code reaches CI, catching bugs locally.

**What to Run**:
- All unit tests (should complete in 2-5 minutes)
- Security scanning of dependencies
- More thorough static analysis
- License compliance checks

**Example Git Hook**
```bash
#!/bin/bash
# .git/hooks/pre-push

echo "Running full unit test suite..."
pytest tests/unit/ --maxfail=5 --tb=short

if [ $? -ne 0 ]; then
    echo "Unit tests failed. Push aborted."
    echo "Run 'pytest tests/unit/' to see details."
    exit 1
fi

echo "Checking for security vulnerabilities..."
safety check

if [ $? -ne 0 ]; then
    echo "Security vulnerabilities detected. Push aborted."
    exit 1
fi

echo "Pre-push checks passed!"
```
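
The hook above covers unit tests and dependency scanning; the license compliance check from the list can be a small script run alongside it. A minimal sketch, assuming stdlib-only tooling and a hypothetical allow-list you would replace with your organization's approved licenses:

```python
# scripts/check_licenses.py -- fail the push if an installed dependency declares a
# license outside the allow-list (a sketch; ALLOWED is a hypothetical example set).
from importlib.metadata import distributions

ALLOWED = {"MIT", "MIT License", "BSD License", "Apache Software License", "Apache 2.0"}


def find_violations():
    violations = []
    for dist in distributions():
        declared = (dist.metadata.get("License") or "").strip()
        classifiers = {
            c.split("::")[-1].strip()
            for c in (dist.metadata.get_all("Classifier") or [])
            if c.startswith("License")
        }
        if declared not in ALLOWED and not (classifiers & ALLOWED):
            violations.append((dist.metadata.get("Name"), declared or "unknown"))
    return violations


if __name__ == "__main__":
    bad = find_violations()
    for name, license_name in bad:
        print(f"Disallowed or undeclared license: {name} ({license_name})")
    raise SystemExit(1 if bad else 0)
```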

**Optimization Techniques**:
```python
# conftest.py - Optimize test setup
import pytest

@pytest.fixture(scope="session")
def expensive_setup():
    """Run once per test session, not per test."""
    resource = create_expensive_resource()  # illustrative: e.g., a container or test server
    yield resource
    resource.cleanup()

# Run tests in parallel (requires the pytest-xdist plugin)
# pytest -n auto tests/unit/
```

### Pull Request Pipeline (10-15 minutes)

**Purpose**: Verify PR quality before merge, providing comprehensive feedback without excessive wait.

**What to Run**:
- All unit tests (parallelized)
- Critical integration tests
- Code coverage analysis
- Static security analysis (SAST)
- Dependency vulnerability scanning
- Documentation generation
- Container image build (if applicable)

**Example GitHub Actions Workflow**
```yaml
name: Pull Request Tests

on:
  pull_request:
    branches: [main, develop]

jobs:
  unit-tests:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        # Quote the versions so YAML doesn't read 3.10 as the number 3.1
        python-version: ['3.9', '3.10', '3.11']

    steps:
      - uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}

      - name: Cache dependencies
        uses: actions/cache@v3
        with:
          path: ~/.cache/pip
          key: ${{ runner.os }}-pip-${{ hashFiles('**/requirements.txt') }}

      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install pytest pytest-cov pytest-xdist

      - name: Run unit tests
        run: pytest tests/unit/ -n auto --cov=src --cov-report=xml

      - name: Upload coverage
        uses: codecov/codecov-action@v3

  integration-tests:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:15
        env:
          POSTGRES_PASSWORD: postgres
        ports:
          - 5432:5432
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5

    steps:
      - uses: actions/checkout@v3

      # (Python setup and dependency installation omitted for brevity)
      - name: Run critical integration tests
        run: pytest tests/integration/ -m "critical"

  security-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Run security scan
        uses: snyk/actions/python@master
        env:
          SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}
```

**Optimization Strategies**:

**1. Parallel Execution**
```yaml
# Run test suites in parallel
jobs:
  unit-tests:
    # ... as above

  integration-tests:
    # ... runs simultaneously

  lint:
    # ... runs simultaneously
```

**2. Fail Fast**
```bash
# Stop after the first few failures
pytest --maxfail=3

# Or stop on the first failure for the quickest feedback
pytest -x
```

**3. Smart Test Selection**
```bash
# Only run tests affected by changes (pytest-testmon plugin)
pytest --testmon

# Or run tests related to changed-but-uncommitted files (pytest-picked plugin)
pytest --picked
```

**4. Caching**
```yaml
- name: Cache dependencies
  uses: actions/cache@v3
  with:
    path: |
      ~/.cache/pip
      ~/.npm
      node_modules
    key: ${{ runner.os }}-deps-${{ hashFiles('**/requirements.txt', '**/package-lock.json') }}
```

### Post-Merge Pipeline (20-45 minutes)

**Purpose**: Run the comprehensive test suite on the main branch, catching integration issues.

**What to Run**:
- All unit tests
- All integration tests
- Critical E2E tests
- Performance regression tests
- Database migration tests
- Cross-browser testing (subset)
- Container security scanning

**Example Configuration**
```yaml
name: Post-Merge Comprehensive Tests

on:
  push:
    branches: [main]

jobs:
  comprehensive-tests:
    runs-on: ubuntu-latest
    timeout-minutes: 45

    steps:
      - uses: actions/checkout@v3

      - name: Run all tests
        run: |
          pytest tests/ \
            --cov=src \
            --cov-report=html \
            --cov-report=term \
            --junitxml=test-results.xml

      - name: Run E2E tests
        run: pytest tests/e2e/ -m "critical"

      - name: Performance tests
        run: pytest tests/performance/ --benchmark-only

      - name: Publish test results
        uses: EnricoMi/publish-unit-test-result-action@v2
        if: always()
        with:
          files: test-results.xml

  deploy-staging:
    needs: comprehensive-tests
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Deploy to staging
        run: ./scripts/deploy-staging.sh
```
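
The database migration tests listed above can be kept simple: apply every migration to a disposable database, roll them back, and re-apply. A minimal sketch, assuming Alembic manages the schema and is configured to target a throwaway test database:

```python
# tests/migrations/test_migrations.py -- a sketch; assumes the Alembic CLI is on PATH
# and configured (via alembic.ini / env vars) to point at a disposable test database.
import subprocess


def _alembic(*args):
    return subprocess.run(["alembic", *args], capture_output=True, text=True)


def test_migrations_apply_and_roll_back():
    assert _alembic("upgrade", "head").returncode == 0, "upgrade to head failed"
    assert _alembic("downgrade", "base").returncode == 0, "downgrade to base failed"
    # Re-apply so later tests see the fully migrated schema
    assert _alembic("upgrade", "head").returncode == 0
```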

### Nightly/Scheduled Builds (Hours)

**Purpose**: Exhaustive testing including expensive, slow, or less critical tests.

**What to Run**:
- Full E2E test suite
- Cross-browser testing (all browsers)
- Performance and load tests
- Security penetration testing
- Accessibility testing
- Visual regression testing
- Long-running integration tests
- Compatibility matrix testing

**Example Schedule**
```yaml
name: Nightly Comprehensive Tests

on:
  schedule:
    - cron: '0 2 * * *'  # 2 AM daily
  workflow_dispatch:  # Manual trigger

jobs:
  full-e2e-suite:
    runs-on: ubuntu-latest
    timeout-minutes: 180

    steps:
      - uses: actions/checkout@v3

      - name: Run full E2E suite
        run: pytest tests/e2e/ --headed --slowmo=50

  cross-browser-tests:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        browser: [chrome, firefox, safari, edge]

    steps:
      - name: Run tests on ${{ matrix.browser }}
        run: pytest tests/e2e/ --browser=${{ matrix.browser }}

  performance-tests:
    runs-on: ubuntu-latest
    steps:
      - name: Load testing
        run: locust -f tests/performance/locustfile.py --headless

  security-scan:
    runs-on: ubuntu-latest
    steps:
      - name: DAST scanning
        run: zap-baseline.py -t https://staging.example.com
```

---

## Parallel Execution and Test Sharding

### Test Parallelization Strategies

**1. Process-Level Parallelization**
```bash
# pytest with xdist
pytest -n auto   # Auto-detect CPU count
pytest -n 4      # Use 4 processes

# With load balancing (loadscope groups tests by module/class)
pytest -n auto --dist loadscope
```

**2. CI Matrix Parallelization**
```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        shard: [1, 2, 3, 4]

    steps:
      - name: Run shard ${{ matrix.shard }}
        run: |
          # --splits/--group are provided by the pytest-split plugin
          pytest tests/ \
            --splits 4 \
            --group ${{ matrix.shard }}
```

**3. Test File Sharding**
```python
# scripts/shard_tests.py
import os
import subprocess
import sys
from glob import glob


def shard_tests(test_files, shard_index, total_shards):
    """Distribute test files across shards (round-robin by sorted position)."""
    return [f for i, f in enumerate(sorted(test_files))
            if i % total_shards == shard_index]


if __name__ == "__main__":
    shard_index = int(os.environ.get('SHARD_INDEX', 0))
    total_shards = int(os.environ.get('TOTAL_SHARDS', 1))

    all_tests = glob('tests/**/test_*.py', recursive=True)
    selected = shard_tests(all_tests, shard_index, total_shards)

    # Run only this shard's files and propagate pytest's exit code to CI
    sys.exit(subprocess.call(['pytest', *selected]))
```

### Ensuring Test Independence

**Requirements for Parallel Execution**:
- Tests must not share state
- Tests must not depend on execution order
- Tests must use unique data (e.g., unique emails, IDs)
- Tests must isolate database operations

**Database Isolation Strategies**
```python
# conftest.py (illustrative; assumes a SQLAlchemy engine/Session and a User model)

# Strategy 1: Transaction rollback per test
@pytest.fixture
def db_session():
    connection = engine.connect()
    transaction = connection.begin()
    session = Session(bind=connection)
    yield session
    session.close()
    transaction.rollback()
    connection.close()

# Strategy 2: Unique database per worker (worker_id is provided by pytest-xdist)
@pytest.fixture(scope="session")
def db_url(worker_id):
    if worker_id == "master":
        return "postgresql://localhost/test_db"
    return f"postgresql://localhost/test_db_{worker_id}"

# Strategy 3: Unique test data per test
@pytest.fixture
def user():
    unique_email = f"test-{uuid4()}@example.com"
    user = User.create(email=unique_email)
    yield user
    user.delete()
```

---

## Flaky Test Management

Flaky tests (those that pass or fail non-deterministically) are worse than no tests. They train developers to ignore failures and erode trust.

### Detecting Flaky Tests

**1. Automatic Detection**
```bash
# Run tests multiple times to detect flakiness
# (--count comes from pytest-repeat, --randomly-seed from pytest-randomly)
pytest tests/ --count=10 --randomly-seed=12345

# Or use pytest-flakefinder
pytest --flake-finder --flake-runs=50
```

**2. CI-Based Detection**
```yaml
jobs:
  flaky-test-detection:
    runs-on: ubuntu-latest
    if: github.event_name == 'schedule'

    steps:
      - name: Run tests 20 times
        run: |
          for i in {1..20}; do
            pytest tests/ --junitxml=results/run-$i.xml || true
          done

      - name: Analyze results
        run: python scripts/detect_flaky_tests.py results/
```

**3. Tracking Test History**
```python
# scripts/detect_flaky_tests.py
import sys
import xml.etree.ElementTree as ET
from collections import defaultdict
from glob import glob


def analyze_test_runs(result_files):
    test_results = defaultdict(list)

    for file in result_files:
        tree = ET.parse(file)
        for testcase in tree.findall('.//testcase'):
            test_name = f"{testcase.get('classname')}.{testcase.get('name')}"
            passed = testcase.find('failure') is None and testcase.find('error') is None
            test_results[test_name].append(passed)

    # A test is flaky if it both passed and failed across the runs
    flaky_tests = []
    for test, results in test_results.items():
        if not all(results) and any(results):
            pass_rate = sum(results) / len(results)
            flaky_tests.append((test, pass_rate))

    return sorted(flaky_tests, key=lambda x: x[1])


if __name__ == "__main__":
    results_dir = sys.argv[1] if len(sys.argv) > 1 else "results"
    for name, rate in analyze_test_runs(glob(f"{results_dir}/*.xml")):
        print(f"{name}: passed {rate:.0%} of runs")
```

### Quarantine Process

**Purpose**: Isolate flaky tests without blocking development.

```python
# Mark flaky tests
@pytest.mark.flaky
def test_sometimes_fails():
    # Test with known flakiness
    pass
```

```ini
# pytest.ini
[pytest]
markers =
    flaky: marks tests as flaky (quarantined)
```

```bash
# Skip flaky tests in PR pipeline
pytest tests/ -m "not flaky"

# Run only flaky tests in diagnostic mode (--reruns requires pytest-rerunfailures)
pytest tests/ -m "flaky" --reruns 5
```

**Quarantine Workflow**:
1. **Detect**: Identify flaky test through automated detection or manual observation
2. **Mark**: Add `@pytest.mark.flaky` marker
3. **Track**: Create ticket to fix or remove flaky test
4. **Fix**: Investigate root cause and stabilize
5. **Verify**: Run fixed test 50+ times to confirm stability
6. **Release**: Remove flaky marker and return to normal suite
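
For the verification step, a small helper that reruns a single test many times and reports its pass rate is often enough. A minimal sketch (the script name and CLI are assumptions, not an existing tool):

```python
# scripts/verify_fixed_test.py -- rerun one test N times and report its pass rate
# before removing the flaky marker. Usage: python scripts/verify_fixed_test.py <test_id> [runs]
import subprocess
import sys


def rerun(test_id, runs=50):
    passes = 0
    for _ in range(runs):
        result = subprocess.run(
            ["pytest", test_id, "-q", "-p", "no:cacheprovider"],
            capture_output=True,
        )
        passes += result.returncode == 0
    return passes / runs


if __name__ == "__main__":
    test_id = sys.argv[1]
    runs = int(sys.argv[2]) if len(sys.argv) > 2 else 50
    rate = rerun(test_id, runs)
    print(f"Pass rate over {runs} runs: {rate:.0%}")
    sys.exit(0 if rate == 1.0 else 1)
```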

### Common Flakiness Root Causes and Fixes

**1. Timing Issues**
```python
# ❌ BAD: Fixed sleep
def test_async_operation():
    start_operation()
    time.sleep(2)  # Hope 2 seconds is enough
    assert operation_complete()

# ✅ GOOD: Conditional wait (wait_for polls until the condition holds; see the helper below)
def test_async_operation():
    start_operation()
    wait_for(lambda: operation_complete(), timeout=5)
    assert operation_complete()
```
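
If no waiting utility is available from your test framework, a polling helper like the following is enough (a minimal sketch):

```python
# A simple condition poller: re-checks `condition` until it returns truthy or the
# timeout expires, instead of sleeping for a fixed, hopeful amount of time.
import time


def wait_for(condition, timeout=5.0, interval=0.05):
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    raise TimeoutError(f"Condition not met within {timeout} seconds")
```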

**2. Shared State**
```python
# ❌ BAD: Global state
cache = {}

def test_cache_operation():
    cache['key'] = 'value'
    assert get_from_cache('key') == 'value'

# ✅ GOOD: Isolated state
@pytest.fixture
def cache():
    cache = {}
    yield cache
    cache.clear()

def test_cache_operation(cache):
    cache['key'] = 'value'
    assert cache['key'] == 'value'
```

**3. External Service Dependencies**
```python
# ❌ BAD: Calling real API
def test_fetch_data():
    data = api_client.get('https://external-api.com/data')
    assert data is not None

# ✅ GOOD: Mock external service (using the `responses` library to stub HTTP calls)
@responses.activate
def test_fetch_data():
    responses.add(
        responses.GET,
        'https://external-api.com/data',
        json={'result': 'data'},
        status=200
    )
    data = api_client.get('https://external-api.com/data')
    assert data['result'] == 'data'
```

**4. Test Order Dependencies**
```python
# ❌ BAD: Order-dependent tests
def test_create_user():
    global user
    user = create_user()

def test_update_user():
    update_user(user)  # Depends on test_create_user

# ✅ GOOD: Independent tests
@pytest.fixture
def user():
    user = create_user()
    yield user
    delete_user(user)

def test_create_user(user):
    assert user.id is not None

def test_update_user(user):
    update_user(user)
    assert user.updated_at is not None
```

---

## Test Health Metrics

Track these metrics to maintain test suite health and identify areas for improvement.

### Key Metrics

**1. Defect Detection Efficiency (DDE)**
```
DDE = Bugs Found by Tests / (Bugs Found by Tests + Bugs Found in Production)

Target: > 90%
```

Track monthly to identify test coverage gaps.
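
A minimal sketch of the calculation (the counts are hypothetical inputs you would pull from your issue tracker each month):

```python
def defect_detection_efficiency(bugs_found_by_tests, bugs_found_in_production):
    """DDE = bugs caught by tests / all bugs found in the period."""
    total = bugs_found_by_tests + bugs_found_in_production
    return bugs_found_by_tests / total if total else 1.0


# Example: 45 bugs caught by tests, 3 escaped to production -> 94%, above the 90% target
print(f"DDE: {defect_detection_efficiency(45, 3):.0%}")
```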

**2. Test Execution Time**
```
- Unit tests: < 5 minutes
- Integration tests: < 15 minutes
- Full suite: < 45 minutes
- Nightly comprehensive: < 3 hours
```

**3. Flakiness Rate**
```
Flakiness Rate = Flaky Test Runs / Total Test Runs

Target: < 5%
```

**4. Test Maintenance Burden**
```
Test Maintenance Burden = Hours spent fixing broken tests / Total development hours

Target: < 10%
```

**5. Code Coverage**
```
- Unit test coverage: 70-90% (focusing on business logic)
- Integration coverage: Component boundaries covered
- E2E coverage: Critical paths covered
```

### Dashboarding and Tracking

**Example Metrics Dashboard**
```python
# scripts/test_metrics.py
import json
from datetime import datetime


def calculate_test_metrics():
    # The get_*/count_* helpers below are placeholders to wire up to your CI,
    # coverage, and test-history tooling.
    metrics = {
        'timestamp': datetime.now().isoformat(),
        'execution_time': {
            'unit_tests': get_unit_test_time(),
            'integration_tests': get_integration_test_time(),
            'e2e_tests': get_e2e_test_time()
        },
        'flakiness': {
            'flaky_tests': count_flaky_tests(),
            'total_tests': count_total_tests(),
            'flakiness_rate': calculate_flakiness_rate()
        },
        'coverage': {
            'line_coverage': get_line_coverage(),
            'branch_coverage': get_branch_coverage()
        },
        'failures': {
            'failed_builds_last_week': count_failed_builds(days=7),
            'flaky_failures_last_week': count_flaky_failures(days=7)
        }
    }

    with open('test_metrics.json', 'w') as f:
        json.dump(metrics, f, indent=2)

    return metrics
```

---

## Summary: CI/CD Test Optimization Principles

1. **Layer feedback**: Fast local tests, comprehensive CI tests, exhaustive nightly tests
2. **Parallelize aggressively**: Use all available CPU and runner capacity
3. **Fail fast**: Stop at the first failures for rapid feedback
4. **Cache dependencies**: Don't rebuild the world every time
5. **Quarantine flaky tests**: Don't let them poison the suite
6. **Monitor metrics**: Track execution time, flakiness, and defect detection
7. **Optimize continuously**: Test suites slow down over time, so keep investing in speed

**Goal**: Developers get feedback in seconds locally, minutes in PR, comprehensive validation post-merge, and exhaustive testing nightly, all without flaky failures eroding trust.