# Test Coverage Strategy: Beyond the Numbers

Code coverage metrics have been both praised and criticized. The truth lies in understanding what coverage does and does not tell you, and in using it appropriately to guide testing strategy.

## Code Coverage vs. Behavior Coverage

### Code Coverage Metrics

**Line Coverage**: Percentage of code lines executed by tests
```python
def calculate_discount(customer_type, order_total):
    if customer_type == "VIP":        # Line 1
        return order_total * 0.10     # Line 2
    return 0                          # Line 3

# Test that achieves 66% line coverage
def test_vip_discount():
    result = calculate_discount("VIP", 100)
    assert result == 10
    # Lines executed: 1, 2 (66%)
    # Line not executed: 3 (33%)
```

**Branch Coverage**: Percentage of decision branches executed
```python
def calculate_discount(customer_type, order_total):
    if customer_type == "VIP":
        return order_total * 0.10
    return 0

# 50% branch coverage (only true branch tested)
# Need test for false branch to reach 100%
```
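
To reach 100% branch coverage here, both outcomes of the `if` need a test; a minimal sketch, run with branch measurement enabled (`pytest --cov=src --cov-branch`):

```python
# Covers both the true branch and the false branch, with assertions on each.
def test_vip_discount_branch():
    assert calculate_discount("VIP", 100) == 10

def test_non_vip_branch_returns_zero():
    assert calculate_discount("Regular", 100) == 0
```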

**Path Coverage**: Percentage of execution paths through code (most comprehensive)
```python
def calculate_price(quantity, is_vip, has_coupon):
    price = quantity * 10

    if is_vip:        # 2 branches
        price *= 0.9

    if has_coupon:    # 2 branches
        price *= 0.95

    return price

# 2 × 2 = 4 possible paths:
# 1. Not VIP, no coupon
# 2. Not VIP, has coupon
# 3. VIP, no coupon
# 4. VIP, has coupon
```
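
To make the enumeration concrete, here is a minimal sketch with one test per path through `calculate_price`; the expected values follow directly from the arithmetic above, and `pytest.approx` guards against float rounding:

```python
import pytest

# One test per execution path (quantity=10 gives a base price of 100).
def test_regular_customer_no_coupon():
    assert calculate_price(10, is_vip=False, has_coupon=False) == 100

def test_regular_customer_with_coupon():
    assert calculate_price(10, is_vip=False, has_coupon=True) == pytest.approx(95.0)

def test_vip_no_coupon():
    assert calculate_price(10, is_vip=True, has_coupon=False) == pytest.approx(90.0)

def test_vip_with_coupon():
    assert calculate_price(10, is_vip=True, has_coupon=True) == pytest.approx(85.5)
```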

### Behavior Coverage

**The Critical Distinction**: Code coverage measures execution; behavior coverage measures validation of scenarios.

```python
# ❌ HIGH code coverage, LOW behavior coverage
def test_calculate_discount():
    # Executes all lines
    result1 = calculate_discount("VIP", 100)
    result2 = calculate_discount("Regular", 100)
    # But makes NO assertions - just executes code
    # 100% line coverage, 0% behavior validation

# ✅ GOOD behavior coverage
def test_vip_customer_receives_discount():
    result = calculate_discount("VIP", 100)
    assert result == 10, "VIP customers should get 10% discount"

def test_regular_customer_receives_no_discount():
    result = calculate_discount("Regular", 100)
    assert result == 0, "Regular customers should get no discount"

def test_discount_scales_with_order_total():
    assert calculate_discount("VIP", 100) == 10
    assert calculate_discount("VIP", 200) == 20

def test_empty_customer_type_returns_zero():
    result = calculate_discount("", 100)
    assert result == 0
```

### What Coverage Metrics Tell You

**Coverage CAN tell you**:
- Which code has been executed by tests
- Which branches/paths have not been tested
- Where obvious gaps in testing exist
- Whether new code has any tests

**Coverage CANNOT tell you**:
- Whether tests make meaningful assertions
- Whether edge cases are properly validated
- Whether business logic is correct
- Whether tests would catch real bugs
- Test quality or effectiveness

---

## Recommended Coverage Targets by Component Type

Coverage targets should vary based on code type, recognizing that different components have different testing ROI.

### Business Logic and Domain Code: 90-100%

**What**: Core business rules, calculations, domain models, workflows

**Why High Coverage**:
- High complexity requires thorough testing
- Bugs here directly impact business outcomes
- Logic changes frequently
- Unit tests are fast and provide clear value

**Example**:
```python
# Core business logic deserves comprehensive coverage
class OrderPricing:
    def calculate_total(self, items, customer):
        """Calculate order total with discounts."""
        subtotal = sum(item.price * item.quantity for item in items)

        if customer.is_vip:
            subtotal *= 0.90

        if subtotal > 1000:
            subtotal *= 0.95  # Bulk discount

        return subtotal + self.calculate_tax(subtotal)

# Comprehensive test coverage
def test_regular_customer_basic_order():
    assert pricing.calculate_total(items_100, regular_customer) == ...

def test_vip_customer_receives_discount():
    assert pricing.calculate_total(items_100, vip_customer) == ...

def test_bulk_discount_applied_over_threshold():
    assert pricing.calculate_total(items_1500, regular_customer) == ...

def test_vip_and_bulk_discounts_stack():
    assert pricing.calculate_total(items_1500, vip_customer) == ...

def test_tax_calculation_included():
    assert pricing.calculate_total(items_100, regular_customer) > 100
```
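
As a sanity check on how the discounts stack (the bulk threshold is evaluated on the already-discounted subtotal), here is a hypothetical walkthrough for a 1,500 VIP order; the flat 10% tax is an assumption for illustration only, since `calculate_tax` is not shown:

```python
# Hypothetical arithmetic behind test_vip_and_bulk_discounts_stack.
subtotal = 1500 * 0.90               # VIP discount first           -> 1350.0
subtotal *= 0.95                     # 1350 > 1000, bulk discount   -> 1282.5
total = subtotal + subtotal * 0.10   # assumed flat 10% tax         -> 1410.75
```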

### Integration Points: 70-90%

**What**: Repositories, API clients, message queue handlers, external service integrations

**Why Moderate Coverage**:
- Integration behavior is critical, but some scenarios are impractical to test
- Integration tests are slower than unit tests
- Some paths only make sense in a production environment

**Example**:
```python
# Repository - 80% coverage is reasonable
class UserRepository:
    def save(self, user):
        # Test this
        pass

    def find_by_email(self, email):
        # Test this
        pass

    def find_all_active(self):
        # Test this
        pass

    def cleanup_old_sessions(self):
        # Maintenance query - maybe skip or light testing
        pass

# Test critical paths thoroughly
def test_save_user_persists_to_database():
    # Essential to test
    pass

def test_find_by_email_returns_correct_user():
    # Essential to test
    pass

# Less critical edge cases can be sampled
def test_find_by_email_handles_sql_injection_attempt():
    # Nice to have but framework may handle this
    pass
```
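
To show what the essential tests might actually assert, here is a minimal sketch; the `db_session` fixture and `User` model are hypothetical stand-ins for whatever persistence setup the project uses:

```python
# Hypothetical: assumes a transaction-wrapped test database session fixture
# (db_session) and a User model; adapt to the project's actual setup.
def test_save_user_persists_to_database(db_session):
    repo = UserRepository(db_session)
    repo.save(User(email="alice@example.com"))

    assert repo.find_by_email("alice@example.com") is not None

def test_find_by_email_returns_correct_user(db_session):
    repo = UserRepository(db_session)
    repo.save(User(email="alice@example.com"))
    repo.save(User(email="bob@example.com"))

    assert repo.find_by_email("bob@example.com").email == "bob@example.com"
```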

### Controllers and API Endpoints: 70-90%

**What**: HTTP controllers, API route handlers, request/response handling

**Why Moderate Coverage**:
- Framework handles much of the complexity
- Some error scenarios are difficult to trigger
- Integration tests cover many paths automatically

**Example**:
```python
# API endpoint - focus on business logic integration
@app.route('/api/orders', methods=['POST'])
def create_order():
    data = request.json

    try:
        order = order_service.create(data)
        return jsonify(order.to_dict()), 201
    except ValidationError as e:
        return jsonify({'error': str(e)}), 400
    except Exception as e:
        logger.error(f"Order creation failed: {e}")
        return jsonify({'error': 'Internal error'}), 500

# Test main scenarios
def test_create_order_success():
    response = client.post('/api/orders', json=valid_order_data)
    assert response.status_code == 201

def test_create_order_validation_error():
    response = client.post('/api/orders', json=invalid_order_data)
    assert response.status_code == 400

# Generic 500 error handling may not need explicit test
# if framework testing covers it
```
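
The `client` and payloads used in these tests come from test setup; a minimal sketch using Flask's built-in test client, where the import path and payload shapes are placeholders rather than project specifics:

```python
# Hypothetical test setup for the endpoint tests above.
from myapp import app  # placeholder import path

app.config["TESTING"] = True
client = app.test_client()  # Flask's built-in test client

valid_order_data = {"items": [{"sku": "ABC-1", "quantity": 2}]}
invalid_order_data = {"items": []}  # expected to trigger ValidationError
```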

### Configuration and Glue Code: 30-50%

**What**: Configuration loading, simple adapters, basic initialization code

**Why Low Coverage**:
- Little logic to test
- Mostly framework or library interaction
- Better covered by integration tests
- Testing provides minimal value

**Example**:
```python
import os

# Configuration loader - low unit test value
class Config:
    @staticmethod
    def load():
        return {
            'database_url': os.getenv('DATABASE_URL'),
            'api_key': os.getenv('API_KEY'),
            'debug': os.getenv('DEBUG', 'false').lower() == 'true'
        }

# Minimal unit testing needed
# Better to test that app starts with config in integration test
```
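
If any unit test is wanted at all, pytest's built-in `monkeypatch` fixture is enough to smoke-test the one piece of real logic here, the `DEBUG` parsing; a minimal sketch:

```python
# Light smoke tests for the only conditional logic in Config.load.
def test_debug_flag_parsed_from_env(monkeypatch):
    monkeypatch.setenv('DEBUG', 'TRUE')
    assert Config.load()['debug'] is True

def test_debug_defaults_to_false(monkeypatch):
    monkeypatch.delenv('DEBUG', raising=False)
    assert Config.load()['debug'] is False
```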

### Utilities and Helpers: 80-90%

**What**: Shared utility functions, formatters, parsers, helpers

**Why High Coverage**:
- Used in multiple places - bugs have wide impact
- Usually pure functions - easy to test thoroughly
- Fast unit tests with high value

**Example**:
```python
# Utility function - deserves thorough testing
def format_currency(amount, currency='USD'):
    """Format amount as currency string."""
    symbols = {'USD': '$', 'EUR': '€', 'GBP': '£'}
    symbol = symbols.get(currency, currency)

    if amount < 0:
        return f"-{symbol}{abs(amount):.2f}"
    return f"{symbol}{amount:.2f}"

# Comprehensive tests for shared utility
def test_format_usd():
    assert format_currency(100) == "$100.00"

def test_format_eur():
    assert format_currency(100, 'EUR') == "€100.00"

def test_format_negative():
    assert format_currency(-50) == "-$50.00"

def test_format_unknown_currency():
    assert format_currency(100, 'XYZ') == "XYZ100.00"
```

---

## When Coverage Becomes Harmful

Chasing coverage percentages as a goal leads to counterproductive practices.

### Anti-Pattern: Testing for Coverage, Not Value

**The Problem**: Writing tests that execute code but don't verify behavior, just to increase coverage numbers.

```python
# ❌ BAD: Test that only boosts coverage
def test_user_creation():
    user = User(email="test@example.com")
    # No assertions - just creates object for coverage
    # Provides zero value

# ✅ GOOD: Test that validates behavior
def test_user_creation_sets_default_values():
    user = User(email="test@example.com")

    assert user.email == "test@example.com"
    assert user.is_active == True
    assert user.created_at is not None
```

### Anti-Pattern: Testing Implementation Details to Hit Coverage

**The Problem**: Creating brittle tests of private methods just to reach coverage targets.

```python
import pytest

class OrderService:
    def create_order(self, items):
        self._validate(items)
        return self._build_order(items)

    def _validate(self, items):
        # Private method
        if not items:
            raise ValueError("No items")

    def _build_order(self, items):
        # Private method
        return Order(items)

# ❌ BAD: Testing private methods for coverage
def test_validate_private_method():
    service = OrderService()
    with pytest.raises(ValueError):
        service._validate([])  # Testing implementation detail

# ✅ GOOD: Testing through public interface
def test_create_order_with_empty_items_raises_error():
    service = OrderService()
    with pytest.raises(ValueError):
        service.create_order([])  # Testing behavior
```

### Anti-Pattern: 100% Coverage Without Edge Case Testing

**The Problem**: Achieving high line coverage while missing important edge cases.

```python
import pytest

def divide(a, b):
    return a / b

# Achieves 100% line coverage
def test_divide():
    result = divide(10, 2)
    assert result == 5

# But misses critical edge cases:
# - Division by zero
# - Negative numbers
# - Very large numbers
# - Floating point precision issues

# ✅ GOOD: Comprehensive edge case testing
def test_divide_normal_case():
    assert divide(10, 2) == 5

def test_divide_by_zero_raises_error():
    with pytest.raises(ZeroDivisionError):
        divide(10, 0)

def test_divide_negative_numbers():
    assert divide(-10, 2) == -5
    assert divide(10, -2) == -5

def test_divide_floating_point():
    assert divide(1, 3) == pytest.approx(0.333, rel=0.01)
```

---

## Practical Coverage Guidance

### Setting Organizational Coverage Targets

**Recommended Approach**: Set minimum thresholds that prevent coverage from decreasing, not aspirational 100% targets.

```ini
# .coveragerc
[run]
branch = true
source = src/

[report]
# Fail if overall coverage drops below 80%
fail_under = 80
exclude_lines =
    # Don't require coverage for:
    pragma: no cover
    def __repr__
    raise AssertionError
    raise NotImplementedError
    if __name__ == .__main__.:
    if TYPE_CHECKING:
    @abstractmethod
```

**Progressive Enforcement**:
```text
# Start where you are
Current coverage: 65%
Set target: 65% (prevent regression)

# Gradually improve
After 3 months: Raise to 70%
After 6 months: Raise to 75%
After 12 months: Raise to 80%
```
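
One way to make the "prevent regression" step enforceable is to wire the current floor into CI with pytest-cov's threshold flag and raise the number on the schedule above; 65 here is just the starting figure from this example:

```bash
# Fail the build if coverage drops below today's baseline;
# bump the threshold as the team raises its target (65 -> 70 -> 75 -> 80).
pytest --cov=src --cov-fail-under=65
```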

### Coverage in Code Review

**Use Coverage Reports to Guide Review**:
```bash
# Generate coverage diff for PR
pytest --cov=src --cov-report=xml
diff-cover coverage.xml --compare-branch=main

# Shows:
# - New code added without tests
# - Changed code with decreased coverage
# - Critical paths missing coverage
```

**Review Checklist**:
- [ ] New business logic has >90% coverage (enforceable with diff-cover, see the sketch below)
- [ ] Critical paths are covered
- [ ] Edge cases have tests
- [ ] Tests make meaningful assertions
- [ ] No coverage-chasing empty tests
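
The first checklist item can be checked mechanically against the PR diff rather than eyeballed; a sketch using diff-cover's threshold option, where the 90 mirrors the checklist target:

```bash
# Require 90% coverage on the lines changed in this PR,
# independent of the repository-wide total.
pytest --cov=src --cov-report=xml
diff-cover coverage.xml --compare-branch=main --fail-under=90
```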

### Coverage for Different Test Levels

**Unit Test Coverage**: Track separately from integration/E2E
```bash
# Unit tests only
pytest tests/unit/ --cov=src --cov-report=term

# Unit test coverage should be higher (70-90%)
```

**Integration Test Coverage**: Track what integration tests add
```bash
# Run all tests, see what integration adds
pytest tests/ --cov=src --cov-report=term

# Many integration tests don't add coverage
# but add confidence in integration behavior
```

**Don't Count E2E Against Coverage**: E2E tests validate workflows, not code coverage
```bash
# Exclude E2E from coverage metrics
pytest tests/unit/ tests/integration/ --cov=src
# Don't include tests/e2e/ in coverage calculation
```

---

## Risk-Based Coverage Strategy

Focus coverage efforts where bugs have highest impact.

### Risk Assessment Matrix

```
High Risk = High Business Impact + High Complexity + High Change Frequency
Medium Risk = Mixed factors
Low Risk = Low on most factors

High Risk (Target: 90-100% coverage)
├─ Payment processing
├─ Authentication and authorization
├─ Order calculation and fulfillment
├─ Data migrations
└─ Security-critical operations

Medium Risk (Target: 70-90% coverage)
├─ User management
├─ Reporting and analytics
├─ Email notifications
├─ Search functionality
└─ Content management

Low Risk (Target: 30-70% coverage)
├─ Configuration loading
├─ Static content rendering
├─ Logging utilities
├─ Admin tools (low usage)
└─ Deprecated features
```
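
Where a hosted coverage service is in use, these differentiated targets can be expressed as separate status checks; a hedged sketch for codecov.io, with illustrative paths and numbers that should be adapted to the project and verified against Codecov's documented status-check configuration:

```yaml
# codecov.yml - illustrative per-component targets mirroring the matrix above
coverage:
  status:
    project:
      payments:
        target: 90%
        paths:
          - "src/payments/"
      user-management:
        target: 70%
        paths:
          - "src/users/"
      logging-utils:
        target: 40%
        paths:
          - "src/logging_utils/"
```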

### Historical Bug Analysis

**Use Past Data to Guide Coverage**:
```sql
-- Find modules with highest bug density
SELECT
    bugs.module,
    COUNT(*) AS bug_count,
    COUNT(*) / code_metrics.lines_of_code AS bug_density
FROM bugs
JOIN code_metrics ON bugs.module = code_metrics.module
WHERE bugs.created_at > DATE_SUB(NOW(), INTERVAL 12 MONTH)
GROUP BY bugs.module, code_metrics.lines_of_code
ORDER BY bug_density DESC;

-- Prioritize coverage for high bug density modules
```

### Coverage Heat Maps

**Visualize Coverage by Risk**:
```
Component           | Coverage | Risk | Status
--------------------|----------|------|-------------------------
Payment Processing  | 95%      | High | ✓ Good
Authentication      | 88%      | High | ⚠ Needs improvement
Order Calculation   | 92%      | High | ✓ Good
User Management     | 75%      | Med  | ✓ Acceptable
Email Service       | 65%      | Med  | ⚠ Consider adding tests
Logging Utils       | 45%      | Low  | ✓ Acceptable
```

---

## Coverage Tools and Techniques

### Python Coverage Tools

```bash
# pytest-cov (most common)
pytest --cov=src --cov-report=html --cov-report=term

# View HTML report
open htmlcov/index.html

# Show missing lines
pytest --cov=src --cov-report=term-missing

# Branch coverage
pytest --cov=src --cov-branch

# Fail if coverage below threshold
pytest --cov=src --cov-fail-under=80
```

### Coverage Badges for Documentation

```markdown
# README.md


# Updates automatically in CI
# codecov.io, coveralls.io provide this
```

### Continuous Coverage Tracking

```yaml
# .github/workflows/coverage.yml
name: Coverage

on: [push, pull_request]

jobs:
  coverage:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v3

      # Set up Python and install test dependencies
      # (adjust versions and packages to the project's own setup)
      - uses: actions/setup-python@v4
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: pip install pytest pytest-cov

      - name: Run tests with coverage
        run: pytest --cov=src --cov-report=xml

      - name: Upload to Codecov
        uses: codecov/codecov-action@v3
        with:
          file: ./coverage.xml
          fail_ci_if_error: true

      - name: Comment coverage on PR
        uses: py-cov-action/python-coverage-comment-action@v3
        with:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```

---

## Summary: Coverage Strategy Principles

1. **Focus on behavior coverage, not just code coverage**: Tests should validate scenarios, not just execute lines

2. **Set differentiated targets**: Business logic deserves higher coverage than configuration code

3. **Use coverage as a guide, not a goal**: Low coverage indicates gaps; high coverage doesn't guarantee quality

4. **Prioritize by risk**: Critical systems deserve comprehensive testing; peripheral features need less

5. **Track trends, not just numbers**: Coverage should generally increase over time, especially for new code

6. **Don't game the metrics**: Tests without assertions, or tests written only to exercise trivial code, provide false confidence

7. **Combine with other metrics**: Coverage + defect detection rate + test execution time = complete picture

**The Ultimate Goal**: High-quality tests that catch bugs, enable refactoring, and provide confidence—not just high coverage numbers. Coverage is a useful tool, but test effectiveness is what truly matters.