# Test Coverage Strategy: Beyond the Numbers
Code coverage metrics have been both praised and criticized. The truth lies in understanding what coverage does and does not measure, and using it appropriately to guide testing strategy.
## Code Coverage vs. Behavior Coverage
### Code Coverage Metrics
**Line Coverage**: Percentage of code lines executed by tests
```python
def calculate_discount(customer_type, order_total):
    if customer_type == "VIP":     # Line 1
        return order_total * 0.10  # Line 2
    return 0                       # Line 3

# Test that achieves 66% line coverage
def test_vip_discount():
    result = calculate_discount("VIP", 100)
    assert result == 10

# Lines executed: 1, 2 (66%)
# Line not executed: 3 (33%)
```
**Branch Coverage**: Percentage of decision branches executed
```python
def calculate_discount(customer_type, order_total):
    if customer_type == "VIP":
        return order_total * 0.10
    return 0

# 50% branch coverage (only the true branch tested)
# Need a test for the false branch to reach 100%
```
**Path Coverage**: Percentage of execution paths through code (most comprehensive)
```python
def calculate_price(quantity, is_vip, has_coupon):
    price = quantity * 10
    if is_vip:       # 2 branches
        price *= 0.9
    if has_coupon:   # 2 branches
        price *= 0.95
    return price

# 2 × 2 = 4 possible paths:
# 1. Not VIP, no coupon
# 2. Not VIP, has coupon
# 3. VIP, no coupon
# 4. VIP, has coupon
```
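Covering all four paths takes one test per combination. A minimal parametrized sketch, assuming pytest and the `calculate_price` above (expected totals follow from quantity 10 at unit price 10):

```python
import pytest

@pytest.mark.parametrize("is_vip, has_coupon, expected", [
    (False, False, 100.0),  # Path 1: no discounts
    (False, True, 95.0),    # Path 2: coupon only
    (True, False, 90.0),    # Path 3: VIP only
    (True, True, 85.5),     # Path 4: both discounts stack
])
def test_calculate_price_all_paths(is_vip, has_coupon, expected):
    assert calculate_price(10, is_vip, has_coupon) == pytest.approx(expected)
```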
### Behavior Coverage
**The Critical Distinction**: Code coverage measures execution; behavior coverage measures validation of scenarios.
```python
# ❌ HIGH code coverage, LOW behavior coverage
def test_calculate_discount():
    # Executes all lines
    result1 = calculate_discount("VIP", 100)
    result2 = calculate_discount("Regular", 100)
    # But makes NO assertions - just executes code
    # 100% line coverage, 0% behavior validation

# ✅ GOOD behavior coverage
def test_vip_customer_receives_discount():
    result = calculate_discount("VIP", 100)
    assert result == 10, "VIP customers should get 10% discount"

def test_regular_customer_receives_no_discount():
    result = calculate_discount("Regular", 100)
    assert result == 0, "Regular customers should get no discount"

def test_discount_scales_with_order_total():
    assert calculate_discount("VIP", 100) == 10
    assert calculate_discount("VIP", 200) == 20

def test_empty_customer_type_returns_zero():
    result = calculate_discount("", 100)
    assert result == 0
```
### What Coverage Metrics Tell You
**Coverage CAN tell you**:
- Which code has been executed by tests
- Which branches/paths have not been tested
- Where obvious gaps in testing exist
- Whether new code has any tests
**Coverage CANNOT tell you**:
- Whether tests make meaningful assertions
- Whether edge cases are properly validated
- Whether business logic is correct
- Whether tests would catch real bugs
- Test quality or effectiveness
---
## Recommended Coverage Targets by Component Type
Coverage targets should vary based on code type, recognizing that different components have different testing ROI.
### Business Logic and Domain Code: 90-100%
**What**: Core business rules, calculations, domain models, workflows
**Why High Coverage**:
- High complexity requires thorough testing
- Bugs here directly impact business outcomes
- Logic changes frequently
- Unit tests are fast and provide clear value
**Example**:
```python
# Core business logic deserves comprehensive coverage
class OrderPricing:
    def calculate_total(self, items, customer):
        """Calculate order total with discounts."""
        subtotal = sum(item.price * item.quantity for item in items)
        if customer.is_vip:
            subtotal *= 0.90
        if subtotal > 1000:
            subtotal *= 0.95  # Bulk discount
        return subtotal + self.calculate_tax(subtotal)

# Comprehensive test coverage
def test_regular_customer_basic_order():
    assert pricing.calculate_total(items_100, regular_customer) == ...

def test_vip_customer_receives_discount():
    assert pricing.calculate_total(items_100, vip_customer) == ...

def test_bulk_discount_applied_over_threshold():
    assert pricing.calculate_total(items_1500, regular_customer) == ...

def test_vip_and_bulk_discounts_stack():
    assert pricing.calculate_total(items_1500, vip_customer) == ...

def test_tax_calculation_included():
    assert pricing.calculate_total(items_100, regular_customer) > 100
```
### Integration Points: 70-90%
**What**: Repositories, API clients, message queue handlers, external service integrations
**Why Moderate Coverage**:
- Integration behavior is critical but some scenarios are impractical to test
- Integration tests are slower than unit tests
- Some paths only make sense in production environment
**Example**:
```python
# Repository - 80% coverage is reasonable
class UserRepository:
    def save(self, user):
        # Test this
        pass

    def find_by_email(self, email):
        # Test this
        pass

    def find_all_active(self):
        # Test this
        pass

    def cleanup_old_sessions(self):
        # Maintenance query - maybe skip or test lightly
        pass

# Test critical paths thoroughly
def test_save_user_persists_to_database():
    # Essential to test
    pass

def test_find_by_email_returns_correct_user():
    # Essential to test
    pass

# Less critical edge cases can be sampled
def test_find_by_email_handles_sql_injection_attempt():
    # Nice to have, but the framework may handle this
    pass
```
### Controllers and API Endpoints: 70-90%
**What**: HTTP controllers, API route handlers, request/response handling
**Why Moderate Coverage**:
- Framework handles much of the complexity
- Some error scenarios difficult to trigger
- Integration tests cover many paths automatically
**Example**:
```python
# API endpoint - focus on business logic integration
@app.route('/api/orders', methods=['POST'])
def create_order():
    data = request.json
    try:
        order = order_service.create(data)
        return jsonify(order.to_dict()), 201
    except ValidationError as e:
        return jsonify({'error': str(e)}), 400
    except Exception as e:
        logger.error(f"Order creation failed: {e}")
        return jsonify({'error': 'Internal error'}), 500

# Test main scenarios
def test_create_order_success():
    response = client.post('/api/orders', json=valid_order_data)
    assert response.status_code == 201

def test_create_order_validation_error():
    response = client.post('/api/orders', json=invalid_order_data)
    assert response.status_code == 400

# Generic 500 error handling may not need an explicit test
# if framework-level testing covers it
```
### Configuration and Glue Code: 30-50%
**What**: Configuration loading, simple adapters, basic initialization code
**Why Low Coverage**:
- Little logic to test
- Mostly framework or library interaction
- Better covered by integration tests
- Testing provides minimal value
**Example**:
```python
import os

# Configuration loader - low unit test value
class Config:
    @staticmethod
    def load():
        return {
            'database_url': os.getenv('DATABASE_URL'),
            'api_key': os.getenv('API_KEY'),
            'debug': os.getenv('DEBUG', 'false').lower() == 'true'
        }

# Minimal unit testing needed
# Better to verify that the app starts with this config in an integration test
```
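If a quick unit check is still wanted, one smoke test is usually enough. A minimal sketch, assuming pytest (its built-in `monkeypatch` fixture sets the environment):

```python
def test_config_load_reads_environment(monkeypatch):
    monkeypatch.setenv('DATABASE_URL', 'postgres://localhost/test')
    monkeypatch.setenv('DEBUG', 'TRUE')

    config = Config.load()

    assert config['database_url'] == 'postgres://localhost/test'
    assert config['debug'] is True  # parsing is case-insensitive
```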
### Utilities and Helpers: 80-90%
**What**: Shared utility functions, formatters, parsers, helpers
**Why High Coverage**:
- Used in multiple places - bugs have wide impact
- Usually pure functions - easy to test thoroughly
- Fast unit tests with high value
**Example**:
```python
# Utility function - deserves thorough testing
def format_currency(amount, currency='USD'):
    """Format amount as currency string."""
    symbols = {'USD': '$', 'EUR': '€', 'GBP': '£'}
    symbol = symbols.get(currency, currency)
    if amount < 0:
        return f"-{symbol}{abs(amount):.2f}"
    return f"{symbol}{amount:.2f}"

# Comprehensive tests for shared utility
def test_format_usd():
    assert format_currency(100) == "$100.00"

def test_format_eur():
    assert format_currency(100, 'EUR') == "€100.00"

def test_format_negative():
    assert format_currency(-50) == "-$50.00"

def test_format_unknown_currency():
    assert format_currency(100, 'XYZ') == "XYZ100.00"
```
---
## When Coverage Becomes Harmful
Chasing coverage percentages as a goal leads to counterproductive practices.
### Anti-Pattern: Testing for Coverage, Not Value
**The Problem**: Writing tests that execute code but don't verify behavior, just to increase coverage numbers.
```python
# ❌ BAD: Test that only boosts coverage
def test_user_creation():
    user = User(email="test@example.com")
    # No assertions - just creates an object for coverage
    # Provides zero value

# ✅ GOOD: Test that validates behavior
def test_user_creation_sets_default_values():
    user = User(email="test@example.com")
    assert user.email == "test@example.com"
    assert user.is_active == True
    assert user.created_at is not None
```
### Anti-Pattern: Testing Implementation Details to Hit Coverage
**The Problem**: Creating brittle tests of private methods just to reach coverage targets.
```python
class OrderService:
def create_order(self, items):
self._validate(items)
return self._build_order(items)
def _validate(self, items):
# Private method
if not items:
raise ValueError("No items")
def _build_order(self, items):
# Private method
return Order(items)
# ❌ BAD: Testing private methods for coverage
def test_validate_private_method():
service = OrderService()
with pytest.raises(ValueError):
service._validate([]) # Testing implementation detail
# ✅ GOOD: Testing through public interface
def test_create_order_with_empty_items_raises_error():
service = OrderService()
with pytest.raises(ValueError):
service.create_order([]) # Testing behavior
```
### Anti-Pattern: 100% Coverage Without Edge Case Testing
**The Problem**: Achieving high line coverage while missing important edge cases.
```python
def divide(a, b):
    return a / b

# Achieves 100% line coverage
def test_divide():
    result = divide(10, 2)
    assert result == 5

# But misses critical edge cases:
# - Division by zero
# - Negative numbers
# - Very large numbers
# - Floating point precision issues

# ✅ GOOD: Comprehensive edge case testing
def test_divide_normal_case():
    assert divide(10, 2) == 5

def test_divide_by_zero_raises_error():
    with pytest.raises(ZeroDivisionError):
        divide(10, 0)

def test_divide_negative_numbers():
    assert divide(-10, 2) == -5
    assert divide(10, -2) == -5

def test_divide_floating_point():
    assert divide(1, 3) == pytest.approx(0.333, rel=0.01)
```
---
## Practical Coverage Guidance
### Setting Organizational Coverage Targets
**Recommended Approach**: Set minimum thresholds that prevent coverage from decreasing, not aspirational 100% targets.
```ini
# .coveragerc
[run]
branch = true
source = src/

[report]
# Fail if overall coverage drops below 80%
fail_under = 80
exclude_lines =
    # Don't require coverage for:
    pragma: no cover
    def __repr__
    raise AssertionError
    raise NotImplementedError
    if __name__ == .__main__.:
    if TYPE_CHECKING:
    @abstractmethod
```
**Progressive Enforcement**:
```
# Start where you are
Current coverage: 65%
Set target: 65% (prevent regression)

# Gradually improve
After 3 months: Raise to 70%
After 6 months: Raise to 75%
After 12 months: Raise to 80%
```
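One way to automate this ratchet is a small CI script that compares the current run against a committed baseline and only ever moves the floor up. A rough sketch, assuming a Cobertura-style `coverage.xml` (what `--cov-report=xml` produces); the baseline file name is illustrative:

```python
# ratchet_coverage.py - fail CI if coverage drops below the committed
# baseline; bump the baseline when coverage improves.
import sys
import xml.etree.ElementTree as ET
from pathlib import Path

BASELINE_FILE = Path("coverage-baseline.txt")  # illustrative name

def main():
    # Cobertura XML stores overall line coverage as a 0-1 "line-rate"
    current = float(ET.parse("coverage.xml").getroot().get("line-rate")) * 100
    baseline = float(BASELINE_FILE.read_text()) if BASELINE_FILE.exists() else 0.0
    if current < baseline:
        print(f"Coverage regressed: {current:.1f}% < baseline {baseline:.1f}%")
        sys.exit(1)
    if current > baseline:
        # Commit the updated file so the floor only moves up
        BASELINE_FILE.write_text(f"{current:.1f}")
    print(f"Coverage OK: {current:.1f}% (baseline {baseline:.1f}%)")

if __name__ == "__main__":
    main()
```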
### Coverage in Code Review
**Use Coverage Reports to Guide Review**:
```bash
# Generate coverage diff for PR
pytest --cov=src --cov-report=xml
diff-cover coverage.xml --compare-branch=main
# Shows:
# - New code added without tests
# - Changed code with decreased coverage
# - Critical paths missing coverage
```
**Review Checklist**:
- [ ] New business logic has >90% coverage
- [ ] Critical paths are covered
- [ ] Edge cases have tests
- [ ] Tests make meaningful assertions
- [ ] No coverage-chasing empty tests
### Coverage for Different Test Levels
**Unit Test Coverage**: Track separately from integration/E2E
```bash
# Unit tests only
pytest tests/unit/ --cov=src --cov-report=term
# Unit test coverage should be higher (70-90%)
```
**Integration Test Coverage**: Track what integration tests add
```bash
# Run all tests, see what integration adds
pytest tests/ --cov=src --cov-report=term
# Many integration tests don't add coverage
# but add confidence in integration behavior
```
**Don't Count E2E Toward Coverage**: E2E tests validate workflows, so exclude them from coverage metrics
```bash
# Exclude E2E from coverage metrics
pytest tests/unit/ tests/integration/ --cov=src
# Don't include tests/e2e/ in coverage calculation
```
---
## Risk-Based Coverage Strategy
Focus coverage efforts where bugs have highest impact.
### Risk Assessment Matrix
```
High Risk = High Business Impact + High Complexity + High Change Frequency
Medium Risk = Mixed factors
Low Risk = Low on most factors
High Risk (Target: 90-100% coverage)
├─ Payment processing
├─ Authentication and authorization
├─ Order calculation and fulfillment
├─ Data migrations
└─ Security-critical operations
Medium Risk (Target: 70-90% coverage)
├─ User management
├─ Reporting and analytics
├─ Email notifications
├─ Search functionality
└─ Content management
Low Risk (Target: 30-70% coverage)
├─ Configuration loading
├─ Static content rendering
├─ Logging utilities
├─ Admin tools (low usage)
└─ Deprecated features
```
### Historical Bug Analysis
**Use Past Data to Guide Coverage**:
```sql
-- Find modules with highest bug density
SELECT
    bugs.module,
    COUNT(*) AS bug_count,
    COUNT(*) / MAX(code_metrics.lines_of_code) AS bug_density
FROM bugs
JOIN code_metrics ON bugs.module = code_metrics.module
WHERE bugs.created_at > DATE_SUB(NOW(), INTERVAL 12 MONTH)
GROUP BY bugs.module
ORDER BY bug_density DESC;
-- Prioritize coverage for high bug density modules
```
### Coverage Heat Maps
**Visualize Coverage by Risk**:
```
Component | Coverage | Risk | Status
-----------------------|----------|-------|--------
Payment Processing | 95% | High | ✓ Good
Authentication | 88% | High | ⚠ Needs improvement
Order Calculation | 92% | High | ✓ Good
User Management | 75% | Med | ✓ Acceptable
Email Service | 65% | Med | ⚠ Consider adding tests
Logging Utils | 45% | Low | ✓ Acceptable
```
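A table like this can be generated rather than maintained by hand. A rough sketch that joins per-package rates from a Cobertura-style `coverage.xml` with a hand-maintained risk map; the package names, risk labels, and thresholds here are illustrative:

```python
# coverage_heatmap.py - join per-package coverage with a risk map.
import xml.etree.ElementTree as ET

RISK_MAP = {"payments": "High", "auth": "High", "users": "Med", "logging": "Low"}
THRESHOLDS = {"High": 90, "Med": 70, "Low": 40}  # minimum acceptable %

root = ET.parse("coverage.xml").getroot()
print(f"{'Component':<20} {'Coverage':>8}  {'Risk':<5} Status")
for pkg in root.iter("package"):
    name = pkg.get("name")
    rate = float(pkg.get("line-rate")) * 100
    risk = RISK_MAP.get(name, "Low")  # default unknown packages to Low
    status = "OK" if rate >= THRESHOLDS[risk] else "Needs tests"
    print(f"{name:<20} {rate:>7.1f}%  {risk:<5} {status}")
```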
---
## Coverage Tools and Techniques
### Python Coverage Tools
```bash
# pytest-cov (most common)
pytest --cov=src --cov-report=html --cov-report=term
# View HTML report
open htmlcov/index.html
# Show missing lines
pytest --cov=src --cov-report=term-missing
# Branch coverage
pytest --cov=src --cov-branch
# Fail if coverage below threshold
pytest --cov=src --cov-fail-under=80
```
### Coverage Badges for Documentation
```markdown
# README.md
![Coverage](https://img.shields.io/badge/coverage-85%25-green)
# Updates automatically in CI
# codecov.io, coveralls.io provide this
```
### Continuous Coverage Tracking
```yaml
# .github/workflows/coverage.yml
name: Coverage
on: [push, pull_request]

jobs:
  coverage:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Run tests with coverage
        run: pytest --cov=src --cov-report=xml

      - name: Upload to Codecov
        uses: codecov/codecov-action@v3
        with:
          file: ./coverage.xml
          fail_ci_if_error: true

      - name: Comment coverage on PR
        uses: py-cov-action/python-coverage-comment-action@v3
        with:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```
---
## Summary: Coverage Strategy Principles
1. **Focus on behavior coverage, not just code coverage**: Tests should validate scenarios, not just execute lines
2. **Set differentiated targets**: Business logic deserves higher coverage than configuration code
3. **Use coverage as a guide, not a goal**: Low coverage indicates gaps; high coverage doesn't guarantee quality
4. **Prioritize by risk**: Critical systems deserve comprehensive testing, peripheral features need less
5. **Track trends, not just numbers**: Coverage should generally increase over time, especially for new code
6. **Don't game the metrics**: Tests without assertions, or tests that merely execute trivial code, provide false confidence
7. **Combine with other metrics**: Coverage + defect detection rate + test execution time = complete picture
**The Ultimate Goal**: High-quality tests that catch bugs, enable refactoring, and provide confidence—not just high coverage numbers. Coverage is a useful tool, but test effectiveness is what truly matters.