# Test Coverage Strategy: Beyond the Numbers

Code coverage metrics have been both praised and criticized. The truth lies in understanding what coverage means and doesn't mean, and using it appropriately to guide testing strategy.

## Code Coverage vs. Behavior Coverage

### Code Coverage Metrics

**Line Coverage**: Percentage of code lines executed by tests
```python
def calculate_discount(customer_type, order_total):
    if customer_type == "VIP":         # Line 1
        return order_total * 0.10      # Line 2
    return 0                           # Line 3

# Test that achieves 67% line coverage (2 of 3 lines)
def test_vip_discount():
    result = calculate_discount("VIP", 100)
    assert result == 10
    # Lines executed: 1, 2 (67%)
    # Line not executed: 3 (33%)
```

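
To make the percentage concrete, the sketch below measures it by hand with the standard library's `sys.settrace`. The helper name `executed_lines` and the hard-coded `TOTAL_LINES` are assumptions made for illustration; real tools such as coverage.py handle far more (multiple files, branches, exclusions), so this only shows what "executed by tests" means.

```python
import sys

def calculate_discount(customer_type, order_total):
    if customer_type == "VIP":
        return order_total * 0.10
    return 0

def executed_lines(func, *args):
    """Return the body-line offsets of `func` that run for the given arguments."""
    code = func.__code__
    hit = set()

    def tracer(frame, event, arg):
        # Record only 'line' events that belong to the function under test.
        if frame.f_code is code and event == "line":
            hit.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)  # Always restore whatever tracer was active.
    return hit

TOTAL_LINES = 3  # executable lines in the body of calculate_discount

vip_only = executed_lines(calculate_discount, "VIP", 100)
both = vip_only | executed_lines(calculate_discount, "Regular", 100)

print(f"VIP test only: {len(vip_only) / TOTAL_LINES:.0%}")  # 67%
print(f"Both tests:    {len(both) / TOTAL_LINES:.0%}")      # 100%
```
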
**Branch Coverage**: Percentage of decision branches executed
```python
def calculate_discount(customer_type, order_total):
    if customer_type == "VIP":
        return order_total * 0.10
    return 0

# 50% branch coverage (only true branch tested)
# Need test for false branch to reach 100%
```

**Path Coverage**: Percentage of execution paths through code (most comprehensive)
```python
def calculate_price(quantity, is_vip, has_coupon):
    price = quantity * 10

    if is_vip:  # 2 branches
        price *= 0.9

    if has_coupon:  # 2 branches
        price *= 0.95

    return price

# 2 × 2 = 4 possible paths:
# 1. Not VIP, no coupon
# 2. Not VIP, has coupon
# 3. VIP, no coupon
# 4. VIP, has coupon
```

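
Because the number of paths doubles with every independent condition, it can help to enumerate the flag combinations instead of writing each test by hand. The sketch below does this with `itertools.product`; the `EXPECTED` table is a hypothetical oracle written out for a quantity of 10 and is not part of the original example.

```python
from itertools import product

def calculate_price(quantity, is_vip, has_coupon):
    price = quantity * 10
    if is_vip:
        price *= 0.9
    if has_coupon:
        price *= 0.95
    return price

# Expected prices for quantity=10, written out independently of the code.
EXPECTED = {
    (False, False): 100.0,
    (False, True): 95.0,
    (True, False): 90.0,
    (True, True): 85.5,
}

# Every combination of the two boolean flags is one path through the function.
paths = list(product([False, True], repeat=2))
assert len(paths) == 4

for is_vip, has_coupon in paths:
    actual = calculate_price(10, is_vip, has_coupon)
    # Compare with a tolerance because the discounts use floating point.
    assert abs(actual - EXPECTED[(is_vip, has_coupon)]) < 1e-9
```

With pytest, the same table could feed `@pytest.mark.parametrize` so that each path is reported as its own test case.
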
### Behavior Coverage

**The Critical Distinction**: Code coverage measures execution; behavior coverage measures validation of scenarios.

```python
# ❌ HIGH code coverage, LOW behavior coverage
def test_calculate_discount():
    # Executes all lines
    result1 = calculate_discount("VIP", 100)
    result2 = calculate_discount("Regular", 100)
    # But makes NO assertions - just executes code
    # 100% line coverage, 0% behavior validation

# ✅ GOOD behavior coverage
def test_vip_customer_receives_discount():
    result = calculate_discount("VIP", 100)
    assert result == 10, "VIP customers should get 10% discount"

def test_regular_customer_receives_no_discount():
    result = calculate_discount("Regular", 100)
    assert result == 0, "Regular customers should get no discount"

def test_discount_scales_with_order_total():
    assert calculate_discount("VIP", 100) == 10
    assert calculate_discount("VIP", 200) == 20

def test_empty_customer_type_returns_zero():
    result = calculate_discount("", 100)
    assert result == 0
```

### What Coverage Metrics Tell You

**Coverage CAN tell you**:
- Which code has been executed by tests
- Which branches/paths have not been tested
- Where obvious gaps in testing exist
- Whether new code has any tests

**Coverage CANNOT tell you**:
- Whether tests make meaningful assertions
- Whether edge cases are properly validated
- Whether business logic is correct
- Whether tests would catch real bugs
- Test quality or effectiveness

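
One way to see this gap is to break the code on purpose and check which tests notice, which is the idea behind mutation testing. The sketch below does this by hand. The names `calculate_discount_mutant`, `check_without_assertions`, `check_with_assertions`, and `detects_bug` are hypothetical, and dedicated mutation-testing tools automate the same process.

```python
def calculate_discount(customer_type, order_total):
    if customer_type == "VIP":
        return order_total * 0.10
    return 0

def calculate_discount_mutant(customer_type, order_total):
    # Deliberately broken copy: the VIP rate is 1% instead of 10%.
    if customer_type == "VIP":
        return order_total * 0.01
    return 0

def check_without_assertions(discount):
    # Executes every line but verifies nothing.
    discount("VIP", 100)
    discount("Regular", 100)

def check_with_assertions(discount):
    assert discount("VIP", 100) == 10
    assert discount("Regular", 100) == 0

def detects_bug(check, implementation):
    """Return True if the check fails when run against the implementation."""
    try:
        check(implementation)
    except AssertionError:
        return True
    return False

# Both checks execute every line of the function they exercise,
# yet only one of them notices the broken discount rate.
print(detects_bug(check_without_assertions, calculate_discount_mutant))  # False
print(detects_bug(check_with_assertions, calculate_discount_mutant))     # True
print(detects_bug(check_with_assertions, calculate_discount))            # False
```
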
---

## Recommended Coverage Targets by Component Type

Coverage targets should vary based on code type, recognizing that different components have different testing ROI.

### Business Logic and Domain Code: 90-100%

**What**: Core business rules, calculations, domain models, workflows

**Why High Coverage**:
- High complexity requires thorough testing
- Bugs here directly impact business outcomes
- Logic changes frequently
- Unit tests are fast and provide clear value

**Example**:
```python
# Core business logic deserves comprehensive coverage
class OrderPricing:
    def calculate_total(self, items, customer):
        """Calculate order total with discounts."""
        subtotal = sum(item.price * item.quantity for item in items)

        if customer.is_vip:
            subtotal *= 0.90

        if subtotal > 1000:
            subtotal *= 0.95  # Bulk discount

        return subtotal + self.calculate_tax(subtotal)

# Comprehensive test coverage
# (pricing, items_100, items_1500, regular_customer and vip_customer are
# assumed to be fixtures defined elsewhere; expected values are elided)
def test_regular_customer_basic_order():
    assert pricing.calculate_total(items_100, regular_customer) == ...

def test_vip_customer_receives_discount():
    assert pricing.calculate_total(items_100, vip_customer) == ...

def test_bulk_discount_applied_over_threshold():
    assert pricing.calculate_total(items_1500, regular_customer) == ...

def test_vip_and_bulk_discounts_stack():
    assert pricing.calculate_total(items_1500, vip_customer) == ...

def test_tax_calculation_included():
    assert pricing.calculate_total(items_100, regular_customer) > 100
```

### Integration Points: 70-90%

**What**: Repositories, API clients, message queue handlers, external service integrations

**Why Moderate Coverage**:
- Integration behavior is critical, but some scenarios are impractical to test
- Integration tests are slower than unit tests
- Some paths only make sense in a production environment

**Example**:
```python
# Repository - 80% coverage is reasonable
class UserRepository:
    def save(self, user):
        # Test this
        pass

    def find_by_email(self, email):
        # Test this
        pass

    def find_all_active(self):
        # Test this
        pass

    def cleanup_old_sessions(self):
        # Maintenance query - maybe skip or light testing
        pass

# Test critical paths thoroughly
def test_save_user_persists_to_database():
    # Essential to test
    pass

def test_find_by_email_returns_correct_user():
    # Essential to test
    pass

# Less critical edge cases can be sampled
def test_find_by_email_handles_sql_injection_attempt():
    # Nice to have but framework may handle this
    pass
```

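
As one possible way to fill in the stubs above, the sketch below backs a simplified repository with an in-memory SQLite database from the standard library. The table schema, the tuple-based user representation, the `save(email, name)` signature, and the `make_repo` helper are all assumptions made for illustration; a real repository would use the project's own models and database.

```python
import sqlite3

class UserRepository:
    """Simplified repository; the schema is an assumption for this sketch."""

    def __init__(self, connection):
        self.conn = connection
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS users (email TEXT PRIMARY KEY, name TEXT)"
        )

    def save(self, email, name):
        # Parameterized query: values are never interpolated into the SQL text.
        self.conn.execute(
            "INSERT OR REPLACE INTO users (email, name) VALUES (?, ?)",
            (email, name),
        )
        self.conn.commit()

    def find_by_email(self, email):
        row = self.conn.execute(
            "SELECT email, name FROM users WHERE email = ?", (email,)
        ).fetchone()
        return row  # (email, name) tuple, or None if not found

def make_repo():
    # Each test gets a fresh, isolated in-memory database.
    return UserRepository(sqlite3.connect(":memory:"))

def test_save_user_persists_to_database():
    repo = make_repo()
    repo.save("ada@example.com", "Ada")
    assert repo.find_by_email("ada@example.com") == ("ada@example.com", "Ada")

def test_find_by_email_returns_none_for_unknown_user():
    repo = make_repo()
    assert repo.find_by_email("nobody@example.com") is None

def test_find_by_email_handles_sql_injection_attempt():
    repo = make_repo()
    repo.save("ada@example.com", "Ada")
    # The payload is treated as a literal email string, so nothing matches.
    assert repo.find_by_email("' OR '1'='1") is None

test_save_user_persists_to_database()
test_find_by_email_returns_none_for_unknown_user()
test_find_by_email_handles_sql_injection_attempt()
```

An in-memory database keeps these tests fast, but behavior that depends on the production database engine would still need tests against that engine.
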
### Controllers and API Endpoints: 70-90%

**What**: HTTP controllers, API route handlers, request/response handling

**Why Moderate Coverage**:
- Framework handles much of the complexity
- Some error scenarios are difficult to trigger
- Integration tests cover many paths automatically

**Example**:
```python
# API endpoint - focus on business logic integration
# (app, order_service, ValidationError, logger and client are assumed to be
# defined elsewhere in the application and test setup)
from flask import jsonify, request

@app.route('/api/orders', methods=['POST'])
def create_order():
    data = request.json

    try:
        order = order_service.create(data)
        return jsonify(order.to_dict()), 201
    except ValidationError as e:
        return jsonify({'error': str(e)}), 400
    except Exception as e:
        logger.error(f"Order creation failed: {e}")
        return jsonify({'error': 'Internal error'}), 500

# Test main scenarios
def test_create_order_success():
    response = client.post('/api/orders', json=valid_order_data)
    assert response.status_code == 201

def test_create_order_validation_error():
    response = client.post('/api/orders', json=invalid_order_data)
    assert response.status_code == 400

# Generic 500 error handling may not need explicit test
# if framework testing covers it
```

### Configuration and Glue Code: 30-50%

**What**: Configuration loading, simple adapters, basic initialization code

**Why Low Coverage**:
- Little logic to test
- Mostly framework or library interaction
- Better covered by integration tests
- Testing provides minimal value

**Example**:
```python
import os

# Configuration loader - low unit test value
class Config:
    @staticmethod
    def load():
        return {
            'database_url': os.getenv('DATABASE_URL'),
            'api_key': os.getenv('API_KEY'),
            'debug': os.getenv('DEBUG', 'false').lower() == 'true'
        }

# Minimal unit testing needed
# Better to test that app starts with config in integration test
```

### Utilities and Helpers: 80-90%

**What**: Shared utility functions, formatters, parsers, helpers

**Why High Coverage**:
- Used in multiple places - bugs have wide impact
- Usually pure functions - easy to test thoroughly
- Fast unit tests with high value

**Example**:
```python
# Utility function - deserves thorough testing
def format_currency(amount, currency='USD'):
    """Format amount as currency string."""
    symbols = {'USD': '$', 'EUR': '€', 'GBP': '£'}
    symbol = symbols.get(currency, currency)

    if amount < 0:
        return f"-{symbol}{abs(amount):.2f}"
    return f"{symbol}{amount:.2f}"

# Comprehensive tests for shared utility
def test_format_usd():
    assert format_currency(100) == "$100.00"

def test_format_eur():
    assert format_currency(100, 'EUR') == "€100.00"

def test_format_negative():
    assert format_currency(-50) == "-$50.00"

def test_format_unknown_currency():
    assert format_currency(100, 'XYZ') == "XYZ100.00"
```

---

## When Coverage Becomes Harmful

Chasing coverage percentages as a goal leads to counterproductive practices.

### Anti-Pattern: Testing for Coverage, Not Value

**The Problem**: Writing tests that execute code but don't verify behavior, just to increase coverage numbers.

```python
# ❌ BAD: Test that only boosts coverage
def test_user_creation():
    user = User(email="test@example.com")
    # No assertions - just creates object for coverage
    # Provides zero value

# ✅ GOOD: Test that validates behavior
def test_user_creation_sets_default_values():
    user = User(email="test@example.com")

    assert user.email == "test@example.com"
    assert user.is_active is True
    assert user.created_at is not None
```

### Anti-Pattern: Testing Implementation Details to Hit Coverage

**The Problem**: Creating brittle tests of private methods just to reach coverage targets.

```python
import pytest

# (Order is assumed to be defined elsewhere in the domain model)
class OrderService:
    def create_order(self, items):
        self._validate(items)
        return self._build_order(items)

    def _validate(self, items):
        # Private method
        if not items:
            raise ValueError("No items")

    def _build_order(self, items):
        # Private method
        return Order(items)

# ❌ BAD: Testing private methods for coverage
def test_validate_private_method():
    service = OrderService()
    with pytest.raises(ValueError):
        service._validate([])  # Testing implementation detail

# ✅ GOOD: Testing through public interface
def test_create_order_with_empty_items_raises_error():
    service = OrderService()
    with pytest.raises(ValueError):
        service.create_order([])  # Testing behavior
```

### Anti-Pattern: 100% Coverage Without Edge Case Testing

**The Problem**: Achieving high line coverage while missing important edge cases.

```python
import pytest

def divide(a, b):
    return a / b

# Achieves 100% line coverage
def test_divide():
    result = divide(10, 2)
    assert result == 5

# But misses critical edge cases:
# - Division by zero
# - Negative numbers
# - Very large numbers
# - Floating point precision issues

# ✅ GOOD: Comprehensive edge case testing
def test_divide_normal_case():
    assert divide(10, 2) == 5

def test_divide_by_zero_raises_error():
    with pytest.raises(ZeroDivisionError):
        divide(10, 0)

def test_divide_negative_numbers():
    assert divide(-10, 2) == -5
    assert divide(10, -2) == -5

def test_divide_floating_point():
    assert divide(1, 3) == pytest.approx(0.333, rel=0.01)
```

---

## Practical Coverage Guidance

### Setting Organizational Coverage Targets

**Recommended Approach**: Set minimum thresholds that prevent coverage from decreasing, not aspirational 100% targets.

```ini
# .coveragerc
[run]
branch = true
source = src/

[report]
# Fail if overall coverage drops below 80%
fail_under = 80
exclude_lines =
    # Don't require coverage for:
    pragma: no cover
    def __repr__
    raise AssertionError
    raise NotImplementedError
    if __name__ == .__main__.:
    if TYPE_CHECKING:
    @abstract
```

**Progressive Enforcement**:
```text
# Start where you are
Current coverage: 65%
Set target: 65% (prevent regression)

# Gradually improve
After 3 months: Raise to 70%
After 6 months: Raise to 75%
After 12 months: Raise to 80%
```

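
The ratchet described above can be automated. The sketch below is one way to compute the next `fail_under` value; the function name `next_threshold`, the `step` of 5 points, and the `ceiling` of 80 are assumptions that mirror the schedule above, not settings of any coverage tool.

```python
def next_threshold(current_threshold, measured_coverage, step=5, ceiling=80):
    """Suggest the next fail_under value.

    The threshold never decreases, never exceeds what the suite already
    achieves, and rises by at most `step` points per review period.
    """
    if measured_coverage < current_threshold:
        # Coverage regressed: hold the line rather than lowering the bar.
        return current_threshold
    proposed = min(current_threshold + step, measured_coverage, ceiling)
    return max(current_threshold, int(proposed))

print(next_threshold(65, 65.0))  # 65: no headroom yet, just prevent regression
print(next_threshold(65, 72.4))  # 70: raise by one step
print(next_threshold(75, 91.0))  # 80: capped at the ceiling
print(next_threshold(70, 68.0))  # 70: a regression does not lower the threshold
```

Capping the proposal at the measured value keeps the build green on the day the threshold changes, so raising the bar never blocks unrelated work.
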
### Coverage in Code Review

**Use Coverage Reports to Guide Review**:
```bash
# Generate coverage diff for PR
# (diff-cover reads the XML report, so emit it alongside the HTML one)
pytest --cov=src --cov-report=xml --cov-report=html
diff-cover coverage.xml --compare-branch=main

# Shows, for lines changed relative to main:
# - New code added without tests
# - Changed code with decreased coverage
# - Uncovered changes on critical paths (for the reviewer to judge)
```

**Review Checklist**:
- [ ] New business logic has >90% coverage
- [ ] Critical paths are covered
- [ ] Edge cases have tests
- [ ] Tests make meaningful assertions
- [ ] No coverage-chasing empty tests

### Coverage for Different Test Levels

**Unit Test Coverage**: Track separately from integration/E2E
```bash
# Unit tests only
pytest tests/unit/ --cov=src --cov-report=term

# Unit test coverage should be higher (70-90%)
```

**Integration Test Coverage**: Track what integration tests add
```bash
# Run all tests, see what integration adds
pytest tests/ --cov=src --cov-report=term

# Many integration tests don't add coverage
# but add confidence in integration behavior
```

**Don't Count E2E Against Coverage**: E2E tests validate workflows, not code coverage
```bash
# Exclude E2E from coverage metrics
pytest tests/unit/ tests/integration/ --cov=src
# Don't include tests/e2e/ in coverage calculation
```

---

## Risk-Based Coverage Strategy

Focus coverage efforts where bugs have highest impact.

### Risk Assessment Matrix

```
High Risk = High Business Impact + High Complexity + High Change Frequency
Medium Risk = Mixed factors
Low Risk = Low on most factors

High Risk (Target: 90-100% coverage)
├─ Payment processing
├─ Authentication and authorization
├─ Order calculation and fulfillment
├─ Data migrations
└─ Security-critical operations

Medium Risk (Target: 70-90% coverage)
├─ User management
├─ Reporting and analytics
├─ Email notifications
├─ Search functionality
└─ Content management

Low Risk (Target: 30-70% coverage)
├─ Configuration loading
├─ Static content rendering
├─ Logging utilities
├─ Admin tools (low usage)
└─ Deprecated features
```

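
The matrix can be turned into a small scoring helper. In the sketch below, the 1-to-3 rating scale, the score cut-offs, the example ratings, and the `classify_risk` name are all assumptions chosen for illustration; only the three coverage target ranges come from the matrix above.

```python
from dataclasses import dataclass

# Target ranges taken from the matrix above.
TARGETS = {"High": (90, 100), "Medium": (70, 90), "Low": (30, 70)}

@dataclass
class Component:
    name: str
    business_impact: int   # 1 (low) to 3 (high)
    complexity: int        # 1 (low) to 3 (high)
    change_frequency: int  # 1 (low) to 3 (high)

def classify_risk(component):
    """Map three 1-3 ratings to a risk level (the cut-offs are assumptions)."""
    score = (
        component.business_impact
        + component.complexity
        + component.change_frequency
    )
    if score >= 8:
        return "High"
    if score >= 5:
        return "Medium"
    return "Low"

components = [
    Component("Payment processing", 3, 3, 2),
    Component("Email notifications", 2, 2, 2),
    Component("Logging utilities", 1, 1, 2),
]

for c in components:
    level = classify_risk(c)
    low, high = TARGETS[level]
    print(f"{c.name}: {level} risk, target {low}-{high}% coverage")
```

Writing the ratings down, even on a scale this coarse, makes the reasoning behind each target reviewable when priorities change.
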
### Historical Bug Analysis

**Use Past Data to Guide Coverage**:
```sql
-- Find modules with highest bug density (bugs per line of code)
SELECT
    b.module,
    COUNT(*) AS bug_count,
    COUNT(*) / m.lines_of_code AS bug_density
FROM bugs AS b
JOIN code_metrics AS m ON b.module = m.module
WHERE b.created_at > DATE_SUB(NOW(), INTERVAL 12 MONTH)
GROUP BY b.module, m.lines_of_code
ORDER BY bug_density DESC;

-- Prioritize coverage for high bug density modules
```

### Coverage Heat Maps

**Visualize Coverage by Risk**:
```
Component           | Coverage | Risk | Status
--------------------|----------|------|------------------------
Payment Processing  | 95%      | High | ✓ Good
Authentication      | 88%      | High | ⚠ Needs improvement
Order Calculation   | 92%      | High | ✓ Good
User Management     | 75%      | Med  | ✓ Acceptable
Email Service       | 65%      | Med  | ⚠ Consider adding tests
Logging Utils       | 45%      | Low  | ✓ Acceptable
```

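
A table like this can be generated rather than maintained by hand. The sketch below compares measured coverage with the lower bound of each risk level's target range from the Risk Assessment Matrix. The `coverage_status` function, the status labels, and the hard-coded rows are illustrative assumptions; in practice the percentages would come from a coverage report.

```python
# Minimum acceptable coverage per risk level (lower bounds of the target ranges).
MINIMUM = {"High": 90, "Med": 70, "Low": 30}

def coverage_status(coverage, risk):
    """Return a status label for a component's coverage given its risk level."""
    return "OK" if coverage >= MINIMUM[risk] else "Below target"

rows = [
    ("Payment Processing", 95, "High"),
    ("Authentication", 88, "High"),
    ("Order Calculation", 92, "High"),
    ("User Management", 75, "Med"),
    ("Email Service", 65, "Med"),
    ("Logging Utils", 45, "Low"),
]

for name, coverage, risk in rows:
    status = coverage_status(coverage, risk)
    print(f"{name:<20} | {coverage:>3}% | {risk:<4} | {status}")
```
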
---

## Coverage Tools and Techniques

### Python Coverage Tools

```bash
# pytest-cov (most common)
pytest --cov=src --cov-report=html --cov-report=term

# View HTML report (macOS; use xdg-open on Linux)
open htmlcov/index.html

# Show missing lines
pytest --cov=src --cov-report=term-missing

# Branch coverage
pytest --cov=src --cov-branch

# Fail if coverage below threshold
pytest --cov=src --cov-fail-under=80
```

### Coverage Badges for Documentation

```markdown
# README.md
![Coverage](https://img.shields.io/codecov/c/github/username/repo)

# Updates automatically in CI
# codecov.io, coveralls.io provide this
```

### Continuous Coverage Tracking

```yaml
# .github/workflows/coverage.yml
name: Coverage

on: [push, pull_request]

jobs:
  coverage:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v3

      # Assumed setup: adjust the Python version and dependencies to the project
      - uses: actions/setup-python@v4
        with:
          python-version: '3.x'

      - name: Install test dependencies
        run: pip install pytest pytest-cov

      - name: Run tests with coverage
        run: pytest --cov=src --cov-report=xml

      - name: Upload to Codecov
        uses: codecov/codecov-action@v3
        with:
          file: ./coverage.xml
          fail_ci_if_error: true

      - name: Comment coverage on PR
        uses: py-cov-action/python-coverage-comment-action@v3
        with:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```

---

## Summary: Coverage Strategy Principles

1. **Focus on behavior coverage, not just code coverage**: Tests should validate scenarios, not just execute lines

2. **Set differentiated targets**: Business logic deserves higher coverage than configuration code

3. **Use coverage as a guide, not a goal**: Low coverage indicates gaps; high coverage doesn't guarantee quality

4. **Prioritize by risk**: Critical systems deserve comprehensive testing; peripheral features need less

5. **Track trends, not just numbers**: Coverage should generally increase over time, especially for new code

6. **Don't game the metrics**: Tests without assertions, or tests of trivial code, provide false confidence

7. **Combine with other metrics**: Coverage + defect detection rate + test execution time = complete picture

**The Ultimate Goal**: High-quality tests that catch bugs, enable refactoring, and provide confidence, not just high coverage numbers. Coverage is a useful tool, but test effectiveness is what truly matters.