# Test Coverage Strategy: Beyond the Numbers

Code coverage metrics have been both praised and criticized. The truth lies in understanding what coverage does and does not tell you, and in using it appropriately to guide testing strategy.

## Code Coverage vs. Behavior Coverage

### Code Coverage Metrics

**Line Coverage**: Percentage of code lines executed by tests
```python
def calculate_discount(customer_type, order_total):
    if customer_type == "VIP":        # Line 1
        return order_total * 0.10     # Line 2
    return 0                          # Line 3

# Test that achieves 66% line coverage
def test_vip_discount():
    result = calculate_discount("VIP", 100)
    assert result == 10
    # Lines executed: 1, 2 (66%)
    # Line not executed: 3 (33%)
```

**Branch Coverage**: Percentage of decision branches executed
```python
def calculate_discount(customer_type, order_total):
    if customer_type == "VIP":
        return order_total * 0.10
    return 0

# 50% branch coverage (only true branch tested)
# Need test for false branch to reach 100%
```
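
To reach 100% branch coverage here, both outcomes of the `if` need a test; a minimal sketch, run with branch measurement enabled (`pytest --cov=src --cov-branch`):

```python
# Covers both the true branch and the false branch, with assertions on each.
def test_vip_discount_branch():
    assert calculate_discount("VIP", 100) == 10

def test_non_vip_branch_returns_zero():
    assert calculate_discount("Regular", 100) == 0
```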

**Path Coverage**: Percentage of execution paths through code (most comprehensive)
```python
def calculate_price(quantity, is_vip, has_coupon):
    price = quantity * 10

    if is_vip:        # 2 branches
        price *= 0.9

    if has_coupon:    # 2 branches
        price *= 0.95

    return price

# 2 × 2 = 4 possible paths:
# 1. Not VIP, no coupon
# 2. Not VIP, has coupon
# 3. VIP, no coupon
# 4. VIP, has coupon
```
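
To make the enumeration concrete, here is a minimal sketch with one test per path through `calculate_price`; the expected values follow directly from the arithmetic above, and `pytest.approx` guards against float rounding:

```python
import pytest

# One test per execution path (quantity=10 gives a base price of 100).
def test_regular_customer_no_coupon():
    assert calculate_price(10, is_vip=False, has_coupon=False) == 100

def test_regular_customer_with_coupon():
    assert calculate_price(10, is_vip=False, has_coupon=True) == pytest.approx(95.0)

def test_vip_no_coupon():
    assert calculate_price(10, is_vip=True, has_coupon=False) == pytest.approx(90.0)

def test_vip_with_coupon():
    assert calculate_price(10, is_vip=True, has_coupon=True) == pytest.approx(85.5)
```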

### Behavior Coverage

**The Critical Distinction**: Code coverage measures execution; behavior coverage measures validation of scenarios.

```python
# ❌ HIGH code coverage, LOW behavior coverage
def test_calculate_discount():
    # Executes all lines
    result1 = calculate_discount("VIP", 100)
    result2 = calculate_discount("Regular", 100)
    # But makes NO assertions - just executes code
    # 100% line coverage, 0% behavior validation

# ✅ GOOD behavior coverage
def test_vip_customer_receives_discount():
    result = calculate_discount("VIP", 100)
    assert result == 10, "VIP customers should get 10% discount"

def test_regular_customer_receives_no_discount():
    result = calculate_discount("Regular", 100)
    assert result == 0, "Regular customers should get no discount"

def test_discount_scales_with_order_total():
    assert calculate_discount("VIP", 100) == 10
    assert calculate_discount("VIP", 200) == 20

def test_empty_customer_type_returns_zero():
    result = calculate_discount("", 100)
    assert result == 0
```

### What Coverage Metrics Tell You

**Coverage CAN tell you**:
- Which code has been executed by tests
- Which branches/paths have not been tested
- Where obvious gaps in testing exist
- Whether new code has any tests

**Coverage CANNOT tell you**:
- Whether tests make meaningful assertions
- Whether edge cases are properly validated
- Whether business logic is correct
- Whether tests would catch real bugs
- Test quality or effectiveness

---

## Recommended Coverage Targets by Component Type

Coverage targets should vary based on code type, recognizing that different components have different testing ROI.

### Business Logic and Domain Code: 90-100%

**What**: Core business rules, calculations, domain models, workflows

**Why High Coverage**:
- High complexity requires thorough testing
- Bugs here directly impact business outcomes
- Logic changes frequently
- Unit tests are fast and provide clear value

**Example**:
```python
# Core business logic deserves comprehensive coverage
class OrderPricing:
    def calculate_total(self, items, customer):
        """Calculate order total with discounts."""
        subtotal = sum(item.price * item.quantity for item in items)

        if customer.is_vip:
            subtotal *= 0.90

        if subtotal > 1000:
            subtotal *= 0.95  # Bulk discount

        return subtotal + self.calculate_tax(subtotal)

# Comprehensive test coverage
def test_regular_customer_basic_order():
    assert pricing.calculate_total(items_100, regular_customer) == ...

def test_vip_customer_receives_discount():
    assert pricing.calculate_total(items_100, vip_customer) == ...

def test_bulk_discount_applied_over_threshold():
    assert pricing.calculate_total(items_1500, regular_customer) == ...

def test_vip_and_bulk_discounts_stack():
    assert pricing.calculate_total(items_1500, vip_customer) == ...

def test_tax_calculation_included():
    assert pricing.calculate_total(items_100, regular_customer) > 100
```
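
As a sanity check on how the discounts stack (the bulk threshold is evaluated on the already-discounted subtotal), here is a hypothetical walkthrough for a 1,500 VIP order; the flat 10% tax is an assumption for illustration only, since `calculate_tax` is not shown:

```python
# Hypothetical arithmetic behind test_vip_and_bulk_discounts_stack.
subtotal = 1500 * 0.90               # VIP discount first           -> 1350.0
subtotal *= 0.95                     # 1350 > 1000, bulk discount   -> 1282.5
total = subtotal + subtotal * 0.10   # assumed flat 10% tax         -> 1410.75
```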

### Integration Points: 70-90%

**What**: Repositories, API clients, message queue handlers, external service integrations

**Why Moderate Coverage**:
- Integration behavior is critical, but some scenarios are impractical to test
- Integration tests are slower than unit tests
- Some paths only make sense in a production environment

**Example**:
```python
# Repository - 80% coverage is reasonable
class UserRepository:
    def save(self, user):
        # Test this
        pass

    def find_by_email(self, email):
        # Test this
        pass

    def find_all_active(self):
        # Test this
        pass

    def cleanup_old_sessions(self):
        # Maintenance query - maybe skip or light testing
        pass

# Test critical paths thoroughly
def test_save_user_persists_to_database():
    # Essential to test
    pass

def test_find_by_email_returns_correct_user():
    # Essential to test
    pass

# Less critical edge cases can be sampled
def test_find_by_email_handles_sql_injection_attempt():
    # Nice to have but framework may handle this
    pass
```
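
To show what the essential tests might actually assert, here is a minimal sketch; the `db_session` fixture and `User` model are hypothetical stand-ins for whatever persistence setup the project uses:

```python
# Hypothetical: assumes a transaction-wrapped test database session fixture
# (db_session) and a User model; adapt to the project's actual setup.
def test_save_user_persists_to_database(db_session):
    repo = UserRepository(db_session)
    repo.save(User(email="alice@example.com"))

    assert repo.find_by_email("alice@example.com") is not None

def test_find_by_email_returns_correct_user(db_session):
    repo = UserRepository(db_session)
    repo.save(User(email="alice@example.com"))
    repo.save(User(email="bob@example.com"))

    assert repo.find_by_email("bob@example.com").email == "bob@example.com"
```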

### Controllers and API Endpoints: 70-90%

**What**: HTTP controllers, API route handlers, request/response handling

**Why Moderate Coverage**:
- Framework handles much of the complexity
- Some error scenarios are difficult to trigger
- Integration tests cover many paths automatically

**Example**:
```python
# API endpoint - focus on business logic integration
@app.route('/api/orders', methods=['POST'])
def create_order():
    data = request.json

    try:
        order = order_service.create(data)
        return jsonify(order.to_dict()), 201
    except ValidationError as e:
        return jsonify({'error': str(e)}), 400
    except Exception as e:
        logger.error(f"Order creation failed: {e}")
        return jsonify({'error': 'Internal error'}), 500

# Test main scenarios
def test_create_order_success():
    response = client.post('/api/orders', json=valid_order_data)
    assert response.status_code == 201

def test_create_order_validation_error():
    response = client.post('/api/orders', json=invalid_order_data)
    assert response.status_code == 400

# Generic 500 error handling may not need explicit test
# if framework testing covers it
```
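
The `client` and payloads used in these tests come from test setup; a minimal sketch using Flask's built-in test client, where the import path and payload shapes are placeholders rather than project specifics:

```python
# Hypothetical test setup for the endpoint tests above.
from myapp import app  # placeholder import path

app.config["TESTING"] = True
client = app.test_client()  # Flask's built-in test client

valid_order_data = {"items": [{"sku": "ABC-1", "quantity": 2}]}
invalid_order_data = {"items": []}  # expected to trigger ValidationError
```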

### Configuration and Glue Code: 30-50%

**What**: Configuration loading, simple adapters, basic initialization code

**Why Low Coverage**:
- Little logic to test
- Mostly framework or library interaction
- Better covered by integration tests
- Testing provides minimal value

**Example**:
```python
import os

# Configuration loader - low unit test value
class Config:
    @staticmethod
    def load():
        return {
            'database_url': os.getenv('DATABASE_URL'),
            'api_key': os.getenv('API_KEY'),
            'debug': os.getenv('DEBUG', 'false').lower() == 'true'
        }

# Minimal unit testing needed
# Better to test that app starts with config in integration test
```
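
If any unit test is wanted at all, pytest's built-in `monkeypatch` fixture is enough to smoke-test the one piece of real logic here, the `DEBUG` parsing; a minimal sketch:

```python
# Light smoke tests for the only conditional logic in Config.load.
def test_debug_flag_parsed_from_env(monkeypatch):
    monkeypatch.setenv('DEBUG', 'TRUE')
    assert Config.load()['debug'] is True

def test_debug_defaults_to_false(monkeypatch):
    monkeypatch.delenv('DEBUG', raising=False)
    assert Config.load()['debug'] is False
```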

### Utilities and Helpers: 80-90%

**What**: Shared utility functions, formatters, parsers, helpers

**Why High Coverage**:
- Used in multiple places - bugs have wide impact
- Usually pure functions - easy to test thoroughly
- Fast unit tests with high value

**Example**:
```python
# Utility function - deserves thorough testing
def format_currency(amount, currency='USD'):
    """Format amount as currency string."""
    symbols = {'USD': '$', 'EUR': '€', 'GBP': '£'}
    symbol = symbols.get(currency, currency)

    if amount < 0:
        return f"-{symbol}{abs(amount):.2f}"
    return f"{symbol}{amount:.2f}"

# Comprehensive tests for shared utility
def test_format_usd():
    assert format_currency(100) == "$100.00"

def test_format_eur():
    assert format_currency(100, 'EUR') == "€100.00"

def test_format_negative():
    assert format_currency(-50) == "-$50.00"

def test_format_unknown_currency():
    assert format_currency(100, 'XYZ') == "XYZ100.00"
```

---

## When Coverage Becomes Harmful

Chasing coverage percentages as a goal leads to counterproductive practices.

### Anti-Pattern: Testing for Coverage, Not Value

**The Problem**: Writing tests that execute code but don't verify behavior, just to increase coverage numbers.

```python
# ❌ BAD: Test that only boosts coverage
def test_user_creation():
    user = User(email="test@example.com")
    # No assertions - just creates object for coverage
    # Provides zero value

# ✅ GOOD: Test that validates behavior
def test_user_creation_sets_default_values():
    user = User(email="test@example.com")

    assert user.email == "test@example.com"
    assert user.is_active == True
    assert user.created_at is not None
```

### Anti-Pattern: Testing Implementation Details to Hit Coverage

**The Problem**: Creating brittle tests of private methods just to reach coverage targets.

```python
import pytest

class OrderService:
    def create_order(self, items):
        self._validate(items)
        return self._build_order(items)

    def _validate(self, items):
        # Private method
        if not items:
            raise ValueError("No items")

    def _build_order(self, items):
        # Private method
        return Order(items)

# ❌ BAD: Testing private methods for coverage
def test_validate_private_method():
    service = OrderService()
    with pytest.raises(ValueError):
        service._validate([])  # Testing implementation detail

# ✅ GOOD: Testing through public interface
def test_create_order_with_empty_items_raises_error():
    service = OrderService()
    with pytest.raises(ValueError):
        service.create_order([])  # Testing behavior
```

### Anti-Pattern: 100% Coverage Without Edge Case Testing

**The Problem**: Achieving high line coverage while missing important edge cases.

```python
import pytest

def divide(a, b):
    return a / b

# Achieves 100% line coverage
def test_divide():
    result = divide(10, 2)
    assert result == 5

# But misses critical edge cases:
# - Division by zero
# - Negative numbers
# - Very large numbers
# - Floating point precision issues

# ✅ GOOD: Comprehensive edge case testing
def test_divide_normal_case():
    assert divide(10, 2) == 5

def test_divide_by_zero_raises_error():
    with pytest.raises(ZeroDivisionError):
        divide(10, 0)

def test_divide_negative_numbers():
    assert divide(-10, 2) == -5
    assert divide(10, -2) == -5

def test_divide_floating_point():
    assert divide(1, 3) == pytest.approx(0.333, rel=0.01)
```

---

## Practical Coverage Guidance

### Setting Organizational Coverage Targets

**Recommended Approach**: Set minimum thresholds that prevent coverage from decreasing, not aspirational 100% targets.

```ini
# .coveragerc
[run]
branch = true
source = src/

[report]
# Fail if overall coverage drops below 80%
fail_under = 80
exclude_lines =
    # Don't require coverage for:
    pragma: no cover
    def __repr__
    raise AssertionError
    raise NotImplementedError
    if __name__ == .__main__.:
    if TYPE_CHECKING:
    @abstractmethod
```

**Progressive Enforcement**:
```text
# Start where you are
Current coverage: 65%
Set target: 65% (prevent regression)

# Gradually improve
After 3 months: Raise to 70%
After 6 months: Raise to 75%
After 12 months: Raise to 80%
```
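
One way to make the "prevent regression" step enforceable is to wire the current floor into CI with pytest-cov's threshold flag and raise the number on the schedule above; 65 here is just the starting figure from this example:

```bash
# Fail the build if coverage drops below today's baseline;
# bump the threshold as the team raises its target (65 -> 70 -> 75 -> 80).
pytest --cov=src --cov-fail-under=65
```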

### Coverage in Code Review

**Use Coverage Reports to Guide Review**:
```bash
# Generate coverage diff for PR
pytest --cov=src --cov-report=xml
diff-cover coverage.xml --compare-branch=main

# Shows:
# - New code added without tests
# - Changed code with decreased coverage
# - Critical paths missing coverage
```

**Review Checklist**:
- [ ] New business logic has >90% coverage (enforceable with diff-cover, see the sketch below)
- [ ] Critical paths are covered
- [ ] Edge cases have tests
- [ ] Tests make meaningful assertions
- [ ] No coverage-chasing empty tests
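
The first checklist item can be checked mechanically against the PR diff rather than eyeballed; a sketch using diff-cover's threshold option, where the 90 mirrors the checklist target:

```bash
# Require 90% coverage on the lines changed in this PR,
# independent of the repository-wide total.
pytest --cov=src --cov-report=xml
diff-cover coverage.xml --compare-branch=main --fail-under=90
```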

### Coverage for Different Test Levels

**Unit Test Coverage**: Track separately from integration/E2E
```bash
# Unit tests only
pytest tests/unit/ --cov=src --cov-report=term

# Unit test coverage should be higher (70-90%)
```

**Integration Test Coverage**: Track what integration tests add
```bash
# Run all tests, see what integration adds
pytest tests/ --cov=src --cov-report=term

# Many integration tests don't add coverage
# but add confidence in integration behavior
```

**Don't Count E2E Against Coverage**: E2E tests validate workflows, not code coverage
```bash
# Exclude E2E from coverage metrics
pytest tests/unit/ tests/integration/ --cov=src
# Don't include tests/e2e/ in coverage calculation
```

---

## Risk-Based Coverage Strategy

Focus coverage efforts where bugs have highest impact.

### Risk Assessment Matrix

```
High Risk = High Business Impact + High Complexity + High Change Frequency
Medium Risk = Mixed factors
Low Risk = Low on most factors

High Risk (Target: 90-100% coverage)
├─ Payment processing
├─ Authentication and authorization
├─ Order calculation and fulfillment
├─ Data migrations
└─ Security-critical operations

Medium Risk (Target: 70-90% coverage)
├─ User management
├─ Reporting and analytics
├─ Email notifications
├─ Search functionality
└─ Content management

Low Risk (Target: 30-70% coverage)
├─ Configuration loading
├─ Static content rendering
├─ Logging utilities
├─ Admin tools (low usage)
└─ Deprecated features
```
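
Where a hosted coverage service is in use, these differentiated targets can be expressed as separate status checks; a hedged sketch for codecov.io, with illustrative paths and numbers that should be adapted to the project and verified against Codecov's documented status-check configuration:

```yaml
# codecov.yml - illustrative per-component targets mirroring the matrix above
coverage:
  status:
    project:
      payments:
        target: 90%
        paths:
          - "src/payments/"
      user-management:
        target: 70%
        paths:
          - "src/users/"
      logging-utils:
        target: 40%
        paths:
          - "src/logging_utils/"
```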

### Historical Bug Analysis

**Use Past Data to Guide Coverage**:
```sql
-- Find modules with highest bug density
SELECT
    bugs.module,
    COUNT(*) AS bug_count,
    COUNT(*) / code_metrics.lines_of_code AS bug_density
FROM bugs
JOIN code_metrics ON bugs.module = code_metrics.module
WHERE bugs.created_at > DATE_SUB(NOW(), INTERVAL 12 MONTH)
GROUP BY bugs.module, code_metrics.lines_of_code
ORDER BY bug_density DESC;

-- Prioritize coverage for high bug density modules
```

### Coverage Heat Maps

**Visualize Coverage by Risk**:
```
Component           | Coverage | Risk | Status
--------------------|----------|------|-------------------------
Payment Processing  | 95%      | High | ✓ Good
Authentication      | 88%      | High | ⚠ Needs improvement
Order Calculation   | 92%      | High | ✓ Good
User Management     | 75%      | Med  | ✓ Acceptable
Email Service       | 65%      | Med  | ⚠ Consider adding tests
Logging Utils       | 45%      | Low  | ✓ Acceptable
```

---

## Coverage Tools and Techniques

### Python Coverage Tools

```bash
# pytest-cov (most common)
pytest --cov=src --cov-report=html --cov-report=term

# View HTML report
open htmlcov/index.html

# Show missing lines
pytest --cov=src --cov-report=term-missing

# Branch coverage
pytest --cov=src --cov-branch

# Fail if coverage below threshold
pytest --cov=src --cov-fail-under=80
```

### Coverage Badges for Documentation

```markdown
# README.md


# Updates automatically in CI
# codecov.io, coveralls.io provide this
```

### Continuous Coverage Tracking

```yaml
# .github/workflows/coverage.yml
name: Coverage

on: [push, pull_request]

jobs:
  coverage:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v3

      # Set up Python and install test dependencies
      # (adjust versions and packages to the project's own setup)
      - uses: actions/setup-python@v4
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: pip install pytest pytest-cov

      - name: Run tests with coverage
        run: pytest --cov=src --cov-report=xml

      - name: Upload to Codecov
        uses: codecov/codecov-action@v3
        with:
          file: ./coverage.xml
          fail_ci_if_error: true

      - name: Comment coverage on PR
        uses: py-cov-action/python-coverage-comment-action@v3
        with:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```

---

## Summary: Coverage Strategy Principles

1. **Focus on behavior coverage, not just code coverage**: Tests should validate scenarios, not just execute lines

2. **Set differentiated targets**: Business logic deserves higher coverage than configuration code

3. **Use coverage as a guide, not a goal**: Low coverage indicates gaps; high coverage doesn't guarantee quality

4. **Prioritize by risk**: Critical systems deserve comprehensive testing; peripheral features need less

5. **Track trends, not just numbers**: Coverage should generally increase over time, especially for new code

6. **Don't game the metrics**: Tests without assertions, or tests written only to exercise trivial code, provide false confidence

7. **Combine with other metrics**: Coverage + defect detection rate + test execution time = complete picture

**The Ultimate Goal**: High-quality tests that catch bugs, enable refactoring, and provide confidence—not just high coverage numbers. Coverage is a useful tool, but test effectiveness is what truly matters.