---
name: mutation-testing
description: Use when validating test effectiveness, measuring test quality beyond coverage, choosing mutation testing tools (Stryker, PITest, mutmut), interpreting mutation scores, or improving test suites - provides mutation operators, score interpretation, and integration patterns
---

# Mutation Testing

## Overview

**Core principle:** Mutation testing checks whether your tests actually verify behavior. It deliberately introduces small bugs (mutants) into the code and checks whether the tests catch them.

**Rule:** 100% code coverage doesn't mean good tests. Coverage shows which code runs; mutation score shows whether the tests detect bugs in it.

## Code Coverage vs Mutation Score

| Metric | What It Measures | Example |
|--------|------------------|---------|
| **Code Coverage** | Lines executed by tests | Calling `calculate_tax(100)` executes every line → 100% coverage |
| **Mutation Score** | Injected bugs detected by tests | Change `*` to `/` → test still passes → weak tests |

**Problem with coverage:**

```python
def calculate_tax(amount):
    return amount * 0.08


def test_calculate_tax():
    calculate_tax(100)  # 100% coverage, but asserts nothing!
```

**Mutation testing catches this:**

1. Mutates `* 0.08` to `/ 0.08`
2. Runs the test
3. Test still passes → **survived mutant** (the test is too weak)

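Adding a single assertion on the result is enough to kill that mutant. A minimal sketch; the expected value 8.0 is simply 100 × 0.08, and `math.isclose` avoids an exact floating-point comparison:

```python
import math


def calculate_tax(amount):
    return amount * 0.08


def test_calculate_tax():
    # The original returns 8.0; the `/ 0.08` mutant returns roughly 1250, so this assertion fails on it.
    assert math.isclose(calculate_tax(100), 8.0)
```

The test now passes on the original code and fails on the mutant, which is exactly what "killed" means.
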
---

## How Mutation Testing Works

**Process:**

1. **Create a mutant:** Change the code slightly (e.g., `+` → `-`, `<` → `<=`)
2. **Run the tests:** Do any of them fail?
3. **Classify the mutant:**
   - **Killed:** A test failed → the tests detect this bug
   - **Survived:** All tests passed → the tests don't verify this logic
   - **Timeout:** The tests hung (e.g., the mutant caused an infinite loop) → usually counted as killed
   - **No coverage:** No test executes the mutated code → add a test

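To make the create/run/classify loop concrete, here is a toy mutation tester built on Python's standard `ast` module. It is a teaching sketch, not how mutmut or the other tools are implemented: it only swaps arithmetic operators (`+`/`-`, `*`/`/`), and the code under test plus the `weak_test`/`strong_test` functions are hypothetical. Any exception raised by a test (assertion failure or crash) counts as killing the mutant.

```python
import ast

# Hypothetical code under test, kept as source text so it can be mutated.
SOURCE = """
def calculate_tax(amount):
    return amount * 0.08
"""

# Arithmetic operator swaps: a small subset of real mutation operators.
SWAPS = {ast.Add: ast.Sub, ast.Sub: ast.Add, ast.Mult: ast.Div, ast.Div: ast.Mult}


class SwapNthOperator(ast.NodeTransformer):
    """Swap the operator of the n-th mutable BinOp (in visit order); leave the rest intact."""

    def __init__(self, target):
        self.target = target
        self.seen = 0

    def visit_BinOp(self, node):
        self.generic_visit(node)
        if type(node.op) in SWAPS:
            if self.seen == self.target:
                node.op = SWAPS[type(node.op)]()
            self.seen += 1
        return node


def count_mutation_points(source):
    return sum(
        isinstance(node, ast.BinOp) and type(node.op) in SWAPS
        for node in ast.walk(ast.parse(source))
    )


def load(tree):
    """Compile a (possibly mutated) module AST and return its namespace."""
    namespace = {}
    exec(compile(ast.fix_missing_locations(tree), "<mutant>", "exec"), namespace)
    return namespace


def run_mutation_testing(source, test):
    """Create one mutant per mutation point, run `test` against it, and classify it."""
    results = []
    for i in range(count_mutation_points(source)):
        mutant = SwapNthOperator(i).visit(ast.parse(source))
        try:
            test(load(mutant))
        except Exception:  # assertion failure or crash: the test noticed the bug
            results.append("killed")
        else:
            results.append("survived")  # test passed against buggy code
    return results


def weak_test(ns):
    ns["calculate_tax"](100)  # executes the code but asserts nothing


def strong_test(ns):
    assert abs(ns["calculate_tax"](100) - 8.0) < 1e-9


print(run_mutation_testing(SOURCE, weak_test))    # ['survived']
print(run_mutation_testing(SOURCE, strong_test))  # ['killed']
```

Real tools add many more operators, typically run only the tests that cover each mutant, and enforce timeouts, but the classification step follows the same idea.
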
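**Mutation Score:**

```
Mutation Score = Killed Mutants / (Total Mutants - Equivalent Mutants) × 100
```

Most tools count timed-out mutants as killed. Equivalent mutants (which no test can kill) should be left out of the total once they have been identified.

**Thresholds (rules of thumb):**

- **> 80%:** Excellent test quality
- **60-80%:** Acceptable
- **< 60%:** Tests are weak
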
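The score and thresholds can be sketched in Python as follows, assuming two common conventions: timed-out mutants count as detected, and mutants already confirmed as equivalent are not included in any of the counts. Mutants with no coverage stay in the denominator as undetected. The function names and sample counts are illustrative, not output from any tool.

```python
def mutation_score(killed, survived, timed_out=0, no_coverage=0):
    """Percent of (non-equivalent) mutants detected by the tests.

    Assumptions: timed-out mutants count as detected, and confirmed
    equivalent mutants are NOT included in any of these counts.
    """
    detected = killed + timed_out
    total = detected + survived + no_coverage
    if total == 0:
        raise ValueError("no mutants to score")
    return 100 * detected / total


def rating(score):
    """Map a score onto the rule-of-thumb thresholds."""
    if score > 80:
        return "excellent"
    if score >= 60:
        return "acceptable"
    return "weak"


score = mutation_score(killed=45, survived=10, timed_out=3, no_coverage=2)
print(score, rating(score))  # 80.0 acceptable
```

Here 48 of 60 mutants are detected, which lands exactly on the 80% boundary and therefore rates as acceptable rather than excellent.
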
---

## Tool Selection

| Language | Tool | Why |
|----------|------|-----|
| **JavaScript/TypeScript** | **Stryker** | Best JS support, framework-agnostic |
| **Java** | **PITest** | Industry standard, Maven/Gradle integration |
| **Python** | **mutmut** | Simple, fast, pytest integration |
| **C#** | **Stryker.NET** | .NET ecosystem integration |

---

## Example: Python with mutmut

### Installation

```bash
pip install mutmut
```

---

### Basic Usage

```bash
# Run mutation testing
mutmut run

# Summarize results (lists the surviving mutants)
mutmut results

# Show the diff for one mutant, e.g. a survivor (a bug your tests missed);
# 42 is a placeholder for a mutant ID taken from `mutmut results`
mutmut show 42
```

---

### Configuration

```ini
# setup.cfg (option names follow mutmut 2.x; mutmut 3 changed its configuration)
[mutmut]
paths_to_mutate=src/
backup=False
runner=python -m pytest -x
tests_dir=tests/
```

---

### Example

```python
# src/calculator.py
def calculate_discount(price, percent):
    if percent > 100:
        raise ValueError("Percent cannot exceed 100")
    return price * (1 - percent / 100)


# tests/test_calculator.py
from calculator import calculate_discount  # assumes src/ is on the import path


def test_calculate_discount():
    result = calculate_discount(100, 20)
    assert result == 80
```

**Run mutmut:**

```bash
mutmut run
```

**Possible mutations:**

1. `percent > 100` → `percent >= 100` (boundary)
2. `1 - percent` → `1 + percent` (operator)
3. `percent / 100` → `percent * 100` (operator)
4. `price * (...)` → `price / (...)` (operator)

**Results:**

- Mutation 1 **survived** (the test never checks the boundary at exactly 100)
- Mutations 2, 3, and 4 **killed** (the test catches these)

**Improvement:** Test exactly at the boundary. At `percent=100` the original returns 0, while mutation 1 raises `ValueError`. A test at 101 alone would not kill it, because both versions raise there.

```python
import pytest
from calculator import calculate_discount


def test_calculate_discount_boundary():
    # Kills mutation 1: the original returns 0 at exactly 100, the `>=` mutant raises
    assert calculate_discount(100, 100) == 0
    # Values above 100 must still be rejected
    with pytest.raises(ValueError):
        calculate_discount(100, 101)
```

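To double-check which input actually kills mutation 1, you can hand-write the mutant and compare it against the original. A minimal sketch without pytest; `calculate_discount_mutant`, the `check_*` helpers, and `kills` are hypothetical names used only for illustration:

```python
def calculate_discount(price, percent):
    if percent > 100:
        raise ValueError("Percent cannot exceed 100")
    return price * (1 - percent / 100)


def calculate_discount_mutant(price, percent):
    # Hand-written copy of mutation 1: `>` became `>=`
    if percent >= 100:
        raise ValueError("Percent cannot exceed 100")
    return price * (1 - percent / 100)


def raises_value_error(fn, *args):
    try:
        fn(*args)
    except ValueError:
        return True
    return False


# Each check returns True when `fn` behaves as the original specification requires.
def check_20_percent(fn):
    return fn(100, 20) == 80


def check_rejects_101(fn):
    return raises_value_error(fn, 100, 101)


def check_allows_100(fn):
    return not raises_value_error(fn, 100, 100) and fn(100, 100) == 0


def kills(check):
    """A check kills the mutant if it passes on the original and fails on the mutant."""
    return check(calculate_discount) and not check(calculate_discount_mutant)


for check in (check_20_percent, check_rejects_101, check_allows_100):
    print(check.__name__, kills(check))
# check_20_percent False
# check_rejects_101 False
# check_allows_100 True
```

Only the input sitting exactly on the boundary separates `>` from `>=`, which is why relational mutants are the usual signal of missing boundary tests.
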
---

## Common Mutation Operators

| Operator | Original | Mutated | What It Tests |
|----------|----------|---------|---------------|
| **Arithmetic** | `a + b` | `a - b` | Calculation logic |
| **Relational** | `a < b` | `a <= b` | Boundary conditions |
| **Logical** | `a and b` | `a or b` | Boolean logic |
| **Unary** | `+x` | `-x` | Sign handling |
| **Constant** | `return 0` | `return 1` | Magic numbers |
| **Return** | `return x` | `return None` | Return value validation |
| **Statement deletion** | `x = 5` | (deleted) | Side effects |

---

## Interpreting Mutation Score

### High Score (> 80%)

**Good tests that catch most bugs.**

```python
def add(a, b):
    return a + b


def test_add():
    assert add(2, 3) == 5
    assert add(-1, 1) == 0
    assert add(0, 0) == 0

# Mutations killed:
# - a - b  (add(2, 3) returns -1, test expects 5)
# - a * b  (add(2, 3) returns 6, test expects 5)
```

---

### Low Score (< 60%)

**Weak tests that don't verify logic.**

```python
def validate_email(email):
    return "@" in email and "." in email


def test_validate_email():
    validate_email("user@example.com")  # No assertion!

# Mutations survived:
# - "@" in email → "@" not in email
# - "and" → "or"
# - (All mutations survive because the test asserts nothing)
```

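Asserting on both accepted and rejected inputs kills the mutants listed above. A minimal sketch; the inputs are illustrative, and each rejected input contains exactly one of the two characters, which is what distinguishes `and` from `or`:

```python
def validate_email(email):
    return "@" in email and "." in email


def test_validate_email():
    assert validate_email("user@example.com")     # kills `"@" not in email`
    assert not validate_email("userexample.com")  # only "." present: kills `and` -> `or`
    assert not validate_email("user@example")     # only "@" present: also kills `and` -> `or`
```

This is still only a presence check, not real email validation; the point is that each assertion pins down behavior a mutant would change.
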
---

### Survived Mutants to Investigate

**Priority order:**

1. **Business logic mutations** (calculations, validations)
2. **Boundary conditions** (`<` → `<=`, `>` → `>=`)
3. **Error handling** (exception raising)

**Low priority:**

4. **Logging statements**
5. **Constants that don't affect behavior**

---

## Integration with CI/CD

### GitHub Actions (Python)

```yaml
# .github/workflows/mutation-testing.yml
name: Mutation Testing

on:
  schedule:
    - cron: '0 2 * * 0'  # Weekly, Sunday 02:00 UTC
  workflow_dispatch:  # Manual trigger

jobs:
  mutmut:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: |
          # Commands below follow mutmut 2.x; also install your project's own dependencies here
          pip install "mutmut<3" pytest

      - name: Run mutation testing
        run: mutmut run
        continue-on-error: true  # mutmut exits non-zero when mutants survive

      - name: Generate report
        run: |
          mutmut results
          mutmut html  # Writes an HTML report to html/

      - name: Upload report
        uses: actions/upload-artifact@v4
        with:
          name: mutation-report
          path: html/
```

**Why weekly, not every PR:**

- Mutation testing is slow (often 10-100x slower than the regular suite), because each mutant needs the relevant tests rerun
- A full run executes every generated mutant
- Not needed for every change

---

## Anti-Patterns Catalog

### ❌ Chasing 100% Mutation Score

**Symptom:** Writing tests just to kill surviving mutants

**Why bad:**

- Some mutants are equivalent (they don't change behavior), so no test can kill them
- Diminishing returns after roughly 85%
- Time is better spent on integration tests

**Fix:** Target 80-85%, focusing on business logic

---

### ❌ Ignoring Equivalent Mutants

**Symptom:** "95% mutation score, but there are still survived mutants"

**Equivalent mutants:** Changes that don't affect observable behavior

```python
def is_positive(x):
    return x > 0

# Mutation: x > 0 → x >= 0
# Only x == 0 distinguishes these; if callers can never pass exactly 0,
# no test through the public API can kill this mutant, so treat it as equivalent
```

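Some mutants are equivalent for every possible input, not just for the inputs callers happen to use. A classic case, sketched with a hypothetical `absolute` function on integers: flipping `<` to `<=` only changes the branch taken at `x == 0`, and negating 0 gives 0, so the outputs always match.

```python
def absolute(x):
    if x < 0:
        x = -x
    return x


def absolute_mutant(x):
    # Relational mutant: `<` became `<=`
    if x <= 0:
        x = -x
    return x


# Sanity check over a range of integers: no input tells the two apart.
assert all(absolute(x) == absolute_mutant(x) for x in range(-1000, 1001))
```

The range check is only a sanity test, not a proof; the branch argument above is what shows equivalence. Even with floats, `-0.0 == 0.0` is `True`, so equality-based tests still cannot tell the two apart.
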
**Fix:** After confirming a survivor is equivalent, exclude it instead of writing tests to kill it. In mutmut, add a `# pragma: no mutate` comment to the line; Stryker and PITest have their own disable/exclusion mechanisms.

```python
def is_positive(x):
    return x > 0  # pragma: no mutate
```

---

### ❌ Running Mutation Tests on Every Commit

**Symptom:** CI takes 2 hours

**Why bad:** Mutation testing is often 10-100x slower than the regular test suite

**Fix:**

- Run weekly or nightly
- Run on core modules only (not the entire codebase)
- Use it as a quality metric, not a merge blocker

---

## Incremental Mutation Testing

**Test only changed code:**

```bash
# mutmut 2.x: mutate only Python files under src/ that differ from main (deleted files excluded)
CHANGED=$(git diff --name-only --diff-filter=d main -- 'src/*.py' | paste -sd, -)
if [ -n "$CHANGED" ]; then mutmut run --paths-to-mutate "$CHANGED"; fi
```

**Benefits:**

- Faster feedback (minutes instead of hours)
- Can run on PRs
- Focuses on new code

---

## Bottom Line

**Mutation testing measures whether your tests actually detect bugs. High code coverage doesn't mean good tests.**

**Usage:**

- Run weekly/nightly, not on every commit (too slow)
- Target 80-85% mutation score for business logic
- Use mutmut (Python), Stryker (JS/TS), PITest (Java), Stryker.NET (C#)
- Investigate survived mutants: most are bugs your tests would miss
- Exclude equivalent mutants instead of writing tests for them

**If your tests have 95% coverage but a 40% mutation score, most of that covered code isn't really being verified. Fix the tests, not the coverage metric.**