gh-tachyon-beep-skillpacks-…/skills/mutation-testing/SKILL.md

---
name: mutation-testing
description: Use when validating test effectiveness, measuring test quality beyond coverage, choosing mutation testing tools (Stryker, PITest, mutmut), interpreting mutation scores, or improving test suites - provides mutation operators, score interpretation, and integration patterns
---

# Mutation Testing

## Overview

**Core principle:** Mutation testing validates that your tests actually test something by introducing bugs and checking if tests catch them.

**Rule:** 100% code coverage doesn't mean good tests. Mutation score measures if tests detect bugs.

## Code Coverage vs Mutation Score

| Metric | What It Measures | Example |
|--------|------------------|---------|
| **Code Coverage** | Lines executed by tests | `calculate_tax(100)` executes code = 100% coverage |
| **Mutation Score** | Bugs detected by tests | Change `*` to `/` → test still passes = poor tests |

**Problem with coverage:**

```python
def calculate_tax(amount):
    return amount * 0.08

def test_calculate_tax():
    calculate_tax(100)  # 100% coverage, but asserts nothing!
```

**Mutation testing catches this:**
1. Mutates `* 0.08` to `/ 0.08`
2. Runs test
3. Test still passes → **Survived mutation** (bad test!)

---

## How Mutation Testing Works

**Process:**
1. **Create mutant:** Change code slightly (e.g., `+` → `-`, `<` → `<=`)
2. **Run tests:** Do tests fail?
3. **Classify:**
   - **Killed:** Test failed → Good test!
   - **Survived:** Test passed → Test doesn't verify this logic
   - **Timeout:** Test hung → Usually killed
   - **No coverage:** Not executed → Add test

**Mutation Score:**
```
Mutation Score = (Killed Mutants / Total Mutants) × 100
```

**Thresholds:**
- **> 80%:** Excellent test quality
- **60-80%:** Acceptable
- **< 60%:** Tests are weak

---

## Tool Selection

| Language | Tool | Why |
|----------|------|-----|
| **JavaScript/TypeScript** | **Stryker** | Best JS support, framework-agnostic |
| **Java** | **PITest** | Industry standard, Maven/Gradle integration |
| **Python** | **mutmut** | Simple, fast, pytest integration |
| **C#** | **Stryker.NET** | .NET ecosystem integration |

---

## Example: Python with mutmut

### Installation

```bash
pip install mutmut
```

---

### Basic Usage

```bash
# Run mutation testing
mutmut run

# View results
mutmut results

# Show survived mutants (bugs your tests missed)
mutmut show
```

---

### Configuration

```toml
# setup.cfg
[mutmut]
paths_to_mutate=src/
backup=False
runner=python -m pytest -x
tests_dir=tests/
```

---

### Example

```python
# src/calculator.py
def calculate_discount(price, percent):
    if percent > 100:
        raise ValueError("Percent cannot exceed 100")
    return price * (1 - percent / 100)

# tests/test_calculator.py
def test_calculate_discount():
    result = calculate_discount(100, 20)
    assert result == 80
```

**Run mutmut:**
```bash
mutmut run
```

**Possible mutations:**
1. `percent > 100` → `percent >= 100` (boundary)
2. `1 - percent` → `1 + percent` (operator)
3. `percent / 100` → `percent * 100` (operator)
4. `price * (...)` → `price / (...)` (operator)

**Results:**
- Mutation 1 **survived** (test doesn't check boundary)
- Mutation 2, 3, 4 **killed** (test catches these)

**Improvement:**
```python
def test_calculate_discount_boundary():
    # Catch mutation 1
    with pytest.raises(ValueError):
        calculate_discount(100, 101)
```

---

## Common Mutation Operators

| Operator | Original | Mutated | What It Tests |
|----------|----------|---------|---------------|
| **Arithmetic** | `a + b` | `a - b` | Calculation logic |
| **Relational** | `a < b` | `a <= b` | Boundary conditions |
| **Logical** | `a and b` | `a or b` | Boolean logic |
| **Unary** | `+x` | `-x` | Sign handling |
| **Constant** | `return 0` | `return 1` | Magic numbers |
| **Return** | `return x` | `return None` | Return value validation |
| **Statement deletion** | `x = 5` | (deleted) | Side effects |

---

## Interpreting Mutation Score

### High Score (> 80%)

**Good tests that catch most bugs.**

```python
def add(a, b):
    return a + b

def test_add():
    assert add(2, 3) == 5
    assert add(-1, 1) == 0
    assert add(0, 0) == 0

# Mutations killed:
# - a - b (returns -1, test expects 5)
# - a * b (returns 6, test expects 5)
```

---

### Low Score (< 60%)

**Weak tests that don't verify logic.**

```python
def validate_email(email):
    return "@" in email and "." in email

def test_validate_email():
    validate_email("user@example.com")  # No assertion!

# Mutations survived:
# - "@" in email → "@" not in email
# - "and" → "or"
# - (All mutations survive because test asserts nothing)
```

---

### Survived Mutants to Investigate

**Priority order:**
1. **Business logic mutations** (calculations, validations)
2. **Boundary conditions** (`<` → `<=`, `>` → `>=`)
3. **Error handling** (exception raising)

**Low priority:**
4. **Logging statements**
5. **Constants that don't affect behavior**

---

## Integration with CI/CD

### GitHub Actions (Python)

```yaml
# .github/workflows/mutation-testing.yml
name: Mutation Testing

on:
  schedule:
    - cron: '0 2 * * 0'  # Weekly on Sunday 2 AM
  workflow_dispatch:  # Manual trigger

jobs:
  mutmut:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: |
          pip install mutmut pytest

      - name: Run mutation testing
        run: mutmut run

      - name: Generate report
        run: |
          mutmut results
          mutmut html  # Generate HTML report

      - name: Upload report
        uses: actions/upload-artifact@v3
        with:
          name: mutation-report
          path: html/
```

**Why weekly, not every PR:**
- Mutation testing is slow (10-100x slower than regular tests)
- Runs every possible mutation
- Not needed for every change

---

## Anti-Patterns Catalog

### ❌ Chasing 100% Mutation Score

**Symptom:** Writing tests just to kill surviving mutants

**Why bad:**
- Some mutations are equivalent (don't change behavior)
- Diminishing returns after 85%
- Time better spent on integration tests

**Fix:** Target 80-85%, focus on business logic

---

### ❌ Ignoring Equivalent Mutants

**Symptom:** "95% mutation score, still have survived mutants"

**Equivalent mutants:** Changes that don't affect behavior

```python
def is_positive(x):
    return x > 0

# Mutation: x > 0 → x >= 0
# If input is never exactly 0, this mutation is equivalent
```

**Fix:** Mark as equivalent in tool config

```bash
# mutmut - mark mutant as equivalent
mutmut results
# Choose mutant ID
mutmut apply 42 --mark-as-equivalent
```

---

### ❌ Running Mutation Tests on Every Commit

**Symptom:** CI takes 2 hours

**Why bad:** Mutation testing is 10-100x slower than regular tests

**Fix:**
- Run weekly or nightly
- Run on core modules only (not entire codebase)
- Use as quality metric, not blocker

---

## Incremental Mutation Testing

**Test only changed code:**

```bash
# mutmut - test only modified files
git diff --name-only main | grep '\.py$' | mutmut run --paths-to-mutate -
```

**Benefits:**
- Faster feedback (minutes instead of hours)
- Can run on PRs
- Focuses on new code

---

## Bottom Line

**Mutation testing measures if your tests actually detect bugs. High code coverage doesn't mean good tests.**

**Usage:**
- Run weekly/nightly, not on every commit (too slow)
- Target 80-85% mutation score for business logic
- Use mutmut (Python), Stryker (JS), PITest (Java)
- Focus on killed vs survived mutants
- Ignore equivalent mutants

**If your tests have 95% coverage but 40% mutation score, your tests aren't testing anything meaningful. Fix the tests, not the coverage metric.**