---
name: test-isolation-fundamentals
description: Use when tests fail together but pass alone, diagnosing test pollution, ensuring test independence and idempotence, managing shared state, or designing parallel-safe tests - provides isolation principles, database/file/service patterns, and cleanup strategies
---

# Test Isolation Fundamentals

## Overview

**Core principle:** Each test must work independently, regardless of execution order or parallel execution.

**Rule:** If a test fails when run with other tests but passes alone, you have an isolation problem. Fix it before adding more tests.

## When You Have Isolation Problems

**Symptoms:**
- Tests pass individually: `pytest test_checkout.py` ✓
- Tests fail in full suite: `pytest` ✗
- Errors like "User already exists", "Expected empty but found data"
- Tests fail randomly or only in CI
- Different results when tests run in different orders

**Root cause:** Tests share mutable state without cleanup.

## The Five Principles

### 1. Order-Independence

**Tests must pass regardless of execution order.**

```bash
# All of these must produce the same results
pytest tests/                 # default collection order
pytest tests/ --random-order  # random order (requires the pytest-random-order plugin)
pytest tests/ --reverse       # reverse order (requires the pytest-reverse plugin)
```

**Anti-pattern:**
```python
# ❌ BAD: Test B depends on Test A running first
def test_create_user():
    db.users.insert({"id": 1, "name": "Alice"})

def test_update_user():
    db.users.update({"id": 1}, {"name": "Bob"})  # Assumes Alice exists!
```

**Fix:** Each test creates its own data.
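
A minimal sketch of that fix, assuming a hypothetical in-memory `UserStore` that stands in for `db.users` (it is not part of any real library). Each test builds the store and the row it needs, so either test can run first.

```python
class UserStore:
    """Hypothetical in-memory stand-in for db.users (illustration only)."""

    def __init__(self):
        self._rows = {}

    def insert(self, row):
        self._rows[row["id"]] = dict(row)

    def update(self, user_id, changes):
        self._rows[user_id].update(changes)  # KeyError if the row is missing

    def get(self, user_id):
        return self._rows[user_id]


def test_create_user():
    users = UserStore()  # fresh store, nothing inherited from other tests
    users.insert({"id": 1, "name": "Alice"})
    assert users.get(1)["name"] == "Alice"


def test_update_user():
    users = UserStore()
    users.insert({"id": 1, "name": "Alice"})  # arrange its own precondition
    users.update(1, {"name": "Bob"})
    assert users.get(1)["name"] == "Bob"
```

Because `test_update_user` no longer relies on `test_create_user`, the two can run in any order or on different workers.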

---

### 2. Idempotence

**Running a test twice produces the same result both times.**

```bash
# Both runs must pass
pytest test_checkout.py  # First run
pytest test_checkout.py  # Second run (same result)
```

**Anti-pattern:**
```python
# ❌ BAD: Second run fails on unique constraint
def test_signup():
    user = create_user(email="test@example.com")
    assert user.id is not None
    # No cleanup - second run fails: "email already exists"
```

**Fix:** Clean up data after test OR use unique data per run.
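
The "unique data per run" option might look like the sketch below. `UserRegistry` and `unique_email` are hypothetical names introduced for illustration; the module-level `registry` simulates a database that survives between runs.

```python
import uuid


class DuplicateEmail(Exception):
    pass


class UserRegistry:
    """Hypothetical store that enforces a unique-email constraint."""

    def __init__(self):
        self._emails = set()

    def create_user(self, email):
        if email in self._emails:
            raise DuplicateEmail(email)
        self._emails.add(email)
        return {"id": len(self._emails), "email": email}


def unique_email(prefix="test"):
    # uuid4 makes a collision between runs practically impossible
    return f"{prefix}-{uuid.uuid4().hex}@example.com"


# Simulates a database that keeps its rows between test runs
registry = UserRegistry()


def test_signup():
    user = registry.create_user(email=unique_email())
    assert user["id"] is not None
```

Running `test_signup` repeatedly against the same `registry` keeps passing, because no run reuses an email address.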

---

### 3. Fresh State

**Each test starts with a clean slate.**

**What needs to be fresh:**
- Database records
- Files and directories
- In-memory caches
- Global variables
- Module-level state
- Environment variables
- Network sockets/ports
- Background processes

**Anti-pattern:**
```python
# ❌ BAD: Shared mutable global state
cache = {}  # Module-level global

def test_cache_miss():
    assert get_from_cache("key1") is None  # Passes first time
    cache["key1"] = "value"  # Pollutes global state

def test_cache_lookup():
    assert get_from_cache("key1") is None  # Fails if previous test ran!
```
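
One possible repair for the example above, sketched with the standard library's `unittest.mock.patch.dict`. The `get_from_cache` helper is an assumed implementation, since the original does not show it. `patch.dict(..., clear=True)` empties the dictionary for the duration of the block and restores its previous contents afterwards.

```python
from unittest.mock import patch

cache = {}  # module-level global, as in the anti-pattern


def get_from_cache(key):
    """Assumed implementation of the helper used above."""
    return cache.get(key)


def test_cache_miss():
    with patch.dict(cache, clear=True):  # start empty, restore on exit
        assert get_from_cache("key1") is None
        cache["key1"] = "value"  # visible only inside this block


def test_cache_lookup():
    with patch.dict(cache, clear=True):
        assert get_from_cache("key1") is None  # passes in any order
```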

---

### 4. Explicit Scope

**Know what state is shared vs isolated.**

**Test scopes (pytest):**
- `scope="function"` - Fresh per test (default, safest)
- `scope="class"` - Shared across test class
- `scope="module"` - Shared across file
- `scope="session"` - Shared across entire test run

**Rule:** Default to `scope="function"`. Only use broader scopes for expensive resources that are READ-ONLY.

```python
# ✅ GOOD: Expensive read-only data can be shared
@pytest.fixture(scope="session")
def large_config_file():
    return load_config("data.json")  # Expensive, never modified

# ❌ BAD: Mutable data shared across tests
@pytest.fixture(scope="session")
def database():
    return Database()  # Tests will pollute each other!

# ✅ GOOD: Mutable data fresh per test
@pytest.fixture(scope="function")
def database():
    db = Database()
    yield db
    db.cleanup()  # Fresh per test
```

---

### 5. Parallel Safety

**Tests must work when run concurrently.**

```bash
pytest -n 4  # Run tests across 4 worker processes (requires pytest-xdist)
```

**Parallel-unsafe patterns:**
- Shared files without unique names
- Fixed network ports
- Singleton databases
- Global module state
- Fixed temp directories

**Fix:** Use unique identifiers per test (UUIDs, process IDs, random ports).
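
A small sketch of a name generator for shared resources (queues, buckets, files). `unique_name` is a hypothetical helper; the only external fact it relies on is that pytest-xdist exposes the worker ID in the `PYTEST_XDIST_WORKER` environment variable.

```python
import os
import uuid


def unique_name(prefix):
    """Build a resource name that is unique per worker, process, and call."""
    # pytest-xdist sets PYTEST_XDIST_WORKER (e.g. "gw0") in each worker process;
    # fall back to "main" when the suite runs without xdist.
    worker = os.environ.get("PYTEST_XDIST_WORKER", "main")
    return f"{prefix}_{worker}_{os.getpid()}_{uuid.uuid4().hex[:8]}"


def test_export_uses_unique_file(tmp_path):
    target = tmp_path / f"{unique_name('export')}.csv"
    target.write_text("id,name\n1,Alice\n")
    assert target.read_text().startswith("id,name")
```

Including the worker ID and PID in the name also makes leftovers easier to trace back to the run that created them.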

---

## Isolation Patterns by Resource Type

### Database Isolation

**Pattern 1: Transactions with Rollback (Fastest, Recommended)**

```python
import pytest
from sqlalchemy.orm import Session

@pytest.fixture
def db_session(db_engine):
    """Each test gets a fresh DB session whose changes are rolled back.

    Assumes a `db_engine` fixture (defined elsewhere) that returns a
    SQLAlchemy Engine.
    """
    connection = db_engine.connect()
    transaction = connection.begin()  # outer transaction wraps the whole test
    session = Session(bind=connection)

    yield session

    session.close()
    transaction.rollback()  # Undo all changes
    connection.close()

**Why it works:**
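- No per-test cleanup code: the fixture's rollback discards every change
- Fast: a rollback is typically far cheaper than deleting rows or recreating the database
- Works with any database that supports transactions (PostgreSQL, MySQL with InnoDB, SQLite, Oracle)
- Handles FK relationships automatically, because nothing is ever committed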

**When NOT to use:**
- Testing actual commits
- Testing transaction isolation levels
- Multi-database transactions

---

**Pattern 2: Unique Data Per Test**

```python
import uuid
import pytest

@pytest.fixture
def unique_user():
    """Each test gets a unique user."""
    email = f"test-{uuid.uuid4()}@example.com"
    user = create_user(email=email, name="Test User")

    yield user

    # Optional cleanup (or rely on test DB being dropped)
    delete_user(user.id)
```

**Why it works:**
- Tests don't interfere (different users)
- Can run in parallel
- Idempotent (UUID ensures uniqueness)

**When to use:**
- Testing with real databases
- Parallel test execution
- Integration tests that need real commits

---

**Pattern 3: Test Database Per Test**

```python
import uuid
import pytest

@pytest.fixture
def isolated_db():
    """Each test gets its own temporary database."""
    db_name = f"test_db_{uuid.uuid4().hex}"
    create_database(db_name)

    yield get_connection(db_name)

    drop_database(db_name)
```

**Why it works:**
- Complete isolation
- Can test schema migrations
- No cross-test pollution

**When NOT to use:**
- Unit tests (too slow)
- Large test suites (overhead adds up)
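
A runnable variant of this pattern, sketched with SQLite files instead of the `create_database`/`drop_database` helpers above (which are project-specific). `isolated_sqlite_db` is a hypothetical name; in a pytest suite it would be wrapped in a function-scoped fixture.

```python
import sqlite3
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def isolated_sqlite_db():
    """Create a throwaway SQLite database file and delete it afterwards."""
    with tempfile.TemporaryDirectory(prefix="test_db_") as tmpdir:
        db_path = Path(tmpdir) / "test.sqlite3"
        conn = sqlite3.connect(db_path)
        try:
            conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
            yield conn
        finally:
            conn.close()  # close before the directory is removed


def test_insert_is_invisible_to_other_tests():
    with isolated_sqlite_db() as conn:
        conn.execute("INSERT INTO users (name) VALUES ('Alice')")
        conn.commit()  # a real commit, still isolated
        assert conn.execute("SELECT COUNT(*) FROM users").fetchone()[0] == 1


def test_starts_empty():
    with isolated_sqlite_db() as conn:
        assert conn.execute("SELECT COUNT(*) FROM users").fetchone()[0] == 0
```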

---

### File System Isolation

**Pattern: Temporary Directories**

```python
import pytest
import tempfile
import shutil

@pytest.fixture
def temp_workspace():
    """Each test gets a fresh temporary directory."""
    tmpdir = tempfile.mkdtemp(prefix="test_")

    yield tmpdir

    shutil.rmtree(tmpdir)  # Clean up
```

**Simpler version using pytest's built-in `tmp_path`:**

```python
@pytest.fixture
def temp_workspace(tmp_path):
    """pytest's tmp_path is automatically unique per test."""
    workspace = tmp_path / "workspace"
    workspace.mkdir()

    yield workspace

    # No manual cleanup needed - pytest prunes old temp directories on later runs
```

**Why it works:**
- Each test writes to different directory
- Parallel-safe (unique paths)
- Automatic cleanup
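
A short usage sketch: a test that receives pytest's `tmp_path` directly. `write_report` is a hypothetical function under test.

```python
from pathlib import Path


def write_report(directory: Path, lines: list[str]) -> Path:
    """Hypothetical function under test: writes report.txt into directory."""
    report = directory / "report.txt"
    report.write_text("\n".join(lines) + "\n")
    return report


def test_write_report(tmp_path):
    # tmp_path is a pathlib.Path to a directory created for this test only
    report = write_report(tmp_path, ["total=3", "failed=0"])

    assert report.read_text() == "total=3\nfailed=0\n"
    assert [p.name for p in tmp_path.iterdir()] == ["report.txt"]
```

The second assertion only holds because the directory starts empty, which is exactly what per-test temp directories guarantee.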

---

### Service/API Isolation

**Pattern: Mocking External Services**

```python
import pytest
from unittest.mock import patch, MagicMock

@pytest.fixture
def mock_stripe():
    """Mock Stripe's charge API for tests that request this fixture."""
    with patch('stripe.Charge.create') as mock:
        mock.return_value = MagicMock(id="ch_test123", status="succeeded")
        yield mock
```

**When to use:**
- External APIs (Stripe, Twilio, SendGrid)
- Slow services
- Non-deterministic responses
- Services that cost money per call

**When NOT to use:**
- Testing integration with real service (use separate integration test suite)
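
The sketch below shows the same idea without needing the Stripe SDK installed. `PaymentGateway` and `checkout` are hypothetical stand-ins, not a real API. It also illustrates why patching is isolation-friendly: the patch is undone when the `with` block exits.

```python
from unittest.mock import patch


class PaymentGateway:
    """Hypothetical stand-in for a third-party client (not a real SDK)."""

    @staticmethod
    def charge(amount_cents, currency):
        raise RuntimeError("real network call attempted in a test")


def checkout(amount_cents):
    """Code under test: charges the card and reports success."""
    result = PaymentGateway.charge(amount_cents, currency="usd")
    return result["status"] == "succeeded"


def test_checkout_charges_once():
    with patch.object(PaymentGateway, "charge") as fake_charge:
        fake_charge.return_value = {"id": "ch_test123", "status": "succeeded"}

        assert checkout(1999) is True
        fake_charge.assert_called_once_with(1999, currency="usd")
    # Outside the with-block the original method is back in place
```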

---

### In-Memory Cache Isolation

**Pattern: Clear Cache Before Each Test**

```python
import pytest

@pytest.fixture(autouse=True)
def clear_cache():
    """Automatically clear the application's `cache` object (imported elsewhere) around each test."""
    cache.clear()
    yield
    # Optional: clear after test too
    cache.clear()
```

**Why `autouse=True`:** Runs automatically for every test without explicit declaration.
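
Memoized functions are a common hidden cache. The sketch below assumes a hypothetical `load_setting` function decorated with `functools.lru_cache`; its `cache_clear()` method is what a reset fixture would call.

```python
from functools import lru_cache

lookups = []  # records calls that missed the cache


@lru_cache(maxsize=None)
def load_setting(name):
    """Hypothetical expensive lookup, memoized at module level."""
    lookups.append(name)
    return f"value-for-{name}"


def reset_caches():
    """Call this from an autouse fixture before each test."""
    load_setting.cache_clear()
    lookups.clear()


def test_first_lookup_hits_backend():
    reset_caches()
    load_setting("timeout")
    assert lookups == ["timeout"]


def test_repeat_lookup_is_cached():
    reset_caches()
    load_setting("timeout")
    load_setting("timeout")
    assert lookups == ["timeout"]
```

Without `reset_caches()`, `test_first_lookup_hits_backend` would fail whenever another test had already warmed the cache.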

---

### Process/Port Isolation

**Pattern: Dynamic Port Allocation**

```python
import socket
import pytest

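def get_free_port():
    """Ask the OS for a currently unused port.

    Note: the port is released before the server binds it, so another
    process could grab it in between. Where possible, let the server
    bind to port 0 itself.
    """
    with socket.socket() as sock:
        sock.bind(('', 0))  # port 0 = let the OS choose
        return sock.getsockname()[1]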

@pytest.fixture
def test_server():
    """Each test gets a server on a unique port."""
    port = get_free_port()
    server = start_server(port=port)

    yield f"http://localhost:{port}"

    server.stop()
```

**Why it works:**
- Tests can run in parallel (different ports)
- No port conflicts
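
A self-contained sketch using only the standard library: the server binds to port 0 itself, so the OS assigns the port and there is no window in which another process can take it. `PingHandler` and `running_server` are hypothetical names for this illustration.

```python
import threading
from contextlib import contextmanager
from http.server import BaseHTTPRequestHandler, HTTPServer


class PingHandler(BaseHTTPRequestHandler):
    """Tiny handler that answers every GET with 'pong'."""

    def do_GET(self):
        body = b"pong"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep test output quiet


@contextmanager
def running_server():
    """Start a server on an OS-assigned port and stop it afterwards."""
    server = HTTPServer(("127.0.0.1", 0), PingHandler)  # port 0 = OS picks
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        host, port = server.server_address
        yield f"http://{host}:{port}"
    finally:
        server.shutdown()
        server.server_close()
        thread.join()
```

In a pytest suite, the body of `running_server` could live in a fixture that yields the URL, so each test or worker gets its own port.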

---

## Test Doubles: When to Use What

| Type | Purpose | Example |
|------|---------|---------|
| **Stub** | Returns hardcoded values | `getUser() → {id: 1, name: "Alice"}` |
| **Mock** | Verifies calls were made | `assert emailService.send.called` |
| **Fake** | Working implementation, simplified | In-memory database instead of PostgreSQL |
| **Spy** | Records calls for later inspection | Logs all method calls |

**Decision tree:**

```
Do you need to verify the call was made?
  YES → Use Mock
  NO → Do you need a working implementation?
    YES → Use Fake
    NO → Use Stub
```
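
The four doubles from the table can be sketched with `unittest.mock` as follows. `RealMailer` and `InMemoryUserRepo` are hypothetical classes used only for illustration.

```python
from unittest.mock import Mock


class RealMailer:
    """Hypothetical collaborator used only to illustrate the four doubles."""

    def send(self, to, body):
        return f"sent to {to}"


# Stub: returns a canned answer, no verification
user_lookup = Mock(return_value={"id": 1, "name": "Alice"})
assert user_lookup(1)["name"] == "Alice"

# Mock: the test asserts on how it was called
mailer = Mock(spec=RealMailer)
mailer.send("alice@example.com", "Welcome")
mailer.send.assert_called_once_with("alice@example.com", "Welcome")


# Fake: a simplified but working implementation
class InMemoryUserRepo:
    def __init__(self):
        self._users = {}

    def save(self, user):
        self._users[user["id"]] = user

    def get(self, user_id):
        return self._users.get(user_id)


repo = InMemoryUserRepo()
repo.save({"id": 1, "name": "Alice"})
assert repo.get(1)["name"] == "Alice"

# Spy: wraps the real object, so behaviour is kept and calls are recorded
spy = Mock(wraps=RealMailer())
assert spy.send("bob@example.com", "Hi") == "sent to bob@example.com"
assert spy.send.call_count == 1
```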

---

## Diagnosing Isolation Problems

### Step 1: Identify Flaky Tests

```bash
# Run the tests 100 times to surface flakiness (requires the pytest-repeat plugin)
pytest --count=100 test_checkout.py

# Run in random order (requires the pytest-random-order plugin)
pytest --random-order
```

**Interpretation:**
- Passes 100/100 → Probably not flaky under these conditions
- Passes 95/100 → Flaky (5% failure rate)
- Failures change with test order or worker count → Order-dependent OR parallel-unsafe

---

### Step 2: Find Which Tests Interfere

**Run tests in isolation:**

```bash
# Test A alone
pytest test_a.py  # ✓ Passes

# Test B alone
pytest test_b.py  # ✓ Passes

# Both together
pytest test_a.py test_b.py  # ✗ Test B fails

# Conclusion: Test A pollutes state that Test B depends on
```

**Reverse the order:**

```bash
pytest test_b.py test_a.py  # Does Test A fail now?
```

- If YES: Bidirectional pollution
- If NO: Test A pollutes, Test B is victim
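
With many candidate files, the pairwise check above can be automated by bisection. The sketch below is one possible approach, not an established tool: `find_polluter` and `victim_fails_after` are hypothetical helpers, and the search assumes a single polluter that causes the failure on its own.

```python
import subprocess
import sys


def victim_fails_after(subset, victim):
    """Run `subset` followed by `victim` in one pytest process; True if it fails."""
    cmd = [sys.executable, "-m", "pytest", "-q", *subset, victim]
    return subprocess.run(cmd, capture_output=True).returncode != 0


def find_polluter(candidates, fails_after):
    """Bisect `candidates` down to the one test file that makes the victim fail.

    `fails_after(subset)` must return True when the victim fails after `subset`.
    Assumes one polluter acting alone; returns None if nothing reproduces.
    """
    suspects = list(candidates)
    if not fails_after(suspects):
        return None
    while len(suspects) > 1:
        first_half = suspects[: len(suspects) // 2]
        suspects = first_half if fails_after(first_half) else suspects[len(first_half):]
    return suspects[0]
```

A call might look like `find_polluter(files, lambda subset: victim_fails_after(subset, "test_b.py"))`, which needs roughly log2(n) pytest runs instead of n.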

---

### Step 3: Identify Shared State

**Add diagnostic logging:**

```python
@pytest.fixture(autouse=True)
def log_state():
    """Log state before/after each test (run pytest with -s to see the output)."""
    print(f"Before: DB has {db.count()} records")
    yield
    print(f"After: DB has {db.count()} records")
```

**Look for:**
- Record count increasing over time (no cleanup)
- Files accumulating
- Cache growing
- Ports in use
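
Instead of reading logs by eye, a check can fail as soon as state changes across a test. This is a sketch under assumptions: `assert_no_leak` is a hypothetical helper, and `snapshot` is any callable returning a comparable copy of the state being watched (a dict copy, a row count, a directory listing).

```python
from contextlib import contextmanager


@contextmanager
def assert_no_leak(snapshot, label="state"):
    """Fail if `snapshot()` returns something different after the block."""
    before = snapshot()
    yield
    after = snapshot()
    assert before == after, f"{label} leaked: {before!r} -> {after!r}"


cache = {}  # hypothetical module-level state under suspicion


def polluting_test():
    cache["key1"] = "value"  # never cleaned up


def clean_test():
    cache["tmp"] = 1
    del cache["tmp"]
```

Wrapping each test body, for example `with assert_no_leak(lambda: dict(cache), "cache"): polluting_test()`, raises an `AssertionError` that names the polluter directly. In pytest the same check could sit in an autouse fixture around `yield`.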

---

### Step 4: Audit for Global State

**Search codebase for isolation violations:**

```bash
# Module-level globals (UPPER_CASE names assigned at the start of a line)
grep -rnE "^[A-Z_][A-Z0-9_]* = " app/

# Global caches
grep -r "cache = " app/

# Singletons
grep -r "@singleton" app/
grep -r "class.*Singleton" app/
```
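
`grep` cannot tell a constant from a mutable container. As a complementary sketch, the standard library's `ast` module can list module-level names bound to dicts, lists, or sets. `find_mutable_globals` is a hypothetical helper, and its set of recognised constructors is an assumption that may need extending.

```python
import ast

MUTABLE_NODES = (ast.Dict, ast.List, ast.Set, ast.DictComp, ast.ListComp, ast.SetComp)
MUTABLE_CALLS = {"dict", "list", "set", "defaultdict", "deque"}


def find_mutable_globals(source):
    """Return (line, name) pairs for module-level names bound to mutable containers."""
    findings = []
    for node in ast.parse(source).body:  # top-level statements only
        if isinstance(node, ast.Assign):
            targets, value = node.targets, node.value
        elif isinstance(node, ast.AnnAssign) and node.value is not None:
            targets, value = [node.target], node.value
        else:
            continue
        is_call = (
            isinstance(value, ast.Call)
            and isinstance(value.func, ast.Name)
            and value.func.id in MUTABLE_CALLS
        )
        if isinstance(value, MUTABLE_NODES) or is_call:
            for target in targets:
                if isinstance(target, ast.Name):
                    findings.append((node.lineno, target.id))
    return findings
```

Feeding it the text of each file under `app/` gives a list of candidates to review; not every hit is a problem, but each one is state that outlives a single test.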

---

## Anti-Patterns Catalog

### ❌ Cleanup Code Instead of Structural Isolation

**Symptom:** Every test has teardown code to clean up

```python
def test_checkout():
    user = create_user()
    cart = create_cart(user)

    checkout(cart)

    # Teardown
    delete_cart(cart.id)
    delete_user(user.id)
```

**Why bad:**
- If test fails before cleanup, state pollutes
- If cleanup has bugs, state pollutes
- Forces sequential execution (no parallelism)

**Fix:** Use transactions, unique IDs, or dependency injection
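
The dependency-injection option could look like this sketch, in which the code under test receives its storage as an argument. `InMemoryOrders` and this version of `checkout` are hypothetical and deliberately simplified.

```python
class InMemoryOrders:
    """Hypothetical fake repository; each test constructs its own."""

    def __init__(self):
        self.saved = []

    def save(self, order):
        self.saved.append(order)


def checkout(cart, orders):
    """Code under test receives its storage instead of reaching for a global."""
    order = {"items": list(cart), "total": sum(price for _, price in cart)}
    orders.save(order)
    return order


def test_checkout_saves_order():
    orders = InMemoryOrders()  # private to this test, discarded afterwards
    order = checkout([("book", 12), ("pen", 3)], orders)

    assert order["total"] == 15
    assert orders.saved == [order]
```

There is nothing to tear down: the repository goes out of scope with the test, even if an assertion fails halfway through.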

---

### ❌ Shared Test Fixtures

**Symptom:** Fixtures modify mutable state

```python
@pytest.fixture(scope="module")
def user():
    return create_user(email="test@example.com")

def test_update_name(user):
    user.name = "Alice"  # Modifies shared fixture!
    save(user)

def test_update_email(user):
    # Expects the original name, but test_update_name changed it!
    assert user.name == "Test User"  # FAILS
```

**Why bad:** Tests interfere when fixture is modified

**Fix:** Use `scope="function"` for mutable fixtures

---

### ❌ Hidden Dependencies on Execution Order

**Symptom:** Test suite has implicit execution order

```python
# test_a.py
def test_create_admin():
    create_user(email="admin@example.com", role="admin")

# test_b.py
def test_admin_permissions():
    admin = get_user("admin@example.com")  # Assumes test_a ran!
    assert admin.has_permission("delete_users")
```

**Why bad:** Breaks when tests run in different order or in parallel

**Fix:** Each test creates its own dependencies

---

### ❌ Testing on Production-Like State

**Symptom:** Tests run against shared database with existing data

```python
def test_user_count():
    assert db.users.count() == 100  # Assumes specific state!
```

**Why bad:**
- Tests fail when data changes
- Can't run in parallel
- Can't run idempotently

**Fix:** Use isolated test database or count relative to test's own data

---

## Common Scenarios

### Scenario 1: "Tests pass locally, fail in CI"

**Likely causes:**
1. **Timing issues** - CI is slower/faster, race conditions exposed
2. **Parallel execution** - CI runs tests in parallel, local doesn't
3. **Missing cleanup** - Local has leftover state, CI is fresh

**Diagnosis:**
```bash
# Test parallel execution locally
pytest -n 4

# Test with clean state: clear pytest's own cache, and separately reset any
# leftover local databases or temp files the tests touch
pytest --cache-clear
```

---

### Scenario 2: "Random test failures that disappear on retry"

**Likely causes:**
1. **Race conditions** - Async operations not awaited
2. **Shared mutable state** - Global variables polluted
3. **External service flakiness** - Real APIs being called

**Diagnosis:**
```bash
# Run same test 100 times
pytest --count=100 test_flaky.py

# If every run after the first fails, suspect state the test leaves behind
# If failures are intermittent, suspect a race condition or a flaky external service
```
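
For the race-condition case, a frequent culprit is a fixed `sleep` that is long enough on one machine and too short on another. A minimal sketch of the usual alternative, polling with a timeout, follows; `wait_until` is a hypothetical helper and the timer stands in for real background work.

```python
import threading
import time


def wait_until(predicate, timeout=2.0, interval=0.01):
    """Poll `predicate` until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return predicate()  # one last check at the deadline


def test_background_job_finishes():
    results = []
    worker = threading.Timer(0.05, lambda: results.append("done"))
    worker.start()

    # A fixed sleep would pass or fail depending on machine speed; poll instead
    assert wait_until(lambda: results == ["done"])
    worker.join()
```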

---

### Scenario 3: "Database unique constraint violations"

**Symptom:** `IntegrityError: duplicate key value violates unique constraint`

**Cause:** Tests reuse same email/username/ID

**Fix:**
```python
import uuid
import pytest

@pytest.fixture
def unique_user():
    email = f"test-{uuid.uuid4()}@example.com"
    return create_user(email=email)
```

---

## Quick Reference: Isolation Strategy Decision Tree

```
What resource needs isolation?

DATABASE
├─ Can you use transactions? → Transaction Rollback (fastest)
├─ Need real commits? → Unique Data Per Test
└─ Need schema changes? → Test Database Per Test

FILES
├─ Inside pytest? → pytest's tmp_path
└─ Outside pytest fixtures? → tempfile.mkdtemp() with explicit cleanup

EXTERNAL SERVICES
├─ Testing integration? → Separate integration test suite
└─ Testing business logic? → Mock the service

IN-MEMORY STATE
├─ Caches → Clear before each test (autouse fixture)
├─ Globals → Dependency injection (refactor)
└─ Module-level → Reset in fixture or avoid entirely

PROCESSES/PORTS
└─ Dynamic port allocation per test
```

---

## Bottom Line

**Test isolation is structural, not reactive.**

- ❌ **Reactive:** Write cleanup code after each test
- ✅ **Structural:** Design tests so cleanup isn't needed

**The hierarchy:**
1. **Best:** Dependency injection (no shared state)
2. **Good:** Transactions/tmp_path (automatic cleanup)
3. **Acceptable:** Unique data per test (explicit isolation)
4. **Last resort:** Manual cleanup (fragile, error-prone)

**If your tests fail together but pass alone, you have an isolation problem. Stop adding tests and fix isolation first.**