| name | description |
|---|---|
| test-isolation-fundamentals | Use when tests fail together but pass alone, diagnosing test pollution, ensuring test independence and idempotence, managing shared state, or designing parallel-safe tests - provides isolation principles, database/file/service patterns, and cleanup strategies |
# Test Isolation Fundamentals

## Overview

**Core principle:** Each test must work independently, regardless of execution order or parallel execution.

**Rule:** If a test fails when run with other tests but passes alone, you have an isolation problem. Fix it before adding more tests.
## When You Have Isolation Problems

**Symptoms:**
- Tests pass individually: `pytest test_checkout.py` ✓
- Tests fail in full suite: `pytest` ✗
- Errors like "User already exists", "Expected empty but found data"
- Tests fail randomly or only in CI
- Different results when tests run in different orders

**Root cause:** Tests share mutable state without cleanup.
## The Five Principles

### 1. Order-Independence

Tests must pass regardless of execution order.

```bash
# All of these must produce identical results
pytest tests/                 # alphabetical order
pytest tests/ --random-order  # random order (pytest-random-order plugin)
pytest tests/ --reverse       # reverse order (pytest-reverse plugin)
```
**Anti-pattern:**

```python
# ❌ BAD: Test B depends on Test A running first
def test_create_user():
    db.users.insert({"id": 1, "name": "Alice"})

def test_update_user():
    db.users.update({"id": 1}, {"name": "Bob"})  # Assumes Alice exists!
```
**Fix:** Each test creates its own data.
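A minimal sketch of the fix, reusing the illustrative `db` handle from the anti-pattern above: each test inserts the row it operates on, so order no longer matters.

```python
# ✅ GOOD: each test creates its own data, so execution order is irrelevant
def test_create_user():
    db.users.insert({"id": 1, "name": "Alice"})
    assert db.users.find_one({"id": 1}) is not None

def test_update_user():
    db.users.insert({"id": 2, "name": "Alice"})  # own setup, own id
    db.users.update({"id": 2}, {"name": "Bob"})
    assert db.users.find_one({"id": 2})["name"] == "Bob"
```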
### 2. Idempotence

Running a test twice produces the same result both times.

```bash
# Both runs must pass
pytest test_checkout.py  # First run
pytest test_checkout.py  # Second run (same result)
```
**Anti-pattern:**

```python
# ❌ BAD: Second run fails on unique constraint
def test_signup():
    user = create_user(email="test@example.com")
    assert user.id is not None
    # No cleanup - second run fails: "email already exists"
```
**Fix:** Clean up data after the test, or use unique data per run.
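A sketch of the unique-data option, assuming the same `create_user` helper as above: deriving a fresh email per run means reruns never collide.

```python
import uuid

# ✅ GOOD: unique email per run - the second run passes too
def test_signup():
    user = create_user(email=f"test-{uuid.uuid4()}@example.com")
    assert user.id is not None
```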
### 3. Fresh State

Each test starts with a clean slate.

**What needs to be fresh:**
- Database records
- Files and directories
- In-memory caches
- Global variables
- Module-level state
- Environment variables
- Network sockets/ports
- Background processes
**Anti-pattern:**

```python
# ❌ BAD: Shared mutable global state
cache = {}  # Module-level global

def get_from_cache(key):
    return cache.get(key)

def test_cache_miss():
    assert get_from_cache("key1") is None  # Passes first time
    cache["key1"] = "value"                # Pollutes global state

def test_cache_lookup():
    assert get_from_cache("key1") is None  # Fails if previous test ran!
```
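A sketch of the structural fix: inject the cache as a parameter instead of reading a module-level global, so each test builds its own.

```python
# ✅ GOOD: the cache is a parameter, not a global
def get_from_cache(cache, key):
    return cache.get(key)

def test_cache_miss():
    cache = {}  # fresh, local to this test
    assert get_from_cache(cache, "key1") is None

def test_cache_lookup():
    cache = {"key1": "value"}  # this test sets up its own state
    assert get_from_cache(cache, "key1") == "value"
```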
### 4. Explicit Scope

Know what state is shared vs isolated.

**Test scopes (pytest):**
- `scope="function"` - Fresh per test (default, safest)
- `scope="class"` - Shared across test class
- `scope="module"` - Shared across file
- `scope="session"` - Shared across entire test run

**Rule:** Default to `scope="function"`. Only use broader scopes for expensive resources that are READ-ONLY.
```python
# ✅ GOOD: Expensive read-only data can be shared
@pytest.fixture(scope="session")
def large_config_file():
    return load_config("data.json")  # Expensive, never modified

# ❌ BAD: Mutable data shared across tests
@pytest.fixture(scope="session")
def database():
    return Database()  # Tests will pollute each other!

# ✅ GOOD: Mutable data fresh per test
@pytest.fixture(scope="function")
def database():
    db = Database()
    yield db
    db.cleanup()  # Fresh per test
```
### 5. Parallel Safety

Tests must work when run concurrently.

```bash
pytest -n 4  # Run 4 tests in parallel with pytest-xdist
```

**Parallel-unsafe patterns:**
- Shared files without unique names
- Fixed network ports
- Singleton databases
- Global module state
- Fixed temp directories
**Fix:** Use unique identifiers per test (UUIDs, process IDs, random ports).
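A sketch of the unique-identifier approach for files, assuming pytest: combining the process ID with a UUID keeps parallel workers out of each other's way, even for paths outside `tmp_path`.

```python
import os
import uuid
import pytest

@pytest.fixture
def unique_scratch_path(tmp_path):
    """A path no parallel worker can collide on."""
    # tmp_path is already unique per test; the pid + uuid suffix makes
    # the guarantee explicit and carries over to shared locations too
    return tmp_path / f"scratch-{os.getpid()}-{uuid.uuid4().hex}.dat"
```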
## Isolation Patterns by Resource Type

### Database Isolation

#### Pattern 1: Transactions with Rollback (Fastest, Recommended)
```python
import pytest
from sqlalchemy.orm import Session

@pytest.fixture
def db_session(db_engine):  # db_engine: an engine fixture defined elsewhere
    """Each test gets a fresh DB session that auto-rollbacks."""
    connection = db_engine.connect()
    transaction = connection.begin()
    session = Session(bind=connection)
    yield session
    session.close()
    transaction.rollback()  # Undo all changes
    connection.close()
```
**Why it works:**
- No cleanup code needed - rollback is automatic
- Fast (<1ms per test)
- Works with ANY database (PostgreSQL, MySQL, SQLite, Oracle)
- Handles FK relationships automatically
**When NOT to use:**
- Testing actual commits
- Testing transaction isolation levels
- Multi-database transactions
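A usage sketch for the fixture above; `User` is a hypothetical SQLAlchemy mapped model, not part of the pattern itself:

```python
# usage sketch - User is a hypothetical mapped model
def test_create_user(db_session):
    db_session.add(User(name="Alice"))
    db_session.flush()  # visible inside the open transaction
    assert db_session.query(User).filter_by(name="Alice").count() == 1
    # no cleanup: the fixture's rollback discards the row afterwards
```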
#### Pattern 2: Unique Data Per Test
```python
import uuid
import pytest

@pytest.fixture
def unique_user():
    """Each test gets a unique user."""
    email = f"test-{uuid.uuid4()}@example.com"
    user = create_user(email=email, name="Test User")
    yield user
    # Optional cleanup (or rely on test DB being dropped)
    delete_user(user.id)
```
**Why it works:**
- Tests don't interfere (different users)
- Can run in parallel
- Idempotent (UUID ensures uniqueness)
**When to use:**
- Testing with real databases
- Parallel test execution
- Integration tests that need real commits
#### Pattern 3: Test Database Per Test
```python
import uuid
import pytest

@pytest.fixture
def isolated_db():
    """Each test gets its own temporary database."""
    db_name = f"test_db_{uuid.uuid4().hex}"
    create_database(db_name)
    yield get_connection(db_name)
    drop_database(db_name)
```
**Why it works:**
- Complete isolation
- Can test schema migrations
- No cross-test pollution
**When NOT to use:**
- Unit tests (too slow)
- Large test suites (overhead adds up)
### File System Isolation

#### Pattern: Temporary Directories
```python
import pytest
import tempfile
import shutil

@pytest.fixture
def temp_workspace():
    """Each test gets a fresh temporary directory."""
    tmpdir = tempfile.mkdtemp(prefix="test_")
    yield tmpdir
    shutil.rmtree(tmpdir)  # Clean up
```
**Parallel-safe version:**
```python
import pytest

@pytest.fixture
def temp_workspace(tmp_path):
    """pytest's tmp_path is automatically unique per test."""
    workspace = tmp_path / "workspace"
    workspace.mkdir()
    yield workspace
    # No cleanup needed - pytest handles it
```
**Why it works:**
- Each test writes to different directory
- Parallel-safe (unique paths)
- Automatic cleanup
### Service/API Isolation

#### Pattern: Mocking External Services
```python
import pytest
from unittest.mock import patch, MagicMock

@pytest.fixture
def mock_stripe():
    """Mock Stripe API for all tests."""
    with patch('stripe.Charge.create') as mock:
        mock.return_value = MagicMock(id="ch_test123", status="succeeded")
        yield mock
```
**When to use:**
- External APIs (Stripe, Twilio, SendGrid)
- Slow services
- Non-deterministic responses
- Services that cost money per call
**When NOT to use:**
- Testing integration with real service (use separate integration test suite)
### In-Memory Cache Isolation

#### Pattern: Clear Cache Before Each Test
```python
import pytest

@pytest.fixture(autouse=True)
def clear_cache():
    """Automatically clear cache before each test."""
    cache.clear()
    yield
    # Optional: clear after test too
    cache.clear()
```
**Why `autouse=True`:** Runs automatically for every test without explicit declaration.
### Process/Port Isolation

#### Pattern: Dynamic Port Allocation
```python
import socket
import pytest

def get_free_port():
    """Find an available port."""
    sock = socket.socket()
    sock.bind(('', 0))
    port = sock.getsockname()[1]
    sock.close()
    return port

@pytest.fixture
def test_server():
    """Each test gets a server on a unique port."""
    port = get_free_port()
    server = start_server(port=port)
    yield f"http://localhost:{port}"
    server.stop()
```
**Why it works:**
- Tests can run in parallel (different ports)
- No port conflicts
## Test Doubles: When to Use What
| Type | Purpose | Example |
|---|---|---|
| Stub | Returns hardcoded values | `getUser()` → `{id: 1, name: "Alice"}` |
| Mock | Verifies calls were made | `assert emailService.send.called` |
| Fake | Working implementation, simplified | In-memory database instead of PostgreSQL |
| Spy | Records calls for later inspection | Logs all method calls |
**Decision tree:**

```
Do you need to verify the call was made?
├─ YES → Use Mock
└─ NO → Do you need a working implementation?
    ├─ YES → Use Fake
    └─ NO → Use Stub
```
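The three branches of the tree in code, a minimal sketch using `unittest.mock`; `notify_user` and `FakeUserStore` are hypothetical illustrations, not library APIs:

```python
from unittest.mock import MagicMock

# Stub: hardcoded return value, no verification
get_user = MagicMock(return_value={"id": 1, "name": "Alice"})
assert get_user(1)["name"] == "Alice"

# Mock: verify the call was made
def notify_user(email_service, user_id):
    email_service.send(to=user_id, body="hello")  # hypothetical code under test

email_service = MagicMock()
notify_user(email_service, user_id=1)
email_service.send.assert_called_once()

# Fake: simplified but working implementation
class FakeUserStore:
    def __init__(self):
        self._users = {}

    def add(self, user):
        self._users[user["id"]] = user

    def get(self, user_id):
        return self._users.get(user_id)
```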
## Diagnosing Isolation Problems

### Step 1: Identify Flaky Tests
```bash
# Run tests 100 times to find flakiness (pytest-repeat plugin)
pytest --count=100 test_checkout.py

# Run in random order (pytest-random-order plugin)
pytest --random-order
```
**Interpretation:**
- Passes 100/100 → Not flaky
- Passes 95/100 → Flaky (5% failure rate)
- Failures are random → Parallel unsafe OR order-dependent
### Step 2: Find Which Tests Interfere

**Run tests in isolation:**
```bash
# Test A alone
pytest test_a.py  # ✓ Passes

# Test B alone
pytest test_b.py  # ✓ Passes

# Both together
pytest test_a.py test_b.py  # ✗ Test B fails

# Conclusion: Test A pollutes state that Test B depends on
```
**Reverse the order:**

```bash
pytest test_b.py test_a.py  # Does Test A fail now?
```
- If YES: Bidirectional pollution
- If NO: Test A pollutes, Test B is victim
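With many test files, pairwise runs by hand get tedious. A rough sketch that automates the pairing, assuming the victim's path is passed as an argument and test files live under `tests/`:

```python
# find_polluter.py - run each candidate file together with the victim
# and report pairings that fail (the candidate is a likely polluter)
import subprocess
import sys
from pathlib import Path

victim = sys.argv[1]  # e.g. tests/test_b.py
for suspect in sorted(Path("tests").glob("test_*.py")):
    if str(suspect) == victim:
        continue
    result = subprocess.run(["pytest", "-q", str(suspect), victim],
                            capture_output=True)
    if result.returncode != 0:
        print(f"polluter candidate: {suspect}")
```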
### Step 3: Identify Shared State

**Add diagnostic logging:**
```python
import pytest

@pytest.fixture(autouse=True)
def log_state():
    """Log state before/after each test."""
    print(f"Before: DB has {db.count()} records")
    yield
    print(f"After: DB has {db.count()} records")
```
**Look for:**
- Record count increasing over time (no cleanup)
- Files accumulating
- Cache growing
- Ports in use
### Step 4: Audit for Global State

Search the codebase for isolation violations:
```bash
# Module-level globals (uppercase constants by convention)
grep -rE "^[A-Z_]+ = " app/

# Global caches
grep -r "cache = " app/

# Singletons
grep -r "@singleton" app/
grep -r "class.*Singleton" app/
```
## Anti-Patterns Catalog

### ❌ Cleanup Code Instead of Structural Isolation

**Symptom:** Every test has teardown code to clean up.
```python
def test_checkout():
    user = create_user()
    cart = create_cart(user)
    checkout(cart)

    # Teardown
    delete_cart(cart.id)
    delete_user(user.id)
```
**Why bad:**
- If test fails before cleanup, state pollutes
- If cleanup has bugs, state pollutes
- Forces sequential execution (no parallelism)
**Fix:** Use transactions, unique IDs, or dependency injection, as sketched below.
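The same test restructured, a sketch assuming the transaction-rollback `db_session` fixture from Pattern 1 and that the helpers accept a session:

```python
# ✅ GOOD: no teardown code - the fixture's rollback discards everything,
# even when an assertion fails halfway through
def test_checkout(db_session):
    user = create_user(session=db_session)
    cart = create_cart(user, session=db_session)
    checkout(cart)
    assert cart.status == "completed"  # hypothetical assertion
```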
### ❌ Shared Test Fixtures

**Symptom:** Tests mutate a fixture that is shared across tests.
```python
import pytest

@pytest.fixture(scope="module")
def user():
    return create_user(email="test@example.com", name="Test User")

def test_update_name(user):
    user.name = "Alice"  # Modifies shared fixture!
    save(user)

def test_update_email(user):
    # Expects the original name, but the previous test changed it!
    assert user.name == "Test User"  # FAILS
```
**Why bad:** Tests interfere when the fixture is modified.

**Fix:** Use `scope="function"` for mutable fixtures, as shown below.
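The corrected fixture, a sketch: function scope gives every test its own user, and a UUID email sidesteps unique-constraint collisions as well.

```python
import uuid
import pytest

@pytest.fixture(scope="function")  # fresh user per test
def user():
    return create_user(email=f"test-{uuid.uuid4()}@example.com",
                       name="Test User")
```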
### ❌ Hidden Dependencies on Execution Order

**Symptom:** The test suite has an implicit execution order.
```python
# test_a.py
def test_create_admin():
    create_user(email="admin@example.com", role="admin")

# test_b.py
def test_admin_permissions():
    admin = get_user("admin@example.com")  # Assumes test_a ran!
    assert admin.has_permission("delete_users")
```
**Why bad:** Breaks when tests run in a different order or in parallel.

**Fix:** Each test creates its own dependencies, as sketched below.
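A sketch of the fix: the permissions test provisions its own admin, so it passes alone, in any order, and in parallel.

```python
import uuid

def test_admin_permissions():
    # Own setup: no dependence on test_create_admin having run
    admin = create_user(email=f"admin-{uuid.uuid4()}@example.com",
                        role="admin")
    assert admin.has_permission("delete_users")
```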
### ❌ Testing on Production-Like State

**Symptom:** Tests run against a shared database with existing data.
```python
def test_user_count():
    assert db.users.count() == 100  # Assumes specific state!
```
**Why bad:**
- Tests fail when data changes
- Can't run in parallel
- Can't run idempotently
**Fix:** Use an isolated test database, or count relative to the test's own data (see the sketch below).
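A sketch of the relative-count variant, assuming the same illustrative `db` handle and `create_user` helper:

```python
import uuid

def test_user_count():
    before = db.users.count()
    create_user(email=f"test-{uuid.uuid4()}@example.com")
    assert db.users.count() == before + 1  # relative, not absolute
```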
## Common Scenarios

### Scenario 1: "Tests pass locally, fail in CI"

**Likely causes:**
- **Timing issues** - CI is slower/faster, race conditions exposed
- **Parallel execution** - CI runs tests in parallel, local doesn't
- **Missing cleanup** - Local has leftover state, CI is fresh
**Diagnosis:**

```bash
# Test parallel execution locally
pytest -n 4

# Test with clean state
rm -rf .pytest_cache && pytest
```
### Scenario 2: "Random test failures that disappear on retry"

**Likely causes:**
- **Race conditions** - Async operations not awaited
- **Shared mutable state** - Global variables polluted
- **External service flakiness** - Real APIs being called
**Diagnosis:**

```bash
# Run the same test 100 times (pytest-repeat plugin)
pytest --count=100 test_flaky.py

# If the failure rate is consistent (e.g., 5/100), suspect shared state.
# If it varies wildly, suspect a race condition.
```
### Scenario 3: "Database unique constraint violations"

**Symptom:** `IntegrityError: duplicate key value violates unique constraint`

**Cause:** Tests reuse the same email/username/ID.

**Fix:**
```python
import uuid
import pytest

@pytest.fixture
def unique_user():
    email = f"test-{uuid.uuid4()}@example.com"
    return create_user(email=email)
```
## Quick Reference: Isolation Strategy Decision Tree

```
What resource needs isolation?

DATABASE
├─ Can you use transactions? → Transaction Rollback (fastest)
├─ Need real commits? → Unique Data Per Test
└─ Need schema changes? → Test Database Per Test

FILES
├─ Few files? → pytest's tmp_path
└─ Complex directories? → tempfile.mkdtemp()

EXTERNAL SERVICES
├─ Testing integration? → Separate integration test suite
└─ Testing business logic? → Mock the service

IN-MEMORY STATE
├─ Caches → Clear before each test (autouse fixture)
├─ Globals → Dependency injection (refactor)
└─ Module-level → Reset in fixture or avoid entirely

PROCESSES/PORTS
└─ Dynamic port allocation per test
```
## Bottom Line

**Test isolation is structural, not reactive.**

- ❌ Reactive: Write cleanup code after each test
- ✅ Structural: Design tests so cleanup isn't needed
**The hierarchy:**

1. **Best:** Dependency injection (no shared state)
2. **Good:** Transactions/tmp_path (automatic cleanup)
3. **Acceptable:** Unique data per test (explicit isolation)
4. **Last resort:** Manual cleanup (fragile, error-prone)
**If your tests fail together but pass alone, you have an isolation problem.** Stop adding tests and fix isolation first.