| name | description |
|---|---|
| test-isolation-fundamentals | Use when tests fail together but pass alone, diagnosing test pollution, ensuring test independence and idempotence, managing shared state, or designing parallel-safe tests - provides isolation principles, database/file/service patterns, and cleanup strategies |
# Test Isolation Fundamentals

## Overview

**Core principle:** Each test must work independently, regardless of execution order or parallel execution.

**Rule:** If a test fails when run with other tests but passes alone, you have an isolation problem. Fix it before adding more tests.
## When You Have Isolation Problems

**Symptoms:**
- Tests pass individually: `pytest test_checkout.py` ✓
- Tests fail in full suite: `pytest` ✗
- Errors like "User already exists", "Expected empty but found data"
- Tests fail randomly or only in CI
- Different results when tests run in different orders

**Root cause:** Tests share mutable state without cleanup.
## The Five Principles

### 1. Order-Independence

Tests must pass regardless of execution order.

```bash
# All of these must produce identical results
pytest tests/                 # alphabetical order
pytest tests/ --random-order  # random order (pytest-random-order plugin)
pytest tests/ --reverse       # reverse order (pytest-reverse plugin)
```
**Anti-pattern:**

```python
# ❌ BAD: Test B depends on Test A running first
def test_create_user():
    db.users.insert({"id": 1, "name": "Alice"})

def test_update_user():
    db.users.update({"id": 1}, {"name": "Bob"})  # Assumes Alice exists!
```
**Fix:** Each test creates its own data.
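A minimal sketch of the fix, reusing the illustrative `db` handle from the anti-pattern above: each test inserts the row it operates on, so order no longer matters.

```python
# ✅ GOOD: each test creates its own data, so execution order is irrelevant
def test_create_user():
    db.users.insert({"id": 1, "name": "Alice"})
    assert db.users.find_one({"id": 1}) is not None

def test_update_user():
    db.users.insert({"id": 2, "name": "Alice"})  # own setup, own id
    db.users.update({"id": 2}, {"name": "Bob"})
    assert db.users.find_one({"id": 2})["name"] == "Bob"
```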
### 2. Idempotence

Running a test twice produces the same result both times.

```bash
# Both runs must pass
pytest test_checkout.py  # First run
pytest test_checkout.py  # Second run (same result)
```
**Anti-pattern:**

```python
# ❌ BAD: Second run fails on unique constraint
def test_signup():
    user = create_user(email="test@example.com")
    assert user.id is not None
    # No cleanup - second run fails: "email already exists"
```
**Fix:** Clean up data after the test, or use unique data per run.
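A sketch of the unique-data option, assuming the same `create_user` helper as above: deriving a fresh email per run means reruns never collide.

```python
import uuid

# ✅ GOOD: unique email per run - the second run passes too
def test_signup():
    user = create_user(email=f"test-{uuid.uuid4()}@example.com")
    assert user.id is not None
```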
### 3. Fresh State

Each test starts with a clean slate.

**What needs to be fresh:**
- Database records
- Files and directories
- In-memory caches
- Global variables
- Module-level state
- Environment variables
- Network sockets/ports
- Background processes
**Anti-pattern:**

```python
# ❌ BAD: Shared mutable global state
cache = {}  # Module-level global

def get_from_cache(key):
    return cache.get(key)

def test_cache_miss():
    assert get_from_cache("key1") is None  # Passes first time
    cache["key1"] = "value"                # Pollutes global state

def test_cache_lookup():
    assert get_from_cache("key1") is None  # Fails if previous test ran!
```
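A sketch of the structural fix: inject the cache as a parameter instead of reading a module-level global, so each test builds its own.

```python
# ✅ GOOD: the cache is a parameter, not a global
def get_from_cache(cache, key):
    return cache.get(key)

def test_cache_miss():
    cache = {}  # fresh, local to this test
    assert get_from_cache(cache, "key1") is None

def test_cache_lookup():
    cache = {"key1": "value"}  # this test sets up its own state
    assert get_from_cache(cache, "key1") == "value"
```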
### 4. Explicit Scope

Know what state is shared vs isolated.

**Test scopes (pytest):**
- `scope="function"` - Fresh per test (default, safest)
- `scope="class"` - Shared across test class
- `scope="module"` - Shared across file
- `scope="session"` - Shared across entire test run

**Rule:** Default to `scope="function"`. Only use broader scopes for expensive resources that are READ-ONLY.
```python
# ✅ GOOD: Expensive read-only data can be shared
@pytest.fixture(scope="session")
def large_config_file():
    return load_config("data.json")  # Expensive, never modified

# ❌ BAD: Mutable data shared across tests
@pytest.fixture(scope="session")
def database():
    return Database()  # Tests will pollute each other!

# ✅ GOOD: Mutable data fresh per test
@pytest.fixture(scope="function")
def database():
    db = Database()
    yield db
    db.cleanup()  # Fresh per test
```
### 5. Parallel Safety

Tests must work when run concurrently.

```bash
pytest -n 4  # Run 4 tests in parallel with pytest-xdist
```

**Parallel-unsafe patterns:**
- Shared files without unique names
- Fixed network ports
- Singleton databases
- Global module state
- Fixed temp directories
**Fix:** Use unique identifiers per test (UUIDs, process IDs, random ports).
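A sketch of the unique-identifier approach for files, assuming pytest: combining the process ID with a UUID keeps parallel workers out of each other's way, even for paths outside `tmp_path`.

```python
import os
import uuid
import pytest

@pytest.fixture
def unique_scratch_path(tmp_path):
    """A path no parallel worker can collide on."""
    # tmp_path is already unique per test; the pid + uuid suffix makes
    # the guarantee explicit and carries over to shared locations too
    return tmp_path / f"scratch-{os.getpid()}-{uuid.uuid4().hex}.dat"
```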
## Isolation Patterns by Resource Type

### Database Isolation

#### Pattern 1: Transactions with Rollback (Fastest, Recommended)
```python
import pytest
from sqlalchemy.orm import Session

@pytest.fixture
def db_session(db_engine):  # db_engine: an engine fixture defined elsewhere
    """Each test gets a fresh DB session that auto-rollbacks."""
    connection = db_engine.connect()
    transaction = connection.begin()
    session = Session(bind=connection)
    yield session
    session.close()
    transaction.rollback()  # Undo all changes
    connection.close()
```
**Why it works:**
- No cleanup code needed - rollback is automatic
- Fast (<1ms per test)
- Works with ANY database (PostgreSQL, MySQL, SQLite, Oracle)
- Handles FK relationships automatically
**When NOT to use:**
- Testing actual commits
- Testing transaction isolation levels
- Multi-database transactions
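A usage sketch for the fixture above; `User` is a hypothetical SQLAlchemy mapped model, not part of the pattern itself:

```python
# usage sketch - User is a hypothetical mapped model
def test_create_user(db_session):
    db_session.add(User(name="Alice"))
    db_session.flush()  # visible inside the open transaction
    assert db_session.query(User).filter_by(name="Alice").count() == 1
    # no cleanup: the fixture's rollback discards the row afterwards
```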
#### Pattern 2: Unique Data Per Test
```python
import uuid
import pytest

@pytest.fixture
def unique_user():
    """Each test gets a unique user."""
    email = f"test-{uuid.uuid4()}@example.com"
    user = create_user(email=email, name="Test User")
    yield user
    # Optional cleanup (or rely on test DB being dropped)
    delete_user(user.id)
```
**Why it works:**
- Tests don't interfere (different users)
- Can run in parallel
- Idempotent (UUID ensures uniqueness)
**When to use:**
- Testing with real databases
- Parallel test execution
- Integration tests that need real commits
#### Pattern 3: Test Database Per Test
```python
import uuid
import pytest

@pytest.fixture
def isolated_db():
    """Each test gets its own temporary database."""
    db_name = f"test_db_{uuid.uuid4().hex}"
    create_database(db_name)
    yield get_connection(db_name)
    drop_database(db_name)
```
**Why it works:**
- Complete isolation
- Can test schema migrations
- No cross-test pollution
**When NOT to use:**
- Unit tests (too slow)
- Large test suites (overhead adds up)
### File System Isolation

#### Pattern: Temporary Directories
```python
import pytest
import tempfile
import shutil

@pytest.fixture
def temp_workspace():
    """Each test gets a fresh temporary directory."""
    tmpdir = tempfile.mkdtemp(prefix="test_")
    yield tmpdir
    shutil.rmtree(tmpdir)  # Clean up
```
**Parallel-safe version:**
```python
import pytest

@pytest.fixture
def temp_workspace(tmp_path):
    """pytest's tmp_path is automatically unique per test."""
    workspace = tmp_path / "workspace"
    workspace.mkdir()
    yield workspace
    # No cleanup needed - pytest handles it
```
**Why it works:**
- Each test writes to different directory
- Parallel-safe (unique paths)
- Automatic cleanup
### Service/API Isolation

#### Pattern: Mocking External Services
```python
import pytest
from unittest.mock import patch, MagicMock

@pytest.fixture
def mock_stripe():
    """Mock Stripe API for all tests."""
    with patch('stripe.Charge.create') as mock:
        mock.return_value = MagicMock(id="ch_test123", status="succeeded")
        yield mock
```
**When to use:**
- External APIs (Stripe, Twilio, SendGrid)
- Slow services
- Non-deterministic responses
- Services that cost money per call
**When NOT to use:**
- Testing integration with real service (use separate integration test suite)
### In-Memory Cache Isolation

#### Pattern: Clear Cache Before Each Test
```python
import pytest

@pytest.fixture(autouse=True)
def clear_cache():
    """Automatically clear cache before each test."""
    cache.clear()
    yield
    # Optional: clear after test too
    cache.clear()
```
**Why `autouse=True`:** Runs automatically for every test without explicit declaration.
### Process/Port Isolation

#### Pattern: Dynamic Port Allocation
```python
import socket
import pytest

def get_free_port():
    """Find an available port."""
    sock = socket.socket()
    sock.bind(('', 0))
    port = sock.getsockname()[1]
    sock.close()
    return port

@pytest.fixture
def test_server():
    """Each test gets a server on a unique port."""
    port = get_free_port()
    server = start_server(port=port)
    yield f"http://localhost:{port}"
    server.stop()
```
**Why it works:**
- Tests can run in parallel (different ports)
- No port conflicts
## Test Doubles: When to Use What
| Type | Purpose | Example |
|---|---|---|
| Stub | Returns hardcoded values | `getUser()` → `{id: 1, name: "Alice"}` |
| Mock | Verifies calls were made | `assert emailService.send.called` |
| Fake | Working implementation, simplified | In-memory database instead of PostgreSQL |
| Spy | Records calls for later inspection | Logs all method calls |
**Decision tree:**

```
Do you need to verify the call was made?
├─ YES → Use Mock
└─ NO → Do you need a working implementation?
    ├─ YES → Use Fake
    └─ NO → Use Stub
```
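The three branches of the tree in code, a minimal sketch using `unittest.mock`; `notify_user` and `FakeUserStore` are hypothetical illustrations, not library APIs:

```python
from unittest.mock import MagicMock

# Stub: hardcoded return value, no verification
get_user = MagicMock(return_value={"id": 1, "name": "Alice"})
assert get_user(1)["name"] == "Alice"

# Mock: verify the call was made
def notify_user(email_service, user_id):
    email_service.send(to=user_id, body="hello")  # hypothetical code under test

email_service = MagicMock()
notify_user(email_service, user_id=1)
email_service.send.assert_called_once()

# Fake: simplified but working implementation
class FakeUserStore:
    def __init__(self):
        self._users = {}

    def add(self, user):
        self._users[user["id"]] = user

    def get(self, user_id):
        return self._users.get(user_id)
```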
## Diagnosing Isolation Problems

### Step 1: Identify Flaky Tests
```bash
# Run tests 100 times to find flakiness (pytest-repeat plugin)
pytest --count=100 test_checkout.py

# Run in random order (pytest-random-order plugin)
pytest --random-order
```
**Interpretation:**
- Passes 100/100 → Not flaky
- Passes 95/100 → Flaky (5% failure rate)
- Failures are random → Parallel unsafe OR order-dependent
### Step 2: Find Which Tests Interfere

**Run tests in isolation:**
```bash
# Test A alone
pytest test_a.py  # ✓ Passes

# Test B alone
pytest test_b.py  # ✓ Passes

# Both together
pytest test_a.py test_b.py  # ✗ Test B fails

# Conclusion: Test A pollutes state that Test B depends on
```
**Reverse the order:**

```bash
pytest test_b.py test_a.py  # Does Test A fail now?
```
- If YES: Bidirectional pollution
- If NO: Test A pollutes, Test B is victim
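With many test files, pairwise runs by hand get tedious. A rough sketch that automates the pairing, assuming the victim's path is passed as an argument and test files live under `tests/`:

```python
# find_polluter.py - run each candidate file together with the victim
# and report pairings that fail (the candidate is a likely polluter)
import subprocess
import sys
from pathlib import Path

victim = sys.argv[1]  # e.g. tests/test_b.py
for suspect in sorted(Path("tests").glob("test_*.py")):
    if str(suspect) == victim:
        continue
    result = subprocess.run(["pytest", "-q", str(suspect), victim],
                            capture_output=True)
    if result.returncode != 0:
        print(f"polluter candidate: {suspect}")
```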
### Step 3: Identify Shared State

**Add diagnostic logging:**
```python
import pytest

@pytest.fixture(autouse=True)
def log_state():
    """Log state before/after each test."""
    print(f"Before: DB has {db.count()} records")
    yield
    print(f"After: DB has {db.count()} records")
```
**Look for:**
- Record count increasing over time (no cleanup)
- Files accumulating
- Cache growing
- Ports in use
### Step 4: Audit for Global State

Search the codebase for isolation violations:
```bash
# Module-level globals (uppercase constants by convention)
grep -rE "^[A-Z_]+ = " app/

# Global caches
grep -r "cache = " app/

# Singletons
grep -r "@singleton" app/
grep -r "class.*Singleton" app/
```
## Anti-Patterns Catalog

### ❌ Cleanup Code Instead of Structural Isolation

**Symptom:** Every test has teardown code to clean up.
```python
def test_checkout():
    user = create_user()
    cart = create_cart(user)
    checkout(cart)

    # Teardown
    delete_cart(cart.id)
    delete_user(user.id)
```
**Why bad:**
- If test fails before cleanup, state pollutes
- If cleanup has bugs, state pollutes
- Forces sequential execution (no parallelism)
**Fix:** Use transactions, unique IDs, or dependency injection, as sketched below.
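The same test restructured, a sketch assuming the transaction-rollback `db_session` fixture from Pattern 1 and that the helpers accept a session:

```python
# ✅ GOOD: no teardown code - the fixture's rollback discards everything,
# even when an assertion fails halfway through
def test_checkout(db_session):
    user = create_user(session=db_session)
    cart = create_cart(user, session=db_session)
    checkout(cart)
    assert cart.status == "completed"  # hypothetical assertion
```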
### ❌ Shared Test Fixtures

**Symptom:** Tests mutate a fixture that is shared across tests.
```python
import pytest

@pytest.fixture(scope="module")
def user():
    return create_user(email="test@example.com", name="Test User")

def test_update_name(user):
    user.name = "Alice"  # Modifies shared fixture!
    save(user)

def test_update_email(user):
    # Expects the original name, but the previous test changed it!
    assert user.name == "Test User"  # FAILS
```
**Why bad:** Tests interfere when the fixture is modified.

**Fix:** Use `scope="function"` for mutable fixtures, as shown below.
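The corrected fixture, a sketch: function scope gives every test its own user, and a UUID email sidesteps unique-constraint collisions as well.

```python
import uuid
import pytest

@pytest.fixture(scope="function")  # fresh user per test
def user():
    return create_user(email=f"test-{uuid.uuid4()}@example.com",
                       name="Test User")
```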
### ❌ Hidden Dependencies on Execution Order

**Symptom:** The test suite has an implicit execution order.
```python
# test_a.py
def test_create_admin():
    create_user(email="admin@example.com", role="admin")

# test_b.py
def test_admin_permissions():
    admin = get_user("admin@example.com")  # Assumes test_a ran!
    assert admin.has_permission("delete_users")
```
**Why bad:** Breaks when tests run in a different order or in parallel.

**Fix:** Each test creates its own dependencies, as sketched below.
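A sketch of the fix: the permissions test provisions its own admin, so it passes alone, in any order, and in parallel.

```python
import uuid

def test_admin_permissions():
    # Own setup: no dependence on test_create_admin having run
    admin = create_user(email=f"admin-{uuid.uuid4()}@example.com",
                        role="admin")
    assert admin.has_permission("delete_users")
```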
### ❌ Testing on Production-Like State

**Symptom:** Tests run against a shared database with existing data.
```python
def test_user_count():
    assert db.users.count() == 100  # Assumes specific state!
```
**Why bad:**
- Tests fail when data changes
- Can't run in parallel
- Can't run idempotently
**Fix:** Use an isolated test database, or count relative to the test's own data (see the sketch below).
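A sketch of the relative-count variant, assuming the same illustrative `db` handle and `create_user` helper:

```python
import uuid

def test_user_count():
    before = db.users.count()
    create_user(email=f"test-{uuid.uuid4()}@example.com")
    assert db.users.count() == before + 1  # relative, not absolute
```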
## Common Scenarios

### Scenario 1: "Tests pass locally, fail in CI"

**Likely causes:**
- **Timing issues** - CI is slower/faster, race conditions exposed
- **Parallel execution** - CI runs tests in parallel, local doesn't
- **Missing cleanup** - Local has leftover state, CI is fresh
**Diagnosis:**

```bash
# Test parallel execution locally
pytest -n 4

# Test with clean state
rm -rf .pytest_cache && pytest
```
### Scenario 2: "Random test failures that disappear on retry"

**Likely causes:**
- **Race conditions** - Async operations not awaited
- **Shared mutable state** - Global variables polluted
- **External service flakiness** - Real APIs being called
**Diagnosis:**

```bash
# Run the same test 100 times (pytest-repeat plugin)
pytest --count=100 test_flaky.py

# If the failure rate is consistent (e.g., 5/100), suspect shared state.
# If it varies wildly, suspect a race condition.
```
### Scenario 3: "Database unique constraint violations"

**Symptom:** `IntegrityError: duplicate key value violates unique constraint`

**Cause:** Tests reuse the same email/username/ID.

**Fix:**
```python
import uuid
import pytest

@pytest.fixture
def unique_user():
    email = f"test-{uuid.uuid4()}@example.com"
    return create_user(email=email)
```
## Quick Reference: Isolation Strategy Decision Tree

```
What resource needs isolation?

DATABASE
├─ Can you use transactions? → Transaction Rollback (fastest)
├─ Need real commits? → Unique Data Per Test
└─ Need schema changes? → Test Database Per Test

FILES
├─ Few files? → pytest's tmp_path
└─ Complex directories? → tempfile.mkdtemp()

EXTERNAL SERVICES
├─ Testing integration? → Separate integration test suite
└─ Testing business logic? → Mock the service

IN-MEMORY STATE
├─ Caches → Clear before each test (autouse fixture)
├─ Globals → Dependency injection (refactor)
└─ Module-level → Reset in fixture or avoid entirely

PROCESSES/PORTS
└─ Dynamic port allocation per test
```
## Bottom Line

**Test isolation is structural, not reactive.**

- ❌ Reactive: Write cleanup code after each test
- ✅ Structural: Design tests so cleanup isn't needed
**The hierarchy:**

1. **Best:** Dependency injection (no shared state)
2. **Good:** Transactions/tmp_path (automatic cleanup)
3. **Acceptable:** Unique data per test (explicit isolation)
4. **Last resort:** Manual cleanup (fragile, error-prone)
**If your tests fail together but pass alone, you have an isolation problem.** Stop adding tests and fix isolation first.