gh-tachyon-beep-skillpacks-…/skills/test-isolation-fundamentals/SKILL.md
2025-11-30 08:59:43 +08:00


---
name: test-isolation-fundamentals
description: Use when tests fail together but pass alone, diagnosing test pollution, ensuring test independence and idempotence, managing shared state, or designing parallel-safe tests - provides isolation principles, database/file/service patterns, and cleanup strategies
---
# Test Isolation Fundamentals
## Overview
**Core principle:** Each test must work independently, regardless of execution order or parallel execution.
**Rule:** If a test fails when run with other tests but passes alone, you have an isolation problem. Fix it before adding more tests.
## When You Have Isolation Problems
**Symptoms:**
- Tests pass individually: `pytest test_checkout.py`
- Tests fail in full suite: `pytest`
- Errors like "User already exists", "Expected empty but found data"
- Tests fail randomly or only in CI
- Different results when tests run in different orders
**Root cause:** Tests share mutable state without cleanup.
## The Five Principles
### 1. Order-Independence
**Tests must pass regardless of execution order.**
```bash
# All of these must produce identical results
pytest tests/                 # default collection order
pytest tests/ --random-order  # random order (pytest-random-order plugin)
pytest tests/ --reverse       # reverse order (pytest-reverse plugin)
```
**Anti-pattern:**
```python
# ❌ BAD: Test B depends on Test A running first
def test_create_user():
    db.users.insert({"id": 1, "name": "Alice"})

def test_update_user():
    db.users.update({"id": 1}, {"name": "Bob"})  # Assumes Alice exists!
```
**Fix:** Each test creates its own data.
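
A minimal sketch of that fix, assuming a hypothetical in-memory `FakeUsers` store in place of the `db.users` collection above. Each test builds its own store through `make_users_with_alice()` (also a made-up helper), so either test can run first or alone.

```python
class FakeUsers:
    """Hypothetical in-memory stand-in for the `db.users` collection."""

    def __init__(self):
        self._rows = {}

    def insert(self, row):
        self._rows[row["id"]] = dict(row)

    def update(self, query, changes):
        self._rows[query["id"]].update(changes)

    def find(self, user_id):
        return self._rows.get(user_id)


def make_users_with_alice():
    """Arrange step: returns a brand-new store on every call."""
    users = FakeUsers()
    users.insert({"id": 1, "name": "Alice"})
    return users


def test_create_user():
    users = make_users_with_alice()
    assert users.find(1)["name"] == "Alice"


def test_update_user():
    users = make_users_with_alice()  # Does not rely on test_create_user
    users.update({"id": 1}, {"name": "Bob"})
    assert users.find(1)["name"] == "Bob"
```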

---
### 2. Idempotence
**Running a test twice produces the same result both times.**
```bash
# Both runs must pass
pytest test_checkout.py # First run
pytest test_checkout.py # Second run (same result)
```
**Anti-pattern:**
```python
# ❌ BAD: Second run fails on unique constraint
def test_signup():
    user = create_user(email="test@example.com")
    assert user.id is not None
    # No cleanup - second run fails: "email already exists"
```
**Fix:** Clean up data after test OR use unique data per run.

---
### 3. Fresh State
**Each test starts with a clean slate.**
**What needs to be fresh:**
- Database records
- Files and directories
- In-memory caches
- Global variables
- Module-level state
- Environment variables
- Network sockets/ports
- Background processes
**Anti-pattern:**
```python
# ❌ BAD: Shared mutable global state
cache = {}  # Module-level global

def test_cache_miss():
    assert get_from_cache("key1") is None  # Passes first time
    cache["key1"] = "value"  # Pollutes global state

def test_cache_lookup():
    assert get_from_cache("key1") is None  # Fails if previous test ran!
```
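
One way to give each test a fresh view of a module-level dict or of the process environment is the standard library's `unittest.mock.patch.dict`, which snapshots the mapping on entry and restores it on exit. The sketch below reuses the `cache` and `get_from_cache` names from the anti-pattern; the `FEATURE_X` variable is made up for illustration.

```python
import os
from unittest.mock import patch

cache = {}  # Module-level global, as in the anti-pattern above


def get_from_cache(key):
    return cache.get(key)


def test_cache_miss():
    # clear=True empties the dict for the duration of the block;
    # the original contents are restored when the block exits.
    with patch.dict(cache, clear=True):
        assert get_from_cache("key1") is None
        cache["key1"] = "value"  # Discarded on exit


def test_cache_lookup():
    with patch.dict(cache, clear=True):
        assert get_from_cache("key1") is None  # Passes in any order


def test_reads_feature_flag():
    # FEATURE_X is a hypothetical variable name used only in this sketch.
    with patch.dict(os.environ, {"FEATURE_X": "1"}):
        assert os.environ["FEATURE_X"] == "1"
```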
---
### 4. Explicit Scope
**Know what state is shared vs isolated.**
**Test scopes (pytest):**
- `scope="function"` - Fresh per test (default, safest)
- `scope="class"` - Shared across test class
- `scope="module"` - Shared across file
- `scope="session"` - Shared across entire test run
**Rule:** Default to `scope="function"`. Only use broader scopes for expensive resources that are READ-ONLY.
```python
# ✅ GOOD: Expensive read-only data can be shared
@pytest.fixture(scope="session")
def large_config_file():
    return load_config("data.json")  # Expensive, never modified

# ❌ BAD: Mutable data shared across tests
@pytest.fixture(scope="session")
def database():
    return Database()  # Tests will pollute each other!

# ✅ GOOD: Mutable data fresh per test
@pytest.fixture(scope="function")
def database():
    db = Database()
    yield db
    db.cleanup()  # Fresh per test
```
---
### 5. Parallel Safety
**Tests must work when run concurrently.**
```bash
pytest -n 4 # Run tests across 4 worker processes (requires pytest-xdist)
```
**Parallel-unsafe patterns:**
- Shared files without unique names
- Fixed network ports
- Singleton databases
- Global module state
- Fixed temp directories
**Fix:** Use unique identifiers per test (UUIDs, process IDs, random ports).
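
A small sketch of such an identifier helper. The function name `unique_name` is hypothetical; `PYTEST_XDIST_WORKER` is the environment variable pytest-xdist sets in each worker process (for example `gw0`), and the fallback `"main"` is an arbitrary choice for runs without xdist.

```python
import os
import uuid


def unique_name(prefix):
    """Build a resource name that should not collide across tests or workers."""
    worker = os.environ.get("PYTEST_XDIST_WORKER", "main")
    return f"{prefix}_{worker}_{os.getpid()}_{uuid.uuid4().hex[:8]}"
```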

---
## Isolation Patterns by Resource Type
### Database Isolation
**Pattern 1: Transactions with Rollback (Fastest, Recommended)**
```python
import pytest
from sqlalchemy.orm import Session

@pytest.fixture
def db_session(db_engine):
    """Each test gets a fresh DB session that is rolled back afterwards.

    Assumes a `db_engine` fixture that returns a SQLAlchemy Engine.
    """
    connection = db_engine.connect()
    transaction = connection.begin()
    session = Session(bind=connection)
    yield session
    session.close()
    transaction.rollback()  # Undo all changes made during the test
    connection.close()
```
**Why it works:**
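- No per-test cleanup code needed - the fixture rolls everything back
- Fast - there are no rows to delete or tables to recreate between tests
- Works with any database engine that supports transactions (PostgreSQL, MySQL with InnoDB, SQLite, Oracle)
- FK relationships need no special handling - dependent rows are rolled back together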
**When NOT to use:**
- Testing actual commits
- Testing transaction isolation levels
- Multi-database transactions
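
The rollback idea can be seen in isolation with the standard library's `sqlite3` module. This is a simplified sketch, not the SQLAlchemy fixture above: `make_connection` and `rolled_back` are hypothetical helpers, and the single `users` table is made up for the example.

```python
import sqlite3
from contextlib import contextmanager


def make_connection():
    """Create an in-memory database with one table (illustrative schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (email TEXT UNIQUE)")
    conn.commit()
    return conn


@contextmanager
def rolled_back(conn):
    """Yield the connection, then discard everything the block wrote."""
    try:
        yield conn
    finally:
        conn.rollback()


conn = make_connection()


def test_signup():
    with rolled_back(conn) as db:
        db.execute("INSERT INTO users VALUES ('test@example.com')")
        count = db.execute("SELECT COUNT(*) FROM users").fetchone()[0]
        assert count == 1  # Visible inside the transaction only
```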
---
**Pattern 2: Unique Data Per Test**
```python
import uuid
import pytest

@pytest.fixture
def unique_user():
    """Each test gets a unique user."""
    email = f"test-{uuid.uuid4()}@example.com"
    user = create_user(email=email, name="Test User")
    yield user
    # Optional cleanup (or rely on test DB being dropped)
    delete_user(user.id)
```
**Why it works:**
- Tests don't interfere (different users)
- Can run in parallel
- Idempotent (UUID ensures uniqueness)
**When to use:**
- Testing with real databases
- Parallel test execution
- Integration tests that need real commits
---
**Pattern 3: Test Database Per Test**
```python
import uuid
import pytest

@pytest.fixture
def isolated_db():
    """Each test gets its own temporary database."""
    db_name = f"test_db_{uuid.uuid4().hex}"
    create_database(db_name)
    yield get_connection(db_name)
    drop_database(db_name)
```
**Why it works:**
- Complete isolation
- Can test schema migrations
- No cross-test pollution
**When NOT to use:**
- Unit tests (too slow)
- Large test suites (overhead adds up)
---
### File System Isolation
**Pattern: Temporary Directories**
```python
import pytest
import tempfile
import shutil

@pytest.fixture
def temp_workspace():
    """Each test gets a fresh temporary directory."""
    tmpdir = tempfile.mkdtemp(prefix="test_")
    yield tmpdir
    shutil.rmtree(tmpdir)  # Clean up
```
**Parallel-safe version:**
```python
@pytest.fixture
def temp_workspace(tmp_path):
    """pytest's tmp_path is automatically unique per test."""
    workspace = tmp_path / "workspace"
    workspace.mkdir()
    yield workspace
    # No cleanup needed - pytest handles it
```
**Why it works:**
- Each test writes to different directory
- Parallel-safe (unique paths)
- Automatic cleanup
---
### Service/API Isolation
**Pattern: Mocking External Services**
```python
import pytest
from unittest.mock import patch, MagicMock

@pytest.fixture
def mock_stripe():
    """Mock Stripe API for all tests."""
    with patch('stripe.Charge.create') as mock:
        mock.return_value = MagicMock(id="ch_test123", status="succeeded")
        yield mock
```
**When to use:**
- External APIs (Stripe, Twilio, SendGrid)
- Slow services
- Non-deterministic responses
- Services that cost money per call
**When NOT to use:**
- Testing integration with real service (use separate integration test suite)
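
To show how a test consumes such a mock without needing the Stripe package, here is a self-contained sketch. `PaymentGateway` and `checkout` are hypothetical names standing in for the real client and the code under test; only `unittest.mock` is real.

```python
from unittest.mock import MagicMock, patch


class PaymentGateway:
    """Hypothetical wrapper around an external payment API."""

    @staticmethod
    def charge(amount_cents, token):
        raise RuntimeError("network call attempted in a unit test")


def checkout(amount_cents, token):
    """Code under test: succeeds when the gateway reports success."""
    charge = PaymentGateway.charge(amount_cents, token)
    return charge.status == "succeeded"


def test_checkout_succeeds():
    with patch.object(PaymentGateway, "charge") as mock_charge:
        mock_charge.return_value = MagicMock(id="ch_test123", status="succeeded")
        assert checkout(1999, "tok_test") is True
        # The mock also lets the test verify how the service was called.
        mock_charge.assert_called_once_with(1999, "tok_test")
```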
---
### In-Memory Cache Isolation
**Pattern: Clear Cache Before Each Test**
```python
import pytest

@pytest.fixture(autouse=True)
def clear_cache():
    """Automatically clear cache before each test."""
    cache.clear()
    yield
    # Optional: clear after test too
    cache.clear()
```
**Why `autouse=True`:** Runs automatically for every test without explicit declaration.

---
### Process/Port Isolation
**Pattern: Dynamic Port Allocation**
```python
import socket
import pytest

def get_free_port():
    """Ask the OS for a currently unused port.

    The port is released before the server binds it, so a small race
    with other processes remains.
    """
    with socket.socket() as sock:
        sock.bind(("", 0))  # Port 0 lets the OS choose a free port
        return sock.getsockname()[1]

@pytest.fixture
def test_server():
    """Each test gets a server on a unique port."""
    port = get_free_port()
    server = start_server(port=port)
    yield f"http://localhost:{port}"
    server.stop()
```
**Why it works:**
- Tests can run in parallel (different ports)
- No port conflicts
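
`get_free_port()` closes its socket before the server binds, which leaves a short window in which another process could take the port. When the server itself accepts port `0`, that window disappears. The sketch below assumes a throwaway `http.server` handler (`PingHandler`) and a hypothetical `running_server` helper; a real suite would wrap its own server the same way.

```python
import http.client
import threading
from contextlib import contextmanager
from http.server import BaseHTTPRequestHandler, HTTPServer


class PingHandler(BaseHTTPRequestHandler):
    """Illustrative handler that answers every GET with 'pong'."""

    def do_GET(self):
        body = b"pong"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # Keep test output quiet


@contextmanager
def running_server():
    """Start a server on an OS-chosen port and yield that port."""
    server = HTTPServer(("127.0.0.1", 0), PingHandler)  # Port 0: OS picks
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        yield server.server_address[1]
    finally:
        server.shutdown()
        server.server_close()
        thread.join()


def test_ping():
    with running_server() as port:
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        conn.request("GET", "/ping")
        assert conn.getresponse().read() == b"pong"
        conn.close()
```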
---
## Test Doubles: When to Use What
| Type | Purpose | Example |
|------|---------|---------|
| **Stub** | Returns hardcoded values | `getUser() → {id: 1, name: "Alice"}` |
| **Mock** | Verifies calls were made | `assert emailService.send.called` |
| **Fake** | Working implementation, simplified | In-memory database instead of PostgreSQL |
| **Spy** | Records calls for later inspection | Logs all method calls |
**Decision tree:**
```
Do you need to verify the call was made?
  YES → Use Mock
  NO → Do you need a working implementation?
    YES → Use Fake
    NO → Use Stub
```
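
In Python the same distinctions might look like the sketch below. `StubUserRepo`, `FakeUserRepo`, and `welcome` are hypothetical names; `unittest.mock.Mock` plays the mock role and records calls the way a spy would.

```python
from unittest.mock import Mock


class StubUserRepo:
    """Stub: returns a hardcoded value, no logic."""

    def get_user(self, user_id):
        return {"id": 1, "name": "Alice"}


class FakeUserRepo:
    """Fake: a working but simplified in-memory implementation."""

    def __init__(self):
        self._users = {}

    def save(self, user):
        self._users[user["id"]] = user

    def get_user(self, user_id):
        return self._users.get(user_id)


def welcome(repo, email_service, user_id):
    """Code under test: looks up a user and sends a welcome email."""
    user = repo.get_user(user_id)
    email_service.send(to=user["name"], subject="Welcome")
    return user["name"]


def test_welcome_with_stub_and_mock():
    email_service = Mock()  # Mock: lets the test verify the call
    assert welcome(StubUserRepo(), email_service, 1) == "Alice"
    email_service.send.assert_called_once_with(to="Alice", subject="Welcome")


def test_welcome_with_fake():
    repo = FakeUserRepo()
    repo.save({"id": 7, "name": "Bob"})
    assert welcome(repo, Mock(), 7) == "Bob"
```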
---
## Diagnosing Isolation Problems
### Step 1: Identify Flaky Tests
```bash
# Run tests 100 times to find flakiness (requires the pytest-repeat plugin)
pytest --count=100 test_checkout.py
# Run in random order (requires the pytest-random-order plugin)
pytest --random-order
```
**Interpretation:**
- Passes 100/100 → Not flaky
- Passes 95/100 → Flaky (5% failure rate)
- Failures are random → Parallel unsafe OR order-dependent
---
### Step 2: Find Which Tests Interfere
**Run tests in isolation:**
```bash
# Test A alone
pytest test_a.py # ✓ Passes
# Test B alone
pytest test_b.py # ✓ Passes
# Both together
pytest test_a.py test_b.py # ✗ Test B fails
# Conclusion: Test A pollutes state that Test B depends on
```
**Reverse the order:**
```bash
pytest test_b.py test_a.py # Does Test A fail now?
```
- If YES: Bidirectional pollution
- If NO: Test A pollutes, Test B is victim
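
When many tests run before the victim, trying them one by one is slow. A bisection over the candidate list narrows it down in a logarithmic number of runs. The sketch below assumes exactly one polluter and a non-empty candidate list; `find_polluter` and `victim_fails_after` are hypothetical helper names, and the second one simply shells out to pytest with the tests in the given order.

```python
import subprocess
import sys


def find_polluter(candidates, fails_after):
    """Return the one test in `candidates` that makes the victim fail.

    `fails_after(tests)` must run `tests` followed by the victim and
    return True when the victim fails.
    """
    suspects = list(candidates)
    while len(suspects) > 1:
        half = len(suspects) // 2
        first = suspects[:half]
        # Keep whichever half still reproduces the failure.
        suspects = first if fails_after(first) else suspects[half:]
    return suspects[0]


def victim_fails_after(tests, victim):
    """Run `tests` then `victim` in one pytest process; True on any failure."""
    result = subprocess.run([sys.executable, "-m", "pytest", "-q", *tests, victim])
    return result.returncode != 0
```

A call such as `find_polluter(test_files, lambda subset: victim_fails_after(subset, "test_b.py"))` would then return the file to inspect. A non-zero exit code also covers failures in the candidate tests themselves, so the result is a lead to check rather than proof.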
---
### Step 3: Identify Shared State
**Add diagnostic logging:**
```python
@pytest.fixture(autouse=True)
def log_state():
    """Log state before/after each test."""
    print(f"Before: DB has {db.count()} records")
    yield
    print(f"After: DB has {db.count()} records")
```
**Look for:**
- Record count increasing over time (no cleanup)
- Files accumulating
- Cache growing
- Ports in use
---
### Step 4: Audit for Global State
**Search codebase for isolation violations:**
```bash
# Module-level globals (upper-case names assigned at column 0)
grep -rE "^[A-Z_]+ = " app/
# Global caches
grep -r "cache = " app/
# Singletons
grep -r "@singleton" app/
grep -r "class.*Singleton" app/
```
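
`grep` matches on naming conventions, so it misses lower-case globals and flags immutable constants. A complementary sketch using the standard library's `ast` module reports only top-level names bound to mutable literals. `find_mutable_globals` is a hypothetical helper; it deliberately ignores annotated assignments and constructor calls such as `dict()`.

```python
import ast

# Literal and comprehension node types that produce mutable containers.
MUTABLE_NODES = (
    ast.Dict, ast.List, ast.Set,
    ast.DictComp, ast.ListComp, ast.SetComp,
)


def find_mutable_globals(source):
    """Return (line, name) pairs for module-level mutable literals."""
    findings = []
    for node in ast.parse(source).body:  # Top-level statements only
        if isinstance(node, ast.Assign) and isinstance(node.value, MUTABLE_NODES):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    findings.append((node.lineno, target.id))
    return findings
```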
---
## Anti-Patterns Catalog
### ❌ Cleanup Code Instead of Structural Isolation
**Symptom:** Every test has teardown code to clean up
```python
def test_checkout():
    user = create_user()
    cart = create_cart(user)
    checkout(cart)
    # Teardown
    delete_cart(cart.id)
    delete_user(user.id)
```
**Why bad:**
- If test fails before cleanup, state pollutes
- If cleanup has bugs, state pollutes
- Forces sequential execution (no parallelism)
**Fix:** Use transactions, unique IDs, or dependency injection
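
A sketch of the dependency-injection option, with hypothetical `Cart`, `InMemoryOrders`, and `checkout` definitions. Because the order store is passed in rather than reached through a module-level global, each test owns its objects and there is nothing to tear down.

```python
class Cart:
    """Hypothetical cart holding (sku, price_cents) pairs."""

    def __init__(self):
        self.items = []

    def add(self, sku, price_cents):
        self.items.append((sku, price_cents))


class InMemoryOrders:
    """Hypothetical order store; each test constructs its own instance."""

    def __init__(self):
        self.saved = []

    def save(self, order):
        self.saved.append(order)


def checkout(cart, orders):
    """The store is injected as an argument instead of being a global."""
    order = {"total_cents": sum(price for _, price in cart.items)}
    orders.save(order)
    return order


def test_checkout():
    cart, orders = Cart(), InMemoryOrders()
    cart.add("book", 1500)
    cart.add("pen", 250)
    assert checkout(cart, orders)["total_cents"] == 1750
    assert len(orders.saved) == 1
    # No teardown: cart and orders go out of scope with the test.
```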

---
### ❌ Shared Test Fixtures
**Symptom:** Fixtures modify mutable state
```python
@pytest.fixture(scope="module")
def user():
    return create_user(email="test@example.com")

def test_update_name(user):
    user.name = "Alice"  # Modifies shared fixture!
    save(user)

def test_update_email(user):
    # Expects name to be original, but Test 1 changed it!
    assert user.name == "Test User"  # FAILS
```
**Why bad:** Tests interfere when fixture is modified
**Fix:** Use `scope="function"` for mutable fixtures

---
### ❌ Hidden Dependencies on Execution Order
**Symptom:** Test suite has implicit execution order
```python
# test_a.py
def test_create_admin():
    create_user(email="admin@example.com", role="admin")

# test_b.py
def test_admin_permissions():
    admin = get_user("admin@example.com")  # Assumes test_a ran!
    assert admin.has_permission("delete_users")
```
**Why bad:** Breaks when tests run in different order or in parallel
**Fix:** Each test creates its own dependencies

---
### ❌ Testing on Production-Like State
**Symptom:** Tests run against shared database with existing data
```python
def test_user_count():
    assert db.users.count() == 100  # Assumes specific state!
```
**Why bad:**
- Tests fail when data changes
- Can't run in parallel
- Can't run idempotently
**Fix:** Use isolated test database or count relative to test's own data

---
## Common Scenarios
### Scenario 1: "Tests pass locally, fail in CI"
**Likely causes:**
1. **Timing issues** - CI is slower/faster, race conditions exposed
2. **Parallel execution** - CI runs tests in parallel, local doesn't
3. **Missing cleanup** - Local has leftover state, CI is fresh
**Diagnosis:**
```bash
# Test parallel execution locally
pytest -n 4
# Test with clean state
rm -rf .pytest_cache && pytest
```
---
### Scenario 2: "Random test failures that disappear on retry"
**Likely causes:**
1. **Race conditions** - Async operations not awaited
2. **Shared mutable state** - Global variables polluted
3. **External service flakiness** - Real APIs being called
**Diagnosis:**
```bash
# Run same test 100 times
pytest --count=100 test_flaky.py
# If failure rate is consistent (e.g., 5/100), it's likely shared state
# If failure rate varies wildly, it's likely race condition
```
---
### Scenario 3: "Database unique constraint violations"
**Symptom:** `IntegrityError: duplicate key value violates unique constraint`
**Cause:** Tests reuse same email/username/ID
**Fix:**
```python
import uuid
import pytest

@pytest.fixture
def unique_user():
    email = f"test-{uuid.uuid4()}@example.com"
    return create_user(email=email)
```
---
## Quick Reference: Isolation Strategy Decision Tree
```
What resource needs isolation?
DATABASE
├─ Can you use transactions? → Transaction Rollback (fastest)
├─ Need real commits? → Unique Data Per Test
└─ Need schema changes? → Test Database Per Test
FILES
├─ Few files? → pytest's tmp_path
└─ Complex directories? → tempfile.mkdtemp()
EXTERNAL SERVICES
├─ Testing integration? → Separate integration test suite
└─ Testing business logic? → Mock the service
IN-MEMORY STATE
├─ Caches → Clear before each test (autouse fixture)
├─ Globals → Dependency injection (refactor)
└─ Module-level → Reset in fixture or avoid entirely
PROCESSES/PORTS
└─ Dynamic port allocation per test
```
---
## Bottom Line
**Test isolation is structural, not reactive.**
- **Reactive:** Write cleanup code after each test
- **Structural:** Design tests so cleanup isn't needed
**The hierarchy:**
1. **Best:** Dependency injection (no shared state)
2. **Good:** Transactions/tmp_path (automatic cleanup)
3. **Acceptable:** Unique data per test (explicit isolation)
4. **Last resort:** Manual cleanup (fragile, error-prone)
**If your tests fail together but pass alone, you have an isolation problem. Stop adding tests and fix isolation first.**