Test Design Best Practices

Building Stable, Deterministic Tests

The foundation of a reliable test suite is deterministic tests that produce consistent results. Non-deterministic or "flaky" tests erode trust and waste time debugging phantom failures.

Eliminating Timing Dependencies

The Problem: Fixed sleep statements create unreliable tests that either waste time or fail intermittently.

# ❌ BAD: Fixed sleep
def test_async_operation():
    start_async_task()
    time.sleep(5)  # Hope 5 seconds is enough
    assert task_completed()

# ✅ GOOD: Conditional wait
def test_async_operation():
    start_async_task()
    wait_until(lambda: task_completed(), timeout=10, interval=0.1)
    assert task_completed()

Strategies:

  • Use explicit waits with conditions, not arbitrary sleeps
  • Implement timeout-based polling for async operations
  • Use testing framework wait utilities (e.g., Selenium WebDriverWait; see the sketch after this list)
  • Mock time-dependent code to make tests deterministic
  • Use fast fake timers in tests instead of actual delays
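
For browser tests, Selenium's WebDriverWait expresses the same conditional-wait idea. A minimal sketch, assuming a driver fixture from your conftest and a page that renders an element with id "result" (both hypothetical):

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def test_result_appears(driver):
    driver.get("https://app.example.com/dashboard")
    # Polls (every 500ms by default) until the element exists or the 10-second timeout expires
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "result"))
    )
    assert element.is_displayed()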

Example: Polling with Timeout

import time

def wait_until(condition, timeout=10, interval=0.1):
    """Wait until condition is true or timeout expires."""
    end_time = time.time() + timeout
    while time.time() < end_time:
        if condition():
            return True
        time.sleep(interval)
    raise TimeoutError(f"Condition not met within {timeout} seconds")

def test_message_processing():
    queue.publish(message)

    wait_until(lambda: queue.is_empty(), timeout=5)

    assert message_handler.processed_count == 1

Avoiding Shared State

The Problem: Tests that share state create order dependencies and cascading failures.

# ❌ BAD: Shared state
shared_user = None

def test_create_user():
    global shared_user
    shared_user = User.create(email="test@example.com")
    assert shared_user.id is not None

def test_user_can_login():
    # Depends on previous test
    result = login(shared_user.email, "password")
    assert result.success

# ✅ GOOD: Independent tests
from uuid import uuid4

@pytest.fixture
def user():
    """Each test gets its own user."""
    user = User.create(email=f"test-{uuid4()}@example.com")
    yield user
    user.delete()

def test_create_user(user):
    assert user.id is not None

def test_user_can_login(user):
    result = login(user.email, "password")
    assert result.success

Strategies:

  • Use test fixtures that create fresh instances
  • Employ database transactions that rollback after each test
  • Use in-memory databases that reset between tests
  • Avoid global variables and singletons in tests
  • Make each test completely self-contained

Database Isolation Pattern

@pytest.fixture
def db_transaction():
    """Each test runs in a transaction that rolls back."""
    connection = engine.connect()
    transaction = connection.begin()
    # SQLAlchemy 2.0+: join the external transaction via SAVEPOINTs so that
    # session.commit() inside the test is still undone by the final rollback
    session = Session(bind=connection, join_transaction_mode="create_savepoint")

    yield session

    session.close()
    transaction.rollback()
    connection.close()

def test_user_creation(db_transaction):
    user = User(email="test@example.com")
    db_transaction.add(user)
    db_transaction.commit()

    assert db_transaction.query(User).count() == 1
    # Transaction rolls back after test - no cleanup needed

Handling External Dependencies

The Problem: Tests depending on external systems (APIs, databases, file systems) become non-deterministic and slow.

Strategies:

  • Mock external HTTP APIs: Use libraries like responses, httpretty, or requests-mock
  • Use in-memory databases: SQLite in-memory for SQL databases
  • Fake file systems: Use pyfakefs or similar for file operations
  • Container-based dependencies: Use Testcontainers for isolated, real dependencies
  • Test doubles: Stubs, mocks, and fakes for external services

Mocking HTTP Calls

import responses

@responses.activate
def test_fetch_user_data():
    responses.add(
        responses.GET,
        'https://api.example.com/users/123',
        json={'id': 123, 'name': 'Test User'},
        status=200
    )

    user_data = api_client.get_user(123)

    assert user_data['name'] == 'Test User'
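
Using an In-Memory Database

The same isolation idea applies to SQL dependencies. A minimal sketch using the standard library's sqlite3 module, giving each test a fresh in-memory database:

import sqlite3

def test_user_email_roundtrip():
    # New in-memory database per test: fast and fully isolated
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
    conn.execute("INSERT INTO users (email) VALUES (?)", ("test@example.com",))

    row = conn.execute("SELECT email FROM users").fetchone()

    assert row[0] == "test@example.com"
    conn.close()

When a test genuinely needs the real engine, Testcontainers starts a disposable instance in Docker. A sketch assuming the testcontainers package and a running Docker daemon:

from testcontainers.postgres import PostgresContainer

def test_against_real_postgres():
    # Throwaway PostgreSQL container for the duration of the with-block
    with PostgresContainer("postgres:16") as postgres:
        url = postgres.get_connection_url()
        # Hand `url` to your database driver or ORM
        assert url.startswith("postgresql")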

Controlling Randomness and Time

Random Data: Use fixed seeds for reproducible randomness

import random

# ❌ BAD: Non-deterministic
def test_shuffle():
    items = [1, 2, 3, 4, 5]
    random.shuffle(items)  # Different result each run
    assert items[0] < items[-1]

# ✅ GOOD: Deterministic
def test_shuffle():
    random.seed(42)
    items = [1, 2, 3, 4, 5]
    random.shuffle(items)
    assert items == [2, 4, 5, 3, 1]  # Same order every run (value pinned from an initial seeded run)

Time-Dependent Code: Mock current time

from datetime import datetime
from unittest.mock import patch

# ❌ BAD: Depends on actual current time
def test_is_expired():
    expiry = datetime(2024, 1, 1)
    assert is_expired(expiry)  # Fails after 2024-01-01

# ✅ GOOD: Freezes time
@patch('mymodule.datetime')
def test_is_expired(mock_datetime):
    mock_datetime.now.return_value = datetime(2024, 6, 1)

    expiry = datetime(2024, 1, 1)

    assert is_expired(expiry)
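
If the freezegun library is available, the same test can freeze time declaratively instead of patching module attributes; a sketch assuming is_expired compares the expiry against the current time:

from datetime import datetime
from freezegun import freeze_time

@freeze_time("2024-06-01")
def test_is_expired():
    # Inside this test, datetime.now() returns 2024-06-01
    expiry = datetime(2024, 1, 1)
    assert is_expired(expiry)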

Boundary Analysis and Equivalence Partitioning

Systematic approaches to test input selection ensure comprehensive coverage without exhaustive testing.

Boundary Value Analysis

Principle: Bugs often occur at the edges of input ranges. Test boundaries explicitly.

For a function accepting ages 0-120:

  • Test: -1 (below minimum)
  • Test: 0 (minimum boundary)
  • Test: 1 (just above minimum)
  • Test: 119 (just below maximum)
  • Test: 120 (maximum boundary)
  • Test: 121 (above maximum)

Example Implementation

def validate_age(age):
    if age < 0 or age > 120:
        raise ValueError("Age must be between 0 and 120")
    return True

@pytest.mark.parametrize("age,should_pass", [
    (-1, False),   # Below minimum
    (0, True),     # Minimum boundary
    (1, True),     # Just above minimum
    (60, True),    # Typical value
    (119, True),   # Just below maximum
    (120, True),   # Maximum boundary
    (121, False),  # Above maximum
])
def test_age_validation(age, should_pass):
    if should_pass:
        assert validate_age(age) == True
    else:
        with pytest.raises(ValueError):
            validate_age(age)

Equivalence Partitioning

Principle: Divide input space into classes that should behave identically. Test one representative from each class.

For a discount function based on purchase amount:

  • Partition 1: $0-100 (no discount)
  • Partition 2: $101-500 (10% discount)
  • Partition 3: $501+ (20% discount)

Test representatives: $50, $300, $600

Example Implementation

def calculate_discount(amount):
    if amount <= 100:
        return 0
    elif amount <= 500:
        return amount * 0.10
    else:
        return amount * 0.20

@pytest.mark.parametrize("amount,expected_discount", [
    (50, 0),         # Partition 1: 0-100
    (100, 0),        # Boundary
    (101, 10.10),    # Just into partition 2
    (300, 30),       # Partition 2: 101-500
    (500, 50),       # Boundary
    (501, 100.20),   # Just into partition 3
    (1000, 200),     # Partition 3: 501+
])
def test_discount_calculation(amount, expected_discount):
    assert calculate_discount(amount) == pytest.approx(expected_discount)

Property-Based Testing

Principle: Define invariants that should always hold, then generate hundreds of test cases automatically.

Example: Reversing a List

from collections import Counter

from hypothesis import given
from hypothesis.strategies import lists, integers

@given(lists(integers()))
def test_reverse_twice_returns_original(items):
    """Reversing a list twice should return the original."""
    reversed_once = list(reversed(items))
    reversed_twice = list(reversed(reversed_once))
    assert reversed_twice == items

@given(lists(integers()))
def test_sorted_list_has_same_elements(items):
    """Sorted list should contain the same elements as the original."""
    sorted_items = sorted(items)
    assert Counter(sorted_items) == Counter(items)
    assert len(sorted_items) == len(items)

Benefits:

  • Finds edge cases you didn't think to test
  • Tests many more scenarios than manual tests
  • Shrinks failing examples to minimal reproducible cases
  • Documents properties/invariants of the code
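
Round-trip properties are a particularly common application; a minimal sketch using Python's json module:

import json

from hypothesis import given
from hypothesis.strategies import dictionaries, text, integers

@given(dictionaries(text(), integers()))
def test_json_roundtrip(data):
    """Serializing then deserializing should return the original value."""
    assert json.loads(json.dumps(data)) == data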

Writing Expressive, Maintainable Tests

Arrange-Act-Assert (AAA) Pattern

Structure: Organize tests into three clear sections for maximum readability.

def test_user_registration():
    # Arrange: Set up test data and dependencies
    email = "newuser@example.com"
    password = "SecurePass123!"
    user_service = UserService(database, email_service)

    # Act: Execute the behavior being tested
    result = user_service.register(email, password)

    # Assert: Verify expected outcomes
    assert result.success == True
    assert result.user.email == email
    assert result.user.is_verified == False

Benefits:

  • Immediately clear what the test does
  • Easy to identify what's being tested
  • Simple to understand failure location
  • Consistent structure across test suite

Variations:

  • Use blank lines to separate sections
  • Use comments to label sections
  • BDD Given-When-Then is an equivalent structure (sketched below)
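
A Given-When-Then rendering of the same structure, using hypothetical Customer and calculate_discount names:

def test_vip_customer_receives_discount():
    # Given a VIP customer
    customer = Customer(type="VIP")

    # When the discount on a $100 order is calculated
    discount = calculate_discount(customer, 100)

    # Then the VIP discount applies
    assert discount == 10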

Descriptive Test Names

Principle: A test's name should convey the scenario and the expected outcome, so a failure is understandable without reading the test body.

Naming Patterns:

# Pattern 1: test_[function]_[scenario]_[expected_result]
def test_calculate_discount_for_vip_customer_returns_ten_percent():
    pass

# Pattern 2: test_[scenario]_[expected_result]
def test_invalid_email_raises_validation_error():
    pass

# Pattern 3: Natural language with underscores
def test_user_cannot_login_with_expired_password():
    pass

# Pattern 4: BDD-style
def test_given_vip_customer_when_ordering_then_receives_discount():
    pass

Good Test Names:

  • test_empty_cart_checkout_raises_error
  • test_duplicate_email_registration_rejected
  • test_expired_token_authentication_fails
  • test_concurrent_order_updates_handled_correctly

Bad Test Names:

  • test_1, test_2, test_3 (meaningless)
  • test_order (what about orders?)
  • test_the_thing_works (what thing? how?)
  • test_bug_fix (what bug?)

Clear, Actionable Assertions

Principle: Assertion failures should immediately communicate what went wrong.

# ❌ BAD: Unclear failure message
def test_discount_calculation():
    assert calculate_discount(100) == 10

# Output: AssertionError: assert 0 == 10
# (Why did we expect 10? What was the input context?)

# ✅ GOOD: Clear failure message
def test_discount_calculation():
    customer = Customer(type="VIP")
    order_total = 100

    discount = calculate_discount(customer, order_total)

    assert discount == 10, \
        f"VIP customer with ${order_total} order should get $10 discount, got ${discount}"

# Output: AssertionError: VIP customer with $100 order should get $10 discount, got $0

Custom Assertion Messages:

# Complex conditions benefit from explanatory messages
assert user.is_active, \
    f"User {user.email} should be active after verification, but is_active={user.is_active}"

assert len(results) > 0, \
    f"Search for '{search_term}' should return results, got empty list"

assert response.status_code == 200, \
    f"GET /api/users/{user_id} should return 200, got {response.status_code}: {response.text}"

Keep Tests Focused and Small

Principle: Each test should verify one behavior or logical assertion.

# ❌ BAD: Testing too much
def test_user_lifecycle():
    # Create user
    user = User.create(email="test@example.com")
    assert user.id is not None

    # Activate user
    user.activate()
    assert user.is_active

    # Update profile
    user.update_profile(name="Test User")
    assert user.name == "Test User"

    # Deactivate user
    user.deactivate()
    assert not user.is_active

    # Delete user
    user.delete()
    assert User.find(user.id) is None

# ✅ GOOD: Focused tests
def test_create_user_assigns_id():
    user = User.create(email="test@example.com")
    assert user.id is not None

def test_activate_user_sets_active_flag():
    user = User.create(email="test@example.com")

    user.activate()

    assert user.is_active == True

def test_update_profile_changes_name():
    user = User.create(email="test@example.com")

    user.update_profile(name="Test User")

    assert user.name == "Test User"

When Multiple Assertions Are OK:

  • Verifying different properties of the same object
  • Checking related side effects of single action
  • Asserting preconditions and postconditions

# ✅ GOOD: Multiple related assertions
def test_order_creation():
    items = [Item("Widget", 10), Item("Gadget", 20)]

    order = Order.create(items)

    assert order.id is not None
    assert order.total == 30
    assert len(order.items) == 2
    assert order.status == "pending"

Test Data Management

Fixtures: Providing Test Dependencies

Fixtures provide reusable test dependencies and setup/teardown logic.

Pytest Fixtures

@pytest.fixture
def database():
    """Provide a test database."""
    db = create_test_database()
    yield db
    db.close()

@pytest.fixture
def user(database):
    """Provide a test user."""
    user = User.create(email="test@example.com")
    yield user
    user.delete()

def test_user_can_place_order(user, database):
    order = Order.create(user=user, items=[Item("Widget", 10)])
    assert order.user_id == user.id

Fixture Scopes:

  • function: New instance per test (default)
  • class: Shared across test class
  • module: Shared across test file
  • session: Shared across entire test run

@pytest.fixture(scope="session")
def app():
    """Start application once for entire test session."""
    app = create_app()
    app.start()
    yield app
    app.stop()

Factories: Flexible Test Data Creation

Factories create test objects with sensible defaults that can be customized.

# Factory pattern
class UserFactory:
    _counter = 0

    @classmethod
    def create(cls, **kwargs):
        cls._counter += 1
        defaults = {
            'email': f'user{cls._counter}@example.com',
            'name': f'Test User {cls._counter}',
            'role': 'user',
            'is_active': True
        }
        defaults.update(kwargs)
        return User(**defaults)

# Usage
def test_admin_user():
    admin = UserFactory.create(role='admin', name='Admin User')
    assert admin.role == 'admin'

def test_inactive_user():
    inactive = UserFactory.create(is_active=False)
    assert not inactive.is_active

FactoryBoy Library

import factory

class UserFactory(factory.Factory):
    class Meta:
        model = User

    email = factory.Sequence(lambda n: f'user{n}@example.com')
    name = factory.Faker('name')
    role = 'user'
    is_active = True

# Usage
def test_user_creation():
    user = UserFactory()
    assert user.email.endswith('@example.com')

def test_admin_user():
    admin = UserFactory(role='admin')
    assert admin.role == 'admin'

Builders: Constructing Complex Objects

Builders provide fluent APIs for creating complex test objects.

class OrderBuilder:
    def __init__(self):
        self._user = None
        self._items = []
        self._status = 'pending'
        self._shipping_address = None

    def with_user(self, user):
        self._user = user
        return self

    def with_item(self, name, price, quantity=1):
        self._items.append(Item(name, price, quantity))
        return self

    def with_status(self, status):
        self._status = status
        return self

    def with_shipping(self, address):
        self._shipping_address = address
        return self

    def build(self):
        return Order(
            user=self._user,
            items=self._items,
            status=self._status,
            shipping_address=self._shipping_address
        )

# Usage
def test_order_total_calculation():
    order = (OrderBuilder()
        .with_item("Widget", 10, quantity=2)
        .with_item("Gadget", 20)
        .build())

    assert order.total == 40

Snapshot/Golden Master Testing

Principle: Capture complex output as baseline, verify future runs match.

Use Cases:

  • Testing rendered HTML or UI components
  • Verifying generated reports or documents
  • Validating complex JSON/XML output
  • Testing data transformations with many fields

def test_user_profile_rendering(snapshot):
    user = User(name="John Doe", email="john@example.com", age=30)

    html = render_user_profile(user)

    snapshot.assert_match(html, 'user_profile.html')

# First run: Captures output as baseline
# Subsequent runs: Compares against baseline
# To update baseline: pytest --snapshot-update

Trade-offs:

  • Pro: Easy to test complex outputs
  • Pro: Catches unintended changes
  • Con: Changes to output format require updating all snapshots
  • Con: Can hide real regressions if snapshots updated without review

Mocking, Stubbing, and Fakes

Understanding the Distinctions

Stub: Provides canned responses to method calls

class StubEmailService:
    def send_email(self, to, subject, body):
        return True  # Always succeeds, doesn't actually send

Mock: Records interactions and verifies they occurred as expected

from unittest.mock import Mock

email_service = Mock()
user_service = UserService(email_service=email_service)  # inject the mock
user_service.register(user)

email_service.send_welcome_email.assert_called_once_with(user.email)

Fake: Simplified working implementation

class FakeDatabase:
    def __init__(self):
        self._data = {}

    def save(self, id, value):
        self._data[id] = value

    def find(self, id):
        return self._data.get(id)

When to Use Each Type

Use Stubs for query dependencies that provide data

def test_calculate_user_discount():
    stub_repository = StubUserRepository()
    stub_repository.set_return_value(User(type="VIP"))

    discount_service = DiscountService(stub_repository)

    discount = discount_service.calculate_for_user(user_id=123)

    assert discount == 10

Use Mocks for command dependencies where interaction matters

def test_order_completion_sends_confirmation():
    mock_email_service = Mock()
    order_service = OrderService(mock_email_service)

    order_service.complete_order(order_id=123)

    mock_email_service.send_confirmation.assert_called_once()

Use Fakes for complex dependencies needing realistic behavior

def test_user_repository_operations():
    fake_db = FakeDatabase()
    repository = UserRepository(fake_db)

    user = User(email="test@example.com")
    repository.save(user)

    found = repository.find_by_email("test@example.com")
    assert found.email == user.email

Avoiding Over-Mocking

The Problem: Mocking every dependency couples tests to implementation.

# ❌ BAD: Over-mocking
def test_order_creation():
    mock_validator = Mock()
    mock_repository = Mock()
    mock_calculator = Mock()
    mock_logger = Mock()

    service = OrderService(mock_validator, mock_repository, mock_calculator, mock_logger)

    service.create_order(data)

    mock_validator.validate.assert_called_once()
    mock_calculator.calculate_total.assert_called_once()
    mock_repository.save.assert_called_once()
    mock_logger.log.assert_called()

# ✅ GOOD: Mock only external boundaries
def test_order_creation():
    repository = InMemoryOrderRepository()
    service = OrderService(repository)

    order = service.create_order(data)

    assert order.total == 100
    assert repository.find(order.id) is not None

Guidelines:

  • Mock external systems (databases, APIs, file systems)
  • Use real implementations for internal collaborators
  • Don't verify every method call
  • Focus on observable behavior, not interactions

Summary: Test Design Principles

  1. Deterministic tests: Eliminate timing issues, shared state, and external dependencies
  2. Boundary testing: Test edges systematically where bugs hide
  3. Expressive tests: Clear names, AAA structure, actionable failures
  4. Focused tests: One behavior per test, small and understandable
  5. Flexible test data: Use factories and builders for maintainable test data
  6. Strategic mocking: Mock at system boundaries, use real implementations internally
  7. Fast execution: Keep unit tests fast for rapid feedback loops

These practices work together to create test suites that catch bugs, enable refactoring, and remain maintainable over time.