# Test Design Best Practices

## Building Stable, Deterministic Tests

The foundation of a reliable test suite is deterministic tests that produce consistent results. Non-deterministic or "flaky" tests erode trust and waste time debugging phantom failures.

### Eliminating Timing Dependencies

**The Problem**: Fixed sleep statements create unreliable tests that either waste time or fail intermittently.

```python
# ❌ BAD: Fixed sleep
def test_async_operation():
    start_async_task()
    time.sleep(5)  # Hope 5 seconds is enough
    assert task_completed()

# ✅ GOOD: Conditional wait
def test_async_operation():
    start_async_task()
    wait_until(lambda: task_completed(), timeout=10, interval=0.1)
    assert task_completed()
```

**Strategies**:

- Use explicit waits with conditions, not arbitrary sleeps
- Implement timeout-based polling for async operations
- Use testing framework wait utilities (e.g., Selenium `WebDriverWait`)
- Mock time-dependent code to make tests deterministic
- Use fast fake timers in tests instead of actual delays

**Example: Polling with Timeout**

```python
import time

def wait_until(condition, timeout=10, interval=0.1):
    """Wait until condition is true or timeout expires."""
    end_time = time.time() + timeout
    while time.time() < end_time:
        if condition():
            return True
        time.sleep(interval)
    raise TimeoutError(f"Condition not met within {timeout} seconds")

def test_message_processing():
    queue.publish(message)
    wait_until(lambda: queue.is_empty(), timeout=5)
    assert message_handler.processed_count == 1
```

### Avoiding Shared State

**The Problem**: Tests that share state create order dependencies and cascading failures.

```python
# ❌ BAD: Shared state
shared_user = None

def test_create_user():
    global shared_user
    shared_user = User.create(email="test@example.com")
    assert shared_user.id is not None

def test_user_can_login():
    # Depends on the previous test having run
    result = login(shared_user.email, "password")
    assert result.success

# ✅ GOOD: Independent tests
from uuid import uuid4

@pytest.fixture
def user():
    """Each test gets its own user."""
    user = User.create(email=f"test-{uuid4()}@example.com")
    yield user
    user.delete()

def test_create_user(user):
    assert user.id is not None

def test_user_can_login(user):
    result = login(user.email, "password")
    assert result.success
```

**Strategies**:

- Use test fixtures that create fresh instances
- Employ database transactions that roll back after each test
- Use in-memory databases that reset between tests
- Avoid global variables and singletons in tests
- Make each test completely self-contained

**Database Isolation Pattern**

```python
@pytest.fixture
def db_transaction():
    """Each test runs in a transaction that rolls back."""
    connection = engine.connect()
    transaction = connection.begin()
    session = Session(bind=connection)
    yield session
    session.close()
    transaction.rollback()
    connection.close()

def test_user_creation(db_transaction):
    user = User(email="test@example.com")
    db_transaction.add(user)
    db_transaction.commit()
    assert db_transaction.query(User).count() == 1
    # Transaction rolls back after the test - no cleanup needed
```

### Handling External Dependencies

**The Problem**: Tests that depend on external systems (APIs, databases, file systems) become non-deterministic and slow.
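For instance, a test that calls a live endpoint directly is hostage to the network, the remote service's uptime, and whatever data it happens to hold. A minimal sketch of the anti-pattern (the `api_client` and endpoint are the same illustrative ones used in the mocking example below):

```python
# ❌ BAD: Depends on a live external service
def test_fetch_user_data():
    # Real network call: slow, and fails whenever the API or network is down
    user_data = api_client.get_user(123)
    # Remote data can change at any time, so this assertion is unstable
    assert user_data['name'] == 'Test User'
```

The strategies below replace calls like this with deterministic substitutes.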
**Strategies**:

- **Mock external HTTP APIs**: Use libraries like `responses`, `httpretty`, or `requests-mock`
- **Use in-memory databases**: SQLite in-memory for SQL databases
- **Fake file systems**: Use `pyfakefs` or similar for file operations
- **Container-based dependencies**: Use Testcontainers for isolated, real dependencies
- **Test doubles**: Stubs, mocks, and fakes for external services

**Mocking HTTP Calls**

```python
import responses

@responses.activate
def test_fetch_user_data():
    responses.add(
        responses.GET,
        'https://api.example.com/users/123',
        json={'id': 123, 'name': 'Test User'},
        status=200
    )

    user_data = api_client.get_user(123)
    assert user_data['name'] == 'Test User'
```

### Controlling Randomness and Time

**Random Data**: Use fixed seeds for reproducible randomness.

```python
import random

# ❌ BAD: Non-deterministic
def test_shuffle():
    items = [1, 2, 3, 4, 5]
    random.shuffle(items)  # Different result each run
    assert items[0] < items[-1]

# ✅ GOOD: Deterministic
def test_shuffle():
    random.seed(42)
    items = [1, 2, 3, 4, 5]
    random.shuffle(items)
    assert items == [2, 4, 5, 3, 1]  # Always the same order
```

**Time-Dependent Code**: Mock the current time.

```python
from datetime import datetime
from unittest.mock import patch

# ❌ BAD: Depends on the actual current time
def test_is_expired():
    expiry = datetime(2024, 1, 1)
    assert is_expired(expiry)  # Fails whenever it runs before 2024-01-01

# ✅ GOOD: Freezes time
@patch('mymodule.datetime')
def test_is_expired(mock_datetime):
    mock_datetime.now.return_value = datetime(2024, 6, 1)
    expiry = datetime(2024, 1, 1)
    assert is_expired(expiry)
```

---

## Boundary Analysis and Equivalence Partitioning

Systematic approaches to test input selection ensure comprehensive coverage without exhaustive testing.

### Boundary Value Analysis

**Principle**: Bugs often occur at the edges of input ranges. Test boundaries explicitly.

**For a function accepting ages 0-120:**

- Test: -1 (below minimum)
- Test: 0 (minimum boundary)
- Test: 1 (just above minimum)
- Test: 119 (just below maximum)
- Test: 120 (maximum boundary)
- Test: 121 (above maximum)

**Example Implementation**

```python
def validate_age(age):
    if age < 0 or age > 120:
        raise ValueError("Age must be between 0 and 120")
    return True

@pytest.mark.parametrize("age,should_pass", [
    (-1, False),   # Below minimum
    (0, True),     # Minimum boundary
    (1, True),     # Just above minimum
    (60, True),    # Typical value
    (119, True),   # Just below maximum
    (120, True),   # Maximum boundary
    (121, False),  # Above maximum
])
def test_age_validation(age, should_pass):
    if should_pass:
        assert validate_age(age) == True
    else:
        with pytest.raises(ValueError):
            validate_age(age)
```

### Equivalence Partitioning

**Principle**: Divide the input space into classes that should behave identically, then test one representative from each class.
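Partitions need not be numeric ranges; classes of input validity work the same way. A minimal sketch, assuming a hypothetical `parse_quantity` helper that accepts strings holding positive integers:

```python
import pytest

def parse_quantity(text):
    """Hypothetical helper: parse a positive integer quantity from a string."""
    value = int(text)  # raises ValueError for non-numeric input
    if value <= 0:
        raise ValueError("Quantity must be positive")
    return value

@pytest.mark.parametrize("text", ["1", "7", "9999"])
def test_valid_quantities_parse(text):
    # One representative per valid class: minimum, typical, large
    assert parse_quantity(text) > 0

@pytest.mark.parametrize("text", ["0", "-3", "abc", ""])
def test_invalid_quantities_rejected(text):
    # One representative per invalid class: zero, negative, non-numeric, empty
    with pytest.raises(ValueError):
        parse_quantity(text)
```

The worked example below applies the same technique to numeric ranges.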
**For a discount function based on purchase amount:**

- Partition 1: $0-100 (no discount)
- Partition 2: $101-500 (10% discount)
- Partition 3: $501+ (20% discount)

Test representatives: $50, $300, $600

**Example Implementation**

```python
def calculate_discount(amount):
    if amount <= 100:
        return 0
    elif amount <= 500:
        return amount * 0.10
    else:
        return amount * 0.20

@pytest.mark.parametrize("amount,expected_discount", [
    (50, 0),        # Partition 1: 0-100
    (100, 0),       # Boundary
    (101, 10.10),   # Just into partition 2
    (300, 30),      # Partition 2: 101-500
    (500, 50),      # Boundary
    (501, 100.20),  # Just into partition 3
    (1000, 200),    # Partition 3: 501+
])
def test_discount_calculation(amount, expected_discount):
    assert calculate_discount(amount) == pytest.approx(expected_discount)
```

### Property-Based Testing

**Principle**: Define invariants that should always hold, then generate hundreds of test cases automatically.

**Example: Reversing a List**

```python
from collections import Counter

from hypothesis import given
from hypothesis.strategies import lists, integers

@given(lists(integers()))
def test_reverse_twice_returns_original(items):
    """Reversing a list twice should return the original."""
    reversed_once = list(reversed(items))
    reversed_twice = list(reversed(reversed_once))
    assert reversed_twice == items

@given(lists(integers()))
def test_sorted_list_has_same_elements(items):
    """The sorted list should contain exactly the same elements as the original."""
    sorted_items = sorted(items)
    assert Counter(sorted_items) == Counter(items)
    assert len(sorted_items) == len(items)
```

**Benefits**:

- Finds edge cases you didn't think to test
- Tests many more scenarios than manual tests
- Shrinks failing examples to minimal reproducible cases
- Documents properties/invariants of the code

---

## Writing Expressive, Maintainable Tests

### Arrange-Act-Assert (AAA) Pattern

**Structure**: Organize tests into three clear sections for maximum readability.

```python
def test_user_registration():
    # Arrange: Set up test data and dependencies
    email = "newuser@example.com"
    password = "SecurePass123!"
    user_service = UserService(database, email_service)

    # Act: Execute the behavior being tested
    result = user_service.register(email, password)

    # Assert: Verify expected outcomes
    assert result.success == True
    assert result.user.email == email
    assert result.user.is_verified == False
```

**Benefits**:

- Immediately clear what the test does
- Easy to identify what's being tested
- Simple to understand where a failure occurred
- Consistent structure across the test suite

**Variations**:

- Use blank lines to separate sections
- Use comments to label sections
- BDD's Given-When-Then is an equivalent structure

### Descriptive Test Names

**Principle**: Test names should describe the scenario and expected outcome without requiring the reader to open the test code.

**Naming Patterns**:

```python
# Pattern 1: test_[function]_[scenario]_[expected_result]
def test_calculate_discount_for_vip_customer_returns_ten_percent():
    pass

# Pattern 2: test_[scenario]_[expected_result]
def test_invalid_email_raises_validation_error():
    pass

# Pattern 3: Natural language with underscores
def test_user_cannot_login_with_expired_password():
    pass

# Pattern 4: BDD-style
def test_given_vip_customer_when_ordering_then_receives_discount():
    pass
```

**Good Test Names**:

- ✅ `test_empty_cart_checkout_raises_error`
- ✅ `test_duplicate_email_registration_rejected`
- ✅ `test_expired_token_authentication_fails`
- ✅ `test_concurrent_order_updates_handled_correctly`

**Bad Test Names**:

- ❌ `test_1`, `test_2`, `test_3` (meaningless)
- ❌ `test_order` (what about orders?)
- ❌ `test_the_thing_works` (what thing? how?)
- ❌ `test_bug_fix` (what bug?)

### Clear, Actionable Assertions

**Principle**: Assertion failures should immediately communicate what went wrong.

```python
# ❌ BAD: Unclear failure message
def test_discount_calculation():
    assert calculate_discount(100) == 10
    # Output: AssertionError: assert 0 == 10
    # (Why did we expect 10? What was the input context?)

# ✅ GOOD: Clear failure message
def test_discount_calculation():
    customer = Customer(type="VIP")
    order_total = 100

    discount = calculate_discount(customer, order_total)

    assert discount == 10, \
        f"VIP customer with ${order_total} order should get $10 discount, got ${discount}"
    # Output: AssertionError: VIP customer with $100 order should get $10 discount, got $0
```

**Custom Assertion Messages**:

```python
# Complex conditions benefit from explanatory messages
assert user.is_active, \
    f"User {user.email} should be active after verification, but is_active={user.is_active}"

assert len(results) > 0, \
    f"Search for '{search_term}' should return results, got empty list"

assert response.status_code == 200, \
    f"GET /api/users/{user_id} should return 200, got {response.status_code}: {response.text}"
```

### Keep Tests Focused and Small

**Principle**: Each test should verify one behavior or logical assertion.

```python
# ❌ BAD: Testing too much
def test_user_lifecycle():
    # Create user
    user = User.create(email="test@example.com")
    assert user.id is not None

    # Activate user
    user.activate()
    assert user.is_active

    # Update profile
    user.update_profile(name="Test User")
    assert user.name == "Test User"

    # Deactivate user
    user.deactivate()
    assert not user.is_active

    # Delete user
    user.delete()
    assert User.find(user.id) is None

# ✅ GOOD: Focused tests
def test_create_user_assigns_id():
    user = User.create(email="test@example.com")
    assert user.id is not None

def test_activate_user_sets_active_flag():
    user = User.create(email="test@example.com")
    user.activate()
    assert user.is_active == True

def test_update_profile_changes_name():
    user = User.create(email="test@example.com")
    user.update_profile(name="Test User")
    assert user.name == "Test User"
```

**When Multiple Assertions Are OK**:

- Verifying different properties of the same object
- Checking related side effects of a single action
- Asserting preconditions and postconditions

```python
# ✅ GOOD: Multiple related assertions
def test_order_creation():
    items = [Item("Widget", 10), Item("Gadget", 20)]

    order = Order.create(items)

    assert order.id is not None
    assert order.total == 30
    assert len(order.items) == 2
    assert order.status == "pending"
```

---

## Test Data Management

### Fixtures: Providing Test Dependencies

**Fixtures** provide reusable test dependencies and setup/teardown logic.
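pytest expresses this as fixture functions, shown next; for contrast, the standard library's `unittest` plays the same role with `setUp` and `tearDown` methods. A minimal sketch, reusing the illustrative `User` model and `create_test_database` helper from elsewhere in this guide:

```python
import unittest

class UserTests(unittest.TestCase):
    def setUp(self):
        # Runs before every test method
        self.db = create_test_database()
        self.user = User.create(email="test@example.com")

    def tearDown(self):
        # Runs after every test method, even if the test failed
        self.user.delete()
        self.db.close()

    def test_user_has_id(self):
        self.assertIsNotNone(self.user.id)
```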
**Pytest Fixtures**

```python
@pytest.fixture
def database():
    """Provide a test database."""
    db = create_test_database()
    yield db
    db.close()

@pytest.fixture
def user(database):
    """Provide a test user."""
    user = User.create(email="test@example.com")
    yield user
    user.delete()

def test_user_can_place_order(user, database):
    order = Order.create(user=user, items=[Item("Widget", 10)])
    assert order.user_id == user.id
```

**Fixture Scopes**:

- `function`: New instance per test (default)
- `class`: Shared across a test class
- `module`: Shared across a test file
- `session`: Shared across the entire test run

```python
@pytest.fixture(scope="session")
def app():
    """Start the application once for the entire test session."""
    app = create_app()
    app.start()
    yield app
    app.stop()
```

### Factories: Flexible Test Data Creation

**Factories** create test objects with sensible defaults that can be customized.

```python
# Factory pattern
class UserFactory:
    _counter = 0

    @classmethod
    def create(cls, **kwargs):
        cls._counter += 1
        defaults = {
            'email': f'user{cls._counter}@example.com',
            'name': f'Test User {cls._counter}',
            'role': 'user',
            'is_active': True
        }
        defaults.update(kwargs)
        return User(**defaults)

# Usage
def test_admin_user():
    admin = UserFactory.create(role='admin', name='Admin User')
    assert admin.role == 'admin'

def test_inactive_user():
    inactive = UserFactory.create(is_active=False)
    assert not inactive.is_active
```

**FactoryBoy Library**

```python
import factory

class UserFactory(factory.Factory):
    class Meta:
        model = User

    email = factory.Sequence(lambda n: f'user{n}@example.com')
    name = factory.Faker('name')
    role = 'user'
    is_active = True

# Usage
def test_user_creation():
    user = UserFactory()
    assert user.email.endswith('@example.com')

def test_admin_user():
    admin = UserFactory(role='admin')
    assert admin.role == 'admin'
```

### Builders: Constructing Complex Objects

**Builders** provide fluent APIs for creating complex test objects.

```python
class OrderBuilder:
    def __init__(self):
        self._user = None
        self._items = []
        self._status = 'pending'
        self._shipping_address = None

    def with_user(self, user):
        self._user = user
        return self

    def with_item(self, name, price, quantity=1):
        self._items.append(Item(name, price, quantity))
        return self

    def with_status(self, status):
        self._status = status
        return self

    def with_shipping(self, address):
        self._shipping_address = address
        return self

    def build(self):
        return Order(
            user=self._user,
            items=self._items,
            status=self._status,
            shipping_address=self._shipping_address
        )

# Usage
def test_order_total_calculation():
    order = (OrderBuilder()
             .with_item("Widget", 10, quantity=2)
             .with_item("Gadget", 20)
             .build())

    assert order.total == 40
```

### Snapshot/Golden Master Testing

**Principle**: Capture complex output as a baseline, then verify that future runs match it.
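In its simplest form this is just a comparison against a stored file. A minimal hand-rolled sketch, assuming a hypothetical `render_report` function under test and a `goldens/` directory checked into version control (the pytest-snapshot example below packages the same idea):

```python
from pathlib import Path

GOLDEN_DIR = Path("goldens")  # baseline files live in version control

def assert_matches_golden(name, actual):
    """Compare actual output to the stored baseline; create it on first run."""
    golden = GOLDEN_DIR / name
    if not golden.exists():
        golden.parent.mkdir(parents=True, exist_ok=True)
        golden.write_text(actual)  # first run captures the baseline
        return
    assert actual == golden.read_text(), f"Output differs from baseline {golden}"

def test_monthly_report_rendering():
    report = render_report(month="2024-01")  # hypothetical function under test
    assert_matches_golden("monthly_report.txt", report)
```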
**Use Cases**:

- Testing rendered HTML or UI components
- Verifying generated reports or documents
- Validating complex JSON/XML output
- Testing data transformations with many fields

```python
def test_user_profile_rendering(snapshot):
    user = User(name="John Doe", email="john@example.com", age=30)

    html = render_user_profile(user)

    snapshot.assert_match(html, 'user_profile.html')
    # First run: captures output as the baseline
    # Subsequent runs: compares against the baseline
    # To update the baseline: pytest --snapshot-update
```

**Trade-offs**:

- **Pro**: Easy to test complex outputs
- **Pro**: Catches unintended changes
- **Con**: Changes to the output format require updating all snapshots
- **Con**: Can hide real regressions if snapshots are updated without review

---

## Mocking, Stubbing, and Fakes

### Understanding the Distinctions

**Stub**: Provides canned responses to method calls.

```python
class StubEmailService:
    def send_email(self, to, subject, body):
        return True  # Always succeeds, doesn't actually send
```

**Mock**: Records interactions and verifies they occurred as expected.

```python
from unittest.mock import Mock

email_service = Mock()
user_service = UserService(email_service)  # inject the mock (assuming constructor injection)

user_service.register(user)

email_service.send_welcome_email.assert_called_once_with(user.email)
```

**Fake**: A simplified working implementation.

```python
class FakeDatabase:
    def __init__(self):
        self._data = {}

    def save(self, id, value):
        self._data[id] = value

    def find(self, id):
        return self._data.get(id)
```

### When to Use Each Type

**Use Stubs** for query dependencies that provide data:

```python
def test_calculate_user_discount():
    stub_repository = StubUserRepository()
    stub_repository.set_return_value(User(type="VIP"))

    discount_service = DiscountService(stub_repository)
    discount = discount_service.calculate_for_user(user_id=123)

    assert discount == 10
```

**Use Mocks** for command dependencies where the interaction matters:

```python
def test_order_completion_sends_confirmation():
    mock_email_service = Mock()
    order_service = OrderService(mock_email_service)

    order_service.complete_order(order_id=123)

    mock_email_service.send_confirmation.assert_called_once()
```

**Use Fakes** for complex dependencies that need realistic behavior:

```python
def test_user_repository_operations():
    fake_db = FakeDatabase()
    repository = UserRepository(fake_db)

    user = User(email="test@example.com")
    repository.save(user)

    found = repository.find_by_email("test@example.com")
    assert found.email == user.email
```

### Avoiding Over-Mocking

**The Problem**: Mocking every dependency couples tests to the implementation.

```python
# ❌ BAD: Over-mocking
def test_order_creation():
    mock_validator = Mock()
    mock_repository = Mock()
    mock_calculator = Mock()
    mock_logger = Mock()

    service = OrderService(mock_validator, mock_repository,
                           mock_calculator, mock_logger)
    service.create_order(data)

    mock_validator.validate.assert_called_once()
    mock_calculator.calculate_total.assert_called_once()
    mock_repository.save.assert_called_once()
    mock_logger.log.assert_called()

# ✅ GOOD: Mock only external boundaries
def test_order_creation():
    repository = InMemoryOrderRepository()
    service = OrderService(repository)

    order = service.create_order(data)

    assert order.total == 100
    assert repository.find(order.id) is not None
```

**Guidelines**:

- Mock external systems (databases, APIs, file systems)
- Use real implementations for internal collaborators
- Don't verify every method call
- Focus on observable behavior, not interactions

---

## Summary: Test Design Principles
1. **Deterministic tests**: Eliminate timing issues, shared state, and external dependencies
2. **Boundary testing**: Test edges systematically where bugs hide
3. **Expressive tests**: Clear names, AAA structure, actionable failures
4. **Focused tests**: One behavior per test, small and understandable
5. **Flexible test data**: Use factories and builders for maintainable test data
6. **Strategic mocking**: Mock at system boundaries, use real implementations internally
7. **Fast execution**: Keep unit tests fast for rapid feedback loops

These practices work together to create test suites that catch bugs, enable refactoring, and remain maintainable over time.