Test Design Best Practices
Building Stable, Deterministic Tests
The foundation of a reliable test suite is deterministic tests that produce consistent results. Non-deterministic or "flaky" tests erode trust and waste time debugging phantom failures.
Eliminating Timing Dependencies
The Problem: Fixed sleep statements create unreliable tests that either waste time or fail intermittently.
# ❌ BAD: Fixed sleep
def test_async_operation():
start_async_task()
time.sleep(5) # Hope 5 seconds is enough
assert task_completed()
# ✅ GOOD: Conditional wait
def test_async_operation():
start_async_task()
wait_until(lambda: task_completed(), timeout=10, interval=0.1)
assert task_completed()
Strategies:
- Use explicit waits with conditions, not arbitrary sleeps
- Implement timeout-based polling for async operations
- Use testing framework wait utilities (e.g., Selenium WebDriverWait)
- Mock time-dependent code to make tests deterministic
- Use fast fake timers in tests instead of actual delays
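The last two strategies can be sketched without any library by injecting the clock into the code under test. The names `FakeClock` and `TokenCache` below are hypothetical and exist only for illustration; the point is that the test advances time explicitly instead of waiting for it.

```python
import time


class FakeClock:
    """Hypothetical test double: a clock that only moves when told to."""

    def __init__(self, start=0.0):
        self._now = start

    def now(self):
        return self._now

    def advance(self, seconds):
        self._now += seconds


class TokenCache:
    """Illustrative class under test: caches a token for ttl seconds."""

    def __init__(self, ttl, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock  # injected so tests can control time
        self._token = None
        self._stored_at = None

    def store(self, token):
        self._token = token
        self._stored_at = self._clock()

    def get(self):
        if self._token is None:
            return None
        if self._clock() - self._stored_at >= self._ttl:
            return None  # expired
        return self._token


def test_token_expires_after_ttl():
    clock = FakeClock()
    cache = TokenCache(ttl=60, clock=clock.now)
    cache.store("abc")

    clock.advance(59)
    assert cache.get() == "abc"  # still valid just before the TTL

    clock.advance(1)
    assert cache.get() is None  # expired at the TTL, with no real waiting
```

Production code keeps the default `time.monotonic`, so only tests need to know the clock is replaceable.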
Example: Polling with Timeout
import time

def wait_until(condition, timeout=10, interval=0.1):
    """Poll condition until it is true; raise TimeoutError once timeout expires."""
    # time.monotonic() is not affected by system clock adjustments
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    raise TimeoutError(f"Condition not met within {timeout} seconds")

def test_message_processing():
    queue.publish(message)
    wait_until(lambda: queue.is_empty(), timeout=5)
    assert message_handler.processed_count == 1
Avoiding Shared State
The Problem: Tests that share state create order dependencies and cascading failures.
# ❌ BAD: Shared state
shared_user = None
def test_create_user():
global shared_user
shared_user = User.create(email="test@example.com")
assert shared_user.id is not None
def test_user_can_login():
# Depends on previous test
result = login(shared_user.email, "password")
assert result.success
# ✅ GOOD: Independent tests
@pytest.fixture
def user():
"""Each test gets its own user."""
user = User.create(email=f"test-{uuid4()}@example.com")
yield user
user.delete()
def test_create_user(user):
assert user.id is not None
def test_user_can_login(user):
result = login(user.email, "password")
assert result.success
Strategies:
- Use test fixtures that create fresh instances
- Employ database transactions that roll back after each test
- Use in-memory databases that reset between tests
- Avoid global variables and singletons in tests
- Make each test completely self-contained
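As a minimal sketch of the in-memory database strategy, the helper below hands every test a brand-new SQLite database using only the standard library. The `fresh_db` helper and the `users` schema are assumptions made for illustration; in a pytest suite the same body could be written as a fixture with `yield`.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def fresh_db():
    """Yield a brand-new in-memory database and discard it afterwards."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE)")
    try:
        yield conn
    finally:
        conn.close()  # closing an in-memory database destroys its data


def test_insert_user():
    with fresh_db() as db:
        db.execute("INSERT INTO users (email) VALUES (?)", ("a@example.com",))
        assert db.execute("SELECT COUNT(*) FROM users").fetchone()[0] == 1


def test_table_starts_empty():
    # Passes regardless of whether test_insert_user ran first.
    with fresh_db() as db:
        assert db.execute("SELECT COUNT(*) FROM users").fetchone()[0] == 0
```

Because no state survives the `with` block, the two tests can run in any order.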
Database Isolation Pattern
@pytest.fixture
def db_transaction():
"""Each test runs in a transaction that rolls back."""
connection = engine.connect()
transaction = connection.begin()
session = Session(bind=connection)
yield session
session.close()
transaction.rollback()
connection.close()
def test_user_creation(db_transaction):
user = User(email="test@example.com")
db_transaction.add(user)
db_transaction.commit()
assert db_transaction.query(User).count() == 1
# Transaction rolls back after test - no cleanup needed
Handling External Dependencies
The Problem: Tests depending on external systems (APIs, databases, file systems) become non-deterministic and slow.
Strategies:
- Mock external HTTP APIs: Use libraries like responses, httpretty, or requests-mock
- Use in-memory databases: SQLite in-memory for SQL databases
- Fake file systems: Use pyfakefs or similar for file operations
- Container-based dependencies: Use Testcontainers for isolated, real dependencies
- Test doubles: Stubs, mocks, and fakes for external services
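For file operations, a lighter-weight alternative to a fake file system is a private temporary directory per test. The sketch below uses the standard library's `tempfile` rather than pyfakefs, and `save_report` is a hypothetical function under test.

```python
import tempfile
from pathlib import Path


def save_report(directory, name, lines):
    """Illustrative function under test: writes lines to <directory>/<name>."""
    path = Path(directory) / name
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


def test_save_report_writes_all_lines():
    # Each test gets a private directory that is deleted on exit.
    with tempfile.TemporaryDirectory() as tmp:
        path = save_report(tmp, "report.txt", ["alpha", "beta"])
        assert path.read_text(encoding="utf-8") == "alpha\nbeta\n"
        assert [p.name for p in Path(tmp).iterdir()] == ["report.txt"]
```

In pytest, the built-in `tmp_path` fixture provides the same kind of per-test directory.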
Mocking HTTP Calls
import responses
@responses.activate
def test_fetch_user_data():
responses.add(
responses.GET,
'https://api.example.com/users/123',
json={'id': 123, 'name': 'Test User'},
status=200
)
user_data = api_client.get_user(123)
assert user_data['name'] == 'Test User'
Controlling Randomness and Time
Random Data: Use fixed seeds for reproducible randomness
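import random

# ❌ BAD: Non-deterministic
def test_shuffle():
    items = [1, 2, 3, 4, 5]
    random.shuffle(items)  # Different result each run
    assert items[0] < items[-1]

# ✅ GOOD: Deterministic
def test_shuffle():
    first = [1, 2, 3, 4, 5]
    random.Random(42).shuffle(first)  # Seeded generator, independent of global state
    second = [1, 2, 3, 4, 5]
    random.Random(42).shuffle(second)
    assert first == second  # Same seed, same order on every run
    assert sorted(first) == [1, 2, 3, 4, 5]  # Still a permutation of the input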
Time-Dependent Code: Mock current time
from datetime import datetime
from unittest.mock import patch

# ❌ BAD: Depends on actual current time
def test_is_expired():
    expiry = datetime(2024, 1, 1)
    assert is_expired(expiry)  # Result depends on the date the test runs

# ✅ GOOD: Freezes time
# Assumes mymodule does `from datetime import datetime` and is_expired() calls datetime.now()
@patch('mymodule.datetime')
def test_is_expired(mock_datetime):
    mock_datetime.now.return_value = datetime(2024, 6, 1)
    expiry = datetime(2024, 1, 1)
    assert is_expired(expiry)
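An alternative to patching is to pass the current time in as a parameter, which removes the need for a mock altogether. The signature below is a hypothetical variant of `is_expired`, not necessarily how the real function is defined.

```python
from datetime import datetime, timezone


def is_expired(expiry, now=None):
    """Hypothetical version of is_expired that accepts the current time."""
    if now is None:
        now = datetime.now(timezone.utc)  # production default
    return now >= expiry


def test_is_expired_after_expiry_date():
    expiry = datetime(2024, 1, 1, tzinfo=timezone.utc)
    assert is_expired(expiry, now=datetime(2024, 6, 1, tzinfo=timezone.utc))


def test_is_not_expired_before_expiry_date():
    expiry = datetime(2024, 1, 1, tzinfo=timezone.utc)
    assert not is_expired(expiry, now=datetime(2023, 12, 31, tzinfo=timezone.utc))
```

Both outcomes are now testable on any day, because the test supplies the clock reading.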
Boundary Analysis and Equivalence Partitioning
Systematic approaches to test input selection ensure comprehensive coverage without exhaustive testing.
Boundary Value Analysis
Principle: Bugs often occur at the edges of input ranges. Test boundaries explicitly.
For a function accepting ages 0-120:
- Test: -1 (below minimum)
- Test: 0 (minimum boundary)
- Test: 1 (just above minimum)
- Test: 119 (just below maximum)
- Test: 120 (maximum boundary)
- Test: 121 (above maximum)
Example Implementation
import pytest

def validate_age(age):
    if age < 0 or age > 120:
        raise ValueError("Age must be between 0 and 120")
    return True

@pytest.mark.parametrize("age,should_pass", [
    (-1, False),   # Below minimum
    (0, True),     # Minimum boundary
    (1, True),     # Just above minimum
    (60, True),    # Typical value
    (119, True),   # Just below maximum
    (120, True),   # Maximum boundary
    (121, False),  # Above maximum
])
def test_age_validation(age, should_pass):
    if should_pass:
        assert validate_age(age) is True
    else:
        with pytest.raises(ValueError):
            validate_age(age)
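The six boundary inputs follow mechanically from the range, so they can be generated rather than typed by hand. The sketch below assumes an inclusive range; `boundary_values` and `is_valid_age` are hypothetical helpers introduced for illustration.

```python
def boundary_values(minimum, maximum, step=1):
    """Return the six classic boundary inputs for an inclusive range."""
    return [
        minimum - step,  # just below the minimum (invalid)
        minimum,         # minimum boundary
        minimum + step,  # just above the minimum
        maximum - step,  # just below the maximum
        maximum,         # maximum boundary
        maximum + step,  # just above the maximum (invalid)
    ]


def is_valid_age(age):
    """Stand-in for the validator above, returning a bool instead of raising."""
    return 0 <= age <= 120


def test_age_boundaries():
    results = {age: is_valid_age(age) for age in boundary_values(0, 120)}
    assert results == {
        -1: False, 0: True, 1: True,
        119: True, 120: True, 121: False,
    }
```

The `step` parameter lets the same helper serve non-integer ranges, for example `step=0.01` for currency amounts.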
Equivalence Partitioning
Principle: Divide input space into classes that should behave identically. Test one representative from each class.
For a discount function based on purchase amount:
- Partition 1: $0 up to and including $100 (no discount)
- Partition 2: more than $100, up to and including $500 (10% discount)
- Partition 3: more than $500 (20% discount)
Test representatives: $50, $300, $600
Example Implementation
def calculate_discount(amount):
if amount <= 100:
return 0
elif amount <= 500:
return amount * 0.10
else:
return amount * 0.20
@pytest.mark.parametrize("amount,expected_discount", [
(50, 0), # Partition 1: 0-100
(100, 0), # Boundary
(101, 10.10), # Just into partition 2
(300, 30), # Partition 2: 101-500
(500, 50), # Boundary
(501, 100.20), # Just into partition 3
(1000, 200), # Partition 3: 501+
])
def test_discount_calculation(amount, expected_discount):
assert calculate_discount(amount) == pytest.approx(expected_discount)
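One way to make the partitions explicit is to keep them in a table and check that every member of a partition maps to the same rate. This is a sketch under the thresholds used above; `UPPER_BOUNDS`, `RATES`, `partition_index`, and `discount_rate` are hypothetical names.

```python
import bisect

# Inclusive upper bounds of the first two partitions, and the rate for each
# of the three partitions. Values mirror the example above.
UPPER_BOUNDS = [100, 500]
RATES = [0.0, 0.10, 0.20]


def partition_index(amount):
    """Return 0, 1, or 2 for the partition that amount falls into."""
    # bisect_left keeps an amount equal to a bound in the lower partition.
    return bisect.bisect_left(UPPER_BOUNDS, amount)


def discount_rate(amount):
    return RATES[partition_index(amount)]


def test_members_of_a_partition_share_a_rate():
    partitions = {
        0.0: [0, 50, 100],
        0.10: [101, 300, 500],
        0.20: [501, 600, 10_000],
    }
    for expected_rate, amounts in partitions.items():
        for amount in amounts:
            assert discount_rate(amount) == expected_rate
```

If this test holds, a single representative per partition plus the boundary values is enough to cover the input space.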
Property-Based Testing
Principle: Define invariants that should always hold, then let the framework generate the test cases automatically (Hypothesis runs 100 examples per test by default).
Example: Reversing a List
from collections import Counter
from hypothesis import given
from hypothesis.strategies import lists, integers

@given(lists(integers()))
def test_reverse_twice_returns_original(items):
    """Reversing a list twice should return the original."""
    reversed_once = list(reversed(items))
    reversed_twice = list(reversed(reversed_once))
    assert reversed_twice == items

@given(lists(integers()))
def test_sorted_list_has_same_elements(items):
    """Sorting should reorder elements without adding or dropping any."""
    sorted_items = sorted(items)
    # Comparing sorted(sorted_items) with sorted(items) would always pass,
    # so compare element counts against the unsorted input instead.
    assert Counter(sorted_items) == Counter(items)
    assert len(sorted_items) == len(items)
Benefits:
- Finds edge cases you didn't think to test
- Tests many more scenarios than manual tests
- Shrinks failing examples to minimal reproducible cases
- Documents properties/invariants of the code
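To show the idea without a library, here is a hand-rolled property check that uses a seeded random generator. It is not Hypothesis and does no shrinking; the run-length encoder is a hypothetical function under test chosen because it has an obvious round-trip invariant.

```python
import random


def run_length_encode(text):
    """Illustrative function under test: 'aaab' -> [('a', 3), ('b', 1)]."""
    encoded = []
    for char in text:
        if encoded and encoded[-1][0] == char:
            encoded[-1] = (char, encoded[-1][1] + 1)
        else:
            encoded.append((char, 1))
    return encoded


def run_length_decode(pairs):
    return "".join(char * count for char, count in pairs)


def test_decode_inverts_encode():
    rng = random.Random(1234)  # fixed seed keeps the run reproducible
    for _ in range(200):
        length = rng.randint(0, 20)
        # A two-letter alphabet makes repeated runs likely.
        text = "".join(rng.choice("ab") for _ in range(length))
        assert run_length_decode(run_length_encode(text)) == text, text
```

A dedicated library adds what this loop lacks: smarter input generation and automatic reduction of a failing input to a minimal case.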
Writing Expressive, Maintainable Tests
Arrange-Act-Assert (AAA) Pattern
Structure: Organize tests into three clear sections for maximum readability.
def test_user_registration():
    # Arrange: Set up test data and dependencies
    email = "newuser@example.com"
    password = "SecurePass123!"
    user_service = UserService(database, email_service)

    # Act: Execute the behavior being tested
    result = user_service.register(email, password)

    # Assert: Verify expected outcomes
    assert result.success
    assert result.user.email == email
    assert not result.user.is_verified
Benefits:
- Immediately clear what the test does
- Easy to identify what's being tested
- Simple to understand failure location
- Consistent structure across test suite
Variations:
- Use blank lines to separate sections
- Use comments to label sections
- BDD Given-When-Then is an equivalent structure
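The Given-When-Then variation maps one-to-one onto Arrange-Act-Assert, as this small sketch shows. `ShoppingCart` is a hypothetical class used only to keep the example self-contained.

```python
class ShoppingCart:
    """Illustrative class under test."""

    def __init__(self):
        self._prices = []

    def add(self, price):
        self._prices.append(price)

    def total(self):
        return sum(self._prices)


def test_total_sums_all_added_items():
    # Given a cart with two items
    cart = ShoppingCart()
    cart.add(10)
    cart.add(20)

    # When the total is calculated
    total = cart.total()

    # Then it equals the sum of the item prices
    assert total == 30
```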
Descriptive Test Names
Principle: A test name should convey the scenario and the expected outcome, so a reader does not have to open the test body to understand a failure.
Naming Patterns:
# Pattern 1: test_[function]_[scenario]_[expected_result]
def test_calculate_discount_for_vip_customer_returns_ten_percent():
pass
# Pattern 2: test_[scenario]_[expected_result]
def test_invalid_email_raises_validation_error():
pass
# Pattern 3: Natural language with underscores
def test_user_cannot_login_with_expired_password():
pass
# Pattern 4: BDD-style
def test_given_vip_customer_when_ordering_then_receives_discount():
pass
Good Test Names:
- ✅ test_empty_cart_checkout_raises_error
- ✅ test_duplicate_email_registration_rejected
- ✅ test_expired_token_authentication_fails
- ✅ test_concurrent_order_updates_handled_correctly
Bad Test Names:
- ❌ test_1, test_2, test_3 (meaningless)
- ❌ test_order (what about orders?)
- ❌ test_the_thing_works (what thing? how?)
- ❌ test_bug_fix (what bug?)
Clear, Actionable Assertions
Principle: Assertion failures should immediately communicate what went wrong.
# ❌ BAD: Unclear failure message
def test_discount_calculation():
assert calculate_discount(100) == 10
# Output: AssertionError: assert 0 == 10
# (Why did we expect 10? What was the input context?)
# ✅ GOOD: Clear failure message
def test_discount_calculation():
customer = Customer(type="VIP")
order_total = 100
discount = calculate_discount(customer, order_total)
assert discount == 10, \
f"VIP customer with ${order_total} order should get $10 discount, got ${discount}"
# Output: AssertionError: VIP customer with $100 order should get $10 discount, got $0
Custom Assertion Messages:
# Complex conditions benefit from explanatory messages
assert user.is_active, \
f"User {user.email} should be active after verification, but is_active={user.is_active}"
assert len(results) > 0, \
f"Search for '{search_term}' should return results, got empty list"
assert response.status_code == 200, \
f"GET /api/users/{user_id} should return 200, got {response.status_code}: {response.text}"
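When the same message is repeated across many tests, it can be moved into a small assertion helper. The sketch below assumes a response object with `status_code` and `text` attributes; `FakeResponse` and `assert_status` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class FakeResponse:
    """Stand-in for an HTTP response object (hypothetical, for illustration)."""
    status_code: int
    text: str = ""


def assert_status(response, expected, request="request"):
    """Fail with the request, expected status, actual status, and body."""
    assert response.status_code == expected, (
        f"{request} should return {expected}, "
        f"got {response.status_code}: {response.text}"
    )


def test_helper_reports_context_on_failure():
    response = FakeResponse(status_code=404, text="user not found")
    try:
        assert_status(response, 200, request="GET /api/users/123")
    except AssertionError as error:
        message = str(error)
        assert "GET /api/users/123 should return 200" in message
        assert "got 404: user not found" in message
    else:
        raise AssertionError("assert_status should have failed")
```

Centralizing the message keeps individual tests short while every failure still carries its context.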
Keep Tests Focused and Small
Principle: Each test should verify one behavior or logical assertion.
# ❌ BAD: Testing too much
def test_user_lifecycle():
# Create user
user = User.create(email="test@example.com")
assert user.id is not None
# Activate user
user.activate()
assert user.is_active
# Update profile
user.update_profile(name="Test User")
assert user.name == "Test User"
# Deactivate user
user.deactivate()
assert not user.is_active
# Delete user
user.delete()
assert User.find(user.id) is None
# ✅ GOOD: Focused tests
def test_create_user_assigns_id():
user = User.create(email="test@example.com")
assert user.id is not None
def test_activate_user_sets_active_flag():
user = User.create(email="test@example.com")
user.activate()
assert user.is_active
def test_update_profile_changes_name():
user = User.create(email="test@example.com")
user.update_profile(name="Test User")
assert user.name == "Test User"
When Multiple Assertions Are OK:
- Verifying different properties of the same object
- Checking related side effects of a single action
- Asserting preconditions and postconditions
# ✅ GOOD: Multiple related assertions
def test_order_creation():
items = [Item("Widget", 10), Item("Gadget", 20)]
order = Order.create(items)
assert order.id is not None
assert order.total == 30
assert len(order.items) == 2
assert order.status == "pending"
Test Data Management
Fixtures: Providing Test Dependencies
Fixtures provide reusable test dependencies and setup/teardown logic.
Pytest Fixtures
@pytest.fixture
def database():
"""Provide a test database."""
db = create_test_database()
yield db
db.close()
@pytest.fixture
def user(database):
"""Provide a test user."""
user = User.create(email="test@example.com")
yield user
user.delete()
def test_user_can_place_order(user, database):
order = Order.create(user=user, items=[Item("Widget", 10)])
assert order.user_id == user.id
Fixture Scopes:
- function: New instance per test (default)
- class: Shared across a test class
- module: Shared across a test file
- session: Shared across the entire test run
@pytest.fixture(scope="session")
def app():
"""Start application once for entire test session."""
app = create_app()
app.start()
yield app
app.stop()
Factories: Flexible Test Data Creation
Factories create test objects with sensible defaults that can be customized.
# Factory pattern
class UserFactory:
_counter = 0
@classmethod
def create(cls, **kwargs):
cls._counter += 1
defaults = {
'email': f'user{cls._counter}@example.com',
'name': f'Test User {cls._counter}',
'role': 'user',
'is_active': True
}
defaults.update(kwargs)
return User(**defaults)
# Usage
def test_admin_user():
admin = UserFactory.create(role='admin', name='Admin User')
assert admin.role == 'admin'
def test_inactive_user():
inactive = UserFactory.create(is_active=False)
assert not inactive.is_active
FactoryBoy Library
import factory
class UserFactory(factory.Factory):
class Meta:
model = User
email = factory.Sequence(lambda n: f'user{n}@example.com')
name = factory.Faker('name')
role = 'user'
is_active = True
# Usage
def test_user_creation():
user = UserFactory()
assert user.email.endswith('@example.com')
def test_admin_user():
admin = UserFactory(role='admin')
assert admin.role == 'admin'
Builders: Constructing Complex Objects
Builders provide fluent APIs for creating complex test objects.
class OrderBuilder:
def __init__(self):
self._user = None
self._items = []
self._status = 'pending'
self._shipping_address = None
def with_user(self, user):
self._user = user
return self
def with_item(self, name, price, quantity=1):
self._items.append(Item(name, price, quantity))
return self
def with_status(self, status):
self._status = status
return self
def with_shipping(self, address):
self._shipping_address = address
return self
def build(self):
return Order(
user=self._user,
items=self._items,
status=self._status,
shipping_address=self._shipping_address
)
# Usage
def test_order_total_calculation():
order = (OrderBuilder()
.with_item("Widget", 10, quantity=2)
.with_item("Gadget", 20)
.build())
assert order.total == 40
Snapshot/Golden Master Testing
Principle: Capture complex output as a baseline, then verify that future runs match it.
Use Cases:
- Testing rendered HTML or UI components
- Verifying generated reports or documents
- Validating complex JSON/XML output
- Testing data transformations with many fields
# The snapshot fixture here is assumed to come from a plugin such as pytest-snapshot
def test_user_profile_rendering(snapshot):
    user = User(name="John Doe", email="john@example.com", age=30)
    html = render_user_profile(user)
    snapshot.assert_match(html, 'user_profile.html')
# First run: Captures output as baseline
# Subsequent runs: Compares against baseline
# To update baseline: pytest --snapshot-update
Trade-offs:
- Pro: Easy to test complex outputs
- Pro: Catches unintended changes
- Con: Changes to output format require updating all snapshots
- Con: Can hide real regressions if snapshots are updated without review
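The mechanism behind snapshot plugins is small enough to sketch by hand. The helper below is a minimal illustration, not any plugin's actual implementation; `assert_matches_snapshot` and `render_greeting` are hypothetical names.

```python
import tempfile
from pathlib import Path


def assert_matches_snapshot(actual, snapshot_path, update=False):
    """Compare actual text with a stored baseline, creating it if missing."""
    path = Path(snapshot_path)
    if update or not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(actual, encoding="utf-8")
        return
    expected = path.read_text(encoding="utf-8")
    assert actual == expected, (
        f"Output differs from snapshot {path}; "
        "review the change, then rerun with update=True if it is intended"
    )


def render_greeting(name):
    """Illustrative function whose output is being pinned."""
    return f"<h1>Hello, {name}!</h1>\n"


def test_greeting_snapshot():
    # A temporary directory keeps this sketch self-contained; a real suite
    # would store snapshots in the repository so they can be reviewed.
    with tempfile.TemporaryDirectory() as tmp:
        snapshot = Path(tmp) / "greeting.html"
        assert_matches_snapshot(render_greeting("Ada"), snapshot)  # records
        assert_matches_snapshot(render_greeting("Ada"), snapshot)  # compares
```

Making the update an explicit flag is what keeps the second trade-off in check: a changed baseline shows up in code review as a diff to the snapshot file.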
Mocking, Stubbing, and Fakes
Understanding the Distinctions
Stub: Provides canned responses to method calls
class StubEmailService:
def send_email(self, to, subject, body):
return True # Always succeeds, doesn't actually send
Mock: Records interactions and verifies they occurred as expected
from unittest.mock import Mock
email_service = Mock()
user_service = UserService(database, email_service)  # Inject the mock so the service calls it
user_service.register(user)
email_service.send_welcome_email.assert_called_once_with(user.email)
Fake: Simplified working implementation
class FakeDatabase:
def __init__(self):
self._data = {}
def save(self, id, value):
self._data[id] = value
def find(self, id):
return self._data.get(id)
When to Use Each Type
Use Stubs for query dependencies that provide data
def test_calculate_user_discount():
stub_repository = StubUserRepository()
stub_repository.set_return_value(User(type="VIP"))
discount_service = DiscountService(stub_repository)
discount = discount_service.calculate_for_user(user_id=123)
assert discount == 10
Use Mocks for command dependencies where interaction matters
def test_order_completion_sends_confirmation():
mock_email_service = Mock()
order_service = OrderService(mock_email_service)
order_service.complete_order(order_id=123)
mock_email_service.send_confirmation.assert_called_once()
Use Fakes for complex dependencies needing realistic behavior
def test_user_repository_operations():
fake_db = FakeDatabase()
repository = UserRepository(fake_db)
user = User(email="test@example.com")
repository.save(user)
found = repository.find_by_email("test@example.com")
assert found.email == user.email
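The three kinds of double often appear together in one test. The sketch below is self-contained and uses only `unittest.mock`; `OrderService`, `FakeOrderRepository`, and the method names are hypothetical and chosen to mirror the examples above.

```python
from unittest.mock import Mock


class FakeOrderRepository:
    """Fake: a working in-memory replacement for a database-backed repository."""

    def __init__(self):
        self._orders = {}

    def save(self, order_id, order):
        self._orders[order_id] = order

    def find(self, order_id):
        return self._orders.get(order_id)


class OrderService:
    """Illustrative service under test; all names here are hypothetical."""

    def __init__(self, repository, price_lookup, email_service):
        self._repository = repository
        self._price_lookup = price_lookup
        self._email_service = email_service

    def place_order(self, order_id, email, sku, quantity):
        total = self._price_lookup.price_of(sku) * quantity
        self._repository.save(order_id, {"email": email, "total": total})
        self._email_service.send_confirmation(email, order_id)
        return total


def test_place_order():
    price_stub = Mock()
    price_stub.price_of.return_value = 10  # stub: canned answer to a query
    email_mock = Mock()                    # mock: interaction verified below
    repository = FakeOrderRepository()     # fake: real behavior, in memory

    service = OrderService(repository, price_stub, email_mock)
    total = service.place_order(order_id=1, email="a@example.com", sku="W", quantity=3)

    assert total == 30
    assert repository.find(1) == {"email": "a@example.com", "total": 30}
    email_mock.send_confirmation.assert_called_once_with("a@example.com", 1)
```

Only the email call is verified as an interaction, because sending is a command with no return value to inspect; the price lookup and the repository are checked through their observable results.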
Avoiding Over-Mocking
The Problem: Mocking every dependency couples tests to implementation.
# ❌ BAD: Over-mocking
def test_order_creation():
mock_validator = Mock()
mock_repository = Mock()
mock_calculator = Mock()
mock_logger = Mock()
service = OrderService(mock_validator, mock_repository, mock_calculator, mock_logger)
service.create_order(data)
mock_validator.validate.assert_called_once()
mock_calculator.calculate_total.assert_called_once()
mock_repository.save.assert_called_once()
mock_logger.log.assert_called()
# ✅ GOOD: Mock only external boundaries
def test_order_creation():
repository = InMemoryOrderRepository()
service = OrderService(repository)
order = service.create_order(data)
assert order.total == 100
assert repository.find(order.id) is not None
Guidelines:
- Mock external systems (databases, APIs, file systems)
- Use real implementations for internal collaborators
- Don't verify every method call
- Focus on observable behavior, not interactions
Summary: Test Design Principles
- Deterministic tests: Eliminate timing issues, shared state, and external dependencies
- Boundary testing: Test edges systematically where bugs hide
- Expressive tests: Clear names, AAA structure, actionable failures
- Focused tests: One behavior per test, small and understandable
- Flexible test data: Use factories and builders for maintainable test data
- Strategic mocking: Mock at system boundaries, use real implementations internally
- Fast execution: Keep unit tests fast for rapid feedback loops
These practices work together to create test suites that catch bugs, enable refactoring, and remain maintainable over time.