# Testing and Quality

## Overview

**Core Principle:** Test behavior, not implementation. Tests are executable documentation that ensure code works as expected and continues to work as it evolves.

Modern Python testing centers on pytest: simple syntax, powerful fixtures, comprehensive plugins. Good tests enable confident refactoring, catch regressions early, and document expected behavior. Bad tests are brittle, slow, and create maintenance burden without providing value.

## When to Use

**Use this skill when:**
- "Tests are failing"
- "How to write pytest tests?"
- "Fixture scope issues"
- "Mock not working"
- "Flaky tests"
- "Improve test coverage"
- "Tests too slow"
- "How to test X?"

**Don't use when:**
- Setting up testing infrastructure (use project-structure-and-tooling first)
- Debugging production code (use debugging-and-profiling)
- Performance optimization (use debugging-and-profiling to profile first)

**Symptoms triggering this skill:**
- pytest errors or failures
- Need to add tests to existing code
- Tests passing locally but failing in CI
- Coverage gaps identified
- Difficulty testing complex scenarios

## pytest Fundamentals

### Basic Test Structure

```python
# ❌ WRONG: Using unittest (verbose, requires class)
import unittest

class TestCalculator(unittest.TestCase):
    def test_addition(self):
        self.assertEqual(add(2, 3), 5)

    def test_subtraction(self):
        self.assertEqual(subtract(5, 3), 2)

if __name__ == '__main__':
    unittest.main()

# ✅ CORRECT: Using pytest (simple, clear)
def test_addition():
    assert add(2, 3) == 5

def test_subtraction():
    assert subtract(5, 3) == 2

# Why this matters: pytest uses plain assert, no class needed, cleaner syntax
```

### Test Discovery

```python
# pytest discovers tests automatically using these conventions:

# ✅ Test file naming: test_*.py or *_test.py
#   test_calculator.py   ✓
#   calculator_test.py   ✓
#   tests.py             ✗ Won't be discovered

# ✅ Test function naming
def test_addition():     # ✓ Discovered
    pass

def addition_test():     # ✗ Not discovered (must start with test_)
    pass

def testAddition():      # ✗ Not discovered (use snake_case)
    pass

# ✅ Test class naming (optional)
class TestCalculator:    # Must start with Test
    def test_add(self):  # Method must start with test_
        pass
```

### Assertions and Error Messages

```python
# ❌ WRONG: No context for failure
def test_user_creation():
    user = create_user("alice", "alice@example.com")
    assert user.name == "alice"
    assert user.email == "alice@example.com"

# ✅ CORRECT: Descriptive assertions
def test_user_creation():
    user = create_user("alice", "alice@example.com")
    # pytest shows actual vs expected on failure
    assert user.name == "alice", f"Expected name 'alice', got '{user.name}'"
    assert user.email == "alice@example.com"
    assert user.active is True  # Boolean assertions are clear

# ✅ CORRECT: Using pytest helpers for better errors
import pytest

def test_exception_raised():
    with pytest.raises(ValueError, match="Invalid email"):
        create_user("alice", "not-an-email")

def test_approximate_equality():
    # For floats, use approx
    result = calculate_pi()
    assert result == pytest.approx(3.14159, rel=1e-5)

# ✅ CORRECT: Testing multiple conditions
def test_user_validation():
    with pytest.raises(ValueError) as exc_info:
        create_user("", "alice@example.com")
    assert "name cannot be empty" in str(exc_info.value)
```

**Why this matters:** Clear assertions make test failures immediately understandable. pytest's introspection shows actual values without manual formatting.
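For instance, a failing plain `assert` produces output roughly like the following, with sub-expressions expanded automatically (representative output, not tied to a specific pytest version):

```
    def test_addition():
>       assert add(2, 3) == 6
E       assert 5 == 6
E        +  where 5 = add(2, 3)
```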
### Test Organization

```python
# ✅ CORRECT: Group related tests in classes
class TestUserCreation:
    """Tests for user creation logic."""

    def test_valid_user(self):
        user = create_user("alice", "alice@example.com")
        assert user.name == "alice"

    def test_invalid_email(self):
        with pytest.raises(ValueError):
            create_user("alice", "invalid")

    def test_empty_name(self):
        with pytest.raises(ValueError):
            create_user("", "alice@example.com")

class TestUserUpdate:
    """Tests for user update logic."""

    def test_update_email(self):
        user = create_user("alice", "old@example.com")
        user.update_email("new@example.com")
        assert user.email == "new@example.com"
```

```
# ✅ Directory structure
tests/
├── __init__.py
├── conftest.py          # Shared fixtures
├── test_users.py        # User-related tests
├── test_auth.py         # Auth-related tests
└── integration/
    ├── __init__.py
    └── test_api.py      # Integration tests
```

## Fixtures

### Basic Fixtures

```python
import pytest

# ❌ WRONG: Repeating setup in each test
def test_user_creation():
    db = Database("test.db")
    db.connect()
    user = create_user(db, "alice", "alice@example.com")
    assert user.name == "alice"
    db.disconnect()

def test_user_deletion():
    db = Database("test.db")
    db.connect()
    user = create_user(db, "alice", "alice@example.com")
    delete_user(db, user.id)
    db.disconnect()

# ✅ CORRECT: Use fixture for shared setup
@pytest.fixture
def db():
    """Provide a test database connection."""
    database = Database("test.db")
    database.connect()
    yield database          # Test runs here
    database.disconnect()   # Cleanup

def test_user_creation(db):
    user = create_user(db, "alice", "alice@example.com")
    assert user.name == "alice"

def test_user_deletion(db):
    user = create_user(db, "alice", "alice@example.com")
    delete_user(db, user.id)
    assert not db.get_user(user.id)
```

**Why this matters:** Fixtures reduce duplication, ensure cleanup happens, and make test intent clear.

### Fixture Scopes

```python
# ❌ WRONG: Function scope for expensive setup (slow tests)
@pytest.fixture  # Default scope="function" - runs for each test
def expensive_resource():
    resource = ExpensiveResource()  # Takes 5 seconds to initialize
    resource.initialize()
    yield resource
    resource.cleanup()
# 100 tests × 5 seconds = 500 seconds just for setup!

# ✅ CORRECT: Appropriate scope for resource lifecycle
@pytest.fixture(scope="session")  # Once per test session
def expensive_resource():
    """Expensive resource initialized once for all tests."""
    resource = ExpensiveResource()
    resource.initialize()
    yield resource
    resource.cleanup()

@pytest.fixture(scope="module")  # Once per test module
def database():
    """Database connection shared across test module."""
    db = Database("test.db")
    db.connect()
    yield db
    db.disconnect()

@pytest.fixture(scope="class")  # Once per test class
def api_client():
    """API client for test class."""
    client = APIClient()
    yield client
    client.close()

@pytest.fixture(scope="function")  # Once per test (default)
def user():
    """Fresh user for each test."""
    return create_user("test", "test@example.com")
```

**Scope Guidelines:**
- `function` (default): Fresh state for each test, slow but safe
- `class`: Share across test class, balance speed and isolation
- `module`: Share across test file, faster but less isolation
- `session`: Share across entire test run, fastest but needs careful cleanup

**Critical Rule:** Higher scopes must reset state between tests or be read-only! A minimal reset pattern is sketched below.
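One way to honor that rule, reusing the `Database` API assumed in the examples above: pair the expensive session-scoped fixture with a cheap, function-scoped, `autouse` fixture that resets state after every test.

```python
import pytest

@pytest.fixture(scope="session")
def database():
    """Expensive connection, created once per test run."""
    db = Database("test.db")
    db.connect()
    yield db
    db.disconnect()

@pytest.fixture(autouse=True)
def reset_database(database):
    """Cheap reset, applied to every test automatically."""
    yield
    database.truncate_all_tables()  # assumed cleanup method
```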
### Fixture Factories

```python
# ❌ WRONG: Creating fixtures for every variation
@pytest.fixture
def user_alice():
    return create_user("alice", "alice@example.com")

@pytest.fixture
def user_bob():
    return create_user("bob", "bob@example.com")

@pytest.fixture
def admin_user():
    return create_user("admin", "admin@example.com", is_admin=True)

# ✅ CORRECT: Use fixture factory pattern
@pytest.fixture
def user_factory():
    """Factory for creating test users."""
    created_users = []

    def _create_user(name: str, email: str | None = None, **kwargs):
        if email is None:
            email = f"{name}@example.com"
        user = create_user(name, email, **kwargs)
        created_users.append(user)
        return user

    yield _create_user

    # Cleanup all created users
    for user in created_users:
        delete_user(user.id)

# Usage
def test_user_permissions(user_factory):
    alice = user_factory("alice")
    bob = user_factory("bob")
    admin = user_factory("admin", is_admin=True)

    assert not alice.is_admin
    assert admin.is_admin
```

**Why this matters:** Factories provide flexibility without fixture explosion. Automatic cleanup tracks all created resources.

### Fixture Composition

```python
# ✅ CORRECT: Compose fixtures to build complex setups
@pytest.fixture
def database():
    db = Database("test.db")
    db.connect()
    yield db
    db.disconnect()

@pytest.fixture
def user(database):  # Uses database fixture
    user = create_user(database, "alice", "alice@example.com")
    yield user
    delete_user(database, user.id)

@pytest.fixture
def authenticated_client(user):  # Uses user fixture (which uses database)
    client = APIClient()
    client.authenticate(user.id)
    yield client
    client.close()

# Test uses only the highest-level fixture it needs
def test_api_call(authenticated_client):
    response = authenticated_client.get("/profile")
    assert response.status_code == 200
```

**Why this matters:** Composition creates clear dependency chains. Tests request only what they need, fixtures handle the rest.
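pytest's built-in fixtures compose the same way as your own. A short sketch of the two you will reach for most often, `tmp_path` and `capsys`:

```python
def test_writes_report(tmp_path):
    # tmp_path: a fresh pathlib.Path temp directory per test, cleaned up for you
    report = tmp_path / "report.txt"
    report.write_text("ok")
    assert report.read_text() == "ok"

def test_prints_summary(capsys):
    # capsys: captures stdout/stderr written during the test
    print("3 users processed")
    captured = capsys.readouterr()
    assert "3 users" in captured.out
```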
### conftest.py

```python
# File: tests/conftest.py
# Fixtures defined here are available to all tests
import pytest

@pytest.fixture(scope="session")
def database():
    """Session-scoped database for all tests."""
    db = Database("test.db")
    db.connect()
    db.migrate()
    yield db
    db.disconnect()

@pytest.fixture
def clean_database(database):
    """Truncate database state after each test, so every test starts clean."""
    yield database
    database.truncate_all_tables()

# File: tests/integration/conftest.py
# Fixtures here available only to integration tests

@pytest.fixture
def api_server():
    """Start API server for integration tests."""
    server = TestServer()
    server.start()
    yield server
    server.stop()
```

**conftest.py locations:**
- `tests/conftest.py`: Available to all tests
- `tests/integration/conftest.py`: Available only to tests in integration/
- Fixtures can reference fixtures from parent conftest.py files

## Parametrization

### Basic Parametrization

```python
# ❌ WRONG: Repeating tests for different inputs
def test_addition_positive():
    assert add(2, 3) == 5

def test_addition_negative():
    assert add(-2, -3) == -5

def test_addition_zero():
    assert add(0, 0) == 0

def test_addition_mixed():
    assert add(-2, 3) == 1

# ✅ CORRECT: Parametrize test
import pytest

@pytest.mark.parametrize("a,b,expected", [
    (2, 3, 5),
    (-2, -3, -5),
    (0, 0, 0),
    (-2, 3, 1),
])
def test_addition(a, b, expected):
    assert add(a, b) == expected

# pytest output shows each case:
# test_addition[2-3-5] PASSED
# test_addition[-2--3--5] PASSED
# test_addition[0-0-0] PASSED
# test_addition[-2-3-1] PASSED
```

### Parametrize with IDs

```python
# ✅ CORRECT: Add readable test IDs
@pytest.mark.parametrize("a,b,expected", [
    pytest.param(2, 3, 5, id="positive"),
    pytest.param(-2, -3, -5, id="negative"),
    pytest.param(0, 0, 0, id="zero"),
    pytest.param(-2, 3, 1, id="mixed"),
])
def test_addition(a, b, expected):
    assert add(a, b) == expected

# Output:
# test_addition[positive] PASSED
# test_addition[negative] PASSED
# test_addition[zero] PASSED
# test_addition[mixed] PASSED
```

**Why this matters:** Readable test IDs make failures immediately understandable. Instead of "test_addition[2-3-5]", you see "test_addition[positive]".

### Multiple Parametrize

```python
# ✅ CORRECT: Multiple parametrize creates cartesian product
@pytest.mark.parametrize("operation", [add, subtract, multiply])
@pytest.mark.parametrize("a,b", [(2, 3), (-2, 3), (0, 0)])
def test_operations(operation, a, b):
    result = operation(a, b)
    assert isinstance(result, (int, float))

# Creates 3 × 3 = 9 test combinations
```

### Parametrize Fixtures

```python
# ✅ CORRECT: Parametrize fixtures for different configurations
@pytest.fixture(params=["sqlite", "postgres", "mysql"])
def database(request):
    """Test against multiple database backends."""
    db_type = request.param
    if db_type == "sqlite":
        db = SQLiteDatabase("test.db")
    elif db_type == "postgres":
        db = PostgresDatabase("test")
    elif db_type == "mysql":
        db = MySQLDatabase("test")
    db.connect()
    yield db
    db.disconnect()

# All tests using this fixture run against all database types
def test_user_creation(database):
    user = create_user(database, "alice", "alice@example.com")
    assert user.name == "alice"
# Runs 3 times: with sqlite, postgres, mysql
```

**Why this matters:** Fixture parametrization tests against multiple implementations/configurations without changing test code.
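Back in test parametrization, `pytest.param` also accepts `marks=`, which lets you xfail or skip a single case without splitting the test. A sketch, reusing the `add` function assumed above:

```python
import pytest

@pytest.mark.parametrize("a,b,expected", [
    pytest.param(2, 3, 5, id="positive"),
    pytest.param(10**100, 1, 10**100 + 1, id="huge",
                 marks=pytest.mark.slow),
    pytest.param(0.1, 0.2, 0.3, id="floats",
                 marks=pytest.mark.xfail(reason="exact float comparison")),
])
def test_addition_marked(a, b, expected):
    assert add(a, b) == expected
```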
## Mocking and Patching

### When to Mock

```python
# ❌ WRONG: Mocking business logic (test implementation, not behavior)
def get_user_score(user_id: int) -> int:
    user = get_user(user_id)
    score = calculate_score(user.actions)
    return score

# Bad test - mocking internal implementation
def test_get_user_score(mocker):
    mocker.patch("module.get_user")
    mocker.patch("module.calculate_score", return_value=100)
    result = get_user_score(1)
    assert result == 100  # Testing mock, not real logic!

# ✅ CORRECT: Mock external dependencies only
import httpx

def fetch_user_data(user_id: int) -> dict:
    """Fetch user from external API."""
    response = httpx.get(f"https://api.example.com/users/{user_id}")
    return response.json()

# Good test - mocking external API
def test_fetch_user_data(mocker):
    mock_response = mocker.Mock()
    mock_response.json.return_value = {"id": 1, "name": "alice"}
    mocker.patch("httpx.get", return_value=mock_response)

    result = fetch_user_data(1)
    assert result == {"id": 1, "name": "alice"}
```

**When to mock:**
- External APIs/services
- Database calls (sometimes - prefer test database)
- File system operations
- Time/date (freezing time for tests)
- Random number generation

**When NOT to mock:**
- Business logic
- Internal functions
- Simple calculations
- Data transformations

### pytest-mock Basics

```python
# Install: pip install pytest-mock
import pytest

# ✅ CORRECT: Using mocker fixture
def test_api_call(mocker):
    # Mock external HTTP call
    mock_get = mocker.patch("requests.get")
    mock_get.return_value.json.return_value = {"status": "ok"}
    mock_get.return_value.status_code = 200

    result = fetch_data("https://api.example.com/data")

    # Verify mock was called correctly
    mock_get.assert_called_once_with("https://api.example.com/data")
    assert result == {"status": "ok"}

# ✅ CORRECT: Mock return value
def test_database_query(mocker):
    mock_db = mocker.patch("module.database")
    mock_db.query.return_value = [{"id": 1, "name": "alice"}]

    users = get_all_users()
    assert len(users) == 1
    assert users[0]["name"] == "alice"

# ✅ CORRECT: Mock side effect (different return per call)
def test_retry_logic(mocker):
    mock_api = mocker.patch("module.api_call")
    mock_api.side_effect = [
        Exception("Network error"),
        Exception("Timeout"),
        {"status": "ok"}  # Succeeds on third try
    ]

    result = retry_api_call()
    assert result == {"status": "ok"}
    assert mock_api.call_count == 3

# ✅ CORRECT: Mock exception
def test_error_handling(mocker):
    mock_api = mocker.patch("module.api_call")
    mock_api.side_effect = ConnectionError("Network down")

    with pytest.raises(ConnectionError):
        fetch_data()
```
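One guard worth knowing: `autospec=True` makes the mock enforce the real function's signature, so tests fail loudly if the mocked call drifts out of sync with the code. A sketch, with `module.api_call` and `fetch_data` as the assumed names from above:

```python
def test_with_autospec(mocker):
    # The mock now rejects calls that don't match api_call's real signature
    mock_api = mocker.patch("module.api_call", autospec=True)
    mock_api.return_value = {"status": "ok"}

    result = fetch_data()

    assert result == {"status": "ok"}
    mock_api.assert_called_once()  # a wrong-signature call would raise TypeError
```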
### Patching Strategies

```python
# ✅ RULE: Patch where it's used, not where it's defined

# File: module.py
from datetime import datetime

def create_timestamp():
    return datetime.now()

# ❌ WRONG: Patching in the datetime module
def test_timestamp_wrong(mocker):
    mocker.patch("datetime.datetime.now")  # Fails: datetime is an immutable
    # C type, and it's the wrong place to patch anyway
    ...

# ✅ CORRECT: Patch the name in the module where it's imported
def test_timestamp_correct(mocker):
    fixed_time = datetime(2025, 1, 1, 12, 0, 0)
    # Replace the whole `datetime` name inside `module`, then control .now()
    mock_dt = mocker.patch("module.datetime")
    mock_dt.now.return_value = fixed_time

    result = create_timestamp()
    assert result == fixed_time

# ✅ CORRECT: Patch class method
def test_database_method(mocker):
    mocker.patch.object(Database, "query", return_value=[])
    db = Database()
    result = db.query("SELECT * FROM users")
    assert result == []

# ✅ CORRECT: Temporary patch with a context manager
# Note: mocker.patch cannot be used in a `with` block; use unittest.mock for
# that. mocker's patches are undone automatically at the end of the test anyway.
from unittest import mock

def test_temporary_patch():
    with mock.patch("module.api_call", return_value={"status": "ok"}):
        result = fetch_data()
        assert result["status"] == "ok"
    # Patch automatically removed after context
```

### Mocking Time

```python
# ✅ CORRECT: Freeze time for deterministic tests
def test_expiration(mocker):
    from datetime import datetime, timedelta

    fixed_time = datetime(2025, 1, 1, 12, 0, 0)
    mock_dt = mocker.patch("module.datetime")
    mock_dt.now.return_value = fixed_time

    # Create session that expires in 1 hour
    session = create_session(expires_in=timedelta(hours=1))

    # Session not expired at creation time
    assert not session.is_expired()

    # Advance time by 2 hours
    mock_dt.now.return_value = fixed_time + timedelta(hours=2)

    # Session now expired
    assert session.is_expired()

# ✅ BETTER: Use freezegun library (pip install freezegun)
from freezegun import freeze_time

@freeze_time("2025-01-01 12:00:00")
def test_expiration_freezegun():
    session = create_session(expires_in=timedelta(hours=1))
    assert not session.is_expired()

    # Move time forward
    with freeze_time("2025-01-01 14:00:00"):
        assert session.is_expired()
```
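pytest's built-in `monkeypatch` fixture covers many of the same cases without any plugin, and undoes its patches automatically when the test ends. A sketch with the assumed names from above:

```python
def test_with_monkeypatch(monkeypatch):
    monkeypatch.setenv("API_KEY", "test-key")  # fake an environment variable
    monkeypatch.setattr("module.api_call", lambda: {"status": "ok"})  # swap a function

    result = fetch_data()
    assert result["status"] == "ok"
```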
### Mocking Anti-Patterns

```python
# ❌ WRONG: Mock every dependency (brittle test)
def test_process_user_data_wrong(mocker):
    mocker.patch("module.validate_user")
    mocker.patch("module.transform_data")
    mocker.patch("module.calculate_score")
    mocker.patch("module.save_result")

    process_user_data({"id": 1})
    # Test proves nothing - all logic is mocked!

# ✅ CORRECT: Test real logic, mock only external dependencies
def test_process_user_data_correct(mocker):
    # Mock only external dependency
    mock_save = mocker.patch("module.save_to_database")

    # Test real validation, transformation, calculation
    result = process_user_data({"id": 1, "name": "alice"})

    # Verify real logic ran correctly
    assert result["score"] > 0
    mock_save.assert_called_once()

# ❌ WRONG: Asserting internal implementation details
def test_implementation_details(mocker):
    spy = mocker.spy(module, "internal_helper")
    process_data([1, 2, 3])
    # Brittle - breaks if refactored
    assert spy.call_count == 3
    spy.assert_called_with(3)

# ✅ CORRECT: Assert behavior, not implementation
def test_behavior():
    result = process_data([1, 2, 3])
    # Test output, not how it was calculated
    assert result == [2, 4, 6]

# ❌ WRONG: Over-specifying mock expectations
def test_over_specified(mocker):
    mock_api = mocker.patch("module.api_call")
    mock_api.return_value = {"status": "ok"}

    result = fetch_data()

    # Too specific - breaks if parameter order changes
    mock_api.assert_called_once_with(
        url="https://api.example.com",
        method="GET",
        headers={"User-Agent": "Test"},
        timeout=30,
        retry=3
    )

# ✅ CORRECT: Assert only important arguments
def test_appropriate_assertions(mocker):
    mock_api = mocker.patch("module.api_call")
    mock_api.return_value = {"status": "ok"}

    result = fetch_data()

    # Assert only critical behavior
    assert mock_api.called
    assert "https://api.example.com" in str(mock_api.call_args)
```

## Coverage

### pytest-cov Setup

```bash
# Install
pip install pytest-cov

# Run with coverage
pytest --cov=mypackage --cov-report=term-missing

# Generate HTML report
pytest --cov=mypackage --cov-report=html

# Coverage with branch coverage (recommended)
pytest --cov=mypackage --cov-branch --cov-report=term-missing
```

### Configuration

```toml
# File: pyproject.toml
[tool.pytest.ini_options]
addopts = [
    "--cov=mypackage",
    "--cov-branch",
    "--cov-report=term-missing:skip-covered",
    "--cov-report=html",
    "--cov-fail-under=80",
]

[tool.coverage.run]
source = ["mypackage"]
branch = true
omit = [
    "*/tests/*",
    "*/test_*.py",
    "*/__init__.py",
]

[tool.coverage.report]
precision = 2
show_missing = true
skip_covered = false
exclude_lines = [
    "pragma: no cover",
    "def __repr__",
    "raise AssertionError",
    "raise NotImplementedError",
    "if __name__ == .__main__.:",
    "if TYPE_CHECKING:",
    "@abstractmethod",
]
```

### Coverage Targets

```python
# ❌ WRONG: Chasing 100% coverage
# File: utils.py
def format_user(user: dict) -> str:
    if user.get("middle_name"):  # Rare edge case
        return f"{user['first_name']} {user['middle_name']} {user['last_name']}"
    return f"{user['first_name']} {user['last_name']}"

class User:
    def __repr__(self):  # Debug helper
        return f"User({self.name})"

# Writing tests just for coverage:
def test_format_user_with_middle_name():  # Low-value test
    result = format_user({"first_name": "A", "middle_name": "B", "last_name": "C"})
    assert result == "A B C"

# ✅ CORRECT: Pragmatic coverage with exclusions
# File: utils.py
def format_user(user: dict) -> str:
    if user.get("middle_name"):
        return f"{user['first_name']} {user['middle_name']} {user['last_name']}"
    return f"{user['first_name']} {user['last_name']}"

class User:
    def __repr__(self):  # pragma: no cover
        return f"User({self.name})"

# Test main path, exclude rare edge cases
def test_format_user():
    result = format_user({"first_name": "Alice", "last_name": "Smith"})
    assert result == "Alice Smith"
```

**Coverage Guidelines:**
- **80% overall coverage:** Good target for most projects
- **100% for critical paths:** Payment, auth, security logic
- **Exclude boilerplate:** `__repr__`, type checking, debug code
- **Branch coverage:** More valuable than line coverage
- **Don't game metrics:** Tests should verify behavior, not boost numbers
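One option for holding new code to the target without chasing legacy files is diff-cover (`pip install diff_cover`), which measures coverage of changed lines only. A sketch, flags per its CLI:

```bash
pytest --cov=mypackage --cov-report=xml
diff-cover coverage.xml --compare-branch=origin/main --fail-under=80
```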
### Branch Coverage

```python
# Line coverage can reach 100% while still missing an edge case!
def process_payment(amount: float, currency: str) -> bool:
    if currency != "USD":                          # Line covered
        amount = convert_to_usd(amount, currency)  # Line covered
    return charge_usd(amount)                      # Line covered

def test_process_payment():
    result = process_payment(100.0, "EUR")
    assert result is True

# Line coverage: 3/3 = 100% ✓
# Branch coverage: 1/2 = 50% ✗  (the skip-conversion branch for USD never runs)

# ✅ CORRECT: Test both branches
def test_process_payment_eur():
    result = process_payment(100.0, "EUR")
    assert result is True

def test_process_payment_usd():
    result = process_payment(100.0, "USD")
    assert result is True

# Line coverage: 3/3 = 100% ✓
# Branch coverage: 2/2 = 100% ✓
```

**Why this matters:** Branch coverage catches untested code paths. Line coverage can show 100% while missing edge cases.

## Property-Based Testing

### Hypothesis Basics

```python
# Install: pip install hypothesis
from hypothesis import given, strategies as st

# ❌ WRONG: Only testing specific examples
def test_reverse_twice():
    assert reverse(reverse([1, 2, 3])) == [1, 2, 3]
    assert reverse(reverse([])) == []
    assert reverse(reverse([1])) == [1]

# ✅ CORRECT: Property-based test
@given(st.lists(st.integers()))
def test_reverse_twice_property(lst):
    """Reversing a list twice returns the original list."""
    assert reverse(reverse(lst)) == lst
# Hypothesis generates hundreds of test cases automatically

# ✅ CORRECT: Test mathematical properties
@given(st.integers(), st.integers())
def test_addition_commutative(a, b):
    """Addition is commutative: a + b == b + a"""
    assert add(a, b) == add(b, a)

@given(st.integers())
def test_addition_identity(a):
    """Adding zero is identity: a + 0 == a"""
    assert add(a, 0) == a

@given(st.lists(st.integers()))
def test_sort_idempotent(lst):
    """Sorting twice gives same result as sorting once."""
    assert sorted(sorted(lst)) == sorted(lst)
```

### Hypothesis Strategies

```python
import json
import math

from hypothesis import given, strategies as st

# ✅ Basic strategies
@given(st.integers())  # Any integer
def test_abs_positive(n):
    assert abs(n) >= 0

@given(st.integers(min_value=0, max_value=100))  # Bounded integers
def test_percentage(n):
    assert 0 <= n <= 100

@given(st.floats(allow_nan=False, allow_infinity=False))
def test_float_calculation(x):
    result = calculate(x)
    assert not math.isnan(result)

@given(st.text())  # Any unicode string
def test_encode_decode(s):
    assert decode(encode(s)) == s

@given(st.text(alphabet=st.characters(whitelist_categories=("Lu", "Ll"))))
def test_letters_only(s):
    # Only upper/lowercase letters
    assert s.isalpha() or len(s) == 0

# ✅ Composite strategies
@given(st.lists(st.integers(), min_size=1, max_size=10))
def test_list_operations(lst):
    assert 1 <= len(lst) <= 10

@given(st.dictionaries(keys=st.text(), values=st.integers()))
def test_dict_operations(d):
    serialized = json.dumps(d)
    assert json.loads(serialized) == d

# ✅ Custom strategies
from hypothesis.strategies import composite

@composite
def users(draw):
    """Generate test user dictionaries."""
    return {
        "name": draw(st.text(min_size=1, max_size=50)),
        "age": draw(st.integers(min_value=0, max_value=120)),
        "email": draw(st.emails()),
    }

@given(users())
def test_user_validation(user):
    validate_user(user)  # Should not raise
```
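Two knobs worth knowing once strategies are in place: `@example` pins a specific regression case that always runs, and `settings` tunes how hard Hypothesis tries. A sketch, with `parse` as an assumed function:

```python
from hypothesis import example, given, settings, strategies as st

@settings(max_examples=500, deadline=None)  # more cases, no per-example time limit
@given(st.text())
@example("")  # always re-test the empty string alongside generated cases
def test_parse_never_crashes(text):
    parse(text)  # should not raise
```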
### When to Use Property-Based Testing

```python
# ✅ Good use cases:

# 1. Round-trip properties (encode/decode, serialize/deserialize)
@given(st.dictionaries(st.text(), st.integers()))
def test_json_round_trip(data):
    assert json.loads(json.dumps(data)) == data

# 2. Invariants (properties that always hold)
@given(st.lists(st.integers()))
def test_sorted_is_ordered(lst):
    sorted_lst = sorted(lst)
    for i in range(len(sorted_lst) - 1):
        assert sorted_lst[i] <= sorted_lst[i + 1]

# 3. Comparison with reference implementation
@given(st.lists(st.integers()))
def test_custom_sort_matches_builtin(lst):
    assert custom_sort(lst) == sorted(lst)

# 4. Finding edge cases
@given(st.text())
def test_parse_never_crashes(text):
    # Should handle any input without crashing
    result = parse(text)
    assert result is None or isinstance(result, dict)

# ❌ Don't use for:
# - Testing exact output (use example-based tests)
# - Complex business logic (hard to express as properties)
# - External API calls (use mocking with examples)
```

**Why this matters:** Property-based tests find edge cases humans miss. Hypothesis generates thousands of test cases, including corner cases like empty lists, negative numbers, unicode edge cases.

## Test Architecture

### Test Pyramid

```
        /\
       /  \        E2E (few)
      /----\
     /      \      Integration (some)
    /--------\
   /          \    Unit (many)
  /------------\
```

**Unit Tests (70-80%):**
- Test individual functions/classes in isolation
- Fast (milliseconds)
- No external dependencies
- Use mocks for dependencies

**Integration Tests (15-25%):**
- Test components working together
- Slower (seconds)
- Real database/services when possible
- Test critical paths

**E2E Tests (5-10%):**
- Test entire system
- Slowest (minutes)
- Full stack: UI → API → Database
- Test critical user journeys only

### Unit vs Integration vs E2E

```python
# Unit test: Test function in isolation
def test_calculate_discount_unit():
    price = 100.0
    discount_percent = 20
    result = calculate_discount(price, discount_percent)
    assert result == 80.0

# Integration test: Test components together
def test_apply_discount_integration(database):
    # Uses real database
    product = database.create_product(name="Widget", price=100.0)
    coupon = database.create_coupon(code="SAVE20", discount_percent=20)

    result = apply_discount_to_product(product.id, coupon.code)

    assert result.final_price == 80.0
    assert database.get_product(product.id).price == 100.0  # Original unchanged

# E2E test: Test through API
def test_checkout_with_discount_e2e(api_client, database):
    # Setup test data
    api_client.post("/products", json={"name": "Widget", "price": 100.0})
    api_client.post("/coupons", json={"code": "SAVE20", "discount": 20})

    # User journey
    api_client.post("/cart/add", json={"product_id": 1, "quantity": 1})
    api_client.post("/cart/apply-coupon", json={"code": "SAVE20"})
    response = api_client.post("/checkout")

    assert response.status_code == 200
    assert response.json()["total"] == 80.0
```

### Test Organization Strategies

```
# Strategy 1: Mirror source structure
mypackage/
    users.py
    auth.py
    payments.py
tests/
    test_users.py
    test_auth.py
    test_payments.py

# Strategy 2: Separate by test type
tests/
    unit/
        test_users.py
        test_auth.py
    integration/
        test_user_auth_flow.py
        test_payment_flow.py
    e2e/
        test_checkout.py

# Strategy 3: Feature-based (for larger projects)
tests/
    users/
        test_registration.py
        test_authentication.py
        test_profile.py
    payments/
        test_checkout.py
        test_refunds.py
```

**Recommendation:** Start with Strategy 1 (mirror structure). Move to Strategy 2 when you have many integration/E2E tests. Use Strategy 3 for large projects with complex features.
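With Strategy 2's layout, a small `conftest.py` hook can apply markers by directory, so `pytest -m integration` selects tests by type. A sketch assuming pytest 7+ (for `item.path`) and markers registered in config:

```python
# File: tests/conftest.py
import pytest

def pytest_collection_modifyitems(config, items):
    """Auto-mark tests based on which directory they live in."""
    for item in items:
        if "integration" in str(item.path):
            item.add_marker(pytest.mark.integration)
        elif "e2e" in str(item.path):
            item.add_marker(pytest.mark.e2e)
```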
## Flaky Tests

### Identifying Flaky Tests

```bash
# Run tests multiple times to identify flakiness
pytest --count=100        # Requires pytest-repeat

# Run tests in random order
pytest --random-order     # Requires pytest-random-order

# Run tests in parallel (exposes race conditions)
pytest -n 4               # Requires pytest-xdist
```

### Common Causes and Fixes

#### 1. Test Order Dependencies

```python
# ❌ WRONG: Test depends on state from previous test
class TestUser:
    user = None

    def test_create_user(self):
        self.user = create_user("alice")
        assert self.user.name == "alice"

    def test_update_user(self):
        # Fails if run before test_create_user!
        self.user.name = "bob"
        assert self.user.name == "bob"

# ✅ CORRECT: Each test is independent
class TestUser:
    @pytest.fixture
    def user(self):
        return create_user("alice")

    def test_create_user(self):
        user = create_user("alice")
        assert user.name == "alice"

    def test_update_user(self, user):
        user.name = "bob"
        assert user.name == "bob"
```

#### 2. Time-Dependent Tests

```python
# ❌ WRONG: Test depends on current time
def test_expiration_wrong():
    from datetime import timedelta
    session = create_session(expires_in=timedelta(seconds=1))
    time.sleep(1)  # Flaky - might not be exactly 1 second
    assert session.is_expired()

# ✅ CORRECT: Mock time for deterministic tests
def test_expiration_correct(mocker):
    from datetime import datetime, timedelta

    start_time = datetime(2025, 1, 1, 12, 0, 0)
    mock_dt = mocker.patch("module.datetime")
    mock_dt.now.return_value = start_time

    session = create_session(expires_in=timedelta(hours=1))
    assert not session.is_expired()

    # Advance time
    mock_dt.now.return_value = start_time + timedelta(hours=2)
    assert session.is_expired()
```

#### 3. Async/Concurrency Issues

```python
# ❌ WRONG: Race condition with async code
async def test_concurrent_updates_wrong():
    counter = Counter(value=0)

    # These run concurrently, order undefined
    await asyncio.gather(
        counter.increment(),
        counter.increment(),
    )

    # Flaky - might be 1 or 2 depending on timing
    assert counter.value == 2

# ✅ CORRECT: Test with proper synchronization
async def test_concurrent_updates_correct():
    counter = ThreadSafeCounter(value=0)

    await asyncio.gather(
        counter.increment(),
        counter.increment(),
    )

    assert counter.value == 2  # ThreadSafeCounter ensures correctness

# ✅ CORRECT: Test for race conditions explicitly
async def test_detects_race_condition():
    unsafe_counter = Counter(value=0)

    # Run many times to trigger race condition
    for _ in range(100):
        await asyncio.gather(
            unsafe_counter.increment(),
            unsafe_counter.increment(),
        )
    # This should fail, proving there's a race condition
    # (Or pass if the code is actually thread-safe)
```

#### 4. External Dependencies

```python
# ❌ WRONG: Test depends on external service
def test_fetch_user_data_wrong():
    # Flaky - network issues, rate limits, service downtime
    response = requests.get("https://api.example.com/users/1")
    assert response.status_code == 200

# ✅ CORRECT: Mock external service
def test_fetch_user_data_correct(mocker):
    mock_response = mocker.Mock()
    mock_response.status_code = 200
    mock_response.json.return_value = {"id": 1, "name": "alice"}
    mocker.patch("requests.get", return_value=mock_response)

    response = fetch_user_data(1)
    assert response["name"] == "alice"
```
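For `requests`-based code, the `responses` library (`pip install responses`) stubs HTTP at the transport layer, which is less brittle than patching `requests.get` by name; `respx` plays the same role for httpx. A sketch against the assumed `fetch_user_data` above:

```python
import responses

@responses.activate
def test_fetch_user_data_stubbed():
    # Register a canned response for the exact URL the code will hit
    responses.add(
        responses.GET,
        "https://api.example.com/users/1",
        json={"id": 1, "name": "alice"},
        status=200,
    )
    assert fetch_user_data(1)["name"] == "alice"
```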
#### 5. Resource Leaks

```python
# ❌ WRONG: Not cleaning up resources
def test_file_operations_wrong():
    f = open("test.txt", "w")
    f.write("test")
    # File not closed - subsequent tests might fail
    assert os.path.exists("test.txt")

# ✅ CORRECT: Always cleanup
def test_file_operations_correct(tmp_path):
    test_file = tmp_path / "test.txt"
    with test_file.open("w") as f:
        f.write("test")
    assert test_file.exists()
    # File automatically closed, tmp_path automatically cleaned up

# ✅ CORRECT: Use fixtures for cleanup
@pytest.fixture
def test_file(tmp_path):
    file_path = tmp_path / "test.txt"
    yield file_path
    # Cleanup happens automatically via tmp_path
```

#### 6. Non-Deterministic Data

```python
# ❌ WRONG: Random or time-based data
def test_user_id_generation_wrong():
    user = create_user("alice")
    # Flaky - ID might be random or timestamp-based
    assert user.id == 1

# ✅ CORRECT: Mock or control randomness
def test_user_id_generation_correct(mocker):
    mocker.patch("module.generate_id", return_value="fixed-id-123")
    user = create_user("alice")
    assert user.id == "fixed-id-123"

# ✅ CORRECT: Use fixtures with deterministic data
@pytest.fixture
def fixed_random():
    import random
    random.seed(42)
    yield random
    # Reset seed if needed
```

### Debugging Flaky Tests

```python
# ✅ Strategy 1: Add retry decorator to identify flakiness
import pytest

@pytest.mark.flaky(reruns=3)  # Requires pytest-rerunfailures
def test_potentially_flaky():
    # Test that occasionally fails
    result = fetch_data()
    assert result is not None

# ✅ Strategy 2: Add logging to understand failures
import logging

def test_with_logging(caplog):
    caplog.set_level(logging.DEBUG)
    result = complex_operation()
    # Logs captured automatically
    assert "Expected step completed" in caplog.text
    assert result.success

# ✅ Strategy 3: Use test markers
@pytest.mark.flaky
def test_known_flaky():
    # Mark test as flaky while investigating
    ...

# Skip flaky tests in CI:
#   pytest -m "not flaky"
```
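pytest-rerunfailures also works from the command line, which is handy for measuring how often a suspect test actually fails before marking it:

```bash
# Retry every failing test up to 3 times, waiting 1 second between attempts
pytest --reruns 3 --reruns-delay 1
```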
## CI Integration

### GitHub Actions Example

```yaml
# File: .github/workflows/test.yml
name: Tests

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.10", "3.11", "3.12"]

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -e ".[dev]"

      - name: Run tests
        run: |
          pytest --cov=mypackage --cov-report=xml --cov-report=term-missing

      - name: Upload coverage
        uses: codecov/codecov-action@v3
        with:
          file: ./coverage.xml
          fail_ci_if_error: true
```

### Parallel Testing in CI

```yaml
# Run tests in parallel
- name: Run tests in parallel
  run: |
    pytest -n auto --dist loadscope

# Split tests across multiple jobs
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        test-group: [unit, integration, e2e]
    steps:
      - name: Run ${{ matrix.test-group }} tests
        run: |
          pytest tests/${{ matrix.test-group }}
```

### Test Configuration for CI

```toml
# File: pyproject.toml
[tool.pytest.ini_options]
# CI-friendly settings
addopts = [
    "--strict-markers",        # Fail on unknown markers
    "--strict-config",         # Fail on config errors
    "--cov=mypackage",
    "--cov-branch",
    "--cov-report=term-missing",
    "--cov-report=xml",
    "--cov-fail-under=80",     # Fail if coverage below 80%
    "-v",                      # Verbose output
]
markers = [
    "slow: marks tests as slow (deselect with '-m \"not slow\"')",
    "integration: integration tests",
    "e2e: end-to-end tests",
    "flaky: known flaky tests",
]

# Test discovery
testpaths = ["tests"]
python_files = ["test_*.py"]
python_classes = ["Test*"]
python_functions = ["test_*"]
```

### Environment-Specific Test Behavior

```python
import os
import pytest

# ✅ Skip tests in CI that require local resources
@pytest.mark.skipif(
    os.getenv("CI") == "true",
    reason="Requires local database"
)
def test_local_only():
    ...

# ✅ Use different fixtures in CI
@pytest.fixture
def database():
    if os.getenv("CI"):
        # Use containerized database in CI
        return DockerDatabase()
    else:
        # Use local database in development
        return LocalDatabase()

# ✅ Stricter timeouts in CI (requires pytest-timeout)
@pytest.mark.timeout(10 if os.getenv("CI") else 30)
def test_with_timeout():
    ...
```
## Advanced Patterns

### Snapshot Testing

```python
# Install: pip install syrupy

def test_api_response_snapshot(snapshot):
    """Test API response matches saved snapshot."""
    response = api.get_user(123)
    # First run: saves snapshot
    # Future runs: compares against snapshot
    assert response == snapshot

# Update snapshots when intentionally changed:
#   pytest --snapshot-update
```

### Mutation Testing

```python
# Install: pip install mutmut
# Run mutation testing:
#   mutmut run

# Mutation testing makes small changes (mutants) to your code and reruns the
# tests. If the tests still pass, your assertions are too weak.

# Example:
def is_even(n: int) -> bool:
    return n % 2 == 0

# Bad test - a single input lets some mutants survive:
def test_is_even():
    assert is_even(2) is True

# Good test:
def test_is_even():
    assert is_even(2) is True
    assert is_even(3) is False  # Would catch mutations
    assert is_even(0) is True
```

### Test Fixtures as Contract

```python
# ✅ Pattern: Fixtures define test contracts
@pytest.fixture
def valid_user() -> dict:
    """Fixture provides valid user that passes validation."""
    return {
        "name": "alice",
        "email": "alice@example.com",
        "age": 30,
    }

def test_user_validation_accepts_valid(valid_user):
    """Valid user fixture must pass validation."""
    validate_user(valid_user)  # Should not raise

def test_user_creation(valid_user):
    """Can create user from valid fixture."""
    user = create_user(**valid_user)
    assert user.name == "alice"

# If validation rules change, update fixture once
# All tests using fixture automatically get the update
```

## Decision Trees

### Which Test Type?

```
Unit test if:
- Testing single function/class
- No external dependencies (or can mock them)
- Fast (<10ms)

Integration test if:
- Testing multiple components
- Real database/services involved
- Moderate speed (<1s)

E2E test if:
- Testing full user journey
- Multiple systems involved
- Slow (>1s acceptable)
```

### When to Mock?

```
Mock if:
- External API/service
- Slow operation (network, disk I/O)
- Non-deterministic (time, random)
- Not the focus of the test

Don't mock if:
- Business logic under test
- Fast pure functions
- Simple data transformations
- Integration test (testing interaction)
```

### Fixture Scope?

```
function (default):
- Different state per test needed
- Cheap to create (<10ms)

class:
- Tests in class share setup
- Moderate creation cost

module:
- All tests in file can share
- Expensive setup (database)
- State reset between tests

session:
- One-time setup for all tests
- Very expensive (>1s)
- Read-only or stateless
```

## Anti-Patterns

### Testing Implementation Details

```python
# ❌ WRONG: Testing private methods
class UserService:
    def _validate_email(self, email: str) -> bool:
        return "@" in email

    def create_user(self, name: str, email: str) -> User:
        if not self._validate_email(email):
            raise ValueError("Invalid email")
        return User(name, email)

def test_validate_email_wrong():
    service = UserService()
    assert service._validate_email("test@example.com")  # Testing private method!

# ✅ CORRECT: Test public interface
def test_create_user_with_invalid_email():
    service = UserService()
    with pytest.raises(ValueError, match="Invalid email"):
        service.create_user("alice", "not-an-email")
```
### Tautological Tests

```python
# ❌ WRONG: Test that only proves code runs
def test_get_user():
    user = get_user(1)
    assert user == get_user(1)  # Proves nothing!

# ✅ CORRECT: Test expected behavior
def test_get_user():
    user = get_user(1)
    assert user.id == 1
    assert user.name is not None
    assert isinstance(user.email, str)
```

### Fragile String Matching

```python
# ❌ WRONG: Testing exact string matches (fragile)
def test_error_message():
    with pytest.raises(ValueError) as exc:
        validate_user({"name": ""})
    assert str(exc.value) == "Validation error: name must not be empty"
    # Breaks if message wording changes slightly

# ✅ CORRECT: Test meaningful parts
def test_error_message():
    with pytest.raises(ValueError) as exc:
        validate_user({"name": ""})
    error_msg = str(exc.value).lower()
    assert "name" in error_msg
    assert "empty" in error_msg or "required" in error_msg
```

### Slow Tests

```python
# ❌ WRONG: Sleeping in tests
def test_async_operation():
    start_operation()
    time.sleep(5)  # Waiting for operation to complete
    assert operation_complete()

# ✅ CORRECT: Poll with timeout
def test_async_operation():
    start_operation()
    timeout = 5
    start = time.time()
    while time.time() - start < timeout:
        if operation_complete():
            return
        time.sleep(0.1)
    pytest.fail("Operation did not complete within timeout")

# ✅ BETTER: Use async properly or mock
async def test_async_operation():
    await start_operation()
    assert await operation_complete()
```

## Integration with Other Skills

**After using this skill:**
- If tests are slow → See @debugging-and-profiling for profiling tests
- If setting up CI → See @project-structure-and-tooling for CI configuration
- If testing async code → See @async-patterns-and-concurrency for async testing patterns

**Before using this skill:**
- Set up pytest → Use @project-structure-and-tooling for pytest configuration in pyproject.toml

## Quick Reference

### Essential pytest Commands

```bash
# Run all tests
pytest

# Run specific file
pytest tests/test_users.py

# Run specific test
pytest tests/test_users.py::test_create_user

# Run tests matching pattern
pytest -k "user and not admin"

# Run with coverage
pytest --cov=mypackage --cov-report=term-missing

# Run in parallel
pytest -n auto

# Verbose output
pytest -v

# Stop on first failure
pytest -x

# Show local variables on failure
pytest -l

# Run last failed tests
pytest --lf

# Run failed first, then the rest
pytest --ff
```

### pytest Markers

```python
import pytest

@pytest.mark.skip(reason="Not implemented yet")
def test_future_feature():
    ...

@pytest.mark.skipif(sys.version_info < (3, 12), reason="Requires Python 3.12+")
def test_new_syntax():
    ...

@pytest.mark.xfail(reason="Known bug #123")
def test_buggy_feature():
    ...

@pytest.mark.parametrize("input,expected", [(1, 2), (2, 3)])
def test_increment(input, expected):
    ...

@pytest.mark.slow
def test_expensive_operation():
    ...
# Run: pytest -m "not slow"   # Skip slow tests
```

### Fixture Cheatsheet

```python
@pytest.fixture
def simple():
    return "value"

@pytest.fixture
def with_cleanup():
    resource = setup()
    yield resource
    cleanup(resource)

@pytest.fixture(scope="session")
def expensive():
    return expensive_setup()

@pytest.fixture
def factory():
    items = []

    def _create(**kwargs):
        item = create_item(**kwargs)
        items.append(item)
        return item

    yield _create
    for item in items:
        cleanup(item)

@pytest.fixture(params=["a", "b", "c"])
def parametrized(request):
    return request.param
```

### Coverage Targets

| Coverage Type | Good Target | Critical Code | Acceptable Minimum |
|---------------|-------------|---------------|--------------------|
| Line Coverage | 80% | 100% | 70% |
| Branch Coverage | 75% | 100% | 65% |
| Function Coverage | 90% | 100% | 80% |
**Priority order:**
1. Critical paths (auth, payments, security) → 100%
2. Business logic → 80-90%
3. Utility functions → 70-80%
4. Boilerplate → Can exclude

## Why This Matters

**Tests enable:**
- **Confident refactoring:** Change code knowing tests catch regressions
- **Living documentation:** Tests show how code is meant to be used
- **Design feedback:** Hard-to-test code often indicates design problems
- **Faster debugging:** Tests isolate problems to specific components

**Good tests are:**
- **Fast:** Milliseconds for unit tests, seconds for integration
- **Isolated:** No dependencies between tests
- **Repeatable:** Same result every time
- **Self-checking:** Pass/fail without manual inspection
- **Timely:** Written with or before code (TDD)

**Test smells:**
- Tests slower than the code being tested
- Tests breaking from unrelated changes
- Needing to change many tests for one feature change
- Tests that sometimes fail for no reason (flaky)
- Coverage gaps in critical paths

**Testing is not:**
- Proof of correctness (tests can show the presence of bugs, never their absence)
- Replacement for code review
- Substitute for good design
- Way to catch all bugs

**Testing is:**
- Safety net for refactoring
- Documentation of expected behavior
- Quick feedback on code quality
- Regression prevention