--- name: scientific-python-testing description: Write robust, maintainable tests for scientific Python packages using pytest best practices following Scientific Python community guidelines --- # Scientific Python Testing with pytest A comprehensive guide to writing effective tests for scientific Python packages using pytest, following the [Scientific Python Community guidelines](https://learn.scientific-python.org/development/guides/pytest/) and [testing tutorial](https://learn.scientific-python.org/development/tutorials/test/). This skill focuses on modern testing patterns, fixtures, parametrization, and best practices specific to scientific computing. ## Quick Reference Card **Common Testing Tasks - Quick Decisions:** ```python # 1. Basic test → Use simple assert def test_function(): assert result == expected # 2. Floating-point comparison → Use approx from pytest import approx assert result == approx(0.333, rel=1e-6) # 3. Testing exceptions → Use pytest.raises with pytest.raises(ValueError, match="must be positive"): function(-1) # 4. Multiple inputs → Use parametrize @pytest.mark.parametrize("input,expected", [(1,1), (2,4), (3,9)]) def test_square(input, expected): assert input**2 == expected # 5. Reusable setup → Use fixture @pytest.fixture def sample_data(): return np.array([1, 2, 3, 4, 5]) # 6. NumPy arrays → Use approx or numpy.testing assert np.mean(data) == approx(3.0) ``` **Decision Tree:** - Need multiple test cases with same logic? → **Parametrize** - Need reusable test data/setup? → **Fixture** - Testing floating-point results? → **pytest.approx** - Testing exceptions/warnings? → **pytest.raises / pytest.warns** - Complex numerical arrays? → **numpy.testing.assert_allclose** - Organizing by speed? → **Markers and separate directories** ## When to Use This Skill - Writing tests for scientific Python packages and libraries - Testing numerical algorithms and scientific computations - Setting up test infrastructure for research software - Implementing continuous integration for scientific code - Testing data analysis pipelines and workflows - Validating scientific simulations and models - Ensuring reproducibility and correctness of research code - Testing code that uses NumPy, SciPy, Pandas, and other scientific libraries ## Core Concepts ### 1. Why pytest for Scientific Python pytest is the de facto standard for testing Python packages because it: - **Simple syntax**: Just use Python's `assert` statement - **Detailed reporting**: Clear, informative failure messages - **Powerful features**: Fixtures, parametrization, marks, plugins - **Scientific ecosystem**: Native support for NumPy arrays, approximate comparisons - **Community standard**: Used by NumPy, SciPy, Pandas, scikit-learn, and more ### 2. Test Structure and Organization **Standard test directory layout:** ```text my-package/ ├── src/ │ └── my_package/ │ ├── __init__.py │ ├── analysis.py │ └── utils.py ├── tests/ │ ├── conftest.py │ ├── test_analysis.py │ └── test_utils.py └── pyproject.toml ``` **Key principles:** - Tests directory separate from source code (alongside `src/`) - Test files named `test_*.py` (pytest discovery) - Test functions named `test_*` (pytest discovery) - No `__init__.py` in tests directory (avoid importability issues) - Test against installed package, not local source ### 3. 
pytest Configuration Configure pytest in `pyproject.toml` (recommended for modern packages): ```toml [tool.pytest.ini_options] minversion = "7.0" addopts = [ "-ra", # Show summary of all test outcomes "--showlocals", # Show local variables in tracebacks "--strict-markers", # Error on undefined markers "--strict-config", # Error on config issues ] xfail_strict = true # xfail tests must fail filterwarnings = [ "error", # Treat warnings as errors ] log_cli_level = "info" # Log level for test output testpaths = [ "tests", # Limit pytest to tests directory ] ``` ## Testing Principles Following the [Scientific Python testing recommendations](https://learn.scientific-python.org/development/principles/testing/), effective testing provides multiple benefits and should follow key principles: ### Advantages of Testing - **Trustworthy code**: Well-tested code behaves as expected and can be relied upon - **Living documentation**: Tests communicate intent and expected behavior, validated with each run - **Preventing failure**: Tests protect against implementation errors and unexpected dependency changes - **Confidence when making changes**: Thorough test suites enable adding features, fixing bugs, and refactoring with confidence ### Fundamental Principles **1. Any test case is better than none** When in doubt, write the test that makes sense at the time: - Test critical behaviors, features, and logic - Write clear, expressive, well-documented tests - Tests are documentation of developer intentions - Good tests make it clear what they are testing and how Don't get bogged down in taxonomy when learning—focus on writing tests that work. **2. As long as that test is correct** It's surprisingly easy to write tests that pass when they should fail: - **Check that your test fails when it should**: Deliberately break the code and verify the test fails - **Keep it simple**: Excessive mocks and fixtures make it difficult to know what's being tested - **Test one thing at a time**: A single test should test a single behavior **3. Start with Public Interface Tests** Begin by testing from the perspective of a user: - Test code as users will interact with it - Keep tests simple and readable for documentation purposes - Focus on supported use cases - Avoid testing private attributes - Minimize use of mocks/patches **4. Organize Tests into Suites** Divide tests by type and execution time for efficiency: - **Unit tests**: Fast, isolated tests of individual components - **Integration tests**: Tests of component interactions and dependencies - **End-to-end tests**: Complete workflow testing Benefits: - Run relevant tests quickly and frequently - "Fail fast" by running fast suites first - Easier to read and reason about - Avoid false positives from expected external failures ### Outside-In Testing Approach The recommended approach is **outside-in**, starting from the user's perspective: 1. **Public Interface Tests**: Test from user perspective, focusing on behavior and features 2. **Integration Tests**: Test that components work together and with dependencies 3. **Unit Tests**: Test individual units in isolation, optimized for speed This approach ensures you're building the right thing before optimizing implementation details. 
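For example, here is a minimal, self-contained sketch of those three layers for a toy unit-conversion helper (the function and test names are illustrative only; in a real package the helper would live under `src/` and each test would sit in its own suite directory):

```python
from pytest import approx


def to_kilometres(metres):
    """Toy function under test: convert metres to kilometres."""
    return metres / 1000.0


# 1. Public interface test: exercise the feature exactly as a user would
def test_to_kilometres_user_facing():
    assert to_kilometres(1500.0) == approx(1.5)


# 2. Integration test: the conversion working together with file I/O
def test_to_kilometres_from_file(tmp_path):
    path = tmp_path / "lengths.txt"
    path.write_text("1500\n250\n")
    values = [to_kilometres(float(line)) for line in path.read_text().splitlines()]
    assert values == approx([1.5, 0.25])


# 3. Unit test: one small behaviour in isolation, fast to run
def test_to_kilometres_zero():
    assert to_kilometres(0.0) == approx(0.0)
```

Writing the public interface test first keeps the focus on behavior; the unit tests are the ones you run most often because they are the fastest.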
## Quick Start ### Minimal Test Example ```python # tests/test_basic.py def test_simple_math(): """Test basic arithmetic.""" assert 4 == 2**2 def test_string_operations(): """Test string methods.""" result = "hello world".upper() assert result == "HELLO WORLD" assert "HELLO" in result ``` ### Scientific Test Example ```python # tests/test_scientific.py import numpy as np from pytest import approx from my_package.analysis import compute_mean, fit_linear def test_compute_mean(): """Test mean calculation.""" data = np.array([1.0, 2.0, 3.0, 4.0, 5.0]) result = compute_mean(data) assert result == approx(3.0) def test_fit_linear(): """Test linear regression.""" x = np.array([0, 1, 2, 3, 4]) y = np.array([0, 2, 4, 6, 8]) slope, intercept = fit_linear(x, y) assert slope == approx(2.0) assert intercept == approx(0.0) ``` ## Testing Best Practices ### Pattern 1: Writing Simple, Focused Tests **Bad - Multiple assertions testing different things:** ```python def test_everything(): data = load_data("input.csv") assert len(data) > 0 processed = process_data(data) assert processed.mean() > 0 result = analyze(processed) assert result.success ``` **Good - Separate tests for each behavior:** ```python def test_load_data_returns_nonempty(): """Data loading should return at least one row.""" data = load_data("input.csv") assert len(data) > 0 def test_process_data_positive_mean(): """Processed data should have positive mean.""" data = load_data("input.csv") processed = process_data(data) assert processed.mean() > 0 def test_analyze_succeeds(): """Analysis should complete successfully.""" data = load_data("input.csv") processed = process_data(data) result = analyze(processed) assert result.success ``` **Arrange-Act-Assert pattern:** ```python def test_computation(): # Arrange - Set up test data data = np.array([1, 2, 3, 4, 5]) expected = 3.0 # Act - Execute the function result = compute_mean(data) # Assert - Check the result assert result == approx(expected) ``` ### Pattern 2: Testing for Failures Always test that your code raises appropriate exceptions: ```python import pytest def test_zero_division_raises(): """Division by zero should raise ZeroDivisionError.""" with pytest.raises(ZeroDivisionError): result = 1 / 0 def test_invalid_input_raises(): """Invalid input should raise ValueError.""" with pytest.raises(ValueError, match="must be positive"): result = compute_sqrt(-1) def test_deprecation_warning(): """Deprecated function should warn.""" with pytest.warns(DeprecationWarning): result = old_function() def test_deprecated_call(): """Check for deprecated API usage.""" with pytest.deprecated_call(): result = legacy_api() ``` ### Pattern 3: Approximate Comparisons Scientific computing often involves floating-point arithmetic that cannot be tested for exact equality: **For scalars:** ```python from pytest import approx def test_approximate_scalar(): """Test with approximate comparison.""" result = 1 / 3 assert result == approx(0.33333333333, rel=1e-10) # Default relative tolerance is 1e-6 assert 0.3 + 0.3 == approx(0.6) def test_approximate_with_absolute_tolerance(): """Test with absolute tolerance.""" result = compute_small_value() assert result == approx(0.0, abs=1e-10) ``` **For NumPy arrays (preferred over numpy.testing):** ```python import numpy as np from pytest import approx def test_array_approximate(): """Test NumPy arrays with approx.""" result = np.array([0.1, 0.2, 0.3]) expected = np.array([0.10001, 0.20001, 0.30001]) assert result == approx(expected) def test_array_with_nan(): """Handle NaN values 
in arrays.""" result = np.array([1.0, np.nan, 3.0]) expected = np.array([1.0, np.nan, 3.0]) assert result == approx(expected, nan_ok=True) ``` **When to use numpy.testing:** ```python import numpy as np from numpy.testing import assert_allclose, assert_array_equal def test_exact_integer_array(): """Use numpy.testing for exact integer comparisons.""" result = np.array([1, 2, 3]) expected = np.array([1, 2, 3]) assert_array_equal(result, expected) def test_complex_array_tolerances(): """Use numpy.testing for complex tolerance requirements.""" result = compute_result() expected = load_reference() assert_allclose(result, expected, rtol=1e-7, atol=1e-10) ``` ### Pattern 4: Using Fixtures Fixtures provide reusable test setup and teardown: **Basic fixtures:** ```python import pytest import numpy as np @pytest.fixture def sample_data(): """Provide sample data for tests.""" return np.array([1.0, 2.0, 3.0, 4.0, 5.0]) @pytest.fixture def empty_array(): """Provide empty array for edge case tests.""" return np.array([]) def test_mean_with_fixture(sample_data): """Test using fixture.""" result = np.mean(sample_data) assert result == approx(3.0) def test_empty_array(empty_array): """Test edge case with empty array.""" with pytest.warns(RuntimeWarning): result = np.mean(empty_array) assert np.isnan(result) ``` **Fixtures with setup and teardown:** ```python import pytest import tempfile from pathlib import Path @pytest.fixture def temp_datafile(): """Create temporary data file for tests.""" # Setup tmpfile = tempfile.NamedTemporaryFile(mode='w', delete=False, suffix='.txt') tmpfile.write("1.0\n2.0\n3.0\n") tmpfile.close() # Provide to test yield Path(tmpfile.name) # Teardown Path(tmpfile.name).unlink() def test_load_data(temp_datafile): """Test data loading from file.""" data = np.loadtxt(temp_datafile) assert len(data) == 3 assert data[0] == approx(1.0) ``` **Fixture scopes:** ```python @pytest.fixture(scope="function") # Default, run for each test def data_per_test(): return create_data() @pytest.fixture(scope="class") # Run once per test class def data_per_class(): return create_data() @pytest.fixture(scope="module") # Run once per module def data_per_module(): return load_large_dataset() @pytest.fixture(scope="session") # Run once per test session def database_connection(): conn = create_connection() yield conn conn.close() ``` **Auto-use fixtures:** ```python @pytest.fixture(autouse=True) def reset_random_seed(): """Reset random seed before each test for reproducibility.""" np.random.seed(42) ``` ### Pattern 5: Parametrized Tests Test the same function with multiple inputs: **Basic parametrization:** ```python import pytest @pytest.mark.parametrize("input_val,expected", [ (0, 0), (1, 1), (2, 4), (3, 9), (-2, 4), ]) def test_square(input_val, expected): """Test squaring with multiple inputs.""" assert input_val**2 == expected @pytest.mark.parametrize("angle", [0, np.pi/6, np.pi/4, np.pi/3, np.pi/2]) def test_sine_range(angle): """Test sine function returns values in [0, 1] for first quadrant.""" result = np.sin(angle) assert 0 <= result <= 1 ``` **Multiple parameters:** ```python @pytest.mark.parametrize("n_air,n_water", [ (1.0, 1.33), (1.0, 1.5), (1.5, 1.0), ]) def test_refraction(n_air, n_water): """Test Snell's law with different refractive indices.""" angle_in = np.pi / 4 angle_out = snell(angle_in, n_air, n_water) assert angle_out >= 0 assert angle_out <= np.pi / 2 ``` **Parametrized fixtures:** ```python @pytest.fixture(params=[1, 2, 3], ids=["one", "two", "three"]) def dimension(request): 
"""Parametrized fixture for different dimensions.""" return request.param def test_array_creation(dimension): """Test array creation in different dimensions.""" shape = tuple([10] * dimension) arr = np.zeros(shape) assert arr.ndim == dimension assert arr.shape == shape ``` **Combining parametrization with custom IDs:** ```python @pytest.mark.parametrize( "data,expected", [ (np.array([1, 2, 3]), 2.0), (np.array([1, 1, 1]), 1.0), (np.array([0, 10]), 5.0), ], ids=["sequential", "constant", "extremes"] ) def test_mean_with_ids(data, expected): """Test mean with descriptive test IDs.""" assert np.mean(data) == approx(expected) ``` ### Pattern 6: Test Organization with Markers Use markers to organize and selectively run tests: **Basic markers:** ```python import pytest @pytest.mark.slow def test_expensive_computation(): """Test that takes a long time.""" result = run_simulation(n_iterations=1000000) assert result.converged @pytest.mark.requires_gpu def test_gpu_acceleration(): """Test that requires GPU hardware.""" result = compute_on_gpu(large_array) assert result.success @pytest.mark.integration def test_full_pipeline(): """Integration test for complete workflow.""" data = load_data() processed = preprocess(data) result = analyze(processed) output = save_results(result) assert output.exists() ``` **Running specific markers:** ```bash pytest -m slow # Run only slow tests pytest -m "not slow" # Skip slow tests pytest -m "slow or gpu" # Run slow OR gpu tests pytest -m "slow and integration" # Run slow AND integration tests ``` **Skip and xfail markers:** ```python @pytest.mark.skip(reason="Feature not implemented yet") def test_future_feature(): """Test for feature under development.""" result = future_function() assert result.success @pytest.mark.skipif(sys.version_info < (3, 10), reason="Requires Python 3.10+") def test_pattern_matching(): """Test using Python 3.10+ features.""" match value: case 0: result = "zero" case _: result = "other" assert result == "zero" @pytest.mark.xfail(reason="Known bug in upstream library") def test_known_failure(): """Test that currently fails due to known issue.""" result = buggy_function() assert result == expected @pytest.mark.xfail(strict=True) def test_must_fail(): """Test that MUST fail (test will fail if it passes).""" with pytest.raises(NotImplementedError): unimplemented_function() ``` **Custom markers in pyproject.toml:** ```toml [tool.pytest.ini_options] markers = [ "slow: marks tests as slow (deselect with '-m \"not slow\"')", "requires_gpu: marks tests that need GPU hardware", "integration: marks tests as integration tests", "unit: marks tests as unit tests", ] ``` ### Pattern 6b: Organizing Test Suites by Directory Following [Scientific Python recommendations](https://learn.scientific-python.org/development/principles/testing/#test-suites), organize tests into separate directories by type and execution time: ```text tests/ ├── unit/ # Fast, isolated unit tests │ ├── conftest.py │ ├── test_analysis.py │ └── test_utils.py ├── integration/ # Integration tests │ ├── conftest.py │ └── test_pipeline.py ├── e2e/ # End-to-end tests │ └── test_workflows.py └── conftest.py # Shared fixtures ``` **Run specific test suites:** ```bash # Run only unit tests (fast) pytest tests/unit/ # Run integration tests after unit tests pass pytest tests/integration/ # Run all tests pytest ``` **Auto-mark all tests in a directory using conftest.py:** ```python # tests/unit/conftest.py import pytest def pytest_collection_modifyitems(session, config, items): """Automatically 
mark all tests in this directory as unit tests.""" for item in items: item.add_marker(pytest.mark.unit) ``` **Benefits of organized test suites:** - Run fast tests first ("fail fast" principle) - Developers can run relevant tests quickly - Clear separation of test types - Avoid false positives from slow/flaky tests - Better CI/CD optimization **Example test runner strategy:** ```bash # Run fast unit tests first, stop on failure pytest tests/unit/ -x || exit 1 # If unit tests pass, run integration tests pytest tests/integration/ -x || exit 1 # Finally run slow end-to-end tests pytest tests/e2e/ ``` ### Pattern 7: Mocking and Monkeypatching Mock expensive operations or external dependencies: **Basic monkeypatching:** ```python import platform def test_platform_specific_behavior(monkeypatch): """Test behavior on different platforms.""" # Mock platform.system() to return "Linux" monkeypatch.setattr(platform, "system", lambda: "Linux") result = get_platform_specific_path() assert result == "/usr/local/data" # Change mock to return "Windows" monkeypatch.setattr(platform, "system", lambda: "Windows") result = get_platform_specific_path() assert result == r"C:\Users\data" ``` **Mocking with pytest-mock:** ```python import pytest from unittest.mock import Mock def test_expensive_computation(mocker): """Mock expensive computation.""" # Mock the expensive function mock_compute = mocker.patch("my_package.analysis.expensive_compute") mock_compute.return_value = 42 result = run_analysis() # Verify the mock was called mock_compute.assert_called_once() assert result == 42 def test_matplotlib_plotting(mocker): """Test plotting without creating actual plots.""" mock_plt = mocker.patch("matplotlib.pyplot") create_plot(data) # Verify plot was created mock_plt.figure.assert_called_once() mock_plt.plot.assert_called_once() mock_plt.savefig.assert_called_once_with("output.png") ``` **Fixture for repeated mocking:** ```python @pytest.fixture def mock_matplotlib(mocker): """Mock matplotlib for testing plots.""" fig = mocker.Mock(spec=plt.Figure) ax = mocker.Mock(spec=plt.Axes) line2d = mocker.Mock(name="plot", spec=plt.Line2D) ax.plot.return_value = (line2d,) mpl = mocker.patch("matplotlib.pyplot", autospec=True) mocker.patch("matplotlib.pyplot.subplots", return_value=(fig, ax)) return {"fig": fig, "ax": ax, "mpl": mpl} def test_my_plot(mock_matplotlib): """Test plotting function.""" ax = mock_matplotlib["ax"] my_plotting_function(ax=ax) ax.plot.assert_called_once() ax.set_xlabel.assert_called_once() ``` ### Pattern 8: Testing Against Installed Version Always test the installed package, not local source: **Why this matters:** ``` my-package/ ├── src/ │ └── my_package/ │ ├── __init__.py │ ├── data/ # Data files │ │ └── reference.csv │ └── analysis.py └── tests/ └── test_analysis.py ``` **Use src/ layout + editable install:** ```bash # Install in editable mode pip install -e . 
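# Optionally install test/dev extras in the same step; the CI example later in
# this guide assumes a "dev" extra defined in pyproject.toml:
# pip install -e ".[dev]"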
# Run tests against the installed version
pytest
```

**Benefits:**

- Tests ensure the package installs correctly
- Catches missing files (like data files)
- Tests work in CI/CD environments
- Validates package structure and imports

**In tests, import from the package:**

```python
# Good - imports the installed package
from my_package.analysis import compute_mean

# Bad - with a flat layout (no src/), this could silently import the local
# module from the repository checkout instead of the installed package
# from analysis import compute_mean
```

### Pattern 8b: Import Best Practices in Tests

Following [Scientific Python unit testing guidelines](https://learn.scientific-python.org/development/principles/testing/#unit-tests), proper import patterns make tests more maintainable:

**Keep imports local to the file under test:**

```python
# Good - Import from the file being tested
from my_package.analysis import MyClass, compute_mean

def test_compute_mean():
    """Test imports from the module under test."""
    data = MyClass()
    result = compute_mean(data)
    assert result > 0
```

**Why this matters:**

- When code is refactored and symbols move, tests don't break
- Tests only care about symbols used in the file under test
- Reduces coupling between tests and internal code organization

**Import specific symbols, not entire modules:**

```python
# Good - Specific imports, easy to mock
from numpy import mean as np_mean, ndarray as NpArray

def my_function(data: NpArray) -> float:
    return np_mean(data)

# Good - Easy to patch in tests
def test_my_function(mocker):
    mock_mean = mocker.patch("my_package.analysis.np_mean")
    # ...
```

```python
# Less ideal - Harder to mock effectively
import numpy as np

def my_function(data: np.ndarray) -> float:
    return np.mean(data)

# Less ideal - Complex patching required
def test_my_function(mocker):
    # Must patch through the aliased namespace
    mock_mean = mocker.patch("my_package.analysis.np.mean")
    # ...
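    # Caveat: "my_package.analysis.np" resolves to the shared numpy module itself,
    # so this patch replaces numpy.mean everywhere for the duration of the test,
    # not just inside my_package.analysis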
```

**Consider meaningful aliases:**

```python
# Make imports meaningful to your domain
from numpy import sum as numeric_sum
from scipy.stats import ttest_ind as statistical_test

# Easy to understand and replace
result = numeric_sum(values)
statistic, p_value = statistical_test(group1, group2)
```

This approach makes it easier to:

- Replace implementations without changing tests
- Mock dependencies effectively
- Understand code purpose from import names

## Running pytest

### Basic Usage

```bash
# Run all tests
pytest

# Run specific file
pytest tests/test_analysis.py

# Run specific test
pytest tests/test_analysis.py::test_mean

# Run tests matching pattern
pytest -k "mean or median"

# Verbose output
pytest -v

# Show local variables in failures
pytest -l  # or --showlocals

# Stop at first failure
pytest -x

# Show stdout/stderr
pytest -s
```

### Debugging Tests

```bash
# Drop into debugger on failure
pytest --pdb

# Drop into debugger at start of each test
pytest --trace

# Run last failed tests
pytest --lf

# Run failed tests first, then rest
pytest --ff

# Show which tests would be run (dry run)
pytest --collect-only
```

### Coverage

```bash
# Install pytest-cov
pip install pytest-cov

# Run with coverage
pytest --cov=my_package

# With coverage report
pytest --cov=my_package --cov-report=html

# With missing lines
pytest --cov=my_package --cov-report=term-missing

# Fail if coverage below threshold
pytest --cov=my_package --cov-fail-under=90
```

**Configure in pyproject.toml:**

```toml
[tool.pytest.ini_options]
addopts = [
    "--cov=my_package",
    "--cov-report=term-missing",
    "--cov-report=html",
]

[tool.coverage.run]
source = ["src"]
omit = [
    "*/tests/*",
    "*/__init__.py",
]

[tool.coverage.report]
exclude_lines = [
    "pragma: no cover",
    "def __repr__",
    "raise AssertionError",
    "raise NotImplementedError",
    "if __name__ == .__main__.:",
    "if TYPE_CHECKING:",
    "@abstractmethod",
]
```

## Scientific Python Testing Patterns

### Pattern 9: Testing Numerical Algorithms

```python
import numpy as np
from pytest import approx

def test_numerical_stability():
    """Test algorithm is numerically stable."""
    data = np.array([1e10, 1.0, -1e10])
    result = stable_sum(data)
    assert result == approx(1.0)

def test_convergence():
    """Test iterative algorithm converges."""
    x0 = np.array([1.0, 1.0, 1.0])
    result = iterative_solver(x0, tol=1e-8, max_iter=1000)
    assert result.converged
    assert result.iterations < 1000
    assert result.residual < 1e-8

def test_against_analytical_solution():
    """Test against known analytical result."""
    x = np.linspace(0, 1, 100)
    numerical = compute_integral(lambda t: t**2, x)
    analytical = x**3 / 3
    assert numerical == approx(analytical, rel=1e-6)

def test_conservation_law():
    """Test that physical conservation law holds."""
    initial_energy = compute_energy(system)
    system.evolve(dt=0.01, steps=1000)
    final_energy = compute_energy(system)
    # Energy should be conserved (within numerical error)
    assert final_energy == approx(initial_energy, rel=1e-10)
```

### Pattern 10: Testing with Different NumPy dtypes

```python
@pytest.mark.parametrize("dtype", [
    np.float32,
    np.float64,
    np.complex64,
    np.complex128,
])
def test_computation_dtypes(dtype):
    """Test function works with different dtypes."""
    data = np.array([1, 2, 3, 4, 5], dtype=dtype)
    result = compute_transform(data)
    assert result.dtype == dtype
    assert result.shape == data.shape

@pytest.mark.parametrize("dtype", [np.int32, np.int64, np.float32, np.float64])
def test_integer_and_float_types(dtype):
    """Test handling of integer and float types."""
    arr = np.array([1, 2, 3], dtype=dtype)
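    # NumPy true division promotes integer dtypes to float64, which is what the
    # assertion below relies on (assuming safe_divide divides with "/")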
result = safe_divide(arr, 2) # Result should be floating point assert result.dtype in [np.float32, np.float64] ``` ### Pattern 11: Testing Random/Stochastic Code ```python def test_random_with_seed(): """Test random code with fixed seed for reproducibility.""" np.random.seed(42) result1 = generate_random_samples(n=100) np.random.seed(42) result2 = generate_random_samples(n=100) # Should get identical results with same seed assert np.array_equal(result1, result2) def test_statistical_properties(): """Test statistical properties of random output.""" np.random.seed(123) samples = generate_normal_samples(n=100000, mean=0, std=1) # Test mean and std are close to expected (not exact due to randomness) assert np.mean(samples) == approx(0, abs=0.01) assert np.std(samples) == approx(1, abs=0.01) @pytest.mark.parametrize("seed", [42, 123, 456]) def test_reproducibility_with_seeds(seed): """Test reproducibility with different seeds.""" np.random.seed(seed) result = stochastic_algorithm() # Should complete successfully regardless of seed assert result.success ``` ### Pattern 12: Testing Data Pipelines ```python def test_pipeline_end_to_end(tmp_path): """Test complete data pipeline.""" # Arrange - Create input data input_file = tmp_path / "input.csv" input_file.write_text("x,y\n1,2\n3,4\n5,6\n") output_file = tmp_path / "output.csv" # Act - Run pipeline result = run_pipeline(input_file, output_file) # Assert - Check results assert result.success assert output_file.exists() output_data = np.loadtxt(output_file, delimiter=",", skiprows=1) assert len(output_data) == 3 def test_pipeline_stages_independently(): """Test each pipeline stage separately.""" # Test stage 1 raw_data = load_data("input.csv") assert len(raw_data) > 0 # Test stage 2 cleaned = clean_data(raw_data) assert not np.any(np.isnan(cleaned)) # Test stage 3 transformed = transform_data(cleaned) assert transformed.shape == cleaned.shape # Test stage 4 result = analyze_data(transformed) assert result.metrics["r2"] > 0.9 ``` ### Pattern 13: Property-Based Testing with Hypothesis For complex scientific code, consider property-based testing: ```python from hypothesis import given, strategies as st from hypothesis.extra.numpy import arrays import numpy as np @given(arrays(np.float64, shape=st.integers(1, 100))) def test_mean_is_bounded(arr): """Mean should be between min and max.""" if len(arr) > 0 and not np.any(np.isnan(arr)): mean = np.mean(arr) assert np.min(arr) <= mean <= np.max(arr) @given( x=arrays(np.float64, shape=10, elements=st.floats(-100, 100)), y=arrays(np.float64, shape=10, elements=st.floats(-100, 100)) ) def test_linear_fit_properties(x, y): """Test properties of linear regression.""" if not (np.any(np.isnan(x)) or np.any(np.isnan(y))): slope, intercept = fit_linear(x, y) # Predictions should be finite predictions = slope * x + intercept assert np.all(np.isfinite(predictions)) ``` ## Test Configuration Examples ### Complete pyproject.toml Testing Section ```toml [tool.pytest.ini_options] minversion = "7.0" addopts = [ "-ra", # Show summary of all test outcomes "--showlocals", # Show local variables in tracebacks "--strict-markers", # Error on undefined markers "--strict-config", # Error on config issues "--cov=my_package", # Coverage for package "--cov-report=term-missing", # Show missing lines "--cov-report=html", # HTML coverage report ] xfail_strict = true # xfail tests must fail filterwarnings = [ "error", # Treat warnings as errors "ignore::DeprecationWarning:pkg_resources", # Ignore specific warning 
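    # Filters use the warnings-filter syntax "action::Category:module"; when a
    # warning matches several entries, pytest applies the last match, so these
    # ignores override the blanket "error" above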
"ignore::PendingDeprecationWarning", ] log_cli_level = "info" # Log level for test output testpaths = [ "tests", # Test directory ] markers = [ "slow: marks tests as slow (deselect with '-m \"not slow\"')", "integration: marks tests as integration tests", "requires_gpu: marks tests that need GPU hardware", ] [tool.coverage.run] source = ["src"] omit = [ "*/tests/*", "*/__init__.py", "*/conftest.py", ] branch = true # Measure branch coverage [tool.coverage.report] exclude_lines = [ "pragma: no cover", "def __repr__", "raise AssertionError", "raise NotImplementedError", "if __name__ == .__main__.:", "if TYPE_CHECKING:", "@abstractmethod", ] precision = 2 show_missing = true skip_covered = false ``` ### conftest.py for Shared Fixtures ```python # tests/conftest.py import pytest import numpy as np from pathlib import Path @pytest.fixture(scope="session") def test_data_dir(): """Provide path to test data directory.""" return Path(__file__).parent / "data" @pytest.fixture def sample_array(): """Provide sample NumPy array.""" np.random.seed(42) return np.random.randn(100) @pytest.fixture def temp_output_dir(tmp_path): """Provide temporary directory for test outputs.""" output_dir = tmp_path / "output" output_dir.mkdir() return output_dir @pytest.fixture(autouse=True) def reset_random_state(): """Reset random state before each test.""" np.random.seed(42) @pytest.fixture(scope="session") def large_dataset(): """Load large dataset once per test session.""" return load_reference_data() # Platform-specific fixtures @pytest.fixture(params=["Linux", "Darwin", "Windows"]) def platform_name(request, monkeypatch): """Parametrize tests across platforms.""" monkeypatch.setattr("platform.system", lambda: request.param) return request.param ``` ## Common Testing Pitfalls and Solutions ### Pitfall 1: Testing Implementation Instead of Behavior **Bad:** ```python def test_uses_numpy_mean(): """Test that function uses np.mean.""" # Testing implementation! # This is fragile - breaks if implementation changes pass ``` **Good:** ```python def test_computes_correct_average(): """Test that function returns correct average.""" data = np.array([1, 2, 3, 4, 5]) result = compute_average(data) assert result == approx(3.0) ``` ### Pitfall 2: Non-Deterministic Tests **Bad:** ```python def test_random_sampling(): samples = generate_samples() # Uses random seed from system time! assert samples[0] > 0 # Might fail randomly ``` **Good:** ```python def test_random_sampling(): np.random.seed(42) # Fixed seed samples = generate_samples() assert samples[0] == approx(0.4967, rel=1e-4) ``` ### Pitfall 3: Exact Floating-Point Comparisons **Bad:** ```python def test_computation(): result = 0.1 + 0.2 assert result == 0.3 # Fails due to floating-point error! 
``` **Good:** ```python def test_computation(): result = 0.1 + 0.2 assert result == approx(0.3) ``` ### Pitfall 4: Testing Too Much in One Test **Bad:** ```python def test_entire_analysis(): # Load data data = load_data() assert data is not None # Process processed = process(data) assert len(processed) > 0 # Analyze result = analyze(processed) assert result.score > 0.8 # Save save_results(result, "output.txt") assert Path("output.txt").exists() ``` **Good:** ```python def test_load_data_succeeds(): data = load_data() assert data is not None def test_process_returns_nonempty(): data = load_data() processed = process(data) assert len(processed) > 0 def test_analyze_gives_good_score(): data = load_data() processed = process(data) result = analyze(processed) assert result.score > 0.8 def test_save_results_creates_file(tmp_path): output_file = tmp_path / "output.txt" result = create_mock_result() save_results(result, output_file) assert output_file.exists() ``` ## Testing Checklist - [ ] Tests are in `tests/` directory separate from source - [ ] Test files named `test_*.py` - [ ] Test functions named `test_*` - [ ] Tests run against installed package (use src/ layout) - [ ] pytest configured in `pyproject.toml` - [ ] Using `pytest.approx` for floating-point comparisons - [ ] Tests check exceptions with `pytest.raises` - [ ] Tests check warnings with `pytest.warns` - [ ] Parametrized tests for multiple inputs - [ ] Fixtures for reusable setup - [ ] Markers used for test organization - [ ] Random tests use fixed seeds - [ ] Tests are independent (can run in any order) - [ ] Each test focuses on one behavior - [ ] Coverage > 80% (preferably > 90%) - [ ] All tests pass before committing - [ ] Slow tests marked with `@pytest.mark.slow` - [ ] Integration tests marked appropriately - [ ] CI configured to run tests automatically ## Continuous Integration ### GitHub Actions Example ```yaml # .github/workflows/tests.yml name: Tests on: push: branches: [main] pull_request: branches: [main] jobs: test: runs-on: ${{ matrix.os }} strategy: matrix: os: [ubuntu-latest, macos-latest, windows-latest] python-version: ["3.9", "3.10", "3.11", "3.12"] steps: - uses: actions/checkout@v4 - name: Set up Python ${{ matrix.python-version }} uses: actions/setup-python@v5 with: python-version: ${{ matrix.python-version }} - name: Install dependencies run: | python -m pip install --upgrade pip pip install -e ".[dev]" - name: Run tests run: | pytest --cov=my_package --cov-report=xml - name: Upload coverage uses: codecov/codecov-action@v4 with: files: ./coverage.xml ``` ## Resources - **Scientific Python pytest Guide**: - **Scientific Python Testing Tutorial**: - **Scientific Python Testing Principles**: - **pytest Documentation**: - **pytest-cov**: - **pytest-mock**: - **Hypothesis (property-based testing)**: - **NumPy testing utilities**: - **Testing best practices**: ## Summary Testing scientific Python code with pytest, following Scientific Python community principles, provides: 1. **Confidence**: Know your code works correctly 2. **Reproducibility**: Ensure consistent behavior across environments 3. **Documentation**: Tests show how code should be used and communicate developer intent 4. **Refactoring safety**: Change code without breaking functionality 5. **Regression prevention**: Catch bugs before they reach users 6. 
**Scientific rigor**: Validate numerical accuracy and physical correctness **Key testing principles:** - Start with **public interface tests** from the user's perspective - Organize tests into **suites** (unit, integration, e2e) by type and speed - Follow **outside-in** approach: public interface → integration → unit tests - Keep tests **simple, focused, and independent** - Test **behavior rather than implementation** - Use pytest's powerful features (fixtures, parametrization, markers) effectively - Always verify tests **fail when they should** to avoid false confidence **Remember**: Any test is better than none, but well-organized tests following these principles create trustworthy, maintainable scientific software that the community can rely on.