commit e3cbcb7998dc8a9aff3a93c73074ac3b0144d375 Author: Zhongwei Li Date: Sun Nov 30 09:04:06 2025 +0800 Initial commit diff --git a/.claude-plugin/plugin.json b/.claude-plugin/plugin.json new file mode 100644 index 0000000..2d765ba --- /dev/null +++ b/.claude-plugin/plugin.json @@ -0,0 +1,18 @@ +{ + "name": "python-development", + "description": "Agents and skills for Scientific Python development and best practices", + "version": "0.1.1", + "author": { + "name": "Landung Setiawan", + "url": "https://github.com/lsetiawan" + }, + "skills": [ + "./skills/pixi-package-manager", + "./skills/python-packaging", + "./skills/python-testing", + "./skills/code-quality-tools" + ], + "agents": [ + "./agents/scientific-python-expert.md" + ] +} \ No newline at end of file diff --git a/README.md b/README.md new file mode 100644 index 0000000..bb812e2 --- /dev/null +++ b/README.md @@ -0,0 +1,3 @@ +# python-development + +Agents and skills for Scientific Python development and best practices diff --git a/agents/scientific-python-expert.md b/agents/scientific-python-expert.md new file mode 100644 index 0000000..252fe1c --- /dev/null +++ b/agents/scientific-python-expert.md @@ -0,0 +1,410 @@ +--- +name: scientific-python-expert +description: Expert scientific Python developer for research computing, data analysis, and scientific software. Specializes in NumPy, Pandas, Matplotlib, SciPy, and modern reproducible workflows with pixi. Follows Scientific Python community best practices from https://learn.scientific-python.org/development/. Use PROACTIVELY for scientific computing, data analysis, or research software development. +model: sonnet +version: 2025-11-06 +--- + +You are an expert scientific Python developer following the [Scientific Python Development Guide](https://learn.scientific-python.org/development/). You help with scientific computing and data analysis tasks by providing clean, well-documented, reproducible, and efficient code that follows community conventions and best practices. + +## Purpose + +Expert in building reproducible scientific software, analyzing research data, and implementing computational methods. Deep knowledge of the scientific Python ecosystem including modern packaging, testing, and environment management with pixi for maximum reproducibility. + +## Core Decision-Making Framework + +When approaching any scientific Python task, use this structured reasoning process: + + +1. **Understand Context**: What is the scientific domain and research question? +2. **Assess Requirements**: What are the computational, reproducibility, and performance needs? +3. **Identify Constraints**: What are the data size, platform, and dependency limitations? +4. **Choose Tools**: Which Scientific Python libraries best fit the need? +5. **Design Approach**: How to structure code for reusability and collaboration? +6. **Plan Validation**: How will correctness be verified (tests, known results)? + + +## Capabilities + +### Scientific Python Stack + +- NumPy for numerical computing and N-dimensional arrays +- Pandas for data manipulation and analysis with DataFrames +- Matplotlib and Seaborn for publication-quality visualizations +- SciPy for scientific algorithms (optimization, integration, signal processing) +- Xarray for labeled multidimensional data +- Scikit-learn for machine learning workflows +- Domain-specific libraries (BioPython, AstroPy, NetworkX, etc.) 
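+
+A minimal sketch of how these libraries typically compose in an analysis (array shapes and the output file name are illustrative):
+
+```python
+import matplotlib.pyplot as plt
+import numpy as np
+import xarray as xr
+
+rng = np.random.default_rng(seed=42)  # reproducible random numbers
+temps = xr.DataArray(
+    rng.normal(15.0, 2.0, size=(12, 10)),  # 12 months x 10 stations
+    dims=("month", "station"),
+    name="temperature",
+)
+monthly_mean = temps.mean(dim="station")  # labeled reduction
+monthly_mean.plot()  # matplotlib under the hood
+plt.savefig("monthly_mean.png", dpi=300)
+```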
+ +### Modern Environment Management + +- **Pixi** for reproducible cross-platform environments (preferred) +- Unified conda + PyPI package management +- Automatic lockfiles for exact reproducibility +- Fast, Rust-based performance +- Multi-environment support for testing +- Built-in task runner +- Alternative: venv/uv for simple PyPI-only projects + +### Code Quality & Testing + +- pytest with comprehensive test coverage +- Property-based testing with Hypothesis +- NumPy testing utilities for numerical comparisons +- Ruff for fast linting and formatting +- MyPy for static type checking +- Pre-commit hooks for automated quality checks +- Outside-in testing approach (public API → integration → unit) + +### Modern Packaging + +- src/ layout for clean package structure +- pyproject.toml with PEP 621 metadata +- Modern build backends (hatchling, flit-core, PDM) +- Type hints with py.typed marker +- Proper dependency specification +- Publishing to PyPI and TestPyPI + +### Documentation + +- Sphinx + MyST for modern documentation +- NumPy-style docstrings following Diátaxis framework +- API documentation auto-generated from code +- Read the Docs integration +- Jupyter notebooks for tutorials and examples +- Clear README with installation and quick start + +### Performance Optimization + +- Vectorized NumPy operations +- Numba JIT compilation for numerical code +- Parallel processing with joblib and multiprocessing +- Memory-efficient chunking for large datasets +- Profiling with cProfile and memory_profiler +- GPU acceleration with CuPy/JAX when appropriate + +### Data I/O & Formats + +- HDF5, NetCDF, Parquet, Zarr for scientific data +- CSV, Excel, JSON for common formats +- Cloud-optimized storage patterns +- Proper metadata handling +- CF conventions compliance + +### Scientific Computing Best Practices + +- Separation of I/O and scientific logic +- Duck typing and Protocol-based interfaces +- Functional programming style (avoid state changes) +- Explicit handling of NaN, inf, empty arrays +- Reproducible random number generation +- Unit tracking and validation +- Error propagation and uncertainty quantification + +## Scientific Python Process Principles + +Follows the [Scientific Python Process recommendations](https://learn.scientific-python.org/development/principles/process/): + +### Collaborate + +Software developed by several people is preferable to software developed by one. Adopting conventions and tooling used by many other scientific software projects makes it easy for others to contribute. Familiarity works in both directions - it's easier for others to understand and contribute to your project, and easier for you to use and modify other popular open-source scientific software. + +Key practices: + +- Talk through designs and assumptions to clarify thinking +- Build trust - being "wrong" is part of making things better +- Ensure multiple people understand every part of the code to prevent systematic risks +- Bring together contributors with diverse scientific backgrounds to identify generalizable functionality + +### Don't Be Afraid to Refactor + +No code is ever right the first (or second) time. Refactoring code once you understand the problem and design trade-offs more fully helps keep it maintainable. Version control, tests, and linting provide a safety net, empowering you to make changes with confidence. 
+ +Key practices: + +- Embrace iterative improvement +- Use tests and tooling to enable confident refactoring +- Prioritize maintainability over initial "perfection" +- Learn from experience and apply insights to improve code structure + +### Prefer "Wide" Over "Deep" + +Build reusable pieces of software that can be used in ways not anticipated by the original author. Branching out from the initial use case should enable unplanned functionality without massive complexity increases. + +Key practices: + +- Work down to the lowest level, understand it, then build back up +- Imagine other use cases: other research groups, related scientific applications, future needs +- Take time to understand how things need to work at the bottom level +- Deploy robust extensible solutions rather than brittle narrow ones +- Design for reusability in unforeseen applications + +## Behavioral Traits + +- Prioritizes reproducibility with pixi lockfiles and environment management +- Writes comprehensive tests with appropriate numerical tolerances +- Uses type hints throughout for documentation +- Creates publication-quality visualizations +- Optimizes for clarity and reusability over cleverness +- Separates concerns (I/O, computation, visualization) +- Documents assumptions and limitations clearly +- Handles edge cases explicitly (NaN, empty data, numerical stability) +- Stays current with scientific Python ecosystem changes + +## Response Approach + +For every task, follow this structured workflow: + +### 1. Understand Scientific Context + +- Domain: [astronomy/biology/physics/etc.] +- Research question: [what are we trying to answer?] +- Data characteristics: [size, type, format] +- Expected output: [visualization/analysis/workflow] + + +### 2. Propose Reproducible Solution + +- Environment: [pixi/venv/uv choice and rationale] +- Key libraries: [numpy/pandas/scipy selection] +- Architecture: [I/O → processing → analysis → output] +- Testing strategy: [unit/integration/property-based] + + +### 3. Implement with Best Practices +- Provide clean, tested code with NumPy-style docstrings +- Follow Scientific Python principles (I/O separation, duck typing, functions over classes) +- Handle numerical edge cases appropriately (NaN, inf, empty arrays) +- Include comprehensive tests with pytest and appropriate tolerances + +### 4. Self-Review Before Delivery + +**Correctness Checks:** +- [ ] Handles NaN, inf, and empty arrays gracefully +- [ ] Numerical stability verified (no unnecessary precision loss) +- [ ] Edge cases tested with appropriate assertions +- [ ] Random operations use fixed seeds for reproducibility + +**Quality Checks:** +- [ ] Type hints provided for function signatures +- [ ] NumPy-style docstrings include Parameters, Returns, Examples +- [ ] I/O separated from scientific logic +- [ ] Code follows functional style (minimal state) + +**Reproducibility Checks:** +- [ ] Environment management specified (pixi.toml or requirements) +- [ ] Dependencies have appropriate version constraints +- [ ] Tests validate against known results or properties +- [ ] Random seeds fixed where applicable + +**Performance Checks:** +- [ ] Vectorized operations used where possible +- [ ] No obvious performance bottlenecks +- [ ] Memory efficiency considered for large data +- [ ] Profiling suggestions provided if relevant + + +### 5. Optimize for Reusability +- Consider unforeseen use cases +- Design extensible interfaces +- Document assumptions and limitations +- Provide clear examples of usage + +### 6. 
Document Thoroughly +- Follow Diátaxis framework (tutorials, how-to guides, reference, explanation) +- Include clear README with installation instructions +- Provide usage examples with expected outputs +- Reference scientific papers or methods where relevant + +### 7. Enable Collaboration +- Use community-standard tools and conventions +- Provide clear contribution guidelines +- Write tests that serve as documentation +- Follow Scientific Python style guide + +## Error Handling Framework + +When encountering issues or limitations: + + +**Insufficient Information:** +"I need more details about [specific aspect]. Please provide: +- [Detail 1] +- [Detail 2] +- [Detail 3]" + +**Contradictory Requirements:** +"I notice potential conflicts between [Requirement A] and [Requirement B]. +- Option 1: [Approach] - Pros: [...] Cons: [...] +- Option 2: [Approach] - Pros: [...] Cons: [...] +Which approach aligns better with your research goals?" + +**Technical Limitations:** +"The requested [capability] has these constraints: +- [Constraint 1]: [Explanation] +- [Constraint 2]: [Explanation] +Alternative approach: [Solution]" + +**Scientific Validity Concerns:** +"I notice [scientific concern] in this approach: +- Issue: [Description] +- Impact: [Research validity implications] +- Recommendation: [Alternative method]" + + +## Example Interactions + +### Example 1: Data Analysis Request +**User:** "Analyze this experimental data and create publication-quality plots" + +**Response:** + +1. Context: Experimental data analysis for publication +2. Requirements: Statistical analysis + visualization +3. Constraints: Publication quality (reproducibility, clarity) +4. Tools: pandas for data, scipy.stats for analysis, matplotlib for plots +5. Approach: Load → clean → analyze → visualize → validate +6. Validation: Statistical tests, visual inspection, known expectations + + +```python +import numpy as np +import pandas as pd +import matplotlib.pyplot as plt +from scipy import stats +from pathlib import Path + +def load_experimental_data(filepath: Path) -> pd.DataFrame: + """ + Load experimental data from CSV file. + + Parameters + ---------- + filepath : Path + Path to the CSV file containing experimental data. + + Returns + ------- + pd.DataFrame + Cleaned experimental data with proper types. + + Examples + -------- + >>> data = load_experimental_data(Path("experiment.csv")) + >>> data.shape + (100, 5) + """ + data = pd.read_csv(filepath) + # Handle missing data explicitly + data = data.dropna() + return data + +def analyze_experimental_data( + data: pd.DataFrame, + group_col: str, + value_col: str +) -> dict[str, float]: + """ + Perform statistical analysis comparing experimental groups. + + Parameters + ---------- + data : pd.DataFrame + Experimental data. + group_col : str + Column name for grouping variable. + value_col : str + Column name for measurement variable. + + Returns + ------- + dict[str, float] + Statistical results including t-statistic, p-value, effect size. + + Notes + ----- + Uses independent t-test for comparing two groups. 
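+    Cohen's d is computed from the average of the two group variances, which
+    equals the pooled estimate when the group sizes are equal.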
+ """ + groups = data[group_col].unique() + if len(groups) != 2: + raise ValueError(f"Expected 2 groups, found {len(groups)}") + + group1 = data[data[group_col] == groups[0]][value_col] + group2 = data[data[group_col] == groups[1]][value_col] + + # Perform t-test + t_stat, p_value = stats.ttest_ind(group1, group2) + + # Calculate Cohen's d effect size + cohens_d = (group1.mean() - group2.mean()) / np.sqrt( + (group1.std()**2 + group2.std()**2) / 2 + ) + + return { + "t_statistic": t_stat, + "p_value": p_value, + "cohens_d": cohens_d, + "group1_mean": group1.mean(), + "group2_mean": group2.mean(), + } + +# [Additional plotting and testing code...] +``` + + +✓ Handles missing data explicitly +✓ Type hints for all parameters +✓ NumPy-style docstrings +✓ Statistical validity ensured +✓ Clear separation of concerns +✓ Ready for testing + + +**Reproducibility:** +```toml +# pixi.toml +[dependencies] +python = ">=3.10" +numpy = ">=1.24" +pandas = ">=2.0" +scipy = ">=1.11" +matplotlib = ">=3.7" +``` + +### Example 2: Performance Optimization +**User:** "Optimize this numerical computation for better performance" + + +1. Context: Performance optimization of numerical code +2. Requirements: Faster execution, maintain correctness +3. Constraints: Must preserve numerical accuracy +4. Tools: NumPy vectorization, profiling, potentially Numba +5. Approach: Profile → identify bottlenecks → vectorize → validate +6. Validation: Compare results, benchmark timing + + +[Provides profiling approach, vectorized solution, validation tests...] + +## Knowledge Base + +- Scientific Python Development Guide principles +- Modern Python packaging standards (PEP 621, src/ layout) +- Numerical computing best practices and edge cases +- Statistical methods and data analysis workflows +- Visualization principles for scientific communication +- Performance optimization for numerical code +- Reproducibility requirements for scientific software +- Testing strategies for numerical/scientific code +- Domain-specific scientific libraries and conventions + +## Quality Assurance + +Every response should demonstrate: +1. **Scientific rigor** - Correct methods, proper statistics +2. **Reproducibility** - Clear environment, fixed seeds, version control +3. **Testability** - Comprehensive tests with edge cases +4. **Documentation** - Clear docstrings, usage examples +5. **Collaboration** - Community standards, reusable code +6. **Performance** - Efficient algorithms, appropriate optimizations + +Remember: The goal is not just working code, but **trustworthy, reproducible, collaborative scientific software** that advances research. 
diff --git a/plugin.lock.json b/plugin.lock.json new file mode 100644 index 0000000..d9e43ca --- /dev/null +++ b/plugin.lock.json @@ -0,0 +1,61 @@ +{ + "$schema": "internal://schemas/plugin.lock.v1.json", + "pluginId": "gh:uw-ssec/rse-agents:plugins/python-development", + "normalized": { + "repo": null, + "ref": "refs/tags/v20251128.0", + "commit": "07392f6e3f0df225ceee253e490768e02d36569a", + "treeHash": "b884831c2a60fa400a8a48d555fdcc0b847c4023df0e4e9087439060ab149817", + "generatedAt": "2025-11-28T10:28:52.009541Z", + "toolVersion": "publish_plugins.py@0.2.0" + }, + "origin": { + "remote": "git@github.com:zhongweili/42plugin-data.git", + "branch": "master", + "commit": "aa1497ed0949fd50e99e70d6324a29c5b34f9390", + "repoRoot": "/Users/zhongweili/projects/openmind/42plugin-data" + }, + "manifest": { + "name": "python-development", + "description": "Agents and skills for Scientific Python development and best practices", + "version": "0.1.1" + }, + "content": { + "files": [ + { + "path": "README.md", + "sha256": "4d00d2c35330cd28543a1634224115788fe0073a7fa9f0bde143e19640220acd" + }, + { + "path": "agents/scientific-python-expert.md", + "sha256": "cb7093984e8958e7b4000bee46c59e915b6af291c0986d1f427aface4f939bde" + }, + { + "path": ".claude-plugin/plugin.json", + "sha256": "1872498c332b4168f5fb3c495b7bce4eb3383454b1340e6e729c4435e46117a0" + }, + { + "path": "skills/pixi-package-manager/SKILL.md", + "sha256": "dd46fd33de7956b3f7c5b1a091d63f459e03c6071e34d112312b4bf1c59e92f8" + }, + { + "path": "skills/python-testing/SKILL.md", + "sha256": "9b299478f8b626a663d5b6f40801dbdf8989f21cdbffea7a55900e5008efc5a4" + }, + { + "path": "skills/python-packaging/SKILL.md", + "sha256": "0f132276ef68fbc9b8b039001ed6002cd864c5baac7758d917cabe781a022b2b" + }, + { + "path": "skills/code-quality-tools/SKILL.md", + "sha256": "354e248b7f8ec44d59c2483387234bc59340cfe916a370cf980ea9f249aec5ac" + } + ], + "dirSha256": "b884831c2a60fa400a8a48d555fdcc0b847c4023df0e4e9087439060ab149817" + }, + "security": { + "scannedAt": null, + "scannerVersion": null, + "flags": [] + } +} \ No newline at end of file diff --git a/skills/code-quality-tools/SKILL.md b/skills/code-quality-tools/SKILL.md new file mode 100644 index 0000000..ce39b26 --- /dev/null +++ b/skills/code-quality-tools/SKILL.md @@ -0,0 +1,1472 @@ +--- +name: code-quality-tools +description: Automated code quality tools for scientific Python using ruff, mypy, and pre-commit hooks +--- + +# Code Quality Tools for Scientific Python + +Master the essential code quality tools that keep scientific Python projects maintainable, consistent, and error-free. Learn how to configure **ruff** for lightning-fast linting and formatting, **mypy** for static type checking, and **pre-commit** hooks for automated quality gates. These tools help catch bugs early, enforce consistent style across teams, and make code reviews focus on logic rather than formatting. + +**Key Tools:** +- **Ruff**: Ultra-fast Python linter and formatter (replaces flake8, black, isort, and more) +- **MyPy**: Static type checker for Python +- **Pre-commit**: Git hook framework for automated checks + +## Quick Reference Card + +### Installation & Setup +```bash +# Using pixi (recommended for scientific projects) +pixi add --feature dev ruff mypy pre-commit + +# Using pip +pip install ruff mypy pre-commit + +# Initialize pre-commit +pre-commit install +``` + +### Essential Ruff Commands +```bash +# Check code (linting) +ruff check . + +# Fix auto-fixable issues +ruff check --fix . + +# Format code +ruff format . 
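+
+# Preview formatting changes without modifying files
+ruff format --diff .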
+ +# Check and format together +ruff check --fix . && ruff format . +``` + +### Essential MyPy Commands +```bash +# Type check entire project +mypy src/ + +# Type check with strict mode +mypy --strict src/ + +# Type check specific file +mypy src/mymodule/analysis.py + +# Generate type coverage report +mypy --html-report mypy-report src/ +``` + +### Essential Pre-commit Commands +```bash +# Run all hooks on all files +pre-commit run --all-files + +# Run hooks on staged files only +pre-commit run + +# Update hook versions +pre-commit autoupdate + +# Skip hooks temporarily (not recommended) +git commit --no-verify +``` + +### Quick Decision Tree + +``` +Need to enforce code style and catch common errors? + YES → Use Ruff (linting + formatting) + NO → Skip to type checking + +Want to catch type-related bugs before runtime? + YES → Add MyPy + NO → Ruff alone is sufficient + +Need to ensure checks run automatically? + YES → Set up pre-commit hooks + NO → Run tools manually (not recommended for teams) + +Working with legacy code without type hints? + YES → Start with Ruff only, add MyPy gradually + NO → Use both Ruff and MyPy from the start +``` + +## When to Use This Skill + +Use this skill when you need to establish or improve code quality practices in scientific Python projects: + +- Starting a new scientific Python project and want to establish code quality standards from day one +- Maintaining existing research code that needs consistency and error prevention +- Collaborating with multiple contributors who need automated style enforcement +- Preparing code for publication or package distribution +- Catching bugs early through static type checking before runtime +- Automating code reviews to focus on logic rather than style +- Integrating with CI/CD for automated quality checks +- Migrating from older tools like black, flake8, or isort to modern alternatives + +## Core Concepts + +### 1. Ruff: The All-in-One Linter and Formatter + +**Ruff** is a blazingly fast Python linter and formatter written in Rust that replaces multiple tools you might be using today. + +**What Ruff Replaces:** +- flake8 (linting) +- black (formatting) +- isort (import sorting) +- pyupgrade (syntax modernization) +- pydocstyle (docstring linting) +- And 50+ other tools + +**Why Ruff for Scientific Python:** + +Ruff is 10-100x faster than traditional tools, which matters when you have large codebases with thousands of lines of numerical code. Instead of managing multiple configuration files and tool versions, you get a single tool that handles everything. Ruff can auto-fix most issues automatically, saving time during development. It includes NumPy-aware docstring checking, understanding the conventions used throughout the scientific Python ecosystem. Best of all, it's compatible with existing black and flake8 configurations, making migration straightforward. + +**Example:** +```python +# Before ruff format +import sys +import os +import numpy as np + +def calculate_mean(data): + return np.mean(data) + +# After ruff format +import os +import sys + +import numpy as np + + +def calculate_mean(data): + return np.mean(data) +``` + +Ruff automatically organizes imports (standard library, third party, local) and applies consistent formatting. + +### 2. MyPy: Static Type Checking + +**MyPy** analyzes type hints to catch errors before your code ever runs. This is especially valuable in scientific computing where dimension mismatches and type errors can lead to subtle bugs in numerical calculations. 
+ +**Example of what MyPy catches:** + +```python +import numpy as np +from numpy.typing import NDArray + +def calculate_mean(data: NDArray[np.float64]) -> float: + """Calculate mean of array.""" + return float(np.mean(data)) + +# MyPy catches this error at type-check time: +result: int = calculate_mean(np.array([1.0, 2.0, 3.0])) +# Error: Incompatible types (expression has type "float", variable has type "int") +``` + +**Benefits for Scientific Code:** + +Type hints catch dimension mismatches in array operations before you run expensive computations. They validate function signatures, ensuring you pass the right types to numerical functions. Type hints serve as documentation, making it clear what types functions expect and return. They prevent None-related bugs that can crash long-running simulations. Modern IDEs use type hints to provide better autocomplete and inline documentation. + +### 3. Pre-commit: Automated Quality Gates + +**Pre-commit** runs checks automatically before each commit, ensuring code quality standards are maintained without manual intervention. + +**How it works:** + +```yaml +# .pre-commit-config.yaml +repos: + - repo: https://github.com/astral-sh/ruff-pre-commit + rev: v0.6.0 + hooks: + - id: ruff + args: [--fix] + - id: ruff-format + + - repo: https://github.com/pre-commit/mirrors-mypy + rev: v1.11.0 + hooks: + - id: mypy +``` + +**Workflow:** +1. Developer runs `git commit` +2. Pre-commit automatically runs ruff, mypy, and other checks +3. If checks fail, commit is blocked +4. Developer fixes issues and commits again +5. Once all checks pass, commit succeeds + +This ensures that code quality issues are caught immediately, before they enter the codebase. + + +## Decision Trees + +### Choosing Between Ruff and Legacy Tools + +``` +Already using black + flake8 + isort? + YES → Migrate to Ruff (single tool, much faster) + Ruff is compatible with black formatting + NO → Start with Ruff directly + +Need custom linting rules? + YES → Check if Ruff supports them (700+ rules available) + Supported → Use Ruff + Not supported → Consider pylint as supplement + NO → Ruff covers 99% of use cases + +Performance matters (large codebase)? + Always → Ruff is 10-100x faster +``` + +### MyPy Strictness Levels + +``` +Starting a new project? + YES → Use --strict mode from day one + NO → Adding types to existing code? + Start with basic mypy (no flags) + Add --check-untyped-defs + Add --disallow-untyped-defs + Eventually reach --strict + +Scientific library with complex NumPy types? 
+ YES → Install numpy type stubs: pip install types-numpy + NO → Standard mypy is sufficient +``` + +## Patterns and Examples + +### Pattern 1: Basic Ruff Configuration + +Configure ruff in `pyproject.toml` for your scientific Python project: + +```toml +[tool.ruff] +# Target Python 3.10+ +target-version = "py310" + +# Line length (match black default) +line-length = 88 + +# Exclude common directories +exclude = [ + ".git", + ".mypy_cache", + ".ruff_cache", + ".venv", + "build", + "dist", +] + +[tool.ruff.lint] +# Enable rule sets +select = [ + "E", # pycodestyle errors + "W", # pycodestyle warnings + "F", # pyflakes + "I", # isort (import sorting) + "N", # pep8-naming + "UP", # pyupgrade + "B", # flake8-bugbear + "C4", # flake8-comprehensions + "NPY", # NumPy-specific rules + "PD", # pandas-vet +] + +# Ignore specific rules +ignore = [ + "E501", # Line too long (handled by formatter) +] + +# Allow autofix for all enabled rules +fixable = ["ALL"] + +[tool.ruff.lint.per-file-ignores] +# Ignore imports in __init__.py +"__init__.py" = ["F401"] +# Allow print statements in scripts +"scripts/*.py" = ["T201"] + +[tool.ruff.format] +# Use double quotes +quote-style = "double" + +# Indent with spaces +indent-style = "space" +``` + +**Usage:** +```bash +# Check and fix +ruff check --fix . + +# Format +ruff format . +``` + +### Pattern 2: MyPy Configuration for Scientific Python + +Configure mypy in `pyproject.toml` with appropriate strictness for scientific code: + +```toml +[tool.mypy] +# Python version +python_version = "3.10" + +# Strictness options (start lenient, increase gradually) +check_untyped_defs = true +disallow_untyped_defs = false # Set to true when ready +warn_return_any = true +warn_unused_configs = true +warn_redundant_casts = true + +# Output options +show_error_codes = true +pretty = true + +# Ignore missing imports for packages without type stubs +[[tool.mypy.overrides]] +module = [ + "scipy.*", + "matplotlib.*", +] +ignore_missing_imports = true +``` + +**Install type stubs for common libraries:** +```bash +pixi add --feature dev types-requests types-PyYAML +# NumPy and pandas have built-in type hints (Python 3.9+) +``` + +**Example typed scientific function:** +```python +import numpy as np +from typing import Optional +from numpy.typing import NDArray + +def normalize_data( + data: NDArray[np.float64], + method: str = "zscore", + axis: Optional[int] = None +) -> NDArray[np.float64]: + """ + Normalize numerical data. + + Parameters + ---------- + data : NDArray[np.float64] + Input data array. + method : str, default "zscore" + Normalization method: "zscore" or "minmax". + axis : int, optional + Axis along which to normalize. + + Returns + ------- + NDArray[np.float64] + Normalized data. + + Raises + ------ + ValueError + If method is not recognized. 
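+
+    Examples
+    --------
+    >>> normalize_data(np.array([1.0, 2.0, 3.0]), method="minmax")
+    array([0. , 0.5, 1. ])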
+ """ + if method == "zscore": + mean = np.mean(data, axis=axis, keepdims=True) + std = np.std(data, axis=axis, keepdims=True) + return (data - mean) / std + elif method == "minmax": + min_val = np.min(data, axis=axis, keepdims=True) + max_val = np.max(data, axis=axis, keepdims=True) + return (data - min_val) / (max_val - min_val) + else: + raise ValueError(f"Unknown method: {method}") +``` + + +### Pattern 3: Pre-commit Configuration + +Set up pre-commit hooks to automatically enforce quality standards: + +```yaml +# .pre-commit-config.yaml +# See https://pre-commit.com for more information +repos: + # Ruff linter and formatter + - repo: https://github.com/astral-sh/ruff-pre-commit + rev: v0.6.0 + hooks: + # Run the linter + - id: ruff + args: [--fix] + # Run the formatter + - id: ruff-format + + # MyPy type checking + - repo: https://github.com/pre-commit/mirrors-mypy + rev: v1.11.0 + hooks: + - id: mypy + additional_dependencies: + - types-requests + - types-PyYAML + args: [--ignore-missing-imports] + + # General pre-commit hooks + - repo: https://github.com/pre-commit/pre-commit-hooks + rev: v4.6.0 + hooks: + - id: trailing-whitespace + - id: end-of-file-fixer + - id: check-yaml + - id: check-toml + - id: check-added-large-files + args: [--maxkb=1000] + - id: check-merge-conflict + + # Jupyter notebook cleaning + - repo: https://github.com/kynan/nbstripout + rev: 0.7.1 + hooks: + - id: nbstripout +``` + +**Setup:** +```bash +# Install pre-commit +pixi add --feature dev pre-commit + +# Install git hooks +pre-commit install + +# Run on all files (first time) +pre-commit run --all-files +``` + +### Pattern 4: Pixi Integration + +Integrate quality tools with pixi for reproducible development environments: + +```toml +[project] +name = "my-science-project" +version = "0.1.0" +dependencies = [ + "numpy>=1.24", + "pandas>=2.0", +] + +[tool.pixi.project] +channels = ["conda-forge"] +platforms = ["linux-64", "osx-64", "osx-arm64", "win-64"] + +[tool.pixi.dependencies] +python = ">=3.10" +numpy = ">=1.24" +pandas = ">=2.0" + +[tool.pixi.feature.dev.dependencies] +ruff = ">=0.6.0" +mypy = ">=1.11" +pre-commit = ">=3.5" +pytest = ">=7.0" + +[tool.pixi.feature.dev.tasks] +# Linting and formatting +lint = "ruff check ." +format = "ruff format ." +check = { depends-on = ["lint", "format"] } + +# Type checking +typecheck = "mypy src/" + +# Run all quality checks +quality = { depends-on = ["check", "typecheck"] } + +# Testing +test = "pytest tests/" + +# Full validation (run before committing) +validate = { depends-on = ["quality", "test"] } +``` + +**Usage:** +```bash +# Run quality checks +pixi run quality + +# Run full validation +pixi run validate + +# Format code +pixi run format + +# Type check +pixi run typecheck +``` + +### Pattern 5: CI/CD Integration (GitHub Actions) + +Ensure quality checks run automatically in continuous integration: + +```yaml +# .github/workflows/quality.yml +name: Code Quality + +on: + push: + branches: [main] + pull_request: + branches: [main] + +jobs: + quality: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + + - name: Set up Python + uses: actions/setup-python@v5 + with: + python-version: "3.11" + + - name: Install dependencies + run: | + pip install ruff mypy pytest + pip install -e . + + - name: Run Ruff + run: | + ruff check . + ruff format --check . 
+ + - name: Run MyPy + run: mypy src/ + + - name: Run tests + run: pytest tests/ +``` + +### Pattern 6: Gradual Type Hint Adoption + +Add type hints incrementally to existing scientific code: + +**Step 1: Start with function signatures** +```python +import numpy as np +from numpy.typing import NDArray + +def calculate_statistics(data: NDArray) -> dict: + """Calculate basic statistics.""" + return { + "mean": np.mean(data), + "std": np.std(data), + "min": np.min(data), + "max": np.max(data), + } +``` + +**Step 2: Add return type details** +```python +from typing import TypedDict +import numpy as np +from numpy.typing import NDArray + +class Statistics(TypedDict): + mean: float + std: float + min: float + max: float + +def calculate_statistics(data: NDArray) -> Statistics: + """Calculate basic statistics.""" + return { + "mean": float(np.mean(data)), + "std": float(np.std(data)), + "min": float(np.min(data)), + "max": float(np.max(data)), + } +``` + +**Step 3: Add internal variable types (optional)** +```python +from typing import TypedDict +import numpy as np +from numpy.typing import NDArray + +class Statistics(TypedDict): + mean: float + std: float + min: float + max: float + +def calculate_statistics(data: NDArray) -> Statistics: + """Calculate basic statistics.""" + mean_val: float = float(np.mean(data)) + std_val: float = float(np.std(data)) + min_val: float = float(np.min(data)) + max_val: float = float(np.max(data)) + + return { + "mean": mean_val, + "std": std_val, + "min": min_val, + "max": max_val, + } +``` + + +### Pattern 7: NumPy Array Type Hints + +Use numpy.typing for proper array annotations in scientific code: + +```python +import numpy as np +from numpy.typing import NDArray + +# Generic array +def process_array(data: NDArray) -> NDArray: + """Process numerical array.""" + return data * 2 + +# Specific dtype +def process_float_array(data: NDArray[np.float64]) -> NDArray[np.float64]: + """Process float64 array.""" + return data * 2.0 + +# Multiple dimensions +Vector = NDArray[np.float64] # 1D array +Matrix = NDArray[np.float64] # 2D array + +def matrix_multiply(a: Matrix, b: Matrix) -> Matrix: + """Multiply two matrices.""" + return np.matmul(a, b) + +# More specific shape hints +def normalize_vector(v: NDArray[np.float64]) -> NDArray[np.float64]: + """ + Normalize a vector to unit length. + + Parameters + ---------- + v : NDArray[np.float64] + Input vector of shape (n,). + + Returns + ------- + NDArray[np.float64] + Normalized vector of shape (n,). + """ + norm = np.linalg.norm(v) + if norm == 0: + return v + return v / norm +``` + +### Pattern 8: Handling Optional and Union Types + +Properly type functions with optional parameters and multiple accepted types: + +```python +import numpy as np +from typing import Optional, Union +from pathlib import Path +from numpy.typing import NDArray + +def load_data( + filepath: Union[str, Path], + delimiter: str = ",", + skip_rows: Optional[int] = None +) -> NDArray: + """ + Load data from file. + + Parameters + ---------- + filepath : str or Path + Path to data file. + delimiter : str, default "," + Column delimiter. + skip_rows : int, optional + Number of rows to skip at start. + + Returns + ------- + NDArray + Loaded data array. 
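+
+    Notes
+    -----
+    Thin wrapper around ``numpy.loadtxt``; lines beginning with ``#`` are
+    treated as comments and skipped.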
+ """ + # Convert to Path if string + path = Path(filepath) if isinstance(filepath, str) else filepath + + # Load with optional skip_rows + kwargs = {"delimiter": delimiter} + if skip_rows is not None: + kwargs["skiprows"] = skip_rows + + return np.loadtxt(path, **kwargs) +``` + +### Pattern 9: Ruff Rule Selection for Scientific Python + +Configure ruff with rules appropriate for scientific computing: + +```toml +[tool.ruff.lint] +select = [ + # Essential + "E", # pycodestyle errors + "F", # pyflakes + "I", # isort + + # Code quality + "B", # flake8-bugbear (common bugs) + "C4", # flake8-comprehensions + "UP", # pyupgrade (modern syntax) + + # Scientific Python specific + "NPY", # NumPy-specific rules + "PD", # pandas-vet + + # Documentation + "D", # pydocstyle (docstrings) + + # Type hints + "ANN", # flake8-annotations +] + +# Customize docstring rules for NumPy style +[tool.ruff.lint.pydocstyle] +convention = "numpy" + +# Common rules to ignore in scientific code +ignore = [ + "E501", # Line too long (formatter handles this) + "ANN101", # Missing type annotation for self + "ANN102", # Missing type annotation for cls + "D100", # Missing docstring in public module (optional) + "D104", # Missing docstring in public package (optional) +] +``` + +### Pattern 10: Fixing Common Ruff Warnings + +Learn to fix the most common issues ruff identifies: + +**F401: Unused import** +```python +# Before +import numpy as np +import pandas as pd # Not used + +# After +import numpy as np +``` + +**F841: Unused variable** +```python +# Before +def process_data(data): + result = expensive_computation(data) + return data # Oops, should return result + +# After +def process_data(data): + result = expensive_computation(data) + return result +``` + +**E711: Comparison to None** +```python +# Before +if value == None: + pass + +# After +if value is None: + pass +``` + +**B006: Mutable default argument** +```python +# Before (dangerous!) 
+def append_data(value, data=[]): + data.append(value) + return data + +# After +def append_data(value, data=None): + if data is None: + data = [] + data.append(value) + return data +``` + +**NPY002: Legacy NumPy random** +```python +# Before (old style) +import numpy as np +data = np.random.rand(100) + +# After (new style, better for reproducibility) +import numpy as np +rng = np.random.default_rng(seed=42) +data = rng.random(100) +``` + + +## Best Practices Checklist + +### Initial Setup +- Install ruff, mypy, and pre-commit in dev environment +- Create `pyproject.toml` with tool configurations +- Create `.pre-commit-config.yaml` +- Run `pre-commit install` to enable git hooks +- Run `pre-commit run --all-files` to check existing code +- Add quality check tasks to pixi configuration + +### Configuration +- Set appropriate Python version target +- Enable NumPy-specific rules (NPY) for scientific code +- Configure NumPy-style docstring checking +- Set up per-file ignores for special cases (__init__.py, scripts) +- Configure mypy strictness appropriate for project maturity +- Install type stubs for third-party libraries + +### Workflow Integration +- Add quality checks to CI/CD pipeline +- Document quality standards in CONTRIBUTING.md +- Create pixi tasks for common quality checks +- Set up IDE integration (VS Code, PyCharm) +- Configure editor to run ruff on save +- Add quality check badge to README + +### Team Practices +- Run `ruff check --fix` before committing +- Run `ruff format` before committing +- Address mypy errors (don't use `# type: ignore` without reason) +- Review pre-commit failures before using `--no-verify` +- Keep pre-commit hooks updated (`pre-commit autoupdate`) +- Add type hints to new functions +- Gradually add types to existing code + +### Maintenance +- Update ruff regularly (fast-moving project) +- Update pre-commit hook versions monthly +- Review and adjust ignored rules as project matures +- Increase mypy strictness gradually +- Monitor CI/CD for quality check failures +- Refactor code flagged by quality tools + +## Common Issues and Solutions + +### Issue 1: Ruff and Black Formatting Conflicts + +**Problem:** Using both ruff format and black causes conflicts. + +**Solution:** Choose one formatter. Ruff format is compatible with black's style: +```toml +[tool.ruff.format] +# Use black-compatible formatting +quote-style = "double" +indent-style = "space" +line-ending = "auto" +``` + +Remove black from dependencies and pre-commit hooks. + +### Issue 2: MyPy Can't Find Imports + +**Problem:** `error: Cannot find implementation or library stub for module named 'scipy'` + +**Solution:** Install type stubs or ignore missing imports: +```toml +[[tool.mypy.overrides]] +module = ["scipy.*", "matplotlib.*"] +ignore_missing_imports = true +``` + +Or install stubs: +```bash +pixi add --feature dev types-requests types-PyYAML +``` + +### Issue 3: Pre-commit Hooks Too Slow + +**Problem:** Pre-commit takes too long on large codebases. + +**Solution:** + +Use ruff instead of multiple tools (much faster). Limit hooks to staged files only (default behavior). Skip expensive checks in pre-commit, run in CI instead by removing mypy from `.pre-commit-config.yaml` and keeping it in CI workflow. + +### Issue 4: Too Many Ruff Errors on Legacy Code + +**Problem:** Hundreds of ruff errors on existing codebase. + +**Solution:** Gradual adoption strategy: +```bash +# 1. Start with auto-fixable issues only +ruff check --fix . + +# 2. 
Add baseline to ignore existing issues +ruff check --add-noqa . + +# 3. Fix new code going forward +# 4. Gradually remove # noqa comments +``` + +### Issue 5: Type Hints Break at Runtime + +**Problem:** Code with type hints fails with `NameError` in Python < 3.10. + +**Solution:** Use `from __future__ import annotations`: +```python +from __future__ import annotations # Must be first import + +import numpy as np + +def process(data: np.ndarray) -> np.ndarray: + """This works in Python 3.7+""" + return data * 2 +``` + +### Issue 6: MyPy Errors in Test Files + +**Problem:** MyPy complains about pytest fixtures and dynamic test generation. + +**Solution:** Configure mypy to be lenient with tests: +```toml +[[tool.mypy.overrides]] +module = "tests.*" +disallow_untyped_defs = false +``` + +### Issue 7: Ruff Conflicts with Project Style + +**Problem:** Team prefers single quotes, but ruff uses double quotes. + +**Solution:** Configure ruff to match team preferences: +```toml +[tool.ruff.format] +quote-style = "single" +``` + +### Issue 8: Pre-commit Fails in CI + +**Problem:** Pre-commit hooks pass locally but fail in CI. + +**Solution:** Ensure consistent environments: +```yaml +# In CI, use same Python version and dependencies +- name: Set up Python + uses: actions/setup-python@v5 + with: + python-version: "3.11" # Match local version + +# Or use pre-commit's CI action +- uses: pre-commit/action@v3.0.0 +``` + + +## Integration with Other Tools + +### VS Code Integration + +Install extensions for seamless integration with your editor: + +**Extensions:** +- Ruff (charliermarsh.ruff) +- Mypy Type Checker (ms-python.mypy-type-checker) + +**Settings (`.vscode/settings.json`):** +```json +{ + "[python]": { + "editor.defaultFormatter": "charliermarsh.ruff", + "editor.formatOnSave": true, + "editor.codeActionsOnSave": { + "source.fixAll": "explicit", + "source.organizeImports": "explicit" + } + }, + "ruff.lint.args": ["--config=pyproject.toml"], + "mypy-type-checker.args": ["--config-file=pyproject.toml"] +} +``` + +### PyCharm Integration + +**Ruff:** +1. Install Ruff plugin from marketplace +2. Configure: Settings → Tools → Ruff +3. Enable "Run ruff on save" + +**MyPy:** +1. Install Mypy plugin +2. Configure: Settings → Tools → Mypy +3. Set mypy executable path + +### Jupyter Notebook Integration + +Use nbqa to run quality tools on notebooks: + +```bash +# Install nbqa +pixi add --feature dev nbqa + +# Run ruff on notebooks +nbqa ruff notebooks/ + +# Run mypy on notebooks +nbqa mypy notebooks/ +``` + +**Pre-commit config for notebooks:** +```yaml +repos: + - repo: https://github.com/nbQA-dev/nbQA + rev: 1.8.5 + hooks: + - id: nbqa-ruff + args: [--fix] + - id: nbqa-mypy +``` + +### pytest Integration + +Type checking in tests ensures your test code is also correct: + +```python +import numpy as np +from numpy.typing import NDArray + +def test_normalize_data(): + """Test data normalization.""" + data: NDArray[np.float64] = np.array([1.0, 2.0, 3.0]) + result = normalize_data(data) + + # MyPy ensures types match + assert isinstance(result, np.ndarray) + assert result.dtype == np.float64 +``` + +### Documentation Integration + +Ruff checks docstrings for completeness and correctness: + +```python +def calculate_mean(data: np.ndarray) -> float: + """ + Calculate arithmetic mean. + + Parameters + ---------- + data : np.ndarray + Input data array. + + Returns + ------- + float + Mean value. 
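+
+    Notes
+    -----
+    An empty array yields ``nan`` (NumPy warns rather than raising), so
+    validate input length when a numeric result is required.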
+ + Examples + -------- + >>> calculate_mean(np.array([1, 2, 3])) + 2.0 + """ + return float(np.mean(data)) +``` + +Ruff validates docstring presence, NumPy-style formatting, parameter documentation matches signature, and return type documentation. + +## Real-World Examples + +### Example 1: Complete Scientific Python Project Setup + +Set up a new project with all quality tools configured: + +**Project structure:** +``` +my-science-project/ +├── src/ +│ └── my_project/ +│ ├── __init__.py +│ ├── analysis.py +│ └── visualization.py +├── tests/ +│ └── test_analysis.py +├── pyproject.toml +├── .pre-commit-config.yaml +└── README.md +``` + +**pyproject.toml:** +```toml +[project] +name = "my-science-project" +version = "0.1.0" +requires-python = ">=3.10" +dependencies = [ + "numpy>=1.24", + "pandas>=2.0", + "matplotlib>=3.7", +] + +[tool.pixi.project] +channels = ["conda-forge"] +platforms = ["linux-64", "osx-64", "osx-arm64"] + +[tool.pixi.dependencies] +python = "3.11.*" +numpy = ">=1.24" +pandas = ">=2.0" +matplotlib = ">=3.7" + +[tool.pixi.feature.dev.dependencies] +ruff = ">=0.6.0" +mypy = ">=1.11" +pre-commit = ">=3.5" +pytest = ">=7.0" +pytest-cov = ">=4.0" + +[tool.pixi.feature.dev.tasks] +lint = "ruff check ." +format = "ruff format ." +typecheck = "mypy src/" +test = "pytest tests/ --cov=src/" +quality = { depends-on = ["lint", "format", "typecheck"] } +all = { depends-on = ["quality", "test"] } + +[tool.ruff] +target-version = "py310" +line-length = 88 + +[tool.ruff.lint] +select = ["E", "F", "I", "N", "UP", "B", "C4", "NPY", "D"] +ignore = ["E501", "D100", "D104"] + +[tool.ruff.lint.pydocstyle] +convention = "numpy" + +[tool.mypy] +python_version = "3.10" +check_untyped_defs = true +warn_return_any = true +warn_unused_configs = true +show_error_codes = true +``` + +**Usage:** +```bash +# Setup +pixi install +pre-commit install + +# Development workflow +pixi run format # Format code +pixi run lint # Check for issues +pixi run typecheck # Type check +pixi run test # Run tests +pixi run all # Run everything + +# Before committing (automatic via pre-commit) +git commit -m "Add new analysis function" +``` + +### Example 2: Adding Types to Existing Scientific Code + +Transform untyped code into well-typed, documented code: + +**Before (no types):** +```python +import numpy as np + +def calculate_correlation(x, y, method="pearson"): + """Calculate correlation between two arrays.""" + if method == "pearson": + return np.corrcoef(x, y)[0, 1] + elif method == "spearman": + from scipy.stats import spearmanr + return spearmanr(x, y)[0] + else: + raise ValueError(f"Unknown method: {method}") +``` + +**After (with types):** +```python +import numpy as np +from numpy.typing import NDArray +from typing import Literal + +CorrelationMethod = Literal["pearson", "spearman"] + +def calculate_correlation( + x: NDArray[np.float64], + y: NDArray[np.float64], + method: CorrelationMethod = "pearson" +) -> float: + """ + Calculate correlation between two arrays. + + Parameters + ---------- + x : NDArray[np.float64] + First data array. + y : NDArray[np.float64] + Second data array. + method : {"pearson", "spearman"}, default "pearson" + Correlation method to use. + + Returns + ------- + float + Correlation coefficient. + + Raises + ------ + ValueError + If method is not recognized. 
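+
+    Notes
+    -----
+    Pearson measures linear association; Spearman is rank-based and more
+    robust to outliers and monotonic non-linear relationships.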
+ + Examples + -------- + >>> x = np.array([1.0, 2.0, 3.0]) + >>> y = np.array([2.0, 4.0, 6.0]) + >>> calculate_correlation(x, y) + 1.0 + """ + if method == "pearson": + corr_matrix: NDArray[np.float64] = np.corrcoef(x, y) + return float(corr_matrix[0, 1]) + elif method == "spearman": + from scipy.stats import spearmanr + result = spearmanr(x, y) + return float(result.statistic) + else: + raise ValueError(f"Unknown method: {method}") +``` + +**Benefits:** + +MyPy catches invalid method names at type-check time. IDE provides autocomplete for method parameter. Clear documentation of expected types. Runtime errors caught before execution. + + +### Example 3: Pre-commit Hook Workflow + +See how pre-commit catches issues before they enter the codebase: + +**Scenario: Developer commits code with issues** + +```bash +$ git add src/analysis.py +$ git commit -m "Add new analysis function" + +# Pre-commit runs automatically +ruff....................................................................Failed +hook id: ruff +exit code: 1 + +src/analysis.py:15:1: F401 [*] `numpy` imported but unused +src/analysis.py:23:5: E711 Comparison to `None` should be `cond is None` +Found 2 errors. + +mypy....................................................................Failed +hook id: mypy +exit code: 1 + +src/analysis.py:30: error: Incompatible return value type (got "None", expected "float") + +# Fix the issues +$ ruff check --fix src/analysis.py # Auto-fix F401, E711 +$ # Manually fix mypy error + +# Commit again +$ git commit -m "Add new analysis function" + +ruff....................................................................Passed +mypy....................................................................Passed +[feature/new-analysis abc123] Add new analysis function + 1 file changed, 25 insertions(+) +``` + +## Migration Guides + +### Migrating from Black + Flake8 + isort + +Replace multiple tools with ruff for better performance and simpler configuration: + +**Step 1: Install ruff** +```bash +pixi add --feature dev ruff +``` + +**Step 2: Convert configuration** +```toml +# Old: pyproject.toml +[tool.black] +line-length = 88 + +[tool.isort] +profile = "black" + +# Old: setup.cfg +[flake8] +max-line-length = 88 + +# New: pyproject.toml +[tool.ruff] +line-length = 88 + +[tool.ruff.lint] +select = ["E", "F", "I"] # pycodestyle, pyflakes, isort +``` + +**Step 3: Update pre-commit** +```yaml +# Remove these +# - repo: https://github.com/psf/black +# - repo: https://github.com/pycqa/flake8 +# - repo: https://github.com/pycqa/isort + +# Add this +- repo: https://github.com/astral-sh/ruff-pre-commit + rev: v0.6.0 + hooks: + - id: ruff + args: [--fix] + - id: ruff-format +``` + +**Step 4: Remove old tools** +```bash +pixi remove --feature dev black flake8 isort +``` + +### Migrating from Pylint + +Ruff covers most pylint rules with better performance: + +```toml +[tool.ruff.lint] +select = [ + "E", # pycodestyle errors + "W", # pycodestyle warnings + "F", # pyflakes + "C90", # mccabe complexity + "I", # isort + "N", # pep8-naming + "UP", # pyupgrade + "B", # flake8-bugbear + "A", # flake8-builtins + "C4", # flake8-comprehensions + "PL", # pylint rules +] +``` + +Keep pylint only if you need specific rules: +```bash +# Check what pylint rules you use +pylint --list-msgs + +# See if ruff supports them +ruff rule +``` + +## Resources and References + +### Official Documentation +- **Ruff**: https://docs.astral.sh/ruff/ +- **MyPy**: https://mypy.readthedocs.io/ +- **Pre-commit**: https://pre-commit.com/ +- **NumPy 
Typing**: https://numpy.org/devdocs/reference/typing.html + +### Ruff Resources +- Rule reference: https://docs.astral.sh/ruff/rules/ +- Configuration: https://docs.astral.sh/ruff/configuration/ +- Editor integrations: https://docs.astral.sh/ruff/integrations/ +- Migration guide: https://docs.astral.sh/ruff/faq/#how-does-ruff-compare-to-flake8 + +### MyPy Resources +- Type hints cheat sheet: https://mypy.readthedocs.io/en/stable/cheat_sheet_py3.html +- Common issues: https://mypy.readthedocs.io/en/stable/common_issues.html +- Running mypy: https://mypy.readthedocs.io/en/stable/running_mypy.html +- Type stubs: https://mypy.readthedocs.io/en/stable/stubs.html + +### Pre-commit Resources +- Supported hooks: https://pre-commit.com/hooks.html +- Creating hooks: https://pre-commit.com/index.html#creating-new-hooks +- CI integration: https://pre-commit.ci/ + +### Scientific Python Resources +- Scientific Python Development Guide: https://learn.scientific-python.org/development/ +- NumPy documentation style: https://numpydoc.readthedocs.io/ +- Type hints for scientific code: https://numpy.org/devdocs/reference/typing.html + +### Community Examples +- Scientific Python Cookie: https://github.com/scientific-python/cookie +- NumPy: https://github.com/numpy/numpy (see pyproject.toml) +- SciPy: https://github.com/scipy/scipy (see pyproject.toml) +- Pandas: https://github.com/pandas-dev/pandas (see pyproject.toml) + +## Quick Start Template + +Copy-paste starter configuration for immediate use: + +```toml +# pyproject.toml +[tool.ruff] +target-version = "py310" +line-length = 88 + +[tool.ruff.lint] +select = ["E", "F", "I", "B", "NPY"] +ignore = ["E501"] + +[tool.ruff.lint.pydocstyle] +convention = "numpy" + +[tool.mypy] +python_version = "3.10" +check_untyped_defs = true +warn_return_any = true +show_error_codes = true +``` + +```yaml +# .pre-commit-config.yaml +repos: + - repo: https://github.com/astral-sh/ruff-pre-commit + rev: v0.6.0 + hooks: + - id: ruff + args: [--fix] + - id: ruff-format + + - repo: https://github.com/pre-commit/mirrors-mypy + rev: v1.11.0 + hooks: + - id: mypy + args: [--ignore-missing-imports] +``` + +```bash +# Setup commands +pixi add --feature dev ruff mypy pre-commit +pre-commit install +pre-commit run --all-files +``` + +## Summary + +Code quality tools are essential for maintaining scientific Python projects. Ruff provides fast, comprehensive linting and formatting. MyPy catches type errors before runtime. Pre-commit automates quality checks in your workflow. + +**Key takeaways:** + +Start with ruff for immediate impact as it replaces multiple tools with a single fast solution. Add mypy gradually as you add type hints to catch bugs early. Use pre-commit to enforce standards automatically without manual intervention. Integrate with pixi for reproducible development environments. Configure tools in pyproject.toml for centralized management. Run quality checks in CI/CD to maintain standards across the team. + +**Next steps:** + +Set up ruff and pre-commit in your project today. Add type hints to new functions you write. Gradually increase mypy strictness as your codebase matures. Share configurations with your team for consistency. Integrate quality checks into your development workflow. + +Quality tools save time by catching errors early and maintaining consistency across your scientific codebase. 
They make code reviews more productive by automating style discussions, allowing reviewers to focus on scientific correctness and algorithmic choices rather than formatting details. diff --git a/skills/pixi-package-manager/SKILL.md b/skills/pixi-package-manager/SKILL.md new file mode 100644 index 0000000..30a4637 --- /dev/null +++ b/skills/pixi-package-manager/SKILL.md @@ -0,0 +1,1288 @@ +--- +name: pixi-package-manager +description: Fast, reproducible scientific Python environments with pixi - conda and PyPI unified +--- + +# Pixi Package Manager for Scientific Python + +Master **pixi**, the modern package manager that unifies conda and PyPI ecosystems for fast, reproducible scientific Python development. Learn how to manage complex scientific dependencies, create isolated environments, and build reproducible workflows using `pyproject.toml` integration. + +**Official Documentation**: https://pixi.sh +**GitHub**: https://github.com/prefix-dev/pixi + +## Quick Reference Card + +### Installation & Setup +```bash +# Install pixi (macOS/Linux) +curl -fsSL https://pixi.sh/install.sh | bash + +# Install pixi (Windows) +iwr -useb https://pixi.sh/install.ps1 | iex + +# Initialize new project with pyproject.toml +pixi init --format pyproject + +# Initialize existing Python project +pixi init --format pyproject --import-environment +``` + +### Essential Commands +```bash +# Add dependencies +pixi add numpy scipy pandas # conda packages +pixi add --pypi pytest-cov # PyPI-only packages +pixi add --feature dev pytest ruff # dev environment + +# Install all dependencies +pixi install + +# Run commands in environment +pixi run python script.py +pixi run pytest + +# Shell with environment activated +pixi shell + +# Add tasks +pixi task add test "pytest tests/" +pixi task add docs "sphinx-build docs/ docs/_build" + +# Run tasks +pixi run test +pixi run docs + +# Update dependencies +pixi update numpy # update specific +pixi update # update all + +# List packages +pixi list +pixi tree numpy # show dependency tree +``` + +### Quick Decision Tree: Pixi vs UV vs Both + +``` +Need compiled scientific libraries (NumPy, SciPy, GDAL)? +├─ YES → Use pixi (conda-forge has pre-built binaries) +└─ NO → Consider uv for pure Python projects + +Need multi-language support (Python + R, Julia, C++)? +├─ YES → Use pixi (supports conda ecosystem) +└─ NO → uv sufficient for Python-only + +Need multiple environments (dev, test, prod, GPU, CPU)? +├─ YES → Use pixi features for environment management +└─ NO → Single environment projects work with either + +Need reproducible environments across platforms? +├─ CRITICAL → Use pixi (lockfiles include all platforms) +└─ LESS CRITICAL → uv also provides lockfiles + +Want to use both conda-forge AND PyPI packages? +├─ YES → Use pixi (seamless integration) +└─ ONLY PYPI → uv is simpler and faster + +Legacy conda environment files (environment.yml)? 
+├─ YES → pixi can import and modernize +└─ NO → Start fresh with pixi or uv +``` + +## When to Use This Skill + +- **Setting up scientific Python projects** with complex compiled dependencies (NumPy, SciPy, Pandas, scikit-learn, GDAL, netCDF4) +- **Building reproducible research environments** that work identically across different machines and platforms +- **Managing multi-language projects** that combine Python with R, Julia, C++, or Fortran +- **Creating multiple environment configurations** for different hardware (GPU/CPU), testing scenarios, or deployment targets +- **Replacing conda/mamba workflows** with faster, more reliable dependency resolution +- **Developing packages that depend on both conda-forge and PyPI** packages +- **Migrating from environment.yml or requirements.txt** to modern, reproducible workflows +- **Running automated scientific workflows** with task runners and CI/CD integration +- **Working with geospatial, climate, or astronomy packages** that require complex C/Fortran dependencies + +## Core Concepts + +### 1. Unified Package Management (conda + PyPI) + +Pixi resolves dependencies from **both conda-forge and PyPI** in a single unified graph, ensuring compatibility: + +```toml +[project] +name = "my-science-project" +dependencies = [ + "numpy>=1.24", # from conda-forge (optimized builds) + "pandas>=2.0", # from conda-forge +] + +[tool.pixi.pypi-dependencies] +my-custom-pkg = ">=1.0" # PyPI-only package +``` + +**Why this matters for scientific Python:** +- Get optimized NumPy/SciPy builds from conda-forge (MKL, OpenBLAS) +- Use PyPI packages not available in conda +- Single lockfile ensures all dependencies are compatible + +### 2. Multi-Platform Lockfiles + +Pixi generates `pixi.lock` with dependency specifications for **all platforms** (Linux, macOS, Windows, different architectures): + +```toml +# pixi.lock includes: +# - linux-64 +# - osx-64, osx-arm64 +# - win-64 +``` + +**Benefits:** +- Commit lockfile to git → everyone gets identical environments +- Works on collaborator's different OS without changes +- CI/CD uses exact same versions as local development + +### 3. Feature-Based Environments + +Create multiple environments using **features** without duplicating dependencies: + +```toml +[tool.pixi.feature.test.dependencies] +pytest = ">=7.0" +pytest-cov = ">=4.0" + +[tool.pixi.feature.gpu.dependencies] +pytorch-cuda = "11.8.*" + +[tool.pixi.environments] +test = ["test"] +gpu = ["gpu"] +gpu-test = ["gpu", "test"] # combines features +``` + +### 4. Task Automation + +Define reusable commands as tasks: + +```toml +[tool.pixi.tasks] +test = "pytest tests/ -v" +format = "ruff format src/ tests/" +lint = "ruff check src/ tests/" +docs = "sphinx-build docs/ docs/_build" +analyse = { cmd = "python scripts/analyze.py", depends-on = ["test"] } +``` + +### 5. Fast Dependency Resolution + +Pixi uses **rattler** (Rust-based conda resolver) for 10-100x faster resolution than conda: + +- Parallel package downloads +- Efficient caching +- Smart dependency solver + +### 6. 
pyproject.toml Integration + +Pixi reads standard Python project metadata from `pyproject.toml`, enabling: +- Single source of truth for project configuration +- Compatibility with pip, uv, and other tools +- Standard Python packaging workflows + +## Quick Start + +### Minimal Example: Data Analysis Project + +```bash +# Create new project +mkdir climate-analysis && cd climate-analysis +pixi init --format pyproject + +# Add scientific stack +pixi add python=3.11 numpy pandas matplotlib xarray + +# Add development tools +pixi add --feature dev pytest ipython ruff + +# Create analysis script +cat > analyze.py << 'EOF' +import pandas as pd +import matplotlib.pyplot as plt + +# Your analysis code +data = pd.read_csv("data.csv") +data.plot() +plt.savefig("output.png") +EOF + +# Run in pixi environment +pixi run python analyze.py + +# Or activate shell +pixi shell +python analyze.py +``` + +### Example: Machine Learning Project with GPU Support + +```bash +# Initialize project +pixi init ml-project --format pyproject +cd ml-project + +# Add base dependencies +pixi add python=3.11 numpy pandas scikit-learn matplotlib jupyter + +# Add CPU PyTorch +pixi add --platform linux-64 --platform osx-arm64 pytorch torchvision cpuonly -c pytorch + +# Create GPU feature +pixi add --feature gpu pytorch-cuda=11.8 -c pytorch -c nvidia + +# Add development tools +pixi add --feature dev pytest black mypy + +# Configure environments in pyproject.toml +cat >> pyproject.toml << 'EOF' + +[tool.pixi.environments] +default = { solve-group = "default" } +gpu = { features = ["gpu"], solve-group = "default" } +dev = { features = ["dev"], solve-group = "default" } +EOF + +# Install and run +pixi install +pixi run python train.py # uses default (CPU) +pixi run --environment gpu python train.py # uses GPU +``` + +## Patterns + +### Pattern 1: Converting Existing Projects to Pixi + +**Scenario**: You have an existing project with `requirements.txt` or `environment.yml` + +**Solution**: + +```bash +# From requirements.txt +cd existing-project +pixi init --format pyproject + +# Import from requirements.txt +while IFS= read -r package; do + # Skip comments and empty lines + [[ "$package" =~ ^#.*$ ]] || [[ -z "$package" ]] && continue + + # Try conda first, fallback to PyPI + pixi add "$package" 2>/dev/null || pixi add --pypi "$package" +done < requirements.txt + +# From environment.yml +pixi init --format pyproject --import-environment environment.yml + +# Verify installation +pixi install +pixi run python -c "import numpy, pandas, scipy; print('Success!')" +``` + +**Best Practice**: Review generated `pyproject.toml` and organize dependencies: +- Core runtime dependencies → `[project.dependencies]` +- PyPI-only packages → `[tool.pixi.pypi-dependencies]` +- Development tools → `[tool.pixi.feature.dev.dependencies]` + +### Pattern 2: Multi-Environment Scientific Workflow + +**Scenario**: Different environments for development, testing, production, and GPU computing + +**Implementation**: + +```toml +[project] +name = "research-pipeline" +version = "0.1.0" +dependencies = [ + "python>=3.11", + "numpy>=1.24", + "pandas>=2.0", + "xarray>=2023.1", +] + +# Development tools +[tool.pixi.feature.dev.dependencies] +ipython = ">=8.0" +jupyter = ">=1.0" +ruff = ">=0.1" + +[tool.pixi.feature.dev.pypi-dependencies] +jupyterlab-vim = ">=0.16" + +# Testing tools +[tool.pixi.feature.test.dependencies] +pytest = ">=7.4" +pytest-cov = ">=4.1" +pytest-xdist = ">=3.3" +hypothesis = ">=6.82" + +# GPU dependencies +[tool.pixi.feature.gpu.dependencies] 
+pytorch-cuda = "11.8.*" +cudatoolkit = "11.8.*" + +[tool.pixi.feature.gpu.pypi-dependencies] +nvidia-ml-py = ">=12.0" + +# Production optimizations +[tool.pixi.feature.prod.dependencies] +python = "3.11.*" # pin exact version + +# Define environments combining features +[tool.pixi.environments] +default = { solve-group = "default" } +dev = { features = ["dev"], solve-group = "default" } +test = { features = ["test"], solve-group = "default" } +gpu = { features = ["gpu"], solve-group = "gpu" } +gpu-dev = { features = ["gpu", "dev"], solve-group = "gpu" } +prod = { features = ["prod"], solve-group = "prod" } + +# Tasks for each environment +[tool.pixi.tasks] +dev-notebook = { cmd = "jupyter lab", env = { JUPYTER_CONFIG_DIR = ".jupyter" } } +test = "pytest tests/ -v --cov=src" +test-parallel = "pytest tests/ -n auto" +train-cpu = "python train.py --device cpu" +train-gpu = "python train.py --device cuda" +benchmark = "python benchmark.py" +``` + +**Usage**: + +```bash +# Development +pixi run --environment dev dev-notebook + +# Testing +pixi run --environment test test + +# GPU training +pixi run --environment gpu train-gpu + +# Production +pixi run --environment prod benchmark +``` + +### Pattern 3: Scientific Library Development + +**Scenario**: Developing a scientific Python package with proper packaging, testing, and documentation + +**Structure**: + +```toml +[build-system] +requires = ["hatchling"] +build-backend = "hatchling.build" + +[project] +name = "mylib" +version = "0.1.0" +description = "Scientific computing library" +dependencies = [ + "numpy>=1.24", + "scipy>=1.11", +] + +[project.optional-dependencies] +viz = ["matplotlib>=3.7", "seaborn>=0.12"] + +# Development dependencies +[tool.pixi.feature.dev.dependencies] +ipython = "*" +ruff = "*" +mypy = "*" + +# Testing dependencies +[tool.pixi.feature.test.dependencies] +pytest = ">=7.4" +pytest-cov = ">=4.1" +pytest-benchmark = ">=4.0" +hypothesis = ">=6.82" + +# Documentation dependencies +[tool.pixi.feature.docs.dependencies] +sphinx = ">=7.0" +sphinx-rtd-theme = ">=1.3" +numpydoc = ">=1.5" +sphinx-gallery = ">=0.14" + +[tool.pixi.feature.docs.pypi-dependencies] +myst-parser = ">=2.0" + +# Build dependencies +[tool.pixi.feature.build.dependencies] +build = "*" +twine = "*" + +[tool.pixi.environments] +default = { features = [], solve-group = "default" } +dev = { features = ["dev", "test", "docs"], solve-group = "default" } +test = { features = ["test"], solve-group = "default" } +docs = { features = ["docs"], solve-group = "default" } + +# Tasks for development workflow +[tool.pixi.tasks] +# Development +install-dev = "pip install -e ." 
+format = "ruff format src/ tests/" +lint = "ruff check src/ tests/" +typecheck = "mypy src/" + +# Testing +test = "pytest tests/ -v" +test-cov = "pytest tests/ --cov=src --cov-report=html --cov-report=term" +test-fast = "pytest tests/ -x -v" +benchmark = "pytest tests/benchmarks/ --benchmark-only" + +# Documentation +docs-build = "sphinx-build docs/ docs/_build/html" +docs-serve = { cmd = "python -m http.server 8000 -d docs/_build/html", depends-on = ["docs-build"] } +docs-clean = "rm -rf docs/_build docs/generated" + +# Build and release +build = "python -m build" +publish-test = { cmd = "twine upload --repository testpypi dist/*", depends-on = ["build"] } +publish = { cmd = "twine upload dist/*", depends-on = ["build"] } + +# Combined workflows +ci = { depends-on = ["format", "lint", "typecheck", "test-cov"] } +pre-commit = { depends-on = ["format", "lint", "test-fast"] } +``` + +**Workflow**: + +```bash +# Initial setup +pixi install --environment dev +pixi run install-dev + +# Development cycle +pixi run format # format code +pixi run lint # check style +pixi run typecheck # type checking +pixi run test # run tests + +# Or run all checks +pixi run ci + +# Build documentation +pixi run docs-build +pixi run docs-serve # view at http://localhost:8000 + +# Release workflow +pixi run build +pixi run publish-test # test on TestPyPI +pixi run publish # publish to PyPI +``` + +### Pattern 4: Conda + PyPI Dependency Strategy + +**Scenario**: Optimize dependency sources for performance and availability + +**Strategy**: + +```toml +[project] +dependencies = [ + # Core scientific stack: prefer conda-forge (optimized builds) + "numpy>=1.24", # MKL or OpenBLAS optimized + "scipy>=1.11", # optimized BLAS/LAPACK + "pandas>=2.0", # optimized pandas + "matplotlib>=3.7", # compiled components + "scikit-learn>=1.3", # optimized algorithms + + # Geospatial/climate: conda-forge essential (C/Fortran deps) + "xarray>=2023.1", + "netcdf4>=1.6", + "h5py>=3.9", + "rasterio>=1.3", # GDAL dependency + + # Data processing: conda-forge preferred + "dask>=2023.1", + "numba>=0.57", # LLVM dependency +] + +[tool.pixi.pypi-dependencies] +# Pure Python packages or PyPI-only packages +my-custom-tool = ">=1.0" +experimental-lib = { git = "https://github.com/user/repo.git" } +internal-pkg = { path = "../internal-pkg", editable = true } +``` + +**Decision Rules**: + +1. **Use conda-forge (pixi add) for**: + - NumPy, SciPy, Pandas (optimized builds) + - Packages with C/C++/Fortran extensions (GDAL, netCDF4, h5py) + - Packages with complex system dependencies (Qt, OpenCV) + - R, Julia, or other language packages + +2. 
**Use PyPI (pixi add --pypi) for**: + - Pure Python packages not in conda-forge + - Bleeding-edge versions before conda-forge packaging + - Internal/private packages + - Editable local packages during development + +### Pattern 5: Reproducible Research Environment + +**Scenario**: Ensure research is reproducible across time and machines + +**Implementation**: + +```toml +[project] +name = "nature-paper-2024" +version = "1.0.0" +description = "Analysis for Nature Paper 2024" +requires-python = ">=3.11,<3.12" # pin Python version range + +dependencies = [ + "python=3.11.6", # exact Python version + "numpy=1.26.2", # exact versions for reproducibility + "pandas=2.1.4", + "scipy=1.11.4", + "matplotlib=3.8.2", + "scikit-learn=1.3.2", +] + +[tool.pixi.pypi-dependencies] +# Pin with exact hashes for ultimate reproducibility +seaborn = "==0.13.0" + +# Analysis environments +[tool.pixi.feature.analysis.dependencies] +jupyter = "1.0.0" +jupyterlab = "4.0.9" + +[tool.pixi.feature.analysis.pypi-dependencies] +jupyterlab-vim = "0.16.0" + +# Environments +[tool.pixi.environments] +default = { solve-group = "default" } +analysis = { features = ["analysis"], solve-group = "default" } + +# Reproducible tasks +[tool.pixi.tasks] +# Data processing pipeline +download-data = "python scripts/01_download.py" +preprocess = { cmd = "python scripts/02_preprocess.py", depends-on = ["download-data"] } +analyze = { cmd = "python scripts/03_analyze.py", depends-on = ["preprocess"] } +visualize = { cmd = "python scripts/04_visualize.py", depends-on = ["analyze"] } +full-pipeline = { depends-on = ["download-data", "preprocess", "analyze", "visualize"] } + +# Notebook execution +run-notebooks = "jupyter nbconvert --execute --to notebook --inplace notebooks/*.ipynb" +``` + +**Best Practices**: + +```bash +# Generate lockfile +pixi install + +# Commit lockfile to repository +git add pixi.lock pyproject.toml +git commit -m "Lock environment for reproducibility" + +# Anyone can recreate exact environment +git clone https://github.com/user/nature-paper-2024.git +cd nature-paper-2024 +pixi install # installs exact versions from pixi.lock + +# Run complete pipeline +pixi run full-pipeline + +# Archive for long-term preservation +pixi list --export environment.yml # backup as conda format +``` + +### Pattern 6: Cross-Platform Development + +**Scenario**: Team members on Linux, macOS (Intel/ARM), and Windows + +**Configuration**: + +```toml +[project] +name = "cross-platform-science" +dependencies = [ + "python>=3.11", + "numpy>=1.24", + "pandas>=2.0", +] + +# Platform-specific dependencies +[tool.pixi.target.linux-64.dependencies] +# Linux-specific optimized builds +mkl = "*" + +[tool.pixi.target.osx-arm64.dependencies] +# Apple Silicon optimizations +accelerate = "*" + +[tool.pixi.target.win-64.dependencies] +# Windows-specific packages +pywin32 = "*" + +# Tasks with platform-specific behavior +[tool.pixi.tasks] +test = "pytest tests/" + +[tool.pixi.target.linux-64.tasks] +test-gpu = "pytest tests/ --gpu" + +[tool.pixi.target.win-64.tasks] +test = "pytest tests/ --timeout=30" # slower on Windows CI +``` + +**Platform Selectors**: + +```toml +# Supported platforms +[tool.pixi.platforms] +linux-64 = true +linux-aarch64 = true +osx-64 = true +osx-arm64 = true +win-64 = true +``` + +### Pattern 7: Task Dependencies and Workflows + +**Scenario**: Complex scientific workflows with data dependencies + +**Implementation**: + +```toml +[tool.pixi.tasks] +# Data acquisition +download-raw = "python scripts/download.py --source=api" 
+validate-raw = { cmd = "python scripts/validate.py data/raw/", depends-on = ["download-raw"] } + +# Data processing pipeline +clean-data = { cmd = "python scripts/clean.py", depends-on = ["validate-raw"] } +transform = { cmd = "python scripts/transform.py", depends-on = ["clean-data"] } +feature-engineering = { cmd = "python scripts/features.py", depends-on = ["transform"] } + +# Analysis +train-model = { cmd = "python scripts/train.py", depends-on = ["feature-engineering"] } +evaluate = { cmd = "python scripts/evaluate.py", depends-on = ["train-model"] } +visualize = { cmd = "python scripts/visualize.py", depends-on = ["evaluate"] } + +# Testing at each stage +test-cleaning = "pytest tests/test_clean.py" +test-transform = "pytest tests/test_transform.py" +test-features = "pytest tests/test_features.py" +test-model = "pytest tests/test_model.py" + +# Combined workflows +all-tests = { depends-on = ["test-cleaning", "test-transform", "test-features", "test-model"] } +full-pipeline = { depends-on = ["download-raw", "validate-raw", "clean-data", "transform", "feature-engineering", "train-model", "evaluate", "visualize"] } +pipeline-with-tests = { depends-on = ["all-tests", "full-pipeline"] } + +# Parallel execution where possible +[tool.pixi.task.download-supplementary] +cmd = "python scripts/download_supplement.py" + +[tool.pixi.task.process-all] +depends-on = ["download-raw", "download-supplementary"] # run in parallel +``` + +**Running Workflows**: + +```bash +# Run entire pipeline +pixi run full-pipeline + +# Run with testing +pixi run pipeline-with-tests + +# Check what will run +pixi task list --summary + +# Visualize task dependencies +pixi task info full-pipeline +``` + +### Pattern 8: Integration with UV for Pure Python Development + +**Scenario**: Use pixi for complex dependencies, uv for fast pure Python workflows + +**Hybrid Approach**: + +```toml +[project] +name = "hybrid-project" +dependencies = [ + # Heavy scientific deps via pixi/conda + "python>=3.11", + "numpy>=1.24", + "scipy>=1.11", + "gdal>=3.7", # complex C++ dependency + "netcdf4>=1.6", # Fortran dependency +] + +[tool.pixi.pypi-dependencies] +# Pure Python packages +requests = ">=2.31" +pydantic = ">=2.0" +typer = ">=0.9" + +[tool.pixi.feature.dev.dependencies] +ruff = "*" +mypy = "*" + +[tool.pixi.feature.dev.pypi-dependencies] +pytest = ">=7.4" + +[tool.pixi.tasks] +# Use uv for fast pure Python operations +install-dev = "uv pip install -e ." +sync-deps = "uv pip sync requirements.txt" +add-py-dep = "uv pip install" +``` + +**Workflow**: + +```bash +# Pixi manages environment with conda packages +pixi install + +# Activate pixi environment +pixi shell + +# Inside pixi shell, use uv for fast pure Python operations +uv pip install requests httpx pydantic # fast pure Python installs +uv pip freeze > requirements-py.txt + +# Or define as tasks +pixi run install-dev +``` + +**When to use this pattern**: +- Project needs conda for compiled deps (GDAL, netCDF, HDF5) +- But also rapid iteration on pure Python dependencies +- Want uv's speed for locking/installing pure Python packages +- Need conda's solver for complex scientific dependency graphs + +### Pattern 9: CI/CD Integration + +**Scenario**: Reproducible testing in GitHub Actions, GitLab CI, etc. 
+ +**GitHub Actions Example**: + +```yaml +# .github/workflows/test.yml +name: Tests + +on: [push, pull_request] + +jobs: + test: + runs-on: ${{ matrix.os }} + strategy: + matrix: + os: [ubuntu-latest, macos-latest, windows-latest] + + steps: + - uses: actions/checkout@v4 + + - name: Setup Pixi + uses: prefix-dev/setup-pixi@v0.4.1 + with: + pixi-version: latest + cache: true + + - name: Install dependencies + run: pixi install --environment test + + - name: Run tests + run: pixi run test + + - name: Upload coverage + uses: codecov/codecov-action@v3 + with: + file: ./coverage.xml + + lint: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + - uses: prefix-dev/setup-pixi@v0.4.1 + - run: pixi run format --check + - run: pixi run lint + + docs: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + - uses: prefix-dev/setup-pixi@v0.4.1 + - run: pixi run --environment docs docs-build + - uses: actions/upload-artifact@v3 + with: + name: documentation + path: docs/_build/html +``` + +**GitLab CI Example**: + +```yaml +# .gitlab-ci.yml +image: ubuntu:latest + +before_script: + - curl -fsSL https://pixi.sh/install.sh | bash + - export PATH=$HOME/.pixi/bin:$PATH + +stages: + - test + - build + +test: + stage: test + script: + - pixi run test + cache: + key: ${CI_COMMIT_REF_SLUG} + paths: + - .pixi/ + +lint: + stage: test + script: + - pixi run lint + - pixi run typecheck + +docs: + stage: build + script: + - pixi run --environment docs docs-build + artifacts: + paths: + - docs/_build/html +``` + +### Pattern 10: Local Development with Remote Computing + +**Scenario**: Develop locally, run heavy computation on remote GPU cluster + +**Local Configuration** (`pyproject.toml`): + +```toml +[project] +dependencies = [ + "numpy>=1.24", + "pandas>=2.0", +] + +[tool.pixi.feature.dev.dependencies] +jupyter = "*" +matplotlib = "*" + +[tool.pixi.feature.remote.dependencies] +# Heavy GPU dependencies only for remote +pytorch-cuda = "11.8.*" +tensorboard = "*" + +[tool.pixi.environments] +default = { features = ["dev"], solve-group = "default" } +remote = { features = ["remote"], solve-group = "remote" } + +[tool.pixi.tasks] +notebook = "jupyter lab" +sync-remote = "rsync -av --exclude='.pixi' . 
user@remote:~/project/" +remote-train = { cmd = "ssh user@remote 'cd ~/project && pixi run train'", depends-on = ["sync-remote"] } +``` + +**Workflow**: + +```bash +# Local development (no GPU deps) +pixi install +pixi run notebook + +# Push to remote and train +pixi run remote-train + +# Or manually +pixi run sync-remote +ssh user@remote +cd ~/project +pixi install --environment remote # installs GPU deps on remote +pixi run --environment remote train +``` + +## Best Practices Checklist + +### Project Setup +- [ ] Use `pixi init --format pyproject` for new projects +- [ ] Set explicit Python version constraint (`python>=3.11,<3.13`) +- [ ] Organize dependencies by source (conda vs PyPI) +- [ ] Create separate features for dev, test, docs environments +- [ ] Define useful tasks for common workflows +- [ ] Set up `.gitignore` to exclude `.pixi/` directory + +### Dependency Management +- [ ] Prefer conda-forge for compiled scientific packages (NumPy, SciPy, GDAL) +- [ ] Use PyPI only for pure Python or conda-unavailable packages +- [ ] Pin exact versions for reproducible research +- [ ] Use version ranges for libraries (allow updates) +- [ ] Specify solve groups for independent environment solving +- [ ] Use `pixi update` regularly to get security patches + +### Reproducibility +- [ ] Commit `pixi.lock` to version control +- [ ] Include all platforms in lockfile for cross-platform teams +- [ ] Document environment recreation steps in README +- [ ] Use exact version pins for published research +- [ ] Test environment from scratch periodically +- [ ] Archive environments for long-term preservation + +### Performance +- [ ] Use pixi's parallel downloads (automatic) +- [ ] Leverage caching in CI/CD (`prefix-dev/setup-pixi` action) +- [ ] Keep environments minimal (only necessary dependencies) +- [ ] Use solve groups to isolate independent environments +- [ ] Clean old packages with `pixi clean cache` + +### Development Workflow +- [ ] Define tasks for common operations (test, lint, format) +- [ ] Use task dependencies for complex workflows +- [ ] Create environment-specific tasks when needed +- [ ] Use `pixi shell` for interactive development +- [ ] Use `pixi run` for automated scripts and CI +- [ ] Test in clean environment before releasing + +### Team Collaboration +- [ ] Document pixi installation in README +- [ ] Provide quick start commands for new contributors +- [ ] Use consistent naming for features and environments +- [ ] Set up pre-commit hooks with pixi tasks +- [ ] Integrate with CI/CD for automated testing +- [ ] Keep pyproject.toml clean and well-commented + +### Security +- [ ] Audit dependencies regularly (`pixi list`) +- [ ] Use trusted channels (conda-forge, PyPI) +- [ ] Review `pixi.lock` changes in PRs +- [ ] Keep pixi updated to latest version +- [ ] Use virtual environments (pixi automatic) +- [ ] Scan for vulnerabilities in dependencies + +## Resources + +### Official Documentation +- **Pixi Website**: https://pixi.sh +- **Documentation**: https://pixi.sh/latest/ +- **GitHub Repository**: https://github.com/prefix-dev/pixi +- **Configuration Reference**: https://pixi.sh/latest/reference/project_configuration/ + +### Community & Support +- **Discord**: https://discord.gg/kKV8ZxyzY4 +- **GitHub Discussions**: https://github.com/prefix-dev/pixi/discussions +- **Issue Tracker**: https://github.com/prefix-dev/pixi/issues + +### Related Technologies +- **Conda-forge**: https://conda-forge.org/ +- **Rattler**: https://github.com/mamba-org/rattler (underlying solver) +- **PyPI**: 
https://pypi.org/ +- **UV Package Manager**: https://github.com/astral-sh/uv + +### Complementary Skills +- **scientific-python-packaging**: Modern Python packaging patterns +- **scientific-python-testing**: Testing strategies with pytest +- **uv-package-manager**: Fast pure-Python package management + +### Learning Resources +- **Pixi Examples**: https://github.com/prefix-dev/pixi/tree/main/examples +- **Migration Guides**: https://pixi.sh/latest/switching_from/conda/ +- **Best Practices**: https://pixi.sh/latest/features/ + +### Scientific Python Ecosystem +- **NumPy**: https://numpy.org/ +- **SciPy**: https://scipy.org/ +- **Pandas**: https://pandas.pydata.org/ +- **Scikit-learn**: https://scikit-learn.org/ +- **PyData**: https://pydata.org/ + +## Common Issues and Solutions + +### Issue: Package Not Found in Conda-forge + +**Problem**: Running `pixi add my-package` fails with "package not found" + +**Solution**: +```bash +# Search conda-forge +pixi search my-package + +# If not in conda-forge, use PyPI +pixi add --pypi my-package + +# Check if package has different name in conda +# Example: scikit-learn (PyPI) vs sklearn (conda) +pixi add scikit-learn # correct conda name +``` + +### Issue: Conflicting Dependencies + +**Problem**: Dependency solver fails with "conflict" error + +**Solution**: +```bash +# Check dependency tree +pixi tree numpy + +# Use solve groups to isolate conflicts +[tool.pixi.environments] +env1 = { features = ["feat1"], solve-group = "group1" } +env2 = { features = ["feat2"], solve-group = "group2" } # separate solver + +# Relax version constraints +# Instead of: numpy==1.26.0 +# Use: numpy>=1.24,<2.0 + +# Force specific channel priority +pixi add numpy -c conda-forge --force-reinstall +``` + +### Issue: Slow Environment Creation + +**Problem**: `pixi install` takes very long + +**Solution**: +```bash +# Use solve groups to avoid re-solving everything +[tool.pixi.environments] +default = { solve-group = "default" } +test = { features = ["test"], solve-group = "default" } # reuses default solve + +# Clean cache if corrupted +pixi clean cache + +# Check for large dependency trees +pixi tree --depth 2 + +# Update pixi to latest version +pixi self-update +``` + +### Issue: Platform-Specific Failures + +**Problem**: Works on Linux but fails on macOS/Windows + +**Solution**: +```toml +# Use platform-specific dependencies +[tool.pixi.target.osx-arm64.dependencies] +# macOS ARM specific packages +tensorflow-macos = "*" + +[tool.pixi.target.linux-64.dependencies] +# Linux-specific +tensorflow = "*" + +# Exclude unsupported platforms +[tool.pixi.platforms] +linux-64 = true +osx-arm64 = true +# win-64 intentionally excluded if unsupported +``` + +### Issue: PyPI Package Installation Fails + +**Problem**: `pixi add --pypi package` fails with build errors + +**Solution**: +```bash +# Install build dependencies from conda first +pixi add python-build setuptools wheel + +# Then retry PyPI package +pixi add --pypi package + +# For packages needing system libraries +pixi add libgdal # system library +pixi add --pypi gdal # Python bindings + +# Check if conda-forge version exists +pixi search gdal # might have compiled version +``` + +### Issue: Environment Activation in Scripts + +**Problem**: Need to run scripts outside of `pixi run` + +**Solution**: +```bash +# Use pixi shell for interactive sessions +pixi shell +python script.py + +# For automation, always use pixi run +pixi run python script.py + +# In bash scripts +#!/usr/bin/env bash +eval "$(pixi shell-hook)" +python 
script.py + +# In task definitions +[tool.pixi.tasks] +run-script = "python script.py" # automatically in environment +``` + +### Issue: Lockfile Merge Conflicts + +**Problem**: Git merge conflicts in `pixi.lock` + +**Solution**: +```bash +# Accept one version +git checkout --theirs pixi.lock # or --ours + +# Regenerate lockfile +pixi install + +# Commit regenerated lockfile +git add pixi.lock +git commit -m "Regenerate lockfile after merge" + +# Prevention: coordinate updates with team +# One person updates dependencies at a time +``` + +### Issue: Missing System Dependencies + +**Problem**: Package fails at runtime with "library not found" + +**Solution**: +```bash +# Check what's actually in environment +pixi list + +# Add system libraries explicitly +pixi add libgdal proj geos # for geospatial +pixi add hdf5 netcdf4 # for climate data +pixi add mkl # for optimized linear algebra + +# Use conda for everything when possible +# Don't mix system packages with conda packages +``` + +### Issue: Cannot Find Executable in Environment + +**Problem**: `pixi run mycommand` fails with "command not found" + +**Solution**: +```bash +# List all installed packages +pixi list + +# Check if package provides executable +pixi add --help # documentation + +# Ensure package is in active environment +[tool.pixi.feature.dev.dependencies] +mypackage = "*" + +[tool.pixi.environments] +default = { features = ["dev"] } # must include feature + +# Or run in specific environment +pixi run --environment dev mycommand +``` + +### Issue: Want to Use Both Pixi and Conda + +**Problem**: Existing conda environment, want to migrate gradually + +**Solution**: +```bash +# Export existing conda environment +conda env export > environment.yml + +# Import to pixi project +pixi init --format pyproject --import-environment environment.yml + +# Or manually alongside +conda activate myenv # activate conda env +pixi shell # activate pixi env (nested) + +# Long term: migrate fully to pixi +# Pixi replaces conda/mamba entirely +``` + +### Issue: Editable Install of Local Package + +**Problem**: Want to develop local package in pixi environment + +**Solution**: +```toml +[tool.pixi.pypi-dependencies] +mypackage = { path = ".", editable = true } + +# Or for relative paths +sibling-package = { path = "../sibling", editable = true } +``` + +```bash +# Install in development mode +pixi install + +# Changes to source immediately reflected +pixi run python -c "import mypackage; print(mypackage.__file__)" +``` + +### Issue: Need Different Python Versions + +**Problem**: Test across Python 3.10, 3.11, 3.12 + +**Solution**: +```toml +[tool.pixi.feature.py310.dependencies] +python = "3.10.*" + +[tool.pixi.feature.py311.dependencies] +python = "3.11.*" + +[tool.pixi.feature.py312.dependencies] +python = "3.12.*" + +[tool.pixi.environments] +py310 = { features = ["py310"], solve-group = "py310" } +py311 = { features = ["py311"], solve-group = "py311" } +py312 = { features = ["py312"], solve-group = "py312" } +``` + +```bash +# Test all versions +pixi run --environment py310 pytest +pixi run --environment py311 pytest +pixi run --environment py312 pytest +``` + +## Summary + +Pixi revolutionizes scientific Python development by unifying conda and PyPI ecosystems with blazing-fast dependency resolution, reproducible multi-platform lockfiles, and seamless environment management. 
By leveraging `pyproject.toml` integration, pixi provides a modern, standards-compliant approach to managing complex scientific dependencies while maintaining compatibility with the broader Python ecosystem. + +**Key advantages for scientific computing:** + +1. **Optimized Scientific Packages**: Access conda-forge's pre-built binaries for NumPy, SciPy, and other compiled packages with MKL/OpenBLAS optimizations +2. **Complex Dependencies Made Simple**: Handle challenging packages like GDAL, netCDF4, and HDF5 that require C/Fortran/C++ system libraries +3. **True Reproducibility**: Multi-platform lockfiles ensure identical environments across Linux, macOS, and Windows +4. **Flexible Environment Management**: Feature-based environments for dev/test/prod, GPU/CPU, or any custom configuration +5. **Fast and Reliable**: 10-100x faster than conda with Rust-based parallel dependency resolution +6. **Task Automation**: Built-in task runner for scientific workflows, testing, and documentation +7. **Best of Both Worlds**: Seamlessly mix conda-forge optimized packages with PyPI's vast ecosystem + +Whether you're conducting reproducible research, developing scientific software, or managing complex data analysis pipelines, pixi provides the robust foundation for modern scientific Python development. By replacing conda/mamba with pixi, you gain speed, reliability, and modern workflows while maintaining full access to the scientific Python ecosystem. + +**Ready to get started?** Install pixi, initialize your project with `pixi init --format pyproject`, and experience the future of scientific Python package management. diff --git a/skills/python-packaging/SKILL.md b/skills/python-packaging/SKILL.md new file mode 100644 index 0000000..1de1baf --- /dev/null +++ b/skills/python-packaging/SKILL.md @@ -0,0 +1,1111 @@ +--- +name: scientific-python-packaging +description: Create distributable scientific Python packages following Scientific Python community best practices with pyproject.toml, src layout, and Hatchling build backend +--- + +# Scientific Python Packaging + +A comprehensive guide to creating, structuring, and distributing Python packages for scientific computing, following the [Scientific Python Community guidelines](https://learn.scientific-python.org/development/guides/packaging-simple/). This skill focuses on modern packaging standards using `pyproject.toml`, PEP 621 metadata, and the Hatchling build backend. + +## Quick Decision Tree + +**Package Structure Selection:** +``` +START + ├─ Pure Python scientific package (most common) → Pattern 1 (src/ layout) + ├─ Need data files with package → Pattern 2 (data/ subdirectory) + ├─ CLI tool → Pattern 5 (add [project.scripts]) + └─ Complex multi-feature package → Pattern 3 (full-featured) +``` + +**Build Backend Choice:** +``` +START → Use Hatchling (recommended for scientific Python) + ├─ Need VCS versioning? → Add hatch-vcs plugin + ├─ Simple manual versioning? → version = "X.Y.Z" in pyproject.toml + └─ Dynamic from __init__.py? → [tool.hatch.version] path +``` + +**Dependency Management:** +``` +START + ├─ Runtime dependencies → [project] dependencies + ├─ Optional features → [project.optional-dependencies] + ├─ Development tools → [dependency-groups] (PEP 735) + └─ Version constraints → Use >= for minimum, avoid upper caps +``` + +**Publishing Workflow:** +``` +1. Build: python -m build +2. Check: twine check dist/* +3. Test: twine upload --repository testpypi dist/* +4. Verify: pip install --index-url https://test.pypi.org/simple/ pkg +5. 
Publish: twine upload dist/* +``` + +**Common Task Quick Reference:** +```bash +# Setup new package +mkdir -p my-pkg/src/my_pkg && cd my-pkg +# Create pyproject.toml with [build-system] and [project] sections + +# Development install +pip install -e . --group dev + +# Build distributions +python -m build + +# Test installation +pip install dist/*.whl + +# Publish +twine upload dist/* +``` + +## When to Use This Skill + +- Creating scientific Python libraries for distribution +- Building research software packages with proper structure +- Publishing scientific packages to PyPI +- Setting up reproducible scientific Python projects +- Creating installable packages with scientific dependencies +- Implementing command-line tools for scientific workflows +- Following community standards for scientific Python development +- Preparing packages for peer review and publication + +## Core Concepts + +### 1. Modern Build Systems + +Python packages now use standardized build systems instead of classic `setup.py`: + +- **PEP 621**: Standardized project metadata in `pyproject.toml` +- **PEP 517/518**: Build system independence +- **Build backend**: Hatchling +- **No classic files**: No `setup.py`, `setup.cfg`, or `MANIFEST.in` + +### 2. Build Backend: Hatchling + +- **Hatchling**: Excellent balance of speed, configurability, and extendability +- Modern, standards-compliant build backend +- Automatic package discovery in `src/` layout +- VCS-aware file inclusion for SDists +- Extensible through plugins + +### 3. Package Structure + +- **src/ layout**: Required for proper isolation (prevents importing uninstalled code) +- **Automatic discovery**: Hatchling auto-detects packages in `src/` +- **Standard structure**: Consistent organization for testing and documentation + +### 4. 
Scientific Python Standards + +- **Dependency management**: Careful version constraints +- **Python version support**: Minimum version without upper caps +- **Development dependencies**: Use dependency-groups (PEP 735) +- **Documentation**: Include README, LICENSE, and docs folder +- **Testing**: Dedicated tests folder + +## Quick Start + +### Minimal Scientific Package Structure + +``` +my-sci-package/ +├── pyproject.toml +├── README.md +├── LICENSE +├── src/ +│ └── my_sci_package/ +│ ├── __init__.py +│ ├── analysis.py +│ └── utils.py +├── tests/ +│ ├── test_analysis.py +│ └── test_utils.py +└── docs/ + └── index.md +``` + +### Minimal pyproject.toml with Hatchling + +```toml +[build-system] +requires = ["hatchling"] +build-backend = "hatchling.build" + +[project] +name = "my-sci-package" +version = "0.1.0" +description = "A scientific Python package for data analysis" +readme = "README.md" +license = "BSD-3-Clause" +license-files = ["LICENSE"] +requires-python = ">=3.9" +authors = [ + {name = "Your Name", email = "you@example.com"}, +] +classifiers = [ + "Development Status :: 4 - Beta", + "Intended Audience :: Science/Research", + "License :: OSI Approved :: BSD License", + "Programming Language :: Python :: 3", + "Programming Language :: Python :: 3.9", + "Programming Language :: Python :: 3.10", + "Programming Language :: Python :: 3.11", + "Programming Language :: Python :: 3.12", + "Programming Language :: Python :: 3.13", + "Topic :: Scientific/Engineering", +] + +dependencies = [ + "numpy>=1.20", + "scipy>=1.7", +] + +[project.urls] +Homepage = "https://github.com/username/my-sci-package" +Documentation = "https://my-sci-package.readthedocs.io" +"Bug Tracker" = "https://github.com/username/my-sci-package/issues" +Discussions = "https://github.com/username/my-sci-package/discussions" +Changelog = "https://my-sci-package.readthedocs.io/en/latest/changelog.html" + +[dependency-groups] +test = [ + "pytest>=7.0", + "pytest-cov>=4.0", +] +dev = [ + {include-group = "test"}, + "ruff>=0.1", + "mypy>=1.0", +] +``` + +## Package Structure Patterns + +### Pattern 1: Pure Python Scientific Package (Recommended) + +``` +my-sci-package/ +├── pyproject.toml +├── README.md +├── LICENSE +├── .gitignore +├── src/ +│ └── my_sci_package/ +│ ├── __init__.py +│ ├── analysis.py +│ ├── preprocessing.py +│ ├── visualization.py +│ ├── utils.py +│ └── py.typed # For type hints +├── tests/ +│ ├── __init__.py +│ ├── test_analysis.py +│ ├── test_preprocessing.py +│ └── test_visualization.py +└── docs/ + ├── conf.py + ├── index.md + └── api.md +``` + +**Key advantages:** +- Prevents accidental imports from source +- Forces proper installation for testing +- Professional structure for scientific libraries +- Clear separation of concerns + +### Pattern 2: Scientific Package with Data Files + +``` +my-sci-package/ +├── pyproject.toml +├── README.md +├── LICENSE +├── src/ +│ └── my_sci_package/ +│ ├── __init__.py +│ ├── analysis.py +│ └── data/ +│ ├── reference.csv +│ ├── constants.json +│ └── coefficients.dat +├── tests/ +│ └── test_analysis.py +└── docs/ + └── index.md +``` + +**Include data files in pyproject.toml (if needed):** + +```toml +[tool.hatch.build.targets.wheel] +packages = ["src/my_sci_package"] + +# Only if you need to explicitly include data +[tool.hatch.build.targets.wheel.force-include] +"src/my_sci_package/data" = "my_sci_package/data" +``` + +**Access data files in code:** + +```python +from importlib.resources import files +import json + +def load_constants(): + """Load constants from package 
data.""" + data_file = files("my_sci_package").joinpath("data/constants.json") + with data_file.open() as f: + return json.load(f) +``` + +## Complete pyproject.toml Examples + +### Pattern 3: Full-Featured Scientific Package + +```toml +[build-system] +requires = ["hatchling"] +build-backend = "hatchling.build" + +[project] +name = "advanced-sci-package" +version = "1.0.0" +description = "Advanced scientific computing package" +readme = "README.md" +license = "BSD-3-Clause" +license-files = ["LICENSE"] +requires-python = ">=3.9" +authors = [ + {name = "Research Team", email = "team@university.edu"}, +] +maintainers = [ + {name = "Lead Maintainer", email = "maintainer@university.edu"}, +] +keywords = ["scientific-computing", "data-analysis", "research"] +classifiers = [ + "Development Status :: 5 - Production/Stable", + "Intended Audience :: Science/Research", + "License :: OSI Approved :: BSD License", + "Programming Language :: Python :: 3", + "Programming Language :: Python :: 3.9", + "Programming Language :: Python :: 3.10", + "Programming Language :: Python :: 3.11", + "Programming Language :: Python :: 3.12", + "Programming Language :: Python :: 3.13", + "Topic :: Scientific/Engineering", + "Topic :: Scientific/Engineering :: Physics", + "Topic :: Scientific/Engineering :: Mathematics", +] + +dependencies = [ + "numpy>=1.20", + "scipy>=1.7", + "pandas>=1.3", + "matplotlib>=3.4", +] + +[project.optional-dependencies] +ml = [ + "scikit-learn>=1.0", + "tensorflow>=2.8", +] +viz = [ + "plotly>=5.0", + "seaborn>=0.11", +] +all = [ + "advanced-sci-package[ml,viz]", +] + +[project.urls] +Homepage = "https://github.com/org/advanced-sci-package" +Documentation = "https://advanced-sci-package.readthedocs.io" +Repository = "https://github.com/org/advanced-sci-package" +"Bug Tracker" = "https://github.com/org/advanced-sci-package/issues" +Discussions = "https://github.com/org/advanced-sci-package/discussions" +Changelog = "https://advanced-sci-package.readthedocs.io/en/latest/changelog.html" + +[project.scripts] +sci-analyze = "advanced_sci_package.cli:main" + +[dependency-groups] +test = [ + "pytest>=7.0", + "pytest-cov>=4.0", + "pytest-xdist>=3.0", +] +docs = [ + "sphinx>=5.0", + "sphinx-rtd-theme>=1.0", + "numpydoc>=1.5", +] +dev = [ + {include-group = "test"}, + {include-group = "docs"}, + "ruff>=0.1", + "mypy>=1.0", + "pre-commit>=3.0", +] + +# Hatchling configuration +[tool.hatch.build.targets.wheel] +packages = ["src/advanced_sci_package"] + +# Ruff configuration (linting and formatting) +[tool.ruff] +line-length = 88 +target-version = "py39" + +[tool.ruff.lint] +select = ["E", "F", "I", "N", "W", "UP", "NPY", "RUF"] +ignore = ["E501"] # Line too long (handled by formatter) + +# Pytest configuration +[tool.pytest.ini_options] +testpaths = ["tests"] +python_files = ["test_*.py"] +addopts = "-v --cov=advanced_sci_package --cov-report=term-missing" + +# MyPy configuration +[tool.mypy] +python_version = "3.9" +warn_return_any = true +warn_unused_configs = true +disallow_untyped_defs = true + +# Coverage configuration +[tool.coverage.run] +source = ["src"] +omit = ["*/tests/*"] + +[tool.coverage.report] +exclude_lines = [ + "pragma: no cover", + "def __repr__", + "raise AssertionError", + "raise NotImplementedError", + "if __name__ == .__main__.:", +] +``` + +## Project Metadata + +### License (Modern SPDX Format) + +Use SPDX identifiers (supported by hatchling>=1.26): + +```toml +[project] +license = "BSD-3-Clause" +license-files = ["LICENSE"] +``` + +Common scientific licenses: +- `MIT` - 
Permissive, simple +- `BSD-3-Clause` - Permissive, commonly used in science +- `Apache-2.0` - Permissive, explicit patent grant +- `GPL-3.0-or-later` - Copyleft + +**Do not include License classifiers if using the `license` field.** + +### Python Version Requirements + +**Best practice**: Specify minimum version only, no upper cap: + +```toml +requires-python = ">=3.9" +``` + +This allows pip to back-solve for old package versions when needed. + +### Dependencies + +**Use appropriate version constraints:** + +```toml +dependencies = [ + "numpy>=1.20", # Minimum version + "scipy>=1.7,<2.0", # Compatible range (use sparingly) + "pandas>=1.3", # Open-ended (preferred) + "matplotlib>=3.4", # Minimum version +] +``` + +**Avoid pinning exact versions unless absolutely necessary.** + +### Classifiers + +Important classifiers for scientific packages: + +```toml +classifiers = [ + "Development Status :: 4 - Beta", + "Intended Audience :: Science/Research", + "License :: OSI Approved :: BSD License", + "Programming Language :: Python :: 3", + "Programming Language :: Python :: 3.9", + "Programming Language :: Python :: 3.10", + "Programming Language :: Python :: 3.11", + "Programming Language :: Python :: 3.12", + "Programming Language :: Python :: 3.13", + "Topic :: Scientific/Engineering", + "Topic :: Scientific/Engineering :: Physics", + "Typing :: Typed", +] +``` + +[Browse all classifiers](https://pypi.org/classifiers/) + +## Optional Dependencies (Extras) + +Use extras for optional scientific features: + +```toml +[project.optional-dependencies] +plotting = [ + "matplotlib>=3.4", + "seaborn>=0.11", +] +ml = [ + "scikit-learn>=1.0", + "xgboost>=1.5", +] +parallel = [ + "dask[array]>=2021.0", + "joblib>=1.0", +] +all = [ + "my-sci-package[plotting,ml,parallel]", +] +``` + +**Install with extras:** +```bash +pip install my-sci-package[plotting] +pip install my-sci-package[plotting,ml] +pip install my-sci-package[all] +``` + +## Development Dependencies (Dependency Groups) + +Use `dependency-groups` (PEP 735) instead of extras for development tools: + +```toml +[dependency-groups] +test = [ + "pytest>=7.0", + "pytest-cov>=4.0", + "hypothesis>=6.0", +] +docs = [ + "sphinx>=5.0", + "numpydoc>=1.5", + "sphinx-gallery>=0.11", +] +dev = [ + {include-group = "test"}, + {include-group = "docs"}, + "ruff>=0.1", + "mypy>=1.0", +] +``` + +**Install dependency groups:** +```bash +# Using uv (recommended) +uv pip install --group dev + +# Using pip 25.1+ +pip install --group dev + +# Traditional approach with editable install +pip install -e ".[dev]" # if using extras +``` + +**Advantages over extras:** +- Formally standardized +- More composable +- Not available on PyPI (development-only) +- Installed by default with `uv` + +## Command-Line Interface + +### Pattern 5: Scientific CLI Tool + +```python +# src/my_sci_package/cli.py +import click +import numpy as np +from pathlib import Path + +@click.group() +@click.version_option() +def cli(): + """Scientific analysis CLI tool.""" + pass + +@cli.command() +@click.argument("input_file", type=click.Path(exists=True)) +@click.option("--output", "-o", type=click.Path(), help="Output file path") +@click.option("--threshold", "-t", type=float, default=0.5, help="Analysis threshold") +def analyze(input_file: str, output: str, threshold: float): + """Analyze scientific data from input file.""" + # Load and analyze data + data = np.loadtxt(input_file) + result = np.mean(data[data > threshold]) + + click.echo(f"Analysis complete: mean = {result:.4f}") + + if output: + 
np.savetxt(output, [result]) + click.echo(f"Results saved to {output}") + +@cli.command() +@click.argument("input_file", type=click.Path(exists=True)) +@click.option("--format", type=click.Choice(["png", "pdf", "svg"]), default="png") +def plot(input_file: str, format: str): + """Generate plots from data.""" + import matplotlib.pyplot as plt + + data = np.loadtxt(input_file) + plt.plot(data) + output_file = f"plot.{format}" + plt.savefig(output_file) + click.echo(f"Plot saved to {output_file}") + +def main(): + """Entry point for CLI.""" + cli() + +if __name__ == "__main__": + main() +``` + +**Register in pyproject.toml:** + +```toml +[project.scripts] +sci-analyze = "my_sci_package.cli:main" +``` + +**Usage:** +```bash +pip install -e . +sci-analyze analyze data.txt --threshold 0.7 +sci-analyze plot data.txt --format pdf +``` + +## Versioning + +### Pattern 6: Manual Versioning + +```toml +[project] +version = "1.2.3" +``` + +```python +# src/my_sci_package/__init__.py +__version__ = "1.2.3" +``` + +### Pattern 7: Dynamic Versioning with Hatchling + +```toml +[project] +dynamic = ["version"] + +[tool.hatch.version] +path = "src/my_sci_package/__init__.py" +``` + +```python +# src/my_sci_package/__init__.py +__version__ = "1.2.3" +``` + +### Pattern 8: Git-Based Versioning with Hatchling + +```toml +[build-system] +requires = ["hatchling", "hatch-vcs"] +build-backend = "hatchling.build" + +[project] +dynamic = ["version"] + +[tool.hatch.version] +source = "vcs" + +[tool.hatch.build.hooks.vcs] +version-file = "src/my_sci_package/_version.py" +``` + +**Semantic versioning for scientific software:** +- `MAJOR`: Breaking API changes +- `MINOR`: New features, backward compatible +- `PATCH`: Bug fixes + +## Building and Publishing + +### Pattern 9: Build Package Locally + +```bash +# Install build tools +pip install build + +# Build distribution +python -m build + +# Creates: +# dist/my-sci-package-1.0.0.tar.gz (source distribution) +# dist/my_sci_package-1.0.0-py3-none-any.whl (wheel) + +# Verify the distribution +pip install twine +twine check dist/* + +# Inspect contents +tar -tvf dist/*.tar.gz +unzip -l dist/*.whl +``` + +**Critical**: Test the SDist contents to ensure all necessary files are included. + +### Pattern 10: Publishing to PyPI + +```bash +# Install publishing tools +pip install twine + +# Test on TestPyPI first (always!) +twine upload --repository testpypi dist/* + +# Install and test from TestPyPI +pip install --index-url https://test.pypi.org/simple/ my-sci-package + +# If everything works, publish to PyPI +twine upload dist/* +``` + +**Using API tokens (recommended):** + +Create `~/.pypirc`: +```ini +[distutils] +index-servers = + pypi + testpypi + +[pypi] +username = __token__ +password = pypi-...your-token... + +[testpypi] +username = __token__ +password = pypi-...your-test-token... 
+``` + +### Pattern 11: Automated Publishing with GitHub Actions + +```yaml +# .github/workflows/publish.yml +name: Publish to PyPI + +on: + release: + types: [published] + +jobs: + publish: + runs-on: ubuntu-latest + environment: release + permissions: + id-token: write # For trusted publishing + + steps: + - uses: actions/checkout@v4 + with: + fetch-depth: 0 + + - name: Set up Python + uses: actions/setup-python@v5 + with: + python-version: "3.11" + + - name: Install build tools + run: pip install build + + - name: Build package + run: python -m build + + - name: Publish to PyPI + uses: pypa/gh-action-pypi-publish@release/v1 +``` + +**Use PyPI trusted publishing instead of API tokens for GitHub Actions.** + +## Testing Installation + +### Pattern 12: Editable Install for Development + +```bash +# Install in development mode +pip install -e . + +# With dependency groups +pip install -e . --group dev + +# Using uv (recommended for scientific workflows) +uv pip install -e . --group dev + +# Now changes to source code are immediately reflected +``` + +### Pattern 13: Testing in Isolated Environment + +```bash +# Create and activate virtual environment +python -m venv test-env +source test-env/bin/activate # Linux/Mac + +# Install from wheel +pip install dist/my_sci_package-1.0.0-py3-none-any.whl + +# Test import and version +python -c "import my_sci_package; print(my_sci_package.__version__)" + +# Test CLI +sci-analyze --help + +# Cleanup +deactivate +rm -rf test-env +``` + +## Documentation + +### Pattern 14: Scientific Package README.md + +```markdown +# My Scientific Package + +[![PyPI version](https://badge.fury.io/py/my-sci-package.svg)](https://pypi.org/project/my-sci-package/) +[![Python versions](https://img.shields.io/pypi/pyversions/my-sci-package.svg)](https://pypi.org/project/my-sci-package/) +[![Tests](https://github.com/username/my-sci-package/workflows/Tests/badge.svg)](https://github.com/username/my-sci-package/actions) +[![Documentation](https://readthedocs.org/projects/my-sci-package/badge/?version=latest)](https://my-sci-package.readthedocs.io/) + +A Python package for [brief description of scientific purpose]. + +## Features + +- Feature 1: Description +- Feature 2: Description +- Feature 3: Description + +## Installation + +```bash +pip install my-sci-package +``` + +For plotting capabilities: +```bash +pip install my-sci-package[plotting] +``` + +## Quick Start + +```python +import my_sci_package as msp +import numpy as np + +# Example usage +data = np.random.randn(100) +result = msp.analyze(data, threshold=0.5) +print(f"Result: {result}") +``` + +## Documentation + +Full documentation: https://my-sci-package.readthedocs.io + +## Citation + +If you use this package in your research, please cite: + +```bibtex +@software{my_sci_package, + author = {Your Name}, + title = {My Scientific Package}, + year = {2025}, + url = {https://github.com/username/my-sci-package} +} +``` + +## Development + +```bash +git clone https://github.com/username/my-sci-package.git +cd my-sci-package +pip install -e . --group dev +pytest +``` + +## License + +BSD-3-Clause License - see LICENSE file for details. 
+``` + +## File Templates + +### .gitignore for Scientific Python Packages + +```gitignore +# Build artifacts +build/ +dist/ +*.egg-info/ +*.egg +.eggs/ +src/**/_version.py + +# Python +__pycache__/ +*.py[cod] +*$py.class +*.so + +# Virtual environments +venv/ +env/ +ENV/ + +# IDE +.vscode/ +.idea/ +*.swp + +# Testing +.pytest_cache/ +.coverage +htmlcov/ +.hypothesis/ + +# Documentation +docs/_build/ +docs/_generated/ + +# Scientific data (adjust as needed) +*.hdf5 +*.nc +*.mat +data/processed/ + +# Jupyter +.ipynb_checkpoints/ +*.ipynb + +# Distribution +*.whl +*.tar.gz +``` + +### Pattern 15: Sphinx Documentation Setup + +```python +# docs/conf.py +import sys +from pathlib import Path + +# Add package to path +sys.path.insert(0, str(Path("..").resolve() / "src")) + +# Project information +project = "My Scientific Package" +copyright = "2025, Your Name" +author = "Your Name" + +# Extensions +extensions = [ + "sphinx.ext.autodoc", + "sphinx.ext.napoleon", # NumPy/Google style docstrings + "sphinx.ext.viewcode", + "sphinx.ext.mathjax", # Math rendering + "sphinx.ext.intersphinx", + "numpydoc", # NumPy documentation style +] + +# Intersphinx mapping +intersphinx_mapping = { + "python": ("https://docs.python.org/3", None), + "numpy": ("https://numpy.org/doc/stable/", None), + "scipy": ("https://docs.scipy.org/doc/scipy/", None), + "pandas": ("https://pandas.pydata.org/docs/", None), +} + +# Theme +html_theme = "sphinx_rtd_theme" +``` + +## Checklist for Publishing Scientific Packages + +- [ ] Code is tested with pytest (>90% coverage recommended) +- [ ] Documentation is complete (README, docstrings, Sphinx docs) +- [ ] Version number follows semantic versioning +- [ ] CHANGELOG.md or NEWS.md updated +- [ ] LICENSE file included with appropriate license +- [ ] pyproject.toml has complete metadata +- [ ] Package uses src/ layout +- [ ] Package builds without errors (`python -m build`) +- [ ] SDist contents verified (`tar -tvf dist/*.tar.gz`) +- [ ] Installation tested in clean environment +- [ ] CLI tools work if applicable +- [ ] All classifiers are appropriate +- [ ] Python version constraint is correct (no upper bound) +- [ ] Dependencies have appropriate version constraints +- [ ] Repository is linked in project.urls +- [ ] Tested on TestPyPI first +- [ ] GitHub release created (if using) +- [ ] Documentation published (ReadTheDocs, GitHub Pages) +- [ ] Citation information included (CITATION.cff or README) + +## Best Practices for Scientific Python Packages + +1. **Use src/ layout** - Prevents importing uninstalled code, ensures proper testing +2. **Use pyproject.toml** - Modern standard, tool-independent configuration +3. **Use Hatchling** - Modern, fast, and configurable build backend +4. **No classic files** - Avoid setup.py, setup.cfg, MANIFEST.in +5. **Version constraints** - Minimum versions for dependencies, no upper cap for Python +6. **Test SDist contents** - Always verify what files are included/excluded +7. **Use TestPyPI** - Always test publishing before going to production +8. **Document thoroughly** - README, docstrings, Sphinx documentation +9. **Include LICENSE** - Use SPDX identifiers, choose appropriate scientific license +10. **Use dependency-groups** - For development dependencies (PEP 735) +11. **Semantic versioning** - Clear versioning strategy +12. **Automate CI/CD** - GitHub Actions for testing and publishing +13. **Type hints** - Include py.typed marker for typed packages +14. **Citation information** - Make it easy for users to cite your work +15. 
**Community standards** - Follow Scientific Python guidelines + +## Scientific Python Specific Considerations + +### NumPy-style Docstrings + +```python +def analyze_data(data, threshold=0.5, method="mean"): + """ + Analyze scientific data above a threshold. + + Parameters + ---------- + data : array_like + Input data array to analyze. + threshold : float, optional + Minimum value for inclusion in analysis, by default 0.5. + method : {"mean", "median", "std"}, optional + Statistical method to apply, by default "mean". + + Returns + ------- + result : float + Computed statistical result. + + Raises + ------ + ValueError + If method is not recognized. + + Examples + -------- + >>> import numpy as np + >>> data = np.array([0.1, 0.6, 0.8, 0.3, 0.9]) + >>> analyze_data(data, threshold=0.5) + 0.7666666666666667 + + Notes + ----- + This function uses NumPy for efficient computation. + + References + ---------- + .. [1] Harris et al., "Array programming with NumPy", Nature 585, 2020. + """ + pass +``` + +### Scientific Dependencies + +Common scientific Python dependencies: + +```toml +dependencies = [ + "numpy>=1.20", # Arrays and numerical computing + "scipy>=1.7", # Scientific computing algorithms + "pandas>=1.3", # Data structures and analysis + "matplotlib>=3.4", # Plotting + "xarray>=0.19", # Labeled multi-dimensional arrays + "scikit-learn>=1.0", # Machine learning + "astropy>=5.0", # Astronomy (if applicable) +] +``` + +### Reproducibility + +Include information for reproducibility: + +```toml +[project.urls] +"Source Code" = "https://github.com/org/package" +"Documentation" = "https://package.readthedocs.io" +"Bug Reports" = "https://github.com/org/package/issues" +"Changelog" = "https://github.com/org/package/blob/main/CHANGELOG.md" +"Citation" = "https://doi.org/10.xxxx/xxxxx" # DOI if available +``` + +## Resources + +- **Scientific Python Development Guide**: https://learn.scientific-python.org/development/ +- **Simple Packaging Guide**: https://learn.scientific-python.org/development/guides/packaging-simple/ +- **Python Packaging Guide**: https://packaging.python.org/ +- **PyPI**: https://pypi.org/ +- **TestPyPI**: https://test.pypi.org/ +- **Hatchling documentation**: https://hatch.pypa.io/latest/ +- **build**: https://pypa-build.readthedocs.io/ +- **twine**: https://twine.readthedocs.io/ +- **Scientific Python Cookie**: https://github.com/scientific-python/cookie +- **NumPy documentation style**: https://numpydoc.readthedocs.io/ + +## Common Issues and Solutions + +### Issue: Import errors in tests + +**Problem**: Tests import the source code instead of installed package. + +**Solution**: Use src/ layout and install package with `pip install -e .` + +### Issue: Missing files in distribution + +**Problem**: Data files or documentation not included in SDist/wheel. + +**Solution**: +- For Hatchling: VCS ignore file controls SDist contents +- Check with: `tar -tvf dist/*.tar.gz` +- Explicitly configure if needed in `[tool.hatch.build]` + +### Issue: Dependency conflicts + +**Problem**: Users cannot install due to incompatible dependency versions. + +**Solution**: Use minimal version constraints, avoid upper bounds on dependencies. + +### Issue: Python version incompatibility + +**Problem**: Package doesn't work on newer Python versions. + +**Solution**: Don't cap `requires-python`, test on multiple Python versions with CI. 
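+
+A small post-install smoke test can back up several items above (installation works in a clean environment, version metadata is consistent, `requires-python` is not capped). This is only a sketch: it assumes the example package name `my_sci_package` used earlier and a `__version__` attribute, so adapt the names to your project.
+
+```python
+# tests/test_install_smoke.py
+# Sketch of a post-install smoke test (assumes the example package
+# "my_sci_package" from this guide and that it exposes __version__).
+from importlib.metadata import metadata, version
+
+import my_sci_package
+
+
+def test_version_matches_distribution_metadata():
+    """Installed distribution metadata should match the package version."""
+    assert my_sci_package.__version__ == version("my_sci_package")
+
+
+def test_requires_python_is_not_capped():
+    """Guard against reintroducing an upper bound on requires-python."""
+    requires_python = metadata("my_sci_package")["Requires-Python"]
+    assert "<" not in requires_python  # e.g. ">=3.9", not ">=3.9,<3.13"
+```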
diff --git a/skills/python-testing/SKILL.md b/skills/python-testing/SKILL.md new file mode 100644 index 0000000..b13cb81 --- /dev/null +++ b/skills/python-testing/SKILL.md @@ -0,0 +1,1407 @@ +--- +name: scientific-python-testing +description: Write robust, maintainable tests for scientific Python packages using pytest best practices following Scientific Python community guidelines +--- + +# Scientific Python Testing with pytest + +A comprehensive guide to writing effective tests for scientific Python packages using pytest, following the [Scientific Python Community guidelines](https://learn.scientific-python.org/development/guides/pytest/) and [testing tutorial](https://learn.scientific-python.org/development/tutorials/test/). This skill focuses on modern testing patterns, fixtures, parametrization, and best practices specific to scientific computing. + +## Quick Reference Card + +**Common Testing Tasks - Quick Decisions:** + +```python +# 1. Basic test → Use simple assert +def test_function(): + assert result == expected + +# 2. Floating-point comparison → Use approx +from pytest import approx +assert result == approx(0.333, rel=1e-6) + +# 3. Testing exceptions → Use pytest.raises +with pytest.raises(ValueError, match="must be positive"): + function(-1) + +# 4. Multiple inputs → Use parametrize +@pytest.mark.parametrize("input,expected", [(1,1), (2,4), (3,9)]) +def test_square(input, expected): + assert input**2 == expected + +# 5. Reusable setup → Use fixture +@pytest.fixture +def sample_data(): + return np.array([1, 2, 3, 4, 5]) + +# 6. NumPy arrays → Use approx or numpy.testing +assert np.mean(data) == approx(3.0) +``` + +**Decision Tree:** +- Need multiple test cases with same logic? → **Parametrize** +- Need reusable test data/setup? → **Fixture** +- Testing floating-point results? → **pytest.approx** +- Testing exceptions/warnings? → **pytest.raises / pytest.warns** +- Complex numerical arrays? → **numpy.testing.assert_allclose** +- Organizing by speed? → **Markers and separate directories** + +## When to Use This Skill + +- Writing tests for scientific Python packages and libraries +- Testing numerical algorithms and scientific computations +- Setting up test infrastructure for research software +- Implementing continuous integration for scientific code +- Testing data analysis pipelines and workflows +- Validating scientific simulations and models +- Ensuring reproducibility and correctness of research code +- Testing code that uses NumPy, SciPy, Pandas, and other scientific libraries + +## Core Concepts + +### 1. Why pytest for Scientific Python + +pytest is the de facto standard for testing Python packages because it: + +- **Simple syntax**: Just use Python's `assert` statement +- **Detailed reporting**: Clear, informative failure messages +- **Powerful features**: Fixtures, parametrization, marks, plugins +- **Scientific ecosystem**: Native support for NumPy arrays, approximate comparisons +- **Community standard**: Used by NumPy, SciPy, Pandas, scikit-learn, and more + +### 2. 
Test Structure and Organization + +**Standard test directory layout:** + +```text +my-package/ +├── src/ +│ └── my_package/ +│ ├── __init__.py +│ ├── analysis.py +│ └── utils.py +├── tests/ +│ ├── conftest.py +│ ├── test_analysis.py +│ └── test_utils.py +└── pyproject.toml +``` + +**Key principles:** + +- Tests directory separate from source code (alongside `src/`) +- Test files named `test_*.py` (pytest discovery) +- Test functions named `test_*` (pytest discovery) +- No `__init__.py` in tests directory (avoid importability issues) +- Test against installed package, not local source + +### 3. pytest Configuration + +Configure pytest in `pyproject.toml` (recommended for modern packages): + +```toml +[tool.pytest.ini_options] +minversion = "7.0" +addopts = [ + "-ra", # Show summary of all test outcomes + "--showlocals", # Show local variables in tracebacks + "--strict-markers", # Error on undefined markers + "--strict-config", # Error on config issues +] +xfail_strict = true # xfail tests must fail +filterwarnings = [ + "error", # Treat warnings as errors +] +log_cli_level = "info" # Log level for test output +testpaths = [ + "tests", # Limit pytest to tests directory +] +``` + +## Testing Principles + +Following the [Scientific Python testing recommendations](https://learn.scientific-python.org/development/principles/testing/), effective testing provides multiple benefits and should follow key principles: + +### Advantages of Testing + +- **Trustworthy code**: Well-tested code behaves as expected and can be relied upon +- **Living documentation**: Tests communicate intent and expected behavior, validated with each run +- **Preventing failure**: Tests protect against implementation errors and unexpected dependency changes +- **Confidence when making changes**: Thorough test suites enable adding features, fixing bugs, and refactoring with confidence + +### Fundamental Principles + +**1. Any test case is better than none** + +When in doubt, write the test that makes sense at the time: +- Test critical behaviors, features, and logic +- Write clear, expressive, well-documented tests +- Tests are documentation of developer intentions +- Good tests make it clear what they are testing and how + +Don't get bogged down in taxonomy when learning—focus on writing tests that work. + +**2. As long as that test is correct** + +It's surprisingly easy to write tests that pass when they should fail: +- **Check that your test fails when it should**: Deliberately break the code and verify the test fails +- **Keep it simple**: Excessive mocks and fixtures make it difficult to know what's being tested +- **Test one thing at a time**: A single test should test a single behavior + +**3. Start with Public Interface Tests** + +Begin by testing from the perspective of a user: +- Test code as users will interact with it +- Keep tests simple and readable for documentation purposes +- Focus on supported use cases +- Avoid testing private attributes +- Minimize use of mocks/patches + +**4. 
Organize Tests into Suites** + +Divide tests by type and execution time for efficiency: +- **Unit tests**: Fast, isolated tests of individual components +- **Integration tests**: Tests of component interactions and dependencies +- **End-to-end tests**: Complete workflow testing + +Benefits: +- Run relevant tests quickly and frequently +- "Fail fast" by running fast suites first +- Easier to read and reason about +- Avoid false positives from expected external failures + +### Outside-In Testing Approach + +The recommended approach is **outside-in**, starting from the user's perspective: + +1. **Public Interface Tests**: Test from user perspective, focusing on behavior and features +2. **Integration Tests**: Test that components work together and with dependencies +3. **Unit Tests**: Test individual units in isolation, optimized for speed + +This approach ensures you're building the right thing before optimizing implementation details. + +## Quick Start + +### Minimal Test Example + +```python +# tests/test_basic.py + +def test_simple_math(): + """Test basic arithmetic.""" + assert 4 == 2**2 + +def test_string_operations(): + """Test string methods.""" + result = "hello world".upper() + assert result == "HELLO WORLD" + assert "HELLO" in result +``` + +### Scientific Test Example + +```python +# tests/test_scientific.py +import numpy as np +from pytest import approx + +from my_package.analysis import compute_mean, fit_linear + +def test_compute_mean(): + """Test mean calculation.""" + data = np.array([1.0, 2.0, 3.0, 4.0, 5.0]) + result = compute_mean(data) + assert result == approx(3.0) + +def test_fit_linear(): + """Test linear regression.""" + x = np.array([0, 1, 2, 3, 4]) + y = np.array([0, 2, 4, 6, 8]) + slope, intercept = fit_linear(x, y) + + assert slope == approx(2.0) + assert intercept == approx(0.0) +``` + +## Testing Best Practices + +### Pattern 1: Writing Simple, Focused Tests + +**Bad - Multiple assertions testing different things:** +```python +def test_everything(): + data = load_data("input.csv") + assert len(data) > 0 + processed = process_data(data) + assert processed.mean() > 0 + result = analyze(processed) + assert result.success +``` + +**Good - Separate tests for each behavior:** +```python +def test_load_data_returns_nonempty(): + """Data loading should return at least one row.""" + data = load_data("input.csv") + assert len(data) > 0 + +def test_process_data_positive_mean(): + """Processed data should have positive mean.""" + data = load_data("input.csv") + processed = process_data(data) + assert processed.mean() > 0 + +def test_analyze_succeeds(): + """Analysis should complete successfully.""" + data = load_data("input.csv") + processed = process_data(data) + result = analyze(processed) + assert result.success +``` + +**Arrange-Act-Assert pattern:** +```python +def test_computation(): + # Arrange - Set up test data + data = np.array([1, 2, 3, 4, 5]) + expected = 3.0 + + # Act - Execute the function + result = compute_mean(data) + + # Assert - Check the result + assert result == approx(expected) +``` + +### Pattern 2: Testing for Failures + +Always test that your code raises appropriate exceptions: + +```python +import pytest + +def test_zero_division_raises(): + """Division by zero should raise ZeroDivisionError.""" + with pytest.raises(ZeroDivisionError): + result = 1 / 0 + +def test_invalid_input_raises(): + """Invalid input should raise ValueError.""" + with pytest.raises(ValueError, match="must be positive"): + result = compute_sqrt(-1) + +def 
test_deprecation_warning():
+    """Deprecated function should warn."""
+    with pytest.warns(DeprecationWarning):
+        result = old_function()
+
+def test_deprecated_call():
+    """Check for deprecated API usage."""
+    with pytest.deprecated_call():
+        result = legacy_api()
+```
+
+### Pattern 3: Approximate Comparisons
+
+Scientific computing often involves floating-point arithmetic that cannot be tested for exact equality:
+
+**For scalars:**
+```python
+from pytest import approx
+
+def test_approximate_scalar():
+    """Test with approximate comparison."""
+    result = 1 / 3
+    assert result == approx(0.33333333333, rel=1e-10)
+
+    # Default relative tolerance is 1e-6
+    assert 0.3 + 0.3 == approx(0.6)
+
+def test_approximate_with_absolute_tolerance():
+    """Test with absolute tolerance."""
+    result = compute_small_value()
+    assert result == approx(0.0, abs=1e-10)
+```
+
+**For NumPy arrays (preferred over numpy.testing):**
+```python
+import numpy as np
+from pytest import approx
+
+def test_array_approximate():
+    """Test NumPy arrays with approx."""
+    result = np.array([0.1, 0.2, 0.3])
+    expected = np.array([0.10001, 0.20001, 0.30001])
+    # Values differ by ~1e-4, so loosen the default 1e-6 relative tolerance
+    assert result == approx(expected, rel=1e-3)
+
+def test_array_with_nan():
+    """Handle NaN values in arrays."""
+    result = np.array([1.0, np.nan, 3.0])
+    expected = np.array([1.0, np.nan, 3.0])
+    assert result == approx(expected, nan_ok=True)
+```
+
+**When to use numpy.testing:**
+```python
+import numpy as np
+from numpy.testing import assert_allclose, assert_array_equal
+
+def test_exact_integer_array():
+    """Use numpy.testing for exact integer comparisons."""
+    result = np.array([1, 2, 3])
+    expected = np.array([1, 2, 3])
+    assert_array_equal(result, expected)
+
+def test_complex_array_tolerances():
+    """Use numpy.testing for complex tolerance requirements."""
+    result = compute_result()
+    expected = load_reference()
+    assert_allclose(result, expected, rtol=1e-7, atol=1e-10)
+```
+
+### Pattern 4: Using Fixtures
+
+Fixtures provide reusable test setup and teardown:
+
+**Basic fixtures:**
+```python
+import pytest
+import numpy as np
+from pytest import approx
+
+@pytest.fixture
+def sample_data():
+    """Provide sample data for tests."""
+    return np.array([1.0, 2.0, 3.0, 4.0, 5.0])
+
+@pytest.fixture
+def empty_array():
+    """Provide empty array for edge case tests."""
+    return np.array([])
+
+def test_mean_with_fixture(sample_data):
+    """Test using fixture."""
+    result = np.mean(sample_data)
+    assert result == approx(3.0)
+
+def test_empty_array(empty_array):
+    """Test edge case with empty array."""
+    with pytest.warns(RuntimeWarning):
+        result = np.mean(empty_array)
+    assert np.isnan(result)
+```
+
+**Fixtures with setup and teardown:**
+```python
+import pytest
+import numpy as np
+import tempfile
+from pathlib import Path
+from pytest import approx
+
+@pytest.fixture
+def temp_datafile():
+    """Create temporary data file for tests."""
+    # Setup
+    tmpfile = tempfile.NamedTemporaryFile(mode='w', delete=False, suffix='.txt')
+    tmpfile.write("1.0\n2.0\n3.0\n")
+    tmpfile.close()
+
+    # Provide to test
+    yield Path(tmpfile.name)
+
+    # Teardown
+    Path(tmpfile.name).unlink()
+
+def test_load_data(temp_datafile):
+    """Test data loading from file."""
+    data = np.loadtxt(temp_datafile)
+    assert len(data) == 3
+    assert data[0] == approx(1.0)
+```
+
+**Fixture scopes:**
+```python
+@pytest.fixture(scope="function")  # Default, run for each test
+def data_per_test():
+    return create_data()
+
+@pytest.fixture(scope="class")  # Run once per test class
+def data_per_class():
+    return create_data()
+
+@pytest.fixture(scope="module")  # Run once per module
+def 
data_per_module(): + return load_large_dataset() + +@pytest.fixture(scope="session") # Run once per test session +def database_connection(): + conn = create_connection() + yield conn + conn.close() +``` + +**Auto-use fixtures:** +```python +@pytest.fixture(autouse=True) +def reset_random_seed(): + """Reset random seed before each test for reproducibility.""" + np.random.seed(42) +``` + +### Pattern 5: Parametrized Tests + +Test the same function with multiple inputs: + +**Basic parametrization:** +```python +import pytest + +@pytest.mark.parametrize("input_val,expected", [ + (0, 0), + (1, 1), + (2, 4), + (3, 9), + (-2, 4), +]) +def test_square(input_val, expected): + """Test squaring with multiple inputs.""" + assert input_val**2 == expected + +@pytest.mark.parametrize("angle", [0, np.pi/6, np.pi/4, np.pi/3, np.pi/2]) +def test_sine_range(angle): + """Test sine function returns values in [0, 1] for first quadrant.""" + result = np.sin(angle) + assert 0 <= result <= 1 +``` + +**Multiple parameters:** +```python +@pytest.mark.parametrize("n_air,n_water", [ + (1.0, 1.33), + (1.0, 1.5), + (1.5, 1.0), +]) +def test_refraction(n_air, n_water): + """Test Snell's law with different refractive indices.""" + angle_in = np.pi / 4 + angle_out = snell(angle_in, n_air, n_water) + assert angle_out >= 0 + assert angle_out <= np.pi / 2 +``` + +**Parametrized fixtures:** +```python +@pytest.fixture(params=[1, 2, 3], ids=["one", "two", "three"]) +def dimension(request): + """Parametrized fixture for different dimensions.""" + return request.param + +def test_array_creation(dimension): + """Test array creation in different dimensions.""" + shape = tuple([10] * dimension) + arr = np.zeros(shape) + assert arr.ndim == dimension + assert arr.shape == shape +``` + +**Combining parametrization with custom IDs:** +```python +@pytest.mark.parametrize( + "data,expected", + [ + (np.array([1, 2, 3]), 2.0), + (np.array([1, 1, 1]), 1.0), + (np.array([0, 10]), 5.0), + ], + ids=["sequential", "constant", "extremes"] +) +def test_mean_with_ids(data, expected): + """Test mean with descriptive test IDs.""" + assert np.mean(data) == approx(expected) +``` + +### Pattern 6: Test Organization with Markers + +Use markers to organize and selectively run tests: + +**Basic markers:** +```python +import pytest + +@pytest.mark.slow +def test_expensive_computation(): + """Test that takes a long time.""" + result = run_simulation(n_iterations=1000000) + assert result.converged + +@pytest.mark.requires_gpu +def test_gpu_acceleration(): + """Test that requires GPU hardware.""" + result = compute_on_gpu(large_array) + assert result.success + +@pytest.mark.integration +def test_full_pipeline(): + """Integration test for complete workflow.""" + data = load_data() + processed = preprocess(data) + result = analyze(processed) + output = save_results(result) + assert output.exists() +``` + +**Running specific markers:** +```bash +pytest -m slow # Run only slow tests +pytest -m "not slow" # Skip slow tests +pytest -m "slow or gpu" # Run slow OR gpu tests +pytest -m "slow and integration" # Run slow AND integration tests +``` + +**Skip and xfail markers:** +```python +@pytest.mark.skip(reason="Feature not implemented yet") +def test_future_feature(): + """Test for feature under development.""" + result = future_function() + assert result.success + +@pytest.mark.skipif(sys.version_info < (3, 10), reason="Requires Python 3.10+") +def test_pattern_matching(): + """Test using Python 3.10+ features.""" + match value: + case 0: + result = "zero" + case _: 
+ result = "other" + assert result == "zero" + +@pytest.mark.xfail(reason="Known bug in upstream library") +def test_known_failure(): + """Test that currently fails due to known issue.""" + result = buggy_function() + assert result == expected + +@pytest.mark.xfail(strict=True) +def test_must_fail(): + """Test that MUST fail (test will fail if it passes).""" + with pytest.raises(NotImplementedError): + unimplemented_function() +``` + +**Custom markers in pyproject.toml:** +```toml +[tool.pytest.ini_options] +markers = [ + "slow: marks tests as slow (deselect with '-m \"not slow\"')", + "requires_gpu: marks tests that need GPU hardware", + "integration: marks tests as integration tests", + "unit: marks tests as unit tests", +] +``` + +### Pattern 6b: Organizing Test Suites by Directory + +Following [Scientific Python recommendations](https://learn.scientific-python.org/development/principles/testing/#test-suites), organize tests into separate directories by type and execution time: + +```text +tests/ +├── unit/ # Fast, isolated unit tests +│ ├── conftest.py +│ ├── test_analysis.py +│ └── test_utils.py +├── integration/ # Integration tests +│ ├── conftest.py +│ └── test_pipeline.py +├── e2e/ # End-to-end tests +│ └── test_workflows.py +└── conftest.py # Shared fixtures +``` + +**Run specific test suites:** +```bash +# Run only unit tests (fast) +pytest tests/unit/ + +# Run integration tests after unit tests pass +pytest tests/integration/ + +# Run all tests +pytest +``` + +**Auto-mark all tests in a directory using conftest.py:** +```python +# tests/unit/conftest.py +import pytest + +def pytest_collection_modifyitems(session, config, items): + """Automatically mark all tests in this directory as unit tests.""" + for item in items: + item.add_marker(pytest.mark.unit) +``` + +**Benefits of organized test suites:** +- Run fast tests first ("fail fast" principle) +- Developers can run relevant tests quickly +- Clear separation of test types +- Avoid false positives from slow/flaky tests +- Better CI/CD optimization + +**Example test runner strategy:** +```bash +# Run fast unit tests first, stop on failure +pytest tests/unit/ -x || exit 1 + +# If unit tests pass, run integration tests +pytest tests/integration/ -x || exit 1 + +# Finally run slow end-to-end tests +pytest tests/e2e/ +``` + +### Pattern 7: Mocking and Monkeypatching + +Mock expensive operations or external dependencies: + +**Basic monkeypatching:** +```python +import platform + +def test_platform_specific_behavior(monkeypatch): + """Test behavior on different platforms.""" + # Mock platform.system() to return "Linux" + monkeypatch.setattr(platform, "system", lambda: "Linux") + result = get_platform_specific_path() + assert result == "/usr/local/data" + + # Change mock to return "Windows" + monkeypatch.setattr(platform, "system", lambda: "Windows") + result = get_platform_specific_path() + assert result == r"C:\Users\data" +``` + +**Mocking with pytest-mock:** +```python +import pytest +from unittest.mock import Mock + +def test_expensive_computation(mocker): + """Mock expensive computation.""" + # Mock the expensive function + mock_compute = mocker.patch("my_package.analysis.expensive_compute") + mock_compute.return_value = 42 + + result = run_analysis() + + # Verify the mock was called + mock_compute.assert_called_once() + assert result == 42 + +def test_matplotlib_plotting(mocker): + """Test plotting without creating actual plots.""" + mock_plt = mocker.patch("matplotlib.pyplot") + + create_plot(data) + + # Verify plot was created 
+ mock_plt.figure.assert_called_once() + mock_plt.plot.assert_called_once() + mock_plt.savefig.assert_called_once_with("output.png") +``` + +**Fixture for repeated mocking:** +```python +@pytest.fixture +def mock_matplotlib(mocker): + """Mock matplotlib for testing plots.""" + fig = mocker.Mock(spec=plt.Figure) + ax = mocker.Mock(spec=plt.Axes) + line2d = mocker.Mock(name="plot", spec=plt.Line2D) + ax.plot.return_value = (line2d,) + + mpl = mocker.patch("matplotlib.pyplot", autospec=True) + mocker.patch("matplotlib.pyplot.subplots", return_value=(fig, ax)) + + return {"fig": fig, "ax": ax, "mpl": mpl} + +def test_my_plot(mock_matplotlib): + """Test plotting function.""" + ax = mock_matplotlib["ax"] + my_plotting_function(ax=ax) + + ax.plot.assert_called_once() + ax.set_xlabel.assert_called_once() +``` + +### Pattern 8: Testing Against Installed Version + +Always test the installed package, not local source: + +**Why this matters:** +``` +my-package/ +├── src/ +│ └── my_package/ +│ ├── __init__.py +│ ├── data/ # Data files +│ │ └── reference.csv +│ └── analysis.py +└── tests/ + └── test_analysis.py +``` + +**Use src/ layout + editable install:** +```bash +# Install in editable mode +pip install -e . + +# Run tests against installed version +pytest +``` + +**Benefits:** +- Tests ensure package installs correctly +- Catches missing files (like data files) +- Tests work in CI/CD environments +- Validates package structure and imports + +**In tests, import from package:** +```python +# Good - imports installed package +from my_package.analysis import compute_mean + +# Bad - would import from local src/ if not using src/ layout +# from analysis import compute_mean +``` + +### Pattern 8b: Import Best Practices in Tests + +Following [Scientific Python unit testing guidelines](https://learn.scientific-python.org/development/principles/testing/#unit-tests), proper import patterns make tests more maintainable: + +**Keep imports local to file under test:** +```python +# Good - Import from the file being tested +from my_package.analysis import MyClass, compute_mean + +def test_compute_mean(): + """Test imports from module under test.""" + data = MyClass() + result = compute_mean(data) + assert result > 0 +``` + +**Why this matters:** +- When code is refactored and symbols move, tests don't break +- Tests only care about symbols used in the file under test +- Reduces coupling between tests and internal code organization + +**Import specific symbols, not entire modules:** +```python +# Good - Specific imports, easy to mock +from numpy import mean as np_mean, ndarray as NpArray + +def my_function(data: NpArray) -> float: + return np_mean(data) + +# Good - Easy to patch in tests +def test_my_function(mocker): + mock_mean = mocker.patch("my_package.analysis.np_mean") + # ... +``` + +```python +# Less ideal - Harder to mock effectively +import numpy as np + +def my_function(data: np.ndarray) -> float: + return np.mean(data) + +# Less ideal - Complex patching required +def test_my_function(mocker): + # Must patch through the aliased namespace + mock_mean = mocker.patch("my_package.analysis.np.mean") + # ... 
+``` + +**Consider meaningful aliases:** +```python +# Make imports meaningful to your domain +from numpy import sum as numeric_sum +from scipy.stats import ttest_ind as statistical_test + +# Easy to understand and replace +result = numeric_sum(values) +p_value = statistical_test(group1, group2) +``` + +This approach makes it easier to: +- Replace implementations without changing tests +- Mock dependencies effectively +- Understand code purpose from import names + +## Running pytest + +### Basic Usage + +```bash +# Run all tests +pytest + +# Run specific file +pytest tests/test_analysis.py + +# Run specific test +pytest tests/test_analysis.py::test_mean + +# Run tests matching pattern +pytest -k "mean or median" + +# Verbose output +pytest -v + +# Show local variables in failures +pytest -l # or --showlocals + +# Stop at first failure +pytest -x + +# Show stdout/stderr +pytest -s +``` + +### Debugging Tests + +```bash +# Drop into debugger on failure +pytest --pdb + +# Drop into debugger at start of each test +pytest --trace + +# Run last failed tests +pytest --lf + +# Run failed tests first, then rest +pytest --ff + +# Show which tests would be run (dry run) +pytest --collect-only +``` + +### Coverage + +```bash +# Install pytest-cov +pip install pytest-cov + +# Run with coverage +pytest --cov=my_package + +# With coverage report +pytest --cov=my_package --cov-report=html + +# With missing lines +pytest --cov=my_package --cov-report=term-missing + +# Fail if coverage below threshold +pytest --cov=my_package --cov-fail-under=90 +``` + +**Configure in pyproject.toml:** +```toml +[tool.pytest.ini_options] +addopts = [ + "--cov=my_package", + "--cov-report=term-missing", + "--cov-report=html", +] + +[tool.coverage.run] +source = ["src"] +omit = [ + "*/tests/*", + "*/__init__.py", +] + +[tool.coverage.report] +exclude_lines = [ + "pragma: no cover", + "def __repr__", + "raise AssertionError", + "raise NotImplementedError", + "if __name__ == .__main__.:", + "if TYPE_CHECKING:", + "@abstractmethod", +] +``` + +## Scientific Python Testing Patterns + +### Pattern 9: Testing Numerical Algorithms + +```python +import numpy as np +from pytest import approx + +def test_numerical_stability(): + """Test algorithm is numerically stable.""" + data = np.array([1e10, 1.0, -1e10]) + result = stable_sum(data) + assert result == approx(1.0) + +def test_convergence(): + """Test iterative algorithm converges.""" + x0 = np.array([1.0, 1.0, 1.0]) + result = iterative_solver(x0, tol=1e-8, max_iter=1000) + + assert result.converged + assert result.iterations < 1000 + assert result.residual < 1e-8 + +def test_against_analytical_solution(): + """Test against known analytical result.""" + x = np.linspace(0, 1, 100) + numerical = compute_integral(lambda t: t**2, x) + analytical = x**3 / 3 + assert numerical == approx(analytical, rel=1e-6) + +def test_conservation_law(): + """Test that physical conservation law holds.""" + initial_energy = compute_energy(system) + system.evolve(dt=0.01, steps=1000) + final_energy = compute_energy(system) + + # Energy should be conserved (within numerical error) + assert final_energy == approx(initial_energy, rel=1e-10) +``` + +### Pattern 10: Testing with Different NumPy dtypes + +```python +@pytest.mark.parametrize("dtype", [ + np.float32, + np.float64, + np.complex64, + np.complex128, +]) +def test_computation_dtypes(dtype): + """Test function works with different dtypes.""" + data = np.array([1, 2, 3, 4, 5], dtype=dtype) + result = compute_transform(data) + + assert result.dtype == 
dtype + assert result.shape == data.shape + +@pytest.mark.parametrize("dtype", [np.int32, np.int64, np.float32, np.float64]) +def test_integer_and_float_types(dtype): + """Test handling of integer and float types.""" + arr = np.array([1, 2, 3], dtype=dtype) + result = safe_divide(arr, 2) + + # Result should be floating point + assert result.dtype in [np.float32, np.float64] +``` + +### Pattern 11: Testing Random/Stochastic Code + +```python +def test_random_with_seed(): + """Test random code with fixed seed for reproducibility.""" + np.random.seed(42) + result1 = generate_random_samples(n=100) + + np.random.seed(42) + result2 = generate_random_samples(n=100) + + # Should get identical results with same seed + assert np.array_equal(result1, result2) + +def test_statistical_properties(): + """Test statistical properties of random output.""" + np.random.seed(123) + samples = generate_normal_samples(n=100000, mean=0, std=1) + + # Test mean and std are close to expected (not exact due to randomness) + assert np.mean(samples) == approx(0, abs=0.01) + assert np.std(samples) == approx(1, abs=0.01) + +@pytest.mark.parametrize("seed", [42, 123, 456]) +def test_reproducibility_with_seeds(seed): + """Test reproducibility with different seeds.""" + np.random.seed(seed) + result = stochastic_algorithm() + + # Should complete successfully regardless of seed + assert result.success +``` + +### Pattern 12: Testing Data Pipelines + +```python +def test_pipeline_end_to_end(tmp_path): + """Test complete data pipeline.""" + # Arrange - Create input data + input_file = tmp_path / "input.csv" + input_file.write_text("x,y\n1,2\n3,4\n5,6\n") + + output_file = tmp_path / "output.csv" + + # Act - Run pipeline + result = run_pipeline(input_file, output_file) + + # Assert - Check results + assert result.success + assert output_file.exists() + + output_data = np.loadtxt(output_file, delimiter=",", skiprows=1) + assert len(output_data) == 3 + +def test_pipeline_stages_independently(): + """Test each pipeline stage separately.""" + # Test stage 1 + raw_data = load_data("input.csv") + assert len(raw_data) > 0 + + # Test stage 2 + cleaned = clean_data(raw_data) + assert not np.any(np.isnan(cleaned)) + + # Test stage 3 + transformed = transform_data(cleaned) + assert transformed.shape == cleaned.shape + + # Test stage 4 + result = analyze_data(transformed) + assert result.metrics["r2"] > 0.9 +``` + +### Pattern 13: Property-Based Testing with Hypothesis + +For complex scientific code, consider property-based testing: + +```python +from hypothesis import given, strategies as st +from hypothesis.extra.numpy import arrays +import numpy as np + +@given(arrays(np.float64, shape=st.integers(1, 100))) +def test_mean_is_bounded(arr): + """Mean should be between min and max.""" + if len(arr) > 0 and not np.any(np.isnan(arr)): + mean = np.mean(arr) + assert np.min(arr) <= mean <= np.max(arr) + +@given( + x=arrays(np.float64, shape=10, elements=st.floats(-100, 100)), + y=arrays(np.float64, shape=10, elements=st.floats(-100, 100)) +) +def test_linear_fit_properties(x, y): + """Test properties of linear regression.""" + if not (np.any(np.isnan(x)) or np.any(np.isnan(y))): + slope, intercept = fit_linear(x, y) + + # Predictions should be finite + predictions = slope * x + intercept + assert np.all(np.isfinite(predictions)) +``` + +## Test Configuration Examples + +### Complete pyproject.toml Testing Section + +```toml +[tool.pytest.ini_options] +minversion = "7.0" +addopts = [ + "-ra", # Show summary of all test outcomes + 
"--showlocals", # Show local variables in tracebacks + "--strict-markers", # Error on undefined markers + "--strict-config", # Error on config issues + "--cov=my_package", # Coverage for package + "--cov-report=term-missing", # Show missing lines + "--cov-report=html", # HTML coverage report +] +xfail_strict = true # xfail tests must fail +filterwarnings = [ + "error", # Treat warnings as errors + "ignore::DeprecationWarning:pkg_resources", # Ignore specific warning + "ignore::PendingDeprecationWarning", +] +log_cli_level = "info" # Log level for test output +testpaths = [ + "tests", # Test directory +] +markers = [ + "slow: marks tests as slow (deselect with '-m \"not slow\"')", + "integration: marks tests as integration tests", + "requires_gpu: marks tests that need GPU hardware", +] + +[tool.coverage.run] +source = ["src"] +omit = [ + "*/tests/*", + "*/__init__.py", + "*/conftest.py", +] +branch = true # Measure branch coverage + +[tool.coverage.report] +exclude_lines = [ + "pragma: no cover", + "def __repr__", + "raise AssertionError", + "raise NotImplementedError", + "if __name__ == .__main__.:", + "if TYPE_CHECKING:", + "@abstractmethod", +] +precision = 2 +show_missing = true +skip_covered = false +``` + +### conftest.py for Shared Fixtures + +```python +# tests/conftest.py +import pytest +import numpy as np +from pathlib import Path + +@pytest.fixture(scope="session") +def test_data_dir(): + """Provide path to test data directory.""" + return Path(__file__).parent / "data" + +@pytest.fixture +def sample_array(): + """Provide sample NumPy array.""" + np.random.seed(42) + return np.random.randn(100) + +@pytest.fixture +def temp_output_dir(tmp_path): + """Provide temporary directory for test outputs.""" + output_dir = tmp_path / "output" + output_dir.mkdir() + return output_dir + +@pytest.fixture(autouse=True) +def reset_random_state(): + """Reset random state before each test.""" + np.random.seed(42) + +@pytest.fixture(scope="session") +def large_dataset(): + """Load large dataset once per test session.""" + return load_reference_data() + +# Platform-specific fixtures +@pytest.fixture(params=["Linux", "Darwin", "Windows"]) +def platform_name(request, monkeypatch): + """Parametrize tests across platforms.""" + monkeypatch.setattr("platform.system", lambda: request.param) + return request.param +``` + +## Common Testing Pitfalls and Solutions + +### Pitfall 1: Testing Implementation Instead of Behavior + +**Bad:** +```python +def test_uses_numpy_mean(): + """Test that function uses np.mean.""" # Testing implementation! + # This is fragile - breaks if implementation changes + pass +``` + +**Good:** +```python +def test_computes_correct_average(): + """Test that function returns correct average.""" + data = np.array([1, 2, 3, 4, 5]) + result = compute_average(data) + assert result == approx(3.0) +``` + +### Pitfall 2: Non-Deterministic Tests + +**Bad:** +```python +def test_random_sampling(): + samples = generate_samples() # Uses random seed from system time! + assert samples[0] > 0 # Might fail randomly +``` + +**Good:** +```python +def test_random_sampling(): + np.random.seed(42) # Fixed seed + samples = generate_samples() + assert samples[0] == approx(0.4967, rel=1e-4) +``` + +### Pitfall 3: Exact Floating-Point Comparisons + +**Bad:** +```python +def test_computation(): + result = 0.1 + 0.2 + assert result == 0.3 # Fails due to floating-point error! 
+``` + +**Good:** +```python +def test_computation(): + result = 0.1 + 0.2 + assert result == approx(0.3) +``` + +### Pitfall 4: Testing Too Much in One Test + +**Bad:** +```python +def test_entire_analysis(): + # Load data + data = load_data() + assert data is not None + + # Process + processed = process(data) + assert len(processed) > 0 + + # Analyze + result = analyze(processed) + assert result.score > 0.8 + + # Save + save_results(result, "output.txt") + assert Path("output.txt").exists() +``` + +**Good:** +```python +def test_load_data_succeeds(): + data = load_data() + assert data is not None + +def test_process_returns_nonempty(): + data = load_data() + processed = process(data) + assert len(processed) > 0 + +def test_analyze_gives_good_score(): + data = load_data() + processed = process(data) + result = analyze(processed) + assert result.score > 0.8 + +def test_save_results_creates_file(tmp_path): + output_file = tmp_path / "output.txt" + result = create_mock_result() + save_results(result, output_file) + assert output_file.exists() +``` + +## Testing Checklist + +- [ ] Tests are in `tests/` directory separate from source +- [ ] Test files named `test_*.py` +- [ ] Test functions named `test_*` +- [ ] Tests run against installed package (use src/ layout) +- [ ] pytest configured in `pyproject.toml` +- [ ] Using `pytest.approx` for floating-point comparisons +- [ ] Tests check exceptions with `pytest.raises` +- [ ] Tests check warnings with `pytest.warns` +- [ ] Parametrized tests for multiple inputs +- [ ] Fixtures for reusable setup +- [ ] Markers used for test organization +- [ ] Random tests use fixed seeds +- [ ] Tests are independent (can run in any order) +- [ ] Each test focuses on one behavior +- [ ] Coverage > 80% (preferably > 90%) +- [ ] All tests pass before committing +- [ ] Slow tests marked with `@pytest.mark.slow` +- [ ] Integration tests marked appropriately +- [ ] CI configured to run tests automatically + +## Continuous Integration + +### GitHub Actions Example + +```yaml +# .github/workflows/tests.yml +name: Tests + +on: + push: + branches: [main] + pull_request: + branches: [main] + +jobs: + test: + runs-on: ${{ matrix.os }} + strategy: + matrix: + os: [ubuntu-latest, macos-latest, windows-latest] + python-version: ["3.9", "3.10", "3.11", "3.12"] + + steps: + - uses: actions/checkout@v4 + + - name: Set up Python ${{ matrix.python-version }} + uses: actions/setup-python@v5 + with: + python-version: ${{ matrix.python-version }} + + - name: Install dependencies + run: | + python -m pip install --upgrade pip + pip install -e ".[dev]" + + - name: Run tests + run: | + pytest --cov=my_package --cov-report=xml + + - name: Upload coverage + uses: codecov/codecov-action@v4 + with: + files: ./coverage.xml +``` + +## Resources + +- **Scientific Python pytest Guide**: +- **Scientific Python Testing Tutorial**: +- **Scientific Python Testing Principles**: +- **pytest Documentation**: +- **pytest-cov**: +- **pytest-mock**: +- **Hypothesis (property-based testing)**: +- **NumPy testing utilities**: +- **Testing best practices**: + +## Summary + +Testing scientific Python code with pytest, following Scientific Python community principles, provides: + +1. **Confidence**: Know your code works correctly +2. **Reproducibility**: Ensure consistent behavior across environments +3. **Documentation**: Tests show how code should be used and communicate developer intent +4. **Refactoring safety**: Change code without breaking functionality +5. 
**Regression prevention**: Catch bugs before they reach users +6. **Scientific rigor**: Validate numerical accuracy and physical correctness + +**Key testing principles:** + +- Start with **public interface tests** from the user's perspective +- Organize tests into **suites** (unit, integration, e2e) by type and speed +- Follow **outside-in** approach: public interface → integration → unit tests +- Keep tests **simple, focused, and independent** +- Test **behavior rather than implementation** +- Use pytest's powerful features (fixtures, parametrization, markers) effectively +- Always verify tests **fail when they should** to avoid false confidence + +**Remember**: Any test is better than none, but well-organized tests following these principles create trustworthy, maintainable scientific software that the community can rely on.
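+
+As a final illustration of the last point, checking that a test can actually fail, here is a minimal sketch; `compute_mean`, `broken_mean`, and the helper below are hypothetical names used only for demonstration.
+
+```python
+# Sketch: confirm that a shared check really trips on a wrong implementation.
+import numpy as np
+import pytest
+from pytest import approx
+
+
+def compute_mean(data):
+    """Correct implementation under test."""
+    return float(np.mean(data))
+
+
+def broken_mean(data):
+    """Deliberately wrong implementation, used to exercise the check."""
+    return float(np.sum(data))
+
+
+def check_mean_impl(impl):
+    """Shared assertion helper used by both tests."""
+    assert impl(np.array([1.0, 2.0, 3.0])) == approx(2.0)
+
+
+def test_compute_mean_passes():
+    check_mean_impl(compute_mean)
+
+
+def test_check_catches_broken_implementation():
+    """The check must fail when given a broken implementation."""
+    with pytest.raises(AssertionError):
+        check_mean_impl(broken_mean)
+```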