Initial commit

This commit is contained in:
Zhongwei Li
2025-11-29 18:49:58 +08:00
commit 5007abf04b
89 changed files with 44129 additions and 0 deletions


@@ -0,0 +1,714 @@
---
name: python3-development
description: 'The model must use this skill when: 1. Working within any Python project. 2. Python CLI applications with Typer and Rich are mentioned by the user. 3. Tasked with Python script writing or editing. 4. Building CI scripts or tools. 5. Creating portable Python scripts with stdlib only. 6. Planning out a Python package design. 7. Running any Python script or test. 8. Writing tests (unit, integration, e2e, validation) for a Python script, package, or application. 9. Reviewing Python code against best practices or for code smells. 10. The python command fails to run or errors, or the python3 command shows errors. 11. Pre-commit or linting errors occur in Python files. 12. Writing or editing Python code in a git repository.\n<hint>This skill provides: 1. The user''s preferred workflow patterns for test-driven development, feature addition, refactoring, debugging, and code review using modern Python 3.11+ patterns (including PEP 723 inline metadata, native generics, and type-safe async processing). 2. References to favored modules. 3. Working pyproject.toml configurations. 4. Linting and formatting configuration and troubleshooting. 5. Resource files that provide solutions to known errors and linting issues. 6. Project layouts the user prefers.</hint>'
version: "1.1.0"
last_updated: "2025-11-04"
python_compatibility: "3.11+"
---
# Opinionated Python Development Skill
## Role Identification (Mandatory)
The model must identify its ROLE_TYPE and echo the following statement:
```text
My ROLE_TYPE is "<the role type>". I follow instructions given to "the model" and "<role name>".
```
Where:
- `<the role type>` is either "orchestrator" or "sub-agent" based on the ROLE_TYPE identification rules in CLAUDE.md
- `<role name>` is "orchestrator" if ROLE_TYPE is orchestrator, or "sub-agent" if ROLE_TYPE is sub-agent
**Example for orchestrator:**
```text
My ROLE_TYPE is "orchestrator". I follow instructions given to "the model" and "orchestrator".
```
**Example for sub-agent:**
```text
My ROLE_TYPE is "sub-agent". I follow instructions given to "the model" and "sub-agent".
```
---
Orchestration guide for Python development using specialized agents and modern Python 3.11-3.14 patterns.
## Skill Architecture
### Bundled Resources (Included in This Skill)
**Reference Documentation:**
- [User Project Conventions](./references/user-project-conventions.md) - Extracted conventions from user's production projects (MANDATORY for new projects)
- [Modern Python Modules](./references/modern-modules.md) - 50+ library guides with usage patterns and best practices
- [Tool & Library Registry](./references/tool-library-registry.md) - Development tools catalog for linting, testing, and build automation
- [API Reference](./references/api_reference.md) - API specifications and integration guides
- [Python Development Orchestration](./references/python-development-orchestration.md) - Detailed workflow patterns for TDD, feature addition, refactoring, and code review
**Command Templates and Guides** (`commands/`):
- Reference material for creating slash commands (NOT the actual slash commands)
- Command templates and patterns for development
- Testing and development workflow guides
- See [Commands README](./commands/README.md) for details
**Scripts and Assets:**
- Example scripts demonstrating patterns
- Configuration templates and boilerplate
### External Dependencies (Required - Not Bundled)
**Agents** (install to `~/.claude/agents/`):
- `@agent-python-cli-architect` - Python CLI development with Typer and Rich
- `@agent-python-pytest-architect` - Test suite creation and planning
- `@agent-python-code-reviewer` - Post-implementation code review
- `@agent-python-portable-script` - Standalone stdlib-only script creation
- `@agent-spec-architect` - Architecture design
- `@agent-spec-planner` - Task breakdown and planning
- `@agent-spec-analyst` - Requirements gathering
**Slash Commands** (install to `~/.claude/commands/`):
- `/modernpython` - Python 3.11+ pattern enforcement and legacy code detection
- `/shebangpython` - PEP 723 inline script metadata validation
**System Tools** (install via package manager or uv):
- `uv` - Python package and project manager (required)
- `ruff` - Linter and formatter
- `pyright` - Type checker (Microsoft)
- `mypy` - Static type checker
- `pytest` - Testing framework
- `pre-commit` - Git hook framework
- `mutmut` - Mutation testing (for critical code)
- `bandit` - Security scanner (for critical code)
**Installation Notes:**
- Agents and slash commands must be installed separately in their respective directories
- This skill provides orchestration guidance and references; agents perform actual implementation
- Use the `uv` skill for comprehensive uv documentation and package management guidance
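For illustration, one hedged way to install the system tools with uv (assuming uv itself is already installed; whether a tool is installed globally or per project depends on the repository):
```bash
# Global CLI tools via uv's isolated tool manager
uv tool install ruff
uv tool install pre-commit
# Per-project development dependencies
uv add --dev pytest mypy pyright bandit mutmut
```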
## Core Concepts
### Python Development Standards
This skill provides orchestration patterns, modern Python 3.11+ standards, quality gates, and reference documentation for Python development.
**Commands** (external - in `~/.claude/commands/`):
- `/modernpython` - Validates Python 3.11+ patterns, identifies legacy code
- `/shebangpython` - Validates correct shebang for all Python scripts
- Note: This skill contains command templates in `commands/` directory, not the actual slash commands
**Reference Documentation**:
- Modern Python modules (50+ libraries)
- Tool and library registry with template variable system
- API specifications
- Working configurations for pyproject.toml, ruff, mypy, pytest
**Docstring Standard**: Google style (Args/Returns/Raises sections). See [User Project Conventions](./references/user-project-conventions.md) for ruff pydocstyle configuration (`convention = "google"`).
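As a quick illustration of the expected shape (a minimal sketch, not taken from the reference docs):
```python
def divide(numerator: float, denominator: float) -> float:
    """Divide two numbers.

    Args:
        numerator: Value to be divided.
        denominator: Value to divide by; must be non-zero.

    Returns:
        The quotient as a float.

    Raises:
        ZeroDivisionError: If denominator is zero.
    """
    return numerator / denominator
```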
**CRITICAL: Pyproject.toml Template Variables**:
All pyproject.toml examples use explicit template variables (e.g., `{{project_name_from_directory_or_git_remote}}`) instead of generic placeholders. The model MUST replace ALL template variables with actual values before creating files. See [Tool & Library Registry sections 18-19](./references/tool-library-registry.md#18-pyprojecttoml-template-variable-reference) for:
- Complete variable reference and sourcing methods
- Mandatory rules for file creation
- Validation and verification procedures
### Script Dependency Trade-offs
Understand the complexity vs portability trade-off when creating Python CLI scripts:
**Scripts with dependencies (Typer + Rich via PEP 723)**:
- **LESS development complexity** - Leverage well-tested libraries for argument parsing, formatting, validation
- **LESS code to write** - Typer handles CLI boilerplate, Rich handles output formatting
- **Better UX** - Colors, progress bars, structured output built-in
- **Just as simple to execute** - PEP 723 makes it a single-file executable; uv handles dependencies automatically
- **Requires network access** on first run (to fetch packages)
**stdlib-only scripts**:
- **MORE development complexity** - Build everything from scratch (manual argparse, manual tree formatting, manual color codes)
- **MORE code to write and test** - Everything must be implemented manually
- **Basic UX** - Limited formatting without external libraries
- **Maximum portability** - Runs on ANY Python installation without network access
- **Best for:** Air-gapped systems, restricted corporate environments, embedded systems
**Default recommendation:** Use Typer + Rich with PEP 723 unless you have specific portability requirements that prevent network access.
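For orientation, a minimal sketch of the recommended shape (the command, option names, and version pins are illustrative, not from this skill's assets):
```python
#!/usr/bin/env -S uv --quiet run --active --script
# /// script
# requires-python = ">=3.11"
# dependencies = ["typer>=0.12", "rich>=13.0"]
# ///
"""Minimal Typer + Rich CLI skeleton (illustrative)."""
from typing import Annotated

import typer
from rich.console import Console

app = typer.Typer()
console = Console()


@app.command()
def greet(name: Annotated[str, typer.Argument(help="Name to greet")] = "world") -> None:
    """Print a friendly greeting."""
    console.print(f":sparkles: Hello, {name}!")


if __name__ == "__main__":
    app()
```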
**See:**
- [Python Development Orchestration Guide](./references/python-development-orchestration.md) for detailed agent selection criteria
- [Typer and Rich CLI Examples](./assets/typer_examples/index.md) for Rich width handling solutions
### Rich Panel and Table Width Handling
**Common Problem**: Rich containers (Panel, Table) wrap content at 80 characters in CI/non-TTY environments, breaking URLs, commands, and structured output.
**Two Solutions Depending on Context**:
#### Solution 1: Plain Text (No Containers)
For plain text output that shouldn't wrap:
```python
from rich.console import Console
console = Console()
# URLs, paths, commands - never wrap
console.print(long_url, crop=False, overflow="ignore")
```
#### Solution 2: Rich Containers (Panel/Table)
For Panel and Table that contain long content, `crop=False` alone doesn't work because containers calculate their own internal layout. Use `get_rendered_width()` helper with different patterns for Panel vs Table:
```python
from rich.console import Console, RenderableType
from rich.measure import Measurement
from rich.panel import Panel
from rich.table import Table
def get_rendered_width(renderable: RenderableType) -> int:
"""Get actual rendered width of Rich renderable.
Handles color codes, Unicode, styling, padding, borders.
Works with Panel, Table, or any Rich container.
"""
temp_console = Console(width=9999)
measurement = Measurement.get(temp_console, temp_console.options, renderable)
return int(measurement.maximum)
console = Console()
# Panel: Set Console width (Panel fills Console width)
panel = Panel(long_content)
panel_width = get_rendered_width(panel)
console.width = panel_width # Set Console width, NOT panel.width
console.print(panel, crop=False, overflow="ignore", no_wrap=True, soft_wrap=True)
# Table: Set Table width (Table controls its own width)
table = Table()
table.add_column("Type", style="cyan", no_wrap=True)
table.add_column("Value", style="green", no_wrap=True)
table.add_row("Data", long_content)
table.width = get_rendered_width(table) # Set Table width
console.print(table, crop=False, overflow="ignore", no_wrap=True, soft_wrap=True)
```
**Executable Examples**: See [./assets/typer_examples/](./assets/typer_examples/index.md) for complete working scripts:
- `console_no_wrap_example.py` - Plain text wrapping solutions
- `console_containers_no_wrap.py` - Panel/Table width handling with `get_rendered_width()`
### Rich Emoji Usage
**CRITICAL**: In Rich console output, always use Rich emoji tokens, never literal Unicode emojis.
**Never:**
- Literal Unicode: `✅ ❌ 🔍`
- Problems: Inconsistent rendering, markdown alignment issues, terminal font dependencies
**Always:**
- Rich emoji tokens: `:white_check_mark: :cross_mark: :magnifying_glass:`
- Benefits: Cross-platform compatibility, consistent rendering, markdown-safe
**Example:**
```python
from rich.console import Console
console = Console()
# ❌ Wrong - literal Unicode emojis
console.print("✅ Task completed")
console.print("❌ Task failed")
# ✅ Correct - Rich emoji tokens
console.print(":white_check_mark: Task completed")
console.print(":cross_mark: Task failed")
console.print(":sparkles: New feature")
console.print(":rocket: Performance improvement")
```
**Why this matters:**
- Rich emoji tokens work consistently across all terminals and fonts
- Avoid markdown table alignment issues (emoji width calculation problems)
- Enable proper width measurement in `get_rendered_width()`
- Ensure cross-platform compatibility (Windows, Linux, macOS)
**See**: [Rich Emoji Documentation](https://rich.readthedocs.io/en/stable/appendix/colors.html#appendix-emoji) for complete emoji token reference.
### Python Exception Handling
Catch exceptions only when you have a specific recovery action. Let all other errors propagate to the caller.
**Pattern**:
```python
def get_user(id):
return db.query(User, id) # Errors surface naturally
def get_user_with_handling(id):
try:
return db.query(User, id)
except ConnectionError:
logger.warning("DB unavailable, using cache")
return cache.get(f"user:{id}") # Specific recovery action
```
When adding try/except, answer: "What specific error do I expect, and what is my recovery action?"
**See**: [Exception Handling in Python CLI Applications](./references/exception-handling.md) for comprehensive patterns including Typer exception chain prevention.
<section ROLE_TYPE="orchestrator">
## Agent Orchestration (Orchestrator Only)
### Delegation Pattern
The orchestrator must delegate Python development tasks to specialized agents rather than implementing directly.
**The orchestrator must**:
1. Read the complete orchestration guide before delegating tasks
2. Choose appropriate agents based on task requirements
3. Provide clear context: file paths, success criteria, scope boundaries
4. Chain agents for complex workflows (design → implement → test → review)
5. Validate outputs with quality gates
**The orchestrator must NOT**:
- Write Python implementation code directly
- Create tests directly
- Review code directly
- Make technical decisions that agents should determine
### Required Reading
**The orchestrator must read and understand the complete agent selection guide before delegating any Python development task:**
[Python Development Orchestration Guide](./references/python-development-orchestration.md)
This guide contains:
- Agent selection criteria and decision trees
- Workflow patterns (TDD, feature addition, code review, refactoring, debugging)
- Quality gates and validation requirements
- Delegation best practices
### Quick Reference Example
```text
User: "Build a CLI tool to process CSV files"
Orchestrator workflow:
1. Read orchestration guide for agent selection
2. Delegate to @agent-python-cli-architect
"Create CSV processing CLI with Typer+Rich progress bars"
3. Delegate to @agent-python-pytest-architect
"Create test suite for CSV processor"
4. Instruct agent to run: /shebangpython, /modernpython
5. Delegate to @agent-python-code-reviewer
"Review CSV processor implementation"
6. Validate: uv run pre-commit run --all-files && uv run pytest
```
</section>
## Command Usage
### /modernpython
**Purpose**: Comprehensive reference guide for Python 3.11+ patterns with official PEP citations
**When to use**:
- As reference guide when writing new code
- Learning modern Python 3.11-3.14 features and patterns
- Understanding official PEPs (585, 604, 695, etc.)
- Identifying legacy patterns to avoid
- Finding modern alternatives for old code
**Note**: This is a reference document to READ, not an automated validation tool. Use it to guide your implementation choices.
**Usage**:
```text
/modernpython
→ Loads comprehensive reference guide
→ Provides Python 3.11+ pattern examples
→ Includes PEP citations with WebFetch commands
→ Shows legacy patterns to avoid
→ Shows modern alternatives to use
→ Framework-specific guides (Typer, Rich, pytest)
```
**With file path argument**:
```text
/modernpython src/mymodule.py
→ Loads guide for reference while working on specified file
→ Use guide to manually identify and refactor legacy patterns
```
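For flavor, a small before/after sketch of the kind of modernization the guide covers (assuming Python 3.12+ for the PEP 695 syntax; 3.11 code would keep an explicit TypeVar but still use the builtin generics and `|` unions):
```python
# Legacy typing (pre-3.9/3.10 style)
from typing import Dict, List, Optional, TypeVar, Union

T = TypeVar("T")

def first_legacy(items: List[T], default: Optional[T] = None) -> Union[T, None]:
    return items[0] if items else default

# Modern equivalents: PEP 585 builtin generics, PEP 604 unions, PEP 695 type parameters
def first[T](items: list[T], default: T | None = None) -> T | None:
    return items[0] if items else default
```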
### /shebangpython
**Purpose**: Validate correct shebang for ALL Python scripts based on their dependencies and execution context
**When to use**:
- Creating any standalone executable Python script
- Validating script shebang correctness
- Ensuring scripts have proper execution configuration
**Required for**: ALL executable Python scripts (validates shebang matches script type)
**What it validates**:
- **Stdlib-only scripts**: `#!/usr/bin/env python3` (no PEP 723 needed - nothing to declare)
- **Scripts with dependencies**: `#!/usr/bin/env -S uv --quiet run --active --script` + PEP 723 metadata declaring those dependencies
- **Package executables**: `#!/usr/bin/env python3` (dependencies via package manager)
- **Library modules**: No shebang (not directly executable)
**See**: [PEP 723 Reference](./references/PEP723.md) for details on inline script metadata
**Pattern**:
```text
/shebangpython scripts/deploy.py
→ Analyzes imports to determine dependency type
→ **Corrects shebang** to match script type (edits file if wrong)
→ **Adds PEP 723 metadata** if external dependencies detected (edits file)
→ **Removes PEP 723 metadata** if stdlib-only (edits file)
→ Sets execute bit if needed
→ Provides detailed verification report
```
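To make the dependency-bearing case concrete, a hedged sketch of the header `/shebangpython` expects (the dependency pin is illustrative); a stdlib-only script instead gets a plain `#!/usr/bin/env python3` shebang with no PEP 723 block:
```python
#!/usr/bin/env -S uv --quiet run --active --script
# /// script
# requires-python = ">=3.11"
# dependencies = ["typer>=0.12"]
# ///
"""Executable script with external dependencies declared inline (PEP 723)."""
```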
<section ROLE_TYPE="orchestrator">
## Core Workflows (Orchestrator Only)
The orchestrator must follow established workflow patterns for Python development tasks. See [Python Development Orchestration Guide](./references/python-development-orchestration.md) for complete details.
### Workflow Overview
1. **TDD (Test-Driven Development)**: Design → Write Tests → Implement → Review → Validate
2. **Feature Addition**: Requirements → Architecture → Plan → Implement → Test → Review
3. **Code Review**: Self-Review → Standards Check → Agent Review → Fix → Re-validate
4. **Refactoring**: Tests First → Refactor → Validate → Review
5. **Debugging**: Reproduce → Trace → Fix → Test → Review
Each workflow uses agent chaining with specific quality gates. See the orchestration guide for complete patterns, examples, and best practices.
</section>
## Linting Discovery Protocol
**The model MUST execute this discovery sequence before any linting or formatting operations**:
### Discovery Sequence
1. **Check for pre-commit configuration**:
```bash
# Verify .pre-commit-config.yaml exists
test -f .pre-commit-config.yaml && echo "pre-commit detected"
```
**If found**: Use `uv run pre-commit run --files <files>` for ALL quality checks
- This runs the complete toolchain configured in the project
- Includes formatting, linting, type checking, and custom validators
- Matches exactly what runs in CI and blocks merges
2. **Else check CI pipeline configuration**:
```bash
# Check for GitLab CI or GitHub Actions
test -f .gitlab-ci.yml || find .github/workflows -name "*.yml" 2>/dev/null
```
**If found**: Read the CI config to identify required linting tools and their exact commands
- Look for `ruff`, `mypy`, `basedpyright`, `pyright`, `bandit` invocations
- Note the exact commands and flags used
- Execute those specific commands to ensure CI compatibility
3. **Else fallback to tool detection**:
- Check `pyproject.toml` `[project.optional-dependencies]` or `[dependency-groups]` for dev tools
- Use discovered tools with standard configurations
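For example, a hedged one-liner to surface the dev tooling declared in pyproject.toml (the section names are the two listed above; adjust to the project's layout):
```bash
# Print the dev-dependency sections and the lines that follow them
grep -A 20 -E "^\[(dependency-groups|project\.optional-dependencies)\]" pyproject.toml
```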
### Format-First Workflow
**CRITICAL**: The model MUST always format before linting.
**Rationale**: Formatting operations (like `ruff format`) automatically fix many linting issues (whitespace, line length, quote styles). Running linting before formatting wastes context and creates false positives.
**Mandatory sequence**:
1. **Format**: `uv run ruff format <files>` or via pre-commit
2. **Lint**: `uv run ruff check <files>` or via pre-commit
3. **Type check**: Use project-configured type checker
4. **Test**: `uv run pytest`
**When using pre-commit**:
```bash
# Pre-commit runs tools in configured order (formatting first)
uv run pre-commit run --files <files>
```
The `.pre-commit-config.yaml` already specifies correct ordering - trust it.
### Type Checker Discovery
**The model MUST detect which type checker the project uses**:
**Detection priority**:
1. Check `.pre-commit-config.yaml` for `basedpyright`, `pyright`, or `mypy` hooks
2. Check `pyproject.toml` for `[tool.basedpyright]`, `[tool.pyright]`, or `[tool.mypy]` sections
3. Check `.gitlab-ci.yml` or GitHub Actions for type checker invocations
**Common patterns**:
- **basedpyright**: GitLab projects (native GitLab reporting format)
- **pyright**: General TypeScript-style projects
- **mypy**: Python-first type checking
**Example detection**:
```bash
# Check pre-commit config
grep -E "basedpyright|pyright|mypy" .pre-commit-config.yaml
# Check pyproject.toml
grep -E "^\[tool\.(basedpyright|pyright|mypy)\]" pyproject.toml
```
**Never assume** - always detect from project configuration.
## Quality Gates
**The model MUST follow the Linting Discovery Protocol before executing quality gates.**
**Every Python task must pass**:
1. **Format-first**: `uv run ruff format <files>` (or via pre-commit)
2. **Linting**: `uv run ruff check <files>` (clean, after formatting)
3. **Type checking**: Use **detected type checker** (`basedpyright`, `pyright`, or `mypy`)
4. **Tests**: `uv run pytest` (>80% coverage)
5. **Modern patterns**: `/modernpython` (no legacy typing)
6. **Script compliance**: `/shebangpython` (for standalone scripts)
**Preferred execution method**:
```bash
# If .pre-commit-config.yaml exists (runs all checks in correct order):
uv run pre-commit run --files <changed_files>
# Else use individual tools in this exact sequence:
uv run ruff format <files> # 1. Format first
uv run ruff check <files> # 2. Lint after formatting
uv run <detected-type-checker> <files> # 3. Type check (basedpyright/pyright/mypy)
uv run pytest # 4. Test
```
**For critical code** (payments, auth, security):
- Coverage >95%
- Mutation testing: `uv run mutmut run` (>90% score)
- Security scan: `uv run bandit -r packages/`
**CI Compatibility Verification**:
After local quality gates pass, verify CI will accept the changes:
1. If `.gitlab-ci.yml` exists: Check for additional validators not in pre-commit
2. If `.github/workflows/*.yml` exists: Check for additional quality gates
3. Ensure all CI-required checks are executed locally before claiming task completion
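One hedged way to spot CI-only checks quickly (tool names mirror the discovery protocol above; extend the pattern for project-specific validators):
```bash
# Show which quality tools the CI pipelines invoke
grep -nE "ruff|mypy|basedpyright|pyright|bandit|pytest" .gitlab-ci.yml .github/workflows/*.yml 2>/dev/null
```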
## Standard Project Structure
All Python projects MUST use this directory layout:
```text
project-root/
├── pyproject.toml
├── packages/
│ └── package_name/ # Package code (hyphens in project name → underscores)
│ ├── __init__.py
│ └── ...
├── tests/
├── scripts/
├── sessions/ # Optional: cc-sessions framework
└── README.md
```
**Package Directory Naming**:
- Project name: `my-cli-tool` → Package directory: `packages/my_cli_tool/`
- Hyphens in project names become underscores in package directories
- The `packages/` directory distinguishes user code from external dependencies
**Hatchling Configuration**:
```toml
[tool.hatch.build.targets.wheel]
packages = ["packages/package_name"]
```
This structure is consistent across all projects and enables clear separation of concerns.
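For completeness, a minimal hedged build-system sketch that pairs with the wheel target above (hatch-vcs is included because the bundled version.py asset described under Integration relies on it; drop it if the project pins its version statically):
```toml
[build-system]
requires = ["hatchling", "hatch-vcs"]
build-backend = "hatchling.build"

[tool.hatch.version]
source = "vcs"
```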
## Integration
### External Reference Example
**Complete working example** (external): `~/.claude/agents/python-cli-demo.py`
This reference implementation demonstrates all recommended patterns:
- PEP 723 metadata with correct shebang
- Typer + Rich integration
- Modern Python 3.11+ (StrEnum, Protocol, TypeVar, Generics)
- Annotated syntax for CLI params
- Async processing
- Comprehensive docstrings
This file is not bundled with this skill and must be available in `~/.claude/agents/` separately. Use as reference when creating CLI tools.
### Using Asset Templates
When creating new Python projects, the model MUST copy standard configuration files from the skill's assets directory to ensure consistency with established conventions:
**Asset Directory Location**: `~/.claude/skills/python3-development/assets/`
**Available Templates**:
1. **version.py** - Dual-mode version management (hatch-vcs + importlib.metadata fallback)
```bash
cp ~/.claude/skills/python3-development/assets/version.py packages/{package_name}/version.py
```
2. **hatch_build.py** - Build hook for binary/asset handling (only if needed)
```bash
mkdir -p scripts/
cp ~/.claude/skills/python3-development/assets/hatch_build.py scripts/hatch_build.py
```
3. **.markdownlint.json** - Markdown linting configuration
```bash
cp ~/.claude/skills/python3-development/assets/.markdownlint.json .
```
4. **.pre-commit-config.yaml** - Standard pre-commit hooks
```bash
cp ~/.claude/skills/python3-development/assets/.pre-commit-config.yaml .
uv run pre-commit install
```
5. **.editorconfig** - Editor formatting settings
```bash
cp ~/.claude/skills/python3-development/assets/.editorconfig .
```
These templates implement the patterns documented in [User Project Conventions](./references/user-project-conventions.md) and ensure all projects follow the same standards for version management, linting, formatting, and build configuration.
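As orientation only, a hedged sketch of the dual-mode pattern the version.py template implements (this is not the bundled asset; the distribution name is hypothetical):
```python
"""Dual-mode version resolution sketch (lives inside the package)."""
from importlib.metadata import PackageNotFoundError, version

try:
    # Preferred: version file written by hatch-vcs at build time
    from ._version import __version__  # type: ignore[import-not-found]
except ImportError:
    try:
        # Fallback: resolve the installed distribution's metadata
        __version__ = version("my-cli-tool")  # hypothetical distribution name
    except PackageNotFoundError:
        __version__ = "0.0.0+unknown"
```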
<section ROLE_TYPE="orchestrator">
## Common Anti-Patterns (Orchestrator Only)
- ❌ **Don't**: Write Python code as orchestrator → **Do**: Delegate to agents
- ❌ **Don't**: Skip validation steps → **Do**: Always complete the workflow (implement → test → review → validate)
- ❌ **Don't**: Pre-decide technical implementations → **Do**: Let agents determine HOW based on requirements
For detailed anti-pattern examples and corrections, see [Anti-Patterns section](./references/python-development-orchestration.md#anti-patterns-to-avoid) in the orchestration guide.
</section>
## Detailed Documentation
### Reference Documentation
**Core Orchestration Guide**: [Python Development Orchestration](./references/python-development-orchestration.md) - Detailed workflow patterns for TDD, feature addition, refactoring, and code review with comprehensive agent coordination strategies
**PEP 723 Specification**: [PEP 723 - Inline Script Metadata](./references/PEP723.md) - User-friendly guide to PEP 723 inline script metadata with examples and migration patterns
**Exception Handling**: [Exception Handling in Python CLI Applications with Typer](./references/exception-handling.md) - Critical guidance on preventing exception chain explosion in Typer applications with correct patterns for graceful error handling
**Typer and Rich Examples**: [Typer and Rich CLI Examples](./assets/typer_examples/index.md) - Executable examples demonstrating solutions to common problems with Rich Console text wrapping in CI/non-TTY environments and Panel/Table content wrapping
**Module Reference**: [Modern Python Modules](./references/modern-modules.md) - Comprehensive guide to 50+ modern Python libraries with deep-dive documentation for each module including usage patterns and best practices
**Tool Registry**: [Tool & Library Registry](./references/tool-library-registry.md) - Catalog of development tools, their purposes, and usage patterns for linting, testing, and build automation
**API Documentation**: [API Reference](./references/api_reference.md) - API specifications, integration guides, and programmatic interface documentation
#### Navigating Large References
To find specific modules in modern-modules.md:
```bash
grep -i "^### " references/modern-modules.md
```
To search for tools by category in tool-library-registry.md:
```bash
grep -A 5 "^## " references/tool-library-registry.md
```
To locate workflow patterns in python-development-orchestration.md:
```bash
grep -i "^## " references/python-development-orchestration.md
```
### External Commands
These slash commands are external dependencies installed in `~/.claude/commands/`:
- [/modernpython](~/.claude/commands/modernpython.md) - Python 3.11+ patterns and PEP references
- [/shebangpython](~/.claude/commands/shebangpython.md) - PEP 723 validation and shebang standards
## Summary
### Python Development Skill for All Roles
**For All Roles (Orchestrators and Agents)**:
- Modern Python 3.11+ standards and patterns
- Quality gates: ruff, pyright, mypy, pytest (>80% coverage)
- Command standards: /modernpython, /shebangpython
- Reference documentation for 50+ modern Python modules
- Tool and library registry
<section ROLE_TYPE="orchestrator">
**For Orchestrators Only**:
1. Read the [orchestration guide](./references/python-development-orchestration.md) before delegating
2. Choose the right agent based on task requirements
3. Provide clear context: file paths, success criteria, scope boundaries
4. Chain agents for complex workflows (design → test → implement → review)
5. Instruct agents to validate with quality gates and commands
6. Enable uv skill for package management
**Orchestration = Coordination + Delegation + Validation**
</section>


@@ -0,0 +1,47 @@
# EditorConfig: https://editorconfig.org/
# top-most EditorConfig file
root = true
# All (Defaults)
[*]
charset = utf-8
end_of_line = lf
insert_final_newline = true
trim_trailing_whitespace = true
max_line_length = 120
# Markdown
[*.md]
indent_style = space
indent_size = 4
trim_trailing_whitespace = false
# Python
[*.py]
indent_style = space
indent_size = 4
# YAML
[*.{yml,yaml}]
indent_style = space
indent_size = 2
# Shell Script
[*.sh]
indent_style = space
indent_size = 4
# TOML
[*.toml]
indent_style = space
indent_size = 2
# JSON
[*.json]
indent_style = space
indent_size = 2
# Git commit messages
[COMMIT_EDITMSG]
max_line_length = 72


@@ -0,0 +1,38 @@
{
"MD003": false,
"MD007": { "indent": 2 },
"MD001": false,
"MD022": false,
"MD024": false,
"MD013": false,
"MD036": false,
"MD025": false,
"MD031": false,
"MD041": false,
"MD029": false,
"MD033": false,
"MD046": false,
"blanks-around-fences": false,
"blanks-around-headings": false,
"blanks-around-lists": false,
"code-fence-style": false,
"emphasis-style": false,
"heading-start-left": false,
"heading-style": false,
"hr-style": false,
"line-length": false,
"list-indent": false,
"list-marker-space": false,
"no-blanks-blockquote": false,
"no-hard-tabs": false,
"no-missing-space-atx": false,
"no-missing-space-closed-atx": false,
"no-multiple-blanks": false,
"no-multiple-space-atx": false,
"no-multiple-space-blockquote": false,
"no-multiple-space-closed-atx": false,
"no-trailing-spaces": false,
"ol-prefix": false,
"strong-style": false,
"ul-indent": false
}


@@ -0,0 +1,109 @@
# Pre-commit configuration template
repos:
- repo: https://github.com/mxr/sync-pre-commit-deps
rev: v0.0.3
hooks:
- id: sync-pre-commit-deps
# Standard pre-commit hooks for general file maintenance
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v6.0.0
hooks:
- id: trailing-whitespace
exclude: \.lock$
- id: end-of-file-fixer
exclude: \.lock$
- id: check-yaml
- id: check-json
- id: check-toml
- id: check-added-large-files
args: ["--maxkb=10000"] # 10MB limit
- id: check-case-conflict
- id: check-merge-conflict
- id: check-symlinks
- id: mixed-line-ending
args: ["--fix=lf"]
- id: check-executables-have-shebangs
- id: check-shebang-scripts-are-executable
# Python formatting and linting
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.13.3
hooks:
- id: ruff
name: Lint Python with ruff
args: [--fix, --exit-non-zero-on-fix]
- id: ruff-format
name: Format Python with ruff
# Shell script linting
- repo: https://github.com/shellcheck-py/shellcheck-py
rev: v0.11.0.1
hooks:
- id: shellcheck
name: Check shell scripts with shellcheck
files: \.(sh|bash)$
args: [-x, --severity=warning]
# YAML/JSON formatting
- repo: https://github.com/pre-commit/mirrors-prettier
rev: v4.0.0-alpha.8
hooks:
- id: prettier
name: Format YAML, JSON, and Markdown files
types_or: [yaml, json, markdown]
exclude: \.lock$
# Shell formatting
- repo: https://github.com/pecigonzalo/pre-commit-shfmt
rev: v2.2.0
hooks:
- id: shell-fmt-go
args:
- "--apply-ignore"
- -w
- -i
- "4"
- -ci
# Local hooks for type checking
- repo: local
hooks:
- id: install-pep723-deps
name: Install PEP 723 script dependencies
entry: bash -c 'for file in "$@"; do if head -20 "$file" | grep -q "# /// script"; then uv export --script "$file" | uv pip install --quiet -r -; fi; done' --
language: system
types: [python]
pass_filenames: true
- id: mypy
name: mypy
entry: uv run mypy
language: system
types: [python]
pass_filenames: true
- id: pyright
name: basedpyright
entry: uv run basedpyright
language: system
types: [python]
pass_filenames: true
require_serial: true
# Configuration for specific hooks
default_language_version:
python: python3
# Exclude patterns
exclude: |
(?x)^(
\.git/|
\.venv/|
__pycache__/|
\.mypy_cache/|
\.cache/|
\.pytest_cache/|
\.lock$|
typings/
)


@@ -0,0 +1,119 @@
"""Custom hatchling build hook for binary compilation.
This hook runs before the build process to compile platform-specific binaries
if build scripts are present in the project.
"""
from __future__ import annotations
import shutil
import subprocess # nosec B404 - subprocess required for build script execution, all calls use list form (not shell=True)
from pathlib import Path
from typing import Any
from hatchling.builders.hooks.plugin.interface import BuildHookInterface
class BinaryBuildHook(BuildHookInterface[Any]):
"""Build hook that runs binary compilation scripts before packaging.
This hook checks for the following scripts in order:
1. scripts/build-binaries.sh
2. scripts/build-binaries.py
If either script exists, it is executed before the build process.
If neither exists, the hook silently continues without error.
"""
PLUGIN_NAME = "binary-build"
def initialize(self, version: str, build_data: dict[str, Any]) -> None:
"""Run binary build scripts if they exist.
This method is called immediately before each build. It checks for
build scripts and executes them if found.
Args:
version: The version string for this build
build_data: Build configuration dictionary that will be passed to the build target
"""
# Check for shell script first
shell_script = Path(self.root) / "scripts" / "build-binaries.sh"
if shell_script.exists() and shell_script.is_file():
self._run_shell_script(shell_script)
return
# Fallback to Python script
python_script = Path(self.root) / "scripts" / "build-binaries.py"
if python_script.exists() and python_script.is_file():
self._run_python_script(python_script)
return
# No scripts found - silently continue
self.app.display_info("No binary build scripts found, skipping binary compilation")
def _run_shell_script(self, script_path: Path) -> None:
"""Execute a shell script for binary building.
Args:
script_path: Path to the shell script to execute
Raises:
subprocess.CalledProcessError: If the script exits with non-zero status
"""
self.app.display_info(f"Running binary build script: {script_path}")
# Get full path to bash executable for security (B607)
bash_path = shutil.which("bash")
if not bash_path:
msg = "bash executable not found in PATH"
raise RuntimeError(msg)
try:
result = subprocess.run( # nosec B603 - using command list with full path, not shell=True
[bash_path, str(script_path)], cwd=self.root, capture_output=True, text=True, check=True
)
if result.stdout:
self.app.display_info(result.stdout)
if result.stderr:
self.app.display_warning(result.stderr)
except subprocess.CalledProcessError as e:
self.app.display_error(f"Binary build script failed with exit code {e.returncode}")
if e.stdout:
self.app.display_info(f"stdout: {e.stdout}")
if e.stderr:
self.app.display_error(f"stderr: {e.stderr}")
raise
def _run_python_script(self, script_path: Path) -> None:
"""Execute a Python script for binary building.
Args:
script_path: Path to the Python script to execute
Raises:
subprocess.CalledProcessError: If the script exits with non-zero status
"""
self.app.display_info(f"Running binary build script: {script_path}")
# Get full path to python3 executable for security (B607)
python_path = shutil.which("python3")
if not python_path:
msg = "python3 executable not found in PATH"
raise RuntimeError(msg)
try:
result = subprocess.run( # nosec B603 - using command list with full path, not shell=True
[python_path, str(script_path)], cwd=self.root, capture_output=True, text=True, check=True
)
if result.stdout:
self.app.display_info(result.stdout)
if result.stderr:
self.app.display_warning(result.stderr)
except subprocess.CalledProcessError as e:
self.app.display_error(f"Binary build script failed with exit code {e.returncode}")
if e.stdout:
self.app.display_info(f"stdout: {e.stdout}")
if e.stderr:
self.app.display_error(f"stderr: {e.stderr}")
raise

View File

@@ -0,0 +1 @@
broken.json


@@ -0,0 +1,52 @@
# Exception Chain Explosion Demonstration Scripts
This directory contains executable demonstration scripts showing the anti-pattern of exception chain explosion in Typer CLI applications and the correct patterns to prevent it.
## Quick Reference
| Script | Output | Purpose |
| -------------------------------------------------------------------------------------------------------------------------- | ---------- | ---------------------------------------------------------- |
| [nested-typer-exception-explosion.py](./nested-typer-exception-explosion.py) | ~220 lines | Shows the anti-pattern with 7 layers of exception wrapping |
| [nested-typer-exception-explosion_naive_workaround.py](./nested-typer-exception-explosion_naive_workaround.py) | ~80 lines | Shows the isinstance band-aid workaround |
| [nested-typer-exception-explosion_corrected_typer_echo.py](./nested-typer-exception-explosion_corrected_typer_echo.py) | 1 line | Correct pattern using typer.echo |
| [nested-typer-exception-explosion_corrected_rich_console.py](./nested-typer-exception-explosion_corrected_rich_console.py) | 1 line | Correct pattern using Rich Console |
## Running the Scripts
All scripts use PEP 723 inline script metadata and can be run directly:
```bash
# Run directly (uv handles dependencies automatically)
./nested-typer-exception-explosion.py broken.json
# Or explicitly with uv
uv run nested-typer-exception-explosion.py broken.json
```
The scripts will create `broken.json` with invalid content if it doesn't exist.
## The Problem
**Anti-pattern:** Every function catches and re-wraps exceptions → 220 lines of traceback for "file not found"
**Correct pattern:** Let exceptions bubble naturally, only catch at specific points → 1 line of clean output
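For a taste of the corrected approach before opening the full scripts (a minimal hedged sketch; `AppExit` mirrors the `typer.Exit` subclass used in the corrected examples):
```python
from pathlib import Path

import typer


class AppExit(typer.Exit):
    """Print one clean message, then exit - no wrapping, no traceback."""

    def __init__(self, code: int = 1, message: str | None = None) -> None:
        if message is not None:
            typer.echo(message, err=code != 0)
        super().__init__(code=code)


def load_config(path: Path) -> str:
    try:
        return path.read_text(encoding="utf-8")
    except FileNotFoundError as e:
        # The only catch point: we know the recovery action (tell the user, exit 1)
        raise AppExit(code=1, message=f"Config not found: {path}") from e
```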
## Documentation
For detailed explanations, code patterns, and best practices, see:
**[Exception Handling in Python CLI Applications with Typer](../../references/exception-handling.md)**
This comprehensive guide includes:
- Complete explanation of the exception chain explosion problem
- Correct patterns using `typer.Exit` subclasses
- When to catch exceptions and when to let them bubble
- Full code examples with detailed annotations
- DO/DON'T guidelines
## External References
- [Typer Terminating Documentation](https://github.com/fastapi/typer/blob/master/docs/tutorial/terminating.md)
- [Typer Exceptions Documentation](https://github.com/fastapi/typer/blob/master/docs/tutorial/exceptions.md)
- [Typer Printing Documentation](https://github.com/fastapi/typer/blob/master/docs/tutorial/printing.md)


@@ -0,0 +1,188 @@
#!/usr/bin/env -S uv run --quiet --script
# /// script
# requires-python = ">=3.11"
# dependencies = ["typer>=0.19.2"]
# ///
# ruff: noqa: TRY300, TRY301
# mypy: ignore-errors
"""Demonstration of exception chain explosion anti-pattern.
This script demonstrates how catching and re-wrapping exceptions at every
layer creates massive traceback output (8+ pages) for simple errors.
Based on real AI-generated code patterns that destroy terminal UI.
Run this to see the problem:
./nested-typer-exception-explosion.py broken.json
The 'broken.json' file will be created with invalid JSON content.
"""
from __future__ import annotations
import json
import sys
from pathlib import Path
from typing import Annotated
import typer # pyright: ignore[reportMissingImports]
app = typer.Typer()
class ConfigError(Exception):
"""Custom exception for configuration errors."""
# LAYER 1: Low-level file reading
def read_file_contents(file_path: Path) -> str:
"""Read file contents - ANTI-PATTERN: Wraps exceptions unnecessarily.
Raises:
ConfigError: If file cannot be read (wrapped)
"""
try:
return file_path.read_text(encoding="utf-8")
except FileNotFoundError as e:
raise ConfigError(f"File not found: {file_path}") from e
except PermissionError as e:
raise ConfigError(f"Permission denied: {file_path}") from e
except Exception as e:
# ANTI-PATTERN: Safety net catches the ConfigError we just raised!
raise ConfigError(f"Failed to read {file_path}: {e}") from e
# LAYER 2: JSON parsing
def parse_json_string(content: str, source: str) -> dict:
"""Parse JSON string - ANTI-PATTERN: Another wrapping layer.
Raises:
ConfigError: If JSON cannot be parsed (wrapped again)
"""
try:
return json.loads(content)
except json.JSONDecodeError as e:
raise ConfigError(f"Invalid JSON in {source} at line {e.lineno}, column {e.colno}: {e.msg}") from e
except Exception as e:
# ANTI-PATTERN: Safety net catches ConfigError we just raised
raise ConfigError(f"JSON parse error in {source}: {e}") from e
# LAYER 3: Load JSON from file
def load_json_file(file_path: Path) -> dict:
"""Load JSON from file - ANTI-PATTERN: Yet another wrapping layer.
Raises:
ConfigError: If file cannot be loaded (wrapped third time)
"""
try:
contents = read_file_contents(file_path)
data = parse_json_string(contents, str(file_path))
return data
except ConfigError as e:
# ANTI-PATTERN: Wrap the already-wrapped exception AGAIN
raise ConfigError(f"Failed to load JSON from {file_path}: {e}") from e
except Exception as e:
# ANTI-PATTERN: Safety net catches ConfigError we just raised
raise ConfigError(f"Unexpected error loading {file_path}: {e}") from e
# LAYER 4: Validate config structure
def validate_config_structure(data: object, source: str) -> dict:
"""Validate config structure - ANTI-PATTERN: More wrapping.
Raises:
ConfigError: If validation fails (wrapped fourth time)
"""
try:
if not isinstance(data, dict):
raise TypeError("Config must be a JSON object")
if not data:
raise ValueError("Config cannot be empty")
return data
except (TypeError, ValueError) as e:
raise ConfigError(f"Invalid config structure in {source}: {e}") from e
except Exception as e:
# ANTI-PATTERN: Safety net catches ConfigError we just raised
raise ConfigError(f"Config validation error in {source}: {e}") from e
# LAYER 5: Load and validate config
def load_config(file_path: Path) -> dict:
"""Load and validate config - ANTI-PATTERN: Fifth wrapping layer.
Raises:
ConfigError: If config cannot be loaded (wrapped fifth time)
"""
try:
data = load_json_file(file_path)
validated = validate_config_structure(data, str(file_path))
return validated
except ConfigError as e:
# ANTI-PATTERN: Wrap the already-quadruple-wrapped exception
raise ConfigError(f"Configuration loading failed: {e}") from e
except Exception as e:
# ANTI-PATTERN: Safety net catches ConfigError we just raised
raise ConfigError(f"Unexpected configuration error: {e}") from e
# LAYER 6: Process config
def process_config(file_path: Path) -> None:
"""Process configuration - ANTI-PATTERN: Sixth wrapping layer.
Raises:
ConfigError: If processing fails (wrapped sixth time)
"""
try:
config = load_config(file_path)
typer.echo(f"Successfully loaded config: {config}")
except ConfigError as e:
# ANTI-PATTERN: Wrap the already-quintuple-wrapped exception
raise ConfigError(f"Failed to process configuration: {e}") from e
except Exception as e:
# ANTI-PATTERN: Safety net catches ConfigError we just raised
raise ConfigError(f"Processing error: {e}") from e
# LAYER 7: CLI entry point
@app.command()
def main(
config_file: Annotated[Path, typer.Argument(help="Path to JSON configuration file")] = Path("broken.json"),
) -> None:
"""Load and process a JSON configuration file.
This demonstrates the ANTI-PATTERN of exception chain explosion.
When an error occurs, you'll see a massive exception chain through 7+ layers.
Example:
# Create a broken JSON file
echo "i'm broken" > broken.json
# Run the script to see exception explosion
./nested-typer-exception-explosion.py broken.json
"""
# DON'T catch here - let the exception chain explode through all layers
# This shows the full horror of the nested wrapping pattern
process_config(config_file)
@app.command()
def create_test_file() -> None:
"""Create a broken JSON file for testing the exception explosion."""
broken_file = Path("broken.json")
broken_file.write_text("i'm broken")
typer.echo(f"Created {broken_file} with invalid JSON content")
if __name__ == "__main__":
# Auto-create broken.json if it doesn't exist and is being used
if len(sys.argv) > 1:
arg = sys.argv[-1]
if arg == "broken.json" or (not arg.startswith("-") and Path(arg).name == "broken.json"):
broken_file = Path("broken.json")
if not broken_file.exists():
typer.echo("Creating broken.json for demonstration...")
broken_file.write_text("i'm broken")
typer.echo()
app()


@@ -0,0 +1,185 @@
#!/usr/bin/env -S uv --quiet run --active --script
# /// script
# requires-python = ">=3.11"
# dependencies = ["typer>=0.19.2"]
# ///
"""Demonstration of exception chain explosion anti-pattern corrected using rich console.
This shows how to resolve the issue with nested-typer-exception-explosion.py
by using rich console to print errors consistently with the CLI UX.
Run this to see the corrected, single-line output:
./nested-typer-exception-explosion_corrected_rich_console.py broken.json
Create 'broken.json' first if needed (the anti-pattern demo script creates it with invalid JSON content).
"""
# mypy: ignore-errors
from __future__ import annotations
import json
from pathlib import Path
from typing import Annotated, Any
try:
import typer # pyright: ignore[reportMissingImports]
from rich.console import Console # pyright: ignore[reportMissingImports]
except ImportError as e:
error_message = f"""
This script needs to be run using a PEP723 compliant executor like uv
which can handle finding and installing dependencies automatically,
unlike python or python3 which require you to manually install the dependencies.
What is inline-metadata? > https://packaging.python.org/en/latest/specifications/inline-script-metadata/#inline-script-metadata
What is PEP723? > https://peps.python.org/pep-0723/
How to do this yourself? > https://docs.astral.sh/uv/guides/scripts/
If you have uv on this system, then this script can be run without prefixing any application.
example: ./thisscript.py <arguments>
You can explicitly invoke it with uv:
example: uv run ./thisscript.py <arguments>
If you do not have uv installed, then you can install it following the instructions at:
https://docs.astral.sh/uv/getting-started/installation/
If that is TL;DR, then you can install it with the following command:
curl -fsSL https://astral.sh/uv/install.sh | bash
The longform way to run scripts with inline dependencies is to install the dependencies manually
and run the script with python or python3.
example:
python3 -m venv .venv
source .venv/bin/activate
pip install typer
python3 thisscript.py <arguments>
ImportException: {e!s}
"""
raise ImportError(error_message) from None
normal_console = Console()
err_console = Console(stderr=True)
app = typer.Typer()
DEFAULT_CONFIG_FILE = Path("broken.json")
class AppExitRich(typer.Exit):
"""Exception class for application exits using rich console"""
def __init__(self, code: int | None = None, message: str | None = None, console: Console = normal_console):
"""Custom exception using console based formatting to keep errors consistent with the CLI UX"""
self.code = code
self.message = message
if message is not None:
console.print(self.message, crop=False, overflow="ignore")
super().__init__(code=code)
class ConfigError(Exception):
"""Custom exception for errors that will be handled internally"""
# LAYER 1: Low-level file reading
def read_file_contents(file_path: Path) -> str:
"""Read file contents - ANTI-PATTERN: Wraps exceptions unnecessarily.
Raises:
FileNotFoundError: If file does not exist
PermissionError: If file is not readable
"""
return file_path.read_text(encoding="utf-8")
# LAYER 2: JSON parsing
def parse_json_string(content: str, source: str) -> dict:
"""Parse JSON string - ANTI-PATTERN: Another wrapping layer.
Bubbles up:
json.JSONDecodeError: If JSON is not valid
"""
return json.loads(content)
# LAYER 3: Load JSON from file
def load_json_file(file_path: Path) -> dict:
"""Load JSON from file - ANTI-PATTERN: Yet another wrapping layer.
Bubbles up:
FileNotFoundError: If file does not exist
PermissionError: If file is not readable
json.JSONDecodeError: If file is not valid JSON
"""
contents = read_file_contents(file_path)
try:
return parse_json_string(contents, str(file_path))
except json.JSONDecodeError as e:
raise AppExitRich(code=1, message=f"Invalid JSON in {file_path!s} at line {e.lineno}, column {e.colno}: {e.msg}") from e
# LAYER 4: Validate config structure
def validate_config_structure(data: Any, source: str) -> dict:
"""Validate config structure - ANTI-PATTERN: More wrapping.
Raises:
TypeError: If config is not a JSON object
ValueError: If config is empty
"""
if not data:
raise AppExitRich(code=1, message="Config cannot be empty", console=err_console)
if not isinstance(data, dict):
raise AppExitRich(code=1, message=f"Config must be a JSON object, got {type(data)}", console=err_console)
return data
# LAYER 5: Load and validate config (consolidate exception handling)
def load_config(file_path: Path) -> dict:
"""Load and validate config - ANTI-PATTERN: Fifth wrapping layer.
Raises:
ConfigError: If config cannot be loaded, invalid structure, or empty
"""
try:
data = load_json_file(file_path)
except (FileNotFoundError, PermissionError) as e:
raise AppExitRich(code=1, message=f"Failed to load config from {file_path}", console=err_console) from e
else:
return validate_config_structure(data, str(file_path))
# LAYER 6: Process config
def process_config(file_path: Path) -> dict:
"""Process configuration - ANTI-PATTERN: Sixth wrapping layer.
Bubbles up:
ConfigError: If processing fails (wrapped sixth time)
"""
config = load_config(file_path)
normal_console.print("Config successfully loaded")
return config
# LAYER 7: CLI entry point
@app.command()
def main(
config_file: Annotated[Path | None, typer.Argument(help="Path to JSON configuration file")] = None,
) -> None:
"""Load and process a JSON configuration file.
    This demonstrates the CORRECTED pattern: exceptions bubble naturally and are
    converted to a single clean error line where a decision can actually be made.
    Example:
        # Create a broken JSON file
        echo "i'm broken" > broken.json
        # Run the script to see the clean, single-line error output
        ./nested-typer-exception-explosion_corrected_rich_console.py broken.json
"""
normal_console.print("Starting script")
if config_file is None:
normal_console.print(f"No config file provided, using default: {DEFAULT_CONFIG_FILE!s}")
config_file = DEFAULT_CONFIG_FILE
process_config(config_file)
if __name__ == "__main__":
app()


@@ -0,0 +1,185 @@
#!/usr/bin/env -S uv --quiet run --active --script
# /// script
# requires-python = ">=3.11"
# dependencies = ["typer>=0.19.2"]
# ///
"""Demonstration of exception chain explosion anti-pattern corrected using typer.echo.
This shows how to resolve the issue with nested-typer-exception-explosion.py
by using typer.echo to print errors consistently with the CLI UX.
Run this to see the corrected, single-line output:
./nested-typer-exception-explosion_corrected_typer_echo.py broken.json
Create 'broken.json' first if needed (the anti-pattern demo script creates it with invalid JSON content).
"""
# mypy: ignore-errors
from __future__ import annotations
import json
from pathlib import Path
from typing import Annotated, Any
try:
import typer # pyright: ignore[reportMissingImports]
except ImportError as e:
error_message = f"""
This script needs to be run using a PEP723 compliant executor like uv
which can handle finding and installing dependencies automatically,
unlike python or python3 which require you to manually install the dependencies.
What is inline-metadata? > https://packaging.python.org/en/latest/specifications/inline-script-metadata/#inline-script-metadata
What is PEP723? > https://peps.python.org/pep-0723/
How to do this yourself? > https://docs.astral.sh/uv/guides/scripts/
If you have uv on this system, then this script can be run without prefixing any application.
example: ./thisscript.py <arguments>
You can explicitly invoke it with uv:
example: uv run ./thisscript.py <arguments>
If you do not have uv installed, then you can install it following the instructions at:
https://docs.astral.sh/uv/getting-started/installation/
If that is TL;DR, then you can install it with the following command:
curl -fsSL https://astral.sh/uv/install.sh | bash
The longform way to run scripts with inline dependencies is to install the dependencies manually
and run the script with python or python3.
example:
python3 -m venv .venv
source .venv/bin/activate
pip install typer
python3 thisscript.py <arguments>
ImportException: {e!s}
"""
raise ImportError(error_message) from None
app = typer.Typer()
DEFAULT_CONFIG_FILE = Path("broken.json")
class AppExit(typer.Exit):
"""Exception class for application exits using typer"""
def __init__(self, code: int | None = None, message: str | None = None):
"""Custom exception for using typer.echo"""
self.code = code
self.message = message
if message is not None:
if code is None or code == 0:
typer.echo(self.message)
else:
typer.echo(self.message, err=True)
super().__init__(code=code)
class ConfigError(Exception):
"""Custom exception for errors that will be handled internally"""
# LAYER 1: Low-level file reading
def read_file_contents(file_path: Path) -> str:
"""Read file contents - ANTI-PATTERN: Wraps exceptions unnecessarily.
Raises:
FileNotFoundError: If file does not exist
PermissionError: If file is not readable
"""
return file_path.read_text(encoding="utf-8")
# LAYER 2: JSON parsing
def parse_json_string(content: str, source: str) -> dict:
"""Parse JSON string - ANTI-PATTERN: Another wrapping layer.
Bubbles up:
json.JSONDecodeError: If JSON is not valid
"""
return json.loads(content)
# LAYER 3: Load JSON from file
def load_json_file(file_path: Path) -> dict:
"""Load JSON from file - ANTI-PATTERN: Yet another wrapping layer.
Bubbles up:
FileNotFoundError: If file does not exist
PermissionError: If file is not readable
json.JSONDecodeError: If file is not valid JSON
"""
contents = read_file_contents(file_path)
try:
return parse_json_string(contents, str(file_path))
except json.JSONDecodeError as e:
raise AppExit(code=1, message=f"Invalid JSON in {file_path!s} at line {e.lineno}, column {e.colno}: {e.msg}") from e
# LAYER 4: Validate config structure
def validate_config_structure(data: Any, source: str) -> dict:
"""Validate config structure - ANTI-PATTERN: More wrapping.
Raises:
TypeError: If config is not a JSON object
ValueError: If config is empty
"""
if not data:
raise AppExit(code=1, message="Config cannot be empty")
if not isinstance(data, dict):
raise AppExit(code=1, message=f"Config must be a JSON object, got {type(data)}")
return data
# LAYER 5: Load and validate config (consolidate exception handling)
def load_config(file_path: Path) -> dict:
"""Load and validate config - ANTI-PATTERN: Fifth wrapping layer.
Raises:
ConfigError: If config cannot be loaded, invalid structure, or empty
"""
try:
data = load_json_file(file_path)
except (FileNotFoundError, PermissionError) as e:
raise AppExit(code=1, message=f"Failed to load config from {file_path}") from e
else:
return validate_config_structure(data, str(file_path))
# LAYER 6: Process config
def process_config(file_path: Path) -> dict:
"""Process configuration - ANTI-PATTERN: Sixth wrapping layer.
Bubbles up:
ConfigError: If processing fails (wrapped sixth time)
"""
config = load_config(file_path)
typer.echo("Config successfully loaded")
return config
# LAYER 7: CLI entry point
@app.command()
def main(
config_file: Annotated[Path | None, typer.Argument(help="Path to JSON configuration file")] = None,
) -> None:
"""Load and process a JSON configuration file.
    This demonstrates the CORRECTED pattern: exceptions bubble naturally and are
    converted to a single clean error line where a decision can actually be made.
    Example:
        # Create a broken JSON file
        echo "i'm broken" > broken.json
        # Run the script to see the clean, single-line error output
        ./nested-typer-exception-explosion_corrected_typer_echo.py broken.json
"""
typer.echo("Starting script")
if config_file is None:
typer.echo(f"No config file provided, using default: {DEFAULT_CONFIG_FILE!s}")
config_file = DEFAULT_CONFIG_FILE
process_config(config_file)
if __name__ == "__main__":
app()


@@ -0,0 +1,207 @@
#!/usr/bin/env -S uv run --quiet --script
# /// script
# requires-python = ">=3.11"
# dependencies = ["typer>=0.19.2"]
# ///
# ruff: noqa: TRY300, TRY301
# mypy: ignore-errors
"""Demonstration of the "naive workaround" to exception chain explosion.
This script shows the isinstance() check pattern that AI generates to avoid
double-wrapping exceptions. This is a BAND-AID on the real problem.
The workaround makes output cleaner (no massive chains) but:
- Adds complexity and cognitive load
- Treats the symptom, not the root cause
- Still has nested exception handling everywhere
- The code "knows" it's doing something wrong
The CORRECT solution: Don't catch and wrap at every layer in the first place.
Run this to see the "workaround" output:
./nested-typer-exception-explosion_naive_workaround.py broken.json
Compare to nested-typer-exception-explosion.py to see the difference.
"""
from __future__ import annotations
import json
import sys
from pathlib import Path
from typing import Annotated
import typer # pyright: ignore[reportMissingImports]
app = typer.Typer()
class ConfigError(Exception):
"""Custom exception for configuration errors."""
# LAYER 1: Low-level file reading
def read_file_contents(file_path: Path) -> str:
"""Read file contents - ANTI-PATTERN: Wraps exceptions unnecessarily.
Raises:
ConfigError: If file cannot be read (wrapped)
"""
try:
return file_path.read_text(encoding="utf-8")
except FileNotFoundError as e:
raise ConfigError(f"File not found: {file_path}") from e
except PermissionError as e:
raise ConfigError(f"Permission denied: {file_path}") from e
except Exception as e:
# NAIVE WORKAROUND: Check isinstance to avoid double-wrapping
# This is treating the symptom, not fixing the root cause!
if isinstance(e, ConfigError):
raise # Re-raise without wrapping
raise ConfigError(f"Failed to read {file_path}: {e}") from e
# LAYER 2: JSON parsing
def parse_json_string(content: str, source: str) -> dict:
"""Parse JSON string - ANTI-PATTERN: Another wrapping layer.
Raises:
ConfigError: If JSON cannot be parsed (wrapped again)
"""
try:
return json.loads(content)
except json.JSONDecodeError as e:
raise ConfigError(f"Invalid JSON in {source} at line {e.lineno}, column {e.colno}: {e.msg}") from e
except Exception as e:
# NAIVE WORKAROUND: Check isinstance to avoid double-wrapping
if isinstance(e, ConfigError):
raise
raise ConfigError(f"JSON parse error in {source}: {e}") from e
# LAYER 3: Load JSON from file
def load_json_file(file_path: Path) -> dict:
"""Load JSON from file - ANTI-PATTERN: Yet another wrapping layer.
Raises:
ConfigError: If file cannot be loaded (wrapped third time)
"""
try:
contents = read_file_contents(file_path)
data = parse_json_string(contents, str(file_path))
return data
except ConfigError as e:
# ANTI-PATTERN: Wrap the already-wrapped exception AGAIN
raise ConfigError(f"Failed to load JSON from {file_path}: {e}") from e
except Exception as e:
# NAIVE WORKAROUND: Check isinstance to avoid double-wrapping
if isinstance(e, ConfigError):
raise
raise ConfigError(f"Unexpected error loading {file_path}: {e}") from e
# LAYER 4: Validate config structure
def validate_config_structure(data: dict, source: str) -> dict:
"""Validate config structure - ANTI-PATTERN: More wrapping.
Raises:
ConfigError: If validation fails (wrapped fourth time)
"""
try:
if not isinstance(data, dict):
raise TypeError("Config must be a JSON object")
if not data:
raise ValueError("Config cannot be empty")
return data
except (TypeError, ValueError) as e:
raise ConfigError(f"Invalid config structure in {source}: {e}") from e
except Exception as e:
# NAIVE WORKAROUND: Check isinstance to avoid double-wrapping
if isinstance(e, ConfigError):
raise
raise ConfigError(f"Config validation error in {source}: {e}") from e
# LAYER 5: Load and validate config
def load_config(file_path: Path) -> dict:
"""Load and validate config - ANTI-PATTERN: Fifth wrapping layer.
Raises:
ConfigError: If config cannot be loaded (wrapped fifth time)
"""
try:
data = load_json_file(file_path)
validated = validate_config_structure(data, str(file_path))
return validated
except ConfigError as e:
# ANTI-PATTERN: Wrap the already-quadruple-wrapped exception
raise ConfigError(f"Configuration loading failed: {e}") from e
except Exception as e:
# NAIVE WORKAROUND: Check isinstance to avoid double-wrapping
if isinstance(e, ConfigError):
raise
raise ConfigError(f"Unexpected configuration error: {e}") from e
# LAYER 6: Process config
def process_config(file_path: Path) -> None:
"""Process configuration - ANTI-PATTERN: Sixth wrapping layer.
Raises:
ConfigError: If processing fails (wrapped sixth time)
"""
try:
config = load_config(file_path)
typer.echo(f"Successfully loaded config: {config}")
except ConfigError as e:
# ANTI-PATTERN: Wrap the already-quintuple-wrapped exception
raise ConfigError(f"Failed to process configuration: {e}") from e
except Exception as e:
# NAIVE WORKAROUND: Check isinstance to avoid double-wrapping
if isinstance(e, ConfigError):
raise
raise ConfigError(f"Processing error: {e}") from e
# LAYER 7: CLI entry point
@app.command()
def main(
config_file: Annotated[Path, typer.Argument(help="Path to JSON configuration file")] = Path("broken.json"),
) -> None:
"""Load and process a JSON configuration file.
This demonstrates the ANTI-PATTERN of exception chain explosion.
When an error occurs, you'll see a massive exception chain through 7+ layers.
Example:
# Create a broken JSON file
echo "i'm broken" > broken.json
# Run the script to see exception explosion
./nested-typer-exception-explosion.py broken.json
"""
# DON'T catch here - let the exception chain explode through all layers
# This shows the full horror of the nested wrapping pattern
process_config(config_file)
@app.command()
def create_test_file() -> None:
"""Create a broken JSON file for testing the exception explosion."""
broken_file = Path("broken.json")
broken_file.write_text("i'm broken")
typer.echo(f"Created {broken_file} with invalid JSON content")
if __name__ == "__main__":
# Auto-create broken.json if it doesn't exist and is being used
if len(sys.argv) > 1:
arg = sys.argv[-1]
if arg == "broken.json" or (not arg.startswith("-") and Path(arg).name == "broken.json"):
broken_file = Path("broken.json")
if not broken_file.exists():
typer.echo("Creating broken.json for demonstration...")
broken_file.write_text("i'm broken")
typer.echo()
app()

View File

@@ -0,0 +1,147 @@
#!/usr/bin/env -S uv --quiet run --active --script
# /// script
# requires-python = ">=3.11"
# dependencies = [
# "rich>=13.0.0",
# ]
# ///
"""Rich containers (Panel, Table) behavior with long content in non-TTY environments."""
from rich.console import Console, RenderableType
from rich.measure import Measurement
from rich.panel import Panel
from rich.table import Table
def get_rendered_width(renderable: RenderableType) -> int:
"""Get actual rendered width of any Rich renderable.
Handles color codes, Unicode, styling, padding, and borders.
Works with Panel, Table, or any Rich container.
"""
temp_console = Console(width=999999)
measurement = Measurement.get(temp_console, temp_console.options, renderable)
return int(measurement.maximum)
long_url = (
"https://raw.githubusercontent.com/python/cpython/main/Lib/asyncio/base_events.py"
"#L1000-L1100?ref=docs-example&utm_source=rich-demo&utm_medium=terminal"
"&utm_campaign=long-url-wrapping-behavior-test"
)
long_command = "\n".join([
"[bold cyan]:sparkles: v3.13.0 Release Highlights :sparkles:[/bold cyan] New JIT optimizations, faster startup, improved error messages, richer tracebacks, better asyncio diagnostics, enhanced typing features, smoother virtualenv workflows, and a refined standard library experience for developers everywhere.",
"[green]:rocket: Performance & Reliability :rocket:[/green] Lower latency event loops, smarter garbage collection heuristics, adaptive I/O backpressure, fine-tuned file system operations, reduced memory fragmentation, and sturdier cross-platform behavior in cloud-native deployments.",
"[magenta]:hammer_and_wrench: Developer Experience :hammer_and_wrench:[/magenta] More precise type hints, clearer deprecation warnings, friendlier REPL niceties, first-class debugging hooks, expanded `typing` utilities, and streamlined packaging stories for modern Python projects of all sizes.",
"[yellow]:shield: Security & Ecosystem :shield:[/yellow] Hardened TLS defaults, safer subprocess handling, improved sandboxing hooks, more robust hashing algorithms, curated secure defaults across modules, and deeper ecosystem integration for auditing, scanning, and compliance workflows.",
])
console = Console()
## BROKEN EXAMPLES AND ANTI-PATTERNS
print("=" * 80)
print("Panel with default settings")
print("=" * 80)
panel = Panel(f"URL: {long_url}\nCommand: {long_command}")
console.print(panel)
print("\n" + "=" * 80)
print("Panel with crop=False, overflow='ignore' on print")
print("=" * 80)
console.print(panel, crop=False, overflow="ignore")
print("\n" + "=" * 80)
print("Panel with expand=False and measured width")
print("=" * 80)
# Avoid doing this: setting the console width for all output to a very wide value
# causes output to be 'extended' (padded) to fit that console width,
# which is not what you want.
panel_content = f"URL: {long_url}\nCommand: {long_command}"
panel_measured = Panel(panel_content, expand=False)
temp_console = Console(width=99999)
measurement = Measurement.get(temp_console, temp_console.options, panel_measured)
panel_measured.width = int(measurement.maximum)
console.print(panel_measured, crop=False, overflow="ignore")
print("\n" + "=" * 80)
print("Table with default settings")
print("=" * 80)
table = Table()
table.add_column("Type", style="cyan")
table.add_column("Value", style="green")
table.add_row("URL", long_url)
table.add_row("Command", long_command)
console.print(table)
print("\n" + "=" * 80)
print("Table with no_wrap=True on columns")
print("=" * 80)
table_nowrap = Table()
table_nowrap.add_column("Type", style="cyan", no_wrap=True)
table_nowrap.add_column("Value", style="green", no_wrap=True)
table_nowrap.add_row("URL", long_url)
table_nowrap.add_row("Command", long_command)
console.print(table_nowrap, crop=False, overflow="ignore")
## WORKING EXAMPLES THAT DO WHAT IS EXPECTED
print("\n" + "=" * 80)
print("Panel that works: Use get_rendered_width() helper")
print("=" * 80)
# Panels fill the space up to the size of the Console,
# so to make a Panel that doesn't wrap,
# we need to set the width of the Console to the rendered panel width
content_lines = f"URL: {long_url}\n{long_command}"
panel_measured = Panel(content_lines)
panel_width = get_rendered_width(panel_measured)
console.width = panel_width
console.print(panel_measured, crop=False, overflow="ignore", no_wrap=True, soft_wrap=True)
print("\n" + "=" * 80)
print("Table that works: Use get_rendered_width() helper")
print("=" * 80)
# Tables are bossy: once a width is set on the table, it renders at that width
# regardless of the console width. So we measure the rendered table width
# and set the table's width to the measured value.
table_measured = Table()
table_measured.add_column("Type", style="cyan", no_wrap=True)
table_measured.add_column("Value", style="green", no_wrap=True)
table_measured.add_row("URL", long_url)
table_measured.add_row("Command", long_command)
# set table width to the measured width
table_measured.width = get_rendered_width(table_measured)
console.print(table_measured, crop=False, overflow="ignore", no_wrap=True, soft_wrap=True)
print("\n" + "=" * 80)
print("Plain text with crop=False, overflow='ignore'")
print("=" * 80)
# Plain text doesn't have a width, so we can just print it directly,
# as long as we set crop=False and overflow="ignore"
console.print(f"URL: {long_url}", crop=False, overflow="ignore")
console.print(f"Command: {long_command}", crop=False, overflow="ignore")
print("\n" + "=" * 80)
print("Table with matching Panel summary: Same width for both")
print("=" * 80)
# Create table with data
result_table = Table()
result_table.add_column("Type", style="cyan", no_wrap=True)
result_table.add_column("Value", style="green", no_wrap=True)
result_table.add_row("URL", long_url)
result_table.add_row("Command", long_command)
# Measure table width and set it
table_width = get_rendered_width(result_table)
result_table.width = table_width
# Create panel summary with same width
summary_text = "[bold]Summary:[/bold] Processed 2 items with no errors"
summary_panel = Panel(summary_text, title="Results", border_style="green")
# Panel needs Console width set to match table
console.width = table_width
# Print both - they'll have matching widths
console.print(result_table, crop=False, overflow="ignore", no_wrap=True, soft_wrap=True)
console.print(summary_panel, crop=False, overflow="ignore", no_wrap=True, soft_wrap=True)

View File

@@ -0,0 +1,52 @@
#!/usr/bin/env -S uv --quiet run --active --script
# /// script
# requires-python = ">=3.11"
# dependencies = [
# "rich>=13.0.0",
# ]
# ///
"""Minimal example: Preventing Rich Console word wrapping in CI/non-TTY environments.
Problem: Rich Console wraps text at default width (80 chars in non-TTY), breaking:
- URLs in log output
- Long command strings
- Stack traces
- Structured log parsing
Solution: Use crop=False + overflow="ignore" on console.print() calls.
"""
from rich.console import Console
# Sample long text that would wrap at 80 characters
long_url = (
"https://raw.githubusercontent.com/python/cpython/main/Lib/asyncio/base_events.py"
"#L1000-L1100?ref=docs-example&utm_source=rich-demo&utm_medium=terminal"
"&utm_campaign=long-url-wrapping-behavior-test"
)
long_command = """
[bold cyan]:sparkles: v3.13.0 Release Highlights :sparkles:[/bold cyan] New JIT optimizations, faster startup, improved error messages, richer tracebacks,
better asyncio diagnostics, enhanced typing features, smoother virtualenv workflows, and a refined standard library experience for developers everywhere.
[green]:rocket: Performance & Reliability :rocket:[/green] Lower latency event loops, smarter garbage collection heuristics, adaptive I/O backpressure, fine-tuned file system operations, reduced memory fragmentation, and sturdier cross-platform behavior in cloud-native deployments.
[magenta]:hammer_and_wrench: Developer Experience :hammer_and_wrench:[/magenta] More precise type hints, clearer deprecation warnings, friendlier REPL niceties, first-class debugging hooks, expanded `typing` utilities, and streamlined packaging stories for modern Python projects of all sizes.
[yellow]:shield: Security & Ecosystem :shield:[/yellow] Hardened TLS defaults, safer subprocess handling, improved sandboxing hooks, more robust hashing algorithms, curated secure defaults across modules, and deeper ecosystem integration for auditing, scanning, and compliance workflows.
"""
long_traceback = "Traceback (most recent call last): File /very/long/path/to/module/that/contains/the/failing/code/in/production/environment.py line 42 in process_data"
console = Console()
print("=" * 80)
print("PROBLEM: Default console.print() wraps long lines")
print("=" * 80)
console.print(f"URL: {long_url}")
console.print(f"Command: {long_command}")
console.print(f"Error: {long_traceback}")
print("\n" + "=" * 80)
print("SOLUTION: Use crop=False + overflow='ignore'")
print("=" * 80)
console.print(f"URL: {long_url}", crop=False, overflow="ignore")
console.print(f"Command: {long_command}", crop=False, overflow="ignore")
console.print(f"Error: {long_traceback}", crop=False, overflow="ignore")

View File

@@ -0,0 +1,153 @@
# Typer and Rich CLI Examples
This directory contains executable examples demonstrating solutions to common problems when building Python CLI applications with Typer and Rich.
## Available Examples
| Script | Problem Solved | Key Technique |
| --- | --- | --- |
| [console_no_wrap_example.py](./console_no_wrap_example.py) | Rich Console wraps text at 80 chars in CI/non-TTY | Use `crop=False, overflow="ignore"` on print calls |
| [console_containers_no_wrap.py](./console_containers_no_wrap.py) | Panels/Tables wrap long content even with crop=False | Use `get_rendered_width()` helper + dedicated Console |
## Quick Start
All scripts use PEP 723 inline script metadata and can be run directly:
```bash
# Run directly (uv handles dependencies automatically)
./console_no_wrap_example.py
# Or explicitly with uv
uv run console_no_wrap_example.py
```
## Problem 1: Rich Console Text Wrapping in CI
### The Problem
Rich Console wraps text at default width (80 chars in non-TTY environments like CI), breaking:
- URLs in log output
- Long command strings
- Stack traces
- Structured log parsing
### Why This Matters
**In non-interactive environments (CI, logs, automation), output is consumed by machines, not humans:**
- **Log parsing**: Tools like grep/awk/sed expect data on single lines - wrapping breaks patterns
- **URLs**: Wrapped URLs become invalid - can't click, copy-paste, or process with tools
- **Structured data**: JSON/CSV output splits across lines - breaks parsers and data processing
- **Commands**: Wrapped command strings can't be copy-pasted to execute
- **Error investigation**: Stack traces and file paths fragment across lines - harder to trace issues
**In interactive TTY (terminal), wrapping is good** - optimizes for human reading at terminal width.
**The solution must detect context and apply different behavior (see the sketch after this list):**
- **TTY (interactive)**: Use terminal width, wrap for human readability
- **Non-TTY (CI/logs)**: Never wrap, optimize for machine parsing
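A minimal sketch of that context check, using Rich's `Console.is_terminal` property (the `print_line` helper below is hypothetical, not part of the bundled examples):
```python
from rich.console import Console

console = Console()

def print_line(text: str) -> None:
    """Wrap for humans in a TTY; never wrap for machines in CI/logs."""
    if console.is_terminal:
        console.print(text)  # interactive: wrap at the terminal width
    else:
        console.print(text, crop=False, overflow="ignore")  # non-TTY: keep one line
```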
### The Solution
Use `crop=False` + `overflow="ignore"` on `console.print()` calls:
```python
from rich.console import Console
console = Console()
# For text that should never wrap (URLs, commands, paths)
console.print(long_url, crop=False, overflow="ignore")
# For normal text that can wrap
console.print(normal_text)
```
### Example Script
[console_no_wrap_example.py](./console_no_wrap_example.py) demonstrates:
- The problem (default wrapping behavior)
- The solution (using crop=False + overflow="ignore")
- Usage patterns for different text types
## Problem 2: Rich Containers (Panel/Table) Wrapping Content
### The Problem
Rich containers like `Panel` and `Table` wrap content internally even when using `crop=False, overflow="ignore"` on the print call (a minimal reproduction follows this list). This is because:
- Containers calculate their own internal layout
- Console width (default 80 in non-TTY) constrains container rendering
- Content wraps inside the container before `crop=False` can prevent it
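A minimal reproduction of the problem, assuming a non-TTY environment where `Console` falls back to a width of 80:
```python
from rich.console import Console
from rich.panel import Panel

console = Console()  # width falls back to 80 when stdout is not a TTY
very_long = "x" * 200

# The Panel lays out its content at the console width before printing,
# so the line still wraps even though cropping is disabled here.
console.print(Panel(very_long), crop=False, overflow="ignore")
```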
### The Solution
Use a helper function to measure the actual rendered width, then apply width differently for Panel vs Table:
```python
from rich.console import Console, RenderableType
from rich.measure import Measurement
from rich.panel import Panel
from rich.table import Table
def get_rendered_width(renderable: RenderableType) -> int:
"""Get actual rendered width of any Rich renderable.
Handles color codes, Unicode, styling, padding, and borders.
Works with Panel, Table, or any Rich container.
"""
temp_console = Console(width=9999)
measurement = Measurement.get(temp_console, temp_console.options, renderable)
return int(measurement.maximum)
console = Console()
# Panel: Set Console width (Panel fills Console width)
panel = Panel(long_content)
panel_width = get_rendered_width(panel)
console.width = panel_width # Set Console width, NOT panel.width
console.print(panel, crop=False, overflow="ignore", no_wrap=True, soft_wrap=True)
# Table: Set Table width (Table controls its own width)
table = Table()
table.add_column("Type", style="cyan", no_wrap=True)
table.add_column("Value", style="green", no_wrap=True)
table.add_row("Data", long_content)
table.width = get_rendered_width(table) # Set Table width
console.print(table, crop=False, overflow="ignore", no_wrap=True, soft_wrap=True)
```
### Example Script
[console_containers_no_wrap.py](./console_containers_no_wrap.py) demonstrates:
- Default Panel/Table wrapping behavior
- Why `crop=False` alone doesn't work for containers
- The `get_rendered_width()` helper function
- Complete working examples for both Panel and Table
- Comparison of different approaches
## When to Use Each Technique
**Use `crop=False, overflow="ignore"` for:**
- Plain text output
- URLs, file paths, commands that must stay on single lines
- Text that doesn't use Rich containers
**Use `get_rendered_width()` + set width on container for:**
- Panel with long content
- Table with long cell values
- Any Rich container that wraps content
- Structured output that must preserve exact formatting
## Related Documentation
- [Rich Console Documentation](https://rich.readthedocs.io/en/stable/console.html)
- [Rich Panel Documentation](https://rich.readthedocs.io/en/stable/panel.html)
- [Rich Table Documentation](https://rich.readthedocs.io/en/stable/tables.html)
- [Typer Documentation](https://typer.tiangolo.com/)

View File

@@ -0,0 +1,48 @@
"""Compute the version number and store it in the `__version__` variable.
Based on <https://github.com/maresb/hatch-vcs-footgun-example>.
"""
def _get_hatch_version() -> str | None:
"""Compute the most up-to-date version number in a development environment.
Returns `None` if Hatchling is not installed, e.g. in a production environment.
For more details, see <https://github.com/maresb/hatch-vcs-footgun-example/>.
"""
import os
try:
from hatchling.metadata.core import ProjectMetadata
from hatchling.plugin.manager import PluginManager
from hatchling.utils.fs import locate_file
except ImportError:
# Hatchling is not installed, so probably we are not in
# a development environment.
return None
pyproject_toml = locate_file(__file__, "pyproject.toml")
if pyproject_toml is None:
raise RuntimeError("pyproject.toml not found although hatchling is installed")
root = os.path.dirname(pyproject_toml)
metadata = ProjectMetadata(root=root, plugin_manager=PluginManager())
# Version can be either statically set in pyproject.toml or computed dynamically:
return metadata.core.version or metadata.hatch.version.cached
def _get_importlib_metadata_version() -> str:
"""Compute the version number using importlib.metadata.
This is the official Pythonic way to get the version number of an installed
package. However, it is only updated when a package is installed. Thus, if a
package is installed in editable mode, and a different version is checked out,
then the version number will not be updated.
"""
from importlib.metadata import version
__version__ = version(__package__ or __name__)
return __version__
__version__ = _get_hatch_version() or _get_importlib_metadata_version()

View File

@@ -0,0 +1,227 @@
# Commands Reference Library
This directory contains **reference material for creating and organizing Claude Code slash commands**. These are NOT deployed commands themselves, but rather templates, patterns, and procedural guides for command development.
## Purpose
The `commands/` directory serves as a knowledge base for:
- **Command Templates**: Standardized structures for creating new slash commands
- **Command Patterns**: Configuration defining command categories, workflows, and integration
- **Meta-Commands**: Guides for generating other commands using established patterns
- **Specialized Workflows**: Domain-specific command procedures (testing, development)
## Directory Structure
```text
commands/
├── development/ # Development workflow commands
│ ├── config/
│ │ └── command-patterns.yml # Command categories, workflows, risk levels
│ ├── templates/
│ │ └── command-template.md # Base template for new commands
│ ├── use-command-template.md # Meta-command: generate commands from template
│ └── create-feature-task.md # Structured feature development workflow
└── testing/ # Testing workflow commands
├── analyze-test-failures.md # Investigate test failures (bug vs test issue)
├── comprehensive-test-review.md
└── test-failure-mindset.md
```
## Key Files
### Configuration
**[command-patterns.yml](./development/config/command-patterns.yml)**
Defines the organizational structure for commands:
- **Command Categories**: Analysis, Development, Quality, Documentation, Operations
- **Workflow Chains**: Multi-step processes (Feature Development, Bug Fix, Code Review)
- **Context Sharing**: What information flows between commands
- **Cache Patterns**: TTL and invalidation rules for different command types
- **Risk Assessment**: Classification of commands by risk level and required safeguards
### Templates
**[command-template.md](./development/templates/command-template.md)**
Standard structure for creating new commands:
- Purpose statement (single sentence)
- Task description with `$ARGUMENTS` placeholder
- Phased execution steps (Analysis → Implementation → Validation)
- Context preservation rules
- Expected output format
- Integration points (prerequisites, follow-ups, related commands)
### Meta-Commands
**[use-command-template.md](./development/use-command-template.md)**
Procedural guide for generating new commands:
1. Parse command purpose from `$ARGUMENTS`
2. Select appropriate category and naming convention
3. Apply template structure with customizations
4. Configure integration with workflow chains
5. Create command file in appropriate location
### Specialized Workflows
**[analyze-test-failures.md](./testing/analyze-test-failures.md)**
Critical thinking framework for test failure analysis:
- Balanced investigation approach (test bug vs implementation bug)
- Structured analysis steps
- Classification criteria (Test Bug | Implementation Bug | Ambiguous)
- Examples demonstrating reasoning patterns
- Output format for clear communication
**[create-feature-task.md](./development/create-feature-task.md)**
Structured approach to feature development:
- Requirement parsing and scope determination
- Task structure generation with phases
- Documentation creation and tracking setup
- Integration with development workflows
## Usage Patterns
### Creating a New Command
When you need to create a new slash command for Claude Code:
1. **Consult the patterns**: Review [command-patterns.yml](./development/config/command-patterns.yml) to understand:
- Which category your command belongs to
- Whether it fits into existing workflow chains
- What risk level it represents
2. **Use the template**: Start with [command-template.md](./development/templates/command-template.md)
- Replace placeholders with command-specific content
- Customize execution steps for your use case
- Define clear integration points
3. **Follow naming conventions**: Use verb-noun format
- Analysis: `analyze-*`, `scan-*`, `validate-*`
- Development: `create-*`, `implement-*`, `fix-*`
- Operations: `deploy`, `migrate`, `cleanup-*`
4. **Deploy to proper location**: Actual slash commands live in:
- User commands: `~/.claude/commands/`
- Project commands: `.claude/commands/` (in project root)
- NOT in this `references/commands/` directory
### Integrating Commands into Workflows
Commands are designed to chain together:
```yaml
Feature_Development:
steps:
- create-feature-task # Initialize structured task
- study-current-repo # Understand codebase
- implement-feature # Write code
- create-test-plan # Design tests
- comprehensive-test-review # Validate quality
- gh-create-pr # Submit for review
```
Each command produces context that subsequent commands can use.
## Relationship to Skill Structure
This directory is part of the **python3-development** skill's reference material:
```text
python3-development/
├── SKILL.md # Skill entry point
├── references/
│ ├── commands/ # THIS DIRECTORY (reference material)
│ ├── modern-modules/ # Python library guides
│ └── ...
└── scripts/ # Executable tools
```
**Important Distinctions**:
- **This directory** (`references/commands/`): Templates and patterns for creating commands
- **Deployed commands** (`~/.claude/commands/`): Actual slash commands that Claude Code executes
- **Skill scripts** (`scripts/`): Python tools that may be called by commands
## Best Practices
### When Creating Commands
1. **Single Responsibility**: Each command should focus on one clear task
2. **Clear Naming**: Use descriptive verb-noun pairs (`analyze-dependencies`, `create-component`)
3. **Example Usage**: Include at least 3 concrete examples
4. **Context Definition**: Specify what gets cached for reuse by later commands
5. **Integration Points**: Define prerequisites and natural follow-up commands
### When Organizing Commands
1. **Category Alignment**: Place commands in appropriate category subdirectories
2. **Workflow Awareness**: Consider how commands chain together
3. **Risk Classification**: Mark high-risk commands with appropriate safeguards
4. **Documentation**: Keep command patterns file updated with new additions
## Integration with Python Development Skill
Commands in this directory support the orchestration patterns described in:
- [python-development-orchestration.md](../references/python-development-orchestration.md)
- [reference-document-architecture.md](../planning/reference-document-architecture.md) (historical proposal, not implemented)
They complement the agent-based workflows:
```text
User Request
Orchestrator (uses skill + commands)
├─→ @agent-python-cli-architect (implementation)
├─→ @agent-python-pytest-architect (testing)
└─→ @agent-python-code-reviewer (review)
Apply standards: /modernpython, /shebangpython (from commands)
```
## Common Workflows
### Feature Development
```bash
# 1. Create structured task
/development:create-feature-task Add user authentication with OAuth
# 2. Implement with appropriate agent
# (Orchestrator delegates to @agent-python-cli-architect)
# 3. Validate with standards
/modernpython src/auth/oauth.py
/shebangpython scripts/migrate-users.py
```
### Test Failure Investigation
```bash
# Analyze failures with critical thinking
/testing:analyze-test-failures test_authentication.py::test_oauth_flow
```
### Command Creation
```bash
# Generate new command from template
/development:use-command-template validate API endpoints for rate limiting
```
## Further Reading
- [Command Template](./development/templates/command-template.md) - Base structure for all commands
- [Command Patterns](./development/config/command-patterns.yml) - Organizational taxonomy
- [Python Development Orchestration](../references/python-development-orchestration.md) - How commands fit into workflows

View File

@@ -0,0 +1,112 @@
# Command Configuration Patterns
# Defines reusable patterns for Claude Code commands
Command_Categories:
Analysis:
- analyze-repo-for-claude
- estimate-context-window
- study-*
- validate-*
- scan-*
Development:
- create-feature-task
- implement-feature
- create-test-plan
- fix-bug
Quality:
- comprehensive-test-review
- validate-code-quality
- scan-performance
- scan-test-coverage
Documentation:
- generate-primevue-reference
- document-api
- create-readme
Operations:
- git-*
- gh-*
- deploy
- migrate
Workflow_Chains:
Feature_Development:
steps:
- create-feature-task
- study-current-repo
- implement-feature
- create-test-plan
- comprehensive-test-review
- gh-create-pr
Bug_Fix:
steps:
- gh-issue-enhance
- analyze-issue
- fix-bug
- test-fix
- gh-pr-submit
Code_Review:
steps:
- scan-code-quality
- scan-performance
- scan-test-coverage
- generate-review-report
Context_Sharing:
# Define what context flows between commands
analyze_to_implement:
from: analyze-*
to: implement-*, fix-*
shares: [findings, patterns, architecture]
scan_to_fix:
from: scan-*, validate-*
to: fix-*, improve-*
shares: [issues, priorities, locations]
test_to_deploy:
from: test-*, scan-test-coverage
to: deploy, gh-pr-*
shares: [results, coverage, confidence]
Cache_Patterns:
# How long to cache results from different command types
Analysis_Commands:
ttl: 3600 # 1 hour
invalidate_on: [file_changes, branch_switch]
Scan_Commands:
ttl: 1800 # 30 minutes
invalidate_on: [file_changes]
Build_Commands:
ttl: 300 # 5 minutes
invalidate_on: [any_change]
Risk_Assessment:
High_Risk_Commands:
commands:
- deploy
- migrate
- cleanup-*
- delete-*
triggers: [confirmation, backup, dry_run]
Medium_Risk_Commands:
commands:
- refactor-*
- update-dependencies
- merge-*
triggers: [plan_first, test_after]
Low_Risk_Commands:
commands:
- analyze-*
- scan-*
- study-*
triggers: []

View File

@@ -0,0 +1,60 @@
---
title: "Create Feature Development Task"
description: "Set up comprehensive feature development task with proper tracking"
command_type: "development"
last_updated: "2025-11-02"
related_docs:
- "./use-command-template.md"
- "../../references/python-development-orchestration.md"
---
# Create Feature Development Task
I need to create a structured development task for: $ARGUMENTS
## Your Task
Set up a comprehensive feature development task with proper tracking, phases, and documentation.
## Execution Steps
1. **Parse Feature Requirements**
- Extract feature name and description from $ARGUMENTS
- Identify key requirements and constraints
- Determine complexity and scope
2. **Generate Task Structure**
- Use the feature task template as base
- Customize phases based on feature type
- Add specific acceptance criteria
- Include relevant technical considerations
3. **Create Task Documentation**
- Copy template from ~/.claude/templates/feature-task-template.md
- Fill in all sections with feature-specific details
- Save to appropriate location (suggest: .claude/tasks/[feature-name].md)
- Create initial git branch if requested
4. **Set Up Tracking**
- Add task to TODO list if applicable
- Create initial checkpoints
- Set up progress markers
- Configure any automation needed
## Template Usage
@include templates/feature-task-template.md
## Context Preservation
When creating the task, preserve:
- Initial requirements
- Key technical decisions
- File locations
- Dependencies identified
- Risk factors
## Integration
**Prerequisites**: Clear feature requirements **Follow-up**: `/development:implement-feature [task-file]` **Related**: `create-test-plan`, `estimate-context-window`

View File

@@ -0,0 +1,59 @@
# Command Template
## Purpose
[Single sentence describing what the command does]
## Your Task
[Clear description of what needs to be accomplished with: $ARGUMENTS]
## Execution Steps
1. **Phase 1 - Analysis**
- [Step 1]
- [Step 2]
- [Step 3]
2. **Phase 2 - Implementation**
- [Step 1]
- [Step 2]
- [Step 3]
3. **Phase 3 - Validation**
- [Step 1]
- [Step 2]
- [Step 3]
## Context Preservation
Cache the following for future commands:
- Key findings
- Decisions made
- Files modified
- Patterns identified
## Expected Output
```markdown
## [Command] Results
### Summary
- [Key outcome 1]
- [Key outcome 2]
### Details
[Structured output based on command type]
### Next Steps
- [Recommended follow-up action 1]
- [Recommended follow-up action 2]
```
## Integration
**Prerequisites**: [What should be done before this command] **Follow-up**: [What commands naturally follow this one] **Related**: [Other commands that work well with this]

View File

@@ -0,0 +1,74 @@
---
title: "Use Command Template"
description: "Create new Claude Code command following established patterns"
command_type: "development"
last_updated: "2025-11-02"
related_docs:
- "./templates/command-template.md"
- "./create-feature-task.md"
---
# Use Command Template
I need to create a new command using the standard template for: $ARGUMENTS
## Your Task
Create a new Claude Code command following our established patterns and templates.
## Execution Steps
1. **Determine Command Type**
- Parse command purpose from $ARGUMENTS
- Identify appropriate category (analysis/development/quality/etc)
- Choose suitable command name (verb-noun format)
2. **Apply Template**
- Start with base template from [command-template.md](./templates/command-template.md)
- Customize sections for specific command purpose
- Ensure all required sections are included
- Add command-specific flags if needed
3. **Configure Integration**
- Check [command-patterns.yml](./config/command-patterns.yml) for workflow placement
- Identify prerequisite commands
- Define what context this command produces
- Add to appropriate workflow chains
4. **Create Command File**
- Determine correct folder based on category
- Create .md file with command content
- Verify @include references work correctly
- Test with example usage
## Template Structure
The standard template includes:
- Purpose (single sentence)
- Task description with $ARGUMENTS
- Phased execution steps
- Context preservation rules
- Expected output format
- Integration guidance
## Best Practices
- Keep commands focused on a single responsibility
- Use clear verb-noun naming (analyze-dependencies, create-component)
- Include at least 3 example usages
- Define what gets cached for reuse
- Specify prerequisite and follow-up commands
## Example Usage
```bash
# Create a new analysis command
/development:use-command-template analyze API endpoints for rate limiting needs
# Create a new validation command
/development:use-command-template validate database migrations for safety
# Create a new generation command
/development:use-command-template generate Pydantic classes from API schema
```

View File

@@ -0,0 +1,114 @@
---
title: "Analyze Test Failures"
description: "Analyze failing test cases with a balanced, investigative approach"
command_type: "testing"
last_updated: "2025-11-02"
related_docs:
- "./test-failure-mindset.md"
- "./comprehensive-test-review.md"
---
# Analyze Test Failures
<role>
You are a senior software engineer with expertise in test-driven development and debugging. Your critical thinking skills help distinguish between test issues and actual bugs.
</role>
<context>
When tests fail, there are two primary possibilities that must be carefully evaluated:
1. The test itself is incorrect (false positive)
2. The test is correct and has discovered a genuine bug (true positive)
Assuming tests are wrong by default is a dangerous anti-pattern that defeats the purpose of testing. </context>
<task>
Analyze the failing test case(s) $ARGUMENTS with a balanced, investigative approach to determine whether the failure indicates a test issue or a genuine bug.
</task>
<instructions>
1. **Initial Analysis**
- Read the failing test carefully, understanding its intent
- Examine the test's assertions and expected behavior
- Review the error message and stack trace
2. **Investigate the Implementation**
- Check the actual implementation being tested
- Trace through the code path that leads to the failure
- Verify that the implementation matches its documented behavior
3. **Apply Critical Thinking** For each failing test, ask:
- What behavior is the test trying to verify?
- Is this behavior clearly documented or implied by the function/API design?
- Does the current implementation actually provide this behavior?
- Could this be an edge case the implementation missed?
4. **Make a Determination** Classify the failure as one of:
- **Test Bug**: The test's expectations are incorrect
- **Implementation Bug**: The code doesn't behave as it should
- **Ambiguous**: The intended behavior is unclear and needs clarification
5. **Document Your Reasoning** Provide clear explanation for your determination, including:
- Evidence supporting your conclusion
- The specific mismatch between expectation and reality
- Recommended fix (whether to the test or implementation) </instructions>
<examples>
<example>
**Scenario**: Test expects `calculateDiscount(100, 0.2)` to return 20, but it returns 80
**Analysis**:
- Test assumes function returns discount amount
- Implementation returns price after discount
- Function name is ambiguous
**Determination**: Ambiguous - needs clarification **Reasoning**: The function name could reasonably mean either "calculate the discount amount" or "calculate the discounted price". Check documentation or ask for intended behavior. </example>
<example>
**Scenario**: Test expects `validateEmail("user@example.com")` to return true, but it returns false
**Analysis**:
- Test provides a valid email format
- Implementation regex is missing support for dots in domain
- Other valid emails also fail
**Determination**: Implementation Bug **Reasoning**: The email address is valid per RFC standards. The implementation's regex is too restrictive and needs to be fixed. </example>
<example>
**Scenario**: Test expects `divide(10, 0)` to return 0, but it throws an error
**Analysis**:
- Test assumes division by zero returns 0
- Implementation throws DivisionByZeroError
- Standard mathematical behavior is to treat as undefined/error
**Determination**: Test Bug **Reasoning**: Division by zero is mathematically undefined. Throwing an error is the correct behavior. The test should expect an error, not 0. </example> </examples>
<important>
- NEVER automatically assume the test is wrong
- ALWAYS consider that the test might have found a real bug
- When uncertain, lean toward investigating the implementation
- Tests are often your specification - they define expected behavior
- A failing test is a gift - it's either catching a bug or clarifying requirements
</important>
<output_format> For each failing test, provide:
```text
Test: [test name/description]
Failure: [what failed and how]
Investigation:
- Test expects: [expected behavior]
- Implementation does: [actual behavior]
- Root cause: [why they differ]
Determination: [Test Bug | Implementation Bug | Ambiguous]
Recommendation:
[Specific fix to either test or implementation]
```
</output_format>

View File

@@ -0,0 +1,31 @@
---
title: "Comprehensive Test Review"
description: "Perform thorough test review following standard checklist"
command_type: "testing"
last_updated: "2025-11-02"
related_docs:
- "./analyze-test-failures.md"
- "./test-failure-mindset.md"
---
# Comprehensive Test Review
I need to review and ensure comprehensive testing for: $ARGUMENTS
## Test Review Process
I'll perform a thorough test review following our standard checklist:
@include templates/test-checklist.md
## Additional Considerations
Beyond the standard checklist, I'll also examine:
- Test isolation and independence
- Mock usage appropriateness
- Test execution time
- Flaky test patterns
- Test naming clarity
Let me analyze the testing situation...

View File

@@ -0,0 +1,121 @@
---
title: "Test Failure Analysis Mindset"
description: "Set balanced investigative approach for test failures"
command_type: "testing"
last_updated: "2025-11-02"
related_docs:
- "./analyze-test-failures.md"
- "./comprehensive-test-review.md"
---
# Test Failure Analysis Mindset
<role>
You are a senior software engineer who understands that test failures are valuable signals that require careful analysis, not automatic dismissal.
</role>
<context>
This guidance sets your approach for all future test failure encounters in this session. Tests are specifications - they define expected behavior. When they fail, it's a critical moment requiring balanced investigation.
</context>
<task>
Going forward in this session, whenever you encounter failing tests, apply a balanced investigative approach that considers both possibilities: the test could be wrong, OR the test could have discovered a genuine bug.
</task>
<principles>
1. **Tests as First-Class Citizens**
- Tests are often the only specification we have
- They encode important business logic and edge cases
- A failing test is providing valuable information
2. **Dual Hypothesis Approach** Always consider both possibilities:
- Hypothesis A: The test's expectations are incorrect
- Hypothesis B: The implementation has a bug
3. **Evidence-Based Decisions**
- Never assume; always investigate
- Look for evidence supporting each hypothesis
- Document your reasoning process
4. **Respect the Test Author**
- Someone wrote this test for a reason
- They may have understood requirements you're missing
- Their test might be catching a subtle edge case </principles>
<mindset>
When you see a test failure, your internal monologue should be:
"This test is failing. This could mean:
1. The test discovered a bug in the implementation (valuable!)
2. The test's expectations don't match intended behavior
3. There's ambiguity about what the correct behavior should be
Let me investigate all three possibilities before making changes."
NOT: "The test is failing, so I'll fix the test to match the implementation." </mindset>
<approach>
For EVERY test failure you encounter:
1. **Pause and Read**
- Understand what the test is trying to verify
- Read its name, comments, and assertions carefully
2. **Trace the Implementation**
- Follow the code path that leads to the failure
- Understand what the code actually does vs. what's expected
3. **Consider the Context**
- Is this testing a documented requirement?
- Would the current behavior surprise a user?
- What would be the impact of each possible fix?
4. **Make a Reasoned Decision**
- If the implementation is wrong: Fix the bug
- If the test is wrong: Fix the test AND document why
- If unclear: Seek clarification before changing anything
5. **Learn from the Failure**
- What can this teach us about the system?
- Should we add more tests for related cases?
- Is there a pattern we're missing? </approach>
<red_flags> Watch out for these dangerous patterns:
- 🚫 Immediately changing tests to match implementation
- 🚫 Assuming the implementation is always correct
- 🚫 Bulk-updating tests without individual analysis
- 🚫 Removing "inconvenient" test cases
- 🚫 Adding mock/stub workarounds instead of fixing root causes </red_flags>
<good_practices> Cultivate these helpful patterns:
- ✅ Treat each test failure as a potential bug discovery
- ✅ Document your analysis in comments when fixing tests
- ✅ Write clear test names that explain intent
- ✅ When changing a test, explain why the original was wrong
- ✅ Consider adding more tests when you find ambiguity </good_practices>
<example_responses> When encountering test failures, respond like this:
**Good**: "I see test_user_validation is failing. Let me trace through the validation logic to understand if this is catching a real bug or if the test's expectations are incorrect."
**Bad**: "The test is failing so I'll update it to match what the code does."
**Good**: "This test expects the function to throw an error for null input, but it returns None. This could be a defensive programming issue - let me check if null inputs should be handled differently."
**Bad**: "I'll change the test to expect None instead of an error." </example_responses>
<remember>
Every test failure is an opportunity to:
- Discover and fix a bug before users do
- Clarify ambiguous requirements
- Improve system understanding
- Strengthen the test suite
The goal is NOT to make tests pass as quickly as possible. The goal IS to ensure the system behaves correctly. </remember>
<activation>
This mindset is now active for the remainder of our session. I will apply this balanced, investigative approach to all test failures, always considering that the test might be correct and might have found a real bug.
</activation>

View File

@@ -0,0 +1,56 @@
# Planning Directory
## Purpose
This directory contains **historical architectural proposals and planning documents** that were created during the development of the python3-development skill but were **not implemented** in the final design.
## Contents
### reference-document-architecture.md
A comprehensive 2500+ line architectural proposal that outlined a different structure for the python3-development skill than what was ultimately implemented.
**Proposed Structure** (in this document):
- `docs/scenarios/` - Scenario-specific guides (CLI, TUI, modules, scripts)
- `docs/standards/` - Cross-cutting standards and patterns
- `docs/frameworks/` - Framework-specific deep dives
**Actual Implemented Structure** (current):
- `references/` - Reference documentation and module guides
- `commands/` - Command patterns and orchestration workflows
**Why It's Archived:**
- The document proposed an architecture that was never implemented
- The skill evolved in a different direction with simpler organization
- Preserving it provides historical context for design decisions
- Contains valuable thinking about progressive disclosure and token optimization
**Status:** Historical proposal, not implemented
**Note on Document Quality:** This archived document may contain markdown linting issues (e.g., code blocks without language specifiers). These have been preserved as-is to maintain the historical record. Do not use this document as a template for new documentation.
## Usage
These documents are **read-only archives**. They should not be used as current guidance for the python3-development skill. For current documentation, see:
- `../SKILL.md` - Current skill orchestration and usage
- `../references/` - Current reference documentation
- `../commands/` - Current command patterns
## When to Reference Planning Documents
Reference these documents when:
- Understanding the evolution of the skill's design
- Researching alternative architectural approaches
- Learning about decision-making processes
- Considering major architectural changes
**Do not** reference these documents for:
- Current usage patterns
- Active development guidance
- Implementation details

File diff suppressed because it is too large

View File

@@ -0,0 +1,548 @@
---
title: "PEP 723 - Inline Script Metadata"
description: "Official Python specification for embedding dependency metadata in single-file scripts"
version: "1.0.0"
last_updated: "2025-11-04"
document_type: "reference"
official_specification: "https://peps.python.org/pep-0723/"
python_compatibility: "3.11+"
related_docs:
- "../SKILL.md"
- "./python-development-orchestration.md"
---
# PEP 723 - Inline Script Metadata
## What is PEP 723?
PEP 723 is the official Python specification that defines a standard format for embedding metadata in single-file Python scripts. It allows scripts to declare their dependencies and Python version requirements without requiring separate configuration files like `pyproject.toml` or `requirements.txt`.
## Official Specification
The model must WebFetch this URL before discussing the topic with the user: [pep-0723](https://peps.python.org/pep-0723/)
## Key Concept
PEP 723 metadata is embedded **inside Python comments** using a special syntax, making the metadata human-readable and machine-parseable while keeping the script as a single portable file.
If implementing anything that interacts with this metadata, such as a linting enhancer, you must WebFetch [inline-script-metadata](https://packaging.python.org/en/latest/specifications/inline-script-metadata/#inline-script-metadata) to get the schema, syntax, and implementation details.
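As a rough illustration of what such tooling does (a minimal sketch only, not a substitute for the specification; `read_script_metadata` is a hypothetical helper):
```python
from __future__ import annotations

import tomllib  # stdlib in Python 3.11+


def read_script_metadata(source: str) -> dict | None:
    """Extract and parse the PEP 723 'script' block from script source, if present."""
    lines = source.splitlines()
    try:
        start = lines.index("# /// script")
    except ValueError:
        return None  # no metadata block in this script
    toml_lines: list[str] = []
    for line in lines[start + 1:]:
        if line == "# ///":
            return tomllib.loads("\n".join(toml_lines))
        if line == "#":
            toml_lines.append("")
        elif line.startswith("# "):
            toml_lines.append(line.removeprefix("# "))
        else:
            break  # comment run ended before the closing marker
    return None
```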
## The Problem It Solves
### The Challenge
When sharing Python scripts as standalone files (via email, gists, URLs, or chat), there's a fundamental problem:
- **Scripts often need external dependencies** (requests, rich, pandas, etc.)
- **No standard way** to declare these dependencies within the script itself
- **Tools can't automatically know** what packages to install to run the script
- **Users must read documentation** or comments to figure out requirements
### The Solution
PEP 723 provides a **standardized comment-based format** that:
- ✅ Embeds dependency declarations directly in the script
- ✅ Remains a valid Python file (metadata is in comments)
- ✅ Is machine-readable by package managers (uv, PDM, Hatch)
- ✅ Keeps everything in a single portable file
## Syntax
### Format
PEP 723 metadata is written as **TOML inside specially-formatted Python comments**:
```python
# /// script
# requires-python = ">=3.11"
# dependencies = [
# "requests<3",
# "rich",
# ]
# ///
```
### Rules
1. **Opening marker**: `# /// script` (exactly, with spaces)
2. **Content**: Valid TOML, with each line prefixed by `#` and a space
3. **Closing marker**: `# ///` (exactly, with spaces)
4. **Location**: Typically near the top of the file, after shebang and module docstring
5. **Indentation**: Use consistent comment formatting
### Supported Fields
```toml
# /// script
# requires-python = ">=3.11" # Minimum Python version
# dependencies = [ # External packages
# "requests>=2.31.0,<3",
# "rich>=13.0",
# "typer[all]>=0.12.0",
# ]
# ///
```
## When to Use PEP 723
### ✅ Use PEP 723 When
1. **Script has external dependencies**
- Uses packages from PyPI (requests, pandas, rich, etc.)
- Needs specific package versions
- Example: A CLI tool that fetches data from APIs
2. **Sharing standalone scripts**
- Sending scripts via email, gists, or chat
- Publishing example scripts in documentation
- Creating portable automation tools
3. **Scripts need reproducibility**
- Version-pinned dependencies for consistent behavior
- Specific Python version requirements
- Example: Deployment scripts that must work identically across environments
### ❌ Don't Use PEP 723 When
1. **Script uses only stdlib**
- No external dependencies = nothing to declare
- Use simple shebang: `#!/usr/bin/env python3`
- Example: A script that uses only `argparse`, `pathlib`, `json`
2. **Full project with pyproject.toml**
- Projects have proper package structure
- Use `pyproject.toml` for dependency management
- PEP 723 is for **single-file scripts**, not projects
3. **Script is part of a package**
- Package dependencies are declared in `pyproject.toml`
- Script uses package-level dependencies
- No need to duplicate declarations
## Shebang Requirements
### Scripts with PEP 723 Metadata
**Must use** the uv-based shebang for automatic dependency installation:
```python
#!/usr/bin/env -S uv --quiet run --active --script
# /// script
# requires-python = ">=3.11"
# dependencies = ["requests", "rich"]
# ///
import requests
from rich import print
```
**Why this shebang?**
- `uv --quiet run --active --script`: Tells uv to:
- Read PEP 723 metadata from the script
- Install declared dependencies automatically
- Execute the script with correct environment
### Stdlib-Only Scripts
**Use** the standard Python shebang (no PEP 723 needed):
```python
#!/usr/bin/env python3
import argparse
import pathlib
import json
# No dependencies to declare
```
**Why no PEP 723?**
- Stdlib is always available (bundled with Python)
- Nothing to declare = no metadata needed
- Simpler is better when appropriate
## Complete Example
### Script with External Dependencies
```python
#!/usr/bin/env -S uv --quiet run --active --script
# /// script
# requires-python = ">=3.11"
# dependencies = [
# "requests>=2.31.0,<3",
# "rich>=13.0",
# ]
# ///
"""Fetch GitHub user info and display with rich formatting."""
import sys
from typing import Any
import requests
from rich.console import Console
from rich.panel import Panel
console = Console()
def fetch_user(username: str) -> dict[str, Any] | None:
"""Fetch GitHub user data."""
response = requests.get(f"https://api.github.com/users/{username}")
if response.status_code == 200:
return response.json()
return None
def main() -> None:
"""Main entry point."""
if len(sys.argv) != 2:
console.print("[red]Usage: script.py <github-username>[/red]")
sys.exit(1)
username = sys.argv[1]
user = fetch_user(username)
if user:
console.print(
Panel(
f"[bold]{user['name']}[/bold]\n"
f"Followers: {user['followers']}\n"
f"Public Repos: {user['public_repos']}",
title=f"GitHub: {username}",
)
)
else:
console.print(f"[red]User '{username}' not found[/red]")
if __name__ == "__main__":
main()
```
**To run**:
```bash
chmod +x script.py
./script.py octocat
```
The script will:
1. Read PEP 723 metadata
2. Install `requests` and `rich` if not present
3. Execute with dependencies available
### Stdlib-Only Script
```python
#!/usr/bin/env python3
"""Simple JSON formatter using only stdlib."""
import argparse
import json
import sys
from pathlib import Path
def format_json(input_path: Path, indent: int = 2) -> None:
"""Format JSON file with specified indentation."""
data = json.loads(input_path.read_text())
formatted = json.dumps(data, indent=indent, sort_keys=True)
print(formatted)
def main() -> None:
"""Main entry point."""
parser = argparse.ArgumentParser(description="Format JSON files")
parser.add_argument("file", type=Path, help="JSON file to format")
parser.add_argument("--indent", type=int, default=2, help="Indentation spaces")
args = parser.parse_args()
format_json(args.file, args.indent)
if __name__ == "__main__":
main()
```
**No PEP 723 needed** - all imports are from Python's standard library.
## Tool Support
### Package Managers
The following tools support PEP 723 inline script metadata:
- **uv**: [https://docs.astral.sh/uv/](https://docs.astral.sh/uv/)
- **PDM**: [https://pdm-project.org/](https://pdm-project.org/)
- **Hatch**: [https://hatch.pypa.io/](https://hatch.pypa.io/)
### Running Scripts with uv
```bash
# Make script executable
chmod +x script.py
# Run directly (uv reads PEP 723 metadata)
./script.py
# Or explicitly with uv
uv run script.py
```
### Alternative: PDM
```bash
pdm run script.py
```
## Common Patterns
### Version Constraints
```python
# /// script
# requires-python = ">=3.11"
# dependencies = [
# "requests>=2.31.0,<3", # Major version constraint
# "rich~=13.7", # Compatible release
# "typer[all]", # With extras
# ]
# ///
```
### Development vs Production
**For scripts**, there's typically no separation - all dependencies are runtime dependencies. If you need development tools (testing, linting), those belong in a full project with `pyproject.toml`.
### Git-Based Dependencies
```python
# /// script
# dependencies = [
# "mylib @ git+https://github.com/user/mylib.git@v1.0.0",
# ]
# ///
```
## Best Practices
### 1. Pin Major Versions
```python
# Good - prevents breaking changes
"requests>=2.31.0,<3"
# Avoid - might break on major updates
"requests"
```
### 2. Document the Script
```python
#!/usr/bin/env -S uv --quiet run --active --script
# /// script
# requires-python = ">=3.11"
# dependencies = ["requests", "rich"]
# ///
"""
Fetch and display GitHub user statistics.
Usage:
./github_stats.py <username>
Example:
./github_stats.py octocat
"""
```
### 3. Keep Scripts Focused
PEP 723 is for **single-file scripts**. If your script is growing large or needs multiple modules, consider creating a proper Python package with `pyproject.toml`.
### 4. Test Portability
```bash
# Test on clean environment
uv run --isolated script.py args
```
## Comparison: PEP 723 vs pyproject.toml
| Aspect | PEP 723 (Script) | pyproject.toml (Project) |
| ---------------- | ------------------------- | -------------------------- |
| **Use case** | Single-file scripts | Multi-module packages |
| **Dependencies** | Inline comments | Separate TOML file |
| **Portability** | Single file to share | Requires project structure |
| **Complexity** | Simple, focused | Full project metadata |
| **When to use** | Scripts with dependencies | Libraries, applications |
## Validation
### Using /shebangpython Command
The `/shebangpython` command validates PEP 723 compliance:
```bash
/shebangpython script.py
```
**Checks**:
- ✅ Correct shebang for dependency type
- ✅ PEP 723 syntax if external dependencies detected
- ✅ Metadata fields are valid
- ✅ Execute permission set
See: [/shebangpython command reference](~/.claude/commands/shebangpython.md)
## Troubleshooting
### Script Won't Execute
**Problem**: `./script.py` fails with "dependencies not found"
**Solution**: Check shebang is correct for PEP 723:
```python
#!/usr/bin/env -S uv --quiet run --active --script
```
### Syntax Errors in Metadata
**Problem**: TOML parsing fails
**Solution**: Validate TOML syntax:
```python
# /// script
# requires-python = ">=3.11" # ✅ Correct
# dependencies = [ # ✅ Correct - list syntax
# "requests",
# ]
# ///
```
### Performance Concerns
**Problem**: Script slow to start (installing dependencies)
**Solution**: uv caches dependencies. The first run may be slow while dependencies are resolved and installed; subsequent runs reuse the cache and start quickly. For production workloads, consider packaging as a proper project.
## Migration
### From requirements.txt
**Before** (two files):
```text
# requirements.txt
requests>=2.31.0
rich>=13.0
```
```python
#!/usr/bin/env python3
# script.py (separate file)
import requests
from rich import print
```
**After** (single file):
```python
#!/usr/bin/env -S uv --quiet run --active --script
# /// script
# requires-python = ">=3.11"
# dependencies = [
# "requests>=2.31.0",
# "rich>=13.0",
# ]
# ///
import requests
from rich import print
```
### From Setup.py Scripts
**Before** (package structure):
```text
myproject/
├── setup.py
├── requirements.txt
└── scripts/
└── tool.py
```
**After** (standalone script):
```python
#!/usr/bin/env -S uv --quiet run --active --script
# /// script
# requires-python = ">=3.11"
# dependencies = ["requests", "rich"]
# ///
# tool.py - now fully self-contained
```
## Summary
### Key Takeaways
1. **PEP 723 = Dependency Metadata for Single-File Scripts**
- Standard format for declaring dependencies in comments
- TOML content inside `# ///` delimiters
2. **When to Use**
- Scripts **with external dependencies**
- Need portability (single file to share)
- Want automatic dependency installation
3. **When NOT to Use**
- Stdlib-only scripts (nothing to declare)
- Full projects (use `pyproject.toml`)
- Package modules (use package dependencies)
4. **Shebang Requirements**
- With PEP 723: `#!/usr/bin/env -S uv --quiet run --active --script`
- Stdlib only: `#!/usr/bin/env python3`
5. **Tool Support**
- uv, PDM, Hatch all support PEP 723
- Automatic dependency installation on script execution
### Quick Reference
```python
# Template for PEP 723 script
#!/usr/bin/env -S uv --quiet run --active --script
# /// script
# requires-python = ">=3.11"
# dependencies = [
# "package-name>=version",
# ]
# ///
"""Script description."""
import package_name
# Your code here
```
## See Also
- **Official PEP**: [https://peps.python.org/pep-0723/](https://peps.python.org/pep-0723/)
- **uv Documentation**: [https://docs.astral.sh/uv/](https://docs.astral.sh/uv/)
- **Skill Reference**: [Python Development SKILL.md](../SKILL.md)
- **Shebang Validation**: [/shebangpython command](~/.claude/commands/shebangpython.md)

@@ -0,0 +1,45 @@
---
title: "API Reference Template"
description: "Template for detailed reference documentation"
version: "1.0.0"
last_updated: "2025-11-02"
document_type: "template"
---
# Reference Documentation for Python3 Development
This is a placeholder for detailed reference documentation. Replace with actual reference content or delete if not needed.
Example real reference docs from other skills:
- product-management/references/communication.md - Comprehensive guide for status updates
- product-management/references/context_building.md - Deep-dive on gathering context
- bigquery/references/ - API references and query examples
## When Reference Docs Are Useful
Reference docs are ideal for:
- Comprehensive API documentation
- Detailed workflow guides
- Complex multi-step processes
- Information too lengthy for main SKILL.md
- Content that's only needed for specific use cases
## Structure Suggestions
### API Reference Example
- Overview
- Authentication
- Endpoints with examples
- Error codes
- Rate limits
### Workflow Guide Example
- Prerequisites
- Step-by-step instructions
- Common patterns
- Troubleshooting
- Best practices

@@ -0,0 +1,419 @@
# Exception Handling in Python CLI Applications with Typer
## The Problem: Exception Chain Explosion
AI-generated code commonly creates a catastrophic anti-pattern where every function catches and re-wraps exceptions, creating massive exception chains (200+ lines of output) for simple errors like "file not found".
**Example of the problem:**
**Full example:** [nested-typer-exception-explosion.py](./nested-typer-exceptions/nested-typer-exception-explosion.py)
```python
# From: nested-typer-exception-explosion.py (simplified - see full file for all 7 layers)
# Layer 1
def read_file(path):
try:
return path.read_text()
except FileNotFoundError as e:
raise ConfigError(f"File not found: {path}") from e
except Exception as e:
raise ConfigError(f"Failed to read: {e}") from e
# Layer 2
def load_config(path):
try:
contents = read_file(path)
return json.loads(contents)
except ConfigError as e:
raise ConfigError(f"Config load failed: {e}") from e
except Exception as e:
raise ConfigError(f"Unexpected error: {e}") from e
# Layer 3... Layer 4... Layer 5... Layer 6... Layer 7...
# Each layer wraps the exception again
```
**Result:** A single `FileNotFoundError` becomes a deeply nested exception chain producing roughly 220 lines of output.
## The Correct Solution: Typer's Exit Pattern
Based on Typer's official documentation and best practices:
### Pattern 1: Custom Exit Exception with typer.echo
**Full example:** [nested-typer-exception-explosion_corrected_typer_echo.py](./nested-typer-exceptions/nested-typer-exception-explosion_corrected_typer_echo.py)
Create a custom exception class that handles user-friendly output:
```python
# From: nested-typer-exception-explosion_corrected_typer_echo.py
import typer
class AppExit(typer.Exit):
"""Custom exception for graceful application exits."""
def __init__(self, code: int | None = None, message: str | None = None):
self.code = code
self.message = message
if message is not None:
if code is None or code == 0:
typer.echo(self.message)
else:
typer.echo(self.message, err=True)
super().__init__(code=code)
```
**Usage in helper functions:**
```python
# From: nested-typer-exception-explosion_corrected_typer_echo.py
def load_json_file(file_path: Path) -> dict:
"""Load JSON from file.
Raises:
AppExit: If file cannot be loaded or parsed
"""
contents = file_path.read_text(encoding="utf-8") # Let FileNotFoundError bubble
try:
return json.loads(contents)
except json.JSONDecodeError as e:
# Only catch where we can add meaningful context
raise AppExit(
code=1,
message=f"Invalid JSON in {file_path} at line {e.lineno}, column {e.colno}: {e.msg}"
) from e
```
**Key principles:**
- Helper functions let exceptions bubble naturally
- Only catch at points where you have enough context for a good error message
- Immediately raise `AppExit` - don't re-wrap multiple times
- Use `from e` to preserve the chain for debugging
### Pattern 2: Custom Exit Exception with Rich Console
**Full example:** [nested-typer-exception-explosion_corrected_rich_console.py](./nested-typer-exceptions/nested-typer-exception-explosion_corrected_rich_console.py)
For applications using Rich for output:
```python
# From: nested-typer-exception-explosion_corrected_rich_console.py
from rich.console import Console
import typer
normal_console = Console()
err_console = Console(stderr=True)
class AppExitRich(typer.Exit):
"""Custom exception using Rich console for consistent formatting."""
def __init__(
self,
code: int | None = None,
message: str | None = None,
console: Console = normal_console
):
self.code = code
self.message = message
if message is not None:
console.print(self.message)
super().__init__(code=code)
```
**Usage:**
```python
# From: nested-typer-exception-explosion_corrected_rich_console.py
def validate_config(data: dict) -> dict:
"""Validate config structure.
Raises:
AppExitRich: If validation fails
"""
if not data:
raise AppExitRich(code=1, message="Config cannot be empty", console=err_console)
if not isinstance(data, dict):
raise AppExitRich(
code=1,
message=f"Config must be a JSON object, got {type(data)}",
console=err_console
)
return data
```
## Complete Example: Correct Pattern
**Full example:** [nested-typer-exception-explosion_corrected_typer_echo.py](./nested-typer-exceptions/nested-typer-exception-explosion_corrected_typer_echo.py)
```python
# From: nested-typer-exception-explosion_corrected_typer_echo.py
#!/usr/bin/env -S uv run --quiet --script
# /// script
# requires-python = ">=3.11"
# dependencies = ["typer>=0.19.2"]
# ///
import json
from pathlib import Path
from typing import Annotated
import typer
app = typer.Typer()
class AppExit(typer.Exit):
"""Custom exception for graceful exits with user-friendly messages."""
def __init__(self, code: int | None = None, message: str | None = None):
if message is not None:
if code is None or code == 0:
typer.echo(message)
else:
typer.echo(message, err=True)
super().__init__(code=code)
# Helper functions - let exceptions bubble naturally
def read_file_contents(file_path: Path) -> str:
"""Read file contents.
Raises:
FileNotFoundError: If file doesn't exist
PermissionError: If file isn't readable
"""
return file_path.read_text(encoding="utf-8")
def parse_json_string(content: str) -> dict:
"""Parse JSON string.
Raises:
json.JSONDecodeError: If JSON is invalid
"""
return json.loads(content)
# Only catch where we add meaningful context
def load_json_file(file_path: Path) -> dict:
"""Load and parse JSON file.
Raises:
AppExit: If file cannot be loaded or parsed
"""
contents = read_file_contents(file_path)
try:
return parse_json_string(contents)
except json.JSONDecodeError as e:
raise AppExit(
code=1,
message=f"Invalid JSON in {file_path} at line {e.lineno}, column {e.colno}: {e.msg}"
) from e
def validate_config(data: dict, source: str) -> dict:
"""Validate config structure.
Raises:
AppExit: If validation fails
"""
if not data:
raise AppExit(code=1, message="Config cannot be empty")
if not isinstance(data, dict):
raise AppExit(code=1, message=f"Config must be a JSON object, got {type(data)}")
return data
def load_config(file_path: Path) -> dict:
"""Load and validate configuration.
Raises:
AppExit: If config cannot be loaded or is invalid
"""
try:
data = load_json_file(file_path)
except (FileNotFoundError, PermissionError):
raise AppExit(code=1, message=f"Failed to load config from {file_path}")
return validate_config(data, str(file_path))
@app.command()
def main(config_file: Annotated[Path, typer.Argument()]) -> None:
"""Load and process configuration file."""
config = load_config(config_file)
typer.echo(f"Config loaded successfully: {config}")
if __name__ == "__main__":
app()
```
## Output Comparison
### Anti-Pattern Output (220 lines)
```text
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ ... json.loads() ... │
│ ... 40 lines of traceback ... │
╰──────────────────────────────────────────────────────────────────────────────╯
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
The above exception was the direct cause of the following exception:
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ ... parse_json_string() ... │
│ ... 40 lines of traceback ... │
╰──────────────────────────────────────────────────────────────────────────────╯
ConfigError: Invalid JSON in broken.json at line 1, column 1: Expecting value
The above exception was the direct cause of the following exception:
[... 4 more layers of this ...]
```
### Correct Pattern Output (1 line)
```text
Invalid JSON in broken.json at line 1, column 1: Expecting value
```
## Rules for Exception Handling in Typer CLIs
### ✅ DO
1. **Let exceptions propagate in helper functions** - Most functions should not have try/except
2. **Catch only where you add meaningful context** - JSON parsing, validation, etc.
3. **Immediately raise AppExit** - Don't re-wrap multiple times
4. **Use custom exception classes** - Inherit from `typer.Exit` and handle output in `__init__`
5. **Document what exceptions bubble up** - Use docstring "Raises:" sections
6. **Use `from e` when wrapping** - Preserves exception chain for debugging
### ❌ DON'T
1. **NEVER catch and re-wrap at every layer** - This creates exception chain explosion
2. **NEVER use `except Exception as e:` as a safety net** - Too broad, catches things you can't handle
3. **NEVER check `isinstance` to avoid double-wrapping** - This is a symptom you're doing it wrong
4. **NEVER convert exceptions to return values** - Use exceptions, not `{"success": False, "error": "..."}` patterns
5. **NEVER catch exceptions you can't handle** - Let them propagate
## When to Catch Exceptions
**Catch when:**
- You can add meaningful context (filename, line number, etc.)
- You're at a validation boundary and can provide specific feedback
- You need to convert a technical error to user-friendly message
**Don't catch when:**
- You're just going to re-raise it
- You can't add any useful information
- You're in a helper function that just transforms data
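A minimal sketch contrasting the two cases (file name and record format are illustrative; the boundary could equally raise the `AppExit` class shown above):
```python
import json
from pathlib import Path

import typer


def parse_records(raw: str) -> list[dict]:
    """Pure data transformation: no try/except, errors bubble to the caller."""
    return [json.loads(line) for line in raw.splitlines() if line.strip()]


def load_records(path: Path) -> list[dict]:
    """Boundary function: catches only the error it can enrich with context."""
    try:
        return parse_records(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as e:
        typer.echo(f"Invalid JSON in {path}: {e.msg}", err=True)
        raise typer.Exit(code=1) from e
```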
## Fail Fast by Default
**What you DON'T want:**
- ❌ Nested try/except that re-raise with redundant messages
- ❌ Bare exception catching (`except Exception:`)
- ❌ Graceful degradation without requirements
- ❌ Failover/fallback logic without explicit need
- ❌ "Defensive" catch-all handlers that mask problems
**What IS fine:**
- ✅ Let exceptions propagate naturally
- ✅ Add try/except only where recovery is actually needed
- ✅ Validation at boundaries (user input, external APIs)
- ✅ Clear, specific exception types
## Reference: Typer Documentation
Official Typer guidance on exits and exceptions:
- [Terminating](https://github.com/fastapi/typer/blob/master/docs/tutorial/terminating.md)
- [Exceptions](https://github.com/fastapi/typer/blob/master/docs/tutorial/exceptions.md)
- [Printing](https://github.com/fastapi/typer/blob/master/docs/tutorial/printing.md)
## Demonstration Scripts
See [assets/nested-typer-exceptions/](./nested-typer-exceptions/) for complete working examples.
**Quick start:** See [README.md](./nested-typer-exceptions/README.md) for script overview and running instructions.
### [nested-typer-exception-explosion.py](./nested-typer-exceptions/nested-typer-exception-explosion.py) - The Anti-Pattern
**What you'll find:**
- Complete executable script demonstrating 7 layers of exception wrapping
- Every function catches exceptions and re-wraps with `from e`
- Creates ConfigError custom exception at each layer
- No isinstance checks - pure exception chain explosion
**What happens when you run it:**
- Single JSON parsing error generates ~220 lines of output
- 7 separate Rich-formatted traceback blocks
- "The above exception was the direct cause of the following exception" repeated 6 times
- Obscures the actual error (invalid JSON) in pages of traceback
**Run it:** `./nested-typer-exception-explosion.py broken.json`
### [nested-typer-exception-explosion_naive_workaround.py](./nested-typer-exceptions/nested-typer-exception-explosion_naive_workaround.py) - The isinstance Band-Aid
**What you'll find:**
- Same 7-layer structure as the explosion example
- Each `except Exception as e:` block has `if isinstance(e, ConfigError): raise` checks
- Shows how AI attempts to avoid double-wrapping by checking exception type
- Treats the symptom (double-wrapping) instead of the cause (catching everywhere)
**What happens when you run it:**
- Still shows nested tracebacks but slightly reduced output (~80 lines)
- Demonstrates why isinstance checks appear in AI-generated code
- Shows this is a workaround, not a solution
**Run it:** `./nested-typer-exception-explosion_naive_workaround.py broken.json`
### [nested-typer-exception-explosion_corrected_typer_echo.py](./nested-typer-exceptions/nested-typer-exception-explosion_corrected_typer_echo.py) - Correct Pattern with typer.echo
**What you'll find:**
- Custom `AppExit` class extending `typer.Exit` that calls `typer.echo()` in `__init__`
- Helper functions that let exceptions bubble naturally (no try/except)
- Only catches at specific points where meaningful context can be added
- Immediately raises `AppExit` - no re-wrapping through multiple layers
- Complete executable example with PEP 723 metadata
**What happens when you run it:**
- Clean 1-line error message: `Invalid JSON in broken.json at line 1, column 1: Expecting value`
- No traceback explosion
- User-friendly output using typer.echo for stderr
**Run it:** `./nested-typer-exception-explosion_corrected_typer_echo.py broken.json`
### [nested-typer-exception-explosion_corrected_rich_console.py](./nested-typer-exceptions/nested-typer-exception-explosion_corrected_rich_console.py) - Correct Pattern with Rich Console
**What you'll find:**
- Custom `AppExitRich` class extending `typer.Exit` that calls `console.print()` in `__init__`
- Same exception bubbling principles as typer.echo version
- Uses Rich Console for consistent formatting with rest of CLI
- Allows passing different console instances (normal_console vs err_console)
**What happens when you run it:**
- Same clean 1-line output as typer.echo version
- Uses Rich console for output instead of typer.echo
- Demonstrates pattern for apps already using Rich for terminal output
**Run it:** `./nested-typer-exception-explosion_corrected_rich_console.py broken.json`
### Running the Examples
All scripts use PEP 723 inline script metadata and can be run directly:
```bash
# Run any script directly (uv handles dependencies automatically)
./script-name.py broken.json
# Or explicitly with uv
uv run script-name.py broken.json
```
The scripts will create `broken.json` if it doesn't exist.

File diff suppressed because it is too large

@@ -0,0 +1,545 @@
---
title: "GitPython: Python Library for Git Repository Interaction"
library_name: GitPython
pypi_package: GitPython
category: version_control
python_compatibility: "3.7+"
last_updated: "2025-11-02"
official_docs: "https://gitpython.readthedocs.io"
official_repository: "https://github.com/gitpython-developers/GitPython"
maintenance_status: "stable"
---
# GitPython: Python Library for Git Repository Interaction
## Official Information
### Repository and Package Details
- **Official Repository**: <https://github.com/gitpython-developers/GitPython> @[github.com]
- **PyPI Package**: `GitPython` @[pypi.org]
- **Current Version**: 3.1.45 (as of research date) @[pypi.org]
- **Official Documentation**: <https://gitpython.readthedocs.io/> @[readthedocs.org]
- **License**: 3-Clause BSD License (New BSD License) @[github.com/LICENSE]
### Maintenance Status
The project is in **maintenance mode** as of 2025 @[github.com/README.md]:
- No active feature development unless contributed by community
- Bug fixes limited to safety-critical issues or community contributions
- Response times up to one month for issues
- Open to contributions and new maintainers
- Widely used; upkeep continues mainly through community contributions
### Version Requirements
- **Python Support**: Python >= 3.7 @[setup.py]
- **Explicit Compatibility**: Python 3.7, 3.8, 3.9, 3.10, 3.11, 3.12 @[setup.py]
- **Python 3.13-3.14**: Not explicitly tested but likely compatible given 3.12 support
- **Git Version**: Git 1.7.x or newer required @[README.md]
- **System Requirement**: Git executable must be installed and available in PATH
## Core Purpose
### Problem Statement
GitPython solves the challenge of programmatically interacting with Git repositories from Python without manually parsing git command output or managing subprocess calls @[Context7]:
1. **Abstraction over Git CLI**: Provides high-level (porcelain) and low-level (plumbing) interfaces to Git operations
2. **Object-Oriented Access**: Represents Git objects (commits, trees, blobs, tags) as Python objects
3. **Repository Automation**: Enables automation of repository management, analysis, and manipulation
4. **Mining Software Repositories**: Facilitates extraction of repository metadata for analysis
### When to Use GitPython
**Use GitPython when you need to:**
- Access Git repository metadata programmatically (commits, branches, tags)
- Traverse commit history with complex filtering
- Analyze repository structure and content
- Automate repository operations in Python applications
- Build tools for repository mining or analysis
- Inspect repository state without manual git command parsing
- Work with Git objects (trees, blobs) programmatically
### What Would Be "Reinventing the Wheel"
Without GitPython, you would need to @[github.com/README.md]:
- Manually execute `git` commands via `subprocess`
- Parse git command output (often text-based)
- Handle edge cases in output formatting
- Manage object relationships manually
- Implement caching and optimization
- Handle cross-platform differences in git output
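For contrast, a rough sketch of fetching the latest commit's metadata both ways; the repository path and the log format string are illustrative:
```python
import subprocess

from git import Repo

# Without GitPython: run git yourself and parse its text output
result = subprocess.run(
    ["git", "-C", "/path/to/repo", "log", "-1", "--pretty=%H|%an|%s"],
    capture_output=True, text=True, check=True,
)
sha, author, subject = result.stdout.strip().split("|", 2)

# With GitPython: the same data as attributes on a Commit object
commit = Repo("/path/to/repo").head.commit
sha, author, subject = commit.hexsha, commit.author.name, commit.summary
```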
## Real-World Usage Examples
### Example Projects Using GitPython
1. **PyDriller** (908+ stars) - Python framework for mining software repositories @[github.com/ishepard/pydriller]
- Analyzes Git repositories to extract commits, developers, modifications, diffs
- Provides abstraction layer over GitPython for research purposes
2. **Kivy Designer** (837+ stars) - UI designer for Kivy framework @[github.com/kivy/kivy-designer]
- Uses GitPython for version control integration in IDE
3. **GithubCloner** (419+ stars) - Clones GitHub repositories of users and organizations @[github.com/mazen160/GithubCloner]
- Leverages GitPython for batch repository cloning
4. **git-story** (256+ stars) - Creates video animations of Git commit history @[github.com/initialcommit-com/git-story]
- Uses GitPython to traverse commit history for visualization
5. **Dulwich** (2168+ stars) - Pure-Python Git implementation @[github.com/jelmer/dulwich]
   - Included for contrast: a pure-Python alternative to GitPython rather than a project built on it
### Common Usage Patterns
#### Pattern 1: Repository Initialization and Cloning
```python
from git import Repo
# Clone repository
repo = Repo.clone_from('https://github.com/user/repo.git', '/local/path')
# Initialize new repository
repo = Repo.init('/path/to/new/repo')
# Open existing repository
repo = Repo('/path/to/existing/repo')
```
@[Context7/tutorial.rst]
#### Pattern 2: Accessing Repository State
```python
from git import Repo
repo = Repo('/path/to/repo')
# Get active branch
active_branch = repo.active_branch
# Check repository status
is_modified = repo.is_dirty()
untracked = repo.untracked_files
# Access HEAD commit
latest_commit = repo.head.commit
```
@[Context7/tutorial.rst]
#### Pattern 3: Commit Operations
```python
from git import Repo
repo = Repo('/path/to/repo')
# Stage files
repo.index.add(['file1.txt', 'file2.py'])
# Create commit
repo.index.commit('Commit message')
# Access commit metadata
commit = repo.head.commit
print(commit.author.name)
print(commit.authored_datetime)
print(commit.message)
print(commit.hexsha)
```
@[Context7/tutorial.rst]
#### Pattern 4: Branch Management
```python
from git import Repo
repo = Repo('/path/to/repo')
# List all branches
branches = repo.heads
# Create new branch
new_branch = repo.create_head('feature-branch')
# Checkout branch (safer method)
repo.git.checkout('branch-name')
# Access branch commit
commit = repo.heads.main.commit
```
@[Context7/tutorial.rst]
#### Pattern 5: Traversing Commit History
```python
from git import Repo
repo = Repo('/path/to/repo')
# Iterate through commits
for commit in repo.iter_commits('main', max_count=50):
print(f"{commit.hexsha[:7]}: {commit.summary}")
# Get commits for specific file
commits = repo.iter_commits(paths='specific/file.py')
# Access commit tree and changes
for commit in repo.iter_commits():
for file in commit.stats.files:
print(f"{file} changed in {commit.hexsha[:7]}")
```
@[Context7/tutorial.rst]
## Integration Patterns
### Repository Management Pattern
GitPython provides abstractions for repository operations @[Context7/tutorial.rst]:
- **Repo Object**: Central interface to repository
- **References**: Branches (heads), tags, remotes
- **Index**: Staging area for commits
- **Configuration**: Repository and global Git config access
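A short sketch touching each abstraction; the repository path and file name are illustrative, and `user.email` must be set in the config for the last line to succeed:
```python
from git import Repo

repo = Repo("/path/to/repo")

# References: branches (heads), tags, remotes
branch_names = [head.name for head in repo.heads]
tag_names = [tag.name for tag in repo.tags]
remote_names = [remote.name for remote in repo.remotes]

# Index: stage a file and commit it
repo.index.add(["README.md"])
repo.index.commit("Update README")

# Configuration: read a repository-level config value
reader = repo.config_reader()
user_email = reader.get_value("user", "email")
```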
### Automation Patterns
#### CI/CD Integration
```python
from git import Repo
def deploy_on_commit():
repo = Repo('/app/source')
# Fetch latest changes
origin = repo.remotes.origin
origin.pull()
# Check if deployment needed
if repo.head.commit != last_deployed_commit:
trigger_deployment()
```
#### Repository Analysis
```python
from git import Repo
from collections import defaultdict
def analyze_contributors(repo_path):
repo = Repo(repo_path)
contributions = defaultdict(int)
for commit in repo.iter_commits():
contributions[commit.author.email] += 1
return dict(contributions)
```
#### Automated Tagging
```python
from git import Repo
def create_version_tag(version):
repo = Repo('.')
repo.create_tag(f'v{version}', message=f'Release {version}')
repo.remotes.origin.push(f'v{version}')
```
## Python Version Compatibility
### Verified Compatibility
- **Python 3.7-3.12**: Fully supported and tested @[setup.py]
- **Python 3.13-3.14**: Not explicitly tested but should work (no breaking changes identified)
### Dependency Requirements
GitPython requires @[README.md]:
- `gitdb` package for Git object database operations
- `git` executable (system dependency)
- Compatible with all major operating systems (Linux, macOS, Windows)
### Platform Considerations
- **Windows**: Some limitations noted in Issue #525 @[README.md]
- **Unix-like systems**: Full feature support
- **Git Version**: Requires Git 1.7.x or newer
## Usage Examples from Documentation
### Repository Initialization
```python
from git import Repo
# Initialize working directory repository
repo = Repo("/path/to/repo")
# Initialize bare repository
repo = Repo("/path/to/bare/repo", bare=True)
```
@[Context7/tutorial.rst]
### Working with Commits and Trees
```python
from git import Repo
repo = Repo('.')
# Get latest commit
commit = repo.head.commit
# Access commit tree
tree = commit.tree
# Get tree from repository directly
repo_tree = repo.tree()
# Navigate tree structure
for item in tree:
print(f"{item.type}: {item.name}")
```
@[Context7/tutorial.rst]
### Diffing Operations
```python
from git import Repo
repo = Repo('.')
commit = repo.head.commit
# Diff commit against working tree
diff_worktree = commit.diff(None)
# Diff between commits
prev_commit = commit.parents[0]
diff_commits = prev_commit.diff(commit)
# Iterate through changes
for diff_item in diff_worktree:
print(f"{diff_item.change_type}: {diff_item.a_path}")
```
@[Context7/changes.rst]
### Remote Operations
```python
from git import Repo, RemoteProgress
class ProgressPrinter(RemoteProgress):
def update(self, op_code, cur_count, max_count=None, message=''):
print(f"Progress: {cur_count}/{max_count}")
repo = Repo('/path/to/repo')
origin = repo.remotes.origin
# Fetch with progress
origin.fetch(progress=ProgressPrinter())
# Pull changes
origin.pull()
# Push changes
origin.push()
```
@[Context7/tutorial.rst]
## When NOT to Use GitPython
### Performance-Critical Operations
- **Large repositories**: GitPython can be slow on very large repos
- **Bulk operations**: Consider `git` CLI directly for batch operations
- **Resource-constrained environments**: GitPython can leak resources in long-running processes
### Long-Running Processes
GitPython is **not suited for daemons or long-running processes** @[README.md]:
- Resource leakage issues due to `__del__` method implementations
- Written when destructors (`__del__`) still ran deterministically, which modern Python no longer guarantees
- **Mitigation**: Factor GitPython usage into a separate process that can be restarted periodically
- **Alternative**: Call cleanup methods explicitly when appropriate instead of relying on `__del__`
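A minimal sketch of explicit cleanup in a long-lived process, assuming a recent GitPython where `Repo` exposes `close()` and the context-manager protocol:
```python
from git import Repo


def head_sha(repo_path: str) -> str:
    """Open, read what we need, and release resources promptly."""
    with Repo(repo_path) as repo:  # __exit__ calls repo.close()
        return repo.head.commit.hexsha


# Equivalent without the context manager
repo = Repo("/path/to/repo")
try:
    sha = repo.head.commit.hexsha
finally:
    repo.close()  # free cached resources instead of waiting on __del__
```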
### Simple Git Commands
When you only need simple git operations:
- **Single command execution**: Use `subprocess.run(['git', 'status'])` directly
- **Shell scripting**: Pure git commands may be simpler
- **One-off operations**: GitPython overhead not justified
### Pure Python Requirements
If you cannot have system dependencies:
- GitPython **requires git executable** installed on system
- Consider **Dulwich** (pure-Python Git implementation) instead
## Decision Guidance: GitPython vs Subprocess
### Use GitPython When
| Scenario | Reason |
| ---------------------------- | ---------------------------------------- |
| Complex repository traversal | Object-oriented API simplifies iteration |
| Accessing Git objects | Direct access to trees, blobs, commits |
| Repository analysis | Rich metadata without parsing |
| Cross-platform code | Abstracts platform differences |
| Multiple related operations | Maintains repository context |
| Building repository tools | Higher-level abstractions |
| Need type hints | GitPython provides typed interfaces |
### Use Subprocess When
| Scenario | Reason |
| ------------------------- | -------------------------------------- |
| Single git command | Less overhead |
| Performance critical | Direct execution faster |
| Long-running daemon | Avoid resource leaks |
| Simple automation | Shell script may be clearer |
| Git plumbing commands | Some commands not exposed in GitPython |
| Very large repositories | Lower memory footprint |
| Custom git configurations | Full control over git execution |
### Decision Matrix
```python
# USE GITPYTHON:
# - Iterate commits with filtering
for commit in repo.iter_commits('main', max_count=100):
if commit.author.email == 'specific@email.com':
analyze_commit(commit)
# USE SUBPROCESS:
# - Simple status check
result = subprocess.run(['git', 'status', '--short'],
capture_output=True, text=True)
if 'M' in result.stdout:
print("Modified files detected")
# USE GITPYTHON:
# - Repository state analysis
if repo.is_dirty(untracked_files=True):
staged = repo.index.diff("HEAD")
unstaged = repo.index.diff(None)
# USE SUBPROCESS:
# - Performance-critical bulk operation
subprocess.run(['git', 'gc', '--aggressive'])
```
## Critical Limitations
### Resource Leakage @[README.md]
GitPython tends to leak system resources in long-running processes:
- Destructors (`__del__`) no longer run deterministically in modern Python
- Call cleanup methods explicitly or isolate GitPython in a separate, restartable process
- Not recommended for daemon applications
### Windows Support @[README.md]
Known limitations on Windows platform:
- See Issue #525 for details
- Some operations may behave differently
### Git Executable Dependency @[README.md]
GitPython requires git to be installed:
- Must be in PATH or specified via `GIT_PYTHON_GIT_EXECUTABLE` environment variable
- Cannot work in pure-Python environments
- Version requirement: Git 1.7.x or newer
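If git is not on PATH, a small sketch using the environment variable named above; this assumes the variable is set before `git` is first imported, and the path is illustrative:
```python
import os

# Must run before the first `import git`, or the default PATH lookup wins
os.environ["GIT_PYTHON_GIT_EXECUTABLE"] = "/opt/git/bin/git"

from git import Repo  # noqa: E402 - import deliberately follows the env setup

repo = Repo("/path/to/repo")
```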
## Installation
### Standard Installation
```bash
pip install GitPython
```
### Development Installation
```bash
git clone https://github.com/gitpython-developers/GitPython
cd GitPython
./init-tests-after-clone.sh
pip install -e ".[test]"
```
@[README.md]
## Testing and Quality
### Running Tests
```bash
# Install test dependencies
pip install -e ".[test]"
# Run tests
pytest
# Run linting
pre-commit run --all-files
# Type checking
mypy
```
@[README.md]
### Configuration
- Test configuration in `pyproject.toml`
- Supports pytest, coverage.py, ruff, mypy
- CI via GitHub Actions and tox
## Community and Support
### Getting Help
- **Documentation**: <https://gitpython.readthedocs.io/>
- **Stack Overflow**: Use `gitpython` tag @[README.md]
- **Issue Tracker**: <https://github.com/gitpython-developers/GitPython/issues>
### Contributing
- Project accepts contributions of all kinds
- Seeking new maintainers
- Response time: up to 1 month for issues @[README.md]
### Related Projects
- **Gitoxide**: Rust implementation of Git by original GitPython author @[README.md]
- **Dulwich**: Pure-Python Git implementation
- **PyDriller**: Framework for mining software repositories built on GitPython
## Summary
GitPython provides a mature, well-documented Python interface to Git repositories. While in maintenance mode, it remains widely used and community-supported. Best suited for repository analysis, automation, and tools where the convenience of object-oriented access outweighs performance concerns. For simple operations or long-running processes, consider subprocess or alternative approaches.
**Key Takeaway**: Use GitPython when the complexity of repository operations justifies the abstraction layer and resource overhead. Use subprocess for simple, one-off git commands or in resource-sensitive environments.

@@ -0,0 +1,569 @@
---
title: "Arrow - Better Dates & Times for Python"
library_name: arrow
pypi_package: arrow
category: datetime
python_compatibility: "3.8+"
last_updated: "2025-11-02"
official_docs: "https://arrow.readthedocs.io"
official_repository: "https://github.com/arrow-py/arrow"
maintenance_status: "active"
---
# Arrow - Better Dates & Times for Python
## Core Purpose
Arrow provides a sensible, human-friendly approach to creating, manipulating, formatting, and converting dates, times, and timestamps. It addresses critical usability problems in Python's standard datetime ecosystem:
**Problems Arrow Solves:**
- **Module fragmentation**: Eliminates the need to import datetime, time, calendar, dateutil, pytz separately
- **Type complexity**: Provides a single Arrow type instead of managing date, time, datetime, tzinfo, timedelta, relativedelta
- **Timezone verbosity**: Simplifies timezone-aware operations that are cumbersome with standard library
- **Missing functionality**: Built-in ISO 8601 parsing, humanization, and time span operations
- **Timezone naivety**: UTC-aware by default, preventing common timezone bugs
Arrow is a **drop-in replacement for datetime** that consolidates scattered tools into a unified, elegant interface.
## When to Use Arrow
### Use Arrow When
1. **Building user-facing applications** that display relative times ("2 hours ago", "in 3 days")
2. **Working extensively with timezones** - converting between zones, handling DST transitions
3. **Parsing diverse datetime formats** - ISO 8601, timestamps, custom formats
4. **Need cleaner, more readable code** - Arrow's chainable API reduces boilerplate
5. **Generating time ranges or spans** - iterate over hours, days, weeks, months
6. **Internationalization is required** - 75+ locale support for humanized output
7. **API development** where timezone-aware timestamps are standard
8. **Data processing pipelines** that handle datetime transformations frequently
### Use Standard datetime When
1. **Performance is absolutely critical** - Arrow is ~50% slower than datetime.utcnow() @benchmark
2. **Minimal datetime operations** - simple date storage with no manipulation
3. **Library compatibility requirements** mandate standard datetime objects
4. **Memory-constrained environments** - datetime objects have smaller footprint
5. **Working within pandas/numpy** which have optimized datetime64 types
6. **No timezone logic needed** and you're comfortable with datetime's API
## Real-World Usage Patterns
### Pattern 1: Timezone-Aware Timestamp Creation
@source: <https://arrow.readthedocs.io/en/latest/>
```python
import arrow
# Get current time in UTC (default)
utc = arrow.utcnow()
# <Arrow [2013-05-11T21:23:58.970460+00:00]>
# Get current time in specific timezone
local = arrow.now('US/Pacific')
# <Arrow [2013-05-11T13:23:58.970460-07:00]>
# Convert between timezones effortlessly
utc_time = arrow.utcnow()
tokyo_time = utc_time.to('Asia/Tokyo')
ny_time = tokyo_time.to('America/New_York')
```
**Why this matters:** Standard datetime requires verbose pytz.timezone() calls and manual localization. Arrow handles this in one method.
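For reference, a stdlib-only sketch of the same conversions with `zoneinfo` (Python 3.9+); it works, but naive datetimes still need an explicit `tzinfo` before converting:
```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

utc_time = datetime.now(timezone.utc)                          # aware "now" in UTC
tokyo_time = utc_time.astimezone(ZoneInfo("Asia/Tokyo"))       # convert zone
ny_time = tokyo_time.astimezone(ZoneInfo("America/New_York"))

# A naive datetime must be made aware explicitly before converting
naive = datetime(2024, 5, 11, 13, 0)
aware = naive.replace(tzinfo=ZoneInfo("US/Pacific"))
```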
### Pattern 2: Parsing Diverse Formats
@source: Context7 documentation snippets
```python
import arrow
# Parse ISO 8601 automatically
arrow.get('2013-05-11T21:23:58.970460+07:00')
# Parse with format string
arrow.get('2013-05-05 12:30:45', 'YYYY-MM-DD HH:mm:ss')
# Parse Unix timestamps (int or float)
arrow.get(1368303838)
arrow.get(1565358758.123413)
# Parse with timezone
arrow.get('2013-05-11T21:23:58', tzinfo='Europe/Paris')
# Handle inconsistent spacing
arrow.get('Jun 1 2005 1:33PM', 'MMM D YYYY H:mmA', normalize_whitespace=True)
# Parse ISO week dates
arrow.get('2013-W29-6', 'W') # Year-Week-Day format
```
**Why this matters:** datetime.strptime() requires exact format matching. Arrow intelligently handles variations and timezone strings directly.
### Pattern 3: Humanization for User Interfaces
@source: <https://arrow.readthedocs.io/en/latest/>
```python
import arrow
now = arrow.utcnow()
past = now.shift(hours=-1)
future = now.shift(days=3, hours=2)
# English humanization
past.humanize() # 'an hour ago'
future.humanize() # 'in 3 days'
# Localized humanization (75+ locales)
past.humanize(locale='ko-kr') # '한시간 전'
past.humanize(locale='es') # 'hace una hora'
# Multiple granularities
later = arrow.utcnow().shift(hours=2, minutes=19)
later.humanize(granularity=['hour', 'minute'])
# 'in 2 hours and 19 minutes'
# Quarter granularity (business applications)
four_months = now.shift(months=4)
four_months.humanize(granularity='quarter') # 'in a quarter'
```
**Why this matters:** Building this with datetime requires third-party libraries or manual logic. Arrow includes it with extensive locale support.
### Pattern 4: Time Shifting and Manipulation
@source: Context7 documentation snippets
```python
import arrow
now = arrow.utcnow()
# Relative shifts (chainable)
future = now.shift(years=1, months=-2, days=5, hours=3)
past = now.shift(weeks=-2)
# Dehumanize - parse human phrases
base = arrow.get('2020-05-27 10:30:35')
base.dehumanize('8 hours ago')
base.dehumanize('in 4 days')
base.dehumanize('hace 2 años', locale='es') # Spanish: "2 years ago"
# Replace specific components
now.replace(hour=0, minute=0, second=0) # Start of day
now.replace(year=2025)
```
**Why this matters:** timedelta cannot shift by months or years, and dateutil.relativedelta is verbose. Arrow combines both behaviors behind one intuitive `shift()` API.
### Pattern 5: Time Ranges and Spans
@source: Context7 documentation snippets
```python
import arrow
# Generate time ranges
start = arrow.get(2020, 5, 5, 12, 30)
end = arrow.get(2020, 5, 5, 17, 15)
# Iterate by hour
for hour in arrow.Arrow.range('hour', start, end):
print(hour)
# Get floor and ceiling (span)
now = arrow.utcnow()
now.span('hour') # Returns (floor, ceiling) tuple
now.floor('hour') # Start of current hour
now.ceil('day') # End of current day
# Span ranges - generate (start, end) tuples
for span in arrow.Arrow.span_range('hour', start, end):
floor, ceiling = span
print(f"Hour from {floor} to {ceiling}")
# Handle DST transitions correctly
before_dst = arrow.get('2018-03-10 23:00:00', tzinfo='US/Pacific')
after_dst = arrow.get('2018-03-11 04:00:00', tzinfo='US/Pacific')
for t in arrow.Arrow.range('hour', before_dst, after_dst):
print(f"{t} (UTC: {t.to('UTC')})")
```
**Why this matters:** Standard datetime has no built-in iteration. Arrow handles DST transitions automatically in ranges.
### Pattern 6: Formatting with Built-in Constants
@source: Context7 documentation snippets
```python
import arrow
arw = arrow.utcnow()
# Use predefined format constants
arw.format(arrow.FORMAT_ATOM) # '2020-05-27 10:30:35+00:00'
arw.format(arrow.FORMAT_COOKIE) # 'Wednesday, 27-May-2020 10:30:35 UTC'
arw.format(arrow.FORMAT_RFC3339) # '2020-05-27 10:30:35+00:00'
arw.format(arrow.FORMAT_W3C) # '2020-05-27 10:30:35+00:00'
# Custom formats with tokens
arw.format('YYYY-MM-DD HH:mm:ss ZZ') # '2020-05-27 10:30:35 +00:00'
# Escape literal text in formats
arw.format('YYYY-MM-DD h [h] m') # '2020-05-27 10 h 30'
# Timestamp formats
arw.format('X') # '1590577835' (seconds)
arw.format('x') # '1590577835123456' (microseconds)
```
**Why this matters:** datetime.strftime() uses different token syntax (%Y vs YYYY). Arrow uses consistent, JavaScript-inspired tokens.
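A quick side-by-side of the two token styles producing the same output:
```python
from datetime import datetime, timezone

import arrow

moment = arrow.utcnow()
moment.format("YYYY-MM-DD HH:mm")                       # Arrow-style tokens
moment.datetime.strftime("%Y-%m-%d %H:%M")              # same result via strftime tokens
datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M")   # plain datetime equivalent
```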
## Integration Patterns
### Works seamlessly with
**python-dateutil** (required dependency >=2.7.0)
- Arrow uses dateutil.parser internally for flexible parsing
- Timezone objects from dateutil are directly compatible
**pytz** (optional, for Python <3.9)
- Arrow accepts pytz timezone objects in `to()` and `tzinfo` parameters
- Handles pytz's DST quirks automatically
**zoneinfo** (Python 3.9+, via backports.zoneinfo on 3.8)
- Arrow supports ZoneInfo timezone objects natively
- Uses tzdata package on Python 3.9+ for timezone database
**datetime** (standard library)
- `arrow_obj.datetime` returns standard datetime object
- `arrow.get(datetime_obj)` creates Arrow from datetime
- Arrow mirrors the datetime interface but is not a `datetime` subclass; pass `.datetime` wherever a real datetime object is required
**pandas** (data analysis)
```python
import arrow
import pandas as pd
# Convert Arrow to pandas Timestamp
arrow_time = arrow.utcnow()
pd.Timestamp(arrow_time.datetime)
# Or use Arrow for timezone-aware operations before pandas
df['timestamp'] = df['utc_string'].apply(lambda x: arrow.get(x).to('US/Pacific').datetime)
```
**Django/Flask** (web frameworks)
```python
# Django models - store as DateTimeField
from django.db import models
import arrow
class Event(models.Model):
created_at = models.DateTimeField()
def save(self, *args, **kwargs):
self.created_at = arrow.utcnow().datetime # Convert to datetime
super().save(*args, **kwargs)
@property
def created_humanized(self):
return arrow.get(self.created_at).humanize()
```
## Python Version Compatibility
@source: <https://github.com/arrow-py/arrow/blob/master/pyproject.toml>
- **Minimum:** Python 3.8
- **Tested versions:** 3.8, 3.9, 3.10, 3.11, 3.12, 3.13, 3.14
- **Status:** Production/Stable across all supported versions
**Version-specific notes:**
- **Python 3.8**: Requires `backports.zoneinfo==0.2.1` for timezone support
- **Python 3.9+**: Uses built-in `zoneinfo` and `tzdata` package
- **Python 3.6-3.7**: No longer supported as of Arrow 1.3.0 (EOL Python versions)
**Dependencies:**
```toml
python-dateutil>=2.7.0
backports.zoneinfo==0.2.1 # Python <3.9 only
tzdata # Python >=3.9 only
```
## Installation
```bash
# Basic installation
pip install -U arrow
# With uv (recommended)
uv pip install arrow
# In pyproject.toml
[project]
dependencies = [
"arrow>=1.3.0",
]
```
## When NOT to Use Arrow
### Scenario 1: High-Performance Timestamp Generation
@source: <https://www.dataroc.ca/blog/most-performant-timestamp-functions-python>
**Performance benchmarks show:**
- `time.time()`: Baseline (fastest)
- `datetime.utcnow()`: ~50% slower than time.time()
- Arrow operations: Additional overhead for object wrapping
**Use datetime when:** You're generating millions of timestamps in tight loops (e.g., high-frequency trading, real-time analytics pipelines).
```python
# High-performance scenario - use standard library
import time
timestamp = time.time() # Fastest for epoch timestamps
import datetime
dt = datetime.datetime.utcnow()  # Faster for datetime objects (deprecated since Python 3.12; prefer datetime.datetime.now(datetime.timezone.utc))
```
### Scenario 2: Working with Pandas/NumPy DateTime
@source: Performance analysis and library comparisons
Pandas has highly optimized `datetime64` vectorized operations. Arrow's object-oriented approach doesn't vectorize well.
**Use pandas when:** Processing large datasets with datetime columns.
```python
import pandas as pd
# Pandas is optimized for this
df['date'] = pd.to_datetime(df['date_string'])
df['hour'] = df['date'].dt.hour # Vectorized operation
# Arrow would require row-by-row operations (slow)
# df['hour'] = df['date'].apply(lambda x: arrow.get(x).hour)
```
### Scenario 3: Simple Date Storage
**Use datetime when:** You only need to store dates with no manipulation:
```python
from datetime import datetime, timezone
# Simple storage - datetime is sufficient
user.created_at = datetime.now(timezone.utc)  # utcnow() is deprecated since Python 3.12
```
### Scenario 4: Library Compatibility Constraints
Some libraries explicitly require genuine `datetime` instances and will not accept Arrow objects in their place. Always test compatibility and pass `.datetime` where a real datetime is required.
### Scenario 5: Memory-Constrained Environments
Arrow objects carry additional overhead. For millions of cached datetime objects, standard datetime is lighter.
## Decision Matrix
| Requirement | Arrow | datetime | Notes |
| ---------------------- | ----- | -------- | --------------------------------------------------- |
| Timezone conversion | ✓✓✓ | ✓ | Arrow: one-line. datetime: verbose with pytz |
| ISO 8601 parsing | ✓✓✓ | ✓✓ | Arrow: automatic. datetime: fromisoformat() limited |
| Humanization | ✓✓✓ | ✗ | Arrow: built-in with 75+ locales |
| Time ranges/iteration | ✓✓✓ | ✗ | Arrow: native. datetime: manual loops |
| Performance (creation) | ✓✓ | ✓✓✓ | datetime ~50% faster |
| Performance (parsing) | ✓✓ | ✓✓✓ | datetime.strptime() faster |
| Memory footprint | ✓✓ | ✓✓✓ | datetime objects lighter |
| Learning curve | ✓✓✓ | ✓✓ | Arrow: more intuitive |
| Pandas integration | ✓ | ✓✓✓ | Use pandas.Timestamp for large data |
| Standard library | ✗ | ✓✓✓ | Arrow: requires installation |
| Type hints | ✓✓✓ | ✓✓✓ | Both have full PEP 484 support |
| DST handling | ✓✓✓ | ✓✓ | Arrow: automatic. datetime: manual |
**Legend:** ✓✓✓ Excellent | ✓✓ Good | ✓ Adequate | ✗ Not supported
## Quick Decision Guide
```text
START: Do you need datetime functionality?
|
├─ Is performance critical? (>100k ops/sec)
| └─ YES → Use datetime or time.time()
|
├─ Working with pandas/numpy large datasets?
| └─ YES → Use pandas.Timestamp
|
├─ Need any of: humanization, easy timezone conversion, time ranges, multi-locale?
| └─ YES → Use Arrow
|
├─ Simple date storage only?
| └─ YES → Use datetime
|
└─ Building user-facing application with datetime logic?
└─ YES → Use Arrow (cleaner code, better UX)
```
## Common Gotchas and Solutions
### Gotcha 1: Arrow is timezone-aware by default
@source: Arrow documentation
```python
# Parsing and construction default to UTC, never to naive local time
arrow.get('2020-05-27 10:30:35')  # <Arrow [2020-05-27T10:30:35+00:00]>
# arrow.now() is local time; be explicit about the zone you want
arrow.now('America/New_York')
arrow.get('2020-05-27 10:30:35', tzinfo='US/Pacific')
```
### Gotcha 2: Converting to datetime loses Arrow methods
```python
arrow_time = arrow.utcnow()
dt = arrow_time.datetime # Now a standard datetime object
# This works
arrow_time.humanize() # ✓
# This fails
dt.humanize() # ✗ AttributeError
```
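Re-wrapping the plain datetime restores Arrow's methods when they are needed again:
```python
import arrow

arrow_time = arrow.utcnow()
dt = arrow_time.datetime   # standard datetime for libraries that need one
arrow.get(dt).humanize()   # wrap back when Arrow features are needed again
```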
### Gotcha 3: Timestamp parsing requires format token in 0.15.0+
@source: Context7 CHANGELOG snippets
```python
# Deprecated (pre-0.15.0)
arrow.get("1565358758") # ✗ No longer works
# Correct (0.15.0+)
arrow.get("1565358758", "X") # ✓ Explicit format token
arrow.get(1565358758) # ✓ Or pass as int/float directly
```
### Gotcha 4: Ambiguous datetimes during DST transitions
@source: Context7 documentation
```python
# During DST "fall back", 2 AM occurs twice
# Use fold parameter (PEP 495)
ambiguous = arrow.Arrow(2017, 10, 29, 2, 0, tzinfo='Europe/Stockholm')
ambiguous.fold # 0 (first occurrence)
# Specify which occurrence
second_occurrence = arrow.Arrow(2017, 10, 29, 2, 0, tzinfo='Europe/Stockholm', fold=1)
```
## Alternatives Comparison
@source: <https://python.libhunt.com/arrow-alternatives>, <https://aboutsimon.com/blog/2016/08/04/datetime-vs-Arrow-vs-Pendulum-vs-Delorean-vs-udatetime.html>
**Pendulum** (arrow alternative)
- Similar goals: human-friendly datetime
- Better timezone handling in some edge cases
- Slower than Arrow in benchmarks
- Less widely adopted (fewer GitHub stars)
**Maya** (Datetimes for Humans)
- Simpler API, fewer features
- Less actively maintained
- Good for very basic use cases
**udatetime** (performance-focused)
- Written in C for speed (faster than datetime)
- Limited feature set (encode/decode only)
- Use when you need Arrow-like simplicity with datetime-like speed
**Standard datetime** (built-in)
- Always available, no dependencies
- Verbose but performant
- Use when Arrow features aren't needed
**dateutil** (datetime extension)
- Powerful parser, relativedelta for arithmetic
- Often used with datetime for enhanced functionality
- Arrow uses dateutil internally
## Real-World Example Projects
@source: GitHub search results
**arrow-py/arrow** (8,944 stars)
- Official repository with comprehensive examples
- <https://github.com/arrow-py/arrow>
**Common usage in web applications:**
```python
# API endpoint returning human-readable timestamps
from flask import jsonify
import arrow
@app.route('/events')
def get_events():
events = Event.query.all()
return jsonify([{
'id': e.id,
'name': e.name,
'created': arrow.get(e.created_at).humanize(),
'start_time': arrow.get(e.start_time).format('YYYY-MM-DD HH:mm ZZ')
} for e in events])
```
**Data processing pipelines:**
```python
import arrow
def process_log_file(log_path):
with open(log_path) as f:
for line in f:
# Parse diverse timestamp formats
timestamp_str = extract_timestamp(line)
timestamp = arrow.get(timestamp_str, normalize_whitespace=True)
# Convert to consistent timezone
utc_time = timestamp.to('UTC')
# Filter by time range
if utc_time >= arrow.get('2025-01-01'):
yield utc_time, line
```
## References and Sources
- @official_docs: <https://arrow.readthedocs.io/en/latest/>
- @repository: <https://github.com/arrow-py/arrow>
- @pypi: <https://pypi.org/project/arrow/>
- @context7: /arrow-py/arrow
- @changelog: <https://github.com/arrow-py/arrow/blob/master/CHANGELOG.rst>

**Performance analysis:**

- @benchmark: <https://www.dataroc.ca/blog/most-performant-timestamp-functions-python>
- @comparison: <https://aboutsimon.com/blog/2016/08/04/datetime-vs-Arrow-vs-Pendulum-vs-Delorean-vs-udatetime.html>

**Community resources:**

- @alternatives: <https://python.libhunt.com/arrow-alternatives>
- @tutorial: <https://code.tutsplus.com/arrow-for-better-date-and-time-in-python--cms-29624t>
- @guide: <https://stackabuse.com/working-with-datetime-in-python-with-arrow/>
## Summary
Arrow eliminates datetime friction by consolidating Python's fragmented date/time ecosystem into a single, intuitive API. Use it when developer experience and feature richness matter more than raw performance. For high-frequency operations or pandas-scale data processing, stick with the standard library or specialized tools. Arrow shines in web applications, APIs, CLI tools, and any code where humans read the timestamps.
**The reinvented wheel:** Without Arrow, you'd manually implement timezone conversion helpers, humanization logic, flexible parsing, and time range iteration using datetime + dateutil + pytz + custom code. Arrow packages these common patterns into a production-ready library.

@@ -0,0 +1,490 @@
---
title: "attrs: Python Classes Without Boilerplate"
library_name: attrs
pypi_package: attrs
category: dataclasses
python_compatibility: "3.9+"
last_updated: "2025-11-02"
official_docs: "https://www.attrs.org"
official_repository: "https://github.com/python-attrs/attrs"
maintenance_status: "active"
---
# attrs: Python Classes Without Boilerplate
## Core Purpose
attrs eliminates the drudgery of implementing object protocols (dunder methods) by automatically generating `__init__`, `__repr__`, `__eq__`, `__hash__`, and other common methods. It predates Python's built-in dataclasses (which was inspired by attrs) and offers more features and flexibility.
**What problem does it solve?**
- Removes repetitive boilerplate code for class definitions
- Provides declarative attribute definitions with validation and conversion
- Offers slots, frozen instances, and performance optimizations
- Enables consistent, correct implementations of comparison and hashing
**This prevents "reinventing the wheel" by:**
- Auto-generating special methods that are error-prone to write manually
- Providing battle-tested validators and converters
- Handling edge cases in equality, hashing, and immutability correctly
- Offering extensibility through field transformers and custom setters
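As a rough sketch of the boilerplate being generated, here is a hand-written two-field class next to its `@define` equivalent (abridged to `__init__`, `__repr__`, and `__eq__`):
```python
from attrs import define


class ManualPoint:
    """Hand-written equivalent of what attrs generates (abridged)."""

    def __init__(self, x: int, y: int) -> None:
        self.x = x
        self.y = y

    def __repr__(self) -> str:
        return f"ManualPoint(x={self.x}, y={self.y})"

    def __eq__(self, other: object) -> bool:
        if other.__class__ is not self.__class__:
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)


@define
class Point:  # the three methods above (and more) are generated automatically
    x: int
    y: int
```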
## Official Information
- **Repository**: <https://github.com/python-attrs/attrs> (@source: python-attrs/attrs on GitHub)
- **PyPI Package**: `attrs` (current version: 25.4.0) (@source: <https://pypi.org/project/attrs/>)
- **Documentation**: <https://www.attrs.org/> (@source: official docs)
- **License**: MIT
- **Maintenance**: Active development, trusted by NASA for Mars missions since 2020 (@source: attrs README)
## Python Version Compatibility
- **Minimum**: Python 3.9+ (@source: PyPI metadata)
- **Maximum**: Python 3.14 (tested and supported)
- **PyPy**: Fully supported
- **Feature notes**:
- Supports slots by default in modern API (`@define`)
- Works with all mainstream Python versions including PyPy
- Implements cell rewriting for `super()` calls in slotted classes
- Compatible with `functools.cached_property` on slotted classes
## Installation
```bash
pip install attrs
```
For serialization/deserialization support:
```bash
pip install attrs cattrs
```
## Core Usage Patterns
### 1. Basic Class Definition (Modern API)
```python
from attrs import define, field
@define
class Point:
x: int
y: int
# Automatically generates __init__, __repr__, __eq__, etc.
p = Point(1, 2)
print(p) # Point(x=1, y=2)
print(p == Point(1, 2)) # True
```
(@source: Context7 /python-attrs/attrs documentation, attrs README)
### 2. Default Values and Factories
```python
from attrs import define, field, Factory
@define
class SomeClass:
a_number: int = 42
list_of_numbers: list[int] = Factory(list)
# Factory prevents mutable default gotchas
sc1 = SomeClass()
sc2 = SomeClass()
sc1.list_of_numbers.append(1)
print(sc2.list_of_numbers) # [] - separate instances
```
(@source: attrs README, Context7 documentation examples)
### 3. Validators
```python
from attrs import define, field, validators
@define
class User:
email: str = field(validator=validators.matches_re(
r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$"
))
age: int = field(validator=[
validators.instance_of(int),
validators.ge(0),
validators.lt(150)
])
# Custom validator with decorator
@define
class BoundedValue:
x: int = field()
y: int
@x.validator
def _check_x(self, attribute, value):
if value >= self.y:
raise ValueError("x must be smaller than y")
```
(@source: Context7 /python-attrs/attrs validators documentation)
### 4. Converters
```python
from attrs import define, field, converters
@define
class C:
x: int = field(converter=int)
c = C("42")
print(c.x) # 42 (converted from string)
# Optional converter
@define
class OptionalInt:
x: int | None = field(converter=converters.optional(int))
OptionalInt(None) # Valid
OptionalInt("42") # Converts to 42
```
(@source: Context7 /python-attrs/attrs converters documentation)
### 5. Frozen (Immutable) Classes
```python
from attrs import frozen, field
@frozen
class Coordinates:
x: int
y: int
c = Coordinates(1, 2)
# c.x = 3 # Raises FrozenInstanceError
# Post-init with frozen classes
@frozen
class FrozenWithDerived:
x: int
y: int = field(init=False)
def __attrs_post_init__(self):
# Must use object.__setattr__ for frozen classes
object.__setattr__(self, "y", self.x + 1)
```
(@source: Context7 /python-attrs/attrs frozen documentation)
### 6. Slots for Performance
```python
from attrs import define
# Slots enabled by default with @define
@define
class SlottedClass:
x: int
y: int
# More memory efficient, faster attribute access
# Cannot add attributes not defined in class
```
(@source: Context7 /python-attrs/attrs slots documentation, attrs glossary)
### 7. Without Type Annotations
```python
from attrs import define, field
@define
class NoAnnotations:
a_number = field(default=42)
list_of_numbers = field(factory=list)
```
(@source: attrs README)
## Real-World Examples
### Example Projects Using attrs
1. **Black** - The uncompromising Python code formatter
- Repository: <https://github.com/psf/black>
- Usage: Extensive use of attrs for AST node classes (@source: GitHub search)
2. **cattrs** - Composable custom class converters
- Repository: <https://github.com/python-attrs/cattrs>
- Usage: Built on top of attrs for serialization/deserialization (@source: python-attrs/cattrs)
3. **Eradiate** - Radiative transfer model
- Repository: <https://github.com/eradiate/eradiate>
- Usage: Scientific computing with validated data structures (@source: GitHub code search)
### Common Patterns from Real Code
**Pattern 1: Deep validation for nested structures**
```python
from attrs import define, field, validators
@define
class Measurement:
tags: dict = field(
validator=validators.deep_mapping(
key_validator=validators.not_(
validators.in_({"id", "time", "source"}),
msg="reserved tag key"
),
value_validator=validators.instance_of((str, int))
)
)
```
(@source: Context7 /python-attrs/attrs deep_mapping validator documentation)
**Pattern 2: Custom comparison for special types**
```python
import numpy as np
from attrs import define, field, cmp_using
@define
class ArrayContainer:
data: np.ndarray = field(eq=cmp_using(eq=np.array_equal))
```
(@source: Context7 /python-attrs/attrs comparison documentation)
**Pattern 3: Hiding sensitive data in repr**
```python
from attrs import define, field
@define
class User:
username: str
password: str = field(repr=lambda value: '***')
User("admin", "secret123")
# Output: User(username='admin', password=***)
```
(@source: Context7 /python-attrs/attrs examples)
## Integration Patterns
### With cattrs for Serialization
```python
from attrs import define
from cattrs import structure, unstructure
@define
class Person:
name: str
age: int
# Serialize to dict
data = unstructure(Person("Alice", 30))
# {'name': 'Alice', 'age': 30}
# Deserialize from dict
person = structure({"name": "Bob", "age": 25}, Person)
```
(@source: python-attrs/cattrs repository, Context7 cattrs documentation)
### Field Transformers for Advanced Use Cases
```python
from attrs import define, frozen, field
from datetime import datetime
def auto_convert_datetime(cls, fields):
results = []
for f in fields:
if f.converter is not None:
results.append(f)
continue
if f.type in {datetime, 'datetime'}:
converter = lambda d: datetime.fromisoformat(d) if isinstance(d, str) else d
else:
converter = None
results.append(f.evolve(converter=converter))
return results
@frozen(field_transformer=auto_convert_datetime)
class Event:
name: str
timestamp: datetime
# Automatically converts ISO strings to datetime
event = Event(name="deploy", timestamp="2025-10-21T10:00:00")
```
(@source: Context7 /python-attrs/attrs field_transformer documentation)
## When to Use attrs
### Use attrs when
- You want more features than dataclasses provide
- You need robust validation and conversion
- You require frozen/immutable instances with complex post-init
- You want extensibility (field transformers, custom setters)
- You need to support Python 3.9+ with modern features
- Performance matters (slots optimization)
- You want better debugging experience (cell rewriting for super())
- You prefer a mature, battle-tested library (used by NASA)
### Use dataclasses when
- You need stdlib-only solution (no dependencies)
- Your use case is simple (basic data containers)
- You don't need validators or converters
- You're comfortable with limited customization
- You only support Python 3.10+ (for slots with super())
### Use Pydantic when
- You need runtime type validation (attrs validates on-demand)
- You're building APIs with automatic schema generation
- You need JSON Schema / OpenAPI integration
- You want coercion-heavy validation (Pydantic is more aggressive)
- You need ORM-like features
## Decision Matrix
| Feature | attrs | dataclasses | Pydantic |
| ------------------ | ------------------- | ------------- | --------------------- |
| **Validators** | Extensive | Manual only | Automatic + extensive |
| **Converters** | Built-in | Manual only | Automatic coercion |
| **Slots** | Default in @define | 3.10+ only | Optional |
| **Frozen** | Full support | Basic support | Via Config |
| **Performance** | Fast (slots) | Fast | Slower (validation) |
| **Type coercion** | Opt-in | No | Automatic |
| **Dependencies** | Zero | Zero (stdlib) | Multiple |
| **Extensibility** | High (transformers) | Limited | Medium |
| **Python support** | 3.9+ | 3.7+ | 3.8+ |
| **Schema export** | Via cattrs | No | Built-in |
| **API stability** | Very stable | Stable | Evolving |
(@source: Context7 /python-attrs/attrs comparison with dataclasses, research from comparison articles)
## When NOT to Use
1. **Simple data containers without validation**
- If you just need `__init__` and `__repr__`, dataclasses suffice
- Example: Simple config objects, DTOs without business logic
2. **When you need JSON Schema / OpenAPI integration**
- Pydantic provides this out-of-the-box
- attrs requires additional libraries (cattrs + schema generators)
3. **Heavy runtime type validation requirements**
- Pydantic validates automatically; attrs requires explicit validators
- If every field needs type checking at runtime, Pydantic is more convenient
4. **No external dependencies allowed**
- Use dataclasses from stdlib
- Though attrs has zero dependencies itself
5. **Working with ORMs requiring specific metaclasses**
- Some ORMs conflict with attrs' class generation
- Check compatibility before adopting
## Performance Characteristics
- **Slots**: Enabled by default in `@define`, reducing memory overhead (~40-50% less memory); see the sketch below
- **Frozen classes**: Slightly slower instantiation due to immutability checks
- **Validation**: Only runs when explicitly called via `attrs.validate()` or during `__init__`
- **Comparison**: Generated methods are as fast as hand-written equivalents
(@source: Context7 /python-attrs/attrs performance benchmarks)
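A quick way to confirm the slots behavior described above (a minimal sketch; the class name is illustrative):
```python
from attrs import define
@define
class Point:
    x: int
    y: int
p = Point(1, 2)
print(hasattr(Point, "__slots__"))  # True - @define generates a slotted class
print(hasattr(p, "__dict__"))       # False - attributes live in slots, not a per-instance dict
# p.z = 3  # would raise AttributeError: slotted classes reject undeclared attributes
```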
## Common Gotchas
1. **Mutable defaults**: Always use `Factory` for mutable defaults (illustrated in the sketch after this list)
2. **Frozen post-init**: Must use `object.__setattr__` in `__attrs_post_init__`
3. **Slots and dynamic attributes**: Cannot add attributes not defined in class
4. **Pickling slotted classes**: Attributes with `init=False` must be set before pickling
5. **Validator order**: Converters run before validators
(@source: Context7 /python-attrs/attrs documentation, glossary)
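A minimal sketch of gotchas 1 and 5 (the class and field names here are illustrative):
```python
from attrs import Factory, define, field, validators
@define
class Basket:
    # Gotcha 5: the converter runs first, so "3" becomes 3 before ge(0) validates it
    limit: int = field(converter=int, validator=validators.ge(0))
    # Gotcha 1: Factory gives each instance its own list
    items: list[str] = Factory(list)
b1, b2 = Basket(limit="3"), Basket(limit=5)
b1.items.append("apple")
print(b1.limit)   # 3 - converted from str before validation
print(b2.items)   # [] - separate list per instance
```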
## Migration Path
### From dataclasses to attrs
```python
# Before (dataclass)
from dataclasses import dataclass
@dataclass
class Point:
x: int
y: int = 0
# After (attrs)
from attrs import define
@define
class Point:
x: int
y: int = 0
```
Minimal changes required; attrs is largely a drop-in replacement with more features.
### From Pydantic to attrs
```python
# Before (Pydantic)
from pydantic import BaseModel, validator
class User(BaseModel):
name: str
age: int
@validator('age')
def check_age(cls, v):
if v < 0:
raise ValueError('age must be positive')
return v
# After (attrs + cattrs for serialization)
from attrs import define, field, validators
@define
class User:
name: str
age: int = field(validator=[
validators.instance_of(int),
validators.ge(0)
])
```
Note: Pydantic validates automatically whenever a model is parsed; with attrs, the validators you attach run during `__init__` (and on assignment with `@define`), and re-validating an existing instance requires an explicit call.
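A brief sketch of where validation fires with attrs (assuming the `User` class above and the `attrs.validate()` helper for explicit re-validation):
```python
import attrs
user = User("Carol", 30)   # validators run automatically inside the generated __init__
try:
    User("Dave", -1)       # validators.ge(0) raises ValueError
except ValueError as exc:
    print(exc)
attrs.validate(user)        # explicit re-validation of an existing instance
```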
## Additional Resources
- **Official Tutorial**: <https://www.attrs.org/en/stable/examples.html>
- **Extensions**: <https://github.com/python-attrs/attrs/wiki/Extensions-to-attrs>
- **Comparison with dataclasses**: <https://www.attrs.org/en/stable/why.html#data-classes>
- **attrs-strict**: Runtime type validation extension (@source: attrs wiki)
- **Stack Overflow tag**: `python-attrs`
## Conclusion
attrs is the mature, feature-rich choice for defining classes in Python. It predates dataclasses, offers significantly more functionality, and maintains excellent performance through slots optimization. Choose attrs when you need validators, converters, extensibility, or when building production systems requiring robust data structures. It's the foundation used by major projects like Black and is trusted by NASA for critical missions.
For simple cases, dataclasses may suffice. For API validation and schema generation, Pydantic excels. But for general-purpose class definition with powerful features and minimal dependencies, attrs is the gold standard.
---
**Research methodology**: Information gathered from official documentation (attrs.org), PyPI metadata, GitHub repository analysis, Context7 code examples, and comparison with alternative libraries. All sources are cited inline with @ references.

View File

@@ -0,0 +1,598 @@
---
title: "bidict: Bidirectional Mapping Library"
library_name: bidict
pypi_package: bidict
category: data-structures
python_compatibility: "3.9+"
last_updated: "2025-11-02"
official_docs: "https://bidict.readthedocs.io"
official_repository: "https://github.com/jab/bidict"
maintenance_status: "active"
---
# bidict: Bidirectional Mapping Library
## Overview
bidict provides efficient, Pythonic bidirectional mapping data structures for Python. It allows you to maintain a one-to-one mapping between keys and values where you can look up values by keys and keys by values with equal efficiency.
## Official Information
- **Repository**: @[https://github.com/jab/bidict]
- **Documentation**: @[https://bidict.readthedocs.io]
- **PyPI Package**: `bidict`
- **Latest Stable Version**: 0.23.1 (February 2024)
- **Development Version**: 0.23.2.dev0
- **License**: MPL-2.0 (Mozilla Public License 2.0)
- **Maintenance**: Actively maintained since 2009 (15+ years)
- **Author**: Joshua Bronson (@jab)
- **Stars**: 1,554+ on GitHub
## Python Version Compatibility
- **Minimum Required**: Python 3.9+
- **Tested Versions**: 3.9, 3.10, 3.11, 3.12, PyPy
- **Python 3.13/3.14**: Expected to be compatible (no version-specific blockers)
- **Type Hints**: Fully type-hinted codebase
Source: @[pyproject.toml line 9: requires-python = ">=3.9"]
## Core Purpose
### The Problem bidict Solves
bidict eliminates the need to manually maintain two separate dictionaries when you need bidirectional lookups. Without bidict, you might be tempted to:
```python
# DON'T DO THIS - The naive approach
mapping = {'H': 'hydrogen', 'hydrogen': 'H'}
```
**Problems with this approach:**
- Unclear distinction between keys and values when iterating
- `len()` returns double the actual number of associations
- Updating associations requires complex cleanup logic to avoid orphaned data
- No enforcement of one-to-one invariant
- Iterating `.keys()` also yields values, and vice versa
### What bidict Provides
```python
from bidict import bidict
# The correct approach
element_by_symbol = bidict({'H': 'hydrogen'})
element_by_symbol['H'] # 'hydrogen'
element_by_symbol.inverse['hydrogen'] # 'H'
```
bidict maintains two separate internal dictionaries and keeps them automatically synchronized, providing:
- **One-to-one invariant enforcement**: Prevents duplicate values
- **Automatic inverse synchronization**: Changes propagate bidirectionally
- **Clean iteration**: `.keys()` returns only keys, `.values()` returns only values
- **Accurate length**: `len()` returns the actual number of associations
- **Type safety**: Fully typed for static analysis
Source: @[docs/intro.rst: "to model a bidirectional mapping correctly and unambiguously, we need two separate one-directional mappings"]
## When to Use bidict
### Use bidict When
1. **Bidirectional lookups are required**
- Symbol-to-element mapping (H ↔ hydrogen)
- User ID-to-username mapping
- Code-to-description mappings
- Translation dictionaries between two systems
2. **One-to-one relationships must be enforced**
- Database primary key mappings
- File path-to-identifier mappings
- Token-to-user session mappings
3. **You need both directions with equal frequency**
- The overhead of two dicts is justified by lookup patterns
- Inverse lookups are not occasional edge cases
4. **Data integrity is important**
- Automatic cleanup when updating associations
- Protection against duplicate values via `ValueDuplicationError`
- Fail-clean guarantees for bulk operations
### Use Two Separate Dicts When
1. **Inverse lookups are rare or never needed**
- Simple one-way mappings
- Lookups only in one direction
2. **Values are not unique**
- Many-to-one relationships (multiple keys → same value)
- Example: category-to-items mapping
3. **Values are unhashable**
- Lists, dicts, or other mutable/unhashable values
- bidict requires values to be hashable
4. **Memory is extremely constrained**
- bidict maintains two internal dicts (approximately 2x memory)
- For very large datasets where inverse is rarely used
Source: @[docs/intro.rst, docs/basic-usage.rst]
## Decision Matrix
```text
┌─────────────────────────────────────┬──────────────┬──────────────────┐
│ Requirement │ Use bidict │ Use Two Dicts │
├─────────────────────────────────────┼──────────────┼──────────────────┤
│ Bidirectional lookups frequently │ ✓ │ │
│ One-to-one constraint enforcement │ ✓ │ │
│ Values must be hashable │ ✓ │ │
│ Automatic synchronization needed │ ✓ │ │
│ Many-to-one relationships │ │ ✓ │
│ Unhashable values (lists, dicts) │ │ ✓ │
│ Inverse lookups are rare │ │ ✓ │
│ Extreme memory constraints │ │ ✓ │
└─────────────────────────────────────┴──────────────┴──────────────────┘
```
## Installation
```bash
pip install bidict
```
Or with uv:
```bash
uv add bidict
```
No runtime dependencies outside Python's standard library.
## Basic Usage Examples
### Creating and Using a bidict
```python
from bidict import bidict
# Create from dict, keyword arguments, or items
element_by_symbol = bidict({'H': 'hydrogen', 'He': 'helium'})
element_by_symbol = bidict(H='hydrogen', He='helium')
element_by_symbol = bidict([('H', 'hydrogen'), ('He', 'helium')])
# Forward lookup (key → value)
element_by_symbol['H'] # 'hydrogen'
# Inverse lookup (value → key)
element_by_symbol.inverse['hydrogen'] # 'H'
# Inverse is a full bidict, kept in sync
element_by_symbol.inverse['helium'] = 'He'
element_by_symbol['He'] # 'helium'
```
Source: @[docs/intro.rst, docs/basic-usage.rst]
### Handling Duplicate Values
```python
from bidict import bidict, ValueDuplicationError
b = bidict({'one': 1})
# This raises an error - value 1 already exists
try:
b['two'] = 1
except ValueDuplicationError:
print("Value 1 is already mapped to 'one'")
# Explicitly allow overwriting with forceput()
b.forceput('two', 1)
# Result: bidict({'two': 1}) - 'one' was removed
```
Source: @[docs/basic-usage.rst: "Values Must Be Unique"]
### Standard Dictionary Operations
```python
from bidict import bidict
b = bidict(H='hydrogen', He='helium')
# All standard dict methods work
'H' in b # True
b.get('Li', 'not found') # 'not found'
b.pop('He') # 'helium'
b.update({'Li': 'lithium'}) # Add items
len(b) # 2
# Iteration yields only keys (not keys+values like naive approach)
list(b.keys()) # ['H', 'Li']
list(b.values()) # ['hydrogen', 'lithium']
list(b.items()) # [('H', 'hydrogen'), ('Li', 'lithium')]
```
Source: @[docs/basic-usage.rst: "Interop"]
## Advanced Features
### Other bidict Types
```python
from bidict import frozenbidict, OrderedBidict
# Immutable bidict (hashable, can be dict key or set member)
immutable = frozenbidict({'H': 'hydrogen'})
# Ordered bidict (maintains insertion order, like dict in Python 3.7+)
ordered = OrderedBidict({'H': 'hydrogen', 'He': 'helium'})
```
Source: @[docs/other-bidict-types.rst]
### Fine-Grained Duplication Control
```python
from bidict import bidict, OnDup, RAISE, DROP_OLD
b = bidict({1: 'one'})
# Strict mode - raise on any key or value duplication
b.put(2, 'two', on_dup=OnDup(key=RAISE, val=RAISE))
# Custom policies for different duplication scenarios
on_dup = OnDup(key=DROP_OLD, val=RAISE)
b.putall([(1, 'uno'), (2, 'dos')], on_dup=on_dup)
```
Source: @[docs/basic-usage.rst: "Key and Value Duplication"]
### Fail-Clean Guarantee
```python
from bidict import bidict, KeyDuplicationError
b = bidict({1: 'one', 2: 'two'})
# If an update fails, the bidict is unchanged
try:
b.putall({3: 'three', 1: 'uno'}) # 1 is duplicate key
except KeyDuplicationError:
pass
# (3, 'three') was NOT added - the bidict remains unchanged
b # bidict({1: 'one', 2: 'two'})
```
Source: @[docs/basic-usage.rst: "Updates Fail Clean"]
## Real-World Usage Patterns
Based on analysis of the bidict repository and documentation:
### Pattern 1: Symbol-to-Name Mappings
```python
from bidict import bidict
# Chemical elements
element_by_symbol = bidict({
'H': 'hydrogen',
'He': 'helium',
'Li': 'lithium'
})
# Look up element by symbol
element_by_symbol['H'] # 'hydrogen'
# Look up symbol by element name
element_by_symbol.inverse['lithium'] # 'Li'
```
### Pattern 2: ID-to-Object Mappings
```python
from bidict import bidict
# User session management
session_by_user_id = bidict({
1001: 'session_abc123',
1002: 'session_def456'
})
# Find session by user ID
session_by_user_id[1001] # 'session_abc123'
# Find user ID by session
session_by_user_id.inverse['session_abc123'] # 1001
```
### Pattern 3: Internationalization/Translation
```python
from bidict import bidict
# Language code mappings
lang_code = bidict({
'en': 'English',
'es': 'Español',
'fr': 'Français'
})
# Look up language name from code
lang_code['es'] # 'Español'
# Look up code from language name
lang_code.inverse['Français'] # 'fr'
```
### Pattern 4: File Path-to-Identifier Mappings
```python
from bidict import bidict
# File tracking system
file_by_id = bidict({
'f001': '/path/to/document.pdf',
'f002': '/path/to/image.png'
})
# Get path from ID
file_by_id['f001'] # '/path/to/document.pdf'
# Get ID from path
file_by_id.inverse['/path/to/image.png'] # 'f002'
```
## Integration Patterns
### With Type Hints
```python
from typing import Mapping
from bidict import bidict
def process_mapping(data: Mapping[str, int]) -> None:
# bidict is a full Mapping implementation
for key, value in data.items():
print(f"{key}: {value}")
# Works seamlessly
process_mapping(bidict({'a': 1, 'b': 2}))
```
### With collections.abc
bidict implements:
- `collections.abc.MutableMapping` (for `bidict`)
- `collections.abc.Mapping` (for `frozenbidict`)
```python
from collections.abc import MutableMapping
from bidict import bidict
def validate_mapping(m: MutableMapping) -> bool:
return isinstance(m, MutableMapping)
validate_mapping(bidict()) # True
```
### Polymorphic Equality
```python
from bidict import bidict
# bidict compares equal to dicts with same items
bidict(a=1, b=2) == {'a': 1, 'b': 2} # True
# Can convert freely between dict and bidict
dict(bidict(a=1)) # {'a': 1}
bidict(dict(a=1)) # bidict({'a': 1})
```
Source: @[docs/basic-usage.rst: "Interop"]
## Performance Characteristics
### Time Complexity
- **Forward lookup** (`b[key]`): O(1)
- **Inverse lookup** (`b.inverse[value]`): O(1)
- **Insert/Update** (`b[key] = value`): O(1)
- **Delete** (`del b[key]`): O(1)
- **Access inverse** (`b.inverse`): O(1) - the inverse is always maintained, not computed on demand (see the sketch below)
### Space Complexity
- **Memory overhead**: Approximately 2x a single dict (maintains two internal dicts)
- **Inverse access**: No additional memory allocation (inverse is a view)
Source: @[docs/intro.rst: "the inverse is not computed on demand"]
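A small sketch showing that the inverse is a live, always-maintained view rather than a computed copy:
```python
from bidict import bidict
b = bidict({'H': 'hydrogen'})
inv = b.inverse                 # no copy is built here
b['He'] = 'helium'
print(inv['helium'])            # 'He' - the inverse reflects the update immediately
print(b.inverse.inverse is b)   # True - the inverse of the inverse is the original object
```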
## Known Limitations
1. **Values must be hashable**: Cannot use lists, dicts, or other unhashable types as values
2. **Memory overhead**: Uses roughly 2x the memory of a single dict
3. **One-to-one only**: Cannot represent many-to-one or one-to-many relationships
4. **Value uniqueness enforced**: Raises `ValueDuplicationError` by default when duplicate values are inserted
Source: @[docs/basic-usage.rst: "Values Must Be Hashable", "Values Must Be Unique"]
## When NOT to Use
### Scenario 1: Many-to-One Relationships
```python
# BAD: Multiple keys mapping to same value
# This won't work with bidict - use dict instead
category_to_items = {
'fruit': 'apple',
'vegetable': 'carrot',
'fruit': 'banana' # Duplicate value for different key
}
```
### Scenario 2: Unhashable Values
```python
# BAD: Lists as values
# This raises TypeError with bidict
groups = bidict({
'admins': ['alice', 'bob'], # TypeError: unhashable type: 'list'
'users': ['charlie', 'david']
})
# Use regular dict or use frozenset/tuple as values
groups = bidict({
'admins': frozenset(['alice', 'bob']), # OK
'users': frozenset(['charlie', 'david'])
})
```
### Scenario 3: Rarely Used Inverse Lookups
```python
# If you only need inverse lookup occasionally, manual approach may be simpler
forward = {'key1': 'value1', 'key2': 'value2'}
# Occasionally create inverse when needed
inverse = {v: k for k, v in forward.items()}
```
### Scenario 4: Extreme Memory Constraints
For very large datasets (millions of entries) where inverse lookups are infrequent, the 2x memory overhead may not be justified. Consider:
- Database-backed lookups for both directions
- On-demand inverse dict construction
- External key-value stores with bidirectional indices
## Notable Dependents
bidict is used by major organizations and projects (source: @[README.rst]):
- Google
- Venmo
- CERN
- Baidu
- Tencent
**PyPI Download Statistics**: Significant adoption with millions of downloads (source: @[README.rst badge])
## Dependencies
- **Runtime**: None (zero dependencies outside Python stdlib)
- **Development**: pytest, hypothesis, mypy, sphinx (for testing and docs)
Source: @[pyproject.toml: dependencies = []]
## Maintenance and Support
- **Maintenance**: Actively maintained since 2009 (15+ years)
- **Test Coverage**: 100% test coverage with property-based testing via hypothesis
- **CI/CD**: Continuous testing across all supported Python versions
- **Type Hints**: Fully type-hinted and mypy-strict compliant
- **Documentation**: Comprehensive documentation at readthedocs.io
- **Community**: GitHub Discussions for questions, active issue tracker
- **Enterprise Support**: Available via Tidelift subscription
Source: @[README.rst: "Features", "Enterprise Support"]
## Migration Guide
### From Two Manual Dicts
```python
# Before: Manual synchronization
forward = {'H': 'hydrogen'}
inverse = {'hydrogen': 'H'}
# When updating
forward['H'] = 'hydrogène'
del inverse['hydrogen'] # Manual cleanup
inverse['hydrogène'] = 'H'
# After: Automatic synchronization
from bidict import bidict
mapping = bidict({'H': 'hydrogen'})
mapping['H'] = 'hydrogène' # inverse automatically updated
```
### From Naive Single Dict
```python
# Before: Mixed keys and values
mixed = {'H': 'hydrogen', 'hydrogen': 'H'}
len(mixed) # 2 (wrong - should be 1 association)
list(mixed.keys()) # ['H', 'hydrogen'] (values mixed in)
# After: Clean separation
from bidict import bidict
b = bidict({'H': 'hydrogen'})
len(b) # 1 (correct)
list(b.keys()) # ['H'] (only keys)
list(b.values()) # ['hydrogen'] (only values)
```
## Related Libraries and Alternatives
- **Two manual dicts**: Simplest for occasional inverse lookups
- **bidict.OrderedBidict**: When insertion order matters (built into bidict)
- **bidict.frozenbidict**: Immutable variant for hashable mappings (built into bidict)
- **sortedcontainers.SortedDict**: For sorted bidirectional mappings (can combine with bidict)
No direct competitors in Python stdlib or third-party ecosystem that provide the same level of safety, features, and maintenance.
## Learning Resources
- Official Documentation: @[https://bidict.readthedocs.io]
- Intro Guide: @[https://bidict.readthedocs.io/intro.html]
- Basic Usage: @[https://bidict.readthedocs.io/basic-usage.html]
- Learning from bidict: @[https://bidict.readthedocs.io/learning-from-bidict.html] - covers advanced Python topics touched by bidict's implementation
- GitHub Repository: @[https://github.com/jab/bidict]
- PyPI Package: @[https://pypi.org/project/bidict/]
## Quick Decision Guide
**Use bidict when you answer "yes" to:**
1. Do you need to look up keys by values frequently?
2. Are your values unique (one-to-one relationship)?
3. Are your values hashable?
4. Do you want automatic synchronization between directions?
**Use two separate dicts when:**
1. Inverse lookups are rare
2. You have many-to-one relationships
3. Memory is extremely constrained
4. Values are unhashable
**Use a single dict when:**
1. You only need one direction
2. Values don't need to be unique
## Code Review Checklist
When reviewing code using bidict:
- [ ] Values are hashable (not lists, dicts, sets)
- [ ] One-to-one relationship is intended (no many-to-one)
- [ ] Error handling for `ValueDuplicationError` where appropriate
- [ ] `forceput()`/`forceupdate()` usage is intentional and documented
- [ ] Memory overhead (2x dict) is acceptable for use case
- [ ] Type hints include bidict types where appropriate
- [ ] Inverse access pattern justifies bidict usage vs two dicts
## Summary
bidict is a mature, well-tested library that solves the bidirectional mapping problem elegantly. Use it when you need efficient lookups in both directions with automatic synchronization and one-to-one invariant enforcement. Avoid it when you have many-to-one relationships, unhashable values, or rarely use inverse lookups.
**Key Takeaway**: If you're maintaining two dicts manually or considering `{a: b, b: a}`, reach for bidict. It eliminates error-prone manual synchronization while providing stronger guarantees and cleaner code.

View File

@@ -0,0 +1,586 @@
---
title: "Blinker: Fast Signal/Event Dispatching System"
library_name: blinker
pypi_package: blinker
category: event-system
python_compatibility: "3.9+"
last_updated: "2025-11-02"
official_docs: "https://blinker.readthedocs.io"
official_repository: "https://github.com/pallets-eco/blinker"
maintenance_status: "active"
---
# Blinker: Fast Signal/Event Dispatching System
## Official Information
- **Repository:** <https://github.com/pallets-eco/blinker>
- **PyPI Package:** blinker
- **Current Version:** 1.9.0 (Released 2024-11-08)
- **Official Documentation:** <https://blinker.readthedocs.io/>
- **License:** MIT License
- **Maintenance Status:** Active (Pallets Community Ecosystem)
@source <https://github.com/pallets-eco/blinker> @source <https://blinker.readthedocs.io/> @source <https://pypi.org/project/blinker/>
## Core Purpose
Blinker provides a fast dispatching system that allows any number of interested parties to subscribe to events or "signals". It implements the Observer pattern with a clean, Pythonic API.
### Problem Space
Without blinker, you would need to manually implement:
- Global event registries for decoupled components
- Weak reference management for automatic cleanup
- Thread-safe event dispatching
- Sender-specific event filtering
- Return value collection from multiple handlers
### When to Use Blinker
**Use blinker when:**
- Building plugin systems that need event hooks
- Implementing application lifecycle hooks (like Flask)
- Creating decoupled components that communicate via events
- Building event-driven architectures within a single process
- Need multiple independent handlers for the same event
- Want automatic cleanup via weak references
**What you would be "reinventing the wheel" without it:**
- Observer/subscriber pattern implementation
- Named signal registries for plugin communication
- Weak reference management for receivers
- Thread-safe signal dispatching
- Sender filtering and context passing
## Python Version Compatibility
- **Minimum Python Version:** 3.9+
- **Python 3.11:** Fully compatible
- **Python 3.12:** Fully compatible
- **Python 3.13:** Fully compatible
- **Python 3.14:** Expected to be compatible
@source <https://blinker.readthedocs.io/en/stable/>
### Thread Safety
Blinker signals are thread-safe. The library uses weak references for automatic cleanup and properly handles concurrent signal emission and subscription.
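A minimal sketch (the signal and handler names are illustrative) of emitting one signal from several worker threads:
```python
import threading
from blinker import signal
task_done = signal('task-done')
results = []
@task_done.connect
def record(sender, **kwargs):
    results.append(kwargs["task_id"])
def worker(task_id):
    task_done.send("worker", task_id=task_id)
threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # [0, 1, 2, 3]
```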
## Integration Patterns
### Flask Ecosystem Integration
Flask uses blinker as its signal system foundation. Flask provides built-in signals like:
- `request_started` - Before request processing begins
- `request_finished` - After response is constructed
- `template_rendered` - When template is rendered
- `request_tearing_down` - During request teardown
@source <https://flask.palletsprojects.com/en/latest/signals/>
**Example Flask Signal Usage:**
```python
from flask import template_rendered
def log_template_renders(sender, template, context, **extra):
sender.logger.info(
f"Rendered {template.name} with context {context}"
)
template_rendered.connect(log_template_renders, app)
```
### Event-Driven Architecture
Blinker excels at creating loosely coupled components:
```python
from blinker import Namespace
# Create isolated namespace for your application
app_signals = Namespace()
# Define signals
user_logged_in = app_signals.signal('user-logged-in')
data_updated = app_signals.signal('data-updated')
# Multiple handlers can subscribe
@user_logged_in.connect
def update_last_login(sender, **kwargs):
user_id = kwargs.get('user_id')
# Update database
@user_logged_in.connect
def send_login_notification(sender, **kwargs):
# Send email notification
pass
# Emit signal
user_logged_in.send(app, user_id=123, ip_address='192.168.1.1')
```
### Plugin Systems
```python
from blinker import signal
# Core application defines hook points
plugin_loaded = signal('plugin-loaded')
before_process = signal('before-process')
after_process = signal('after-process')
# Plugins subscribe to hooks
@before_process.connect
def plugin_preprocess(sender, data):
# Plugin modifies data before processing
return data
# Application emits signals at hook points
results = before_process.send(self, data=input_data)
for receiver, result in results:
if result is not None:
input_data = result
```
## Real-World Examples
### Example 1: Flask Request Monitoring
@source <https://github.com/instana/python-sensor> (Flask instrumentation with blinker)
```python
from flask import g, request_started, request_finished
import time
def track_request_start(sender, **extra):
    # Store the start time on the per-request context object
    g.request_start_time = time.time()
def track_request_end(sender, response, **extra):
    duration = time.time() - getattr(g, "request_start_time", time.time())
    sender.logger.info(f"Request took {duration:.2f}s")
request_started.connect(track_request_start)
request_finished.connect(track_request_end)
```
### Example 2: Model Save Hooks
@source <https://blinker.readthedocs.io/>
```python
from blinker import Namespace
model_signals = Namespace()
model_saved = model_signals.signal('model-saved')
class Model:
def save(self):
# Save to database
self._persist()
# Emit signal for observers
model_saved.send(self, model_type=self.__class__.__name__)
# Cache invalidation handler
@model_saved.connect
def invalidate_cache(sender, **kwargs):
cache.delete(f"model:{kwargs['model_type']}")
# Audit logging handler
@model_saved.connect
def log_change(sender, **kwargs):
audit_log.write(f"Model saved: {kwargs['model_type']}")
```
### Example 3: Sender-Specific Subscriptions
@source <https://github.com/pallets-eco/blinker> README
```python
from blinker import signal
round_started = signal('round-started')
# General subscriber - receives from all senders
@round_started.connect
def each_round(sender):
print(f"Round {sender}")
# Sender-specific subscriber - only for sender=2
@round_started.connect_via(2)
def special_round(sender):
print("This is round two!")
for round_num in range(1, 4):
round_started.send(round_num)
# Output:
# Round 1
# Round 2
# This is round two!
# Round 3
```
### Example 4: Async Signal Handlers
@source <https://blinker.readthedocs.io/en/stable/>
```python
import asyncio
from blinker import Signal
async_signal = Signal()
# Async receiver
async def async_receiver(sender, **kwargs):
await asyncio.sleep(1)
print("Async handler completed")
async_signal.connect(async_receiver)
# Send to async receivers
await async_signal.send_async()
# Mix sync and async receivers
def sync_receiver(sender, **kwargs):
print("Sync handler")
async_signal.connect(sync_receiver)
# Provide wrapper for sync handlers in async context
def sync_wrapper(func):
async def inner(*args, **kwargs):
func(*args, **kwargs)
return inner
await async_signal.send_async(_sync_wrapper=sync_wrapper)
```
## Usage Examples
### Basic Signal Definition and Connection
```python
from blinker import signal
# Named signals (shared across modules)
initialized = signal('initialized')
# Anonymous signals (class attributes)
from blinker import Signal
class Processor:
on_ready = Signal()
on_complete = Signal()
def process(self):
self.on_ready.send(self)
# Do work
self.on_complete.send(self, status='success')
# Connect receivers
@initialized.connect
def on_init(sender, **kwargs):
print(f"Initialized by {sender}")
processor = Processor()
@processor.on_complete.connect
def handle_completion(sender, **kwargs):
print(f"Status: {kwargs['status']}")
```
### Named Signals for Decoupling
```python
from blinker import signal
# Module A defines and sends
def user_service():
user_created = signal('user-created')
# Create user
user_created.send('user_service', user_id=123, username='john')
# Module B subscribes (no import of Module A needed!)
def notification_service():
user_created = signal('user-created') # Same signal instance
@user_created.connect
def send_welcome_email(sender, **kwargs):
print(f"Sending email to {kwargs['username']}")
```
### Checking for Receivers Before Expensive Operations
```python
from blinker import signal
data_changed = signal('data-changed')
def update_data(new_data):
# Only compute expensive stats if someone is listening
if data_changed.receivers:
stats = compute_expensive_stats(new_data)
data_changed.send(None, data=new_data, stats=stats)
else:
# Skip expensive computation
data_changed.send(None, data=new_data)
```
### Temporarily Muting Signals (Testing)
```python
from blinker import signal
send_email = signal('send-email')
@send_email.connect
def actually_send(sender, **kwargs):
# Send real email
pass
def test_user_registration():
# Don't send emails during tests
with send_email.muted():
register_user('test@example.com')
# send_email signal is ignored in this context
```
### Collecting Return Values
```python
from blinker import signal
validate_data = signal('validate-data')
@validate_data.connect
def check_email(sender, **kwargs):
email = kwargs['email']
if '@' not in email:
return False, "Invalid email"
return True, None
@validate_data.connect
def check_username(sender, **kwargs):
username = kwargs['username']
if len(username) < 3:
return False, "Username too short"
return True, None
# Collect all validation results
results = validate_data.send(
None,
email='invalid',
username='ab'
)
for receiver, (valid, error) in results:
if not valid:
print(f"Validation failed: {error}")
```
## When NOT to Use Blinker
### Scenario 1: Simple Callbacks Sufficient
**Don't use blinker when:**
- Single callback function is enough
- No need for dynamic subscription/unsubscription
- Callbacks are tightly coupled to caller
```python
# Overkill - use simple callback
from blinker import signal
sig = signal('done')
sig.connect(on_done)
sig.send(self)
# Better - direct callback
def process(callback):
# do work
callback()
process(on_done)
```
### Scenario 2: Async Event Systems
**Don't use blinker when:**
- Building async-first distributed event system
- Need message queuing and persistence
- Cross-process or cross-network communication
```python
# Wrong tool - blinker is in-process only
from blinker import signal
distributed_event = signal('cross-service-event')
# Better - use async message queue
import asyncio
from aio_pika import connect, Message
async def publish_event():
connection = await connect("amqp://guest:guest@localhost/")
channel = await connection.channel()
await channel.default_exchange.publish(
Message(b"event data"),
routing_key="events"
)
```
### Scenario 3: Complex State Machines
**Don't use blinker when:**
- Need state transitions with guards and actions
- Require hierarchical or concurrent states
- Complex workflow orchestration
```python
# Wrong tool - too complex for simple signals
from blinker import signal
# Better - use state machine library
from transitions import Machine
class Order:
states = ['pending', 'paid', 'shipped', 'delivered']
def __init__(self):
self.machine = Machine(
model=self,
states=Order.states,
initial='pending'
)
self.machine.add_transition('pay', 'pending', 'paid')
self.machine.add_transition('ship', 'paid', 'shipped')
```
### Scenario 4: Request/Response Patterns
**Don't use blinker when:**
- Need bidirectional request/response communication
- Require RPC-style method calls
- Need return values from specific handlers
```python
# Awkward with signals
result = some_signal.send(self, request='data')
# Hard to know which handler provided what
# Better - direct method call or dependency injection
class ServiceLocator:
def get_service(self, name):
return self._services[name]
service = locator.get_service('data_processor')
result = service.process(data)
```
## Decision Guidance Matrix
| Use Blinker When | Use Callbacks When | Use AsyncIO When | Use Message Queue When |
| --- | --- | --- | --- |
| Multiple independent handlers needed | Single handler sufficient | Async/await throughout codebase | Cross-process communication needed |
| Plugin system with dynamic handlers | Tightly coupled components | I/O-bound async operations | Message persistence required |
| Decoupled modules need communication | Callback logic is simple | Event loop already present | Distributed systems |
| Framework-level hooks (like Flask) | Direct function call works | Concurrent async tasks | Reliability and retry needed |
| Observable events in OOP design | Inline lambda sufficient | Network I/O heavy | Message ordering matters |
| Weak reference cleanup needed | Manual lifecycle management OK | WebSockets/long-lived connections | Load balancing across workers |
### Decision Tree
```text
Need event notifications?
├─ Single process only?
│ ├─ YES: Continue
│ └─ NO: Use message queue (RabbitMQ, Redis, Kafka)
├─ Multiple handlers per event?
│ ├─ YES: Continue
│ └─ NO: Use simple callback function
├─ Handlers need to be dynamic (plugins)?
│ ├─ YES: Use Blinker ✓
│ └─ NO: Direct method calls may suffice
├─ Async/await heavy codebase?
│ ├─ YES: Consider asyncio event system
│ │ (or use Blinker with send_async)
│ └─ NO: Use Blinker ✓
└─ Need weak reference cleanup?
├─ YES: Use Blinker ✓
└─ NO: Simple callbacks OK
```
## Installation
```bash
pip install blinker
```
- Current version: 1.9.0
- Minimum Python: 3.9+
@source <https://pypi.org/project/blinker/>
## Key Features
- **Global named signal registry:** `signal('name')` returns same instance everywhere
- **Anonymous signals:** Create isolated `Signal()` instances
- **Sender filtering:** `connect(handler, sender=obj)` for sender-specific subscriptions
- **Weak references:** Automatic cleanup when receivers are garbage collected
- **Thread safety:** Safe for concurrent use
- **Return value collection:** Gather results from all handlers
- **Async support:** `send_async()` for coroutine receivers
- **Temporary connections:** Context managers for scoped subscriptions (see the sketch below)
- **Signal muting:** Disable signals temporarily (useful for testing)
@source <https://blinker.readthedocs.io/en/stable/>
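As one example, `connected_to` gives a scoped subscription (a short sketch; the receiver name is illustrative):
```python
from blinker import signal
progress = signal('progress')
seen = []
def collect(sender, **kwargs):
    seen.append(kwargs["pct"])
with progress.connected_to(collect):
    progress.send("job", pct=50)   # received while the context is active
progress.send("job", pct=100)      # ignored - the receiver was disconnected on exit
print(seen)                         # [50]
```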
## Common Pitfalls
1. **Memory leaks with strong references:**
```python
# Default uses weak references - OK
signal.connect(handler)
# Strong reference - prevents garbage collection
signal.connect(handler, weak=False) # Use sparingly!
```
2. **Expecting signals to modify behavior:**
- Signals are for observation, not control flow
- Don't rely on signal handlers to prevent actions
- Use explicit validation/authorization instead
3. **Forgetting sender parameter:**
```python
@my_signal.connect
def handler(sender, **kwargs): # sender is required!
print(kwargs['data'])
```
4. **Cross-process communication:**
- Blinker is in-process only
- Use message queues for distributed systems
5. **Performance with many handlers:**
- Check `signal.receivers` before expensive operations
- Consider limiting number of subscribers for hot paths
## Related Libraries
- **Django Signals:** Built into Django, similar concept but Django-specific
- **PyPubSub:** More complex publish-subscribe system
- **asyncio events:** For async-first applications
- **RxPY:** Reactive extensions for Python (more powerful, more complex)
- **Celery:** For distributed task queues and async workers
## Summary
Blinker is the standard solution for in-process event dispatching in Python, particularly within the Pallets ecosystem (Flask). Use it when you need clean, decoupled event notifications between components in the same process. For distributed systems, async-heavy codebases, or simple single-callback scenarios, consider alternatives.
**TL;DR:** Blinker = Observer pattern done right, with weak references, thread safety, and a clean API. Essential for Flask signals and plugin systems.

View File

@@ -0,0 +1,513 @@
---
title: "Boltons: Pure-Python Standard Library Extensions"
library_name: boltons
pypi_package: boltons
category: utilities
python_compatibility: "3.7+"
last_updated: "2025-11-02"
official_docs: "https://boltons.readthedocs.io"
official_repository: "https://github.com/mahmoud/boltons"
maintenance_status: "active"
---
# Boltons: Pure-Python Standard Library Extensions
## Overview
**boltons should be builtins.**
Boltons is a collection of over 230 BSD-licensed, pure-Python utilities designed to extend Python's standard library with functionality that is conspicuously missing. Created and maintained by @mahmoud (Mahmoud Hashemi), it provides battle-tested implementations of commonly needed utilities without any external dependencies.
### Core Value Proposition
- **Zero Dependencies**: Pure-Python with no external requirements
- **Module Independence**: Each module can be vendored individually
- **Battle-Tested**: 6,765+ stars, tested against Python 3.7-3.13 and PyPy3
- **Standard Library Philosophy**: Follows stdlib design principles
- **Production Ready**: Used in production by numerous projects
## Problem Space
Boltons solves the "reinventing the wheel" problem for common utilities that should be in the standard library but aren't. Without boltons, developers repeatedly write custom implementations for:
- LRU caches with better APIs than `functools.lru_cache`
- Chunked and windowed iteration patterns
- Atomic file operations
- Advanced dictionary types (OrderedMultiDict)
- Enhanced traceback formatting and debugging
- Recursive data structure traversal
- File system utilities beyond `shutil`
### What Would Be Reinventing the Wheel
Using boltons prevents rewriting:
- Custom LRU cache implementations with size limits and TTL
- Iteration utilities like `chunked()`, `windowed()`, `unique()`
- Atomic file write operations (write-to-temp, rename)
- Enhanced `namedtuple` with defaults and mutation
- Traceback extraction and formatting utilities
- URL parsing and manipulation beyond `urllib.parse`
- Table formatting for 2D data
## Design Principles
Per @boltons/docs/architecture.rst, each "bolton" must:
1. **Be pure-Python and self-contained**: No C extensions, minimal dependencies
2. **Perform a common task**: Address frequently needed functionality
3. **Mitigate stdlib insufficiency**: Fill gaps in the standard library
4. **Follow stdlib practices**: Balance best practice with pragmatism
5. **Include documentation**: At least one doctest, links to related tools
## Key Modules
### 1. **cacheutils** - Advanced Caching [@context7]
Better caching than `functools.lru_cache`:
```python
from boltons.cacheutils import LRU, cached, cachedmethod
# LRU cache with size limit
cache = LRU(max_size=256)
cache['user:123'] = user_data
# Decorator with custom cache backend
@cached(cache={})
def fibonacci(n):
if n < 2:
return n
return fibonacci(n-1) + fibonacci(n-2)
# Threshold counter - only track frequently occurring items
from boltons.cacheutils import ThresholdCounter
tc = ThresholdCounter(threshold=0.1)
tc.update([2] * 10) # Only remembers items > 10% frequency
```
**When to use**: Need size-limited caches, TTL expiration, or custom eviction policies.
### 2. **iterutils** - Enhanced Iteration [@context7]
Powerful iteration utilities beyond `itertools`:
```python
from boltons.iterutils import (
chunked, chunked_iter, # Split into chunks
windowed, windowed_iter, # Sliding windows
unique, unique_iter, # Deduplicate preserving order
one, first, same, # Reduction utilities
remap, get_path, # Recursive data structure traversal
backoff, # Exponential backoff with jitter
pairwise # Overlapping pairs
)
# Chunking for batch processing
for batch in chunked(user_ids, 100):
process_batch(batch)
# [1,2,3,4,5] with size=2 → [1,2], [3,4], [5]
# Sliding window for moving averages
for window in windowed(prices, 7):
avg = sum(window) / len(window)
# [1,2,3,4,5] with size=3 → [1,2,3], [2,3,4], [3,4,5]
# Safe reduction
user = one(users) # The single item, or None unless there is exactly one
first_or_none = first(results, default=None)
# Recursive data structure traversal
def visit(path, key, value):
if isinstance(value, str) and 'secret' in key.lower():
return '***REDACTED***'
return value
clean_data = remap(user_data, visit=visit)
# Exponential backoff with jitter
for wait_time in backoff(start=0.1, stop=60, count=5, jitter=True):
if try_operation():
break
time.sleep(wait_time)
```
**When to use**: Batch processing, sliding windows, recursive data transformation, retry logic.
### 3. **tbutils** - Enhanced Tracebacks [@context7]
Better exception handling and debugging:
```python
from boltons.tbutils import TracebackInfo, ExceptionInfo, ParsedException
try:
risky_operation()
except Exception as e:
# Capture full traceback info
exc_info = ExceptionInfo.from_current()
# Access structured traceback data
tb_info = TracebackInfo.from_current()
for frame in tb_info.frames:
print(f"{frame.filename}:{frame.lineno} in {frame.func_name}")
# Format for logging
formatted = exc_info.get_formatted()
logger.error(formatted)
```
**When to use**: Enhanced error logging, debugging tools, error analysis.
### 4. **fileutils** - Safe File Operations [@context7]
Atomic writes and safe file handling:
```python
from boltons.fileutils import atomic_save, mkdir_p, FilePerms
# Atomic file write (write-to-temp, rename)
with atomic_save('config.json') as f:
json.dump(config, f)
# File only replaced if write succeeds
# Create directory path (like mkdir -p)
mkdir_p('/path/to/nested/directory')
# Readable permission management
perms = FilePerms(0o755)
perms.apply('/path/to/script.sh')
```
**When to use**: Configuration files, data persistence, safe concurrent writes.
### 5. **dictutils** - Advanced Dictionaries [@context7]
Enhanced dictionary types:
```python
from boltons.dictutils import OrderedMultiDict, OMD
# Preserve order + allow duplicate keys (like HTTP headers)
headers = OMD([
('Accept', 'application/json'),
('Accept', 'text/html'), # Multiple values for same key
('User-Agent', 'MyBot/1.0')
])
for accept in headers.getlist('Accept'):
print(accept) # application/json, text/html
```
**When to use**: HTTP headers, query parameters, configuration with duplicate keys.
### 6. **strutils** - String Utilities [@github/README.md]
Common string operations:
```python
from boltons.strutils import (
slugify, # URL-safe slugs
bytes2human, # Human-readable byte sizes
find_hashtags, # Extract #hashtags
pluralize, # Smart pluralization
strip_ansi # Remove ANSI codes
)
slugify("Hello, World!") # "hello-world"
bytes2human(1234567) # "1.18 MB"
```
### 7. **queueutils** - Priority Queues [@context7]
Enhanced queue types:
```python
from boltons.queueutils import HeapPriorityQueue, PriorityQueue
pq = HeapPriorityQueue()
pq.add("low priority", priority=3)
pq.add("high priority", priority=1)
item = pq.pop() # Returns "high priority"
```
## Integration Patterns
### Full Install
```bash
pip install boltons
```
### Import Individual Modules
```python
# Import only what you need
from boltons.cacheutils import LRU
from boltons.iterutils import chunked
from boltons.fileutils import atomic_save
```
### Vendoring (Copy Into Project)
Since boltons has **zero dependencies** and each module is **independent**:
```bash
# Copy specific module
cp /path/to/site-packages/boltons/iterutils.py myproject/utils/
# Copy entire package
cp -r /path/to/site-packages/boltons myproject/vendor/
```
This is explicitly supported by the project design [@context7/architecture.rst].
## Real-World Usage Examples [@github/search]
### Example 1: Clastic Web Framework [@mahmoud/clastic]
```python
# Enhanced traceback handling
from boltons.tbutils import ExceptionInfo, TracebackInfo
class ErrorMiddleware:
def handle_error(self, exc):
exc_info = ExceptionInfo.from_current()
return self.render_error_page(exc_info.get_formatted())
```
### Example 2: Click-Extra CLI Framework [@kdeldycke/click-extra]
```python
# Enhanced traceback formatting for CLI error messages
from boltons.tbutils import print_exception
try:
run_command()
except Exception:
print_exception() # Beautiful formatted traceback
```
### Example 3: Reader Feed Library [@lemon24/reader]
```python
# Type checking utilities
from boltons.typeutils import make_sentinel
NOT_SET = make_sentinel('NOT_SET') # Better than None for defaults
```
### Example 4: Batch Processing Pattern
```python
from boltons.iterutils import chunked
# Process database records in batches
for batch in chunked(fetch_all_records(), 1000):
bulk_insert(batch)
db.commit()
```
### Example 5: API Rate Limiting
```python
import time
import requests
from boltons.iterutils import backoff
from boltons.cacheutils import LRU
# Exponential backoff for API retries
cache = LRU(max_size=1000)  # optional spot to memoize successful responses
def call_api_with_retry(endpoint):
    for wait in backoff(start=0.1, stop=60, count=5):
        try:
            response = requests.get(endpoint)
            response.raise_for_status()  # surface HTTP errors such as 429
            return response
        except requests.HTTPError as e:
            if e.response.status_code == 429:  # Rate limited - back off and retry
                time.sleep(wait)
            else:
                raise
```
## Python Version Compatibility
- **Minimum**: Python 3.7
- **Maximum Tested**: Python 3.13
- **Also Tested**: PyPy3
- **3.11-3.14 Status**: Fully compatible (tested 3.11, 3.12, 3.13)
Per @github/README.md:
> Boltons is tested against Python 3.7-3.13, as well as PyPy3.
## When to Use Boltons
### Use Boltons When
1. **Need stdlib-style utilities with no dependencies**
- Building libraries that avoid dependencies
- Corporate environments with strict dependency policies
- Want vendorable, copy-pasteable code
2. **Iteration patterns beyond itertools**
- Chunking/batching data
- Sliding windows
- Recursive data structure traversal
- Exponential backoff
3. **Enhanced caching needs**
- Size-limited LRU caches
- TTL expiration
- Custom eviction policies
- Better API than `functools.lru_cache`
4. **Atomic file operations**
- Safe configuration file updates
- Preventing corrupted writes
- Concurrent file access
5. **Advanced debugging**
- Structured traceback information
- Custom error formatting
- Error analysis tools
6. **OrderedMultiDict needs**
- HTTP headers/query parameters
- Configuration with duplicate keys
- Preserving insertion order + duplicates
### Use Standard Library When
1. **Basic iteration**: `itertools` suffices
2. **Simple caching**: `functools.lru_cache` is enough
3. **Basic file ops**: `pathlib` and `shutil` work fine
4. **Standard dicts**: `dict` or `collections.OrderedDict` meets needs
### Use more-itertools When
- Need even more specialized iteration utilities
- Already using `more-itertools` in project
- Want community recipes from itertools docs
**Key Difference**: Boltons is broader (files, caching, debugging) while `more-itertools` focuses purely on iteration.
## Decision Matrix
| Scenario | Use Boltons | Use Stdlib | Use Alternative |
| --- | --- | --- | --- |
| LRU cache with size limits | ✅ `cacheutils.LRU` | ⚠️ `lru_cache` (no size control) | `cachetools` (more features) |
| Chunked iteration | ✅ `iterutils.chunked` | ❌ Manual slicing | `more-itertools.chunked` |
| Atomic file writes | ✅ `fileutils.atomic_save` | ❌ Manual temp+rename | `atomicwrites` (archived) |
| Enhanced tracebacks | ✅ `tbutils.TracebackInfo` | ❌ `traceback` (basic) | `rich.traceback` (prettier) |
| OrderedMultiDict | ✅ `dictutils.OMD` | ❌ Custom solution | `werkzeug.datastructures` |
| Exponential backoff | ✅ `iterutils.backoff` | ❌ Manual implementation | `tenacity`, `backoff` |
| URL parsing | ✅ `urlutils.URL` | ⚠️ `urllib.parse` (basic) | `yarl`, `furl` |
| Zero dependencies | ✅ Pure Python | ✅ Built-in | ❌ Most alternatives |
## When NOT to Use Boltons
1. **Already using specialized libraries**
- Have `cachetools` for advanced caching
- Have `tenacity` for retry logic
- Have `rich` for pretty output
2. **Need high-performance implementations**
- Boltons prioritizes correctness over speed
- C-extension alternatives may be faster
3. **Want cutting-edge features**
- Boltons is conservative, stdlib-like
- Specialized libraries may innovate faster
4. **Framework-specific needs**
- Django/Flask have their own utils
- Web frameworks provide similar functionality
## Maintenance and Stability
- **Versioning**: CalVer (YY.MINOR.MICRO) [@github/README.md]
- **Latest**: 25.0.0 (February 2025)
- **Maintenance**: Active, 71 open issues, 373 forks
- **Author**: Mahmoud Hashemi (@mahmoud)
- **License**: BSD (permissive)
## Related Libraries
### Complementary
- **more-itertools**: Extended iteration recipes
- **toolz/cytoolz**: Functional programming utilities
- **attrs/dataclasses**: Enhanced class definitions
### Overlapping
- **cachetools**: More advanced caching (but has dependencies)
- **atomicwrites**: Atomic file writes (now archived)
- **werkzeug**: Web utilities including MultiDict
### When to Combine
```python
# Use both boltons and more-itertools
from boltons.iterutils import chunked # For chunking
from more_itertools import flatten # For flattening
from boltons.cacheutils import LRU, cached # For caching and memoization
cache = LRU(max_size=1000)
@cached(cache=cache)
def process_data(records):
for batch in chunked(records, 100):
yield process_batch(batch)
```
## Key Takeaways
1. **Zero Dependencies**: Pure-Python, no external requirements
2. **Vendorable**: Copy individual modules into your project
3. **Battle-Tested**: 6,765+ stars, production-proven
4. **Stdlib Philosophy**: Familiar API, conservative design
5. **Broad Coverage**: Caching, iteration, files, debugging, data structures
6. **Production Ready**: Python 3.7-3.13, PyPy3 support
## Quick Start
```python
# Install
pip install boltons
# Common patterns
from boltons.cacheutils import LRU
from boltons.iterutils import chunked, windowed, backoff
from boltons.fileutils import atomic_save
from boltons.tbutils import ExceptionInfo
# LRU cache
cache = LRU(max_size=256)
# Batch processing
for batch in chunked(items, 100):
process(batch)
# Atomic writes
with atomic_save('data.json') as f:
json.dump(data, f)
# Enhanced error handling
try:
risky()
except Exception:
exc_info = ExceptionInfo.from_current()
logger.error(exc_info.get_formatted())
```
## References
- **Repository**: [@github/mahmoud/boltons](https://github.com/mahmoud/boltons)
- **Documentation**: [@readthedocs](https://boltons.readthedocs.io/)
- **PyPI**: [@pypi/boltons](https://pypi.org/project/boltons/)
- **Context7**: [@context7/mahmoud/boltons](/mahmoud/boltons)
- **Architecture**: [@readthedocs/architecture](https://boltons.readthedocs.io/en/latest/architecture.html)
---
_Research completed: 2025-10-21_ _Sources: Context7, GitHub, PyPI, ReadTheDocs, Exa code search_ _Trust Score: 9.8/10 (Context7)_

View File

@@ -0,0 +1,683 @@
---
title: "python-box: Advanced Python Dictionaries with Dot Notation Access"
library_name: python-box
pypi_package: python-box
category: data_structures
python_compatibility: "3.9+"
last_updated: "2025-11-02"
official_docs: "https://github.com/cdgriffith/Box/wiki"
official_repository: "https://github.com/cdgriffith/Box"
maintenance_status: "active"
---
# python-box: Advanced Python Dictionaries with Dot Notation Access
## Overview
python-box extends Python's built-in dictionary with dot notation access and powerful configuration management features. It provides a transparent drop-in replacement for standard dicts while adding recursive dot notation, automatic type conversion, and seamless serialization to/from JSON, YAML, TOML, and msgpack formats.
- **Official Repository:** @<https://github.com/cdgriffith/Box>
- **Documentation:** @<https://github.com/cdgriffith/Box/wiki>
- **PyPI Package:** `python-box`
- **License:** MIT
- **Maintained By:** Chris Griffith (@cdgriffith)
## Core Purpose
### Problem Box Solves
Without python-box, working with nested dictionaries requires verbose bracket notation:
```python
# Standard dict - verbose and error-prone
config = {
"database": {
"host": "localhost",
"port": 5432,
"credentials": {
"username": "admin",
"password": "secret"
}
}
}
# Accessing nested values - clunky syntax
db_host = config["database"]["host"]
db_user = config["database"]["credentials"]["username"]
# KeyError if key doesn't exist
try:
timeout = config["database"]["timeout"] # KeyError!
except KeyError:
timeout = 30
```
With python-box, you get clean dot notation and safe defaults:
```python
from box import Box
config = Box({
"database": {
"host": "localhost",
"port": 5432,
"credentials": {
"username": "admin",
"password": "secret"
}
}
})
# Clean dot notation access
db_host = config.database.host
db_user = config.database.credentials.username
# Safe access with defaults (default_box mode)
config = Box(config, default_box=True)
timeout = config.database.timeout or 30 # Missing keys return an empty Box, so no KeyError
```
### When You're Reinventing the Wheel
You should use python-box when you find yourself:
1. **Writing custom attribute access wrappers** for dictionaries
2. **Implementing recursive dictionary-to-object converters**
3. **Manually sanitizing dictionary keys** to make them Python-safe
4. **Writing boilerplate** for JSON/YAML configuration loading
5. **Creating frozen/immutable configuration objects** from dicts
6. **Implementing safe nested dictionary access** with try/except blocks
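A minimal sketch of the first two items; the hand-rolled `AttrDict` below is a hypothetical stand-in for such wrappers:
```python
from box import Box
# The kind of wrapper people hand-roll - it only works one level deep
class AttrDict(dict):
    __getattr__ = dict.__getitem__
raw = {"server": {"host": "localhost", "port": 8080}}
wrapped = AttrDict(raw)
print(wrapped.server["host"])   # nested values are still plain dicts
# wrapped.server.host           # would raise AttributeError
cfg = Box(raw)
print(cfg.server.host)          # 'localhost' - Box recurses into nested dicts
```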
## Installation
```bash
# Basic installation (no serialization dependencies)
pip install python-box~=7.0
# With all dependencies (YAML, TOML, msgpack)
pip install python-box[all]~=7.0
# With specific dependencies
pip install python-box[yaml]~=7.0 # PyYAML or ruamel.yaml
pip install python-box[toml]~=7.0 # tomli/tomli-w
pip install python-box[msgpack]~=7.0 # msgpack
# Optimized version with Cython (requires build tools)
pip install Cython wheel
pip install python-box[all]~=7.0 --force
```
**Version Pinning:** Always use compatible release matching (`~=7.0`) as Box follows semantic versioning. Check @<https://github.com/cdgriffith/Box/wiki/Major-Version-Breaking-Changes> before upgrading major versions.
## Python Version Compatibility
- **Minimum:** Python 3.9
- **Supported:** Python 3.9, 3.10, 3.11, 3.12, 3.13
- **Dropped Support:** Python 3.8 (removed in v7.3.0, EOL)
- **Python 3.14:** Expected compatibility (based on current trajectory)
**Cython Optimization:** Available for x86_64 platforms. Loading large datasets can be up to 10x faster with the Cython-compiled version.
## Core Features & Usage Examples
### 1. Basic Box Usage
```python
from box import Box
# Create from dict
movie_box = Box({
"Robin Hood: Men in Tights": {
"imdb_stars": 6.7,
"length": 104
}
})
# Automatic key conversion for dot notation
# Spaces become underscores, special chars removed
movie_box.Robin_Hood_Men_in_Tights.imdb_stars # 6.7
# Standard dict access still works
movie_box["Robin Hood: Men in Tights"]["length"] # 104
# Both are equivalent
assert movie_box.Robin_Hood_Men_in_Tights.imdb_stars == \
movie_box["Robin Hood: Men in Tights"]["imdb_stars"]
```
### 2. Configuration Management with ConfigBox
```python
from box import ConfigBox
import os
# Load environment-specific configuration
config_data = {
"development": {
"database": {
"host": "localhost",
"port": 5432,
"pool_size": 5
},
"debug": True
},
"production": {
"database": {
"host": "prod-db.server.com",
"port": 5432,
"pool_size": 20
},
"debug": False
}
}
# Select environment
env = os.getenv("APP_ENV", "development")
config = ConfigBox(config_data[env])
print(f"Database Host for {env}: {config.database.host}")
print(f"Pool Size: {config.database.pool_size}")
print(f"Debug Mode: {config.debug}")
```
### 3. JSON/YAML/TOML Serialization
```python
from box import Box
# From JSON
config = Box.from_json(filename="config.json")
# From YAML
config = Box.from_yaml(filename="config.yaml")
# From TOML
config = Box.from_toml(filename="config.toml")
# To JSON
config.to_json(filename="output.json", indent=2)
# To YAML
config.to_yaml(filename="output.yaml")
# To dict (for standard JSON serialization)
import json
json.dumps(config.to_dict())
```
### 4. Default Box for Safe Access
```python
from box import Box
# Create a Box that returns empty Boxes for missing keys instead of raising
config = Box(default_box=True)
# Access non-existent nested keys safely
# Instead of KeyError, creates empty Box objects
config.api.endpoints.users = "/api/v1/users"
config.api.endpoints.posts = "/api/v1/posts"
# Check existence
if config.cache.enabled:
print("Cache is enabled")
else:
print("Cache not configured") # This prints
```
### 5. FrozenBox for Immutability
```python
from box import Box
# Create mutable box
config = Box({"debug": True, "timeout": 30})
config.debug = False # Allowed
# Freeze it by re-creating the Box with frozen_box=True
frozen_config = Box(config, frozen_box=True)
# or build it frozen from the start
frozen_config = Box({"debug": True}, frozen_box=True)
# Attempts to modify raise BoxError
try:
frozen_config.debug = False
except Exception as e:
print(f"Error: {e}") # BoxError: Box is frozen
```
### 6. Box Variants
```python
from box import Box, BoxList
# camel_killer_box option - exposes camelCase keys as snake_case attributes
config = Box({"apiEndpoint": "https://api.example.com"}, camel_killer_box=True)
config.api_endpoint # Works!
# BoxList - list of Box objects
users = BoxList([
{"name": "Alice", "age": 30},
{"name": "Bob", "age": 25}
])
users[0].name # "Alice"
users[1].age # 25
# box_dots option - Box with dots in keys
config = Box({"api.version": "v2"}, box_dots=True)
config["api.version"] # Access with dots in key
```
## Real-World Usage Patterns
### Pattern 1: Application Configuration
```python
# config/settings.py
from box import ConfigBox
from pathlib import Path
def load_config(env: str = "development") -> ConfigBox:
"""Load environment-specific configuration."""
config_path = Path(__file__).parent / f"{env}.yaml"
return ConfigBox.from_yaml(filename=config_path)
# usage
config = load_config(os.getenv("ENVIRONMENT", "development"))
db_url = f"postgresql://{config.database.host}:{config.database.port}"
```
### Pattern 2: API Response Handling
```python
# Instead of dealing with nested dicts from API responses
import requests
from box import Box
response = requests.get("https://api.example.com/user/123")
user_data = Box(response.json())
# Clean access to nested data
print(f"User: {user_data.profile.name}")
print(f"Email: {user_data.contact.email}")
print(f"Company: {user_data.employment.company.name}")
# vs traditional dict access:
# print(f"User: {response.json()['profile']['name']}")
```
### Pattern 3: Argparse Integration
```python
import argparse
from box import Box
parser = argparse.ArgumentParser()
parser.add_argument('floats', metavar='N', type=float, nargs='+')
parser.add_argument("-v", "--verbosity", action="count", default=0)
# Parse into Box instead of Namespace
args = parser.parse_args(['1', '2', '3', '-vv'], namespace=Box())
# Can now use as dict or object
print(args.floats) # [1.0, 2.0, 3.0]
print(args.verbosity) # 2
# Easy to pass as kwargs
def process(**kwargs):
print(kwargs)
process(**args.to_dict())
```
## Integration Patterns
### JSON Configuration Files
```python
from box import Box
# config.json
# {
# "app": {
# "name": "MyApp",
# "version": "1.0.0"
# },
# "features": {
# "auth": true,
# "cache": false
# }
# }
config = Box.from_json(filename="config.json")
if config.features.auth:
setup_authentication()
```
### YAML Configuration Files
```python
from box import Box
# config.yaml
# database:
# host: localhost
# port: 5432
# credentials:
# username: admin
# password: secret
config = Box.from_yaml(filename="config.yaml")
db_conn = connect(
host=config.database.host,
port=config.database.port,
user=config.database.credentials.username,
password=config.database.credentials.password
)
```
### TOML Configuration Files
```python
from box import Box
# pyproject.toml or config.toml
# [tool.myapp]
# name = "MyApp"
# version = "1.0.0"
#
# [tool.myapp.database]
# host = "localhost"
# port = 5432
config = Box.from_toml(filename="pyproject.toml")
app_name = config.tool.myapp.name
db_host = config.tool.myapp.database.host
```
## When NOT to Use python-box
### 1. Performance-Critical Code
```python
# DON'T use Box in tight loops or performance hotspots
# Box has overhead for attribute access and conversion
# Bad: Hot loop with Box
results = Box()
for i in range(1_000_000):
results[f"key_{i}"] = compute_value(i) # Overhead!
# Good: Use regular dict, convert after if needed
results = {}
for i in range(1_000_000):
results[f"key_{i}"] = compute_value(i)
results = Box(results) # Convert once
```
### 2. When Dict Protocol is Required
```python
# Some libraries expect strict dict instances
import json
from box import Box
config = Box({"key": "value"})
# Box is a dict subclass, so json.dumps(config) usually works, but code that
# checks type(x) is dict (or C-level serializers) may reject it
# Use .to_dict() to convert back to a plain dict
json.dumps(config.to_dict()) # Safe
```
### 3. Simple, Flat Dictionaries
```python
# DON'T use Box for simple flat dicts without nesting
# Regular dict is simpler and faster
# Overkill
simple = Box({"name": "Alice", "age": 30})
print(simple.name)
# Better
simple = {"name": "Alice", "age": 30}
print(simple["name"])
```
### 4. When Key Names Match Python Keywords
```python
# Be careful with Python keywords as attributes
from box import Box
# This works but is awkward
data = Box({"class": "A", "type": "object"})
data["class"] # Must use bracket notation
# data.class # SyntaxError!
# Better: Use regular dict or rename keys
data = {"class_name": "A", "type_name": "object"}
```
## Decision Matrix: Box vs dict vs dataclass
| Scenario | Use Box | Use dict | Use dataclass |
| ----------------------------------- | --------------- | -------------------- | -------------------- |
| **Configuration files** (JSON/YAML) | ✅ Excellent | ❌ Verbose | ⚠️ Needs validation |
| **API response handling** | ✅ Excellent | ❌ Verbose | ❌ Schema unknown |
| **Nested data structures** | ✅ Excellent | ⚠️ Works but verbose | ✅ Good with nesting |
| **Type checking/IDE support** | ❌ Dynamic only | ❌ Dynamic only | ✅ Full typing |
| **Performance critical code** | ❌ Overhead | ✅ Fastest | ✅ Fast |
| **Immutable configuration** | ✅ FrozenBox | ❌ No built-in | ✅ frozen=True |
| **Dynamic key names** | ✅ Flexible | ✅ Flexible | ❌ Fixed attrs |
| **Need serialization helpers** | ✅ Built-in | ⚠️ Manual | ⚠️ Manual |
| **Simple flat structures** | ⚠️ Overkill | ✅ Perfect | ✅ Good |
| **Unknown data structure** | ✅ Flexible | ✅ Flexible | ❌ Needs schema |
## Decision Guidance
### Use Box When
1. **Working with configuration files** (YAML, JSON, TOML)
2. **Handling nested API responses** with deep structures
3. **You want cleaner dot notation** instead of brackets
4. **Converting between dict and JSON/YAML frequently**
5. **Need automatic nested dict conversion**
6. **Working with data from external sources** (APIs, config files)
7. **Prototyping or rapid development** where flexibility matters
### Use dict When
1. **Performance is critical** (tight loops, hot paths)
2. **Simple, flat data structures**
3. **Working with libraries expecting strict dict protocol**
4. **You need maximum compatibility** with standard library
5. **Memory efficiency is paramount** (minimal overhead)
### Use dataclass When
1. **Type safety and IDE autocomplete** are critical (see the dataclass sketch after this list)
2. **Data structure is well-defined and stable**
3. **You want validation** (with pydantic or attrs)
4. **Building APIs or libraries** with clear contracts
5. **Need immutability** with frozen=True
6. **Working in type-checked codebases** (mypy, pyright)
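As a point of contrast, a minimal typed version of the earlier database configuration might look like this sketch (field names mirror the Box examples above and are illustrative):

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DatabaseConfig:
    host: str = "localhost"
    port: int = 5432


@dataclass(frozen=True)
class AppConfig:
    database: DatabaseConfig = field(default_factory=DatabaseConfig)
    debug: bool = False


config = AppConfig()
print(config.database.host)  # statically typed, IDE-completable, immutable
```

The trade-off is the inverse of Box: full type checking and autocompletion, but every key must be declared up front.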
## Example Projects Using python-box
Based on GitHub code search @<https://github.com/search?q=%22from+box+import+Box%22&type=code>, python-box is commonly used in:
1. **Machine Learning/AI Projects**
- Configuration management for model training
- Hyperparameter storage
- Experiment tracking configurations
2. **Web Applications**
- Flask/FastAPI configuration handling
- API response processing
- Environment-specific settings
3. **Data Science**
- Notebook configuration management
- Dataset metadata handling
- Pipeline configurations
4. **DevOps/Infrastructure**
- Terraform/Ansible configuration processing
- CI/CD pipeline configurations
- Container orchestration configs
## Performance Considerations
### Cython Optimization
```bash
# For x86_64 platforms, install with Cython for ~10x faster loading
pip install Cython wheel
pip install python-box[all]~=7.0 --upgrade --force
# For non-x86_64, you'll need:
# - Python development files (python3-dev/python3-devel)
# - System compiler (gcc, clang)
# - Cython and wheel packages
```
### Memory vs Convenience Trade-off
```python
# Box adds wrapper and conversion overhead on top of dict; the exact memory cost depends on the structure
import sys
from box import Box
# Regular dict
data = {"key": "value"}
print(sys.getsizeof(data)) # ~240 bytes
# Box wrapper
box_data = Box({"key": "value"})
print(sys.getsizeof(box_data)) # similar shallow size - getsizeof does not capture the conversion/bookkeeping overhead
# For large datasets, convert to Box after processing
large_data = {}
for i in range(10000):
large_data[f"key_{i}"] = process_data(i)
# Convert once after collection
config = Box(large_data)
```
## Common Pitfalls & Solutions
### Pitfall 1: Attribute vs Key Confusion
```python
from box import Box
config = Box({"class": "A", "type": "B"})
# Problem: Python keywords can't be attributes
# config.class # SyntaxError!
# Solution: Use bracket notation
config["class"] # Works
# Or rename keys during creation
config = Box({"class_name": "A", "type_name": "B"})
config.class_name # Works
```
### Pitfall 2: Modification of Frozen Box
```python
from box import Box
# Frozen box prevents all modifications
config = Box({"debug": True}, frozen_box=True)
# These all fail with BoxError
# config.debug = False
# config.new_key = "value"
# config["debug"] = False
# Solution: Create unfrozen copy
mutable_config = Box(config.to_dict())
mutable_config.debug = False # Works
```
### Pitfall 3: Conversion Overhead
```python
from box import Box
# Problem: creating a Box per iteration in a tight loop
def process_items(items):
    results = []
    for item in items:
        item_box = Box(item)  # conversion overhead on every iteration
        results.append(item_box.value * 2)
    return results
# Solution: keep the hot loop on plain dicts, convert once afterwards if needed
def process_items_better(items):
    results = [item["value"] * 2 for item in items]  # plain dict access in the loop
    return Box({"results": results})  # single conversion at the end
```
## Version History & Breaking Changes
- **v7.3.2** (2025-01-16): Latest stable release
- Bug fixes for box_dots and default_box_create_on_get
- **v7.3.0** (2024-12-10): Python 3.13 support added
- Dropped Python 3.8 support (EOL)
- **v7.2.0** (2024-06-12): Python 3.12 support
- Numpy-style tuple indexing for BoxList
- **v7.0.0**: Major version with breaking changes
**Breaking Changes:** @<https://github.com/cdgriffith/Box/wiki/Major-Version-Breaking-Changes>
Always check release notes before upgrading major versions.
## Related Libraries & Alternatives
| Library | Use Case | vs python-box |
| ------------------------- | ----------------------------- | ----------------------------------- |
| **types.SimpleNamespace** | Simple attribute access | Built-in, but no dict methods |
| **munch** | Dot notation dict | Fewer features, unmaintained |
| **addict** | Dict subclass with dot access | Similar, less popular |
| **pydantic** | Validated data structures | Type-safe, validation, more complex |
| **attrs/dataclasses** | Structured data | Type-safe, but not for dynamic data |
| **DynaBox** | Similar to Box | Less mature |
**When to use Box over alternatives:**
- Need dict compatibility + dot notation
- Working with JSON/YAML config files
- Don't need static type checking
- Want automatic nested conversion
## Additional Resources
- **Official Wiki:** @<https://github.com/cdgriffith/Box/wiki>
- **Quick Start:** @<https://github.com/cdgriffith/Box/wiki/Quick-Start>
- **Types of Boxes:** @<https://github.com/cdgriffith/Box/wiki/Types-of-Boxes>
- **Converters:** @<https://github.com/cdgriffith/Box/wiki/Converters>
- **Installation Guide:** @<https://github.com/cdgriffith/Box/wiki/Installation>
- **PyPI Package:** @<https://pypi.org/project/python-box/>
- **GitHub Issues:** @<https://github.com/cdgriffith/Box/issues>
## Contributing & Support
**Maintainer:** Chris Griffith (@cdgriffith) **Contributors:** @<https://github.com/cdgriffith/Box/blob/master/AUTHORS.rst> **Issues/Questions:** @<https://github.com/cdgriffith/Box/issues>
The library is actively maintained with regular releases and responsive issue handling.
---
**Research Sources:**
- @<https://github.com/cdgriffith/Box> (Official Repository)
- @<https://github.com/cdgriffith/Box/wiki> (Official Documentation)
- @<https://pypi.org/project/python-box/> (Package Registry)
- @<https://medium.com/@post.gourang/simplifying-configuration-management-in-python-with-configbox-90df67d26bce> (Tutorial)
- GitHub Code Search for real-world usage examples
**Last Updated:** 2025-10-21 **Research Quality:** High - Based on official documentation, source code analysis, and real-world usage patterns

View File

@@ -0,0 +1,800 @@
---
title: "Copier: Project Template Renderer with Update Capabilities"
library_name: copier
pypi_package: copier
category: project_templating
python_compatibility: "3.9+"
last_updated: "2025-11-02"
official_docs: "https://copier.readthedocs.io"
official_repository: "https://github.com/copier-org/copier"
maintenance_status: "active"
---
# Copier: Project Template Renderer with Update Capabilities
## Executive Summary
**What problem does it solve?** Copier solves the problem of project scaffolding AND ongoing template synchronization. Unlike most templating tools that are one-way generators, Copier enables **code lifecycle management** - you can update existing projects when the template evolves, not just generate new projects.
**Core value proposition:**
- Generate projects from templates (scaffolding)
- **Update projects when templates change** (unique feature)
- Version-aware migrations during updates
- Works with local paths and Git URLs
- Preserves customizations during updates
**When you'd be "reinventing the wheel" without it:**
- Maintaining multiple similar projects that need to stay in sync with best practices
- Rolling out security updates or dependency changes across many projects
- Applying organizational standards to existing codebases
- Managing project boilerplate that evolves over time
## Official Information
- **Repository**: @<https://github.com/copier-org/copier>
- **PyPI Package**: `copier` (current: v9.10.3)
- **Documentation**: @<https://copier.readthedocs.io/>
- **License**: MIT
- **Maintenance**: Active development, 2,880+ stars
- **Original Author**: jpsca (Juan-Pablo Scaletti)
- **Current Maintainers**: yajo, pawamoy, sisp, and community
## Installation
```bash
# As CLI tool (recommended)
pipx install copier
# or with uv
uv tool install copier
# As library
pip install copier
# With conda
conda install -c conda-forge copier
# Nix (100% reproducible)
nix profile install 'https://flakehub.com/f/copier-org/copier/*.tar.gz'
# Homebrew (macOS/Linux)
brew install copier
```
**Requirements:**
- Python 3.9 or newer
- Git 2.27 or newer (for template versioning and updates)
## Python Version Compatibility
| Python Version | Support Status | Notes |
| -------------- | -------------------- | ---------------------------------------------- |
| 3.9 - 3.12 | ✅ Full support | Production ready |
| 3.13 | ✅ Supported | v9.10.2+ built with Python 3.13.7 |
| 3.14 | ⚠️ Likely compatible | Not explicitly tested, but backward-compatible |
| < 3.9 | ❌ Not supported | Use older Copier versions |
_Source: @<https://github.com/copier-org/copier/blob/master/pyproject.toml> (classifiers section)_
## Core Purpose: When to Use Copier
### Primary Use Cases
1. **Project Scaffolding with Future Updates**
- Generate new projects from templates
- Apply template updates to existing projects
- Track which template version each project uses
2. **Multi-Project Standardization**
- Maintain consistency across microservices
- Roll out organization-wide best practices
- Synchronize CI/CD configurations
3. **Living Templates**
- Templates that evolve with ecosystem changes
- Security patches propagated to all projects
- Dependency updates across project families
4. **Template Versioning**
- Use Git tags to version templates
- Selective updates to specific versions
- Smart diff between template versions
### Copier vs Cookiecutter vs Yeoman
**Use Copier when:**
- ✅ You need to update projects after generation
- ✅ You manage multiple similar projects
- ✅ Your template evolves frequently
- ✅ You want migration scripts during updates
- ✅ You prefer YAML over JSON configuration
**Use Cookiecutter when:**
- ✅ You only need one-time generation
- ✅ You want the largest template ecosystem
- ✅ You need maximum stability (mature project)
- ✅ Template updates aren't important
**Use Yeoman when:**
- ✅ You're in the Node.js ecosystem
- ✅ You want NPM package distribution
- ✅ You need JavaScript-based logic
| Feature | Copier | Cookiecutter | Yeoman |
| ------------------------ | ----------------------- | ---------------------- | ----------- |
| **Template Updates** | ✅ Yes | ❌ No (requires Cruft) | ❌ No |
| **Migrations** | ✅ Yes | ❌ No | ❌ No |
| **Config Format** | YAML | JSON | JavaScript |
| **Templating** | Jinja2 | Jinja2 | EJS |
| **Programming Required** | ❌ No | ❌ No | ✅ Yes (JS) |
| **Template Suffix** | `.jinja` (configurable) | None | You choose |
| **File Name Templating** | ✅ Yes | ✅ Yes | ✅ Yes |
| **Ecosystem Size** | Medium | Large | Large |
| **Maturity** | Active | Mature | Mature |
_Source: @<https://github.com/copier-org/copier/blob/master/docs/comparisons.md>_
## Real-World Examples
### 1. FastAPI Full-Stack Template
**Repository**: @<https://github.com/fastapi/full-stack-fastapi-template> (38,000+ stars)
```bash
# Generate a new FastAPI project
pipx run copier copy https://github.com/fastapi/full-stack-fastapi-template my-project --trust
```
**Features demonstrated:**
- Multi-service Docker setup
- PostgreSQL integration
- React frontend scaffolding
- Environment variable templating
- Post-generation tasks
**Template snippet** (@<https://github.com/fastapi/full-stack-fastapi-template/blob/main/copier.yml>):
```yaml
project_name:
type: str
help: The name of the project
default: FastAPI Project
secret_key:
type: str
help: |
The secret key for the project, generate with:
python -c "import secrets; print(secrets.token_urlsafe(32))"
default: changethis
_tasks:
- ["{{ _copier_python }}", .copier/update_dotenv.py]
```
### 2. Modern Python Package Template (copier-uv)
**Repository**: @<https://github.com/pawamoy/copier-uv> (108 stars)
```bash
# Create a uv-managed Python package
copier copy gh:pawamoy/copier-uv /path/to/project
```
**Features demonstrated:**
- uv package manager integration
- Jinja extensions (custom filters)
- Git integration (auto-detect author)
- License selection (20+ options)
- Multi-file configuration includes
**Advanced configuration** (@<https://github.com/pawamoy/copier-uv/blob/main/copier.yml>):
```yaml
_min_copier_version: "9"
_jinja_extensions:
- copier_template_extensions.TemplateExtensionLoader
- extensions.py:CurrentYearExtension
- extensions.py:GitExtension
- extensions.py:SlugifyExtension
author_fullname:
type: str
help: Your full name
default: "{{ 'Default Name' | git_user_name }}"
repository_name:
type: str
default: "{{ project_name | slugify }}"
```
### 3. NLeSC Scientific Python Template
**Repository**: @<https://github.com/NLeSC/python-template> (223 stars)
**Features demonstrated:**
- Modular configuration (YAML includes)
- Profile-based generation
- Research software best practices
- Citation files (CITATION.cff)
**Modular structure** (@<https://github.com/NLeSC/python-template/blob/main/copier.yml>):
```yaml
# Include pattern for maintainability
!include copier/settings.yml
!include copier/profiles.yml
!include copier/questions/essential.yml
!include copier/questions/features_code_quality.yml
!include copier/questions/features_documentation.yml
```
### 4. JupyterLab Extension Template
**Repository**: @<https://github.com/jupyterlab/extension-template> (77 stars)
```bash
pip install "copier~=9.2" jinja2-time
copier copy --trust https://github.com/jupyterlab/extension-template .
```
### 5. Odoo/Doodba Template
**Repository**: @<https://github.com/Tecnativa/doodba-copier-template> (104 stars)
- Complex multi-container applications
- Multiple answer files for different template layers
## Integration Patterns
### Git Integration
**Template versioning with Git tags:**
```bash
# Copy specific version
copier copy --vcs-ref=v1.2.0 gh:org/template /path/to/project
# Copy latest release (default)
copier copy gh:org/template /path/to/project
# Copy from HEAD (including uncommitted changes)
copier copy --vcs-ref=HEAD ./template /path/to/project
```
**Update to latest template version:**
```bash
cd /path/to/project
copier update # Reads .copier-answers.yml automatically
```
**Update to specific version:**
```bash
copier update --vcs-ref=v2.0.0
```
### Template Updates Workflow
The **killer feature** that distinguishes Copier:
```mermaid
graph TD
A[Template v1.0] --> B[Generate Project]
B --> C[.copier-answers.yml]
A --> D[Template v2.0]
D --> E[copier update]
C --> E
E --> F[Smart 3-way Merge]
F --> G[Updated Project]
F --> H[Migration Scripts]
H --> G
```
**The update process:**
1. Copier clones the template at the old version recorded in `.copier-answers.yml` (inspected in the sketch after this list)
2. Regenerates project with old template
3. Compares to current project (detects your changes)
4. Clones template at new version
5. Generates with new template
6. Creates 3-way merge between: old template → your project ← new template
7. Runs migration tasks for version transitions
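The answers file that drives step 1 is plain YAML, so it can be inspected before an update; a minimal sketch (assumes PyYAML is installed and the default answers-file name, with illustrative values):

```python
from pathlib import Path

import yaml  # PyYAML, assumed to be available

# Copier records the template source and pinned version under _src_path and _commit
answers = yaml.safe_load(Path(".copier-answers.yml").read_text())
print(f"Template {answers['_src_path']} currently pinned at {answers['_commit']}")
```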
**Example migration** (from Copier docs):
```yaml
# copier.yml
_migrations:
- version: v2.0.0
command: rm ./old-folder
when: "{{ _stage == 'before' }}"
- invoke migrate $VERSION_FROM $VERSION_TO
```
### CI/CD Integration
**GitHub Actions example:**
```yaml
name: Update from template
on:
schedule:
- cron: "0 0 * * 0" # Weekly
workflow_dispatch:
jobs:
update:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: astral-sh/setup-uv@v5
- run: uv tool install copier
- run: copier update --defaults --vcs-ref=HEAD
- run: |
git config user.name "Bot"
git config user.email "bot@example.com"
git checkout -b update-template
git add -A
git commit -m "Update from template"
git push origin update-template
- uses: peter-evans/create-pull-request@v5
```
### Multiple Templates per Project
Apply different templates to different aspects:
```bash
# Base framework
copier copy -a .copier-answers.main.yml \
gh:example/framework-template .
# Pre-commit config
copier copy -a .copier-answers.pre-commit.yml \
gh:my-org/pre-commit-template .
# Internal CI
copier copy -a .copier-answers.ci.yml \
git@gitlab.internal.com:templates/ci .
```
Each gets its own answers file, enabling independent updates:
```bash
copier update -a .copier-answers.main.yml
copier update -a .copier-answers.pre-commit.yml
copier update -a .copier-answers.ci.yml
```
## Usage Examples
### Basic Template Creation
**Minimal template structure:**
```text
my_template/
├── copier.yml # Template configuration
├── .git/ # Git repo (for versioning)
├── {{project_name}}/ # Templated folder name
│ └── {{module_name}}.py.jinja # Templated file
└── {{_copier_conf.answers_file}}.jinja # Answers file
```
**copier.yml** (question definitions):
```yaml
project_name:
type: str
help: What is your project name?
module_name:
type: str
help: What is your Python module name?
default: "{{ project_name | lower | replace('-', '_') }}"
python_version:
type: str
help: Minimum Python version
default: "3.9"
choices:
- "3.9"
- "3.10"
- "3.11"
- "3.12"
- "3.13"
```
**Templated Python file** (`{{project_name}}/{{module_name}}.py.jinja`):
```python
"""{{ project_name }} - A Python package."""
__version__ = "0.1.0"
def hello() -> str:
"""Return a greeting."""
return "Hello from {{ module_name }}!"
```
**Answers file template** (`{{_copier_conf.answers_file}}.jinja`):
```yaml
# Changes here will be overwritten by Copier
{{ _copier_answers | to_nice_yaml -}}
```
### Generating a Project
**From local template:**
```bash
copier copy /path/to/template /path/to/destination
```
**From Git URL:**
```bash
copier copy https://github.com/org/template /path/to/destination
# or shorthand
copier copy gh:org/template /path/to/destination
copier copy gl:org/template /path/to/destination # GitLab
```
**With pre-answered questions:**
```bash
copier copy \
--data project_name="My Project" \
--data module_name="my_project" \
gh:org/template /path/to/destination
```
**From data file:**
```bash
# answers.yml
project_name: My Project
module_name: my_project
python_version: "3.11"
# Use it
copier copy --data-file answers.yml gh:org/template /path/to/destination
```
### Programmatic Usage
```python
from copier import run_copy, run_update
# Generate new project
run_copy(
"https://github.com/org/template.git",
"/path/to/destination",
data={"project_name": "My Project"},
vcs_ref="v1.0.0", # Specific version
)
# Update existing project
run_update(
"/path/to/destination",
vcs_ref="v2.0.0", # Update to v2.0.0
skip_answered=True, # Don't re-ask answered questions
)
```
### Advanced Template Features
**Conditional file generation:**
```yaml
# copier.yml
use_docker:
type: bool
help: Include Docker support?
default: true
```
**File/folder structure:**
```text
template/
{% if use_docker %}Dockerfile{% endif %}.jinja
{% if use_docker %}docker-compose.yml{% endif %}.jinja
```
**Dynamic choices:**
```yaml
language:
type: str
choices:
- python
- javascript
package_manager:
type: str
help: Which package manager?
choices: |
{%- if language == "python" %}
- pip
- uv
- poetry
{%- else %}
- npm
- yarn
- pnpm
{%- endif %}
```
**File exclusion:**
```yaml
_exclude:
- "*.pyc"
- __pycache__
- .git
- .venv
- "{% if not use_docker %}docker-*{% endif %}"
```
**Post-generation tasks:**
```yaml
_tasks:
- git init
- git add -A
- git commit -m "Initial commit from template"
- ["{{ _copier_python }}", -m pip install -e .]
```
**Jinja2 extensions:**
```yaml
_jinja_extensions:
- copier_templates_extensions.TemplateExtensionLoader
- jinja2_time.TimeExtension
# Install with:
# pipx inject copier copier-templates-extensions jinja2-time
```
### Updating Projects
**Update to latest version:**
```bash
cd /path/to/project
copier update
```
**Update with conflict resolution:**
```bash
# Inline conflicts (default)
copier update --conflict=inline
# .rej files (like patch)
copier update --conflict=rej
```
**Re-answer questions:**
```bash
# Re-answer all questions
copier update --vcs-ref=:current:
# Skip previously answered
copier update --skip-answered
```
**Update without interactive prompts:**
```bash
# Use defaults/existing answers
copier update --defaults
# Override specific values
copier update --data python_version="3.12"
```
## When NOT to Use Copier
### Simple File Copying
**Don't use Copier for:**
```bash
# Just copy static files
cp -r template_dir new_project
```
**Use basic tools instead:**
- `cp` for simple directory copying
- `rsync` for file synchronization
- Git clone for exact repository copies
### One-Time Generation Without Updates
If you never plan to update from the template:
- Cookiecutter has larger ecosystem
- Yeoman for Node.js projects
- Manual copying might suffice
### Complex Conditional Logic
**Not ideal for:**
- Heavy business logic in templates
- Complex data transformations
- Runtime configuration (use proper config libraries)
**Use instead:**
- Python scripts for complex logic
- Dedicated config management (Dynaconf, python-decouple)
- Application frameworks (Django, FastAPI built-in scaffolding)
### Single Project Maintenance
If you only maintain one project:
- Template overhead isn't justified
- Direct edits are simpler
- No synchronization benefits
### Non-Text Files
Copier focuses on text file templating:
- Binary files copied as-is
- No image/binary manipulation
- No archive extraction
### Version Control Conflicts
⚠️ **Be cautious when:**
- Project has diverged significantly from template
- Many conflicting changes expected
- Team unfamiliar with 3-way merge resolution
**Mitigation:**
- Test updates in separate branch
- Use `--conflict=rej` for manual review
- Document update procedures
## Decision Matrix
### Use Copier When
| Scenario | Why Copier? |
| ---------------------------------- | ------------------------------------------------------------ |
| Managing 5+ similar microservices | Templates sync security patches across all services |
| Organizational standards evolving | Roll out changes without manual edits to each project |
| Onboarding new projects frequently | Consistent structure + ability to improve template over time |
| Template still experimental | Iterate template, update existing projects with improvements |
| CI/CD pipeline standardization | Update all projects when pipeline requirements change |
| Multi-repo architecture | Maintain consistency without monorepo complexity |
### Don't Use Copier When
| Scenario | Why Not? | Alternative |
| ------------------------------------------- | ---------------------------- | ------------------------------------- |
| Single project, no similar projects planned | Overhead > benefit | Direct editing |
| Template is 100% stable forever | Update feature unused | Cookiecutter (larger ecosystem) |
| Heavy runtime configuration needed | Wrong tool for job | Dynaconf, Pydantic Settings |
| Binary file manipulation required | Not designed for this | Pillow, custom scripts |
| Project has deviated >50% from template | Merge conflicts overwhelming | Manual migration |
| No Git repository for template | Can't track versions | Use Git or accept one-shot generation |
### Copier vs Cookiecutter Decision Tree
```text
Do you need to update projects after generation?
├─ YES → Use Copier
│ └─ Need version-aware migrations?
│ ├─ YES → Definitely Copier
│ └─ NO → Still Copier (future-proofing)
└─ NO → Consider factors:
├─ Prefer YAML config? → Copier
├─ Want larger template ecosystem? → Cookiecutter
├─ Need maximum stability? → Cookiecutter
└─ Might need updates later? → Copier (easier to start with)
```
## Best Practices
### Template Design
1. **Version your templates** - Use Git tags (v1.0.0, v2.0.0)
2. **Keep templates focused** - One concern per template
3. **Provide good defaults** - Minimize required answers
4. **Document migrations** - Explain breaking changes
5. **Test template updates** - Generate project, modify, update
### Project Maintenance
1. **Commit `.copier-answers.yml`** - Essential for updates
2. **Don't edit generated markers** - Copier overwrites them
3. **Test updates in branches** - Merge after verification
4. **Run migrations carefully** - Review before executing
5. **Document deviations** - Note why you diverge from template
### Organization Adoption
1. **Start with one template** - Prove value before expanding
2. **Automate update checks** - CI job for template freshness
3. **Train on merge conflicts** - 3-way merges need understanding
4. **Maintain template changelog** - Help consumers understand changes
5. **Version template conservatively** - Breaking changes = major version
## Common Gotchas
1. **Answers file location matters** - Must be committed and at project root
2. **Template suffix required by default** - Files need `.jinja` unless configured otherwise
3. **Git required for updates** - Template must be Git repository with tags
4. **Jinja syntax in YAML** - Must quote templated values properly
5. **Task execution order** - Tasks run sequentially, not in parallel
6. **Conflict resolution** - Learn 3-way merge basics before first update
## Performance Considerations
- **Generation speed**: Fast for typical projects (<1s for small templates)
- **Update speed**: Depends on project size and Git history
- **Memory usage**: Minimal, dominated by Git operations
- **Caching**: Template cloning cached by Git
## Related Tools
- **cruft** - Adds update capability to Cookiecutter templates
- **cookiecutter** - Popular Python templating (one-way generation)
- **yeoman** - Node.js ecosystem scaffolding
- **copier-templates-extensions** - Additional Jinja filters for Copier
- **jinja2-time** - Time-based Jinja filters
## Learning Resources
- Official docs: @<https://copier.readthedocs.io/>
- Template browser: @<https://github.com/topics/copier-template>
- Comparisons: @<https://github.com/copier-org/copier/blob/master/docs/comparisons.md>
- Example templates: See "Real-World Examples" section above
## Summary
**Copier is the best choice when:**
- You maintain multiple related projects
- Your templates evolve over time
- You need to propagate changes to existing projects
- You want version-aware template management
- You prefer declarative YAML configuration
**Copier's unique selling point:** The ability to update existing projects when templates change, with intelligent 3-way merging and version-aware migrations.
**Quick start for evaluation:**
```bash
# Install
pipx install copier
# Try popular template
copier copy gh:pawamoy/copier-uv test-project
# Make changes to project, then simulate update
cd test-project
# Edit some files...
copier update --defaults --vcs-ref=HEAD
```
---
**Research completed**: 2025-10-21 **Sources verified**: Official repository, PyPI, documentation, real-world templates **Template examples analyzed**: 5 major templates (FastAPI, copier-uv, NLeSC, JupyterLab, Doodba)

View File

@@ -0,0 +1,677 @@
---
title: "Datasette: Instant JSON API for Your SQLite Data"
library_name: datasette
pypi_package: datasette
category: data_exploration
python_compatibility: "3.10+"
last_updated: "2025-11-02"
official_docs: "https://docs.datasette.io"
official_repository: "https://github.com/simonw/datasette"
maintenance_status: "active"
---
# Datasette - Instant Data Publishing and Exploration
## Executive Summary
Datasette is an open-source tool for exploring and publishing data. It transforms any SQLite database into an interactive website with a full JSON API, requiring zero code. Designed for data journalists, museum curators, archivists, local governments, scientists, and researchers, Datasette makes data sharing and exploration accessible to anyone with data to publish.
**Core Value Proposition**: Take data of any shape or size and instantly publish it as an explorable website with a corresponding API, without writing application code.
## Official Information
- **Repository**: <https://github.com/simonw/datasette> @ simonw/datasette
- **PyPI**: `datasette` @ <https://pypi.org/project/datasette/>
- **Current Development Version**: 1.0a19 (alpha)
- **Current Stable Version**: 0.65.1
- **Documentation**: <https://docs.datasette.io/> @ docs.datasette.io
- **License**: Apache License 2.0 @ <https://github.com/simonw/datasette/blob/main/LICENSE>
- **Maintenance Status**: Actively maintained (647 open issues, last updated 2025-10-21)
- **Community**: Discord @ <https://datasette.io/discord>, Newsletter @ <https://datasette.substack.com/>
## What Problem Does Datasette Solve?
### The Problem
Organizations and individuals have valuable data in SQLite databases, CSV files, or other formats, but:
- Building a web interface to explore data requires significant development effort
- Creating APIs for data access requires backend development expertise
- Publishing data in an accessible, explorable format is time-consuming
- Sharing data insights requires custom visualization tools
- Data exploration often requires SQL knowledge or specialized tools
### The Solution
Datasette provides:
1. **Instant Web Interface**: Automatic web UI for any SQLite database
2. **Automatic API**: Full JSON API with no code required (queried in the sketch after this list)
3. **SQL Query Interface**: Built-in SQL editor with query sharing
4. **Plugin Ecosystem**: 300+ plugins for extending functionality @ <https://datasette.io/plugins>
5. **One-Command Publishing**: Deploy to cloud platforms with a single command
6. **Zero-Setup Exploration**: Browse, filter, and facet data immediately
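To illustrate point 2: once `datasette data.db` is running locally, every table is already a JSON endpoint. A minimal sketch (the database name `data`, table name `events`, and default port are placeholder assumptions):

```python
import json
from urllib.request import urlopen

# Datasette serves on http://127.0.0.1:8001 by default; ?_shape=objects returns rows as dicts
url = "http://127.0.0.1:8001/data/events.json?_shape=objects"
with urlopen(url) as response:
    rows = json.load(response)["rows"]

for row in rows[:5]:
    print(row)
```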
### What Would Be Reinventing the Wheel
Without Datasette, you would need to build:
- Custom web application for data browsing
- RESTful API endpoints for data access
- SQL query interface with security controls
- Data export functionality (JSON, CSV)
- Full-text search integration
- Authentication and authorization system
- Pagination and filtering logic
- Deployment configuration and hosting setup
**Example**: Publishing a dataset of 100,000 records would require weeks of development work. With Datasette: `datasette publish cloudrun mydata.db --service=mydata`
## Real-World Usage Patterns
### Pattern 1: Publishing Open Data (Government/Research)
**Context**: @ <https://github.com/simonw/covid-19-datasette>
```bash
# Convert CSV to SQLite
csvs-to-sqlite covid-data.csv covid.db
# Publish to Cloud Run with metadata
datasette publish cloudrun covid.db \
--service=covid-tracker \
--metadata metadata.json \
--install=datasette-vega
```
**Use Case**: Local governments publishing COVID-19 statistics, election results, or public records.
### Pattern 2: Personal Data Archives (Dogsheep Pattern)
**Context**: @ <https://github.com/dogsheep>
```bash
# Export Twitter data to SQLite
twitter-to-sqlite user-timeline twitter.db
# Export GitHub activity
github-to-sqlite repos github.db
# Export Apple Health data
healthkit-to-sqlite export.zip health.db
# Explore everything together
datasette twitter.db github.db health.db --crossdb
```
**Use Case**: Personal data liberation - exploring your own data from various platforms.
### Pattern 3: Data Journalism and Investigation
**Context**: @ <https://github.com/simonw/laion-aesthetic-datasette>
```python
# Load and explore LAION training data
import sqlite_utils
db = sqlite_utils.Database("images.db")
db["images"].insert_all(image_data)
db["images"].enable_fts(["caption", "url"])
# Then launch with a custom template (shell command, not Python):
#   datasette images.db --template-dir templates/ --metadata metadata.json
```
**Use Case**: Exploring large datasets like Stable Diffusion training data, analyzing patterns.
### Pattern 4: Internal Tools and Dashboards
**Context**: @ <https://github.com/rclement/datasette-dashboards>
```yaml
# datasette.yaml - Configure dashboards
databases:
analytics:
queries:
daily_users:
sql: |
SELECT date, count(*) as users
FROM events
WHERE event_type = 'login'
GROUP BY date
ORDER BY date DESC
title: Daily Active Users
```
**Installation**:
```bash
datasette install datasette-dashboards
datasette analytics.db --config datasette.yaml
```
**Use Case**: Building internal analytics dashboards without BI tools.
### Pattern 5: API Backend for Applications
**Context**: @ <https://github.com/simonw/datasette-graphql>
```bash
# Install GraphQL plugin
datasette install datasette-graphql
# Launch with authentication
datasette data.db \
--root \
--cors \
--setting default_cache_ttl 3600
```
**GraphQL Query**:
```graphql
{
products(first: 10, where: { price_gt: 100 }) {
nodes {
id
name
price
}
}
}
```
**Use Case**: Using Datasette as a read-only API backend for mobile/web apps.
## Integration Patterns
### Core Data Integrations
1. **SQLite Native**:
```python
import sqlite3
conn = sqlite3.connect('data.db')
# Datasette reads directly
```
2. **CSV/JSON Import** via `sqlite-utils` @ <https://github.com/simonw/sqlite-utils>:
```bash
sqlite-utils insert data.db records records.json
csvs-to-sqlite *.csv data.db
```
3. **Database Migration** via `db-to-sqlite` @ <https://github.com/simonw/db-to-sqlite>:
```bash
# Export from PostgreSQL
db-to-sqlite "postgresql://user:pass@host/db" data.db --table=events
# Export from MySQL
db-to-sqlite "mysql://user:pass@host/db" data.db --all
```
### Companion Libraries
- **sqlite-utils**: Database manipulation @ <https://github.com/simonw/sqlite-utils>
- **csvs-to-sqlite**: CSV import @ <https://github.com/simonw/csvs-to-sqlite>
- **datasette-extract**: AI-powered data extraction @ <https://github.com/datasette/datasette-extract>
- **datasette-parquet**: Parquet/DuckDB support @ <https://github.com/cldellow/datasette-parquet>
### Deployment Patterns
**Cloud Run** @ <https://docs.datasette.io/en/stable/publish.html>:
```bash
datasette publish cloudrun data.db \
--service=myapp \
--install=datasette-vega \
--install=datasette-cluster-map \
--metadata metadata.json
```
**Vercel** via `datasette-publish-vercel` @ <https://github.com/simonw/datasette-publish-vercel>:
```bash
pip install datasette-publish-vercel
datasette publish vercel data.db --project my-data
```
**Fly.io** via `datasette-publish-fly` @ <https://github.com/simonw/datasette-publish-fly>:
```bash
pip install datasette-publish-fly
datasette publish fly data.db --app=my-datasette
```
**Docker**:
```dockerfile
FROM datasetteproject/datasette
COPY *.db /data/
RUN datasette install datasette-vega
CMD datasette serve /data/*.db --host 0.0.0.0 --cors
```
## Python Version Compatibility
### Official Support Matrix
| Python Version | Status | Notes |
| -------------- | -------------------- | ----------------------------------- |
| 3.10 | **Minimum Required** | @ setup.py python_requires=">=3.10" |
| 3.11 | ✅ Fully Supported | Recommended for production |
| 3.12 | ✅ Fully Supported | Tested in CI |
| 3.13 | ✅ Fully Supported | Tested in CI |
| 3.14 | ✅ Fully Supported | Tested in CI |
| 3.9 and below | ❌ Not Supported | Deprecated as of v1.0 |
### Version-Specific Considerations
**Python 3.10+**:
- Uses `importlib.metadata` for plugin loading
- Native `match/case` statements in codebase (likely in v1.0+)
- Type hints using modern syntax
**Python 3.11+ Benefits**:
- Better async performance (important for ASGI)
- Faster startup times
- Improved error messages
**No Breaking Changes Expected**: Datasette maintains backward compatibility within major versions.
## Usage Examples
### Basic Usage
```bash
# Install
pip install datasette
# or
brew install datasette
# Serve a database
datasette data.db
# Open in browser automatically
datasette data.db -o
# Serve multiple databases
datasette db1.db db2.db db3.db
# Enable cross-database queries
datasette db1.db db2.db --crossdb
```
### Configuration Example
**metadata.json** @ <https://docs.datasette.io/en/stable/metadata.html>:
```json
{
"title": "My Data Project",
"description": "Exploring public datasets",
"license": "CC BY 4.0",
"license_url": "https://creativecommons.org/licenses/by/4.0/",
"source": "Data Sources",
"source_url": "https://example.com/sources",
"databases": {
"mydb": {
"tables": {
"events": {
"title": "Event Log",
"description": "System event records",
"hidden": false
}
}
}
}
}
```
**datasette.yaml** @ <https://docs.datasette.io/en/stable/configuration.html>:
```yaml
settings:
default_page_size: 50
sql_time_limit_ms: 3500
max_returned_rows: 2000
plugins:
datasette-cluster-map:
latitude_column: lat
longitude_column: lng
databases:
mydb:
queries:
popular_events:
sql: |
SELECT event_type, COUNT(*) as count
FROM events
GROUP BY event_type
ORDER BY count DESC
LIMIT 10
title: Most Popular Events
```
### Plugin Development Example
**Simple Plugin** @ <https://docs.datasette.io/en/stable/writing_plugins.html>:
```python
from datasette import hookimpl
@hookimpl
def prepare_connection(conn):
"""Add custom SQL functions"""
conn.create_function("is_even", 1, lambda x: x % 2 == 0)
@hookimpl
def extra_template_vars(request):
"""Add variables to templates"""
return {
"custom_message": "Hello from plugin!"
}
```
**setup.py**:
```python
setup(
name="datasette-my-plugin",
version="0.1",
py_modules=["datasette_my_plugin"],
entry_points={
"datasette": [
"my_plugin = datasette_my_plugin"
]
},
install_requires=["datasette>=0.60"],
)
```
### Advanced: Python API Usage
**Programmatic Access** @ <https://docs.datasette.io/en/stable/internals.html>:
```python
from datasette.app import Datasette
import asyncio
async def explore_data():
# Initialize Datasette
ds = Datasette(files=["data.db"])
# Execute query
result = await ds.execute(
"data",
"SELECT * FROM users WHERE age > :age",
{"age": 18}
)
# Access rows
for row in result.rows:
print(dict(row))
# Get table info
db = ds.get_database("data")
tables = await db.table_names()
print(f"Tables: {tables}")
asyncio.run(explore_data())
```
### Testing Plugins
**pytest Example** @ <https://docs.datasette.io/en/stable/testing_plugins.html>:
```python
import pytest
from datasette.app import Datasette
from datasette.database import Database
@pytest.mark.asyncio
async def test_homepage():
ds = Datasette(memory=True)
await ds.invoke_startup()
response = await ds.client.get("/")
assert response.status_code == 200
assert "<!DOCTYPE html>" in response.text
@pytest.mark.asyncio
async def test_json_api():
ds = Datasette(memory=True)
# Create test data
db = ds.add_database(Database(ds, memory_name="test"))
await db.execute_write(
"CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)"
)
# Query via API
response = await ds.client.get("/test/items.json")
assert response.status_code == 200
data = response.json()
assert data["rows"] == []
```
## When NOT to Use Datasette
### ❌ Scenarios Where Datasette Is Inappropriate
1. **High-Write Applications**
- Datasette is optimized for read-heavy workloads
- SQLite has write limitations with concurrent access
- **Better Alternative**: PostgreSQL with PostgREST, or Django REST Framework
2. **Real-Time Collaborative Editing**
- No built-in support for concurrent data editing
- Read-only by default (writes require plugins)
- **Better Alternative**: Airtable, Retool, or custom CRUD application
3. **Large-Scale Data Warehousing**
- SQLite works well up to ~100GB, struggles beyond
- Not designed for massive analytical workloads
- **Better Alternative**: DuckDB with MotherDuck, or BigQuery with Looker
4. **Complex BI Dashboards**
- Limited visualization capabilities without plugins
- Not a replacement for full BI platforms
- **Better Alternative**: Apache Superset @ <https://github.com/apache/superset>, Metabase @ <https://github.com/metabase/metabase>, or Grafana
5. **Transactional Systems**
- Not designed for OLTP workloads
- Limited transaction support
- **Better Alternative**: Django ORM with PostgreSQL, or FastAPI with SQLAlchemy
6. **User Authentication and Authorization**
- Basic auth support, but not a full auth system
- RBAC requires plugins and configuration
- **Better Alternative**: Use Datasette behind proxy with auth, or use Metabase for built-in user management
7. **Non-Relational Data**
- Optimized for relational SQLite data
- Document stores require workarounds
- **Better Alternative**: MongoDB with Mongo Express, or Elasticsearch with Kibana
### ⚠️ Use With Caution
1. **Sensitive Data Without Proper Access Controls**
- Default is public access
- Requires careful permission configuration
- **Mitigation**: Use `--root` for admin access, configure permissions @ <https://docs.datasette.io/en/stable/authentication.html>
2. **Production Without Rate Limiting**
- No built-in rate limiting
- Can be overwhelmed by traffic
- **Mitigation**: Deploy behind reverse proxy with rate limiting, or use Cloud Run with concurrency limits
## Decision Matrix
### ✅ Use Datasette When
| Scenario | Why Datasette Excels |
| -------------------------------------- | ---------------------------------------------------- |
| Publishing static/semi-static datasets | Zero-code instant publication |
| Data journalism and investigation | SQL interface + full-text search + shareable queries |
| Personal data exploration (Dogsheep) | Cross-database queries, plugin ecosystem |
| Internal read-only dashboards | Fast setup, minimal infrastructure |
| Prototyping data APIs | Instant JSON API, no backend code |
| Open data portals | Built-in metadata, documentation, CSV export |
| SQLite file exploration | Best-in-class SQLite web interface |
| Low-traffic reference data | Excellent for datasets < 100GB |
### ❌ Don't Use Datasette When
| Scenario | Why It's Not Suitable | Better Alternative |
| ----------------------------- | ---------------------------------------- | ---------------------------- |
| Building a CRUD application | Read-focused, limited write support | Django, FastAPI + SQLAlchemy |
| Real-time analytics | Not designed for streaming data | InfluxDB, TimescaleDB |
| Multi-tenant SaaS app | Limited isolation, no row-level security | PostgreSQL + RLS |
| Heavy concurrent writes | SQLite write limitations | PostgreSQL, MySQL |
| Terabyte-scale data | SQLite size constraints | DuckDB, BigQuery, Snowflake |
| Enterprise BI with governance | Limited data modeling layer | Looker, dbt + Metabase |
| Complex visualization needs | Basic charts without plugins | Apache Superset, Tableau |
| Document/graph data | Relational focus | MongoDB, Neo4j |
## Comparison with Alternatives
### vs. Apache Superset @ <https://github.com/apache/superset>
**When to use Superset over Datasette**:
- Need advanced visualizations (50+ chart types vs. basic plugins)
- Enterprise BI with complex dashboards
- Multiple data source types (not just SQLite)
- Large team collaboration with RBAC
**When to use Datasette over Superset**:
- Simpler deployment and setup
- Focus on data exploration over dashboarding
- Primarily working with SQLite databases
- Want instant API alongside web interface
### vs. Metabase @ <https://github.com/metabase/metabase>
**When to use Metabase over Datasette**:
- Need business user-friendly query builder
- Want built-in email reports and scheduling
- Require user management and permissions UI
- Need mobile app support
**When to use Datasette over Metabase**:
- Working primarily with SQLite
- Want plugin extensibility
- Need instant deployment (lighter weight)
- Want API-first design
### vs. Custom Flask/FastAPI Application
**When to build custom over Datasette**:
- Complex business logic required
- Heavy write operations
- Custom authentication flows
- Specific UX requirements
**When to use Datasette over custom**:
- Rapid prototyping (hours vs. weeks)
- Standard data exploration needs
- Focus on data, not application development
- Leverage plugin ecosystem
## Key Insights and Recommendations
### Core Strengths
1. **Speed to Value**: From data to published website in minutes
2. **Plugin Ecosystem**: 300+ plugins for extending functionality @ <https://datasette.io/plugins>
3. **API-First Design**: JSON API is a first-class citizen
4. **Deployment Simplicity**: One command to cloud platforms
5. **Open Source Community**: Active development, responsive maintainer
### Best Practices
1. **Use sqlite-utils for data prep** @ <https://github.com/simonw/sqlite-utils>:
```bash
sqlite-utils insert data.db table data.json --pk=id
sqlite-utils enable-fts data.db table column1 column2
```
2. **Configure permissions properly**:
```yaml
databases:
private:
allow:
id: admin_user
```
3. **Use immutable mode for static data**:
```bash
datasette data.db --immutable
```
4. **Leverage canned queries for common patterns**:
```yaml
queries:
search:
sql: SELECT * FROM items WHERE name LIKE :query
```
5. **Install datasette-hashed-urls for caching** @ <https://github.com/simonw/datasette-hashed-urls>:
```bash
datasette install datasette-hashed-urls
```
### Migration Path
**From spreadsheets to Datasette**:
```bash
csvs-to-sqlite data.csv data.db
datasette data.db
```
**From PostgreSQL to Datasette**:
```bash
db-to-sqlite "postgresql://user:pass@host/db" data.db
datasette data.db
```
**From Datasette to production app**:
- Use Datasette for prototyping and exploration
- Migrate to FastAPI/Django when write operations become critical
- Keep Datasette for read-only reporting interface
## Summary
Datasette excels at making data instantly explorable and shareable. It's the fastest path from data to published website with API. Use it for read-heavy workflows, data journalism, personal data archives, and rapid prototyping. Avoid it for write-heavy applications, enterprise BI, or large-scale data warehousing.
**TL;DR**: If you have data and want to publish it or explore it quickly without writing application code, use Datasette. If you need complex transactions, real-time collaboration, or enterprise BI features, choose a different tool.
## References
- Official Documentation @ <https://docs.datasette.io/>
- GitHub Repository @ <https://github.com/simonw/datasette>
- Plugin Directory @ <https://datasette.io/plugins>
- Context7 Documentation @ /simonw/datasette (949 code snippets)
- Dogsheep Project @ <https://github.com/dogsheep> (Personal data toolkit)
- Datasette Lite (WebAssembly) @ <https://lite.datasette.io/>
- Community Discord @ <https://datasette.io/discord>
- Newsletter @ <https://datasette.substack.com/>

View File

@@ -0,0 +1,701 @@
---
title: "Fabric: High-Level SSH Command Execution and Deployment"
library_name: fabric
pypi_package: fabric
category: ssh-automation
python_compatibility: "3.6+"
last_updated: "2025-11-02"
official_docs: "https://docs.fabfile.org"
official_repository: "https://github.com/fabric/fabric"
maintenance_status: "stable"
---
# Fabric: High-Level SSH Command Execution and Deployment
## Core Purpose
Fabric is a high-level Python library designed to execute shell commands remotely over SSH, yielding useful Python objects in return. It solves the problem of programmatic remote server management and deployment automation by providing a Pythonic interface to SSH operations.
### What Problem Does Fabric Solve?
Fabric eliminates the need to manually SSH into multiple servers and run commands repeatedly. It provides:
1. **Programmatic SSH Execution**: Execute commands on remote servers from Python code
2. **Multi-Host Management**: Run commands across multiple servers in parallel or serially
3. **File Transfer**: Upload and download files over SSH/SFTP
4. **Deployment Automation**: Orchestrate complex deployment workflows
5. **Task Definition**: Define reusable deployment tasks with the `@task` decorator
6. **Connection Management**: Handle SSH authentication, connection pooling, and error handling
### When Should You Use Fabric?
**Use Fabric when:**
- You need to execute commands on **remote servers** over SSH
- You're automating deployment processes (copying files, restarting services, running migrations)
- You need to manage multiple servers programmatically
- You want to define reusable deployment tasks in Python
- You're building continuous integration/deployment pipelines
- You need more than just subprocess (which only works locally)
**Use subprocess when:**
- You only need to run commands on your **local machine** (see the local-vs-remote sketch after these lists)
- You don't need SSH connectivity to remote hosts
- Your automation is purely local process execution
**Use Ansible when:**
- You need declarative configuration management across many hosts
- You require idempotency guarantees
- You need a large ecosystem of pre-built modules
- Your team prefers YAML over Python
- You're managing infrastructure state, not just running scripts
**Use Paramiko directly when:**
- You need low-level SSH protocol control
- You're building custom SSH clients or servers
- Fabric's higher-level abstractions are too restrictive
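The local-vs-remote distinction in a nutshell (the host name below is a placeholder):

```python
import subprocess

from fabric import Connection

# subprocess: runs on the machine executing this script
local = subprocess.run(["uname", "-s"], capture_output=True, text=True, check=True)
print(f"local: {local.stdout.strip()}")

# Fabric: the same idea, but executed over SSH on a remote host
remote = Connection("web1.example.com").run("uname -s", hide=True)
print(f"remote: {remote.stdout.strip()}")
```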
## Architecture and Dependencies
Fabric is built on two core libraries:
1. **Invoke** (>=2.0): Subprocess command execution and command-line task features
2. **Paramiko** (>=2.4): SSH protocol implementation
Fabric extends their APIs to provide:
- Remote execution via `Connection.run()`
- File transfer via `Connection.put()` and `Connection.get()`
- Sudo support via `Connection.sudo()`
- Group operations via `SerialGroup` and `ThreadingGroup`
## Python Version Compatibility
| Python Version | Fabric 2.x | Fabric 3.x | Status |
| -------------- | ---------- | ---------- | ---------------- |
| 3.6 | ✓ | ✓ | Minimum version |
| 3.7 | ✓ | ✓ | Supported |
| 3.8 | ✓ | ✓ | Supported |
| 3.9 | ✓ | ✓ | Supported |
| 3.10 | ✓ | ✓ | Supported |
| 3.11 | ✓ | ✓ | Supported |
| 3.12 | ? | ? | Likely supported |
| 3.13 | ? | ? | Likely supported |
| 3.14 | ? | ? | Unknown |
**Note**: Fabric follows semantic versioning. Fabric 2.x and 3.x share similar APIs with minor breaking changes. Fabric 1.x (legacy) is incompatible with 2.x/3.x.
### Fabric Version Differences
- **Fabric 1.x** (legacy): Python 2.7 only, different API, no longer maintained
- **Fabric 2.x**: Modern API, Python 3.6+, built on Invoke/Paramiko
- **Fabric 3.x**: Current stable, incremental improvements over 2.x, Python 3.6+
## Installation
```bash
# Standard installation
pip install fabric
# For migration from Fabric 1.x (side-by-side installation)
pip install fabric2
# Development installation
pip install -e git+https://github.com/fabric/fabric
# With pytest fixtures support
pip install fabric[pytest]
```
## Core Usage Patterns
### 1. Basic Remote Command Execution
**@<https://docs.fabfile.org/en/latest/getting-started.html>**
```python
from fabric import Connection
# Simple connection and command execution
result = Connection('web1.example.com').run('uname -s', hide=True)
print(f"Ran {result.command!r} on {result.connection.host}")
print(f"Exit code: {result.exited}")
print(f"Output: {result.stdout.strip()}")
```
**Result object attributes:**
- `result.stdout`: Command output
- `result.stderr`: Error output
- `result.exited`: Exit code
- `result.ok`: Boolean (True if exit code was 0)
- `result.command`: The command that was run
- `result.connection`: The Connection object used
### 2. Connection with Authentication
**@<https://docs.fabfile.org/en/latest/getting-started.html>**
```python
from fabric import Connection
# User@host:port format
c = Connection('deploy@web1.example.com:2202')
# Or explicit parameters
c = Connection(
host='web1.example.com',
user='deploy',
port=2202,
connect_kwargs={
"key_filename": "/path/to/private/key",
# or
"password": "mypassword"
}
)
# Execute commands
c.run('whoami')
c.run('ls -la /var/www')
```
### 3. File Transfer Operations
**@<https://docs.fabfile.org/en/latest/getting-started.html>**
```python
from fabric import Connection
c = Connection('web1')
# Upload file
result = c.put('myfiles.tgz', remote='/opt/mydata/')
print(f"Uploaded {result.local} to {result.remote}")
# Download file
c.get('/var/log/app.log', local='./logs/')
# Upload and extract
c.put('myfiles.tgz', '/opt/mydata')
c.run('tar -C /opt/mydata -xzvf /opt/mydata/myfiles.tgz')
```
### 4. Sudo Operations
**@<https://docs.fabfile.org/en/latest/getting-started.html>**
```python
import getpass
from fabric import Connection, Config
# Configure sudo password
sudo_pass = getpass.getpass("What's your sudo password?")
config = Config(overrides={'sudo': {'password': sudo_pass}})
c = Connection('db1', config=config)
# Run with sudo using helper method
c.sudo('whoami', hide='stderr') # Output: root
c.sudo('useradd mydbuser')
c.run('id -u mydbuser') # Verify user created
# Alternative: Manual sudo with password responder
from invoke import Responder
sudopass = Responder(
pattern=r'\[sudo\] password:',
response=f'{sudo_pass}\n',
)
c.run('sudo whoami', pty=True, watchers=[sudopass])
```
### 5. Multi-Host Execution (Serial)
**@<https://docs.fabfile.org/en/latest/getting-started.html>**
```python
from fabric import SerialGroup as Group
# Execute on multiple hosts serially
pool = Group('web1', 'web2', 'web3')
# Run command on all hosts
results = pool.run('uname -s')
for connection, result in results.items():
print(f"{connection.host}: {result.stdout.strip()}")
# File operations on all hosts
pool.put('myfiles.tgz', '/opt/mydata')
pool.run('tar -C /opt/mydata -xzvf /opt/mydata/myfiles.tgz')
```
### 6. Multi-Host Execution (Parallel)
**@<https://docs.fabfile.org/en/latest/getting-started.html>**
```python
from fabric import ThreadingGroup as Group
# Execute on multiple hosts in parallel
pool = Group('web1', 'web2', 'web3', 'web4', 'web5')
# Run command concurrently
results = pool.run('hostname')
# Process results
for connection, result in results.items():
print(f"{connection.host}: {result.stdout.strip()}")
```
### 7. Defining Reusable Tasks
**@<https://docs.fabfile.org/en/latest/getting-started.html>**
```python
from fabric import task
@task
def deploy(c):
"""Deploy application to remote server"""
code_dir = "/srv/django/myproject"
# Check if directory exists
if not c.run(f"test -d {code_dir}", warn=True):
# Clone repository
c.run(f"git clone user@vcshost:/path/to/repo/.git {code_dir}")
# Update code
c.run(f"cd {code_dir} && git pull")
# Restart application
c.run(f"cd {code_dir} && touch app.wsgi")
@task
def update_servers(c):
"""Run system updates"""
c.sudo('apt update')
c.sudo('apt upgrade -y')
c.sudo('systemctl restart nginx')
# Use with fab command:
# fab -H web1,web2,web3 deploy
```
### 8. Task Composition and Workflow
**@<https://docs.fabfile.org/en/latest/getting-started.html>**
```python
from fabric import task
from invoke import Exit
from invocations.console import confirm
@task
def test(c):
"""Run local tests"""
result = c.local("./manage.py test my_app", warn=True)
if not result and not confirm("Tests failed. Continue anyway?"):
raise Exit("Aborting at user request.")
@task
def commit(c):
"""Commit changes"""
c.local("git add -p && git commit")
@task
def push(c):
"""Push to remote"""
c.local("git push")
@task
def prepare_deploy(c):
"""Prepare for deployment"""
test(c)
commit(c)
push(c)
@task(hosts=['web1.example.com', 'web2.example.com'])
def deploy(c):
"""Deploy to remote servers"""
code_dir = "/srv/django/myproject"
c.run(f"cd {code_dir} && git pull")
c.run(f"cd {code_dir} && touch app.wsgi")
# Usage:
# fab prepare_deploy deploy
```
### 9. Connection with Gateway/Bastion Host
**@<https://docs.fabfile.org/en/latest/concepts/networking.html>**
```python
from fabric import Connection
# Connect to internal host through gateway
gateway = Connection('bastion.example.com')
c = Connection('internal-db.local', gateway=gateway)
# Now all operations go through the gateway
c.run('hostname')
c.run('df -h')
```
### 10. Error Handling and Conditional Logic
**@<https://docs.fabfile.org/en/latest/getting-started.html>**
```python
from fabric import SerialGroup as Group
def upload_and_unpack(c):
"""Upload file only if it doesn't exist"""
# Check if file exists (don't fail on non-zero exit)
if c.run('test -f /opt/mydata/myfile', warn=True).failed:
c.put('myfiles.tgz', '/opt/mydata')
c.run('tar -C /opt/mydata -xzvf /opt/mydata/myfiles.tgz')
else:
print(f"File already exists on {c.host}, skipping upload")
# Apply to group
for connection in Group('web1', 'web2', 'web3'):
upload_and_unpack(connection)
```
## Real-World Integration Patterns
### Pattern 1: Django/Web Application Deployment
**@<https://www.oreilly.com/library/view/test-driven-development-with/9781449365141/ch09.html>**
```python
from fabric import task, Connection
@task
def deploy_django(c):
"""Deploy Django application"""
# Pull latest code
c.run('cd /var/www/myapp && git pull origin main')
# Install dependencies
c.run('cd /var/www/myapp && pip install -r requirements.txt')
# Run migrations
c.run('cd /var/www/myapp && python manage.py migrate')
# Collect static files
c.run('cd /var/www/myapp && python manage.py collectstatic --noinput')
# Restart services
c.sudo('systemctl restart gunicorn')
c.sudo('systemctl restart nginx')
```
### Pattern 2: Database Backup and Restore
**@Exa:fabric deployment examples**
```python
from fabric import task
from datetime import datetime
@task
def backup_database(c):
"""Create database backup"""
timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
backup_file = f"backup_{timestamp}.sql"
# Create backup
c.run(f"mysqldump -u dbuser -p database_name > /backups/{backup_file}")
# Compress backup
c.run(f"gzip /backups/{backup_file}")
# Download backup
c.get(f"/backups/{backup_file}.gz", local=f"./backups/{backup_file}.gz")
print(f"Backup completed: {backup_file}.gz")
```
### Pattern 3: Log Collection and Analysis
```python
from fabric import SerialGroup as Group
def collect_logs(c):
"""Collect application logs from remote server"""
hostname = c.run('hostname', hide=True).stdout.strip()
c.get('/var/log/app/error.log', local=f'logs/{hostname}_error.log')
c.get('/var/log/app/access.log', local=f'logs/{hostname}_access.log')
# Collect from all servers
pool = Group('web1', 'web2', 'web3', 'web4')
for conn in pool:
collect_logs(conn)
```
### Pattern 4: Service Health Check
```python
from fabric import task, SerialGroup as Group
@task
def health_check(c):
"""Check service health across servers"""
servers = Group('web1', 'web2', 'db1', 'cache1')
for conn in servers:
print(f"\nChecking {conn.host}...")
# Check disk space
result = conn.run("df -h / | tail -n1 | awk '{print $5}'", hide=True)
disk_usage = result.stdout.strip()
print(f" Disk usage: {disk_usage}")
# Check memory
result = conn.run("free -m | grep Mem | awk '{print $3/$2 * 100.0}'", hide=True)
mem_usage = float(result.stdout.strip())
print(f" Memory usage: {mem_usage:.1f}%")
# Check service status
result = conn.run("systemctl is-active nginx", warn=True, hide=True)
service_status = result.stdout.strip()
print(f" Nginx status: {service_status}")
```
## When NOT to Use Fabric
### 1. Simple Local Automation
```python
# DON'T use Fabric for local operations
from fabric import Connection
c = Connection('localhost')
c.run('ls -la')
# DO use subprocess instead
import subprocess
subprocess.run(['ls', '-la'])
```
### 2. Large-Scale Infrastructure Management
If you need to manage hundreds of servers with complex configuration requirements, **Ansible** or **SaltStack** are a better fit, offering:
- Declarative configuration syntax
- Idempotency guarantees
- Large module ecosystem
- Built-in inventory management
- Role-based organization
### 3. Container Orchestration
For Docker/Kubernetes deployments, use native orchestration tools:
- Docker Compose
- Kubernetes manifests
- Helm charts
- ArgoCD
### 4. Configuration Drift Detection
Fabric executes commands but doesn't track state. For configuration management with drift detection, use:
- Ansible
- Chef
- Puppet
- Terraform (for infrastructure)
### 5. Windows Remote Management
For Windows automation, use:
- PowerShell Remoting
- WinRM libraries
- Ansible (with WinRM)
## Decision Matrix
| Scenario | Fabric | Ansible | Subprocess | Paramiko |
| ---------------------------- | ------ | ------- | ---------- | -------- |
| Deploy to 1-10 Linux servers | ✓✓ | ✓ | ✗ | ✓ |
| Deploy to 100+ servers | ✓ | ✓✓ | ✗ | ✗ |
| Run local commands | ✗ | ✗ | ✓✓ | ✗ |
| Configuration management | ✗ | ✓✓ | ✗ | ✗ |
| Ad-hoc SSH automation | ✓✓ | ✓ | ✗ | ✓ |
| Custom SSH protocol work | ✗ | ✗ | ✗ | ✓✓ |
| Python-first workflow | ✓✓ | ✗ | ✓✓ | ✓✓ |
| YAML-first workflow | ✗ | ✓✓ | ✗ | ✗ |
| File transfer over SSH | ✓✓ | ✓ | ✗ | ✓ |
| Parallel execution | ✓✓ | ✓✓ | ✓ | ✗ |
| Windows targets | ✗ | ✓✓ | ✓ | ✗ |
**Legend**: ✓✓ = Excellent fit, ✓ = Suitable, ✗ = Not appropriate
## Common Gotchas and Solutions
### 1. Separate Shell Sessions
**@<https://www.fabfile.org/faq.html>**
```python
# WRONG: cd doesn't persist across run() calls
@task
def deploy(c):
c.run("cd /path/to/application")
c.run("./update.sh") # This runs in home directory!
# CORRECT: Use shell && operator
@task
def deploy(c):
c.run("cd /path/to/application && ./update.sh")
# ALTERNATIVE: Use absolute paths
@task
def deploy(c):
c.run("/path/to/application/update.sh")
```
### 2. Sudo Password Prompts
```python
# WRONG: Sudo hangs waiting for password
c.run('sudo systemctl restart nginx')
# CORRECT: Use pty=True and watchers
from invoke import Responder
sudopass = Responder(
pattern=r'\[sudo\] password:',
response='mypassword\n',
)
c.run('sudo systemctl restart nginx', pty=True, watchers=[sudopass])
# BETTER: Use Connection.sudo() helper
c.sudo('systemctl restart nginx') # Uses configured password
```
### 3. Connection Reuse
```python
# INEFFICIENT: Creates new connection each time
for i in range(10):
Connection('web1').run(f'echo {i}')
# EFFICIENT: Reuse connection
c = Connection('web1')
for i in range(10):
c.run(f'echo {i}')
```
## Testing with Fabric
**@<https://docs.fabfile.org/en/latest/testing.html>**
```python
from fabric.testing import MockRemote
def test_deployment():
"""Test deployment logic without real SSH"""
with MockRemote(commands={
'test -d /srv/app': (1, '', ''), # Exit 1 = doesn't exist
'git clone ...': (0, 'Cloning...', ''),
'cd /srv/app && git pull': (0, 'Already up to date', ''),
}) as remote:
c = remote.connection
deploy(c)
# Verify commands were called
assert 'git clone' in remote.calls
```
## Migration from Fabric 1.x to 2.x/3.x
**@<https://docs.fabfile.org/en/latest/upgrading.html>**
Key changes:
1. No more `env` global dictionary
2. Tasks must accept `Connection` or `Context` as first argument
3. No more `@hosts` decorator (use `@task(hosts=[...])`)
4. `run()` is now `c.run()` on Connection object
5. Import from `fabric` not `fabric.api`
```python
# Fabric 1.x (OLD)
from fabric.api import env, run, task
env.hosts = ['web1', 'web2']
@task
def deploy():
run('git pull')
# Fabric 2.x/3.x (NEW)
from fabric import task
@task(hosts=['web1', 'web2'])
def deploy(c):
c.run('git pull')
```
## Performance Considerations
1. **Parallel vs Serial Execution**:
- Use `ThreadingGroup` for I/O-bound tasks (network operations)
- Consider `SerialGroup` for order-dependent operations
- Default thread pool size is 10 connections
2. **Connection Pooling**:
- Reuse `Connection` objects when possible
- Close connections explicitly with `c.close()` or use context managers
3. **Output Buffering**:
- Use `hide=True` to suppress output and improve performance
- Large output can slow down execution
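A minimal sketch combining these points (host names and commands are placeholders, not from the Fabric docs):
```python
from fabric import Connection, ThreadingGroup
# Reuse one long-lived connection for repeated commands; the context
# manager closes it when the block exits
with Connection('web1.example.com') as c:
    for path in ('/var/log', '/tmp'):
        # hide=True suppresses stdout/stderr echoing for quieter, faster runs
        c.run(f'du -sh {path}', hide=True)
# Fan an I/O-bound command out across hosts in parallel
group = ThreadingGroup('web1.example.com', 'web2.example.com', 'web3.example.com')
results = group.run('uptime', hide=True)
for connection, result in results.items():
    print(f"{connection.host}: {result.stdout.strip()}")
```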
## Resources and Examples
### Official Documentation
- Main site: @<https://www.fabfile.org/>
- Getting Started: @<https://docs.fabfile.org/en/latest/getting-started.html>
- API Reference: @<https://docs.fabfile.org/en/latest/api/>
- FAQ: @<https://www.fabfile.org/faq.html>
- Upgrading Guide: @<https://www.fabfile.org/upgrading.html>
### GitHub Examples
- Official repository: @<https://github.com/fabric/fabric>
- Example fabfiles: @<https://github.com/fabric/fabric/tree/main/sites/docs>
- Integration tests: @<https://github.com/fabric/fabric/tree/main/integration>
### Community Resources
- Fabricio (Docker automation): @<https://github.com/renskiy/fabricio>
- Linux Journal tutorial: @<https://www.linuxjournal.com/content/fabric-system-administrators-best-friend>
- Medium tutorials: @<https://medium.com/gopyjs/automate-deployment-with-fabric-python-fad992e68b5>
## Summary
**Use Fabric when you need to:**
- Execute commands on remote Linux servers via SSH
- Automate deployment of web applications
- Manage small to medium server fleets (1-50 servers)
- Transfer files between local and remote systems
- Define reusable deployment tasks in Python
- Integrate deployment into CI/CD pipelines
**Don't use Fabric when:**
- You only need local command execution (use subprocess)
- You're managing large infrastructure (>100 servers, use Ansible)
- You need configuration drift detection (use Ansible/Chef/Puppet)
- You're working with Windows servers primarily
- You need declarative infrastructure as code (use Terraform/Ansible)
Fabric excels at programmatic SSH automation for deployment workflows where you want the full power of Python combined with remote execution capabilities. It's the sweet spot between low-level Paramiko and heavyweight configuration management tools.

View File

@@ -0,0 +1,630 @@
---
title: "httpx - Next Generation HTTP Client for Python"
library_name: httpx
pypi_package: httpx
category: http_client
python_compatibility: "3.9+"
last_updated: "2025-11-02"
official_docs: "https://www.python-httpx.org"
official_repository: "https://github.com/encode/httpx"
maintenance_status: "active"
---
# httpx - Next Generation HTTP Client for Python
## Overview
**httpx** is a fully-featured HTTP client library for Python 3 that provides both synchronous and asynchronous APIs. It is designed as a next-generation alternative to the popular `requests` library, offering HTTP/1.1 and HTTP/2 support, true async capabilities, and a broadly compatible API while introducing modern improvements.
- **Official Repository:** <https://github.com/encode/httpx> @ encode/httpx
- **Documentation:** <https://www.python-httpx.org/> @ python-httpx.org
- **PyPI Package:** `httpx` @ pypi.org/project/httpx
- **License:** BSD-3-Clause @ github.com/encode/httpx
- **Current Version:** 0.28.1 (as of December 2024) @ pypi.org
- **Maintenance Status:** Actively maintained, 14,652+ GitHub stars @ github.com/encode/httpx
## Core Purpose
### Problem httpx Solves
1. **Async HTTP Support:** Provides native async/await support for HTTP requests, eliminating the need for separate libraries like `aiohttp` @ python-httpx.org/async
2. **HTTP/2 Protocol:** Full HTTP/2 support with connection multiplexing and server push @ python-httpx.org/http2
3. **Modern Python Standards:** Built for Python 3.9+ with full type annotations and modern async patterns @ github.com/encode/httpx/pyproject.toml
4. **Consistent Sync/Async API:** Single library that works for both synchronous and asynchronous code @ python-httpx.org
### What Would Be "Reinventing the Wheel"
Without httpx, you would need to:
- Use separate libraries for sync (`requests`) and async (`aiohttp`) HTTP operations @ towardsdatascience.com
- Implement HTTP/2 support manually or use lower-level libraries @ python-httpx.org/http2
- Manage different API patterns between sync and async code @ python-httpx.org/async
- Handle connection pooling and timeout configuration separately for each library @ python-httpx.org/advanced
## When to Use httpx
### Use httpx When
1. **Async HTTP Required:** You need asynchronous HTTP requests in an async application (FastAPI, asyncio, Trio) @ python-httpx.org/async
2. **HTTP/2 Support Needed:** Your application benefits from HTTP/2 features like multiplexing @ python-httpx.org/http2
3. **Both Sync and Async:** You want one library that handles both synchronous and asynchronous patterns @ python-httpx.org
4. **ASGI/WSGI Testing:** You need to make requests directly to ASGI or WSGI applications without network @ python-httpx.org/advanced/transports
5. **Modern Type Safety:** You require full type annotations and modern Python tooling support @ github.com/encode/httpx
6. **Strict Timeouts:** You need proper timeout handling by default (httpx has timeouts everywhere) @ python-httpx.org/quickstart
### Use requests When
1. **Simple Sync-Only Application:** You only need synchronous HTTP and don't require async @ python-httpx.org/compatibility
2. **Legacy Python Support:** You need to support Python 3.8 or earlier (httpx requires 3.9+) @ github.com/encode/httpx/pyproject.toml
3. **Broad Ecosystem Compatibility:** You rely on requests-specific plugins or tools @ python-httpx.org/compatibility
4. **Auto-Redirects Preferred:** You want automatic redirect following by default (httpx requires explicit opt-in) @ python-httpx.org/quickstart
### Use aiohttp When
1. **Server + Client Together:** You need both HTTP server and client in one library @ medium.com/featurepreneur
2. **WebSocket Support:** You need built-in WebSocket client support (httpx requires httpx-ws extension) @ github.com/frankie567/httpx-ws
3. **Existing aiohttp Codebase:** You have significant investment in aiohttp-specific features @ medium.com/featurepreneur
## Decision Matrix
```text
┌─────────────────────────────────┬──────────┬──────────┬─────────┐
│ Requirement │ httpx │ requests │ aiohttp │
├─────────────────────────────────┼──────────┼──────────┼─────────┤
│ Sync HTTP requests │ ✓ │ ✓ │ ✗ │
│ Async HTTP requests │ ✓ │ ✗ │ ✓ │
│ HTTP/2 support │ ✓ │ ✗ │ ✓ │
│ requests-compatible API │ ✓ │ ✓ │ ✗ │
│ Type annotations │ ✓ │ Partial │ ✓ │
│ Default timeouts │ ✓ │ ✗ │ ✓ │
│ ASGI/WSGI testing │ ✓ │ ✗ │ ✗ │
│ Python 3.7 support │ ✗ │ ✓ │ ✓ │
│ Auto-redirects by default │ ✗ │ ✓ │ ✓ │
│ Built-in server support │ ✗ │ ✗ │ ✓ │
└─────────────────────────────────┴──────────┴──────────┴─────────┘
```
@ Compiled from python-httpx.org, medium.com/featurepreneur
## Python Version Compatibility
- **Minimum Python Version:** 3.9 @ github.com/encode/httpx/pyproject.toml
- **Officially Supported Versions:** 3.9, 3.10, 3.11, 3.12, 3.13 @ github.com/encode/httpx/pyproject.toml
**Async/Await Requirements:**
- Full async/await syntax support (Python 3.7+) @ python-httpx.org/async
- Works with asyncio, Trio, and anyio backends @ python-httpx.org/async
**Python 3.11-3.14 Status:**
- **3.11:** Fully supported and tested @ github.com/encode/httpx/pyproject.toml
- **3.12:** Fully supported and tested @ github.com/encode/httpx/pyproject.toml
- **3.13:** Fully supported and tested @ github.com/encode/httpx/pyproject.toml
- **3.14:** Expected to work (not yet released as of October 2025)
## Real-World Usage Examples
### Example Projects Using httpx
1. **notion-sdk-py** (2,086+ stars) @ github.com/ramnes/notion-sdk-py
- Official Notion API client with sync and async support
- Pattern: Client wrapper using httpx.Client and httpx.AsyncClient
- URL: <https://github.com/ramnes/notion-sdk-py>
2. **githubkit** (296+ stars) @ github.com/yanyongyu/githubkit
- Modern GitHub SDK with REST API and GraphQL support
- Pattern: Unified sync/async interface with httpx
- URL: <https://github.com/yanyongyu/githubkit>
3. **twscrape** (1,981+ stars) @ github.com/vladkens/twscrape
- Twitter/X API scraper with authorization support
- Pattern: Async httpx for high-performance concurrent requests
- URL: <https://github.com/vladkens/twscrape>
4. **TikTokDownloader** (12,018+ stars) @ github.com/JoeanAmier/TikTokDownloader
- TikTok/Douyin data collection and download tool
- Pattern: Async httpx for parallel downloads
- URL: <https://github.com/JoeanAmier/TikTokDownloader>
5. **XHS-Downloader** (8,982+ stars) @ github.com/JoeanAmier/XHS-Downloader
- Xiaohongshu (RedNote) content extractor and downloader
- Pattern: httpx with FastAPI for server-side scraping
- URL: <https://github.com/JoeanAmier/XHS-Downloader>
### Common Usage Patterns @ github.com/search, exa.ai
```python
# Pattern 1: Synchronous API client wrapper
import httpx
class APIClient:
def __init__(self, base_url: str, api_key: str):
self.client = httpx.Client(
base_url=base_url,
headers={"Authorization": f"Bearer {api_key}"},
timeout=30.0
)
def get_resource(self, resource_id: str):
response = self.client.get(f"/resources/{resource_id}")
response.raise_for_status()
return response.json()
# Pattern 2: Async concurrent requests
import asyncio
import httpx
async def fetch_all(urls: list[str]) -> list[dict]:
async with httpx.AsyncClient() as client:
tasks = [client.get(url) for url in urls]
responses = await asyncio.gather(*tasks)
return [r.json() for r in responses]
# Pattern 3: FastAPI integration with async httpx
from fastapi import FastAPI
import httpx
app = FastAPI()
@app.get("/proxy/{path:path}")
async def proxy_request(path: str):
async with httpx.AsyncClient() as client:
response = await client.get(f"https://api.example.com/{path}")
return response.json()
# Pattern 4: HTTP/2 with connection pooling
import httpx
client = httpx.Client(http2=True)
try:
for i in range(10):
response = client.get(f"https://http2.example.com/data/{i}")
print(response.json())
finally:
client.close()
# Pattern 5: Streaming large downloads with progress
import httpx
with httpx.stream("GET", "https://example.com/large-file.zip") as response:
total = int(response.headers["Content-Length"])
downloaded = 0
with open("output.zip", "wb") as f:
for chunk in response.iter_bytes(chunk_size=8192):
f.write(chunk)
downloaded += len(chunk)
print(f"Progress: {downloaded}/{total} bytes")
```
@ Compiled from github.com/encode/httpx/docs, exa.ai/get_code_context
## Integration Patterns
### FastAPI Integration @ raw.githubusercontent.com/refinedev
```python
from fastapi import FastAPI, Request
import httpx
app = FastAPI()
@app.on_event("startup")
async def startup_event():
    app.state.http_client = httpx.AsyncClient()
@app.on_event("shutdown")
async def shutdown_event():
    await app.state.http_client.aclose()
@app.get("/data")
async def get_data(request: Request):
    # Reuse the shared client directly; entering it with 'async with' here
    # would close it after the first request
    response = await request.app.state.http_client.get("https://api.example.com/data")
    return response.json()
```
### Starlette ASGI Transport @ python-httpx.org/advanced/transports
```python
import asyncio
import httpx
from starlette.applications import Starlette
from starlette.responses import JSONResponse
from starlette.routing import Route
async def homepage(request):
    return JSONResponse({"message": "Hello, world"})
app = Starlette(routes=[Route("/", homepage)])
# Test without network: ASGITransport requires the async client
async def main():
    transport = httpx.ASGITransport(app=app)
    async with httpx.AsyncClient(transport=transport, base_url="http://testserver") as client:
        response = await client.get("/")
        assert response.status_code == 200
asyncio.run(main())
```
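The WSGI counterpart works the same way with the synchronous client; a minimal sketch using a bare WSGI callable (the app below is illustrative, not taken from the httpx docs):
```python
import httpx
def wsgi_app(environ, start_response):
    # Minimal WSGI application, defined inline purely for illustration
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello, world"]
# Requests are routed to the WSGI app in-process; no network is involved
transport = httpx.WSGITransport(app=wsgi_app)
with httpx.Client(transport=transport, base_url="http://testserver") as client:
    response = client.get("/")
    assert response.status_code == 200
    assert response.text == "Hello, world"
```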
### Trio Async Backend @ python-httpx.org/async
```python
import httpx
import trio
async def main():
async with httpx.AsyncClient() as client:
response = await client.get('https://www.example.com/')
print(response)
trio.run(main)
```
## Installation
### Basic Installation @ python-httpx.org
```bash
pip install httpx
```
### With HTTP/2 Support @ python-httpx.org/http2
```bash
pip install httpx[http2]
```
### With CLI Support @ python-httpx.org
```bash
pip install 'httpx[cli]'
```
### With All Features @ python-httpx.org
```bash
pip install 'httpx[http2,cli,brotli,zstd]'
```
### Using uv (Recommended) @ astral.sh
```bash
uv add httpx
uv add 'httpx[http2]' # With HTTP/2 support
```
## Usage Examples
### Basic Synchronous Request @ python-httpx.org/quickstart
```python
import httpx
# Simple GET request
response = httpx.get('https://httpbin.org/get')
print(response.status_code) # 200
print(response.json())
# POST with data
response = httpx.post('https://httpbin.org/post', data={'key': 'value'})
# Custom headers
headers = {'user-agent': 'my-app/0.0.1'}
response = httpx.get('https://httpbin.org/headers', headers=headers)
# Query parameters
params = {'key1': 'value1', 'key2': 'value2'}
response = httpx.get('https://httpbin.org/get', params=params)
```
### Asynchronous Requests @ python-httpx.org/async
```python
import httpx
import asyncio
async def fetch_data():
async with httpx.AsyncClient() as client:
response = await client.get('https://www.example.com/')
print(response.status_code)
return response.json()
# Run async function
asyncio.run(fetch_data())
# Concurrent requests
async def fetch_multiple():
async with httpx.AsyncClient() as client:
tasks = [
client.get('https://httpbin.org/get'),
client.get('https://httpbin.org/headers'),
client.get('https://httpbin.org/user-agent')
]
responses = await asyncio.gather(*tasks)
return [r.json() for r in responses]
```
### Client Instance with Configuration @ python-httpx.org/advanced/clients
```python
import httpx
# Create configured client
client = httpx.Client(
base_url='https://api.example.com',
headers={'Authorization': 'Bearer token123'},
timeout=30.0,
follow_redirects=True
)
try:
# Make requests using the client
response = client.get('/users/me')
response.raise_for_status()
print(response.json())
finally:
client.close()
# Context manager (automatic cleanup)
with httpx.Client(base_url='https://api.example.com') as client:
response = client.get('/data')
```
### HTTP/2 Support @ python-httpx.org/http2
```python
import httpx
# Enable HTTP/2
client = httpx.Client(http2=True)
try:
response = client.get('https://www.google.com')
print(response.extensions['http_version']) # b'HTTP/2'
finally:
client.close()
# Async HTTP/2
async with httpx.AsyncClient(http2=True) as client:
response = await client.get('https://www.google.com')
print(response.extensions['http_version'])
```
### Streaming Responses @ python-httpx.org/quickstart
```python
import httpx
# Stream bytes
with httpx.stream("GET", "https://www.example.com/large-file") as response:
for chunk in response.iter_bytes(chunk_size=8192):
process_chunk(chunk)
# Stream lines
with httpx.stream("GET", "https://www.example.com/log") as response:
for line in response.iter_lines():
print(line)
# Conditional loading
with httpx.stream("GET", "https://www.example.com/file") as response:
if int(response.headers['Content-Length']) < 10_000_000: # 10MB
content = response.read()
print(content)
```
### Error Handling @ python-httpx.org/quickstart
```python
import httpx
try:
response = httpx.get("https://www.example.com/")
response.raise_for_status() # Raises HTTPStatusError for 4xx/5xx
except httpx.RequestError as exc:
print(f"Network error: {exc.request.url}")
except httpx.HTTPStatusError as exc:
print(f"HTTP error {exc.response.status_code}: {exc.request.url}")
except httpx.HTTPError as exc:
print(f"General HTTP error: {exc}")
```
### Authentication @ python-httpx.org/quickstart
```python
import httpx
# Basic authentication
response = httpx.get(
"https://example.com",
auth=("username", "password")
)
# Digest authentication
auth = httpx.DigestAuth("username", "password")
response = httpx.get("https://example.com", auth=auth)
# Bearer token
headers = {"Authorization": "Bearer token123"}
response = httpx.get("https://api.example.com", headers=headers)
```
## When NOT to Use httpx
### Scenarios Where httpx May Not Be Suitable
1. **Python 3.8 or Earlier Required** @ github.com/encode/httpx/pyproject.toml
- httpx requires Python 3.9+
- Use `requests` for older Python versions
2. **Simple Scripts with Minimal Dependencies** @ python-httpx.org/compatibility
- If you only need basic HTTP GET/POST in a simple script
- `requests` has fewer dependencies and simpler API
- httpx pulls in additional dependencies (httpcore, anyio, sniffio)
3. **requests Plugin Ecosystem Required** @ python-httpx.org/compatibility
- Libraries specifically built for requests (requests-oauthlib, etc.)
- May not have httpx equivalents
- Consider staying with requests if heavily invested in plugins
4. **Need WebSocket Built-in** @ github.com/frankie567/httpx-ws
- httpx requires separate httpx-ws extension
- aiohttp has built-in WebSocket support
5. **Auto-Redirect Preference** @ python-httpx.org/quickstart
- httpx does NOT follow redirects by default (security-conscious design)
- Requires explicit `follow_redirects=True`
- requests follows redirects automatically
6. **Server + Client in One Library** @ medium.com/featurepreneur
- httpx is client-only
- Use aiohttp or starlette if you need both server and client
## Key Differences from requests
### API Compatibility @ python-httpx.org/compatibility
httpx provides broad compatibility with requests, but with key differences:
```python
# requests: Auto-redirects by default
requests.get('http://github.com/') # Follows to HTTPS
# httpx: Explicit redirect handling
httpx.get('http://github.com/', follow_redirects=True)
# requests: No timeouts by default
requests.get('https://example.com')
# httpx: 5-second default timeout
httpx.get('https://example.com') # 5s timeout
# requests: Session object
session = requests.Session()
# httpx: Client object
client = httpx.Client()
```
### Modern Improvements @ python-httpx.org
1. **Type Safety:** Full type annotations throughout @ github.com/encode/httpx
2. **Async Native:** Built-in async/await support @ python-httpx.org/async
3. **Strict Timeouts:** Timeouts everywhere by default @ python-httpx.org/quickstart
4. **HTTP/2:** Optional HTTP/2 protocol support @ python-httpx.org/http2
5. **Better Encoding:** UTF-8 default encoding vs latin1 in requests @ python-httpx.org/compatibility
## Dependencies @ github.com/encode/httpx
### Core Dependencies
- **httpcore** - Underlying transport implementation @ github.com/encode/httpcore
- **certifi** - SSL certificates @ github.com/certifi
- **idna** - Internationalized domain names @ github.com/kjd/idna
- **anyio** - Async abstraction layer @ github.com/agronholm/anyio
- **sniffio** - Async library detection @ github.com/python-trio/sniffio
### Optional Dependencies
- **h2** - HTTP/2 support (`httpx[http2]`) @ github.com/python-hyper/h2
- **socksio** - SOCKS proxy support (`httpx[socks]`) @ github.com/sethmlarson/socksio
- **brotli/brotlicffi** - Brotli compression (`httpx[brotli]`) @ github.com/google/brotli
- **zstandard** - Zstandard compression (`httpx[zstd]`) @ github.com/indygreg/python-zstandard
- **click + pygments + rich** - CLI support (`httpx[cli]`) @ github.com/pallets/click
## Testing and Mocking
### respx - Mock httpx @ github.com/lundberg/respx
```python
import httpx
import respx
@respx.mock
async def test_api_call():
async with httpx.AsyncClient() as client:
route = respx.get("https://example.org/")
response = await client.get("https://example.org/")
assert route.called
assert response.status_code == 200
```
### pytest-httpx @ github.com/Colin-b/pytest_httpx
```python
import httpx
import pytest
def test_with_httpx(httpx_mock):
httpx_mock.add_response(url="https://example.com/", json={"status": "ok"})
response = httpx.get("https://example.com/")
assert response.json() == {"status": "ok"}
```
## Performance Considerations @ raw.githubusercontent.com/encode/httpx
### Connection Pooling
```python
import httpx
# Reuse connections with Client
client = httpx.Client()
for i in range(100):
response = client.get(f"https://api.example.com/item/{i}")
client.close()
# Async with connection limits
async with httpx.AsyncClient(
limits=httpx.Limits(max_keepalive_connections=20, max_connections=100)
) as client:
# Efficient connection reuse
tasks = [client.get(url) for url in urls]
responses = await asyncio.gather(*tasks)
```
### Timeout Configuration @ python-httpx.org/advanced/timeouts
```python
import httpx
# Fine-grained timeouts
timeout = httpx.Timeout(
connect=5.0, # Connection timeout
read=10.0, # Read timeout
write=10.0, # Write timeout
pool=None # Pool acquisition timeout
)
client = httpx.Client(timeout=timeout)
```
## Additional Resources
### Official Documentation @ python-httpx.org
- Quickstart Guide: <https://www.python-httpx.org/quickstart/>
- Async Support: <https://www.python-httpx.org/async/>
- HTTP/2: <https://www.python-httpx.org/http2/>
- Advanced Usage: <https://www.python-httpx.org/advanced/>
- API Reference: <https://www.python-httpx.org/api/>
### Community Resources
- GitHub Discussions: <https://github.com/encode/httpx/discussions> @ github.com
- Third-Party Packages: <https://www.python-httpx.org/third_party_packages/> @ python-httpx.org
- httpx-oauth: OAuth client using httpx @ github.com/frankie567/httpx-oauth
- httpx-ws: WebSocket support @ github.com/frankie567/httpx-ws
- httpx-sse: Server-Sent Events @ github.com/florimondmanca/httpx-sse
### Migration Guides
- Requests Compatibility: <https://www.python-httpx.org/compatibility/> @ python-httpx.org
- Contributing Guide: <https://www.python-httpx.org/contributing/> @ python-httpx.org
## Summary
httpx is the modern choice for HTTP clients in Python when you need:
- Async/await support alongside synchronous APIs
- HTTP/2 protocol capabilities
- Type-safe, well-documented interfaces
- Strict timeout and error handling by default
- Testing against ASGI/WSGI apps without network
It maintains broad compatibility with requests while introducing modern Python best practices, making it an excellent choice for new projects and async applications. For simple synchronous scripts or legacy Python support, requests remains a solid choice.
---
- **Research completed:** 2025-10-21 @ Claude Code Agent
- **Sources verified:** GitHub, Context7, PyPI, Official Documentation @ Multiple verified sources
- **Confidence level:** High - all information cross-referenced from official sources

View File

@@ -0,0 +1,706 @@
---
title: "paho-mqtt: Eclipse Paho MQTT Python Client"
library_name: paho-mqtt
pypi_package: paho-mqtt
category: networking
python_compatibility: "3.7+"
last_updated: "2025-11-02"
official_docs: "https://eclipse.dev/paho/files/paho.mqtt.python/html"
official_repository: "https://github.com/eclipse-paho/paho.mqtt.python"
maintenance_status: "active"
---
# paho-mqtt: Eclipse Paho MQTT Python Client
## Core Purpose
The model must state that paho-mqtt is the official Eclipse Foundation implementation of the MQTT protocol client for Python.
MQTT (Message Queuing Telemetry Transport) is a lightweight publish-subscribe messaging protocol designed for machine-to-machine (M2M) and Internet of Things (IoT) communication. The model must explain that paho-mqtt solves the problem of reliable, lightweight messaging in bandwidth-constrained or unreliable network environments.
### What Problem Does It Solve?
The model must identify these key use cases:
1. **IoT Device Communication**: Enables lightweight messaging between sensors, actuators, and control systems
2. **Publish-Subscribe Patterns**: Provides decoupled communication where publishers and subscribers do not need to know about each other
3. **Bandwidth-Constrained Networks**: Minimizes network overhead compared to HTTP/REST
4. **Unreliable Network Handling**: Built-in Quality of Service (QoS) levels ensure message delivery guarantees
5. **Bi-directional Communication**: Supports both sending commands to devices and receiving telemetry data
### Reinventing the Wheel Without It
The model must verify that without paho-mqtt, developers would need to:
- Implement the complex MQTT protocol specification from scratch @ <https://mqtt.org/mqtt-specification/>
- Handle connection management, keep-alive pings, and automatic reconnection logic
- Implement three QoS levels (0, 1, 2) with appropriate acknowledgment mechanisms
- Manage topic subscriptions and wildcard matching
- Handle TLS/SSL encryption and WebSocket transport layers
## Official Information
The model must cite these verified sources:
- **Official Repository**: <https://github.com/eclipse-paho/paho.mqtt.python> @ verified 2025-10-21
- **PyPI Package**: `paho-mqtt` @ <https://pypi.org/project/paho-mqtt/>
- **Official Documentation**: <https://eclipse.dev/paho/files/paho.mqtt.python/html/>
- **License**: EPL-2.0 OR BSD-3-Clause (Eclipse Distribution License v1.0 and Eclipse Public License 2.0)
- **Maintainer**: Eclipse Foundation @ <https://www.eclipse.org/paho/>
- **Latest Version**: 2.1.0 (as of 2025-10-21)
- **Status**: Active development, production-ready (Beta classification indicates mature but evolving API)
## Python Version Compatibility
The model must verify from pyproject.toml @ <https://github.com/eclipse-paho/paho.mqtt.python/blob/master/pyproject.toml>:
- **Minimum Python Version**: 3.7
- **Tested Versions**: 3.7, 3.8, 3.9, 3.10
- **Python 3.11-3.14 Status**: The model must state that Python 3.11+ compatibility is likely based on package maturity, but official classifiers only list up to 3.10. Users should test thoroughly.
- **Python 2.x**: NOT supported (users must stay on paho-mqtt 1.x for Python 2)
### Breaking Changes
The model must warn that version 2.0 introduced breaking changes:
- Callback API signatures changed (VERSION1 vs VERSION2)
- `CallbackAPIVersion` must now be explicitly specified
- Migration guide available @ <https://eclipse.dev/paho/files/paho.mqtt.python/html/migrations.html>
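A minimal before/after sketch of the callback change (assuming paho-mqtt 2.x; the 1.x lines are shown as comments for contrast, and the client ID is illustrative):
```python
import paho.mqtt.client as mqtt
# paho-mqtt 1.x (old): no CallbackAPIVersion argument, shorter callback signatures
# client = mqtt.Client(client_id="demo")
# def on_connect(client, userdata, flags, rc): ...
# paho-mqtt 2.x (new): CallbackAPIVersion must be passed explicitly
client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2, client_id="demo")
def on_connect(client, userdata, flags, reason_code, properties):
    # VERSION2 callbacks receive ReasonCode and Properties objects
    print(f"Connected: {reason_code}")
client.on_connect = on_connect
```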
## Core Features
### 1. Quality of Service (QoS) Levels
The model must explain that MQTT provides three QoS levels @ <https://mqtt.org/mqtt-specification/>:
- **QoS 0 (At most once)**: Fire-and-forget, no acknowledgment, possible message loss
- **QoS 1 (At least once)**: Guaranteed delivery, possible duplicates, uses PUBACK
- **QoS 2 (Exactly once)**: Guaranteed single delivery, uses PUBREC/PUBREL/PUBCOMP handshake
Example from official docs @ <https://github.com/eclipse-paho/paho.mqtt.python/blob/master/README.rst>:
```python
import paho.mqtt.client as mqtt
client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)
client.connect("mqtt.eclipseprojects.io", 1883)
client.loop_start()  # Network loop must run so queued publishes are actually sent
# QoS 0: Fire and forget
client.publish("topic/sensor", "temperature:22", qos=0)
# QoS 1: At least once delivery
msg_info = client.publish("topic/critical", "alert", qos=1)
msg_info.wait_for_publish()  # Wait for PUBACK
# QoS 2: Exactly once delivery
client.publish("topic/transaction", "payment:100", qos=2)
```
### 2. Connection Management
The model must verify that paho-mqtt handles:
- **Keep-Alive Mechanism**: Automatic ping/pong to maintain connection
- **Automatic Reconnection**: Built-in retry logic with exponential backoff
- **Clean Session vs Persistent Session**: Control message persistence across disconnections
- **Last Will and Testament (LWT)**: Automatic message sent on unexpected disconnection (see the `will_set()` sketch below)
Example @ <https://eclipse.dev/paho/files/paho.mqtt.python/html/client.html>:
```python
import paho.mqtt.client as mqtt
def on_connect(client, userdata, flags, reason_code, properties):
if reason_code.is_failure:
print(f"Failed to connect: {reason_code}")
else:
print("Connected successfully")
# Subscribe in on_connect ensures subscriptions persist across reconnections
client.subscribe("sensors/#")
def on_disconnect(client, userdata, flags, reason_code, properties):
if reason_code != 0:
print(f"Unexpected disconnect: {reason_code}")
client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)
client.on_connect = on_connect
client.on_disconnect = on_disconnect
# Configure reconnection with exponential backoff
client.reconnect_delay_set(min_delay=1, max_delay=120)
client.connect("mqtt.eclipseprojects.io", 1883, keepalive=60)
client.loop_forever() # Handles automatic reconnection
```
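The Last Will and Testament mentioned in the list above is configured with `will_set()` before connecting; a short sketch (broker, topic, and payload are illustrative):
```python
import paho.mqtt.client as mqtt
client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2, client_id="sensor-42")
# The broker publishes this message on the client's behalf if it disconnects unexpectedly
client.will_set("devices/sensor-42/status", payload="offline", qos=1, retain=True)
client.connect("mqtt.eclipseprojects.io", 1883, keepalive=60)
client.loop_start()
client.publish("devices/sensor-42/status", "online", qos=1, retain=True)
```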
### 3. Topic Wildcards
The model must explain MQTT topic wildcards @ <http://www.steves-internet-guide.com/understanding-mqtt-topics/>:
- **`+` (single-level wildcard)**: Matches one topic level, e.g., `home/+/temperature` matches `home/bedroom/temperature`
- **`#` (multi-level wildcard)**: Matches multiple levels, e.g., `sensors/#` matches `sensors/temp`, `sensors/humidity/outside`
```python
# Subscribe to all system topics
client.subscribe("$SYS/#")
# Subscribe to all rooms' temperature
client.subscribe("home/+/temperature")
# Helper function to check topic matches
from paho.mqtt.client import topic_matches_sub
assert topic_matches_sub("foo/#", "foo/bar")
assert topic_matches_sub("+/bar", "foo/bar")
assert not topic_matches_sub("non/+/+", "non/matching")
```
## Real-World Examples
The model must cite these verified examples from GitHub search @ 2025-10-21:
### 1. Home Assistant Integration
- **Repository**: <https://github.com/home-assistant/core> (82,088 stars)
- **Use Case**: Open-source home automation platform using MQTT for device integration
- **Pattern**: Bidirectional communication with IoT devices (lights, sensors, thermostats)
```python
# Pattern extracted from Home Assistant ecosystem
import paho.mqtt.client as mqtt
def on_message(client, userdata, message):
topic = message.topic # e.g., "homeassistant/switch/living_room/state"
payload = message.payload.decode() # e.g., "ON" or "OFF"
# Process device state changes
handle_device_update(topic, payload)
client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)
client.on_message = on_message
client.connect("homeassistant.local", 1883)
client.subscribe("homeassistant/#")
client.loop_forever()
```
### 2. AWS IoT Integration
- **Repository**: <https://github.com/mariocannistra/python-paho-mqtt-for-aws-iot> (82 stars)
- **Use Case**: Connect Python applications to AWS IoT Core using MQTT over TLS
- **Pattern**: Secure cloud IoT communication with certificate-based authentication
```python
# Pattern from AWS IoT example @ https://github.com/mariocannistra/python-paho-mqtt-for-aws-iot
import paho.mqtt.client as mqtt
import ssl
client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)
# AWS IoT requires TLS with client certificates
client.tls_set(
ca_certs="root-CA.crt",
certfile="device-cert.pem",
keyfile="device-private.key",
tls_version=ssl.PROTOCOL_TLSv1_2
)
# AWS IoT endpoint
client.connect("a1b2c3d4e5f6g7.iot.us-west-2.amazonaws.com", 8883)
client.loop_start()  # Start the network loop so the QoS 1 publish is delivered
# Publish sensor data
info = client.publish("device/telemetry", '{"temp": 22.5, "humidity": 65}', qos=1)
info.wait_for_publish()
```
### 3. Async Wrapper Pattern
- **Repository**: <https://github.com/mossblaser/aiomqtt> (53 stars)
- **Use Case**: AsyncIO wrapper around paho-mqtt for non-blocking event loops
- **Pattern**: Integration with modern Python async/await patterns
Note: The model must verify that paho-mqtt itself does NOT have native async support. Third-party wrappers exist:
- `aiomqtt` @ <https://github.com/empicano/aiomqtt> (recommended, actively maintained)
- `asyncio-mqtt` @ <https://github.com/sbtinstruments/asyncio-mqtt> (deprecated)
## Integration Patterns
### 1. MQTT Broker Compatibility
The model must verify that paho-mqtt works with these brokers @ <https://mqtt.org/software/>:
- **Eclipse Mosquitto**: Open-source, lightweight broker @ <https://mosquitto.org/>
- **HiveMQ**: Enterprise MQTT platform @ <https://www.hivemq.com/>
- **EMQX**: Scalable, distributed broker @ <https://www.emqx.io/>
- **AWS IoT Core**: Cloud-based managed service
- **Azure IoT Hub**: Microsoft cloud IoT platform
- **Google Cloud IoT Core**: Google cloud service
Example with Mosquitto @ <http://www.steves-internet-guide.com/into-mqtt-python-client/>:
```python
import paho.mqtt.client as mqtt
def on_message(client, userdata, message):
print(f"Received: {message.payload.decode()} on {message.topic}")
client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2, client_id="python_client")
client.on_message = on_message
# Local Mosquitto broker
client.connect("localhost", 1883, 60)
client.subscribe("test/topic")
client.loop_forever()
```
### 2. WebSocket Transport
The model must verify WebSocket support @ <https://github.com/eclipse-paho/paho.mqtt.python/blob/master/ChangeLog.txt>:
```python
import paho.mqtt.client as mqtt
# Connect via WebSocket (useful for browser-based or proxy environments)
client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2, transport="websockets")
# Configure WebSocket path and headers
client.ws_set_options(path="/mqtt", headers={'User-Agent': 'Paho-Python'})
# Connect to broker's WebSocket port
client.connect("mqtt.example.com", 8080, 60)
```
### 3. TLS/SSL Encryption
The model must verify TLS support @ <https://eclipse.dev/paho/files/paho.mqtt.python/html/client.html>:
```python
import paho.mqtt.client as mqtt
import ssl
client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)
# Server certificate validation
client.tls_set(
ca_certs="ca.crt",
certfile="client.crt",
keyfile="client.key",
tls_version=ssl.PROTOCOL_TLSv1_2
)
# For testing only: disable certificate verification (insecure!)
# client.tls_insecure_set(True)
client.connect("secure.mqtt.broker", 8883)
```
## Usage Examples
### Basic Publish
Example from official docs @ <https://github.com/eclipse-paho/paho.mqtt.python/blob/master/README.rst>:
```python
import paho.mqtt.publish as publish
# One-shot publish (connect, publish, disconnect)
publish.single(
"home/temperature",
payload="22.5",
hostname="mqtt.eclipseprojects.io",
port=1883
)
# Multiple messages at once
msgs = [
{'topic': "sensor/temp", 'payload': "22.5"},
{'topic': "sensor/humidity", 'payload': "65"},
('sensor/pressure', '1013', 0, False) # Alternative tuple format
]
publish.multiple(msgs, hostname="mqtt.eclipseprojects.io")
```
### Basic Subscribe
Example from official docs @ <https://github.com/eclipse-paho/paho.mqtt.python/blob/master/README.rst>:
```python
import paho.mqtt.subscribe as subscribe
# Simple blocking subscribe (receives one message)
msg = subscribe.simple("home/temperature", hostname="mqtt.eclipseprojects.io")
print(f"{msg.topic}: {msg.payload.decode()}")
# Callback-based subscription
def on_message_handler(client, userdata, message):
print(f"{message.topic}: {message.payload.decode()}")
userdata["count"] += 1
if userdata["count"] >= 10:
client.disconnect() # Stop after 10 messages
subscribe.callback(
on_message_handler,
"sensors/#",
hostname="mqtt.eclipseprojects.io",
userdata={"count": 0}
)
```
### Production-Grade Client
Example combining best practices @ <https://cedalo.com/blog/configuring-paho-mqtt-python-client-with-examples/>:
```python
import paho.mqtt.client as mqtt
import time
def on_connect(client, userdata, flags, reason_code, properties):
if reason_code.is_failure:
print(f"Connection failed: {reason_code}")
return
print(f"Connected with result code {reason_code}")
# Subscribe in on_connect ensures subscriptions persist after reconnection
client.subscribe("sensors/#", qos=1)
def on_disconnect(client, userdata, flags, reason_code, properties):
if reason_code != 0:
print(f"Unexpected disconnect. Reconnecting... (code: {reason_code})")
def on_message(client, userdata, message):
print(f"Topic: {message.topic}")
print(f"Payload: {message.payload.decode()}")
print(f"QoS: {message.qos}")
print(f"Retain: {message.retain}")
def on_publish(client, userdata, mid, reason_code, properties):
print(f"Message {mid} published")
# Create client with VERSION2 callbacks (recommended)
client = mqtt.Client(
mqtt.CallbackAPIVersion.VERSION2,
client_id="sensor_monitor",
clean_session=False # Persistent session
)
# Set callbacks
client.on_connect = on_connect
client.on_disconnect = on_disconnect
client.on_message = on_message
client.on_publish = on_publish
# Authentication
client.username_pw_set("username", "password")
# TLS (if required)
# client.tls_set(ca_certs="ca.crt")
# Reconnection settings
client.reconnect_delay_set(min_delay=1, max_delay=120)
# Connect
client.connect("mqtt.example.com", 1883, keepalive=60)
# Start network loop in background thread
client.loop_start()
# Application logic
try:
while True:
# Publish sensor data
result = client.publish("sensors/temperature", "22.5", qos=1)
result.wait_for_publish() # Block until published
time.sleep(5)
except KeyboardInterrupt:
print("Shutting down...")
finally:
client.loop_stop()
client.disconnect()
```
### Loop Management Patterns
The model must explain three loop options @ <https://eclipse.dev/paho/files/paho.mqtt.python/html/client.html>:
```python
import paho.mqtt.client as mqtt
client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)
client.connect("mqtt.eclipseprojects.io", 1883)
# OPTION 1: Blocking loop (simplest)
# Runs forever, handles reconnection automatically
client.loop_forever()
# OPTION 2: Threaded loop (recommended for most cases)
# Runs in background thread, main thread free for other work
client.loop_start()
# ... do other work ...
client.loop_stop()
# OPTION 3: Manual loop (advanced, full control)
# Must be called regularly, manual reconnection handling
while True:
rc = client.loop(timeout=1.0)
if rc != 0:
# Handle connection error
break
```
## When NOT to Use paho-mqtt
The model must provide clear decision guidance based on verified constraints:
### Use HTTP/REST Instead When
1. **Request-Response Pattern**: Simple one-off queries without persistent connection
- Example: Weather API calls, database queries
- Reason: HTTP is simpler for synchronous request-response
2. **Large Payload Transfer**: Transferring files, images, or large datasets
- Example: Uploading videos, downloading reports
- Reason: HTTP has better tooling for chunked transfer, range requests
3. **Browser-Based Only**: Pure web applications without IoT integration
- Example: Standard web app, SPA without real-time requirements
- Reason: REST APIs are natively supported by browsers
4. **Strong Consistency Required**: Immediate consistency across all clients
- Example: Financial transactions, inventory management
- Reason: MQTT is eventually consistent, REST can enforce immediate consistency
### Use WebSockets Instead When
1. **Full-Duplex, Low-Latency Communication**: Real-time chat, gaming, collaborative editing
- Example: Slack-like messaging, Google Docs collaboration
- Reason: WebSockets provide bidirectional streams without MQTT protocol overhead
2. **Custom Protocol**: Need full control over message format and semantics
- Example: Proprietary binary protocols, custom RPC
- Reason: WebSockets are a transport layer, MQTT adds specific semantics
### Use Message Queues (RabbitMQ, Kafka) Instead When
1. **Complex Routing Logic**: Advanced routing rules, message transformation
- Example: Enterprise service bus, workflow orchestration
- Reason: RabbitMQ exchanges provide richer routing than MQTT topics
2. **High-Throughput Log Streaming**: Million+ messages per second, log aggregation
- Example: Centralized logging, event sourcing at scale
- Reason: Kafka optimized for high-throughput sequential writes
3. **Message Persistence and Replay**: Need to replay message history
- Example: Event sourcing, audit trails
- Reason: Kafka provides durable log storage, MQTT has limited persistence
## Decision Matrix: MQTT vs Alternatives
The model must provide this decision matrix based on verified use cases:
| **Use Case** | **MQTT (paho-mqtt)** | **HTTP/REST** | **WebSocket** | **Message Queue** |
| --- | --- | --- | --- | --- |
| **IoT Sensor Data** | ✅ Optimal | ❌ Too heavy | ⚠️ Possible | ❌ Overkill |
| **Home Automation** | ✅ Optimal | ❌ Polling inefficient | ⚠️ Possible | ❌ Too complex |
| **Mobile Notifications** | ✅ Good (battery efficient) | ⚠️ Polling wastes battery | ✅ Good | ❌ Overkill |
| **Real-time Chat** | ⚠️ Possible | ❌ No real-time | ✅ Optimal | ⚠️ Possible |
| **File Transfer** | ❌ Not designed for this | ✅ Better tools | ⚠️ Possible | ❌ Wrong tool |
| **Microservices RPC** | ⚠️ Possible | ✅ Standard approach | ❌ Overkill | ✅ Enterprise scale |
| **Telemetry Collection** | ✅ Optimal | ❌ Too chatty | ❌ Overkill | ✅ At massive scale |
### Use MQTT When
The model must verify these conditions favor MQTT:
1. **Bandwidth is constrained** (cellular, satellite links)
2. **Network is unreliable** (intermittent connectivity)
3. **Many-to-many communication** (pub-sub pattern)
4. **Low latency required** (< 100ms message delivery)
5. **Battery-powered devices** (minimal protocol overhead)
6. **IoT/M2M communication** (devices, sensors, actuators)
7. **Topic-based routing** (hierarchical topic namespaces)
### Use HTTP/REST When
The model must verify these conditions favor HTTP:
1. **Request-response pattern** (client initiates, server responds)
2. **Stateless interactions** (no persistent connection needed)
3. **Large payloads** (files, documents, media)
4. **Caching required** (HTTP caching semantics)
5. **Browser-based clients** (native browser support)
6. **Standard CRUD operations** (REST conventions)
### Use WebSocket When
The model must verify these conditions favor WebSocket:
1. **Full-duplex communication** (simultaneous send/receive)
2. **Custom protocol** (need full control over wire format)
3. **Browser-based real-time** (chat, collaboration, gaming)
4. **Lower latency than MQTT** (no protocol overhead)
5. **Simple point-to-point** (no pub-sub routing needed)
## Known Limitations
The model must cite these verified limitations @ <https://github.com/eclipse-paho/paho.mqtt.python/blob/master/README.rst>:
### Session Persistence
1. **Memory-Only Sessions**: When `clean_session=False`, session state is NOT persisted to disk
- Impact: Session lost if Python process restarts
- Lost data: QoS 2 messages in-flight, pending QoS 1/2 publishes
- Mitigation: Use `wait_for_publish()` to ensure message delivery before shutdown
```python
# Session is only in memory!
client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2, clean_session=False)
# Ensure message is fully acknowledged before shutdown
msg_info = client.publish("critical/data", "important", qos=2)
msg_info.wait_for_publish() # Blocks until PUBCOMP received
```
2. **QoS 2 Duplicate Risk**: With `clean_session=True`, QoS > 0 messages are republished after reconnection
- Impact: QoS 2 messages may be received twice (non-compliant with MQTT spec)
- Standard requires: Discard unacknowledged messages on reconnection
- Recommendation: Use `clean_session=False` for exactly-once guarantees
### Native Async Support
The model must verify that paho-mqtt does NOT have native asyncio support:
- **Workaround**: Use third-party wrappers like `aiomqtt` @ <https://github.com/empicano/aiomqtt>
- **Alternative**: Use threaded loops (`loop_start()`) or external event loop support
```python
# NOT native async - need wrapper
import asyncio
from aiomqtt import Client # Third-party wrapper
async def main():
async with Client("mqtt.eclipseprojects.io") as client:
async with client.messages() as messages:
await client.subscribe("sensors/#")
async for message in messages:
print(message.payload.decode())
asyncio.run(main())
```
## Installation
The model must verify installation from official sources @ <https://pypi.org/project/paho-mqtt/>:
```bash
# Standard installation
pip install paho-mqtt
# With SOCKS proxy support
pip install paho-mqtt[proxy]
# Development installation from source
git clone https://github.com/eclipse-paho/paho.mqtt.python
cd paho.mqtt.python
pip install -e .
```
## Common Patterns and Best Practices
### 1. Reconnection Handling
The model must recommend this pattern @ <https://eclipse.dev/paho/files/paho.mqtt.python/html/client.html>:
```python
import paho.mqtt.client as mqtt
def on_connect(client, userdata, flags, reason_code, properties):
# ALWAYS subscribe in on_connect callback
# This ensures subscriptions are renewed after reconnection
client.subscribe("sensors/#", qos=1)
def on_disconnect(client, userdata, flags, reason_code, properties):
if reason_code != 0:
print(f"Unexpected disconnect: {reason_code}")
# loop_forever() and loop_start() will automatically reconnect
client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)
client.on_connect = on_connect
client.on_disconnect = on_disconnect
# Configure reconnection delay
client.reconnect_delay_set(min_delay=1, max_delay=120)
client.connect("mqtt.example.com", 1883)
client.loop_forever() # Handles reconnection automatically
```
### 2. Logging for Debugging
The model must recommend enabling logging @ <https://eclipse.dev/paho/files/paho.mqtt.python/html/client.html>:
```python
import logging
import paho.mqtt.client as mqtt
# Enable standard Python logging
logging.basicConfig(level=logging.DEBUG)
client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)
client.enable_logger() # Uses standard logging module
# Or use custom on_log callback
def on_log(client, userdata, level, buf):
if level == mqtt.MQTT_LOG_ERR:
print(f"ERROR: {buf}")
client.on_log = on_log
```
### 3. Graceful Shutdown
The model must recommend this pattern for clean disconnection:
```python
import paho.mqtt.client as mqtt
import signal
import sys
client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)
def signal_handler(sig, frame):
print("Shutting down gracefully...")
client.disconnect() # Triggers clean disconnect
client.loop_stop()
sys.exit(0)
signal.signal(signal.SIGINT, signal_handler)
client.connect("mqtt.example.com", 1883)
client.loop_start()
# Application runs...
signal.pause() # Wait for SIGINT
```
## References and Sources
The model must cite these verified sources used in this research:
1. Official Eclipse Paho repository @ <https://github.com/eclipse-paho/paho.mqtt.python>
2. Official documentation @ <https://eclipse.dev/paho/files/paho.mqtt.python/html/>
3. MQTT specification @ <https://mqtt.org/mqtt-specification/>
4. Steve's Internet Guide (MQTT tutorials) @ <http://www.steves-internet-guide.com/>
5. HiveMQ MQTT client guide @ <https://www.hivemq.com/blog/mqtt-client-library-paho-python/>
6. Cedalo MQTT configuration guide @ <https://cedalo.com/blog/configuring-paho-mqtt-python-client-with-examples/>
7. EMQX Python MQTT guide @ <https://www.emqx.com/en/blog/how-to-use-mqtt-in-python>
8. Home Assistant core repository @ <https://github.com/home-assistant/core>
9. AWS IoT Python example @ <https://github.com/mariocannistra/python-paho-mqtt-for-aws-iot>
10. aiomqtt async wrapper @ <https://github.com/empicano/aiomqtt>
## Summary
The model must conclude that paho-mqtt is the recommended solution when:
1. Building IoT applications with resource-constrained devices
2. Implementing pub-sub messaging patterns with topic-based routing
3. Operating in unreliable or bandwidth-limited network environments
4. Requiring specific QoS guarantees for message delivery
5. Integrating with standard MQTT brokers (Mosquitto, HiveMQ, EMQX, AWS IoT)
The model must avoid paho-mqtt when:
1. Simple request-response patterns suffice (use HTTP/REST)
2. Real-time, low-latency browser communication needed (use WebSocket)
3. Complex message routing or high-throughput streaming required (use RabbitMQ/Kafka)
4. Large file transfers or binary data streaming needed (use HTTP)
The model must verify that paho-mqtt is production-ready, actively maintained by the Eclipse Foundation, and the de facto standard MQTT client library for Python.

View File

@@ -0,0 +1,512 @@
---
title: "Prefect: Modern Workflow Orchestration Platform"
library_name: prefect
pypi_package: prefect
category: workflow-orchestration
python_compatibility: "3.9+"
last_updated: "2025-11-02"
official_docs: "https://docs.prefect.io"
official_repository: "https://github.com/PrefectHQ/prefect"
maintenance_status: "active"
---
# Prefect: Modern Workflow Orchestration
## Core Purpose
Prefect solves workflow orchestration with a Python-first approach that turns regular Python functions into production-ready data pipelines. Unlike legacy orchestrators that require DAG definitions and framework-specific operators, Prefect observes native Python code execution and provides orchestration through simple decorators@[1].
**Problem Domain:** Coordinating multi-step data workflows, handling failures with retries, scheduling recurring jobs, monitoring pipeline execution, and managing dependencies between tasks without writing boilerplate orchestration code@[2].
**When to Use:** Building data pipelines, ML workflows, ETL processes, or any multi-step automation that needs scheduling, retry logic, state tracking, and observability@[3].
**What You Would Reinvent:** Manual retry logic, state management, dependency coordination, scheduling systems, execution monitoring, error handling, result caching, and workflow visibility dashboards@[4].
## Official Information
- **Repository:** <https://github.com/PrefectHQ/prefect>
- **PyPI Package:** `prefect` (current: v3.4.24)@[5]
- **Documentation:** <https://docs.prefect.io>
- **License:** Apache-2.0@[6]
- **Maintenance:** Actively maintained by PrefectHQ with 1059 open issues, 20.6K stars, regular releases@[7]
- **Community:** 30K+ engineers, active Slack community@[8]
## Python Compatibility
- **Minimum Version:** Python 3.9@[9]
- **Maximum Version:** Python 3.13 (3.14 not yet supported)@[9]
- **Async Support:** Full native async/await support throughout@[10]
- **Type Hints:** First-class support, type-safe structured outputs@[11]
## Core Capabilities
### 1. Pythonic Flow Definition
Write workflows as regular Python functions with `@flow` and `@task` decorators:
```python
from prefect import flow, task
import httpx
@task(log_prints=True)
def get_stars(repo: str):
url = f"https://api.github.com/repos/{repo}"
count = httpx.get(url).json()["stargazers_count"]
print(f"{repo} has {count} stars!")
@flow(name="GitHub Stars")
def github_stars(repos: list[str]):
for repo in repos:
get_stars(repo)
# Run directly
if __name__ == "__main__":
github_stars(["PrefectHQ/Prefect"])
```
@[12]
### 2. Dynamic Runtime Workflows
Create tasks dynamically based on data, not static DAG definitions:
```python
from prefect import flow, task

@task
def get_customer_ids() -> list[str]:
    # Placeholder for a real lookup (database query, API call, ...)
    return [f"customer-{n}" for n in range(10)]

@task
def process_customer(customer_id: str) -> str:
    return f"Processed {customer_id}"

@flow
def main() -> list[str]:
    customer_ids = get_customer_ids()  # Runtime data
    # Map tasks across dynamic data
    results = process_customer.map(customer_ids)
    return results
```
@[13]
### 3. Flexible Scheduling
Deploy workflows with cron, interval, or RRule schedules:
```python
# Serve with cron schedule
if __name__ == "__main__":
github_stars.serve(
name="daily-stars",
cron="0 8 * * *", # Daily at 8 AM
parameters={"repos": ["PrefectHQ/prefect"]}
)
```
@[14]
```python
# Or use interval-based scheduling
my_flow.deploy(
name="my-deployment",
work_pool_name="my-work-pool",
interval=timedelta(minutes=10)
)
```
@[15]
### 4. Built-in Retries and State Management
Automatic retry logic and state tracking:
```python
@task(retries=3, retry_delay_seconds=60)
def fetch_data():
# Automatically retries on failure
return api_call()
```
@[16]
### 5. Concurrent Task Execution
Run tasks in parallel with `.submit()`:
```python
# cool_task is assumed to be a @task defined earlier
@flow
def my_workflow():
    future = cool_task.submit()  # Non-blocking: the task runs concurrently
    print(future.result())       # Block until the task finishes and use its result
```
@[17]
### 6. Event-Driven Automations
React to events, not just schedules:
```python
# Trigger flows on external events
my_flow.deploy(
triggers=[
DeploymentEventTrigger(
expect=["s3.file.uploaded"]
)
]
)
```
@[18]
## Real-World Integration Patterns
### Integration with dbt
Orchestrate dbt transformations within Prefect flows:
```python
from prefect_dbt import DbtCoreOperation
@flow
def dbt_flow():
result = DbtCoreOperation(
commands=["dbt run", "dbt test"],
project_dir="/path/to/dbt/project"
).run()
return result
```
@[19]
**Example Repository:** <https://github.com/anna-geller/prefect-dataplatform> (106 stars) - Shows Prefect + dbt + Snowflake data platform@[20]
### AWS Deployment Pattern
Deploy to AWS ECS Fargate:
```yaml
# prefect.yaml configuration
work_pool:
name: aws-ecs-pool
type: ecs
deployments:
- name: production
work_pool_name: aws-ecs-pool
schedules:
- cron: "0 */4 * * *"
```
@[21]
**Example Repository:** <https://github.com/anna-geller/dataflow-ops> (116 stars) - Automated deployments to AWS ECS@[22]
### Docker Compose Self-Hosted
Run Prefect server with Docker Compose:
```yaml
version: "3.8"
services:
prefect-server:
image: prefecthq/prefect:latest
command: prefect server start
ports:
- "4200:4200"
environment:
- PREFECT_API_DATABASE_CONNECTION_URL=postgresql+asyncpg://postgres:password@postgres:5432/prefect
```
@[23]
**Example Repositories:**
- <https://github.com/rpeden/prefect-docker-compose> (142 stars)@[24]
- <https://github.com/flavienbwk/prefect-docker-compose> (161 stars)@[25]
## Common Usage Patterns
### Pattern 1: ETL Pipeline with Retries
```python
from prefect import flow, task
from prefect.tasks import exponential_backoff
@task(retries=3, retry_delay_seconds=exponential_backoff(backoff_factor=2))
def extract_data(source: str):
# Fetch from API with automatic retries
return fetch_api_data(source)
@task
def transform_data(raw_data):
return clean_and_transform(raw_data)
@task
def load_data(data, destination: str):
write_to_database(data, destination)
@flow(log_prints=True)
def etl_pipeline():
raw = extract_data("https://api.example.com/data")
transformed = transform_data(raw)
load_data(transformed, "postgresql://db")
```
@[26]
### Pattern 2: Scheduled Data Sync
```python
from prefect import flow

@flow
def sync_customer_data():
customers = fetch_customers()
for customer in customers:
sync_to_warehouse(customer)
# Schedule to run every hour
if __name__ == "__main__":
sync_customer_data.serve(
name="hourly-sync",
interval=3600, # Every hour
tags=["production", "sync"]
)
```
@[27]
### Pattern 3: ML Pipeline with Caching
```python
from datetime import timedelta

from prefect import flow, task
from prefect.tasks import task_input_hash

@task(cache_key_fn=task_input_hash, cache_expiration=timedelta(hours=1))
def load_training_data():
# Expensive data loading - cached for 1 hour
return load_large_dataset()
@task
def train_model(data):
return train_ml_model(data)
@flow
def ml_pipeline():
data = load_training_data() # Reuses cached result
model = train_model(data)
return model
```
@[28]
## Integration Ecosystem
### Data Transformation
- **dbt:** Native integration via `prefect-dbt` package (archived, use dbt Cloud API)@[29]
- **dbt Cloud:** Official integration for triggering dbt Cloud jobs@[30]
### Data Warehouses
- **Snowflake:** `prefect-snowflake` for query execution@[31]
- **BigQuery:** `prefect-gcp` for BigQuery operations@[32]
- **Redshift, PostgreSQL:** Standard database connectors@[33]
### Cloud Platforms
- **AWS:** `prefect-aws` (S3, ECS, Lambda, Batch)@[34]
- **GCP:** `prefect-gcp` (GCS, BigQuery, Cloud Run)@[35]
- **Azure:** `prefect-azure` (Blob Storage, Container Instances)@[36]
### Container Orchestration
- **Docker:** Native Docker build and push support@[37]
- **Kubernetes:** `prefect-kubernetes` for K8s deployments@[38]
- **ECS Fargate:** Built-in ECS work pools@[39]
### Data Quality
- **Great Expectations:** `prefect-great-expectations` for validation@[40]
- **Monte Carlo:** Circuit breaker integrations@[41]
### ML/AI
- **LangChain:** `langchain-prefect` for LLM workflows (archived)@[42]
- **MLflow:** Track experiments within Prefect flows@[43]
## Deployment Options
### 1. Prefect Cloud (Managed)
Fully managed orchestration platform with:
- Hosted API and UI
- Team collaboration features
- RBAC and access controls
- Enterprise SLAs
- Automations and event triggers@[44]
**Pricing:** Free tier + usage-based pricing@[45]
### 2. Self-Hosted Prefect Server
Open-source server you deploy:
```bash
# Start local server
prefect server start
# Or deploy via Docker
docker run -p 4200:4200 prefecthq/prefect:latest prefect server start
```
@[46]
**Requirements:** PostgreSQL database, Redis (optional for caching)@[47]
### 3. Hybrid Execution Model
Orchestration in cloud, execution anywhere:
- Control plane in Prefect Cloud
- Workers run in your infrastructure
- Code never leaves your environment@[48] (see the worker sketch below)
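A minimal sketch of the hybrid model in practice (assumes a Prefect Cloud account and an existing work pool named `my-work-pool`; both are placeholders):
```bash
# Authenticate this machine against Prefect Cloud (the control plane stays in the cloud)
prefect cloud login

# Start a worker inside your own infrastructure; it polls for scheduled runs and executes them locally
prefect worker start --pool my-work-pool
```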
## When to Use Prefect
### Use Prefect When
1. **Building data pipelines** that need scheduling, retries, and monitoring@[49]
2. **Orchestrating ML workflows** with dynamic dependencies@[50]
3. **Coordinating microservices** or distributed tasks@[51]
4. **Migrating from cron jobs** to a modern orchestrator@[52]
5. **Need Python-native workflows** without DSL overhead@[53]
6. **Want local development** with production parity@[54]
7. **Require event-driven automation** beyond scheduling@[55]
8. **Need visibility** into workflow execution and failures@[56]
### Use Simple Scripts/Cron When
1. **Single-step tasks** with no dependencies@[57]
2. **One-off scripts** that rarely run@[58]
3. **No retry logic** needed@[59]
4. **No failure visibility** required@[60]
5. **Under 5 lines of code** total@[61]
## Prefect vs. Alternatives
### Prefect vs. Airflow
| Dimension | Prefect | Airflow |
| --- | --- | --- |
| **Development Model** | Pure Python functions with decorators | DAG definitions with operators |
| **Dynamic Workflows** | Runtime task creation based on data | Static DAG structure at parse time |
| **Local Development** | Run locally without infrastructure | Requires full Airflow setup |
| **Learning Curve** | Minimal - just Python | Steep - framework concepts required |
| **Infrastructure** | Runs anywhere Python runs | Multi-component (scheduler, webserver, DB) |
| **Cost** | 60-70% lower (per customer reports)@[62] | Higher due to always-on infrastructure@[63] |
| **Best For** | ML/AI, modern data teams, dynamic pipelines | Traditional ETL, platform teams invested in ecosystem |
**Migration Path:** Prefect provides a 73.78% cost reduction over Astronomer (managed Airflow)@[64]
### Prefect vs. Dagster
| Dimension | Prefect | Dagster |
| ---------------- | --------------------------- | --------------------------------- |
| **Philosophy** | Workflow orchestration | Data asset orchestration |
| **Abstractions** | Flows and tasks | Software-defined assets |
| **Use Case** | General workflow automation | Data asset lineage and cataloging |
| **Complexity** | Lower barrier to entry | Higher conceptual overhead |
### Prefect vs. Metaflow
| Dimension | Prefect | Metaflow |
| -------------- | ------------------------- | --------------------- |
| **Origin** | General orchestration | Netflix ML workflows |
| **Scope** | Broad workflow automation | ML-specific pipelines |
| **Deployment** | Any infrastructure | AWS, K8s focus |
| **Community** | Larger ecosystem | ML-focused community |
## Decision Matrix
```text
Use Prefect when:
- You write Python workflows
- You need dynamic task generation
- You want local development + production parity
- You need retry/caching/scheduling out of box
- You're building ML, data, or automation pipelines
- You want low operational overhead
- Cost efficiency matters (vs. Airflow)
Use Airflow when:
- You're heavily invested in Airflow ecosystem
- Your team already knows Airflow
- You need specific Airflow operators not in Prefect
- You have dedicated platform engineering for Airflow
Use Dagster when:
- Data asset lineage is primary concern
- You're building a data platform with asset catalog
- You need software-defined assets
Use simple cron/scripts when:
- Single independent tasks
- No retry logic needed
- No monitoring required
- Runs once per day or less
```
@[65]
## Anti-Patterns and Gotchas
### Don't Use Prefect For
1. **Simple one-off scripts** - adds unnecessary overhead@[66]
2. **Real-time streaming** - designed for batch/scheduled workflows@[67]
3. **Sub-second latency requirements** - orchestration adds overhead@[68]
4. **Pure event processing** - use Kafka/RabbitMQ instead@[69]
### Common Pitfalls
1. **Over-decomposition:** Breaking every line into a task creates overhead@[70]
2. **Ignoring task inputs:** Tasks should be pure functions for caching@[71]
3. **Not using .submit():** Blocking task calls prevent parallelism@[72]
4. **Skipping local testing:** Run flows locally before deploying@[73]
## Learning Resources
- **Official Quickstart:** <https://docs.prefect.io/v3/get-started/quickstart>@[74]
- **Examples Repository:** <https://github.com/PrefectHQ/examples>@[75]
- **Community Recipes:** <https://github.com/PrefectHQ/prefect-recipes> (254 stars, archived)@[76]
- **Slack Community:** <https://prefect.io/slack>@[77]
- **YouTube Channel:** <https://www.youtube.com/c/PrefectIO/>@[78]
## Installation
```bash
# Using pip
pip install -U prefect
# Using uv (recommended)
uv add prefect
# With specific integrations
pip install prefect-aws prefect-gcp prefect-dbt
```
@[79]
## Verification Checklist
- [x] Official repository confirmed: <https://github.com/PrefectHQ/prefect>
- [x] PyPI package verified: prefect v3.4.24
- [x] Python compatibility: 3.9-3.13
- [x] License confirmed: Apache-2.0
- [x] Real-world examples: 5+ GitHub repositories with 100+ stars
- [x] Integration patterns documented: dbt, Snowflake, AWS, Docker
- [x] Decision matrix provided: vs Airflow, Dagster, Metaflow, cron
- [x] Anti-patterns identified: streaming, sub-second latency
- [x] Code examples: 6+ verified from official docs and Context7
- [x] Maintenance status: Active (1059 open issues, recent commits)
## References
Sources cited with @ notation throughout document:
[1-79] Information gathered from:
- Context7 Library ID: /prefecthq/prefect (Trust Score: 8.2, 6247 code snippets)
- Official documentation: <https://docs.prefect.io>
- GitHub repository: <https://github.com/PrefectHQ/prefect>
- PyPI package page: <https://pypi.org/project/prefect/>
- Prefect vs Airflow comparison: <https://www.prefect.io/compare/airflow>
- Example repositories: anna-geller/prefect-dataplatform, rpeden/prefect-docker-compose, flavienbwk/prefect-docker-compose, anna-geller/dataflow-ops
- Exa code context search results
- Ref documentation search results
Last verified: 2025-10-21

View File

@@ -0,0 +1,942 @@
---
title: "python-diskcache - SQLite-Backed Persistent Cache for Python"
library_name: python-diskcache
pypi_package: diskcache
category: caching
python_compatibility: "3.0+"
last_updated: "2025-11-02"
official_docs: "https://grantjenks.com/docs/diskcache"
official_repository: "https://github.com/grantjenks/python-diskcache"
maintenance_status: "active"
---
# python-diskcache - SQLite-Backed Persistent Cache for Python
## Overview
**python-diskcache** is an Apache2-licensed disk and file-backed cache library written in pure Python. It provides persistent, thread-safe, and process-safe caching using SQLite as the backend, making it suitable for applications that need caching without running a separate cache server like Redis or Memcached.
- **Official Repository:** <https://github.com/grantjenks/python-diskcache> @ grantjenks/python-diskcache
- **Documentation:** <https://grantjenks.com/docs/diskcache/> @ grantjenks.com
- **PyPI Package:** `diskcache` @ pypi.org/project/diskcache
- **License:** Apache License 2.0 @ github.com/grantjenks/python-diskcache
- **Current Version:** 5.6.3 (August 31, 2023) @ pypi.org
- **Maintenance Status:** Actively maintained, 2,647+ GitHub stars @ github.com/grantjenks/python-diskcache
## Core Purpose
### Problem diskcache Solves
1. **Persistent Caching Without External Services:** Provides disk-backed caching without requiring Redis/Memcached servers @ grantjenks.com/docs/diskcache
2. **Thread and Process Safety:** SQLite-backed cache with atomic operations safe for multi-threaded and multi-process applications @ grantjenks.com/docs/diskcache/tutorial.html
3. **Leveraging Unused Disk Space:** Utilizes empty disk space instead of competing for scarce memory in cloud environments @ github.com/grantjenks/python-diskcache/README.rst
4. **Django's Broken File Cache:** Replaces Django's problematic file-based cache with linear scaling issues @ github.com/grantjenks/python-diskcache/README.rst
### What Would Be "Reinventing the Wheel"
Without diskcache, you would need to:
- Implement SQLite-based caching with proper locking and atomicity manually @ grantjenks.com/docs/diskcache
- Build eviction policies (LRU, LFU) from scratch @ grantjenks.com/docs/diskcache/tutorial.html
- Manage thread-safe and process-safe file system operations @ grantjenks.com/docs/diskcache
- Handle serialization, compression, and expiration logic manually @ grantjenks.com/docs/diskcache/tutorial.html
- Implement cache stampede prevention for memoization @ grantjenks.com/docs/diskcache/case-study-landing-page-caching.html
## When to Use diskcache
### Use diskcache When
1. **Single-Machine Persistent Cache:** You need persistent caching on one server without distributed requirements @ grantjenks.com/docs/diskcache
2. **No External Cache Server:** You want to avoid running and managing Redis/Memcached @ github.com/grantjenks/python-diskcache/README.rst
3. **Process-Safe Caching:** Multiple processes need to share cache data safely (web workers, background tasks) @ grantjenks.com/docs/diskcache/tutorial.html
4. **Large Cache Size:** You need gigabytes of cache that would be expensive in memory @ github.com/grantjenks/python-diskcache/README.rst
5. **Django File Cache Replacement:** Django's file cache is too slow for your needs @ grantjenks.com/docs/diskcache/djangocache-benchmarks.html
6. **Memoization with Persistence:** Function results should persist across process restarts @ grantjenks.com/docs/diskcache/tutorial.html
7. **Tag-Based Eviction:** You need to invalidate related cache entries by tag @ grantjenks.com/docs/diskcache/tutorial.html
8. **Offline/Local Development:** No network cache available in development environment @ grantjenks.com/docs/diskcache
### Use Redis When
1. **Distributed Caching:** Multiple servers need to share the same cache @ grantjenks.com/docs/diskcache
2. **Sub-Millisecond Latency Critical:** Network latency acceptable for extreme speed requirements @ grantjenks.com/docs/diskcache/cache-benchmarks.html
3. **Advanced Data Structures:** Need Redis-specific types (sets, sorted sets, pub/sub) @ redis.io
4. **Cache Replication:** Require high availability and replication across nodes @ redis.io
5. **Horizontal Scaling:** Cache must scale across multiple machines @ redis.io
### Use functools.lru_cache When
1. **In-Memory Only:** Cache doesn't need to persist across process restarts @ python.org/docs
2. **Single Process:** No multi-process cache sharing needed @ python.org/docs
3. **Small Cache Size:** Cache fits comfortably in memory (megabytes, not gigabytes) @ python.org/docs
4. **Simple Memoization:** No expiration, tags, or complex eviction needed @ python.org/docs
## Decision Matrix
```text
┌──────────────────────────────┬───────────┬─────────┬────────────────┬──────────┐
│ Requirement │ diskcache │ Redis │ lru_cache │ shelve │
├──────────────────────────────┼───────────┼─────────┼────────────────┼──────────┤
│ Persistent storage │ ✓ │ ✓* │ ✗ │ ✓ │
│ Thread-safe │ ✓ │ ✓ │ ✓ │ ✗ │
│ Process-safe │ ✓ │ ✓ │ ✗ │ ✗ │
│ No external server │ ✓ │ ✗ │ ✓ │ ✓ │
│ Eviction policies │ LRU/LFU │ LRU/LFU │ LRU only │ None │
│ Tag-based invalidation │ ✓ │ Manual │ ✗ │ ✗ │
│ Expiration support │ ✓ │ ✓ │ ✗ │ ✗ │
│ Distributed caching │ ✗ │ ✓ │ ✗ │ ✗ │
│ Django integration │ ✓ │ ✓ │ ✗ │ ✗ │
│ Transactions │ ✓ │ ✓ │ ✗ │ ✗ │
│ Atomic operations │ Always │ ✓ │ ✓ │ Maybe │
│ Memoization decorators │ ✓ │ Manual │ ✓ │ ✗ │
│ Typical latency (get) │ 25 µs │ 190 µs │ 0.1 µs │ 36 µs │
│ Pure Python │ ✓ │ ✗ │ ✓ │ ✓ │
└──────────────────────────────┴───────────┴─────────┴────────────────┴──────────┘
```
@ Compiled from grantjenks.com/docs/diskcache, github.com/grantjenks/python-diskcache
**Note:** Redis persistence is optional and intended primarily for durability; disk is not Redis's primary storage model.
## Python Version Compatibility
- **Minimum Python Version:** 3.0 @ github.com/grantjenks/python-diskcache/setup.py
- **Officially Tested Versions:** 3.6, 3.7, 3.8, 3.9, 3.10 @ github.com/grantjenks/python-diskcache/README.rst
- **Development Version:** 3.10 @ github.com/grantjenks/python-diskcache/README.rst
**Python 3.11-3.14 Status:**
- **3.11:** Expected to work (no known incompatibilities)
- **3.12:** Expected to work (no known incompatibilities)
- **3.13:** Expected to work (no known incompatibilities)
- **3.14:** Expected to work (pure Python with no C dependencies)
**Dependencies:** None - pure Python with standard library only @ github.com/grantjenks/python-diskcache/setup.py
## Real-World Usage Examples
### Example Projects Using diskcache
1. **morss** (722+ stars) @ github.com/pictuga/morss
- Full-text RSS feed generator
- Pattern: Caching HTTP responses and parsed feed data
- URL: <https://github.com/pictuga/morss>
2. **git-pandas** (192+ stars) @ github.com/wdm0006/git-pandas
- Git repository analysis with pandas dataframes
- Pattern: Caching expensive git repository queries
- URL: <https://github.com/wdm0006/git-pandas>
3. **High-Traffic Website Caching** @ grantjenks.com/docs/diskcache
- Testimonial: "Reduced Elasticsearch queries by over 25% for 1M+ users/day (100+ hits/second)" - Daren Hasenkamp
- Pattern: Database query result caching in production web applications
4. **Ansible Automation** @ grantjenks.com/docs/diskcache
- Testimonial: "Sped up Ansible runs by almost 3 times" - Mathias Petermann
- Pattern: Caching lookup module results across playbook runs
### Common Usage Patterns @ grantjenks.com/docs/diskcache, exa.ai
```python
# Pattern 1: Basic Cache Operations
from diskcache import Cache
cache = Cache('/tmp/mycache')
# Dictionary-like interface
cache['key'] = 'value'
print(cache['key']) # 'value'
print('key' in cache) # True
del cache['key']
# Method-based interface with expiration
cache.set('key', 'value', expire=300) # 5 minutes
value = cache.get('key')
cache.delete('key')
# Cleanup
cache.close()
# Pattern 2: Function Memoization with Cache Decorator
from diskcache import Cache
cache = Cache('/tmp/mycache')
@cache.memoize()
def expensive_function(x, y):
# Expensive computation
import time
time.sleep(2)
return x + y
# First call takes 2 seconds
result = expensive_function(1, 2) # Slow
# Second call is instant (cached)
result = expensive_function(1, 2) # Fast!
# Pattern 3: Cache Stampede Prevention
from diskcache import Cache, memoize_stampede
import time
cache = Cache('/tmp/mycache')
@memoize_stampede(cache, expire=60, beta=0.3)
def generate_landing_page():
"""Prevents thundering herd when cache expires"""
time.sleep(0.2) # Simulate expensive computation
return "<html>Landing Page</html>"
# Multiple concurrent requests won't cause stampede
result = generate_landing_page()
# Pattern 4: FanoutCache for High Concurrency
from diskcache import FanoutCache
# Sharded cache for concurrent writes
cache = FanoutCache('/tmp/mycache', shards=8, timeout=1.0)
# Same API as Cache but with better write concurrency
cache.set('key', 'value')
value = cache.get('key')
# Pattern 5: Tag-Based Eviction
from diskcache import Cache
from io import BytesIO
cache = Cache('/tmp/mycache', tag_index=True) # Enable tag index
# Set items with tags
cache.set('user:1:profile', data1, tag='user:1')
cache.set('user:1:posts', data2, tag='user:1')
cache.set('user:1:friends', data3, tag='user:1')
# Evict all items for a specific tag
cache.evict('user:1')
# Pattern 6: Web Crawler with Persistent Storage
from diskcache import Index
# Persistent dictionary for crawled URLs
results = Index('data/results')
# Store crawled data
results['https://example.com'] = {
'html': '<html>...</html>',
'timestamp': '2025-10-21',
'status': 200
}
# Query persistent results
print(len(results))
if 'https://example.com' in results:
data = results['https://example.com']
# Pattern 7: Django Cache Configuration
# settings.py
CACHES = {
'default': {
'BACKEND': 'diskcache.DjangoCache',
'LOCATION': '/var/cache/django',
'TIMEOUT': 300,
'SHARDS': 8,
'DATABASE_TIMEOUT': 0.010, # 10 milliseconds
'OPTIONS': {
'size_limit': 2 ** 30 # 1 GB
},
},
}
# Pattern 8: Async Operation with asyncio
import asyncio
from diskcache import Cache
cache = Cache('/tmp/mycache')
async def set_async(key, value):
    loop = asyncio.get_running_loop()
    await loop.run_in_executor(None, cache.set, key, value)

async def get_async(key):
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, cache.get, key)

# Top-level await is only valid inside an async function (or a REPL/notebook)
async def main():
    await set_async('test-key', 'test-value')
    value = await get_async('test-key')
    print(value)

asyncio.run(main())
# Pattern 9: Custom Serialization with JSONDisk
import json
import zlib
from diskcache import Cache, Disk, UNKNOWN
class JSONDisk(Disk):
def __init__(self, directory, compress_level=1, **kwargs):
self.compress_level = compress_level
super().__init__(directory, **kwargs)
def put(self, key):
json_bytes = json.dumps(key).encode('utf-8')
data = zlib.compress(json_bytes, self.compress_level)
return super().put(data)
def get(self, key, raw):
data = super().get(key, raw)
return json.loads(zlib.decompress(data).decode('utf-8'))
def store(self, value, read, key=UNKNOWN):
if not read:
json_bytes = json.dumps(value).encode('utf-8')
value = zlib.compress(json_bytes, self.compress_level)
return super().store(value, read, key=key)
def fetch(self, mode, filename, value, read):
data = super().fetch(mode, filename, value, read)
if not read:
data = json.loads(zlib.decompress(data).decode('utf-8'))
return data
# Use custom disk implementation
cache = Cache('/tmp/mycache', disk=JSONDisk, disk_compress_level=6)
# Pattern 10: Cross-Process Locking
from diskcache import Lock
import time
lock = Lock(cache, 'resource-name')
with lock:
# Critical section - only one process executes at a time
print("Exclusive access to resource")
time.sleep(1)
# Pattern 11: Rate Limiting / Throttling
from diskcache import throttle

@throttle(cache, count=10, seconds=60)
def api_call():
    """Allow at most 10 calls per minute"""
    return make_expensive_api_request()

# Calls beyond the limit are delayed (the decorator sleeps) rather than raising
api_call()
```
@ Compiled from grantjenks.com/docs/diskcache, exa.ai/get_code_context
## Integration Patterns
### Django Integration @ grantjenks.com/docs/diskcache/tutorial.html
```python
# settings.py
CACHES = {
'default': {
'BACKEND': 'diskcache.DjangoCache',
'LOCATION': '/path/to/cache/directory',
'TIMEOUT': 300,
'SHARDS': 8,
'DATABASE_TIMEOUT': 0.010,
'OPTIONS': {
'size_limit': 2 ** 30 # 1 gigabyte
},
},
}
# Usage in views
from django.core.cache import cache
def my_view(request):
result = cache.get('my_key')
if result is None:
result = expensive_computation()
cache.set('my_key', result, timeout=300)
return result
```
### FastAPI with Async Caching @ exa.ai, calmcode.io
```python
from fastapi import FastAPI
import httpx
from diskcache import Cache
import asyncio
app = FastAPI()
cache = Cache('/tmp/api_cache')
async def cached_api_call(url: str):
# Check cache
if url in cache:
print(f'Using cached content for {url}')
return cache[url]
print(f'Making new request for {url}')
# Make async request
async with httpx.AsyncClient(timeout=10) as client:
response = await client.get(url)
html = response.text
cache[url] = html
return html
@app.get("/fetch")
async def fetch_data(url: str):
content = await cached_api_call(url)
return {"content": content[:1000]}
```
### Multi-Process Web Crawler @ grantjenks.com/docs/diskcache/case-study-web-crawler.html
```python
from diskcache import Index, Deque
from multiprocessing import Process
import requests
# Shared queue and results across processes
todo = Deque('data/todo')
results = Index('data/results')
def crawl():
while True:
try:
url = todo.popleft()
except IndexError:
break
response = requests.get(url)
results[url] = response.text
# Add discovered URLs to queue
for link in extract_links(response.text):
todo.append(link)
# Start multiple crawler processes
processes = [Process(target=crawl) for _ in range(4)]
for process in processes:
process.start()
for process in processes:
process.join()
print(f"Crawled {len(results)} pages")
```
## Installation
### Basic Installation @ grantjenks.com/docs/diskcache
```bash
pip install diskcache
```
### Using uv (Recommended) @ astral.sh
```bash
uv add diskcache
```
### Development Installation @ grantjenks.com/docs/diskcache/development.rst
```bash
git clone https://github.com/grantjenks/python-diskcache.git
cd python-diskcache
pip install -r requirements.txt
```
## Core API Components
### Cache Class @ grantjenks.com/docs/diskcache/tutorial.html
The basic cache implementation backed by SQLite.
```python
from diskcache import Cache
# Initialize cache
cache = Cache(directory='/tmp/mycache')
# Dictionary-like operations
cache['key'] = 'value'
value = cache['key']
'key' in cache # True
del cache['key']
# Method-based operations
cache.set('key', 'value', expire=60, tag='category')
value = cache.get('key', default=None, read=False,
expire_time=False, tag=False)
cache.delete('key')
cache.clear()
# Statistics and management
cache.volume() # Estimated disk usage
cache.stats(enable=True, reset=False) # (hits, misses)
cache.evict('tag') # Remove all entries with tag
cache.expire() # Remove expired entries
cache.close()
```
### FanoutCache Class @ grantjenks.com/docs/diskcache/tutorial.html
Sharded cache for high-concurrency write scenarios.
```python
from diskcache import FanoutCache
# Sharded cache (default 8 shards)
cache = FanoutCache(
directory='/tmp/mycache',
shards=8,
timeout=1.0,
disk=Disk,
disk_min_file_size=2 ** 15
)
# Same API as Cache
cache.set('key', 'value')
value = cache.get('key')
```
### Eviction Policies @ grantjenks.com/docs/diskcache/tutorial.html
Four eviction policies control what happens when cache size limit is reached:
```python
from diskcache import Cache
# least-recently-stored (default) - fastest
cache = Cache(eviction_policy='least-recently-stored')
# least-recently-used - updates on read
cache = Cache(eviction_policy='least-recently-used')
# least-frequently-used - tracks access count
cache = Cache(eviction_policy='least-frequently-used')
# none - no eviction, unbounded growth
cache = Cache(eviction_policy='none')
```
**Performance Characteristics:**
- **least-recently-stored:** Fastest (no read updates)
- **least-recently-used:** Slower (updates timestamp on read)
- **least-frequently-used:** Slowest (increments counter on read)
- **none:** Fastest (no eviction overhead)
### Deque and Index Classes @ grantjenks.com/docs/diskcache/tutorial.html
Persistent, process-safe data structures.
```python
from diskcache import Deque, Index
# Persistent deque (FIFO queue)
deque = Deque('data/queue')
deque.append('item')
deque.appendleft('item')
item = deque.pop()
item = deque.popleft()
# Persistent dictionary
index = Index('data/index')
index['key'] = 'value'
value = index['key']
```
## Performance Benchmarks
### Single Process Performance @ grantjenks.com/docs/diskcache/cache-benchmarks.html
```text
diskcache.Cache:
get: 19.073 µs (median)
set: 114.918 µs (median)
delete: 87.976 µs (median)
pylibmc.Client (Memcached):
get: 42.915 µs (median)
set: 44.107 µs (median)
delete: 41.962 µs (median)
Comparison vs alternatives:
dbm: get 36µs, set 900µs, delete 740µs
shelve: get 41µs, set 928µs, delete 702µs
sqlitedict: get 513µs, set 697µs, delete 1717µs
pickleDB: get 92µs, set 1020µs, delete 1020µs
```
### Multi-Process Performance (8 processes) @ grantjenks.com/docs/diskcache/cache-benchmarks.html
```text
diskcache.Cache:
get: 20.027 µs (median)
set: 129.700 µs (median)
delete: 97.036 µs (median)
redis.StrictRedis:
get: 187.874 µs (median)
set: 192.881 µs (median)
delete: 185.966 µs (median)
pylibmc.Client:
get: 95.844 µs (median)
set: 97.036 µs (median)
delete: 94.891 µs (median)
```
**Key Insight:** diskcache is faster than network-based caches (Redis, Memcached) for single-machine workloads, especially for reads. @ grantjenks.com/docs/diskcache
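A rough micro-benchmark sketch for reproducing the single-process get/set latencies on your own hardware (this is not the official benchmark harness; results vary with disk, filesystem, and load):
```python
import timeit

from diskcache import Cache

cache = Cache('/tmp/bench_cache')
cache.set('key', 'value')

# Average latency over 10,000 operations, reported in microseconds
get_us = timeit.timeit(lambda: cache.get('key'), number=10_000) / 10_000 * 1e6
set_us = timeit.timeit(lambda: cache.set('key', 'value'), number=10_000) / 10_000 * 1e6
print(f"get: {get_us:.1f} µs, set: {set_us:.1f} µs")
cache.close()
```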
### Django Cache Backend Performance @ grantjenks.com/docs/diskcache/djangocache-benchmarks.html
```text
diskcache DjangoCache:
get: 55.075 µs (median)
set: 303.984 µs (median)
delete: 228.882 µs (median)
Total: 98.465s
redis DjangoCache:
get: 214.100 µs (median)
set: 230.789 µs (median)
delete: 195.742 µs (median)
Total: 174.069s
filebased DjangoCache:
get: 114.918 µs (median)
set: 11.289 ms (median)
delete: 432.014 µs (median)
Total: 907.537s
```
**Key Insight:** diskcache is 1.8x faster than Redis and 9.2x faster than Django's file-based cache. @ grantjenks.com/docs/diskcache/djangocache-benchmarks.html
## When NOT to Use diskcache
### Scenarios Where diskcache May Not Be Suitable
1. **Distributed Systems** @ grantjenks.com/docs/diskcache
- diskcache is single-machine only
- Use Redis, Memcached, or distributed caches for multi-server architectures
- Cannot share cache across network nodes
2. **Extremely Low Latency Required** @ grantjenks.com/docs/diskcache/cache-benchmarks.html
- In-memory caches (lru_cache, dict) are faster for frequently accessed data
- diskcache adds disk I/O overhead (~20µs vs ~0.1µs)
- Consider in-memory + diskcache two-tier strategy
3. **Small Cache (< 100MB)** @ github.com/grantjenks/python-diskcache
- functools.lru_cache more appropriate for small in-memory caches
- Overhead of SQLite not justified for tiny caches
- Use lru_cache for simplicity
4. **Read-Only Access Patterns** @ grantjenks.com/docs/diskcache
- If cache is never updated after initialization
- Simple dict or frozen data structures may be simpler
- No eviction or expiration needed
5. **Cache Needs to Survive Disk Failures** @ grantjenks.com/docs/diskcache
- diskcache stores on local disk
- Disk failure = cache loss
- Redis with persistence and replication for critical caches
6. **Need Atomic Multi-Key Operations** @ grantjenks.com/docs/diskcache
- diskcache operations are single-key atomic
- No native support for transactions across multiple keys
- Redis supports MULTI/EXEC for atomic multi-key operations
7. **Advanced Data Structures Required** @ redis.io
- diskcache is key-value only
- Redis provides sets, sorted sets, lists, streams, etc.
- Use Redis if you need these structures
## Key Features
### Thread and Process Safety @ grantjenks.com/docs/diskcache/tutorial.html
All operations are atomic and safe for concurrent access:
```python
from diskcache import Cache
from multiprocessing import Process
cache = Cache('/tmp/shared')
def worker(worker_id):
for i in range(1000):
cache[f'worker_{worker_id}_key_{i}'] = f'value_{i}'
# Safe concurrent writes from multiple processes
processes = [Process(target=worker, args=(i,)) for i in range(4)]
for p in processes:
p.start()
for p in processes:
p.join()
```
### Expiration and TTL @ grantjenks.com/docs/diskcache/tutorial.html
```python
from diskcache import Cache
import time
cache = Cache()
# Set with expiration
cache.set('key', 'value', expire=5) # 5 seconds
time.sleep(6)
print(cache.get('key')) # None (expired)
# Manual expiration cleanup
cache.expire() # Remove all expired entries
```
### Tag-Based Invalidation @ grantjenks.com/docs/diskcache/tutorial.html
```python
from diskcache import Cache
cache = Cache(tag_index=True) # Enable tag index for performance
# Tag cache entries
cache.set('user:1:profile', data1, tag='user:1')
cache.set('user:1:settings', data2, tag='user:1')
cache.set('user:2:profile', data3, tag='user:2')
# Evict all entries for a tag
count = cache.evict('user:1')
print(f"Evicted {count} entries")
```
### Statistics and Monitoring @ grantjenks.com/docs/diskcache/tutorial.html
```python
from diskcache import Cache
cache = Cache()
# Enable statistics tracking
cache.stats(enable=True)
# Perform operations
for i in range(100):
cache.set(i, i)
for i in range(150):
cache.get(i)
# Get statistics
hits, misses = cache.stats(enable=False, reset=True)
print(f"Hits: {hits}, Misses: {misses}") # Hits: 100, Misses: 50
# Get cache size
volume = cache.volume()
print(f"Cache volume: {volume} bytes")
```
### Custom Serialization @ grantjenks.com/docs/diskcache/tutorial.html
```python
from diskcache import Cache, Disk, UNKNOWN
import pickle
import zlib
class CompressedDisk(Disk):
def put(self, key):
data = pickle.dumps(key)
compressed = zlib.compress(data)
return super().put(compressed)
def get(self, key, raw):
compressed = super().get(key, raw)
data = zlib.decompress(compressed)
return pickle.loads(data)
cache = Cache(disk=CompressedDisk)
```
## Migration and Compatibility
### From functools.lru_cache @ python.org/docs, grantjenks.com/docs/diskcache
```python
# Before: In-memory only
from functools import lru_cache
@lru_cache(maxsize=128)
def expensive_function(x):
return x * 2
# After: Persistent across restarts
from diskcache import Cache
cache = Cache('/tmp/mycache')
@cache.memoize()
def expensive_function(x):
return x * 2
```
### From Django File Cache @ grantjenks.com/docs/diskcache/tutorial.html
```python
# Before: Django's slow file cache
CACHES = {
'default': {
'BACKEND': 'django.core.cache.backends.filebased.FileBasedCache',
'LOCATION': '/var/tmp/django_cache',
}
}
# After: Fast diskcache
CACHES = {
'default': {
'BACKEND': 'diskcache.DjangoCache',
'LOCATION': '/var/tmp/django_cache',
'TIMEOUT': 300,
'SHARDS': 8,
'OPTIONS': {
'size_limit': 2 ** 30
}
}
}
```
### From Redis (Single Machine) @ grantjenks.com/docs/diskcache
```python
# Before: Redis client
import redis
r = redis.Redis(host='localhost', port=6379)
r.set('key', 'value')
value = r.get('key')
# After: diskcache (no server needed)
from diskcache import Cache
cache = Cache('/tmp/mycache')
cache.set('key', 'value')
value = cache.get('key')
```
## Advanced Patterns
### Cache Warming @ grantjenks.com/docs/diskcache
```python
from diskcache import Cache
def warm_cache():
cache = Cache('/tmp/mycache')
# Pre-populate cache with common queries
common_queries = load_common_queries()
for query in common_queries:
result = expensive_database_query(query)
cache.set(f'query:{query}', result, expire=3600)
print(f"Warmed cache with {len(common_queries)} entries")
```
### Two-Tier Caching @ grantjenks.com/docs/diskcache
```python
from functools import lru_cache
from diskcache import Cache
disk_cache = Cache('/tmp/mycache')
@lru_cache(maxsize=100) # Fast in-memory tier
def get_from_memory(key):
# Fall back to disk cache
return disk_cache.get(key)
def get_value(key):
# Try memory first (fast)
value = get_from_memory(key)
if value is None:
# Fetch from source and cache both tiers
value = expensive_operation(key)
disk_cache.set(key, value, expire=3600)
get_from_memory.cache_clear() # Invalidate memory
get_from_memory(key) # Warm memory cache
return value
```
## Testing and Development
### Temporary Cache for Tests @ grantjenks.com/docs/diskcache
```python
import tempfile
import shutil
from diskcache import Cache
def test_cache_operations():
# Create temporary cache directory
tmpdir = tempfile.mkdtemp()
try:
cache = Cache(tmpdir)
# Test operations
cache.set('key', 'value')
assert cache.get('key') == 'value'
cache.close()
finally:
# Cleanup
shutil.rmtree(tmpdir, ignore_errors=True)
```
### Context Manager for Cleanup @ grantjenks.com/docs/diskcache/tutorial.html
```python
from diskcache import Cache
# Automatic cleanup with context manager
with Cache('/tmp/mycache') as cache:
cache.set('key', 'value')
value = cache.get('key')
# cache.close() called automatically
```
## Additional Resources
### Official Documentation @ grantjenks.com/docs/diskcache
- Tutorial: <https://grantjenks.com/docs/diskcache/tutorial.html>
- Cache Benchmarks: <https://grantjenks.com/docs/diskcache/cache-benchmarks.html>
- Django Benchmarks: <https://grantjenks.com/docs/diskcache/djangocache-benchmarks.html>
- Case Study - Web Crawler: <https://grantjenks.com/docs/diskcache/case-study-web-crawler.html>
- Case Study - Landing Page: <https://grantjenks.com/docs/diskcache/case-study-landing-page-caching.html>
- API Reference: <https://grantjenks.com/docs/diskcache/api.html>
### Community Resources
- GitHub Repository: <https://github.com/grantjenks/python-diskcache> @ github.com
- Issue Tracker: <https://github.com/grantjenks/python-diskcache/issues> @ github.com
- PyPI Package: <https://pypi.org/project/diskcache/> @ pypi.org
- Author's Blog: <https://grantjenks.com/> @ grantjenks.com
### Related Projects by Author
- sortedcontainers: Fast pure-Python sorted collections @ github.com/grantjenks/python-sortedcontainers
- wordsegment: English word segmentation @ github.com/grantjenks/python-wordsegment
- runstats: Online statistics and regression @ github.com/grantjenks/python-runstats
## Summary
diskcache is the ideal choice for single-machine persistent caching when you need:
- Process-safe caching without running a separate server
- Gigabytes of cache using disk space instead of memory
- Better performance than Django's file cache or network caches for local workloads
- Memoization that persists across process restarts
- Tag-based invalidation for related cache entries
- Multiple eviction policies (LRU, LFU)
It provides production-grade reliability with 100% test coverage, extensive benchmarking, and stress testing. For distributed systems or when network latency is acceptable, Redis remains the better choice. For small in-memory caches, use functools.lru_cache.
**Performance Highlight:** diskcache can be faster than Redis and Memcached for single-machine workloads because it eliminates network overhead (19µs get vs 187µs for Redis). @ grantjenks.com/docs/diskcache/cache-benchmarks.html
---
- **Research completed:** 2025-10-21 @ Claude Code Agent
- **Sources verified:** GitHub, Context7, PyPI, Official Documentation, Exa Code Context @ Multiple verified sources
- **Confidence level:** High - All information cross-referenced from official sources and benchmarks

View File

@@ -0,0 +1,643 @@
---
title: "python-dotenv: Environment Variable Management for Python"
library_name: python-dotenv
pypi_package: python-dotenv
category: configuration
python_compatibility: "3.9+"
last_updated: "2025-11-02"
official_docs: "https://github.com/theskumar/python-dotenv#readme"
official_repository: "https://github.com/theskumar/python-dotenv"
maintenance_status: "active"
---
# python-dotenv: Environment Variable Management for Python
## Core Purpose
python-dotenv reads key-value pairs from a `.env` file and can set them as environment variables. It helps in the development of applications following the [12-factor app](https://12factor.net/) principles.
### Problem Solved
The model must state that python-dotenv addresses development-environment configuration by loading environment variables from `.env` files, eliminating the need to set them manually in the shell before running applications during local development.
Without python-dotenv, developers would need to:
- Manually export environment variables in shell sessions before running applications
- Hardcode configuration values in source code (security risk)
- Write custom file-reading and environment-setting code (reinventing the wheel)
- Maintain separate shell scripts for different environments
### 12-Factor App Methodology
The model must verify that python-dotenv implements the configuration principle from the 12-factor methodology: "Store config in the environment." This separation of configuration from code enables:
- Different configurations per environment (development, staging, production)
- No credentials in version control
- Configuration changes without code deployment
- Portable code across environments
## Official Information
The model must cite the following verified sources:
- **PyPI Package**: `python-dotenv` @ <https://pypi.org/project/python-dotenv/>
- **Version**: 1.1.1 (verified 2025-10-21)
- **Repository**: <https://github.com/theskumar/python-dotenv>
- **Stars**: 8,443+ GitHub stars
- **License**: BSD-3-Clause
- **Maintenance**: Production/Stable status, actively maintained by Saurabh Kumar and Bertrand Bonnefoy-Claudet
- **Python Support**: 3.9, 3.10, 3.11, 3.12, 3.13 + PyPy
## Python Version Compatibility
The model must state the following compatibility information from setup.py:
| Python Version | Support Status | Notes |
| -------------- | -------------- | ----------------------------- |
| 3.9 | Supported | Minimum version |
| 3.10 | Supported | Full support |
| 3.11 | Supported | Full support |
| 3.12 | Supported | Full support |
| 3.13 | Supported | Latest version |
| 3.14 | Expected | No breaking changes expected |
| PyPy | Supported | PyPy implementation supported |
The model must verify that python-dotenv has no version-specific features and works identically across supported Python versions.
## Installation
```bash
# Basic installation
pip install python-dotenv
# With CLI support
pip install "python-dotenv[cli]"
```
## Usage Examples
### Basic .env File Loading
The model must demonstrate the standard pattern from official documentation:
```python
# app.py
from dotenv import load_dotenv
import os
# Load variables from .env file
load_dotenv()
# Access environment variables
database_url = os.getenv('DATABASE_URL')
api_key = os.getenv('API_KEY')
debug = os.getenv('DEBUG', 'False') == 'True'
```
Corresponding `.env` file:
```bash
# .env
DATABASE_URL=postgresql://localhost/mydb
API_KEY=secret_key_12345
DEBUG=True
```
### Advanced Configuration Management
The model must show the dictionary-based pattern for merging multiple configuration sources:
```python
from dotenv import dotenv_values
import os
config = {
**dotenv_values(".env.shared"), # load shared development variables
**dotenv_values(".env.secret"), # load sensitive variables
**os.environ, # override with environment variables
}
# Access with priority: os.environ > .env.secret > .env.shared
database_url = config['DATABASE_URL']
```
### Multi-Environment Loading
The model must demonstrate environment-specific configuration loading:
```python
from dotenv import load_dotenv, find_dotenv
import os
# Determine environment
env = os.getenv('ENVIRONMENT', 'development')
dotenv_path = f'.env.{env}'
# Load environment-specific file
load_dotenv(dotenv_path=dotenv_path)
# .env.development, .env.staging, .env.production
```
### Variable Expansion with Defaults
The model must show POSIX-style variable interpolation:
```bash
# .env with variable expansion
DOMAIN=example.org
EMAIL=admin@${DOMAIN}
API_URL=https://${DOMAIN}/api
# Default values for missing variables
DATABASE_HOST=${DB_HOST:-localhost}
DATABASE_PORT=${DB_PORT:-5432}
REDIS_URL=redis://${REDIS_HOST:-localhost}:${REDIS_PORT:-6379}
```
### IPython/Jupyter Integration
The model must demonstrate the IPython extension usage:
```python
# In Jupyter notebook
%load_ext dotenv
%dotenv
# Load specific file
%dotenv /path/to/.env.local
# Override existing variables
%dotenv -o
# Verbose output
%dotenv -v
```
### CLI Usage
The model must show command-line interface examples:
```bash
# Set variables
dotenv set DATABASE_URL "postgresql://localhost/mydb"
dotenv set API_KEY "secret_key_123"
# Get specific value
dotenv get API_KEY
# List all variables
dotenv list
# List as JSON
dotenv list --format=json
# Run command with loaded environment
dotenv run -- python manage.py runserver
dotenv run -- pytest tests/
```
## Integration Patterns
### Django Integration
The model must demonstrate Django integration in manage.py and settings.py:
```python
# manage.py
import os
import sys
from dotenv import load_dotenv
if __name__ == '__main__':
# Load .env before Django imports settings
load_dotenv()
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'myproject.settings')
from django.core.management import execute_from_command_line
execute_from_command_line(sys.argv)
```
```python
# settings.py
import os
from dotenv import load_dotenv
load_dotenv()
SECRET_KEY = os.getenv('SECRET_KEY')
DEBUG = os.getenv('DEBUG', 'False') == 'True'
DATABASES = {
'default': {
'ENGINE': 'django.db.backends.postgresql',
'NAME': os.getenv('DB_NAME'),
'USER': os.getenv('DB_USER'),
'PASSWORD': os.getenv('DB_PASSWORD'),
'HOST': os.getenv('DB_HOST', 'localhost'),
'PORT': os.getenv('DB_PORT', '5432'),
}
}
```
### Flask Integration
The model must show Flask application factory pattern:
```python
# app.py or __init__.py
from flask import Flask
from dotenv import load_dotenv
import os
load_dotenv()
def create_app():
app = Flask(__name__)
app.config['SECRET_KEY'] = os.getenv('SECRET_KEY')
app.config['DATABASE_URI'] = os.getenv('DATABASE_URL')
app.config['DEBUG'] = os.getenv('FLASK_DEBUG', 'False') == 'True'
return app
if __name__ == '__main__':
app = create_app()
app.run()
```
### FastAPI Integration
The model must demonstrate Pydantic Settings integration:
```python
# config.py
from pydantic_settings import BaseSettings
from dotenv import load_dotenv
load_dotenv()
class Settings(BaseSettings):
database_url: str
api_key: str
debug: bool = False
class Config:
env_file = '.env'
env_file_encoding = 'utf-8'
settings = Settings()
```
```python
# main.py
from fastapi import FastAPI
from config import settings
app = FastAPI(debug=settings.debug)
@app.get("/")
def read_root():
return {"database": settings.database_url}
```
### Docker and Container Integration
The model must show Docker integration with environment variable precedence:
```dockerfile
# Dockerfile
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
# .env file is included in image for defaults
# But container env vars take precedence
CMD ["python", "app.py"]
```
```python
# app.py with override=False (default)
from dotenv import load_dotenv
import os
# Precedence: Docker env vars > .env file
load_dotenv(override=False)
# This respects container-injected variables
database_url = os.getenv('DATABASE_URL')
```
```yaml
# docker-compose.yml
services:
app:
build: .
environment:
- DATABASE_URL=postgresql://db/production
- DEBUG=False
# Container env vars override .env file
```
### Environment Variable Precedence
The model must explain the precedence rules, illustrated in the sketch below:
**With `load_dotenv(override=False)` (default)**:
1. Existing environment variables (highest priority)
2. Variables from .env file
3. Default values in code
**With `load_dotenv(override=True)`**:
1. Variables from .env file (highest priority)
2. Existing environment variables
3. Default values in code
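A minimal sketch of the two modes (assumes a `.env` file in the working directory containing `GREETING=from-dotenv`):
```python
import os

from dotenv import load_dotenv

os.environ["GREETING"] = "from-shell"  # simulates a variable injected by the shell or container

load_dotenv(override=False)   # default: existing environment variables win
print(os.getenv("GREETING"))  # -> "from-shell"

load_dotenv(override=True)    # .env values win
print(os.getenv("GREETING"))  # -> "from-dotenv"
```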
## Real-World Examples
The model must cite verified GitHub repositories using python-dotenv:
1. **theskumar/python-dotenv** (8,443 stars)
- Source repository with comprehensive examples
- @github:theskumar/python-dotenv
2. **daveebbelaar/langchain-experiments** (1,104 stars)
- LangChain AI applications with environment configuration
- @github:daveebbelaar/langchain-experiments
3. **iam-veeramalla/python-for-devops** (3,994 stars)
- DevOps automation scripts using dotenv
- @github:iam-veeramalla/python-for-devops
4. **AgentOps-AI/agentops** (4,978 stars)
- AI agent monitoring with environment configuration
- @github:AgentOps-AI/agentops
Common pattern observed across repositories:
```python
from dotenv import load_dotenv
import os
load_dotenv()
api_key = os.getenv('API_KEY')
```
## When to Use python-dotenv
The model must create a decision matrix based on verified use cases:
### Use python-dotenv when
- **Local Development**: Managing configuration during development where setting environment variables manually is impractical
- **Multiple Developers**: Team needs consistent local environment setup without sharing credentials in version control
- **Multiple Environments**: Application runs in development, staging, production with different configurations
- **Third-Party Services**: Application integrates with APIs requiring secret keys
- **Framework Integration**: Using Django, Flask, FastAPI where .env files are standard practice
- **CI/CD Pipelines**: Testing with different configurations in continuous integration
- **Jupyter Notebooks**: Interactive development requiring API keys and configuration
- **Docker Development**: Local development with Docker where .env provides defaults but containers can override
- **12-Factor Applications**: Following cloud-native application design principles
The model must verify these scenarios from real-world usage patterns in GitHub repositories.
## When NOT to Use python-dotenv
The model must state limitations and alternative approaches:
### Do NOT use python-dotenv when
1. **Production Secrets Management**:
- Problem: .env files are plaintext and not encrypted
- Alternative: Use HashiCorp Vault, AWS Secrets Manager, Azure Key Vault, or GCP Secret Manager (see the sketch after this list)
- Reason: Production secrets require encryption, rotation, audit logging, and access control
2. **Native Environment Variables Sufficient**:
- Problem: Adds unnecessary dependency for simple cases
- Alternative: Use `os.environ` directly
- Reason: If environment already provides all variables, loading from file is redundant
3. **Compiled Applications**:
- Problem: .env files must be distributed with application
- Alternative: Bake configuration into build or use external configuration service
- Reason: Compiled/packaged applications should not rely on external files
4. **Read-Only Filesystems**:
- Problem: Cannot read .env file from disk
- Alternative: Use environment variables directly
- Reason: Some container runtimes and serverless platforms use read-only filesystems
5. **Complex Configuration Schemas**:
- Problem: .env files only support string key-value pairs
- Alternative: Use YAML, TOML, JSON with schema validation
- Reason: Complex nested configuration requires structured formats
6. **Dynamic Configuration**:
- Problem: .env files are loaded once at startup
- Alternative: Use configuration management service (Consul, etcd)
- Reason: Applications requiring runtime configuration updates need dynamic sources
7. **Distributed Systems Coordination**:
- Problem: Each service would need its own .env file
- Alternative: Use centralized configuration service
- Reason: Distributed systems need synchronized configuration updates
## Decision Guidance Matrix
The model must provide deterministic decision criteria:
```text
┌─────────────────────────────────┬──────────────────┬───────────────────┐
│ Scenario │ Use dotenv? │ Alternative │
├─────────────────────────────────┼──────────────────┼───────────────────┤
│ Local development │ YES │ N/A │
│ Development with team │ YES │ N/A │
│ CI/CD testing │ YES │ N/A │
│ Docker local development │ YES │ N/A │
│ Jupyter notebooks │ YES │ N/A │
│ Production deployments │ NO │ Secrets manager │
│ Production secrets storage │ NO │ Vault/KMS │
│ Simple scripts (no secrets) │ NO │ os.environ │
│ Complex nested config │ NO │ YAML/TOML │
│ Dynamic config updates │ NO │ Consul/etcd │
│ Serverless functions │ MAYBE │ Cloud env vars │
│ Distributed systems │ NO │ Config service │
└─────────────────────────────────┴──────────────────┴───────────────────┘
```
## File Format Reference
The model must document the supported .env syntax from official documentation:
```bash
# Basic key-value pairs
API_KEY=secret123
PORT=8080
DEBUG=true
# Quoted values
DATABASE_URL='postgresql://localhost/mydb'
APP_NAME="My Application"
# Multiline values (quoted)
CERTIFICATE="-----BEGIN CERTIFICATE-----
MIIDXTCCAkWgAwIBAgIJAKL0UG+mRbzMMA0GCSqGSIb3DQEBCwUA
-----END CERTIFICATE-----"
# Comments
# This is a comment
LOG_LEVEL=INFO # Inline comment
# Export directive (optional, no effect on parsing)
export PATH_EXTENSION=/usr/local/bin
# Variable expansion with POSIX syntax
DOMAIN=example.org
EMAIL=admin@${DOMAIN}
API_URL=https://${DOMAIN}/api
# Default values for missing variables
DATABASE_HOST=${DB_HOST:-localhost}
DATABASE_PORT=${DB_PORT:-5432}
# Escape sequences in double quotes
MESSAGE="Line 1\nLine 2\nLine 3"
TABS="Column1\tColumn2\tColumn3"
QUOTE="He said \"Hello\""
# Escape sequences in single quotes (only \\ and \')
PATH='C:\\Users\\Admin'
NAME='O\'Brien'
# Empty values
EMPTY_STRING=
EMPTY_VAR
# Spaces around = are ignored
REDIS_URL = redis://localhost:6379
```
Supported escape sequences (see the parsing sketch below):
- Double quotes: `\\`, `\'`, `\"`, `\a`, `\b`, `\f`, `\n`, `\r`, `\t`, `\v`
- Single quotes: `\\`, `\'`
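A quick way to see how these rules are applied is to parse a snippet in memory with `dotenv_values` (a sketch; the keys and values are illustrative):
```python
from io import StringIO

from dotenv import dotenv_values

# Parse .env-style text without touching os.environ (all values are strings).
raw = StringIO(
    "DOMAIN=example.org\n"
    "EMAIL=admin@${DOMAIN}\n"
    'MESSAGE="Line 1\\nLine 2"\n'
)
config = dotenv_values(stream=raw)
print(config["EMAIL"])    # admin@example.org (POSIX-style interpolation)
print(config["MESSAGE"])  # the escaped \n inside double quotes becomes a real newline
```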
## API Reference Summary
The model must list core functions from verified documentation (a usage sketch follows the table):
| Function | Purpose | Returns | Common Use |
| --- | --- | --- | --- |
| `load_dotenv(dotenv_path=None, stream=None, verbose=False, override=False, interpolate=True, encoding=None)` | Load .env into os.environ | bool (success) | Application startup |
| `dotenv_values(dotenv_path=None, stream=None, verbose=False, interpolate=True, encoding=None)` | Parse .env to dict | dict | Config merging |
| `find_dotenv(filename='.env', raise_error_if_not_found=False, usecwd=False)` | Search for .env file | str (path) | Auto-discovery |
| `get_key(dotenv_path, key_to_get, encoding=None)` | Get single value | str or None | Read specific key |
| `set_key(dotenv_path, key_to_set, value_to_set, quote_mode='always', export=False, encoding=None)` | Write key-value | tuple | Programmatic updates |
| `unset_key(dotenv_path, key_to_unset, encoding=None)` | Remove key | tuple | Cleanup |
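A short usage sketch of the single-key helpers (the file path and key are illustrative):
```python
from pathlib import Path

from dotenv import get_key, set_key, unset_key

env_file = Path(".env")          # illustrative path
env_file.touch(exist_ok=True)    # set_key requires an existing file

# Write (or update) a key; set_key returns a (success, key, value) tuple.
set_key(env_file, "API_KEY", "secret123")

# Read a single value without loading the whole file into os.environ.
print(get_key(env_file, "API_KEY"))  # secret123

# Remove the key again.
unset_key(env_file, "API_KEY")
```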
## Related Libraries
The model must cite verified alternatives from GitHub repository:
- **django-environ**: Django-specific with type coercion (@github:joke2k/django-environ)
- **python-decouple**: Strict separation with type casting (@github:HBNetwork/python-decouple)
- **environs**: Marshmallow-based validation (@github:sloria/environs)
- **dynaconf**: Multi-format with layered settings (@github:rochacbruno/dynaconf)
- **pydantic-settings**: Type-safe with Pydantic models (recommended for FastAPI)
The model must state that python-dotenv is the most widely adopted for simple .env loading.
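For typed configuration, the pydantic-settings alternative mentioned above looks roughly like this (a sketch against the pydantic-settings 2.x API; field names are illustrative):
```python
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    """Typed settings loaded from the environment and an optional .env file."""

    model_config = SettingsConfigDict(env_file=".env", env_file_encoding="utf-8")

    database_url: str
    debug: bool = False  # strings such as "true" or "1" are coerced to bool


settings = Settings()  # raises a validation error if DATABASE_URL is missing
```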
## Best Practices
The model must recommend the following verified patterns:
1. **Never commit .env to version control**
```bash
# .gitignore
.env
.env.local
.env.*.local
```
2. **Provide .env.example for documentation**
```bash
# .env.example
DATABASE_URL=postgresql://localhost/mydb
API_KEY=your_api_key_here
DEBUG=False
```
3. **Load at application entry point**
```python
# Load once at startup, not in modules
if __name__ == '__main__':
from dotenv import load_dotenv
load_dotenv()
main()
```
4. **Use override=False for production precedence**
```python
# Let container/server env vars override .env
load_dotenv(override=False)
```
5. **Validate required variables**
```python
from dotenv import load_dotenv
import os
load_dotenv()
required_vars = ['DATABASE_URL', 'API_KEY', 'SECRET_KEY']
missing = [var for var in required_vars if not os.getenv(var)]
if missing:
raise ValueError(f"Missing required environment variables: {missing}")
```
6. **Use type conversion explicitly**
```python
# python-dotenv returns strings only
DEBUG = os.getenv('DEBUG', 'False').lower() in ('true', '1', 'yes')
PORT = int(os.getenv('PORT', '8000'))
TIMEOUT = float(os.getenv('TIMEOUT', '30.0'))
```
## Common Pitfalls
The model must warn about verified issues:
1. **Strings only**: All values are strings, must convert to int/bool/float manually
2. **Load timing**: Call `load_dotenv()` before importing modules that read environment variables at import time (see the sketch after this list)
3. **Override behavior**: Default `override=False` means existing env vars take precedence
4. **Variable expansion**: Can be disabled with `interpolate=False`; confirm whether `${VAR}` expansion is needed before turning it off
5. **Production secrets**: Never use .env files for production secret storage
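A minimal, runnable illustration of the load-timing pitfall (assumes `API_KEY` is defined only in the local `.env` file):
```python
import os

from dotenv import load_dotenv

# BAD: reading the variable before load_dotenv() captures a stale value.
# Module-level os.getenv() calls in imported modules have the same problem.
api_key_at_import = os.getenv("API_KEY")   # likely None

load_dotenv()

# GOOD: read after loading, or read os.environ lazily inside functions.
api_key_after_load = os.getenv("API_KEY")  # value from .env, if defined there

print(api_key_at_import, api_key_after_load)
```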
## Verification
The model must self-check the following criteria:
- All sections present and complete
- Code examples tested against python-dotenv 1.1.1 documentation
- Sources cited with @ references
- Decision matrix provides deterministic guidance
- No claims about functionality without documentation verification
- Python version compatibility verified from setup.py
- Real-world examples from GitHub repositories with star counts
- Integration patterns match framework documentation
- Security warnings included for production usage
## Summary
The model must state that python-dotenv is the standard Python library for loading environment variables from .env files during development. It implements 12-factor app configuration principles, supports Python 3.9-3.13, and integrates with Django, Flask, FastAPI, and other frameworks. The library is suitable for development and testing environments but should not be used for production secrets management. For production deployments, environment variables should be injected by container orchestration or cloud platforms, with secrets managed by dedicated secrets management services.

View File

@@ -0,0 +1,647 @@
---
title: "Robot Framework: Generic Test Automation Framework"
library_name: robotframework
pypi_package: robotframework
category: testing
python_compatibility: "3.8+"
last_updated: "2025-11-02"
official_docs: "https://robotframework.org"
official_repository: "https://github.com/robotframework/robotframework"
maintenance_status: "active"
---
# Robot Framework
## Core Purpose
Robot Framework is a generic open source automation framework designed for acceptance testing, acceptance test driven development (ATDD), behavior driven development (BDD), and robotic process automation (RPA). It uses a keyword-driven testing approach that enables writing tests in a human-readable, tabular format.
**What problem does it solve?**
- Enables non-programmers to write and maintain automated tests
- Bridges communication gap between technical and non-technical stakeholders
- Provides a unified framework for acceptance testing across different technologies (web, API, desktop, mobile)
- Allows test automation without deep programming knowledge
- Facilitates living documentation through readable test cases
**What would be "reinventing the wheel" without it?**
Without Robot Framework, teams would need to:
- Build custom test execution frameworks with reporting capabilities
- Create their own keyword abstraction layers for business-readable tests
- Develop logging and debugging infrastructure from scratch
- Implement test data parsing for multiple formats (plain text, HTML, reStructuredText)
- Create plugin systems for extending test capabilities
- Build result aggregation and reporting tools
@Source: <https://github.com/robotframework/robotframework/blob/master/README.rst>
## Python Version Compatibility
**Minimum Python version:** 3.8
**Python 3.11-3.14 compatibility status:**
- Python 3.8-3.13: Fully supported (verified in SeleniumLibrary)
- Python 3.14: Expected to work (no known blockers)
**Version differences:**
- Robot Framework 7.x (current): Requires Python 3.8+
- Robot Framework 6.1.1: Last version supporting Python 3.6-3.7
- Robot Framework 4.1.3: Last version supporting Python 2.7, Jython, IronPython
@Source: <https://github.com/robotframework/robotframework/blob/master/INSTALL.rst> @Source: <https://github.com/robotframework/SeleniumLibrary/blob/master/README.rst>
## Installation
```bash
# Install latest stable version
pip install robotframework
# Install specific version
pip install robotframework==7.3.2
# Upgrade to latest
pip install --upgrade robotframework
# Install with common libraries
pip install robotframework robotframework-seleniumlibrary robotframework-requests
```
@Source: <https://github.com/robotframework/robotframework/blob/master/INSTALL.rst>
## When to Use Robot Framework
**Use Robot Framework when:**
1. **Acceptance testing is the primary goal**
- You need stakeholder-readable test cases
- Business analysts or QA engineers write tests without coding
- Tests serve as living documentation
2. **Keyword-driven testing fits your workflow**
- You want to build reusable test components (keywords)
- Test cases follow similar patterns with different data
- Abstraction layers improve maintainability
3. **Cross-technology testing is required**
- Testing web applications (via SeleniumLibrary or Browser library)
- API testing (via RequestsLibrary)
- Desktop applications (via various libraries)
- Mobile apps (via AppiumLibrary)
- SSH/remote systems (via SSHLibrary)
4. **Non-programmers need to contribute to tests**
- QA teams without Python expertise
- Domain experts need to validate test logic
- Collaboration between technical and business teams
5. **RPA (Robotic Process Automation) tasks**
- Automating repetitive business processes
- Desktop automation workflows
- Data migration and validation
**Do NOT use Robot Framework when:**
1. **Unit testing is the primary need**
- Use pytest for Python unit tests
- Robot Framework is too heavy for granular testing
- Fast feedback loops are critical (TDD cycles)
2. **Python-centric test suites**
- Team consists entirely of Python developers
- Complex test logic requires extensive Python code
- pytest fixtures and parametrization are more natural
3. **Performance testing**
- Use locust, JMeter, or k6 instead
- Robot Framework adds overhead for load testing
4. **Rapid TDD cycles**
- Robot Framework startup time is slower than pytest
- Test discovery and execution have overhead
- pytest is better for red-green-refactor cycles
5. **Complex test orchestration**
- Use pytest with advanced fixtures
- Dependency injection patterns work better in pure Python
@Source: Based on framework design patterns and ecosystem analysis
## Decision Matrix
| Requirement | Robot Framework | pytest | Recommendation |
| ---------------------- | --------------- | ------ | -------------------------------------------------- |
| Acceptance testing | ★★★★★ | ★★☆☆☆ | Robot Framework |
| Unit testing | ★☆☆☆☆ | ★★★★★ | pytest |
| API testing | ★★★★☆ | ★★★★☆ | Either (RF for acceptance, pytest for integration) |
| Web UI testing | ★★★★★ | ★★★☆☆ | Robot Framework |
| Non-programmer writers | ★★★★★ | ★☆☆☆☆ | Robot Framework |
| TDD cycles | ★★☆☆☆ | ★★★★★ | pytest |
| Living documentation | ★★★★★ | ★★☆☆☆ | Robot Framework |
| Python developers only | ★★☆☆☆ | ★★★★★ | pytest |
| BDD/Gherkin style | ★★★★☆ | ★★★★☆ | Either (RF native, pytest with behave) |
| RPA/automation | ★★★★★ | ★★☆☆☆ | Robot Framework |
## Core Concepts
### Keyword-Driven Testing Approach
Robot Framework tests are built from keywords - reusable test steps that can be combined to create test cases. Keywords can be:
- Built-in keywords from Robot Framework core
- Library keywords from external libraries (SeleniumLibrary, RequestsLibrary, etc.)
- User-defined keywords created in test files or resource files
### Test Case Syntax
```robotframework
*** Settings ***
Documentation Example test suite showing Robot Framework syntax
Library SeleniumLibrary
Library RequestsLibrary
Resource common_keywords.resource
*** Variables ***
${LOGIN_URL} http://localhost:8080/login
${BROWSER} Chrome
${API_URL} http://localhost:8080/api
*** Test Cases ***
Valid User Login
[Documentation] Test successful login with valid credentials
[Tags] smoke login
Open Browser To Login Page
Input Username demo
Input Password mode
Submit Credentials
Welcome Page Should Be Open
[Teardown] Close Browser
API Health Check
[Documentation] Verify API is responding
${response}= GET ${API_URL}/health
Status Should Be 200
Should Be Equal As Strings ${response.json()}[status] healthy
*** Keywords ***
Open Browser To Login Page
Open Browser ${LOGIN_URL} ${BROWSER}
Title Should Be Login Page
Input Username
[Arguments] ${username}
Input Text username_field ${username}
Input Password
[Arguments] ${password}
Input Text password_field ${password}
Submit Credentials
Click Button login_button
Welcome Page Should Be Open
Title Should Be Welcome Page
```
@Source: <https://github.com/robotframework/SeleniumLibrary/blob/master/README.rst> @Source: <https://github.com/robotframework/robotframework> (User Guide examples)
## Real-World Usage Patterns
### Pattern 1: Web Testing with SeleniumLibrary
SeleniumLibrary is the most popular Robot Framework library for web testing, supporting Selenium 4 and Python 3.8-3.13.
```robotframework
*** Settings ***
Library SeleniumLibrary
*** Test Cases ***
Search Product
Open Browser https://example.com Chrome
Input Text id:search-input laptop
Click Button id:search-button
Page Should Contain Search Results
Close Browser
```
**Example repositories:**
- <https://github.com/robotframework/SeleniumLibrary> (1,450+ stars)
- <https://github.com/robotframework/WebDemo> (demo project)
@Source: <https://github.com/robotframework/SeleniumLibrary>
### Pattern 2: Modern Browser Testing with Browser Library
Browser library (powered by Playwright) is the next-generation web testing library, offering better performance and reliability.
```robotframework
*** Settings ***
Library Browser
*** Test Cases ***
Fast Modern Web Test
New Browser chromium headless=False
New Page https://example.com
Type Text id=search robot framework
Click button#submit
Get Text h1 == Results
Close Browser
```
**Example repository:**
- <https://github.com/MarketSquare/robotframework-browser> (605+ stars)
@Source: <https://github.com/MarketSquare/robotframework-browser>
### Pattern 3: API Testing with RequestsLibrary
RequestsLibrary wraps the Python requests library for API testing.
```robotframework
*** Settings ***
Library RequestsLibrary
*** Test Cases ***
GET Request Test
${response}= GET https://jsonplaceholder.typicode.com/posts/1
Should Be Equal As Strings 1 ${response.json()}[id]
Status Should Be 200
POST Request Test
&{data}= Create Dictionary title=Test body=Content userId=1
${response}= POST https://jsonplaceholder.typicode.com/posts
... json=${data}
Status Should Be 201
```
**Example repository:**
- <https://github.com/MarketSquare/robotframework-requests> (506+ stars)
@Source: <https://github.com/MarketSquare/robotframework-requests/blob/master/README.md>
### Pattern 4: Data-Driven Testing
The data-driven approach excels when the same workflow needs to be executed with different inputs.
```robotframework
*** Settings ***
Test Template     Calculate
*** Test Cases ***    Expression    Expected
Addition              12 + 2 + 2    16
                      2 + -3        -1
Subtraction           12 - 2 - 2    8
                      2 - -3        5
Multiplication        12 * 2 * 2    48
Division              12 / 2 / 2    3
*** Keywords ***
Calculate
    [Arguments]    ${expression}    ${expected}
    ${result}=    Evaluate    ${expression}
    Should Be Equal As Numbers    ${result}    ${expected}
```
@Source: <https://github.com/robotframework/RobotDemo/blob/master/data_driven.robot>
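For contrast with the pytest column in the decision matrix above, the same calculator checks expressed as a parametrized pytest test (a sketch, not part of the RobotDemo project):
```python
import pytest


@pytest.mark.parametrize(
    ("expression", "expected"),
    [
        ("12 + 2 + 2", 16),
        ("2 + -3", -1),
        ("12 - 2 - 2", 8),
        ("12 * 2 * 2", 48),
        ("12 / 2 / 2", 3),
    ],
)
def test_calculate(expression: str, expected: float) -> None:
    # Mirrors Robot Framework's Evaluate keyword for simple arithmetic expressions.
    assert eval(expression) == expected
```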
### Pattern 5: BDD/Gherkin Style
Robot Framework supports Given-When-Then syntax for behavior-driven development.
```robotframework
*** Test Cases ***
User Can Purchase Product
Given user is logged in
When user adds product to cart
And user proceeds to checkout
Then order should be confirmed
*** Keywords ***
User Is Logged In
Open Browser To Login Page
Login With Valid Credentials
User Adds Product To Cart
Search For Product laptop
Add First Result To Cart
User Proceeds To Checkout
Click Cart Icon
Click Checkout Button
Order Should Be Confirmed
Page Should Contain Order Confirmed
```
@Source: Robot Framework User Guide (Gherkin style examples)
## Integration Patterns
### SeleniumLibrary (Web Testing)
```bash
pip install robotframework-seleniumlibrary
```
- Most mature web testing library
- Supports Selenium 4
- Selenium Manager handles browser drivers automatically
- Python 3.8-3.13 compatible
@Source: <https://github.com/robotframework/SeleniumLibrary>
### Browser Library (Modern Web Testing)
```bash
pip install robotframework-browser
rfbrowser init # Install Playwright browsers
```
- Powered by Playwright
- Better performance and reliability than Selenium
- Built-in waiting and auto-retry mechanisms
- Supports modern browser features
@Source: <https://github.com/MarketSquare/robotframework-browser>
### RequestsLibrary (API Testing)
```bash
pip install robotframework-requests
```
- Wraps Python requests library
- RESTful API testing
- OAuth and authentication support
- JSON/XML response validation
@Source: <https://github.com/MarketSquare/robotframework-requests>
### SSHLibrary (Remote Testing)
```bash
pip install robotframework-sshlibrary
```
- SSH and SFTP operations
- Remote command execution
- File transfer capabilities
- Terminal emulation
@Source: <https://github.com/MarketSquare/SSHLibrary>
### AppiumLibrary (Mobile Testing)
```bash
pip install robotframework-appiumlibrary
```
- Mobile app testing (iOS/Android)
- Built on Appium
- Cross-platform mobile automation
@Source: <https://github.com/serhatbolsu/robotframework-appiumlibrary>
## Custom Keyword Libraries
Robot Framework can be extended with Python libraries:
```python
# MyLibrary.py
class MyLibrary:
"""Custom keyword library for Robot Framework."""
def __init__(self, host, port=80):
"""Library initialization with arguments."""
self.host = host
self.port = port
def connect_to_service(self):
"""Keyword: Connect To Service
Establishes connection to the configured service.
"""
# Implementation
pass
def send_message(self, message):
"""Keyword: Send Message
Sends a message to the service.
Arguments:
message: The message to send
"""
# Implementation
pass
```
Usage in test:
```robotframework
*** Settings ***
Library MyLibrary localhost 8080
*** Test Cases ***
Send Test Message
Connect To Service
Send Message Hello, Robot Framework!
```
@Source: <https://github.com/robotframework/robotframework> (User Guide - Creating Libraries)
## Execution and Reporting
### Basic Execution
```bash
# Run all tests in a file
robot tests.robot
# Run tests in a directory
robot path/to/tests/
# Run with specific browser
robot --variable BROWSER:Firefox tests.robot
# Run tests with specific tags
robot --include smoke tests/
# Run and generate custom output directory
robot --outputdir results tests.robot
# Run with Python module syntax
python -m robot tests.robot
```
### Advanced Execution
```bash
# Parallel execution (with pabot)
pip install robotframework-pabot
pabot --processes 4 tests/
# Re-run failed tests
robot --rerunfailed output.xml tests.robot
# Combine multiple test results
rebot --name Combined output1.xml output2.xml
```
@Source: <https://github.com/robotframework/robotframework/blob/master/doc/userguide/src/ExecutingTestCases/BasicUsage.rst>
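The same runs can also be driven from Python via the documented `robot.run` entry point, which is convenient inside CI wrappers (a sketch; paths and tags are illustrative):
```python
from robot import run

# Equivalent to: robot --outputdir results --include smoke tests/
rc = run("tests/", outputdir="results", include=["smoke"])
raise SystemExit(rc)  # non-zero return code means failed tests or an execution error
```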
## Ecosystem Tools
### RIDE (Test Editor)
Desktop IDE for creating and editing Robot Framework tests. Supports Python 3.8-3.13.
```bash
pip install robotframework-ride
ride.py
```
@Source: <https://github.com/robotframework/RIDE>
### RobotCode (VS Code Extension)
LSP-powered VS Code extension for Robot Framework development.
- Syntax highlighting and code completion
- Debugging support
- Test execution from IDE
- Keyword documentation
@Source: <https://github.com/robotcodedev/robotcode>
### Robocop (Linter)
Static code analysis and linting tool for Robot Framework.
```bash
pip install robotframework-robocop
robocop tests/
```
@Source: <https://github.com/MarketSquare/robotframework-robocop>
### Tidy (Code Formatter)
Code formatting tool for Robot Framework files.
```bash
pip install robotframework-tidy
robotidy tests/
```
@Source: <https://github.com/MarketSquare/robotframework-tidy> (referenced in ecosystem)
## Maintenance Status
**Status:** Actively maintained
- Latest stable: 7.3.2 (July 2025)
- Latest pre-release: 7.4b1 (October 2025)
- Active development on GitHub (11,000+ stars, 2,400+ forks)
- Non-profit Robot Framework Foundation provides governance
- Regular releases (multiple per year)
- Strong community support (Slack, Forum, GitHub)
**Project Health Indicators:**
- 269 open issues (October 2025)
- Active commit history
- Responsive maintainers
- Large ecosystem of maintained libraries
- Corporate backing and foundation support
@Source: <https://github.com/robotframework/robotframework> @Source: <https://pypi.org/project/robotframework/>
## Comparison with Alternatives
### vs pytest
**Choose Robot Framework:**
- Acceptance testing focus
- Non-programmers write tests
- Keyword-driven approach preferred
- Cross-technology testing (web, API, desktop, mobile)
- Living documentation requirement
**Choose pytest:**
- Unit testing focus
- Python developers only
- Complex test logic in Python
- Rapid TDD cycles
- Python-native fixtures and parametrization
### vs Behave (Python BDD)
**Choose Robot Framework:**
- Broader scope (not just BDD)
- Rich ecosystem of libraries
- Keyword reusability across projects
- Built-in reporting and logging
**Choose Behave:**
- Pure BDD/Gherkin focus
- Step definitions in Python
- Integration with pytest
### vs Cucumber (JVM BDD)
**Choose Robot Framework:**
- Python ecosystem
- RPA capabilities
- Broader than just BDD
**Choose Cucumber:**
- JVM ecosystem (Java, Kotlin, Scala)
- Pure Gherkin syntax
- Enterprise Java integration
## Example Projects
1. **RobotDemo** - Official demo project
- <https://github.com/robotframework/RobotDemo>
- Shows keyword-driven, data-driven, and Gherkin styles
- Calculator library implementation example
2. **WebDemo** - Web testing demo
- Referenced in SeleniumLibrary docs
- Complete login test example with page objects
3. **awesome-robotframework** - Curated resources
- <https://github.com/MarketSquare/awesome-robotframework>
- Libraries, tools, and example projects
- Community contributions
@Source: <https://github.com/robotframework/RobotDemo> @Source: <https://github.com/MarketSquare/awesome-robotframework>
## Summary
Robot Framework is the premier choice for acceptance testing and RPA in the Python ecosystem. Its keyword-driven approach enables collaboration between technical and non-technical team members, making it ideal for projects where tests serve as living documentation. The framework excels at cross-technology testing (web, API, mobile, desktop) through its rich ecosystem of libraries.
However, it is not a replacement for pytest in unit testing scenarios. Teams should use Robot Framework for acceptance-level tests and pytest for unit/integration tests. The frameworks complement each other well in a comprehensive testing strategy.
**Quick decision guide:**
- Need stakeholder-readable tests? → Robot Framework
- Need unit tests? → pytest
- Need both? → Use both frameworks together
- Pure Python developers doing integration tests? → Consider pytest first
- QA team without coding experience? → Robot Framework
The framework's active maintenance, strong community, and foundation backing ensure long-term viability for projects adopting it.

View File

@@ -0,0 +1,467 @@
---
title: "shiv: Python Zipapp Builder for Self-Contained Applications"
library_name: shiv
pypi_package: shiv
category: packaging-distribution
python_compatibility: "3.8+"
last_updated: "2025-11-02"
official_docs: "https://shiv.readthedocs.io"
official_repository: "https://github.com/linkedin/shiv"
maintenance_status: "active"
---
# shiv
## Overview
shiv is a command-line utility for building fully self-contained Python zipapps as outlined in PEP 441, but with all their dependencies included. It is developed and maintained by LinkedIn and provides a fast, easy way to distribute Python applications.
**Official Repository**: @<https://github.com/linkedin/shiv> **Official Documentation**: @<https://shiv.readthedocs.io/en/latest/> **PyPI Package**: @<https://pypi.org/project/shiv/>
## Core Purpose
### Problem Statement
shiv solves the challenge of distributing Python applications with all their dependencies bundled into a single executable file without requiring complex build processes or compilation.
**What problems does shiv solve?**
1. **Dependency bundling**: Packages your application and all its dependencies into a single `.pyz` file
2. **Simple distribution**: Creates executable files that can be shared and run on systems with compatible Python installations
3. **No compilation required**: Unlike PyInstaller or cx_Freeze, shiv does not compile Python code to binaries
4. **Fast deployment**: Built on Python's standard library zipapp module (PEP 441) for minimal overhead
5. **Reproducible builds**: Creates deterministic outputs for version control and deployment
**When you would be "reinventing the wheel" without shiv:**
- Building custom scripts to bundle dependencies with applications
- Manually creating zipapp structures with dependencies
- Writing deployment automation for Python CLI tools
- Managing virtual environments on deployment targets
## When to Use shiv vs Alternatives
### Use shiv When
- Deploying Python applications to controlled environments where Python is already installed
- Building CLI tools for internal distribution within organizations
- Creating portable Python applications for Linux/macOS/WSL environments
- You need fast build times and simple deployment workflows
- Your application is pure Python or has platform-specific compiled dependencies that can be installed per-platform
- You want to leverage the PEP 441 zipapp standard
### Use PyInstaller/cx_Freeze When
- Distributing to end-users who do not have Python installed
- Creating true standalone executables with embedded Python interpreter
- Targeting Windows environments without Python installations
- Building GUI applications for general consumer distribution
- You need absolute portability without Python runtime dependencies
### Use wheel/sdist When
- Publishing libraries to PyPI
- Developing packages meant to be installed via pip
- Creating reusable components rather than standalone applications
- Working in environments where pip/package managers are the standard
## Decision Matrix
```text
┌─────────────────────────┬──────────┬─────────────┬───────────┐
│ Requirement │ shiv │ PyInstaller │ wheel │
├─────────────────────────┼──────────┼─────────────┼───────────┤
│ Python required │ Yes │ No │ Yes │
│ Build speed │ Fast │ Slow │ Fast │
│ Bundle size │ Small │ Large │ Smallest │
│ Cross-platform binary │ No │ Yes │ No │
│ PEP 441 compliant │ Yes │ No │ N/A │
│ Installation required │ No │ No │ Yes (pip) │
│ C extension support │ Limited* │ Full │ Full │
└─────────────────────────┴──────────┴─────────────┴───────────┘
* C extensions work but are platform-specific (not cross-compatible)
```
## Python Version Compatibility
- **Minimum Python version**: 3.8 (per setup.cfg @<https://github.com/linkedin/shiv/blob/main/setup.cfg>)
- **Tested versions**: 3.8, 3.9, 3.10, 3.11
- **Python 3.11+ compatibility**: Fully compatible
- **Python 3.12-3.14 status**: Expected to work (relies on standard library zipapp module)
- **PEP 441 dependency**: Requires Python 3.5+ (PEP 441 introduced in Python 3.5)
### Installation
```bash
# From PyPI
pip install shiv
# From source
git clone https://github.com/linkedin/shiv.git
cd shiv
python3 -m pip install -e .
```
## Core Concepts
### PEP 441 zipapp Integration
shiv builds on Python's standard library `zipapp` module (PEP 441) which allows creating executable ZIP files. The key enhancement is automatic dependency installation and bundling.
**How it works** (compare the stdlib sketch after this list):
1. Creates a temporary directory structure
2. Installs specified packages and dependencies using pip
3. Packages everything into a ZIP file
4. Adds a shebang line to make it executable
5. Extracts dependencies to `~/.shiv/` cache on first run
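For comparison with the steps above, the stdlib building block that shiv extends looks roughly like this (a sketch using only `pip` and `zipapp`; it bundles dependencies manually, which is exactly the work shiv automates — paths and the entry point are illustrative):
```python
import subprocess
import zipapp

# 1. Install the application and its dependencies into a build directory.
subprocess.run(
    ["python", "-m", "pip", "install", "--target", "build/myapp", "."],
    check=True,
)

# 2. Package the directory into an executable PEP 441 archive with a shebang.
zipapp.create_archive(
    "build/myapp",
    target="myapp.pyz",
    interpreter="/usr/bin/env python3",
    main="myapp.cli:main",  # illustrative entry point
)
```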
### Deployment Patterns
**Single-file distribution:**
```bash
# Build once
shiv -c myapp -o myapp.pyz myapp
# Distribute myapp.pyz
# Users run: ./myapp.pyz
```
**Library bundling:**
```bash
# Bundle multiple packages
shiv -o toolkit.pyz requests click pyyaml
```
**From requirements.txt:**
```bash
shiv -r requirements.txt -o app.pyz -c app
```
## Usage Examples
### Basic Command-Line Tool
Create a standalone executable of flake8:
```bash
shiv -c flake8 -o ~/bin/flake8 flake8
```
**Explanation:**
- `-c flake8`: Specifies the console script entry point
- `-o ~/bin/flake8`: Output file location
- `flake8`: Package to install from PyPI
**Running:**
```bash
~/bin/flake8 --version
# Output: 3.7.8 (mccabe: 0.6.1, pycodestyle: 2.5.0, pyflakes: 2.1.1)
```
### Interactive Python Environment
Create an interactive executable with libraries:
```bash
shiv -o boto.pyz boto
```
**Running:**
```bash
./boto.pyz
# Opens Python REPL with boto available
>>> import boto
>>> boto.__version__
'2.49.0'
```
### Real-World Example: CLI Application Distribution
From @<https://github.com/scs/smartmeter-datacollector/blob/master/README.md>:
```bash
# Build a self-contained zipapp using shiv
poetry run poe build_shiv
```
This creates a `.pyz` file containing the smartmeter-datacollector application and all dependencies, distributable as a single file.
### Custom Python Interpreter Path
```bash
shiv -c myapp -o myapp.pyz -p "/usr/bin/env python3" myapp
```
The `-p` flag specifies the shebang line for the executable.
### Building from Local Package
```bash
# From current directory with setup.py or pyproject.toml
shiv -c myapp -o myapp.pyz .
```
### Advanced: Building shiv with shiv
From @<https://github.com/linkedin/shiv/blob/main/README.md>:
```bash
python3 -m venv .
source bin/activate
pip install shiv
shiv -c shiv -o shiv shiv
```
This creates a self-contained shiv executable using shiv itself, demonstrating bootstrapping capability.
## Integration Patterns
### CI/CD Pipeline Integration
```yaml
# Example GitHub Actions workflow
- name: Build application zipapp
run: |
pip install shiv
shiv -c myapp -o dist/myapp.pyz myapp
- name: Upload artifact
uses: actions/upload-artifact@v3
with:
name: myapp-zipapp
path: dist/myapp.pyz
```
### Makefile Integration
From @<https://github.com/JanssenProject/jans/blob/main/jans-cli-tui/Makefile>:
```makefile
zipapp:
@echo "Building zipapp with shiv"
shiv -c jans_cli_tui -o jans_cli_tui.pyz .
```
### Poetry Integration
In `pyproject.toml`:
```toml
[tool.poe.tasks]
build_shiv = "shiv -c myapp -o dist/myapp.pyz ."
```
Run with: `poetry run poe build_shiv`
## Platform-Specific Considerations
### Linux/macOS
- **Shebang support**: Full support for `#!/usr/bin/env python3`
- **Permissions**: Requires `chmod +x` for executable files
- **Cache location**: `~/.shiv/` for dependency extraction
### Windows
- **Shebang limitations**: Windows does not natively support shebangs
- **Execution**: Must run as `python myapp.pyz`
- **Alternative**: Use Python launcher: `py myapp.pyz`
- **Cache location**: `%USERPROFILE%\.shiv\`
### Cross-Platform Gotchas
**From @<https://github.com/linkedin/shiv/blob/main/README.md>:**
> Zipapps created with shiv are not guaranteed to be cross-compatible with other architectures. For example, a pyz file built on a Mac may only work on other Macs, likewise for RHEL, etc. This usually only applies to zipapps that have C extensions in their dependencies. If all your dependencies are pure Python, then chances are the pyz will work on other platforms.
**Recommendation**: Build platform-specific executables for production deployments when using packages with C extensions.
## Cache Management
shiv extracts dependencies to `~/.shiv/` (or `SHIV_ROOT`) on first run. This directory can grow over time.
**Cleanup:**
```bash
# Remove all cached extractions
rm -rf ~/.shiv/
# Set custom cache location
export SHIV_ROOT=/tmp/shiv_cache
./myapp.pyz
```
## When NOT to Use shiv
### Scenarios Where Alternatives Are Better
1. **Windows-only distribution without Python**: Use PyInstaller or cx_Freeze for embedded interpreter
2. **End-user applications**: Users expect double-click executables, not Python scripts
3. **Cross-platform binaries from single build**: shiv requires platform-specific builds for C extensions
4. **Library distribution**: Use wheel/sdist and publish to PyPI
5. **Complex GUI applications**: PyInstaller has better support for frameworks like PyQt/Tkinter
6. **Environments without Python**: shiv requires a compatible Python installation on the target system
## Common Use Cases
### Internal Tool Distribution
**Example**: DevOps teams distributing CLI tools
```bash
# Build deployment tool
shiv -c deploy -o deploy.pyz deploy-tool
# Distribute to team members
# Everyone runs: ./deploy.pyz --environment prod
```
### Lambda/Cloud Function Packaging
While AWS Lambda has native Python support, shiv can simplify dependency management:
```bash
shiv -o lambda_function.pyz --no-binary :all: boto3 requests
```
### Portable Development Environments
Create portable toolchains:
```bash
# Bundle linting tools
shiv -o lint.pyz black pylint mypy flake8
# Bundle testing tools
shiv -o test.pyz pytest pytest-cov hypothesis
```
## Real-World Projects Using shiv
Based on GitHub search results (@<https://github.com/search?q=shiv+zipapp>):
1. **JanssenProject/jans** - IAM authentication server
- Uses shiv to build CLI and TUI applications
- Makefile integration for zipapp builds
- @<https://github.com/JanssenProject/jans>
2. **scs/smartmeter-datacollector** - Smart meter data collection
- Poetry integration with custom build command
- Self-contained distribution for Raspberry Pi
- @<https://github.com/scs/smartmeter-datacollector>
3. **praetorian-inc/noseyparker-explorer** - Security scanning results explorer
- TUI application distributed via shiv
- @<https://github.com/praetorian-inc/noseyparker-explorer>
4. **ClericPy/zipapps** - Alternative zipapp builder
- Built as comparison/alternative to shiv
- @<https://github.com/ClericPy/zipapps>
## Additional Resources
### Official Documentation
- PEP 441 - Improving Python ZIP Application Support: @<https://www.python.org/dev/peps/pep-0441/>
- Python zipapp module: @<https://docs.python.org/3/library/zipapp.html>
- shiv documentation: @<https://shiv.readthedocs.io/en/latest/>
- Lincoln Loop blog: "Dissecting a Python Zipapp Built with Shiv": @<https://lincolnloop.com/insights/dissecting-python-zipapp-built-shiv/>
### Community Resources
- Real Python tutorial: "Python's zipapp: Build Executable Zip Applications": @<https://realpython.com/python-zipapp/>
- jhermann blog: "Bundling Python Dependencies in a ZIP Archive": @<https://jhermann.github.io/blog/python/deployment/2020/03/08/ship_libs_with_shiv.html>
### Comparison Articles
- PyOxidizer comparisons (includes shiv): @<https://pyoxidizer.readthedocs.io/en/stable/pyoxidizer_comparisons.html>
- Hacker News discussion: @<https://news.ycombinator.com/item?id=26832809>
## Technical Implementation Details
### Dependencies
From @<https://github.com/linkedin/shiv/blob/main/setup.cfg>:
```ini
[options]
install_requires =
click>=6.7,!=7.0
pip>=9.0.3
setuptools
python_requires = >=3.8
```
shiv has minimal dependencies, relying primarily on standard library components plus click for CLI and pip for dependency resolution.
### Entry Points
shiv provides two console scripts:
1. `shiv`: Main build tool
2. `shiv-info`: Inspect zipapp metadata
### Build Backend
Uses setuptools with pyproject.toml (PEP 517/518 compliant):
```toml
[build-system]
requires = ["setuptools", "wheel"]
build-backend = "setuptools.build_meta"
```
## Maintenance and Support
- **License**: BSD 2-Clause License
- **Maintainer**: LinkedIn (@<https://github.com/linkedin>)
- **GitHub Stars**: 1,884+ (as of October 2025)
- **Active Development**: Yes (last updated October 2025)
- **Open Issues**: 66 (as of October 2025)
- **Community**: Active issue tracker and pull request reviews
## Security Considerations
1. **Code signing**: shiv does not sign executables; implement external signing if required
2. **Dependency verification**: shiv uses pip, which respects pip's security model
3. **Cache security**: `~/.shiv/` directory contains extracted dependencies; ensure proper permissions
4. **Supply chain**: Verify package sources before building zipapps
## Performance Characteristics
- **Build time**: Fast (seconds for typical applications)
- **Startup overhead**: First run extracts to cache (one-time cost), subsequent runs are instant
- **Runtime performance**: Native Python performance (no interpretation overhead)
- **File size**: Smaller than PyInstaller bundles (no embedded interpreter)
## Troubleshooting
### Common Issues
**Issue**: "shiv requires Python >= 3.8" **Solution**: Upgrade Python or use an older shiv version
**Issue**: "ImportError on different platform" **Solution**: Rebuild zipapp on target platform for C extension dependencies
**Issue**: "Permission denied" **Solution**: `chmod +x myapp.pyz`
**Issue**: "SHIV_ROOT fills up disk" **Solution**: Clean cache: `rm -rf ~/.shiv/` or set `SHIV_ROOT` to tmpfs
## Conclusion
shiv is an excellent choice for distributing Python applications in controlled environments where Python is available. It provides a simple, fast, and standards-based approach to application packaging without the complexity of binary compilation. For internal tools, CLI utilities, and cloud function packaging, shiv offers an ideal balance of simplicity and functionality.
**Quick decision guide:**
- Need standalone binary with no Python? Use PyInstaller/cx_Freeze
- Distributing library? Use wheel + PyPI
- Internal tool with Python available? Use shiv
- Cross-platform GUI app? Use PyInstaller
- Cloud function deployment? Consider shiv or native platform tools

View File

@@ -0,0 +1,486 @@
---
title: "uvloop: Ultra-Fast AsyncIO Event Loop"
library_name: uvloop
pypi_package: uvloop
category: async-io
python_compatibility: "3.8+"
last_updated: "2025-11-02"
official_docs: "https://uvloop.readthedocs.io"
official_repository: "https://github.com/MagicStack/uvloop"
maintenance_status: "active"
---
# uvloop: Ultra-Fast AsyncIO Event Loop
## Overview
uvloop is a drop-in replacement for Python's built-in asyncio event loop that delivers 2-4x performance improvements for network-intensive applications. Built on top of libuv (the same C library that powers Node.js) and implemented in Cython, uvloop enables Python asyncio code to approach the performance characteristics of compiled languages like Go.
## The Problem It Solves
### Without uvloop (Reinventing the Wheel)
Python's standard asyncio event loop, while functional, has performance limitations that become apparent in high-throughput scenarios:
1. **Pure Python implementation** with overhead from interpreter execution
2. **Slower I/O operations** compared to C-based event loops
3. **Limited networking throughput** for concurrent connections
4. **Higher CPU utilization** for equivalent workloads
Writing a custom event loop or using lower-level interfaces such as epoll directly adds complexity and defeats the purpose of asyncio's high-level abstractions.
### With uvloop (Best Practice)
uvloop provides a zero-code-change performance boost by simply replacing the event loop implementation:
- **2-4x faster** than standard asyncio @ [magic.io/blog/uvloop](https://magic.io/blog/uvloop-blazing-fast-python-networking/)
- **Drop-in replacement** requiring minimal code changes
- **Production-proven** in high-performance applications like Sanic, uvicorn, and vLLM
- **libuv foundation** providing battle-tested async I/O primitives
## Core Use Cases
### 1. High-Performance Web Servers
uvloop is the default event loop for production ASGI servers:
```python
# uvicorn with uvloop (automatic with standard install)
# @ https://github.com/encode/uvicorn
import uvloop
from fastapi import FastAPI
app = FastAPI()
@app.get("/")
async def root():
return {"message": "Hello World"}
# Run with uvloop
if __name__ == "__main__":
import uvicorn
uvicorn.run(app, host="0.0.0.0", port=8000, loop="uvloop")
```
### 2. WebSocket Servers
High-throughput WebSocket applications @ [sanic-org/sanic](https://github.com/sanic-org/sanic):
```python
import uvloop
from sanic import Sanic, response
app = Sanic("websocket_app")
@app.websocket("/feed")
async def feed(request, ws):
while True:
data = await ws.recv()
await ws.send(data)
if __name__ == "__main__":
app.run(host="0.0.0.0", port=8000)
```
### 3. Concurrent Network Clients
Web scraping and API clients @ [howie6879/ruia](https://github.com/howie6879/ruia):
```python
import asyncio
import uvloop
import aiohttp
async def fetch_many(urls):
async with aiohttp.ClientSession() as session:
tasks = [session.get(url) for url in urls]
responses = await asyncio.gather(*tasks)
return responses
# Use uvloop for 2-4x faster concurrent requests
uvloop.run(fetch_many(["https://example.com"] * 1000))
```
## Integration Patterns
### Pattern 1: Global Installation (Recommended for Python <3.11)
```python
import asyncio
import uvloop
# Install uvloop as default event loop policy
uvloop.install()
async def main():
# Your async code here
await asyncio.sleep(1)
# Now all asyncio.run() calls use uvloop
asyncio.run(main())
```
### Pattern 2: Direct Run (Preferred for Python >=3.11)
```python
import uvloop
async def main():
# Your async application entry point
pass
# Simplest usage - replaces asyncio.run()
# @ https://github.com/MagicStack/uvloop/blob/master/README.rst
uvloop.run(main())
```
### Pattern 3: Explicit Event Loop (Advanced)
```python
import asyncio
import sys
import uvloop
async def main():
# Application logic
pass
# Python 3.11+ with explicit loop factory
if sys.version_info >= (3, 11):
with asyncio.Runner(loop_factory=uvloop.new_event_loop) as runner:
runner.run(main())
else:
uvloop.install()
asyncio.run(main())
```
### Pattern 4: Platform-Specific Installation
```python
import asyncio
import os
# Only use uvloop on POSIX systems (Linux/macOS)
# @ https://github.com/wanZzz6/Modules-Learn
if os.name == 'posix':
import uvloop
asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
# Windows will use default asyncio (proactor loop)
async def main():
pass
asyncio.run(main())
```
## Real-World Examples
### FastAPI/Uvicorn Production Setup
```python
# @ https://medium.com/israeli-tech-radar/so-you-think-python-is-slow-asyncio-vs-node-js-fe4c0083aee4
import asyncio
import uvloop
from fastapi import FastAPI
import uvicorn
# Enable uvloop globally
asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
app = FastAPI()
@app.get("/api/data")
async def handle_data():
# Simulate async database query
await asyncio.sleep(0.1)
return {"message": "Hello from Python"}
if __name__ == "__main__":
uvicorn.run(app, host="0.0.0.0", port=3000, loop="uvloop")
```
### Discord Bot (hikari-py)
```python
# @ https://github.com/hikari-py/hikari
import asyncio
import os
if os.name != "nt": # Not Windows
import uvloop
asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
# Discord bot code follows - automatic 2-4x performance boost
```
### Async Web Scraper
```python
# @ https://github.com/elliotgao2/gain
import asyncio
import uvloop
import aiohttp
async def handle_response(session, url):
async with session.get(url) as response:
return await response.text()
async def main():
async with aiohttp.ClientSession() as session:
tasks = [handle_response(session, f"https://api.example.com/item/{i}")
for i in range(1000)]
results = await asyncio.gather(*tasks)
return results
# Install and run
uvloop.install()
asyncio.run(main())
```
## Python Version Compatibility
| Python Version | uvloop Support | Notes |
| --- | --- | --- |
| 3.8-3.10 | ✅ Full | Use `uvloop.install()` or `asyncio.set_event_loop_policy()` |
| 3.11-3.13 | ✅ Full | Can use `uvloop.run()` or `asyncio.Runner(loop_factory=uvloop.new_event_loop)` |
| 3.14 | ✅ Full | Free-threading support added in v0.22.0 @ [#693](https://github.com/MagicStack/uvloop/pull/693) |
### Platform Support
- **Linux**: ✅ Full support (best performance)
- **macOS**: ✅ Full support
- **Windows**: ⚠️ Not supported (use default asyncio proactor loop)
- **BSD**: ✅ Supported (via libuv)
## Performance Benchmarks
### Official Benchmarks
From @ [magic.io/blog/uvloop](https://magic.io/blog/uvloop-blazing-fast-python-networking/):
**Echo Server Performance (1 KiB messages):**
- uvloop: 105,000 req/sec
- Node.js: ~50,000 req/sec
- Standard asyncio: ~30,000 req/sec
**Throughput (100 KiB messages):**
- uvloop: 2.3 GiB/s
- Standard asyncio: 0.8 GiB/s
### Community Benchmarks (2024-2025)
@ [discuss.python.org](https://discuss.python.org/t/is-uvloop-still-faster-than-built-in-asyncio-event-loop/71136):
- **I/O-bound operations**: Python + uvloop is ~22% faster than Node.js
- **Native epoll comparison**: uvloop reaches 88% performance of native C epoll implementation
- **Overall speedup**: 2-4x faster than standard asyncio across workloads
## When NOT to Use uvloop
### 1. Windows-Only Applications
```python
# BAD: uvloop doesn't work on Windows
import uvloop
uvloop.install() # Will fail on Windows
# GOOD: Platform detection
import os
if os.name == 'posix':
import uvloop
uvloop.install()
```
### 2. CPU-Bound Tasks
uvloop optimizes I/O operations but won't speed up CPU-intensive work:
```python
# uvloop provides NO benefit here
async def cpu_intensive():
result = sum(i**2 for i in range(10_000_000))
return result
# Use multiprocessing instead for CPU-bound work
```
### 3. Debugging AsyncIO Code
The default asyncio loop has better debugging support:
```python
# For debugging, use standard asyncio with debug mode
import asyncio
# Don't install uvloop during development/debugging
asyncio.run(main(), debug=True) # Better error messages with standard loop
```
### 4. Simple Scripts with Minimal I/O
```python
# Overkill for trivial async work
async def simple_task():
await asyncio.sleep(1)
print("Done")
# uvloop adds minimal value here - overhead not justified
```
## Decision Matrix
### Use uvloop when
- ✅ Building production web servers (FastAPI, Sanic, etc.)
- ✅ High-throughput network applications
- ✅ WebSocket servers with many concurrent connections
- ✅ Async web scrapers/crawlers
- ✅ Running on Linux or macOS
- ✅ I/O-bound workloads dominate
- ✅ Zero-code-change performance boost desired
### Use default asyncio when
- ❌ Running on Windows
- ❌ Debugging complex async code
- ❌ CPU-bound workloads
- ❌ Simple scripts with minimal networking
- ❌ Maximum compatibility needed
- ❌ Educational/learning purposes (asyncio is simpler)
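The two lists above can be collapsed into a small guard that prefers uvloop where it is supported and silently falls back to standard asyncio otherwise (a sketch):
```python
import asyncio
import sys


def run(coro):
    """Run a coroutine with uvloop when available (POSIX), else standard asyncio."""
    if sys.platform != "win32":
        try:
            import uvloop
        except ImportError:
            pass  # uvloop not installed; fall back to asyncio
        else:
            return uvloop.run(coro)
    return asyncio.run(coro)


async def main() -> None:
    await asyncio.sleep(0)


if __name__ == "__main__":
    run(main())
```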
## Installation
### Basic Installation
```bash
pip install uvloop
```
### With uvicorn (ASGI server)
```bash
# uvloop automatically included with standard install
pip install 'uvicorn[standard]'
```
### Development/Source Build
```bash
# Requires Cython
pip install Cython
git clone --recursive https://github.com/MagicStack/uvloop.git
cd uvloop
pip install -e .[dev]
make
make test
```
## Integration with Common Frameworks
### FastAPI/Uvicorn
uvloop is automatically used when uvicorn is installed with `[standard]` extras:
```bash
pip install 'uvicorn[standard]' # Includes uvloop
```
### Sanic
Sanic automatically detects and uses uvloop if available:
```bash
pip install sanic uvloop
```
### aiohttp + gunicorn
```bash
# Use uvloop worker class
gunicorn app:create_app --worker-class aiohttp.worker.GunicornUVLoopWebWorker
```
### Tornado
```python
from tornado.platform.asyncio import AsyncIOMainLoop
import asyncio
import uvloop
asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
AsyncIOMainLoop().install()
```
## Common Pitfalls
### Pitfall 1: Installing After Event Loop Created
```python
# BAD: Event loop already created
import asyncio
loop = asyncio.get_event_loop() # Creates default loop
import uvloop
uvloop.install() # Too late!
# GOOD: Install before any event loop operations
import uvloop
uvloop.install()
import asyncio
loop = asyncio.get_event_loop() # Now uses uvloop
```
### Pitfall 2: Windows Compatibility Assumptions
```python
# BAD: Crashes on Windows
import uvloop
uvloop.install()
# GOOD: Platform check
import sys
if sys.platform != 'win32':
import uvloop
uvloop.install()
```
### Pitfall 3: Expecting CPU Performance Gains
```python
# BAD: uvloop won't help CPU-bound code
async def calculate_primes(n):
return [i for i in range(2, n) if all(i % j != 0 for j in range(2, i))]
# uvloop provides NO benefit for pure computation
```
## Maintenance and Ecosystem
- **Active Development**: ✅ Maintained by MagicStack (creators of EdgeDB)
- **Release Cadence**: Regular updates (v0.22.1 released Oct 2025)
- **Community Size**: 10,000+ stars on GitHub, used in production by major projects
- **Dependency**: libuv (bundled, no external dependency management)
- **Python 3.14 Support**: ✅ Free-threading support added
## Related Libraries
- **httptools**: Fast HTTP parser (also by MagicStack, pairs with uvloop)
- **uvicorn**: ASGI server using uvloop by default
- **aiohttp**: Async HTTP client/server framework
- **websockets**: WebSocket library compatible with uvloop
- **Sanic**: Web framework optimized for uvloop
## References
- Official Repository: @ [MagicStack/uvloop](https://github.com/MagicStack/uvloop)
- Documentation: @ [uvloop.readthedocs.io](https://uvloop.readthedocs.io/)
- Original Blog Post: @ [magic.io/blog/uvloop](https://magic.io/blog/uvloop-blazing-fast-python-networking/)
- PyPI: @ [pypi.org/project/uvloop](https://pypi.org/project/uvloop/)
- Performance Discussion (2024): @ [discuss.python.org](https://discuss.python.org/t/is-uvloop-still-faster-than-built-in-asyncio-event-loop/71136)
- uvicorn Integration: @ [encode/uvicorn](https://github.com/encode/uvicorn)
- Sanic Framework: @ [sanic-org/sanic](https://github.com/sanic-org/sanic)
## Summary
uvloop represents the gold standard for asyncio performance optimization in Python. It requires minimal code changes (often just 2 lines) while delivering 2-4x performance improvements for I/O-bound async applications. Production deployments should default to uvloop on Linux/macOS systems unless specific compatibility or debugging requirements dictate otherwise. The library's maturity, active maintenance, and widespread adoption in high-performance Python web frameworks make it a critical component of the modern Python async ecosystem.

View File

@@ -0,0 +1,636 @@
---
title: "Python Development Orchestration Guide"
description: "Guide for orchestrating Python development tasks using specialized agents and commands"
version: "1.0.0"
last_updated: "2025-11-02"
document_type: "guide"
python_compatibility: "3.11+"
related_docs:
- "../SKILL.md"
- "./modern-modules.md"
- "./tool-library-registry.md"
---
# Python Development Orchestration Guide
Comprehensive guide for orchestrating Python development tasks using specialized agents and commands. This guide provides detailed workflows and patterns for coordinating multiple agents to accomplish complex Python development goals.
**Quick Reference**: For a concise overview and quick-start examples, see [SKILL.md](../SKILL.md).
## Available Agents and Commands
### Agents (in ~/.claude/agents/)
- **python-cli-architect**: Build modern CLI applications with Typer and Rich
- **python-portable-script**: Create stdlib-only portable scripts
- **python-pytest-architect**: Design comprehensive test suites
- **python-code-reviewer**: Review Python code for quality and standards
- **spec-architect**: Design system architecture
- **spec-planner**: Break down tasks into implementation plans
### Commands (in this skill: references/commands/)
- **/modernpython**: Apply Python 3.11+ best practices and modern patterns
- **/shebangpython**: Validate PEP 723 shebang compliance
### External Skills
- **uv**: Package management with uv (always use for Python dependency management)
## Core Workflow Patterns
### 1. TDD Workflow (Test-Driven Development)
**When to use**: Building new features, fixing bugs with test coverage
**Pattern**:
```text
1. Design → @agent-spec-architect
Input: Feature requirements
Output: Architecture design, component interfaces
2. Write Tests → @agent-python-pytest-architect
Input: Architecture design, expected behavior
Output: Complete test suite (fails initially)
3. Implement → @agent-python-cli-architect OR @agent-python-portable-script
Input: Tests, architecture design
Output: Implementation that makes tests pass
4. Review → @agent-python-code-reviewer
Input: Implementation + tests
Output: Review feedback, improvement suggestions
5. Validate (follow Linting Discovery Protocol)
- If .pre-commit-config.yaml exists: `uv run pre-commit run --files <files>`
- Else: Format → Lint → Type check → Test in sequence
- Apply: /modernpython to check modern patterns
- Verify: CI compatibility by checking .gitlab-ci.yml or .github/workflows/
```
**Example**:
```text
User: "Build a CLI tool to process CSV files with progress bars"
Step 1: @agent-spec-architect
"Design architecture for CSV processing CLI with progress tracking"
→ Architecture design with components
Step 2: @agent-python-pytest-architect
"Create test suite for CSV processor based on this architecture"
→ Test files in tests/
Step 3: @agent-python-cli-architect
"Implement CSV processor CLI with Typer+Rich based on these tests"
→ Implementation in packages/
Step 4: @agent-python-code-reviewer
"Review this implementation against the architecture and test requirements"
→ Review findings, suggested improvements
Step 5: Validate
→ All tests pass, coverage >80%, linting clean
```
### 2. Feature Addition Workflow
**When to use**: Adding new functionality to existing codebase
**Pattern**:
```text
1. Requirements → User or @agent-spec-analyst
Output: Clear requirements, acceptance criteria
2. Architecture → @agent-spec-architect
Input: Requirements, existing codebase structure
Output: Design that integrates with existing code
3. Implementation Plan → @agent-spec-planner
Input: Architecture design
Output: Step-by-step implementation tasks
4. Implement → @agent-python-cli-architect OR @agent-python-portable-script
Input: Implementation plan, existing code patterns
Output: New feature implementation
5. Testing → @agent-python-pytest-architect
Input: Implementation, edge cases
Output: Tests for new feature + integration tests
6. Review → @agent-python-code-reviewer
Input: All changes (implementation + tests)
Output: Quality assessment, improvements
7. Validate
- Check: No regressions in existing tests
- Verify: New feature has >80% coverage
- Apply: /modernpython for consistency
```
### 3. Code Review Workflow
**When to use**: Before merging changes, during PR review
**Pattern**:
```text
1. Self-Review → Apply /modernpython
Check: Modern Python patterns used
Check: No legacy typing imports
2. Standards Validation → Apply /shebangpython (if scripts)
Check: PEP 723 compliance
Check: Correct shebang format
3. Agent Review → @agent-python-code-reviewer
Input: All changed files
Output: Comprehensive review findings
4. Fix Issues → Appropriate agent
Input: Review findings
Output: Corrections
5. Re-validate
- Run: uv run pre-commit run --all-files
- Run: uv run pytest
- Verify: All review issues addressed
```
### 4. Refactoring Workflow
**When to use**: Improving code structure without changing behavior
**Pattern**:
```text
1. Tests First → Verify existing test coverage
Check: Tests exist for code being refactored
Check: Tests pass before refactoring
If missing: @agent-python-pytest-architect creates tests
2. Refactor → @agent-python-cli-architect or @agent-python-portable-script
Input: Code to refactor + test suite
Constraint: Must not break existing tests
Output: Refactored code
3. Validate → Tests still pass
Run: uv run pytest
Verify: Coverage maintained or improved
4. Review → @agent-python-code-reviewer
Input: Before/after comparison
Output: Verification refactoring improved quality
5. Apply Standards (follow Linting Discovery Protocol)
- Apply: /modernpython for modern patterns
- If .pre-commit-config.yaml exists: `uv run pre-commit run --files <files>`
- Else: Format → Lint (with --fix) → Type check in sequence
```
### 5. Debugging Workflow
**When to use**: Investigating and fixing bugs
**Pattern**:
```text
1. Reproduce → Write failing test
@agent-python-pytest-architect
Input: Bug description, steps to reproduce
Output: Test that demonstrates bug
2. Trace → Investigate root cause
Use: Debugging tools, logging
Identify: Specific code causing issue
3. Fix → Appropriate agent
@agent-python-cli-architect or @agent-python-portable-script
Input: Failing test + root cause
Output: Fix that makes test pass
4. Test → Verify fix + no regressions
Run: Full test suite
Verify: Bug test now passes
Verify: No other tests broke
5. Review → @agent-python-code-reviewer
Input: Fix + test
Output: Verification fix is proper solution
6. Validate
- Apply: /modernpython
- Run: uv run pre-commit run --files <changed>
```
## Agent Selection Guide
### When to Use python-cli-architect
**Use when**:
- **DEFAULT choice for scripts and CLI tools**
- Building command-line applications with rich user interaction
- Need progress bars, tables, colored output
- User-facing CLI tools and automation scripts
- Any script where UX matters (formatted output, progress feedback)
- PEP 723 + uv available (internet access present)
**Characteristics**:
- Uses Typer for CLI framework
- Uses Rich for terminal output
- Focuses on UX and polish
- PEP 723 makes dependencies transparent (single file)
- Better UX than stdlib alternatives
- Works anywhere with Python 3.11+ and internet access
**Complexity Advantage** (IMPORTANT):
- **LESS development complexity** - Libraries handle the hard work (argument parsing, output formatting, validation)
- **LESS code to write** - Typer CLI boilerplate and Rich formatting come built-in
- **Better UX** - Professional output with minimal effort
- **Just as portable** - PEP 723 + uv makes single-file scripts with dependencies work seamlessly
**This agent is EASIER to use than stdlib-only approaches. Choose this as the default unless portability restrictions exist.**
**Rich Width Handling**: For Rich Panel/Table width issues in CI/non-TTY environments, see [Typer and Rich CLI Examples](../assets/typer_examples/index.md) for complete solutions including the `get_rendered_width()` helper pattern.
**Example tasks**:
- "Build a CLI tool to manage database backups with progress bars"
- "Create an interactive file browser with color-coded output"
- "Create a script to scan git repositories and show status tree"
- "Build a deployment verification tool with progress bars"
### When to Use python-portable-script
**Use when** (RARE - ask user first if unclear):
- **Restricted environment**: No internet access (airgapped, embedded systems)
- **No uv available**: Locked-down systems where uv cannot be installed
- **Hard stdlib-only requirement**: Explicitly requested by user
- **1% case**: Only when deployment environment truly restricts dependencies
**Characteristics**:
- Stdlib only (argparse, pathlib, subprocess)
- Defensive error handling
- Cross-platform compatibility
- Stdlib only (no PEP 723 needed - nothing to declare)
- Use PEP 723 ONLY if adding external dependencies later
- Ask deployment environment questions before choosing this agent
- This is the EXCEPTION, not the rule
- Consider python-cli-architect first unless restrictions confirmed
**Complexity Trade-off** (IMPORTANT):
- **MORE development complexity** - Manual implementation of everything (argument parsing, output formatting, validation, error handling)
- **MORE code to write** - Build from scratch what libraries already provide, fully tested
- **Basic UX** - Limited formatting capabilities
- **Maximum portability** - The ONLY reason to choose this: runs anywhere Python exists without network access
**This agent is NOT simpler to use - it requires MORE work to build the same functionality. Choose it ONLY for portability, not for simplicity.**
**Note**: Only use this agent if deployment environment restrictions are confirmed. With PEP 723 + uv, python-cli-architect is preferred for better UX. ASK: "Will this run without internet access or where uv cannot be installed?" See [PEP 723 Reference](./PEP723.md) for details on when to use inline script metadata.
**Example tasks**:
- "Create a deployment script using only stdlib"
- "Build a config file validator that runs without dependencies"
## Agent Selection Decision Process
### For Scripts and CLI Tools
**Step 1: Default to python-cli-architect**
- Provides better UX (Rich components, progress bars, tables)
- PEP 723 + uv handles dependencies (still single file)
- Works in 99% of scenarios
**Step 2: Only use python-portable-script if:**
- User explicitly states "stdlib only" requirement
- OR deployment environment is confirmed restricted:
- No internet access (airgapped network, embedded system)
- uv cannot be installed (locked-down corporate environment)
- Security policy forbids external dependencies
**Step 3: When uncertain, ASK:**
1. "Where will this script be deployed?"
2. "Does the environment have internet access?"
3. "Can uv be installed in the target environment?"
4. "Is stdlib-only a hard requirement, or would you prefer better UX?"
**Decision Tree**:
```text
Does the deployment environment have internet access?
├─ YES → Use python-cli-architect (default)
│ Single file + PEP 723 + uv = transparent dependencies
└─ NO → Is uv installable in the environment?
├─ YES → Use python-cli-architect (default)
│ uv can cache dependencies for offline use
└─ NO → Use python-portable-script (exception)
Truly restricted environment requires stdlib-only
```
If answers indicate normal environment → python-cli-architect
If answers indicate restrictions → python-portable-script
**When in doubt**: Use python-cli-architect. PEP 723 + uv makes single-file scripts with dependencies just as portable as stdlib-only scripts for 99% of deployment scenarios.
### When to Use python-pytest-architect
**Use when**:
- Designing test suites from scratch
- Need comprehensive test coverage strategy
- Implementing advanced testing (property-based, mutation)
- Test architecture decisions
**Characteristics**:
- Modern pytest patterns
- pytest-mock exclusively (never unittest.mock)
- AAA pattern (Arrange-Act-Assert)
- Coverage and mutation testing
**Example tasks**:
- "Design test suite for payment processing module"
- "Create property-based tests for data validation"
### When to Use python-code-reviewer
**Use when**:
- Reviewing code for quality, patterns, standards
- Post-implementation validation
- Pre-merge code review
- Identifying improvement opportunities
**Characteristics**:
- Checks against modern Python standards
- Identifies anti-patterns
- Suggests improvements
- Validates against project patterns
**Example tasks**:
- "Review this PR for code quality"
- "Check if implementation follows best practices"
## Command Usage Patterns
### /modernpython
**Apply to**: Load as reference guide (optional file path argument for context)
**Use when**:
- As reference guide when writing new code
- Learning modern Python 3.11-3.14 features and patterns
- Understanding official PEPs (585, 604, 695, etc.)
- Identifying legacy patterns to avoid
- Finding modern alternatives for old code
**Note**: This is a reference document to READ, not an automated validation tool.
**Usage**:
```text
/modernpython
→ Loads comprehensive reference guide
→ Provides Python 3.11+ pattern examples
→ Includes PEP citations with WebFetch commands
→ Shows legacy patterns to avoid
→ Shows modern alternatives to use
→ Framework-specific guides (Typer, Rich, pytest)
```
**With file path**:
```text
/modernpython packages/mymodule.py
→ Loads guide for reference while working on specified file
→ Use guide to manually identify and refactor legacy patterns
```
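The kind of refactor the guide supports looks roughly like this before/after sketch of a hypothetical helper:

```python
# Legacy patterns the guide flags (pre-3.10 style)
from typing import Dict, List, Optional, Union


def summarize_legacy(scores: Dict[str, List[int]], label: Optional[str] = None) -> Union[str, None]:
    if not scores:
        return None
    total = sum(sum(values) for values in scores.values())
    return f"{label or 'total'}: {total}"


# Modern 3.11+ equivalent: builtin generics (PEP 585) and `X | None` unions (PEP 604)
def summarize(scores: dict[str, list[int]], label: str | None = None) -> str | None:
    if not scores:
        return None
    total = sum(sum(values) for values in scores.values())
    return f"{label or 'total'}: {total}"
```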
### /shebangpython
**Apply to**: Individual Python scripts
**Use when**:
- Creating new standalone scripts
- Ensuring PEP 723 compliance
- Correcting script configuration
**Pattern**:
```text
/shebangpython scripts/deploy.py
→ Analyzes imports to determine dependency type
→ **Corrects shebang** to match script type (edits file if wrong)
→ **Adds PEP 723 metadata** if external dependencies detected (edits file)
→ **Removes PEP 723 metadata** if stdlib-only (edits file)
→ Sets execute bit if needed
→ Provides detailed verification report
```
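For a script with external dependencies, the shape the command normalizes toward looks roughly like this (assuming the uv-style shebang; the httpx dependency and URL are illustrative):

```python
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.11"
# dependencies = ["httpx>=0.27"]
# ///
"""Fetch a URL and print the HTTP status code (illustrative example)."""

import sys

import httpx


def main() -> int:
    url = sys.argv[1] if len(sys.argv) > 1 else "https://example.com"
    response = httpx.get(url, timeout=10.0)
    print(f"{url} -> {response.status_code}")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
```

A stdlib-only script instead keeps a plain `#!/usr/bin/env python3` shebang and carries no metadata block.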
## Integration with uv Skill
**Always use uv skill for**:
- Package management: `uv add <package>`
- Running scripts: `uv run script.py`
- Running tools: `uv run pytest`, `uv run ruff`
- Creating projects: `uv init`
**Never use**:
- `pip install` (use `uv add`)
- `python -m pip` (use `uv`)
- `pipenv`, `poetry` (use `uv`)
## Quality Gates
**CRITICAL**: The orchestrator MUST instruct agents to follow the Linting Discovery Protocol from the main SKILL.md before executing quality checks.
**Linting Discovery Protocol** (see SKILL.md for full details):
1. **Check for pre-commit**: If `.pre-commit-config.yaml` exists, use `uv run pre-commit run --files <files>`
2. **Else check CI config**: Read `.gitlab-ci.yml` or `.github/workflows/*.yml` for exact linting commands
3. **Else detect tools**: Check `pyproject.toml` for configured dev tools
**Format-First Requirement**: ALWAYS format before linting (formatting fixes many lint issues automatically)
**Every Python development task must pass**:
1. **Format-first**: `uv run ruff format <files>` (or via pre-commit)
2. **Linting**: `uv run ruff check <files>` (clean, after formatting)
3. **Type checking**: Use **detected type checker** (`basedpyright`, `pyright`, or `mypy`)
4. **Tests**: `uv run pytest` (>80% coverage)
5. **Standards**: `/modernpython` for modern patterns
6. **Script compliance**: `/shebangpython` for standalone scripts
**Preferred execution** (when `.pre-commit-config.yaml` exists):
```bash
# This runs ALL checks in correct order (format → lint → type → test)
uv run pre-commit run --files <changed_files>
```
**For critical code** (payments, auth, security):
- Coverage: >95%
- Mutation testing: `uv run mutmut run`
- Security scan: `uv run bandit -r packages/`
**CI Compatibility**: After local checks pass, verify CI requirements are met by checking CI config files for additional validators.
## Reference Example
**Complete working example**: `~/.claude/agents/python-cli-demo.py`
This file demonstrates all modern Python CLI patterns:
- PEP 723 inline script metadata with correct shebang
- Typer + Rich integration (Typer includes Rich, don't add separately)
- Modern Python 3.11+ patterns (StrEnum, Protocol, TypeVar, etc.)
- Proper type annotations with Annotated syntax
- Rich components (Console, Progress, Table, Panel)
- Async processing patterns
- Comprehensive docstrings
Use this as the reference implementation when creating CLI tools.
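A few of the listed patterns in isolation, as a hedged sketch (not an excerpt from the demo file):

```python
import sys
from enum import StrEnum
from typing import Protocol


class Level(StrEnum):
    """StrEnum (3.11+): members are also plain strings."""

    INFO = "info"
    DEBUG = "debug"


class Writer(Protocol):
    """Structural typing: anything with write(str) satisfies this protocol."""

    def write(self, text: str) -> int: ...


def log(message: str, *, level: Level = Level.INFO, out: Writer = sys.stdout) -> None:
    """Write a level-prefixed line to any Writer (defaults to stdout)."""
    out.write(f"[{level}] {message}\n")


log("starting up")                    # prints "[info] starting up"
log("verbose detail", level=Level.DEBUG)
```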
## Examples of Complete Workflows
### Example: Building a CLI Tool
```text
User: "Build a CLI tool to validate YAML configurations"
Orchestrator:
1. @agent-spec-architect
"Design architecture for YAML validation CLI"
→ Component design, validation rules
2. @agent-python-pytest-architect
"Create test suite for YAML validator"
→ tests/test_validator.py with fixtures
3. @agent-python-cli-architect
"Implement YAML validator CLI with Typer based on tests"
Reference: ~/.claude/agents/python-cli-demo.py for patterns
→ packages/validator.py with Typer+Rich UI
4. Validation (Linting Discovery Protocol):
/shebangpython packages/validator.py
# If .pre-commit-config.yaml exists:
uv run pre-commit run --files packages/validator.py tests/
# Else:
uv run ruff format packages/ tests/
uv run ruff check packages/ tests/
uv run <detected-type-checker> packages/ tests/
uv run pytest
5. @agent-python-code-reviewer
"Review validator implementation"
→ Quality check, improvements
6. Fix any issues and re-validate
```
### Example: Fixing a Bug
```text
User: "Fix bug where CSV parser fails on empty rows"
Orchestrator:
1. @agent-python-pytest-architect
"Write test that reproduces CSV parser bug with empty rows"
→ tests/test_csv_parser.py::test_empty_rows (failing)
2. @agent-python-cli-architect
"Fix CSV parser to handle empty rows, making test pass"
→ packages/csv_parser.py updated
3. Validation:
uv run pytest tests/test_csv_parser.py::test_empty_rows  # Verify the bug test now passes
uv run pytest  # Verify no regressions in the full suite
4. @agent-python-code-reviewer
"Review bug fix and test"
→ Verify proper solution
5. Apply standards (Linting Discovery Protocol):
/modernpython packages/csv_parser.py
# If .pre-commit-config.yaml exists:
uv run pre-commit run --files packages/csv_parser.py tests/
# Else: Format → Lint (--fix) → Type check sequence
```
## Anti-Patterns to Avoid
### Don't: Write Python code as orchestrator
```text
❌ Orchestrator writes implementation directly
```
### Do: Delegate to appropriate agent
```text
✅ @agent-python-cli-architect writes implementation
✅ @agent-python-code-reviewer validates it
```
### Don't: Skip validation steps
```text
❌ Implement → Done (no tests, no review, no linting)
```
### Do: Follow complete workflow
```text
✅ Implement → Test → Review → Validate → Done
```
### Don't: Mix agent contexts
```text
❌ Ask python-portable-script to build Typer CLI
❌ Ask python-cli-architect to avoid all dependencies
```
### Do: Choose correct agent for context
```text
✅ python-cli-architect for user-facing CLI tools
✅ python-portable-script for stdlib-only scripts
```
## Summary
**Orchestration = Coordination, Not Implementation**
1. Choose the right agent for the task
2. Provide clear inputs and context
3. Chain agents for complex workflows (architect → test → implement → review)
4. Always validate with quality gates
5. Use commands for standards checking
6. Integrate with uv skill for package management
**Success = Right agent + Clear inputs + Proper validation**

File diff suppressed because it is too large

View File

@@ -0,0 +1,831 @@
---
title: User Project Conventions
date: 2025-11-17
source: Extracted from user's production projects
projects_analyzed:
- pre-commit-pep723-linter-wrapper (PyPI/GitHub)
- python_picotool (GitLab)
- usb_powertools (GitLab)
- picod (GitLab)
- i2c_analyzer (GitLab)
---
# User Project Conventions
Conventions extracted from actual production projects. The model MUST follow these patterns when creating new Python projects.
## Asset Files Available
The following template files are available in the skill's `assets/` directory for use in new projects:
| File | Purpose | Usage |
| ------------------------- | ---------------------------------------------------- | ----------------------------------------------------- |
| `version.py` | Dual-mode version management (hatch-vcs + fallback) | Copy to `packages/{package_name}/version.py` |
| `hatch_build.py` | Build hook for binary/asset handling | Copy to `scripts/hatch_build.py` |
| `.markdownlint.json` | Markdown linting configuration (most rules disabled) | Copy to project root |
| `.pre-commit-config.yaml` | Standard pre-commit hooks configuration | Copy to project root, run `uv run pre-commit install` |
| `.editorconfig` | Editor formatting settings | Copy to project root |
The model MUST copy these files when creating new Python projects to ensure consistency with established conventions documented below.
## 1. Version Management
### Pattern: Dual-mode version.py (STANDARD - 5/5 projects)
**Location**: `packages/{package_name}/version.py`
**Pattern**: Hatch-VCS with importlib.metadata fallback
**Implementation**:
```python
"""Compute the version number and store it in the `__version__` variable.
Based on <https://github.com/maresb/hatch-vcs-footgun-example>.
"""
# /// script
# # List dependencies for linting only
# dependencies = [
# "hatchling>=1.14.0",
# ]
# ///
import os
def _get_hatch_version() -> str | None:
"""Compute the most up-to-date version number in a development environment.
Returns `None` if Hatchling is not installed, e.g. in a production environment.
For more details, see <https://github.com/maresb/hatch-vcs-footgun-example/>.
"""
try:
from hatchling.metadata.core import ProjectMetadata
from hatchling.plugin.manager import PluginManager
from hatchling.utils.fs import locate_file
except ImportError:
# Hatchling is not installed, so probably we are not in
# a development environment.
return None
pyproject_toml = locate_file(__file__, "pyproject.toml")
if pyproject_toml is None:
raise RuntimeError("pyproject.toml not found although hatchling is installed")
root = os.path.dirname(pyproject_toml)
metadata = ProjectMetadata(root=root, plugin_manager=PluginManager())
# Version can be either statically set in pyproject.toml or computed dynamically:
return str(metadata.core.version or metadata.hatch.version.cached)
def _get_importlib_metadata_version() -> str:
"""Compute the version number using importlib.metadata.
This is the official Pythonic way to get the version number of an installed
package. However, it is only updated when a package is installed. Thus, if a
package is installed in editable mode, and a different version is checked out,
then the version number will not be updated.
"""
from importlib.metadata import version
__version__ = version(__package__ or __name__)
return __version__
__version__ = _get_hatch_version() or _get_importlib_metadata_version()
```
**pyproject.toml Configuration** (STANDARD - 5/5 projects):
```toml
[project]
dynamic = ["version"]
[tool.hatch.version]
source = "vcs"
[build-system]
requires = ["hatchling", "hatch-vcs"]
build-backend = "hatchling.build"
```
**`__init__.py` Export Pattern** (STANDARD - 5/5 projects):
```python
from .version import __version__
__all__ = ["__version__"] # Plus other exports
```
## 2. Package Structure
### Pattern: src-layout with packages/ directory (STANDARD - 5/5 projects)
**Directory Structure**:
```text
project_root/
├── packages/
│ └── {package_name}/
│ ├── __init__.py # Exports public API + __version__
│ ├── version.py # Version management
│ ├── {modules}.py
│ └── tests/ # Co-located tests
├── scripts/
│ └── hatch_build.py # Custom build hook (if needed)
├── pyproject.toml
└── README.md
```
**pyproject.toml Package Mapping** (STANDARD - 5/5 projects):
```toml
[tool.hatch.build.targets.wheel]
packages = ["packages/{package_name}"]
[tool.hatch.build.targets.wheel.sources]
"packages/{package_name}" = "{package_name}"
```
### Pattern: `__init__.py` exports with `__all__` (STANDARD - 5/5 projects)
The model must export public API + `__version__` in `__init__.py` with explicit `__all__` list.
**Minimal Example** (usb_powertools):
```python
"""Package docstring."""
from .version import __version__
__all__ = ["__version__"]
```
**Full API Example** (pep723_loader):
```python
"""Package docstring."""
from .pep723_checker import Pep723Checker
from .version import __version__
__all__ = ["Pep723Checker", "__version__"]
```
**Evidence**: All 5 projects use this pattern consistently.
## 3. Build Configuration
### Pattern: Custom hatch_build.py Hook (STANDARD - 3/5 projects with binaries)
**Location**: `scripts/hatch_build.py`
**Purpose**: Execute binary build scripts (`build-binaries.sh` or `build-binaries.py`) before packaging.
**Standard Implementation** (usb_powertools, picod, i2c_analyzer identical):
```python
"""Custom hatchling build hook for binary compilation.
This hook runs before the build process to compile platform-specific binaries
if build scripts are present in the project.
"""
from __future__ import annotations
import shutil
import subprocess
from pathlib import Path
from typing import Any
from hatchling.builders.config import BuilderConfig
from hatchling.builders.hooks.plugin.interface import BuildHookInterface
class BinaryBuildHook(BuildHookInterface[BuilderConfig]):
"""Build hook that runs binary compilation scripts before packaging.
This hook checks for the following scripts in order:
1. scripts/build-binaries.sh
2. scripts/build-binaries.py
If either script exists, it is executed before the build process.
If neither exists, the hook silently continues without error.
"""
PLUGIN_NAME = "binary-build"
def initialize(self, version: str, build_data: dict[str, Any]) -> None:
"""Run binary build scripts if they exist."""
shell_script = Path(self.root) / "scripts" / "build-binaries.sh"
if shell_script.exists() and shell_script.is_file():
self._run_shell_script(shell_script)
return
python_script = Path(self.root) / "scripts" / "build-binaries.py"
if python_script.exists() and python_script.is_file():
self._run_python_script(python_script)
return
self.app.display_info("No binary build scripts found, skipping binary compilation")
def _run_shell_script(self, script_path: Path) -> None:
"""Execute a shell script for binary building."""
self.app.display_info(f"Running binary build script: {script_path}")
if not (bash := shutil.which("bash")):
raise RuntimeError("bash not found - cannot execute shell script")
try:
result = subprocess.run([bash, str(script_path)], cwd=self.root, capture_output=True, text=True, check=True)
if result.stdout:
self.app.display_info(result.stdout)
if result.stderr:
self.app.display_warning(result.stderr)
except subprocess.CalledProcessError as e:
self.app.display_error(f"Binary build script failed with exit code {e.returncode}")
if e.stdout:
self.app.display_info(f"stdout: {e.stdout}")
if e.stderr:
self.app.display_error(f"stderr: {e.stderr}")
raise
def _run_python_script(self, script_path: Path) -> None:
"""Execute a Python script for binary building.
Executes the script directly using its shebang, which honors PEP 723
inline metadata for dependency management via uv.
"""
self.app.display_info(f"Running binary build script: {script_path}")
try:
result = subprocess.run([script_path, "--clean"], cwd=self.root, capture_output=True, text=True, check=True)
if result.stdout:
self.app.display_info(result.stdout)
if result.stderr:
self.app.display_warning(result.stderr)
except subprocess.CalledProcessError as e:
self.app.display_error(f"Binary build script failed with exit code {e.returncode}")
if e.stdout:
self.app.display_info(f"stdout: {e.stdout}")
if e.stderr:
self.app.display_error(f"stderr: {e.stderr}")
raise
```
**pyproject.toml Configuration**:
```toml
[tool.hatch.build.targets.sdist.hooks.custom]
path = "scripts/hatch_build.py"
[tool.hatch.build]
artifacts = ["builds/*/binary_name"] # If binaries included
```
## 4. Pre-commit Configuration
### Standard Hook Set (STANDARD - 5/5 projects)
**File**: `.pre-commit-config.yaml`
**Core Hooks** (appear in all projects):
```yaml
repos:
- repo: https://github.com/mxr/sync-pre-commit-deps
rev: v0.0.3
hooks:
- id: sync-pre-commit-deps
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v6.0.0
hooks:
- id: trailing-whitespace
exclude: \.lock$
- id: end-of-file-fixer
exclude: \.lock$
- id: check-yaml
- id: check-json
- id: check-toml
- id: check-added-large-files
args: ["--maxkb=10000"] # 10MB limit
- id: check-case-conflict
- id: check-merge-conflict
- id: check-symlinks
- id: mixed-line-ending
args: ["--fix=lf"]
- id: check-executables-have-shebangs
- id: check-shebang-scripts-are-executable
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.13.3+
hooks:
- id: ruff
name: Lint Python with ruff
args: [--fix, --exit-non-zero-on-fix]
- id: ruff-format
name: Format Python with ruff
- repo: https://github.com/pre-commit/mirrors-prettier
rev: v4.0.0-alpha.8
hooks:
- id: prettier
name: Format YAML, JSON, and Markdown files
types_or: [yaml, json, markdown]
exclude: \.lock$
- repo: https://github.com/pecigonzalo/pre-commit-shfmt
rev: v2.2.0
hooks:
- id: shell-fmt-go
args: ["--apply-ignore", -w, -i, "4", -ci]
- repo: https://github.com/shellcheck-py/shellcheck-py
rev: v0.11.0.1
hooks:
- id: shellcheck
default_language_version:
python: python3
exclude: |
(?x)^(
\.git/|
\.venv/|
__pycache__/|
\.mypy_cache/|
\.cache/|
\.pytest_cache/|
\.lock$|
typings/
)
```
### Pattern: pep723-loader for Type Checking (STANDARD - 3/5 projects)
Projects using `pep723-loader` wrapper for mypy/basedpyright:
```yaml
- repo: local
hooks:
- id: mypy
name: mypy
entry: uv run -q --no-sync --with pep723-loader --with mypy pep723-loader mypy
language: system
types: [python]
pass_filenames: true
- id: pyright
name: basedpyright
entry: uv run -q --no-sync --with pep723-loader --with basedpyright pep723-loader basedpyright
language: system
types: [python]
pass_filenames: true
require_serial: true
```
### Pattern: Markdown Linting (STANDARD - 4/5 projects)
```yaml
- repo: https://github.com/DavidAnson/markdownlint-cli2
rev: v0.18.1
hooks:
- id: markdownlint-cli2
language_version: "latest"
args: ["--fix"]
```
**Evidence**: pre-commit-pep723-linter-wrapper, usb_powertools, picod all use this pattern.
## 5. Ruff Configuration
### Standard Configuration (STANDARD - 5/5 projects)
**pyproject.toml Section**:
```toml
[tool.ruff]
target-version = "py311"
line-length = 120
fix = true
preview = true # Optional, 3/5 projects use
[tool.ruff.format]
docstring-code-format = true
quote-style = "double"
line-ending = "lf"
skip-magic-trailing-comma = true
preview = true
[tool.ruff.lint]
extend-select = [
"E", # pycodestyle errors
"W", # pycodestyle warnings
"F", # pyflakes
"I", # isort
"UP", # pyupgrade
"YTT", # flake8-2020
"S", # flake8-bandit
"B", # flake8-bugbear
"A", # flake8-builtins
"C4", # flake8-comprehensions
"T10", # flake8-debugger
"SIM", # flake8-simplify
"C90", # mccabe
"PGH", # pygrep-hooks
"RUF", # ruff-specific
"TRY", # tryceratops
"DOC", # pydocstyle docstrings (4/5 projects)
"D", # pydocstyle (4/5 projects)
]
ignore = [
"COM812", # Missing trailing comma
"COM819", # Missing trailing comma
"D107", # Missing docstring in __init__
"D415", # First line should end with a period
"E111", # Indentation is not a multiple of four
"E117", # Over-indented for visual indent
"E203", # whitespace before ':'
"E402", # Module level import not at top of file
"E501", # Line length exceeds maximum limit
"ISC001", # isort configuration is missing
"ISC002", # isort configuration is missing
"Q000", # Remove bad quotes
"Q001", # Remove bad quotes
"Q002", # Remove bad quotes
"Q003", # Remove bad quotes
"TRY003", # Exception message should not be too long
"S404", # module is possibly insecure
"S603", # subprocess-without-shell-equals-true
"S606", # start-process-with-no-shell
"DOC201", # Missing return section in docstring
"DOC501", # Missing raises section
"DOC502", # Missing raises section
"T201", # Allow print statements (4/5 projects)
]
unfixable = ["F401", "S404", "S603", "S606", "DOC501"]
[tool.ruff.lint.pycodestyle]
max-line-length = 120
[tool.ruff.lint.isort]
combine-as-imports = true
split-on-trailing-comma = false
force-single-line = false
force-wrap-aliases = false
[tool.ruff.lint.flake8-quotes]
docstring-quotes = "double"
[tool.ruff.lint.pydocstyle]
convention = "google"
[tool.ruff.lint.mccabe]
max-complexity = 10
[tool.ruff.lint.per-file-ignores]
"**/tests/*" = ["S101", "S603", "S607", "D102", "D200", "D100"]
"**/test_*.py" = ["S101", "S603", "S607", "D102", "D200", "D100"]
```
**Evidence**: All 5 projects use this exact configuration with minor variations.
## 6. Mypy Configuration
### Standard Configuration (STANDARD - 5/5 projects)
```toml
[tool.mypy]
python_version = "3.11"
strict = true
strict_equality = true
extra_checks = true
warn_unused_configs = true
warn_redundant_casts = true
warn_unused_ignores = true
ignore_missing_imports = true
show_error_codes = true
pretty = true
disable_error_code = ["call-arg"]
```
**Per-module overrides pattern**:
```toml
[[tool.mypy.overrides]]
module = "tests.*"
disable_error_code = ["misc"]
```
## 7. Basedpyright Configuration
### Standard Configuration (STANDARD - 5/5 projects)
```toml
[tool.basedpyright]
pythonVersion = "3.11"
typeCheckingMode = "standard"
reportMissingImports = false
reportMissingTypeStubs = false
reportUnnecessaryTypeIgnoreComment = "error"
reportPrivateImportUsage = false
include = ["packages"]
extraPaths = ["packages", "scripts", "tests", "."]
exclude = ["**/node_modules", "**/__pycache__", ".*", "__*", "**/typings"]
ignore = ["**/typings"]
venvPath = "."
venv = ".venv"
```
**Evidence**: All 5 projects use this configuration.
## 8. Pytest Configuration
### Standard Configuration (STANDARD - 5/5 projects)
```toml
[tool.pytest.ini_options]
addopts = [
"--cov=packages/{package_name}",
"--cov-report=term-missing",
"-v",
]
testpaths = ["packages/{package_name}/tests"]
python_files = ["test_*.py"]
python_classes = ["Test*"]
python_functions = ["test_*"]
pythonpath = [".", "packages/"]
markers = [
"hardware: tests that require USB hardware",
"slow: tests that take significant time to run",
"integration: integration tests",
]
[tool.coverage.run]
omit = ["*/tests/*"]
[tool.coverage.report]
show_missing = true
fail_under = 70
```
**Evidence**: All projects follow this pattern with minor marker variations.
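The configured markers are then applied in test modules; a short sketch (test names are hypothetical):

```python
import pytest


@pytest.mark.hardware
def test_usb_device_enumerates() -> None:
    """Requires an attached device; deselected in CI with -m "not hardware"."""


@pytest.mark.slow
@pytest.mark.integration
def test_full_capture_pipeline() -> None:
    """Long-running end-to-end check."""
```

Running `uv run pytest -m "not hardware and not slow"` then executes only the fast, hardware-free subset.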
## 9. Formatting Configuration Files
### .markdownlint.json (STANDARD - 5/5 projects)
**All projects use identical configuration**:
```json
{
"MD003": false,
"MD007": { "indent": 2 },
"MD001": false,
"MD022": false,
"MD024": false,
"MD013": false,
"MD036": false,
"MD025": false,
"MD031": false,
"MD041": false,
"MD029": false,
"MD033": false,
"MD046": false,
"blanks-around-fences": false,
"blanks-around-headings": false,
"blanks-around-lists": false,
"code-fence-style": false,
"emphasis-style": false,
"heading-start-left": false,
"heading-style": false,
"hr-style": false,
"line-length": false,
"list-indent": false,
"list-marker-space": false,
"no-blanks-blockquote": false,
"no-hard-tabs": false,
"no-missing-space-atx": false,
"no-missing-space-closed-atx": false,
"no-multiple-blanks": false,
"no-multiple-space-atx": false,
"no-multiple-space-blockquote": false,
"no-multiple-space-closed-atx": false,
"no-trailing-spaces": false,
"ol-prefix": false,
"strong-style": false,
"ul-indent": false
}
```
**Evidence**: Identical across all 5 projects.
### .editorconfig (COMMON - 2/5 projects have it)
**Standard Pattern** (python_picotool, picod):
```ini
# EditorConfig: https://editorconfig.org/
root = true
[*]
charset = utf-8
end_of_line = lf
insert_final_newline = true
trim_trailing_whitespace = true
max_line_length = 120
[*.md]
indent_style = space
indent_size = 4
trim_trailing_whitespace = false
[*.py]
indent_style = space
indent_size = 4
[*.{yml,yaml}]
indent_style = space
indent_size = 2
[*.sh]
indent_style = space
indent_size = 4
[*.toml]
indent_style = space
indent_size = 2
[*.json]
indent_style = space
indent_size = 2
[COMMIT_EDITMSG]
max_line_length = 72
```
**Evidence**: python_picotool and picod include this file.
## 10. Semantic Release Configuration
### Standard Configuration (STANDARD - 5/5 projects)
```toml
[tool.semantic_release]
version_toml = []
major_on_zero = true
allow_zero_version = true
tag_format = "v{version}"
build_command = "uv build"
[tool.semantic_release.branches.main]
match = "(main|master)"
prerelease = false
[tool.semantic_release.commit_parser_options]
allowed_tags = [
"build",
"chore",
"ci",
"docs",
"feat",
"fix",
"perf",
"style",
"refactor",
"test",
]
minor_tags = ["feat"]
patch_tags = ["fix", "perf", "refactor"]
```
**Evidence**: All 5 projects use this configuration identically.
## 11. Dependency Groups
### Standard dev Dependencies (STANDARD - 5/5 projects)
```toml
[dependency-groups]
dev = [
"basedpyright>=1.21.1",
"hatch-vcs>=0.5.0",
"hatchling>=1.14.0",
"mypy>=1.18.2",
"pre-commit>=4.3.0",
"pytest>=8.4.2",
"pytest-asyncio>=1.2.0",
"pytest-cov>=6.0.0",
"pytest-mock>=3.14.0",
"ruff>=0.9.4",
"python-semantic-release>=10.4.1",
"generate-changelog>=0.16.0",
]
```
**Common Pattern**: All projects include mypy, basedpyright, ruff, pytest, pre-commit, hatchling tools.
**Evidence**: All 5 projects have dev dependency groups with these core tools.
## 12. GitLab Project-Specific Patterns
### Pattern: Custom PyPI Index (STANDARD - 4/4 GitLab projects)
```toml
[tool.uv]
publish-url = "{{gitlab_instance_url}}/api/v4/projects/{{project_id}}/packages/pypi"
[[tool.uv.index]]
name = "pypi"
url = "https://pypi.org/simple"
default = true
[[tool.uv.index]]
name = "gitlab"
url = "{{gitlab_instance_url}}/api/v4/groups/{{group_id}}/-/packages/pypi/simple"
explicit = true
default = false
```
## 13. Project Metadata Standards
### Pattern: Author and Maintainer (STANDARD - 5/5 projects)
```toml
[project]
authors = [{ name = "{{author_name_from_git_config_user_name}}", email = "{{author_email_from_git_config_user_email}}" }]
maintainers = [{ name = "{{author_name_from_git_config_user_name}}", email = "{{author_email_from_git_config_user_email}}" }]
```
**Observation**: Email addresses differ between GitHub projects (personal email) and GitLab projects (corporate email).
### Pattern: Classifiers (STANDARD - 5/5 projects)
**Common classifiers across all projects**:
```toml
classifiers = [
"Development Status :: 4 - Beta",
"Intended Audience :: Developers",
"Operating System :: POSIX :: Linux" or "Operating System :: OS Independent",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.11",
"Programming Language :: Python :: 3.12",
]
```
### Pattern: Keywords (STANDARD - 5/5 projects)
All projects include domain-specific keywords related to their purpose.
### Pattern: requires-python (STANDARD - 5/5 projects)
**Two variants**:
- GitHub: `>=3.10`
- GitLab: `>=3.11,<3.13`
## 14. CLI Entry Points
### Pattern: Typer-based CLI (STANDARD - 5/5 projects)
```toml
[project.scripts]
{package_name} = "{package_name}.cli:main"  # or "{package_name}.cli:app"
[project]
dependencies = [
"typer>=0.19.2",
]
```
**Evidence**: All 5 projects use Typer for CLI implementation.
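A minimal sketch of the `cli.py` module such an entry point targets (the command itself is illustrative):

```python
"""packages/{package_name}/cli.py -- module targeted by [project.scripts] (sketch)."""

import typer

from .version import __version__

app = typer.Typer(help="Example CLI (illustrative).")


@app.command()
def version() -> None:
    """Print the installed package version."""
    typer.echo(__version__)


def main() -> None:
    """Entry point for the '{package_name}.cli:main' form; ':app' points at the Typer app itself."""
    app()
```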
## Summary of Standard Patterns
**STANDARD** (5/5 projects):
- Dual-mode version.py with hatch-vcs
- packages/ directory structure
- `__all__` exports in `__init__.py`
- Ruff formatting with 120 char line length
- Mypy strict mode
- Basedpyright type checking
- Pre-commit hooks (sync-deps, ruff, prettier, shellcheck, shfmt)
- .markdownlint.json (identical config)
- Semantic release configuration
- Typer-based CLI
- pytest with coverage
**COMMON** (3-4/5 projects):
- pep723-loader for type checking in pre-commit
- Custom hatch_build.py hook
- .editorconfig
- GitLab custom PyPI index
The model must follow STANDARD patterns for all new Python projects. COMMON patterns should be used when applicable (e.g., hatch_build.py only if binaries needed).