Initial commit

skills/python3-development/references/PEP723.md

---
title: "PEP 723 - Inline Script Metadata"
description: "Official Python specification for embedding dependency metadata in single-file scripts"
version: "1.0.0"
last_updated: "2025-11-04"
document_type: "reference"
official_specification: "https://peps.python.org/pep-0723/"
python_compatibility: "3.11+"
related_docs:
  - "../SKILL.md"
  - "./python-development-orchestration.md"
---

# PEP 723 - Inline Script Metadata

## What is PEP 723?

PEP 723 is the official Python specification that defines a standard format for embedding metadata in single-file Python scripts. It allows scripts to declare their dependencies and Python version requirements without requiring separate configuration files like `pyproject.toml` or `requirements.txt`.

## Official Specification

The model must WebFetch this URL before discussing the topic with the user: [pep-0723](https://peps.python.org/pep-0723/)

## Key Concept

PEP 723 metadata is embedded **inside Python comments** using a special syntax, making the metadata human-readable and machine-parseable while keeping the script as a single portable file.

If implementing anything that interacts with this metadata, such as a linting enhancer, you must WebFetch [inline-script-metadata](https://packaging.python.org/en/latest/specifications/inline-script-metadata/#inline-script-metadata) to get the schema, syntax, and reference implementation.
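
For orientation, a minimal sketch of reading a script's `script` block, modeled on the reference implementation published in the specification (treat the fetched spec as authoritative; the regex below is the one it defines):

```python
# Sketch, assuming Python 3.11+ for tomllib; verify against the spec.
import re
import tomllib

METADATA_RE = (
    r"(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$"
)


def read_script_metadata(script: str) -> dict | None:
    """Return the parsed TOML of the 'script' block, or None if absent."""
    matches = [
        m for m in re.finditer(METADATA_RE, script) if m.group("type") == "script"
    ]
    if len(matches) > 1:
        raise ValueError("Multiple 'script' metadata blocks found")
    if not matches:
        return None
    # Strip the leading '# ' (or bare '#') from each embedded TOML line.
    content = "".join(
        line[2:] if line.startswith("# ") else line[1:]
        for line in matches[0].group("content").splitlines(keepends=True)
    )
    return tomllib.loads(content)
```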

## The Problem It Solves

### The Challenge

When sharing Python scripts as standalone files (via email, gists, URLs, or chat), there's a fundamental problem:

- **Scripts often need external dependencies** (requests, rich, pandas, etc.)
- **No standard way** to declare these dependencies within the script itself
- **Tools can't automatically know** what packages to install to run the script
- **Users must read documentation** or comments to figure out requirements

### The Solution

PEP 723 provides a **standardized comment-based format** that:

- ✅ Embeds dependency declarations directly in the script
- ✅ Remains a valid Python file (metadata is in comments)
- ✅ Is machine-readable by package managers (uv, PDM, Hatch)
- ✅ Keeps everything in a single portable file

## Syntax

### Format

PEP 723 metadata is written as **TOML inside specially-formatted Python comments**:

```python
# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "requests<3",
#     "rich",
# ]
# ///
```

### Rules

1. **Opening marker**: `# /// script` (exactly, with spaces)
2. **Content**: Valid TOML, with each line prefixed by `#` and a space
3. **Closing marker**: `# ///` (exactly, with spaces)
4. **Location**: Typically near the top of the file, immediately after the shebang (the examples below place it before the module docstring)
5. **Indentation**: Use consistent comment formatting

### Supported Fields

```toml
# /// script
# requires-python = ">=3.11"  # Minimum Python version
# dependencies = [            # External packages
#     "requests>=2.31.0,<3",
#     "rich>=13.0",
#     "typer[all]>=0.12.0",
# ]
# ///
```

## When to Use PEP 723

### ✅ Use PEP 723 When

1. **Script has external dependencies**
   - Uses packages from PyPI (requests, pandas, rich, etc.)
   - Needs specific package versions
   - Example: A CLI tool that fetches data from APIs

2. **Sharing standalone scripts**
   - Sending scripts via email, gists, or chat
   - Publishing example scripts in documentation
   - Creating portable automation tools

3. **Scripts need reproducibility**
   - Version-pinned dependencies for consistent behavior
   - Specific Python version requirements
   - Example: Deployment scripts that must work identically across environments

### ❌ Don't Use PEP 723 When

1. **Script uses only stdlib**
   - No external dependencies = nothing to declare
   - Use the simple shebang: `#!/usr/bin/env python3`
   - Example: A script that uses only `argparse`, `pathlib`, `json`

2. **Full project with pyproject.toml**
   - Projects have proper package structure
   - Use `pyproject.toml` for dependency management
   - PEP 723 is for **single-file scripts**, not projects

3. **Script is part of a package**
   - Package dependencies are declared in `pyproject.toml`
   - Script uses package-level dependencies
   - No need to duplicate declarations

## Shebang Requirements

### Scripts with PEP 723 Metadata

**Must use** the uv-based shebang for automatic dependency installation:

```python
#!/usr/bin/env -S uv --quiet run --active --script
# /// script
# requires-python = ">=3.11"
# dependencies = ["requests", "rich"]
# ///

import requests
from rich import print
```

**Why this shebang?**

- `uv --quiet run --active --script`: Tells uv to:
  - Read PEP 723 metadata from the script
  - Install declared dependencies automatically
  - Execute the script with the correct environment

### Stdlib-Only Scripts

**Use** the standard Python shebang (no PEP 723 needed):

```python
#!/usr/bin/env python3

import argparse
import pathlib
import json

# No dependencies to declare
```

**Why no PEP 723?**

- Stdlib is always available (bundled with Python)
- Nothing to declare = no metadata needed
- Simpler is better when appropriate

## Complete Example

### Script with External Dependencies

```python
#!/usr/bin/env -S uv --quiet run --active --script
# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "requests>=2.31.0,<3",
#     "rich>=13.0",
# ]
# ///

"""Fetch GitHub user info and display with rich formatting."""

import sys
from typing import Any

import requests
from rich.console import Console
from rich.panel import Panel

console = Console()


def fetch_user(username: str) -> dict[str, Any] | None:
    """Fetch GitHub user data."""
    response = requests.get(f"https://api.github.com/users/{username}")
    if response.status_code == 200:
        return response.json()
    return None


def main() -> None:
    """Main entry point."""
    if len(sys.argv) != 2:
        console.print("[red]Usage: script.py <github-username>[/red]")
        sys.exit(1)

    username = sys.argv[1]
    user = fetch_user(username)

    if user:
        console.print(
            Panel(
                f"[bold]{user['name']}[/bold]\n"
                f"Followers: {user['followers']}\n"
                f"Public Repos: {user['public_repos']}",
                title=f"GitHub: {username}",
            )
        )
    else:
        console.print(f"[red]User '{username}' not found[/red]")


if __name__ == "__main__":
    main()
```

**To run**:

```bash
chmod +x script.py
./script.py octocat
```

The script will:

1. Read PEP 723 metadata
2. Install `requests` and `rich` if not present
3. Execute with dependencies available

### Stdlib-Only Script

```python
#!/usr/bin/env python3

"""Simple JSON formatter using only stdlib."""

import argparse
import json
import sys
from pathlib import Path


def format_json(input_path: Path, indent: int = 2) -> None:
    """Format JSON file with specified indentation."""
    data = json.loads(input_path.read_text())
    formatted = json.dumps(data, indent=indent, sort_keys=True)
    print(formatted)


def main() -> None:
    """Main entry point."""
    parser = argparse.ArgumentParser(description="Format JSON files")
    parser.add_argument("file", type=Path, help="JSON file to format")
    parser.add_argument("--indent", type=int, default=2, help="Indentation spaces")

    args = parser.parse_args()
    format_json(args.file, args.indent)


if __name__ == "__main__":
    main()
```

**No PEP 723 needed** - all imports are from Python's standard library.

## Tool Support

### Package Managers

The following tools support PEP 723 inline script metadata:

- **uv**: [https://docs.astral.sh/uv/](https://docs.astral.sh/uv/)
- **PDM**: [https://pdm-project.org/](https://pdm-project.org/)
- **Hatch**: [https://hatch.pypa.io/](https://hatch.pypa.io/)

### Running Scripts with uv

```bash
# Make script executable
chmod +x script.py

# Run directly (uv reads PEP 723 metadata)
./script.py

# Or explicitly with uv
uv run script.py
```

### Alternative: PDM

```bash
pdm run script.py
```

## Common Patterns

### Version Constraints

```python
# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "requests>=2.31.0,<3",  # Major version constraint
#     "rich~=13.7",           # Compatible release
#     "typer[all]",           # With extras
# ]
# ///
```

### Development vs Production

**For scripts**, there's typically no separation - all dependencies are runtime dependencies. If you need development tools (testing, linting), those belong in a full project with `pyproject.toml`, as sketched below.
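
A minimal sketch of where those development tools land once a script graduates to a project (assuming dependency groups per PEP 735, as supported by uv):

```toml
# pyproject.toml (sketch; names and versions are illustrative)
[project]
name = "mytool"
version = "0.1.0"
requires-python = ">=3.11"
dependencies = ["requests>=2.31.0,<3", "rich>=13.0"]

[dependency-groups]
dev = ["pytest", "ruff"]
```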

### Git-Based Dependencies

```python
# /// script
# dependencies = [
#     "mylib @ git+https://github.com/user/mylib.git@v1.0.0",
# ]
# ///
```

## Best Practices

### 1. Pin Major Versions

```python
# Good - prevents breaking changes
"requests>=2.31.0,<3"

# Avoid - might break on major updates
"requests"
```

### 2. Document the Script

```python
#!/usr/bin/env -S uv --quiet run --active --script
# /// script
# requires-python = ">=3.11"
# dependencies = ["requests", "rich"]
# ///

"""
Fetch and display GitHub user statistics.

Usage:
    ./github_stats.py <username>

Example:
    ./github_stats.py octocat
"""
```

### 3. Keep Scripts Focused

PEP 723 is for **single-file scripts**. If your script is growing large or needs multiple modules, consider creating a proper Python package with `pyproject.toml`.

### 4. Test Portability

```bash
# Test on a clean environment
uv run --isolated script.py args
```

## Comparison: PEP 723 vs pyproject.toml

| Aspect           | PEP 723 (Script)          | pyproject.toml (Project)   |
| ---------------- | ------------------------- | -------------------------- |
| **Use case**     | Single-file scripts       | Multi-module packages      |
| **Dependencies** | Inline comments           | Separate TOML file         |
| **Portability**  | Single file to share      | Requires project structure |
| **Complexity**   | Simple, focused           | Full project metadata      |
| **When to use**  | Scripts with dependencies | Libraries, applications    |

## Validation

### Using the /shebangpython Command

The `/shebangpython` command validates PEP 723 compliance:

```bash
/shebangpython script.py
```

**Checks**:

- ✅ Correct shebang for dependency type
- ✅ PEP 723 syntax if external dependencies detected
- ✅ Metadata fields are valid
- ✅ Execute permission set

See: [/shebangpython command reference](~/.claude/commands/shebangpython.md)

## Troubleshooting

### Script Won't Execute

**Problem**: `./script.py` fails with "dependencies not found"

**Solution**: Check that the shebang is correct for PEP 723:

```python
#!/usr/bin/env -S uv --quiet run --active --script
```

### Syntax Errors in Metadata

**Problem**: TOML parsing fails

**Solution**: Validate the TOML syntax:

```python
# /// script
# requires-python = ">=3.11"  # ✅ Correct
# dependencies = [            # ✅ Correct - list syntax
#     "requests",
# ]
# ///
```

### Performance Concerns

**Problem**: Script is slow to start (installing dependencies)

**Solution**: uv caches dependencies. The first run may be slow; subsequent runs are fast. For production, consider packaging as a proper project.

## Migration

### From requirements.txt

**Before** (two files):

```text
# requirements.txt
requests>=2.31.0
rich>=13.0
```

```python
#!/usr/bin/env python3
# script.py (separate file)

import requests
from rich import print
```

**After** (single file):

```python
#!/usr/bin/env -S uv --quiet run --active --script
# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "requests>=2.31.0",
#     "rich>=13.0",
# ]
# ///

import requests
from rich import print
```

### From setup.py Scripts

**Before** (package structure):

```text
myproject/
├── setup.py
├── requirements.txt
└── scripts/
    └── tool.py
```

**After** (standalone script):

```python
#!/usr/bin/env -S uv --quiet run --active --script
# /// script
# requires-python = ">=3.11"
# dependencies = ["requests", "rich"]
# ///

# tool.py - now fully self-contained
```

## Summary

### Key Takeaways

1. **PEP 723 = Dependency Metadata for Single-File Scripts**
   - Standard format for declaring dependencies in comments
   - TOML content inside `# ///` delimiters

2. **When to Use**
   - Scripts **with external dependencies**
   - Need portability (single file to share)
   - Want automatic dependency installation

3. **When NOT to Use**
   - Stdlib-only scripts (nothing to declare)
   - Full projects (use `pyproject.toml`)
   - Package modules (use package dependencies)

4. **Shebang Requirements**
   - With PEP 723: `#!/usr/bin/env -S uv --quiet run --active --script`
   - Stdlib only: `#!/usr/bin/env python3`

5. **Tool Support**
   - uv, PDM, and Hatch all support PEP 723
   - Automatic dependency installation on script execution

### Quick Reference

```python
#!/usr/bin/env -S uv --quiet run --active --script
# Template for a PEP 723 script
# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "package-name>=version",
# ]
# ///

"""Script description."""

import package_name

# Your code here
```

## See Also

- **Official PEP**: [https://peps.python.org/pep-0723/](https://peps.python.org/pep-0723/)
- **uv Documentation**: [https://docs.astral.sh/uv/](https://docs.astral.sh/uv/)
- **Skill Reference**: [Python Development SKILL.md](../SKILL.md)
- **Shebang Validation**: [/shebangpython command](~/.claude/commands/shebangpython.md)

skills/python3-development/references/api_reference.md

---
title: "API Reference Template"
description: "Template for detailed reference documentation"
version: "1.0.0"
last_updated: "2025-11-02"
document_type: "template"
---

# Reference Documentation for Python3 Development

This is a placeholder for detailed reference documentation. Replace with actual reference content or delete if not needed.

Example real reference docs from other skills:

- product-management/references/communication.md - Comprehensive guide for status updates
- product-management/references/context_building.md - Deep-dive on gathering context
- bigquery/references/ - API references and query examples

## When Reference Docs Are Useful

Reference docs are ideal for:

- Comprehensive API documentation
- Detailed workflow guides
- Complex multi-step processes
- Information too lengthy for main SKILL.md
- Content that's only needed for specific use cases

## Structure Suggestions

### API Reference Example

- Overview
- Authentication
- Endpoints with examples
- Error codes
- Rate limits

### Workflow Guide Example

- Prerequisites
- Step-by-step instructions
- Common patterns
- Troubleshooting
- Best practices

skills/python3-development/references/exception-handling.md

# Exception Handling in Python CLI Applications with Typer

## The Problem: Exception Chain Explosion

AI-generated code commonly creates a catastrophic anti-pattern where every function catches and re-wraps exceptions, creating massive exception chains (200+ lines of output) for simple errors like "file not found".

**Example of the problem:**

**Full example:** [nested-typer-exception-explosion.py](./nested-typer-exceptions/nested-typer-exception-explosion.py)

```python
# From: nested-typer-exception-explosion.py (simplified - see full file for all 7 layers)
# Layer 1
def read_file(path):
    try:
        return path.read_text()
    except FileNotFoundError as e:
        raise ConfigError(f"File not found: {path}") from e
    except Exception as e:
        raise ConfigError(f"Failed to read: {e}") from e


# Layer 2
def load_config(path):
    try:
        contents = read_file(path)
        return json.loads(contents)
    except ConfigError as e:
        raise ConfigError(f"Config load failed: {e}") from e
    except Exception as e:
        raise ConfigError(f"Unexpected error: {e}") from e


# Layer 3... Layer 4... Layer 5... Layer 6... Layer 7...
# Each layer wraps the exception again
```

**Result:** A single `FileNotFoundError` becomes a 6-layer exception chain with 220 lines of output.

## The Correct Solution: Typer's Exit Pattern

Based on Typer's official documentation and best practices:

### Pattern 1: Custom Exit Exception with typer.echo

**Full example:** [nested-typer-exception-explosion_corrected_typer_echo.py](./nested-typer-exceptions/nested-typer-exception-explosion_corrected_typer_echo.py)

Create a custom exception class that handles user-friendly output:

```python
# From: nested-typer-exception-explosion_corrected_typer_echo.py
import typer


class AppExit(typer.Exit):
    """Custom exception for graceful application exits."""

    def __init__(self, code: int | None = None, message: str | None = None):
        self.code = code
        self.message = message
        if message is not None:
            if code is None or code == 0:
                typer.echo(self.message)
            else:
                typer.echo(self.message, err=True)
        super().__init__(code=code)
```

**Usage in helper functions:**

```python
# From: nested-typer-exception-explosion_corrected_typer_echo.py
def load_json_file(file_path: Path) -> dict:
    """Load JSON from file.

    Raises:
        AppExit: If file cannot be loaded or parsed
    """
    contents = file_path.read_text(encoding="utf-8")  # Let FileNotFoundError bubble

    try:
        return json.loads(contents)
    except json.JSONDecodeError as e:
        # Only catch where we can add meaningful context
        raise AppExit(
            code=1,
            message=f"Invalid JSON in {file_path} at line {e.lineno}, column {e.colno}: {e.msg}"
        ) from e
```

**Key principles:**

- Helper functions let exceptions bubble naturally
- Only catch at points where you have enough context for a good error message
- Immediately raise `AppExit` - don't re-wrap multiple times
- Use `from e` to preserve the chain for debugging

### Pattern 2: Custom Exit Exception with Rich Console

**Full example:** [nested-typer-exception-explosion_corrected_rich_console.py](./nested-typer-exceptions/nested-typer-exception-explosion_corrected_rich_console.py)

For applications using Rich for output:

```python
# From: nested-typer-exception-explosion_corrected_rich_console.py
from rich.console import Console
import typer

normal_console = Console()
err_console = Console(stderr=True)


class AppExitRich(typer.Exit):
    """Custom exception using Rich console for consistent formatting."""

    def __init__(
        self,
        code: int | None = None,
        message: str | None = None,
        console: Console = normal_console,
    ):
        self.code = code
        self.message = message
        if message is not None:
            console.print(self.message)
        super().__init__(code=code)
```

**Usage:**

```python
# From: nested-typer-exception-explosion_corrected_rich_console.py
def validate_config(data: dict) -> dict:
    """Validate config structure.

    Raises:
        AppExitRich: If validation fails
    """
    if not data:
        raise AppExitRich(code=1, message="Config cannot be empty", console=err_console)
    if not isinstance(data, dict):
        raise AppExitRich(
            code=1,
            message=f"Config must be a JSON object, got {type(data)}",
            console=err_console,
        )
    return data
```

## Complete Example: Correct Pattern

**Full example:** [nested-typer-exception-explosion_corrected_typer_echo.py](./nested-typer-exceptions/nested-typer-exception-explosion_corrected_typer_echo.py)

```python
# From: nested-typer-exception-explosion_corrected_typer_echo.py
#!/usr/bin/env -S uv run --quiet --script
# /// script
# requires-python = ">=3.11"
# dependencies = ["typer>=0.19.2"]
# ///
import json
from pathlib import Path
from typing import Annotated

import typer

app = typer.Typer()


class AppExit(typer.Exit):
    """Custom exception for graceful exits with user-friendly messages."""

    def __init__(self, code: int | None = None, message: str | None = None):
        if message is not None:
            if code is None or code == 0:
                typer.echo(message)
            else:
                typer.echo(message, err=True)
        super().__init__(code=code)


# Helper functions - let exceptions bubble naturally
def read_file_contents(file_path: Path) -> str:
    """Read file contents.

    Raises:
        FileNotFoundError: If file doesn't exist
        PermissionError: If file isn't readable
    """
    return file_path.read_text(encoding="utf-8")


def parse_json_string(content: str) -> dict:
    """Parse JSON string.

    Raises:
        json.JSONDecodeError: If JSON is invalid
    """
    return json.loads(content)


# Only catch where we add meaningful context
def load_json_file(file_path: Path) -> dict:
    """Load and parse JSON file.

    Raises:
        AppExit: If file cannot be loaded or parsed
    """
    contents = read_file_contents(file_path)
    try:
        return parse_json_string(contents)
    except json.JSONDecodeError as e:
        raise AppExit(
            code=1,
            message=f"Invalid JSON in {file_path} at line {e.lineno}, column {e.colno}: {e.msg}"
        ) from e


def validate_config(data: dict, source: str) -> dict:
    """Validate config structure.

    Raises:
        AppExit: If validation fails
    """
    if not data:
        raise AppExit(code=1, message="Config cannot be empty")
    if not isinstance(data, dict):
        raise AppExit(code=1, message=f"Config must be a JSON object, got {type(data)}")
    return data


def load_config(file_path: Path) -> dict:
    """Load and validate configuration.

    Raises:
        AppExit: If config cannot be loaded or is invalid
    """
    try:
        data = load_json_file(file_path)
    except (FileNotFoundError, PermissionError) as e:
        raise AppExit(code=1, message=f"Failed to load config from {file_path}") from e
    return validate_config(data, str(file_path))


@app.command()
def main(config_file: Annotated[Path, typer.Argument()]) -> None:
    """Load and process configuration file."""
    config = load_config(config_file)
    typer.echo(f"Config loaded successfully: {config}")


if __name__ == "__main__":
    app()
```

## Output Comparison

### Anti-Pattern Output (220 lines)

```text
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ ... json.loads() ... │
│ ... 40 lines of traceback ... │
╰──────────────────────────────────────────────────────────────────────────────╯
JSONDecodeError: Expecting value: line 1 column 1 (char 0)

The above exception was the direct cause of the following exception:

╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ ... parse_json_string() ... │
│ ... 40 lines of traceback ... │
╰──────────────────────────────────────────────────────────────────────────────╯
ConfigError: Invalid JSON in broken.json at line 1, column 1: Expecting value

The above exception was the direct cause of the following exception:

[... 4 more layers of this ...]
```

### Correct Pattern Output (1 line)

```text
Invalid JSON in broken.json at line 1, column 1: Expecting value
```

## Rules for Exception Handling in Typer CLIs

### ✅ DO

1. **Let exceptions propagate in helper functions** - Most functions should not have try/except
2. **Catch only where you add meaningful context** - JSON parsing, validation, etc.
3. **Immediately raise AppExit** - Don't re-wrap multiple times
4. **Use custom exception classes** - Inherit from `typer.Exit` and handle output in `__init__`
5. **Document what exceptions bubble up** - Use docstring "Raises:" sections
6. **Use `from e` when wrapping** - Preserves the exception chain for debugging

### ❌ DON'T

1. **NEVER catch and re-wrap at every layer** - This creates exception chain explosion
2. **NEVER use `except Exception as e:` as a safety net** - Too broad, catches things you can't handle
3. **NEVER check `isinstance` to avoid double-wrapping** - This is a symptom you're doing it wrong
4. **NEVER convert exceptions to return values** - Use exceptions, not `{"success": False, "error": "..."}` patterns
5. **NEVER catch exceptions you can't handle** - Let them propagate

## When to Catch Exceptions

**Catch when:**

- You can add meaningful context (filename, line number, etc.)
- You're at a validation boundary and can provide specific feedback
- You need to convert a technical error to a user-friendly message

**Don't catch when:**

- You're just going to re-raise it
- You can't add any useful information
- You're in a helper function that just transforms data

## Fail Fast by Default

**What you DON'T want:**

- ❌ Nested try/except that re-raise with redundant messages
- ❌ Bare exception catching (`except Exception:`)
- ❌ Graceful degradation without requirements
- ❌ Failover/fallback logic without explicit need
- ❌ "Defensive" catch-all handlers that mask problems

**What IS fine:**

- ✅ Let exceptions propagate naturally
- ✅ Add try/except only where recovery is actually needed
- ✅ Validation at boundaries (user input, external APIs)
- ✅ Clear, specific exception types
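
A compact, illustrative contrast of the two styles (hypothetical `load_data` helpers, not from the demonstration scripts):

```python
import json
from pathlib import Path


# DON'T: defensive catch-all that masks the real failure
def load_data_defensive(path: str) -> dict | None:
    try:
        return json.loads(Path(path).read_text())
    except Exception:
        return None  # caller can no longer tell what went wrong or why


# DO: let errors propagate; the traceback points at the real cause,
# and a caller with context can convert it into a clean AppExit
def load_data(path: str) -> dict:
    return json.loads(Path(path).read_text())
```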

## Reference: Typer Documentation

Official Typer guidance on exits and exceptions:

- [Terminating](https://github.com/fastapi/typer/blob/master/docs/tutorial/terminating.md)
- [Exceptions](https://github.com/fastapi/typer/blob/master/docs/tutorial/exceptions.md)
- [Printing](https://github.com/fastapi/typer/blob/master/docs/tutorial/printing.md)

## Demonstration Scripts

See [assets/nested-typer-exceptions/](./nested-typer-exceptions/) for complete working examples.

**Quick start:** See [README.md](./nested-typer-exceptions/README.md) for script overview and running instructions.

### [nested-typer-exception-explosion.py](./nested-typer-exceptions/nested-typer-exception-explosion.py) - The Anti-Pattern

**What you'll find:**

- Complete executable script demonstrating 7 layers of exception wrapping
- Every function catches exceptions and re-wraps with `from e`
- Creates a ConfigError custom exception at each layer
- No isinstance checks - pure exception chain explosion

**What happens when you run it:**

- A single JSON parsing error generates ~220 lines of output
- 7 separate Rich-formatted traceback blocks
- "The above exception was the direct cause of the following exception" repeated 6 times
- Obscures the actual error (invalid JSON) in pages of traceback

**Run it:** `./nested-typer-exception-explosion.py broken.json`

### [nested-typer-exception-explosion_naive_workaround.py](./nested-typer-exceptions/nested-typer-exception-explosion_naive_workaround.py) - The isinstance Band-Aid

**What you'll find:**

- Same 7-layer structure as the explosion example
- Each `except Exception as e:` block has `if isinstance(e, ConfigError): raise` checks
- Shows how AI attempts to avoid double-wrapping by checking exception type
- Treats the symptom (double-wrapping) instead of the cause (catching everywhere)

**What happens when you run it:**

- Still shows nested tracebacks but slightly reduced output (~80 lines)
- Demonstrates why isinstance checks appear in AI-generated code
- Shows this is a workaround, not a solution

**Run it:** `./nested-typer-exception-explosion_naive_workaround.py broken.json`

### [nested-typer-exception-explosion_corrected_typer_echo.py](./nested-typer-exceptions/nested-typer-exception-explosion_corrected_typer_echo.py) - Correct Pattern with typer.echo

**What you'll find:**

- Custom `AppExit` class extending `typer.Exit` that calls `typer.echo()` in `__init__`
- Helper functions that let exceptions bubble naturally (no try/except)
- Only catches at specific points where meaningful context can be added
- Immediately raises `AppExit` - no re-wrapping through multiple layers
- Complete executable example with PEP 723 metadata

**What happens when you run it:**

- Clean 1-line error message: `Invalid JSON in broken.json at line 1, column 1: Expecting value`
- No traceback explosion
- User-friendly output using typer.echo for stderr

**Run it:** `./nested-typer-exception-explosion_corrected_typer_echo.py broken.json`

### [nested-typer-exception-explosion_corrected_rich_console.py](./nested-typer-exceptions/nested-typer-exception-explosion_corrected_rich_console.py) - Correct Pattern with Rich Console

**What you'll find:**

- Custom `AppExitRich` class extending `typer.Exit` that calls `console.print()` in `__init__`
- Same exception bubbling principles as the typer.echo version
- Uses Rich Console for consistent formatting with the rest of the CLI
- Allows passing different console instances (normal_console vs err_console)

**What happens when you run it:**

- Same clean 1-line output as the typer.echo version
- Uses the Rich console for output instead of typer.echo
- Demonstrates the pattern for apps already using Rich for terminal output

**Run it:** `./nested-typer-exception-explosion_corrected_rich_console.py broken.json`

### Running the Examples

All scripts use PEP 723 inline script metadata and can be run directly:

```bash
# Run any script directly (uv handles dependencies automatically)
./script-name.py broken.json

# Or explicitly with uv
uv run script-name.py broken.json
```

The scripts will create `broken.json` if it doesn't exist.

skills/python3-development/references/modern-modules.md
(File diff suppressed because it is too large)

---
title: "GitPython: Python Library for Git Repository Interaction"
library_name: GitPython
pypi_package: GitPython
category: version_control
python_compatibility: "3.7+"
last_updated: "2025-11-02"
official_docs: "https://gitpython.readthedocs.io"
official_repository: "https://github.com/gitpython-developers/GitPython"
maintenance_status: "stable"
---

# GitPython: Python Library for Git Repository Interaction

## Official Information

### Repository and Package Details

- **Official Repository**: <https://github.com/gitpython-developers/GitPython> @[github.com]
- **PyPI Package**: `GitPython` @[pypi.org]
- **Current Version**: 3.1.45 (as of 2025-11-02, the research date) @[pypi.org]
- **Official Documentation**: <https://gitpython.readthedocs.io/> @[readthedocs.org]
- **License**: 3-Clause BSD License (New BSD License) @[github.com/LICENSE]

### Maintenance Status

The project is in **maintenance mode** as of 2025 @[github.com/README.md]:

- No active feature development unless contributed by the community
- Bug fixes limited to safety-critical issues or community contributions
- Response times up to one month for issues
- Open to contributions and new maintainers
- Widely used and actively maintained by the community

### Version Requirements

- **Python Support**: Python >= 3.7 @[setup.py]
- **Explicit Compatibility**: Python 3.7, 3.8, 3.9, 3.10, 3.11, 3.12 @[setup.py]
- **Python 3.13-3.14**: Not explicitly tested but likely compatible given 3.12 support
- **Git Version**: Git 1.7.x or newer required @[README.md]
- **System Requirement**: Git executable must be installed and available in PATH

## Core Purpose

### Problem Statement

GitPython solves the challenge of programmatically interacting with Git repositories from Python without manually parsing git command output or managing subprocess calls @[Context7]:

1. **Abstraction over Git CLI**: Provides high-level (porcelain) and low-level (plumbing) interfaces to Git operations
2. **Object-Oriented Access**: Represents Git objects (commits, trees, blobs, tags) as Python objects
3. **Repository Automation**: Enables automation of repository management, analysis, and manipulation
4. **Mining Software Repositories**: Facilitates extraction of repository metadata for analysis

### When to Use GitPython

**Use GitPython when you need to:**

- Access Git repository metadata programmatically (commits, branches, tags)
- Traverse commit history with complex filtering
- Analyze repository structure and content
- Automate repository operations in Python applications
- Build tools for repository mining or analysis
- Inspect repository state without manual git command parsing
- Work with Git objects (trees, blobs) programmatically

### What Would Be "Reinventing the Wheel"

Without GitPython, you would need to @[github.com/README.md]:

- Manually execute `git` commands via `subprocess`
- Parse git command output (often text-based)
- Handle edge cases in output formatting
- Manage object relationships manually
- Implement caching and optimization
- Handle cross-platform differences in git output
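
For contrast, a hedged sketch of what the subprocess route looks like next to the GitPython equivalent (function names are illustrative; the `git log` format string is standard):

```python
import subprocess

from git import Repo


def last_commit_author_subprocess(repo_path: str) -> str:
    """Manual route: invoke git and parse its text output."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "-1", "--format=%an <%ae>"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def last_commit_author_gitpython(repo_path: str) -> str:
    """GitPython route: read typed attributes off the commit object."""
    commit = Repo(repo_path).head.commit
    return f"{commit.author.name} <{commit.author.email}>"
```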

## Real-World Usage Examples

### Example Projects Using GitPython

1. **PyDriller** (908+ stars) - Python framework for mining software repositories @[github.com/ishepard/pydriller]
   - Analyzes Git repositories to extract commits, developers, modifications, diffs
   - Provides an abstraction layer over GitPython for research purposes

2. **Kivy Designer** (837+ stars) - UI designer for the Kivy framework @[github.com/kivy/kivy-designer]
   - Uses GitPython for version control integration in the IDE

3. **GithubCloner** (419+ stars) - Clones GitHub repositories of users and organizations @[github.com/mazen160/GithubCloner]
   - Leverages GitPython for batch repository cloning

4. **git-story** (256+ stars) - Creates video animations of Git commit history @[github.com/initialcommit-com/git-story]
   - Uses GitPython to traverse commit history for visualization

5. **Dulwich** (2168+ stars) - Pure-Python Git implementation @[github.com/jelmer/dulwich]
   - Alternative to GitPython with a pure-Python implementation

### Common Usage Patterns

#### Pattern 1: Repository Initialization and Cloning

```python
from git import Repo

# Clone repository
repo = Repo.clone_from('https://github.com/user/repo.git', '/local/path')

# Initialize new repository
repo = Repo.init('/path/to/new/repo')

# Open existing repository
repo = Repo('/path/to/existing/repo')
```

@[Context7/tutorial.rst]

#### Pattern 2: Accessing Repository State

```python
from git import Repo

repo = Repo('/path/to/repo')

# Get active branch
active_branch = repo.active_branch

# Check repository status
is_modified = repo.is_dirty()
untracked = repo.untracked_files

# Access HEAD commit
latest_commit = repo.head.commit
```

@[Context7/tutorial.rst]

#### Pattern 3: Commit Operations

```python
from git import Repo

repo = Repo('/path/to/repo')

# Stage files
repo.index.add(['file1.txt', 'file2.py'])

# Create commit
repo.index.commit('Commit message')

# Access commit metadata
commit = repo.head.commit
print(commit.author.name)
print(commit.authored_datetime)
print(commit.message)
print(commit.hexsha)
```

@[Context7/tutorial.rst]

#### Pattern 4: Branch Management

```python
from git import Repo

repo = Repo('/path/to/repo')

# List all branches
branches = repo.heads

# Create new branch
new_branch = repo.create_head('feature-branch')

# Checkout branch (safer method)
repo.git.checkout('branch-name')

# Access branch commit
commit = repo.heads.main.commit
```

@[Context7/tutorial.rst]

#### Pattern 5: Traversing Commit History

```python
from git import Repo

repo = Repo('/path/to/repo')

# Iterate through commits
for commit in repo.iter_commits('main', max_count=50):
    print(f"{commit.hexsha[:7]}: {commit.summary}")

# Get commits for a specific file
commits = repo.iter_commits(paths='specific/file.py')

# Access commit tree and changes
for commit in repo.iter_commits():
    for file in commit.stats.files:
        print(f"{file} changed in {commit.hexsha[:7]}")
```

@[Context7/tutorial.rst]

## Integration Patterns

### Repository Management Pattern

GitPython provides abstractions for repository operations @[Context7/tutorial.rst]:

- **Repo Object**: Central interface to the repository
- **References**: Branches (heads), tags, remotes
- **Index**: Staging area for commits
- **Configuration**: Repository and global Git config access

### Automation Patterns

#### CI/CD Integration

```python
from git import Repo


def deploy_on_commit():
    repo = Repo('/app/source')

    # Fetch latest changes
    origin = repo.remotes.origin
    origin.pull()

    # Check if deployment needed
    # (last_deployed_commit and trigger_deployment() are placeholders
    # for application-specific state and logic)
    if repo.head.commit != last_deployed_commit:
        trigger_deployment()
```

#### Repository Analysis

```python
from git import Repo
from collections import defaultdict


def analyze_contributors(repo_path):
    repo = Repo(repo_path)
    contributions = defaultdict(int)

    for commit in repo.iter_commits():
        contributions[commit.author.email] += 1

    return dict(contributions)
```

#### Automated Tagging

```python
from git import Repo


def create_version_tag(version):
    repo = Repo('.')
    repo.create_tag(f'v{version}', message=f'Release {version}')
    repo.remotes.origin.push(f'v{version}')
```

## Python Version Compatibility

### Verified Compatibility

- **Python 3.7-3.12**: Fully supported and tested @[setup.py]
- **Python 3.13-3.14**: Not explicitly tested but should work (no breaking changes identified)

### Dependency Requirements

GitPython requires @[README.md]:

- `gitdb` package for Git object database operations
- `git` executable (system dependency)
- Compatible with all major operating systems (Linux, macOS, Windows)

### Platform Considerations

- **Windows**: Some limitations noted in Issue #525 @[README.md]
- **Unix-like systems**: Full feature support
- **Git Version**: Requires Git 1.7.x or newer

## Usage Examples from Documentation

### Repository Initialization

```python
from git import Repo

# Initialize working directory repository
repo = Repo("/path/to/repo")

# Initialize bare repository
repo = Repo("/path/to/bare/repo", bare=True)
```

@[Context7/tutorial.rst]

### Working with Commits and Trees

```python
from git import Repo

repo = Repo('.')

# Get latest commit
commit = repo.head.commit

# Access commit tree
tree = commit.tree

# Get tree from repository directly
repo_tree = repo.tree()

# Navigate tree structure
for item in tree:
    print(f"{item.type}: {item.name}")
```

@[Context7/tutorial.rst]

### Diffing Operations

```python
from git import Repo

repo = Repo('.')
commit = repo.head.commit

# Diff commit against working tree
diff_worktree = commit.diff(None)

# Diff between commits
prev_commit = commit.parents[0]
diff_commits = prev_commit.diff(commit)

# Iterate through changes
for diff_item in diff_worktree:
    print(f"{diff_item.change_type}: {diff_item.a_path}")
```

@[Context7/changes.rst]

### Remote Operations

```python
from git import Repo, RemoteProgress


class ProgressPrinter(RemoteProgress):
    def update(self, op_code, cur_count, max_count=None, message=''):
        print(f"Progress: {cur_count}/{max_count}")


repo = Repo('/path/to/repo')
origin = repo.remotes.origin

# Fetch with progress
origin.fetch(progress=ProgressPrinter())

# Pull changes
origin.pull()

# Push changes
origin.push()
```

@[Context7/tutorial.rst]

## When NOT to Use GitPython

### Performance-Critical Operations

- **Large repositories**: GitPython can be slow on very large repos
- **Bulk operations**: Consider the `git` CLI directly for batch operations
- **Resource-constrained environments**: GitPython can leak resources in long-running processes

### Long-Running Processes

GitPython is **not suited for daemons or long-running processes** @[README.md]:

- Resource leakage issues due to `__del__` method implementations
- Written before deterministic destructors became unreliable
- **Mitigation**: Factor GitPython into a separate process that can be periodically restarted
- **Alternative**: Manually call `__del__` methods when appropriate

### Simple Git Commands

When you only need simple git operations:

- **Single command execution**: Use `subprocess.run(['git', 'status'])` directly
- **Shell scripting**: Pure git commands may be simpler
- **One-off operations**: GitPython overhead not justified

### Pure Python Requirements

If you cannot have system dependencies:

- GitPython **requires the git executable** installed on the system
- Consider **Dulwich** (a pure-Python Git implementation) instead

## Decision Guidance: GitPython vs Subprocess

### Use GitPython When

| Scenario                     | Reason                                   |
| ---------------------------- | ---------------------------------------- |
| Complex repository traversal | Object-oriented API simplifies iteration |
| Accessing Git objects        | Direct access to trees, blobs, commits   |
| Repository analysis          | Rich metadata without parsing            |
| Cross-platform code          | Abstracts platform differences           |
| Multiple related operations  | Maintains repository context             |
| Building repository tools    | Higher-level abstractions                |
| Need type hints              | GitPython provides typed interfaces      |

### Use Subprocess When

| Scenario                  | Reason                                 |
| ------------------------- | -------------------------------------- |
| Single git command        | Less overhead                          |
| Performance critical      | Direct execution faster                |
| Long-running daemon       | Avoid resource leaks                   |
| Simple automation         | Shell script may be clearer            |
| Git plumbing commands     | Some commands not exposed in GitPython |
| Very large repositories   | Lower memory footprint                 |
| Custom git configurations | Full control over git execution        |

### Decision Matrix

```python
# USE GITPYTHON:
# - Iterate commits with filtering
for commit in repo.iter_commits('main', max_count=100):
    if commit.author.email == 'specific@email.com':
        analyze_commit(commit)

# USE SUBPROCESS:
# - Simple status check
result = subprocess.run(['git', 'status', '--short'],
                        capture_output=True, text=True)
if 'M' in result.stdout:
    print("Modified files detected")

# USE GITPYTHON:
# - Repository state analysis
if repo.is_dirty(untracked_files=True):
    staged = repo.index.diff("HEAD")
    unstaged = repo.index.diff(None)

# USE SUBPROCESS:
# - Performance-critical bulk operation
subprocess.run(['git', 'gc', '--aggressive'])
```

## Critical Limitations

### Resource Leakage @[README.md]

GitPython tends to leak system resources in long-running processes:

- Destructors (`__del__`) no longer run deterministically in modern Python
- Manually call cleanup methods or use the separate-process approach
- Not recommended for daemon applications
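
A minimal sketch of bounding resource lifetimes explicitly, assuming the context-manager support documented for `Repo`:

```python
from git import Repo

# close() releases cached resources deterministically instead of
# relying on __del__ running at garbage-collection time.
with Repo("/path/to/repo") as repo:
    print(repo.head.commit.hexsha)
# Repository resources are released here.
```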

### Windows Support @[README.md]

Known limitations on Windows platform:

- See Issue #525 for details
- Some operations may behave differently

### Git Executable Dependency @[README.md]

GitPython requires git to be installed:

- Must be in PATH or specified via `GIT_PYTHON_GIT_EXECUTABLE` environment variable
- Cannot work in pure-Python environments
- Version requirement: Git 1.7.x or newer
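
A hedged sketch of pointing GitPython at a specific git binary (the path is illustrative; consult the README for the exact `refresh` semantics):

```python
import git

# Option 1: set GIT_PYTHON_GIT_EXECUTABLE in the environment before
# GitPython first locates the executable.
# Option 2: refresh with an explicit path at runtime.
git.refresh(path="/usr/local/bin/git")
```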

## Installation

### Standard Installation

```bash
pip install GitPython
```

### Development Installation

```bash
git clone https://github.com/gitpython-developers/GitPython
cd GitPython
./init-tests-after-clone.sh
pip install -e ".[test]"
```

@[README.md]

## Testing and Quality

### Running Tests

```bash
# Install test dependencies
pip install -e ".[test]"

# Run tests
pytest

# Run linting
pre-commit run --all-files

# Type checking
mypy
```

@[README.md]

### Configuration

- Test configuration in `pyproject.toml`
- Supports pytest, coverage.py, ruff, mypy
- CI via GitHub Actions and tox

## Community and Support

### Getting Help

- **Documentation**: <https://gitpython.readthedocs.io/>
- **Stack Overflow**: Use the `gitpython` tag @[README.md]
- **Issue Tracker**: <https://github.com/gitpython-developers/GitPython/issues>

### Contributing

- Project accepts contributions of all kinds
- Seeking new maintainers
- Response time: up to 1 month for issues @[README.md]

### Related Projects

- **Gitoxide**: Rust implementation of Git by the original GitPython author @[README.md]
- **Dulwich**: Pure-Python Git implementation
- **PyDriller**: Framework for mining software repositories built on GitPython

## Summary

GitPython provides a mature, well-documented Python interface to Git repositories. While in maintenance mode, it remains widely used and community-supported. It is best suited for repository analysis, automation, and tools where the convenience of object-oriented access outweighs performance concerns. For simple operations or long-running processes, consider subprocess or alternative approaches.

**Key Takeaway**: Use GitPython when the complexity of repository operations justifies the abstraction layer and resource overhead. Use subprocess for simple, one-off git commands or in resource-sensitive environments.

skills/python3-development/references/modern-modules/arrow.md

---
title: "Arrow - Better Dates & Times for Python"
library_name: arrow
pypi_package: arrow
category: datetime
python_compatibility: "3.8+"
last_updated: "2025-11-02"
official_docs: "https://arrow.readthedocs.io"
official_repository: "https://github.com/arrow-py/arrow"
maintenance_status: "active"
---

# Arrow - Better Dates & Times for Python

## Core Purpose

Arrow provides a sensible, human-friendly approach to creating, manipulating, formatting, and converting dates, times, and timestamps. It addresses critical usability problems in Python's standard datetime ecosystem:

**Problems Arrow Solves:**

- **Module fragmentation**: Eliminates the need to import datetime, time, calendar, dateutil, pytz separately
- **Type complexity**: Provides a single Arrow type instead of managing date, time, datetime, tzinfo, timedelta, relativedelta
- **Timezone verbosity**: Simplifies timezone-aware operations that are cumbersome with the standard library
- **Missing functionality**: Built-in ISO 8601 parsing, humanization, and time span operations
- **Timezone naivety**: UTC-aware by default, preventing common timezone bugs

Arrow is a **drop-in replacement for datetime** that consolidates scattered tools into a unified, elegant interface.
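
A quick illustration of the consolidation (a sketch; zoneinfo stands in for pytz on the stdlib route):

```python
# Standard library route: several modules for one small workflow
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

stdlib_dt = datetime.now(ZoneInfo("US/Pacific")) + timedelta(days=3)

# Arrow route: one import, one chain
import arrow

arrow_dt = arrow.now("US/Pacific").shift(days=3)
```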

## When to Use Arrow

### Use Arrow When

1. **Building user-facing applications** that display relative times ("2 hours ago", "in 3 days")
2. **Working extensively with timezones** - converting between zones, handling DST transitions
3. **Parsing diverse datetime formats** - ISO 8601, timestamps, custom formats
4. **Need cleaner, more readable code** - Arrow's chainable API reduces boilerplate
5. **Generating time ranges or spans** - iterate over hours, days, weeks, months
6. **Internationalization is required** - 75+ locale support for humanized output
7. **API development** where timezone-aware timestamps are standard
8. **Data processing pipelines** that handle datetime transformations frequently

### Use Standard datetime When

1. **Performance is absolutely critical** - Arrow is ~50% slower than datetime.utcnow() @benchmark
2. **Minimal datetime operations** - simple date storage with no manipulation
3. **Library compatibility requirements** mandate standard datetime objects
4. **Memory-constrained environments** - datetime objects have a smaller footprint
5. **Working within pandas/numpy**, which have optimized datetime64 types
6. **No timezone logic needed** and you're comfortable with datetime's API
## Real-World Usage Patterns
|
||||
|
||||
### Pattern 1: Timezone-Aware Timestamp Creation
|
||||
|
||||
@source: <https://arrow.readthedocs.io/en/latest/>
|
||||
|
||||
```python
|
||||
import arrow
|
||||
|
||||
# Get current time in UTC (default)
|
||||
utc = arrow.utcnow()
|
||||
# <Arrow [2013-05-11T21:23:58.970460+00:00]>
|
||||
|
||||
# Get current time in specific timezone
|
||||
local = arrow.now('US/Pacific')
|
||||
# <Arrow [2013-05-11T13:23:58.970460-07:00]>
|
||||
|
||||
# Convert between timezones effortlessly
|
||||
utc_time = arrow.utcnow()
|
||||
tokyo_time = utc_time.to('Asia/Tokyo')
|
||||
ny_time = tokyo_time.to('America/New_York')
|
||||
```
|
||||
|
||||
**Why this matters:** Standard datetime requires verbose pytz.timezone() calls and manual localization. Arrow handles this in one method.
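
For contrast, here is a minimal standard-library sketch of the same conversion; it uses `zoneinfo` (Python 3.9+), so treat it as an illustrative equivalent rather than part of Arrow's API:

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # stdlib on Python 3.9+

# Standard library: build an aware datetime, then convert step by step
utc_dt = datetime.now(ZoneInfo("UTC"))
tokyo_dt = utc_dt.astimezone(ZoneInfo("Asia/Tokyo"))

# Arrow collapses the same dance into one chainable expression
import arrow
tokyo = arrow.utcnow().to('Asia/Tokyo')
```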
|
||||
|
||||
### Pattern 2: Parsing Diverse Formats
|
||||
|
||||
@source: Context7 documentation snippets
|
||||
|
||||
```python
|
||||
import arrow
|
||||
|
||||
# Parse ISO 8601 automatically
|
||||
arrow.get('2013-05-11T21:23:58.970460+07:00')
|
||||
|
||||
# Parse with format string
|
||||
arrow.get('2013-05-05 12:30:45', 'YYYY-MM-DD HH:mm:ss')
|
||||
|
||||
# Parse Unix timestamps (int or float)
|
||||
arrow.get(1368303838)
|
||||
arrow.get(1565358758.123413)
|
||||
|
||||
# Parse with timezone
|
||||
arrow.get('2013-05-11T21:23:58', tzinfo='Europe/Paris')
|
||||
|
||||
# Handle inconsistent spacing
|
||||
arrow.get('Jun 1 2005 1:33PM', 'MMM D YYYY H:mmA', normalize_whitespace=True)
|
||||
|
||||
# Parse ISO week dates
|
||||
arrow.get('2013-W29-6', 'W') # Year-Week-Day format
|
||||
```
|
||||
|
||||
**Why this matters:** datetime.strptime() requires exact format matching. Arrow intelligently handles variations and timezone strings directly.
|
||||
|
||||
### Pattern 3: Humanization for User Interfaces
|
||||
|
||||
@source: <https://arrow.readthedocs.io/en/latest/>
|
||||
|
||||
```python
|
||||
import arrow
|
||||
|
||||
now = arrow.utcnow()
|
||||
past = now.shift(hours=-1)
|
||||
future = now.shift(days=3, hours=2)
|
||||
|
||||
# English humanization
|
||||
past.humanize() # 'an hour ago'
|
||||
future.humanize() # 'in 3 days'
|
||||
|
||||
# Localized humanization (75+ locales)
|
||||
past.humanize(locale='ko-kr') # '한시간 전'
|
||||
past.humanize(locale='es') # 'hace una hora'
|
||||
|
||||
# Multiple granularities
|
||||
later = arrow.utcnow().shift(hours=2, minutes=19)
|
||||
later.humanize(granularity=['hour', 'minute'])
|
||||
# 'in 2 hours and 19 minutes'
|
||||
|
||||
# Quarter granularity (business applications)
|
||||
four_months = now.shift(months=4)
|
||||
four_months.humanize(granularity='quarter') # 'in a quarter'
|
||||
```
|
||||
|
||||
**Why this matters:** Building this with datetime requires third-party libraries or manual logic. Arrow includes it with extensive locale support.
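
To see what Arrow saves you, here is a sketch of the hand-rolled helper it replaces - a hypothetical `crude_humanize` with no locale or granularity support:

```python
from datetime import datetime, timezone

def crude_humanize(dt: datetime) -> str:
    """Hand-rolled 'time ago' logic - the kind of code Arrow makes unnecessary."""
    seconds = (datetime.now(timezone.utc) - dt).total_seconds()
    if seconds < 3600:
        return f"{int(seconds // 60)} minutes ago"
    if seconds < 86400:
        return f"{int(seconds // 3600)} hours ago"
    return f"{int(seconds // 86400)} days ago"
```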
|
||||
|
||||
### Pattern 4: Time Shifting and Manipulation
|
||||
|
||||
@source: Context7 documentation snippets
|
||||
|
||||
```python
|
||||
import arrow
|
||||
|
||||
now = arrow.utcnow()
|
||||
|
||||
# Relative shifts (chainable)
|
||||
future = now.shift(years=1, months=-2, days=5, hours=3)
|
||||
past = now.shift(weeks=-2)
|
||||
|
||||
# Dehumanize - parse human phrases
|
||||
base = arrow.get('2020-05-27 10:30:35')
|
||||
base.dehumanize('8 hours ago')
|
||||
base.dehumanize('in 4 days')
|
||||
base.dehumanize('hace 2 años', locale='es') # Spanish: "2 years ago"
|
||||
|
||||
# Replace specific components
|
||||
now.replace(hour=0, minute=0, second=0) # Start of day
|
||||
now.replace(year=2025)
|
||||
```
|
||||
|
||||
**Why this matters:** timedelta only supports days/seconds. dateutil.relativedelta is verbose. Arrow combines both with intuitive API.
|
||||
|
||||
### Pattern 5: Time Ranges and Spans
|
||||
|
||||
@source: Context7 documentation snippets
|
||||
|
||||
```python
|
||||
import arrow
|
||||
from datetime import datetime
|
||||
|
||||
# Generate time ranges
|
||||
start = arrow.get(2020, 5, 5, 12, 30)
|
||||
end = arrow.get(2020, 5, 5, 17, 15)
|
||||
|
||||
# Iterate by hour
|
||||
for hour in arrow.Arrow.range('hour', start, end):
|
||||
print(hour)
|
||||
|
||||
# Get floor and ceiling (span)
|
||||
now = arrow.utcnow()
|
||||
now.span('hour') # Returns (floor, ceiling) tuple
|
||||
now.floor('hour') # Start of current hour
|
||||
now.ceil('day') # End of current day
|
||||
|
||||
# Span ranges - generate (start, end) tuples
|
||||
for span in arrow.Arrow.span_range('hour', start, end):
|
||||
floor, ceiling = span
|
||||
print(f"Hour from {floor} to {ceiling}")
|
||||
|
||||
# Handle DST transitions correctly
|
||||
before_dst = arrow.get('2018-03-10 23:00:00', tzinfo='US/Pacific')
|
||||
after_dst = arrow.get('2018-03-11 04:00:00', tzinfo='US/Pacific')
|
||||
for t in arrow.Arrow.range('hour', before_dst, after_dst):
|
||||
print(f"{t} (UTC: {t.to('UTC')})")
|
||||
```
|
||||
|
||||
**Why this matters:** Standard datetime has no built-in iteration. Arrow handles DST transitions automatically in ranges.
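
For comparison, a minimal sketch of the manual loop you would write without Arrow - note that it steps naively and ignores DST:

```python
from datetime import datetime, timedelta

# Manual equivalent of arrow.Arrow.range('hour', start, end)
start = datetime(2020, 5, 5, 12, 30)
end = datetime(2020, 5, 5, 17, 15)

t = start
while t <= end:
    print(t)
    t += timedelta(hours=1)  # naive stepping; DST shifts need extra handling
```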
|
||||
|
||||
### Pattern 6: Formatting with Built-in Constants
|
||||
|
||||
@source: Context7 documentation snippets
|
||||
|
||||
```python
|
||||
import arrow
|
||||
|
||||
arw = arrow.utcnow()
|
||||
|
||||
# Use predefined format constants
|
||||
arw.format(arrow.FORMAT_ATOM) # '2020-05-27 10:30:35+00:00'
|
||||
arw.format(arrow.FORMAT_COOKIE) # 'Wednesday, 27-May-2020 10:30:35 UTC'
|
||||
arw.format(arrow.FORMAT_RFC3339) # '2020-05-27 10:30:35+00:00'
|
||||
arw.format(arrow.FORMAT_W3C) # '2020-05-27 10:30:35+00:00'
|
||||
|
||||
# Custom formats with tokens
|
||||
arw.format('YYYY-MM-DD HH:mm:ss ZZ') # '2020-05-27 10:30:35 +00:00'
|
||||
|
||||
# Escape literal text in formats
|
||||
arw.format('YYYY-MM-DD h [h] m') # '2020-05-27 10 h 30'
|
||||
|
||||
# Timestamp formats
|
||||
arw.format('X') # '1590577835' (seconds)
|
||||
arw.format('x') # '1590577835123456' (microseconds)
|
||||
```
|
||||
|
||||
**Why this matters:** datetime.strftime() uses different token syntax (%Y vs YYYY). Arrow uses consistent, JavaScript-inspired tokens.
|
||||
|
||||
## Integration Patterns
|
||||
|
||||
### Works seamlessly with
|
||||
|
||||
**python-dateutil** (required dependency >=2.7.0)
|
||||
|
||||
- Arrow uses dateutil.parser internally for flexible parsing
|
||||
- Timezone objects from dateutil are directly compatible
|
||||
|
||||
**pytz** (optional, for Python <3.9)
|
||||
|
||||
- Arrow accepts pytz timezone objects in `to()` and `tzinfo` parameters
|
||||
- Handles pytz's DST quirks automatically
|
||||
|
||||
**zoneinfo** (Python 3.9+, via backports.zoneinfo on 3.8)
|
||||
|
||||
- Arrow supports ZoneInfo timezone objects natively
|
||||
- Uses tzdata package on Python 3.9+ for timezone database
|
||||
|
||||
**datetime** (standard library)
|
||||
|
||||
- `arrow_obj.datetime` returns standard datetime object
|
||||
- `arrow.get(datetime_obj)` creates Arrow from datetime
|
||||
- Arrow mirrors the datetime interface but is not a datetime subclass - convert with `arrow_obj.datetime` where a true datetime instance is required (see the sketch below)
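
A small round-trip sketch of the interop described above (the `isinstance` results reflect Arrow's wrapper design):

```python
import arrow
from datetime import datetime

a = arrow.utcnow()
dt = a.datetime          # standard, timezone-aware datetime
back = arrow.get(dt)     # round-trip into Arrow

print(isinstance(dt, datetime))  # True
print(isinstance(a, datetime))   # False - Arrow wraps, rather than subclasses
```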
|
||||
|
||||
**pandas** (data analysis)
|
||||
|
||||
```python
|
||||
import arrow
|
||||
import pandas as pd
|
||||
|
||||
# Convert Arrow to pandas Timestamp
|
||||
arrow_time = arrow.utcnow()
|
||||
pd.Timestamp(arrow_time.datetime)
|
||||
|
||||
# Or use Arrow for timezone-aware operations before pandas
|
||||
df['timestamp'] = df['utc_string'].apply(lambda x: arrow.get(x).to('US/Pacific').datetime)
|
||||
```
|
||||
|
||||
**Django/Flask** (web frameworks)
|
||||
|
||||
```python
|
||||
# Django models - store as DateTimeField
|
||||
from django.db import models
|
||||
import arrow
|
||||
|
||||
class Event(models.Model):
|
||||
created_at = models.DateTimeField()
|
||||
|
||||
def save(self, *args, **kwargs):
|
||||
self.created_at = arrow.utcnow().datetime # Convert to datetime
|
||||
super().save(*args, **kwargs)
|
||||
|
||||
@property
|
||||
def created_humanized(self):
|
||||
return arrow.get(self.created_at).humanize()
|
||||
```
|
||||
|
||||
## Python Version Compatibility
|
||||
|
||||
@source: <https://github.com/arrow-py/arrow/blob/master/pyproject.toml>
|
||||
|
||||
- **Minimum:** Python 3.8
- **Tested versions:** 3.8, 3.9, 3.10, 3.11, 3.12, 3.13, 3.14
- **Status:** Production/Stable across all supported versions
|
||||
|
||||
**Version-specific notes:**
|
||||
|
||||
- **Python 3.8**: Requires `backports.zoneinfo==0.2.1` for timezone support
|
||||
- **Python 3.9+**: Uses built-in `zoneinfo` and `tzdata` package
|
||||
- **Python 3.6-3.7**: No longer supported as of Arrow 1.3.0 (EOL Python versions)
|
||||
|
||||
**Dependencies:**
|
||||
|
||||
```text
|
||||
python-dateutil>=2.7.0
|
||||
backports.zoneinfo==0.2.1 # Python <3.9 only
|
||||
tzdata # Python >=3.9 only
|
||||
```
|
||||
|
||||
## Installation
|
||||
|
||||
```bash
|
||||
# Basic installation
|
||||
pip install -U arrow
|
||||
|
||||
# With uv (recommended)
|
||||
uv pip install arrow
|
||||
|
||||
# In pyproject.toml
|
||||
[project]
|
||||
dependencies = [
|
||||
"arrow>=1.3.0",
|
||||
]
|
||||
```
|
||||
|
||||
## When NOT to Use Arrow
|
||||
|
||||
### Scenario 1: High-Performance Timestamp Generation
|
||||
|
||||
@source: <https://www.dataroc.ca/blog/most-performant-timestamp-functions-python>
|
||||
|
||||
**Performance benchmarks show:**
|
||||
|
||||
- `time.time()`: Baseline (fastest)
|
||||
- `datetime.utcnow()`: ~50% slower than time.time()
|
||||
- Arrow operations: Additional overhead for object wrapping
|
||||
|
||||
**Use datetime when:** You're generating millions of timestamps in tight loops (e.g., high-frequency trading, real-time analytics pipelines).
|
||||
|
||||
```python
# High-performance scenario - use standard library
import time

timestamp = time.time()  # Fastest for epoch timestamps

import datetime

# Faster than Arrow for datetime objects. Note: utcnow() is deprecated
# since Python 3.12; prefer datetime.datetime.now(datetime.timezone.utc)
dt = datetime.datetime.utcnow()
```
|
||||
|
||||
### Scenario 2: Working with Pandas/NumPy DateTime
|
||||
|
||||
@source: Performance analysis and library comparisons
|
||||
|
||||
Pandas has highly optimized `datetime64` vectorized operations. Arrow's object-oriented approach doesn't vectorize well.
|
||||
|
||||
**Use pandas when:** Processing large datasets with datetime columns.
|
||||
|
||||
```python
|
||||
import pandas as pd
|
||||
|
||||
# Pandas is optimized for this
|
||||
df['date'] = pd.to_datetime(df['date_string'])
|
||||
df['hour'] = df['date'].dt.hour # Vectorized operation
|
||||
|
||||
# Arrow would require row-by-row operations (slow)
|
||||
# df['hour'] = df['date'].apply(lambda x: arrow.get(x).hour)
|
||||
```
|
||||
|
||||
### Scenario 3: Simple Date Storage
|
||||
|
||||
**Use datetime when:** You only need to store dates with no manipulation:
|
||||
|
||||
```python
|
||||
from datetime import datetime
|
||||
|
||||
# Simple storage - datetime is sufficient
|
||||
user.created_at = datetime.utcnow()
|
||||
```
|
||||
|
||||
### Scenario 4: Library Compatibility Constraints
|
||||
|
||||
Some libraries explicitly require standard `datetime` instances and will reject wrapper types like Arrow. Convert with `.datetime` before passing values in, and always test compatibility.
|
||||
|
||||
### Scenario 5: Memory-Constrained Environments
|
||||
|
||||
Arrow objects carry additional overhead. For millions of cached datetime objects, standard datetime is lighter.
|
||||
|
||||
## Decision Matrix
|
||||
|
||||
| Requirement            | Arrow | datetime | Notes                                               |
| ---------------------- | ----- | -------- | --------------------------------------------------- |
| Timezone conversion    | ✓✓✓   | ✓        | Arrow: one-line. datetime: verbose with pytz        |
| ISO 8601 parsing       | ✓✓✓   | ✓✓       | Arrow: automatic. datetime: fromisoformat() limited |
| Humanization           | ✓✓✓   | ✗        | Arrow: built-in with 75+ locales                    |
| Time ranges/iteration  | ✓✓✓   | ✗        | Arrow: native. datetime: manual loops               |
| Performance (creation) | ✓✓    | ✓✓✓      | datetime ~50% faster                                |
| Performance (parsing)  | ✓✓    | ✓✓✓      | datetime.strptime() faster                          |
| Memory footprint       | ✓✓    | ✓✓✓      | datetime objects lighter                            |
| Learning curve         | ✓✓✓   | ✓✓       | Arrow: more intuitive                               |
| Pandas integration     | ✓     | ✓✓✓      | Use pandas.Timestamp for large data                 |
| Standard library       | ✗     | ✓✓✓      | Arrow: requires installation                        |
| Type hints             | ✓✓✓   | ✓✓✓      | Both have full PEP 484 support                      |
| DST handling           | ✓✓✓   | ✓✓       | Arrow: automatic. datetime: manual                  |

**Legend:** ✓✓✓ Excellent | ✓✓ Good | ✓ Adequate | ✗ Not supported
|
||||
|
||||
## Quick Decision Guide
|
||||
|
||||
```text
START: Do you need datetime functionality?
|
├─ Is performance critical? (>100k ops/sec)
|  └─ YES → Use datetime or time.time()
|
├─ Working with pandas/numpy large datasets?
|  └─ YES → Use pandas.Timestamp
|
├─ Need any of: humanization, easy timezone conversion, time ranges, multi-locale?
|  └─ YES → Use Arrow
|
├─ Simple date storage only?
|  └─ YES → Use datetime
|
└─ Building user-facing application with datetime logic?
   └─ YES → Use Arrow (cleaner code, better UX)
```
|
||||
|
||||
## Common Gotchas and Solutions
|
||||
|
||||
### Gotcha 1: Arrow is timezone-aware by default
|
||||
|
||||
@source: Arrow documentation
|
||||
|
||||
```python
# arrow.get() with no arguments gives UTC time, not local time
arrow.get()  # <Arrow [2020-05-27T10:30:35.123456+00:00]>

# arrow.now() uses the local timezone; pass a zone name to be explicit
arrow.now()  # local time
arrow.now('America/New_York')
```
|
||||
|
||||
### Gotcha 2: Converting to datetime loses Arrow methods
|
||||
|
||||
```python
|
||||
arrow_time = arrow.utcnow()
|
||||
dt = arrow_time.datetime # Now a standard datetime object
|
||||
|
||||
# This works
|
||||
arrow_time.humanize() # ✓
|
||||
|
||||
# This fails
|
||||
dt.humanize() # ✗ AttributeError
|
||||
```
|
||||
|
||||
### Gotcha 3: Timestamp parsing requires format token in 0.15.0+
|
||||
|
||||
@source: Context7 CHANGELOG snippets
|
||||
|
||||
```python
|
||||
# Deprecated (pre-0.15.0)
|
||||
arrow.get("1565358758") # ✗ No longer works
|
||||
|
||||
# Correct (0.15.0+)
|
||||
arrow.get("1565358758", "X") # ✓ Explicit format token
|
||||
arrow.get(1565358758) # ✓ Or pass as int/float directly
|
||||
```
|
||||
|
||||
### Gotcha 4: Ambiguous datetimes during DST transitions
|
||||
|
||||
@source: Context7 documentation
|
||||
|
||||
```python
|
||||
# During DST "fall back", 2 AM occurs twice
|
||||
# Use fold parameter (PEP 495)
|
||||
ambiguous = arrow.Arrow(2017, 10, 29, 2, 0, tzinfo='Europe/Stockholm')
|
||||
ambiguous.fold # 0 (first occurrence)
|
||||
|
||||
# Specify which occurrence
|
||||
second_occurrence = arrow.Arrow(2017, 10, 29, 2, 0, tzinfo='Europe/Stockholm', fold=1)
|
||||
```
|
||||
|
||||
## Alternatives Comparison
|
||||
|
||||
@source: <https://python.libhunt.com/arrow-alternatives>, <https://aboutsimon.com/blog/2016/08/04/datetime-vs-Arrow-vs-Pendulum-vs-Delorean-vs-udatetime.html>
|
||||
|
||||
**Pendulum** (arrow alternative)
|
||||
|
||||
- Similar goals: human-friendly datetime
|
||||
- Better timezone handling in some edge cases
|
||||
- Slower than Arrow in benchmarks
|
||||
- Less widely adopted (fewer GitHub stars)
|
||||
|
||||
**Maya** (Datetimes for Humans)
|
||||
|
||||
- Simpler API, fewer features
|
||||
- Less actively maintained
|
||||
- Good for very basic use cases
|
||||
|
||||
**udatetime** (performance-focused)
|
||||
|
||||
- Written in C for speed (faster than datetime)
|
||||
- Limited feature set (encode/decode only)
|
||||
- Use when you need Arrow-like simplicity with datetime-like speed
|
||||
|
||||
**Standard datetime** (built-in)
|
||||
|
||||
- Always available, no dependencies
|
||||
- Verbose but performant
|
||||
- Use when Arrow features aren't needed
|
||||
|
||||
**dateutil** (datetime extension)
|
||||
|
||||
- Powerful parser, relativedelta for arithmetic
|
||||
- Often used with datetime for enhanced functionality
|
||||
- Arrow uses dateutil internally
|
||||
|
||||
## Real-World Example Projects
|
||||
|
||||
@source: GitHub search results
|
||||
|
||||
**arrow-py/arrow** (8,944 stars)
|
||||
|
||||
- Official repository with comprehensive examples
|
||||
- <https://github.com/arrow-py/arrow>
|
||||
|
||||
**Common usage in web applications:**
|
||||
|
||||
```python
|
||||
# API endpoint returning human-readable timestamps
|
||||
from flask import jsonify
|
||||
import arrow
|
||||
|
||||
@app.route('/events')
|
||||
def get_events():
|
||||
events = Event.query.all()
|
||||
return jsonify([{
|
||||
'id': e.id,
|
||||
'name': e.name,
|
||||
'created': arrow.get(e.created_at).humanize(),
|
||||
'start_time': arrow.get(e.start_time).format('YYYY-MM-DD HH:mm ZZ')
|
||||
} for e in events])
|
||||
```
|
||||
|
||||
**Data processing pipelines:**
|
||||
|
||||
```python
|
||||
import arrow
|
||||
|
||||
def process_log_file(log_path):
|
||||
with open(log_path) as f:
|
||||
for line in f:
|
||||
# Parse diverse timestamp formats
|
||||
timestamp_str = extract_timestamp(line)
|
||||
timestamp = arrow.get(timestamp_str, normalize_whitespace=True)
|
||||
|
||||
# Convert to consistent timezone
|
||||
utc_time = timestamp.to('UTC')
|
||||
|
||||
# Filter by time range
|
||||
if utc_time >= arrow.get('2025-01-01'):
|
||||
yield utc_time, line
|
||||
```
|
||||
|
||||
## References and Sources
|
||||
|
||||
- @official_docs: <https://arrow.readthedocs.io/en/latest/>
- @repository: <https://github.com/arrow-py/arrow>
- @pypi: <https://pypi.org/project/arrow/>
- @context7: /arrow-py/arrow
- @changelog: <https://github.com/arrow-py/arrow/blob/master/CHANGELOG.rst>
|
||||
|
||||
**Performance analysis:**

- @benchmark: <https://www.dataroc.ca/blog/most-performant-timestamp-functions-python>
- @comparison: <https://aboutsimon.com/blog/2016/08/04/datetime-vs-Arrow-vs-Pendulum-vs-Delorean-vs-udatetime.html>
|
||||
|
||||
**Community resources:**

- @alternatives: <https://python.libhunt.com/arrow-alternatives>
- @tutorial: <https://code.tutsplus.com/arrow-for-better-date-and-time-in-python--cms-29624t>
- @guide: <https://stackabuse.com/working-with-datetime-in-python-with-arrow/>
|
||||
|
||||
## Summary
|
||||
|
||||
Arrow eliminates datetime friction by consolidating Python's fragmented date/time ecosystem into a single, intuitive API. Use it when developer experience and feature richness matter more than raw performance. For high-frequency operations or pandas-scale data processing, stick with the standard library or specialized tools. Arrow shines in web applications, APIs, CLI tools, and any code where humans read the timestamps.
|
||||
|
||||
**The reinvented wheel:** Without Arrow, you'd manually implement timezone conversion helpers, humanization logic, flexible parsing, and time range iteration using datetime + dateutil + pytz + custom code. Arrow packages these common patterns into a production-ready library.
|
||||
490
skills/python3-development/references/modern-modules/attrs.md
Normal file
490
skills/python3-development/references/modern-modules/attrs.md
Normal file
@@ -0,0 +1,490 @@
---
title: "attrs: Python Classes Without Boilerplate"
library_name: attrs
pypi_package: attrs
category: dataclasses
python_compatibility: "3.9+"
last_updated: "2025-11-02"
official_docs: "https://www.attrs.org"
official_repository: "https://github.com/python-attrs/attrs"
maintenance_status: "active"
---

# attrs: Python Classes Without Boilerplate
|
||||
|
||||
## Core Purpose
|
||||
|
||||
attrs eliminates the drudgery of implementing object protocols (dunder methods) by automatically generating `__init__`, `__repr__`, `__eq__`, `__hash__`, and other common methods. It predates Python's built-in dataclasses (which was inspired by attrs) and offers more features and flexibility.
|
||||
|
||||
**What problem does it solve?**
|
||||
|
||||
- Removes repetitive boilerplate code for class definitions
|
||||
- Provides declarative attribute definitions with validation and conversion
|
||||
- Offers slots, frozen instances, and performance optimizations
|
||||
- Enables consistent, correct implementations of comparison and hashing
|
||||
|
||||
**This prevents "reinventing the wheel" by:**
|
||||
|
||||
- Auto-generating special methods that are error-prone to write manually
|
||||
- Providing battle-tested validators and converters
|
||||
- Handling edge cases in equality, hashing, and immutability correctly
|
||||
- Offering extensibility through field transformers and custom setters
|
||||
|
||||
## Official Information
|
||||
|
||||
- **Repository**: <https://github.com/python-attrs/attrs> (@source: python-attrs/attrs on GitHub)
|
||||
- **PyPI Package**: `attrs` (current version: 25.4.0) (@source: <https://pypi.org/project/attrs/>)
|
||||
- **Documentation**: <https://www.attrs.org/> (@source: official docs)
|
||||
- **License**: MIT
|
||||
- **Maintenance**: Active development, trusted by NASA for Mars missions since 2020 (@source: attrs README)
|
||||
|
||||
## Python Version Compatibility
|
||||
|
||||
- **Minimum**: Python 3.9+ (@source: PyPI metadata)
|
||||
- **Maximum**: Python 3.14 (tested and supported)
|
||||
- **PyPy**: Fully supported
|
||||
- **Feature notes**:
|
||||
- Supports slots by default in modern API (`@define`)
|
||||
- Works with all mainstream Python versions including PyPy
|
||||
- Implements cell rewriting for `super()` calls in slotted classes
|
||||
- Compatible with `functools.cached_property` on slotted classes
|
||||
|
||||
## Installation
|
||||
|
||||
```bash
|
||||
pip install attrs
|
||||
```
|
||||
|
||||
For serialization/deserialization support:
|
||||
|
||||
```bash
|
||||
pip install attrs cattrs
|
||||
```
|
||||
|
||||
## Core Usage Patterns
|
||||
|
||||
### 1. Basic Class Definition (Modern API)
|
||||
|
||||
```python
|
||||
from attrs import define, field
|
||||
|
||||
@define
|
||||
class Point:
|
||||
x: int
|
||||
y: int
|
||||
|
||||
# Automatically generates __init__, __repr__, __eq__, etc.
|
||||
p = Point(1, 2)
|
||||
print(p) # Point(x=1, y=2)
|
||||
print(p == Point(1, 2)) # True
|
||||
```
|
||||
|
||||
(@source: Context7 /python-attrs/attrs documentation, attrs README)
|
||||
|
||||
### 2. Default Values and Factories
|
||||
|
||||
```python
|
||||
from attrs import define, field, Factory
|
||||
|
||||
@define
|
||||
class SomeClass:
|
||||
a_number: int = 42
|
||||
list_of_numbers: list[int] = Factory(list)
|
||||
|
||||
# Factory prevents mutable default gotchas
|
||||
sc1 = SomeClass()
|
||||
sc2 = SomeClass()
|
||||
sc1.list_of_numbers.append(1)
|
||||
print(sc2.list_of_numbers) # [] - separate instances
|
||||
```
|
||||
|
||||
(@source: attrs README, Context7 documentation examples)
|
||||
|
||||
### 3. Validators
|
||||
|
||||
```python
|
||||
from attrs import define, field, validators
|
||||
|
||||
@define
|
||||
class User:
|
||||
email: str = field(validator=validators.matches_re(
|
||||
r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$"
|
||||
))
|
||||
age: int = field(validator=[
|
||||
validators.instance_of(int),
|
||||
validators.ge(0),
|
||||
validators.lt(150)
|
||||
])
|
||||
|
||||
# Custom validator with decorator
|
||||
@define
|
||||
class BoundedValue:
|
||||
x: int = field()
|
||||
y: int
|
||||
|
||||
@x.validator
|
||||
def _check_x(self, attribute, value):
|
||||
if value >= self.y:
|
||||
raise ValueError("x must be smaller than y")
|
||||
```
|
||||
|
||||
(@source: Context7 /python-attrs/attrs validators documentation)
|
||||
|
||||
### 4. Converters
|
||||
|
||||
```python
|
||||
from attrs import define, field, converters
|
||||
|
||||
@define
|
||||
class C:
|
||||
x: int = field(converter=int)
|
||||
|
||||
c = C("42")
|
||||
print(c.x) # 42 (converted from string)
|
||||
|
||||
# Optional converter
|
||||
@define
|
||||
class OptionalInt:
|
||||
x: int | None = field(converter=converters.optional(int))
|
||||
|
||||
OptionalInt(None) # Valid
|
||||
OptionalInt("42") # Converts to 42
|
||||
```
|
||||
|
||||
(@source: Context7 /python-attrs/attrs converters documentation)
|
||||
|
||||
### 5. Frozen (Immutable) Classes
|
||||
|
||||
```python
|
||||
from attrs import frozen, field
|
||||
|
||||
@frozen
|
||||
class Coordinates:
|
||||
x: int
|
||||
y: int
|
||||
|
||||
c = Coordinates(1, 2)
|
||||
# c.x = 3 # Raises FrozenInstanceError
|
||||
|
||||
# Post-init with frozen classes
|
||||
@frozen
|
||||
class FrozenWithDerived:
|
||||
x: int
|
||||
y: int = field(init=False)
|
||||
|
||||
def __attrs_post_init__(self):
|
||||
# Must use object.__setattr__ for frozen classes
|
||||
object.__setattr__(self, "y", self.x + 1)
|
||||
```
|
||||
|
||||
(@source: Context7 /python-attrs/attrs frozen documentation)
|
||||
|
||||
### 6. Slots for Performance
|
||||
|
||||
```python
|
||||
from attrs import define
|
||||
|
||||
# Slots enabled by default with @define
|
||||
@define
|
||||
class SlottedClass:
|
||||
x: int
|
||||
y: int
|
||||
|
||||
# More memory efficient, faster attribute access
|
||||
# Cannot add attributes not defined in class
|
||||
```
|
||||
|
||||
(@source: Context7 /python-attrs/attrs slots documentation, attrs glossary)
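
A quick illustration of the slots restriction, reusing the `SlottedClass` defined above:

```python
s = SlottedClass(1, 2)
try:
    s.z = 3  # 'z' is not declared, and slotted classes have no __dict__
except AttributeError as exc:
    print(exc)
```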
|
||||
|
||||
### 7. Without Type Annotations
|
||||
|
||||
```python
|
||||
from attrs import define, field
|
||||
|
||||
@define
|
||||
class NoAnnotations:
|
||||
a_number = field(default=42)
|
||||
list_of_numbers = field(factory=list)
|
||||
```
|
||||
|
||||
(@source: attrs README)
|
||||
|
||||
## Real-World Examples
|
||||
|
||||
### Example Projects Using attrs
|
||||
|
||||
1. **Black** - The uncompromising Python code formatter
|
||||
- Repository: <https://github.com/psf/black>
|
||||
- Usage: Extensive use of attrs for AST node classes (@source: GitHub search)
|
||||
|
||||
2. **cattrs** - Composable custom class converters
|
||||
- Repository: <https://github.com/python-attrs/cattrs>
|
||||
- Usage: Built on top of attrs for serialization/deserialization (@source: python-attrs/cattrs)
|
||||
|
||||
3. **Eradiate** - Radiative transfer model
|
||||
- Repository: <https://github.com/eradiate/eradiate>
|
||||
- Usage: Scientific computing with validated data structures (@source: GitHub code search)
|
||||
|
||||
### Common Patterns from Real Code
|
||||
|
||||
**Pattern 1: Deep validation for nested structures**
|
||||
|
||||
```python
|
||||
from attrs import define, field, validators
|
||||
|
||||
@define
|
||||
class Measurement:
|
||||
tags: dict = field(
|
||||
validator=validators.deep_mapping(
|
||||
key_validator=validators.not_(
|
||||
validators.in_({"id", "time", "source"}),
|
||||
msg="reserved tag key"
|
||||
),
|
||||
value_validator=validators.instance_of((str, int))
|
||||
)
|
||||
)
|
||||
```
|
||||
|
||||
(@source: Context7 /python-attrs/attrs deep_mapping validator documentation)
|
||||
|
||||
**Pattern 2: Custom comparison for special types**
|
||||
|
||||
```python
|
||||
import numpy as np
|
||||
from attrs import define, field, cmp_using
|
||||
|
||||
@define
|
||||
class ArrayContainer:
|
||||
data: np.ndarray = field(eq=cmp_using(eq=np.array_equal))
|
||||
```
|
||||
|
||||
(@source: Context7 /python-attrs/attrs comparison documentation)
|
||||
|
||||
**Pattern 3: Hiding sensitive data in repr**
|
||||
|
||||
```python
|
||||
from attrs import define, field
|
||||
|
||||
@define
|
||||
class User:
|
||||
username: str
|
||||
password: str = field(repr=lambda value: '***')
|
||||
|
||||
User("admin", "secret123")
|
||||
# Output: User(username='admin', password=***)
|
||||
```
|
||||
|
||||
(@source: Context7 /python-attrs/attrs examples)
|
||||
|
||||
## Integration Patterns
|
||||
|
||||
### With cattrs for Serialization
|
||||
|
||||
```python
|
||||
from attrs import define
|
||||
from cattrs import structure, unstructure
|
||||
|
||||
@define
|
||||
class Person:
|
||||
name: str
|
||||
age: int
|
||||
|
||||
# Serialize to dict
|
||||
data = unstructure(Person("Alice", 30))
|
||||
# {'name': 'Alice', 'age': 30}
|
||||
|
||||
# Deserialize from dict
|
||||
person = structure({"name": "Bob", "age": 25}, Person)
|
||||
```
|
||||
|
||||
(@source: python-attrs/cattrs repository, Context7 cattrs documentation)
|
||||
|
||||
### Field Transformers for Advanced Use Cases
|
||||
|
||||
```python
|
||||
from attrs import define, frozen, field
|
||||
from datetime import datetime
|
||||
|
||||
def auto_convert_datetime(cls, fields):
|
||||
results = []
|
||||
for f in fields:
|
||||
if f.converter is not None:
|
||||
results.append(f)
|
||||
continue
|
||||
if f.type in {datetime, 'datetime'}:
|
||||
converter = lambda d: datetime.fromisoformat(d) if isinstance(d, str) else d
|
||||
else:
|
||||
converter = None
|
||||
results.append(f.evolve(converter=converter))
|
||||
return results
|
||||
|
||||
@frozen(field_transformer=auto_convert_datetime)
|
||||
class Event:
|
||||
name: str
|
||||
timestamp: datetime
|
||||
|
||||
# Automatically converts ISO strings to datetime
|
||||
event = Event(name="deploy", timestamp="2025-10-21T10:00:00")
|
||||
```
|
||||
|
||||
(@source: Context7 /python-attrs/attrs field_transformer documentation)
|
||||
|
||||
## When to Use attrs
|
||||
|
||||
### Use attrs when
|
||||
|
||||
- You want more features than dataclasses provide
|
||||
- You need robust validation and conversion
|
||||
- You require frozen/immutable instances with complex post-init
|
||||
- You want extensibility (field transformers, custom setters)
|
||||
- You need to support Python 3.9+ with modern features
|
||||
- Performance matters (slots optimization)
|
||||
- You want better debugging experience (cell rewriting for super())
|
||||
- You prefer a mature, battle-tested library (used by NASA)
|
||||
|
||||
### Use dataclasses when
|
||||
|
||||
- You need stdlib-only solution (no dependencies)
|
||||
- Your use case is simple (basic data containers)
|
||||
- You don't need validators or converters
|
||||
- You're comfortable with limited customization
|
||||
- You only support Python 3.10+ (for slots with super())
|
||||
|
||||
### Use Pydantic when
|
||||
|
||||
- You need runtime type validation (attrs validates on-demand)
|
||||
- You're building APIs with automatic schema generation
|
||||
- You need JSON Schema / OpenAPI integration
|
||||
- You want coercion-heavy validation (Pydantic is more aggressive)
|
||||
- You need ORM-like features
|
||||
|
||||
## Decision Matrix
|
||||
|
||||
| Feature            | attrs               | dataclasses   | Pydantic              |
| ------------------ | ------------------- | ------------- | --------------------- |
| **Validators**     | Extensive           | Manual only   | Automatic + extensive |
| **Converters**     | Built-in            | Manual only   | Automatic coercion    |
| **Slots**          | Default in @define  | 3.10+ only    | Optional              |
| **Frozen**         | Full support        | Basic support | Via Config            |
| **Performance**    | Fast (slots)        | Fast          | Slower (validation)   |
| **Type coercion**  | Opt-in              | No            | Automatic             |
| **Dependencies**   | Zero                | Zero (stdlib) | Multiple              |
| **Extensibility**  | High (transformers) | Limited       | Medium                |
| **Python support** | 3.9+                | 3.7+          | 3.8+                  |
| **Schema export**  | Via cattrs          | No            | Built-in              |
| **API stability**  | Very stable         | Stable        | Evolving              |
|
||||
|
||||
(@source: Context7 /python-attrs/attrs comparison with dataclasses, research from comparison articles)
|
||||
|
||||
## When NOT to Use
|
||||
|
||||
1. **Simple data containers without validation**
|
||||
- If you just need `__init__` and `__repr__`, dataclasses suffice
|
||||
- Example: Simple config objects, DTOs without business logic
|
||||
|
||||
2. **When you need JSON Schema / OpenAPI integration**
|
||||
- Pydantic provides this out-of-the-box
|
||||
- attrs requires additional libraries (cattrs + schema generators)
|
||||
|
||||
3. **Heavy runtime type validation requirements**
|
||||
- Pydantic validates automatically; attrs requires explicit validators
|
||||
- If every field needs type checking at runtime, Pydantic is more convenient
|
||||
|
||||
4. **No external dependencies allowed**
|
||||
- Use dataclasses from stdlib
|
||||
- Though attrs has zero dependencies itself
|
||||
|
||||
5. **Working with ORMs requiring specific metaclasses**
|
||||
- Some ORMs conflict with attrs' class generation
|
||||
- Check compatibility before adopting
|
||||
|
||||
## Performance Characteristics
|
||||
|
||||
- **Slots**: Enabled by default in `@define`, reducing memory overhead (~40-50% less memory)
|
||||
- **Frozen classes**: Slightly slower instantiation due to immutability checks
|
||||
- **Validation**: Only runs when explicitly called via `attrs.validate()` or during `__init__`
|
||||
- **Comparison**: Generated methods are as fast as hand-written equivalents
|
||||
|
||||
(@source: Context7 /python-attrs/attrs performance benchmarks)
|
||||
|
||||
## Common Gotchas
|
||||
|
||||
1. **Mutable defaults**: Always use `Factory` for mutable defaults
|
||||
2. **Frozen post-init**: Must use `object.__setattr__` in `__attrs_post_init__`
|
||||
3. **Slots and dynamic attributes**: Cannot add attributes not defined in class
|
||||
4. **Pickling slotted classes**: Attributes with `init=False` must be set before pickling
|
||||
5. **Validator order**: Converters run before validators
|
||||
|
||||
(@source: Context7 /python-attrs/attrs documentation, glossary)
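
A minimal sketch of gotcha 5 - the converter runs first, so the validator always sees the converted value:

```python
from attrs import define, field, validators

@define
class Count:
    # converter=int runs before the ge(0) validator
    n: int = field(converter=int, validator=validators.ge(0))

Count("3")   # OK: "3" is converted to 3, then validated
# Count("-1") would convert to -1 and then fail the ge(0) validator
```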
|
||||
|
||||
## Migration Path
|
||||
|
||||
### From dataclasses to attrs
|
||||
|
||||
```python
|
||||
# Before (dataclass)
|
||||
from dataclasses import dataclass
|
||||
|
||||
@dataclass
|
||||
class Point:
|
||||
x: int
|
||||
y: int = 0
|
||||
|
||||
# After (attrs)
|
||||
from attrs import define
|
||||
|
||||
@define
|
||||
class Point:
|
||||
x: int
|
||||
y: int = 0
|
||||
```
|
||||
|
||||
Minimal changes required; attrs is largely a drop-in replacement with more features.
|
||||
|
||||
### From Pydantic to attrs
|
||||
|
||||
```python
|
||||
# Before (Pydantic)
|
||||
from pydantic import BaseModel, validator
|
||||
|
||||
class User(BaseModel):
|
||||
name: str
|
||||
age: int
|
||||
|
||||
@validator('age')
|
||||
def check_age(cls, v):
|
||||
if v < 0:
|
||||
raise ValueError('age must be positive')
|
||||
return v
|
||||
|
||||
# After (attrs + cattrs for serialization)
|
||||
from attrs import define, field, validators
|
||||
|
||||
@define
|
||||
class User:
|
||||
name: str
|
||||
age: int = field(validator=[
|
||||
validators.instance_of(int),
|
||||
validators.ge(0)
|
||||
])
|
||||
```
|
||||
|
||||
Note: Pydantic does automatic validation; attrs requires explicit calls.
|
||||
|
||||
## Additional Resources
|
||||
|
||||
- **Official Tutorial**: <https://www.attrs.org/en/stable/examples.html>
|
||||
- **Extensions**: <https://github.com/python-attrs/attrs/wiki/Extensions-to-attrs>
|
||||
- **Comparison with dataclasses**: <https://www.attrs.org/en/stable/why.html#data-classes>
|
||||
- **attrs-strict**: Runtime type validation extension (@source: attrs wiki)
|
||||
- **Stack Overflow tag**: `python-attrs`
|
||||
|
||||
## Conclusion
|
||||
|
||||
attrs is the mature, feature-rich choice for defining classes in Python. It predates dataclasses, offers significantly more functionality, and maintains excellent performance through slots optimization. Choose attrs when you need validators, converters, extensibility, or when building production systems requiring robust data structures. It's the foundation used by major projects like Black and is trusted by NASA for critical missions.
|
||||
|
||||
For simple cases, dataclasses may suffice. For API validation and schema generation, Pydantic excels. But for general-purpose class definition with powerful features and minimal dependencies, attrs is the gold standard.
|
||||
|
||||
---
|
||||
|
||||
**Research methodology**: Information gathered from official documentation (attrs.org), PyPI metadata, GitHub repository analysis, Context7 code examples, and comparison with alternative libraries. All sources are cited inline with @ references.
|
||||
598
skills/python3-development/references/modern-modules/bidict.md
Normal file
598
skills/python3-development/references/modern-modules/bidict.md
Normal file
@@ -0,0 +1,598 @@
---
title: "bidict: Bidirectional Mapping Library"
library_name: bidict
pypi_package: bidict
category: data-structures
python_compatibility: "3.9+"
last_updated: "2025-11-02"
official_docs: "https://bidict.readthedocs.io"
official_repository: "https://github.com/jab/bidict"
maintenance_status: "active"
---

# bidict: Bidirectional Mapping Library
|
||||
|
||||
## Overview
|
||||
|
||||
bidict provides efficient, Pythonic bidirectional mapping data structures for Python. It allows you to maintain a one-to-one mapping between keys and values where you can look up values by keys and keys by values with equal efficiency.
|
||||
|
||||
## Official Information
|
||||
|
||||
- **Repository**: @[https://github.com/jab/bidict]
|
||||
- **Documentation**: @[https://bidict.readthedocs.io]
|
||||
- **PyPI Package**: `bidict`
|
||||
- **Latest Stable Version**: 0.23.1 (February 2024)
|
||||
- **Development Version**: 0.23.2.dev0
|
||||
- **License**: MPL-2.0 (Mozilla Public License 2.0)
|
||||
- **Maintenance**: Actively maintained since 2009 (15+ years)
|
||||
- **Author**: Joshua Bronson (@jab)
|
||||
- **Stars**: 1,554+ on GitHub
|
||||
|
||||
## Python Version Compatibility
|
||||
|
||||
- **Minimum Required**: Python 3.9+
|
||||
- **Tested Versions**: 3.9, 3.10, 3.11, 3.12, PyPy
|
||||
- **Python 3.13/3.14**: Expected to be compatible (no version-specific blockers)
|
||||
- **Type Hints**: Fully type-hinted codebase
|
||||
|
||||
Source: @[pyproject.toml line 9: requires-python = ">=3.9"]
|
||||
|
||||
## Core Purpose
|
||||
|
||||
### The Problem bidict Solves
|
||||
|
||||
bidict eliminates the need to manually maintain two separate dictionaries when you need bidirectional lookups. Without bidict, you might be tempted to:
|
||||
|
||||
```python
|
||||
# DON'T DO THIS - The naive approach
|
||||
mapping = {'H': 'hydrogen', 'hydrogen': 'H'}
|
||||
```
|
||||
|
||||
**Problems with this approach:**
|
||||
|
||||
- Unclear distinction between keys and values when iterating
|
||||
- `len()` returns double the actual number of associations
|
||||
- Updating associations requires complex cleanup logic to avoid orphaned data
|
||||
- No enforcement of one-to-one invariant
|
||||
- Iterating `.keys()` also yields values, and vice versa
|
||||
|
||||
### What bidict Provides
|
||||
|
||||
```python
|
||||
from bidict import bidict
|
||||
|
||||
# The correct approach
|
||||
element_by_symbol = bidict({'H': 'hydrogen'})
|
||||
element_by_symbol['H'] # 'hydrogen'
|
||||
element_by_symbol.inverse['hydrogen'] # 'H'
|
||||
```
|
||||
|
||||
bidict maintains two separate internal dictionaries and keeps them automatically synchronized, providing:
|
||||
|
||||
- **One-to-one invariant enforcement**: Prevents duplicate values
|
||||
- **Automatic inverse synchronization**: Changes propagate bidirectionally
|
||||
- **Clean iteration**: `.keys()` returns only keys, `.values()` returns only values
|
||||
- **Accurate length**: `len()` returns the actual number of associations
|
||||
- **Type safety**: Fully typed for static analysis
|
||||
|
||||
Source: @[docs/intro.rst: "to model a bidirectional mapping correctly and unambiguously, we need two separate one-directional mappings"]
|
||||
|
||||
## When to Use bidict
|
||||
|
||||
### Use bidict When
|
||||
|
||||
1. **Bidirectional lookups are required**
|
||||
- Symbol-to-element mapping (H ↔ hydrogen)
|
||||
- User ID-to-username mapping
|
||||
- Code-to-description mappings
|
||||
- Translation dictionaries between two systems
|
||||
|
||||
2. **One-to-one relationships must be enforced**
|
||||
- Database primary key mappings
|
||||
- File path-to-identifier mappings
|
||||
- Token-to-user session mappings
|
||||
|
||||
3. **You need both directions with equal frequency**
|
||||
- The overhead of two dicts is justified by lookup patterns
|
||||
- Inverse lookups are not occasional edge cases
|
||||
|
||||
4. **Data integrity is important**
|
||||
- Automatic cleanup when updating associations
|
||||
- Protection against duplicate values via `ValueDuplicationError`
|
||||
- Fail-clean guarantees for bulk operations
|
||||
|
||||
### Use Two Separate Dicts When
|
||||
|
||||
1. **Inverse lookups are rare or never needed**
|
||||
- Simple one-way mappings
|
||||
- Lookups only in one direction
|
||||
|
||||
2. **Values are not unique**
|
||||
- Many-to-one relationships (multiple keys → same value)
|
||||
- Example: category-to-items mapping
|
||||
|
||||
3. **Values are unhashable**
|
||||
- Lists, dicts, or other mutable/unhashable values
|
||||
- bidict requires values to be hashable
|
||||
|
||||
4. **Memory is extremely constrained**
|
||||
- bidict maintains two internal dicts (approximately 2x memory)
|
||||
- For very large datasets where inverse is rarely used
|
||||
|
||||
Source: @[docs/intro.rst, docs/basic-usage.rst]
|
||||
|
||||
## Decision Matrix
|
||||
|
||||
```text
┌─────────────────────────────────────┬──────────────┬──────────────────┐
│ Requirement                         │ Use bidict   │ Use Two Dicts    │
├─────────────────────────────────────┼──────────────┼──────────────────┤
│ Bidirectional lookups frequently    │      ✓       │                  │
│ One-to-one constraint enforcement   │      ✓       │                  │
│ Values must be hashable             │      ✓       │                  │
│ Automatic synchronization needed    │      ✓       │                  │
│ Many-to-one relationships           │              │        ✓         │
│ Unhashable values (lists, dicts)    │              │        ✓         │
│ Inverse lookups are rare            │              │        ✓         │
│ Extreme memory constraints          │              │        ✓         │
└─────────────────────────────────────┴──────────────┴──────────────────┘
```
|
||||
|
||||
## Installation
|
||||
|
||||
```bash
|
||||
pip install bidict
|
||||
```
|
||||
|
||||
Or with uv:
|
||||
|
||||
```bash
|
||||
uv add bidict
|
||||
```
|
||||
|
||||
No runtime dependencies outside Python's standard library.
|
||||
|
||||
## Basic Usage Examples
|
||||
|
||||
### Creating and Using a bidict
|
||||
|
||||
```python
|
||||
from bidict import bidict
|
||||
|
||||
# Create from dict, keyword arguments, or items
|
||||
element_by_symbol = bidict({'H': 'hydrogen', 'He': 'helium'})
|
||||
element_by_symbol = bidict(H='hydrogen', He='helium')
|
||||
element_by_symbol = bidict([('H', 'hydrogen'), ('He', 'helium')])
|
||||
|
||||
# Forward lookup (key → value)
|
||||
element_by_symbol['H'] # 'hydrogen'
|
||||
|
||||
# Inverse lookup (value → key)
|
||||
element_by_symbol.inverse['hydrogen'] # 'H'
|
||||
|
||||
# Inverse is a full bidict, kept in sync
|
||||
element_by_symbol.inverse['helium'] = 'He'
|
||||
element_by_symbol['He'] # 'helium'
|
||||
```
|
||||
|
||||
Source: @[docs/intro.rst, docs/basic-usage.rst]
|
||||
|
||||
### Handling Duplicate Values
|
||||
|
||||
```python
|
||||
from bidict import bidict, ValueDuplicationError
|
||||
|
||||
b = bidict({'one': 1})
|
||||
|
||||
# This raises an error - value 1 already exists
|
||||
try:
|
||||
b['two'] = 1
|
||||
except ValueDuplicationError:
|
||||
print("Value 1 is already mapped to 'one'")
|
||||
|
||||
# Explicitly allow overwriting with forceput()
|
||||
b.forceput('two', 1)
|
||||
# Result: bidict({'two': 1}) - 'one' was removed
|
||||
```
|
||||
|
||||
Source: @[docs/basic-usage.rst: "Values Must Be Unique"]
|
||||
|
||||
### Standard Dictionary Operations
|
||||
|
||||
```python
|
||||
from bidict import bidict
|
||||
|
||||
b = bidict(H='hydrogen', He='helium')
|
||||
|
||||
# All standard dict methods work
|
||||
'H' in b # True
|
||||
b.get('Li', 'not found') # 'not found'
|
||||
b.pop('He') # 'helium'
|
||||
b.update({'Li': 'lithium'}) # Add items
|
||||
len(b) # 2
|
||||
|
||||
# Iteration yields only keys (not keys+values like naive approach)
|
||||
list(b.keys()) # ['H', 'Li']
|
||||
list(b.values()) # ['hydrogen', 'lithium']
|
||||
list(b.items()) # [('H', 'hydrogen'), ('Li', 'lithium')]
|
||||
```
|
||||
|
||||
Source: @[docs/basic-usage.rst: "Interop"]
|
||||
|
||||
## Advanced Features
|
||||
|
||||
### Other bidict Types
|
||||
|
||||
```python
|
||||
from bidict import frozenbidict, OrderedBidict
|
||||
|
||||
# Immutable bidict (hashable, can be dict key or set member)
|
||||
immutable = frozenbidict({'H': 'hydrogen'})
|
||||
|
||||
# Ordered bidict (maintains insertion order, like dict in Python 3.7+)
|
||||
ordered = OrderedBidict({'H': 'hydrogen', 'He': 'helium'})
|
||||
```
|
||||
|
||||
Source: @[docs/other-bidict-types.rst]
|
||||
|
||||
### Fine-Grained Duplication Control
|
||||
|
||||
```python
|
||||
from bidict import bidict, OnDup, RAISE, DROP_OLD
|
||||
|
||||
b = bidict({1: 'one'})
|
||||
|
||||
# Strict mode - raise on any key or value duplication
|
||||
b.put(2, 'two', on_dup=OnDup(key=RAISE, val=RAISE))
|
||||
|
||||
# Custom policies for different duplication scenarios
|
||||
on_dup = OnDup(key=DROP_OLD, val=RAISE)
|
||||
b.putall([(1, 'uno'), (2, 'dos')], on_dup=on_dup)
|
||||
```
|
||||
|
||||
Source: @[docs/basic-usage.rst: "Key and Value Duplication"]
|
||||
|
||||
### Fail-Clean Guarantee
|
||||
|
||||
```python
from bidict import bidict, KeyDuplicationError

b = bidict({1: 'one', 2: 'two'})

# If an update fails, the bidict is unchanged
# (putall defaults to raising on any key or value duplication)
try:
    b.putall({3: 'three', 1: 'uno'})  # 1 is a duplicate key
except KeyDuplicationError:
    pass

# (3, 'three') was NOT added - the bidict remains unchanged
b  # bidict({1: 'one', 2: 'two'})
```
|
||||
|
||||
Source: @[docs/basic-usage.rst: "Updates Fail Clean"]
|
||||
|
||||
## Real-World Usage Patterns
|
||||
|
||||
Based on analysis of the bidict repository and documentation:
|
||||
|
||||
### Pattern 1: Symbol-to-Name Mappings
|
||||
|
||||
```python
|
||||
from bidict import bidict
|
||||
|
||||
# Chemical elements
|
||||
element_by_symbol = bidict({
|
||||
'H': 'hydrogen',
|
||||
'He': 'helium',
|
||||
'Li': 'lithium'
|
||||
})
|
||||
|
||||
# Look up element by symbol
|
||||
element_by_symbol['H'] # 'hydrogen'
|
||||
|
||||
# Look up symbol by element name
|
||||
element_by_symbol.inverse['lithium'] # 'Li'
|
||||
```
|
||||
|
||||
### Pattern 2: ID-to-Object Mappings
|
||||
|
||||
```python
|
||||
from bidict import bidict
|
||||
|
||||
# User session management
|
||||
session_by_user_id = bidict({
|
||||
1001: 'session_abc123',
|
||||
1002: 'session_def456'
|
||||
})
|
||||
|
||||
# Find session by user ID
|
||||
session_by_user_id[1001] # 'session_abc123'
|
||||
|
||||
# Find user ID by session
|
||||
session_by_user_id.inverse['session_abc123'] # 1001
|
||||
```
|
||||
|
||||
### Pattern 3: Internationalization/Translation
|
||||
|
||||
```python
|
||||
from bidict import bidict
|
||||
|
||||
# Language code mappings
|
||||
lang_code = bidict({
|
||||
'en': 'English',
|
||||
'es': 'Español',
|
||||
'fr': 'Français'
|
||||
})
|
||||
|
||||
# Look up language name from code
|
||||
lang_code['es'] # 'Español'
|
||||
|
||||
# Look up code from language name
|
||||
lang_code.inverse['Français'] # 'fr'
|
||||
```
|
||||
|
||||
### Pattern 4: File Path-to-Identifier Mappings
|
||||
|
||||
```python
|
||||
from bidict import bidict
|
||||
|
||||
# File tracking system
|
||||
file_by_id = bidict({
|
||||
'f001': '/path/to/document.pdf',
|
||||
'f002': '/path/to/image.png'
|
||||
})
|
||||
|
||||
# Get path from ID
|
||||
file_by_id['f001'] # '/path/to/document.pdf'
|
||||
|
||||
# Get ID from path
|
||||
file_by_id.inverse['/path/to/image.png'] # 'f002'
|
||||
```
|
||||
|
||||
## Integration Patterns
|
||||
|
||||
### With Type Hints
|
||||
|
||||
```python
|
||||
from typing import Mapping
|
||||
from bidict import bidict
|
||||
|
||||
def process_mapping(data: Mapping[str, int]) -> None:
|
||||
# bidict is a full Mapping implementation
|
||||
for key, value in data.items():
|
||||
print(f"{key}: {value}")
|
||||
|
||||
# Works seamlessly
|
||||
process_mapping(bidict({'a': 1, 'b': 2}))
|
||||
```
|
||||
|
||||
### With collections.abc
|
||||
|
||||
bidict implements:
|
||||
|
||||
- `collections.abc.MutableMapping` (for `bidict`)
|
||||
- `collections.abc.Mapping` (for `frozenbidict`)
|
||||
|
||||
```python
|
||||
from collections.abc import MutableMapping
|
||||
from bidict import bidict
|
||||
|
||||
def validate_mapping(m: MutableMapping) -> bool:
|
||||
return isinstance(m, MutableMapping)
|
||||
|
||||
validate_mapping(bidict()) # True
|
||||
```
|
||||
|
||||
### Polymorphic Equality
|
||||
|
||||
```python
|
||||
from bidict import bidict
|
||||
|
||||
# bidict compares equal to dicts with same items
|
||||
bidict(a=1, b=2) == {'a': 1, 'b': 2} # True
|
||||
|
||||
# Can convert freely between dict and bidict
|
||||
dict(bidict(a=1)) # {'a': 1}
|
||||
bidict(dict(a=1)) # bidict({'a': 1})
|
||||
```
|
||||
|
||||
Source: @[docs/basic-usage.rst: "Interop"]
|
||||
|
||||
## Performance Characteristics
|
||||
|
||||
### Time Complexity
|
||||
|
||||
- **Forward lookup** (`b[key]`): O(1)
|
||||
- **Inverse lookup** (`b.inverse[value]`): O(1)
|
||||
- **Insert/Update** (`b[key] = value`): O(1)
|
||||
- **Delete** (`del b[key]`): O(1)
|
||||
- **Access inverse** (`b.inverse`): O(1) - inverse is always maintained, not computed on demand
|
||||
|
||||
### Space Complexity
|
||||
|
||||
- **Memory overhead**: Approximately 2x a single dict (maintains two internal dicts)
|
||||
- **Inverse access**: No additional memory allocation (inverse is a view)
|
||||
|
||||
Source: @[docs/intro.rst: "the inverse is not computed on demand"]
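
A short sketch showing that `.inverse` is a live view kept in sync with the forward mapping, not a snapshot:

```python
from bidict import bidict

b = bidict({'H': 'hydrogen'})
inv = b.inverse

b['He'] = 'helium'             # mutate the forward mapping
print(inv['helium'])           # 'He' - the inverse saw the update
print(b.inverse.inverse is b)  # True - the two views share one structure
```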
|
||||
|
||||
## Known Limitations
|
||||
|
||||
1. **Values must be hashable**: Cannot use lists, dicts, or other unhashable types as values
|
||||
2. **Memory overhead**: Uses roughly 2x the memory of a single dict
|
||||
3. **One-to-one only**: Cannot represent many-to-one or one-to-many relationships
|
||||
4. **Value uniqueness enforced**: Raises `ValueDuplicationError` by default when duplicate values are inserted
|
||||
|
||||
Source: @[docs/basic-usage.rst: "Values Must Be Hashable", "Values Must Be Unique"]
|
||||
|
||||
## When NOT to Use
|
||||
|
||||
### Scenario 1: Many-to-One Relationships
|
||||
|
||||
```python
from bidict import bidict

# BAD: Many-to-one relationship - multiple keys sharing one value
# bidict enforces one-to-one, so the repeated 'fruit' value raises
# ValueDuplicationError; use a plain dict for this shape instead
item_category = bidict({
    'apple': 'fruit',
    'carrot': 'vegetable',
    'banana': 'fruit',  # ValueDuplicationError: 'fruit' already present
})
```
|
||||
|
||||
### Scenario 2: Unhashable Values
|
||||
|
||||
```python
|
||||
# BAD: Lists as values
|
||||
# This raises TypeError with bidict
|
||||
groups = bidict({
|
||||
'admins': ['alice', 'bob'], # TypeError: unhashable type: 'list'
|
||||
'users': ['charlie', 'david']
|
||||
})
|
||||
|
||||
# Use regular dict or use frozenset/tuple as values
|
||||
groups = bidict({
|
||||
'admins': frozenset(['alice', 'bob']), # OK
|
||||
'users': frozenset(['charlie', 'david'])
|
||||
})
|
||||
```
|
||||
|
||||
### Scenario 3: Rarely Used Inverse Lookups
|
||||
|
||||
```python
|
||||
# If you only need inverse lookup occasionally, manual approach may be simpler
|
||||
forward = {'key1': 'value1', 'key2': 'value2'}
|
||||
|
||||
# Occasionally create inverse when needed
|
||||
inverse = {v: k for k, v in forward.items()}
|
||||
```
|
||||
|
||||
### Scenario 4: Extreme Memory Constraints
|
||||
|
||||
For very large datasets (millions of entries) where inverse lookups are infrequent, the 2x memory overhead may not be justified. Consider:
|
||||
|
||||
- Database-backed lookups for both directions
|
||||
- On-demand inverse dict construction
|
||||
- External key-value stores with bidirectional indices
|
||||
|
||||
## Notable Dependents
|
||||
|
||||
bidict is used by major organizations and projects (source: @[README.rst]):
|
||||
|
||||
- Google
|
||||
- Venmo
|
||||
- CERN
|
||||
- Baidu
|
||||
- Tencent
|
||||
|
||||
**PyPI Download Statistics**: Significant adoption with millions of downloads (source: @[README.rst badge])
|
||||
|
||||
## Dependencies
|
||||
|
||||
- **Runtime**: None (zero dependencies outside Python stdlib)
|
||||
- **Development**: pytest, hypothesis, mypy, sphinx (for testing and docs)
|
||||
|
||||
Source: @[pyproject.toml: dependencies = []]
|
||||
|
||||
## Maintenance and Support
|
||||
|
||||
- **Maintenance**: Actively maintained since 2009 (15+ years)
|
||||
- **Test Coverage**: 100% test coverage with property-based testing via hypothesis
|
||||
- **CI/CD**: Continuous testing across all supported Python versions
|
||||
- **Type Hints**: Fully type-hinted and mypy-strict compliant
|
||||
- **Documentation**: Comprehensive documentation at readthedocs.io
|
||||
- **Community**: GitHub Discussions for questions, active issue tracker
|
||||
- **Enterprise Support**: Available via Tidelift subscription
|
||||
|
||||
Source: @[README.rst: "Features", "Enterprise Support"]
|
||||
|
||||
## Migration Guide
|
||||
|
||||
### From Two Manual Dicts
|
||||
|
||||
```python
|
||||
# Before: Manual synchronization
|
||||
forward = {'H': 'hydrogen'}
|
||||
inverse = {'hydrogen': 'H'}
|
||||
|
||||
# When updating
|
||||
forward['H'] = 'hydrogène'
|
||||
del inverse['hydrogen'] # Manual cleanup
|
||||
inverse['hydrogène'] = 'H'
|
||||
|
||||
# After: Automatic synchronization
|
||||
from bidict import bidict
|
||||
mapping = bidict({'H': 'hydrogen'})
|
||||
mapping['H'] = 'hydrogène' # inverse automatically updated
|
||||
```
|
||||
|
||||
### From Naive Single Dict
|
||||
|
||||
```python
|
||||
# Before: Mixed keys and values
|
||||
mixed = {'H': 'hydrogen', 'hydrogen': 'H'}
|
||||
len(mixed) # 2 (wrong - should be 1 association)
|
||||
list(mixed.keys()) # ['H', 'hydrogen'] (values mixed in)
|
||||
|
||||
# After: Clean separation
|
||||
from bidict import bidict
|
||||
b = bidict({'H': 'hydrogen'})
|
||||
len(b) # 1 (correct)
|
||||
list(b.keys()) # ['H'] (only keys)
|
||||
list(b.values()) # ['hydrogen'] (only values)
|
||||
```
|
||||
|
||||
## Related Libraries and Alternatives
|
||||
|
||||
- **Two manual dicts**: Simplest for occasional inverse lookups
|
||||
- **bidict.OrderedBidict**: When insertion order matters (built into bidict)
|
||||
- **bidict.frozenbidict**: Immutable variant for hashable mappings (built into bidict)
|
||||
- **sortedcontainers.SortedDict**: For sorted bidirectional mappings (can combine with bidict)
|
||||
|
||||
No direct competitors in Python stdlib or third-party ecosystem that provide the same level of safety, features, and maintenance.
|
||||
|
||||
## Learning Resources
|
||||
|
||||
- Official Documentation: @[https://bidict.readthedocs.io]
|
||||
- Intro Guide: @[https://bidict.readthedocs.io/intro.html]
|
||||
- Basic Usage: @[https://bidict.readthedocs.io/basic-usage.html]
|
||||
- Learning from bidict: @[https://bidict.readthedocs.io/learning-from-bidict.html] - covers advanced Python topics touched by bidict's implementation
|
||||
- GitHub Repository: @[https://github.com/jab/bidict]
|
||||
- PyPI Package: @[https://pypi.org/project/bidict/]
|
||||
|
||||
## Quick Decision Guide
|
||||
|
||||
**Use bidict when you answer "yes" to:**
|
||||
|
||||
1. Do you need to look up keys by values frequently?
|
||||
2. Are your values unique (one-to-one relationship)?
|
||||
3. Are your values hashable?
|
||||
4. Do you want automatic synchronization between directions?
|
||||
|
||||
**Use two separate dicts when:**
|
||||
|
||||
1. Inverse lookups are rare
|
||||
2. You have many-to-one relationships
|
||||
3. Memory is extremely constrained
|
||||
4. Values are unhashable
|
||||
|
||||
**Use a single dict when:**
|
||||
|
||||
1. You only need one direction
|
||||
2. Values don't need to be unique
|
||||
|
||||
## Code Review Checklist
|
||||
|
||||
When reviewing code using bidict:
|
||||
|
||||
- [ ] Values are hashable (not lists, dicts, sets)
|
||||
- [ ] One-to-one relationship is intended (no many-to-one)
|
||||
- [ ] Error handling for `ValueDuplicationError` where appropriate
|
||||
- [ ] `forceput()`/`forceupdate()` usage is intentional and documented
|
||||
- [ ] Memory overhead (2x dict) is acceptable for use case
|
||||
- [ ] Type hints include bidict types where appropriate
|
||||
- [ ] Inverse access pattern justifies bidict usage vs two dicts
|
||||
|
||||
## Summary
|
||||
|
||||
bidict is a mature, well-tested library that solves the bidirectional mapping problem elegantly. Use it when you need efficient lookups in both directions with automatic synchronization and one-to-one invariant enforcement. Avoid it when you have many-to-one relationships, unhashable values, or rarely use inverse lookups.
|
||||
|
||||
**Key Takeaway**: If you're maintaining two dicts manually or considering `{a: b, b: a}`, reach for bidict. It eliminates error-prone manual synchronization while providing stronger guarantees and cleaner code.
|
||||
586
skills/python3-development/references/modern-modules/blinker.md
Normal file
@@ -0,0 +1,586 @@
---
title: "Blinker: Fast Signal/Event Dispatching System"
library_name: blinker
pypi_package: blinker
category: event-system
python_compatibility: "3.9+"
last_updated: "2025-11-02"
official_docs: "https://blinker.readthedocs.io"
official_repository: "https://github.com/pallets-eco/blinker"
maintenance_status: "active"
---
# Blinker: Fast Signal/Event Dispatching System

## Official Information

**Repository:** <https://github.com/pallets-eco/blinker>
**PyPI Package:** blinker
**Current Version:** 1.9.0 (Released 2024-11-08)
**Official Documentation:** <https://blinker.readthedocs.io/>
**License:** MIT License
**Maintenance Status:** Active (Pallets Community Ecosystem)

@source <https://github.com/pallets-eco/blinker>
@source <https://blinker.readthedocs.io/>
@source <https://pypi.org/project/blinker/>
## Core Purpose

Blinker provides a fast dispatching system that allows any number of interested parties to subscribe to events or "signals". It implements the Observer pattern with a clean, Pythonic API.

### Problem Space

Without blinker, you would need to manually implement:

- Global event registries for decoupled components
- Weak reference management for automatic cleanup
- Thread-safe event dispatching
- Sender-specific event filtering
- Return value collection from multiple handlers

### When to Use Blinker

**Use blinker when:**

- Building plugin systems that need event hooks
- Implementing application lifecycle hooks (like Flask)
- Creating decoupled components that communicate via events
- Building event-driven architectures within a single process
- You need multiple independent handlers for the same event
- You want automatic cleanup via weak references

**What you would be "reinventing the wheel" without it:**

- Observer/subscriber pattern implementation
- Named signal registries for plugin communication
- Weak reference management for receivers
- Thread-safe signal dispatching
- Sender filtering and context passing
## Python Version Compatibility

**Minimum Python Version:** 3.9+
**Python 3.11:** Fully compatible
**Python 3.12:** Fully compatible
**Python 3.13:** Fully compatible
**Python 3.14:** Expected to be compatible

@source <https://blinker.readthedocs.io/en/stable/>

### Thread Safety

Blinker signals are thread-safe. The library uses weak references for automatic cleanup and properly handles concurrent signal emission and subscription. A minimal sketch follows.
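As a hedged illustration (not taken from the official docs), this sketch emits one named signal from several worker threads; the signal name and handler are hypothetical. Note that handler bodies must still guard their own shared state.

```python
import threading

from blinker import signal

task_done = signal('task-done')
lock = threading.Lock()
completed = []

@task_done.connect
def record(sender, **kwargs):
    # Blinker dispatch is thread-safe, but this list is ours to protect
    with lock:
        completed.append(kwargs['task_id'])

def worker(task_id):
    task_done.send('worker', task_id=task_id)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(completed))  # [0, 1, 2, 3]
```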
## Integration Patterns

### Flask Ecosystem Integration

Flask uses blinker as its signal system foundation. Flask provides built-in signals like:

- `request_started` - Before request processing begins
- `request_finished` - After response is constructed
- `template_rendered` - When template is rendered
- `request_tearing_down` - During request teardown

@source <https://flask.palletsprojects.com/en/latest/signals/>

**Example Flask Signal Usage:**

```python
from flask import template_rendered

def log_template_renders(sender, template, context, **extra):
    sender.logger.info(
        f"Rendered {template.name} with context {context}"
    )

template_rendered.connect(log_template_renders, app)
```
### Event-Driven Architecture

Blinker excels at creating loosely coupled components:

```python
from blinker import Namespace

# Create isolated namespace for your application
app_signals = Namespace()

# Define signals
user_logged_in = app_signals.signal('user-logged-in')
data_updated = app_signals.signal('data-updated')

# Multiple handlers can subscribe
@user_logged_in.connect
def update_last_login(sender, **kwargs):
    user_id = kwargs.get('user_id')
    # Update database

@user_logged_in.connect
def send_login_notification(sender, **kwargs):
    # Send email notification
    pass

# Emit signal
user_logged_in.send(app, user_id=123, ip_address='192.168.1.1')
```
### Plugin Systems

```python
from blinker import signal

# Core application defines hook points
plugin_loaded = signal('plugin-loaded')
before_process = signal('before-process')
after_process = signal('after-process')

# Plugins subscribe to hooks
@before_process.connect
def plugin_preprocess(sender, data):
    # Plugin modifies data before processing
    return data

# Application emits signals at hook points (inside an application method)
class Application:
    def process(self, input_data):
        results = before_process.send(self, data=input_data)
        for receiver, result in results:
            if result is not None:
                input_data = result
        return input_data
```
## Real-World Examples

### Example 1: Flask Request Monitoring

@source <https://github.com/instana/python-sensor> (Flask instrumentation with blinker)

```python
import time

from flask import g, request_started, request_finished

def track_request_start(sender, **extra):
    # Store the start time on Flask's request-scoped `g` object
    g.request_start_time = time.time()

def track_request_end(sender, response, **extra):
    duration = time.time() - g.request_start_time
    sender.logger.info(f"Request took {duration:.2f}s")

request_started.connect(track_request_start)
request_finished.connect(track_request_end)
```
### Example 2: Model Save Hooks

@source <https://blinker.readthedocs.io/>

```python
from blinker import Namespace

model_signals = Namespace()
model_saved = model_signals.signal('model-saved')

class Model:
    def save(self):
        # Save to database
        self._persist()
        # Emit signal for observers
        model_saved.send(self, model_type=self.__class__.__name__)

# Cache invalidation handler
@model_saved.connect
def invalidate_cache(sender, **kwargs):
    cache.delete(f"model:{kwargs['model_type']}")

# Audit logging handler
@model_saved.connect
def log_change(sender, **kwargs):
    audit_log.write(f"Model saved: {kwargs['model_type']}")
```
### Example 3: Sender-Specific Subscriptions

@source <https://github.com/pallets-eco/blinker> README

```python
from blinker import signal

round_started = signal('round-started')

# General subscriber - receives from all senders
@round_started.connect
def each_round(sender):
    print(f"Round {sender}")

# Sender-specific subscriber - only for sender=2
@round_started.connect_via(2)
def special_round(sender):
    print("This is round two!")

for round_num in range(1, 4):
    round_started.send(round_num)

# Output:
# Round 1
# Round 2
# This is round two!
# Round 3
```
### Example 4: Async Signal Handlers

@source <https://blinker.readthedocs.io/en/stable/>

```python
import asyncio

from blinker import Signal

async_signal = Signal()

# Async receiver
async def async_receiver(sender, **kwargs):
    await asyncio.sleep(1)
    print("Async handler completed")

async_signal.connect(async_receiver)

# Mix in a sync receiver
def sync_receiver(sender, **kwargs):
    print("Sync handler")

async_signal.connect(sync_receiver)

# Wrapper that adapts sync handlers so send_async() can await them
def sync_wrapper(func):
    async def inner(*args, **kwargs):
        func(*args, **kwargs)
    return inner

async def main():
    # Send to async receivers; sync receivers run through the wrapper
    await async_signal.send_async(None, _sync_wrapper=sync_wrapper)

asyncio.run(main())
```
## Usage Examples

### Basic Signal Definition and Connection

```python
from blinker import Signal, signal

# Named signals (shared across modules)
initialized = signal('initialized')

# Anonymous signals (class attributes)
class Processor:
    on_ready = Signal()
    on_complete = Signal()

    def process(self):
        self.on_ready.send(self)
        # Do work
        self.on_complete.send(self, status='success')

# Connect receivers
@initialized.connect
def on_init(sender, **kwargs):
    print(f"Initialized by {sender}")

processor = Processor()

@processor.on_complete.connect
def handle_completion(sender, **kwargs):
    print(f"Status: {kwargs['status']}")
```
### Named Signals for Decoupling

```python
from blinker import signal

# Module A defines and sends
def user_service():
    user_created = signal('user-created')
    # Create user
    user_created.send('user_service', user_id=123, username='john')

# Module B subscribes (no import of Module A needed!)
def notification_service():
    user_created = signal('user-created')  # Same signal instance

    @user_created.connect
    def send_welcome_email(sender, **kwargs):
        print(f"Sending email to {kwargs['username']}")
```
### Checking for Receivers Before Expensive Operations

```python
from blinker import signal

data_changed = signal('data-changed')

def update_data(new_data):
    # Only compute expensive stats if someone is listening
    if data_changed.receivers:
        stats = compute_expensive_stats(new_data)
        data_changed.send(None, data=new_data, stats=stats)
    else:
        # Skip expensive computation
        data_changed.send(None, data=new_data)
```
### Temporarily Muting Signals (Testing)

```python
from blinker import signal

send_email = signal('send-email')

@send_email.connect
def actually_send(sender, **kwargs):
    # Send real email
    pass

def test_user_registration():
    # Don't send emails during tests
    with send_email.muted():
        register_user('test@example.com')
        # send_email signal is ignored in this context
```
### Collecting Return Values

```python
from blinker import signal

validate_data = signal('validate-data')

@validate_data.connect
def check_email(sender, **kwargs):
    email = kwargs['email']
    if '@' not in email:
        return False, "Invalid email"
    return True, None

@validate_data.connect
def check_username(sender, **kwargs):
    username = kwargs['username']
    if len(username) < 3:
        return False, "Username too short"
    return True, None

# Collect all validation results
results = validate_data.send(
    None,
    email='invalid',
    username='ab'
)

for receiver, (valid, error) in results:
    if not valid:
        print(f"Validation failed: {error}")
```
## When NOT to Use Blinker

### Scenario 1: Simple Callbacks Sufficient

**Don't use blinker when:**

- Single callback function is enough
- No need for dynamic subscription/unsubscription
- Callbacks are tightly coupled to caller

```python
from blinker import signal

def on_done(sender=None, **kwargs):
    print("done")

# Overkill - use simple callback
sig = signal('done')
sig.connect(on_done)
sig.send(None)

# Better - direct callback
def process(callback):
    # do work
    callback()

process(on_done)
```
### Scenario 2: Async Event Systems

**Don't use blinker when:**

- Building async-first distributed event system
- Need message queuing and persistence
- Cross-process or cross-network communication

```python
# Wrong tool - blinker is in-process only
from blinker import signal

distributed_event = signal('cross-service-event')

# Better - use async message queue
import asyncio
from aio_pika import connect, Message

async def publish_event():
    connection = await connect("amqp://guest:guest@localhost/")
    channel = await connection.channel()
    await channel.default_exchange.publish(
        Message(b"event data"),
        routing_key="events"
    )
```
### Scenario 3: Complex State Machines

**Don't use blinker when:**

- Need state transitions with guards and actions
- Require hierarchical or concurrent states
- Complex workflow orchestration

```python
# Wrong tool - too complex for simple signals
from blinker import signal

# Better - use a state machine library
from transitions import Machine

class Order:
    states = ['pending', 'paid', 'shipped', 'delivered']

    def __init__(self):
        self.machine = Machine(
            model=self,
            states=Order.states,
            initial='pending'
        )
        self.machine.add_transition('pay', 'pending', 'paid')
        self.machine.add_transition('ship', 'paid', 'shipped')
```
### Scenario 4: Request/Response Patterns

**Don't use blinker when:**

- Need bidirectional request/response communication
- Require RPC-style method calls
- Need return values from specific handlers

```python
# Awkward with signals
results = some_signal.send(sender, request='data')
# Hard to know which handler provided what

# Better - direct method call or dependency injection
class ServiceLocator:
    def get_service(self, name):
        return self._services[name]

service = locator.get_service('data_processor')
result = service.process(data)
```
## Decision Guidance Matrix

| Use Blinker When | Use Callbacks When | Use AsyncIO When | Use Message Queue When |
| --- | --- | --- | --- |
| Multiple independent handlers needed | Single handler sufficient | Async/await throughout codebase | Cross-process communication needed |
| Plugin system with dynamic handlers | Tightly coupled components | I/O-bound async operations | Message persistence required |
| Decoupled modules need communication | Callback logic is simple | Event loop already present | Distributed systems |
| Framework-level hooks (like Flask) | Direct function call works | Concurrent async tasks | Reliability and retry needed |
| Observable events in OOP design | Inline lambda sufficient | Network I/O heavy | Message ordering matters |
| Weak reference cleanup needed | Manual lifecycle management OK | WebSockets/long-lived connections | Load balancing across workers |
### Decision Tree

```text
Need event notifications?
├─ Single process only?
│   ├─ YES: Continue
│   └─ NO: Use message queue (RabbitMQ, Redis, Kafka)
│
├─ Multiple handlers per event?
│   ├─ YES: Continue
│   └─ NO: Use simple callback function
│
├─ Handlers need to be dynamic (plugins)?
│   ├─ YES: Use Blinker ✓
│   └─ NO: Direct method calls may suffice
│
├─ Async/await heavy codebase?
│   ├─ YES: Consider asyncio event system
│   │        (or use Blinker with send_async)
│   └─ NO: Use Blinker ✓
│
└─ Need weak reference cleanup?
    ├─ YES: Use Blinker ✓
    └─ NO: Simple callbacks OK
```
## Installation

```bash
pip install blinker
```

Current version: 1.9.0
Minimum Python: 3.9+

@source <https://pypi.org/project/blinker/>
## Key Features

- **Global named signal registry:** `signal('name')` returns same instance everywhere
- **Anonymous signals:** Create isolated `Signal()` instances
- **Sender filtering:** `connect(handler, sender=obj)` for sender-specific subscriptions
- **Weak references:** Automatic cleanup when receivers are garbage collected
- **Thread safety:** Safe for concurrent use
- **Return value collection:** Gather results from all handlers
- **Async support:** `send_async()` for coroutine receivers
- **Temporary connections:** Context managers for scoped subscriptions (see the sketch below)
- **Signal muting:** Disable signals temporarily (useful for testing)

@source <https://blinker.readthedocs.io/en/stable/>
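As a hedged sketch of the temporary-connection feature, `connected_to()` subscribes a receiver only for the duration of a `with` block; the signal name and handler here are hypothetical.

```python
from blinker import signal

progress = signal('progress')

def log_progress(sender, **kwargs):
    print(f"{kwargs['percent']}% complete")

# Receiver is connected only inside the block
with progress.connected_to(log_progress):
    progress.send(None, percent=50)  # printed

progress.send(None, percent=100)     # not printed - already disconnected
```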
## Common Pitfalls

1. **Memory leaks with strong references:**

```python
# Default uses weak references - OK
signal.connect(handler)

# Strong reference - prevents garbage collection
signal.connect(handler, weak=False)  # Use sparingly!
```

2. **Expecting signals to modify behavior:**
   - Signals are for observation, not control flow
   - Don't rely on signal handlers to prevent actions
   - Use explicit validation/authorization instead

3. **Forgetting sender parameter:**

```python
@my_signal.connect
def handler(sender, **kwargs):  # sender is required!
    print(kwargs['data'])
```

4. **Cross-process communication:**
   - Blinker is in-process only
   - Use message queues for distributed systems

5. **Performance with many handlers:**
   - Check `signal.receivers` before expensive operations
   - Consider limiting number of subscribers for hot paths
## Related Libraries

- **Django Signals:** Built into Django, similar concept but Django-specific
- **PyPubSub:** More complex publish-subscribe system
- **asyncio events:** For async-first applications
- **RxPY:** Reactive extensions for Python (more powerful, more complex)
- **Celery:** For distributed task queues and async workers

## Summary

Blinker is the standard solution for in-process event dispatching in Python, particularly within the Pallets ecosystem (Flask). Use it when you need clean, decoupled event notifications between components in the same process. For distributed systems, async-heavy codebases, or simple single-callback scenarios, consider alternatives.

**TL;DR:** Blinker = Observer pattern done right, with weak references, thread safety, and a clean API. Essential for Flask signals and plugin systems.
513
skills/python3-development/references/modern-modules/boltons.md
Normal file
@@ -0,0 +1,513 @@
---
title: "Boltons: Pure-Python Standard Library Extensions"
library_name: boltons
pypi_package: boltons
category: utilities
python_compatibility: "3.7+"
last_updated: "2025-11-02"
official_docs: "https://boltons.readthedocs.io"
official_repository: "https://github.com/mahmoud/boltons"
maintenance_status: "active"
---
# Boltons: Pure-Python Standard Library Extensions

## Overview

**boltons should be builtins.**

Boltons is a collection of over 230 BSD-licensed, pure-Python utilities designed to extend Python's standard library with functionality that is conspicuously missing. Created and maintained by @mahmoud (Mahmoud Hashemi), it provides battle-tested implementations of commonly needed utilities without any external dependencies.

### Core Value Proposition

- **Zero Dependencies**: Pure-Python with no external requirements
- **Module Independence**: Each module can be vendored individually
- **Battle-Tested**: 6,765+ stars, tested against Python 3.7-3.13 and PyPy3
- **Standard Library Philosophy**: Follows stdlib design principles
- **Production Ready**: Used in production by numerous projects

## Problem Space

Boltons solves the "reinventing the wheel" problem for common utilities that should be in the standard library but aren't. Without boltons, developers repeatedly write custom implementations for:

- LRU caches with better APIs than `functools.lru_cache`
- Chunked and windowed iteration patterns
- Atomic file operations
- Advanced dictionary types (OrderedMultiDict)
- Enhanced traceback formatting and debugging
- Recursive data structure traversal
- File system utilities beyond `shutil`

### What Would Be Reinventing the Wheel

Using boltons prevents rewriting:

- Custom LRU cache implementations with size limits and TTL
- Iteration utilities like `chunked()`, `windowed()`, `unique()`
- Atomic file write operations (write-to-temp, rename)
- Enhanced `namedtuple` with defaults and mutation
- Traceback extraction and formatting utilities
- URL parsing and manipulation beyond `urllib.parse`
- Table formatting for 2D data (see the sketch below)
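As a hedged sketch of the table-formatting point above, `boltons.tableutils.Table` can render a list of dicts as text; the row data is hypothetical and the exact output layout may differ.

```python
from boltons.tableutils import Table

# Build a table from a list of dicts (one dict per row)
rows = [{'id': 1, 'name': 'Alice'}, {'id': 2, 'name': 'Bob'}]
table = Table.from_data(rows)

print(table.to_text())  # Plain-text grid; to_html() is also available
```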
## Design Principles

Per @boltons/docs/architecture.rst, each "bolton" must:

1. **Be pure-Python and self-contained**: No C extensions, minimal dependencies
2. **Perform a common task**: Address frequently needed functionality
3. **Mitigate stdlib insufficiency**: Fill gaps in the standard library
4. **Follow stdlib practices**: Balance best practice with pragmatism
5. **Include documentation**: At least one doctest, links to related tools
## Key Modules

### 1. **cacheutils** - Advanced Caching [@context7]

Better caching than `functools.lru_cache`:

```python
from boltons.cacheutils import LRU, ThresholdCounter, cached, cachedmethod

# LRU cache with size limit
cache = LRU(max_size=256)
cache['user:123'] = user_data

# Decorator with custom cache backend
@cached(cache={})
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

# Threshold counter - only track frequently occurring items
tc = ThresholdCounter(threshold=0.1)
tc.update([2] * 10)  # Only remembers items above 10% frequency
```

**When to use**: Need size-limited caches, TTL expiration, or custom eviction policies. A `cachedmethod` sketch follows.
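`cachedmethod` (imported above) is the method-level counterpart of `cached`; a minimal sketch, assuming it accepts the name of an instance attribute holding the cache, as described in the boltons docs. The lookup class is hypothetical.

```python
from boltons.cacheutils import LRU, cachedmethod

class UserLookup:
    def __init__(self):
        # Per-instance cache, referenced by attribute name below
        self.cache = LRU(max_size=128)

    @cachedmethod('cache')
    def fetch(self, user_id):
        print(f"fetching {user_id}")  # Runs once per distinct user_id
        return {'id': user_id}

lookup = UserLookup()
lookup.fetch(1)
lookup.fetch(1)  # Served from self.cache - no second "fetching" print
```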
### 2. **iterutils** - Enhanced Iteration [@context7]

Powerful iteration utilities beyond `itertools`:

```python
import time

from boltons.iterutils import (
    chunked, chunked_iter,    # Split into chunks
    windowed, windowed_iter,  # Sliding windows
    unique, unique_iter,      # Deduplicate preserving order
    one, first, same,         # Reduction utilities
    remap, get_path,          # Recursive data structure traversal
    backoff,                  # Exponential backoff with jitter
    pairwise                  # Overlapping pairs
)

# Chunking for batch processing
for batch in chunked(user_ids, 100):
    process_batch(batch)
# [1,2,3,4,5] with size=2 → [1,2], [3,4], [5]

# Sliding window for moving averages
for window in windowed(prices, 7):
    avg = sum(window) / len(window)
# [1,2,3,4,5] with size=3 → [1,2,3], [2,3,4], [3,4,5]

# Safe reduction
user = one(users)  # Raises if != 1 item
first_or_none = first(results, default=None)

# Recursive data structure traversal
# visit returns True to keep an item, False to drop it, or a (key, value) pair
def visit(path, key, value):
    if isinstance(key, str) and 'secret' in key.lower():
        return key, '***REDACTED***'
    return True

clean_data = remap(user_data, visit=visit)

# Exponential backoff with jitter
for wait_time in backoff(start=0.1, stop=60, count=5, jitter=True):
    if try_operation():
        break
    time.sleep(wait_time)
```

**When to use**: Batch processing, sliding windows, recursive data transformation, retry logic.
### 3. **tbutils** - Enhanced Tracebacks [@context7]

Better exception handling and debugging:

```python
from boltons.tbutils import TracebackInfo, ExceptionInfo, ParsedException

try:
    risky_operation()
except Exception:
    # Capture full traceback info
    exc_info = ExceptionInfo.from_current()

    # Access structured traceback data (frames are Callpoint objects)
    tb_info = TracebackInfo.from_current()
    for frame in tb_info.frames:
        print(f"{frame.module_path}:{frame.lineno} in {frame.func_name}")

    # Format for logging
    formatted = exc_info.get_formatted()
    logger.error(formatted)
```

**When to use**: Enhanced error logging, debugging tools, error analysis.
### 4. **fileutils** - Safe File Operations [@context7]

Atomic writes and safe file handling:

```python
import json
import os

from boltons.fileutils import atomic_save, mkdir_p, FilePerms

# Atomic file write (write-to-temp, rename); text_mode since json.dump writes str
with atomic_save('config.json', text_mode=True) as f:
    json.dump(config, f)
# File only replaced if write succeeds

# Create directory path (like mkdir -p)
mkdir_p('/path/to/nested/directory')

# Readable permission management (rwx strings instead of raw octal)
perms = FilePerms(user='rwx', group='rx', other='rx')  # equivalent to 0o755
os.chmod('/path/to/script.sh', int(perms))
```

**When to use**: Configuration files, data persistence, safe concurrent writes.
### 5. **dictutils** - Advanced Dictionaries [@context7]

Enhanced dictionary types:

```python
from boltons.dictutils import OrderedMultiDict, OMD

# Preserve order + allow duplicate keys (like HTTP headers)
headers = OMD([
    ('Accept', 'application/json'),
    ('Accept', 'text/html'),  # Multiple values for same key
    ('User-Agent', 'MyBot/1.0')
])

for accept in headers.getlist('Accept'):
    print(accept)  # application/json, text/html
```

**When to use**: HTTP headers, query parameters, configuration with duplicate keys.
### 6. **strutils** - String Utilities [@github/README.md]

Common string operations:

```python
from boltons.strutils import (
    slugify,        # URL-safe slugs
    bytes2human,    # Human-readable byte sizes
    find_hashtags,  # Extract #hashtags
    pluralize,      # Smart pluralization
    strip_ansi      # Remove ANSI codes
)

slugify("Hello, World!")              # "hello_world" (default delimiter is "_")
slugify("Hello, World!", delim="-")   # "hello-world"
bytes2human(1234567, ndigits=1)       # e.g. "1.2M"
```
### 7. **queueutils** - Priority Queues [@context7]

Enhanced queue types:

```python
from boltons.queueutils import HeapPriorityQueue, PriorityQueue

pq = HeapPriorityQueue()
pq.add("routine task", priority=1)
pq.add("urgent task", priority=10)
item = pq.pop()  # "urgent task" - higher priority values are served first
```
## Integration Patterns

### Full Install

```bash
pip install boltons
```

### Import Individual Modules

```python
# Import only what you need
from boltons.cacheutils import LRU
from boltons.iterutils import chunked
from boltons.fileutils import atomic_save
```

### Vendoring (Copy Into Project)

Since boltons has **zero dependencies** and each module is **independent**:

```bash
# Copy specific module
cp /path/to/site-packages/boltons/iterutils.py myproject/utils/

# Copy entire package
cp -r /path/to/site-packages/boltons myproject/vendor/
```

This is explicitly supported by the project design [@context7/architecture.rst].
## Real-World Usage Examples [@github/search]

### Example 1: Clastic Web Framework [@mahmoud/clastic]

```python
# Enhanced traceback handling
from boltons.tbutils import ExceptionInfo, TracebackInfo

class ErrorMiddleware:
    def handle_error(self, exc):
        exc_info = ExceptionInfo.from_current()
        return self.render_error_page(exc_info.get_formatted())
```

### Example 2: Click-Extra CLI Framework [@kdeldycke/click-extra]

```python
# Enhanced traceback formatting for CLI error messages
from boltons.tbutils import print_exception

try:
    run_command()
except Exception:
    print_exception()  # Beautiful formatted traceback
```

### Example 3: Reader Feed Library [@lemon24/reader]

```python
# Type checking utilities
from boltons.typeutils import make_sentinel

NOT_SET = make_sentinel('NOT_SET')  # Better than None for defaults
```

### Example 4: Batch Processing Pattern

```python
from boltons.iterutils import chunked

# Process database records in batches
for batch in chunked(fetch_all_records(), 1000):
    bulk_insert(batch)
    db.commit()
```

### Example 5: API Rate Limiting

```python
import time

import requests

from boltons.iterutils import backoff

def call_api_with_retry(endpoint):
    # Exponential backoff for API retries
    for wait in backoff(start=0.1, stop=60, count=5):
        try:
            resp = requests.get(endpoint)
            resp.raise_for_status()  # Raises HTTPError on 4xx/5xx
            return resp
        except requests.HTTPError as e:
            if e.response.status_code == 429:  # Rate limited - wait and retry
                time.sleep(wait)
            else:
                raise
```
## Python Version Compatibility

- **Minimum**: Python 3.7
- **Maximum Tested**: Python 3.13
- **Also Tested**: PyPy3
- **3.11-3.14 Status**: Fully compatible (tested 3.11, 3.12, 3.13)

Per @github/README.md:

> Boltons is tested against Python 3.7-3.13, as well as PyPy3.
## When to Use Boltons

### Use Boltons When

1. **Need stdlib-style utilities with no dependencies**
   - Building libraries that avoid dependencies
   - Corporate environments with strict dependency policies
   - Want vendorable, copy-pasteable code

2. **Iteration patterns beyond itertools**
   - Chunking/batching data
   - Sliding windows
   - Recursive data structure traversal
   - Exponential backoff

3. **Enhanced caching needs**
   - Size-limited LRU caches
   - TTL expiration
   - Custom eviction policies
   - Better API than `functools.lru_cache`

4. **Atomic file operations**
   - Safe configuration file updates
   - Preventing corrupted writes
   - Concurrent file access

5. **Advanced debugging**
   - Structured traceback information
   - Custom error formatting
   - Error analysis tools

6. **OrderedMultiDict needs**
   - HTTP headers/query parameters
   - Configuration with duplicate keys
   - Preserving insertion order + duplicates

### Use Standard Library When

1. **Basic iteration**: `itertools` suffices
2. **Simple caching**: `functools.lru_cache` is enough
3. **Basic file ops**: `pathlib` and `shutil` work fine
4. **Standard dicts**: `dict` or `collections.OrderedDict` meets needs

### Use more-itertools When

- Need even more specialized iteration utilities
- Already using `more-itertools` in project
- Want community recipes from itertools docs

**Key Difference**: Boltons is broader (files, caching, debugging) while `more-itertools` focuses purely on iteration.
## Decision Matrix

| Scenario | Use Boltons | Use Stdlib | Use Alternative |
| --- | --- | --- | --- |
| LRU cache with size limits | ✅ `cacheutils.LRU` | ⚠️ `lru_cache` (no TTL or custom eviction) | `cachetools` (more features) |
| Chunked iteration | ✅ `iterutils.chunked` | ❌ Manual slicing | `more-itertools.chunked` |
| Atomic file writes | ✅ `fileutils.atomic_save` | ❌ Manual temp+rename | `atomicwrites` (archived) |
| Enhanced tracebacks | ✅ `tbutils.TracebackInfo` | ❌ `traceback` (basic) | `rich.traceback` (prettier) |
| OrderedMultiDict | ✅ `dictutils.OMD` | ❌ Custom solution | `werkzeug.datastructures` |
| Exponential backoff | ✅ `iterutils.backoff` | ❌ Manual implementation | `tenacity`, `backoff` |
| URL parsing | ✅ `urlutils.URL` | ⚠️ `urllib.parse` (basic) | `yarl`, `furl` |
| Zero dependencies | ✅ Pure Python | ✅ Built-in | ❌ Most alternatives |

The `urlutils.URL` row is the one module not demonstrated elsewhere in this document; a sketch follows.
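A hedged sketch of `boltons.urlutils.URL` from the row above; the attribute names follow the boltons docs, and the example URL is made up.

```python
from boltons.urlutils import URL

url = URL('https://example.com/search?q=boltons&page=2')

print(url.scheme)             # 'https'
print(url.host)               # 'example.com'
print(url.path)               # '/search'
print(url.query_params['q'])  # 'boltons' (query_params is an OMD)

url.query_params['page'] = '3'
print(url.to_text())  # 'https://example.com/search?q=boltons&page=3'
```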
## When NOT to Use Boltons

1. **Already using specialized libraries**
   - Have `cachetools` for advanced caching
   - Have `tenacity` for retry logic
   - Have `rich` for pretty output

2. **Need high-performance implementations**
   - Boltons prioritizes correctness over speed
   - C-extension alternatives may be faster

3. **Want cutting-edge features**
   - Boltons is conservative, stdlib-like
   - Specialized libraries may innovate faster

4. **Framework-specific needs**
   - Django/Flask have their own utils
   - Web frameworks provide similar functionality
## Maintenance and Stability

- **Versioning**: CalVer (YY.MINOR.MICRO) [@github/README.md]
- **Latest**: 25.0.0 (February 2025)
- **Maintenance**: Active, 71 open issues, 373 forks
- **Author**: Mahmoud Hashemi (@mahmoud)
- **License**: BSD (permissive)
## Related Libraries

### Complementary

- **more-itertools**: Extended iteration recipes
- **toolz/cytoolz**: Functional programming utilities
- **attrs/dataclasses**: Enhanced class definitions

### Overlapping

- **cachetools**: More advanced caching (but has dependencies)
- **atomicwrites**: Atomic file writes (now archived)
- **werkzeug**: Web utilities including MultiDict

### When to Combine

```python
# Use both boltons and more-itertools
from boltons.cacheutils import LRU, cached  # For caching
from boltons.iterutils import chunked       # For chunking
from more_itertools import flatten          # For flattening

cache = LRU(max_size=1000)

# Note: cached arguments must be hashable, so records is passed as a tuple
@cached(cache=cache)
def process_data(records: tuple):
    return [process_batch(batch) for batch in chunked(records, 100)]
```
## Key Takeaways

1. **Zero Dependencies**: Pure-Python, no external requirements
2. **Vendorable**: Copy individual modules into your project
3. **Battle-Tested**: 6,765+ stars, production-proven
4. **Stdlib Philosophy**: Familiar API, conservative design
5. **Broad Coverage**: Caching, iteration, files, debugging, data structures
6. **Production Ready**: Python 3.7-3.13, PyPy3 support
## Quick Start

```python
# Install: pip install boltons
import json
import logging

from boltons.cacheutils import LRU
from boltons.iterutils import chunked, windowed, backoff
from boltons.fileutils import atomic_save
from boltons.tbutils import ExceptionInfo

logger = logging.getLogger(__name__)

# LRU cache
cache = LRU(max_size=256)

# Batch processing
for batch in chunked(items, 100):
    process(batch)

# Atomic writes
with atomic_save('data.json', text_mode=True) as f:
    json.dump(data, f)

# Enhanced error handling
try:
    risky()
except Exception:
    exc_info = ExceptionInfo.from_current()
    logger.error(exc_info.get_formatted())
```
## References

- **Repository**: [@github/mahmoud/boltons](https://github.com/mahmoud/boltons)
- **Documentation**: [@readthedocs](https://boltons.readthedocs.io/)
- **PyPI**: [@pypi/boltons](https://pypi.org/project/boltons/)
- **Context7**: [@context7/mahmoud/boltons](/mahmoud/boltons)
- **Architecture**: [@readthedocs/architecture](https://boltons.readthedocs.io/en/latest/architecture.html)

---

_Research completed: 2025-10-21_
_Sources: Context7, GitHub, PyPI, ReadTheDocs, Exa code search_
_Trust Score: 9.8/10 (Context7)_
683
skills/python3-development/references/modern-modules/box.md
Normal file
@@ -0,0 +1,683 @@
---
title: "python-box: Advanced Python Dictionaries with Dot Notation Access"
library_name: python-box
pypi_package: python-box
category: data_structures
python_compatibility: "3.9+"
last_updated: "2025-11-02"
official_docs: "https://github.com/cdgriffith/Box/wiki"
official_repository: "https://github.com/cdgriffith/Box"
maintenance_status: "active"
---
# python-box: Advanced Python Dictionaries with Dot Notation Access

## Overview

python-box extends Python's built-in dictionary with dot notation access and powerful configuration management features. It provides a transparent drop-in replacement for standard dicts while adding recursive dot notation, automatic type conversion, and seamless serialization to/from JSON, YAML, TOML, and msgpack formats.

**Official Repository:** @<https://github.com/cdgriffith/Box>
**Documentation:** @<https://github.com/cdgriffith/Box/wiki>
**PyPI Package:** `python-box`
**License:** MIT
**Maintained By:** Chris Griffith (@cdgriffith)
## Core Purpose

### Problem Box Solves

Without python-box, working with nested dictionaries requires verbose bracket notation:

```python
# Standard dict - verbose and error-prone
config = {
    "database": {
        "host": "localhost",
        "port": 5432,
        "credentials": {
            "username": "admin",
            "password": "secret"
        }
    }
}

# Accessing nested values - clunky syntax
db_host = config["database"]["host"]
db_user = config["database"]["credentials"]["username"]

# KeyError if key doesn't exist
try:
    timeout = config["database"]["timeout"]  # KeyError!
except KeyError:
    timeout = 30
```

With python-box, you get clean dot notation and safe defaults:

```python
from box import Box

config = Box({
    "database": {
        "host": "localhost",
        "port": 5432,
        "credentials": {
            "username": "admin",
            "password": "secret"
        }
    }
})

# Clean dot notation access
db_host = config.database.host
db_user = config.database.credentials.username

# Safe access with defaults (using the default_box option)
config = Box(config, default_box=True)
timeout = config.database.timeout or 30  # No KeyError
```
### When You're Reinventing the Wheel

You should use python-box when you find yourself:

1. **Writing custom attribute access wrappers** for dictionaries
2. **Implementing recursive dictionary-to-object converters**
3. **Manually sanitizing dictionary keys** to make them Python-safe
4. **Writing boilerplate** for JSON/YAML configuration loading
5. **Creating frozen/immutable configuration objects** from dicts
6. **Implementing safe nested dictionary access** with try/except blocks
## Installation

```bash
# Basic installation (no serialization dependencies)
pip install python-box~=7.0

# With all dependencies (YAML, TOML, msgpack)
pip install python-box[all]~=7.0

# With specific dependencies
pip install python-box[yaml]~=7.0     # PyYAML or ruamel.yaml
pip install python-box[toml]~=7.0     # tomli/tomli-w
pip install python-box[msgpack]~=7.0  # msgpack

# Optimized version with Cython (requires build tools)
pip install Cython wheel
pip install python-box[all]~=7.0 --force
```

**Version Pinning:** Always use compatible release matching (`~=7.0`) as Box follows semantic versioning. Check @<https://github.com/cdgriffith/Box/wiki/Major-Version-Breaking-Changes> before upgrading major versions.

## Python Version Compatibility

- **Minimum:** Python 3.9
- **Supported:** Python 3.9, 3.10, 3.11, 3.12, 3.13
- **Dropped Support:** Python 3.8 (removed in v7.3.0, EOL)
- **Python 3.14:** Expected compatibility (based on current trajectory)

**Cython Optimization:** Available for x86_64 platforms. Loading large datasets can be up to 10x faster with the Cython-compiled version.
## Core Features & Usage Examples

### 1. Basic Box Usage

```python
from box import Box

# Create from dict
movie_box = Box({
    "Robin Hood: Men in Tights": {
        "imdb_stars": 6.7,
        "length": 104
    }
})

# Automatic key conversion for dot notation
# Spaces become underscores, special chars removed
movie_box.Robin_Hood_Men_in_Tights.imdb_stars  # 6.7

# Standard dict access still works
movie_box["Robin Hood: Men in Tights"]["length"]  # 104

# Both are equivalent
assert movie_box.Robin_Hood_Men_in_Tights.imdb_stars == \
    movie_box["Robin Hood: Men in Tights"]["imdb_stars"]
```
### 2. Configuration Management with ConfigBox

```python
import os

from box import ConfigBox

# Load environment-specific configuration
config_data = {
    "development": {
        "database": {
            "host": "localhost",
            "port": 5432,
            "pool_size": 5
        },
        "debug": True
    },
    "production": {
        "database": {
            "host": "prod-db.server.com",
            "port": 5432,
            "pool_size": 20
        },
        "debug": False
    }
}

# Select environment
env = os.getenv("APP_ENV", "development")
config = ConfigBox(config_data[env])

print(f"Database Host for {env}: {config.database.host}")
print(f"Pool Size: {config.database.pool_size}")
print(f"Debug Mode: {config.debug}")
```

ConfigBox's main addition over plain Box is typed getters, sketched below.
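The typed getters coerce string values, which is handy when settings arrive from environment variables or INI-style files; a hedged sketch, assuming the `.bool()`/`.int()` getters with an optional default as documented in the Box wiki.

```python
from box import ConfigBox

raw = ConfigBox({"debug": "true", "port": "5432", "workers": "4"})

# Typed getters coerce the string values (second argument is a default)
print(raw.bool('debug'))          # True
print(raw.int('port'))            # 5432
print(raw.int('max_retries', 3))  # 3 - key missing, default returned
```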
### 3. JSON/YAML/TOML Serialization

```python
from box import Box

# From JSON
config = Box.from_json(filename="config.json")

# From YAML
config = Box.from_yaml(filename="config.yaml")

# From TOML
config = Box.from_toml(filename="config.toml")

# To JSON
config.to_json(filename="output.json", indent=2)

# To YAML
config.to_yaml(filename="output.yaml")

# To dict (for standard JSON serialization)
import json
json.dumps(config.to_dict())
```
### 4. Default Box for Safe Access

```python
from box import Box

# Create with default values enabled
config = Box(default_box=True)

# Access non-existent nested keys safely
# Instead of KeyError, intermediate lookups create empty Box objects
config.api.endpoints.users = "/api/v1/users"
config.api.endpoints.posts = "/api/v1/posts"

# Check existence (an empty Box is falsy)
if config.cache.enabled:
    print("Cache is enabled")
else:
    print("Cache not configured")  # This prints
```
### 5. Frozen Box for Immutability

```python
from box import Box

# Create mutable box
config = Box({"debug": True, "timeout": 30})
config.debug = False  # Allowed

# Create a frozen copy via the frozen_box flag
frozen_config = Box(config, frozen_box=True)

# Attempts to modify raise BoxError
try:
    frozen_config.debug = False
except Exception as e:
    print(f"Error: {e}")  # BoxError: Box is frozen
```
### 6. Box Variants

```python
from box import Box, BoxList

# camel_killer_box - converts camelCase keys to snake_case attributes
config = Box({"apiEndpoint": "https://api.example.com"}, camel_killer_box=True)
config.api_endpoint  # Works!

# BoxList - list of Box objects
users = BoxList([
    {"name": "Alice", "age": 30},
    {"name": "Bob", "age": 25}
])
users[0].name  # "Alice"
users[1].age   # 25

# box_dots - Box with dots in keys
config = Box({"api.version": "v2"}, box_dots=True)
config["api.version"]  # Access with dots in key
```
## Real-World Usage Patterns

### Pattern 1: Application Configuration

```python
# config/settings.py
import os
from pathlib import Path

from box import ConfigBox

def load_config(env: str = "development") -> ConfigBox:
    """Load environment-specific configuration."""
    config_path = Path(__file__).parent / f"{env}.yaml"
    return ConfigBox.from_yaml(filename=config_path)

# usage
config = load_config(os.getenv("ENVIRONMENT", "development"))
db_url = f"postgresql://{config.database.host}:{config.database.port}"
```
### Pattern 2: API Response Handling

```python
# Instead of dealing with nested dicts from API responses
import requests

from box import Box

response = requests.get("https://api.example.com/user/123")
user_data = Box(response.json())

# Clean access to nested data
print(f"User: {user_data.profile.name}")
print(f"Email: {user_data.contact.email}")
print(f"Company: {user_data.employment.company.name}")

# vs traditional dict access:
# print(f"User: {response.json()['profile']['name']}")
```
### Pattern 3: Argparse Integration

```python
import argparse

from box import Box

parser = argparse.ArgumentParser()
parser.add_argument('floats', metavar='N', type=float, nargs='+')
parser.add_argument("-v", "--verbosity", action="count", default=0)

# Parse into Box instead of Namespace
args = parser.parse_args(['1', '2', '3', '-vv'], namespace=Box())

# Can now use as dict or object
print(args.floats)     # [1.0, 2.0, 3.0]
print(args.verbosity)  # 2

# Easy to pass as kwargs
def process(**kwargs):
    print(kwargs)

process(**args.to_dict())
```
## Integration Patterns

### JSON Configuration Files

```python
from box import Box

# config.json
# {
#   "app": {
#     "name": "MyApp",
#     "version": "1.0.0"
#   },
#   "features": {
#     "auth": true,
#     "cache": false
#   }
# }

config = Box.from_json(filename="config.json")
if config.features.auth:
    setup_authentication()
```

### YAML Configuration Files

```python
from box import Box

# config.yaml
# database:
#   host: localhost
#   port: 5432
#   credentials:
#     username: admin
#     password: secret

config = Box.from_yaml(filename="config.yaml")
db_conn = connect(
    host=config.database.host,
    port=config.database.port,
    user=config.database.credentials.username,
    password=config.database.credentials.password
)
```

### TOML Configuration Files

```python
from box import Box

# pyproject.toml or config.toml
# [tool.myapp]
# name = "MyApp"
# version = "1.0.0"
#
# [tool.myapp.database]
# host = "localhost"
# port = 5432

config = Box.from_toml(filename="pyproject.toml")
app_name = config.tool.myapp.name
db_host = config.tool.myapp.database.host
```
## When NOT to Use python-box

### 1. Performance-Critical Code

```python
# DON'T use Box in tight loops or performance hotspots
# Box has overhead for attribute access and conversion

# Bad: Hot loop with Box
results = Box()
for i in range(1_000_000):
    results[f"key_{i}"] = compute_value(i)  # Overhead!

# Good: Use regular dict, convert after if needed
results = {}
for i in range(1_000_000):
    results[f"key_{i}"] = compute_value(i)
results = Box(results)  # Convert once
```

### 2. When Dict Protocol is Required

```python
# Some libraries expect plain dict instances
import json

from box import Box

config = Box({"key": "value"})

# Box subclasses dict, but some encoders and C extensions
# handle subclasses differently.
# Use .to_dict() to convert back to a plain dict.
json.dumps(config.to_dict())  # Safe
```

### 3. Simple, Flat Dictionaries

```python
# DON'T use Box for simple flat dicts without nesting
# Regular dict is simpler and faster

# Overkill
simple = Box({"name": "Alice", "age": 30})
print(simple.name)

# Better
simple = {"name": "Alice", "age": 30}
print(simple["name"])
```

### 4. When Key Names Match Python Keywords

```python
# Be careful with Python keywords as attributes
from box import Box

# This works but is awkward
data = Box({"class": "A", "type": "object"})
data["class"]  # Must use bracket notation
# data.class  # SyntaxError!

# Better: Use regular dict or rename keys
data = {"class_name": "A", "type_name": "object"}
```
## Decision Matrix: Box vs dict vs dataclass

| Scenario | Use Box | Use dict | Use dataclass |
| --- | --- | --- | --- |
| **Configuration files** (JSON/YAML) | ✅ Excellent | ❌ Verbose | ⚠️ Needs validation |
| **API response handling** | ✅ Excellent | ❌ Verbose | ❌ Schema unknown |
| **Nested data structures** | ✅ Excellent | ⚠️ Works but verbose | ✅ Good with nesting |
| **Type checking/IDE support** | ❌ Dynamic only | ❌ Dynamic only | ✅ Full typing |
| **Performance critical code** | ❌ Overhead | ✅ Fastest | ✅ Fast |
| **Immutable configuration** | ✅ frozen_box | ❌ No built-in | ✅ frozen=True |
| **Dynamic key names** | ✅ Flexible | ✅ Flexible | ❌ Fixed attrs |
| **Need serialization helpers** | ✅ Built-in | ⚠️ Manual | ⚠️ Manual |
| **Simple flat structures** | ⚠️ Overkill | ✅ Perfect | ✅ Good |
| **Unknown data structure** | ✅ Flexible | ✅ Flexible | ❌ Needs schema |
## Decision Guidance

### Use Box When

1. **Working with configuration files** (YAML, JSON, TOML)
2. **Handling nested API responses** with deep structures
3. **You want cleaner dot notation** instead of brackets
4. **Converting between dict and JSON/YAML frequently**
5. **Need automatic nested dict conversion**
6. **Working with data from external sources** (APIs, config files)
7. **Prototyping or rapid development** where flexibility matters

### Use dict When

1. **Performance is critical** (tight loops, hot paths)
2. **Simple, flat data structures**
3. **Working with libraries expecting strict dict protocol**
4. **You need maximum compatibility** with standard library
5. **Memory efficiency is paramount** (minimal overhead)

### Use dataclass When

1. **Type safety and IDE autocomplete** are critical
2. **Data structure is well-defined and stable**
3. **You want validation** (with pydantic or attrs)
4. **Building APIs or libraries** with clear contracts
5. **Need immutability** with frozen=True
6. **Working in type-checked codebases** (mypy, pyright)
## Example Projects Using python-box

Based on GitHub code search @<https://github.com/search?q=%22from+box+import+Box%22&type=code>, python-box is commonly used in:

1. **Machine Learning/AI Projects**
   - Configuration management for model training
   - Hyperparameter storage
   - Experiment tracking configurations

2. **Web Applications**
   - Flask/FastAPI configuration handling
   - API response processing
   - Environment-specific settings

3. **Data Science**
   - Notebook configuration management
   - Dataset metadata handling
   - Pipeline configurations

4. **DevOps/Infrastructure**
   - Terraform/Ansible configuration processing
   - CI/CD pipeline configurations
   - Container orchestration configs
## Performance Considerations

### Cython Optimization

```bash
# For x86_64 platforms, install with Cython for ~10x faster loading
pip install Cython wheel
pip install python-box[all]~=7.0 --upgrade --force

# For non-x86_64, you'll need:
# - Python development files (python3-dev/python3-devel)
# - System compiler (gcc, clang)
# - Cython and wheel packages
```

### Memory vs Convenience Trade-off

```python
import sys

from box import Box

# Regular dict
data = {"key": "value"}
print(sys.getsizeof(data))  # ~240 bytes

# Box wrapper
box_data = Box({"key": "value"})
print(sys.getsizeof(box_data))  # Similar shallow size; the real cost is
                                # per-access conversion work, not raw bytes

# For large datasets, build a plain dict first, then convert once
large_data = {}
for i in range(10000):
    large_data[f"key_{i}"] = process_data(i)

config = Box(large_data)  # Convert once after collection
```
## Common Pitfalls & Solutions
|
||||
|
||||
### Pitfall 1: Attribute vs Key Confusion
|
||||
|
||||
```python
|
||||
from box import Box
|
||||
|
||||
config = Box({"class": "A", "type": "B"})
|
||||
|
||||
# Problem: Python keywords can't be attributes
|
||||
# config.class # SyntaxError!
|
||||
|
||||
# Solution: Use bracket notation
|
||||
config["class"] # Works
|
||||
|
||||
# Or rename keys during creation
|
||||
config = Box({"class_name": "A", "type_name": "B"})
|
||||
config.class_name # Works
|
||||
```
|
||||
|
||||
### Pitfall 2: Modification of Frozen Box

```python
from box import Box

# A frozen box prevents all modifications
config = Box({"debug": True}, frozen_box=True)

# These all fail with BoxError
# config.debug = False
# config.new_key = "value"
# config["debug"] = False

# Solution: create an unfrozen copy
mutable_config = Box(config.to_dict())
mutable_config.debug = False  # Works
```
### Pitfall 3: Conversion Overhead

```python
from box import Box

# Problem: creating a Box inside a tight loop pays the conversion
# cost on every iteration (the "name" key is illustrative)
def process_items(items):
    results = []
    for item in items:
        item_box = Box(item)  # overhead per iteration!
        results.append(item_box.name)
    return results


# Solution: skip Box in the hot path and use plain dict access,
# or convert the whole collection once up front
def process_items_better(items):
    return [item["name"] for item in items]
```
## Version History & Breaking Changes

- **v7.3.2** (2025-01-16): Latest stable release
  - Bug fixes for box_dots and default_box_create_on_get
- **v7.3.0** (2024-12-10): Python 3.13 support added
  - Dropped Python 3.8 support (EOL)
- **v7.2.0** (2024-06-12): Python 3.12 support
  - Numpy-style tuple indexing for BoxList
- **v7.0.0**: Major version with breaking changes

**Breaking Changes:** @<https://github.com/cdgriffith/Box/wiki/Major-Version-Breaking-Changes>

Always check the release notes before upgrading across major versions.
## Related Libraries & Alternatives

| Library                   | Use Case                      | vs python-box                       |
| ------------------------- | ----------------------------- | ----------------------------------- |
| **types.SimpleNamespace** | Simple attribute access       | Built-in, but no dict methods       |
| **munch**                 | Dot-notation dict             | Fewer features; unmaintained        |
| **addict**                | Dict subclass with dot access | Similar, less popular               |
| **pydantic**              | Validated data structures     | Type-safe, validation, more complex |
| **attrs/dataclasses**     | Structured data               | Type-safe, but not for dynamic data |
| **DynaBox**               | Similar to Box                | Less mature                         |

**When to use Box over alternatives:**

- Need dict compatibility plus dot notation
- Working with JSON/YAML config files
- Don't need static type checking
- Want automatic nested conversion
## Additional Resources

- **Official Wiki:** @<https://github.com/cdgriffith/Box/wiki>
- **Quick Start:** @<https://github.com/cdgriffith/Box/wiki/Quick-Start>
- **Types of Boxes:** @<https://github.com/cdgriffith/Box/wiki/Types-of-Boxes>
- **Converters:** @<https://github.com/cdgriffith/Box/wiki/Converters>
- **Installation Guide:** @<https://github.com/cdgriffith/Box/wiki/Installation>
- **PyPI Package:** @<https://pypi.org/project/python-box/>
- **GitHub Issues:** @<https://github.com/cdgriffith/Box/issues>

## Contributing & Support

**Maintainer:** Chris Griffith (@cdgriffith)
**Contributors:** @<https://github.com/cdgriffith/Box/blob/master/AUTHORS.rst>
**Issues/Questions:** @<https://github.com/cdgriffith/Box/issues>

The library is actively maintained with regular releases and responsive issue handling.

---

**Research Sources:**

- @<https://github.com/cdgriffith/Box> (Official Repository)
- @<https://github.com/cdgriffith/Box/wiki> (Official Documentation)
- @<https://pypi.org/project/python-box/> (Package Registry)
- @<https://medium.com/@post.gourang/simplifying-configuration-management-in-python-with-configbox-90df67d26bce> (Tutorial)
- GitHub Code Search for real-world usage examples

**Last Updated:** 2025-10-21
**Research Quality:** High - based on official documentation, source code analysis, and real-world usage patterns
---
title: "Copier: Project Template Renderer with Update Capabilities"
library_name: copier
pypi_package: copier
category: project_templating
python_compatibility: "3.9+"
last_updated: "2025-11-02"
official_docs: "https://copier.readthedocs.io"
official_repository: "https://github.com/copier-org/copier"
maintenance_status: "active"
---

# Copier: Project Template Renderer with Update Capabilities

## Executive Summary

**What problem does it solve?** Copier solves the problem of project scaffolding AND ongoing template synchronization. Unlike most templating tools, which are one-way generators, Copier enables **code lifecycle management** - you can update existing projects when the template evolves, not just generate new projects.

**Core value proposition:**

- Generate projects from templates (scaffolding)
- **Update projects when templates change** (unique feature)
- Version-aware migrations during updates
- Works with local paths and Git URLs
- Preserves customizations during updates

**When you'd be "reinventing the wheel" without it:**

- Maintaining multiple similar projects that need to stay in sync with best practices
- Rolling out security updates or dependency changes across many projects
- Applying organizational standards to existing codebases
- Managing project boilerplate that evolves over time

## Official Information

- **Repository**: @<https://github.com/copier-org/copier>
- **PyPI Package**: `copier` (current: v9.10.3)
- **Documentation**: @<https://copier.readthedocs.io/>
- **License**: MIT
- **Maintenance**: Active development, 2,880+ stars
- **Original Author**: jpsca (Juan-Pablo Scaletti)
- **Current Maintainers**: yajo, pawamoy, sisp, and community

## Installation

```bash
# As a CLI tool (recommended)
pipx install copier
# or with uv
uv tool install copier

# As a library
pip install copier

# With conda
conda install -c conda-forge copier

# Nix (100% reproducible)
nix profile install 'https://flakehub.com/f/copier-org/copier/*.tar.gz'

# Homebrew (macOS/Linux)
brew install copier
```
**Requirements:**

- Python 3.9 or newer
- Git 2.27 or newer (for template versioning and updates)

## Python Version Compatibility

| Python Version | Support Status       | Notes                                          |
| -------------- | -------------------- | ---------------------------------------------- |
| 3.9 - 3.12     | ✅ Full support      | Production ready                               |
| 3.13           | ✅ Supported         | v9.10.2+ built with Python 3.13.7              |
| 3.14           | ⚠️ Likely compatible | Not explicitly tested, but backward-compatible |
| < 3.9          | ❌ Not supported     | Use older Copier versions                      |

_Source: @<https://github.com/copier-org/copier/blob/master/pyproject.toml> (classifiers section)_
## Core Purpose: When to Use Copier

### Primary Use Cases

1. **Project Scaffolding with Future Updates**
   - Generate new projects from templates
   - Apply template updates to existing projects
   - Track which template version each project uses

2. **Multi-Project Standardization**
   - Maintain consistency across microservices
   - Roll out organization-wide best practices
   - Synchronize CI/CD configurations

3. **Living Templates**
   - Templates that evolve with ecosystem changes
   - Security patches propagated to all projects
   - Dependency updates across project families

4. **Template Versioning**
   - Use Git tags to version templates
   - Selective updates to specific versions
   - Smart diff between template versions

### Copier vs Cookiecutter vs Yeoman

**Use Copier when:**

- ✅ You need to update projects after generation
- ✅ You manage multiple similar projects
- ✅ Your template evolves frequently
- ✅ You want migration scripts during updates
- ✅ You prefer YAML over JSON configuration

**Use Cookiecutter when:**

- ✅ You only need one-time generation
- ✅ You want the largest template ecosystem
- ✅ You need maximum stability (mature project)
- ✅ Template updates aren't important

**Use Yeoman when:**

- ✅ You're in the Node.js ecosystem
- ✅ You want NPM package distribution
- ✅ You need JavaScript-based logic

| Feature                  | Copier                  | Cookiecutter           | Yeoman      |
| ------------------------ | ----------------------- | ---------------------- | ----------- |
| **Template Updates**     | ✅ Yes                  | ❌ No (requires Cruft) | ❌ No       |
| **Migrations**           | ✅ Yes                  | ❌ No                  | ❌ No       |
| **Config Format**        | YAML                    | JSON                   | JavaScript  |
| **Templating**           | Jinja2                  | Jinja2                 | EJS         |
| **Programming Required** | ❌ No                   | ❌ No                  | ✅ Yes (JS) |
| **Template Suffix**      | `.jinja` (configurable) | None                   | You choose  |
| **File Name Templating** | ✅ Yes                  | ✅ Yes                 | ✅ Yes      |
| **Ecosystem Size**       | Medium                  | Large                  | Large       |
| **Maturity**             | Active                  | Mature                 | Mature      |

_Source: @<https://github.com/copier-org/copier/blob/master/docs/comparisons.md>_
## Real-World Examples

### 1. FastAPI Full-Stack Template

**Repository**: @<https://github.com/fastapi/full-stack-fastapi-template> (38,000+ stars)

```bash
# Generate a new FastAPI project
pipx run copier copy https://github.com/fastapi/full-stack-fastapi-template my-project --trust
```

**Features demonstrated:**

- Multi-service Docker setup
- PostgreSQL integration
- React frontend scaffolding
- Environment variable templating
- Post-generation tasks

**Template snippet** (@<https://github.com/fastapi/full-stack-fastapi-template/blob/main/copier.yml>):

```yaml
project_name:
  type: str
  help: The name of the project
  default: FastAPI Project

secret_key:
  type: str
  help: |
    The secret key for the project, generate with:
    python -c "import secrets; print(secrets.token_urlsafe(32))"
  default: changethis

_tasks:
  - ["{{ _copier_python }}", .copier/update_dotenv.py]
```

### 2. Modern Python Package Template (copier-uv)

**Repository**: @<https://github.com/pawamoy/copier-uv> (108 stars)

```bash
# Create a uv-managed Python package
copier copy gh:pawamoy/copier-uv /path/to/project
```

**Features demonstrated:**

- uv package manager integration
- Jinja extensions (custom filters)
- Git integration (auto-detect author)
- License selection (20+ options)
- Multi-file configuration includes

**Advanced configuration** (@<https://github.com/pawamoy/copier-uv/blob/main/copier.yml>):

```yaml
_min_copier_version: "9"
_jinja_extensions:
  - copier_template_extensions.TemplateExtensionLoader
  - extensions.py:CurrentYearExtension
  - extensions.py:GitExtension
  - extensions.py:SlugifyExtension

author_fullname:
  type: str
  help: Your full name
  default: "{{ 'Default Name' | git_user_name }}"

repository_name:
  type: str
  default: "{{ project_name | slugify }}"
```

### 3. NLeSC Scientific Python Template

**Repository**: @<https://github.com/NLeSC/python-template> (223 stars)

**Features demonstrated:**

- Modular configuration (YAML includes)
- Profile-based generation
- Research software best practices
- Citation files (CITATION.cff)

**Modular structure** (@<https://github.com/NLeSC/python-template/blob/main/copier.yml>):

```yaml
# Include pattern for maintainability
!include copier/settings.yml
!include copier/profiles.yml
!include copier/questions/essential.yml
!include copier/questions/features_code_quality.yml
!include copier/questions/features_documentation.yml
```

### 4. JupyterLab Extension Template

**Repository**: @<https://github.com/jupyterlab/extension-template> (77 stars)

```bash
pip install "copier~=9.2" jinja2-time
copier copy --trust https://github.com/jupyterlab/extension-template .
```

### 5. Odoo/Doodba Template

**Repository**: @<https://github.com/Tecnativa/doodba-copier-template> (104 stars)

- Complex multi-container applications
- Multiple answer files for different template layers
## Integration Patterns

### Git Integration

**Template versioning with Git tags:**

```bash
# Copy a specific version
copier copy --vcs-ref=v1.2.0 gh:org/template /path/to/project

# Copy the latest release (default)
copier copy gh:org/template /path/to/project

# Copy from HEAD (including uncommitted changes)
copier copy --vcs-ref=HEAD ./template /path/to/project
```

**Update to the latest template version:**

```bash
cd /path/to/project
copier update  # Reads .copier-answers.yml automatically
```

**Update to a specific version:**

```bash
copier update --vcs-ref=v2.0.0
```
### Template Updates Workflow

The **killer feature** that distinguishes Copier:

```mermaid
graph TD
    A[Template v1.0] --> B[Generate Project]
    B --> C[.copier-answers.yml]
    A --> D[Template v2.0]
    D --> E[copier update]
    C --> E
    E --> F[Smart 3-way Merge]
    F --> G[Updated Project]
    F --> H[Migration Scripts]
    H --> G
```

**The update process:**

1. Copier clones the template at the old version (recorded in `.copier-answers.yml`)
2. Regenerates the project with the old template
3. Compares the result to the current project (detects your changes)
4. Clones the template at the new version
5. Generates with the new template
6. Creates a 3-way merge between: old template → your project ← new template
7. Runs migration tasks for version transitions

The answers file that drives this is sketched below.
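A minimal sketch of what Copier records in the answers file; the values shown are illustrative:

```yaml
# .copier-answers.yml - maintained by Copier inside each generated project
_commit: v1.2.0            # template version the project was generated from
_src_path: gh:org/template # where the template lives
project_name: My Project
module_name: my_project
```

`_commit` and `_src_path` are what let `copier update` know which template, at which version, to diff against.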
**Example migration** (from the Copier docs):

```yaml
# copier.yml
_migrations:
  - version: v2.0.0
    command: rm -r ./old-folder
    when: "{{ _stage == 'before' }}"
  - invoke migrate $VERSION_FROM $VERSION_TO
```
### CI/CD Integration

**GitHub Actions example:**

```yaml
name: Update from template
on:
  schedule:
    - cron: "0 0 * * 0" # Weekly
  workflow_dispatch:

jobs:
  update:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v5
      - run: uv tool install copier
      - run: copier update --defaults --vcs-ref=HEAD
      - run: |
          git config user.name "Bot"
          git config user.email "bot@example.com"
          git checkout -b update-template
          git add -A
          git commit -m "Update from template"
          git push origin update-template
      - uses: peter-evans/create-pull-request@v5
```
### Multiple Templates per Project

Apply different templates to different aspects:

```bash
# Base framework
copier copy -a .copier-answers.main.yml \
  gh:example/framework-template .

# Pre-commit config
copier copy -a .copier-answers.pre-commit.yml \
  gh:my-org/pre-commit-template .

# Internal CI
copier copy -a .copier-answers.ci.yml \
  git@gitlab.internal.com:templates/ci .
```

Each gets its own answers file, enabling independent updates:

```bash
copier update -a .copier-answers.main.yml
copier update -a .copier-answers.pre-commit.yml
copier update -a .copier-answers.ci.yml
```
## Usage Examples

### Basic Template Creation

**Minimal template structure:**

```text
my_template/
├── copier.yml                           # Template configuration
├── .git/                                # Git repo (for versioning)
├── {{project_name}}/                    # Templated folder name
│   └── {{module_name}}.py.jinja         # Templated file
└── {{_copier_conf.answers_file}}.jinja  # Answers file
```

**copier.yml** (question definitions):

```yaml
project_name:
  type: str
  help: What is your project name?

module_name:
  type: str
  help: What is your Python module name?
  default: "{{ project_name | lower | replace('-', '_') }}"

python_version:
  type: str
  help: Minimum Python version
  default: "3.9"
  choices:
    - "3.9"
    - "3.10"
    - "3.11"
    - "3.12"
    - "3.13"
```

**Templated Python file** (`{{project_name}}/{{module_name}}.py.jinja`):

```python
"""{{ project_name }} - A Python package."""

__version__ = "0.1.0"


def hello() -> str:
    """Return a greeting."""
    return "Hello from {{ module_name }}!"
```

**Answers file template** (`{{_copier_conf.answers_file}}.jinja`):

```yaml
# Changes here will be overwritten by Copier
{{ _copier_answers | to_nice_yaml -}}
```
### Generating a Project

**From a local template:**

```bash
copier copy /path/to/template /path/to/destination
```

**From a Git URL:**

```bash
copier copy https://github.com/org/template /path/to/destination
# or shorthand
copier copy gh:org/template /path/to/destination
copier copy gl:org/template /path/to/destination # GitLab
```

**With pre-answered questions:**

```bash
copier copy \
  --data project_name="My Project" \
  --data module_name="my_project" \
  gh:org/template /path/to/destination
```

**From a data file** (`answers.yml`):

```yaml
project_name: My Project
module_name: my_project
python_version: "3.11"
```

```bash
copier copy --data-file answers.yml gh:org/template /path/to/destination
```
### Programmatic Usage

```python
from copier import run_copy, run_update

# Generate a new project
run_copy(
    "https://github.com/org/template.git",
    "/path/to/destination",
    data={"project_name": "My Project"},
    vcs_ref="v1.0.0",  # Specific version
)

# Update an existing project
run_update(
    "/path/to/destination",
    vcs_ref="v2.0.0",    # Update to v2.0.0
    skip_answered=True,  # Don't re-ask answered questions
)
```
### Advanced Template Features

**Conditional file generation:**

```yaml
# copier.yml
use_docker:
  type: bool
  help: Include Docker support?
  default: true
```

**File/folder structure:**

```text
template/
    {% if use_docker %}Dockerfile{% endif %}.jinja
    {% if use_docker %}docker-compose.yml{% endif %}.jinja
```

**Dynamic choices:**

```yaml
language:
  type: str
  choices:
    - python
    - javascript

package_manager:
  type: str
  help: Which package manager?
  choices: |
    {%- if language == "python" %}
    - pip
    - uv
    - poetry
    {%- else %}
    - npm
    - yarn
    - pnpm
    {%- endif %}
```

**File exclusion:**

```yaml
_exclude:
  - "*.pyc"
  - __pycache__
  - .git
  - .venv
  - "{% if not use_docker %}docker-*{% endif %}"
```

**Post-generation tasks:**

```yaml
_tasks:
  - git init
  - git add -A
  - git commit -m "Initial commit from template"
  - ["{{ _copier_python }}", "-m", "pip", "install", "-e", "."]
```

**Jinja2 extensions:**

```yaml
_jinja_extensions:
  - copier_templates_extensions.TemplateExtensionLoader
  - jinja2_time.TimeExtension
# Install with:
# pipx inject copier copier-templates-extensions jinja2-time
```
### Updating Projects

**Update to the latest version:**

```bash
cd /path/to/project
copier update
```

**Update with conflict resolution:**

```bash
# Inline conflicts (default)
copier update --conflict=inline

# .rej files (like patch)
copier update --conflict=rej
```

**Re-answer questions:**

```bash
# Re-answer all questions without changing the template version
copier update --vcs-ref=:current:

# Skip previously answered questions
copier update --skip-answered
```

**Update without interactive prompts:**

```bash
# Use defaults/existing answers
copier update --defaults

# Override specific values
copier update --data python_version="3.12"
```
## When NOT to Use Copier

### Simple File Copying

❌ **Don't use Copier for:**

```bash
# Just copying static files
cp -r template_dir new_project
```

✅ **Use basic tools instead:**

- `cp` for simple directory copying
- `rsync` for file synchronization
- Git clone for exact repository copies

### One-Time Generation Without Updates

If you never plan to update from the template:

- Cookiecutter has a larger ecosystem
- Yeoman for Node.js projects
- Manual copying might suffice

### Complex Conditional Logic

❌ **Not ideal for:**

- Heavy business logic in templates
- Complex data transformations
- Runtime configuration (use proper config libraries)

✅ **Use instead:**

- Python scripts for complex logic
- Dedicated config management (Dynaconf, python-decouple)
- Application frameworks (Django, FastAPI built-in scaffolding)

### Single Project Maintenance

If you only maintain one project:

- Template overhead isn't justified
- Direct edits are simpler
- No synchronization benefits

### Non-Text Files

Copier focuses on text file templating:

- Binary files are copied as-is
- No image/binary manipulation
- No archive extraction

### Version Control Conflicts

⚠️ **Be cautious when:**

- The project has diverged significantly from the template
- Many conflicting changes are expected
- The team is unfamiliar with 3-way merge resolution

**Mitigation:**

- Test updates in a separate branch
- Use `--conflict=rej` for manual review
- Document update procedures
## Decision Matrix

### Use Copier When

| Scenario                           | Why Copier?                                                       |
| ---------------------------------- | ----------------------------------------------------------------- |
| Managing 5+ similar microservices  | Templates sync security patches across all services               |
| Organizational standards evolving  | Roll out changes without manual edits to each project             |
| Onboarding new projects frequently | Consistent structure + ability to improve the template over time  |
| Template still experimental        | Iterate the template, update existing projects with improvements  |
| CI/CD pipeline standardization     | Update all projects when pipeline requirements change             |
| Multi-repo architecture            | Maintain consistency without monorepo complexity                  |

### Don't Use Copier When

| Scenario                                    | Why Not?                     | Alternative                           |
| ------------------------------------------- | ---------------------------- | ------------------------------------- |
| Single project, no similar projects planned | Overhead > benefit           | Direct editing                        |
| Template is 100% stable forever             | Update feature unused        | Cookiecutter (larger ecosystem)       |
| Heavy runtime configuration needed          | Wrong tool for the job       | Dynaconf, Pydantic Settings           |
| Binary file manipulation required           | Not designed for this        | Pillow, custom scripts                |
| Project has deviated >50% from template     | Merge conflicts overwhelming | Manual migration                      |
| No Git repository for the template          | Can't track versions         | Use Git or accept one-shot generation |

### Copier vs Cookiecutter Decision Tree

```text
Do you need to update projects after generation?
├─ YES → Use Copier
│        └─ Need version-aware migrations?
│           ├─ YES → Definitely Copier
│           └─ NO → Still Copier (future-proofing)
│
└─ NO → Consider factors:
        ├─ Prefer YAML config? → Copier
        ├─ Want a larger template ecosystem? → Cookiecutter
        ├─ Need maximum stability? → Cookiecutter
        └─ Might need updates later? → Copier (easier to start with)
```
## Best Practices

### Template Design

1. **Version your templates** - Use Git tags (v1.0.0, v2.0.0), as shown in the sketch after this list
2. **Keep templates focused** - One concern per template
3. **Provide good defaults** - Minimize required answers
4. **Document migrations** - Explain breaking changes
5. **Test template updates** - Generate a project, modify it, update it
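Template versioning is plain Git tagging; a minimal sketch, with tag name and message illustrative:

```bash
# In the template repository: cut a release that projects can pin to
git tag -a v1.1.0 -m "Add ruff config, drop flake8"
git push origin v1.1.0

# Consumers can then update to it explicitly
copier update --vcs-ref=v1.1.0
```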
### Project Maintenance

1. **Commit `.copier-answers.yml`** - Essential for updates
2. **Don't edit generated markers** - Copier overwrites them
3. **Test updates in branches** - Merge after verification
4. **Run migrations carefully** - Review before executing
5. **Document deviations** - Note why you diverge from the template

### Organization Adoption

1. **Start with one template** - Prove value before expanding
2. **Automate update checks** - CI job for template freshness
3. **Train on merge conflicts** - 3-way merges need understanding
4. **Maintain a template changelog** - Help consumers understand changes
5. **Version templates conservatively** - Breaking changes = major version
## Common Gotchas

1. **Answers file location matters** - Must be committed and at the project root
2. **Template suffix required by default** - Files need `.jinja` unless configured otherwise
3. **Git required for updates** - The template must be a Git repository with tags
4. **Jinja syntax in YAML** - Templated values must be quoted properly (see the sketch below)
5. **Task execution order** - Tasks run sequentially, not in parallel
6. **Conflict resolution** - Learn 3-way merge basics before your first update
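On gotcha 4: a bare `{{ ... }}` at the start of a YAML value is parsed as a flow mapping, so defaults that begin with a template expression must be quoted. A minimal illustration (the field name is hypothetical):

```yaml
repo_slug:
  type: str
  # BROKEN: YAML reads the leading {{ as the start of an inline mapping
  # default: {{ project_name | lower }}
  # CORRECT: quote the whole templated value
  default: "{{ project_name | lower }}"
```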
## Performance Considerations

- **Generation speed**: Fast for typical projects (<1s for small templates)
- **Update speed**: Depends on project size and Git history
- **Memory usage**: Minimal, dominated by Git operations
- **Caching**: Template clones are cached by Git
## Related Tools

- **cruft** - Adds update capability to Cookiecutter templates
- **cookiecutter** - Popular Python templating (one-way generation)
- **yeoman** - Node.js ecosystem scaffolding
- **copier-templates-extensions** - Additional Jinja filters for Copier
- **jinja2-time** - Time-based Jinja filters

## Learning Resources

- Official docs: @<https://copier.readthedocs.io/>
- Template browser: @<https://github.com/topics/copier-template>
- Comparisons: @<https://github.com/copier-org/copier/blob/master/docs/comparisons.md>
- Example templates: See the "Real-World Examples" section above

## Summary

**Copier is the best choice when:**

- You maintain multiple related projects
- Your templates evolve over time
- You need to propagate changes to existing projects
- You want version-aware template management
- You prefer declarative YAML configuration

**Copier's unique selling point:** The ability to update existing projects when templates change, with intelligent 3-way merging and version-aware migrations.

**Quick start for evaluation:**

```bash
# Install
pipx install copier

# Try a popular template
copier copy gh:pawamoy/copier-uv test-project

# Make changes to the project, then simulate an update
cd test-project
# Edit some files...
copier update --defaults --vcs-ref=HEAD
```

---

**Research completed**: 2025-10-21
**Sources verified**: Official repository, PyPI, documentation, real-world templates
**Template examples analyzed**: 5 major templates (FastAPI, copier-uv, NLeSC, JupyterLab, Doodba)
---
title: "Datasette: Instant JSON API for Your SQLite Data"
library_name: datasette
pypi_package: datasette
category: data_exploration
python_compatibility: "3.10+"
last_updated: "2025-11-02"
official_docs: "https://docs.datasette.io"
official_repository: "https://github.com/simonw/datasette"
maintenance_status: "active"
---

# Datasette - Instant Data Publishing and Exploration

## Executive Summary

Datasette is an open-source tool for exploring and publishing data. It transforms any SQLite database into an interactive website with a full JSON API, requiring zero code. Designed for data journalists, museum curators, archivists, local governments, scientists, and researchers, Datasette makes data sharing and exploration accessible to anyone with data to publish.

**Core Value Proposition**: Take data of any shape or size and instantly publish it as an explorable website with a corresponding API, without writing application code.

## Official Information

- **Repository**: <https://github.com/simonw/datasette> @ simonw/datasette
- **PyPI**: `datasette` @ <https://pypi.org/project/datasette/>
- **Current Development Version**: 1.0a19 (alpha)
- **Current Stable Version**: 0.65.1
- **Documentation**: <https://docs.datasette.io/> @ docs.datasette.io
- **License**: Apache License 2.0 @ <https://github.com/simonw/datasette/blob/main/LICENSE>
- **Maintenance Status**: Actively maintained (647 open issues, last updated 2025-10-21)
- **Community**: Discord @ <https://datasette.io/discord>, Newsletter @ <https://datasette.substack.com/>
## What Problem Does Datasette Solve?

### The Problem

Organizations and individuals have valuable data in SQLite databases, CSV files, or other formats, but:

- Building a web interface to explore data requires significant development effort
- Creating APIs for data access requires backend development expertise
- Publishing data in an accessible, explorable format is time-consuming
- Sharing data insights requires custom visualization tools
- Data exploration often requires SQL knowledge or specialized tools

### The Solution

Datasette provides:

1. **Instant Web Interface**: Automatic web UI for any SQLite database
2. **Automatic API**: Full JSON API with no code required
3. **SQL Query Interface**: Built-in SQL editor with query sharing
4. **Plugin Ecosystem**: 300+ plugins for extending functionality @ <https://datasette.io/plugins>
5. **One-Command Publishing**: Deploy to cloud platforms with a single command
6. **Zero-Setup Exploration**: Browse, filter, and facet data immediately

### What Would Be Reinventing the Wheel

Without Datasette, you would need to build:

- A custom web application for data browsing
- RESTful API endpoints for data access
- A SQL query interface with security controls
- Data export functionality (JSON, CSV)
- Full-text search integration
- An authentication and authorization system
- Pagination and filtering logic
- Deployment configuration and hosting setup

**Example**: Publishing a dataset of 100,000 records would require weeks of development work. With Datasette: `datasette publish cloudrun mydata.db --service=mydata`
## Real-World Usage Patterns

### Pattern 1: Publishing Open Data (Government/Research)

**Context**: @ <https://github.com/simonw/covid-19-datasette>

```bash
# Convert CSV to SQLite
csvs-to-sqlite covid-data.csv covid.db

# Publish to Cloud Run with metadata
datasette publish cloudrun covid.db \
  --service=covid-tracker \
  --metadata metadata.json \
  --install=datasette-vega
```

**Use Case**: Local governments publishing COVID-19 statistics, election results, or public records.

### Pattern 2: Personal Data Archives (Dogsheep Pattern)

**Context**: @ <https://github.com/dogsheep>

```bash
# Export Twitter data to SQLite
twitter-to-sqlite user-timeline twitter.db

# Export GitHub activity
github-to-sqlite repos github.db

# Export Apple Health data
healthkit-to-sqlite export.zip health.db

# Explore everything together
datasette twitter.db github.db health.db --crossdb
```

**Use Case**: Personal data liberation - exploring your own data from various platforms.

### Pattern 3: Data Journalism and Investigation

**Context**: @ <https://github.com/simonw/laion-aesthetic-datasette>

```python
# Load and index LAION training data
# (image_data: an iterable of dicts prepared earlier)
import sqlite_utils

db = sqlite_utils.Database("images.db")
db["images"].insert_all(image_data)
db["images"].enable_fts(["caption", "url"])
```

```bash
# Launch with a custom template
datasette images.db \
  --template-dir templates/ \
  --metadata metadata.json
```

**Use Case**: Exploring large datasets like Stable Diffusion training data, analyzing patterns.

### Pattern 4: Internal Tools and Dashboards

**Context**: @ <https://github.com/rclement/datasette-dashboards>

```yaml
# datasette.yaml - Configure dashboards
databases:
  analytics:
    queries:
      daily_users:
        sql: |
          SELECT date, count(*) as users
          FROM events
          WHERE event_type = 'login'
          GROUP BY date
          ORDER BY date DESC
        title: Daily Active Users
```

**Installation**:

```bash
datasette install datasette-dashboards
datasette analytics.db --config datasette.yaml
```

**Use Case**: Building internal analytics dashboards without BI tools.

### Pattern 5: API Backend for Applications

**Context**: @ <https://github.com/simonw/datasette-graphql>

```bash
# Install the GraphQL plugin
datasette install datasette-graphql

# Launch with authentication
datasette data.db \
  --root \
  --cors \
  --setting default_cache_ttl 3600
```

**GraphQL Query**:

```graphql
{
  products(first: 10, where: { price_gt: 100 }) {
    nodes {
      id
      name
      price
    }
  }
}
```

**Use Case**: Using Datasette as a read-only API backend for mobile/web apps.
## Integration Patterns

### Core Data Integrations

1. **SQLite Native**:

   ```python
   import sqlite3

   conn = sqlite3.connect('data.db')
   # Datasette reads the file directly
   ```

2. **CSV/JSON Import** via `sqlite-utils` @ <https://github.com/simonw/sqlite-utils>:

   ```bash
   sqlite-utils insert data.db records records.json
   csvs-to-sqlite *.csv data.db
   ```

3. **Database Migration** via `db-to-sqlite` @ <https://github.com/simonw/db-to-sqlite>:

   ```bash
   # Export from PostgreSQL
   db-to-sqlite "postgresql://user:pass@host/db" data.db --table=events

   # Export from MySQL
   db-to-sqlite "mysql://user:pass@host/db" data.db --all
   ```

### Companion Libraries

- **sqlite-utils**: Database manipulation @ <https://github.com/simonw/sqlite-utils>
- **csvs-to-sqlite**: CSV import @ <https://github.com/simonw/csvs-to-sqlite>
- **datasette-extract**: AI-powered data extraction @ <https://github.com/datasette/datasette-extract>
- **datasette-parquet**: Parquet/DuckDB support @ <https://github.com/cldellow/datasette-parquet>

Of these, sqlite-utils is the one you will reach for most; a short sketch of its Python API follows.
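A minimal sketch of preparing a database with the sqlite-utils Python API; the table and column names are illustrative:

```python
import sqlite_utils

db = sqlite_utils.Database("data.db")

# Insert rows, creating the table on first use; pk sets the primary key
db["records"].insert_all(
    [
        {"id": 1, "name": "alpha", "notes": "first record"},
        {"id": 2, "name": "beta", "notes": "second record"},
    ],
    pk="id",
)

# Enable full-text search so Datasette's search box works on these columns
db["records"].enable_fts(["name", "notes"])
```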
### Deployment Patterns

**Cloud Run** @ <https://docs.datasette.io/en/stable/publish.html>:

```bash
datasette publish cloudrun data.db \
  --service=myapp \
  --install=datasette-vega \
  --install=datasette-cluster-map \
  --metadata metadata.json
```

**Vercel** via `datasette-publish-vercel` @ <https://github.com/simonw/datasette-publish-vercel>:

```bash
pip install datasette-publish-vercel
datasette publish vercel data.db --project my-data
```

**Fly.io** via `datasette-publish-fly` @ <https://github.com/simonw/datasette-publish-fly>:

```bash
pip install datasette-publish-fly
datasette publish fly data.db --app=my-datasette
```

**Docker**:

```dockerfile
FROM datasetteproject/datasette
COPY *.db /data/
RUN datasette install datasette-vega
CMD datasette serve /data/*.db --host 0.0.0.0 --cors
```
## Python Version Compatibility

### Official Support Matrix

| Python Version | Status               | Notes                               |
| -------------- | -------------------- | ----------------------------------- |
| 3.10           | **Minimum Required** | @ setup.py python_requires=">=3.10" |
| 3.11           | ✅ Fully Supported   | Recommended for production          |
| 3.12           | ✅ Fully Supported   | Tested in CI                        |
| 3.13           | ✅ Fully Supported   | Tested in CI                        |
| 3.14           | ✅ Fully Supported   | Tested in CI                        |
| 3.9 and below  | ❌ Not Supported     | Deprecated as of v1.0               |

### Version-Specific Considerations

**Python 3.10+**:

- Uses `importlib.metadata` for plugin loading
- Native `match/case` statements in the codebase (likely in v1.0+)
- Type hints using modern syntax

**Python 3.11+ Benefits**:

- Better async performance (important for ASGI)
- Faster startup times
- Improved error messages

**No Breaking Changes Expected**: Datasette maintains backward compatibility within major versions.
## Usage Examples

### Basic Usage

```bash
# Install
pip install datasette
# or
brew install datasette

# Serve a database
datasette data.db

# Open in browser automatically
datasette data.db -o

# Serve multiple databases
datasette db1.db db2.db db3.db

# Enable cross-database queries
datasette db1.db db2.db --crossdb
```

Each page the web UI serves also has a JSON equivalent, sketched below.
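A quick sketch of the JSON API once the server is running; the database and table names are illustrative, and 8001 is the default port:

```bash
# Table contents as JSON
curl http://localhost:8001/data/events.json

# Arbitrary SQL via the API, URL-encoded
curl "http://localhost:8001/data.json?sql=select+count(*)+from+events"

# CSV export of the same table
curl http://localhost:8001/data/events.csv
```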
### Configuration Example

**metadata.json** @ <https://docs.datasette.io/en/stable/metadata.html>:

```json
{
  "title": "My Data Project",
  "description": "Exploring public datasets",
  "license": "CC BY 4.0",
  "license_url": "https://creativecommons.org/licenses/by/4.0/",
  "source": "Data Sources",
  "source_url": "https://example.com/sources",
  "databases": {
    "mydb": {
      "tables": {
        "events": {
          "title": "Event Log",
          "description": "System event records",
          "hidden": false
        }
      }
    }
  }
}
```

**datasette.yaml** @ <https://docs.datasette.io/en/stable/configuration.html>:

```yaml
settings:
  default_page_size: 50
  sql_time_limit_ms: 3500
  max_returned_rows: 2000

plugins:
  datasette-cluster-map:
    latitude_column: lat
    longitude_column: lng

databases:
  mydb:
    queries:
      popular_events:
        sql: |
          SELECT event_type, COUNT(*) as count
          FROM events
          GROUP BY event_type
          ORDER BY count DESC
          LIMIT 10
        title: Most Popular Events
```
### Plugin Development Example

**Simple Plugin** @ <https://docs.datasette.io/en/stable/writing_plugins.html>:

```python
from datasette import hookimpl


@hookimpl
def prepare_connection(conn):
    """Add custom SQL functions"""
    conn.create_function("is_even", 1, lambda x: x % 2 == 0)


@hookimpl
def extra_template_vars(request):
    """Add variables to templates"""
    return {
        "custom_message": "Hello from plugin!"
    }
```

**setup.py**:

```python
from setuptools import setup

setup(
    name="datasette-my-plugin",
    version="0.1",
    py_modules=["datasette_my_plugin"],
    entry_points={
        "datasette": [
            "my_plugin = datasette_my_plugin"
        ]
    },
    install_requires=["datasette>=0.60"],
)
```

### Advanced: Python API Usage

**Programmatic Access** @ <https://docs.datasette.io/en/stable/internals.html>:

```python
import asyncio

from datasette.app import Datasette


async def explore_data():
    # Initialize Datasette
    ds = Datasette(files=["data.db"])

    # Execute a query
    result = await ds.execute(
        "data",
        "SELECT * FROM users WHERE age > :age",
        {"age": 18},
    )

    # Access rows
    for row in result.rows:
        print(dict(row))

    # Get table info
    db = ds.get_database("data")
    tables = await db.table_names()
    print(f"Tables: {tables}")


asyncio.run(explore_data())
```

### Testing Plugins

**pytest Example** @ <https://docs.datasette.io/en/stable/testing_plugins.html>:

```python
import pytest
from datasette.app import Datasette
from datasette.database import Database  # needed for add_database below


@pytest.mark.asyncio
async def test_homepage():
    ds = Datasette(memory=True)
    await ds.invoke_startup()

    response = await ds.client.get("/")
    assert response.status_code == 200
    assert "<!DOCTYPE html>" in response.text


@pytest.mark.asyncio
async def test_json_api():
    ds = Datasette(memory=True)

    # Create test data
    db = ds.add_database(Database(ds, memory_name="test"))
    await db.execute_write(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)"
    )

    # Query via the API
    response = await ds.client.get("/test/items.json")
    assert response.status_code == 200
    data = response.json()
    assert data["rows"] == []
```
## When NOT to Use Datasette

### ❌ Scenarios Where Datasette Is Inappropriate

1. **High-Write Applications**
   - Datasette is optimized for read-heavy workloads
   - SQLite has write limitations with concurrent access
   - **Better Alternative**: PostgreSQL with PostgREST, or Django REST Framework

2. **Real-Time Collaborative Editing**
   - No built-in support for concurrent data editing
   - Read-only by default (writes require plugins)
   - **Better Alternative**: Airtable, Retool, or a custom CRUD application

3. **Large-Scale Data Warehousing**
   - SQLite works well up to ~100GB, struggles beyond
   - Not designed for massive analytical workloads
   - **Better Alternative**: DuckDB with MotherDuck, or BigQuery with Looker

4. **Complex BI Dashboards**
   - Limited visualization capabilities without plugins
   - Not a replacement for full BI platforms
   - **Better Alternative**: Apache Superset @ <https://github.com/apache/superset>, Metabase @ <https://github.com/metabase/metabase>, or Grafana

5. **Transactional Systems**
   - Not designed for OLTP workloads
   - Limited transaction support
   - **Better Alternative**: Django ORM with PostgreSQL, or FastAPI with SQLAlchemy

6. **User Authentication and Authorization**
   - Basic auth support, but not a full auth system
   - RBAC requires plugins and configuration
   - **Better Alternative**: Run Datasette behind a proxy with auth, or use Metabase for built-in user management

7. **Non-Relational Data**
   - Optimized for relational SQLite data
   - Document stores require workarounds
   - **Better Alternative**: MongoDB with Mongo Express, or Elasticsearch with Kibana

### ⚠️ Use With Caution

1. **Sensitive Data Without Proper Access Controls**
   - The default is public access
   - Requires careful permission configuration
   - **Mitigation**: Use `--root` for admin access, configure permissions @ <https://docs.datasette.io/en/stable/authentication.html>

2. **Production Without Rate Limiting**
   - No built-in rate limiting
   - Can be overwhelmed by traffic
   - **Mitigation**: Deploy behind a reverse proxy with rate limiting, or use Cloud Run with concurrency limits
## Decision Matrix

### ✅ Use Datasette When

| Scenario                               | Why Datasette Excels                                 |
| -------------------------------------- | ---------------------------------------------------- |
| Publishing static/semi-static datasets | Zero-code instant publication                        |
| Data journalism and investigation      | SQL interface + full-text search + shareable queries |
| Personal data exploration (Dogsheep)   | Cross-database queries, plugin ecosystem             |
| Internal read-only dashboards          | Fast setup, minimal infrastructure                   |
| Prototyping data APIs                  | Instant JSON API, no backend code                    |
| Open data portals                      | Built-in metadata, documentation, CSV export         |
| SQLite file exploration                | Best-in-class SQLite web interface                   |
| Low-traffic reference data             | Excellent for datasets < 100GB                       |

### ❌ Don't Use Datasette When

| Scenario                      | Why It's Not Suitable                    | Better Alternative           |
| ----------------------------- | ---------------------------------------- | ---------------------------- |
| Building a CRUD application   | Read-focused, limited write support      | Django, FastAPI + SQLAlchemy |
| Real-time analytics           | Not designed for streaming data          | InfluxDB, TimescaleDB        |
| Multi-tenant SaaS app         | Limited isolation, no row-level security | PostgreSQL + RLS             |
| Heavy concurrent writes       | SQLite write limitations                 | PostgreSQL, MySQL            |
| Terabyte-scale data           | SQLite size constraints                  | DuckDB, BigQuery, Snowflake  |
| Enterprise BI with governance | Limited data modeling layer              | Looker, dbt + Metabase       |
| Complex visualization needs   | Basic charts without plugins             | Apache Superset, Tableau     |
| Document/graph data           | Relational focus                         | MongoDB, Neo4j               |
## Comparison with Alternatives

### vs. Apache Superset @ <https://github.com/apache/superset>

**When to use Superset over Datasette**:

- Need advanced visualizations (50+ chart types vs. basic plugins)
- Enterprise BI with complex dashboards
- Multiple data source types (not just SQLite)
- Large team collaboration with RBAC

**When to use Datasette over Superset**:

- Simpler deployment and setup
- Focus on data exploration over dashboarding
- Primarily working with SQLite databases
- Want an instant API alongside the web interface

### vs. Metabase @ <https://github.com/metabase/metabase>

**When to use Metabase over Datasette**:

- Need a business user-friendly query builder
- Want built-in email reports and scheduling
- Require a user management and permissions UI
- Need mobile app support

**When to use Datasette over Metabase**:

- Working primarily with SQLite
- Want plugin extensibility
- Need instant deployment (lighter weight)
- Want an API-first design

### vs. Custom Flask/FastAPI Application

**When to build custom over Datasette**:

- Complex business logic required
- Heavy write operations
- Custom authentication flows
- Specific UX requirements

**When to use Datasette over custom**:

- Rapid prototyping (hours vs. weeks)
- Standard data exploration needs
- Focus on data, not application development
- Leverage the plugin ecosystem
## Key Insights and Recommendations

### Core Strengths

1. **Speed to Value**: From data to published website in minutes
2. **Plugin Ecosystem**: 300+ plugins for extending functionality @ <https://datasette.io/plugins>
3. **API-First Design**: The JSON API is a first-class citizen
4. **Deployment Simplicity**: One command to cloud platforms
5. **Open Source Community**: Active development, responsive maintainer

### Best Practices

1. **Use sqlite-utils for data prep** @ <https://github.com/simonw/sqlite-utils>:

   ```bash
   sqlite-utils insert data.db table data.json --pk=id
   sqlite-utils enable-fts data.db table column1 column2
   ```

2. **Configure permissions properly**:

   ```yaml
   databases:
     private:
       allow:
         id: admin_user
   ```

3. **Use immutable mode for static data**:

   ```bash
   datasette --immutable data.db
   ```

4. **Leverage canned queries for common patterns**:

   ```yaml
   queries:
     search:
       sql: SELECT * FROM items WHERE name LIKE :query
   ```

5. **Install datasette-hashed-urls for caching** @ <https://github.com/simonw/datasette-hashed-urls>:

   ```bash
   datasette install datasette-hashed-urls
   ```
### Migration Path

**From spreadsheets to Datasette**:

```bash
csvs-to-sqlite data.csv data.db
datasette data.db
```

**From PostgreSQL to Datasette**:

```bash
db-to-sqlite "postgresql://user:pass@host/db" data.db --all
datasette data.db
```

**From Datasette to a production app**:

- Use Datasette for prototyping and exploration
- Migrate to FastAPI/Django when write operations become critical
- Keep Datasette as the read-only reporting interface

## Summary

Datasette excels at making data instantly explorable and shareable. It's the fastest path from data to a published website with an API. Use it for read-heavy workflows, data journalism, personal data archives, and rapid prototyping. Avoid it for write-heavy applications, enterprise BI, or large-scale data warehousing.

**TL;DR**: If you have data and want to publish or explore it quickly without writing application code, use Datasette. If you need complex transactions, real-time collaboration, or enterprise BI features, choose a different tool.

## References

- Official Documentation @ <https://docs.datasette.io/>
- GitHub Repository @ <https://github.com/simonw/datasette>
- Plugin Directory @ <https://datasette.io/plugins>
- Context7 Documentation @ /simonw/datasette (949 code snippets)
- Dogsheep Project @ <https://github.com/dogsheep> (Personal data toolkit)
- Datasette Lite (WebAssembly) @ <https://lite.datasette.io/>
- Community Discord @ <https://datasette.io/discord>
- Newsletter @ <https://datasette.substack.com/>
---
title: "Fabric: High-Level SSH Command Execution and Deployment"
library_name: fabric
pypi_package: fabric
category: ssh-automation
python_compatibility: "3.6+"
last_updated: "2025-11-02"
official_docs: "https://docs.fabfile.org"
official_repository: "https://github.com/fabric/fabric"
maintenance_status: "stable"
---

# Fabric: High-Level SSH Command Execution and Deployment

## Core Purpose

Fabric is a high-level Python library designed to execute shell commands remotely over SSH, yielding useful Python objects in return. It solves the problem of programmatic remote server management and deployment automation by providing a Pythonic interface to SSH operations.

### What Problem Does Fabric Solve?

Fabric eliminates the need to manually SSH into multiple servers and run commands repeatedly. It provides:

1. **Programmatic SSH Execution**: Execute commands on remote servers from Python code
2. **Multi-Host Management**: Run commands across multiple servers in parallel or serially
3. **File Transfer**: Upload and download files over SSH/SFTP
4. **Deployment Automation**: Orchestrate complex deployment workflows
5. **Task Definition**: Define reusable deployment tasks with the `@task` decorator
6. **Connection Management**: Handle SSH authentication, connection pooling, and error handling

### When Should You Use Fabric?

**Use Fabric when:**

- You need to execute commands on **remote servers** over SSH
- You're automating deployment processes (copying files, restarting services, running migrations)
- You need to manage multiple servers programmatically
- You want to define reusable deployment tasks in Python
- You're building continuous integration/deployment pipelines
- You need more than subprocess (which only works locally)

**Use subprocess when:**

- You only need to run commands on your **local machine**
- You don't need SSH connectivity to remote hosts
- Your automation is purely local process execution

**Use Ansible when:**

- You need declarative configuration management across many hosts
- You require idempotency guarantees
- You need a large ecosystem of pre-built modules
- Your team prefers YAML over Python
- You're managing infrastructure state, not just running scripts

**Use Paramiko directly when:**

- You need low-level SSH protocol control
- You're building custom SSH clients or servers
- Fabric's higher-level abstractions are too restrictive
## Architecture and Dependencies
|
||||
|
||||
Fabric is built on two core libraries:
|
||||
|
||||
1. **Invoke** (>=2.0): Subprocess command execution and command-line task features
|
||||
2. **Paramiko** (>=2.4): SSH protocol implementation
|
||||
|
||||
Fabric extends their APIs to provide:
|
||||
|
||||
- Remote execution via `Connection.run()`
|
||||
- File transfer via `Connection.put()` and `Connection.get()`
|
||||
- Sudo support via `Connection.sudo()`
|
||||
- Group operations via `SerialGroup` and `ThreadingGroup`
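
The layering shows up directly in the `Connection` API itself: `local()` runs a command through Invoke on the machine executing the script, while `run()` executes over Paramiko's SSH transport. A minimal sketch (the hostname is a placeholder):

```python
from fabric import Connection

c = Connection('web1.example.com')  # hypothetical host

# Invoke layer: runs on the machine executing this script.
local_os = c.local('uname -s', hide=True).stdout.strip()

# Paramiko layer: runs on web1.example.com over SSH.
remote_os = c.run('uname -s', hide=True).stdout.strip()

print(f"local: {local_os}, remote: {remote_os}")
```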

## Python Version Compatibility

| Python Version | Fabric 2.x | Fabric 3.x | Status           |
| -------------- | ---------- | ---------- | ---------------- |
| 3.6            | ✓          | ✓          | Minimum version  |
| 3.7            | ✓          | ✓          | Supported        |
| 3.8            | ✓          | ✓          | Supported        |
| 3.9            | ✓          | ✓          | Supported        |
| 3.10           | ✓          | ✓          | Supported        |
| 3.11           | ✓          | ✓          | Supported        |
| 3.12           | ?          | ?          | Likely supported |
| 3.13           | ?          | ?          | Likely supported |
| 3.14           | ?          | ?          | Unknown          |

**Note**: Fabric follows semantic versioning. Fabric 2.x and 3.x share similar APIs with minor breaking changes. Fabric 1.x (legacy) is incompatible with 2.x/3.x.

### Fabric Version Differences

- **Fabric 1.x** (legacy): Python 2.7 only, different API, no longer maintained
- **Fabric 2.x**: Modern API, Python 3.6+, built on Invoke/Paramiko
- **Fabric 3.x**: Current stable, incremental improvements over 2.x, Python 3.6+

## Installation

```bash
# Standard installation
pip install fabric

# For migration from Fabric 1.x (side-by-side installation)
pip install fabric2

# Development installation (editable VCS installs need the #egg fragment)
pip install -e "git+https://github.com/fabric/fabric#egg=fabric"

# With pytest fixtures support
pip install fabric[pytest]
```

## Core Usage Patterns

### 1. Basic Remote Command Execution

**@<https://docs.fabfile.org/en/latest/getting-started.html>**

```python
from fabric import Connection

# Simple connection and command execution
result = Connection('web1.example.com').run('uname -s', hide=True)
print(f"Ran {result.command!r} on {result.connection.host}")
print(f"Exit code: {result.exited}")
print(f"Output: {result.stdout.strip()}")
```

**Result object attributes:**

- `result.stdout`: Command output
- `result.stderr`: Error output
- `result.exited`: Exit code
- `result.ok`: Boolean (True if exit code was 0)
- `result.command`: The command that was run
- `result.connection`: The Connection object used

### 2. Connection with Authentication

**@<https://docs.fabfile.org/en/latest/getting-started.html>**

```python
from fabric import Connection

# user@host:port format
c = Connection('deploy@web1.example.com:2202')

# Or explicit parameters
c = Connection(
    host='web1.example.com',
    user='deploy',
    port=2202,
    connect_kwargs={
        "key_filename": "/path/to/private/key",
        # or
        "password": "mypassword"
    }
)

# Execute commands
c.run('whoami')
c.run('ls -la /var/www')
```

### 3. File Transfer Operations

**@<https://docs.fabfile.org/en/latest/getting-started.html>**

```python
from fabric import Connection

c = Connection('web1')

# Upload file
result = c.put('myfiles.tgz', remote='/opt/mydata/')
print(f"Uploaded {result.local} to {result.remote}")

# Download file
c.get('/var/log/app.log', local='./logs/')

# Upload and extract
c.put('myfiles.tgz', '/opt/mydata')
c.run('tar -C /opt/mydata -xzvf /opt/mydata/myfiles.tgz')
```

### 4. Sudo Operations

**@<https://docs.fabfile.org/en/latest/getting-started.html>**

```python
import getpass

from fabric import Connection, Config

# Configure sudo password
sudo_pass = getpass.getpass("What's your sudo password?")
config = Config(overrides={'sudo': {'password': sudo_pass}})

c = Connection('db1', config=config)

# Run with sudo using the helper method
c.sudo('whoami', hide='stderr')  # Output: root
c.sudo('useradd mydbuser')
c.run('id -u mydbuser')  # Verify user created

# Alternative: Manual sudo with a password responder
from invoke import Responder

sudopass = Responder(
    pattern=r'\[sudo\] password:',
    response=f'{sudo_pass}\n',
)
c.run('sudo whoami', pty=True, watchers=[sudopass])
```

### 5. Multi-Host Execution (Serial)

**@<https://docs.fabfile.org/en/latest/getting-started.html>**

```python
from fabric import SerialGroup as Group

# Execute on multiple hosts serially
pool = Group('web1', 'web2', 'web3')

# Run command on all hosts
results = pool.run('uname -s')
for connection, result in results.items():
    print(f"{connection.host}: {result.stdout.strip()}")

# File operations on all hosts
pool.put('myfiles.tgz', '/opt/mydata')
pool.run('tar -C /opt/mydata -xzvf /opt/mydata/myfiles.tgz')
```

### 6. Multi-Host Execution (Parallel)

**@<https://docs.fabfile.org/en/latest/getting-started.html>**

```python
from fabric import ThreadingGroup as Group

# Execute on multiple hosts in parallel
pool = Group('web1', 'web2', 'web3', 'web4', 'web5')

# Run command concurrently
results = pool.run('hostname')

# Process results
for connection, result in results.items():
    print(f"{connection.host}: {result.stdout.strip()}")
```

### 7. Defining Reusable Tasks

**@<https://docs.fabfile.org/en/latest/getting-started.html>**

```python
from fabric import task

@task
def deploy(c):
    """Deploy application to remote server"""
    code_dir = "/srv/django/myproject"

    # Check if directory exists
    if not c.run(f"test -d {code_dir}", warn=True):
        # Clone repository
        c.run(f"git clone user@vcshost:/path/to/repo/.git {code_dir}")

    # Update code
    c.run(f"cd {code_dir} && git pull")

    # Restart application
    c.run(f"cd {code_dir} && touch app.wsgi")

@task
def update_servers(c):
    """Run system updates"""
    c.sudo('apt update')
    c.sudo('apt upgrade -y')
    c.sudo('systemctl restart nginx')

# Use with the fab command:
# fab -H web1,web2,web3 deploy
```

### 8. Task Composition and Workflow

**@<https://docs.fabfile.org/en/latest/getting-started.html>**

```python
from fabric import task
from invoke import Exit
from invocations.console import confirm

@task
def test(c):
    """Run local tests"""
    result = c.local("./manage.py test my_app", warn=True)
    if not result and not confirm("Tests failed. Continue anyway?"):
        raise Exit("Aborting at user request.")

@task
def commit(c):
    """Commit changes"""
    c.local("git add -p && git commit")

@task
def push(c):
    """Push to remote"""
    c.local("git push")

@task
def prepare_deploy(c):
    """Prepare for deployment"""
    test(c)
    commit(c)
    push(c)

@task(hosts=['web1.example.com', 'web2.example.com'])
def deploy(c):
    """Deploy to remote servers"""
    code_dir = "/srv/django/myproject"
    c.run(f"cd {code_dir} && git pull")
    c.run(f"cd {code_dir} && touch app.wsgi")

# Usage:
# fab prepare_deploy deploy
```

### 9. Connection with Gateway/Bastion Host

**@<https://docs.fabfile.org/en/latest/concepts/networking.html>**

```python
from fabric import Connection

# Connect to an internal host through a gateway
gateway = Connection('bastion.example.com')
c = Connection('internal-db.local', gateway=gateway)

# Now all operations go through the gateway
c.run('hostname')
c.run('df -h')
```

### 10. Error Handling and Conditional Logic

**@<https://docs.fabfile.org/en/latest/getting-started.html>**

```python
from fabric import SerialGroup as Group

def upload_and_unpack(c):
    """Upload file only if it doesn't exist"""
    # Check if the file exists (don't fail on non-zero exit)
    if c.run('test -f /opt/mydata/myfile', warn=True).failed:
        c.put('myfiles.tgz', '/opt/mydata')
        c.run('tar -C /opt/mydata -xzvf /opt/mydata/myfiles.tgz')
    else:
        print(f"File already exists on {c.host}, skipping upload")

# Apply to group
for connection in Group('web1', 'web2', 'web3'):
    upload_and_unpack(connection)
```

## Real-World Integration Patterns

### Pattern 1: Django/Web Application Deployment

**@<https://www.oreilly.com/library/view/test-driven-development-with/9781449365141/ch09.html>**

```python
from fabric import task

@task
def deploy_django(c):
    """Deploy Django application"""
    # Pull latest code
    c.run('cd /var/www/myapp && git pull origin main')

    # Install dependencies
    c.run('cd /var/www/myapp && pip install -r requirements.txt')

    # Run migrations
    c.run('cd /var/www/myapp && python manage.py migrate')

    # Collect static files
    c.run('cd /var/www/myapp && python manage.py collectstatic --noinput')

    # Restart services
    c.sudo('systemctl restart gunicorn')
    c.sudo('systemctl restart nginx')
```

### Pattern 2: Database Backup and Restore

**@Exa:fabric deployment examples**

```python
from fabric import task
from datetime import datetime

@task
def backup_database(c):
    """Create database backup"""
    timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
    backup_file = f"backup_{timestamp}.sql"

    # Create backup (note: `-p` prompts interactively; for unattended
    # runs, supply the password via a MySQL option file instead)
    c.run(f"mysqldump -u dbuser -p database_name > /backups/{backup_file}")

    # Compress backup
    c.run(f"gzip /backups/{backup_file}")

    # Download backup
    c.get(f"/backups/{backup_file}.gz", local=f"./backups/{backup_file}.gz")

    print(f"Backup completed: {backup_file}.gz")
```

### Pattern 3: Log Collection and Analysis

```python
from fabric import SerialGroup as Group

def collect_logs(c):
    """Collect application logs from a remote server"""
    hostname = c.run('hostname', hide=True).stdout.strip()
    c.get('/var/log/app/error.log', local=f'logs/{hostname}_error.log')
    c.get('/var/log/app/access.log', local=f'logs/{hostname}_access.log')

# Collect from all servers
pool = Group('web1', 'web2', 'web3', 'web4')
for conn in pool:
    collect_logs(conn)
```

### Pattern 4: Service Health Check

```python
from fabric import task, SerialGroup as Group

@task
def health_check(c):
    """Check service health across servers"""
    servers = Group('web1', 'web2', 'db1', 'cache1')

    for conn in servers:
        print(f"\nChecking {conn.host}...")

        # Check disk space
        result = conn.run("df -h / | tail -n1 | awk '{print $5}'", hide=True)
        disk_usage = result.stdout.strip()
        print(f"  Disk usage: {disk_usage}")

        # Check memory
        result = conn.run("free -m | grep Mem | awk '{print $3/$2 * 100.0}'", hide=True)
        mem_usage = float(result.stdout.strip())
        print(f"  Memory usage: {mem_usage:.1f}%")

        # Check service status
        result = conn.run("systemctl is-active nginx", warn=True, hide=True)
        service_status = result.stdout.strip()
        print(f"  Nginx status: {service_status}")
```

## When NOT to Use Fabric

### 1. Simple Local Automation

```python
# DON'T use Fabric for local operations
from fabric import Connection

c = Connection('localhost')
c.run('ls -la')

# DO use subprocess instead
import subprocess

subprocess.run(['ls', '-la'])
```

### 2. Large-Scale Infrastructure Management

If you need to manage hundreds of servers with complex configuration requirements, **Ansible** or **SaltStack** provide better:

- Declarative configuration syntax
- Idempotency guarantees
- Large module ecosystem
- Built-in inventory management
- Role-based organization

### 3. Container Orchestration

For Docker/Kubernetes deployments, use native orchestration tools:

- Docker Compose
- Kubernetes manifests
- Helm charts
- ArgoCD

### 4. Configuration Drift Detection

Fabric executes commands but doesn't track state. For configuration management with drift detection, use:

- Ansible
- Chef
- Puppet
- Terraform (for infrastructure)

### 5. Windows Remote Management

For Windows automation, use:

- PowerShell Remoting
- WinRM libraries
- Ansible (with WinRM)

## Decision Matrix

| Scenario                     | Fabric | Ansible | Subprocess | Paramiko |
| ---------------------------- | ------ | ------- | ---------- | -------- |
| Deploy to 1-10 Linux servers | ✓✓     | ✓       | ✗          | ✓        |
| Deploy to 100+ servers       | ✓      | ✓✓      | ✗          | ✗        |
| Run local commands           | ✗      | ✗       | ✓✓         | ✗        |
| Configuration management     | ✗      | ✓✓      | ✗          | ✗        |
| Ad-hoc SSH automation        | ✓✓     | ✓       | ✗          | ✓        |
| Custom SSH protocol work     | ✗      | ✗       | ✗          | ✓✓       |
| Python-first workflow        | ✓✓     | ✗       | ✓✓         | ✓✓       |
| YAML-first workflow          | ✗      | ✓✓      | ✗          | ✗        |
| File transfer over SSH       | ✓✓     | ✓       | ✗          | ✓        |
| Parallel execution           | ✓✓     | ✓✓      | ✓          | ✗        |
| Windows targets              | ✗      | ✓✓      | ✓          | ✗        |

**Legend**: ✓✓ = Excellent fit, ✓ = Suitable, ✗ = Not appropriate

## Common Gotchas and Solutions

### 1. Separate Shell Sessions

**@<https://www.fabfile.org/faq.html>**

```python
from fabric import task

# WRONG: cd doesn't persist across run() calls
@task
def deploy(c):
    c.run("cd /path/to/application")
    c.run("./update.sh")  # This runs in the home directory!

# CORRECT: Use the shell && operator
@task
def deploy(c):
    c.run("cd /path/to/application && ./update.sh")

# ALTERNATIVE: Use absolute paths
@task
def deploy(c):
    c.run("/path/to/application/update.sh")
```
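
Fabric 2.x/3.x connections also inherit Invoke's `cd` context manager, which prefixes each command inside the block with the directory change. A hedged sketch (verify the behavior against your installed Fabric release):

```python
from fabric import task

@task
def deploy(c):
    # Each run() inside the block executes as `cd /path/to/application && ...`.
    with c.cd('/path/to/application'):
        c.run('./update.sh')
```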

### 2. Sudo Password Prompts

```python
# Assumes an existing Connection object `c`.

# WRONG: Sudo hangs waiting for password
c.run('sudo systemctl restart nginx')

# CORRECT: Use pty=True and watchers
from invoke import Responder

sudopass = Responder(
    pattern=r'\[sudo\] password:',
    response='mypassword\n',
)
c.run('sudo systemctl restart nginx', pty=True, watchers=[sudopass])

# BETTER: Use the Connection.sudo() helper
c.sudo('systemctl restart nginx')  # Uses configured password
```

### 3. Connection Reuse

```python
from fabric import Connection

# INEFFICIENT: Creates a new connection each time
for i in range(10):
    Connection('web1').run(f'echo {i}')

# EFFICIENT: Reuse the connection
c = Connection('web1')
for i in range(10):
    c.run(f'echo {i}')
```

## Testing with Fabric

**@<https://docs.fabfile.org/en/latest/testing.html>**

```python
# NOTE: Illustrative sketch; Fabric's mocking helpers live in
# fabric.testing.base, and the exact MockRemote API may differ
# between Fabric versions -- check the testing docs linked above.
from fabric.testing.base import MockRemote

def test_deployment():
    """Test deployment logic without real SSH"""
    with MockRemote(commands={
        'test -d /srv/app': (1, '', ''),  # Exit 1 = doesn't exist
        'git clone ...': (0, 'Cloning...', ''),
        'cd /srv/app && git pull': (0, 'Already up to date', ''),
    }) as remote:
        c = remote.connection
        deploy(c)

    # Verify commands were called
    assert 'git clone' in remote.calls
```

## Migration from Fabric 1.x to 2.x/3.x

**@<https://docs.fabfile.org/en/latest/upgrading.html>**

Key changes:

1. No more `env` global dictionary
2. Tasks must accept `Connection` or `Context` as the first argument
3. No more `@hosts` decorator (use `@task(hosts=[...])`)
4. `run()` is now `c.run()` on the Connection object
5. Import from `fabric`, not `fabric.api`

```python
# Fabric 1.x (OLD)
from fabric.api import env, run, task

env.hosts = ['web1', 'web2']

@task
def deploy():
    run('git pull')

# Fabric 2.x/3.x (NEW)
from fabric import task

@task(hosts=['web1', 'web2'])
def deploy(c):
    c.run('git pull')
```

## Performance Considerations

1. **Parallel vs Serial Execution**:
   - Use `ThreadingGroup` for I/O-bound tasks (network operations)
   - Consider `SerialGroup` for order-dependent operations
   - Default thread pool size is 10 connections

2. **Connection Pooling**:
   - Reuse `Connection` objects when possible
   - Close connections explicitly with `c.close()` or use context managers (see the sketch below)

3. **Output Buffering**:
   - Use `hide=True` to suppress output and improve performance
   - Large output can slow down execution
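
A minimal sketch combining these points, assuming reachable hosts `web1` and `web2` (placeholders):

```python
from fabric import Connection, ThreadingGroup

# Reuse one connection for several commands; the context manager closes it.
with Connection('web1') as c:
    for path in ('/var/log', '/tmp'):
        c.run(f'du -sh {path}', hide=True)  # hide=True suppresses output echo

# ThreadingGroup runs the same command on all hosts concurrently.
pool = ThreadingGroup('web1', 'web2')
results = pool.run('uptime', hide=True)
for connection, result in results.items():
    print(f"{connection.host}: {result.stdout.strip()}")
```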

## Resources and Examples

### Official Documentation

- Main site: @<https://www.fabfile.org/>
- Getting Started: @<https://docs.fabfile.org/en/latest/getting-started.html>
- API Reference: @<https://docs.fabfile.org/en/latest/api/>
- FAQ: @<https://www.fabfile.org/faq.html>
- Upgrading Guide: @<https://www.fabfile.org/upgrading.html>

### GitHub Examples

- Official repository: @<https://github.com/fabric/fabric>
- Example fabfiles: @<https://github.com/fabric/fabric/tree/main/sites/docs>
- Integration tests: @<https://github.com/fabric/fabric/tree/main/integration>

### Community Resources

- Fabricio (Docker automation): @<https://github.com/renskiy/fabricio>
- Linux Journal tutorial: @<https://www.linuxjournal.com/content/fabric-system-administrators-best-friend>
- Medium tutorials: @<https://medium.com/gopyjs/automate-deployment-with-fabric-python-fad992e68b5>

## Summary

**Use Fabric when you need to:**

- Execute commands on remote Linux servers via SSH
- Automate deployment of web applications
- Manage small to medium server fleets (1-50 servers)
- Transfer files between local and remote systems
- Define reusable deployment tasks in Python
- Integrate deployment into CI/CD pipelines

**Don't use Fabric when:**

- You only need local command execution (use subprocess)
- You're managing large infrastructure (>100 servers; use Ansible)
- You need configuration drift detection (use Ansible/Chef/Puppet)
- You're working primarily with Windows servers
- You need declarative infrastructure as code (use Terraform/Ansible)

Fabric excels at programmatic SSH automation for deployment workflows where you want the full power of Python combined with remote execution capabilities. It's the sweet spot between low-level Paramiko and heavyweight configuration management tools.
630
skills/python3-development/references/modern-modules/httpx.md
Normal file
@@ -0,0 +1,630 @@
---
title: "httpx - Next Generation HTTP Client for Python"
library_name: httpx
pypi_package: httpx
category: http_client
python_compatibility: "3.9+"
last_updated: "2025-11-02"
official_docs: "https://www.python-httpx.org"
official_repository: "https://github.com/encode/httpx"
maintenance_status: "active"
---

# httpx - Next Generation HTTP Client for Python

## Overview

**httpx** is a fully-featured HTTP client library for Python 3 that provides both synchronous and asynchronous APIs. It is designed as a next-generation alternative to the popular `requests` library, offering HTTP/1.1 and HTTP/2 support, true async capabilities, and a broadly compatible API while introducing modern improvements.

- **Official Repository:** <https://github.com/encode/httpx> @ encode/httpx
- **Documentation:** <https://www.python-httpx.org/> @ python-httpx.org
- **PyPI Package:** `httpx` @ pypi.org/project/httpx
- **License:** BSD-3-Clause @ github.com/encode/httpx
- **Current Version:** 0.28.1 (as of December 2024) @ pypi.org
- **Maintenance Status:** Actively maintained, 14,652+ GitHub stars @ github.com/encode/httpx

## Core Purpose

### Problem httpx Solves

1. **Async HTTP Support:** Provides native async/await support for HTTP requests, eliminating the need for separate libraries like `aiohttp` @ python-httpx.org/async
2. **HTTP/2 Protocol:** Full HTTP/2 support with connection multiplexing and server push @ python-httpx.org/http2
3. **Modern Python Standards:** Built for Python 3.9+ with full type annotations and modern async patterns @ github.com/encode/httpx/pyproject.toml
4. **Consistent Sync/Async API:** A single library that works for both synchronous and asynchronous code, as the sketch below shows @ python-httpx.org
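
A small sketch of that sync/async parity (httpbin.org is used here as a placeholder endpoint):

```python
import asyncio

import httpx

def fetch_sync(url: str) -> int:
    # Top-level helpers mirror the requests-style API.
    return httpx.get(url).status_code

async def fetch_async(url: str) -> int:
    # Same verbs and arguments, just awaited on an AsyncClient.
    async with httpx.AsyncClient() as client:
        response = await client.get(url)
        return response.status_code

if __name__ == "__main__":
    print(fetch_sync("https://httpbin.org/get"))
    print(asyncio.run(fetch_async("https://httpbin.org/get")))
```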

### What Would Be "Reinventing the Wheel"

Without httpx, you would need to:

- Use separate libraries for sync (`requests`) and async (`aiohttp`) HTTP operations @ towardsdatascience.com
- Implement HTTP/2 support manually or use lower-level libraries @ python-httpx.org/http2
- Manage different API patterns between sync and async code @ python-httpx.org/async
- Handle connection pooling and timeout configuration separately for each library @ python-httpx.org/advanced

## When to Use httpx

### Use httpx When

1. **Async HTTP Required:** You need asynchronous HTTP requests in an async application (FastAPI, asyncio, Trio) @ python-httpx.org/async
2. **HTTP/2 Support Needed:** Your application benefits from HTTP/2 features like multiplexing @ python-httpx.org/http2
3. **Both Sync and Async:** You want one library that handles both synchronous and asynchronous patterns @ python-httpx.org
4. **ASGI/WSGI Testing:** You need to make requests directly to ASGI or WSGI applications without network @ python-httpx.org/advanced/transports
5. **Modern Type Safety:** You require full type annotations and modern Python tooling support @ github.com/encode/httpx
6. **Strict Timeouts:** You need proper timeout handling by default (httpx has timeouts everywhere) @ python-httpx.org/quickstart

### Use requests When

1. **Simple Sync-Only Application:** You only need synchronous HTTP and don't require async @ python-httpx.org/compatibility
2. **Legacy Python Support:** You need to support Python 3.7 or earlier @ github.com/encode/httpx/pyproject.toml
3. **Broad Ecosystem Compatibility:** You rely on requests-specific plugins or tools @ python-httpx.org/compatibility
4. **Auto-Redirects Preferred:** You want automatic redirect following by default (httpx requires explicit opt-in) @ python-httpx.org/quickstart

### Use aiohttp When

1. **Server + Client Together:** You need both HTTP server and client in one library @ medium.com/featurepreneur
2. **WebSocket Support:** You need built-in WebSocket client support (httpx requires the httpx-ws extension) @ github.com/frankie567/httpx-ws
3. **Existing aiohttp Codebase:** You have significant investment in aiohttp-specific features @ medium.com/featurepreneur

## Decision Matrix

```text
┌─────────────────────────────────┬──────────┬──────────┬─────────┐
│ Requirement                     │ httpx    │ requests │ aiohttp │
├─────────────────────────────────┼──────────┼──────────┼─────────┤
│ Sync HTTP requests              │ ✓        │ ✓        │ ✗       │
│ Async HTTP requests             │ ✓        │ ✗        │ ✓       │
│ HTTP/2 support                  │ ✓        │ ✗        │ ✓       │
│ requests-compatible API         │ ✓        │ ✓        │ ✗       │
│ Type annotations                │ ✓        │ Partial  │ ✓       │
│ Default timeouts                │ ✓        │ ✗        │ ✓       │
│ ASGI/WSGI testing               │ ✓        │ ✗        │ ✗       │
│ Python 3.7 support              │ ✗        │ ✓        │ ✓       │
│ Auto-redirects by default       │ ✗        │ ✓        │ ✓       │
│ Built-in server support         │ ✗        │ ✗        │ ✓       │
└─────────────────────────────────┴──────────┴──────────┴─────────┘
```

@ Compiled from python-httpx.org, medium.com/featurepreneur

## Python Version Compatibility

- **Minimum Python Version:** 3.9 @ github.com/encode/httpx/pyproject.toml
- **Officially Supported Versions:** 3.9, 3.10, 3.11, 3.12, 3.13 @ github.com/encode/httpx/pyproject.toml

**Async/Await Requirements:**

- Full async/await syntax support (Python 3.7+) @ python-httpx.org/async
- Works with asyncio, Trio, and anyio backends @ python-httpx.org/async

**Python 3.11-3.14 Status:**

- **3.11:** Fully supported and tested @ github.com/encode/httpx/pyproject.toml
- **3.12:** Fully supported and tested @ github.com/encode/httpx/pyproject.toml
- **3.13:** Fully supported and tested @ github.com/encode/httpx/pyproject.toml
- **3.14:** Expected to work (not yet released as of October 2025)

## Real-World Usage Examples

### Example Projects Using httpx

1. **notion-sdk-py** (2,086+ stars) @ github.com/ramnes/notion-sdk-py
   - Official Notion API client with sync and async support
   - Pattern: Client wrapper using httpx.Client and httpx.AsyncClient
   - URL: <https://github.com/ramnes/notion-sdk-py>

2. **githubkit** (296+ stars) @ github.com/yanyongyu/githubkit
   - Modern GitHub SDK with REST API and GraphQL support
   - Pattern: Unified sync/async interface with httpx
   - URL: <https://github.com/yanyongyu/githubkit>

3. **twscrape** (1,981+ stars) @ github.com/vladkens/twscrape
   - Twitter/X API scraper with authorization support
   - Pattern: Async httpx for high-performance concurrent requests
   - URL: <https://github.com/vladkens/twscrape>

4. **TikTokDownloader** (12,018+ stars) @ github.com/JoeanAmier/TikTokDownloader
   - TikTok/Douyin data collection and download tool
   - Pattern: Async httpx for parallel downloads
   - URL: <https://github.com/JoeanAmier/TikTokDownloader>

5. **XHS-Downloader** (8,982+ stars) @ github.com/JoeanAmier/XHS-Downloader
   - Xiaohongshu (RedNote) content extractor and downloader
   - Pattern: httpx with FastAPI for server-side scraping
   - URL: <https://github.com/JoeanAmier/XHS-Downloader>

### Common Usage Patterns @ github.com/search, exa.ai

```python
# Pattern 1: Synchronous API client wrapper
import httpx

class APIClient:
    def __init__(self, base_url: str, api_key: str):
        self.client = httpx.Client(
            base_url=base_url,
            headers={"Authorization": f"Bearer {api_key}"},
            timeout=30.0
        )

    def get_resource(self, resource_id: str):
        response = self.client.get(f"/resources/{resource_id}")
        response.raise_for_status()
        return response.json()

# Pattern 2: Async concurrent requests
import asyncio

async def fetch_all(urls: list[str]) -> list[dict]:
    async with httpx.AsyncClient() as client:
        tasks = [client.get(url) for url in urls]
        responses = await asyncio.gather(*tasks)
        return [r.json() for r in responses]

# Pattern 3: FastAPI integration with async httpx
from fastapi import FastAPI

app = FastAPI()

@app.get("/proxy/{path:path}")
async def proxy_request(path: str):
    async with httpx.AsyncClient() as client:
        response = await client.get(f"https://api.example.com/{path}")
        return response.json()

# Pattern 4: HTTP/2 with connection pooling
client = httpx.Client(http2=True)
try:
    for i in range(10):
        response = client.get(f"https://http2.example.com/data/{i}")
        print(response.json())
finally:
    client.close()

# Pattern 5: Streaming large downloads with progress
with httpx.stream("GET", "https://example.com/large-file.zip") as response:
    total = int(response.headers["Content-Length"])
    downloaded = 0

    with open("output.zip", "wb") as f:
        for chunk in response.iter_bytes(chunk_size=8192):
            f.write(chunk)
            downloaded += len(chunk)
            print(f"Progress: {downloaded}/{total} bytes")
```

@ Compiled from github.com/encode/httpx/docs, exa.ai/get_code_context

## Integration Patterns

### FastAPI Integration @ raw.githubusercontent.com/refinedev

```python
from fastapi import FastAPI, Request
import httpx

app = FastAPI()

@app.on_event("startup")
async def startup_event():
    app.state.http_client = httpx.AsyncClient()

@app.on_event("shutdown")
async def shutdown_event():
    await app.state.http_client.aclose()

@app.get("/data")
async def get_data(request: Request):
    # Reuse the shared client directly; wrapping it in `async with`
    # would close it after the first request.
    client = request.app.state.http_client
    response = await client.get("https://api.example.com/data")
    return response.json()
```

### Starlette ASGI Transport @ python-httpx.org/advanced/transports

```python
from starlette.applications import Starlette
from starlette.responses import JSONResponse
from starlette.routing import Route
import httpx

async def homepage(request):
    # Starlette endpoints must return a Response object.
    return JSONResponse({"message": "Hello, world"})

app = Starlette(routes=[Route("/", homepage)])

# Test without network
with httpx.Client(transport=httpx.ASGITransport(app=app)) as client:
    response = client.get("http://testserver/")
    assert response.status_code == 200
```

### Trio Async Backend @ python-httpx.org/async

```python
import httpx
import trio

async def main():
    async with httpx.AsyncClient() as client:
        response = await client.get('https://www.example.com/')
        print(response)

trio.run(main)
```

## Installation

### Basic Installation @ python-httpx.org

```bash
pip install httpx
```

### With HTTP/2 Support @ python-httpx.org/http2

```bash
pip install httpx[http2]
```

### With CLI Support @ python-httpx.org

```bash
pip install 'httpx[cli]'
```

### With All Features @ python-httpx.org

```bash
pip install 'httpx[http2,cli,brotli,zstd]'
```

### Using uv (Recommended) @ astral.sh

```bash
uv add httpx
uv add 'httpx[http2]'  # With HTTP/2 support
```

## Usage Examples

### Basic Synchronous Request @ python-httpx.org/quickstart

```python
import httpx

# Simple GET request
response = httpx.get('https://httpbin.org/get')
print(response.status_code)  # 200
print(response.json())

# POST with data
response = httpx.post('https://httpbin.org/post', data={'key': 'value'})

# Custom headers
headers = {'user-agent': 'my-app/0.0.1'}
response = httpx.get('https://httpbin.org/headers', headers=headers)

# Query parameters
params = {'key1': 'value1', 'key2': 'value2'}
response = httpx.get('https://httpbin.org/get', params=params)
```

### Asynchronous Requests @ python-httpx.org/async

```python
import asyncio

import httpx

async def fetch_data():
    async with httpx.AsyncClient() as client:
        response = await client.get('https://www.example.com/')
        print(response.status_code)
        return response.json()

# Run async function
asyncio.run(fetch_data())

# Concurrent requests
async def fetch_multiple():
    async with httpx.AsyncClient() as client:
        tasks = [
            client.get('https://httpbin.org/get'),
            client.get('https://httpbin.org/headers'),
            client.get('https://httpbin.org/user-agent')
        ]
        responses = await asyncio.gather(*tasks)
        return [r.json() for r in responses]
```

### Client Instance with Configuration @ python-httpx.org/advanced/clients

```python
import httpx

# Create configured client
client = httpx.Client(
    base_url='https://api.example.com',
    headers={'Authorization': 'Bearer token123'},
    timeout=30.0,
    follow_redirects=True
)

try:
    # Make requests using the client
    response = client.get('/users/me')
    response.raise_for_status()
    print(response.json())
finally:
    client.close()

# Context manager (automatic cleanup)
with httpx.Client(base_url='https://api.example.com') as client:
    response = client.get('/data')
```

### HTTP/2 Support @ python-httpx.org/http2

```python
import httpx

# Enable HTTP/2
client = httpx.Client(http2=True)

try:
    response = client.get('https://www.google.com')
    print(response.extensions['http_version'])  # b'HTTP/2'
finally:
    client.close()

# Async HTTP/2 (inside an async function)
async def fetch_http2():
    async with httpx.AsyncClient(http2=True) as client:
        response = await client.get('https://www.google.com')
        print(response.extensions['http_version'])
```

### Streaming Responses @ python-httpx.org/quickstart

```python
import httpx

def process_chunk(chunk: bytes) -> None:
    ...  # placeholder for application-specific handling

# Stream bytes
with httpx.stream("GET", "https://www.example.com/large-file") as response:
    for chunk in response.iter_bytes(chunk_size=8192):
        process_chunk(chunk)

# Stream lines
with httpx.stream("GET", "https://www.example.com/log") as response:
    for line in response.iter_lines():
        print(line)

# Conditional loading
with httpx.stream("GET", "https://www.example.com/file") as response:
    if int(response.headers['Content-Length']) < 10_000_000:  # 10MB
        content = response.read()
        print(content)
```

### Error Handling @ python-httpx.org/quickstart

```python
import httpx

try:
    response = httpx.get("https://www.example.com/")
    response.raise_for_status()  # Raises HTTPStatusError for 4xx/5xx
except httpx.RequestError as exc:
    print(f"Network error: {exc.request.url}")
except httpx.HTTPStatusError as exc:
    print(f"HTTP error {exc.response.status_code}: {exc.request.url}")
except httpx.HTTPError as exc:
    print(f"General HTTP error: {exc}")
```

### Authentication @ python-httpx.org/quickstart

```python
import httpx

# Basic authentication
response = httpx.get(
    "https://example.com",
    auth=("username", "password")
)

# Digest authentication
auth = httpx.DigestAuth("username", "password")
response = httpx.get("https://example.com", auth=auth)

# Bearer token
headers = {"Authorization": "Bearer token123"}
response = httpx.get("https://api.example.com", headers=headers)
```

## When NOT to Use httpx

### Scenarios Where httpx May Not Be Suitable

1. **Python 3.8 or Earlier Required** @ github.com/encode/httpx/pyproject.toml
   - httpx requires Python 3.9+
   - Use `requests` for older Python versions

2. **Simple Scripts with Minimal Dependencies** @ python-httpx.org/compatibility
   - If you only need basic HTTP GET/POST in a simple script
   - `requests` has fewer dependencies and a simpler API
   - httpx pulls in additional dependencies (httpcore, anyio, sniffio)

3. **requests Plugin Ecosystem Required** @ python-httpx.org/compatibility
   - Libraries specifically built for requests (requests-oauthlib, etc.)
   - May not have httpx equivalents
   - Consider staying with requests if heavily invested in plugins

4. **Need WebSocket Built-in** @ github.com/frankie567/httpx-ws
   - httpx requires the separate httpx-ws extension
   - aiohttp has built-in WebSocket support

5. **Auto-Redirect Preference** @ python-httpx.org/quickstart
   - httpx does NOT follow redirects by default (security-conscious design)
   - Requires explicit `follow_redirects=True`
   - requests follows redirects automatically

6. **Server + Client in One Library** @ medium.com/featurepreneur
   - httpx is client-only
   - Use aiohttp or starlette if you need both server and client

## Key Differences from requests

### API Compatibility @ python-httpx.org/compatibility

httpx provides broad compatibility with requests, but with key differences:

```python
import httpx
import requests

# requests: Auto-redirects by default
requests.get('http://github.com/')  # Follows to HTTPS

# httpx: Explicit redirect handling
httpx.get('http://github.com/', follow_redirects=True)

# requests: No timeouts by default
requests.get('https://example.com')

# httpx: 5-second default timeout
httpx.get('https://example.com')  # 5s timeout

# requests: Session object
session = requests.Session()

# httpx: Client object
client = httpx.Client()
```

### Modern Improvements @ python-httpx.org

1. **Type Safety:** Full type annotations throughout @ github.com/encode/httpx
2. **Async Native:** Built-in async/await support @ python-httpx.org/async
3. **Strict Timeouts:** Timeouts everywhere by default @ python-httpx.org/quickstart
4. **HTTP/2:** Optional HTTP/2 protocol support @ python-httpx.org/http2
5. **Better Encoding:** UTF-8 default encoding vs latin1 in requests @ python-httpx.org/compatibility

## Dependencies @ github.com/encode/httpx

### Core Dependencies

- **httpcore** - Underlying transport implementation @ github.com/encode/httpcore
- **certifi** - SSL certificates @ github.com/certifi
- **idna** - Internationalized domain names @ github.com/kjd/idna
- **anyio** - Async abstraction layer @ github.com/agronholm/anyio
- **sniffio** - Async library detection @ github.com/python-trio/sniffio

### Optional Dependencies

- **h2** - HTTP/2 support (`httpx[http2]`) @ github.com/python-hyper/h2
- **socksio** - SOCKS proxy support (`httpx[socks]`) @ github.com/sethmlarson/socksio
- **brotli/brotlicffi** - Brotli compression (`httpx[brotli]`) @ github.com/google/brotli
- **zstandard** - Zstandard compression (`httpx[zstd]`) @ github.com/indygreg/python-zstandard
- **click + pygments + rich** - CLI support (`httpx[cli]`) @ github.com/pallets/click

## Testing and Mocking

### respx - Mock httpx @ github.com/lundberg/respx

```python
import httpx
import respx

@respx.mock
async def test_api_call():
    async with httpx.AsyncClient() as client:
        route = respx.get("https://example.org/")
        response = await client.get("https://example.org/")
        assert route.called
        assert response.status_code == 200
```

### pytest-httpx @ github.com/Colin-b/pytest_httpx

```python
import httpx

def test_with_httpx(httpx_mock):
    httpx_mock.add_response(url="https://example.com/", json={"status": "ok"})

    response = httpx.get("https://example.com/")
    assert response.json() == {"status": "ok"}
```

## Performance Considerations @ raw.githubusercontent.com/encode/httpx

### Connection Pooling

```python
import asyncio

import httpx

# Reuse connections with Client
client = httpx.Client()
for i in range(100):
    response = client.get(f"https://api.example.com/item/{i}")
client.close()

# Async with connection limits (inside an async function)
async def fetch_all(urls: list[str]) -> list[httpx.Response]:
    async with httpx.AsyncClient(
        limits=httpx.Limits(max_keepalive_connections=20, max_connections=100)
    ) as client:
        # Efficient connection reuse
        tasks = [client.get(url) for url in urls]
        return await asyncio.gather(*tasks)
```

### Timeout Configuration @ python-httpx.org/advanced/timeouts

```python
import httpx

# Fine-grained timeouts
timeout = httpx.Timeout(
    connect=5.0,   # Connection timeout
    read=10.0,     # Read timeout
    write=10.0,    # Write timeout
    pool=None      # Pool acquisition timeout
)

client = httpx.Client(timeout=timeout)
```

## Additional Resources

### Official Documentation @ python-httpx.org

- Quickstart Guide: <https://www.python-httpx.org/quickstart/>
- Async Support: <https://www.python-httpx.org/async/>
- HTTP/2: <https://www.python-httpx.org/http2/>
- Advanced Usage: <https://www.python-httpx.org/advanced/>
- API Reference: <https://www.python-httpx.org/api/>

### Community Resources

- GitHub Discussions: <https://github.com/encode/httpx/discussions> @ github.com
- Third-Party Packages: <https://www.python-httpx.org/third_party_packages/> @ python-httpx.org
- httpx-oauth: OAuth client using httpx @ github.com/frankie567/httpx-oauth
- httpx-ws: WebSocket support @ github.com/frankie567/httpx-ws
- httpx-sse: Server-Sent Events @ github.com/florimondmanca/httpx-sse

### Migration Guides

- Requests Compatibility: <https://www.python-httpx.org/compatibility/> @ python-httpx.org
- Contributing Guide: <https://www.python-httpx.org/contributing/> @ python-httpx.org

## Summary

httpx is the modern choice for HTTP clients in Python when you need:

- Async/await support alongside synchronous APIs
- HTTP/2 protocol capabilities
- Type-safe, well-documented interfaces
- Strict timeout and error handling by default
- Testing against ASGI/WSGI apps without network

It maintains broad compatibility with requests while introducing modern Python best practices, making it an excellent choice for new projects and async applications. For simple synchronous scripts or legacy Python support, requests remains a solid choice.

---

- **Research completed:** 2025-10-21 @ Claude Code Agent
- **Sources verified:** GitHub, Context7, PyPI, Official Documentation @ Multiple verified sources
- **Confidence level:** High - All information cross-referenced from official sources
@@ -0,0 +1,706 @@
---
title: "paho-mqtt: Eclipse Paho MQTT Python Client"
library_name: paho-mqtt
pypi_package: paho-mqtt
category: networking
python_compatibility: "3.7+"
last_updated: "2025-11-02"
official_docs: "https://eclipse.dev/paho/files/paho.mqtt.python/html"
official_repository: "https://github.com/eclipse-paho/paho.mqtt.python"
maintenance_status: "active"
---

# paho-mqtt: Eclipse Paho MQTT Python Client

## Core Purpose

The model must state that paho-mqtt is the official Eclipse Foundation implementation of the MQTT protocol client for Python.

MQTT (Message Queuing Telemetry Transport) is a lightweight publish-subscribe messaging protocol designed for machine-to-machine (M2M) and Internet of Things (IoT) communication. The model must explain that paho-mqtt solves the problem of reliable, lightweight messaging in bandwidth-constrained or unreliable network environments.

### What Problem Does It Solve?

The model must identify these key use cases:

1. **IoT Device Communication**: Enables lightweight messaging between sensors, actuators, and control systems
2. **Publish-Subscribe Patterns**: Provides decoupled communication where publishers and subscribers do not need to know about each other
3. **Bandwidth-Constrained Networks**: Minimizes network overhead compared to HTTP/REST
4. **Unreliable Network Handling**: Built-in Quality of Service (QoS) levels ensure message delivery guarantees
5. **Bi-directional Communication**: Supports both sending commands to devices and receiving telemetry data

### Reinventing the Wheel Without It

The model must verify that without paho-mqtt, developers would need to:

- Implement the complex MQTT protocol specification from scratch @ <https://mqtt.org/mqtt-specification/>
- Handle connection management, keep-alive pings, and automatic reconnection logic
- Implement three QoS levels (0, 1, 2) with appropriate acknowledgment mechanisms
- Manage topic subscriptions and wildcard matching
- Handle TLS/SSL encryption and WebSocket transport layers
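
By contrast, a working publisher with paho-mqtt is only a few lines. A minimal sketch against the public Eclipse test broker:

```python
import paho.mqtt.client as mqtt

client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)
client.connect("mqtt.eclipseprojects.io", 1883, keepalive=60)
client.loop_start()  # background thread handles keep-alive pings and acks

# QoS 1 publish; wait_for_publish() blocks until the broker acknowledges.
info = client.publish("demo/hello", "hello from paho", qos=1)
info.wait_for_publish()

client.loop_stop()
client.disconnect()
```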

## Official Information

The model must cite these verified sources:

- **Official Repository**: <https://github.com/eclipse-paho/paho.mqtt.python> @ verified 2025-10-21
- **PyPI Package**: `paho-mqtt` @ <https://pypi.org/project/paho-mqtt/>
- **Official Documentation**: <https://eclipse.dev/paho/files/paho.mqtt.python/html/>
- **License**: EPL-2.0 OR BSD-3-Clause (Eclipse Distribution License v1.0 and Eclipse Public License 2.0)
- **Maintainer**: Eclipse Foundation @ <https://www.eclipse.org/paho/>
- **Latest Version**: 2.1.0 (as of 2025-10-21)
- **Status**: Active development, production-ready (Beta classification indicates a mature but evolving API)

## Python Version Compatibility

The model must verify from pyproject.toml @ <https://github.com/eclipse-paho/paho.mqtt.python/blob/master/pyproject.toml>:

- **Minimum Python Version**: 3.7
- **Tested Versions**: 3.7, 3.8, 3.9, 3.10
- **Python 3.11-3.14 Status**: The model must state that Python 3.11+ compatibility is likely based on package maturity, but official classifiers only list up to 3.10. Users should test thoroughly.
- **Python 2.x**: NOT supported (users must stay on paho-mqtt 1.x for Python 2)

### Breaking Changes

The model must warn that version 2.0 introduced breaking changes:

- Callback API signatures changed (VERSION1 vs VERSION2), as the sketch below shows
- `CallbackAPIVersion` must now be explicitly specified
- Migration guide available @ <https://eclipse.dev/paho/files/paho.mqtt.python/html/migrations.html>
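
A sketch of the `on_connect` signature change (an illustration of the pattern; consult the migration guide for the full list of affected callbacks):

```python
import paho.mqtt.client as mqtt

# Old VERSION1 style: rc is a plain integer result code.
def on_connect_v1(client, userdata, flags, rc):
    print(f"connected, rc={rc}")

# New VERSION2 style: reason_code and properties objects are passed.
def on_connect_v2(client, userdata, flags, reason_code, properties):
    print(f"connected, reason={reason_code}")

# Since 2.0 the callback API version must be chosen explicitly.
client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)
client.on_connect = on_connect_v2
```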

## Core Features

### 1. Quality of Service (QoS) Levels

The model must explain that MQTT provides three QoS levels @ <https://mqtt.org/mqtt-specification/>:

- **QoS 0 (At most once)**: Fire-and-forget, no acknowledgment, possible message loss
- **QoS 1 (At least once)**: Guaranteed delivery, possible duplicates, uses PUBACK
- **QoS 2 (Exactly once)**: Guaranteed single delivery, uses PUBREC/PUBREL/PUBCOMP handshake

Example from official docs @ <https://github.com/eclipse-paho/paho.mqtt.python/blob/master/README.rst>:

```python
import paho.mqtt.client as mqtt

client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)
client.connect("mqtt.eclipseprojects.io", 1883)

# QoS 0: Fire and forget
client.publish("topic/sensor", "temperature:22", qos=0)

# QoS 1: At least once delivery
msg_info = client.publish("topic/critical", "alert", qos=1)
msg_info.wait_for_publish()  # Wait for PUBACK

# QoS 2: Exactly once delivery
client.publish("topic/transaction", "payment:100", qos=2)
```

### 2. Connection Management

The model must verify that paho-mqtt handles:

- **Keep-Alive Mechanism**: Automatic ping/pong to maintain connection
- **Automatic Reconnection**: Built-in retry logic with exponential backoff
- **Clean Session vs Persistent Session**: Control message persistence across disconnections
- **Last Will and Testament (LWT)**: Automatic message sent on unexpected disconnection (see the sketch after the example below)

Example @ <https://eclipse.dev/paho/files/paho.mqtt.python/html/client.html>:

```python
import paho.mqtt.client as mqtt

def on_connect(client, userdata, flags, reason_code, properties):
    if reason_code.is_failure:
        print(f"Failed to connect: {reason_code}")
    else:
        print("Connected successfully")
        # Subscribing in on_connect ensures subscriptions persist across reconnections
        client.subscribe("sensors/#")

def on_disconnect(client, userdata, flags, reason_code, properties):
    if reason_code != 0:
        print(f"Unexpected disconnect: {reason_code}")

client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)
client.on_connect = on_connect
client.on_disconnect = on_disconnect

# Configure reconnection with exponential backoff
client.reconnect_delay_set(min_delay=1, max_delay=120)

client.connect("mqtt.eclipseprojects.io", 1883, keepalive=60)
client.loop_forever()  # Handles automatic reconnection
```
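
Last Will and Testament is configured before connecting; the broker publishes the will message if the client disappears without a clean disconnect. A minimal sketch (topic names are placeholders):

```python
import paho.mqtt.client as mqtt

client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)

# Must be set before connect(); the broker delivers it on ungraceful loss.
client.will_set("clients/sensor-42/status", payload="offline", qos=1, retain=True)

client.connect("mqtt.eclipseprojects.io", 1883, keepalive=60)
client.publish("clients/sensor-42/status", "online", qos=1, retain=True)
client.loop_forever()
```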

### 3. Topic Wildcards

The model must explain MQTT topic wildcards @ <http://www.steves-internet-guide.com/understanding-mqtt-topics/>:

- **`+` (single-level wildcard)**: Matches one topic level, e.g., `home/+/temperature` matches `home/bedroom/temperature`
- **`#` (multi-level wildcard)**: Matches multiple levels, e.g., `sensors/#` matches `sensors/temp`, `sensors/humidity/outside`

```python
# Subscribe to all system topics
client.subscribe("$SYS/#")

# Subscribe to all rooms' temperature
client.subscribe("home/+/temperature")

# Helper function to check topic matches
from paho.mqtt.client import topic_matches_sub

assert topic_matches_sub("foo/#", "foo/bar")
assert topic_matches_sub("+/bar", "foo/bar")
assert not topic_matches_sub("non/+/+", "non/matching")
```

## Real-World Examples

The model must cite these verified examples from GitHub search @ 2025-10-21:

### 1. Home Assistant Integration

- **Repository**: <https://github.com/home-assistant/core> (82,088 stars)
- **Use Case**: Open-source home automation platform using MQTT for device integration
- **Pattern**: Bidirectional communication with IoT devices (lights, sensors, thermostats)

```python
# Pattern extracted from the Home Assistant ecosystem
import paho.mqtt.client as mqtt

def on_message(client, userdata, message):
    topic = message.topic  # e.g., "homeassistant/switch/living_room/state"
    payload = message.payload.decode()  # e.g., "ON" or "OFF"
    # Process device state changes
    handle_device_update(topic, payload)

client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)
client.on_message = on_message
client.connect("homeassistant.local", 1883)
client.subscribe("homeassistant/#")
client.loop_forever()
```

### 2. AWS IoT Integration

- **Repository**: <https://github.com/mariocannistra/python-paho-mqtt-for-aws-iot> (82 stars)
- **Use Case**: Connect Python applications to AWS IoT Core using MQTT over TLS
- **Pattern**: Secure cloud IoT communication with certificate-based authentication

```python
# Pattern from AWS IoT example @ https://github.com/mariocannistra/python-paho-mqtt-for-aws-iot
import paho.mqtt.client as mqtt
import ssl

client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)

# AWS IoT requires TLS with client certificates
client.tls_set(
    ca_certs="root-CA.crt",
    certfile="device-cert.pem",
    keyfile="device-private.key",
    tls_version=ssl.PROTOCOL_TLSv1_2
)

# AWS IoT endpoint
client.connect("a1b2c3d4e5f6g7.iot.us-west-2.amazonaws.com", 8883)

# Publish sensor data
client.publish("device/telemetry", '{"temp": 22.5, "humidity": 65}', qos=1)
```

### 3. Async Wrapper Pattern

- **Repository**: <https://github.com/mossblaser/aiomqtt> (53 stars)
- **Use Case**: AsyncIO wrapper around paho-mqtt for non-blocking event loops
- **Pattern**: Integration with modern Python async/await patterns

Note: The model must verify that paho-mqtt itself does NOT have native async support. Third-party wrappers exist:

- `aiomqtt` @ <https://github.com/empicano/aiomqtt> (recommended, actively maintained)
- `asyncio-mqtt` @ <https://github.com/sbtinstruments/asyncio-mqtt> (deprecated)
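
A minimal sketch of the aiomqtt style (the exact API differs between aiomqtt versions, so treat this as illustrative; the broker is the public Eclipse test endpoint):

```python
import asyncio

import aiomqtt  # third-party wrapper, installed separately

async def main():
    async with aiomqtt.Client("mqtt.eclipseprojects.io") as client:
        await client.subscribe("sensors/#")
        async for message in client.messages:
            print(message.topic, message.payload)

asyncio.run(main())
```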
|
||||
|
||||
## Integration Patterns

### 1. MQTT Broker Compatibility

The model must verify that paho-mqtt works with these brokers @ <https://mqtt.org/software/>:

- **Eclipse Mosquitto**: Open-source, lightweight broker @ <https://mosquitto.org/>
- **HiveMQ**: Enterprise MQTT platform @ <https://www.hivemq.com/>
- **EMQX**: Scalable, distributed broker @ <https://www.emqx.io/>
- **AWS IoT Core**: Cloud-based managed service
- **Azure IoT Hub**: Microsoft cloud IoT platform
- **Google Cloud IoT Core**: Google cloud service (retired by Google in 2023)

Example with Mosquitto @ <http://www.steves-internet-guide.com/into-mqtt-python-client/>:

```python
import paho.mqtt.client as mqtt


def on_message(client, userdata, message):
    print(f"Received: {message.payload.decode()} on {message.topic}")


client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2, client_id="python_client")
client.on_message = on_message

# Local Mosquitto broker
client.connect("localhost", 1883, 60)
client.subscribe("test/topic")
client.loop_forever()
```

### 2. WebSocket Transport

The model must verify WebSocket support @ <https://github.com/eclipse-paho/paho.mqtt.python/blob/master/ChangeLog.txt>:

```python
import paho.mqtt.client as mqtt

# Connect via WebSocket (useful for browser-based or proxy environments)
client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2, transport="websockets")

# Configure WebSocket path and headers
client.ws_set_options(path="/mqtt", headers={'User-Agent': 'Paho-Python'})

# Connect to the broker's WebSocket port
client.connect("mqtt.example.com", 8080, 60)
```

### 3. TLS/SSL Encryption

The model must verify TLS support @ <https://eclipse.dev/paho/files/paho.mqtt.python/html/client.html>:

```python
import paho.mqtt.client as mqtt
import ssl

client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)

# Server certificate validation plus a client certificate (mutual TLS)
client.tls_set(
    ca_certs="ca.crt",
    certfile="client.crt",
    keyfile="client.key",
    tls_version=ssl.PROTOCOL_TLSv1_2
)

# For testing only: skip hostname verification of the server certificate (insecure!)
# client.tls_insecure_set(True)

client.connect("secure.mqtt.broker", 8883)
```

## Usage Examples

### Basic Publish

Example from official docs @ <https://github.com/eclipse-paho/paho.mqtt.python/blob/master/README.rst>:

```python
import paho.mqtt.publish as publish

# One-shot publish (connect, publish, disconnect)
publish.single(
    "home/temperature",
    payload="22.5",
    hostname="mqtt.eclipseprojects.io",
    port=1883
)

# Multiple messages at once
msgs = [
    {'topic': "sensor/temp", 'payload': "22.5"},
    {'topic': "sensor/humidity", 'payload': "65"},
    ('sensor/pressure', '1013', 0, False)  # Alternative tuple format: (topic, payload, qos, retain)
]
publish.multiple(msgs, hostname="mqtt.eclipseprojects.io")
```

### Basic Subscribe

Example from official docs @ <https://github.com/eclipse-paho/paho.mqtt.python/blob/master/README.rst>:

```python
import paho.mqtt.subscribe as subscribe

# Simple blocking subscribe (receives one message)
msg = subscribe.simple("home/temperature", hostname="mqtt.eclipseprojects.io")
print(f"{msg.topic}: {msg.payload.decode()}")

# Callback-based subscription
def on_message_handler(client, userdata, message):
    print(f"{message.topic}: {message.payload.decode()}")
    userdata["count"] += 1
    if userdata["count"] >= 10:
        client.disconnect()  # Stop after 10 messages

subscribe.callback(
    on_message_handler,
    "sensors/#",
    hostname="mqtt.eclipseprojects.io",
    userdata={"count": 0}
)
```

### Production-Grade Client

Example combining best practices @ <https://cedalo.com/blog/configuring-paho-mqtt-python-client-with-examples/>:

```python
import paho.mqtt.client as mqtt
import time


def on_connect(client, userdata, flags, reason_code, properties):
    if reason_code.is_failure:
        print(f"Connection failed: {reason_code}")
        return

    print(f"Connected with result code {reason_code}")
    # Subscribing in on_connect ensures subscriptions persist after reconnection
    client.subscribe("sensors/#", qos=1)


def on_disconnect(client, userdata, flags, reason_code, properties):
    if reason_code != 0:
        print(f"Unexpected disconnect. Reconnecting... (code: {reason_code})")


def on_message(client, userdata, message):
    print(f"Topic: {message.topic}")
    print(f"Payload: {message.payload.decode()}")
    print(f"QoS: {message.qos}")
    print(f"Retain: {message.retain}")


def on_publish(client, userdata, mid, reason_code, properties):
    print(f"Message {mid} published")


# Create client with VERSION2 callbacks (recommended)
client = mqtt.Client(
    mqtt.CallbackAPIVersion.VERSION2,
    client_id="sensor_monitor",
    clean_session=False  # Persistent session
)

# Set callbacks
client.on_connect = on_connect
client.on_disconnect = on_disconnect
client.on_message = on_message
client.on_publish = on_publish

# Authentication
client.username_pw_set("username", "password")

# TLS (if required)
# client.tls_set(ca_certs="ca.crt")

# Reconnection settings
client.reconnect_delay_set(min_delay=1, max_delay=120)

# Connect
client.connect("mqtt.example.com", 1883, keepalive=60)

# Start network loop in a background thread
client.loop_start()

# Application logic
try:
    while True:
        # Publish sensor data
        result = client.publish("sensors/temperature", "22.5", qos=1)
        result.wait_for_publish()  # Block until published
        time.sleep(5)
except KeyboardInterrupt:
    print("Shutting down...")
finally:
    client.loop_stop()
    client.disconnect()
```

### Loop Management Patterns

The model must explain three loop options @ <https://eclipse.dev/paho/files/paho.mqtt.python/html/client.html> (the options below are alternatives - pick exactly one):

```python
import paho.mqtt.client as mqtt

client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)
client.connect("mqtt.eclipseprojects.io", 1883)

# OPTION 1: Blocking loop (simplest)
# Runs forever, handles reconnection automatically
client.loop_forever()

# OPTION 2: Threaded loop (recommended for most cases)
# Runs in a background thread, main thread free for other work
client.loop_start()
# ... do other work ...
client.loop_stop()

# OPTION 3: Manual loop (advanced, full control)
# Must be called regularly, manual reconnection handling
while True:
    rc = client.loop(timeout=1.0)
    if rc != 0:
        # Handle connection error
        break
```

## When NOT to Use paho-mqtt

The model must provide clear decision guidance based on verified constraints:

### Use HTTP/REST Instead When

1. **Request-Response Pattern**: Simple one-off queries without a persistent connection
   - Example: Weather API calls, database queries
   - Reason: HTTP is simpler for synchronous request-response

2. **Large Payload Transfer**: Transferring files, images, or large datasets
   - Example: Uploading videos, downloading reports
   - Reason: HTTP has better tooling for chunked transfer, range requests

3. **Browser-Based Only**: Pure web applications without IoT integration
   - Example: Standard web app, SPA without real-time requirements
   - Reason: REST APIs are natively supported by browsers

4. **Strong Consistency Required**: Immediate consistency across all clients
   - Example: Financial transactions, inventory management
   - Reason: MQTT is eventually consistent, REST can enforce immediate consistency

### Use WebSockets Instead When

1. **Full-Duplex, Low-Latency Communication**: Real-time chat, gaming, collaborative editing
   - Example: Slack-like messaging, Google Docs collaboration
   - Reason: WebSockets provide bidirectional streams without MQTT protocol overhead

2. **Custom Protocol**: Need full control over message format and semantics
   - Example: Proprietary binary protocols, custom RPC
   - Reason: WebSockets are a transport layer, MQTT adds specific semantics

### Use Message Queues (RabbitMQ, Kafka) Instead When

1. **Complex Routing Logic**: Advanced routing rules, message transformation
   - Example: Enterprise service bus, workflow orchestration
   - Reason: RabbitMQ exchanges provide richer routing than MQTT topics

2. **High-Throughput Log Streaming**: Million+ messages per second, log aggregation
   - Example: Centralized logging, event sourcing at scale
   - Reason: Kafka is optimized for high-throughput sequential writes

3. **Message Persistence and Replay**: Need to replay message history
   - Example: Event sourcing, audit trails
   - Reason: Kafka provides durable log storage, MQTT has limited persistence

## Decision Matrix: MQTT vs Alternatives

The model must provide this decision matrix based on verified use cases:

| **Use Case** | **MQTT (paho-mqtt)** | **HTTP/REST** | **WebSocket** | **Message Queue** |
| --- | --- | --- | --- | --- |
| **IoT Sensor Data** | ✅ Optimal | ❌ Too heavy | ⚠️ Possible | ❌ Overkill |
| **Home Automation** | ✅ Optimal | ❌ Polling inefficient | ⚠️ Possible | ❌ Too complex |
| **Mobile Notifications** | ✅ Good (battery efficient) | ⚠️ Polling wastes battery | ✅ Good | ❌ Overkill |
| **Real-time Chat** | ⚠️ Possible | ❌ No real-time | ✅ Optimal | ⚠️ Possible |
| **File Transfer** | ❌ Not designed for this | ✅ Better tools | ⚠️ Possible | ❌ Wrong tool |
| **Microservices RPC** | ⚠️ Possible | ✅ Standard approach | ❌ Overkill | ✅ Enterprise scale |
| **Telemetry Collection** | ✅ Optimal | ❌ Too chatty | ❌ Overkill | ✅ At massive scale |

### Use MQTT When

The model must verify these conditions favor MQTT:

1. ✅ **Bandwidth is constrained** (cellular, satellite links)
2. ✅ **Network is unreliable** (intermittent connectivity)
3. ✅ **Many-to-many communication** (pub-sub pattern)
4. ✅ **Low latency required** (< 100ms message delivery)
5. ✅ **Battery-powered devices** (minimal protocol overhead)
6. ✅ **IoT/M2M communication** (devices, sensors, actuators)
7. ✅ **Topic-based routing** (hierarchical topic namespaces)

### Use HTTP/REST When

The model must verify these conditions favor HTTP:

1. ✅ **Request-response pattern** (client initiates, server responds)
2. ✅ **Stateless interactions** (no persistent connection needed)
3. ✅ **Large payloads** (files, documents, media)
4. ✅ **Caching required** (HTTP caching semantics)
5. ✅ **Browser-based clients** (native browser support)
6. ✅ **Standard CRUD operations** (REST conventions)

### Use WebSocket When

The model must verify these conditions favor WebSocket:

1. ✅ **Full-duplex communication** (simultaneous send/receive)
2. ✅ **Custom protocol** (need full control over wire format)
3. ✅ **Browser-based real-time** (chat, collaboration, gaming)
4. ✅ **Lower protocol overhead than MQTT** (raw frames, no broker semantics)
5. ✅ **Simple point-to-point** (no pub-sub routing needed)

## Known Limitations

The model must cite these verified limitations @ <https://github.com/eclipse-paho/paho.mqtt.python/blob/master/README.rst>:

### Session Persistence

1. **Memory-Only Sessions**: When `clean_session=False`, session state is NOT persisted to disk
   - Impact: Session lost if the Python process restarts
   - Lost data: QoS 2 messages in-flight, pending QoS 1/2 publishes
   - Mitigation: Use `wait_for_publish()` to ensure message delivery before shutdown

```python
import paho.mqtt.client as mqtt

# Session state is only in memory!
# A non-empty client_id is required when clean_session=False
# (paho raises ValueError otherwise)
client = mqtt.Client(
    mqtt.CallbackAPIVersion.VERSION2,
    client_id="persistent-client",
    clean_session=False
)

# Ensure the message is fully acknowledged before shutdown
msg_info = client.publish("critical/data", "important", qos=2)
msg_info.wait_for_publish()  # Blocks until PUBCOMP received
```

2. **QoS 2 Duplicate Risk**: With `clean_session=True`, QoS > 0 messages are republished after reconnection
   - Impact: QoS 2 messages may be received twice (non-compliant with the MQTT spec)
   - Standard requires: Discard unacknowledged messages on reconnection
   - Recommendation: Use `clean_session=False` for exactly-once guarantees; a duplicate-tolerant consumer sketch follows below

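Where duplicates are unacceptable, applications often layer their own deduplication on top of QoS. A minimal sketch, assuming publishers embed a unique `msg_id` field in each JSON payload (MQTT itself exposes no such application-level ID, and `process` is an application-defined handler):

```python
import json

import paho.mqtt.client as mqtt

seen_ids = set()  # in production, back this with persistent storage


def on_message(client, userdata, message):
    event = json.loads(message.payload)
    if event["msg_id"] in seen_ids:
        return  # duplicate delivery after reconnection - ignore
    seen_ids.add(event["msg_id"])
    process(event)  # application-defined handler


client = mqtt.Client(
    mqtt.CallbackAPIVersion.VERSION2,
    client_id="dedupe-consumer",
    clean_session=False
)
client.on_message = on_message
```
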
### Native Async Support

The model must verify that paho-mqtt does NOT have native asyncio support:

- **Workaround**: Use third-party wrappers like `aiomqtt` @ <https://github.com/empicano/aiomqtt>
- **Alternative**: Use threaded loops (`loop_start()`) or external event loop support

```python
# NOT native async - a wrapper is needed (API shown is aiomqtt >= 2.0,
# where client.messages is an async iterator)
import asyncio
from aiomqtt import Client  # Third-party wrapper

async def main():
    async with Client("mqtt.eclipseprojects.io") as client:
        await client.subscribe("sensors/#")
        async for message in client.messages:
            print(message.payload.decode())

asyncio.run(main())
```

## Installation

The model must verify installation from official sources @ <https://pypi.org/project/paho-mqtt/>:

```bash
# Standard installation
pip install paho-mqtt

# With SOCKS proxy support
pip install paho-mqtt[proxy]

# Development installation from source
git clone https://github.com/eclipse-paho/paho.mqtt.python
cd paho.mqtt.python
pip install -e .
```

## Common Patterns and Best Practices

### 1. Reconnection Handling

The model must recommend this pattern @ <https://eclipse.dev/paho/files/paho.mqtt.python/html/client.html>:

```python
import paho.mqtt.client as mqtt


def on_connect(client, userdata, flags, reason_code, properties):
    # ALWAYS subscribe in the on_connect callback
    # This ensures subscriptions are renewed after reconnection
    client.subscribe("sensors/#", qos=1)


def on_disconnect(client, userdata, flags, reason_code, properties):
    if reason_code != 0:
        print(f"Unexpected disconnect: {reason_code}")
        # loop_forever() and loop_start() will automatically reconnect


client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)
client.on_connect = on_connect
client.on_disconnect = on_disconnect

# Configure reconnection delay
client.reconnect_delay_set(min_delay=1, max_delay=120)

client.connect("mqtt.example.com", 1883)
client.loop_forever()  # Handles reconnection automatically
```

### 2. Logging for Debugging

The model must recommend enabling logging @ <https://eclipse.dev/paho/files/paho.mqtt.python/html/client.html>:

```python
import logging
import paho.mqtt.client as mqtt

# Enable standard Python logging
logging.basicConfig(level=logging.DEBUG)

client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)
client.enable_logger()  # Uses the standard logging module

# Or use a custom on_log callback
def on_log(client, userdata, level, buf):
    if level == mqtt.MQTT_LOG_ERR:
        print(f"ERROR: {buf}")

client.on_log = on_log
```

### 3. Graceful Shutdown

The model must recommend this pattern for clean disconnection:

```python
import paho.mqtt.client as mqtt
import signal
import sys

client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)

def signal_handler(sig, frame):
    print("Shutting down gracefully...")
    client.disconnect()  # Triggers clean disconnect
    client.loop_stop()
    sys.exit(0)

signal.signal(signal.SIGINT, signal_handler)

client.connect("mqtt.example.com", 1883)
client.loop_start()

# Application runs...
signal.pause()  # Wait for SIGINT (POSIX only; not available on Windows)
```

## References and Sources

The model must cite these verified sources used in this research:

1. Official Eclipse Paho repository @ <https://github.com/eclipse-paho/paho.mqtt.python>
2. Official documentation @ <https://eclipse.dev/paho/files/paho.mqtt.python/html/>
3. MQTT specification @ <https://mqtt.org/mqtt-specification/>
4. Steve's Internet Guide (MQTT tutorials) @ <http://www.steves-internet-guide.com/>
5. HiveMQ MQTT client guide @ <https://www.hivemq.com/blog/mqtt-client-library-paho-python/>
6. Cedalo MQTT configuration guide @ <https://cedalo.com/blog/configuring-paho-mqtt-python-client-with-examples/>
7. EMQX Python MQTT guide @ <https://www.emqx.com/en/blog/how-to-use-mqtt-in-python>
8. Home Assistant core repository @ <https://github.com/home-assistant/core>
9. AWS IoT Python example @ <https://github.com/mariocannistra/python-paho-mqtt-for-aws-iot>
10. aiomqtt async wrapper @ <https://github.com/empicano/aiomqtt>

## Summary

The model must conclude that paho-mqtt is the recommended solution when:

1. Building IoT applications with resource-constrained devices
2. Implementing pub-sub messaging patterns with topic-based routing
3. Operating in unreliable or bandwidth-limited network environments
4. Requiring specific QoS guarantees for message delivery
5. Integrating with standard MQTT brokers (Mosquitto, HiveMQ, EMQX, AWS IoT)

The model must avoid paho-mqtt when:

1. Simple request-response patterns suffice (use HTTP/REST)
2. Real-time, low-latency browser communication is needed (use WebSocket)
3. Complex message routing or high-throughput streaming is required (use RabbitMQ/Kafka)
4. Large file transfers or binary data streaming are needed (use HTTP)

The model must verify that paho-mqtt is production-ready, actively maintained by the Eclipse Foundation, and the de facto standard MQTT client library for Python.

512
skills/python3-development/references/modern-modules/prefect.md
Normal file
@@ -0,0 +1,512 @@
---
title: "Prefect: Modern Workflow Orchestration Platform"
library_name: prefect
pypi_package: prefect
category: workflow-orchestration
python_compatibility: "3.9+"
last_updated: "2025-11-02"
official_docs: "https://docs.prefect.io"
official_repository: "https://github.com/PrefectHQ/prefect"
maintenance_status: "active"
---

# Prefect: Modern Workflow Orchestration

## Core Purpose

Prefect solves workflow orchestration with a Python-first approach that turns regular Python functions into production-ready data pipelines. Unlike legacy orchestrators that require DAG definitions and framework-specific operators, Prefect observes native Python code execution and provides orchestration through simple decorators@[1].

**Problem Domain:** Coordinating multi-step data workflows, handling failures with retries, scheduling recurring jobs, monitoring pipeline execution, and managing dependencies between tasks without writing boilerplate orchestration code@[2].

**When to Use:** Building data pipelines, ML workflows, ETL processes, or any multi-step automation that needs scheduling, retry logic, state tracking, and observability@[3].

**What You Would Reinvent:** Manual retry logic, state management, dependency coordination, scheduling systems, execution monitoring, error handling, result caching, and workflow visibility dashboards@[4].

## Official Information

**Repository:** <https://github.com/PrefectHQ/prefect>
**PyPI Package:** `prefect` (current: v3.4.24)@[5]
**Documentation:** <https://docs.prefect.io>
**License:** Apache-2.0@[6]
**Maintenance:** Actively maintained by PrefectHQ with 1059 open issues, 20.6K stars, regular releases@[7]
**Community:** 30K+ engineers, active Slack community@[8]

## Python Compatibility

**Minimum Version:** Python 3.9@[9]
**Maximum Version:** Python 3.13 (3.14 not yet supported)@[9]
**Async Support:** Full native async/await support throughout@[10]
**Type Hints:** First-class support, type-safe structured outputs@[11]

## Core Capabilities

### 1. Pythonic Flow Definition

Write workflows as regular Python functions with `@flow` and `@task` decorators:

```python
from prefect import flow, task
import httpx


@task(log_prints=True)
def get_stars(repo: str):
    url = f"https://api.github.com/repos/{repo}"
    count = httpx.get(url).json()["stargazers_count"]
    print(f"{repo} has {count} stars!")


@flow(name="GitHub Stars")
def github_stars(repos: list[str]):
    for repo in repos:
        get_stars(repo)


# Run directly
if __name__ == "__main__":
    github_stars(["PrefectHQ/Prefect"])
```

@[12]

### 2. Dynamic Runtime Workflows

Create tasks dynamically based on data, not static DAG definitions:

```python
from prefect import task, flow


@task
def process_customer(customer_id: str) -> str:
    return f"Processed {customer_id}"


@flow
def main() -> list[str]:
    customer_ids = get_customer_ids()  # Runtime data (application-defined helper)
    # Map the task across dynamic data
    results = process_customer.map(customer_ids)
    return results
```

@[13]

### 3. Flexible Scheduling

Deploy workflows with cron, interval, or RRule schedules:

```python
# Serve with a cron schedule
if __name__ == "__main__":
    github_stars.serve(
        name="daily-stars",
        cron="0 8 * * *",  # Daily at 8 AM
        parameters={"repos": ["PrefectHQ/prefect"]}
    )
```

@[14]

```python
# Or use interval-based scheduling
from datetime import timedelta

my_flow.deploy(  # my_flow: any @flow-decorated function
    name="my-deployment",
    work_pool_name="my-work-pool",
    interval=timedelta(minutes=10)
)
```

@[15]

### 4. Built-in Retries and State Management

Automatic retry logic and state tracking:

```python
@task(retries=3, retry_delay_seconds=60)
def fetch_data():
    # Automatically retried on failure (api_call: application-defined helper)
    return api_call()
```

@[16]

### 5. Concurrent Task Execution

Run tasks in parallel with `.submit()`:

```python
from prefect import flow, task


@task
def cool_task() -> str:
    return "hello from a task"


@flow
def my_workflow():
    future = cool_task.submit()  # Non-blocking; returns a PrefectFuture
    print(future.result())       # Block only when the value is needed
```

@[17]

### 6. Event-Driven Automations

React to events, not just schedules:

```python
# Trigger flows on external events
from prefect.events import DeploymentEventTrigger

my_flow.deploy(
    name="event-driven",            # deployment name (illustrative)
    work_pool_name="my-work-pool",  # pre-created work pool (illustrative)
    triggers=[
        DeploymentEventTrigger(
            expect=["s3.file.uploaded"]
        )
    ]
)
```

@[18]

## Real-World Integration Patterns

### Integration with dbt

Orchestrate dbt transformations within Prefect flows:

```python
from prefect import flow
from prefect_dbt import DbtCoreOperation


@flow
def dbt_flow():
    result = DbtCoreOperation(
        commands=["dbt run", "dbt test"],
        project_dir="/path/to/dbt/project"
    ).run()
    return result
```

@[19]

**Example Repository:** <https://github.com/anna-geller/prefect-dataplatform> (106 stars) - Shows a Prefect + dbt + Snowflake data platform@[20]

### AWS Deployment Pattern

Deploy to AWS ECS Fargate:

```yaml
# prefect.yaml configuration
work_pool:
  name: aws-ecs-pool
  type: ecs

deployments:
  - name: production
    work_pool_name: aws-ecs-pool
    schedules:
      - cron: "0 */4 * * *"
```

@[21]

**Example Repository:** <https://github.com/anna-geller/dataflow-ops> (116 stars) - Automated deployments to AWS ECS@[22]

### Docker Compose Self-Hosted

Run Prefect server with Docker Compose:

```yaml
version: "3.8"
services:
  prefect-server:
    image: prefecthq/prefect:latest
    command: prefect server start
    ports:
      - "4200:4200"
    environment:
      # Assumes a companion "postgres" service (omitted here for brevity)
      - PREFECT_API_DATABASE_CONNECTION_URL=postgresql+asyncpg://postgres:password@postgres:5432/prefect
```

@[23]

**Example Repositories:**

- <https://github.com/rpeden/prefect-docker-compose> (142 stars)@[24]
- <https://github.com/flavienbwk/prefect-docker-compose> (161 stars)@[25]

## Common Usage Patterns

### Pattern 1: ETL Pipeline with Retries

```python
from prefect import flow, task
from prefect.tasks import exponential_backoff


@task(retries=3, retry_delay_seconds=exponential_backoff(backoff_factor=2))
def extract_data(source: str):
    # Fetch from an API with automatic retries (fetch_api_data: app-defined)
    return fetch_api_data(source)


@task
def transform_data(raw_data):
    return clean_and_transform(raw_data)  # app-defined


@task
def load_data(data, destination: str):
    write_to_database(data, destination)  # app-defined


@flow(log_prints=True)
def etl_pipeline():
    raw = extract_data("https://api.example.com/data")
    transformed = transform_data(raw)
    load_data(transformed, "postgresql://db")
```

@[26]

### Pattern 2: Scheduled Data Sync

```python
@flow
def sync_customer_data():
    customers = fetch_customers()  # app-defined
    for customer in customers:
        sync_to_warehouse(customer)  # app-defined


# Schedule to run every hour
if __name__ == "__main__":
    sync_customer_data.serve(
        name="hourly-sync",
        interval=3600,  # Every hour (seconds)
        tags=["production", "sync"]
    )
```

@[27]

### Pattern 3: ML Pipeline with Caching

```python
from datetime import timedelta

from prefect import flow, task
from prefect.tasks import task_input_hash


@task(cache_key_fn=task_input_hash, cache_expiration=timedelta(hours=1))
def load_training_data():
    # Expensive data loading - cached for 1 hour (load_large_dataset: app-defined)
    return load_large_dataset()


@task
def train_model(data):
    return train_ml_model(data)  # app-defined


@flow
def ml_pipeline():
    data = load_training_data()  # Reuses the cached result
    model = train_model(data)
    return model
```

@[28]

## Integration Ecosystem

### Data Transformation

- **dbt:** Native integration via the `prefect-dbt` package (standalone repo archived; the collection now lives in the main Prefect repo)@[29]
- **dbt Cloud:** Official integration for triggering dbt Cloud jobs@[30]

### Data Warehouses

- **Snowflake:** `prefect-snowflake` for query execution@[31]
- **BigQuery:** `prefect-gcp` for BigQuery operations@[32]
- **Redshift, PostgreSQL:** Standard database connectors@[33]

### Cloud Platforms

- **AWS:** `prefect-aws` (S3, ECS, Lambda, Batch)@[34]
- **GCP:** `prefect-gcp` (GCS, BigQuery, Cloud Run)@[35]
- **Azure:** `prefect-azure` (Blob Storage, Container Instances)@[36]

### Container Orchestration

- **Docker:** Native Docker build and push support@[37]
- **Kubernetes:** `prefect-kubernetes` for K8s deployments@[38]
- **ECS Fargate:** Built-in ECS work pools@[39]

### Data Quality

- **Great Expectations:** `prefect-great-expectations` for validation@[40]
- **Monte Carlo:** Circuit breaker integrations@[41]

### ML/AI

- **LangChain:** `langchain-prefect` for LLM workflows (archived)@[42]
- **MLflow:** Track experiments within Prefect flows@[43]

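As one concrete ecosystem example, a minimal sketch using `prefect-aws`; it assumes an `S3Bucket` block named `my-bucket` was previously created and saved (via the UI or `.save()`):

```python
# Sketch: upload a local file to S3 from a flow via a saved prefect-aws block
from prefect import flow
from prefect_aws import S3Bucket


@flow
def backup_report(local_path: str):
    bucket = S3Bucket.load("my-bucket")  # load the saved block configuration
    bucket.upload_from_path(local_path, "reports/latest.csv")
```
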
## Deployment Options

### 1. Prefect Cloud (Managed)

Fully managed orchestration platform with:

- Hosted API and UI
- Team collaboration features
- RBAC and access controls
- Enterprise SLAs
- Automations and event triggers@[44]

**Pricing:** Free tier + usage-based pricing@[45]

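Connecting a local environment to Prefect Cloud is a one-time CLI step (the API key below is a placeholder created in the Cloud UI):

```bash
# Authenticate this machine against Prefect Cloud
prefect cloud login -k pnu_XXXXXXXXXXXX
```
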
### 2. Self-Hosted Prefect Server

Open-source server you deploy:

```bash
# Start local server
prefect server start

# Or deploy via Docker
docker run -p 4200:4200 prefecthq/prefect:latest prefect server start
```

@[46]

**Requirements:** PostgreSQL database, Redis (optional for caching)@[47]

### 3. Hybrid Execution Model

Orchestration in the cloud, execution anywhere:

- Control plane in Prefect Cloud
- Workers run in your infrastructure
- Code never leaves your environment@[48]

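In practice the hybrid model means starting a worker inside your own infrastructure and pointing it at a work pool; `my-work-pool` below is an assumed, pre-created pool name:

```bash
# Run inside your infrastructure; polls the control plane for scheduled runs
prefect worker start --pool my-work-pool
```
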
## When to Use Prefect

### Use Prefect When

1. **Building data pipelines** that need scheduling, retries, and monitoring@[49]
2. **Orchestrating ML workflows** with dynamic dependencies@[50]
3. **Coordinating microservices** or distributed tasks@[51]
4. **Migrating from cron jobs** to a modern orchestrator@[52]
5. **Need Python-native workflows** without DSL overhead@[53]
6. **Want local development** with production parity@[54]
7. **Require event-driven automation** beyond scheduling@[55]
8. **Need visibility** into workflow execution and failures@[56]

### Use Simple Scripts/Cron When

1. **Single-step tasks** with no dependencies@[57]
2. **One-off scripts** that rarely run@[58]
3. **No retry logic** needed@[59]
4. **No failure visibility** required@[60]
5. **Under 5 lines of code** total@[61]

## Prefect vs. Alternatives

### Prefect vs. Airflow

| Dimension | Prefect | Airflow |
| --- | --- | --- |
| **Development Model** | Pure Python functions with decorators | DAG definitions with operators |
| **Dynamic Workflows** | Runtime task creation based on data | Static DAG structure at parse time |
| **Local Development** | Run locally without infrastructure | Requires full Airflow setup |
| **Learning Curve** | Minimal - just Python | Steep - framework concepts required |
| **Infrastructure** | Runs anywhere Python runs | Multi-component (scheduler, webserver, DB) |
| **Cost** | 60-70% lower (per customer reports)@[62] | Higher due to always-on infrastructure@[63] |
| **Best For** | ML/AI, modern data teams, dynamic pipelines | Traditional ETL, platform teams invested in the ecosystem |

**Migration Path:** One customer reports a 73.78% cost reduction after moving from Astronomer (managed Airflow) to Prefect@[64]

### Prefect vs. Dagster

| Dimension | Prefect | Dagster |
| --- | --- | --- |
| **Philosophy** | Workflow orchestration | Data asset orchestration |
| **Abstractions** | Flows and tasks | Software-defined assets |
| **Use Case** | General workflow automation | Data asset lineage and cataloging |
| **Complexity** | Lower barrier to entry | Higher conceptual overhead |

### Prefect vs. Metaflow

| Dimension | Prefect | Metaflow |
| --- | --- | --- |
| **Origin** | General orchestration | Netflix ML workflows |
| **Scope** | Broad workflow automation | ML-specific pipelines |
| **Deployment** | Any infrastructure | AWS, K8s focus |
| **Community** | Larger ecosystem | ML-focused community |

## Decision Matrix

```text
Use Prefect when:
- You write Python workflows
- You need dynamic task generation
- You want local development + production parity
- You need retry/caching/scheduling out of the box
- You're building ML, data, or automation pipelines
- You want low operational overhead
- Cost efficiency matters (vs. Airflow)

Use Airflow when:
- You're heavily invested in the Airflow ecosystem
- Your team already knows Airflow
- You need specific Airflow operators not in Prefect
- You have dedicated platform engineering for Airflow

Use Dagster when:
- Data asset lineage is the primary concern
- You're building a data platform with an asset catalog
- You need software-defined assets

Use simple cron/scripts when:
- Single independent tasks
- No retry logic needed
- No monitoring required
- Runs once per day or less
```

@[65]

## Anti-Patterns and Gotchas

### Don't Use Prefect For

1. **Simple one-off scripts** - adds unnecessary overhead@[66]
2. **Real-time streaming** - designed for batch/scheduled workflows@[67]
3. **Sub-second latency requirements** - orchestration adds overhead@[68]
4. **Pure event processing** - use Kafka/RabbitMQ instead@[69]

### Common Pitfalls

1. **Over-decomposition:** Breaking every line into a task creates overhead@[70]
2. **Ignoring task inputs:** Tasks should be pure functions for caching@[71]
3. **Not using `.submit()`:** Blocking task calls prevent parallelism (see the sketch after this list)@[72]
4. **Skipping local testing:** Run flows locally before deploying@[73]

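A minimal sketch of pitfall 3, contrasting blocking calls with concurrent submission (the task body is illustrative):

```python
from prefect import flow, task


@task
def score(item: int) -> int:
    return item * 2


@flow
def blocking_flow(items: list[int]):
    # Anti-pattern: each call completes before the next one starts
    return [score(item) for item in items]


@flow
def concurrent_flow(items: list[int]):
    # Schedule all tasks at once, then gather results
    futures = [score.submit(item) for item in items]
    return [future.result() for future in futures]
```
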
## Learning Resources

**Official Quickstart:** <https://docs.prefect.io/v3/get-started/quickstart> @[74]
**Examples Repository:** <https://github.com/PrefectHQ/examples> @[75]
**Community Recipes:** <https://github.com/PrefectHQ/prefect-recipes> (254 stars, archived)@[76]
**Slack Community:** <https://prefect.io/slack> @[77]
**YouTube Channel:** <https://www.youtube.com/c/PrefectIO/> @[78]

## Installation

```bash
# Using pip
pip install -U prefect

# Using uv (recommended)
uv add prefect

# With specific integrations
pip install prefect-aws prefect-gcp prefect-dbt
```

@[79]

## Verification Checklist

- [x] Official repository confirmed: <https://github.com/PrefectHQ/prefect>
- [x] PyPI package verified: prefect v3.4.24
- [x] Python compatibility: 3.9-3.13
- [x] License confirmed: Apache-2.0
- [x] Real-world examples: 5+ GitHub repositories with 100+ stars
- [x] Integration patterns documented: dbt, Snowflake, AWS, Docker
- [x] Decision matrix provided: vs Airflow, Dagster, Metaflow, cron
- [x] Anti-patterns identified: streaming, sub-second latency
- [x] Code examples: 6+ verified from official docs and Context7
- [x] Maintenance status: Active (1059 open issues, recent commits)

## References

Sources cited with @ notation throughout the document:

[1-79] Information gathered from:

- Context7 Library ID: /prefecthq/prefect (Trust Score: 8.2, 6247 code snippets)
- Official documentation: <https://docs.prefect.io>
- GitHub repository: <https://github.com/PrefectHQ/prefect>
- PyPI package page: <https://pypi.org/project/prefect/>
- Prefect vs Airflow comparison: <https://www.prefect.io/compare/airflow>
- Example repositories: anna-geller/prefect-dataplatform, rpeden/prefect-docker-compose, flavienbwk/prefect-docker-compose, anna-geller/dataflow-ops
- Exa code context search results
- Ref documentation search results

Last verified: 2025-10-21

@@ -0,0 +1,942 @@
---
title: "python-diskcache - SQLite-Backed Persistent Cache for Python"
library_name: python-diskcache
pypi_package: diskcache
category: caching
python_compatibility: "3.0+"
last_updated: "2025-11-02"
official_docs: "https://grantjenks.com/docs/diskcache"
official_repository: "https://github.com/grantjenks/python-diskcache"
maintenance_status: "active"
---

# python-diskcache - SQLite-Backed Persistent Cache for Python

## Overview

**python-diskcache** is an Apache2-licensed disk and file-backed cache library written in pure Python. It provides persistent, thread-safe, and process-safe caching using SQLite as the backend, making it suitable for applications that need caching without running a separate cache server like Redis or Memcached.

**Official Repository:** <https://github.com/grantjenks/python-diskcache> @ grantjenks/python-diskcache
**Documentation:** <https://grantjenks.com/docs/diskcache/> @ grantjenks.com
**PyPI Package:** `diskcache` @ pypi.org/project/diskcache
**License:** Apache License 2.0 @ github.com/grantjenks/python-diskcache
**Current Version:** 5.6.3 (August 31, 2023) @ pypi.org
**Maintenance Status:** Actively maintained, 2,647+ GitHub stars @ github.com/grantjenks/python-diskcache

## Core Purpose

### Problem diskcache Solves

1. **Persistent Caching Without External Services:** Provides disk-backed caching without requiring Redis/Memcached servers @ grantjenks.com/docs/diskcache
2. **Thread and Process Safety:** SQLite-backed cache with atomic operations safe for multi-threaded and multi-process applications @ grantjenks.com/docs/diskcache/tutorial.html
3. **Leveraging Unused Disk Space:** Utilizes empty disk space instead of competing for scarce memory in cloud environments @ github.com/grantjenks/python-diskcache/README.rst
4. **Django's Broken File Cache:** Replaces Django's file-based cache, whose performance degrades linearly with entry count @ github.com/grantjenks/python-diskcache/README.rst

### What Would Be "Reinventing the Wheel"

Without diskcache, you would need to:

- Implement SQLite-based caching with proper locking and atomicity manually @ grantjenks.com/docs/diskcache
- Build eviction policies (LRU, LFU) from scratch @ grantjenks.com/docs/diskcache/tutorial.html
- Manage thread-safe and process-safe file system operations @ grantjenks.com/docs/diskcache
- Handle serialization, compression, and expiration logic manually @ grantjenks.com/docs/diskcache/tutorial.html
- Implement cache stampede prevention for memoization @ grantjenks.com/docs/diskcache/case-study-landing-page-caching.html

## When to Use diskcache

### Use diskcache When

1. **Single-Machine Persistent Cache:** You need persistent caching on one server without distributed requirements @ grantjenks.com/docs/diskcache
2. **No External Cache Server:** You want to avoid running and managing Redis/Memcached @ github.com/grantjenks/python-diskcache/README.rst
3. **Process-Safe Caching:** Multiple processes need to share cache data safely (web workers, background tasks) @ grantjenks.com/docs/diskcache/tutorial.html
4. **Large Cache Size:** You need gigabytes of cache that would be expensive in memory @ github.com/grantjenks/python-diskcache/README.rst
5. **Django File Cache Replacement:** Django's file cache is too slow for your needs @ grantjenks.com/docs/diskcache/djangocache-benchmarks.html
6. **Memoization with Persistence:** Function results should persist across process restarts @ grantjenks.com/docs/diskcache/tutorial.html
7. **Tag-Based Eviction:** You need to invalidate related cache entries by tag @ grantjenks.com/docs/diskcache/tutorial.html
8. **Offline/Local Development:** No network cache available in the development environment @ grantjenks.com/docs/diskcache

### Use Redis When

1. **Distributed Caching:** Multiple servers need to share the same cache @ grantjenks.com/docs/diskcache
2. **Network Latency Acceptable:** A sub-millisecond network round-trip to a shared cache fits your latency budget @ grantjenks.com/docs/diskcache/cache-benchmarks.html
3. **Advanced Data Structures:** Need Redis-specific types (sets, sorted sets, pub/sub) @ redis.io
4. **Cache Replication:** Require high availability and replication across nodes @ redis.io
5. **Horizontal Scaling:** Cache must scale across multiple machines @ redis.io

### Use functools.lru_cache When

The sketch after this list contrasts the two memoization styles.

1. **In-Memory Only:** Cache doesn't need to persist across process restarts @ python.org/docs
2. **Single Process:** No multi-process cache sharing needed @ python.org/docs
3. **Small Cache Size:** Cache fits comfortably in memory (megabytes, not gigabytes) @ python.org/docs
4. **Simple Memoization:** No expiration, tags, or complex eviction needed @ python.org/docs

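A minimal sketch of the trade-off: both decorators memoize, but only the diskcache version survives a restart and is shared across processes (`/tmp/demo-cache` is an arbitrary path):

```python
from functools import lru_cache

from diskcache import Cache

cache = Cache('/tmp/demo-cache')


@lru_cache(maxsize=1024)
def fib_memory(n: int) -> int:
    # In-memory only: recomputed from scratch in every new process
    return n if n < 2 else fib_memory(n - 1) + fib_memory(n - 2)


@cache.memoize()
def fib_disk(n: int) -> int:
    # Persistent: results survive restarts and are visible to other processes
    return n if n < 2 else fib_disk(n - 1) + fib_disk(n - 2)
```
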
## Decision Matrix

```text
┌──────────────────────────────┬───────────┬─────────┬────────────────┬──────────┐
│ Requirement                  │ diskcache │ Redis   │ lru_cache      │ shelve   │
├──────────────────────────────┼───────────┼─────────┼────────────────┼──────────┤
│ Persistent storage           │ ✓         │ ✓*      │ ✗              │ ✓        │
│ Thread-safe                  │ ✓         │ ✓       │ ✓              │ ✗        │
│ Process-safe                 │ ✓         │ ✓       │ ✗              │ ✗        │
│ No external server           │ ✓         │ ✗       │ ✓              │ ✓        │
│ Eviction policies            │ LRU/LFU   │ LRU/LFU │ LRU only       │ None     │
│ Tag-based invalidation       │ ✓         │ Manual  │ ✗              │ ✗        │
│ Expiration support           │ ✓         │ ✓       │ ✗              │ ✗        │
│ Distributed caching          │ ✗         │ ✓       │ ✗              │ ✗        │
│ Django integration           │ ✓         │ ✓       │ ✗              │ ✗        │
│ Transactions                 │ ✓         │ ✓       │ ✗              │ ✗        │
│ Atomic operations            │ Always    │ ✓       │ ✓              │ Maybe    │
│ Memoization decorators       │ ✓         │ Manual  │ ✓              │ ✗        │
│ Typical latency (get)        │ 25 µs     │ 190 µs  │ 0.1 µs         │ 36 µs    │
│ Pure Python                  │ ✓         │ ✗       │ ✓              │ ✓        │
└──────────────────────────────┴───────────┴─────────┴────────────────┴──────────┘
```

@ Compiled from grantjenks.com/docs/diskcache, github.com/grantjenks/python-diskcache

**Note:** (*) Redis persistence is optional and primarily for durability, not its primary storage model.

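The "Transactions" row refers to `Cache.transact()`, which groups several operations into one atomic SQLite transaction. A minimal sketch:

```python
from diskcache import Cache

cache = Cache('/tmp/mycache')

# All operations commit together (and batching reduces disk syncs)
with cache.transact():
    total = cache.get('total', default=0)
    cache.set('total', total + 1)
    cache.set('updated', '2025-11-02')
```
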
## Python Version Compatibility

**Minimum Python Version:** 3.0 @ github.com/grantjenks/python-diskcache/setup.py
**Officially Tested Versions:** 3.6, 3.7, 3.8, 3.9, 3.10 @ github.com/grantjenks/python-diskcache/README.rst
**Development Version:** 3.10 @ github.com/grantjenks/python-diskcache/README.rst

**Python 3.11-3.14 Status:**

- **3.11:** Expected to work (no known incompatibilities)
- **3.12:** Expected to work (no known incompatibilities)
- **3.13:** Expected to work (no known incompatibilities)
- **3.14:** Expected to work (pure Python with no C dependencies)

**Dependencies:** None - pure Python with standard library only @ github.com/grantjenks/python-diskcache/setup.py

## Real-World Usage Examples

### Example Projects Using diskcache

1. **morss** (722+ stars) @ github.com/pictuga/morss
   - Full-text RSS feed generator
   - Pattern: Caching HTTP responses and parsed feed data
   - URL: <https://github.com/pictuga/morss>

2. **git-pandas** (192+ stars) @ github.com/wdm0006/git-pandas
   - Git repository analysis with pandas dataframes
   - Pattern: Caching expensive git repository queries
   - URL: <https://github.com/wdm0006/git-pandas>

3. **High-Traffic Website Caching** @ grantjenks.com/docs/diskcache
   - Testimonial: "Reduced Elasticsearch queries by over 25% for 1M+ users/day (100+ hits/second)" - Daren Hasenkamp
   - Pattern: Database query result caching in production web applications

4. **Ansible Automation** @ grantjenks.com/docs/diskcache
   - Testimonial: "Sped up Ansible runs by almost 3 times" - Mathias Petermann
   - Pattern: Caching lookup module results across playbook runs

### Common Usage Patterns @ grantjenks.com/docs/diskcache, exa.ai

```python
# Pattern 1: Basic Cache Operations
from diskcache import Cache

cache = Cache('/tmp/mycache')

# Dictionary-like interface
cache['key'] = 'value'
print(cache['key'])    # 'value'
print('key' in cache)  # True
del cache['key']

# Method-based interface with expiration
cache.set('key', 'value', expire=300)  # 5 minutes
value = cache.get('key')
cache.delete('key')

# Cleanup
cache.close()

# Pattern 2: Function Memoization with Cache Decorator
from diskcache import Cache

cache = Cache('/tmp/mycache')

@cache.memoize()
def expensive_function(x, y):
    # Expensive computation
    import time
    time.sleep(2)
    return x + y

# First call takes 2 seconds
result = expensive_function(1, 2)  # Slow

# Second call is instant (cached)
result = expensive_function(1, 2)  # Fast!

# Pattern 3: Cache Stampede Prevention
from diskcache import Cache, memoize_stampede
import time

cache = Cache('/tmp/mycache')

@memoize_stampede(cache, expire=60, beta=0.3)
def generate_landing_page():
    """Prevents a thundering herd when the cache expires"""
    time.sleep(0.2)  # Simulate expensive computation
    return "<html>Landing Page</html>"

# Multiple concurrent requests won't cause a stampede
result = generate_landing_page()

# Pattern 4: FanoutCache for High Concurrency
from diskcache import FanoutCache

# Sharded cache for concurrent writes
cache = FanoutCache('/tmp/mycache', shards=8, timeout=1.0)

# Same API as Cache but with better write concurrency
cache.set('key', 'value')
value = cache.get('key')

# Pattern 5: Tag-Based Eviction
from diskcache import Cache

cache = Cache('/tmp/mycache', tag_index=True)  # Enable tag index

# Set items with tags (data1/data2/data3: arbitrary picklable values)
cache.set('user:1:profile', data1, tag='user:1')
cache.set('user:1:posts', data2, tag='user:1')
cache.set('user:1:friends', data3, tag='user:1')

# Evict all items for a specific tag
cache.evict('user:1')

# Pattern 6: Web Crawler with Persistent Storage
from diskcache import Index

# Persistent dictionary for crawled URLs
results = Index('data/results')

# Store crawled data
results['https://example.com'] = {
    'html': '<html>...</html>',
    'timestamp': '2025-10-21',
    'status': 200
}

# Query persistent results
print(len(results))
if 'https://example.com' in results:
    data = results['https://example.com']

# Pattern 7: Django Cache Configuration
# settings.py
CACHES = {
    'default': {
        'BACKEND': 'diskcache.DjangoCache',
        'LOCATION': '/var/cache/django',
        'TIMEOUT': 300,
        'SHARDS': 8,
        'DATABASE_TIMEOUT': 0.010,  # 10 milliseconds
        'OPTIONS': {
            'size_limit': 2 ** 30  # 1 GB
        },
    },
}

# Pattern 8: Async Operation with asyncio
import asyncio
from diskcache import Cache

cache = Cache('/tmp/mycache')

async def set_async(key, value):
    loop = asyncio.get_running_loop()
    await loop.run_in_executor(None, cache.set, key, value)

async def get_async(key):
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, cache.get, key)

async def main():
    # await is only valid inside an async function
    await set_async('test-key', 'test-value')
    value = await get_async('test-key')

asyncio.run(main())

# Pattern 9: Custom Serialization with JSONDisk
import json
import zlib
from diskcache import Cache, Disk, UNKNOWN

class JSONDisk(Disk):
    def __init__(self, directory, compress_level=1, **kwargs):
        self.compress_level = compress_level
        super().__init__(directory, **kwargs)

    def put(self, key):
        json_bytes = json.dumps(key).encode('utf-8')
        data = zlib.compress(json_bytes, self.compress_level)
        return super().put(data)

    def get(self, key, raw):
        data = super().get(key, raw)
        return json.loads(zlib.decompress(data).decode('utf-8'))

    def store(self, value, read, key=UNKNOWN):
        if not read:
            json_bytes = json.dumps(value).encode('utf-8')
            value = zlib.compress(json_bytes, self.compress_level)
        return super().store(value, read, key=key)

    def fetch(self, mode, filename, value, read):
        data = super().fetch(mode, filename, value, read)
        if not read:
            data = json.loads(zlib.decompress(data).decode('utf-8'))
        return data

# Use the custom disk implementation
cache = Cache('/tmp/mycache', disk=JSONDisk, disk_compress_level=6)

# Pattern 10: Cross-Process Locking
from diskcache import Lock
import time

lock = Lock(cache, 'resource-name')

with lock:
    # Critical section - only one process executes at a time
    print("Exclusive access to resource")
    time.sleep(1)

# Pattern 11: Rate Limiting / Throttling
from diskcache import throttle

@throttle(cache, count=10, seconds=60)
def api_call():
    """Allow only 10 calls per minute"""
    return make_expensive_api_request()  # app-defined

# Calls beyond the configured rate are delayed (the decorator sleeps
# to maintain the rate) rather than rejected with an exception
api_call()
```

@ Compiled from grantjenks.com/docs/diskcache, exa.ai/get_code_context

## Integration Patterns

### Django Integration @ grantjenks.com/docs/diskcache/tutorial.html

```python
# settings.py
CACHES = {
    'default': {
        'BACKEND': 'diskcache.DjangoCache',
        'LOCATION': '/path/to/cache/directory',
        'TIMEOUT': 300,
        'SHARDS': 8,
        'DATABASE_TIMEOUT': 0.010,
        'OPTIONS': {
            'size_limit': 2 ** 30  # 1 gigabyte
        },
    },
}

# Usage in views
from django.core.cache import cache

def my_view(request):
    result = cache.get('my_key')
    if result is None:
        result = expensive_computation()  # app-defined
        cache.set('my_key', result, timeout=300)
    return result
```

### FastAPI with Async Caching @ exa.ai, calmcode.io

```python
from fastapi import FastAPI
import httpx
from diskcache import Cache

app = FastAPI()
cache = Cache('/tmp/api_cache')

async def cached_api_call(url: str):
    # Check the cache (diskcache calls are synchronous but fast)
    if url in cache:
        print(f'Using cached content for {url}')
        return cache[url]

    print(f'Making new request for {url}')
    # Make an async request
    async with httpx.AsyncClient(timeout=10) as client:
        response = await client.get(url)
        html = response.text
        cache[url] = html
        return html

@app.get("/fetch")
async def fetch_data(url: str):
    content = await cached_api_call(url)
    return {"content": content[:1000]}
```

### Multi-Process Web Crawler @ grantjenks.com/docs/diskcache/case-study-web-crawler.html

```python
from diskcache import Index, Deque
from multiprocessing import Process
import requests

# Shared queue and results across processes
todo = Deque('data/todo')
results = Index('data/results')

def crawl():
    while True:
        try:
            url = todo.popleft()
        except IndexError:
            break

        response = requests.get(url)
        results[url] = response.text

        # Add discovered URLs to the queue (extract_links: app-defined)
        for link in extract_links(response.text):
            todo.append(link)

# Start multiple crawler processes
processes = [Process(target=crawl) for _ in range(4)]
for process in processes:
    process.start()
for process in processes:
    process.join()

print(f"Crawled {len(results)} pages")
```

## Installation

### Basic Installation @ grantjenks.com/docs/diskcache

```bash
pip install diskcache
```

### Using uv (Recommended) @ astral.sh

```bash
uv add diskcache
```

### Development Installation @ grantjenks.com/docs/diskcache/development.rst

```bash
git clone https://github.com/grantjenks/python-diskcache.git
cd python-diskcache
pip install -r requirements.txt
```

## Core API Components

### Cache Class @ grantjenks.com/docs/diskcache/tutorial.html

The basic cache implementation backed by SQLite.

```python
from diskcache import Cache

# Initialize cache
cache = Cache(directory='/tmp/mycache')

# Dictionary-like operations
cache['key'] = 'value'
value = cache['key']
'key' in cache  # True
del cache['key']

# Method-based operations
cache.set('key', 'value', expire=60, tag='category')
value = cache.get('key', default=None, read=False,
                  expire_time=False, tag=False)
cache.delete('key')
cache.clear()

# Statistics and management
cache.volume()  # Estimated disk usage
cache.stats(enable=True, reset=False)  # (hits, misses)
cache.evict('tag')  # Remove all entries with tag
cache.expire()  # Remove expired entries
cache.close()
```

### FanoutCache Class @ grantjenks.com/docs/diskcache/tutorial.html

Sharded cache for high-concurrency write scenarios.

```python
from diskcache import Disk, FanoutCache

# Sharded cache (default 8 shards)
cache = FanoutCache(
    directory='/tmp/mycache',
    shards=8,
    timeout=1.0,
    disk=Disk,
    disk_min_file_size=2 ** 15
)

# Same API as Cache
cache.set('key', 'value')
value = cache.get('key')
```

### Eviction Policies @ grantjenks.com/docs/diskcache/tutorial.html

Four eviction policies control what happens when the cache size limit is reached:

```python
from diskcache import Cache

# least-recently-stored (default) - fastest
cache = Cache(eviction_policy='least-recently-stored')

# least-recently-used - updates on read
cache = Cache(eviction_policy='least-recently-used')

# least-frequently-used - tracks access count
cache = Cache(eviction_policy='least-frequently-used')

# none - no eviction, unbounded growth
cache = Cache(eviction_policy='none')
```

**Performance Characteristics:**

- **least-recently-stored:** Fastest (no read updates)
- **least-recently-used:** Slower (updates timestamp on read)
- **least-frequently-used:** Slowest (increments counter on read)
- **none:** Fastest (no eviction overhead)

A sketch combining an eviction policy with a size limit follows below.

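A minimal sketch pairing a size limit with an eviction policy (`size_limit` is in bytes; `cull()` evicts down to the limit on demand and returns the number of items removed):

```python
from diskcache import Cache

# Bound the cache at ~100 MB; the policy decides what to evict first
cache = Cache(
    '/tmp/bounded-cache',
    size_limit=100 * 1024 ** 2,
    eviction_policy='least-recently-used',
)

cache.set('key', 'value')
removed = cache.cull()  # evict now rather than waiting for writes to trigger it
```
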
### Deque and Index Classes @ grantjenks.com/docs/diskcache/tutorial.html
|
||||
|
||||
Persistent, process-safe data structures.
|
||||
|
||||
```python
|
||||
from diskcache import Deque, Index
|
||||
|
||||
# Persistent deque (FIFO queue)
|
||||
deque = Deque('data/queue')
|
||||
deque.append('item')
|
||||
deque.appendleft('item')
|
||||
item = deque.pop()
|
||||
item = deque.popleft()
|
||||
|
||||
# Persistent dictionary
|
||||
index = Index('data/index')
|
||||
index['key'] = 'value'
|
||||
value = index['key']
|
||||
```

## Performance Benchmarks

### Single Process Performance @ grantjenks.com/docs/diskcache/cache-benchmarks.html

```text
diskcache.Cache:
    get: 19.073 µs (median)
    set: 114.918 µs (median)
    delete: 87.976 µs (median)

pylibmc.Client (Memcached):
    get: 42.915 µs (median)
    set: 44.107 µs (median)
    delete: 41.962 µs (median)

Comparison vs alternatives:
    dbm: get 36µs, set 900µs, delete 740µs
    shelve: get 41µs, set 928µs, delete 702µs
    sqlitedict: get 513µs, set 697µs, delete 1717µs
    pickleDB: get 92µs, set 1020µs, delete 1020µs
```

### Multi-Process Performance (8 processes) @ grantjenks.com/docs/diskcache/cache-benchmarks.html

```text
diskcache.Cache:
    get: 20.027 µs (median)
    set: 129.700 µs (median)
    delete: 97.036 µs (median)

redis.StrictRedis:
    get: 187.874 µs (median)
    set: 192.881 µs (median)
    delete: 185.966 µs (median)

pylibmc.Client:
    get: 95.844 µs (median)
    set: 97.036 µs (median)
    delete: 94.891 µs (median)
```

**Key Insight:** diskcache is faster than network-based caches (Redis, Memcached) for single-machine workloads, especially for reads. @ grantjenks.com/docs/diskcache

### Django Cache Backend Performance @ grantjenks.com/docs/diskcache/djangocache-benchmarks.html

```text
diskcache DjangoCache:
    get: 55.075 µs (median)
    set: 303.984 µs (median)
    delete: 228.882 µs (median)
    Total: 98.465s

redis DjangoCache:
    get: 214.100 µs (median)
    set: 230.789 µs (median)
    delete: 195.742 µs (median)
    Total: 174.069s

filebased DjangoCache:
    get: 114.918 µs (median)
    set: 11.289 ms (median)
    delete: 432.014 µs (median)
    Total: 907.537s
```

**Key Insight:** By total benchmark time, diskcache is 1.8x faster than Redis (98.5s vs 174.1s) and 9.2x faster than Django's file-based cache (98.5s vs 907.5s). @ grantjenks.com/docs/diskcache/djangocache-benchmarks.html

## When NOT to Use diskcache

### Scenarios Where diskcache May Not Be Suitable

1. **Distributed Systems** @ grantjenks.com/docs/diskcache
   - diskcache is single-machine only
   - Use Redis, Memcached, or distributed caches for multi-server architectures
   - Cannot share the cache across network nodes

2. **Extremely Low Latency Required** @ grantjenks.com/docs/diskcache/cache-benchmarks.html
   - In-memory caches (lru_cache, dict) are faster for frequently accessed data
   - diskcache adds disk I/O overhead (~20µs vs ~0.1µs)
   - Consider an in-memory + diskcache two-tier strategy (see Advanced Patterns below)

3. **Small Cache (< 100MB)** @ github.com/grantjenks/python-diskcache
   - functools.lru_cache is more appropriate for small in-memory caches
   - The overhead of SQLite is not justified for tiny caches
   - Use lru_cache for simplicity

4. **Read-Only Access Patterns** @ grantjenks.com/docs/diskcache
   - If the cache is never updated after initialization
   - A simple dict or frozen data structure may be simpler
   - No eviction or expiration needed

5. **Cache Needs to Survive Disk Failures** @ grantjenks.com/docs/diskcache
   - diskcache stores on local disk
   - Disk failure = cache loss
   - Use Redis with persistence and replication for critical caches

6. **Need Atomic Multi-Key Operations** @ grantjenks.com/docs/diskcache
   - diskcache operations are single-key atomic
   - No native support for transactions across multiple keys
   - Redis supports MULTI/EXEC for atomic multi-key operations

7. **Advanced Data Structures Required** @ redis.io
   - diskcache is key-value only
   - Redis provides sets, sorted sets, lists, streams, etc.
   - Use Redis if you need these structures

## Key Features

### Thread and Process Safety @ grantjenks.com/docs/diskcache/tutorial.html

All operations are atomic and safe for concurrent access:

```python
from diskcache import Cache
from multiprocessing import Process

cache = Cache('/tmp/shared')


def worker(worker_id):
    for i in range(1000):
        cache[f'worker_{worker_id}_key_{i}'] = f'value_{i}'


# Safe concurrent writes from multiple processes
processes = [Process(target=worker, args=(i,)) for i in range(4)]
for p in processes:
    p.start()
for p in processes:
    p.join()
```

### Expiration and TTL @ grantjenks.com/docs/diskcache/tutorial.html

```python
from diskcache import Cache
import time

cache = Cache()

# Set with expiration
cache.set('key', 'value', expire=5)  # 5 seconds

time.sleep(6)
print(cache.get('key'))  # None (expired)

# Manual expiration cleanup
cache.expire()  # Remove all expired entries
```

### Tag-Based Invalidation @ grantjenks.com/docs/diskcache/tutorial.html

```python
from diskcache import Cache

cache = Cache(tag_index=True)  # Enable tag index for performance

# Tag cache entries (data1, data2, data3 are placeholder values)
cache.set('user:1:profile', data1, tag='user:1')
cache.set('user:1:settings', data2, tag='user:1')
cache.set('user:2:profile', data3, tag='user:2')

# Evict all entries for a tag
count = cache.evict('user:1')
print(f"Evicted {count} entries")
```

### Statistics and Monitoring @ grantjenks.com/docs/diskcache/tutorial.html

```python
from diskcache import Cache

cache = Cache()

# Enable statistics tracking
cache.stats(enable=True)

# Perform operations
for i in range(100):
    cache.set(i, i)

for i in range(150):
    cache.get(i)

# Get statistics
hits, misses = cache.stats(enable=False, reset=True)
print(f"Hits: {hits}, Misses: {misses}")  # Hits: 100, Misses: 50

# Get cache size
volume = cache.volume()
print(f"Cache volume: {volume} bytes")
```

### Custom Serialization @ grantjenks.com/docs/diskcache/tutorial.html

```python
from diskcache import Cache, Disk
import pickle
import zlib


class CompressedDisk(Disk):
    """Compress keys transparently.

    Override store()/fetch() the same way to compress values.
    """

    def put(self, key):
        data = pickle.dumps(key)
        compressed = zlib.compress(data)
        return super().put(compressed)

    def get(self, key, raw):
        compressed = super().get(key, raw)
        data = zlib.decompress(compressed)
        return pickle.loads(data)


cache = Cache(disk=CompressedDisk)
```

## Migration and Compatibility

### From functools.lru_cache @ python.org/docs, grantjenks.com/docs/diskcache

```python
# Before: In-memory only
from functools import lru_cache

@lru_cache(maxsize=128)
def expensive_function(x):
    return x * 2

# After: Persistent across restarts
from diskcache import Cache

cache = Cache('/tmp/mycache')

@cache.memoize()
def expensive_function(x):
    return x * 2
```

### From Django File Cache @ grantjenks.com/docs/diskcache/tutorial.html

```python
# Before: Django's slow file cache
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.filebased.FileBasedCache',
        'LOCATION': '/var/tmp/django_cache',
    }
}

# After: Fast diskcache
CACHES = {
    'default': {
        'BACKEND': 'diskcache.DjangoCache',
        'LOCATION': '/var/tmp/django_cache',
        'TIMEOUT': 300,
        'SHARDS': 8,
        'OPTIONS': {
            'size_limit': 2 ** 30
        }
    }
}
```

### From Redis (Single Machine) @ grantjenks.com/docs/diskcache

```python
# Before: Redis client
import redis

r = redis.Redis(host='localhost', port=6379)
r.set('key', 'value')
value = r.get('key')

# After: diskcache (no server needed)
from diskcache import Cache

cache = Cache('/tmp/mycache')
cache.set('key', 'value')
value = cache.get('key')
```

## Advanced Patterns

### Cache Warming @ grantjenks.com/docs/diskcache

```python
from diskcache import Cache


def warm_cache():
    cache = Cache('/tmp/mycache')

    # Pre-populate cache with common queries
    # (load_common_queries and expensive_database_query are
    # application-specific placeholders)
    common_queries = load_common_queries()

    for query in common_queries:
        result = expensive_database_query(query)
        cache.set(f'query:{query}', result, expire=3600)

    print(f"Warmed cache with {len(common_queries)} entries")
```

### Two-Tier Caching @ grantjenks.com/docs/diskcache

```python
from functools import lru_cache
from diskcache import Cache

disk_cache = Cache('/tmp/mycache')


@lru_cache(maxsize=100)  # Fast in-memory tier
def get_from_memory(key):
    # Fall back to disk cache
    return disk_cache.get(key)


def get_value(key):
    # Try memory first (fast)
    value = get_from_memory(key)

    if value is None:
        # Fetch from source and cache both tiers
        # (expensive_operation is an application-specific placeholder)
        value = expensive_operation(key)
        disk_cache.set(key, value, expire=3600)
        get_from_memory.cache_clear()  # Note: empties the whole memory tier
        get_from_memory(key)  # Warm memory cache

    return value
```

## Testing and Development

### Temporary Cache for Tests @ grantjenks.com/docs/diskcache

```python
import tempfile
import shutil
from diskcache import Cache


def test_cache_operations():
    # Create temporary cache directory
    tmpdir = tempfile.mkdtemp()

    try:
        cache = Cache(tmpdir)

        # Test operations
        cache.set('key', 'value')
        assert cache.get('key') == 'value'

        cache.close()
    finally:
        # Cleanup
        shutil.rmtree(tmpdir, ignore_errors=True)
```

### Context Manager for Cleanup @ grantjenks.com/docs/diskcache/tutorial.html

```python
from diskcache import Cache

# Automatic cleanup with context manager
with Cache('/tmp/mycache') as cache:
    cache.set('key', 'value')
    value = cache.get('key')
# cache.close() called automatically
```

## Additional Resources

### Official Documentation @ grantjenks.com/docs/diskcache

- Tutorial: <https://grantjenks.com/docs/diskcache/tutorial.html>
- Cache Benchmarks: <https://grantjenks.com/docs/diskcache/cache-benchmarks.html>
- Django Benchmarks: <https://grantjenks.com/docs/diskcache/djangocache-benchmarks.html>
- Case Study - Web Crawler: <https://grantjenks.com/docs/diskcache/case-study-web-crawler.html>
- Case Study - Landing Page: <https://grantjenks.com/docs/diskcache/case-study-landing-page-caching.html>
- API Reference: <https://grantjenks.com/docs/diskcache/api.html>

### Community Resources

- GitHub Repository: <https://github.com/grantjenks/python-diskcache> @ github.com
- Issue Tracker: <https://github.com/grantjenks/python-diskcache/issues> @ github.com
- PyPI Package: <https://pypi.org/project/diskcache/> @ pypi.org
- Author's Blog: <https://grantjenks.com/> @ grantjenks.com

### Related Projects by Author

- sortedcontainers: Fast pure-Python sorted collections @ github.com/grantjenks/python-sortedcontainers
- wordsegment: English word segmentation @ github.com/grantjenks/python-wordsegment
- runstats: Online statistics and regression @ github.com/grantjenks/python-runstats

## Summary

diskcache is the ideal choice for single-machine persistent caching when you need:

- Process-safe caching without running a separate server
- Gigabytes of cache using disk space instead of memory
- Better performance than Django's file cache or network caches for local workloads
- Memoization that persists across process restarts
- Tag-based invalidation for related cache entries
- Multiple eviction policies (LRU, LFU)

It provides production-grade reliability with 100% test coverage, extensive benchmarking, and stress testing. For distributed systems, or when network latency is acceptable, Redis remains the better choice. For small in-memory caches, use functools.lru_cache.

**Performance Highlight:** diskcache can be faster than Redis and Memcached for single-machine workloads because it eliminates network overhead (19µs get vs 187µs for Redis). @ grantjenks.com/docs/diskcache/cache-benchmarks.html

---

**Research completed:** 2025-10-21 @ Claude Code Agent
**Sources verified:** GitHub, Context7, PyPI, Official Documentation, Exa Code Context @ Multiple verified sources
**Confidence level:** High - All information cross-referenced from official sources and benchmarks

---
title: "python-dotenv: Environment Variable Management for Python"
library_name: python-dotenv
pypi_package: python-dotenv
category: configuration
python_compatibility: "3.9+"
last_updated: "2025-11-02"
official_docs: "https://github.com/theskumar/python-dotenv#readme"
official_repository: "https://github.com/theskumar/python-dotenv"
maintenance_status: "active"
---

# python-dotenv: Environment Variable Management for Python

## Core Purpose

python-dotenv reads key-value pairs from a `.env` file and can set them as environment variables. It helps in the development of applications following the [12-factor app](https://12factor.net/) principles.

### Problem Solved

The model must state that python-dotenv addresses development environment configuration challenges by enabling loading of environment variables from `.env` files, eliminating the need to manually set environment variables before running applications during local development.

Without python-dotenv, developers would need to:

- Manually export environment variables in shell sessions before running applications
- Hardcode configuration values in source code (a security risk)
- Write custom file-reading and environment-setting code (reinventing the wheel)
- Maintain separate shell scripts for different environments

### 12-Factor App Methodology

The model must verify that python-dotenv implements the configuration principle from the 12-factor methodology: "Store config in the environment." This separation of configuration from code enables:

- Different configurations per environment (development, staging, production)
- No credentials in version control
- Configuration changes without code deployment
- Portable code across environments

## Official Information

The model must cite the following verified sources:

- **PyPI Package**: `python-dotenv` @ <https://pypi.org/project/python-dotenv/>
- **Version**: 1.1.1 (verified 2025-10-21)
- **Repository**: <https://github.com/theskumar/python-dotenv>
- **Stars**: 8,443+ GitHub stars
- **License**: BSD-3-Clause
- **Maintenance**: Production/Stable status, actively maintained by Saurabh Kumar and Bertrand Bonnefoy-Claudet
- **Python Support**: 3.9, 3.10, 3.11, 3.12, 3.13 + PyPy

## Python Version Compatibility

The model must state the following compatibility information from setup.py:

| Python Version | Support Status | Notes |
| -------------- | -------------- | ----------------------------- |
| 3.9 | Supported | Minimum version |
| 3.10 | Supported | Full support |
| 3.11 | Supported | Full support |
| 3.12 | Supported | Full support |
| 3.13 | Supported | Latest version |
| 3.14 | Expected | No breaking changes expected |
| PyPy | Supported | PyPy implementation supported |

The model must verify that python-dotenv has no version-specific features and works identically across supported Python versions.

## Installation

```bash
# Basic installation
pip install python-dotenv

# With CLI support
pip install "python-dotenv[cli]"
```

## Usage Examples

### Basic .env File Loading

The model must demonstrate the standard pattern from official documentation:

```python
# app.py
from dotenv import load_dotenv
import os

# Load variables from .env file
load_dotenv()

# Access environment variables
database_url = os.getenv('DATABASE_URL')
api_key = os.getenv('API_KEY')
debug = os.getenv('DEBUG', 'False') == 'True'
```

Corresponding `.env` file:

```bash
# .env
DATABASE_URL=postgresql://localhost/mydb
API_KEY=secret_key_12345
DEBUG=True
```

### Advanced Configuration Management

The model must show the dictionary-based pattern for merging multiple configuration sources:

```python
from dotenv import dotenv_values
import os

config = {
    **dotenv_values(".env.shared"),  # load shared development variables
    **dotenv_values(".env.secret"),  # load sensitive variables
    **os.environ,  # override with environment variables
}

# Access with priority: os.environ > .env.secret > .env.shared
database_url = config['DATABASE_URL']
```

### Multi-Environment Loading

The model must demonstrate environment-specific configuration loading:

```python
from dotenv import load_dotenv
import os

# Determine environment
env = os.getenv('ENVIRONMENT', 'development')
dotenv_path = f'.env.{env}'

# Load environment-specific file
load_dotenv(dotenv_path=dotenv_path)

# .env.development, .env.staging, .env.production
```

### Variable Expansion with Defaults

The model must show POSIX-style variable interpolation:

```bash
# .env with variable expansion
DOMAIN=example.org
EMAIL=admin@${DOMAIN}
API_URL=https://${DOMAIN}/api

# Default values for missing variables
DATABASE_HOST=${DB_HOST:-localhost}
DATABASE_PORT=${DB_PORT:-5432}
REDIS_URL=redis://${REDIS_HOST:-localhost}:${REDIS_PORT:-6379}
```

### IPython/Jupyter Integration

The model must demonstrate the IPython extension usage:

```python
# In a Jupyter notebook
%load_ext dotenv
%dotenv

# Load a specific file
%dotenv /path/to/.env.local

# Override existing variables
%dotenv -o

# Verbose output
%dotenv -v
```

### CLI Usage

The model must show command-line interface examples:

```bash
# Set variables
dotenv set DATABASE_URL "postgresql://localhost/mydb"
dotenv set API_KEY "secret_key_123"

# Get a specific value
dotenv get API_KEY

# List all variables
dotenv list

# List as JSON
dotenv list --format=json

# Run a command with the loaded environment
dotenv run -- python manage.py runserver
dotenv run -- pytest tests/
```

## Integration Patterns

### Django Integration

The model must demonstrate Django integration in manage.py and settings.py:

```python
# manage.py
import os
import sys
from dotenv import load_dotenv

if __name__ == '__main__':
    # Load .env before Django imports settings
    load_dotenv()

    os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'myproject.settings')
    from django.core.management import execute_from_command_line
    execute_from_command_line(sys.argv)
```

```python
# settings.py
import os
from dotenv import load_dotenv

load_dotenv()

SECRET_KEY = os.getenv('SECRET_KEY')
DEBUG = os.getenv('DEBUG', 'False') == 'True'
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': os.getenv('DB_NAME'),
        'USER': os.getenv('DB_USER'),
        'PASSWORD': os.getenv('DB_PASSWORD'),
        'HOST': os.getenv('DB_HOST', 'localhost'),
        'PORT': os.getenv('DB_PORT', '5432'),
    }
}
```

### Flask Integration

The model must show the Flask application factory pattern:

```python
# app.py or __init__.py
from flask import Flask
from dotenv import load_dotenv
import os

load_dotenv()


def create_app():
    app = Flask(__name__)

    app.config['SECRET_KEY'] = os.getenv('SECRET_KEY')
    app.config['DATABASE_URI'] = os.getenv('DATABASE_URL')
    app.config['DEBUG'] = os.getenv('FLASK_DEBUG', 'False') == 'True'

    return app


if __name__ == '__main__':
    app = create_app()
    app.run()
```

### FastAPI Integration

The model must demonstrate Pydantic Settings integration:

```python
# config.py
from pydantic_settings import BaseSettings
from dotenv import load_dotenv

load_dotenv()  # optional here: env_file below also reads .env


class Settings(BaseSettings):
    database_url: str
    api_key: str
    debug: bool = False

    class Config:
        env_file = '.env'
        env_file_encoding = 'utf-8'


settings = Settings()
```

```python
# main.py
from fastapi import FastAPI
from config import settings

app = FastAPI(debug=settings.debug)


@app.get("/")
def read_root():
    return {"database": settings.database_url}
```

### Docker and Container Integration

The model must show Docker integration with environment variable precedence:

```dockerfile
# Dockerfile
FROM python:3.12-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt

COPY . .

# .env file is included in the image for defaults,
# but container env vars take precedence
CMD ["python", "app.py"]
```

```python
# app.py with override=False (default)
from dotenv import load_dotenv
import os

# Precedence: Docker env vars > .env file
load_dotenv(override=False)

# This respects container-injected variables
database_url = os.getenv('DATABASE_URL')
```

```yaml
# docker-compose.yml
services:
  app:
    build: .
    environment:
      - DATABASE_URL=postgresql://db/production
      - DEBUG=False
      # Container env vars override the .env file
```

### Environment Variable Precedence

The model must explain the precedence rules:

**With `load_dotenv(override=False)` (default)**:

1. Existing environment variables (highest priority)
2. Variables from the .env file
3. Default values in code

**With `load_dotenv(override=True)`**:

1. Variables from the .env file (highest priority)
2. Existing environment variables
3. Default values in code
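
A minimal sketch of the two modes, assuming a `.env` file in the working directory that contains `DEBUG=True`:

```python
import os
from dotenv import load_dotenv

os.environ['DEBUG'] = 'False'  # pre-existing environment variable

load_dotenv(override=False)    # default: the existing value wins
print(os.getenv('DEBUG'))      # -> 'False'

load_dotenv(override=True)     # the .env file wins
print(os.getenv('DEBUG'))      # -> 'True'
```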

## Real-World Examples

The model must cite verified GitHub repositories using python-dotenv:

1. **theskumar/python-dotenv** (8,443 stars)
   - Source repository with comprehensive examples
   - @github:theskumar/python-dotenv

2. **daveebbelaar/langchain-experiments** (1,104 stars)
   - LangChain AI applications with environment configuration
   - @github:daveebbelaar/langchain-experiments

3. **iam-veeramalla/python-for-devops** (3,994 stars)
   - DevOps automation scripts using dotenv
   - @github:iam-veeramalla/python-for-devops

4. **AgentOps-AI/agentops** (4,978 stars)
   - AI agent monitoring with environment configuration
   - @github:AgentOps-AI/agentops

Common pattern observed across repositories:

```python
from dotenv import load_dotenv
import os

load_dotenv()
api_key = os.getenv('API_KEY')
```

## When to Use python-dotenv

The model must create a decision matrix based on verified use cases:

### Use python-dotenv when

- **Local Development**: Managing configuration during development where setting environment variables manually is impractical
- **Multiple Developers**: Team needs consistent local environment setup without sharing credentials in version control
- **Multiple Environments**: Application runs in development, staging, production with different configurations
- **Third-Party Services**: Application integrates with APIs requiring secret keys
- **Framework Integration**: Using Django, Flask, FastAPI where .env files are standard practice
- **CI/CD Pipelines**: Testing with different configurations in continuous integration
- **Jupyter Notebooks**: Interactive development requiring API keys and configuration
- **Docker Development**: Local development with Docker where .env provides defaults but containers can override
- **12-Factor Applications**: Following cloud-native application design principles

The model must verify these scenarios from real-world usage patterns in GitHub repositories.

## When NOT to Use python-dotenv

The model must state limitations and alternative approaches:

### Do NOT use python-dotenv when

1. **Production Secrets Management**:
   - Problem: .env files are plaintext and not encrypted
   - Alternative: Use HashiCorp Vault, AWS Secrets Manager, Azure Key Vault, GCP Secret Manager
   - Reason: Production secrets require encryption, rotation, audit logging, and access control

2. **Native Environment Variables Sufficient**:
   - Problem: Adds an unnecessary dependency for simple cases
   - Alternative: Use `os.environ` directly
   - Reason: If the environment already provides all variables, loading from a file is redundant

3. **Compiled Applications**:
   - Problem: .env files must be distributed with the application
   - Alternative: Bake configuration into the build or use an external configuration service
   - Reason: Compiled/packaged applications should not rely on external files

4. **Read-Only Filesystems**:
   - Problem: Cannot read the .env file from disk
   - Alternative: Use environment variables directly
   - Reason: Some container runtimes and serverless platforms use read-only filesystems

5. **Complex Configuration Schemas**:
   - Problem: .env files only support string key-value pairs
   - Alternative: Use YAML, TOML, JSON with schema validation
   - Reason: Complex nested configuration requires structured formats

6. **Dynamic Configuration**:
   - Problem: .env files are loaded once at startup
   - Alternative: Use a configuration management service (Consul, etcd)
   - Reason: Applications requiring runtime configuration updates need dynamic sources

7. **Distributed Systems Coordination**:
   - Problem: Each service would need its own .env file
   - Alternative: Use a centralized configuration service
   - Reason: Distributed systems need synchronized configuration updates

## Decision Guidance Matrix

The model must provide deterministic decision criteria:

```text
┌─────────────────────────────────┬──────────────────┬───────────────────┐
│ Scenario                        │ Use dotenv?      │ Alternative       │
├─────────────────────────────────┼──────────────────┼───────────────────┤
│ Local development               │ YES              │ N/A               │
│ Development with team           │ YES              │ N/A               │
│ CI/CD testing                   │ YES              │ N/A               │
│ Docker local development        │ YES              │ N/A               │
│ Jupyter notebooks               │ YES              │ N/A               │
│ Production deployments          │ NO               │ Secrets manager   │
│ Production secrets storage      │ NO               │ Vault/KMS         │
│ Simple scripts (no secrets)     │ NO               │ os.environ        │
│ Complex nested config           │ NO               │ YAML/TOML         │
│ Dynamic config updates          │ NO               │ Consul/etcd       │
│ Serverless functions            │ MAYBE            │ Cloud env vars    │
│ Distributed systems             │ NO               │ Config service    │
└─────────────────────────────────┴──────────────────┴───────────────────┘
```

## File Format Reference

The model must document the supported .env syntax from official documentation:

```bash
# Basic key-value pairs
API_KEY=secret123
PORT=8080
DEBUG=true

# Quoted values
DATABASE_URL='postgresql://localhost/mydb'
APP_NAME="My Application"

# Multiline values (quoted)
CERTIFICATE="-----BEGIN CERTIFICATE-----
MIIDXTCCAkWgAwIBAgIJAKL0UG+mRbzMMA0GCSqGSIb3DQEBCwUA
-----END CERTIFICATE-----"

# Comments
# This is a comment
LOG_LEVEL=INFO  # Inline comment

# Export directive (optional, no effect on parsing)
export PATH_EXTENSION=/usr/local/bin

# Variable expansion with POSIX syntax
DOMAIN=example.org
EMAIL=admin@${DOMAIN}
API_URL=https://${DOMAIN}/api

# Default values for missing variables
DATABASE_HOST=${DB_HOST:-localhost}
DATABASE_PORT=${DB_PORT:-5432}

# Escape sequences in double quotes
MESSAGE="Line 1\nLine 2\nLine 3"
TABS="Column1\tColumn2\tColumn3"
QUOTE="He said \"Hello\""

# Escape sequences in single quotes (only \\ and \')
PATH='C:\\Users\\Admin'
NAME='O\'Brien'

# Empty values
EMPTY_STRING=
EMPTY_VAR

# Spaces around = are ignored
REDIS_URL = redis://localhost:6379
```

Supported escape sequences:

- Double quotes: `\\`, `\'`, `\"`, `\a`, `\b`, `\f`, `\n`, `\r`, `\t`, `\v`
- Single quotes: `\\`, `\'`

## API Reference Summary

The model must list core functions from verified documentation:

| Function | Purpose | Returns | Common Use |
| --- | --- | --- | --- |
| `load_dotenv(dotenv_path=None, stream=None, verbose=False, override=False, interpolate=True, encoding=None)` | Load .env into os.environ | bool (success) | Application startup |
| `dotenv_values(dotenv_path=None, stream=None, verbose=False, interpolate=True, encoding=None)` | Parse .env to dict | dict | Config merging |
| `find_dotenv(filename='.env', raise_error_if_not_found=False, usecwd=False)` | Search for .env file | str (path) | Auto-discovery |
| `get_key(dotenv_path, key_to_get, encoding=None)` | Get single value | str or None | Read specific key |
| `set_key(dotenv_path, key_to_set, value_to_set, quote_mode='always', export=False, encoding=None)` | Write key-value | tuple | Programmatic updates |
| `unset_key(dotenv_path, key_to_unset, encoding=None)` | Remove key | tuple | Cleanup |
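
A minimal sketch of the file-editing helpers from the table above, assuming a writable `.env` in the working directory:

```python
from dotenv import get_key, set_key, unset_key

set_key('.env', 'API_KEY', 'secret123')  # writes API_KEY='secret123'
print(get_key('.env', 'API_KEY'))        # -> 'secret123'
unset_key('.env', 'API_KEY')             # removes the entry again
```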

## Related Libraries

The model must cite verified alternatives from the GitHub repository:

- **django-environ**: Django-specific with type coercion (@github:joke2k/django-environ)
- **python-decouple**: Strict separation with type casting (@github:HBNetwork/python-decouple)
- **environs**: Marshmallow-based validation (@github:sloria/environs)
- **dynaconf**: Multi-format with layered settings (@github:rochacbruno/dynaconf)
- **pydantic-settings**: Type-safe with Pydantic models (recommended for FastAPI)

The model must state that python-dotenv is the most widely adopted library for simple .env loading.

## Best Practices

The model must recommend the following verified patterns:

1. **Never commit .env to version control**

   ```bash
   # .gitignore
   .env
   .env.local
   .env.*.local
   ```

2. **Provide .env.example for documentation**

   ```bash
   # .env.example
   DATABASE_URL=postgresql://localhost/mydb
   API_KEY=your_api_key_here
   DEBUG=False
   ```

3. **Load at the application entry point**

   ```python
   # Load once at startup, not in modules
   if __name__ == '__main__':
       from dotenv import load_dotenv
       load_dotenv()
       main()
   ```

4. **Use override=False for production precedence**

   ```python
   # Let container/server env vars override .env
   load_dotenv(override=False)
   ```

5. **Validate required variables**

   ```python
   from dotenv import load_dotenv
   import os

   load_dotenv()

   required_vars = ['DATABASE_URL', 'API_KEY', 'SECRET_KEY']
   missing = [var for var in required_vars if not os.getenv(var)]

   if missing:
       raise ValueError(f"Missing required environment variables: {missing}")
   ```

6. **Use type conversion explicitly**

   ```python
   # python-dotenv returns strings only
   DEBUG = os.getenv('DEBUG', 'False').lower() in ('true', '1', 'yes')
   PORT = int(os.getenv('PORT', '8000'))
   TIMEOUT = float(os.getenv('TIMEOUT', '30.0'))
   ```

## Common Pitfalls

The model must warn about verified issues:

1. **Strings only**: All values are strings; convert to int/bool/float manually
2. **Load timing**: Must load before importing modules that read environment variables at import time (see the sketch after this list)
3. **Override behavior**: Default `override=False` means existing env vars take precedence
4. **Variable expansion**: Disabled with `interpolate=False`; check whether expansion is needed
5. **Production secrets**: Never use .env files for production secret storage
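
A minimal sketch of the load-timing pitfall; `settings` is a hypothetical module that reads `os.environ` at import time:

```python
from dotenv import load_dotenv

load_dotenv()  # 1) load first, before any import that reads os.environ

import settings  # noqa: E402  # 2) safe: .env values are already set
```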

## Verification

The model must self-check the following criteria:

- All sections present and complete
- Code examples tested against python-dotenv 1.1.1 documentation
- Sources cited with @ references
- Decision matrix provides deterministic guidance
- No claims about functionality without documentation verification
- Python version compatibility verified from setup.py
- Real-world examples from GitHub repositories with star counts
- Integration patterns match framework documentation
- Security warnings included for production usage

## Summary

The model must state that python-dotenv is the standard Python library for loading environment variables from .env files during development. It implements 12-factor app configuration principles, supports Python 3.9-3.13, and integrates with Django, Flask, FastAPI, and other frameworks. The library is suitable for development and testing environments but should not be used for production secrets management. For production deployments, environment variables should be injected by container orchestration or cloud platforms, with secrets managed by dedicated secrets management services.

---
title: "Robot Framework: Generic Test Automation Framework"
library_name: robotframework
pypi_package: robotframework
category: testing
python_compatibility: "3.8+"
last_updated: "2025-11-02"
official_docs: "https://robotframework.org"
official_repository: "https://github.com/robotframework/robotframework"
maintenance_status: "active"
---

# Robot Framework

## Core Purpose

Robot Framework is a generic open source automation framework designed for acceptance testing, acceptance test driven development (ATDD), behavior driven development (BDD), and robotic process automation (RPA). It uses a keyword-driven testing approach that enables writing tests in a human-readable, tabular format.

**What problem does it solve?**

- Enables non-programmers to write and maintain automated tests
- Bridges the communication gap between technical and non-technical stakeholders
- Provides a unified framework for acceptance testing across different technologies (web, API, desktop, mobile)
- Allows test automation without deep programming knowledge
- Facilitates living documentation through readable test cases

**What would be "reinventing the wheel" without it?**

Without Robot Framework, teams would need to:

- Build custom test execution frameworks with reporting capabilities
- Create their own keyword abstraction layers for business-readable tests
- Develop logging and debugging infrastructure from scratch
- Implement test data parsing for multiple formats (plain text, HTML, reStructuredText)
- Create plugin systems for extending test capabilities
- Build result aggregation and reporting tools

@Source: <https://github.com/robotframework/robotframework/blob/master/README.rst>

## Python Version Compatibility

**Minimum Python version:** 3.8

**Python 3.11-3.14 compatibility status:**

- Python 3.8-3.13: Fully supported (verified in SeleniumLibrary)
- Python 3.14: Expected to work (no known blockers)

**Version differences:**

- Robot Framework 7.x (current): Requires Python 3.8+
- Robot Framework 6.1.1: Last version supporting Python 3.6-3.7
- Robot Framework 4.1.3: Last version supporting Python 2.7, Jython, IronPython

@Source: <https://github.com/robotframework/robotframework/blob/master/INSTALL.rst> @Source: <https://github.com/robotframework/SeleniumLibrary/blob/master/README.rst>

## Installation

```bash
# Install latest stable version
pip install robotframework

# Install a specific version
pip install robotframework==7.3.2

# Upgrade to latest
pip install --upgrade robotframework

# Install with common libraries
pip install robotframework robotframework-seleniumlibrary robotframework-requests
```

@Source: <https://github.com/robotframework/robotframework/blob/master/INSTALL.rst>

## When to Use Robot Framework

**Use Robot Framework when:**

1. **Acceptance testing is the primary goal**
   - You need stakeholder-readable test cases
   - Business analysts or QA engineers write tests without coding
   - Tests serve as living documentation

2. **Keyword-driven testing fits your workflow**
   - You want to build reusable test components (keywords)
   - Test cases follow similar patterns with different data
   - Abstraction layers improve maintainability

3. **Cross-technology testing is required**
   - Testing web applications (via SeleniumLibrary or Browser library)
   - API testing (via RequestsLibrary)
   - Desktop applications (via various libraries)
   - Mobile apps (via AppiumLibrary)
   - SSH/remote systems (via SSHLibrary)

4. **Non-programmers need to contribute to tests**
   - QA teams without Python expertise
   - Domain experts need to validate test logic
   - Collaboration between technical and business teams

5. **RPA (Robotic Process Automation) tasks**
   - Automating repetitive business processes
   - Desktop automation workflows
   - Data migration and validation

**Do NOT use Robot Framework when:**

1. **Unit testing is the primary need**
   - Use pytest for Python unit tests
   - Robot Framework is too heavy for granular testing
   - Fast feedback loops are critical (TDD cycles)

2. **Python-centric test suites**
   - Team consists entirely of Python developers
   - Complex test logic requires extensive Python code
   - pytest fixtures and parametrization are more natural

3. **Performance testing**
   - Use locust, JMeter, or k6 instead
   - Robot Framework adds overhead for load testing

4. **Rapid TDD cycles**
   - Robot Framework startup time is slower than pytest
   - Test discovery and execution have overhead
   - pytest is better for red-green-refactor cycles

5. **Complex test orchestration**
   - Use pytest with advanced fixtures
   - Dependency injection patterns work better in pure Python

@Source: Based on framework design patterns and ecosystem analysis

## Decision Matrix

| Requirement | Robot Framework | pytest | Recommendation |
| ---------------------- | --------------- | ------ | -------------------------------------------------- |
| Acceptance testing | ★★★★★ | ★★☆☆☆ | Robot Framework |
| Unit testing | ★☆☆☆☆ | ★★★★★ | pytest |
| API testing | ★★★★☆ | ★★★★☆ | Either (RF for acceptance, pytest for integration) |
| Web UI testing | ★★★★★ | ★★★☆☆ | Robot Framework |
| Non-programmer writers | ★★★★★ | ★☆☆☆☆ | Robot Framework |
| TDD cycles | ★★☆☆☆ | ★★★★★ | pytest |
| Living documentation | ★★★★★ | ★★☆☆☆ | Robot Framework |
| Python developers only | ★★☆☆☆ | ★★★★★ | pytest |
| BDD/Gherkin style | ★★★★☆ | ★★★★☆ | Either (RF native, pytest with behave) |
| RPA/automation | ★★★★★ | ★★☆☆☆ | Robot Framework |

## Core Concepts

### Keyword-Driven Testing Approach

Robot Framework tests are built from keywords - reusable test steps that can be combined to create test cases. Keywords can be:

- Built-in keywords from the Robot Framework core
- Library keywords from external libraries (SeleniumLibrary, RequestsLibrary, etc.)
- User-defined keywords created in test files or resource files

### Test Case Syntax

```robotframework
*** Settings ***
Documentation     Example test suite showing Robot Framework syntax
Library           SeleniumLibrary
Library           RequestsLibrary
Resource          common_keywords.resource

*** Variables ***
${LOGIN_URL}      http://localhost:8080/login
${BROWSER}        Chrome
${API_URL}        http://localhost:8080/api

*** Test Cases ***
Valid User Login
    [Documentation]    Test successful login with valid credentials
    [Tags]    smoke    login
    Open Browser To Login Page
    Input Username    demo
    Input Password    mode
    Submit Credentials
    Welcome Page Should Be Open
    [Teardown]    Close Browser

API Health Check
    [Documentation]    Verify API is responding
    ${response}=    GET    ${API_URL}/health
    Status Should Be    200
    Should Be Equal As Strings    ${response.json()}[status]    healthy

*** Keywords ***
Open Browser To Login Page
    Open Browser    ${LOGIN_URL}    ${BROWSER}
    Title Should Be    Login Page

Input Username
    [Arguments]    ${username}
    Input Text    username_field    ${username}

Input Password
    [Arguments]    ${password}
    Input Text    password_field    ${password}

Submit Credentials
    Click Button    login_button

Welcome Page Should Be Open
    Title Should Be    Welcome Page
```

@Source: <https://github.com/robotframework/SeleniumLibrary/blob/master/README.rst> @Source: <https://github.com/robotframework/robotframework> (User Guide examples)

## Real-World Usage Patterns

### Pattern 1: Web Testing with SeleniumLibrary

SeleniumLibrary is the most popular Robot Framework library for web testing, supporting Selenium 4 and Python 3.8-3.13.

```robotframework
*** Settings ***
Library    SeleniumLibrary

*** Test Cases ***
Search Product
    Open Browser    https://example.com    Chrome
    Input Text    id:search-input    laptop
    Click Button    id:search-button
    Page Should Contain    Search Results
    Close Browser
```

**Example repositories:**

- <https://github.com/robotframework/SeleniumLibrary> (1,450+ stars)
- <https://github.com/robotframework/WebDemo> (demo project)

@Source: <https://github.com/robotframework/SeleniumLibrary>

### Pattern 2: Modern Browser Testing with Browser Library

Browser library (powered by Playwright) is the next-generation web testing library, offering better performance and reliability.

```robotframework
*** Settings ***
Library    Browser

*** Test Cases ***
Fast Modern Web Test
    New Browser    chromium    headless=False
    New Page    https://example.com
    Type Text    id=search    robot framework
    Click    button#submit
    Get Text    h1    ==    Results
    Close Browser
```

**Example repository:**

- <https://github.com/MarketSquare/robotframework-browser> (605+ stars)

@Source: <https://github.com/MarketSquare/robotframework-browser>

### Pattern 3: API Testing with RequestsLibrary

RequestsLibrary wraps the Python requests library for API testing.

```robotframework
*** Settings ***
Library    RequestsLibrary

*** Test Cases ***
GET Request Test
    ${response}=    GET    https://jsonplaceholder.typicode.com/posts/1
    Should Be Equal As Strings    1    ${response.json()}[id]
    Status Should Be    200

POST Request Test
    &{data}=    Create Dictionary    title=Test    body=Content    userId=1
    ${response}=    POST    https://jsonplaceholder.typicode.com/posts
    ...    json=${data}
    Status Should Be    201
```

**Example repository:**

- <https://github.com/MarketSquare/robotframework-requests> (506+ stars)

@Source: <https://github.com/MarketSquare/robotframework-requests/blob/master/README.md>

### Pattern 4: Data-Driven Testing

The data-driven approach excels when the same workflow needs to be executed with different inputs.

```robotframework
*** Settings ***
Test Template    Calculate

*** Test Cases ***    Expression    Expected
Addition              12 + 2 + 2    16
                      2 + -3        -1

Subtraction           12 - 2 - 2    8
                      2 - -3        5

Multiplication        12 * 2 * 2    48

Division              12 / 2 / 2    3

*** Keywords ***
Calculate
    [Arguments]    ${expression}    ${expected}
    ${result}=    Evaluate    ${expression}
    Should Be Equal As Numbers    ${result}    ${expected}
```

@Source: <https://github.com/robotframework/RobotDemo/blob/master/data_driven.robot>

### Pattern 5: BDD/Gherkin Style

Robot Framework supports Given-When-Then syntax for behavior-driven development.

```robotframework
*** Test Cases ***
User Can Purchase Product
    Given user is logged in
    When user adds product to cart
    And user proceeds to checkout
    Then order should be confirmed

*** Keywords ***
User Is Logged In
    Open Browser To Login Page
    Login With Valid Credentials

User Adds Product To Cart
    Search For Product    laptop
    Add First Result To Cart

User Proceeds To Checkout
    Click Cart Icon
    Click Checkout Button

Order Should Be Confirmed
    Page Should Contain    Order Confirmed
```

@Source: Robot Framework User Guide (Gherkin style examples)

## Integration Patterns

### SeleniumLibrary (Web Testing)

```bash
pip install robotframework-seleniumlibrary
```

- Most mature web testing library
- Supports Selenium 4
- Selenium Manager handles browser drivers automatically
- Python 3.8-3.13 compatible

@Source: <https://github.com/robotframework/SeleniumLibrary>

### Browser Library (Modern Web Testing)

```bash
pip install robotframework-browser
rfbrowser init  # Install Playwright browsers
```

- Powered by Playwright
- Better performance and reliability than Selenium
- Built-in waiting and auto-retry mechanisms
- Supports modern browser features

@Source: <https://github.com/MarketSquare/robotframework-browser>

### RequestsLibrary (API Testing)

```bash
pip install robotframework-requests
```

- Wraps the Python requests library
- RESTful API testing
- OAuth and authentication support
- JSON/XML response validation

@Source: <https://github.com/MarketSquare/robotframework-requests>

### SSHLibrary (Remote Testing)

```bash
pip install robotframework-sshlibrary
```

- SSH and SFTP operations
- Remote command execution
- File transfer capabilities
- Terminal emulation

@Source: <https://github.com/MarketSquare/SSHLibrary>

### AppiumLibrary (Mobile Testing)

```bash
pip install robotframework-appiumlibrary
```

- Mobile app testing (iOS/Android)
- Built on Appium
- Cross-platform mobile automation

@Source: <https://github.com/serhatbolsu/robotframework-appiumlibrary>

## Custom Keyword Libraries

Robot Framework can be extended with Python libraries:

```python
# MyLibrary.py
class MyLibrary:
    """Custom keyword library for Robot Framework."""

    def __init__(self, host, port=80):
        """Library initialization with arguments."""
        self.host = host
        self.port = port

    def connect_to_service(self):
        """Keyword: Connect To Service

        Establishes a connection to the configured service.
        """
        # Implementation
        pass

    def send_message(self, message):
        """Keyword: Send Message

        Sends a message to the service.

        Arguments:
            message: The message to send
        """
        # Implementation
        pass
```

Usage in a test:

```robotframework
*** Settings ***
Library    MyLibrary    localhost    8080

*** Test Cases ***
Send Test Message
    Connect To Service
    Send Message    Hello, Robot Framework!
```

@Source: <https://github.com/robotframework/robotframework> (User Guide - Creating Libraries)

## Execution and Reporting

### Basic Execution

```bash
# Run all tests in a file
robot tests.robot

# Run tests in a directory
robot path/to/tests/

# Run with a specific browser
robot --variable BROWSER:Firefox tests.robot

# Run tests with specific tags
robot --include smoke tests/

# Run and generate a custom output directory
robot --outputdir results tests.robot

# Run with Python module syntax
python -m robot tests.robot
```

### Advanced Execution

```bash
# Parallel execution (with pabot)
pip install robotframework-pabot
pabot --processes 4 tests/

# Re-run failed tests
robot --rerunfailed output.xml tests.robot

# Combine multiple test results
rebot --name Combined output1.xml output2.xml
```

@Source: <https://github.com/robotframework/robotframework/blob/master/doc/userguide/src/ExecutingTestCases/BasicUsage.rst>

## Ecosystem Tools

### RIDE (Test Editor)

Desktop IDE for creating and editing Robot Framework tests. Supports Python 3.8-3.13.

```bash
pip install robotframework-ride
ride.py
```

@Source: <https://github.com/robotframework/RIDE>

### RobotCode (VS Code Extension)

LSP-powered VS Code extension for Robot Framework development.

- Syntax highlighting and code completion
- Debugging support
- Test execution from the IDE
- Keyword documentation

@Source: <https://github.com/robotcodedev/robotcode>

### Robocop (Linter)

Static code analysis and linting tool for Robot Framework.

```bash
pip install robotframework-robocop
robocop tests/
```

@Source: <https://github.com/MarketSquare/robotframework-robocop>

### Tidy (Code Formatter)

Code formatting tool for Robot Framework files.

```bash
pip install robotframework-tidy
robotidy tests/
```

@Source: <https://github.com/MarketSquare/robotframework-tidy> (referenced in ecosystem)

## Maintenance Status

**Status:** Actively maintained

- Latest stable: 7.3.2 (July 2025)
- Latest pre-release: 7.4b1 (October 2025)
- Active development on GitHub (11,000+ stars, 2,400+ forks)
- Non-profit Robot Framework Foundation provides governance
- Regular releases (multiple per year)
- Strong community support (Slack, Forum, GitHub)

**Project Health Indicators:**

- 269 open issues (October 2025)
- Active commit history
- Responsive maintainers
- Large ecosystem of maintained libraries
- Corporate backing and foundation support

@Source: <https://github.com/robotframework/robotframework> @Source: <https://pypi.org/project/robotframework/>
|
||||
|
||||
## Comparison with Alternatives

### vs pytest

**Choose Robot Framework:**

- Acceptance testing focus
- Non-programmers write tests
- Keyword-driven approach preferred
- Cross-technology testing (web, API, desktop, mobile)
- Living documentation requirement

**Choose pytest:**

- Unit testing focus
- Python developers only
- Complex test logic in Python
- Rapid TDD cycles
- Python-native fixtures and parametrization

### vs Behave (Python BDD)

**Choose Robot Framework:**

- Broader scope (not just BDD)
- Rich ecosystem of libraries
- Keyword reusability across projects
- Built-in reporting and logging

**Choose Behave:**

- Pure BDD/Gherkin focus
- Step definitions in Python
- Integration with pytest

### vs Cucumber (JVM BDD)

**Choose Robot Framework:**

- Python ecosystem
- RPA capabilities
- Broader than just BDD

**Choose Cucumber:**

- JVM ecosystem (Java, Kotlin, Scala)
- Pure Gherkin syntax
- Enterprise Java integration

## Example Projects

1. **RobotDemo** - Official demo project
   - <https://github.com/robotframework/RobotDemo>
   - Shows keyword-driven, data-driven, and Gherkin styles
   - Calculator library implementation example

2. **WebDemo** - Web testing demo
   - Referenced in SeleniumLibrary docs
   - Complete login test example with page objects

3. **awesome-robotframework** - Curated resources
   - <https://github.com/MarketSquare/awesome-robotframework>
   - Libraries, tools, and example projects
   - Community contributions

@Source: <https://github.com/robotframework/RobotDemo> @Source: <https://github.com/MarketSquare/awesome-robotframework>

## Summary

Robot Framework is the premier choice for acceptance testing and RPA in the Python ecosystem. Its keyword-driven approach enables collaboration between technical and non-technical team members, making it ideal for projects where tests serve as living documentation. The framework excels at cross-technology testing (web, API, mobile, desktop) through its rich ecosystem of libraries.

However, it is not a replacement for pytest in unit testing scenarios. Teams should use Robot Framework for acceptance-level tests and pytest for unit/integration tests. The frameworks complement each other well in a comprehensive testing strategy.

**Quick decision guide:**

- Need stakeholder-readable tests? → Robot Framework
- Need unit tests? → pytest
- Need both? → Use both frameworks together
- Pure Python developers doing integration tests? → Consider pytest first
- QA team without coding experience? → Robot Framework

The framework's active maintenance, strong community, and foundation backing ensure long-term viability for projects adopting it.

467
skills/python3-development/references/modern-modules/shiv.md
Normal file
@@ -0,0 +1,467 @@
---
title: "shiv: Python Zipapp Builder for Self-Contained Applications"
library_name: shiv
pypi_package: shiv
category: packaging-distribution
python_compatibility: "3.8+"
last_updated: "2025-11-02"
official_docs: "https://shiv.readthedocs.io"
official_repository: "https://github.com/linkedin/shiv"
maintenance_status: "active"
---

# shiv

## Overview

shiv is a command-line utility for building fully self-contained Python zipapps, as outlined in PEP 441, but with all of their dependencies included. It is developed and maintained by LinkedIn and provides a fast, simple way to distribute Python applications.

**Official Repository**: @<https://github.com/linkedin/shiv> **Official Documentation**: @<https://shiv.readthedocs.io/en/latest/> **PyPI Package**: @<https://pypi.org/project/shiv/>

## Core Purpose

### Problem Statement

shiv solves the challenge of distributing Python applications with all their dependencies bundled into a single executable file, without requiring complex build processes or compilation.

**What problems does shiv solve?**

1. **Dependency bundling**: Packages your application and all its dependencies into a single `.pyz` file
2. **Simple distribution**: Creates executable files that can be shared and run on systems with compatible Python installations
3. **No compilation required**: Unlike PyInstaller or cx_Freeze, shiv does not compile Python code to binaries
4. **Fast deployment**: Built on Python's standard library zipapp module (PEP 441) for minimal overhead
5. **Reproducible builds**: Creates deterministic outputs for version control and deployment

**When you would be "reinventing the wheel" without shiv:**

- Building custom scripts to bundle dependencies with applications
- Manually creating zipapp structures with dependencies
- Writing deployment automation for Python CLI tools
- Managing virtual environments on deployment targets

## When to Use shiv vs Alternatives

### Use shiv When

- Deploying Python applications to controlled environments where Python is already installed
- Building CLI tools for internal distribution within organizations
- Creating portable Python applications for Linux/macOS/WSL environments
- You need fast build times and simple deployment workflows
- Your application is pure Python or has platform-specific compiled dependencies that can be installed per-platform
- You want to leverage the PEP 441 zipapp standard

### Use PyInstaller/cx_Freeze When

- Distributing to end-users who do not have Python installed
- Creating true standalone executables with embedded Python interpreter
- Targeting Windows environments without Python installations
- Building GUI applications for general consumer distribution
- You need absolute portability without Python runtime dependencies

### Use wheel/sdist When

- Publishing libraries to PyPI
- Developing packages meant to be installed via pip
- Creating reusable components rather than standalone applications
- Working in environments where pip/package managers are the standard

## Decision Matrix

```text
┌─────────────────────────┬──────────┬─────────────┬───────────┐
│ Requirement             │ shiv     │ PyInstaller │ wheel     │
├─────────────────────────┼──────────┼─────────────┼───────────┤
│ Python required         │ Yes      │ No          │ Yes       │
│ Build speed             │ Fast     │ Slow        │ Fast      │
│ Bundle size             │ Small    │ Large       │ Smallest  │
│ Cross-platform binary   │ No       │ Yes         │ No        │
│ PEP 441 compliant       │ Yes      │ No          │ N/A       │
│ Installation required   │ No       │ No          │ Yes (pip) │
│ C extension support     │ Limited* │ Full        │ Full      │
└─────────────────────────┴──────────┴─────────────┴───────────┘

* C extensions work but are platform-specific (not cross-compatible)
```

## Python Version Compatibility

- **Minimum Python version**: 3.8 (per setup.cfg @<https://github.com/linkedin/shiv/blob/main/setup.cfg>)
- **Tested versions**: 3.8, 3.9, 3.10, 3.11
- **Python 3.11+ compatibility**: Fully compatible
- **Python 3.12-3.14 status**: Expected to work (relies on standard library zipapp module)
- **PEP 441 dependency**: Requires Python 3.5+ (PEP 441 introduced in Python 3.5)

### Installation

```bash
# From PyPI
pip install shiv

# From source
git clone https://github.com/linkedin/shiv.git
cd shiv
python3 -m pip install -e .
```

## Core Concepts

### PEP 441 zipapp Integration

shiv builds on Python's standard library `zipapp` module (PEP 441), which allows creating executable ZIP files. The key enhancement is automatic dependency installation and bundling.

**How it works:**

1. Creates a temporary directory structure
2. Installs specified packages and dependencies using pip
3. Packages everything into a ZIP file
4. Adds a shebang line to make it executable
5. Extracts dependencies to `~/.shiv/` cache on first run

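To make the first few steps concrete, here is a rough sketch of the manual equivalent using only the stdlib `zipapp` module and pip. The project path and the entry point `myapp.cli:main` are illustrative assumptions, and shiv's real build does considerably more bookkeeping than this:

```python
# Hand-rolled approximation of steps 1-3 above, using only stdlib zipapp + pip.
# The staging path and entry point below are illustrative assumptions.
import subprocess
import sys
import zipapp
from pathlib import Path

staging = Path("build/myapp")
staging.mkdir(parents=True, exist_ok=True)

# Step 2: install the application and its dependencies into a staging directory.
subprocess.run(
    [sys.executable, "-m", "pip", "install", "--target", str(staging), "."],
    check=True,
)

# Steps 3-4: zip the staging tree and prepend an executable shebang line.
zipapp.create_archive(
    staging,
    target="myapp.pyz",
    interpreter="/usr/bin/env python3",
    main="myapp.cli:main",  # hypothetical console entry point
)
```

What this sketch omits is exactly what shiv adds on top: the bootstrap that extracts the archive to the `~/.shiv/` cache on first run (step 5), so C extensions and normal site-packages machinery work as expected.
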
### Deployment Patterns

**Single-file distribution:**

```bash
# Build once
shiv -c myapp -o myapp.pyz myapp

# Distribute myapp.pyz
# Users run: ./myapp.pyz
```

**Library bundling:**

```bash
# Bundle multiple packages
shiv -o toolkit.pyz requests click pyyaml
```

**From requirements.txt:**

```bash
shiv -r requirements.txt -o app.pyz -c app
```

## Usage Examples

### Basic Command-Line Tool

Create a standalone executable of flake8:

```bash
shiv -c flake8 -o ~/bin/flake8 flake8
```

**Explanation:**

- `-c flake8`: Specifies the console script entry point
- `-o ~/bin/flake8`: Output file location
- `flake8`: Package to install from PyPI

**Running:**

```bash
~/bin/flake8 --version
# Output: 3.7.8 (mccabe: 0.6.1, pycodestyle: 2.5.0, pyflakes: 2.1.1)
```

### Interactive Python Environment

Create an interactive executable with libraries:

```bash
shiv -o boto.pyz boto
```

**Running:**

```bash
./boto.pyz
# Opens Python REPL with boto available
>>> import boto
>>> boto.__version__
'2.49.0'
```

### Real-World Example: CLI Application Distribution

From @<https://github.com/scs/smartmeter-datacollector/blob/master/README.md>:

```bash
# Build a self-contained zipapp using shiv
poetry run poe build_shiv
```

This creates a `.pyz` file containing the smartmeter-datacollector application and all dependencies, distributable as a single file.

### Custom Python Interpreter Path

```bash
shiv -c myapp -o myapp.pyz -p "/usr/bin/env python3" myapp
```

The `-p` flag specifies the shebang line for the executable.

### Building from Local Package

```bash
# From current directory with setup.py or pyproject.toml
shiv -c myapp -o myapp.pyz .
```

### Advanced: Building shiv with shiv

From @<https://github.com/linkedin/shiv/blob/main/README.md>:

```bash
python3 -m venv .
source bin/activate
pip install shiv
shiv -c shiv -o shiv shiv
```

This creates a self-contained shiv executable using shiv itself, demonstrating bootstrapping capability.

## Integration Patterns

### CI/CD Pipeline Integration

```yaml
# Example GitHub Actions workflow
- name: Build application zipapp
  run: |
    pip install shiv
    shiv -c myapp -o dist/myapp.pyz myapp

- name: Upload artifact
  uses: actions/upload-artifact@v3
  with:
    name: myapp-zipapp
    path: dist/myapp.pyz
```

### Makefile Integration

From @<https://github.com/JanssenProject/jans/blob/main/jans-cli-tui/Makefile>:

```makefile
zipapp:
	@echo "Building zipapp with shiv"
	shiv -c jans_cli_tui -o jans_cli_tui.pyz .
```

### Poetry Integration

In `pyproject.toml`:

```toml
[tool.poe.tasks]
build_shiv = "shiv -c myapp -o dist/myapp.pyz ."
```

Run with: `poetry run poe build_shiv`

## Platform-Specific Considerations

### Linux/macOS

- **Shebang support**: Full support for `#!/usr/bin/env python3`
- **Permissions**: Requires `chmod +x` for executable files
- **Cache location**: `~/.shiv/` for dependency extraction

### Windows

- **Shebang limitations**: Windows does not natively support shebangs
- **Execution**: Must run as `python myapp.pyz`
- **Alternative**: Use Python launcher: `py myapp.pyz`
- **Cache location**: `%USERPROFILE%\.shiv\`

### Cross-Platform Gotchas

**From @<https://github.com/linkedin/shiv/blob/main/README.md>:**

> Zipapps created with shiv are not guaranteed to be cross-compatible with other architectures. For example, a pyz file built on a Mac may only work on other Macs, likewise for RHEL, etc. This usually only applies to zipapps that have C extensions in their dependencies. If all your dependencies are pure Python, then chances are the pyz will work on other platforms.

**Recommendation**: Build platform-specific executables for production deployments when using packages with C extensions.

## Cache Management

shiv extracts dependencies to `~/.shiv/` (or `SHIV_ROOT`) on first run. This directory can grow over time.

**Cleanup:**

```bash
# Remove all cached extractions
rm -rf ~/.shiv/

# Set custom cache location
export SHIV_ROOT=/tmp/shiv_cache
./myapp.pyz
```

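For monitoring rather than wholesale deletion, a small script can report how large the cache has grown. This is only a sketch; the reporting format is an arbitrary choice:

```python
# Report the size of the shiv extraction cache (~/.shiv by default,
# overridable via the SHIV_ROOT environment variable).
import os
from pathlib import Path

cache = Path(os.environ.get("SHIV_ROOT", str(Path.home() / ".shiv")))
total_bytes = (
    sum(f.stat().st_size for f in cache.rglob("*") if f.is_file())
    if cache.exists()
    else 0
)
print(f"{cache}: {total_bytes / 1_048_576:.1f} MiB")
```
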
## When NOT to Use shiv

### Scenarios Where Alternatives Are Better

1. **Windows-only distribution without Python**: Use PyInstaller or cx_Freeze for embedded interpreter
2. **End-user applications**: Users expect double-click executables, not Python scripts
3. **Cross-platform binaries from single build**: shiv requires platform-specific builds for C extensions
4. **Library distribution**: Use wheel/sdist and publish to PyPI
5. **Complex GUI applications**: PyInstaller has better support for frameworks like PyQt/Tkinter
6. **Environments without Python**: shiv requires a compatible Python installation on the target system

## Common Use Cases

### Internal Tool Distribution

**Example**: DevOps teams distributing CLI tools

```bash
# Build deployment tool
shiv -c deploy -o deploy.pyz deploy-tool

# Distribute to team members
# Everyone runs: ./deploy.pyz --environment prod
```

### Lambda/Cloud Function Packaging

While AWS Lambda has native Python support, shiv can simplify dependency management:

```bash
shiv -o lambda_function.pyz --no-binary :all: boto3 requests
```

### Portable Development Environments

Create portable toolchains:

```bash
# Bundle linting tools
shiv -o lint.pyz black pylint mypy flake8

# Bundle testing tools
shiv -o test.pyz pytest pytest-cov hypothesis
```

## Real-World Projects Using shiv

Based on GitHub search results (@<https://github.com/search?q=shiv+zipapp>):

1. **JanssenProject/jans** - IAM authentication server
   - Uses shiv to build CLI and TUI applications
   - Makefile integration for zipapp builds
   - @<https://github.com/JanssenProject/jans>

2. **scs/smartmeter-datacollector** - Smart meter data collection
   - Poetry integration with custom build command
   - Self-contained distribution for Raspberry Pi
   - @<https://github.com/scs/smartmeter-datacollector>

3. **praetorian-inc/noseyparker-explorer** - Security scanning results explorer
   - TUI application distributed via shiv
   - @<https://github.com/praetorian-inc/noseyparker-explorer>

4. **ClericPy/zipapps** - Alternative zipapp builder
   - Built as comparison/alternative to shiv
   - @<https://github.com/ClericPy/zipapps>

## Additional Resources

### Official Documentation

- PEP 441 - Improving Python ZIP Application Support: @<https://www.python.org/dev/peps/pep-0441/>
- Python zipapp module: @<https://docs.python.org/3/library/zipapp.html>
- shiv documentation: @<https://shiv.readthedocs.io/en/latest/>
- Lincoln Loop blog: "Dissecting a Python Zipapp Built with Shiv": @<https://lincolnloop.com/insights/dissecting-python-zipapp-built-shiv/>

### Community Resources

- Real Python tutorial: "Python's zipapp: Build Executable Zip Applications": @<https://realpython.com/python-zipapp/>
- jhermann blog: "Bundling Python Dependencies in a ZIP Archive": @<https://jhermann.github.io/blog/python/deployment/2020/03/08/ship_libs_with_shiv.html>

### Comparison Articles

- PyOxidizer comparisons (includes shiv): @<https://pyoxidizer.readthedocs.io/en/stable/pyoxidizer_comparisons.html>
- Hacker News discussion: @<https://news.ycombinator.com/item?id=26832809>

## Technical Implementation Details

### Dependencies

From @<https://github.com/linkedin/shiv/blob/main/setup.cfg>:

```ini
[options]
install_requires =
    click>=6.7,!=7.0
    pip>=9.0.3
    setuptools
python_requires = >=3.8
```

shiv has minimal dependencies, relying primarily on standard library components plus click for CLI and pip for dependency resolution.

### Entry Points

shiv provides two console scripts:

1. `shiv`: Main build tool
2. `shiv-info`: Inspect zipapp metadata

### Build Backend

Uses setuptools with pyproject.toml (PEP 517/518 compliant):

```toml
[build-system]
requires = ["setuptools", "wheel"]
build-backend = "setuptools.build_meta"
```

## Maintenance and Support

- **License**: BSD 2-Clause License
- **Maintainer**: LinkedIn (@<https://github.com/linkedin>)
- **GitHub Stars**: 1,884+ (as of October 2025)
- **Active Development**: Yes (last updated October 2025)
- **Open Issues**: 66 (as of October 2025)
- **Community**: Active issue tracker and pull request reviews

## Security Considerations

1. **Code signing**: shiv does not sign executables; implement external signing if required
2. **Dependency verification**: shiv uses pip, which respects pip's security model
3. **Cache security**: `~/.shiv/` directory contains extracted dependencies; ensure proper permissions
4. **Supply chain**: Verify package sources before building zipapps

## Performance Characteristics

- **Build time**: Fast (seconds for typical applications)
- **Startup overhead**: First run extracts to cache (one-time cost); subsequent runs are instant
- **Runtime performance**: Normal Python performance (no extra runtime layer once dependencies are extracted)
- **File size**: Smaller than PyInstaller bundles (no embedded interpreter)

## Troubleshooting

### Common Issues

**Issue**: "shiv requires Python >= 3.8"
**Solution**: Upgrade Python or use an older shiv version

**Issue**: "ImportError on different platform"
**Solution**: Rebuild the zipapp on the target platform for C extension dependencies

**Issue**: "Permission denied"
**Solution**: `chmod +x myapp.pyz`

**Issue**: "SHIV_ROOT fills up disk"
**Solution**: Clean the cache (`rm -rf ~/.shiv/`) or point `SHIV_ROOT` at tmpfs

## Conclusion

shiv is an excellent choice for distributing Python applications in controlled environments where Python is available. It provides a simple, fast, and standards-based approach to application packaging without the complexity of binary compilation. For internal tools, CLI utilities, and cloud function packaging, shiv offers an ideal balance of simplicity and functionality.

**Quick decision guide:**

- Need standalone binary with no Python? Use PyInstaller/cx_Freeze
- Distributing library? Use wheel + PyPI
- Internal tool with Python available? Use shiv
- Cross-platform GUI app? Use PyInstaller
- Cloud function deployment? Consider shiv or native platform tools

486
skills/python3-development/references/modern-modules/uvloop.md
Normal file
@@ -0,0 +1,486 @@
---
title: "uvloop: Ultra-Fast AsyncIO Event Loop"
library_name: uvloop
pypi_package: uvloop
category: async-io
python_compatibility: "3.8+"
last_updated: "2025-11-02"
official_docs: "https://uvloop.readthedocs.io"
official_repository: "https://github.com/MagicStack/uvloop"
maintenance_status: "active"
---

# uvloop: Ultra-Fast AsyncIO Event Loop

## Overview

uvloop is a drop-in replacement for Python's built-in asyncio event loop that delivers 2-4x performance improvements for network-intensive applications. Built on top of libuv (the same C library that powers Node.js) and implemented in Cython, uvloop enables Python asyncio code to approach the performance characteristics of compiled languages like Go.

## The Problem It Solves

### Without uvloop (Reinventing the Wheel)

Python's standard asyncio event loop, while functional, has performance limitations that become apparent in high-throughput scenarios:

1. **Pure Python implementation** with overhead from interpreter execution
2. **Slower I/O operations** compared to C-based event loops
3. **Limited networking throughput** for concurrent connections
4. **Higher CPU utilization** for equivalent workloads

Writing a custom event loop, or programming directly against low-level interfaces like epoll, adds complexity and defeats the purpose of asyncio's high-level abstractions.

### With uvloop (Best Practice)

uvloop provides a zero-code-change performance boost by simply replacing the event loop implementation:

- **2-4x faster** than standard asyncio @ [magic.io/blog/uvloop](https://magic.io/blog/uvloop-blazing-fast-python-networking/)
- **Drop-in replacement** requiring minimal code changes
- **Production-proven** in high-performance applications like Sanic, uvicorn, and vLLM
- **libuv foundation** providing battle-tested async I/O primitives

## Core Use Cases

### 1. High-Performance Web Servers

uvloop is the default event loop for production ASGI servers:

```python
# uvicorn with uvloop (automatic with standard install)
# @ https://github.com/encode/uvicorn
import uvloop
from fastapi import FastAPI

app = FastAPI()

@app.get("/")
async def root():
    return {"message": "Hello World"}

# Run with uvloop
if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000, loop="uvloop")
```

### 2. WebSocket Servers

High-throughput WebSocket applications @ [sanic-org/sanic](https://github.com/sanic-org/sanic):

```python
import uvloop
from sanic import Sanic, response

app = Sanic("websocket_app")

@app.websocket("/feed")
async def feed(request, ws):
    while True:
        data = await ws.recv()
        await ws.send(data)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
```

### 3. Concurrent Network Clients

Web scraping and API clients @ [howie6879/ruia](https://github.com/howie6879/ruia):

```python
import asyncio
import uvloop
import aiohttp

async def fetch_many(urls):
    async with aiohttp.ClientSession() as session:
        tasks = [session.get(url) for url in urls]
        responses = await asyncio.gather(*tasks)
        return responses

# Use uvloop for 2-4x faster concurrent requests
uvloop.run(fetch_many(["https://example.com"] * 1000))
```

## Integration Patterns

### Pattern 1: Global Installation (Recommended for Python <3.11)

```python
import asyncio
import uvloop

# Install uvloop as default event loop policy
uvloop.install()

async def main():
    # Your async code here
    await asyncio.sleep(1)

# Now all asyncio.run() calls use uvloop
asyncio.run(main())
```

### Pattern 2: Direct Run (Preferred for Python >=3.11)

```python
import uvloop

async def main():
    # Your async application entry point
    pass

# Simplest usage - replaces asyncio.run()
# @ https://github.com/MagicStack/uvloop/blob/master/README.rst
uvloop.run(main())
```

### Pattern 3: Explicit Event Loop (Advanced)

```python
import asyncio
import sys
import uvloop

async def main():
    # Application logic
    pass

# Python 3.11+ with explicit loop factory
if sys.version_info >= (3, 11):
    with asyncio.Runner(loop_factory=uvloop.new_event_loop) as runner:
        runner.run(main())
else:
    uvloop.install()
    asyncio.run(main())
```

### Pattern 4: Platform-Specific Installation

```python
import asyncio
import os

# Only use uvloop on POSIX systems (Linux/macOS)
# @ https://github.com/wanZzz6/Modules-Learn
if os.name == 'posix':
    import uvloop
    asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())

# Windows will use default asyncio (proactor loop)
async def main():
    pass

asyncio.run(main())
```

## Real-World Examples

### FastAPI/Uvicorn Production Setup

```python
# @ https://medium.com/israeli-tech-radar/so-you-think-python-is-slow-asyncio-vs-node-js-fe4c0083aee4
import asyncio
import uvloop
from fastapi import FastAPI
import uvicorn

# Enable uvloop globally
asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())

app = FastAPI()

@app.get("/api/data")
async def handle_data():
    # Simulate async database query
    await asyncio.sleep(0.1)
    return {"message": "Hello from Python"}

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=3000, loop="uvloop")
```

### Discord Bot (hikari-py)

```python
# @ https://github.com/hikari-py/hikari
import asyncio
import os

if os.name != "nt":  # Not Windows
    import uvloop
    asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())

# Discord bot code follows - automatic 2-4x performance boost
```

### Async Web Scraper

```python
# @ https://github.com/elliotgao2/gain
import asyncio
import uvloop
import aiohttp

async def handle_response(session, url):
    async with session.get(url) as response:
        return await response.text()

async def main():
    async with aiohttp.ClientSession() as session:
        tasks = [handle_response(session, f"https://api.example.com/item/{i}")
                 for i in range(1000)]
        results = await asyncio.gather(*tasks)
        return results

# Install and run
uvloop.install()
asyncio.run(main())
```

## Python Version Compatibility

| Python Version | uvloop Support | Notes |
| --- | --- | --- |
| 3.8-3.10 | ✅ Full | Use `uvloop.install()` or `asyncio.set_event_loop_policy()` |
| 3.11-3.13 | ✅ Full | Can use `uvloop.run()` or `asyncio.Runner(loop_factory=uvloop.new_event_loop)` |
| 3.14 | ✅ Full | Free-threading support added in v0.22.0 @ [#693](https://github.com/MagicStack/uvloop/pull/693) |

### Platform Support

- **Linux**: ✅ Full support (best performance)
- **macOS**: ✅ Full support
- **Windows**: ⚠️ Not supported (use default asyncio proactor loop)
- **BSD**: ✅ Supported (via libuv)

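When relying on platform-conditional installation like Pattern 4, it can be worth confirming which loop is actually running. A minimal sketch:

```python
# Print the class of the running event loop; expect uvloop.Loop when active.
import asyncio
import uvloop

async def main() -> None:
    loop = asyncio.get_running_loop()
    print(f"{type(loop).__module__}.{type(loop).__name__}")

uvloop.run(main())
```
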
## Performance Benchmarks

### Official Benchmarks

From @ [magic.io/blog/uvloop](https://magic.io/blog/uvloop-blazing-fast-python-networking/):

**Echo Server Performance (1 KiB messages):**

- uvloop: 105,000 req/sec
- Node.js: ~50,000 req/sec
- Standard asyncio: ~30,000 req/sec

**Throughput (100 KiB messages):**

- uvloop: 2.3 GiB/s
- Standard asyncio: 0.8 GiB/s

### Community Benchmarks (2024-2025)

@ [discuss.python.org](https://discuss.python.org/t/is-uvloop-still-faster-than-built-in-asyncio-event-loop/71136):

- **I/O-bound operations**: Python + uvloop is ~22% faster than Node.js
- **Native epoll comparison**: uvloop reaches 88% performance of native C epoll implementation
- **Overall speedup**: 2-4x faster than standard asyncio across workloads

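For context, the benchmark workload is a plain TCP echo server. A minimal sketch of that shape (not the official benchmark harness) looks like this; swap `uvloop.run` for `asyncio.run` to compare the two loops yourself:

```python
# Minimal TCP echo server in the spirit of the benchmarks above.
import asyncio
import uvloop

async def handle_echo(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    while data := await reader.read(1024):  # 1 KiB reads, matching the 1 KiB test
        writer.write(data)
        await writer.drain()
    writer.close()
    await writer.wait_closed()

async def main() -> None:
    server = await asyncio.start_server(handle_echo, "127.0.0.1", 8888)
    async with server:
        await server.serve_forever()

uvloop.run(main())  # replace with asyncio.run(main()) to measure the default loop
```
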
## When NOT to Use uvloop

### 1. Windows-Only Applications

```python
# BAD: uvloop doesn't work on Windows
import uvloop
uvloop.install()  # Will fail on Windows

# GOOD: Platform detection
import os
if os.name == 'posix':
    import uvloop
    uvloop.install()
```

### 2. CPU-Bound Tasks

uvloop optimizes I/O operations but won't speed up CPU-intensive work:

```python
# uvloop provides NO benefit here
async def cpu_intensive():
    result = sum(i**2 for i in range(10_000_000))
    return result

# Use multiprocessing instead for CPU-bound work
```

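Since the comment above points to multiprocessing, one way to keep the event loop responsive while CPU-bound work runs elsewhere is the stdlib `concurrent.futures` bridge. A sketch, not a prescription:

```python
# Offload CPU-bound work to a process pool so the loop stays responsive.
import asyncio
from concurrent.futures import ProcessPoolExecutor

def cpu_intensive() -> int:
    return sum(i * i for i in range(10_000_000))

async def main() -> None:
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as pool:  # separate processes sidestep the GIL
        result = await loop.run_in_executor(pool, cpu_intensive)
    print(result)

if __name__ == "__main__":  # guard required for process pools on spawn platforms
    asyncio.run(main())
```
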
### 3. Debugging AsyncIO Code

The default asyncio loop has better debugging support:

```python
# For debugging, use standard asyncio with debug mode
import asyncio

# Don't install uvloop during development/debugging
asyncio.run(main(), debug=True)  # Better error messages with standard loop
```

### 4. Simple Scripts with Minimal I/O

```python
# Overkill for trivial async work
async def simple_task():
    await asyncio.sleep(1)
    print("Done")

# uvloop adds minimal value here - overhead not justified
```

## Decision Matrix

### Use uvloop when

- ✅ Building production web servers (FastAPI, Sanic, etc.)
- ✅ High-throughput network applications
- ✅ WebSocket servers with many concurrent connections
- ✅ Async web scrapers/crawlers
- ✅ Running on Linux or macOS
- ✅ I/O-bound workloads dominate
- ✅ Zero-code-change performance boost desired

### Use default asyncio when

- ❌ Running on Windows
- ❌ Debugging complex async code
- ❌ CPU-bound workloads
- ❌ Simple scripts with minimal networking
- ❌ Maximum compatibility needed
- ❌ Educational/learning purposes (asyncio is simpler)

## Installation

### Basic Installation

```bash
pip install uvloop
```

### With uvicorn (ASGI server)

```bash
# uvloop automatically included with standard install
pip install 'uvicorn[standard]'
```

### Development/Source Build

```bash
# Requires Cython
pip install Cython
git clone --recursive https://github.com/MagicStack/uvloop.git
cd uvloop
pip install -e .[dev]
make
make test
```

## Integration with Common Frameworks

### FastAPI/Uvicorn

uvloop is automatically used when uvicorn is installed with `[standard]` extras:

```bash
pip install 'uvicorn[standard]'  # Includes uvloop
```

### Sanic

Sanic automatically detects and uses uvloop if available:

```bash
pip install sanic uvloop
```

### aiohttp + gunicorn

```bash
# Use uvloop worker class
gunicorn app:create_app --worker-class aiohttp.worker.GunicornUVLoopWebWorker
```

### Tornado

```python
from tornado.platform.asyncio import AsyncIOMainLoop
import asyncio
import uvloop

asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
AsyncIOMainLoop().install()
```

## Common Pitfalls

### Pitfall 1: Installing After Event Loop Created

```python
# BAD: Event loop already created
import asyncio
loop = asyncio.get_event_loop()  # Creates default loop
import uvloop
uvloop.install()  # Too late!

# GOOD: Install before any event loop operations
import uvloop
uvloop.install()
import asyncio
loop = asyncio.get_event_loop()  # Now uses uvloop
```

### Pitfall 2: Windows Compatibility Assumptions

```python
# BAD: Crashes on Windows
import uvloop
uvloop.install()

# GOOD: Platform check
import sys
if sys.platform != 'win32':
    import uvloop
    uvloop.install()
```

### Pitfall 3: Expecting CPU Performance Gains

```python
# BAD: uvloop won't help CPU-bound code
async def calculate_primes(n):
    return [i for i in range(2, n) if all(i % j != 0 for j in range(2, i))]

# uvloop provides NO benefit for pure computation
```

## Maintenance and Ecosystem

- **Active Development**: ✅ Maintained by MagicStack (creators of EdgeDB)
- **Release Cadence**: Regular updates (v0.22.1 released Oct 2025)
- **Community Size**: 10,000+ stars on GitHub, used in production by major projects
- **Dependency**: libuv (bundled, no external dependency management)
- **Python 3.14 Support**: ✅ Free-threading support added

## Related Libraries

- **httptools**: Fast HTTP parser (also by MagicStack, pairs with uvloop)
- **uvicorn**: ASGI server using uvloop by default
- **aiohttp**: Async HTTP client/server framework
- **websockets**: WebSocket library compatible with uvloop
- **Sanic**: Web framework optimized for uvloop

## References

- Official Repository: @ [MagicStack/uvloop](https://github.com/MagicStack/uvloop)
- Documentation: @ [uvloop.readthedocs.io](https://uvloop.readthedocs.io/)
- Original Blog Post: @ [magic.io/blog/uvloop](https://magic.io/blog/uvloop-blazing-fast-python-networking/)
- PyPI: @ [pypi.org/project/uvloop](https://pypi.org/project/uvloop/)
- Performance Discussion (2024): @ [discuss.python.org](https://discuss.python.org/t/is-uvloop-still-faster-than-built-in-asyncio-event-loop/71136)
- uvicorn Integration: @ [encode/uvicorn](https://github.com/encode/uvicorn)
- Sanic Framework: @ [sanic-org/sanic](https://github.com/sanic-org/sanic)

## Summary

uvloop represents the gold standard for asyncio performance optimization in Python. It requires minimal code changes (often just 2 lines) while delivering 2-4x performance improvements for I/O-bound async applications. Production deployments should default to uvloop on Linux/macOS systems unless specific compatibility or debugging requirements dictate otherwise. The library's maturity, active maintenance, and widespread adoption in high-performance Python web frameworks make it a critical component of the modern Python async ecosystem.

@@ -0,0 +1,636 @@
---
title: "Python Development Orchestration Guide"
description: "Guide for orchestrating Python development tasks using specialized agents and commands"
version: "1.0.0"
last_updated: "2025-11-02"
document_type: "guide"
python_compatibility: "3.11+"
related_docs:
  - "../SKILL.md"
  - "./modern-modules.md"
  - "./tool-library-registry.md"
---

# Python Development Orchestration Guide

Comprehensive guide for orchestrating Python development tasks using specialized agents and commands. This guide provides detailed workflows and patterns for coordinating multiple agents to accomplish complex Python development goals.

**Quick Reference**: For a concise overview and quick-start examples, see [SKILL.md](../SKILL.md).

## Available Agents and Commands

### Agents (in ~/.claude/agents/)

- **python-cli-architect**: Build modern CLI applications with Typer and Rich
- **python-portable-script**: Create stdlib-only portable scripts
- **python-pytest-architect**: Design comprehensive test suites
- **python-code-reviewer**: Review Python code for quality and standards
- **spec-architect**: Design system architecture
- **spec-planner**: Break down tasks into implementation plans

### Commands (in this skill: references/commands/)

- **/modernpython**: Apply Python 3.11+ best practices and modern patterns
- **/shebangpython**: Validate PEP 723 shebang compliance

### External Skills

- **uv**: Package management with uv (always use for Python dependency management)

## Core Workflow Patterns

### 1. TDD Workflow (Test-Driven Development)

**When to use**: Building new features, fixing bugs with test coverage

**Pattern**:

```text
1. Design → @agent-spec-architect
   Input: Feature requirements
   Output: Architecture design, component interfaces

2. Write Tests → @agent-python-pytest-architect
   Input: Architecture design, expected behavior
   Output: Complete test suite (fails initially)

3. Implement → @agent-python-cli-architect OR @agent-python-portable-script
   Input: Tests, architecture design
   Output: Implementation that makes tests pass

4. Review → @agent-python-code-reviewer
   Input: Implementation + tests
   Output: Review feedback, improvement suggestions

5. Validate (follow Linting Discovery Protocol)
   - If .pre-commit-config.yaml exists: `uv run pre-commit run --files <files>`
   - Else: Format → Lint → Type check → Test in sequence
   - Apply: /modernpython to check modern patterns
   - Verify: CI compatibility by checking .gitlab-ci.yml or .github/workflows/
```

**Example**:

```text
User: "Build a CLI tool to process CSV files with progress bars"

Step 1: @agent-spec-architect
  "Design architecture for CSV processing CLI with progress tracking"
  → Architecture design with components

Step 2: @agent-python-pytest-architect
  "Create test suite for CSV processor based on this architecture"
  → Test files in tests/

Step 3: @agent-python-cli-architect
  "Implement CSV processor CLI with Typer+Rich based on these tests"
  → Implementation in packages/

Step 4: @agent-python-code-reviewer
  "Review this implementation against the architecture and test requirements"
  → Review findings, suggested improvements

Step 5: Validate
  → All tests pass, coverage >80%, linting clean
```

### 2. Feature Addition Workflow

**When to use**: Adding new functionality to existing codebase

**Pattern**:

```text
1. Requirements → User or @agent-spec-analyst
   Output: Clear requirements, acceptance criteria

2. Architecture → @agent-spec-architect
   Input: Requirements, existing codebase structure
   Output: Design that integrates with existing code

3. Implementation Plan → @agent-spec-planner
   Input: Architecture design
   Output: Step-by-step implementation tasks

4. Implement → @agent-python-cli-architect OR @agent-python-portable-script
   Input: Implementation plan, existing code patterns
   Output: New feature implementation

5. Testing → @agent-python-pytest-architect
   Input: Implementation, edge cases
   Output: Tests for new feature + integration tests

6. Review → @agent-python-code-reviewer
   Input: All changes (implementation + tests)
   Output: Quality assessment, improvements

7. Validate
   - Check: No regressions in existing tests
   - Verify: New feature has >80% coverage
   - Apply: /modernpython for consistency
```

### 3. Code Review Workflow

**When to use**: Before merging changes, during PR review

**Pattern**:

```text
1. Self-Review → Apply /modernpython
   Check: Modern Python patterns used
   Check: No legacy typing imports

2. Standards Validation → Apply /shebangpython (if scripts)
   Check: PEP 723 compliance
   Check: Correct shebang format

3. Agent Review → @agent-python-code-reviewer
   Input: All changed files
   Output: Comprehensive review findings

4. Fix Issues → Appropriate agent
   Input: Review findings
   Output: Corrections

5. Re-validate
   - Run: uv run pre-commit run --all-files
   - Run: uv run pytest
   - Verify: All review issues addressed
```

### 4. Refactoring Workflow

**When to use**: Improving code structure without changing behavior

**Pattern**:

```text
1. Tests First → Verify existing test coverage
   Check: Tests exist for code being refactored
   Check: Tests pass before refactoring
   If missing: @agent-python-pytest-architect creates tests

2. Refactor → @agent-python-cli-architect or @agent-python-portable-script
   Input: Code to refactor + test suite
   Constraint: Must not break existing tests
   Output: Refactored code

3. Validate → Tests still pass
   Run: uv run pytest
   Verify: Coverage maintained or improved

4. Review → @agent-python-code-reviewer
   Input: Before/after comparison
   Output: Verification refactoring improved quality

5. Apply Standards (follow Linting Discovery Protocol)
   - Apply: /modernpython for modern patterns
   - If .pre-commit-config.yaml exists: `uv run pre-commit run --files <files>`
   - Else: Format → Lint (with --fix) → Type check in sequence
```

### 5. Debugging Workflow

**When to use**: Investigating and fixing bugs

**Pattern**:

```text
1. Reproduce → Write failing test
   @agent-python-pytest-architect
   Input: Bug description, steps to reproduce
   Output: Test that demonstrates bug

2. Trace → Investigate root cause
   Use: Debugging tools, logging
   Identify: Specific code causing issue

3. Fix → Appropriate agent
   @agent-python-cli-architect or @agent-python-portable-script
   Input: Failing test + root cause
   Output: Fix that makes test pass

4. Test → Verify fix + no regressions
   Run: Full test suite
   Verify: Bug test now passes
   Verify: No other tests broke

5. Review → @agent-python-code-reviewer
   Input: Fix + test
   Output: Verification fix is proper solution

6. Validate
   - Apply: /modernpython
   - Run: uv run pre-commit run --files <changed>
```

## Agent Selection Guide

### When to Use python-cli-architect

**Use when**:

- **DEFAULT choice for scripts and CLI tools**
- Building command-line applications with rich user interaction
- Need progress bars, tables, colored output
- User-facing CLI tools and automation scripts
- Any script where UX matters (formatted output, progress feedback)
- PEP 723 + uv available (internet access present)

**Characteristics**:

- Uses Typer for CLI framework
- Uses Rich for terminal output
- Focuses on UX and polish
- PEP 723 makes dependencies transparent (single file)
- Better UX than stdlib alternatives
- Works anywhere with Python 3.11+ and internet access

**Complexity Advantage** (IMPORTANT):

- ✅ **LESS development complexity** - Libraries handle the hard work (argument parsing, output formatting, validation)
- ✅ **LESS code to write** - Typer CLI boilerplate and Rich formatting come built-in
- ✅ **Better UX** - Professional output with minimal effort
- ✅ **Just as portable** - PEP 723 + uv makes single-file scripts with dependencies work seamlessly

**This agent is EASIER to use than stdlib-only approaches. Choose this as the default unless portability restrictions exist.**

**Rich Width Handling**: For Rich Panel/Table width issues in CI/non-TTY environments, see [Typer and Rich CLI Examples](../assets/typer_examples/index.md) for complete solutions including the `get_rendered_width()` helper pattern.

**Example tasks**:

- "Build a CLI tool to manage database backups with progress bars"
- "Create an interactive file browser with color-coded output"
- "Create a script to scan git repositories and show status tree"
- "Build a deployment verification tool with progress bars"

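As a rough illustration of the style this agent produces (a sketch, not the reference implementation mentioned later in this guide): the command name and table contents are invented, and it assumes `typer` is installed, which bundles Rich.

```python
# Minimal Typer + Rich sketch; the command and data are illustrative only.
import typer
from rich.console import Console
from rich.table import Table

app = typer.Typer()
console = Console()

@app.command()
def status(verbose: bool = typer.Option(False, "--verbose", "-v")) -> None:
    """Show a small status table."""
    table = Table(title="Repositories")
    table.add_column("Name")
    table.add_column("State")
    table.add_row("example-repo", "clean")
    if verbose:
        console.print("Scanned 1 repository")
    console.print(table)

if __name__ == "__main__":
    app()
```
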
### When to Use python-portable-script

**Use when** (RARE - ask user first if unclear):

- **Restricted environment**: No internet access (airgapped, embedded systems)
- **No uv available**: Locked-down systems where uv cannot be installed
- **Hard stdlib-only requirement**: Explicitly requested by user
- **1% case**: Only when deployment environment truly restricts dependencies

**Characteristics**:

- Stdlib only (argparse, pathlib, subprocess); no PEP 723 metadata needed, since there is nothing to declare
- Defensive error handling
- Cross-platform compatibility
- Use PEP 723 ONLY if adding external dependencies later
- Ask deployment environment questions before choosing this agent
- This is the EXCEPTION, not the rule
- Consider python-cli-architect first unless restrictions confirmed

**Complexity Trade-off** (IMPORTANT):

- ❌ **MORE development complexity** - Manual implementation of everything (argument parsing, output formatting, validation, error handling)
- ❌ **MORE code to write** - Build from scratch what libraries already provide as tested code
- ❌ **Basic UX** - Limited formatting capabilities
- ✅ **Maximum portability** - The ONLY reason to choose this: runs anywhere Python exists without network access

**This agent is NOT simpler to use - it requires MORE work to build the same functionality. Choose it ONLY for portability, not for simplicity.**

**Note**: Only use this agent if deployment environment restrictions are confirmed. With PEP 723 + uv, python-cli-architect is preferred for better UX. ASK: "Will this run without internet access or where uv cannot be installed?" See [PEP 723 Reference](./PEP723.md) for details on when to use inline script metadata.

**Example tasks**:

- "Create a deployment script using only stdlib"
- "Build a config file validator that runs without dependencies"

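For contrast, a stdlib-only equivalent of the small Typer sketch above might look like this; every convenience is hand-rolled, and the names are again illustrative:

```python
# Stdlib-only counterpart: argparse plus manual formatting, no Rich available.
import argparse

def main() -> None:
    parser = argparse.ArgumentParser(description="Show a small status table.")
    parser.add_argument("-v", "--verbose", action="store_true")
    args = parser.parse_args()

    # Manual "table" output via f-string column widths
    print(f"{'Name':<15}{'State':<10}")
    print(f"{'example-repo':<15}{'clean':<10}")
    if args.verbose:
        print("Scanned 1 repository")

if __name__ == "__main__":
    main()
```
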
## Agent Selection Decision Process

### For Scripts and CLI Tools

**Step 1: Default to python-cli-architect**

- Provides better UX (Rich components, progress bars, tables)
- PEP 723 + uv handles dependencies (still single file)
- Works in 99% of scenarios

**Step 2: Only use python-portable-script if:**

- User explicitly states "stdlib only" requirement
- OR deployment environment is confirmed restricted:
  - No internet access (airgapped network, embedded system)
  - uv cannot be installed (locked-down corporate environment)
  - Security policy forbids external dependencies

**Step 3: When uncertain, ASK:**

1. "Where will this script be deployed?"
2. "Does the environment have internet access?"
3. "Can uv be installed in the target environment?"
4. "Is stdlib-only a hard requirement, or would you prefer better UX?"

**Decision Tree**:

```text
Does the deployment environment have internet access?
├─ YES → Use python-cli-architect (default)
│        Single file + PEP 723 + uv = transparent dependencies
│
└─ NO → Is uv installable in the environment?
        ├─ YES → Use python-cli-architect (default)
        │        uv can cache dependencies for offline use
        │
        └─ NO → Use python-portable-script (exception)
                Truly restricted environment requires stdlib-only
```

If answers indicate normal environment → python-cli-architect

If answers indicate restrictions → python-portable-script

**When in doubt**: Use python-cli-architect. PEP 723 + uv makes single-file scripts with dependencies just as portable as stdlib-only scripts for 99% of deployment scenarios.

### When to Use python-pytest-architect

**Use when**:

- Designing test suites from scratch
- Need comprehensive test coverage strategy
- Implementing advanced testing (property-based, mutation)
- Test architecture decisions

**Characteristics**:

- Modern pytest patterns
- pytest-mock exclusively (never unittest.mock)
- AAA pattern (Arrange-Act-Assert)
- Coverage and mutation testing

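A minimal sketch of that AAA style with pytest-mock's `mocker` fixture (requires pytest and pytest-mock); `send_invoice` is a hypothetical function defined inline so the example is self-contained:

```python
# AAA pattern with pytest-mock; run with `uv run pytest`.
def send_invoice(gateway, amount: int) -> str:
    """Hypothetical unit under test."""
    return gateway.charge(amount)

def test_send_invoice_charges_gateway(mocker):
    # Arrange: build a mock payment gateway
    gateway = mocker.Mock()
    gateway.charge.return_value = "ok"

    # Act: exercise the unit under test
    result = send_invoice(gateway, amount=100)

    # Assert: verify the interaction and the result
    gateway.charge.assert_called_once_with(100)
    assert result == "ok"
```
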
**Example tasks**:

- "Design test suite for payment processing module"
- "Create property-based tests for data validation"

### When to Use python-code-reviewer

**Use when**:

- Reviewing code for quality, patterns, standards
- Post-implementation validation
- Pre-merge code review
- Identifying improvement opportunities

**Characteristics**:

- Checks against modern Python standards
- Identifies anti-patterns
- Suggests improvements
- Validates against project patterns

**Example tasks**:

- "Review this PR for code quality"
- "Check if implementation follows best practices"

## Command Usage Patterns

### /modernpython

**Apply to**: Load as reference guide (optional file path argument for context)

**Use when**:

- As reference guide when writing new code
- Learning modern Python 3.11-3.14 features and patterns
- Understanding official PEPs (585, 604, 695, etc.)
- Identifying legacy patterns to avoid
- Finding modern alternatives for old code

**Note**: This is a reference document to READ, not an automated validation tool.

**Usage**:

```text
/modernpython
→ Loads comprehensive reference guide
→ Provides Python 3.11+ pattern examples
→ Includes PEP citations with WebFetch commands
→ Shows legacy patterns to avoid
→ Shows modern alternatives to use
→ Framework-specific guides (Typer, Rich, pytest)
```

**With file path**:

```text
/modernpython packages/mymodule.py
→ Loads guide for reference while working on specified file
→ Use guide to manually identify and refactor legacy patterns
```

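As a flavor of the legacy-to-modern rewrites the guide flags, here is a before/after of two of the cited PEPs: builtin generics (PEP 585) and union syntax (PEP 604). The function itself is invented for illustration:

```python
# Legacy style (pre-3.9/3.10): typing-module generics and Optional
from typing import Dict, List, Optional

def summarize_legacy(rows: List[Dict[str, int]], limit: Optional[int] = None) -> List[str]:
    return [str(row) for row in rows[:limit]]

# Modern style (Python 3.11+): builtin generics (PEP 585), unions (PEP 604)
def summarize(rows: list[dict[str, int]], limit: int | None = None) -> list[str]:
    return [str(row) for row in rows[:limit]]
```
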
### /shebangpython

**Apply to**: Individual Python scripts

**Use when**:

- Creating new standalone scripts
- Ensuring PEP 723 compliance
- Correcting script configuration

**Pattern**:

```text
/shebangpython scripts/deploy.py
→ Analyzes imports to determine dependency type
→ **Corrects shebang** to match script type (edits file if wrong)
→ **Adds PEP 723 metadata** if external dependencies detected (edits file)
→ **Removes PEP 723 metadata** if stdlib-only (edits file)
→ Sets execute bit if needed
→ Provides detailed verification report
```

## Integration with uv Skill

**Always use uv skill for**:

- Package management: `uv add <package>`
- Running scripts: `uv run script.py`
- Running tools: `uv run pytest`, `uv run ruff`
- Creating projects: `uv init`

**Never use**:

- `pip install` (use `uv add`)
- `python -m pip` (use `uv`)
- `pipenv`, `poetry` (use `uv`)

## Quality Gates

**CRITICAL**: The orchestrator MUST instruct agents to follow the Linting Discovery Protocol from the main SKILL.md before executing quality checks.

**Linting Discovery Protocol** (see SKILL.md for full details):

1. **Check for pre-commit**: If `.pre-commit-config.yaml` exists, use `uv run pre-commit run --files <files>`
2. **Else check CI config**: Read `.gitlab-ci.yml` or `.github/workflows/*.yml` for exact linting commands
3. **Else detect tools**: Check `pyproject.toml` for configured dev tools

**Format-First Requirement**: ALWAYS format before linting (formatting fixes many lint issues automatically)

**Every Python development task must pass**:

1. **Format-first**: `uv run ruff format <files>` (or via pre-commit)
2. **Linting**: `uv run ruff check <files>` (clean, after formatting)
3. **Type checking**: Use **detected type checker** (`basedpyright`, `pyright`, or `mypy`)
4. **Tests**: `uv run pytest` (>80% coverage)
5. **Standards**: `/modernpython` for modern patterns
6. **Script compliance**: `/shebangpython` for standalone scripts

**Preferred execution** (when `.pre-commit-config.yaml` exists):

```bash
# This runs ALL checks in correct order (format → lint → type → test)
uv run pre-commit run --files <changed_files>
```

**For critical code** (payments, auth, security):

- Coverage: >95%
- Mutation testing: `uv run mutmut run`
- Security scan: `uv run bandit -r packages/`

**CI Compatibility**: After local checks pass, verify CI requirements are met by checking CI config files for additional validators.

## Reference Example

**Complete working example**: `~/.claude/agents/python-cli-demo.py`

This file demonstrates all modern Python CLI patterns:

- PEP 723 inline script metadata with correct shebang
- Typer + Rich integration (Typer includes Rich, don't add separately)
- Modern Python 3.11+ patterns (StrEnum, Protocol, TypeVar, etc.)
- Proper type annotations with Annotated syntax
- Rich components (Console, Progress, Table, Panel)
- Async processing patterns
- Comprehensive docstrings

Use this as the reference implementation when creating CLI tools.

## Examples of Complete Workflows

### Example: Building a CLI Tool

```text
User: "Build a CLI tool to validate YAML configurations"

Orchestrator:
1. @agent-spec-architect
   "Design architecture for YAML validation CLI"
   → Component design, validation rules

2. @agent-python-pytest-architect
   "Create test suite for YAML validator"
   → tests/test_validator.py with fixtures

3. @agent-python-cli-architect
   "Implement YAML validator CLI with Typer based on tests"
   Reference: ~/.claude/agents/python-cli-demo.py for patterns
   → packages/validator.py with Typer+Rich UI

4. Validation (Linting Discovery Protocol):
   /shebangpython packages/validator.py
   # If .pre-commit-config.yaml exists:
   uv run pre-commit run --files packages/validator.py tests/
   # Else:
   uv run ruff format packages/ tests/
   uv run ruff check packages/ tests/
   uv run <detected-type-checker> packages/ tests/
   uv run pytest

5. @agent-python-code-reviewer
   "Review validator implementation"
   → Quality check, improvements

6. Fix any issues and re-validate
```

### Example: Fixing a Bug

```text
User: "Fix bug where CSV parser fails on empty rows"

Orchestrator:
1. @agent-python-pytest-architect
   "Write test that reproduces CSV parser bug with empty rows"
   → tests/test_csv_parser.py::test_empty_rows (failing)

2. @agent-python-cli-architect
   "Fix CSV parser to handle empty rows, making test pass"
   → packages/csv_parser.py updated

3. Validation:
   uv run pytest tests/test_csv_parser.py::test_empty_rows  # Verify bug test passes
   uv run pytest                                            # Verify no regression in the full suite

4. @agent-python-code-reviewer
   "Review bug fix and test"
   → Verify proper solution

5. Apply standards (Linting Discovery Protocol):
   /modernpython packages/csv_parser.py
   # If .pre-commit-config.yaml exists:
   uv run pre-commit run --files packages/csv_parser.py tests/
   # Else: Format → Lint (--fix) → Type check sequence
```

## Anti-Patterns to Avoid

### Don't: Write Python code as orchestrator

```text
❌ Orchestrator writes implementation directly
```

### Do: Delegate to appropriate agent

```text
✅ @agent-python-cli-architect writes implementation
✅ @agent-python-code-reviewer validates it
```

### Don't: Skip validation steps

```text
❌ Implement → Done (no tests, no review, no linting)
```

### Do: Follow complete workflow

```text
✅ Implement → Test → Review → Validate → Done
```

### Don't: Mix agent contexts

```text
❌ Ask python-portable-script to build Typer CLI
❌ Ask python-cli-architect to avoid all dependencies
```

### Do: Choose correct agent for context

```text
✅ python-cli-architect for user-facing CLI tools
✅ python-portable-script for stdlib-only scripts
```

## Summary

**Orchestration = Coordination, Not Implementation**

1. Choose the right agent for the task
2. Provide clear inputs and context
3. Chain agents for complex workflows (architect → test → implement → review)
4. Always validate with quality gates
5. Use commands for standards checking
6. Integrate with uv skill for package management

**Success = Right agent + Clear inputs + Proper validation**

---
title: User Project Conventions
date: 2025-11-17
source: Extracted from user's production projects
projects_analyzed:
  - pre-commit-pep723-linter-wrapper (PyPI/GitHub)
  - python_picotool (GitLab)
  - usb_powertools (GitLab)
  - picod (GitLab)
  - i2c_analyzer (GitLab)
---

# User Project Conventions

Conventions extracted from actual production projects. The model MUST follow these patterns when creating new Python projects.

## Asset Files Available

The following template files are available in the skill's `assets/` directory for use in new projects:

| File                      | Purpose                                              | Usage                                                 |
| ------------------------- | ---------------------------------------------------- | ----------------------------------------------------- |
| `version.py`              | Dual-mode version management (hatch-vcs + fallback)  | Copy to `packages/{package_name}/version.py`          |
| `hatch_build.py`          | Build hook for binary/asset handling                 | Copy to `scripts/hatch_build.py`                      |
| `.markdownlint.json`      | Markdown linting configuration (most rules disabled) | Copy to project root                                  |
| `.pre-commit-config.yaml` | Standard pre-commit hooks configuration              | Copy to project root, run `uv run pre-commit install` |
| `.editorconfig`           | Editor formatting settings                           | Copy to project root                                  |

The model MUST copy these files when creating new Python projects to ensure consistency with established conventions documented below.

## 1. Version Management

### Pattern: Dual-mode version.py (STANDARD - 5/5 projects)

**Location**: `packages/{package_name}/version.py`

**Pattern**: Hatch-VCS with importlib.metadata fallback

**Implementation**:

```python
"""Compute the version number and store it in the `__version__` variable.

Based on <https://github.com/maresb/hatch-vcs-footgun-example>.
"""

# /// script
# List dependencies for linting only
# dependencies = [
#     "hatchling>=1.14.0",
# ]
# ///
import os


def _get_hatch_version() -> str | None:
    """Compute the most up-to-date version number in a development environment.

    Returns `None` if Hatchling is not installed, e.g. in a production environment.

    For more details, see <https://github.com/maresb/hatch-vcs-footgun-example/>.
    """
    try:
        from hatchling.metadata.core import ProjectMetadata
        from hatchling.plugin.manager import PluginManager
        from hatchling.utils.fs import locate_file
    except ImportError:
        # Hatchling is not installed, so probably we are not in
        # a development environment.
        return None

    pyproject_toml = locate_file(__file__, "pyproject.toml")
    if pyproject_toml is None:
        raise RuntimeError("pyproject.toml not found although hatchling is installed")
    root = os.path.dirname(pyproject_toml)
    metadata = ProjectMetadata(root=root, plugin_manager=PluginManager())
    # Version can be either statically set in pyproject.toml or computed dynamically:
    return str(metadata.core.version or metadata.hatch.version.cached)


def _get_importlib_metadata_version() -> str:
    """Compute the version number using importlib.metadata.

    This is the official Pythonic way to get the version number of an installed
    package. However, it is only updated when a package is installed. Thus, if a
    package is installed in editable mode, and a different version is checked out,
    then the version number will not be updated.
    """
    from importlib.metadata import version

    __version__ = version(__package__ or __name__)
    return __version__


__version__ = _get_hatch_version() or _get_importlib_metadata_version()
```

**pyproject.toml Configuration** (STANDARD - 5/5 projects):

```toml
[project]
dynamic = ["version"]

[tool.hatch.version]
source = "vcs"

[build-system]
requires = ["hatchling", "hatch-vcs"]
build-backend = "hatchling.build"
```

**`__init__.py` Export Pattern** (STANDARD - 5/5 projects):

```python
from .version import __version__

__all__ = ["__version__"]  # Plus other exports
```

## 2. Package Structure

### Pattern: src-layout with packages/ directory (STANDARD - 5/5 projects)

**Directory Structure**:

```text
project_root/
├── packages/
│   └── {package_name}/
│       ├── __init__.py      # Exports public API + __version__
│       ├── version.py       # Version management
│       ├── {modules}.py
│       └── tests/           # Co-located tests
├── scripts/
│   └── hatch_build.py       # Custom build hook (if needed)
├── pyproject.toml
└── README.md
```

**pyproject.toml Package Mapping** (STANDARD - 5/5 projects):

```toml
[tool.hatch.build.targets.wheel]
packages = ["packages/{package_name}"]

[tool.hatch.build.targets.wheel.sources]
"packages/{package_name}" = "{package_name}"
```

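The effect of this mapping is that the `packages/` prefix disappears at install time. A minimal sketch, assuming a hypothetical package named `my_package`:

```python
# With the wheel source mapping above, installed code imports by its
# bare name, never as "packages.my_package" ("my_package" is hypothetical):
from my_package import __version__

print(__version__)
```
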
### Pattern: `__init__.py` exports with `__all__` (STANDARD - 5/5 projects)

The model must export public API + `__version__` in `__init__.py` with explicit `__all__` list.

**Minimal Example** (usb_powertools):

```python
"""Package docstring."""

from .version import __version__

__all__ = ["__version__"]
```

**Full API Example** (pep723_loader):

```python
"""Package docstring."""

from .pep723_checker import Pep723Checker
from .version import __version__

__all__ = ["Pep723Checker", "__version__"]
```

**Evidence**: All 5 projects use this pattern consistently.

## 3. Build Configuration

### Pattern: Custom hatch_build.py Hook (STANDARD - 3/5 projects with binaries)

**Location**: `scripts/hatch_build.py`

**Purpose**: Execute binary build scripts (`build-binaries.sh` or `build-binaries.py`) before packaging.

**Standard Implementation** (usb_powertools, picod, i2c_analyzer identical):

```python
"""Custom hatchling build hook for binary compilation.

This hook runs before the build process to compile platform-specific binaries
if build scripts are present in the project.
"""

from __future__ import annotations

import shutil
import subprocess
from pathlib import Path
from typing import Any

from hatchling.builders.config import BuilderConfig
from hatchling.builders.hooks.plugin.interface import BuildHookInterface


class BinaryBuildHook(BuildHookInterface[BuilderConfig]):
    """Build hook that runs binary compilation scripts before packaging.

    This hook checks for the following scripts in order:
    1. scripts/build-binaries.sh
    2. scripts/build-binaries.py

    If either script exists, it is executed before the build process.
    If neither exists, the hook silently continues without error.
    """

    PLUGIN_NAME = "binary-build"

    def initialize(self, version: str, build_data: dict[str, Any]) -> None:
        """Run binary build scripts if they exist."""
        shell_script = Path(self.root) / "scripts" / "build-binaries.sh"
        if shell_script.exists() and shell_script.is_file():
            self._run_shell_script(shell_script)
            return

        python_script = Path(self.root) / "scripts" / "build-binaries.py"
        if python_script.exists() and python_script.is_file():
            self._run_python_script(python_script)
            return

        self.app.display_info("No binary build scripts found, skipping binary compilation")

    def _run_shell_script(self, script_path: Path) -> None:
        """Execute a shell script for binary building."""
        self.app.display_info(f"Running binary build script: {script_path}")

        if not (bash := shutil.which("bash")):
            raise RuntimeError("bash not found - cannot execute shell script")

        try:
            result = subprocess.run([bash, str(script_path)], cwd=self.root, capture_output=True, text=True, check=True)
            if result.stdout:
                self.app.display_info(result.stdout)
            if result.stderr:
                self.app.display_warning(result.stderr)
        except subprocess.CalledProcessError as e:
            self.app.display_error(f"Binary build script failed with exit code {e.returncode}")
            if e.stdout:
                self.app.display_info(f"stdout: {e.stdout}")
            if e.stderr:
                self.app.display_error(f"stderr: {e.stderr}")
            raise

    def _run_python_script(self, script_path: Path) -> None:
        """Execute a Python script for binary building.

        Executes the script directly using its shebang, which honors PEP 723
        inline metadata for dependency management via uv.
        """
        self.app.display_info(f"Running binary build script: {script_path}")

        try:
            result = subprocess.run([script_path, "--clean"], cwd=self.root, capture_output=True, text=True, check=True)
            if result.stdout:
                self.app.display_info(result.stdout)
            if result.stderr:
                self.app.display_warning(result.stderr)
        except subprocess.CalledProcessError as e:
            self.app.display_error(f"Binary build script failed with exit code {e.returncode}")
            if e.stdout:
                self.app.display_info(f"stdout: {e.stdout}")
            if e.stderr:
                self.app.display_error(f"stderr: {e.stderr}")
            raise
```

**pyproject.toml Configuration**:

```toml
[tool.hatch.build.targets.sdist.hooks.custom]
path = "scripts/hatch_build.py"

[tool.hatch.build]
artifacts = ["builds/*/binary_name"] # If binaries included
```

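A minimal sketch of a `scripts/build-binaries.py` that the hook above could invoke. The shebang form, the empty dependency list, and the build step are illustrative assumptions, not one of the analyzed projects' scripts:

```python
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.11"
# dependencies = []
# ///
"""Illustrative binary build script; BinaryBuildHook runs it with --clean."""

import shutil
import sys
from pathlib import Path

BUILDS = Path(__file__).resolve().parent.parent / "builds"


def main() -> int:
    """Optionally clean, then (re)build platform-specific binaries."""
    if "--clean" in sys.argv and BUILDS.exists():
        shutil.rmtree(BUILDS)  # the hook always passes --clean
    BUILDS.mkdir(parents=True, exist_ok=True)
    # ... compile platform-specific binaries into BUILDS here ...
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
```

Because the hook executes the script via its shebang, the PEP 723 header lets uv resolve any build-time dependencies without touching the project environment.
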
## 4. Pre-commit Configuration

### Standard Hook Set (STANDARD - 5/5 projects)

**File**: `.pre-commit-config.yaml`

**Core Hooks** (appear in all projects):

```yaml
repos:
  - repo: https://github.com/mxr/sync-pre-commit-deps
    rev: v0.0.3
    hooks:
      - id: sync-pre-commit-deps

  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v6.0.0
    hooks:
      - id: trailing-whitespace
        exclude: \.lock$
      - id: end-of-file-fixer
        exclude: \.lock$
      - id: check-yaml
      - id: check-json
      - id: check-toml
      - id: check-added-large-files
        args: ["--maxkb=10000"] # 10MB limit
      - id: check-case-conflict
      - id: check-merge-conflict
      - id: check-symlinks
      - id: mixed-line-ending
        args: ["--fix=lf"]
      - id: check-executables-have-shebangs
      - id: check-shebang-scripts-are-executable

  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.13.3 # or newer
    hooks:
      - id: ruff
        name: Lint Python with ruff
        args: [--fix, --exit-non-zero-on-fix]
      - id: ruff-format
        name: Format Python with ruff

  - repo: https://github.com/pre-commit/mirrors-prettier
    rev: v4.0.0-alpha.8
    hooks:
      - id: prettier
        name: Format YAML, JSON, and Markdown files
        types_or: [yaml, json, markdown]
        exclude: \.lock$

  - repo: https://github.com/pecigonzalo/pre-commit-shfmt
    rev: v2.2.0
    hooks:
      - id: shell-fmt-go
        args: ["--apply-ignore", -w, -i, "4", -ci]

  - repo: https://github.com/shellcheck-py/shellcheck-py
    rev: v0.11.0.1
    hooks:
      - id: shellcheck

default_language_version:
  python: python3

exclude: |
  (?x)^(
      \.git/|
      \.venv/|
      __pycache__/|
      \.mypy_cache/|
      \.cache/|
      \.pytest_cache/|
      \.lock$|
      typings/
  )
```

### Pattern: pep723-loader for Type Checking (STANDARD - 3/5 projects)

Projects using `pep723-loader` wrapper for mypy/basedpyright:

```yaml
- repo: local
  hooks:
    - id: mypy
      name: mypy
      entry: uv run -q --no-sync --with pep723-loader --with mypy pep723-loader mypy
      language: system
      types: [python]
      pass_filenames: true

    - id: pyright
      name: basedpyright
      entry: uv run -q --no-sync --with pep723-loader --with basedpyright pep723-loader basedpyright
      language: system
      types: [python]
      pass_filenames: true
      require_serial: true
```

### Pattern: Markdown Linting (STANDARD - 4/5 projects)

```yaml
- repo: https://github.com/DavidAnson/markdownlint-cli2
  rev: v0.18.1
  hooks:
    - id: markdownlint-cli2
      language_version: "latest"
      args: ["--fix"]
```

**Evidence**: pre-commit-pep723-linter-wrapper, usb_powertools, picod all use this pattern.

## 5. Ruff Configuration

### Standard Configuration (STANDARD - 5/5 projects)

**pyproject.toml Section**:

```toml
[tool.ruff]
target-version = "py311"
line-length = 120
fix = true
preview = true # Optional, 3/5 projects use

[tool.ruff.format]
docstring-code-format = true
quote-style = "double"
line-ending = "lf"
skip-magic-trailing-comma = true
preview = true

[tool.ruff.lint]
extend-select = [
    "E",   # pycodestyle errors
    "W",   # pycodestyle warnings
    "F",   # pyflakes
    "I",   # isort
    "UP",  # pyupgrade
    "YTT", # flake8-2020
    "S",   # flake8-bandit
    "B",   # flake8-bugbear
    "A",   # flake8-builtins
    "C4",  # flake8-comprehensions
    "T10", # flake8-debugger
    "SIM", # flake8-simplify
    "C90", # mccabe
    "PGH", # pygrep-hooks
    "RUF", # ruff-specific
    "TRY", # tryceratops
    "DOC", # pydocstyle docstrings (4/5 projects)
    "D",   # pydocstyle (4/5 projects)
]

ignore = [
    "COM812", # Missing trailing comma
    "COM819", # Missing trailing comma
    "D107",   # Missing docstring in __init__
    "D415",   # First line should end with a period
    "E111",   # Indentation is not a multiple of four
    "E117",   # Over-indented for visual indent
    "E203",   # whitespace before ':'
    "E402",   # Module level import not at top of file
    "E501",   # Line length exceeds maximum limit
    "ISC001", # isort configuration is missing
    "ISC002", # isort configuration is missing
    "Q000",   # Remove bad quotes
    "Q001",   # Remove bad quotes
    "Q002",   # Remove bad quotes
    "Q003",   # Remove bad quotes
    "TRY003", # Exception message should not be too long
    "S404",   # module is possibly insecure
    "S603",   # subprocess-without-shell-equals-true
    "S606",   # start-process-with-no-shell
    "DOC201", # Missing return section in docstring
    "DOC501", # Missing raises section
    "DOC502", # Missing raises section
    "T201",   # Allow print statements (4/5 projects)
]

unfixable = ["F401", "S404", "S603", "S606", "DOC501"]

[tool.ruff.lint.pycodestyle]
max-line-length = 120

[tool.ruff.lint.isort]
combine-as-imports = true
split-on-trailing-comma = false
force-single-line = false
force-wrap-aliases = false

[tool.ruff.lint.flake8-quotes]
docstring-quotes = "double"

[tool.ruff.lint.pydocstyle]
convention = "google"

[tool.ruff.lint.mccabe]
max-complexity = 10

[tool.ruff.lint.per-file-ignores]
"**/tests/*" = ["S101", "S603", "S607", "D102", "D200", "D100"]
"**/test_*.py" = ["S101", "S603", "S607", "D102", "D200", "D100"]
```

**Evidence**: All 5 projects use this exact configuration with minor variations.

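Since `convention = "google"` is the part of this config most often gotten wrong, a minimal sketch of a docstring that satisfies it (the function itself is illustrative):

```python
def scale(values: list[float], factor: float) -> list[float]:
    """Scale each value by a constant factor.

    Args:
        values: Input numbers.
        factor: Multiplier applied to every element.

    Returns:
        A new list with each element multiplied by ``factor``.
    """
    return [v * factor for v in values]
```
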
## 6. Mypy Configuration

### Standard Configuration (STANDARD - 5/5 projects)

```toml
[tool.mypy]
python_version = "3.11"
strict = true
strict_equality = true
extra_checks = true
warn_unused_configs = true
warn_redundant_casts = true
warn_unused_ignores = true
ignore_missing_imports = true
show_error_codes = true
pretty = true
disable_error_code = ["call-arg"]
```

**Per-module overrides pattern**:

```toml
[[tool.mypy.overrides]]
module = "tests.*"
disable_error_code = ["misc"]
```

## 7. Basedpyright Configuration

### Standard Configuration (STANDARD - 5/5 projects)

```toml
[tool.basedpyright]
pythonVersion = "3.11"
typeCheckingMode = "standard"
reportMissingImports = false
reportMissingTypeStubs = false
reportUnnecessaryTypeIgnoreComment = "error"
reportPrivateImportUsage = false
include = ["packages"]
extraPaths = ["packages", "scripts", "tests", "."]
exclude = ["**/node_modules", "**/__pycache__", ".*", "__*", "**/typings"]
ignore = ["**/typings"]
venvPath = "."
venv = ".venv"
```

**Evidence**: All 5 projects use this configuration.

## 8. Pytest Configuration

### Standard Configuration (STANDARD - 5/5 projects)

```toml
[tool.pytest.ini_options]
addopts = [
    "--cov=packages/{package_name}",
    "--cov-report=term-missing",
    "-v",
]
testpaths = ["packages/{package_name}/tests"]
python_files = ["test_*.py"]
python_classes = ["Test*"]
python_functions = ["test_*"]
pythonpath = [".", "packages/"]
markers = [
    "hardware: tests that require USB hardware",
    "slow: tests that take significant time to run",
    "integration: integration tests",
]

[tool.coverage.run]
omit = ["*/tests/*"]

[tool.coverage.report]
show_missing = true
fail_under = 70
```

**Evidence**: All projects follow this pattern with minor marker variations.

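A minimal sketch of how the declared markers are applied in a test module (the test names are illustrative); marked tests can then be deselected with, for example, `uv run pytest -m "not hardware"`:

```python
import pytest


@pytest.mark.hardware
def test_device_enumeration() -> None:
    """Runs only when USB hardware is attached."""
    ...


@pytest.mark.slow
@pytest.mark.integration
def test_full_capture_pipeline() -> None:
    """Markers stack; deselect with -m "not slow" or -m "not integration"."""
    ...
```
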
## 9. Formatting Configuration Files

### .markdownlint.json (STANDARD - 5/5 projects)

**All projects use identical configuration**:

```json
{
  "MD003": false,
  "MD007": { "indent": 2 },
  "MD001": false,
  "MD022": false,
  "MD024": false,
  "MD013": false,
  "MD036": false,
  "MD025": false,
  "MD031": false,
  "MD041": false,
  "MD029": false,
  "MD033": false,
  "MD046": false,
  "blanks-around-fences": false,
  "blanks-around-headings": false,
  "blanks-around-lists": false,
  "code-fence-style": false,
  "emphasis-style": false,
  "heading-start-left": false,
  "heading-style": false,
  "hr-style": false,
  "line-length": false,
  "list-indent": false,
  "list-marker-space": false,
  "no-blanks-blockquote": false,
  "no-hard-tabs": false,
  "no-missing-space-atx": false,
  "no-missing-space-closed-atx": false,
  "no-multiple-blanks": false,
  "no-multiple-space-atx": false,
  "no-multiple-space-blockquote": false,
  "no-multiple-space-closed-atx": false,
  "no-trailing-spaces": false,
  "ol-prefix": false,
  "strong-style": false,
  "ul-indent": false
}
```

**Evidence**: Identical across all 5 projects.

### .editorconfig (COMMON - 2/5 projects have it)

**Standard Pattern** (python_picotool, picod):

```ini
# EditorConfig: https://editorconfig.org/

root = true

[*]
charset = utf-8
end_of_line = lf
insert_final_newline = true
trim_trailing_whitespace = true
max_line_length = 120

[*.md]
indent_style = space
indent_size = 4
trim_trailing_whitespace = false

[*.py]
indent_style = space
indent_size = 4

[*.{yml,yaml}]
indent_style = space
indent_size = 2

[*.sh]
indent_style = space
indent_size = 4

[*.toml]
indent_style = space
indent_size = 2

[*.json]
indent_style = space
indent_size = 2

[COMMIT_EDITMSG]
max_line_length = 72
```

**Evidence**: python_picotool and picod include this configuration.

## 10. Semantic Release Configuration

### Standard Configuration (STANDARD - 5/5 projects)

```toml
[tool.semantic_release]
version_toml = []
major_on_zero = true
allow_zero_version = true
tag_format = "v{version}"
build_command = "uv build"

[tool.semantic_release.branches.main]
match = "(main|master)"
prerelease = false

[tool.semantic_release.commit_parser_options]
allowed_tags = [
    "build",
    "chore",
    "ci",
    "docs",
    "feat",
    "fix",
    "perf",
    "style",
    "refactor",
    "test",
]
minor_tags = ["feat"]
patch_tags = ["fix", "perf", "refactor"]
```

**Evidence**: All 5 projects use this configuration identically.

## 11. Dependency Groups

### Standard dev Dependencies (STANDARD - 5/5 projects)

```toml
[dependency-groups]
dev = [
    "basedpyright>=1.21.1",
    "hatch-vcs>=0.5.0",
    "hatchling>=1.14.0",
    "mypy>=1.18.2",
    "pre-commit>=4.3.0",
    "pytest>=8.4.2",
    "pytest-asyncio>=1.2.0",
    "pytest-cov>=6.0.0",
    "pytest-mock>=3.14.0",
    "ruff>=0.9.4",
    "python-semantic-release>=10.4.1",
    "generate-changelog>=0.16.0",
]
```

**Common Pattern**: All projects include mypy, basedpyright, ruff, pytest, pre-commit, and the hatchling tooling.

**Evidence**: All 5 projects have dev dependency groups with these core tools.

## 12. GitLab Project-Specific Patterns

### Pattern: Custom PyPI Index (STANDARD - 4/4 GitLab projects)

```toml
[tool.uv]
publish-url = "{{gitlab_instance_url}}/api/v4/projects/{{project_id}}/packages/pypi"

[[tool.uv.index]]
name = "pypi"
url = "https://pypi.org/simple"
default = true

[[tool.uv.index]]
name = "gitlab"
url = "{{gitlab_instance_url}}/api/v4/groups/{{group_id}}/-/packages/pypi/simple"
explicit = true
default = false
```

## 13. Project Metadata Standards

### Pattern: Author and Maintainer (STANDARD - 5/5 projects)

```toml
[project]
authors = [{ name = "{{author_name_from_git_config_user_name}}", email = "{{author_email_from_git_config_user_email}}" }]
maintainers = [{ name = "{{author_name_from_git_config_user_name}}", email = "{{author_email_from_git_config_user_email}}" }]
```

**Observation**: Email addresses differ between GitHub projects (personal email) and GitLab projects (corporate email).

### Pattern: Classifiers (STANDARD - 5/5 projects)

**Common classifiers across all projects**:

```toml
classifiers = [
    "Development Status :: 4 - Beta",
    "Intended Audience :: Developers",
    "Operating System :: POSIX :: Linux", # or "Operating System :: OS Independent", per project
    "Programming Language :: Python :: 3",
    "Programming Language :: Python :: 3.11",
    "Programming Language :: Python :: 3.12",
]
```

### Pattern: Keywords (STANDARD - 5/5 projects)

All projects include domain-specific keywords related to their purpose.

### Pattern: requires-python (STANDARD - 5/5 projects)

**Two variants**:

- GitHub: `>=3.10`
- GitLab: `>=3.11,<3.13`

## 14. CLI Entry Points

### Pattern: Typer-based CLI (STANDARD - 5/5 projects)

```toml
[project.scripts]
{package_name} = "{package_name}.cli:main" # or "{package_name}.cli:app"

[project]
dependencies = [
    "typer>=0.19.2",
]
```

**Evidence**: All 5 projects use Typer for CLI implementation.

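A minimal sketch of the `cli.py` module those entry points assume; the command, the `my_package` import, and the `main()` wrapper are illustrative (Typer apps are themselves callable, so `cli:app` also works as an entry point):

```python
"""CLI entry point, exposed via [project.scripts] as `{package_name}.cli:main`."""

import typer

app = typer.Typer(help="Example command-line interface.")


@app.command()
def info() -> None:
    """Print the package version."""
    from my_package import __version__  # hypothetical package name

    typer.echo(__version__)


def main() -> None:
    """Wrapper so both `cli:main` and `cli:app` entry-point styles work."""
    app()
```
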
## Summary of Standard Patterns

**STANDARD** (5/5 projects):

- Dual-mode version.py with hatch-vcs
- packages/ directory structure
- `__all__` exports in `__init__.py`
- Ruff formatting with 120 char line length
- Mypy strict mode
- Basedpyright type checking
- Pre-commit hooks (sync-deps, ruff, prettier, shellcheck, shfmt)
- .markdownlint.json (identical config)
- Semantic release configuration
- Typer-based CLI
- pytest with coverage

**COMMON** (3-4/5 projects):

- pep723-loader for type checking in pre-commit
- Custom hatch_build.py hook
- .editorconfig
- GitLab custom PyPI index

The model must follow STANDARD patterns for all new Python projects. COMMON patterns should be used when applicable (e.g., hatch_build.py only if binaries needed).