---
title: "Prefect: Modern Workflow Orchestration Platform"
library_name: prefect
pypi_package: prefect
category: workflow-orchestration
python_compatibility: "3.9+"
last_updated: "2025-11-02"
official_docs: "https://docs.prefect.io"
official_repository: "https://github.com/PrefectHQ/prefect"
maintenance_status: "active"
---
# Prefect: Modern Workflow Orchestration
## Core Purpose
Prefect solves workflow orchestration with a Python-first approach that turns regular Python functions into production-ready data pipelines. Unlike legacy orchestrators that require DAG definitions and framework-specific operators, Prefect observes native Python code execution and provides orchestration through simple decorators@[1].
**Problem Domain:** Coordinating multi-step data workflows, handling failures with retries, scheduling recurring jobs, monitoring pipeline execution, and managing dependencies between tasks without writing boilerplate orchestration code@[2].
**When to Use:** Building data pipelines, ML workflows, ETL processes, or any multi-step automation that needs scheduling, retry logic, state tracking, and observability@[3].
**What You Would Reinvent:** Manual retry logic, state management, dependency coordination, scheduling systems, execution monitoring, error handling, result caching, and workflow visibility dashboards@[4].
## Official Information
- **Repository:** <https://github.com/PrefectHQ/prefect>
- **PyPI Package:** `prefect` (current: v3.4.24)@[5]
- **Documentation:** <https://docs.prefect.io>
- **License:** Apache-2.0@[6]
- **Maintenance:** Actively maintained by PrefectHQ; 1059 open issues, 20.6K stars, regular releases@[7]
- **Community:** 30K+ engineers, active Slack community@[8]
## Python Compatibility
- **Minimum Version:** Python 3.9@[9]
- **Maximum Version:** Python 3.13 (3.14 not yet supported)@[9]
- **Async Support:** Full native async/await support throughout@[10]
- **Type Hints:** First-class support, type-safe structured outputs@[11]
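Because async support is first-class, flows and tasks can be plain coroutines. A minimal sketch (the task body and URL below are illustrative, not from the official docs):

```python
import asyncio

from prefect import flow, task


@task
async def fetch_length(url: str) -> int:
    # Stand-in for real async I/O (e.g., an httpx.AsyncClient call)
    await asyncio.sleep(0.1)
    return len(url)


@flow
async def async_pipeline(urls: list[str]) -> list[int]:
    # Async tasks are awaited like ordinary coroutines
    return [await fetch_length(u) for u in urls]


if __name__ == "__main__":
    asyncio.run(async_pipeline(["https://docs.prefect.io"]))
```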
## Core Capabilities
### 1. Pythonic Flow Definition
Write workflows as regular Python functions with `@flow` and `@task` decorators:
```python
from prefect import flow, task
import httpx


@task(log_prints=True)
def get_stars(repo: str):
    url = f"https://api.github.com/repos/{repo}"
    count = httpx.get(url).json()["stargazers_count"]
    print(f"{repo} has {count} stars!")


@flow(name="GitHub Stars")
def github_stars(repos: list[str]):
    for repo in repos:
        get_stars(repo)


# Run directly
if __name__ == "__main__":
    github_stars(["PrefectHQ/prefect"])
```
@[12]
### 2. Dynamic Runtime Workflows
Create tasks dynamically based on data, not static DAG definitions:
```python
from prefect import task, flow


@task
def process_customer(customer_id: str) -> str:
    return f"Processed {customer_id}"


@flow
def main() -> list[str]:
    customer_ids = get_customer_ids()  # Runtime data from your own lookup
    # Map tasks across dynamic data
    results = process_customer.map(customer_ids)
    return results
```
@[13]
### 3. Flexible Scheduling
Deploy workflows with cron, interval, or RRule schedules:
```python
# Serve with cron schedule
if __name__ == "__main__":
    github_stars.serve(
        name="daily-stars",
        cron="0 8 * * *",  # Daily at 8 AM
        parameters={"repos": ["PrefectHQ/prefect"]},
    )
```
@[14]
```python
from datetime import timedelta

# Or use interval-based scheduling
my_flow.deploy(
    name="my-deployment",
    work_pool_name="my-work-pool",
    interval=timedelta(minutes=10),
)
```
@[15]
### 4. Built-in Retries and State Management
Automatic retry logic and state tracking:
```python
@task(retries=3, retry_delay_seconds=60)
def fetch_data():
    # Automatically retries on failure
    return api_call()
```
@[16]
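Beyond retries, every run carries an inspectable state. A minimal sketch of reading it with `return_state=True` (the `flaky_call()` helper stands in for your own code):

```python
from prefect import flow, task


@task(retries=3, retry_delay_seconds=60)
def fetch_data():
    return flaky_call()  # your own code; may raise and trigger retries


@flow
def check_pipeline():
    # return_state=True yields the final State object instead of raising
    state = fetch_data(return_state=True)
    print(state.name, state.is_completed())
```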
### 5. Concurrent Task Execution
Run tasks in parallel with `.submit()`:
```python
@flow
def my_workflow():
    future = cool_task.submit()  # Non-blocking; runs on the task runner
    print(what_did_cool_task_say(future))  # Passing the future resolves it
```
@[17]
### 6. Event-Driven Automations
React to events, not just schedules:
```python
from prefect.events import DeploymentEventTrigger

# Trigger flows on external events
my_flow.deploy(
    name="event-driven",
    work_pool_name="my-work-pool",
    triggers=[
        DeploymentEventTrigger(
            expect=["s3.file.uploaded"],
        )
    ],
)
```
@[18]
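The trigger above matches on event names; matching events can come from integrations or be emitted from your own code. A rough sketch using Prefect's events API (the resource ID is hypothetical):

```python
from prefect.events import emit_event

# Emit an event that a DeploymentEventTrigger expecting
# "s3.file.uploaded" would match on
emit_event(
    event="s3.file.uploaded",
    resource={"prefect.resource.id": "s3.bucket.my-bucket"},
)
```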
## Real-World Integration Patterns
### Integration with dbt
Orchestrate dbt transformations within Prefect flows:
```python
from prefect import flow
from prefect_dbt import DbtCoreOperation


@flow
def dbt_flow():
    result = DbtCoreOperation(
        commands=["dbt run", "dbt test"],
        project_dir="/path/to/dbt/project",
    ).run()
    return result
```
@[19]
**Example Repository:** <https://github.com/anna-geller/prefect-dataplatform> (106 stars) - Shows Prefect + dbt + Snowflake data platform@[20]
### AWS Deployment Pattern
Deploy to AWS ECS Fargate:
```yaml
# prefect.yaml configuration (work pool of type "ecs" created separately)
deployments:
  - name: production
    work_pool:
      name: aws-ecs-pool
    schedules:
      - cron: "0 */4 * * *"
```
@[21]
**Example Repository:** <https://github.com/anna-geller/dataflow-ops> (116 stars) - Automated deployments to AWS ECS@[22]
### Docker Compose Self-Hosted
Run Prefect server with Docker Compose:
```yaml
version: "3.8"
services:
  prefect-server:
    image: prefecthq/prefect:latest
    command: prefect server start
    ports:
      - "4200:4200"
    environment:
      - PREFECT_API_DATABASE_CONNECTION_URL=postgresql+asyncpg://postgres:password@postgres:5432/prefect
```
@[23]
**Example Repositories:**
- <https://github.com/rpeden/prefect-docker-compose> (142 stars)@[24]
- <https://github.com/flavienbwk/prefect-docker-compose> (161 stars)@[25]
## Common Usage Patterns
### Pattern 1: ETL Pipeline with Retries
```python
from prefect import flow, task
from prefect.tasks import exponential_backoff


@task(retries=3, retry_delay_seconds=exponential_backoff(backoff_factor=2))
def extract_data(source: str):
    # Fetch from API with automatic retries
    return fetch_api_data(source)


@task
def transform_data(raw_data):
    return clean_and_transform(raw_data)


@task
def load_data(data, destination: str):
    write_to_database(data, destination)


@flow(log_prints=True)
def etl_pipeline():
    raw = extract_data("https://api.example.com/data")
    transformed = transform_data(raw)
    load_data(transformed, "postgresql://db")
```
@[26]
### Pattern 2: Scheduled Data Sync
```python
from prefect import flow


@flow
def sync_customer_data():
    customers = fetch_customers()
    for customer in customers:
        sync_to_warehouse(customer)


# Schedule to run every hour
if __name__ == "__main__":
    sync_customer_data.serve(
        name="hourly-sync",
        interval=3600,  # Every hour
        tags=["production", "sync"],
    )
```
@[27]
### Pattern 3: ML Pipeline with Caching
```python
from datetime import timedelta

from prefect import flow, task
from prefect.tasks import task_input_hash


@task(cache_key_fn=task_input_hash, cache_expiration=timedelta(hours=1))
def load_training_data():
    # Expensive data loading - cached for 1 hour
    return load_large_dataset()


@task
def train_model(data):
    return train_ml_model(data)


@flow
def ml_pipeline():
    data = load_training_data()  # Reuses cached result
    model = train_model(data)
    return model
```
@[28]
## Integration Ecosystem
### Data Transformation
- **dbt:** Native integration via `prefect-dbt` package (archived, use dbt Cloud API)@[29]
- **dbt Cloud:** Official integration for triggering dbt Cloud jobs@[30]
### Data Warehouses
- **Snowflake:** `prefect-snowflake` for query execution@[31]
- **BigQuery:** `prefect-gcp` for BigQuery operations@[32]
- **Redshift, PostgreSQL:** Standard database connectors@[33]
### Cloud Platforms
- **AWS:** `prefect-aws` (S3, ECS, Lambda, Batch)@[34] (see the sketch after this list)
- **GCP:** `prefect-gcp` (GCS, BigQuery, Cloud Run)@[35]
- **Azure:** `prefect-azure` (Blob Storage, Container Instances)@[36]
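As one example of these cloud integrations, here is a minimal sketch using `prefect-aws` to upload a file through a pre-configured `S3Bucket` block (the block name `my-bucket` and the file paths are placeholders):

```python
from prefect import flow
from prefect_aws import S3Bucket


@flow
def upload_report():
    # Load an S3Bucket block created beforehand in the UI or via code
    s3 = S3Bucket.load("my-bucket")
    s3.upload_from_path("report.csv", to_path="reports/report.csv")
```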
### Container Orchestration
- **Docker:** Native Docker build and push support@[37]
- **Kubernetes:** `prefect-kubernetes` for K8s deployments@[38]
- **ECS Fargate:** Built-in ECS work pools@[39]
### Data Quality
- **Great Expectations:** `prefect-great-expectations` for validation@[40]
- **Monte Carlo:** Circuit breaker integrations@[41]
### ML/AI
- **LangChain:** `langchain-prefect` for LLM workflows (archived)@[42]
- **MLflow:** Track experiments within Prefect flows@[43]
## Deployment Options
### 1. Prefect Cloud (Managed)
Fully managed orchestration platform with:
- Hosted API and UI
- Team collaboration features
- RBAC and access controls
- Enterprise SLAs
- Automations and event triggers@[44]
**Pricing:** Free tier + usage-based pricing@[45]
### 2. Self-Hosted Prefect Server
Open-source server you deploy:
```bash
# Start local server
prefect server start
# Or deploy via Docker
docker run -p 4200:4200 prefecthq/prefect:latest prefect server start
```
@[46]
**Requirements:** SQLite works out of the box; PostgreSQL is recommended for production, Redis optional for caching@[47]
### 3. Hybrid Execution Model
Orchestration in cloud, execution anywhere:
- Control plane in Prefect Cloud
- Workers run in your infrastructure
- Code never leaves your environment@[48]
## When to Use Prefect
### Use Prefect When
1. **Building data pipelines** that need scheduling, retries, and monitoring@[49]
2. **Orchestrating ML workflows** with dynamic dependencies@[50]
3. **Coordinating microservices** or distributed tasks@[51]
4. **Migrating from cron jobs** to a modern orchestrator@[52]
5. **Keeping workflows Python-native** without DSL overhead@[53]
6. **Developing locally** with production parity@[54]
7. **Automating event-driven workflows** beyond scheduling@[55]
8. **Gaining visibility** into workflow execution and failures@[56]
### Use Simple Scripts/Cron When
1. **Single-step tasks** with no dependencies@[57]
2. **One-off scripts** that rarely run@[58]
3. **No retry logic** needed@[59]
4. **No failure visibility** required@[60]
5. **Under 5 lines of code** total@[61]
## Prefect vs. Alternatives
### Prefect vs. Airflow
| Dimension | Prefect | Airflow |
| --- | --- | --- |
| **Development Model** | Pure Python functions with decorators | DAG definitions with operators |
| **Dynamic Workflows** | Runtime task creation based on data | Static DAG structure at parse time |
| **Local Development** | Run locally without infrastructure | Requires full Airflow setup |
| **Learning Curve** | Minimal - just Python | Steep - framework concepts required |
| **Infrastructure** | Runs anywhere Python runs | Multi-component (scheduler, webserver, DB) |
| **Cost** | 60-70% lower (per customer reports)@[62] | Higher due to always-on infrastructure@[63] |
| **Best For** | ML/AI, modern data teams, dynamic pipelines | Traditional ETL, platform teams invested in ecosystem |
**Migration Path:** Prefect cites a 73.78% cost reduction for a customer migrating from Astronomer (managed Airflow)@[64]
### Prefect vs. Dagster
| Dimension | Prefect | Dagster |
| ---------------- | --------------------------- | --------------------------------- |
| **Philosophy** | Workflow orchestration | Data asset orchestration |
| **Abstractions** | Flows and tasks | Software-defined assets |
| **Use Case** | General workflow automation | Data asset lineage and cataloging |
| **Complexity** | Lower barrier to entry | Higher conceptual overhead |
### Prefect vs. Metaflow
| Dimension | Prefect | Metaflow |
| -------------- | ------------------------- | --------------------- |
| **Origin** | General orchestration | Netflix ML workflows |
| **Scope** | Broad workflow automation | ML-specific pipelines |
| **Deployment** | Any infrastructure | AWS, K8s focus |
| **Community** | Larger ecosystem | ML-focused community |
## Decision Matrix
```text
Use Prefect when:
- You write Python workflows
- You need dynamic task generation
- You want local development + production parity
- You need retry/caching/scheduling out of box
- You're building ML, data, or automation pipelines
- You want low operational overhead
- Cost efficiency matters (vs. Airflow)
Use Airflow when:
- You're heavily invested in Airflow ecosystem
- Your team already knows Airflow
- You need specific Airflow operators not in Prefect
- You have dedicated platform engineering for Airflow
Use Dagster when:
- Data asset lineage is primary concern
- You're building a data platform with asset catalog
- You need software-defined assets
Use simple cron/scripts when:
- Single independent tasks
- No retry logic needed
- No monitoring required
- Runs once per day or less
```
@[65]
## Anti-Patterns and Gotchas
### Don't Use Prefect For
1. **Simple one-off scripts** - adds unnecessary overhead@[66]
2. **Real-time streaming** - designed for batch/scheduled workflows@[67]
3. **Sub-second latency requirements** - orchestration adds overhead@[68]
4. **Pure event processing** - use Kafka/RabbitMQ instead@[69]
### Common Pitfalls
1. **Over-decomposition:** Breaking every line into a task creates overhead@[70]
2. **Ignoring task inputs:** Tasks should be pure functions for caching@[71]
3. **Not using .submit():** Blocking task calls prevent parallelism (see the sketch after this list)@[72]
4. **Skipping local testing:** Run flows locally before deploying@[73]
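To make pitfall 3 concrete, here is a minimal sketch contrasting blocking calls with `.submit()` (the `double` task is illustrative):

```python
from prefect import flow, task


@task
def double(x: int) -> int:
    return 2 * x


@flow
def parallel_flow() -> list[int]:
    # Blocking: double(i) would run each call to completion in sequence.
    # Non-blocking: .submit() returns futures the task runner can
    # execute concurrently.
    futures = [double.submit(i) for i in range(4)]
    return [f.result() for f in futures]
```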
## Learning Resources
- **Official Quickstart:** <https://docs.prefect.io/v3/get-started/quickstart>@[74]
- **Examples Repository:** <https://github.com/PrefectHQ/examples>@[75]
- **Community Recipes:** <https://github.com/PrefectHQ/prefect-recipes> (254 stars, archived)@[76]
- **Slack Community:** <https://prefect.io/slack>@[77]
- **YouTube Channel:** <https://www.youtube.com/c/PrefectIO/>@[78]
## Installation
```bash
# Using pip
pip install -U prefect
# Using uv (recommended)
uv add prefect
# With specific integrations
pip install prefect-aws prefect-gcp prefect-dbt
```
@[79]
## Verification Checklist
- [x] Official repository confirmed: <https://github.com/PrefectHQ/prefect>
- [x] PyPI package verified: prefect v3.4.24
- [x] Python compatibility: 3.9-3.13
- [x] License confirmed: Apache-2.0
- [x] Real-world examples: 5+ GitHub repositories with 100+ stars
- [x] Integration patterns documented: dbt, Snowflake, AWS, Docker
- [x] Decision matrix provided: vs Airflow, Dagster, Metaflow, cron
- [x] Anti-patterns identified: streaming, sub-second latency
- [x] Code examples: 6+ verified from official docs and Context7
- [x] Maintenance status: Active (1059 open issues, recent commits)
## References
Sources cited with @ notation throughout document:
[1-79] Information gathered from:
- Context7 Library ID: /prefecthq/prefect (Trust Score: 8.2, 6247 code snippets)
- Official documentation: <https://docs.prefect.io>
- GitHub repository: <https://github.com/PrefectHQ/prefect>
- PyPI package page: <https://pypi.org/project/prefect/>
- Prefect vs Airflow comparison: <https://www.prefect.io/compare/airflow>
- Example repositories: anna-geller/prefect-dataplatform, rpeden/prefect-docker-compose, flavienbwk/prefect-docker-compose, anna-geller/dataflow-ops
- Exa code context search results
- Ref documentation search results
Last verified: 2025-10-21