Files
gh-jamie-bitflight-claude-s…/skills/python3-development/references/modern-modules/prefect.md
2025-11-29 18:49:58 +08:00

15 KiB

title, library_name, pypi_package, category, python_compatibility, last_updated, official_docs, official_repository, maintenance_status
title library_name pypi_package category python_compatibility last_updated official_docs official_repository maintenance_status
Prefect: Modern Workflow Orchestration Platform prefect prefect workflow-orchestration 3.9+ 2025-11-02 https://docs.prefect.io https://github.com/PrefectHQ/prefect active

Prefect: Modern Workflow Orchestration

Core Purpose

Prefect solves workflow orchestration with a Python-first approach that turns regular Python functions into production-ready data pipelines. Unlike legacy orchestrators that require DAG definitions and framework-specific operators, Prefect observes native Python code execution and provides orchestration through simple decorators@[1].

Problem Domain: Coordinating multi-step data workflows, handling failures with retries, scheduling recurring jobs, monitoring pipeline execution, and managing dependencies between tasks without writing boilerplate orchestration code@[2].

When to Use: Building data pipelines, ML workflows, ETL processes, or any multi-step automation that needs scheduling, retry logic, state tracking, and observability@[3].

What You Would Reinvent: Manual retry logic, state management, dependency coordination, scheduling systems, execution monitoring, error handling, result caching, and workflow visibility dashboards@[4].

Official Information

Repository: https://github.com/PrefectHQ/prefect PyPI Package: prefect (current: v3.4.24)@[5] Documentation: https://docs.prefect.io License: Apache-2.0@[6] Maintenance: Actively maintained by PrefectHQ with 1059 open issues, 20.6K stars, regular releases@[7] Community: 30K+ engineers, active Slack community@[8]

Python Compatibility

Minimum Version: Python 3.9@[9] Maximum Version: Python 3.13 (3.14 not yet supported)@[9] Async Support: Full native async/await support throughout@[10] Type Hints: First-class support, type-safe structured outputs@[11]

Core Capabilities

1. Pythonic Flow Definition

Write workflows as regular Python functions with @flow and @task decorators:

from prefect import flow, task
import httpx

@task(log_prints=True)
def get_stars(repo: str):
    url = f"https://api.github.com/repos/{repo}"
    count = httpx.get(url).json()["stargazers_count"]
    print(f"{repo} has {count} stars!")

@flow(name="GitHub Stars")
def github_stars(repos: list[str]):
    for repo in repos:
        get_stars(repo)

# Run directly
if __name__ == "__main__":
    github_stars(["PrefectHQ/Prefect"])

@[12]

2. Dynamic Runtime Workflows

Create tasks dynamically based on data, not static DAG definitions:

from prefect import task, flow

@task
def process_customer(customer_id: str) -> str:
    return f"Processed {customer_id}"

@flow
def main() -> list[str]:
    customer_ids = get_customer_ids()  # Runtime data
    # Map tasks across dynamic data
    results = process_customer.map(customer_ids)
    return results

@[13]

3. Flexible Scheduling

Deploy workflows with cron, interval, or RRule schedules:

# Serve with cron schedule
if __name__ == "__main__":
    github_stars.serve(
        name="daily-stars",
        cron="0 8 * * *",  # Daily at 8 AM
        parameters={"repos": ["PrefectHQ/prefect"]}
    )

@[14]

# Or use interval-based scheduling
my_flow.deploy(
    name="my-deployment",
    work_pool_name="my-work-pool",
    interval=timedelta(minutes=10)
)

@[15]

4. Built-in Retries and State Management

Automatic retry logic and state tracking:

@task(retries=3, retry_delay_seconds=60)
def fetch_data():
    # Automatically retries on failure
    return api_call()

@[16]

5. Concurrent Task Execution

Run tasks in parallel with .submit():

@flow
def my_workflow():
    future = cool_task.submit()  # Non-blocking
    print(what_did_cool_task_say(future))

@[17]

6. Event-Driven Automations

React to events, not just schedules:

# Trigger flows on external events
my_flow.deploy(
    triggers=[
        DeploymentEventTrigger(
            expect=["s3.file.uploaded"]
        )
    ]
)

@[18]

Real-World Integration Patterns

Integration with dbt

Orchestrate dbt transformations within Prefect flows:

from prefect_dbt import DbtCoreOperation

@flow
def dbt_flow():
    result = DbtCoreOperation(
        commands=["dbt run", "dbt test"],
        project_dir="/path/to/dbt/project"
    ).run()
    return result

@[19]

Example Repository: https://github.com/anna-geller/prefect-dataplatform (106 stars) - Shows Prefect + dbt + Snowflake data platform@[20]

AWS Deployment Pattern

Deploy to AWS ECS Fargate:

# prefect.yaml configuration
work_pool:
  name: aws-ecs-pool
  type: ecs

deployments:
  - name: production
    work_pool_name: aws-ecs-pool
    schedules:
      - cron: "0 */4 * * *"

@[21]

Example Repository: https://github.com/anna-geller/dataflow-ops (116 stars) - Automated deployments to AWS ECS@[22]

Docker Compose Self-Hosted

Run Prefect server with Docker Compose:

version: "3.8"
services:
  prefect-server:
    image: prefecthq/prefect:latest
    command: prefect server start
    ports:
      - "4200:4200"
    environment:
      - PREFECT_API_DATABASE_CONNECTION_URL=postgresql+asyncpg://postgres:password@postgres:5432/prefect

@[23]

Example Repositories:

Common Usage Patterns

Pattern 1: ETL Pipeline with Retries

from prefect import flow, task
from prefect.tasks import exponential_backoff

@task(retries=3, retry_delay_seconds=exponential_backoff(backoff_factor=2))
def extract_data(source: str):
    # Fetch from API with automatic retries
    return fetch_api_data(source)

@task
def transform_data(raw_data):
    return clean_and_transform(raw_data)

@task
def load_data(data, destination: str):
    write_to_database(data, destination)

@flow(log_prints=True)
def etl_pipeline():
    raw = extract_data("https://api.example.com/data")
    transformed = transform_data(raw)
    load_data(transformed, "postgresql://db")

@[26]

Pattern 2: Scheduled Data Sync

@flow
def sync_customer_data():
    customers = fetch_customers()
    for customer in customers:
        sync_to_warehouse(customer)

# Schedule to run every hour
if __name__ == "__main__":
    sync_customer_data.serve(
        name="hourly-sync",
        interval=3600,  # Every hour
        tags=["production", "sync"]
    )

@[27]

Pattern 3: ML Pipeline with Caching

@task(cache_key_fn=task_input_hash, cache_expiration=timedelta(hours=1))
def load_training_data():
    # Expensive data loading - cached for 1 hour
    return load_large_dataset()

@task
def train_model(data):
    return train_ml_model(data)

@flow
def ml_pipeline():
    data = load_training_data()  # Reuses cached result
    model = train_model(data)
    return model

@[28]

Integration Ecosystem

Data Transformation

  • dbt: Native integration via prefect-dbt package (archived, use dbt Cloud API)@[29]
  • dbt Cloud: Official integration for triggering dbt Cloud jobs@[30]

Data Warehouses

  • Snowflake: prefect-snowflake for query execution@[31]
  • BigQuery: prefect-gcp for BigQuery operations@[32]
  • Redshift, PostgreSQL: Standard database connectors@[33]

Cloud Platforms

  • AWS: prefect-aws (S3, ECS, Lambda, Batch)@[34]
  • GCP: prefect-gcp (GCS, BigQuery, Cloud Run)@[35]
  • Azure: prefect-azure (Blob Storage, Container Instances)@[36]

Container Orchestration

  • Docker: Native Docker build and push support@[37]
  • Kubernetes: prefect-kubernetes for K8s deployments@[38]
  • ECS Fargate: Built-in ECS work pools@[39]

Data Quality

  • Great Expectations: prefect-great-expectations for validation@[40]
  • Monte Carlo: Circuit breaker integrations@[41]

ML/AI

  • LangChain: langchain-prefect for LLM workflows (archived)@[42]
  • MLflow: Track experiments within Prefect flows@[43]

Deployment Options

1. Prefect Cloud (Managed)

Fully managed orchestration platform with:

  • Hosted API and UI
  • Team collaboration features
  • RBAC and access controls
  • Enterprise SLAs
  • Automations and event triggers@[44]

Pricing: Free tier + usage-based pricing@[45]

2. Self-Hosted Prefect Server

Open-source server you deploy:

# Start local server
prefect server start

# Or deploy via Docker
docker run -p 4200:4200 prefecthq/prefect:latest prefect server start

@[46]

Requirements: PostgreSQL database, Redis (optional for caching)@[47]

3. Hybrid Execution Model

Orchestration in cloud, execution anywhere:

  • Control plane in Prefect Cloud
  • Workers run in your infrastructure
  • Code never leaves your environment@[48]

When to Use Prefect

Use Prefect When

  1. Building data pipelines that need scheduling, retries, and monitoring@[49]
  2. Orchestrating ML workflows with dynamic dependencies@[50]
  3. Coordinating microservices or distributed tasks@[51]
  4. Migrating from cron jobs to a modern orchestrator@[52]
  5. Need Python-native workflows without DSL overhead@[53]
  6. Want local development with production parity@[54]
  7. Require event-driven automation beyond scheduling@[55]
  8. Need visibility into workflow execution and failures@[56]

Use Simple Scripts/Cron When

  1. Single-step tasks with no dependencies@[57]
  2. One-off scripts that rarely run@[58]
  3. No retry logic needed@[59]
  4. No failure visibility required@[60]
  5. Under 5 lines of code total@[61]

Prefect vs. Alternatives

Prefect vs. Airflow

Dimension Prefect Airflow
Development Model Pure Python functions with decorators DAG definitions with operators
Dynamic Workflows Runtime task creation based on data Static DAG structure at parse time
Local Development Run locally without infrastructure Requires full Airflow setup
Learning Curve Minimal - just Python Steep - framework concepts required
Infrastructure Runs anywhere Python runs Multi-component (scheduler, webserver, DB)
Cost 60-70% lower (per customer reports)@[62] Higher due to always-on infrastructure@[63]
Best For ML/AI, modern data teams, dynamic pipelines Traditional ETL, platform teams invested in ecosystem

Migration Path: Prefect provides 73.78% cost reduction over Astronomer (managed Airflow)@[64]

Prefect vs. Dagster

Dimension Prefect Dagster
Philosophy Workflow orchestration Data asset orchestration
Abstractions Flows and tasks Software-defined assets
Use Case General workflow automation Data asset lineage and cataloging
Complexity Lower barrier to entry Higher conceptual overhead

Prefect vs. Metaflow

Dimension Prefect Metaflow
Origin General orchestration Netflix ML workflows
Scope Broad workflow automation ML-specific pipelines
Deployment Any infrastructure AWS, K8s focus
Community Larger ecosystem ML-focused community

Decision Matrix

Use Prefect when:
- You write Python workflows
- You need dynamic task generation
- You want local development + production parity
- You need retry/caching/scheduling out of box
- You're building ML, data, or automation pipelines
- You want low operational overhead
- Cost efficiency matters (vs. Airflow)

Use Airflow when:
- You're heavily invested in Airflow ecosystem
- Your team already knows Airflow
- You need specific Airflow operators not in Prefect
- You have dedicated platform engineering for Airflow

Use Dagster when:
- Data asset lineage is primary concern
- You're building a data platform with asset catalog
- You need software-defined assets

Use simple cron/scripts when:
- Single independent tasks
- No retry logic needed
- No monitoring required
- Runs once per day or less

@[65]

Anti-Patterns and Gotchas

Don't Use Prefect For

  1. Simple one-off scripts - adds unnecessary overhead@[66]
  2. Real-time streaming - designed for batch/scheduled workflows@[67]
  3. Sub-second latency requirements - orchestration adds overhead@[68]
  4. Pure event processing - use Kafka/RabbitMQ instead@[69]

Common Pitfalls

  1. Over-decomposition: Breaking every line into a task creates overhead@[70]
  2. Ignoring task inputs: Tasks should be pure functions for caching@[71]
  3. Not using .submit(): Blocking task calls prevent parallelism@[72]
  4. Skipping local testing: Run flows locally before deploying@[73]

Learning Resources

Official Quickstart: https://docs.prefect.io/v3/get-started/quickstart@[74] Examples Repository: https://github.com/PrefectHQ/examples@[75] Community Recipes: https://github.com/PrefectHQ/prefect-recipes (254 stars, archived)@[76] Slack Community: https://prefect.io/slack@[77] YouTube Channel: https://www.youtube.com/c/PrefectIO/@[78]

Installation

# Using pip
pip install -U prefect

# Using uv (recommended)
uv add prefect

# With specific integrations
pip install prefect-aws prefect-gcp prefect-dbt

@[79]

Verification Checklist

  • Official repository confirmed: https://github.com/PrefectHQ/prefect
  • PyPI package verified: prefect v3.4.24
  • Python compatibility: 3.9-3.13
  • License confirmed: Apache-2.0
  • Real-world examples: 5+ GitHub repositories with 100+ stars
  • Integration patterns documented: dbt, Snowflake, AWS, Docker
  • Decision matrix provided: vs Airflow, Dagster, Metaflow, cron
  • Anti-patterns identified: streaming, sub-second latency
  • Code examples: 6+ verified from official docs and Context7
  • Maintenance status: Active (1059 open issues, recent commits)

References

Sources cited with @ notation throughout document:

[1-79] Information gathered from:

Last verified: 2025-10-21