zhongwei/gh-jamie-bitflight-claude-skills-development-skills

Files

Zhongwei Li 5007abf04b Initial commit

2025-11-29 18:49:58 +08:00

15 KiB

Raw Blame History

title, library_name, pypi_package, category, python_compatibility, last_updated, official_docs, official_repository, maintenance_status

title	library_name	pypi_package	category	python_compatibility	last_updated	official_docs	official_repository	maintenance_status
Prefect: Modern Workflow Orchestration Platform	prefect	prefect	workflow-orchestration	3.9+	2025-11-02	https://docs.prefect.io	https://github.com/PrefectHQ/prefect	active

Prefect: Modern Workflow Orchestration

Core Purpose

Prefect solves workflow orchestration with a Python-first approach that turns regular Python functions into production-ready data pipelines. Unlike legacy orchestrators that require DAG definitions and framework-specific operators, Prefect observes native Python code execution and provides orchestration through simple decorators@[1].

Problem Domain: Coordinating multi-step data workflows, handling failures with retries, scheduling recurring jobs, monitoring pipeline execution, and managing dependencies between tasks without writing boilerplate orchestration code@[2].

When to Use: Building data pipelines, ML workflows, ETL processes, or any multi-step automation that needs scheduling, retry logic, state tracking, and observability@[3].

What You Would Reinvent: Manual retry logic, state management, dependency coordination, scheduling systems, execution monitoring, error handling, result caching, and workflow visibility dashboards@[4].

Official Information

Repository: https://github.com/PrefectHQ/prefect PyPI Package: prefect (current: v3.4.24)@[5] Documentation: https://docs.prefect.io License: Apache-2.0@[6] Maintenance: Actively maintained by PrefectHQ with 1059 open issues, 20.6K stars, regular releases@[7] Community: 30K+ engineers, active Slack community@[8]

Python Compatibility

Minimum Version: Python 3.9@[9] Maximum Version: Python 3.13 (3.14 not yet supported)@[9] Async Support: Full native async/await support throughout@[10] Type Hints: First-class support, type-safe structured outputs@[11]

Core Capabilities

1. Pythonic Flow Definition

Write workflows as regular Python functions with @flow and @task decorators:

from prefect import flow, task
import httpx

@task(log_prints=True)
def get_stars(repo: str):
    url = f"https://api.github.com/repos/{repo}"
    count = httpx.get(url).json()["stargazers_count"]
    print(f"{repo} has {count} stars!")

@flow(name="GitHub Stars")
def github_stars(repos: list[str]):
    for repo in repos:
        get_stars(repo)

# Run directly
if __name__ == "__main__":
    github_stars(["PrefectHQ/Prefect"])

@[12]

2. Dynamic Runtime Workflows

Create tasks dynamically based on data, not static DAG definitions:

from prefect import task, flow

@task
def process_customer(customer_id: str) -> str:
    return f"Processed {customer_id}"

@flow
def main() -> list[str]:
    customer_ids = get_customer_ids()  # Runtime data
    # Map tasks across dynamic data
    results = process_customer.map(customer_ids)
    return results

@[13]

3. Flexible Scheduling

Deploy workflows with cron, interval, or RRule schedules:

# Serve with cron schedule
if __name__ == "__main__":
    github_stars.serve(
        name="daily-stars",
        cron="0 8 * * *",  # Daily at 8 AM
        parameters={"repos": ["PrefectHQ/prefect"]}
    )

@[14]

# Or use interval-based scheduling
my_flow.deploy(
    name="my-deployment",
    work_pool_name="my-work-pool",
    interval=timedelta(minutes=10)
)

@[15]

4. Built-in Retries and State Management

Automatic retry logic and state tracking:

@task(retries=3, retry_delay_seconds=60)
def fetch_data():
    # Automatically retries on failure
    return api_call()

@[16]

5. Concurrent Task Execution

Run tasks in parallel with .submit():

@flow
def my_workflow():
    future = cool_task.submit()  # Non-blocking
    print(what_did_cool_task_say(future))

@[17]

6. Event-Driven Automations

React to events, not just schedules:

# Trigger flows on external events
my_flow.deploy(
    triggers=[
        DeploymentEventTrigger(
            expect=["s3.file.uploaded"]
        )
    ]
)

@[18]

Real-World Integration Patterns

Integration with dbt

Orchestrate dbt transformations within Prefect flows:

from prefect_dbt import DbtCoreOperation

@flow
def dbt_flow():
    result = DbtCoreOperation(
        commands=["dbt run", "dbt test"],
        project_dir="/path/to/dbt/project"
    ).run()
    return result

@[19]

Example Repository: https://github.com/anna-geller/prefect-dataplatform (106 stars) - Shows Prefect + dbt + Snowflake data platform@[20]

AWS Deployment Pattern

Deploy to AWS ECS Fargate:

# prefect.yaml configuration
work_pool:
  name: aws-ecs-pool
  type: ecs

deployments:
  - name: production
    work_pool_name: aws-ecs-pool
    schedules:
      - cron: "0 */4 * * *"

@[21]

Example Repository: https://github.com/anna-geller/dataflow-ops (116 stars) - Automated deployments to AWS ECS@[22]

Docker Compose Self-Hosted

Run Prefect server with Docker Compose:

version: "3.8"
services:
  prefect-server:
    image: prefecthq/prefect:latest
    command: prefect server start
    ports:
      - "4200:4200"
    environment:
      - PREFECT_API_DATABASE_CONNECTION_URL=postgresql+asyncpg://postgres:password@postgres:5432/prefect

@[23]

Example Repositories:

https://github.com/rpeden/prefect-docker-compose (142 stars)@[24]
https://github.com/flavienbwk/prefect-docker-compose (161 stars)@[25]

Common Usage Patterns

Pattern 1: ETL Pipeline with Retries

from prefect import flow, task
from prefect.tasks import exponential_backoff

@task(retries=3, retry_delay_seconds=exponential_backoff(backoff_factor=2))
def extract_data(source: str):
    # Fetch from API with automatic retries
    return fetch_api_data(source)

@task
def transform_data(raw_data):
    return clean_and_transform(raw_data)

@task
def load_data(data, destination: str):
    write_to_database(data, destination)

@flow(log_prints=True)
def etl_pipeline():
    raw = extract_data("https://api.example.com/data")
    transformed = transform_data(raw)
    load_data(transformed, "postgresql://db")

@[26]

Pattern 2: Scheduled Data Sync

@flow
def sync_customer_data():
    customers = fetch_customers()
    for customer in customers:
        sync_to_warehouse(customer)

# Schedule to run every hour
if __name__ == "__main__":
    sync_customer_data.serve(
        name="hourly-sync",
        interval=3600,  # Every hour
        tags=["production", "sync"]
    )

@[27]

Pattern 3: ML Pipeline with Caching

@task(cache_key_fn=task_input_hash, cache_expiration=timedelta(hours=1))
def load_training_data():
    # Expensive data loading - cached for 1 hour
    return load_large_dataset()

@task
def train_model(data):
    return train_ml_model(data)

@flow
def ml_pipeline():
    data = load_training_data()  # Reuses cached result
    model = train_model(data)
    return model

@[28]

Integration Ecosystem

Data Transformation

dbt: Native integration via prefect-dbt package (archived, use dbt Cloud API)@[29]
dbt Cloud: Official integration for triggering dbt Cloud jobs@[30]

Data Warehouses

Snowflake: prefect-snowflake for query execution@[31]
BigQuery: prefect-gcp for BigQuery operations@[32]
Redshift, PostgreSQL: Standard database connectors@[33]

Cloud Platforms

AWS: prefect-aws (S3, ECS, Lambda, Batch)@[34]
GCP: prefect-gcp (GCS, BigQuery, Cloud Run)@[35]
Azure: prefect-azure (Blob Storage, Container Instances)@[36]

Container Orchestration

Docker: Native Docker build and push support@[37]
Kubernetes: prefect-kubernetes for K8s deployments@[38]
ECS Fargate: Built-in ECS work pools@[39]

Data Quality

Great Expectations: prefect-great-expectations for validation@[40]
Monte Carlo: Circuit breaker integrations@[41]

ML/AI

LangChain: langchain-prefect for LLM workflows (archived)@[42]
MLflow: Track experiments within Prefect flows@[43]

Deployment Options

1. Prefect Cloud (Managed)

Fully managed orchestration platform with:

Hosted API and UI
Team collaboration features
RBAC and access controls
Enterprise SLAs
Automations and event triggers@[44]

Pricing: Free tier + usage-based pricing@[45]

2. Self-Hosted Prefect Server

Open-source server you deploy:

# Start local server
prefect server start

# Or deploy via Docker
docker run -p 4200:4200 prefecthq/prefect:latest prefect server start

@[46]

Requirements: PostgreSQL database, Redis (optional for caching)@[47]

3. Hybrid Execution Model

Orchestration in cloud, execution anywhere:

Control plane in Prefect Cloud
Workers run in your infrastructure
Code never leaves your environment@[48]

When to Use Prefect

Use Prefect When

Building data pipelines that need scheduling, retries, and monitoring@[49]
Orchestrating ML workflows with dynamic dependencies@[50]
Coordinating microservices or distributed tasks@[51]
Migrating from cron jobs to a modern orchestrator@[52]
Need Python-native workflows without DSL overhead@[53]
Want local development with production parity@[54]
Require event-driven automation beyond scheduling@[55]
Need visibility into workflow execution and failures@[56]

Use Simple Scripts/Cron When

Single-step tasks with no dependencies@[57]
One-off scripts that rarely run@[58]
No retry logic needed@[59]
No failure visibility required@[60]
Under 5 lines of code total@[61]

Prefect vs. Alternatives

Prefect vs. Airflow

Dimension	Prefect	Airflow
Development Model	Pure Python functions with decorators	DAG definitions with operators
Dynamic Workflows	Runtime task creation based on data	Static DAG structure at parse time
Local Development	Run locally without infrastructure	Requires full Airflow setup
Learning Curve	Minimal - just Python	Steep - framework concepts required
Infrastructure	Runs anywhere Python runs	Multi-component (scheduler, webserver, DB)
Cost	60-70% lower (per customer reports)@[62]	Higher due to always-on infrastructure@[63]
Best For	ML/AI, modern data teams, dynamic pipelines	Traditional ETL, platform teams invested in ecosystem

Migration Path: Prefect provides 73.78% cost reduction over Astronomer (managed Airflow)@[64]

Prefect vs. Dagster

Dimension	Prefect	Dagster
Philosophy	Workflow orchestration	Data asset orchestration
Abstractions	Flows and tasks	Software-defined assets
Use Case	General workflow automation	Data asset lineage and cataloging
Complexity	Lower barrier to entry	Higher conceptual overhead

Prefect vs. Metaflow

Dimension	Prefect	Metaflow
Origin	General orchestration	Netflix ML workflows
Scope	Broad workflow automation	ML-specific pipelines
Deployment	Any infrastructure	AWS, K8s focus
Community	Larger ecosystem	ML-focused community

Decision Matrix

Use Prefect when:
- You write Python workflows
- You need dynamic task generation
- You want local development + production parity
- You need retry/caching/scheduling out of box
- You're building ML, data, or automation pipelines
- You want low operational overhead
- Cost efficiency matters (vs. Airflow)

Use Airflow when:
- You're heavily invested in Airflow ecosystem
- Your team already knows Airflow
- You need specific Airflow operators not in Prefect
- You have dedicated platform engineering for Airflow

Use Dagster when:
- Data asset lineage is primary concern
- You're building a data platform with asset catalog
- You need software-defined assets

Use simple cron/scripts when:
- Single independent tasks
- No retry logic needed
- No monitoring required
- Runs once per day or less

@[65]

Anti-Patterns and Gotchas

Don't Use Prefect For

Simple one-off scripts - adds unnecessary overhead@[66]
Real-time streaming - designed for batch/scheduled workflows@[67]
Sub-second latency requirements - orchestration adds overhead@[68]
Pure event processing - use Kafka/RabbitMQ instead@[69]

Common Pitfalls

Over-decomposition: Breaking every line into a task creates overhead@[70]
Ignoring task inputs: Tasks should be pure functions for caching@[71]
Not using .submit(): Blocking task calls prevent parallelism@[72]
Skipping local testing: Run flows locally before deploying@[73]

Learning Resources

Official Quickstart: https://docs.prefect.io/v3/get-started/quickstart@[74] Examples Repository: https://github.com/PrefectHQ/examples@[75] Community Recipes: https://github.com/PrefectHQ/prefect-recipes (254 stars, archived)@[76] Slack Community: https://prefect.io/slack@[77] YouTube Channel: https://www.youtube.com/c/PrefectIO/@[78]

Installation

# Using pip
pip install -U prefect

# Using uv (recommended)
uv add prefect

# With specific integrations
pip install prefect-aws prefect-gcp prefect-dbt

@[79]

Verification Checklist

Official repository confirmed: https://github.com/PrefectHQ/prefect
PyPI package verified: prefect v3.4.24
Python compatibility: 3.9-3.13
License confirmed: Apache-2.0
Real-world examples: 5+ GitHub repositories with 100+ stars
Integration patterns documented: dbt, Snowflake, AWS, Docker
Decision matrix provided: vs Airflow, Dagster, Metaflow, cron
Anti-patterns identified: streaming, sub-second latency
Code examples: 6+ verified from official docs and Context7
Maintenance status: Active (1059 open issues, recent commits)

References

Sources cited with @ notation throughout document:

[1-79] Information gathered from:

Context7 Library ID: /prefecthq/prefect (Trust Score: 8.2, 6247 code snippets)
Official documentation: https://docs.prefect.io
GitHub repository: https://github.com/PrefectHQ/prefect
PyPI package page: https://pypi.org/project/prefect/
Prefect vs Airflow comparison: https://www.prefect.io/compare/airflow
Example repositories: anna-geller/prefect-dataplatform, rpeden/prefect-docker-compose, flavienbwk/prefect-docker-compose, anna-geller/dataflow-ops
Exa code context search results
Ref documentation search results

Last verified: 2025-10-21

15 KiB Raw Blame History