# Parallelization (Parallel Processing)
A pattern for executing multiple independent tasks simultaneously.
## Overview
Parallelization is a pattern that executes **multiple mutually independent tasks** simultaneously, improving speed and allowing results to be cross-checked for reliability.
## Use Cases
- Scoring documents with multiple evaluation criteria
- Analysis from different perspectives (technical/business/legal)
- Comparing results from multiple translation engines
- Implementing Map-Reduce pattern
## Implementation Example
```python
from typing import Annotated, TypedDict
from operator import add

from langgraph.graph import StateGraph, START, END
# `llm` is assumed to be an initialized chat model, e.g.:
# from langchain_openai import ChatOpenAI; llm = ChatOpenAI()

class State(TypedDict):
    document: str
    scores: Annotated[list[dict], add]  # reducer: aggregate results from all branches
    final_score: float

def technical_review(state: State):
    """Review from technical perspective"""
    score = llm.invoke(f"Technical review: {state['document']}")
    return {"scores": [{"type": "technical", "score": score}]}

def business_review(state: State):
    """Review from business perspective"""
    score = llm.invoke(f"Business review: {state['document']}")
    return {"scores": [{"type": "business", "score": score}]}

def legal_review(state: State):
    """Review from legal perspective"""
    score = llm.invoke(f"Legal review: {state['document']}")
    return {"scores": [{"type": "legal", "score": score}]}

def aggregate_scores(state: State):
    """Aggregate scores into an average"""
    # assumes each score is numeric; in practice, parse the LLM output accordingly
    total = sum(s["score"] for s in state["scores"])
    return {"final_score": total / len(state["scores"])}

# Build graph
builder = StateGraph(State)

# Nodes to be executed in parallel
builder.add_node("technical", technical_review)
builder.add_node("business", business_review)
builder.add_node("legal", legal_review)
builder.add_node("aggregate", aggregate_scores)

# Edges for parallel execution (fan-out from START)
builder.add_edge(START, "technical")
builder.add_edge(START, "business")
builder.add_edge(START, "legal")

# Fan-in to the aggregation node
builder.add_edge("technical", "aggregate")
builder.add_edge("business", "aggregate")
builder.add_edge("legal", "aggregate")
builder.add_edge("aggregate", END)

graph = builder.compile()
```
## Important Concept: Reducer
A **Reducer** is essential for aggregating results from parallel execution:
```python
from typing import Annotated, TypedDict
from operator import add

class State(TypedDict):
    # Append results from multiple nodes into a single list
    results: Annotated[list, add]
    # Keep the maximum value reported by any branch
    max_score: Annotated[int, max]
    # Custom reducer (combine_dicts is a user-defined merge function)
    combined: Annotated[dict, combine_dicts]
```
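The `combine_dicts` reducer above is user-defined, not a LangGraph built-in. A minimal sketch of such a custom reducer (the name and the "newer keys win" merge policy are illustrative assumptions):

```python
def combine_dicts(left: dict, right: dict) -> dict:
    """Custom reducer: merge two partial-result dicts.

    Keys from the incoming update (right) win on conflict.
    """
    return {**left, **right}
```

Any callable that takes the current value and the incoming update and returns the merged value can serve as a reducer in `Annotated[...]`.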
## Benefits
- **Speed**: Time reduction through parallel task execution
- **Reliability**: Verification by comparing multiple results
- **Scalability**: Adjust parallelism based on task count
- **Robustness**: Can continue if some branches succeed even when others fail
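The robustness point can be made concrete by catching failures inside each branch, so a failing branch contributes an empty update instead of aborting the run. A hedged sketch (the `safe_branch` wrapper is illustrative, not a LangGraph API):

```python
def safe_branch(review_fn, label):
    """Wrap a review function so one branch's failure doesn't abort the whole run."""
    def node(state: dict) -> dict:
        try:
            score = review_fn(state["document"])
            return {"scores": [{"type": label, "score": score}]}
        except Exception:
            # Empty update: the reducer simply receives nothing from this branch
            return {"scores": []}
    return node
```

Usage: register the wrapped function as the node, e.g. `builder.add_node("technical", safe_branch(run_technical_review, "technical"))`, where `run_technical_review` is whatever callable performs the review.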
## Considerations
- ⚠️ **Reducer Required**: Explicitly define how results are aggregated
- ⚠️ **Resource Consumption**: Parallel execution increases memory use and API calls
- ⚠️ **Uncertain Order**: Execution order across branches is not guaranteed
- ⚠️ **Debugging Complexity**: Troubleshooting parallel execution is difficult
## Advanced Patterns
### Pattern 1: Fan-out / Fan-in
```python
# Fan-out: One node to multiple
builder.add_edge("router", "task_a")
builder.add_edge("router", "task_b")
builder.add_edge("router", "task_c")
# Fan-in: Multiple to one aggregation
builder.add_edge("task_a", "aggregator")
builder.add_edge("task_b", "aggregator")
builder.add_edge("task_c", "aggregator")
```
### Pattern 2: Balancing Branch Lengths (defer=True)
When parallel branches take different numbers of steps, the fan-in node can fire as soon as the shortest branch finishes. Passing `defer=True` to `add_node` makes that node wait until all pending branches complete:
```python
from typing import Annotated, TypedDict
from operator import add

class State(TypedDict):
    results: Annotated[list, add]

# defer=True: the node is deferred until no other tasks are pending,
# so it sees the results of every branch, long or short
builder.add_node("aggregate", aggregate_results, defer=True)
```
### Pattern 3: Reliability Through Redundancy
```python
from typing import Annotated, TypedDict
from operator import add

class State(TypedDict):
    query: str
    responses: Annotated[list, add]  # collects results from all providers
    result: object

def provider_a(state: State):
    """Provider A"""
    return {"responses": [call_api_a(state["query"])]}

def provider_b(state: State):
    """Provider B (backup)"""
    return {"responses": [call_api_b(state["query"])]}

def provider_c(state: State):
    """Provider C (backup)"""
    return {"responses": [call_api_c(state["query"])]}

def select_best(state: State):
    """Select the highest-confidence response"""
    best = max(state["responses"], key=lambda r: r.confidence)
    return {"result": best}
```
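`select_best` assumes each provider response exposes a `confidence` attribute; a standalone sketch of that selection logic with a hypothetical `Response` type (real provider SDKs name this field differently):

```python
from dataclasses import dataclass

@dataclass
class Response:
    text: str
    confidence: float  # assumed field; adapt to the actual provider payload

def pick_best(responses: list[Response]) -> Response:
    """Return the highest-confidence response among redundant providers."""
    return max(responses, key=lambda r: r.confidence)
```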
## vs Other Patterns
| Aspect | Parallelization | Prompt Chaining |
|--------|-----------------|-----------------|
| Execution order | Parallel | Sequential |
| Dependencies between tasks | None | Yes |
| Total execution time | Shorter (bounded by the slowest task) | Longer (sum of all tasks) |
| Result aggregation | Reducer required | Not required |
## Summary
Parallelization is optimal for **simultaneous execution of independent tasks**. It's important to properly aggregate results using a Reducer.
## Related Pages
- [02_graph_architecture_orchestrator_worker.md](02_graph_architecture_orchestrator_worker.md) - Dynamic parallel processing
- [05_advanced_features_map_reduce.md](05_advanced_features_map_reduce.md) - Map-Reduce pattern
- [01_core_concepts_state.md](01_core_concepts_state.md) - Reducer details