Initial commit

skills/arch-analysis/SKILL.md (new file, 471 lines)

---
name: arch-analysis
description: Analyze LangGraph application architecture, identify bottlenecks, and propose multiple improvement strategies
---

# LangGraph Architecture Analysis Skill
|
||||
|
||||
A skill for analyzing LangGraph application architecture, identifying bottlenecks, and proposing multiple improvement strategies.
|
||||
|
||||
## 📋 Overview
|
||||
|
||||
This skill analyzes existing LangGraph applications and proposes graph structure improvements:
|
||||
|
||||
1. **Current State Analysis**: Performance measurement and graph structure understanding
|
||||
2. **Problem Identification**: Organizing bottlenecks and architectural issues
|
||||
3. **Improvement Proposals**: Generate 3-5 diverse improvement proposals (**all candidates for parallel exploration**)
|
||||
|
||||
**Important**:
|
||||
- This skill only performs analysis and proposals. It does not implement changes.
|
||||
- **Output all improvement proposals**. The arch-tune command will implement and evaluate them in parallel.
|
||||
|
||||
## 🎯 When to Use
|
||||
|
||||
Use this skill in the following situations:
|
||||
|
||||
1. **When performance improvement of existing applications is needed**
|
||||
- Latency exceeds targets
|
||||
- Cost is too high
|
||||
- Accuracy is insufficient
|
||||
|
||||
2. **When considering architecture-level improvements**
|
||||
- Prompt optimization (fine-tune) has limitations
|
||||
- Graph structure changes are needed
|
||||
- Considering introduction of new patterns
|
||||
|
||||
3. **When you want to compare multiple improvement options**
|
||||
- Unclear which architecture is optimal
|
||||
- Want to understand trade-offs
|
||||
|
||||
## 📖 Analysis and Proposal Workflow
|
||||
|
||||
### Step 1: Verify Evaluation Environment
|
||||
|
||||
**Purpose**: Prepare for performance measurement
|
||||
|
||||
**Actions**:
|
||||
1. Verify existence of evaluation program (`.langgraph-master/evaluation/` or specified directory)
|
||||
2. If not present, confirm evaluation criteria with user and create
|
||||
3. Verify test cases
|
||||
|
||||
**Output**: Evaluation program ready
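
As a rough illustration of this check, a minimal sketch (assuming the default `.langgraph-master/evaluation/` location; the user may specify a different directory):

```python
from pathlib import Path

# Default location used by this skill; the user may point to a different directory.
eval_dir = Path(".langgraph-master/evaluation")

if eval_dir.is_dir() and any(eval_dir.glob("*.py")):
    print(f"Evaluation program found: {eval_dir}")
else:
    print("No evaluation program found - confirm evaluation criteria with the user and create one.")
```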
|
||||
|
||||
### Step 2: Measure Current Performance
|
||||
|
||||
**Purpose**: Establish baseline
|
||||
|
||||
**Actions**:
|
||||
1. Run test cases 3-5 times
|
||||
2. Record each metric (accuracy, latency, cost, etc.)
|
||||
3. Calculate statistics (mean, standard deviation, min, max)
|
||||
4. Save as baseline
|
||||
|
||||
**Output**: `baseline_performance.json`
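
A minimal sketch of this measurement loop, assuming a hypothetical `run_evaluation()` helper that executes the test cases once and returns per-run metrics:

```python
import json
import statistics
from typing import Dict, List


def summarize(values: List[float]) -> Dict:
    """Aggregate per-run values into the statistics stored in the baseline file."""
    return {
        "mean": statistics.mean(values),
        "std": statistics.stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
    }


# run_evaluation() is a placeholder for the project's evaluation program;
# each run is assumed to return {"accuracy": ..., "latency": ..., "cost": ...}.
runs = [run_evaluation() for _ in range(5)]

baseline = {
    "iterations": len(runs),
    "test_cases": 20,
    "metrics": {m: summarize([r[m] for r in runs]) for m in ("accuracy", "latency", "cost")},
}

with open("baseline_performance.json", "w") as f:
    json.dump(baseline, f, indent=2)
```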
|
||||
|
||||
### Step 3: Analyze Graph Structure
|
||||
|
||||
**Purpose**: Understand current architecture
|
||||
|
||||
**Actions**:
|
||||
1. **Identify graph definitions with Serena MCP**
|
||||
- Search for StateGraph, MessageGraph with `find_symbol`
|
||||
- Identify graph definition files (typically `graph.py`, `main.py`, etc.)
|
||||
|
||||
2. **Analyze node and edge structure**
|
||||
- List node functions with `get_symbols_overview`
|
||||
- Verify edge types (sequential, parallel, conditional)
|
||||
- Check for subgraphs
|
||||
|
||||
3. **Understand each node's role**
|
||||
- Read node functions
|
||||
- Verify presence of LLM calls
|
||||
- Summarize processing content
|
||||
|
||||
**Output**: Graph structure documentation
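
When Serena MCP is unavailable (see Important Notes below), the same information can be gathered with a plain text search; a rough sketch, assuming the application code lives under `src/`:

```python
import re
from pathlib import Path

graph_pattern = re.compile(r"\b(StateGraph|MessageGraph)\s*\(")
node_pattern = re.compile(r"add_node\(\s*[\"'](\w+)[\"']")

# List files that build a graph and the node names they register.
for path in Path("src").rglob("*.py"):
    text = path.read_text(encoding="utf-8")
    if graph_pattern.search(text):
        nodes = node_pattern.findall(text)
        print(f"{path}: graph definition found, nodes = {nodes}")
```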
|
||||
|
||||
### Step 4: Identify Bottlenecks
|
||||
|
||||
**Purpose**: Identify performance problem areas
|
||||
|
||||
**Actions**:
|
||||
1. **Latency Bottlenecks**
|
||||
- Identify nodes with longest execution time
|
||||
- Verify delays from sequential processing
|
||||
- Discover unnecessary processing
|
||||
|
||||
2. **Cost Issues**
|
||||
- Identify high-cost nodes
|
||||
- Verify unnecessary LLM calls
|
||||
- Evaluate model selection optimality
|
||||
|
||||
3. **Accuracy Issues**
|
||||
- Identify nodes with frequent errors
|
||||
- Verify errors due to insufficient information
|
||||
- Discover architecture constraints
|
||||
|
||||
**Output**: List of issues
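
For the latency side, a small sketch of ranking nodes by mean execution time (the numbers are the illustrative figures used in the sample report below):

```python
# Per-node mean execution times in seconds (illustrative values).
node_latency = {"analyze_intent": 0.5, "retrieve_docs": 1.5, "generate_response": 1.5}

total = sum(node_latency.values())
for node, seconds in sorted(node_latency.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{node}: {seconds:.1f}s ({seconds / total:.0%} of total)")
```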
|
||||
|
||||
### Step 5: Consider Architecture Patterns
|
||||
|
||||
**Purpose**: Identify applicable LangGraph patterns
|
||||
|
||||
**Actions**:
|
||||
1. **Consider patterns based on problems**
|
||||
- Latency issues → Parallelization
|
||||
- Diverse use cases → Routing
|
||||
- Complex processing → Subgraph
|
||||
- Staged processing → Prompt Chaining, Map-Reduce
|
||||
|
||||
2. **Reference langgraph-master skill**
|
||||
- Verify characteristics of each pattern
|
||||
- Evaluate application conditions
|
||||
- Reference implementation examples
|
||||
|
||||
**Output**: List of applicable patterns
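
To make the patterns concrete, a minimal, illustrative LangGraph sketch of the Routing pattern (with a note on Parallelization); the state schema and node bodies are placeholders, not the application's actual code:

```python
from typing import TypedDict

from langgraph.graph import StateGraph, START, END


class State(TypedDict, total=False):
    question: str
    intent: str
    docs: list
    answer: str


# Placeholder node functions; in a real application each would call an LLM or a retriever.
def analyze_intent(state: State) -> dict:
    return {"intent": "simple" if len(state["question"]) < 40 else "complex"}

def retrieve_docs(state: State) -> dict:
    return {"docs": []}

def generate_response(state: State) -> dict:
    return {"answer": "detailed answer"}

def simple_response(state: State) -> dict:
    return {"answer": "short answer"}


builder = StateGraph(State)
builder.add_node("analyze_intent", analyze_intent)
builder.add_node("retrieve_docs", retrieve_docs)
builder.add_node("generate_response", generate_response)
builder.add_node("simple_response", simple_response)

builder.add_edge(START, "analyze_intent")

# Routing: branch on the analyzed intent.
builder.add_conditional_edges(
    "analyze_intent",
    lambda state: state["intent"],
    {"simple": "simple_response", "complex": "retrieve_docs"},
)
builder.add_edge("retrieve_docs", "generate_response")
builder.add_edge("simple_response", END)
builder.add_edge("generate_response", END)

graph = builder.compile()
# Parallelization would instead fan out with two edges from START, e.g.
# builder.add_edge(START, "analyze_intent") and builder.add_edge(START, "retrieve_docs").
```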
|
||||
|
||||
### Step 6: Generate Improvement Proposals
|
||||
|
||||
**Purpose**: Create 3-5 diverse improvement proposals (all candidates for parallel exploration)
|
||||
|
||||
**Actions**:
|
||||
1. **Create improvement proposals based on each pattern**
|
||||
- Change details (which nodes/edges to modify)
|
||||
- Expected effects (impact on accuracy, latency, cost)
|
||||
- Implementation complexity (low/medium/high)
|
||||
- Estimated implementation time
|
||||
|
||||
2. **Evaluate improvement proposals**
|
||||
- Feasibility
|
||||
- Risk assessment
|
||||
- Expected ROI
|
||||
|
||||
**Important**: Output all improvement proposals. The arch-tune command will **implement and evaluate all proposals in parallel**.
|
||||
|
||||
**Output**: Improvement proposal document (including all proposals)
|
||||
|
||||
### Step 7: Create Report
|
||||
|
||||
**Purpose**: Organize analysis results and proposals
|
||||
|
||||
**Actions**:
|
||||
1. Current state analysis summary
|
||||
2. Organize issues
|
||||
3. **Document all improvement proposals in `improvement_proposals.md`** (with priorities)
|
||||
4. Present recommendations for reference (first recommendation, second recommendation, reference)
|
||||
|
||||
**Important**: Output all proposals to `improvement_proposals.md`. The arch-tune command will read these and implement/evaluate them in parallel.
|
||||
|
||||
**Output**:
|
||||
- `analysis_report.md` - Current state analysis and issues
|
||||
- `improvement_proposals.md` - **All improvement proposals** (Proposal 1, 2, 3, ...)
|
||||
|
||||
## 📊 Output Formats
|
||||
|
||||
### baseline_performance.json
|
||||
|
||||
```json
|
||||
{
|
||||
"iterations": 5,
|
||||
"test_cases": 20,
|
||||
"metrics": {
|
||||
"accuracy": {
|
||||
"mean": 75.0,
|
||||
"std": 3.2,
|
||||
"min": 70.0,
|
||||
"max": 80.0
|
||||
},
|
||||
"latency": {
|
||||
"mean": 3.5,
|
||||
"std": 0.4,
|
||||
"min": 3.1,
|
||||
"max": 4.2
|
||||
},
|
||||
"cost": {
|
||||
"mean": 0.020,
|
||||
"std": 0.002,
|
||||
"min": 0.018,
|
||||
"max": 0.023
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### analysis_report.md
|
||||
|
||||
```markdown
|
||||
# Architecture Analysis Report
|
||||
|
||||
Execution Date: 2024-11-24 10:00:00
|
||||
|
||||
## Current Performance
|
||||
|
||||
| Metric | Mean | Std Dev | Target | Gap |
|
||||
|--------|------|---------|--------|-----|
|
||||
| Accuracy | 75.0% | 3.2% | 90.0% | -15.0% |
|
||||
| Latency | 3.5s | 0.4s | 2.0s | +1.5s |
|
||||
| Cost | $0.020 | $0.002 | $0.010 | +$0.010 |
|
||||
|
||||
## Graph Structure
|
||||
|
||||
### Current Configuration
|
||||
|
||||
\```
|
||||
analyze_intent → retrieve_docs → generate_response
|
||||
\```
|
||||
|
||||
- **Node Count**: 3
|
||||
- **Edge Type**: Sequential only
|
||||
- **Parallel Processing**: None
|
||||
- **Conditional Branching**: None
|
||||
|
||||
### Node Details
|
||||
|
||||
#### analyze_intent
|
||||
- **Role**: Classify user input intent
|
||||
- **LLM**: Claude 3.5 Sonnet
|
||||
- **Average Execution Time**: 0.5s
|
||||
|
||||
#### retrieve_docs
|
||||
- **Role**: Search related documents
|
||||
- **Processing**: Vector DB query + reranking
|
||||
- **Average Execution Time**: 1.5s
|
||||
|
||||
#### generate_response
|
||||
- **Role**: Generate final response
|
||||
- **LLM**: Claude 3.5 Sonnet
|
||||
- **Average Execution Time**: 1.5s
|
||||
|
||||
## Issues
|
||||
|
||||
### 1. Latency Bottleneck from Sequential Processing
|
||||
|
||||
- **Issue**: analyze_intent and retrieve_docs are sequential
|
||||
- **Impact**: Total 2.0s delay (57% of total)
|
||||
- **Improvement Potential**: reduction of 0.8s or more through parallelization
|
||||
|
||||
### 2. All Requests Follow Same Flow
|
||||
|
||||
- **Issue**: Simple and complex questions go through same processing
|
||||
- **Impact**: Unnecessary retrieve_docs execution (wasted Cost and Latency)
|
||||
- **Improvement Potential**: roughly 50% cost and latency reduction for simple cases through routing
|
||||
|
||||
### 3. Use of Low-Relevance Documents
|
||||
|
||||
- **Issue**: retrieve_docs returns only top-k (no reranking)
|
||||
- **Impact**: Low Accuracy (75%)
|
||||
- **Improvement Potential**: +10-15% improvement possible through multi-stage RAG
|
||||
|
||||
## Applicable Architecture Patterns
|
||||
|
||||
1. **Parallelization** - Parallelize analyze_intent and retrieve_docs
|
||||
2. **Routing** - Branch processing flow based on intent
|
||||
3. **Subgraph** - Dedicated subgraph for RAG processing (retrieve → rerank → select)
|
||||
4. **Orchestrator-Worker** - Execute multiple retrievers in parallel and integrate results
|
||||
```
|
||||
|
||||
### improvement_proposals.md
|
||||
|
||||
```markdown
|
||||
# Architecture Improvement Proposals
|
||||
|
||||
Proposal Date: 2024-11-24 10:30:00
|
||||
|
||||
## Proposal 1: Parallel Document Retrieval + Intent Analysis
|
||||
|
||||
### Changes
|
||||
|
||||
**Current**:
|
||||
\```
|
||||
analyze_intent → retrieve_docs → generate_response
|
||||
\```
|
||||
|
||||
**After Change**:
|
||||
\```
|
||||
START → [analyze_intent, retrieve_docs] → generate_response
|
||||
↓ parallel execution ↓
|
||||
\```
|
||||
|
||||
### Implementation Details
|
||||
|
||||
1. Add parallel edges to StateGraph
|
||||
2. Add join node to wait for both results
|
||||
3. generate_response receives both results
|
||||
|
||||
### Expected Effects
|
||||
|
||||
| Metric | Current | Expected | Change | Change Rate |
|
||||
|--------|---------|----------|--------|-------------|
|
||||
| Accuracy | 75.0% | 75.0% | ±0 | - |
|
||||
| Latency | 3.5s | 2.7s | -0.8s | -23% |
|
||||
| Cost | $0.020 | $0.020 | ±0 | - |
|
||||
|
||||
### Implementation Complexity
|
||||
|
||||
- **Level**: Low
|
||||
- **Estimated Time**: 1-2 hours
|
||||
- **Risk**: Low (no changes to existing nodes required)
|
||||
|
||||
### Recommendation Level
|
||||
|
||||
⭐⭐⭐⭐ (High) - Effective for Latency improvement with low risk
|
||||
|
||||
---
|
||||
|
||||
## Proposal 2: Intent-Based Routing
|
||||
|
||||
### Changes
|
||||
|
||||
**Current**:
|
||||
\```
|
||||
analyze_intent → retrieve_docs → generate_response
|
||||
\```
|
||||
|
||||
**After Change**:
|
||||
\```
|
||||
analyze_intent
|
||||
├─ simple_intent → simple_response (lightweight)
|
||||
└─ complex_intent → retrieve_docs → generate_response
|
||||
\```
|
||||
|
||||
### Implementation Details
|
||||
|
||||
1. Conditional branching based on analyze_intent output
|
||||
2. Create new simple_response node (using Haiku)
|
||||
3. Routing with conditional_edges
|
||||
|
||||
### Expected Effects
|
||||
|
||||
| Metric | Current | Expected | Change | Change Rate |
|
||||
|--------|---------|----------|--------|-------------|
|
||||
| Accuracy | 75.0% | 82.0% | +7.0% | +9% |
|
||||
| Latency | 3.5s | 2.8s | -0.7s | -20% |
|
||||
| Cost | $0.020 | $0.014 | -$0.006 | -30% |
|
||||
|
||||
**Assumption**: 40% simple cases, 60% complex cases
|
||||
|
||||
### Implementation Complexity
|
||||
|
||||
- **Level**: Medium
|
||||
- **Estimated Time**: 2-3 hours
|
||||
- **Risk**: Medium (adding routing logic)
|
||||
|
||||
### Recommendation Level
|
||||
|
||||
⭐⭐⭐⭐⭐ (Highest) - Balanced improvement across all metrics
|
||||
|
||||
---
|
||||
|
||||
## Proposal 3: Multi-Stage RAG with Reranking Subgraph
|
||||
|
||||
### Changes
|
||||
|
||||
**Current**:
|
||||
\```
|
||||
analyze_intent → retrieve_docs → generate_response
|
||||
\```
|
||||
|
||||
**After Change**:
|
||||
\```
|
||||
analyze_intent → [RAG Subgraph] → generate_response
|
||||
↓
|
||||
retrieve (k=20)
|
||||
↓
|
||||
rerank (top-5)
|
||||
↓
|
||||
select (best context)
|
||||
\```
|
||||
|
||||
### Implementation Details
|
||||
|
||||
1. Convert RAG processing to dedicated subgraph
|
||||
2. Retrieve more candidates in retrieve node (k=20)
|
||||
3. Evaluate relevance in rerank node (Cross-Encoder)
|
||||
4. Select optimal context in select node
|
||||
|
||||
### Expected Effects
|
||||
|
||||
| Metric | Current | Expected | Change | Change Rate |
|
||||
|--------|---------|----------|--------|-------------|
|
||||
| Accuracy | 75.0% | 88.0% | +13.0% | +17% |
|
||||
| Latency | 3.5s | 3.8s | +0.3s | +9% |
|
||||
| Cost | $0.020 | $0.022 | +$0.002 | +10% |
|
||||
|
||||
### Implementation Complexity
|
||||
|
||||
- **Level**: Medium-High
|
||||
- **Estimated Time**: 3-4 hours
|
||||
- **Risk**: Medium (introducing new model, subgraph management)
|
||||
|
||||
### Recommendation Level
|
||||
|
||||
⭐⭐⭐ (Medium) - Effective when Accuracy is the priority; Latency degrades slightly
|
||||
|
||||
---
|
||||
|
||||
## Recommendations
|
||||
|
||||
**Note**: The following recommendations are for reference. The arch-tune command will **implement and evaluate all Proposals above in parallel** and select the best option based on actual results.
|
||||
|
||||
### 🥇 First Recommendation: Proposal 2 (Intent-Based Routing)
|
||||
|
||||
**Reasons**:
|
||||
- Balanced improvement across all metrics
|
||||
- Implementation complexity is manageable at medium level
|
||||
- High ROI (effect vs cost)
|
||||
|
||||
**Next Steps**:
|
||||
1. Run parallel exploration with arch-tune command
|
||||
2. Implement and evaluate Proposals 1, 2, 3 simultaneously
|
||||
3. Select best option based on actual results
|
||||
|
||||
### 🥈 Second Recommendation: Proposal 1 (Parallel Retrieval)
|
||||
|
||||
**Reasons**:
|
||||
- Simple implementation with low risk
|
||||
- Reliable Latency improvement
|
||||
- Can be combined with Proposal 2
|
||||
|
||||
### 📝 Reference: Proposal 3 (Multi-Stage RAG)
|
||||
|
||||
**Reasons**:
|
||||
- Effective when Accuracy is most important
|
||||
- Only when Latency trade-off is acceptable
|
||||
```
|
||||
|
||||
## 🔧 Tools and Technologies Used
|
||||
|
||||
### MCP Server Usage
|
||||
|
||||
- **Serena MCP**: Codebase analysis
|
||||
- `find_symbol`: Search graph definitions
|
||||
- `get_symbols_overview`: Understand node structure
|
||||
- `search_for_pattern`: Search specific patterns
|
||||
|
||||
### Reference Skills
|
||||
|
||||
- **langgraph-master skill**: Architecture pattern reference
|
||||
|
||||
### Evaluation Program
|
||||
|
||||
- User-provided or auto-generated
|
||||
- Metrics: accuracy, latency, cost, etc.
|
||||
|
||||
## ⚠️ Important Notes
|
||||
|
||||
1. **Analysis Only**
|
||||
- This skill does not implement changes
|
||||
- Only outputs analysis and proposals
|
||||
|
||||
2. **Evaluation Environment**
|
||||
- Evaluation program is required
|
||||
- Will be created if not present
|
||||
|
||||
3. **Serena MCP**
|
||||
- If Serena is unavailable, manual code analysis
|
||||
- Use ls, read tools
|
||||
|
||||
## 🔗 Related Resources
|
||||
|
||||
- [langgraph-master skill](../langgraph-master/SKILL.md) - Architecture patterns
|
||||
- [arch-tune command](../../commands/arch-tune.md) - Command that uses this skill
|
||||
- [fine-tune skill](../fine-tune/SKILL.md) - Prompt optimization
|
||||
skills/fine-tune/README.md (new file, 83 lines)

# LangGraph Fine-Tune Skill
|
||||
|
||||
A comprehensive skill for iteratively optimizing prompts and processing logic in LangGraph applications based on evaluation criteria.
|
||||
|
||||
## Overview
|
||||
|
||||
The fine-tune skill helps you improve the performance of existing LangGraph applications through systematic prompt optimization without modifying the graph structure (nodes, edges configuration).
|
||||
|
||||
## Key Features
|
||||
|
||||
- **Iterative Optimization**: Data-driven improvement cycles with measurable results
|
||||
- **Graph Structure Preservation**: Only optimize prompts and node logic, not the graph architecture
|
||||
- **Statistical Evaluation**: Multiple runs with statistical analysis for reliable results
|
||||
- **MCP Integration**: Leverages Serena MCP for codebase analysis and target identification
|
||||
|
||||
## When to Use
|
||||
|
||||
- LLM output quality needs improvement
|
||||
- Response latency is too high
|
||||
- Cost optimization is required
|
||||
- Error rates need reduction
|
||||
- Prompt engineering improvements are expected to help
|
||||
|
||||
## 4-Phase Workflow
|
||||
|
||||
### Phase 1: Preparation and Analysis
|
||||
|
||||
Understand optimization targets and current state.
|
||||
|
||||
- Load objectives from `.langgraph-master/fine-tune.md`
|
||||
- Identify optimization targets using Serena MCP
|
||||
- Create prioritized optimization target list
|
||||
|
||||
### Phase 2: Baseline Evaluation
|
||||
|
||||
Quantitatively measure current performance.
|
||||
|
||||
- Prepare evaluation environment (test cases, scripts)
|
||||
- Measure baseline (3-5 runs recommended)
|
||||
- Analyze results and identify problems
|
||||
|
||||
### Phase 3: Iterative Improvement
|
||||
|
||||
Data-driven incremental improvement cycle.
|
||||
|
||||
- Prioritize improvement areas by impact
|
||||
- Implement prompt optimizations
|
||||
- Re-evaluate under same conditions
|
||||
- Compare results and decide next steps
|
||||
- Repeat until goals are achieved
|
||||
|
||||
### Phase 4: Completion and Documentation
|
||||
|
||||
Record achievements and provide recommendations.
|
||||
|
||||
- Create final evaluation report
|
||||
- Commit code changes
|
||||
- Update documentation
|
||||
|
||||
## Key Optimization Techniques
|
||||
|
||||
| Technique | Expected Impact |
|
||||
| --------------------------------- | --------------------------- |
|
||||
| Few-Shot Examples | Accuracy +10-20% |
|
||||
| Structured Output Format | Parsing errors -90% |
|
||||
| Temperature/Max Tokens Adjustment | Cost -20-40% |
|
||||
| Model Selection Optimization | Cost -40-60% |
|
||||
| Prompt Caching | Cost -50-90% (on cache hit) |
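
As a rough illustration of the parameter-level rows (model selection plus temperature/max-tokens tuning), assuming the application builds its LLM client with `langchain_anthropic`:

```python
from langchain_anthropic import ChatAnthropic

# Before: a large model with default settings for every node.
llm_before = ChatAnthropic(model="claude-3-5-sonnet-20241022")

# After: a cheaper model, lower temperature, and a token cap for a simple classification node.
llm_after = ChatAnthropic(
    model="claude-3-5-haiku-20241022",
    temperature=0.3,
    max_tokens=256,
)
```

Whether the smaller model holds accuracy has to be confirmed with the same evaluation run used for the baseline.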
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Start Small**: Begin with the most impactful node
|
||||
2. **Measurement-Driven**: Always quantify before and after improvements
|
||||
3. **Incremental Changes**: Validate one change at a time
|
||||
4. **Document Everything**: Record reasons and results for each change
|
||||
5. **Iterate**: Continue improving until goals are achieved
|
||||
|
||||
## Important Constraints
|
||||
|
||||
- **Preserve Graph Structure**: Do not add/remove nodes or edges
|
||||
- **Maintain Data Flow**: Do not change data flow between nodes
|
||||
- **Keep State Schema**: Maintain the existing state schema
|
||||
- **Evaluation Consistency**: Use same test cases and metrics throughout
|
||||
skills/fine-tune/SKILL.md (new file, 153 lines)

---
name: fine-tune
description: Use when you need to fine-tune and optimize LangGraph applications based on evaluation criteria. This skill performs iterative prompt optimization for LangGraph nodes without changing the graph structure.
---

# LangGraph Application Fine-Tuning Skill
|
||||
|
||||
A skill for iteratively optimizing prompts and processing logic in each node of a LangGraph application based on evaluation criteria.
|
||||
|
||||
## 📋 Overview
|
||||
|
||||
This skill executes the following process to improve the performance of existing LangGraph applications:
|
||||
|
||||
1. **Load Objectives**: Retrieve optimization goals and evaluation criteria from `.langgraph-master/fine-tune.md` (if this file doesn't exist, help the user create it based on their requirements)
|
||||
2. **Identify Optimization Targets**: Extract nodes containing LLM prompts using Serena MCP (if Serena MCP is unavailable, investigate the codebase using ls, read, etc.)
|
||||
3. **Baseline Evaluation**: Measure current performance through multiple runs
|
||||
4. **Implement Improvements**: Identify the most effective improvement areas and optimize prompts and processing logic
|
||||
5. **Re-evaluation**: Measure performance after improvements
|
||||
6. **Iteration**: Repeat steps 4-5 until goals are achieved
|
||||
|
||||
**Important Constraint**: Only optimize prompts and processing logic within each node without modifying the graph structure (nodes, edges configuration).
|
||||
|
||||
## 🎯 When to Use This Skill
|
||||
|
||||
Use this skill in the following situations:
|
||||
|
||||
1. **When performance improvement of existing applications is needed**
|
||||
|
||||
- Want to improve LLM output quality
|
||||
- Want to improve response speed
|
||||
- Want to reduce error rate
|
||||
|
||||
2. **When evaluation criteria are clear**
|
||||
|
||||
- Optimization goals are defined in `.langgraph-master/fine-tune.md`
|
||||
- Quantitative evaluation methods are established
|
||||
|
||||
3. **When improvements through prompt engineering are expected**
|
||||
- Improvements are likely with clearer LLM instructions
|
||||
- Adding few-shot examples would be effective
|
||||
- Output format adjustment is needed
|
||||
|
||||
## 📖 Fine-Tuning Workflow Overview
|
||||
|
||||
### Phase 1: Preparation and Analysis
|
||||
|
||||
**Purpose**: Understand optimization targets and current state
|
||||
|
||||
**Main Steps**:
|
||||
|
||||
1. Load objective setting file (`.langgraph-master/fine-tune.md`)
|
||||
2. Identify optimization targets (Serena MCP or manual code investigation)
|
||||
3. Create optimization target list (evaluate improvement potential for each node)
|
||||
|
||||
→ See [workflow.md](workflow.md#phase-1-preparation-and-analysis) for details
|
||||
|
||||
### Phase 2: Baseline Evaluation
|
||||
|
||||
**Purpose**: Quantitatively measure current performance
|
||||
|
||||
**Main Steps**:

4. Prepare evaluation environment (test cases, evaluation scripts)
5. Baseline measurement (recommended: 3-5 runs)
6. Analyze baseline results (identify problems)
|
||||
|
||||
**Important**: When evaluation programs are needed, create evaluation code in a specific subdirectory (users may specify the directory).
|
||||
|
||||
→ See [workflow.md](workflow.md#phase-2-baseline-evaluation) and [evaluation.md](evaluation.md) for details
|
||||
|
||||
### Phase 3: Iterative Improvement
|
||||
|
||||
**Purpose**: Data-driven incremental improvement
|
||||
|
||||
**Main Steps**:

7. Prioritization (select the most impactful improvement area)
8. Implement improvements (prompt optimization, parameter tuning)
9. Post-improvement evaluation (re-evaluate under the same conditions)
10. Compare and analyze results (measure improvement effects)
11. Decide whether to continue iteration (repeat until goals are achieved)
|
||||
|
||||
→ See [workflow.md](workflow.md#phase-3-iterative-improvement) and [prompt_optimization.md](prompt_optimization.md) for details
|
||||
|
||||
### Phase 4: Completion and Documentation
|
||||
|
||||
**Purpose**: Record achievements and provide future recommendations
|
||||
|
||||
**Main Steps**:

12. Create final evaluation report (improvement content, results, recommendations)
13. Code commit and documentation update
|
||||
|
||||
→ See [workflow.md](workflow.md#phase-4-completion-and-documentation) for details
|
||||
|
||||
## 🔧 Tools and Technologies Used
|
||||
|
||||
### MCP Server Utilization
|
||||
|
||||
- **Serena MCP**: Codebase analysis and optimization target identification
|
||||
|
||||
- `find_symbol`: Search for LLM clients
|
||||
- `find_referencing_symbols`: Identify prompt construction locations
|
||||
- `get_symbols_overview`: Understand node structure
|
||||
|
||||
- **Sequential MCP**: Complex analysis and decision making
|
||||
- Determine improvement priorities
|
||||
- Analyze evaluation results
|
||||
- Plan next actions
|
||||
|
||||
### Key Optimization Techniques
|
||||
|
||||
1. **Few-Shot Examples**: Accuracy +10-20%
|
||||
2. **Structured Output Format**: Parsing errors -90%
|
||||
3. **Temperature/Max Tokens Adjustment**: Cost -20-40%
|
||||
4. **Model Selection Optimization**: Cost -40-60%
|
||||
5. **Prompt Caching**: Cost -50-90% (on cache hit)
|
||||
|
||||
→ See [prompt_optimization.md](prompt_optimization.md) for details
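
A rough sketch of techniques 1 and 2 (few-shot examples plus a structured, JSON-only output format) applied to an intent-classification prompt; the labels and example questions are illustrative:

```python
FEW_SHOT_EXAMPLES = """\
Q: How much does the premium plan cost?
A: {"intent": "billing"}

Q: The API returns a 500 error when I upload a file.
A: {"intent": "technical"}
"""


def build_classify_prompt(question: str) -> str:
    """Few-shot examples plus a structured (JSON-only) output instruction."""
    return (
        "Classify the user question into one of: product, technical, billing, general.\n\n"
        "Examples:\n"
        f"{FEW_SHOT_EXAMPLES}\n"
        'Return only a JSON object of the form {"intent": "<label>"} with no extra text.\n\n'
        f"Q: {question}\nA:"
    )
```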
|
||||
|
||||
## 📚 Related Documentation
|
||||
|
||||
Detailed guidelines and best practices:
|
||||
|
||||
- **[workflow.md](workflow.md)** - Fine-tuning workflow details (execution procedures and code examples for each phase)
|
||||
- **[evaluation.md](evaluation.md)** - Evaluation methods and best practices (metric calculation, statistical analysis, test case design)
|
||||
- **[prompt_optimization.md](prompt_optimization.md)** - Prompt optimization techniques (10 practical methods and priorities)
|
||||
- **[examples.md](examples.md)** - Practical examples collection (copy-and-paste ready code examples and template collection)
|
||||
|
||||
## ⚠️ Important Notes
|
||||
|
||||
1. **Preserve Graph Structure**
|
||||
|
||||
- Do not add or remove nodes or edges
|
||||
- Do not change data flow between nodes
|
||||
- Maintain state schema
|
||||
|
||||
2. **Evaluation Consistency**
|
||||
|
||||
- Use the same test cases
|
||||
- Measure with the same evaluation metrics
|
||||
- Run multiple times to confirm statistically significant improvements
|
||||
|
||||
3. **Cost Management**
|
||||
|
||||
- Consider evaluation execution costs
|
||||
- Adjust sample size as needed
|
||||
- Be mindful of API rate limits
|
||||
|
||||
4. **Version Control**
|
||||
- Git commit each iteration's changes
|
||||
- Maintain rollback-capable state
|
||||
- Record evaluation results
|
||||
|
||||
## 🎓 Fine-Tuning Best Practices
|
||||
|
||||
1. **Start Small**: Optimize from the most impactful node
|
||||
2. **Measurement-Driven**: Always perform quantitative evaluation before and after improvements
|
||||
3. **Incremental Improvement**: Validate one change at a time, not multiple simultaneously
|
||||
4. **Documentation**: Record reasons and results for each change
|
||||
5. **Iteration**: Continuously improve until goals are achieved
|
||||
|
||||
## 🔗 Reference Links
|
||||
|
||||
- [LangGraph Official Documentation](https://docs.langchain.com/oss/python/langgraph/overview)
|
||||
- [Prompt Engineering Guide](https://www.promptingguide.ai/)
|
||||
skills/fine-tune/evaluation.md (new file, 80 lines)

# Evaluation Methods and Best Practices
|
||||
|
||||
Evaluation strategies, metrics, and best practices for fine-tuning LangGraph applications.
|
||||
|
||||
**💡 Tip**: For practical evaluation scripts and report templates, see [examples.md](examples.md#phase-2-baseline-evaluation-examples).
|
||||
|
||||
## 📚 Table of Contents
|
||||
|
||||
This guide is divided into the following sections:
|
||||
|
||||
### 1. [Evaluation Metrics Design](./evaluation_metrics.md)
|
||||
Learn how to define and calculate metrics used for evaluation.
|
||||
|
||||
### 2. [Test Case Design](./evaluation_testcases.md)
|
||||
Understand test case structure, coverage, and design principles.
|
||||
|
||||
### 3. [Statistical Significance Testing](./evaluation_statistics.md)
|
||||
Master methods for multiple runs and statistical analysis.
|
||||
|
||||
### 4. [Evaluation Best Practices](./evaluation_practices.md)
|
||||
Provides practical evaluation guidelines.
|
||||
|
||||
## 🎯 Quick Start
|
||||
|
||||
### For First-Time Evaluation
|
||||
|
||||
1. **[Understand Evaluation Metrics](./evaluation_metrics.md)** - Which metrics to measure
|
||||
2. **[Design Test Cases](./evaluation_testcases.md)** - Create representative cases
|
||||
3. **[Learn Statistical Methods](./evaluation_statistics.md)** - Importance of multiple runs
|
||||
4. **[Follow Best Practices](./evaluation_practices.md)** - Effective evaluation implementation
|
||||
|
||||
### Improving Existing Evaluations
|
||||
|
||||
1. **[Add Metrics](./evaluation_metrics.md)** - More comprehensive evaluation
|
||||
2. **[Improve Coverage](./evaluation_testcases.md)** - Enhance test cases
|
||||
3. **[Strengthen Statistical Validation](./evaluation_statistics.md)** - Improve reliability
|
||||
4. **[Introduce Automation](./evaluation_practices.md)** - Continuous evaluation pipeline
|
||||
|
||||
## 📖 Importance of Evaluation
|
||||
|
||||
In fine-tuning, evaluation provides:
|
||||
- **Quantifying Improvements**: Objective progress measurement
|
||||
- **Basis for Decision-Making**: Data-driven prioritization
|
||||
- **Quality Assurance**: Prevention of regressions
|
||||
- **ROI Demonstration**: Visualization of business value
|
||||
|
||||
## 💡 Basic Principles of Evaluation
|
||||
|
||||
For effective evaluation:
|
||||
|
||||
1. ✅ **Multiple Metrics**: Comprehensive assessment of quality, performance, cost, and reliability
|
||||
2. ✅ **Statistical Validation**: Confirm significance through multiple runs
|
||||
3. ✅ **Consistency**: Evaluate with the same test cases under the same conditions
|
||||
4. ✅ **Visualization**: Track improvements with graphs and tables
|
||||
5. ✅ **Documentation**: Record evaluation results and analysis
|
||||
|
||||
## 🔍 Troubleshooting
|
||||
|
||||
### Large Variance in Evaluation Results
|
||||
→ Check [Statistical Significance Testing](./evaluation_statistics.md#outlier-detection-and-handling)
|
||||
|
||||
### Evaluation Takes Too Long
|
||||
→ Implement staged evaluation in [Best Practices](./evaluation_practices.md#troubleshooting)
|
||||
|
||||
### Unclear Which Metrics to Measure
|
||||
→ Check [Evaluation Metrics Design](./evaluation_metrics.md) for purpose and use cases of each metric
|
||||
|
||||
### Insufficient Test Cases
|
||||
→ Refer to coverage analysis in [Test Case Design](./evaluation_testcases.md#test-case-design-principles)
|
||||
|
||||
## 📋 Related Documentation
|
||||
|
||||
- **[Prompt Optimization](./prompt_optimization.md)** - Techniques for prompt improvement
|
||||
- **[Examples Collection](./examples.md)** - Samples of evaluation scripts and reports
|
||||
- **[Workflow](./workflow.md)** - Overall fine-tuning flow including evaluation
|
||||
- **[SKILL.md](./SKILL.md)** - Overview of the fine-tune skill
|
||||
|
||||
---
|
||||
|
||||
**💡 Tip**: For practical evaluation scripts and templates, see [examples.md](examples.md#phase-2-baseline-evaluation-examples).
|
||||
skills/fine-tune/evaluation_metrics.md (new file, 340 lines)

# Evaluation Metrics Design
|
||||
|
||||
Definitions and calculation methods for evaluation metrics in LangGraph application fine-tuning.
|
||||
|
||||
**💡 Tip**: For practical evaluation scripts and report templates, see [examples.md](examples.md#phase-2-baseline-evaluation-examples).
|
||||
|
||||
## 📊 Importance of Evaluation
|
||||
|
||||
In fine-tuning, evaluation provides:
|
||||
- **Quantifying Improvements**: Objective progress measurement
|
||||
- **Basis for Decision-Making**: Data-driven prioritization
|
||||
- **Quality Assurance**: Prevention of regressions
|
||||
- **ROI Demonstration**: Visualization of business value
|
||||
|
||||
## 🎯 Evaluation Metric Categories
|
||||
|
||||
### 1. Quality Metrics
|
||||
|
||||
#### Accuracy
|
||||
```python
|
||||
from typing import List


def calculate_accuracy(predictions: List, ground_truth: List) -> float:
|
||||
"""Calculate accuracy"""
|
||||
correct = sum(p == g for p, g in zip(predictions, ground_truth))
|
||||
return (correct / len(predictions)) * 100
|
||||
|
||||
# Example
|
||||
predictions = ["product", "technical", "billing", "general"]
|
||||
ground_truth = ["product", "billing", "billing", "general"]
|
||||
accuracy = calculate_accuracy(predictions, ground_truth)
|
||||
# => 75.0% (3/4 correct)
|
||||
```
|
||||
|
||||
#### F1 Score (Multi-class Classification)
|
||||
```python
|
||||
from typing import List

from sklearn.metrics import f1_score, classification_report
|
||||
|
||||
def calculate_f1(predictions: List, ground_truth: List, average='weighted') -> float:
|
||||
"""Calculate F1 score (multi-class support)"""
|
||||
return f1_score(ground_truth, predictions, average=average)
|
||||
|
||||
# Detailed report
|
||||
report = classification_report(ground_truth, predictions)
|
||||
print(report)
|
||||
"""
|
||||
precision recall f1-score support
|
||||
|
||||
product 1.00 1.00 1.00 1
|
||||
technical 0.00 0.00 0.00 1
|
||||
billing 0.50 1.00 0.67 1
|
||||
general 1.00 1.00 1.00 1
|
||||
|
||||
accuracy 0.75 4
|
||||
macro avg 0.62 0.75 0.67 4
|
||||
weighted avg 0.62 0.75 0.67 4
|
||||
"""
|
||||
```
|
||||
|
||||
#### Semantic Similarity
|
||||
```python
|
||||
from sentence_transformers import SentenceTransformer, util
|
||||
|
||||
def calculate_semantic_similarity(
|
||||
generated: str,
|
||||
reference: str,
|
||||
model_name: str = "all-MiniLM-L6-v2"
|
||||
) -> float:
|
||||
"""Calculate semantic similarity between generated and reference text"""
|
||||
model = SentenceTransformer(model_name)
|
||||
|
||||
embeddings = model.encode([generated, reference], convert_to_tensor=True)
|
||||
similarity = util.pytorch_cos_sim(embeddings[0], embeddings[1])
|
||||
|
||||
return similarity.item()
|
||||
|
||||
# Example
|
||||
generated = "Our premium plan costs $49 per month."
|
||||
reference = "The premium subscription is $49/month."
|
||||
similarity = calculate_semantic_similarity(generated, reference)
|
||||
# => 0.87 (high similarity)
|
||||
```
|
||||
|
||||
#### BLEU Score (Text Generation Quality)
|
||||
```python
|
||||
from nltk.translate.bleu_score import sentence_bleu
|
||||
|
||||
def calculate_bleu(generated: str, reference: str) -> float:
|
||||
"""Calculate BLEU score"""
|
||||
reference_tokens = [reference.split()]
|
||||
generated_tokens = generated.split()
|
||||
|
||||
return sentence_bleu(reference_tokens, generated_tokens)
|
||||
|
||||
# Example
|
||||
generated = "The product costs forty nine dollars"
|
||||
reference = "The product costs $49"
|
||||
bleu = calculate_bleu(generated, reference)
|
||||
# => 0.45
|
||||
```
|
||||
|
||||
### 2. Performance Metrics
|
||||
|
||||
#### Latency (Response Time)
|
||||
```python
|
||||
import time
|
||||
from typing import Dict, List
|
||||
|
||||
def measure_latency(test_cases: List[Dict]) -> Dict:
|
||||
"""Measure latency for each node and total"""
|
||||
results = {
|
||||
"total": [],
|
||||
"by_node": {}
|
||||
}
|
||||
|
||||
for case in test_cases:
|
||||
start_time = time.time()
|
||||
|
||||
# Measurement by node
|
||||
node_times = {}
|
||||
|
||||
# Node 1: analyze_intent
|
||||
node_start = time.time()
|
||||
analyze_result = analyze_intent(case["input"])
|
||||
node_times["analyze_intent"] = time.time() - node_start
|
||||
|
||||
# Node 2: retrieve_context
|
||||
node_start = time.time()
|
||||
context = retrieve_context(analyze_result)
|
||||
node_times["retrieve_context"] = time.time() - node_start
|
||||
|
||||
# Node 3: generate_response
|
||||
node_start = time.time()
|
||||
response = generate_response(context, case["input"])
|
||||
node_times["generate_response"] = time.time() - node_start
|
||||
|
||||
total_time = time.time() - start_time
|
||||
|
||||
results["total"].append(total_time)
|
||||
for node, duration in node_times.items():
|
||||
if node not in results["by_node"]:
|
||||
results["by_node"][node] = []
|
||||
results["by_node"][node].append(duration)
|
||||
|
||||
# Statistical calculation
|
||||
import numpy as np
|
||||
summary = {
|
||||
"total": {
|
||||
"mean": np.mean(results["total"]),
|
||||
"p50": np.percentile(results["total"], 50),
|
||||
"p95": np.percentile(results["total"], 95),
|
||||
"p99": np.percentile(results["total"], 99),
|
||||
}
|
||||
}
|
||||
|
||||
for node, times in results["by_node"].items():
|
||||
summary[node] = {
|
||||
"mean": np.mean(times),
|
||||
"p50": np.percentile(times, 50),
|
||||
"p95": np.percentile(times, 95),
|
||||
}
|
||||
|
||||
return summary
|
||||
|
||||
# Usage example
|
||||
latency_results = measure_latency(test_cases)
|
||||
print(f"Mean latency: {latency_results['total']['mean']:.2f}s")
|
||||
print(f"P95 latency: {latency_results['total']['p95']:.2f}s")
|
||||
```
|
||||
|
||||
#### Throughput
|
||||
```python
|
||||
import concurrent.futures
import time
|
||||
from typing import List, Dict
|
||||
|
||||
def measure_throughput(
|
||||
test_cases: List[Dict],
|
||||
max_workers: int = 10,
|
||||
duration_seconds: int = 60
|
||||
) -> Dict:
|
||||
"""Measure number of requests processed within a given time"""
|
||||
start_time = time.time()
|
||||
completed = 0
|
||||
errors = 0
|
||||
|
||||
def process_case(case):
|
||||
try:
|
||||
result = run_langgraph_app(case["input"])
|
||||
return True
|
||||
except Exception:
|
||||
return False
|
||||
|
||||
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        while time.time() - start_time < duration_seconds:
            # Submit one pass over the test cases, then collect results.
            # (Calling future.result() right after submit would serialize the work.)
            futures = []
            for case in test_cases:
                if time.time() - start_time >= duration_seconds:
                    break
                futures.append(executor.submit(process_case, case))

            for future in concurrent.futures.as_completed(futures):
                if future.result():
                    completed += 1
                else:
                    errors += 1
|
||||
|
||||
elapsed = time.time() - start_time
|
||||
|
||||
return {
|
||||
"completed": completed,
|
||||
"errors": errors,
|
||||
"elapsed": elapsed,
|
||||
"throughput": completed / elapsed, # requests per second
|
||||
"error_rate": errors / (completed + errors) if (completed + errors) > 0 else 0
|
||||
}
|
||||
|
||||
# Usage example
|
||||
throughput = measure_throughput(test_cases, max_workers=5, duration_seconds=30)
|
||||
print(f"Throughput: {throughput['throughput']:.2f} req/s")
|
||||
print(f"Error rate: {throughput['error_rate']*100:.2f}%")
|
||||
```
|
||||
|
||||
### 3. Cost Metrics
|
||||
|
||||
#### Token Usage and Cost
|
||||
```python
|
||||
from typing import Dict
|
||||
|
||||
# Pricing table by model (as of November 2024)
|
||||
PRICING = {
|
||||
"claude-3-5-sonnet-20241022": {
|
||||
"input": 3.0 / 1_000_000, # $3.00 per 1M input tokens
|
||||
"output": 15.0 / 1_000_000, # $15.00 per 1M output tokens
|
||||
},
|
||||
"claude-3-5-haiku-20241022": {
|
||||
"input": 0.8 / 1_000_000, # $0.80 per 1M input tokens
|
||||
"output": 4.0 / 1_000_000, # $4.00 per 1M output tokens
|
||||
}
|
||||
}
|
||||
|
||||
def calculate_cost(token_usage: Dict, model: str) -> Dict:
|
||||
"""Calculate cost from token usage"""
|
||||
pricing = PRICING.get(model, PRICING["claude-3-5-sonnet-20241022"])
|
||||
|
||||
input_cost = token_usage["input_tokens"] * pricing["input"]
|
||||
output_cost = token_usage["output_tokens"] * pricing["output"]
|
||||
total_cost = input_cost + output_cost
|
||||
|
||||
return {
|
||||
"input_tokens": token_usage["input_tokens"],
|
||||
"output_tokens": token_usage["output_tokens"],
|
||||
"total_tokens": token_usage["input_tokens"] + token_usage["output_tokens"],
|
||||
"input_cost": input_cost,
|
||||
"output_cost": output_cost,
|
||||
"total_cost": total_cost,
|
||||
"cost_breakdown": {
|
||||
"input_pct": (input_cost / total_cost * 100) if total_cost > 0 else 0,
|
||||
"output_pct": (output_cost / total_cost * 100) if total_cost > 0 else 0
|
||||
}
|
||||
}
|
||||
|
||||
# Usage example
|
||||
token_usage = {"input_tokens": 1500, "output_tokens": 800}
|
||||
cost = calculate_cost(token_usage, "claude-3-5-sonnet-20241022")
|
||||
print(f"Total cost: ${cost['total_cost']:.4f}")
|
||||
print(f"Input: ${cost['input_cost']:.4f} ({cost['cost_breakdown']['input_pct']:.1f}%)")
|
||||
print(f"Output: ${cost['output_cost']:.4f} ({cost['cost_breakdown']['output_pct']:.1f}%)")
|
||||
```
|
||||
|
||||
#### Cost per Request
|
||||
```python
|
||||
from typing import Dict, List


def calculate_cost_per_request(
|
||||
test_results: List[Dict],
|
||||
model: str
|
||||
) -> Dict:
|
||||
"""Calculate cost per request"""
|
||||
total_cost = 0
|
||||
total_input_tokens = 0
|
||||
total_output_tokens = 0
|
||||
|
||||
for result in test_results:
|
||||
cost = calculate_cost(result["token_usage"], model)
|
||||
total_cost += cost["total_cost"]
|
||||
total_input_tokens += result["token_usage"]["input_tokens"]
|
||||
total_output_tokens += result["token_usage"]["output_tokens"]
|
||||
|
||||
num_requests = len(test_results)
|
||||
|
||||
return {
|
||||
"total_requests": num_requests,
|
||||
"total_cost": total_cost,
|
||||
"cost_per_request": total_cost / num_requests,
|
||||
"avg_input_tokens": total_input_tokens / num_requests,
|
||||
"avg_output_tokens": total_output_tokens / num_requests,
|
||||
"total_tokens": total_input_tokens + total_output_tokens
|
||||
}
|
||||
```
|
||||
|
||||
### 4. Reliability Metrics
|
||||
|
||||
#### Error Rate
|
||||
```python
|
||||
from typing import Dict, List


def calculate_error_rate(results: List[Dict]) -> Dict:
|
||||
"""Analyze error rate and error types"""
|
||||
total = len(results)
|
||||
errors = [r for r in results if r.get("error")]
|
||||
|
||||
error_types = {}
|
||||
for error in errors:
|
||||
error_type = error["error"]["type"]
|
||||
if error_type not in error_types:
|
||||
error_types[error_type] = 0
|
||||
error_types[error_type] += 1
|
||||
|
||||
return {
|
||||
"total_requests": total,
|
||||
"total_errors": len(errors),
|
||||
"error_rate": len(errors) / total if total > 0 else 0,
|
||||
"error_types": error_types,
|
||||
"success_rate": (total - len(errors)) / total if total > 0 else 0
|
||||
}
|
||||
```
|
||||
|
||||
#### Retry Rate
|
||||
```python
|
||||
from typing import Dict, List


def calculate_retry_rate(results: List[Dict]) -> Dict:
|
||||
"""Proportion of cases that required retries"""
|
||||
total = len(results)
|
||||
retried = [r for r in results if r.get("retry_count", 0) > 0]
|
||||
|
||||
return {
|
||||
"total_requests": total,
|
||||
"retried_requests": len(retried),
|
||||
"retry_rate": len(retried) / total if total > 0 else 0,
|
||||
"avg_retries": sum(r.get("retry_count", 0) for r in retried) / len(retried) if retried else 0
|
||||
}
|
||||
```
|
||||
|
||||
## 📋 Related Documentation
|
||||
|
||||
- [Test Case Design](./evaluation_testcases.md) - Test case structure and coverage
|
||||
- [Statistical Significance Testing](./evaluation_statistics.md) - Multiple runs and statistical analysis
|
||||
- [Evaluation Best Practices](./evaluation_practices.md) - Consistency, visualization, reporting
|
||||
skills/fine-tune/evaluation_practices.md (new file, 324 lines)

# Evaluation Best Practices
|
||||
|
||||
Practical guidelines for effective evaluation of LangGraph applications.
|
||||
|
||||
## 🎯 Evaluation Best Practices
|
||||
|
||||
### 1. Ensuring Consistency
|
||||
|
||||
#### Evaluation Under Same Conditions
|
||||
|
||||
```python
|
||||
import json
from typing import Dict, List


class EvaluationConfig:
|
||||
"""Fix evaluation settings to ensure consistency"""
|
||||
|
||||
def __init__(self):
|
||||
self.test_cases_path = "tests/evaluation/test_cases.json"
|
||||
self.seed = 42 # For reproducibility
|
||||
self.iterations = 5
|
||||
self.timeout = 30 # seconds
|
||||
self.model = "claude-3-5-sonnet-20241022"
|
||||
|
||||
def load_test_cases(self) -> List[Dict]:
|
||||
"""Load the same test cases"""
|
||||
with open(self.test_cases_path) as f:
|
||||
data = json.load(f)
|
||||
return data["test_cases"]
|
||||
|
||||
# Usage
|
||||
config = EvaluationConfig()
|
||||
test_cases = config.load_test_cases()
|
||||
# Use the same test cases for all evaluations
|
||||
```
|
||||
|
||||
### 2. Staged Evaluation
|
||||
|
||||
#### Start Small and Gradually Expand
|
||||
|
||||
```python
|
||||
# Phase 1: Quick check (3 cases, 1 iteration)
|
||||
quick_results = evaluate(test_cases[:3], iterations=1)
|
||||
|
||||
if quick_results["accuracy"] > baseline["accuracy"]:
|
||||
# Phase 2: Medium check (10 cases, 3 iterations)
|
||||
medium_results = evaluate(test_cases[:10], iterations=3)
|
||||
|
||||
if medium_results["accuracy"] > baseline["accuracy"]:
|
||||
# Phase 3: Full evaluation (all cases, 5 iterations)
|
||||
full_results = evaluate(test_cases, iterations=5)
|
||||
```
|
||||
|
||||
### 3. Recording Evaluation Results
|
||||
|
||||
#### Structured Logging
|
||||
|
||||
```python
|
||||
import json
from typing import Dict
|
||||
from datetime import datetime
|
||||
from pathlib import Path
|
||||
|
||||
def save_evaluation_result(
|
||||
results: Dict,
|
||||
version: str,
|
||||
output_dir: Path = Path("evaluation_results")
|
||||
):
|
||||
"""Save evaluation results"""
|
||||
output_dir.mkdir(exist_ok=True)
|
||||
|
||||
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
|
||||
filename = f"{version}_{timestamp}.json"
|
||||
|
||||
full_results = {
|
||||
"version": version,
|
||||
"timestamp": timestamp,
|
||||
"metrics": results,
|
||||
"config": {
|
||||
"model": "claude-3-5-sonnet-20241022",
|
||||
"test_cases": len(test_cases),
|
||||
"iterations": 5
|
||||
}
|
||||
}
|
||||
|
||||
with open(output_dir / filename, "w") as f:
|
||||
json.dump(full_results, f, indent=2)
|
||||
|
||||
print(f"Results saved to: {output_dir / filename}")
|
||||
|
||||
# Usage
|
||||
save_evaluation_result(results, version="baseline")
|
||||
save_evaluation_result(results, version="iteration_1")
|
||||
```
|
||||
|
||||
### 4. Visualization
|
||||
|
||||
#### Visualizing Results
|
||||
|
||||
```python
|
||||
from typing import Dict, List

import matplotlib.pyplot as plt
|
||||
|
||||
def visualize_improvement(
|
||||
baseline: Dict,
|
||||
iterations: List[Dict],
|
||||
metrics: List[str] = ["accuracy", "latency", "cost"]
|
||||
):
|
||||
"""Visualize improvement progress"""
|
||||
fig, axes = plt.subplots(1, len(metrics), figsize=(15, 5))
|
||||
|
||||
for idx, metric in enumerate(metrics):
|
||||
ax = axes[idx]
|
||||
|
||||
# Prepare data
|
||||
x = ["Baseline"] + [f"Iter {i+1}" for i in range(len(iterations))]
|
||||
y = [baseline[metric]] + [it[metric] for it in iterations]
|
||||
|
||||
# Plot
|
||||
ax.plot(x, y, marker='o', linewidth=2)
|
||||
ax.set_title(f"{metric.capitalize()} Progress")
|
||||
ax.set_ylabel(metric.capitalize())
|
||||
ax.grid(True, alpha=0.3)
|
||||
|
||||
# Goal line
|
||||
if metric in baseline.get("goals", {}):
|
||||
goal = baseline["goals"][metric]
|
||||
ax.axhline(y=goal, color='r', linestyle='--', label='Goal')
|
||||
ax.legend()
|
||||
|
||||
plt.tight_layout()
|
||||
plt.savefig("evaluation_results/improvement_progress.png")
|
||||
print("Visualization saved to: evaluation_results/improvement_progress.png")
|
||||
```
|
||||
|
||||
## 📋 Evaluation Report Template
|
||||
|
||||
### Standard Report Format
|
||||
|
||||
```markdown
|
||||
# Evaluation Report - [Version/Iteration]
|
||||
|
||||
Execution Date: 2024-11-24 12:00:00
|
||||
Executed by: Claude Code (fine-tune skill)
|
||||
|
||||
## Configuration
|
||||
|
||||
- **Model**: claude-3-5-sonnet-20241022
|
||||
- **Number of Test Cases**: 20
|
||||
- **Number of Runs**: 5
|
||||
- **Evaluation Duration**: 10 minutes
|
||||
|
||||
## Results Summary
|
||||
|
||||
| Metric | Mean | Std Dev | 95% CI | Goal | Achievement |
|
||||
|--------|------|---------|--------|------|-------------|
|
||||
| Accuracy | 86.0% | 2.1% | [83.9%, 88.1%] | 90.0% | 95.6% |
|
||||
| Latency | 2.4s | 0.3s | [2.1s, 2.7s] | 2.0s | 83.3% |
|
||||
| Cost | $0.014 | $0.001 | [$0.013, $0.015] | $0.010 | 71.4% |
|
||||
|
||||
## Detailed Analysis
|
||||
|
||||
### Accuracy
|
||||
- **Improvement**: +11.0% (75.0% → 86.0%)
|
||||
- **Statistical Significance**: p < 0.01 ✅
|
||||
- **Effect Size**: Cohen's d = 2.3 (large)
|
||||
|
||||
### Latency
|
||||
- **Improvement**: -0.1s (2.5s → 2.4s)
|
||||
- **Statistical Significance**: p = 0.12 ❌ (not significant)
|
||||
- **Effect Size**: Cohen's d = 0.3 (small)
|
||||
|
||||
## Error Analysis
|
||||
|
||||
- **Total Errors**: 0
|
||||
- **Error Rate**: 0.0%
|
||||
- **Retry Rate**: 0.0%
|
||||
|
||||
## Next Actions
|
||||
|
||||
1. ✅ Accuracy significantly improved → Continue
|
||||
2. ⚠️ Latency improvement is small → Focus in next iteration
|
||||
3. ⚠️ Cost still above target ($0.014 vs $0.010) → Consider limiting max_tokens
|
||||
```
|
||||
|
||||
## 🔍 Troubleshooting
|
||||
|
||||
### Common Problems and Solutions
|
||||
|
||||
#### 1. Large Variance in Evaluation Results
|
||||
|
||||
**Symptom**: Standard deviation > 20% of mean
|
||||
|
||||
**Causes**:
|
||||
- LLM temperature is too high
|
||||
- Test cases are uneven
|
||||
- Network latency effects
|
||||
|
||||
**Solutions**:
|
||||
```python
|
||||
# Lower temperature
|
||||
llm = ChatAnthropic(
|
||||
model="claude-3-5-sonnet-20241022",
|
||||
temperature=0.3 # Set lower
|
||||
)
|
||||
|
||||
# Increase number of runs
|
||||
iterations = 10 # 5 → 10
|
||||
|
||||
# Remove outliers
|
||||
results_clean = remove_outliers(results)
|
||||
```
|
||||
|
||||
#### 2. Evaluation Takes Too Long
|
||||
|
||||
**Symptom**: Evaluation takes over 1 hour
|
||||
|
||||
**Causes**:
|
||||
- Too many test cases
|
||||
- Not running in parallel
|
||||
- Timeout setting too long
|
||||
|
||||
**Solutions**:
|
||||
```python
|
||||
# Subset evaluation
|
||||
quick_test_cases = test_cases[:10] # First 10 cases only
|
||||
|
||||
# Parallel execution
|
||||
import concurrent.futures
|
||||
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
|
||||
futures = [executor.submit(evaluate_case, case) for case in test_cases]
|
||||
results = [f.result() for f in futures]
|
||||
|
||||
# Timeout setting
|
||||
timeout = 10 # 30s → 10s
|
||||
```
|
||||
|
||||
#### 3. No Statistical Significance
|
||||
|
||||
**Symptom**: p-value ≥ 0.05
|
||||
|
||||
**Causes**:
|
||||
- Improvement effect is small
|
||||
- Insufficient sample size
|
||||
- High data variance
|
||||
|
||||
**Solutions**:
|
||||
```python
|
||||
# Aim for larger improvements
|
||||
# - Apply multiple optimizations simultaneously
|
||||
# - Choose more effective techniques
|
||||
|
||||
# Increase sample size
|
||||
iterations = 20 # 5 → 20
|
||||
|
||||
# Reduce variance
|
||||
# - Lower temperature
|
||||
# - Stabilize evaluation environment
|
||||
```
|
||||
|
||||
## 📊 Continuous Evaluation
|
||||
|
||||
### Scheduled Evaluation
|
||||
|
||||
```yaml
|
||||
evaluation_schedule:
|
||||
daily:
|
||||
- quick_check: 3 test cases, 1 iteration
|
||||
- purpose: Detect major regressions
|
||||
|
||||
weekly:
|
||||
- medium_check: 10 test cases, 3 iterations
|
||||
- purpose: Continuous quality monitoring
|
||||
|
||||
before_release:
|
||||
- full_evaluation: all test cases, 5-10 iterations
|
||||
- purpose: Release quality assurance
|
||||
|
||||
after_major_changes:
|
||||
- comprehensive_evaluation: all test cases, 10+ iterations
|
||||
- purpose: Impact assessment of major changes
|
||||
```
|
||||
|
||||
### Automated Evaluation Pipeline
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
# continuous_evaluation.sh
|
||||
|
||||
# Daily evaluation script
|
||||
|
||||
DATE=$(date +%Y%m%d)
|
||||
RESULTS_DIR="evaluation_results/continuous/$DATE"
|
||||
mkdir -p $RESULTS_DIR
|
||||
|
||||
# Quick check
|
||||
echo "Running quick evaluation..."
|
||||
uv run python -m tests.evaluation.evaluator \
|
||||
--test-cases 3 \
|
||||
--iterations 1 \
|
||||
--output "$RESULTS_DIR/quick.json"
|
||||
|
||||
# Compare with previous results
|
||||
uv run python -m tests.evaluation.compare \
|
||||
--baseline "evaluation_results/baseline/summary.json" \
|
||||
--current "$RESULTS_DIR/quick.json" \
|
||||
--threshold 0.05
|
||||
|
||||
# Notify if regression detected
|
||||
if [ $? -ne 0 ]; then
|
||||
echo "⚠️ Regression detected! Sending notification..."
|
||||
# Notification process (Slack, Email, etc.)
|
||||
fi
|
||||
```
|
||||
|
||||
## Summary
|
||||
|
||||
For effective evaluation:
|
||||
- ✅ **Multiple Metrics**: Quality, performance, cost, reliability
|
||||
- ✅ **Statistical Validation**: Multiple runs and significance testing
|
||||
- ✅ **Consistency**: Same test cases, same conditions
|
||||
- ✅ **Visualization**: Track improvements with graphs and tables
|
||||
- ✅ **Documentation**: Record evaluation results and analysis
|
||||
|
||||
## 📋 Related Documentation
|
||||
|
||||
- [Evaluation Metrics](./evaluation_metrics.md) - Metric definitions and calculation methods
|
||||
- [Test Case Design](./evaluation_testcases.md) - Test case structure
|
||||
- [Statistical Significance](./evaluation_statistics.md) - Statistical analysis methods
|
||||
skills/fine-tune/evaluation_statistics.md (new file, 315 lines)

# Statistical Significance Testing
|
||||
|
||||
Statistical approaches and significance testing in LangGraph application evaluation.
|
||||
|
||||
## 📈 Importance of Multiple Runs
|
||||
|
||||
### Why Multiple Runs Are Necessary
|
||||
|
||||
1. **Account for Randomness**: LLM outputs have probabilistic variation
|
||||
2. **Detect Outliers**: Eliminate effects like temporary network latency
|
||||
3. **Calculate Confidence Intervals**: Determine if improvements are statistically significant
|
||||
|
||||
### Recommended Number of Runs
|
||||
|
||||
| Phase | Runs | Purpose |
|
||||
|-------|------|---------|
|
||||
| **During Development** | 3 | Quick feedback |
|
||||
| **During Evaluation** | 5 | Balanced reliability |
|
||||
| **Before Production** | 10-20 | High statistical confidence |
|
||||
|
||||
## 📊 Statistical Analysis
|
||||
|
||||
### Basic Statistical Calculations
|
||||
|
||||
```python
|
||||
from typing import Dict, List

import numpy as np
|
||||
from scipy import stats
|
||||
|
||||
def statistical_analysis(
|
||||
baseline_results: List[float],
|
||||
improved_results: List[float],
|
||||
alpha: float = 0.05
|
||||
) -> Dict:
|
||||
"""Statistical comparison of baseline and improved versions"""
|
||||
|
||||
# Basic statistics
|
||||
baseline_stats = {
|
||||
"mean": np.mean(baseline_results),
|
||||
"std": np.std(baseline_results),
|
||||
"median": np.median(baseline_results),
|
||||
"min": np.min(baseline_results),
|
||||
"max": np.max(baseline_results)
|
||||
}
|
||||
|
||||
improved_stats = {
|
||||
"mean": np.mean(improved_results),
|
||||
"std": np.std(improved_results),
|
||||
"median": np.median(improved_results),
|
||||
"min": np.min(improved_results),
|
||||
"max": np.max(improved_results)
|
||||
}
|
||||
|
||||
# Independent t-test
|
||||
t_statistic, p_value = stats.ttest_ind(improved_results, baseline_results)
|
||||
|
||||
# Effect size (Cohen's d)
|
||||
pooled_std = np.sqrt(
|
||||
((len(baseline_results) - 1) * baseline_stats["std"]**2 +
|
||||
(len(improved_results) - 1) * improved_stats["std"]**2) /
|
||||
(len(baseline_results) + len(improved_results) - 2)
|
||||
)
|
||||
cohens_d = (improved_stats["mean"] - baseline_stats["mean"]) / pooled_std
|
||||
|
||||
# Improvement percentage
|
||||
improvement_pct = (
|
||||
(improved_stats["mean"] - baseline_stats["mean"]) /
|
||||
baseline_stats["mean"] * 100
|
||||
)
|
||||
|
||||
# Confidence intervals (95%)
|
||||
ci_baseline = stats.t.interval(
|
||||
0.95,
|
||||
len(baseline_results) - 1,
|
||||
loc=baseline_stats["mean"],
|
||||
scale=stats.sem(baseline_results)
|
||||
)
|
||||
|
||||
ci_improved = stats.t.interval(
|
||||
0.95,
|
||||
len(improved_results) - 1,
|
||||
loc=improved_stats["mean"],
|
||||
scale=stats.sem(improved_results)
|
||||
)
|
||||
|
||||
# Determine statistical significance
|
||||
is_significant = p_value < alpha
|
||||
|
||||
# Interpret effect size
|
||||
effect_size_interpretation = (
|
||||
"small" if abs(cohens_d) < 0.5 else
|
||||
"medium" if abs(cohens_d) < 0.8 else
|
||||
"large"
|
||||
)
|
||||
|
||||
return {
|
||||
"baseline": baseline_stats,
|
||||
"improved": improved_stats,
|
||||
"comparison": {
|
||||
"improvement_pct": improvement_pct,
|
||||
"t_statistic": t_statistic,
|
||||
"p_value": p_value,
|
||||
"is_significant": is_significant,
|
||||
"cohens_d": cohens_d,
|
||||
"effect_size": effect_size_interpretation
|
||||
},
|
||||
"confidence_intervals": {
|
||||
"baseline": ci_baseline,
|
||||
"improved": ci_improved
|
||||
}
|
||||
}
|
||||
|
||||
# Usage example
|
||||
baseline_accuracy = [73.0, 75.0, 77.0, 74.0, 76.0] # 5 run results
|
||||
improved_accuracy = [85.0, 87.0, 86.0, 88.0, 84.0] # 5 run results after improvement
|
||||
|
||||
analysis = statistical_analysis(baseline_accuracy, improved_accuracy)
|
||||
print(f"Improvement: {analysis['comparison']['improvement_pct']:.1f}%")
|
||||
print(f"P-value: {analysis['comparison']['p_value']:.4f}")
|
||||
print(f"Significant: {analysis['comparison']['is_significant']}")
|
||||
print(f"Effect size: {analysis['comparison']['effect_size']}")
|
||||
```
|
||||
|
||||
## 🎯 Interpreting Statistical Significance
|
||||
|
||||
### P-value Interpretation
|
||||
|
||||
| P-value | Interpretation | Action |
|
||||
|---------|---------------|--------|
|
||||
| p < 0.01 | **Highly significant** | Adopt improvement with confidence |
|
||||
| p < 0.05 | **Significant** | Can adopt as improvement |
|
||||
| p < 0.10 | **Marginally significant** | Consider additional validation |
|
||||
| p ≥ 0.10 | **Not significant** | Conclude there is no demonstrated improvement |
|
||||
|
||||
### Effect Size (Cohen's d) Interpretation
|
||||
|
||||
| Cohen's d | Effect Size | Meaning |
|
||||
|-----------|------------|---------|
|
||||
| d < 0.2 | **Negligible** | No substantial improvement |
|
||||
| 0.2 ≤ d < 0.5 | **Small** | Slight improvement |
|
||||
| 0.5 ≤ d < 0.8 | **Medium** | Clear improvement |
|
||||
| d ≥ 0.8 | **Large** | Significant improvement |
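
One way to turn these two tables into a single decision is a small helper that simply encodes the thresholds above. A minimal sketch:

```python
def interpret_comparison(p_value: float, cohens_d: float, alpha: float = 0.05) -> str:
    """Map a p-value and Cohen's d to a recommendation, following the tables above."""
    if p_value >= 0.10:
        return "Not significant: no demonstrated improvement, do not adopt"
    if p_value >= alpha:
        return "Marginally significant: run additional validation"
    if abs(cohens_d) < 0.2:
        return "Significant but negligible effect: likely not worth adopting"
    if abs(cohens_d) < 0.5:
        return "Significant with small effect: adopt if the change is low risk"
    return "Significant with medium/large effect: adopt the improvement"
```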
|
||||
|
||||
## 📉 Outlier Detection and Handling
|
||||
|
||||
### Outlier Detection
|
||||
|
||||
```python
|
||||
def detect_outliers(data: List[float], method: str = "iqr") -> List[int]:
|
||||
"""Detect outlier indices"""
|
||||
data_array = np.array(data)
|
||||
|
||||
if method == "iqr":
|
||||
# IQR method (Interquartile Range)
|
||||
q1 = np.percentile(data_array, 25)
|
||||
q3 = np.percentile(data_array, 75)
|
||||
iqr = q3 - q1
|
||||
lower_bound = q1 - 1.5 * iqr
|
||||
upper_bound = q3 + 1.5 * iqr
|
||||
|
||||
outliers = [
|
||||
i for i, val in enumerate(data)
|
||||
if val < lower_bound or val > upper_bound
|
||||
]
|
||||
|
||||
elif method == "zscore":
|
||||
# Z-score method
|
||||
mean = np.mean(data_array)
|
||||
std = np.std(data_array)
|
||||
z_scores = np.abs((data_array - mean) / std)
|
||||
|
||||
        outliers = [i for i, z in enumerate(z_scores) if z > 3]

    else:
        raise ValueError(f"Unknown outlier detection method: {method}")

    return outliers
|
||||
|
||||
# Usage example
|
||||
results = [75.0, 76.0, 74.0, 77.0, 95.0] # 95.0 may be an outlier
|
||||
outliers = detect_outliers(results, method="iqr")
|
||||
print(f"Outlier indices: {outliers}") # => [4]
|
||||
```
|
||||
|
||||
### Outlier Handling Policy
|
||||
|
||||
1. **Investigation**: Identify why outliers occurred
|
||||
2. **Removal Decision**:
|
||||
- Clear errors (network failure, etc.) → Remove
|
||||
- Actual performance variation → Keep
|
||||
3. **Documentation**: Document the cause and handling of each outlier (see the sketch below)
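
A sketch of the policy in practice, using `results` and `outliers` from the previous example: drop only outliers with a documented cause, and keep (but log) the rest as real variation.

```python
from typing import Dict, List

def remove_documented_outliers(
    data: List[float],
    outlier_indices: List[int],
    reasons: Dict[int, str]
) -> List[float]:
    """Remove only outliers that have a documented cause; keep undocumented ones."""
    kept = []
    for i, value in enumerate(data):
        if i in outlier_indices and i in reasons:
            print(f"Removed index {i} ({value}): {reasons[i]}")
            continue
        if i in outlier_indices:
            print(f"Kept index {i} ({value}): no documented cause, treated as real variation")
        kept.append(value)
    return kept

# Usage example (the reason string is hypothetical)
cleaned = remove_documented_outliers(results, outliers, {4: "network failure during run"})
```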
|
||||
|
||||
## 🔄 Considerations for Repeated Measurements
|
||||
|
||||
### Sample Size Calculation
|
||||
|
||||
```python
|
||||
|
||||
|
||||
def required_sample_size(
|
||||
baseline_mean: float,
|
||||
baseline_std: float,
|
||||
expected_improvement_pct: float,
|
||||
alpha: float = 0.05,
|
||||
power: float = 0.8
|
||||
) -> int:
|
||||
"""Estimate required sample size"""
|
||||
improved_mean = baseline_mean * (1 + expected_improvement_pct / 100)
|
||||
|
||||
# Calculate effect size
|
||||
effect_size = abs(improved_mean - baseline_mean) / baseline_std
|
||||
|
||||
# Simple estimation (use statsmodels.stats.power for more accuracy)
|
||||
if effect_size < 0.2:
|
||||
return 100 # Small effect requires many samples
|
||||
elif effect_size < 0.5:
|
||||
return 50
|
||||
elif effect_size < 0.8:
|
||||
return 30
|
||||
else:
|
||||
return 20
|
||||
|
||||
# Usage example
|
||||
sample_size = required_sample_size(
|
||||
baseline_mean=75.0,
|
||||
baseline_std=3.0,
|
||||
expected_improvement_pct=10.0
|
||||
)
|
||||
print(f"Required sample size: {sample_size}")
|
||||
```
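
The buckets above are coarse by design. When a more precise number is needed and statsmodels is available, a standard power analysis can be used instead; a sketch under that assumption:

```python
import math

from statsmodels.stats.power import TTestIndPower

def required_sample_size_power(
    baseline_mean: float,
    baseline_std: float,
    expected_improvement_pct: float,
    alpha: float = 0.05,
    power: float = 0.8
) -> int:
    """Per-group sample size from a two-sample t-test power analysis."""
    improved_mean = baseline_mean * (1 + expected_improvement_pct / 100)
    effect_size = abs(improved_mean - baseline_mean) / baseline_std

    n = TTestIndPower().solve_power(
        effect_size=effect_size,
        alpha=alpha,
        power=power,
        alternative="two-sided"
    )
    return math.ceil(n)

# Same inputs as above (effect size is about 2.5), so only a few runs per group are needed
print(required_sample_size_power(75.0, 3.0, 10.0))
```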
|
||||
|
||||
## 📊 Visualizing Confidence Intervals
|
||||
|
||||
```python
|
||||
import matplotlib.pyplot as plt
|
||||
|
||||
def plot_confidence_intervals(
|
||||
baseline_results: List[float],
|
||||
improved_results: List[float],
|
||||
labels: List[str] = ["Baseline", "Improved"]
|
||||
):
|
||||
"""Plot confidence intervals"""
|
||||
fig, ax = plt.subplots(figsize=(10, 6))
|
||||
|
||||
# Statistical calculations
|
||||
baseline_mean = np.mean(baseline_results)
|
||||
baseline_ci = stats.t.interval(
|
||||
0.95,
|
||||
len(baseline_results) - 1,
|
||||
loc=baseline_mean,
|
||||
scale=stats.sem(baseline_results)
|
||||
)
|
||||
|
||||
improved_mean = np.mean(improved_results)
|
||||
improved_ci = stats.t.interval(
|
||||
0.95,
|
||||
len(improved_results) - 1,
|
||||
loc=improved_mean,
|
||||
scale=stats.sem(improved_results)
|
||||
)
|
||||
|
||||
# Plot
|
||||
positions = [1, 2]
|
||||
means = [baseline_mean, improved_mean]
|
||||
cis = [
|
||||
(baseline_mean - baseline_ci[0], baseline_ci[1] - baseline_mean),
|
||||
(improved_mean - improved_ci[0], improved_ci[1] - improved_mean)
|
||||
]
|
||||
|
||||
ax.errorbar(positions, means, yerr=np.array(cis).T, fmt='o', markersize=10, capsize=10)
|
||||
ax.set_xticks(positions)
|
||||
ax.set_xticklabels(labels)
|
||||
ax.set_ylabel("Metric Value")
|
||||
ax.set_title("Comparison with 95% Confidence Intervals")
|
||||
ax.grid(True, alpha=0.3)
|
||||
|
||||
plt.tight_layout()
|
||||
plt.savefig("confidence_intervals.png")
|
||||
print("Plot saved: confidence_intervals.png")
|
||||
```
|
||||
|
||||
## 📋 Statistical Report Template
|
||||
|
||||
```markdown
|
||||
## Statistical Analysis Results
|
||||
|
||||
### Basic Statistics
|
||||
|
||||
| Metric | Baseline | Improved | Improvement |
|
||||
|--------|----------|----------|-------------|
|
||||
| Mean | 75.0% | 86.0% | +11.0% |
|
||||
| Std Dev | 3.2% | 2.1% | -1.1% |
|
||||
| Median | 75.0% | 86.0% | +11.0% |
|
||||
| Min | 70.0% | 84.0% | +14.0% |
|
||||
| Max | 80.0% | 88.0% | +8.0% |
|
||||
|
||||
### Statistical Tests
|
||||
|
||||
- **t-statistic**: 8.45
|
||||
- **P-value**: 0.0001 (p < 0.01)
|
||||
- **Statistical Significance**: ✅ Highly significant
|
||||
- **Effect Size (Cohen's d)**: 2.3 (large)
|
||||
|
||||
### Confidence Intervals (95%)
|
||||
|
||||
- **Baseline**: [72.8%, 77.2%]
|
||||
- **Improved**: [84.9%, 87.1%]
|
||||
|
||||
### Conclusion
|
||||
|
||||
The improvement is statistically highly significant (p < 0.01), with a large effect size (Cohen's d = 2.3).
|
||||
The confidence intervals do not overlap, which provides strong additional evidence that the improvement is real.
|
||||
```
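
A sketch of filling part of this template programmatically from the `statistical_analysis()` output above (only the statistical tests section is shown; the other sections follow the same pattern):

```python
def render_statistics_section(analysis: Dict) -> str:
    """Format the 'Statistical Tests' part of the report template."""
    comp = analysis["comparison"]
    if comp["p_value"] < 0.01:
        significance = "✅ Highly significant"
    elif comp["is_significant"]:
        significance = "✅ Significant"
    else:
        significance = "❌ Not significant"

    return (
        "### Statistical Tests\n\n"
        f"- **t-statistic**: {comp['t_statistic']:.2f}\n"
        f"- **P-value**: {comp['p_value']:.4f}\n"
        f"- **Statistical Significance**: {significance}\n"
        f"- **Effect Size (Cohen's d)**: {comp['cohens_d']:.1f} ({comp['effect_size']})\n"
    )
```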
|
||||
|
||||
## 📋 Related Documentation
|
||||
|
||||
- [Evaluation Metrics](./evaluation_metrics.md) - Metric definitions and calculation methods
|
||||
- [Test Case Design](./evaluation_testcases.md) - Test case structure
|
||||
- [Best Practices](./evaluation_practices.md) - Practical evaluation guide
|
||||
279
skills/fine-tune/evaluation_testcases.md
Normal file
@@ -0,0 +1,279 @@
|
||||
# Test Case Design
|
||||
|
||||
Structure, coverage, and design principles for test cases used in LangGraph application evaluation.
|
||||
|
||||
## 🧪 Test Case Structure
|
||||
|
||||
### Representative Test Case Structure
|
||||
|
||||
```json
|
||||
{
|
||||
"test_cases": [
|
||||
{
|
||||
"id": "TC001",
|
||||
"category": "product_inquiry",
|
||||
"difficulty": "easy",
|
||||
"input": "How much does the premium plan cost?",
|
||||
"expected_intent": "product_inquiry",
|
||||
"expected_answer": "The premium plan costs $49 per month.",
|
||||
"expected_answer_semantic": ["premium", "plan", "$49", "month"],
|
||||
"metadata": {
|
||||
"user_type": "new",
|
||||
"context_required": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"id": "TC002",
|
||||
"category": "technical_support",
|
||||
"difficulty": "medium",
|
||||
"input": "I can't seem to log into my account even after resetting my password",
|
||||
"expected_intent": "technical_support",
|
||||
"expected_answer": "Let me help you troubleshoot the login issue. First, please clear your browser cache and cookies, then try logging in again.",
|
||||
"expected_answer_semantic": ["troubleshoot", "clear cache", "cookies", "try again"],
|
||||
"metadata": {
|
||||
"user_type": "existing",
|
||||
"context_required": true,
|
||||
"requires_escalation": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"id": "TC003",
|
||||
"category": "edge_case",
|
||||
"difficulty": "hard",
|
||||
"input": "yo whats the deal with my bill being so high lol",
|
||||
"expected_intent": "billing",
|
||||
"expected_answer": "I understand you have concerns about your bill. Let me review your account to identify any unexpected charges.",
|
||||
"expected_answer_semantic": ["concerns", "bill", "review", "charges"],
|
||||
"metadata": {
|
||||
"user_type": "existing",
|
||||
"context_required": true,
|
||||
"tone": "informal",
|
||||
"requires_empathy": true
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
## 📊 Test Case Coverage
|
||||
|
||||
### Balance by Category
|
||||
|
||||
```python
|
||||
def analyze_test_coverage(test_cases: List[Dict]) -> Dict:
|
||||
"""Analyze test case coverage"""
|
||||
categories = {}
|
||||
difficulties = {}
|
||||
|
||||
for case in test_cases:
|
||||
# Category
|
||||
cat = case.get("category", "unknown")
|
||||
categories[cat] = categories.get(cat, 0) + 1
|
||||
|
||||
# Difficulty
|
||||
diff = case.get("difficulty", "unknown")
|
||||
difficulties[diff] = difficulties.get(diff, 0) + 1
|
||||
|
||||
total = len(test_cases)
|
||||
|
||||
return {
|
||||
"total_cases": total,
|
||||
"by_category": {
|
||||
cat: {"count": count, "percentage": count/total*100}
|
||||
for cat, count in categories.items()
|
||||
},
|
||||
"by_difficulty": {
|
||||
diff: {"count": count, "percentage": count/total*100}
|
||||
for diff, count in difficulties.items()
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Recommended Balance
|
||||
|
||||
```yaml
|
||||
category_balance:
|
||||
description: "Recommended distribution by category"
|
||||
recommendations:
|
||||
- main_categories: "20-30% (evenly distributed)"
|
||||
- edge_cases: "10-15% (sufficient abnormal case coverage)"
|
||||
|
||||
difficulty_balance:
|
||||
description: "Recommended distribution by difficulty"
|
||||
recommendations:
|
||||
- easy: "40-50% (basic functionality verification)"
|
||||
- medium: "30-40% (practical cases)"
|
||||
- hard: "10-20% (edge cases and complex scenarios)"
|
||||
```
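
These targets can be checked automatically. A minimal sketch that reuses `analyze_test_coverage` from above (the numeric ranges mirror the YAML and should be adapted per project):

```python
# Recommended difficulty ranges from the YAML above, as (min_pct, max_pct)
DIFFICULTY_RANGES = {"easy": (40, 50), "medium": (30, 40), "hard": (10, 20)}

def check_difficulty_balance(test_cases: List[Dict]) -> List[str]:
    """Return warnings for difficulties that fall outside the recommended ranges."""
    warnings = []
    coverage = analyze_test_coverage(test_cases)  # defined earlier in this document
    for difficulty, (low, high) in DIFFICULTY_RANGES.items():
        pct = coverage["by_difficulty"].get(difficulty, {}).get("percentage", 0.0)
        if not low <= pct <= high:
            warnings.append(
                f"{difficulty}: {pct:.1f}% is outside the recommended {low}-{high}% range"
            )
    return warnings
```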
|
||||
|
||||
## 🎯 Test Case Design Principles
|
||||
|
||||
### 1. Representativeness
|
||||
- **Reflect Real Use Cases**: Cover actual user input patterns
|
||||
- **Weight by Frequency**: Include more common cases
|
||||
|
||||
### 2. Diversity
|
||||
- **Comprehensive Categories**: Cover all major categories
|
||||
- **Difficulty Variation**: From easy to hard
|
||||
- **Edge Cases**: Abnormal cases, ambiguous cases, boundary values
|
||||
|
||||
### 3. Clarity
|
||||
- **Clear Expectations**: Be specific with expected_answer
|
||||
- **Explicit Criteria**: Clearly define correctness criteria
|
||||
|
||||
### 4. Maintainability
|
||||
- **ID-based Tracking**: Unique ID for each test case
|
||||
- **Rich Metadata**: Category, difficulty, and other attributes
|
||||
|
||||
## 📝 Test Case Templates
|
||||
|
||||
### Basic Template
|
||||
|
||||
```json
|
||||
{
|
||||
"id": "TC[number]",
|
||||
"category": "[category name]",
|
||||
"difficulty": "easy|medium|hard",
|
||||
"input": "[user input]",
|
||||
"expected_intent": "[expected intent]",
|
||||
"expected_answer": "[expected answer]",
|
||||
"expected_answer_semantic": ["keyword1", "keyword2"],
|
||||
"metadata": {
|
||||
"user_type": "new|existing",
|
||||
"context_required": true|false,
|
||||
"specific_flag": true|false
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Templates by Category
|
||||
|
||||
#### Product Inquiry
|
||||
```json
|
||||
{
|
||||
"id": "TC_PRODUCT_001",
|
||||
"category": "product_inquiry",
|
||||
"difficulty": "easy",
|
||||
"input": "Question about product",
|
||||
"expected_intent": "product_inquiry",
|
||||
"expected_answer": "Answer including product information",
|
||||
"metadata": {
|
||||
"product_type": "premium|basic|enterprise",
|
||||
"question_type": "pricing|features|comparison"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
#### Technical Support
|
||||
```json
|
||||
{
|
||||
"id": "TC_TECH_001",
|
||||
"category": "technical_support",
|
||||
"difficulty": "medium",
|
||||
"input": "Technical problem report",
|
||||
"expected_intent": "technical_support",
|
||||
"expected_answer": "Troubleshooting steps",
|
||||
"metadata": {
|
||||
"issue_type": "login|performance|bug",
|
||||
"requires_escalation": false,
|
||||
"urgency": "low|medium|high"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
#### Billing
|
||||
```json
|
||||
{
|
||||
"id": "TC_BILLING_001",
|
||||
"category": "billing",
|
||||
"difficulty": "medium",
|
||||
"input": "Billing question",
|
||||
"expected_intent": "billing",
|
||||
"expected_answer": "Billing explanation and next steps",
|
||||
"metadata": {
|
||||
"billing_type": "charge|refund|subscription",
|
||||
"requires_account_access": true
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
#### Edge Cases
|
||||
```json
|
||||
{
|
||||
"id": "TC_EDGE_001",
|
||||
"category": "edge_case",
|
||||
"difficulty": "hard",
|
||||
"input": "Ambiguous, non-standard, or unexpected input",
|
||||
"expected_intent": "appropriate fallback",
|
||||
"expected_answer": "Polite clarification request",
|
||||
"metadata": {
|
||||
"edge_type": "ambiguous|off_topic|malformed",
|
||||
"requires_empathy": true
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## 🔍 Test Case Evaluation
|
||||
|
||||
### Quality Checklist
|
||||
|
||||
```python
|
||||
def validate_test_case(test_case: Dict) -> List[str]:
|
||||
"""Check test case quality"""
|
||||
issues = []
|
||||
|
||||
# Check required fields
|
||||
required_fields = ["id", "category", "difficulty", "input", "expected_intent"]
|
||||
for field in required_fields:
|
||||
if field not in test_case:
|
||||
issues.append(f"Missing required field: {field}")
|
||||
|
||||
# ID uniqueness (requires separate check)
|
||||
# Input length check
|
||||
if len(test_case.get("input", "")) < 5:
|
||||
issues.append("Input too short (minimum 5 characters)")
|
||||
|
||||
# Category validity
|
||||
valid_categories = ["product_inquiry", "technical_support", "billing", "general", "edge_case"]
|
||||
if test_case.get("category") not in valid_categories:
|
||||
issues.append(f"Invalid category: {test_case.get('category')}")
|
||||
|
||||
# Difficulty validity
|
||||
valid_difficulties = ["easy", "medium", "hard"]
|
||||
if test_case.get("difficulty") not in valid_difficulties:
|
||||
issues.append(f"Invalid difficulty: {test_case.get('difficulty')}")
|
||||
|
||||
return issues
|
||||
```
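
The checklist above runs per case and, as its comment notes, cannot verify ID uniqueness. A small suite-level wrapper is one way to close that gap; a sketch:

```python
def validate_test_suite(test_cases: List[Dict]) -> List[str]:
    """Suite-level checks that validate_test_case cannot perform per case."""
    issues = []

    # ID uniqueness across the whole suite
    seen_ids = set()
    for case in test_cases:
        case_id = case.get("id", "")
        if case_id in seen_ids:
            issues.append(f"Duplicate test case ID: {case_id}")
        seen_ids.add(case_id)

    # Per-case checks, prefixed with the case ID
    for case in test_cases:
        for issue in validate_test_case(case):
            issues.append(f"{case.get('id', '?')}: {issue}")

    return issues
```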
|
||||
|
||||
## 📈 Coverage Report
|
||||
|
||||
### Coverage Analysis Script
|
||||
|
||||
```python
|
||||
def generate_coverage_report(test_cases: List[Dict]) -> str:
|
||||
"""Generate test case coverage report"""
|
||||
coverage = analyze_test_coverage(test_cases)
|
||||
|
||||
report = f"""# Test Case Coverage Report
|
||||
|
||||
## Summary
|
||||
- **Total Test Cases**: {coverage['total_cases']}
|
||||
|
||||
## By Category
|
||||
"""
|
||||
for cat, data in coverage['by_category'].items():
|
||||
report += f"- **{cat}**: {data['count']} cases ({data['percentage']:.1f}%)\n"
|
||||
|
||||
report += "\n## By Difficulty\n"
|
||||
for diff, data in coverage['by_difficulty'].items():
|
||||
report += f"- **{diff}**: {data['count']} cases ({data['percentage']:.1f}%)\n"
|
||||
|
||||
return report
|
||||
```
|
||||
|
||||
## 📋 Related Documentation
|
||||
|
||||
- [Evaluation Metrics](./evaluation_metrics.md) - Metric definitions and calculation methods
|
||||
- [Statistical Significance](./evaluation_statistics.md) - Multiple runs and statistical analysis
|
||||
- [Best Practices](./evaluation_practices.md) - Practical evaluation guide
|
||||
119
skills/fine-tune/examples.md
Normal file
@@ -0,0 +1,119 @@
|
||||
# Fine-Tuning Practical Examples Collection
|
||||
|
||||
A collection of specific code examples and markdown templates used for LangGraph application fine-tuning.
|
||||
|
||||
## 📋 Table of Contents
|
||||
|
||||
This guide is divided by Phase:
|
||||
|
||||
### [Phase 1: Preparation and Analysis Examples](./examples_phase1.md)
|
||||
Templates and code examples used in the optimization preparation phase:
|
||||
- **Example 1.1**: fine-tune.md structure example
|
||||
- **Example 1.2**: Optimization target list example
|
||||
- **Example 1.3**: Code search example with Serena MCP
|
||||
|
||||
**Estimated Time**: 30 minutes - 1 hour
|
||||
|
||||
### [Phase 2: Baseline Evaluation Examples](./examples_phase2.md)
|
||||
Scripts and report examples used for current performance measurement:
|
||||
- **Example 2.1**: Evaluation script (evaluator.py)
|
||||
- **Example 2.2**: Baseline measurement script (baseline_evaluation.sh)
|
||||
- **Example 2.3**: Baseline results report
|
||||
|
||||
**Estimated Time**: 1-2 hours
|
||||
|
||||
### [Phase 3: Iterative Improvement Examples](./examples_phase3.md)
|
||||
Practical examples of prompt optimization and result comparison:
|
||||
- **Example 3.1**: Before/After prompt comparison
|
||||
- **Example 3.2**: Prioritization matrix
|
||||
- **Example 3.3**: Iteration results report
|
||||
|
||||
**Estimated Time**: 1-2 hours per iteration × number of iterations
|
||||
|
||||
### [Phase 4: Completion and Documentation Examples](./examples_phase4.md)
|
||||
Examples of recording final results and version control:
|
||||
- **Example 4.1**: Final evaluation report (complete version)
|
||||
- **Example 4.2**: Git commit message examples
|
||||
|
||||
**Estimated Time**: 30 minutes - 1 hour
|
||||
|
||||
## 🎯 How to Use
|
||||
|
||||
### For First-Time Implementation
|
||||
|
||||
1. **Start with [Phase 1 examples](./examples_phase1.md)** - Copy and use templates
|
||||
2. **Set up [Phase 2 evaluation scripts](./examples_phase2.md)** - Customize for your environment
|
||||
3. **Iterate using [Phase 3 comparison examples](./examples_phase3.md)** - Record Before/After
|
||||
4. **Document with [Phase 4 report](./examples_phase4.md)** - Summarize final results
|
||||
|
||||
### Copy & Paste Ready
|
||||
|
||||
Each example includes complete code and templates:
|
||||
- Python scripts → Ready to execute as-is
|
||||
- Bash scripts → Set environment variables and run
|
||||
- Markdown templates → Fill in content and use
|
||||
- JSON structures → Templates for test cases and reports
|
||||
|
||||
## 📊 Types of Examples
|
||||
|
||||
### Code Scripts
|
||||
- **Evaluation scripts** (Phase 2): evaluator.py, aggregate_results.py
|
||||
- **Measurement scripts** (Phase 2): baseline_evaluation.sh
|
||||
- **Analysis scripts** (Phase 1): Serena MCP search examples
|
||||
|
||||
### Markdown Templates
|
||||
- **fine-tune.md** (Phase 1): Goal setting
|
||||
- **Optimization target list** (Phase 1): Organizing improvement targets
|
||||
- **Baseline results report** (Phase 2): Current state analysis
|
||||
- **Iteration results report** (Phase 3): Improvement effect measurement
|
||||
- **Final evaluation report** (Phase 4): Overall summary
|
||||
|
||||
### Comparison Examples
|
||||
- **Before/After prompts** (Phase 3): Specific improvement examples
|
||||
- **Prioritization matrix** (Phase 3): Decision-making records
|
||||
|
||||
## 🔍 Finding Examples
|
||||
|
||||
### By Purpose
|
||||
|
||||
| Purpose | Phase | Example |
|
||||
|---------|-------|---------|
|
||||
| Set goals | Phase 1 | [Example 1.1](./examples_phase1.md#example-11-fine-tunemd-structure-example) |
|
||||
| Find optimization targets | Phase 1 | [Example 1.3](./examples_phase1.md#example-13-code-search-example-with-serena-mcp) |
|
||||
| Create evaluation scripts | Phase 2 | [Example 2.1](./examples_phase2.md#example-21-evaluation-script) |
|
||||
| Measure baseline | Phase 2 | [Example 2.2](./examples_phase2.md#example-22-baseline-measurement-script) |
|
||||
| Improve prompts | Phase 3 | [Example 3.1](./examples_phase3.md#example-31-beforeafter-prompt-comparison) |
|
||||
| Determine priorities | Phase 3 | [Example 3.2](./examples_phase3.md#example-32-prioritization-matrix) |
|
||||
| Write final report | Phase 4 | [Example 4.1](./examples_phase4.md#example-41-final-evaluation-report) |
|
||||
| Git commit | Phase 4 | [Example 4.2](./examples_phase4.md#example-42-git-commit-message-examples) |
|
||||
|
||||
## 🔗 Related Documentation
|
||||
|
||||
- **[Workflow](./workflow.md)** - Detailed procedures for each Phase
|
||||
- **[Evaluation Methods](./evaluation.md)** - Evaluation metrics and statistical analysis
|
||||
- **[Prompt Optimization](./prompt_optimization.md)** - Detailed optimization techniques
|
||||
- **[SKILL.md](./SKILL.md)** - Overview of the Fine-tune skill
|
||||
|
||||
## 💡 Tips
|
||||
|
||||
### Customization Points
|
||||
|
||||
1. **Number of test cases**: Examples use 20 cases, but adjust according to your project
|
||||
2. **Number of runs**: 3-5 runs recommended for baseline measurement, but adjust based on time constraints
|
||||
3. **Target values**: Set Accuracy, Latency, and Cost targets according to project requirements
|
||||
4. **Model**: Adjust pricing if using models other than Claude 3.5 Sonnet
|
||||
|
||||
### Frequently Asked Questions
|
||||
|
||||
**Q: Can I use the example code as-is?**
|
||||
A: Yes, it's executable once you set environment variables (API keys, etc.).
|
||||
|
||||
**Q: Can I edit the templates?**
|
||||
A: Yes, please customize freely according to your project.
|
||||
|
||||
**Q: Can I skip phases?**
|
||||
A: We recommend executing all phases on the first run. From the second run onward, you can start from Phase 2.
|
||||
|
||||
---
|
||||
|
||||
**💡 Tip**: For detailed procedures of each Phase, refer to the [Workflow](./workflow.md).
|
||||
174
skills/fine-tune/examples_phase1.md
Normal file
@@ -0,0 +1,174 @@
|
||||
# Phase 1: Preparation and Analysis Examples
|
||||
|
||||
Practical code examples and templates.
|
||||
|
||||
**📋 Related Documentation**: [Examples Home](./examples.md) | [Workflow Phase 1](./workflow_phase1.md)
|
||||
|
||||
---
|
||||
|
||||
## Phase 1: Preparation and Analysis Examples
|
||||
|
||||
### Example 1.1: fine-tune.md Structure Example
|
||||
|
||||
**File**: `.langgraph-master/fine-tune.md`
|
||||
|
||||
```markdown
|
||||
# Fine-Tuning Goals
|
||||
|
||||
## Optimization Objectives
|
||||
|
||||
- **Accuracy**: Improve user intent classification accuracy to 90% or higher
|
||||
- **Latency**: Reduce response time to 2.0 seconds or less
|
||||
- **Cost**: Reduce cost per request to $0.010 or less
|
||||
|
||||
## Evaluation Method
|
||||
|
||||
### Test Cases
|
||||
|
||||
- **Dataset**: tests/evaluation/test_cases.json (20 cases)
|
||||
- **Execution Command**: uv run python -m src.evaluate
|
||||
- **Evaluation Script**: tests/evaluation/evaluator.py
|
||||
|
||||
### Evaluation Metrics
|
||||
|
||||
#### Accuracy (Correctness Rate)
|
||||
|
||||
- **Calculation Method**: (Number of correct answers / Total cases) × 100
|
||||
- **Target Value**: 90% or higher
|
||||
|
||||
#### Latency (Response Time)
|
||||
|
||||
- **Calculation Method**: Average time of each execution
|
||||
- **Target Value**: 2.0 seconds or less
|
||||
|
||||
#### Cost
|
||||
|
||||
- **Calculation Method**: Total API cost / Total number of requests
|
||||
- **Target Value**: $0.010 or less
|
||||
|
||||
## Pass Criteria
|
||||
|
||||
All evaluation metrics must achieve their target values.
|
||||
```
|
||||
|
||||
### Example 1.2: Optimization Target List Example
|
||||
|
||||
```markdown
|
||||
# Optimization Target Nodes
|
||||
|
||||
## Node: analyze_intent
|
||||
|
||||
### Basic Information
|
||||
|
||||
- **File**: src/nodes/analyzer.py:25-45
|
||||
- **Role**: Classify user input intent
|
||||
- **LLM Model**: claude-3-5-sonnet-20241022
|
||||
- **Current Parameters**: temperature=1.0, max_tokens=default
|
||||
|
||||
### Current Prompt
|
||||
|
||||
\```python
|
||||
SystemMessage(content="You are an intent analyzer. Analyze user input.")
|
||||
HumanMessage(content=f"Analyze: {user_input}")
|
||||
\```
|
||||
|
||||
### Issues
|
||||
|
||||
1. **Ambiguous instructions**: Specific criteria for "Analyze" are unclear
|
||||
2. **No few-shot examples**: No expected output examples
|
||||
3. **Undefined output format**: Free text, not structured
|
||||
4. **High temperature**: 1.0 is too high for classification tasks
|
||||
|
||||
### Improvement Proposals
|
||||
|
||||
1. Specify concrete classification categories
|
||||
2. Add 3-5 few-shot examples
|
||||
3. Specify JSON output format
|
||||
4. Lower temperature to 0.3-0.5
|
||||
|
||||
### Estimated Improvement Effect
|
||||
|
||||
- **Accuracy**: +10-15% (Current misclassification 20% → 5-10%)
|
||||
- **Latency**: ±0 (no change)
|
||||
- **Cost**: ±0 (no change)
|
||||
|
||||
### Priority
|
||||
|
||||
⭐⭐⭐⭐⭐ (Highest priority) - Direct impact on accuracy improvement
|
||||
|
||||
---
|
||||
|
||||
## Node: generate_response
|
||||
|
||||
### Basic Information
|
||||
|
||||
- **File**: src/nodes/generator.py:45-68
|
||||
- **Role**: Generate final user-facing response
|
||||
- **LLM Model**: claude-3-5-sonnet-20241022
|
||||
- **Current Parameters**: temperature=0.7, max_tokens=default
|
||||
|
||||
### Current Prompt
|
||||
|
||||
\```python
|
||||
ChatPromptTemplate.from_messages([
|
||||
("system", "Generate helpful response based on context."),
|
||||
("human", "{context}\n\nQuestion: {question}")
|
||||
])
|
||||
\```
|
||||
|
||||
### Issues
|
||||
|
||||
1. **No redundancy control**: No instructions for conciseness
|
||||
2. **max_tokens not set**: Possibility of unnecessarily long output
|
||||
3. **Response style undefined**: No specification of tone or style
|
||||
|
||||
### Improvement Proposals
|
||||
|
||||
1. Add length instructions such as "concisely" or "in 2-3 sentences"
|
||||
2. Limit max_tokens to 500
|
||||
3. Clarify the response style ("friendly", "professional", etc.)
|
||||
|
||||
### Estimated Improvement Effect
|
||||
|
||||
- **Accuracy**: ±0 (no change)
|
||||
- **Latency**: -0.3-0.5s (due to reduced output tokens)
|
||||
- **Cost**: -20-30% (due to reduced token count)
|
||||
|
||||
### Priority
|
||||
|
||||
⭐⭐⭐ (Medium) - Improvement in latency and cost
|
||||
```
|
||||
|
||||
### Example 1.3: Code Search Example with Serena MCP
|
||||
|
||||
```python
|
||||
# Search for LLM client
|
||||
from mcp_serena import find_symbol, find_referencing_symbols
|
||||
|
||||
# Step 1: Search for ChatAnthropic usage locations
|
||||
chat_anthropic_usages = find_symbol(
|
||||
name_path="ChatAnthropic",
|
||||
substring_matching=True,
|
||||
include_body=False
|
||||
)
|
||||
|
||||
print(f"Found {len(chat_anthropic_usages)} ChatAnthropic usages")
|
||||
|
||||
# Step 2: Investigate details of each usage location
|
||||
for usage in chat_anthropic_usages:
|
||||
print(f"\nFile: {usage.relative_path}:{usage.line_start}")
|
||||
print(f"Context: {usage.name_path}")
|
||||
|
||||
# Identify prompt construction locations
|
||||
references = find_referencing_symbols(
|
||||
name_path=usage.name,
|
||||
relative_path=usage.relative_path
|
||||
)
|
||||
|
||||
# Display locations that may contain prompts
|
||||
for ref in references:
|
||||
if "message" in ref.name.lower() or "prompt" in ref.name.lower():
|
||||
print(f" - Potential prompt location: {ref.name_path}")
|
||||
```
|
||||
|
||||
---
|
||||
194
skills/fine-tune/examples_phase2.md
Normal file
@@ -0,0 +1,194 @@
|
||||
# Phase 2: Baseline Evaluation Examples
|
||||
|
||||
Examples of evaluation scripts and result reports.
|
||||
|
||||
**📋 Related Documentation**: [Examples Home](./examples.md) | [Workflow Phase 2](./workflow_phase2.md) | [Evaluation Methods](./evaluation.md)
|
||||
|
||||
---
|
||||
|
||||
## Phase 2: Baseline Evaluation Examples
|
||||
|
||||
### Example 2.1: Evaluation Script
|
||||
|
||||
**File**: `tests/evaluation/evaluator.py`
|
||||
|
||||
```python
|
||||
import argparse
import json
|
||||
import time
|
||||
from pathlib import Path
|
||||
from typing import Dict, List
|
||||
|
||||
def evaluate_test_cases(test_cases: List[Dict]) -> Dict:
|
||||
"""Evaluate test cases"""
|
||||
results = {
|
||||
"total_cases": len(test_cases),
|
||||
"correct": 0,
|
||||
"total_latency": 0.0,
|
||||
"total_cost": 0.0,
|
||||
"case_results": []
|
||||
}
|
||||
|
||||
for case in test_cases:
|
||||
start_time = time.time()
|
||||
|
||||
# Run LangGraph application
|
||||
        output = run_langgraph_app(case["input"])  # application entry point (see the sketch after this code block)
|
||||
|
||||
latency = time.time() - start_time
|
||||
|
||||
# Correctness judgment
|
||||
        is_correct = output["answer"] == case["expected_answer"]  # exact match; replace with semantic matching if needed
|
||||
if is_correct:
|
||||
results["correct"] += 1
|
||||
|
||||
# Cost calculation (from token usage)
|
||||
cost = calculate_cost(output["token_usage"])
|
||||
|
||||
results["total_latency"] += latency
|
||||
results["total_cost"] += cost
|
||||
|
||||
results["case_results"].append({
|
||||
"case_id": case["id"],
|
||||
"correct": is_correct,
|
||||
"latency": latency,
|
||||
"cost": cost
|
||||
})
|
||||
|
||||
# Calculate metrics
|
||||
results["accuracy"] = (results["correct"] / results["total_cases"]) * 100
|
||||
results["avg_latency"] = results["total_latency"] / results["total_cases"]
|
||||
results["avg_cost"] = results["total_cost"] / results["total_cases"]
|
||||
|
||||
return results
|
||||
|
||||
def calculate_cost(token_usage: Dict) -> float:
|
||||
"""Calculate cost from token usage"""
|
||||
# Claude 3.5 Sonnet pricing
|
||||
INPUT_COST_PER_1M = 3.0 # $3.00 per 1M input tokens
|
||||
OUTPUT_COST_PER_1M = 15.0 # $15.00 per 1M output tokens
|
||||
|
||||
input_cost = (token_usage["input_tokens"] / 1_000_000) * INPUT_COST_PER_1M
|
||||
output_cost = (token_usage["output_tokens"] / 1_000_000) * OUTPUT_COST_PER_1M
|
||||
|
||||
return input_cost + output_cost
|
||||
|
||||
if __name__ == "__main__":
    # Accept the flags passed by scripts/baseline_evaluation.sh
    parser = argparse.ArgumentParser()
    parser.add_argument("--output", default="evaluation_results/baseline_run.json")
    parser.add_argument("--verbose", action="store_true")
    args = parser.parse_args()

    # Load test cases
    with open("tests/evaluation/test_cases.json") as f:
        test_cases = json.load(f)["test_cases"]

    # Execute evaluation
    results = evaluate_test_cases(test_cases)

    # Save results
    with open(args.output, "w") as f:
        json.dump(results, f, indent=2)

    if args.verbose:
        for case_result in results["case_results"]:
            print(case_result)

    print(f"Accuracy: {results['accuracy']:.1f}%")
    print(f"Avg Latency: {results['avg_latency']:.2f}s")
    print(f"Avg Cost: ${results['avg_cost']:.4f}")
|
||||
```
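
`run_langgraph_app` is the project-specific entry point and is not defined in this file. A minimal sketch of what it might look like, assuming the compiled graph is exported as `app` from `src.graph` and that the final state carries `answer` and `token_usage` keys:

```python
from typing import Dict

from src.graph import app  # assumption: your compiled StateGraph

def run_langgraph_app(user_input: str) -> Dict:
    """Invoke the compiled graph once and normalize the fields the evaluator expects."""
    final_state = app.invoke({"user_input": user_input})
    return {
        "answer": final_state.get("answer", ""),
        "token_usage": final_state.get(
            "token_usage", {"input_tokens": 0, "output_tokens": 0}
        ),
    }
```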
|
||||
|
||||
### Example 2.2: Baseline Measurement Script
|
||||
|
||||
**File**: `scripts/baseline_evaluation.sh`
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
|
||||
ITERATIONS=5
|
||||
RESULTS_DIR="evaluation_results/baseline"
|
||||
mkdir -p $RESULTS_DIR
|
||||
|
||||
echo "Starting baseline evaluation: $ITERATIONS iterations"
|
||||
|
||||
for i in $(seq 1 $ITERATIONS); do
|
||||
echo "----------------------------------------"
|
||||
echo "Iteration $i/$ITERATIONS"
|
||||
echo "----------------------------------------"
|
||||
|
||||
uv run python -m tests.evaluation.evaluator \
|
||||
--output "$RESULTS_DIR/run_$i.json" \
|
||||
--verbose
|
||||
|
||||
echo "Completed iteration $i"
|
||||
|
||||
# API rate limit mitigation
|
||||
if [ $i -lt $ITERATIONS ]; then
|
||||
echo "Waiting 5 seconds before next iteration..."
|
||||
sleep 5
|
||||
fi
|
||||
done
|
||||
|
||||
echo ""
|
||||
echo "All iterations completed. Aggregating results..."
|
||||
|
||||
# Aggregate results
|
||||
uv run python -m tests.evaluation.aggregate \
|
||||
--input-dir "$RESULTS_DIR" \
|
||||
--output "$RESULTS_DIR/summary.json"
|
||||
|
||||
echo "Baseline evaluation complete!"
|
||||
echo "Results saved to: $RESULTS_DIR/summary.json"
|
||||
```
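
The script invokes `tests.evaluation.aggregate`, which is not shown in this guide. A minimal sketch of that module, assuming each `run_*.json` file has the shape produced by `evaluator.py` above:

```python
import argparse
import json
from pathlib import Path

import numpy as np

def aggregate(input_dir: str, output: str) -> None:
    """Collect per-run metrics and write mean/std as a summary."""
    accuracies, latencies, costs = [], [], []
    for path in sorted(Path(input_dir).glob("run_*.json")):
        with open(path) as f:
            run = json.load(f)
        accuracies.append(run["accuracy"])
        latencies.append(run["avg_latency"])
        costs.append(run["avg_cost"])

    summary = {
        "runs": len(accuracies),
        "accuracy": {"mean": float(np.mean(accuracies)), "std": float(np.std(accuracies))},
        "latency": {"mean": float(np.mean(latencies)), "std": float(np.std(latencies))},
        "cost": {"mean": float(np.mean(costs)), "std": float(np.std(costs))},
    }
    with open(output, "w") as f:
        json.dump(summary, f, indent=2)

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--input-dir", required=True)
    parser.add_argument("--output", required=True)
    args = parser.parse_args()
    aggregate(args.input_dir, args.output)
```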
|
||||
|
||||
### Example 2.3: Baseline Results Report
|
||||
|
||||
```markdown
|
||||
# Baseline Evaluation Results
|
||||
|
||||
Execution Date/Time: 2024-11-24 10:00:00
|
||||
Number of Runs: 5
|
||||
Number of Test Cases: 20
|
||||
|
||||
## Evaluation Metrics Summary
|
||||
|
||||
| Metric | Average | Std Dev | Min | Max | Target | Gap |
|
||||
| -------- | ------- | ------- | ------ | ------ | ------ | ---------- |
|
||||
| Accuracy | 75.0% | 3.2% | 70.0% | 80.0% | 90.0% | **-15.0%** |
|
||||
| Latency | 2.5s | 0.4s | 2.1s | 3.2s | 2.0s | **+0.5s** |
|
||||
| Cost/req | $0.015 | $0.002 | $0.013 | $0.018 | $0.010 | **+$0.005** |
|
||||
|
||||
## Detailed Analysis
|
||||
|
||||
### Accuracy Issues
|
||||
|
||||
- **Current**: 75.0% (Target: 90.0%)
|
||||
- **Main incorrect answer patterns**:
|
||||
1. Intent classification errors: 12 cases (60% of errors)
|
||||
2. Insufficient context understanding: 5 cases (25% of errors)
|
||||
3. Ambiguous question handling: 3 cases (15% of errors)
|
||||
|
||||
### Latency Issues
|
||||
|
||||
- **Current**: 2.5s (Target: 2.0s)
|
||||
- **Bottlenecks**:
|
||||
1. generate_response node: Average 1.8s (72% of total)
|
||||
2. analyze_intent node: Average 0.5s (20% of total)
|
||||
3. Other: Average 0.2s (8% of total)
|
||||
|
||||
### Cost Issues
|
||||
|
||||
- **Current**: $0.015/req (Target: $0.010/req)
|
||||
- **Cost breakdown**:
|
||||
1. generate_response: $0.011 (73%)
|
||||
2. analyze_intent: $0.003 (20%)
|
||||
3. Other: $0.001 (7%)
|
||||
- **Main factor**: High output token count (average 800 tokens)
|
||||
|
||||
## Improvement Directions
|
||||
|
||||
### Priority 1: Improve analyze_intent accuracy
|
||||
|
||||
- **Impact**: Direct impact on Accuracy (accounts for 60% of the -15% gap)
|
||||
- **Improvement measures**: Few-shot examples, clear classification criteria, JSON output format
|
||||
- **Estimated effect**: +10-12% accuracy
|
||||
|
||||
### Priority 2: Optimize generate_response efficiency
|
||||
|
||||
- **Impact**: Affects both Latency and Cost
|
||||
- **Improvement measures**: Conciseness instructions, max_tokens limit, temperature adjustment
|
||||
- **Estimated effect**: -0.4s latency, -$0.004 cost
|
||||
```
|
||||
|
||||
---
|
||||
230
skills/fine-tune/examples_phase3.md
Normal file
@@ -0,0 +1,230 @@
|
||||
# Phase 3: Iterative Improvement Examples
|
||||
|
||||
Examples of before/after prompt comparisons and result reports.
|
||||
|
||||
**📋 Related Documentation**: [Examples Home](./examples.md) | [Workflow Phase 3](./workflow_phase3.md) | [Prompt Optimization](./prompt_optimization.md)
|
||||
|
||||
---
|
||||
|
||||
## Phase 3: Iterative Improvement Examples
|
||||
|
||||
### Example 3.1: Before/After Prompt Comparison
|
||||
|
||||
**Node**: analyze_intent
|
||||
|
||||
#### Before (Baseline)
|
||||
|
||||
```python
|
||||
def analyze_intent(state: GraphState) -> GraphState:
|
||||
llm = ChatAnthropic(
|
||||
model="claude-3-5-sonnet-20241022",
|
||||
temperature=1.0
|
||||
)
|
||||
|
||||
messages = [
|
||||
SystemMessage(content="You are an intent analyzer. Analyze user input."),
|
||||
HumanMessage(content=f"Analyze: {state['user_input']}")
|
||||
]
|
||||
|
||||
response = llm.invoke(messages)
|
||||
state["intent"] = response.content
|
||||
return state
|
||||
```
|
||||
|
||||
**Issues**:
|
||||
- Ambiguous instructions
|
||||
- No few-shot examples
|
||||
- Free text output
|
||||
- High temperature
|
||||
|
||||
**Result**: Accuracy 75%
|
||||
|
||||
#### After (Iteration 1)
|
||||
|
||||
```python
|
||||
import json  # needed to parse the structured output below


def analyze_intent(state: GraphState) -> GraphState:
|
||||
llm = ChatAnthropic(
|
||||
model="claude-3-5-sonnet-20241022",
|
||||
temperature=0.3 # Lower temperature for classification tasks
|
||||
)
|
||||
|
||||
# Clear classification categories and few-shot examples
|
||||
system_prompt = """You are an intent classifier for a customer support chatbot.
|
||||
|
||||
Classify user input into one of these categories:
|
||||
- "product_inquiry": Questions about products or services
|
||||
- "technical_support": Technical issues or troubleshooting
|
||||
- "billing": Payment, invoicing, or billing questions
|
||||
- "general": General questions or chitchat
|
||||
|
||||
Output ONLY a valid JSON object with this structure:
|
||||
{
|
||||
"intent": "<category>",
|
||||
"confidence": <0.0-1.0>,
|
||||
"reasoning": "<brief explanation>"
|
||||
}
|
||||
|
||||
Examples:
|
||||
|
||||
Input: "How much does the premium plan cost?"
|
||||
Output: {"intent": "product_inquiry", "confidence": 0.95, "reasoning": "Question about product pricing"}
|
||||
|
||||
Input: "I can't log into my account"
|
||||
Output: {"intent": "technical_support", "confidence": 0.9, "reasoning": "Authentication issue"}
|
||||
|
||||
Input: "Why was I charged twice?"
|
||||
Output: {"intent": "billing", "confidence": 0.95, "reasoning": "Question about billing charges"}
|
||||
|
||||
Input: "Hello, how are you?"
|
||||
Output: {"intent": "general", "confidence": 0.85, "reasoning": "General greeting"}
|
||||
|
||||
Input: "What's the return policy?"
|
||||
Output: {"intent": "product_inquiry", "confidence": 0.9, "reasoning": "Question about product policy"}
|
||||
"""
|
||||
|
||||
messages = [
|
||||
SystemMessage(content=system_prompt),
|
||||
HumanMessage(content=f"Input: {state['user_input']}\nOutput:")
|
||||
]
|
||||
|
||||
response = llm.invoke(messages)
|
||||
|
||||
# JSON parsing (with error handling)
|
||||
try:
|
||||
intent_data = json.loads(response.content)
|
||||
state["intent"] = intent_data["intent"]
|
||||
state["confidence"] = intent_data["confidence"]
|
||||
except json.JSONDecodeError:
|
||||
# Fallback
|
||||
state["intent"] = "general"
|
||||
state["confidence"] = 0.5
|
||||
|
||||
return state
|
||||
```
|
||||
|
||||
**Improvements**:
|
||||
- ✅ temperature: 1.0 → 0.3
|
||||
- ✅ Clear classification categories (4 intents)
|
||||
- ✅ Few-shot examples (5 added)
|
||||
- ✅ JSON output format (structured output)
|
||||
- ✅ Error handling (fallback for JSON parsing failures)
|
||||
|
||||
**Result**: Accuracy 86% (+11%)
|
||||
|
||||
### Example 3.2: Prioritization Matrix
|
||||
|
||||
```markdown
|
||||
## Improvement Prioritization Matrix
|
||||
|
||||
| Node | Impact | Feasibility | Implementation Cost | Total Score | Priority |
|
||||
| ----------------- | ------------ | ------------ | ------------------- | ----------- | -------- |
|
||||
| analyze_intent | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | 14/15 | 1st |
|
||||
| generate_response | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | 12/15 | 2nd |
|
||||
| retrieve_context | ⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ | 8/15 | 3rd |
|
||||
|
||||
### Detailed Analysis
|
||||
|
||||
#### 1st: analyze_intent Node
|
||||
|
||||
- **Impact**: ⭐⭐⭐⭐⭐
|
||||
- Direct impact on Accuracy (accounts for 60% of -15% gap)
|
||||
- Also affects downstream nodes (chain errors from misclassification)
|
||||
|
||||
- **Feasibility**: ⭐⭐⭐⭐⭐
|
||||
- Improvement expected from few-shot examples
|
||||
- Similar cases show +10-15% improvement
|
||||
|
||||
- **Implementation Cost**: ⭐⭐⭐⭐
|
||||
- Implementation time: 30-60 minutes
|
||||
- Testing time: 30 minutes
|
||||
- Risk: Low
|
||||
|
||||
**Iteration 1 target**: analyze_intent node
|
||||
|
||||
#### 2nd: generate_response Node
|
||||
|
||||
- **Impact**: ⭐⭐⭐⭐
|
||||
- Main contributor to Latency and Cost (over 70% of total)
|
||||
- Small direct impact on Accuracy
|
||||
|
||||
- **Feasibility**: ⭐⭐⭐⭐
|
||||
- max_tokens limit ensures improvement
|
||||
- Quality can be maintained with conciseness instructions
|
||||
|
||||
- **Implementation Cost**: ⭐⭐⭐⭐
|
||||
- Implementation time: 20-30 minutes
|
||||
- Testing time: 30 minutes
|
||||
- Risk: Low
|
||||
|
||||
**Iteration 2 target**: generate_response node
|
||||
```
|
||||
|
||||
### Example 3.3: Iteration Results Report
|
||||
|
||||
```markdown
|
||||
# Iteration 1 Evaluation Results
|
||||
|
||||
Execution Date/Time: 2024-11-24 12:00:00
|
||||
Changes: analyze_intent node optimization
|
||||
|
||||
## Result Comparison
|
||||
|
||||
| Metric | Baseline | Iteration 1 | Change | Change Rate | Target | Achievement |
|
||||
| ------------ | -------- | ----------- | ---------- | ----------- | ------ | ----------- |
|
||||
| **Accuracy** | 75.0% | **86.0%** | **+11.0%** | +14.7% | 90.0% | 95.6% |
|
||||
| **Latency**  | 2.5s     | 2.4s        | -0.1s      | -4.0%       | 2.0s   | 83.3%       |
|
||||
| **Cost/req** | $0.015 | $0.014 | -$0.001 | -6.7% | $0.010 | 71.4% |
|
||||
|
||||
## Detailed Analysis
|
||||
|
||||
### Accuracy Improvement
|
||||
|
||||
- **Improvement**: +11.0% (75.0% → 86.0%)
|
||||
- **Remaining gap**: 4.0% (Target 90.0%)
|
||||
- **Improved cases**: Intent classification errors reduced from 12 → 3 cases
|
||||
- **Still needs improvement**: Context understanding cases (5 cases)
|
||||
|
||||
### Slight Latency Improvement
|
||||
|
||||
- **Improvement**: -0.1s (2.5s → 2.4s)
|
||||
- **Main factor**: analyze_intent output became more concise due to lower temperature
|
||||
- **Remaining bottleneck**: generate_response (average 1.8s)
|
||||
|
||||
### Slight Cost Reduction
|
||||
|
||||
- **Reduction**: -$0.001 (6.7% reduction)
|
||||
- **Factor**: analyze_intent output token reduction
|
||||
- **Main cost**: generate_response still accounts for 73%
|
||||
|
||||
## Statistical Significance
|
||||
|
||||
- **t-test**: p < 0.01 ✅ (statistically significant)
|
||||
- **Effect size**: Cohen's d = 2.3 (large effect)
|
||||
- **Confidence interval**: [83.9%, 88.1%] (95% CI)
|
||||
|
||||
## Next Iteration Strategy
|
||||
|
||||
### Priority 1: Optimize generate_response
|
||||
|
||||
- **Goal**: Latency from 1.8s → 1.4s, Cost from $0.011 → $0.007
|
||||
- **Approach**:
|
||||
1. Add conciseness instructions
|
||||
2. Limit max_tokens to 500
|
||||
3. Adjust temperature from 0.7 → 0.5
|
||||
|
||||
### Priority 2: Final 4% Accuracy improvement
|
||||
|
||||
- **Goal**: 86.0% → 90.0% or higher
|
||||
- **Approach**: Improve context understanding (retrieve_context node)
|
||||
|
||||
## Decision
|
||||
|
||||
✅ **Continue** → Proceed to Iteration 2
|
||||
|
||||
Reasons:
|
||||
- Accuracy improved significantly but still hasn't reached target
|
||||
- Latency and Cost still have room for improvement
|
||||
- Clear improvement strategy is in place
|
||||
```
|
||||
|
||||
---
|
||||
288
skills/fine-tune/examples_phase4.md
Normal file
@@ -0,0 +1,288 @@
|
||||
# Phase 4: Completion and Documentation Examples
|
||||
|
||||
Examples of final reports and Git commits.
|
||||
|
||||
**📋 Related Documentation**: [Examples Home](./examples.md) | [Workflow Phase 4](./workflow_phase4.md)
|
||||
|
||||
---
|
||||
|
||||
## Phase 4: Completion and Documentation Examples
|
||||
|
||||
### Example 4.1: Final Evaluation Report
|
||||
|
||||
```markdown
|
||||
# LangGraph Application Fine-Tuning Completion Report
|
||||
|
||||
Project: Customer Support Chatbot
|
||||
Implementation Period: 2024-11-24 10:00 - 2024-11-24 15:00 (5 hours)
|
||||
Implementer: Claude Code (fine-tune skill)
|
||||
|
||||
## 🎯 Executive Summary
|
||||
|
||||
This fine-tuning project optimized the prompts for the LangGraph chatbot application and achieved the following results:
|
||||
|
||||
- ✅ **Accuracy**: 75.0% → 92.0% (+17.0%, target 90% achieved)
|
||||
- ✅ **Latency**: 2.5s → 1.9s (-24.0%, target 2.0s achieved)
|
||||
- ⚠️ **Cost**: $0.015 → $0.011 (-26.7%, target $0.010 not achieved)
|
||||
|
||||
A total of 3 iterations were conducted, achieving targets for 2 out of 3 metrics.
|
||||
|
||||
## 📊 Implementation Summary
|
||||
|
||||
### Number of Iterations and Execution Time
|
||||
|
||||
- **Total Iterations**: 3
|
||||
- **Number of Nodes Optimized**: 2 (analyze_intent, generate_response)
|
||||
- **Number of Evaluation Runs**: 20 times (Baseline 5 times + 5 times after each iteration × 3)
|
||||
- **Total Execution Time**: Approximately 5 hours
|
||||
|
||||
### Final Results
|
||||
|
||||
| Metric | Initial | Final | Improvement | Improvement Rate | Target | Achievement Status |
|
||||
| -------- | ------- | ------ | ----------- | ---------------- | ------ | ------------------ |
|
||||
| Accuracy | 75.0% | 92.0% | +17.0% | +22.7% | 90.0% | ✅ 102.2% |
|
||||
| Latency | 2.5s | 1.9s | -0.6s | -24.0% | 2.0s | ✅ 95.0% |
|
||||
| Cost/req | $0.015 | $0.011 | -$0.004 | -26.7% | $0.010 | ⚠️ 90.9% |
|
||||
|
||||
## 📝 Details by Iteration
|
||||
|
||||
### Iteration 1: Optimize analyze_intent Node
|
||||
|
||||
**Implementation Date/Time**: 2024-11-24 11:00
|
||||
**Target Node**: src/nodes/analyzer.py:25-45
|
||||
|
||||
**Changes**:
|
||||
1. temperature: 1.0 → 0.3
|
||||
2. Added 5 few-shot examples
|
||||
3. Structured into JSON output format
|
||||
4. Defined clear classification categories (4 categories)
|
||||
|
||||
**Results**:
|
||||
- Accuracy: 75.0% → 86.0% (+11.0%)
|
||||
- Latency: 2.5s → 2.4s (-0.1s)
|
||||
- Cost: $0.015 → $0.014 (-$0.001)
|
||||
|
||||
**Learnings**: Few-shot examples and clear output format are most effective for accuracy improvement
|
||||
|
||||
---
|
||||
|
||||
### Iteration 2: Optimize generate_response Node
|
||||
|
||||
**Implementation Date/Time**: 2024-11-24 13:00
|
||||
**Target Node**: src/nodes/generator.py:45-68
|
||||
|
||||
**Changes**:
|
||||
1. Added conciseness instructions ("respond in 2-3 sentences")
|
||||
2. max_tokens: unlimited → 500
|
||||
3. temperature: 0.7 → 0.5
|
||||
4. Clarified response style
|
||||
|
||||
**Results**:
|
||||
- Accuracy: 86.0% → 88.0% (+2.0%)
|
||||
- Latency: 2.4s → 2.0s (-0.4s)
|
||||
- Cost: $0.014 → $0.011 (-$0.003)
|
||||
|
||||
**Learnings**: max_tokens limit significantly contributes to latency and cost reduction
|
||||
|
||||
---
|
||||
|
||||
### Iteration 3: Additional Improvements to analyze_intent
|
||||
|
||||
**Implementation Date/Time**: 2024-11-24 14:30
|
||||
**Target Node**: src/nodes/analyzer.py:25-45
|
||||
|
||||
**Changes**:
|
||||
1. Increased few-shot examples from 5 → 10
|
||||
2. Added edge case handling
|
||||
3. Reclassification logic based on confidence threshold
|
||||
|
||||
**Results**:
|
||||
- Accuracy: 88.0% → 92.0% (+4.0%)
|
||||
- Latency: 2.0s → 1.9s (-0.1s)
|
||||
- Cost: $0.011 → $0.011 (±0)
|
||||
|
||||
**Learnings**: Additional few-shot examples broke through the final accuracy barrier
|
||||
|
||||
## 🔧 Final Changes Summary
|
||||
|
||||
### src/nodes/analyzer.py
|
||||
|
||||
**Changed Lines**: 25-45
|
||||
|
||||
**Main Changes**:
|
||||
- temperature: 1.0 → 0.3
|
||||
- Few-shot examples: 0 → 10
|
||||
- Output: Free text → JSON
|
||||
- Added fallback based on confidence threshold
|
||||
|
||||
---
|
||||
|
||||
### src/nodes/generator.py
|
||||
|
||||
**Changed Lines**: 45-68
|
||||
|
||||
**Main Changes**:
|
||||
- temperature: 0.7 → 0.5
|
||||
- max_tokens: unlimited → 500
|
||||
- Clear conciseness instructions ("2-3 sentences")
|
||||
- Added response style guidelines
|
||||
|
||||
## 📈 Detailed Evaluation Results
|
||||
|
||||
### Improvement Status by Test Case
|
||||
|
||||
| Case ID | Category | Before | After | Improvement |
|
||||
| ------- | --------- | ----------- | ----------- | ----------- |
|
||||
| TC001 | Product | ❌ Wrong | ✅ Correct | ✅ |
|
||||
| TC002 | Technical | ❌ Wrong | ✅ Correct | ✅ |
|
||||
| TC003 | Billing | ✅ Correct | ✅ Correct | - |
|
||||
| ... | ... | ... | ... | ... |
|
||||
| TC020 | Technical | ✅ Correct | ✅ Correct | - |
|
||||
|
||||
**Improved Cases**: 15/20 (75%)
|
||||
**Maintained Cases**: 5/20 (25%)
|
||||
**Degraded Cases**: 0/20 (0%)
|
||||
|
||||
### Latency Breakdown
|
||||
|
||||
| Node | Before | After | Change | Change Rate |
|
||||
| ----------------- | ------ | ----- | ------ | ----------- |
|
||||
| analyze_intent | 0.5s | 0.4s | -0.1s | -20% |
|
||||
| retrieve_context | 0.2s | 0.2s | ±0s | 0% |
|
||||
| generate_response | 1.8s | 1.3s | -0.5s | -28% |
|
||||
| **Total** | **2.5s** | **1.9s** | **-0.6s** | **-24%** |
|
||||
|
||||
### Cost Breakdown
|
||||
|
||||
| Node | Before | After | Change | Change Rate |
|
||||
| ----------------- | ------- | ------- | -------- | ----------- |
|
||||
| analyze_intent | $0.003 | $0.003 | ±$0 | 0% |
|
||||
| retrieve_context | $0.001 | $0.001 | ±$0 | 0% |
|
||||
| generate_response | $0.011 | $0.007 | -$0.004 | -36% |
|
||||
| **Total** | **$0.015** | **$0.011** | **-$0.004** | **-27%** |
|
||||
|
||||
## 💡 Future Recommendations
|
||||
|
||||
### Short-term (1-2 weeks)
|
||||
|
||||
1. **Achieve Cost Target**: $0.011 → $0.010
|
||||
- Approach: Consider partial migration to Claude 3.5 Haiku
|
||||
- Estimated effect: -$0.002-0.003/req
|
||||
|
||||
2. **Further Accuracy Improvement**: 92.0% → 95.0%
|
||||
- Approach: Analyze error cases and add few-shot examples
|
||||
- Estimated effect: +3.0%
|
||||
|
||||
### Mid-term (1-2 months)
|
||||
|
||||
1. **Model Optimization**
|
||||
- Use Haiku for simple intent classification
|
||||
- Use Sonnet only for complex response generation
|
||||
- Estimated effect: -30-40% cost, minimal impact on latency
|
||||
|
||||
2. **Utilize Prompt Caching**
|
||||
- Cache system prompts and few-shot examples
|
||||
- Estimated effect: -50% cost (when cache hits)
|
||||
|
||||
### Long-term (3-6 months)
|
||||
|
||||
1. **Consider Fine-tuned Models**
|
||||
- Model fine-tuning with proprietary data
|
||||
- Concise prompts without few-shot examples
|
||||
- Estimated effect: -60% cost, +5% accuracy
|
||||
|
||||
## 🎓 Conclusion
|
||||
|
||||
This project achieved the following through fine-tuning the LangGraph application:
|
||||
|
||||
✅ **Successes**:
|
||||
1. Significant accuracy improvement (+22.7%) - Exceeded target by 2.2%
|
||||
2. Notable latency improvement (-24.0%) - Exceeded target by 5%
|
||||
3. Cost reduction (-26.7%) - 9.1% away from target
|
||||
|
||||
⚠️ **Challenges**:
|
||||
1. Cost target not achieved ($0.011 vs $0.010 target) - Can be addressed by migrating to lighter models
|
||||
|
||||
📈 **Business Impact**:
|
||||
- Improved user satisfaction (due to accuracy improvement)
|
||||
- Reduced operational costs (due to latency and cost reduction)
|
||||
- Improved scalability (efficient resource usage)
|
||||
|
||||
🎯 **Next Steps**:
|
||||
1. Verify migration to lighter models for cost reduction
|
||||
2. Continuous monitoring and evaluation
|
||||
3. Expand to new use cases
|
||||
|
||||
---
|
||||
|
||||
Created Date/Time: 2024-11-24 15:00:00
|
||||
Creator: Claude Code (fine-tune skill)
|
||||
```
|
||||
|
||||
### Example 4.2: Git Commit Message Examples
|
||||
|
||||
```bash
|
||||
# Iteration 1 commit
|
||||
git commit -m "feat(nodes): optimize analyze_intent prompt for accuracy
|
||||
|
||||
- Add temperature control (1.0 -> 0.3) for deterministic classification
|
||||
- Add 5 few-shot examples for intent categories
|
||||
- Implement JSON structured output format
|
||||
- Add error handling for JSON parsing failures
|
||||
|
||||
Results:
|
||||
- Accuracy: 75.0% -> 86.0% (+11.0%)
|
||||
- Latency: 2.5s -> 2.4s (-0.1s)
|
||||
- Cost: \$0.015 -> \$0.014 (-\$0.001)
|
||||
|
||||
Related: fine-tune iteration 1
|
||||
See: evaluation_results/iteration_1/"
|
||||
|
||||
# Iteration 2 commit
|
||||
git commit -m "feat(nodes): optimize generate_response for latency and cost
|
||||
|
||||
- Add conciseness guidelines (2-3 sentences)
|
||||
- Set max_tokens limit to 500
|
||||
- Adjust temperature (0.7 -> 0.5) for consistency
|
||||
- Define response style and tone
|
||||
|
||||
Results:
|
||||
- Accuracy: 86.0% -> 88.0% (+2.0%)
|
||||
- Latency: 2.4s -> 2.0s (-0.4s, -17%)
|
||||
- Cost: \$0.014 -> \$0.011 (-\$0.003, -21%)
|
||||
|
||||
Related: fine-tune iteration 2
|
||||
See: evaluation_results/iteration_2/"
|
||||
|
||||
# Final commit
|
||||
git commit -m "feat(nodes): finalize fine-tuning with additional improvements
|
||||
|
||||
Complete fine-tuning process with 3 iterations:
|
||||
- analyze_intent: 10 few-shot examples, confidence threshold
|
||||
- generate_response: conciseness and style optimization
|
||||
|
||||
Final Results:
|
||||
- Accuracy: 75.0% -> 92.0% (+17.0%, goal 90% ✅)
|
||||
- Latency: 2.5s -> 1.9s (-0.6s, -24%, goal 2.0s ✅)
|
||||
- Cost: \$0.015 -> \$0.011 (-\$0.004, -27%, goal \$0.010 ⚠️)
|
||||
|
||||
Related: fine-tune completion
|
||||
See: evaluation_results/final_report.md"
|
||||
|
||||
# Evaluation results commit
|
||||
git commit -m "docs: add fine-tuning evaluation results and final report
|
||||
|
||||
- Baseline evaluation (5 iterations)
|
||||
- Iteration 1-3 results
|
||||
- Final comprehensive report
|
||||
- Statistical analysis and recommendations"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 📚 Related Documentation
|
||||
|
||||
- [SKILL.md](SKILL.md) - Skill overview
|
||||
- [workflow.md](workflow.md) - Workflow details
|
||||
- [evaluation.md](evaluation.md) - Evaluation methods
|
||||
- [prompt_optimization.md](prompt_optimization.md) - Optimization techniques
|
||||
65
skills/fine-tune/prompt_optimization.md
Normal file
@@ -0,0 +1,65 @@
|
||||
# Prompt Optimization Guide
|
||||
|
||||
A comprehensive guide for effectively optimizing prompts in LangGraph nodes.
|
||||
|
||||
## 📚 Table of Contents
|
||||
|
||||
This guide is divided into the following sections:
|
||||
|
||||
### 1. [Prompt Optimization Principles](./prompt_principles.md)
|
||||
Learn the fundamental principles for designing prompts.
|
||||
|
||||
### 2. [Prompt Optimization Techniques](./prompt_techniques.md)
|
||||
Provides a collection of practical optimization techniques (10 techniques).
|
||||
|
||||
### 3. [Optimization Priorities](./prompt_priorities.md)
|
||||
Explains how to apply optimization techniques in order of improvement impact.
|
||||
|
||||
## 🎯 Quick Start
|
||||
|
||||
### First-Time Optimization
|
||||
|
||||
1. **[Understand the Principles](./prompt_principles.md)** - Learn the basics of clarity, structure, and specificity
|
||||
2. **[Start with High-Impact Techniques](./prompt_priorities.md)** - Few-Shot Examples, output format structuring, parameter tuning
|
||||
3. **[Review Technique Details](./prompt_techniques.md)** - Implementation methods and effects of each technique
|
||||
|
||||
### Improving Existing Prompts
|
||||
|
||||
1. **Measure Baseline** - Record current performance
|
||||
2. **[Refer to Priority Guide](./prompt_priorities.md)** - Select the most impactful improvements
|
||||
3. **[Apply Techniques](./prompt_techniques.md)** - Implement one at a time and measure effects
|
||||
4. **Iterate** - Repeat the cycle of measure, implement, validate
|
||||
|
||||
## 📖 Related Documentation
|
||||
|
||||
- **[Prompt Optimization Examples](./examples.md)** - Before/After comparison examples and code templates
|
||||
- **[SKILL.md](./SKILL.md)** - Overview and usage of the Fine-tune skill
|
||||
- **[evaluation.md](./evaluation.md)** - Evaluation criteria design and measurement methods
|
||||
|
||||
## 💡 Best Practices
|
||||
|
||||
For effective prompt optimization:
|
||||
|
||||
1. ✅ **Measurement-Driven**: Evaluate all changes quantitatively
|
||||
2. ✅ **Incremental Improvement**: One change at a time, measure, validate
|
||||
3. ✅ **Cost-Conscious**: Optimize with model selection, caching, max_tokens
|
||||
4. ✅ **Task-Appropriate**: Select techniques based on task complexity
|
||||
5. ✅ **Iterative Approach**: Maintain continuous improvement cycles
|
||||
|
||||
## 🔍 Troubleshooting
|
||||
|
||||
### Low Prompt Quality
|
||||
→ Review [Prompt Optimization Principles](./prompt_principles.md)
|
||||
|
||||
### Insufficient Accuracy
|
||||
→ Apply [Few-Shot Examples](./prompt_techniques.md#technique-1-few-shot-examples) or [Chain-of-Thought](./prompt_techniques.md#technique-2-chain-of-thought)
|
||||
|
||||
### High Latency
|
||||
→ Implement [Temperature/Max Tokens Adjustment](./prompt_techniques.md#technique-4-temperature-and-max-tokens-adjustment) or [Output Format Structuring](./prompt_techniques.md#technique-3-output-format-structuring)
|
||||
|
||||
### High Cost
|
||||
→ Introduce [Model Selection Optimization](./prompt_techniques.md#technique-10-model-selection) or [Prompt Caching](./prompt_techniques.md#technique-6-prompt-caching)
|
||||
|
||||
---
|
||||
|
||||
**💡 Tip**: For before/after prompt comparison examples and code templates, refer to [examples.md](examples.md#phase-3-iterative-improvement-examples).
|
||||
84
skills/fine-tune/prompt_principles.md
Normal file
@@ -0,0 +1,84 @@
|
||||
# Prompt Optimization Principles
|
||||
|
||||
Fundamental principles for designing prompts in LangGraph nodes.
|
||||
|
||||
## 🎯 Prompt Optimization Principles
|
||||
|
||||
### 1. Clarity
|
||||
|
||||
**Bad Example**:
|
||||
```python
|
||||
SystemMessage(content="Analyze the input.")
|
||||
```
|
||||
|
||||
**Good Example**:
|
||||
```python
|
||||
SystemMessage(content="""You are an intent classifier for customer support.
|
||||
|
||||
Task: Classify user input into one of these categories:
|
||||
- product_inquiry: Questions about products or services
|
||||
- technical_support: Technical issues or troubleshooting
|
||||
- billing: Payment or billing questions
|
||||
- general: General questions or greetings
|
||||
|
||||
Output only the category name.""")
|
||||
```
|
||||
|
||||
**Improvements**:
|
||||
- ✅ Clearly defined role
|
||||
- ✅ Specific task description
|
||||
- ✅ Enumerated categories
|
||||
- ✅ Specified output format
|
||||
|
||||
### 2. Structure
|
||||
|
||||
**Bad Example**:
|
||||
```python
|
||||
prompt = f"Answer this: {question}"
|
||||
```
|
||||
|
||||
**Good Example**:
|
||||
```python
|
||||
prompt = f"""Context:
|
||||
{context}
|
||||
|
||||
Question:
|
||||
{question}
|
||||
|
||||
Instructions:
|
||||
1. Base your answer on the provided context
|
||||
2. Be concise (2-3 sentences maximum)
|
||||
3. If the answer is not in the context, say "I don't have enough information"
|
||||
|
||||
Answer:"""
|
||||
```
|
||||
|
||||
**Improvements**:
|
||||
- ✅ Sectioned (Context, Question, Instructions, Answer)
|
||||
- ✅ Sequential instructions
|
||||
- ✅ Clear separators
|
||||
|
||||
### 3. Specificity
|
||||
|
||||
**Bad Example**:
|
||||
```python
|
||||
"Be helpful and friendly."
|
||||
```
|
||||
|
||||
**Good Example**:
|
||||
```python
|
||||
"""Tone and Style:
|
||||
- Use a warm, professional tone
|
||||
- Address the customer by name if available
|
||||
- Acknowledge their concern explicitly
|
||||
- Provide actionable next steps
|
||||
|
||||
Example:
|
||||
"Hi Sarah, I understand your concern about the billing charge. Let me review your account and get back to you within 24 hours with a detailed explanation."
|
||||
"""
|
||||
```
|
||||
|
||||
**Improvements**:
|
||||
- ✅ Specific guidelines
|
||||
- ✅ Concrete examples provided
|
||||
- ✅ Measurable criteria
|
||||
87
skills/fine-tune/prompt_priorities.md
Normal file
@@ -0,0 +1,87 @@
|
||||
# Prompt Optimization Priorities
|
||||
|
||||
A priority guide for applying optimization techniques in order of improvement impact.
|
||||
|
||||
## 📊 Optimization Priorities
|
||||
|
||||
In order of improvement impact:
|
||||
|
||||
### 1. Adding Few-Shot Examples (High Impact, Low Cost)
|
||||
- **Improvement**: Accuracy +10-20%
|
||||
- **Cost**: +5-10% (increased input tokens)
|
||||
- **Implementation Time**: 30 minutes - 1 hour
|
||||
- **Recommended**: ⭐⭐⭐⭐⭐
|
||||
|
||||
### 2. Output Format Structuring (High Impact, Low Cost)
|
||||
- **Improvement**: Latency -10-20%, Parsing errors -90%
|
||||
- **Cost**: ±0%
|
||||
- **Implementation Time**: 15-30 minutes
|
||||
- **Recommended**: ⭐⭐⭐⭐⭐
|
||||
|
||||
### 3. Temperature/Max Tokens Adjustment (Medium Impact, Zero Cost)
|
||||
- **Improvement**: Latency -10-30%, Cost -20-40%
|
||||
- **Cost**: Reduction
|
||||
- **Implementation Time**: 10-15 minutes
|
||||
- **Recommended**: ⭐⭐⭐⭐⭐
|
||||
|
||||
### 4. Clear Instructions and Guidelines (Medium Impact, Low Cost)
|
||||
- **Improvement**: Accuracy +5-10%, Quality +15-25%
|
||||
- **Cost**: +2-5%
|
||||
- **Implementation Time**: 30 minutes - 1 hour
|
||||
- **Recommended**: ⭐⭐⭐⭐
|
||||
|
||||
### 5. Model Selection Optimization (High Impact, Requires Validation)
|
||||
- **Improvement**: Cost -40-60%
|
||||
- **Risk**: Accuracy -2-5%
|
||||
- **Implementation Time**: 2-4 hours (including validation)
|
||||
- **Recommended**: ⭐⭐⭐⭐
|
||||
|
||||
### 6. Prompt Caching (High Impact, Medium Cost)
|
||||
- **Improvement**: Cost -50-90% (on cache hit)
|
||||
- **Complexity**: Medium (implementation and monitoring)
|
||||
- **Implementation Time**: 1-2 hours
|
||||
- **Recommended**: ⭐⭐⭐⭐
|
||||
|
||||
### 7. Chain-of-Thought (High Impact for Specific Tasks)
|
||||
- **Improvement**: Accuracy +15-30% for complex tasks
|
||||
- **Cost**: +20-40%
|
||||
- **Implementation Time**: 1-2 hours
|
||||
- **Recommended**: ⭐⭐⭐ (complex tasks only)
|
||||
|
||||
### 8. Self-Consistency (Limited Use)
|
||||
- **Improvement**: Accuracy +10-20%
|
||||
- **Cost**: +200-300%
|
||||
- **Implementation Time**: 2-3 hours
|
||||
- **Recommended**: ⭐⭐ (critical decisions only)
|
||||
|
||||
## 🔄 Iterative Optimization Process
|
||||
|
||||
```
|
||||
1. Measure baseline
|
||||
↓
|
||||
2. Select the most impactful improvement
|
||||
↓
|
||||
3. Implement (one change only)
|
||||
↓
|
||||
4. Evaluate (with same test cases)
|
||||
↓
|
||||
5. Is improvement confirmed?
|
||||
├─ Yes → Keep change, go to step 2
|
||||
└─ No → Rollback change, try different improvement
|
||||
↓
|
||||
6. Goal achieved?
|
||||
├─ Yes → Complete
|
||||
└─ No → Go to step 2
|
||||
```
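
The loop above can be driven by a small harness. A minimal sketch, assuming the project supplies its own `measure`, `apply_change`, and `rollback` callables (placeholders, not part of this skill) and using accuracy as the representative goal check:

```python
from typing import Callable, Dict, List

def optimize(
    improvements: List[str],                        # candidate changes, ordered by expected impact
    goals: Dict[str, float],                        # e.g. {"accuracy": 90.0}
    measure: Callable[[], Dict[str, float]],        # runs the test cases, returns metrics
    apply_change: Callable[[str], None],            # edits the prompt / parameters
    rollback: Callable[[str], None],                # reverts the edit
) -> Dict[str, float]:
    """Drive the measure → implement → evaluate → keep-or-rollback loop shown above."""
    best = measure()                                # 1. Measure baseline
    for change in improvements:                     # 2. Select the most impactful improvement
        apply_change(change)                        # 3. Implement (one change only)
        result = measure()                          # 4. Evaluate with the same test cases
        if result["accuracy"] >= best["accuracy"]:  # 5. Improvement confirmed?
            best = result                           #    Yes → keep the change
        else:
            rollback(change)                        #    No → roll back, try the next idea
        if best["accuracy"] >= goals["accuracy"]:   # 6. Goal achieved?
            break
    return best
```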
|
||||
|
||||
## Summary
|
||||
|
||||
For effective prompt optimization:
|
||||
|
||||
1. ✅ **Clarity**: Clear role, task, and output format
|
||||
2. ✅ **Few-Shot Examples**: 3-7 high-quality examples
|
||||
3. ✅ **Structuring**: Structured output like JSON
|
||||
4. ✅ **Parameter Tuning**: Task-appropriate temperature/max_tokens
|
||||
5. ✅ **Incremental Improvement**: One change at a time, measure, validate
|
||||
6. ✅ **Cost-Conscious**: Model selection, caching, max_tokens
|
||||
7. ✅ **Measurement-Driven**: Evaluate all changes quantitatively
|
||||
425
skills/fine-tune/prompt_techniques.md
Normal file
@@ -0,0 +1,425 @@
|
||||
# Prompt Optimization Techniques
|
||||
|
||||
A collection of practical techniques for effectively optimizing prompts in LangGraph nodes.
|
||||
|
||||
**💡 Tip**: For before/after prompt comparison examples and code templates, refer to [examples.md](examples.md#phase-3-iterative-improvement-examples).
|
||||
|
||||
## 🔧 Practical Optimization Techniques
|
||||
|
||||
### Technique 1: Few-Shot Examples
|
||||
|
||||
**Effect**: Accuracy +10-20%
|
||||
|
||||
**Before (Zero-shot)**:
|
||||
```python
|
||||
system_prompt = """Classify user input into: product_inquiry, technical_support, billing, or general."""
|
||||
|
||||
# Accuracy: ~70%
|
||||
```
|
||||
|
||||
**After (Few-shot)**:
|
||||
```python
|
||||
system_prompt = """Classify user input into: product_inquiry, technical_support, billing, or general.
|
||||
|
||||
Examples:
|
||||
|
||||
Input: "How much does the premium plan cost?"
|
||||
Output: product_inquiry
|
||||
|
||||
Input: "I can't log into my account"
|
||||
Output: technical_support
|
||||
|
||||
Input: "Why was I charged twice this month?"
|
||||
Output: billing
|
||||
|
||||
Input: "Hello, how are you today?"
|
||||
Output: general
|
||||
|
||||
Input: "What features are included in the basic plan?"
|
||||
Output: product_inquiry"""
|
||||
|
||||
# Accuracy: ~85-90%
|
||||
```
|
||||
|
||||
**Best Practices**:
|
||||
- **Number of Examples**: 3-7 (diminishing returns beyond this)
|
||||
- **Diversity**: At least one from each category, including edge cases
|
||||
- **Quality**: Select clear and unambiguous examples
|
||||
- **Format**: Consistent Input/Output format
|
||||
|
||||
### Technique 2: Chain-of-Thought
|
||||
|
||||
**Effect**: Accuracy +15-30% for complex reasoning tasks
|
||||
|
||||
**Before (Direct answer)**:
|
||||
```python
|
||||
prompt = f"""Question: {question}
|
||||
|
||||
Answer:"""
|
||||
|
||||
# Many incorrect answers for complex questions
|
||||
```
|
||||
|
||||
**After (Chain-of-Thought)**:
|
||||
```python
|
||||
prompt = f"""Question: {question}
|
||||
|
||||
Think through this step by step:
|
||||
|
||||
1. First, identify the key information needed
|
||||
2. Then, analyze the context for relevant details
|
||||
3. Finally, formulate a clear answer
|
||||
|
||||
Reasoning:"""
|
||||
|
||||
# Logical answers even for complex questions
|
||||
```
|
||||
|
||||
**Application Scenarios**:
|
||||
- ✅ Tasks requiring multi-step reasoning
|
||||
- ✅ Complex decision making
|
||||
- ✅ Resolving contradictions
|
||||
- ❌ Simple classification tasks (overhead)
|
||||
|
||||
### Technique 3: Output Format Structuring
|
||||
|
||||
**Effect**: Latency -10-20%, Parsing errors -90%
|
||||
|
||||
**Before (Free text)**:
|
||||
```python
|
||||
prompt = "Classify the intent and explain why."
|
||||
|
||||
# Output: "This looks like a technical support question because the user is having trouble logging in..."
|
||||
# Problems: Hard to parse, verbose, inconsistent
|
||||
```
|
||||
|
||||
**After (JSON structured)**:
|
||||
```python
|
||||
prompt = """Classify the intent.
|
||||
|
||||
Output ONLY a valid JSON object:
|
||||
{
|
||||
"intent": "<category>",
|
||||
"confidence": <0.0-1.0>,
|
||||
"reasoning": "<brief explanation in one sentence>"
|
||||
}
|
||||
|
||||
Example output:
|
||||
{"intent": "technical_support", "confidence": 0.95, "reasoning": "User reports authentication issue"}"""
|
||||
|
||||
# Output: {"intent": "technical_support", "confidence": 0.95, "reasoning": "User reports authentication issue"}
|
||||
# Benefits: Easy to parse, concise, consistent
|
||||
```
|
||||
|
||||
**JSON Parsing Error Handling**:
|
||||
```python
|
||||
import json
|
||||
import re
|
||||
|
||||
def parse_llm_json_output(output: str) -> dict:
|
||||
"""Robustly parse LLM JSON output"""
|
||||
try:
|
||||
# Parse as JSON directly
|
||||
return json.loads(output)
|
||||
except json.JSONDecodeError:
|
||||
# Extract the first JSON object (e.g. from a markdown code block); this simple regex only handles flat, non-nested objects
|
||||
json_match = re.search(r'\{[^}]+\}', output)
|
||||
if json_match:
|
||||
try:
|
||||
return json.loads(json_match.group())
|
||||
except json.JSONDecodeError:
|
||||
pass
|
||||
|
||||
# Fallback
|
||||
return {
|
||||
"intent": "general",
|
||||
"confidence": 0.5,
|
||||
"reasoning": "Failed to parse LLM output"
|
||||
}
|
||||
```
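
Inside a node, the parser would typically wrap the raw model output. A short usage sketch (`llm`, `prompt`, `user_input`, and `state` are assumed from the surrounding examples):

```python
# Hypothetical usage inside a node function
response = llm.invoke([
    SystemMessage(content=prompt),
    HumanMessage(content=user_input),
])
intent_data = parse_llm_json_output(response.content)

state["intent"] = intent_data["intent"]
state["confidence"] = intent_data["confidence"]
```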
|
||||
|
||||
### Technique 4: Temperature and Max Tokens Adjustment
|
||||
|
||||
**Temperature Effects**:
|
||||
|
||||
| Task Type | Recommended Temperature | Reason |
|
||||
|-----------|------------------------|--------|
|
||||
| Classification/Extraction | 0.0 - 0.3 | Deterministic output desired |
|
||||
| Summarization/Transformation | 0.3 - 0.5 | Some flexibility needed |
|
||||
| Creative/Generation | 0.7 - 1.0 | Diversity and creativity important |
|
||||
|
||||
**Before (Default settings)**:
|
||||
```python
|
||||
llm = ChatAnthropic(
|
||||
model="claude-3-5-sonnet-20241022",
|
||||
temperature=1.0 # Default, used for all tasks
|
||||
)
|
||||
# Unstable results for classification tasks
|
||||
```
|
||||
|
||||
**After (Optimized per task)**:
|
||||
```python
|
||||
# Intent classification: Low temperature
|
||||
intent_llm = ChatAnthropic(
|
||||
model="claude-3-5-sonnet-20241022",
|
||||
temperature=0.3 # Emphasize consistency
|
||||
)
|
||||
|
||||
# Response generation: Medium temperature
|
||||
response_llm = ChatAnthropic(
|
||||
model="claude-3-5-sonnet-20241022",
|
||||
temperature=0.5, # Balance flexibility
|
||||
max_tokens=500 # Enforce conciseness
|
||||
)
|
||||
```
|
||||
|
||||
**Max Tokens Effects**:
|
||||
|
||||
```python
|
||||
# Before: No limit
|
||||
llm = ChatAnthropic(model="claude-3-5-sonnet-20241022")
|
||||
# Average output: 800 tokens, Cost: $0.012/req, Latency: 3.2s
|
||||
|
||||
# After: Appropriate limit
|
||||
llm = ChatAnthropic(
|
||||
model="claude-3-5-sonnet-20241022",
|
||||
max_tokens=500 # Necessary and sufficient length
|
||||
)
|
||||
# Average output: 450 tokens, Cost: $0.007/req (-42%), Latency: 1.8s (-44%)
|
||||
```
|
||||
|
||||
### Technique 5: System Message vs Human Message Usage
|
||||
|
||||
**System Message**:
|
||||
- **Use**: Role, guidelines, constraints
|
||||
- **Characteristics**: Context applied to entire task
|
||||
- **Caching**: Effective (doesn't change frequently)
|
||||
|
||||
**Human Message**:
|
||||
- **Use**: Specific input, questions
|
||||
- **Characteristics**: Changes per request
|
||||
- **Caching**: Less effective
|
||||
|
||||
**Good Structure**:
|
||||
```python
|
||||
messages = [
|
||||
SystemMessage(content="""You are a customer support assistant.
|
||||
|
||||
Guidelines:
|
||||
- Be concise: 2-3 sentences maximum
|
||||
- Be empathetic: Acknowledge customer concerns
|
||||
- Be actionable: Provide clear next steps
|
||||
|
||||
Response format:
|
||||
1. Acknowledgment
|
||||
2. Answer or solution
|
||||
3. Next steps (if applicable)"""),
|
||||
|
||||
HumanMessage(content=f"""Customer question: {user_input}
|
||||
|
||||
Context: {context}
|
||||
|
||||
Generate a helpful response:""")
|
||||
]
|
||||
```
|
||||
|
||||
### Technique 6: Prompt Caching
|
||||
|
||||
**Effect**: Cost -50-90% (on cache hit)
|
||||
|
||||
Leverage Anthropic Claude's prompt caching:
|
||||
|
||||
```python
|
||||
from anthropic import Anthropic
|
||||
|
||||
client = Anthropic()
|
||||
|
||||
# Large cacheable system prompt
|
||||
CACHED_SYSTEM_PROMPT = """You are an expert customer support agent...
|
||||
|
||||
[Long guidelines, examples, and context - 1000+ tokens]
|
||||
|
||||
Examples:
|
||||
[50 few-shot examples]
|
||||
"""
|
||||
|
||||
# Use cache
|
||||
message = client.messages.create(
|
||||
model="claude-3-5-sonnet-20241022",
|
||||
max_tokens=500,
|
||||
system=[
|
||||
{
|
||||
"type": "text",
|
||||
"text": CACHED_SYSTEM_PROMPT,
|
||||
"cache_control": {"type": "ephemeral"} # Enable caching
|
||||
}
|
||||
],
|
||||
messages=[
|
||||
{"role": "user", "content": user_input}
|
||||
]
|
||||
)
|
||||
|
||||
# First time: Full cost
|
||||
# 2nd+ time (within 5 minutes): Input tokens -90% discount
|
||||
```
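
To confirm the cache is actually being hit, inspect the usage block of the response (field names follow Anthropic's prompt-caching documentation; treat their availability as API-version dependent):

```python
# Inspect cache behaviour on the `message` returned above
usage = message.usage
print("regular input tokens:", usage.input_tokens)
print("cache write tokens:", getattr(usage, "cache_creation_input_tokens", 0))
print("cache read tokens:", getattr(usage, "cache_read_input_tokens", 0))
```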
|
||||
|
||||
**Caching Strategy**:
|
||||
- ✅ Large system prompts (>1024 tokens)
|
||||
- ✅ Sets of few-shot examples
|
||||
- ✅ Long context (RAG documents)
|
||||
- ❌ Frequently changing content
|
||||
- ❌ Small prompts (<1024 tokens)
|
||||
|
||||
### Technique 7: Progressive Refinement
|
||||
|
||||
Break complex tasks into multiple steps:
|
||||
|
||||
**Before (1 step)**:
|
||||
```python
|
||||
# Execute everything in one node
|
||||
prompt = f"""Analyze user input, retrieve relevant info, and generate response.
|
||||
|
||||
Input: {user_input}"""
|
||||
|
||||
# Problems: Too complex, low quality, hard to debug
|
||||
```
|
||||
|
||||
**After (Multiple steps)**:
|
||||
```python
|
||||
# Step 1: Intent classification
|
||||
intent = classify_intent(user_input)
|
||||
|
||||
# Step 2: Information retrieval (based on intent)
|
||||
context = retrieve_context(intent, user_input)
|
||||
|
||||
# Step 3: Response generation (using intent and context)
|
||||
response = generate_response(intent, context, user_input)
|
||||
|
||||
# Benefits: Each step optimizable, easy to debug, improved quality
|
||||
```
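
In LangGraph terms, each step becomes its own node, so each prompt can be measured and optimized in isolation. A minimal wiring sketch, assuming the three steps are wrapped as node functions that read from and write to a shared `GraphState` TypedDict (rather than taking positional arguments as in the simplified snippet above):

```python
from langgraph.graph import StateGraph, START, END

# classify_intent, retrieve_context, and generate_response are assumed to be
# node functions of the form fn(state: GraphState) -> GraphState
builder = StateGraph(GraphState)
builder.add_node("classify_intent", classify_intent)
builder.add_node("retrieve_context", retrieve_context)
builder.add_node("generate_response", generate_response)

builder.add_edge(START, "classify_intent")
builder.add_edge("classify_intent", "retrieve_context")
builder.add_edge("retrieve_context", "generate_response")
builder.add_edge("generate_response", END)

graph = builder.compile()
```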
|
||||
|
||||
### Technique 8: Negative Instructions
|
||||
|
||||
**Effect**: Edge case errors -30-50%
|
||||
|
||||
```python
|
||||
prompt = """Generate a customer support response.
|
||||
|
||||
DO:
|
||||
- Be concise (2-3 sentences)
|
||||
- Acknowledge the customer's concern
|
||||
- Provide actionable next steps
|
||||
|
||||
DO NOT:
|
||||
- Apologize excessively (one apology maximum)
|
||||
- Make promises you can't keep (e.g., "immediate resolution")
|
||||
- Use technical jargon without explanation
|
||||
- Provide information not in the context
|
||||
- Generate placeholder text like "XXX" or "[insert here]"
|
||||
|
||||
Customer question: {question}
|
||||
Context: {context}
|
||||
|
||||
Response:"""
|
||||
```
|
||||
|
||||
### Technique 9: Self-Consistency
|
||||
|
||||
**Effect**: Accuracy +10-20% for complex reasoning, Cost +200-300%
|
||||
|
||||
Generate multiple reasoning paths and use majority voting:
|
||||
|
||||
```python
|
||||
def self_consistency_reasoning(question: str, num_samples: int = 5) -> str:
|
||||
"""Generate multiple reasoning paths and select the most consistent answer"""
|
||||
|
||||
llm = ChatAnthropic(
|
||||
model="claude-3-5-sonnet-20241022",
|
||||
temperature=0.7 # Higher temperature for diversity
|
||||
)
|
||||
|
||||
prompt = f"""Question: {question}
|
||||
|
||||
Think through this step by step and provide your reasoning:
|
||||
|
||||
Reasoning:"""
|
||||
|
||||
# Generate multiple reasoning paths
|
||||
responses = []
|
||||
for _ in range(num_samples):
|
||||
response = llm.invoke([HumanMessage(content=prompt)])
|
||||
responses.append(response.content)
|
||||
|
||||
# Extract the most consistent answer (simplified)
|
||||
# In practice, extract final answer from each response and use majority voting
|
||||
from collections import Counter
|
||||
final_answers = [extract_final_answer(r) for r in responses]
|
||||
most_common = Counter(final_answers).most_common(1)[0][0]
|
||||
|
||||
return most_common
|
||||
|
||||
# Trade-offs:
|
||||
# - Accuracy: +10-20%
|
||||
# - Cost: +200-300% (5x API calls)
|
||||
# - Latency: +200-300% (if not parallelized)
|
||||
# Use: Critical decisions only
|
||||
```
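
If the latency penalty matters, the samples can be requested concurrently instead of sequentially. A sketch using LangChain's `Runnable.batch` (assuming `llm`, `prompt`, and `extract_final_answer` from the snippet above; cost is unchanged, but wall-clock time approaches that of a single call):

```python
from collections import Counter

# Fan the five sampling calls out in parallel via the Runnable batch API
batch_inputs = [[HumanMessage(content=prompt)] for _ in range(5)]
responses = llm.batch(batch_inputs)

final_answers = [extract_final_answer(r.content) for r in responses]
most_common = Counter(final_answers).most_common(1)[0][0]
```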
|
||||
|
||||
### Technique 10: Model Selection
|
||||
|
||||
**Model Selection Based on Task Complexity**:
|
||||
|
||||
| Task Type | Recommended Model | Reason |
|
||||
|-----------|------------------|--------|
|
||||
| Simple classification | Claude 3.5 Haiku | Fast, low cost, sufficient accuracy |
|
||||
| Complex reasoning | Claude 3.5 Sonnet | Balanced performance |
|
||||
| Highly complex tasks | Claude 3 Opus | Best performance (high cost) |
|
||||
|
||||
```python
|
||||
# Select optimal model per task
|
||||
class LLMSelector:
|
||||
def __init__(self):
|
||||
self.haiku = ChatAnthropic(model="claude-3-5-haiku-20241022")
|
||||
self.sonnet = ChatAnthropic(model="claude-3-5-sonnet-20241022")
|
||||
self.opus = ChatAnthropic(model="claude-3-opus-20240229")
|
||||
|
||||
def get_llm(self, task_complexity: str):
|
||||
if task_complexity == "simple":
|
||||
return self.haiku # ~$0.001/req
|
||||
elif task_complexity == "complex":
|
||||
return self.sonnet # ~$0.005/req
|
||||
else: # very_complex
|
||||
return self.opus # ~$0.015/req
|
||||
|
||||
# Usage example
|
||||
selector = LLMSelector()
|
||||
|
||||
# Simple intent classification → Haiku
|
||||
intent_llm = selector.get_llm("simple")
|
||||
|
||||
# Complex response generation → Sonnet
|
||||
response_llm = selector.get_llm("complex")
|
||||
```
|
||||
|
||||
**Hybrid Approach**:
|
||||
```python
|
||||
def hybrid_classification(user_input: str) -> dict:
|
||||
"""Try Haiku first, use Sonnet if confidence is low"""
|
||||
|
||||
# Step 1: Classify with Haiku
|
||||
haiku_result = classify_with_haiku(user_input)
|
||||
|
||||
if haiku_result["confidence"] >= 0.8:
|
||||
# High confidence → Use Haiku result
|
||||
return haiku_result
|
||||
else:
|
||||
# Low confidence → Re-classify with Sonnet
|
||||
sonnet_result = classify_with_sonnet(user_input)
|
||||
return sonnet_result
|
||||
|
||||
# Effects:
|
||||
# - 80% of cases use Haiku (low cost)
|
||||
# - 20% of cases use Sonnet (high accuracy)
|
||||
# - Average cost: -60%
|
||||
# - Average accuracy: -2% (acceptable range)
|
||||
```
|
||||
127
skills/fine-tune/workflow.md
Normal file
@@ -0,0 +1,127 @@
|
||||
# Fine-Tuning Workflow Details
|
||||
|
||||
Detailed workflow and practical guidelines for executing fine-tuning of LangGraph applications.
|
||||
|
||||
**💡 Tip**: For concrete code examples and templates you can copy and paste, refer to [examples.md](examples.md).
|
||||
|
||||
## 📋 Workflow Overview
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ Phase 1: Preparation and Analysis │
|
||||
├─────────────────────────────────────────────────────────────┤
|
||||
│ 1. Read fine-tune.md → Understand goals and criteria │
|
||||
│ 2. Identify optimization targets with Serena → List LLM nodes│
|
||||
│ 3. Create optimization list → Assess improvement potential │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
↓
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ Phase 2: Baseline Evaluation │
|
||||
├─────────────────────────────────────────────────────────────┤
|
||||
│ 4. Prepare evaluation environment → Test cases, scripts │
|
||||
│ 5. Measure baseline → Run 3-5 times, collect statistics │
|
||||
│ 6. Analyze results → Identify issues, assess improvement │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
↓
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ Phase 3: Iterative Improvement (Iteration Loop) │
|
||||
├─────────────────────────────────────────────────────────────┤
|
||||
│ 7. Prioritize → Select most effective improvement area │
|
||||
│ 8. Implement improvements → Optimize prompts, adjust params │
|
||||
│ 9. Post-improvement evaluation → Re-evaluate same conditions│
|
||||
│ 10. Compare results → Measure improvement, decide next step │
|
||||
│ 11. Continue decision → Goal met? Yes → Phase 4 / No → Next │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
↓
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ Phase 4: Completion and Documentation │
|
||||
├─────────────────────────────────────────────────────────────┤
|
||||
│ 12. Create final evaluation report → Summary of improvements│
|
||||
│ 13. Commit code → Version control and documentation update │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
## 📚 Phase-by-Phase Detailed Guide
|
||||
|
||||
### [Phase 1: Preparation and Analysis](./workflow_phase1.md)
|
||||
Clarify optimization direction and identify targets for improvement:
|
||||
- **Step 1**: Read and understand fine-tune.md
|
||||
- **Step 2**: Identify optimization targets with Serena MCP
|
||||
- **Step 3**: Create optimization target list
|
||||
|
||||
**Time Required**: 30 minutes - 1 hour
|
||||
|
||||
### [Phase 2: Baseline Evaluation](./workflow_phase2.md)
|
||||
Quantitatively measure current performance:
|
||||
- **Step 4**: Prepare evaluation environment
|
||||
- **Step 5**: Measure baseline (3-5 runs)
|
||||
- **Step 6**: Analyze baseline results
|
||||
|
||||
**Time Required**: 1-2 hours
|
||||
|
||||
### [Phase 3: Iterative Improvement](./workflow_phase3.md)
|
||||
Data-driven, incremental prompt optimization:
|
||||
- **Step 7**: Prioritization
|
||||
- **Step 8**: Implement improvements
|
||||
- **Step 9**: Post-improvement evaluation
|
||||
- **Step 10**: Compare results
|
||||
- **Step 11**: Continue decision
|
||||
|
||||
**Time Required**: 1-2 hours per iteration × number of iterations (typically 3-5)
|
||||
|
||||
### [Phase 4: Completion and Documentation](./workflow_phase4.md)
|
||||
Record final results and commit code:
|
||||
- **Step 12**: Create final evaluation report
|
||||
- **Step 13**: Commit code and update documentation
|
||||
|
||||
**Time Required**: 30 minutes - 1 hour
|
||||
|
||||
## 🎯 Workflow Execution Points
|
||||
|
||||
### For First-Time Fine-Tuning
|
||||
|
||||
1. **Start from Phase 1 in order**: Execute all phases without skipping
|
||||
2. **Create documentation**: Record results from each phase
|
||||
3. **Start small**: Experiment with a small number of test cases initially
|
||||
|
||||
### Continuous Fine-Tuning
|
||||
|
||||
1. **Start from Phase 2**: Measure new baseline
|
||||
2. **Repeat Phase 3**: Continuous improvement cycle
|
||||
3. **Consider automation**: Build evaluation pipeline
|
||||
|
||||
## 📊 Principles for Success
|
||||
|
||||
1. **Data-Driven**: Base all decisions on measurement results
|
||||
2. **Incremental Improvement**: One change at a time, measure, verify
|
||||
3. **Documentation**: Record results and learnings from each phase
|
||||
4. **Statistical Verification**: Run multiple times to confirm significance (see the sketch below)
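
A minimal sketch of such a check, comparing per-run accuracy before and after a change (this assumes the per-run results produced in Phase 2/3 and that SciPy is available; any equivalent test works):

```python
# Minimal significance check: did the change actually help, or is it noise?
# baseline_acc / tuned_acc are illustrative per-run accuracy values (e.g. 5 runs each)
from scipy import stats

baseline_acc = [72.0, 75.0, 76.0, 74.0, 78.0]
tuned_acc = [85.0, 86.0, 88.0, 84.0, 87.0]

t_stat, p_value = stats.ttest_ind(tuned_acc, baseline_acc, equal_var=False)  # Welch's t-test
if p_value < 0.05:
    print(f"Improvement is statistically significant (p={p_value:.4f})")
else:
    print(f"Difference may be noise (p={p_value:.4f}) - run more iterations before deciding")
```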
|
||||
|
||||
## 🔗 Related Documents
|
||||
|
||||
- **[Example Collection](./examples.md)** - Code examples and templates for each phase
|
||||
- **[Evaluation Methods](./evaluation.md)** - Details on evaluation metrics and statistical analysis
|
||||
- **[Prompt Optimization](./prompt_optimization.md)** - Detailed optimization techniques
|
||||
- **[SKILL.md](./SKILL.md)** - Overview of the Fine-tune skill
|
||||
|
||||
## 💡 Troubleshooting
|
||||
|
||||
### Cannot find optimization targets in Phase 1
|
||||
→ Check search patterns in [workflow_phase1.md#step-2](./workflow_phase1.md#step-2-identify-optimization-targets-with-serena-mcp)
|
||||
|
||||
### Evaluation script fails in Phase 2
|
||||
→ Check checklist in [workflow_phase2.md#step-4](./workflow_phase2.md#step-4-prepare-evaluation-environment)
|
||||
|
||||
### No improvement effect in Phase 3
|
||||
→ Review priority matrix in [workflow_phase3.md#step-7](./workflow_phase3.md#step-7-prioritization)
|
||||
|
||||
### Report creation takes too long in Phase 4
|
||||
→ Utilize templates in [workflow_phase4.md#step-12](./workflow_phase4.md#step-12-create-final-evaluation-report)
|
||||
|
||||
---
|
||||
|
||||
Following this workflow enables:
|
||||
- ✅ Systematic fine-tuning process execution
|
||||
- ✅ Data-driven decision making
|
||||
- ✅ Continuous improvement and verification
|
||||
- ✅ Complete documentation and traceability
|
||||
229
skills/fine-tune/workflow_phase1.md
Normal file
@@ -0,0 +1,229 @@
|
||||
# Phase 1: Preparation and Analysis
|
||||
|
||||
Preparation phase to clarify optimization direction and identify targets for improvement.
|
||||
|
||||
**Time Required**: 30 minutes - 1 hour
|
||||
|
||||
**📋 Related Documents**: [Overall Workflow](./workflow.md) | [Practical Examples](./examples.md)
|
||||
|
||||
---
|
||||
|
||||
## Phase 1: Preparation and Analysis
|
||||
|
||||
### Step 1: Read and Understand fine-tune.md
|
||||
|
||||
**Purpose**: Clarify optimization direction
|
||||
|
||||
**Execution**:
|
||||
```python
|
||||
# Read .langgraph-master/fine-tune.md
|
||||
file_path = ".langgraph-master/fine-tune.md"
|
||||
with open(file_path, "r") as f:
|
||||
fine_tune_spec = f.read()
|
||||
|
||||
# Extract the following information:
|
||||
# - Optimization goals (accuracy, latency, cost, etc.)
|
||||
# - Evaluation methods (test cases, metrics, calculation methods)
|
||||
# - Passing criteria (target values for each metric)
|
||||
# - Test data location
|
||||
```
|
||||
|
||||
**Typical fine-tune.md structure**:
|
||||
```markdown
|
||||
# Fine-Tuning Goals
|
||||
|
||||
## Optimization Objectives
|
||||
- **Accuracy**: Improve user intent classification accuracy to 90% or higher
|
||||
- **Latency**: Reduce response time to 2.0 seconds or less
|
||||
- **Cost**: Reduce cost per request to $0.010 or less
|
||||
|
||||
## Evaluation Methods
|
||||
- **Test Cases**: tests/evaluation/test_cases.json (20 cases)
|
||||
- **Execution Command**: uv run python -m src.evaluate
|
||||
- **Evaluation Script**: tests/evaluation/evaluator.py
|
||||
|
||||
## Evaluation Metrics
|
||||
|
||||
### Accuracy
|
||||
- Calculation method: (Correct count / Total cases) × 100
|
||||
- Target value: 90% or higher
|
||||
|
||||
### Latency
|
||||
- Calculation method: Average time per execution
|
||||
- Target value: 2.0 seconds or less
|
||||
|
||||
### Cost
|
||||
- Calculation method: Total API cost / Total requests
|
||||
- Target value: $0.010 or less
|
||||
|
||||
## Passing Criteria
|
||||
All evaluation metrics must achieve their target values
|
||||
```
|
||||
|
||||
### Step 2: Identify Optimization Targets with Serena MCP
|
||||
|
||||
**Purpose**: Comprehensively identify nodes calling LLMs
|
||||
|
||||
**Execution Steps**:
|
||||
|
||||
1. **Search for LLM clients**
|
||||
```python
|
||||
# Use Serena MCP: find_symbol
|
||||
# Search for ChatAnthropic, ChatOpenAI, ChatGoogleGenerativeAI, etc.
|
||||
|
||||
patterns = [
|
||||
"ChatAnthropic",
|
||||
"ChatOpenAI",
|
||||
"ChatGoogleGenerativeAI",
|
||||
"ChatVertexAI"
|
||||
]
|
||||
|
||||
llm_usages = []
|
||||
for pattern in patterns:
|
||||
results = serena.find_symbol(
|
||||
name_path=pattern,
|
||||
substring_matching=True,
|
||||
include_body=False
|
||||
)
|
||||
llm_usages.extend(results)
|
||||
```
|
||||
|
||||
2. **Identify prompt construction locations**
|
||||
```python
|
||||
# For each LLM call, investigate how prompts are constructed
|
||||
for usage in llm_usages:
|
||||
# Get surrounding context with find_referencing_symbols
|
||||
context = serena.find_referencing_symbols(
|
||||
name_path=usage.name,
|
||||
relative_path=usage.file_path
|
||||
)
|
||||
|
||||
# Identify prompt templates and message construction logic
|
||||
# - Use of ChatPromptTemplate
|
||||
# - SystemMessage, HumanMessage definitions
|
||||
# - Prompt construction with f-strings or format()
|
||||
```
|
||||
|
||||
3. **Per-node analysis**
|
||||
```python
|
||||
# Analyze LLM usage patterns within each node function
|
||||
# - Prompt clarity
|
||||
# - Presence of few-shot examples
|
||||
# - Structured output format
|
||||
# - Parameter settings (temperature, max_tokens, etc.)
|
||||
```
|
||||
|
||||
**Example Output**:
|
||||
```markdown
|
||||
## LLM Call Location Analysis
|
||||
|
||||
### 1. analyze_intent node
|
||||
- **File**: src/nodes/analyzer.py
|
||||
- **Line numbers**: 25-45
|
||||
- **LLM**: ChatAnthropic(model="claude-3-5-sonnet-20241022")
|
||||
- **Prompt structure**:
|
||||
```python
|
||||
SystemMessage: "You are an intent analyzer..."
|
||||
HumanMessage: f"Analyze: {user_input}"
|
||||
```
|
||||
- **Improvement potential**: ⭐⭐⭐⭐⭐ (High)
|
||||
- Prompt is vague ("Analyze" criteria unclear)
|
||||
- No few-shot examples
|
||||
- Output format is free text
|
||||
- **Estimated improvement effect**: Accuracy +10-15%
|
||||
|
||||
### 2. generate_response node
|
||||
- **File**: src/nodes/generator.py
|
||||
- **Line numbers**: 45-68
|
||||
- **LLM**: ChatAnthropic(model="claude-3-5-sonnet-20241022")
|
||||
- **Prompt structure**:
|
||||
```python
|
||||
ChatPromptTemplate.from_messages([
|
||||
("system", "Generate helpful response..."),
|
||||
("human", "{context}\n\nQuestion: {question}")
|
||||
])
|
||||
```
|
||||
- **Improvement potential**: ⭐⭐⭐ (Medium)
|
||||
- Prompt is structured but lacks conciseness instructions
|
||||
- No max_tokens limit → possibility of verbose output
|
||||
- **Estimated improvement effect**: Latency -0.3-0.5s, Cost -20-30%
|
||||
```
|
||||
|
||||
### Step 3: Create Optimization Target List
|
||||
|
||||
**Purpose**: Organize information to determine improvement priorities
|
||||
|
||||
**List Creation Template**:
|
||||
```markdown
|
||||
# Optimization Target List
|
||||
|
||||
## Node: analyze_intent
|
||||
|
||||
### Basic Information
|
||||
- **File**: src/nodes/analyzer.py:25-45
|
||||
- **Role**: Classify user input intent
|
||||
- **LLM Model**: claude-3-5-sonnet-20241022
|
||||
- **Current Parameters**: temperature=1.0, max_tokens=default
|
||||
|
||||
### Current Prompt
|
||||
```python
|
||||
SystemMessage(content="You are an intent analyzer. Analyze user input.")
|
||||
HumanMessage(content=f"Analyze: {user_input}")
|
||||
```
|
||||
|
||||
### Issues
|
||||
1. **Vague instructions**: Specific criteria for "Analyze" unclear
|
||||
2. **No few-shot**: No expected output examples
|
||||
3. **Undefined output format**: Unstructured free text
|
||||
4. **High temperature**: 1.0 is too high for classification tasks
|
||||
|
||||
### Improvement Ideas
|
||||
1. Specify concrete classification categories
|
||||
2. Add 3-5 few-shot examples
|
||||
3. Specify JSON output format
|
||||
4. Lower temperature to 0.3-0.5
|
||||
|
||||
### Estimated Improvement Effect
|
||||
- **Accuracy**: +10-15% (Current misclassification 20% → 5-10%)
|
||||
- **Latency**: ±0 (No change)
|
||||
- **Cost**: ±0 (No change)
|
||||
|
||||
### Priority
|
||||
⭐⭐⭐⭐⭐ (Highest) - Direct impact on accuracy improvement
|
||||
|
||||
---
|
||||
|
||||
## Node: generate_response
|
||||
|
||||
### Basic Information
|
||||
- **File**: src/nodes/generator.py:45-68
|
||||
- **Role**: Generate final user-facing response
|
||||
- **LLM Model**: claude-3-5-sonnet-20241022
|
||||
- **Current Parameters**: temperature=0.7, max_tokens=default
|
||||
|
||||
### Current Prompt
|
||||
```python
|
||||
ChatPromptTemplate.from_messages([
|
||||
("system", "Generate helpful response based on context."),
|
||||
("human", "{context}\n\nQuestion: {question}")
|
||||
])
|
||||
```
|
||||
|
||||
### Issues
|
||||
1. **No verbosity control**: No conciseness instructions
|
||||
2. **max_tokens not set**: Possibility of unnecessarily long output
|
||||
3. **Undefined response style**: No tone or style specifications
|
||||
|
||||
### Improvement Ideas
|
||||
1. Add length instructions such as "be concise" or "answer in 2-3 sentences"
|
||||
2. Limit max_tokens to 500
|
||||
3. Clarify the response style ("friendly", "professional", etc.)
|
||||
|
||||
### Estimated Improvement Effect
|
||||
- **Accuracy**: ±0 (No change)
|
||||
- **Latency**: -0.3-0.5s (Due to reduced output tokens)
|
||||
- **Cost**: -20-30% (Due to reduced token count)
|
||||
|
||||
### Priority
|
||||
⭐⭐⭐ (Medium) - Improvement in latency and cost
|
||||
```
|
||||
222
skills/fine-tune/workflow_phase2.md
Normal file
@@ -0,0 +1,222 @@
|
||||
# Phase 2: Baseline Evaluation
|
||||
|
||||
Phase to quantitatively measure current performance.
|
||||
|
||||
**Time Required**: 1-2 hours
|
||||
|
||||
**📋 Related Documents**: [Overall Workflow](./workflow.md) | [Evaluation Methods](./evaluation.md)
|
||||
|
||||
---
|
||||
|
||||
## Phase 2: Baseline Evaluation
|
||||
|
||||
### Step 4: Prepare Evaluation Environment
|
||||
|
||||
**Checklist**:
|
||||
- [ ] Test case files exist
|
||||
- [ ] Evaluation script is executable
|
||||
- [ ] Environment variables (API keys, etc.) are set
|
||||
- [ ] Dependency packages are installed
|
||||
|
||||
**Execution Example**:
|
||||
```bash
|
||||
# Check test cases
|
||||
cat tests/evaluation/test_cases.json
|
||||
|
||||
# Verify evaluation script works
|
||||
uv run python -m src.evaluate --dry-run
|
||||
|
||||
# Verify environment variables
|
||||
echo $ANTHROPIC_API_KEY
|
||||
```
|
||||
|
||||
### Step 5: Measure Baseline
|
||||
|
||||
**Recommended Run Count**: 3-5 times (for statistical reliability)
|
||||
|
||||
**Execution Script Example**:
|
||||
```bash
|
||||
#!/bin/bash
|
||||
# baseline_evaluation.sh
|
||||
|
||||
ITERATIONS=5
|
||||
RESULTS_DIR="evaluation_results/baseline"
|
||||
mkdir -p $RESULTS_DIR
|
||||
|
||||
for i in $(seq 1 $ITERATIONS); do
|
||||
echo "Running baseline evaluation: iteration $i/$ITERATIONS"
|
||||
uv run python -m src.evaluate \
|
||||
--output "$RESULTS_DIR/run_$i.json" \
|
||||
--verbose
|
||||
|
||||
# API rate limit countermeasure
|
||||
sleep 5
|
||||
done
|
||||
|
||||
# Aggregate results
|
||||
uv run python -m src.aggregate_results \
|
||||
--input-dir "$RESULTS_DIR" \
|
||||
--output "$RESULTS_DIR/summary.json"
|
||||
```
|
||||
|
||||
**Evaluation Script Example** (`src/evaluate.py`):
|
||||
```python
|
||||
import json
|
||||
import time
|
||||
from pathlib import Path
|
||||
from typing import Dict, List
|
||||
|
||||
def evaluate_test_cases(test_cases: List[Dict]) -> Dict:
|
||||
"""Evaluate test cases"""
|
||||
results = {
|
||||
"total_cases": len(test_cases),
|
||||
"correct": 0,
|
||||
"total_latency": 0.0,
|
||||
"total_cost": 0.0,
|
||||
"case_results": []
|
||||
}
|
||||
|
||||
for case in test_cases:
|
||||
start_time = time.time()
|
||||
|
||||
# Execute LangGraph application
|
||||
output = run_langgraph_app(case["input"])
|
||||
|
||||
latency = time.time() - start_time
|
||||
|
||||
# Correct answer judgment
|
||||
is_correct = output["answer"] == case["expected_answer"]
|
||||
if is_correct:
|
||||
results["correct"] += 1
|
||||
|
||||
# Cost calculation (from token usage)
|
||||
cost = calculate_cost(output["token_usage"])
|
||||
|
||||
results["total_latency"] += latency
|
||||
results["total_cost"] += cost
|
||||
|
||||
results["case_results"].append({
|
||||
"case_id": case["id"],
|
||||
"correct": is_correct,
|
||||
"latency": latency,
|
||||
"cost": cost
|
||||
})
|
||||
|
||||
# Calculate metrics
|
||||
results["accuracy"] = (results["correct"] / results["total_cases"]) * 100
|
||||
results["avg_latency"] = results["total_latency"] / results["total_cases"]
|
||||
results["avg_cost"] = results["total_cost"] / results["total_cases"]
|
||||
|
||||
return results
|
||||
|
||||
def calculate_cost(token_usage: Dict) -> float:
|
||||
"""Calculate cost from token usage"""
|
||||
# Claude 3.5 Sonnet pricing
|
||||
INPUT_COST_PER_1M = 3.0 # $3.00 per 1M input tokens
|
||||
OUTPUT_COST_PER_1M = 15.0 # $15.00 per 1M output tokens
|
||||
|
||||
input_cost = (token_usage["input_tokens"] / 1_000_000) * INPUT_COST_PER_1M
|
||||
output_cost = (token_usage["output_tokens"] / 1_000_000) * OUTPUT_COST_PER_1M
|
||||
|
||||
return input_cost + output_cost
|
||||
```
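
For the shell script above to work, the module also needs an entry point that loads the test cases, runs the evaluation, and writes the results file. A sketch of how that wiring might look, reusing `json`, `Path`, and `evaluate_test_cases` from the snippet above (the test-case path and flags mirror the commands in Steps 4-5, but the exact CLI is project-specific):

```python
import argparse

def main() -> None:
    parser = argparse.ArgumentParser(description="Run the evaluation test cases")
    parser.add_argument("--output", required=True, help="Path for the results JSON")
    parser.add_argument("--verbose", action="store_true")
    parser.add_argument("--dry-run", action="store_true", help="Validate setup without calling the LLM")
    args = parser.parse_args()

    test_cases = json.loads(Path("tests/evaluation/test_cases.json").read_text())
    if args.dry_run:
        print(f"Loaded {len(test_cases)} test cases - setup looks OK")
        return

    results = evaluate_test_cases(test_cases)
    Path(args.output).write_text(json.dumps(results, indent=2))
    if args.verbose:
        print(f"accuracy={results['accuracy']:.1f}%  "
              f"avg_latency={results['avg_latency']:.2f}s  "
              f"avg_cost=${results['avg_cost']:.4f}")

if __name__ == "__main__":
    main()
```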
|
||||
|
||||
### Step 6: Analyze Baseline Results
|
||||
|
||||
**Aggregation Script Example** (`src/aggregate_results.py`):
|
||||
```python
|
||||
import json
|
||||
import numpy as np
|
||||
from pathlib import Path
|
||||
from typing import List, Dict
|
||||
|
||||
def aggregate_results(results_dir: Path) -> Dict:
|
||||
"""Aggregate multiple execution results"""
|
||||
all_results = []
|
||||
|
||||
for result_file in sorted(results_dir.glob("run_*.json")):
|
||||
with open(result_file) as f:
|
||||
all_results.append(json.load(f))
|
||||
|
||||
# Calculate statistics for each metric
|
||||
accuracies = [r["accuracy"] for r in all_results]
|
||||
latencies = [r["avg_latency"] for r in all_results]
|
||||
costs = [r["avg_cost"] for r in all_results]
|
||||
|
||||
summary = {
|
||||
"iterations": len(all_results),
|
||||
"accuracy": {
|
||||
"mean": np.mean(accuracies),
|
||||
"std": np.std(accuracies),
|
||||
"min": np.min(accuracies),
|
||||
"max": np.max(accuracies)
|
||||
},
|
||||
"latency": {
|
||||
"mean": np.mean(latencies),
|
||||
"std": np.std(latencies),
|
||||
"min": np.min(latencies),
|
||||
"max": np.max(latencies)
|
||||
},
|
||||
"cost": {
|
||||
"mean": np.mean(costs),
|
||||
"std": np.std(costs),
|
||||
"min": np.min(costs),
|
||||
"max": np.max(costs)
|
||||
}
|
||||
}
|
||||
|
||||
return summary
|
||||
```
|
||||
|
||||
**Results Report Example**:
|
||||
```markdown
|
||||
# Baseline Evaluation Results
|
||||
|
||||
Execution Date: 2024-11-24 10:00:00
|
||||
Run Count: 5
|
||||
Test Case Count: 20
|
||||
|
||||
## Evaluation Metrics Summary
|
||||
|
||||
| Metric | Mean | Std Dev | Min | Max | Target | Gap |
|
||||
|--------|------|---------|-----|-----|--------|-----|
|
||||
| Accuracy | 75.0% | 3.2% | 70.0% | 80.0% | 90.0% | **-15.0%** |
|
||||
| Latency | 2.5s | 0.4s | 2.1s | 3.2s | 2.0s | **+0.5s** |
|
||||
| Cost/req | $0.015 | $0.002 | $0.013 | $0.018 | $0.010 | **+$0.005** |
|
||||
|
||||
## Detailed Analysis
|
||||
|
||||
### Accuracy Issues
|
||||
- **Current**: 75.0% (Target: 90.0%)
|
||||
- **Main error patterns**:
|
||||
1. Intent classification errors: 12 cases (60% of errors)
|
||||
2. Context understanding deficiency: 5 cases (25% of errors)
|
||||
3. Handling ambiguous questions: 3 cases (15% of errors)
|
||||
|
||||
### Latency Issues
|
||||
- **Current**: 2.5s (Target: 2.0s)
|
||||
- **Bottlenecks**:
|
||||
1. generate_response node: avg 1.8s (72% of total)
|
||||
2. analyze_intent node: avg 0.5s (20% of total)
|
||||
3. Other: avg 0.2s (8% of total)
|
||||
|
||||
### Cost Issues
|
||||
- **Current**: $0.015/req (Target: $0.010/req)
|
||||
- **Cost breakdown**:
|
||||
1. generate_response: $0.011 (73%)
|
||||
2. analyze_intent: $0.003 (20%)
|
||||
3. Other: $0.001 (7%)
|
||||
- **Main factor**: High output token count (avg 800 tokens)
|
||||
|
||||
## Improvement Directions
|
||||
|
||||
### Priority 1: Improve analyze_intent accuracy
|
||||
- **Impact**: Direct impact on accuracy (accounts for 60% of -15% gap)
|
||||
- **Improvements**: Few-shot examples, clear classification criteria, JSON output format
|
||||
- **Estimated effect**: +10-12% accuracy
|
||||
|
||||
### Priority 2: Optimize generate_response efficiency
|
||||
- **Impact**: Affects both latency and cost
|
||||
- **Improvements**: Conciseness instructions, max_tokens limit, temperature adjustment
|
||||
- **Estimated effect**: -0.4s latency, -$0.004 cost
|
||||
```
|
||||
225
skills/fine-tune/workflow_phase3.md
Normal file
@@ -0,0 +1,225 @@
|
||||
# Phase 3: Iterative Improvement
|
||||
|
||||
Phase for data-driven, incremental prompt optimization.
|
||||
|
||||
**Time Required**: 1-2 hours per iteration × number of iterations (typically 3-5)
|
||||
|
||||
**📋 Related Documents**: [Overall Workflow](./workflow.md) | [Prompt Optimization](./prompt_optimization.md)
|
||||
|
||||
---
|
||||
|
||||
## Phase 3: Iterative Improvement
|
||||
|
||||
### Iteration Cycle
|
||||
|
||||
Execute the following in each iteration:
|
||||
|
||||
1. **Prioritization** (Step 7)
|
||||
2. **Implement Improvements** (Step 8)
|
||||
3. **Post-Improvement Evaluation** (Step 9)
|
||||
4. **Compare Results** (Step 10)
|
||||
5. **Continue Decision** (Step 11)
|
||||
|
||||
### Step 7: Prioritization
|
||||
|
||||
**Decision Criteria**:
|
||||
1. **Impact on goal achievement**
|
||||
2. **Feasibility of improvement**
|
||||
3. **Implementation cost**
|
||||
|
||||
**Priority Matrix**:
|
||||
```markdown
|
||||
## Improvement Priority Matrix
|
||||
|
||||
| Node | Impact | Feasibility | Impl Cost | Total Score | Priority |
|
||||
|------|--------|-------------|-----------|-------------|----------|
|
||||
| analyze_intent | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | 14/15 | 1st |
|
||||
| generate_response | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | 12/15 | 2nd |
|
||||
| retrieve_context | ⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ | 8/15 | 3rd |
|
||||
|
||||
**Iteration 1 Target**: analyze_intent node
|
||||
```
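
The total score is simply the sum of the three ratings. A tiny sketch of scoring and ranking candidates on the same 1-5 scale used in the matrix:

```python
# Score and rank improvement candidates exactly as in the matrix above (1-5 each)
candidates = {
    "analyze_intent":    {"impact": 5, "feasibility": 5, "impl_cost": 4},
    "generate_response": {"impact": 4, "feasibility": 4, "impl_cost": 4},
    "retrieve_context":  {"impact": 2, "feasibility": 3, "impl_cost": 3},
}

ranked = sorted(candidates.items(), key=lambda item: sum(item[1].values()), reverse=True)

for rank, (node, scores) in enumerate(ranked, start=1):
    print(f"{rank}. {node}: {sum(scores.values())}/15")
```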
|
||||
|
||||
### Step 8: Implement Improvements
|
||||
|
||||
**Pre-Improvement Prompt** (`src/nodes/analyzer.py`):
|
||||
```python
|
||||
# Before
|
||||
def analyze_intent(state: GraphState) -> GraphState:
|
||||
llm = ChatAnthropic(
|
||||
model="claude-3-5-sonnet-20241022",
|
||||
temperature=1.0
|
||||
)
|
||||
|
||||
messages = [
|
||||
SystemMessage(content="You are an intent analyzer. Analyze user input."),
|
||||
HumanMessage(content=f"Analyze: {state['user_input']}")
|
||||
]
|
||||
|
||||
response = llm.invoke(messages)
|
||||
state["intent"] = response.content
|
||||
return state
|
||||
```
|
||||
|
||||
**Post-Improvement Prompt**:
|
||||
```python
|
||||
# After - Iteration 1
|
||||
def analyze_intent(state: GraphState) -> GraphState:
|
||||
llm = ChatAnthropic(
|
||||
model="claude-3-5-sonnet-20241022",
|
||||
temperature=0.3 # Lower temperature for classification tasks
|
||||
)
|
||||
|
||||
# Clear classification categories and few-shot examples
|
||||
system_prompt = """You are an intent classifier for a customer support chatbot.
|
||||
|
||||
Classify user input into one of these categories:
|
||||
- "product_inquiry": Questions about products or services
|
||||
- "technical_support": Technical issues or troubleshooting
|
||||
- "billing": Payment, invoicing, or billing questions
|
||||
- "general": General questions or chitchat
|
||||
|
||||
Output ONLY a valid JSON object with this structure:
|
||||
{
|
||||
"intent": "<category>",
|
||||
"confidence": <0.0-1.0>,
|
||||
"reasoning": "<brief explanation>"
|
||||
}
|
||||
|
||||
Examples:
|
||||
|
||||
Input: "How much does the premium plan cost?"
|
||||
Output: {"intent": "product_inquiry", "confidence": 0.95, "reasoning": "Question about product pricing"}
|
||||
|
||||
Input: "I can't log into my account"
|
||||
Output: {"intent": "technical_support", "confidence": 0.9, "reasoning": "Authentication issue"}
|
||||
|
||||
Input: "Why was I charged twice?"
|
||||
Output: {"intent": "billing", "confidence": 0.95, "reasoning": "Question about billing charges"}
|
||||
|
||||
Input: "Hello, how are you?"
|
||||
Output: {"intent": "general", "confidence": 0.85, "reasoning": "General greeting"}
|
||||
|
||||
Input: "What's the return policy?"
|
||||
Output: {"intent": "product_inquiry", "confidence": 0.9, "reasoning": "Question about product policy"}
|
||||
"""
|
||||
|
||||
messages = [
|
||||
SystemMessage(content=system_prompt),
|
||||
HumanMessage(content=f"Input: {state['user_input']}\nOutput:")
|
||||
]
|
||||
|
||||
response = llm.invoke(messages)
|
||||
|
||||
# JSON parsing (with error handling)
|
||||
try:
|
||||
intent_data = json.loads(response.content)
|
||||
state["intent"] = intent_data["intent"]
|
||||
state["confidence"] = intent_data["confidence"]
|
||||
except json.JSONDecodeError:
|
||||
# Fallback
|
||||
state["intent"] = "general"
|
||||
state["confidence"] = 0.5
|
||||
|
||||
return state
|
||||
```
|
||||
|
||||
**Summary of Changes**:
|
||||
1. ✅ temperature: 1.0 → 0.3 (appropriate for classification tasks)
|
||||
2. ✅ Clear classification categories (4 intents)
|
||||
3. ✅ Few-shot examples (added 5)
|
||||
4. ✅ JSON output format (structured output)
|
||||
5. ✅ Error handling (fallback for JSON parse failures)
|
||||
|
||||
### Step 9: Post-Improvement Evaluation
|
||||
|
||||
**Execution**:
|
||||
```bash
|
||||
# Execute post-improvement evaluation under same conditions
|
||||
./evaluation_after_iteration1.sh
|
||||
```
|
||||
|
||||
### Step 10: Compare Results
|
||||
|
||||
**Comparison Report Example**:
|
||||
```markdown
|
||||
# Iteration 1 Evaluation Results
|
||||
|
||||
Execution Date: 2024-11-24 12:00:00
|
||||
Changes: Optimization of analyze_intent node
|
||||
|
||||
## Results Comparison
|
||||
|
||||
| Metric | Baseline | Iteration 1 | Change | % Change | Target | Achievement |
|
||||
|--------|----------|-------------|--------|----------|--------|-------------|
|
||||
| **Accuracy** | 75.0% | **86.0%** | **+11.0%** | +14.7% | 90.0% | 95.6% |
|
||||
| **Latency** | 2.5s | 2.4s | -0.1s | -4.0% | 2.0s | 83.3% |
|
||||
| **Cost/req** | $0.015 | $0.014 | -$0.001 | -6.7% | $0.010 | 71.4% |
|
||||
|
||||
## Detailed Analysis
|
||||
|
||||
### Accuracy Improvement
|
||||
- **Improvement**: +11.0% (75.0% → 86.0%)
|
||||
- **Remaining gap**: 4.0% (target 90.0%)
|
||||
- **Improved cases**: Intent classification errors reduced from 12 → 3 cases
|
||||
- **Still needs improvement**: Context understanding deficiency cases (5 cases)
|
||||
|
||||
### Slight Latency Improvement
|
||||
- **Improvement**: -0.1s (2.5s → 2.4s)
|
||||
- **Main factor**: Lower temperature in analyze_intent made output more concise
|
||||
- **Remaining bottleneck**: generate_response (avg 1.8s)
|
||||
|
||||
### Slight Cost Reduction
|
||||
- **Reduction**: -$0.001 (6.7% reduction)
|
||||
- **Factor**: Reduced output tokens in analyze_intent
|
||||
- **Main cost**: generate_response still accounts for 73%
|
||||
|
||||
## Next Iteration Strategy
|
||||
|
||||
### Priority 1: Optimize generate_response
|
||||
- **Goal**: Latency 1.8s → 1.4s, Cost $0.011 → $0.007
|
||||
- **Approach**:
|
||||
1. Add conciseness instructions
|
||||
2. Limit max_tokens to 500
|
||||
3. Adjust temperature from 0.7 → 0.5
|
||||
|
||||
### Priority 2: Final 4% accuracy improvement
|
||||
- **Goal**: 86.0% → 90.0% or higher
|
||||
- **Approach**: Improve context understanding (retrieve_context node)
|
||||
|
||||
## Decision
|
||||
✅ Continue → Proceed to Iteration 2
|
||||
```
|
||||
|
||||
### Step 11: Continue Decision
|
||||
|
||||
**Decision Criteria**:
|
||||
```python
|
||||
def should_continue_iteration(results: Dict, goals: Dict) -> bool:
|
||||
"""Determine if iteration should continue"""
|
||||
all_goals_met = True
|
||||
|
||||
for metric, goal in goals.items():
|
||||
if metric == "accuracy":
|
||||
if results[metric] < goal:
|
||||
all_goals_met = False
|
||||
elif metric in ["latency", "cost"]:
|
||||
if results[metric] > goal:
|
||||
all_goals_met = False
|
||||
|
||||
return not all_goals_met
|
||||
|
||||
# Example
|
||||
goals = {"accuracy": 90.0, "latency": 2.0, "cost": 0.010}
|
||||
results = {"accuracy": 86.0, "latency": 2.4, "cost": 0.014}
|
||||
|
||||
if should_continue_iteration(results, goals):
|
||||
print("Proceed to next iteration")
|
||||
else:
|
||||
print("Goals achieved - Move to Phase 4")
|
||||
```
|
||||
|
||||
**Iteration Limit**:
|
||||
- **Recommended**: 3-5 iterations
|
||||
- **Reason**: Beyond this, the law of diminishing returns tends to apply
|
||||
- **Exception**: Critical applications may require 10+ iterations
|
||||
339
skills/fine-tune/workflow_phase4.md
Normal file
@@ -0,0 +1,339 @@
|
||||
# Phase 4: Completion and Documentation
|
||||
|
||||
Phase to record final results and commit code.
|
||||
|
||||
**Time Required**: 30 minutes - 1 hour
|
||||
|
||||
**📋 Related Documents**: [Overall Workflow](./workflow.md) | [Practical Examples](./examples.md)
|
||||
|
||||
---
|
||||
|
||||
## Phase 4: Completion and Documentation
|
||||
|
||||
### Step 12: Create Final Evaluation Report
|
||||
|
||||
**Report Template**:
|
||||
```markdown
|
||||
# LangGraph Application Fine-Tuning Completion Report
|
||||
|
||||
Project: [Project Name]
|
||||
Implementation Period: 2024-11-24 10:00 - 2024-11-24 15:00 (5 hours)
|
||||
Implementer: Claude Code with fine-tune skill
|
||||
|
||||
## Executive Summary
|
||||
|
||||
This fine-tuning project executed prompt optimization for a LangGraph chatbot application and achieved the following results:
|
||||
|
||||
- ✅ **Accuracy**: 75.0% → 92.0% (+17.0%, achieved 90% target)
|
||||
- ✅ **Latency**: 2.5s → 1.9s (-24.0%, achieved 2.0s target)
|
||||
- ⚠️ **Cost**: $0.015 → $0.011 (-26.7%, target $0.010 not met)
|
||||
|
||||
A total of 3 iterations were executed, achieving 2 out of 3 metric targets.
|
||||
|
||||
## Implementation Summary
|
||||
|
||||
### Iteration Count and Execution Time
|
||||
- **Total Iterations**: 3
|
||||
- **Optimized Nodes**: 2 (analyze_intent, generate_response)
|
||||
- **Evaluation Run Count**: 20 times (baseline 5 times + 5 times × 3 post-iteration)
|
||||
- **Total Execution Time**: Approximately 5 hours
|
||||
|
||||
### Final Results
|
||||
|
||||
| Metric | Initial | Final | Improvement | % Change | Target | Achievement |
|
||||
|--------|---------|-------|-------------|----------|--------|-------------|
|
||||
| Accuracy | 75.0% | 92.0% | +17.0% | +22.7% | 90.0% | ✅ 102.2% achieved |
|
||||
| Latency | 2.5s | 1.9s | -0.6s | -24.0% | 2.0s | ✅ 105.3% achieved |
|
||||
| Cost/req | $0.015 | $0.011 | -$0.004 | -26.7% | $0.010 | ⚠️ 90.9% achieved |
|
||||
|
||||
## Iteration Details
|
||||
|
||||
### Iteration 1: Optimization of analyze_intent node
|
||||
|
||||
**Date/Time**: 2024-11-24 11:00
|
||||
**Target Node**: src/nodes/analyzer.py:25-45
|
||||
|
||||
**Changes**:
|
||||
1. temperature: 1.0 → 0.3
|
||||
2. Added 5 few-shot examples
|
||||
3. Structured JSON output format
|
||||
4. Defined clear classification categories (4)
|
||||
|
||||
**Results**:
|
||||
- Accuracy: 75.0% → 86.0% (+11.0%)
|
||||
- Latency: 2.5s → 2.4s (-0.1s)
|
||||
- Cost: $0.015 → $0.014 (-$0.001)
|
||||
|
||||
**Learning**: Few-shot examples and clear output format most effective for accuracy improvement
|
||||
|
||||
---
|
||||
|
||||
### Iteration 2: Optimization of generate_response node
|
||||
|
||||
**Date/Time**: 2024-11-24 13:00
|
||||
**Target Node**: src/nodes/generator.py:45-68
|
||||
|
||||
**Changes**:
|
||||
1. Added conciseness instructions ("answer in 2-3 sentences")
|
||||
2. max_tokens: unlimited → 500
|
||||
3. temperature: 0.7 → 0.5
|
||||
4. Clarified response style
|
||||
|
||||
**Results**:
|
||||
- Accuracy: 86.0% → 88.0% (+2.0%)
|
||||
- Latency: 2.4s → 2.0s (-0.4s)
|
||||
- Cost: $0.014 → $0.011 (-$0.003)
|
||||
|
||||
**Learning**: max_tokens limit contributed significantly to latency and cost reduction
|
||||
|
||||
---
|
||||
|
||||
### Iteration 3: Additional improvement of analyze_intent
|
||||
|
||||
**Date/Time**: 2024-11-24 14:30
|
||||
**Target Node**: src/nodes/analyzer.py:25-45
|
||||
|
||||
**Changes**:
|
||||
1. Increased few-shot examples from 5 → 10
|
||||
2. Added edge case handling
|
||||
3. Re-classification logic with confidence threshold
|
||||
|
||||
**Results**:
|
||||
- Accuracy: 88.0% → 92.0% (+4.0%)
|
||||
- Latency: 2.0s → 1.9s (-0.1s)
|
||||
- Cost: $0.011 → $0.011 (±0)
|
||||
|
||||
**Learning**: Additional few-shot examples broke through final accuracy barrier
|
||||
|
||||
## Final Changes
|
||||
|
||||
### src/nodes/analyzer.py (analyze_intent node)
|
||||
|
||||
#### Before
|
||||
```python
|
||||
def analyze_intent(state: GraphState) -> GraphState:
|
||||
llm = ChatAnthropic(model="claude-3-5-sonnet-20241022", temperature=1.0)
|
||||
messages = [
|
||||
SystemMessage(content="You are an intent analyzer. Analyze user input."),
|
||||
HumanMessage(content=f"Analyze: {state['user_input']}")
|
||||
]
|
||||
response = llm.invoke(messages)
|
||||
state["intent"] = response.content
|
||||
return state
|
||||
```
|
||||
|
||||
#### After
|
||||
```python
|
||||
def analyze_intent(state: GraphState) -> GraphState:
|
||||
llm = ChatAnthropic(model="claude-3-5-sonnet-20241022", temperature=0.3)
|
||||
|
||||
system_prompt = """You are an intent classifier for a customer support chatbot.
|
||||
Classify user input into: product_inquiry, technical_support, billing, or general.
|
||||
Output JSON: {"intent": "<category>", "confidence": <0.0-1.0>, "reasoning": "<explanation>"}
|
||||
|
||||
[10 few-shot examples...]
|
||||
"""
|
||||
|
||||
messages = [
|
||||
SystemMessage(content=system_prompt),
|
||||
HumanMessage(content=f"Input: {state['user_input']}\nOutput:")
|
||||
]
|
||||
|
||||
response = llm.invoke(messages)
|
||||
intent_data = json.loads(response.content)
|
||||
|
||||
# Low confidence → re-classify as general
|
||||
if intent_data["confidence"] < 0.7:
|
||||
intent_data["intent"] = "general"
|
||||
|
||||
state["intent"] = intent_data["intent"]
|
||||
state["confidence"] = intent_data["confidence"]
|
||||
return state
|
||||
```
|
||||
|
||||
**Key Changes**:
|
||||
- temperature: 1.0 → 0.3
|
||||
- Few-shot examples: 0 → 10
|
||||
- Output: free text → JSON
|
||||
- Added confidence threshold fallback
|
||||
|
||||
---
|
||||
|
||||
### src/nodes/generator.py (generate_response node)
|
||||
|
||||
#### Before
|
||||
```python
|
||||
def generate_response(state: GraphState) -> GraphState:
|
||||
llm = ChatAnthropic(model="claude-3-5-sonnet-20241022", temperature=0.7)
|
||||
prompt = ChatPromptTemplate.from_messages([
|
||||
("system", "Generate helpful response based on context."),
|
||||
("human", "{context}\n\nQuestion: {question}")
|
||||
])
|
||||
chain = prompt | llm
|
||||
response = chain.invoke({"context": state["context"], "question": state["user_input"]})
|
||||
state["response"] = response.content
|
||||
return state
|
||||
```
|
||||
|
||||
#### After
|
||||
```python
|
||||
def generate_response(state: GraphState) -> GraphState:
|
||||
llm = ChatAnthropic(
|
||||
model="claude-3-5-sonnet-20241022",
|
||||
temperature=0.5,
|
||||
max_tokens=500 # Output length limit
|
||||
)
|
||||
|
||||
system_prompt = """You are a helpful customer support assistant.
|
||||
|
||||
Guidelines:
|
||||
- Be concise: Answer in 2-3 sentences
|
||||
- Be friendly: Use a warm, professional tone
|
||||
- Be accurate: Base your answer on the provided context
|
||||
- If uncertain: Acknowledge and offer to escalate
|
||||
|
||||
Format: Direct answer followed by one optional clarifying sentence.
|
||||
"""
|
||||
|
||||
prompt = ChatPromptTemplate.from_messages([
|
||||
("system", system_prompt),
|
||||
("human", "Context: {context}\n\nQuestion: {question}\n\nAnswer:")
|
||||
])
|
||||
|
||||
chain = prompt | llm
|
||||
response = chain.invoke({"context": state["context"], "question": state["user_input"]})
|
||||
state["response"] = response.content
|
||||
return state
|
||||
```
|
||||
|
||||
**Key Changes**:
|
||||
- temperature: 0.7 → 0.5
|
||||
- max_tokens: unlimited → 500
|
||||
- Clear conciseness instruction ("2-3 sentences")
|
||||
- Added response style guidelines
|
||||
|
||||
## Detailed Evaluation Results
|
||||
|
||||
### Improvement Status by Test Case
|
||||
|
||||
| Case ID | Category | Before | After | Improved |
|
||||
|---------|----------|--------|-------|----------|
|
||||
| TC001 | Product | ❌ Wrong | ✅ Correct | ✅ |
|
||||
| TC002 | Technical | ❌ Wrong | ✅ Correct | ✅ |
|
||||
| TC003 | Billing | ✅ Correct | ✅ Correct | - |
|
||||
| TC004 | General | ✅ Correct | ✅ Correct | - |
|
||||
| TC005 | Product | ❌ Wrong | ✅ Correct | ✅ |
|
||||
| ... | ... | ... | ... | ... |
|
||||
| TC020 | Technical | ✅ Correct | ✅ Correct | - |
|
||||
|
||||
**Improved Cases (wrong → correct)**: 4/20 (20%)
|
||||
**Maintained Cases (already correct)**: 15/20 (75%)
|
||||
**Degraded Cases (correct → wrong)**: 0/20 (0%)
|
||||
|
||||
### Latency Breakdown
|
||||
|
||||
| Node | Before | After | Change | % Change |
|
||||
|------|--------|-------|--------|----------|
|
||||
| analyze_intent | 0.5s | 0.4s | -0.1s | -20% |
|
||||
| retrieve_context | 0.2s | 0.2s | ±0s | 0% |
|
||||
| generate_response | 1.8s | 1.3s | -0.5s | -28% |
|
||||
| **Total** | **2.5s** | **1.9s** | **-0.6s** | **-24%** |
|
||||
|
||||
### Cost Breakdown
|
||||
|
||||
| Node | Before | After | Change | % Change |
|
||||
|------|--------|-------|--------|----------|
|
||||
| analyze_intent | $0.003 | $0.003 | ±$0 | 0% |
|
||||
| retrieve_context | $0.001 | $0.001 | ±$0 | 0% |
|
||||
| generate_response | $0.011 | $0.007 | -$0.004 | -36% |
|
||||
| **Total** | **$0.015** | **$0.011** | **-$0.004** | **-27%** |
|
||||
|
||||
## Future Recommendations
|
||||
|
||||
### Short-term (1-2 weeks)
|
||||
1. **Achieve cost target**: $0.011 → $0.010
|
||||
- Approach: Consider partial migration to Claude 3.5 Haiku
|
||||
- Estimated effect: -$0.002-0.003/req
|
||||
|
||||
2. **Further accuracy improvement**: 92.0% → 95.0%
|
||||
- Approach: Analyze error cases and add few-shot examples
|
||||
- Estimated effect: +3.0%
|
||||
|
||||
### Mid-term (1-2 months)
|
||||
1. **Model optimization**
|
||||
- Use Haiku for simple intent classification
|
||||
   - Use Sonnet only for complex response generation (see the sketch after this list)
|
||||
- Estimated effect: -30-40% cost, minimal latency impact
|
||||
|
||||
2. **Leverage prompt caching**
|
||||
- Cache system prompts and few-shot examples
|
||||
- Estimated effect: -50% cost (when cache hits)
|
||||
|
||||
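A minimal sketch of the model split proposed in mid-term item 1, assuming the same `ChatAnthropic` usage as the current nodes; the Haiku model ID and the exact split are recommendations, not measured configuration:

```python
from langchain_anthropic import ChatAnthropic

# Cheap, fast model for the structured intent-classification step
classifier_llm = ChatAnthropic(model="claude-3-5-haiku-20241022", temperature=0.3)

# Keep the stronger model only where answer quality matters most
generator_llm = ChatAnthropic(
    model="claude-3-5-sonnet-20241022",
    temperature=0.5,
    max_tokens=500
)
```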
### Long-term (3-6 months)
|
||||
1. **Consider fine-tuned models**
|
||||
- Model fine-tuning with proprietary data
|
||||
- No need for few-shot examples, more concise prompts
|
||||
- Estimated effect: -60% cost, +5% accuracy
|
||||
|
||||
## Conclusion
|
||||
|
||||
This project achieved the following through fine-tuning of the LangGraph application:
|
||||
|
||||
✅ **Successes**:
|
||||
1. Significant accuracy improvement (+22.7%) - exceeded target by 2.2%
|
||||
2. Notable latency improvement (-24.0%) - exceeded target by 5%
|
||||
3. Cost reduction (-26.7%) - 9.1% away from target
|
||||
|
||||
⚠️ **Challenges**:
|
||||
1. Cost target not met ($0.011 vs $0.010 target) - addressable through migration to lighter models
|
||||
|
||||
📈 **Business Impact**:
|
||||
- Improved user satisfaction (through accuracy improvement)
|
||||
- Reduced operational costs (through latency and cost reduction)
|
||||
- Improved scalability (through efficient resource usage)
|
||||
|
||||
🎯 **Next Steps**:
|
||||
1. Validate migration to lighter models for cost reduction
|
||||
2. Continuous monitoring and evaluation
|
||||
3. Expansion to new use cases
|
||||
|
||||
---
|
||||
|
||||
Created: 2024-11-24 15:00:00
|
||||
Creator: Claude Code (fine-tune skill)
|
||||
```
|
||||
|
||||
### Step 13: Commit Code and Update Documentation
|
||||
|
||||
**Git Commit Example**:
|
||||
```bash
|
||||
# Commit changes
|
||||
git add src/nodes/analyzer.py src/nodes/generator.py
|
||||
git commit -m "feat: optimize LangGraph prompts for accuracy and latency
|
||||
|
||||
Iteration 1-3 of fine-tuning process:
|
||||
- analyze_intent: added few-shot examples, JSON output, lower temperature
|
||||
- generate_response: added conciseness guidelines, max_tokens limit
|
||||
|
||||
Results:
|
||||
- Accuracy: 75.0% → 92.0% (+17.0%, goal 90% ✅)
|
||||
- Latency: 2.5s → 1.9s (-0.6s, goal 2.0s ✅)
|
||||
- Cost: $0.015 → $0.011 (-$0.004, goal $0.010 ⚠️)
|
||||
|
||||
Full report: evaluation_results/final_report.md"
|
||||
|
||||
# Commit evaluation results
|
||||
git add evaluation_results/
|
||||
git commit -m "docs: add fine-tuning evaluation results and final report"
|
||||
|
||||
# Add tag
|
||||
git tag -a fine-tune-v1.0 -m "Fine-tuning completed: 92% accuracy achieved"
|
||||
```
|
||||
|
||||
## Summary
|
||||
|
||||
Following this workflow enables:
|
||||
- ✅ Systematic fine-tuning process execution
|
||||
- ✅ Data-driven decision making
|
||||
- ✅ Continuous improvement and verification
|
||||
- ✅ Complete documentation and traceability
|
||||
170
skills/langgraph-master/01_core_concepts_edge.md
Normal file
@@ -0,0 +1,170 @@
|
||||
# Edge
|
||||
|
||||
Control flow that defines transitions between nodes.
|
||||
|
||||
## Overview
|
||||
|
||||
Edges determine "what to do next". Nodes perform processing, and edges dictate the next action.
|
||||
|
||||
## Types of Edges
|
||||
|
||||
### 1. Normal Edges (Fixed Transitions)
|
||||
|
||||
Always transition to a specific node:
|
||||
|
||||
```python
|
||||
from langgraph.graph import START, END
|
||||
|
||||
# From START to node_a
|
||||
builder.add_edge(START, "node_a")
|
||||
|
||||
# From node_a to node_b
|
||||
builder.add_edge("node_a", "node_b")
|
||||
|
||||
# From node_b to end
|
||||
builder.add_edge("node_b", END)
|
||||
```
|
||||
|
||||
### 2. Conditional Edges (Dynamic Transitions)
|
||||
|
||||
Determine the destination based on state:
|
||||
|
||||
```python
|
||||
from typing import Literal
|
||||
|
||||
def should_continue(state: State) -> Literal["continue", "end"]:
|
||||
if state["iteration"] < state["max_iterations"]:
|
||||
return "continue"
|
||||
return "end"
|
||||
|
||||
# Add conditional edge
|
||||
builder.add_conditional_edges(
|
||||
"agent",
|
||||
should_continue,
|
||||
{
|
||||
"continue": "tools", # Go to tools if continue
|
||||
"end": END # End if end
|
||||
}
|
||||
)
|
||||
```
|
||||
|
||||
### 3. Entry Points
|
||||
|
||||
Define the starting point of the graph:
|
||||
|
||||
```python
|
||||
# Simple entry
|
||||
builder.add_edge(START, "first_node")
|
||||
|
||||
# Conditional entry
|
||||
builder.add_conditional_edges(
|
||||
START,
|
||||
route_start,
|
||||
{
|
||||
"path_a": "node_a",
|
||||
"path_b": "node_b"
|
||||
}
|
||||
)
|
||||
```
|
||||
|
||||
## Parallel Execution
|
||||
|
||||
Nodes with multiple outgoing edges will have **all destination nodes execute in parallel** in the next step:
|
||||
|
||||
```python
|
||||
# From node_a to multiple nodes
|
||||
builder.add_edge("node_a", "node_b")
|
||||
builder.add_edge("node_a", "node_c")
|
||||
|
||||
# node_b and node_c execute in parallel
|
||||
```
|
||||
|
||||
To aggregate results from parallel execution, use a Reducer:
|
||||
|
||||
```python
|
||||
from operator import add
|
||||
|
||||
class State(TypedDict):
|
||||
results: Annotated[list, add] # Aggregate results from multiple nodes
|
||||
```
|
||||
|
||||
## Edge Control with Command
|
||||
|
||||
Specify the next destination from within a node:
|
||||
|
||||
```python
|
||||
from langgraph.types import Command
|
||||
|
||||
def smart_node(state: State) -> Command:
|
||||
result = analyze(state["data"])
|
||||
|
||||
if result["confidence"] > 0.8:
|
||||
return Command(
|
||||
update={"result": result},
|
||||
goto="finalize"
|
||||
)
|
||||
else:
|
||||
return Command(
|
||||
update={"result": result, "needs_review": True},
|
||||
goto="human_review"
|
||||
)
|
||||
```
|
||||
|
||||
## Conditional Branching Implementation Patterns
|
||||
|
||||
### Pattern 1: Tool Call Loop
|
||||
|
||||
```python
|
||||
def should_continue(state: State) -> Literal["continue", "end"]:
|
||||
messages = state["messages"]
|
||||
last_message = messages[-1]
|
||||
|
||||
# Continue if there are tool calls
|
||||
if last_message.tool_calls:
|
||||
return "continue"
|
||||
return "end"
|
||||
|
||||
builder.add_conditional_edges(
|
||||
"agent",
|
||||
should_continue,
|
||||
{
|
||||
"continue": "tools",
|
||||
"end": END
|
||||
}
|
||||
)
|
||||
```
|
||||
|
||||
### Pattern 2: Routing
|
||||
|
||||
```python
|
||||
def route_query(state: State) -> Literal["search", "calculate", "general"]:
|
||||
query = state["query"]
|
||||
|
||||
if "calculate" in query or "+" in query:
|
||||
return "calculate"
|
||||
elif "search" in query:
|
||||
return "search"
|
||||
return "general"
|
||||
|
||||
builder.add_conditional_edges(
|
||||
"router",
|
||||
route_query,
|
||||
{
|
||||
"search": "search_node",
|
||||
"calculate": "calculator_node",
|
||||
"general": "general_node"
|
||||
}
|
||||
)
|
||||
```
|
||||
|
||||
## Important Principles
|
||||
|
||||
1. **Explicit Control Flow**: Transitions should be transparent and traceable
|
||||
2. **Type Safety**: Explicitly specify destinations with Literal
|
||||
3. **Leverage Parallel Execution**: Execute independent tasks in parallel
|
||||
|
||||
## Related Pages
|
||||
|
||||
- [01_core_concepts_node.md](01_core_concepts_node.md) - Node implementation
|
||||
- [02_graph_architecture_routing.md](02_graph_architecture_routing.md) - Routing patterns
|
||||
- [05_advanced_features_map_reduce.md](05_advanced_features_map_reduce.md) - Parallel processing patterns
|
||||
132
skills/langgraph-master/01_core_concepts_node.md
Normal file
@@ -0,0 +1,132 @@
|
||||
# Node
|
||||
|
||||
Python functions that execute individual tasks.
|
||||
|
||||
## Overview
|
||||
|
||||
Nodes are "processing units" that read state, perform some processing, and return updates.
|
||||
|
||||
## Basic Implementation
|
||||
|
||||
```python
|
||||
def my_node(state: State) -> dict:
|
||||
# Get information from state
|
||||
messages = state["messages"]
|
||||
|
||||
# Execute processing
|
||||
result = process_messages(messages)
|
||||
|
||||
# Return updates (don't modify state directly)
|
||||
return {"result": result, "count": state["count"] + 1}
|
||||
```
|
||||
|
||||
## Types of Nodes
|
||||
|
||||
### 1. LLM Call Node
|
||||
|
||||
```python
|
||||
def llm_node(state: State):
|
||||
messages = state["messages"]
|
||||
response = llm.invoke(messages)
|
||||
|
||||
return {"messages": [response]}
|
||||
```
|
||||
|
||||
### 2. Tool Execution Node
|
||||
|
||||
```python
|
||||
from langgraph.prebuilt import ToolNode
|
||||
|
||||
tools = [search_tool, calculator_tool]
|
||||
tool_node = ToolNode(tools)
|
||||
```
|
||||
|
||||
### 3. Processing Node
|
||||
|
||||
```python
|
||||
def process_node(state: State):
|
||||
data = state["raw_data"]
|
||||
|
||||
# Data processing
|
||||
processed = clean_and_transform(data)
|
||||
|
||||
return {"processed_data": processed}
|
||||
```
|
||||
|
||||
## Node Signature
|
||||
|
||||
Nodes can accept the following parameters:
|
||||
|
||||
```python
|
||||
from langchain_core.runnables import RunnableConfig
from langgraph.types import Command
|
||||
|
||||
def advanced_node(
|
||||
state: State,
|
||||
config: RunnableConfig, # Optional
|
||||
) -> dict | Command:
|
||||
# Get configuration from config
|
||||
thread_id = config["configurable"]["thread_id"]
|
||||
|
||||
# Processing...
|
||||
|
||||
return {"result": result}
|
||||
```
|
||||
|
||||
## Control with Command API
|
||||
|
||||
Specify state updates and control flow simultaneously:
|
||||
|
||||
```python
|
||||
from langgraph.types import Command
|
||||
|
||||
def decision_node(state: State) -> Command:
|
||||
if state["should_continue"]:
|
||||
return Command(
|
||||
update={"status": "continuing"},
|
||||
goto="next_node"
|
||||
)
|
||||
else:
|
||||
return Command(
|
||||
update={"status": "done"},
|
||||
goto=END
|
||||
)
|
||||
```
|
||||
|
||||
## Important Principles
|
||||
|
||||
1. **Idempotency**: Return the same output for the same input (see the sketch after this list)
|
||||
2. **Return Updates**: Return update contents instead of directly modifying state
|
||||
3. **Single Responsibility**: Each node does one thing well
|
||||
|
||||
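A minimal sketch contrasting principle 1, assuming the `State` used in the examples above; the returned keys and the use of `datetime` as a non-deterministic source are illustrative:

```python
import datetime

# Non-idempotent: the output changes on every run, which breaks
# replays from checkpoints and makes tests flaky
def bad_node(state: State) -> dict:
    return {"stamp": datetime.datetime.now().isoformat()}

# Idempotent: the output is derived purely from the input state
def good_node(state: State) -> dict:
    return {"message_count": len(state["messages"])}
```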
## Adding Nodes
|
||||
|
||||
```python
|
||||
from langgraph.graph import StateGraph
|
||||
|
||||
builder = StateGraph(State)
|
||||
|
||||
# Add nodes
|
||||
builder.add_node("analyze", analyze_node)
|
||||
builder.add_node("decide", decide_node)
|
||||
builder.add_node("execute", execute_node)
|
||||
|
||||
# Add tool node
|
||||
builder.add_node("tools", tool_node)
|
||||
```
|
||||
|
||||
## Error Handling
|
||||
|
||||
```python
|
||||
def robust_node(state: State) -> dict:
|
||||
try:
|
||||
result = risky_operation(state["data"])
|
||||
return {"result": result, "error": None}
|
||||
except Exception as e:
|
||||
return {"result": None, "error": str(e)}
|
||||
```
|
||||
|
||||
## Related Pages
|
||||
|
||||
- [01_core_concepts_state.md](01_core_concepts_state.md) - How to define State
|
||||
- [01_core_concepts_edge.md](01_core_concepts_edge.md) - Connections between nodes
|
||||
- [04_tool_integration_overview.md](04_tool_integration_overview.md) - Tool node details
|
||||
57
skills/langgraph-master/01_core_concepts_overview.md
Normal file
@@ -0,0 +1,57 @@
|
||||
# 01. Core Concepts
|
||||
|
||||
Understanding the three core elements of LangGraph.
|
||||
|
||||
## Overview
|
||||
|
||||
LangGraph is a framework that models agent workflows as **graphs**. By decomposing complex workflows into **discrete steps (nodes)**, it achieves the following:
|
||||
|
||||
- **Improved Resilience**: Create checkpoints at node boundaries
|
||||
- **Enhanced Visibility**: Enable state inspection between each step
|
||||
- **Independent Testing**: Easy unit testing of individual nodes
|
||||
- **Error Handling**: Apply different strategies for each error type
|
||||
|
||||
## Three Core Elements
|
||||
|
||||
### 1. [State](01_core_concepts_state.md)
|
||||
- Memory shared across all nodes in the graph
|
||||
- Snapshot of the current execution state
|
||||
- Defined with TypedDict or Pydantic models
|
||||
|
||||
### 2. [Node](01_core_concepts_node.md)
|
||||
- Python functions that execute individual tasks
|
||||
- Receive the current state and return updates
|
||||
- Basic unit of processing
|
||||
|
||||
### 3. [Edge](01_core_concepts_edge.md)
|
||||
- Define transitions between nodes
|
||||
- Fixed transitions or conditional branching
|
||||
- Determine control flow
|
||||
|
||||
## Design Philosophy
|
||||
|
||||
The core concept of LangGraph is **decomposition into discrete steps**:
|
||||
|
||||
```python
|
||||
# Split agent into individual nodes
|
||||
graph = StateGraph(State)
|
||||
graph.add_node("analyze", analyze_node) # Analysis step
|
||||
graph.add_node("decide", decide_node) # Decision step
|
||||
graph.add_node("execute", execute_node) # Execution step
|
||||
```
|
||||
|
||||
This approach allows each step to operate independently, building a robust system as a whole.
|
||||
|
||||
## Important Principles
|
||||
|
||||
1. **Store Raw Data**: Store raw data in State, format prompts dynamically within nodes (see the sketch after this list)
|
||||
2. **Return Updates**: Nodes return update contents instead of directly modifying state
|
||||
3. **Transparent Control Flow**: Explicitly declare the next destination with Command objects
|
||||
|
||||
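A minimal sketch of principle 1, assuming an `llm` client as in the examples above; the field names are illustrative:

```python
from typing import TypedDict

class State(TypedDict):
    user_question: str         # raw input, stored as-is
    retrieved_docs: list[str]  # raw context, stored as-is
    answer: str

def answer_node(state: State) -> dict:
    # The prompt is formatted here, at the point of use, not stored in State
    context = "\n".join(state["retrieved_docs"])
    prompt = f"Context:\n{context}\n\nQuestion: {state['user_question']}"
    return {"answer": llm.invoke(prompt)}
```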
## Next Steps
|
||||
|
||||
For details on each element, refer to the following pages:
|
||||
|
||||
- [01_core_concepts_state.md](01_core_concepts_state.md) - State management details
|
||||
- [01_core_concepts_node.md](01_core_concepts_node.md) - How to implement nodes
|
||||
- [01_core_concepts_edge.md](01_core_concepts_edge.md) - Edges and control flow
|
||||
102
skills/langgraph-master/01_core_concepts_state.md
Normal file
@@ -0,0 +1,102 @@
|
||||
# State
|
||||
|
||||
Memory shared across all nodes in the graph.
|
||||
|
||||
## Overview
|
||||
|
||||
State is like a "notebook" that records everything the agent learns and decides. It is a **shared data structure** accessible to all nodes and edges in the graph.
|
||||
|
||||
## Definition Methods
|
||||
|
||||
### Using TypedDict
|
||||
|
||||
```python
|
||||
from typing import TypedDict
|
||||
|
||||
class State(TypedDict):
|
||||
messages: list[str]
|
||||
user_name: str
|
||||
count: int
|
||||
```
|
||||
|
||||
### Using Pydantic Model
|
||||
|
||||
```python
|
||||
from pydantic import BaseModel
|
||||
|
||||
class State(BaseModel):
|
||||
messages: list[str]
|
||||
user_name: str
|
||||
count: int = 0 # Default value
|
||||
```
|
||||
|
||||
## Reducer (Controlling Update Methods)
|
||||
|
||||
A function that specifies how each key is updated. If not specified, it defaults to **value overwrite**.
|
||||
|
||||
### Addition (Adding to List)
|
||||
|
||||
```python
|
||||
from typing import Annotated
|
||||
from operator import add
|
||||
|
||||
class State(TypedDict):
|
||||
messages: Annotated[list[str], add] # Add to existing list
|
||||
count: int # Overwrite
|
||||
```
|
||||
|
||||
### Custom Reducer
|
||||
|
||||
```python
|
||||
def concat_strings(existing: str, new: str) -> str:
|
||||
return existing + " " + new
|
||||
|
||||
class State(TypedDict):
|
||||
text: Annotated[str, concat_strings]
|
||||
```
|
||||
|
||||
## MessagesState (LLM Preset)
|
||||
|
||||
For LLM conversations, LangGraph's `MessagesState` is convenient:
|
||||
|
||||
```python
|
||||
from langgraph.graph import MessagesState
|
||||
|
||||
# This is equivalent to:
|
||||
class MessagesState(TypedDict):
|
||||
messages: Annotated[list[AnyMessage], add_messages]
|
||||
```
|
||||
|
||||
The `add_messages` reducer (sketched after this list):
|
||||
- Adds new messages
|
||||
- Updates existing messages (ID-based)
|
||||
- Supports OpenAI format shorthand
|
||||
|
||||
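A minimal sketch of that upsert behaviour; the message `id` values are illustrative:

```python
from langgraph.graph import StateGraph, MessagesState, START, END
from langchain_core.messages import AIMessage, HumanMessage

def draft(state: MessagesState):
    return {"messages": [AIMessage(content="draft answer", id="m1")]}

def finalize(state: MessagesState):
    # Same id "m1": add_messages replaces the draft instead of appending
    return {"messages": [AIMessage(content="final answer", id="m1")]}

builder = StateGraph(MessagesState)
builder.add_node("draft", draft)
builder.add_node("finalize", finalize)
builder.add_edge(START, "draft")
builder.add_edge("draft", "finalize")
builder.add_edge("finalize", END)

graph = builder.compile()
result = graph.invoke({"messages": [HumanMessage(content="hi")]})
# result["messages"]: the human message plus one AI message ("final answer")
```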
## Important Principles
|
||||
|
||||
1. **Store Raw Data**: Format prompts within nodes
|
||||
2. **Clear Schema**: Define types with TypedDict or Pydantic
|
||||
3. **Control with Reducer**: Explicitly specify update methods
|
||||
|
||||
## Example
|
||||
|
||||
```python
|
||||
from typing import Annotated, TypedDict
|
||||
from operator import add
|
||||
|
||||
class AgentState(TypedDict):
|
||||
# Messages are added to the list
|
||||
messages: Annotated[list[str], add]
|
||||
|
||||
# User information is overwritten
|
||||
user_id: str
|
||||
user_name: str
|
||||
|
||||
# Counter is also overwritten
|
||||
iteration_count: int
|
||||
```
|
||||
|
||||
## Related Pages
|
||||
|
||||
- [01_core_concepts_node.md](01_core_concepts_node.md) - How to use State in nodes
|
||||
- [03_memory_management_overview.md](03_memory_management_overview.md) - State persistence
|
||||
338
skills/langgraph-master/02_graph_architecture_agent.md
Normal file
@@ -0,0 +1,338 @@
|
||||
# Agent (Autonomous Tool Usage)
|
||||
|
||||
A pattern where the LLM dynamically determines tool selection to handle unpredictable problem-solving.
|
||||
|
||||
## Overview
|
||||
|
||||
The Agent pattern follows **ReAct** (Reasoning + Acting), where the LLM dynamically selects and executes tools to solve problems.
|
||||
|
||||
## ReAct Pattern
|
||||
|
||||
**ReAct** = Reasoning + Acting
|
||||
|
||||
1. **Reasoning**: Think "What should I do next?"
|
||||
2. **Acting**: Take action using tools
|
||||
3. **Observing**: Observe the results
|
||||
4. **Repeat steps 1-3** until reaching a final answer
|
||||
|
||||
## Implementation Example: Basic Agent
|
||||
|
||||
```python
|
||||
from langgraph.graph import StateGraph, START, END, MessagesState
|
||||
from langgraph.prebuilt import ToolNode
|
||||
from typing import Literal
from langchain_core.tools import tool
|
||||
|
||||
# Tool definitions
|
||||
@tool
|
||||
def search(query: str) -> str:
|
||||
"""Execute web search"""
|
||||
return perform_search(query)
|
||||
|
||||
@tool
|
||||
def calculator(expression: str) -> float:
|
||||
"""Execute calculation"""
|
||||
    return eval(expression)  # demo only: never eval untrusted input in production
|
||||
|
||||
tools = [search, calculator]

# Bind the tools to the chat model (assumes `llm` is defined, as in the other examples)
llm_with_tools = llm.bind_tools(tools)
|
||||
|
||||
# Agent node
|
||||
def agent_node(state: MessagesState):
|
||||
"""LLM determines tool usage"""
|
||||
messages = state["messages"]
|
||||
|
||||
# Invoke LLM with tools
|
||||
response = llm_with_tools.invoke(messages)
|
||||
|
||||
return {"messages": [response]}
|
||||
|
||||
# Continue decision
|
||||
def should_continue(state: MessagesState) -> Literal["tools", "end"]:
|
||||
"""Check if there are tool calls"""
|
||||
last_message = state["messages"][-1]
|
||||
|
||||
# Continue if there are tool calls
|
||||
if last_message.tool_calls:
|
||||
return "tools"
|
||||
|
||||
# End if no tool calls (final answer)
|
||||
return "end"
|
||||
|
||||
# Build graph
|
||||
builder = StateGraph(MessagesState)
|
||||
|
||||
builder.add_node("agent", agent_node)
|
||||
builder.add_node("tools", ToolNode(tools))
|
||||
|
||||
builder.add_edge(START, "agent")
|
||||
|
||||
# ReAct loop
|
||||
builder.add_conditional_edges(
|
||||
"agent",
|
||||
should_continue,
|
||||
{
|
||||
"tools": "tools",
|
||||
"end": END
|
||||
}
|
||||
)
|
||||
|
||||
# Return to agent after tool execution
|
||||
builder.add_edge("tools", "agent")
|
||||
|
||||
graph = builder.compile()
|
||||
```
|
||||
|
||||
## Tool Definitions
|
||||
|
||||
### Basic Tools
|
||||
|
||||
```python
|
||||
from langchain_core.tools import tool
|
||||
|
||||
@tool
|
||||
def get_weather(location: str) -> str:
|
||||
"""Get weather for the specified location.
|
||||
|
||||
Args:
|
||||
location: City name (e.g., "Tokyo", "New York")
|
||||
"""
|
||||
return fetch_weather_data(location)
|
||||
|
||||
@tool
|
||||
def send_email(to: str, subject: str, body: str) -> str:
|
||||
"""Send an email.
|
||||
|
||||
Args:
|
||||
to: Recipient email address
|
||||
subject: Email subject
|
||||
body: Email body
|
||||
"""
|
||||
return send_email_api(to, subject, body)
|
||||
```
|
||||
|
||||
### Structured Output Tools
|
||||
|
||||
```python
|
||||
from pydantic import BaseModel, Field
|
||||
|
||||
class WeatherResponse(BaseModel):
|
||||
location: str
|
||||
temperature: float
|
||||
condition: str
|
||||
humidity: int
|
||||
|
||||
@tool(response_format="content_and_artifact")
|
||||
def get_detailed_weather(location: str) -> tuple[str, WeatherResponse]:
|
||||
"""Get detailed weather information"""
|
||||
data = fetch_weather_data(location)
|
||||
|
||||
weather = WeatherResponse(
|
||||
location=location,
|
||||
temperature=data["temp"],
|
||||
condition=data["condition"],
|
||||
humidity=data["humidity"]
|
||||
)
|
||||
|
||||
message = f"Weather in {location}: {weather.condition}, {weather.temperature}°C"
|
||||
|
||||
return message, weather
|
||||
```
|
||||
|
||||
## Advanced Patterns
|
||||
|
||||
### Pattern 1: Multi-Agent Collaboration
|
||||
|
||||
```python
|
||||
# Specialist agents
|
||||
def research_agent(state: State):
|
||||
"""Research specialist agent"""
|
||||
response = research_llm_with_tools.invoke(state["messages"])
|
||||
return {"messages": [response]}
|
||||
|
||||
def coding_agent(state: State):
|
||||
"""Coding specialist agent"""
|
||||
response = coding_llm_with_tools.invoke(state["messages"])
|
||||
return {"messages": [response]}
|
||||
|
||||
# Router
|
||||
def route_to_specialist(state: State) -> Literal["research", "coding"]:
|
||||
"""Select specialist based on task"""
|
||||
last_message = state["messages"][-1]
|
||||
|
||||
if "research" in last_message.content or "search" in last_message.content:
|
||||
return "research"
|
||||
elif "code" in last_message.content or "implement" in last_message.content:
|
||||
return "coding"
|
||||
|
||||
return "research" # Default
|
||||
```
|
||||
|
||||
### Pattern 2: Agent with Memory
|
||||
|
||||
```python
|
||||
from langgraph.checkpoint.memory import MemorySaver
|
||||
|
||||
class AgentState(TypedDict):
|
||||
messages: Annotated[list, add_messages]
|
||||
context: dict # Long-term memory
|
||||
|
||||
def agent_with_memory(state: AgentState):
|
||||
"""Agent utilizing context"""
|
||||
messages = state["messages"]
|
||||
context = state.get("context", {})
|
||||
|
||||
# Add context to prompt
|
||||
system_message = f"Context: {context}"
|
||||
|
||||
response = llm_with_tools.invoke([
|
||||
{"role": "system", "content": system_message},
|
||||
*messages
|
||||
])
|
||||
|
||||
return {"messages": [response]}
|
||||
|
||||
# Compile with checkpointer
|
||||
checkpointer = MemorySaver()
|
||||
graph = builder.compile(checkpointer=checkpointer)
|
||||
```
|
||||
|
||||
### Pattern 3: Human-in-the-Loop Agent
|
||||
|
||||
```python
|
||||
from langgraph.types import interrupt
|
||||
|
||||
def careful_agent(state: State):
|
||||
"""Confirm with human before important actions"""
|
||||
response = llm_with_tools.invoke(state["messages"])
|
||||
|
||||
# Request confirmation for important tool calls
|
||||
if response.tool_calls:
|
||||
for tool_call in response.tool_calls:
|
||||
if tool_call["name"] in ["send_email", "delete_data"]:
|
||||
# Wait for human approval
|
||||
approved = interrupt({
|
||||
"action": tool_call["name"],
|
||||
"args": tool_call["args"],
|
||||
"message": "Approve this action?"
|
||||
})
|
||||
|
||||
if not approved:
|
||||
return {
|
||||
"messages": [
|
||||
{"role": "assistant", "content": "Action cancelled by user"}
|
||||
]
|
||||
}
|
||||
|
||||
return {"messages": [response]}
|
||||
```
|
||||
|
||||
### Pattern 4: Error Handling and Retry
|
||||
|
||||
```python
|
||||
class RobustAgentState(TypedDict):
|
||||
messages: Annotated[list, add_messages]
|
||||
retry_count: int
|
||||
errors: list[str]
|
||||
|
||||
def robust_tool_node(state: RobustAgentState):
|
||||
"""Tool execution with error handling"""
|
||||
last_message = state["messages"][-1]
|
||||
tool_results = []
|
||||
|
||||
for tool_call in last_message.tool_calls:
|
||||
try:
|
||||
result = execute_tool(tool_call)
|
||||
tool_results.append(result)
|
||||
|
||||
except Exception as e:
|
||||
error_msg = f"Tool {tool_call['name']} failed: {str(e)}"
|
||||
|
||||
# Check if retry is possible
|
||||
if state.get("retry_count", 0) < 3:
|
||||
tool_results.append({
|
||||
"tool_call_id": tool_call["id"],
|
||||
"error": error_msg,
|
||||
"retry": True
|
||||
})
|
||||
else:
|
||||
tool_results.append({
|
||||
"tool_call_id": tool_call["id"],
|
||||
"error": "Max retries exceeded",
|
||||
"retry": False
|
||||
})
|
||||
|
||||
return {
|
||||
"messages": tool_results,
|
||||
"retry_count": state.get("retry_count", 0) + 1
|
||||
}
|
||||
```
|
||||
|
||||
## Advanced Tool Features
|
||||
|
||||
### Dynamic Tool Generation
|
||||
|
||||
```python
|
||||
def create_tool_for_api(api_spec: dict):
|
||||
"""Dynamically generate tool from API specification"""
|
||||
|
||||
    def dynamic_api_tool(**kwargs) -> str:
        return call_api(api_spec['endpoint'], kwargs)

    # An f-string is not a valid docstring, so attach the description explicitly
    dynamic_api_tool.__doc__ = (
        f"{api_spec['description']}\n\nArgs: {api_spec['parameters']}"
    )

    return tool(dynamic_api_tool)
|
||||
```
|
||||
|
||||
### Conditional Tool Usage
|
||||
|
||||
```python
|
||||
def conditional_agent(state: State):
|
||||
"""Change toolset based on situation"""
|
||||
context = state.get("context", {})
|
||||
|
||||
# Basic tools only for beginners
|
||||
if context.get("user_level") == "beginner":
|
||||
tools = [basic_search, simple_calculator]
|
||||
# Advanced tools for advanced users
|
||||
else:
|
||||
tools = [advanced_search, scientific_calculator, code_executor]
|
||||
|
||||
llm_with_selected_tools = llm.bind_tools(tools)
|
||||
response = llm_with_selected_tools.invoke(state["messages"])
|
||||
|
||||
return {"messages": [response]}
|
||||
```
|
||||
|
||||
## Benefits
|
||||
|
||||
✅ **Flexibility**: Dynamically responds to unpredictable problems
|
||||
✅ **Autonomy**: LLM selects optimal tools and strategies
|
||||
✅ **Extensibility**: Extend functionality by simply adding tools
|
||||
✅ **Adaptability**: Solves complex multi-step tasks
|
||||
|
||||
## Considerations
|
||||
|
||||
⚠️ **Unpredictability**: May behave differently with same input
|
||||
⚠️ **Cost**: Multiple LLM calls occur
|
||||
⚠️ **Infinite Loops**: Proper termination conditions required
|
||||
⚠️ **Tool Misuse**: LLM may use tools incorrectly
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Clear Tool Descriptions**: Write detailed tool docstrings
|
||||
2. **Maximum Iterations**: Set upper limit for loops (see the sketch after this list)
|
||||
3. **Error Handling**: Handle tool execution errors appropriately
|
||||
4. **Logging**: Make agent behavior traceable
|
||||
|
||||
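A minimal sketch of best practice 2, assuming the state also carries an `iteration` counter that the agent node increments (not part of the example above):

```python
from typing import Literal

MAX_ITERATIONS = 10  # illustrative upper limit

def should_continue(state: dict) -> Literal["tools", "end"]:
    # Stop the ReAct loop once the cap is reached, even if tool calls remain
    if state.get("iteration", 0) >= MAX_ITERATIONS:
        return "end"
    if state["messages"][-1].tool_calls:
        return "tools"
    return "end"
```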
## Summary
|
||||
|
||||
The Agent pattern is optimal for **dynamic and uncertain problem-solving**. It autonomously solves problems using tools through the ReAct loop.
|
||||
|
||||
## Related Pages
|
||||
|
||||
- [02_graph_architecture_workflow_vs_agent.md](02_graph_architecture_workflow_vs_agent.md) - Differences between Workflow and Agent
|
||||
- [04_tool_integration_overview.md](04_tool_integration_overview.md) - Tool details
|
||||
- [05_advanced_features_human_in_the_loop.md](05_advanced_features_human_in_the_loop.md) - Human intervention
|
||||
@@ -0,0 +1,335 @@
|
||||
# Evaluator-Optimizer (Evaluation-Improvement Loop)
|
||||
|
||||
A pattern that repeats generation and evaluation, continuing iterative improvement until acceptable criteria are met.
|
||||
|
||||
## Overview
|
||||
|
||||
Evaluator-Optimizer is a pattern that repeats the **generate → evaluate → improve** loop, continuing until quality standards are met.
|
||||
|
||||
## Use Cases
|
||||
|
||||
- Code generation and quality verification
|
||||
- Translation accuracy improvement
|
||||
- Gradual content improvement
|
||||
- Iterative solution for optimization problems
|
||||
|
||||
## Implementation Example: Translation Quality Improvement
|
||||
|
||||
```python
|
||||
from typing import TypedDict
|
||||
|
||||
class State(TypedDict):
|
||||
original_text: str
|
||||
translated_text: str
|
||||
quality_score: float
|
||||
iteration: int
|
||||
max_iterations: int
|
||||
feedback: str
|
||||
|
||||
def generator_node(state: State):
|
||||
"""Generate or improve translation"""
|
||||
if state.get("translated_text"):
|
||||
# Improve existing translation
|
||||
prompt = f"""
|
||||
Original: {state['original_text']}
|
||||
Current translation: {state['translated_text']}
|
||||
Feedback: {state['feedback']}
|
||||
|
||||
Improve the translation based on the feedback.
|
||||
"""
|
||||
else:
|
||||
# Initial translation
|
||||
prompt = f"Translate to Japanese: {state['original_text']}"
|
||||
|
||||
translated = llm.invoke(prompt)
|
||||
|
||||
return {
|
||||
"translated_text": translated,
|
||||
"iteration": state.get("iteration", 0) + 1
|
||||
}
|
||||
|
||||
def evaluator_node(state: State):
|
||||
"""Evaluate translation quality"""
|
||||
evaluation_prompt = f"""
|
||||
Original: {state['original_text']}
|
||||
Translation: {state['translated_text']}
|
||||
|
||||
Rate the translation quality (0-1) and provide specific feedback.
|
||||
Format: SCORE: 0.X\nFEEDBACK: ...
|
||||
"""
|
||||
|
||||
result = llm.invoke(evaluation_prompt)
|
||||
|
||||
# Extract score and feedback
|
||||
score = extract_score(result)
|
||||
feedback = extract_feedback(result)
|
||||
|
||||
return {
|
||||
"quality_score": score,
|
||||
"feedback": feedback
|
||||
}
|
||||
|
||||
def should_continue(state: State) -> Literal["improve", "done"]:
|
||||
"""Continuation decision"""
|
||||
# Check if quality standard is met
|
||||
if state["quality_score"] >= 0.9:
|
||||
return "done"
|
||||
|
||||
# Check if maximum iterations reached
|
||||
if state["iteration"] >= state["max_iterations"]:
|
||||
return "done"
|
||||
|
||||
return "improve"
|
||||
|
||||
# Build graph
|
||||
builder = StateGraph(State)
|
||||
|
||||
builder.add_node("generator", generator_node)
|
||||
builder.add_node("evaluator", evaluator_node)
|
||||
|
||||
builder.add_edge(START, "generator")
|
||||
builder.add_edge("generator", "evaluator")
|
||||
|
||||
builder.add_conditional_edges(
|
||||
"evaluator",
|
||||
should_continue,
|
||||
{
|
||||
"improve": "generator", # Loop
|
||||
"done": END
|
||||
}
|
||||
)
|
||||
|
||||
graph = builder.compile()
|
||||
```
|
||||
|
||||
## Advanced Patterns
|
||||
|
||||
### Pattern 1: Multiple Evaluation Criteria
|
||||
|
||||
```python
|
||||
class MultiEvalState(TypedDict):
|
||||
content: str
|
||||
scores: dict[str, float] # Multiple evaluation scores
|
||||
min_scores: dict[str, float] # Minimum value for each criterion
|
||||
|
||||
def multi_evaluator(state: State):
|
||||
"""Evaluate from multiple perspectives"""
|
||||
content = state["content"]
|
||||
|
||||
# Evaluate each perspective
|
||||
scores = {
|
||||
"accuracy": evaluate_accuracy(content),
|
||||
"readability": evaluate_readability(content),
|
||||
"completeness": evaluate_completeness(content)
|
||||
}
|
||||
|
||||
return {"scores": scores}
|
||||
|
||||
def multi_should_continue(state: MultiEvalState):
|
||||
"""Check if all criteria are met"""
|
||||
for criterion, min_score in state["min_scores"].items():
|
||||
if state["scores"][criterion] < min_score:
|
||||
return "improve"
|
||||
|
||||
return "done"
|
||||
```
|
||||
|
||||
### Pattern 2: Progressive Criteria Increase
|
||||
|
||||
```python
|
||||
def adaptive_evaluator(state: State):
|
||||
"""Adjust criteria based on iteration"""
|
||||
iteration = state["iteration"]
|
||||
|
||||
# Start with lenient criteria, gradually stricter
|
||||
threshold = 0.7 + (iteration * 0.05)
|
||||
threshold = min(threshold, 0.95) # Maximum 0.95
|
||||
|
||||
score = evaluate(state["content"])
|
||||
|
||||
return {
|
||||
"quality_score": score,
|
||||
"threshold": threshold
|
||||
}
|
||||
|
||||
def adaptive_should_continue(state: State):
|
||||
if state["quality_score"] >= state["threshold"]:
|
||||
return "done"
|
||||
|
||||
if state["iteration"] >= state["max_iterations"]:
|
||||
return "done"
|
||||
|
||||
return "improve"
|
||||
```
|
||||
|
||||
### Pattern 3: Multiple Improvement Strategies
|
||||
|
||||
```python
|
||||
from typing import Literal
|
||||
|
||||
def strategy_router(state: State) -> Literal["minor_fix", "major_rewrite"]:
|
||||
"""Select improvement strategy based on score"""
|
||||
score = state["quality_score"]
|
||||
|
||||
if score >= 0.7:
|
||||
# Minor adjustments sufficient
|
||||
return "minor_fix"
|
||||
else:
|
||||
# Major rewrite needed
|
||||
return "major_rewrite"
|
||||
|
||||
def minor_fix_node(state: State):
|
||||
"""Small improvements"""
|
||||
prompt = f"Make minor improvements: {state['content']}\n{state['feedback']}"
|
||||
return {"content": llm.invoke(prompt)}
|
||||
|
||||
def major_rewrite_node(state: State):
|
||||
"""Major rewrite"""
|
||||
prompt = f"Completely rewrite: {state['content']}\n{state['feedback']}"
|
||||
return {"content": llm.invoke(prompt)}
|
||||
|
||||
builder.add_conditional_edges(
|
||||
"evaluator",
|
||||
strategy_router,
|
||||
{
|
||||
"minor_fix": "minor_fix",
|
||||
"major_rewrite": "major_rewrite"
|
||||
}
|
||||
)
|
||||
```
|
||||
|
||||
### Pattern 4: Early Termination and Timeout
|
||||
|
||||
```python
|
||||
import time
|
||||
|
||||
class TimedState(TypedDict):
|
||||
content: str
|
||||
quality_score: float
|
||||
iteration: int
|
||||
start_time: float
|
||||
max_duration: float # seconds
|
||||
|
||||
def timed_should_continue(state: TimedState):
|
||||
"""Check both quality criteria and timeout"""
|
||||
# Quality standard met
|
||||
if state["quality_score"] >= 0.9:
|
||||
return "done"
|
||||
|
||||
# Timeout
|
||||
elapsed = time.time() - state["start_time"]
|
||||
if elapsed >= state["max_duration"]:
|
||||
return "timeout"
|
||||
|
||||
# Maximum iterations
|
||||
if state["iteration"] >= 10:
|
||||
return "max_iterations"
|
||||
|
||||
return "improve"
|
||||
|
||||
builder.add_conditional_edges(
|
||||
"evaluator",
|
||||
timed_should_continue,
|
||||
{
|
||||
"improve": "generator",
|
||||
"done": END,
|
||||
"timeout": "timeout_handler",
|
||||
"max_iterations": "max_iter_handler"
|
||||
}
|
||||
)
|
||||
```
|
||||
|
||||
## Evaluator Implementation Patterns
|
||||
|
||||
### Pattern 1: Rule-Based Evaluation
|
||||
|
||||
```python
|
||||
def rule_based_evaluator(state: State):
|
||||
"""Rule-based evaluation"""
|
||||
content = state["content"]
|
||||
score = 0.0
|
||||
feedback = []
|
||||
|
||||
# Length check
|
||||
if 100 <= len(content) <= 500:
|
||||
score += 0.3
|
||||
else:
|
||||
feedback.append("Length should be 100-500 characters")
|
||||
|
||||
# Keyword check
|
||||
required_keywords = state["required_keywords"]
|
||||
if all(kw in content for kw in required_keywords):
|
||||
score += 0.3
|
||||
else:
|
||||
missing = [kw for kw in required_keywords if kw not in content]
|
||||
feedback.append(f"Missing keywords: {missing}")
|
||||
|
||||
# Structure check
|
||||
if has_proper_structure(content):
|
||||
score += 0.4
|
||||
else:
|
||||
feedback.append("Improve structure")
|
||||
|
||||
return {
|
||||
"quality_score": score,
|
||||
"feedback": "\n".join(feedback)
|
||||
}
|
||||
```
|
||||
|
||||
### Pattern 2: LLM-Based Evaluation
|
||||
|
||||
```python
|
||||
def llm_evaluator(state: State):
|
||||
"""LLM evaluation"""
|
||||
evaluation_prompt = f"""
|
||||
Evaluate this content on a scale of 0-1:
|
||||
{state['content']}
|
||||
|
||||
Criteria:
|
||||
- Clarity
|
||||
- Completeness
|
||||
- Accuracy
|
||||
|
||||
Provide:
|
||||
1. Overall score (0-1)
|
||||
2. Specific feedback for improvement
|
||||
"""
|
||||
|
||||
result = llm.invoke(evaluation_prompt)
|
||||
|
||||
return {
|
||||
"quality_score": parse_score(result),
|
||||
"feedback": parse_feedback(result)
|
||||
}
|
||||
```
|
||||
|
||||
## Benefits
|
||||
|
||||
✅ **Quality Assurance**: Continue improvement until standards are met
|
||||
✅ **Automatic Optimization**: Quality improvement without manual intervention
|
||||
✅ **Feedback Loop**: Use evaluation results for next improvement
|
||||
✅ **Adaptive**: Iteration count varies based on problem difficulty
|
||||
|
||||
## Considerations
|
||||
|
||||
⚠️ **Infinite Loops**: Set termination conditions appropriately
|
||||
⚠️ **Cost**: Multiple LLM calls occur
|
||||
⚠️ **No Convergence Guarantee**: May not always meet standards
|
||||
⚠️ **Local Optima**: Improvement may get stuck
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Clear Termination Conditions**: Set maximum iterations and timeout
|
||||
2. **Progressive Feedback**: Provide specific improvement points
|
||||
3. **Progress Tracking**: Record scores for each iteration
|
||||
4. **Fallback**: Handle cases where standards cannot be met (see the sketch after this list)
|
||||
|
||||
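A minimal sketch of best practice 4, reusing the translation example above; the flag returned here is an illustrative addition to the state:

```python
def fallback_node(state: State):
    """Handle runs that exit the loop without meeting the quality bar"""
    return {
        "needs_human_review": True,  # illustrative flag for downstream handling
        "feedback": (
            f"Stopped after {state['iteration']} iterations "
            f"with score {state['quality_score']:.2f}"
        ),
    }

# Wire it in by adding a third branch (e.g. "give_up") to should_continue
builder.add_node("fallback", fallback_node)
```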
## Summary
|
||||
|
||||
Evaluator-Optimizer is optimal when **iterative improvement is needed until quality standards are met**. Clear evaluation criteria and termination conditions are key to success.
|
||||
|
||||
## Related Pages
|
||||
|
||||
- [02_graph_architecture_prompt_chaining.md](02_graph_architecture_prompt_chaining.md) - Basic sequential processing
|
||||
- [02_graph_architecture_agent.md](02_graph_architecture_agent.md) - Combining with Agent
|
||||
- [05_advanced_features_human_in_the_loop.md](05_advanced_features_human_in_the_loop.md) - Human evaluation
|
||||
@@ -0,0 +1,262 @@
|
||||
# Orchestrator-Worker (Master-Worker)
|
||||
|
||||
A pattern where an orchestrator decomposes tasks and delegates them to multiple workers.
|
||||
|
||||
## Overview
|
||||
|
||||
Orchestrator-Worker is a pattern where a **master node** decomposes tasks into multiple subtasks and delegates them in parallel to **worker nodes**. Also known as the Map-Reduce pattern.
|
||||
|
||||
## Use Cases
|
||||
|
||||
- Parallel processing of multiple documents
|
||||
- Dividing large tasks into smaller subtasks
|
||||
- Distributed processing of datasets
|
||||
- Parallel API calls
|
||||
|
||||
## Implementation Example: Summarizing Multiple Documents
|
||||
|
||||
```python
|
||||
from langgraph.types import Send
|
||||
from typing import TypedDict, Annotated
|
||||
from operator import add
|
||||
|
||||
class State(TypedDict):
|
||||
documents: list[str]
|
||||
summaries: Annotated[list[str], add]
|
||||
final_summary: str
|
||||
|
||||
class WorkerState(TypedDict):
|
||||
document: str
|
||||
summary: str
|
||||
|
||||
def orchestrator_node(state: State):
|
||||
"""Decompose task and delegate to workers"""
|
||||
# Send each document to a worker
|
||||
return [
|
||||
Send("worker", {"document": doc})
|
||||
for doc in state["documents"]
|
||||
]
|
||||
|
||||
def worker_node(state: WorkerState):
|
||||
"""Summarize individual document"""
|
||||
summary = llm.invoke(f"Summarize: {state['document']}")
|
||||
return {"summaries": [summary]}
|
||||
|
||||
def reducer_node(state: State):
|
||||
"""Integrate all summaries"""
|
||||
all_summaries = "\n".join(state["summaries"])
|
||||
final = llm.invoke(f"Create final summary from:\n{all_summaries}")
|
||||
return {"final_summary": final}
|
||||
|
||||
# Build graph
builder = StateGraph(State)

builder.add_node("worker", worker_node)
builder.add_node("reducer", reducer_node)

# Orchestrator to workers (dynamic fan-out): Send objects must come from a
# conditional-edge function, so the orchestrator is wired in as one here
builder.add_conditional_edges(START, orchestrator_node, ["worker"])

# Workers to aggregation node
builder.add_edge("worker", "reducer")
builder.add_edge("reducer", END)

graph = builder.compile()
|
||||
```
|
||||
|
||||
## Using the Send API
|
||||
|
||||
Generate **worker instances dynamically** with `Send` objects returned from a conditional-edge (fan-out) function:
|
||||
|
||||
```python
|
||||
def orchestrator(state: State):
|
||||
# Generate worker instance for each item
|
||||
return [
|
||||
Send("worker", {"item": item, "index": i})
|
||||
for i, item in enumerate(state["items"])
|
||||
]
|
||||
```
|
||||
|
||||
## Advanced Patterns
|
||||
|
||||
### Pattern 1: Hierarchical Processing
|
||||
|
||||
```python
|
||||
def master_orchestrator(state: State):
|
||||
"""Master delegates to multiple sub-orchestrators"""
|
||||
return [
|
||||
Send("sub_orchestrator", {"category": cat, "items": items})
|
||||
for cat, items in group_by_category(state["all_items"])
|
||||
]
|
||||
|
||||
def sub_orchestrator(state: SubState):
|
||||
"""Sub-orchestrator delegates to individual workers"""
|
||||
return [
|
||||
Send("worker", {"item": item})
|
||||
for item in state["items"]
|
||||
]
|
||||
```
|
||||
|
||||
### Pattern 2: Conditional Worker Selection
|
||||
|
||||
```python
|
||||
def smart_orchestrator(state: State):
|
||||
"""Select different workers based on task characteristics"""
|
||||
tasks = []
|
||||
|
||||
for item in state["items"]:
|
||||
if is_complex(item):
|
||||
tasks.append(Send("advanced_worker", {"item": item}))
|
||||
else:
|
||||
tasks.append(Send("simple_worker", {"item": item}))
|
||||
|
||||
return tasks
|
||||
```
|
||||
|
||||
### Pattern 3: Batch Processing
|
||||
|
||||
```python
|
||||
def batch_orchestrator(state: State):
|
||||
"""Divide items into batches"""
|
||||
batch_size = 10
|
||||
batches = [
|
||||
state["items"][i:i+batch_size]
|
||||
for i in range(0, len(state["items"]), batch_size)
|
||||
]
|
||||
|
||||
return [
|
||||
Send("batch_worker", {"batch": batch, "batch_id": i})
|
||||
for i, batch in enumerate(batches)
|
||||
]
|
||||
|
||||
def batch_worker(state: BatchState):
|
||||
"""Process batch"""
|
||||
results = [process(item) for item in state["batch"]]
|
||||
return {"results": results}
|
||||
```
|
||||
|
||||
### Pattern 4: Error Handling and Retry
|
||||
|
||||
```python
|
||||
class WorkerState(TypedDict):
|
||||
item: str
|
||||
retry_count: int
|
||||
result: str
|
||||
error: str | None
|
||||
|
||||
def robust_worker(state: WorkerState):
|
||||
"""Worker with error handling"""
|
||||
try:
|
||||
result = process_item(state["item"])
|
||||
return {"result": result, "error": None}
|
||||
except Exception as e:
|
||||
if state.get("retry_count", 0) < 3:
|
||||
            # Retry: a node cannot return a Send directly, so wrap it in a Command
            # (requires: from langgraph.types import Command, Send)
            return Command(goto=Send("worker", {
                "item": state["item"],
                "retry_count": state.get("retry_count", 0) + 1
            }))
|
||||
else:
|
||||
# Maximum retries reached
|
||||
return {"error": str(e)}
|
||||
```
|
||||
|
||||
## Dynamic Parallelism Control
|
||||
|
||||
```python
|
||||
import os
|
||||
|
||||
def adaptive_orchestrator(state: State):
|
||||
"""Adjust parallelism based on system resources"""
|
||||
max_workers = int(os.getenv("MAX_WORKERS", "5"))
|
||||
|
||||
# Divide items into chunks
|
||||
items = state["items"]
|
||||
chunk_size = max(1, len(items) // max_workers)
|
||||
|
||||
chunks = [
|
||||
items[i:i+chunk_size]
|
||||
for i in range(0, len(items), chunk_size)
|
||||
]
|
||||
|
||||
return [
|
||||
Send("worker", {"chunk": chunk})
|
||||
for chunk in chunks
|
||||
]
|
||||
```
|
||||
|
||||
## Reducer Implementation Patterns
|
||||
|
||||
### Pattern 1: Simple Aggregation
|
||||
|
||||
```python
|
||||
from operator import add
|
||||
|
||||
class State(TypedDict):
|
||||
results: Annotated[list, add]
|
||||
|
||||
def reducer(state: State):
|
||||
"""Simple aggregation of results"""
|
||||
return {"total": sum(state["results"])}
|
||||
```
|
||||
|
||||
### Pattern 2: Complex Aggregation
|
||||
|
||||
```python
|
||||
def advanced_reducer(state: State):
|
||||
"""Calculate statistics"""
|
||||
results = state["results"]
|
||||
|
||||
return {
|
||||
"total": sum(results),
|
||||
"average": sum(results) / len(results),
|
||||
"min": min(results),
|
||||
"max": max(results)
|
||||
}
|
||||
```
|
||||
|
||||
### Pattern 3: LLM-Based Integration
|
||||
|
||||
```python
|
||||
def llm_reducer(state: State):
|
||||
"""Integrate multiple results with LLM"""
|
||||
all_results = "\n".join(state["summaries"])
|
||||
|
||||
final = llm.invoke(
|
||||
f"Synthesize these summaries into one:\n{all_results}"
|
||||
)
|
||||
|
||||
return {"final_summary": final}
|
||||
```
|
||||
|
||||
## Benefits
|
||||
|
||||
✅ **Scalability**: Workers automatically generated based on task count
|
||||
✅ **Parallel Processing**: High-speed processing of large amounts of data
|
||||
✅ **Flexibility**: Dynamically adjustable worker count
|
||||
✅ **Distributed Processing**: Distributable across multiple servers
|
||||
|
||||
## Considerations
|
||||
|
||||
⚠️ **Memory Consumption**: Many worker instances are generated
|
||||
⚠️ **Reducer Design**: Appropriately design result aggregation method
|
||||
⚠️ **Error Handling**: Handle cases where some workers fail
|
||||
⚠️ **Resource Management**: May need to limit parallelism
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Batch Size Adjustment**: Too small causes overhead, too large reduces parallelism
|
||||
2. **Error Isolation**: One failure shouldn't affect the whole
|
||||
3. **Progress Tracking**: Visualize progress for large task counts
|
||||
4. **Resource Limits**: Set upper limit on parallelism
|
||||
|
||||
## Summary
|
||||
|
||||
Orchestrator-Worker is optimal for **parallel processing of large task volumes**. Workers are generated dynamically with the Send API, and results are aggregated with a Reducer.
|
||||
|
||||
## Related Pages
|
||||
|
||||
- [02_graph_architecture_parallelization.md](02_graph_architecture_parallelization.md) - Comparison with static parallel processing
|
||||
- [05_advanced_features_map_reduce.md](05_advanced_features_map_reduce.md) - Map-Reduce details
|
||||
- [01_core_concepts_state.md](01_core_concepts_state.md) - Reducer details
|
||||
59
skills/langgraph-master/02_graph_architecture_overview.md
Normal file
@@ -0,0 +1,59 @@
|
||||
# 02. Graph Architecture
|
||||
|
||||
Six major graph patterns and agent design.
|
||||
|
||||
## Overview
|
||||
|
||||
LangGraph supports various architectural patterns. It's important to select the optimal pattern based on the nature of the problem.
|
||||
|
||||
## [Workflow vs Agent](02_graph_architecture_workflow_vs_agent.md)
|
||||
|
||||
First, understand the difference between Workflow and Agent:
|
||||
|
||||
- **Workflow**: Predetermined code paths, operates in a specific order
|
||||
- **Agent**: Dynamic, defines its own processes and tool usage
|
||||
|
||||
## Six Major Patterns
|
||||
|
||||
### 1. [Prompt Chaining (Sequential Processing)](02_graph_architecture_prompt_chaining.md)
|
||||
Each LLM call processes the previous output. Suitable for translation and stepwise processing.
|
||||
|
||||
### 2. [Parallelization (Parallel Processing)](02_graph_architecture_parallelization.md)
|
||||
Execute multiple independent tasks simultaneously. Used for speed improvement and reliability verification.
|
||||
|
||||
### 3. [Routing (Branching Processing)](02_graph_architecture_routing.md)
|
||||
Route to specialized flows based on input. Optimal for customer support.
|
||||
|
||||
### 4. [Orchestrator-Worker (Master-Worker)](02_graph_architecture_orchestrator_worker.md)
|
||||
Orchestrator decomposes tasks and delegates to multiple workers.
|
||||
|
||||
### 5. [Evaluator-Optimizer (Evaluation-Improvement Loop)](02_graph_architecture_evaluator_optimizer.md)
|
||||
Repeat generation and evaluation, iteratively improving until acceptable criteria are met.
|
||||
|
||||
### 6. [Agent (Autonomous Tool Usage)](02_graph_architecture_agent.md)
|
||||
LLM dynamically determines tool selection, handling unpredictable problem-solving.
|
||||
|
||||
## [Subgraph](02_graph_architecture_subgraph.md)
|
||||
|
||||
Build hierarchical graph structures and modularize complex systems.
|
||||
|
||||
## Pattern Selection Guide
|
||||
|
||||
| Pattern | Use Case | Example |
|
||||
|---------|----------|---------|
|
||||
| Prompt Chaining | Stepwise processing | Translation → Summary → Analysis |
|
||||
| Parallelization | Simultaneous execution of independent tasks | Evaluation by multiple criteria |
|
||||
| Routing | Type-based routing | Support inquiry classification |
|
||||
| Orchestrator-Worker | Task decomposition and delegation | Parallel processing of multiple documents |
|
||||
| Evaluator-Optimizer | Iterative improvement | Quality improvement loop |
|
||||
| Agent | Dynamic problem solving | Uncertain tasks |
|
||||
|
||||
## Important Principles
|
||||
|
||||
1. **Workflow if structure is clear**: When task structure can be predefined
|
||||
2. **Agent if uncertain**: When problem or solution is uncertain and LLM judgment is needed
|
||||
3. **Subgraph for modularization**: Organize complex systems with hierarchical structure
|
||||
|
||||
## Next Steps
|
||||
|
||||
For details on each pattern, refer to individual pages. We recommend starting with [02_graph_architecture_workflow_vs_agent.md](02_graph_architecture_workflow_vs_agent.md).
|
||||
182
skills/langgraph-master/02_graph_architecture_parallelization.md
Normal file
@@ -0,0 +1,182 @@
|
||||
# Parallelization (Parallel Processing)
|
||||
|
||||
A pattern for executing multiple independent tasks simultaneously.
|
||||
|
||||
## Overview
|
||||
|
||||
Parallelization is a pattern that executes **multiple tasks that don't depend on each other** simultaneously, achieving speed improvements and reliability verification.
|
||||
|
||||
## Use Cases
|
||||
|
||||
- Scoring documents with multiple evaluation criteria
|
||||
- Analysis from different perspectives (technical/business/legal)
|
||||
- Comparing results from multiple translation engines
|
||||
- Implementing Map-Reduce pattern
|
||||
|
||||
## Implementation Example
|
||||
|
||||
```python
|
||||
from typing import Annotated, TypedDict
|
||||
from operator import add
|
||||
|
||||
class State(TypedDict):
|
||||
document: str
|
||||
scores: Annotated[list[dict], add] # Aggregate multiple results
|
||||
|
||||
def technical_review(state: State):
|
||||
"""Review from technical perspective"""
|
||||
score = llm.invoke(
|
||||
f"Technical review: {state['document']}"
|
||||
)
|
||||
return {"scores": [{"type": "technical", "score": score}]}
|
||||
|
||||
def business_review(state: State):
|
||||
"""Review from business perspective"""
|
||||
score = llm.invoke(
|
||||
f"Business review: {state['document']}"
|
||||
)
|
||||
return {"scores": [{"type": "business", "score": score}]}
|
||||
|
||||
def legal_review(state: State):
|
||||
"""Review from legal perspective"""
|
||||
score = llm.invoke(
|
||||
f"Legal review: {state['document']}"
|
||||
)
|
||||
return {"scores": [{"type": "legal", "score": score}]}
|
||||
|
||||
def aggregate_scores(state: State):
|
||||
"""Aggregate scores"""
|
||||
total = sum(s["score"] for s in state["scores"])
|
||||
return {"final_score": total / len(state["scores"])}
|
||||
|
||||
# Build graph
|
||||
builder = StateGraph(State)
|
||||
|
||||
# Nodes to be executed in parallel
|
||||
builder.add_node("technical", technical_review)
|
||||
builder.add_node("business", business_review)
|
||||
builder.add_node("legal", legal_review)
|
||||
builder.add_node("aggregate", aggregate_scores)
|
||||
|
||||
# Edges for parallel execution
|
||||
builder.add_edge(START, "technical")
|
||||
builder.add_edge(START, "business")
|
||||
builder.add_edge(START, "legal")
|
||||
|
||||
# To aggregation node
|
||||
builder.add_edge("technical", "aggregate")
|
||||
builder.add_edge("business", "aggregate")
|
||||
builder.add_edge("legal", "aggregate")
|
||||
builder.add_edge("aggregate", END)
|
||||
|
||||
graph = builder.compile()
|
||||
```
|
||||
|
||||
## Important Concept: Reducer
|
||||
|
||||
A **Reducer** is essential for aggregating results from parallel execution:
|
||||
|
||||
```python
|
||||
from operator import add
|
||||
|
||||
class State(TypedDict):
|
||||
# Additively aggregate results from multiple nodes
|
||||
results: Annotated[list, add]
|
||||
|
||||
# Keep maximum value
|
||||
max_score: Annotated[int, max]
|
||||
|
||||
# Custom Reducer
|
||||
combined: Annotated[dict, combine_dicts]
|
||||
```
|
||||
|
||||
## Benefits
|
||||
|
||||
✅ **Speed**: Time reduction through parallel task execution
|
||||
✅ **Reliability**: Verification by comparing multiple results
|
||||
✅ **Scalability**: Adjust parallelism based on task count
|
||||
✅ **Robustness**: Can continue if some succeed even if others fail
|
||||
|
||||
## Considerations
|
||||
|
||||
⚠️ **Reducer Required**: Explicitly define result aggregation method
|
||||
⚠️ **Resource Consumption**: Increased memory and API calls from parallel execution
|
||||
⚠️ **Uncertain Order**: Execution order not guaranteed
|
||||
⚠️ **Debugging Complexity**: Parallel execution troubleshooting is difficult
|
||||
|
||||
## Advanced Patterns
|
||||
|
||||
### Pattern 1: Fan-out / Fan-in
|
||||
|
||||
```python
|
||||
# Fan-out: One node to multiple
|
||||
builder.add_edge("router", "task_a")
|
||||
builder.add_edge("router", "task_b")
|
||||
builder.add_edge("router", "task_c")
|
||||
|
||||
# Fan-in: Multiple to one aggregation
|
||||
builder.add_edge("task_a", "aggregator")
|
||||
builder.add_edge("task_b", "aggregator")
|
||||
builder.add_edge("task_c", "aggregator")
|
||||
```
|
||||
|
||||
### Pattern 2: Balancing (defer=True)
|
||||
|
||||
Defer the aggregation node so it waits for branches of different lengths to complete:
|
||||
|
||||
```python
|
||||
from operator import add

class State(TypedDict):
    results: Annotated[list, add]

# defer=True (available in recent LangGraph releases) delays this node until
# all pending branches have finished, so it sees every branch's results
builder.add_node("aggregate", aggregate_scores, defer=True)

graph = builder.compile(checkpointer=checkpointer)
|
||||
```
|
||||
|
||||
### Pattern 3: Reliability Through Redundancy
|
||||
|
||||
```python
|
||||
def provider_a(state: State):
|
||||
"""Provider A"""
|
||||
return {"responses": [call_api_a(state["query"])]}
|
||||
|
||||
def provider_b(state: State):
|
||||
"""Provider B (backup)"""
|
||||
return {"responses": [call_api_b(state["query"])]}
|
||||
|
||||
def provider_c(state: State):
|
||||
"""Provider C (backup)"""
|
||||
return {"responses": [call_api_c(state["query"])]}
|
||||
|
||||
def select_best(state: State):
|
||||
"""Select best response"""
|
||||
responses = state["responses"]
|
||||
best = max(responses, key=lambda r: r.confidence)
|
||||
return {"result": best}
|
||||
```
|
||||
|
||||
## vs Other Patterns
|
||||
|
||||
| Pattern | Parallelization | Prompt Chaining |
|
||||
|---------|----------------|-----------------|
|
||||
| Execution Order | Parallel | Sequential |
|
||||
| Dependencies | None | Yes |
|
||||
| Execution Time | Short | Long |
|
||||
| Result Aggregation | Reducer required | Not required |
|
||||
|
||||
## Summary
|
||||
|
||||
Parallelization is optimal for **simultaneous execution of independent tasks**. It's important to properly aggregate results using a Reducer.
|
||||
|
||||
## Related Pages
|
||||
|
||||
- [02_graph_architecture_orchestrator_worker.md](02_graph_architecture_orchestrator_worker.md) - Dynamic parallel processing
|
||||
- [05_advanced_features_map_reduce.md](05_advanced_features_map_reduce.md) - Map-Reduce pattern
|
||||
- [01_core_concepts_state.md](01_core_concepts_state.md) - Reducer details
|
||||
138
skills/langgraph-master/02_graph_architecture_prompt_chaining.md
Normal file
@@ -0,0 +1,138 @@
|
||||
# Prompt Chaining (Sequential Processing)
|
||||
|
||||
A sequential pattern where each LLM call processes the previous output.
|
||||
|
||||
## Overview
|
||||
|
||||
Prompt Chaining is a pattern that **chains multiple LLM calls in sequence**. The output of each step becomes the input for the next step.
|
||||
|
||||
## Use Cases
|
||||
|
||||
- Stepwise processing like translation → summary → analysis
|
||||
- Content generation → validation → correction pipeline
|
||||
- Data extraction → transformation → validation flow
|
||||
|
||||
## Implementation Example
|
||||
|
||||
```python
|
||||
from langgraph.graph import StateGraph, START, END
|
||||
from typing import TypedDict
|
||||
|
||||
class State(TypedDict):
|
||||
text: str
|
||||
translated: str
|
||||
summarized: str
|
||||
analyzed: str
|
||||
|
||||
def translate_node(state: State):
|
||||
"""Translate English → Japanese"""
|
||||
translated = llm.invoke(
|
||||
f"Translate to Japanese: {state['text']}"
|
||||
)
|
||||
return {"translated": translated}
|
||||
|
||||
def summarize_node(state: State):
|
||||
"""Summarize translated text"""
|
||||
summarized = llm.invoke(
|
||||
f"Summarize this text: {state['translated']}"
|
||||
)
|
||||
return {"summarized": summarized}
|
||||
|
||||
def analyze_node(state: State):
|
||||
"""Analyze summary"""
|
||||
analyzed = llm.invoke(
|
||||
f"Analyze sentiment: {state['summarized']}"
|
||||
)
|
||||
return {"analyzed": analyzed}
|
||||
|
||||
# Build graph
|
||||
builder = StateGraph(State)
|
||||
builder.add_node("translate", translate_node)
|
||||
builder.add_node("summarize", summarize_node)
|
||||
builder.add_node("analyze", analyze_node)
|
||||
|
||||
# Edges for sequential execution
|
||||
builder.add_edge(START, "translate")
|
||||
builder.add_edge("translate", "summarize")
|
||||
builder.add_edge("summarize", "analyze")
|
||||
builder.add_edge("analyze", END)
|
||||
|
||||
graph = builder.compile()
|
||||
```
|
||||
|
||||
## Benefits
|
||||
|
||||
✅ **Simple**: Processing flow is linear and easy to understand
|
||||
✅ **Predictable**: Always executes in the same order
|
||||
✅ **Easy to Debug**: Each step can be tested independently
|
||||
✅ **Gradual Improvement**: Quality improves at each step
|
||||
|
||||
## Considerations
|
||||
|
||||
⚠️ **Accumulated Delay**: Takes time as each step executes sequentially
|
||||
⚠️ **Error Propagation**: Earlier errors affect later stages
|
||||
⚠️ **Lack of Flexibility**: Dynamic branching is difficult
|
||||
|
||||
## Advanced Patterns
|
||||
|
||||
### Pattern 1: Chain with Validation
|
||||
|
||||
```python
|
||||
def validate_translation(state: State):
|
||||
"""Validate translation quality"""
|
||||
is_valid = check_quality(state["translated"])
|
||||
return {"is_valid": is_valid}
|
||||
|
||||
def route_after_validation(state: State):
|
||||
if state["is_valid"]:
|
||||
return "continue"
|
||||
return "retry"
|
||||
|
||||
# Validation → continue or retry
|
||||
builder.add_conditional_edges(
|
||||
"validate",
|
||||
route_after_validation,
|
||||
{
|
||||
"continue": "summarize",
|
||||
"retry": "translate"
|
||||
}
|
||||
)
|
||||
```
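
For the retry loop above to actually run, the validation node has to be registered and the `is_valid` flag carried in the state; a small sketch of that wiring, assuming the node names used on this page:

```python
from typing import TypedDict

from langgraph.graph import StateGraph, START, END


class ChainState(TypedDict):
    text: str
    translated: str
    summarized: str
    analyzed: str
    is_valid: bool


builder = StateGraph(ChainState)
builder.add_node("translate", translate_node)
builder.add_node("validate", validate_translation)
builder.add_node("summarize", summarize_node)
builder.add_node("analyze", analyze_node)

builder.add_edge(START, "translate")
builder.add_edge("translate", "validate")
# conditional edges from "validate" as shown above
builder.add_edge("summarize", "analyze")
builder.add_edge("analyze", END)
```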
|
||||
|
||||
### Pattern 2: Gradual Refinement
|
||||
|
||||
```python
|
||||
def draft_node(state: State):
|
||||
"""Create draft"""
|
||||
draft = llm.invoke(f"Write a draft: {state['topic']}")
|
||||
return {"draft": draft}
|
||||
|
||||
def refine_node(state: State):
|
||||
"""Refine draft"""
|
||||
refined = llm.invoke(f"Improve this draft: {state['draft']}")
|
||||
return {"refined": refined}
|
||||
|
||||
def polish_node(state: State):
|
||||
"""Final polish"""
|
||||
polished = llm.invoke(f"Polish this text: {state['refined']}")
|
||||
return {"final": polished}
|
||||
```
|
||||
|
||||
## vs Other Patterns
|
||||
|
||||
| Pattern | Prompt Chaining | Parallelization |
|
||||
|---------|----------------|-----------------|
|
||||
| Execution Order | Sequential | Parallel |
|
||||
| Dependencies | Yes | No |
|
||||
| Execution Time | Long | Short |
|
||||
| Use Case | Stepwise processing | Independent tasks |
|
||||
|
||||
## Summary
|
||||
|
||||
Prompt Chaining is the simplest pattern, optimal for **cases requiring stepwise processing**. Use when each step depends on the previous step.
|
||||
|
||||
## Related Pages
|
||||
|
||||
- [02_graph_architecture_parallelization.md](02_graph_architecture_parallelization.md) - Comparison with parallel processing
|
||||
- [02_graph_architecture_evaluator_optimizer.md](02_graph_architecture_evaluator_optimizer.md) - Combination with validation loop
|
||||
- [01_core_concepts_edge.md](01_core_concepts_edge.md) - Edge basics
|
||||
263
skills/langgraph-master/02_graph_architecture_routing.md
Normal file
@@ -0,0 +1,263 @@
|
||||
# Routing (Branching Processing)
|
||||
|
||||
A pattern for routing to specialized flows based on input.
|
||||
|
||||
## Overview
|
||||
|
||||
Routing is a pattern that **selects the appropriate processing path** based on input characteristics. Used for customer support question classification, etc.
|
||||
|
||||
## Use Cases
|
||||
|
||||
- Route customer questions to specialized teams by type
|
||||
- Different processing pipelines by document type
|
||||
- Prioritization by urgency/importance
|
||||
- Processing flow selection by language
|
||||
|
||||
## Implementation Example: Customer Support
|
||||
|
||||
```python
|
||||
from typing import Literal, TypedDict
|
||||
|
||||
class State(TypedDict):
|
||||
query: str
|
||||
category: str
|
||||
response: str
|
||||
|
||||
def router_node(state: State):
    """Classify the question and store the category in state"""
    query = state["query"]

    # Classify with LLM (expected answer: pricing, refund, or technical)
    category = llm.invoke(
        f"Classify this customer query into: pricing, refund, or technical\n"
        f"Query: {query}\n"
        f"Category:"
    ).strip()

    # Fall back to keyword matching if the LLM answer is unexpected
    if category not in ("pricing", "refund", "technical"):
        if "price" in query or "cost" in query:
            category = "pricing"
        elif "refund" in query or "cancel" in query:
            category = "refund"
        else:
            category = "technical"

    # The conditional edge below reads this key to pick the route
    return {"category": category}

def pricing_node(state: State):
|
||||
"""Handle pricing queries"""
|
||||
response = handle_pricing_query(state["query"])
|
||||
return {"response": response, "category": "pricing"}
|
||||
|
||||
def refund_node(state: State):
|
||||
"""Handle refund queries"""
|
||||
response = handle_refund_query(state["query"])
|
||||
return {"response": response, "category": "refund"}
|
||||
|
||||
def technical_node(state: State):
|
||||
"""Handle technical issues"""
|
||||
response = handle_technical_query(state["query"])
|
||||
return {"response": response, "category": "technical"}
|
||||
|
||||
# Build graph
|
||||
builder = StateGraph(State)
|
||||
|
||||
builder.add_node("router", router_node)
|
||||
builder.add_node("pricing", pricing_node)
|
||||
builder.add_node("refund", refund_node)
|
||||
builder.add_node("technical", technical_node)
|
||||
|
||||
# Routing edges
|
||||
builder.add_edge(START, "router")
|
||||
builder.add_conditional_edges(
|
||||
"router",
|
||||
lambda state: state.get("category", "technical"),
|
||||
{
|
||||
"pricing": "pricing",
|
||||
"refund": "refund",
|
||||
"technical": "technical"
|
||||
}
|
||||
)
|
||||
|
||||
# End from each node
|
||||
builder.add_edge("pricing", END)
|
||||
builder.add_edge("refund", END)
|
||||
builder.add_edge("technical", END)
|
||||
|
||||
graph = builder.compile()
|
||||
```
|
||||
|
||||
## Advanced Patterns
|
||||
|
||||
### Pattern 1: Multi-Stage Routing
|
||||
|
||||
```python
|
||||
def first_router(state: State) -> Literal["sales", "support"]:
|
||||
"""Stage 1: Sales or Support"""
|
||||
if "purchase" in state["query"] or "quote" in state["query"]:
|
||||
return "sales"
|
||||
return "support"
|
||||
|
||||
def support_router(state: State) -> Literal["billing", "technical"]:
|
||||
"""Stage 2: Classification within Support"""
|
||||
if "billing" in state["query"]:
|
||||
return "billing"
|
||||
return "technical"
|
||||
|
||||
# Multi-stage routing
|
||||
builder.add_conditional_edges("first_router", first_router, {...})
|
||||
builder.add_conditional_edges("support_router", support_router, {...})
|
||||
```
|
||||
|
||||
### Pattern 2: Priority-Based Routing
|
||||
|
||||
```python
|
||||
from typing import Literal
|
||||
|
||||
def priority_router(state: State) -> Literal["urgent", "normal", "low"]:
|
||||
"""Route by urgency"""
|
||||
query = state["query"]
|
||||
|
||||
# Urgent keywords
|
||||
if any(word in query for word in ["urgent", "immediately", "asap"]):
|
||||
return "urgent"
|
||||
|
||||
# Importance determination
|
||||
importance = analyze_importance(query)
|
||||
if importance > 0.7:
|
||||
return "normal"
|
||||
|
||||
return "low"
|
||||
|
||||
builder.add_conditional_edges(
|
||||
"priority_router",
|
||||
priority_router,
|
||||
{
|
||||
"urgent": "urgent_handler", # Immediate processing
|
||||
"normal": "normal_queue", # Normal queue
|
||||
"low": "batch_processor" # Batch processing
|
||||
}
|
||||
)
|
||||
```
|
||||
|
||||
### Pattern 3: Semantic Routing (Embedding-Based)
|
||||
|
||||
```python
|
||||
import numpy as np
from typing import Literal


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity helper used by the router below"""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
|
||||
|
||||
def semantic_router(state: State) -> Literal["product", "account", "general"]:
|
||||
"""Semantic routing based on embeddings"""
|
||||
query_embedding = embed(state["query"])
|
||||
|
||||
# Representative embeddings for each category
|
||||
categories = {
|
||||
"product": embed("product, features, how to use"),
|
||||
"account": embed("account, login, password"),
|
||||
"general": embed("general questions")
|
||||
}
|
||||
|
||||
# Select closest category
|
||||
similarities = {
|
||||
cat: cosine_similarity(query_embedding, emb)
|
||||
for cat, emb in categories.items()
|
||||
}
|
||||
|
||||
return max(similarities, key=similarities.get)
|
||||
```
|
||||
|
||||
### Pattern 4: Dynamic Routing (LLM Judgment)
|
||||
|
||||
```python
|
||||
def llm_router(state: State):
|
||||
"""Have LLM determine optimal route"""
|
||||
routes = ["expert_a", "expert_b", "expert_c", "general"]
|
||||
|
||||
prompt = f"""
|
||||
Select the most appropriate expert to handle this question:
|
||||
- expert_a: Database specialist
|
||||
- expert_b: API specialist
|
||||
- expert_c: UI specialist
|
||||
- general: General questions
|
||||
|
||||
Question: {state['query']}
|
||||
|
||||
Selection: """
|
||||
|
||||
route = llm.invoke(prompt).strip()
|
||||
return route if route in routes else "general"
|
||||
|
||||
builder.add_conditional_edges(
|
||||
"router",
|
||||
llm_router,
|
||||
{
|
||||
"expert_a": "database_expert",
|
||||
"expert_b": "api_expert",
|
||||
"expert_c": "ui_expert",
|
||||
"general": "general_handler"
|
||||
}
|
||||
)
|
||||
```
|
||||
|
||||
## Benefits
|
||||
|
||||
✅ **Specialization**: Specialized processing for each type
|
||||
✅ **Efficiency**: Skip unnecessary processing
|
||||
✅ **Maintainability**: Improve each route independently
|
||||
✅ **Scalability**: Easy to add new routes
|
||||
|
||||
## Considerations
|
||||
|
||||
⚠️ **Classification Accuracy**: Routing errors affect the whole
|
||||
⚠️ **Coverage**: Need to cover all cases
|
||||
⚠️ **Fallback**: Handling unknown cases is important
|
||||
⚠️ **Balance**: Consider load balancing between routes
|
||||
|
||||
## Best Practices
|
||||
|
||||
### 1. Provide Fallback Route
|
||||
|
||||
```python
|
||||
def safe_router(state: State):
|
||||
try:
|
||||
route = determine_route(state)
|
||||
if route in valid_routes:
|
||||
return route
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
# Fallback
|
||||
return "general_handler"
|
||||
```
|
||||
|
||||
### 2. Log Routing Reasons
|
||||
|
||||
```python
|
||||
def logged_router(state: State):
|
||||
route = determine_route(state)
|
||||
|
||||
return {
|
||||
"route": route,
|
||||
"routing_reason": f"Routed to {route} because..."
|
||||
}
|
||||
```
|
||||
|
||||
### 3. Dynamic Route Addition
|
||||
|
||||
```python
|
||||
# Load routes from configuration file
|
||||
ROUTES = load_routes_config()
|
||||
|
||||
builder.add_conditional_edges(
|
||||
"router",
|
||||
determine_route,
|
||||
{route: handler for route, handler in ROUTES.items()}
|
||||
)
|
||||
```
|
||||
|
||||
## Summary
|
||||
|
||||
Routing is optimal for **appropriate processing selection based on input characteristics**. Classification accuracy and fallback handling are keys to success.
|
||||
|
||||
## Related Pages
|
||||
|
||||
- [02_graph_architecture_agent.md](02_graph_architecture_agent.md) - Combining with Agent
|
||||
- [01_core_concepts_edge.md](01_core_concepts_edge.md) - Conditional edge details
|
||||
- [02_graph_architecture_workflow_vs_agent.md](02_graph_architecture_workflow_vs_agent.md) - Pattern usage
|
||||
282
skills/langgraph-master/02_graph_architecture_subgraph.md
Normal file
@@ -0,0 +1,282 @@
|
||||
# Subgraph
|
||||
|
||||
A pattern for building hierarchical graph structures and modularizing complex systems.
|
||||
|
||||
## Overview
|
||||
|
||||
Subgraph is a pattern for hierarchically organizing complex systems by **embedding graphs as nodes in other graphs**.
|
||||
|
||||
## Use Cases
|
||||
|
||||
- Modularizing large-scale agent systems
|
||||
- Integrating multiple specialized agents
|
||||
- Reusable workflow components
|
||||
- Multi-level hierarchical structures
|
||||
|
||||
## Two Implementation Approaches
|
||||
|
||||
### Approach 1: Add Graph as Node
|
||||
|
||||
Use when **sharing state keys**.
|
||||
|
||||
```python
|
||||
# Subgraph definition
|
||||
class SubState(TypedDict):
|
||||
messages: Annotated[list, add_messages]
|
||||
sub_result: str
|
||||
|
||||
def sub_node_a(state: SubState):
|
||||
return {"messages": [{"role": "assistant", "content": "Sub A"}]}
|
||||
|
||||
def sub_node_b(state: SubState):
|
||||
return {"sub_result": "Sub B completed"}
|
||||
|
||||
# Build subgraph
|
||||
sub_builder = StateGraph(SubState)
|
||||
sub_builder.add_node("sub_a", sub_node_a)
|
||||
sub_builder.add_node("sub_b", sub_node_b)
|
||||
sub_builder.add_edge(START, "sub_a")
|
||||
sub_builder.add_edge("sub_a", "sub_b")
|
||||
sub_builder.add_edge("sub_b", END)
|
||||
|
||||
sub_graph = sub_builder.compile()
|
||||
|
||||
# Use subgraph as node in parent graph
|
||||
class ParentState(TypedDict):
|
||||
messages: Annotated[list, add_messages] # Shared key
|
||||
sub_result: str # Shared key
|
||||
parent_data: str
|
||||
|
||||
parent_builder = StateGraph(ParentState)
|
||||
|
||||
# Add subgraph directly as node
|
||||
parent_builder.add_node("subgraph", sub_graph)
|
||||
|
||||
parent_builder.add_edge(START, "subgraph")
|
||||
parent_builder.add_edge("subgraph", END)
|
||||
|
||||
parent_graph = parent_builder.compile()
|
||||
```
|
||||
|
||||
### Approach 2: Call Graph from Within Node
|
||||
|
||||
Use when having **different state schemas**.
|
||||
|
||||
```python
|
||||
# Subgraph (own state)
|
||||
class SubGraphState(TypedDict):
|
||||
input_text: str
|
||||
output_text: str
|
||||
|
||||
def process_node(state: SubGraphState):
|
||||
return {"output_text": process(state["input_text"])}
|
||||
|
||||
sub_builder = StateGraph(SubGraphState)
|
||||
sub_builder.add_node("process", process_node)
|
||||
sub_builder.add_edge(START, "process")
|
||||
sub_builder.add_edge("process", END)
|
||||
|
||||
sub_graph = sub_builder.compile()
|
||||
|
||||
# Parent graph (different state)
|
||||
class ParentState(TypedDict):
|
||||
user_query: str
|
||||
result: str
|
||||
|
||||
def invoke_subgraph_node(state: ParentState):
|
||||
"""Call subgraph within node"""
|
||||
# Convert parent state to subgraph state
|
||||
sub_input = {"input_text": state["user_query"]}
|
||||
|
||||
# Execute subgraph
|
||||
sub_output = sub_graph.invoke(sub_input)
|
||||
|
||||
# Convert subgraph output to parent state
|
||||
return {"result": sub_output["output_text"]}
|
||||
|
||||
parent_builder = StateGraph(ParentState)
|
||||
parent_builder.add_node("call_subgraph", invoke_subgraph_node)
|
||||
parent_builder.add_edge(START, "call_subgraph")
|
||||
parent_builder.add_edge("call_subgraph", END)
|
||||
|
||||
parent_graph = parent_builder.compile()
|
||||
```
|
||||
|
||||
## Multi-Level Subgraphs
|
||||
|
||||
Multiple levels of subgraphs (parent → child → grandchild) are also possible:
|
||||
|
||||
```python
|
||||
# Grandchild graph
|
||||
class GrandchildState(TypedDict):
|
||||
data: str
|
||||
|
||||
grandchild_builder = StateGraph(GrandchildState)
|
||||
grandchild_builder.add_node("process", lambda s: {"data": f"Processed: {s['data']}"})
|
||||
grandchild_builder.add_edge(START, "process")
|
||||
grandchild_builder.add_edge("process", END)
|
||||
grandchild_graph = grandchild_builder.compile()
|
||||
|
||||
# Child graph (includes grandchild graph)
|
||||
class ChildState(TypedDict):
|
||||
data: str
|
||||
|
||||
child_builder = StateGraph(ChildState)
|
||||
child_builder.add_node("grandchild", grandchild_graph) # Add grandchild graph
|
||||
child_builder.add_edge(START, "grandchild")
|
||||
child_builder.add_edge("grandchild", END)
|
||||
child_graph = child_builder.compile()
|
||||
|
||||
# Parent graph (includes child graph)
|
||||
class ParentState(TypedDict):
|
||||
data: str
|
||||
|
||||
parent_builder = StateGraph(ParentState)
|
||||
parent_builder.add_node("child", child_graph) # Add child graph
|
||||
parent_builder.add_edge(START, "child")
|
||||
parent_builder.add_edge("child", END)
|
||||
parent_graph = parent_builder.compile()
|
||||
```
|
||||
|
||||
## Navigation Between Subgraphs
|
||||
|
||||
Transition from subgraph to another node in parent graph:
|
||||
|
||||
```python
|
||||
from langgraph.types import Command
|
||||
|
||||
def sub_node_with_navigation(state: SubState):
|
||||
"""Navigate from subgraph node to parent graph"""
|
||||
result = process(state["data"])
|
||||
|
||||
if need_parent_intervention(result):
|
||||
# Transition to another node in parent graph
|
||||
return Command(
|
||||
update={"result": result},
|
||||
goto="parent_handler",
|
||||
graph=Command.PARENT
|
||||
)
|
||||
|
||||
return {"result": result}
|
||||
```
|
||||
|
||||
## Persistence and Debugging
|
||||
|
||||
### Automatic Checkpointer Propagation
|
||||
|
||||
```python
|
||||
from langgraph.checkpoint.memory import MemorySaver
|
||||
|
||||
# Set checkpointer only on parent graph
|
||||
checkpointer = MemorySaver()
|
||||
|
||||
parent_graph = parent_builder.compile(
|
||||
checkpointer=checkpointer # Automatically propagates to child graphs
|
||||
)
|
||||
```
|
||||
|
||||
### Streaming Including Subgraph Output
|
||||
|
||||
```python
|
||||
# Stream including subgraph details
|
||||
for chunk in parent_graph.stream(
|
||||
inputs,
|
||||
stream_mode="values",
|
||||
subgraphs=True # Include subgraph output
|
||||
):
|
||||
print(chunk)
|
||||
```
|
||||
|
||||
## Practical Example: Multi-Agent System
|
||||
|
||||
```python
|
||||
# Research agent (subgraph)
|
||||
class ResearchState(TypedDict):
|
||||
messages: Annotated[list, add_messages]
|
||||
research_result: str
|
||||
|
||||
research_builder = StateGraph(ResearchState)
|
||||
research_builder.add_node("search", search_node)
|
||||
research_builder.add_node("analyze", analyze_node)
|
||||
research_builder.add_edge(START, "search")
|
||||
research_builder.add_edge("search", "analyze")
|
||||
research_builder.add_edge("analyze", END)
|
||||
research_graph = research_builder.compile()
|
||||
|
||||
# Coding agent (subgraph)
|
||||
class CodingState(TypedDict):
|
||||
messages: Annotated[list, add_messages]
|
||||
code: str
|
||||
|
||||
coding_builder = StateGraph(CodingState)
|
||||
coding_builder.add_node("generate", generate_code_node)
|
||||
coding_builder.add_node("test", test_code_node)
|
||||
coding_builder.add_edge(START, "generate")
|
||||
coding_builder.add_edge("generate", "test")
|
||||
coding_builder.add_edge("test", END)
|
||||
coding_graph = coding_builder.compile()
|
||||
|
||||
# Integrated system (parent graph)
|
||||
class SystemState(TypedDict):
|
||||
messages: Annotated[list, add_messages]
|
||||
research_result: str
|
||||
code: str
|
||||
task_type: str
|
||||
|
||||
def router(state: SystemState):
|
||||
if "research" in state["messages"][-1].content:
|
||||
return "research"
|
||||
return "coding"
|
||||
|
||||
system_builder = StateGraph(SystemState)
|
||||
|
||||
# Add subgraphs
|
||||
system_builder.add_node("research_agent", research_graph)
|
||||
system_builder.add_node("coding_agent", coding_graph)
|
||||
|
||||
# Routing
|
||||
system_builder.add_conditional_edges(
|
||||
START,
|
||||
router,
|
||||
{
|
||||
"research": "research_agent",
|
||||
"coding": "coding_agent"
|
||||
}
|
||||
)
|
||||
|
||||
system_builder.add_edge("research_agent", END)
|
||||
system_builder.add_edge("coding_agent", END)
|
||||
|
||||
system_graph = system_builder.compile()
|
||||
```
|
||||
|
||||
## Benefits
|
||||
|
||||
✅ **Modularization**: Divide complex systems into smaller parts
|
||||
✅ **Reusability**: Use subgraphs in multiple parent graphs
|
||||
✅ **Maintainability**: Improve each subgraph independently
|
||||
✅ **Testability**: Test subgraphs individually
|
||||
|
||||
## Considerations
|
||||
|
||||
⚠️ **State Sharing**: Carefully design which keys to share
|
||||
⚠️ **Debugging Complexity**: Deep hierarchies are hard to track
|
||||
⚠️ **Performance**: Multi-level increases overhead
|
||||
⚠️ **Circular References**: Watch for circular dependencies between subgraphs
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Shallow Hierarchy**: Keep hierarchy as shallow as possible (2-3 levels)
|
||||
2. **Clear Responsibilities**: Clearly define role of each subgraph
|
||||
3. **Minimize State**: Share only necessary state keys
|
||||
4. **Independence**: Subgraphs should operate as independently as possible
|
||||
|
||||
## Summary
|
||||
|
||||
Subgraph is optimal for **hierarchical organization of complex systems**. Choose between two approaches depending on state sharing method.
|
||||
|
||||
## Related Pages
|
||||
|
||||
- [02_graph_architecture_agent.md](02_graph_architecture_agent.md) - Combination with multi-agent
|
||||
- [01_core_concepts_state.md](01_core_concepts_state.md) - State design
|
||||
- [03_memory_management_persistence.md](03_memory_management_persistence.md) - Checkpointer propagation
|
||||
156
skills/langgraph-master/02_graph_architecture_workflow_vs_agent.md
Normal file
@@ -0,0 +1,156 @@
|
||||
# Workflow vs Agent
|
||||
|
||||
Differences and usage between Workflow and Agent.
|
||||
|
||||
## Basic Differences
|
||||
|
||||
### Workflow
|
||||
> "predetermined code paths and are designed to operate in a certain order"
|
||||
> (Predetermined code paths, operates in specific order)
|
||||
|
||||
- **Pre-defined**: Processing flow is clear
|
||||
- **Predictable**: Follows same path for same input
|
||||
- **Controlled Execution**: Developer has complete control over control flow
|
||||
|
||||
### Agent
|
||||
> "dynamic and define their own processes and tool usage"
|
||||
> (Dynamic, defines its own processes and tool usage)
|
||||
|
||||
- **Dynamic**: LLM decides next action
|
||||
- **Autonomous**: Self-determines tool selection
|
||||
- **Uncertain**: May follow different paths with same input
|
||||
|
||||
## Implementation Comparison
|
||||
|
||||
### Workflow Example: Translation Pipeline
|
||||
|
||||
```python
|
||||
def translate_node(state: State):
|
||||
return {"text": translate(state["text"])}
|
||||
|
||||
def summarize_node(state: State):
|
||||
return {"summary": summarize(state["text"])}
|
||||
|
||||
def validate_node(state: State):
|
||||
return {"valid": check_quality(state["summary"])}
|
||||
|
||||
# Fixed flow
|
||||
builder.add_edge(START, "translate")
|
||||
builder.add_edge("translate", "summarize")
|
||||
builder.add_edge("summarize", "validate")
|
||||
builder.add_edge("validate", END)
|
||||
```
|
||||
|
||||
### Agent Example: Problem-Solving Agent
|
||||
|
||||
```python
|
||||
def agent_node(state: State):
|
||||
# LLM determines tool usage
|
||||
response = llm_with_tools.invoke(state["messages"])
|
||||
return {"messages": [response]}
|
||||
|
||||
def should_continue(state: State):
|
||||
last_message = state["messages"][-1]
|
||||
# Continue if there are tool calls
|
||||
if last_message.tool_calls:
|
||||
return "continue"
|
||||
return "end"
|
||||
|
||||
# LLM decides dynamically
|
||||
builder.add_conditional_edges(
|
||||
"agent",
|
||||
should_continue,
|
||||
{"continue": "tools", "end": END}
|
||||
)
|
||||
```
|
||||
|
||||
## Selection Criteria
|
||||
|
||||
### Choose Workflow When
|
||||
|
||||
✅ **Structure is Clear**
|
||||
- Processing steps are known in advance
|
||||
- Execution order is fixed
|
||||
|
||||
✅ **Predictability is Important**
|
||||
- Compliance requirements exist
|
||||
- Debugging needs to be easy
|
||||
|
||||
✅ **Cost Efficiency**
|
||||
- Want to minimize LLM calls
|
||||
- Want to reduce token consumption
|
||||
|
||||
**Examples**: Data processing pipelines, approval workflows, translation chains
|
||||
|
||||
### Choose Agent When
|
||||
|
||||
✅ **Problem is Uncertain**
|
||||
- Don't know which tools are needed
|
||||
- Variable number of steps
|
||||
|
||||
✅ **Flexibility is Needed**
|
||||
- Different approaches based on situation
|
||||
- Diverse user questions
|
||||
|
||||
✅ **Autonomy is Valuable**
|
||||
- Want to leverage LLM's judgment
|
||||
- ReAct (reasoning + action) pattern is suitable
|
||||
|
||||
**Examples**: Customer support, research assistant, complex problem solving
|
||||
|
||||
## Hybrid Approach
|
||||
|
||||
Many practical systems combine both:
|
||||
|
||||
```python
|
||||
# Embed an Agent loop within a Workflow
builder.add_edge(START, "input_validation")    # Workflow
builder.add_edge("input_validation", "agent")  # Agent part
builder.add_conditional_edges(
    "agent",
    should_continue,
    {"continue": "tools", "end": "output_formatting"}
)
builder.add_edge("tools", "agent")
builder.add_edge("output_formatting", END)     # Workflow
|
||||
```
|
||||
|
||||
## ReAct Pattern (Agent Foundation)
|
||||
|
||||
Agent follows the **ReAct** (Reasoning + Acting) pattern:
|
||||
|
||||
1. **Reasoning**: Think "What should I do next?"
|
||||
2. **Acting**: Take action using tools
|
||||
3. **Observing**: Observe results
|
||||
4. Repeat until reaching final answer
|
||||
|
||||
```python
|
||||
# ReAct loop implementation
|
||||
def agent(state):
|
||||
# Reasoning: Determine next action
|
||||
response = llm_with_tools.invoke(state["messages"])
|
||||
return {"messages": [response]}
|
||||
|
||||
def tools(state):
|
||||
# Acting: Execute tools
|
||||
results = execute_tools(state["messages"][-1].tool_calls)
|
||||
return {"messages": results}
|
||||
|
||||
# Observing & Repeat
|
||||
builder.add_conditional_edges("agent", should_continue, ...)
|
||||
```
|
||||
|
||||
## Summary
|
||||
|
||||
| Aspect | Workflow | Agent |
|
||||
|--------|----------|-------|
|
||||
| Control | Developer has complete control | LLM decides dynamically |
|
||||
| Predictability | High | Low |
|
||||
| Flexibility | Low | High |
|
||||
| Cost | Low | High |
|
||||
| Use Case | Structured tasks | Uncertain tasks |
|
||||
|
||||
**Important**: Both can be built with the same tools (State, Node, Edge) in LangGraph. Pattern choice depends on problem nature.
|
||||
|
||||
## Related Pages
|
||||
|
||||
- [02_graph_architecture_prompt_chaining.md](02_graph_architecture_prompt_chaining.md) - Workflow pattern example
|
||||
- [02_graph_architecture_agent.md](02_graph_architecture_agent.md) - Agent pattern details
|
||||
- [02_graph_architecture_routing.md](02_graph_architecture_routing.md) - Hybrid approach example
|
||||
224
skills/langgraph-master/03_memory_management_checkpointer.md
Normal file
@@ -0,0 +1,224 @@
|
||||
# Checkpointer
|
||||
|
||||
Implementation details for saving and restoring state.
|
||||
|
||||
## Overview
|
||||
|
||||
Checkpointer implements the `BaseCheckpointSaver` interface and is responsible for state persistence.
|
||||
|
||||
## Checkpointer Implementations
|
||||
|
||||
### 1. MemorySaver (For Experimentation & Testing)
|
||||
|
||||
Saves checkpoints in memory:
|
||||
|
||||
```python
|
||||
from langgraph.checkpoint.memory import MemorySaver
|
||||
|
||||
checkpointer = MemorySaver()
|
||||
graph = builder.compile(checkpointer=checkpointer)
|
||||
|
||||
# All data is lost when the process terminates
|
||||
```
|
||||
|
||||
**Use Case**: Local testing, prototyping
|
||||
|
||||
### 2. SqliteSaver (For Local Development)
|
||||
|
||||
Saves to SQLite database:
|
||||
|
||||
```python
|
||||
from langgraph.checkpoint.sqlite import SqliteSaver
|
||||
|
||||
# File-based
|
||||
checkpointer = SqliteSaver.from_conn_string("checkpoints.db")
|
||||
|
||||
# Or from connection object
|
||||
import sqlite3
|
||||
conn = sqlite3.connect("checkpoints.db")
|
||||
checkpointer = SqliteSaver(conn)
|
||||
|
||||
graph = builder.compile(checkpointer=checkpointer)
|
||||
```
|
||||
|
||||
**Use Case**: Local development, single-user applications
|
||||
|
||||
### 3. PostgresSaver (For Production)
|
||||
|
||||
Saves to PostgreSQL database:
|
||||
|
||||
```python
|
||||
from langgraph.checkpoint.postgres import PostgresSaver
|
||||
from psycopg_pool import ConnectionPool
|
||||
|
||||
# Connection pool
|
||||
pool = ConnectionPool(
|
||||
conninfo="postgresql://user:password@localhost:5432/db"
|
||||
)
|
||||
|
||||
checkpointer = PostgresSaver(pool)
|
||||
graph = builder.compile(checkpointer=checkpointer)
|
||||
```
|
||||
|
||||
**Use Case**: Production environments, multi-user applications
|
||||
|
||||
## BaseCheckpointSaver Interface
|
||||
|
||||
All checkpointers implement the following methods:
|
||||
|
||||
```python
|
||||
class BaseCheckpointSaver:
|
||||
def put(
|
||||
self,
|
||||
config: RunnableConfig,
|
||||
checkpoint: Checkpoint,
|
||||
metadata: dict
|
||||
) -> RunnableConfig:
|
||||
"""Save a checkpoint"""
|
||||
|
||||
def get_tuple(
|
||||
self,
|
||||
config: RunnableConfig
|
||||
) -> CheckpointTuple | None:
|
||||
"""Retrieve a checkpoint"""
|
||||
|
||||
def list(
|
||||
self,
|
||||
config: RunnableConfig,
|
||||
*,
|
||||
before: RunnableConfig | None = None,
|
||||
limit: int | None = None
|
||||
) -> Iterator[CheckpointTuple]:
|
||||
"""Get list of checkpoints"""
|
||||
```
|
||||
|
||||
## Custom Checkpointer
|
||||
|
||||
Implement your own persistence logic:
|
||||
|
||||
```python
|
||||
from langgraph.checkpoint.base import BaseCheckpointSaver
|
||||
|
||||
class RedisCheckpointer(BaseCheckpointSaver):
|
||||
def __init__(self, redis_client):
|
||||
self.redis = redis_client
|
||||
|
||||
def put(self, config, checkpoint, metadata):
|
||||
thread_id = config["configurable"]["thread_id"]
|
||||
checkpoint_id = checkpoint["id"]
|
||||
|
||||
key = f"checkpoint:{thread_id}:{checkpoint_id}"
|
||||
self.redis.set(key, serialize(checkpoint))
|
||||
|
||||
return config
|
||||
|
||||
def get_tuple(self, config):
|
||||
thread_id = config["configurable"]["thread_id"]
|
||||
# Retrieve the latest checkpoint
|
||||
# ...
|
||||
|
||||
def list(self, config, before=None, limit=None):
|
||||
# Return list of checkpoints
|
||||
# ...
|
||||
```
|
||||
|
||||
## Checkpointer Configuration
|
||||
|
||||
### Namespaces
|
||||
|
||||
Share the same checkpointer across multiple graphs:
|
||||
|
||||
```python
|
||||
checkpointer = MemorySaver()
|
||||
|
||||
graph1 = builder1.compile(
|
||||
checkpointer=checkpointer,
|
||||
name="graph1" # Namespace
|
||||
)
|
||||
|
||||
graph2 = builder2.compile(
|
||||
checkpointer=checkpointer,
|
||||
name="graph2" # Different namespace
|
||||
)
|
||||
```
|
||||
|
||||
### Automatic Propagation
|
||||
|
||||
Parent graph's checkpointer automatically propagates to subgraphs:
|
||||
|
||||
```python
|
||||
# Set only on parent graph
|
||||
parent_graph = parent_builder.compile(checkpointer=checkpointer)
|
||||
|
||||
# Automatically propagates to child graphs
|
||||
```
|
||||
|
||||
## Checkpoint Management
|
||||
|
||||
### Deleting Old Checkpoints
|
||||
|
||||
```python
|
||||
# Delete after a certain period (implementation-dependent)
|
||||
import datetime
|
||||
|
||||
cutoff = datetime.datetime.now() - datetime.timedelta(days=30)
|
||||
|
||||
# Implementation example (SQLite)
|
||||
checkpointer.conn.execute(
|
||||
"DELETE FROM checkpoints WHERE created_at < ?",
|
||||
(cutoff,)
|
||||
)
|
||||
```
|
||||
|
||||
### Optimizing Checkpoint Size
|
||||
|
||||
```python
|
||||
class State(TypedDict):
|
||||
# Avoid large data
|
||||
messages: Annotated[list, add_messages]
|
||||
|
||||
# Store references only
|
||||
large_data_id: str # Actual data in separate storage
|
||||
|
||||
def node(state: State):
|
||||
# Retrieve large data from external source
|
||||
large_data = fetch_from_storage(state["large_data_id"])
|
||||
# ...
|
||||
```
|
||||
|
||||
## Performance Considerations
|
||||
|
||||
### Connection Pool (PostgreSQL)
|
||||
|
||||
```python
|
||||
from psycopg_pool import ConnectionPool
|
||||
|
||||
pool = ConnectionPool(
|
||||
conninfo=conn_string,
|
||||
min_size=5,
|
||||
max_size=20
|
||||
)
|
||||
|
||||
checkpointer = PostgresSaver(pool)
|
||||
```
|
||||
|
||||
### Async Checkpointer
|
||||
|
||||
```python
|
||||
from langgraph.checkpoint.postgres import AsyncPostgresSaver
|
||||
|
||||
async_checkpointer = AsyncPostgresSaver(async_pool)
|
||||
|
||||
# Async execution
|
||||
async for chunk in graph.astream(input, config):
|
||||
print(chunk)
|
||||
```
|
||||
|
||||
## Summary
|
||||
|
||||
Checkpointer determines how state is persisted. It's important to choose the appropriate implementation for your use case.
|
||||
|
||||
## Related Pages
|
||||
|
||||
- [03_memory_management_persistence.md](03_memory_management_persistence.md) - How to use persistence
|
||||
- [03_memory_management_store.md](03_memory_management_store.md) - Differences from long-term memory
|
||||
152
skills/langgraph-master/03_memory_management_overview.md
Normal file
@@ -0,0 +1,152 @@
|
||||
# 03. Memory Management
|
||||
|
||||
State management through persistence and checkpoint features.
|
||||
|
||||
## Overview
|
||||
|
||||
LangGraph's **built-in persistence layer** allows you to save and restore agent state. This enables conversation continuation, error recovery, and time travel.
|
||||
|
||||
## Memory Types
|
||||
|
||||
### Short-term Memory: [Checkpointer](03_memory_management_checkpointer.md)
|
||||
- Automatically saves state at each superstep
|
||||
- Thread-based conversation management
|
||||
- Time travel functionality
|
||||
|
||||
### Long-term Memory: [Store](03_memory_management_store.md)
|
||||
- Share information across threads
|
||||
- Persist user information
|
||||
- Semantic search
|
||||
|
||||
## Key Features
|
||||
|
||||
### 1. [Persistence](03_memory_management_persistence.md)
|
||||
|
||||
**Checkpoints**: Save state at each superstep
|
||||
- Snapshot state at each stage of graph execution
|
||||
- Recoverable from failures
|
||||
- Track execution history
|
||||
|
||||
**Threads**: Unit of conversation
|
||||
- Identify conversations by `thread_id`
|
||||
- Each thread maintains independent state
|
||||
- Manage multiple conversations in parallel
|
||||
|
||||
**StateSnapshot**: Representation of checkpoints
|
||||
- `values`: State at that point in time
|
||||
- `next`: Nodes to execute next
|
||||
- `config`: Checkpoint configuration
|
||||
- `metadata`: Metadata
|
||||
|
||||
### 2. Human-in-the-Loop
|
||||
|
||||
**State Inspection**: Check state at any point
|
||||
```python
|
||||
state = graph.get_state(config)
|
||||
print(state.values)
|
||||
```
|
||||
|
||||
**Approval Flow**: Human approval before critical operations
```python
from langgraph.types import interrupt

def approval_node(state):
    # Pause the graph here; execution resumes once a human responds
    approved = interrupt({"message": "Approve this operation?"})
    return {"approved": approved}
```
|
||||
|
||||
### 3. Memory
|
||||
|
||||
**Conversation Memory**: Memory within a thread
|
||||
```python
|
||||
# Conversation continues when called with the same thread_id
|
||||
config = {"configurable": {"thread_id": "conversation-1"}}
|
||||
graph.invoke(input, config)
|
||||
```
|
||||
|
||||
**Long-term Memory**: Memory across threads
|
||||
```python
|
||||
# Save user information in Store
|
||||
store.put(("user", user_id), "preferences", user_prefs)
|
||||
```
|
||||
|
||||
### 4. Time Travel
|
||||
|
||||
Replay and fork past executions:
|
||||
```python
|
||||
# Resume from specific checkpoint
|
||||
history = graph.get_state_history(config)
|
||||
for state in history:
|
||||
print(f"Checkpoint: {state.config['configurable']['checkpoint_id']}")
|
||||
|
||||
# Re-execute from past checkpoint
|
||||
graph.invoke(input, past_checkpoint_config)
|
||||
```
|
||||
|
||||
## Checkpointer Implementations
|
||||
|
||||
LangGraph provides multiple checkpointer implementations:
|
||||
|
||||
### InMemorySaver (For Experimentation)
|
||||
```python
|
||||
from langgraph.checkpoint.memory import MemorySaver
|
||||
|
||||
checkpointer = MemorySaver()
|
||||
graph = builder.compile(checkpointer=checkpointer)
|
||||
```
|
||||
|
||||
### SqliteSaver (For Local Development)
|
||||
```python
|
||||
from langgraph.checkpoint.sqlite import SqliteSaver
|
||||
|
||||
checkpointer = SqliteSaver.from_conn_string("checkpoints.db")
|
||||
graph = builder.compile(checkpointer=checkpointer)
|
||||
```
|
||||
|
||||
### PostgresSaver (For Production)
|
||||
```python
|
||||
from langgraph.checkpoint.postgres import PostgresSaver
|
||||
|
||||
checkpointer = PostgresSaver.from_conn_string(
|
||||
"postgresql://user:pass@localhost/db"
|
||||
)
|
||||
graph = builder.compile(checkpointer=checkpointer)
|
||||
```
|
||||
|
||||
## Basic Usage Example
|
||||
|
||||
```python
|
||||
from langgraph.checkpoint.memory import MemorySaver
|
||||
|
||||
# Compile with checkpointer
|
||||
checkpointer = MemorySaver()
|
||||
graph = builder.compile(checkpointer=checkpointer)
|
||||
|
||||
# Execute with thread_id
|
||||
config = {"configurable": {"thread_id": "user-123"}}
|
||||
|
||||
# First execution
|
||||
result1 = graph.invoke({"messages": [("user", "Hello")]}, config)
|
||||
|
||||
# Continue in same thread
|
||||
result2 = graph.invoke({"messages": [("user", "How are you?")]}, config)
|
||||
|
||||
# Check state
|
||||
state = graph.get_state(config)
|
||||
print(state.values) # All messages so far
|
||||
|
||||
# Check history
|
||||
for state in graph.get_state_history(config):
|
||||
print(f"Step: {state.values}")
|
||||
```
|
||||
|
||||
## Key Principles
|
||||
|
||||
1. **Thread ID Management**: Use unique thread_id for each conversation
|
||||
2. **Checkpointer Selection**: Choose appropriate implementation for your use case
|
||||
3. **State Minimization**: Save only necessary information to keep checkpoint size small
|
||||
4. **Cleanup**: Periodically delete old checkpoints
|
||||
|
||||
## Next Steps
|
||||
|
||||
For details on each feature, refer to the following pages:
|
||||
|
||||
- [03_memory_management_persistence.md](03_memory_management_persistence.md) - Persistence details
|
||||
- [03_memory_management_checkpointer.md](03_memory_management_checkpointer.md) - Checkpointer implementation
|
||||
- [03_memory_management_store.md](03_memory_management_store.md) - Long-term memory management
|
||||
264
skills/langgraph-master/03_memory_management_persistence.md
Normal file
@@ -0,0 +1,264 @@
|
||||
# Persistence
|
||||
|
||||
Functionality to save and restore graph state.
|
||||
|
||||
## Overview
|
||||
|
||||
Persistence is a feature that **automatically saves** state at each stage of graph execution and allows you to restore it later.
|
||||
|
||||
## Basic Concepts
|
||||
|
||||
### Checkpoints
|
||||
|
||||
State is automatically saved after each **superstep** (set of nodes executed in parallel).
|
||||
|
||||
```python
|
||||
# Superstep 1: node_a and node_b execute in parallel
|
||||
# → Checkpoint 1
|
||||
|
||||
# Superstep 2: node_c executes
|
||||
# → Checkpoint 2
|
||||
|
||||
# Superstep 3: node_d executes
|
||||
# → Checkpoint 3
|
||||
```
|
||||
|
||||
### Threads
|
||||
|
||||
A thread is an identifier containing the **accumulated state of a series of executions**:
|
||||
|
||||
```python
|
||||
config = {"configurable": {"thread_id": "conversation-123"}}
|
||||
```
|
||||
|
||||
Executing with the same `thread_id` continues from the previous state.
|
||||
|
||||
## Implementation Example
|
||||
|
||||
```python
|
||||
from langgraph.checkpoint.memory import MemorySaver
|
||||
from langgraph.graph import StateGraph, MessagesState
|
||||
|
||||
# Define graph
|
||||
builder = StateGraph(MessagesState)
|
||||
builder.add_node("chatbot", chatbot_node)
|
||||
builder.add_edge(START, "chatbot")
|
||||
builder.add_edge("chatbot", END)
|
||||
|
||||
# Compile with checkpointer
|
||||
checkpointer = MemorySaver()
|
||||
graph = builder.compile(checkpointer=checkpointer)
|
||||
|
||||
# Execute with thread ID
|
||||
config = {"configurable": {"thread_id": "user-001"}}
|
||||
|
||||
# First execution
|
||||
graph.invoke(
|
||||
{"messages": [{"role": "user", "content": "My name is Alice"}]},
|
||||
config
|
||||
)
|
||||
|
||||
# Continue in same thread (retains previous state)
|
||||
response = graph.invoke(
|
||||
{"messages": [{"role": "user", "content": "What's my name?"}]},
|
||||
config
|
||||
)
|
||||
|
||||
# → "Your name is Alice"
|
||||
```
|
||||
|
||||
## StateSnapshot Object
|
||||
|
||||
Checkpoints are represented as `StateSnapshot` objects:
|
||||
|
||||
```python
|
||||
class StateSnapshot:
|
||||
values: dict # State at that point in time
|
||||
next: tuple[str] # Nodes to execute next
|
||||
config: RunnableConfig # Checkpoint configuration
|
||||
metadata: dict # Metadata
|
||||
tasks: tuple[PregelTask] # Scheduled tasks
|
||||
```
|
||||
|
||||
### Getting Latest State
|
||||
|
||||
```python
|
||||
state = graph.get_state(config)
|
||||
|
||||
print(state.values) # Current state
|
||||
print(state.next) # Next nodes
|
||||
print(state.config) # Checkpoint configuration
|
||||
```
|
||||
|
||||
### Getting History
|
||||
|
||||
```python
|
||||
# Get list of StateSnapshots in chronological order
|
||||
for state in graph.get_state_history(config):
|
||||
print(f"Checkpoint: {state.config['configurable']['checkpoint_id']}")
|
||||
print(f"Values: {state.values}")
|
||||
print(f"Next: {state.next}")
|
||||
print("---")
|
||||
```
|
||||
|
||||
## Time Travel Feature
|
||||
|
||||
Resume execution from a specific checkpoint:
|
||||
|
||||
```python
|
||||
# Get specific checkpoint from history
|
||||
history = list(graph.get_state_history(config))
|
||||
|
||||
# Checkpoint from 3 steps ago
|
||||
past_state = history[3]
|
||||
|
||||
# Re-execute from that checkpoint
|
||||
result = graph.invoke(
|
||||
{"messages": [{"role": "user", "content": "New question"}]},
|
||||
past_state.config
|
||||
)
|
||||
```
|
||||
|
||||
### Validating Alternative Paths
|
||||
|
||||
```python
|
||||
# Get current state
|
||||
current_state = graph.get_state(config)
|
||||
|
||||
# Try with different input
|
||||
alt_result = graph.invoke(
|
||||
{"messages": [{"role": "user", "content": "Different question"}]},
|
||||
current_state.config
|
||||
)
|
||||
|
||||
# Original execution is not affected
|
||||
```
|
||||
|
||||
## Updating State
|
||||
|
||||
Directly update checkpoint state:
|
||||
|
||||
```python
|
||||
# Get current state
|
||||
state = graph.get_state(config)
|
||||
|
||||
# Update state
|
||||
graph.update_state(
|
||||
config,
|
||||
{"messages": [{"role": "assistant", "content": "Updated message"}]}
|
||||
)
|
||||
|
||||
# Resume from updated state
|
||||
graph.invoke({"messages": [...]}, config)
|
||||
```
|
||||
|
||||
## Use Cases
|
||||
|
||||
### 1. Conversation Continuation
|
||||
|
||||
```python
|
||||
# Session 1
|
||||
config = {"configurable": {"thread_id": "chat-1"}}
|
||||
graph.invoke({"messages": [("user", "Hello")]}, config)
|
||||
|
||||
# Session 2 (days later)
|
||||
# Remembers previous conversation
|
||||
graph.invoke({"messages": [("user", "Continuing from last time")]}, config)
|
||||
```
|
||||
|
||||
### 2. Error Recovery
|
||||
|
||||
```python
|
||||
try:
|
||||
graph.invoke(input, config)
|
||||
except Exception as e:
|
||||
# Even if error occurs, can recover from checkpoint
|
||||
print(f"Error: {e}")
|
||||
|
||||
# Check latest state
|
||||
state = graph.get_state(config)
|
||||
|
||||
# Fix state and re-execute
|
||||
graph.update_state(config, {"error_fixed": True})
|
||||
graph.invoke(input, config)
|
||||
```
|
||||
|
||||
### 3. A/B Testing
|
||||
|
||||
```python
# Base execution
config = {"configurable": {"thread_id": "ab-test"}}
base_result = graph.invoke(input, config)

# Fork from a checkpoint recorded before the step under test
# (get_state_history returns newest first, so [1] is one step back)
fork_config = list(graph.get_state_history(config))[1].config

# Alternative execution 1 (branches from the fork point;
# the original thread history is left intact)
alt_result_1 = graph.invoke(modified_input_1, fork_config)

# Alternative execution 2
alt_result_2 = graph.invoke(modified_input_2, fork_config)

# Compare results
```
|
||||
|
||||
### 4. Debugging and Tracing
|
||||
|
||||
```python
|
||||
# Execute
|
||||
graph.invoke(input, config)
|
||||
|
||||
# Check each step
|
||||
for i, state in enumerate(graph.get_state_history(config)):
|
||||
print(f"Step {i}:")
|
||||
print(f" State: {state.values}")
|
||||
print(f" Next: {state.next}")
|
||||
```
|
||||
|
||||
## Important Considerations
|
||||
|
||||
### Thread ID Uniqueness
|
||||
|
||||
```python
|
||||
# Use different thread_id per user
|
||||
user_config = {"configurable": {"thread_id": f"user-{user_id}"}}
|
||||
|
||||
# Use different thread_id per conversation
|
||||
conversation_config = {"configurable": {"thread_id": f"conv-{conv_id}"}}
|
||||
```
|
||||
|
||||
### Checkpoint Cleanup
|
||||
|
||||
```python
|
||||
# Delete old checkpoints (implementation-dependent)
|
||||
checkpointer.cleanup(before_timestamp=old_timestamp)
|
||||
```
|
||||
|
||||
### Multi-user Support
|
||||
|
||||
```python
|
||||
# Combine user ID and session ID
|
||||
def get_config(user_id: str, session_id: str):
|
||||
return {
|
||||
"configurable": {
|
||||
"thread_id": f"{user_id}-{session_id}"
|
||||
}
|
||||
}
|
||||
|
||||
config = get_config("user123", "session456")
|
||||
```
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Meaningful thread_id**: Format that can identify user, session, conversation
|
||||
2. **Regular Cleanup**: Delete old checkpoints
|
||||
3. **Appropriate Checkpointer**: Choose implementation based on use case
|
||||
4. **Error Handling**: Properly handle errors when retrieving checkpoints
|
||||
|
||||
## Summary
|
||||
|
||||
Persistence enables **state persistence and restoration**, making conversation continuation, error recovery, and time travel possible.
|
||||
|
||||
## Related Pages
|
||||
|
||||
- [03_memory_management_checkpointer.md](03_memory_management_checkpointer.md) - Checkpointer implementation details
|
||||
- [03_memory_management_store.md](03_memory_management_store.md) - Combining with long-term memory
|
||||
- [05_advanced_features_human_in_the_loop.md](05_advanced_features_human_in_the_loop.md) - Applications of state inspection
|
||||
287
skills/langgraph-master/03_memory_management_store.md
Normal file
@@ -0,0 +1,287 @@
|
||||
# Store (Long-term Memory)
|
||||
|
||||
Long-term memory for sharing information across multiple threads.
|
||||
|
||||
## Overview
|
||||
|
||||
Checkpointer only saves state within a single thread. To share information across multiple threads, use **Store**.
|
||||
|
||||
## Checkpointer vs Store
|
||||
|
||||
| Feature | Checkpointer | Store |
|
||||
|---------|-------------|-------|
|
||||
| Scope | Single thread | All threads |
|
||||
| Purpose | Conversation state | User information |
|
||||
| Auto-save | Yes | No (manual) |
|
||||
| Search | thread_id | Namespace |
|
||||
|
||||
## Basic Usage
|
||||
|
||||
```python
|
||||
from langgraph.store.memory import InMemoryStore
|
||||
|
||||
# Create Store
|
||||
store = InMemoryStore()
|
||||
|
||||
# Save user information
|
||||
store.put(
|
||||
namespace=("users", "user-123"),
|
||||
key="preferences",
|
||||
value={
|
||||
"language": "en",
|
||||
"theme": "dark",
|
||||
"notifications": True
|
||||
}
|
||||
)
|
||||
|
||||
# Retrieve user information
|
||||
user_prefs = store.get(("users", "user-123"), "preferences")
|
||||
```
|
||||
|
||||
## Namespace
|
||||
|
||||
Namespaces are grouped by **tuples**:
|
||||
|
||||
```python
|
||||
# User information
|
||||
("users", user_id)
|
||||
|
||||
# Session information
|
||||
("sessions", session_id)
|
||||
|
||||
# Project information
|
||||
("projects", project_id, "documents")
|
||||
|
||||
# Hierarchical structure
|
||||
("organization", org_id, "department", dept_id)
|
||||
```
|
||||
|
||||
## Store Operations
|
||||
|
||||
### Save
|
||||
|
||||
```python
|
||||
store.put(
|
||||
namespace=("users", "alice"),
|
||||
key="profile",
|
||||
value={
|
||||
"name": "Alice",
|
||||
"email": "alice@example.com",
|
||||
"joined": "2024-01-01"
|
||||
}
|
||||
)
|
||||
```
|
||||
|
||||
### Retrieve
|
||||
|
||||
```python
|
||||
# Single item
|
||||
profile = store.get(("users", "alice"), "profile")
|
||||
|
||||
# All items in namespace
|
||||
items = store.search(("users", "alice"))
|
||||
```
|
||||
|
||||
### Search
|
||||
|
||||
```python
|
||||
# Filter by namespace
|
||||
all_users = store.search(("users",))
|
||||
|
||||
# Filter by key
|
||||
profiles = store.search(("users",), filter={"key": "profile"})
|
||||
```
|
||||
|
||||
### Delete
|
||||
|
||||
```python
|
||||
# Single item
|
||||
store.delete(("users", "alice"), "profile")
|
||||
|
||||
# Entire namespace
|
||||
store.delete_namespace(("users", "alice"))
|
||||
```
|
||||
|
||||
## Integration with Graph
|
||||
|
||||
```python
|
||||
from langgraph.store.memory import InMemoryStore
|
||||
|
||||
store = InMemoryStore()
|
||||
|
||||
# Integrate Store with graph
|
||||
graph = builder.compile(
|
||||
checkpointer=checkpointer,
|
||||
store=store
|
||||
)
|
||||
|
||||
# Use Store within nodes
|
||||
def personalized_node(state: State, *, store):
|
||||
user_id = state["user_id"]
|
||||
|
||||
# Get user preferences
|
||||
prefs = store.get(("users", user_id), "preferences")
|
||||
|
||||
# Process based on preferences
|
||||
if prefs and prefs.value.get("language") == "en":
|
||||
response = generate_english_response(state)
|
||||
else:
|
||||
response = generate_default_response(state)
|
||||
|
||||
return {"response": response}
|
||||
```
|
||||
|
||||
## Semantic Search
|
||||
|
||||
Store implementations with vector search capability:
|
||||
|
||||
```python
|
||||
from langgraph.store.memory import InMemoryStore
|
||||
|
||||
# Supply an embedding function (or model) and its dimensionality;
# `embedding_function` here is a placeholder for your own embedder
store = InMemoryStore(index={"embed": embedding_function, "dims": 1536})
|
||||
|
||||
# Save documents (automatically vectorized)
|
||||
store.put(
|
||||
("documents", "doc-1"),
|
||||
"content",
|
||||
{"text": "LangGraph is an agent framework"}
|
||||
)
|
||||
|
||||
# Semantic search
|
||||
results = store.search(
|
||||
("documents",),
|
||||
query="agent development"
|
||||
)
|
||||
```
|
||||
|
||||
## Practical Example: User Profile
|
||||
|
||||
```python
|
||||
class ProfileState(TypedDict):
|
||||
user_id: str
|
||||
messages: Annotated[list, add_messages]
|
||||
|
||||
def save_user_info(state: ProfileState, *, store):
|
||||
"""Extract and save user information from conversation"""
|
||||
messages = state["messages"]
|
||||
user_id = state["user_id"]
|
||||
|
||||
# Extract information with LLM
|
||||
info = extract_user_info(messages)
|
||||
|
||||
if info:
|
||||
# Save to Store
|
||||
current = store.get(("users", user_id), "profile")
|
||||
|
||||
if current:
|
||||
# Merge with existing information
|
||||
updated = {**current.value, **info}
|
||||
else:
|
||||
updated = info
|
||||
|
||||
store.put(
|
||||
("users", user_id),
|
||||
"profile",
|
||||
updated
|
||||
)
|
||||
|
||||
return {}
|
||||
|
||||
def personalized_response(state: ProfileState, *, store):
|
||||
"""Personalize using user information"""
|
||||
user_id = state["user_id"]
|
||||
|
||||
# Get user information
|
||||
profile = store.get(("users", user_id), "profile")
|
||||
|
||||
if profile:
|
||||
context = f"User context: {profile.value}"
|
||||
messages = [
|
||||
{"role": "system", "content": context},
|
||||
*state["messages"]
|
||||
]
|
||||
else:
|
||||
messages = state["messages"]
|
||||
|
||||
response = llm.invoke(messages)
|
||||
return {"messages": [response]}
|
||||
```
|
||||
|
||||
## Practical Example: Knowledge Base
|
||||
|
||||
```python
|
||||
def query_knowledge_base(state: State, *, store):
|
||||
"""Search for knowledge related to question"""
|
||||
query = state["messages"][-1].content
|
||||
|
||||
# Semantic search
|
||||
relevant_docs = store.search(
|
||||
("knowledge",),
|
||||
query=query,
|
||||
limit=3
|
||||
)
|
||||
|
||||
# Add relevant information to context
|
||||
context = "\n".join([
|
||||
doc.value["text"]
|
||||
for doc in relevant_docs
|
||||
])
|
||||
|
||||
# Pass to LLM
|
||||
response = llm.invoke([
|
||||
{"role": "system", "content": f"Context:\n{context}"},
|
||||
*state["messages"]
|
||||
])
|
||||
|
||||
return {"messages": [response]}
|
||||
```
|
||||
|
||||
## Store Implementations
|
||||
|
||||
### InMemoryStore
|
||||
|
||||
```python
|
||||
from langgraph.store.memory import InMemoryStore
|
||||
|
||||
store = InMemoryStore()
|
||||
```
|
||||
|
||||
### Custom Store
|
||||
|
||||
```python
|
||||
import json

from langgraph.store.base import BaseStore
|
||||
|
||||
class RedisStore(BaseStore):
|
||||
def __init__(self, redis_client):
|
||||
self.redis = redis_client
|
||||
|
||||
def put(self, namespace, key, value):
|
||||
ns_key = f"{':'.join(namespace)}:{key}"
|
||||
self.redis.set(ns_key, json.dumps(value))
|
||||
|
||||
def get(self, namespace, key):
|
||||
ns_key = f"{':'.join(namespace)}:{key}"
|
||||
data = self.redis.get(ns_key)
|
||||
return json.loads(data) if data else None
|
||||
|
||||
    def search(self, namespace, filter=None):
        # Simplified: load and deserialize every entry under the namespace
        pattern = f"{':'.join(namespace)}:*"
        keys = self.redis.keys(pattern)
        return [json.loads(self.redis.get(k)) for k in keys]
|
||||
```
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Namespace Design**: Hierarchical and meaningful structure
|
||||
2. **Key Naming**: Clear and consistent naming conventions
|
||||
3. **Data Size**: Store references only for large data (see the sketch below)
|
||||
4. **Cleanup**: Periodic deletion of old data
|
||||
|
||||
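
As a sketch of the third practice above, keep only a lightweight reference in the Store and put the payload in external storage; `upload_blob` and `download_blob` are hypothetical helpers standing in for whatever object storage you use:

```python
def save_report(store, user_id: str, report_bytes: bytes):
    # Hypothetical helper: push the large payload to object storage
    blob_uri = upload_blob(bucket="reports", data=report_bytes)

    # Keep only the small reference in the Store
    store.put(
        ("users", user_id),
        "latest_report",
        {"uri": blob_uri, "size": len(report_bytes)},
    )


def load_report(store, user_id: str) -> bytes:
    item = store.get(("users", user_id), "latest_report")
    # Hypothetical helper: fetch the payload back by its reference
    return download_blob(item.value["uri"])
```
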
## Summary
|
||||
|
||||
Store is long-term memory for sharing information across multiple threads. Use it for persisting user profiles, knowledge bases, settings, etc.
|
||||
|
||||
## Related Pages
|
||||
|
||||
- [03_memory_management_checkpointer.md](03_memory_management_checkpointer.md) - Differences from short-term memory
|
||||
- [03_memory_management_persistence.md](03_memory_management_persistence.md) - Persistence basics
|
||||
280
skills/langgraph-master/04_tool_integration_command_api.md
Normal file
@@ -0,0 +1,280 @@
|
||||
# Command API
|
||||
|
||||
An advanced API that integrates state updates and control flow.
|
||||
|
||||
## Overview
|
||||
|
||||
The Command API is a feature that allows nodes to specify **state updates** and **control flow** simultaneously.
|
||||
|
||||
## Basic Usage
|
||||
|
||||
```python
|
||||
from langgraph.types import Command
|
||||
|
||||
def decision_node(state: State) -> Command:
|
||||
"""Update state and specify the next node"""
|
||||
result = analyze(state["data"])
|
||||
|
||||
if result["confidence"] > 0.8:
|
||||
return Command(
|
||||
update={"result": result, "confident": True},
|
||||
goto="finalize"
|
||||
)
|
||||
else:
|
||||
return Command(
|
||||
update={"result": result, "confident": False},
|
||||
goto="review"
|
||||
)
|
||||
```
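
Because a node that returns `Command` has no outgoing edges declared for it, the possible targets can be listed in the return type annotation so the graph can validate and render them; a short sketch based on the example above:

```python
from typing import Literal

from langgraph.types import Command


def decision_node(state: State) -> Command[Literal["finalize", "review"]]:
    # The Literal targets tell LangGraph where this node may go,
    # since no explicit edges are added for Command-returning nodes
    result = analyze(state["data"])
    goto = "finalize" if result["confidence"] > 0.8 else "review"
    return Command(update={"result": result, "confident": goto == "finalize"}, goto=goto)
```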
|
||||
|
||||
## Command Object Parameters
|
||||
|
||||
```python
|
||||
Command(
|
||||
update: dict, # Updates to state
|
||||
goto: str | list[str], # Next node(s) (single or multiple)
|
||||
graph: str | None = None # For subgraph navigation
|
||||
)
|
||||
```
|
||||
|
||||
## vs Traditional State Updates
|
||||
|
||||
### Traditional Method
|
||||
|
||||
```python
|
||||
def node(state: State) -> dict:
|
||||
return {"result": "value"}
|
||||
|
||||
# Control flow in edges
|
||||
def route(state: State) -> str:
|
||||
if state["result"] == "value":
|
||||
return "next_node"
|
||||
return "other_node"
|
||||
|
||||
builder.add_conditional_edges("node", route, {...})
|
||||
```
|
||||
|
||||
### Command API
|
||||
|
||||
```python
|
||||
def node(state: State) -> Command:
|
||||
return Command(
|
||||
update={"result": "value"},
|
||||
goto="next_node" # Specify control flow as well
|
||||
)
|
||||
|
||||
# No edges needed (Command controls flow)
|
||||
```
|
||||
|
||||
## Advanced Patterns
|
||||
|
||||
### Pattern 1: Conditional Branching
|
||||
|
||||
```python
|
||||
def validator(state: State) -> Command:
|
||||
"""Validate and determine next node"""
|
||||
is_valid = validate(state["data"])
|
||||
|
||||
if is_valid:
|
||||
return Command(
|
||||
update={"valid": True},
|
||||
goto="process"
|
||||
)
|
||||
else:
|
||||
return Command(
|
||||
update={"valid": False, "errors": get_errors(state["data"])},
|
||||
goto="error_handler"
|
||||
)
|
||||
```
|
||||
|
||||
### Pattern 2: Parallel Execution
|
||||
|
||||
```python
|
||||
def fan_out_node(state: State) -> Command:
|
||||
"""Branch to multiple nodes in parallel"""
|
||||
return Command(
|
||||
update={"started": True},
|
||||
goto=["worker_a", "worker_b", "worker_c"] # Parallel execution
|
||||
)
|
||||
```
|
||||
|
||||
### Pattern 3: Loop Control
|
||||
|
||||
```python
|
||||
def iterator_node(state: State) -> Command:
|
||||
"""Iterative processing"""
|
||||
iteration = state.get("iteration", 0) + 1
|
||||
result = process_iteration(state["data"], iteration)
|
||||
|
||||
if iteration < state["max_iterations"] and not result["done"]:
|
||||
return Command(
|
||||
update={"iteration": iteration, "result": result},
|
||||
goto="iterator_node" # Loop back to self
|
||||
)
|
||||
else:
|
||||
return Command(
|
||||
update={"final_result": result},
|
||||
goto=END
|
||||
)
|
||||
```
|
||||
|
||||
### Pattern 4: Subgraph Navigation
|
||||
|
||||
```python
|
||||
def sub_node(state: State) -> Command:
|
||||
"""Navigate from subgraph to parent graph"""
|
||||
result = process(state["data"])
|
||||
|
||||
if need_parent_intervention(result):
|
||||
return Command(
|
||||
update={"sub_result": result},
|
||||
goto="parent_handler",
|
||||
graph=Command.PARENT # Navigate to parent graph
|
||||
)
|
||||
|
||||
return {"sub_result": result}
|
||||
```
|
||||
|
||||
## Integration with Tools
|
||||
|
||||
### Control After Tool Execution
|
||||
|
||||
```python
|
||||
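from langchain_core.messages import ToolMessage

# Assumes a name-to-tool mapping built elsewhere, e.g.
# tool_map = {t.name: t for t in tools}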
def tool_node_with_command(state: MessagesState) -> Command:
|
||||
"""Determine next action after tool execution"""
|
||||
last_message = state["messages"][-1]
|
||||
tool_results = []
|
||||
|
||||
for tool_call in last_message.tool_calls:
|
||||
tool = tool_map[tool_call["name"]]
|
||||
result = tool.invoke(tool_call["args"])
|
||||
|
||||
tool_results.append(
|
||||
ToolMessage(
|
||||
content=str(result),
|
||||
tool_call_id=tool_call["id"]
|
||||
)
|
||||
)
|
||||
|
||||
# Determine next node based on results
|
||||
if any("error" in r.content.lower() for r in tool_results):
|
||||
return Command(
|
||||
update={"messages": tool_results},
|
||||
goto="error_handler"
|
||||
)
|
||||
else:
|
||||
return Command(
|
||||
update={"messages": tool_results},
|
||||
goto="agent"
|
||||
)
|
||||
```
|
||||
|
||||
### Command from Within Tools
|
||||
|
||||
```python
|
||||
from langchain_core.tools import tool
from langgraph.types import interrupt
|
||||
|
||||
@tool
|
||||
def send_email(to: str, subject: str, body: str) -> str:
|
||||
"""Send email (with approval)"""
|
||||
|
||||
# Request approval
|
||||
approved = interrupt({
|
||||
"action": "send_email",
|
||||
"to": to,
|
||||
"subject": subject,
|
||||
"message": "Approve sending this email?"
|
||||
})
|
||||
|
||||
if approved:
|
||||
result = actually_send_email(to, subject, body)
|
||||
return f"Email sent to {to}"
|
||||
else:
|
||||
return "Email cancelled by user"
|
||||
```
|
||||
|
||||
## Dynamic Routing
|
||||
|
||||
```python
|
||||
def dynamic_router(state: State) -> Command:
|
||||
"""Dynamically select route based on state"""
|
||||
score = evaluate(state["data"])
|
||||
|
||||
# Select route based on score
|
||||
if score > 0.9:
|
||||
route = "expert_handler"
|
||||
elif score > 0.7:
|
||||
route = "standard_handler"
|
||||
else:
|
||||
route = "basic_handler"
|
||||
|
||||
return Command(
|
||||
update={"confidence_score": score},
|
||||
goto=route
|
||||
)
|
||||
```
|
||||
|
||||
## Error Recovery
|
||||
|
||||
```python
|
||||
def processor_with_fallback(state: State) -> Command:
|
||||
"""Fallback on error"""
|
||||
try:
|
||||
result = risky_operation(state["data"])
|
||||
|
||||
return Command(
|
||||
update={"result": result, "error": None},
|
||||
goto="success_handler"
|
||||
)
|
||||
|
||||
except Exception as e:
|
||||
return Command(
|
||||
update={"error": str(e)},
|
||||
goto="fallback_handler"
|
||||
)
|
||||
```
|
||||
|
||||
## State Machine Implementation
|
||||
|
||||
```python
|
||||
def state_machine_node(state: State) -> Command:
|
||||
"""State machine"""
|
||||
current_state = state.get("state", "initial")
|
||||
|
||||
transitions = {
|
||||
"initial": ("validate", {"state": "validating"}),
|
||||
"validating": ("process" if state.get("valid") else "error", {"state": "processing"}),
|
||||
"processing": ("finalize", {"state": "finalizing"}),
|
||||
"finalizing": (END, {"state": "done"})
|
||||
}
|
||||
|
||||
next_node, update = transitions[current_state]
|
||||
|
||||
return Command(
|
||||
update=update,
|
||||
goto=next_node
|
||||
)
|
||||
```
|
||||
|
||||
## Benefits
|
||||
|
||||
✅ **Conciseness**: Define state updates and control flow in one place
|
||||
✅ **Readability**: Node intent is clear
|
||||
✅ **Flexibility**: Dynamic routing is easier
|
||||
✅ **Debugging**: Control flow is easier to track
|
||||
|
||||
## Considerations
|
||||
|
||||
⚠️ **Complexity**: Avoid overly complex conditional branching
|
||||
⚠️ **Testing**: All branches need to be tested
|
||||
⚠️ **Parallel Execution**: Order of parallel nodes is non-deterministic
|
||||
|
||||
## Summary
|
||||
|
||||
The Command API integrates state updates and control flow, enabling more flexible and readable graph construction.
|
||||
|
||||
## Related Pages
|
||||
|
||||
- [01_core_concepts_node.md](01_core_concepts_node.md) - Node basics
|
||||
- [01_core_concepts_edge.md](01_core_concepts_edge.md) - Comparison with edges
|
||||
- [02_graph_architecture_subgraph.md](02_graph_architecture_subgraph.md) - Subgraph navigation
|
||||
158
skills/langgraph-master/04_tool_integration_overview.md
Normal file
@@ -0,0 +1,158 @@
|
||||
# 04. Tool Integration
|
||||
|
||||
Integration and execution control of external tools.
|
||||
|
||||
## Overview
|
||||
|
||||
In LangGraph, LLMs can interact with external systems by calling **tools**. Tools provide various capabilities such as search, calculation, API calls, and more.
|
||||
|
||||
## Key Components
|
||||
|
||||
### 1. [Tool Definition](04_tool_integration_tool_definition.md)
|
||||
|
||||
How to define tools:
|
||||
- `@tool` decorator
|
||||
- Function descriptions and parameters
|
||||
- Structured output
|
||||
|
||||
### 2. [Tool Node](04_tool_integration_tool_node.md)
|
||||
|
||||
Nodes that execute tools:
|
||||
- Using `ToolNode`
|
||||
- Error handling
|
||||
- Custom tool nodes
|
||||
|
||||
### 3. [Command API](04_tool_integration_command_api.md)
|
||||
|
||||
Controlling tool execution:
|
||||
- Integration of state updates and control flow
|
||||
- Transition control from tools
|
||||
|
||||
## Basic Implementation
|
||||
|
||||
```python
|
||||
from langchain_core.tools import tool
|
||||
from langgraph.prebuilt import ToolNode
|
||||
from langgraph.graph import END, START, MessagesState, StateGraph
|
||||
|
||||
# 1. Define tools
|
||||
@tool
|
||||
def search(query: str) -> str:
|
||||
"""Perform a web search.
|
||||
|
||||
Args:
|
||||
query: Search query
|
||||
"""
|
||||
return perform_search(query)
|
||||
|
||||
@tool
|
||||
def calculator(expression: str) -> float:
|
||||
"""Calculate a mathematical expression.
|
||||
|
||||
Args:
|
||||
expression: Expression to calculate (e.g., "2 + 2")
|
||||
"""
|
||||
    return eval(expression)  # Note: eval is unsafe for untrusted input; use a proper expression parser in production
|
||||
|
||||
tools = [search, calculator]
|
||||
|
||||
# 2. Bind tools to LLM
|
||||
llm_with_tools = llm.bind_tools(tools)
|
||||
|
||||
# 3. Agent node
|
||||
def agent(state: MessagesState):
|
||||
response = llm_with_tools.invoke(state["messages"])
|
||||
return {"messages": [response]}
|
||||
|
||||
# 4. Tool node
|
||||
tool_node = ToolNode(tools)
|
||||
|
||||
# 5. Build graph
|
||||
builder = StateGraph(MessagesState)
|
||||
builder.add_node("agent", agent)
|
||||
builder.add_node("tools", tool_node)
|
||||
|
||||
# 6. Conditional edges
|
||||
def should_continue(state: MessagesState):
|
||||
last_message = state["messages"][-1]
|
||||
if last_message.tool_calls:
|
||||
return "tools"
|
||||
return END
|
||||
|
||||
builder.add_edge(START, "agent")
|
||||
builder.add_conditional_edges("agent", should_continue)
|
||||
builder.add_edge("tools", "agent")
|
||||
|
||||
graph = builder.compile()
|
||||
```
|
||||
|
||||
## Types of Tools
|
||||
|
||||
### Search Tools
|
||||
|
||||
```python
|
||||
@tool
|
||||
def web_search(query: str) -> str:
|
||||
"""Search the web"""
|
||||
return search_api(query)
|
||||
```
|
||||
|
||||
### Calculator Tools
|
||||
|
||||
```python
|
||||
@tool
|
||||
def calculator(expression: str) -> float:
|
||||
"""Calculate a mathematical expression"""
|
||||
    return eval(expression)  # Demo only: eval is unsafe for untrusted input
|
||||
```
|
||||
|
||||
### API Tools
|
||||
|
||||
```python
|
||||
@tool
|
||||
def get_weather(city: str) -> dict:
|
||||
"""Get weather information"""
|
||||
return weather_api(city)
|
||||
```
|
||||
|
||||
### Database Tools
|
||||
|
||||
```python
|
||||
@tool
|
||||
def query_database(sql: str) -> list[dict]:
|
||||
"""Query the database"""
|
||||
return execute_sql(sql)
|
||||
```
|
||||
|
||||
## Tool Execution Flow
|
||||
|
||||
```
|
||||
User Query
|
||||
↓
|
||||
[Agent Node]
|
||||
↓
|
||||
LLM decides: Use tool?
|
||||
↓ Yes
|
||||
[Tool Node] ← Execute tool
|
||||
↓
|
||||
[Agent Node] ← Tool result
|
||||
↓
|
||||
LLM decides: Continue?
|
||||
↓ No
|
||||
Final Answer
|
||||
```
|
||||
|
||||
## Key Principles
|
||||
|
||||
1. **Clear Descriptions**: Write detailed docstrings for tools
|
||||
2. **Error Handling**: Handle tool execution errors appropriately
|
||||
3. **Type Safety**: Explicitly specify parameter types
|
||||
4. **Approval Flow**: Incorporate Human-in-the-Loop for critical tools (see the sketch after this list)
|
||||
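A compact sketch that applies these principles to a single hypothetical tool (`execute_transfer` is an assumed backend helper, not a LangGraph API):

```python
from langchain_core.tools import tool
from langgraph.types import interrupt


@tool
def transfer_funds(account_id: str, amount: float) -> str:
    """Transfer funds from the specified account.

    Args:
        account_id: Identifier of the source account
        amount: Amount to transfer, in the account's currency
    """
    # Principles 1-3: single responsibility, typed parameters, explicit errors
    if amount <= 0:
        raise ValueError("amount must be positive")

    # Principle 4: critical action, so request human approval first
    approved = interrupt({"action": "transfer_funds", "amount": amount})
    if not approved:
        return "Transfer cancelled by user"

    return execute_transfer(account_id, amount)  # assumed backend call
```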
|
||||
## Next Steps
|
||||
|
||||
For details on each component, please refer to the following pages:
|
||||
|
||||
- [04_tool_integration_tool_definition.md](04_tool_integration_tool_definition.md) - How to define tools
|
||||
- [04_tool_integration_tool_node.md](04_tool_integration_tool_node.md) - Tool node implementation
|
||||
- [04_tool_integration_command_api.md](04_tool_integration_command_api.md) - Using the Command API
|
||||
227
skills/langgraph-master/04_tool_integration_tool_definition.md
Normal file
@@ -0,0 +1,227 @@
|
||||
# Tool Definition
|
||||
|
||||
How to define tools and design patterns.
|
||||
|
||||
## Basic Definition
|
||||
|
||||
```python
|
||||
from langchain_core.tools import tool
|
||||
|
||||
@tool
|
||||
def search(query: str) -> str:
|
||||
"""Perform a web search.
|
||||
|
||||
Args:
|
||||
query: Search query
|
||||
"""
|
||||
return perform_search(query)
|
||||
```
|
||||
|
||||
## Key Elements
|
||||
|
||||
### 1. Docstring
|
||||
|
||||
A description that helps the LLM understand what the tool does:
|
||||
|
||||
```python
|
||||
@tool
|
||||
def get_weather(location: str, unit: str = "celsius") -> str:
|
||||
"""Get the current weather for a specified location.
|
||||
|
||||
This tool provides up-to-date weather information for cities around the world.
|
||||
It includes detailed information such as temperature, humidity, and weather conditions.
|
||||
|
||||
Args:
|
||||
location: City name (e.g., "Tokyo", "New York", "London")
|
||||
unit: Temperature unit ("celsius" or "fahrenheit"), default is "celsius"
|
||||
|
||||
Returns:
|
||||
A string containing weather information
|
||||
|
||||
Examples:
|
||||
>>> get_weather("Tokyo")
|
||||
"Tokyo weather: Sunny, Temperature: 25°C, Humidity: 60%"
|
||||
"""
|
||||
return fetch_weather(location, unit)
|
||||
```
|
||||
|
||||
### 2. Type Annotations
|
||||
|
||||
Explicitly specify parameter and return value types:
|
||||
|
||||
```python
|
||||
from typing import Any, Dict, List
|
||||
|
||||
@tool
|
||||
def search_products(
|
||||
query: str,
|
||||
max_results: int = 10,
|
||||
category: str | None = None
|
||||
) -> List[Dict[str, Any]]:
|
||||
"""Search for products.
|
||||
|
||||
Args:
|
||||
query: Search keywords
|
||||
max_results: Maximum number of results
|
||||
category: Category filter (optional)
|
||||
"""
|
||||
return database.search(query, max_results, category)
|
||||
```
|
||||
|
||||
## Structured Output
|
||||
|
||||
Structured output using Pydantic models:
|
||||
|
||||
```python
|
||||
from pydantic import BaseModel, Field
|
||||
|
||||
class WeatherInfo(BaseModel):
|
||||
temperature: float = Field(description="Temperature in Celsius")
|
||||
humidity: int = Field(description="Humidity (%)")
|
||||
condition: str = Field(description="Weather condition")
|
||||
location: str = Field(description="Location")
|
||||
|
||||
@tool(response_format="content_and_artifact")
|
||||
def get_detailed_weather(location: str) -> tuple[str, WeatherInfo]:
|
||||
"""Get detailed weather information.
|
||||
|
||||
Args:
|
||||
location: City name
|
||||
"""
|
||||
data = fetch_weather_data(location)
|
||||
|
||||
weather = WeatherInfo(
|
||||
temperature=data["temp"],
|
||||
humidity=data["humidity"],
|
||||
condition=data["condition"],
|
||||
location=location
|
||||
)
|
||||
|
||||
summary = f"{location} weather: {weather.condition}, {weather.temperature}°C"
|
||||
|
||||
return summary, weather
|
||||
```
|
||||
|
||||
## Best Practices for Tool Design
|
||||
|
||||
### 1. Single Responsibility
|
||||
|
||||
```python
|
||||
# Good: Does one thing well
|
||||
@tool
|
||||
def send_email(to: str, subject: str, body: str) -> str:
|
||||
"""Send an email"""
|
||||
|
||||
# Bad: Multiple responsibilities
|
||||
@tool
|
||||
def send_and_log_email(to: str, subject: str, body: str, log_file: str) -> str:
|
||||
"""Send an email and log it"""
|
||||
# Two different responsibilities
|
||||
```
|
||||
|
||||
### 2. Clear Parameters
|
||||
|
||||
```python
|
||||
# Good: Clear parameters
|
||||
@tool
|
||||
def book_meeting(
|
||||
title: str,
|
||||
start_time: str, # "2024-01-01 10:00"
|
||||
duration_minutes: int,
|
||||
attendees: List[str]
|
||||
) -> str:
|
||||
"""Book a meeting"""
|
||||
|
||||
# Bad: Ambiguous parameters
|
||||
@tool
|
||||
def book_meeting(data: dict) -> str:
|
||||
"""Book a meeting"""
|
||||
```
|
||||
|
||||
### 3. Error Handling
|
||||
|
||||
```python
|
||||
@tool
|
||||
def divide(a: float, b: float) -> float:
|
||||
"""Divide two numbers.
|
||||
|
||||
Args:
|
||||
a: Dividend
|
||||
b: Divisor
|
||||
|
||||
Raises:
|
||||
ValueError: If b is 0
|
||||
"""
|
||||
if b == 0:
|
||||
raise ValueError("Cannot divide by zero")
|
||||
|
||||
return a / b
|
||||
```
|
||||
|
||||
## Dynamic Tool Generation
|
||||
|
||||
Automatically generate tools from API schemas:
|
||||
|
||||
```python
import requests
from langchain_core.tools import tool


def create_api_tool(endpoint: str, method: str, description: str):
    """Generate tools from API specifications"""

    def api_tool(**kwargs) -> dict:
        response = requests.request(
            method=method,
            url=endpoint,
            json=kwargs
        )
        return response.json()

    # An f-string at the top of a function body is not a docstring, so set
    # the description explicitly before wrapping the function as a tool.
    # Note: a **kwargs-only signature gives the LLM no argument schema;
    # generate explicit parameters from the API spec where possible.
    api_tool.__doc__ = (
        f"{description}\n\n"
        f"API Endpoint: {endpoint}\n"
        f"Method: {method}"
    )
    return tool(api_tool)


# Example usage
create_user_tool = create_api_tool(
    endpoint="https://api.example.com/users",
    method="POST",
    description="Create a new user"
)
```
|
||||
|
||||
## Grouping Tools
|
||||
|
||||
Group related tools together:
|
||||
|
||||
```python
|
||||
# Database tool group
|
||||
database_tools = [
|
||||
query_users_tool,
|
||||
update_user_tool,
|
||||
delete_user_tool
|
||||
]
|
||||
|
||||
# Search tool group
|
||||
search_tools = [
|
||||
web_search_tool,
|
||||
image_search_tool,
|
||||
news_search_tool
|
||||
]
|
||||
|
||||
# Select based on context
|
||||
if user.role == "admin":
|
||||
tools = database_tools + search_tools
|
||||
else:
|
||||
tools = search_tools
|
||||
```
|
||||
|
||||
## Summary
|
||||
|
||||
Tool definitions require clear and detailed docstrings, appropriate type annotations, and error handling.
|
||||
|
||||
## Related Pages
|
||||
|
||||
- [04_tool_integration_tool_node.md](04_tool_integration_tool_node.md) - Using tools in tool nodes
|
||||
- [04_tool_integration_command_api.md](04_tool_integration_command_api.md) - Integration with Command API
|
||||
318
skills/langgraph-master/04_tool_integration_tool_node.md
Normal file
@@ -0,0 +1,318 @@
|
||||
# Tool Node
|
||||
|
||||
Implementation of nodes that execute tools.
|
||||
|
||||
## ToolNode (Built-in)
|
||||
|
||||
The simplest approach:
|
||||
|
||||
```python
|
||||
from langgraph.prebuilt import ToolNode
|
||||
|
||||
tools = [search_tool, calculator_tool]
|
||||
tool_node = ToolNode(tools)
|
||||
|
||||
# Add to graph
|
||||
builder.add_node("tools", tool_node)
|
||||
```
|
||||
|
||||
## How It Works
|
||||
|
||||
ToolNode:
|
||||
1. Extracts `tool_calls` from the last message
|
||||
2. Executes each tool
|
||||
3. Returns results as `ToolMessage`
|
||||
|
||||
```python
|
||||
# Input
|
||||
{
|
||||
"messages": [
|
||||
AIMessage(tool_calls=[
|
||||
{"name": "search", "args": {"query": "weather"}, "id": "1"}
|
||||
])
|
||||
]
|
||||
}
|
||||
|
||||
# ToolNode execution
|
||||
|
||||
# Output
|
||||
{
|
||||
"messages": [
|
||||
ToolMessage(
|
||||
content="Sunny, 25°C",
|
||||
tool_call_id="1"
|
||||
)
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
## Custom Tool Node
|
||||
|
||||
For finer control:
|
||||
|
||||
```python
|
||||
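from langchain_core.messages import ToolMessage

# tool_map maps tool names to tool objects, e.g. {t.name: t for t in tools}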
def custom_tool_node(state: MessagesState):
|
||||
"""Custom tool node"""
|
||||
last_message = state["messages"][-1]
|
||||
tool_results = []
|
||||
|
||||
for tool_call in last_message.tool_calls:
|
||||
# Find the tool
|
||||
tool = tool_map.get(tool_call["name"])
|
||||
|
||||
if not tool:
|
||||
result = f"Tool {tool_call['name']} not found"
|
||||
else:
|
||||
try:
|
||||
# Execute the tool
|
||||
result = tool.invoke(tool_call["args"])
|
||||
except Exception as e:
|
||||
result = f"Error: {str(e)}"
|
||||
|
||||
# Create ToolMessage
|
||||
tool_results.append(
|
||||
ToolMessage(
|
||||
content=str(result),
|
||||
tool_call_id=tool_call["id"]
|
||||
)
|
||||
)
|
||||
|
||||
return {"messages": tool_results}
|
||||
```
|
||||
|
||||
## Error Handling
|
||||
|
||||
### Basic Error Handling
|
||||
|
||||
```python
|
||||
def robust_tool_node(state: MessagesState):
|
||||
"""Tool node with error handling"""
|
||||
last_message = state["messages"][-1]
|
||||
tool_results = []
|
||||
|
||||
for tool_call in last_message.tool_calls:
|
||||
try:
|
||||
tool = tool_map[tool_call["name"]]
|
||||
result = tool.invoke(tool_call["args"])
|
||||
|
||||
tool_results.append(
|
||||
ToolMessage(
|
||||
content=str(result),
|
||||
tool_call_id=tool_call["id"]
|
||||
)
|
||||
)
|
||||
|
||||
except KeyError:
|
||||
# Tool not found
|
||||
tool_results.append(
|
||||
ToolMessage(
|
||||
content=f"Error: Tool '{tool_call['name']}' not found",
|
||||
tool_call_id=tool_call["id"]
|
||||
)
|
||||
)
|
||||
|
||||
except Exception as e:
|
||||
# Execution error
|
||||
tool_results.append(
|
||||
ToolMessage(
|
||||
content=f"Error executing tool: {str(e)}",
|
||||
tool_call_id=tool_call["id"]
|
||||
)
|
||||
)
|
||||
|
||||
return {"messages": tool_results}
|
||||
```
|
||||
|
||||
### Retry Logic
|
||||
|
||||
```python
|
||||
import time
|
||||
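# "TransientError" below is a placeholder for whatever retryable exception
# your tools raise (e.g. a rate-limit or timeout error)
class TransientError(Exception):
    pass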
|
||||
def tool_node_with_retry(state: MessagesState, max_retries: int = 3):
|
||||
"""Tool node with retry"""
|
||||
last_message = state["messages"][-1]
|
||||
tool_results = []
|
||||
|
||||
for tool_call in last_message.tool_calls:
|
||||
tool = tool_map[tool_call["name"]]
|
||||
retry_count = 0
|
||||
|
||||
while retry_count < max_retries:
|
||||
try:
|
||||
result = tool.invoke(tool_call["args"])
|
||||
|
||||
tool_results.append(
|
||||
ToolMessage(
|
||||
content=str(result),
|
||||
tool_call_id=tool_call["id"]
|
||||
)
|
||||
)
|
||||
break
|
||||
|
||||
except TransientError as e:
|
||||
retry_count += 1
|
||||
if retry_count >= max_retries:
|
||||
tool_results.append(
|
||||
ToolMessage(
|
||||
content=f"Failed after {max_retries} retries: {str(e)}",
|
||||
tool_call_id=tool_call["id"]
|
||||
)
|
||||
)
|
||||
else:
|
||||
time.sleep(2 ** retry_count) # Exponential backoff
|
||||
|
||||
except Exception as e:
|
||||
# Non-retryable error
|
||||
tool_results.append(
|
||||
ToolMessage(
|
||||
content=f"Error: {str(e)}",
|
||||
tool_call_id=tool_call["id"]
|
||||
)
|
||||
)
|
||||
break
|
||||
|
||||
return {"messages": tool_results}
|
||||
```
|
||||
|
||||
## Conditional Tool Execution
|
||||
|
||||
```python
|
||||
def conditional_tool_node(state: MessagesState, *, store):
|
||||
"""Tool node with permission checking"""
|
||||
user_id = state.get("user_id")
|
||||
user = store.get(("users", user_id), "profile")
|
||||
|
||||
last_message = state["messages"][-1]
|
||||
tool_results = []
|
||||
|
||||
for tool_call in last_message.tool_calls:
|
||||
tool = tool_map[tool_call["name"]]
|
||||
|
||||
# Permission check
|
||||
if not has_permission(user, tool.name):
|
||||
tool_results.append(
|
||||
ToolMessage(
|
||||
content=f"Permission denied for tool '{tool.name}'",
|
||||
tool_call_id=tool_call["id"]
|
||||
)
|
||||
)
|
||||
continue
|
||||
|
||||
# Execute
|
||||
result = tool.invoke(tool_call["args"])
|
||||
tool_results.append(
|
||||
ToolMessage(
|
||||
content=str(result),
|
||||
tool_call_id=tool_call["id"]
|
||||
)
|
||||
)
|
||||
|
||||
return {"messages": tool_results}
|
||||
```
|
||||
|
||||
## Logging Tool Execution
|
||||
|
||||
```python
|
||||
import logging
import time
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
def logged_tool_node(state: MessagesState):
|
||||
"""Tool node with logging"""
|
||||
last_message = state["messages"][-1]
|
||||
tool_results = []
|
||||
|
||||
for tool_call in last_message.tool_calls:
|
||||
tool = tool_map[tool_call["name"]]
|
||||
|
||||
logger.info(
|
||||
f"Executing tool: {tool.name}",
|
||||
extra={
|
||||
"tool": tool.name,
|
||||
"args": tool_call["args"],
|
||||
"call_id": tool_call["id"]
|
||||
}
|
||||
)
|
||||
|
||||
try:
|
||||
start = time.time()
|
||||
result = tool.invoke(tool_call["args"])
|
||||
duration = time.time() - start
|
||||
|
||||
logger.info(
|
||||
f"Tool completed: {tool.name}",
|
||||
extra={
|
||||
"tool": tool.name,
|
||||
"duration": duration,
|
||||
"success": True
|
||||
}
|
||||
)
|
||||
|
||||
tool_results.append(
|
||||
ToolMessage(
|
||||
content=str(result),
|
||||
tool_call_id=tool_call["id"]
|
||||
)
|
||||
)
|
||||
|
||||
except Exception as e:
|
||||
logger.error(
|
||||
f"Tool failed: {tool.name}",
|
||||
extra={
|
||||
"tool": tool.name,
|
||||
"error": str(e)
|
||||
},
|
||||
exc_info=True
|
||||
)
|
||||
|
||||
tool_results.append(
|
||||
ToolMessage(
|
||||
content=f"Error: {str(e)}",
|
||||
tool_call_id=tool_call["id"]
|
||||
)
|
||||
)
|
||||
|
||||
return {"messages": tool_results}
|
||||
```
|
||||
|
||||
## Parallel Tool Execution
|
||||
|
||||
```python
|
||||
from concurrent.futures import ThreadPoolExecutor
|
||||
|
||||
def parallel_tool_node(state: MessagesState):
|
||||
"""Execute tools in parallel"""
|
||||
last_message = state["messages"][-1]
|
||||
|
||||
def execute_tool(tool_call):
|
||||
tool = tool_map[tool_call["name"]]
|
||||
try:
|
||||
result = tool.invoke(tool_call["args"])
|
||||
return ToolMessage(
|
||||
content=str(result),
|
||||
tool_call_id=tool_call["id"]
|
||||
)
|
||||
except Exception as e:
|
||||
return ToolMessage(
|
||||
content=f"Error: {str(e)}",
|
||||
tool_call_id=tool_call["id"]
|
||||
)
|
||||
|
||||
with ThreadPoolExecutor(max_workers=5) as executor:
|
||||
tool_results = list(executor.map(
|
||||
execute_tool,
|
||||
last_message.tool_calls
|
||||
))
|
||||
|
||||
return {"messages": tool_results}
|
||||
```
|
||||
|
||||
## Summary
|
||||
|
||||
ToolNode executes tools and returns results as ToolMessage. You can add error handling, permission checks, logging, and more.
|
||||
|
||||
## Related Pages
|
||||
|
||||
- [04_tool_integration_tool_definition.md](04_tool_integration_tool_definition.md) - Tool definition
|
||||
- [04_tool_integration_command_api.md](04_tool_integration_command_api.md) - Integration with Command API
|
||||
- [05_advanced_features_human_in_the_loop.md](05_advanced_features_human_in_the_loop.md) - Combining with approval flows
|
||||
@@ -0,0 +1,289 @@
|
||||
# Human-in-the-Loop (Approval Flow)
|
||||
|
||||
A feature to pause graph execution and request human intervention.
|
||||
|
||||
## Overview
|
||||
|
||||
Human-in-the-Loop is a feature that requests **human approval or input** before important decisions or actions.
|
||||
|
||||
## Dynamic Interrupt (Recommended)
|
||||
|
||||
### Basic Usage
|
||||
|
||||
```python
|
||||
from langgraph.types import interrupt
|
||||
|
||||
def approval_node(state: State):
|
||||
"""Request approval"""
|
||||
approved = interrupt("Do you approve this action?")
|
||||
|
||||
if approved:
|
||||
return {"status": "approved"}
|
||||
else:
|
||||
return {"status": "rejected"}
|
||||
```
|
||||
|
||||
### Execution
|
||||
|
||||
```python
from langgraph.types import Command

# Initial execution (stops at interrupt)
result = graph.invoke(input, config)

# Check interrupt information
print(result["__interrupt__"])  # [Interrupt(value='Do you approve this action?', ...)]

# Approve and resume
graph.invoke(Command(resume=True), config)

# Or reject
graph.invoke(Command(resume=False), config)
```
|
||||
|
||||
## Application Patterns
|
||||
|
||||
### Pattern 1: Approve or Reject
|
||||
|
||||
```python
|
||||
def action_approval(state: State):
|
||||
"""Approval before action execution"""
|
||||
action_details = prepare_action(state)
|
||||
|
||||
approved = interrupt({
|
||||
"question": "Approve this action?",
|
||||
"details": action_details
|
||||
})
|
||||
|
||||
if approved:
|
||||
result = execute_action(action_details)
|
||||
return {"result": result, "approved": True}
|
||||
else:
|
||||
return {"result": None, "approved": False}
|
||||
```
|
||||
|
||||
### Pattern 2: Editable Approval
|
||||
|
||||
```python
|
||||
def review_and_edit(state: State):
|
||||
"""Review and edit generated content"""
|
||||
generated = generate_content(state)
|
||||
|
||||
edited_content = interrupt({
|
||||
"instruction": "Review and edit this content",
|
||||
"content": generated
|
||||
})
|
||||
|
||||
return {"final_content": edited_content}
|
||||
|
||||
# Resume with edited version
|
||||
graph.invoke(Command(resume=edited_version), config)  # Command from langgraph.types
|
||||
```
|
||||
|
||||
### Pattern 3: Tool Execution Approval
|
||||
|
||||
```python
|
||||
@tool
|
||||
def send_email(to: str, subject: str, body: str):
|
||||
"""Send email (with approval)"""
|
||||
response = interrupt({
|
||||
"action": "send_email",
|
||||
"to": to,
|
||||
"subject": subject,
|
||||
"body": body,
|
||||
"message": "Approve sending this email?"
|
||||
})
|
||||
|
||||
if response.get("action") == "approve":
|
||||
# When approved, parameters can also be edited
|
||||
final_to = response.get("to", to)
|
||||
final_subject = response.get("subject", subject)
|
||||
final_body = response.get("body", body)
|
||||
|
||||
return actually_send_email(final_to, final_subject, final_body)
|
||||
else:
|
||||
return "Email cancelled by user"
|
||||
```
|
||||
|
||||
### Pattern 4: Input Validation Loop
|
||||
|
||||
```python
|
||||
def get_valid_input(state: State):
|
||||
"""Loop until valid input is obtained"""
|
||||
prompt = "Enter a positive number:"
|
||||
|
||||
while True:
|
||||
answer = interrupt(prompt)
|
||||
|
||||
if isinstance(answer, (int, float)) and answer > 0:
|
||||
break
|
||||
|
||||
prompt = f"'{answer}' is invalid. Enter a positive number:"
|
||||
|
||||
return {"value": answer}
|
||||
```
|
||||
|
||||
## Static Interrupt (For Debugging)
|
||||
|
||||
Set breakpoints at compile time:
|
||||
|
||||
```python
|
||||
graph = builder.compile(
|
||||
checkpointer=checkpointer,
|
||||
interrupt_before=["risky_node"], # Stop before node execution
|
||||
interrupt_after=["generate_content"] # Stop after node execution
|
||||
)
|
||||
|
||||
# Execute (stops before specified node)
|
||||
graph.invoke(input, config)
|
||||
|
||||
# Check state
|
||||
state = graph.get_state(config)
|
||||
|
||||
# Resume
|
||||
graph.invoke(None, config)
|
||||
```
|
||||
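While paused at a breakpoint, the saved state can also be inspected and patched before resuming. A minimal sketch (the corrected field is illustrative; the graph must be compiled with a checkpointer):

```python
# Inspect the paused execution
snapshot = graph.get_state(config)
print(snapshot.next)    # Nodes scheduled to run next, e.g. ('risky_node',)
print(snapshot.values)  # Current state values

# Optionally patch the state before continuing
graph.update_state(config, {"data": "corrected value"})

# Resume from the breakpoint
graph.invoke(None, config)
```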
|
||||
## Practical Example: Multi-Stage Approval Workflow
|
||||
|
||||
```python
|
||||
from typing import TypedDict

from langgraph.graph import StateGraph, START, END
from langgraph.types import interrupt, Command
|
||||
|
||||
class ApprovalState(TypedDict):
|
||||
request: str
|
||||
draft: str
|
||||
reviewed: str
|
||||
approved: bool
|
||||
|
||||
def draft_node(state: ApprovalState):
|
||||
"""Create draft"""
|
||||
draft = create_draft(state["request"])
|
||||
return {"draft": draft}
|
||||
|
||||
def review_node(state: ApprovalState):
|
||||
"""Review and edit"""
|
||||
reviewed = interrupt({
|
||||
"type": "review",
|
||||
"content": state["draft"],
|
||||
"instruction": "Review and improve the draft"
|
||||
})
|
||||
|
||||
return {"reviewed": reviewed}
|
||||
|
||||
def approval_node(state: ApprovalState):
|
||||
"""Final approval"""
|
||||
approved = interrupt({
|
||||
"type": "approval",
|
||||
"content": state["reviewed"],
|
||||
"question": "Approve for publication?"
|
||||
})
|
||||
|
||||
if approved:
|
||||
return Command(
|
||||
update={"approved": True},
|
||||
goto="publish"
|
||||
)
|
||||
else:
|
||||
return Command(
|
||||
update={"approved": False},
|
||||
goto="draft" # Return to draft
|
||||
)
|
||||
|
||||
def publish_node(state: ApprovalState):
|
||||
"""Publish"""
|
||||
publish(state["reviewed"])
|
||||
return {"status": "published"}
|
||||
|
||||
# Build graph
builder = StateGraph(ApprovalState)
|
||||
builder.add_node("draft", draft_node)
|
||||
builder.add_node("review", review_node)
|
||||
builder.add_node("approval", approval_node)
|
||||
builder.add_node("publish", publish_node)
|
||||
|
||||
builder.add_edge(START, "draft")
|
||||
builder.add_edge("draft", "review")
|
||||
builder.add_edge("review", "approval")
|
||||
# approval node determines control flow with Command
|
||||
builder.add_edge("publish", END)
|
||||
```
|
||||
|
||||
## Important Rules
|
||||
|
||||
### ✅ Recommendations
|
||||
|
||||
- Pass values in JSON format
|
||||
- Keep `interrupt()` call order consistent
|
||||
- Make processing before `interrupt()` idempotent (see the sketch after these lists)
|
||||
|
||||
### ❌ Prohibitions
|
||||
|
||||
- Don't catch `interrupt()` with `try-except`
|
||||
- Don't skip `interrupt()` conditionally
|
||||
- Don't pass non-serializable objects
|
||||
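A sketch of why work before `interrupt()` should be idempotent: when the graph resumes, the interrupted node re-executes from its first line, so any side effect placed before the interrupt runs again (`build_summary` and `send_notification` are assumed helpers):

```python
from langgraph.types import interrupt


def notify_and_confirm(state: State):
    # Safe before the interrupt: pure computation, harmless to re-run on resume
    summary = build_summary(state["data"])

    # NOT safe before the interrupt: emails, payments, non-idempotent writes,
    # because they would fire a second time when the node is resumed

    approved = interrupt({"summary": summary, "question": "Proceed?"})

    if approved:
        send_notification(state["user_id"])  # side effect after the interrupt
        return {"status": "sent"}
    return {"status": "cancelled"}
```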
|
||||
## Use Cases
|
||||
|
||||
### 1. High-Risk Operation Approval
|
||||
|
||||
```python
|
||||
def delete_data(state: State):
|
||||
"""Delete data (approval required)"""
|
||||
approved = interrupt({
|
||||
"action": "delete_data",
|
||||
"warning": "This cannot be undone!",
|
||||
"data_count": len(state["data_to_delete"])
|
||||
})
|
||||
|
||||
if approved:
|
||||
execute_delete(state["data_to_delete"])
|
||||
return {"deleted": True}
|
||||
return {"deleted": False}
|
||||
```
|
||||
|
||||
### 2. Creative Work Review
|
||||
|
||||
```python
|
||||
def creative_generation(state: State):
|
||||
"""Creative content generation and review"""
|
||||
versions = []
|
||||
|
||||
for _ in range(3):
|
||||
version = generate_creative(state["prompt"])
|
||||
versions.append(version)
|
||||
|
||||
selected = interrupt({
|
||||
"type": "select_version",
|
||||
"versions": versions,
|
||||
"instruction": "Select the best version or request regeneration"
|
||||
})
|
||||
|
||||
return {"final_version": selected}
|
||||
```
|
||||
|
||||
### 3. Incremental Data Input
|
||||
|
||||
```python
|
||||
def collect_user_info(state: State):
|
||||
"""Collect user information incrementally"""
|
||||
name = interrupt("What is your name?")
|
||||
|
||||
age = interrupt(f"Hello {name}, what is your age?")
|
||||
|
||||
city = interrupt("What city do you live in?")
|
||||
|
||||
return {
|
||||
"user_info": {
|
||||
"name": name,
|
||||
"age": age,
|
||||
"city": city
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Summary
|
||||
|
||||
Human-in-the-Loop is a feature for incorporating human judgment in important decisions. Dynamic interrupt is flexible and recommended.
|
||||
|
||||
## Related Pages
|
||||
|
||||
- [03_memory_management_persistence.md](03_memory_management_persistence.md) - Checkpointer is required
|
||||
- [02_graph_architecture_agent.md](02_graph_architecture_agent.md) - Combination with agents
|
||||
- [04_tool_integration_tool_node.md](04_tool_integration_tool_node.md) - Approval before tool execution
|
||||
283
skills/langgraph-master/05_advanced_features_map_reduce.md
Normal file
@@ -0,0 +1,283 @@
|
||||
# Map-Reduce (Parallel Processing Pattern)
|
||||
|
||||
A pattern for parallel processing and aggregation of large datasets.
|
||||
|
||||
## Overview
|
||||
|
||||
Map-Reduce is a pattern that combines **Map** (parallel processing) and **Reduce** (aggregation). In LangGraph, it's implemented using the Send API.
|
||||
|
||||
## Basic Implementation
|
||||
|
||||
```python
|
||||
from operator import add
from typing import Annotated, TypedDict

from langgraph.graph import END, START, StateGraph
from langgraph.types import Send
|
||||
|
||||
class MapReduceState(TypedDict):
|
||||
items: list[str]
|
||||
results: Annotated[list[str], add]
|
||||
final_result: str
|
||||
|
||||
def map_node(state: MapReduceState):
|
||||
"""Map: Send each item to worker"""
|
||||
return [
|
||||
Send("worker", {"item": item})
|
||||
for item in state["items"]
|
||||
]
|
||||
|
||||
def worker_node(item_state: dict):
|
||||
"""Process individual item"""
|
||||
result = process_item(item_state["item"])
|
||||
return {"results": [result]}
|
||||
|
||||
def reduce_node(state: MapReduceState):
|
||||
"""Reduce: Aggregate results"""
|
||||
final = aggregate_results(state["results"])
|
||||
return {"final_result": final}
|
||||
|
||||
# Build graph
|
||||
builder = StateGraph(MapReduceState)
|
||||
builder.add_node("map", map_node)
|
||||
builder.add_node("worker", worker_node)
|
||||
builder.add_node("reduce", reduce_node)
|
||||
|
||||
builder.add_edge(START, "map")
|
||||
builder.add_edge("worker", "reduce")
|
||||
builder.add_edge("reduce", END)
|
||||
|
||||
graph = builder.compile()
|
||||
```
|
||||
|
||||
## Types of Reducers
|
||||
|
||||
### Addition (List Concatenation)
|
||||
|
||||
```python
|
||||
from operator import add
|
||||
|
||||
class State(TypedDict):
|
||||
results: Annotated[list, add] # Concatenate lists
|
||||
|
||||
# [1, 2] + [3, 4] = [1, 2, 3, 4]
|
||||
```
|
||||
|
||||
### Custom Reducer
|
||||
|
||||
```python
|
||||
def merge_dicts(left: dict, right: dict) -> dict:
|
||||
"""Merge dictionaries"""
|
||||
return {**left, **right}
|
||||
|
||||
class State(TypedDict):
|
||||
data: Annotated[dict, merge_dicts]
|
||||
```
|
||||
|
||||
## Application Patterns
|
||||
|
||||
### Pattern 1: Parallel Document Summarization
|
||||
|
||||
```python
|
||||
class DocSummaryState(TypedDict):
|
||||
documents: list[str]
|
||||
summaries: Annotated[list[str], add]
|
||||
final_summary: str
|
||||
|
||||
def map_documents(state: DocSummaryState):
|
||||
"""Send each document to worker"""
|
||||
return [
|
||||
Send("summarize_worker", {"doc": doc, "index": i})
|
||||
for i, doc in enumerate(state["documents"])
|
||||
]
|
||||
|
||||
def summarize_worker(worker_state: dict):
|
||||
"""Summarize individual document"""
|
||||
summary = llm.invoke(f"Summarize: {worker_state['doc']}")
|
||||
return {"summaries": [summary]}
|
||||
|
||||
def final_summary_node(state: DocSummaryState):
|
||||
"""Integrate all summaries"""
|
||||
combined = "\n".join(state["summaries"])
|
||||
final = llm.invoke(f"Create final summary from:\n{combined}")
|
||||
return {"final_summary": final}
|
||||
```
|
||||
|
||||
### Pattern 2: Hierarchical Map-Reduce
|
||||
|
||||
```python
|
||||
def level1_map(state: State):
|
||||
"""Level 1: Split data into chunks"""
|
||||
chunks = create_chunks(state["data"], chunk_size=100)
|
||||
return [
|
||||
Send("level1_worker", {"chunk": chunk})
|
||||
for chunk in chunks
|
||||
]
|
||||
|
||||
def level1_worker(worker_state: dict):
|
||||
"""Level 1 worker: Aggregate within chunk"""
|
||||
partial_result = aggregate_chunk(worker_state["chunk"])
|
||||
return {"level1_results": [partial_result]}
|
||||
|
||||
def level2_map(state: State):
|
||||
"""Level 2: Further aggregate partial results"""
|
||||
return [
|
||||
Send("level2_worker", {"partial": result})
|
||||
for result in state["level1_results"]
|
||||
]
|
||||
|
||||
def level2_worker(worker_state: dict):
|
||||
"""Level 2 worker: Final aggregation"""
|
||||
final = final_aggregate(worker_state["partial"])
|
||||
return {"final_result": final}
|
||||
```
|
||||
|
||||
### Pattern 3: Dynamic Parallelism Control
|
||||
|
||||
```python
|
||||
import os
|
||||
|
||||
def adaptive_map(state: State):
|
||||
"""Adjust parallelism based on system resources"""
|
||||
max_workers = int(os.getenv("MAX_WORKERS", "10"))
|
||||
items = state["items"]
|
||||
|
||||
# Split items into batches
|
||||
batch_size = max(1, len(items) // max_workers)
|
||||
batches = [
|
||||
items[i:i+batch_size]
|
||||
for i in range(0, len(items), batch_size)
|
||||
]
|
||||
|
||||
return [
|
||||
Send("batch_worker", {"batch": batch})
|
||||
for batch in batches
|
||||
]
|
||||
|
||||
def batch_worker(worker_state: dict):
|
||||
"""Process batch"""
|
||||
results = [process_item(item) for item in worker_state["batch"]]
|
||||
return {"results": results}
|
||||
```
|
||||
|
||||
### Pattern 4: Error-Resilient Map-Reduce
|
||||
|
||||
```python
|
||||
class RobustState(TypedDict):
|
||||
items: list[str]
|
||||
successes: Annotated[list, add]
|
||||
failures: Annotated[list, add]
|
||||
|
||||
def robust_worker(worker_state: dict):
|
||||
"""Worker with error handling"""
|
||||
try:
|
||||
result = process_item(worker_state["item"])
|
||||
return {"successes": [{"item": worker_state["item"], "result": result}]}
|
||||
|
||||
except Exception as e:
|
||||
return {"failures": [{"item": worker_state["item"], "error": str(e)}]}
|
||||
|
||||
def error_handler(state: RobustState):
|
||||
"""Process failed items"""
|
||||
if state["failures"]:
|
||||
# Retry or log failed items
|
||||
log_failures(state["failures"])
|
||||
|
||||
return {"final_result": aggregate(state["successes"])}
|
||||
```
|
||||
|
||||
## Performance Optimization
|
||||
|
||||
### Batch Size Adjustment
|
||||
|
||||
```python
|
||||
def optimal_batching(items: list, target_batch_time: float = 1.0):
|
||||
"""Calculate optimal batch size"""
|
||||
# Estimate processing time per item
|
||||
sample_time = estimate_processing_time(items[0])
|
||||
|
||||
# Batch size to reach target time
|
||||
batch_size = max(1, int(target_batch_time / sample_time))
|
||||
|
||||
batches = [
|
||||
items[i:i+batch_size]
|
||||
for i in range(0, len(items), batch_size)
|
||||
]
|
||||
|
||||
return batches
|
||||
```
|
||||
|
||||
### Progress Tracking
|
||||
|
||||
```python
|
||||
from langgraph.config import get_stream_writer
|
||||
|
||||
def map_with_progress(state: State):
|
||||
"""Map that reports progress"""
|
||||
writer = get_stream_writer()
|
||||
total = len(state["items"])
|
||||
|
||||
sends = []
|
||||
for i, item in enumerate(state["items"]):
|
||||
sends.append(Send("worker", {"item": item}))
|
||||
writer({"progress": f"{i+1}/{total}"})
|
||||
|
||||
return sends
|
||||
```
|
||||
|
||||
## Aggregation Patterns
|
||||
|
||||
### Statistical Aggregation
|
||||
|
||||
```python
|
||||
def statistical_reduce(state: State):
|
||||
"""Calculate statistics"""
|
||||
results = state["results"]
|
||||
|
||||
return {
|
||||
"total": sum(results),
|
||||
"average": sum(results) / len(results),
|
||||
"min": min(results),
|
||||
"max": max(results),
|
||||
"count": len(results)
|
||||
}
|
||||
```
|
||||
|
||||
### LLM-Based Integration
|
||||
|
||||
```python
|
||||
def llm_reduce(state: State):
|
||||
"""Integrate multiple results with LLM"""
|
||||
all_results = "\n\n".join([
|
||||
f"Result {i+1}:\n{r}"
|
||||
for i, r in enumerate(state["results"])
|
||||
])
|
||||
|
||||
final = llm.invoke(
|
||||
f"Synthesize these results into a comprehensive answer:\n\n{all_results}"
|
||||
)
|
||||
|
||||
return {"final_result": final}
|
||||
```
|
||||
|
||||
## Advantages
|
||||
|
||||
✅ **Scalability**: Efficiently process large datasets
|
||||
✅ **Parallelism**: Execute independent tasks concurrently
|
||||
✅ **Flexibility**: Dynamically adjust number of workers
|
||||
✅ **Error Isolation**: One failure doesn't affect the whole
|
||||
|
||||
## Considerations
|
||||
|
||||
⚠️ **Memory Consumption**: Many worker instances
|
||||
⚠️ **Order Non-deterministic**: Worker execution order is not guaranteed
|
||||
⚠️ **Overhead**: Inefficient for small tasks
|
||||
⚠️ **Reducer Design**: Design appropriate aggregation method
|
||||
|
||||
## Summary
|
||||
|
||||
Map-Reduce uses the Send API to process large datasets in parallel and aggregates the results with reducers. It is best suited to large-scale data processing.
|
||||
|
||||
## Related Pages
|
||||
|
||||
- [02_graph_architecture_orchestrator_worker.md](02_graph_architecture_orchestrator_worker.md) - Orchestrator-Worker pattern
|
||||
- [02_graph_architecture_parallelization.md](02_graph_architecture_parallelization.md) - Comparison with static parallelization
|
||||
- [01_core_concepts_state.md](01_core_concepts_state.md) - Details on Reducers
|
||||
73
skills/langgraph-master/05_advanced_features_overview.md
Normal file
@@ -0,0 +1,73 @@
|
||||
# 05. Advanced Features
|
||||
|
||||
Advanced features and implementation patterns.
|
||||
|
||||
## Overview
|
||||
|
||||
By leveraging LangGraph's advanced features, you can build more sophisticated agent systems.
|
||||
|
||||
## Key Features
|
||||
|
||||
### 1. [Human-in-the-Loop (Approval Flow)](05_advanced_features_human_in_the_loop.md)
|
||||
|
||||
Pause graph execution and request human intervention:
|
||||
- Dynamic interrupt
|
||||
- Static interrupt
|
||||
- Approval, editing, and rejection flows
|
||||
|
||||
### 2. [Streaming](05_advanced_features_streaming.md)
|
||||
|
||||
Monitor progress in real-time:
|
||||
- LLM token streaming
|
||||
- State update streaming
|
||||
- Custom event streaming
|
||||
|
||||
### 3. [Map-Reduce (Parallel Processing Pattern)](05_advanced_features_map_reduce.md)
|
||||
|
||||
Parallel processing of large datasets:
|
||||
- Dynamic worker generation with Send API
|
||||
- Result aggregation with Reducers
|
||||
- Hierarchical parallel processing
|
||||
|
||||
## Feature Comparison
|
||||
|
||||
| Feature | Use Case | Implementation Complexity |
|
||||
|---------|----------|--------------------------|
|
||||
| Human-in-the-Loop | Approval flows, quality control | Medium |
|
||||
| Streaming | Real-time monitoring, UX improvement | Low |
|
||||
| Map-Reduce | Large-scale data processing | High |
|
||||
|
||||
## Combination Patterns
|
||||
|
||||
### Human-in-the-Loop + Streaming
|
||||
|
||||
```python
|
||||
# Stream while requesting approval
|
||||
for chunk in graph.stream(input, config, stream_mode="values"):
|
||||
print(chunk)
|
||||
|
||||
# Pause at interrupt
|
||||
if chunk.get("__interrupt__"):
|
||||
approval = input("Approve? (y/n): ")
|
||||
graph.invoke(None, config, resume=approval == "y")
|
||||
```
|
||||
|
||||
### Map-Reduce + Streaming
|
||||
|
||||
```python
|
||||
# Stream progress of parallel processing
|
||||
for chunk in graph.stream(
|
||||
{"items": large_dataset},
|
||||
stream_mode="updates",
|
||||
subgraphs=True # Also show worker progress
|
||||
):
|
||||
print(f"Progress: {chunk}")
|
||||
```
|
||||
|
||||
## Next Steps
|
||||
|
||||
For details on each feature, refer to the following pages:
|
||||
|
||||
- [05_advanced_features_human_in_the_loop.md](05_advanced_features_human_in_the_loop.md) - Implementation of approval flows
|
||||
- [05_advanced_features_streaming.md](05_advanced_features_streaming.md) - How to use streaming
|
||||
- [05_advanced_features_map_reduce.md](05_advanced_features_map_reduce.md) - Map-Reduce pattern
|
||||
220
skills/langgraph-master/05_advanced_features_streaming.md
Normal file
@@ -0,0 +1,220 @@
|
||||
# Streaming
|
||||
|
||||
A feature to monitor graph execution progress in real-time.
|
||||
|
||||
## Overview
|
||||
|
||||
Streaming is a feature that receives **real-time updates** during graph execution. You can stream LLM tokens, state changes, custom events, and more.
|
||||
|
||||
## Types of stream_mode
|
||||
|
||||
### 1. values (Complete State Snapshot)
|
||||
|
||||
Complete state after each step:
|
||||
|
||||
```python
|
||||
for chunk in graph.stream(input, stream_mode="values"):
|
||||
print(chunk)
|
||||
|
||||
# Example output
|
||||
# {"messages": [{"role": "user", "content": "Hello"}]}
|
||||
# {"messages": [{"role": "user", "content": "Hello"}, {"role": "assistant", "content": "Hi!"}]}
|
||||
```
|
||||
|
||||
### 2. updates (Only State Changes)
|
||||
|
||||
Only changes at each step:
|
||||
|
||||
```python
|
||||
for chunk in graph.stream(input, stream_mode="updates"):
|
||||
print(chunk)
|
||||
|
||||
# Example output
|
||||
# {"messages": [{"role": "assistant", "content": "Hi!"}]}
|
||||
```
|
||||
|
||||
### 3. messages (LLM Tokens)
|
||||
|
||||
Stream at token level from LLM:
|
||||
|
||||
```python
|
||||
for msg, metadata in graph.stream(input, stream_mode="messages"):
|
||||
if msg.content:
|
||||
print(msg.content, end="", flush=True)
|
||||
|
||||
# Output: "H" "i" "!" " " "H" "o" "w" ... (token by token)
|
||||
```
|
||||
|
||||
### 4. debug (Debug Information)
|
||||
|
||||
Detailed graph execution information:
|
||||
|
||||
```python
|
||||
for chunk in graph.stream(input, stream_mode="debug"):
|
||||
print(chunk)
|
||||
|
||||
# Details like node execution, edge transitions, etc.
|
||||
```
|
||||
|
||||
### 5. custom (Custom Data)
|
||||
|
||||
Send custom data from nodes:
|
||||
|
||||
```python
|
||||
from langgraph.config import get_stream_writer
|
||||
|
||||
def my_node(state: State):
|
||||
writer = get_stream_writer()
|
||||
|
||||
for i in range(10):
|
||||
writer({"progress": i * 10}) # Custom data
|
||||
|
||||
return {"result": "done"}
|
||||
|
||||
for mode, chunk in graph.stream(input, stream_mode=["updates", "custom"]):
|
||||
if mode == "custom":
|
||||
print(f"Progress: {chunk['progress']}%")
|
||||
```
|
||||
|
||||
## LLM Token Streaming
|
||||
|
||||
### Stream Only Specific Nodes
|
||||
|
||||
```python
|
||||
for msg, metadata in graph.stream(input, stream_mode="messages"):
|
||||
# Display tokens only from specific node
|
||||
if metadata["langgraph_node"] == "chatbot":
|
||||
if msg.content:
|
||||
print(msg.content, end="", flush=True)
|
||||
|
||||
print() # Newline
|
||||
```
|
||||
|
||||
### Filter by Tags
|
||||
|
||||
```python
|
||||
# Set tags on LLM
|
||||
llm = init_chat_model("gpt-5", tags=["main_llm"])
|
||||
|
||||
for msg, metadata in graph.stream(input, stream_mode="messages"):
|
||||
if "main_llm" in metadata.get("tags", []):
|
||||
if msg.content:
|
||||
print(msg.content, end="", flush=True)
|
||||
```
|
||||
|
||||
## Using Multiple Modes Simultaneously
|
||||
|
||||
```python
|
||||
for mode, chunk in graph.stream(input, stream_mode=["values", "messages"]):
|
||||
if mode == "values":
|
||||
print(f"\nState: {chunk}")
|
||||
elif mode == "messages":
|
||||
if chunk[0].content:
|
||||
print(chunk[0].content, end="", flush=True)
|
||||
```
|
||||
|
||||
## Subgraph Streaming
|
||||
|
||||
```python
|
||||
# Include subgraph outputs
|
||||
for chunk in graph.stream(
|
||||
input,
|
||||
stream_mode="updates",
|
||||
subgraphs=True # Include subgraphs
|
||||
):
|
||||
print(chunk)
|
||||
```
|
||||
|
||||
## Practical Example: Progress Bar
|
||||
|
||||
```python
|
||||
from tqdm import tqdm
|
||||
|
||||
def process_with_progress(items: list):
|
||||
"""Processing with progress bar"""
|
||||
total = len(items)
|
||||
|
||||
with tqdm(total=total) as pbar:
|
||||
for chunk in graph.stream(
|
||||
{"items": items},
|
||||
stream_mode="custom"
|
||||
):
|
||||
if "progress" in chunk:
|
||||
pbar.update(1)
|
||||
|
||||
return "Complete!"
|
||||
```
|
||||
|
||||
## Practical Example: Real-time UI Updates
|
||||
|
||||
```python
|
||||
import streamlit as st
|
||||
|
||||
def run_with_ui_updates(user_input: str):
|
||||
"""Update Streamlit UI in real-time"""
|
||||
status = st.empty()
|
||||
output = st.empty()
|
||||
|
||||
full_response = ""
|
||||
|
||||
for msg, metadata in graph.stream(
|
||||
{"messages": [{"role": "user", "content": user_input}]},
|
||||
stream_mode="messages"
|
||||
):
|
||||
if msg.content:
|
||||
full_response += msg.content
|
||||
output.markdown(full_response + "▌")
|
||||
|
||||
status.text(f"Node: {metadata['langgraph_node']}")
|
||||
|
||||
output.markdown(full_response)
|
||||
status.text("Complete!")
|
||||
```
|
||||
|
||||
## Async Streaming
|
||||
|
||||
```python
|
||||
import asyncio


async def async_stream_example():
|
||||
"""Async streaming"""
|
||||
async for chunk in graph.astream(input, stream_mode="updates"):
|
||||
print(chunk)
|
||||
await asyncio.sleep(0) # Yield to other tasks
|
||||
```
|
||||
|
||||
## Sending Custom Events
|
||||
|
||||
```python
|
||||
from langgraph.config import get_stream_writer
|
||||
|
||||
def multi_step_node(state: State):
|
||||
"""Report progress of multiple steps"""
|
||||
writer = get_stream_writer()
|
||||
|
||||
# Step 1
|
||||
writer({"status": "Analyzing..."})
|
||||
analysis = analyze_data(state["data"])
|
||||
|
||||
# Step 2
|
||||
writer({"status": "Processing..."})
|
||||
result = process_analysis(analysis)
|
||||
|
||||
# Step 3
|
||||
writer({"status": "Finalizing..."})
|
||||
final = finalize(result)
|
||||
|
||||
return {"result": final}
|
||||
|
||||
# Receive
|
||||
for mode, chunk in graph.stream(input, stream_mode=["updates", "custom"]):
|
||||
if mode == "custom":
|
||||
print(chunk["status"])
|
||||
```
|
||||
|
||||
## Summary
|
||||
|
||||
Streaming monitors progress in real-time and improves user experience. Choose the appropriate stream_mode based on your use case.
|
||||
|
||||
## Related Pages
|
||||
|
||||
- [02_graph_architecture_agent.md](02_graph_architecture_agent.md) - Agent streaming
|
||||
- [05_advanced_features_human_in_the_loop.md](05_advanced_features_human_in_the_loop.md) - Combining streaming and approval
|
||||
299
skills/langgraph-master/06_llm_model_ids.md
Normal file
@@ -0,0 +1,299 @@
|
||||
# LLM Model ID Reference
|
||||
|
||||
List of model IDs for major LLM providers commonly used in LangGraph. For detailed information and best practices for each provider, please refer to the individual pages.
|
||||
|
||||
> **Last Updated**: 2025-11-24
|
||||
> **Note**: Model availability and names may change. Please refer to each provider's official documentation for the latest information.
|
||||
|
||||
## 📚 Provider-Specific Documentation
|
||||
|
||||
### [Google Gemini Models](06_llm_model_ids_gemini.md)
|
||||
|
||||
Google's latest LLM models featuring large-scale context (up to 1M tokens).
|
||||
|
||||
**Key Models**:
|
||||
|
||||
- `google/gemini-3-pro-preview` - Latest high-performance model
|
||||
- `gemini-2.5-flash` - Fast response version (1M tokens)
|
||||
- `gemini-2.5-flash-lite` - Lightweight fast version
|
||||
|
||||
**Details**: [Gemini Model ID Complete Guide](06_llm_model_ids_gemini.md)
|
||||
|
||||
---
|
||||
|
||||
### [Anthropic Claude Models](06_llm_model_ids_claude.md)
|
||||
|
||||
Anthropic's Claude 4.x series featuring balanced performance and cost.
|
||||
|
||||
**Key Models**:
|
||||
|
||||
- `claude-opus-4-1-20250805` - Most powerful model
|
||||
- `claude-sonnet-4-5` - Balanced (recommended)
|
||||
- `claude-haiku-4-5-20251001` - Fast and low-cost
|
||||
|
||||
**Details**: [Claude Model ID Complete Guide](06_llm_model_ids_claude.md)
|
||||
|
||||
---
|
||||
|
||||
### [OpenAI GPT Models](06_llm_model_ids_openai.md)
|
||||
|
||||
OpenAI's GPT-5 series supporting a wide range of tasks, with 400K context and advanced reasoning capabilities.
|
||||
|
||||
**Key Models**:
|
||||
|
||||
- `gpt-5` - GPT-5 standard version
|
||||
- `gpt-5-mini` - Small version (cost-efficient ◎)
|
||||
- `gpt-5.1-thinking` - Adaptive reasoning model
|
||||
|
||||
**Details**: [OpenAI Model ID Complete Guide](06_llm_model_ids_openai.md)
|
||||
|
||||
---
|
||||
|
||||
## 🚀 Quick Start
|
||||
|
||||
### Basic Usage
|
||||
|
||||
```python
|
||||
from langchain_anthropic import ChatAnthropic
|
||||
from langchain_openai import ChatOpenAI
|
||||
from langchain_google_genai import ChatGoogleGenerativeAI
|
||||
|
||||
# Use Claude
|
||||
claude_llm = ChatAnthropic(model="claude-sonnet-4-5")
|
||||
|
||||
# Use OpenAI
|
||||
openai_llm = ChatOpenAI(model="gpt-5")
|
||||
|
||||
# Use Gemini
|
||||
gemini_llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash")
|
||||
```
|
||||
|
||||
### Using with LangGraph
|
||||
|
||||
```python
|
||||
from langgraph.graph import StateGraph
|
||||
from langchain_anthropic import ChatAnthropic
|
||||
from typing import TypedDict, Annotated
|
||||
from langgraph.graph.message import add_messages
|
||||
|
||||
# State definition
|
||||
class State(TypedDict):
|
||||
messages: Annotated[list, add_messages]
|
||||
|
||||
# Model initialization
|
||||
llm = ChatAnthropic(model="claude-sonnet-4-5")
|
||||
|
||||
# Node definition
|
||||
def chat_node(state: State):
|
||||
response = llm.invoke(state["messages"])
|
||||
return {"messages": [response]}
|
||||
|
||||
# Graph construction
|
||||
graph = StateGraph(State)
|
||||
graph.add_node("chat", chat_node)
|
||||
graph.set_entry_point("chat")
|
||||
graph.set_finish_point("chat")
|
||||
|
||||
app = graph.compile()
|
||||
```
|
||||
|
||||
## 📊 Model Selection Guide
|
||||
|
||||
### Recommended Models by Use Case
|
||||
|
||||
| Use Case | Recommended Model | Reason |
|
||||
| ---------------------- | ------------------------------------------------------------- | ------------------------- |
|
||||
| **Cost-focused** | `claude-haiku-4-5`<br>`gpt-5-mini`<br>`gemini-2.5-flash-lite` | Low cost and fast |
|
||||
| **Balance-focused** | `claude-sonnet-4-5`<br>`gpt-5`<br>`gemini-2.5-flash` | Balance of performance and cost |
|
||||
| **Performance-focused** | `claude-opus-4-1`<br>`gpt-5-pro`<br>`gemini-3-pro` | Maximum performance |
|
||||
| **Reasoning-specialized** | `gpt-5.1-thinking`<br>`gpt-5.1-instant` | Adaptive reasoning, math, science |
|
||||
| **Large-scale context** | `gemini-2.5-pro` | 1M token context |
|
||||
|
||||
### Selection by Task Complexity
|
||||
|
||||
```python
|
||||
def select_model(task_complexity: str, budget: str = "normal"):
|
||||
"""Select optimal model based on task and budget"""
|
||||
|
||||
# Budget-focused
|
||||
if budget == "low":
|
||||
models = {
|
||||
"simple": "claude-haiku-4-5-20251001",
|
||||
"medium": "gpt-5-mini",
|
||||
"complex": "claude-sonnet-4-5"
|
||||
}
|
||||
return models.get(task_complexity, "gpt-5-mini")
|
||||
|
||||
# Performance-focused
|
||||
if budget == "high":
|
||||
models = {
|
||||
"simple": "claude-sonnet-4-5",
|
||||
"medium": "gpt-5",
|
||||
"complex": "claude-opus-4-1-20250805"
|
||||
}
|
||||
return models.get(task_complexity, "claude-opus-4-1-20250805")
|
||||
|
||||
# Balance-focused (default)
|
||||
models = {
|
||||
"simple": "gpt-5-mini",
|
||||
"medium": "claude-sonnet-4-5",
|
||||
"complex": "gpt-5"
|
||||
}
|
||||
return models.get(task_complexity, "claude-sonnet-4-5")
|
||||
```
|
||||
|
||||
## 🔄 Multi-Model Strategy
|
||||
|
||||
### Fallback Between Providers
|
||||
|
||||
```python
|
||||
from langchain_anthropic import ChatAnthropic
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_openai import ChatOpenAI
|
||||
|
||||
# Primary model and fallback
|
||||
primary = ChatAnthropic(model="claude-sonnet-4-5")
|
||||
fallback1 = ChatOpenAI(model="gpt-5")
|
||||
fallback2 = ChatGoogleGenerativeAI(model="gemini-2.5-flash")
|
||||
|
||||
llm_with_fallback = primary.with_fallbacks([fallback1, fallback2])
|
||||
|
||||
# Automatically falls back to the next model until one succeeds
|
||||
response = llm_with_fallback.invoke("Question content")
|
||||
```
|
||||
|
||||
### Cost-Optimized Auto-Routing
|
||||
|
||||
```python
|
||||
from langgraph.graph import StateGraph, END
|
||||
from typing import TypedDict, Annotated, Literal
|
||||
from langgraph.graph.message import add_messages
from langchain_anthropic import ChatAnthropic
from langchain_openai import ChatOpenAI
|
||||
|
||||
class State(TypedDict):
|
||||
messages: Annotated[list, add_messages]
|
||||
complexity: Literal["simple", "medium", "complex"]
|
||||
|
||||
# Use different models based on complexity
|
||||
simple_llm = ChatAnthropic(model="claude-haiku-4-5-20251001") # Low cost
|
||||
medium_llm = ChatOpenAI(model="gpt-5-mini") # Balance
|
||||
complex_llm = ChatAnthropic(model="claude-opus-4-1-20250805") # High performance
|
||||
|
||||
def analyze_complexity(state: State):
|
||||
"""Analyze message complexity"""
|
||||
message = state["messages"][-1].content
|
||||
# Simple complexity determination
|
||||
if len(message) < 50:
|
||||
complexity = "simple"
|
||||
elif len(message) < 200:
|
||||
complexity = "medium"
|
||||
else:
|
||||
complexity = "complex"
|
||||
return {"complexity": complexity}
|
||||
|
||||
def route_by_complexity(state: State):
|
||||
"""Route based on complexity"""
|
||||
routes = {
|
||||
"simple": "simple_node",
|
||||
"medium": "medium_node",
|
||||
"complex": "complex_node"
|
||||
}
|
||||
return routes[state["complexity"]]
|
||||
|
||||
def simple_node(state: State):
|
||||
response = simple_llm.invoke(state["messages"])
|
||||
return {"messages": [response]}
|
||||
|
||||
def medium_node(state: State):
|
||||
response = medium_llm.invoke(state["messages"])
|
||||
return {"messages": [response]}
|
||||
|
||||
def complex_node(state: State):
|
||||
response = complex_llm.invoke(state["messages"])
|
||||
return {"messages": [response]}
|
||||
|
||||
# Graph construction
|
||||
graph = StateGraph(State)
|
||||
graph.add_node("analyze", analyze_complexity)
|
||||
graph.add_node("simple_node", simple_node)
|
||||
graph.add_node("medium_node", medium_node)
|
||||
graph.add_node("complex_node", complex_node)
|
||||
|
||||
graph.set_entry_point("analyze")
|
||||
graph.add_conditional_edges("analyze", route_by_complexity)

# Each routed node ends the graph after responding
graph.add_edge("simple_node", END)
graph.add_edge("medium_node", END)
graph.add_edge("complex_node", END)
|
||||
|
||||
app = graph.compile()
|
||||
```
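
A quick usage sketch for the routed graph above (the question text is illustrative):

```python
# "complexity" is filled in by the analyze node, so only messages are needed up front
result = app.invoke({"messages": [("user", "Summarize this report in one paragraph")]})
print(result["messages"][-1].content)
```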
|
||||
|
||||
## 🔧 Best Practices
|
||||
|
||||
### 1. Environment Variable Management
|
||||
|
||||
```python
|
||||
import os

from langchain_anthropic import ChatAnthropic
from langchain_openai import ChatOpenAI
from langchain_google_genai import ChatGoogleGenerativeAI
|
||||
|
||||
# Flexibly manage models with environment variables
|
||||
DEFAULT_MODEL = os.getenv("DEFAULT_LLM_MODEL", "claude-sonnet-4-5")
|
||||
FAST_MODEL = os.getenv("FAST_LLM_MODEL", "claude-haiku-4-5-20251001")
|
||||
SMART_MODEL = os.getenv("SMART_LLM_MODEL", "claude-opus-4-1-20250805")
|
||||
|
||||
# Switch provider based on environment
|
||||
PROVIDER = os.getenv("LLM_PROVIDER", "anthropic")
|
||||
|
||||
if PROVIDER == "anthropic":
|
||||
llm = ChatAnthropic(model=DEFAULT_MODEL)
|
||||
elif PROVIDER == "openai":
|
||||
llm = ChatOpenAI(model="gpt-5")
|
||||
elif PROVIDER == "google":
|
||||
llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash")
|
||||
```
|
||||
|
||||
### 2. Fixed Model Version (Production)
|
||||
|
||||
```python
|
||||
# ✅ Recommended: Use dated version (production)
|
||||
prod_llm = ChatAnthropic(model="claude-sonnet-4-20250514")
|
||||
|
||||
# ⚠️ Caution: No version specified (potential unexpected updates)
|
||||
dev_llm = ChatAnthropic(model="claude-sonnet-4")
|
||||
```
|
||||
|
||||
### 3. Cost Monitoring
|
||||
|
||||
```python
|
||||
from langchain.callbacks import get_openai_callback
|
||||
|
||||
# OpenAI cost tracking
|
||||
with get_openai_callback() as cb:
|
||||
response = openai_llm.invoke("question")
|
||||
print(f"Total Cost: ${cb.total_cost}")
|
||||
print(f"Tokens: {cb.total_tokens}")
|
||||
|
||||
# For other providers, track manually
|
||||
# Refer to each provider's detail pages
|
||||
```
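
For Anthropic and Google models, one option is to read token counts from `usage_metadata` on the returned message (available in recent `langchain-core` releases); a minimal sketch:

```python
from langchain_anthropic import ChatAnthropic

llm = ChatAnthropic(model="claude-sonnet-4-5")
response = llm.invoke("question")

# usage_metadata is populated when the provider reports token usage
usage = response.usage_metadata or {}
print(f"Input Tokens: {usage.get('input_tokens')}")
print(f"Output Tokens: {usage.get('output_tokens')}")
print(f"Total Tokens: {usage.get('total_tokens')}")
```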
|
||||
|
||||
## 📖 Detailed Documentation
|
||||
|
||||
For detailed information on each provider, please refer to the following pages:
|
||||
|
||||
- **[Gemini Model ID](06_llm_model_ids_gemini.md)**: Model list, usage, advanced settings, multimodal features
|
||||
- **[Claude Model ID](06_llm_model_ids_claude.md)**: Model list, platform-specific IDs, tool usage, deprecated model information
|
||||
- **[OpenAI Model ID](06_llm_model_ids_openai.md)**: Model list, reasoning models, vision features, Azure OpenAI
|
||||
|
||||
## 🔗 Reference Links
|
||||
|
||||
### Official Documentation
|
||||
|
||||
- [Google Gemini API](https://ai.google.dev/gemini-api/docs/models)
|
||||
- [Anthropic Claude API](https://docs.anthropic.com/en/docs/about-claude/models/overview)
|
||||
- [OpenAI Platform](https://platform.openai.com/docs/models)
|
||||
|
||||
### Integration Guides
|
||||
|
||||
- [LangChain Chat Models](https://docs.langchain.com/oss/python/modules/model_io/chat/)
|
||||
- [LangGraph Documentation](https://langchain-ai.github.io/langgraph/)
|
||||
|
||||
### Pricing Information
|
||||
|
||||
- [Gemini Pricing](https://ai.google.dev/pricing)
|
||||
- [Claude Pricing](https://www.anthropic.com/pricing)
|
||||
- [OpenAI Pricing](https://openai.com/pricing)
|
||||
127
skills/langgraph-master/06_llm_model_ids_claude.md
Normal file
127
skills/langgraph-master/06_llm_model_ids_claude.md
Normal file
@@ -0,0 +1,127 @@
|
||||
# Anthropic Claude Model IDs
|
||||
|
||||
List of available model IDs for the Anthropic Claude API.
|
||||
|
||||
> **Last Updated**: 2025-11-24
|
||||
|
||||
## Model List
|
||||
|
||||
### Claude 4.x (2025)
|
||||
|
||||
| Model ID | Context | Max Output | Release | Features |
|
||||
|-----------|------------|---------|---------|------|
|
||||
| `claude-opus-4-1-20250805` | 200K | 32K | 2025-08 | Most powerful. Complex reasoning & code generation |
|
||||
| `claude-sonnet-4-5` | 1M | 64K | 2025-09 | Latest balanced model (recommended) |
|
||||
| `claude-sonnet-4-20250514` | 200K (1M beta) | 64K | 2025-05 | Production recommended (date-fixed) |
|
||||
| `claude-haiku-4-5-20251001` | 200K | 64K | 2025-10 | Fast & low-cost |
|
||||
|
||||
**Model Characteristics**:
|
||||
- **Opus**: Highest performance, complex tasks (200K context)
|
||||
- **Sonnet**: Balanced, general-purpose (1M context)
|
||||
- **Haiku**: Fast & low-cost ($1/M input, $5/M output)
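
As a rough illustration of the Haiku rates listed above ($1 per million input tokens, $5 per million output tokens; rates for the other models are not listed here), a cost estimate can be computed directly:

```python
def estimate_haiku_cost(input_tokens: int, output_tokens: int) -> float:
    """Rough USD cost using the Haiku 4.5 rates quoted above."""
    return input_tokens * 1.0 / 1_000_000 + output_tokens * 5.0 / 1_000_000

# Example: 20K input + 1K output tokens ≈ $0.025
print(estimate_haiku_cost(20_000, 1_000))
```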
|
||||
|
||||
## Basic Usage
|
||||
|
||||
```python
|
||||
from langchain_anthropic import ChatAnthropic
|
||||
|
||||
# Recommended: Latest Sonnet
|
||||
llm = ChatAnthropic(model="claude-sonnet-4-5")
|
||||
|
||||
# Production: Date-fixed version
|
||||
llm = ChatAnthropic(model="claude-sonnet-4-20250514")
|
||||
|
||||
# Fast & low-cost
|
||||
llm = ChatAnthropic(model="claude-haiku-4-5-20251001")
|
||||
|
||||
# Highest performance
|
||||
llm = ChatAnthropic(model="claude-opus-4-1-20250805")
|
||||
```
|
||||
|
||||
### Environment Variables
|
||||
|
||||
```bash
|
||||
export ANTHROPIC_API_KEY="sk-ant-..."
|
||||
```
|
||||
|
||||
## Model Selection Guide
|
||||
|
||||
| Use Case | Recommended Model |
|
||||
|------|-----------|
|
||||
| Cost-focused | `claude-haiku-4-5-20251001` |
|
||||
| Balanced | `claude-sonnet-4-5` |
|
||||
| Performance-focused | `claude-opus-4-1-20250805` |
|
||||
| Production | `claude-sonnet-4-20250514` (date-fixed) |
|
||||
|
||||
## Claude Features
|
||||
|
||||
### 1. Large Context Window
|
||||
|
||||
Claude Sonnet 4.5 supports a **1M-token** context window:
|
||||
|
||||
| Model | Standard Context | Max Output | Notes |
|
||||
|--------|---------------|---------|------|
|
||||
| Sonnet 4.5 | 1M | 64K | Latest version |
|
||||
| Sonnet 4 | 200K (1M beta) | 64K | 1M available with beta header |
|
||||
| Opus 4.1 | 200K | 32K | High-performance version |
|
||||
| Haiku 4.5 | 200K | 64K | Fast version |
|
||||
|
||||
```python
|
||||
# Using 1M context (Sonnet 4.5)
|
||||
llm = ChatAnthropic(
|
||||
model="claude-sonnet-4-5",
|
||||
max_tokens=64000 # Max output: 64K
|
||||
)
|
||||
|
||||
# Enable 1M context for Sonnet 4 (beta)
|
||||
llm = ChatAnthropic(
|
||||
model="claude-sonnet-4-20250514",
|
||||
    default_headers={"anthropic-beta": "context-1m-2025-08-07"}  # 1M-context beta flag; verify against current Anthropic docs
|
||||
)
|
||||
```
|
||||
|
||||
### 2. Date-Fixed Versions
|
||||
|
||||
For production environments, date-fixed versions are recommended to prevent unexpected updates:
|
||||
|
||||
```python
|
||||
# ✅ Recommended (production)
|
||||
llm = ChatAnthropic(model="claude-sonnet-4-20250514")
|
||||
|
||||
# ⚠️ Caution (development only)
|
||||
llm = ChatAnthropic(model="claude-sonnet-4")
|
||||
```
|
||||
|
||||
### 3. Tool Use (Function Calling)
|
||||
|
||||
Claude has powerful tool use capabilities (see [Tool Use Guide](06_llm_model_ids_claude_tools.md) for details).
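
A minimal `bind_tools` sketch (the tool below is a placeholder; full patterns are in the Tool Use Guide):

```python
from langchain_anthropic import ChatAnthropic
from langchain_core.tools import tool

@tool
def get_time(timezone: str) -> str:
    """Return the current time for a timezone (placeholder implementation)."""
    return f"12:00 in {timezone}"

llm = ChatAnthropic(model="claude-sonnet-4-5").bind_tools([get_time])
response = llm.invoke("What time is it in Tokyo?")
print(response.tool_calls)
```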
|
||||
|
||||
### 4. Multi-Platform Support
|
||||
|
||||
Available on multiple cloud platforms (see [Platform-Specific Guide](06_llm_model_ids_claude_platforms.md) for details):
|
||||
|
||||
- Anthropic API (direct)
|
||||
- Google Vertex AI
|
||||
- AWS Bedrock
|
||||
- Azure AI (Microsoft Foundry)
|
||||
|
||||
## Deprecated Models
|
||||
|
||||
| Model | Deprecation Date | Migration Target |
|
||||
|--------|-------|--------|
|
||||
| Claude 3 Opus | 2025-07-21 | `claude-opus-4-1-20250805` |
|
||||
| Claude 3 Sonnet | 2025-07-21 | `claude-sonnet-4-5` |
|
||||
| Claude 2.1 | 2025-07-21 | `claude-sonnet-4-5` |
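
When upgrading, the table above can be encoded as a small lookup; the legacy IDs below are the dated forms of the deprecated models and are shown for illustration:

```python
# Migration targets taken from the deprecation table above
DEPRECATED_MODEL_MAP = {
    "claude-3-opus-20240229": "claude-opus-4-1-20250805",
    "claude-3-sonnet-20240229": "claude-sonnet-4-5",
    "claude-2.1": "claude-sonnet-4-5",
}

def migrate_model_id(model_id: str) -> str:
    """Return the recommended replacement for a deprecated model ID."""
    return DEPRECATED_MODEL_MAP.get(model_id, model_id)
```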
|
||||
|
||||
## Detailed Documentation
|
||||
|
||||
For advanced settings and parameters:
|
||||
- **[Claude Advanced Features](06_llm_model_ids_claude_advanced.md)** - Parameter configuration, streaming, caching
|
||||
- **[Platform-Specific Guide](06_llm_model_ids_claude_platforms.md)** - Usage on Vertex AI, AWS Bedrock, Azure AI
|
||||
- **[Tool Use Guide](06_llm_model_ids_claude_tools.md)** - Function Calling implementation
|
||||
|
||||
## Reference Links
|
||||
|
||||
- [Claude API Official](https://docs.anthropic.com/en/docs/about-claude/models/overview)
|
||||
- [Anthropic Console](https://console.anthropic.com/)
|
||||
- [LangChain Integration](https://docs.langchain.com/oss/python/integrations/chat/anthropic)
|
||||
262
skills/langgraph-master/06_llm_model_ids_claude_advanced.md
Normal file
262
skills/langgraph-master/06_llm_model_ids_claude_advanced.md
Normal file
@@ -0,0 +1,262 @@
|
||||
# Claude Advanced Features
|
||||
|
||||
Advanced settings and parameter tuning for Claude models.
|
||||
|
||||
## Context Window and Output Limits
|
||||
|
||||
| Model | Context Window | Max Output Tokens | Notes |
|
||||
|--------|-------------------|---------------|------|
|
||||
| `claude-opus-4-1-20250805` | 200,000 | 32,000 | Highest performance |
|
||||
| `claude-sonnet-4-5` | 1,000,000 | 64,000 | Latest version |
|
||||
| `claude-sonnet-4-20250514` | 200,000 (1M beta) | 64,000 | 1M with beta header |
|
||||
| `claude-haiku-4-5-20251001` | 200,000 | 64,000 | Fast version |
|
||||
|
||||
**Note**: To use 1M context with Sonnet 4, a beta header is required.
|
||||
|
||||
## Parameter Configuration
|
||||
|
||||
```python
|
||||
from langchain_anthropic import ChatAnthropic
|
||||
|
||||
llm = ChatAnthropic(
|
||||
model="claude-sonnet-4-5",
|
||||
temperature=0.7, # Creativity (0.0-1.0)
|
||||
max_tokens=64000, # Max output (Sonnet 4.5: 64K)
|
||||
top_p=0.9, # Diversity
|
||||
top_k=40, # Sampling
|
||||
)
|
||||
|
||||
# Opus 4.1 (max output 32K)
|
||||
llm_opus = ChatAnthropic(
|
||||
model="claude-opus-4-1-20250805",
|
||||
max_tokens=32000,
|
||||
)
|
||||
```
|
||||
|
||||
## Using 1M Context
|
||||
|
||||
### Sonnet 4.5 (Standard)
|
||||
|
||||
```python
|
||||
llm = ChatAnthropic(
|
||||
model="claude-sonnet-4-5",
|
||||
max_tokens=64000
|
||||
)
|
||||
|
||||
# Can process 1M tokens of context
|
||||
long_document = "..." * 500000 # Long document
|
||||
response = llm.invoke(f"Please analyze the following document:\n\n{long_document}")
|
||||
```
|
||||
|
||||
### Sonnet 4 (Beta Header)
|
||||
|
||||
```python
|
||||
# Enable 1M context with beta header
|
||||
llm = ChatAnthropic(
|
||||
model="claude-sonnet-4-20250514",
|
||||
max_tokens=64000,
|
||||
default_headers={
|
||||
"anthropic-beta": "max-tokens-3-5-sonnet-2024-07-15"
|
||||
}
|
||||
)
|
||||
```
|
||||
|
||||
## Streaming
|
||||
|
||||
```python
|
||||
llm = ChatAnthropic(
|
||||
model="claude-sonnet-4-5",
|
||||
streaming=True
|
||||
)
|
||||
|
||||
for chunk in llm.stream("question"):
|
||||
print(chunk.content, end="", flush=True)
|
||||
```
|
||||
|
||||
## Prompt Caching
|
||||
|
||||
Cache parts of long prompts for efficiency:
|
||||
|
||||
```python
|
||||
from langchain_anthropic import ChatAnthropic
|
||||
|
||||
llm = ChatAnthropic(
|
||||
model="claude-sonnet-4-5",
|
||||
max_tokens=4096
|
||||
)
|
||||
|
||||
# System prompt for caching
|
||||
system_prompt = """
|
||||
You are a professional code reviewer.
|
||||
Please review according to the following coding guidelines:
|
||||
[long guidelines...]
|
||||
"""
|
||||
|
||||
# Use cache
|
||||
response = llm.invoke(
|
||||
[
|
||||
{"role": "system", "content": system_prompt, "cache_control": {"type": "ephemeral"}},
|
||||
{"role": "user", "content": "Please review this code"}
|
||||
]
|
||||
)
|
||||
```
|
||||
|
||||
**Cache Benefits**:
|
||||
- Cost reduction (90% off on cache hits)
|
||||
- Latency reduction (faster processing on reuse)
|
||||
|
||||
## Vision (Image Processing)
|
||||
|
||||
```python
|
||||
from langchain_anthropic import ChatAnthropic
|
||||
from langchain_core.messages import HumanMessage
|
||||
|
||||
llm = ChatAnthropic(model="claude-sonnet-4-5")
|
||||
|
||||
message = HumanMessage(
|
||||
content=[
|
||||
{"type": "text", "text": "What's in this image?"},
|
||||
{
|
||||
"type": "image_url",
|
||||
"image_url": {
|
||||
"url": "https://example.com/image.jpg"
|
||||
}
|
||||
}
|
||||
]
|
||||
)
|
||||
|
||||
response = llm.invoke([message])
|
||||
```
|
||||
|
||||
## Structured Output (JSON)

The Anthropic API does not expose an OpenAI-style `response_format` JSON mode; when structured output is needed, use `with_structured_output` (which relies on tool calling under the hood):

```python
from pydantic import BaseModel

class UserInfo(BaseModel):
    name: str
    age: int

llm = ChatAnthropic(model="claude-sonnet-4-5")
structured_llm = llm.with_structured_output(UserInfo)

response = structured_llm.invoke("Extract user information: Taro Yamada, 30 years old")
print(response)  # UserInfo(name=..., age=...)
```
|
||||
|
||||
## Token Usage Tracking
|
||||
|
||||
```python
llm = ChatAnthropic(model="claude-sonnet-4-5")

response = llm.invoke("question")

# get_openai_callback only tracks OpenAI calls; for Claude, read usage_metadata from the response
usage = response.usage_metadata or {}
print(f"Input Tokens: {usage.get('input_tokens')}")
print(f"Output Tokens: {usage.get('output_tokens')}")
print(f"Total Tokens: {usage.get('total_tokens')}")
```
|
||||
|
||||
## Error Handling
|
||||
|
||||
```python
|
||||
from anthropic import AnthropicError, RateLimitError
|
||||
|
||||
try:
|
||||
llm = ChatAnthropic(model="claude-sonnet-4-5")
|
||||
response = llm.invoke("question")
|
||||
except RateLimitError:
|
||||
print("Rate limit reached")
|
||||
except AnthropicError as e:
|
||||
print(f"Anthropic error: {e}")
|
||||
```
|
||||
|
||||
## Rate Limit Handling
|
||||
|
||||
```python
|
||||
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential
from anthropic import RateLimitError

@retry(
    wait=wait_exponential(multiplier=1, min=4, max=60),
    stop=stop_after_attempt(5),
    retry=retry_if_exception_type(RateLimitError),  # retry only on rate-limit errors
)
def invoke_with_retry(llm, messages):
    return llm.invoke(messages)

llm = ChatAnthropic(model="claude-sonnet-4-5")
response = invoke_with_retry(llm, "question")
|
||||
```
|
||||
|
||||
## Listing Models
|
||||
|
||||
```python
|
||||
import anthropic
|
||||
import os
|
||||
|
||||
client = anthropic.Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))
|
||||
models = client.models.list()
|
||||
|
||||
for model in models.data:
|
||||
print(f"{model.id} - {model.display_name}")
|
||||
```
|
||||
|
||||
## Cost Optimization
|
||||
|
||||
### Cost Management by Model Selection
|
||||
|
||||
```python
|
||||
# Low-cost version (simple tasks)
|
||||
llm_cheap = ChatAnthropic(model="claude-haiku-4-5-20251001")
|
||||
|
||||
# Balanced version (general tasks)
|
||||
llm_balanced = ChatAnthropic(model="claude-sonnet-4-5")
|
||||
|
||||
# High-performance version (complex tasks)
|
||||
llm_powerful = ChatAnthropic(model="claude-opus-4-1-20250805")
|
||||
|
||||
# Select based on task
|
||||
def get_llm_for_task(complexity):
|
||||
if complexity == "simple":
|
||||
return llm_cheap
|
||||
elif complexity == "medium":
|
||||
return llm_balanced
|
||||
else:
|
||||
return llm_powerful
|
||||
```
|
||||
|
||||
### Cost Reduction with Prompt Caching
|
||||
|
||||
```python
|
||||
# Cache long system prompt
|
||||
system = {"role": "system", "content": [{"type": "text", "text": long_guidelines, "cache_control": {"type": "ephemeral"}}]}
|
||||
|
||||
# Reuse cache across multiple calls (90% cost reduction)
|
||||
for user_input in user_inputs:
|
||||
response = llm.invoke([system, {"role": "user", "content": user_input}])
|
||||
```
|
||||
|
||||
## Leveraging Large Context
|
||||
|
||||
```python
|
||||
llm = ChatAnthropic(model="claude-sonnet-4-5")
|
||||
|
||||
# Process large documents at once (1M token support)
|
||||
documents = load_large_documents() # Large document collection
|
||||
|
||||
response = llm.invoke(f"""
|
||||
Please analyze the following multiple documents:
|
||||
|
||||
{documents}
|
||||
|
||||
Tell me the main themes and conclusions.
|
||||
""")
|
||||
```
|
||||
|
||||
## Reference Links
|
||||
|
||||
- [Claude API Documentation](https://docs.anthropic.com/)
|
||||
- [Anthropic API Reference](https://docs.anthropic.com/en/api/)
|
||||
- [Claude Models Overview](https://docs.anthropic.com/en/docs/about-claude/models/overview)
|
||||
- [Prompt Caching Guide](https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching)
|
||||
219
skills/langgraph-master/06_llm_model_ids_claude_platforms.md
Normal file
219
skills/langgraph-master/06_llm_model_ids_claude_platforms.md
Normal file
@@ -0,0 +1,219 @@
|
||||
# Claude Platform-Specific Guide
|
||||
|
||||
How to use Claude on different cloud platforms.
|
||||
|
||||
## Anthropic API (Direct)
|
||||
|
||||
### Basic Usage
|
||||
|
||||
```python
|
||||
from langchain_anthropic import ChatAnthropic
|
||||
|
||||
llm = ChatAnthropic(
|
||||
model="claude-sonnet-4-5",
|
||||
anthropic_api_key="sk-ant-..."
|
||||
)
|
||||
```
|
||||
|
||||
### Listing Models
|
||||
|
||||
```python
|
||||
import anthropic
|
||||
import os
|
||||
|
||||
client = anthropic.Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))
|
||||
models = client.models.list()
|
||||
|
||||
for model in models.data:
|
||||
print(f"{model.id} - {model.display_name}")
|
||||
```
|
||||
|
||||
## Google Vertex AI
|
||||
|
||||
### Model ID Format
|
||||
|
||||
Vertex AI uses `@` notation:
|
||||
|
||||
```
|
||||
claude-opus-4-1@20250805
|
||||
claude-sonnet-4@20250514
|
||||
claude-haiku-4.5@20251001
|
||||
```
|
||||
|
||||
### Usage
|
||||
|
||||
```python
|
||||
from langchain_google_vertexai import ChatVertexAI
|
||||
|
||||
llm = ChatVertexAI(
|
||||
model="claude-haiku-4.5@20251001",
|
||||
project="your-gcp-project",
|
||||
location="us-central1"
|
||||
)
|
||||
```
|
||||
|
||||
### Environment Setup
|
||||
|
||||
```bash
|
||||
# GCP authentication
|
||||
gcloud auth application-default login
|
||||
|
||||
# Environment variables
|
||||
export GOOGLE_CLOUD_PROJECT="your-project-id"
|
||||
export GOOGLE_CLOUD_LOCATION="us-central1"
|
||||
```
|
||||
|
||||
## AWS Bedrock
|
||||
|
||||
### Model ID Format
|
||||
|
||||
Bedrock uses ARN format:
|
||||
|
||||
```
|
||||
anthropic.claude-opus-4-1-20250805-v1:0
|
||||
anthropic.claude-sonnet-4-20250514-v1:0
|
||||
anthropic.claude-haiku-4-5-20251001-v1:0
|
||||
```
|
||||
|
||||
### Usage
|
||||
|
||||
```python
|
||||
from langchain_aws import ChatBedrock
|
||||
|
||||
llm = ChatBedrock(
|
||||
model_id="anthropic.claude-haiku-4-5-20251001-v1:0",
|
||||
region_name="us-east-1",
|
||||
model_kwargs={
|
||||
"temperature": 0.7,
|
||||
"max_tokens": 4096
|
||||
}
|
||||
)
|
||||
```
|
||||
|
||||
### Environment Setup
|
||||
|
||||
```bash
|
||||
# AWS CLI configuration
|
||||
aws configure
|
||||
|
||||
# Or environment variables
|
||||
export AWS_ACCESS_KEY_ID="your-access-key"
|
||||
export AWS_SECRET_ACCESS_KEY="your-secret-key"
|
||||
export AWS_DEFAULT_REGION="us-east-1"
|
||||
```
|
||||
|
||||
## Azure AI (Microsoft Foundry)
|
||||
|
||||
> **Release**: Public preview started in November 2025
|
||||
|
||||
### Model ID Format
|
||||
|
||||
Azure AI uses the same format as Anthropic API:
|
||||
|
||||
```
|
||||
claude-opus-4-1
|
||||
claude-sonnet-4-5
|
||||
claude-haiku-4-5
|
||||
```
|
||||
|
||||
### Available Models
|
||||
|
||||
- **Claude Opus 4.1** (`claude-opus-4-1`)
|
||||
- **Claude Sonnet 4.5** (`claude-sonnet-4-5`)
|
||||
- **Claude Haiku 4.5** (`claude-haiku-4-5`)
|
||||
|
||||
### Usage
|
||||
|
||||
```python
|
||||
# Calling Claude using Azure OpenAI SDK
|
||||
import os
|
||||
from openai import AzureOpenAI
|
||||
|
||||
client = AzureOpenAI(
|
||||
azure_endpoint=os.getenv("AZURE_FOUNDRY_ENDPOINT"),
|
||||
api_key=os.getenv("AZURE_FOUNDRY_API_KEY"),
|
||||
api_version="2024-12-01-preview"
|
||||
)
|
||||
|
||||
# Specify deployment name (default is same as model ID)
|
||||
response = client.chat.completions.create(
|
||||
model="claude-sonnet-4-5", # Or your custom deployment name
|
||||
messages=[
|
||||
{"role": "user", "content": "Hello"}
|
||||
]
|
||||
)
|
||||
```
|
||||
|
||||
### Custom Deployments
|
||||
|
||||
You can set custom deployment names in the Foundry portal:
|
||||
|
||||
```python
|
||||
# Using custom deployment name
|
||||
response = client.chat.completions.create(
|
||||
model="my-custom-claude-deployment",
|
||||
messages=[...]
|
||||
)
|
||||
```
|
||||
|
||||
### Environment Setup
|
||||
|
||||
```bash
|
||||
export AZURE_FOUNDRY_ENDPOINT="https://your-foundry-resource.azure.com"
|
||||
export AZURE_FOUNDRY_API_KEY="your-api-key"
|
||||
```
|
||||
|
||||
### Region Limitations
|
||||
|
||||
Currently available in the following regions:
|
||||
- **East US2**
|
||||
- **Sweden Central**
|
||||
|
||||
Deployment type: **Global Standard**
|
||||
|
||||
## Platform-Specific Features
|
||||
|
||||
| Platform | Model ID Format | Benefits | Drawbacks |
|
||||
|----------------|------------|---------|-----------|
|
||||
| **Anthropic API** | `claude-sonnet-4-5` | Instant access to latest models | Single provider dependency |
|
||||
| **Vertex AI** | `claude-sonnet-4@20250514` | Integration with GCP services | Complex setup |
|
||||
| **AWS Bedrock** | `anthropic.claude-sonnet-4-20250514-v1:0` | Integration with AWS ecosystem | Complex model ID format |
|
||||
| **Azure AI** | `claude-sonnet-4-5` | Azure + GPT and Claude integration | Region limitations |
|
||||
|
||||
## Cross-Platform Fallback
|
||||
|
||||
```python
|
||||
from langchain_anthropic import ChatAnthropic
|
||||
from langchain_google_vertexai import ChatVertexAI
|
||||
from langchain_aws import ChatBedrock
|
||||
|
||||
# Primary and fallback (multi-platform support)
|
||||
primary = ChatAnthropic(model="claude-sonnet-4-5")
|
||||
fallback_gcp = ChatVertexAI(
|
||||
model="claude-sonnet-4@20250514",
|
||||
project="your-project"
|
||||
)
|
||||
fallback_aws = ChatBedrock(
|
||||
model_id="anthropic.claude-sonnet-4-20250514-v1:0",
|
||||
region_name="us-east-1"
|
||||
)
|
||||
|
||||
# Fallback across three platforms
|
||||
llm = primary.with_fallbacks([fallback_gcp, fallback_aws])
|
||||
```
|
||||
|
||||
## Model ID Comparison Table
|
||||
|
||||
| Anthropic API | Vertex AI | AWS Bedrock | Azure AI |
|
||||
|--------------|-----------|-------------|----------|
|
||||
| `claude-opus-4-1-20250805` | `claude-opus-4-1@20250805` | `anthropic.claude-opus-4-1-20250805-v1:0` | `claude-opus-4-1` |
|
||||
| `claude-sonnet-4-5` | `claude-sonnet-4@20250514` | `anthropic.claude-sonnet-4-20250514-v1:0` | `claude-sonnet-4-5` |
|
||||
| `claude-haiku-4-5-20251001` | `claude-haiku-4.5@20251001` | `anthropic.claude-haiku-4-5-20251001-v1:0` | `claude-haiku-4-5` |
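
A small lookup keyed by platform keeps these IDs in one place; a sketch using the Sonnet row of the table above:

```python
# Sonnet model IDs per platform, copied from the comparison table
SONNET_IDS = {
    "anthropic": "claude-sonnet-4-5",
    "vertex": "claude-sonnet-4@20250514",
    "bedrock": "anthropic.claude-sonnet-4-20250514-v1:0",
    "azure": "claude-sonnet-4-5",
}

def sonnet_id_for(platform: str) -> str:
    """Return the Sonnet model ID for the given platform key."""
    return SONNET_IDS[platform]
```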
|
||||
|
||||
## Reference Links
|
||||
|
||||
- [Anthropic API Documentation](https://docs.anthropic.com/)
|
||||
- [Vertex AI Claude Models](https://docs.cloud.google.com/vertex-ai/generative-ai/docs/partner-models/claude)
|
||||
- [AWS Bedrock Claude Models](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-claude.html)
|
||||
- [Azure AI Claude Documentation](https://learn.microsoft.com/en-us/azure/ai-foundry/foundry-models/how-to/use-foundry-models-claude)
|
||||
- [Claude in Microsoft Foundry Announcement](https://www.anthropic.com/news/claude-in-microsoft-foundry)
|
||||
216
skills/langgraph-master/06_llm_model_ids_claude_tools.md
Normal file
216
skills/langgraph-master/06_llm_model_ids_claude_tools.md
Normal file
@@ -0,0 +1,216 @@
|
||||
# Claude Tool Use Guide
|
||||
|
||||
Implementation methods for Claude's tool use (Function Calling).
|
||||
|
||||
## Basic Tool Definition
|
||||
|
||||
```python
|
||||
from langchain_anthropic import ChatAnthropic
|
||||
from langchain_core.tools import tool
|
||||
|
||||
@tool
|
||||
def get_weather(location: str) -> str:
|
||||
"""Get weather for a specified location.
|
||||
|
||||
Args:
|
||||
location: Location to check weather (e.g., "Tokyo")
|
||||
"""
|
||||
return f"The weather in {location} is sunny"
|
||||
|
||||
@tool
|
||||
def calculate(expression: str) -> float:
|
||||
"""Calculate a mathematical expression.
|
||||
|
||||
Args:
|
||||
expression: Mathematical expression to calculate (e.g., "2 + 2")
|
||||
"""
|
||||
    # Note: eval on untrusted input is unsafe; shown here only for brevity
    return eval(expression)
|
||||
|
||||
# Bind tools
|
||||
llm = ChatAnthropic(model="claude-sonnet-4-5")
|
||||
llm_with_tools = llm.bind_tools([get_weather, calculate])
|
||||
|
||||
# Usage
|
||||
response = llm_with_tools.invoke("Tell me Tokyo's weather and 2+2")
|
||||
print(response.tool_calls)
|
||||
```
|
||||
|
||||
## Tool Integration with LangGraph
|
||||
|
||||
```python
|
||||
from langgraph.prebuilt import create_react_agent
|
||||
from langchain_anthropic import ChatAnthropic
|
||||
from langchain_core.tools import tool
|
||||
|
||||
@tool
|
||||
def search_database(query: str) -> str:
|
||||
"""Search the database.
|
||||
|
||||
Args:
|
||||
query: Search query
|
||||
"""
|
||||
return f"Search results for '{query}'"
|
||||
|
||||
# Create agent
|
||||
llm = ChatAnthropic(model="claude-sonnet-4-5")
|
||||
tools = [search_database]
|
||||
|
||||
agent = create_react_agent(llm, tools)
|
||||
|
||||
# Execute
|
||||
result = agent.invoke({
|
||||
"messages": [("user", "Search for user information")]
|
||||
})
|
||||
```
|
||||
|
||||
## Custom Tool Node Implementation
|
||||
|
||||
```python
|
||||
from langgraph.graph import StateGraph
|
||||
from langchain_anthropic import ChatAnthropic
|
||||
from typing import TypedDict, Annotated
|
||||
from langgraph.graph.message import add_messages
from langchain_core.tools import tool
from langchain_core.messages import ToolMessage
|
||||
|
||||
class State(TypedDict):
|
||||
messages: Annotated[list, add_messages]
|
||||
|
||||
@tool
|
||||
def get_stock_price(symbol: str) -> float:
|
||||
"""Get stock price"""
|
||||
return 150.25
|
||||
|
||||
llm = ChatAnthropic(model="claude-sonnet-4-5")
|
||||
llm_with_tools = llm.bind_tools([get_stock_price])
|
||||
|
||||
def agent_node(state: State):
|
||||
response = llm_with_tools.invoke(state["messages"])
|
||||
return {"messages": [response]}
|
||||
|
||||
def tool_node(state: State):
|
||||
# Execute tool calls
|
||||
last_message = state["messages"][-1]
|
||||
tool_calls = last_message.tool_calls
|
||||
|
||||
    results = []
    for tool_call in tool_calls:
        tool_result = get_stock_price.invoke(tool_call["args"])
        # Tool results must go back into state as ToolMessage objects
        results.append(
            ToolMessage(content=str(tool_result), tool_call_id=tool_call["id"])
        )

    return {"messages": results}
|
||||
|
||||
# Build graph
|
||||
graph = StateGraph(State)
|
||||
graph.add_node("agent", agent_node)
|
||||
graph.add_node("tools", tool_node)
|
||||
# ... Add edges, etc.
|
||||
```
|
||||
|
||||
## Streaming + Tool Use
|
||||
|
||||
```python
|
||||
from langchain_anthropic import ChatAnthropic
|
||||
from langchain_core.tools import tool
|
||||
|
||||
@tool
|
||||
def get_info(topic: str) -> str:
|
||||
"""Get information"""
|
||||
return f"Information about {topic}"
|
||||
|
||||
llm = ChatAnthropic(
|
||||
model="claude-sonnet-4-5",
|
||||
streaming=True
|
||||
)
|
||||
llm_with_tools = llm.bind_tools([get_info])
|
||||
|
||||
for chunk in llm_with_tools.stream("Tell me about Python"):
|
||||
if hasattr(chunk, 'tool_calls') and chunk.tool_calls:
|
||||
print(f"Tool: {chunk.tool_calls}")
|
||||
elif chunk.content:
|
||||
print(chunk.content, end="", flush=True)
|
||||
```
|
||||
|
||||
## Error Handling
|
||||
|
||||
```python
|
||||
from langchain_anthropic import ChatAnthropic
|
||||
from langchain_core.tools import tool
|
||||
import anthropic
|
||||
|
||||
@tool
|
||||
def risky_operation(data: str) -> str:
|
||||
"""Risky operation"""
|
||||
if not data:
|
||||
raise ValueError("Data is required")
|
||||
return f"Processing complete: {data}"
|
||||
|
||||
try:
|
||||
llm = ChatAnthropic(model="claude-sonnet-4-5")
|
||||
llm_with_tools = llm.bind_tools([risky_operation])
|
||||
response = llm_with_tools.invoke("Execute operation")
|
||||
except anthropic.BadRequestError as e:
|
||||
print(f"Invalid request: {e}")
|
||||
except Exception as e:
|
||||
print(f"Error: {e}")
|
||||
```
|
||||
|
||||
## Tool Best Practices
|
||||
|
||||
### 1. Clear Documentation
|
||||
|
||||
```python
|
||||
@tool
|
||||
def analyze_sentiment(text: str, language: str = "en") -> dict:
|
||||
"""Perform sentiment analysis on text.
|
||||
|
||||
Args:
|
||||
text: Text to analyze (max 1000 characters)
|
||||
language: Language of text ("ja", "en", etc.) defaults to English
|
||||
|
||||
Returns:
|
||||
{"sentiment": "positive|negative|neutral", "score": 0.0-1.0}
|
||||
"""
|
||||
# Implementation
|
||||
return {"sentiment": "positive", "score": 0.8}
|
||||
```
|
||||
|
||||
### 2. Use Type Hints
|
||||
|
||||
```python
|
||||
from typing import List, Dict
|
||||
|
||||
@tool
|
||||
def batch_process(items: List[str]) -> Dict[str, int]:
|
||||
"""Batch process multiple items.
|
||||
|
||||
Args:
|
||||
items: List of items to process
|
||||
|
||||
Returns:
|
||||
Dictionary of processing results for each item
|
||||
"""
|
||||
return {item: len(item) for item in items}
|
||||
```
|
||||
|
||||
### 3. Proper Error Handling
|
||||
|
||||
```python
|
||||
@tool
|
||||
def safe_operation(data: str) -> str:
|
||||
"""Safe operation"""
|
||||
try:
|
||||
# Execute operation
|
||||
result = process(data)
|
||||
return result
|
||||
except ValueError as e:
|
||||
return f"Input error: {e}"
|
||||
except Exception as e:
|
||||
return f"Unexpected error: {e}"
|
||||
```
|
||||
|
||||
## Reference Links
|
||||
|
||||
- [Claude Tool Use Guide](https://docs.anthropic.com/en/docs/tool-use)
|
||||
- [LangGraph Tools Documentation](https://langchain-ai.github.io/langgraph/concepts/agentic_concepts/)
|
||||
115
skills/langgraph-master/06_llm_model_ids_gemini.md
Normal file
115
skills/langgraph-master/06_llm_model_ids_gemini.md
Normal file
@@ -0,0 +1,115 @@
|
||||
# Google Gemini Model IDs
|
||||
|
||||
List of available model IDs for the Google Gemini API.
|
||||
|
||||
> **Last Updated**: 2025-11-24
|
||||
|
||||
## Model List
|
||||
|
||||
While there are many models available, `gemini-2.5-flash` is generally recommended for development at this time. It offers a good balance of cost and performance for a wide range of use cases.
|
||||
|
||||
### Gemini 3.x (Latest)
|
||||
|
||||
| Model ID | Context | Max Output | Use Case |
|
||||
| ---------------------------------------- | ------------ | -------- | ------------------ |
|
||||
| `google/gemini-3-pro-preview` | - | 64K | Latest high-performance model |
|
||||
| `google/gemini-3-pro-image-preview` | - | - | Image generation |
|
||||
| `google/gemini-3-pro-image-preview-edit` | - | - | Image editing |
|
||||
|
||||
### Gemini 2.5
|
||||
|
||||
| Model ID | Context | Max Output | Use Case |
|
||||
| ----------------------- | ------------ | -------- | ---------------------- |
|
||||
| `google/gemini-2.5-pro` | 1M (2M planned) | - | High performance |
|
||||
| `gemini-2.5-flash` | 1M | - | Fast balanced model (recommended) |
|
||||
| `gemini-2.5-flash-lite` | 1M | - | Lightweight and fast |
|
||||
|
||||
**Note**: Free tier is limited to approximately 32K tokens. Gemini Advanced (2.5 Pro) supports 1M tokens.
|
||||
|
||||
### Gemini 2.0
|
||||
|
||||
| Model ID | Context | Max Output | Use Case |
|
||||
| ------------------ | ------------ | -------- | ------ |
|
||||
| `gemini-2.0-flash` | 1M | - | Stable version |
|
||||
|
||||
## Basic Usage
|
||||
|
||||
```python
|
||||
from langchain_google_genai import ChatGoogleGenerativeAI
|
||||
|
||||
# Recommended: Balanced model
|
||||
llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash")
|
||||
|
||||
# Also works with prefix
|
||||
llm = ChatGoogleGenerativeAI(model="models/gemini-2.5-flash")
|
||||
|
||||
# High-performance version
|
||||
llm = ChatGoogleGenerativeAI(model="google/gemini-3-pro")
|
||||
|
||||
# Lightweight version
|
||||
llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash-lite")
|
||||
```
|
||||
|
||||
### Environment Variables
|
||||
|
||||
```bash
|
||||
export GOOGLE_API_KEY="your-api-key"
|
||||
```
|
||||
|
||||
## Model Selection Guide
|
||||
|
||||
| Use Case | Recommended Model |
|
||||
| ------------------ | ------------------------------ |
|
||||
| Cost-focused | `gemini-2.5-flash-lite` |
|
||||
| Balanced | `gemini-2.5-flash` |
|
||||
| Performance-focused | `google/gemini-3-pro` |
|
||||
| Large context | `gemini-2.5-pro` (1M tokens) |
|
||||
|
||||
## Gemini Features
|
||||
|
||||
### 1. Large Context Window
|
||||
|
||||
Gemini is the **industry's first model to support 1M tokens**:
|
||||
|
||||
| Tier | Context Limit |
|
||||
| ------------------------- | ---------------- |
|
||||
| Gemini Advanced (2.5 Pro) | 1M tokens |
|
||||
| Vertex AI | 1M tokens |
|
||||
| Free tier | ~32K tokens |
|
||||
|
||||
**Use Cases**:
|
||||
|
||||
- Long document analysis
|
||||
- Understanding entire codebases
|
||||
- Long conversation history
|
||||
|
||||
```python
|
||||
# Processing large context
|
||||
llm = ChatGoogleGenerativeAI(
|
||||
model="gemini-2.5-pro",
|
||||
max_tokens=8192 # Specify output token count
|
||||
)
|
||||
```
|
||||
|
||||
**Future**: Gemini 2.5 Pro is planned to support 2M token context windows.
|
||||
|
||||
### 2. Multimodal Support
|
||||
|
||||
Image input and generation capabilities (see [Advanced Features](06_llm_model_ids_gemini_advanced.md) for details).
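
A minimal image-input sketch with a local file, base64-encoded inline (the file path is illustrative; see the advanced guide for the full multimodal API):

```python
import base64

from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.messages import HumanMessage

llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash")

# Read and base64-encode a local image (example path)
with open("photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

message = HumanMessage(
    content=[
        {"type": "text", "text": "Describe this image"},
        {"type": "image_url", "image_url": f"data:image/jpeg;base64,{image_b64}"},
    ]
)
response = llm.invoke([message])
```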
|
||||
|
||||
## Important Notes
|
||||
|
||||
- ❌ **Deprecated**: Gemini 1.0, 1.5 series are no longer available
|
||||
- ✅ **Migration Recommended**: Use `gemini-2.5-flash` or later models
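
A defensive check like the following can catch deprecated 1.x model IDs before they reach the API (a sketch based on the note above):

```python
def assert_supported_gemini(model_id: str) -> str:
    """Raise if a deprecated Gemini 1.0 / 1.5 model ID is used."""
    if model_id.startswith(("gemini-1.0", "gemini-1.5")):
        raise ValueError(f"{model_id} is deprecated; use gemini-2.5-flash or later")
    return model_id

model = assert_supported_gemini("gemini-2.5-flash")
```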
|
||||
|
||||
## Detailed Documentation
|
||||
|
||||
For advanced configuration and multimodal features, see:
|
||||
|
||||
- **[Gemini Advanced Features](06_llm_model_ids_gemini_advanced.md)**
|
||||
|
||||
## Reference Links
|
||||
|
||||
- [Gemini API Official](https://ai.google.dev/gemini-api/docs/models)
|
||||
- [Google AI Studio](https://makersuite.google.com/)
|
||||
- [LangChain Integration](https://docs.langchain.com/oss/python/integrations/chat/google_generative_ai)
|
||||
118
skills/langgraph-master/06_llm_model_ids_gemini_advanced.md
Normal file
118
skills/langgraph-master/06_llm_model_ids_gemini_advanced.md
Normal file
@@ -0,0 +1,118 @@
|
||||
# Gemini Advanced Features
|
||||
|
||||
Advanced configuration and multimodal features for Google Gemini models.
|
||||
|
||||
## Context Window and Output Limits
|
||||
|
||||
| Model | Context Window | Max Output Tokens |
|
||||
|--------|-------------------|---------------|
|
||||
| Gemini 3 Pro | - | 64K |
|
||||
| Gemini 2.5 Pro | 1M (2M planned) | - |
|
||||
| Gemini 2.5 Flash | 1M | - |
|
||||
| Gemini 2.0 Flash | 1M | - |
|
||||
|
||||
**Tier-based Limits**:
|
||||
- Gemini Advanced / Vertex AI: 1M tokens
|
||||
- Free tier: ~32K tokens
|
||||
|
||||
## Parameter Configuration
|
||||
|
||||
```python
|
||||
from langchain_google_genai import ChatGoogleGenerativeAI
|
||||
|
||||
llm = ChatGoogleGenerativeAI(
|
||||
model="gemini-2.5-flash",
|
||||
temperature=0.7, # Creativity (0.0-1.0)
|
||||
top_p=0.9, # Diversity
|
||||
top_k=40, # Sampling
|
||||
max_tokens=8192, # Max output
|
||||
)
|
||||
```
|
||||
|
||||
## Multimodal Features
|
||||
|
||||
### Image Input
|
||||
|
||||
```python
|
||||
from langchain_google_genai import ChatGoogleGenerativeAI
|
||||
from langchain_core.messages import HumanMessage
|
||||
|
||||
llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash")
|
||||
|
||||
message = HumanMessage(
|
||||
content=[
|
||||
{"type": "text", "text": "What is in this image?"},
|
||||
{"type": "image_url", "image_url": "https://example.com/image.jpg"}
|
||||
]
|
||||
)
|
||||
|
||||
response = llm.invoke([message])
|
||||
```
|
||||
|
||||
### Image Generation (Gemini 3.x)
|
||||
|
||||
```python
|
||||
llm = ChatGoogleGenerativeAI(model="google/gemini-3-pro-image-preview")
|
||||
response = llm.invoke("Generate a beautiful sunset landscape")
|
||||
```
|
||||
|
||||
## Streaming
|
||||
|
||||
```python
|
||||
llm = ChatGoogleGenerativeAI(
|
||||
model="gemini-2.5-flash",
|
||||
streaming=True
|
||||
)
|
||||
|
||||
for chunk in llm.stream("Question"):
|
||||
print(chunk.content, end="", flush=True)
|
||||
```
|
||||
|
||||
## Safety Settings
|
||||
|
||||
```python
|
||||
from langchain_google_genai import (
|
||||
ChatGoogleGenerativeAI,
|
||||
HarmBlockThreshold,
|
||||
HarmCategory
|
||||
)
|
||||
|
||||
llm = ChatGoogleGenerativeAI(
|
||||
model="gemini-2.5-flash",
|
||||
safety_settings={
|
||||
HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_NONE,
|
||||
HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_NONE,
|
||||
}
|
||||
)
|
||||
```
|
||||
|
||||
## Retrieving Model List
|
||||
|
||||
```python
|
||||
import google.generativeai as genai
|
||||
import os
|
||||
|
||||
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))
|
||||
|
||||
for model in genai.list_models():
|
||||
if 'generateContent' in model.supported_generation_methods:
|
||||
print(f"{model.name}: {model.input_token_limit} tokens")
|
||||
```
|
||||
|
||||
## Error Handling
|
||||
|
||||
```python
|
||||
from google.api_core import exceptions
|
||||
|
||||
try:
|
||||
response = llm.invoke("Question")
|
||||
except exceptions.ResourceExhausted:
|
||||
print("Rate limit reached")
|
||||
except exceptions.InvalidArgument as e:
|
||||
print(f"Invalid argument: {e}")
|
||||
```
|
||||
|
||||
## Reference Links
|
||||
|
||||
- [Gemini API Models](https://ai.google.dev/gemini-api/docs/models)
|
||||
- [Vertex AI](https://docs.cloud.google.com/vertex-ai/generative-ai/docs/models)
|
||||
186
skills/langgraph-master/06_llm_model_ids_openai.md
Normal file
186
skills/langgraph-master/06_llm_model_ids_openai.md
Normal file
@@ -0,0 +1,186 @@
|
||||
# OpenAI GPT Model IDs
|
||||
|
||||
List of available model IDs for the OpenAI API.
|
||||
|
||||
> **Last Updated**: 2025-11-24
|
||||
|
||||
## Model List
|
||||
|
||||
### GPT-5 Series
|
||||
|
||||
> **Released**: August 2025
|
||||
|
||||
| Model ID | Context | Max Output | Features |
|
||||
|-----------|------------|---------|------|
|
||||
| `gpt-5` | 400K | 128K | Full-featured. High-quality general-purpose tasks |
|
||||
| `gpt-5-pro` | 400K | 272K | Extended reasoning version. Complex enterprise and research use cases |
|
||||
| `gpt-5-mini` | 400K | 128K | Small high-speed version. Low latency |
|
||||
| `gpt-5-nano` | 400K | 128K | Ultra-lightweight version. Resource optimized |
|
||||
|
||||
**Performance**: Achieved 94.6% on AIME 2025, 74.9% on SWE-bench Verified
|
||||
**Note**: Context window is the combined length of input + output
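
Because the window covers input plus output, it can help to budget `max_tokens` against the prompt size; a rough sketch (the 4-characters-per-token figure is only an approximation):

```python
def remaining_output_budget(prompt: str, context_window: int = 400_000) -> int:
    """Very rough output-token budget left after the prompt (~4 chars per token)."""
    estimated_input_tokens = len(prompt) // 4
    return max(context_window - estimated_input_tokens, 0)

print(remaining_output_budget("..." * 10_000))  # large prompt, still plenty of budget left
```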
|
||||
|
||||
### GPT-5.1 Series (Latest Update)
|
||||
|
||||
| Model ID | Context | Max Output | Features |
|
||||
|-----------|------------|---------|------|
|
||||
| `gpt-5.1` | 128K (ChatGPT) / 400K (API) | 128K | Balance of intelligence and speed |
|
||||
| `gpt-5.1-instant` | 128K / 400K | 128K | Adaptive reasoning. Balances speed and accuracy |
|
||||
| `gpt-5.1-thinking` | 128K / 400K | 128K | Adjusts thinking time based on problem complexity |
|
||||
| `gpt-5.1-mini` | 128K / 400K | 128K | Compact version |
|
||||
| `gpt-5.1-codex` | 400K | 128K | Code-specialized version (for GitHub Copilot) |
|
||||
| `gpt-5.1-codex-mini` | 400K | 128K | Code-specialized compact version |
|
||||
|
||||
## Basic Usage
|
||||
|
||||
```python
|
||||
from langchain_openai import ChatOpenAI
|
||||
|
||||
# Latest: GPT-5
|
||||
llm = ChatOpenAI(model="gpt-5")
|
||||
|
||||
# Latest update: GPT-5.1
|
||||
llm = ChatOpenAI(model="gpt-5.1")
|
||||
|
||||
# High performance: GPT-5 Pro
|
||||
llm = ChatOpenAI(model="gpt-5-pro")
|
||||
|
||||
# Cost-conscious: Compact version
|
||||
llm = ChatOpenAI(model="gpt-5-mini")
|
||||
|
||||
# Ultra-lightweight
|
||||
llm = ChatOpenAI(model="gpt-5-nano")
|
||||
```
|
||||
|
||||
### Environment Variables
|
||||
|
||||
```bash
|
||||
export OPENAI_API_KEY="sk-..."
|
||||
```
|
||||
|
||||
## Model Selection Guide
|
||||
|
||||
| Use Case | Recommended Model |
|
||||
|------|-----------|
|
||||
| **Maximum Performance** | `gpt-5-pro` |
|
||||
| **General-Purpose Tasks** | `gpt-5` or `gpt-5.1` |
|
||||
| **Cost-Conscious** | `gpt-5-mini` |
|
||||
| **Ultra-Lightweight** | `gpt-5-nano` |
|
||||
| **Adaptive Reasoning** | `gpt-5.1-instant` or `gpt-5.1-thinking` |
|
||||
| **Code Generation** | `gpt-5.1-codex` or `gpt-5` |
|
||||
|
||||
## GPT-5 Features
|
||||
|
||||
### 1. Large Context Window
|
||||
|
||||
GPT-5 series has a **400K token** context window:
|
||||
|
||||
```python
|
||||
llm = ChatOpenAI(
|
||||
model="gpt-5",
|
||||
max_tokens=128000 # Max output: 128K
|
||||
)
|
||||
|
||||
# GPT-5 Pro has a maximum output of 272K
|
||||
llm_pro = ChatOpenAI(
|
||||
model="gpt-5-pro",
|
||||
max_tokens=272000
|
||||
)
|
||||
```
|
||||
|
||||
**Use Cases**:
|
||||
- Batch processing of long documents
|
||||
- Analysis of large codebases
|
||||
- Maintaining long conversation histories
|
||||
|
||||
### 2. On-Demand Software Generation
|
||||
|
||||
```python
|
||||
llm = ChatOpenAI(model="gpt-5")
|
||||
response = llm.invoke("Generate a web application")
|
||||
```
|
||||
|
||||
### 3. Advanced Reasoning Capabilities
|
||||
|
||||
**Performance Metrics**:
|
||||
- AIME 2025: 94.6%
|
||||
- SWE-bench Verified: 74.9%
|
||||
- Aider Polyglot: 88%
|
||||
- MMMU: 84.2%
|
||||
|
||||
### 4. GPT-5.1 Adaptive Reasoning
|
||||
|
||||
Automatically adjusts thinking time based on problem complexity:
|
||||
|
||||
```python
|
||||
# Balance between speed and accuracy
|
||||
llm = ChatOpenAI(model="gpt-5.1-instant")
|
||||
|
||||
# Tasks requiring deep thought
|
||||
llm = ChatOpenAI(model="gpt-5.1-thinking")
|
||||
```
|
||||
|
||||
**Compaction Technology**: GPT-5.1 introduces technology that effectively handles longer contexts.
|
||||
|
||||
### 5. GPT-5 Pro - Extended Reasoning
|
||||
|
||||
Advanced reasoning for enterprise and research environments. **Maximum output of 272K tokens**:
|
||||
|
||||
```python
|
||||
llm = ChatOpenAI(
|
||||
model="gpt-5-pro",
|
||||
max_tokens=272000 # Larger output possible than other models
|
||||
)
|
||||
# More detailed and reliable responses
|
||||
```
|
||||
|
||||
### 6. Code-Specialized Models
|
||||
|
||||
```python
|
||||
# Used in GitHub Copilot
|
||||
llm = ChatOpenAI(model="gpt-5.1-codex")
|
||||
|
||||
# Compact version
|
||||
llm = ChatOpenAI(model="gpt-5.1-codex-mini")
|
||||
```
|
||||
|
||||
## Multimodal Support
|
||||
|
||||
GPT-5 supports images and audio (see [Advanced Features](06_llm_model_ids_openai_advanced.md) for details).
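
A minimal image-input sketch with a local file, base64-encoded inline (the file path is illustrative; the advanced guide covers the full vision API):

```python
import base64

from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

llm = ChatOpenAI(model="gpt-5")

# Read and base64-encode a local image (example path)
with open("diagram.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

message = HumanMessage(
    content=[
        {"type": "text", "text": "Explain this diagram"},
        {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
    ]
)
response = llm.invoke([message])
```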
|
||||
|
||||
## JSON Mode
|
||||
|
||||
When structured output is needed:
|
||||
|
||||
```python
|
||||
llm = ChatOpenAI(
|
||||
model="gpt-5",
|
||||
model_kwargs={"response_format": {"type": "json_object"}}
|
||||
)
|
||||
```
|
||||
|
||||
## Retrieving Model List
|
||||
|
||||
```python
|
||||
from openai import OpenAI
|
||||
import os
|
||||
|
||||
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
|
||||
models = client.models.list()
|
||||
|
||||
for model in models:
|
||||
if model.id.startswith("gpt-5"):
|
||||
print(model.id)
|
||||
```
|
||||
|
||||
## Detailed Documentation
|
||||
|
||||
For advanced settings, vision features, and Azure OpenAI:
|
||||
- **[OpenAI Advanced Features](06_llm_model_ids_openai_advanced.md)**
|
||||
|
||||
## Reference Links
|
||||
|
||||
- [OpenAI GPT-5](https://openai.com/index/introducing-gpt-5/)
|
||||
- [OpenAI GPT-5.1](https://openai.com/index/gpt-5-1/)
|
||||
- [OpenAI Platform](https://platform.openai.com/)
|
||||
- [LangChain Integration](https://docs.langchain.com/oss/python/integrations/chat/openai)
|
||||
289
skills/langgraph-master/06_llm_model_ids_openai_advanced.md
Normal file
289
skills/langgraph-master/06_llm_model_ids_openai_advanced.md
Normal file
@@ -0,0 +1,289 @@
|
||||
# OpenAI GPT-5 Advanced Features
|
||||
|
||||
Advanced settings and multimodal features for GPT-5 models.
|
||||
|
||||
## Parameter Settings
|
||||
|
||||
```python
|
||||
from langchain_openai import ChatOpenAI
|
||||
|
||||
llm = ChatOpenAI(
|
||||
model="gpt-5",
|
||||
temperature=0.7, # Creativity (0.0-2.0)
|
||||
max_tokens=128000, # Max output (GPT-5: 128K)
|
||||
top_p=0.9, # Diversity
|
||||
frequency_penalty=0.0, # Repetition penalty
|
||||
presence_penalty=0.0, # Topic diversity
|
||||
)
|
||||
|
||||
# GPT-5 Pro (larger max output)
|
||||
llm_pro = ChatOpenAI(
|
||||
model="gpt-5-pro",
|
||||
max_tokens=272000, # GPT-5 Pro: 272K
|
||||
)
|
||||
```
|
||||
|
||||
## Context Window and Output Limits
|
||||
|
||||
| Model | Context Window | Max Output Tokens |
|
||||
|--------|-------------------|---------------|
|
||||
| `gpt-5` | 400,000 (API) | 128,000 |
|
||||
| `gpt-5-mini` | 400,000 (API) | 128,000 |
|
||||
| `gpt-5-nano` | 400,000 (API) | 128,000 |
|
||||
| `gpt-5-pro` | 400,000 | 272,000 |
|
||||
| `gpt-5.1` | 128,000 (ChatGPT) / 400,000 (API) | 128,000 |
|
||||
| `gpt-5.1-codex` | 400,000 | 128,000 |
|
||||
|
||||
**Note**: Context window is the combined length of input + output.
|
||||
|
||||
## Vision (Image Processing)
|
||||
|
||||
```python
|
||||
from langchain_openai import ChatOpenAI
|
||||
from langchain_core.messages import HumanMessage
|
||||
|
||||
llm = ChatOpenAI(model="gpt-5")
|
||||
|
||||
message = HumanMessage(
|
||||
content=[
|
||||
{"type": "text", "text": "What is shown in this image?"},
|
||||
{
|
||||
"type": "image_url",
|
||||
"image_url": {
|
||||
"url": "https://example.com/image.jpg",
|
||||
"detail": "high" # "low", "high", "auto"
|
||||
}
|
||||
}
|
||||
]
|
||||
)
|
||||
|
||||
response = llm.invoke([message])
|
||||
```
|
||||
|
||||
## Tool Use (Function Calling)
|
||||
|
||||
```python
|
||||
from langchain_openai import ChatOpenAI
|
||||
from langchain_core.tools import tool
|
||||
|
||||
@tool
|
||||
def get_weather(location: str) -> str:
|
||||
"""Get weather"""
|
||||
return f"The weather in {location} is sunny"
|
||||
|
||||
@tool
|
||||
def calculate(expression: str) -> float:
|
||||
"""Calculate"""
|
||||
return eval(expression)
|
||||
|
||||
llm = ChatOpenAI(model="gpt-5")
|
||||
llm_with_tools = llm.bind_tools([get_weather, calculate])
|
||||
|
||||
response = llm_with_tools.invoke("Tell me the weather in Tokyo and 2+2")
|
||||
print(response.tool_calls)
|
||||
```
|
||||
|
||||
## Parallel Tool Calling
|
||||
|
||||
```python
|
||||
@tool
|
||||
def get_stock_price(symbol: str) -> float:
|
||||
"""Get stock price"""
|
||||
return 150.25
|
||||
|
||||
@tool
|
||||
def get_company_info(symbol: str) -> dict:
|
||||
"""Get company information"""
|
||||
return {"name": "Apple Inc.", "industry": "Technology"}
|
||||
|
||||
llm = ChatOpenAI(model="gpt-5")
|
||||
llm_with_tools = llm.bind_tools([get_stock_price, get_company_info])
|
||||
|
||||
# Call multiple tools in parallel
|
||||
response = llm_with_tools.invoke("Tell me the stock price and company info for AAPL")
|
||||
```
|
||||
|
||||
## Streaming
|
||||
|
||||
```python
|
||||
llm = ChatOpenAI(
|
||||
model="gpt-5",
|
||||
streaming=True
|
||||
)
|
||||
|
||||
for chunk in llm.stream("Question"):
|
||||
print(chunk.content, end="", flush=True)
|
||||
```
|
||||
|
||||
## JSON Mode
|
||||
|
||||
```python
|
||||
llm = ChatOpenAI(
|
||||
model="gpt-5",
|
||||
model_kwargs={"response_format": {"type": "json_object"}}
|
||||
)
|
||||
|
||||
response = llm.invoke("Return user information in JSON format")
|
||||
```
|
||||
|
||||
## Using GPT-5.1 Adaptive Reasoning
|
||||
|
||||
### Instant Mode
|
||||
|
||||
Balance between speed and accuracy:
|
||||
|
||||
```python
|
||||
llm = ChatOpenAI(model="gpt-5.1-instant")
|
||||
|
||||
# Adaptively adjusts reasoning time
|
||||
response = llm.invoke("Solve this problem...")
|
||||
```
|
||||
|
||||
### Thinking Mode
|
||||
|
||||
Deep thought for complex problems:
|
||||
|
||||
```python
|
||||
llm = ChatOpenAI(model="gpt-5.1-thinking")
|
||||
|
||||
# Improves accuracy with longer thinking time
|
||||
response = llm.invoke("Complex math problem...")
|
||||
```
|
||||
|
||||
## Leveraging GPT-5 Pro
|
||||
|
||||
Extended reasoning for enterprise and research environments:
|
||||
|
||||
```python
|
||||
llm = ChatOpenAI(
|
||||
model="gpt-5-pro",
|
||||
temperature=0.3, # Precision-focused
|
||||
max_tokens=272000 # Large output possible
|
||||
)
|
||||
|
||||
# More detailed and reliable responses
|
||||
response = llm.invoke("Detailed analysis of...")
|
||||
```
|
||||
|
||||
## Code Generation Specialized Models
|
||||
|
||||
```python
|
||||
# Codex used in GitHub Copilot
|
||||
llm = ChatOpenAI(model="gpt-5.1-codex")
|
||||
|
||||
response = llm.invoke("Implement quicksort in Python")
|
||||
|
||||
# Compact version (fast)
|
||||
llm_mini = ChatOpenAI(model="gpt-5.1-codex-mini")
|
||||
```
|
||||
|
||||
## Tracking Token Usage
|
||||
|
||||
```python
|
||||
from langchain.callbacks import get_openai_callback
|
||||
|
||||
llm = ChatOpenAI(model="gpt-5")
|
||||
|
||||
with get_openai_callback() as cb:
|
||||
response = llm.invoke("Question")
|
||||
print(f"Total Tokens: {cb.total_tokens}")
|
||||
print(f"Prompt Tokens: {cb.prompt_tokens}")
|
||||
print(f"Completion Tokens: {cb.completion_tokens}")
|
||||
print(f"Total Cost (USD): ${cb.total_cost}")
|
||||
```
|
||||
|
||||
## Azure OpenAI Service
|
||||
|
||||
GPT-5 is also available on Azure:
|
||||
|
||||
```python
|
||||
from langchain_openai import AzureChatOpenAI
|
||||
|
||||
llm = AzureChatOpenAI(
|
||||
azure_endpoint="https://your-resource.openai.azure.com/",
|
||||
api_key="your-azure-api-key",
|
||||
api_version="2024-12-01-preview",
|
||||
deployment_name="gpt-5",
|
||||
model="gpt-5"
|
||||
)
|
||||
```
|
||||
|
||||
### Environment Variables (Azure)
|
||||
|
||||
```bash
|
||||
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/"
|
||||
export AZURE_OPENAI_API_KEY="your-azure-api-key"
|
||||
export AZURE_OPENAI_DEPLOYMENT_NAME="gpt-5"
|
||||
```
|
||||
|
||||
## Error Handling
|
||||
|
||||
```python
|
||||
from langchain_openai import ChatOpenAI
|
||||
from openai import OpenAIError, RateLimitError
|
||||
|
||||
try:
|
||||
llm = ChatOpenAI(model="gpt-5")
|
||||
response = llm.invoke("Question")
|
||||
except RateLimitError:
|
||||
print("Rate limit reached")
|
||||
except OpenAIError as e:
|
||||
print(f"OpenAI error: {e}")
|
||||
```
|
||||
|
||||
## Handling Rate Limits
|
||||
|
||||
```python
|
||||
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential
from openai import RateLimitError

@retry(
    wait=wait_exponential(multiplier=1, min=4, max=60),
    stop=stop_after_attempt(5),
    retry=retry_if_exception_type(RateLimitError),  # retry only on rate-limit errors
)
def invoke_with_retry(llm, messages):
    return llm.invoke(messages)

llm = ChatOpenAI(model="gpt-5")
response = invoke_with_retry(llm, "Question")
|
||||
```
|
||||
|
||||
## Leveraging Large Context
|
||||
|
||||
Utilizing GPT-5's 400K context window:
|
||||
|
||||
```python
|
||||
llm = ChatOpenAI(model="gpt-5")
|
||||
|
||||
# Process large amounts of documents at once
|
||||
long_document = "..." * 100000 # Long document
|
||||
|
||||
response = llm.invoke(f"""
|
||||
Please analyze the following document:
|
||||
|
||||
{long_document}
|
||||
|
||||
Provide a summary and key points.
|
||||
""")
|
||||
```
|
||||
|
||||
## Compaction Technology
|
||||
|
||||
GPT-5.1 introduces technology that effectively handles longer contexts:
|
||||
|
||||
```python
|
||||
# Processing very long conversation histories or documents
|
||||
llm = ChatOpenAI(model="gpt-5.1")
|
||||
|
||||
# Efficiently processed through Compaction
|
||||
response = llm.invoke(very_long_context)
|
||||
```
|
||||
|
||||
## Reference Links
|
||||
|
||||
- [OpenAI GPT-5 Documentation](https://openai.com/gpt-5/)
|
||||
- [OpenAI GPT-5.1 Documentation](https://openai.com/index/gpt-5-1/)
|
||||
- [OpenAI API Reference](https://platform.openai.com/docs/api-reference)
|
||||
- [OpenAI Platform Models](https://platform.openai.com/docs/models)
|
||||
- [Azure OpenAI Documentation](https://learn.microsoft.com/azure/ai-services/openai/)
|
||||
131
skills/langgraph-master/README.md
Normal file
131
skills/langgraph-master/README.md
Normal file
@@ -0,0 +1,131 @@
# langgraph-master

**PROACTIVE SKILL** - Comprehensive guide for building AI agents with LangGraph. Claude invokes this skill automatically when LangGraph development is detected, providing architecture patterns, implementation guidance, and best practices.

## Installation

```
/plugin marketplace add hiroshi75/ccplugins
/plugin install langgraph-master-plugin@hiroshi75
```

## Automatic Triggers

Claude **automatically invokes** this skill when:

- **LangGraph development** - Detecting LangGraph imports or StateGraph usage
- **Agent architecture** - Planning or implementing AI agent workflows
- **Graph patterns** - Working with nodes, edges, or state management
- **Keywords detected** - When user mentions: LangGraph, StateGraph, agent workflow, node, edge, checkpointer
- **Implementation requests** - Building chatbots, RAG agents, or autonomous systems

**No manual action required** - Claude provides LangGraph expertise automatically.

## Workflow

```
Detect LangGraph context → Auto-invoke skill → Provide patterns/guidance → Implement with best practices
```

## Manual Invocation (Optional)

To manually trigger LangGraph guidance:

```
/langgraph-master-plugin:langgraph-master
```

For learning specific patterns:

```
/langgraph-master-plugin:langgraph-master "explain routing pattern"
```

## Learning Resources

The skill provides comprehensive documentation covering:

| Category | Topics | Files |
|----------|--------|-------|
| **Core Concepts** | State, Node, Edge fundamentals | 01_core_concepts_*.md |
| **Architecture** | 6 major graph patterns (Routing, Agent, etc.) | 02_graph_architecture_*.md |
| **Memory** | Checkpointer, Store, Persistence | 03_memory_management_*.md |
| **Tools** | Tool definition, Command API, Tool Node | 04_tool_integration_*.md |
| **Advanced** | Human-in-the-Loop, Streaming, Map-Reduce | 05_advanced_features_*.md |
| **Models** | Gemini, Claude, OpenAI model IDs | 06_llm_model_ids*.md |
| **Examples** | Chatbot, RAG agent implementations | example_*.md |

## Subagent: langgraph-engineer

The skill includes a specialized **langgraph-master-plugin:langgraph-engineer** subagent for efficient parallel development:

### Key Features

- **Functional Module Scope**: Implements complete features (2-5 nodes) as cohesive units
- **Parallel Execution**: Multiple subagents can develop different modules simultaneously
- **Production-Ready**: No TODOs or placeholders, fully functional code only
- **Skill-Driven**: Always references langgraph-master documentation before implementation

### When to Use

1. **Feature Module Implementation**: RAG search, intent analysis, approval workflows
2. **Subgraph Patterns**: Complete functional units with nodes, edges, and state
3. **Tool Integration**: Full tool integration modules with error handling

### Parallel Development Pattern

```
Planner → Decompose into functional modules
  ├─ langgraph-engineer 1: Intent analysis module (parallel)
  │   └─ analyze + classify + route nodes
  └─ langgraph-engineer 2: RAG search module (parallel)
      └─ retrieve + rerank + generate nodes
Orchestrator → Integrate modules into complete graph
```

## How It Works

1. **Context Detection** - Claude monitors LangGraph-related activities
2. **Trigger Evaluation** - Checks if auto-invoke conditions are met
3. **Skill Invocation** - Automatically invokes langgraph-master skill
4. **Pattern Guidance** - Provides architecture patterns and best practices
5. **Implementation Support** - Assists with code generation using documented patterns

## Example Use Cases

### Automatic Guidance

```python
# Claude detects LangGraph usage and automatically provides guidance
from typing import TypedDict

from langgraph.graph import StateGraph

# Skill auto-invoked → Provides state management patterns
class AgentState(TypedDict):
    messages: list[str]
```

### Pattern Implementation

```
User: "Build a RAG agent with LangGraph"
Claude: [Auto-invokes skill]
  → Provides RAG architecture pattern
  → Suggests node structure (retrieve → rerank → generate)
  → Implements with checkpointer for state persistence
```

### Subagent Delegation

```
User: "Create a chatbot with intent classification and RAG search"
Claude: → Decomposes into 2 modules
  → Spawns langgraph-engineer for each module (parallel)
  → Integrates completed modules into final graph
```

## Benefits

- **Faster Development**: Pre-validated architecture patterns reduce trial and error
- **Best Practices**: Automatically applies LangGraph best practices and conventions
- **Parallel Implementation**: Efficient development through subagent delegation
- **Complete Documentation**: 40+ documentation files covering all aspects
- **Production-Ready**: Guidance ensures robust, maintainable implementations

## Reference Links

- [LangGraph Official Docs](https://docs.langchain.com/oss/python/langgraph/overview)
- [LangGraph GitHub](https://github.com/langchain-ai/langgraph)
193
skills/langgraph-master/SKILL.md
Normal file
193
skills/langgraph-master/SKILL.md
Normal file
@@ -0,0 +1,193 @@
---
name: langgraph-master
description: Use when specifying or implementing LangGraph applications - from architecture planning and specification writing to actual code implementation. Also use for designing agent workflows or learning LangGraph patterns. This is a comprehensive guide for building AI agents with LangGraph, covering core concepts, architecture patterns, memory management, tool integration, and advanced features.
---

# LangGraph Agent Construction Skill

A comprehensive guide for building AI agents using LangGraph.

## 📚 Learning Content

### [01. Core Concepts](01_core_concepts_overview.md)

Understanding the three core elements of LangGraph

- [State](01_core_concepts_state.md)
- [Node](01_core_concepts_node.md)
- [Edge](01_core_concepts_edge.md)
- Advantages of the graph-based approach

### [02. Graph Architecture](02_graph_architecture_overview.md)

Six major graph patterns and agent design

- [Workflow vs Agent Differences](02_graph_architecture_workflow_vs_agent.md)
- [Prompt Chaining (Sequential Processing)](02_graph_architecture_prompt_chaining.md)
- [Parallelization](02_graph_architecture_parallelization.md)
- [Routing (Branching)](02_graph_architecture_routing.md)
- [Orchestrator-Worker](02_graph_architecture_orchestrator_worker.md)
- [Evaluator-Optimizer](02_graph_architecture_evaluator_optimizer.md)
- [Agent (Autonomous Tool Usage)](02_graph_architecture_agent.md)
- [Subgraph](02_graph_architecture_subgraph.md)

### [03. Memory Management](03_memory_management_overview.md)

Persistence and checkpoint functionality

- [Checkpointer](03_memory_management_checkpointer.md)
- [Store (Long-term Memory)](03_memory_management_store.md)
- [Persistence](03_memory_management_persistence.md)

### [04. Tool Integration](04_tool_integration_overview.md)

External tool integration and execution control

- [Tool Definition](04_tool_integration_tool_definition.md)
- [Command API (Control API)](04_tool_integration_command_api.md)
- [Tool Node](04_tool_integration_tool_node.md)

### [05. Advanced Features](05_advanced_features_overview.md)

Advanced functionality and implementation patterns

- [Human-in-the-Loop (Approval Flow)](05_advanced_features_human_in_the_loop.md)
- [Streaming](05_advanced_features_streaming.md)
- [Map-Reduce Pattern](05_advanced_features_map_reduce.md)

### [06. LLM Model IDs](06_llm_model_ids.md)

Model ID reference for major LLM providers. Always refer to this document when selecting model IDs. Do not use models not listed in this document.

- Google Gemini model list
- Anthropic Claude model list
- OpenAI GPT model list
- Usage examples and best practices with LangGraph

### Implementation Examples

Practical agent implementation examples

- [Basic Chatbot](example_basic_chatbot.md)
- [RAG Agent](example_rag_agent.md)

## 📖 How to Use

Each section can be read independently, but reading them in order is recommended:

1. First understand LangGraph fundamentals in "Core Concepts"
2. Learn design patterns in "Graph Architecture"
3. Grasp implementation details in "Memory Management" and "Tool Integration"
4. Master advanced features in "Advanced Features"
5. Check practical usage in "Implementation Examples"

Each file is kept short and concise, allowing you to reference only the sections you need.

## 🤖 Efficient Implementation: Utilizing Subagents

To accelerate LangGraph application development, utilize the dedicated subagent `langgraph-master-plugin:langgraph-engineer`.

### Subagent Characteristics

**langgraph-master-plugin:langgraph-engineer** is an agent specialized in implementing functional modules:

- **Functional Unit Scope**: Implements complete functionality with multiple nodes, edges, and state definitions as a set
- **Parallel Execution Optimization**: Designed for multiple agents to develop different functional modules simultaneously
- **Skill-Driven**: Always references the langgraph-master skill before implementation
- **Complete Implementation**: Generates fully functional modules (no TODOs or placeholders)
- **Appropriate Size**: Functional units of about 2-5 nodes (subgraphs, workflow patterns, tool integrations, etc.)

### When to Use

Use langgraph-master-plugin:langgraph-engineer in the following cases:

1. **When functional module implementation is needed**
   - Decompose the application into functional units
   - Efficiently develop each function through parallel execution

2. **Subgraph and pattern implementation**
   - RAG search functionality (retrieve → rerank → generate)
   - Human-in-the-Loop approval flow (propose → wait_approval → execute)
   - Intent analysis functionality (analyze → classify → route)

3. **Tool integration and memory setup**
   - Complete tool integration module (definition → execution → processing → error handling)
   - Memory management module (checkpoint setup → persistence → restoration)

### Practical Example

**Task**: Build a chatbot with intent analysis and RAG search

**Parallel Execution Pattern**:

```
Planner → Decompose into functional units
  ├─ langgraph-master-plugin:langgraph-engineer 1: Intent analysis module (parallel)
  │   └─ analyze + classify + route nodes + conditional edges
  └─ langgraph-master-plugin:langgraph-engineer 2: RAG search module (parallel)
      └─ retrieve + rerank + generate nodes + state management
Orchestrator → Integrate modules to assemble graph
```

### Usage Method

1. **Decompose into functional modules**
   - Decompose large LangGraph applications into functional units
   - Verify that each module can be implemented and tested independently
   - Verify that module size is appropriate (about 2-5 nodes)

2. **Implement common parts first** (see the state sketch after this list)
   - State used across the entire graph
   - Common tool definitions and common nodes used throughout

3. **Parallel Execution**
   - Assign one functional module implementation to each langgraph-master-plugin:langgraph-engineer agent and execute in parallel
   - Implement independent functional modules simultaneously

4. **Integration**
   - Incorporate completed modules into the graph
   - Verify operation through integration testing

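To make step 2 concrete, the following is a minimal sketch of a shared state that every module builds on. It is an illustrative assumption rather than part of the plugin: the field names (`intent`, `documents`) stand in for whatever your application actually shares between modules.

```python
from typing import Annotated, TypedDict

from langchain_core.messages import AnyMessage
from langgraph.graph.message import add_messages


class AppState(TypedDict):
    """Shared state implemented before modules are parallelized (hypothetical fields)."""

    # Conversation history shared by every module
    messages: Annotated[list[AnyMessage], add_messages]
    # Written by the intent analysis module, read by routing
    intent: str
    # Written by the RAG module, read by the answer generator
    documents: list[str]
```

Each langgraph-engineer task then receives this state definition, so independently developed modules agree on the same contract.
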
### Testing Method

- Perform unit testing for each functional module
- Verify end-to-end operation after integration; an API key is usually available in `.env`, so load it and run at least one known-good test case (see the sketch after this list)
- If the known-good case fails, review the code, but also narrow down the likely location, add targeted logging to identify the cause, and only then apply a fix

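A minimal sketch of that smoke test, assuming `python-dotenv` for loading `.env` and a hypothetical `app/graph.py` module that exposes the compiled graph:

```python
from dotenv import load_dotenv

from app.graph import graph  # hypothetical module exposing the compiled graph

# Load the API key (e.g. ANTHROPIC_API_KEY) from .env before invoking the graph
load_dotenv()

config = {"configurable": {"thread_id": "smoke-test-1"}}
result = graph.invoke(
    {"messages": [{"role": "user", "content": "What is LangGraph?"}]},
    config,
)

# A known-good case should at least produce a non-empty assistant reply
assert result["messages"][-1].content
print(result["messages"][-1].content)
```
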
### Functional Module Examples

**Appropriate Size (langgraph-master-plugin:langgraph-engineer scope)** (see the subgraph sketch after these lists):

- RAG search functionality: retrieve + rerank + generate (3 nodes)
- Intent analysis: analyze + classify + route (2-3 nodes)
- Approval workflow: propose + wait_approval + execute (3 nodes)
- Tool integration: tool_call + execute + process + error_handling (3-4 nodes)

**Too Small (individual implementation is sufficient)**:

- Single node only
- Single edge only
- State field definition only

**Too Large (further decomposition needed)**:

- Complete chatbot application
- Entire system containing multiple independent functions

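For scale, here is a minimal sketch of an appropriately sized module: the RAG search unit from the first list, built as a self-contained subgraph. The node bodies are hypothetical placeholders; only the shape (three nodes plus a state contract) is the point.

```python
from typing import TypedDict

from langgraph.graph import StateGraph, START, END


class RagState(TypedDict):
    """State contract for the RAG search module (illustrative)."""
    query: str
    documents: list[str]
    answer: str


def retrieve(state: RagState) -> dict:
    # Placeholder: in practice, query a vector store here
    return {"documents": [f"doc about {state['query']}"]}


def rerank(state: RagState) -> dict:
    # Placeholder: in practice, reorder documents by relevance
    return {"documents": state["documents"]}


def generate(state: RagState) -> dict:
    # Placeholder: in practice, call an LLM with the documents as context
    return {"answer": f"Answer based on {len(state['documents'])} documents"}


builder = StateGraph(RagState)
builder.add_node("retrieve", retrieve)
builder.add_node("rerank", rerank)
builder.add_node("generate", generate)
builder.add_edge(START, "retrieve")
builder.add_edge("retrieve", "rerank")
builder.add_edge("rerank", "generate")
builder.add_edge("generate", END)

rag_module = builder.compile()  # can be embedded as a subgraph node in the parent graph
```

A module of this shape is large enough to test on its own and small enough for a single subagent to own.
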
### Notes

- **Appropriate Scope Setting**: Verify that each task is limited to one functional module
- **Functional Independence**: Minimize dependencies between modules
- **Interface Design**: Clearly document state contracts between modules
- **Integration Plan**: Plan the integration method after module implementation in advance

## 🔗 Reference Links

- [LangGraph Official Documentation](https://docs.langchain.com/oss/python/langgraph/overview)
- [LangGraph GitHub](https://github.com/langchain-ai/langgraph)
117
skills/langgraph-master/example_basic_chatbot.md
Normal file
117
skills/langgraph-master/example_basic_chatbot.md
Normal file
@@ -0,0 +1,117 @@
# Basic Chatbot

Implementation example of a basic chatbot using LangGraph.

## Complete Code

```python
from typing import Annotated

from langgraph.graph import StateGraph, START, END, MessagesState
from langgraph.graph.message import add_messages
from langgraph.checkpoint.memory import MemorySaver
from langchain_anthropic import ChatAnthropic

# 1. Initialize LLM
llm = ChatAnthropic(model="claude-sonnet-4-5-20250929")

# 2. Define node
def chatbot_node(state: MessagesState):
    """Chatbot node"""
    response = llm.invoke(state["messages"])
    return {"messages": [response]}

# 3. Build graph
builder = StateGraph(MessagesState)
builder.add_node("chatbot", chatbot_node)
builder.add_edge(START, "chatbot")
builder.add_edge("chatbot", END)

# 4. Compile with checkpointer
checkpointer = MemorySaver()
graph = builder.compile(checkpointer=checkpointer)

# 5. Execute
config = {"configurable": {"thread_id": "conversation-1"}}

while True:
    user_input = input("User: ")
    if user_input.lower() in ["quit", "exit", "q"]:
        break

    # Send message
    for chunk in graph.stream(
        {"messages": [{"role": "user", "content": user_input}]},
        config,
        stream_mode="values"
    ):
        chunk["messages"][-1].pretty_print()
```

## Explanation

### 1. MessagesState

```python
# The prebuilt state: from langgraph.graph import MessagesState
# It is equivalent to:
from typing import Annotated, TypedDict

from langchain_core.messages import AnyMessage
from langgraph.graph.message import add_messages


class MessagesState(TypedDict):
    messages: Annotated[list[AnyMessage], add_messages]
```

- `messages`: List of messages
- `add_messages`: Reducer that appends new messages to the existing list

### 2. Checkpointer

```python
from langgraph.checkpoint.memory import MemorySaver

checkpointer = MemorySaver()
graph = builder.compile(checkpointer=checkpointer)
```

- Saves conversation state
- Continues the conversation when the same `thread_id` is reused (see the sketch after this list)

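A minimal sketch of what that persistence gives you, assuming the `graph` and `config` defined in the complete code above: calls that reuse the same `thread_id` share history, while a new `thread_id` starts fresh.

```python
config = {"configurable": {"thread_id": "conversation-1"}}

# First turn
graph.invoke({"messages": [{"role": "user", "content": "My name is Hiroshi."}]}, config)

# Second turn on the same thread: the checkpointer restores the earlier messages,
# so the model can answer using the name given above
result = graph.invoke({"messages": [{"role": "user", "content": "What is my name?"}]}, config)
result["messages"][-1].pretty_print()

# A different thread_id starts an unrelated conversation with empty history
other = {"configurable": {"thread_id": "conversation-2"}}
graph.invoke({"messages": [{"role": "user", "content": "What is my name?"}]}, other)
```
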
### 3. Streaming

```python
for chunk in graph.stream(input, config, stream_mode="values"):
    chunk["messages"][-1].pretty_print()
```

- `stream_mode="values"`: Emits the complete state after each step (an alternative mode is sketched after this list)
- `pretty_print()`: Displays messages in a readable format

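For comparison, a sketch of `stream_mode="updates"`, which yields only the state delta each node returns instead of the full state; it assumes the same `graph` and `config` as above.

```python
for chunk in graph.stream(
    {"messages": [{"role": "user", "content": "Hello"}]},
    config,
    stream_mode="updates",
):
    # Each chunk maps a node name to the partial state update it returned,
    # e.g. {"chatbot": {"messages": [AIMessage(...)]}}
    for node_name, update in chunk.items():
        print(node_name, update["messages"][-1].content)
```
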
## Extension Examples

### Adding System Message

```python
def chatbot_with_system(state: MessagesState):
    """With system message"""
    system_msg = {
        "role": "system",
        "content": "You are a helpful assistant."
    }

    response = llm.invoke([system_msg] + state["messages"])
    return {"messages": [response]}
```

### Limiting Message History

```python
def chatbot_with_limit(state: MessagesState):
    """Use only the latest 10 messages"""
    recent_messages = state["messages"][-10:]
    response = llm.invoke(recent_messages)
    return {"messages": [response]}
```

## Related Pages

- [01_core_concepts_overview.md](01_core_concepts_overview.md) - Understanding fundamental concepts
- [03_memory_management_overview.md](03_memory_management_overview.md) - Checkpointer details
- [example_rag_agent.md](example_rag_agent.md) - More advanced example
169
skills/langgraph-master/example_rag_agent.md
Normal file
169
skills/langgraph-master/example_rag_agent.md
Normal file
@@ -0,0 +1,169 @@
# RAG Agent

Implementation example of a RAG (Retrieval-Augmented Generation) agent with search functionality.

## Complete Code

```python
from typing import Annotated, Literal

from langgraph.graph import StateGraph, START, END, MessagesState
from langgraph.prebuilt import ToolNode
from langgraph.checkpoint.memory import MemorySaver
from langchain_anthropic import ChatAnthropic
from langchain_core.tools import tool

# 1. Define tool
@tool
def retrieve_documents(query: str) -> str:
    """Retrieve relevant documents.

    Args:
        query: Search query
    """
    # In practice, search with vector store, etc.
    # Using dummy data here
    docs = [
        "LangGraph is an agent framework.",
        "StateGraph manages state.",
        "You can extend agents with tools."
    ]

    return "\n".join(docs)

tools = [retrieve_documents]

# 2. Bind tools to LLM
llm = ChatAnthropic(model="claude-sonnet-4-5-20250929")
llm_with_tools = llm.bind_tools(tools)

# 3. Define nodes
def agent_node(state: MessagesState):
    """Agent node"""
    response = llm_with_tools.invoke(state["messages"])
    return {"messages": [response]}

def should_continue(state: MessagesState) -> Literal["tools", "end"]:
    """Determine tool usage"""
    last_message = state["messages"][-1]

    if last_message.tool_calls:
        return "tools"
    return "end"

# 4. Build graph
builder = StateGraph(MessagesState)

builder.add_node("agent", agent_node)
builder.add_node("tools", ToolNode(tools))

builder.add_edge(START, "agent")
builder.add_conditional_edges(
    "agent",
    should_continue,
    {
        "tools": "tools",
        "end": END
    }
)
builder.add_edge("tools", "agent")

# 5. Compile
checkpointer = MemorySaver()
graph = builder.compile(checkpointer=checkpointer)

# 6. Execute
config = {"configurable": {"thread_id": "rag-session-1"}}

query = "What is LangGraph?"

for chunk in graph.stream(
    {"messages": [{"role": "user", "content": query}]},
    config,
    stream_mode="values"
):
    chunk["messages"][-1].pretty_print()
```

## Execution Flow

```
User Query: "What is LangGraph?"
        ↓
   [Agent Node]
        ↓
LLM: "I'll search for information" + ToolCall(retrieve_documents)
        ↓
   [Tool Node] ← Execute search
        ↓
ToolMessage: "LangGraph is an agent framework..."
        ↓
   [Agent Node] ← Use search results
        ↓
LLM: "LangGraph is a framework for building agents..."
        ↓
       END
```

## Extension Examples

### Multiple Search Tools

```python
@tool
def web_search(query: str) -> str:
    """Search the web"""
    # search_web is a placeholder for your web search client
    return search_web(query)

@tool
def database_search(query: str) -> str:
    """Search database"""
    # search_database is a placeholder for your database query helper
    return search_database(query)

tools = [retrieve_documents, web_search, database_search]
```

### Vector Search Implementation

```python
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

# Initialize vector store
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_texts(
    ["LangGraph is an agent framework.", ...],
    embeddings
)

@tool
def semantic_search(query: str) -> str:
    """Perform semantic search"""
    docs = vectorstore.similarity_search(query, k=3)
    return "\n".join([doc.page_content for doc in docs])
```

### Adding Human-in-the-Loop

```python
from langgraph.types import interrupt

@tool
def sensitive_search(query: str) -> str:
    """Search sensitive information (requires approval)"""
    approved = interrupt({
        "action": "sensitive_search",
        "query": query,
        "message": "Approve this sensitive search?"
    })

    if approved:
        # perform_sensitive_search is a placeholder for the actual privileged search
        return perform_sensitive_search(query)
    else:
        return "Search cancelled by user"
```

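When the tool reaches `interrupt`, the run pauses and the payload is surfaced to the caller. Resuming requires replying on the same thread; a minimal sketch, assuming the compiled `graph` and `config` from the complete code above:

```python
from langgraph.types import Command

# Resume the paused run on the same thread; the value passed to resume
# becomes the return value of interrupt() inside sensitive_search
result = graph.invoke(Command(resume=True), config)
result["messages"][-1].pretty_print()
```
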
## Related Pages

- [02_graph_architecture_agent.md](02_graph_architecture_agent.md) - Agent pattern
- [04_tool_integration_overview.md](04_tool_integration_overview.md) - Tool details
- [example_basic_chatbot.md](example_basic_chatbot.md) - Basic chatbot