Initial commit

skills/arch-analysis/SKILL.md (new file, 471 lines)

---
name: arch-analysis
description: Analyze LangGraph application architecture, identify bottlenecks, and propose multiple improvement strategies
---

# LangGraph Architecture Analysis Skill
|
||||
|
||||
A skill for analyzing LangGraph application architecture, identifying bottlenecks, and proposing multiple improvement strategies.
|
||||
|
||||
## 📋 Overview
|
||||
|
||||
This skill analyzes existing LangGraph applications and proposes graph structure improvements:
|
||||
|
||||
1. **Current State Analysis**: Performance measurement and graph structure understanding
|
||||
2. **Problem Identification**: Organizing bottlenecks and architectural issues
|
||||
3. **Improvement Proposals**: Generate 3-5 diverse improvement proposals (**all candidates for parallel exploration**)
|
||||
|
||||
**Important**:
|
||||
- This skill only performs analysis and proposals. It does not implement changes.
|
||||
- **Output all improvement proposals**. The arch-tune command will implement and evaluate them in parallel.
|
||||
|
||||
## 🎯 When to Use
|
||||
|
||||
Use this skill in the following situations:
|
||||
|
||||
1. **When performance improvement of existing applications is needed**
|
||||
- Latency exceeds targets
|
||||
- Cost is too high
|
||||
- Accuracy is insufficient
|
||||
|
||||
2. **When considering architecture-level improvements**
|
||||
- Prompt optimization (fine-tune) has limitations
|
||||
- Graph structure changes are needed
|
||||
- Considering introduction of new patterns
|
||||
|
||||
3. **When you want to compare multiple improvement options**
|
||||
- Unclear which architecture is optimal
|
||||
- Want to understand trade-offs
|
||||
|
||||
## 📖 Analysis and Proposal Workflow
|
||||
|
||||
### Step 1: Verify Evaluation Environment
|
||||
|
||||
**Purpose**: Prepare for performance measurement
|
||||
|
||||
**Actions**:
|
||||
1. Verify existence of evaluation program (`.langgraph-master/evaluation/` or specified directory)
|
||||
2. If not present, confirm evaluation criteria with user and create
|
||||
3. Verify test cases
|
||||
|
||||
**Output**: Evaluation program ready
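
As a rough illustration of this check, a minimal sketch (assuming the default `.langgraph-master/evaluation/` location; the user may specify a different directory):

```python
from pathlib import Path

# Default location used by this skill; the user may point to a different directory.
eval_dir = Path(".langgraph-master/evaluation")

if eval_dir.is_dir() and any(eval_dir.glob("*.py")):
    print(f"Evaluation program found: {eval_dir}")
else:
    print("No evaluation program found - confirm evaluation criteria with the user and create one.")
```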
|
||||
|
||||
### Step 2: Measure Current Performance
|
||||
|
||||
**Purpose**: Establish baseline
|
||||
|
||||
**Actions**:
|
||||
1. Run test cases 3-5 times
|
||||
2. Record each metric (accuracy, latency, cost, etc.)
|
||||
3. Calculate statistics (mean, standard deviation, min, max)
|
||||
4. Save as baseline
|
||||
|
||||
**Output**: `baseline_performance.json`
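
A minimal sketch of this measurement loop, assuming a hypothetical `run_evaluation()` helper that executes the test cases once and returns per-run metrics:

```python
import json
import statistics
from typing import Dict, List


def summarize(values: List[float]) -> Dict:
    """Aggregate per-run values into the statistics stored in the baseline file."""
    return {
        "mean": statistics.mean(values),
        "std": statistics.stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
    }


# run_evaluation() is a placeholder for the project's evaluation program;
# each run is assumed to return {"accuracy": ..., "latency": ..., "cost": ...}.
runs = [run_evaluation() for _ in range(5)]

baseline = {
    "iterations": len(runs),
    "test_cases": 20,
    "metrics": {m: summarize([r[m] for r in runs]) for m in ("accuracy", "latency", "cost")},
}

with open("baseline_performance.json", "w") as f:
    json.dump(baseline, f, indent=2)
```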
|
||||
|
||||
### Step 3: Analyze Graph Structure
|
||||
|
||||
**Purpose**: Understand current architecture
|
||||
|
||||
**Actions**:
|
||||
1. **Identify graph definitions with Serena MCP**
|
||||
- Search for StateGraph, MessageGraph with `find_symbol`
|
||||
- Identify graph definition files (typically `graph.py`, `main.py`, etc.)
|
||||
|
||||
2. **Analyze node and edge structure**
|
||||
- List node functions with `get_symbols_overview`
|
||||
- Verify edge types (sequential, parallel, conditional)
|
||||
- Check for subgraphs
|
||||
|
||||
3. **Understand each node's role**
|
||||
- Read node functions
|
||||
- Verify presence of LLM calls
|
||||
- Summarize processing content
|
||||
|
||||
**Output**: Graph structure documentation
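
When Serena MCP is unavailable (see Important Notes below), the same information can be gathered with a plain text search; a rough sketch, assuming the application code lives under `src/`:

```python
import re
from pathlib import Path

graph_pattern = re.compile(r"\b(StateGraph|MessageGraph)\s*\(")
node_pattern = re.compile(r"add_node\(\s*[\"'](\w+)[\"']")

# List files that build a graph and the node names they register.
for path in Path("src").rglob("*.py"):
    text = path.read_text(encoding="utf-8")
    if graph_pattern.search(text):
        nodes = node_pattern.findall(text)
        print(f"{path}: graph definition found, nodes = {nodes}")
```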
|
||||
|
||||
### Step 4: Identify Bottlenecks
|
||||
|
||||
**Purpose**: Identify performance problem areas
|
||||
|
||||
**Actions**:
|
||||
1. **Latency Bottlenecks**
|
||||
- Identify nodes with longest execution time
|
||||
- Verify delays from sequential processing
|
||||
- Discover unnecessary processing
|
||||
|
||||
2. **Cost Issues**
|
||||
- Identify high-cost nodes
|
||||
- Verify unnecessary LLM calls
|
||||
- Evaluate model selection optimality
|
||||
|
||||
3. **Accuracy Issues**
|
||||
- Identify nodes with frequent errors
|
||||
- Verify errors due to insufficient information
|
||||
- Discover architecture constraints
|
||||
|
||||
**Output**: List of issues
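
For the latency side, a small sketch of ranking nodes by mean execution time (the numbers are the illustrative figures used in the sample report below):

```python
# Per-node mean execution times in seconds (illustrative values).
node_latency = {"analyze_intent": 0.5, "retrieve_docs": 1.5, "generate_response": 1.5}

total = sum(node_latency.values())
for node, seconds in sorted(node_latency.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{node}: {seconds:.1f}s ({seconds / total:.0%} of total)")
```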
|
||||
|
||||
### Step 5: Consider Architecture Patterns
|
||||
|
||||
**Purpose**: Identify applicable LangGraph patterns
|
||||
|
||||
**Actions**:
|
||||
1. **Consider patterns based on problems**
|
||||
- Latency issues → Parallelization
|
||||
- Diverse use cases → Routing
|
||||
- Complex processing → Subgraph
|
||||
- Staged processing → Prompt Chaining, Map-Reduce
|
||||
|
||||
2. **Reference langgraph-master skill**
|
||||
- Verify characteristics of each pattern
|
||||
- Evaluate application conditions
|
||||
- Reference implementation examples
|
||||
|
||||
**Output**: List of applicable patterns
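
To make the patterns concrete, a minimal, illustrative LangGraph sketch of the Routing pattern (with a note on Parallelization); the state schema and node bodies are placeholders, not the application's actual code:

```python
from typing import TypedDict

from langgraph.graph import StateGraph, START, END


class State(TypedDict, total=False):
    question: str
    intent: str
    docs: list
    answer: str


# Placeholder node functions; in a real application each would call an LLM or a retriever.
def analyze_intent(state: State) -> dict:
    return {"intent": "simple" if len(state["question"]) < 40 else "complex"}

def retrieve_docs(state: State) -> dict:
    return {"docs": []}

def generate_response(state: State) -> dict:
    return {"answer": "detailed answer"}

def simple_response(state: State) -> dict:
    return {"answer": "short answer"}


builder = StateGraph(State)
builder.add_node("analyze_intent", analyze_intent)
builder.add_node("retrieve_docs", retrieve_docs)
builder.add_node("generate_response", generate_response)
builder.add_node("simple_response", simple_response)

builder.add_edge(START, "analyze_intent")

# Routing: branch on the analyzed intent.
builder.add_conditional_edges(
    "analyze_intent",
    lambda state: state["intent"],
    {"simple": "simple_response", "complex": "retrieve_docs"},
)
builder.add_edge("retrieve_docs", "generate_response")
builder.add_edge("simple_response", END)
builder.add_edge("generate_response", END)

graph = builder.compile()
# Parallelization would instead fan out with two edges from START, e.g.
# builder.add_edge(START, "analyze_intent") and builder.add_edge(START, "retrieve_docs").
```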
|
||||
|
||||
### Step 6: Generate Improvement Proposals
|
||||
|
||||
**Purpose**: Create 3-5 diverse improvement proposals (all candidates for parallel exploration)
|
||||
|
||||
**Actions**:
|
||||
1. **Create improvement proposals based on each pattern**
|
||||
- Change details (which nodes/edges to modify)
|
||||
- Expected effects (impact on accuracy, latency, cost)
|
||||
- Implementation complexity (low/medium/high)
|
||||
- Estimated implementation time
|
||||
|
||||
2. **Evaluate improvement proposals**
|
||||
- Feasibility
|
||||
- Risk assessment
|
||||
- Expected ROI
|
||||
|
||||
**Important**: Output all improvement proposals. The arch-tune command will **implement and evaluate all proposals in parallel**.
|
||||
|
||||
**Output**: Improvement proposal document (including all proposals)
|
||||
|
||||
### Step 7: Create Report
|
||||
|
||||
**Purpose**: Organize analysis results and proposals
|
||||
|
||||
**Actions**:
|
||||
1. Current state analysis summary
|
||||
2. Organize issues
|
||||
3. **Document all improvement proposals in `improvement_proposals.md`** (with priorities)
|
||||
4. Present recommendations for reference (first recommendation, second recommendation, reference)
|
||||
|
||||
**Important**: Output all proposals to `improvement_proposals.md`. The arch-tune command will read these and implement/evaluate them in parallel.
|
||||
|
||||
**Output**:
|
||||
- `analysis_report.md` - Current state analysis and issues
|
||||
- `improvement_proposals.md` - **All improvement proposals** (Proposal 1, 2, 3, ...)
|
||||
|
||||
## 📊 Output Formats
|
||||
|
||||
### baseline_performance.json
|
||||
|
||||
```json
|
||||
{
|
||||
"iterations": 5,
|
||||
"test_cases": 20,
|
||||
"metrics": {
|
||||
"accuracy": {
|
||||
"mean": 75.0,
|
||||
"std": 3.2,
|
||||
"min": 70.0,
|
||||
"max": 80.0
|
||||
},
|
||||
"latency": {
|
||||
"mean": 3.5,
|
||||
"std": 0.4,
|
||||
"min": 3.1,
|
||||
"max": 4.2
|
||||
},
|
||||
"cost": {
|
||||
"mean": 0.020,
|
||||
"std": 0.002,
|
||||
"min": 0.018,
|
||||
"max": 0.023
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### analysis_report.md
|
||||
|
||||
```markdown
|
||||
# Architecture Analysis Report
|
||||
|
||||
Execution Date: 2024-11-24 10:00:00
|
||||
|
||||
## Current Performance
|
||||
|
||||
| Metric | Mean | Std Dev | Target | Gap |
|
||||
|--------|------|---------|--------|-----|
|
||||
| Accuracy | 75.0% | 3.2% | 90.0% | -15.0% |
|
||||
| Latency | 3.5s | 0.4s | 2.0s | +1.5s |
|
||||
| Cost | $0.020 | $0.002 | $0.010 | +$0.010 |
|
||||
|
||||
## Graph Structure
|
||||
|
||||
### Current Configuration
|
||||
|
||||
\```
|
||||
analyze_intent → retrieve_docs → generate_response
|
||||
\```
|
||||
|
||||
- **Node Count**: 3
|
||||
- **Edge Type**: Sequential only
|
||||
- **Parallel Processing**: None
|
||||
- **Conditional Branching**: None
|
||||
|
||||
### Node Details
|
||||
|
||||
#### analyze_intent
|
||||
- **Role**: Classify user input intent
|
||||
- **LLM**: Claude 3.5 Sonnet
|
||||
- **Average Execution Time**: 0.5s
|
||||
|
||||
#### retrieve_docs
|
||||
- **Role**: Search related documents
|
||||
- **Processing**: Vector DB query + reranking
|
||||
- **Average Execution Time**: 1.5s
|
||||
|
||||
#### generate_response
|
||||
- **Role**: Generate final response
|
||||
- **LLM**: Claude 3.5 Sonnet
|
||||
- **Average Execution Time**: 1.5s
|
||||
|
||||
## Issues
|
||||
|
||||
### 1. Latency Bottleneck from Sequential Processing
|
||||
|
||||
- **Issue**: analyze_intent and retrieve_docs are sequential
|
||||
- **Impact**: Total 2.0s delay (57% of total)
|
||||
- **Improvement Potential**: reduction of 0.8s or more through parallelization
|
||||
|
||||
### 2. All Requests Follow Same Flow
|
||||
|
||||
- **Issue**: Simple and complex questions go through same processing
|
||||
- **Impact**: Unnecessary retrieve_docs execution (wasted Cost and Latency)
|
||||
- **Improvement Potential**: roughly 50% cost and latency reduction for simple cases through routing
|
||||
|
||||
### 3. Use of Low-Relevance Documents
|
||||
|
||||
- **Issue**: retrieve_docs returns only top-k (no reranking)
|
||||
- **Impact**: Low Accuracy (75%)
|
||||
- **Improvement Potential**: +10-15% improvement possible through multi-stage RAG
|
||||
|
||||
## Applicable Architecture Patterns
|
||||
|
||||
1. **Parallelization** - Parallelize analyze_intent and retrieve_docs
|
||||
2. **Routing** - Branch processing flow based on intent
|
||||
3. **Subgraph** - Dedicated subgraph for RAG processing (retrieve → rerank → select)
|
||||
4. **Orchestrator-Worker** - Execute multiple retrievers in parallel and integrate results
|
||||
```
|
||||
|
||||
### improvement_proposals.md
|
||||
|
||||
```markdown
|
||||
# Architecture Improvement Proposals
|
||||
|
||||
Proposal Date: 2024-11-24 10:30:00
|
||||
|
||||
## Proposal 1: Parallel Document Retrieval + Intent Analysis
|
||||
|
||||
### Changes
|
||||
|
||||
**Current**:
|
||||
\```
|
||||
analyze_intent → retrieve_docs → generate_response
|
||||
\```
|
||||
|
||||
**After Change**:
|
||||
\```
|
||||
START → [analyze_intent, retrieve_docs] → generate_response
|
||||
↓ parallel execution ↓
|
||||
\```
|
||||
|
||||
### Implementation Details
|
||||
|
||||
1. Add parallel edges to StateGraph
|
||||
2. Add join node to wait for both results
|
||||
3. generate_response receives both results
|
||||
|
||||
### Expected Effects
|
||||
|
||||
| Metric | Current | Expected | Change | Change Rate |
|
||||
|--------|---------|----------|--------|-------------|
|
||||
| Accuracy | 75.0% | 75.0% | ±0 | - |
|
||||
| Latency | 3.5s | 2.7s | -0.8s | -23% |
|
||||
| Cost | $0.020 | $0.020 | ±0 | - |
|
||||
|
||||
### Implementation Complexity
|
||||
|
||||
- **Level**: Low
|
||||
- **Estimated Time**: 1-2 hours
|
||||
- **Risk**: Low (no changes to existing nodes required)
|
||||
|
||||
### Recommendation Level
|
||||
|
||||
⭐⭐⭐⭐ (High) - Effective for Latency improvement with low risk
|
||||
|
||||
---
|
||||
|
||||
## Proposal 2: Intent-Based Routing
|
||||
|
||||
### Changes
|
||||
|
||||
**Current**:
|
||||
\```
|
||||
analyze_intent → retrieve_docs → generate_response
|
||||
\```
|
||||
|
||||
**After Change**:
|
||||
\```
|
||||
analyze_intent
|
||||
├─ simple_intent → simple_response (lightweight)
|
||||
└─ complex_intent → retrieve_docs → generate_response
|
||||
\```
|
||||
|
||||
### Implementation Details
|
||||
|
||||
1. Conditional branching based on analyze_intent output
|
||||
2. Create new simple_response node (using Haiku)
|
||||
3. Routing with conditional_edges
|
||||
|
||||
### Expected Effects
|
||||
|
||||
| Metric | Current | Expected | Change | Change Rate |
|
||||
|--------|---------|----------|--------|-------------|
|
||||
| Accuracy | 75.0% | 82.0% | +7.0% | +9% |
|
||||
| Latency | 3.5s | 2.8s | -0.7s | -20% |
|
||||
| Cost | $0.020 | $0.014 | -$0.006 | -30% |
|
||||
|
||||
**Assumption**: 40% simple cases, 60% complex cases
|
||||
|
||||
### Implementation Complexity
|
||||
|
||||
- **Level**: Medium
|
||||
- **Estimated Time**: 2-3 hours
|
||||
- **Risk**: Medium (adding routing logic)
|
||||
|
||||
### Recommendation Level
|
||||
|
||||
⭐⭐⭐⭐⭐ (Highest) - Balanced improvement across all metrics
|
||||
|
||||
---
|
||||
|
||||
## Proposal 3: Multi-Stage RAG with Reranking Subgraph
|
||||
|
||||
### Changes
|
||||
|
||||
**Current**:
|
||||
\```
|
||||
analyze_intent → retrieve_docs → generate_response
|
||||
\```
|
||||
|
||||
**After Change**:
|
||||
\```
|
||||
analyze_intent → [RAG Subgraph] → generate_response
|
||||
↓
|
||||
retrieve (k=20)
|
||||
↓
|
||||
rerank (top-5)
|
||||
↓
|
||||
select (best context)
|
||||
\```
|
||||
|
||||
### Implementation Details
|
||||
|
||||
1. Convert RAG processing to dedicated subgraph
|
||||
2. Retrieve more candidates in retrieve node (k=20)
|
||||
3. Evaluate relevance in rerank node (Cross-Encoder)
|
||||
4. Select optimal context in select node
|
||||
|
||||
### Expected Effects
|
||||
|
||||
| Metric | Current | Expected | Change | Change Rate |
|
||||
|--------|---------|----------|--------|-------------|
|
||||
| Accuracy | 75.0% | 88.0% | +13.0% | +17% |
|
||||
| Latency | 3.5s | 3.8s | +0.3s | +9% |
|
||||
| Cost | $0.020 | $0.022 | +$0.002 | +10% |
|
||||
|
||||
### Implementation Complexity
|
||||
|
||||
- **Level**: Medium-High
|
||||
- **Estimated Time**: 3-4 hours
|
||||
- **Risk**: Medium (introducing new model, subgraph management)
|
||||
|
||||
### Recommendation Level
|
||||
|
||||
⭐⭐⭐ (Medium) - Effective when Accuracy is the priority; Latency degrades slightly
|
||||
|
||||
---
|
||||
|
||||
## Recommendations
|
||||
|
||||
**Note**: The following recommendations are for reference. The arch-tune command will **implement and evaluate all Proposals above in parallel** and select the best option based on actual results.
|
||||
|
||||
### 🥇 First Recommendation: Proposal 2 (Intent-Based Routing)
|
||||
|
||||
**Reasons**:
|
||||
- Balanced improvement across all metrics
|
||||
- Implementation complexity is manageable at medium level
|
||||
- High ROI (effect vs cost)
|
||||
|
||||
**Next Steps**:
|
||||
1. Run parallel exploration with arch-tune command
|
||||
2. Implement and evaluate Proposals 1, 2, 3 simultaneously
|
||||
3. Select best option based on actual results
|
||||
|
||||
### 🥈 Second Recommendation: Proposal 1 (Parallel Retrieval)
|
||||
|
||||
**Reasons**:
|
||||
- Simple implementation with low risk
|
||||
- Reliable Latency improvement
|
||||
- Can be combined with Proposal 2
|
||||
|
||||
### 📝 Reference: Proposal 3 (Multi-Stage RAG)
|
||||
|
||||
**Reasons**:
|
||||
- Effective when Accuracy is most important
|
||||
- Only when Latency trade-off is acceptable
|
||||
```
|
||||
|
||||
## 🔧 Tools and Technologies Used
|
||||
|
||||
### MCP Server Usage
|
||||
|
||||
- **Serena MCP**: Codebase analysis
|
||||
- `find_symbol`: Search graph definitions
|
||||
- `get_symbols_overview`: Understand node structure
|
||||
- `search_for_pattern`: Search specific patterns
|
||||
|
||||
### Reference Skills
|
||||
|
||||
- **langgraph-master skill**: Architecture pattern reference
|
||||
|
||||
### Evaluation Program
|
||||
|
||||
- User-provided or auto-generated
|
||||
- Metrics: accuracy, latency, cost, etc.
|
||||
|
||||
## ⚠️ Important Notes
|
||||
|
||||
1. **Analysis Only**
|
||||
- This skill does not implement changes
|
||||
- Only outputs analysis and proposals
|
||||
|
||||
2. **Evaluation Environment**
|
||||
- Evaluation program is required
|
||||
- Will be created if not present
|
||||
|
||||
3. **Serena MCP**
|
||||
- If Serena is unavailable, manual code analysis
|
||||
- Use ls, read tools
|
||||
|
||||
## 🔗 Related Resources
|
||||
|
||||
- [langgraph-master skill](../langgraph-master/SKILL.md) - Architecture patterns
|
||||
- [arch-tune command](../../commands/arch-tune.md) - Command that uses this skill
|
||||
- [fine-tune skill](../fine-tune/SKILL.md) - Prompt optimization
|
||||
skills/fine-tune/README.md (new file, 83 lines)

# LangGraph Fine-Tune Skill
|
||||
|
||||
A comprehensive skill for iteratively optimizing prompts and processing logic in LangGraph applications based on evaluation criteria.
|
||||
|
||||
## Overview
|
||||
|
||||
The fine-tune skill helps you improve the performance of existing LangGraph applications through systematic prompt optimization without modifying the graph structure (nodes, edges configuration).
|
||||
|
||||
## Key Features
|
||||
|
||||
- **Iterative Optimization**: Data-driven improvement cycles with measurable results
|
||||
- **Graph Structure Preservation**: Only optimize prompts and node logic, not the graph architecture
|
||||
- **Statistical Evaluation**: Multiple runs with statistical analysis for reliable results
|
||||
- **MCP Integration**: Leverages Serena MCP for codebase analysis and target identification
|
||||
|
||||
## When to Use
|
||||
|
||||
- LLM output quality needs improvement
|
||||
- Response latency is too high
|
||||
- Cost optimization is required
|
||||
- Error rates need reduction
|
||||
- Prompt engineering improvements are expected to help
|
||||
|
||||
## 4-Phase Workflow
|
||||
|
||||
### Phase 1: Preparation and Analysis
|
||||
|
||||
Understand optimization targets and current state.
|
||||
|
||||
- Load objectives from `.langgraph-master/fine-tune.md`
|
||||
- Identify optimization targets using Serena MCP
|
||||
- Create prioritized optimization target list
|
||||
|
||||
### Phase 2: Baseline Evaluation
|
||||
|
||||
Quantitatively measure current performance.
|
||||
|
||||
- Prepare evaluation environment (test cases, scripts)
|
||||
- Measure baseline (3-5 runs recommended)
|
||||
- Analyze results and identify problems
|
||||
|
||||
### Phase 3: Iterative Improvement
|
||||
|
||||
Data-driven incremental improvement cycle.
|
||||
|
||||
- Prioritize improvement areas by impact
|
||||
- Implement prompt optimizations
|
||||
- Re-evaluate under same conditions
|
||||
- Compare results and decide next steps
|
||||
- Repeat until goals are achieved
|
||||
|
||||
### Phase 4: Completion and Documentation
|
||||
|
||||
Record achievements and provide recommendations.
|
||||
|
||||
- Create final evaluation report
|
||||
- Commit code changes
|
||||
- Update documentation
|
||||
|
||||
## Key Optimization Techniques
|
||||
|
||||
| Technique | Expected Impact |
|
||||
| --------------------------------- | --------------------------- |
|
||||
| Few-Shot Examples | Accuracy +10-20% |
|
||||
| Structured Output Format | Parsing errors -90% |
|
||||
| Temperature/Max Tokens Adjustment | Cost -20-40% |
|
||||
| Model Selection Optimization | Cost -40-60% |
|
||||
| Prompt Caching | Cost -50-90% (on cache hit) |
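
As a rough illustration of the parameter-level rows (model selection plus temperature/max-tokens tuning), assuming the application builds its LLM client with `langchain_anthropic`:

```python
from langchain_anthropic import ChatAnthropic

# Before: a large model with default settings for every node.
llm_before = ChatAnthropic(model="claude-3-5-sonnet-20241022")

# After: a cheaper model, lower temperature, and a token cap for a simple classification node.
llm_after = ChatAnthropic(
    model="claude-3-5-haiku-20241022",
    temperature=0.3,
    max_tokens=256,
)
```

Whether the smaller model holds accuracy has to be confirmed with the same evaluation run used for the baseline.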
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Start Small**: Begin with the most impactful node
|
||||
2. **Measurement-Driven**: Always quantify before and after improvements
|
||||
3. **Incremental Changes**: Validate one change at a time
|
||||
4. **Document Everything**: Record reasons and results for each change
|
||||
5. **Iterate**: Continue improving until goals are achieved
|
||||
|
||||
## Important Constraints
|
||||
|
||||
- **Preserve Graph Structure**: Do not add/remove nodes or edges
|
||||
- **Maintain Data Flow**: Do not change data flow between nodes
|
||||
- **Keep State Schema**: Maintain the existing state schema
|
||||
- **Evaluation Consistency**: Use same test cases and metrics throughout
|
||||
skills/fine-tune/SKILL.md (new file, 153 lines)

---
name: fine-tune
description: Use when you need to fine-tune and optimize LangGraph applications based on evaluation criteria. This skill performs iterative prompt optimization for LangGraph nodes without changing the graph structure.
---

# LangGraph Application Fine-Tuning Skill
|
||||
|
||||
A skill for iteratively optimizing prompts and processing logic in each node of a LangGraph application based on evaluation criteria.
|
||||
|
||||
## 📋 Overview
|
||||
|
||||
This skill executes the following process to improve the performance of existing LangGraph applications:
|
||||
|
||||
1. **Load Objectives**: Retrieve optimization goals and evaluation criteria from `.langgraph-master/fine-tune.md` (if this file doesn't exist, help the user create it based on their requirements)
|
||||
2. **Identify Optimization Targets**: Extract nodes containing LLM prompts using Serena MCP (if Serena MCP is unavailable, investigate the codebase using ls, read, etc.)
|
||||
3. **Baseline Evaluation**: Measure current performance through multiple runs
|
||||
4. **Implement Improvements**: Identify the most effective improvement areas and optimize prompts and processing logic
|
||||
5. **Re-evaluation**: Measure performance after improvements
|
||||
6. **Iteration**: Repeat steps 4-5 until goals are achieved
|
||||
|
||||
**Important Constraint**: Only optimize prompts and processing logic within each node without modifying the graph structure (nodes, edges configuration).
|
||||
|
||||
## 🎯 When to Use This Skill
|
||||
|
||||
Use this skill in the following situations:
|
||||
|
||||
1. **When performance improvement of existing applications is needed**
|
||||
|
||||
- Want to improve LLM output quality
|
||||
- Want to improve response speed
|
||||
- Want to reduce error rate
|
||||
|
||||
2. **When evaluation criteria are clear**
|
||||
|
||||
- Optimization goals are defined in `.langgraph-master/fine-tune.md`
|
||||
- Quantitative evaluation methods are established
|
||||
|
||||
3. **When improvements through prompt engineering are expected**
|
||||
- Improvements are likely with clearer LLM instructions
|
||||
- Adding few-shot examples would be effective
|
||||
- Output format adjustment is needed
|
||||
|
||||
## 📖 Fine-Tuning Workflow Overview
|
||||
|
||||
### Phase 1: Preparation and Analysis
|
||||
|
||||
**Purpose**: Understand optimization targets and current state
|
||||
|
||||
**Main Steps**:
|
||||
|
||||
1. Load objective setting file (`.langgraph-master/fine-tune.md`)
|
||||
2. Identify optimization targets (Serena MCP or manual code investigation)
|
||||
3. Create optimization target list (evaluate improvement potential for each node)
|
||||
|
||||
→ See [workflow.md](workflow.md#phase-1-preparation-and-analysis) for details
|
||||
|
||||
### Phase 2: Baseline Evaluation
|
||||
|
||||
**Purpose**: Quantitatively measure current performance
|
||||
|
||||
**Main Steps**:

4. Prepare evaluation environment (test cases, evaluation scripts)
5. Baseline measurement (recommended: 3-5 runs)
6. Analyze baseline results (identify problems)
|
||||
|
||||
**Important**: When evaluation programs are needed, create evaluation code in a specific subdirectory (users may specify the directory).
|
||||
|
||||
→ See [workflow.md](workflow.md#phase-2-baseline-evaluation) and [evaluation.md](evaluation.md) for details
|
||||
|
||||
### Phase 3: Iterative Improvement
|
||||
|
||||
**Purpose**: Data-driven incremental improvement
|
||||
|
||||
**Main Steps**:

7. Prioritization (select the most impactful improvement area)
8. Implement improvements (prompt optimization, parameter tuning)
9. Post-improvement evaluation (re-evaluate under the same conditions)
10. Compare and analyze results (measure improvement effects)
11. Decide whether to continue iteration (repeat until goals are achieved)
|
||||
|
||||
→ See [workflow.md](workflow.md#phase-3-iterative-improvement) and [prompt_optimization.md](prompt_optimization.md) for details
|
||||
|
||||
### Phase 4: Completion and Documentation
|
||||
|
||||
**Purpose**: Record achievements and provide future recommendations
|
||||
|
||||
**Main Steps**:

12. Create final evaluation report (improvement content, results, recommendations)
13. Code commit and documentation update
|
||||
|
||||
→ See [workflow.md](workflow.md#phase-4-completion-and-documentation) for details
|
||||
|
||||
## 🔧 Tools and Technologies Used
|
||||
|
||||
### MCP Server Utilization
|
||||
|
||||
- **Serena MCP**: Codebase analysis and optimization target identification
|
||||
|
||||
- `find_symbol`: Search for LLM clients
|
||||
- `find_referencing_symbols`: Identify prompt construction locations
|
||||
- `get_symbols_overview`: Understand node structure
|
||||
|
||||
- **Sequential MCP**: Complex analysis and decision making
|
||||
- Determine improvement priorities
|
||||
- Analyze evaluation results
|
||||
- Plan next actions
|
||||
|
||||
### Key Optimization Techniques
|
||||
|
||||
1. **Few-Shot Examples**: Accuracy +10-20%
|
||||
2. **Structured Output Format**: Parsing errors -90%
|
||||
3. **Temperature/Max Tokens Adjustment**: Cost -20-40%
|
||||
4. **Model Selection Optimization**: Cost -40-60%
|
||||
5. **Prompt Caching**: Cost -50-90% (on cache hit)
|
||||
|
||||
→ See [prompt_optimization.md](prompt_optimization.md) for details
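
A rough sketch of techniques 1 and 2 (few-shot examples plus a structured, JSON-only output format) applied to an intent-classification prompt; the labels and example questions are illustrative:

```python
FEW_SHOT_EXAMPLES = """\
Q: How much does the premium plan cost?
A: {"intent": "billing"}

Q: The API returns a 500 error when I upload a file.
A: {"intent": "technical"}
"""


def build_classify_prompt(question: str) -> str:
    """Few-shot examples plus a structured (JSON-only) output instruction."""
    return (
        "Classify the user question into one of: product, technical, billing, general.\n\n"
        "Examples:\n"
        f"{FEW_SHOT_EXAMPLES}\n"
        'Return only a JSON object of the form {"intent": "<label>"} with no extra text.\n\n'
        f"Q: {question}\nA:"
    )
```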
|
||||
|
||||
## 📚 Related Documentation
|
||||
|
||||
Detailed guidelines and best practices:
|
||||
|
||||
- **[workflow.md](workflow.md)** - Fine-tuning workflow details (execution procedures and code examples for each phase)
|
||||
- **[evaluation.md](evaluation.md)** - Evaluation methods and best practices (metric calculation, statistical analysis, test case design)
|
||||
- **[prompt_optimization.md](prompt_optimization.md)** - Prompt optimization techniques (10 practical methods and priorities)
|
||||
- **[examples.md](examples.md)** - Practical examples collection (copy-and-paste ready code examples and template collection)
|
||||
|
||||
## ⚠️ Important Notes
|
||||
|
||||
1. **Preserve Graph Structure**
|
||||
|
||||
- Do not add or remove nodes or edges
|
||||
- Do not change data flow between nodes
|
||||
- Maintain state schema
|
||||
|
||||
2. **Evaluation Consistency**
|
||||
|
||||
- Use the same test cases
|
||||
- Measure with the same evaluation metrics
|
||||
- Run multiple times to confirm statistically significant improvements
|
||||
|
||||
3. **Cost Management**
|
||||
|
||||
- Consider evaluation execution costs
|
||||
- Adjust sample size as needed
|
||||
- Be mindful of API rate limits
|
||||
|
||||
4. **Version Control**
|
||||
- Git commit each iteration's changes
|
||||
- Maintain rollback-capable state
|
||||
- Record evaluation results
|
||||
|
||||
## 🎓 Fine-Tuning Best Practices
|
||||
|
||||
1. **Start Small**: Optimize from the most impactful node
|
||||
2. **Measurement-Driven**: Always perform quantitative evaluation before and after improvements
|
||||
3. **Incremental Improvement**: Validate one change at a time, not multiple simultaneously
|
||||
4. **Documentation**: Record reasons and results for each change
|
||||
5. **Iteration**: Continuously improve until goals are achieved
|
||||
|
||||
## 🔗 Reference Links
|
||||
|
||||
- [LangGraph Official Documentation](https://docs.langchain.com/oss/python/langgraph/overview)
|
||||
- [Prompt Engineering Guide](https://www.promptingguide.ai/)
|
||||
skills/fine-tune/evaluation.md (new file, 80 lines)

# Evaluation Methods and Best Practices
|
||||
|
||||
Evaluation strategies, metrics, and best practices for fine-tuning LangGraph applications.
|
||||
|
||||
**💡 Tip**: For practical evaluation scripts and report templates, see [examples.md](examples.md#phase-2-baseline-evaluation-examples).
|
||||
|
||||
## 📚 Table of Contents
|
||||
|
||||
This guide is divided into the following sections:
|
||||
|
||||
### 1. [Evaluation Metrics Design](./evaluation_metrics.md)
|
||||
Learn how to define and calculate metrics used for evaluation.
|
||||
|
||||
### 2. [Test Case Design](./evaluation_testcases.md)
|
||||
Understand test case structure, coverage, and design principles.
|
||||
|
||||
### 3. [Statistical Significance Testing](./evaluation_statistics.md)
|
||||
Master methods for multiple runs and statistical analysis.
|
||||
|
||||
### 4. [Evaluation Best Practices](./evaluation_practices.md)
|
||||
Provides practical evaluation guidelines.
|
||||
|
||||
## 🎯 Quick Start
|
||||
|
||||
### For First-Time Evaluation
|
||||
|
||||
1. **[Understand Evaluation Metrics](./evaluation_metrics.md)** - Which metrics to measure
|
||||
2. **[Design Test Cases](./evaluation_testcases.md)** - Create representative cases
|
||||
3. **[Learn Statistical Methods](./evaluation_statistics.md)** - Importance of multiple runs
|
||||
4. **[Follow Best Practices](./evaluation_practices.md)** - Effective evaluation implementation
|
||||
|
||||
### Improving Existing Evaluations
|
||||
|
||||
1. **[Add Metrics](./evaluation_metrics.md)** - More comprehensive evaluation
|
||||
2. **[Improve Coverage](./evaluation_testcases.md)** - Enhance test cases
|
||||
3. **[Strengthen Statistical Validation](./evaluation_statistics.md)** - Improve reliability
|
||||
4. **[Introduce Automation](./evaluation_practices.md)** - Continuous evaluation pipeline
|
||||
|
||||
## 📖 Importance of Evaluation
|
||||
|
||||
In fine-tuning, evaluation provides:
|
||||
- **Quantifying Improvements**: Objective progress measurement
|
||||
- **Basis for Decision-Making**: Data-driven prioritization
|
||||
- **Quality Assurance**: Prevention of regressions
|
||||
- **ROI Demonstration**: Visualization of business value
|
||||
|
||||
## 💡 Basic Principles of Evaluation
|
||||
|
||||
For effective evaluation:
|
||||
|
||||
1. ✅ **Multiple Metrics**: Comprehensive assessment of quality, performance, cost, and reliability
|
||||
2. ✅ **Statistical Validation**: Confirm significance through multiple runs
|
||||
3. ✅ **Consistency**: Evaluate with the same test cases under the same conditions
|
||||
4. ✅ **Visualization**: Track improvements with graphs and tables
|
||||
5. ✅ **Documentation**: Record evaluation results and analysis
|
||||
|
||||
## 🔍 Troubleshooting
|
||||
|
||||
### Large Variance in Evaluation Results
|
||||
→ Check [Statistical Significance Testing](./evaluation_statistics.md#outlier-detection-and-handling)
|
||||
|
||||
### Evaluation Takes Too Long
|
||||
→ Implement staged evaluation in [Best Practices](./evaluation_practices.md#troubleshooting)
|
||||
|
||||
### Unclear Which Metrics to Measure
|
||||
→ Check [Evaluation Metrics Design](./evaluation_metrics.md) for purpose and use cases of each metric
|
||||
|
||||
### Insufficient Test Cases
|
||||
→ Refer to coverage analysis in [Test Case Design](./evaluation_testcases.md#test-case-design-principles)
|
||||
|
||||
## 📋 Related Documentation
|
||||
|
||||
- **[Prompt Optimization](./prompt_optimization.md)** - Techniques for prompt improvement
|
||||
- **[Examples Collection](./examples.md)** - Samples of evaluation scripts and reports
|
||||
- **[Workflow](./workflow.md)** - Overall fine-tuning flow including evaluation
|
||||
- **[SKILL.md](./SKILL.md)** - Overview of the fine-tune skill
|
||||
|
||||
---
|
||||
|
||||
**💡 Tip**: For practical evaluation scripts and templates, see [examples.md](examples.md#phase-2-baseline-evaluation-examples).
|
||||
skills/fine-tune/evaluation_metrics.md (new file, 340 lines)

# Evaluation Metrics Design
|
||||
|
||||
Definitions and calculation methods for evaluation metrics in LangGraph application fine-tuning.
|
||||
|
||||
**💡 Tip**: For practical evaluation scripts and report templates, see [examples.md](examples.md#phase-2-baseline-evaluation-examples).
|
||||
|
||||
## 📊 Importance of Evaluation
|
||||
|
||||
In fine-tuning, evaluation provides:
|
||||
- **Quantifying Improvements**: Objective progress measurement
|
||||
- **Basis for Decision-Making**: Data-driven prioritization
|
||||
- **Quality Assurance**: Prevention of regressions
|
||||
- **ROI Demonstration**: Visualization of business value
|
||||
|
||||
## 🎯 Evaluation Metric Categories
|
||||
|
||||
### 1. Quality Metrics
|
||||
|
||||
#### Accuracy
|
||||
```python
|
||||
from typing import List


def calculate_accuracy(predictions: List, ground_truth: List) -> float:
|
||||
"""Calculate accuracy"""
|
||||
correct = sum(p == g for p, g in zip(predictions, ground_truth))
|
||||
return (correct / len(predictions)) * 100
|
||||
|
||||
# Example
|
||||
predictions = ["product", "technical", "billing", "general"]
|
||||
ground_truth = ["product", "billing", "billing", "general"]
|
||||
accuracy = calculate_accuracy(predictions, ground_truth)
|
||||
# => 75.0% (3/4 correct)
|
||||
```
|
||||
|
||||
#### F1 Score (Multi-class Classification)
|
||||
```python
|
||||
from typing import List

from sklearn.metrics import f1_score, classification_report
|
||||
|
||||
def calculate_f1(predictions: List, ground_truth: List, average='weighted') -> float:
|
||||
"""Calculate F1 score (multi-class support)"""
|
||||
return f1_score(ground_truth, predictions, average=average)
|
||||
|
||||
# Detailed report
|
||||
report = classification_report(ground_truth, predictions)
|
||||
print(report)
|
||||
"""
|
||||
precision recall f1-score support
|
||||
|
||||
product 1.00 1.00 1.00 1
|
||||
technical 0.00 0.00 0.00 1
|
||||
billing 0.50 1.00 0.67 1
|
||||
general 1.00 1.00 1.00 1
|
||||
|
||||
accuracy 0.75 4
|
||||
macro avg 0.62 0.75 0.67 4
|
||||
weighted avg 0.62 0.75 0.67 4
|
||||
"""
|
||||
```
|
||||
|
||||
#### Semantic Similarity
|
||||
```python
|
||||
from sentence_transformers import SentenceTransformer, util
|
||||
|
||||
def calculate_semantic_similarity(
|
||||
generated: str,
|
||||
reference: str,
|
||||
model_name: str = "all-MiniLM-L6-v2"
|
||||
) -> float:
|
||||
"""Calculate semantic similarity between generated and reference text"""
|
||||
model = SentenceTransformer(model_name)
|
||||
|
||||
embeddings = model.encode([generated, reference], convert_to_tensor=True)
|
||||
similarity = util.pytorch_cos_sim(embeddings[0], embeddings[1])
|
||||
|
||||
return similarity.item()
|
||||
|
||||
# Example
|
||||
generated = "Our premium plan costs $49 per month."
|
||||
reference = "The premium subscription is $49/month."
|
||||
similarity = calculate_semantic_similarity(generated, reference)
|
||||
# => 0.87 (high similarity)
|
||||
```
|
||||
|
||||
#### BLEU Score (Text Generation Quality)
|
||||
```python
|
||||
from nltk.translate.bleu_score import sentence_bleu
|
||||
|
||||
def calculate_bleu(generated: str, reference: str) -> float:
|
||||
"""Calculate BLEU score"""
|
||||
reference_tokens = [reference.split()]
|
||||
generated_tokens = generated.split()
|
||||
|
||||
return sentence_bleu(reference_tokens, generated_tokens)
|
||||
|
||||
# Example
|
||||
generated = "The product costs forty nine dollars"
|
||||
reference = "The product costs $49"
|
||||
bleu = calculate_bleu(generated, reference)
|
||||
# => 0.45
|
||||
```
|
||||
|
||||
### 2. Performance Metrics
|
||||
|
||||
#### Latency (Response Time)
|
||||
```python
|
||||
import time
|
||||
from typing import Dict, List
|
||||
|
||||
def measure_latency(test_cases: List[Dict]) -> Dict:
|
||||
"""Measure latency for each node and total"""
|
||||
results = {
|
||||
"total": [],
|
||||
"by_node": {}
|
||||
}
|
||||
|
||||
for case in test_cases:
|
||||
start_time = time.time()
|
||||
|
||||
# Measurement by node
|
||||
node_times = {}
|
||||
|
||||
# Node 1: analyze_intent
|
||||
node_start = time.time()
|
||||
analyze_result = analyze_intent(case["input"])
|
||||
node_times["analyze_intent"] = time.time() - node_start
|
||||
|
||||
# Node 2: retrieve_context
|
||||
node_start = time.time()
|
||||
context = retrieve_context(analyze_result)
|
||||
node_times["retrieve_context"] = time.time() - node_start
|
||||
|
||||
# Node 3: generate_response
|
||||
node_start = time.time()
|
||||
response = generate_response(context, case["input"])
|
||||
node_times["generate_response"] = time.time() - node_start
|
||||
|
||||
total_time = time.time() - start_time
|
||||
|
||||
results["total"].append(total_time)
|
||||
for node, duration in node_times.items():
|
||||
if node not in results["by_node"]:
|
||||
results["by_node"][node] = []
|
||||
results["by_node"][node].append(duration)
|
||||
|
||||
# Statistical calculation
|
||||
import numpy as np
|
||||
summary = {
|
||||
"total": {
|
||||
"mean": np.mean(results["total"]),
|
||||
"p50": np.percentile(results["total"], 50),
|
||||
"p95": np.percentile(results["total"], 95),
|
||||
"p99": np.percentile(results["total"], 99),
|
||||
}
|
||||
}
|
||||
|
||||
for node, times in results["by_node"].items():
|
||||
summary[node] = {
|
||||
"mean": np.mean(times),
|
||||
"p50": np.percentile(times, 50),
|
||||
"p95": np.percentile(times, 95),
|
||||
}
|
||||
|
||||
return summary
|
||||
|
||||
# Usage example
|
||||
latency_results = measure_latency(test_cases)
|
||||
print(f"Mean latency: {latency_results['total']['mean']:.2f}s")
|
||||
print(f"P95 latency: {latency_results['total']['p95']:.2f}s")
|
||||
```
|
||||
|
||||
#### Throughput
|
||||
```python
|
||||
import concurrent.futures
import time
|
||||
from typing import List, Dict
|
||||
|
||||
def measure_throughput(
|
||||
test_cases: List[Dict],
|
||||
max_workers: int = 10,
|
||||
duration_seconds: int = 60
|
||||
) -> Dict:
|
||||
"""Measure number of requests processed within a given time"""
|
||||
start_time = time.time()
|
||||
completed = 0
|
||||
errors = 0
|
||||
|
||||
def process_case(case):
|
||||
try:
|
||||
result = run_langgraph_app(case["input"])
|
||||
return True
|
||||
except Exception:
|
||||
return False
|
||||
|
||||
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        while time.time() - start_time < duration_seconds:
            # Submit one pass over the test cases, then collect results.
            # (Calling future.result() right after submit would serialize the work.)
            futures = []
            for case in test_cases:
                if time.time() - start_time >= duration_seconds:
                    break
                futures.append(executor.submit(process_case, case))

            for future in concurrent.futures.as_completed(futures):
                if future.result():
                    completed += 1
                else:
                    errors += 1
|
||||
|
||||
elapsed = time.time() - start_time
|
||||
|
||||
return {
|
||||
"completed": completed,
|
||||
"errors": errors,
|
||||
"elapsed": elapsed,
|
||||
"throughput": completed / elapsed, # requests per second
|
||||
"error_rate": errors / (completed + errors) if (completed + errors) > 0 else 0
|
||||
}
|
||||
|
||||
# Usage example
|
||||
throughput = measure_throughput(test_cases, max_workers=5, duration_seconds=30)
|
||||
print(f"Throughput: {throughput['throughput']:.2f} req/s")
|
||||
print(f"Error rate: {throughput['error_rate']*100:.2f}%")
|
||||
```
|
||||
|
||||
### 3. Cost Metrics
|
||||
|
||||
#### Token Usage and Cost
|
||||
```python
|
||||
from typing import Dict
|
||||
|
||||
# Pricing table by model (as of November 2024)
|
||||
PRICING = {
|
||||
"claude-3-5-sonnet-20241022": {
|
||||
"input": 3.0 / 1_000_000, # $3.00 per 1M input tokens
|
||||
"output": 15.0 / 1_000_000, # $15.00 per 1M output tokens
|
||||
},
|
||||
"claude-3-5-haiku-20241022": {
|
||||
"input": 0.8 / 1_000_000, # $0.80 per 1M input tokens
|
||||
"output": 4.0 / 1_000_000, # $4.00 per 1M output tokens
|
||||
}
|
||||
}
|
||||
|
||||
def calculate_cost(token_usage: Dict, model: str) -> Dict:
|
||||
"""Calculate cost from token usage"""
|
||||
pricing = PRICING.get(model, PRICING["claude-3-5-sonnet-20241022"])
|
||||
|
||||
input_cost = token_usage["input_tokens"] * pricing["input"]
|
||||
output_cost = token_usage["output_tokens"] * pricing["output"]
|
||||
total_cost = input_cost + output_cost
|
||||
|
||||
return {
|
||||
"input_tokens": token_usage["input_tokens"],
|
||||
"output_tokens": token_usage["output_tokens"],
|
||||
"total_tokens": token_usage["input_tokens"] + token_usage["output_tokens"],
|
||||
"input_cost": input_cost,
|
||||
"output_cost": output_cost,
|
||||
"total_cost": total_cost,
|
||||
"cost_breakdown": {
|
||||
"input_pct": (input_cost / total_cost * 100) if total_cost > 0 else 0,
|
||||
"output_pct": (output_cost / total_cost * 100) if total_cost > 0 else 0
|
||||
}
|
||||
}
|
||||
|
||||
# Usage example
|
||||
token_usage = {"input_tokens": 1500, "output_tokens": 800}
|
||||
cost = calculate_cost(token_usage, "claude-3-5-sonnet-20241022")
|
||||
print(f"Total cost: ${cost['total_cost']:.4f}")
|
||||
print(f"Input: ${cost['input_cost']:.4f} ({cost['cost_breakdown']['input_pct']:.1f}%)")
|
||||
print(f"Output: ${cost['output_cost']:.4f} ({cost['cost_breakdown']['output_pct']:.1f}%)")
|
||||
```
|
||||
|
||||
#### Cost per Request
|
||||
```python
|
||||
from typing import Dict, List


def calculate_cost_per_request(
|
||||
test_results: List[Dict],
|
||||
model: str
|
||||
) -> Dict:
|
||||
"""Calculate cost per request"""
|
||||
total_cost = 0
|
||||
total_input_tokens = 0
|
||||
total_output_tokens = 0
|
||||
|
||||
for result in test_results:
|
||||
cost = calculate_cost(result["token_usage"], model)
|
||||
total_cost += cost["total_cost"]
|
||||
total_input_tokens += result["token_usage"]["input_tokens"]
|
||||
total_output_tokens += result["token_usage"]["output_tokens"]
|
||||
|
||||
num_requests = len(test_results)
|
||||
|
||||
return {
|
||||
"total_requests": num_requests,
|
||||
"total_cost": total_cost,
|
||||
"cost_per_request": total_cost / num_requests,
|
||||
"avg_input_tokens": total_input_tokens / num_requests,
|
||||
"avg_output_tokens": total_output_tokens / num_requests,
|
||||
"total_tokens": total_input_tokens + total_output_tokens
|
||||
}
|
||||
```
|
||||
|
||||
### 4. Reliability Metrics
|
||||
|
||||
#### Error Rate
|
||||
```python
|
||||
from typing import Dict, List


def calculate_error_rate(results: List[Dict]) -> Dict:
|
||||
"""Analyze error rate and error types"""
|
||||
total = len(results)
|
||||
errors = [r for r in results if r.get("error")]
|
||||
|
||||
error_types = {}
|
||||
for error in errors:
|
||||
error_type = error["error"]["type"]
|
||||
if error_type not in error_types:
|
||||
error_types[error_type] = 0
|
||||
error_types[error_type] += 1
|
||||
|
||||
return {
|
||||
"total_requests": total,
|
||||
"total_errors": len(errors),
|
||||
"error_rate": len(errors) / total if total > 0 else 0,
|
||||
"error_types": error_types,
|
||||
"success_rate": (total - len(errors)) / total if total > 0 else 0
|
||||
}
|
||||
```
|
||||
|
||||
#### Retry Rate
|
||||
```python
|
||||
from typing import Dict, List


def calculate_retry_rate(results: List[Dict]) -> Dict:
|
||||
"""Proportion of cases that required retries"""
|
||||
total = len(results)
|
||||
retried = [r for r in results if r.get("retry_count", 0) > 0]
|
||||
|
||||
return {
|
||||
"total_requests": total,
|
||||
"retried_requests": len(retried),
|
||||
"retry_rate": len(retried) / total if total > 0 else 0,
|
||||
"avg_retries": sum(r.get("retry_count", 0) for r in retried) / len(retried) if retried else 0
|
||||
}
|
||||
```
|
||||
|
||||
## 📋 Related Documentation
|
||||
|
||||
- [Test Case Design](./evaluation_testcases.md) - Test case structure and coverage
|
||||
- [Statistical Significance Testing](./evaluation_statistics.md) - Multiple runs and statistical analysis
|
||||
- [Evaluation Best Practices](./evaluation_practices.md) - Consistency, visualization, reporting
|
||||
skills/fine-tune/evaluation_practices.md (new file, 324 lines)

# Evaluation Best Practices
|
||||
|
||||
Practical guidelines for effective evaluation of LangGraph applications.
|
||||
|
||||
## 🎯 Evaluation Best Practices
|
||||
|
||||
### 1. Ensuring Consistency
|
||||
|
||||
#### Evaluation Under Same Conditions
|
||||
|
||||
```python
|
||||
import json
from typing import Dict, List


class EvaluationConfig:
|
||||
"""Fix evaluation settings to ensure consistency"""
|
||||
|
||||
def __init__(self):
|
||||
self.test_cases_path = "tests/evaluation/test_cases.json"
|
||||
self.seed = 42 # For reproducibility
|
||||
self.iterations = 5
|
||||
self.timeout = 30 # seconds
|
||||
self.model = "claude-3-5-sonnet-20241022"
|
||||
|
||||
def load_test_cases(self) -> List[Dict]:
|
||||
"""Load the same test cases"""
|
||||
with open(self.test_cases_path) as f:
|
||||
data = json.load(f)
|
||||
return data["test_cases"]
|
||||
|
||||
# Usage
|
||||
config = EvaluationConfig()
|
||||
test_cases = config.load_test_cases()
|
||||
# Use the same test cases for all evaluations
|
||||
```
|
||||
|
||||
### 2. Staged Evaluation
|
||||
|
||||
#### Start Small and Gradually Expand
|
||||
|
||||
```python
|
||||
# Phase 1: Quick check (3 cases, 1 iteration)
|
||||
quick_results = evaluate(test_cases[:3], iterations=1)
|
||||
|
||||
if quick_results["accuracy"] > baseline["accuracy"]:
|
||||
# Phase 2: Medium check (10 cases, 3 iterations)
|
||||
medium_results = evaluate(test_cases[:10], iterations=3)
|
||||
|
||||
if medium_results["accuracy"] > baseline["accuracy"]:
|
||||
# Phase 3: Full evaluation (all cases, 5 iterations)
|
||||
full_results = evaluate(test_cases, iterations=5)
|
||||
```
|
||||
|
||||
### 3. Recording Evaluation Results
|
||||
|
||||
#### Structured Logging
|
||||
|
||||
```python
|
||||
import json
from typing import Dict
|
||||
from datetime import datetime
|
||||
from pathlib import Path
|
||||
|
||||
def save_evaluation_result(
|
||||
results: Dict,
|
||||
version: str,
|
||||
output_dir: Path = Path("evaluation_results")
|
||||
):
|
||||
"""Save evaluation results"""
|
||||
output_dir.mkdir(exist_ok=True)
|
||||
|
||||
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
|
||||
filename = f"{version}_{timestamp}.json"
|
||||
|
||||
full_results = {
|
||||
"version": version,
|
||||
"timestamp": timestamp,
|
||||
"metrics": results,
|
||||
"config": {
|
||||
"model": "claude-3-5-sonnet-20241022",
|
||||
"test_cases": len(test_cases),
|
||||
"iterations": 5
|
||||
}
|
||||
}
|
||||
|
||||
with open(output_dir / filename, "w") as f:
|
||||
json.dump(full_results, f, indent=2)
|
||||
|
||||
print(f"Results saved to: {output_dir / filename}")
|
||||
|
||||
# Usage
|
||||
save_evaluation_result(results, version="baseline")
|
||||
save_evaluation_result(results, version="iteration_1")
|
||||
```
|
||||
|
||||
### 4. Visualization
|
||||
|
||||
#### Visualizing Results
|
||||
|
||||
```python
|
||||
from typing import Dict, List

import matplotlib.pyplot as plt
|
||||
|
||||
def visualize_improvement(
|
||||
baseline: Dict,
|
||||
iterations: List[Dict],
|
||||
metrics: List[str] = ["accuracy", "latency", "cost"]
|
||||
):
|
||||
"""Visualize improvement progress"""
|
||||
fig, axes = plt.subplots(1, len(metrics), figsize=(15, 5))
|
||||
|
||||
for idx, metric in enumerate(metrics):
|
||||
ax = axes[idx]
|
||||
|
||||
# Prepare data
|
||||
x = ["Baseline"] + [f"Iter {i+1}" for i in range(len(iterations))]
|
||||
y = [baseline[metric]] + [it[metric] for it in iterations]
|
||||
|
||||
# Plot
|
||||
ax.plot(x, y, marker='o', linewidth=2)
|
||||
ax.set_title(f"{metric.capitalize()} Progress")
|
||||
ax.set_ylabel(metric.capitalize())
|
||||
ax.grid(True, alpha=0.3)
|
||||
|
||||
# Goal line
|
||||
if metric in baseline.get("goals", {}):
|
||||
goal = baseline["goals"][metric]
|
||||
ax.axhline(y=goal, color='r', linestyle='--', label='Goal')
|
||||
ax.legend()
|
||||
|
||||
plt.tight_layout()
|
||||
plt.savefig("evaluation_results/improvement_progress.png")
|
||||
print("Visualization saved to: evaluation_results/improvement_progress.png")
|
||||
```
|
||||
|
||||
## 📋 Evaluation Report Template
|
||||
|
||||
### Standard Report Format
|
||||
|
||||
```markdown
|
||||
# Evaluation Report - [Version/Iteration]
|
||||
|
||||
Execution Date: 2024-11-24 12:00:00
|
||||
Executed by: Claude Code (fine-tune skill)
|
||||
|
||||
## Configuration
|
||||
|
||||
- **Model**: claude-3-5-sonnet-20241022
|
||||
- **Number of Test Cases**: 20
|
||||
- **Number of Runs**: 5
|
||||
- **Evaluation Duration**: 10 minutes
|
||||
|
||||
## Results Summary
|
||||
|
||||
| Metric | Mean | Std Dev | 95% CI | Goal | Achievement |
|
||||
|--------|------|---------|--------|------|-------------|
|
||||
| Accuracy | 86.0% | 2.1% | [83.9%, 88.1%] | 90.0% | 95.6% |
|
||||
| Latency | 2.4s | 0.3s | [2.1s, 2.7s] | 2.0s | 83.3% |
|
||||
| Cost | $0.014 | $0.001 | [$0.013, $0.015] | $0.010 | 71.4% |
|
||||
|
||||
## Detailed Analysis
|
||||
|
||||
### Accuracy
|
||||
- **Improvement**: +11.0% (75.0% → 86.0%)
|
||||
- **Statistical Significance**: p < 0.01 ✅
|
||||
- **Effect Size**: Cohen's d = 2.3 (large)
|
||||
|
||||
### Latency
|
||||
- **Improvement**: -0.1s (2.5s → 2.4s)
|
||||
- **Statistical Significance**: p = 0.12 ❌ (not significant)
|
||||
- **Effect Size**: Cohen's d = 0.3 (small)
|
||||
|
||||
## Error Analysis
|
||||
|
||||
- **Total Errors**: 0
|
||||
- **Error Rate**: 0.0%
|
||||
- **Retry Rate**: 0.0%
|
||||
|
||||
## Next Actions
|
||||
|
||||
1. ✅ Accuracy significantly improved → Continue
|
||||
2. ⚠️ Latency improvement is small → Focus in next iteration
|
||||
3. ⚠️ Cost still above target ($0.014 vs $0.010) → Consider limiting max_tokens
|
||||
```
|
||||
|
||||
## 🔍 Troubleshooting
|
||||
|
||||
### Common Problems and Solutions
|
||||
|
||||
#### 1. Large Variance in Evaluation Results
|
||||
|
||||
**Symptom**: Standard deviation > 20% of mean
|
||||
|
||||
**Causes**:
|
||||
- LLM temperature is too high
|
||||
- Test cases are uneven
|
||||
- Network latency effects
|
||||
|
||||
**Solutions**:
|
||||
```python
|
||||
# Lower temperature
|
||||
llm = ChatAnthropic(
|
||||
model="claude-3-5-sonnet-20241022",
|
||||
temperature=0.3 # Set lower
|
||||
)
|
||||
|
||||
# Increase number of runs
|
||||
iterations = 10 # 5 → 10
|
||||
|
||||
# Remove outliers
|
||||
results_clean = remove_outliers(results)
|
||||
```
|
||||
|
||||
#### 2. Evaluation Takes Too Long
|
||||
|
||||
**Symptom**: Evaluation takes over 1 hour
|
||||
|
||||
**Causes**:
|
||||
- Too many test cases
|
||||
- Not running in parallel
|
||||
- Timeout setting too long
|
||||
|
||||
**Solutions**:
|
||||
```python
|
||||
# Subset evaluation
|
||||
quick_test_cases = test_cases[:10] # First 10 cases only
|
||||
|
||||
# Parallel execution
|
||||
import concurrent.futures
|
||||
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
|
||||
futures = [executor.submit(evaluate_case, case) for case in test_cases]
|
||||
results = [f.result() for f in futures]
|
||||
|
||||
# Timeout setting
|
||||
timeout = 10 # 30s → 10s
|
||||
```
|
||||
|
||||
#### 3. No Statistical Significance
|
||||
|
||||
**Symptom**: p-value ≥ 0.05
|
||||
|
||||
**Causes**:
|
||||
- Improvement effect is small
|
||||
- Insufficient sample size
|
||||
- High data variance
|
||||
|
||||
**Solutions**:
|
||||
```python
|
||||
# Aim for larger improvements
|
||||
# - Apply multiple optimizations simultaneously
|
||||
# - Choose more effective techniques
|
||||
|
||||
# Increase sample size
|
||||
iterations = 20 # 5 → 20
|
||||
|
||||
# Reduce variance
|
||||
# - Lower temperature
|
||||
# - Stabilize evaluation environment
|
||||
```
|
||||
|
||||
## 📊 Continuous Evaluation
|
||||
|
||||
### Scheduled Evaluation
|
||||
|
||||
```yaml
|
||||
evaluation_schedule:
|
||||
daily:
|
||||
- quick_check: 3 test cases, 1 iteration
|
||||
- purpose: Detect major regressions
|
||||
|
||||
weekly:
|
||||
- medium_check: 10 test cases, 3 iterations
|
||||
- purpose: Continuous quality monitoring
|
||||
|
||||
before_release:
|
||||
- full_evaluation: all test cases, 5-10 iterations
|
||||
- purpose: Release quality assurance
|
||||
|
||||
after_major_changes:
|
||||
- comprehensive_evaluation: all test cases, 10+ iterations
|
||||
- purpose: Impact assessment of major changes
|
||||
```
|
||||
|
||||
### Automated Evaluation Pipeline
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
# continuous_evaluation.sh
|
||||
|
||||
# Daily evaluation script
|
||||
|
||||
DATE=$(date +%Y%m%d)
|
||||
RESULTS_DIR="evaluation_results/continuous/$DATE"
|
||||
mkdir -p $RESULTS_DIR
|
||||
|
||||
# Quick check
|
||||
echo "Running quick evaluation..."
|
||||
uv run python -m tests.evaluation.evaluator \
|
||||
--test-cases 3 \
|
||||
--iterations 1 \
|
||||
--output "$RESULTS_DIR/quick.json"
|
||||
|
||||
# Compare with previous results
|
||||
uv run python -m tests.evaluation.compare \
|
||||
--baseline "evaluation_results/baseline/summary.json" \
|
||||
--current "$RESULTS_DIR/quick.json" \
|
||||
--threshold 0.05
|
||||
|
||||
# Notify if regression detected
|
||||
if [ $? -ne 0 ]; then
|
||||
echo "⚠️ Regression detected! Sending notification..."
|
||||
# Notification process (Slack, Email, etc.)
|
||||
fi
|
||||
```
|
||||
|
||||
## Summary
|
||||
|
||||
For effective evaluation:
|
||||
- ✅ **Multiple Metrics**: Quality, performance, cost, reliability
|
||||
- ✅ **Statistical Validation**: Multiple runs and significance testing
|
||||
- ✅ **Consistency**: Same test cases, same conditions
|
||||
- ✅ **Visualization**: Track improvements with graphs and tables
|
||||
- ✅ **Documentation**: Record evaluation results and analysis
|
||||
|
||||
## 📋 Related Documentation
|
||||
|
||||
- [Evaluation Metrics](./evaluation_metrics.md) - Metric definitions and calculation methods
|
||||
- [Test Case Design](./evaluation_testcases.md) - Test case structure
|
||||
- [Statistical Significance](./evaluation_statistics.md) - Statistical analysis methods
|
||||
skills/fine-tune/evaluation_statistics.md (new file, 315 lines)

# Statistical Significance Testing
|
||||
|
||||
Statistical approaches and significance testing in LangGraph application evaluation.
|
||||
|
||||
## 📈 Importance of Multiple Runs
|
||||
|
||||
### Why Multiple Runs Are Necessary
|
||||
|
||||
1. **Account for Randomness**: LLM outputs have probabilistic variation
|
||||
2. **Detect Outliers**: Eliminate effects like temporary network latency
|
||||
3. **Calculate Confidence Intervals**: Determine if improvements are statistically significant
|
||||
|
||||
### Recommended Number of Runs
|
||||
|
||||
| Phase | Runs | Purpose |
|
||||
|-------|------|---------|
|
||||
| **During Development** | 3 | Quick feedback |
|
||||
| **During Evaluation** | 5 | Balanced reliability |
|
||||
| **Before Production** | 10-20 | High statistical confidence |
|
||||
|
||||
## 📊 Statistical Analysis
|
||||
|
||||
### Basic Statistical Calculations
|
||||
|
||||
```python
|
||||
from typing import Dict, List

import numpy as np
|
||||
from scipy import stats
|
||||
|
||||
def statistical_analysis(
|
||||
baseline_results: List[float],
|
||||
improved_results: List[float],
|
||||
alpha: float = 0.05
|
||||
) -> Dict:
|
||||
"""Statistical comparison of baseline and improved versions"""
|
||||
|
||||
# Basic statistics
|
||||
baseline_stats = {
|
||||
"mean": np.mean(baseline_results),
|
||||
"std": np.std(baseline_results),
|
||||
"median": np.median(baseline_results),
|
||||
"min": np.min(baseline_results),
|
||||
"max": np.max(baseline_results)
|
||||
}
|
||||
|
||||
improved_stats = {
|
||||
"mean": np.mean(improved_results),
|
||||
"std": np.std(improved_results),
|
||||
"median": np.median(improved_results),
|
||||
"min": np.min(improved_results),
|
||||
"max": np.max(improved_results)
|
||||
}
|
||||
|
||||
# Independent t-test
|
||||
t_statistic, p_value = stats.ttest_ind(improved_results, baseline_results)
|
||||
|
||||
# Effect size (Cohen's d)
|
||||
pooled_std = np.sqrt(
|
||||
((len(baseline_results) - 1) * baseline_stats["std"]**2 +
|
||||
(len(improved_results) - 1) * improved_stats["std"]**2) /
|
||||
(len(baseline_results) + len(improved_results) - 2)
|
||||
)
|
||||
cohens_d = (improved_stats["mean"] - baseline_stats["mean"]) / pooled_std
|
||||
|
||||
# Improvement percentage
|
||||
improvement_pct = (
|
||||
(improved_stats["mean"] - baseline_stats["mean"]) /
|
||||
baseline_stats["mean"] * 100
|
||||
)
|
||||
|
||||
# Confidence intervals (95%)
|
||||
ci_baseline = stats.t.interval(
|
||||
0.95,
|
||||
len(baseline_results) - 1,
|
||||
loc=baseline_stats["mean"],
|
||||
scale=stats.sem(baseline_results)
|
||||
)
|
||||
|
||||
ci_improved = stats.t.interval(
|
||||
0.95,
|
||||
len(improved_results) - 1,
|
||||
loc=improved_stats["mean"],
|
||||
scale=stats.sem(improved_results)
|
||||
)
|
||||
|
||||
# Determine statistical significance
|
||||
is_significant = p_value < alpha
|
||||
|
||||
# Interpret effect size
|
||||
effect_size_interpretation = (
|
||||
"small" if abs(cohens_d) < 0.5 else
|
||||
"medium" if abs(cohens_d) < 0.8 else
|
||||
"large"
|
||||
)
|
||||
|
||||
return {
|
||||
"baseline": baseline_stats,
|
||||
"improved": improved_stats,
|
||||
"comparison": {
|
||||
"improvement_pct": improvement_pct,
|
||||
"t_statistic": t_statistic,
|
||||
"p_value": p_value,
|
||||
"is_significant": is_significant,
|
||||
"cohens_d": cohens_d,
|
||||
"effect_size": effect_size_interpretation
|
||||
},
|
||||
"confidence_intervals": {
|
||||
"baseline": ci_baseline,
|
||||
"improved": ci_improved
|
||||
}
|
||||
}
|
||||
|
||||
# Usage example
|
||||
baseline_accuracy = [73.0, 75.0, 77.0, 74.0, 76.0] # 5 run results
|
||||
improved_accuracy = [85.0, 87.0, 86.0, 88.0, 84.0] # 5 run results after improvement
|
||||
|
||||
analysis = statistical_analysis(baseline_accuracy, improved_accuracy)
|
||||
print(f"Improvement: {analysis['comparison']['improvement_pct']:.1f}%")
|
||||
print(f"P-value: {analysis['comparison']['p_value']:.4f}")
|
||||
print(f"Significant: {analysis['comparison']['is_significant']}")
|
||||
print(f"Effect size: {analysis['comparison']['effect_size']}")
|
||||
```
|
||||
|
||||
## 🎯 Interpreting Statistical Significance
|
||||
|
||||
### P-value Interpretation
|
||||
|
||||
| P-value | Interpretation | Action |
|
||||
|---------|---------------|--------|
|
||||
| p < 0.01 | **Highly significant** | Adopt improvement with confidence |
|
||||
| p < 0.05 | **Significant** | Can adopt as improvement |
|
||||
| p < 0.10 | **Marginally significant** | Consider additional validation |
|
||||
| p ≥ 0.10 | **Not significant** | Conclude there is no demonstrated improvement |
|
||||
|
||||
### Effect Size (Cohen's d) Interpretation
|
||||
|
||||
| Cohen's d | Effect Size | Meaning |
|
||||
|-----------|------------|---------|
|
||||
| d < 0.2 | **Negligible** | No substantial improvement |
|
||||
| 0.2 ≤ d < 0.5 | **Small** | Slight improvement |
|
||||
| 0.5 ≤ d < 0.8 | **Medium** | Clear improvement |
|
||||
| d ≥ 0.8 | **Large** | Significant improvement |
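
One way to turn these two tables into a single decision is a small helper that simply encodes the thresholds above. A minimal sketch:

```python
def interpret_comparison(p_value: float, cohens_d: float, alpha: float = 0.05) -> str:
    """Map a p-value and Cohen's d to a recommendation, following the tables above."""
    if p_value >= 0.10:
        return "Not significant: no demonstrated improvement, do not adopt"
    if p_value >= alpha:
        return "Marginally significant: run additional validation"
    if abs(cohens_d) < 0.2:
        return "Significant but negligible effect: likely not worth adopting"
    if abs(cohens_d) < 0.5:
        return "Significant with small effect: adopt if the change is low risk"
    return "Significant with medium/large effect: adopt the improvement"
```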
|
||||
|
||||
## 📉 Outlier Detection and Handling
|
||||
|
||||
### Outlier Detection
|
||||
|
||||
```python
|
||||
def detect_outliers(data: List[float], method: str = "iqr") -> List[int]:
|
||||
"""Detect outlier indices"""
|
||||
data_array = np.array(data)
|
||||
|
||||
if method == "iqr":
|
||||
# IQR method (Interquartile Range)
|
||||
q1 = np.percentile(data_array, 25)
|
||||
q3 = np.percentile(data_array, 75)
|
||||
iqr = q3 - q1
|
||||
lower_bound = q1 - 1.5 * iqr
|
||||
upper_bound = q3 + 1.5 * iqr
|
||||
|
||||
outliers = [
|
||||
i for i, val in enumerate(data)
|
||||
if val < lower_bound or val > upper_bound
|
||||
]
|
||||
|
||||
elif method == "zscore":
|
||||
# Z-score method
|
||||
mean = np.mean(data_array)
|
||||
std = np.std(data_array)
|
||||
z_scores = np.abs((data_array - mean) / std)
|
||||
|
||||
        outliers = [i for i, z in enumerate(z_scores) if z > 3]

    else:
        raise ValueError(f"Unknown outlier detection method: {method}")

    return outliers
|
||||
|
||||
# Usage example
|
||||
results = [75.0, 76.0, 74.0, 77.0, 95.0] # 95.0 may be an outlier
|
||||
outliers = detect_outliers(results, method="iqr")
|
||||
print(f"Outlier indices: {outliers}") # => [4]
|
||||
```
|
||||
|
||||
### Outlier Handling Policy
|
||||
|
||||
1. **Investigation**: Identify why outliers occurred
|
||||
2. **Removal Decision**:
|
||||
- Clear errors (network failure, etc.) → Remove
|
||||
- Actual performance variation → Keep
|
||||
3. **Documentation**: Document the cause and handling of each outlier (see the sketch below)
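
A sketch of the policy in practice, using `results` and `outliers` from the previous example: drop only outliers with a documented cause, and keep (but log) the rest as real variation.

```python
from typing import Dict, List

def remove_documented_outliers(
    data: List[float],
    outlier_indices: List[int],
    reasons: Dict[int, str]
) -> List[float]:
    """Remove only outliers that have a documented cause; keep undocumented ones."""
    kept = []
    for i, value in enumerate(data):
        if i in outlier_indices and i in reasons:
            print(f"Removed index {i} ({value}): {reasons[i]}")
            continue
        if i in outlier_indices:
            print(f"Kept index {i} ({value}): no documented cause, treated as real variation")
        kept.append(value)
    return kept

# Usage example (the reason string is hypothetical)
cleaned = remove_documented_outliers(results, outliers, {4: "network failure during run"})
```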
|
||||
|
||||
## 🔄 Considerations for Repeated Measurements
|
||||
|
||||
### Sample Size Calculation
|
||||
|
||||
```python
|
||||
|
||||
|
||||
def required_sample_size(
|
||||
baseline_mean: float,
|
||||
baseline_std: float,
|
||||
expected_improvement_pct: float,
|
||||
alpha: float = 0.05,
|
||||
power: float = 0.8
|
||||
) -> int:
|
||||
"""Estimate required sample size"""
|
||||
improved_mean = baseline_mean * (1 + expected_improvement_pct / 100)
|
||||
|
||||
# Calculate effect size
|
||||
effect_size = abs(improved_mean - baseline_mean) / baseline_std
|
||||
|
||||
# Simple estimation (use statsmodels.stats.power for more accuracy)
|
||||
if effect_size < 0.2:
|
||||
return 100 # Small effect requires many samples
|
||||
elif effect_size < 0.5:
|
||||
return 50
|
||||
elif effect_size < 0.8:
|
||||
return 30
|
||||
else:
|
||||
return 20
|
||||
|
||||
# Usage example
|
||||
sample_size = required_sample_size(
|
||||
baseline_mean=75.0,
|
||||
baseline_std=3.0,
|
||||
expected_improvement_pct=10.0
|
||||
)
|
||||
print(f"Required sample size: {sample_size}")
|
||||
```
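
The buckets above are coarse by design. When a more precise number is needed and statsmodels is available, a standard power analysis can be used instead; a sketch under that assumption:

```python
import math

from statsmodels.stats.power import TTestIndPower

def required_sample_size_power(
    baseline_mean: float,
    baseline_std: float,
    expected_improvement_pct: float,
    alpha: float = 0.05,
    power: float = 0.8
) -> int:
    """Per-group sample size from a two-sample t-test power analysis."""
    improved_mean = baseline_mean * (1 + expected_improvement_pct / 100)
    effect_size = abs(improved_mean - baseline_mean) / baseline_std

    n = TTestIndPower().solve_power(
        effect_size=effect_size,
        alpha=alpha,
        power=power,
        alternative="two-sided"
    )
    return math.ceil(n)

# Same inputs as above (effect size is about 2.5), so only a few runs per group are needed
print(required_sample_size_power(75.0, 3.0, 10.0))
```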
|
||||
|
||||
## 📊 Visualizing Confidence Intervals
|
||||
|
||||
```python
|
||||
import matplotlib.pyplot as plt
|
||||
|
||||
def plot_confidence_intervals(
|
||||
baseline_results: List[float],
|
||||
improved_results: List[float],
|
||||
labels: List[str] = ["Baseline", "Improved"]
|
||||
):
|
||||
"""Plot confidence intervals"""
|
||||
fig, ax = plt.subplots(figsize=(10, 6))
|
||||
|
||||
# Statistical calculations
|
||||
baseline_mean = np.mean(baseline_results)
|
||||
baseline_ci = stats.t.interval(
|
||||
0.95,
|
||||
len(baseline_results) - 1,
|
||||
loc=baseline_mean,
|
||||
scale=stats.sem(baseline_results)
|
||||
)
|
||||
|
||||
improved_mean = np.mean(improved_results)
|
||||
improved_ci = stats.t.interval(
|
||||
0.95,
|
||||
len(improved_results) - 1,
|
||||
loc=improved_mean,
|
||||
scale=stats.sem(improved_results)
|
||||
)
|
||||
|
||||
# Plot
|
||||
positions = [1, 2]
|
||||
means = [baseline_mean, improved_mean]
|
||||
cis = [
|
||||
(baseline_mean - baseline_ci[0], baseline_ci[1] - baseline_mean),
|
||||
(improved_mean - improved_ci[0], improved_ci[1] - improved_mean)
|
||||
]
|
||||
|
||||
ax.errorbar(positions, means, yerr=np.array(cis).T, fmt='o', markersize=10, capsize=10)
|
||||
ax.set_xticks(positions)
|
||||
ax.set_xticklabels(labels)
|
||||
ax.set_ylabel("Metric Value")
|
||||
ax.set_title("Comparison with 95% Confidence Intervals")
|
||||
ax.grid(True, alpha=0.3)
|
||||
|
||||
plt.tight_layout()
|
||||
plt.savefig("confidence_intervals.png")
|
||||
print("Plot saved: confidence_intervals.png")
|
||||
```
|
||||
|
||||
## 📋 Statistical Report Template
|
||||
|
||||
```markdown
|
||||
## Statistical Analysis Results
|
||||
|
||||
### Basic Statistics
|
||||
|
||||
| Metric | Baseline | Improved | Improvement |
|
||||
|--------|----------|----------|-------------|
|
||||
| Mean | 75.0% | 86.0% | +11.0% |
|
||||
| Std Dev | 3.2% | 2.1% | -1.1% |
|
||||
| Median | 75.0% | 86.0% | +11.0% |
|
||||
| Min | 70.0% | 84.0% | +14.0% |
|
||||
| Max | 80.0% | 88.0% | +8.0% |
|
||||
|
||||
### Statistical Tests
|
||||
|
||||
- **t-statistic**: 8.45
|
||||
- **P-value**: 0.0001 (p < 0.01)
|
||||
- **Statistical Significance**: ✅ Highly significant
|
||||
- **Effect Size (Cohen's d)**: 2.3 (large)
|
||||
|
||||
### Confidence Intervals (95%)
|
||||
|
||||
- **Baseline**: [72.8%, 77.2%]
|
||||
- **Improved**: [84.9%, 87.1%]
|
||||
|
||||
### Conclusion
|
||||
|
||||
The improvement is statistically highly significant (p < 0.01), with a large effect size (Cohen's d = 2.3).
|
||||
The confidence intervals do not overlap, which provides strong additional evidence that the improvement is real.
|
||||
```
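
A sketch of filling part of this template programmatically from the `statistical_analysis()` output above (only the statistical tests section is shown; the other sections follow the same pattern):

```python
def render_statistics_section(analysis: Dict) -> str:
    """Format the 'Statistical Tests' part of the report template."""
    comp = analysis["comparison"]
    if comp["p_value"] < 0.01:
        significance = "✅ Highly significant"
    elif comp["is_significant"]:
        significance = "✅ Significant"
    else:
        significance = "❌ Not significant"

    return (
        "### Statistical Tests\n\n"
        f"- **t-statistic**: {comp['t_statistic']:.2f}\n"
        f"- **P-value**: {comp['p_value']:.4f}\n"
        f"- **Statistical Significance**: {significance}\n"
        f"- **Effect Size (Cohen's d)**: {comp['cohens_d']:.1f} ({comp['effect_size']})\n"
    )
```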
|
||||
|
||||
## 📋 Related Documentation
|
||||
|
||||
- [Evaluation Metrics](./evaluation_metrics.md) - Metric definitions and calculation methods
|
||||
- [Test Case Design](./evaluation_testcases.md) - Test case structure
|
||||
- [Best Practices](./evaluation_practices.md) - Practical evaluation guide
|
||||
279
skills/fine-tune/evaluation_testcases.md
Normal file
@@ -0,0 +1,279 @@
|
||||
# Test Case Design
|
||||
|
||||
Structure, coverage, and design principles for test cases used in LangGraph application evaluation.
|
||||
|
||||
## 🧪 Test Case Structure
|
||||
|
||||
### Representative Test Case Structure
|
||||
|
||||
```json
|
||||
{
|
||||
"test_cases": [
|
||||
{
|
||||
"id": "TC001",
|
||||
"category": "product_inquiry",
|
||||
"difficulty": "easy",
|
||||
"input": "How much does the premium plan cost?",
|
||||
"expected_intent": "product_inquiry",
|
||||
"expected_answer": "The premium plan costs $49 per month.",
|
||||
"expected_answer_semantic": ["premium", "plan", "$49", "month"],
|
||||
"metadata": {
|
||||
"user_type": "new",
|
||||
"context_required": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"id": "TC002",
|
||||
"category": "technical_support",
|
||||
"difficulty": "medium",
|
||||
"input": "I can't seem to log into my account even after resetting my password",
|
||||
"expected_intent": "technical_support",
|
||||
"expected_answer": "Let me help you troubleshoot the login issue. First, please clear your browser cache and cookies, then try logging in again.",
|
||||
"expected_answer_semantic": ["troubleshoot", "clear cache", "cookies", "try again"],
|
||||
"metadata": {
|
||||
"user_type": "existing",
|
||||
"context_required": true,
|
||||
"requires_escalation": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"id": "TC003",
|
||||
"category": "edge_case",
|
||||
"difficulty": "hard",
|
||||
"input": "yo whats the deal with my bill being so high lol",
|
||||
"expected_intent": "billing",
|
||||
"expected_answer": "I understand you have concerns about your bill. Let me review your account to identify any unexpected charges.",
|
||||
"expected_answer_semantic": ["concerns", "bill", "review", "charges"],
|
||||
"metadata": {
|
||||
"user_type": "existing",
|
||||
"context_required": true,
|
||||
"tone": "informal",
|
||||
"requires_empathy": true
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
## 📊 Test Case Coverage
|
||||
|
||||
### Balance by Category
|
||||
|
||||
```python
|
||||
def analyze_test_coverage(test_cases: List[Dict]) -> Dict:
|
||||
"""Analyze test case coverage"""
|
||||
categories = {}
|
||||
difficulties = {}
|
||||
|
||||
for case in test_cases:
|
||||
# Category
|
||||
cat = case.get("category", "unknown")
|
||||
categories[cat] = categories.get(cat, 0) + 1
|
||||
|
||||
# Difficulty
|
||||
diff = case.get("difficulty", "unknown")
|
||||
difficulties[diff] = difficulties.get(diff, 0) + 1
|
||||
|
||||
total = len(test_cases)
|
||||
|
||||
return {
|
||||
"total_cases": total,
|
||||
"by_category": {
|
||||
cat: {"count": count, "percentage": count/total*100}
|
||||
for cat, count in categories.items()
|
||||
},
|
||||
"by_difficulty": {
|
||||
diff: {"count": count, "percentage": count/total*100}
|
||||
for diff, count in difficulties.items()
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Recommended Balance
|
||||
|
||||
```yaml
|
||||
category_balance:
|
||||
description: "Recommended distribution by category"
|
||||
recommendations:
|
||||
- main_categories: "20-30% (evenly distributed)"
|
||||
- edge_cases: "10-15% (sufficient abnormal case coverage)"
|
||||
|
||||
difficulty_balance:
|
||||
description: "Recommended distribution by difficulty"
|
||||
recommendations:
|
||||
- easy: "40-50% (basic functionality verification)"
|
||||
- medium: "30-40% (practical cases)"
|
||||
- hard: "10-20% (edge cases and complex scenarios)"
|
||||
```
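
These targets can be checked automatically. A minimal sketch that reuses `analyze_test_coverage` from above (the numeric ranges mirror the YAML and should be adapted per project):

```python
# Recommended difficulty ranges from the YAML above, as (min_pct, max_pct)
DIFFICULTY_RANGES = {"easy": (40, 50), "medium": (30, 40), "hard": (10, 20)}

def check_difficulty_balance(test_cases: List[Dict]) -> List[str]:
    """Return warnings for difficulties that fall outside the recommended ranges."""
    warnings = []
    coverage = analyze_test_coverage(test_cases)  # defined earlier in this document
    for difficulty, (low, high) in DIFFICULTY_RANGES.items():
        pct = coverage["by_difficulty"].get(difficulty, {}).get("percentage", 0.0)
        if not low <= pct <= high:
            warnings.append(
                f"{difficulty}: {pct:.1f}% is outside the recommended {low}-{high}% range"
            )
    return warnings
```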
|
||||
|
||||
## 🎯 Test Case Design Principles
|
||||
|
||||
### 1. Representativeness
|
||||
- **Reflect Real Use Cases**: Cover actual user input patterns
|
||||
- **Weight by Frequency**: Include more common cases
|
||||
|
||||
### 2. Diversity
|
||||
- **Comprehensive Categories**: Cover all major categories
|
||||
- **Difficulty Variation**: From easy to hard
|
||||
- **Edge Cases**: Abnormal cases, ambiguous cases, boundary values
|
||||
|
||||
### 3. Clarity
|
||||
- **Clear Expectations**: Be specific with expected_answer
|
||||
- **Explicit Criteria**: Clearly define correctness criteria
|
||||
|
||||
### 4. Maintainability
|
||||
- **ID-based Tracking**: Unique ID for each test case
|
||||
- **Rich Metadata**: Category, difficulty, and other attributes
|
||||
|
||||
## 📝 Test Case Templates
|
||||
|
||||
### Basic Template
|
||||
|
||||
```json
|
||||
{
|
||||
"id": "TC[number]",
|
||||
"category": "[category name]",
|
||||
"difficulty": "easy|medium|hard",
|
||||
"input": "[user input]",
|
||||
"expected_intent": "[expected intent]",
|
||||
"expected_answer": "[expected answer]",
|
||||
"expected_answer_semantic": ["keyword1", "keyword2"],
|
||||
"metadata": {
|
||||
"user_type": "new|existing",
|
||||
"context_required": true|false,
|
||||
"specific_flag": true|false
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Templates by Category
|
||||
|
||||
#### Product Inquiry
|
||||
```json
|
||||
{
|
||||
"id": "TC_PRODUCT_001",
|
||||
"category": "product_inquiry",
|
||||
"difficulty": "easy",
|
||||
"input": "Question about product",
|
||||
"expected_intent": "product_inquiry",
|
||||
"expected_answer": "Answer including product information",
|
||||
"metadata": {
|
||||
"product_type": "premium|basic|enterprise",
|
||||
"question_type": "pricing|features|comparison"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
#### Technical Support
|
||||
```json
|
||||
{
|
||||
"id": "TC_TECH_001",
|
||||
"category": "technical_support",
|
||||
"difficulty": "medium",
|
||||
"input": "Technical problem report",
|
||||
"expected_intent": "technical_support",
|
||||
"expected_answer": "Troubleshooting steps",
|
||||
"metadata": {
|
||||
"issue_type": "login|performance|bug",
|
||||
"requires_escalation": false,
|
||||
"urgency": "low|medium|high"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
#### Billing
|
||||
```json
|
||||
{
|
||||
"id": "TC_BILLING_001",
|
||||
"category": "billing",
|
||||
"difficulty": "medium",
|
||||
"input": "Billing question",
|
||||
"expected_intent": "billing",
|
||||
"expected_answer": "Billing explanation and next steps",
|
||||
"metadata": {
|
||||
"billing_type": "charge|refund|subscription",
|
||||
"requires_account_access": true
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
#### Edge Cases
|
||||
```json
|
||||
{
|
||||
"id": "TC_EDGE_001",
|
||||
"category": "edge_case",
|
||||
"difficulty": "hard",
|
||||
"input": "Ambiguous, non-standard, or unexpected input",
|
||||
"expected_intent": "appropriate fallback",
|
||||
"expected_answer": "Polite clarification request",
|
||||
"metadata": {
|
||||
"edge_type": "ambiguous|off_topic|malformed",
|
||||
"requires_empathy": true
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## 🔍 Test Case Evaluation
|
||||
|
||||
### Quality Checklist
|
||||
|
||||
```python
|
||||
def validate_test_case(test_case: Dict) -> List[str]:
|
||||
"""Check test case quality"""
|
||||
issues = []
|
||||
|
||||
# Check required fields
|
||||
required_fields = ["id", "category", "difficulty", "input", "expected_intent"]
|
||||
for field in required_fields:
|
||||
if field not in test_case:
|
||||
issues.append(f"Missing required field: {field}")
|
||||
|
||||
# ID uniqueness (requires separate check)
|
||||
# Input length check
|
||||
if len(test_case.get("input", "")) < 5:
|
||||
issues.append("Input too short (minimum 5 characters)")
|
||||
|
||||
# Category validity
|
||||
valid_categories = ["product_inquiry", "technical_support", "billing", "general", "edge_case"]
|
||||
if test_case.get("category") not in valid_categories:
|
||||
issues.append(f"Invalid category: {test_case.get('category')}")
|
||||
|
||||
# Difficulty validity
|
||||
valid_difficulties = ["easy", "medium", "hard"]
|
||||
if test_case.get("difficulty") not in valid_difficulties:
|
||||
issues.append(f"Invalid difficulty: {test_case.get('difficulty')}")
|
||||
|
||||
return issues
|
||||
```
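
The checklist above runs per case and, as its comment notes, cannot verify ID uniqueness. A small suite-level wrapper is one way to close that gap; a sketch:

```python
def validate_test_suite(test_cases: List[Dict]) -> List[str]:
    """Suite-level checks that validate_test_case cannot perform per case."""
    issues = []

    # ID uniqueness across the whole suite
    seen_ids = set()
    for case in test_cases:
        case_id = case.get("id", "")
        if case_id in seen_ids:
            issues.append(f"Duplicate test case ID: {case_id}")
        seen_ids.add(case_id)

    # Per-case checks, prefixed with the case ID
    for case in test_cases:
        for issue in validate_test_case(case):
            issues.append(f"{case.get('id', '?')}: {issue}")

    return issues
```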
|
||||
|
||||
## 📈 Coverage Report
|
||||
|
||||
### Coverage Analysis Script
|
||||
|
||||
```python
|
||||
def generate_coverage_report(test_cases: List[Dict]) -> str:
|
||||
"""Generate test case coverage report"""
|
||||
coverage = analyze_test_coverage(test_cases)
|
||||
|
||||
report = f"""# Test Case Coverage Report
|
||||
|
||||
## Summary
|
||||
- **Total Test Cases**: {coverage['total_cases']}
|
||||
|
||||
## By Category
|
||||
"""
|
||||
for cat, data in coverage['by_category'].items():
|
||||
report += f"- **{cat}**: {data['count']} cases ({data['percentage']:.1f}%)\n"
|
||||
|
||||
report += "\n## By Difficulty\n"
|
||||
for diff, data in coverage['by_difficulty'].items():
|
||||
report += f"- **{diff}**: {data['count']} cases ({data['percentage']:.1f}%)\n"
|
||||
|
||||
return report
|
||||
```
|
||||
|
||||
## 📋 Related Documentation
|
||||
|
||||
- [Evaluation Metrics](./evaluation_metrics.md) - Metric definitions and calculation methods
|
||||
- [Statistical Significance](./evaluation_statistics.md) - Multiple runs and statistical analysis
|
||||
- [Best Practices](./evaluation_practices.md) - Practical evaluation guide
|
||||
119
skills/fine-tune/examples.md
Normal file
@@ -0,0 +1,119 @@
|
||||
# Fine-Tuning Practical Examples Collection
|
||||
|
||||
A collection of specific code examples and markdown templates used for LangGraph application fine-tuning.
|
||||
|
||||
## 📋 Table of Contents
|
||||
|
||||
This guide is divided by Phase:
|
||||
|
||||
### [Phase 1: Preparation and Analysis Examples](./examples_phase1.md)
|
||||
Templates and code examples used in the optimization preparation phase:
|
||||
- **Example 1.1**: fine-tune.md structure example
|
||||
- **Example 1.2**: Optimization target list example
|
||||
- **Example 1.3**: Code search example with Serena MCP
|
||||
|
||||
**Estimated Time**: 30 minutes - 1 hour
|
||||
|
||||
### [Phase 2: Baseline Evaluation Examples](./examples_phase2.md)
|
||||
Scripts and report examples used for current performance measurement:
|
||||
- **Example 2.1**: Evaluation script (evaluator.py)
|
||||
- **Example 2.2**: Baseline measurement script (baseline_evaluation.sh)
|
||||
- **Example 2.3**: Baseline results report
|
||||
|
||||
**Estimated Time**: 1-2 hours
|
||||
|
||||
### [Phase 3: Iterative Improvement Examples](./examples_phase3.md)
|
||||
Practical examples of prompt optimization and result comparison:
|
||||
- **Example 3.1**: Before/After prompt comparison
|
||||
- **Example 3.2**: Prioritization matrix
|
||||
- **Example 3.3**: Iteration results report
|
||||
|
||||
**Estimated Time**: 1-2 hours per iteration × number of iterations
|
||||
|
||||
### [Phase 4: Completion and Documentation Examples](./examples_phase4.md)
|
||||
Examples of recording final results and version control:
|
||||
- **Example 4.1**: Final evaluation report (complete version)
|
||||
- **Example 4.2**: Git commit message examples
|
||||
|
||||
**Estimated Time**: 30 minutes - 1 hour
|
||||
|
||||
## 🎯 How to Use
|
||||
|
||||
### For First-Time Implementation
|
||||
|
||||
1. **Start with [Phase 1 examples](./examples_phase1.md)** - Copy and use templates
|
||||
2. **Set up [Phase 2 evaluation scripts](./examples_phase2.md)** - Customize for your environment
|
||||
3. **Iterate using [Phase 3 comparison examples](./examples_phase3.md)** - Record Before/After
|
||||
4. **Document with [Phase 4 report](./examples_phase4.md)** - Summarize final results
|
||||
|
||||
### Copy & Paste Ready
|
||||
|
||||
Each example includes complete code and templates:
|
||||
- Python scripts → Ready to execute as-is
|
||||
- Bash scripts → Set environment variables and run
|
||||
- Markdown templates → Fill in content and use
|
||||
- JSON structures → Templates for test cases and reports
|
||||
|
||||
## 📊 Types of Examples
|
||||
|
||||
### Code Scripts
|
||||
- **Evaluation scripts** (Phase 2): evaluator.py, aggregate_results.py
|
||||
- **Measurement scripts** (Phase 2): baseline_evaluation.sh
|
||||
- **Analysis scripts** (Phase 1): Serena MCP search examples
|
||||
|
||||
### Markdown Templates
|
||||
- **fine-tune.md** (Phase 1): Goal setting
|
||||
- **Optimization target list** (Phase 1): Organizing improvement targets
|
||||
- **Baseline results report** (Phase 2): Current state analysis
|
||||
- **Iteration results report** (Phase 3): Improvement effect measurement
|
||||
- **Final evaluation report** (Phase 4): Overall summary
|
||||
|
||||
### Comparison Examples
|
||||
- **Before/After prompts** (Phase 3): Specific improvement examples
|
||||
- **Prioritization matrix** (Phase 3): Decision-making records
|
||||
|
||||
## 🔍 Finding Examples
|
||||
|
||||
### By Purpose
|
||||
|
||||
| Purpose | Phase | Example |
|
||||
|---------|-------|---------|
|
||||
| Set goals | Phase 1 | [Example 1.1](./examples_phase1.md#example-11-fine-tunemd-structure-example) |
|
||||
| Find optimization targets | Phase 1 | [Example 1.3](./examples_phase1.md#example-13-code-search-example-with-serena-mcp) |
|
||||
| Create evaluation scripts | Phase 2 | [Example 2.1](./examples_phase2.md#example-21-evaluation-script) |
|
||||
| Measure baseline | Phase 2 | [Example 2.2](./examples_phase2.md#example-22-baseline-measurement-script) |
|
||||
| Improve prompts | Phase 3 | [Example 3.1](./examples_phase3.md#example-31-beforeafter-prompt-comparison) |
|
||||
| Determine priorities | Phase 3 | [Example 3.2](./examples_phase3.md#example-32-prioritization-matrix) |
|
||||
| Write final report | Phase 4 | [Example 4.1](./examples_phase4.md#example-41-final-evaluation-report) |
|
||||
| Git commit | Phase 4 | [Example 4.2](./examples_phase4.md#example-42-git-commit-message-examples) |
|
||||
|
||||
## 🔗 Related Documentation
|
||||
|
||||
- **[Workflow](./workflow.md)** - Detailed procedures for each Phase
|
||||
- **[Evaluation Methods](./evaluation.md)** - Evaluation metrics and statistical analysis
|
||||
- **[Prompt Optimization](./prompt_optimization.md)** - Detailed optimization techniques
|
||||
- **[SKILL.md](./SKILL.md)** - Overview of the Fine-tune skill
|
||||
|
||||
## 💡 Tips
|
||||
|
||||
### Customization Points
|
||||
|
||||
1. **Number of test cases**: Examples use 20 cases, but adjust according to your project
|
||||
2. **Number of runs**: 3-5 runs recommended for baseline measurement, but adjust based on time constraints
|
||||
3. **Target values**: Set Accuracy, Latency, and Cost targets according to project requirements
|
||||
4. **Model**: Adjust pricing if using models other than Claude 3.5 Sonnet
|
||||
|
||||
### Frequently Asked Questions
|
||||
|
||||
**Q: Can I use the example code as-is?**
|
||||
A: Yes, it's executable once you set environment variables (API keys, etc.).
|
||||
|
||||
**Q: Can I edit the templates?**
|
||||
A: Yes, please customize freely according to your project.
|
||||
|
||||
**Q: Can I skip phases?**
|
||||
A: We recommend executing all phases on the first run. From the second run onward, you can start from Phase 2.
|
||||
|
||||
---
|
||||
|
||||
**💡 Tip**: For detailed procedures of each Phase, refer to the [Workflow](./workflow.md).
|
||||
174
skills/fine-tune/examples_phase1.md
Normal file
@@ -0,0 +1,174 @@
|
||||
# Phase 1: Preparation and Analysis Examples
|
||||
|
||||
Practical code examples and templates.
|
||||
|
||||
**📋 Related Documentation**: [Examples Home](./examples.md) | [Workflow Phase 1](./workflow_phase1.md)
|
||||
|
||||
---
|
||||
|
||||
## Phase 1: Preparation and Analysis Examples
|
||||
|
||||
### Example 1.1: fine-tune.md Structure Example
|
||||
|
||||
**File**: `.langgraph-master/fine-tune.md`
|
||||
|
||||
```markdown
|
||||
# Fine-Tuning Goals
|
||||
|
||||
## Optimization Objectives
|
||||
|
||||
- **Accuracy**: Improve user intent classification accuracy to 90% or higher
|
||||
- **Latency**: Reduce response time to 2.0 seconds or less
|
||||
- **Cost**: Reduce cost per request to $0.010 or less
|
||||
|
||||
## Evaluation Method
|
||||
|
||||
### Test Cases
|
||||
|
||||
- **Dataset**: tests/evaluation/test_cases.json (20 cases)
|
||||
- **Execution Command**: uv run python -m src.evaluate
|
||||
- **Evaluation Script**: tests/evaluation/evaluator.py
|
||||
|
||||
### Evaluation Metrics
|
||||
|
||||
#### Accuracy (Correctness Rate)
|
||||
|
||||
- **Calculation Method**: (Number of correct answers / Total cases) × 100
|
||||
- **Target Value**: 90% or higher
|
||||
|
||||
#### Latency (Response Time)
|
||||
|
||||
- **Calculation Method**: Average time of each execution
|
||||
- **Target Value**: 2.0 seconds or less
|
||||
|
||||
#### Cost
|
||||
|
||||
- **Calculation Method**: Total API cost / Total number of requests
|
||||
- **Target Value**: $0.010 or less
|
||||
|
||||
## Pass Criteria
|
||||
|
||||
All evaluation metrics must achieve their target values.
|
||||
```
|
||||
|
||||
### Example 1.2: Optimization Target List Example
|
||||
|
||||
```markdown
|
||||
# Optimization Target Nodes
|
||||
|
||||
## Node: analyze_intent
|
||||
|
||||
### Basic Information
|
||||
|
||||
- **File**: src/nodes/analyzer.py:25-45
|
||||
- **Role**: Classify user input intent
|
||||
- **LLM Model**: claude-3-5-sonnet-20241022
|
||||
- **Current Parameters**: temperature=1.0, max_tokens=default
|
||||
|
||||
### Current Prompt
|
||||
|
||||
\```python
|
||||
SystemMessage(content="You are an intent analyzer. Analyze user input.")
|
||||
HumanMessage(content=f"Analyze: {user_input}")
|
||||
\```
|
||||
|
||||
### Issues
|
||||
|
||||
1. **Ambiguous instructions**: Specific criteria for "Analyze" are unclear
|
||||
2. **No few-shot examples**: No expected output examples
|
||||
3. **Undefined output format**: Free text, not structured
|
||||
4. **High temperature**: 1.0 is too high for classification tasks
|
||||
|
||||
### Improvement Proposals
|
||||
|
||||
1. Specify concrete classification categories
|
||||
2. Add 3-5 few-shot examples
|
||||
3. Specify JSON output format
|
||||
4. Lower temperature to 0.3-0.5
|
||||
|
||||
### Estimated Improvement Effect
|
||||
|
||||
- **Accuracy**: +10-15% (Current misclassification 20% → 5-10%)
|
||||
- **Latency**: ±0 (no change)
|
||||
- **Cost**: ±0 (no change)
|
||||
|
||||
### Priority
|
||||
|
||||
⭐⭐⭐⭐⭐ (Highest priority) - Direct impact on accuracy improvement
|
||||
|
||||
---
|
||||
|
||||
## Node: generate_response
|
||||
|
||||
### Basic Information
|
||||
|
||||
- **File**: src/nodes/generator.py:45-68
|
||||
- **Role**: Generate final user-facing response
|
||||
- **LLM Model**: claude-3-5-sonnet-20241022
|
||||
- **Current Parameters**: temperature=0.7, max_tokens=default
|
||||
|
||||
### Current Prompt
|
||||
|
||||
\```python
|
||||
ChatPromptTemplate.from_messages([
|
||||
("system", "Generate helpful response based on context."),
|
||||
("human", "{context}\n\nQuestion: {question}")
|
||||
])
|
||||
\```
|
||||
|
||||
### Issues
|
||||
|
||||
1. **No redundancy control**: No instructions for conciseness
|
||||
2. **max_tokens not set**: Possibility of unnecessarily long output
|
||||
3. **Response style undefined**: No specification of tone or style
|
||||
|
||||
### Improvement Proposals
|
||||
|
||||
1. Add length instructions such as "concisely" or "in 2-3 sentences"
|
||||
2. Limit max_tokens to 500
|
||||
3. Clarify the response style ("friendly", "professional", etc.)
|
||||
|
||||
### Estimated Improvement Effect
|
||||
|
||||
- **Accuracy**: ±0 (no change)
|
||||
- **Latency**: -0.3-0.5s (due to reduced output tokens)
|
||||
- **Cost**: -20-30% (due to reduced token count)
|
||||
|
||||
### Priority
|
||||
|
||||
⭐⭐⭐ (Medium) - Improvement in latency and cost
|
||||
```
|
||||
|
||||
### Example 1.3: Code Search Example with Serena MCP
|
||||
|
||||
```python
|
||||
# Search for LLM client
|
||||
from mcp_serena import find_symbol, find_referencing_symbols
|
||||
|
||||
# Step 1: Search for ChatAnthropic usage locations
|
||||
chat_anthropic_usages = find_symbol(
|
||||
name_path="ChatAnthropic",
|
||||
substring_matching=True,
|
||||
include_body=False
|
||||
)
|
||||
|
||||
print(f"Found {len(chat_anthropic_usages)} ChatAnthropic usages")
|
||||
|
||||
# Step 2: Investigate details of each usage location
|
||||
for usage in chat_anthropic_usages:
|
||||
print(f"\nFile: {usage.relative_path}:{usage.line_start}")
|
||||
print(f"Context: {usage.name_path}")
|
||||
|
||||
# Identify prompt construction locations
|
||||
references = find_referencing_symbols(
|
||||
name_path=usage.name,
|
||||
relative_path=usage.relative_path
|
||||
)
|
||||
|
||||
# Display locations that may contain prompts
|
||||
for ref in references:
|
||||
if "message" in ref.name.lower() or "prompt" in ref.name.lower():
|
||||
print(f" - Potential prompt location: {ref.name_path}")
|
||||
```
|
||||
|
||||
---
|
||||
194
skills/fine-tune/examples_phase2.md
Normal file
@@ -0,0 +1,194 @@
|
||||
# Phase 2: Baseline Evaluation Examples
|
||||
|
||||
Examples of evaluation scripts and result reports.
|
||||
|
||||
**📋 Related Documentation**: [Examples Home](./examples.md) | [Workflow Phase 2](./workflow_phase2.md) | [Evaluation Methods](./evaluation.md)
|
||||
|
||||
---
|
||||
|
||||
## Phase 2: Baseline Evaluation Examples
|
||||
|
||||
### Example 2.1: Evaluation Script
|
||||
|
||||
**File**: `tests/evaluation/evaluator.py`
|
||||
|
||||
```python
|
||||
import argparse
import json
|
||||
import time
|
||||
from pathlib import Path
|
||||
from typing import Dict, List
|
||||
|
||||
def evaluate_test_cases(test_cases: List[Dict]) -> Dict:
|
||||
"""Evaluate test cases"""
|
||||
results = {
|
||||
"total_cases": len(test_cases),
|
||||
"correct": 0,
|
||||
"total_latency": 0.0,
|
||||
"total_cost": 0.0,
|
||||
"case_results": []
|
||||
}
|
||||
|
||||
for case in test_cases:
|
||||
start_time = time.time()
|
||||
|
||||
# Run LangGraph application
|
||||
        output = run_langgraph_app(case["input"])  # application entry point (see the sketch after this code block)
|
||||
|
||||
latency = time.time() - start_time
|
||||
|
||||
# Correctness judgment
|
||||
        is_correct = output["answer"] == case["expected_answer"]  # exact match; replace with semantic matching if needed
|
||||
if is_correct:
|
||||
results["correct"] += 1
|
||||
|
||||
# Cost calculation (from token usage)
|
||||
cost = calculate_cost(output["token_usage"])
|
||||
|
||||
results["total_latency"] += latency
|
||||
results["total_cost"] += cost
|
||||
|
||||
results["case_results"].append({
|
||||
"case_id": case["id"],
|
||||
"correct": is_correct,
|
||||
"latency": latency,
|
||||
"cost": cost
|
||||
})
|
||||
|
||||
# Calculate metrics
|
||||
results["accuracy"] = (results["correct"] / results["total_cases"]) * 100
|
||||
results["avg_latency"] = results["total_latency"] / results["total_cases"]
|
||||
results["avg_cost"] = results["total_cost"] / results["total_cases"]
|
||||
|
||||
return results
|
||||
|
||||
def calculate_cost(token_usage: Dict) -> float:
|
||||
"""Calculate cost from token usage"""
|
||||
# Claude 3.5 Sonnet pricing
|
||||
INPUT_COST_PER_1M = 3.0 # $3.00 per 1M input tokens
|
||||
OUTPUT_COST_PER_1M = 15.0 # $15.00 per 1M output tokens
|
||||
|
||||
input_cost = (token_usage["input_tokens"] / 1_000_000) * INPUT_COST_PER_1M
|
||||
output_cost = (token_usage["output_tokens"] / 1_000_000) * OUTPUT_COST_PER_1M
|
||||
|
||||
return input_cost + output_cost
|
||||
|
||||
if __name__ == "__main__":
    # Accept the flags passed by scripts/baseline_evaluation.sh
    parser = argparse.ArgumentParser()
    parser.add_argument("--output", default="evaluation_results/baseline_run.json")
    parser.add_argument("--verbose", action="store_true")
    args = parser.parse_args()

    # Load test cases
    with open("tests/evaluation/test_cases.json") as f:
        test_cases = json.load(f)["test_cases"]

    # Execute evaluation
    results = evaluate_test_cases(test_cases)

    # Save results
    with open(args.output, "w") as f:
        json.dump(results, f, indent=2)

    if args.verbose:
        for case_result in results["case_results"]:
            print(case_result)

    print(f"Accuracy: {results['accuracy']:.1f}%")
    print(f"Avg Latency: {results['avg_latency']:.2f}s")
    print(f"Avg Cost: ${results['avg_cost']:.4f}")
|
||||
```
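
`run_langgraph_app` is the project-specific entry point and is not defined in this file. A minimal sketch of what it might look like, assuming the compiled graph is exported as `app` from `src.graph` and that the final state carries `answer` and `token_usage` keys:

```python
from typing import Dict

from src.graph import app  # assumption: your compiled StateGraph

def run_langgraph_app(user_input: str) -> Dict:
    """Invoke the compiled graph once and normalize the fields the evaluator expects."""
    final_state = app.invoke({"user_input": user_input})
    return {
        "answer": final_state.get("answer", ""),
        "token_usage": final_state.get(
            "token_usage", {"input_tokens": 0, "output_tokens": 0}
        ),
    }
```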
|
||||
|
||||
### Example 2.2: Baseline Measurement Script
|
||||
|
||||
**File**: `scripts/baseline_evaluation.sh`
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
|
||||
ITERATIONS=5
|
||||
RESULTS_DIR="evaluation_results/baseline"
|
||||
mkdir -p $RESULTS_DIR
|
||||
|
||||
echo "Starting baseline evaluation: $ITERATIONS iterations"
|
||||
|
||||
for i in $(seq 1 $ITERATIONS); do
|
||||
echo "----------------------------------------"
|
||||
echo "Iteration $i/$ITERATIONS"
|
||||
echo "----------------------------------------"
|
||||
|
||||
uv run python -m tests.evaluation.evaluator \
|
||||
--output "$RESULTS_DIR/run_$i.json" \
|
||||
--verbose
|
||||
|
||||
echo "Completed iteration $i"
|
||||
|
||||
# API rate limit mitigation
|
||||
if [ $i -lt $ITERATIONS ]; then
|
||||
echo "Waiting 5 seconds before next iteration..."
|
||||
sleep 5
|
||||
fi
|
||||
done
|
||||
|
||||
echo ""
|
||||
echo "All iterations completed. Aggregating results..."
|
||||
|
||||
# Aggregate results
|
||||
uv run python -m tests.evaluation.aggregate \
|
||||
--input-dir "$RESULTS_DIR" \
|
||||
--output "$RESULTS_DIR/summary.json"
|
||||
|
||||
echo "Baseline evaluation complete!"
|
||||
echo "Results saved to: $RESULTS_DIR/summary.json"
|
||||
```
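
The script invokes `tests.evaluation.aggregate`, which is not shown in this guide. A minimal sketch of that module, assuming each `run_*.json` file has the shape produced by `evaluator.py` above:

```python
import argparse
import json
from pathlib import Path

import numpy as np

def aggregate(input_dir: str, output: str) -> None:
    """Collect per-run metrics and write mean/std as a summary."""
    accuracies, latencies, costs = [], [], []
    for path in sorted(Path(input_dir).glob("run_*.json")):
        with open(path) as f:
            run = json.load(f)
        accuracies.append(run["accuracy"])
        latencies.append(run["avg_latency"])
        costs.append(run["avg_cost"])

    summary = {
        "runs": len(accuracies),
        "accuracy": {"mean": float(np.mean(accuracies)), "std": float(np.std(accuracies))},
        "latency": {"mean": float(np.mean(latencies)), "std": float(np.std(latencies))},
        "cost": {"mean": float(np.mean(costs)), "std": float(np.std(costs))},
    }
    with open(output, "w") as f:
        json.dump(summary, f, indent=2)

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--input-dir", required=True)
    parser.add_argument("--output", required=True)
    args = parser.parse_args()
    aggregate(args.input_dir, args.output)
```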
|
||||
|
||||
### Example 2.3: Baseline Results Report
|
||||
|
||||
```markdown
|
||||
# Baseline Evaluation Results
|
||||
|
||||
Execution Date/Time: 2024-11-24 10:00:00
|
||||
Number of Runs: 5
|
||||
Number of Test Cases: 20
|
||||
|
||||
## Evaluation Metrics Summary
|
||||
|
||||
| Metric | Average | Std Dev | Min | Max | Target | Gap |
|
||||
| -------- | ------- | ------- | ------ | ------ | ------ | ---------- |
|
||||
| Accuracy | 75.0% | 3.2% | 70.0% | 80.0% | 90.0% | **-15.0%** |
|
||||
| Latency | 2.5s | 0.4s | 2.1s | 3.2s | 2.0s | **+0.5s** |
|
||||
| Cost/req | $0.015 | $0.002 | $0.013 | $0.018 | $0.010 | **+$0.005** |
|
||||
|
||||
## Detailed Analysis
|
||||
|
||||
### Accuracy Issues
|
||||
|
||||
- **Current**: 75.0% (Target: 90.0%)
|
||||
- **Main incorrect answer patterns**:
|
||||
1. Intent classification errors: 12 cases (60% of errors)
|
||||
2. Insufficient context understanding: 5 cases (25% of errors)
|
||||
3. Ambiguous question handling: 3 cases (15% of errors)
|
||||
|
||||
### Latency Issues
|
||||
|
||||
- **Current**: 2.5s (Target: 2.0s)
|
||||
- **Bottlenecks**:
|
||||
1. generate_response node: Average 1.8s (72% of total)
|
||||
2. analyze_intent node: Average 0.5s (20% of total)
|
||||
3. Other: Average 0.2s (8% of total)
|
||||
|
||||
### Cost Issues
|
||||
|
||||
- **Current**: $0.015/req (Target: $0.010/req)
|
||||
- **Cost breakdown**:
|
||||
1. generate_response: $0.011 (73%)
|
||||
2. analyze_intent: $0.003 (20%)
|
||||
3. Other: $0.001 (7%)
|
||||
- **Main factor**: High output token count (average 800 tokens)
|
||||
|
||||
## Improvement Directions
|
||||
|
||||
### Priority 1: Improve analyze_intent accuracy
|
||||
|
||||
- **Impact**: Direct impact on Accuracy (accounts for 60% of the -15% gap)
|
||||
- **Improvement measures**: Few-shot examples, clear classification criteria, JSON output format
|
||||
- **Estimated effect**: +10-12% accuracy
|
||||
|
||||
### Priority 2: Optimize generate_response efficiency
|
||||
|
||||
- **Impact**: Affects both Latency and Cost
|
||||
- **Improvement measures**: Conciseness instructions, max_tokens limit, temperature adjustment
|
||||
- **Estimated effect**: -0.4s latency, -$0.004 cost
|
||||
```
|
||||
|
||||
---
|
||||
230
skills/fine-tune/examples_phase3.md
Normal file
@@ -0,0 +1,230 @@
|
||||
# Phase 3: Iterative Improvement Examples
|
||||
|
||||
Examples of before/after prompt comparisons and result reports.
|
||||
|
||||
**📋 Related Documentation**: [Examples Home](./examples.md) | [Workflow Phase 3](./workflow_phase3.md) | [Prompt Optimization](./prompt_optimization.md)
|
||||
|
||||
---
|
||||
|
||||
## Phase 3: Iterative Improvement Examples
|
||||
|
||||
### Example 3.1: Before/After Prompt Comparison
|
||||
|
||||
**Node**: analyze_intent
|
||||
|
||||
#### Before (Baseline)
|
||||
|
||||
```python
|
||||
def analyze_intent(state: GraphState) -> GraphState:
|
||||
llm = ChatAnthropic(
|
||||
model="claude-3-5-sonnet-20241022",
|
||||
temperature=1.0
|
||||
)
|
||||
|
||||
messages = [
|
||||
SystemMessage(content="You are an intent analyzer. Analyze user input."),
|
||||
HumanMessage(content=f"Analyze: {state['user_input']}")
|
||||
]
|
||||
|
||||
response = llm.invoke(messages)
|
||||
state["intent"] = response.content
|
||||
return state
|
||||
```
|
||||
|
||||
**Issues**:
|
||||
- Ambiguous instructions
|
||||
- No few-shot examples
|
||||
- Free text output
|
||||
- High temperature
|
||||
|
||||
**Result**: Accuracy 75%
|
||||
|
||||
#### After (Iteration 1)
|
||||
|
||||
```python
|
||||
import json  # needed to parse the structured output below


def analyze_intent(state: GraphState) -> GraphState:
|
||||
llm = ChatAnthropic(
|
||||
model="claude-3-5-sonnet-20241022",
|
||||
temperature=0.3 # Lower temperature for classification tasks
|
||||
)
|
||||
|
||||
# Clear classification categories and few-shot examples
|
||||
system_prompt = """You are an intent classifier for a customer support chatbot.
|
||||
|
||||
Classify user input into one of these categories:
|
||||
- "product_inquiry": Questions about products or services
|
||||
- "technical_support": Technical issues or troubleshooting
|
||||
- "billing": Payment, invoicing, or billing questions
|
||||
- "general": General questions or chitchat
|
||||
|
||||
Output ONLY a valid JSON object with this structure:
|
||||
{
|
||||
"intent": "<category>",
|
||||
"confidence": <0.0-1.0>,
|
||||
"reasoning": "<brief explanation>"
|
||||
}
|
||||
|
||||
Examples:
|
||||
|
||||
Input: "How much does the premium plan cost?"
|
||||
Output: {"intent": "product_inquiry", "confidence": 0.95, "reasoning": "Question about product pricing"}
|
||||
|
||||
Input: "I can't log into my account"
|
||||
Output: {"intent": "technical_support", "confidence": 0.9, "reasoning": "Authentication issue"}
|
||||
|
||||
Input: "Why was I charged twice?"
|
||||
Output: {"intent": "billing", "confidence": 0.95, "reasoning": "Question about billing charges"}
|
||||
|
||||
Input: "Hello, how are you?"
|
||||
Output: {"intent": "general", "confidence": 0.85, "reasoning": "General greeting"}
|
||||
|
||||
Input: "What's the return policy?"
|
||||
Output: {"intent": "product_inquiry", "confidence": 0.9, "reasoning": "Question about product policy"}
|
||||
"""
|
||||
|
||||
messages = [
|
||||
SystemMessage(content=system_prompt),
|
||||
HumanMessage(content=f"Input: {state['user_input']}\nOutput:")
|
||||
]
|
||||
|
||||
response = llm.invoke(messages)
|
||||
|
||||
# JSON parsing (with error handling)
|
||||
try:
|
||||
intent_data = json.loads(response.content)
|
||||
state["intent"] = intent_data["intent"]
|
||||
state["confidence"] = intent_data["confidence"]
|
||||
except json.JSONDecodeError:
|
||||
# Fallback
|
||||
state["intent"] = "general"
|
||||
state["confidence"] = 0.5
|
||||
|
||||
return state
|
||||
```
|
||||
|
||||
**Improvements**:
|
||||
- ✅ temperature: 1.0 → 0.3
|
||||
- ✅ Clear classification categories (4 intents)
|
||||
- ✅ Few-shot examples (5 added)
|
||||
- ✅ JSON output format (structured output)
|
||||
- ✅ Error handling (fallback for JSON parsing failures)
|
||||
|
||||
**Result**: Accuracy 86% (+11%)
|
||||
|
||||
### Example 3.2: Prioritization Matrix
|
||||
|
||||
```markdown
|
||||
## Improvement Prioritization Matrix
|
||||
|
||||
| Node | Impact | Feasibility | Implementation Cost | Total Score | Priority |
|
||||
| ----------------- | ------------ | ------------ | ------------------- | ----------- | -------- |
|
||||
| analyze_intent | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | 14/15 | 1st |
|
||||
| generate_response | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | 12/15 | 2nd |
|
||||
| retrieve_context | ⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ | 8/15 | 3rd |
|
||||
|
||||
### Detailed Analysis
|
||||
|
||||
#### 1st: analyze_intent Node
|
||||
|
||||
- **Impact**: ⭐⭐⭐⭐⭐
|
||||
- Direct impact on Accuracy (accounts for 60% of -15% gap)
|
||||
- Also affects downstream nodes (chain errors from misclassification)
|
||||
|
||||
- **Feasibility**: ⭐⭐⭐⭐⭐
|
||||
- Improvement expected from few-shot examples
|
||||
- Similar cases show +10-15% improvement
|
||||
|
||||
- **Implementation Cost**: ⭐⭐⭐⭐
|
||||
- Implementation time: 30-60 minutes
|
||||
- Testing time: 30 minutes
|
||||
- Risk: Low
|
||||
|
||||
**Iteration 1 target**: analyze_intent node
|
||||
|
||||
#### 2nd: generate_response Node
|
||||
|
||||
- **Impact**: ⭐⭐⭐⭐
|
||||
- Main contributor to Latency and Cost (over 70% of total)
|
||||
- Small direct impact on Accuracy
|
||||
|
||||
- **Feasibility**: ⭐⭐⭐⭐
|
||||
- max_tokens limit ensures improvement
|
||||
- Quality can be maintained with conciseness instructions
|
||||
|
||||
- **Implementation Cost**: ⭐⭐⭐⭐
|
||||
- Implementation time: 20-30 minutes
|
||||
- Testing time: 30 minutes
|
||||
- Risk: Low
|
||||
|
||||
**Iteration 2 target**: generate_response node
|
||||
```
|
||||
|
||||
### Example 3.3: Iteration Results Report
|
||||
|
||||
```markdown
|
||||
# Iteration 1 Evaluation Results
|
||||
|
||||
Execution Date/Time: 2024-11-24 12:00:00
|
||||
Changes: analyze_intent node optimization
|
||||
|
||||
## Result Comparison
|
||||
|
||||
| Metric | Baseline | Iteration 1 | Change | Change Rate | Target | Achievement |
|
||||
| ------------ | -------- | ----------- | ---------- | ----------- | ------ | ----------- |
|
||||
| **Accuracy** | 75.0% | **86.0%** | **+11.0%** | +14.7% | 90.0% | 95.6% |
|
||||
| **Latency**  | 2.5s     | 2.4s        | -0.1s      | -4.0%       | 2.0s   | 83.3%       |
|
||||
| **Cost/req** | $0.015 | $0.014 | -$0.001 | -6.7% | $0.010 | 71.4% |
|
||||
|
||||
## Detailed Analysis
|
||||
|
||||
### Accuracy Improvement
|
||||
|
||||
- **Improvement**: +11.0% (75.0% → 86.0%)
|
||||
- **Remaining gap**: 4.0% (Target 90.0%)
|
||||
- **Improved cases**: Intent classification errors reduced from 12 → 3 cases
|
||||
- **Still needs improvement**: Context understanding cases (5 cases)
|
||||
|
||||
### Slight Latency Improvement
|
||||
|
||||
- **Improvement**: -0.1s (2.5s → 2.4s)
|
||||
- **Main factor**: analyze_intent output became more concise due to lower temperature
|
||||
- **Remaining bottleneck**: generate_response (average 1.8s)
|
||||
|
||||
### Slight Cost Reduction
|
||||
|
||||
- **Reduction**: -$0.001 (6.7% reduction)
|
||||
- **Factor**: analyze_intent output token reduction
|
||||
- **Main cost**: generate_response still accounts for 73%
|
||||
|
||||
## Statistical Significance
|
||||
|
||||
- **t-test**: p < 0.01 ✅ (statistically significant)
|
||||
- **Effect size**: Cohen's d = 2.3 (large effect)
|
||||
- **Confidence interval**: [83.9%, 88.1%] (95% CI)
|
||||
|
||||
## Next Iteration Strategy
|
||||
|
||||
### Priority 1: Optimize generate_response
|
||||
|
||||
- **Goal**: Latency from 1.8s → 1.4s, Cost from $0.011 → $0.007
|
||||
- **Approach**:
|
||||
1. Add conciseness instructions
|
||||
2. Limit max_tokens to 500
|
||||
3. Adjust temperature from 0.7 → 0.5
|
||||
|
||||
### Priority 2: Final 4% Accuracy improvement
|
||||
|
||||
- **Goal**: 86.0% → 90.0% or higher
|
||||
- **Approach**: Improve context understanding (retrieve_context node)
|
||||
|
||||
## Decision
|
||||
|
||||
✅ **Continue** → Proceed to Iteration 2
|
||||
|
||||
Reasons:
|
||||
- Accuracy improved significantly but still hasn't reached target
|
||||
- Latency and Cost still have room for improvement
|
||||
- Clear improvement strategy is in place
|
||||
```
|
||||
|
||||
---
|
||||
288
skills/fine-tune/examples_phase4.md
Normal file
@@ -0,0 +1,288 @@
|
||||
# Phase 4: Completion and Documentation Examples
|
||||
|
||||
Examples of final reports and Git commits.
|
||||
|
||||
**📋 Related Documentation**: [Examples Home](./examples.md) | [Workflow Phase 4](./workflow_phase4.md)
|
||||
|
||||
---
|
||||
|
||||
## Phase 4: Completion and Documentation Examples
|
||||
|
||||
### Example 4.1: Final Evaluation Report
|
||||
|
||||
```markdown
|
||||
# LangGraph Application Fine-Tuning Completion Report
|
||||
|
||||
Project: Customer Support Chatbot
|
||||
Implementation Period: 2024-11-24 10:00 - 2024-11-24 15:00 (5 hours)
|
||||
Implementer: Claude Code (fine-tune skill)
|
||||
|
||||
## 🎯 Executive Summary
|
||||
|
||||
This fine-tuning project optimized the prompts for the LangGraph chatbot application and achieved the following results:
|
||||
|
||||
- ✅ **Accuracy**: 75.0% → 92.0% (+17.0%, target 90% achieved)
|
||||
- ✅ **Latency**: 2.5s → 1.9s (-24.0%, target 2.0s achieved)
|
||||
- ⚠️ **Cost**: $0.015 → $0.011 (-26.7%, target $0.010 not achieved)
|
||||
|
||||
A total of 3 iterations were conducted, achieving targets for 2 out of 3 metrics.
|
||||
|
||||
## 📊 Implementation Summary
|
||||
|
||||
### Number of Iterations and Execution Time
|
||||
|
||||
- **Total Iterations**: 3
|
||||
- **Number of Nodes Optimized**: 2 (analyze_intent, generate_response)
|
||||
- **Number of Evaluation Runs**: 20 times (Baseline 5 times + 5 times after each iteration × 3)
|
||||
- **Total Execution Time**: Approximately 5 hours
|
||||
|
||||
### Final Results
|
||||
|
||||
| Metric | Initial | Final | Improvement | Improvement Rate | Target | Achievement Status |
|
||||
| -------- | ------- | ------ | ----------- | ---------------- | ------ | ------------------ |
|
||||
| Accuracy | 75.0% | 92.0% | +17.0% | +22.7% | 90.0% | ✅ 102.2% |
|
||||
| Latency | 2.5s | 1.9s | -0.6s | -24.0% | 2.0s | ✅ 95.0% |
|
||||
| Cost/req | $0.015 | $0.011 | -$0.004 | -26.7% | $0.010 | ⚠️ 90.9% |
|
||||
|
||||
## 📝 Details by Iteration
|
||||
|
||||
### Iteration 1: Optimize analyze_intent Node
|
||||
|
||||
**Implementation Date/Time**: 2024-11-24 11:00
|
||||
**Target Node**: src/nodes/analyzer.py:25-45
|
||||
|
||||
**Changes**:
|
||||
1. temperature: 1.0 → 0.3
|
||||
2. Added 5 few-shot examples
|
||||
3. Structured into JSON output format
|
||||
4. Defined clear classification categories (4 categories)
|
||||
|
||||
**Results**:
|
||||
- Accuracy: 75.0% → 86.0% (+11.0%)
|
||||
- Latency: 2.5s → 2.4s (-0.1s)
|
||||
- Cost: $0.015 → $0.014 (-$0.001)
|
||||
|
||||
**Learnings**: Few-shot examples and clear output format are most effective for accuracy improvement
|
||||
|
||||
---
|
||||
|
||||
### Iteration 2: Optimize generate_response Node
|
||||
|
||||
**Implementation Date/Time**: 2024-11-24 13:00
|
||||
**Target Node**: src/nodes/generator.py:45-68
|
||||
|
||||
**Changes**:
|
||||
1. Added conciseness instructions ("respond in 2-3 sentences")
|
||||
2. max_tokens: unlimited → 500
|
||||
3. temperature: 0.7 → 0.5
|
||||
4. Clarified response style
|
||||
|
||||
**Results**:
|
||||
- Accuracy: 86.0% → 88.0% (+2.0%)
|
||||
- Latency: 2.4s → 2.0s (-0.4s)
|
||||
- Cost: $0.014 → $0.011 (-$0.003)
|
||||
|
||||
**Learnings**: max_tokens limit significantly contributes to latency and cost reduction
|
||||
|
||||
---
|
||||
|
||||
### Iteration 3: Additional Improvements to analyze_intent
|
||||
|
||||
**Implementation Date/Time**: 2024-11-24 14:30
|
||||
**Target Node**: src/nodes/analyzer.py:25-45
|
||||
|
||||
**Changes**:
|
||||
1. Increased few-shot examples from 5 → 10
|
||||
2. Added edge case handling
|
||||
3. Reclassification logic based on confidence threshold
|
||||
|
||||
**Results**:
|
||||
- Accuracy: 88.0% → 92.0% (+4.0%)
|
||||
- Latency: 2.0s → 1.9s (-0.1s)
|
||||
- Cost: $0.011 → $0.011 (±0)
|
||||
|
||||
**Learnings**: Additional few-shot examples broke through the final accuracy barrier
|
||||
|
||||
## 🔧 Final Changes Summary
|
||||
|
||||
### src/nodes/analyzer.py
|
||||
|
||||
**Changed Lines**: 25-45
|
||||
|
||||
**Main Changes**:
|
||||
- temperature: 1.0 → 0.3
|
||||
- Few-shot examples: 0 → 10
|
||||
- Output: Free text → JSON
|
||||
- Added fallback based on confidence threshold
|
||||
|
||||
---
|
||||
|
||||
### src/nodes/generator.py
|
||||
|
||||
**Changed Lines**: 45-68
|
||||
|
||||
**Main Changes**:
|
||||
- temperature: 0.7 → 0.5
|
||||
- max_tokens: unlimited → 500
|
||||
- Clear conciseness instructions ("2-3 sentences")
|
||||
- Added response style guidelines
|
||||
|
||||
## 📈 Detailed Evaluation Results
|
||||
|
||||
### Improvement Status by Test Case
|
||||
|
||||
| Case ID | Category | Before | After | Improvement |
|
||||
| ------- | --------- | ----------- | ----------- | ----------- |
|
||||
| TC001 | Product | ❌ Wrong | ✅ Correct | ✅ |
|
||||
| TC002 | Technical | ❌ Wrong | ✅ Correct | ✅ |
|
||||
| TC003 | Billing | ✅ Correct | ✅ Correct | - |
|
||||
| ... | ... | ... | ... | ... |
|
||||
| TC020 | Technical | ✅ Correct | ✅ Correct | - |
|
||||
|
||||
**Improved Cases**: 15/20 (75%)
|
||||
**Maintained Cases**: 5/20 (25%)
|
||||
**Degraded Cases**: 0/20 (0%)
|
||||
|
||||
### Latency Breakdown
|
||||
|
||||
| Node | Before | After | Change | Change Rate |
|
||||
| ----------------- | ------ | ----- | ------ | ----------- |
|
||||
| analyze_intent | 0.5s | 0.4s | -0.1s | -20% |
|
||||
| retrieve_context | 0.2s | 0.2s | ±0s | 0% |
|
||||
| generate_response | 1.8s | 1.3s | -0.5s | -28% |
|
||||
| **Total** | **2.5s** | **1.9s** | **-0.6s** | **-24%** |
|
||||
|
||||
### Cost Breakdown
|
||||
|
||||
| Node | Before | After | Change | Change Rate |
|
||||
| ----------------- | ------- | ------- | -------- | ----------- |
|
||||
| analyze_intent | $0.003 | $0.003 | ±$0 | 0% |
|
||||
| retrieve_context | $0.001 | $0.001 | ±$0 | 0% |
|
||||
| generate_response | $0.011 | $0.007 | -$0.004 | -36% |
|
||||
| **Total** | **$0.015** | **$0.011** | **-$0.004** | **-27%** |
|
||||
|
||||
## 💡 Future Recommendations
|
||||
|
||||
### Short-term (1-2 weeks)
|
||||
|
||||
1. **Achieve Cost Target**: $0.011 → $0.010
|
||||
- Approach: Consider partial migration to Claude 3.5 Haiku
|
||||
- Estimated effect: -$0.002-0.003/req
|
||||
|
||||
2. **Further Accuracy Improvement**: 92.0% → 95.0%
|
||||
- Approach: Analyze error cases and add few-shot examples
|
||||
- Estimated effect: +3.0%
|
||||
|
||||
### Mid-term (1-2 months)
|
||||
|
||||
1. **Model Optimization**
|
||||
- Use Haiku for simple intent classification
|
||||
- Use Sonnet only for complex response generation
|
||||
- Estimated effect: -30-40% cost, minimal impact on latency
|
||||
|
||||
2. **Utilize Prompt Caching**
|
||||
- Cache system prompts and few-shot examples
|
||||
- Estimated effect: -50% cost (when cache hits)
|
||||
|
||||
### Long-term (3-6 months)
|
||||
|
||||
1. **Consider Fine-tuned Models**
|
||||
- Model fine-tuning with proprietary data
|
||||
- Concise prompts without few-shot examples
|
||||
- Estimated effect: -60% cost, +5% accuracy
|
||||
|
||||
## 🎓 Conclusion
|
||||
|
||||
This project achieved the following through fine-tuning the LangGraph application:
|
||||
|
||||
✅ **Successes**:
|
||||
1. Significant accuracy improvement (+22.7%) - Exceeded target by 2.2%
|
||||
2. Notable latency improvement (-24.0%) - Exceeded target by 5%
|
||||
3. Cost reduction (-26.7%) - 9.1% away from target
|
||||
|
||||
⚠️ **Challenges**:
|
||||
1. Cost target not achieved ($0.011 vs $0.010 target) - Can be addressed by migrating to lighter models
|
||||
|
||||
📈 **Business Impact**:
|
||||
- Improved user satisfaction (due to accuracy improvement)
|
||||
- Reduced operational costs (due to latency and cost reduction)
|
||||
- Improved scalability (efficient resource usage)
|
||||
|
||||
🎯 **Next Steps**:
|
||||
1. Verify migration to lighter models for cost reduction
|
||||
2. Continuous monitoring and evaluation
|
||||
3. Expand to new use cases
|
||||
|
||||
---
|
||||
|
||||
Created Date/Time: 2024-11-24 15:00:00
|
||||
Creator: Claude Code (fine-tune skill)
|
||||
```
|
||||
|
||||
### Example 4.2: Git Commit Message Examples
|
||||
|
||||
```bash
|
||||
# Iteration 1 commit
|
||||
git commit -m "feat(nodes): optimize analyze_intent prompt for accuracy
|
||||
|
||||
- Add temperature control (1.0 -> 0.3) for deterministic classification
|
||||
- Add 5 few-shot examples for intent categories
|
||||
- Implement JSON structured output format
|
||||
- Add error handling for JSON parsing failures
|
||||
|
||||
Results:
|
||||
- Accuracy: 75.0% -> 86.0% (+11.0%)
|
||||
- Latency: 2.5s -> 2.4s (-0.1s)
|
||||
- Cost: \$0.015 -> \$0.014 (-\$0.001)
|
||||
|
||||
Related: fine-tune iteration 1
|
||||
See: evaluation_results/iteration_1/"
|
||||
|
||||
# Iteration 2 commit
|
||||
git commit -m "feat(nodes): optimize generate_response for latency and cost
|
||||
|
||||
- Add conciseness guidelines (2-3 sentences)
|
||||
- Set max_tokens limit to 500
|
||||
- Adjust temperature (0.7 -> 0.5) for consistency
|
||||
- Define response style and tone
|
||||
|
||||
Results:
|
||||
- Accuracy: 86.0% -> 88.0% (+2.0%)
|
||||
- Latency: 2.4s -> 2.0s (-0.4s, -17%)
|
||||
- Cost: \$0.014 -> \$0.011 (-\$0.003, -21%)
|
||||
|
||||
Related: fine-tune iteration 2
|
||||
See: evaluation_results/iteration_2/"
|
||||
|
||||
# Final commit
|
||||
git commit -m "feat(nodes): finalize fine-tuning with additional improvements
|
||||
|
||||
Complete fine-tuning process with 3 iterations:
|
||||
- analyze_intent: 10 few-shot examples, confidence threshold
|
||||
- generate_response: conciseness and style optimization
|
||||
|
||||
Final Results:
|
||||
- Accuracy: 75.0% -> 92.0% (+17.0%, goal 90% ✅)
|
||||
- Latency: 2.5s -> 1.9s (-0.6s, -24%, goal 2.0s ✅)
|
||||
- Cost: \$0.015 -> \$0.011 (-\$0.004, -27%, goal \$0.010 ⚠️)
|
||||
|
||||
Related: fine-tune completion
|
||||
See: evaluation_results/final_report.md"
|
||||
|
||||
# Evaluation results commit
|
||||
git commit -m "docs: add fine-tuning evaluation results and final report
|
||||
|
||||
- Baseline evaluation (5 iterations)
|
||||
- Iteration 1-3 results
|
||||
- Final comprehensive report
|
||||
- Statistical analysis and recommendations"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 📚 Related Documentation
|
||||
|
||||
- [SKILL.md](SKILL.md) - Skill overview
|
||||
- [workflow.md](workflow.md) - Workflow details
|
||||
- [evaluation.md](evaluation.md) - Evaluation methods
|
||||
- [prompt_optimization.md](prompt_optimization.md) - Optimization techniques
|
||||
65
skills/fine-tune/prompt_optimization.md
Normal file
@@ -0,0 +1,65 @@
|
||||
# Prompt Optimization Guide
|
||||
|
||||
A comprehensive guide for effectively optimizing prompts in LangGraph nodes.
|
||||
|
||||
## 📚 Table of Contents
|
||||
|
||||
This guide is divided into the following sections:
|
||||
|
||||
### 1. [Prompt Optimization Principles](./prompt_principles.md)
|
||||
Learn the fundamental principles for designing prompts.
|
||||
|
||||
### 2. [Prompt Optimization Techniques](./prompt_techniques.md)
|
||||
Provides a collection of practical optimization techniques (10 techniques).
|
||||
|
||||
### 3. [Optimization Priorities](./prompt_priorities.md)
|
||||
Explains how to apply optimization techniques in order of improvement impact.
|
||||
|
||||
## 🎯 Quick Start
|
||||
|
||||
### First-Time Optimization
|
||||
|
||||
1. **[Understand the Principles](./prompt_principles.md)** - Learn the basics of clarity, structure, and specificity
|
||||
2. **[Start with High-Impact Techniques](./prompt_priorities.md)** - Few-Shot Examples, output format structuring, parameter tuning
|
||||
3. **[Review Technique Details](./prompt_techniques.md)** - Implementation methods and effects of each technique
|
||||
|
||||
### Improving Existing Prompts
|
||||
|
||||
1. **Measure Baseline** - Record current performance
|
||||
2. **[Refer to Priority Guide](./prompt_priorities.md)** - Select the most impactful improvements
|
||||
3. **[Apply Techniques](./prompt_techniques.md)** - Implement one at a time and measure effects
|
||||
4. **Iterate** - Repeat the cycle of measure, implement, validate
|
||||
|
||||
## 📖 Related Documentation
|
||||
|
||||
- **[Prompt Optimization Examples](./examples.md)** - Before/After comparison examples and code templates
|
||||
- **[SKILL.md](./SKILL.md)** - Overview and usage of the Fine-tune skill
|
||||
- **[evaluation.md](./evaluation.md)** - Evaluation criteria design and measurement methods
|
||||
|
||||
## 💡 Best Practices
|
||||
|
||||
For effective prompt optimization:
|
||||
|
||||
1. ✅ **Measurement-Driven**: Evaluate all changes quantitatively
|
||||
2. ✅ **Incremental Improvement**: One change at a time, measure, validate
|
||||
3. ✅ **Cost-Conscious**: Optimize with model selection, caching, max_tokens
|
||||
4. ✅ **Task-Appropriate**: Select techniques based on task complexity
|
||||
5. ✅ **Iterative Approach**: Maintain continuous improvement cycles
|
||||
|
||||
## 🔍 Troubleshooting
|
||||
|
||||
### Low Prompt Quality
|
||||
→ Review [Prompt Optimization Principles](./prompt_principles.md)
|
||||
|
||||
### Insufficient Accuracy
|
||||
→ Apply [Few-Shot Examples](./prompt_techniques.md#technique-1-few-shot-examples) or [Chain-of-Thought](./prompt_techniques.md#technique-2-chain-of-thought)
|
||||
|
||||
### High Latency
|
||||
→ Implement [Temperature/Max Tokens Adjustment](./prompt_techniques.md#technique-4-temperature-and-max-tokens-adjustment) or [Output Format Structuring](./prompt_techniques.md#technique-3-output-format-structuring)
|
||||
|
||||
### High Cost
|
||||
→ Introduce [Model Selection Optimization](./prompt_techniques.md#technique-10-model-selection) or [Prompt Caching](./prompt_techniques.md#technique-6-prompt-caching)
|
||||
|
||||
---
|
||||
|
||||
**💡 Tip**: For before/after prompt comparison examples and code templates, refer to [examples.md](examples.md#phase-3-iterative-improvement-examples).
|
||||
84
skills/fine-tune/prompt_principles.md
Normal file
@@ -0,0 +1,84 @@
|
||||
# Prompt Optimization Principles
|
||||
|
||||
Fundamental principles for designing prompts in LangGraph nodes.
|
||||
|
||||
## 🎯 Prompt Optimization Principles
|
||||
|
||||
### 1. Clarity
|
||||
|
||||
**Bad Example**:
|
||||
```python
|
||||
SystemMessage(content="Analyze the input.")
|
||||
```
|
||||
|
||||
**Good Example**:
|
||||
```python
|
||||
SystemMessage(content="""You are an intent classifier for customer support.
|
||||
|
||||
Task: Classify user input into one of these categories:
|
||||
- product_inquiry: Questions about products or services
|
||||
- technical_support: Technical issues or troubleshooting
|
||||
- billing: Payment or billing questions
|
||||
- general: General questions or greetings
|
||||
|
||||
Output only the category name.""")
|
||||
```
|
||||
|
||||
**Improvements**:
|
||||
- ✅ Clearly defined role
|
||||
- ✅ Specific task description
|
||||
- ✅ Enumerated categories
|
||||
- ✅ Specified output format
|
||||
|
||||
### 2. Structure
|
||||
|
||||
**Bad Example**:
|
||||
```python
|
||||
prompt = f"Answer this: {question}"
|
||||
```
|
||||
|
||||
**Good Example**:
|
||||
```python
|
||||
prompt = f"""Context:
|
||||
{context}
|
||||
|
||||
Question:
|
||||
{question}
|
||||
|
||||
Instructions:
|
||||
1. Base your answer on the provided context
|
||||
2. Be concise (2-3 sentences maximum)
|
||||
3. If the answer is not in the context, say "I don't have enough information"
|
||||
|
||||
Answer:"""
|
||||
```
|
||||
|
||||
**Improvements**:
|
||||
- ✅ Sectioned (Context, Question, Instructions, Answer)
|
||||
- ✅ Sequential instructions
|
||||
- ✅ Clear separators
|
||||
|
||||
### 3. Specificity
|
||||
|
||||
**Bad Example**:
|
||||
```python
|
||||
"Be helpful and friendly."
|
||||
```
|
||||
|
||||
**Good Example**:
|
||||
```python
|
||||
"""Tone and Style:
|
||||
- Use a warm, professional tone
|
||||
- Address the customer by name if available
|
||||
- Acknowledge their concern explicitly
|
||||
- Provide actionable next steps
|
||||
|
||||
Example:
|
||||
"Hi Sarah, I understand your concern about the billing charge. Let me review your account and get back to you within 24 hours with a detailed explanation."
|
||||
"""
|
||||
```
|
||||
|
||||
**Improvements**:
|
||||
- ✅ Specific guidelines
|
||||
- ✅ Concrete examples provided
|
||||
- ✅ Measurable criteria
|
||||
87
skills/fine-tune/prompt_priorities.md
Normal file
@@ -0,0 +1,87 @@
|
||||
# Prompt Optimization Priorities
|
||||
|
||||
A priority guide for applying optimization techniques in order of improvement impact.
|
||||
|
||||
## 📊 Optimization Priorities
|
||||
|
||||
In order of improvement impact:
|
||||
|
||||
### 1. Adding Few-Shot Examples (High Impact, Low Cost)
|
||||
- **Improvement**: Accuracy +10-20%
|
||||
- **Cost**: +5-10% (increased input tokens)
|
||||
- **Implementation Time**: 30 minutes - 1 hour
|
||||
- **Recommended**: ⭐⭐⭐⭐⭐
|
||||
|
||||
### 2. Output Format Structuring (High Impact, Low Cost)
|
||||
- **Improvement**: Latency -10-20%, Parsing errors -90%
|
||||
- **Cost**: ±0%
|
||||
- **Implementation Time**: 15-30 minutes
|
||||
- **Recommended**: ⭐⭐⭐⭐⭐
|
||||
|
||||
### 3. Temperature/Max Tokens Adjustment (Medium Impact, Zero Cost)
|
||||
- **Improvement**: Latency -10-30%, Cost -20-40%
|
||||
- **Cost**: Reduction
|
||||
- **Implementation Time**: 10-15 minutes
|
||||
- **Recommended**: ⭐⭐⭐⭐⭐
|
||||
|
||||
### 4. Clear Instructions and Guidelines (Medium Impact, Low Cost)
|
||||
- **Improvement**: Accuracy +5-10%, Quality +15-25%
|
||||
- **Cost**: +2-5%
|
||||
- **Implementation Time**: 30 minutes - 1 hour
|
||||
- **Recommended**: ⭐⭐⭐⭐
|
||||
|
||||
### 5. Model Selection Optimization (High Impact, Requires Validation)
|
||||
- **Improvement**: Cost -40-60%
|
||||
- **Risk**: Accuracy -2-5%
|
||||
- **Implementation Time**: 2-4 hours (including validation)
|
||||
- **Recommended**: ⭐⭐⭐⭐
|
||||
|
||||
### 6. Prompt Caching (High Impact, Medium Cost)
|
||||
- **Improvement**: Cost -50-90% (on cache hit)
|
||||
- **Complexity**: Medium (implementation and monitoring)
|
||||
- **Implementation Time**: 1-2 hours
|
||||
- **Recommended**: ⭐⭐⭐⭐
|
||||
|
||||
### 7. Chain-of-Thought (High Impact for Specific Tasks)
|
||||
- **Improvement**: Accuracy +15-30% for complex tasks
|
||||
- **Cost**: +20-40%
|
||||
- **Implementation Time**: 1-2 hours
|
||||
- **Recommended**: ⭐⭐⭐ (complex tasks only)
|
||||
|
||||
### 8. Self-Consistency (Limited Use)
|
||||
- **Improvement**: Accuracy +10-20%
|
||||
- **Cost**: +200-300%
|
||||
- **Implementation Time**: 2-3 hours
|
||||
- **Recommended**: ⭐⭐ (critical decisions only)
|
||||
|
||||
## 🔄 Iterative Optimization Process
|
||||
|
||||
```
|
||||
1. Measure baseline
|
||||
↓
|
||||
2. Select the most impactful improvement
|
||||
↓
|
||||
3. Implement (one change only)
|
||||
↓
|
||||
4. Evaluate (with same test cases)
|
||||
↓
|
||||
5. Is improvement confirmed?
|
||||
├─ Yes → Keep change, go to step 2
|
||||
└─ No → Rollback change, try different improvement
|
||||
↓
|
||||
6. Goal achieved?
|
||||
├─ Yes → Complete
|
||||
└─ No → Go to step 2
|
||||
```
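
The loop above can be driven by a small harness. A minimal sketch, assuming the project supplies its own `measure`, `apply_change`, and `rollback` callables (placeholders, not part of this skill) and using accuracy as the representative goal check:

```python
from typing import Callable, Dict, List

def optimize(
    improvements: List[str],                        # candidate changes, ordered by expected impact
    goals: Dict[str, float],                        # e.g. {"accuracy": 90.0}
    measure: Callable[[], Dict[str, float]],        # runs the test cases, returns metrics
    apply_change: Callable[[str], None],            # edits the prompt / parameters
    rollback: Callable[[str], None],                # reverts the edit
) -> Dict[str, float]:
    """Drive the measure → implement → evaluate → keep-or-rollback loop shown above."""
    best = measure()                                # 1. Measure baseline
    for change in improvements:                     # 2. Select the most impactful improvement
        apply_change(change)                        # 3. Implement (one change only)
        result = measure()                          # 4. Evaluate with the same test cases
        if result["accuracy"] >= best["accuracy"]:  # 5. Improvement confirmed?
            best = result                           #    Yes → keep the change
        else:
            rollback(change)                        #    No → roll back, try the next idea
        if best["accuracy"] >= goals["accuracy"]:   # 6. Goal achieved?
            break
    return best
```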
|
||||
|
||||
## Summary
|
||||
|
||||
For effective prompt optimization:
|
||||
|
||||
1. ✅ **Clarity**: Clear role, task, and output format
|
||||
2. ✅ **Few-Shot Examples**: 3-7 high-quality examples
|
||||
3. ✅ **Structuring**: Structured output like JSON
|
||||
4. ✅ **Parameter Tuning**: Task-appropriate temperature/max_tokens
|
||||
5. ✅ **Incremental Improvement**: One change at a time, measure, validate
|
||||
6. ✅ **Cost-Conscious**: Model selection, caching, max_tokens
|
||||
7. ✅ **Measurement-Driven**: Evaluate all changes quantitatively
|
||||
425
skills/fine-tune/prompt_techniques.md
Normal file
@@ -0,0 +1,425 @@
|
||||
# Prompt Optimization Techniques
|
||||
|
||||
A collection of practical techniques for effectively optimizing prompts in LangGraph nodes.
|
||||
|
||||
**💡 Tip**: For before/after prompt comparison examples and code templates, refer to [examples.md](examples.md#phase-3-iterative-improvement-examples).
|
||||
|
||||
## 🔧 Practical Optimization Techniques
|
||||
|
||||
### Technique 1: Few-Shot Examples
|
||||
|
||||
**Effect**: Accuracy +10-20%
|
||||
|
||||
**Before (Zero-shot)**:
|
||||
```python
|
||||
system_prompt = """Classify user input into: product_inquiry, technical_support, billing, or general."""
|
||||
|
||||
# Accuracy: ~70%
|
||||
```
|
||||
|
||||
**After (Few-shot)**:
|
||||
```python
|
||||
system_prompt = """Classify user input into: product_inquiry, technical_support, billing, or general.
|
||||
|
||||
Examples:
|
||||
|
||||
Input: "How much does the premium plan cost?"
|
||||
Output: product_inquiry
|
||||
|
||||
Input: "I can't log into my account"
|
||||
Output: technical_support
|
||||
|
||||
Input: "Why was I charged twice this month?"
|
||||
Output: billing
|
||||
|
||||
Input: "Hello, how are you today?"
|
||||
Output: general
|
||||
|
||||
Input: "What features are included in the basic plan?"
|
||||
Output: product_inquiry"""
|
||||
|
||||
# Accuracy: ~85-90%
|
||||
```
|
||||
|
||||
**Best Practices**:
|
||||
- **Number of Examples**: 3-7 (diminishing returns beyond this)
|
||||
- **Diversity**: At least one from each category, including edge cases
|
||||
- **Quality**: Select clear and unambiguous examples
|
||||
- **Format**: Consistent Input/Output format
|
||||
|
||||
### Technique 2: Chain-of-Thought
|
||||
|
||||
**Effect**: Accuracy +15-30% for complex reasoning tasks
|
||||
|
||||
**Before (Direct answer)**:
|
||||
```python
|
||||
prompt = f"""Question: {question}
|
||||
|
||||
Answer:"""
|
||||
|
||||
# Many incorrect answers for complex questions
|
||||
```
|
||||
|
||||
**After (Chain-of-Thought)**:
|
||||
```python
|
||||
prompt = f"""Question: {question}
|
||||
|
||||
Think through this step by step:
|
||||
|
||||
1. First, identify the key information needed
|
||||
2. Then, analyze the context for relevant details
|
||||
3. Finally, formulate a clear answer
|
||||
|
||||
Reasoning:"""
|
||||
|
||||
# Logical answers even for complex questions
|
||||
```
|
||||
|
||||
**Application Scenarios**:
|
||||
- ✅ Tasks requiring multi-step reasoning
|
||||
- ✅ Complex decision making
|
||||
- ✅ Resolving contradictions
|
||||
- ❌ Simple classification tasks (overhead)
|
||||
|
||||
### Technique 3: Output Format Structuring
|
||||
|
||||
**Effect**: Latency -10-20%, Parsing errors -90%
|
||||
|
||||
**Before (Free text)**:
|
||||
```python
|
||||
prompt = "Classify the intent and explain why."
|
||||
|
||||
# Output: "This looks like a technical support question because the user is having trouble logging in..."
|
||||
# Problems: Hard to parse, verbose, inconsistent
|
||||
```
|
||||
|
||||
**After (JSON structured)**:
|
||||
```python
|
||||
prompt = """Classify the intent.
|
||||
|
||||
Output ONLY a valid JSON object:
|
||||
{
|
||||
"intent": "<category>",
|
||||
"confidence": <0.0-1.0>,
|
||||
"reasoning": "<brief explanation in one sentence>"
|
||||
}
|
||||
|
||||
Example output:
|
||||
{"intent": "technical_support", "confidence": 0.95, "reasoning": "User reports authentication issue"}"""
|
||||
|
||||
# Output: {"intent": "technical_support", "confidence": 0.95, "reasoning": "User reports authentication issue"}
|
||||
# Benefits: Easy to parse, concise, consistent
|
||||
```
|
||||
|
||||
**JSON Parsing Error Handling**:
|
||||
```python
|
||||
import json
|
||||
import re
|
||||
|
||||
def parse_llm_json_output(output: str) -> dict:
|
||||
"""Robustly parse LLM JSON output"""
|
||||
try:
|
||||
# Parse as JSON directly
|
||||
return json.loads(output)
|
||||
except json.JSONDecodeError:
|
||||
# Extract the first JSON object (e.g. from a markdown code block); this simple regex only handles flat, non-nested objects
|
||||
json_match = re.search(r'\{[^}]+\}', output)
|
||||
if json_match:
|
||||
try:
|
||||
return json.loads(json_match.group())
|
||||
except json.JSONDecodeError:
|
||||
pass
|
||||
|
||||
# Fallback
|
||||
return {
|
||||
"intent": "general",
|
||||
"confidence": 0.5,
|
||||
"reasoning": "Failed to parse LLM output"
|
||||
}
|
||||
```
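
Inside a node, the parser would typically wrap the raw model output. A short usage sketch (`llm`, `prompt`, `user_input`, and `state` are assumed from the surrounding examples):

```python
# Hypothetical usage inside a node function
response = llm.invoke([
    SystemMessage(content=prompt),
    HumanMessage(content=user_input),
])
intent_data = parse_llm_json_output(response.content)

state["intent"] = intent_data["intent"]
state["confidence"] = intent_data["confidence"]
```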
|
||||
|
||||
### Technique 4: Temperature and Max Tokens Adjustment
|
||||
|
||||
**Temperature Effects**:
|
||||
|
||||
| Task Type | Recommended Temperature | Reason |
|
||||
|-----------|------------------------|--------|
|
||||
| Classification/Extraction | 0.0 - 0.3 | Deterministic output desired |
|
||||
| Summarization/Transformation | 0.3 - 0.5 | Some flexibility needed |
|
||||
| Creative/Generation | 0.7 - 1.0 | Diversity and creativity important |
|
||||
|
||||
**Before (Default settings)**:
|
||||
```python
|
||||
llm = ChatAnthropic(
|
||||
model="claude-3-5-sonnet-20241022",
|
||||
temperature=1.0 # Default, used for all tasks
|
||||
)
|
||||
# Unstable results for classification tasks
|
||||
```
|
||||
|
||||
**After (Optimized per task)**:
|
||||
```python
|
||||
# Intent classification: Low temperature
|
||||
intent_llm = ChatAnthropic(
|
||||
model="claude-3-5-sonnet-20241022",
|
||||
temperature=0.3 # Emphasize consistency
|
||||
)
|
||||
|
||||
# Response generation: Medium temperature
|
||||
response_llm = ChatAnthropic(
|
||||
model="claude-3-5-sonnet-20241022",
|
||||
temperature=0.5, # Balance flexibility
|
||||
max_tokens=500 # Enforce conciseness
|
||||
)
|
||||
```
|
||||
|
||||
**Max Tokens Effects**:
|
||||
|
||||
```python
|
||||
# Before: No limit
|
||||
llm = ChatAnthropic(model="claude-3-5-sonnet-20241022")
|
||||
# Average output: 800 tokens, Cost: $0.012/req, Latency: 3.2s
|
||||
|
||||
# After: Appropriate limit
|
||||
llm = ChatAnthropic(
|
||||
model="claude-3-5-sonnet-20241022",
|
||||
max_tokens=500 # Necessary and sufficient length
|
||||
)
|
||||
# Average output: 450 tokens, Cost: $0.007/req (-42%), Latency: 1.8s (-44%)
|
||||
```
|
||||
|
||||
### Technique 5: System Message vs Human Message Usage
|
||||
|
||||
**System Message**:
|
||||
- **Use**: Role, guidelines, constraints
|
||||
- **Characteristics**: Context applied to entire task
|
||||
- **Caching**: Effective (doesn't change frequently)
|
||||
|
||||
**Human Message**:
|
||||
- **Use**: Specific input, questions
|
||||
- **Characteristics**: Changes per request
|
||||
- **Caching**: Less effective
|
||||
|
||||
**Good Structure**:
|
||||
```python
|
||||
messages = [
|
||||
SystemMessage(content="""You are a customer support assistant.
|
||||
|
||||
Guidelines:
|
||||
- Be concise: 2-3 sentences maximum
|
||||
- Be empathetic: Acknowledge customer concerns
|
||||
- Be actionable: Provide clear next steps
|
||||
|
||||
Response format:
|
||||
1. Acknowledgment
|
||||
2. Answer or solution
|
||||
3. Next steps (if applicable)"""),
|
||||
|
||||
HumanMessage(content=f"""Customer question: {user_input}
|
||||
|
||||
Context: {context}
|
||||
|
||||
Generate a helpful response:""")
|
||||
]
|
||||
```
|
||||
|
||||
### Technique 6: Prompt Caching
|
||||
|
||||
**Effect**: Cost -50-90% (on cache hit)
|
||||
|
||||
Leverage Anthropic Claude's prompt caching:
|
||||
|
||||
```python
|
||||
from anthropic import Anthropic
|
||||
|
||||
client = Anthropic()
|
||||
|
||||
# Large cacheable system prompt
|
||||
CACHED_SYSTEM_PROMPT = """You are an expert customer support agent...
|
||||
|
||||
[Long guidelines, examples, and context - 1000+ tokens]
|
||||
|
||||
Examples:
|
||||
[50 few-shot examples]
|
||||
"""
|
||||
|
||||
# Use cache
|
||||
message = client.messages.create(
|
||||
model="claude-3-5-sonnet-20241022",
|
||||
max_tokens=500,
|
||||
system=[
|
||||
{
|
||||
"type": "text",
|
||||
"text": CACHED_SYSTEM_PROMPT,
|
||||
"cache_control": {"type": "ephemeral"} # Enable caching
|
||||
}
|
||||
],
|
||||
messages=[
|
||||
{"role": "user", "content": user_input}
|
||||
]
|
||||
)
|
||||
|
||||
# First time: Full cost
|
||||
# 2nd+ time (within 5 minutes): Input tokens -90% discount
|
||||
```
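
To confirm the cache is actually being hit, inspect the usage block of the response (field names follow Anthropic's prompt-caching documentation; treat their availability as API-version dependent):

```python
# Inspect cache behaviour on the `message` returned above
usage = message.usage
print("regular input tokens:", usage.input_tokens)
print("cache write tokens:", getattr(usage, "cache_creation_input_tokens", 0))
print("cache read tokens:", getattr(usage, "cache_read_input_tokens", 0))
```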
|
||||
|
||||
**Caching Strategy**:
|
||||
- ✅ Large system prompts (>1024 tokens)
|
||||
- ✅ Sets of few-shot examples
|
||||
- ✅ Long context (RAG documents)
|
||||
- ❌ Frequently changing content
|
||||
- ❌ Small prompts (<1024 tokens)
|
||||
|
||||
### Technique 7: Progressive Refinement
|
||||
|
||||
Break complex tasks into multiple steps:
|
||||
|
||||
**Before (1 step)**:
|
||||
```python
|
||||
# Execute everything in one node
|
||||
prompt = f"""Analyze user input, retrieve relevant info, and generate response.
|
||||
|
||||
Input: {user_input}"""
|
||||
|
||||
# Problems: Too complex, low quality, hard to debug
|
||||
```
|
||||
|
||||
**After (Multiple steps)**:
|
||||
```python
|
||||
# Step 1: Intent classification
|
||||
intent = classify_intent(user_input)
|
||||
|
||||
# Step 2: Information retrieval (based on intent)
|
||||
context = retrieve_context(intent, user_input)
|
||||
|
||||
# Step 3: Response generation (using intent and context)
|
||||
response = generate_response(intent, context, user_input)
|
||||
|
||||
# Benefits: Each step optimizable, easy to debug, improved quality
|
||||
```
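
In LangGraph terms, each step becomes its own node, so each prompt can be measured and optimized in isolation. A minimal wiring sketch, assuming the three steps are wrapped as node functions that read from and write to a shared `GraphState` TypedDict (rather than taking positional arguments as in the simplified snippet above):

```python
from langgraph.graph import StateGraph, START, END

# classify_intent, retrieve_context, and generate_response are assumed to be
# node functions of the form fn(state: GraphState) -> GraphState
builder = StateGraph(GraphState)
builder.add_node("classify_intent", classify_intent)
builder.add_node("retrieve_context", retrieve_context)
builder.add_node("generate_response", generate_response)

builder.add_edge(START, "classify_intent")
builder.add_edge("classify_intent", "retrieve_context")
builder.add_edge("retrieve_context", "generate_response")
builder.add_edge("generate_response", END)

graph = builder.compile()
```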
|
||||
|
||||
### Technique 8: Negative Instructions
|
||||
|
||||
**Effect**: Edge case errors -30-50%
|
||||
|
||||
```python
|
||||
prompt = """Generate a customer support response.
|
||||
|
||||
DO:
|
||||
- Be concise (2-3 sentences)
|
||||
- Acknowledge the customer's concern
|
||||
- Provide actionable next steps
|
||||
|
||||
DO NOT:
|
||||
- Apologize excessively (one apology maximum)
|
||||
- Make promises you can't keep (e.g., "immediate resolution")
|
||||
- Use technical jargon without explanation
|
||||
- Provide information not in the context
|
||||
- Generate placeholder text like "XXX" or "[insert here]"
|
||||
|
||||
Customer question: {question}
|
||||
Context: {context}
|
||||
|
||||
Response:"""
|
||||
```
|
||||
|
||||
### Technique 9: Self-Consistency
|
||||
|
||||
**Effect**: Accuracy +10-20% for complex reasoning, Cost +200-300%
|
||||
|
||||
Generate multiple reasoning paths and use majority voting:
|
||||
|
||||
```python
|
||||
def self_consistency_reasoning(question: str, num_samples: int = 5) -> str:
|
||||
"""Generate multiple reasoning paths and select the most consistent answer"""
|
||||
|
||||
llm = ChatAnthropic(
|
||||
model="claude-3-5-sonnet-20241022",
|
||||
temperature=0.7 # Higher temperature for diversity
|
||||
)
|
||||
|
||||
prompt = f"""Question: {question}
|
||||
|
||||
Think through this step by step and provide your reasoning:
|
||||
|
||||
Reasoning:"""
|
||||
|
||||
# Generate multiple reasoning paths
|
||||
responses = []
|
||||
for _ in range(num_samples):
|
||||
response = llm.invoke([HumanMessage(content=prompt)])
|
||||
responses.append(response.content)
|
||||
|
||||
# Extract the most consistent answer (simplified)
|
||||
# In practice, extract final answer from each response and use majority voting
|
||||
from collections import Counter
|
||||
final_answers = [extract_final_answer(r) for r in responses]
|
||||
most_common = Counter(final_answers).most_common(1)[0][0]
|
||||
|
||||
return most_common
|
||||
|
||||
# Trade-offs:
|
||||
# - Accuracy: +10-20%
|
||||
# - Cost: +200-300% (5x API calls)
|
||||
# - Latency: +200-300% (if not parallelized)
|
||||
# Use: Critical decisions only
|
||||
```
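
If the latency penalty matters, the samples can be requested concurrently instead of sequentially. A sketch using LangChain's `Runnable.batch` (assuming `llm`, `prompt`, and `extract_final_answer` from the snippet above; cost is unchanged, but wall-clock time approaches that of a single call):

```python
from collections import Counter

# Fan the five sampling calls out in parallel via the Runnable batch API
batch_inputs = [[HumanMessage(content=prompt)] for _ in range(5)]
responses = llm.batch(batch_inputs)

final_answers = [extract_final_answer(r.content) for r in responses]
most_common = Counter(final_answers).most_common(1)[0][0]
```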
|
||||
|
||||
### Technique 10: Model Selection
|
||||
|
||||
**Model Selection Based on Task Complexity**:
|
||||
|
||||
| Task Type | Recommended Model | Reason |
|
||||
|-----------|------------------|--------|
|
||||
| Simple classification | Claude 3.5 Haiku | Fast, low cost, sufficient accuracy |
|
||||
| Complex reasoning | Claude 3.5 Sonnet | Balanced performance |
|
||||
| Highly complex tasks | Claude 3 Opus | Best performance (high cost) |
|
||||
|
||||
```python
|
||||
# Select optimal model per task
|
||||
class LLMSelector:
|
||||
def __init__(self):
|
||||
self.haiku = ChatAnthropic(model="claude-3-5-haiku-20241022")
|
||||
self.sonnet = ChatAnthropic(model="claude-3-5-sonnet-20241022")
|
||||
self.opus = ChatAnthropic(model="claude-3-opus-20240229")
|
||||
|
||||
def get_llm(self, task_complexity: str):
|
||||
if task_complexity == "simple":
|
||||
return self.haiku # ~$0.001/req
|
||||
elif task_complexity == "complex":
|
||||
return self.sonnet # ~$0.005/req
|
||||
else: # very_complex
|
||||
return self.opus # ~$0.015/req
|
||||
|
||||
# Usage example
|
||||
selector = LLMSelector()
|
||||
|
||||
# Simple intent classification → Haiku
|
||||
intent_llm = selector.get_llm("simple")
|
||||
|
||||
# Complex response generation → Sonnet
|
||||
response_llm = selector.get_llm("complex")
|
||||
```
|
||||
|
||||
**Hybrid Approach**:
|
||||
```python
|
||||
def hybrid_classification(user_input: str) -> dict:
|
||||
"""Try Haiku first, use Sonnet if confidence is low"""
|
||||
|
||||
# Step 1: Classify with Haiku
|
||||
haiku_result = classify_with_haiku(user_input)
|
||||
|
||||
if haiku_result["confidence"] >= 0.8:
|
||||
# High confidence → Use Haiku result
|
||||
return haiku_result
|
||||
else:
|
||||
# Low confidence → Re-classify with Sonnet
|
||||
sonnet_result = classify_with_sonnet(user_input)
|
||||
return sonnet_result
|
||||
|
||||
# Effects:
|
||||
# - 80% of cases use Haiku (low cost)
|
||||
# - 20% of cases use Sonnet (high accuracy)
|
||||
# - Average cost: -60%
|
||||
# - Average accuracy: -2% (acceptable range)
|
||||
```
|
||||
127
skills/fine-tune/workflow.md
Normal file
@@ -0,0 +1,127 @@
|
||||
# Fine-Tuning Workflow Details
|
||||
|
||||
Detailed workflow and practical guidelines for executing fine-tuning of LangGraph applications.
|
||||
|
||||
**💡 Tip**: For concrete code examples and templates you can copy and paste, refer to [examples.md](examples.md).
|
||||
|
||||
## 📋 Workflow Overview
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ Phase 1: Preparation and Analysis │
|
||||
├─────────────────────────────────────────────────────────────┤
|
||||
│ 1. Read fine-tune.md → Understand goals and criteria │
|
||||
│ 2. Identify optimization targets with Serena → List LLM nodes│
|
||||
│ 3. Create optimization list → Assess improvement potential │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
↓
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ Phase 2: Baseline Evaluation │
|
||||
├─────────────────────────────────────────────────────────────┤
|
||||
│ 4. Prepare evaluation environment → Test cases, scripts │
|
||||
│ 5. Measure baseline → Run 3-5 times, collect statistics │
|
||||
│ 6. Analyze results → Identify issues, assess improvement │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
↓
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ Phase 3: Iterative Improvement (Iteration Loop) │
|
||||
├─────────────────────────────────────────────────────────────┤
|
||||
│ 7. Prioritize → Select most effective improvement area │
|
||||
│ 8. Implement improvements → Optimize prompts, adjust params │
|
||||
│ 9. Post-improvement evaluation → Re-evaluate same conditions│
|
||||
│ 10. Compare results → Measure improvement, decide next step │
|
||||
│ 11. Continue decision → Goal met? Yes → Phase 4 / No → Next │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
↓
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ Phase 4: Completion and Documentation │
|
||||
├─────────────────────────────────────────────────────────────┤
|
||||
│ 12. Create final evaluation report → Summary of improvements│
|
||||
│ 13. Commit code → Version control and documentation update │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
## 📚 Phase-by-Phase Detailed Guide
|
||||
|
||||
### [Phase 1: Preparation and Analysis](./workflow_phase1.md)
|
||||
Clarify optimization direction and identify targets for improvement:
|
||||
- **Step 1**: Read and understand fine-tune.md
|
||||
- **Step 2**: Identify optimization targets with Serena MCP
|
||||
- **Step 3**: Create optimization target list
|
||||
|
||||
**Time Required**: 30 minutes - 1 hour
|
||||
|
||||
### [Phase 2: Baseline Evaluation](./workflow_phase2.md)
|
||||
Quantitatively measure current performance:
|
||||
- **Step 4**: Prepare evaluation environment
|
||||
- **Step 5**: Measure baseline (3-5 runs)
|
||||
- **Step 6**: Analyze baseline results
|
||||
|
||||
**Time Required**: 1-2 hours
|
||||
|
||||
### [Phase 3: Iterative Improvement](./workflow_phase3.md)
|
||||
Data-driven, incremental prompt optimization:
|
||||
- **Step 7**: Prioritization
|
||||
- **Step 8**: Implement improvements
|
||||
- **Step 9**: Post-improvement evaluation
|
||||
- **Step 10**: Compare results
|
||||
- **Step 11**: Continue decision
|
||||
|
||||
**Time Required**: 1-2 hours per iteration × number of iterations (typically 3-5)
|
||||
|
||||
### [Phase 4: Completion and Documentation](./workflow_phase4.md)
|
||||
Record final results and commit code:
|
||||
- **Step 12**: Create final evaluation report
|
||||
- **Step 13**: Commit code and update documentation
|
||||
|
||||
**Time Required**: 30 minutes - 1 hour
|
||||
|
||||
## 🎯 Workflow Execution Points
|
||||
|
||||
### For First-Time Fine-Tuning
|
||||
|
||||
1. **Start from Phase 1 in order**: Execute all phases without skipping
|
||||
2. **Create documentation**: Record results from each phase
|
||||
3. **Start small**: Experiment with a small number of test cases initially
|
||||
|
||||
### Continuous Fine-Tuning
|
||||
|
||||
1. **Start from Phase 2**: Measure new baseline
|
||||
2. **Repeat Phase 3**: Continuous improvement cycle
|
||||
3. **Consider automation**: Build evaluation pipeline
|
||||
|
||||
## 📊 Principles for Success
|
||||
|
||||
1. **Data-Driven**: Base all decisions on measurement results
|
||||
2. **Incremental Improvement**: One change at a time, measure, verify
|
||||
3. **Documentation**: Record results and learnings from each phase
|
||||
4. **Statistical Verification**: Run multiple times to confirm significance (see the sketch below)
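
A minimal sketch of such a check, comparing per-run accuracy before and after a change (this assumes the per-run results produced in Phase 2/3 and that SciPy is available; any equivalent test works):

```python
# Minimal significance check: did the change actually help, or is it noise?
# baseline_acc / tuned_acc are illustrative per-run accuracy values (e.g. 5 runs each)
from scipy import stats

baseline_acc = [72.0, 75.0, 76.0, 74.0, 78.0]
tuned_acc = [85.0, 86.0, 88.0, 84.0, 87.0]

t_stat, p_value = stats.ttest_ind(tuned_acc, baseline_acc, equal_var=False)  # Welch's t-test
if p_value < 0.05:
    print(f"Improvement is statistically significant (p={p_value:.4f})")
else:
    print(f"Difference may be noise (p={p_value:.4f}) - run more iterations before deciding")
```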
|
||||
|
||||
## 🔗 Related Documents
|
||||
|
||||
- **[Example Collection](./examples.md)** - Code examples and templates for each phase
|
||||
- **[Evaluation Methods](./evaluation.md)** - Details on evaluation metrics and statistical analysis
|
||||
- **[Prompt Optimization](./prompt_optimization.md)** - Detailed optimization techniques
|
||||
- **[SKILL.md](./SKILL.md)** - Overview of the Fine-tune skill
|
||||
|
||||
## 💡 Troubleshooting
|
||||
|
||||
### Cannot find optimization targets in Phase 1
|
||||
→ Check search patterns in [workflow_phase1.md#step-2](./workflow_phase1.md#step-2-identify-optimization-targets-with-serena-mcp)
|
||||
|
||||
### Evaluation script fails in Phase 2
|
||||
→ Check checklist in [workflow_phase2.md#step-4](./workflow_phase2.md#step-4-prepare-evaluation-environment)
|
||||
|
||||
### No improvement effect in Phase 3
|
||||
→ Review priority matrix in [workflow_phase3.md#step-7](./workflow_phase3.md#step-7-prioritization)
|
||||
|
||||
### Report creation takes too long in Phase 4
|
||||
→ Utilize templates in [workflow_phase4.md#step-12](./workflow_phase4.md#step-12-create-final-evaluation-report)
|
||||
|
||||
---
|
||||
|
||||
Following this workflow enables:
|
||||
- ✅ Systematic fine-tuning process execution
|
||||
- ✅ Data-driven decision making
|
||||
- ✅ Continuous improvement and verification
|
||||
- ✅ Complete documentation and traceability
|
||||
229
skills/fine-tune/workflow_phase1.md
Normal file
@@ -0,0 +1,229 @@
|
||||
# Phase 1: Preparation and Analysis
|
||||
|
||||
Preparation phase to clarify optimization direction and identify targets for improvement.
|
||||
|
||||
**Time Required**: 30 minutes - 1 hour
|
||||
|
||||
**📋 Related Documents**: [Overall Workflow](./workflow.md) | [Practical Examples](./examples.md)
|
||||
|
||||
---
|
||||
|
||||
## Phase 1: Preparation and Analysis
|
||||
|
||||
### Step 1: Read and Understand fine-tune.md
|
||||
|
||||
**Purpose**: Clarify optimization direction
|
||||
|
||||
**Execution**:
|
||||
```python
|
||||
# Read .langgraph-master/fine-tune.md
|
||||
file_path = ".langgraph-master/fine-tune.md"
|
||||
with open(file_path, "r") as f:
|
||||
fine_tune_spec = f.read()
|
||||
|
||||
# Extract the following information:
|
||||
# - Optimization goals (accuracy, latency, cost, etc.)
|
||||
# - Evaluation methods (test cases, metrics, calculation methods)
|
||||
# - Passing criteria (target values for each metric)
|
||||
# - Test data location
|
||||
```
|
||||
|
||||
**Typical fine-tune.md structure**:
|
||||
```markdown
|
||||
# Fine-Tuning Goals
|
||||
|
||||
## Optimization Objectives
|
||||
- **Accuracy**: Improve user intent classification accuracy to 90% or higher
|
||||
- **Latency**: Reduce response time to 2.0 seconds or less
|
||||
- **Cost**: Reduce cost per request to $0.010 or less
|
||||
|
||||
## Evaluation Methods
|
||||
- **Test Cases**: tests/evaluation/test_cases.json (20 cases)
|
||||
- **Execution Command**: uv run python -m src.evaluate
|
||||
- **Evaluation Script**: tests/evaluation/evaluator.py
|
||||
|
||||
## Evaluation Metrics
|
||||
|
||||
### Accuracy
|
||||
- Calculation method: (Correct count / Total cases) × 100
|
||||
- Target value: 90% or higher
|
||||
|
||||
### Latency
|
||||
- Calculation method: Average time per execution
|
||||
- Target value: 2.0 seconds or less
|
||||
|
||||
### Cost
|
||||
- Calculation method: Total API cost / Total requests
|
||||
- Target value: $0.010 or less
|
||||
|
||||
## Passing Criteria
|
||||
All evaluation metrics must achieve their target values
|
||||
```
|
||||
|
||||
### Step 2: Identify Optimization Targets with Serena MCP
|
||||
|
||||
**Purpose**: Comprehensively identify nodes calling LLMs
|
||||
|
||||
**Execution Steps**:
|
||||
|
||||
1. **Search for LLM clients**
|
||||
```python
|
||||
# Use Serena MCP: find_symbol
|
||||
# Search for ChatAnthropic, ChatOpenAI, ChatGoogleGenerativeAI, etc.
|
||||
|
||||
patterns = [
|
||||
"ChatAnthropic",
|
||||
"ChatOpenAI",
|
||||
"ChatGoogleGenerativeAI",
|
||||
"ChatVertexAI"
|
||||
]
|
||||
|
||||
llm_usages = []
|
||||
for pattern in patterns:
|
||||
results = serena.find_symbol(
|
||||
name_path=pattern,
|
||||
substring_matching=True,
|
||||
include_body=False
|
||||
)
|
||||
llm_usages.extend(results)
|
||||
```
|
||||
|
||||
2. **Identify prompt construction locations**
|
||||
```python
|
||||
# For each LLM call, investigate how prompts are constructed
|
||||
for usage in llm_usages:
|
||||
# Get surrounding context with find_referencing_symbols
|
||||
context = serena.find_referencing_symbols(
|
||||
name_path=usage.name,
|
||||
relative_path=usage.file_path
|
||||
)
|
||||
|
||||
# Identify prompt templates and message construction logic
|
||||
# - Use of ChatPromptTemplate
|
||||
# - SystemMessage, HumanMessage definitions
|
||||
# - Prompt construction with f-strings or format()
|
||||
```
|
||||
|
||||
3. **Per-node analysis**
|
||||
```python
|
||||
# Analyze LLM usage patterns within each node function
|
||||
# - Prompt clarity
|
||||
# - Presence of few-shot examples
|
||||
# - Structured output format
|
||||
# - Parameter settings (temperature, max_tokens, etc.)
|
||||
```
|
||||
|
||||
**Example Output**:
|
||||
```markdown
|
||||
## LLM Call Location Analysis
|
||||
|
||||
### 1. analyze_intent node
|
||||
- **File**: src/nodes/analyzer.py
|
||||
- **Line numbers**: 25-45
|
||||
- **LLM**: ChatAnthropic(model="claude-3-5-sonnet-20241022")
|
||||
- **Prompt structure**:
|
||||
```python
|
||||
SystemMessage: "You are an intent analyzer..."
|
||||
HumanMessage: f"Analyze: {user_input}"
|
||||
```
|
||||
- **Improvement potential**: ⭐⭐⭐⭐⭐ (High)
|
||||
- Prompt is vague ("Analyze" criteria unclear)
|
||||
- No few-shot examples
|
||||
- Output format is free text
|
||||
- **Estimated improvement effect**: Accuracy +10-15%
|
||||
|
||||
### 2. generate_response node
|
||||
- **File**: src/nodes/generator.py
|
||||
- **Line numbers**: 45-68
|
||||
- **LLM**: ChatAnthropic(model="claude-3-5-sonnet-20241022")
|
||||
- **Prompt structure**:
|
||||
```python
|
||||
ChatPromptTemplate.from_messages([
|
||||
("system", "Generate helpful response..."),
|
||||
("human", "{context}\n\nQuestion: {question}")
|
||||
])
|
||||
```
|
||||
- **Improvement potential**: ⭐⭐⭐ (Medium)
|
||||
- Prompt is structured but lacks conciseness instructions
|
||||
- No max_tokens limit → possibility of verbose output
|
||||
- **Estimated improvement effect**: Latency -0.3-0.5s, Cost -20-30%
|
||||
```
|
||||
|
||||
### Step 3: Create Optimization Target List
|
||||
|
||||
**Purpose**: Organize information to determine improvement priorities
|
||||
|
||||
**List Creation Template**:
|
||||
```markdown
|
||||
# Optimization Target List
|
||||
|
||||
## Node: analyze_intent
|
||||
|
||||
### Basic Information
|
||||
- **File**: src/nodes/analyzer.py:25-45
|
||||
- **Role**: Classify user input intent
|
||||
- **LLM Model**: claude-3-5-sonnet-20241022
|
||||
- **Current Parameters**: temperature=1.0, max_tokens=default
|
||||
|
||||
### Current Prompt
|
||||
```python
|
||||
SystemMessage(content="You are an intent analyzer. Analyze user input.")
|
||||
HumanMessage(content=f"Analyze: {user_input}")
|
||||
```
|
||||
|
||||
### Issues
|
||||
1. **Vague instructions**: Specific criteria for "Analyze" unclear
|
||||
2. **No few-shot**: No expected output examples
|
||||
3. **Undefined output format**: Unstructured free text
|
||||
4. **High temperature**: 1.0 is too high for classification tasks
|
||||
|
||||
### Improvement Ideas
|
||||
1. Specify concrete classification categories
|
||||
2. Add 3-5 few-shot examples
|
||||
3. Specify JSON output format
|
||||
4. Lower temperature to 0.3-0.5
|
||||
|
||||
### Estimated Improvement Effect
|
||||
- **Accuracy**: +10-15% (Current misclassification 20% → 5-10%)
|
||||
- **Latency**: ±0 (No change)
|
||||
- **Cost**: ±0 (No change)
|
||||
|
||||
### Priority
|
||||
⭐⭐⭐⭐⭐ (Highest) - Direct impact on accuracy improvement
|
||||
|
||||
---
|
||||
|
||||
## Node: generate_response
|
||||
|
||||
### Basic Information
|
||||
- **File**: src/nodes/generator.py:45-68
|
||||
- **Role**: Generate final user-facing response
|
||||
- **LLM Model**: claude-3-5-sonnet-20241022
|
||||
- **Current Parameters**: temperature=0.7, max_tokens=default
|
||||
|
||||
### Current Prompt
|
||||
```python
|
||||
ChatPromptTemplate.from_messages([
|
||||
("system", "Generate helpful response based on context."),
|
||||
("human", "{context}\n\nQuestion: {question}")
|
||||
])
|
||||
```
|
||||
|
||||
### Issues
|
||||
1. **No verbosity control**: No conciseness instructions
|
||||
2. **max_tokens not set**: Possibility of unnecessarily long output
|
||||
3. **Undefined response style**: No tone or style specifications
|
||||
|
||||
### Improvement Ideas
|
||||
1. Add length instructions such as "be concise" or "answer in 2-3 sentences"
|
||||
2. Limit max_tokens to 500
|
||||
3. Clarify the response style ("friendly", "professional", etc.)
|
||||
|
||||
### Estimated Improvement Effect
|
||||
- **Accuracy**: ±0 (No change)
|
||||
- **Latency**: -0.3-0.5s (Due to reduced output tokens)
|
||||
- **Cost**: -20-30% (Due to reduced token count)
|
||||
|
||||
### Priority
|
||||
⭐⭐⭐ (Medium) - Improvement in latency and cost
|
||||
```
|
||||
222
skills/fine-tune/workflow_phase2.md
Normal file
@@ -0,0 +1,222 @@
|
||||
# Phase 2: Baseline Evaluation
|
||||
|
||||
Phase to quantitatively measure current performance.
|
||||
|
||||
**Time Required**: 1-2 hours
|
||||
|
||||
**📋 Related Documents**: [Overall Workflow](./workflow.md) | [Evaluation Methods](./evaluation.md)
|
||||
|
||||
---
|
||||
|
||||
## Phase 2: Baseline Evaluation
|
||||
|
||||
### Step 4: Prepare Evaluation Environment
|
||||
|
||||
**Checklist**:
|
||||
- [ ] Test case files exist
|
||||
- [ ] Evaluation script is executable
|
||||
- [ ] Environment variables (API keys, etc.) are set
|
||||
- [ ] Dependency packages are installed
|
||||
|
||||
**Execution Example**:
|
||||
```bash
|
||||
# Check test cases
|
||||
cat tests/evaluation/test_cases.json
|
||||
|
||||
# Verify evaluation script works
|
||||
uv run python -m src.evaluate --dry-run
|
||||
|
||||
# Verify environment variables
|
||||
echo $ANTHROPIC_API_KEY
|
||||
```
|
||||
|
||||
### Step 5: Measure Baseline
|
||||
|
||||
**Recommended Run Count**: 3-5 times (for statistical reliability)
|
||||
|
||||
**Execution Script Example**:
|
||||
```bash
|
||||
#!/bin/bash
|
||||
# baseline_evaluation.sh
|
||||
|
||||
ITERATIONS=5
|
||||
RESULTS_DIR="evaluation_results/baseline"
|
||||
mkdir -p $RESULTS_DIR
|
||||
|
||||
for i in $(seq 1 $ITERATIONS); do
|
||||
echo "Running baseline evaluation: iteration $i/$ITERATIONS"
|
||||
uv run python -m src.evaluate \
|
||||
--output "$RESULTS_DIR/run_$i.json" \
|
||||
--verbose
|
||||
|
||||
# API rate limit countermeasure
|
||||
sleep 5
|
||||
done
|
||||
|
||||
# Aggregate results
|
||||
uv run python -m src.aggregate_results \
|
||||
--input-dir "$RESULTS_DIR" \
|
||||
--output "$RESULTS_DIR/summary.json"
|
||||
```
|
||||
|
||||
**Evaluation Script Example** (`src/evaluate.py`):
|
||||
```python
|
||||
import json
|
||||
import time
|
||||
from pathlib import Path
|
||||
from typing import Dict, List
|
||||
|
||||
def evaluate_test_cases(test_cases: List[Dict]) -> Dict:
|
||||
"""Evaluate test cases"""
|
||||
results = {
|
||||
"total_cases": len(test_cases),
|
||||
"correct": 0,
|
||||
"total_latency": 0.0,
|
||||
"total_cost": 0.0,
|
||||
"case_results": []
|
||||
}
|
||||
|
||||
for case in test_cases:
|
||||
start_time = time.time()
|
||||
|
||||
# Execute LangGraph application
|
||||
output = run_langgraph_app(case["input"])
|
||||
|
||||
latency = time.time() - start_time
|
||||
|
||||
# Correct answer judgment
|
||||
is_correct = output["answer"] == case["expected_answer"]
|
||||
if is_correct:
|
||||
results["correct"] += 1
|
||||
|
||||
# Cost calculation (from token usage)
|
||||
cost = calculate_cost(output["token_usage"])
|
||||
|
||||
results["total_latency"] += latency
|
||||
results["total_cost"] += cost
|
||||
|
||||
results["case_results"].append({
|
||||
"case_id": case["id"],
|
||||
"correct": is_correct,
|
||||
"latency": latency,
|
||||
"cost": cost
|
||||
})
|
||||
|
||||
# Calculate metrics
|
||||
results["accuracy"] = (results["correct"] / results["total_cases"]) * 100
|
||||
results["avg_latency"] = results["total_latency"] / results["total_cases"]
|
||||
results["avg_cost"] = results["total_cost"] / results["total_cases"]
|
||||
|
||||
return results
|
||||
|
||||
def calculate_cost(token_usage: Dict) -> float:
|
||||
"""Calculate cost from token usage"""
|
||||
# Claude 3.5 Sonnet pricing
|
||||
INPUT_COST_PER_1M = 3.0 # $3.00 per 1M input tokens
|
||||
OUTPUT_COST_PER_1M = 15.0 # $15.00 per 1M output tokens
|
||||
|
||||
input_cost = (token_usage["input_tokens"] / 1_000_000) * INPUT_COST_PER_1M
|
||||
output_cost = (token_usage["output_tokens"] / 1_000_000) * OUTPUT_COST_PER_1M
|
||||
|
||||
return input_cost + output_cost
|
||||
```
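
For the shell script above to work, the module also needs an entry point that loads the test cases, runs the evaluation, and writes the results file. A sketch of how that wiring might look, reusing `json`, `Path`, and `evaluate_test_cases` from the snippet above (the test-case path and flags mirror the commands in Steps 4-5, but the exact CLI is project-specific):

```python
import argparse

def main() -> None:
    parser = argparse.ArgumentParser(description="Run the evaluation test cases")
    parser.add_argument("--output", required=True, help="Path for the results JSON")
    parser.add_argument("--verbose", action="store_true")
    parser.add_argument("--dry-run", action="store_true", help="Validate setup without calling the LLM")
    args = parser.parse_args()

    test_cases = json.loads(Path("tests/evaluation/test_cases.json").read_text())
    if args.dry_run:
        print(f"Loaded {len(test_cases)} test cases - setup looks OK")
        return

    results = evaluate_test_cases(test_cases)
    Path(args.output).write_text(json.dumps(results, indent=2))
    if args.verbose:
        print(f"accuracy={results['accuracy']:.1f}%  "
              f"avg_latency={results['avg_latency']:.2f}s  "
              f"avg_cost=${results['avg_cost']:.4f}")

if __name__ == "__main__":
    main()
```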
|
||||
|
||||
### Step 6: Analyze Baseline Results
|
||||
|
||||
**Aggregation Script Example** (`src/aggregate_results.py`):
|
||||
```python
|
||||
import json
|
||||
import numpy as np
|
||||
from pathlib import Path
|
||||
from typing import List, Dict
|
||||
|
||||
def aggregate_results(results_dir: Path) -> Dict:
|
||||
"""Aggregate multiple execution results"""
|
||||
all_results = []
|
||||
|
||||
for result_file in sorted(results_dir.glob("run_*.json")):
|
||||
with open(result_file) as f:
|
||||
all_results.append(json.load(f))
|
||||
|
||||
# Calculate statistics for each metric
|
||||
accuracies = [r["accuracy"] for r in all_results]
|
||||
latencies = [r["avg_latency"] for r in all_results]
|
||||
costs = [r["avg_cost"] for r in all_results]
|
||||
|
||||
summary = {
|
||||
"iterations": len(all_results),
|
||||
"accuracy": {
|
||||
"mean": np.mean(accuracies),
|
||||
"std": np.std(accuracies),
|
||||
"min": np.min(accuracies),
|
||||
"max": np.max(accuracies)
|
||||
},
|
||||
"latency": {
|
||||
"mean": np.mean(latencies),
|
||||
"std": np.std(latencies),
|
||||
"min": np.min(latencies),
|
||||
"max": np.max(latencies)
|
||||
},
|
||||
"cost": {
|
||||
"mean": np.mean(costs),
|
||||
"std": np.std(costs),
|
||||
"min": np.min(costs),
|
||||
"max": np.max(costs)
|
||||
}
|
||||
}
|
||||
|
||||
return summary
|
||||
```
|
||||
|
||||
**Results Report Example**:
|
||||
```markdown
|
||||
# Baseline Evaluation Results
|
||||
|
||||
Execution Date: 2024-11-24 10:00:00
|
||||
Run Count: 5
|
||||
Test Case Count: 20
|
||||
|
||||
## Evaluation Metrics Summary
|
||||
|
||||
| Metric | Mean | Std Dev | Min | Max | Target | Gap |
|
||||
|--------|------|---------|-----|-----|--------|-----|
|
||||
| Accuracy | 75.0% | 3.2% | 70.0% | 80.0% | 90.0% | **-15.0%** |
|
||||
| Latency | 2.5s | 0.4s | 2.1s | 3.2s | 2.0s | **+0.5s** |
|
||||
| Cost/req | $0.015 | $0.002 | $0.013 | $0.018 | $0.010 | **+$0.005** |
|
||||
|
||||
## Detailed Analysis
|
||||
|
||||
### Accuracy Issues
|
||||
- **Current**: 75.0% (Target: 90.0%)
|
||||
- **Main error patterns**:
|
||||
1. Intent classification errors: 12 cases (60% of errors)
|
||||
2. Context understanding deficiency: 5 cases (25% of errors)
|
||||
3. Handling ambiguous questions: 3 cases (15% of errors)
|
||||
|
||||
### Latency Issues
|
||||
- **Current**: 2.5s (Target: 2.0s)
|
||||
- **Bottlenecks**:
|
||||
1. generate_response node: avg 1.8s (72% of total)
|
||||
2. analyze_intent node: avg 0.5s (20% of total)
|
||||
3. Other: avg 0.2s (8% of total)
|
||||
|
||||
### Cost Issues
|
||||
- **Current**: $0.015/req (Target: $0.010/req)
|
||||
- **Cost breakdown**:
|
||||
1. generate_response: $0.011 (73%)
|
||||
2. analyze_intent: $0.003 (20%)
|
||||
3. Other: $0.001 (7%)
|
||||
- **Main factor**: High output token count (avg 800 tokens)
|
||||
|
||||
## Improvement Directions
|
||||
|
||||
### Priority 1: Improve analyze_intent accuracy
|
||||
- **Impact**: Direct impact on accuracy (accounts for 60% of -15% gap)
|
||||
- **Improvements**: Few-shot examples, clear classification criteria, JSON output format
|
||||
- **Estimated effect**: +10-12% accuracy
|
||||
|
||||
### Priority 2: Optimize generate_response efficiency
|
||||
- **Impact**: Affects both latency and cost
|
||||
- **Improvements**: Conciseness instructions, max_tokens limit, temperature adjustment
|
||||
- **Estimated effect**: -0.4s latency, -$0.004 cost
|
||||
```
|
||||
225
skills/fine-tune/workflow_phase3.md
Normal file
@@ -0,0 +1,225 @@
|
||||
# Phase 3: Iterative Improvement
|
||||
|
||||
Phase for data-driven, incremental prompt optimization.
|
||||
|
||||
**Time Required**: 1-2 hours per iteration × number of iterations (typically 3-5)
|
||||
|
||||
**📋 Related Documents**: [Overall Workflow](./workflow.md) | [Prompt Optimization](./prompt_optimization.md)
|
||||
|
||||
---
|
||||
|
||||
## Phase 3: Iterative Improvement
|
||||
|
||||
### Iteration Cycle
|
||||
|
||||
Execute the following in each iteration:
|
||||
|
||||
1. **Prioritization** (Step 7)
|
||||
2. **Implement Improvements** (Step 8)
|
||||
3. **Post-Improvement Evaluation** (Step 9)
|
||||
4. **Compare Results** (Step 10)
|
||||
5. **Continue Decision** (Step 11)
|
||||
|
||||
### Step 7: Prioritization
|
||||
|
||||
**Decision Criteria**:
|
||||
1. **Impact on goal achievement**
|
||||
2. **Feasibility of improvement**
|
||||
3. **Implementation cost**
|
||||
|
||||
**Priority Matrix**:
|
||||
```markdown
|
||||
## Improvement Priority Matrix
|
||||
|
||||
| Node | Impact | Feasibility | Impl Cost | Total Score | Priority |
|
||||
|------|--------|-------------|-----------|-------------|----------|
|
||||
| analyze_intent | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | 14/15 | 1st |
|
||||
| generate_response | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | 12/15 | 2nd |
|
||||
| retrieve_context | ⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ | 8/15 | 3rd |
|
||||
|
||||
**Iteration 1 Target**: analyze_intent node
|
||||
```
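
The total score is simply the sum of the three ratings. A tiny sketch of scoring and ranking candidates on the same 1-5 scale used in the matrix:

```python
# Score and rank improvement candidates exactly as in the matrix above (1-5 each)
candidates = {
    "analyze_intent":    {"impact": 5, "feasibility": 5, "impl_cost": 4},
    "generate_response": {"impact": 4, "feasibility": 4, "impl_cost": 4},
    "retrieve_context":  {"impact": 2, "feasibility": 3, "impl_cost": 3},
}

ranked = sorted(candidates.items(), key=lambda item: sum(item[1].values()), reverse=True)

for rank, (node, scores) in enumerate(ranked, start=1):
    print(f"{rank}. {node}: {sum(scores.values())}/15")
```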
|
||||
|
||||
### Step 8: Implement Improvements
|
||||
|
||||
**Pre-Improvement Prompt** (`src/nodes/analyzer.py`):
|
||||
```python
|
||||
# Before
|
||||
def analyze_intent(state: GraphState) -> GraphState:
|
||||
llm = ChatAnthropic(
|
||||
model="claude-3-5-sonnet-20241022",
|
||||
temperature=1.0
|
||||
)
|
||||
|
||||
messages = [
|
||||
SystemMessage(content="You are an intent analyzer. Analyze user input."),
|
||||
HumanMessage(content=f"Analyze: {state['user_input']}")
|
||||
]
|
||||
|
||||
response = llm.invoke(messages)
|
||||
state["intent"] = response.content
|
||||
return state
|
||||
```
|
||||
|
||||
**Post-Improvement Prompt**:
|
||||
```python
|
||||
# After - Iteration 1
|
||||
def analyze_intent(state: GraphState) -> GraphState:
|
||||
llm = ChatAnthropic(
|
||||
model="claude-3-5-sonnet-20241022",
|
||||
temperature=0.3 # Lower temperature for classification tasks
|
||||
)
|
||||
|
||||
# Clear classification categories and few-shot examples
|
||||
system_prompt = """You are an intent classifier for a customer support chatbot.
|
||||
|
||||
Classify user input into one of these categories:
|
||||
- "product_inquiry": Questions about products or services
|
||||
- "technical_support": Technical issues or troubleshooting
|
||||
- "billing": Payment, invoicing, or billing questions
|
||||
- "general": General questions or chitchat
|
||||
|
||||
Output ONLY a valid JSON object with this structure:
|
||||
{
|
||||
"intent": "<category>",
|
||||
"confidence": <0.0-1.0>,
|
||||
"reasoning": "<brief explanation>"
|
||||
}
|
||||
|
||||
Examples:
|
||||
|
||||
Input: "How much does the premium plan cost?"
|
||||
Output: {"intent": "product_inquiry", "confidence": 0.95, "reasoning": "Question about product pricing"}
|
||||
|
||||
Input: "I can't log into my account"
|
||||
Output: {"intent": "technical_support", "confidence": 0.9, "reasoning": "Authentication issue"}
|
||||
|
||||
Input: "Why was I charged twice?"
|
||||
Output: {"intent": "billing", "confidence": 0.95, "reasoning": "Question about billing charges"}
|
||||
|
||||
Input: "Hello, how are you?"
|
||||
Output: {"intent": "general", "confidence": 0.85, "reasoning": "General greeting"}
|
||||
|
||||
Input: "What's the return policy?"
|
||||
Output: {"intent": "product_inquiry", "confidence": 0.9, "reasoning": "Question about product policy"}
|
||||
"""
|
||||
|
||||
messages = [
|
||||
SystemMessage(content=system_prompt),
|
||||
HumanMessage(content=f"Input: {state['user_input']}\nOutput:")
|
||||
]
|
||||
|
||||
response = llm.invoke(messages)
|
||||
|
||||
# JSON parsing (with error handling)
|
||||
try:
|
||||
intent_data = json.loads(response.content)
|
||||
state["intent"] = intent_data["intent"]
|
||||
state["confidence"] = intent_data["confidence"]
|
||||
except json.JSONDecodeError:
|
||||
# Fallback
|
||||
state["intent"] = "general"
|
||||
state["confidence"] = 0.5
|
||||
|
||||
return state
|
||||
```
|
||||
|
||||
**Summary of Changes**:
|
||||
1. ✅ temperature: 1.0 → 0.3 (appropriate for classification tasks)
|
||||
2. ✅ Clear classification categories (4 intents)
|
||||
3. ✅ Few-shot examples (added 5)
|
||||
4. ✅ JSON output format (structured output)
|
||||
5. ✅ Error handling (fallback for JSON parse failures)
|
||||
|
||||
### Step 9: Post-Improvement Evaluation
|
||||
|
||||
**Execution**:
|
||||
```bash
|
||||
# Execute post-improvement evaluation under same conditions
|
||||
./evaluation_after_iteration1.sh
|
||||
```
|
||||
|
||||
### Step 10: Compare Results
|
||||
|
||||
**Comparison Report Example**:
|
||||
```markdown
|
||||
# Iteration 1 Evaluation Results
|
||||
|
||||
Execution Date: 2024-11-24 12:00:00
|
||||
Changes: Optimization of analyze_intent node
|
||||
|
||||
## Results Comparison
|
||||
|
||||
| Metric | Baseline | Iteration 1 | Change | % Change | Target | Achievement |
|
||||
|--------|----------|-------------|--------|----------|--------|-------------|
|
||||
| **Accuracy** | 75.0% | **86.0%** | **+11.0%** | +14.7% | 90.0% | 95.6% |
|
||||
| **Latency** | 2.5s | 2.4s | -0.1s | -4.0% | 2.0s | 83.3% |
|
||||
| **Cost/req** | $0.015 | $0.014 | -$0.001 | -6.7% | $0.010 | 71.4% |
|
||||
|
||||
## Detailed Analysis
|
||||
|
||||
### Accuracy Improvement
|
||||
- **Improvement**: +11.0% (75.0% → 86.0%)
|
||||
- **Remaining gap**: 4.0% (target 90.0%)
|
||||
- **Improved cases**: Intent classification errors reduced from 12 → 3 cases
|
||||
- **Still needs improvement**: Context understanding deficiency cases (5 cases)
|
||||
|
||||
### Slight Latency Improvement
|
||||
- **Improvement**: -0.1s (2.5s → 2.4s)
|
||||
- **Main factor**: Lower temperature in analyze_intent made output more concise
|
||||
- **Remaining bottleneck**: generate_response (avg 1.8s)
|
||||
|
||||
### Slight Cost Reduction
|
||||
- **Reduction**: -$0.001 (6.7% reduction)
|
||||
- **Factor**: Reduced output tokens in analyze_intent
|
||||
- **Main cost**: generate_response still accounts for 73%
|
||||
|
||||
## Next Iteration Strategy
|
||||
|
||||
### Priority 1: Optimize generate_response
|
||||
- **Goal**: Latency 1.8s → 1.4s, Cost $0.011 → $0.007
|
||||
- **Approach**:
|
||||
1. Add conciseness instructions
|
||||
2. Limit max_tokens to 500
|
||||
3. Adjust temperature from 0.7 → 0.5
|
||||
|
||||
### Priority 2: Final 4% accuracy improvement
|
||||
- **Goal**: 86.0% → 90.0% or higher
|
||||
- **Approach**: Improve context understanding (retrieve_context node)
|
||||
|
||||
## Decision
|
||||
✅ Continue → Proceed to Iteration 2
|
||||
```
|
||||
|
||||
### Step 11: Continue Decision
|
||||
|
||||
**Decision Criteria**:
|
||||
```python
|
||||
def should_continue_iteration(results: Dict, goals: Dict) -> bool:
|
||||
"""Determine if iteration should continue"""
|
||||
all_goals_met = True
|
||||
|
||||
for metric, goal in goals.items():
|
||||
if metric == "accuracy":
|
||||
if results[metric] < goal:
|
||||
all_goals_met = False
|
||||
elif metric in ["latency", "cost"]:
|
||||
if results[metric] > goal:
|
||||
all_goals_met = False
|
||||
|
||||
return not all_goals_met
|
||||
|
||||
# Example
|
||||
goals = {"accuracy": 90.0, "latency": 2.0, "cost": 0.010}
|
||||
results = {"accuracy": 86.0, "latency": 2.4, "cost": 0.014}
|
||||
|
||||
if should_continue_iteration(results, goals):
|
||||
print("Proceed to next iteration")
|
||||
else:
|
||||
print("Goals achieved - Move to Phase 4")
|
||||
```
|
||||
|
||||
**Iteration Limit**:
|
||||
- **Recommended**: 3-5 iterations
|
||||
- **Reason**: Beyond this, the law of diminishing returns tends to apply
|
||||
- **Exception**: Critical applications may require 10+ iterations
|
||||
339
skills/fine-tune/workflow_phase4.md
Normal file
@@ -0,0 +1,339 @@
|
||||
# Phase 4: Completion and Documentation
|
||||
|
||||
Phase to record final results and commit code.
|
||||
|
||||
**Time Required**: 30 minutes - 1 hour
|
||||
|
||||
**📋 Related Documents**: [Overall Workflow](./workflow.md) | [Practical Examples](./examples.md)
|
||||
|
||||
---
|
||||
|
||||
## Phase 4: Completion and Documentation
|
||||
|
||||
### Step 12: Create Final Evaluation Report
|
||||
|
||||
**Report Template**:
|
||||
```markdown
|
||||
# LangGraph Application Fine-Tuning Completion Report
|
||||
|
||||
Project: [Project Name]
|
||||
Implementation Period: 2024-11-24 10:00 - 2024-11-24 15:00 (5 hours)
|
||||
Implementer: Claude Code with fine-tune skill
|
||||
|
||||
## Executive Summary
|
||||
|
||||
This fine-tuning project executed prompt optimization for a LangGraph chatbot application and achieved the following results:
|
||||
|
||||
- ✅ **Accuracy**: 75.0% → 92.0% (+17.0%, achieved 90% target)
|
||||
- ✅ **Latency**: 2.5s → 1.9s (-24.0%, achieved 2.0s target)
|
||||
- ⚠️ **Cost**: $0.015 → $0.011 (-26.7%, target $0.010 not met)
|
||||
|
||||
A total of 3 iterations were executed, achieving 2 out of 3 metric targets.
|
||||
|
||||
## Implementation Summary
|
||||
|
||||
### Iteration Count and Execution Time
|
||||
- **Total Iterations**: 3
|
||||
- **Optimized Nodes**: 2 (analyze_intent, generate_response)
|
||||
- **Evaluation Run Count**: 20 times (baseline 5 times + 5 times × 3 post-iteration)
|
||||
- **Total Execution Time**: Approximately 5 hours
|
||||
|
||||
### Final Results
|
||||
|
||||
| Metric | Initial | Final | Improvement | % Change | Target | Achievement |
|
||||
|--------|---------|-------|-------------|----------|--------|-------------|
|
||||
| Accuracy | 75.0% | 92.0% | +17.0% | +22.7% | 90.0% | ✅ 102.2% achieved |
|
||||
| Latency | 2.5s | 1.9s | -0.6s | -24.0% | 2.0s | ✅ 105.3% achieved |
|
||||
| Cost/req | $0.015 | $0.011 | -$0.004 | -26.7% | $0.010 | ⚠️ 90.9% achieved |
|
||||
|
||||
## Iteration Details
|
||||
|
||||
### Iteration 1: Optimization of analyze_intent node
|
||||
|
||||
**Date/Time**: 2024-11-24 11:00
|
||||
**Target Node**: src/nodes/analyzer.py:25-45
|
||||
|
||||
**Changes**:
|
||||
1. temperature: 1.0 → 0.3
|
||||
2. Added 5 few-shot examples
|
||||
3. Structured JSON output format
|
||||
4. Defined clear classification categories (4)
|
||||
|
||||
**Results**:
|
||||
- Accuracy: 75.0% → 86.0% (+11.0%)
|
||||
- Latency: 2.5s → 2.4s (-0.1s)
|
||||
- Cost: $0.015 → $0.014 (-$0.001)
|
||||
|
||||
**Learning**: Few-shot examples and clear output format most effective for accuracy improvement
|
||||
|
||||
---
|
||||
|
||||
### Iteration 2: Optimization of generate_response node
|
||||
|
||||
**Date/Time**: 2024-11-24 13:00
|
||||
**Target Node**: src/nodes/generator.py:45-68
|
||||
|
||||
**Changes**:
|
||||
1. Added conciseness instructions ("answer in 2-3 sentences")
|
||||
2. max_tokens: unlimited → 500
|
||||
3. temperature: 0.7 → 0.5
|
||||
4. Clarified response style
|
||||
|
||||
**Results**:
|
||||
- Accuracy: 86.0% → 88.0% (+2.0%)
|
||||
- Latency: 2.4s → 2.0s (-0.4s)
|
||||
- Cost: $0.014 → $0.011 (-$0.003)
|
||||
|
||||
**Learning**: max_tokens limit contributed significantly to latency and cost reduction
|
||||
|
||||
---
|
||||
|
||||
### Iteration 3: Additional improvement of analyze_intent
|
||||
|
||||
**Date/Time**: 2024-11-24 14:30
|
||||
**Target Node**: src/nodes/analyzer.py:25-45
|
||||
|
||||
**Changes**:
|
||||
1. Increased few-shot examples from 5 → 10
|
||||
2. Added edge case handling
|
||||
3. Re-classification logic with confidence threshold
|
||||
|
||||
**Results**:
|
||||
- Accuracy: 88.0% → 92.0% (+4.0%)
|
||||
- Latency: 2.0s → 1.9s (-0.1s)
|
||||
- Cost: $0.011 → $0.011 (±0)
|
||||
|
||||
**Learning**: Additional few-shot examples broke through final accuracy barrier
|
||||
|
||||
## Final Changes
|
||||
|
||||
### src/nodes/analyzer.py (analyze_intent node)
|
||||
|
||||
#### Before
|
||||
```python
|
||||
def analyze_intent(state: GraphState) -> GraphState:
|
||||
llm = ChatAnthropic(model="claude-3-5-sonnet-20241022", temperature=1.0)
|
||||
messages = [
|
||||
SystemMessage(content="You are an intent analyzer. Analyze user input."),
|
||||
HumanMessage(content=f"Analyze: {state['user_input']}")
|
||||
]
|
||||
response = llm.invoke(messages)
|
||||
state["intent"] = response.content
|
||||
return state
|
||||
```
|
||||
|
||||
#### After
|
||||
```python
|
||||
def analyze_intent(state: GraphState) -> GraphState:
|
||||
llm = ChatAnthropic(model="claude-3-5-sonnet-20241022", temperature=0.3)
|
||||
|
||||
system_prompt = """You are an intent classifier for a customer support chatbot.
|
||||
Classify user input into: product_inquiry, technical_support, billing, or general.
|
||||
Output JSON: {"intent": "<category>", "confidence": <0.0-1.0>, "reasoning": "<explanation>"}
|
||||
|
||||
[10 few-shot examples...]
|
||||
"""
|
||||
|
||||
messages = [
|
||||
SystemMessage(content=system_prompt),
|
||||
HumanMessage(content=f"Input: {state['user_input']}\nOutput:")
|
||||
]
|
||||
|
||||
response = llm.invoke(messages)
|
||||
intent_data = json.loads(response.content)
|
||||
|
||||
# Low confidence → re-classify as general
|
||||
if intent_data["confidence"] < 0.7:
|
||||
intent_data["intent"] = "general"
|
||||
|
||||
state["intent"] = intent_data["intent"]
|
||||
state["confidence"] = intent_data["confidence"]
|
||||
return state
|
||||
```
|
||||
|
||||
**Key Changes**:
|
||||
- temperature: 1.0 → 0.3
|
||||
- Few-shot examples: 0 → 10
|
||||
- Output: free text → JSON
|
||||
- Added confidence threshold fallback
|
||||
|
||||
---
|
||||
|
||||
### src/nodes/generator.py (generate_response node)
|
||||
|
||||
#### Before
|
||||
```python
|
||||
def generate_response(state: GraphState) -> GraphState:
|
||||
llm = ChatAnthropic(model="claude-3-5-sonnet-20241022", temperature=0.7)
|
||||
prompt = ChatPromptTemplate.from_messages([
|
||||
("system", "Generate helpful response based on context."),
|
||||
("human", "{context}\n\nQuestion: {question}")
|
||||
])
|
||||
chain = prompt | llm
|
||||
response = chain.invoke({"context": state["context"], "question": state["user_input"]})
|
||||
state["response"] = response.content
|
||||
return state
|
||||
```
|
||||
|
||||
#### After
|
||||
```python
|
||||
def generate_response(state: GraphState) -> GraphState:
|
||||
llm = ChatAnthropic(
|
||||
model="claude-3-5-sonnet-20241022",
|
||||
temperature=0.5,
|
||||
max_tokens=500 # Output length limit
|
||||
)
|
||||
|
||||
system_prompt = """You are a helpful customer support assistant.
|
||||
|
||||
Guidelines:
|
||||
- Be concise: Answer in 2-3 sentences
|
||||
- Be friendly: Use a warm, professional tone
|
||||
- Be accurate: Base your answer on the provided context
|
||||
- If uncertain: Acknowledge and offer to escalate
|
||||
|
||||
Format: Direct answer followed by one optional clarifying sentence.
|
||||
"""
|
||||
|
||||
prompt = ChatPromptTemplate.from_messages([
|
||||
("system", system_prompt),
|
||||
("human", "Context: {context}\n\nQuestion: {question}\n\nAnswer:")
|
||||
])
|
||||
|
||||
chain = prompt | llm
|
||||
response = chain.invoke({"context": state["context"], "question": state["user_input"]})
|
||||
state["response"] = response.content
|
||||
return state
|
||||
```
|
||||
|
||||
**Key Changes**:
|
||||
- temperature: 0.7 → 0.5
|
||||
- max_tokens: unlimited → 500
|
||||
- Clear conciseness instruction ("2-3 sentences")
|
||||
- Added response style guidelines
|
||||
|
||||
## Detailed Evaluation Results
|
||||
|
||||
### Improvement Status by Test Case
|
||||
|
||||
| Case ID | Category | Before | After | Improved |
|
||||
|---------|----------|--------|-------|----------|
|
||||
| TC001 | Product | ❌ Wrong | ✅ Correct | ✅ |
|
||||
| TC002 | Technical | ❌ Wrong | ✅ Correct | ✅ |
|
||||
| TC003 | Billing | ✅ Correct | ✅ Correct | - |
|
||||
| TC004 | General | ✅ Correct | ✅ Correct | - |
|
||||
| TC005 | Product | ❌ Wrong | ✅ Correct | ✅ |
|
||||
| ... | ... | ... | ... | ... |
|
||||
| TC020 | Technical | ✅ Correct | ✅ Correct | - |
|
||||
|
||||
**Improved Cases (wrong → correct)**: 4/20 (20%)
|
||||
**Maintained Cases (already correct)**: 15/20 (75%)
|
||||
**Degraded Cases (correct → wrong)**: 0/20 (0%)
|
||||
|
||||
### Latency Breakdown
|
||||
|
||||
| Node | Before | After | Change | % Change |
|
||||
|------|--------|-------|--------|----------|
|
||||
| analyze_intent | 0.5s | 0.4s | -0.1s | -20% |
|
||||
| retrieve_context | 0.2s | 0.2s | ±0s | 0% |
|
||||
| generate_response | 1.8s | 1.3s | -0.5s | -28% |
|
||||
| **Total** | **2.5s** | **1.9s** | **-0.6s** | **-24%** |
|
||||
|
||||
### Cost Breakdown
|
||||
|
||||
| Node | Before | After | Change | % Change |
|
||||
|------|--------|-------|--------|----------|
|
||||
| analyze_intent | $0.003 | $0.003 | ±$0 | 0% |
|
||||
| retrieve_context | $0.001 | $0.001 | ±$0 | 0% |
|
||||
| generate_response | $0.011 | $0.007 | -$0.004 | -36% |
|
||||
| **Total** | **$0.015** | **$0.011** | **-$0.004** | **-27%** |
|
||||
|
||||
## Future Recommendations
|
||||
|
||||
### Short-term (1-2 weeks)
|
||||
1. **Achieve cost target**: $0.011 → $0.010
|
||||
- Approach: Consider partial migration to Claude 3.5 Haiku
|
||||
- Estimated effect: -$0.002-0.003/req
|
||||
|
||||
2. **Further accuracy improvement**: 92.0% → 95.0%
|
||||
- Approach: Analyze error cases and add few-shot examples
|
||||
- Estimated effect: +3.0%
|
||||
|
||||
### Mid-term (1-2 months)
|
||||
1. **Model optimization**
|
||||
- Use Haiku for simple intent classification
|
||||
   - Use Sonnet only for complex response generation (see the sketch after this list)
|
||||
- Estimated effect: -30-40% cost, minimal latency impact
|
||||
|
||||
2. **Leverage prompt caching**
|
||||
- Cache system prompts and few-shot examples
|
||||
- Estimated effect: -50% cost (when cache hits)
|
||||
|
||||
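A minimal sketch of the model split proposed in mid-term item 1, assuming the same `ChatAnthropic` usage as the current nodes; the Haiku model ID and the exact split are recommendations, not measured configuration:

```python
from langchain_anthropic import ChatAnthropic

# Cheap, fast model for the structured intent-classification step
classifier_llm = ChatAnthropic(model="claude-3-5-haiku-20241022", temperature=0.3)

# Keep the stronger model only where answer quality matters most
generator_llm = ChatAnthropic(
    model="claude-3-5-sonnet-20241022",
    temperature=0.5,
    max_tokens=500
)
```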
### Long-term (3-6 months)
|
||||
1. **Consider fine-tuned models**
|
||||
- Model fine-tuning with proprietary data
|
||||
- No need for few-shot examples, more concise prompts
|
||||
- Estimated effect: -60% cost, +5% accuracy
|
||||
|
||||
## Conclusion
|
||||
|
||||
This project achieved the following through fine-tuning of the LangGraph application:
|
||||
|
||||
✅ **Successes**:
|
||||
1. Significant accuracy improvement (+22.7%) - exceeded target by 2.2%
|
||||
2. Notable latency improvement (-24.0%) - exceeded target by 5%
|
||||
3. Cost reduction (-26.7%) - 9.1% away from target
|
||||
|
||||
⚠️ **Challenges**:
|
||||
1. Cost target not met ($0.011 vs $0.010 target) - addressable through migration to lighter models
|
||||
|
||||
📈 **Business Impact**:
|
||||
- Improved user satisfaction (through accuracy improvement)
|
||||
- Reduced operational costs (through latency and cost reduction)
|
||||
- Improved scalability (through efficient resource usage)
|
||||
|
||||
🎯 **Next Steps**:
|
||||
1. Validate migration to lighter models for cost reduction
|
||||
2. Continuous monitoring and evaluation
|
||||
3. Expansion to new use cases
|
||||
|
||||
---
|
||||
|
||||
Created: 2024-11-24 15:00:00
|
||||
Creator: Claude Code (fine-tune skill)
|
||||
```
|
||||
|
||||
### Step 13: Commit Code and Update Documentation
|
||||
|
||||
**Git Commit Example**:
|
||||
```bash
|
||||
# Commit changes
|
||||
git add src/nodes/analyzer.py src/nodes/generator.py
|
||||
git commit -m "feat: optimize LangGraph prompts for accuracy and latency
|
||||
|
||||
Iteration 1-3 of fine-tuning process:
|
||||
- analyze_intent: added few-shot examples, JSON output, lower temperature
|
||||
- generate_response: added conciseness guidelines, max_tokens limit
|
||||
|
||||
Results:
|
||||
- Accuracy: 75.0% → 92.0% (+17.0%, goal 90% ✅)
|
||||
- Latency: 2.5s → 1.9s (-0.6s, goal 2.0s ✅)
|
||||
- Cost: $0.015 → $0.011 (-$0.004, goal $0.010 ⚠️)
|
||||
|
||||
Full report: evaluation_results/final_report.md"
|
||||
|
||||
# Commit evaluation results
|
||||
git add evaluation_results/
|
||||
git commit -m "docs: add fine-tuning evaluation results and final report"
|
||||
|
||||
# Add tag
|
||||
git tag -a fine-tune-v1.0 -m "Fine-tuning completed: 92% accuracy achieved"
|
||||
```
|
||||
|
||||
## Summary
|
||||
|
||||
Following this workflow enables:
|
||||
- ✅ Systematic fine-tuning process execution
|
||||
- ✅ Data-driven decision making
|
||||
- ✅ Continuous improvement and verification
|
||||
- ✅ Complete documentation and traceability
|
||||
170
skills/langgraph-master/01_core_concepts_edge.md
Normal file
@@ -0,0 +1,170 @@
|
||||
# Edge
|
||||
|
||||
Control flow that defines transitions between nodes.
|
||||
|
||||
## Overview
|
||||
|
||||
Edges determine "what to do next". Nodes perform processing, and edges dictate the next action.
|
||||
|
||||
## Types of Edges
|
||||
|
||||
### 1. Normal Edges (Fixed Transitions)
|
||||
|
||||
Always transition to a specific node:
|
||||
|
||||
```python
|
||||
from langgraph.graph import START, END
|
||||
|
||||
# From START to node_a
|
||||
builder.add_edge(START, "node_a")
|
||||
|
||||
# From node_a to node_b
|
||||
builder.add_edge("node_a", "node_b")
|
||||
|
||||
# From node_b to end
|
||||
builder.add_edge("node_b", END)
|
||||
```
|
||||
|
||||
### 2. Conditional Edges (Dynamic Transitions)
|
||||
|
||||
Determine the destination based on state:
|
||||
|
||||
```python
|
||||
from typing import Literal
|
||||
|
||||
def should_continue(state: State) -> Literal["continue", "end"]:
|
||||
if state["iteration"] < state["max_iterations"]:
|
||||
return "continue"
|
||||
return "end"
|
||||
|
||||
# Add conditional edge
|
||||
builder.add_conditional_edges(
|
||||
"agent",
|
||||
should_continue,
|
||||
{
|
||||
"continue": "tools", # Go to tools if continue
|
||||
"end": END # End if end
|
||||
}
|
||||
)
|
||||
```
|
||||
|
||||
### 3. Entry Points
|
||||
|
||||
Define the starting point of the graph:
|
||||
|
||||
```python
|
||||
# Simple entry
|
||||
builder.add_edge(START, "first_node")
|
||||
|
||||
# Conditional entry
|
||||
builder.add_conditional_edges(
|
||||
START,
|
||||
route_start,
|
||||
{
|
||||
"path_a": "node_a",
|
||||
"path_b": "node_b"
|
||||
}
|
||||
)
|
||||
```
|
||||
|
||||
## Parallel Execution
|
||||
|
||||
Nodes with multiple outgoing edges will have **all destination nodes execute in parallel** in the next step:
|
||||
|
||||
```python
|
||||
# From node_a to multiple nodes
|
||||
builder.add_edge("node_a", "node_b")
|
||||
builder.add_edge("node_a", "node_c")
|
||||
|
||||
# node_b and node_c execute in parallel
|
||||
```
|
||||
|
||||
To aggregate results from parallel execution, use a Reducer:
|
||||
|
||||
```python
|
||||
from operator import add
|
||||
|
||||
class State(TypedDict):
|
||||
results: Annotated[list, add] # Aggregate results from multiple nodes
|
||||
```
|
||||
|
||||
## Edge Control with Command
|
||||
|
||||
Specify the next destination from within a node:
|
||||
|
||||
```python
|
||||
from langgraph.types import Command
|
||||
|
||||
def smart_node(state: State) -> Command:
|
||||
result = analyze(state["data"])
|
||||
|
||||
if result["confidence"] > 0.8:
|
||||
return Command(
|
||||
update={"result": result},
|
||||
goto="finalize"
|
||||
)
|
||||
else:
|
||||
return Command(
|
||||
update={"result": result, "needs_review": True},
|
||||
goto="human_review"
|
||||
)
|
||||
```
|
||||
|
||||
## Conditional Branching Implementation Patterns
|
||||
|
||||
### Pattern 1: Tool Call Loop
|
||||
|
||||
```python
|
||||
def should_continue(state: State) -> Literal["continue", "end"]:
|
||||
messages = state["messages"]
|
||||
last_message = messages[-1]
|
||||
|
||||
# Continue if there are tool calls
|
||||
if last_message.tool_calls:
|
||||
return "continue"
|
||||
return "end"
|
||||
|
||||
builder.add_conditional_edges(
|
||||
"agent",
|
||||
should_continue,
|
||||
{
|
||||
"continue": "tools",
|
||||
"end": END
|
||||
}
|
||||
)
|
||||
```
|
||||
|
||||
### Pattern 2: Routing
|
||||
|
||||
```python
|
||||
def route_query(state: State) -> Literal["search", "calculate", "general"]:
|
||||
query = state["query"]
|
||||
|
||||
if "calculate" in query or "+" in query:
|
||||
return "calculate"
|
||||
elif "search" in query:
|
||||
return "search"
|
||||
return "general"
|
||||
|
||||
builder.add_conditional_edges(
|
||||
"router",
|
||||
route_query,
|
||||
{
|
||||
"search": "search_node",
|
||||
"calculate": "calculator_node",
|
||||
"general": "general_node"
|
||||
}
|
||||
)
|
||||
```
|
||||
|
||||
## Important Principles
|
||||
|
||||
1. **Explicit Control Flow**: Transitions should be transparent and traceable
|
||||
2. **Type Safety**: Explicitly specify destinations with Literal
|
||||
3. **Leverage Parallel Execution**: Execute independent tasks in parallel
|
||||
|
||||
## Related Pages
|
||||
|
||||
- [01_core_concepts_node.md](01_core_concepts_node.md) - Node implementation
|
||||
- [02_graph_architecture_routing.md](02_graph_architecture_routing.md) - Routing patterns
|
||||
- [05_advanced_features_map_reduce.md](05_advanced_features_map_reduce.md) - Parallel processing patterns
|
||||
132
skills/langgraph-master/01_core_concepts_node.md
Normal file
@@ -0,0 +1,132 @@
|
||||
# Node
|
||||
|
||||
Python functions that execute individual tasks.
|
||||
|
||||
## Overview
|
||||
|
||||
Nodes are "processing units" that read state, perform some processing, and return updates.
|
||||
|
||||
## Basic Implementation
|
||||
|
||||
```python
|
||||
def my_node(state: State) -> dict:
|
||||
# Get information from state
|
||||
messages = state["messages"]
|
||||
|
||||
# Execute processing
|
||||
result = process_messages(messages)
|
||||
|
||||
# Return updates (don't modify state directly)
|
||||
return {"result": result, "count": state["count"] + 1}
|
||||
```
|
||||
|
||||
## Types of Nodes
|
||||
|
||||
### 1. LLM Call Node
|
||||
|
||||
```python
|
||||
def llm_node(state: State):
|
||||
messages = state["messages"]
|
||||
response = llm.invoke(messages)
|
||||
|
||||
return {"messages": [response]}
|
||||
```
|
||||
|
||||
### 2. Tool Execution Node
|
||||
|
||||
```python
|
||||
from langgraph.prebuilt import ToolNode
|
||||
|
||||
tools = [search_tool, calculator_tool]
|
||||
tool_node = ToolNode(tools)
|
||||
```
|
||||
|
||||
### 3. Processing Node
|
||||
|
||||
```python
|
||||
def process_node(state: State):
|
||||
data = state["raw_data"]
|
||||
|
||||
# Data processing
|
||||
processed = clean_and_transform(data)
|
||||
|
||||
return {"processed_data": processed}
|
||||
```
|
||||
|
||||
## Node Signature
|
||||
|
||||
Nodes can accept the following parameters:
|
||||
|
||||
```python
|
||||
from langchain_core.runnables import RunnableConfig
from langgraph.types import Command
|
||||
|
||||
def advanced_node(
|
||||
state: State,
|
||||
config: RunnableConfig, # Optional
|
||||
) -> dict | Command:
|
||||
# Get configuration from config
|
||||
thread_id = config["configurable"]["thread_id"]
|
||||
|
||||
# Processing...
|
||||
|
||||
return {"result": result}
|
||||
```
|
||||
|
||||
## Control with Command API
|
||||
|
||||
Specify state updates and control flow simultaneously:
|
||||
|
||||
```python
|
||||
from langgraph.types import Command
|
||||
|
||||
def decision_node(state: State) -> Command:
|
||||
if state["should_continue"]:
|
||||
return Command(
|
||||
update={"status": "continuing"},
|
||||
goto="next_node"
|
||||
)
|
||||
else:
|
||||
return Command(
|
||||
update={"status": "done"},
|
||||
goto=END
|
||||
)
|
||||
```
|
||||
|
||||
## Important Principles
|
||||
|
||||
1. **Idempotency**: Return the same output for the same input (see the sketch after this list)
|
||||
2. **Return Updates**: Return update contents instead of directly modifying state
|
||||
3. **Single Responsibility**: Each node does one thing well
|
||||
|
||||
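A minimal sketch contrasting principle 1, assuming the `State` used in the examples above; the returned keys and the use of `datetime` as a non-deterministic source are illustrative:

```python
import datetime

# Non-idempotent: the output changes on every run, which breaks
# replays from checkpoints and makes tests flaky
def bad_node(state: State) -> dict:
    return {"stamp": datetime.datetime.now().isoformat()}

# Idempotent: the output is derived purely from the input state
def good_node(state: State) -> dict:
    return {"message_count": len(state["messages"])}
```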
## Adding Nodes
|
||||
|
||||
```python
|
||||
from langgraph.graph import StateGraph
|
||||
|
||||
builder = StateGraph(State)
|
||||
|
||||
# Add nodes
|
||||
builder.add_node("analyze", analyze_node)
|
||||
builder.add_node("decide", decide_node)
|
||||
builder.add_node("execute", execute_node)
|
||||
|
||||
# Add tool node
|
||||
builder.add_node("tools", tool_node)
|
||||
```
|
||||
|
||||
## Error Handling
|
||||
|
||||
```python
|
||||
def robust_node(state: State) -> dict:
|
||||
try:
|
||||
result = risky_operation(state["data"])
|
||||
return {"result": result, "error": None}
|
||||
except Exception as e:
|
||||
return {"result": None, "error": str(e)}
|
||||
```
|
||||
|
||||
## Related Pages
|
||||
|
||||
- [01_core_concepts_state.md](01_core_concepts_state.md) - How to define State
|
||||
- [01_core_concepts_edge.md](01_core_concepts_edge.md) - Connections between nodes
|
||||
- [04_tool_integration_overview.md](04_tool_integration_overview.md) - Tool node details
|
||||
57
skills/langgraph-master/01_core_concepts_overview.md
Normal file
@@ -0,0 +1,57 @@
|
||||
# 01. Core Concepts
|
||||
|
||||
Understanding the three core elements of LangGraph.
|
||||
|
||||
## Overview
|
||||
|
||||
LangGraph is a framework that models agent workflows as **graphs**. By decomposing complex workflows into **discrete steps (nodes)**, it achieves the following:
|
||||
|
||||
- **Improved Resilience**: Create checkpoints at node boundaries
|
||||
- **Enhanced Visibility**: Enable state inspection between each step
|
||||
- **Independent Testing**: Easy unit testing of individual nodes
|
||||
- **Error Handling**: Apply different strategies for each error type
|
||||
|
||||
## Three Core Elements
|
||||
|
||||
### 1. [State](01_core_concepts_state.md)
|
||||
- Memory shared across all nodes in the graph
|
||||
- Snapshot of the current execution state
|
||||
- Defined with TypedDict or Pydantic models
|
||||
|
||||
### 2. [Node](01_core_concepts_node.md)
|
||||
- Python functions that execute individual tasks
|
||||
- Receive the current state and return updates
|
||||
- Basic unit of processing
|
||||
|
||||
### 3. [Edge](01_core_concepts_edge.md)
|
||||
- Define transitions between nodes
|
||||
- Fixed transitions or conditional branching
|
||||
- Determine control flow
|
||||
|
||||
## Design Philosophy
|
||||
|
||||
The core concept of LangGraph is **decomposition into discrete steps**:
|
||||
|
||||
```python
|
||||
# Split agent into individual nodes
|
||||
graph = StateGraph(State)
|
||||
graph.add_node("analyze", analyze_node) # Analysis step
|
||||
graph.add_node("decide", decide_node) # Decision step
|
||||
graph.add_node("execute", execute_node) # Execution step
|
||||
```
|
||||
|
||||
This approach allows each step to operate independently, building a robust system as a whole.
|
||||
|
||||
## Important Principles
|
||||
|
||||
1. **Store Raw Data**: Store raw data in State, format prompts dynamically within nodes (see the sketch after this list)
|
||||
2. **Return Updates**: Nodes return update contents instead of directly modifying state
|
||||
3. **Transparent Control Flow**: Explicitly declare the next destination with Command objects
|
||||
|
||||
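A minimal sketch of principle 1, assuming an `llm` client as in the examples above; the field names are illustrative:

```python
from typing import TypedDict

class State(TypedDict):
    user_question: str         # raw input, stored as-is
    retrieved_docs: list[str]  # raw context, stored as-is
    answer: str

def answer_node(state: State) -> dict:
    # The prompt is formatted here, at the point of use, not stored in State
    context = "\n".join(state["retrieved_docs"])
    prompt = f"Context:\n{context}\n\nQuestion: {state['user_question']}"
    return {"answer": llm.invoke(prompt)}
```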
## Next Steps
|
||||
|
||||
For details on each element, refer to the following pages:
|
||||
|
||||
- [01_core_concepts_state.md](01_core_concepts_state.md) - State management details
|
||||
- [01_core_concepts_node.md](01_core_concepts_node.md) - How to implement nodes
|
||||
- [01_core_concepts_edge.md](01_core_concepts_edge.md) - Edges and control flow
|
||||
102
skills/langgraph-master/01_core_concepts_state.md
Normal file
@@ -0,0 +1,102 @@
|
||||
# State
|
||||
|
||||
Memory shared across all nodes in the graph.
|
||||
|
||||
## Overview
|
||||
|
||||
State is like a "notebook" that records everything the agent learns and decides. It is a **shared data structure** accessible to all nodes and edges in the graph.
|
||||
|
||||
## Definition Methods
|
||||
|
||||
### Using TypedDict
|
||||
|
||||
```python
|
||||
from typing import TypedDict
|
||||
|
||||
class State(TypedDict):
|
||||
messages: list[str]
|
||||
user_name: str
|
||||
count: int
|
||||
```
|
||||
|
||||
### Using Pydantic Model
|
||||
|
||||
```python
|
||||
from pydantic import BaseModel
|
||||
|
||||
class State(BaseModel):
|
||||
messages: list[str]
|
||||
user_name: str
|
||||
count: int = 0 # Default value
|
||||
```
|
||||
|
||||
## Reducer (Controlling Update Methods)
|
||||
|
||||
A function that specifies how each key is updated. If not specified, it defaults to **value overwrite**.
|
||||
|
||||
### Addition (Adding to List)
|
||||
|
||||
```python
|
||||
from typing import Annotated
|
||||
from operator import add
|
||||
|
||||
class State(TypedDict):
|
||||
messages: Annotated[list[str], add] # Add to existing list
|
||||
count: int # Overwrite
|
||||
```
|
||||
|
||||
### Custom Reducer
|
||||
|
||||
```python
|
||||
def concat_strings(existing: str, new: str) -> str:
|
||||
return existing + " " + new
|
||||
|
||||
class State(TypedDict):
|
||||
text: Annotated[str, concat_strings]
|
||||
```
|
||||
|
||||
## MessagesState (LLM Preset)
|
||||
|
||||
For LLM conversations, LangGraph's `MessagesState` is convenient:
|
||||
|
||||
```python
|
||||
from langgraph.graph import MessagesState
|
||||
|
||||
# This is equivalent to:
|
||||
class MessagesState(TypedDict):
|
||||
messages: Annotated[list[AnyMessage], add_messages]
|
||||
```
|
||||
|
||||
The `add_messages` reducer (sketched after this list):
|
||||
- Adds new messages
|
||||
- Updates existing messages (ID-based)
|
||||
- Supports OpenAI format shorthand
|
||||
|
||||
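A minimal sketch of that upsert behaviour; the message `id` values are illustrative:

```python
from langgraph.graph import StateGraph, MessagesState, START, END
from langchain_core.messages import AIMessage, HumanMessage

def draft(state: MessagesState):
    return {"messages": [AIMessage(content="draft answer", id="m1")]}

def finalize(state: MessagesState):
    # Same id "m1": add_messages replaces the draft instead of appending
    return {"messages": [AIMessage(content="final answer", id="m1")]}

builder = StateGraph(MessagesState)
builder.add_node("draft", draft)
builder.add_node("finalize", finalize)
builder.add_edge(START, "draft")
builder.add_edge("draft", "finalize")
builder.add_edge("finalize", END)

graph = builder.compile()
result = graph.invoke({"messages": [HumanMessage(content="hi")]})
# result["messages"]: the human message plus one AI message ("final answer")
```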
## Important Principles
|
||||
|
||||
1. **Store Raw Data**: Format prompts within nodes
|
||||
2. **Clear Schema**: Define types with TypedDict or Pydantic
|
||||
3. **Control with Reducer**: Explicitly specify update methods
|
||||
|
||||
## Example
|
||||
|
||||
```python
|
||||
from typing import Annotated, TypedDict
|
||||
from operator import add
|
||||
|
||||
class AgentState(TypedDict):
|
||||
# Messages are added to the list
|
||||
messages: Annotated[list[str], add]
|
||||
|
||||
# User information is overwritten
|
||||
user_id: str
|
||||
user_name: str
|
||||
|
||||
# Counter is also overwritten
|
||||
iteration_count: int
|
||||
```
|
||||
|
||||
## Related Pages
|
||||
|
||||
- [01_core_concepts_node.md](01_core_concepts_node.md) - How to use State in nodes
|
||||
- [03_memory_management_overview.md](03_memory_management_overview.md) - State persistence
|
||||
338
skills/langgraph-master/02_graph_architecture_agent.md
Normal file
@@ -0,0 +1,338 @@
|
||||
# Agent (Autonomous Tool Usage)
|
||||
|
||||
A pattern where the LLM dynamically determines tool selection to handle unpredictable problem-solving.
|
||||
|
||||
## Overview
|
||||
|
||||
The Agent pattern follows **ReAct** (Reasoning + Acting), where the LLM dynamically selects and executes tools to solve problems.
|
||||
|
||||
## ReAct Pattern
|
||||
|
||||
**ReAct** = Reasoning + Acting
|
||||
|
||||
1. **Reasoning**: Think "What should I do next?"
|
||||
2. **Acting**: Take action using tools
|
||||
3. **Observing**: Observe the results
|
||||
4. **Repeat steps 1-3** until reaching a final answer
|
||||
|
||||
## Implementation Example: Basic Agent
|
||||
|
||||
```python
|
||||
from langgraph.graph import StateGraph, START, END, MessagesState
|
||||
from langgraph.prebuilt import ToolNode
|
||||
from typing import Literal
from langchain_core.tools import tool
|
||||
|
||||
# Tool definitions
|
||||
@tool
|
||||
def search(query: str) -> str:
|
||||
"""Execute web search"""
|
||||
return perform_search(query)
|
||||
|
||||
@tool
|
||||
def calculator(expression: str) -> float:
|
||||
"""Execute calculation"""
|
||||
    return eval(expression)  # demo only: never eval untrusted input in production
|
||||
|
||||
tools = [search, calculator]

# Bind the tools to the chat model (assumes `llm` is defined, as in the other examples)
llm_with_tools = llm.bind_tools(tools)
|
||||
|
||||
# Agent node
|
||||
def agent_node(state: MessagesState):
|
||||
"""LLM determines tool usage"""
|
||||
messages = state["messages"]
|
||||
|
||||
# Invoke LLM with tools
|
||||
response = llm_with_tools.invoke(messages)
|
||||
|
||||
return {"messages": [response]}
|
||||
|
||||
# Continue decision
|
||||
def should_continue(state: MessagesState) -> Literal["tools", "end"]:
|
||||
"""Check if there are tool calls"""
|
||||
last_message = state["messages"][-1]
|
||||
|
||||
# Continue if there are tool calls
|
||||
if last_message.tool_calls:
|
||||
return "tools"
|
||||
|
||||
# End if no tool calls (final answer)
|
||||
return "end"
|
||||
|
||||
# Build graph
|
||||
builder = StateGraph(MessagesState)
|
||||
|
||||
builder.add_node("agent", agent_node)
|
||||
builder.add_node("tools", ToolNode(tools))
|
||||
|
||||
builder.add_edge(START, "agent")
|
||||
|
||||
# ReAct loop
|
||||
builder.add_conditional_edges(
|
||||
"agent",
|
||||
should_continue,
|
||||
{
|
||||
"tools": "tools",
|
||||
"end": END
|
||||
}
|
||||
)
|
||||
|
||||
# Return to agent after tool execution
|
||||
builder.add_edge("tools", "agent")
|
||||
|
||||
graph = builder.compile()
|
||||
```
|
||||
|
||||
## Tool Definitions
|
||||
|
||||
### Basic Tools
|
||||
|
||||
```python
|
||||
from langchain_core.tools import tool
|
||||
|
||||
@tool
|
||||
def get_weather(location: str) -> str:
|
||||
"""Get weather for the specified location.
|
||||
|
||||
Args:
|
||||
location: City name (e.g., "Tokyo", "New York")
|
||||
"""
|
||||
return fetch_weather_data(location)
|
||||
|
||||
@tool
|
||||
def send_email(to: str, subject: str, body: str) -> str:
|
||||
"""Send an email.
|
||||
|
||||
Args:
|
||||
to: Recipient email address
|
||||
subject: Email subject
|
||||
body: Email body
|
||||
"""
|
||||
return send_email_api(to, subject, body)
|
||||
```
|
||||
|
||||
### Structured Output Tools
|
||||
|
||||
```python
|
||||
from pydantic import BaseModel, Field
|
||||
|
||||
class WeatherResponse(BaseModel):
|
||||
location: str
|
||||
temperature: float
|
||||
condition: str
|
||||
humidity: int
|
||||
|
||||
@tool(response_format="content_and_artifact")
|
||||
def get_detailed_weather(location: str) -> tuple[str, WeatherResponse]:
|
||||
"""Get detailed weather information"""
|
||||
data = fetch_weather_data(location)
|
||||
|
||||
weather = WeatherResponse(
|
||||
location=location,
|
||||
temperature=data["temp"],
|
||||
condition=data["condition"],
|
||||
humidity=data["humidity"]
|
||||
)
|
||||
|
||||
message = f"Weather in {location}: {weather.condition}, {weather.temperature}°C"
|
||||
|
||||
return message, weather
|
||||
```
|
||||
|
||||
## Advanced Patterns
|
||||
|
||||
### Pattern 1: Multi-Agent Collaboration
|
||||
|
||||
```python
|
||||
# Specialist agents
|
||||
def research_agent(state: State):
|
||||
"""Research specialist agent"""
|
||||
response = research_llm_with_tools.invoke(state["messages"])
|
||||
return {"messages": [response]}
|
||||
|
||||
def coding_agent(state: State):
|
||||
"""Coding specialist agent"""
|
||||
response = coding_llm_with_tools.invoke(state["messages"])
|
||||
return {"messages": [response]}
|
||||
|
||||
# Router
|
||||
def route_to_specialist(state: State) -> Literal["research", "coding"]:
|
||||
"""Select specialist based on task"""
|
||||
last_message = state["messages"][-1]
|
||||
|
||||
if "research" in last_message.content or "search" in last_message.content:
|
||||
return "research"
|
||||
elif "code" in last_message.content or "implement" in last_message.content:
|
||||
return "coding"
|
||||
|
||||
return "research" # Default
|
||||
```
|
||||
|
||||
### Pattern 2: Agent with Memory
|
||||
|
||||
```python
|
||||
from langgraph.checkpoint.memory import MemorySaver
|
||||
|
||||
class AgentState(TypedDict):
|
||||
messages: Annotated[list, add_messages]
|
||||
context: dict # Long-term memory
|
||||
|
||||
def agent_with_memory(state: AgentState):
|
||||
"""Agent utilizing context"""
|
||||
messages = state["messages"]
|
||||
context = state.get("context", {})
|
||||
|
||||
# Add context to prompt
|
||||
system_message = f"Context: {context}"
|
||||
|
||||
response = llm_with_tools.invoke([
|
||||
{"role": "system", "content": system_message},
|
||||
*messages
|
||||
])
|
||||
|
||||
return {"messages": [response]}
|
||||
|
||||
# Compile with checkpointer
|
||||
checkpointer = MemorySaver()
|
||||
graph = builder.compile(checkpointer=checkpointer)
|
||||
```
|
||||
|
||||
### Pattern 3: Human-in-the-Loop Agent
|
||||
|
||||
```python
|
||||
from langgraph.types import interrupt
|
||||
|
||||
def careful_agent(state: State):
|
||||
"""Confirm with human before important actions"""
|
||||
response = llm_with_tools.invoke(state["messages"])
|
||||
|
||||
# Request confirmation for important tool calls
|
||||
if response.tool_calls:
|
||||
for tool_call in response.tool_calls:
|
||||
if tool_call["name"] in ["send_email", "delete_data"]:
|
||||
# Wait for human approval
|
||||
approved = interrupt({
|
||||
"action": tool_call["name"],
|
||||
"args": tool_call["args"],
|
||||
"message": "Approve this action?"
|
||||
})
|
||||
|
||||
if not approved:
|
||||
return {
|
||||
"messages": [
|
||||
{"role": "assistant", "content": "Action cancelled by user"}
|
||||
]
|
||||
}
|
||||
|
||||
return {"messages": [response]}
|
||||
```
|
||||
|
||||
### Pattern 4: Error Handling and Retry
|
||||
|
||||
```python
|
||||
class RobustAgentState(TypedDict):
|
||||
messages: Annotated[list, add_messages]
|
||||
retry_count: int
|
||||
errors: list[str]
|
||||
|
||||
def robust_tool_node(state: RobustAgentState):
|
||||
"""Tool execution with error handling"""
|
||||
last_message = state["messages"][-1]
|
||||
tool_results = []
|
||||
|
||||
for tool_call in last_message.tool_calls:
|
||||
try:
|
||||
result = execute_tool(tool_call)
|
||||
tool_results.append(result)
|
||||
|
||||
except Exception as e:
|
||||
error_msg = f"Tool {tool_call['name']} failed: {str(e)}"
|
||||
|
||||
# Check if retry is possible
|
||||
if state.get("retry_count", 0) < 3:
|
||||
tool_results.append({
|
||||
"tool_call_id": tool_call["id"],
|
||||
"error": error_msg,
|
||||
"retry": True
|
||||
})
|
||||
else:
|
||||
tool_results.append({
|
||||
"tool_call_id": tool_call["id"],
|
||||
"error": "Max retries exceeded",
|
||||
"retry": False
|
||||
})
|
||||
|
||||
return {
|
||||
"messages": tool_results,
|
||||
"retry_count": state.get("retry_count", 0) + 1
|
||||
}
|
||||
```
|
||||
|
||||
## Advanced Tool Features
|
||||
|
||||
### Dynamic Tool Generation
|
||||
|
||||
```python
|
||||
def create_tool_for_api(api_spec: dict):
|
||||
"""Dynamically generate tool from API specification"""
|
||||
|
||||
    def dynamic_api_tool(**kwargs) -> str:
        return call_api(api_spec['endpoint'], kwargs)

    # An f-string is not a valid docstring, so attach the description explicitly
    dynamic_api_tool.__doc__ = (
        f"{api_spec['description']}\n\nArgs: {api_spec['parameters']}"
    )

    return tool(dynamic_api_tool)
|
||||
```
|
||||
|
||||
### Conditional Tool Usage
|
||||
|
||||
```python
|
||||
def conditional_agent(state: State):
|
||||
"""Change toolset based on situation"""
|
||||
context = state.get("context", {})
|
||||
|
||||
# Basic tools only for beginners
|
||||
if context.get("user_level") == "beginner":
|
||||
tools = [basic_search, simple_calculator]
|
||||
# Advanced tools for advanced users
|
||||
else:
|
||||
tools = [advanced_search, scientific_calculator, code_executor]
|
||||
|
||||
llm_with_selected_tools = llm.bind_tools(tools)
|
||||
response = llm_with_selected_tools.invoke(state["messages"])
|
||||
|
||||
return {"messages": [response]}
|
||||
```
|
||||
|
||||
## Benefits
|
||||
|
||||
✅ **Flexibility**: Dynamically responds to unpredictable problems
|
||||
✅ **Autonomy**: LLM selects optimal tools and strategies
|
||||
✅ **Extensibility**: Extend functionality by simply adding tools
|
||||
✅ **Adaptability**: Solves complex multi-step tasks
|
||||
|
||||
## Considerations
|
||||
|
||||
⚠️ **Unpredictability**: May behave differently with same input
|
||||
⚠️ **Cost**: Multiple LLM calls occur
|
||||
⚠️ **Infinite Loops**: Proper termination conditions required
|
||||
⚠️ **Tool Misuse**: LLM may use tools incorrectly
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Clear Tool Descriptions**: Write detailed tool docstrings
|
||||
2. **Maximum Iterations**: Set upper limit for loops (see the sketch after this list)
|
||||
3. **Error Handling**: Handle tool execution errors appropriately
|
||||
4. **Logging**: Make agent behavior traceable
|
||||
|
||||
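A minimal sketch of best practice 2, assuming the state also carries an `iteration` counter that the agent node increments (not part of the example above):

```python
from typing import Literal

MAX_ITERATIONS = 10  # illustrative upper limit

def should_continue(state: dict) -> Literal["tools", "end"]:
    # Stop the ReAct loop once the cap is reached, even if tool calls remain
    if state.get("iteration", 0) >= MAX_ITERATIONS:
        return "end"
    if state["messages"][-1].tool_calls:
        return "tools"
    return "end"
```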
## Summary
|
||||
|
||||
The Agent pattern is optimal for **dynamic and uncertain problem-solving**. It autonomously solves problems using tools through the ReAct loop.
|
||||
|
||||
## Related Pages
|
||||
|
||||
- [02_graph_architecture_workflow_vs_agent.md](02_graph_architecture_workflow_vs_agent.md) - Differences between Workflow and Agent
|
||||
- [04_tool_integration_overview.md](04_tool_integration_overview.md) - Tool details
|
||||
- [05_advanced_features_human_in_the_loop.md](05_advanced_features_human_in_the_loop.md) - Human intervention
|
||||
@@ -0,0 +1,335 @@
|
||||
# Evaluator-Optimizer (Evaluation-Improvement Loop)
|
||||
|
||||
A pattern that repeats generation and evaluation, continuing iterative improvement until acceptable criteria are met.
|
||||
|
||||
## Overview
|
||||
|
||||
Evaluator-Optimizer is a pattern that repeats the **generate → evaluate → improve** loop, continuing until quality standards are met.
|
||||
|
||||
## Use Cases
|
||||
|
||||
- Code generation and quality verification
|
||||
- Translation accuracy improvement
|
||||
- Gradual content improvement
|
||||
- Iterative solution for optimization problems
|
||||
|
||||
## Implementation Example: Translation Quality Improvement
|
||||
|
||||
```python
|
||||
from typing import TypedDict
|
||||
|
||||
class State(TypedDict):
|
||||
original_text: str
|
||||
translated_text: str
|
||||
quality_score: float
|
||||
iteration: int
|
||||
max_iterations: int
|
||||
feedback: str
|
||||
|
||||
def generator_node(state: State):
|
||||
"""Generate or improve translation"""
|
||||
if state.get("translated_text"):
|
||||
# Improve existing translation
|
||||
prompt = f"""
|
||||
Original: {state['original_text']}
|
||||
Current translation: {state['translated_text']}
|
||||
Feedback: {state['feedback']}
|
||||
|
||||
Improve the translation based on the feedback.
|
||||
"""
|
||||
else:
|
||||
# Initial translation
|
||||
prompt = f"Translate to Japanese: {state['original_text']}"
|
||||
|
||||
translated = llm.invoke(prompt)
|
||||
|
||||
return {
|
||||
"translated_text": translated,
|
||||
"iteration": state.get("iteration", 0) + 1
|
||||
}
|
||||
|
||||
def evaluator_node(state: State):
|
||||
"""Evaluate translation quality"""
|
||||
evaluation_prompt = f"""
|
||||
Original: {state['original_text']}
|
||||
Translation: {state['translated_text']}
|
||||
|
||||
Rate the translation quality (0-1) and provide specific feedback.
|
||||
Format: SCORE: 0.X\nFEEDBACK: ...
|
||||
"""
|
||||
|
||||
result = llm.invoke(evaluation_prompt)
|
||||
|
||||
# Extract score and feedback
|
||||
score = extract_score(result)
|
||||
feedback = extract_feedback(result)
|
||||
|
||||
return {
|
||||
"quality_score": score,
|
||||
"feedback": feedback
|
||||
}
|
||||
|
||||
def should_continue(state: State) -> Literal["improve", "done"]:
|
||||
"""Continuation decision"""
|
||||
# Check if quality standard is met
|
||||
if state["quality_score"] >= 0.9:
|
||||
return "done"
|
||||
|
||||
# Check if maximum iterations reached
|
||||
if state["iteration"] >= state["max_iterations"]:
|
||||
return "done"
|
||||
|
||||
return "improve"
|
||||
|
||||
# Build graph
|
||||
builder = StateGraph(State)
|
||||
|
||||
builder.add_node("generator", generator_node)
|
||||
builder.add_node("evaluator", evaluator_node)
|
||||
|
||||
builder.add_edge(START, "generator")
|
||||
builder.add_edge("generator", "evaluator")
|
||||
|
||||
builder.add_conditional_edges(
|
||||
"evaluator",
|
||||
should_continue,
|
||||
{
|
||||
"improve": "generator", # Loop
|
||||
"done": END
|
||||
}
|
||||
)
|
||||
|
||||
graph = builder.compile()
|
||||
```
|
||||
|
||||
## Advanced Patterns
|
||||
|
||||
### Pattern 1: Multiple Evaluation Criteria
|
||||
|
||||
```python
|
||||
class MultiEvalState(TypedDict):
|
||||
content: str
|
||||
scores: dict[str, float] # Multiple evaluation scores
|
||||
min_scores: dict[str, float] # Minimum value for each criterion
|
||||
|
||||
def multi_evaluator(state: State):
|
||||
"""Evaluate from multiple perspectives"""
|
||||
content = state["content"]
|
||||
|
||||
# Evaluate each perspective
|
||||
scores = {
|
||||
"accuracy": evaluate_accuracy(content),
|
||||
"readability": evaluate_readability(content),
|
||||
"completeness": evaluate_completeness(content)
|
||||
}
|
||||
|
||||
return {"scores": scores}
|
||||
|
||||
def multi_should_continue(state: MultiEvalState):
|
||||
"""Check if all criteria are met"""
|
||||
for criterion, min_score in state["min_scores"].items():
|
||||
if state["scores"][criterion] < min_score:
|
||||
return "improve"
|
||||
|
||||
return "done"
|
||||
```
|
||||
|
||||
### Pattern 2: Progressive Criteria Increase
|
||||
|
||||
```python
|
||||
def adaptive_evaluator(state: State):
|
||||
"""Adjust criteria based on iteration"""
|
||||
iteration = state["iteration"]
|
||||
|
||||
# Start with lenient criteria, gradually stricter
|
||||
threshold = 0.7 + (iteration * 0.05)
|
||||
threshold = min(threshold, 0.95) # Maximum 0.95
|
||||
|
||||
score = evaluate(state["content"])
|
||||
|
||||
return {
|
||||
"quality_score": score,
|
||||
"threshold": threshold
|
||||
}
|
||||
|
||||
def adaptive_should_continue(state: State):
|
||||
if state["quality_score"] >= state["threshold"]:
|
||||
return "done"
|
||||
|
||||
if state["iteration"] >= state["max_iterations"]:
|
||||
return "done"
|
||||
|
||||
return "improve"
|
||||
```
|
||||
|
||||
### Pattern 3: Multiple Improvement Strategies
|
||||
|
||||
```python
|
||||
from typing import Literal
|
||||
|
||||
def strategy_router(state: State) -> Literal["minor_fix", "major_rewrite"]:
|
||||
"""Select improvement strategy based on score"""
|
||||
score = state["quality_score"]
|
||||
|
||||
if score >= 0.7:
|
||||
# Minor adjustments sufficient
|
||||
return "minor_fix"
|
||||
else:
|
||||
# Major rewrite needed
|
||||
return "major_rewrite"
|
||||
|
||||
def minor_fix_node(state: State):
|
||||
"""Small improvements"""
|
||||
prompt = f"Make minor improvements: {state['content']}\n{state['feedback']}"
|
||||
return {"content": llm.invoke(prompt)}
|
||||
|
||||
def major_rewrite_node(state: State):
|
||||
"""Major rewrite"""
|
||||
prompt = f"Completely rewrite: {state['content']}\n{state['feedback']}"
|
||||
return {"content": llm.invoke(prompt)}
|
||||
|
||||
builder.add_conditional_edges(
|
||||
"evaluator",
|
||||
strategy_router,
|
||||
{
|
||||
"minor_fix": "minor_fix",
|
||||
"major_rewrite": "major_rewrite"
|
||||
}
|
||||
)
|
||||
```
|
||||
|
||||
### Pattern 4: Early Termination and Timeout
|
||||
|
||||
```python
|
||||
import time
|
||||
|
||||
class TimedState(TypedDict):
|
||||
content: str
|
||||
quality_score: float
|
||||
iteration: int
|
||||
start_time: float
|
||||
max_duration: float # seconds
|
||||
|
||||
def timed_should_continue(state: TimedState):
|
||||
"""Check both quality criteria and timeout"""
|
||||
# Quality standard met
|
||||
if state["quality_score"] >= 0.9:
|
||||
return "done"
|
||||
|
||||
# Timeout
|
||||
elapsed = time.time() - state["start_time"]
|
||||
if elapsed >= state["max_duration"]:
|
||||
return "timeout"
|
||||
|
||||
# Maximum iterations
|
||||
if state["iteration"] >= 10:
|
||||
return "max_iterations"
|
||||
|
||||
return "improve"
|
||||
|
||||
builder.add_conditional_edges(
|
||||
"evaluator",
|
||||
timed_should_continue,
|
||||
{
|
||||
"improve": "generator",
|
||||
"done": END,
|
||||
"timeout": "timeout_handler",
|
||||
"max_iterations": "max_iter_handler"
|
||||
}
|
||||
)
|
||||
```
|
||||
|
||||
## Evaluator Implementation Patterns
|
||||
|
||||
### Pattern 1: Rule-Based Evaluation
|
||||
|
||||
```python
|
||||
def rule_based_evaluator(state: State):
|
||||
"""Rule-based evaluation"""
|
||||
content = state["content"]
|
||||
score = 0.0
|
||||
feedback = []
|
||||
|
||||
# Length check
|
||||
if 100 <= len(content) <= 500:
|
||||
score += 0.3
|
||||
else:
|
||||
feedback.append("Length should be 100-500 characters")
|
||||
|
||||
# Keyword check
|
||||
required_keywords = state["required_keywords"]
|
||||
if all(kw in content for kw in required_keywords):
|
||||
score += 0.3
|
||||
else:
|
||||
missing = [kw for kw in required_keywords if kw not in content]
|
||||
feedback.append(f"Missing keywords: {missing}")
|
||||
|
||||
# Structure check
|
||||
if has_proper_structure(content):
|
||||
score += 0.4
|
||||
else:
|
||||
feedback.append("Improve structure")
|
||||
|
||||
return {
|
||||
"quality_score": score,
|
||||
"feedback": "\n".join(feedback)
|
||||
}
|
||||
```
|
||||
|
||||
### Pattern 2: LLM-Based Evaluation
|
||||
|
||||
```python
|
||||
def llm_evaluator(state: State):
|
||||
"""LLM evaluation"""
|
||||
evaluation_prompt = f"""
|
||||
Evaluate this content on a scale of 0-1:
|
||||
{state['content']}
|
||||
|
||||
Criteria:
|
||||
- Clarity
|
||||
- Completeness
|
||||
- Accuracy
|
||||
|
||||
Provide:
|
||||
1. Overall score (0-1)
|
||||
2. Specific feedback for improvement
|
||||
"""
|
||||
|
||||
result = llm.invoke(evaluation_prompt)
|
||||
|
||||
return {
|
||||
"quality_score": parse_score(result),
|
||||
"feedback": parse_feedback(result)
|
||||
}
|
||||
```
|
||||
|
||||
## Benefits
|
||||
|
||||
✅ **Quality Assurance**: Continue improvement until standards are met
|
||||
✅ **Automatic Optimization**: Quality improvement without manual intervention
|
||||
✅ **Feedback Loop**: Use evaluation results for next improvement
|
||||
✅ **Adaptive**: Iteration count varies based on problem difficulty
|
||||
|
||||
## Considerations
|
||||
|
||||
⚠️ **Infinite Loops**: Set termination conditions appropriately
|
||||
⚠️ **Cost**: Multiple LLM calls occur
|
||||
⚠️ **No Convergence Guarantee**: May not always meet standards
|
||||
⚠️ **Local Optima**: Improvement may get stuck
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Clear Termination Conditions**: Set maximum iterations and timeout
|
||||
2. **Progressive Feedback**: Provide specific improvement points
|
||||
3. **Progress Tracking**: Record scores for each iteration
|
||||
4. **Fallback**: Handle cases where standards cannot be met (see the sketch after this list)
|
||||
|
||||
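A minimal sketch of best practice 4, reusing the translation example above; the flag returned here is an illustrative addition to the state:

```python
def fallback_node(state: State):
    """Handle runs that exit the loop without meeting the quality bar"""
    return {
        "needs_human_review": True,  # illustrative flag for downstream handling
        "feedback": (
            f"Stopped after {state['iteration']} iterations "
            f"with score {state['quality_score']:.2f}"
        ),
    }

# Wire it in by adding a third branch (e.g. "give_up") to should_continue
builder.add_node("fallback", fallback_node)
```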
## Summary
|
||||
|
||||
Evaluator-Optimizer is optimal when **iterative improvement is needed until quality standards are met**. Clear evaluation criteria and termination conditions are key to success.
|
||||
|
||||
## Related Pages
|
||||
|
||||
- [02_graph_architecture_prompt_chaining.md](02_graph_architecture_prompt_chaining.md) - Basic sequential processing
|
||||
- [02_graph_architecture_agent.md](02_graph_architecture_agent.md) - Combining with Agent
|
||||
- [05_advanced_features_human_in_the_loop.md](05_advanced_features_human_in_the_loop.md) - Human evaluation
|
||||
@@ -0,0 +1,262 @@
|
||||
# Orchestrator-Worker (Master-Worker)
|
||||
|
||||
A pattern where an orchestrator decomposes tasks and delegates them to multiple workers.
|
||||
|
||||
## Overview
|
||||
|
||||
Orchestrator-Worker is a pattern where a **master node** decomposes tasks into multiple subtasks and delegates them in parallel to **worker nodes**. Also known as the Map-Reduce pattern.
|
||||
|
||||
## Use Cases
|
||||
|
||||
- Parallel processing of multiple documents
|
||||
- Dividing large tasks into smaller subtasks
|
||||
- Distributed processing of datasets
|
||||
- Parallel API calls
|
||||
|
||||
## Implementation Example: Summarizing Multiple Documents
|
||||
|
||||
```python
|
||||
from langgraph.types import Send
|
||||
from typing import TypedDict, Annotated
|
||||
from operator import add
|
||||
|
||||
class State(TypedDict):
|
||||
documents: list[str]
|
||||
summaries: Annotated[list[str], add]
|
||||
final_summary: str
|
||||
|
||||
class WorkerState(TypedDict):
|
||||
document: str
|
||||
summary: str
|
||||
|
||||
def orchestrator_node(state: State):
|
||||
"""Decompose task and delegate to workers"""
|
||||
# Send each document to a worker
|
||||
return [
|
||||
Send("worker", {"document": doc})
|
||||
for doc in state["documents"]
|
||||
]
|
||||
|
||||
def worker_node(state: WorkerState):
|
||||
"""Summarize individual document"""
|
||||
summary = llm.invoke(f"Summarize: {state['document']}")
|
||||
return {"summaries": [summary]}
|
||||
|
||||
def reducer_node(state: State):
|
||||
"""Integrate all summaries"""
|
||||
all_summaries = "\n".join(state["summaries"])
|
||||
final = llm.invoke(f"Create final summary from:\n{all_summaries}")
|
||||
return {"final_summary": final}
|
||||
|
||||
# Build graph
builder = StateGraph(State)

builder.add_node("worker", worker_node)
builder.add_node("reducer", reducer_node)

# Orchestrator to workers (dynamic fan-out): Send objects must come from a
# conditional-edge function, so the orchestrator is wired in as one here
builder.add_conditional_edges(START, orchestrator_node, ["worker"])

# Workers to aggregation node
builder.add_edge("worker", "reducer")
builder.add_edge("reducer", END)

graph = builder.compile()
|
||||
```
|
||||
|
||||
## Using the Send API
|
||||
|
||||
Generate **worker instances dynamically** with `Send` objects returned from a conditional-edge (fan-out) function:
|
||||
|
||||
```python
|
||||
def orchestrator(state: State):
|
||||
# Generate worker instance for each item
|
||||
return [
|
||||
Send("worker", {"item": item, "index": i})
|
||||
for i, item in enumerate(state["items"])
|
||||
]
|
||||
```
|
||||
|
||||
## Advanced Patterns
|
||||
|
||||
### Pattern 1: Hierarchical Processing
|
||||
|
||||
```python
|
||||
def master_orchestrator(state: State):
|
||||
"""Master delegates to multiple sub-orchestrators"""
|
||||
return [
|
||||
Send("sub_orchestrator", {"category": cat, "items": items})
|
||||
for cat, items in group_by_category(state["all_items"])
|
||||
]
|
||||
|
||||
def sub_orchestrator(state: SubState):
|
||||
"""Sub-orchestrator delegates to individual workers"""
|
||||
return [
|
||||
Send("worker", {"item": item})
|
||||
for item in state["items"]
|
||||
]
|
||||
```
|
||||
|
||||
### Pattern 2: Conditional Worker Selection
|
||||
|
||||
```python
|
||||
def smart_orchestrator(state: State):
|
||||
"""Select different workers based on task characteristics"""
|
||||
tasks = []
|
||||
|
||||
for item in state["items"]:
|
||||
if is_complex(item):
|
||||
tasks.append(Send("advanced_worker", {"item": item}))
|
||||
else:
|
||||
tasks.append(Send("simple_worker", {"item": item}))
|
||||
|
||||
return tasks
|
||||
```
|
||||
|
||||
### Pattern 3: Batch Processing
|
||||
|
||||
```python
|
||||
def batch_orchestrator(state: State):
|
||||
"""Divide items into batches"""
|
||||
batch_size = 10
|
||||
batches = [
|
||||
state["items"][i:i+batch_size]
|
||||
for i in range(0, len(state["items"]), batch_size)
|
||||
]
|
||||
|
||||
return [
|
||||
Send("batch_worker", {"batch": batch, "batch_id": i})
|
||||
for i, batch in enumerate(batches)
|
||||
]
|
||||
|
||||
def batch_worker(state: BatchState):
|
||||
"""Process batch"""
|
||||
results = [process(item) for item in state["batch"]]
|
||||
return {"results": results}
|
||||
```
|
||||
|
||||
### Pattern 4: Error Handling and Retry
|
||||
|
||||
```python
|
||||
class WorkerState(TypedDict):
|
||||
item: str
|
||||
retry_count: int
|
||||
result: str
|
||||
error: str | None
|
||||
|
||||
def robust_worker(state: WorkerState):
|
||||
"""Worker with error handling"""
|
||||
try:
|
||||
result = process_item(state["item"])
|
||||
return {"result": result, "error": None}
|
||||
except Exception as e:
|
||||
if state.get("retry_count", 0) < 3:
|
||||
            # Retry: a node cannot return a Send directly, so wrap it in a Command
            # (requires: from langgraph.types import Command, Send)
            return Command(goto=Send("worker", {
                "item": state["item"],
                "retry_count": state.get("retry_count", 0) + 1
            }))
|
||||
else:
|
||||
# Maximum retries reached
|
||||
return {"error": str(e)}
|
||||
```
|
||||
|
||||
## Dynamic Parallelism Control
|
||||
|
||||
```python
|
||||
import os
|
||||
|
||||
def adaptive_orchestrator(state: State):
|
||||
"""Adjust parallelism based on system resources"""
|
||||
max_workers = int(os.getenv("MAX_WORKERS", "5"))
|
||||
|
||||
# Divide items into chunks
|
||||
items = state["items"]
|
||||
chunk_size = max(1, len(items) // max_workers)
|
||||
|
||||
chunks = [
|
||||
items[i:i+chunk_size]
|
||||
for i in range(0, len(items), chunk_size)
|
||||
]
|
||||
|
||||
return [
|
||||
Send("worker", {"chunk": chunk})
|
||||
for chunk in chunks
|
||||
]
|
||||
```
|
||||
|
||||
## Reducer Implementation Patterns
|
||||
|
||||
### Pattern 1: Simple Aggregation
|
||||
|
||||
```python
|
||||
from operator import add
|
||||
|
||||
class State(TypedDict):
|
||||
results: Annotated[list, add]
|
||||
|
||||
def reducer(state: State):
|
||||
"""Simple aggregation of results"""
|
||||
return {"total": sum(state["results"])}
|
||||
```
|
||||
|
||||
### Pattern 2: Complex Aggregation
|
||||
|
||||
```python
|
||||
def advanced_reducer(state: State):
|
||||
"""Calculate statistics"""
|
||||
results = state["results"]
|
||||
|
||||
return {
|
||||
"total": sum(results),
|
||||
"average": sum(results) / len(results),
|
||||
"min": min(results),
|
||||
"max": max(results)
|
||||
}
|
||||
```
|
||||
|
||||
### Pattern 3: LLM-Based Integration
|
||||
|
||||
```python
|
||||
def llm_reducer(state: State):
|
||||
"""Integrate multiple results with LLM"""
|
||||
all_results = "\n".join(state["summaries"])
|
||||
|
||||
final = llm.invoke(
|
||||
f"Synthesize these summaries into one:\n{all_results}"
|
||||
)
|
||||
|
||||
return {"final_summary": final}
|
||||
```
|
||||
|
||||
## Benefits
|
||||
|
||||
✅ **Scalability**: Workers automatically generated based on task count
|
||||
✅ **Parallel Processing**: High-speed processing of large amounts of data
|
||||
✅ **Flexibility**: Dynamically adjustable worker count
|
||||
✅ **Distributed Processing**: Distributable across multiple servers
|
||||
|
||||
## Considerations
|
||||
|
||||
⚠️ **Memory Consumption**: Many worker instances are generated
|
||||
⚠️ **Reducer Design**: Appropriately design result aggregation method
|
||||
⚠️ **Error Handling**: Handle cases where some workers fail
|
||||
⚠️ **Resource Management**: May need to limit parallelism
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Batch Size Adjustment**: Too small causes overhead, too large reduces parallelism
|
||||
2. **Error Isolation**: One failure shouldn't affect the whole
|
||||
3. **Progress Tracking**: Visualize progress for large task counts
|
||||
4. **Resource Limits**: Set upper limit on parallelism
|
||||
|
||||
## Summary
|
||||
|
||||
Orchestrator-Worker is optimal for **parallel processing of large task volumes**. Workers are generated dynamically with the Send API, and results are aggregated with a Reducer.
|
||||
|
||||
## Related Pages
|
||||
|
||||
- [02_graph_architecture_parallelization.md](02_graph_architecture_parallelization.md) - Comparison with static parallel processing
|
||||
- [05_advanced_features_map_reduce.md](05_advanced_features_map_reduce.md) - Map-Reduce details
|
||||
- [01_core_concepts_state.md](01_core_concepts_state.md) - Reducer details
|
||||
59
skills/langgraph-master/02_graph_architecture_overview.md
Normal file
@@ -0,0 +1,59 @@
|
||||
# 02. Graph Architecture
|
||||
|
||||
Six major graph patterns and agent design.
|
||||
|
||||
## Overview
|
||||
|
||||
LangGraph supports various architectural patterns. It's important to select the optimal pattern based on the nature of the problem.
|
||||
|
||||
## [Workflow vs Agent](02_graph_architecture_workflow_vs_agent.md)
|
||||
|
||||
First, understand the difference between Workflow and Agent:
|
||||
|
||||
- **Workflow**: Predetermined code paths, operates in a specific order
|
||||
- **Agent**: Dynamic, defines its own processes and tool usage
|
||||
|
||||
## Six Major Patterns
|
||||
|
||||
### 1. [Prompt Chaining (Sequential Processing)](02_graph_architecture_prompt_chaining.md)
|
||||
Each LLM call processes the previous output. Suitable for translation and stepwise processing.
|
||||
|
||||
### 2. [Parallelization (Parallel Processing)](02_graph_architecture_parallelization.md)
|
||||
Execute multiple independent tasks simultaneously. Used for speed improvement and reliability verification.
|
||||
|
||||
### 3. [Routing (Branching Processing)](02_graph_architecture_routing.md)
|
||||
Route to specialized flows based on input. Optimal for customer support.
|
||||
|
||||
### 4. [Orchestrator-Worker (Master-Worker)](02_graph_architecture_orchestrator_worker.md)
|
||||
Orchestrator decomposes tasks and delegates to multiple workers.
|
||||
|
||||
### 5. [Evaluator-Optimizer (Evaluation-Improvement Loop)](02_graph_architecture_evaluator_optimizer.md)
|
||||
Repeat generation and evaluation, iteratively improving until acceptable criteria are met.
|
||||
|
||||
### 6. [Agent (Autonomous Tool Usage)](02_graph_architecture_agent.md)
|
||||
LLM dynamically determines tool selection, handling unpredictable problem-solving.
|
||||
|
||||
## [Subgraph](02_graph_architecture_subgraph.md)
|
||||
|
||||
Build hierarchical graph structures and modularize complex systems.
|
||||
|
||||
## Pattern Selection Guide
|
||||
|
||||
| Pattern | Use Case | Example |
|
||||
|---------|----------|---------|
|
||||
| Prompt Chaining | Stepwise processing | Translation → Summary → Analysis |
|
||||
| Parallelization | Simultaneous execution of independent tasks | Evaluation by multiple criteria |
|
||||
| Routing | Type-based routing | Support inquiry classification |
|
||||
| Orchestrator-Worker | Task decomposition and delegation | Parallel processing of multiple documents |
|
||||
| Evaluator-Optimizer | Iterative improvement | Quality improvement loop |
|
||||
| Agent | Dynamic problem solving | Uncertain tasks |
|
||||
|
||||
## Important Principles
|
||||
|
||||
1. **Workflow if structure is clear**: When task structure can be predefined
|
||||
2. **Agent if uncertain**: When problem or solution is uncertain and LLM judgment is needed
|
||||
3. **Subgraph for modularization**: Organize complex systems with hierarchical structure
|
||||
|
||||
## Next Steps
|
||||
|
||||
For details on each pattern, refer to individual pages. We recommend starting with [02_graph_architecture_workflow_vs_agent.md](02_graph_architecture_workflow_vs_agent.md).
|
||||
182
skills/langgraph-master/02_graph_architecture_parallelization.md
Normal file
@@ -0,0 +1,182 @@
|
||||
# Parallelization (Parallel Processing)
|
||||
|
||||
A pattern for executing multiple independent tasks simultaneously.
|
||||
|
||||
## Overview
|
||||
|
||||
Parallelization is a pattern that executes **multiple tasks that don't depend on each other** simultaneously, achieving speed improvements and reliability verification.
|
||||
|
||||
## Use Cases
|
||||
|
||||
- Scoring documents with multiple evaluation criteria
|
||||
- Analysis from different perspectives (technical/business/legal)
|
||||
- Comparing results from multiple translation engines
|
||||
- Implementing Map-Reduce pattern
|
||||
|
||||
## Implementation Example
|
||||
|
||||
```python
|
||||
from typing import Annotated, TypedDict
|
||||
from operator import add
|
||||
|
||||
class State(TypedDict):
|
||||
document: str
|
||||
scores: Annotated[list[dict], add] # Aggregate multiple results
|
||||
|
||||
def technical_review(state: State):
|
||||
"""Review from technical perspective"""
|
||||
score = llm.invoke(
|
||||
f"Technical review: {state['document']}"
|
||||
)
|
||||
return {"scores": [{"type": "technical", "score": score}]}
|
||||
|
||||
def business_review(state: State):
|
||||
"""Review from business perspective"""
|
||||
score = llm.invoke(
|
||||
f"Business review: {state['document']}"
|
||||
)
|
||||
return {"scores": [{"type": "business", "score": score}]}
|
||||
|
||||
def legal_review(state: State):
|
||||
"""Review from legal perspective"""
|
||||
score = llm.invoke(
|
||||
f"Legal review: {state['document']}"
|
||||
)
|
||||
return {"scores": [{"type": "legal", "score": score}]}
|
||||
|
||||
def aggregate_scores(state: State):
|
||||
"""Aggregate scores"""
|
||||
total = sum(s["score"] for s in state["scores"])
|
||||
return {"final_score": total / len(state["scores"])}
|
||||
|
||||
# Build graph
|
||||
builder = StateGraph(State)
|
||||
|
||||
# Nodes to be executed in parallel
|
||||
builder.add_node("technical", technical_review)
|
||||
builder.add_node("business", business_review)
|
||||
builder.add_node("legal", legal_review)
|
||||
builder.add_node("aggregate", aggregate_scores)
|
||||
|
||||
# Edges for parallel execution
|
||||
builder.add_edge(START, "technical")
|
||||
builder.add_edge(START, "business")
|
||||
builder.add_edge(START, "legal")
|
||||
|
||||
# To aggregation node
|
||||
builder.add_edge("technical", "aggregate")
|
||||
builder.add_edge("business", "aggregate")
|
||||
builder.add_edge("legal", "aggregate")
|
||||
builder.add_edge("aggregate", END)
|
||||
|
||||
graph = builder.compile()
|
||||
```
|
||||
|
||||
## Important Concept: Reducer
|
||||
|
||||
A **Reducer** is essential for aggregating results from parallel execution:
|
||||
|
||||
```python
|
||||
from operator import add
|
||||
|
||||
class State(TypedDict):
|
||||
# Additively aggregate results from multiple nodes
|
||||
results: Annotated[list, add]
|
||||
|
||||
# Keep maximum value
|
||||
max_score: Annotated[int, max]
|
||||
|
||||
# Custom Reducer
|
||||
combined: Annotated[dict, combine_dicts]
|
||||
```
|
||||
|
||||
## Benefits
|
||||
|
||||
✅ **Speed**: Time reduction through parallel task execution
|
||||
✅ **Reliability**: Verification by comparing multiple results
|
||||
✅ **Scalability**: Adjust parallelism based on task count
|
||||
✅ **Robustness**: Can continue if some succeed even if others fail
|
||||
|
||||
## Considerations
|
||||
|
||||
⚠️ **Reducer Required**: Explicitly define result aggregation method
|
||||
⚠️ **Resource Consumption**: Increased memory and API calls from parallel execution
|
||||
⚠️ **Uncertain Order**: Execution order not guaranteed
|
||||
⚠️ **Debugging Complexity**: Parallel execution troubleshooting is difficult
|
||||
|
||||
## Advanced Patterns
|
||||
|
||||
### Pattern 1: Fan-out / Fan-in
|
||||
|
||||
```python
|
||||
# Fan-out: One node to multiple
|
||||
builder.add_edge("router", "task_a")
|
||||
builder.add_edge("router", "task_b")
|
||||
builder.add_edge("router", "task_c")
|
||||
|
||||
# Fan-in: Multiple to one aggregation
|
||||
builder.add_edge("task_a", "aggregator")
|
||||
builder.add_edge("task_b", "aggregator")
|
||||
builder.add_edge("task_c", "aggregator")
|
||||
```
|
||||
|
||||
### Pattern 2: Balancing (defer=True)
|
||||
|
||||
Defer the aggregation node so it waits for branches of different lengths to complete:
|
||||
|
||||
```python
|
||||
from operator import add

class State(TypedDict):
    results: Annotated[list, add]

# defer=True (available in recent LangGraph releases) delays this node until
# all pending branches have finished, so it sees every branch's results
builder.add_node("aggregate", aggregate_scores, defer=True)

graph = builder.compile(checkpointer=checkpointer)
|
||||
```
|
||||
|
||||
### Pattern 3: Reliability Through Redundancy
|
||||
|
||||
```python
|
||||
def provider_a(state: State):
|
||||
"""Provider A"""
|
||||
return {"responses": [call_api_a(state["query"])]}
|
||||
|
||||
def provider_b(state: State):
|
||||
"""Provider B (backup)"""
|
||||
return {"responses": [call_api_b(state["query"])]}
|
||||
|
||||
def provider_c(state: State):
|
||||
"""Provider C (backup)"""
|
||||
return {"responses": [call_api_c(state["query"])]}
|
||||
|
||||
def select_best(state: State):
|
||||
"""Select best response"""
|
||||
responses = state["responses"]
|
||||
best = max(responses, key=lambda r: r.confidence)
|
||||
return {"result": best}
|
||||
```
|
||||
|
||||
## vs Other Patterns
|
||||
|
||||
| Pattern | Parallelization | Prompt Chaining |
|
||||
|---------|----------------|-----------------|
|
||||
| Execution Order | Parallel | Sequential |
|
||||
| Dependencies | None | Yes |
|
||||
| Execution Time | Short | Long |
|
||||
| Result Aggregation | Reducer required | Not required |
|
||||
|
||||
## Summary
|
||||
|
||||
Parallelization is optimal for **simultaneous execution of independent tasks**. It's important to properly aggregate results using a Reducer.
|
||||
|
||||
## Related Pages
|
||||
|
||||
- [02_graph_architecture_orchestrator_worker.md](02_graph_architecture_orchestrator_worker.md) - Dynamic parallel processing
|
||||
- [05_advanced_features_map_reduce.md](05_advanced_features_map_reduce.md) - Map-Reduce pattern
|
||||
- [01_core_concepts_state.md](01_core_concepts_state.md) - Reducer details
|
||||
138
skills/langgraph-master/02_graph_architecture_prompt_chaining.md
Normal file
@@ -0,0 +1,138 @@
|
||||
# Prompt Chaining (Sequential Processing)
|
||||
|
||||
A sequential pattern where each LLM call processes the previous output.
|
||||
|
||||
## Overview
|
||||
|
||||
Prompt Chaining is a pattern that **chains multiple LLM calls in sequence**. The output of each step becomes the input for the next step.
|
||||
|
||||
## Use Cases
|
||||
|
||||
- Stepwise processing like translation → summary → analysis
|
||||
- Content generation → validation → correction pipeline
|
||||
- Data extraction → transformation → validation flow
|
||||
|
||||
## Implementation Example
|
||||
|
||||
```python
|
||||
from langgraph.graph import StateGraph, START, END
|
||||
from typing import TypedDict
|
||||
|
||||
class State(TypedDict):
|
||||
text: str
|
||||
translated: str
|
||||
summarized: str
|
||||
analyzed: str
|
||||
|
||||
def translate_node(state: State):
|
||||
"""Translate English → Japanese"""
|
||||
translated = llm.invoke(
|
||||
f"Translate to Japanese: {state['text']}"
|
||||
)
|
||||
return {"translated": translated}
|
||||
|
||||
def summarize_node(state: State):
|
||||
"""Summarize translated text"""
|
||||
summarized = llm.invoke(
|
||||
f"Summarize this text: {state['translated']}"
|
||||
)
|
||||
return {"summarized": summarized}
|
||||
|
||||
def analyze_node(state: State):
|
||||
"""Analyze summary"""
|
||||
analyzed = llm.invoke(
|
||||
f"Analyze sentiment: {state['summarized']}"
|
||||
)
|
||||
return {"analyzed": analyzed}
|
||||
|
||||
# Build graph
|
||||
builder = StateGraph(State)
|
||||
builder.add_node("translate", translate_node)
|
||||
builder.add_node("summarize", summarize_node)
|
||||
builder.add_node("analyze", analyze_node)
|
||||
|
||||
# Edges for sequential execution
|
||||
builder.add_edge(START, "translate")
|
||||
builder.add_edge("translate", "summarize")
|
||||
builder.add_edge("summarize", "analyze")
|
||||
builder.add_edge("analyze", END)
|
||||
|
||||
graph = builder.compile()
|
||||
```
|
||||
|
||||
## Benefits
|
||||
|
||||
✅ **Simple**: Processing flow is linear and easy to understand
|
||||
✅ **Predictable**: Always executes in the same order
|
||||
✅ **Easy to Debug**: Each step can be tested independently
|
||||
✅ **Gradual Improvement**: Quality improves at each step
|
||||
|
||||
## Considerations
|
||||
|
||||
⚠️ **Accumulated Delay**: Takes time as each step executes sequentially
|
||||
⚠️ **Error Propagation**: Earlier errors affect later stages
|
||||
⚠️ **Lack of Flexibility**: Dynamic branching is difficult
|
||||
|
||||
## Advanced Patterns
|
||||
|
||||
### Pattern 1: Chain with Validation
|
||||
|
||||
```python
|
||||
def validate_translation(state: State):
|
||||
"""Validate translation quality"""
|
||||
is_valid = check_quality(state["translated"])
|
||||
return {"is_valid": is_valid}
|
||||
|
||||
def route_after_validation(state: State):
|
||||
if state["is_valid"]:
|
||||
return "continue"
|
||||
return "retry"
|
||||
|
||||
# Validation → continue or retry
|
||||
builder.add_conditional_edges(
|
||||
"validate",
|
||||
route_after_validation,
|
||||
{
|
||||
"continue": "summarize",
|
||||
"retry": "translate"
|
||||
}
|
||||
)
|
||||
```
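
For the retry loop above to actually run, the validation node has to be registered and the `is_valid` flag carried in the state; a small sketch of that wiring, assuming the node names used on this page:

```python
from typing import TypedDict

from langgraph.graph import StateGraph, START, END


class ChainState(TypedDict):
    text: str
    translated: str
    summarized: str
    analyzed: str
    is_valid: bool


builder = StateGraph(ChainState)
builder.add_node("translate", translate_node)
builder.add_node("validate", validate_translation)
builder.add_node("summarize", summarize_node)
builder.add_node("analyze", analyze_node)

builder.add_edge(START, "translate")
builder.add_edge("translate", "validate")
# conditional edges from "validate" as shown above
builder.add_edge("summarize", "analyze")
builder.add_edge("analyze", END)
```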
|
||||
|
||||
### Pattern 2: Gradual Refinement
|
||||
|
||||
```python
|
||||
def draft_node(state: State):
|
||||
"""Create draft"""
|
||||
draft = llm.invoke(f"Write a draft: {state['topic']}")
|
||||
return {"draft": draft}
|
||||
|
||||
def refine_node(state: State):
|
||||
"""Refine draft"""
|
||||
refined = llm.invoke(f"Improve this draft: {state['draft']}")
|
||||
return {"refined": refined}
|
||||
|
||||
def polish_node(state: State):
|
||||
"""Final polish"""
|
||||
polished = llm.invoke(f"Polish this text: {state['refined']}")
|
||||
return {"final": polished}
|
||||
```
|
||||
|
||||
## vs Other Patterns
|
||||
|
||||
| Pattern | Prompt Chaining | Parallelization |
|
||||
|---------|----------------|-----------------|
|
||||
| Execution Order | Sequential | Parallel |
|
||||
| Dependencies | Yes | No |
|
||||
| Execution Time | Long | Short |
|
||||
| Use Case | Stepwise processing | Independent tasks |
|
||||
|
||||
## Summary
|
||||
|
||||
Prompt Chaining is the simplest pattern, optimal for **cases requiring stepwise processing**. Use when each step depends on the previous step.
|
||||
|
||||
## Related Pages
|
||||
|
||||
- [02_graph_architecture_parallelization.md](02_graph_architecture_parallelization.md) - Comparison with parallel processing
|
||||
- [02_graph_architecture_evaluator_optimizer.md](02_graph_architecture_evaluator_optimizer.md) - Combination with validation loop
|
||||
- [01_core_concepts_edge.md](01_core_concepts_edge.md) - Edge basics
|
||||
263
skills/langgraph-master/02_graph_architecture_routing.md
Normal file
@@ -0,0 +1,263 @@
|
||||
# Routing (Branching Processing)
|
||||
|
||||
A pattern for routing to specialized flows based on input.
|
||||
|
||||
## Overview
|
||||
|
||||
Routing is a pattern that **selects the appropriate processing path** based on input characteristics. Used for customer support question classification, etc.
|
||||
|
||||
## Use Cases
|
||||
|
||||
- Route customer questions to specialized teams by type
|
||||
- Different processing pipelines by document type
|
||||
- Prioritization by urgency/importance
|
||||
- Processing flow selection by language
|
||||
|
||||
## Implementation Example: Customer Support
|
||||
|
||||
```python
|
||||
from typing import Literal, TypedDict
|
||||
|
||||
class State(TypedDict):
|
||||
query: str
|
||||
category: str
|
||||
response: str
|
||||
|
||||
def router_node(state: State):
    """Classify the question and store the category in state"""
    query = state["query"]

    # Classify with LLM (expected answer: pricing, refund, or technical)
    category = llm.invoke(
        f"Classify this customer query into: pricing, refund, or technical\n"
        f"Query: {query}\n"
        f"Category:"
    ).strip()

    # Fall back to keyword matching if the LLM answer is unexpected
    if category not in ("pricing", "refund", "technical"):
        if "price" in query or "cost" in query:
            category = "pricing"
        elif "refund" in query or "cancel" in query:
            category = "refund"
        else:
            category = "technical"

    # The conditional edge below reads this key to pick the route
    return {"category": category}

def pricing_node(state: State):
|
||||
"""Handle pricing queries"""
|
||||
response = handle_pricing_query(state["query"])
|
||||
return {"response": response, "category": "pricing"}
|
||||
|
||||
def refund_node(state: State):
|
||||
"""Handle refund queries"""
|
||||
response = handle_refund_query(state["query"])
|
||||
return {"response": response, "category": "refund"}
|
||||
|
||||
def technical_node(state: State):
|
||||
"""Handle technical issues"""
|
||||
response = handle_technical_query(state["query"])
|
||||
return {"response": response, "category": "technical"}
|
||||
|
||||
# Build graph
|
||||
builder = StateGraph(State)
|
||||
|
||||
builder.add_node("router", router_node)
|
||||
builder.add_node("pricing", pricing_node)
|
||||
builder.add_node("refund", refund_node)
|
||||
builder.add_node("technical", technical_node)
|
||||
|
||||
# Routing edges
|
||||
builder.add_edge(START, "router")
|
||||
builder.add_conditional_edges(
|
||||
"router",
|
||||
lambda state: state.get("category", "technical"),
|
||||
{
|
||||
"pricing": "pricing",
|
||||
"refund": "refund",
|
||||
"technical": "technical"
|
||||
}
|
||||
)
|
||||
|
||||
# End from each node
|
||||
builder.add_edge("pricing", END)
|
||||
builder.add_edge("refund", END)
|
||||
builder.add_edge("technical", END)
|
||||
|
||||
graph = builder.compile()
|
||||
```
|
||||
|
||||
## Advanced Patterns
|
||||
|
||||
### Pattern 1: Multi-Stage Routing
|
||||
|
||||
```python
|
||||
def first_router(state: State) -> Literal["sales", "support"]:
|
||||
"""Stage 1: Sales or Support"""
|
||||
if "purchase" in state["query"] or "quote" in state["query"]:
|
||||
return "sales"
|
||||
return "support"
|
||||
|
||||
def support_router(state: State) -> Literal["billing", "technical"]:
|
||||
"""Stage 2: Classification within Support"""
|
||||
if "billing" in state["query"]:
|
||||
return "billing"
|
||||
return "technical"
|
||||
|
||||
# Multi-stage routing
|
||||
builder.add_conditional_edges("first_router", first_router, {...})
|
||||
builder.add_conditional_edges("support_router", support_router, {...})
|
||||
```
|
||||
|
||||
### Pattern 2: Priority-Based Routing
|
||||
|
||||
```python
|
||||
from typing import Literal
|
||||
|
||||
def priority_router(state: State) -> Literal["urgent", "normal", "low"]:
|
||||
"""Route by urgency"""
|
||||
query = state["query"]
|
||||
|
||||
# Urgent keywords
|
||||
if any(word in query for word in ["urgent", "immediately", "asap"]):
|
||||
return "urgent"
|
||||
|
||||
# Importance determination
|
||||
importance = analyze_importance(query)
|
||||
if importance > 0.7:
|
||||
return "normal"
|
||||
|
||||
return "low"
|
||||
|
||||
builder.add_conditional_edges(
|
||||
"priority_router",
|
||||
priority_router,
|
||||
{
|
||||
"urgent": "urgent_handler", # Immediate processing
|
||||
"normal": "normal_queue", # Normal queue
|
||||
"low": "batch_processor" # Batch processing
|
||||
}
|
||||
)
|
||||
```
|
||||
|
||||
### Pattern 3: Semantic Routing (Embedding-Based)
|
||||
|
||||
```python
|
||||
import numpy as np
from typing import Literal


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity helper used by the router below"""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
|
||||
|
||||
def semantic_router(state: State) -> Literal["product", "account", "general"]:
|
||||
"""Semantic routing based on embeddings"""
|
||||
query_embedding = embed(state["query"])
|
||||
|
||||
# Representative embeddings for each category
|
||||
categories = {
|
||||
"product": embed("product, features, how to use"),
|
||||
"account": embed("account, login, password"),
|
||||
"general": embed("general questions")
|
||||
}
|
||||
|
||||
# Select closest category
|
||||
similarities = {
|
||||
cat: cosine_similarity(query_embedding, emb)
|
||||
for cat, emb in categories.items()
|
||||
}
|
||||
|
||||
return max(similarities, key=similarities.get)
|
||||
```
|
||||
|
||||
### Pattern 4: Dynamic Routing (LLM Judgment)
|
||||
|
||||
```python
|
||||
def llm_router(state: State):
|
||||
"""Have LLM determine optimal route"""
|
||||
routes = ["expert_a", "expert_b", "expert_c", "general"]
|
||||
|
||||
prompt = f"""
|
||||
Select the most appropriate expert to handle this question:
|
||||
- expert_a: Database specialist
|
||||
- expert_b: API specialist
|
||||
- expert_c: UI specialist
|
||||
- general: General questions
|
||||
|
||||
Question: {state['query']}
|
||||
|
||||
Selection: """
|
||||
|
||||
route = llm.invoke(prompt).strip()
|
||||
return route if route in routes else "general"
|
||||
|
||||
builder.add_conditional_edges(
|
||||
"router",
|
||||
llm_router,
|
||||
{
|
||||
"expert_a": "database_expert",
|
||||
"expert_b": "api_expert",
|
||||
"expert_c": "ui_expert",
|
||||
"general": "general_handler"
|
||||
}
|
||||
)
|
||||
```
|
||||
|
||||
## Benefits
|
||||
|
||||
✅ **Specialization**: Specialized processing for each type
|
||||
✅ **Efficiency**: Skip unnecessary processing
|
||||
✅ **Maintainability**: Improve each route independently
|
||||
✅ **Scalability**: Easy to add new routes
|
||||
|
||||
## Considerations
|
||||
|
||||
⚠️ **Classification Accuracy**: Routing errors affect the whole
|
||||
⚠️ **Coverage**: Need to cover all cases
|
||||
⚠️ **Fallback**: Handling unknown cases is important
|
||||
⚠️ **Balance**: Consider load balancing between routes
|
||||
|
||||
## Best Practices
|
||||
|
||||
### 1. Provide Fallback Route
|
||||
|
||||
```python
|
||||
def safe_router(state: State):
|
||||
try:
|
||||
route = determine_route(state)
|
||||
if route in valid_routes:
|
||||
return route
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
# Fallback
|
||||
return "general_handler"
|
||||
```
|
||||
|
||||
### 2. Log Routing Reasons
|
||||
|
||||
```python
|
||||
def logged_router(state: State):
|
||||
route = determine_route(state)
|
||||
|
||||
return {
|
||||
"route": route,
|
||||
"routing_reason": f"Routed to {route} because..."
|
||||
}
|
||||
```
|
||||
|
||||
### 3. Dynamic Route Addition
|
||||
|
||||
```python
|
||||
# Load routes from configuration file
|
||||
ROUTES = load_routes_config()
|
||||
|
||||
builder.add_conditional_edges(
|
||||
"router",
|
||||
determine_route,
|
||||
{route: handler for route, handler in ROUTES.items()}
|
||||
)
|
||||
```
|
||||
|
||||
## Summary
|
||||
|
||||
Routing is optimal for **appropriate processing selection based on input characteristics**. Classification accuracy and fallback handling are keys to success.
|
||||
|
||||
## Related Pages
|
||||
|
||||
- [02_graph_architecture_agent.md](02_graph_architecture_agent.md) - Combining with Agent
|
||||
- [01_core_concepts_edge.md](01_core_concepts_edge.md) - Conditional edge details
|
||||
- [02_graph_architecture_workflow_vs_agent.md](02_graph_architecture_workflow_vs_agent.md) - Pattern usage
|
||||
282
skills/langgraph-master/02_graph_architecture_subgraph.md
Normal file
@@ -0,0 +1,282 @@
|
||||
# Subgraph
|
||||
|
||||
A pattern for building hierarchical graph structures and modularizing complex systems.
|
||||
|
||||
## Overview
|
||||
|
||||
Subgraph is a pattern for hierarchically organizing complex systems by **embedding graphs as nodes in other graphs**.
|
||||
|
||||
## Use Cases
|
||||
|
||||
- Modularizing large-scale agent systems
|
||||
- Integrating multiple specialized agents
|
||||
- Reusable workflow components
|
||||
- Multi-level hierarchical structures
|
||||
|
||||
## Two Implementation Approaches
|
||||
|
||||
### Approach 1: Add Graph as Node
|
||||
|
||||
Use when **sharing state keys**.
|
||||
|
||||
```python
|
||||
# Subgraph definition
|
||||
class SubState(TypedDict):
|
||||
messages: Annotated[list, add_messages]
|
||||
sub_result: str
|
||||
|
||||
def sub_node_a(state: SubState):
|
||||
return {"messages": [{"role": "assistant", "content": "Sub A"}]}
|
||||
|
||||
def sub_node_b(state: SubState):
|
||||
return {"sub_result": "Sub B completed"}
|
||||
|
||||
# Build subgraph
|
||||
sub_builder = StateGraph(SubState)
|
||||
sub_builder.add_node("sub_a", sub_node_a)
|
||||
sub_builder.add_node("sub_b", sub_node_b)
|
||||
sub_builder.add_edge(START, "sub_a")
|
||||
sub_builder.add_edge("sub_a", "sub_b")
|
||||
sub_builder.add_edge("sub_b", END)
|
||||
|
||||
sub_graph = sub_builder.compile()
|
||||
|
||||
# Use subgraph as node in parent graph
|
||||
class ParentState(TypedDict):
|
||||
messages: Annotated[list, add_messages] # Shared key
|
||||
sub_result: str # Shared key
|
||||
parent_data: str
|
||||
|
||||
parent_builder = StateGraph(ParentState)
|
||||
|
||||
# Add subgraph directly as node
|
||||
parent_builder.add_node("subgraph", sub_graph)
|
||||
|
||||
parent_builder.add_edge(START, "subgraph")
|
||||
parent_builder.add_edge("subgraph", END)
|
||||
|
||||
parent_graph = parent_builder.compile()
|
||||
```
|
||||
|
||||
### Approach 2: Call Graph from Within Node
|
||||
|
||||
Use when having **different state schemas**.
|
||||
|
||||
```python
|
||||
# Subgraph (own state)
|
||||
class SubGraphState(TypedDict):
|
||||
input_text: str
|
||||
output_text: str
|
||||
|
||||
def process_node(state: SubGraphState):
|
||||
return {"output_text": process(state["input_text"])}
|
||||
|
||||
sub_builder = StateGraph(SubGraphState)
|
||||
sub_builder.add_node("process", process_node)
|
||||
sub_builder.add_edge(START, "process")
|
||||
sub_builder.add_edge("process", END)
|
||||
|
||||
sub_graph = sub_builder.compile()
|
||||
|
||||
# Parent graph (different state)
|
||||
class ParentState(TypedDict):
|
||||
user_query: str
|
||||
result: str
|
||||
|
||||
def invoke_subgraph_node(state: ParentState):
|
||||
"""Call subgraph within node"""
|
||||
# Convert parent state to subgraph state
|
||||
sub_input = {"input_text": state["user_query"]}
|
||||
|
||||
# Execute subgraph
|
||||
sub_output = sub_graph.invoke(sub_input)
|
||||
|
||||
# Convert subgraph output to parent state
|
||||
return {"result": sub_output["output_text"]}
|
||||
|
||||
parent_builder = StateGraph(ParentState)
|
||||
parent_builder.add_node("call_subgraph", invoke_subgraph_node)
|
||||
parent_builder.add_edge(START, "call_subgraph")
|
||||
parent_builder.add_edge("call_subgraph", END)
|
||||
|
||||
parent_graph = parent_builder.compile()
|
||||
```
|
||||
|
||||
## Multi-Level Subgraphs
|
||||
|
||||
Multiple levels of subgraphs (parent → child → grandchild) are also possible:
|
||||
|
||||
```python
|
||||
# Grandchild graph
|
||||
class GrandchildState(TypedDict):
|
||||
data: str
|
||||
|
||||
grandchild_builder = StateGraph(GrandchildState)
|
||||
grandchild_builder.add_node("process", lambda s: {"data": f"Processed: {s['data']}"})
|
||||
grandchild_builder.add_edge(START, "process")
|
||||
grandchild_builder.add_edge("process", END)
|
||||
grandchild_graph = grandchild_builder.compile()
|
||||
|
||||
# Child graph (includes grandchild graph)
|
||||
class ChildState(TypedDict):
|
||||
data: str
|
||||
|
||||
child_builder = StateGraph(ChildState)
|
||||
child_builder.add_node("grandchild", grandchild_graph) # Add grandchild graph
|
||||
child_builder.add_edge(START, "grandchild")
|
||||
child_builder.add_edge("grandchild", END)
|
||||
child_graph = child_builder.compile()
|
||||
|
||||
# Parent graph (includes child graph)
|
||||
class ParentState(TypedDict):
|
||||
data: str
|
||||
|
||||
parent_builder = StateGraph(ParentState)
|
||||
parent_builder.add_node("child", child_graph) # Add child graph
|
||||
parent_builder.add_edge(START, "child")
|
||||
parent_builder.add_edge("child", END)
|
||||
parent_graph = parent_builder.compile()
|
||||
```
|
||||
|
||||
## Navigation Between Subgraphs
|
||||
|
||||
Transition from subgraph to another node in parent graph:
|
||||
|
||||
```python
|
||||
from langgraph.types import Command
|
||||
|
||||
def sub_node_with_navigation(state: SubState):
|
||||
"""Navigate from subgraph node to parent graph"""
|
||||
result = process(state["data"])
|
||||
|
||||
if need_parent_intervention(result):
|
||||
# Transition to another node in parent graph
|
||||
return Command(
|
||||
update={"result": result},
|
||||
goto="parent_handler",
|
||||
graph=Command.PARENT
|
||||
)
|
||||
|
||||
return {"result": result}
|
||||
```
|
||||
|
||||
## Persistence and Debugging
|
||||
|
||||
### Automatic Checkpointer Propagation
|
||||
|
||||
```python
|
||||
from langgraph.checkpoint.memory import MemorySaver
|
||||
|
||||
# Set checkpointer only on parent graph
|
||||
checkpointer = MemorySaver()
|
||||
|
||||
parent_graph = parent_builder.compile(
|
||||
checkpointer=checkpointer # Automatically propagates to child graphs
|
||||
)
|
||||
```
|
||||
|
||||
### Streaming Including Subgraph Output
|
||||
|
||||
```python
|
||||
# Stream including subgraph details
|
||||
for chunk in parent_graph.stream(
|
||||
inputs,
|
||||
stream_mode="values",
|
||||
subgraphs=True # Include subgraph output
|
||||
):
|
||||
print(chunk)
|
||||
```
|
||||
|
||||
## Practical Example: Multi-Agent System
|
||||
|
||||
```python
|
||||
# Research agent (subgraph)
|
||||
class ResearchState(TypedDict):
|
||||
messages: Annotated[list, add_messages]
|
||||
research_result: str
|
||||
|
||||
research_builder = StateGraph(ResearchState)
|
||||
research_builder.add_node("search", search_node)
|
||||
research_builder.add_node("analyze", analyze_node)
|
||||
research_builder.add_edge(START, "search")
|
||||
research_builder.add_edge("search", "analyze")
|
||||
research_builder.add_edge("analyze", END)
|
||||
research_graph = research_builder.compile()
|
||||
|
||||
# Coding agent (subgraph)
|
||||
class CodingState(TypedDict):
|
||||
messages: Annotated[list, add_messages]
|
||||
code: str
|
||||
|
||||
coding_builder = StateGraph(CodingState)
|
||||
coding_builder.add_node("generate", generate_code_node)
|
||||
coding_builder.add_node("test", test_code_node)
|
||||
coding_builder.add_edge(START, "generate")
|
||||
coding_builder.add_edge("generate", "test")
|
||||
coding_builder.add_edge("test", END)
|
||||
coding_graph = coding_builder.compile()
|
||||
|
||||
# Integrated system (parent graph)
|
||||
class SystemState(TypedDict):
|
||||
messages: Annotated[list, add_messages]
|
||||
research_result: str
|
||||
code: str
|
||||
task_type: str
|
||||
|
||||
def router(state: SystemState):
|
||||
if "research" in state["messages"][-1].content:
|
||||
return "research"
|
||||
return "coding"
|
||||
|
||||
system_builder = StateGraph(SystemState)
|
||||
|
||||
# Add subgraphs
|
||||
system_builder.add_node("research_agent", research_graph)
|
||||
system_builder.add_node("coding_agent", coding_graph)
|
||||
|
||||
# Routing
|
||||
system_builder.add_conditional_edges(
|
||||
START,
|
||||
router,
|
||||
{
|
||||
"research": "research_agent",
|
||||
"coding": "coding_agent"
|
||||
}
|
||||
)
|
||||
|
||||
system_builder.add_edge("research_agent", END)
|
||||
system_builder.add_edge("coding_agent", END)
|
||||
|
||||
system_graph = system_builder.compile()
|
||||
```
|
||||
|
||||
## Benefits
|
||||
|
||||
✅ **Modularization**: Divide complex systems into smaller parts
|
||||
✅ **Reusability**: Use subgraphs in multiple parent graphs
|
||||
✅ **Maintainability**: Improve each subgraph independently
|
||||
✅ **Testability**: Test subgraphs individually
|
||||
|
||||
## Considerations
|
||||
|
||||
⚠️ **State Sharing**: Carefully design which keys to share
|
||||
⚠️ **Debugging Complexity**: Deep hierarchies are hard to track
|
||||
⚠️ **Performance**: Multi-level increases overhead
|
||||
⚠️ **Circular References**: Watch for circular dependencies between subgraphs
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Shallow Hierarchy**: Keep hierarchy as shallow as possible (2-3 levels)
|
||||
2. **Clear Responsibilities**: Clearly define role of each subgraph
|
||||
3. **Minimize State**: Share only necessary state keys
|
||||
4. **Independence**: Subgraphs should operate as independently as possible
|
||||
|
||||
## Summary
|
||||
|
||||
Subgraph is optimal for **hierarchical organization of complex systems**. Choose between two approaches depending on state sharing method.
|
||||
|
||||
## Related Pages
|
||||
|
||||
- [02_graph_architecture_agent.md](02_graph_architecture_agent.md) - Combination with multi-agent
|
||||
- [01_core_concepts_state.md](01_core_concepts_state.md) - State design
|
||||
- [03_memory_management_persistence.md](03_memory_management_persistence.md) - Checkpointer propagation
|
||||
156
skills/langgraph-master/02_graph_architecture_workflow_vs_agent.md
Normal file
@@ -0,0 +1,156 @@
|
||||
# Workflow vs Agent
|
||||
|
||||
Differences and usage between Workflow and Agent.
|
||||
|
||||
## Basic Differences
|
||||
|
||||
### Workflow
|
||||
> "predetermined code paths and are designed to operate in a certain order"
|
||||
> (Predetermined code paths, operates in specific order)
|
||||
|
||||
- **Pre-defined**: Processing flow is clear
|
||||
- **Predictable**: Follows same path for same input
|
||||
- **Controlled Execution**: Developer has complete control over control flow
|
||||
|
||||
### Agent
|
||||
> "dynamic and define their own processes and tool usage"
|
||||
> (Dynamic, defines its own processes and tool usage)
|
||||
|
||||
- **Dynamic**: LLM decides next action
|
||||
- **Autonomous**: Self-determines tool selection
|
||||
- **Uncertain**: May follow different paths with same input
|
||||
|
||||
## Implementation Comparison
|
||||
|
||||
### Workflow Example: Translation Pipeline
|
||||
|
||||
```python
|
||||
def translate_node(state: State):
|
||||
return {"text": translate(state["text"])}
|
||||
|
||||
def summarize_node(state: State):
|
||||
return {"summary": summarize(state["text"])}
|
||||
|
||||
def validate_node(state: State):
|
||||
return {"valid": check_quality(state["summary"])}
|
||||
|
||||
# Fixed flow
|
||||
builder.add_edge(START, "translate")
|
||||
builder.add_edge("translate", "summarize")
|
||||
builder.add_edge("summarize", "validate")
|
||||
builder.add_edge("validate", END)
|
||||
```
|
||||
|
||||
### Agent Example: Problem-Solving Agent
|
||||
|
||||
```python
|
||||
def agent_node(state: State):
|
||||
# LLM determines tool usage
|
||||
response = llm_with_tools.invoke(state["messages"])
|
||||
return {"messages": [response]}
|
||||
|
||||
def should_continue(state: State):
|
||||
last_message = state["messages"][-1]
|
||||
# Continue if there are tool calls
|
||||
if last_message.tool_calls:
|
||||
return "continue"
|
||||
return "end"
|
||||
|
||||
# LLM decides dynamically
|
||||
builder.add_conditional_edges(
|
||||
"agent",
|
||||
should_continue,
|
||||
{"continue": "tools", "end": END}
|
||||
)
|
||||
```
|
||||
|
||||
## Selection Criteria
|
||||
|
||||
### Choose Workflow When
|
||||
|
||||
✅ **Structure is Clear**
|
||||
- Processing steps are known in advance
|
||||
- Execution order is fixed
|
||||
|
||||
✅ **Predictability is Important**
|
||||
- Compliance requirements exist
|
||||
- Debugging needs to be easy
|
||||
|
||||
✅ **Cost Efficiency**
|
||||
- Want to minimize LLM calls
|
||||
- Want to reduce token consumption
|
||||
|
||||
**Examples**: Data processing pipelines, approval workflows, translation chains
|
||||
|
||||
### Choose Agent When
|
||||
|
||||
✅ **Problem is Uncertain**
|
||||
- Don't know which tools are needed
|
||||
- Variable number of steps
|
||||
|
||||
✅ **Flexibility is Needed**
|
||||
- Different approaches based on situation
|
||||
- Diverse user questions
|
||||
|
||||
✅ **Autonomy is Valuable**
|
||||
- Want to leverage LLM's judgment
|
||||
- ReAct (reasoning + action) pattern is suitable
|
||||
|
||||
**Examples**: Customer support, research assistant, complex problem solving
|
||||
|
||||
## Hybrid Approach
|
||||
|
||||
Many practical systems combine both:
|
||||
|
||||
```python
|
||||
# Embed an Agent loop within a Workflow
builder.add_edge(START, "input_validation")    # Workflow
builder.add_edge("input_validation", "agent")  # Agent part
builder.add_conditional_edges(
    "agent",
    should_continue,
    {"continue": "tools", "end": "output_formatting"}
)
builder.add_edge("tools", "agent")
builder.add_edge("output_formatting", END)     # Workflow
|
||||
```
|
||||
|
||||
## ReAct Pattern (Agent Foundation)
|
||||
|
||||
Agent follows the **ReAct** (Reasoning + Acting) pattern:
|
||||
|
||||
1. **Reasoning**: Think "What should I do next?"
|
||||
2. **Acting**: Take action using tools
|
||||
3. **Observing**: Observe results
|
||||
4. Repeat until reaching final answer
|
||||
|
||||
```python
|
||||
# ReAct loop implementation
|
||||
def agent(state):
|
||||
# Reasoning: Determine next action
|
||||
response = llm_with_tools.invoke(state["messages"])
|
||||
return {"messages": [response]}
|
||||
|
||||
def tools(state):
|
||||
# Acting: Execute tools
|
||||
results = execute_tools(state["messages"][-1].tool_calls)
|
||||
return {"messages": results}
|
||||
|
||||
# Observing & Repeat
|
||||
builder.add_conditional_edges("agent", should_continue, ...)
|
||||
```
|
||||
|
||||
## Summary
|
||||
|
||||
| Aspect | Workflow | Agent |
|
||||
|--------|----------|-------|
|
||||
| Control | Developer has complete control | LLM decides dynamically |
|
||||
| Predictability | High | Low |
|
||||
| Flexibility | Low | High |
|
||||
| Cost | Low | High |
|
||||
| Use Case | Structured tasks | Uncertain tasks |
|
||||
|
||||
**Important**: Both can be built with the same tools (State, Node, Edge) in LangGraph. Pattern choice depends on problem nature.
|
||||
|
||||
## Related Pages
|
||||
|
||||
- [02_graph_architecture_prompt_chaining.md](02_graph_architecture_prompt_chaining.md) - Workflow pattern example
|
||||
- [02_graph_architecture_agent.md](02_graph_architecture_agent.md) - Agent pattern details
|
||||
- [02_graph_architecture_routing.md](02_graph_architecture_routing.md) - Hybrid approach example
|
||||
224
skills/langgraph-master/03_memory_management_checkpointer.md
Normal file
@@ -0,0 +1,224 @@
|
||||
# Checkpointer
|
||||
|
||||
Implementation details for saving and restoring state.
|
||||
|
||||
## Overview
|
||||
|
||||
Checkpointer implements the `BaseCheckpointSaver` interface and is responsible for state persistence.
|
||||
|
||||
## Checkpointer Implementations
|
||||
|
||||
### 1. MemorySaver (For Experimentation & Testing)
|
||||
|
||||
Saves checkpoints in memory:
|
||||
|
||||
```python
|
||||
from langgraph.checkpoint.memory import MemorySaver
|
||||
|
||||
checkpointer = MemorySaver()
|
||||
graph = builder.compile(checkpointer=checkpointer)
|
||||
|
||||
# All data is lost when the process terminates
|
||||
```
|
||||
|
||||
**Use Case**: Local testing, prototyping
|
||||
|
||||
### 2. SqliteSaver (For Local Development)
|
||||
|
||||
Saves to SQLite database:
|
||||
|
||||
```python
|
||||
from langgraph.checkpoint.sqlite import SqliteSaver
|
||||
|
||||
# File-based
|
||||
checkpointer = SqliteSaver.from_conn_string("checkpoints.db")
|
||||
|
||||
# Or from connection object
|
||||
import sqlite3
|
||||
conn = sqlite3.connect("checkpoints.db")
|
||||
checkpointer = SqliteSaver(conn)
|
||||
|
||||
graph = builder.compile(checkpointer=checkpointer)
|
||||
```
|
||||
|
||||
**Use Case**: Local development, single-user applications
|
||||
|
||||
### 3. PostgresSaver (For Production)
|
||||
|
||||
Saves to PostgreSQL database:
|
||||
|
||||
```python
|
||||
from langgraph.checkpoint.postgres import PostgresSaver
|
||||
from psycopg_pool import ConnectionPool
|
||||
|
||||
# Connection pool
|
||||
pool = ConnectionPool(
|
||||
conninfo="postgresql://user:password@localhost:5432/db"
|
||||
)
|
||||
|
||||
checkpointer = PostgresSaver(pool)
|
||||
graph = builder.compile(checkpointer=checkpointer)
|
||||
```
|
||||
|
||||
**Use Case**: Production environments, multi-user applications
|
||||
|
||||
## BaseCheckpointSaver Interface
|
||||
|
||||
All checkpointers implement the following methods:
|
||||
|
||||
```python
|
||||
class BaseCheckpointSaver:
|
||||
def put(
|
||||
self,
|
||||
config: RunnableConfig,
|
||||
checkpoint: Checkpoint,
|
||||
metadata: dict
|
||||
) -> RunnableConfig:
|
||||
"""Save a checkpoint"""
|
||||
|
||||
def get_tuple(
|
||||
self,
|
||||
config: RunnableConfig
|
||||
) -> CheckpointTuple | None:
|
||||
"""Retrieve a checkpoint"""
|
||||
|
||||
def list(
|
||||
self,
|
||||
config: RunnableConfig,
|
||||
*,
|
||||
before: RunnableConfig | None = None,
|
||||
limit: int | None = None
|
||||
) -> Iterator[CheckpointTuple]:
|
||||
"""Get list of checkpoints"""
|
||||
```
|
||||
|
||||
## Custom Checkpointer
|
||||
|
||||
Implement your own persistence logic:
|
||||
|
||||
```python
|
||||
from langgraph.checkpoint.base import BaseCheckpointSaver
|
||||
|
||||
class RedisCheckpointer(BaseCheckpointSaver):
|
||||
def __init__(self, redis_client):
|
||||
self.redis = redis_client
|
||||
|
||||
def put(self, config, checkpoint, metadata):
|
||||
thread_id = config["configurable"]["thread_id"]
|
||||
checkpoint_id = checkpoint["id"]
|
||||
|
||||
key = f"checkpoint:{thread_id}:{checkpoint_id}"
|
||||
self.redis.set(key, serialize(checkpoint))
|
||||
|
||||
return config
|
||||
|
||||
def get_tuple(self, config):
|
||||
thread_id = config["configurable"]["thread_id"]
|
||||
# Retrieve the latest checkpoint
|
||||
# ...
|
||||
|
||||
def list(self, config, before=None, limit=None):
|
||||
# Return list of checkpoints
|
||||
# ...
|
||||
```
|
||||
|
||||
## Checkpointer Configuration
|
||||
|
||||
### Namespaces
|
||||
|
||||
Share the same checkpointer across multiple graphs:
|
||||
|
||||
```python
|
||||
checkpointer = MemorySaver()
|
||||
|
||||
graph1 = builder1.compile(
|
||||
checkpointer=checkpointer,
|
||||
name="graph1" # Namespace
|
||||
)
|
||||
|
||||
graph2 = builder2.compile(
|
||||
checkpointer=checkpointer,
|
||||
name="graph2" # Different namespace
|
||||
)
|
||||
```
|
||||
|
||||
### Automatic Propagation
|
||||
|
||||
Parent graph's checkpointer automatically propagates to subgraphs:
|
||||
|
||||
```python
|
||||
# Set only on parent graph
|
||||
parent_graph = parent_builder.compile(checkpointer=checkpointer)
|
||||
|
||||
# Automatically propagates to child graphs
|
||||
```
|
||||
|
||||
## Checkpoint Management
|
||||
|
||||
### Deleting Old Checkpoints
|
||||
|
||||
```python
|
||||
# Delete after a certain period (implementation-dependent)
|
||||
import datetime
|
||||
|
||||
cutoff = datetime.datetime.now() - datetime.timedelta(days=30)
|
||||
|
||||
# Implementation example (SQLite)
|
||||
checkpointer.conn.execute(
|
||||
"DELETE FROM checkpoints WHERE created_at < ?",
|
||||
(cutoff,)
|
||||
)
|
||||
```
|
||||
|
||||
### Optimizing Checkpoint Size
|
||||
|
||||
```python
|
||||
class State(TypedDict):
|
||||
# Avoid large data
|
||||
messages: Annotated[list, add_messages]
|
||||
|
||||
# Store references only
|
||||
large_data_id: str # Actual data in separate storage
|
||||
|
||||
def node(state: State):
|
||||
# Retrieve large data from external source
|
||||
large_data = fetch_from_storage(state["large_data_id"])
|
||||
# ...
|
||||
```
|
||||
|
||||
## Performance Considerations
|
||||
|
||||
### Connection Pool (PostgreSQL)
|
||||
|
||||
```python
|
||||
from psycopg_pool import ConnectionPool
|
||||
|
||||
pool = ConnectionPool(
|
||||
conninfo=conn_string,
|
||||
min_size=5,
|
||||
max_size=20
|
||||
)
|
||||
|
||||
checkpointer = PostgresSaver(pool)
|
||||
```
|
||||
|
||||
### Async Checkpointer
|
||||
|
||||
```python
|
||||
from langgraph.checkpoint.postgres import AsyncPostgresSaver
|
||||
|
||||
async_checkpointer = AsyncPostgresSaver(async_pool)
|
||||
|
||||
# Async execution
|
||||
async for chunk in graph.astream(input, config):
|
||||
print(chunk)
|
||||
```
|
||||
|
||||
## Summary
|
||||
|
||||
Checkpointer determines how state is persisted. It's important to choose the appropriate implementation for your use case.
|
||||
|
||||
## Related Pages
|
||||
|
||||
- [03_memory_management_persistence.md](03_memory_management_persistence.md) - How to use persistence
|
||||
- [03_memory_management_store.md](03_memory_management_store.md) - Differences from long-term memory
|
||||
152
skills/langgraph-master/03_memory_management_overview.md
Normal file
@@ -0,0 +1,152 @@
|
||||
# 03. Memory Management
|
||||
|
||||
State management through persistence and checkpoint features.
|
||||
|
||||
## Overview
|
||||
|
||||
LangGraph's **built-in persistence layer** allows you to save and restore agent state. This enables conversation continuation, error recovery, and time travel.
|
||||
|
||||
## Memory Types
|
||||
|
||||
### Short-term Memory: [Checkpointer](03_memory_management_checkpointer.md)
|
||||
- Automatically saves state at each superstep
|
||||
- Thread-based conversation management
|
||||
- Time travel functionality
|
||||
|
||||
### Long-term Memory: [Store](03_memory_management_store.md)
|
||||
- Share information across threads
|
||||
- Persist user information
|
||||
- Semantic search
|
||||
|
||||
## Key Features
|
||||
|
||||
### 1. [Persistence](03_memory_management_persistence.md)
|
||||
|
||||
**Checkpoints**: Save state at each superstep
|
||||
- Snapshot state at each stage of graph execution
|
||||
- Recoverable from failures
|
||||
- Track execution history
|
||||
|
||||
**Threads**: Unit of conversation
|
||||
- Identify conversations by `thread_id`
|
||||
- Each thread maintains independent state
|
||||
- Manage multiple conversations in parallel
|
||||
|
||||
**StateSnapshot**: Representation of checkpoints
|
||||
- `values`: State at that point in time
|
||||
- `next`: Nodes to execute next
|
||||
- `config`: Checkpoint configuration
|
||||
- `metadata`: Metadata
|
||||
|
||||
### 2. Human-in-the-Loop
|
||||
|
||||
**State Inspection**: Check state at any point
|
||||
```python
|
||||
state = graph.get_state(config)
|
||||
print(state.values)
|
||||
```
|
||||
|
||||
**Approval Flow**: Human approval before critical operations
```python
from langgraph.types import interrupt

def approval_node(state):
    # Pause the graph here; execution resumes once a human responds
    approved = interrupt({"message": "Approve this operation?"})
    return {"approved": approved}
```
|
||||
|
||||
### 3. Memory
|
||||
|
||||
**Conversation Memory**: Memory within a thread
|
||||
```python
|
||||
# Conversation continues when called with the same thread_id
|
||||
config = {"configurable": {"thread_id": "conversation-1"}}
|
||||
graph.invoke(input, config)
|
||||
```
|
||||
|
||||
**Long-term Memory**: Memory across threads
|
||||
```python
|
||||
# Save user information in Store
|
||||
store.put(("user", user_id), "preferences", user_prefs)
|
||||
```
|
||||
|
||||
### 4. Time Travel
|
||||
|
||||
Replay and fork past executions:
|
||||
```python
|
||||
# Resume from specific checkpoint
|
||||
history = graph.get_state_history(config)
|
||||
for state in history:
|
||||
print(f"Checkpoint: {state.config['configurable']['checkpoint_id']}")
|
||||
|
||||
# Re-execute from past checkpoint
|
||||
graph.invoke(input, past_checkpoint_config)
|
||||
```
|
||||
|
||||
## Checkpointer Implementations
|
||||
|
||||
LangGraph provides multiple checkpointer implementations:
|
||||
|
||||
### InMemorySaver (For Experimentation)
|
||||
```python
|
||||
from langgraph.checkpoint.memory import MemorySaver
|
||||
|
||||
checkpointer = MemorySaver()
|
||||
graph = builder.compile(checkpointer=checkpointer)
|
||||
```
|
||||
|
||||
### SqliteSaver (For Local Development)
|
||||
```python
|
||||
from langgraph.checkpoint.sqlite import SqliteSaver
|
||||
|
||||
checkpointer = SqliteSaver.from_conn_string("checkpoints.db")
|
||||
graph = builder.compile(checkpointer=checkpointer)
|
||||
```
|
||||
|
||||
### PostgresSaver (For Production)
|
||||
```python
|
||||
from langgraph.checkpoint.postgres import PostgresSaver
|
||||
|
||||
checkpointer = PostgresSaver.from_conn_string(
|
||||
"postgresql://user:pass@localhost/db"
|
||||
)
|
||||
graph = builder.compile(checkpointer=checkpointer)
|
||||
```
|
||||
|
||||
## Basic Usage Example
|
||||
|
||||
```python
|
||||
from langgraph.checkpoint.memory import MemorySaver
|
||||
|
||||
# Compile with checkpointer
|
||||
checkpointer = MemorySaver()
|
||||
graph = builder.compile(checkpointer=checkpointer)
|
||||
|
||||
# Execute with thread_id
|
||||
config = {"configurable": {"thread_id": "user-123"}}
|
||||
|
||||
# First execution
|
||||
result1 = graph.invoke({"messages": [("user", "Hello")]}, config)
|
||||
|
||||
# Continue in same thread
|
||||
result2 = graph.invoke({"messages": [("user", "How are you?")]}, config)
|
||||
|
||||
# Check state
|
||||
state = graph.get_state(config)
|
||||
print(state.values) # All messages so far
|
||||
|
||||
# Check history
|
||||
for state in graph.get_state_history(config):
|
||||
print(f"Step: {state.values}")
|
||||
```
|
||||
|
||||
## Key Principles
|
||||
|
||||
1. **Thread ID Management**: Use unique thread_id for each conversation
|
||||
2. **Checkpointer Selection**: Choose appropriate implementation for your use case
|
||||
3. **State Minimization**: Save only necessary information to keep checkpoint size small
|
||||
4. **Cleanup**: Periodically delete old checkpoints
|
||||
|
||||
## Next Steps
|
||||
|
||||
For details on each feature, refer to the following pages:
|
||||
|
||||
- [03_memory_management_persistence.md](03_memory_management_persistence.md) - Persistence details
|
||||
- [03_memory_management_checkpointer.md](03_memory_management_checkpointer.md) - Checkpointer implementation
|
||||
- [03_memory_management_store.md](03_memory_management_store.md) - Long-term memory management
|
||||
264
skills/langgraph-master/03_memory_management_persistence.md
Normal file
@@ -0,0 +1,264 @@
|
||||
# Persistence
|
||||
|
||||
Functionality to save and restore graph state.
|
||||
|
||||
## Overview
|
||||
|
||||
Persistence is a feature that **automatically saves** state at each stage of graph execution and allows you to restore it later.
|
||||
|
||||
## Basic Concepts
|
||||
|
||||
### Checkpoints
|
||||
|
||||
State is automatically saved after each **superstep** (set of nodes executed in parallel).
|
||||
|
||||
```python
|
||||
# Superstep 1: node_a and node_b execute in parallel
|
||||
# → Checkpoint 1
|
||||
|
||||
# Superstep 2: node_c executes
|
||||
# → Checkpoint 2
|
||||
|
||||
# Superstep 3: node_d executes
|
||||
# → Checkpoint 3
|
||||
```
|
||||
|
||||
### Threads
|
||||
|
||||
A thread is an identifier containing the **accumulated state of a series of executions**:
|
||||
|
||||
```python
|
||||
config = {"configurable": {"thread_id": "conversation-123"}}
|
||||
```
|
||||
|
||||
Executing with the same `thread_id` continues from the previous state.
|
||||
|
||||
## Implementation Example
|
||||
|
||||
```python
|
||||
from langgraph.checkpoint.memory import MemorySaver
|
||||
from langgraph.graph import StateGraph, MessagesState
|
||||
|
||||
# Define graph
|
||||
builder = StateGraph(MessagesState)
|
||||
builder.add_node("chatbot", chatbot_node)
|
||||
builder.add_edge(START, "chatbot")
|
||||
builder.add_edge("chatbot", END)
|
||||
|
||||
# Compile with checkpointer
|
||||
checkpointer = MemorySaver()
|
||||
graph = builder.compile(checkpointer=checkpointer)
|
||||
|
||||
# Execute with thread ID
|
||||
config = {"configurable": {"thread_id": "user-001"}}
|
||||
|
||||
# First execution
|
||||
graph.invoke(
|
||||
{"messages": [{"role": "user", "content": "My name is Alice"}]},
|
||||
config
|
||||
)
|
||||
|
||||
# Continue in same thread (retains previous state)
|
||||
response = graph.invoke(
|
||||
{"messages": [{"role": "user", "content": "What's my name?"}]},
|
||||
config
|
||||
)
|
||||
|
||||
# → "Your name is Alice"
|
||||
```
|
||||
|
||||
## StateSnapshot Object
|
||||
|
||||
Checkpoints are represented as `StateSnapshot` objects:
|
||||
|
||||
```python
|
||||
class StateSnapshot:
|
||||
values: dict # State at that point in time
|
||||
next: tuple[str] # Nodes to execute next
|
||||
config: RunnableConfig # Checkpoint configuration
|
||||
metadata: dict # Metadata
|
||||
tasks: tuple[PregelTask] # Scheduled tasks
|
||||
```
|
||||
|
||||
### Getting Latest State
|
||||
|
||||
```python
|
||||
state = graph.get_state(config)
|
||||
|
||||
print(state.values) # Current state
|
||||
print(state.next) # Next nodes
|
||||
print(state.config) # Checkpoint configuration
|
||||
```
|
||||
|
||||
### Getting History
|
||||
|
||||
```python
|
||||
# Get list of StateSnapshots in chronological order
|
||||
for state in graph.get_state_history(config):
|
||||
print(f"Checkpoint: {state.config['configurable']['checkpoint_id']}")
|
||||
print(f"Values: {state.values}")
|
||||
print(f"Next: {state.next}")
|
||||
print("---")
|
||||
```
|
||||
|
||||
## Time Travel Feature
|
||||
|
||||
Resume execution from a specific checkpoint:
|
||||
|
||||
```python
|
||||
# Get specific checkpoint from history
|
||||
history = list(graph.get_state_history(config))
|
||||
|
||||
# Checkpoint from 3 steps ago
|
||||
past_state = history[3]
|
||||
|
||||
# Re-execute from that checkpoint
|
||||
result = graph.invoke(
|
||||
{"messages": [{"role": "user", "content": "New question"}]},
|
||||
past_state.config
|
||||
)
|
||||
```
|
||||
|
||||
### Validating Alternative Paths
|
||||
|
||||
```python
|
||||
# Get current state
|
||||
current_state = graph.get_state(config)
|
||||
|
||||
# Try with different input
|
||||
alt_result = graph.invoke(
|
||||
{"messages": [{"role": "user", "content": "Different question"}]},
|
||||
current_state.config
|
||||
)
|
||||
|
||||
# Original execution is not affected
|
||||
```
|
||||
|
||||
## Updating State
|
||||
|
||||
Directly update checkpoint state:
|
||||
|
||||
```python
|
||||
# Get current state
|
||||
state = graph.get_state(config)
|
||||
|
||||
# Update state
|
||||
graph.update_state(
|
||||
config,
|
||||
{"messages": [{"role": "assistant", "content": "Updated message"}]}
|
||||
)
|
||||
|
||||
# Resume from updated state
|
||||
graph.invoke({"messages": [...]}, config)
|
||||
```
|
||||
|
||||
## Use Cases
|
||||
|
||||
### 1. Conversation Continuation
|
||||
|
||||
```python
|
||||
# Session 1
|
||||
config = {"configurable": {"thread_id": "chat-1"}}
|
||||
graph.invoke({"messages": [("user", "Hello")]}, config)
|
||||
|
||||
# Session 2 (days later)
|
||||
# Remembers previous conversation
|
||||
graph.invoke({"messages": [("user", "Continuing from last time")]}, config)
|
||||
```
|
||||
|
||||
### 2. Error Recovery
|
||||
|
||||
```python
|
||||
try:
|
||||
graph.invoke(input, config)
|
||||
except Exception as e:
|
||||
# Even if error occurs, can recover from checkpoint
|
||||
print(f"Error: {e}")
|
||||
|
||||
# Check latest state
|
||||
state = graph.get_state(config)
|
||||
|
||||
# Fix state and re-execute
|
||||
graph.update_state(config, {"error_fixed": True})
|
||||
graph.invoke(input, config)
|
||||
```
|
||||
|
||||
### 3. A/B Testing
|
||||
|
||||
```python
# Base execution
config = {"configurable": {"thread_id": "ab-test"}}
base_result = graph.invoke(input, config)

# Fork from a checkpoint recorded before the step under test
# (get_state_history returns newest first, so [1] is one step back)
fork_config = list(graph.get_state_history(config))[1].config

# Alternative execution 1 (branches from the fork point;
# the original thread history is left intact)
alt_result_1 = graph.invoke(modified_input_1, fork_config)

# Alternative execution 2
alt_result_2 = graph.invoke(modified_input_2, fork_config)

# Compare results
```
|
||||
|
||||
### 4. Debugging and Tracing
|
||||
|
||||
```python
|
||||
# Execute
|
||||
graph.invoke(input, config)
|
||||
|
||||
# Check each step
|
||||
for i, state in enumerate(graph.get_state_history(config)):
|
||||
print(f"Step {i}:")
|
||||
print(f" State: {state.values}")
|
||||
print(f" Next: {state.next}")
|
||||
```
|
||||
|
||||
## Important Considerations
|
||||
|
||||
### Thread ID Uniqueness
|
||||
|
||||
```python
|
||||
# Use different thread_id per user
|
||||
user_config = {"configurable": {"thread_id": f"user-{user_id}"}}
|
||||
|
||||
# Use different thread_id per conversation
|
||||
conversation_config = {"configurable": {"thread_id": f"conv-{conv_id}"}}
|
||||
```
|
||||
|
||||
### Checkpoint Cleanup
|
||||
|
||||
```python
|
||||
# Delete old checkpoints (implementation-dependent)
|
||||
checkpointer.cleanup(before_timestamp=old_timestamp)
|
||||
```
|
||||
|
||||
### Multi-user Support
|
||||
|
||||
```python
|
||||
# Combine user ID and session ID
|
||||
def get_config(user_id: str, session_id: str):
|
||||
return {
|
||||
"configurable": {
|
||||
"thread_id": f"{user_id}-{session_id}"
|
||||
}
|
||||
}
|
||||
|
||||
config = get_config("user123", "session456")
|
||||
```
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Meaningful thread_id**: Format that can identify user, session, conversation
|
||||
2. **Regular Cleanup**: Delete old checkpoints
|
||||
3. **Appropriate Checkpointer**: Choose implementation based on use case
|
||||
4. **Error Handling**: Properly handle errors when retrieving checkpoints
|
||||
|
||||
## Summary
|
||||
|
||||
Persistence enables **state persistence and restoration**, making conversation continuation, error recovery, and time travel possible.
|
||||
|
||||
## Related Pages
|
||||
|
||||
- [03_memory_management_checkpointer.md](03_memory_management_checkpointer.md) - Checkpointer implementation details
|
||||
- [03_memory_management_store.md](03_memory_management_store.md) - Combining with long-term memory
|
||||
- [05_advanced_features_human_in_the_loop.md](05_advanced_features_human_in_the_loop.md) - Applications of state inspection
|
||||
287
skills/langgraph-master/03_memory_management_store.md
Normal file
@@ -0,0 +1,287 @@
|
||||
# Store (Long-term Memory)
|
||||
|
||||
Long-term memory for sharing information across multiple threads.
|
||||
|
||||
## Overview
|
||||
|
||||
Checkpointer only saves state within a single thread. To share information across multiple threads, use **Store**.
|
||||
|
||||
## Checkpointer vs Store
|
||||
|
||||
| Feature | Checkpointer | Store |
|
||||
|---------|-------------|-------|
|
||||
| Scope | Single thread | All threads |
|
||||
| Purpose | Conversation state | User information |
|
||||
| Auto-save | Yes | No (manual) |
|
||||
| Search | thread_id | Namespace |
|
||||
|
||||
## Basic Usage
|
||||
|
||||
```python
|
||||
from langgraph.store.memory import InMemoryStore
|
||||
|
||||
# Create Store
|
||||
store = InMemoryStore()
|
||||
|
||||
# Save user information
|
||||
store.put(
|
||||
namespace=("users", "user-123"),
|
||||
key="preferences",
|
||||
value={
|
||||
"language": "en",
|
||||
"theme": "dark",
|
||||
"notifications": True
|
||||
}
|
||||
)
|
||||
|
||||
# Retrieve user information
|
||||
user_prefs = store.get(("users", "user-123"), "preferences")
|
||||
```
|
||||
|
||||
## Namespace
|
||||
|
||||
Namespaces are grouped by **tuples**:
|
||||
|
||||
```python
|
||||
# User information
|
||||
("users", user_id)
|
||||
|
||||
# Session information
|
||||
("sessions", session_id)
|
||||
|
||||
# Project information
|
||||
("projects", project_id, "documents")
|
||||
|
||||
# Hierarchical structure
|
||||
("organization", org_id, "department", dept_id)
|
||||
```
|
||||
|
||||
## Store Operations
|
||||
|
||||
### Save
|
||||
|
||||
```python
|
||||
store.put(
|
||||
namespace=("users", "alice"),
|
||||
key="profile",
|
||||
value={
|
||||
"name": "Alice",
|
||||
"email": "alice@example.com",
|
||||
"joined": "2024-01-01"
|
||||
}
|
||||
)
|
||||
```
|
||||
|
||||
### Retrieve
|
||||
|
||||
```python
|
||||
# Single item
|
||||
profile = store.get(("users", "alice"), "profile")
|
||||
|
||||
# All items in namespace
|
||||
items = store.search(("users", "alice"))
|
||||
```
|
||||
|
||||
### Search
|
||||
|
||||
```python
|
||||
# Filter by namespace
|
||||
all_users = store.search(("users",))
|
||||
|
||||
# Filter by key
|
||||
profiles = store.search(("users",), filter={"key": "profile"})
|
||||
```
|
||||
|
||||
### Delete
|
||||
|
||||
```python
|
||||
# Single item
|
||||
store.delete(("users", "alice"), "profile")
|
||||
|
||||
# Entire namespace
|
||||
store.delete_namespace(("users", "alice"))
|
||||
```
|
||||
|
||||
## Integration with Graph
|
||||
|
||||
```python
|
||||
from langgraph.store.memory import InMemoryStore
|
||||
|
||||
store = InMemoryStore()
|
||||
|
||||
# Integrate Store with graph
|
||||
graph = builder.compile(
|
||||
checkpointer=checkpointer,
|
||||
store=store
|
||||
)
|
||||
|
||||
# Use Store within nodes
|
||||
def personalized_node(state: State, *, store):
|
||||
user_id = state["user_id"]
|
||||
|
||||
# Get user preferences
|
||||
prefs = store.get(("users", user_id), "preferences")
|
||||
|
||||
# Process based on preferences
|
||||
if prefs and prefs.value.get("language") == "en":
|
||||
response = generate_english_response(state)
|
||||
else:
|
||||
response = generate_default_response(state)
|
||||
|
||||
return {"response": response}
|
||||
```
|
||||
|
||||
## Semantic Search
|
||||
|
||||
Store implementations with vector search capability:
|
||||
|
||||
```python
|
||||
from langgraph.store.memory import InMemoryStore
|
||||
|
||||
# Supply an embedding function (or model) and its dimensionality;
# `embedding_function` here is a placeholder for your own embedder
store = InMemoryStore(index={"embed": embedding_function, "dims": 1536})
|
||||
|
||||
# Save documents (automatically vectorized)
|
||||
store.put(
|
||||
("documents", "doc-1"),
|
||||
"content",
|
||||
{"text": "LangGraph is an agent framework"}
|
||||
)
|
||||
|
||||
# Semantic search
|
||||
results = store.search(
|
||||
("documents",),
|
||||
query="agent development"
|
||||
)
|
||||
```
|
||||
|
||||
## Practical Example: User Profile
|
||||
|
||||
```python
|
||||
class ProfileState(TypedDict):
|
||||
user_id: str
|
||||
messages: Annotated[list, add_messages]
|
||||
|
||||
def save_user_info(state: ProfileState, *, store):
|
||||
"""Extract and save user information from conversation"""
|
||||
messages = state["messages"]
|
||||
user_id = state["user_id"]
|
||||
|
||||
# Extract information with LLM
|
||||
info = extract_user_info(messages)
|
||||
|
||||
if info:
|
||||
# Save to Store
|
||||
current = store.get(("users", user_id), "profile")
|
||||
|
||||
if current:
|
||||
# Merge with existing information
|
||||
updated = {**current.value, **info}
|
||||
else:
|
||||
updated = info
|
||||
|
||||
store.put(
|
||||
("users", user_id),
|
||||
"profile",
|
||||
updated
|
||||
)
|
||||
|
||||
return {}
|
||||
|
||||
def personalized_response(state: ProfileState, *, store):
|
||||
"""Personalize using user information"""
|
||||
user_id = state["user_id"]
|
||||
|
||||
# Get user information
|
||||
profile = store.get(("users", user_id), "profile")
|
||||
|
||||
if profile:
|
||||
context = f"User context: {profile.value}"
|
||||
messages = [
|
||||
{"role": "system", "content": context},
|
||||
*state["messages"]
|
||||
]
|
||||
else:
|
||||
messages = state["messages"]
|
||||
|
||||
response = llm.invoke(messages)
|
||||
return {"messages": [response]}
|
||||
```
|
||||
|
||||
## Practical Example: Knowledge Base
|
||||
|
||||
```python
|
||||
def query_knowledge_base(state: State, *, store):
|
||||
"""Search for knowledge related to question"""
|
||||
query = state["messages"][-1].content
|
||||
|
||||
# Semantic search
|
||||
relevant_docs = store.search(
|
||||
("knowledge",),
|
||||
query=query,
|
||||
limit=3
|
||||
)
|
||||
|
||||
# Add relevant information to context
|
||||
context = "\n".join([
|
||||
doc.value["text"]
|
||||
for doc in relevant_docs
|
||||
])
|
||||
|
||||
# Pass to LLM
|
||||
response = llm.invoke([
|
||||
{"role": "system", "content": f"Context:\n{context}"},
|
||||
*state["messages"]
|
||||
])
|
||||
|
||||
return {"messages": [response]}
|
||||
```
|
||||
|
||||
## Store Implementations
|
||||
|
||||
### InMemoryStore
|
||||
|
||||
```python
|
||||
from langgraph.store.memory import InMemoryStore
|
||||
|
||||
store = InMemoryStore()
|
||||
```
|
||||
|
||||
### Custom Store
|
||||
|
||||
```python
|
||||
import json

from langgraph.store.base import BaseStore
|
||||
|
||||
class RedisStore(BaseStore):
|
||||
def __init__(self, redis_client):
|
||||
self.redis = redis_client
|
||||
|
||||
def put(self, namespace, key, value):
|
||||
ns_key = f"{':'.join(namespace)}:{key}"
|
||||
self.redis.set(ns_key, json.dumps(value))
|
||||
|
||||
def get(self, namespace, key):
|
||||
ns_key = f"{':'.join(namespace)}:{key}"
|
||||
data = self.redis.get(ns_key)
|
||||
return json.loads(data) if data else None
|
||||
|
||||
    def search(self, namespace, filter=None):
        # Simplified: load and deserialize every entry under the namespace
        pattern = f"{':'.join(namespace)}:*"
        keys = self.redis.keys(pattern)
        return [json.loads(self.redis.get(k)) for k in keys]
|
||||
```
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Namespace Design**: Hierarchical and meaningful structure
|
||||
2. **Key Naming**: Clear and consistent naming conventions
|
||||
3. **Data Size**: Store references only for large data (see the sketch below)
|
||||
4. **Cleanup**: Periodic deletion of old data
|
||||
|
||||
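
As a sketch of the third practice above, keep only a lightweight reference in the Store and put the payload in external storage; `upload_blob` and `download_blob` are hypothetical helpers standing in for whatever object storage you use:

```python
def save_report(store, user_id: str, report_bytes: bytes):
    # Hypothetical helper: push the large payload to object storage
    blob_uri = upload_blob(bucket="reports", data=report_bytes)

    # Keep only the small reference in the Store
    store.put(
        ("users", user_id),
        "latest_report",
        {"uri": blob_uri, "size": len(report_bytes)},
    )


def load_report(store, user_id: str) -> bytes:
    item = store.get(("users", user_id), "latest_report")
    # Hypothetical helper: fetch the payload back by its reference
    return download_blob(item.value["uri"])
```
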
## Summary
|
||||
|
||||
Store is long-term memory for sharing information across multiple threads. Use it for persisting user profiles, knowledge bases, settings, etc.
|
||||
|
||||
## Related Pages
|
||||
|
||||
- [03_memory_management_checkpointer.md](03_memory_management_checkpointer.md) - Differences from short-term memory
|
||||
- [03_memory_management_persistence.md](03_memory_management_persistence.md) - Persistence basics
|
||||
280
skills/langgraph-master/04_tool_integration_command_api.md
Normal file
@@ -0,0 +1,280 @@
|
||||
# Command API
|
||||
|
||||
An advanced API that integrates state updates and control flow.
|
||||
|
||||
## Overview
|
||||
|
||||
The Command API is a feature that allows nodes to specify **state updates** and **control flow** simultaneously.
|
||||
|
||||
## Basic Usage
|
||||
|
||||
```python
|
||||
from langgraph.types import Command
|
||||
|
||||
def decision_node(state: State) -> Command:
|
||||
"""Update state and specify the next node"""
|
||||
result = analyze(state["data"])
|
||||
|
||||
if result["confidence"] > 0.8:
|
||||
return Command(
|
||||
update={"result": result, "confident": True},
|
||||
goto="finalize"
|
||||
)
|
||||
else:
|
||||
return Command(
|
||||
update={"result": result, "confident": False},
|
||||
goto="review"
|
||||
)
|
||||
```
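
Because a node that returns `Command` has no outgoing edges declared for it, the possible targets can be listed in the return type annotation so the graph can validate and render them; a short sketch based on the example above:

```python
from typing import Literal

from langgraph.types import Command


def decision_node(state: State) -> Command[Literal["finalize", "review"]]:
    # The Literal targets tell LangGraph where this node may go,
    # since no explicit edges are added for Command-returning nodes
    result = analyze(state["data"])
    goto = "finalize" if result["confidence"] > 0.8 else "review"
    return Command(update={"result": result, "confident": goto == "finalize"}, goto=goto)
```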
|
||||
|
||||
## Command Object Parameters
|
||||
|
||||
```python
|
||||
Command(
|
||||
update: dict, # Updates to state
|
||||
goto: str | list[str], # Next node(s) (single or multiple)
|
||||
graph: str | None = None # For subgraph navigation
|
||||
)
|
||||
```
|
||||
|
||||
## vs Traditional State Updates
|
||||
|
||||
### Traditional Method
|
||||
|
||||
```python
|
||||
def node(state: State) -> dict:
|
||||
return {"result": "value"}
|
||||
|
||||
# Control flow in edges
|
||||
def route(state: State) -> str:
|
||||
if state["result"] == "value":
|
||||
return "next_node"
|
||||
return "other_node"
|
||||
|
||||
builder.add_conditional_edges("node", route, {...})
|
||||
```
|
||||
|
||||
### Command API
|
||||
|
||||
```python
|
||||
def node(state: State) -> Command:
|
||||
return Command(
|
||||
update={"result": "value"},
|
||||
goto="next_node" # Specify control flow as well
|
||||
)
|
||||
|
||||
# No edges needed (Command controls flow)
|
||||
```
|
||||
|
||||
## Advanced Patterns
|
||||
|
||||
### Pattern 1: Conditional Branching
|
||||
|
||||
```python
|
||||
def validator(state: State) -> Command:
|
||||
"""Validate and determine next node"""
|
||||
is_valid = validate(state["data"])
|
||||
|
||||
if is_valid:
|
||||
return Command(
|
||||
update={"valid": True},
|
||||
goto="process"
|
||||
)
|
||||
else:
|
||||
return Command(
|
||||
update={"valid": False, "errors": get_errors(state["data"])},
|
||||
goto="error_handler"
|
||||
)
|
||||
```
|
||||
|
||||
### Pattern 2: Parallel Execution
|
||||
|
||||
```python
|
||||
def fan_out_node(state: State) -> Command:
|
||||
"""Branch to multiple nodes in parallel"""
|
||||
return Command(
|
||||
update={"started": True},
|
||||
goto=["worker_a", "worker_b", "worker_c"] # Parallel execution
|
||||
)
|
||||
```
|
||||
|
||||
### Pattern 3: Loop Control
|
||||
|
||||
```python
|
||||
def iterator_node(state: State) -> Command:
|
||||
"""Iterative processing"""
|
||||
iteration = state.get("iteration", 0) + 1
|
||||
result = process_iteration(state["data"], iteration)
|
||||
|
||||
if iteration < state["max_iterations"] and not result["done"]:
|
||||
return Command(
|
||||
update={"iteration": iteration, "result": result},
|
||||
goto="iterator_node" # Loop back to self
|
||||
)
|
||||
else:
|
||||
return Command(
|
||||
update={"final_result": result},
|
||||
goto=END
|
||||
)
|
||||
```
|
||||
|
||||
### Pattern 4: Subgraph Navigation
|
||||
|
||||
```python
|
||||
def sub_node(state: State) -> Command:
|
||||
"""Navigate from subgraph to parent graph"""
|
||||
result = process(state["data"])
|
||||
|
||||
if need_parent_intervention(result):
|
||||
return Command(
|
||||
update={"sub_result": result},
|
||||
goto="parent_handler",
|
||||
graph=Command.PARENT # Navigate to parent graph
|
||||
)
|
||||
|
||||
return {"sub_result": result}
|
||||
```
|
||||
|
||||
## Integration with Tools
|
||||
|
||||
### Control After Tool Execution
|
||||
|
||||
```python
|
||||
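from langchain_core.messages import ToolMessage

# Assumes a name-to-tool mapping built elsewhere, e.g.
# tool_map = {t.name: t for t in tools}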
def tool_node_with_command(state: MessagesState) -> Command:
|
||||
"""Determine next action after tool execution"""
|
||||
last_message = state["messages"][-1]
|
||||
tool_results = []
|
||||
|
||||
for tool_call in last_message.tool_calls:
|
||||
tool = tool_map[tool_call["name"]]
|
||||
result = tool.invoke(tool_call["args"])
|
||||
|
||||
tool_results.append(
|
||||
ToolMessage(
|
||||
content=str(result),
|
||||
tool_call_id=tool_call["id"]
|
||||
)
|
||||
)
|
||||
|
||||
# Determine next node based on results
|
||||
if any("error" in r.content.lower() for r in tool_results):
|
||||
return Command(
|
||||
update={"messages": tool_results},
|
||||
goto="error_handler"
|
||||
)
|
||||
else:
|
||||
return Command(
|
||||
update={"messages": tool_results},
|
||||
goto="agent"
|
||||
)
|
||||
```
|
||||
|
||||
### Command from Within Tools
|
||||
|
||||
```python
|
||||
from langchain_core.tools import tool
from langgraph.types import interrupt
|
||||
|
||||
@tool
|
||||
def send_email(to: str, subject: str, body: str) -> str:
|
||||
"""Send email (with approval)"""
|
||||
|
||||
# Request approval
|
||||
approved = interrupt({
|
||||
"action": "send_email",
|
||||
"to": to,
|
||||
"subject": subject,
|
||||
"message": "Approve sending this email?"
|
||||
})
|
||||
|
||||
if approved:
|
||||
result = actually_send_email(to, subject, body)
|
||||
return f"Email sent to {to}"
|
||||
else:
|
||||
return "Email cancelled by user"
|
||||
```
|
||||
|
||||
## Dynamic Routing
|
||||
|
||||
```python
|
||||
def dynamic_router(state: State) -> Command:
|
||||
"""Dynamically select route based on state"""
|
||||
score = evaluate(state["data"])
|
||||
|
||||
# Select route based on score
|
||||
if score > 0.9:
|
||||
route = "expert_handler"
|
||||
elif score > 0.7:
|
||||
route = "standard_handler"
|
||||
else:
|
||||
route = "basic_handler"
|
||||
|
||||
return Command(
|
||||
update={"confidence_score": score},
|
||||
goto=route
|
||||
)
|
||||
```
|
||||
|
||||
## Error Recovery
|
||||
|
||||
```python
|
||||
def processor_with_fallback(state: State) -> Command:
|
||||
"""Fallback on error"""
|
||||
try:
|
||||
result = risky_operation(state["data"])
|
||||
|
||||
return Command(
|
||||
update={"result": result, "error": None},
|
||||
goto="success_handler"
|
||||
)
|
||||
|
||||
except Exception as e:
|
||||
return Command(
|
||||
update={"error": str(e)},
|
||||
goto="fallback_handler"
|
||||
)
|
||||
```
|
||||
|
||||
## State Machine Implementation
|
||||
|
||||
```python
|
||||
def state_machine_node(state: State) -> Command:
|
||||
"""State machine"""
|
||||
current_state = state.get("state", "initial")
|
||||
|
||||
transitions = {
|
||||
"initial": ("validate", {"state": "validating"}),
|
||||
"validating": ("process" if state.get("valid") else "error", {"state": "processing"}),
|
||||
"processing": ("finalize", {"state": "finalizing"}),
|
||||
"finalizing": (END, {"state": "done"})
|
||||
}
|
||||
|
||||
next_node, update = transitions[current_state]
|
||||
|
||||
return Command(
|
||||
update=update,
|
||||
goto=next_node
|
||||
)
|
||||
```
|
||||
|
||||
## Benefits
|
||||
|
||||
✅ **Conciseness**: Define state updates and control flow in one place
|
||||
✅ **Readability**: Node intent is clear
|
||||
✅ **Flexibility**: Dynamic routing is easier
|
||||
✅ **Debugging**: Control flow is easier to track
|
||||
|
||||
## Considerations
|
||||
|
||||
⚠️ **Complexity**: Avoid overly complex conditional branching
|
||||
⚠️ **Testing**: All branches need to be tested
|
||||
⚠️ **Parallel Execution**: Order of parallel nodes is non-deterministic
|
||||
|
||||
## Summary
|
||||
|
||||
The Command API integrates state updates and control flow, enabling more flexible and readable graph construction.
|
||||
|
||||
## Related Pages
|
||||
|
||||
- [01_core_concepts_node.md](01_core_concepts_node.md) - Node basics
|
||||
- [01_core_concepts_edge.md](01_core_concepts_edge.md) - Comparison with edges
|
||||
- [02_graph_architecture_subgraph.md](02_graph_architecture_subgraph.md) - Subgraph navigation
|
||||
158
skills/langgraph-master/04_tool_integration_overview.md
Normal file
@@ -0,0 +1,158 @@
|
||||
# 04. Tool Integration
|
||||
|
||||
Integration and execution control of external tools.
|
||||
|
||||
## Overview
|
||||
|
||||
In LangGraph, LLMs can interact with external systems by calling **tools**. Tools provide various capabilities such as search, calculation, API calls, and more.
|
||||
|
||||
## Key Components
|
||||
|
||||
### 1. [Tool Definition](04_tool_integration_tool_definition.md)
|
||||
|
||||
How to define tools:
|
||||
- `@tool` decorator
|
||||
- Function descriptions and parameters
|
||||
- Structured output
|
||||
|
||||
### 2. [Tool Node](04_tool_integration_tool_node.md)
|
||||
|
||||
Nodes that execute tools:
|
||||
- Using `ToolNode`
|
||||
- Error handling
|
||||
- Custom tool nodes
|
||||
|
||||
### 3. [Command API](04_tool_integration_command_api.md)
|
||||
|
||||
Controlling tool execution:
|
||||
- Integration of state updates and control flow
|
||||
- Transition control from tools
|
||||
|
||||
## Basic Implementation
|
||||
|
||||
```python
|
||||
from langchain_core.tools import tool
|
||||
from langgraph.prebuilt import ToolNode
|
||||
from langgraph.graph import END, START, MessagesState, StateGraph
|
||||
|
||||
# 1. Define tools
|
||||
@tool
|
||||
def search(query: str) -> str:
|
||||
"""Perform a web search.
|
||||
|
||||
Args:
|
||||
query: Search query
|
||||
"""
|
||||
return perform_search(query)
|
||||
|
||||
@tool
|
||||
def calculator(expression: str) -> float:
|
||||
"""Calculate a mathematical expression.
|
||||
|
||||
Args:
|
||||
expression: Expression to calculate (e.g., "2 + 2")
|
||||
"""
|
||||
    return eval(expression)  # Note: eval is unsafe for untrusted input; use a proper expression parser in production
|
||||
|
||||
tools = [search, calculator]
|
||||
|
||||
# 2. Bind tools to LLM
|
||||
llm_with_tools = llm.bind_tools(tools)
|
||||
|
||||
# 3. Agent node
|
||||
def agent(state: MessagesState):
|
||||
response = llm_with_tools.invoke(state["messages"])
|
||||
return {"messages": [response]}
|
||||
|
||||
# 4. Tool node
|
||||
tool_node = ToolNode(tools)
|
||||
|
||||
# 5. Build graph
|
||||
builder = StateGraph(MessagesState)
|
||||
builder.add_node("agent", agent)
|
||||
builder.add_node("tools", tool_node)
|
||||
|
||||
# 6. Conditional edges
|
||||
def should_continue(state: MessagesState):
|
||||
last_message = state["messages"][-1]
|
||||
if last_message.tool_calls:
|
||||
return "tools"
|
||||
return END
|
||||
|
||||
builder.add_edge(START, "agent")
|
||||
builder.add_conditional_edges("agent", should_continue)
|
||||
builder.add_edge("tools", "agent")
|
||||
|
||||
graph = builder.compile()
|
||||
```
|
||||
|
||||
## Types of Tools
|
||||
|
||||
### Search Tools
|
||||
|
||||
```python
|
||||
@tool
|
||||
def web_search(query: str) -> str:
|
||||
"""Search the web"""
|
||||
return search_api(query)
|
||||
```
|
||||
|
||||
### Calculator Tools
|
||||
|
||||
```python
|
||||
@tool
|
||||
def calculator(expression: str) -> float:
|
||||
"""Calculate a mathematical expression"""
|
||||
    return eval(expression)  # Demo only: eval is unsafe for untrusted input
|
||||
```
|
||||
|
||||
### API Tools
|
||||
|
||||
```python
|
||||
@tool
|
||||
def get_weather(city: str) -> dict:
|
||||
"""Get weather information"""
|
||||
return weather_api(city)
|
||||
```
|
||||
|
||||
### Database Tools
|
||||
|
||||
```python
|
||||
@tool
|
||||
def query_database(sql: str) -> list[dict]:
|
||||
"""Query the database"""
|
||||
return execute_sql(sql)
|
||||
```
|
||||
|
||||
## Tool Execution Flow
|
||||
|
||||
```
|
||||
User Query
|
||||
↓
|
||||
[Agent Node]
|
||||
↓
|
||||
LLM decides: Use tool?
|
||||
↓ Yes
|
||||
[Tool Node] ← Execute tool
|
||||
↓
|
||||
[Agent Node] ← Tool result
|
||||
↓
|
||||
LLM decides: Continue?
|
||||
↓ No
|
||||
Final Answer
|
||||
```
|
||||
|
||||
## Key Principles
|
||||
|
||||
1. **Clear Descriptions**: Write detailed docstrings for tools
|
||||
2. **Error Handling**: Handle tool execution errors appropriately
|
||||
3. **Type Safety**: Explicitly specify parameter types
|
||||
4. **Approval Flow**: Incorporate Human-in-the-Loop for critical tools (see the sketch after this list)
|
||||
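A compact sketch that applies these principles to a single hypothetical tool (`execute_transfer` is an assumed backend helper, not a LangGraph API):

```python
from langchain_core.tools import tool
from langgraph.types import interrupt


@tool
def transfer_funds(account_id: str, amount: float) -> str:
    """Transfer funds from the specified account.

    Args:
        account_id: Identifier of the source account
        amount: Amount to transfer, in the account's currency
    """
    # Principles 1-3: single responsibility, typed parameters, explicit errors
    if amount <= 0:
        raise ValueError("amount must be positive")

    # Principle 4: critical action, so request human approval first
    approved = interrupt({"action": "transfer_funds", "amount": amount})
    if not approved:
        return "Transfer cancelled by user"

    return execute_transfer(account_id, amount)  # assumed backend call
```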
|
||||
## Next Steps
|
||||
|
||||
For details on each component, please refer to the following pages:
|
||||
|
||||
- [04_tool_integration_tool_definition.md](04_tool_integration_tool_definition.md) - How to define tools
|
||||
- [04_tool_integration_tool_node.md](04_tool_integration_tool_node.md) - Tool node implementation
|
||||
- [04_tool_integration_command_api.md](04_tool_integration_command_api.md) - Using the Command API
|
||||
227
skills/langgraph-master/04_tool_integration_tool_definition.md
Normal file
@@ -0,0 +1,227 @@
|
||||
# Tool Definition
|
||||
|
||||
How to define tools and design patterns.
|
||||
|
||||
## Basic Definition
|
||||
|
||||
```python
|
||||
from langchain_core.tools import tool
|
||||
|
||||
@tool
|
||||
def search(query: str) -> str:
|
||||
"""Perform a web search.
|
||||
|
||||
Args:
|
||||
query: Search query
|
||||
"""
|
||||
return perform_search(query)
|
||||
```
|
||||
|
||||
## Key Elements
|
||||
|
||||
### 1. Docstring
|
||||
|
||||
A description that helps the LLM understand what the tool does:
|
||||
|
||||
```python
|
||||
@tool
|
||||
def get_weather(location: str, unit: str = "celsius") -> str:
|
||||
"""Get the current weather for a specified location.
|
||||
|
||||
This tool provides up-to-date weather information for cities around the world.
|
||||
It includes detailed information such as temperature, humidity, and weather conditions.
|
||||
|
||||
Args:
|
||||
location: City name (e.g., "Tokyo", "New York", "London")
|
||||
unit: Temperature unit ("celsius" or "fahrenheit"), default is "celsius"
|
||||
|
||||
Returns:
|
||||
A string containing weather information
|
||||
|
||||
Examples:
|
||||
>>> get_weather("Tokyo")
|
||||
"Tokyo weather: Sunny, Temperature: 25°C, Humidity: 60%"
|
||||
"""
|
||||
return fetch_weather(location, unit)
|
||||
```
|
||||
|
||||
### 2. Type Annotations
|
||||
|
||||
Explicitly specify parameter and return value types:
|
||||
|
||||
```python
|
||||
from typing import Any, Dict, List
|
||||
|
||||
@tool
|
||||
def search_products(
|
||||
query: str,
|
||||
max_results: int = 10,
|
||||
category: str | None = None
|
||||
) -> List[Dict[str, Any]]:
|
||||
"""Search for products.
|
||||
|
||||
Args:
|
||||
query: Search keywords
|
||||
max_results: Maximum number of results
|
||||
category: Category filter (optional)
|
||||
"""
|
||||
return database.search(query, max_results, category)
|
||||
```
|
||||
|
||||
## Structured Output
|
||||
|
||||
Structured output using Pydantic models:
|
||||
|
||||
```python
|
||||
from pydantic import BaseModel, Field
|
||||
|
||||
class WeatherInfo(BaseModel):
|
||||
temperature: float = Field(description="Temperature in Celsius")
|
||||
humidity: int = Field(description="Humidity (%)")
|
||||
condition: str = Field(description="Weather condition")
|
||||
location: str = Field(description="Location")
|
||||
|
||||
@tool(response_format="content_and_artifact")
|
||||
def get_detailed_weather(location: str) -> tuple[str, WeatherInfo]:
|
||||
"""Get detailed weather information.
|
||||
|
||||
Args:
|
||||
location: City name
|
||||
"""
|
||||
data = fetch_weather_data(location)
|
||||
|
||||
weather = WeatherInfo(
|
||||
temperature=data["temp"],
|
||||
humidity=data["humidity"],
|
||||
condition=data["condition"],
|
||||
location=location
|
||||
)
|
||||
|
||||
summary = f"{location} weather: {weather.condition}, {weather.temperature}°C"
|
||||
|
||||
return summary, weather
|
||||
```
|
||||
|
||||
## Best Practices for Tool Design
|
||||
|
||||
### 1. Single Responsibility
|
||||
|
||||
```python
|
||||
# Good: Does one thing well
|
||||
@tool
|
||||
def send_email(to: str, subject: str, body: str) -> str:
|
||||
"""Send an email"""
|
||||
|
||||
# Bad: Multiple responsibilities
|
||||
@tool
|
||||
def send_and_log_email(to: str, subject: str, body: str, log_file: str) -> str:
|
||||
"""Send an email and log it"""
|
||||
# Two different responsibilities
|
||||
```
|
||||
|
||||
### 2. Clear Parameters
|
||||
|
||||
```python
|
||||
# Good: Clear parameters
|
||||
@tool
|
||||
def book_meeting(
|
||||
title: str,
|
||||
start_time: str, # "2024-01-01 10:00"
|
||||
duration_minutes: int,
|
||||
attendees: List[str]
|
||||
) -> str:
|
||||
"""Book a meeting"""
|
||||
|
||||
# Bad: Ambiguous parameters
|
||||
@tool
|
||||
def book_meeting(data: dict) -> str:
|
||||
"""Book a meeting"""
|
||||
```
|
||||
|
||||
### 3. Error Handling
|
||||
|
||||
```python
|
||||
@tool
|
||||
def divide(a: float, b: float) -> float:
|
||||
"""Divide two numbers.
|
||||
|
||||
Args:
|
||||
a: Dividend
|
||||
b: Divisor
|
||||
|
||||
Raises:
|
||||
ValueError: If b is 0
|
||||
"""
|
||||
if b == 0:
|
||||
raise ValueError("Cannot divide by zero")
|
||||
|
||||
return a / b
|
||||
```
|
||||
|
||||
## Dynamic Tool Generation
|
||||
|
||||
Automatically generate tools from API schemas:
|
||||
|
||||
```python
import requests
from langchain_core.tools import tool


def create_api_tool(endpoint: str, method: str, description: str):
    """Generate tools from API specifications"""

    def api_tool(**kwargs) -> dict:
        response = requests.request(
            method=method,
            url=endpoint,
            json=kwargs
        )
        return response.json()

    # An f-string at the top of a function body is not a docstring, so set
    # the description explicitly before wrapping the function as a tool.
    # Note: a **kwargs-only signature gives the LLM no argument schema;
    # generate explicit parameters from the API spec where possible.
    api_tool.__doc__ = (
        f"{description}\n\n"
        f"API Endpoint: {endpoint}\n"
        f"Method: {method}"
    )
    return tool(api_tool)


# Example usage
create_user_tool = create_api_tool(
    endpoint="https://api.example.com/users",
    method="POST",
    description="Create a new user"
)
```
|
||||
|
||||
## Grouping Tools
|
||||
|
||||
Group related tools together:
|
||||
|
||||
```python
|
||||
# Database tool group
|
||||
database_tools = [
|
||||
query_users_tool,
|
||||
update_user_tool,
|
||||
delete_user_tool
|
||||
]
|
||||
|
||||
# Search tool group
|
||||
search_tools = [
|
||||
web_search_tool,
|
||||
image_search_tool,
|
||||
news_search_tool
|
||||
]
|
||||
|
||||
# Select based on context
|
||||
if user.role == "admin":
|
||||
tools = database_tools + search_tools
|
||||
else:
|
||||
tools = search_tools
|
||||
```
|
||||
|
||||
## Summary
|
||||
|
||||
Tool definitions require clear and detailed docstrings, appropriate type annotations, and error handling.
|
||||
|
||||
## Related Pages
|
||||
|
||||
- [04_tool_integration_tool_node.md](04_tool_integration_tool_node.md) - Using tools in tool nodes
|
||||
- [04_tool_integration_command_api.md](04_tool_integration_command_api.md) - Integration with Command API
|
||||
318
skills/langgraph-master/04_tool_integration_tool_node.md
Normal file
@@ -0,0 +1,318 @@
|
||||
# Tool Node
|
||||
|
||||
Implementation of nodes that execute tools.
|
||||
|
||||
## ToolNode (Built-in)
|
||||
|
||||
The simplest approach:
|
||||
|
||||
```python
|
||||
from langgraph.prebuilt import ToolNode
|
||||
|
||||
tools = [search_tool, calculator_tool]
|
||||
tool_node = ToolNode(tools)
|
||||
|
||||
# Add to graph
|
||||
builder.add_node("tools", tool_node)
|
||||
```
|
||||
|
||||
## How It Works
|
||||
|
||||
ToolNode:
|
||||
1. Extracts `tool_calls` from the last message
|
||||
2. Executes each tool
|
||||
3. Returns results as `ToolMessage`
|
||||
|
||||
```python
|
||||
# Input
|
||||
{
|
||||
"messages": [
|
||||
AIMessage(tool_calls=[
|
||||
{"name": "search", "args": {"query": "weather"}, "id": "1"}
|
||||
])
|
||||
]
|
||||
}
|
||||
|
||||
# ToolNode execution
|
||||
|
||||
# Output
|
||||
{
|
||||
"messages": [
|
||||
ToolMessage(
|
||||
content="Sunny, 25°C",
|
||||
tool_call_id="1"
|
||||
)
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
## Custom Tool Node
|
||||
|
||||
For finer control:
|
||||
|
||||
```python
|
||||
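from langchain_core.messages import ToolMessage

# tool_map maps tool names to tool objects, e.g. {t.name: t for t in tools}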
def custom_tool_node(state: MessagesState):
|
||||
"""Custom tool node"""
|
||||
last_message = state["messages"][-1]
|
||||
tool_results = []
|
||||
|
||||
for tool_call in last_message.tool_calls:
|
||||
# Find the tool
|
||||
tool = tool_map.get(tool_call["name"])
|
||||
|
||||
if not tool:
|
||||
result = f"Tool {tool_call['name']} not found"
|
||||
else:
|
||||
try:
|
||||
# Execute the tool
|
||||
result = tool.invoke(tool_call["args"])
|
||||
except Exception as e:
|
||||
result = f"Error: {str(e)}"
|
||||
|
||||
# Create ToolMessage
|
||||
tool_results.append(
|
||||
ToolMessage(
|
||||
content=str(result),
|
||||
tool_call_id=tool_call["id"]
|
||||
)
|
||||
)
|
||||
|
||||
return {"messages": tool_results}
|
||||
```
|
||||
|
||||
## Error Handling
|
||||
|
||||
### Basic Error Handling
|
||||
|
||||
```python
|
||||
def robust_tool_node(state: MessagesState):
|
||||
"""Tool node with error handling"""
|
||||
last_message = state["messages"][-1]
|
||||
tool_results = []
|
||||
|
||||
for tool_call in last_message.tool_calls:
|
||||
try:
|
||||
tool = tool_map[tool_call["name"]]
|
||||
result = tool.invoke(tool_call["args"])
|
||||
|
||||
tool_results.append(
|
||||
ToolMessage(
|
||||
content=str(result),
|
||||
tool_call_id=tool_call["id"]
|
||||
)
|
||||
)
|
||||
|
||||
except KeyError:
|
||||
# Tool not found
|
||||
tool_results.append(
|
||||
ToolMessage(
|
||||
content=f"Error: Tool '{tool_call['name']}' not found",
|
||||
tool_call_id=tool_call["id"]
|
||||
)
|
||||
)
|
||||
|
||||
except Exception as e:
|
||||
# Execution error
|
||||
tool_results.append(
|
||||
ToolMessage(
|
||||
content=f"Error executing tool: {str(e)}",
|
||||
tool_call_id=tool_call["id"]
|
||||
)
|
||||
)
|
||||
|
||||
return {"messages": tool_results}
|
||||
```
|
||||
|
||||
### Retry Logic
|
||||
|
||||
```python
|
||||
import time
|
||||
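# "TransientError" below is a placeholder for whatever retryable exception
# your tools raise (e.g. a rate-limit or timeout error)
class TransientError(Exception):
    pass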
|
||||
def tool_node_with_retry(state: MessagesState, max_retries: int = 3):
|
||||
"""Tool node with retry"""
|
||||
last_message = state["messages"][-1]
|
||||
tool_results = []
|
||||
|
||||
for tool_call in last_message.tool_calls:
|
||||
tool = tool_map[tool_call["name"]]
|
||||
retry_count = 0
|
||||
|
||||
while retry_count < max_retries:
|
||||
try:
|
||||
result = tool.invoke(tool_call["args"])
|
||||
|
||||
tool_results.append(
|
||||
ToolMessage(
|
||||
content=str(result),
|
||||
tool_call_id=tool_call["id"]
|
||||
)
|
||||
)
|
||||
break
|
||||
|
||||
except TransientError as e:
|
||||
retry_count += 1
|
||||
if retry_count >= max_retries:
|
||||
tool_results.append(
|
||||
ToolMessage(
|
||||
content=f"Failed after {max_retries} retries: {str(e)}",
|
||||
tool_call_id=tool_call["id"]
|
||||
)
|
||||
)
|
||||
else:
|
||||
time.sleep(2 ** retry_count) # Exponential backoff
|
||||
|
||||
except Exception as e:
|
||||
# Non-retryable error
|
||||
tool_results.append(
|
||||
ToolMessage(
|
||||
content=f"Error: {str(e)}",
|
||||
tool_call_id=tool_call["id"]
|
||||
)
|
||||
)
|
||||
break
|
||||
|
||||
return {"messages": tool_results}
|
||||
```
|
||||
|
||||
## Conditional Tool Execution
|
||||
|
||||
```python
|
||||
def conditional_tool_node(state: MessagesState, *, store):
|
||||
"""Tool node with permission checking"""
|
||||
user_id = state.get("user_id")
|
||||
user = store.get(("users", user_id), "profile")
|
||||
|
||||
last_message = state["messages"][-1]
|
||||
tool_results = []
|
||||
|
||||
for tool_call in last_message.tool_calls:
|
||||
tool = tool_map[tool_call["name"]]
|
||||
|
||||
# Permission check
|
||||
if not has_permission(user, tool.name):
|
||||
tool_results.append(
|
||||
ToolMessage(
|
||||
content=f"Permission denied for tool '{tool.name}'",
|
||||
tool_call_id=tool_call["id"]
|
||||
)
|
||||
)
|
||||
continue
|
||||
|
||||
# Execute
|
||||
result = tool.invoke(tool_call["args"])
|
||||
tool_results.append(
|
||||
ToolMessage(
|
||||
content=str(result),
|
||||
tool_call_id=tool_call["id"]
|
||||
)
|
||||
)
|
||||
|
||||
return {"messages": tool_results}
|
||||
```
|
||||
|
||||
## Logging Tool Execution
|
||||
|
||||
```python
|
||||
import logging
import time
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
def logged_tool_node(state: MessagesState):
|
||||
"""Tool node with logging"""
|
||||
last_message = state["messages"][-1]
|
||||
tool_results = []
|
||||
|
||||
for tool_call in last_message.tool_calls:
|
||||
tool = tool_map[tool_call["name"]]
|
||||
|
||||
logger.info(
|
||||
f"Executing tool: {tool.name}",
|
||||
extra={
|
||||
"tool": tool.name,
|
||||
"args": tool_call["args"],
|
||||
"call_id": tool_call["id"]
|
||||
}
|
||||
)
|
||||
|
||||
try:
|
||||
start = time.time()
|
||||
result = tool.invoke(tool_call["args"])
|
||||
duration = time.time() - start
|
||||
|
||||
logger.info(
|
||||
f"Tool completed: {tool.name}",
|
||||
extra={
|
||||
"tool": tool.name,
|
||||
"duration": duration,
|
||||
"success": True
|
||||
}
|
||||
)
|
||||
|
||||
tool_results.append(
|
||||
ToolMessage(
|
||||
content=str(result),
|
||||
tool_call_id=tool_call["id"]
|
||||
)
|
||||
)
|
||||
|
||||
except Exception as e:
|
||||
logger.error(
|
||||
f"Tool failed: {tool.name}",
|
||||
extra={
|
||||
"tool": tool.name,
|
||||
"error": str(e)
|
||||
},
|
||||
exc_info=True
|
||||
)
|
||||
|
||||
tool_results.append(
|
||||
ToolMessage(
|
||||
content=f"Error: {str(e)}",
|
||||
tool_call_id=tool_call["id"]
|
||||
)
|
||||
)
|
||||
|
||||
return {"messages": tool_results}
|
||||
```
|
||||
|
||||
## Parallel Tool Execution
|
||||
|
||||
```python
|
||||
from concurrent.futures import ThreadPoolExecutor
|
||||
|
||||
def parallel_tool_node(state: MessagesState):
|
||||
"""Execute tools in parallel"""
|
||||
last_message = state["messages"][-1]
|
||||
|
||||
def execute_tool(tool_call):
|
||||
tool = tool_map[tool_call["name"]]
|
||||
try:
|
||||
result = tool.invoke(tool_call["args"])
|
||||
return ToolMessage(
|
||||
content=str(result),
|
||||
tool_call_id=tool_call["id"]
|
||||
)
|
||||
except Exception as e:
|
||||
return ToolMessage(
|
||||
content=f"Error: {str(e)}",
|
||||
tool_call_id=tool_call["id"]
|
||||
)
|
||||
|
||||
with ThreadPoolExecutor(max_workers=5) as executor:
|
||||
tool_results = list(executor.map(
|
||||
execute_tool,
|
||||
last_message.tool_calls
|
||||
))
|
||||
|
||||
return {"messages": tool_results}
|
||||
```
|
||||
|
||||
## Summary
|
||||
|
||||
ToolNode executes tools and returns results as ToolMessage. You can add error handling, permission checks, logging, and more.
|
||||
|
||||
## Related Pages
|
||||
|
||||
- [04_tool_integration_tool_definition.md](04_tool_integration_tool_definition.md) - Tool definition
|
||||
- [04_tool_integration_command_api.md](04_tool_integration_command_api.md) - Integration with Command API
|
||||
- [05_advanced_features_human_in_the_loop.md](05_advanced_features_human_in_the_loop.md) - Combining with approval flows
|
||||
@@ -0,0 +1,289 @@
|
||||
# Human-in-the-Loop (Approval Flow)
|
||||
|
||||
A feature to pause graph execution and request human intervention.
|
||||
|
||||
## Overview
|
||||
|
||||
Human-in-the-Loop is a feature that requests **human approval or input** before important decisions or actions.
|
||||
|
||||
## Dynamic Interrupt (Recommended)
|
||||
|
||||
### Basic Usage
|
||||
|
||||
```python
|
||||
from langgraph.types import interrupt
|
||||
|
||||
def approval_node(state: State):
|
||||
"""Request approval"""
|
||||
approved = interrupt("Do you approve this action?")
|
||||
|
||||
if approved:
|
||||
return {"status": "approved"}
|
||||
else:
|
||||
return {"status": "rejected"}
|
||||
```
|
||||
|
||||
### Execution
|
||||
|
||||
```python
from langgraph.types import Command

# Initial execution (stops at interrupt)
result = graph.invoke(input, config)

# Check interrupt information
print(result["__interrupt__"])  # [Interrupt(value='Do you approve this action?', ...)]

# Approve and resume
graph.invoke(Command(resume=True), config)

# Or reject
graph.invoke(Command(resume=False), config)
```
|
||||
|
||||
## Application Patterns
|
||||
|
||||
### Pattern 1: Approve or Reject
|
||||
|
||||
```python
|
||||
def action_approval(state: State):
|
||||
"""Approval before action execution"""
|
||||
action_details = prepare_action(state)
|
||||
|
||||
approved = interrupt({
|
||||
"question": "Approve this action?",
|
||||
"details": action_details
|
||||
})
|
||||
|
||||
if approved:
|
||||
result = execute_action(action_details)
|
||||
return {"result": result, "approved": True}
|
||||
else:
|
||||
return {"result": None, "approved": False}
|
||||
```
|
||||
|
||||
### Pattern 2: Editable Approval
|
||||
|
||||
```python
|
||||
def review_and_edit(state: State):
|
||||
"""Review and edit generated content"""
|
||||
generated = generate_content(state)
|
||||
|
||||
edited_content = interrupt({
|
||||
"instruction": "Review and edit this content",
|
||||
"content": generated
|
||||
})
|
||||
|
||||
return {"final_content": edited_content}
|
||||
|
||||
# Resume with edited version
|
||||
graph.invoke(Command(resume=edited_version), config)  # Command from langgraph.types
|
||||
```
|
||||
|
||||
### Pattern 3: Tool Execution Approval
|
||||
|
||||
```python
|
||||
@tool
|
||||
def send_email(to: str, subject: str, body: str):
|
||||
"""Send email (with approval)"""
|
||||
response = interrupt({
|
||||
"action": "send_email",
|
||||
"to": to,
|
||||
"subject": subject,
|
||||
"body": body,
|
||||
"message": "Approve sending this email?"
|
||||
})
|
||||
|
||||
if response.get("action") == "approve":
|
||||
# When approved, parameters can also be edited
|
||||
final_to = response.get("to", to)
|
||||
final_subject = response.get("subject", subject)
|
||||
final_body = response.get("body", body)
|
||||
|
||||
return actually_send_email(final_to, final_subject, final_body)
|
||||
else:
|
||||
return "Email cancelled by user"
|
||||
```
|
||||
|
||||
### Pattern 4: Input Validation Loop
|
||||
|
||||
```python
|
||||
def get_valid_input(state: State):
|
||||
"""Loop until valid input is obtained"""
|
||||
prompt = "Enter a positive number:"
|
||||
|
||||
while True:
|
||||
answer = interrupt(prompt)
|
||||
|
||||
if isinstance(answer, (int, float)) and answer > 0:
|
||||
break
|
||||
|
||||
prompt = f"'{answer}' is invalid. Enter a positive number:"
|
||||
|
||||
return {"value": answer}
|
||||
```
|
||||
|
||||
## Static Interrupt (For Debugging)
|
||||
|
||||
Set breakpoints at compile time:
|
||||
|
||||
```python
|
||||
graph = builder.compile(
|
||||
checkpointer=checkpointer,
|
||||
interrupt_before=["risky_node"], # Stop before node execution
|
||||
interrupt_after=["generate_content"] # Stop after node execution
|
||||
)
|
||||
|
||||
# Execute (stops before specified node)
|
||||
graph.invoke(input, config)
|
||||
|
||||
# Check state
|
||||
state = graph.get_state(config)
|
||||
|
||||
# Resume
|
||||
graph.invoke(None, config)
|
||||
```
|
||||
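While paused at a breakpoint, the saved state can also be inspected and patched before resuming. A minimal sketch (the corrected field is illustrative; the graph must be compiled with a checkpointer):

```python
# Inspect the paused execution
snapshot = graph.get_state(config)
print(snapshot.next)    # Nodes scheduled to run next, e.g. ('risky_node',)
print(snapshot.values)  # Current state values

# Optionally patch the state before continuing
graph.update_state(config, {"data": "corrected value"})

# Resume from the breakpoint
graph.invoke(None, config)
```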
|
||||
## Practical Example: Multi-Stage Approval Workflow
|
||||
|
||||
```python
|
||||
from typing import TypedDict

from langgraph.graph import StateGraph, START, END
from langgraph.types import interrupt, Command
|
||||
|
||||
class ApprovalState(TypedDict):
|
||||
request: str
|
||||
draft: str
|
||||
reviewed: str
|
||||
approved: bool
|
||||
|
||||
def draft_node(state: ApprovalState):
|
||||
"""Create draft"""
|
||||
draft = create_draft(state["request"])
|
||||
return {"draft": draft}
|
||||
|
||||
def review_node(state: ApprovalState):
|
||||
"""Review and edit"""
|
||||
reviewed = interrupt({
|
||||
"type": "review",
|
||||
"content": state["draft"],
|
||||
"instruction": "Review and improve the draft"
|
||||
})
|
||||
|
||||
return {"reviewed": reviewed}
|
||||
|
||||
def approval_node(state: ApprovalState):
|
||||
"""Final approval"""
|
||||
approved = interrupt({
|
||||
"type": "approval",
|
||||
"content": state["reviewed"],
|
||||
"question": "Approve for publication?"
|
||||
})
|
||||
|
||||
if approved:
|
||||
return Command(
|
||||
update={"approved": True},
|
||||
goto="publish"
|
||||
)
|
||||
else:
|
||||
return Command(
|
||||
update={"approved": False},
|
||||
goto="draft" # Return to draft
|
||||
)
|
||||
|
||||
def publish_node(state: ApprovalState):
|
||||
"""Publish"""
|
||||
publish(state["reviewed"])
|
||||
return {"status": "published"}
|
||||
|
||||
# Build graph
builder = StateGraph(ApprovalState)
|
||||
builder.add_node("draft", draft_node)
|
||||
builder.add_node("review", review_node)
|
||||
builder.add_node("approval", approval_node)
|
||||
builder.add_node("publish", publish_node)
|
||||
|
||||
builder.add_edge(START, "draft")
|
||||
builder.add_edge("draft", "review")
|
||||
builder.add_edge("review", "approval")
|
||||
# approval node determines control flow with Command
|
||||
builder.add_edge("publish", END)
|
||||
```
|
||||
|
||||
## Important Rules
|
||||
|
||||
### ✅ Recommendations
|
||||
|
||||
- Pass values in JSON format
|
||||
- Keep `interrupt()` call order consistent
|
||||
- Make processing before `interrupt()` idempotent (see the sketch after these lists)
|
||||
|
||||
### ❌ Prohibitions
|
||||
|
||||
- Don't catch `interrupt()` with `try-except`
|
||||
- Don't skip `interrupt()` conditionally
|
||||
- Don't pass non-serializable objects
|
||||
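A sketch of why work before `interrupt()` should be idempotent: when the graph resumes, the interrupted node re-executes from its first line, so any side effect placed before the interrupt runs again (`build_summary` and `send_notification` are assumed helpers):

```python
from langgraph.types import interrupt


def notify_and_confirm(state: State):
    # Safe before the interrupt: pure computation, harmless to re-run on resume
    summary = build_summary(state["data"])

    # NOT safe before the interrupt: emails, payments, non-idempotent writes,
    # because they would fire a second time when the node is resumed

    approved = interrupt({"summary": summary, "question": "Proceed?"})

    if approved:
        send_notification(state["user_id"])  # side effect after the interrupt
        return {"status": "sent"}
    return {"status": "cancelled"}
```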
|
||||
## Use Cases
|
||||
|
||||
### 1. High-Risk Operation Approval
|
||||
|
||||
```python
|
||||
def delete_data(state: State):
|
||||
"""Delete data (approval required)"""
|
||||
approved = interrupt({
|
||||
"action": "delete_data",
|
||||
"warning": "This cannot be undone!",
|
||||
"data_count": len(state["data_to_delete"])
|
||||
})
|
||||
|
||||
if approved:
|
||||
execute_delete(state["data_to_delete"])
|
||||
return {"deleted": True}
|
||||
return {"deleted": False}
|
||||
```
|
||||
|
||||
### 2. Creative Work Review
|
||||
|
||||
```python
|
||||
def creative_generation(state: State):
|
||||
"""Creative content generation and review"""
|
||||
versions = []
|
||||
|
||||
for _ in range(3):
|
||||
version = generate_creative(state["prompt"])
|
||||
versions.append(version)
|
||||
|
||||
selected = interrupt({
|
||||
"type": "select_version",
|
||||
"versions": versions,
|
||||
"instruction": "Select the best version or request regeneration"
|
||||
})
|
||||
|
||||
return {"final_version": selected}
|
||||
```
|
||||
|
||||
### 3. Incremental Data Input
|
||||
|
||||
```python
|
||||
def collect_user_info(state: State):
|
||||
"""Collect user information incrementally"""
|
||||
name = interrupt("What is your name?")
|
||||
|
||||
age = interrupt(f"Hello {name}, what is your age?")
|
||||
|
||||
city = interrupt("What city do you live in?")
|
||||
|
||||
return {
|
||||
"user_info": {
|
||||
"name": name,
|
||||
"age": age,
|
||||
"city": city
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Summary
|
||||
|
||||
Human-in-the-Loop is a feature for incorporating human judgment in important decisions. Dynamic interrupt is flexible and recommended.
|
||||
|
||||
## Related Pages
|
||||
|
||||
- [03_memory_management_persistence.md](03_memory_management_persistence.md) - Checkpointer is required
|
||||
- [02_graph_architecture_agent.md](02_graph_architecture_agent.md) - Combination with agents
|
||||
- [04_tool_integration_tool_node.md](04_tool_integration_tool_node.md) - Approval before tool execution
|
||||
283
skills/langgraph-master/05_advanced_features_map_reduce.md
Normal file
@@ -0,0 +1,283 @@
|
||||
# Map-Reduce (Parallel Processing Pattern)
|
||||
|
||||
A pattern for parallel processing and aggregation of large datasets.
|
||||
|
||||
## Overview
|
||||
|
||||
Map-Reduce is a pattern that combines **Map** (parallel processing) and **Reduce** (aggregation). In LangGraph, it's implemented using the Send API.
|
||||
|
||||
## Basic Implementation
|
||||
|
||||
```python
|
||||
from operator import add
from typing import Annotated, TypedDict

from langgraph.graph import END, START, StateGraph
from langgraph.types import Send
|
||||
|
||||
class MapReduceState(TypedDict):
|
||||
items: list[str]
|
||||
results: Annotated[list[str], add]
|
||||
final_result: str
|
||||
|
||||
def map_node(state: MapReduceState):
|
||||
"""Map: Send each item to worker"""
|
||||
return [
|
||||
Send("worker", {"item": item})
|
||||
for item in state["items"]
|
||||
]
|
||||
|
||||
def worker_node(item_state: dict):
|
||||
"""Process individual item"""
|
||||
result = process_item(item_state["item"])
|
||||
return {"results": [result]}
|
||||
|
||||
def reduce_node(state: MapReduceState):
|
||||
"""Reduce: Aggregate results"""
|
||||
final = aggregate_results(state["results"])
|
||||
return {"final_result": final}
|
||||
|
||||
# Build graph
|
||||
builder = StateGraph(MapReduceState)
|
||||
builder.add_node("map", map_node)
|
||||
builder.add_node("worker", worker_node)
|
||||
builder.add_node("reduce", reduce_node)
|
||||
|
||||
builder.add_edge(START, "map")
|
||||
builder.add_edge("worker", "reduce")
|
||||
builder.add_edge("reduce", END)
|
||||
|
||||
graph = builder.compile()
|
||||
```
|
||||
|
||||
## Types of Reducers
|
||||
|
||||
### Addition (List Concatenation)
|
||||
|
||||
```python
|
||||
from operator import add
|
||||
|
||||
class State(TypedDict):
|
||||
results: Annotated[list, add] # Concatenate lists
|
||||
|
||||
# [1, 2] + [3, 4] = [1, 2, 3, 4]
|
||||
```
|
||||
|
||||
### Custom Reducer
|
||||
|
||||
```python
|
||||
def merge_dicts(left: dict, right: dict) -> dict:
|
||||
"""Merge dictionaries"""
|
||||
return {**left, **right}
|
||||
|
||||
class State(TypedDict):
|
||||
data: Annotated[dict, merge_dicts]
|
||||
```
|
||||
|
||||
## Application Patterns
|
||||
|
||||
### Pattern 1: Parallel Document Summarization
|
||||
|
||||
```python
|
||||
class DocSummaryState(TypedDict):
|
||||
documents: list[str]
|
||||
summaries: Annotated[list[str], add]
|
||||
final_summary: str
|
||||
|
||||
def map_documents(state: DocSummaryState):
|
||||
"""Send each document to worker"""
|
||||
return [
|
||||
Send("summarize_worker", {"doc": doc, "index": i})
|
||||
for i, doc in enumerate(state["documents"])
|
||||
]
|
||||
|
||||
def summarize_worker(worker_state: dict):
|
||||
"""Summarize individual document"""
|
||||
summary = llm.invoke(f"Summarize: {worker_state['doc']}")
|
||||
return {"summaries": [summary]}
|
||||
|
||||
def final_summary_node(state: DocSummaryState):
|
||||
"""Integrate all summaries"""
|
||||
combined = "\n".join(state["summaries"])
|
||||
final = llm.invoke(f"Create final summary from:\n{combined}")
|
||||
return {"final_summary": final}
|
||||
```
|
||||
|
||||
### Pattern 2: Hierarchical Map-Reduce
|
||||
|
||||
```python
|
||||
def level1_map(state: State):
|
||||
"""Level 1: Split data into chunks"""
|
||||
chunks = create_chunks(state["data"], chunk_size=100)
|
||||
return [
|
||||
Send("level1_worker", {"chunk": chunk})
|
||||
for chunk in chunks
|
||||
]
|
||||
|
||||
def level1_worker(worker_state: dict):
|
||||
"""Level 1 worker: Aggregate within chunk"""
|
||||
partial_result = aggregate_chunk(worker_state["chunk"])
|
||||
return {"level1_results": [partial_result]}
|
||||
|
||||
def level2_map(state: State):
|
||||
"""Level 2: Further aggregate partial results"""
|
||||
return [
|
||||
Send("level2_worker", {"partial": result})
|
||||
for result in state["level1_results"]
|
||||
]
|
||||
|
||||
def level2_worker(worker_state: dict):
|
||||
"""Level 2 worker: Final aggregation"""
|
||||
final = final_aggregate(worker_state["partial"])
|
||||
return {"final_result": final}
|
||||
```
|
||||
|
||||
### Pattern 3: Dynamic Parallelism Control
|
||||
|
||||
```python
|
||||
import os
|
||||
|
||||
def adaptive_map(state: State):
|
||||
"""Adjust parallelism based on system resources"""
|
||||
max_workers = int(os.getenv("MAX_WORKERS", "10"))
|
||||
items = state["items"]
|
||||
|
||||
# Split items into batches
|
||||
batch_size = max(1, len(items) // max_workers)
|
||||
batches = [
|
||||
items[i:i+batch_size]
|
||||
for i in range(0, len(items), batch_size)
|
||||
]
|
||||
|
||||
return [
|
||||
Send("batch_worker", {"batch": batch})
|
||||
for batch in batches
|
||||
]
|
||||
|
||||
def batch_worker(worker_state: dict):
|
||||
"""Process batch"""
|
||||
results = [process_item(item) for item in worker_state["batch"]]
|
||||
return {"results": results}
|
||||
```
|
||||
|
||||
### Pattern 4: Error-Resilient Map-Reduce
|
||||
|
||||
```python
|
||||
class RobustState(TypedDict):
|
||||
items: list[str]
|
||||
successes: Annotated[list, add]
|
||||
failures: Annotated[list, add]
|
||||
|
||||
def robust_worker(worker_state: dict):
|
||||
"""Worker with error handling"""
|
||||
try:
|
||||
result = process_item(worker_state["item"])
|
||||
return {"successes": [{"item": worker_state["item"], "result": result}]}
|
||||
|
||||
except Exception as e:
|
||||
return {"failures": [{"item": worker_state["item"], "error": str(e)}]}
|
||||
|
||||
def error_handler(state: RobustState):
|
||||
"""Process failed items"""
|
||||
if state["failures"]:
|
||||
# Retry or log failed items
|
||||
log_failures(state["failures"])
|
||||
|
||||
return {"final_result": aggregate(state["successes"])}
|
||||
```
|
||||
|
||||
## Performance Optimization
|
||||
|
||||
### Batch Size Adjustment
|
||||
|
||||
```python
|
||||
def optimal_batching(items: list, target_batch_time: float = 1.0):
|
||||
"""Calculate optimal batch size"""
|
||||
# Estimate processing time per item
|
||||
sample_time = estimate_processing_time(items[0])
|
||||
|
||||
# Batch size to reach target time
|
||||
batch_size = max(1, int(target_batch_time / sample_time))
|
||||
|
||||
batches = [
|
||||
items[i:i+batch_size]
|
||||
for i in range(0, len(items), batch_size)
|
||||
]
|
||||
|
||||
return batches
|
||||
```
|
||||
|
||||
### Progress Tracking
|
||||
|
||||
```python
|
||||
from langgraph.config import get_stream_writer
|
||||
|
||||
def map_with_progress(state: State):
|
||||
"""Map that reports progress"""
|
||||
writer = get_stream_writer()
|
||||
total = len(state["items"])
|
||||
|
||||
sends = []
|
||||
for i, item in enumerate(state["items"]):
|
||||
sends.append(Send("worker", {"item": item}))
|
||||
writer({"progress": f"{i+1}/{total}"})
|
||||
|
||||
return sends
|
||||
```
|
||||
|
||||
## Aggregation Patterns
|
||||
|
||||
### Statistical Aggregation
|
||||
|
||||
```python
|
||||
def statistical_reduce(state: State):
|
||||
"""Calculate statistics"""
|
||||
results = state["results"]
|
||||
|
||||
return {
|
||||
"total": sum(results),
|
||||
"average": sum(results) / len(results),
|
||||
"min": min(results),
|
||||
"max": max(results),
|
||||
"count": len(results)
|
||||
}
|
||||
```
|
||||
|
||||
### LLM-Based Integration
|
||||
|
||||
```python
|
||||
def llm_reduce(state: State):
|
||||
"""Integrate multiple results with LLM"""
|
||||
all_results = "\n\n".join([
|
||||
f"Result {i+1}:\n{r}"
|
||||
for i, r in enumerate(state["results"])
|
||||
])
|
||||
|
||||
final = llm.invoke(
|
||||
f"Synthesize these results into a comprehensive answer:\n\n{all_results}"
|
||||
)
|
||||
|
||||
return {"final_result": final}
|
||||
```
|
||||
|
||||
## Advantages
|
||||
|
||||
✅ **Scalability**: Efficiently process large datasets
|
||||
✅ **Parallelism**: Execute independent tasks concurrently
|
||||
✅ **Flexibility**: Dynamically adjust number of workers
|
||||
✅ **Error Isolation**: One failure doesn't affect the whole
|
||||
|
||||
## Considerations
|
||||
|
||||
⚠️ **Memory Consumption**: Many worker instances
|
||||
⚠️ **Order Non-deterministic**: Worker execution order is not guaranteed
|
||||
⚠️ **Overhead**: Inefficient for small tasks
|
||||
⚠️ **Reducer Design**: Design appropriate aggregation method
|
||||
|
||||
## Summary
|
||||
|
||||
Map-Reduce uses the Send API to process large datasets in parallel and aggregates the results with reducers. It is best suited to large-scale data processing.
|
||||
|
||||
## Related Pages
|
||||
|
||||
- [02_graph_architecture_orchestrator_worker.md](02_graph_architecture_orchestrator_worker.md) - Orchestrator-Worker pattern
|
||||
- [02_graph_architecture_parallelization.md](02_graph_architecture_parallelization.md) - Comparison with static parallelization
|
||||
- [01_core_concepts_state.md](01_core_concepts_state.md) - Details on Reducers
|
||||
73
skills/langgraph-master/05_advanced_features_overview.md
Normal file
@@ -0,0 +1,73 @@
|
||||
# 05. Advanced Features
|
||||
|
||||
Advanced features and implementation patterns.
|
||||
|
||||
## Overview
|
||||
|
||||
By leveraging LangGraph's advanced features, you can build more sophisticated agent systems.
|
||||
|
||||
## Key Features
|
||||
|
||||
### 1. [Human-in-the-Loop (Approval Flow)](05_advanced_features_human_in_the_loop.md)
|
||||
|
||||
Pause graph execution and request human intervention:
|
||||
- Dynamic interrupt
|
||||
- Static interrupt
|
||||
- Approval, editing, and rejection flows
|
||||
|
||||
### 2. [Streaming](05_advanced_features_streaming.md)
|
||||
|
||||
Monitor progress in real-time:
|
||||
- LLM token streaming
|
||||
- State update streaming
|
||||
- Custom event streaming
|
||||
|
||||
### 3. [Map-Reduce (Parallel Processing Pattern)](05_advanced_features_map_reduce.md)
|
||||
|
||||
Parallel processing of large datasets:
|
||||
- Dynamic worker generation with Send API
|
||||
- Result aggregation with Reducers
|
||||
- Hierarchical parallel processing
|
||||
|
||||
## Feature Comparison
|
||||
|
||||
| Feature | Use Case | Implementation Complexity |
|
||||
|---------|----------|--------------------------|
|
||||
| Human-in-the-Loop | Approval flows, quality control | Medium |
|
||||
| Streaming | Real-time monitoring, UX improvement | Low |
|
||||
| Map-Reduce | Large-scale data processing | High |
|
||||
|
||||
## Combination Patterns
|
||||
|
||||
### Human-in-the-Loop + Streaming
|
||||
|
||||
```python
|
||||
# Stream while requesting approval
|
||||
for chunk in graph.stream(input, config, stream_mode="values"):
|
||||
print(chunk)
|
||||
|
||||
# Pause at interrupt
|
||||
if chunk.get("__interrupt__"):
|
||||
approval = input("Approve? (y/n): ")
|
||||
graph.invoke(None, config, resume=approval == "y")
|
||||
```
|
||||
|
||||
### Map-Reduce + Streaming
|
||||
|
||||
```python
|
||||
# Stream progress of parallel processing
|
||||
for chunk in graph.stream(
|
||||
{"items": large_dataset},
|
||||
stream_mode="updates",
|
||||
subgraphs=True # Also show worker progress
|
||||
):
|
||||
print(f"Progress: {chunk}")
|
||||
```
|
||||
|
||||
## Next Steps
|
||||
|
||||
For details on each feature, refer to the following pages:
|
||||
|
||||
- [05_advanced_features_human_in_the_loop.md](05_advanced_features_human_in_the_loop.md) - Implementation of approval flows
|
||||
- [05_advanced_features_streaming.md](05_advanced_features_streaming.md) - How to use streaming
|
||||
- [05_advanced_features_map_reduce.md](05_advanced_features_map_reduce.md) - Map-Reduce pattern
|
||||
220
skills/langgraph-master/05_advanced_features_streaming.md
Normal file
@@ -0,0 +1,220 @@
|
||||
# Streaming
|
||||
|
||||
A feature to monitor graph execution progress in real-time.
|
||||
|
||||
## Overview
|
||||
|
||||
Streaming is a feature that receives **real-time updates** during graph execution. You can stream LLM tokens, state changes, custom events, and more.
|
||||
|
||||
## Types of stream_mode
|
||||
|
||||
### 1. values (Complete State Snapshot)
|
||||
|
||||
Complete state after each step:
|
||||
|
||||
```python
|
||||
for chunk in graph.stream(input, stream_mode="values"):
|
||||
print(chunk)
|
||||
|
||||
# Example output
|
||||
# {"messages": [{"role": "user", "content": "Hello"}]}
|
||||
# {"messages": [{"role": "user", "content": "Hello"}, {"role": "assistant", "content": "Hi!"}]}
|
||||
```
|
||||
|
||||
### 2. updates (Only State Changes)
|
||||
|
||||
Only changes at each step:
|
||||
|
||||
```python
|
||||
for chunk in graph.stream(input, stream_mode="updates"):
|
||||
print(chunk)
|
||||
|
||||
# Example output
|
||||
# {"messages": [{"role": "assistant", "content": "Hi!"}]}
|
||||
```
|
||||
|
||||
### 3. messages (LLM Tokens)
|
||||
|
||||
Stream at token level from LLM:
|
||||
|
||||
```python
|
||||
for msg, metadata in graph.stream(input, stream_mode="messages"):
|
||||
if msg.content:
|
||||
print(msg.content, end="", flush=True)
|
||||
|
||||
# Output: "H" "i" "!" " " "H" "o" "w" ... (token by token)
|
||||
```
|
||||
|
||||
### 4. debug (Debug Information)
|
||||
|
||||
Detailed graph execution information:
|
||||
|
||||
```python
|
||||
for chunk in graph.stream(input, stream_mode="debug"):
|
||||
print(chunk)
|
||||
|
||||
# Details like node execution, edge transitions, etc.
|
||||
```
|
||||
|
||||
### 5. custom (Custom Data)
|
||||
|
||||
Send custom data from nodes:
|
||||
|
||||
```python
|
||||
from langgraph.config import get_stream_writer
|
||||
|
||||
def my_node(state: State):
|
||||
writer = get_stream_writer()
|
||||
|
||||
for i in range(10):
|
||||
writer({"progress": i * 10}) # Custom data
|
||||
|
||||
return {"result": "done"}
|
||||
|
||||
for mode, chunk in graph.stream(input, stream_mode=["updates", "custom"]):
|
||||
if mode == "custom":
|
||||
print(f"Progress: {chunk['progress']}%")
|
||||
```
|
||||
|
||||
## LLM Token Streaming
|
||||
|
||||
### Stream Only Specific Nodes
|
||||
|
||||
```python
|
||||
for msg, metadata in graph.stream(input, stream_mode="messages"):
|
||||
# Display tokens only from specific node
|
||||
if metadata["langgraph_node"] == "chatbot":
|
||||
if msg.content:
|
||||
print(msg.content, end="", flush=True)
|
||||
|
||||
print() # Newline
|
||||
```
|
||||
|
||||
### Filter by Tags
|
||||
|
||||
```python
|
||||
# Set tags on LLM
|
||||
llm = init_chat_model("gpt-5", tags=["main_llm"])
|
||||
|
||||
for msg, metadata in graph.stream(input, stream_mode="messages"):
|
||||
if "main_llm" in metadata.get("tags", []):
|
||||
if msg.content:
|
||||
print(msg.content, end="", flush=True)
|
||||
```
|
||||
|
||||
## Using Multiple Modes Simultaneously
|
||||
|
||||
```python
|
||||
for mode, chunk in graph.stream(input, stream_mode=["values", "messages"]):
|
||||
if mode == "values":
|
||||
print(f"\nState: {chunk}")
|
||||
elif mode == "messages":
|
||||
if chunk[0].content:
|
||||
print(chunk[0].content, end="", flush=True)
|
||||
```
|
||||
|
||||
## Subgraph Streaming
|
||||
|
||||
```python
|
||||
# Include subgraph outputs
|
||||
for chunk in graph.stream(
|
||||
input,
|
||||
stream_mode="updates",
|
||||
subgraphs=True # Include subgraphs
|
||||
):
|
||||
print(chunk)
|
||||
```
|
||||
|
||||
## Practical Example: Progress Bar
|
||||
|
||||
```python
|
||||
from tqdm import tqdm
|
||||
|
||||
def process_with_progress(items: list):
|
||||
"""Processing with progress bar"""
|
||||
total = len(items)
|
||||
|
||||
with tqdm(total=total) as pbar:
|
||||
for chunk in graph.stream(
|
||||
{"items": items},
|
||||
stream_mode="custom"
|
||||
):
|
||||
if "progress" in chunk:
|
||||
pbar.update(1)
|
||||
|
||||
return "Complete!"
|
||||
```
|
||||
|
||||
## Practical Example: Real-time UI Updates
|
||||
|
||||
```python
|
||||
import streamlit as st
|
||||
|
||||
def run_with_ui_updates(user_input: str):
|
||||
"""Update Streamlit UI in real-time"""
|
||||
status = st.empty()
|
||||
output = st.empty()
|
||||
|
||||
full_response = ""
|
||||
|
||||
for msg, metadata in graph.stream(
|
||||
{"messages": [{"role": "user", "content": user_input}]},
|
||||
stream_mode="messages"
|
||||
):
|
||||
if msg.content:
|
||||
full_response += msg.content
|
||||
output.markdown(full_response + "▌")
|
||||
|
||||
status.text(f"Node: {metadata['langgraph_node']}")
|
||||
|
||||
output.markdown(full_response)
|
||||
status.text("Complete!")
|
||||
```
|
||||
|
||||
## Async Streaming
|
||||
|
||||
```python
|
||||
import asyncio


async def async_stream_example():
|
||||
"""Async streaming"""
|
||||
async for chunk in graph.astream(input, stream_mode="updates"):
|
||||
print(chunk)
|
||||
await asyncio.sleep(0) # Yield to other tasks
|
||||
```
|
||||
|
||||
## Sending Custom Events
|
||||
|
||||
```python
|
||||
from langgraph.config import get_stream_writer
|
||||
|
||||
def multi_step_node(state: State):
|
||||
"""Report progress of multiple steps"""
|
||||
writer = get_stream_writer()
|
||||
|
||||
# Step 1
|
||||
writer({"status": "Analyzing..."})
|
||||
analysis = analyze_data(state["data"])
|
||||
|
||||
# Step 2
|
||||
writer({"status": "Processing..."})
|
||||
result = process_analysis(analysis)
|
||||
|
||||
# Step 3
|
||||
writer({"status": "Finalizing..."})
|
||||
final = finalize(result)
|
||||
|
||||
return {"result": final}
|
||||
|
||||
# Receive
|
||||
for mode, chunk in graph.stream(input, stream_mode=["updates", "custom"]):
|
||||
if mode == "custom":
|
||||
print(chunk["status"])
|
||||
```
|
||||
|
||||
## Summary
|
||||
|
||||
Streaming monitors progress in real-time and improves user experience. Choose the appropriate stream_mode based on your use case.
|
||||
|
||||
## Related Pages
|
||||
|
||||
- [02_graph_architecture_agent.md](02_graph_architecture_agent.md) - Agent streaming
|
||||
- [05_advanced_features_human_in_the_loop.md](05_advanced_features_human_in_the_loop.md) - Combining streaming and approval
|
||||
299
skills/langgraph-master/06_llm_model_ids.md
Normal file
@@ -0,0 +1,299 @@
|
||||
# LLM Model ID Reference
|
||||
|
||||
List of model IDs for major LLM providers commonly used in LangGraph. For detailed information and best practices for each provider, please refer to the individual pages.
|
||||
|
||||
> **Last Updated**: 2025-11-24
|
||||
> **Note**: Model availability and names may change. Please refer to each provider's official documentation for the latest information.
|
||||
|
||||
## 📚 Provider-Specific Documentation
|
||||
|
||||
### [Google Gemini Models](06_llm_model_ids_gemini.md)
|
||||
|
||||
Google's latest LLM models featuring large-scale context (up to 1M tokens).
|
||||
|
||||
**Key Models**:
|
||||
|
||||
- `google/gemini-3-pro-preview` - Latest high-performance model
|
||||
- `gemini-2.5-flash` - Fast response version (1M tokens)
|
||||
- `gemini-2.5-flash-lite` - Lightweight fast version
|
||||
|
||||
**Details**: [Gemini Model ID Complete Guide](06_llm_model_ids_gemini.md)
|
||||
|
||||
---
|
||||
|
||||
### [Anthropic Claude Models](06_llm_model_ids_claude.md)
|
||||
|
||||
Anthropic's Claude 4.x series featuring balanced performance and cost.
|
||||
|
||||
**Key Models**:
|
||||
|
||||
- `claude-opus-4-1-20250805` - Most powerful model
|
||||
- `claude-sonnet-4-5` - Balanced (recommended)
|
||||
- `claude-haiku-4-5-20251001` - Fast and low-cost
|
||||
|
||||
**Details**: [Claude Model ID Complete Guide](06_llm_model_ids_claude.md)
|
||||
|
||||
---
|
||||
|
||||
### [OpenAI GPT Models](06_llm_model_ids_openai.md)
|
||||
|
||||
OpenAI's GPT-5 series supporting a wide range of tasks, with 400K context and advanced reasoning capabilities.
|
||||
|
||||
**Key Models**:
|
||||
|
||||
- `gpt-5` - GPT-5 standard version
|
||||
- `gpt-5-mini` - Small version (cost-efficient ◎)
|
||||
- `gpt-5.1-thinking` - Adaptive reasoning model
|
||||
|
||||
**Details**: [OpenAI Model ID Complete Guide](06_llm_model_ids_openai.md)
|
||||
|
||||
---
|
||||
|
||||
## 🚀 Quick Start
|
||||
|
||||
### Basic Usage
|
||||
|
||||
```python
|
||||
from langchain_anthropic import ChatAnthropic
|
||||
from langchain_openai import ChatOpenAI
|
||||
from langchain_google_genai import ChatGoogleGenerativeAI
|
||||
|
||||
# Use Claude
|
||||
claude_llm = ChatAnthropic(model="claude-sonnet-4-5")
|
||||
|
||||
# Use OpenAI
|
||||
openai_llm = ChatOpenAI(model="gpt-5")
|
||||
|
||||
# Use Gemini
|
||||
gemini_llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash")
|
||||
```
|
||||
|
||||
### Using with LangGraph
|
||||
|
||||
```python
|
||||
from langgraph.graph import StateGraph
|
||||
from langchain_anthropic import ChatAnthropic
|
||||
from typing import TypedDict, Annotated
|
||||
from langgraph.graph.message import add_messages
|
||||
|
||||
# State definition
|
||||
class State(TypedDict):
|
||||
messages: Annotated[list, add_messages]
|
||||
|
||||
# Model initialization
|
||||
llm = ChatAnthropic(model="claude-sonnet-4-5")
|
||||
|
||||
# Node definition
|
||||
def chat_node(state: State):
|
||||
response = llm.invoke(state["messages"])
|
||||
return {"messages": [response]}
|
||||
|
||||
# Graph construction
|
||||
graph = StateGraph(State)
|
||||
graph.add_node("chat", chat_node)
|
||||
graph.set_entry_point("chat")
|
||||
graph.set_finish_point("chat")
|
||||
|
||||
app = graph.compile()
|
||||
```
|
||||
|
||||
## 📊 Model Selection Guide
|
||||
|
||||
### Recommended Models by Use Case
|
||||
|
||||
| Use Case | Recommended Model | Reason |
|
||||
| ---------------------- | ------------------------------------------------------------- | ------------------------- |
|
||||
| **Cost-focused** | `claude-haiku-4-5`<br>`gpt-5-mini`<br>`gemini-2.5-flash-lite` | Low cost and fast |
|
||||
| **Balance-focused** | `claude-sonnet-4-5`<br>`gpt-5`<br>`gemini-2.5-flash` | Balance of performance and cost |
|
||||
| **Performance-focused** | `claude-opus-4-1`<br>`gpt-5-pro`<br>`gemini-3-pro` | Maximum performance |
|
||||
| **Reasoning-specialized** | `gpt-5.1-thinking`<br>`gpt-5.1-instant` | Adaptive reasoning, math, science |
|
||||
| **Large-scale context** | `gemini-2.5-pro` | 1M token context |
|
||||
|
||||
### Selection by Task Complexity
|
||||
|
||||
```python
|
||||
def select_model(task_complexity: str, budget: str = "normal"):
|
||||
"""Select optimal model based on task and budget"""
|
||||
|
||||
# Budget-focused
|
||||
if budget == "low":
|
||||
models = {
|
||||
"simple": "claude-haiku-4-5-20251001",
|
||||
"medium": "gpt-5-mini",
|
||||
"complex": "claude-sonnet-4-5"
|
||||
}
|
||||
return models.get(task_complexity, "gpt-5-mini")
|
||||
|
||||
# Performance-focused
|
||||
if budget == "high":
|
||||
models = {
|
||||
"simple": "claude-sonnet-4-5",
|
||||
"medium": "gpt-5",
|
||||
"complex": "claude-opus-4-1-20250805"
|
||||
}
|
||||
return models.get(task_complexity, "claude-opus-4-1-20250805")
|
||||
|
||||
# Balance-focused (default)
|
||||
models = {
|
||||
"simple": "gpt-5-mini",
|
||||
"medium": "claude-sonnet-4-5",
|
||||
"complex": "gpt-5"
|
||||
}
|
||||
return models.get(task_complexity, "claude-sonnet-4-5")
|
||||
```
|
||||
|
||||
## 🔄 Multi-Model Strategy
|
||||
|
||||
### Fallback Between Providers
|
||||
|
||||
```python
|
||||
from langchain_anthropic import ChatAnthropic
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_openai import ChatOpenAI
|
||||
|
||||
# Primary model and fallback
|
||||
primary = ChatAnthropic(model="claude-sonnet-4-5")
|
||||
fallback1 = ChatOpenAI(model="gpt-5")
|
||||
fallback2 = ChatGoogleGenerativeAI(model="gemini-2.5-flash")
|
||||
|
||||
llm_with_fallback = primary.with_fallbacks([fallback1, fallback2])
|
||||
|
||||
# Automatically falls back to the next model until one succeeds
|
||||
response = llm_with_fallback.invoke("Question content")
|
||||
```
|
||||
|
||||
### Cost-Optimized Auto-Routing
|
||||
|
||||
```python
|
||||
from langgraph.graph import StateGraph, END
|
||||
from typing import TypedDict, Annotated, Literal
|
||||
from langgraph.graph.message import add_messages
from langchain_anthropic import ChatAnthropic
from langchain_openai import ChatOpenAI
|
||||
|
||||
class State(TypedDict):
|
||||
messages: Annotated[list, add_messages]
|
||||
complexity: Literal["simple", "medium", "complex"]
|
||||
|
||||
# Use different models based on complexity
|
||||
simple_llm = ChatAnthropic(model="claude-haiku-4-5-20251001") # Low cost
|
||||
medium_llm = ChatOpenAI(model="gpt-5-mini") # Balance
|
||||
complex_llm = ChatAnthropic(model="claude-opus-4-1-20250805") # High performance
|
||||
|
||||
def analyze_complexity(state: State):
|
||||
"""Analyze message complexity"""
|
||||
message = state["messages"][-1].content
|
||||
# Simple complexity determination
|
||||
if len(message) < 50:
|
||||
complexity = "simple"
|
||||
elif len(message) < 200:
|
||||
complexity = "medium"
|
||||
else:
|
||||
complexity = "complex"
|
||||
return {"complexity": complexity}
|
||||
|
||||
def route_by_complexity(state: State):
|
||||
"""Route based on complexity"""
|
||||
routes = {
|
||||
"simple": "simple_node",
|
||||
"medium": "medium_node",
|
||||
"complex": "complex_node"
|
||||
}
|
||||
return routes[state["complexity"]]
|
||||
|
||||
def simple_node(state: State):
|
||||
response = simple_llm.invoke(state["messages"])
|
||||
return {"messages": [response]}
|
||||
|
||||
def medium_node(state: State):
|
||||
response = medium_llm.invoke(state["messages"])
|
||||
return {"messages": [response]}
|
||||
|
||||
def complex_node(state: State):
|
||||
response = complex_llm.invoke(state["messages"])
|
||||
return {"messages": [response]}
|
||||
|
||||
# Graph construction
|
||||
graph = StateGraph(State)
|
||||
graph.add_node("analyze", analyze_complexity)
|
||||
graph.add_node("simple_node", simple_node)
|
||||
graph.add_node("medium_node", medium_node)
|
||||
graph.add_node("complex_node", complex_node)
|
||||
|
||||
graph.set_entry_point("analyze")
|
||||
graph.add_conditional_edges("analyze", route_by_complexity)

# Each routed node ends the graph after responding
graph.add_edge("simple_node", END)
graph.add_edge("medium_node", END)
graph.add_edge("complex_node", END)
|
||||
|
||||
app = graph.compile()
|
||||
```
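
A quick usage sketch for the routed graph above (the question text is illustrative):

```python
# "complexity" is filled in by the analyze node, so only messages are needed up front
result = app.invoke({"messages": [("user", "Summarize this report in one paragraph")]})
print(result["messages"][-1].content)
```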
|
||||
|
||||
## 🔧 Best Practices
|
||||
|
||||
### 1. Environment Variable Management
|
||||
|
||||
```python
|
||||
import os

from langchain_anthropic import ChatAnthropic
from langchain_openai import ChatOpenAI
from langchain_google_genai import ChatGoogleGenerativeAI
|
||||
|
||||
# Flexibly manage models with environment variables
|
||||
DEFAULT_MODEL = os.getenv("DEFAULT_LLM_MODEL", "claude-sonnet-4-5")
|
||||
FAST_MODEL = os.getenv("FAST_LLM_MODEL", "claude-haiku-4-5-20251001")
|
||||
SMART_MODEL = os.getenv("SMART_LLM_MODEL", "claude-opus-4-1-20250805")
|
||||
|
||||
# Switch provider based on environment
|
||||
PROVIDER = os.getenv("LLM_PROVIDER", "anthropic")
|
||||
|
||||
if PROVIDER == "anthropic":
|
||||
llm = ChatAnthropic(model=DEFAULT_MODEL)
|
||||
elif PROVIDER == "openai":
|
||||
llm = ChatOpenAI(model="gpt-5")
|
||||
elif PROVIDER == "google":
|
||||
llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash")
|
||||
```
|
||||
|
||||
### 2. Fixed Model Version (Production)
|
||||
|
||||
```python
|
||||
# ✅ Recommended: Use dated version (production)
|
||||
prod_llm = ChatAnthropic(model="claude-sonnet-4-20250514")
|
||||
|
||||
# ⚠️ Caution: No version specified (potential unexpected updates)
|
||||
dev_llm = ChatAnthropic(model="claude-sonnet-4")
|
||||
```
|
||||
|
||||
### 3. Cost Monitoring
|
||||
|
||||
```python
|
||||
from langchain.callbacks import get_openai_callback
|
||||
|
||||
# OpenAI cost tracking
|
||||
with get_openai_callback() as cb:
|
||||
response = openai_llm.invoke("question")
|
||||
print(f"Total Cost: ${cb.total_cost}")
|
||||
print(f"Tokens: {cb.total_tokens}")
|
||||
|
||||
# For other providers, track manually
|
||||
# Refer to each provider's detail pages
|
||||
```
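
For Anthropic and Google models, one option is to read token counts from `usage_metadata` on the returned message (available in recent `langchain-core` releases); a minimal sketch:

```python
from langchain_anthropic import ChatAnthropic

llm = ChatAnthropic(model="claude-sonnet-4-5")
response = llm.invoke("question")

# usage_metadata is populated when the provider reports token usage
usage = response.usage_metadata or {}
print(f"Input Tokens: {usage.get('input_tokens')}")
print(f"Output Tokens: {usage.get('output_tokens')}")
print(f"Total Tokens: {usage.get('total_tokens')}")
```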
|
||||
|
||||
## 📖 Detailed Documentation
|
||||
|
||||
For detailed information on each provider, please refer to the following pages:
|
||||
|
||||
- **[Gemini Model ID](06_llm_model_ids_gemini.md)**: Model list, usage, advanced settings, multimodal features
|
||||
- **[Claude Model ID](06_llm_model_ids_claude.md)**: Model list, platform-specific IDs, tool usage, deprecated model information
|
||||
- **[OpenAI Model ID](06_llm_model_ids_openai.md)**: Model list, reasoning models, vision features, Azure OpenAI
|
||||
|
||||
## 🔗 Reference Links
|
||||
|
||||
### Official Documentation
|
||||
|
||||
- [Google Gemini API](https://ai.google.dev/gemini-api/docs/models)
|
||||
- [Anthropic Claude API](https://docs.anthropic.com/en/docs/about-claude/models/overview)
|
||||
- [OpenAI Platform](https://platform.openai.com/docs/models)
|
||||
|
||||
### Integration Guides
|
||||
|
||||
- [LangChain Chat Models](https://docs.langchain.com/oss/python/modules/model_io/chat/)
|
||||
- [LangGraph Documentation](https://langchain-ai.github.io/langgraph/)
|
||||
|
||||
### Pricing Information
|
||||
|
||||
- [Gemini Pricing](https://ai.google.dev/pricing)
|
||||
- [Claude Pricing](https://www.anthropic.com/pricing)
|
||||
- [OpenAI Pricing](https://openai.com/pricing)
|
||||
127
skills/langgraph-master/06_llm_model_ids_claude.md
Normal file
127
skills/langgraph-master/06_llm_model_ids_claude.md
Normal file
@@ -0,0 +1,127 @@
|
||||
# Anthropic Claude Model IDs
|
||||
|
||||
List of available model IDs for the Anthropic Claude API.
|
||||
|
||||
> **Last Updated**: 2025-11-24
|
||||
|
||||
## Model List
|
||||
|
||||
### Claude 4.x (2025)
|
||||
|
||||
| Model ID | Context | Max Output | Release | Features |
|
||||
|-----------|------------|---------|---------|------|
|
||||
| `claude-opus-4-1-20250805` | 200K | 32K | 2025-08 | Most powerful. Complex reasoning & code generation |
|
||||
| `claude-sonnet-4-5` | 1M | 64K | 2025-09 | Latest balanced model (recommended) |
|
||||
| `claude-sonnet-4-20250514` | 200K (1M beta) | 64K | 2025-05 | Production recommended (date-fixed) |
|
||||
| `claude-haiku-4-5-20251001` | 200K | 64K | 2025-10 | Fast & low-cost |
|
||||
|
||||
**Model Characteristics**:
|
||||
- **Opus**: Highest performance, complex tasks (200K context)
|
||||
- **Sonnet**: Balanced, general-purpose (1M context)
|
||||
- **Haiku**: Fast & low-cost ($1/M input, $5/M output)
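
As a rough illustration of the Haiku rates listed above ($1 per million input tokens, $5 per million output tokens; rates for the other models are not listed here), a cost estimate can be computed directly:

```python
def estimate_haiku_cost(input_tokens: int, output_tokens: int) -> float:
    """Rough USD cost using the Haiku 4.5 rates quoted above."""
    return input_tokens * 1.0 / 1_000_000 + output_tokens * 5.0 / 1_000_000

# Example: 20K input + 1K output tokens ≈ $0.025
print(estimate_haiku_cost(20_000, 1_000))
```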
|
||||
|
||||
## Basic Usage
|
||||
|
||||
```python
|
||||
from langchain_anthropic import ChatAnthropic
|
||||
|
||||
# Recommended: Latest Sonnet
|
||||
llm = ChatAnthropic(model="claude-sonnet-4-5")
|
||||
|
||||
# Production: Date-fixed version
|
||||
llm = ChatAnthropic(model="claude-sonnet-4-20250514")
|
||||
|
||||
# Fast & low-cost
|
||||
llm = ChatAnthropic(model="claude-haiku-4-5-20251001")
|
||||
|
||||
# Highest performance
|
||||
llm = ChatAnthropic(model="claude-opus-4-1-20250805")
|
||||
```
|
||||
|
||||
### Environment Variables
|
||||
|
||||
```bash
|
||||
export ANTHROPIC_API_KEY="sk-ant-..."
|
||||
```
|
||||
|
||||
## Model Selection Guide
|
||||
|
||||
| Use Case | Recommended Model |
|
||||
|------|-----------|
|
||||
| Cost-focused | `claude-haiku-4-5-20251001` |
|
||||
| Balanced | `claude-sonnet-4-5` |
|
||||
| Performance-focused | `claude-opus-4-1-20250805` |
|
||||
| Production | `claude-sonnet-4-20250514` (date-fixed) |
|
||||
|
||||
## Claude Features
|
||||
|
||||
### 1. Large Context Window
|
||||
|
||||
Claude Sonnet 4.5 supports a **1M-token** context window:
|
||||
|
||||
| Model | Standard Context | Max Output | Notes |
|
||||
|--------|---------------|---------|------|
|
||||
| Sonnet 4.5 | 1M | 64K | Latest version |
|
||||
| Sonnet 4 | 200K (1M beta) | 64K | 1M available with beta header |
|
||||
| Opus 4.1 | 200K | 32K | High-performance version |
|
||||
| Haiku 4.5 | 200K | 64K | Fast version |
|
||||
|
||||
```python
|
||||
# Using 1M context (Sonnet 4.5)
|
||||
llm = ChatAnthropic(
|
||||
model="claude-sonnet-4-5",
|
||||
max_tokens=64000 # Max output: 64K
|
||||
)
|
||||
|
||||
# Enable 1M context for Sonnet 4 (beta)
|
||||
llm = ChatAnthropic(
|
||||
model="claude-sonnet-4-20250514",
|
||||
    default_headers={"anthropic-beta": "context-1m-2025-08-07"}  # 1M-context beta flag; verify against current Anthropic docs
|
||||
)
|
||||
```
|
||||
|
||||
### 2. Date-Fixed Versions
|
||||
|
||||
For production environments, date-fixed versions are recommended to prevent unexpected updates:
|
||||
|
||||
```python
|
||||
# ✅ Recommended (production)
|
||||
llm = ChatAnthropic(model="claude-sonnet-4-20250514")
|
||||
|
||||
# ⚠️ Caution (development only)
|
||||
llm = ChatAnthropic(model="claude-sonnet-4")
|
||||
```
|
||||
|
||||
### 3. Tool Use (Function Calling)
|
||||
|
||||
Claude has powerful tool use capabilities (see [Tool Use Guide](06_llm_model_ids_claude_tools.md) for details).
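
A minimal `bind_tools` sketch (the tool below is a placeholder; full patterns are in the Tool Use Guide):

```python
from langchain_anthropic import ChatAnthropic
from langchain_core.tools import tool

@tool
def get_time(timezone: str) -> str:
    """Return the current time for a timezone (placeholder implementation)."""
    return f"12:00 in {timezone}"

llm = ChatAnthropic(model="claude-sonnet-4-5").bind_tools([get_time])
response = llm.invoke("What time is it in Tokyo?")
print(response.tool_calls)
```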
|
||||
|
||||
### 4. Multi-Platform Support
|
||||
|
||||
Available on multiple cloud platforms (see [Platform-Specific Guide](06_llm_model_ids_claude_platforms.md) for details):
|
||||
|
||||
- Anthropic API (direct)
|
||||
- Google Vertex AI
|
||||
- AWS Bedrock
|
||||
- Azure AI (Microsoft Foundry)
|
||||
|
||||
## Deprecated Models
|
||||
|
||||
| Model | Deprecation Date | Migration Target |
|
||||
|--------|-------|--------|
|
||||
| Claude 3 Opus | 2025-07-21 | `claude-opus-4-1-20250805` |
|
||||
| Claude 3 Sonnet | 2025-07-21 | `claude-sonnet-4-5` |
|
||||
| Claude 2.1 | 2025-07-21 | `claude-sonnet-4-5` |
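
When upgrading, the table above can be encoded as a small lookup; the legacy IDs below are the dated forms of the deprecated models and are shown for illustration:

```python
# Migration targets taken from the deprecation table above
DEPRECATED_MODEL_MAP = {
    "claude-3-opus-20240229": "claude-opus-4-1-20250805",
    "claude-3-sonnet-20240229": "claude-sonnet-4-5",
    "claude-2.1": "claude-sonnet-4-5",
}

def migrate_model_id(model_id: str) -> str:
    """Return the recommended replacement for a deprecated model ID."""
    return DEPRECATED_MODEL_MAP.get(model_id, model_id)
```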
|
||||
|
||||
## Detailed Documentation
|
||||
|
||||
For advanced settings and parameters:
|
||||
- **[Claude Advanced Features](06_llm_model_ids_claude_advanced.md)** - Parameter configuration, streaming, caching
|
||||
- **[Platform-Specific Guide](06_llm_model_ids_claude_platforms.md)** - Usage on Vertex AI, AWS Bedrock, Azure AI
|
||||
- **[Tool Use Guide](06_llm_model_ids_claude_tools.md)** - Function Calling implementation
|
||||
|
||||
## Reference Links
|
||||
|
||||
- [Claude API Official](https://docs.anthropic.com/en/docs/about-claude/models/overview)
|
||||
- [Anthropic Console](https://console.anthropic.com/)
|
||||
- [LangChain Integration](https://docs.langchain.com/oss/python/integrations/chat/anthropic)
|
||||
262
skills/langgraph-master/06_llm_model_ids_claude_advanced.md
Normal file
262
skills/langgraph-master/06_llm_model_ids_claude_advanced.md
Normal file
@@ -0,0 +1,262 @@
|
||||
# Claude Advanced Features
|
||||
|
||||
Advanced settings and parameter tuning for Claude models.
|
||||
|
||||
## Context Window and Output Limits
|
||||
|
||||
| Model | Context Window | Max Output Tokens | Notes |
|
||||
|--------|-------------------|---------------|------|
|
||||
| `claude-opus-4-1-20250805` | 200,000 | 32,000 | Highest performance |
|
||||
| `claude-sonnet-4-5` | 1,000,000 | 64,000 | Latest version |
|
||||
| `claude-sonnet-4-20250514` | 200,000 (1M beta) | 64,000 | 1M with beta header |
|
||||
| `claude-haiku-4-5-20251001` | 200,000 | 64,000 | Fast version |
|
||||
|
||||
**Note**: To use 1M context with Sonnet 4, a beta header is required.
|
||||
|
||||
## Parameter Configuration
|
||||
|
||||
```python
|
||||
from langchain_anthropic import ChatAnthropic
|
||||
|
||||
llm = ChatAnthropic(
|
||||
model="claude-sonnet-4-5",
|
||||
temperature=0.7, # Creativity (0.0-1.0)
|
||||
max_tokens=64000, # Max output (Sonnet 4.5: 64K)
|
||||
top_p=0.9, # Diversity
|
||||
top_k=40, # Sampling
|
||||
)
|
||||
|
||||
# Opus 4.1 (max output 32K)
|
||||
llm_opus = ChatAnthropic(
|
||||
model="claude-opus-4-1-20250805",
|
||||
max_tokens=32000,
|
||||
)
|
||||
```
|
||||
|
||||
## Using 1M Context
|
||||
|
||||
### Sonnet 4.5 (Standard)
|
||||
|
||||
```python
|
||||
llm = ChatAnthropic(
|
||||
model="claude-sonnet-4-5",
|
||||
max_tokens=64000
|
||||
)
|
||||
|
||||
# Can process 1M tokens of context
|
||||
long_document = "..." * 500000 # Long document
|
||||
response = llm.invoke(f"Please analyze the following document:\n\n{long_document}")
|
||||
```
|
||||
|
||||
### Sonnet 4 (Beta Header)
|
||||
|
||||
```python
|
||||
# Enable 1M context with beta header
|
||||
llm = ChatAnthropic(
|
||||
model="claude-sonnet-4-20250514",
|
||||
max_tokens=64000,
|
||||
default_headers={
|
||||
"anthropic-beta": "max-tokens-3-5-sonnet-2024-07-15"
|
||||
}
|
||||
)
|
||||
```
|
||||
|
||||
## Streaming
|
||||
|
||||
```python
|
||||
llm = ChatAnthropic(
|
||||
model="claude-sonnet-4-5",
|
||||
streaming=True
|
||||
)
|
||||
|
||||
for chunk in llm.stream("question"):
|
||||
print(chunk.content, end="", flush=True)
|
||||
```
|
||||
|
||||
## Prompt Caching
|
||||
|
||||
Cache parts of long prompts for efficiency:
|
||||
|
||||
```python
|
||||
from langchain_anthropic import ChatAnthropic
|
||||
|
||||
llm = ChatAnthropic(
|
||||
model="claude-sonnet-4-5",
|
||||
max_tokens=4096
|
||||
)
|
||||
|
||||
# System prompt for caching
|
||||
system_prompt = """
|
||||
You are a professional code reviewer.
|
||||
Please review according to the following coding guidelines:
|
||||
[long guidelines...]
|
||||
"""
|
||||
|
||||
# Use cache
|
||||
response = llm.invoke(
|
||||
[
|
||||
{"role": "system", "content": system_prompt, "cache_control": {"type": "ephemeral"}},
|
||||
{"role": "user", "content": "Please review this code"}
|
||||
]
|
||||
)
|
||||
```
|
||||
|
||||
**Cache Benefits**:
|
||||
- Cost reduction (90% off on cache hits)
|
||||
- Latency reduction (faster processing on reuse)
|
||||
|
||||
## Vision (Image Processing)
|
||||
|
||||
```python
|
||||
from langchain_anthropic import ChatAnthropic
|
||||
from langchain_core.messages import HumanMessage
|
||||
|
||||
llm = ChatAnthropic(model="claude-sonnet-4-5")
|
||||
|
||||
message = HumanMessage(
|
||||
content=[
|
||||
{"type": "text", "text": "What's in this image?"},
|
||||
{
|
||||
"type": "image_url",
|
||||
"image_url": {
|
||||
"url": "https://example.com/image.jpg"
|
||||
}
|
||||
}
|
||||
]
|
||||
)
|
||||
|
||||
response = llm.invoke([message])
|
||||
```
|
||||
|
||||
## Structured Output (JSON)

The Anthropic API does not expose an OpenAI-style `response_format` JSON mode; when structured output is needed, use `with_structured_output` (which relies on tool calling under the hood):

```python
from pydantic import BaseModel

class UserInfo(BaseModel):
    name: str
    age: int

llm = ChatAnthropic(model="claude-sonnet-4-5")
structured_llm = llm.with_structured_output(UserInfo)

response = structured_llm.invoke("Extract user information: Taro Yamada, 30 years old")
print(response)  # UserInfo(name=..., age=...)
```
|
||||
|
||||
## Token Usage Tracking
|
||||
|
||||
```python
llm = ChatAnthropic(model="claude-sonnet-4-5")

response = llm.invoke("question")

# get_openai_callback only tracks OpenAI calls; for Claude, read usage_metadata from the response
usage = response.usage_metadata or {}
print(f"Input Tokens: {usage.get('input_tokens')}")
print(f"Output Tokens: {usage.get('output_tokens')}")
print(f"Total Tokens: {usage.get('total_tokens')}")
```
|
||||
|
||||
## Error Handling
|
||||
|
||||
```python
|
||||
from anthropic import AnthropicError, RateLimitError
|
||||
|
||||
try:
|
||||
llm = ChatAnthropic(model="claude-sonnet-4-5")
|
||||
response = llm.invoke("question")
|
||||
except RateLimitError:
|
||||
print("Rate limit reached")
|
||||
except AnthropicError as e:
|
||||
print(f"Anthropic error: {e}")
|
||||
```
|
||||
|
||||
## Rate Limit Handling
|
||||
|
||||
```python
|
||||
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential
from anthropic import RateLimitError

@retry(
    wait=wait_exponential(multiplier=1, min=4, max=60),
    stop=stop_after_attempt(5),
    retry=retry_if_exception_type(RateLimitError),  # retry only on rate-limit errors
)
def invoke_with_retry(llm, messages):
    return llm.invoke(messages)

llm = ChatAnthropic(model="claude-sonnet-4-5")
response = invoke_with_retry(llm, "question")
|
||||
```
|
||||
|
||||
## Listing Models
|
||||
|
||||
```python
|
||||
import anthropic
|
||||
import os
|
||||
|
||||
client = anthropic.Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))
|
||||
models = client.models.list()
|
||||
|
||||
for model in models.data:
|
||||
print(f"{model.id} - {model.display_name}")
|
||||
```
|
||||
|
||||
## Cost Optimization
|
||||
|
||||
### Cost Management by Model Selection
|
||||
|
||||
```python
|
||||
# Low-cost version (simple tasks)
|
||||
llm_cheap = ChatAnthropic(model="claude-haiku-4-5-20251001")
|
||||
|
||||
# Balanced version (general tasks)
|
||||
llm_balanced = ChatAnthropic(model="claude-sonnet-4-5")
|
||||
|
||||
# High-performance version (complex tasks)
|
||||
llm_powerful = ChatAnthropic(model="claude-opus-4-1-20250805")
|
||||
|
||||
# Select based on task
|
||||
def get_llm_for_task(complexity):
|
||||
if complexity == "simple":
|
||||
return llm_cheap
|
||||
elif complexity == "medium":
|
||||
return llm_balanced
|
||||
else:
|
||||
return llm_powerful
|
||||
```
|
||||
|
||||
### Cost Reduction with Prompt Caching
|
||||
|
||||
```python
|
||||
# Cache long system prompt
|
||||
system = {"role": "system", "content": [{"type": "text", "text": long_guidelines, "cache_control": {"type": "ephemeral"}}]}
|
||||
|
||||
# Reuse cache across multiple calls (90% cost reduction)
|
||||
for user_input in user_inputs:
|
||||
response = llm.invoke([system, {"role": "user", "content": user_input}])
|
||||
```
|
||||
|
||||
## Leveraging Large Context
|
||||
|
||||
```python
|
||||
llm = ChatAnthropic(model="claude-sonnet-4-5")
|
||||
|
||||
# Process large documents at once (1M token support)
|
||||
documents = load_large_documents() # Large document collection
|
||||
|
||||
response = llm.invoke(f"""
|
||||
Please analyze the following multiple documents:
|
||||
|
||||
{documents}
|
||||
|
||||
Tell me the main themes and conclusions.
|
||||
""")
|
||||
```
|
||||
|
||||
## Reference Links
|
||||
|
||||
- [Claude API Documentation](https://docs.anthropic.com/)
|
||||
- [Anthropic API Reference](https://docs.anthropic.com/en/api/)
|
||||
- [Claude Models Overview](https://docs.anthropic.com/en/docs/about-claude/models/overview)
|
||||
- [Prompt Caching Guide](https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching)
|
||||
219
skills/langgraph-master/06_llm_model_ids_claude_platforms.md
Normal file
219
skills/langgraph-master/06_llm_model_ids_claude_platforms.md
Normal file
@@ -0,0 +1,219 @@
|
||||
# Claude Platform-Specific Guide
|
||||
|
||||
How to use Claude on different cloud platforms.
|
||||
|
||||
## Anthropic API (Direct)
|
||||
|
||||
### Basic Usage
|
||||
|
||||
```python
|
||||
from langchain_anthropic import ChatAnthropic
|
||||
|
||||
llm = ChatAnthropic(
|
||||
model="claude-sonnet-4-5",
|
||||
anthropic_api_key="sk-ant-..."
|
||||
)
|
||||
```
|
||||
|
||||
### Listing Models
|
||||
|
||||
```python
|
||||
import anthropic
|
||||
import os
|
||||
|
||||
client = anthropic.Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))
|
||||
models = client.models.list()
|
||||
|
||||
for model in models.data:
|
||||
print(f"{model.id} - {model.display_name}")
|
||||
```
|
||||
|
||||
## Google Vertex AI
|
||||
|
||||
### Model ID Format
|
||||
|
||||
Vertex AI uses `@` notation:
|
||||
|
||||
```
|
||||
claude-opus-4-1@20250805
|
||||
claude-sonnet-4@20250514
|
||||
claude-haiku-4.5@20251001
|
||||
```
|
||||
|
||||
### Usage
|
||||
|
||||
```python
|
||||
from langchain_google_vertexai import ChatVertexAI
|
||||
|
||||
llm = ChatVertexAI(
|
||||
model="claude-haiku-4.5@20251001",
|
||||
project="your-gcp-project",
|
||||
location="us-central1"
|
||||
)
|
||||
```
|
||||
|
||||
### Environment Setup
|
||||
|
||||
```bash
|
||||
# GCP authentication
|
||||
gcloud auth application-default login
|
||||
|
||||
# Environment variables
|
||||
export GOOGLE_CLOUD_PROJECT="your-project-id"
|
||||
export GOOGLE_CLOUD_LOCATION="us-central1"
|
||||
```
|
||||
|
||||
## AWS Bedrock
|
||||
|
||||
### Model ID Format
|
||||
|
||||
Bedrock uses ARN format:
|
||||
|
||||
```
|
||||
anthropic.claude-opus-4-1-20250805-v1:0
|
||||
anthropic.claude-sonnet-4-20250514-v1:0
|
||||
anthropic.claude-haiku-4-5-20251001-v1:0
|
||||
```
|
||||
|
||||
### Usage
|
||||
|
||||
```python
|
||||
from langchain_aws import ChatBedrock
|
||||
|
||||
llm = ChatBedrock(
|
||||
model_id="anthropic.claude-haiku-4-5-20251001-v1:0",
|
||||
region_name="us-east-1",
|
||||
model_kwargs={
|
||||
"temperature": 0.7,
|
||||
"max_tokens": 4096
|
||||
}
|
||||
)
|
||||
```
|
||||
|
||||
### Environment Setup
|
||||
|
||||
```bash
|
||||
# AWS CLI configuration
|
||||
aws configure
|
||||
|
||||
# Or environment variables
|
||||
export AWS_ACCESS_KEY_ID="your-access-key"
|
||||
export AWS_SECRET_ACCESS_KEY="your-secret-key"
|
||||
export AWS_DEFAULT_REGION="us-east-1"
|
||||
```
|
||||
|
||||
## Azure AI (Microsoft Foundry)
|
||||
|
||||
> **Release**: Public preview started in November 2025
|
||||
|
||||
### Model ID Format
|
||||
|
||||
Azure AI uses the same format as Anthropic API:
|
||||
|
||||
```
|
||||
claude-opus-4-1
|
||||
claude-sonnet-4-5
|
||||
claude-haiku-4-5
|
||||
```
|
||||
|
||||
### Available Models
|
||||
|
||||
- **Claude Opus 4.1** (`claude-opus-4-1`)
|
||||
- **Claude Sonnet 4.5** (`claude-sonnet-4-5`)
|
||||
- **Claude Haiku 4.5** (`claude-haiku-4-5`)
|
||||
|
||||
### Usage
|
||||
|
||||
```python
|
||||
# Calling Claude using Azure OpenAI SDK
|
||||
import os
|
||||
from openai import AzureOpenAI
|
||||
|
||||
client = AzureOpenAI(
|
||||
azure_endpoint=os.getenv("AZURE_FOUNDRY_ENDPOINT"),
|
||||
api_key=os.getenv("AZURE_FOUNDRY_API_KEY"),
|
||||
api_version="2024-12-01-preview"
|
||||
)
|
||||
|
||||
# Specify deployment name (default is same as model ID)
|
||||
response = client.chat.completions.create(
|
||||
model="claude-sonnet-4-5", # Or your custom deployment name
|
||||
messages=[
|
||||
{"role": "user", "content": "Hello"}
|
||||
]
|
||||
)
|
||||
```
|
||||
|
||||
### Custom Deployments
|
||||
|
||||
You can set custom deployment names in the Foundry portal:
|
||||
|
||||
```python
|
||||
# Using custom deployment name
|
||||
response = client.chat.completions.create(
|
||||
model="my-custom-claude-deployment",
|
||||
messages=[...]
|
||||
)
|
||||
```
|
||||
|
||||
### Environment Setup
|
||||
|
||||
```bash
|
||||
export AZURE_FOUNDRY_ENDPOINT="https://your-foundry-resource.azure.com"
|
||||
export AZURE_FOUNDRY_API_KEY="your-api-key"
|
||||
```
|
||||
|
||||
### Region Limitations
|
||||
|
||||
Currently available in the following regions:
|
||||
- **East US2**
|
||||
- **Sweden Central**
|
||||
|
||||
Deployment type: **Global Standard**
|
||||
|
||||
## Platform-Specific Features
|
||||
|
||||
| Platform | Model ID Format | Benefits | Drawbacks |
|
||||
|----------------|------------|---------|-----------|
|
||||
| **Anthropic API** | `claude-sonnet-4-5` | Instant access to latest models | Single provider dependency |
|
||||
| **Vertex AI** | `claude-sonnet-4@20250514` | Integration with GCP services | Complex setup |
|
||||
| **AWS Bedrock** | `anthropic.claude-sonnet-4-20250514-v1:0` | Integration with AWS ecosystem | Complex model ID format |
|
||||
| **Azure AI** | `claude-sonnet-4-5` | Azure + GPT and Claude integration | Region limitations |
|
||||
|
||||
## Cross-Platform Fallback
|
||||
|
||||
```python
|
||||
from langchain_anthropic import ChatAnthropic
|
||||
from langchain_google_vertexai import ChatVertexAI
|
||||
from langchain_aws import ChatBedrock
|
||||
|
||||
# Primary and fallback (multi-platform support)
|
||||
primary = ChatAnthropic(model="claude-sonnet-4-5")
|
||||
fallback_gcp = ChatVertexAI(
|
||||
model="claude-sonnet-4@20250514",
|
||||
project="your-project"
|
||||
)
|
||||
fallback_aws = ChatBedrock(
|
||||
model_id="anthropic.claude-sonnet-4-20250514-v1:0",
|
||||
region_name="us-east-1"
|
||||
)
|
||||
|
||||
# Fallback across three platforms
|
||||
llm = primary.with_fallbacks([fallback_gcp, fallback_aws])
|
||||
```
|
||||
|
||||
## Model ID Comparison Table
|
||||
|
||||
| Anthropic API | Vertex AI | AWS Bedrock | Azure AI |
|
||||
|--------------|-----------|-------------|----------|
|
||||
| `claude-opus-4-1-20250805` | `claude-opus-4-1@20250805` | `anthropic.claude-opus-4-1-20250805-v1:0` | `claude-opus-4-1` |
|
||||
| `claude-sonnet-4-5` | `claude-sonnet-4@20250514` | `anthropic.claude-sonnet-4-20250514-v1:0` | `claude-sonnet-4-5` |
|
||||
| `claude-haiku-4-5-20251001` | `claude-haiku-4.5@20251001` | `anthropic.claude-haiku-4-5-20251001-v1:0` | `claude-haiku-4-5` |
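
A small lookup keyed by platform keeps these IDs in one place; a sketch using the Sonnet row of the table above:

```python
# Sonnet model IDs per platform, copied from the comparison table
SONNET_IDS = {
    "anthropic": "claude-sonnet-4-5",
    "vertex": "claude-sonnet-4@20250514",
    "bedrock": "anthropic.claude-sonnet-4-20250514-v1:0",
    "azure": "claude-sonnet-4-5",
}

def sonnet_id_for(platform: str) -> str:
    """Return the Sonnet model ID for the given platform key."""
    return SONNET_IDS[platform]
```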
|
||||
|
||||
## Reference Links
|
||||
|
||||
- [Anthropic API Documentation](https://docs.anthropic.com/)
|
||||
- [Vertex AI Claude Models](https://docs.cloud.google.com/vertex-ai/generative-ai/docs/partner-models/claude)
|
||||
- [AWS Bedrock Claude Models](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-claude.html)
|
||||
- [Azure AI Claude Documentation](https://learn.microsoft.com/en-us/azure/ai-foundry/foundry-models/how-to/use-foundry-models-claude)
|
||||
- [Claude in Microsoft Foundry Announcement](https://www.anthropic.com/news/claude-in-microsoft-foundry)
|
||||
216
skills/langgraph-master/06_llm_model_ids_claude_tools.md
Normal file
216
skills/langgraph-master/06_llm_model_ids_claude_tools.md
Normal file
@@ -0,0 +1,216 @@
|
||||
# Claude Tool Use Guide
|
||||
|
||||
Implementation methods for Claude's tool use (Function Calling).
|
||||
|
||||
## Basic Tool Definition
|
||||
|
||||
```python
|
||||
from langchain_anthropic import ChatAnthropic
|
||||
from langchain_core.tools import tool
|
||||
|
||||
@tool
|
||||
def get_weather(location: str) -> str:
|
||||
"""Get weather for a specified location.
|
||||
|
||||
Args:
|
||||
location: Location to check weather (e.g., "Tokyo")
|
||||
"""
|
||||
return f"The weather in {location} is sunny"
|
||||
|
||||
@tool
|
||||
def calculate(expression: str) -> float:
|
||||
"""Calculate a mathematical expression.
|
||||
|
||||
Args:
|
||||
expression: Mathematical expression to calculate (e.g., "2 + 2")
|
||||
"""
|
||||
    # Note: eval on untrusted input is unsafe; shown here only for brevity
    return eval(expression)
|
||||
|
||||
# Bind tools
|
||||
llm = ChatAnthropic(model="claude-sonnet-4-5")
|
||||
llm_with_tools = llm.bind_tools([get_weather, calculate])
|
||||
|
||||
# Usage
|
||||
response = llm_with_tools.invoke("Tell me Tokyo's weather and 2+2")
|
||||
print(response.tool_calls)
|
||||
```
|
||||
|
||||
## Tool Integration with LangGraph
|
||||
|
||||
```python
|
||||
from langgraph.prebuilt import create_react_agent
|
||||
from langchain_anthropic import ChatAnthropic
|
||||
from langchain_core.tools import tool
|
||||
|
||||
@tool
|
||||
def search_database(query: str) -> str:
|
||||
"""Search the database.
|
||||
|
||||
Args:
|
||||
query: Search query
|
||||
"""
|
||||
return f"Search results for '{query}'"
|
||||
|
||||
# Create agent
|
||||
llm = ChatAnthropic(model="claude-sonnet-4-5")
|
||||
tools = [search_database]
|
||||
|
||||
agent = create_react_agent(llm, tools)
|
||||
|
||||
# Execute
|
||||
result = agent.invoke({
|
||||
"messages": [("user", "Search for user information")]
|
||||
})
|
||||
```
|
||||
|
||||
## Custom Tool Node Implementation
|
||||
|
||||
```python
|
||||
from langgraph.graph import StateGraph
|
||||
from langchain_anthropic import ChatAnthropic
|
||||
from typing import TypedDict, Annotated
|
||||
from langgraph.graph.message import add_messages
from langchain_core.tools import tool
from langchain_core.messages import ToolMessage
|
||||
|
||||
class State(TypedDict):
|
||||
messages: Annotated[list, add_messages]
|
||||
|
||||
@tool
|
||||
def get_stock_price(symbol: str) -> float:
|
||||
"""Get stock price"""
|
||||
return 150.25
|
||||
|
||||
llm = ChatAnthropic(model="claude-sonnet-4-5")
|
||||
llm_with_tools = llm.bind_tools([get_stock_price])
|
||||
|
||||
def agent_node(state: State):
|
||||
response = llm_with_tools.invoke(state["messages"])
|
||||
return {"messages": [response]}
|
||||
|
||||
def tool_node(state: State):
|
||||
# Execute tool calls
|
||||
last_message = state["messages"][-1]
|
||||
tool_calls = last_message.tool_calls
|
||||
|
||||
    results = []
    for tool_call in tool_calls:
        tool_result = get_stock_price.invoke(tool_call["args"])
        # Tool results must go back into state as ToolMessage objects
        results.append(
            ToolMessage(content=str(tool_result), tool_call_id=tool_call["id"])
        )

    return {"messages": results}
|
||||
|
||||
# Build graph
|
||||
graph = StateGraph(State)
|
||||
graph.add_node("agent", agent_node)
|
||||
graph.add_node("tools", tool_node)
|
||||
# ... Add edges, etc.
|
||||
```
|
||||
|
||||
## Streaming + Tool Use
|
||||
|
||||
```python
|
||||
from langchain_anthropic import ChatAnthropic
|
||||
from langchain_core.tools import tool
|
||||
|
||||
@tool
|
||||
def get_info(topic: str) -> str:
|
||||
"""Get information"""
|
||||
return f"Information about {topic}"
|
||||
|
||||
llm = ChatAnthropic(
|
||||
model="claude-sonnet-4-5",
|
||||
streaming=True
|
||||
)
|
||||
llm_with_tools = llm.bind_tools([get_info])
|
||||
|
||||
for chunk in llm_with_tools.stream("Tell me about Python"):
|
||||
if hasattr(chunk, 'tool_calls') and chunk.tool_calls:
|
||||
print(f"Tool: {chunk.tool_calls}")
|
||||
elif chunk.content:
|
||||
print(chunk.content, end="", flush=True)
|
||||
```
|
||||
|
||||
## Error Handling
|
||||
|
||||
```python
|
||||
from langchain_anthropic import ChatAnthropic
|
||||
from langchain_core.tools import tool
|
||||
import anthropic
|
||||
|
||||
@tool
|
||||
def risky_operation(data: str) -> str:
|
||||
"""Risky operation"""
|
||||
if not data:
|
||||
raise ValueError("Data is required")
|
||||
return f"Processing complete: {data}"
|
||||
|
||||
try:
|
||||
llm = ChatAnthropic(model="claude-sonnet-4-5")
|
||||
llm_with_tools = llm.bind_tools([risky_operation])
|
||||
response = llm_with_tools.invoke("Execute operation")
|
||||
except anthropic.BadRequestError as e:
|
||||
print(f"Invalid request: {e}")
|
||||
except Exception as e:
|
||||
print(f"Error: {e}")
|
||||
```
|
||||
|
||||
## Tool Best Practices
|
||||
|
||||
### 1. Clear Documentation
|
||||
|
||||
```python
|
||||
@tool
|
||||
def analyze_sentiment(text: str, language: str = "en") -> dict:
|
||||
"""Perform sentiment analysis on text.
|
||||
|
||||
Args:
|
||||
text: Text to analyze (max 1000 characters)
|
||||
language: Language of text ("ja", "en", etc.) defaults to English
|
||||
|
||||
Returns:
|
||||
{"sentiment": "positive|negative|neutral", "score": 0.0-1.0}
|
||||
"""
|
||||
# Implementation
|
||||
return {"sentiment": "positive", "score": 0.8}
|
||||
```
|
||||
|
||||
### 2. Use Type Hints
|
||||
|
||||
```python
|
||||
from typing import List, Dict
|
||||
|
||||
@tool
|
||||
def batch_process(items: List[str]) -> Dict[str, int]:
|
||||
"""Batch process multiple items.
|
||||
|
||||
Args:
|
||||
items: List of items to process
|
||||
|
||||
Returns:
|
||||
Dictionary of processing results for each item
|
||||
"""
|
||||
return {item: len(item) for item in items}
|
||||
```
|
||||
|
||||
### 3. Proper Error Handling
|
||||
|
||||
```python
|
||||
@tool
|
||||
def safe_operation(data: str) -> str:
|
||||
"""Safe operation"""
|
||||
try:
|
||||
# Execute operation
|
||||
result = process(data)
|
||||
return result
|
||||
except ValueError as e:
|
||||
return f"Input error: {e}"
|
||||
except Exception as e:
|
||||
return f"Unexpected error: {e}"
|
||||
```
|
||||
|
||||
## Reference Links
|
||||
|
||||
- [Claude Tool Use Guide](https://docs.anthropic.com/en/docs/tool-use)
|
||||
- [LangGraph Tools Documentation](https://langchain-ai.github.io/langgraph/concepts/agentic_concepts/)
|
||||
115
skills/langgraph-master/06_llm_model_ids_gemini.md
Normal file
115
skills/langgraph-master/06_llm_model_ids_gemini.md
Normal file
@@ -0,0 +1,115 @@
|
||||
# Google Gemini Model IDs
|
||||
|
||||
List of available model IDs for the Google Gemini API.
|
||||
|
||||
> **Last Updated**: 2025-11-24
|
||||
|
||||
## Model List
|
||||
|
||||
While there are many models available, `gemini-2.5-flash` is generally recommended for development at this time. It offers a good balance of cost and performance for a wide range of use cases.
|
||||
|
||||
### Gemini 3.x (Latest)
|
||||
|
||||
| Model ID | Context | Max Output | Use Case |
|
||||
| ---------------------------------------- | ------------ | -------- | ------------------ |
|
||||
| `google/gemini-3-pro-preview` | - | 64K | Latest high-performance model |
|
||||
| `google/gemini-3-pro-image-preview` | - | - | Image generation |
|
||||
| `google/gemini-3-pro-image-preview-edit` | - | - | Image editing |
|
||||
|
||||
### Gemini 2.5
|
||||
|
||||
| Model ID | Context | Max Output | Use Case |
|
||||
| ----------------------- | ------------ | -------- | ---------------------- |
|
||||
| `google/gemini-2.5-pro` | 1M (2M planned) | - | High performance |
|
||||
| `gemini-2.5-flash` | 1M | - | Fast balanced model (recommended) |
|
||||
| `gemini-2.5-flash-lite` | 1M | - | Lightweight and fast |
|
||||
|
||||
**Note**: Free tier is limited to approximately 32K tokens. Gemini Advanced (2.5 Pro) supports 1M tokens.
|
||||
|
||||
### Gemini 2.0
|
||||
|
||||
| Model ID | Context | Max Output | Use Case |
|
||||
| ------------------ | ------------ | -------- | ------ |
|
||||
| `gemini-2.0-flash` | 1M | - | Stable version |
|
||||
|
||||
## Basic Usage
|
||||
|
||||
```python
|
||||
from langchain_google_genai import ChatGoogleGenerativeAI
|
||||
|
||||
# Recommended: Balanced model
|
||||
llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash")
|
||||
|
||||
# Also works with prefix
|
||||
llm = ChatGoogleGenerativeAI(model="models/gemini-2.5-flash")
|
||||
|
||||
# High-performance version
|
||||
llm = ChatGoogleGenerativeAI(model="google/gemini-3-pro")
|
||||
|
||||
# Lightweight version
|
||||
llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash-lite")
|
||||
```
|
||||
|
||||
### Environment Variables
|
||||
|
||||
```bash
|
||||
export GOOGLE_API_KEY="your-api-key"
|
||||
```
|
||||
|
||||
## Model Selection Guide
|
||||
|
||||
| Use Case | Recommended Model |
|
||||
| ------------------ | ------------------------------ |
|
||||
| Cost-focused | `gemini-2.5-flash-lite` |
|
||||
| Balanced | `gemini-2.5-flash` |
|
||||
| Performance-focused | `google/gemini-3-pro` |
|
||||
| Large context | `gemini-2.5-pro` (1M tokens) |
|
||||
|
||||
## Gemini Features
|
||||
|
||||
### 1. Large Context Window
|
||||
|
||||
Gemini is the **industry's first model to support 1M tokens**:
|
||||
|
||||
| Tier | Context Limit |
|
||||
| ------------------------- | ---------------- |
|
||||
| Gemini Advanced (2.5 Pro) | 1M tokens |
|
||||
| Vertex AI | 1M tokens |
|
||||
| Free tier | ~32K tokens |
|
||||
|
||||
**Use Cases**:
|
||||
|
||||
- Long document analysis
|
||||
- Understanding entire codebases
|
||||
- Long conversation history
|
||||
|
||||
```python
|
||||
# Processing large context
|
||||
llm = ChatGoogleGenerativeAI(
|
||||
model="gemini-2.5-pro",
|
||||
max_tokens=8192 # Specify output token count
|
||||
)
|
||||
```
|
||||
|
||||
**Future**: Gemini 2.5 Pro is planned to support 2M token context windows.
|
||||
|
||||
### 2. Multimodal Support
|
||||
|
||||
Image input and generation capabilities (see [Advanced Features](06_llm_model_ids_gemini_advanced.md) for details).
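
A minimal image-input sketch with a local file, base64-encoded inline (the file path is illustrative; see the advanced guide for the full multimodal API):

```python
import base64

from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.messages import HumanMessage

llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash")

# Read and base64-encode a local image (example path)
with open("photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

message = HumanMessage(
    content=[
        {"type": "text", "text": "Describe this image"},
        {"type": "image_url", "image_url": f"data:image/jpeg;base64,{image_b64}"},
    ]
)
response = llm.invoke([message])
```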
|
||||
|
||||
## Important Notes
|
||||
|
||||
- ❌ **Deprecated**: Gemini 1.0, 1.5 series are no longer available
|
||||
- ✅ **Migration Recommended**: Use `gemini-2.5-flash` or later models
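
A defensive check like the following can catch deprecated 1.x model IDs before they reach the API (a sketch based on the note above):

```python
def assert_supported_gemini(model_id: str) -> str:
    """Raise if a deprecated Gemini 1.0 / 1.5 model ID is used."""
    if model_id.startswith(("gemini-1.0", "gemini-1.5")):
        raise ValueError(f"{model_id} is deprecated; use gemini-2.5-flash or later")
    return model_id

model = assert_supported_gemini("gemini-2.5-flash")
```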
|
||||
|
||||
## Detailed Documentation
|
||||
|
||||
For advanced configuration and multimodal features, see:
|
||||
|
||||
- **[Gemini Advanced Features](06_llm_model_ids_gemini_advanced.md)**
|
||||
|
||||
## Reference Links
|
||||
|
||||
- [Gemini API Official](https://ai.google.dev/gemini-api/docs/models)
|
||||
- [Google AI Studio](https://makersuite.google.com/)
|
||||
- [LangChain Integration](https://docs.langchain.com/oss/python/integrations/chat/google_generative_ai)
|
||||
118
skills/langgraph-master/06_llm_model_ids_gemini_advanced.md
Normal file
118
skills/langgraph-master/06_llm_model_ids_gemini_advanced.md
Normal file
@@ -0,0 +1,118 @@
|
||||
# Gemini Advanced Features
|
||||
|
||||
Advanced configuration and multimodal features for Google Gemini models.
|
||||
|
||||
## Context Window and Output Limits
|
||||
|
||||
| Model | Context Window | Max Output Tokens |
|
||||
|--------|-------------------|---------------|
|
||||
| Gemini 3 Pro | - | 64K |
|
||||
| Gemini 2.5 Pro | 1M (2M planned) | - |
|
||||
| Gemini 2.5 Flash | 1M | - |
|
||||
| Gemini 2.0 Flash | 1M | - |
|
||||
|
||||
**Tier-based Limits**:
|
||||
- Gemini Advanced / Vertex AI: 1M tokens
|
||||
- Free tier: ~32K tokens
|
||||
|
||||
## Parameter Configuration
|
||||
|
||||
```python
|
||||
from langchain_google_genai import ChatGoogleGenerativeAI
|
||||
|
||||
llm = ChatGoogleGenerativeAI(
|
||||
model="gemini-2.5-flash",
|
||||
temperature=0.7, # Creativity (0.0-1.0)
|
||||
top_p=0.9, # Diversity
|
||||
top_k=40, # Sampling
|
||||
max_tokens=8192, # Max output
|
||||
)
|
||||
```
|
||||
|
||||
## Multimodal Features
|
||||
|
||||
### Image Input
|
||||
|
||||
```python
|
||||
from langchain_google_genai import ChatGoogleGenerativeAI
|
||||
from langchain_core.messages import HumanMessage
|
||||
|
||||
llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash")
|
||||
|
||||
message = HumanMessage(
|
||||
content=[
|
||||
{"type": "text", "text": "What is in this image?"},
|
||||
{"type": "image_url", "image_url": "https://example.com/image.jpg"}
|
||||
]
|
||||
)
|
||||
|
||||
response = llm.invoke([message])
|
||||
```
|
||||
|
||||
### Image Generation (Gemini 3.x)
|
||||
|
||||
```python
|
||||
llm = ChatGoogleGenerativeAI(model="google/gemini-3-pro-image-preview")
|
||||
response = llm.invoke("Generate a beautiful sunset landscape")
|
||||
```
|
||||
|
||||
## Streaming
|
||||
|
||||
```python
|
||||
llm = ChatGoogleGenerativeAI(
|
||||
model="gemini-2.5-flash",
|
||||
streaming=True
|
||||
)
|
||||
|
||||
for chunk in llm.stream("Question"):
|
||||
print(chunk.content, end="", flush=True)
|
||||
```
|
||||
|
||||
## Safety Settings
|
||||
|
||||
```python
|
||||
from langchain_google_genai import (
|
||||
ChatGoogleGenerativeAI,
|
||||
HarmBlockThreshold,
|
||||
HarmCategory
|
||||
)
|
||||
|
||||
llm = ChatGoogleGenerativeAI(
|
||||
model="gemini-2.5-flash",
|
||||
safety_settings={
|
||||
HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_NONE,
|
||||
HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_NONE,
|
||||
}
|
||||
)
|
||||
```
|
||||
|
||||
## Retrieving Model List
|
||||
|
||||
```python
|
||||
import google.generativeai as genai
|
||||
import os
|
||||
|
||||
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))
|
||||
|
||||
for model in genai.list_models():
|
||||
if 'generateContent' in model.supported_generation_methods:
|
||||
print(f"{model.name}: {model.input_token_limit} tokens")
|
||||
```
|
||||
|
||||
## Error Handling
|
||||
|
||||
```python
|
||||
from google.api_core import exceptions
|
||||
|
||||
try:
|
||||
response = llm.invoke("Question")
|
||||
except exceptions.ResourceExhausted:
|
||||
print("Rate limit reached")
|
||||
except exceptions.InvalidArgument as e:
|
||||
print(f"Invalid argument: {e}")
|
||||
```
|
||||
|
||||
## Reference Links
|
||||
|
||||
- [Gemini API Models](https://ai.google.dev/gemini-api/docs/models)
|
||||
- [Vertex AI](https://docs.cloud.google.com/vertex-ai/generative-ai/docs/models)
|
||||
186
skills/langgraph-master/06_llm_model_ids_openai.md
Normal file
186
skills/langgraph-master/06_llm_model_ids_openai.md
Normal file
@@ -0,0 +1,186 @@
|
||||
# OpenAI GPT Model IDs
|
||||
|
||||
List of available model IDs for the OpenAI API.
|
||||
|
||||
> **Last Updated**: 2025-11-24
|
||||
|
||||
## Model List
|
||||
|
||||
### GPT-5 Series
|
||||
|
||||
> **Released**: August 2025
|
||||
|
||||
| Model ID | Context | Max Output | Features |
|
||||
|-----------|------------|---------|------|
|
||||
| `gpt-5` | 400K | 128K | Full-featured. High-quality general-purpose tasks |
|
||||
| `gpt-5-pro` | 400K | 272K | Extended reasoning version. Complex enterprise and research use cases |
|
||||
| `gpt-5-mini` | 400K | 128K | Small high-speed version. Low latency |
|
||||
| `gpt-5-nano` | 400K | 128K | Ultra-lightweight version. Resource optimized |
|
||||
|
||||
**Performance**: Achieved 94.6% on AIME 2025, 74.9% on SWE-bench Verified
|
||||
**Note**: Context window is the combined length of input + output
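
Because the window covers input plus output, it can help to budget `max_tokens` against the prompt size; a rough sketch (the 4-characters-per-token figure is only an approximation):

```python
def remaining_output_budget(prompt: str, context_window: int = 400_000) -> int:
    """Very rough output-token budget left after the prompt (~4 chars per token)."""
    estimated_input_tokens = len(prompt) // 4
    return max(context_window - estimated_input_tokens, 0)

print(remaining_output_budget("..." * 10_000))  # large prompt, still plenty of budget left
```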
|
||||
|
||||
### GPT-5.1 Series (Latest Update)
|
||||
|
||||
| Model ID | Context | Max Output | Features |
|
||||
|-----------|------------|---------|------|
|
||||
| `gpt-5.1` | 128K (ChatGPT) / 400K (API) | 128K | Balance of intelligence and speed |
|
||||
| `gpt-5.1-instant` | 128K / 400K | 128K | Adaptive reasoning. Balances speed and accuracy |
|
||||
| `gpt-5.1-thinking` | 128K / 400K | 128K | Adjusts thinking time based on problem complexity |
|
||||
| `gpt-5.1-mini` | 128K / 400K | 128K | Compact version |
|
||||
| `gpt-5.1-codex` | 400K | 128K | Code-specialized version (for GitHub Copilot) |
|
||||
| `gpt-5.1-codex-mini` | 400K | 128K | Code-specialized compact version |
|
||||
|
||||
## Basic Usage
|
||||
|
||||
```python
|
||||
from langchain_openai import ChatOpenAI
|
||||
|
||||
# Latest: GPT-5
|
||||
llm = ChatOpenAI(model="gpt-5")
|
||||
|
||||
# Latest update: GPT-5.1
|
||||
llm = ChatOpenAI(model="gpt-5.1")
|
||||
|
||||
# High performance: GPT-5 Pro
|
||||
llm = ChatOpenAI(model="gpt-5-pro")
|
||||
|
||||
# Cost-conscious: Compact version
|
||||
llm = ChatOpenAI(model="gpt-5-mini")
|
||||
|
||||
# Ultra-lightweight
|
||||
llm = ChatOpenAI(model="gpt-5-nano")
|
||||
```
|
||||
|
||||
### Environment Variables
|
||||
|
||||
```bash
|
||||
export OPENAI_API_KEY="sk-..."
|
||||
```
|
||||
|
||||
## Model Selection Guide
|
||||
|
||||
| Use Case | Recommended Model |
|
||||
|------|-----------|
|
||||
| **Maximum Performance** | `gpt-5-pro` |
|
||||
| **General-Purpose Tasks** | `gpt-5` or `gpt-5.1` |
|
||||
| **Cost-Conscious** | `gpt-5-mini` |
|
||||
| **Ultra-Lightweight** | `gpt-5-nano` |
|
||||
| **Adaptive Reasoning** | `gpt-5.1-instant` or `gpt-5.1-thinking` |
|
||||
| **Code Generation** | `gpt-5.1-codex` or `gpt-5` |
|
||||
|
||||
## GPT-5 Features
|
||||
|
||||
### 1. Large Context Window
|
||||
|
||||
GPT-5 series has a **400K token** context window:
|
||||
|
||||
```python
|
||||
llm = ChatOpenAI(
|
||||
model="gpt-5",
|
||||
max_tokens=128000 # Max output: 128K
|
||||
)
|
||||
|
||||
# GPT-5 Pro has a maximum output of 272K
|
||||
llm_pro = ChatOpenAI(
|
||||
model="gpt-5-pro",
|
||||
max_tokens=272000
|
||||
)
|
||||
```
|
||||
|
||||
**Use Cases**:
|
||||
- Batch processing of long documents
|
||||
- Analysis of large codebases
|
||||
- Maintaining long conversation histories
|
||||
|
||||
### 2. On-Demand Software Generation
|
||||
|
||||
```python
|
||||
llm = ChatOpenAI(model="gpt-5")
|
||||
response = llm.invoke("Generate a web application")
|
||||
```
|
||||
|
||||
### 3. Advanced Reasoning Capabilities
|
||||
|
||||
**Performance Metrics**:
|
||||
- AIME 2025: 94.6%
|
||||
- SWE-bench Verified: 74.9%
|
||||
- Aider Polyglot: 88%
|
||||
- MMMU: 84.2%
|
||||
|
||||
### 4. GPT-5.1 Adaptive Reasoning
|
||||
|
||||
Automatically adjusts thinking time based on problem complexity:
|
||||
|
||||
```python
|
||||
# Balance between speed and accuracy
|
||||
llm = ChatOpenAI(model="gpt-5.1-instant")
|
||||
|
||||
# Tasks requiring deep thought
|
||||
llm = ChatOpenAI(model="gpt-5.1-thinking")
|
||||
```
|
||||
|
||||
**Compaction Technology**: GPT-5.1 introduces technology that effectively handles longer contexts.
|
||||
|
||||
### 5. GPT-5 Pro - Extended Reasoning
|
||||
|
||||
Advanced reasoning for enterprise and research environments. **Maximum output of 272K tokens**:
|
||||
|
||||
```python
|
||||
llm = ChatOpenAI(
|
||||
model="gpt-5-pro",
|
||||
max_tokens=272000 # Larger output possible than other models
|
||||
)
|
||||
# More detailed and reliable responses
|
||||
```
|
||||
|
||||
### 6. Code-Specialized Models
|
||||
|
||||
```python
|
||||
# Used in GitHub Copilot
|
||||
llm = ChatOpenAI(model="gpt-5.1-codex")
|
||||
|
||||
# Compact version
|
||||
llm = ChatOpenAI(model="gpt-5.1-codex-mini")
|
||||
```
|
||||
|
||||
## Multimodal Support
|
||||
|
||||
GPT-5 supports images and audio (see [Advanced Features](06_llm_model_ids_openai_advanced.md) for details).
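
A minimal image-input sketch with a local file, base64-encoded inline (the file path is illustrative; the advanced guide covers the full vision API):

```python
import base64

from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

llm = ChatOpenAI(model="gpt-5")

# Read and base64-encode a local image (example path)
with open("diagram.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

message = HumanMessage(
    content=[
        {"type": "text", "text": "Explain this diagram"},
        {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
    ]
)
response = llm.invoke([message])
```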
|
||||
|
||||
## JSON Mode
|
||||
|
||||
When structured output is needed:
|
||||
|
||||
```python
|
||||
llm = ChatOpenAI(
|
||||
model="gpt-5",
|
||||
model_kwargs={"response_format": {"type": "json_object"}}
|
||||
)
|
||||
```
|
||||
|
||||
## Retrieving Model List
|
||||
|
||||
```python
|
||||
from openai import OpenAI
|
||||
import os
|
||||
|
||||
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
|
||||
models = client.models.list()
|
||||
|
||||
for model in models:
|
||||
if model.id.startswith("gpt-5"):
|
||||
print(model.id)
|
||||
```
|
||||
|
||||
## Detailed Documentation
|
||||
|
||||
For advanced settings, vision features, and Azure OpenAI:
|
||||
- **[OpenAI Advanced Features](06_llm_model_ids_openai_advanced.md)**
|
||||
|
||||
## Reference Links
|
||||
|
||||
- [OpenAI GPT-5](https://openai.com/index/introducing-gpt-5/)
|
||||
- [OpenAI GPT-5.1](https://openai.com/index/gpt-5-1/)
|
||||
- [OpenAI Platform](https://platform.openai.com/)
|
||||
- [LangChain Integration](https://docs.langchain.com/oss/python/integrations/chat/openai)
|
||||
289
skills/langgraph-master/06_llm_model_ids_openai_advanced.md
Normal file
289
skills/langgraph-master/06_llm_model_ids_openai_advanced.md
Normal file
@@ -0,0 +1,289 @@
|
||||
# OpenAI GPT-5 Advanced Features
|
||||
|
||||
Advanced settings and multimodal features for GPT-5 models.
|
||||
|
||||
## Parameter Settings
|
||||
|
||||
```python
|
||||
from langchain_openai import ChatOpenAI
|
||||
|
||||
llm = ChatOpenAI(
|
||||
model="gpt-5",
|
||||
temperature=0.7, # Creativity (0.0-2.0)
|
||||
max_tokens=128000, # Max output (GPT-5: 128K)
|
||||
top_p=0.9, # Diversity
|
||||
frequency_penalty=0.0, # Repetition penalty
|
||||
presence_penalty=0.0, # Topic diversity
|
||||
)
|
||||
|
||||
# GPT-5 Pro (larger max output)
|
||||
llm_pro = ChatOpenAI(
|
||||
model="gpt-5-pro",
|
||||
max_tokens=272000, # GPT-5 Pro: 272K
|
||||
)
|
||||
```
|
||||
|
||||
## Context Window and Output Limits
|
||||
|
||||
| Model | Context Window | Max Output Tokens |
|
||||
|--------|-------------------|---------------|
|
||||
| `gpt-5` | 400,000 (API) | 128,000 |
|
||||
| `gpt-5-mini` | 400,000 (API) | 128,000 |
|
||||
| `gpt-5-nano` | 400,000 (API) | 128,000 |
|
||||
| `gpt-5-pro` | 400,000 | 272,000 |
|
||||
| `gpt-5.1` | 128,000 (ChatGPT) / 400,000 (API) | 128,000 |
|
||||
| `gpt-5.1-codex` | 400,000 | 128,000 |
|
||||
|
||||
**Note**: Context window is the combined length of input + output.
|
||||
|
||||
## Vision (Image Processing)
|
||||
|
||||
```python
|
||||
from langchain_openai import ChatOpenAI
|
||||
from langchain_core.messages import HumanMessage
|
||||
|
||||
llm = ChatOpenAI(model="gpt-5")
|
||||
|
||||
message = HumanMessage(
|
||||
content=[
|
||||
{"type": "text", "text": "What is shown in this image?"},
|
||||
{
|
||||
"type": "image_url",
|
||||
"image_url": {
|
||||
"url": "https://example.com/image.jpg",
|
||||
"detail": "high" # "low", "high", "auto"
|
||||
}
|
||||
}
|
||||
]
|
||||
)
|
||||
|
||||
response = llm.invoke([message])
|
||||
```
|
||||
|
||||
## Tool Use (Function Calling)
|
||||
|
||||
```python
|
||||
from langchain_openai import ChatOpenAI
|
||||
from langchain_core.tools import tool
|
||||
|
||||
@tool
|
||||
def get_weather(location: str) -> str:
|
||||
"""Get weather"""
|
||||
return f"The weather in {location} is sunny"
|
||||
|
||||
@tool
|
||||
def calculate(expression: str) -> float:
|
||||
"""Calculate"""
|
||||
return eval(expression)
|
||||
|
||||
llm = ChatOpenAI(model="gpt-5")
|
||||
llm_with_tools = llm.bind_tools([get_weather, calculate])
|
||||
|
||||
response = llm_with_tools.invoke("Tell me the weather in Tokyo and 2+2")
|
||||
print(response.tool_calls)
|
||||
```
|
||||
|
||||
## Parallel Tool Calling
|
||||
|
||||
```python
|
||||
@tool
|
||||
def get_stock_price(symbol: str) -> float:
|
||||
"""Get stock price"""
|
||||
return 150.25
|
||||
|
||||
@tool
|
||||
def get_company_info(symbol: str) -> dict:
|
||||
"""Get company information"""
|
||||
return {"name": "Apple Inc.", "industry": "Technology"}
|
||||
|
||||
llm = ChatOpenAI(model="gpt-5")
|
||||
llm_with_tools = llm.bind_tools([get_stock_price, get_company_info])
|
||||
|
||||
# Call multiple tools in parallel
|
||||
response = llm_with_tools.invoke("Tell me the stock price and company info for AAPL")
|
||||
```
|
||||
|
||||
## Streaming
|
||||
|
||||
```python
|
||||
llm = ChatOpenAI(
|
||||
model="gpt-5",
|
||||
streaming=True
|
||||
)
|
||||
|
||||
for chunk in llm.stream("Question"):
|
||||
print(chunk.content, end="", flush=True)
|
||||
```
|
||||
|
||||
## JSON Mode
|
||||
|
||||
```python
|
||||
llm = ChatOpenAI(
|
||||
model="gpt-5",
|
||||
model_kwargs={"response_format": {"type": "json_object"}}
|
||||
)
|
||||
|
||||
response = llm.invoke("Return user information in JSON format")
|
||||
```
|
||||
|
||||
## Using GPT-5.1 Adaptive Reasoning
|
||||
|
||||
### Instant Mode
|
||||
|
||||
Balance between speed and accuracy:
|
||||
|
||||
```python
|
||||
llm = ChatOpenAI(model="gpt-5.1-instant")
|
||||
|
||||
# Adaptively adjusts reasoning time
|
||||
response = llm.invoke("Solve this problem...")
|
||||
```
|
||||
|
||||
### Thinking Mode
|
||||
|
||||
Deep thought for complex problems:
|
||||
|
||||
```python
|
||||
llm = ChatOpenAI(model="gpt-5.1-thinking")
|
||||
|
||||
# Improves accuracy with longer thinking time
|
||||
response = llm.invoke("Complex math problem...")
|
||||
```
|
||||
|
||||
## Leveraging GPT-5 Pro
|
||||
|
||||
Extended reasoning for enterprise and research environments:
|
||||
|
||||
```python
|
||||
llm = ChatOpenAI(
|
||||
model="gpt-5-pro",
|
||||
temperature=0.3, # Precision-focused
|
||||
max_tokens=272000 # Large output possible
|
||||
)
|
||||
|
||||
# More detailed and reliable responses
|
||||
response = llm.invoke("Detailed analysis of...")
|
||||
```
|
||||
|
||||
## Code Generation Specialized Models
|
||||
|
||||
```python
|
||||
# Codex used in GitHub Copilot
|
||||
llm = ChatOpenAI(model="gpt-5.1-codex")
|
||||
|
||||
response = llm.invoke("Implement quicksort in Python")
|
||||
|
||||
# Compact version (fast)
|
||||
llm_mini = ChatOpenAI(model="gpt-5.1-codex-mini")
|
||||
```
|
||||
|
||||
## Tracking Token Usage
|
||||
|
||||
```python
|
||||
from langchain.callbacks import get_openai_callback
|
||||
|
||||
llm = ChatOpenAI(model="gpt-5")
|
||||
|
||||
with get_openai_callback() as cb:
|
||||
response = llm.invoke("Question")
|
||||
print(f"Total Tokens: {cb.total_tokens}")
|
||||
print(f"Prompt Tokens: {cb.prompt_tokens}")
|
||||
print(f"Completion Tokens: {cb.completion_tokens}")
|
||||
print(f"Total Cost (USD): ${cb.total_cost}")
|
||||
```
|
||||
|
||||
## Azure OpenAI Service
|
||||
|
||||
GPT-5 is also available on Azure:
|
||||
|
||||
```python
|
||||
from langchain_openai import AzureChatOpenAI
|
||||
|
||||
llm = AzureChatOpenAI(
|
||||
azure_endpoint="https://your-resource.openai.azure.com/",
|
||||
api_key="your-azure-api-key",
|
||||
api_version="2024-12-01-preview",
|
||||
deployment_name="gpt-5",
|
||||
model="gpt-5"
|
||||
)
|
||||
```
|
||||
|
||||
### Environment Variables (Azure)
|
||||
|
||||
```bash
|
||||
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/"
|
||||
export AZURE_OPENAI_API_KEY="your-azure-api-key"
|
||||
export AZURE_OPENAI_DEPLOYMENT_NAME="gpt-5"
|
||||
```
|
||||
|
||||
## Error Handling
|
||||
|
||||
```python
|
||||
from langchain_openai import ChatOpenAI
|
||||
from openai import OpenAIError, RateLimitError
|
||||
|
||||
try:
|
||||
llm = ChatOpenAI(model="gpt-5")
|
||||
response = llm.invoke("Question")
|
||||
except RateLimitError:
|
||||
print("Rate limit reached")
|
||||
except OpenAIError as e:
|
||||
print(f"OpenAI error: {e}")
|
||||
```
|
||||
|
||||
## Handling Rate Limits
|
||||
|
||||
```python
|
||||
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential
from openai import RateLimitError

@retry(
    wait=wait_exponential(multiplier=1, min=4, max=60),
    stop=stop_after_attempt(5),
    retry=retry_if_exception_type(RateLimitError),  # retry only on rate-limit errors
)
def invoke_with_retry(llm, messages):
    return llm.invoke(messages)

llm = ChatOpenAI(model="gpt-5")
response = invoke_with_retry(llm, "Question")
|
||||
```
|
||||
|
||||
## Leveraging Large Context
|
||||
|
||||
Utilizing GPT-5's 400K context window:
|
||||
|
||||
```python
|
||||
llm = ChatOpenAI(model="gpt-5")
|
||||
|
||||
# Process large amounts of documents at once
|
||||
long_document = "..." * 100000 # Long document
|
||||
|
||||
response = llm.invoke(f"""
|
||||
Please analyze the following document:
|
||||
|
||||
{long_document}
|
||||
|
||||
Provide a summary and key points.
|
||||
""")
|
||||
```
|
||||
|
||||
## Compaction Technology
|
||||
|
||||
GPT-5.1 introduces technology that effectively handles longer contexts:
|
||||
|
||||
```python
|
||||
# Processing very long conversation histories or documents
|
||||
llm = ChatOpenAI(model="gpt-5.1")
|
||||
|
||||
# Efficiently processed through Compaction
|
||||
response = llm.invoke(very_long_context)
|
||||
```
|
||||
|
||||
## Reference Links
|
||||
|
||||
- [OpenAI GPT-5 Documentation](https://openai.com/gpt-5/)
|
||||
- [OpenAI GPT-5.1 Documentation](https://openai.com/index/gpt-5-1/)
|
||||
- [OpenAI API Reference](https://platform.openai.com/docs/api-reference)
|
||||
- [OpenAI Platform Models](https://platform.openai.com/docs/models)
|
||||
- [Azure OpenAI Documentation](https://learn.microsoft.com/azure/ai-services/openai/)
|
||||
131
skills/langgraph-master/README.md
Normal file
131
skills/langgraph-master/README.md
Normal file
@@ -0,0 +1,131 @@
# langgraph-master

**PROACTIVE SKILL** - Comprehensive guide for building AI agents with LangGraph. Claude invokes this skill automatically when LangGraph development is detected, providing architecture patterns, implementation guidance, and best practices.

## Installation

```
/plugin marketplace add hiroshi75/ccplugins
/plugin install langgraph-master-plugin@hiroshi75
```

## Automatic Triggers

Claude **automatically invokes** this skill when:

- **LangGraph development** - Detecting LangGraph imports or StateGraph usage
- **Agent architecture** - Planning or implementing AI agent workflows
- **Graph patterns** - Working with nodes, edges, or state management
- **Keywords detected** - When user mentions: LangGraph, StateGraph, agent workflow, node, edge, checkpointer
- **Implementation requests** - Building chatbots, RAG agents, or autonomous systems

**No manual action required** - Claude provides LangGraph expertise automatically.

## Workflow

```
Detect LangGraph context → Auto-invoke skill → Provide patterns/guidance → Implement with best practices
```

## Manual Invocation (Optional)

To manually trigger LangGraph guidance:

```
/langgraph-master-plugin:langgraph-master
```

For learning specific patterns:

```
/langgraph-master-plugin:langgraph-master "explain routing pattern"
```

## Learning Resources

The skill provides comprehensive documentation covering:

| Category | Topics | Files |
|----------|--------|-------|
| **Core Concepts** | State, Node, Edge fundamentals | 01_core_concepts_*.md |
| **Architecture** | 6 major graph patterns (Routing, Agent, etc.) | 02_graph_architecture_*.md |
| **Memory** | Checkpointer, Store, Persistence | 03_memory_management_*.md |
| **Tools** | Tool definition, Command API, Tool Node | 04_tool_integration_*.md |
| **Advanced** | Human-in-the-Loop, Streaming, Map-Reduce | 05_advanced_features_*.md |
| **Models** | Gemini, Claude, OpenAI model IDs | 06_llm_model_ids*.md |
| **Examples** | Chatbot, RAG agent implementations | example_*.md |

## Subagent: langgraph-engineer

The skill includes a specialized **langgraph-master-plugin:langgraph-engineer** subagent for efficient parallel development:

### Key Features

- **Functional Module Scope**: Implements complete features (2-5 nodes) as cohesive units
- **Parallel Execution**: Multiple subagents can develop different modules simultaneously
- **Production-Ready**: No TODOs or placeholders, fully functional code only
- **Skill-Driven**: Always references langgraph-master documentation before implementation

### When to Use

1. **Feature Module Implementation**: RAG search, intent analysis, approval workflows
2. **Subgraph Patterns**: Complete functional units with nodes, edges, and state
3. **Tool Integration**: Full tool integration modules with error handling

### Parallel Development Pattern

```
Planner → Decompose into functional modules
  ├─ langgraph-engineer 1: Intent analysis module (parallel)
  │   └─ analyze + classify + route nodes
  └─ langgraph-engineer 2: RAG search module (parallel)
      └─ retrieve + rerank + generate nodes
Orchestrator → Integrate modules into complete graph
```

## How It Works

1. **Context Detection** - Claude monitors LangGraph-related activities
2. **Trigger Evaluation** - Checks if auto-invoke conditions are met
3. **Skill Invocation** - Automatically invokes langgraph-master skill
4. **Pattern Guidance** - Provides architecture patterns and best practices
5. **Implementation Support** - Assists with code generation using documented patterns

## Example Use Cases

### Automatic Guidance

```python
# Claude detects LangGraph usage and automatically provides guidance
from typing import TypedDict

from langgraph.graph import StateGraph

# Skill auto-invoked → Provides state management patterns
class AgentState(TypedDict):
    messages: list[str]
```

### Pattern Implementation

```
User: "Build a RAG agent with LangGraph"
Claude: [Auto-invokes skill]
  → Provides RAG architecture pattern
  → Suggests node structure (retrieve → rerank → generate)
  → Implements with checkpointer for state persistence
```

### Subagent Delegation

```
User: "Create a chatbot with intent classification and RAG search"
Claude: → Decomposes into 2 modules
  → Spawns langgraph-engineer for each module (parallel)
  → Integrates completed modules into final graph
```

## Benefits

- **Faster Development**: Pre-validated architecture patterns reduce trial and error
- **Best Practices**: Automatically applies LangGraph best practices and conventions
- **Parallel Implementation**: Efficient development through subagent delegation
- **Complete Documentation**: 40+ documentation files covering all aspects
- **Production-Ready**: Guidance ensures robust, maintainable implementations

## Reference Links

- [LangGraph Official Docs](https://docs.langchain.com/oss/python/langgraph/overview)
- [LangGraph GitHub](https://github.com/langchain-ai/langgraph)
193
skills/langgraph-master/SKILL.md
Normal file
193
skills/langgraph-master/SKILL.md
Normal file
@@ -0,0 +1,193 @@
---
name: langgraph-master
description: Use when specifying or implementing LangGraph applications - from architecture planning and specification writing to actual code implementation. Also use for designing agent workflows or learning LangGraph patterns. This is a comprehensive guide for building AI agents with LangGraph, covering core concepts, architecture patterns, memory management, tool integration, and advanced features.
---

# LangGraph Agent Construction Skill

A comprehensive guide for building AI agents using LangGraph.

## 📚 Learning Content

### [01. Core Concepts](01_core_concepts_overview.md)

Understanding the three core elements of LangGraph

- [State](01_core_concepts_state.md)
- [Node](01_core_concepts_node.md)
- [Edge](01_core_concepts_edge.md)
- Advantages of the graph-based approach

### [02. Graph Architecture](02_graph_architecture_overview.md)

Six major graph patterns and agent design

- [Workflow vs Agent Differences](02_graph_architecture_workflow_vs_agent.md)
- [Prompt Chaining (Sequential Processing)](02_graph_architecture_prompt_chaining.md)
- [Parallelization](02_graph_architecture_parallelization.md)
- [Routing (Branching)](02_graph_architecture_routing.md)
- [Orchestrator-Worker](02_graph_architecture_orchestrator_worker.md)
- [Evaluator-Optimizer](02_graph_architecture_evaluator_optimizer.md)
- [Agent (Autonomous Tool Usage)](02_graph_architecture_agent.md)
- [Subgraph](02_graph_architecture_subgraph.md)

### [03. Memory Management](03_memory_management_overview.md)

Persistence and checkpoint functionality

- [Checkpointer](03_memory_management_checkpointer.md)
- [Store (Long-term Memory)](03_memory_management_store.md)
- [Persistence](03_memory_management_persistence.md)

### [04. Tool Integration](04_tool_integration_overview.md)

External tool integration and execution control

- [Tool Definition](04_tool_integration_tool_definition.md)
- [Command API (Control API)](04_tool_integration_command_api.md)
- [Tool Node](04_tool_integration_tool_node.md)

### [05. Advanced Features](05_advanced_features_overview.md)

Advanced functionality and implementation patterns

- [Human-in-the-Loop (Approval Flow)](05_advanced_features_human_in_the_loop.md)
- [Streaming](05_advanced_features_streaming.md)
- [Map-Reduce Pattern](05_advanced_features_map_reduce.md)

### [06. LLM Model IDs](06_llm_model_ids.md)

Model ID reference for major LLM providers. Always refer to this document when selecting model IDs. Do not use models not listed in this document.

- Google Gemini model list
- Anthropic Claude model list
- OpenAI GPT model list
- Usage examples and best practices with LangGraph

### Implementation Examples

Practical agent implementation examples

- [Basic Chatbot](example_basic_chatbot.md)
- [RAG Agent](example_rag_agent.md)

## 📖 How to Use

Each section can be read independently, but reading them in order is recommended:

1. First understand LangGraph fundamentals in "Core Concepts"
2. Learn design patterns in "Graph Architecture"
3. Grasp implementation details in "Memory Management" and "Tool Integration"
4. Master advanced features in "Advanced Features"
5. Check practical usage in "Implementation Examples"

Each file is kept short and concise, allowing you to reference only the sections you need.

## 🤖 Efficient Implementation: Utilizing Subagents

To accelerate LangGraph application development, utilize the dedicated subagent `langgraph-master-plugin:langgraph-engineer`.

### Subagent Characteristics

**langgraph-master-plugin:langgraph-engineer** is an agent specialized in implementing functional modules:

- **Functional Unit Scope**: Implements complete functionality with multiple nodes, edges, and state definitions as a set
- **Parallel Execution Optimization**: Designed for multiple agents to develop different functional modules simultaneously
- **Skill-Driven**: Always references the langgraph-master skill before implementation
- **Complete Implementation**: Generates fully functional modules (no TODOs or placeholders)
- **Appropriate Size**: Functional units of about 2-5 nodes (subgraphs, workflow patterns, tool integrations, etc.)

### When to Use

Use langgraph-master-plugin:langgraph-engineer in the following cases:

1. **When functional module implementation is needed**
   - Decompose the application into functional units
   - Efficiently develop each function through parallel execution

2. **Subgraph and pattern implementation**
   - RAG search functionality (retrieve → rerank → generate)
   - Human-in-the-Loop approval flow (propose → wait_approval → execute)
   - Intent analysis functionality (analyze → classify → route)

3. **Tool integration and memory setup**
   - Complete tool integration module (definition → execution → processing → error handling)
   - Memory management module (checkpoint setup → persistence → restoration)

### Practical Example

**Task**: Build a chatbot with intent analysis and RAG search

**Parallel Execution Pattern**:

```
Planner → Decompose into functional units
  ├─ langgraph-master-plugin:langgraph-engineer 1: Intent analysis module (parallel)
  │   └─ analyze + classify + route nodes + conditional edges
  └─ langgraph-master-plugin:langgraph-engineer 2: RAG search module (parallel)
      └─ retrieve + rerank + generate nodes + state management
Orchestrator → Integrate modules to assemble graph
```

### Usage Method

1. **Decompose into functional modules**
   - Decompose large LangGraph applications into functional units
   - Verify that each module can be implemented and tested independently
   - Verify that module size is appropriate (about 2-5 nodes)

2. **Implement common parts first** (see the state sketch after this list)
   - State used across the entire graph
   - Common tool definitions and common nodes used throughout

3. **Parallel Execution**
   - Assign one functional module implementation to each langgraph-master-plugin:langgraph-engineer agent and execute in parallel
   - Implement independent functional modules simultaneously

4. **Integration**
   - Incorporate completed modules into the graph
   - Verify operation through integration testing

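To make step 2 concrete, the following is a minimal sketch of a shared state that every module builds on. It is an illustrative assumption rather than part of the plugin: the field names (`intent`, `documents`) stand in for whatever your application actually shares between modules.

```python
from typing import Annotated, TypedDict

from langchain_core.messages import AnyMessage
from langgraph.graph.message import add_messages


class AppState(TypedDict):
    """Shared state implemented before modules are parallelized (hypothetical fields)."""

    # Conversation history shared by every module
    messages: Annotated[list[AnyMessage], add_messages]
    # Written by the intent analysis module, read by routing
    intent: str
    # Written by the RAG module, read by the answer generator
    documents: list[str]
```

Each langgraph-engineer task then receives this state definition, so independently developed modules agree on the same contract.
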
### Testing Method

- Perform unit testing for each functional module
- Verify end-to-end operation after integration; an API key is usually available in `.env`, so load it and run at least one known-good test case (see the sketch after this list)
- If the known-good case fails, review the code, but also narrow down the likely location, add targeted logging to identify the cause, and only then apply a fix

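A minimal sketch of that smoke test, assuming `python-dotenv` for loading `.env` and a hypothetical `app/graph.py` module that exposes the compiled graph:

```python
from dotenv import load_dotenv

from app.graph import graph  # hypothetical module exposing the compiled graph

# Load the API key (e.g. ANTHROPIC_API_KEY) from .env before invoking the graph
load_dotenv()

config = {"configurable": {"thread_id": "smoke-test-1"}}
result = graph.invoke(
    {"messages": [{"role": "user", "content": "What is LangGraph?"}]},
    config,
)

# A known-good case should at least produce a non-empty assistant reply
assert result["messages"][-1].content
print(result["messages"][-1].content)
```
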
### Functional Module Examples

**Appropriate Size (langgraph-master-plugin:langgraph-engineer scope)** (see the subgraph sketch after these lists):

- RAG search functionality: retrieve + rerank + generate (3 nodes)
- Intent analysis: analyze + classify + route (2-3 nodes)
- Approval workflow: propose + wait_approval + execute (3 nodes)
- Tool integration: tool_call + execute + process + error_handling (3-4 nodes)

**Too Small (individual implementation is sufficient)**:

- Single node only
- Single edge only
- State field definition only

**Too Large (further decomposition needed)**:

- Complete chatbot application
- Entire system containing multiple independent functions

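For scale, here is a minimal sketch of an appropriately sized module: the RAG search unit from the first list, built as a self-contained subgraph. The node bodies are hypothetical placeholders; only the shape (three nodes plus a state contract) is the point.

```python
from typing import TypedDict

from langgraph.graph import StateGraph, START, END


class RagState(TypedDict):
    """State contract for the RAG search module (illustrative)."""
    query: str
    documents: list[str]
    answer: str


def retrieve(state: RagState) -> dict:
    # Placeholder: in practice, query a vector store here
    return {"documents": [f"doc about {state['query']}"]}


def rerank(state: RagState) -> dict:
    # Placeholder: in practice, reorder documents by relevance
    return {"documents": state["documents"]}


def generate(state: RagState) -> dict:
    # Placeholder: in practice, call an LLM with the documents as context
    return {"answer": f"Answer based on {len(state['documents'])} documents"}


builder = StateGraph(RagState)
builder.add_node("retrieve", retrieve)
builder.add_node("rerank", rerank)
builder.add_node("generate", generate)
builder.add_edge(START, "retrieve")
builder.add_edge("retrieve", "rerank")
builder.add_edge("rerank", "generate")
builder.add_edge("generate", END)

rag_module = builder.compile()  # can be embedded as a subgraph node in the parent graph
```

A module of this shape is large enough to test on its own and small enough for a single subagent to own.
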
### Notes

- **Appropriate Scope Setting**: Verify that each task is limited to one functional module
- **Functional Independence**: Minimize dependencies between modules
- **Interface Design**: Clearly document state contracts between modules
- **Integration Plan**: Plan the integration method after module implementation in advance

## 🔗 Reference Links

- [LangGraph Official Documentation](https://docs.langchain.com/oss/python/langgraph/overview)
- [LangGraph GitHub](https://github.com/langchain-ai/langgraph)
117
skills/langgraph-master/example_basic_chatbot.md
Normal file
117
skills/langgraph-master/example_basic_chatbot.md
Normal file
@@ -0,0 +1,117 @@
# Basic Chatbot

Implementation example of a basic chatbot using LangGraph.

## Complete Code

```python
from typing import Annotated

from langgraph.graph import StateGraph, START, END, MessagesState
from langgraph.graph.message import add_messages
from langgraph.checkpoint.memory import MemorySaver
from langchain_anthropic import ChatAnthropic

# 1. Initialize LLM
llm = ChatAnthropic(model="claude-sonnet-4-5-20250929")

# 2. Define node
def chatbot_node(state: MessagesState):
    """Chatbot node"""
    response = llm.invoke(state["messages"])
    return {"messages": [response]}

# 3. Build graph
builder = StateGraph(MessagesState)
builder.add_node("chatbot", chatbot_node)
builder.add_edge(START, "chatbot")
builder.add_edge("chatbot", END)

# 4. Compile with checkpointer
checkpointer = MemorySaver()
graph = builder.compile(checkpointer=checkpointer)

# 5. Execute
config = {"configurable": {"thread_id": "conversation-1"}}

while True:
    user_input = input("User: ")
    if user_input.lower() in ["quit", "exit", "q"]:
        break

    # Send message
    for chunk in graph.stream(
        {"messages": [{"role": "user", "content": user_input}]},
        config,
        stream_mode="values"
    ):
        chunk["messages"][-1].pretty_print()
```

## Explanation

### 1. MessagesState

```python
# The prebuilt state: from langgraph.graph import MessagesState
# It is equivalent to:
from typing import Annotated, TypedDict

from langchain_core.messages import AnyMessage
from langgraph.graph.message import add_messages


class MessagesState(TypedDict):
    messages: Annotated[list[AnyMessage], add_messages]
```

- `messages`: List of messages
- `add_messages`: Reducer that appends new messages to the existing list

### 2. Checkpointer

```python
from langgraph.checkpoint.memory import MemorySaver

checkpointer = MemorySaver()
graph = builder.compile(checkpointer=checkpointer)
```

- Saves conversation state
- Continues the conversation when the same `thread_id` is reused (see the sketch after this list)

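A minimal sketch of what that persistence gives you, assuming the `graph` and `config` defined in the complete code above: calls that reuse the same `thread_id` share history, while a new `thread_id` starts fresh.

```python
config = {"configurable": {"thread_id": "conversation-1"}}

# First turn
graph.invoke({"messages": [{"role": "user", "content": "My name is Hiroshi."}]}, config)

# Second turn on the same thread: the checkpointer restores the earlier messages,
# so the model can answer using the name given above
result = graph.invoke({"messages": [{"role": "user", "content": "What is my name?"}]}, config)
result["messages"][-1].pretty_print()

# A different thread_id starts an unrelated conversation with empty history
other = {"configurable": {"thread_id": "conversation-2"}}
graph.invoke({"messages": [{"role": "user", "content": "What is my name?"}]}, other)
```
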
### 3. Streaming

```python
for chunk in graph.stream(input, config, stream_mode="values"):
    chunk["messages"][-1].pretty_print()
```

- `stream_mode="values"`: Emits the complete state after each step (an alternative mode is sketched after this list)
- `pretty_print()`: Displays messages in a readable format

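For comparison, a sketch of `stream_mode="updates"`, which yields only the state delta each node returns instead of the full state; it assumes the same `graph` and `config` as above.

```python
for chunk in graph.stream(
    {"messages": [{"role": "user", "content": "Hello"}]},
    config,
    stream_mode="updates",
):
    # Each chunk maps a node name to the partial state update it returned,
    # e.g. {"chatbot": {"messages": [AIMessage(...)]}}
    for node_name, update in chunk.items():
        print(node_name, update["messages"][-1].content)
```
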
## Extension Examples

### Adding System Message

```python
def chatbot_with_system(state: MessagesState):
    """With system message"""
    system_msg = {
        "role": "system",
        "content": "You are a helpful assistant."
    }

    response = llm.invoke([system_msg] + state["messages"])
    return {"messages": [response]}
```

### Limiting Message History

```python
def chatbot_with_limit(state: MessagesState):
    """Use only the latest 10 messages"""
    recent_messages = state["messages"][-10:]
    response = llm.invoke(recent_messages)
    return {"messages": [response]}
```

## Related Pages

- [01_core_concepts_overview.md](01_core_concepts_overview.md) - Understanding fundamental concepts
- [03_memory_management_overview.md](03_memory_management_overview.md) - Checkpointer details
- [example_rag_agent.md](example_rag_agent.md) - More advanced example
169
skills/langgraph-master/example_rag_agent.md
Normal file
169
skills/langgraph-master/example_rag_agent.md
Normal file
@@ -0,0 +1,169 @@
# RAG Agent

Implementation example of a RAG (Retrieval-Augmented Generation) agent with search functionality.

## Complete Code

```python
from typing import Annotated, Literal

from langgraph.graph import StateGraph, START, END, MessagesState
from langgraph.prebuilt import ToolNode
from langgraph.checkpoint.memory import MemorySaver
from langchain_anthropic import ChatAnthropic
from langchain_core.tools import tool

# 1. Define tool
@tool
def retrieve_documents(query: str) -> str:
    """Retrieve relevant documents.

    Args:
        query: Search query
    """
    # In practice, search with vector store, etc.
    # Using dummy data here
    docs = [
        "LangGraph is an agent framework.",
        "StateGraph manages state.",
        "You can extend agents with tools."
    ]

    return "\n".join(docs)

tools = [retrieve_documents]

# 2. Bind tools to LLM
llm = ChatAnthropic(model="claude-sonnet-4-5-20250929")
llm_with_tools = llm.bind_tools(tools)

# 3. Define nodes
def agent_node(state: MessagesState):
    """Agent node"""
    response = llm_with_tools.invoke(state["messages"])
    return {"messages": [response]}

def should_continue(state: MessagesState) -> Literal["tools", "end"]:
    """Determine tool usage"""
    last_message = state["messages"][-1]

    if last_message.tool_calls:
        return "tools"
    return "end"

# 4. Build graph
builder = StateGraph(MessagesState)

builder.add_node("agent", agent_node)
builder.add_node("tools", ToolNode(tools))

builder.add_edge(START, "agent")
builder.add_conditional_edges(
    "agent",
    should_continue,
    {
        "tools": "tools",
        "end": END
    }
)
builder.add_edge("tools", "agent")

# 5. Compile
checkpointer = MemorySaver()
graph = builder.compile(checkpointer=checkpointer)

# 6. Execute
config = {"configurable": {"thread_id": "rag-session-1"}}

query = "What is LangGraph?"

for chunk in graph.stream(
    {"messages": [{"role": "user", "content": query}]},
    config,
    stream_mode="values"
):
    chunk["messages"][-1].pretty_print()
```

## Execution Flow

```
User Query: "What is LangGraph?"
        ↓
   [Agent Node]
        ↓
LLM: "I'll search for information" + ToolCall(retrieve_documents)
        ↓
   [Tool Node] ← Execute search
        ↓
ToolMessage: "LangGraph is an agent framework..."
        ↓
   [Agent Node] ← Use search results
        ↓
LLM: "LangGraph is a framework for building agents..."
        ↓
       END
```

## Extension Examples

### Multiple Search Tools

```python
@tool
def web_search(query: str) -> str:
    """Search the web"""
    # search_web is a placeholder for your web search client
    return search_web(query)

@tool
def database_search(query: str) -> str:
    """Search database"""
    # search_database is a placeholder for your database query helper
    return search_database(query)

tools = [retrieve_documents, web_search, database_search]
```

### Vector Search Implementation

```python
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

# Initialize vector store
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_texts(
    ["LangGraph is an agent framework.", ...],
    embeddings
)

@tool
def semantic_search(query: str) -> str:
    """Perform semantic search"""
    docs = vectorstore.similarity_search(query, k=3)
    return "\n".join([doc.page_content for doc in docs])
```

### Adding Human-in-the-Loop

```python
from langgraph.types import interrupt

@tool
def sensitive_search(query: str) -> str:
    """Search sensitive information (requires approval)"""
    approved = interrupt({
        "action": "sensitive_search",
        "query": query,
        "message": "Approve this sensitive search?"
    })

    if approved:
        # perform_sensitive_search is a placeholder for the actual privileged search
        return perform_sensitive_search(query)
    else:
        return "Search cancelled by user"
```

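When the tool reaches `interrupt`, the run pauses and the payload is surfaced to the caller. Resuming requires replying on the same thread; a minimal sketch, assuming the compiled `graph` and `config` from the complete code above:

```python
from langgraph.types import Command

# Resume the paused run on the same thread; the value passed to resume
# becomes the return value of interrupt() inside sensitive_search
result = graph.invoke(Command(resume=True), config)
result["messages"][-1].pretty_print()
```
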
## Related Pages

- [02_graph_architecture_agent.md](02_graph_architecture_agent.md) - Agent pattern
- [04_tool_integration_overview.md](04_tool_integration_overview.md) - Tool details
- [example_basic_chatbot.md](example_basic_chatbot.md) - Basic chatbot