---
name: arch-analysis
description: Analyze LangGraph application architecture, identify bottlenecks, and propose multiple improvement strategies
---

# LangGraph Architecture Analysis Skill

A skill for analyzing LangGraph application architecture, identifying bottlenecks, and proposing multiple improvement strategies.

## 📋 Overview

This skill analyzes existing LangGraph applications and proposes graph structure improvements:

1. **Current State Analysis**: Performance measurement and graph structure understanding
2. **Problem Identification**: Organizing bottlenecks and architectural issues
3. **Improvement Proposals**: Generate 3-5 diverse improvement proposals (**all candidates for parallel exploration**)

**Important**:

- This skill only performs analysis and proposals. It does not implement changes.
- **Output all improvement proposals**. The arch-tune command will implement and evaluate them in parallel.

## 🎯 When to Use

Use this skill in the following situations:

1. **When an existing application needs performance improvement**
   - Latency exceeds targets
   - Cost is too high
   - Accuracy is insufficient
2. **When considering architecture-level improvements**
   - Prompt optimization (fine-tune) has reached its limits
   - Graph structure changes are needed
   - Considering the introduction of new patterns
3. **When you want to compare multiple improvement options**
   - Unclear which architecture is optimal
   - Want to understand trade-offs

## 📖 Analysis and Proposal Workflow

### Step 1: Verify Evaluation Environment

**Purpose**: Prepare for performance measurement

**Actions**:

1. Verify the existence of an evaluation program (`.langgraph-master/evaluation/` or a specified directory)
2. If not present, confirm evaluation criteria with the user and create one
3. Verify test cases

**Output**: Evaluation program ready

### Step 2: Measure Current Performance

**Purpose**: Establish a baseline

**Actions**:

1. Run test cases 3-5 times
2. Record each metric (accuracy, latency, cost, etc.)
3. Calculate statistics (mean, standard deviation, min, max)
4. Save as the baseline

**Output**: `baseline_performance.json`

### Step 3: Analyze Graph Structure

**Purpose**: Understand the current architecture

**Actions**:

1. **Identify graph definitions with Serena MCP**
   - Search for StateGraph, MessageGraph with `find_symbol`
   - Identify graph definition files (typically `graph.py`, `main.py`, etc.)
2. **Analyze node and edge structure**
   - List node functions with `get_symbols_overview`
   - Verify edge types (sequential, parallel, conditional)
   - Check for subgraphs
3. **Understand each node's role**
   - Read node functions
   - Verify the presence of LLM calls
   - Summarize processing content

**Output**: Graph structure documentation

### Step 4: Identify Bottlenecks

**Purpose**: Identify performance problem areas

**Actions**:

1. **Latency Bottlenecks** (see the profiling sketch after this step)
   - Identify the nodes with the longest execution time
   - Verify delays from sequential processing
   - Discover unnecessary processing
2. **Cost Issues**
   - Identify high-cost nodes
   - Verify unnecessary LLM calls
   - Evaluate model selection optimality
3. **Accuracy Issues**
   - Identify nodes with frequent errors
   - Verify errors due to insufficient information
   - Discover architecture constraints

**Output**: List of issues
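Where a quick approximation is enough, per-node latency can be estimated from the graph's streamed updates rather than from full tracing. Below is a minimal sketch, assuming a compiled LangGraph `graph` and a hypothetical `test_input` dict: with `stream_mode="updates"`, each yielded chunk corresponds to a completed node, so wall-clock deltas between chunks approximate node durations in sequential graphs.

```python
import time

def profile_node_latency(graph, test_input: dict) -> dict[str, float]:
    """Approximate per-node latency from streamed state updates.

    Best suited to sequential graphs: in stream_mode="updates" each
    yielded chunk maps a completed node's name to its state delta, so
    the wall-clock gap between chunks approximates that node's runtime.
    """
    timings: dict[str, float] = {}
    last = time.perf_counter()
    for chunk in graph.stream(test_input, stream_mode="updates"):
        now = time.perf_counter()
        for node_name in chunk:  # one completed node per chunk in the sequential case
            timings[node_name] = timings.get(node_name, 0.0) + (now - last)
        last = now
    return timings

# Usage: surface the slowest nodes first.
# for name, secs in sorted(profile_node_latency(graph, {"question": "..."}).items(),
#                          key=lambda kv: -kv[1]):
#     print(f"{name}: {secs:.2f}s")
```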
### Step 5: Consider Architecture Patterns

**Purpose**: Identify applicable LangGraph patterns

**Actions**:

1. **Consider patterns based on the problems found**
   - Latency issues → Parallelization
   - Diverse use cases → Routing
   - Complex processing → Subgraph
   - Staged processing → Prompt Chaining, Map-Reduce
2. **Reference the langgraph-master skill**
   - Verify the characteristics of each pattern
   - Evaluate application conditions
   - Reference implementation examples

**Output**: List of applicable patterns

### Step 6: Generate Improvement Proposals

**Purpose**: Create 3-5 diverse improvement proposals (all candidates for parallel exploration)

**Actions**:

1. **Create improvement proposals based on each pattern**
   - Change details (which nodes/edges to modify)
   - Expected effects (impact on accuracy, latency, cost)
   - Implementation complexity (low/medium/high)
   - Estimated implementation time
2. **Evaluate improvement proposals**
   - Feasibility
   - Risk assessment
   - Expected ROI

**Important**: Output all improvement proposals. The arch-tune command will **implement and evaluate all proposals in parallel**.

**Output**: Improvement proposal document (including all proposals)

### Step 7: Create Report

**Purpose**: Organize analysis results and proposals

**Actions**:

1. Summarize the current state analysis
2. Organize the issues
3. **Document all improvement proposals in `improvement_proposals.md`** (with priorities)
4. Present recommendations for reference (first recommendation, second recommendation, reference)

**Important**: Output all proposals to `improvement_proposals.md`. The arch-tune command will read them and implement/evaluate them in parallel.

**Output**:

- `analysis_report.md` - Current state analysis and issues
- `improvement_proposals.md` - **All improvement proposals** (Proposal 1, 2, 3, ...)

## 📊 Output Formats

### baseline_performance.json

```json
{
  "iterations": 5,
  "test_cases": 20,
  "metrics": {
    "accuracy": { "mean": 75.0, "std": 3.2, "min": 70.0, "max": 80.0 },
    "latency": { "mean": 3.5, "std": 0.4, "min": 3.1, "max": 4.2 },
    "cost": { "mean": 0.020, "std": 0.002, "min": 0.018, "max": 0.023 }
  }
}
```
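One way to produce this file is with a small measurement harness. Below is a minimal sketch, assuming a hypothetical `run_test_case(case)` that executes the graph once and returns `accuracy`, `latency`, and `cost` for that run:

```python
import statistics

def summarize(values: list[float]) -> dict[str, float]:
    """Mean / std / min / max for one metric across all runs."""
    return {
        "mean": round(statistics.mean(values), 3),
        "std": round(statistics.stdev(values), 3) if len(values) > 1 else 0.0,
        "min": round(min(values), 3),
        "max": round(max(values), 3),
    }

def measure_baseline(test_cases: list, run_test_case, iterations: int = 5) -> dict:
    """Run every test case `iterations` times and aggregate the metrics."""
    samples: dict[str, list[float]] = {"accuracy": [], "latency": [], "cost": []}
    for _ in range(iterations):
        for case in test_cases:
            result = run_test_case(case)  # hypothetical: returns a per-run metrics dict
            for metric in samples:
                samples[metric].append(result[metric])
    return {
        "iterations": iterations,
        "test_cases": len(test_cases),
        "metrics": {metric: summarize(vals) for metric, vals in samples.items()},
    }

# Usage:
#   import json
#   baseline = measure_baseline(test_cases, run_test_case)
#   with open("baseline_performance.json", "w") as f:
#       json.dump(baseline, f, indent=2)
```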
### analysis_report.md

```markdown
# Architecture Analysis Report

Execution Date: 2024-11-24 10:00:00

## Current Performance

| Metric | Mean | Std Dev | Target | Gap |
|--------|------|---------|--------|-----|
| Accuracy | 75.0% | 3.2% | 90.0% | -15.0% |
| Latency | 3.5s | 0.4s | 2.0s | +1.5s |
| Cost | $0.020 | $0.002 | $0.010 | +$0.010 |

## Graph Structure

### Current Configuration

\```
analyze_intent → retrieve_docs → generate_response
\```

- **Node Count**: 3
- **Edge Type**: Sequential only
- **Parallel Processing**: None
- **Conditional Branching**: None

### Node Details

#### analyze_intent

- **Role**: Classify user input intent
- **LLM**: Claude 3.5 Sonnet
- **Average Execution Time**: 0.5s

#### retrieve_docs

- **Role**: Search related documents
- **Processing**: Vector DB query (top-k, no reranking)
- **Average Execution Time**: 1.5s

#### generate_response

- **Role**: Generate final response
- **LLM**: Claude 3.5 Sonnet
- **Average Execution Time**: 1.5s

## Issues

### 1. Latency Bottleneck from Sequential Processing

- **Issue**: analyze_intent and retrieve_docs run sequentially
- **Impact**: 2.0s of delay (57% of the total)
- **Improvement Potential**: -0.8s or more through parallelization

### 2. All Requests Follow the Same Flow

- **Issue**: Simple and complex questions go through the same processing
- **Impact**: Unnecessary retrieve_docs executions (wasted cost and latency)
- **Improvement Potential**: -50% for simple cases through routing

### 3. Use of Low-Relevance Documents

- **Issue**: retrieve_docs returns only the top-k results (no reranking)
- **Impact**: Low accuracy (75%)
- **Improvement Potential**: +10-15% through multi-stage RAG

## Applicable Architecture Patterns

1. **Parallelization** - Parallelize analyze_intent and retrieve_docs
2. **Routing** - Branch the processing flow based on intent
3. **Subgraph** - Dedicated subgraph for RAG processing (retrieve → rerank → select)
4. **Orchestrator-Worker** - Execute multiple retrievers in parallel and integrate results
```

### improvement_proposals.md

```markdown
# Architecture Improvement Proposals

Proposal Date: 2024-11-24 10:30:00

## Proposal 1: Parallel Document Retrieval + Intent Analysis

### Changes

**Current**:

\```
analyze_intent → retrieve_docs → generate_response
\```

**After Change**:

\```
START → [analyze_intent, retrieve_docs] → generate_response
          ↓ parallel execution ↓
\```

### Implementation Details

1. Add parallel edges to the StateGraph
2. Add a join point that waits for both results
3. generate_response receives both results

### Expected Effects

| Metric | Current | Expected | Change | Change Rate |
|--------|---------|----------|--------|-------------|
| Accuracy | 75.0% | 75.0% | ±0 | - |
| Latency | 3.5s | 2.7s | -0.8s | -23% |
| Cost | $0.020 | $0.020 | ±0 | - |

### Implementation Complexity

- **Level**: Low
- **Estimated Time**: 1-2 hours
- **Risk**: Low (no changes to existing nodes required)

### Recommendation Level

⭐⭐⭐⭐ (High) - Effective for latency improvement with low risk

---

## Proposal 2: Intent-Based Routing

### Changes

**Current**:

\```
analyze_intent → retrieve_docs → generate_response
\```

**After Change**:

\```
analyze_intent
  ├─ simple_intent → simple_response (lightweight)
  └─ complex_intent → retrieve_docs → generate_response
\```

### Implementation Details

1. Branch conditionally on the analyze_intent output
2. Create a new simple_response node (using Haiku)
3. Route with conditional_edges

### Expected Effects

| Metric | Current | Expected | Change | Change Rate |
|--------|---------|----------|--------|-------------|
| Accuracy | 75.0% | 82.0% | +7.0% | +9% |
| Latency | 3.5s | 2.8s | -0.7s | -20% |
| Cost | $0.020 | $0.014 | -$0.006 | -30% |

**Assumption**: 40% simple cases, 60% complex cases

### Implementation Complexity

- **Level**: Medium
- **Estimated Time**: 2-3 hours
- **Risk**: Medium (adds routing logic)

### Recommendation Level

⭐⭐⭐⭐⭐ (Highest) - Balanced improvement across all metrics

---

## Proposal 3: Multi-Stage RAG with Reranking Subgraph

### Changes

**Current**:

\```
analyze_intent → retrieve_docs → generate_response
\```

**After Change**:

\```
analyze_intent → [RAG Subgraph] → generate_response
                   ↓ retrieve (k=20)
                   ↓ rerank (top-5)
                   ↓ select (best context)
\```

### Implementation Details

1. Convert the RAG processing into a dedicated subgraph
2. Retrieve more candidates in the retrieve node (k=20)
3. Evaluate relevance in the rerank node (Cross-Encoder)
4. Select the optimal context in the select node

### Expected Effects

| Metric | Current | Expected | Change | Change Rate |
|--------|---------|----------|--------|-------------|
| Accuracy | 75.0% | 88.0% | +13.0% | +17% |
| Latency | 3.5s | 3.8s | +0.3s | +9% |
| Cost | $0.020 | $0.022 | +$0.002 | +10% |

### Implementation Complexity

- **Level**: Medium-High
- **Estimated Time**: 3-4 hours
- **Risk**: Medium (introduces a new model and subgraph management)

### Recommendation Level

⭐⭐⭐ (Medium) - Effective when accuracy is the priority; latency will degrade

---

## Recommendations

**Note**: The following recommendations are for reference only. The arch-tune command will **implement and evaluate all of the proposals above in parallel** and select the best option based on actual results.

### 🥇 First Recommendation: Proposal 2 (Intent-Based Routing)

**Reasons**:

- Balanced improvement across all metrics
- Implementation complexity is manageable at the medium level
- High ROI (effect vs. cost)

**Next Steps**:

1. Run parallel exploration with the arch-tune command
2. Implement and evaluate Proposals 1, 2, and 3 simultaneously
3. Select the best option based on actual results

### 🥈 Second Recommendation: Proposal 1 (Parallel Retrieval)

**Reasons**:

- Simple implementation with low risk
- Reliable latency improvement
- Can be combined with Proposal 2

### 📝 Reference: Proposal 3 (Multi-Stage RAG)

**Reasons**:

- Effective when accuracy matters most
- Only when the latency trade-off is acceptable
```
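## 🧩 Reference Sketch: Intent-Based Routing (Proposal 2)

For implementers, here is a minimal sketch of Proposal 2's routing using LangGraph's `add_conditional_edges`. The state schema and node bodies are hypothetical stubs standing in for the real LLM and retrieval calls:

```python
from typing import Literal, TypedDict
from langgraph.graph import StateGraph, START, END

class State(TypedDict):
    question: str
    intent: str
    docs: list[str]
    answer: str

# Hypothetical stubs; real nodes would call the LLM / vector DB.
def analyze_intent(state: State) -> dict:
    return {"intent": "simple" if len(state["question"]) < 40 else "complex"}

def simple_response(state: State) -> dict:
    return {"answer": "quick answer"}  # e.g. a lightweight Haiku call

def retrieve_docs(state: State) -> dict:
    return {"docs": ["doc1", "doc2"]}

def generate_response(state: State) -> dict:
    return {"answer": f"answer grounded in {len(state['docs'])} docs"}

def route_by_intent(state: State) -> Literal["simple", "complex"]:
    """Return a branch label; the path map below picks the next node."""
    return "simple" if state["intent"] == "simple" else "complex"

builder = StateGraph(State)
builder.add_node("analyze_intent", analyze_intent)
builder.add_node("simple_response", simple_response)
builder.add_node("retrieve_docs", retrieve_docs)
builder.add_node("generate_response", generate_response)

builder.add_edge(START, "analyze_intent")
builder.add_conditional_edges(
    "analyze_intent",
    route_by_intent,
    {"simple": "simple_response", "complex": "retrieve_docs"},
)
builder.add_edge("retrieve_docs", "generate_response")
builder.add_edge("simple_response", END)
builder.add_edge("generate_response", END)

graph = builder.compile()
# graph.invoke({"question": "What is LangGraph?"})
```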
## 🔧 Tools and Technologies Used

### MCP Server Usage

- **Serena MCP**: Codebase analysis
  - `find_symbol`: Search graph definitions
  - `get_symbols_overview`: Understand node structure
  - `search_for_pattern`: Search for specific patterns

### Reference Skills

- **langgraph-master skill**: Architecture pattern reference

### Evaluation Program

- User-provided or auto-generated
- Metrics: accuracy, latency, cost, etc.

## ⚠️ Important Notes

1. **Analysis Only**
   - This skill does not implement changes
   - It only outputs analysis and proposals
2. **Evaluation Environment**
   - An evaluation program is required
   - It will be created if not present
3. **Serena MCP**
   - If Serena is unavailable, fall back to manual code analysis
   - Use the ls and read tools

## 🔗 Related Resources

- [langgraph-master skill](../langgraph-master/SKILL.md) - Architecture patterns
- [arch-tune command](../../commands/arch-tune.md) - Command that uses this skill
- [fine-tune skill](../fine-tune/SKILL.md) - Prompt optimization
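## 🧩 Reference Sketch: Parallel Fan-Out (Proposal 1)

For completeness, a minimal sketch of Proposal 1's fan-out/fan-in wiring. Node bodies are hypothetical stubs, and the list form of `add_edge` defers `generate_response` until both branches have finished:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class State(TypedDict):
    question: str
    intent: str
    docs: list[str]
    answer: str

# Hypothetical stubs; real nodes would call the LLM / vector DB.
def analyze_intent(state: State) -> dict:
    return {"intent": "complex"}

def retrieve_docs(state: State) -> dict:
    return {"docs": ["doc1", "doc2"]}

def generate_response(state: State) -> dict:
    return {"answer": f"({state['intent']}) answer from {len(state['docs'])} docs"}

builder = StateGraph(State)
builder.add_node("analyze_intent", analyze_intent)
builder.add_node("retrieve_docs", retrieve_docs)
builder.add_node("generate_response", generate_response)

# Fan out: both nodes leave START, so they run in the same superstep.
builder.add_edge(START, "analyze_intent")
builder.add_edge(START, "retrieve_docs")
# Fan in: the list form waits for both nodes before generate_response runs.
builder.add_edge(["analyze_intent", "retrieve_docs"], "generate_response")
builder.add_edge("generate_response", END)

graph = builder.compile()
# graph.invoke({"question": "How do I deploy LangGraph?"})
```

Because the two parallel branches write disjoint state keys (`intent` vs. `docs`), no custom reducer is needed for the join.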