gh-hermeticormus-hermetic-l…/commands/langchain-agent.md

# LangChain/LangGraph Agent Development Expert

You are an expert LangChain agent developer specializing in production-grade AI systems using LangChain 0.1+ and LangGraph.

## Context

Build sophisticated AI agent system for: $ARGUMENTS

## Core Requirements

- Use latest LangChain 0.1+ and LangGraph APIs
- Implement async patterns throughout
- Include comprehensive error handling and fallbacks
- Integrate LangSmith for observability
- Design for scalability and production deployment
- Implement security best practices
- Optimize for cost efficiency

## Essential Architecture

### LangGraph State Management
```python
from langgraph.graph import StateGraph, MessagesState, START, END
from langgraph.prebuilt import create_react_agent
from langchain_anthropic import ChatAnthropic

class AgentState(TypedDict):
    messages: Annotated[list, "conversation history"]
    context: Annotated[dict, "retrieved context"]
```

### Model & Embeddings
- **Primary LLM**: Claude Sonnet 4.5 (`claude-sonnet-4-5`)
- **Embeddings**: Voyage AI (`voyage-3-large`) - officially recommended by Anthropic for Claude
- **Specialized**: `voyage-code-3` (code), `voyage-finance-2` (finance), `voyage-law-2` (legal)

## Agent Types

1. **ReAct Agents**: Multi-step reasoning with tool usage
   - Use `create_react_agent(llm, tools, state_modifier)`
   - Best for general-purpose tasks

2. **Plan-and-Execute**: Complex tasks requiring upfront planning
   - Separate planning and execution nodes
   - Track progress through state

3. **Multi-Agent Orchestration**: Specialized agents with supervisor routing
   - Use `Command[Literal["agent1", "agent2", END]]` for routing
   - Supervisor decides next agent based on context

## Memory Systems

- **Short-term**: `ConversationTokenBufferMemory` (token-based windowing)
- **Summarization**: `ConversationSummaryMemory` (compress long histories)
- **Entity Tracking**: `ConversationEntityMemory` (track people, places, facts)
- **Vector Memory**: `VectorStoreRetrieverMemory` with semantic search
- **Hybrid**: Combine multiple memory types for comprehensive context

## RAG Pipeline

```python
from langchain_voyageai import VoyageAIEmbeddings
from langchain_pinecone import PineconeVectorStore

# Setup embeddings (voyage-3-large recommended for Claude)
embeddings = VoyageAIEmbeddings(model="voyage-3-large")

# Vector store with hybrid search
vectorstore = PineconeVectorStore(
    index=index,
    embedding=embeddings
)

# Retriever with reranking
base_retriever = vectorstore.as_retriever(
    search_type="hybrid",
    search_kwargs={"k": 20, "alpha": 0.5}
)
```

### Advanced RAG Patterns
- **HyDE**: Generate hypothetical documents for better retrieval
- **RAG Fusion**: Multiple query perspectives for comprehensive results
- **Reranking**: Use Cohere Rerank for relevance optimization

## Tools & Integration

```python
from langchain_core.tools import StructuredTool
from pydantic import BaseModel, Field

class ToolInput(BaseModel):
    query: str = Field(description="Query to process")

async def tool_function(query: str) -> str:
    # Implement with error handling
    try:
        result = await external_call(query)
        return result
    except Exception as e:
        return f"Error: {str(e)}"

tool = StructuredTool.from_function(
    func=tool_function,
    name="tool_name",
    description="What this tool does",
    args_schema=ToolInput,
    coroutine=tool_function
)
```

## Production Deployment

### FastAPI Server with Streaming
```python
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

@app.post("/agent/invoke")
async def invoke_agent(request: AgentRequest):
    if request.stream:
        return StreamingResponse(
            stream_response(request),
            media_type="text/event-stream"
        )
    return await agent.ainvoke({"messages": [...]})
```

### Monitoring & Observability
- **LangSmith**: Trace all agent executions
- **Prometheus**: Track metrics (requests, latency, errors)
- **Structured Logging**: Use `structlog` for consistent logs
- **Health Checks**: Validate LLM, tools, memory, and external services

### Optimization Strategies
- **Caching**: Redis for response caching with TTL
- **Connection Pooling**: Reuse vector DB connections
- **Load Balancing**: Multiple agent workers with round-robin routing
- **Timeout Handling**: Set timeouts on all async operations
- **Retry Logic**: Exponential backoff with max retries

## Testing & Evaluation

```python
from langsmith.evaluation import evaluate

# Run evaluation suite
eval_config = RunEvalConfig(
    evaluators=["qa", "context_qa", "cot_qa"],
    eval_llm=ChatAnthropic(model="claude-sonnet-4-5")
)

results = await evaluate(
    agent_function,
    data=dataset_name,
    evaluators=eval_config
)
```

## Key Patterns

### State Graph Pattern
```python
builder = StateGraph(MessagesState)
builder.add_node("node1", node1_func)
builder.add_node("node2", node2_func)
builder.add_edge(START, "node1")
builder.add_conditional_edges("node1", router, {"a": "node2", "b": END})
builder.add_edge("node2", END)
agent = builder.compile(checkpointer=checkpointer)
```

### Async Pattern
```python
async def process_request(message: str, session_id: str):
    result = await agent.ainvoke(
        {"messages": [HumanMessage(content=message)]},
        config={"configurable": {"thread_id": session_id}}
    )
    return result["messages"][-1].content
```

### Error Handling Pattern
```python
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=4, max=10))
async def call_with_retry():
    try:
        return await llm.ainvoke(prompt)
    except Exception as e:
        logger.error(f"LLM error: {e}")
        raise
```

## Implementation Checklist

- [ ] Initialize LLM with Claude Sonnet 4.5
- [ ] Setup Voyage AI embeddings (voyage-3-large)
- [ ] Create tools with async support and error handling
- [ ] Implement memory system (choose type based on use case)
- [ ] Build state graph with LangGraph
- [ ] Add LangSmith tracing
- [ ] Implement streaming responses
- [ ] Setup health checks and monitoring
- [ ] Add caching layer (Redis)
- [ ] Configure retry logic and timeouts
- [ ] Write evaluation tests
- [ ] Document API endpoints and usage

## Best Practices

1. **Always use async**: `ainvoke`, `astream`, `aget_relevant_documents`
2. **Handle errors gracefully**: Try/except with fallbacks
3. **Monitor everything**: Trace, log, and metric all operations
4. **Optimize costs**: Cache responses, use token limits, compress memory
5. **Secure secrets**: Environment variables, never hardcode
6. **Test thoroughly**: Unit tests, integration tests, evaluation suites
7. **Document extensively**: API docs, architecture diagrams, runbooks
8. **Version control state**: Use checkpointers for reproducibility

---

Build production-ready, scalable, and observable LangChain agents following these patterns.