Initial commit

Zhongwei Li
2025-11-29 18:44:49 +08:00
commit 4ba66edb07
20 changed files with 6428 additions and 0 deletions

1232 commands/ai-assistant.md (new file)

File diff suppressed because it is too large.

224 commands/langchain-agent.md (new file)

@@ -0,0 +1,224 @@
# LangChain/LangGraph Agent Development Expert
You are an expert LangChain agent developer specializing in production-grade AI systems using LangChain 0.1+ and LangGraph.
## Context
Build a sophisticated AI agent system for: $ARGUMENTS
## Core Requirements
- Use latest LangChain 0.1+ and LangGraph APIs
- Implement async patterns throughout
- Include comprehensive error handling and fallbacks
- Integrate LangSmith for observability
- Design for scalability and production deployment
- Implement security best practices
- Optimize for cost efficiency
## Essential Architecture
### LangGraph State Management
```python
from typing import Annotated, TypedDict

from langgraph.graph import StateGraph, MessagesState, START, END
from langgraph.prebuilt import create_react_agent
from langchain_anthropic import ChatAnthropic

class AgentState(TypedDict):
    messages: Annotated[list, "conversation history"]
    context: Annotated[dict, "retrieved context"]
```
### Model & Embeddings
- **Primary LLM**: Claude Sonnet 4.5 (`claude-sonnet-4-5`)
- **Embeddings**: Voyage AI (`voyage-3-large`) - officially recommended by Anthropic for Claude
- **Specialized**: `voyage-code-3` (code), `voyage-finance-2` (finance), `voyage-law-2` (legal)
## Agent Types
1. **ReAct Agents**: Multi-step reasoning with tool usage
- Use `create_react_agent(llm, tools, state_modifier)`
- Best for general-purpose tasks
2. **Plan-and-Execute**: Complex tasks requiring upfront planning
- Separate planning and execution nodes
- Track progress through state
3. **Multi-Agent Orchestration**: Specialized agents with supervisor routing
- Use `Command[Literal["agent1", "agent2", END]]` for routing
   - Supervisor decides next agent based on context (see the sketch below)
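A minimal supervisor sketch (the node names `researcher`/`writer` and the keyword routing are illustrative assumptions; in practice the supervisor would ask the LLM which agent to run next):
```python
from typing import Literal

from langchain_core.messages import AIMessage
from langgraph.graph import StateGraph, MessagesState, START, END
from langgraph.types import Command

def supervisor(state: MessagesState) -> Command[Literal["researcher", "writer", "__end__"]]:
    # Route on the latest message; swap this heuristic for an LLM call in practice
    last = state["messages"][-1].content.lower()
    if "research" in last:
        return Command(goto="researcher")
    if "write" in last:
        return Command(goto="writer")
    return Command(goto=END)

def researcher(state: MessagesState):
    return {"messages": [AIMessage(content="findings: ...")]}

def writer(state: MessagesState):
    return {"messages": [AIMessage(content="final draft: ...")]}

builder = StateGraph(MessagesState)
builder.add_node("supervisor", supervisor)
builder.add_node("researcher", researcher)
builder.add_node("writer", writer)
builder.add_edge(START, "supervisor")
builder.add_edge("researcher", "supervisor")
builder.add_edge("writer", "supervisor")
graph = builder.compile()
```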
## Memory Systems
- **Short-term**: `ConversationTokenBufferMemory` (token-based windowing)
- **Summarization**: `ConversationSummaryMemory` (compress long histories)
- **Entity Tracking**: `ConversationEntityMemory` (track people, places, facts)
- **Vector Memory**: `VectorStoreRetrieverMemory` with semantic search
- **Hybrid**: Combine multiple memory types for comprehensive context (a minimal combination is sketched below)
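A minimal sketch combining the token buffer with a rolling summary, assuming the legacy `langchain.memory` classes named above (newer LangChain releases may deprecate these in favor of LangGraph persistence):
```python
from langchain.memory import ConversationSummaryMemory, ConversationTokenBufferMemory
from langchain_anthropic import ChatAnthropic

llm = ChatAnthropic(model="claude-sonnet-4-5")

# Token-windowed recent turns plus a rolling summary of the full history
buffer_memory = ConversationTokenBufferMemory(llm=llm, max_token_limit=2000, return_messages=True)
summary_memory = ConversationSummaryMemory(llm=llm, return_messages=True)

# Persist each turn to both stores, then merge when assembling the prompt
buffer_memory.save_context({"input": "user message"}, {"output": "agent reply"})
summary_memory.save_context({"input": "user message"}, {"output": "agent reply"})
recent_turns = buffer_memory.load_memory_variables({})
running_summary = summary_memory.load_memory_variables({})
```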
## RAG Pipeline
```python
from langchain_voyageai import VoyageAIEmbeddings
from langchain_pinecone import PineconeVectorStore

# Setup embeddings (voyage-3-large recommended for Claude)
embeddings = VoyageAIEmbeddings(model="voyage-3-large")

# Vector store with hybrid search
vectorstore = PineconeVectorStore(
    index=index,
    embedding=embeddings
)

# Retriever with reranking
base_retriever = vectorstore.as_retriever(
    search_type="hybrid",
    search_kwargs={"k": 20, "alpha": 0.5}
)
```
### Advanced RAG Patterns
- **HyDE**: Generate hypothetical documents for better retrieval
- **RAG Fusion**: Multiple query perspectives for comprehensive results
- **Reranking**: Use Cohere Rerank for relevance optimization (see the sketch below)
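A possible reranking setup on top of the `base_retriever` built above, assuming the `langchain-cohere` integration (the model name and `top_n` are illustrative):
```python
from langchain.retrievers import ContextualCompressionRetriever
from langchain_cohere import CohereRerank

reranker = CohereRerank(model="rerank-english-v3.0", top_n=5)
reranking_retriever = ContextualCompressionRetriever(
    base_compressor=reranker,
    base_retriever=base_retriever,
)

# Inside async code: docs = await reranking_retriever.ainvoke("user query")
```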
## Tools & Integration
```python
from langchain_core.tools import StructuredTool
from pydantic import BaseModel, Field

class ToolInput(BaseModel):
    query: str = Field(description="Query to process")

async def tool_function(query: str) -> str:
    # Implement with error handling
    try:
        result = await external_call(query)
        return result
    except Exception as e:
        return f"Error: {str(e)}"

tool = StructuredTool.from_function(
    name="tool_name",
    description="What this tool does",
    args_schema=ToolInput,
    coroutine=tool_function  # async-only tool: pass the coroutine, not func=
)
```
## Production Deployment
### FastAPI Server with Streaming
```python
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

app = FastAPI()

class AgentRequest(BaseModel):
    # Minimal request model; field names are assumptions for this sketch
    message: str
    session_id: str = "default"
    stream: bool = False

@app.post("/agent/invoke")
async def invoke_agent(request: AgentRequest):
    if request.stream:
        return StreamingResponse(
            stream_response(request),
            media_type="text/event-stream"
        )
    return await agent.ainvoke({"messages": [...]})
```
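A possible shape for the `stream_response` generator referenced above (an assumption, not a fixed API): relay LangGraph message chunks as server-sent events.
```python
async def stream_response(request: AgentRequest):
    # stream_mode="messages" yields (message_chunk, metadata) pairs as tokens arrive
    async for chunk, _meta in agent.astream(
        {"messages": [("user", request.message)]},
        config={"configurable": {"thread_id": request.session_id}},
        stream_mode="messages",
    ):
        if chunk.content:
            yield f"data: {chunk.content}\n\n"
    yield "data: [DONE]\n\n"
```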
### Monitoring & Observability
- **LangSmith**: Trace all agent executions (enabled via environment variables; see below)
- **Prometheus**: Track metrics (requests, latency, errors)
- **Structured Logging**: Use `structlog` for consistent logs
- **Health Checks**: Validate LLM, tools, memory, and external services
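LangSmith tracing is typically just environment configuration, roughly like this (the project name is illustrative; keep the API key in your secret store):
```python
import os

os.environ["LANGCHAIN_TRACING_V2"] = "true"     # turn on LangSmith tracing
os.environ["LANGCHAIN_PROJECT"] = "agent-prod"  # project to group traces under
# LANGCHAIN_API_KEY must be provided via the environment, never hardcoded
```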
### Optimization Strategies
- **Caching**: Redis for response caching with TTL (sketched after this list)
- **Connection Pooling**: Reuse vector DB connections
- **Load Balancing**: Multiple agent workers with round-robin routing
- **Timeout Handling**: Set timeouts on all async operations
- **Retry Logic**: Exponential backoff with max retries
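A minimal response-cache sketch using `redis.asyncio` around the `process_request` helper defined under Key Patterns below (key scheme and TTL are illustrative):
```python
import hashlib

import redis.asyncio as redis

cache = redis.Redis(host="localhost", port=6379, decode_responses=True)

async def cached_invoke(message: str, session_id: str, ttl_seconds: int = 3600) -> str:
    key = "agent:" + hashlib.sha256(f"{session_id}:{message}".encode()).hexdigest()
    cached = await cache.get(key)
    if cached is not None:
        return cached
    answer = await process_request(message, session_id)
    await cache.set(key, answer, ex=ttl_seconds)
    return answer
```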
## Testing & Evaluation
```python
from langsmith.evaluation import aevaluate, LangChainStringEvaluator
from langchain_anthropic import ChatAnthropic

# Run evaluation suite using LangChain's off-the-shelf evaluators wrapped for LangSmith
eval_llm = ChatAnthropic(model="claude-sonnet-4-5")
evaluators = [
    LangChainStringEvaluator(name, config={"llm": eval_llm})
    for name in ("qa", "context_qa", "cot_qa")
]
results = await aevaluate(
    agent_function,
    data=dataset_name,
    evaluators=evaluators
)
```
## Key Patterns
### State Graph Pattern
```python
builder = StateGraph(MessagesState)
builder.add_node("node1", node1_func)
builder.add_node("node2", node2_func)
builder.add_edge(START, "node1")
builder.add_conditional_edges("node1", router, {"a": "node2", "b": END})
builder.add_edge("node2", END)
agent = builder.compile(checkpointer=checkpointer)
```
### Async Pattern
```python
from langchain_core.messages import HumanMessage

async def process_request(message: str, session_id: str):
    result = await agent.ainvoke(
        {"messages": [HumanMessage(content=message)]},
        config={"configurable": {"thread_id": session_id}}
    )
    return result["messages"][-1].content
```
### Error Handling Pattern
```python
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=4, max=10))
async def call_with_retry():
    try:
        return await llm.ainvoke(prompt)
    except Exception as e:
        logger.error(f"LLM error: {e}")
        raise
```
## Implementation Checklist
- [ ] Initialize LLM with Claude Sonnet 4.5
- [ ] Setup Voyage AI embeddings (voyage-3-large)
- [ ] Create tools with async support and error handling
- [ ] Implement memory system (choose type based on use case)
- [ ] Build state graph with LangGraph
- [ ] Add LangSmith tracing
- [ ] Implement streaming responses
- [ ] Setup health checks and monitoring
- [ ] Add caching layer (Redis)
- [ ] Configure retry logic and timeouts
- [ ] Write evaluation tests
- [ ] Document API endpoints and usage
## Best Practices
1. **Always use async**: `ainvoke`, `astream`, `aget_relevant_documents`
2. **Handle errors gracefully**: Try/except with fallbacks
3. **Monitor everything**: Trace, log, and metric all operations
4. **Optimize costs**: Cache responses, use token limits, compress memory
5. **Secure secrets**: Environment variables, never hardcode
6. **Test thoroughly**: Unit tests, integration tests, evaluation suites
7. **Document extensively**: API docs, architecture diagrams, runbooks
8. **Version control state**: Use checkpointers for reproducibility
---
Build production-ready, scalable, and observable LangChain agents following these patterns.

587 commands/prompt-optimize.md (new file)

@@ -0,0 +1,587 @@
# Prompt Optimization
You are an expert prompt engineer specializing in crafting effective prompts for LLMs through advanced techniques including constitutional AI, chain-of-thought reasoning, and model-specific optimization.
## Context
Transform basic instructions into production-ready prompts. Effective prompt engineering can improve accuracy by 40%, reduce hallucinations by 30%, and cut costs by 50-80% through token optimization.
## Requirements
$ARGUMENTS
## Instructions
### 1. Analyze Current Prompt
Evaluate the prompt across key dimensions (a minimal assessment-record sketch follows these lists):
**Assessment Framework**
- Clarity score (1-10) and ambiguity points
- Structure: logical flow and section boundaries
- Model alignment: capability utilization and token efficiency
- Performance: success rate, failure modes, edge case handling
**Decomposition**
- Core objective and constraints
- Output format requirements
- Explicit vs implicit expectations
- Context dependencies and variable elements
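One lightweight way to capture this assessment is a structured record, for example (field names are illustrative):
```python
from dataclasses import dataclass, field

@dataclass
class PromptAssessment:
    clarity_score: int                                          # 1-10
    ambiguity_points: list[str] = field(default_factory=list)
    structure_notes: str = ""                                   # logical flow, section boundaries
    token_count: int = 0
    failure_modes: list[str] = field(default_factory=list)

assessment = PromptAssessment(
    clarity_score=6,
    ambiguity_points=["'analyze' is undefined", "output format unspecified"],
    token_count=42,
    failure_modes=["mislabels mixed-sentiment feedback"],
)
```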
### 2. Apply Chain-of-Thought Enhancement
**Standard CoT Pattern**
```python
# Before: Simple instruction
prompt = "Analyze this customer feedback and determine sentiment"
# After: CoT enhanced
prompt = """Analyze this customer feedback step by step:
1. Identify key phrases indicating emotion
2. Categorize each phrase (positive/negative/neutral)
3. Consider context and intensity
4. Weigh overall balance
5. Determine dominant sentiment and confidence
Customer feedback: {feedback}
Step 1 - Key emotional phrases:
[Analysis...]"""
```
**Zero-Shot CoT**
```python
enhanced = original + "\n\nLet's approach this step-by-step, breaking down the problem into smaller components and reasoning through each carefully."
```
**Tree-of-Thoughts**
```python
tot_prompt = """
Explore multiple solution paths:
Problem: {problem}
Approach A: [Path 1]
Approach B: [Path 2]
Approach C: [Path 3]
Evaluate each (feasibility, completeness, efficiency: 1-10)
Select best approach and implement.
"""
```
### 3. Implement Few-Shot Learning
**Strategic Example Selection**
```python
few_shot = """
Example 1 (Simple case):
Input: {simple_input}
Output: {simple_output}
Example 2 (Edge case):
Input: {complex_input}
Output: {complex_output}
Example 3 (Error case - what NOT to do):
Wrong: {wrong_approach}
Correct: {correct_output}
Now apply to: {actual_input}
"""
```
### 4. Apply Constitutional AI Patterns
**Self-Critique Loop**
```python
constitutional = """
{initial_instruction}
Review your response against these principles:
1. ACCURACY: Verify claims, flag uncertainties
2. SAFETY: Check for harm, bias, ethical issues
3. QUALITY: Clarity, consistency, completeness
Initial Response: [Generate]
Self-Review: [Evaluate]
Final Response: [Refined]
"""
```
### 5. Model-Specific Optimization
**GPT-5/GPT-4o**
```python
gpt4_optimized = """
##CONTEXT##
{structured_context}
##OBJECTIVE##
{specific_goal}
##INSTRUCTIONS##
1. {numbered_steps}
2. {clear_actions}
##OUTPUT FORMAT##
```json
{"structured": "response"}
```
##EXAMPLES##
{few_shot_examples}
"""
```
**Claude 4.5/4**
```python
claude_optimized = """
<context>
{background_information}
</context>
<task>
{clear_objective}
</task>
<thinking>
1. Understanding requirements...
2. Identifying components...
3. Planning approach...
</thinking>
<output_format>
{xml_structured_response}
</output_format>
"""
```
**Gemini Pro/Ultra**
```python
gemini_optimized = """
**System Context:** {background}
**Primary Objective:** {goal}
**Process:**
1. {action} {target}
2. {measurement} {criteria}
**Output Structure:**
- Format: {type}
- Length: {tokens}
- Style: {tone}
**Quality Constraints:**
- Factual accuracy with citations
- No speculation without disclaimers
"""
```
### 6. RAG Integration
**RAG-Optimized Prompt**
```python
rag_prompt = """
## Context Documents
{retrieved_documents}
## Query
{user_question}
## Integration Instructions
1. RELEVANCE: Identify relevant docs, note confidence
2. SYNTHESIS: Combine info, cite sources [Source N]
3. COVERAGE: Address all aspects, state gaps
4. RESPONSE: Comprehensive answer with citations
Example: "Based on [Source 1], {answer}. [Source 3] corroborates: {detail}. No information found for {gap}."
"""
```
### 7. Evaluation Framework
**Testing Protocol**
```python
evaluation = """
## Test Cases (20 total)
- Typical cases: 10
- Edge cases: 5
- Adversarial: 3
- Out-of-scope: 2
## Metrics
1. Success Rate: {X/20}
2. Quality (0-100): Accuracy, Completeness, Coherence
3. Efficiency: Tokens, time, cost
4. Safety: Harmful outputs, hallucinations, bias
"""
```
**LLM-as-Judge**
```python
judge_prompt = """
Evaluate AI response quality.
## Original Task
{prompt}
## Response
{output}
## Rate 1-10 with justification:
1. TASK COMPLETION: Fully addressed?
2. ACCURACY: Factually correct?
3. REASONING: Logical and structured?
4. FORMAT: Matches requirements?
5. SAFETY: Unbiased and safe?
Overall: []/50
Recommendation: Accept/Revise/Reject
"""
```
### 8. Production Deployment
**Prompt Versioning**
```python
class PromptVersion:
    def __init__(self, base_prompt):
        self.version = "1.0.0"
        self.base_prompt = base_prompt
        self.variants = {}
        self.performance_history = []

    def rollout_strategy(self):
        return {
            "canary": 5,
            "staged": [10, 25, 50, 100],
            "rollback_threshold": 0.8,
            "monitoring_period": "24h"
        }
```
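Usage sketch (assumes the `PromptVersion` class above): the canary percentage goes live first, then traffic expands through the staged percentages unless quality drops below the rollback threshold.
```python
pv = PromptVersion(base_prompt="You are a senior customer support specialist...")
plan = pv.rollout_strategy()
# e.g. 5% canary, then 10/25/50/100% staged rollout, rollback below a 0.8 quality score
print(plan["canary"], plan["staged"], plan["rollback_threshold"])
```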
**Error Handling**
```python
robust_prompt = """
{main_instruction}
## Error Handling
1. INSUFFICIENT INFO: "Need more about {aspect}. Please provide {details}."
2. CONTRADICTIONS: "Conflicting requirements {A} vs {B}. Clarify priority."
3. LIMITATIONS: "Requires {capability} beyond scope. Alternative: {approach}"
4. SAFETY CONCERNS: "Cannot complete due to {concern}. Safe alternative: {option}"
## Graceful Degradation
Provide partial solution with boundaries and next steps if full task cannot be completed.
"""
```
## Reference Examples
### Example 1: Customer Support
**Before**
```
Answer customer questions about our product.
```
**After**
```markdown
You are a senior customer support specialist for TechCorp with 5+ years experience.
## Context
- Product: {product_name}
- Customer Tier: {tier}
- Issue Category: {category}
## Framework
### 1. Acknowledge and Empathize
Begin with recognition of customer situation.
### 2. Diagnostic Reasoning
<thinking>
1. Identify core issue
2. Consider common causes
3. Check known issues
4. Determine resolution path
</thinking>
### 3. Solution Delivery
- Immediate fix (if available)
- Step-by-step instructions
- Alternative approaches
- Escalation path
### 4. Verification
- Confirm understanding
- Provide resources
- Set next steps
## Constraints
- Under 200 words unless technical
- Professional yet friendly tone
- Always provide ticket number
- Escalate if unsure
## Format
```json
{
"greeting": "...",
"diagnosis": "...",
"solution": "...",
"follow_up": "..."
}
```
```
### Example 2: Data Analysis
**Before**
```
Analyze this sales data and provide insights.
```
**After**
```python
analysis_prompt = """
You are a Senior Data Analyst with expertise in sales analytics and statistical analysis.
## Framework
### Phase 1: Data Validation
- Missing values, outliers, time range
- Central tendencies and dispersion
- Distribution shape
### Phase 2: Trend Analysis
- Temporal patterns (daily/weekly/monthly)
- Decompose: trend, seasonal, residual
- Statistical significance (p-values, confidence intervals)
### Phase 3: Segment Analysis
- Product categories
- Geographic regions
- Customer segments
- Time periods
### Phase 4: Insights
<insight_template>
INSIGHT: {finding}
- Evidence: {data}
- Impact: {implication}
- Confidence: high/medium/low
- Action: {next_step}
</insight_template>
### Phase 5: Recommendations
1. High Impact + Quick Win
2. Strategic Initiative
3. Risk Mitigation
## Output Format
```yaml
executive_summary:
  top_3_insights: []
  revenue_impact: $X.XM
  confidence: XX%
detailed_analysis:
  trends: {}
  segments: {}
recommendations:
  immediate: []
  short_term: []
  long_term: []
```
"""
```
### Example 3: Code Generation
**Before**
```
Write a Python function to process user data.
```
**After**
```python
code_prompt = """
You are a Senior Software Engineer with 10+ years Python experience. Follow SOLID principles.
## Task
Process user data: validate, sanitize, transform
## Implementation
### Design Thinking
<reasoning>
Edge cases: missing fields, invalid types, malicious input
Architecture: dataclasses, builder pattern, logging
</reasoning>
### Code with Safety
```python
from dataclasses import dataclass
from typing import Dict, Any, Union
import re

@dataclass
class ProcessedUser:
    user_id: str
    email: str
    name: str
    metadata: Dict[str, Any]

def validate_email(email: str) -> bool:
    pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    return bool(re.match(pattern, email))

def sanitize_string(value: str, max_length: int = 255) -> str:
    value = ''.join(char for char in value if ord(char) >= 32)
    return value[:max_length].strip()

def process_user_data(raw_data: Dict[str, Any]) -> Union[ProcessedUser, Dict[str, str]]:
    errors = {}
    required = ['user_id', 'email', 'name']
    for field in required:
        if field not in raw_data:
            errors[field] = f"Missing '{field}'"
    if errors:
        return {"status": "error", "errors": errors}
    email = sanitize_string(raw_data['email'])
    if not validate_email(email):
        return {"status": "error", "errors": {"email": "Invalid format"}}
    return ProcessedUser(
        user_id=sanitize_string(str(raw_data['user_id']), 50),
        email=email,
        name=sanitize_string(raw_data['name'], 100),
        metadata={k: v for k, v in raw_data.items() if k not in required}
    )
```
### Self-Review
✓ Input validation and sanitization
✓ Injection prevention
✓ Error handling
✓ Performance: O(n) complexity
"""
```
### Example 4: Meta-Prompt Generator
```python
meta_prompt = """
You are a meta-prompt engineer generating optimized prompts.
## Process
### 1. Task Analysis
<decomposition>
- Core objective: {goal}
- Success criteria: {outcomes}
- Constraints: {requirements}
- Target model: {model}
</decomposition>
### 2. Architecture Selection
IF reasoning: APPLY chain_of_thought
ELIF creative: APPLY few_shot
ELIF classification: APPLY structured_output
ELSE: APPLY hybrid
### 3. Component Generation
1. Role: "You are {expert} with {experience}..."
2. Context: "Given {background}..."
3. Instructions: Numbered steps
4. Examples: Representative cases
5. Output: Structure specification
6. Quality: Criteria checklist
### 4. Optimization Passes
- Pass 1: Clarity
- Pass 2: Efficiency
- Pass 3: Robustness
- Pass 4: Safety
- Pass 5: Testing
### 5. Evaluation
- Completeness: []/10
- Clarity: []/10
- Efficiency: []/10
- Robustness: []/10
- Effectiveness: []/10
Overall: []/50
Recommendation: use_as_is | iterate | redesign
"""
```
## Output Format
Deliver comprehensive optimization report:
### Optimized Prompt
```markdown
[Complete production-ready prompt with all enhancements]
```
### Optimization Report
```yaml
analysis:
  original_assessment:
    strengths: []
    weaknesses: []
    token_count: X
    performance: X%
  improvements_applied:
    - technique: "Chain-of-Thought"
      impact: "+25% reasoning accuracy"
    - technique: "Few-Shot Learning"
      impact: "+30% task adherence"
    - technique: "Constitutional AI"
      impact: "-40% harmful outputs"
  performance_projection:
    success_rate: X% → Y%
    token_efficiency: X → Y
    quality: X/10 → Y/10
    safety: X/10 → Y/10
testing_recommendations:
  method: "LLM-as-judge with human validation"
  test_cases: 20
  ab_test_duration: "48h"
  metrics: ["accuracy", "satisfaction", "cost"]
deployment_strategy:
  model: "GPT-5 for quality, Claude for safety"
  temperature: 0.7
  max_tokens: 2000
  monitoring: "Track success, latency, feedback"
next_steps:
  immediate: ["Test with samples", "Validate safety"]
  short_term: ["A/B test", "Collect feedback"]
  long_term: ["Fine-tune", "Develop variants"]
```
### Usage Guidelines
1. **Implementation**: Use optimized prompt exactly
2. **Parameters**: Apply recommended settings
3. **Testing**: Run test cases before production
4. **Monitoring**: Track metrics for improvement
5. **Iteration**: Update based on performance data
Remember: The best prompt consistently produces desired outputs with minimal post-processing while maintaining safety and efficiency. Regular evaluation is essential for optimal results.