# Prompt Optimization Techniques

A collection of practical techniques for effectively optimizing prompts in LangGraph nodes.

**💡 Tip**: For before/after prompt comparison examples and code templates, refer to [examples.md](examples.md#phase-3-iterative-improvement-examples).

## 🔧 Practical Optimization Techniques

### Technique 1: Few-Shot Examples

**Effect**: Accuracy +10-20%

**Before (Zero-shot)**:

```python
system_prompt = """Classify user input into: product_inquiry, technical_support, billing, or general."""

# Accuracy: ~70%
```

**After (Few-shot)**:

```python
system_prompt = """Classify user input into: product_inquiry, technical_support, billing, or general.

Examples:

Input: "How much does the premium plan cost?"
Output: product_inquiry

Input: "I can't log into my account"
Output: technical_support

Input: "Why was I charged twice this month?"
Output: billing

Input: "Hello, how are you today?"
Output: general

Input: "What features are included in the basic plan?"
Output: product_inquiry"""

# Accuracy: ~85-90%
```

**Best Practices**:

- **Number of Examples**: 3-7 (diminishing returns beyond this)
- **Diversity**: At least one example from each category, including edge cases
- **Quality**: Select clear and unambiguous examples
- **Format**: Consistent Input/Output format

### Technique 2: Chain-of-Thought

**Effect**: Accuracy +15-30% for complex reasoning tasks

**Before (Direct answer)**:

```python
prompt = f"""Question: {question}

Answer:"""

# Many incorrect answers for complex questions
```

**After (Chain-of-Thought)**:

```python
prompt = f"""Question: {question}

Think through this step by step:
1. First, identify the key information needed
2. Then, analyze the context for relevant details
3. Finally, formulate a clear answer

Reasoning:"""

# Logical answers even for complex questions
```

**Application Scenarios**:

- ✅ Tasks requiring multi-step reasoning
- ✅ Complex decision making
- ✅ Resolving contradictions
- ❌ Simple classification tasks (unnecessary overhead)

### Technique 3: Output Format Structuring

**Effect**: Latency -10-20%, Parsing errors -90%

**Before (Free text)**:

```python
prompt = "Classify the intent and explain why."

# Output: "This looks like a technical support question because the user is having trouble logging in..."
# Problems: Hard to parse, verbose, inconsistent
```

**After (JSON structured)**:

```python
prompt = """Classify the intent. Output ONLY a valid JSON object:
{
  "intent": "<category>",
  "confidence": <0.0-1.0>,
  "reasoning": "<one-sentence explanation>"
}

Example output:
{"intent": "technical_support", "confidence": 0.95, "reasoning": "User reports authentication issue"}"""

# Output: {"intent": "technical_support", "confidence": 0.95, "reasoning": "User reports authentication issue"}
# Benefits: Easy to parse, concise, consistent
```

**JSON Parsing Error Handling**:

```python
import json
import re

def parse_llm_json_output(output: str) -> dict:
    """Robustly parse LLM JSON output."""
    try:
        # Parse as JSON directly
        return json.loads(output)
    except json.JSONDecodeError:
        # Extract JSON only (from markdown code blocks, etc.)
        json_match = re.search(r'\{[^}]+\}', output)
        if json_match:
            try:
                return json.loads(json_match.group())
            except json.JSONDecodeError:
                pass
        # Fallback
        return {
            "intent": "general",
            "confidence": 0.5,
            "reasoning": "Failed to parse LLM output"
        }
```
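
As a usage example, here is a minimal sketch of how `parse_llm_json_output` might sit inside a LangGraph-style classification node. It is illustrative rather than prescriptive: the `classify_intent_node` name, the state keys, and the inline system prompt are assumptions, and it assumes the `langchain_anthropic` and `langchain_core` packages are installed.

```python
from langchain_anthropic import ChatAnthropic
from langchain_core.messages import HumanMessage, SystemMessage

llm = ChatAnthropic(model="claude-3-5-sonnet-20241022", temperature=0.3)

def classify_intent_node(state: dict) -> dict:
    """Illustrative LangGraph-style node: classify intent, then parse robustly."""
    response = llm.invoke([
        SystemMessage(content=(
            "Classify the intent. Output ONLY a valid JSON object like "
            '{"intent": "...", "confidence": 0.0, "reasoning": "..."}'
        )),
        HumanMessage(content=state["user_input"]),
    ])
    # Falls back to {"intent": "general", ...} if the model output is not valid JSON
    result = parse_llm_json_output(response.content)
    return {"intent": result["intent"], "confidence": result["confidence"]}
```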
### Technique 4: Temperature and Max Tokens Adjustment

**Temperature Effects**:

| Task Type | Recommended Temperature | Reason |
|-----------|------------------------|--------|
| Classification/Extraction | 0.0 - 0.3 | Deterministic output desired |
| Summarization/Transformation | 0.3 - 0.5 | Some flexibility needed |
| Creative/Generation | 0.7 - 1.0 | Diversity and creativity important |

**Before (Default settings)**:

```python
from langchain_anthropic import ChatAnthropic

llm = ChatAnthropic(
    model="claude-3-5-sonnet-20241022",
    temperature=1.0  # Default, used for all tasks
)

# Unstable results for classification tasks
```

**After (Optimized per task)**:

```python
# Intent classification: Low temperature
intent_llm = ChatAnthropic(
    model="claude-3-5-sonnet-20241022",
    temperature=0.3  # Emphasize consistency
)

# Response generation: Medium temperature
response_llm = ChatAnthropic(
    model="claude-3-5-sonnet-20241022",
    temperature=0.5,  # Balance flexibility
    max_tokens=500    # Enforce conciseness
)
```

**Max Tokens Effects**:

```python
# Before: No limit
llm = ChatAnthropic(model="claude-3-5-sonnet-20241022")
# Average output: 800 tokens, Cost: $0.012/req, Latency: 3.2s

# After: Appropriate limit
llm = ChatAnthropic(
    model="claude-3-5-sonnet-20241022",
    max_tokens=500  # Necessary and sufficient length
)
# Average output: 450 tokens, Cost: $0.007/req (-42%), Latency: 1.8s (-44%)
```

### Technique 5: System Message vs Human Message Usage

**System Message**:
- **Use**: Role, guidelines, constraints
- **Characteristics**: Context applied to the entire task
- **Caching**: Effective (doesn't change frequently)

**Human Message**:
- **Use**: Specific input, questions
- **Characteristics**: Changes per request
- **Caching**: Less effective

**Good Structure**:

```python
from langchain_core.messages import SystemMessage, HumanMessage

messages = [
    SystemMessage(content="""You are a customer support assistant.

Guidelines:
- Be concise: 2-3 sentences maximum
- Be empathetic: Acknowledge customer concerns
- Be actionable: Provide clear next steps

Response format:
1. Acknowledgment
2. Answer or solution
3. Next steps (if applicable)"""),
    HumanMessage(content=f"""Customer question: {user_input}

Context: {context}

Generate a helpful response:""")
]
```

### Technique 6: Prompt Caching

**Effect**: Cost -50-90% (on cache hit)

Leverage Anthropic Claude's prompt caching:

```python
from anthropic import Anthropic

client = Anthropic()

# Large cacheable system prompt
CACHED_SYSTEM_PROMPT = """You are an expert customer support agent...
[Long guidelines, examples, and context - 1000+ tokens]

Examples:
[50 few-shot examples]
"""

# Use cache
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=500,
    system=[
        {
            "type": "text",
            "text": CACHED_SYSTEM_PROMPT,
            "cache_control": {"type": "ephemeral"}  # Enable caching
        }
    ],
    messages=[
        {"role": "user", "content": user_input}
    ]
)

# First time: Full cost
# 2nd+ time (within 5 minutes): Input tokens -90% discount
```

**Caching Strategy**:

- ✅ Large system prompts (>1024 tokens)
- ✅ Sets of few-shot examples
- ✅ Long context (RAG documents)
- ❌ Frequently changing content
- ❌ Small prompts (<1024 tokens)
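
To check whether caching is actually taking effect, you can inspect the `usage` block on the response. This is a minimal sketch, assuming the `message` object from the example above; the cache-related field names follow Anthropic's prompt-caching documentation and are read defensively in case the installed SDK version does not expose them.

```python
# Reuses the `message` object returned by client.messages.create(...) above
usage = message.usage

print("input_tokens:", usage.input_tokens)
# Tokens written to the cache (expected on the first call)
print("cache_creation_input_tokens:", getattr(usage, "cache_creation_input_tokens", None))
# Tokens served from the cache (expected on subsequent calls within the TTL)
print("cache_read_input_tokens:", getattr(usage, "cache_read_input_tokens", None))

# A warm cache shows cache_read_input_tokens > 0 and a much smaller billable input count.
```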
### Technique 7: Progressive Refinement

Break complex tasks into multiple steps:

**Before (1 step)**:

```python
# Execute everything in one node
prompt = f"""Analyze user input, retrieve relevant info, and generate response.

Input: {user_input}"""

# Problems: Too complex, low quality, hard to debug
```

**After (Multiple steps)**:

```python
# Step 1: Intent classification
intent = classify_intent(user_input)

# Step 2: Information retrieval (based on intent)
context = retrieve_context(intent, user_input)

# Step 3: Response generation (using intent and context)
response = generate_response(intent, context, user_input)

# Benefits: Each step optimizable, easy to debug, improved quality
```

### Technique 8: Negative Instructions

**Effect**: Edge case errors -30-50%

```python
prompt = """Generate a customer support response.

DO:
- Be concise (2-3 sentences)
- Acknowledge the customer's concern
- Provide actionable next steps

DO NOT:
- Apologize excessively (one apology maximum)
- Make promises you can't keep (e.g., "immediate resolution")
- Use technical jargon without explanation
- Provide information not in the context
- Generate placeholder text like "XXX" or "[insert here]"

Customer question: {question}
Context: {context}

Response:"""
```

### Technique 9: Self-Consistency

**Effect**: Accuracy +10-20% for complex reasoning, Cost +200-300%

Generate multiple reasoning paths and use majority voting:

```python
def self_consistency_reasoning(question: str, num_samples: int = 5) -> str:
    """Generate multiple reasoning paths and select the most consistent answer."""
    llm = ChatAnthropic(
        model="claude-3-5-sonnet-20241022",
        temperature=0.7  # Higher temperature for diversity
    )

    prompt = f"""Question: {question}

Think through this step by step and provide your reasoning:

Reasoning:"""

    # Generate multiple reasoning paths
    responses = []
    for _ in range(num_samples):
        response = llm.invoke([HumanMessage(content=prompt)])
        responses.append(response.content)

    # Extract the most consistent answer (simplified)
    # In practice, extract the final answer from each response and use majority voting
    from collections import Counter
    final_answers = [extract_final_answer(r) for r in responses]
    most_common = Counter(final_answers).most_common(1)[0][0]

    return most_common

# Trade-offs:
# - Accuracy: +10-20%
# - Cost: +200-300% (5x API calls)
# - Latency: +200-300% (if not parallelized)
# Use: Critical decisions only
```

### Technique 10: Model Selection

**Model Selection Based on Task Complexity**:

| Task Type | Recommended Model | Reason |
|-----------|------------------|--------|
| Simple classification | Claude 3.5 Haiku | Fast, low cost, sufficient accuracy |
| Complex reasoning | Claude 3.5 Sonnet | Balanced performance |
| Highly complex tasks | Claude 3 Opus | Best performance (high cost) |

```python
# Select the optimal model per task
class LLMSelector:
    def __init__(self):
        self.haiku = ChatAnthropic(model="claude-3-5-haiku-20241022")
        self.sonnet = ChatAnthropic(model="claude-3-5-sonnet-20241022")
        self.opus = ChatAnthropic(model="claude-3-opus-20240229")

    def get_llm(self, task_complexity: str):
        if task_complexity == "simple":
            return self.haiku   # ~$0.001/req
        elif task_complexity == "complex":
            return self.sonnet  # ~$0.005/req
        else:  # very_complex
            return self.opus    # ~$0.015/req

# Usage example
selector = LLMSelector()

# Simple intent classification → Haiku
intent_llm = selector.get_llm("simple")

# Complex response generation → Sonnet
response_llm = selector.get_llm("complex")
```
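
Since this guide targets LangGraph nodes, the sketch below shows one way the selector could drive per-node model choice inside a graph. It is a sketch under assumptions: the `SupportState` schema, node names, and prompts are illustrative, and it reuses the `LLMSelector` defined above.

```python
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class SupportState(TypedDict, total=False):
    user_input: str
    intent: str
    response: str

selector = LLMSelector()

def classify_node(state: SupportState) -> SupportState:
    # Simple classification → cheap model
    llm = selector.get_llm("simple")
    result = llm.invoke(f"Classify the intent of: {state['user_input']}")
    return {"intent": result.content}

def respond_node(state: SupportState) -> SupportState:
    # Response generation → stronger model
    llm = selector.get_llm("complex")
    result = llm.invoke(f"Intent: {state['intent']}\nAnswer the customer: {state['user_input']}")
    return {"response": result.content}

graph = StateGraph(SupportState)
graph.add_node("classify", classify_node)
graph.add_node("respond", respond_node)
graph.add_edge(START, "classify")
graph.add_edge("classify", "respond")
graph.add_edge("respond", END)
app = graph.compile()
```

Keeping model choice inside each node keeps the routing logic testable independently of the graph wiring.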
**Hybrid Approach**:

```python
def hybrid_classification(user_input: str) -> dict:
    """Try Haiku first, use Sonnet if confidence is low."""
    # Step 1: Classify with Haiku
    haiku_result = classify_with_haiku(user_input)

    if haiku_result["confidence"] >= 0.8:
        # High confidence → Use the Haiku result
        return haiku_result
    else:
        # Low confidence → Re-classify with Sonnet
        sonnet_result = classify_with_sonnet(user_input)
        return sonnet_result

# Effects:
# - 80% of cases use Haiku (low cost)
# - 20% of cases use Sonnet (high accuracy)
# - Average cost: -60%
# - Average accuracy: -2% (acceptable range)
```
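
The hybrid example above assumes `classify_with_haiku` and `classify_with_sonnet` helpers that are not defined in this document. A minimal sketch of what they might look like, reusing `parse_llm_json_output` from Technique 3 (the prompt wording and shared helper are illustrative):

```python
from langchain_anthropic import ChatAnthropic

CLASSIFY_PROMPT = """Classify the intent (product_inquiry, technical_support, billing, general).
Output ONLY JSON: {{"intent": "...", "confidence": 0.0, "reasoning": "..."}}

Input: {user_input}"""

_haiku = ChatAnthropic(model="claude-3-5-haiku-20241022", temperature=0.0)
_sonnet = ChatAnthropic(model="claude-3-5-sonnet-20241022", temperature=0.0)

def _classify(llm, user_input: str) -> dict:
    response = llm.invoke(CLASSIFY_PROMPT.format(user_input=user_input))
    # Robust fallback from Technique 3 guarantees the "confidence" key exists
    return parse_llm_json_output(response.content)

def classify_with_haiku(user_input: str) -> dict:
    return _classify(_haiku, user_input)

def classify_with_sonnet(user_input: str) -> dict:
    return _classify(_sonnet, user_input)
```

The `temperature=0.0` setting follows the Technique 4 guidance for classification tasks.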