# Prompt Optimization Techniques

A collection of practical techniques for effectively optimizing prompts in LangGraph nodes.

**💡 Tip**: For before/after prompt comparison examples and code templates, refer to [examples.md](examples.md#phase-3-iterative-improvement-examples).

## 🔧 Practical Optimization Techniques

### Technique 1: Few-Shot Examples

**Effect**: Accuracy +10-20%

**Before (Zero-shot)**:
```python
system_prompt = """Classify user input into: product_inquiry, technical_support, billing, or general."""

# Accuracy: ~70%
```

**After (Few-shot)**:
```python
system_prompt = """Classify user input into: product_inquiry, technical_support, billing, or general.

Examples:

Input: "How much does the premium plan cost?"
Output: product_inquiry

Input: "I can't log into my account"
Output: technical_support

Input: "Why was I charged twice this month?"
Output: billing

Input: "Hello, how are you today?"
Output: general

Input: "What features are included in the basic plan?"
Output: product_inquiry"""

# Accuracy: ~85-90%
```

**Best Practices**:
- **Number of Examples**: 3-7 (diminishing returns beyond this)
- **Diversity**: At least one from each category, including edge cases
- **Quality**: Select clear and unambiguous examples
- **Format**: Consistent Input/Output format (see the sketch below)

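The sketch below assembles such a few-shot prompt programmatically from a small, labeled example list. The helper name and example set are illustrative, not part of this skill.

```python
# Hypothetical helper: builds a few-shot system prompt from labeled examples.
# Keep the set small (3-7), diverse, and unambiguous, with a consistent format.
FEW_SHOT_EXAMPLES = [
    ("How much does the premium plan cost?", "product_inquiry"),
    ("I can't log into my account", "technical_support"),
    ("Why was I charged twice this month?", "billing"),
    ("Hello, how are you today?", "general"),
]

def build_few_shot_prompt(instruction: str, examples: list[tuple[str, str]]) -> str:
    """Render each example in a consistent Input/Output format."""
    lines = [instruction, "", "Examples:"]
    for text, label in examples:
        lines.append("")
        lines.append(f'Input: "{text}"')
        lines.append(f"Output: {label}")
    return "\n".join(lines)

system_prompt = build_few_shot_prompt(
    "Classify user input into: product_inquiry, technical_support, billing, or general.",
    FEW_SHOT_EXAMPLES,
)
```
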
### Technique 2: Chain-of-Thought

**Effect**: Accuracy +15-30% for complex reasoning tasks

**Before (Direct answer)**:
```python
prompt = f"""Question: {question}

Answer:"""

# Many incorrect answers for complex questions
```

**After (Chain-of-Thought)**:
```python
prompt = f"""Question: {question}

Think through this step by step:

1. First, identify the key information needed
2. Then, analyze the context for relevant details
3. Finally, formulate a clear answer

Reasoning:"""

# Logical answers even for complex questions
```

**Application Scenarios**:
- ✅ Tasks requiring multi-step reasoning
- ✅ Complex decision making
- ✅ Resolving contradictions
- ❌ Simple classification tasks (unnecessary overhead; see the routing sketch below)

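The routing sketch below applies the step-by-step template only to reasoning-heavy tasks and keeps the direct template for simple ones. The task-type labels are illustrative assumptions.

```python
# Illustrative routing: reserve Chain-of-Thought for multi-step reasoning tasks.
COT_TEMPLATE = """Question: {question}

Think through this step by step:

1. First, identify the key information needed
2. Then, analyze the context for relevant details
3. Finally, formulate a clear answer

Reasoning:"""

DIRECT_TEMPLATE = """Question: {question}

Answer:"""

def build_prompt(question: str, task_type: str) -> str:
    """Use CoT only where it pays for its extra tokens."""
    if task_type in ("multi_step_reasoning", "complex_decision", "contradiction_resolution"):
        return COT_TEMPLATE.format(question=question)
    return DIRECT_TEMPLATE.format(question=question)
```
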
### Technique 3: Output Format Structuring

**Effect**: Latency -10-20%, Parsing errors -90%

**Before (Free text)**:
```python
prompt = "Classify the intent and explain why."

# Output: "This looks like a technical support question because the user is having trouble logging in..."
# Problems: Hard to parse, verbose, inconsistent
```

**After (JSON structured)**:
```python
prompt = """Classify the intent.

Output ONLY a valid JSON object:
{
  "intent": "<category>",
  "confidence": <0.0-1.0>,
  "reasoning": "<brief explanation in one sentence>"
}

Example output:
{"intent": "technical_support", "confidence": 0.95, "reasoning": "User reports authentication issue"}"""

# Output: {"intent": "technical_support", "confidence": 0.95, "reasoning": "User reports authentication issue"}
# Benefits: Easy to parse, concise, consistent
```

**JSON Parsing Error Handling**:
```python
import json
import re

def parse_llm_json_output(output: str) -> dict:
    """Robustly parse LLM JSON output"""
    try:
        # Parse as JSON directly
        return json.loads(output)
    except json.JSONDecodeError:
        # Extract the JSON object only (from markdown code blocks, etc.);
        # note this simple pattern only matches flat objects without nested braces
        json_match = re.search(r'\{[^}]+\}', output)
        if json_match:
            try:
                return json.loads(json_match.group())
            except json.JSONDecodeError:
                pass

    # Fallback
    return {
        "intent": "general",
        "confidence": 0.5,
        "reasoning": "Failed to parse LLM output"
    }
```

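A brief usage sketch tying the structured prompt and the parser together; the ChatAnthropic setup mirrors the other snippets in this document and is illustrative.

```python
from langchain_anthropic import ChatAnthropic
from langchain_core.messages import HumanMessage, SystemMessage

llm = ChatAnthropic(model="claude-3-5-sonnet-20241022", temperature=0.3)

response = llm.invoke([
    SystemMessage(content=prompt),  # the JSON-structured prompt above
    HumanMessage(content="I can't log into my account"),
])

result = parse_llm_json_output(response.content)
# e.g. {"intent": "technical_support", "confidence": 0.95, "reasoning": "..."}
```
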
### Technique 4: Temperature and Max Tokens Adjustment

**Temperature Effects**:

| Task Type | Recommended Temperature | Reason |
|-----------|------------------------|--------|
| Classification/Extraction | 0.0 - 0.3 | Deterministic output desired |
| Summarization/Transformation | 0.3 - 0.5 | Some flexibility needed |
| Creative/Generation | 0.7 - 1.0 | Diversity and creativity important |

**Before (Default settings)**:
```python
from langchain_anthropic import ChatAnthropic

llm = ChatAnthropic(
    model="claude-3-5-sonnet-20241022",
    temperature=1.0  # Default, used for all tasks
)
# Unstable results for classification tasks
```

**After (Optimized per task)**:
```python
# Intent classification: Low temperature
intent_llm = ChatAnthropic(
    model="claude-3-5-sonnet-20241022",
    temperature=0.3  # Emphasize consistency
)

# Response generation: Medium temperature
response_llm = ChatAnthropic(
    model="claude-3-5-sonnet-20241022",
    temperature=0.5,  # Balance flexibility
    max_tokens=500  # Enforce conciseness
)
```

**Max Tokens Effects**:

```python
# Before: No limit
llm = ChatAnthropic(model="claude-3-5-sonnet-20241022")
# Average output: 800 tokens, Cost: $0.012/req, Latency: 3.2s

# After: Appropriate limit
llm = ChatAnthropic(
    model="claude-3-5-sonnet-20241022",
    max_tokens=500  # Necessary and sufficient length
)
# Average output: 450 tokens, Cost: $0.007/req (-42%), Latency: 1.8s (-44%)
```

### Technique 5: System Message vs Human Message Usage

**System Message**:
- **Use**: Role, guidelines, constraints
- **Characteristics**: Context applied to entire task
- **Caching**: Effective (doesn't change frequently)

**Human Message**:
- **Use**: Specific input, questions
- **Characteristics**: Changes per request
- **Caching**: Less effective

**Good Structure**:
```python
from langchain_core.messages import SystemMessage, HumanMessage

messages = [
    SystemMessage(content="""You are a customer support assistant.

Guidelines:
- Be concise: 2-3 sentences maximum
- Be empathetic: Acknowledge customer concerns
- Be actionable: Provide clear next steps

Response format:
1. Acknowledgment
2. Answer or solution
3. Next steps (if applicable)"""),

    HumanMessage(content=f"""Customer question: {user_input}

Context: {context}

Generate a helpful response:""")
]
```

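A short usage sketch, assuming a ChatAnthropic client as in the earlier snippets: the stable system message is reused across requests (and is a good caching candidate), while the human message changes every time.

```python
from langchain_anthropic import ChatAnthropic

llm = ChatAnthropic(model="claude-3-5-sonnet-20241022", temperature=0.5, max_tokens=500)

# Only the HumanMessage content varies per request; the SystemMessage stays fixed.
response = llm.invoke(messages)
print(response.content)
```
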
### Technique 6: Prompt Caching

**Effect**: Cost -50-90% (on cache hit)

Leverage Anthropic Claude's prompt caching:

```python
from anthropic import Anthropic

client = Anthropic()

# Large cacheable system prompt
CACHED_SYSTEM_PROMPT = """You are an expert customer support agent...

[Long guidelines, examples, and context - 1000+ tokens]

Examples:
[50 few-shot examples]
"""

# Use cache
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=500,
    system=[
        {
            "type": "text",
            "text": CACHED_SYSTEM_PROMPT,
            "cache_control": {"type": "ephemeral"}  # Enable caching
        }
    ],
    messages=[
        {"role": "user", "content": user_input}
    ]
)

# First time: Full cost
# 2nd+ time (within 5 minutes): Input tokens -90% discount
```

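A small follow-up sketch for confirming that the cache is actually hit. It assumes the Anthropic SDK exposes cache statistics as `usage.cache_creation_input_tokens` and `usage.cache_read_input_tokens`; verify the field names against the SDK version in use.

```python
# First call: tokens are written to the cache (cache_creation_input_tokens > 0).
# Later calls within the cache TTL: tokens are read back at the discounted rate
# (cache_read_input_tokens > 0).
usage = message.usage
print("cache_creation_input_tokens:", getattr(usage, "cache_creation_input_tokens", None))
print("cache_read_input_tokens:", getattr(usage, "cache_read_input_tokens", None))
```
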
**Caching Strategy**:
- ✅ Large system prompts (>1024 tokens)
- ✅ Sets of few-shot examples
- ✅ Long context (RAG documents)
- ❌ Frequently changing content
- ❌ Small prompts (<1024 tokens)

### Technique 7: Progressive Refinement

Break complex tasks into multiple steps:

**Before (1 step)**:
```python
# Execute everything in one node
prompt = f"""Analyze user input, retrieve relevant info, and generate response.

Input: {user_input}"""

# Problems: Too complex, low quality, hard to debug
```

**After (Multiple steps)**:
```python
# Step 1: Intent classification
intent = classify_intent(user_input)

# Step 2: Information retrieval (based on intent)
context = retrieve_context(intent, user_input)

# Step 3: Response generation (using intent and context)
response = generate_response(intent, context, user_input)

# Benefits: Each step optimizable, easy to debug, improved quality
```

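Since this document targets LangGraph nodes, here is a minimal sketch of the same three steps wired as a graph. The state fields and the `classify_intent` / `retrieve_context` / `generate_response` implementations are assumed to exist as in the snippet above.

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END

class SupportState(TypedDict, total=False):
    user_input: str
    intent: str
    context: str
    response: str

def classify_node(state: SupportState) -> dict:
    return {"intent": classify_intent(state["user_input"])}

def retrieve_node(state: SupportState) -> dict:
    return {"context": retrieve_context(state["intent"], state["user_input"])}

def respond_node(state: SupportState) -> dict:
    return {"response": generate_response(state["intent"], state["context"], state["user_input"])}

graph = StateGraph(SupportState)
graph.add_node("classify", classify_node)
graph.add_node("retrieve", retrieve_node)
graph.add_node("respond", respond_node)
graph.set_entry_point("classify")
graph.add_edge("classify", "retrieve")
graph.add_edge("retrieve", "respond")
graph.add_edge("respond", END)

app = graph.compile()
# final_state = app.invoke({"user_input": "I can't log into my account"})
```
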
### Technique 8: Negative Instructions

**Effect**: Edge case errors -30-50%

```python
prompt = """Generate a customer support response.

DO:
- Be concise (2-3 sentences)
- Acknowledge the customer's concern
- Provide actionable next steps

DO NOT:
- Apologize excessively (one apology maximum)
- Make promises you can't keep (e.g., "immediate resolution")
- Use technical jargon without explanation
- Provide information not in the context
- Generate placeholder text like "XXX" or "[insert here]"

Customer question: {question}
Context: {context}

Response:"""

# This is a template string: fill it with prompt.format(question=..., context=...)
```

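An optional guard that pairs with the DO NOT list above: it checks generated text for the banned placeholder patterns and retries once if they appear. The pattern list and retry policy are illustrative assumptions, not part of the original skill.

```python
import re

BANNED_PATTERNS = [r"\bXXX\b", r"\[insert here\]"]

def violates_negative_instructions(text: str) -> bool:
    """Return True if the response contains placeholder text the prompt forbids."""
    return any(re.search(p, text, flags=re.IGNORECASE) for p in BANNED_PATTERNS)

def generate_with_guard(llm, filled_prompt: str, max_retries: int = 1) -> str:
    """Generate a response and retry once if a banned pattern slips through."""
    text = ""
    for _ in range(max_retries + 1):
        text = llm.invoke(filled_prompt).content
        if not violates_negative_instructions(text):
            return text
    return text  # last attempt, even if imperfect
```
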
### Technique 9: Self-Consistency

**Effect**: Accuracy +10-20% for complex reasoning, Cost +200-300%

Generate multiple reasoning paths and use majority voting:

```python
from collections import Counter

from langchain_anthropic import ChatAnthropic
from langchain_core.messages import HumanMessage

def self_consistency_reasoning(question: str, num_samples: int = 5) -> str:
    """Generate multiple reasoning paths and select the most consistent answer"""

    llm = ChatAnthropic(
        model="claude-3-5-sonnet-20241022",
        temperature=0.7  # Higher temperature for diversity
    )

    prompt = f"""Question: {question}

Think through this step by step and provide your reasoning:

Reasoning:"""

    # Generate multiple reasoning paths
    responses = []
    for _ in range(num_samples):
        response = llm.invoke([HumanMessage(content=prompt)])
        responses.append(response.content)

    # Extract the most consistent answer (simplified)
    # In practice, extract the final answer from each response and use majority voting
    final_answers = [extract_final_answer(r) for r in responses]
    most_common = Counter(final_answers).most_common(1)[0][0]

    return most_common

# Trade-offs:
# - Accuracy: +10-20%
# - Cost: +200-300% (5x API calls)
# - Latency: +200-300% (if not parallelized)
# Use: Critical decisions only
```

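The snippet above relies on an `extract_final_answer` helper that this document does not define. A minimal, hypothetical sketch follows, assuming each reasoning trace ends with a line like `Answer: ...`; adjust the pattern to the actual output format.

```python
import re

def extract_final_answer(reasoning: str) -> str:
    """Pull the final answer out of a reasoning trace (assumed 'Answer: ...' line)."""
    match = re.search(r"Answer:\s*(.+)", reasoning, flags=re.IGNORECASE)
    if match:
        return match.group(1).strip().lower()
    # Fallback: use the last non-empty line so the vote still has something to count
    lines = [line.strip() for line in reasoning.splitlines() if line.strip()]
    return lines[-1].lower() if lines else ""
```
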
### Technique 10: Model Selection

**Model Selection Based on Task Complexity**:

| Task Type | Recommended Model | Reason |
|-----------|------------------|--------|
| Simple classification | Claude 3.5 Haiku | Fast, low cost, sufficient accuracy |
| Complex reasoning | Claude 3.5 Sonnet | Balanced performance |
| Highly complex tasks | Claude Opus | Best performance (high cost) |

```python
from langchain_anthropic import ChatAnthropic

# Select optimal model per task
class LLMSelector:
    def __init__(self):
        self.haiku = ChatAnthropic(model="claude-3-5-haiku-20241022")
        self.sonnet = ChatAnthropic(model="claude-3-5-sonnet-20241022")
        self.opus = ChatAnthropic(model="claude-3-opus-20240229")

    def get_llm(self, task_complexity: str):
        if task_complexity == "simple":
            return self.haiku  # ~$0.001/req
        elif task_complexity == "complex":
            return self.sonnet  # ~$0.005/req
        else:  # very_complex
            return self.opus  # ~$0.015/req

# Usage example
selector = LLMSelector()

# Simple intent classification → Haiku
intent_llm = selector.get_llm("simple")

# Complex response generation → Sonnet
response_llm = selector.get_llm("complex")
```

**Hybrid Approach**:
```python
def hybrid_classification(user_input: str) -> dict:
    """Try Haiku first; use Sonnet if confidence is low"""

    # Step 1: Classify with Haiku
    haiku_result = classify_with_haiku(user_input)

    if haiku_result["confidence"] >= 0.8:
        # High confidence → Use Haiku result
        return haiku_result
    else:
        # Low confidence → Re-classify with Sonnet
        sonnet_result = classify_with_sonnet(user_input)
        return sonnet_result

# Effects:
# - 80% of cases use Haiku (low cost)
# - 20% of cases use Sonnet (high accuracy)
# - Average cost: -60%
# - Average accuracy: -2% (within an acceptable range)
```

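The hybrid sketch assumes `classify_with_haiku` and `classify_with_sonnet` helpers. A minimal, hypothetical version of one of them follows, reusing the JSON-structured prompt and `parse_llm_json_output` parser from Technique 3.

```python
from langchain_anthropic import ChatAnthropic
from langchain_core.messages import HumanMessage, SystemMessage

CLASSIFY_PROMPT = """Classify the intent.

Output ONLY a valid JSON object:
{
  "intent": "<category>",
  "confidence": <0.0-1.0>,
  "reasoning": "<brief explanation in one sentence>"
}"""

haiku = ChatAnthropic(model="claude-3-5-haiku-20241022", temperature=0.3, max_tokens=200)

def classify_with_haiku(user_input: str) -> dict:
    """Cheap first pass; returns the parsed intent dict."""
    response = haiku.invoke([
        SystemMessage(content=CLASSIFY_PROMPT),
        HumanMessage(content=user_input),
    ])
    return parse_llm_json_output(response.content)

# classify_with_sonnet is identical except for model="claude-3-5-sonnet-20241022".
```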