# Prompt Optimization Techniques

A collection of practical techniques for effectively optimizing prompts in LangGraph nodes.

**💡 Tip**: For before/after prompt comparison examples and code templates, refer to [examples.md](examples.md#phase-3-iterative-improvement-examples).

## 🔧 Practical Optimization Techniques

### Technique 1: Few-Shot Examples

**Effect**: Accuracy +10-20%

**Before (Zero-shot)**:
```python
system_prompt = """Classify user input into: product_inquiry, technical_support, billing, or general."""

# Accuracy: ~70%
```

**After (Few-shot)**:
```python
system_prompt = """Classify user input into: product_inquiry, technical_support, billing, or general.

Examples:

Input: "How much does the premium plan cost?"
Output: product_inquiry

Input: "I can't log into my account"
Output: technical_support

Input: "Why was I charged twice this month?"
Output: billing

Input: "Hello, how are you today?"
Output: general

Input: "What features are included in the basic plan?"
Output: product_inquiry"""

# Accuracy: ~85-90%
```

**Best Practices**:
- **Number of Examples**: 3-7 (diminishing returns beyond this)
- **Diversity**: At least one from each category, including edge cases
- **Quality**: Select clear and unambiguous examples
- **Format**: Consistent Input/Output format (see the sketch below)

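The sketch below assembles such a few-shot prompt programmatically from a small, labeled example list. The helper name and example set are illustrative, not part of this skill.

```python
# Hypothetical helper: builds a few-shot system prompt from labeled examples.
# Keep the set small (3-7), diverse, and unambiguous, with a consistent format.
FEW_SHOT_EXAMPLES = [
    ("How much does the premium plan cost?", "product_inquiry"),
    ("I can't log into my account", "technical_support"),
    ("Why was I charged twice this month?", "billing"),
    ("Hello, how are you today?", "general"),
]

def build_few_shot_prompt(instruction: str, examples: list[tuple[str, str]]) -> str:
    """Render each example in a consistent Input/Output format."""
    lines = [instruction, "", "Examples:"]
    for text, label in examples:
        lines.append("")
        lines.append(f'Input: "{text}"')
        lines.append(f"Output: {label}")
    return "\n".join(lines)

system_prompt = build_few_shot_prompt(
    "Classify user input into: product_inquiry, technical_support, billing, or general.",
    FEW_SHOT_EXAMPLES,
)
```
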
### Technique 2: Chain-of-Thought

**Effect**: Accuracy +15-30% for complex reasoning tasks

**Before (Direct answer)**:
```python
prompt = f"""Question: {question}

Answer:"""

# Many incorrect answers for complex questions
```

**After (Chain-of-Thought)**:
```python
prompt = f"""Question: {question}

Think through this step by step:

1. First, identify the key information needed
2. Then, analyze the context for relevant details
3. Finally, formulate a clear answer

Reasoning:"""

# Logical answers even for complex questions
```

**Application Scenarios**:
- ✅ Tasks requiring multi-step reasoning
- ✅ Complex decision making
- ✅ Resolving contradictions
- ❌ Simple classification tasks (unnecessary overhead; see the routing sketch below)

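The routing sketch below applies the step-by-step template only to reasoning-heavy tasks and keeps the direct template for simple ones. The task-type labels are illustrative assumptions.

```python
# Illustrative routing: reserve Chain-of-Thought for multi-step reasoning tasks.
COT_TEMPLATE = """Question: {question}

Think through this step by step:

1. First, identify the key information needed
2. Then, analyze the context for relevant details
3. Finally, formulate a clear answer

Reasoning:"""

DIRECT_TEMPLATE = """Question: {question}

Answer:"""

def build_prompt(question: str, task_type: str) -> str:
    """Use CoT only where it pays for its extra tokens."""
    if task_type in ("multi_step_reasoning", "complex_decision", "contradiction_resolution"):
        return COT_TEMPLATE.format(question=question)
    return DIRECT_TEMPLATE.format(question=question)
```
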
### Technique 3: Output Format Structuring

**Effect**: Latency -10-20%, Parsing errors -90%

**Before (Free text)**:
```python
prompt = "Classify the intent and explain why."

# Output: "This looks like a technical support question because the user is having trouble logging in..."
# Problems: Hard to parse, verbose, inconsistent
```

**After (JSON structured)**:
```python
prompt = """Classify the intent.

Output ONLY a valid JSON object:
{
  "intent": "<category>",
  "confidence": <0.0-1.0>,
  "reasoning": "<brief explanation in one sentence>"
}

Example output:
{"intent": "technical_support", "confidence": 0.95, "reasoning": "User reports authentication issue"}"""

# Output: {"intent": "technical_support", "confidence": 0.95, "reasoning": "User reports authentication issue"}
# Benefits: Easy to parse, concise, consistent
```

**JSON Parsing Error Handling**:
```python
import json
import re

def parse_llm_json_output(output: str) -> dict:
    """Robustly parse LLM JSON output"""
    try:
        # Parse as JSON directly
        return json.loads(output)
    except json.JSONDecodeError:
        # Extract the JSON object only (from markdown code blocks, etc.);
        # note this simple pattern only matches flat objects without nested braces
        json_match = re.search(r'\{[^}]+\}', output)
        if json_match:
            try:
                return json.loads(json_match.group())
            except json.JSONDecodeError:
                pass

    # Fallback
    return {
        "intent": "general",
        "confidence": 0.5,
        "reasoning": "Failed to parse LLM output"
    }
```

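A brief usage sketch tying the structured prompt and the parser together; the ChatAnthropic setup mirrors the other snippets in this document and is illustrative.

```python
from langchain_anthropic import ChatAnthropic
from langchain_core.messages import HumanMessage, SystemMessage

llm = ChatAnthropic(model="claude-3-5-sonnet-20241022", temperature=0.3)

response = llm.invoke([
    SystemMessage(content=prompt),  # the JSON-structured prompt above
    HumanMessage(content="I can't log into my account"),
])

result = parse_llm_json_output(response.content)
# e.g. {"intent": "technical_support", "confidence": 0.95, "reasoning": "..."}
```
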
### Technique 4: Temperature and Max Tokens Adjustment

**Temperature Effects**:

| Task Type | Recommended Temperature | Reason |
|-----------|------------------------|--------|
| Classification/Extraction | 0.0 - 0.3 | Deterministic output desired |
| Summarization/Transformation | 0.3 - 0.5 | Some flexibility needed |
| Creative/Generation | 0.7 - 1.0 | Diversity and creativity important |

**Before (Default settings)**:
```python
from langchain_anthropic import ChatAnthropic

llm = ChatAnthropic(
    model="claude-3-5-sonnet-20241022",
    temperature=1.0  # Default, used for all tasks
)
# Unstable results for classification tasks
```

**After (Optimized per task)**:
```python
# Intent classification: Low temperature
intent_llm = ChatAnthropic(
    model="claude-3-5-sonnet-20241022",
    temperature=0.3  # Emphasize consistency
)

# Response generation: Medium temperature
response_llm = ChatAnthropic(
    model="claude-3-5-sonnet-20241022",
    temperature=0.5,  # Balance flexibility
    max_tokens=500  # Enforce conciseness
)
```

**Max Tokens Effects**:

```python
# Before: No limit
llm = ChatAnthropic(model="claude-3-5-sonnet-20241022")
# Average output: 800 tokens, Cost: $0.012/req, Latency: 3.2s

# After: Appropriate limit
llm = ChatAnthropic(
    model="claude-3-5-sonnet-20241022",
    max_tokens=500  # Necessary and sufficient length
)
# Average output: 450 tokens, Cost: $0.007/req (-42%), Latency: 1.8s (-44%)
```

### Technique 5: System Message vs Human Message Usage

**System Message**:
- **Use**: Role, guidelines, constraints
- **Characteristics**: Context applied to entire task
- **Caching**: Effective (doesn't change frequently)

**Human Message**:
- **Use**: Specific input, questions
- **Characteristics**: Changes per request
- **Caching**: Less effective

**Good Structure**:
```python
from langchain_core.messages import SystemMessage, HumanMessage

messages = [
    SystemMessage(content="""You are a customer support assistant.

Guidelines:
- Be concise: 2-3 sentences maximum
- Be empathetic: Acknowledge customer concerns
- Be actionable: Provide clear next steps

Response format:
1. Acknowledgment
2. Answer or solution
3. Next steps (if applicable)"""),

    HumanMessage(content=f"""Customer question: {user_input}

Context: {context}

Generate a helpful response:""")
]
```

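A short usage sketch, assuming a ChatAnthropic client as in the earlier snippets: the stable system message is reused across requests (and is a good caching candidate), while the human message changes every time.

```python
from langchain_anthropic import ChatAnthropic

llm = ChatAnthropic(model="claude-3-5-sonnet-20241022", temperature=0.5, max_tokens=500)

# Only the HumanMessage content varies per request; the SystemMessage stays fixed.
response = llm.invoke(messages)
print(response.content)
```
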
### Technique 6: Prompt Caching

**Effect**: Cost -50-90% (on cache hit)

Leverage Anthropic Claude's prompt caching:

```python
from anthropic import Anthropic

client = Anthropic()

# Large cacheable system prompt
CACHED_SYSTEM_PROMPT = """You are an expert customer support agent...

[Long guidelines, examples, and context - 1000+ tokens]

Examples:
[50 few-shot examples]
"""

# Use cache
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=500,
    system=[
        {
            "type": "text",
            "text": CACHED_SYSTEM_PROMPT,
            "cache_control": {"type": "ephemeral"}  # Enable caching
        }
    ],
    messages=[
        {"role": "user", "content": user_input}
    ]
)

# First time: Full cost
# 2nd+ time (within 5 minutes): Input tokens -90% discount
```

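A small follow-up sketch for confirming that the cache is actually hit. It assumes the Anthropic SDK exposes cache statistics as `usage.cache_creation_input_tokens` and `usage.cache_read_input_tokens`; verify the field names against the SDK version in use.

```python
# First call: tokens are written to the cache (cache_creation_input_tokens > 0).
# Later calls within the cache TTL: tokens are read back at the discounted rate
# (cache_read_input_tokens > 0).
usage = message.usage
print("cache_creation_input_tokens:", getattr(usage, "cache_creation_input_tokens", None))
print("cache_read_input_tokens:", getattr(usage, "cache_read_input_tokens", None))
```
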
**Caching Strategy**:
- ✅ Large system prompts (>1024 tokens)
- ✅ Sets of few-shot examples
- ✅ Long context (RAG documents)
- ❌ Frequently changing content
- ❌ Small prompts (<1024 tokens)

### Technique 7: Progressive Refinement

Break complex tasks into multiple steps:

**Before (1 step)**:
```python
# Execute everything in one node
prompt = f"""Analyze user input, retrieve relevant info, and generate response.

Input: {user_input}"""

# Problems: Too complex, low quality, hard to debug
```

**After (Multiple steps)**:
```python
# Step 1: Intent classification
intent = classify_intent(user_input)

# Step 2: Information retrieval (based on intent)
context = retrieve_context(intent, user_input)

# Step 3: Response generation (using intent and context)
response = generate_response(intent, context, user_input)

# Benefits: Each step optimizable, easy to debug, improved quality
```

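Since this document targets LangGraph nodes, here is a minimal sketch of the same three steps wired as a graph. The state fields and the `classify_intent` / `retrieve_context` / `generate_response` implementations are assumed to exist as in the snippet above.

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END

class SupportState(TypedDict, total=False):
    user_input: str
    intent: str
    context: str
    response: str

def classify_node(state: SupportState) -> dict:
    return {"intent": classify_intent(state["user_input"])}

def retrieve_node(state: SupportState) -> dict:
    return {"context": retrieve_context(state["intent"], state["user_input"])}

def respond_node(state: SupportState) -> dict:
    return {"response": generate_response(state["intent"], state["context"], state["user_input"])}

graph = StateGraph(SupportState)
graph.add_node("classify", classify_node)
graph.add_node("retrieve", retrieve_node)
graph.add_node("respond", respond_node)
graph.set_entry_point("classify")
graph.add_edge("classify", "retrieve")
graph.add_edge("retrieve", "respond")
graph.add_edge("respond", END)

app = graph.compile()
# final_state = app.invoke({"user_input": "I can't log into my account"})
```
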
### Technique 8: Negative Instructions

**Effect**: Edge case errors -30-50%

```python
prompt = """Generate a customer support response.

DO:
- Be concise (2-3 sentences)
- Acknowledge the customer's concern
- Provide actionable next steps

DO NOT:
- Apologize excessively (one apology maximum)
- Make promises you can't keep (e.g., "immediate resolution")
- Use technical jargon without explanation
- Provide information not in the context
- Generate placeholder text like "XXX" or "[insert here]"

Customer question: {question}
Context: {context}

Response:"""

# This is a template string: fill it with prompt.format(question=..., context=...)
```

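An optional guard that pairs with the DO NOT list above: it checks generated text for the banned placeholder patterns and retries once if they appear. The pattern list and retry policy are illustrative assumptions, not part of the original skill.

```python
import re

BANNED_PATTERNS = [r"\bXXX\b", r"\[insert here\]"]

def violates_negative_instructions(text: str) -> bool:
    """Return True if the response contains placeholder text the prompt forbids."""
    return any(re.search(p, text, flags=re.IGNORECASE) for p in BANNED_PATTERNS)

def generate_with_guard(llm, filled_prompt: str, max_retries: int = 1) -> str:
    """Generate a response and retry once if a banned pattern slips through."""
    text = ""
    for _ in range(max_retries + 1):
        text = llm.invoke(filled_prompt).content
        if not violates_negative_instructions(text):
            return text
    return text  # last attempt, even if imperfect
```
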
### Technique 9: Self-Consistency

**Effect**: Accuracy +10-20% for complex reasoning, Cost +200-300%

Generate multiple reasoning paths and use majority voting:

```python
from collections import Counter

from langchain_anthropic import ChatAnthropic
from langchain_core.messages import HumanMessage

def self_consistency_reasoning(question: str, num_samples: int = 5) -> str:
    """Generate multiple reasoning paths and select the most consistent answer"""

    llm = ChatAnthropic(
        model="claude-3-5-sonnet-20241022",
        temperature=0.7  # Higher temperature for diversity
    )

    prompt = f"""Question: {question}

Think through this step by step and provide your reasoning:

Reasoning:"""

    # Generate multiple reasoning paths
    responses = []
    for _ in range(num_samples):
        response = llm.invoke([HumanMessage(content=prompt)])
        responses.append(response.content)

    # Extract the most consistent answer (simplified)
    # In practice, extract the final answer from each response and use majority voting
    final_answers = [extract_final_answer(r) for r in responses]
    most_common = Counter(final_answers).most_common(1)[0][0]

    return most_common

# Trade-offs:
# - Accuracy: +10-20%
# - Cost: +200-300% (5x API calls)
# - Latency: +200-300% (if not parallelized)
# Use: Critical decisions only
```

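The snippet above relies on an `extract_final_answer` helper that this document does not define. A minimal, hypothetical sketch follows, assuming each reasoning trace ends with a line like `Answer: ...`; adjust the pattern to the actual output format.

```python
import re

def extract_final_answer(reasoning: str) -> str:
    """Pull the final answer out of a reasoning trace (assumed 'Answer: ...' line)."""
    match = re.search(r"Answer:\s*(.+)", reasoning, flags=re.IGNORECASE)
    if match:
        return match.group(1).strip().lower()
    # Fallback: use the last non-empty line so the vote still has something to count
    lines = [line.strip() for line in reasoning.splitlines() if line.strip()]
    return lines[-1].lower() if lines else ""
```
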
### Technique 10: Model Selection

**Model Selection Based on Task Complexity**:

| Task Type | Recommended Model | Reason |
|-----------|------------------|--------|
| Simple classification | Claude 3.5 Haiku | Fast, low cost, sufficient accuracy |
| Complex reasoning | Claude 3.5 Sonnet | Balanced performance |
| Highly complex tasks | Claude Opus | Best performance (high cost) |

```python
from langchain_anthropic import ChatAnthropic

# Select optimal model per task
class LLMSelector:
    def __init__(self):
        self.haiku = ChatAnthropic(model="claude-3-5-haiku-20241022")
        self.sonnet = ChatAnthropic(model="claude-3-5-sonnet-20241022")
        self.opus = ChatAnthropic(model="claude-3-opus-20240229")

    def get_llm(self, task_complexity: str):
        if task_complexity == "simple":
            return self.haiku  # ~$0.001/req
        elif task_complexity == "complex":
            return self.sonnet  # ~$0.005/req
        else:  # very_complex
            return self.opus  # ~$0.015/req

# Usage example
selector = LLMSelector()

# Simple intent classification → Haiku
intent_llm = selector.get_llm("simple")

# Complex response generation → Sonnet
response_llm = selector.get_llm("complex")
```

**Hybrid Approach**:
```python
def hybrid_classification(user_input: str) -> dict:
    """Try Haiku first; use Sonnet if confidence is low"""

    # Step 1: Classify with Haiku
    haiku_result = classify_with_haiku(user_input)

    if haiku_result["confidence"] >= 0.8:
        # High confidence → Use Haiku result
        return haiku_result
    else:
        # Low confidence → Re-classify with Sonnet
        sonnet_result = classify_with_sonnet(user_input)
        return sonnet_result

# Effects:
# - 80% of cases use Haiku (low cost)
# - 20% of cases use Sonnet (high accuracy)
# - Average cost: -60%
# - Average accuracy: -2% (within an acceptable range)
```

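The hybrid sketch assumes `classify_with_haiku` and `classify_with_sonnet` helpers. A minimal, hypothetical version of one of them follows, reusing the JSON-structured prompt and `parse_llm_json_output` parser from Technique 3.

```python
from langchain_anthropic import ChatAnthropic
from langchain_core.messages import HumanMessage, SystemMessage

CLASSIFY_PROMPT = """Classify the intent.

Output ONLY a valid JSON object:
{
  "intent": "<category>",
  "confidence": <0.0-1.0>,
  "reasoning": "<brief explanation in one sentence>"
}"""

haiku = ChatAnthropic(model="claude-3-5-haiku-20241022", temperature=0.3, max_tokens=200)

def classify_with_haiku(user_input: str) -> dict:
    """Cheap first pass; returns the parsed intent dict."""
    response = haiku.invoke([
        SystemMessage(content=CLASSIFY_PROMPT),
        HumanMessage(content=user_input),
    ])
    return parse_llm_json_output(response.content)

# classify_with_sonnet is identical except for model="claude-3-5-sonnet-20241022".
```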