# Prompt Optimization Techniques
A collection of practical techniques for effectively optimizing prompts in LangGraph nodes.
**💡 Tip**: For before/after prompt comparison examples and code templates, refer to [examples.md](examples.md#phase-3-iterative-improvement-examples).
## 🔧 Practical Optimization Techniques
### Technique 1: Few-Shot Examples
**Effect**: Accuracy +10-20%
**Before (Zero-shot)**:
```python
system_prompt = """Classify user input into: product_inquiry, technical_support, billing, or general."""
# Accuracy: ~70%
```
**After (Few-shot)**:
```python
system_prompt = """Classify user input into: product_inquiry, technical_support, billing, or general.
Examples:
Input: "How much does the premium plan cost?"
Output: product_inquiry
Input: "I can't log into my account"
Output: technical_support
Input: "Why was I charged twice this month?"
Output: billing
Input: "Hello, how are you today?"
Output: general
Input: "What features are included in the basic plan?"
Output: product_inquiry"""
# Accuracy: ~85-90%
```
**Best Practices**:
- **Number of Examples**: 3-7 (diminishing returns beyond this)
- **Diversity**: At least one from each category, including edge cases
- **Quality**: Select clear and unambiguous examples
- **Format**: Consistent Input/Output format
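Applying these practices, here is a minimal sketch of a prompt builder; the `build_few_shot_prompt` helper and the example list are hypothetical, not from any library, but they keep the Input/Output format consistent and make it easy to add edge-case examples:
```python
# Hypothetical helper: assembles a few-shot system prompt with a consistent Input/Output format
FEW_SHOT_EXAMPLES = [
    ("How much does the premium plan cost?", "product_inquiry"),
    ("I can't log into my account", "technical_support"),
    ("Why was I charged twice this month?", "billing"),
    ("Hello, how are you today?", "general"),
]

def build_few_shot_prompt(task_instruction: str, examples: list[tuple[str, str]]) -> str:
    """Assemble a few-shot system prompt from (input, label) pairs."""
    lines = [task_instruction, "", "Examples:"]
    for text, label in examples:
        lines.append(f'Input: "{text}"')
        lines.append(f"Output: {label}")
        lines.append("")
    return "\n".join(lines).rstrip()

system_prompt = build_few_shot_prompt(
    "Classify user input into: product_inquiry, technical_support, billing, or general.",
    FEW_SHOT_EXAMPLES,
)
```
Keeping the examples in plain data makes it easy to add, remove, or audit edge cases without editing the prompt text itself.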
### Technique 2: Chain-of-Thought
**Effect**: Accuracy +15-30% for complex reasoning tasks
**Before (Direct answer)**:
```python
prompt = f"""Question: {question}
Answer:"""
# Many incorrect answers for complex questions
```
**After (Chain-of-Thought)**:
```python
prompt = f"""Question: {question}
Think through this step by step:
1. First, identify the key information needed
2. Then, analyze the context for relevant details
3. Finally, formulate a clear answer
Reasoning:"""
# Logical answers even for complex questions
```
**Application Scenarios**:
- ✅ Tasks requiring multi-step reasoning
- ✅ Complex decision making
- ✅ Resolving contradictions
- ❌ Simple classification tasks (overhead)
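For tasks where CoT applies, the reasoning usually needs to be separated from the final answer before it reaches downstream nodes. A minimal sketch, assuming the model is asked to close with a "Final answer:" line; that convention and the `answer_with_cot` wrapper are assumptions added here, not part of the original prompt:
```python
from langchain_anthropic import ChatAnthropic
from langchain_core.messages import HumanMessage

llm = ChatAnthropic(model="claude-3-5-sonnet-20241022", temperature=0.3)

def answer_with_cot(question: str) -> str:
    """Run a Chain-of-Thought prompt and return only the final answer line."""
    prompt = f"""Question: {question}
Think through this step by step:
1. First, identify the key information needed
2. Then, analyze the context for relevant details
3. Finally, formulate a clear answer
Reasoning (end with a line starting with "Final answer:"):"""
    response = llm.invoke([HumanMessage(content=prompt)])
    # Keep the full reasoning for logging, but return only the answer line
    for line in reversed(response.content.splitlines()):
        if line.strip().lower().startswith("final answer:"):
            return line.split(":", 1)[1].strip()
    return response.content.strip()  # Fallback: return the full reasoning
```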
### Technique 3: Output Format Structuring
**Effect**: Latency -10-20%, Parsing errors -90%
**Before (Free text)**:
```python
prompt = "Classify the intent and explain why."
# Output: "This looks like a technical support question because the user is having trouble logging in..."
# Problems: Hard to parse, verbose, inconsistent
```
**After (JSON structured)**:
```python
prompt = """Classify the intent.
Output ONLY a valid JSON object:
{
  "intent": "<category>",
  "confidence": <0.0-1.0>,
  "reasoning": "<brief explanation in one sentence>"
}
Example output:
{"intent": "technical_support", "confidence": 0.95, "reasoning": "User reports authentication issue"}"""
# Output: {"intent": "technical_support", "confidence": 0.95, "reasoning": "User reports authentication issue"}
# Benefits: Easy to parse, concise, consistent
```
**JSON Parsing Error Handling**:
```python
import json
import re
def parse_llm_json_output(output: str) -> dict:
    """Robustly parse LLM JSON output"""
    try:
        # Parse as JSON directly
        return json.loads(output)
    except json.JSONDecodeError:
        # Extract JSON only (from markdown code blocks, etc.)
        json_match = re.search(r'\{[^}]+\}', output)
        if json_match:
            try:
                return json.loads(json_match.group())
            except json.JSONDecodeError:
                pass
    # Fallback
    return {
        "intent": "general",
        "confidence": 0.5,
        "reasoning": "Failed to parse LLM output"
    }
```
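A usage sketch that ties the structured prompt and the parser together; the model call and settings here are illustrative assumptions, with `prompt` referring to the JSON-structured prompt defined above:
```python
from langchain_anthropic import ChatAnthropic
from langchain_core.messages import SystemMessage, HumanMessage

llm = ChatAnthropic(model="claude-3-5-sonnet-20241022", temperature=0.3, max_tokens=200)

# `prompt` is the JSON-structured system prompt defined above
raw_output = llm.invoke([
    SystemMessage(content=prompt),
    HumanMessage(content="I can't log into my account"),
]).content

result = parse_llm_json_output(raw_output)
# e.g. {"intent": "technical_support", "confidence": 0.95, "reasoning": "..."}
```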
### Technique 4: Temperature and Max Tokens Adjustment
**Temperature Effects**:
| Task Type | Recommended Temperature | Reason |
|-----------|------------------------|--------|
| Classification/Extraction | 0.0 - 0.3 | Deterministic output desired |
| Summarization/Transformation | 0.3 - 0.5 | Some flexibility needed |
| Creative/Generation | 0.7 - 1.0 | Diversity and creativity important |
**Before (Default settings)**:
```python
from langchain_anthropic import ChatAnthropic

llm = ChatAnthropic(
    model="claude-3-5-sonnet-20241022",
    temperature=1.0  # Default, used for all tasks
)
# Unstable results for classification tasks
```
**After (Optimized per task)**:
```python
# Intent classification: Low temperature
intent_llm = ChatAnthropic(
    model="claude-3-5-sonnet-20241022",
    temperature=0.3  # Emphasize consistency
)

# Response generation: Medium temperature
response_llm = ChatAnthropic(
    model="claude-3-5-sonnet-20241022",
    temperature=0.5,  # Balance flexibility
    max_tokens=500  # Enforce conciseness
)
```
**Max Tokens Effects**:
```python
# Before: No limit
llm = ChatAnthropic(model="claude-3-5-sonnet-20241022")
# Average output: 800 tokens, Cost: $0.012/req, Latency: 3.2s
# After: Appropriate limit
llm = ChatAnthropic(
    model="claude-3-5-sonnet-20241022",
    max_tokens=500  # Necessary and sufficient length
)
# Average output: 450 tokens, Cost: $0.007/req (-42%), Latency: 1.8s (-44%)
```
### Technique 5: System Message vs Human Message Usage
**System Message**:
- **Use**: Role, guidelines, constraints
- **Characteristics**: Context applied to entire task
- **Caching**: Effective (doesn't change frequently)
**Human Message**:
- **Use**: Specific input, questions
- **Characteristics**: Changes per request
- **Caching**: Less effective
**Good Structure**:
```python
from langchain_core.messages import SystemMessage, HumanMessage

messages = [
    SystemMessage(content="""You are a customer support assistant.

Guidelines:
- Be concise: 2-3 sentences maximum
- Be empathetic: Acknowledge customer concerns
- Be actionable: Provide clear next steps

Response format:
1. Acknowledgment
2. Answer or solution
3. Next steps (if applicable)"""),
    HumanMessage(content=f"""Customer question: {user_input}

Context: {context}

Generate a helpful response:"""),
]
```
### Technique 6: Prompt Caching
**Effect**: Cost -50-90% (on cache hit)
Leverage Anthropic Claude's prompt caching:
```python
from anthropic import Anthropic
client = Anthropic()
# Large cacheable system prompt
CACHED_SYSTEM_PROMPT = """You are an expert customer support agent...
[Long guidelines, examples, and context - 1000+ tokens]
Examples:
[50 few-shot examples]
"""
# Use cache
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=500,
    system=[
        {
            "type": "text",
            "text": CACHED_SYSTEM_PROMPT,
            "cache_control": {"type": "ephemeral"}  # Enable caching
        }
    ],
    messages=[
        {"role": "user", "content": user_input}
    ]
)
# First time: Full cost
# 2nd+ time (within 5 minutes): Input tokens -90% discount
```
**Caching Strategy**:
- ✅ Large system prompts (>1024 tokens)
- ✅ Sets of few-shot examples
- ✅ Long context (RAG documents)
- ❌ Frequently changing content
- ❌ Small prompts (<1024 tokens)
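When calling Claude through LangChain instead of the raw SDK, the same `cache_control` block can be attached to a content block on the system message. This is a sketch reusing `CACHED_SYSTEM_PROMPT` from above, and it assumes a recent `langchain-anthropic` version that forwards `cache_control` to the API:
```python
from langchain_anthropic import ChatAnthropic
from langchain_core.messages import SystemMessage, HumanMessage

llm = ChatAnthropic(model="claude-3-5-sonnet-20241022", max_tokens=500)

messages = [
    SystemMessage(content=[
        {
            "type": "text",
            "text": CACHED_SYSTEM_PROMPT,
            "cache_control": {"type": "ephemeral"},  # Assumed to be forwarded to the Anthropic API
        }
    ]),
    HumanMessage(content=user_input),
]
response = llm.invoke(messages)
```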
### Technique 7: Progressive Refinement
Break complex tasks into multiple steps:
**Before (1 step)**:
```python
# Execute everything in one node
prompt = f"""Analyze user input, retrieve relevant info, and generate response.
Input: {user_input}"""
# Problems: Too complex, low quality, hard to debug
```
**After (Multiple steps)**:
```python
# Step 1: Intent classification
intent = classify_intent(user_input)
# Step 2: Information retrieval (based on intent)
context = retrieve_context(intent, user_input)
# Step 3: Response generation (using intent and context)
response = generate_response(intent, context, user_input)
# Benefits: Each step optimizable, easy to debug, improved quality
```
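In LangGraph terms, each step maps naturally onto its own node. A sketch of the wiring, assuming the `classify_intent`, `retrieve_context`, and `generate_response` helpers above exist and a `langgraph` version that exports `START`/`END`:
```python
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class PipelineState(TypedDict, total=False):
    user_input: str
    intent: str
    context: str
    response: str

# Each node wraps one of the steps above; the helper bodies are placeholders here
def classify_node(state: PipelineState) -> dict:
    return {"intent": classify_intent(state["user_input"])}

def retrieve_node(state: PipelineState) -> dict:
    return {"context": retrieve_context(state["intent"], state["user_input"])}

def respond_node(state: PipelineState) -> dict:
    return {"response": generate_response(state["intent"], state["context"], state["user_input"])}

graph = StateGraph(PipelineState)
graph.add_node("classify", classify_node)
graph.add_node("retrieve", retrieve_node)
graph.add_node("respond", respond_node)
graph.add_edge(START, "classify")
graph.add_edge("classify", "retrieve")
graph.add_edge("retrieve", "respond")
graph.add_edge("respond", END)
app = graph.compile()
```
Because each node reads from and writes to a shared state, each prompt can be optimized and evaluated in isolation.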
### Technique 8: Negative Instructions
**Effect**: Edge case errors -30-50%
```python
prompt = """Generate a customer support response.
DO:
- Be concise (2-3 sentences)
- Acknowledge the customer's concern
- Provide actionable next steps
DO NOT:
- Apologize excessively (one apology maximum)
- Make promises you can't keep (e.g., "immediate resolution")
- Use technical jargon without explanation
- Provide information not in the context
- Generate placeholder text like "XXX" or "[insert here]"
Customer question: {question}
Context: {context}
Response:"""
```
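Some of the DO NOT rules can be checked mechanically after generation, which makes it possible to retry or fall back when the model ignores them. A heuristic sketch; the `violates_negative_instructions` helper is hypothetical:
```python
import re

def violates_negative_instructions(response: str) -> list[str]:
    """Heuristic post-check for the mechanically verifiable DO NOT rules."""
    violations = []
    if re.search(r"\bXXX\b|\[insert here\]", response, re.IGNORECASE):
        violations.append("contains placeholder text")
    if len(re.findall(r"\b(sorry|apologi[sz]e)\b", response, re.IGNORECASE)) > 1:
        violations.append("apologizes more than once")
    if len(re.split(r"(?<=[.!?])\s+", response.strip())) > 3:
        violations.append("longer than 3 sentences")
    return violations

# If any violations are found, retry the generation or fall back to a template
```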
### Technique 9: Self-Consistency
**Effect**: Accuracy +10-20% for complex reasoning, Cost +200-300%
Generate multiple reasoning paths and use majority voting:
```python
from collections import Counter

from langchain_anthropic import ChatAnthropic
from langchain_core.messages import HumanMessage

def extract_final_answer(text: str) -> str:
    """Simple placeholder: treat the last non-empty line as the final answer."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return lines[-1] if lines else ""

def self_consistency_reasoning(question: str, num_samples: int = 5) -> str:
    """Generate multiple reasoning paths and select the most consistent answer"""
    llm = ChatAnthropic(
        model="claude-3-5-sonnet-20241022",
        temperature=0.7  # Higher temperature for diversity
    )
    prompt = f"""Question: {question}
Think through this step by step and provide your reasoning:
Reasoning:"""
    # Generate multiple reasoning paths
    responses = []
    for _ in range(num_samples):
        response = llm.invoke([HumanMessage(content=prompt)])
        responses.append(response.content)
    # Extract the final answer from each reasoning path and use majority voting
    final_answers = [extract_final_answer(r) for r in responses]
    most_common = Counter(final_answers).most_common(1)[0][0]
    return most_common
# Trade-offs:
# - Accuracy: +10-20%
# - Cost: +200-300% (5x API calls)
# - Latency: +200-300% (if not parallelized)
# Use: Critical decisions only
```
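The latency overhead can be reduced by generating the samples concurrently. A sketch reusing the imports and the `extract_final_answer` helper from the block above; `.batch()` on a LangChain chat model fans the requests out in parallel:
```python
def self_consistency_parallel(question: str, num_samples: int = 5) -> str:
    """Same voting scheme, but the samples are generated concurrently via .batch()."""
    llm = ChatAnthropic(model="claude-3-5-sonnet-20241022", temperature=0.7)
    prompt = f"""Question: {question}
Think through this step by step and provide your reasoning:
Reasoning:"""
    # .batch() issues the requests concurrently, so wall-clock latency is close to
    # a single call instead of num_samples sequential calls
    responses = llm.batch([[HumanMessage(content=prompt)]] * num_samples)
    final_answers = [extract_final_answer(r.content) for r in responses]
    return Counter(final_answers).most_common(1)[0][0]
```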
### Technique 10: Model Selection
**Model Selection Based on Task Complexity**:
| Task Type | Recommended Model | Reason |
|-----------|------------------|--------|
| Simple classification | Claude 3.5 Haiku | Fast, low cost, sufficient accuracy |
| Complex reasoning | Claude 3.5 Sonnet | Balanced performance |
| Highly complex tasks | Claude Opus | Best performance (high cost) |
```python
# Select optimal model per task
class LLMSelector:
    def __init__(self):
        self.haiku = ChatAnthropic(model="claude-3-5-haiku-20241022")
        self.sonnet = ChatAnthropic(model="claude-3-5-sonnet-20241022")
        self.opus = ChatAnthropic(model="claude-3-opus-20240229")

    def get_llm(self, task_complexity: str):
        if task_complexity == "simple":
            return self.haiku  # ~$0.001/req
        elif task_complexity == "complex":
            return self.sonnet  # ~$0.005/req
        else:  # very_complex
            return self.opus  # ~$0.015/req

# Usage example
selector = LLMSelector()

# Simple intent classification → Haiku
intent_llm = selector.get_llm("simple")

# Complex response generation → Sonnet
response_llm = selector.get_llm("complex")
```
**Hybrid Approach**:
```python
def hybrid_classification(user_input: str) -> dict:
    """Try Haiku first, use Sonnet if confidence is low"""
    # Step 1: Classify with Haiku
    haiku_result = classify_with_haiku(user_input)
    if haiku_result["confidence"] >= 0.8:
        # High confidence → Use Haiku result
        return haiku_result
    else:
        # Low confidence → Re-classify with Sonnet
        sonnet_result = classify_with_sonnet(user_input)
        return sonnet_result
# Effects:
# - 80% of cases use Haiku (low cost)
# - 20% of cases use Sonnet (high accuracy)
# - Average cost: -60%
# - Average accuracy: -2% (acceptable range)
```
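One possible implementation of the `classify_with_haiku` / `classify_with_sonnet` helpers referenced above, reusing `parse_llm_json_output` from Technique 3; the classification prompt and temperature settings are assumptions:
```python
from langchain_anthropic import ChatAnthropic
from langchain_core.messages import SystemMessage, HumanMessage

# Assumed classification prompt; adapt the categories to your own graph
CLASSIFY_PROMPT = """Classify the intent into: product_inquiry, technical_support, billing, or general.
Output ONLY a valid JSON object:
{"intent": "<category>", "confidence": <0.0-1.0>, "reasoning": "<one sentence>"}"""

def _classify(llm: ChatAnthropic, user_input: str) -> dict:
    raw = llm.invoke([
        SystemMessage(content=CLASSIFY_PROMPT),
        HumanMessage(content=user_input),
    ]).content
    return parse_llm_json_output(raw)  # Parser from Technique 3

def classify_with_haiku(user_input: str) -> dict:
    return _classify(ChatAnthropic(model="claude-3-5-haiku-20241022", temperature=0.0), user_input)

def classify_with_sonnet(user_input: str) -> dict:
    return _classify(ChatAnthropic(model="claude-3-5-sonnet-20241022", temperature=0.0), user_input)
```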