Prompt Optimization Techniques

A collection of practical techniques for effectively optimizing prompts in LangGraph nodes.

💡 Tip: For before/after prompt comparison examples and code templates, refer to examples.md.

🔧 Practical Optimization Techniques

Technique 1: Few-Shot Examples

Effect: Accuracy +10-20%

Before (Zero-shot):

system_prompt = """Classify user input into: product_inquiry, technical_support, billing, or general."""

# Accuracy: ~70%

After (Few-shot):

system_prompt = """Classify user input into: product_inquiry, technical_support, billing, or general.

Examples:

Input: "How much does the premium plan cost?"
Output: product_inquiry

Input: "I can't log into my account"
Output: technical_support

Input: "Why was I charged twice this month?"
Output: billing

Input: "Hello, how are you today?"
Output: general

Input: "What features are included in the basic plan?"
Output: product_inquiry"""

# Accuracy: ~85-90%

Best Practices:

  • Number of Examples: 3-7 (diminishing returns beyond this)
  • Diversity: At least one from each category, including edge cases
  • Quality: Select clear and unambiguous examples
  • Format: Consistent Input/Output format
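
A minimal sketch of wiring the few-shot prompt into a classification call. It assumes langchain_anthropic is installed and that system_prompt is the few-shot prompt defined above; the model name and max_tokens value are illustrative choices, not requirements.

from langchain_anthropic import ChatAnthropic
from langchain_core.messages import SystemMessage, HumanMessage

# Low temperature: classification should be deterministic (see Technique 4)
llm = ChatAnthropic(model="claude-3-5-haiku-20241022", temperature=0.0, max_tokens=10)

def classify_intent(user_input: str) -> str:
    """Classify user input with the few-shot system prompt defined above."""
    response = llm.invoke([
        SystemMessage(content=system_prompt),
        HumanMessage(content=f"Input: {user_input}\nOutput:"),
    ])
    return response.content.strip()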

Technique 2: Chain-of-Thought

Effect: Accuracy +15-30% for complex reasoning tasks

Before (Direct answer):

prompt = f"""Question: {question}

Answer:"""

# Many incorrect answers for complex questions

After (Chain-of-Thought):

prompt = f"""Question: {question}

Think through this step by step:

1. First, identify the key information needed
2. Then, analyze the context for relevant details
3. Finally, formulate a clear answer

Reasoning:"""

# Logical answers even for complex questions

Application Scenarios:

  • Tasks requiring multi-step reasoning
  • Complex decision making
  • Resolving contradictions

Not suited for simple classification tasks, where the extra reasoning adds cost and latency without improving accuracy.
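
One common way to wire this in is a two-pass call: run the Chain-of-Thought prompt first, then ask for the answer alone in a follow-up turn so downstream nodes don't have to parse the reasoning. A minimal sketch, assuming langchain_anthropic; the two-pass structure and function name are illustrative.

from langchain_anthropic import ChatAnthropic
from langchain_core.messages import HumanMessage, AIMessage

llm = ChatAnthropic(model="claude-3-5-sonnet-20241022", temperature=0.3)

def answer_with_cot(question: str) -> str:
    cot_prompt = f"""Question: {question}

Think through this step by step:

1. First, identify the key information needed
2. Then, analyze the context for relevant details
3. Finally, formulate a clear answer

Reasoning:"""
    # Pass 1: elicit the step-by-step reasoning
    reasoning = llm.invoke([HumanMessage(content=cot_prompt)])
    # Pass 2: keep the reasoning in the conversation and ask for the answer only
    final = llm.invoke([
        HumanMessage(content=cot_prompt),
        AIMessage(content=reasoning.content),
        HumanMessage(content="Based on your reasoning above, give only the final answer."),
    ])
    return final.content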

Technique 3: Output Format Structuring

Effect: Latency -10-20%, Parsing errors -90%

Before (Free text):

prompt = "Classify the intent and explain why."

# Output: "This looks like a technical support question because the user is having trouble logging in..."
# Problems: Hard to parse, verbose, inconsistent

After (JSON structured):

prompt = """Classify the intent.

Output ONLY a valid JSON object:
{
  "intent": "<category>",
  "confidence": <0.0-1.0>,
  "reasoning": "<brief explanation in one sentence>"
}

Example output:
{"intent": "technical_support", "confidence": 0.95, "reasoning": "User reports authentication issue"}"""

# Output: {"intent": "technical_support", "confidence": 0.95, "reasoning": "User reports authentication issue"}
# Benefits: Easy to parse, concise, consistent

JSON Parsing Error Handling:

import json
import re

def parse_llm_json_output(output: str) -> dict:
    """Robustly parse LLM JSON output"""
    try:
        # Parse as JSON directly
        return json.loads(output)
    except json.JSONDecodeError:
        # Extract JSON only (from markdown code blocks, etc.)
        json_match = re.search(r'\{[^}]+\}', output)
        if json_match:
            try:
                return json.loads(json_match.group())
            except json.JSONDecodeError:
                pass

        # Fallback
        return {
            "intent": "general",
            "confidence": 0.5,
            "reasoning": "Failed to parse LLM output"
        }
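
A short usage sketch combining the JSON-structured prompt with the parser above. It assumes prompt is the structured prompt shown earlier and user_input is the incoming request; the model configuration is illustrative.

from langchain_anthropic import ChatAnthropic
from langchain_core.messages import SystemMessage, HumanMessage

llm = ChatAnthropic(model="claude-3-5-sonnet-20241022", temperature=0.0, max_tokens=200)

raw = llm.invoke([
    SystemMessage(content=prompt),      # the JSON-structured prompt above
    HumanMessage(content=user_input),
]).content

result = parse_llm_json_output(raw)
intent = result["intent"]               # safe: the fallback always supplies these keys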

Technique 4: Temperature and Max Tokens Adjustment

Temperature Effects:

  • Classification/Extraction: 0.0 - 0.3 (deterministic output desired)
  • Summarization/Transformation: 0.3 - 0.5 (some flexibility needed)
  • Creative/Generation: 0.7 - 1.0 (diversity and creativity important)

Before (Default settings):

llm = ChatAnthropic(
    model="claude-3-5-sonnet-20241022",
    temperature=1.0  # Default, used for all tasks
)
# Unstable results for classification tasks

After (Optimized per task):

# Intent classification: Low temperature
intent_llm = ChatAnthropic(
    model="claude-3-5-sonnet-20241022",
    temperature=0.3  # Emphasize consistency
)

# Response generation: Medium temperature
response_llm = ChatAnthropic(
    model="claude-3-5-sonnet-20241022",
    temperature=0.5,  # Balance flexibility
    max_tokens=500    # Enforce conciseness
)

Max Tokens Effects:

# Before: No limit
llm = ChatAnthropic(model="claude-3-5-sonnet-20241022")
# Average output: 800 tokens, Cost: $0.012/req, Latency: 3.2s

# After: Appropriate limit
llm = ChatAnthropic(
    model="claude-3-5-sonnet-20241022",
    max_tokens=500  # Necessary and sufficient length
)
# Average output: 450 tokens, Cost: $0.007/req (-42%), Latency: 1.8s (-44%)

Technique 5: System Message vs Human Message Usage

System Message:

  • Use: Role, guidelines, constraints
  • Characteristics: Context applied to entire task
  • Caching: Effective (doesn't change frequently)

Human Message:

  • Use: Specific input, questions
  • Characteristics: Changes per request
  • Caching: Less effective

Good Structure:

messages = [
    SystemMessage(content="""You are a customer support assistant.

Guidelines:
- Be concise: 2-3 sentences maximum
- Be empathetic: Acknowledge customer concerns
- Be actionable: Provide clear next steps

Response format:
1. Acknowledgment
2. Answer or solution
3. Next steps (if applicable)"""),

    HumanMessage(content=f"""Customer question: {user_input}

Context: {context}

Generate a helpful response:""")
]
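
Invocation is the same as any other message list; a minimal sketch, assuming a langchain_anthropic client configured as in Technique 4:

from langchain_anthropic import ChatAnthropic

llm = ChatAnthropic(model="claude-3-5-sonnet-20241022", temperature=0.5, max_tokens=500)

# The SystemMessage is identical across requests; only the HumanMessage varies,
# which is what makes prompt caching (Technique 6) effective.
response = llm.invoke(messages)
print(response.content)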

Technique 6: Prompt Caching

Effect: Cost -50-90% (on cache hit)

Leverage Anthropic Claude's prompt caching:

from anthropic import Anthropic

client = Anthropic()

# Large cacheable system prompt
CACHED_SYSTEM_PROMPT = """You are an expert customer support agent...

[Long guidelines, examples, and context - 1000+ tokens]

Examples:
[50 few-shot examples]
"""

# Use cache
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=500,
    system=[
        {
            "type": "text",
            "text": CACHED_SYSTEM_PROMPT,
            "cache_control": {"type": "ephemeral"}  # Enable caching
        }
    ],
    messages=[
        {"role": "user", "content": user_input}
    ]
)

# First call: full cost
# Subsequent calls within the 5-minute cache TTL: cached input tokens ~90% cheaper
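
If the node calls the model through langchain_anthropic rather than the raw SDK, the same cache_control block can typically be attached as a content block on the SystemMessage. Support depends on the langchain-anthropic version, so treat this as a sketch rather than confirmed API for every release:

from langchain_anthropic import ChatAnthropic
from langchain_core.messages import SystemMessage, HumanMessage

llm = ChatAnthropic(model="claude-3-5-sonnet-20241022", max_tokens=500)

messages = [
    SystemMessage(content=[{
        "type": "text",
        "text": CACHED_SYSTEM_PROMPT,
        "cache_control": {"type": "ephemeral"},  # same flag as the SDK example above
    }]),
    HumanMessage(content=user_input),
]
response = llm.invoke(messages)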

Caching Strategy:

Cache:

  • Large system prompts (>1024 tokens)
  • Sets of few-shot examples
  • Long context (RAG documents)

Don't cache:

  • Frequently changing content
  • Small prompts (<1024 tokens, below the minimum cacheable size)

Technique 7: Progressive Refinement

Break complex tasks into multiple steps:

Before (1 step):

# Execute everything in one node
prompt = f"""Analyze user input, retrieve relevant info, and generate response.

Input: {user_input}"""

# Problems: Too complex, low quality, hard to debug

After (Multiple steps):

# Step 1: Intent classification
intent = classify_intent(user_input)

# Step 2: Information retrieval (based on intent)
context = retrieve_context(intent, user_input)

# Step 3: Response generation (using intent and context)
response = generate_response(intent, context, user_input)

# Benefits: Each step optimizable, easy to debug, improved quality
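
Because each step is its own function, the refactored pipeline maps directly onto a LangGraph graph. A minimal sketch; the state fields are illustrative, and classify_intent, retrieve_context, and generate_response are the (hypothetical) step functions from the example above.

from typing import TypedDict

from langgraph.graph import StateGraph, START, END

class SupportState(TypedDict):
    user_input: str
    intent: str
    context: str
    response: str

def classify_intent_node(state: SupportState) -> dict:
    return {"intent": classify_intent(state["user_input"])}

def retrieve_context_node(state: SupportState) -> dict:
    return {"context": retrieve_context(state["intent"], state["user_input"])}

def generate_response_node(state: SupportState) -> dict:
    return {"response": generate_response(state["intent"], state["context"], state["user_input"])}

builder = StateGraph(SupportState)
builder.add_node("classify_intent", classify_intent_node)
builder.add_node("retrieve_context", retrieve_context_node)
builder.add_node("generate_response", generate_response_node)
builder.add_edge(START, "classify_intent")
builder.add_edge("classify_intent", "retrieve_context")
builder.add_edge("retrieve_context", "generate_response")
builder.add_edge("generate_response", END)

graph = builder.compile()
result = graph.invoke({"user_input": "I can't log into my account"})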

Technique 8: Negative Instructions

Effect: Edge case errors -30-50%

prompt = """Generate a customer support response.

DO:
- Be concise (2-3 sentences)
- Acknowledge the customer's concern
- Provide actionable next steps

DO NOT:
- Apologize excessively (one apology maximum)
- Make promises you can't keep (e.g., "immediate resolution")
- Use technical jargon without explanation
- Provide information not in the context
- Generate placeholder text like "XXX" or "[insert here]"

Customer question: {question}
Context: {context}

Response:"""

Technique 9: Self-Consistency

Effect: Accuracy +10-20% for complex reasoning, Cost +200-300%

Generate multiple reasoning paths and use majority voting:

from collections import Counter

from langchain_anthropic import ChatAnthropic
from langchain_core.messages import HumanMessage

def self_consistency_reasoning(question: str, num_samples: int = 5) -> str:
    """Generate multiple reasoning paths and select the most consistent answer"""

    llm = ChatAnthropic(
        model="claude-3-5-sonnet-20241022",
        temperature=0.7  # Higher temperature for diversity
    )

    prompt = f"""Question: {question}

Think through this step by step and provide your reasoning:

Reasoning:"""

    # Generate multiple reasoning paths
    responses = []
    for _ in range(num_samples):
        response = llm.invoke([HumanMessage(content=prompt)])
        responses.append(response.content)

    # Extract the final answer from each reasoning path and use majority voting
    # (extract_final_answer is sketched after this code block)
    final_answers = [extract_final_answer(r) for r in responses]
    most_common = Counter(final_answers).most_common(1)[0][0]

    return most_common

# Trade-offs:
# - Accuracy: +10-20%
# - Cost: +200-300% (5x API calls)
# - Latency: +200-300% (if not parallelized)
# Use: Critical decisions only
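
extract_final_answer is left undefined above. A minimal sketch, assuming the prompt is extended so each response ends with a line of the form "Final answer: <answer>":

def extract_final_answer(response_text: str) -> str:
    """Pull the final answer out of a reasoning trace.

    Assumes the prompt asks the model to end with a "Final answer: ..." line;
    falls back to the last non-empty line otherwise.
    """
    lines = [line.strip() for line in response_text.splitlines() if line.strip()]
    for line in reversed(lines):
        if line.lower().startswith("final answer:"):
            return line.split(":", 1)[1].strip()
    return lines[-1] if lines else ""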

Technique 10: Model Selection

Model Selection Based on Task Complexity:

  • Simple classification: Claude 3.5 Haiku (fast, low cost, sufficient accuracy)
  • Complex reasoning: Claude 3.5 Sonnet (balanced performance)
  • Highly complex tasks: Claude 3 Opus (best performance, highest cost)

# Select optimal model per task
from langchain_anthropic import ChatAnthropic

class LLMSelector:
    def __init__(self):
        self.haiku = ChatAnthropic(model="claude-3-5-haiku-20241022")
        self.sonnet = ChatAnthropic(model="claude-3-5-sonnet-20241022")
        self.opus = ChatAnthropic(model="claude-3-opus-20240229")

    def get_llm(self, task_complexity: str):
        if task_complexity == "simple":
            return self.haiku  # ~$0.001/req
        elif task_complexity == "complex":
            return self.sonnet  # ~$0.005/req
        else:  # very_complex
            return self.opus  # ~$0.015/req

# Usage example
selector = LLMSelector()

# Simple intent classification → Haiku
intent_llm = selector.get_llm("simple")

# Complex response generation → Sonnet
response_llm = selector.get_llm("complex")

Hybrid Approach:

def hybrid_classification(user_input: str) -> dict:
    """Try Haiku first, use Sonnet if confidence is low"""

    # Step 1: Classify with Haiku
    haiku_result = classify_with_haiku(user_input)

    if haiku_result["confidence"] >= 0.8:
        # High confidence → Use Haiku result
        return haiku_result
    else:
        # Low confidence → Re-classify with Sonnet
        sonnet_result = classify_with_sonnet(user_input)
        return sonnet_result

# Effects:
# - 80% of cases use Haiku (low cost)
# - 20% of cases use Sonnet (high accuracy)
# - Average cost: -60%
# - Average accuracy: -2% (acceptable range)
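
classify_with_haiku and classify_with_sonnet are assumed above. A minimal sketch that reuses the JSON-structured prompt and parse_llm_json_output parser from Technique 3 (the prompt variable name is an assumption):

from langchain_anthropic import ChatAnthropic
from langchain_core.messages import SystemMessage, HumanMessage

haiku = ChatAnthropic(model="claude-3-5-haiku-20241022", temperature=0.0, max_tokens=200)
sonnet = ChatAnthropic(model="claude-3-5-sonnet-20241022", temperature=0.0, max_tokens=200)

def _classify(llm, user_input: str) -> dict:
    raw = llm.invoke([
        SystemMessage(content=prompt),  # the JSON-structured classification prompt (Technique 3)
        HumanMessage(content=user_input),
    ]).content
    return parse_llm_json_output(raw)   # yields intent / confidence / reasoning

def classify_with_haiku(user_input: str) -> dict:
    return _classify(haiku, user_input)

def classify_with_sonnet(user_input: str) -> dict:
    return _classify(sonnet, user_input)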