# Prompt Engineering Patterns
## Context
You're writing prompts for an LLM and getting inconsistent or incorrect outputs. Common issues:
- **Vague instructions**: Model guesses intent (inconsistent results)
- **No examples**: Model infers task from description alone (ambiguous)
- **No output format**: Model defaults to prose (unparsable)
- **No reasoning scaffolding**: Model jumps to answer (errors in complex tasks)
- **System message misuse**: Task instructions in system message (inflexible)
**This skill provides effective prompt engineering patterns: specificity, few-shot examples, format specification, chain-of-thought, and proper message structure.**
## Core Principle: Be Specific
**Vague prompts → Inconsistent outputs**
**Bad:**
```
Analyze this review: "Product was okay."
```
**Why bad:**
- "Analyze" is ambiguous (sentiment? quality? topics?)
- No scale specified (1-5? positive/negative?)
- No output format (text? JSON? number?)
**Good:**
```
Rate this review's sentiment on a scale of 1-5:
1 = Very negative
2 = Negative
3 = Neutral
4 = Positive
5 = Very positive
Review: "Product was okay."
Output ONLY the number (1-5):
```
**Result:** Consistent "3" every time
### Specificity Checklist:
- **Define the task clearly** (classify, extract, generate, summarize)
- **Specify the scale** (1-5, 1-10, percentage, positive/negative/neutral)
- **Define edge cases** (null values, ambiguous inputs, relative dates)
- **Specify output format** (JSON, CSV, number only, yes/no)
- **Set constraints** (max length, required fields, allowed values)
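Putting the checklist into practice, here is a minimal sketch that assembles the "good" sentiment prompt above as a reusable function (the helper name is illustrative, not a fixed API):

```python
def build_sentiment_prompt(review: str) -> str:
    """Apply the checklist: clear task, explicit scale, fixed output format."""
    return (
        "Rate this review's sentiment on a scale of 1-5:\n"
        "1 = Very negative\n2 = Negative\n3 = Neutral\n"
        "4 = Positive\n5 = Very positive\n\n"
        f'Review: "{review}"\n\n'
        "Output ONLY the number (1-5):"
    )

prompt = build_sentiment_prompt("Product was okay.")
```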
## Prompt Structure
### Message Roles:
**1. System Message:**
```python
system = """
You are an expert Python programmer with 10 years of experience.
You write clean, efficient, well-documented code.
You always follow PEP 8 style guidelines.
"""
```
**Purpose:**
- Sets role/persona (expert, assistant, teacher)
- Defines global behavior (concise, detailed, technical)
- Applies to entire conversation
**Best practices:**
- Keep it short (< 200 words)
- Define WHO the model is, not WHAT to do
- Set tone and constraints
**2. User Message:**
```python
user = """
Write a Python function that calculates the Fibonacci sequence up to n terms.
Requirements:
- Use recursion with memoization
- Include docstring
- Handle edge cases (n <= 0)
- Return list of integers
Output only the code, no explanations.
"""
```
**Purpose:**
- Specific task instructions (per-request)
- Input data
- Output format requirements
**Best practices:**
- Be specific about requirements
- Include examples if ambiguous
- Specify output format explicitly
**3. Assistant Message (in conversation):**
```python
messages = [
    {"role": "system", "content": system},
    {"role": "user", "content": "Calculate 2+2"},
    {"role": "assistant", "content": "4"},
    {"role": "user", "content": "Now multiply that by 3"},
]
```
**Purpose:**
- Conversation history
- Shows model previous responses
- Enables multi-turn conversations
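To tie the three roles together, here is a minimal sketch that sends the conversation above using the v1 OpenAI Python client (the model name and client setup are illustrative; any chat-completions-style API works the same way):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4",
    messages=messages,  # the system + user + assistant turns defined above
)
print(response.choices[0].message.content)  # e.g. "12"
```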
## Few-Shot Learning
**Show, don't tell.** Examples teach better than instructions.
### 0-Shot (No Examples):
```
Extract the person, company, and location from this text:
Text: "Tim Cook presented the new iPhone at Apple's Cupertino campus."
```
**Issues:**
- Model guesses format (JSON? Key-value? List?)
- Edge cases unclear (What if no person? Multiple companies?)
### 1-Shot (One Example):
```
Extract entities as JSON.
Example:
Text: "Satya Nadella spoke at Microsoft in Seattle."
Output: {"person": "Satya Nadella", "company": "Microsoft", "location": "Seattle"}
Now extract from:
Text: "Tim Cook presented the new iPhone at Apple's Cupertino campus."
Output:
```
**Better!** Model sees format and structure.
### Few-Shot (3-5 Examples - BEST):
```
Extract entities as JSON.
Example 1:
Text: "Satya Nadella spoke at Microsoft in Seattle."
Output: {"person": "Satya Nadella", "company": "Microsoft", "location": "Seattle"}
Example 2:
Text: "Google announced Gemini in Mountain View."
Output: {"person": null, "company": "Google", "location": "Mountain View"}
Example 3:
Text: "The event took place online with no speakers."
Output: {"person": null, "company": null, "location": "online"}
Now extract from:
Text: "Tim Cook presented the new iPhone at Apple's Cupertino campus."
Output:
```
**Why 3-5 examples?**
- 1 example: Shows format
- 2-3 examples: Shows variation and edge cases
- 4-5 examples: Shows complex patterns
- More than 5 examples: Diminishing returns (uses more tokens)
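In code, a few-shot prompt is typically assembled from a small list of example pairs. A minimal sketch (the example set and helper name are illustrative):

```python
import json

# Illustrative example set: one normal case plus null-field edge cases
EXAMPLES = [
    ("Satya Nadella spoke at Microsoft in Seattle.",
     {"person": "Satya Nadella", "company": "Microsoft", "location": "Seattle"}),
    ("Google announced Gemini in Mountain View.",
     {"person": None, "company": "Google", "location": "Mountain View"}),
    ("The event took place online with no speakers.",
     {"person": None, "company": None, "location": "online"}),
]

def build_few_shot_prompt(text: str) -> str:
    parts = ["Extract entities as JSON."]
    for i, (example_text, example_output) in enumerate(EXAMPLES, 1):
        parts.append(
            f'Example {i}:\nText: "{example_text}"\nOutput: {json.dumps(example_output)}'
        )
    parts.append(f'Now extract from:\nText: "{text}"\nOutput:')
    return "\n\n".join(parts)
```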
### Few-Shot Best Practices:
1. **Cover edge cases:**
- Null values (missing entities)
- Multiple values (list of people)
- Ambiguous cases (nickname vs full name)
2. **Show desired format consistently:**
- All examples use same structure
- Same field names
- Same data types
3. **Order matters:**
- Put most representative example first
- Put edge cases later
- Model learns from all examples
4. **Balance examples:**
- Show positive and negative cases
- Show simple and complex cases
- Avoid bias (don't show only easy examples)
## Chain-of-Thought (CoT) Prompting
**For reasoning tasks, request step-by-step thinking.**
### Without CoT (Direct):
```
Q: A farmer has 17 sheep. All but 9 die. How many sheep are left?
A:
```
**Output:** "8 sheep" (WRONG! Misread "all but 9")
### With CoT:
```
Q: A farmer has 17 sheep. All but 9 die. How many sheep are left?
Think step-by-step:
1. Start with how many sheep
2. Understand what "all but 9 die" means
3. Calculate remaining sheep
4. State the answer
A:
```
**Output:**
```
1. The farmer starts with 17 sheep
2. "All but 9 die" means all sheep except 9 die
3. So 9 sheep remain alive
4. Answer: 9 sheep
```
**Correct!** CoT catches the trick.
### When to Use CoT:
- ✅ Math word problems
- ✅ Logic puzzles
- ✅ Multi-step reasoning
- ✅ Complex decision-making
- ✅ Ambiguous questions
**Not needed for:**
- ❌ Simple classification (sentiment)
- ❌ Direct lookups (capital of France)
- ❌ Pattern matching (regex, entity extraction)
### CoT Variants:
**1. Explicit steps:**
```
Solve step-by-step:
1. Identify what we know
2. Identify what we need to find
3. Set up the equation
4. Solve
5. Verify the answer
```
**2. "Let's think step by step":**
```
Q: [question]
A: Let's think step by step.
```
**3. "Explain your reasoning":**
```
Q: [question]
A: I'll explain my reasoning:
```
**All three work!** Pick what fits your use case.
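A minimal zero-shot CoT wrapper, assuming a placeholder `llm.generate` call; taking the final line as the answer is a simple heuristic, not a robust parser:

```python
def answer_with_cot(llm, question: str) -> str:
    """Zero-shot CoT: append the trigger phrase, then take the last line
    of the reasoning as the answer. llm.generate is a placeholder."""
    prompt = f"Q: {question}\nA: Let's think step by step."
    reasoning = llm.generate(prompt, temperature=0)
    return reasoning.strip().splitlines()[-1]  # final line holds the answer
```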
## Output Formatting
**Specify format explicitly. Don't assume model knows what you want.**
### JSON Output:
**Bad (no format specified):**
```
Extract the name, age, and occupation from: "John is 30 years old and works as an engineer."
```
**Output:** "The person's name is John, who is 30 years old and works as an engineer."
**Good (format specified):**
```
Extract information as JSON:
Text: "John is 30 years old and works as an engineer."
Output in this format:
{
  "name": "<string>",
  "age": <number>,
  "occupation": "<string>"
}
JSON:
```
**Output:**
```json
{
  "name": "John",
  "age": 30,
  "occupation": "engineer"
}
```
### CSV Output:
```
Convert this data to CSV format with columns: name, age, city.
Data: John is 30 and lives in NYC. Mary is 25 and lives in LA.
CSV (with header):
```
**Output:**
```csv
name,age,city
John,30,NYC
Mary,25,LA
```
### Structured Text:
```
Summarize this article in bullet points (max 5 points):
Article: [text]
Summary:
-
```
**Output:**
```
- Point 1
- Point 2
- Point 3
- Point 4
- Point 5
```
### XML/HTML:
```
Format this data as HTML table:
Data: [data]
HTML:
```
### Format Best Practices:
1. **Show the schema:**
```json
{
  "field1": "<type>",
  "field2": <type>,
  ...
}
```
2. **Specify data types:** `<string>`, `<number>`, `<boolean>`, `<array>`
3. **Show example output:** Full example of expected output
4. **Request validation:** "Output valid JSON" or "Ensure CSV is parsable"
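A common companion to an explicit schema is parse-and-retry: validate the model's output with `json.loads` and re-prompt on failure. A minimal sketch (`llm.generate` is a placeholder for your model call):

```python
import json

def generate_json(llm, prompt: str, max_retries: int = 2) -> dict:
    """Request JSON, parse it, and retry with an error hint if parsing fails."""
    for _ in range(max_retries):
        raw = llm.generate(prompt, temperature=0)
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            prompt += "\n\nYour last output was not valid JSON. Output ONLY valid JSON."
    raise ValueError("Model did not return valid JSON")
```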
## Temperature and Sampling
**Temperature controls randomness. Adjust based on task.**
### Temperature = 0 (Deterministic):
```python
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4",
    messages=[...],
    temperature=0,  # deterministic: same input gives (near-)identical output
)
```
**Use for:**
- ✅ Classification (sentiment, category)
- ✅ Extraction (entities, data fields)
- ✅ Structured output (JSON, CSV)
- ✅ Factual queries (capital of X, date of Y)
**Why:** Need consistency and correctness, not creativity
### Temperature = 0.7-1.0 (Creative):
```python
response = client.chat.completions.create(  # client as defined above
    model="gpt-4",
    messages=[...],
    temperature=0.8,  # creative, varied outputs
)
```
**Use for:**
- ✅ Creative writing (stories, poems)
- ✅ Brainstorming (ideas, alternatives)
- ✅ Conversational chat (natural dialogue)
- ✅ Content generation (marketing copy)
**Why:** Want variety and creativity, not determinism
### Temperature = 1.5-2.0 (Very Random):
```python
response = client.chat.completions.create(  # client as defined above
    model="gpt-4",
    messages=[...],
    temperature=1.8,  # very random, surprising outputs
)
```
**Use for:**
- ✅ Experimental generation
- ✅ Highly creative tasks
**Warning:** May produce nonsensical outputs (use carefully)
### Top-p (Nucleus Sampling):
```python
response = client.chat.completions.create(  # client as defined above
    model="gpt-4",
    messages=[...],  # temperature left at its default: tune one knob at a time
    top_p=0.9,  # consider only the top 90% of probability mass
)
```
**Alternative to temperature:**
- top_p = 1.0: Consider all tokens (default)
- top_p = 0.9: Consider top 90% (filters low-probability tokens)
- top_p = 0.5: Consider top 50% (more focused)
**Best practice:** Use temperature OR top_p, not both
## Common Task Patterns
### 1. Classification:
```
Classify the sentiment of this review as 'positive', 'negative', or 'neutral'.
Output ONLY the label.
Review: "The product works great but shipping was slow."
Sentiment:
```
**Key elements:**
- Clear categories ('positive', 'negative', 'neutral')
- Output constraint ("ONLY the label")
- Prompt ends with field name ("Sentiment:")
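Wrapped as a helper, the same pattern gains a whitelist check so an off-script response fails loudly instead of propagating (`llm.generate` is a placeholder):

```python
LABELS = {"positive", "negative", "neutral"}

def classify_sentiment(llm, review: str) -> str:
    prompt = (
        "Classify the sentiment of this review as 'positive', 'negative', "
        "or 'neutral'.\nOutput ONLY the label.\n"
        f'Review: "{review}"\nSentiment:'
    )
    label = llm.generate(prompt, temperature=0).strip().lower()
    if label not in LABELS:
        raise ValueError(f"Unexpected label: {label!r}")
    return label
```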
### 2. Extraction:
```
Extract all dates from this text. Output as JSON array.
Text: "Meeting on March 15, 2024. Follow-up on March 22."
Format:
["YYYY-MM-DD", "YYYY-MM-DD"]
Output:
```
**Key elements:**
- Specific format (JSON array)
- Date format specified (YYYY-MM-DD)
- Shows example structure
### 3. Summarization:
```
Summarize this article in 50 words or less. Focus on the main conclusion and key findings.
Article: [long text]
Summary (max 50 words):
```
**Key elements:**
- Length constraint (50 words)
- Focus instruction (main conclusion, key findings)
- Clear output label
### 4. Generation:
```
Write a product description for a wireless mouse with these features:
- Ergonomic design
- 1600 DPI sensor
- 6-month battery life
- Bluetooth 5.0
Style: Professional, concise (50-100 words)
Product Description:
```
**Key elements:**
- Input data (features list)
- Style guide (professional, concise)
- Length constraint (50-100 words)
### 5. Transformation:
```
Convert this SQL query to Python (using pandas):
SQL:
SELECT name, age FROM users WHERE age > 30 ORDER BY age DESC
Python (pandas):
```
**Key elements:**
- Clear source and target formats
- Shows example input
- Labels expected output
### 6. Question Answering:
```
Answer this question based ONLY on the provided context. If the answer is not in the context, say "I don't know."
Context: [document]
Question: What is the return policy?
Answer:
```
**Key elements:**
- Constraint ("based ONLY on context")
- Fallback instruction ("I don't know")
- Prevents hallucination
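A minimal sketch of the grounded-QA prompt as a builder function (names are illustrative):

```python
def build_grounded_qa_prompt(context: str, question: str) -> str:
    """Constrain answers to the provided context to reduce hallucination."""
    return (
        "Answer this question based ONLY on the provided context. "
        "If the answer is not in the context, say \"I don't know.\"\n\n"
        f"Context: {context}\n\nQuestion: {question}\nAnswer:"
    )
```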
## Advanced Techniques
### 1. Self-Consistency:
**Generate multiple outputs, take majority vote.**
```python
from collections import Counter

# llm.generate is a placeholder for your model call
answers = []
for _ in range(5):
    response = llm.generate(prompt, temperature=0.7)  # sampling on, so runs differ
    answers.append(response)

# Take the majority vote across sampled answers
final_answer = Counter(answers).most_common(1)[0][0]
```
**Use for:**
- Complex reasoning (math, logic)
- When single answer might be wrong
- Accuracy > cost
**Trade-off:** 5× cost for 10-20% accuracy improvement
### 2. Tree-of-Thoughts:
**Explore multiple reasoning paths, pick best.**
```
Problem: [complex problem]
Let's consider 3 different approaches:
Approach 1: [reasoning path 1]
Approach 2: [reasoning path 2]
Approach 3: [reasoning path 3]
Which approach is best? Evaluate each:
[evaluation]
Best approach: [selection]
Now solve using the best approach:
[solution]
```
**Use for:**
- Complex planning
- Strategic decision-making
- Multiple valid solutions
### 3. ReAct (Reasoning + Acting):
**Interleave reasoning with actions (tool use).**
```
Task: What's the weather in the city where the Eiffel Tower is located?
Thought: I need to find where the Eiffel Tower is located.
Action: Search "Eiffel Tower location"
Observation: The Eiffel Tower is in Paris, France.
Thought: Now I need the weather in Paris.
Action: Weather API call for Paris
Observation: 15°C, partly cloudy
Answer: It's 15°C and partly cloudy in Paris.
```
**Use for:**
- Multi-step tasks with tool use
- Search + reasoning
- API interactions
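A minimal ReAct driver, assuming a placeholder `llm.generate(prompt, stop=...)` call and a `tools` dict mapping action names to callables; a real implementation needs stricter parsing and error handling:

```python
import re

def react_loop(llm, task: str, tools: dict, max_steps: int = 5):
    """Alternate model reasoning with tool calls until an Answer appears."""
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        # Stop before the model hallucinates its own Observation
        step = llm.generate(transcript, stop=["Observation:"])
        transcript += step
        if "Answer:" in step:
            return step.split("Answer:")[-1].strip()
        match = re.search(r'Action:\s*(\w+)\s+"?([^"\n]*)"?', step)
        if match:
            name, arg = match.groups()
            result = tools.get(name, lambda _: "Unknown tool")(arg)
            transcript += f"Observation: {result}\n"
    return None  # no answer within the step budget
```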
### 4. Instruction Following:
**Separate instructions from data.**
```
Instructions:
- Extract all email addresses
- Validate format (user@domain.com)
- Remove duplicates
- Sort alphabetically
Data:
[text with emails]
Output (JSON array):
```
**Best practice:** Clearly separate "Instructions" from "Data"
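A minimal sketch that keeps the two sections apart with explicit delimiters, so instructions embedded in the data are less likely to be followed (the delimiter choice is illustrative):

```python
def build_email_extraction_prompt(data: str) -> str:
    """Keep instructions and data in clearly delimited sections."""
    return (
        "Instructions:\n"
        "- Extract all email addresses\n"
        "- Validate format (user@domain.com)\n"
        "- Remove duplicates\n"
        "- Sort alphabetically\n\n"
        "Data:\n"
        "'''\n"
        f"{data}\n"
        "'''\n\n"
        "Output (JSON array):"
    )
```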
## Debugging Prompts
**If output is wrong, diagnose systematically.**
### Problem 1: Inconsistent outputs
**Diagnosis:**
- Instructions too vague?
- No examples?
- Temperature too high?
**Fix:**
- Add specificity
- Add 3-5 examples
- Set temperature=0
### Problem 2: Wrong format
**Diagnosis:**
- Format not specified?
- Example format missing?
**Fix:**
- Specify format explicitly
- Show example output structure
- End prompt with format label ("JSON:", "CSV:")
### Problem 3: Factual errors
**Diagnosis:**
- Hallucination (model making up facts)?
- No chain-of-thought?
**Fix:**
- Add "based only on provided context"
- Request "cite your sources"
- Add "if unsure, say 'I don't know'"
### Problem 4: Too verbose
**Diagnosis:**
- No length constraint?
- No "output only" instruction?
**Fix:**
- Add word/character limit
- Add "output ONLY the [X], no explanations"
- Show concise examples
### Problem 5: Misses edge cases
**Diagnosis:**
- Edge cases not in examples?
- Instructions don't cover edge cases?
**Fix:**
- Add edge case examples (null, empty, ambiguous)
- Explicitly mention edge case handling
## Prompt Testing
**Test prompts systematically before production.**
### 1. Create test cases:
```python
test_cases = [
    # Normal cases
    {"input": "...", "expected": "..."},
    {"input": "...", "expected": "..."},
    # Edge cases
    {"input": "", "expected": "null"},     # empty input
    {"input": "...", "expected": "null"},  # missing data
    # Ambiguous cases
    {"input": "...", "expected": "..."},
]
```
### 2. Run tests:
```python
# llm.generate is a placeholder; run at temperature=0 for repeatable results
for case in test_cases:
    output = llm.generate(prompt.format(input=case["input"]))
    assert output == case["expected"], f"Failed on {case['input']}"
```
### 3. Measure metrics:
```python
import time

# Accuracy: generate an output for each test case and compare to expected
correct = sum(
    1 for case in test_cases
    if llm.generate(prompt.format(input=case["input"])) == case["expected"]
)
accuracy = correct / len(test_cases)

# Consistency: run the same input 10 times; are all outputs identical?
outputs = [llm.generate(prompt) for _ in range(10)]
consistency = len(set(outputs)) == 1

# Latency: wall-clock time for a single call
start = time.time()
output = llm.generate(prompt)
latency = time.time() - start
```
## Prompt Optimization Workflow
**Iterative improvement process:**
### Step 1: Baseline prompt (simple)
```
Classify sentiment: [text]
```
### Step 2: Test and measure
```python
accuracy = 0.65     # too low!
consistency = 0.40  # very inconsistent
```
### Step 3: Add specificity
```
Classify sentiment as 'positive', 'negative', or 'neutral'.
Output ONLY the label.
Text: [text]
Sentiment:
```
**Result:** accuracy = 75%, consistency = 80%
### Step 4: Add few-shot examples
```
Classify sentiment as 'positive', 'negative', or 'neutral'.
Examples:
[3 examples]
Text: [text]
Sentiment:
```
**Result:** accuracy = 88%, consistency = 95%
### Step 5: Add edge case handling
```
[Include edge case examples in few-shot]
```
**Result:** accuracy = 92%, consistency = 98%
### Step 6: Optimize for cost/latency
```python
# Reduce examples from 5 to 3 (latency 400ms → 300ms)
# Accuracy still 92%
```
**Final:** accuracy = 92%, consistency = 98%, latency = 300ms
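This loop is easy to automate: score each prompt variant on the same test set and keep the winner. A minimal sketch (`llm.generate`, `BASELINE`, and `FEW_SHOT` are placeholders):

```python
def evaluate(llm, prompt_template: str, test_cases: list) -> float:
    """Return accuracy of one prompt variant over the test set."""
    correct = sum(
        1 for case in test_cases
        if llm.generate(prompt_template.format(input=case["input"])) == case["expected"]
    )
    return correct / len(test_cases)

# Compare variants from each optimization step and keep the best
for name, template in {"baseline": BASELINE, "few_shot": FEW_SHOT}.items():
    print(name, evaluate(llm, template, test_cases))
```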
## Prompt Libraries and Templates
**Reusable templates for common tasks.**
### Template 1: Classification
```
Classify {item} as one of: {categories}.
{optional: 3-5 examples}
Output ONLY the category label.
{item}: {input}
Category:
```
### Template 2: Extraction
```
Extract {fields} from the text. Output as JSON.
{optional: 3-5 examples showing format and edge cases}
Text: {input}
JSON:
```
### Template 3: Summarization
```
Summarize this {content_type} in {length} words or less.
Focus on {aspects}.
{content_type}: {input}
Summary ({length} words max):
```
### Template 4: Generation
```
Write {output_type} with these characteristics:
{characteristics}
Style: {style}
Length: {length}
{output_type}:
```
### Template 5: Chain-of-Thought
```
{question}
Think step-by-step:
1. {step_1_prompt}
2. {step_2_prompt}
3. {step_3_prompt}
Answer:
```
**Usage:**
```python
CLASSIFICATION_TEMPLATE = """Classify {item} as one of: {categories}.
Output ONLY the category label.
{item}: {input}
Category:"""
prompt = CLASSIFICATION_TEMPLATE.format(
    item="review",
    categories="'positive', 'negative', 'neutral'",
    input=review_text,
)
```
## Anti-Patterns
### Anti-pattern 1: "The model is stupid"
**Wrong:** "The model doesn't understand. I need a better model."
**Right:** "My prompt is ambiguous. Let me add examples and specificity."
**Principle:** 90% of issues are prompt issues, not model issues.
### Anti-pattern 2: "Just run it multiple times"
**Wrong:** "Run 10 times and take the average/majority."
**Right:** "Fix the prompt so it's consistent (temperature=0, specific instructions)."
**Principle:** Consistency should come from the prompt, not multiple runs.
### Anti-pattern 3: "Parse the prose output"
**Wrong:** "I'll extract JSON from the prose with regex."
**Right:** "I'll request JSON output explicitly in the prompt."
**Principle:** Specify format in prompt, don't parse after the fact.
### Anti-pattern 4: "System message for everything"
**Wrong:** Put task instructions in system message.
**Right:** System = role/behavior, User = task/instructions.
**Principle:** System message is global (all requests), user message is per-request.
### Anti-pattern 5: "More tokens = better"
**Wrong:** "I'll write a 1000-word prompt with every detail."
**Right:** "I'll write a concise prompt with 3-5 examples."
**Principle:** Concise + examples > verbose instructions.
## Summary
**Core principles:**
1. **Be specific**: Define scale, edge cases, constraints, output format
2. **Use few-shot**: 3-5 examples teach better than instructions
3. **Specify format**: JSON, CSV, structured text (explicit schema)
4. **Request reasoning**: Chain-of-thought for complex tasks
5. **Correct message structure**: System = role, User = task
**Temperature:**
- 0: Classification, extraction, structured output (deterministic)
- 0.7-1.0: Creative writing, brainstorming (varied)
**Common patterns:**
- Classification: Specify categories, output constraint
- Extraction: Format + examples + edge cases
- Summarization: Length + focus areas
- Generation: Features + style + length
**Advanced:**
- Self-consistency: Multiple runs + majority vote
- Tree-of-thoughts: Multiple reasoning paths
- ReAct: Reasoning + action (tool use)
**Debugging:**
- Inconsistent → Add specificity, examples, temperature=0
- Wrong format → Specify format explicitly with examples
- Factual errors → Add context constraints, chain-of-thought
- Too verbose → Add length limits, "output only"
**Key insight:** Prompts are code. Treat them like code: test, iterate, optimize, version control.