Initial commit

Zhongwei Li
2025-11-29 18:20:33 +08:00
commit 977fbf5872
27 changed files with 5714 additions and 0 deletions


@@ -0,0 +1,186 @@
---
name: engineering-prompts
description: Engineers effective prompts using systematic methodology. Use when designing prompts for Claude, optimizing existing prompts, or balancing simplicity, cost, and effectiveness. Applies progressive disclosure and empirical validation to prompt development.
---
# Engineering Prompts
---
## LEVEL 1: QUICKSTART ⚡
**5-Step Prompt Creation:**
1. **Start Clear**: Explicit instructions + success criteria
2. **Assess Need**: Does it need structure? Examples? Reasoning?
3. **Add Sparingly**: Only techniques that improve outcomes
4. **Estimate Cost**: Count tokens, identify caching opportunities
5. **Test & Iterate**: Measure effectiveness, refine based on results
---
## LEVEL 2: CORE PHILOSOPHY 🎯
### The Three Principles
**Simplicity First**
- Start with minimal prompt
- Add complexity only when empirically justified
- More techniques ≠ better results
**Cost Awareness**
- Minimize token usage
- Leverage prompt caching (90% savings on repeated content)
- Batch processing for non-urgent work (50% savings)
**Effectiveness**
- Techniques must improve outcomes for YOUR use case
- Measure impact, don't just apply best practices
- Iterate based on results
---
## LEVEL 3: THE 9 TECHNIQUES 🛠️
### Quick Reference
| Technique | When to Use | Cost Impact |
|-----------|------------|-------------|
| **1. Clarity** | ALWAYS | Minimal, max impact |
| **2. XML Structure** | Complex prompts, instruction leakage | ~50-100 tokens |
| **3. Chain of Thought** | Reasoning, analysis, math | 2-3x output tokens |
| **4. Multishot Examples** | Pattern learning, format guidance | 200-1K tokens each |
| **5. System Role** | Domain expertise needed | Minimal (caches well) |
| **6. Prefilling** | Strict format requirements | Minimal |
| **7. Long Context** | 20K+ token inputs | None (improves accuracy) |
| **8. Context Budget** | Repeated use, long conversations | 90% savings with cache |
| **9. Tool Docs** | Function calling, agents | 100-500 tokens per tool |
---
## LEVEL 4: DESIGN FRAMEWORK 📋
### D - Define Requirements
**Questions to Answer:**
- Core task?
- Output format?
- Constraints (latency/cost/accuracy)?
- One-off or repeated?
### E - Estimate Complexity
**Simple:**
- Extraction, formatting
- Simple Q&A
- Clear right answer
**Medium:**
- Analysis with reasoning
- Code generation
- Multi-step but clear
**Complex:**
- Deep reasoning
- Novel problem-solving
- Research synthesis
### S - Start Simple
**Minimal Viable Prompt:**
1. Clear instruction
2. Success criteria
3. Output format
Test first. Add complexity only if underperforming.
### I - Iterate Selectively
**Add techniques based on gaps:**
- Unclear outputs → More clarity, examples
- Wrong structure → XML tags, prefilling
- Shallow reasoning → Chain of thought
- Pattern misses → Multishot examples
### G - Guide on Cost
**Cost Optimization:**
- Cache system prompts, reference docs (90% savings)
- Batch non-urgent work (50% savings)
- Minimize token usage through clear, concise instructions
### N - Note Implementation
**Deliverables:**
- The optimized prompt
- Techniques applied + rationale
- Techniques skipped + why
- Token estimate
- Caching strategy
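For the token estimate, a draft prompt can be measured before it ships. A minimal sketch, assuming the Anthropic Python SDK's count_tokens endpoint (model alias and prompt text are illustrative):
```python
import anthropic

client = anthropic.Anthropic()

# Measure the input size of a draft prompt before committing to it
count = client.messages.count_tokens(
    model="claude-sonnet-4-5",
    system="You are a senior software engineer conducting a code review.",
    messages=[{"role": "user", "content": "Review the following code: ..."}],
)
print(count.input_tokens)  # compare against your token budget and caching plan
```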
---
## LEVEL 5: ADVANCED TOPICS 🚀
### Tool Integration
**When to use MCP tools during prompt engineering:**
```
Need latest practices?
└─ mcp__plugin_essentials_perplexity
Complex analysis needed?
└─ mcp__plugin_essentials_sequential-thinking
Need library docs?
└─ mcp__plugin_essentials_context7
```
### Context Management
**Prompt Caching:**
- Cache: System prompts, reference docs, examples
- Savings: 90% on cached content
- Write: 125% of standard cost (25% premium on the first write)
- Read: 10% of standard cost
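A minimal sketch of how this structure maps onto an API call, assuming the Anthropic Python SDK's Messages API (model alias and policy text are illustrative placeholders):
```python
import anthropic

client = anthropic.Anthropic()

POLICY_DOCS = "..."  # large, stable reference content (e.g. the full policy document)

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    system=[
        {"type": "text", "text": "You are a customer service assistant for Acme Corp."},
        {
            "type": "text",
            "text": POLICY_DOCS,
            # Everything up to this marker is cached and reused across requests
            "cache_control": {"type": "ephemeral"},
        },
    ],
    messages=[
        # Only the user query changes between requests
        {"role": "user", "content": "What's the return policy for electronics?"},
    ],
)
print(response.content[0].text)
```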
**Long Context Tips:**
- Place documents BEFORE queries
- Use XML tags: `<document>`, `<source>`
- Ground responses in quotes
- 30% better performance with proper structure
### Token Optimization
**Reducing Token Usage:**
- Concise, clear instructions (no fluff)
- Reuse examples across calls (cache them)
- Structured output reduces back-and-forth
- Tool use instead of long context when possible
### Anti-Patterns
**Over-engineering** - All 9 techniques for simple task
**Premature optimization** - Complexity before testing simple
**Vague instructions** - "Analyze this" without specifics
**No examples** - Expecting format inference
**Missing structure** - Long prompts without XML
**Ignoring caching** - Not leveraging repeated content
**Stop here unless:** you need deeper implementation details; Level 6 below points to the reference docs.
---
## LEVEL 6: REFERENCES 📚
### Deep Dive Documentation
**Detailed Technique Catalog:**
- `reference/technique-catalog.md` - Each technique explained with examples, token costs, combination strategies
**Real-World Examples:**
- `reference/examples.md` - Before/after pairs for coding, analysis, extraction, agent tasks
**Research Papers:**
- `reference/research.md` - Latest Anthropic research, benchmarks, best practices evolution


@@ -0,0 +1,648 @@
# Prompt Engineering Examples
Before/after examples across different use cases demonstrating the application of prompt engineering techniques.
## Table of Contents
- [Example 1: Code Review](#example-1-code-review)
- [Example 2: Data Extraction](#example-2-data-extraction)
- [Example 3: Bug Analysis](#example-3-bug-analysis)
- [Example 4: Long Document Analysis](#example-4-long-document-analysis)
- [Example 5: Agent Workflow with Tools](#example-5-agent-workflow-with-tools)
- [Example 6: Repeated Queries with Caching](#example-6-repeated-queries-with-caching)
- [Example 7: Format Conversion with Prefilling](#example-7-format-conversion-with-prefilling)
- [Example 8: Simple Task (Minimal Techniques)](#example-8-simple-task-minimal-techniques)
- [Complexity Progression](#complexity-progression)
- [Anti-Pattern Examples](#anti-pattern-examples)
- [Key Takeaways](#key-takeaways)
- [Practice Exercise](#practice-exercise)
---
## Example 1: Code Review
### Before (Poor)
```
Review this code.
```
**Issues:**
- Vague - what aspects to review?
- No format guidance
- No success criteria
### After (Optimized)
```xml
<role>
You are a senior software engineer conducting a code review.
</role>
<instructions>
Review the following code for:
1. Security vulnerabilities (SQL injection, XSS, auth issues)
2. Performance problems (N+1 queries, inefficient algorithms)
3. Code quality (naming, duplication, complexity)
For each issue found, provide:
- Severity: Critical/Warning/Suggestion
- Location: File and line number
- Problem: What's wrong
- Fix: Specific code change
</instructions>
<code>
[Code to review]
</code>
<thinking>
Analyze the code systematically for each category before providing your review.
</thinking>
```
**Techniques Applied:**
- Clarity: Specific review categories and output format
- XML Structure: Separate role, instructions, code
- System Role: Senior software engineer
- Chain of Thought: Explicit thinking step
**Cost:** ~300 tokens → 2-3x output tokens for thinking
**Benefit:** Comprehensive, structured reviews with clear action items
---
## Example 2: Data Extraction
### Before (Poor)
```
Get the important information from this document.
```
**Issues:**
- "Important" is subjective
- No format specified
- No examples of desired output
### After (Optimized)
```xml
<instructions>
Extract the following fields from the customer support ticket:
- Customer ID
- Issue category
- Priority level
- Requested action
Return as JSON.
</instructions>
<examples>
Input: "Customer #12345 reporting login issues. High priority. Need password reset."
Output: {
"customer_id": "12345",
"issue_category": "login",
"priority": "high",
"requested_action": "password_reset"
}
Input: "User Jane Smith can't access reports module. Not urgent. Investigate permissions."
Output: {
"customer_id": null,
"issue_category": "access_control",
"priority": "low",
"requested_action": "investigate_permissions"
}
</examples>
<ticket>
[Actual ticket content]
</ticket>
```
**Techniques Applied:**
- Clarity: Specific fields to extract
- XML Structure: Separate sections
- Multishot Examples: Two examples showing pattern and edge cases
- Prefilling: Could add `{` to start JSON response
**Cost:** ~400 tokens (200 per example)
**Benefit:** Consistent structured extraction, handles null values correctly
---
## Example 3: Bug Analysis
### Before (Poor)
```
Why is this code broken?
```
**Issues:**
- No systematic approach
- No context about symptoms
- No guidance on depth of analysis
### After (Optimized)
```xml
<role>
You are an expert debugger specializing in root cause analysis.
</role>
<context>
Error message: TypeError: Cannot read property 'length' of undefined
Stack trace: [stack trace]
Recent changes: Added pagination feature
</context>
<instructions>
Analyze this bug systematically:
<thinking>
1. What does the error message tell us?
2. Which code path leads to this error?
3. What are the possible causes?
4. Which cause is most likely given recent changes?
5. What would fix the root cause?
</thinking>
Then provide:
- Root cause explanation
- Specific code fix
- Prevention strategy
</instructions>
<code>
[Relevant code]
</code>
```
**Techniques Applied:**
- Clarity: Systematic analysis steps
- XML Structure: Separate role, context, instructions, code
- Chain of Thought: Explicit 5-step thinking process
- System Role: Expert debugger
**Cost:** ~250 tokens → 2-3x output for thinking
**Benefit:** Root cause identification, not just symptom fixes
---
## Example 4: Long Document Analysis
### Before (Poor)
```
Summarize these reports.
[Document 1]
[Document 2]
[Document 3]
```
**Issues:**
- Documents after query (poor placement)
- No structure for multiple documents
- No guidance on what to summarize
### After (Optimized)
```xml
<document id="1">
<source>Q1-2024-financial-report.pdf</source>
<type>financial</type>
<content>
[Full document 1 - 15K tokens]
</content>
</document>
<document id="2">
<source>Q2-2024-financial-report.pdf</source>
<type>financial</type>
<content>
[Full document 2 - 15K tokens]
</content>
</document>
<document id="3">
<source>Q3-2024-financial-report.pdf</source>
<type>financial</type>
<content>
[Full document 3 - 15K tokens]
</content>
</document>
<instructions>
Analyze these quarterly financial reports:
1. First, quote the revenue and profit figures from each report
2. Then calculate and explain the trends across quarters
3. Finally, identify any concerning patterns or notable achievements
Present findings as:
- Trend Analysis: [Overall trends with percentages]
- Concerns: [Issues to watch]
- Achievements: [Positive developments]
</instructions>
```
**Techniques Applied:**
- Long Context Optimization: Documents BEFORE query
- XML Structure: Structured document metadata
- Quote Grounding: Explicit instruction to quote first
- Clarity: Specific analysis steps and output format
**Cost:** Same tokens, better accuracy (~30% improvement)
**Benefit:** Accurate multi-document analysis with proper attribution
---
## Example 5: Agent Workflow with Tools
### Before (Poor)
```
Tools:
- search(query)
- calculate(expression)
Answer user questions.
```
**Issues:**
- Vague tool descriptions
- No parameter guidance
- No strategy for tool selection
### After (Optimized)
```xml
<role>
You are a research assistant helping users find and analyze information.
</role>
<tools>
<tool>
Name: semantic_search
Description: Search our internal knowledge base using semantic similarity. Use this when users ask about company policies, products, or internal documentation. Returns the 5 most relevant passages with source citations.
Parameters:
- query (string, required): Natural language search query. Be specific and include key terms.
Example: "vacation policy for employees with 3+ years tenure"
- max_results (integer, optional): Number of results (1-10). Default: 5
When to use: User asks about internal information, policies, or product details
</tool>
<tool>
Name: calculate
Description: Evaluate mathematical expressions safely. Supports basic arithmetic, percentages, and common functions (sqrt, pow, etc.). Use when users request calculations or when analysis requires math.
Parameters:
- expression (string, required): Mathematical expression to evaluate
Example: "(1500 * 0.15) + 200"
When to use: User asks for calculations, percentage changes, or numerical analysis
</tool>
</tools>
<workflow>
1. Understand user intent
2. Determine if tools are needed:
- Information needs → semantic_search
- Math needs → calculate
- Both → search first, then calculate
3. Use tool results to form your response
4. Cite sources when using search results
</workflow>
<thinking>
For each user query, reason through:
- What information or calculation is needed?
- Which tool(s) would help?
- In what order should I use them?
</thinking>
```
**Techniques Applied:**
- Clarity: Detailed tool descriptions with examples
- XML Structure: Organized tool documentation
- System Role: Research assistant
- Tool Documentation: When to use, parameters, examples
- Chain of Thought: Reasoning about tool selection
**Cost:** ~600 tokens for tool docs
**Benefit:** Correct tool selection, proper parameter formatting, strategic tool use
---
## Example 6: Repeated Queries with Caching
### Before (Poor)
```
User: What's the return policy?
System: [Sends entire 50-page policy document + query every time]
```
**Issues:**
- Massive token waste on repeated content
- No caching strategy
- High cost per query
### After (Optimized)
```xml
<system_prompt>
You are a customer service assistant for Acme Corp. Your role is to answer policy questions accurately and concisely, always citing the specific policy section.
</system_prompt>
<company_policies>
[Full 50-page policy document - 40K tokens]
[This section is stable and will be cached]
</company_policies>
<interaction_guidelines>
- Answer clearly and directly
- Cite specific policy sections
- If policy doesn't cover the question, say so
- Be friendly but professional
</interaction_guidelines>
<!-- Everything above caches across requests -->
<!-- Only the user query below changes -->
<user_query>
What's the return policy for electronics?
</user_query>
```
**Techniques Applied:**
- Context Budget Management: Structure for caching
- XML Structure: Create cache boundaries
- System Role: Customer service assistant
- Long Context: Large policy document
**Cost Savings:**
- First call: 40K tokens input (cache write: 25% premium over standard input cost)
- Subsequent calls: 40K tokens cached (read from cache: 10% cost)
- Savings: 90% on cached content
**Benefit:** roughly $0.12 → $0.01 of input cost per query (about a 10x reduction)
---
## Example 7: Format Conversion with Prefilling
### Before (Poor)
```
Convert this to JSON: "Customer John Smith, ID 12345, ordered 3 items for $150"
```
**Response:**
```
Sure! Here's the information in JSON format:
{
"customer_name": "John Smith",
"customer_id": "12345",
"item_count": 3,
"total": 150
}
```
**Issues:**
- Unnecessary preamble
- Format might vary
- Extra tokens in output
### After (Optimized)
```
<instructions>
Convert customer orders to JSON with these fields:
- customer_name
- customer_id
- item_count
- total_amount
</instructions>
<input>
Customer John Smith, ID 12345, ordered 3 items for $150
</input>
```
**With Prefilling:**
```
Assistant: {
```
**Response:**
```json
{
"customer_name": "John Smith",
"customer_id": "12345",
"item_count": 3,
"total_amount": 150
}
```
**Techniques Applied:**
- Clarity: Specific field names
- XML Structure: Separate instructions and input
- Prefilling: Start with `{` to force JSON format
**Cost:** Saves ~15 tokens per response (preamble)
**Benefit:** Consistent format, easier parsing, cost savings at scale
---
## Example 8: Simple Task (Minimal Techniques)
### Scenario
Format phone numbers consistently.
### Optimized Prompt
```
Format this phone number in E.164 international format:
(555) 123-4567
Expected: +15551234567
```
**Techniques Applied:**
- Clarity: Specific format with example
**Techniques Skipped:**
- XML Structure: Single-section prompt, unnecessary
- Chain of Thought: Trivial task
- Examples: One is enough
- System Role: No expertise needed
- Long Context: Short input
- Caching: One-off query
**Cost:** ~30 tokens
**Benefit:** Simple, effective, minimal overhead
**Key Lesson:** Not every technique belongs in every prompt. Simple tasks deserve simple prompts.
---
## Complexity Progression
### Level 1: Simple (Haiku)
```
Extract the email address from: "Contact John at john@example.com"
```
- Just clarity
- ~15 tokens
- Obvious single answer
### Level 2: Medium (Sonnet)
```xml
<instructions>
Analyze this code for potential bugs:
1. Logic errors
2. Edge cases not handled
3. Type safety issues
</instructions>
<code>
[Code snippet]
</code>
```
- Clarity + XML structure
- ~100 tokens
- Requires some analysis
### Level 3: Complex (Sonnet with Thinking)
```xml
<role>
You are a security researcher analyzing potential vulnerabilities.
</role>
<instructions>
Analyze this authentication system for security vulnerabilities.
<thinking>
1. What are the authentication flows?
2. Where could an attacker bypass auth?
3. Are credentials handled securely?
4. What about session management?
5. Are there injection risks?
</thinking>
Then provide:
- Vulnerabilities found (severity + location)
- Exploitation scenarios
- Remediation steps
</instructions>
<code>
[Auth system code]
</code>
```
- Clarity + XML + Role + Chain of Thought
- ~350 tokens
- Complex security analysis
---
## Anti-Pattern Examples
### Anti-Pattern 1: Over-Engineering Simple Task
```xml
<role>
You are a world-class expert in string manipulation with 20 years of experience.
</role>
<instructions>
Convert the following text to uppercase.
<thinking>
1. What is the input text?
2. What transformation is needed?
3. Are there special characters?
4. What encoding should we use?
5. Should we preserve whitespace?
</thinking>
Then apply the transformation systematically.
</instructions>
<examples>
Input: "hello"
Output: "HELLO"
Input: "world"
Output: "WORLD"
</examples>
<input>
convert this
</input>
```
**Problem:** Simple task with 200+ token overhead
**Fix:** Just say "Convert to uppercase: convert this"
### Anti-Pattern 2: No Structure for Complex Task
```
I have these 5 documents about different topics and I want you to find common themes and also identify contradictions and create a summary with citations and also rate the quality of each source and explain the methodology you used.
[Document 1 - 10K tokens]
[Document 2 - 10K tokens]
[Document 3 - 10K tokens]
[Document 4 - 10K tokens]
[Document 5 - 10K tokens]
```
**Problems:**
- Run-on instructions
- Documents AFTER query (poor placement)
- No structure
- Multiple tasks crammed together
**Fix:** Use XML structure, place documents first, separate concerns
---
## Key Takeaways
1. **Match complexity to task**: Simple tasks → simple prompts
2. **Start minimal**: Add techniques only when justified
3. **Structure scales**: XML becomes essential with complexity
4. **Examples teach patterns**: Better than description for formats
5. **Thinking improves reasoning**: But costs 2-3x tokens
6. **Caching saves money**: Structure for reuse
7. **Placement matters**: Documents before queries
8. **Tools need docs**: Clear descriptions → correct usage
9. **Measure effectiveness**: Remove techniques that don't help
10. **Every token counts**: Justify each addition
---
## Practice Exercise
Improve this prompt:
```
Analyze the data and tell me what's interesting.
[CSV with 1000 rows of sales data]
```
Consider:
- What's "interesting"? Define it.
- What analysis steps are needed?
- What format should output take?
- Does it need examples?
- Would thinking help?
- Should data be structured?
- What about cost optimization?
Try building an optimized version using appropriate techniques.


@@ -0,0 +1,554 @@
# Prompt Engineering Research & Best Practices
Latest findings from Anthropic research and community best practices for prompt engineering with Claude models.
## Table of Contents
- [Anthropic's Core Research Findings](#anthropics-core-research-findings)
- [Effective Context Engineering (2024)](#effective-context-engineering-2024)
- [Agent Architecture Best Practices (2024-2025)](#agent-architecture-best-practices-2024-2025)
- [Citations and Source Grounding (2024)](#citations-and-source-grounding-2024)
- [Extended Thinking (2024)](#extended-thinking-2024)
- [Community Best Practices (2024-2025)](#community-best-practices-2024-2025)
- [Technique Selection Decision Tree (2025 Consensus)](#technique-selection-decision-tree-2025-consensus)
- [Measuring Prompt Effectiveness](#measuring-prompt-effectiveness)
- [Future Directions (2025 and Beyond)](#future-directions-2025-and-beyond)
- [Key Takeaways from Research](#key-takeaways-from-research)
- [Research Sources](#research-sources)
- [Keeping Current](#keeping-current)
- [Research-Backed Anti-Patterns](#research-backed-anti-patterns)
---
## Anthropic's Core Research Findings
### 1. Prompt Engineering vs Fine-Tuning (2024-2025)
**Key Finding:** Prompt engineering is preferable to fine-tuning for most use cases.
**Advantages:**
- **Speed**: Nearly instantaneous results vs hours/days for fine-tuning
- **Cost**: Uses base models, no GPU resources required
- **Flexibility**: Rapid experimentation and quick iteration
- **Data Requirements**: Works with few-shot or zero-shot learning
- **Knowledge Preservation**: Avoids catastrophic forgetting of general capabilities
- **Transparency**: Prompts are human-readable and debuggable
**When Fine-Tuning Wins:**
- Extremely consistent style requirements across millions of outputs
- Domain-specific jargon that's rare in training data
- Performance optimization for resource-constrained environments
**Source:** Anthropic Prompt Engineering Documentation (2025)
---
### 2. Long Context Window Performance (2024)
**Key Finding:** Document placement dramatically affects accuracy in long context scenarios.
**Research Results:**
- Placing documents BEFORE queries improves performance by up to 30%
- Claude experiences "lost in the middle" phenomenon like other LLMs
- XML structure helps Claude organize and retrieve from long contexts
- Quote grounding (asking Claude to quote relevant sections first) cuts through noise
**Optimal Pattern:**
```xml
<document id="1">
<metadata>...</metadata>
<content>...</content>
</document>
<!-- More documents -->
<instructions>
[Query based on documents]
</instructions>
```
**Source:** Claude Long Context Tips Documentation
---
### 3. Chain of Thought Effectiveness (2023-2025)
**Key Finding:** Encouraging step-by-step reasoning significantly improves accuracy on analytical tasks.
**Results:**
- Simple "Think step by step" phrase improves reasoning accuracy
- Explicit `<thinking>` tags provide transparency and verifiability
- Costs 2-3x output tokens but worth it for complex tasks
- Most effective for: math, logic, multi-step analysis, debugging
**Implementation Evolution:**
- 2023: Simple "think step by step" prompts
- 2024: Structured thinking with XML tags
- 2025: Extended thinking mode with configurable token budgets (16K+ tokens)
**Source:** Anthropic Prompt Engineering Techniques, Extended Thinking Documentation
---
### 4. Prompt Caching Economics (2024)
**Key Finding:** Prompt caching can reduce costs by 90% for repeated content.
**Cost Structure:**
- Cache write: 125% of standard input token cost (a 25% premium)
- Cache read: 10% of standard input token cost
- Effective savings: ~90% for content that doesn't change
**Optimal Use Cases:**
- System prompts (stable across calls)
- Reference documentation (company policies, API docs)
- Examples in multishot prompting (reused across calls)
- Long context documents (analyzed repeatedly)
**Architecture Pattern:**
```
[Stable content - caches]
└─ System prompt
└─ Reference docs
└─ Guidelines
[Variable content - doesn't cache]
└─ User query
└─ Specific inputs
```
**ROI Example:**
- 40K token system prompt + docs
- 1,000 queries/day
- Without caching: ~40M input tokens/day ≈ $120/day (Sonnet, $3/M input)
- With caching: cache reads at 10% ≈ $12/day, plus a one-off 25% write premium when the cache is created
- Savings: ≈ $108/day, roughly $39K/year per 1K daily queries
**Source:** Anthropic Prompt Caching Announcement
---
### 5. XML Tags Fine-Tuning (2024)
**Key Finding:** Claude has been specifically fine-tuned to pay attention to XML tags.
**Why It Works:**
- Training included examples of XML-structured prompts
- Model learned to treat tags as hard boundaries
- Prevents instruction leakage from user input
- Improves retrieval from long contexts
**Best Practices:**
- Use semantic tag names (`<instructions>`, `<context>`, `<examples>`)
- Nest tags for hierarchy when appropriate
- Consistent tag structure across prompts (helps with caching)
- Close all tags properly
**Source:** AWS ML Blog on Anthropic Prompt Engineering
---
### 6. Contextual Retrieval (2024)
**Key Finding:** Encoding context with chunks dramatically improves RAG accuracy.
**Traditional RAG Issues:**
- Chunks encoded in isolation lose surrounding context
- Semantic similarity can miss relevant chunks
- Failed retrievals lead to incorrect or incomplete responses
**Contextual Retrieval Solution:**
- Encode each chunk with surrounding context
- Combine semantic search with BM25 lexical matching
- Apply reranking for final selection
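A hedged sketch of the chunk-contextualization step; the helper below is illustrative rather than a specific library's API, and the model alias is an assumption:
```python
import anthropic

client = anthropic.Anthropic()

def contextualize_chunk(full_document: str, chunk: str) -> str:
    """Generate a short context situating the chunk in its document,
    and prepend it before the chunk is embedded/indexed."""
    prompt = (
        f"<document>\n{full_document}\n</document>\n"
        f"<chunk>\n{chunk}\n</chunk>\n"
        "In 1-2 sentences, explain how this chunk fits into the overall "
        "document so it can be understood on its own. Reply with the context only."
    )
    context = client.messages.create(
        model="claude-haiku-4-5",  # cheap model for bulk preprocessing
        max_tokens=120,
        messages=[{"role": "user", "content": prompt}],
    ).content[0].text
    return f"{context}\n\n{chunk}"

# The contextualized chunks are then indexed for both embedding-based and
# BM25 lexical search, and the merged candidates are reranked before use.
```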
**Results:**
- 49% reduction in failed retrievals (contextual retrieval alone)
- 67% reduction with contextual retrieval + reranking
- Particularly effective for technical documentation and code
**When to Skip RAG:**
- Knowledge base < 200K tokens (fits in context window)
- With prompt caching, including full docs is cost-effective
**Source:** Anthropic Contextual Retrieval Announcement
---
### 7. Batch Processing Economics (2024)
**Key Finding:** Batch API reduces costs by 50% for non-time-sensitive workloads.
**Use Cases:**
- Periodic reports
- Bulk data analysis
- Non-urgent content generation
- Testing and evaluation
**Combined Savings:**
- Batch processing: 50% cost reduction
- Plus prompt caching: Additional 90% on cached content
- Combined potential: 95% cost reduction vs real-time without caching
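A minimal sketch of submitting such a workload, assuming the Anthropic Python SDK's Message Batches API (IDs, model alias, and prompts are illustrative):
```python
import anthropic

client = anthropic.Anthropic()

batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": f"report-{i}",  # your own identifier for matching results
            "params": {
                "model": "claude-haiku-4-5",
                "max_tokens": 1024,
                "messages": [
                    {"role": "user", "content": f"Summarize weekly report #{i}: ..."}
                ],
            },
        }
        for i in range(100)
    ]
)
# Poll later; batches complete asynchronously (within 24 hours)
print(batch.id, batch.processing_status)
```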
**Source:** Anthropic Batch API Documentation
---
### 8. Model Capability Tiers (2024-2025)
**Research Finding:** Different tasks have optimal model choices based on complexity vs cost.
**Claude Haiku 4.5 (Released Oct 2025):**
- Performance: Comparable to Sonnet 4
- Speed: ~2x faster than Sonnet 4
- Cost: 1/3 of Sonnet 4.5 ($1/$5 per M tokens)
- Best for: High-volume simple tasks, extraction, formatting
**Claude Sonnet 4.5 (Released Sep 2025):**
- Performance: State-of-the-art coding agent (77.2% SWE-bench)
- Sustained attention: 30+ hours on complex tasks
- Cost: $3/$15 per M tokens
- Best for: Most production workloads, balanced use cases
**Claude Opus 4:**
- Performance: Maximum capability
- Cost: $15/$75 per M tokens (5x Sonnet)
- Best for: Novel problems, deep reasoning, research
**Architectural Implication:**
- Orchestrator (Sonnet) + Executor subagents (Haiku) = optimal cost/performance
- Task routing based on complexity assessment
- Dynamic model selection within workflows
**Source:** Anthropic Model Releases, TechCrunch Coverage
---
## Effective Context Engineering (2024)
**Key Research:** Managing attention budget is as important as prompt design.
### The Attention Budget Problem
- LLMs have finite capacity to process and integrate information
- Performance degrades with very long contexts ("lost in the middle")
- The n² pairwise relationships among n tokens strain the attention mechanism
### Solutions:
**1. Compaction**
- Summarize conversation near context limit
- Reinitiate with high-fidelity summary
- Preserve architectural decisions, unresolved bugs, implementation details
- Discard redundant tool outputs
**2. Structured Note-Taking**
- Maintain curated notes about decisions, findings, state
- Reference notes across context windows
- More efficient than reproducing conversation history
**3. Multi-Agent Architecture**
- Distribute work across agents with specialized contexts
- Each maintains focused context on their domain
- Orchestrator coordinates without managing all context
**4. Context Editing (2024)**
- Automatically clear stale tool calls and results
- Preserve conversation flow
- 84% token reduction in 100-turn evaluations
- 29% performance improvement on agentic search tasks
**Source:** Anthropic Engineering Blog - Effective Context Engineering
---
## Agent Architecture Best Practices (2024-2025)
**Research Consensus:** Successful agents follow three core principles.
### 1. Simplicity
- Do exactly what's needed, no more
- Avoid unnecessary abstraction layers
- Frameworks help initially, but production often benefits from basic components
### 2. Transparency
- Show explicit planning steps
- Allow humans to verify reasoning
- Enable intervention when plans seem misguided
- "Agent shows its work" principle
### 3. Careful Tool Crafting
- Thorough tool documentation with examples
- Clear descriptions of when to use each tool
- Tested tool integrations
- Agent-computer interface as first-class design concern
**Anti-Pattern:** Framework-heavy implementations that obscure decision-making
**Recommended Pattern:**
- Start with frameworks for rapid prototyping
- Gradually reduce abstractions for production
- Build with basic components for predictability
**Source:** Anthropic Research - Building Effective Agents
---
## Citations and Source Grounding (2024)
**Research Finding:** Built-in citation capabilities outperform most custom implementations.
**Citations API Benefits:**
- 15% higher recall accuracy vs custom solutions
- Automatic sentence-level chunking
- Precise attribution to source documents
- Critical for legal, academic, financial applications
**Use Cases:**
- Legal research requiring source verification
- Academic writing with proper attribution
- Fact-checking workflows
- Financial analysis with auditable sources
**Source:** Claude Citations API Announcement
---
## Extended Thinking (2024)
**Capability:** Claude can allocate extended token budget for reasoning before responding.
**Key Parameters:**
- Thinking budget: 16K+ tokens recommended for complex tasks
- Configurable based on task complexity
- Trade latency for accuracy on hard problems
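A minimal sketch of enabling this through the API, assuming the Anthropic Python SDK (model alias illustrative; the budget follows the guidance above):
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=20000,  # must be larger than the thinking budget
    thinking={"type": "enabled", "budget_tokens": 16000},
    messages=[{"role": "user", "content": "Plan a migration strategy for ..."}],
)

# The response interleaves thinking blocks with the final answer
for block in response.content:
    if block.type == "thinking":
        pass  # internal reasoning; inspect or log if useful
    elif block.type == "text":
        print(block.text)
```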
**Use Cases:**
- Complex math problems
- Novel coding challenges
- Multi-step reasoning tasks
- Analysis requiring sustained attention
**Combined with Tools (Beta):**
- Alternate between reasoning and tool invocation
- Reason about available tools, invoke, analyze results, adjust reasoning
- More sophisticated than fixed reasoning → execution sequences
**Source:** Claude Extended Thinking Documentation
---
## Community Best Practices (2024-2025)
### Disable Auto-Compact in Claude Code
**Finding:** Auto-compact can consume 45K tokens (22.5% of context window) before coding begins.
**Recommendation:**
- Turn off auto-compact: `/config` → toggle off
- Use `/clear` after 1-3 messages to prevent bloat
- Run `/clear` immediately after disabling to reclaim tokens
- Regain 88.1% of context window for productive work
**Source:** Shuttle.dev Claude Code Best Practices
### CLAUDE.md Curation
**Finding:** Auto-generated CLAUDE.md files are too generic.
**Best Practice:**
- Manually curate project-specific patterns
- Keep under 100 lines per file
- Include non-obvious relationships
- Document anti-patterns to avoid
- Optimize for AI agent understanding, not human documentation
**Source:** Claude Code Best Practices, Anthropic Engineering
### Custom Slash Commands as Infrastructure
**Finding:** Repeated prompting patterns benefit from reusable commands.
**Best Practice:**
- Store in `.claude/commands/` for project-level
- Store in `~/.claude/commands/` for user-level
- Check into version control for team benefit
- Use `$ARGUMENTS` and `$1, $2, etc.` for parameters
- Encode team best practices as persistent infrastructure
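As an illustration (file name and contents are hypothetical), a project-level command is just a Markdown file whose body becomes the prompt:
```
<!-- .claude/commands/review-pr.md -->
Review PR #$ARGUMENTS for this repository:
1. Check naming, error handling, and test coverage
2. Flag security or performance concerns
3. Summarize findings by severity
```
Invoking `/review-pr 142` substitutes `142` for `$ARGUMENTS`.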
**Source:** Claude Code Documentation
---
## Technique Selection Decision Tree (2025 Consensus)
Based on aggregated research and community feedback:
```
Start: Define Task

Complexity?
├─ Simple  → Clarity only
├─ Medium  → Clarity + XML + CoT + Examples
└─ Complex → Clarity + XML + Role + CoT + Tools

Repeated Use?
├─ Yes → Structure for caching
└─ No  → One-off design

Token Budget?
├─ Tight    → Skip CoT
└─ Flexible → Add CoT + Examples

Format Critical?
├─ Yes → + Prefilling + Examples
└─ No  → Skip
```
---
## Measuring Prompt Effectiveness
**Research Recommendation:** Systematic evaluation before and after prompt engineering.
### Metrics to Track
**Accuracy:**
- Correctness of outputs
- Alignment with success criteria
- Error rates
**Consistency:**
- Output format compliance
- Reliability across runs
- Variance in responses
**Cost:**
- Tokens per request
- $ cost per request
- Caching effectiveness
**Latency:**
- Time to first token
- Total response time
- User experience impact
### Evaluation Framework
1. **Baseline:** Measure current prompt performance
2. **Iterate:** Apply one technique at a time
3. **Measure:** Compare metrics to baseline
4. **Keep or Discard:** Retain only improvements
5. **Document:** Record which techniques help for which tasks
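A minimal sketch of step 3 as code; the eval set, scoring rule, and model alias are all illustrative placeholders, assuming the Anthropic Python SDK:
```python
import anthropic

client = anthropic.Anthropic()

# Hypothetical eval set: (input, expected) pairs for your task
EVAL_SET = [
    ("Contact John at john@example.com", "john@example.com"),
    ("Reach Sue via sue@corp.io for details", "sue@corp.io"),
]

def accuracy(prompt_template: str) -> float:
    """Run one prompt variant over the eval set and return its accuracy."""
    correct = 0
    for text, expected in EVAL_SET:
        reply = client.messages.create(
            model="claude-haiku-4-5",
            max_tokens=50,
            messages=[{"role": "user", "content": prompt_template.format(input=text)}],
        ).content[0].text
        correct += int(expected in reply)  # crude scorer; replace per task
    return correct / len(EVAL_SET)

baseline = accuracy("Extract the email address from: {input}")
variant = accuracy("Extract the email address from: {input}\nReturn only the address.")
print(f"baseline={baseline:.2f} variant={variant:.2f}")  # keep the variant only if it wins
```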
**Anti-Pattern:** Applying all techniques without measuring effectiveness
---
## Future Directions (2025 and Beyond)
### Emerging Trends
**1. Agent Capabilities**
- Models maintaining focus for 30+ hours (Sonnet 4.5)
- Improved context awareness and self-management
- Better tool use and reasoning integration
**2. Cost Curve Collapse**
- Haiku 4.5 matches Sonnet 4 at 1/3 cost
- Enables new deployment patterns (parallel subagents)
- Economic feasibility of agent orchestration
**3. Multimodal Integration**
- Vision + text for document analysis
- 60% reduction in document processing time
- Correlation of visual and textual information
**4. Safety and Alignment**
- Research on agentic misalignment
- Importance of human oversight at scale
- System design for ethical constraints
**5. Standardization**
- Model Context Protocol (MCP) for tool integration
- Reduced custom integration complexity
- Ecosystem of third-party tools
---
## Key Takeaways from Research
1. **Simplicity wins**: Start minimal, add complexity only when justified by results
2. **Structure scales**: XML tags become essential as complexity increases
3. **Thinking costs but helps**: 2-3x tokens for reasoning, worth it for analysis
4. **Caching transforms economics**: 90% savings makes long prompts feasible
5. **Placement matters**: Documents before queries, 30% better performance
6. **Tools need docs**: Clear descriptions → correct usage
7. **Agents need transparency**: Show reasoning, enable human verification
8. **Context is finite**: Manage attention budget deliberately
9. **Measure everything**: Remove techniques that don't improve outcomes
10. **Economic optimization**: Right model for right task (Haiku → Sonnet → Opus)
---
## Research Sources
- Anthropic Prompt Engineering Documentation (2024-2025)
- Anthropic Engineering Blog - Context Engineering (2024)
- Anthropic Research - Building Effective Agents (2024)
- Claude Code Best Practices (Anthropic, 2024)
- Shuttle.dev Claude Code Analysis (2024)
- AWS ML Blog - Anthropic Techniques (2024)
- Contextual Retrieval Research (Anthropic, 2024)
- Model Release Announcements (Sonnet 4.5, Haiku 4.5)
- Citations API Documentation (2024)
- Extended Thinking Documentation (2024)
- Community Best Practices (Multiple Sources, 2024-2025)
---
## Keeping Current
**Best Practices:**
- Follow Anthropic Engineering blog for latest research
- Monitor Claude Code documentation updates
- Track community implementations (GitHub, forums)
- Experiment with new capabilities as released
- Measure impact of new techniques on your use cases
**Resources:**
- https://www.anthropic.com/research
- https://www.anthropic.com/engineering
- https://docs.claude.com/
- https://code.claude.com/docs
- Community: r/ClaudeAI, Anthropic Discord
---
## Research-Backed Anti-Patterns
Based on empirical findings, avoid:
**Ignoring Document Placement** - 30% performance loss
**Not Leveraging Caching** - 10x unnecessary costs
**Over-Engineering Simple Tasks** - Worse results + higher cost
**Framework Over-Reliance** - Obscures decision-making
**Skipping Measurement** - Can't validate improvements
**One-Size-Fits-All Prompts** - Suboptimal for specific tasks
**Vague Tool Documentation** - Poor tool selection
**Ignoring Context Budget** - Performance degradation
**No Agent Transparency** - Debugging nightmares
**Wrong Model for Task** - Overpaying or underperforming
---
This research summary reflects the state of Anthropic's prompt engineering best practices as of 2025, incorporating both official research and validated community findings.


@@ -0,0 +1,641 @@
# Prompt Engineering Technique Catalog
Deep dive into each of the 9 core prompt engineering techniques with examples, token costs, and combination strategies.
## Table of Contents
- [1. Clarity and Directness](#1-clarity-and-directness)
- [2. XML Structure](#2-xml-structure)
- [3. Chain of Thought](#3-chain-of-thought)
- [4. Multishot Prompting](#4-multishot-prompting)
- [5. System Prompt (Role Assignment)](#5-system-prompt-role-assignment)
- [6. Prefilling](#6-prefilling)
- [7. Long Context Optimization](#7-long-context-optimization)
- [8. Context Budget Management](#8-context-budget-management)
- [9. Tool Documentation](#9-tool-documentation)
- [Technique Combination Matrix](#technique-combination-matrix)
- [Decision Framework](#decision-framework)
- [Common Patterns](#common-patterns)
- [Measuring Effectiveness](#measuring-effectiveness)
---
## 1. Clarity and Directness
### What It Is
Clear, explicit instructions that state objectives precisely, including scope and success criteria in unambiguous terms.
### When to Use
**ALWAYS.** This is the foundational technique that improves responses across virtually all scenarios.
### Token Cost
Minimal - typically 20-50 tokens for clear instructions.
### Examples
**Before (Vague):**
```
Tell me about this document.
```
**After (Clear):**
```
Extract the key financial metrics from this quarterly report, focusing on:
- Revenue growth (YoY %)
- Gross margin
- Operating cash flow
Present each metric in the format: [Metric Name]: [Value] [Trend]
```
### Why It Works
Specificity allows Claude to understand exactly what's needed and focus reasoning on relevant aspects.
### Combination Strategies
- Pairs with ALL techniques - always start here
- Essential foundation for XML structure (what goes in each section)
- Guides chain of thought (what to reason about)
- Clarifies multishot examples (what pattern to match)
---
## 2. XML Structure
### What It Is
Using XML tags to create hard structural boundaries within prompts, separating instructions, context, examples, and formatting requirements.
### When to Use
- Complex prompts with multiple sections
- Risk of instruction leakage (user input mixed with instructions)
- Structured data tasks
- Long prompts where sections need clear delineation
### Token Cost
~50-100 tokens overhead for tag structure.
### Examples
**Before (Mixed):**
```
You're a code reviewer. Look at this code and check for security issues, performance problems, and best practices. Here's the code: [code]. Format your response as bullet points.
```
**After (Structured):**
```xml
<instructions>
You are a code reviewer. Analyze the code for:
- Security vulnerabilities
- Performance issues
- Best practice violations
</instructions>
<code>
[code content]
</code>
<formatting>
Return findings as bullet points, organized by category.
</formatting>
```
### Why It Works
Claude has been fine-tuned to pay special attention to XML tags, preventing confusion between different types of information.
### Combination Strategies
- Use with long context (separate documents with `<document>` tags)
- Pair with examples (`<examples>` section)
- Combine with prefilling (structure output format)
### Skip When
- Simple single-section prompts
- Token budget is extremely tight
- User input doesn't risk instruction leakage
---
## 3. Chain of Thought
### What It Is
Encouraging step-by-step reasoning before providing final answers. Implemented via phrases like "Think step by step" or explicit `<thinking></thinking>` tags.
### When to Use
- Analysis tasks
- Multi-step reasoning
- Math problems
- Complex decision-making
- Debugging
- Tasks where intermediate steps matter
### Token Cost
2-3x output tokens (thinking + final answer).
### Examples
**Before:**
```
What's the root cause of this bug?
```
**After:**
```
Analyze this bug. Think step by step:
1. What is the error message telling us?
2. What code is involved in the stack trace?
3. What are the possible causes?
4. Which cause is most likely given the context?
Then provide your conclusion about the root cause.
```
**Or with structured thinking:**
```
Analyze this bug and provide:
<thinking>
Your step-by-step analysis here
</thinking>
<conclusion>
Root cause and fix
</conclusion>
```
### Why It Works
Breaking down reasoning into steps improves accuracy and makes the decision-making process transparent and verifiable.
### Combination Strategies
- Essential for complex tasks even with other techniques
- Pair with XML structure to separate thinking from output
- Works well with long context (reason about documents)
- Combine with examples showing reasoning process
### Skip When
- Simple extraction or lookup tasks
- Format conversion
- Tasks with obvious single-step answers
- Token budget is critical concern
---
## 4. Multishot Prompting
### What It Is
Providing 2-5 examples of input → desired output to demonstrate patterns.
### When to Use
- Specific formatting requirements
- Pattern learning tasks
- Subtle output nuances
- Structured data extraction
- Style matching
### Token Cost
200-1000 tokens per example (depends on complexity).
### Examples
**Before:**
```
Extract product information from these descriptions.
```
**After:**
```
Extract product information from descriptions. Format as JSON.
Examples:
Input: "Premium leather wallet, black, RFID blocking, $49.99"
Output: {"name": "Premium leather wallet", "color": "black", "features": ["RFID blocking"], "price": 49.99}
Input: "Wireless earbuds, noise cancelling, 24hr battery, multiple colors available"
Output: {"name": "Wireless earbuds", "color": "multiple", "features": ["noise cancelling", "24hr battery"], "price": null}
Now extract from: [your input]
```
### Why It Works
Examples teach patterns more effectively than textual descriptions, especially for format and style.
### Combination Strategies
- Wrap examples in `<examples>` XML tags for clarity
- Show chain of thought in examples if reasoning is complex
- Include edge cases in examples
- Can combine with prefilling to start the response
### Skip When
- Task is self-explanatory
- Examples would be trivial or redundant
- Token budget is constrained
- One-off task where setup cost isn't worth it
---
## 5. System Prompt (Role Assignment)
### What It Is
Using the system parameter to assign Claude a specific role, expertise area, or perspective.
### When to Use
- Domain-specific tasks (medical, legal, technical)
- Tone or style requirements
- Perspective-based analysis
- Specialized workflows
### Token Cost
Minimal (20-100 tokens, caches extremely well).
### Examples
**Generic:**
```
Analyze this code for security issues.
```
**With Role:**
```
System: You are a senior security engineer with 15 years of experience in application security. You specialize in identifying OWASP Top 10 vulnerabilities and secure coding practices.
User: Analyze this code for security issues.
```
### Why It Works
Roles frame Claude's approach and leverage domain-specific patterns from training data.
### Combination Strategies
- Almost always use with other techniques
- Particularly powerful with chain of thought (expert reasoning)
- Helps with multishot examples (expert demonstrates)
- Define constraints in system prompt (tools, approach)
### Skip When
- Generic tasks requiring no specific expertise
- Role would be artificial or unhelpful
- You want flexibility in perspective
---
## 6. Prefilling
### What It Is
Providing the start of Claude's response to guide format and skip preambles.
### When to Use
- Strict format requirements (JSON, XML, CSV)
- Want to skip conversational preambles
- Need consistent output structure
- Automated parsing of responses
### Token Cost
Minimal (5-20 tokens typically).
### Examples
**Without Prefilling:**
```
User: Extract data as JSON
Claude: Sure! Here's the data in JSON format:
{
"data": ...
```
**With Prefilling:**
```
User: Extract data as JSON
Assistant (prefilled): {
Claude (continues): "data": ...
```
### Why It Works
Forces Claude to continue from the prefilled content, ensuring format compliance and skipping unnecessary text.
### Combination Strategies
- Combine with XML structure (prefill to skip tags)
- Use with multishot (prefill the pattern shown)
- Pair with system role (prefill expert format)
### Skip When
- Conversational tone is desired
- Explanation or context is valuable
- Format is flexible
### Technical Notes
- Prefill cannot end with trailing whitespace
- Works in both API and conversational interfaces
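In the Messages API, prefilling means ending the `messages` list with a partial assistant turn that Claude continues. A minimal sketch, assuming the Anthropic Python SDK (model alias illustrative):
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=300,
    messages=[
        {
            "role": "user",
            "content": 'Convert to JSON: "Customer John Smith, ID 12345, ordered 3 items for $150"',
        },
        {"role": "assistant", "content": "{"},  # prefill: Claude continues from here
    ],
)
# The reply continues after the prefill, so prepend it when parsing
print("{" + response.content[0].text)
```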
---
## 7. Long Context Optimization
### What It Is
Specific strategies for handling 20K+ token inputs effectively, including document placement, XML structure, and quote grounding.
### When to Use
- Processing multiple documents
- Analyzing long technical documents
- Research across many sources
- Complex data-rich tasks
### Token Cost
No additional cost - improves accuracy for same tokens.
### Key Strategies
**1. Document Placement**
Place long documents BEFORE queries and instructions:
```xml
<document>
[Long document 1]
</document>
<document>
[Long document 2]
</document>
<instructions>
Analyze these documents for X
</instructions>
```
**2. Metadata Tagging**
```xml
<document>
<source>quarterly-report-q3-2024.pdf</source>
<type>financial</type>
<content>
[document content]
</content>
</document>
```
**3. Quote Grounding**
"First, quote the relevant section from the document. Then provide your analysis."
### Why It Works
- Placement: 30% better performance in evaluations
- Tags: Help Claude organize and retrieve information
- Quoting: Forces attention to specific relevant text
### Combination Strategies
- Essential with XML structure for multi-document tasks
- Pair with chain of thought (reason about documents)
- Use with system role (expert document analyst)
### Skip When
- Short prompts (<5K tokens)
- Single focused document
- Simple extraction tasks
---
## 8. Context Budget Management
### What It Is
Optimizing for repeated prompts through caching and managing attention budget across long conversations.
### When to Use
- Repeated prompts with stable content
- Long conversations
- System prompts that don't change
- Reference documentation that's reused
### Token Cost
Caching: 90% cost reduction on cached content
- Write: 125% of standard cost (25% premium on the first write)
- Read: 10% of standard cost
### Strategies
**1. Prompt Caching**
Structure prompts so stable content is cached:
```
[System prompt - caches]
[Reference docs - caches]
[User query - doesn't cache]
```
**2. Context Windowing**
For long conversations, periodically summarize and reset context.
**3. Structured Memory**
Use the memory tool to persist information across context windows.
### Examples
**Cacheable Structure:**
```xml
<system>
You are a code reviewer. [full guidelines]
</system>
<style_guide>
[Company style guide - 10K tokens]
</style_guide>
<user_query>
Review this PR: [specific PR]
</user_query>
```
The system prompt and style guide cache, only the user query changes.
### Why It Works
- Caching: Dramatically reduces cost for repeated content
- Windowing: Prevents context overflow and performance degradation
- Memory: Enables projects longer than context window
### Combination Strategies
- Structure with XML to create cacheable boundaries
- Use with long context tips for large documents
- Pair with system prompts (highly cacheable)
### Skip When
- One-off queries
- Content changes every call
- Short prompts where caching overhead isn't worth it
---
## 9. Tool Documentation
### What It Is
Clear, detailed descriptions of tools/functions including when to use them, parameter schemas, and examples.
### When to Use
- Function calling / tool use
- Agent workflows
- API integrations
- Multi-step automated tasks
### Token Cost
100-500 tokens per tool definition.
### Examples
**Poor Tool Definition:**
```json
{
  "name": "search",
  "description": "Search for something",
  "parameters": {
    "query": "string"
  }
}
```
**Good Tool Definition:**
```json
{
  "name": "semantic_search",
  "description": "Search internal knowledge base using semantic similarity. Use this when the user asks questions about company policies, products, or documentation. Returns top 5 most relevant passages.",
  "input_schema": {
    "type": "object",
    "properties": {
      "query": {
        "type": "string",
        "description": "Natural language search query. Be specific and include key terms. Example: 'vacation policy for employees with 3 years tenure'"
      },
      "max_results": {
        "type": "integer",
        "description": "Number of results to return (1-10). Default: 5",
        "default": 5
      }
    },
    "required": ["query"]
  }
}
```
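A hedged sketch of passing this definition to the Messages API and reading back a tool call, assuming the Anthropic Python SDK (model alias illustrative):
```python
import anthropic

client = anthropic.Anthropic()

semantic_search_tool = {
    "name": "semantic_search",
    "description": "Search internal knowledge base using semantic similarity. "
                   "Use for questions about company policies, products, or documentation.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Natural language search query."},
            "max_results": {"type": "integer", "description": "Number of results (1-10).", "default": 5},
        },
        "required": ["query"],
    },
}

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    tools=[semantic_search_tool],
    messages=[{"role": "user", "content": "What is the vacation policy after 3 years of tenure?"}],
)

# If Claude decided to call the tool, the response contains a tool_use block
for block in response.content:
    if block.type == "tool_use":
        print(block.name, block.input)  # e.g. semantic_search {'query': '...'}
```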
### Why It Works
Clear tool descriptions help Claude:
- Know when to invoke the tool
- Understand what parameters to provide
- Format parameters correctly
- Choose between multiple tools
### Best Practices
**Description Field:**
- What the tool does
- When to use it
- What it returns
- Keywords/scenarios
**Parameter Schemas:**
- Clear descriptions
- Type definitions
- Enums for fixed values
- Examples of valid inputs
- Defaults where applicable
### Combination Strategies
- Use with system role (define tool strategy)
- Pair with chain of thought (reason about tool choice)
- Combine with examples (show successful tool use)
### Skip When
- No tool use involved
- Single obvious tool
- Tools are self-explanatory
---
## Technique Combination Matrix
| Primary Technique | Works Well With | Avoid Combining With |
|------------------|-----------------|---------------------|
| Clarity | Everything | N/A - always use |
| XML Structure | Long Context, Examples, Caching | Simple single-section prompts |
| Chain of Thought | XML, Role, Long Context | Simple extraction (unnecessary) |
| Multishot | XML, Prefilling | Overly simple tasks |
| System Role | Chain of Thought, Tools | Generic tasks |
| Prefilling | XML, Multishot | Conversational outputs |
| Long Context | XML, Quoting, Caching | Short prompts |
| Context Budget | XML, System Prompts | One-off queries |
| Tool Docs | Role, Examples | No tool use |
---
## Decision Framework
```
Start Here
1. Always apply CLARITY
2. Assess prompt length:
< 5K tokens → Skip long context tips
> 20K tokens → Apply long context optimization
3. Check if repeated:
Yes → Structure for caching
No → Skip cache optimization
4. Does it need reasoning?
Yes → Add chain of thought
No → Skip (save 2-3x tokens)
5. Is format subtle or specific?
Yes → Add examples or prefilling
No → Skip
6. Is it complex or has sections?
Yes → Use XML structure
No → Keep simple
7. Does domain expertise help?
Yes → Assign role in system prompt
No → Skip
8. Does it involve tools?
Yes → Write detailed tool docs
No → Skip
Final Check: Is every technique justified?
```
---
## Common Patterns
### Pattern 1: Simple Extraction
- Clarity ✓
- XML (maybe, if multi-section)
- Everything else: Skip
### Pattern 2: Analysis Task
- Clarity ✓
- Chain of Thought ✓
- XML Structure ✓
- System Role ✓
- Long Context (if large input) ✓
### Pattern 3: Format Conversion
- Clarity ✓
- Multishot Examples ✓
- Prefilling ✓
- XML (maybe)
### Pattern 4: Agent Workflow
- Clarity ✓
- System Role ✓
- Tool Documentation ✓
- Chain of Thought ✓
- Context Budget Management ✓
- XML Structure ✓
### Pattern 5: Repeated Queries
- Clarity ✓
- System Role ✓
- Context Budget Management ✓
- XML Structure (for cache boundaries) ✓
- Other techniques as needed
---
## Measuring Effectiveness
For each technique, track:
- **Accuracy**: Does output quality improve?
- **Token Cost**: What's the overhead?
- **Latency**: Does response time increase?
- **Consistency**: Are results more reliable?
Remove techniques that don't improve outcomes for your specific use case.