Initial commit

Zhongwei Li
2025-11-29 18:20:33 +08:00
commit 977fbf5872
27 changed files with 5714 additions and 0 deletions


@@ -0,0 +1,186 @@
---
name: engineering-prompts
description: Engineers effective prompts using systematic methodology. Use when designing prompts for Claude, optimizing existing prompts, or balancing simplicity, cost, and effectiveness. Applies progressive disclosure and empirical validation to prompt development.
---
# Engineering Prompts
---
## LEVEL 1: QUICKSTART ⚡
**5-Step Prompt Creation:**
1. **Start Clear**: Explicit instructions + success criteria
2. **Assess Need**: Does it need structure? Examples? Reasoning?
3. **Add Sparingly**: Only techniques that improve outcomes
4. **Estimate Cost**: Count tokens, identify caching opportunities
5. **Test & Iterate**: Measure effectiveness, refine based on results
---
## LEVEL 2: CORE PHILOSOPHY 🎯
### The Three Principles
**Simplicity First**
- Start with minimal prompt
- Add complexity only when empirically justified
- More techniques ≠ better results
**Cost Awareness**
- Minimize token usage
- Leverage prompt caching (90% savings on repeated content)
- Batch processing for non-urgent work (50% savings)
**Effectiveness**
- Techniques must improve outcomes for YOUR use case
- Measure impact, don't just apply best practices
- Iterate based on results
---
## LEVEL 3: THE 9 TECHNIQUES 🛠️
### Quick Reference
| Technique | When to Use | Cost Impact |
|-----------|------------|-------------|
| **1. Clarity** | ALWAYS | Minimal, max impact |
| **2. XML Structure** | Complex prompts, instruction leakage | ~50-100 tokens |
| **3. Chain of Thought** | Reasoning, analysis, math | 2-3x output tokens |
| **4. Multishot Examples** | Pattern learning, format guidance | 200-1K tokens each |
| **5. System Role** | Domain expertise needed | Minimal (caches well) |
| **6. Prefilling** | Strict format requirements | Minimal |
| **7. Long Context** | 20K+ token inputs | None (improves accuracy) |
| **8. Context Budget** | Repeated use, long conversations | 90% savings with cache |
| **9. Tool Docs** | Function calling, agents | 100-500 tokens per tool |
---
## LEVEL 4: DESIGN FRAMEWORK 📋
### D - Define Requirements
**Questions to Answer:**
- Core task?
- Output format?
- Constraints (latency/cost/accuracy)?
- One-off or repeated?
### E - Estimate Complexity
**Simple:**
- Extraction, formatting
- Simple Q&A
- Clear right answer
**Medium:**
- Analysis with reasoning
- Code generation
- Multi-step but clear
**Complex:**
- Deep reasoning
- Novel problem-solving
- Research synthesis
### S - Start Simple
**Minimal Viable Prompt:**
1. Clear instruction
2. Success criteria
3. Output format
Test first. Add complexity only if underperforming.
### I - Iterate Selectively
**Add techniques based on gaps:**
- Unclear outputs → More clarity, examples
- Wrong structure → XML tags, prefilling
- Shallow reasoning → Chain of thought
- Pattern misses → Multishot examples
### G - Guide on Cost
**Cost Optimization:**
- Cache system prompts, reference docs (90% savings)
- Batch non-urgent work (50% savings)
- Minimize token usage through clear, concise instructions
### N - Note Implementation
**Deliverables:**
- The optimized prompt
- Techniques applied + rationale
- Techniques skipped + why
- Token estimate
- Caching strategy
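For the token estimate, a draft prompt can be measured before it ships. A minimal sketch, assuming the Anthropic Python SDK's count_tokens endpoint (model alias and prompt text are illustrative):
```python
import anthropic

client = anthropic.Anthropic()

# Measure the input size of a draft prompt before committing to it
count = client.messages.count_tokens(
    model="claude-sonnet-4-5",
    system="You are a senior software engineer conducting a code review.",
    messages=[{"role": "user", "content": "Review the following code: ..."}],
)
print(count.input_tokens)  # compare against your token budget and caching plan
```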
---
## LEVEL 5: ADVANCED TOPICS 🚀
### Tool Integration
**When to use MCP tools during prompt engineering:**
```
Need latest practices?
└─ mcp__plugin_essentials_perplexity
Complex analysis needed?
└─ mcp__plugin_essentials_sequential-thinking
Need library docs?
└─ mcp__plugin_essentials_context7
```
### Context Management
**Prompt Caching:**
- Cache: System prompts, reference docs, examples
- Savings: 90% on cached content
- Write: 125% of standard cost (25% premium on the first write)
- Read: 10% of standard cost
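A minimal sketch of how this structure maps onto an API call, assuming the Anthropic Python SDK's Messages API (model alias and policy text are illustrative placeholders):
```python
import anthropic

client = anthropic.Anthropic()

POLICY_DOCS = "..."  # large, stable reference content (e.g. the full policy document)

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    system=[
        {"type": "text", "text": "You are a customer service assistant for Acme Corp."},
        {
            "type": "text",
            "text": POLICY_DOCS,
            # Everything up to this marker is cached and reused across requests
            "cache_control": {"type": "ephemeral"},
        },
    ],
    messages=[
        # Only the user query changes between requests
        {"role": "user", "content": "What's the return policy for electronics?"},
    ],
)
print(response.content[0].text)
```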
**Long Context Tips:**
- Place documents BEFORE queries
- Use XML tags: `<document>`, `<source>`
- Ground responses in quotes
- 30% better performance with proper structure
### Token Optimization
**Reducing Token Usage:**
- Concise, clear instructions (no fluff)
- Reuse examples across calls (cache them)
- Structured output reduces back-and-forth
- Tool use instead of long context when possible
### Anti-Patterns
**Over-engineering** - All 9 techniques for simple task
**Premature optimization** - Complexity before testing simple
**Vague instructions** - "Analyze this" without specifics
**No examples** - Expecting format inference
**Missing structure** - Long prompts without XML
**Ignoring caching** - Not leveraging repeated content
**Stop here unless:** you need deeper implementation details; Level 6 below points to the reference docs.
---
## LEVEL 6: REFERENCES 📚
### Deep Dive Documentation
**Detailed Technique Catalog:**
- `reference/technique-catalog.md` - Each technique explained with examples, token costs, combination strategies
**Real-World Examples:**
- `reference/examples.md` - Before/after pairs for coding, analysis, extraction, agent tasks
**Research Papers:**
- `reference/research.md` - Latest Anthropic research, benchmarks, best practices evolution


@@ -0,0 +1,648 @@
# Prompt Engineering Examples
Before/after examples across different use cases demonstrating the application of prompt engineering techniques.
## Table of Contents
- [Example 1: Code Review](#example-1-code-review)
- [Example 2: Data Extraction](#example-2-data-extraction)
- [Example 3: Bug Analysis](#example-3-bug-analysis)
- [Example 4: Long Document Analysis](#example-4-long-document-analysis)
- [Example 5: Agent Workflow with Tools](#example-5-agent-workflow-with-tools)
- [Example 6: Repeated Queries with Caching](#example-6-repeated-queries-with-caching)
- [Example 7: Format Conversion with Prefilling](#example-7-format-conversion-with-prefilling)
- [Example 8: Simple Task (Minimal Techniques)](#example-8-simple-task-minimal-techniques)
- [Complexity Progression](#complexity-progression)
- [Anti-Pattern Examples](#anti-pattern-examples)
- [Key Takeaways](#key-takeaways)
- [Practice Exercise](#practice-exercise)
---
## Example 1: Code Review
### Before (Poor)
```
Review this code.
```
**Issues:**
- Vague - what aspects to review?
- No format guidance
- No success criteria
### After (Optimized)
```xml
<role>
You are a senior software engineer conducting a code review.
</role>
<instructions>
Review the following code for:
1. Security vulnerabilities (SQL injection, XSS, auth issues)
2. Performance problems (N+1 queries, inefficient algorithms)
3. Code quality (naming, duplication, complexity)
For each issue found, provide:
- Severity: Critical/Warning/Suggestion
- Location: File and line number
- Problem: What's wrong
- Fix: Specific code change
</instructions>
<code>
[Code to review]
</code>
<thinking>
Analyze the code systematically for each category before providing your review.
</thinking>
```
**Techniques Applied:**
- Clarity: Specific review categories and output format
- XML Structure: Separate role, instructions, code
- System Role: Senior software engineer
- Chain of Thought: Explicit thinking step
**Cost:** ~300 tokens → 2-3x output tokens for thinking
**Benefit:** Comprehensive, structured reviews with clear action items
---
## Example 2: Data Extraction
### Before (Poor)
```
Get the important information from this document.
```
**Issues:**
- "Important" is subjective
- No format specified
- No examples of desired output
### After (Optimized)
```xml
<instructions>
Extract the following fields from the customer support ticket:
- Customer ID
- Issue category
- Priority level
- Requested action
Return as JSON.
</instructions>
<examples>
Input: "Customer #12345 reporting login issues. High priority. Need password reset."
Output: {
"customer_id": "12345",
"issue_category": "login",
"priority": "high",
"requested_action": "password_reset"
}
Input: "User Jane Smith can't access reports module. Not urgent. Investigate permissions."
Output: {
"customer_id": null,
"issue_category": "access_control",
"priority": "low",
"requested_action": "investigate_permissions"
}
</examples>
<ticket>
[Actual ticket content]
</ticket>
```
**Techniques Applied:**
- Clarity: Specific fields to extract
- XML Structure: Separate sections
- Multishot Examples: Two examples showing pattern and edge cases
- Prefilling: Could add `{` to start JSON response
**Cost:** ~400 tokens (200 per example)
**Benefit:** Consistent structured extraction, handles null values correctly
---
## Example 3: Bug Analysis
### Before (Poor)
```
Why is this code broken?
```
**Issues:**
- No systematic approach
- No context about symptoms
- No guidance on depth of analysis
### After (Optimized)
```xml
<role>
You are an expert debugger specializing in root cause analysis.
</role>
<context>
Error message: TypeError: Cannot read property 'length' of undefined
Stack trace: [stack trace]
Recent changes: Added pagination feature
</context>
<instructions>
Analyze this bug systematically:
<thinking>
1. What does the error message tell us?
2. Which code path leads to this error?
3. What are the possible causes?
4. Which cause is most likely given recent changes?
5. What would fix the root cause?
</thinking>
Then provide:
- Root cause explanation
- Specific code fix
- Prevention strategy
</instructions>
<code>
[Relevant code]
</code>
```
**Techniques Applied:**
- Clarity: Systematic analysis steps
- XML Structure: Separate role, context, instructions, code
- Chain of Thought: Explicit 5-step thinking process
- System Role: Expert debugger
**Cost:** ~250 tokens → 2-3x output for thinking
**Benefit:** Root cause identification, not just symptom fixes
---
## Example 4: Long Document Analysis
### Before (Poor)
```
Summarize these reports.
[Document 1]
[Document 2]
[Document 3]
```
**Issues:**
- Documents after query (poor placement)
- No structure for multiple documents
- No guidance on what to summarize
### After (Optimized)
```xml
<document id="1">
<source>Q1-2024-financial-report.pdf</source>
<type>financial</type>
<content>
[Full document 1 - 15K tokens]
</content>
</document>
<document id="2">
<source>Q2-2024-financial-report.pdf</source>
<type>financial</type>
<content>
[Full document 2 - 15K tokens]
</content>
</document>
<document id="3">
<source>Q3-2024-financial-report.pdf</source>
<type>financial</type>
<content>
[Full document 3 - 15K tokens]
</content>
</document>
<instructions>
Analyze these quarterly financial reports:
1. First, quote the revenue and profit figures from each report
2. Then calculate and explain the trends across quarters
3. Finally, identify any concerning patterns or notable achievements
Present findings as:
- Trend Analysis: [Overall trends with percentages]
- Concerns: [Issues to watch]
- Achievements: [Positive developments]
</instructions>
```
**Techniques Applied:**
- Long Context Optimization: Documents BEFORE query
- XML Structure: Structured document metadata
- Quote Grounding: Explicit instruction to quote first
- Clarity: Specific analysis steps and output format
**Cost:** Same tokens, better accuracy (~30% improvement)
**Benefit:** Accurate multi-document analysis with proper attribution
---
## Example 5: Agent Workflow with Tools
### Before (Poor)
```
Tools:
- search(query)
- calculate(expression)
Answer user questions.
```
**Issues:**
- Vague tool descriptions
- No parameter guidance
- No strategy for tool selection
### After (Optimized)
```xml
<role>
You are a research assistant helping users find and analyze information.
</role>
<tools>
<tool>
Name: semantic_search
Description: Search our internal knowledge base using semantic similarity. Use this when users ask about company policies, products, or internal documentation. Returns the 5 most relevant passages with source citations.
Parameters:
- query (string, required): Natural language search query. Be specific and include key terms.
Example: "vacation policy for employees with 3+ years tenure"
- max_results (integer, optional): Number of results (1-10). Default: 5
When to use: User asks about internal information, policies, or product details
</tool>
<tool>
Name: calculate
Description: Evaluate mathematical expressions safely. Supports basic arithmetic, percentages, and common functions (sqrt, pow, etc.). Use when users request calculations or when analysis requires math.
Parameters:
- expression (string, required): Mathematical expression to evaluate
Example: "(1500 * 0.15) + 200"
When to use: User asks for calculations, percentage changes, or numerical analysis
</tool>
</tools>
<workflow>
1. Understand user intent
2. Determine if tools are needed:
- Information needs → semantic_search
- Math needs → calculate
- Both → search first, then calculate
3. Use tool results to form your response
4. Cite sources when using search results
</workflow>
<thinking>
For each user query, reason through:
- What information or calculation is needed?
- Which tool(s) would help?
- In what order should I use them?
</thinking>
```
**Techniques Applied:**
- Clarity: Detailed tool descriptions with examples
- XML Structure: Organized tool documentation
- System Role: Research assistant
- Tool Documentation: When to use, parameters, examples
- Chain of Thought: Reasoning about tool selection
**Cost:** ~600 tokens for tool docs
**Benefit:** Correct tool selection, proper parameter formatting, strategic tool use
---
## Example 6: Repeated Queries with Caching
### Before (Poor)
```
User: What's the return policy?
System: [Sends entire 50-page policy document + query every time]
```
**Issues:**
- Massive token waste on repeated content
- No caching strategy
- High cost per query
### After (Optimized)
```xml
<system_prompt>
You are a customer service assistant for Acme Corp. Your role is to answer policy questions accurately and concisely, always citing the specific policy section.
</system_prompt>
<company_policies>
[Full 50-page policy document - 40K tokens]
[This section is stable and will be cached]
</company_policies>
<interaction_guidelines>
- Answer clearly and directly
- Cite specific policy sections
- If policy doesn't cover the question, say so
- Be friendly but professional
</interaction_guidelines>
<!-- Everything above caches across requests -->
<!-- Only the user query below changes -->
<user_query>
What's the return policy for electronics?
</user_query>
```
**Techniques Applied:**
- Context Budget Management: Structure for caching
- XML Structure: Create cache boundaries
- System Role: Customer service assistant
- Long Context: Large policy document
**Cost Savings:**
- First call: 40K tokens input (cache write: 25% premium over standard input cost)
- Subsequent calls: 40K tokens cached (read from cache: 10% cost)
- Savings: 90% on cached content
**Benefit:** roughly $0.12 → $0.01 of input cost per query (about a 10x reduction)
---
## Example 7: Format Conversion with Prefilling
### Before (Poor)
```
Convert this to JSON: "Customer John Smith, ID 12345, ordered 3 items for $150"
```
**Response:**
```
Sure! Here's the information in JSON format:
{
"customer_name": "John Smith",
"customer_id": "12345",
"item_count": 3,
"total": 150
}
```
**Issues:**
- Unnecessary preamble
- Format might vary
- Extra tokens in output
### After (Optimized)
```
<instructions>
Convert customer orders to JSON with these fields:
- customer_name
- customer_id
- item_count
- total_amount
</instructions>
<input>
Customer John Smith, ID 12345, ordered 3 items for $150
</input>
```
**With Prefilling:**
```
Assistant: {
```
**Response:**
```json
{
"customer_name": "John Smith",
"customer_id": "12345",
"item_count": 3,
"total_amount": 150
}
```
**Techniques Applied:**
- Clarity: Specific field names
- XML Structure: Separate instructions and input
- Prefilling: Start with `{` to force JSON format
**Cost:** Saves ~15 tokens per response (preamble)
**Benefit:** Consistent format, easier parsing, cost savings at scale
---
## Example 8: Simple Task (Minimal Techniques)
### Scenario
Format phone numbers consistently.
### Optimized Prompt
```
Format this phone number in E.164 international format:
(555) 123-4567
Expected: +15551234567
```
**Techniques Applied:**
- Clarity: Specific format with example
**Techniques Skipped:**
- XML Structure: Single-section prompt, unnecessary
- Chain of Thought: Trivial task
- Examples: One is enough
- System Role: No expertise needed
- Long Context: Short input
- Caching: One-off query
**Cost:** ~30 tokens
**Benefit:** Simple, effective, minimal overhead
**Key Lesson:** Not every technique belongs in every prompt. Simple tasks deserve simple prompts.
---
## Complexity Progression
### Level 1: Simple (Haiku)
```
Extract the email address from: "Contact John at john@example.com"
```
- Just clarity
- ~15 tokens
- Obvious single answer
### Level 2: Medium (Sonnet)
```xml
<instructions>
Analyze this code for potential bugs:
1. Logic errors
2. Edge cases not handled
3. Type safety issues
</instructions>
<code>
[Code snippet]
</code>
```
- Clarity + XML structure
- ~100 tokens
- Requires some analysis
### Level 3: Complex (Sonnet with Thinking)
```xml
<role>
You are a security researcher analyzing potential vulnerabilities.
</role>
<instructions>
Analyze this authentication system for security vulnerabilities.
<thinking>
1. What are the authentication flows?
2. Where could an attacker bypass auth?
3. Are credentials handled securely?
4. What about session management?
5. Are there injection risks?
</thinking>
Then provide:
- Vulnerabilities found (severity + location)
- Exploitation scenarios
- Remediation steps
</instructions>
<code>
[Auth system code]
</code>
```
- Clarity + XML + Role + Chain of Thought
- ~350 tokens
- Complex security analysis
---
## Anti-Pattern Examples
### Anti-Pattern 1: Over-Engineering Simple Task
```xml
<role>
You are a world-class expert in string manipulation with 20 years of experience.
</role>
<instructions>
Convert the following text to uppercase.
<thinking>
1. What is the input text?
2. What transformation is needed?
3. Are there special characters?
4. What encoding should we use?
5. Should we preserve whitespace?
</thinking>
Then apply the transformation systematically.
</instructions>
<examples>
Input: "hello"
Output: "HELLO"
Input: "world"
Output: "WORLD"
</examples>
<input>
convert this
</input>
```
**Problem:** Simple task with 200+ token overhead
**Fix:** Just say "Convert to uppercase: convert this"
### Anti-Pattern 2: No Structure for Complex Task
```
I have these 5 documents about different topics and I want you to find common themes and also identify contradictions and create a summary with citations and also rate the quality of each source and explain the methodology you used.
[Document 1 - 10K tokens]
[Document 2 - 10K tokens]
[Document 3 - 10K tokens]
[Document 4 - 10K tokens]
[Document 5 - 10K tokens]
```
**Problems:**
- Run-on instructions
- Documents AFTER query (poor placement)
- No structure
- Multiple tasks crammed together
**Fix:** Use XML structure, place documents first, separate concerns
---
## Key Takeaways
1. **Match complexity to task**: Simple tasks → simple prompts
2. **Start minimal**: Add techniques only when justified
3. **Structure scales**: XML becomes essential with complexity
4. **Examples teach patterns**: Better than description for formats
5. **Thinking improves reasoning**: But costs 2-3x tokens
6. **Caching saves money**: Structure for reuse
7. **Placement matters**: Documents before queries
8. **Tools need docs**: Clear descriptions → correct usage
9. **Measure effectiveness**: Remove techniques that don't help
10. **Every token counts**: Justify each addition
---
## Practice Exercise
Improve this prompt:
```
Analyze the data and tell me what's interesting.
[CSV with 1000 rows of sales data]
```
Consider:
- What's "interesting"? Define it.
- What analysis steps are needed?
- What format should output take?
- Does it need examples?
- Would thinking help?
- Should data be structured?
- What about cost optimization?
Try building an optimized version using appropriate techniques.


@@ -0,0 +1,554 @@
# Prompt Engineering Research & Best Practices
Latest findings from Anthropic research and community best practices for prompt engineering with Claude models.
## Table of Contents
- [Anthropic's Core Research Findings](#anthropics-core-research-findings)
- [Effective Context Engineering (2024)](#effective-context-engineering-2024)
- [Agent Architecture Best Practices (2024-2025)](#agent-architecture-best-practices-2024-2025)
- [Citations and Source Grounding (2024)](#citations-and-source-grounding-2024)
- [Extended Thinking (2024)](#extended-thinking-2024)
- [Community Best Practices (2024-2025)](#community-best-practices-2024-2025)
- [Technique Selection Decision Tree (2025 Consensus)](#technique-selection-decision-tree-2025-consensus)
- [Measuring Prompt Effectiveness](#measuring-prompt-effectiveness)
- [Future Directions (2025 and Beyond)](#future-directions-2025-and-beyond)
- [Key Takeaways from Research](#key-takeaways-from-research)
- [Research Sources](#research-sources)
- [Keeping Current](#keeping-current)
- [Research-Backed Anti-Patterns](#research-backed-anti-patterns)
---
## Anthropic's Core Research Findings
### 1. Prompt Engineering vs Fine-Tuning (2024-2025)
**Key Finding:** Prompt engineering is preferable to fine-tuning for most use cases.
**Advantages:**
- **Speed**: Nearly instantaneous results vs hours/days for fine-tuning
- **Cost**: Uses base models, no GPU resources required
- **Flexibility**: Rapid experimentation and quick iteration
- **Data Requirements**: Works with few-shot or zero-shot learning
- **Knowledge Preservation**: Avoids catastrophic forgetting of general capabilities
- **Transparency**: Prompts are human-readable and debuggable
**When Fine-Tuning Wins:**
- Extremely consistent style requirements across millions of outputs
- Domain-specific jargon that's rare in training data
- Performance optimization for resource-constrained environments
**Source:** Anthropic Prompt Engineering Documentation (2025)
---
### 2. Long Context Window Performance (2024)
**Key Finding:** Document placement dramatically affects accuracy in long context scenarios.
**Research Results:**
- Placing documents BEFORE queries improves performance by up to 30%
- Claude experiences "lost in the middle" phenomenon like other LLMs
- XML structure helps Claude organize and retrieve from long contexts
- Quote grounding (asking Claude to quote relevant sections first) cuts through noise
**Optimal Pattern:**
```xml
<document id="1">
<metadata>...</metadata>
<content>...</content>
</document>
<!-- More documents -->
<instructions>
[Query based on documents]
</instructions>
```
**Source:** Claude Long Context Tips Documentation
---
### 3. Chain of Thought Effectiveness (2023-2025)
**Key Finding:** Encouraging step-by-step reasoning significantly improves accuracy on analytical tasks.
**Results:**
- Simple "Think step by step" phrase improves reasoning accuracy
- Explicit `<thinking>` tags provide transparency and verifiability
- Costs 2-3x output tokens but worth it for complex tasks
- Most effective for: math, logic, multi-step analysis, debugging
**Implementation Evolution:**
- 2023: Simple "think step by step" prompts
- 2024: Structured thinking with XML tags
- 2025: Extended thinking mode with configurable token budgets (16K+ tokens)
**Source:** Anthropic Prompt Engineering Techniques, Extended Thinking Documentation
---
### 4. Prompt Caching Economics (2024)
**Key Finding:** Prompt caching can reduce costs by 90% for repeated content.
**Cost Structure:**
- Cache write: 125% of standard input token cost (a 25% premium)
- Cache read: 10% of standard input token cost
- Effective savings: ~90% for content that doesn't change
**Optimal Use Cases:**
- System prompts (stable across calls)
- Reference documentation (company policies, API docs)
- Examples in multishot prompting (reused across calls)
- Long context documents (analyzed repeatedly)
**Architecture Pattern:**
```
[Stable content - caches]
└─ System prompt
└─ Reference docs
└─ Guidelines
[Variable content - doesn't cache]
└─ User query
└─ Specific inputs
```
**ROI Example:**
- 40K token system prompt + docs
- 1,000 queries/day
- Without caching: ~40M input tokens/day ≈ $120/day (Sonnet, $3/M input)
- With caching: cache reads at 10% ≈ $12/day, plus a one-off 25% write premium when the cache is created
- Savings: ≈ $108/day, roughly $39K/year per 1K daily queries
**Source:** Anthropic Prompt Caching Announcement
---
### 5. XML Tags Fine-Tuning (2024)
**Key Finding:** Claude has been specifically fine-tuned to pay attention to XML tags.
**Why It Works:**
- Training included examples of XML-structured prompts
- Model learned to treat tags as hard boundaries
- Prevents instruction leakage from user input
- Improves retrieval from long contexts
**Best Practices:**
- Use semantic tag names (`<instructions>`, `<context>`, `<examples>`)
- Nest tags for hierarchy when appropriate
- Consistent tag structure across prompts (helps with caching)
- Close all tags properly
**Source:** AWS ML Blog on Anthropic Prompt Engineering
---
### 6. Contextual Retrieval (2024)
**Key Finding:** Encoding context with chunks dramatically improves RAG accuracy.
**Traditional RAG Issues:**
- Chunks encoded in isolation lose surrounding context
- Semantic similarity can miss relevant chunks
- Failed retrievals lead to incorrect or incomplete responses
**Contextual Retrieval Solution:**
- Encode each chunk with surrounding context
- Combine semantic search with BM25 lexical matching
- Apply reranking for final selection
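A hedged sketch of the chunk-contextualization step; the helper below is illustrative rather than a specific library's API, and the model alias is an assumption:
```python
import anthropic

client = anthropic.Anthropic()

def contextualize_chunk(full_document: str, chunk: str) -> str:
    """Generate a short context situating the chunk in its document,
    and prepend it before the chunk is embedded/indexed."""
    prompt = (
        f"<document>\n{full_document}\n</document>\n"
        f"<chunk>\n{chunk}\n</chunk>\n"
        "In 1-2 sentences, explain how this chunk fits into the overall "
        "document so it can be understood on its own. Reply with the context only."
    )
    context = client.messages.create(
        model="claude-haiku-4-5",  # cheap model for bulk preprocessing
        max_tokens=120,
        messages=[{"role": "user", "content": prompt}],
    ).content[0].text
    return f"{context}\n\n{chunk}"

# The contextualized chunks are then indexed for both embedding-based and
# BM25 lexical search, and the merged candidates are reranked before use.
```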
**Results:**
- 49% reduction in failed retrievals (contextual retrieval alone)
- 67% reduction with contextual retrieval + reranking
- Particularly effective for technical documentation and code
**When to Skip RAG:**
- Knowledge base < 200K tokens (fits in context window)
- With prompt caching, including full docs is cost-effective
**Source:** Anthropic Contextual Retrieval Announcement
---
### 7. Batch Processing Economics (2024)
**Key Finding:** Batch API reduces costs by 50% for non-time-sensitive workloads.
**Use Cases:**
- Periodic reports
- Bulk data analysis
- Non-urgent content generation
- Testing and evaluation
**Combined Savings:**
- Batch processing: 50% cost reduction
- Plus prompt caching: Additional 90% on cached content
- Combined potential: 95% cost reduction vs real-time without caching
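A minimal sketch of submitting such a workload, assuming the Anthropic Python SDK's Message Batches API (IDs, model alias, and prompts are illustrative):
```python
import anthropic

client = anthropic.Anthropic()

batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": f"report-{i}",  # your own identifier for matching results
            "params": {
                "model": "claude-haiku-4-5",
                "max_tokens": 1024,
                "messages": [
                    {"role": "user", "content": f"Summarize weekly report #{i}: ..."}
                ],
            },
        }
        for i in range(100)
    ]
)
# Poll later; batches complete asynchronously (within 24 hours)
print(batch.id, batch.processing_status)
```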
**Source:** Anthropic Batch API Documentation
---
### 8. Model Capability Tiers (2024-2025)
**Research Finding:** Different tasks have optimal model choices based on complexity vs cost.
**Claude Haiku 4.5 (Released Oct 2025):**
- Performance: Comparable to Sonnet 4
- Speed: ~2x faster than Sonnet 4
- Cost: 1/3 of Sonnet 4.5 ($1/$5 per M tokens)
- Best for: High-volume simple tasks, extraction, formatting
**Claude Sonnet 4.5 (Released Sep 2025):**
- Performance: State-of-the-art coding agent (77.2% SWE-bench)
- Sustained attention: 30+ hours on complex tasks
- Cost: $3/$15 per M tokens
- Best for: Most production workloads, balanced use cases
**Claude Opus 4:**
- Performance: Maximum capability
- Cost: $15/$75 per M tokens (5x Sonnet)
- Best for: Novel problems, deep reasoning, research
**Architectural Implication:**
- Orchestrator (Sonnet) + Executor subagents (Haiku) = optimal cost/performance
- Task routing based on complexity assessment
- Dynamic model selection within workflows
**Source:** Anthropic Model Releases, TechCrunch Coverage
---
## Effective Context Engineering (2024)
**Key Research:** Managing attention budget is as important as prompt design.
### The Attention Budget Problem
- LLMs have finite capacity to process and integrate information
- Performance degrades with very long contexts ("lost in the middle")
- The n² pairwise relationships among n tokens strain the attention mechanism
### Solutions:
**1. Compaction**
- Summarize conversation near context limit
- Reinitiate with high-fidelity summary
- Preserve architectural decisions, unresolved bugs, implementation details
- Discard redundant tool outputs
**2. Structured Note-Taking**
- Maintain curated notes about decisions, findings, state
- Reference notes across context windows
- More efficient than reproducing conversation history
**3. Multi-Agent Architecture**
- Distribute work across agents with specialized contexts
- Each maintains focused context on their domain
- Orchestrator coordinates without managing all context
**4. Context Editing (2024)**
- Automatically clear stale tool calls and results
- Preserve conversation flow
- 84% token reduction in 100-turn evaluations
- 29% performance improvement on agentic search tasks
**Source:** Anthropic Engineering Blog - Effective Context Engineering
---
## Agent Architecture Best Practices (2024-2025)
**Research Consensus:** Successful agents follow three core principles.
### 1. Simplicity
- Do exactly what's needed, no more
- Avoid unnecessary abstraction layers
- Frameworks help initially, but production often benefits from basic components
### 2. Transparency
- Show explicit planning steps
- Allow humans to verify reasoning
- Enable intervention when plans seem misguided
- "Agent shows its work" principle
### 3. Careful Tool Crafting
- Thorough tool documentation with examples
- Clear descriptions of when to use each tool
- Tested tool integrations
- Agent-computer interface as first-class design concern
**Anti-Pattern:** Framework-heavy implementations that obscure decision-making
**Recommended Pattern:**
- Start with frameworks for rapid prototyping
- Gradually reduce abstractions for production
- Build with basic components for predictability
**Source:** Anthropic Research - Building Effective Agents
---
## Citations and Source Grounding (2024)
**Research Finding:** Built-in citation capabilities outperform most custom implementations.
**Citations API Benefits:**
- 15% higher recall accuracy vs custom solutions
- Automatic sentence-level chunking
- Precise attribution to source documents
- Critical for legal, academic, financial applications
**Use Cases:**
- Legal research requiring source verification
- Academic writing with proper attribution
- Fact-checking workflows
- Financial analysis with auditable sources
**Source:** Claude Citations API Announcement
---
## Extended Thinking (2024)
**Capability:** Claude can allocate extended token budget for reasoning before responding.
**Key Parameters:**
- Thinking budget: 16K+ tokens recommended for complex tasks
- Configurable based on task complexity
- Trade latency for accuracy on hard problems
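A minimal sketch of enabling this through the API, assuming the Anthropic Python SDK (model alias illustrative; the budget follows the guidance above):
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=20000,  # must be larger than the thinking budget
    thinking={"type": "enabled", "budget_tokens": 16000},
    messages=[{"role": "user", "content": "Plan a migration strategy for ..."}],
)

# The response interleaves thinking blocks with the final answer
for block in response.content:
    if block.type == "thinking":
        pass  # internal reasoning; inspect or log if useful
    elif block.type == "text":
        print(block.text)
```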
**Use Cases:**
- Complex math problems
- Novel coding challenges
- Multi-step reasoning tasks
- Analysis requiring sustained attention
**Combined with Tools (Beta):**
- Alternate between reasoning and tool invocation
- Reason about available tools, invoke, analyze results, adjust reasoning
- More sophisticated than fixed reasoning → execution sequences
**Source:** Claude Extended Thinking Documentation
---
## Community Best Practices (2024-2025)
### Disable Auto-Compact in Claude Code
**Finding:** Auto-compact can consume 45K tokens (22.5% of context window) before coding begins.
**Recommendation:**
- Turn off auto-compact: `/config` → toggle off
- Use `/clear` after 1-3 messages to prevent bloat
- Run `/clear` immediately after disabling to reclaim tokens
- Regain 88.1% of context window for productive work
**Source:** Shuttle.dev Claude Code Best Practices
### CLAUDE.md Curation
**Finding:** Auto-generated CLAUDE.md files are too generic.
**Best Practice:**
- Manually curate project-specific patterns
- Keep under 100 lines per file
- Include non-obvious relationships
- Document anti-patterns to avoid
- Optimize for AI agent understanding, not human documentation
**Source:** Claude Code Best Practices, Anthropic Engineering
### Custom Slash Commands as Infrastructure
**Finding:** Repeated prompting patterns benefit from reusable commands.
**Best Practice:**
- Store in `.claude/commands/` for project-level
- Store in `~/.claude/commands/` for user-level
- Check into version control for team benefit
- Use `$ARGUMENTS` and `$1, $2, etc.` for parameters
- Encode team best practices as persistent infrastructure
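As an illustration (file name and contents are hypothetical), a project-level command is just a Markdown file whose body becomes the prompt:
```
<!-- .claude/commands/review-pr.md -->
Review PR #$ARGUMENTS for this repository:
1. Check naming, error handling, and test coverage
2. Flag security or performance concerns
3. Summarize findings by severity
```
Invoking `/review-pr 142` substitutes `142` for `$ARGUMENTS`.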
**Source:** Claude Code Documentation
---
## Technique Selection Decision Tree (2025 Consensus)
Based on aggregated research and community feedback:
```
Start: Define Task

Complexity?
├─ Simple  → Clarity only
├─ Medium  → Clarity + XML + CoT + Examples
└─ Complex → Clarity + XML + Role + CoT + Tools

Repeated Use?
├─ Yes → Structure for caching
└─ No  → One-off design

Token Budget?
├─ Tight    → Skip CoT
└─ Flexible → Add CoT + Examples

Format Critical?
├─ Yes → + Prefilling + Examples
└─ No  → Skip
```
---
## Measuring Prompt Effectiveness
**Research Recommendation:** Systematic evaluation before and after prompt engineering.
### Metrics to Track
**Accuracy:**
- Correctness of outputs
- Alignment with success criteria
- Error rates
**Consistency:**
- Output format compliance
- Reliability across runs
- Variance in responses
**Cost:**
- Tokens per request
- $ cost per request
- Caching effectiveness
**Latency:**
- Time to first token
- Total response time
- User experience impact
### Evaluation Framework
1. **Baseline:** Measure current prompt performance
2. **Iterate:** Apply one technique at a time
3. **Measure:** Compare metrics to baseline
4. **Keep or Discard:** Retain only improvements
5. **Document:** Record which techniques help for which tasks
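A minimal sketch of step 3 as code; the eval set, scoring rule, and model alias are all illustrative placeholders, assuming the Anthropic Python SDK:
```python
import anthropic

client = anthropic.Anthropic()

# Hypothetical eval set: (input, expected) pairs for your task
EVAL_SET = [
    ("Contact John at john@example.com", "john@example.com"),
    ("Reach Sue via sue@corp.io for details", "sue@corp.io"),
]

def accuracy(prompt_template: str) -> float:
    """Run one prompt variant over the eval set and return its accuracy."""
    correct = 0
    for text, expected in EVAL_SET:
        reply = client.messages.create(
            model="claude-haiku-4-5",
            max_tokens=50,
            messages=[{"role": "user", "content": prompt_template.format(input=text)}],
        ).content[0].text
        correct += int(expected in reply)  # crude scorer; replace per task
    return correct / len(EVAL_SET)

baseline = accuracy("Extract the email address from: {input}")
variant = accuracy("Extract the email address from: {input}\nReturn only the address.")
print(f"baseline={baseline:.2f} variant={variant:.2f}")  # keep the variant only if it wins
```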
**Anti-Pattern:** Applying all techniques without measuring effectiveness
---
## Future Directions (2025 and Beyond)
### Emerging Trends
**1. Agent Capabilities**
- Models maintaining focus for 30+ hours (Sonnet 4.5)
- Improved context awareness and self-management
- Better tool use and reasoning integration
**2. Cost Curve Collapse**
- Haiku 4.5 matches Sonnet 4 at 1/3 cost
- Enables new deployment patterns (parallel subagents)
- Economic feasibility of agent orchestration
**3. Multimodal Integration**
- Vision + text for document analysis
- 60% reduction in document processing time
- Correlation of visual and textual information
**4. Safety and Alignment**
- Research on agentic misalignment
- Importance of human oversight at scale
- System design for ethical constraints
**5. Standardization**
- Model Context Protocol (MCP) for tool integration
- Reduced custom integration complexity
- Ecosystem of third-party tools
---
## Key Takeaways from Research
1. **Simplicity wins**: Start minimal, add complexity only when justified by results
2. **Structure scales**: XML tags become essential as complexity increases
3. **Thinking costs but helps**: 2-3x tokens for reasoning, worth it for analysis
4. **Caching transforms economics**: 90% savings makes long prompts feasible
5. **Placement matters**: Documents before queries, 30% better performance
6. **Tools need docs**: Clear descriptions → correct usage
7. **Agents need transparency**: Show reasoning, enable human verification
8. **Context is finite**: Manage attention budget deliberately
9. **Measure everything**: Remove techniques that don't improve outcomes
10. **Economic optimization**: Right model for right task (Haiku → Sonnet → Opus)
---
## Research Sources
- Anthropic Prompt Engineering Documentation (2024-2025)
- Anthropic Engineering Blog - Context Engineering (2024)
- Anthropic Research - Building Effective Agents (2024)
- Claude Code Best Practices (Anthropic, 2024)
- Shuttle.dev Claude Code Analysis (2024)
- AWS ML Blog - Anthropic Techniques (2024)
- Contextual Retrieval Research (Anthropic, 2024)
- Model Release Announcements (Sonnet 4.5, Haiku 4.5)
- Citations API Documentation (2024)
- Extended Thinking Documentation (2024)
- Community Best Practices (Multiple Sources, 2024-2025)
---
## Keeping Current
**Best Practices:**
- Follow Anthropic Engineering blog for latest research
- Monitor Claude Code documentation updates
- Track community implementations (GitHub, forums)
- Experiment with new capabilities as released
- Measure impact of new techniques on your use cases
**Resources:**
- https://www.anthropic.com/research
- https://www.anthropic.com/engineering
- https://docs.claude.com/
- https://code.claude.com/docs
- Community: r/ClaudeAI, Anthropic Discord
---
## Research-Backed Anti-Patterns
Based on empirical findings, avoid:
**Ignoring Document Placement** - 30% performance loss
**Not Leveraging Caching** - 10x unnecessary costs
**Over-Engineering Simple Tasks** - Worse results + higher cost
**Framework Over-Reliance** - Obscures decision-making
**Skipping Measurement** - Can't validate improvements
**One-Size-Fits-All Prompts** - Suboptimal for specific tasks
**Vague Tool Documentation** - Poor tool selection
**Ignoring Context Budget** - Performance degradation
**No Agent Transparency** - Debugging nightmares
**Wrong Model for Task** - Overpaying or underperforming
---
This research summary reflects the state of Anthropic's prompt engineering best practices as of 2025, incorporating both official research and validated community findings.


@@ -0,0 +1,641 @@
# Prompt Engineering Technique Catalog
Deep dive into each of the 9 core prompt engineering techniques with examples, token costs, and combination strategies.
## Table of Contents
- [1. Clarity and Directness](#1-clarity-and-directness)
- [2. XML Structure](#2-xml-structure)
- [3. Chain of Thought](#3-chain-of-thought)
- [4. Multishot Prompting](#4-multishot-prompting)
- [5. System Prompt (Role Assignment)](#5-system-prompt-role-assignment)
- [6. Prefilling](#6-prefilling)
- [7. Long Context Optimization](#7-long-context-optimization)
- [8. Context Budget Management](#8-context-budget-management)
- [9. Tool Documentation](#9-tool-documentation)
- [Technique Combination Matrix](#technique-combination-matrix)
- [Decision Framework](#decision-framework)
- [Common Patterns](#common-patterns)
- [Measuring Effectiveness](#measuring-effectiveness)
---
## 1. Clarity and Directness
### What It Is
Clear, explicit instructions that state objectives precisely, including scope and success criteria in unambiguous terms.
### When to Use
**ALWAYS.** This is the foundational technique that improves responses across virtually all scenarios.
### Token Cost
Minimal - typically 20-50 tokens for clear instructions.
### Examples
**Before (Vague):**
```
Tell me about this document.
```
**After (Clear):**
```
Extract the key financial metrics from this quarterly report, focusing on:
- Revenue growth (YoY %)
- Gross margin
- Operating cash flow
Present each metric in the format: [Metric Name]: [Value] [Trend]
```
### Why It Works
Specificity allows Claude to understand exactly what's needed and focus reasoning on relevant aspects.
### Combination Strategies
- Pairs with ALL techniques - always start here
- Essential foundation for XML structure (what goes in each section)
- Guides chain of thought (what to reason about)
- Clarifies multishot examples (what pattern to match)
---
## 2. XML Structure
### What It Is
Using XML tags to create hard structural boundaries within prompts, separating instructions, context, examples, and formatting requirements.
### When to Use
- Complex prompts with multiple sections
- Risk of instruction leakage (user input mixed with instructions)
- Structured data tasks
- Long prompts where sections need clear delineation
### Token Cost
~50-100 tokens overhead for tag structure.
### Examples
**Before (Mixed):**
```
You're a code reviewer. Look at this code and check for security issues, performance problems, and best practices. Here's the code: [code]. Format your response as bullet points.
```
**After (Structured):**
```xml
<instructions>
You are a code reviewer. Analyze the code for:
- Security vulnerabilities
- Performance issues
- Best practice violations
</instructions>
<code>
[code content]
</code>
<formatting>
Return findings as bullet points, organized by category.
</formatting>
```
### Why It Works
Claude has been fine-tuned to pay special attention to XML tags, preventing confusion between different types of information.
### Combination Strategies
- Use with long context (separate documents with `<document>` tags)
- Pair with examples (`<examples>` section)
- Combine with prefilling (structure output format)
### Skip When
- Simple single-section prompts
- Token budget is extremely tight
- User input doesn't risk instruction leakage
---
## 3. Chain of Thought
### What It Is
Encouraging step-by-step reasoning before providing final answers. Implemented via phrases like "Think step by step" or explicit `<thinking></thinking>` tags.
### When to Use
- Analysis tasks
- Multi-step reasoning
- Math problems
- Complex decision-making
- Debugging
- Tasks where intermediate steps matter
### Token Cost
2-3x output tokens (thinking + final answer).
### Examples
**Before:**
```
What's the root cause of this bug?
```
**After:**
```
Analyze this bug. Think step by step:
1. What is the error message telling us?
2. What code is involved in the stack trace?
3. What are the possible causes?
4. Which cause is most likely given the context?
Then provide your conclusion about the root cause.
```
**Or with structured thinking:**
```
Analyze this bug and provide:
<thinking>
Your step-by-step analysis here
</thinking>
<conclusion>
Root cause and fix
</conclusion>
```
### Why It Works
Breaking down reasoning into steps improves accuracy and makes the decision-making process transparent and verifiable.
### Combination Strategies
- Essential for complex tasks even with other techniques
- Pair with XML structure to separate thinking from output
- Works well with long context (reason about documents)
- Combine with examples showing reasoning process
### Skip When
- Simple extraction or lookup tasks
- Format conversion
- Tasks with obvious single-step answers
- Token budget is critical concern
---
## 4. Multishot Prompting
### What It Is
Providing 2-5 examples of input → desired output to demonstrate patterns.
### When to Use
- Specific formatting requirements
- Pattern learning tasks
- Subtle output nuances
- Structured data extraction
- Style matching
### Token Cost
200-1000 tokens per example (depends on complexity).
### Examples
**Before:**
```
Extract product information from these descriptions.
```
**After:**
```
Extract product information from descriptions. Format as JSON.
Examples:
Input: "Premium leather wallet, black, RFID blocking, $49.99"
Output: {"name": "Premium leather wallet", "color": "black", "features": ["RFID blocking"], "price": 49.99}
Input: "Wireless earbuds, noise cancelling, 24hr battery, multiple colors available"
Output: {"name": "Wireless earbuds", "color": "multiple", "features": ["noise cancelling", "24hr battery"], "price": null}
Now extract from: [your input]
```
### Why It Works
Examples teach patterns more effectively than textual descriptions, especially for format and style.
### Combination Strategies
- Wrap examples in `<examples>` XML tags for clarity
- Show chain of thought in examples if reasoning is complex
- Include edge cases in examples
- Can combine with prefilling to start the response
### Skip When
- Task is self-explanatory
- Examples would be trivial or redundant
- Token budget is constrained
- One-off task where setup cost isn't worth it
---
## 5. System Prompt (Role Assignment)
### What It Is
Using the system parameter to assign Claude a specific role, expertise area, or perspective.
### When to Use
- Domain-specific tasks (medical, legal, technical)
- Tone or style requirements
- Perspective-based analysis
- Specialized workflows
### Token Cost
Minimal (20-100 tokens, caches extremely well).
### Examples
**Generic:**
```
Analyze this code for security issues.
```
**With Role:**
```
System: You are a senior security engineer with 15 years of experience in application security. You specialize in identifying OWASP Top 10 vulnerabilities and secure coding practices.
User: Analyze this code for security issues.
```
### Why It Works
Roles frame Claude's approach and leverage domain-specific patterns from training data.
### Combination Strategies
- Almost always use with other techniques
- Particularly powerful with chain of thought (expert reasoning)
- Helps with multishot examples (expert demonstrates)
- Define constraints in system prompt (tools, approach)
### Skip When
- Generic tasks requiring no specific expertise
- Role would be artificial or unhelpful
- You want flexibility in perspective
---
## 6. Prefilling
### What It Is
Providing the start of Claude's response to guide format and skip preambles.
### When to Use
- Strict format requirements (JSON, XML, CSV)
- Want to skip conversational preambles
- Need consistent output structure
- Automated parsing of responses
### Token Cost
Minimal (5-20 tokens typically).
### Examples
**Without Prefilling:**
```
User: Extract data as JSON
Claude: Sure! Here's the data in JSON format:
{
"data": ...
```
**With Prefilling:**
```
User: Extract data as JSON
Assistant (prefilled): {
Claude (continues): "data": ...
```
### Why It Works
Forces Claude to continue from the prefilled content, ensuring format compliance and skipping unnecessary text.
### Combination Strategies
- Combine with XML structure (prefill to skip tags)
- Use with multishot (prefill the pattern shown)
- Pair with system role (prefill expert format)
### Skip When
- Conversational tone is desired
- Explanation or context is valuable
- Format is flexible
### Technical Notes
- Prefill cannot end with trailing whitespace
- Works in both API and conversational interfaces
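In the Messages API, prefilling means ending the `messages` list with a partial assistant turn that Claude continues. A minimal sketch, assuming the Anthropic Python SDK (model alias illustrative):
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=300,
    messages=[
        {
            "role": "user",
            "content": 'Convert to JSON: "Customer John Smith, ID 12345, ordered 3 items for $150"',
        },
        {"role": "assistant", "content": "{"},  # prefill: Claude continues from here
    ],
)
# The reply continues after the prefill, so prepend it when parsing
print("{" + response.content[0].text)
```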
---
## 7. Long Context Optimization
### What It Is
Specific strategies for handling 20K+ token inputs effectively, including document placement, XML structure, and quote grounding.
### When to Use
- Processing multiple documents
- Analyzing long technical documents
- Research across many sources
- Complex data-rich tasks
### Token Cost
No additional cost - improves accuracy for same tokens.
### Key Strategies
**1. Document Placement**
Place long documents BEFORE queries and instructions:
```xml
<document>
[Long document 1]
</document>
<document>
[Long document 2]
</document>
<instructions>
Analyze these documents for X
</instructions>
```
**2. Metadata Tagging**
```xml
<document>
<source>quarterly-report-q3-2024.pdf</source>
<type>financial</type>
<content>
[document content]
</content>
</document>
```
**3. Quote Grounding**
"First, quote the relevant section from the document. Then provide your analysis."
### Why It Works
- Placement: 30% better performance in evaluations
- Tags: Help Claude organize and retrieve information
- Quoting: Forces attention to specific relevant text
### Combination Strategies
- Essential with XML structure for multi-document tasks
- Pair with chain of thought (reason about documents)
- Use with system role (expert document analyst)
### Skip When
- Short prompts (<5K tokens)
- Single focused document
- Simple extraction tasks
---
## 8. Context Budget Management
### What It Is
Optimizing for repeated prompts through caching and managing attention budget across long conversations.
### When to Use
- Repeated prompts with stable content
- Long conversations
- System prompts that don't change
- Reference documentation that's reused
### Token Cost
Caching: 90% cost reduction on cached content
- Write: 125% of standard cost (25% premium on the first write)
- Read: 10% of standard cost
### Strategies
**1. Prompt Caching**
Structure prompts so stable content is cached:
```
[System prompt - caches]
[Reference docs - caches]
[User query - doesn't cache]
```
**2. Context Windowing**
For long conversations, periodically summarize and reset context.
**3. Structured Memory**
Use the memory tool to persist information across context windows.
### Examples
**Cacheable Structure:**
```xml
<system>
You are a code reviewer. [full guidelines]
</system>
<style_guide>
[Company style guide - 10K tokens]
</style_guide>
<user_query>
Review this PR: [specific PR]
</user_query>
```
The system prompt and style guide cache, only the user query changes.
### Why It Works
- Caching: Dramatically reduces cost for repeated content
- Windowing: Prevents context overflow and performance degradation
- Memory: Enables projects longer than context window
### Combination Strategies
- Structure with XML to create cacheable boundaries
- Use with long context tips for large documents
- Pair with system prompts (highly cacheable)
### Skip When
- One-off queries
- Content changes every call
- Short prompts where caching overhead isn't worth it
---
## 9. Tool Documentation
### What It Is
Clear, detailed descriptions of tools/functions including when to use them, parameter schemas, and examples.
### When to Use
- Function calling / tool use
- Agent workflows
- API integrations
- Multi-step automated tasks
### Token Cost
100-500 tokens per tool definition.
### Examples
**Poor Tool Definition:**
```json
{
  "name": "search",
  "description": "Search for something",
  "parameters": {
    "query": "string"
  }
}
```
**Good Tool Definition:**
```json
{
  "name": "semantic_search",
  "description": "Search internal knowledge base using semantic similarity. Use this when the user asks questions about company policies, products, or documentation. Returns top 5 most relevant passages.",
  "input_schema": {
    "type": "object",
    "properties": {
      "query": {
        "type": "string",
        "description": "Natural language search query. Be specific and include key terms. Example: 'vacation policy for employees with 3 years tenure'"
      },
      "max_results": {
        "type": "integer",
        "description": "Number of results to return (1-10). Default: 5",
        "default": 5
      }
    },
    "required": ["query"]
  }
}
```
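A hedged sketch of passing this definition to the Messages API and reading back a tool call, assuming the Anthropic Python SDK (model alias illustrative):
```python
import anthropic

client = anthropic.Anthropic()

semantic_search_tool = {
    "name": "semantic_search",
    "description": "Search internal knowledge base using semantic similarity. "
                   "Use for questions about company policies, products, or documentation.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Natural language search query."},
            "max_results": {"type": "integer", "description": "Number of results (1-10).", "default": 5},
        },
        "required": ["query"],
    },
}

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    tools=[semantic_search_tool],
    messages=[{"role": "user", "content": "What is the vacation policy after 3 years of tenure?"}],
)

# If Claude decided to call the tool, the response contains a tool_use block
for block in response.content:
    if block.type == "tool_use":
        print(block.name, block.input)  # e.g. semantic_search {'query': '...'}
```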
### Why It Works
Clear tool descriptions help Claude:
- Know when to invoke the tool
- Understand what parameters to provide
- Format parameters correctly
- Choose between multiple tools
### Best Practices
**Description Field:**
- What the tool does
- When to use it
- What it returns
- Keywords/scenarios
**Parameter Schemas:**
- Clear descriptions
- Type definitions
- Enums for fixed values
- Examples of valid inputs
- Defaults where applicable
### Combination Strategies
- Use with system role (define tool strategy)
- Pair with chain of thought (reason about tool choice)
- Combine with examples (show successful tool use)
### Skip When
- No tool use involved
- Single obvious tool
- Tools are self-explanatory
---
## Technique Combination Matrix
| Primary Technique | Works Well With | Avoid Combining With |
|------------------|-----------------|---------------------|
| Clarity | Everything | N/A - always use |
| XML Structure | Long Context, Examples, Caching | Simple single-section prompts |
| Chain of Thought | XML, Role, Long Context | Simple extraction (unnecessary) |
| Multishot | XML, Prefilling | Overly simple tasks |
| System Role | Chain of Thought, Tools | Generic tasks |
| Prefilling | XML, Multishot | Conversational outputs |
| Long Context | XML, Quoting, Caching | Short prompts |
| Context Budget | XML, System Prompts | One-off queries |
| Tool Docs | Role, Examples | No tool use |
---
## Decision Framework
```
Start Here
1. Always apply CLARITY
2. Assess prompt length:
< 5K tokens → Skip long context tips
> 20K tokens → Apply long context optimization
3. Check if repeated:
Yes → Structure for caching
No → Skip cache optimization
4. Does it need reasoning?
Yes → Add chain of thought
No → Skip (save 2-3x tokens)
5. Is format subtle or specific?
Yes → Add examples or prefilling
No → Skip
6. Is it complex or has sections?
Yes → Use XML structure
No → Keep simple
7. Does domain expertise help?
Yes → Assign role in system prompt
No → Skip
8. Does it involve tools?
Yes → Write detailed tool docs
No → Skip
Final Check: Is every technique justified?
```
---
## Common Patterns
### Pattern 1: Simple Extraction
- Clarity ✓
- XML (maybe, if multi-section)
- Everything else: Skip
### Pattern 2: Analysis Task
- Clarity ✓
- Chain of Thought ✓
- XML Structure ✓
- System Role ✓
- Long Context (if large input) ✓
### Pattern 3: Format Conversion
- Clarity ✓
- Multishot Examples ✓
- Prefilling ✓
- XML (maybe)
### Pattern 4: Agent Workflow
- Clarity ✓
- System Role ✓
- Tool Documentation ✓
- Chain of Thought ✓
- Context Budget Management ✓
- XML Structure ✓
### Pattern 5: Repeated Queries
- Clarity ✓
- System Role ✓
- Context Budget Management ✓
- XML Structure (for cache boundaries) ✓
- Other techniques as needed
---
## Measuring Effectiveness
For each technique, track:
- **Accuracy**: Does output quality improve?
- **Token Cost**: What's the overhead?
- **Latency**: Does response time increase?
- **Consistency**: Are results more reliable?
Remove techniques that don't improve outcomes for your specific use case.