# Prompt Optimization Strategies
## Overview
This guide covers systematic approaches to analyzing and improving prompts that aren't giving you the results you need. Learn how to diagnose problems, apply fixes, and measure improvements.
## The Optimization Process
```
1. IDENTIFY → What's wrong with current results?
2. DIAGNOSE → Why is the prompt failing?
3. FIX → Apply appropriate optimization
4. TEST → Verify improvement
5. ITERATE → Refine further if needed
```
---
## Step 1: Identify the Problem
### Common Prompt Problems
| Problem | Symptoms | Quick Test |
|---------|----------|------------|
| **Vague instructions** | Inconsistent outputs, missing requirements | Run the prompt 3 times; do you get similar results? |
| **Missing context** | Wrong assumptions, irrelevant responses | Does the prompt include all necessary background? |
| **Unclear format** | Output structure varies | Does prompt specify exact format? |
| **Too complex** | Partial completion, confusion | Does prompt try to do too much at once? |
| **Wrong technique** | Poor accuracy, missed patterns | Does task match technique used? |
| **Model mismatch** | Suboptimal results | Are you using model-specific features? |
| **No examples** | Pattern not learned | Would examples help? |
| **Ambiguous terms** | Misinterpreted requirements | Are terms like "professional" defined? |
### Diagnostic Questions
Ask yourself:
**Clarity:**
- [ ] Is the task stated explicitly?
- [ ] Are all requirements spelled out?
- [ ] Would a junior developer understand this?
**Completeness:**
- [ ] Is all necessary context included?
- [ ] Are constraints specified?
- [ ] Is output format defined?
**Consistency:**
- [ ] Does running it 3x give similar results?
- [ ] Are there contradictory instructions?
- [ ] Is terminology used consistently?
**Appropriateness:**
- [ ] Is the right technique being used?
- [ ] Is this the right model for the task?
- [ ] Is complexity level appropriate?
---
## Step 2: Diagnose Root Cause
### Problem: Inconsistent Results
**Diagnosis:**
- Instructions too vague
- Missing constraints
- Ambiguous terminology
- No output format specified
**How to confirm:**
Run the same prompt 3-5 times and compare outputs.
**Evidence:**
```
Run 1: Returns JSON
Run 2: Returns markdown table
Run 3: Returns plain text
Diagnosis: No format specification
```
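A minimal sketch of that check, assuming a hypothetical `runPrompt` helper that calls your model and returns raw text (the format detector here only distinguishes JSON from everything else):
```typescript
// Sketch: run the same prompt several times and compare output formats.
// `runPrompt` is a placeholder for your own model-calling function.
type RunPrompt = (prompt: string) => Promise<string>;

// Crude format detector: distinguishes valid JSON from everything else.
function detectFormat(output: string): "json" | "other" {
  try {
    JSON.parse(output);
    return "json";
  } catch {
    return "other";
  }
}

async function checkConsistency(runPrompt: RunPrompt, prompt: string, runs = 5) {
  const formats: string[] = [];
  for (let i = 0; i < runs; i++) {
    formats.push(detectFormat(await runPrompt(prompt)));
  }
  const unique = new Set(formats);
  console.log(`Formats seen across ${runs} runs: ${[...unique].join(", ")}`);
  if (unique.size > 1) console.log("Likely cause: no format specification.");
}
```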
### Problem: Wrong or Irrelevant Answers
**Diagnosis:**
- Missing context
- Unclear task definition
- Wrong assumptions by model
- No examples provided
**How to confirm:**
Does the prompt include all information needed to complete the task correctly?
**Evidence:**
```
Prompt: "Optimize this code"
Response: "What language is this? What should I optimize for?"
Diagnosis: Missing context (language, optimization goals)
```
### Problem: Partial Completion
**Diagnosis:**
- Task too complex for single shot
- Prompt too long/confusing
- Multiple competing requirements
- Agentic behavior not calibrated
**How to confirm:**
Break the task into smaller parts; does each part work individually?
**Evidence:**
```
Prompt: "Build complete e-commerce backend with auth, payments, inventory"
Response: Only implements auth, ignores rest
Diagnosis: Too complex, needs progressive disclosure
```
### Problem: Incorrect Pattern Application
**Diagnosis:**
- Using few-shot when zero-shot would work
- Missing chain-of-thought for complex reasoning
- No examples when pattern matching needed
- Wrong model for the task
**How to confirm:**
Compare results with different techniques.
**Evidence:**
```
Complex math problem solved directly: Wrong answer
Same problem with "think step-by-step": Correct answer
Diagnosis: Needs chain-of-thought prompting
```
---
## Step 3: Apply Fixes
### Fix: Add Specificity
**Before:**
```
"Write professional code"
```
**After:**
```
"Write TypeScript code following these standards:
- Strict type checking enabled
- JSDoc comments for all public methods
- Error handling for all async operations
- Maximum cyclomatic complexity of 5
- Meaningful variable names (no abbreviations)"
```
**Why it works:** Removes ambiguity, defines "professional" explicitly.
### Fix: Provide Context
**Before:**
```
"Optimize this function:
[code]"
```
**After:**
```
"Optimize this TypeScript function that processes user uploads:
Context:
- Called 1000x per minute
- Typical input: 1-10MB files
- Current bottleneck: O(n²) loop
- Running in Node.js 18
Optimization goals (in priority order):
1. Reduce time complexity
2. Lower memory usage
3. Maintain readability
[code]"
```
**Why it works:** Provides necessary context for appropriate optimization.
### Fix: Specify Format
**Before:**
```
"Analyze this code for issues"
```
**After:**
```
"Analyze this code for issues and respond in this exact format:
## Issues Found
### [Issue Title]
- **Severity**: Critical/High/Medium/Low
- **Category**: Security/Performance/Quality
- **Location**: Line X
- **Problem**: [Description]
- **Fix**: [Specific solution]
### [Next Issue]
[Same format]
## Summary
Total issues: [count by severity]
Recommendation: [Approve/Request Changes]"
```
**Why it works:** Guarantees consistent, parseable output.
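A pinned-down format also pays off downstream: the response can be parsed mechanically. A rough sketch (the field names match the template above; the regexes are illustrative, not hardened):
```typescript
// Sketch: parse the structured review output defined by the template above.
interface Issue {
  title: string;
  severity: string;
  location: string;
}

function parseReview(output: string): Issue[] {
  // Each issue starts with a "### <title>" heading.
  const blocks = output.split(/^### /m).slice(1);
  return blocks.map((block) => ({
    title: block.split("\n", 1)[0].trim(),
    severity: /\*\*Severity\*\*:\s*(.+)/.exec(block)?.[1]?.trim() ?? "",
    location: /\*\*Location\*\*:\s*(.+)/.exec(block)?.[1]?.trim() ?? "",
  }));
}
```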
### Fix: Add Examples (Few-Shot)
**Before:**
```
"Convert these to camelCase:
user_name
total_count
is_active"
```
**After:**
```
"Convert these to camelCase:
Examples:
user_name → userName
total_count → totalCount
Now convert:
is_active →
created_at →
max_retry_attempts →"
```
**Why it works:** Shows the pattern, improves accuracy.
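Since this particular pattern is deterministic, you can also keep a reference implementation on hand to verify the model's conversions. A small sketch:
```typescript
// Reference implementation of the snake_case → camelCase pattern,
// useful as ground truth when checking the model's few-shot outputs.
function toCamelCase(snake: string): string {
  return snake.replace(/_([a-z0-9])/gi, (_, ch: string) => ch.toUpperCase());
}

console.log(toCamelCase("user_name"));          // userName
console.log(toCamelCase("total_count"));        // totalCount
console.log(toCamelCase("max_retry_attempts")); // maxRetryAttempts
```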
### Fix: Enable Reasoning (Chain-of-Thought)
**Before:**
```
"What's the time complexity of this algorithm?"
```
**After:**
```
"Analyze the time complexity of this algorithm step-by-step:
1. First, identify the loops and recursive calls
2. Then, determine how many times each executes
3. Next, calculate the overall complexity
4. Finally, express in Big O notation
Algorithm:
[code]"
```
**Why it works:** Structured reasoning leads to correct analysis.
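To make the four steps concrete, here is a hypothetical algorithm with that reasoning applied inline as comments:
```typescript
// Hypothetical example applying the four steps as inline comments.
function hasDuplicate(items: number[]): boolean {
  // Step 1: two nested loops, no recursion.
  for (let i = 0; i < items.length; i++) {
    // Step 2: the outer loop runs n times; the inner loop runs
    // up to n - i - 1 times per outer iteration.
    for (let j = i + 1; j < items.length; j++) {
      if (items[i] === items[j]) return true;
    }
  }
  // Step 3: total comparisons = n(n - 1) / 2.
  // Step 4: dropping constants, this is O(n²).
  return false;
}
```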
### Fix: Break Down Complexity (Progressive Disclosure)
**Before:**
```
"Build a complete user authentication system with OAuth,
2FA, session management, and password reset"
```
**After:**
```
"Let's build a user authentication system in stages.
STAGE 1: Design the overall architecture
- Components needed
- Data models
- API endpoints
Wait for my review before proceeding to implementation."
```
**Then after review:**
```
"STAGE 2: Implement the core authentication service
- User login with email/password
- JWT token generation
- Session management
Use the architecture we designed in Stage 1."
```
**Why it works:** Breaks an overwhelming task into manageable pieces.
### Fix: Add Constraints
**Before:**
```
"Write a function to sort an array"
```
**After:**
```
"Write a function to sort an array with these constraints:
Requirements:
- Input: Array of numbers (may contain duplicates)
- Output: New array, sorted ascending
- Must be pure function (don't modify input)
- Time complexity: O(n log n) or better
- Language: TypeScript with strict types
Edge cases to handle:
- Empty array → return empty array
- Single element → return copy of array
- All same values → return copy of array
- Negative numbers → sort correctly"
```
**Why it works:** Removes ambiguity and ensures all edge cases are handled.
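For reference, one implementation that would satisfy every constraint in that prompt (assuming the built-in sort, which modern engines implement in O(n log n)):
```typescript
// Pure sort: copies the input, sorts ascending, and handles empty arrays,
// single elements, duplicates, and negative numbers.
function sortAscending(values: readonly number[]): number[] {
  // Numeric comparator: the default sort is lexicographic and would
  // leave [10, 2] as [10, 2].
  return [...values].sort((a, b) => a - b);
}

console.log(sortAscending([]));            // []
console.log(sortAscending([3, -1, 3, 0])); // [-1, 0, 3, 3]
```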
### Fix: Use Model-Specific Features
**For GPT-5:**
**Before (generic):**
```
"Implement this feature"
```
**After (GPT-5 optimized):**
```
SYSTEM: Senior TypeScript developer
TASK: Implement user authentication service
CONSTRAINTS:
- Use JWT with refresh tokens
- TypeScript strict mode
- Include error handling
OUTPUT: Complete implementation with tests
BEHAVIOR: Complete autonomously without asking questions
VERBOSITY: Concise - core functionality only
```
**For Claude:**
**Before (generic):**
```
"Review this code for security issues"
```
**After (Claude optimized):**
```xml
<instruction>
Review this code for security vulnerabilities
</instruction>
<code>
[Code to review]
</code>
<focus_areas>
- SQL injection
- XSS attacks
- Authentication bypasses
- Data exposure
</focus_areas>
<output_format>
For each vulnerability:
- Severity: [Critical/High/Medium/Low]
- Location: [Line number]
- Description: [What's wrong]
- Fix: [How to remediate]
</output_format>
```
---
## Step 4: Test Improvements
### A/B Testing Prompts
**Method:**
1. Keep original prompt as baseline
2. Create improved version
3. Run both on the same inputs (5-10 test cases)
4. Compare results objectively
**Evaluation criteria:**
- Accuracy (does it solve the problem?)
- Consistency (similar results each time?)
- Completeness (all requirements met?)
- Quality (meets standards?)
- Format compliance (matches specification?)
**Example:**
| Test Case | Original Prompt | Improved Prompt | Winner |
|-----------|----------------|-----------------|--------|
| Case 1 | 60% accurate | 95% accurate | Improved |
| Case 2 | Wrong format | Correct format | Improved |
| Case 3 | Incomplete | Complete | Improved |
| Case 4 | Correct | Correct | Tie |
| Case 5 | Partial | Complete | Improved |
**Result:** The improved prompt wins 4/5 cases; adopt it.
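A minimal sketch of this harness. `runPrompt` and each case's `check` function are assumptions: wire them to your model API and your own pass/fail logic:
```typescript
// Sketch of the A/B method above. `runPrompt` and each case's `check`
// are placeholders for your model API and evaluation logic.
interface TestCase {
  input: string;
  check: (output: string) => boolean; // true if the output is acceptable
}

async function abTest(
  runPrompt: (prompt: string) => Promise<string>,
  baseline: (input: string) => string, // builds the original prompt
  improved: (input: string) => string, // builds the improved prompt
  cases: TestCase[],
): Promise<void> {
  let baselineWins = 0;
  let improvedWins = 0;
  for (const c of cases) {
    const a = c.check(await runPrompt(baseline(c.input)));
    const b = c.check(await runPrompt(improved(c.input)));
    if (a && !b) baselineWins++;
    if (b && !a) improvedWins++;
  }
  const ties = cases.length - baselineWins - improvedWins;
  console.log(`Baseline: ${baselineWins}, Improved: ${improvedWins}, Ties: ${ties}`);
}
```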
### Measuring Improvement
**Quantitative metrics:**
- Success rate: X% of runs produce correct output
- Consistency: X% of runs produce similar output
- Completeness: X% of requirements met on average
- Format compliance: X% match specified format
- Token efficiency: Avg tokens to correct result
**Qualitative assessment:**
- Does it feel easier to use?
- Do results need less manual correction?
- Is output more predictable?
- Are edge cases handled better?
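The quantitative metrics above reduce to simple ratios over a batch of recorded runs. A sketch, assuming you log a pass/fail judgment and token count per run:
```typescript
// Sketch: the quantitative metrics as ratios over recorded runs.
interface RunRecord {
  correct: boolean;  // solved the problem
  formatOk: boolean; // matched the specified format
  tokens: number;    // tokens used on this run
}

function metrics(runs: RunRecord[]) {
  const pct = (n: number) => Math.round((n / runs.length) * 100);
  return {
    successRate: pct(runs.filter((r) => r.correct).length),
    formatCompliance: pct(runs.filter((r) => r.formatOk).length),
    avgTokens: runs.reduce((sum, r) => sum + r.tokens, 0) / runs.length,
  };
}
```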
### Iterative Testing
```
Version 1 → Test → 70% success → Identify issues
Version 2 → Test → 85% success → Identify remaining issues
Version 3 → Test → 95% success → Good enough for production
```
Don't expect perfection on the first try. Iterate.
---
## Step 5: Continuous Iteration
### When to Iterate Further
**Keep improving if:**
- Success rate < 90%
- Inconsistent outputs across runs
- Missing edge cases
- Requires manual post-processing
- Takes multiple attempts to get right
**Stop iterating when:**
- Success rate > 95%
- Consistent, predictable outputs
- Handles known edge cases
- Minimal manual correction needed
- One-shot success most of the time
### Iteration Strategies
**Strategy 1: Incremental Refinement**
1. Start with basic working prompt
2. Identify most common failure mode
3. Add constraint/example to fix it
4. Test
5. Repeat for next most common failure
**Strategy 2: Template Evolution**
1. Create template with placeholders
2. Test with multiple scenarios
3. Identify missing sections
4. Add new sections to template
5. Retest with scenarios
**Strategy 3: Technique Stacking**
1. Start with base technique (e.g., zero-shot)
2. If accuracy low, add chain-of-thought
3. If still issues, add few-shot examples
4. If format wrong, add explicit structure
5. Layer techniques until quality sufficient
### Tracking What Works
**Maintain a prompt library:**
```
prompts/
├── code-review.md (Success rate: 97%)
├── data-analysis.md (Success rate: 94%)
├── documentation.md (Success rate: 99%)
└── bug-diagnosis.md (Success rate: 91%)
```
**Document each prompt:**
```markdown
# Code Review Prompt
**Success Rate:** 97%
**Last Updated:** 2025-01-15
**Model:** GPT-5 / Claude Sonnet
**Use When:** Reviewing pull requests for security, performance, quality
**Template:**
[Prompt template]
**Known Limitations:**
- May miss complex race conditions
- Better for backend than frontend code
**Changelog:**
- v3 (2025-01-15): Added self-validation step (95% → 97%)
- v2 (2025-01-10): Added severity ratings (90% → 95%)
- v1 (2025-01-05): Initial version (90%)
```
---
## Optimization Checklist
Use this to systematically improve any prompt:
### Clarity
- [ ] Task explicitly stated
- [ ] No ambiguous terms
- [ ] Success criteria defined
- [ ] All requirements listed
### Completeness
- [ ] All necessary context provided
- [ ] Constraints specified
- [ ] Edge cases addressed
- [ ] Output format defined
### Structure
- [ ] Organized into clear sections
- [ ] Appropriate headings/tags
- [ ] Logical flow
- [ ] Easy to parse
### Technique
- [ ] Right approach for task type
- [ ] Examples if pattern matching needed
- [ ] Reasoning steps if complex
- [ ] Progressive disclosure if large
### Model Fit
- [ ] Uses model-specific features
- [ ] Matches model strengths
- [ ] Avoids model weaknesses
- [ ] Appropriate complexity level
### Testability
- [ ] Produces consistent outputs
- [ ] Format is parseable
- [ ] Easy to validate results
- [ ] Clear success/failure criteria
---
## Advanced Optimization Techniques
### Technique 1: Constraint Tuning
**Concept:** Find the right level of constraint specificity.
**Too loose:**
```
"Write good code"
```
**Too tight:**
```
"Write code with exactly 47 lines, 3 functions named specifically
foo, bar, baz, using only these 12 allowed keywords..."
```
**Just right:**
```
"Write TypeScript code:
- 3-5 small functions (<20 lines each)
- Descriptive names
- One responsibility per function
- Type-safe"
```
**Process:**
1. Start loose
2. Add constraints where outputs vary
3. Stop when consistency achieved
4. Don't over-constrain
### Technique 2: Example Engineering
**Concept:** Carefully craft few-shot examples for maximum learning.
**Principles:**
- **Diversity**: Cover different scenarios
- **Edge cases**: Include tricky examples
- **Consistency**: Same format for all
- **Clarity**: Obvious pattern to learn
**Bad examples:**
```
Input: "Hello"
Output: "HELLO"
Input: "Hi"
Output: "HI"
Input: "Hey"
Output: "HEY"
```
(Too similar, no edge cases)
**Good examples:**
```
Input: "Hello World"
Output: "HELLO WORLD"
Input: "it's-a test_case"
Output: "IT'S-A TEST_CASE"
Input: "123abc"
Output: "123ABC"
Input: ""
Output: ""
```
(Diverse, includes spaces, punctuation, numbers, empty)
### Technique 3: Format Scaffolding
**Concept:** Provide the exact structure you want filled in.
**Basic request:**
```
"Analyze this code"
```
**With scaffolding:**
```
"Analyze this code and fill in this template:
## Code Quality: [Good/Fair/Poor]
## Issues Found:
1. [Issue]: [Description]
- Location: [Line X]
- Fix: [Solution]
2. [Issue]: [Description]
- Location: [Line Y]
- Fix: [Solution]
## Strengths:
- [Strength 1]
- [Strength 2]
## Recommendation: [Approve/Request Changes]"
```
**Result:** Guaranteed format match.
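A scaffold also makes compliance easy to verify mechanically. A sketch that checks a response for the template's required headings:
```typescript
// Sketch: verify that a scaffolded response filled in the template above.
function matchesScaffold(output: string): boolean {
  const requiredHeadings = [
    /^## Code Quality:/m,
    /^## Issues Found:/m,
    /^## Strengths:/m,
    /^## Recommendation:/m,
  ];
  return requiredHeadings.every((re) => re.test(output));
}
```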
### Technique 4: Self-Correction Loop
**Concept:** Ask the model to validate and fix its own output.
**Single-pass:**
```
"Generate a JSON API response for user data"
```
**With self-correction:**
```
"Generate a JSON API response for user data.
After generating:
1. Validate JSON syntax
2. Check all required fields present
3. Verify data types correct
4. Ensure no sensitive data exposed
If validation fails, fix issues and regenerate."
```
**Result:** Higher quality, fewer errors.
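The same loop can also be enforced from the calling side: validate the output yourself and feed failures back for another attempt. A sketch, where `generate` stands in for your model call:
```typescript
// Client-side variant of the self-correction loop. `generate` is a
// placeholder for your own model-calling function.
async function generateValidJson(
  generate: (prompt: string) => Promise<string>,
  prompt: string,
  maxAttempts = 3,
): Promise<string> {
  let current = prompt;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const output = await generate(current);
    try {
      JSON.parse(output); // validation: must be syntactically valid JSON
      return output;
    } catch (e) {
      // Feed the parse error back so the model can correct itself.
      current = `${prompt}\n\nYour previous output was invalid JSON ` +
        `(${(e as Error).message}). Fix the issues and output only valid JSON.`;
    }
  }
  throw new Error(`No valid JSON after ${maxAttempts} attempts`);
}
```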
### Technique 5: Precision Through Negation
**Concept:** Specify what NOT to do as well as what to do.
**Positive only:**
```
"Write a professional email"
```
**Positive + Negative:**
```
"Write a professional email:
DO:
- Use formal salutation (Dear [Name])
- State purpose in first sentence
- Use clear, concise language
- Include specific next steps
DON'T:
- Use contractions (don't, won't, etc.)
- Include emojis or informal language
- Exceed 3 paragraphs
- End without clear call-to-action"
```
**Result:** Avoids common mistakes.
---
## Common Optimization Patterns
### Pattern: Vague → Specific
**Before:** "Make this better"
**After:** "Refactor to reduce complexity from 15 to <5, extract helper functions, add type safety"
### Pattern: No Examples → Few-Shot
**Before:** "Format these dates"
**After:** "2025-01-15 → January 15, 2025\n2024-12-01 → December 1, 2024\nNow format: 2025-03-22"
### Pattern: Single-Shot → Progressive
**Before:** "Build entire app"
**After:** "Stage 1: Design architecture [wait for approval] Stage 2: Implement core [wait for approval]..."
### Pattern: Generic → Model-Specific
**Before:** "Review this code"
**After (GPT-5):** "SYSTEM: Senior dev\nTASK: Code review\nCONSTRAINTS:..."
**After (Claude):** `<instruction>Review code</instruction><code>...</code>`
### Pattern: Implicit → Explicit Format
**Before:** "List the issues"
**After:** "List issues as:\n1. [Issue] - Line X - [Fix]\n2. [Issue] - Line Y - [Fix]"
---
## Quick Reference
### Optimization Decision Tree
```
Is output inconsistent?
├─ Yes → Add format specification + constraints
└─ No → Continue
Is output incomplete?
├─ Yes → Check if task too complex → Use progressive disclosure
└─ No → Continue
Is output inaccurate?
├─ Yes → Add examples (few-shot) or reasoning (CoT)
└─ No → Continue
Is output low quality?
├─ Yes → Add quality criteria + validation step
└─ No → Prompt is good!
```
### Quick Fixes
| Symptom | Quick Fix |
|---------|-----------|
| Varying formats | Add exact format template |
| Wrong answers | Add "think step-by-step" |
| Inconsistent results | Add specific constraints |
| Missing edge cases | Add explicit edge case handling |
| Too verbose | Specify length limit |
| Too brief | Request "detailed" or "comprehensive" |
| Wrong assumptions | Provide complete context |
| Partial completion | Break into stages |
---
**Remember:** Optimization is iterative. Start simple, measure results, identify failures, apply targeted fixes, and repeat until quality is sufficient. Don't over-engineer prompts that already work well enough.