Prompt Optimization Strategies
Overview
This guide covers systematic approaches to analyzing and improving prompts that aren't giving you the results you need. Learn how to diagnose problems, apply fixes, and measure improvements.
The Optimization Process
1. IDENTIFY → What's wrong with current results?
2. DIAGNOSE → Why is the prompt failing?
3. FIX → Apply appropriate optimization
4. TEST → Verify improvement
5. ITERATE → Refine further if needed
Step 1: Identify the Problem
Common Prompt Problems
| Problem | Symptoms | Quick Test |
|---|---|---|
| Vague instructions | Inconsistent outputs, missing requirements | Run the prompt 3 times - do you get similar results? |
| Missing context | Wrong assumptions, irrelevant responses | Does the prompt include all necessary background? |
| Unclear format | Output structure varies | Does prompt specify exact format? |
| Too complex | Partial completion, confusion | Does prompt try to do too much at once? |
| Wrong technique | Poor accuracy, missed patterns | Does task match technique used? |
| Model mismatch | Suboptimal results | Are you using model-specific features? |
| No examples | Pattern not learned | Would examples help? |
| Ambiguous terms | Misinterpreted requirements | Are terms like "professional" defined? |
Diagnostic Questions
Ask yourself:
Clarity:
- Is the task stated explicitly?
- Are all requirements spelled out?
- Would a junior developer understand this?
Completeness:
- Is all necessary context included?
- Are constraints specified?
- Is output format defined?
Consistency:
- Does running it 3x give similar results?
- Are there contradictory instructions?
- Is terminology used consistently?
Appropriateness:
- Is the right technique being used?
- Is this the right model for the task?
- Is complexity level appropriate?
Step 2: Diagnose Root Cause
Problem: Inconsistent Results
Diagnosis:
- Instructions too vague
- Missing constraints
- Ambiguous terminology
- No output format specified
How to confirm: Run the same prompt 3-5 times and compare outputs.
Evidence:
Run 1: Returns JSON
Run 2: Returns markdown table
Run 3: Returns plain text
Diagnosis: No format specification
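One way to automate this check is a small harness that runs the same prompt several times and tallies the output format of each run. A minimal sketch in TypeScript, assuming a hypothetical `callModel` client (swap in whatever SDK you actually use):

```typescript
// Hypothetical model client - replace with your actual SDK call (assumption).
declare function callModel(prompt: string): Promise<string>;

// Crude format classifier - just enough to expose run-to-run drift.
function classifyFormat(output: string): "json" | "markdown-table" | "plain-text" {
  const trimmed = output.trim();
  if (trimmed.startsWith("{") || trimmed.startsWith("[")) return "json";
  if (trimmed.includes("|---")) return "markdown-table";
  return "plain-text";
}

async function checkConsistency(prompt: string, runs = 5): Promise<void> {
  const counts = new Map<string, number>();
  for (let i = 0; i < runs; i++) {
    const format = classifyFormat(await callModel(prompt));
    counts.set(format, (counts.get(format) ?? 0) + 1);
  }
  // More than one distinct format across identical runs confirms
  // the "no format specification" diagnosis.
  console.log(counts.size === 1 ? "Consistent" : "Inconsistent", [...counts.entries()]);
}
```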
Problem: Wrong or Irrelevant Answers
Diagnosis:
- Missing context
- Unclear task definition
- Wrong assumptions by model
- No examples provided
How to confirm: Does the prompt include all information needed to complete the task correctly?
Evidence:
Prompt: "Optimize this code"
Response: "What language is this? What should I optimize for?"
Diagnosis: Missing context (language, optimization goals)
Problem: Partial Completion
Diagnosis:
- Task too complex for single shot
- Prompt too long/confusing
- Multiple competing requirements
- Agentic behavior not calibrated
How to confirm: Break the task into smaller parts - does each part work individually?
Evidence:
Prompt: "Build complete e-commerce backend with auth, payments, inventory"
Response: Only implements auth, ignores rest
Diagnosis: Too complex, needs progressive disclosure
Problem: Incorrect Pattern Application
Diagnosis:
- Using few-shot when zero-shot would work
- Missing chain-of-thought for complex reasoning
- No examples when pattern matching needed
- Wrong model for the task
How to confirm: Compare results with different techniques.
Evidence:
Complex math problem solved directly: Wrong answer
Same problem with "think step-by-step": Correct answer
Diagnosis: Needs chain-of-thought prompting
Step 3: Apply Fixes
Fix: Add Specificity
Before:
"Write professional code"
After:
"Write TypeScript code following these standards:
- Strict type checking enabled
- JSDoc comments for all public methods
- Error handling for all async operations
- Maximum cyclomatic complexity of 5
- Meaningful variable names (no abbreviations)"
Why it works: Removes ambiguity, defines "professional" explicitly.
Fix: Provide Context
Before:
"Optimize this function:
[code]"
After:
"Optimize this TypeScript function that processes user uploads:
Context:
- Called 1000x per minute
- Typical input: 1-10MB files
- Current bottleneck: O(n²) loop
- Running in Node.js 18
Optimization goals (in priority order):
1. Reduce time complexity
2. Lower memory usage
3. Maintain readability
[code]"
Why it works: Provides necessary context for appropriate optimization.
Fix: Specify Format
Before:
"Analyze this code for issues"
After:
"Analyze this code for issues and respond in this exact format:
## Issues Found
### [Issue Title]
- **Severity**: Critical/High/Medium/Low
- **Category**: Security/Performance/Quality
- **Location**: Line X
- **Problem**: [Description]
- **Fix**: [Specific solution]
### [Next Issue]
[Same format]
## Summary
Total issues: [count by severity]
Recommendation: [Approve/Request Changes]"
Why it works: Guarantees consistent, parseable output.
Fix: Add Examples (Few-Shot)
Before:
"Convert these to camelCase:
user_name
total_count
is_active"
After:
"Convert these to camelCase:
Examples:
user_name → userName
total_count → totalCount
Now convert:
is_active →
created_at →
max_retry_attempts →"
Why it works: Shows the pattern, improves accuracy.
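To keep the example format identical across prompts, the few-shot block can be generated from a list of input/output pairs. A sketch; the function and type names are illustrative, not a fixed API:

```typescript
interface Example {
  input: string;
  output: string;
}

// Assemble a few-shot prompt so every example follows the same pattern.
function buildFewShotPrompt(task: string, examples: Example[], queries: string[]): string {
  const shown = examples.map((e) => `${e.input} → ${e.output}`).join("\n");
  const asked = queries.map((q) => `${q} →`).join("\n");
  return `${task}\n\nExamples:\n${shown}\n\nNow convert:\n${asked}`;
}

// Reproduces the camelCase prompt above.
const prompt = buildFewShotPrompt(
  "Convert these to camelCase:",
  [
    { input: "user_name", output: "userName" },
    { input: "total_count", output: "totalCount" },
  ],
  ["is_active", "created_at", "max_retry_attempts"],
);
```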
Fix: Enable Reasoning (Chain-of-Thought)
Before:
"What's the time complexity of this algorithm?"
After:
"Analyze the time complexity of this algorithm step-by-step:
1. First, identify the loops and recursive calls
2. Then, determine how many times each executes
3. Next, calculate the overall complexity
4. Finally, express in Big O notation
Algorithm:
[code]"
Why it works: Structured reasoning leads to correct analysis.
Fix: Break Down Complexity (Progressive Disclosure)
Before:
"Build a complete user authentication system with OAuth,
2FA, session management, and password reset"
After:
"Let's build a user authentication system in stages.
STAGE 1: Design the overall architecture
- Components needed
- Data models
- API endpoints
Wait for my review before proceeding to implementation."
Then after review:
"STAGE 2: Implement the core authentication service
- User login with email/password
- JWT token generation
- Session management
Use the architecture we designed in Stage 1."
Why it works: Breaks an overwhelming task into manageable pieces.
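If the stages are scripted rather than typed by hand, the review gate can be made explicit. A minimal sketch, assuming hypothetical `callModel` and `awaitHumanApproval` helpers:

```typescript
// Both helpers are hypothetical stand-ins (assumptions), not a real API.
declare function callModel(prompt: string): Promise<string>;
declare function awaitHumanApproval(draft: string): Promise<boolean>;

// Send one stage at a time; each stage gates on human review.
// A real implementation would also thread prior stages into the
// conversation history so later stages see the approved design.
async function runStages(stages: string[]): Promise<void> {
  for (const stage of stages) {
    const draft = await callModel(stage);
    const approved = await awaitHumanApproval(draft);
    if (!approved) break; // stop here, revise this stage, rerun
  }
}
```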
Fix: Add Constraints
Before:
"Write a function to sort an array"
After:
"Write a function to sort an array with these constraints:
Requirements:
- Input: Array of numbers (may contain duplicates)
- Output: New array, sorted ascending
- Must be pure function (don't modify input)
- Time complexity: O(n log n) or better
- Language: TypeScript with strict types
Edge cases to handle:
- Empty array → return empty array
- Single element → return copy of array
- All same values → return copy of array
- Negative numbers → sort correctly"
Why it works: Removes ambiguity, ensures all cases handled.
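For reference, a sketch of an answer that satisfies every constraint above: pure, ascending, O(n log n), strictly typed, and correct on all four edge cases:

```typescript
// Pure function: returns a new array, never mutates the input.
function sortAscending(input: readonly number[]): number[] {
  // Spreading copies the array, so empty, single-element, and
  // all-duplicate inputs fall out naturally as copies; the numeric
  // comparator handles negatives, and sort() is O(n log n).
  return [...input].sort((a, b) => a - b);
}

sortAscending([]);          // []
sortAscending([42]);        // [42]
sortAscending([3, 3, 3]);   // [3, 3, 3]
sortAscending([5, -9, -2]); // [-9, -2, 5]
```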
Fix: Use Model-Specific Features
For GPT-5:
Before (generic):
"Implement this feature"
After (GPT-5 optimized):
SYSTEM: Senior TypeScript developer
TASK: Implement user authentication service
CONSTRAINTS:
- Use JWT with refresh tokens
- TypeScript strict mode
- Include error handling
OUTPUT: Complete implementation with tests
BEHAVIOR: Complete autonomously without asking questions
VERBOSITY: Concise - core functionality only
For Claude:
Before (generic):
"Review this code for security issues"
After (Claude optimized):
<instruction>
Review this code for security vulnerabilities
</instruction>
<code>
[Code to review]
</code>
<focus_areas>
- SQL injection
- XSS attacks
- Authentication bypasses
- Data exposure
</focus_areas>
<output_format>
For each vulnerability:
- Severity: [Critical/High/Medium/Low]
- Location: [Line number]
- Description: [What's wrong]
- Fix: [How to remediate]
</output_format>
Step 4: Test Improvements
A/B Testing Prompts
Method:
- Keep original prompt as baseline
- Create improved version
- Run both on the same inputs (5-10 test cases)
- Compare results objectively
Evaluation criteria:
- Accuracy (does it solve the problem?)
- Consistency (similar results each time?)
- Completeness (all requirements met?)
- Quality (meets standards?)
- Format compliance (matches specification?)
Example:
| Test Case | Original Prompt | Improved Prompt | Winner |
|---|---|---|---|
| Case 1 | 60% accurate | 95% accurate | Improved |
| Case 2 | Wrong format | Correct format | Improved |
| Case 3 | Incomplete | Complete | Improved |
| Case 4 | Correct | Correct | Tie |
| Case 5 | Partial | Complete | Improved |
Result: The improved prompt wins 4 of 5 cases; use it.
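The comparison table can come from a small harness rather than manual runs. A sketch, again assuming a hypothetical `callModel`, with a per-case pass/fail check you supply:

```typescript
declare function callModel(prompt: string): Promise<string>;

interface TestCase {
  input: string;
  // Assumption: you can judge pass/fail programmatically;
  // if not, record the outputs and score them by hand.
  passes: (output: string) => boolean;
}

async function abTest(original: string, improved: string, cases: TestCase[]) {
  let originalWins = 0, improvedWins = 0, ties = 0;
  for (const c of cases) {
    const a = c.passes(await callModel(`${original}\n\n${c.input}`));
    const b = c.passes(await callModel(`${improved}\n\n${c.input}`));
    if (a === b) ties++;
    else if (b) improvedWins++;
    else originalWins++;
  }
  console.log({ originalWins, improvedWins, ties });
}
```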
Measuring Improvement
Quantitative metrics:
- Success rate: X% of runs produce correct output
- Consistency: X% of runs produce similar output
- Completeness: X% of requirements met on average
- Format compliance: X% match specified format
- Token efficiency: Avg tokens to correct result
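Once runs are recorded, these numbers fall out of a simple scoring pass. A sketch; the record fields are illustrative:

```typescript
interface RunRecord {
  correct: boolean;       // solved the problem
  matchedFormat: boolean; // output matched the specification
  tokensUsed: number;
}

function scorePrompt(runs: RunRecord[]) {
  const pct = (n: number) => Math.round((n / runs.length) * 100);
  const successes = runs.filter((r) => r.correct);
  return {
    successRate: pct(successes.length),
    formatCompliance: pct(runs.filter((r) => r.matchedFormat).length),
    // Average tokens per *correct* result - a rough efficiency signal.
    tokensPerSuccess:
      successes.reduce((sum, r) => sum + r.tokensUsed, 0) /
      Math.max(1, successes.length),
  };
}
```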
Qualitative assessment:
- Does it feel easier to use?
- Do results need less manual correction?
- Is output more predictable?
- Are edge cases handled better?
Iterative Testing
Version 1 → Test → 70% success → Identify issues
Version 2 → Test → 85% success → Identify remaining issues
Version 3 → Test → 95% success → Good enough for production
Don't expect perfection on the first try. Iterate.
Step 5: Continuous Iteration
When to Iterate Further
Keep improving if:
- Success rate < 90%
- Inconsistent outputs across runs
- Missing edge cases
- Requires manual post-processing
- Takes multiple attempts to get right
Stop iterating when:
- Success rate > 95%
- Consistent, predictable outputs
- Handles known edge cases
- Minimal manual correction needed
- One-shot success most of the time
Iteration Strategies
Strategy 1: Incremental Refinement
- Start with basic working prompt
- Identify most common failure mode
- Add constraint/example to fix it
- Test
- Repeat for next most common failure
Strategy 2: Template Evolution
- Create template with placeholders
- Test with multiple scenarios
- Identify missing sections
- Add new sections to template
- Retest with scenarios
Strategy 3: Technique Stacking
- Start with base technique (e.g., zero-shot)
- If accuracy low, add chain-of-thought
- If still issues, add few-shot examples
- If format wrong, add explicit structure
- Layer techniques until quality is sufficient (see the sketch below)
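This layering can be encoded directly: each technique becomes an optional layer appended to the base prompt. A sketch; the names are illustrative:

```typescript
interface Layers {
  chainOfThought?: boolean;
  examples?: { input: string; output: string }[];
  formatSpec?: string;
}

// Start from the base prompt and stack layers only as quality demands.
function stackTechniques(basePrompt: string, layers: Layers): string {
  const parts = [basePrompt];
  if (layers.chainOfThought) {
    parts.push("Think step-by-step before answering.");
  }
  if (layers.examples?.length) {
    const shown = layers.examples.map((e) => `${e.input} → ${e.output}`).join("\n");
    parts.push(`Examples:\n${shown}`);
  }
  if (layers.formatSpec) {
    parts.push(`Respond in this exact format:\n${layers.formatSpec}`);
  }
  return parts.join("\n\n");
}
```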
Tracking What Works
Maintain a prompt library:
prompts/
├── code-review.md (Success rate: 97%)
├── data-analysis.md (Success rate: 94%)
├── documentation.md (Success rate: 99%)
└── bug-diagnosis.md (Success rate: 91%)
Document each prompt:
# Code Review Prompt
**Success Rate:** 97%
**Last Updated:** 2025-01-15
**Model:** GPT-5 / Claude Sonnet
**Use When:** Reviewing pull requests for security, performance, quality
**Template:**
[Prompt template]
**Known Limitations:**
- May miss complex race conditions
- Better for backend than frontend code
**Changelog:**
- v3 (2025-01-15): Added self-validation step (95% → 97%)
- v2 (2025-01-10): Added severity ratings (90% → 95%)
- v1 (2025-01-05): Initial version (90%)
Optimization Checklist
Use this to systematically improve any prompt:
Clarity
- Task explicitly stated
- No ambiguous terms
- Success criteria defined
- All requirements listed
Completeness
- All necessary context provided
- Constraints specified
- Edge cases addressed
- Output format defined
Structure
- Organized into clear sections
- Appropriate headings/tags
- Logical flow
- Easy to parse
Technique
- Right approach for task type
- Examples if pattern matching needed
- Reasoning steps if complex
- Progressive disclosure if large
Model Fit
- Uses model-specific features
- Matches model strengths
- Avoids model weaknesses
- Appropriate complexity level
Testability
- Produces consistent outputs
- Format is parseable
- Easy to validate results
- Clear success/failure criteria
Advanced Optimization Techniques
Technique 1: Constraint Tuning
Concept: Find the right level of constraint specificity.
Too loose:
"Write good code"
Too tight:
"Write code with exactly 47 lines, 3 functions named specifically
foo, bar, baz, using only these 12 allowed keywords..."
Just right:
"Write TypeScript code:
- 3-5 small functions (<20 lines each)
- Descriptive names
- One responsibility per function
- Type-safe"
Process:
- Start loose
- Add constraints where outputs vary
- Stop when consistency achieved
- Don't over-constrain
Technique 2: Example Engineering
Concept: Carefully craft few-shot examples for maximum learning.
Principles:
- Diversity: Cover different scenarios
- Edge cases: Include tricky examples
- Consistency: Same format for all
- Clarity: Obvious pattern to learn
Bad examples:
Input: "Hello"
Output: "HELLO"
Input: "Hi"
Output: "HI"
Input: "Hey"
Output: "HEY"
(Too similar, no edge cases)
Good examples:
Input: "Hello World"
Output: "HELLO WORLD"
Input: "it's-a test_case"
Output: "IT'S-A TEST_CASE"
Input: "123abc"
Output: "123ABC"
Input: ""
Output: ""
(Diverse, includes spaces, punctuation, numbers, empty)
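Because a single contradictory pair silently degrades accuracy, it can be worth verifying each example against the transform it is supposed to teach before it ever reaches a prompt. A sketch for the uppercase case:

```typescript
// Verify every few-shot pair against the intended transform so
// contradictory examples never ship inside a prompt.
const examples: Array<[input: string, expected: string]> = [
  ["Hello World", "HELLO WORLD"],
  ["it's-a test_case", "IT'S-A TEST_CASE"],
  ["123abc", "123ABC"],
  ["", ""],
];

for (const [input, expected] of examples) {
  if (input.toUpperCase() !== expected) {
    throw new Error(`Bad example: "${input}" → expected "${input.toUpperCase()}"`);
  }
}
```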
Technique 3: Format Scaffolding
Concept: Provide the exact structure you want filled in.
Basic request:
"Analyze this code"
With scaffolding:
"Analyze this code and fill in this template:
## Code Quality: [Good/Fair/Poor]
## Issues Found:
1. [Issue]: [Description]
- Location: [Line X]
- Fix: [Solution]
2. [Issue]: [Description]
- Location: [Line Y]
- Fix: [Solution]
## Strengths:
- [Strength 1]
- [Strength 2]
## Recommendation: [Approve/Request Changes]"
Result: Guaranteed format match.
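That guarantee is what makes downstream automation possible: the scaffolded output can be parsed mechanically. A sketch whose regexes assume the exact headings from the template above:

```typescript
// Extract structured fields from the scaffolded review.
// The patterns assume the exact headings from the template above.
function parseReview(output: string) {
  const quality = output.match(/## Code Quality: (\w+)/)?.[1] ?? "unknown";
  const recommendation = output.match(/## Recommendation: (.+)/)?.[1] ?? "unknown";
  // Bullets are taken only from inside the Strengths section.
  const strengthsBlock = output.split("## Strengths:")[1]?.split("##")[0] ?? "";
  const strengths = [...strengthsBlock.matchAll(/^- (.+)$/gm)].map((m) => m[1]);
  return { quality, recommendation, strengths };
}
```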
Technique 4: Self-Correction Loop
Concept: Ask the model to validate and fix its own output.
Single-pass:
"Generate a JSON API response for user data"
With self-correction:
"Generate a JSON API response for user data.
After generating:
1. Validate JSON syntax
2. Check all required fields present
3. Verify data types correct
4. Ensure no sensitive data exposed
If validation fails, fix issues and regenerate."
Result: Higher quality, fewer errors.
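The same idea can also be enforced from outside the model: validate the output in code and feed failures back as corrections. A sketch of a generate-validate-retry loop, with a hypothetical `callModel` and an illustrative field list:

```typescript
declare function callModel(prompt: string): Promise<string>;

const REQUIRED_FIELDS = ["id", "name", "email"]; // illustrative

async function generateWithCorrection(prompt: string, maxTries = 3): Promise<object> {
  let feedback = "";
  for (let attempt = 0; attempt < maxTries; attempt++) {
    const raw = await callModel(prompt + feedback);
    try {
      const parsed = JSON.parse(raw);
      const missing = REQUIRED_FIELDS.filter((f) => !(f in parsed));
      if (missing.length === 0) return parsed;
      feedback = `\n\nYour previous response was missing: ${missing.join(", ")}. Fix and regenerate.`;
    } catch {
      feedback = "\n\nYour previous response was not valid JSON. Fix the syntax and regenerate.";
    }
  }
  throw new Error("Output failed validation after retries");
}
```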
Technique 5: Precision Through Negation
Concept: Specify what NOT to do as well as what to do.
Positive only:
"Write a professional email"
Positive + Negative:
"Write a professional email:
DO:
- Use formal salutation (Dear [Name])
- State purpose in first sentence
- Use clear, concise language
- Include specific next steps
DON'T:
- Use contractions (don't, won't, etc.)
- Include emojis or informal language
- Exceed 3 paragraphs
- End without clear call-to-action"
Result: Avoids common mistakes.
Common Optimization Patterns
Pattern: Vague → Specific
Before: "Make this better" After: "Refactor to reduce complexity from 15 to <5, extract helper functions, add type safety"
Pattern: No Examples → Few-Shot
Before: "Format these dates" After: "2025-01-15 → January 15, 2025\n2024-12-01 → December 1, 2024\nNow format: 2025-03-22"
Pattern: Single-Shot → Progressive
Before: "Build entire app" After: "Stage 1: Design architecture [wait for approval] Stage 2: Implement core [wait for approval]..."
Pattern: Generic → Model-Specific
Before: "Review this code"
After (GPT-5): "SYSTEM: Senior dev\nTASK: Code review\nCONSTRAINTS:..."
After (Claude): <instruction>Review code</instruction><code>...</code>
Pattern: Implicit → Explicit Format
Before: "List the issues" After: "List issues as:\n1. [Issue] - Line X - [Fix]\n2. [Issue] - Line Y - [Fix]"
Quick Reference
Optimization Decision Tree
Is output inconsistent?
├─ Yes → Add format specification + constraints
└─ No → Continue
Is output incomplete?
├─ Yes → Check if task too complex → Use progressive disclosure
└─ No → Continue
Is output inaccurate?
├─ Yes → Add examples (few-shot) or reasoning (CoT)
└─ No → Continue
Is output low quality?
├─ Yes → Add quality criteria + validation step
└─ No → Prompt is good!
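The tree reduces to a straightforward lookup if you want it in code. A sketch; the symptom labels are illustrative:

```typescript
type Symptom = "inconsistent" | "incomplete" | "inaccurate" | "low-quality";

// Direct encoding of the decision tree above.
function suggestFix(symptom: Symptom | null): string {
  switch (symptom) {
    case "inconsistent": return "Add format specification + constraints";
    case "incomplete":   return "Task too complex: use progressive disclosure";
    case "inaccurate":   return "Add examples (few-shot) or reasoning (CoT)";
    case "low-quality":  return "Add quality criteria + validation step";
    default:             return "Prompt is good!";
  }
}
```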
Quick Fixes
| Symptom | Quick Fix |
|---|---|
| Varying formats | Add exact format template |
| Wrong answers | Add "think step-by-step" |
| Inconsistent results | Add specific constraints |
| Missing edge cases | Add explicit edge case handling |
| Too verbose | Specify length limit |
| Too brief | Request "detailed" or "comprehensive" |
| Wrong assumptions | Provide complete context |
| Partial completion | Break into stages |
Remember: Optimization is iterative. Start simple, measure results, identify failures, apply targeted fixes, and repeat until quality is sufficient. Don't over-engineer prompts that already work well enough.