Prompt Optimization Strategies

Overview

This guide covers systematic approaches to analyzing and improving prompts that aren't giving you the results you need. Learn how to diagnose problems, apply fixes, and measure improvements.

The Optimization Process

1. IDENTIFY → What's wrong with current results?
2. DIAGNOSE → Why is the prompt failing?
3. FIX → Apply appropriate optimization
4. TEST → Verify improvement
5. ITERATE → Refine further if needed

Step 1: Identify the Problem

Common Prompt Problems

| Problem | Symptoms | Quick Test |
|---|---|---|
| Vague instructions | Inconsistent outputs, missing requirements | Run prompt 3 times - do you get similar results? |
| Missing context | Wrong assumptions, irrelevant responses | Does the prompt include all necessary background? |
| Unclear format | Output structure varies | Does prompt specify exact format? |
| Too complex | Partial completion, confusion | Does prompt try to do too much at once? |
| Wrong technique | Poor accuracy, missed patterns | Does task match technique used? |
| Model mismatch | Suboptimal results | Are you using model-specific features? |
| No examples | Pattern not learned | Would examples help? |
| Ambiguous terms | Misinterpreted requirements | Are terms like "professional" defined? |

Diagnostic Questions

Ask yourself:

Clarity:

  • Is the task stated explicitly?
  • Are all requirements spelled out?
  • Would a junior developer understand this?

Completeness:

  • Is all necessary context included?
  • Are constraints specified?
  • Is output format defined?

Consistency:

  • Does running it 3x give similar results?
  • Are there contradictory instructions?
  • Is terminology used consistently?

Appropriateness:

  • Is the right technique being used?
  • Is this the right model for the task?
  • Is complexity level appropriate?

Step 2: Diagnose Root Cause

Problem: Inconsistent Results

Diagnosis:

  • Instructions too vague
  • Missing constraints
  • Ambiguous terminology
  • No output format specified

How to confirm: Run the same prompt 3-5 times and compare outputs.

Evidence:

Run 1: Returns JSON
Run 2: Returns markdown table
Run 3: Returns plain text

Diagnosis: No format specification
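
To confirm this diagnosis mechanically, a small probe can bucket the output shapes across runs. A minimal sketch, assuming a hypothetical `runPrompt` stand-in for whatever model client you use:

```typescript
// Consistency probe: run the same prompt several times and bucket the
// output shapes. A spread across buckets confirms the "no format
// specification" diagnosis.
declare function runPrompt(prompt: string): Promise<string>;

function classifyFormat(output: string): "json" | "markdown-table" | "plain" {
  try {
    JSON.parse(output);
    return "json";
  } catch {
    // not JSON, fall through
  }
  return output.includes("|---") ? "markdown-table" : "plain";
}

async function probeConsistency(prompt: string, runs = 5): Promise<void> {
  const counts = new Map<string, number>();
  for (let i = 0; i < runs; i++) {
    const format = classifyFormat(await runPrompt(prompt));
    counts.set(format, (counts.get(format) ?? 0) + 1);
  }
  console.log(Object.fromEntries(counts)); // e.g. { json: 2, plain: 3 }
}
```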

Problem: Wrong or Irrelevant Answers

Diagnosis:

  • Missing context
  • Unclear task definition
  • Wrong assumptions by model
  • No examples provided

How to confirm: Does the prompt include all information needed to complete the task correctly?

Evidence:

Prompt: "Optimize this code"
Response: "What language is this? What should I optimize for?"

Diagnosis: Missing context (language, optimization goals)

Problem: Partial Completion

Diagnosis:

  • Task too complex for single shot
  • Prompt too long/confusing
  • Multiple competing requirements
  • Agentic behavior not calibrated

How to confirm: Break task into smaller parts - does each part work individually?

Evidence:

Prompt: "Build complete e-commerce backend with auth, payments, inventory"
Response: Only implements auth, ignores rest

Diagnosis: Too complex, needs progressive disclosure

Problem: Incorrect Pattern Application

Diagnosis:

  • Using few-shot when zero-shot would work
  • Missing chain-of-thought for complex reasoning
  • No examples when pattern matching needed
  • Wrong model for the task

How to confirm: Compare results with different techniques.

Evidence:

Complex math problem solved directly: Wrong answer
Same problem with "think step-by-step": Correct answer

Diagnosis: Needs chain-of-thought prompting

Step 3: Apply Fixes

Fix: Add Specificity

Before:

"Write professional code"

After:

"Write TypeScript code following these standards:
- Strict type checking enabled
- JSDoc comments for all public methods
- Error handling for all async operations
- Maximum cyclomatic complexity of 5
- Meaningful variable names (no abbreviations)"

Why it works: Removes ambiguity, defines "professional" explicitly.

Fix: Provide Context

Before:

"Optimize this function:
[code]"

After:

"Optimize this TypeScript function that processes user uploads:

Context:
- Called 1000x per minute
- Typical input: 1-10MB files
- Current bottleneck: O(n²) loop
- Running in Node.js 18

Optimization goals (in priority order):
1. Reduce time complexity
2. Lower memory usage
3. Maintain readability

[code]"

Why it works: Provides necessary context for appropriate optimization.

Fix: Specify Format

Before:

"Analyze this code for issues"

After:

"Analyze this code for issues and respond in this exact format:

## Issues Found

### [Issue Title]
- **Severity**: Critical/High/Medium/Low
- **Category**: Security/Performance/Quality
- **Location**: Line X
- **Problem**: [Description]
- **Fix**: [Specific solution]

### [Next Issue]
[Same format]

## Summary
Total issues: [count by severity]
Recommendation: [Approve/Request Changes]"

Why it works: Guarantees consistent, parseable output.

Fix: Add Examples (Few-Shot)

Before:

"Convert these to camelCase:
user_name
total_count
is_active"

After:

"Convert these to camelCase:

Examples:
user_name → userName
total_count → totalCount

Now convert:
is_active →
created_at →
max_retry_attempts →"

Why it works: Shows the pattern, improves accuracy.

Fix: Enable Reasoning (Chain-of-Thought)

Before:

"What's the time complexity of this algorithm?"

After:

"Analyze the time complexity of this algorithm step-by-step:

1. First, identify the loops and recursive calls
2. Then, determine how many times each executes
3. Next, calculate the overall complexity
4. Finally, express in Big O notation

Algorithm:
[code]"

Why it works: Structured reasoning leads to correct analysis.

Fix: Break Down Complexity (Progressive Disclosure)

Before:

"Build a complete user authentication system with OAuth,
2FA, session management, and password reset"

After:

"Let's build a user authentication system in stages.

STAGE 1: Design the overall architecture
- Components needed
- Data models
- API endpoints

Wait for my review before proceeding to implementation."

Then after review:

"STAGE 2: Implement the core authentication service
- User login with email/password
- JWT token generation
- Session management

Use the architecture we designed in Stage 1."

Why it works: Breaks overwhelming task into manageable pieces.

Fix: Add Constraints

Before:

"Write a function to sort an array"

After:

"Write a function to sort an array with these constraints:

Requirements:
- Input: Array of numbers (may contain duplicates)
- Output: New array, sorted ascending
- Must be pure function (don't modify input)
- Time complexity: O(n log n) or better
- Language: TypeScript with strict types

Edge cases to handle:
- Empty array → return empty array
- Single element → return copy of array
- All same values → return copy of array
- Negative numbers → sort correctly"

Why it works: Removes ambiguity, ensures all cases handled.
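
For reference, here is one implementation that meets every constraint above: pure, O(n log n) via the built-in sort, and covering the listed edge cases.

```typescript
// Pure ascending sort: copies the input before sorting, so the
// original array is never modified. Empty, single-element, duplicate,
// and negative inputs all fall out of the same code path.
function sortAscending(input: readonly number[]): number[] {
  return [...input].sort((a, b) => a - b);
}

sortAscending([]);          // []
sortAscending([42]);        // [42]
sortAscending([-3, 7, -3]); // [-3, -3, 7]
```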

Fix: Use Model-Specific Features

For GPT-5:

Before (generic):

"Implement this feature"

After (GPT-5 optimized):

SYSTEM: Senior TypeScript developer
TASK: Implement user authentication service
CONSTRAINTS:
- Use JWT with refresh tokens
- TypeScript strict mode
- Include error handling
OUTPUT: Complete implementation with tests
BEHAVIOR: Complete autonomously without asking questions
VERBOSITY: Concise - core functionality only

For Claude:

Before (generic):

"Review this code for security issues"

After (Claude optimized):

<instruction>
Review this code for security vulnerabilities
</instruction>

<code>
[Code to review]
</code>

<focus_areas>
- SQL injection
- XSS attacks
- Authentication bypasses
- Data exposure
</focus_areas>

<output_format>
For each vulnerability:
- Severity: [Critical/High/Medium/Low]
- Location: [Line number]
- Description: [What's wrong]
- Fix: [How to remediate]
</output_format>
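
If you generate these prompts programmatically, the tagged structure is plain string assembly. A minimal sketch using the tag names from the example above (this is just string building, not any Anthropic-specific API):

```typescript
// Assemble the XML-tagged review prompt from parts, so the same
// structure can be reused for any code snippet.
interface ReviewPromptParts {
  instruction: string;
  code: string;
  focusAreas: string[];
  outputFormat: string;
}

function buildTaggedPrompt(p: ReviewPromptParts): string {
  return [
    `<instruction>\n${p.instruction}\n</instruction>`,
    `<code>\n${p.code}\n</code>`,
    `<focus_areas>\n${p.focusAreas.map((a) => `- ${a}`).join("\n")}\n</focus_areas>`,
    `<output_format>\n${p.outputFormat}\n</output_format>`,
  ].join("\n\n");
}
```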

Step 4: Test Improvements

A/B Testing Prompts

Method:

  1. Keep original prompt as baseline
  2. Create improved version
  3. Run both on same inputs (5-10 test cases)
  4. Compare results objectively

Evaluation criteria:

  • Accuracy (does it solve the problem?)
  • Consistency (similar results each time?)
  • Completeness (all requirements met?)
  • Quality (meets standards?)
  • Format compliance (matches specification?)

Example:

| Test Case | Original Prompt | Improved Prompt | Winner |
|---|---|---|---|
| Case 1 | 60% accurate | 95% accurate | Improved |
| Case 2 | Wrong format | Correct format | Improved |
| Case 3 | Incomplete | Complete | Improved |
| Case 4 | Correct | Correct | Tie |
| Case 5 | Partial | Complete | Improved |

Result: The improved prompt wins 4/5 cases; use it.
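
A small harness makes this comparison repeatable. The sketch below assumes a hypothetical `runPrompt` client and a caller-supplied scoring function:

```typescript
// A/B harness: run baseline and candidate prompt templates over the
// same test cases and tally wins by score.
declare function runPrompt(prompt: string): Promise<string>;

type Scorer = (output: string, testCase: string) => number; // 0..1

async function abTest(
  baseline: (tc: string) => string,
  candidate: (tc: string) => string,
  testCases: string[],
  score: Scorer,
): Promise<{ baselineWins: number; candidateWins: number; ties: number }> {
  const tally = { baselineWins: 0, candidateWins: 0, ties: 0 };
  for (const tc of testCases) {
    const a = score(await runPrompt(baseline(tc)), tc);
    const b = score(await runPrompt(candidate(tc)), tc);
    if (b > a) tally.candidateWins++;
    else if (a > b) tally.baselineWins++;
    else tally.ties++;
  }
  return tally;
}
```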

Measuring Improvement

Quantitative metrics:

  • Success rate: X% of runs produce correct output
  • Consistency: X% of runs produce similar output
  • Completeness: X% of requirements met on average
  • Format compliance: X% match specified format
  • Token efficiency: Avg tokens to correct result
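
These metrics fall out of a batch of labeled run results. A minimal sketch (the `RunResult` fields are illustrative, not a standard schema):

```typescript
// Summarize a batch of manually labeled runs into the metrics above.
interface RunResult {
  correct: boolean;
  matchedFormat: boolean;
  requirementsMet: number;   // out of requirementsTotal
  requirementsTotal: number;
  tokensUsed: number;
}

function summarize(runs: RunResult[]) {
  const pct = (n: number) => Math.round((100 * n) / runs.length);
  return {
    successRate: pct(runs.filter((r) => r.correct).length),
    formatCompliance: pct(runs.filter((r) => r.matchedFormat).length),
    avgCompleteness: Math.round(
      (100 * runs.reduce((s, r) => s + r.requirementsMet / r.requirementsTotal, 0)) /
        runs.length,
    ),
    avgTokens: runs.reduce((s, r) => s + r.tokensUsed, 0) / runs.length,
  };
}
```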

Qualitative assessment:

  • Does it feel easier to use?
  • Do results need less manual correction?
  • Is output more predictable?
  • Are edge cases handled better?

Iterative Testing

Version 1 → Test → 70% success → Identify issues
Version 2 → Test → 85% success → Identify remaining issues
Version 3 → Test → 95% success → Good enough for production

Don't expect perfection on first try. Iterate.


Step 5: Continuous Iteration

When to Iterate Further

Keep improving if:

  • Success rate < 90%
  • Inconsistent outputs across runs
  • Missing edge cases
  • Requires manual post-processing
  • Takes multiple attempts to get right

Stop iterating when:

  • Success rate > 95%
  • Consistent, predictable outputs
  • Handles known edge cases
  • Minimal manual correction needed
  • One-shot success most of the time

Iteration Strategies

Strategy 1: Incremental Refinement

  1. Start with basic working prompt
  2. Identify most common failure mode
  3. Add constraint/example to fix it
  4. Test
  5. Repeat for next most common failure

Strategy 2: Template Evolution

  1. Create template with placeholders
  2. Test with multiple scenarios
  3. Identify missing sections
  4. Add new sections to template
  5. Retest with scenarios

Strategy 3: Technique Stacking

  1. Start with base technique (e.g., zero-shot)
  2. If accuracy low, add chain-of-thought
  3. If still issues, add few-shot examples
  4. If format wrong, add explicit structure
  5. Layer techniques until quality sufficient
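
Stacking works naturally as function composition over the prompt string. A sketch with illustrative fragment wording:

```typescript
// Technique stacking as composition: start from a base prompt and
// layer fragments only when the previous version underperforms.
const withChainOfThought = (p: string) =>
  `${p}\n\nThink step-by-step before answering.`;

const withExamples = (p: string, examples: Array<[string, string]>) =>
  `${p}\n\nExamples:\n${examples.map(([i, o]) => `${i} → ${o}`).join("\n")}`;

const withFormat = (p: string, template: string) =>
  `${p}\n\nRespond in exactly this format:\n${template}`;

// Layer only as needed:
let prompt = "Convert identifiers to camelCase.";
prompt = withExamples(prompt, [
  ["user_name", "userName"],
  ["total_count", "totalCount"],
]);
// If format still varies:
// prompt = withFormat(prompt, "one identifier per line");
```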

Tracking What Works

Maintain a prompt library:

prompts/
├── code-review.md (Success rate: 97%)
├── data-analysis.md (Success rate: 94%)
├── documentation.md (Success rate: 99%)
└── bug-diagnosis.md (Success rate: 91%)

Document each prompt:

# Code Review Prompt

**Success Rate:** 97%
**Last Updated:** 2025-01-15
**Model:** GPT-5 / Claude Sonnet

**Use When:** Reviewing pull requests for security, performance, quality

**Template:**
[Prompt template]

**Known Limitations:**
- May miss complex race conditions
- Better for backend than frontend code

**Changelog:**
- v3 (2025-01-15): Added self-validation step (95% → 97%)
- v2 (2025-01-10): Added severity ratings (90% → 95%)
- v1 (2025-01-05): Initial version (90%)

Optimization Checklist

Use this to systematically improve any prompt:

Clarity

  • Task explicitly stated
  • No ambiguous terms
  • Success criteria defined
  • All requirements listed

Completeness

  • All necessary context provided
  • Constraints specified
  • Edge cases addressed
  • Output format defined

Structure

  • Organized into clear sections
  • Appropriate headings/tags
  • Logical flow
  • Easy to parse

Technique

  • Right approach for task type
  • Examples if pattern matching needed
  • Reasoning steps if complex
  • Progressive disclosure if large

Model Fit

  • Uses model-specific features
  • Matches model strengths
  • Avoids model weaknesses
  • Appropriate complexity level

Testability

  • Produces consistent outputs
  • Format is parseable
  • Easy to validate results
  • Clear success/failure criteria

Advanced Optimization Techniques

Technique 1: Constraint Tuning

Concept: Find the right level of constraint specificity.

Too loose:

"Write good code"

Too tight:

"Write code with exactly 47 lines, 3 functions named specifically
foo, bar, baz, using only these 12 allowed keywords..."

Just right:

"Write TypeScript code:
- 3-5 small functions (<20 lines each)
- Descriptive names
- One responsibility per function
- Type-safe"

Process:

  1. Start loose
  2. Add constraints where outputs vary
  3. Stop when consistency achieved
  4. Don't over-constrain

Technique 2: Example Engineering

Concept: Carefully craft few-shot examples for maximum learning.

Principles:

  • Diversity: Cover different scenarios
  • Edge cases: Include tricky examples
  • Consistency: Same format for all
  • Clarity: Obvious pattern to learn

Bad examples:

Input: "Hello"
Output: "HELLO"

Input: "Hi"
Output: "HI"

Input: "Hey"
Output: "HEY"

(Too similar, no edge cases)

Good examples:

Input: "Hello World"
Output: "HELLO WORLD"

Input: "it's-a test_case"
Output: "IT'S-A TEST_CASE"

Input: "123abc"
Output: "123ABC"

Input: ""
Output: ""

(Diverse, includes spaces, punctuation, numbers, empty)
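
It also pays to sanity-check examples programmatically against the transform they are meant to teach; inconsistent examples silently teach the wrong pattern. A sketch using the uppercase examples above:

```typescript
// Validate few-shot examples against a reference implementation of the
// target transform before putting them in the prompt.
const examples: Array<{ input: string; output: string }> = [
  { input: "Hello World", output: "HELLO WORLD" },
  { input: "it's-a test_case", output: "IT'S-A TEST_CASE" },
  { input: "123abc", output: "123ABC" },
  { input: "", output: "" },
];

const reference = (s: string) => s.toUpperCase();

for (const { input, output } of examples) {
  if (reference(input) !== output) {
    throw new Error(`Inconsistent example: ${JSON.stringify(input)}`);
  }
}
```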

Technique 3: Format Scaffolding

Concept: Provide the exact structure you want filled in.

Basic request:

"Analyze this code"

With scaffolding:

"Analyze this code and fill in this template:

## Code Quality: [Good/Fair/Poor]

## Issues Found:
1. [Issue]: [Description]
   - Location: [Line X]
   - Fix: [Solution]

2. [Issue]: [Description]
   - Location: [Line Y]
   - Fix: [Solution]

## Strengths:
- [Strength 1]
- [Strength 2]

## Recommendation: [Approve/Request Changes]"

Result: Guaranteed format match.
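
A cheap client-side follow-up: scan the response for leftover bracketed placeholders, which indicate slots the model skipped. The regex below is a heuristic for the bracket convention used in this template:

```typescript
// Detect unfilled [placeholder] slots in a scaffolded response.
function unfilledSlots(output: string): string[] {
  return output.match(/\[[A-Za-z][^\]\n]*\]/g) ?? [];
}

const modelOutput =
  "## Code Quality: Good\n\n## Issues Found:\n1. [Issue]: off-by-one in loop";

const slots = unfilledSlots(modelOutput);
if (slots.length > 0) {
  console.warn("Scaffold slots left unfilled:", slots); // ["[Issue]"]
}
```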

Technique 4: Self-Correction Loop

Concept: Ask the model to validate and fix its own output.

Single-pass:

"Generate a JSON API response for user data"

With self-correction:

"Generate a JSON API response for user data.

After generating:
1. Validate JSON syntax
2. Check all required fields present
3. Verify data types correct
4. Ensure no sensitive data exposed

If validation fails, fix issues and regenerate."

Result: Higher quality, fewer errors.
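
The same loop can also be enforced client-side: validate the output yourself and feed the error back for a retry. A sketch assuming a hypothetical `runPrompt` client:

```typescript
// Client-side self-correction: parse the model's JSON and, on failure,
// include the parse error in a retry prompt.
declare function runPrompt(prompt: string): Promise<string>;

async function generateValidJson(prompt: string, maxRetries = 2): Promise<unknown> {
  let attempt = prompt;
  for (let i = 0; i <= maxRetries; i++) {
    const output = await runPrompt(attempt);
    try {
      return JSON.parse(output);
    } catch (err) {
      attempt =
        `${prompt}\n\nYour previous output was invalid JSON ` +
        `(${String(err)}). Fix the issues and return only valid JSON.`;
    }
  }
  throw new Error("No valid JSON after retries");
}
```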

Technique 5: Precision Through Negation

Concept: Specify what NOT to do as well as what to do.

Positive only:

"Write a professional email"

Positive + Negative:

"Write a professional email:

DO:
- Use formal salutation (Dear [Name])
- State purpose in first sentence
- Use clear, concise language
- Include specific next steps

DON'T:
- Use contractions (don't, won't, etc.)
- Include emojis or informal language
- Exceed 3 paragraphs
- End without clear call-to-action"

Result: Avoids common mistakes.


Common Optimization Patterns

Pattern: Vague → Specific

Before:

"Make this better"

After:

"Refactor to reduce complexity from 15 to <5, extract helper functions, add type safety"

Pattern: No Examples → Few-Shot

Before:

"Format these dates"

After:

"2025-01-15 → January 15, 2025
2024-12-01 → December 1, 2024
Now format: 2025-03-22"

Pattern: Single-Shot → Progressive

Before:

"Build entire app"

After:

"Stage 1: Design architecture [wait for approval]
Stage 2: Implement core [wait for approval]..."

Pattern: Generic → Model-Specific

Before:

"Review this code"

After (GPT-5):

"SYSTEM: Senior dev
TASK: Code review
CONSTRAINTS: ..."

After (Claude):

<instruction>Review code</instruction>
<code>...</code>

Pattern: Implicit → Explicit Format

Before:

"List the issues"

After:

"List issues as:
1. [Issue] - Line X - [Fix]
2. [Issue] - Line Y - [Fix]"


Quick Reference

Optimization Decision Tree

Is output inconsistent?
├─ Yes → Add format specification + constraints
└─ No → Continue

Is output incomplete?
├─ Yes → Check if task too complex → Use progressive disclosure
└─ No → Continue

Is output inaccurate?
├─ Yes → Add examples (few-shot) or reasoning (CoT)
└─ No → Continue

Is output low quality?
├─ Yes → Add quality criteria + validation step
└─ No → Prompt is good!
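
The same tree expressed as code, checking symptoms in order (the fix wording follows this guide):

```typescript
// Decision tree as a function: return the first matching fix.
interface Symptoms {
  inconsistent: boolean;
  incomplete: boolean;
  inaccurate: boolean;
  lowQuality: boolean;
}

function nextFix(s: Symptoms): string {
  if (s.inconsistent) return "Add format specification + constraints";
  if (s.incomplete) return "Task too complex: use progressive disclosure";
  if (s.inaccurate) return "Add examples (few-shot) or reasoning (CoT)";
  if (s.lowQuality) return "Add quality criteria + validation step";
  return "Prompt is good!";
}
```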

Quick Fixes

| Symptom | Quick Fix |
|---|---|
| Varying formats | Add exact format template |
| Wrong answers | Add "think step-by-step" |
| Inconsistent results | Add specific constraints |
| Missing edge cases | Add explicit edge case handling |
| Too verbose | Specify length limit |
| Too brief | Request "detailed" or "comprehensive" |
| Wrong assumptions | Provide complete context |
| Partial completion | Break into stages |

Remember: Optimization is iterative. Start simple, measure results, identify failures, apply targeted fixes, and repeat until quality is sufficient. Don't over-engineer prompts that already work well enough.