Initial commit

Zhongwei Li
2025-11-30 08:25:12 +08:00
commit 7a35a34caa
30 changed files with 8396 additions and 0 deletions

references/audio-guide.md Normal file

@@ -0,0 +1,205 @@
# Audio Guide (Whisper & TTS)
**Last Updated**: 2025-10-25
Complete guide to OpenAI's Audio API for transcription and text-to-speech.
---
## Whisper Transcription
### Supported Formats
- flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm
### Best Practices
**Audio Quality**:
- Use clear audio with minimal background noise
- 16 kHz or higher sample rate recommended
- Mono or stereo both supported
**File Size**:
- Max file size: 25 MB
- For larger files: split into chunks or compress (see the ffmpeg sketch below)
**Languages**:
- Whisper automatically detects language
- Supports 50+ languages
- Best results with English, Spanish, French, German, Chinese
**Limitations**:
- May struggle with heavy accents
- Background noise reduces accuracy
- Very quiet audio may fail
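For files over the 25 MB cap, one workable approach is to segment the audio with ffmpeg and transcribe each piece. A minimal sketch, assuming ffmpeg is installed and a local `./audio.mp3` (both illustrative):
```typescript
import { execSync } from 'child_process';
import fs from 'fs';
import OpenAI from 'openai';

const openai = new OpenAI();

// Split into ~10-minute segments: chunk-000.mp3, chunk-001.mp3, ...
execSync('ffmpeg -i ./audio.mp3 -f segment -segment_time 600 -c copy chunk-%03d.mp3');

let transcript = '';
for (const file of fs.readdirSync('.').filter(f => f.startsWith('chunk-')).sort()) {
  const result = await openai.audio.transcriptions.create({
    file: fs.createReadStream(file),
    model: 'whisper-1',
  });
  transcript += result.text + '\n';
}
fs.writeFileSync('./transcript.txt', transcript);
```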
---
## Text-to-Speech (TTS)
### Model Selection
| Model | Quality | Latency | Features | Best For |
|-------|---------|---------|----------|----------|
| tts-1 | Standard | Lowest | Basic TTS | Real-time streaming |
| tts-1-hd | High | Medium | Better fidelity | Offline audio, podcasts |
| gpt-4o-mini-tts | Best | Medium | Voice instructions, streaming | Maximum control |
### Voice Selection Guide
| Voice | Character | Best For |
|-------|-----------|----------|
| alloy | Neutral, balanced | General use, professional |
| ash | Clear, professional | Business, presentations |
| ballad | Warm, storytelling | Narration, audiobooks |
| coral | Soft, friendly | Customer service, greetings |
| echo | Calm, measured | Meditation, calm content |
| fable | Expressive, narrative | Stories, entertainment |
| onyx | Deep, authoritative | News, serious content |
| nova | Bright, energetic | Marketing, enthusiastic content |
| sage | Wise, thoughtful | Educational, informative |
| shimmer | Gentle, soothing | Relaxation, sleep content |
| verse | Poetic, rhythmic | Poetry, artistic content |
### Voice Instructions (gpt-4o-mini-tts only)
```typescript
// Professional tone
{
model: 'gpt-4o-mini-tts',
voice: 'ash',
input: 'Welcome to our service',
instructions: 'Speak in a calm, professional, and friendly tone suitable for customer service.',
}
// Energetic marketing
{
model: 'gpt-4o-mini-tts',
voice: 'nova',
input: 'Don\'t miss this sale!',
instructions: 'Use an enthusiastic, energetic tone perfect for marketing and advertisements.',
}
// Meditation guidance
{
model: 'gpt-4o-mini-tts',
voice: 'shimmer',
input: 'Take a deep breath',
instructions: 'Adopt a calm, soothing voice suitable for meditation and relaxation guidance.',
}
```
### Speed Control
```typescript
// Slow (0.5x)
{ speed: 0.5 } // Good for: Learning, accessibility
// Normal (1.0x)
{ speed: 1.0 } // Default
// Fast (1.5x)
{ speed: 1.5 } // Good for: Previews, time-saving
// Very fast (2.0x)
{ speed: 2.0 } // Good for: Quick previews only
```
Range: 0.25 to 4.0
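A complete request using `speed` (assumes the `openai` and `fs` handles from the other examples); values outside 0.25-4.0 are rejected by the API:
```typescript
const audio = await openai.audio.speech.create({
  model: 'tts-1',
  voice: 'alloy',
  input: 'This sentence plays back at one and a half times normal speed.',
  speed: 1.5,
});
fs.writeFileSync('./fast.mp3', Buffer.from(await audio.arrayBuffer()));
```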
### Audio Format Selection
| Format | Compression | Quality | Best For |
|--------|-------------|---------|----------|
| mp3 | Lossy | Good | Maximum compatibility |
| opus | Lossy | Excellent | Web streaming, low bandwidth |
| aac | Lossy | Good | iOS, Apple devices |
| flac | Lossless | Best | Archiving, editing |
| wav | Uncompressed | Best | Editing, processing |
| pcm | Raw | Best | Low-level processing |
---
## Common Patterns
### 1. Transcribe Interview
```typescript
const transcription = await openai.audio.transcriptions.create({
file: fs.createReadStream('./interview.mp3'),
model: 'whisper-1',
});
// Save transcript
fs.writeFileSync('./interview.txt', transcription.text);
```
### 2. Generate Podcast Narration
```typescript
const script = "Welcome to today's podcast...";
const audio = await openai.audio.speech.create({
model: 'tts-1-hd',
voice: 'fable',
input: script,
response_format: 'mp3',
});
const buffer = Buffer.from(await audio.arrayBuffer());
fs.writeFileSync('./podcast.mp3', buffer);
```
### 3. Multi-Voice Conversation
```typescript
// Speaker 1
const speaker1 = await openai.audio.speech.create({
model: 'tts-1',
voice: 'onyx',
input: 'Hello, how are you?',
});
// Speaker 2
const speaker2 = await openai.audio.speech.create({
model: 'tts-1',
voice: 'nova',
input: 'I\'m doing great, thanks!',
});
// Combine audio files (requires audio processing library)
```
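As a rough sketch of that last step, same-format MP3 clips can be joined by naive byte concatenation, which most players tolerate; for gapless or mixed-format output, use ffmpeg or a proper audio library:
```typescript
const part1 = Buffer.from(await speaker1.arrayBuffer());
const part2 = Buffer.from(await speaker2.arrayBuffer());
// Naive concatenation: adequate for simple playback, not for precise editing
fs.writeFileSync('./conversation.mp3', Buffer.concat([part1, part2]));
```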
---
## Cost Optimization
1. **Use tts-1 for real-time** (cheaper, faster)
2. **Use tts-1-hd for final production** (better quality)
3. **Cache generated audio** (reuse output for repeated inputs instead of regenerating)
4. **Choose appropriate format** (opus for web, mp3 for compatibility)
5. **Batch transcriptions** with delays to avoid rate limits
---
## Common Issues
### Transcription Accuracy
- Improve audio quality
- Reduce background noise
- Ensure adequate volume levels
- Use supported audio formats
### TTS Naturalness
- Test different voices
- Use voice instructions (gpt-4o-mini-tts)
- Adjust speed for better pacing
- Add punctuation for natural pauses
### File Size
- Compress audio before transcribing
- Choose lossy formats (mp3, opus) for TTS
- Use appropriate bitrates
---
**See Also**: Official Audio Guide (https://platform.openai.com/docs/guides/speech-to-text)


@@ -0,0 +1,278 @@
# Cost Optimization Guide
**Last Updated**: 2025-10-25
Strategies to minimize OpenAI API costs while maintaining quality.
---
## Model Selection Strategies
### 1. Model Cascading
Start with cheaper models, escalate only when needed:
```typescript
async function smartCompletion(prompt: string) {
// Try gpt-5-nano first
const nanoResult = await openai.chat.completions.create({
model: 'gpt-5-nano',
messages: [{ role: 'user', content: prompt }],
});
// Validate quality
if (isGoodEnough(nanoResult)) {
return nanoResult;
}
// Escalate to gpt-5-mini
const miniResult = await openai.chat.completions.create({
model: 'gpt-5-mini',
messages: [{ role: 'user', content: prompt }],
});
if (isGoodEnough(miniResult)) {
return miniResult;
}
// Final escalation to gpt-5
return await openai.chat.completions.create({
model: 'gpt-5',
messages: [{ role: 'user', content: prompt }],
});
}
```
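`isGoodEnough` is left to the caller. One hedged placeholder is a finish-reason and length check; real gates might use a validator model or task-specific rules:
```typescript
// Placeholder quality gate (illustrative only)
function isGoodEnough(result: any): boolean {
  const choice = result.choices[0];
  return choice.finish_reason === 'stop' && (choice.message.content?.length ?? 0) > 20;
}
```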
### 2. Task-Based Model Selection
| Task | Model | Why |
|------|-------|-----|
| Simple chat | gpt-5-nano | Fast, cheap, sufficient |
| Summarization | gpt-5-mini | Good quality, cost-effective |
| Code generation | gpt-5 | Best reasoning, worth the cost |
| Data extraction | gpt-4o + structured output | Reliable, accurate |
| Vision tasks | gpt-4o | Only model with vision |
---
## Token Optimization
### 1. Limit max_tokens
```typescript
// ❌ No limit: May generate unnecessarily long responses
{
model: 'gpt-5',
messages,
}
// ✅ Set reasonable limit
{
model: 'gpt-5',
messages,
max_tokens: 500, // Prevent runaway generation
}
```
### 2. Trim Conversation History
```typescript
function trimHistory(messages: Message[], maxRecent: number = 10) {
  // Keep the system message plus the most recent non-system turns
  const system = messages.find(m => m.role === 'system');
  const recent = messages.filter(m => m.role !== 'system').slice(-maxRecent);
  return [system, ...recent].filter(Boolean);
}
```
### 3. Use Shorter Prompts
```typescript
// ❌ Verbose
"Please analyze the following text and provide a detailed summary of the main points, including any key takeaways and important details..."
// ✅ Concise
"Summarize key points:"
```
---
## Caching Strategies
### 1. Cache Embeddings
```typescript
const embeddingCache = new Map<string, number[]>();
async function getCachedEmbedding(text: string) {
if (embeddingCache.has(text)) {
return embeddingCache.get(text)!;
}
const response = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: text,
});
const embedding = response.data[0].embedding;
embeddingCache.set(text, embedding);
return embedding;
}
```
### 2. Cache Common Completions
```typescript
const completionCache = new Map<string, string>();
const model = 'gpt-5-mini';
async function getCachedCompletion(prompt: string) {
  const cacheKey = `${model}:${prompt}`;
  if (completionCache.has(cacheKey)) {
    return completionCache.get(cacheKey)!;
  }
  const result = await openai.chat.completions.create({
    model,
    messages: [{ role: 'user', content: prompt }],
  });
const content = result.choices[0].message.content;
completionCache.set(cacheKey, content!);
return content;
}
```
---
## Batch Processing
### 1. Use Embeddings Batch API
```typescript
// ❌ Individual requests (expensive)
for (const doc of documents) {
await openai.embeddings.create({
model: 'text-embedding-3-small',
input: doc,
});
}
// ✅ Batch request (cheaper)
const response = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: documents, // Array of up to 2048 documents
});
```
### 2. Group Similar Requests
```typescript
// Process non-urgent requests in batches during off-peak hours
const batchQueue: string[] = [];
function queueForBatch(prompt: string) {
batchQueue.push(prompt);
if (batchQueue.length >= 10) {
processBatch();
}
}
async function processBatch() {
  // Send the queued prompts in parallel (still one API request per prompt)
const results = await Promise.all(
batchQueue.map(prompt =>
openai.chat.completions.create({
model: 'gpt-5-nano',
messages: [{ role: 'user', content: prompt }],
})
)
);
batchQueue.length = 0;
return results;
}
```
---
## Feature-Specific Optimization
### Embeddings
1. **Use custom dimensions**: 256 instead of 1536 = 6x storage reduction
2. **Use text-embedding-3-small**: Cheaper than large, good for most use cases
3. **Batch requests**: Up to 2048 documents per request
### Images
1. **Use standard quality**: Unless HD is critical
2. **Use smaller sizes**: Generate 1024x1024 instead of 1792x1024 when possible
3. **Use natural style**: Cheaper than vivid
### Audio
1. **Use tts-1 for real-time**: Cheaper than tts-1-hd
2. **Use opus format**: Smaller files, good quality
3. **Cache generated audio**: Reuse output for repeated inputs instead of regenerating
---
## Monitoring and Alerts
```typescript
interface CostTracker {
totalTokens: number;
totalCost: number;
requestCount: number;
}
const tracker: CostTracker = {
totalTokens: 0,
totalCost: 0,
requestCount: 0,
};
async function trackCosts(fn: () => Promise<any>) {
const result = await fn();
if (result.usage) {
tracker.totalTokens += result.usage.total_tokens;
tracker.requestCount++;
// Estimate cost (adjust rates based on actual pricing)
const cost = estimateCost(result.model, result.usage.total_tokens);
tracker.totalCost += cost;
// Alert if threshold exceeded
if (tracker.totalCost > 100) {
console.warn('Cost threshold exceeded!', tracker);
}
}
return result;
}
```
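`estimateCost` is not shown above; a sketch with placeholder per-million-token rates (substitute current prices from https://openai.com/pricing):
```typescript
// Hypothetical flat rates per 1M tokens; real pricing splits input/output tokens
const RATES_PER_1M: Record<string, number> = {
  'gpt-5': 10.0,
  'gpt-5-mini': 2.0,
  'gpt-5-nano': 0.4,
};

function estimateCost(model: string, totalTokens: number): number {
  const rate = RATES_PER_1M[model] ?? 10.0; // default to the most expensive tier
  return (totalTokens / 1_000_000) * rate;
}
```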
---
## Cost Reduction Checklist
- [ ] Use cheapest model that meets requirements
- [ ] Set max_tokens limits
- [ ] Trim conversation history
- [ ] Cache embeddings and common queries
- [ ] Batch requests when possible
- [ ] Use custom embedding dimensions (256-512)
- [ ] Monitor token usage
- [ ] Implement rate limiting
- [ ] Use structured outputs to avoid retries
- [ ] Compress prompts (remove unnecessary words)
---
**Estimated Savings**: Following these practices can reduce costs by 40-70% while maintaining quality.


@@ -0,0 +1,187 @@
# Embeddings Guide
**Last Updated**: 2025-10-25
Complete guide to OpenAI's Embeddings API for semantic search, RAG, and clustering.
---
## Model Comparison
| Model | Default Dimensions | Custom Dimensions | Best For |
|-------|-------------------|-------------------|----------|
| text-embedding-3-large | 3072 | 256-3072 | Highest quality semantic search |
| text-embedding-3-small | 1536 | 256-1536 | Most applications, cost-effective |
| text-embedding-ada-002 | 1536 | Fixed | Legacy (use v3 models) |
---
## Dimension Selection
### Full Dimensions
- **text-embedding-3-small**: 1536 (default)
- **text-embedding-3-large**: 3072 (default)
- Use for maximum accuracy
### Reduced Dimensions
- **256 dims**: 6-12x storage reduction vs. the defaults, minimal quality loss
- **512 dims**: 3-6x storage reduction, good quality
- Use for cost/storage optimization
```typescript
// Full dimensions (1536)
const full = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: 'Sample text',
});
// Reduced dimensions (256)
const reduced = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: 'Sample text',
dimensions: 256,
});
```
---
## RAG (Retrieval-Augmented Generation) Pattern
### 1. Build Knowledge Base
```typescript
const documents = [
'TypeScript is a superset of JavaScript',
'Python is a high-level programming language',
'React is a JavaScript library for UIs',
];
const embeddings = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: documents,
});
const knowledgeBase = documents.map((text, i) => ({
text,
embedding: embeddings.data[i].embedding,
}));
```
### 2. Query with Similarity Search
```typescript
// Embed user query
const queryEmbedding = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: 'What is TypeScript?',
});
// Find similar documents
const similarities = knowledgeBase.map(doc => ({
text: doc.text,
similarity: cosineSimilarity(queryEmbedding.data[0].embedding, doc.embedding),
}));
similarities.sort((a, b) => b.similarity - a.similarity);
const topResults = similarities.slice(0, 3);
```
### 3. Generate Answer with Context
```typescript
const context = topResults.map(r => r.text).join('\n\n');
const completion = await openai.chat.completions.create({
model: 'gpt-5',
messages: [
{ role: 'system', content: `Answer using this context:\n\n${context}` },
{ role: 'user', content: 'What is TypeScript?' },
],
});
```
---
## Similarity Metrics
### Cosine Similarity (Recommended)
```typescript
function cosineSimilarity(a: number[], b: number[]): number {
const dotProduct = a.reduce((sum, val, i) => sum + val * b[i], 0);
const magnitudeA = Math.sqrt(a.reduce((sum, val) => sum + val * val, 0));
const magnitudeB = Math.sqrt(b.reduce((sum, val) => sum + val * val, 0));
return dotProduct / (magnitudeA * magnitudeB);
}
```
### Euclidean Distance
```typescript
function euclideanDistance(a: number[], b: number[]): number {
return Math.sqrt(
a.reduce((sum, val, i) => sum + Math.pow(val - b[i], 2), 0)
);
}
```
---
## Batch Processing
```typescript
// Process up to 2048 documents
const embeddings = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: documents, // Array of strings
});
embeddings.data.forEach((item, index) => {
console.log(`Doc ${index}: ${item.embedding.length} dimensions`);
});
```
**Limits**:
- Max tokens per input: 8192
- Max total tokens summed across all inputs: 300,000
- Max inputs per request (array length): 2048
---
## Best Practices
**Pre-processing**:
- Normalize text (lowercase, remove special chars)
- Be consistent across queries and documents
- Chunk long documents (max 8192 tokens)
**Storage**:
- Use custom dimensions (256-512) for storage optimization
- Store embeddings in vector databases (Pinecone, Weaviate, Qdrant)
- Cache embeddings (deterministic for same input)
**Search**:
- Use cosine similarity for comparison
- Normalize embeddings before storing (L2 normalization; see the sketch after this list)
- Pre-filter with metadata before similarity search
**Don't**:
- Mix models (incompatible dimensions)
- Exceed token limits (8192 per input)
- Skip normalization
- Use raw embeddings without similarity metric
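A minimal sketch of the L2 normalization step mentioned above; once vectors are unit length, a plain dot product ranks results identically to cosine similarity:
```typescript
// Scale a vector to unit length (L2 norm = 1)
function l2Normalize(v: number[]): number[] {
  const norm = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return norm === 0 ? v : v.map(x => x / norm);
}
```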
---
## Use Cases
1. **Semantic Search**: Find similar documents
2. **RAG**: Retrieve context for generation
3. **Clustering**: Group similar content
4. **Recommendations**: Content-based recommendations
5. **Anomaly Detection**: Detect outliers
6. **Duplicate Detection**: Find similar/duplicate content
---
**See Also**: Official Embeddings Guide (https://platform.openai.com/docs/guides/embeddings)


@@ -0,0 +1,189 @@
# Function Calling Patterns
**Last Updated**: 2025-10-25
Advanced patterns for implementing function calling (tool calling) with OpenAI's Chat Completions API.
---
## Basic Pattern
```typescript
const tools = [
{
type: 'function',
function: {
name: 'get_weather',
description: 'Get current weather for a location',
parameters: {
type: 'object',
properties: {
location: { type: 'string', description: 'City name' },
unit: { type: 'string', enum: ['celsius', 'fahrenheit'] },
},
required: ['location'],
},
},
},
];
```
---
## Advanced Patterns
### 1. Parallel Tool Calls
The model can call multiple tools simultaneously:
```typescript
const completion = await openai.chat.completions.create({
model: 'gpt-5',
messages: [
{ role: 'user', content: 'What is the weather in SF and NYC?' }
],
tools: tools,
});
// Model may return multiple tool_calls (undefined when it answers directly)
const toolCalls = completion.choices[0].message.tool_calls ?? [];
// Execute all in parallel
const results = await Promise.all(
toolCalls.map(call => executeFunction(call.function.name, call.function.arguments))
);
```
### 2. Dynamic Tool Generation
Generate tools based on runtime context:
```typescript
function generateTools(database: Database) {
const tables = database.getTables();
return tables.map(table => ({
type: 'function',
function: {
name: `query_${table.name}`,
description: `Query the ${table.name} table`,
parameters: {
type: 'object',
properties: table.columns.reduce((acc, col) => ({
...acc,
[col.name]: { type: col.type, description: col.description },
}), {}),
},
},
}));
}
```
### 3. Tool Chaining
Chain tool results:
```typescript
async function chatWithToolChaining(userMessage: string, maxTurns = 10) {
  const messages: any[] = [{ role: 'user', content: userMessage }];
  // Bounded loop instead of while(true); see "limit recursion depth" under Best Practices
  for (let turn = 0; turn < maxTurns; turn++) {
    const completion = await openai.chat.completions.create({
      model: 'gpt-5',
      messages,
      tools,
    });
    const message = completion.choices[0].message;
    messages.push(message);
    if (!message.tool_calls) {
      return message.content; // Final answer
    }
    // Execute tool calls and add results
    for (const toolCall of message.tool_calls) {
      const result = await executeFunction(
        toolCall.function.name,
        toolCall.function.arguments
      );
      messages.push({
        role: 'tool',
        tool_call_id: toolCall.id,
        content: JSON.stringify(result),
      });
    }
  }
  throw new Error(`No final answer after ${maxTurns} tool-calling turns`);
}
```
### 4. Error Handling in Tools
```typescript
async function executeFunction(name: string, argsString: string) {
try {
const args = JSON.parse(argsString);
switch (name) {
case 'get_weather':
return await getWeather(args.location, args.unit);
default:
return { error: `Unknown function: ${name}` };
}
} catch (error: any) {
return { error: error.message };
}
}
```
### 5. Streaming with Tools
```typescript
const stream = await openai.chat.completions.create({
model: 'gpt-5',
messages,
tools,
stream: true,
});
for await (const chunk of stream) {
const delta = chunk.choices[0]?.delta;
// Check for tool calls in streaming
if (delta?.tool_calls) {
// Accumulate tool call data
console.log('Tool call chunk:', delta.tool_calls);
}
}
```
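Tool-call fragments arrive keyed by `index` and must be stitched together before their JSON arguments can be parsed. A sketch of an accumulator, replacing the loop above (a stream can only be iterated once):
```typescript
// Accumulate streamed tool-call fragments into complete calls
const calls: { id?: string; name?: string; args: string }[] = [];

for await (const chunk of stream) {
  for (const tc of chunk.choices[0]?.delta?.tool_calls ?? []) {
    const slot = (calls[tc.index] ??= { args: '' });
    if (tc.id) slot.id = tc.id;
    if (tc.function?.name) slot.name = tc.function.name;
    if (tc.function?.arguments) slot.args += tc.function.arguments;
  }
}
// Only after the stream ends is each slot.args a complete JSON string
const parsed = calls.map(c => ({ name: c.name, args: JSON.parse(c.args) }));
```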
---
## Best Practices
**Schema Design**:
- Provide clear descriptions for each parameter
- Use enum when options are limited
- Mark required vs optional parameters
**Error Handling**:
- Return structured error objects
- Don't throw exceptions from tool functions
- Let the model handle error recovery
**Performance**:
- Execute independent tool calls in parallel
- Cache tool results when appropriate
- Limit recursion depth to avoid infinite loops
**Don't**:
- Expose sensitive internal functions
- Allow unlimited recursion
- Skip parameter validation
- Return unstructured error messages
---
**See Also**: Official Function Calling Guide (https://platform.openai.com/docs/guides/function-calling)

references/images-guide.md Normal file

@@ -0,0 +1,153 @@
# Images Guide (DALL-E 3 & GPT-Image-1)
**Last Updated**: 2025-10-25
Best practices for image generation and editing with OpenAI's Images API.
---
## DALL-E 3 Generation
### Size Selection
| Size | Supported By | Use Case |
|------|--------------|----------|
| 1024x1024 | DALL-E 3, GPT-Image-1 | Profile pictures, icons, square posts |
| 1024x1536 | GPT-Image-1 | Portrait photos, vertical ads |
| 1536x1024 | GPT-Image-1 | Landscape photos, banners |
| 1024x1792 | DALL-E 3 | Tall portraits, mobile wallpapers |
| 1792x1024 | DALL-E 3 | Wide banners, desktop wallpapers |
### Quality Settings
**standard**: Normal quality, faster, cheaper
- Use for: Prototyping, high-volume generation, quick iterations
**hd**: High definition, finer details, more expensive
- Use for: Final production images, marketing materials, print
### Style Options
**vivid**: Hyper-real, dramatic, high-contrast
- Use for: Marketing, advertising, eye-catching visuals
**natural**: More realistic, less dramatic
- Use for: Product photos, realistic scenes, professional content
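A minimal generation call tying these options together; the prompt is illustrative, and DALL-E 3 accepts only `n: 1` per request:
```typescript
const result = await openai.images.generate({
  model: 'dall-e-3',
  prompt: 'Professional product photo of a ceramic mug, studio lighting',
  size: '1024x1024',
  quality: 'hd',
  style: 'natural',
  n: 1,
});

console.log(result.data[0].url);            // URL expires after ~1 hour
console.log(result.data[0].revised_prompt); // DALL-E 3 may rewrite your prompt
```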
---
## Prompting Best Practices
### Be Specific
```
❌ "A cat"
✅ "A white siamese cat with striking blue eyes, sitting on a wooden table, golden hour lighting, professional photography"
```
### Include Art Style
```
✅ "Oil painting of a sunset in the style of Claude Monet"
✅ "3D render of a futuristic city, Pixar animation style"
✅ "Professional product photo with studio lighting"
```
### Specify Lighting
```
- "Golden hour lighting"
- "Soft studio lighting from the left"
- "Dramatic shadows"
- "Bright natural daylight"
```
### Composition Details
```
- "Shallow depth of field"
- "Wide angle lens"
- "Centered composition"
- "Rule of thirds"
```
---
## GPT-Image-1 Editing
### Input Fidelity
**low**: More creative freedom
- Use for: Major transformations, style changes
**medium**: Balance (default)
- Use for: Most editing tasks
**high**: Stay close to original
- Use for: Subtle edits, preserving details
### Common Editing Tasks
1. **Background Removal**
```typescript
formData.append('prompt', 'Remove the background, keep only the product');
formData.append('format', 'png');
formData.append('background', 'transparent');
```
2. **Color Correction**
```typescript
formData.append('prompt', 'Increase brightness and saturation, make colors more vibrant');
```
3. **Object Removal**
```typescript
formData.append('prompt', 'Remove the person from the background');
```
4. **Compositing**
```typescript
formData.append('image', mainImage);
formData.append('image_2', logoImage);
formData.append('prompt', 'Add the logo to the product, as if stamped on the surface');
```
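The snippets above assume a raw multipart request; with the Node SDK, `images.edit` handles the form encoding. A sketch, assuming a local `./product.png`:
```typescript
import fs from 'fs';
import OpenAI from 'openai';

const openai = new OpenAI();

const edited = await openai.images.edit({
  model: 'gpt-image-1',
  image: fs.createReadStream('./product.png'),
  prompt: 'Remove the background, keep only the product',
});

// gpt-image-1 returns base64 image data rather than a URL
fs.writeFileSync('./edited.png', Buffer.from(edited.data[0].b64_json!, 'base64'));
```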
---
## Format Selection
| Format | Transparency | Compression | Best For |
|--------|--------------|-------------|----------|
| PNG | Yes | Lossless | Logos, transparency needed |
| JPEG | No | Lossy | Photos, smaller file size |
| WebP | Yes | Lossy | Web, best compression |
---
## Cost Optimization
1. Use standard quality unless HD is critical
2. Generate smaller sizes when possible
3. Cache generated images
4. Use natural style for most cases (vivid costs more)
5. Batch requests with delays to avoid rate limits
---
## Common Issues
### Prompt Revision
DALL-E 3 may revise prompts for safety/quality. Check `revised_prompt` in response.
### URL Expiration
Image URLs expire in 1 hour. Download and save if needed long-term.
### Non-Deterministic
Same prompt = different images. Cache results if consistency needed.
### Rate Limits
DALL-E has separate IPM (Images Per Minute) limits. Monitor and implement delays.
---
**See Also**: Official Images Guide (https://platform.openai.com/docs/guides/images)

references/models-guide.md Normal file

@@ -0,0 +1,311 @@
# OpenAI Models Guide
**Last Updated**: 2025-10-25
This guide provides a comprehensive comparison of OpenAI's language models to help you choose the right model for your use case.
---
## GPT-5 Series (Released August 2025)
### gpt-5
**Status**: Latest flagship model
**Best for**: Complex reasoning, advanced problem-solving, code generation
**Key Features**:
- Advanced reasoning capabilities
- Unique parameters: `reasoning_effort`, `verbosity`
- Best-in-class performance on complex tasks
**Limitations**:
- ❌ No `temperature` support
- ❌ No `top_p` support
- ❌ No `logprobs` support
- ❌ CoT (Chain of Thought) does NOT persist between turns
**When to use**:
- Complex mathematical problems
- Advanced code generation
- Logic puzzles and reasoning tasks
- Multi-step problem solving
**Cost**: Highest pricing tier
---
### gpt-5-mini
**Status**: Cost-effective GPT-5 variant
**Best for**: Balanced performance and cost
**Key Features**:
- Same parameter support as gpt-5 (`reasoning_effort`, `verbosity`)
- Better than GPT-4 Turbo performance
- Significantly cheaper than gpt-5
**When to use**:
- Most production applications
- When you need GPT-5 features but not maximum performance
- High-volume use cases where cost matters
**Cost**: Mid-tier pricing
---
### gpt-5-nano
**Status**: Smallest GPT-5 variant
**Best for**: Simple tasks, high-volume processing
**Key Features**:
- Fastest response times
- Lowest cost in GPT-5 series
- Still supports GPT-5 unique parameters
**When to use**:
- Simple text generation
- High-volume batch processing
- Real-time streaming applications
- Cost-sensitive deployments
**Cost**: Low-tier pricing
---
## GPT-4o Series
### gpt-4o
**Status**: Multimodal flagship (pre-GPT-5)
**Best for**: Vision tasks, multimodal applications
**Key Features**:
- ✅ Vision support (image understanding)
- ✅ Temperature control
- ✅ Top-p sampling
- ✅ Function calling
- ✅ Structured outputs
**Limitations**:
- ❌ No `reasoning_effort` parameter
- ❌ No `verbosity` parameter
**When to use**:
- Image understanding and analysis
- OCR / text extraction from images
- Visual question answering
- When you need temperature/top_p control
- Multimodal applications
**Cost**: High-tier pricing (cheaper than gpt-5)
---
### gpt-4-turbo
**Status**: Fast GPT-4 variant
**Best for**: When you need GPT-4 speed
**Key Features**:
- Faster than base GPT-4
- Full parameter support (temperature, top_p, logprobs)
- Good balance of quality and speed
**When to use**:
- When GPT-4 quality is needed with faster responses
- Legacy applications requiring specific parameters
- When vision is not required
**Cost**: Mid-tier pricing
---
## Comparison Table
| Feature | GPT-5 | GPT-5-mini | GPT-5-nano | GPT-4o | GPT-4 Turbo |
|---------|-------|------------|------------|--------|-------------|
| **Reasoning** | Best | Excellent | Good | Excellent | Excellent |
| **Speed** | Medium | Medium | Fastest | Medium | Fast |
| **Cost** | Highest | Mid | Lowest | High | Mid |
| **reasoning_effort** | ✅ | ✅ | ✅ | ❌ | ❌ |
| **verbosity** | ✅ | ✅ | ✅ | ❌ | ❌ |
| **temperature** | ❌ | ❌ | ❌ | ✅ | ✅ |
| **top_p** | ❌ | ❌ | ❌ | ✅ | ✅ |
| **Vision** | ❌ | ❌ | ❌ | ✅ | ❌ |
| **Function calling** | ✅ | ✅ | ✅ | ✅ | ✅ |
| **Structured outputs** | ✅ | ✅ | ✅ | ✅ | ✅ |
| **Max output tokens** | 16,384 | 16,384 | 16,384 | 16,384 | 16,384 |
---
## Selection Guide
### Use GPT-5 when:
- ✅ You need the best reasoning performance
- ✅ Complex mathematical or logical problems
- ✅ Advanced code generation
- ✅ Multi-step problem solving
- ❌ Cost is not the primary concern
### Use GPT-5-mini when:
- ✅ You want GPT-5 features at lower cost
- ✅ Production applications with high volume
- ✅ Good reasoning performance is needed
- ✅ Balance of quality and cost matters
### Use GPT-5-nano when:
- ✅ Simple text generation tasks
- ✅ High-volume batch processing
- ✅ Real-time streaming applications
- ✅ Cost optimization is critical
- ❌ Complex reasoning is not required
### Use GPT-4o when:
- ✅ Vision / image understanding is required
- ✅ You need temperature/top_p control
- ✅ Multimodal applications
- ✅ OCR and visual analysis
- ❌ Pure text tasks (use GPT-5 series)
### Use GPT-4 Turbo when:
- ✅ Legacy application compatibility
- ✅ You need specific parameters not in GPT-5
- ✅ Fast responses without vision
- ❌ Not recommended for new applications (use GPT-5 or GPT-4o)
---
## Cost Optimization Strategies
### 1. Model Cascading
Start with cheaper models and escalate only when needed:
```
gpt-5-nano (try first) → gpt-5-mini → gpt-5 (if needed)
```
### 2. Task-Specific Model Selection
- **Simple**: Use gpt-5-nano
- **Medium complexity**: Use gpt-5-mini
- **Complex reasoning**: Use gpt-5
- **Vision tasks**: Use gpt-4o
### 3. Hybrid Approach
- Use embeddings (cheap) for retrieval
- Use gpt-5-mini for generation
- Use gpt-5 only for critical decisions
### 4. Batch Processing
- Use cheaper models for bulk operations
- Reserve expensive models for user-facing requests
---
## Parameter Guide
### GPT-5 Unique Parameters
**reasoning_effort**: Controls reasoning depth
- "minimal": Quick responses
- "low": Basic reasoning
- "medium": Balanced (default)
- "high": Deep reasoning for complex problems
**verbosity**: Controls output length
- "low": Concise responses
- "medium": Balanced detail (default)
- "high": Verbose, detailed responses
### GPT-4o/GPT-4 Turbo Parameters
**temperature**: Controls randomness (0-2)
- 0: Deterministic, focused
- 1: Balanced creativity (default)
- 2: Maximum creativity
**top_p**: Nucleus sampling (0-1)
- Lower values: More focused
- Higher values: More diverse
**logprobs**: Get token probabilities
- Useful for debugging and analysis
---
## Common Patterns
### Pattern 1: Automatic Model Selection
```typescript
function selectModel(taskComplexity: 'simple' | 'medium' | 'complex') {
switch (taskComplexity) {
case 'simple':
return 'gpt-5-nano';
case 'medium':
return 'gpt-5-mini';
case 'complex':
return 'gpt-5';
}
}
```
### Pattern 2: Fallback Chain
```typescript
async function completionWithFallback(prompt: string) {
const models = ['gpt-5-nano', 'gpt-5-mini', 'gpt-5'];
for (const model of models) {
try {
const result = await openai.chat.completions.create({
model,
messages: [{ role: 'user', content: prompt }],
});
// Validate quality
if (isGoodEnough(result)) {
return result;
}
} catch (error) {
continue;
}
}
throw new Error('All models failed');
}
```
### Pattern 3: Vision + Text Hybrid
```typescript
// Use gpt-4o for image analysis
const imageAnalysis = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [
{
role: 'user',
content: [
{ type: 'text', text: 'Describe this image' },
{ type: 'image_url', image_url: { url: imageUrl } },
],
},
],
});
// Use gpt-5 for reasoning based on analysis
const reasoning = await openai.chat.completions.create({
model: 'gpt-5',
messages: [
{ role: 'system', content: `Image analysis: ${imageAnalysis.choices[0].message.content}` },
{ role: 'user', content: 'What does this imply about...' },
],
});
```
---
## Official Documentation
- **GPT-5 Guide**: https://platform.openai.com/docs/guides/latest-model
- **Model Pricing**: https://openai.com/pricing
- **Model Comparison**: https://platform.openai.com/docs/models
---
**Summary**: Choose the right model based on your specific needs. GPT-5 series for reasoning, GPT-4o for vision, and optimize costs by selecting the smallest model that meets your requirements.


@@ -0,0 +1,220 @@
# Structured Output Guide
**Last Updated**: 2025-10-25
Best practices for using JSON schemas with OpenAI's structured outputs feature.
---
## When to Use Structured Outputs
Use structured outputs when you need:
- **Guaranteed JSON format**: Response will always be valid JSON
- **Schema validation**: Enforce specific structure
- **Type safety**: Parse directly into TypeScript types
- **Data extraction**: Pull specific fields from text
- **Classification**: Map to predefined categories
---
## Schema Best Practices
### 1. Keep Schemas Simple
```typescript
// ✅ Good: Simple, focused schema
{
type: 'object',
properties: {
name: { type: 'string' },
age: { type: 'number' },
},
required: ['name', 'age'],
additionalProperties: false,
}
// ❌ Avoid: Overly complex nested structures
// (they work but are harder to debug)
```
### 2. Use Enums for Fixed Options
```typescript
{
type: 'object',
properties: {
category: {
type: 'string',
enum: ['bug', 'feature', 'question'],
},
priority: {
type: 'string',
enum: ['low', 'medium', 'high', 'critical'],
},
},
  required: ['category', 'priority'],
  additionalProperties: false,
}
```
### 3. Always Use `strict: true`
```typescript
response_format: {
type: 'json_schema',
json_schema: {
name: 'response_schema',
strict: true, // ✅ Enforces exact compliance
schema: { /* ... */ },
},
}
```
### 4. Set `additionalProperties: false`
```typescript
{
type: 'object',
properties: { /* ... */ },
required: [ /* ... */ ],
additionalProperties: false, // ✅ Prevents unexpected fields
}
```
---
## Common Use Cases
### Data Extraction
```typescript
const schema = {
  type: 'object',
  properties: {
    person: { type: 'string' },
    company: { type: ['string', 'null'] },
    email: { type: ['string', 'null'] },
    phone: { type: ['string', 'null'] },
  },
  // strict mode requires every key in `required`; model optional fields as nullable
  required: ['person', 'company', 'email', 'phone'],
  additionalProperties: false,
};
// Extract from unstructured text
const completion = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [
{ role: 'system', content: 'Extract contact information' },
{ role: 'user', content: 'John works at TechCorp, email: john@tech.com' },
],
response_format: { type: 'json_schema', json_schema: { name: 'contact', strict: true, schema } },
});
const contact = JSON.parse(completion.choices[0].message.content);
// { person: "John", company: "TechCorp", email: "john@tech.com", phone: null }
```
### Classification
```typescript
const schema = {
type: 'object',
properties: {
sentiment: { type: 'string', enum: ['positive', 'negative', 'neutral'] },
confidence: { type: 'number' },
topics: { type: 'array', items: { type: 'string' } },
},
required: ['sentiment', 'confidence', 'topics'],
additionalProperties: false,
};
// Classify text
const completion = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [
{ role: 'system', content: 'Classify the text' },
{ role: 'user', content: 'This product is amazing!' },
],
response_format: { type: 'json_schema', json_schema: { name: 'classification', strict: true, schema } },
});
const result = JSON.parse(completion.choices[0].message.content);
// { sentiment: "positive", confidence: 0.95, topics: ["product", "satisfaction"] }
```
---
## TypeScript Integration
### Type-Safe Parsing
```typescript
interface PersonProfile {
name: string;
age: number;
skills: string[];
}
const schema = {
type: 'object',
properties: {
name: { type: 'string' },
age: { type: 'number' },
skills: { type: 'array', items: { type: 'string' } },
},
required: ['name', 'age', 'skills'],
additionalProperties: false,
};
const completion = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [{ role: 'user', content: 'Generate a person profile' }],
response_format: { type: 'json_schema', json_schema: { name: 'person', strict: true, schema } },
});
const person: PersonProfile = JSON.parse(completion.choices[0].message.content);
// TypeScript knows the shape!
```
---
## Error Handling
```typescript
try {
const completion = await openai.chat.completions.create({
model: 'gpt-4o',
messages,
response_format: { type: 'json_schema', json_schema: { name: 'data', strict: true, schema } },
});
const data = JSON.parse(completion.choices[0].message.content);
return data;
} catch (error: any) {
if (error.message.includes('JSON')) {
console.error('Failed to parse JSON (should not happen with strict mode)');
}
throw error;
}
```
---
## Validation
While `strict: true` ensures the response matches the schema, you may want additional validation:
```typescript
import { z } from 'zod';
const zodSchema = z.object({
email: z.string().email(),
age: z.number().min(0).max(120),
});
const data = JSON.parse(completion.choices[0].message.content);
const validated = zodSchema.parse(data); // Throws if invalid
```
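If you are already writing zod schemas, the openai-node SDK (v4.x, an assumption about your SDK version) can derive the JSON schema and parse the response in one step via its zod helper:
```typescript
import OpenAI from 'openai';
import { z } from 'zod';
import { zodResponseFormat } from 'openai/helpers/zod';

const openai = new OpenAI();

const Contact = z.object({
  email: z.string(),
  age: z.number(),
});

const completion = await openai.beta.chat.completions.parse({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Extract: jane@example.com, age 34' }],
  response_format: zodResponseFormat(Contact, 'contact'),
});

const contact = completion.choices[0].message.parsed; // typed from the zod schema, or null on refusal
```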
---
**See Also**: Official Structured Outputs Guide (https://platform.openai.com/docs/guides/structured-outputs)

references/top-errors.md Normal file

@@ -0,0 +1,453 @@
# Top OpenAI API Errors & Solutions
**Last Updated**: 2025-10-25
**Skill**: openai-api
**Status**: Phase 1 Complete
---
## Overview
This document covers the 10 most common errors encountered when using OpenAI APIs, with causes, solutions, and code examples.
---
## 1. Rate Limit Error (429)
### Cause
Too many requests or tokens per minute/day.
### Error Response
```json
{
"error": {
"message": "Rate limit reached",
"type": "rate_limit_error",
"code": "rate_limit_exceeded"
}
}
```
### Solution
Implement exponential backoff:
```typescript
async function completionWithRetry(params: any, maxRetries = 3) {
for (let i = 0; i < maxRetries; i++) {
try {
return await openai.chat.completions.create(params);
} catch (error: any) {
if (error.status === 429 && i < maxRetries - 1) {
const delay = Math.pow(2, i) * 1000; // 1s, 2s, 4s
console.log(`Rate limited. Retrying in ${delay}ms...`);
await new Promise(resolve => setTimeout(resolve, delay));
continue;
}
throw error;
}
}
}
```
---
## 2. Invalid API Key (401)
### Cause
Missing or incorrect `OPENAI_API_KEY`.
### Error Response
```json
{
"error": {
"message": "Incorrect API key provided",
"type": "invalid_request_error",
"code": "invalid_api_key"
}
}
```
### Solution
Verify environment variable:
```bash
# Check if set
echo $OPENAI_API_KEY
# Set in .env
OPENAI_API_KEY=sk-...
```
```typescript
if (!process.env.OPENAI_API_KEY) {
throw new Error('OPENAI_API_KEY environment variable is required');
}
const openai = new OpenAI({
apiKey: process.env.OPENAI_API_KEY,
});
```
---
## 3. Function Calling Schema Mismatch
### Cause
Tool definition doesn't match model expectations or arguments are invalid.
### Error Response
```json
{
"error": {
"message": "Invalid schema for function 'get_weather'",
"type": "invalid_request_error"
}
}
```
### Solution
Validate JSON schema:
```typescript
const tools = [
{
type: 'function',
function: {
name: 'get_weather',
description: 'Get weather for a location', // Required
parameters: { // Required
type: 'object',
properties: {
location: {
type: 'string',
description: 'City name' // Add descriptions
}
},
required: ['location'] // Specify required fields
}
}
}
];
```
---
## 4. Streaming Parse Error
### Cause
Incomplete or malformed SSE (Server-Sent Events) chunks.
### Symptom
```
SyntaxError: Unexpected end of JSON input
```
### Solution
Properly handle SSE format:
```typescript
const lines = chunk.split('\n').filter(line => line.trim() !== '');
for (const line of lines) {
if (line.startsWith('data: ')) {
const data = line.slice(6);
if (data === '[DONE]') {
break;
}
try {
const json = JSON.parse(data);
const content = json.choices[0]?.delta?.content || '';
console.log(content);
} catch (e) {
// Skip invalid JSON - don't crash
console.warn('Skipping invalid JSON chunk');
}
}
}
```
---
## 5. Vision Image Encoding Error
### Cause
Invalid base64 encoding or unsupported image format.
### Error Response
```json
{
"error": {
"message": "Invalid image format",
"type": "invalid_request_error"
}
}
```
### Solution
Ensure proper base64 encoding:
```typescript
import fs from 'fs';
// Read and encode image
const imageBuffer = fs.readFileSync('./image.jpg');
const base64Image = imageBuffer.toString('base64');
// Use with correct MIME type
const completion = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [
{
role: 'user',
content: [
{ type: 'text', text: 'What is in this image?' },
{
type: 'image_url',
image_url: {
url: `data:image/jpeg;base64,${base64Image}` // Include MIME type
}
}
]
}
]
});
```
---
## 6. Token Limit Exceeded
### Cause
Input + output tokens exceed model's context window.
### Error Response
```json
{
"error": {
"message": "This model's maximum context length is 128000 tokens",
"type": "invalid_request_error",
"code": "context_length_exceeded"
}
}
```
### Solution
Truncate input or reduce max_tokens:
```typescript
function truncateMessages(messages, maxTokens = 120000) {
  // Rough estimate: 1 token ≈ 4 characters
  const maxChars = maxTokens * 4;
  let totalChars = 0;
  const truncated = [];
  for (const msg of [...messages].reverse()) { // copy first so the caller's array isn't mutated
const msgChars = msg.content.length;
if (totalChars + msgChars > maxChars) break;
truncated.unshift(msg);
totalChars += msgChars;
}
return truncated;
}
const completion = await openai.chat.completions.create({
model: 'gpt-5',
messages: truncateMessages(messages),
max_tokens: 8000, // Limit output tokens
});
```
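For exact counts instead of the 4-characters-per-token heuristic, a tokenizer library such as the `tiktoken` npm package (named here as an assumption; any tokenizer matching the model's encoding works) gives precise numbers:
```typescript
import { encoding_for_model } from 'tiktoken';

function countTokens(text: string): number {
  const enc = encoding_for_model('gpt-4o'); // pick the closest supported encoding
  const count = enc.encode(text).length;
  enc.free(); // the WASM-backed encoder must be released explicitly
  return count;
}
```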
---
## 7. GPT-5 Temperature Not Supported
### Cause
Using `temperature` parameter with GPT-5 models.
### Error Response
```json
{
"error": {
"message": "temperature is not supported for gpt-5",
"type": "invalid_request_error"
}
}
```
### Solution
Use `reasoning_effort` instead or switch to GPT-4o:
```typescript
// ❌ Bad - GPT-5 doesn't support temperature
const completion = await openai.chat.completions.create({
model: 'gpt-5',
messages: [...],
temperature: 0.7, // NOT SUPPORTED
});
// ✅ Good - Use reasoning_effort for GPT-5
const completion = await openai.chat.completions.create({
model: 'gpt-5',
messages: [...],
reasoning_effort: 'medium',
});
// ✅ Or use GPT-4o if you need temperature
const completion = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [...],
temperature: 0.7,
});
```
---
## 8. Streaming Not Closed Properly
### Cause
Stream not properly terminated, causing resource leaks.
### Symptom
Memory leaks, hanging connections.
### Solution
Always close streams:
```typescript
const stream = await openai.chat.completions.create({
model: 'gpt-5',
messages: [...],
stream: true,
});
try {
for await (const chunk of stream) {
const content = chunk.choices[0]?.delta?.content || '';
process.stdout.write(content);
}
} finally {
// Stream is automatically closed when iteration completes
// But handle errors explicitly
}
// For fetch-based streaming:
const reader = response.body?.getReader();
try {
while (true) {
const { done, value } = await reader!.read();
if (done) break;
// Process chunk
}
} finally {
reader!.releaseLock(); // Important!
}
```
---
## 9. API Key Exposure in Client-Side Code
### Cause
Including API key in frontend JavaScript.
### Risk
API key visible to all users, can be stolen and abused.
### Solution
Use server-side proxy:
```typescript
// ❌ Bad - Client-side (NEVER DO THIS)
const apiKey = 'sk-...'; // Exposed to all users!
const response = await fetch('https://api.openai.com/v1/chat/completions', {
headers: { 'Authorization': `Bearer ${apiKey}` }
});
// ✅ Good - Server-side proxy
// Frontend:
const response = await fetch('/api/chat', {
method: 'POST',
body: JSON.stringify({ message: 'Hello' }),
});
// Backend (e.g., Express):
app.post('/api/chat', async (req, res) => {
const completion = await openai.chat.completions.create({
model: 'gpt-5',
messages: [{ role: 'user', content: req.body.message }],
});
res.json(completion);
});
```
---
## 10. Embeddings Dimension Mismatch
### Cause
Using wrong dimensions for embedding model.
### Error Response
```json
{
"error": {
"message": "dimensions must be less than or equal to 3072 for text-embedding-3-large",
"type": "invalid_request_error"
}
}
```
### Solution
Use correct dimensions for each model:
```typescript
// text-embedding-3-small: default 1536, max 1536
const embedding1 = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: 'Hello world',
// dimensions: 256, // Optional: reduce from default 1536
});
// text-embedding-3-large: default 3072, max 3072
const embedding2 = await openai.embeddings.create({
model: 'text-embedding-3-large',
input: 'Hello world',
// dimensions: 1024, // Optional: reduce from default 3072
});
// text-embedding-ada-002: fixed 1536 (no dimensions parameter)
const embedding3 = await openai.embeddings.create({
model: 'text-embedding-ada-002',
input: 'Hello world',
// No dimensions parameter supported
});
```
---
## Quick Reference Table
| Error Code | HTTP Status | Primary Cause | Quick Fix |
|------------|-------------|---------------|-----------|
| `rate_limit_exceeded` | 429 | Too many requests | Exponential backoff |
| `invalid_api_key` | 401 | Wrong/missing key | Check OPENAI_API_KEY |
| `invalid_request_error` | 400 | Bad parameters | Validate schema/params |
| `context_length_exceeded` | 400 | Too many tokens | Truncate input |
| `model_not_found` | 404 | Invalid model name | Use correct model ID |
| `insufficient_quota` | 429 | No credits left | Add billing/credits |
---
## Additional Resources
- **Official Error Codes**: https://platform.openai.com/docs/guides/error-codes
- **Rate Limits Guide**: https://platform.openai.com/docs/guides/rate-limits
- **Best Practices**: https://platform.openai.com/docs/guides/production-best-practices
---
**Phase 1 Complete**
**Phase 2**: Additional errors for Embeddings, Images, Audio, Moderation (next session)