---
name: openai-responses
description: |
  Build agentic AI applications with OpenAI's Responses API - the stateful successor to Chat Completions. Preserves reasoning across turns for 5% better multi-turn performance and 40-80% improved cache utilization.

  Use when: building AI agents with persistent reasoning, integrating MCP servers for external tools, using built-in Code Interpreter/File Search/Web Search, managing stateful conversations, implementing background processing for long tasks, or migrating from Chat Completions to gain polymorphic outputs and server-side tools.
license: MIT
---

# OpenAI Responses API

**Status**: Production Ready
**Last Updated**: 2025-10-25
**API Launch**: March 2025
**Dependencies**: openai@5.19.1+ (Node.js) or fetch API (Cloudflare Workers)

---

## What Is the Responses API?

The Responses API (`/v1/responses`) is OpenAI's unified interface for building agentic applications, launched in March 2025. It fundamentally changes how you interact with OpenAI models by providing **stateful conversations** and a **structured loop for reasoning and acting**.

### Key Innovation: Preserved Reasoning State

Unlike Chat Completions where reasoning is discarded between turns, Responses **keeps the notebook open**. The model's step-by-step thought processes survive into the next turn, improving performance by approximately **5% on TAUBench** and enabling better multi-turn interactions.

### Why Use Responses Over Chat Completions?

| Feature | Chat Completions | Responses API | Benefit |
|---------|-----------------|---------------|---------|
| **State Management** | Manual (you track history) | Automatic (conversation IDs) | Simpler code, less error-prone |
| **Reasoning** | Dropped between turns | Preserved across turns | Better multi-turn performance |
| **Tools** | Client-side round trips | Server-side hosted | Lower latency, simpler code |
| **Output Format** | Single message | Polymorphic (messages, reasoning, tool calls) | Richer debugging, better UX |
| **Cache Utilization** | Baseline | 40-80% better | Lower costs, faster responses |
| **MCP Support** | Manual integration | Built-in | Easy external tool connections |

---

## Quick Start (5 Minutes)

### 1. Get API Key

```bash
# Sign up at https://platform.openai.com/
# Navigate to API Keys section
# Create new key and save securely
export OPENAI_API_KEY="sk-proj-..."
```

**Why this matters:**
- API key required for all requests
- Keep secure (never commit to git)
- Use environment variables

### 2. Install SDK (Node.js)

```bash
npm install openai
```

```typescript
import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

const response = await openai.responses.create({
  model: 'gpt-5',
  input: 'What are the 5 Ds of dodgeball?',
});

console.log(response.output_text);
```

**CRITICAL:**
- Always use server-side (never expose API key in client code)
- The examples here use `gpt-5`; other models such as `gpt-5-mini` or `gpt-4o` can be substituted
- `input` can be a string or an array of messages

### 3. Or Use Direct API (Cloudflare Workers)

```typescript
// No SDK needed - use fetch()
const response = await fetch('https://api.openai.com/v1/responses', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${env.OPENAI_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'gpt-5',
    input: 'Hello, world!',
  }),
});

const data = await response.json();

// `output_text` is an SDK convenience property; the raw JSON only carries
// the `output` array, so join the text parts of every message item
const text = data.output
  .filter((item) => item.type === 'message')
  .flatMap((item) => item.content)
  .filter((part) => part.type === 'output_text')
  .map((part) => part.text)
  .join('');
console.log(text);
```

**Why fetch?**
- No dependencies in edge environments
- Full control over request/response
- Works in Cloudflare Workers, Deno, Bun

---

## Responses vs Chat Completions: Complete Comparison

### When to Use Each

**Use Responses API when:**
- ✅ Building agentic applications (reasoning + actions)
- ✅ Need preserved reasoning state across turns
- ✅ Want built-in tools (Code Interpreter, File Search, Web Search)
- ✅ Using MCP servers for external integrations
- ✅ Implementing conversational AI with automatic state management
- ✅ Background processing for long-running tasks
- ✅ Need polymorphic outputs (messages, reasoning, tool calls)

**Use Chat Completions when:**
- ✅ Simple one-off text generation
- ✅ Fully stateless interactions (no conversation continuity needed)
- ✅ Legacy integrations (existing Chat Completions code)
- ✅ Very simple use cases without tools

### Architecture Differences

**Chat Completions Flow:**
```
User Input → Model → Single Message → Done
(Reasoning discarded, state lost)
```

**Responses API Flow:**
```
User Input → Model (preserved reasoning) → Polymorphic Outputs
               ↓ (server-side tools)
Tool Call → Tool Result → Model → Final Response
(Reasoning preserved, state maintained)
```

### Performance Benefits

**Cache Utilization:**
- Chat Completions: Baseline performance
- Responses API: **40-80% better cache utilization**
- Result: Lower latency + reduced costs

**Reasoning Performance:**
- Chat Completions: Reasoning dropped between turns
- Responses API: Reasoning preserved across turns
- Result: **5% better on TAUBench** (GPT-5 with Responses vs Chat Completions)

---

## Stateful Conversations

### Automatic State Management

The Responses API can automatically manage conversation state using **conversation IDs**.

#### Creating a Conversation

```typescript
// Create conversation with initial message
const conversation = await openai.conversations.create({
  metadata: { user_id: 'user_123' },
  items: [
    {
      type: 'message',
      role: 'user',
      content: 'Hello!',
    },
  ],
});

console.log(conversation.id); // "conv_abc123..."
```

#### Using Conversation ID

```typescript
// First turn
const response1 = await openai.responses.create({
  model: 'gpt-5',
  conversation: 'conv_abc123',
  input: 'What are the 5 Ds of dodgeball?',
});

console.log(response1.output_text);

// Second turn - model remembers previous context
const response2 = await openai.responses.create({
  model: 'gpt-5',
  conversation: 'conv_abc123',
  input: 'Tell me more about the first one',
});

console.log(response2.output_text);
// Model automatically knows "first one" refers to first D from previous turn
```

**Why this matters:**
- No manual history tracking required
- Reasoning state preserved between turns
- Automatic context management
- Lower risk of context errors

### Manual State Management (Alternative)

If you need full control, you can manually manage history:

```typescript
let history = [
  { role: 'user', content: 'Tell me a joke' },
];

const response = await openai.responses.create({
  model: 'gpt-5',
  input: history,
  store: true, // Optional: store for retrieval later
});

// Add the assistant's messages to history. Reasoning and tool-call items
// have no role/content, so only message items are copied here.
history = [
  ...history,
  ...response.output
    .filter(el => el.type === 'message')
    .map(el => ({
      role: el.role,
      content: el.content,
    })),
];

// Next turn
history.push({ role: 'user', content: 'Tell me another' });

const secondResponse = await openai.responses.create({
  model: 'gpt-5',
  input: history,
});
```

**When to use manual management:**
- Need custom history pruning logic
- Want to modify conversation history programmatically
- Implementing custom caching strategies

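
Custom pruning is usually the first reason to manage history by hand. The following is a minimal sketch of one possible policy, not an SDK feature: `pruneHistory` and `HistoryMessage` are hypothetical names, and the code assumes history is a flat array of role/content messages like the one above. It keeps every `developer` message (moved to the front) plus the most recent `maxTurns` other messages, so for the sample data it keeps the developer message and the last two turns.

```typescript
// Hypothetical message shape mirroring the { role, content } objects used above.
type HistoryMessage = {
  role: 'developer' | 'user' | 'assistant';
  content: string;
};

// Keep all developer instructions plus the last `maxTurns` conversational messages.
function pruneHistory(history: HistoryMessage[], maxTurns: number): HistoryMessage[] {
  const instructions = history.filter(m => m.role === 'developer');
  const conversational = history.filter(m => m.role !== 'developer');
  // slice(-0) would return everything, so handle zero explicitly.
  const recent = maxTurns > 0 ? conversational.slice(-maxTurns) : [];
  return [...instructions, ...recent];
}

const exampleHistory: HistoryMessage[] = [
  { role: 'developer', content: 'You are a helpful assistant.' },
  { role: 'user', content: 'Tell me a joke' },
  { role: 'assistant', content: 'Why did the chicken cross the road?' },
  { role: 'user', content: 'Tell me another' },
];

const prunedHistory = pruneHistory(exampleHistory, 2);
console.log(prunedHistory.map(m => m.role).join(', '));
```
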

---

## Built-in Tools (Server-Side)

The Responses API includes **server-side hosted tools** that eliminate costly backend round trips.

### Available Tools

| Tool | Purpose | Use Case |
|------|---------|----------|
| **Code Interpreter** | Execute Python code | Data analysis, calculations, charts |
| **File Search** | Hosted retrieval over uploaded files | Search uploaded files for answers |
| **Web Search** | Real-time web information | Current events, fact-checking |
| **Image Generation** | Built-in image model | Create images from descriptions |
| **MCP** | Connect external tools | Stripe, databases, custom APIs |

### Code Interpreter

Execute Python code server-side for data analysis, calculations, and visualizations.

```typescript
const response = await openai.responses.create({
  model: 'gpt-5',
  input: 'Calculate the mean, median, and mode of: 10, 20, 30, 40, 50',
  // The tool needs a container; 'auto' lets the API create one
  tools: [{ type: 'code_interpreter', container: { type: 'auto' } }],
});

console.log(response.output_text);
// Model writes and executes Python code, returns results
```

**Advanced Example: Data Analysis**

```typescript
const response = await openai.responses.create({
  model: 'gpt-5',
  input: 'Analyze this sales data and create a bar chart showing monthly revenue: [data here]',
  tools: [{ type: 'code_interpreter', container: { type: 'auto' } }],
  include: ['code_interpreter_call.outputs'], // Ask for execution outputs
});

// Check output for code execution results
response.output.forEach(item => {
  if (item.type === 'code_interpreter_call') {
    console.log('Code executed:', item.code);
    console.log('Result:', item.outputs);
  }
});
```

**Why this matters:**
- No need to run Python locally
- Sandboxed execution environment
- Automatic chart generation
- Can process uploaded files

### File Search (Hosted RAG)

Search through uploaded files without building your own RAG pipeline. The tool queries an OpenAI-hosted vector store, so files must be added to one before they can be searched.

```typescript
import fs from 'node:fs';

// 1. Upload files first (one-time setup)
const file = await openai.files.create({
  file: fs.createReadStream('knowledge-base.pdf'),
  purpose: 'assistants',
});

// 2. Index the file in a vector store (one-time setup)
// Indexing is asynchronous; wait for it to finish before searching
const vectorStore = await openai.vectorStores.create({
  name: 'knowledge-base',
  file_ids: [file.id],
});

// 3. Use file search
const response = await openai.responses.create({
  model: 'gpt-5',
  input: 'What does the document say about pricing?',
  tools: [
    {
      type: 'file_search',
      vector_store_ids: [vectorStore.id],
    },
  ],
});

console.log(response.output_text);
// Model searches file and provides answer with citations
```

**Supported File Types:**
- PDFs, Word docs, text files
- Markdown, HTML
- Code files (Python, JavaScript, etc.)
- Max: 512MB per file

### Web Search

Get real-time information from the web.

```typescript
const response = await openai.responses.create({
  model: 'gpt-5',
  input: 'What are the latest updates on GPT-5?',
  tools: [{ type: 'web_search' }],
});

console.log(response.output_text);
// Model searches web and provides current information with sources
```

**Why this matters:**
- No cutoff date limitations
- Automatic source citations
- Real-time data access
- No need for external search APIs

### Image Generation

Generate images directly in the Responses API.

```typescript
const response = await openai.responses.create({
  model: 'gpt-5',
  input: 'Create an image of a futuristic cityscape at sunset',
  tools: [{ type: 'image_generation' }],
});

// Find image in output
response.output.forEach(item => {
  if (item.type === 'image_generation_call') {
    // `result` holds the image as a base64-encoded string, not a URL
    console.log('Image (base64) length:', item.result?.length);
  }
});
```

**Models Available:**
- `gpt-image-1` (the image model behind the tool)
- Various sizes and quality options

---

## MCP Server Integration

The Responses API has built-in support for **Model Context Protocol (MCP)** servers, allowing you to connect external tools.

### What Is MCP?

MCP is an open protocol that standardizes how applications provide context to LLMs. It allows you to:
- Connect to external APIs (Stripe, databases, CRMs)
- Use hosted MCP servers
- Build custom tool integrations

### Basic MCP Integration

```typescript
const response = await openai.responses.create({
  model: 'gpt-5',
  input: 'Roll 2d6 dice',
  tools: [
    {
      type: 'mcp',
      server_label: 'dice',
      server_url: 'https://example.com/mcp',
    },
  ],
});

// Model discovers available tools on MCP server and uses them
console.log(response.output_text);
```

### MCP with Authentication (OAuth)

```typescript
const response = await openai.responses.create({
  model: 'gpt-5',
  input: 'Create a $20 payment link',
  tools: [
    {
      type: 'mcp',
      server_label: 'stripe',
      server_url: 'https://mcp.stripe.com',
      authorization: process.env.STRIPE_OAUTH_TOKEN,
    },
  ],
});

console.log(response.output_text);
// Model uses Stripe MCP server to create payment link
```

**CRITICAL:**
- API does NOT store authorization tokens
- Must provide token with each request
- Use environment variables for security

### Polymorphic Output: MCP Tool Calls

```typescript
const response = await openai.responses.create({
  model: 'gpt-5',
  input: 'Roll 2d4+1',
  tools: [
    {
      type: 'mcp',
      server_label: 'dice',
      server_url: 'https://dmcp.example.com',
    },
  ],
});

// Inspect tool calls
response.output.forEach(item => {
  if (item.type === 'mcp_call') {
    console.log('Tool:', item.name);
    console.log('Arguments:', item.arguments);
    console.log('Output:', item.output);
  }
  if (item.type === 'mcp_list_tools') {
    console.log('Available tools:', item.tools);
  }
});
```

**Output Types:**
- `mcp_list_tools` - Tools discovered on server
- `mcp_call` - Tool invocation and result
- `message` - Final response to user

---

## Reasoning Preservation

### How It Works

The Responses API preserves the model's **internal reasoning state** across turns, unlike Chat Completions which discards it.

**Visual Analogy:**
- **Chat Completions**: Model has a scratchpad, writes reasoning, then **tears out the page** before responding
- **Responses API**: Model keeps the scratchpad open, **previous reasoning visible** for next turn

### Performance Impact

**TAUBench Results (GPT-5):**
- Chat Completions: Baseline score
- Responses API: **+5% better** (purely from preserved reasoning)

**Why This Matters:**
- Better multi-turn problem solving
- More coherent long conversations
- Improved step-by-step reasoning
- Fewer context errors

### Reasoning Summaries (Free!)

The Responses API provides **reasoning summaries** at no additional cost. Summaries must be requested with the `reasoning.summary` parameter; without it the `summary` array on reasoning items is empty.

```typescript
const response = await openai.responses.create({
  model: 'gpt-5',
  input: 'Solve this complex math problem: [problem]',
  reasoning: { summary: 'auto' }, // Request a reasoning summary
});

// Inspect reasoning
response.output.forEach(item => {
  if (item.type === 'reasoning') {
    // `summary` may hold several parts, or none
    console.log('Model reasoning:', item.summary.map(part => part.text).join('\n'));
  }
  if (item.type === 'message') {
    console.log('Final answer:', item.content[0].text);
  }
});
```

**Use Cases:**
- Debugging model decisions
- Audit trails for compliance
- Understanding model thought process
- Building transparent AI systems

---

## Background Mode (Long-Running Tasks)

For tasks that take longer than standard timeout limits, use **background mode**.

```typescript
const response = await openai.responses.create({
  model: 'gpt-5',
  input: 'Analyze this 500-page document and summarize key findings',
  background: true,
  // vectorStoreId: ID of a vector store that already contains the document
  tools: [{ type: 'file_search', vector_store_ids: [vectorStoreId] }],
});

// Returns immediately with status
console.log(response.status); // "queued" or "in_progress"
console.log(response.id); // Use to check status later

// Poll for completion
const checkStatus = async (responseId) => {
  const result = await openai.responses.retrieve(responseId);
  if (result.status === 'completed') {
    console.log(result.output_text);
  } else if (result.status === 'queued' || result.status === 'in_progress') {
    // Still running, check again later
    setTimeout(() => checkStatus(responseId), 5000);
  } else {
    // 'failed', 'cancelled', or 'incomplete': stop polling
    console.error(`Task ended with status ${result.status}:`, result.error);
  }
};

checkStatus(response.id);
```

**When to Use:**
- Large file processing
- Complex calculations
- Multi-step research tasks
- Data analysis on large datasets

**Timeout Limits:**
- Standard mode: 60 seconds
- Background mode: Up to 10 minutes

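
The polling loop above can also be written as a reusable helper. This is a sketch under stated assumptions rather than an SDK utility: `isTerminal`, `pollDelayMs`, and `pollUntilDone` are hypothetical names, the status list assumes a response is either `queued`, `in_progress`, or in one of four final states, and the `retrieve` callback is injected so the loop can be exercised without network access (in real code it would wrap `openai.responses.retrieve(id)`). The delay doubles from one second and is capped at ten seconds, and the loop gives up after a fixed number of polls.

```typescript
// Assumed set of response statuses; only the first two mean "still running".
type ResponseStatus =
  | 'queued'
  | 'in_progress'
  | 'completed'
  | 'failed'
  | 'cancelled'
  | 'incomplete';

// True once the response can no longer change state.
const isTerminal = (status: ResponseStatus): boolean =>
  status !== 'queued' && status !== 'in_progress';

// Delay before the next poll (attempt is 0-based): doubles from baseMs, capped at maxMs.
const pollDelayMs = (attempt: number, baseMs = 1000, maxMs = 10000): number =>
  Math.min(baseMs * 2 ** attempt, maxMs);

// Polls until a terminal status is seen or maxAttempts is exhausted.
async function pollUntilDone<T extends { status: ResponseStatus }>(
  retrieve: () => Promise<T>,
  maxAttempts = 60,
  sleep: (ms: number) => Promise<void> = ms =>
    new Promise<void>(resolve => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const result = await retrieve();
    if (isTerminal(result.status)) return result;
    await sleep(pollDelayMs(attempt));
  }
  throw new Error(`Response still running after ${maxAttempts} polls`);
}
```
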

---

## Polymorphic Outputs

The Responses API returns **multiple output types** instead of a single message.

### Output Types

| Type | Description | Example |
|------|-------------|---------|
| `message` | Text response to user | Final answer, explanation |
| `reasoning` | Model's internal thought process | Step-by-step reasoning summary |
| `code_interpreter_call` | Code execution | Python code + results |
| `mcp_call` | Tool invocation | Tool name, args, output |
| `mcp_list_tools` | Available tools | Tool definitions from MCP server |
| `file_search_call` | File search results | Matched chunks, citations |
| `web_search_call` | Web search invocation | Search status; sources appear as citations on the message |
| `image_generation_call` | Image generation | Base64-encoded image |

### Processing Polymorphic Outputs

```typescript
const response = await openai.responses.create({
  model: 'gpt-5',
  input: 'Search the web for the latest AI news and summarize',
  tools: [{ type: 'web_search' }],
});

// Process different output types
response.output.forEach(item => {
  switch (item.type) {
    case 'reasoning':
      // `summary` is empty unless a reasoning summary was requested
      console.log('Reasoning:', item.summary[0]?.text);
      break;
    case 'web_search_call':
      console.log('Search status:', item.status);
      break;
    case 'message':
      console.log('Response:', item.content[0].text);
      // Sources arrive as annotations on the text part
      console.log('Sources:', item.content[0].annotations);
      break;
  }
});

// Or use helper for text-only
console.log(response.output_text);
```

**Why This Matters:**
- Better debugging (see all steps)
- Audit trails (track all tool calls)
- Richer UX (show progress to users)
- Compliance (log all actions)

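
Because every output item carries a `type` field, the array maps naturally onto a TypeScript discriminated union. The sketch below uses a simplified, hypothetical `OutputItem` type covering only three of the item types from the table (the SDK's real types have more variants and fields), and `describeItem` is an illustrative name. The `never` assignment in the default branch makes the compiler flag any variant that is added to the union without a matching case, while at runtime an unknown item is reported rather than thrown on.

```typescript
// Simplified, hypothetical subset of the output item shapes described above.
type OutputItem =
  | { type: 'message'; content: { type: 'output_text'; text: string }[] }
  | { type: 'reasoning'; summary: { text: string }[] }
  | { type: 'mcp_call'; name: string; arguments: string; output: string | null };

// Turn one output item into a single log line.
function describeItem(item: OutputItem): string {
  switch (item.type) {
    case 'message':
      return `message: ${item.content.map(part => part.text).join('')}`;
    case 'reasoning':
      return `reasoning: ${item.summary.map(part => part.text).join(' ')}`;
    case 'mcp_call':
      return `mcp_call: ${item.name}(${item.arguments}) -> ${item.output ?? 'no output'}`;
    default: {
      // Compile-time exhaustiveness check for the union above.
      const unhandled: never = item;
      return `unhandled: ${JSON.stringify(unhandled)}`;
    }
  }
}

// Hand-written sample data, not a captured API response.
const sampleOutput: OutputItem[] = [
  { type: 'reasoning', summary: [{ text: 'Need to roll dice.' }] },
  { type: 'mcp_call', name: 'roll', arguments: '{"dice":"2d4+1"}', output: '6' },
  { type: 'message', content: [{ type: 'output_text', text: 'You rolled 6.' }] },
];

console.log(sampleOutput.map(describeItem).join('\n'));
```
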

---

## Migration from Chat Completions

### Breaking Changes

| Feature | Chat Completions | Responses API | Migration |
|---------|-----------------|---------------|-----------|
| **Endpoint** | `/v1/chat/completions` | `/v1/responses` | Update URL |
| **Parameter** | `messages` | `input` | Rename parameter |
| **State** | Manual (`messages` array) | Automatic (`conversation` ID) | Use conversation IDs |
| **Tools** | `tools` array with functions | Built-in types + MCP | Update tool definitions |
| **Output** | `choices[0].message.content` | `output_text` or `output` array | Update response parsing |
| **Streaming** | `data: {"choices":[...]}` | SSE with multiple item types | Update stream parser |

### Migration Example

**Before (Chat Completions):**
```typescript
const response = await openai.chat.completions.create({
  model: 'gpt-5',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Hello!' },
  ],
});

console.log(response.choices[0].message.content);
```

**After (Responses):**
```typescript
const response = await openai.responses.create({
  model: 'gpt-5',
  input: [
    { role: 'developer', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Hello!' },
  ],
});

console.log(response.output_text);
```

**Key Differences:**
1. `chat.completions.create` → `responses.create`
2. `messages` → `input`
3. `system` role → `developer` role
4. `choices[0].message.content` → `output_text`

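
For codebases with many call sites, differences 2 and 3 can be applied mechanically. The following is a minimal sketch, assuming plain string message content and no tools; `toResponsesParams`, `ChatMessage`, and `ResponsesMessage` are hypothetical names introduced here, not SDK exports.

```typescript
// Hypothetical, simplified message shapes for text-only conversations.
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string };
type ResponsesMessage = { role: 'developer' | 'user' | 'assistant'; content: string };

// Rename `messages` to `input` and map the `system` role to `developer`.
function toResponsesParams(chat: {
  model: string;
  messages: ChatMessage[];
}): { model: string; input: ResponsesMessage[] } {
  return {
    model: chat.model,
    input: chat.messages.map((m): ResponsesMessage => ({
      role: m.role === 'system' ? 'developer' : m.role,
      content: m.content,
    })),
  };
}

const migratedParams = toResponsesParams({
  model: 'gpt-5',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Hello!' },
  ],
});

console.log(JSON.stringify(migratedParams, null, 2));
```

The result can be passed to `openai.responses.create`. Requests that use function tools, images, or streaming need additional handling that this sketch does not attempt.
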
### When to Migrate

**Migrate now if:**
- ✅ Building new applications
- ✅ Need stateful conversations
- ✅ Using agentic patterns (reasoning + tools)
- ✅ Want better performance (preserved reasoning)

**Stay on Chat Completions if:**
- ✅ Simple one-off generations
- ✅ Legacy integrations
- ✅ No need for state management

---

## Error Handling

### Common Errors and Solutions

#### 1. Session State Not Persisting

**Error:**
```
Conversation state not maintained between turns
```

**Cause:**
- Not using conversation IDs
- Using different conversation IDs per turn

**Solution:**
```typescript
// Create conversation once
const conv = await openai.conversations.create();

// Reuse conversation ID for all turns
const response1 = await openai.responses.create({
  model: 'gpt-5',
  conversation: conv.id, // ✅ Same ID
  input: 'First message',
});

const response2 = await openai.responses.create({
  model: 'gpt-5',
  conversation: conv.id, // ✅ Same ID
  input: 'Follow-up message',
});
```

#### 2. MCP Server Connection Failed

**Error:**
```json
{
  "error": {
    "type": "mcp_connection_error",
    "message": "Failed to connect to MCP server"
  }
}
```

**Causes:**
- Invalid server URL
- Missing or expired authorization token
- Server not responding

**Solutions:**
```typescript
// 1. Verify URL is correct
const response = await openai.responses.create({
  model: 'gpt-5',
  input: 'Test MCP',
  tools: [
    {
      type: 'mcp',
      server_label: 'test',
      server_url: 'https://api.example.com/mcp', // ✅ Full URL
      authorization: process.env.AUTH_TOKEN, // ✅ Valid token
    },
  ],
});

// 2. Test server URL manually
const testResponse = await fetch('https://api.example.com/mcp');
console.log(testResponse.status); // Should be 200

// 3. Check token expiration
// parseJWT and token are placeholders for your own JWT-decoding helper and token
console.log('Token expires:', parseJWT(token).exp);
```

#### 3. Code Interpreter Timeout

**Error:**
```json
{
  "error": {
    "type": "code_interpreter_timeout",
    "message": "Code execution exceeded time limit"
  }
}
```

**Cause:**
- Code runs longer than 30 seconds

**Solution:**
```typescript
// Use background mode for long-running code
const response = await openai.responses.create({
  model: 'gpt-5',
  input: 'Process this large dataset',
  background: true, // ✅ Extended timeout
  tools: [{ type: 'code_interpreter', container: { type: 'auto' } }],
});

// Poll for results
const result = await openai.responses.retrieve(response.id);
```

#### 4. Image Generation Rate Limit

**Error:**
```json
{
  "error": {
    "type": "rate_limit_error",
    "message": "DALL-E rate limit exceeded"
  }
}
```

**Cause:**
- Too many image generation requests

**Solution:**
```typescript
// Implement retry with exponential backoff
const generateImage = async (prompt, retries = 3, delay = 1000) => {
  try {
    return await openai.responses.create({
      model: 'gpt-5',
      input: prompt,
      tools: [{ type: 'image_generation' }],
    });
  } catch (error) {
    // The SDK raises OpenAI.RateLimitError for HTTP 429
    if (error instanceof OpenAI.RateLimitError && retries > 0) {
      await new Promise(resolve => setTimeout(resolve, delay));
      return generateImage(prompt, retries - 1, delay * 2); // 1s, 2s, 4s
    }
    throw error;
  }
};
```

#### 5. File Search Relevance Issues

**Problem:**
- File search returns irrelevant results

**Solution:**
```typescript
// Use more specific queries
const response = await openai.responses.create({
  model: 'gpt-5',
  input: 'Find sections about pricing in Q4 2024 specifically', // ✅ Specific
  // NOT: 'Find pricing' (too vague)
  tools: [{ type: 'file_search', vector_store_ids: [vectorStoreId] }],
  include: ['file_search_call.results'], // Needed for `results` to be populated
});

// Or filter results manually
response.output.forEach(item => {
  if (item.type === 'file_search_call') {
    const relevantChunks = (item.results ?? []).filter(
      chunk => chunk.score > 0.7 // ✅ Only high-confidence matches
    );
  }
});
```

#### 6. Cost Tracking Confusion

**Problem:**
- Billing different than expected

**Explanation:**
- Responses API bills for: input tokens + output tokens + tool usage + stored conversations
- Chat Completions bills only: input tokens + output tokens

**Solution:**
```typescript
// Monitor usage
const response = await openai.responses.create({
  model: 'gpt-5',
  input: 'Hello',
  store: false, // ✅ Don't store if not needed
});

console.log('Usage:', response.usage);
// {
//   input_tokens: 10,
//   output_tokens: 20,
//   total_tokens: 30
// }
```

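
To turn the token counts into a rough spend figure, multiply them by per-token prices. This is a sketch only: `estimateCostUsd`, `Usage`, and `Pricing` are hypothetical names, the code assumes the usage object exposes `input_tokens` and `output_tokens`, and the prices in the example are placeholders rather than real rates. Tool usage and storage are billed separately and are not covered.

```typescript
// Token counts as reported on a response's usage object (assumed field names).
type Usage = { input_tokens: number; output_tokens: number };

// Prices in USD per one million tokens; look up current values before relying on them.
type Pricing = { inputPerMillion: number; outputPerMillion: number };

// Estimate the token cost of one response in USD.
function estimateCostUsd(usage: Usage, pricing: Pricing): number {
  const inputCost = usage.input_tokens * pricing.inputPerMillion;
  const outputCost = usage.output_tokens * pricing.outputPerMillion;
  return (inputCost + outputCost) / 1_000_000;
}

// Placeholder prices for illustration only, not real rates.
const placeholderPricing: Pricing = { inputPerMillion: 1, outputPerMillion: 10 };

const estimatedCost = estimateCostUsd(
  { input_tokens: 10, output_tokens: 20 },
  placeholderPricing,
);
console.log(`Estimated cost: $${estimatedCost.toFixed(6)}`);
```

Summing these estimates per user or per conversation gives an early warning when spend drifts from expectations.
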
#### 7. Conversation Not Found

**Error:**
```json
{
  "error": {
    "type": "invalid_request_error",
    "message": "Conversation conv_xyz not found"
  }
}
```

**Causes:**
- Conversation ID typo
- Conversation deleted
- Conversation expired (90 days)

**Solution:**
```typescript
// Verify conversation exists before using
let conversationId = 'conv_xyz';

try {
  await openai.conversations.retrieve(conversationId);
} catch (error) {
  if (error instanceof OpenAI.NotFoundError) {
    // Create new conversation
    const newConv = await openai.conversations.create();
    conversationId = newConv.id;
  } else {
    throw error;
  }
}
```

#### 8. Tool Output Parsing Failed

**Problem:**
- Can't access tool outputs correctly

**Solution:**
```typescript
// Use helper methods
const response = await openai.responses.create({
  model: 'gpt-5',
  input: 'Search for AI news',
  tools: [{ type: 'web_search' }],
});

// Helper: Get text-only output
console.log(response.output_text);

// Manual: Inspect all outputs
response.output.forEach(item => {
  console.log('Type:', item.type);
  console.log('Content:', item);
});
```

---

## Production Patterns

### Cost Optimization

**1. Use Conversation IDs (Cache Benefits)**
```typescript
// ✅ GOOD: Reuse conversation ID
const conv = await openai.conversations.create();
const response1 = await openai.responses.create({
  model: 'gpt-5',
  conversation: conv.id,
  input: 'Question 1',
});
// 40-80% better cache utilization

// ❌ BAD: New manual history each time
const response2 = await openai.responses.create({
  model: 'gpt-5',
  input: [...previousHistory, newMessage],
});
// No cache benefits
```

**2. Disable Storage When Not Needed**
```typescript
// For one-off requests
const response = await openai.responses.create({
  model: 'gpt-5',
  input: 'Quick question',
  store: false, // ✅ Don't store conversation
});
```

**3. Use Smaller Models When Possible**
```typescript
// For simple tasks
const response = await openai.responses.create({
  model: 'gpt-5-mini', // ✅ Lower cost per token than gpt-5
  input: 'Summarize this paragraph',
});
```

### Rate Limit Handling

```typescript
const createResponseWithRetry = async (params, maxRetries = 3) => {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await openai.responses.create(params);
    } catch (error) {
      // HTTP 429 signals a rate limit
      if (error.status === 429 && i < maxRetries - 1) {
        const delay = Math.pow(2, i) * 1000; // Exponential backoff
        console.log(`Rate limited, retrying in ${delay}ms`);
        await new Promise(resolve => setTimeout(resolve, delay));
      } else {
        throw error;
      }
    }
  }
};
```

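
When many clients hit a rate limit at the same moment, a fixed exponential schedule makes them all retry at the same moment too. A common refinement is to add random jitter to each delay. The sketch below shows the "full jitter" variant, where the delay is drawn uniformly between zero and the exponential ceiling; `backoffWithJitter` is a hypothetical helper name, and the random source is a parameter so the behaviour can be checked deterministically.

```typescript
// Full-jitter backoff: a random delay in [0, min(capMs, baseMs * 2^attempt)).
function backoffWithJitter(
  attempt: number,
  baseMs = 1000,
  capMs = 30000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// With a fixed "random" value of 0.5 the delays are half of each ceiling.
const fixedRandom = () => 0.5;
const jitterDelays = [0, 1, 2, 10].map(attempt => backoffWithJitter(attempt, 1000, 30000, fixedRandom));
console.log(jitterDelays);
```

In the retry helper above, the fixed `Math.pow(2, i) * 1000` delay could be replaced by a call such as `backoffWithJitter(i)`.
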
### Monitoring and Logging

```typescript
const monitoredResponse = async (input) => {
  const startTime = Date.now();

  try {
    const response = await openai.responses.create({
      model: 'gpt-5',
      input,
    });

    // Log success metrics
    console.log({
      status: 'success',
      latency: Date.now() - startTime,
      tokens: response.usage.total_tokens,
      model: response.model,
      conversation: response.conversation?.id, // Unset when no conversation is attached
    });

    return response;
  } catch (error) {
    // Log error metrics
    console.error({
      status: 'error',
      latency: Date.now() - startTime,
      error: error.message,
      type: error.type,
    });
    throw error;
  }
};
```

---

## Node.js vs Cloudflare Workers

### Node.js Implementation

```typescript
import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

export async function handleRequest(input: string) {
  const response = await openai.responses.create({
    model: 'gpt-5',
    input,
    tools: [{ type: 'web_search' }],
  });

  return response.output_text;
}
```

**Pros:**
- Full SDK support
- Type safety
- Streaming helpers

**Cons:**
- Requires Node.js runtime
- Larger bundle size

### Cloudflare Workers Implementation

```typescript
interface Env {
  OPENAI_API_KEY: string;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const { input } = (await request.json()) as { input: string };

    const response = await fetch('https://api.openai.com/v1/responses', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${env.OPENAI_API_KEY}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        model: 'gpt-5',
        input,
        tools: [{ type: 'web_search' }],
      }),
    });

    const data = (await response.json()) as any;

    // The raw JSON has no `output_text` (an SDK convenience),
    // so join the text parts of the message items manually
    const text = data.output
      .filter((item: any) => item.type === 'message')
      .flatMap((item: any) => item.content)
      .filter((part: any) => part.type === 'output_text')
      .map((part: any) => part.text)
      .join('');

    return new Response(text, {
      headers: { 'Content-Type': 'text/plain' },
    });
  },
};
```

**Pros:**
- No dependencies
- Edge deployment
- Faster cold starts

**Cons:**
- Manual request building
- No type safety without custom types

---

## Always Do / Never Do

### ✅ Always Do

1. **Use conversation IDs for multi-turn interactions**
   ```typescript
   const conv = await openai.conversations.create();
   // Reuse conv.id for all related turns
   ```

2. **Handle all output types in polymorphic responses**
   ```typescript
   response.output.forEach(item => {
     if (item.type === 'reasoning') { /* log */ }
     if (item.type === 'message') { /* display */ }
   });
   ```

3. **Use background mode for long-running tasks**
   ```typescript
   const response = await openai.responses.create({
     background: true, // ✅ For tasks >30s
     // ...other parameters
   });
   ```

4. **Provide authorization tokens for MCP servers**
   ```typescript
   tools: [{
     type: 'mcp',
     authorization: process.env.TOKEN, // ✅ Required by servers that need auth
   }]
   ```

5. **Monitor token usage for cost control**
   ```typescript
   console.log(response.usage.total_tokens);
   ```

### ❌ Never Do

1. **Never expose API keys in client-side code**
   ```typescript
   // ❌ DANGER: API key in browser
   const response = await fetch('https://api.openai.com/v1/responses', {
     headers: { 'Authorization': 'Bearer sk-proj-...' }
   });
   ```

2. **Never assume single message output**
   ```typescript
   // ❌ BAD: Ignores reasoning, tool calls
   console.log(response.output[0].content);

   // ✅ GOOD: Use helper or check all types
   console.log(response.output_text);
   ```

3. **Never reuse conversation IDs across users**
   ```typescript
   // ❌ DANGER: User A sees User B's conversation
   const sharedConv = 'conv_123';
   ```

4. **Never ignore error types**
   ```typescript
   // ❌ BAD: Generic error handling
   try { /* ... */ } catch (e) { console.log('error'); }

   // ✅ GOOD: Type-specific handling
   try { /* ... */ } catch (e) {
     if (e.type === 'rate_limit_error') { /* retry */ }
     if (e.type === 'mcp_connection_error') { /* alert */ }
   }
   ```

5. **Never poll faster than 1 second for background tasks**
   ```typescript
   // ❌ BAD: Too frequent
   setInterval(() => checkStatus(), 100);

   // ✅ GOOD: Reasonable interval
   setInterval(() => checkStatus(), 5000);
   ```

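
The type-specific handling in item 4 of the Never Do list can be centralised in one small function. The sketch below keys off the HTTP status code, which the SDK's API error objects expose as `status` (absent when no HTTP response was received); `classifyStatus` and `ErrorAction` are hypothetical names, and the mapping is one reasonable policy rather than an official recommendation.

```typescript
// What the caller should do next for a failed request.
type ErrorAction = 'retry' | 'reauthenticate' | 'fix_request' | 'fail';

// Map an HTTP status (undefined when no response arrived) to an action.
function classifyStatus(status: number | undefined): ErrorAction {
  if (status === undefined) return 'retry'; // Network failure, no HTTP response
  if (status === 429 || status >= 500) return 'retry'; // Rate limit or server error
  if (status === 401 || status === 403) return 'reauthenticate'; // Bad or missing credentials
  if (status >= 400) return 'fix_request'; // Other client errors will not succeed on retry
  return 'fail';
}

// Example: decide how to react to a few representative statuses.
for (const status of [429, 503, 401, 404, undefined]) {
  console.log(status, '->', classifyStatus(status));
}
```
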

---

## References

### Official Documentation
- **Responses API Guide**: https://platform.openai.com/docs/guides/responses
- **API Reference**: https://platform.openai.com/docs/api-reference/responses
- **MCP Integration**: https://platform.openai.com/docs/guides/tools-connectors-mcp
- **Blog Post (Why Responses API)**: https://developers.openai.com/blog/responses-api/
- **Starter App**: https://github.com/openai/openai-responses-starter-app

### Skill Resources
- `templates/` - Working code examples
- `references/responses-vs-chat-completions.md` - Feature comparison
- `references/mcp-integration-guide.md` - MCP server setup
- `references/built-in-tools-guide.md` - Tool usage patterns
- `references/stateful-conversations.md` - Conversation management
- `references/migration-guide.md` - Chat Completions → Responses
- `references/top-errors.md` - Common errors and solutions

---

## Next Steps

1. ✅ Read `templates/basic-response.ts` - Simple example
2. ✅ Try `templates/stateful-conversation.ts` - Multi-turn chat
3. ✅ Explore `templates/mcp-integration.ts` - External tools
4. ✅ Review `references/top-errors.md` - Avoid common pitfalls
5. ✅ Check `references/migration-guide.md` - If migrating from Chat Completions

**Happy building with the Responses API!** 🚀