# Responses API vs Chat Completions: Complete Comparison

**Last Updated**: 2025-10-25

This document compares the Responses API with the Chat Completions API to help you choose the right one for your use case.

---

## Quick Decision Guide

### ✅ Use Responses API When:

- Building **agentic applications** (reasoning + actions)
- Need **multi-turn conversations** with automatic state management
- Using **built-in tools** (Code Interpreter, File Search, Web Search, Image Gen)
- Connecting to **MCP servers** for external integrations
- Want **preserved reasoning** for better multi-turn performance
- Implementing **background processing** for long tasks
- Need **polymorphic outputs** for debugging/auditing

### ✅ Use Chat Completions When:

- Simple **one-off text generation**
- Fully **stateless** interactions (no conversation continuity needed)
- **Legacy integrations** with existing Chat Completions code
- Very **simple use cases** without tools

---

## Feature Comparison Matrix

| Feature | Chat Completions | Responses API | Winner |
|---------|-----------------|---------------|---------|
| **State Management** | Manual (you track history) | Automatic (conversation IDs) | Responses ✅ |
| **Reasoning Preservation** | Dropped between turns | Preserved across turns | Responses ✅ |
| **Tool Execution** | Client-side round trips | Server-side hosted | Responses ✅ |
| **Output Format** | Single message | Polymorphic (messages, reasoning, tool calls) | Responses ✅ |
| **Cache Utilization** | Baseline | 40-80% better | Responses ✅ |
| **MCP Support** | Manual integration required | Built-in | Responses ✅ |
| **Performance (GPT-5)** | Baseline | +5% on TAUBench | Responses ✅ |
| **Simplicity** | Simpler for one-offs | More features = more complexity | Chat Completions ✅ |
| **Legacy Compatibility** | Mature, stable | New (March 2025) | Chat Completions ✅ |

---

## API Comparison

### Endpoints

**Chat Completions:**
```
POST /v1/chat/completions
```

**Responses:**
```
POST /v1/responses
```

---

### Request Structure

**Chat Completions:**
```typescript
{
  model: 'gpt-5',
  messages: [
    { role: 'system', content: 'You are helpful.' },
    { role: 'user', content: 'Hello!' },
  ],
  temperature: 0.7,
  max_tokens: 1000,
}
```

**Responses:**
```typescript
{
  model: 'gpt-5',
  input: [
    { role: 'developer', content: 'You are helpful.' },
    { role: 'user', content: 'Hello!' },
  ],
  conversation: 'conv_abc123', // Optional: automatic state
  temperature: 0.7,
}
```

**Key Differences:**
- `messages` → `input`
- `system` role → `developer` role
- `max_tokens` is replaced by the optional `max_output_tokens`
- Optional `conversation` parameter for automatic state

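
The mapping listed above can be sketched as a small pure function. This is a minimal illustration, not part of any SDK: `toResponsesRequest`, `ChatRequest`, and `ResponsesRequest` are hypothetical names, the shapes cover only the fields shown in this section, and the code assumes `max_output_tokens` is the Responses-side counterpart of `max_tokens`.

```typescript
// Hypothetical, simplified request shapes; the official SDK types are richer.
type ChatRole = 'system' | 'developer' | 'user' | 'assistant';
type ResponsesRole = 'developer' | 'user' | 'assistant';

interface ChatRequest {
  model: string;
  messages: { role: ChatRole; content: string }[];
  temperature?: number;
  max_tokens?: number;
}

interface ResponsesInputMessage {
  role: ResponsesRole;
  content: string;
}

interface ResponsesRequest {
  model: string;
  input: ResponsesInputMessage[];
  conversation?: string;
  temperature?: number;
  max_output_tokens?: number;
}

// Convert a Chat Completions request body into a Responses request body.
function toResponsesRequest(chat: ChatRequest, conversationId?: string): ResponsesRequest {
  const request: ResponsesRequest = {
    model: chat.model,
    // `messages` becomes `input`; the `system` role becomes `developer`.
    input: chat.messages.map((m): ResponsesInputMessage => ({
      role: m.role === 'system' ? 'developer' : m.role,
      content: m.content,
    })),
  };
  if (chat.temperature !== undefined) request.temperature = chat.temperature;
  // Assumption: the token limit carries over as `max_output_tokens`.
  if (chat.max_tokens !== undefined) request.max_output_tokens = chat.max_tokens;
  if (conversationId !== undefined) request.conversation = conversationId;
  return request;
}
```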

---

### Response Structure

**Chat Completions:**
```typescript
{
  id: 'chatcmpl-123',
  object: 'chat.completion',
  created: 1677652288,
  model: 'gpt-5',
  choices: [
    {
      index: 0,
      message: {
        role: 'assistant',
        content: 'Hello! How can I help?',
      },
      finish_reason: 'stop',
    },
  ],
  usage: {
    prompt_tokens: 10,
    completion_tokens: 5,
    total_tokens: 15,
  },
}
```

**Responses:**
```typescript
{
  id: 'resp_123',
  object: 'response',
  created_at: 1677652288,
  model: 'gpt-5',
  output: [
    {
      type: 'reasoning',
      summary: [{ type: 'summary_text', text: 'User greeting, respond friendly' }],
    },
    {
      type: 'message',
      role: 'assistant',
      content: [{ type: 'output_text', text: 'Hello! How can I help?' }],
    },
  ],
  output_text: 'Hello! How can I help?', // Convenience helper exposed by the SDK
  usage: {
    input_tokens: 10,
    output_tokens: 5,
    total_tokens: 15,
  },
  conversation: { id: 'conv_abc123' }, // Present when using a conversation
}
```

**Key Differences:**
- Single `message` → Polymorphic `output` array
- `choices[0].message.content` → `output_text` helper
- Additional output item types: `reasoning`, tool calls (for example `function_call`), etc.
- `created` → `created_at`
- `usage.prompt_tokens` / `usage.completion_tokens` → `usage.input_tokens` / `usage.output_tokens`
- `conversation` object (with its `id`) included when using conversations

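
As a rough illustration of what the `output_text` helper saves you from writing, the sketch below walks the polymorphic `output` array by hand and joins every `output_text` part found in `message` items. `extractOutputText` and the two interfaces are hypothetical names modelled on the example response in this section, not SDK types, and the real helper may differ in detail.

```typescript
// Hypothetical, simplified item shapes based on the example response in this section.
interface ContentPart {
  type: string;
  text?: string;
}

interface OutputItem {
  type: string;
  role?: string;
  content?: ContentPart[];
}

// Collect the text of every `output_text` part inside `message` items,
// skipping reasoning items, tool calls, and any other item types.
function extractOutputText(output: OutputItem[]): string {
  const texts: string[] = [];
  for (const item of output) {
    if (item.type !== 'message' || !item.content) continue;
    for (const part of item.content) {
      if (part.type === 'output_text' && typeof part.text === 'string') {
        texts.push(part.text);
      }
    }
  }
  return texts.join('');
}
```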

---

## State Management Comparison

### Chat Completions (Manual)

```typescript
import OpenAI from 'openai';

const openai = new OpenAI(); // Reads OPENAI_API_KEY from the environment

// You track history manually
const messages: OpenAI.Chat.ChatCompletionMessageParam[] = [
  { role: 'system', content: 'You are helpful.' },
  { role: 'user', content: 'What is AI?' },
];

const response1 = await openai.chat.completions.create({
  model: 'gpt-5',
  messages,
});

// Add response to history
messages.push({
  role: 'assistant',
  content: response1.choices[0].message.content,
});

// Next turn
messages.push({ role: 'user', content: 'Tell me more' });

const response2 = await openai.chat.completions.create({
  model: 'gpt-5',
  messages, // ✅ You must pass full history
});
```

**Pros:**
- Full control over history
- Can prune old messages
- Simple for one-off requests

**Cons:**
- Manual tracking is error-prone
- The full history must be stored and resent on every turn
- Lower cache utilization than Responses

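
Pruning is the main lever manual history gives you. One possible policy, shown here only as a sketch, keeps every `system` message plus the most recent few turns. `pruneHistory` and `ChatMessage` are hypothetical names, and counting messages rather than tokens is a simplifying assumption.

```typescript
// Hypothetical, simplified message shape for illustration.
interface ChatMessage {
  role: 'system' | 'user' | 'assistant' | 'tool';
  content: string;
}

// Keep all system messages and only the last `maxRecent` non-system messages.
function pruneHistory(messages: ChatMessage[], maxRecent: number): ChatMessage[] {
  const system = messages.filter((m) => m.role === 'system');
  const rest = messages.filter((m) => m.role !== 'system');
  // slice(-0) would return the whole array, so handle zero explicitly.
  const recent = maxRecent > 0 ? rest.slice(-maxRecent) : [];
  return [...system, ...recent];
}
```

A count-based cut like this can separate a `tool` message from the assistant message that requested it, so a production version would likely need to cut on turn boundaries instead.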
### Responses (Automatic)

```typescript
// Create conversation once
const conv = await openai.conversations.create();

const response1 = await openai.responses.create({
  model: 'gpt-5',
  conversation: conv.id, // ✅ Automatic state
  input: 'What is AI?',
});

// Next turn - no manual history tracking
const response2 = await openai.responses.create({
  model: 'gpt-5',
  conversation: conv.id, // ✅ Remembers previous turn
  input: 'Tell me more',
});
```

**Pros:**
- Automatic state management
- No manual history tracking
- Better cache utilization (40-80% improvement)
- Reasoning preserved

**Cons:**
- Less direct control over what is kept in context
- Must create conversation first
- Conversations expire after 90 days


---

## Reasoning Preservation

### Chat Completions

**What Happens:**
1. Model generates internal reasoning (scratchpad)
2. Reasoning used to produce response
3. **Reasoning discarded** before returning
4. Next turn starts fresh (no reasoning memory)

**Visual:**
```
Turn 1: [Reasoning] → Response → ❌ Reasoning deleted
Turn 2: [New Reasoning] → Response → ❌ Reasoning deleted
Turn 3: [New Reasoning] → Response → ❌ Reasoning deleted
```

**Impact:**
- Model "forgets" its thought process
- May repeat reasoning steps
- Lower performance on complex multi-turn tasks

### Responses API

**What Happens:**
1. Model generates internal reasoning
2. Reasoning used to produce response
3. **Reasoning preserved** in conversation state
4. Next turn builds on previous reasoning

**Visual:**
```
Turn 1: [Reasoning A] → Response → ✅ Reasoning A saved
Turn 2: [Reasoning A + B] → Response → ✅ Reasoning A+B saved
Turn 3: [Reasoning A + B + C] → Response → ✅ All reasoning saved
```

**Impact:**
- Model remembers thought process
- No redundant reasoning
- **+5% better on TAUBench (GPT-5)**
- Better multi-turn problem solving

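
To inspect what reasoning a given turn produced, one option is to collect the summaries attached to `reasoning` items in the `output` array. The sketch below assumes the item shape used in the Response Structure example of this document (`summary` parts of type `summary_text`); `reasoningSummaries` and the interfaces are hypothetical names, and summaries may be absent depending on request settings.

```typescript
// Hypothetical, simplified shapes based on this document's example response.
interface SummaryPart {
  type: string;
  text: string;
}

interface ResponseItem {
  type: string;
  summary?: SummaryPart[];
}

// Return the text of every `summary_text` part found on `reasoning` items.
function reasoningSummaries(output: ResponseItem[]): string[] {
  const summaries: string[] = [];
  for (const item of output) {
    if (item.type !== 'reasoning' || !item.summary) continue;
    for (const part of item.summary) {
      if (part.type === 'summary_text') summaries.push(part.text);
    }
  }
  return summaries;
}
```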

---

## Tools Comparison

### Chat Completions (Client-Side)

```typescript
// 1. Define the function tool and send the first request
const messages = [{ role: 'user' as const, content: 'What is the weather?' }];

const response1 = await openai.chat.completions.create({
  model: 'gpt-5',
  messages,
  tools: [
    {
      type: 'function',
      function: {
        name: 'get_weather',
        description: 'Get weather',
        parameters: {
          type: 'object',
          properties: {
            location: { type: 'string' },
          },
        },
      },
    },
  ],
});

// 2. Check whether the model called a tool
const toolCall = response1.choices[0].message.tool_calls?.[0];

if (toolCall?.type === 'function') {
  // 3. Execute the tool on your server (arguments arrive as a JSON string;
  //    getWeather is your own implementation)
  const args = JSON.parse(toolCall.function.arguments);
  const weatherData = await getWeather(args);

  // 4. Send the result back
  const response2 = await openai.chat.completions.create({
    model: 'gpt-5',
    messages: [
      ...messages,
      response1.choices[0].message,
      {
        role: 'tool',
        tool_call_id: toolCall.id,
        content: JSON.stringify(weatherData),
      },
    ],
  });
}
```

**Pros:**
- Full control over tool execution
- Can use any custom tools

**Cons:**
- Manual round trips add latency
- More complex code
- You host and run every tool yourself

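
Steps 2 to 4 of the client-side flow are often factored into a dispatcher that looks up a local handler, parses the JSON arguments, and wraps the result as a `tool` message. The following is a minimal sketch under that assumption; `runToolCall`, `ToolHandler`, and the handler registry are hypothetical names, and the `ToolCall` shape mirrors only the fields used in the example above.

```typescript
// Hypothetical, simplified shapes for a function tool call and its reply.
interface ToolCall {
  id: string;
  function: { name: string; arguments: string };
}

interface ToolMessage {
  role: 'tool';
  tool_call_id: string;
  content: string;
}

type ToolHandler = (args: Record<string, unknown>) => unknown;

// Run one tool call against a registry of local handlers and return the
// message to append to the history. Failures are reported back to the model
// as an `error` payload instead of being thrown.
async function runToolCall(
  call: ToolCall,
  handlers: Map<string, ToolHandler>,
): Promise<ToolMessage> {
  const handler = handlers.get(call.function.name);
  let result: unknown;
  if (handler === undefined) {
    result = { error: `Unknown tool: ${call.function.name}` };
  } else {
    try {
      // Arguments arrive as a JSON string and may be malformed.
      const args = JSON.parse(call.function.arguments) as Record<string, unknown>;
      result = await handler(args);
    } catch (err) {
      result = { error: err instanceof Error ? err.message : String(err) };
    }
  }
  return {
    role: 'tool',
    tool_call_id: call.id,
    content: JSON.stringify(result ?? null),
  };
}
```

Returning errors as tool output rather than throwing is a design choice: it lets the model see the failure and retry or explain it, at the cost of hiding the exception from your own error handling.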
### Responses (Server-Side Built-in)

```typescript
// All in one request - tools executed server-side
const response = await openai.responses.create({
  model: 'gpt-5',
  input: 'What is the weather and analyze the temperature trend?',
  tools: [
    { type: 'web_search' }, // Built-in
    { type: 'code_interpreter', container: { type: 'auto' } }, // Built-in
  ],
});

// Tools executed automatically, results in output
console.log(response.output_text);
```

**Pros:**
- No client-side round trips (lower latency)
- Simpler code
- Built-in tools (no setup)

**Cons:**
- Less control over execution
- Server-side execution covers only built-in and MCP tools; custom function tools still need a client-side round trip

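
Because server-side tool activity shows up as items in the `output` array, a simple way to audit a response is to tally the item types it contains. This is only a sketch: `countOutputItemTypes` is a hypothetical helper, and the type strings in the usage comment (`web_search_call`, `code_interpreter_call`) are assumptions about how built-in tool calls are labelled.

```typescript
// Minimal item shape: only the `type` discriminator is needed for a tally.
interface TypedItem {
  type: string;
}

// Count how many output items of each type a response contains,
// e.g. { web_search_call: 1, code_interpreter_call: 1, message: 1 }.
function countOutputItemTypes(output: TypedItem[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const item of output) {
    counts[item.type] = (counts[item.type] ?? 0) + 1;
  }
  return counts;
}
```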

---

## Performance Benchmarks

### TAUBench (GPT-5)

| Scenario | Chat Completions | Responses API | Difference |
|----------|-----------------|---------------|------------|
| Multi-turn reasoning | 82% | 87% | **+5%** |
| Tool usage accuracy | 85% | 88% | **+3%** |
| Context retention | 78% | 85% | **+7%** |

### Cache Utilization

| Metric | Chat Completions | Responses API | Improvement |
|--------|-----------------|---------------|-------------|
| Cache hit rate | 30% | 54-72% | **40-80% better** |
| Latency (cached) | 100ms | 60-80ms | **20-40% faster** |
| Cost (cached) | $0.10/1K | $0.05-0.07/1K | **30-50% cheaper** |

---

## Cost Comparison

### Pricing Structure

**Chat Completions:**
- Input tokens: $X per 1K
- Output tokens: $Y per 1K
- **No storage costs**

**Responses:**
- Input tokens: $X per 1K
- Output tokens: $Y per 1K
- Tool tokens: $Z per 1K (if tools used)
- **Conversation storage**: $0.01 per conversation per month

### Example Cost Calculation

**Scenario:** 100 multi-turn conversations, 10 turns each, 1,000 tokens per turn (500 input + 500 output)

**Chat Completions:**
```
Input:  100 convs × 10 turns × 500 tokens = 500K tokens × $X per 1K = $A
Output: 100 convs × 10 turns × 500 tokens = 500K tokens × $Y per 1K = $B
Total:  $A + $B
```

**Responses:**
```
Input:   100 convs × 10 turns × 500 tokens = 500K tokens × $X per 1K = $A
Output:  100 convs × 10 turns × 500 tokens = 500K tokens × $Y per 1K = $B
Storage: 100 convs × $0.01 = $1
Cache savings: -30% on input (due to better caching)
Total:   ($A × 0.7) + $B + $1 (cheaper whenever $A × 0.3 exceeds the $1 storage cost)
```

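
The same arithmetic can be written as a small function, shown here as a sketch only. The `Rates` values stand in for the placeholders $X and $Y, and the storage fee and 30% cache discount are the figures assumed in the example above rather than values from a price list; `estimateCosts`, `Scenario`, and `Rates` are hypothetical names. Under this formula, Responses comes out cheaper only when the cache savings on input exceed the storage cost.

```typescript
interface Scenario {
  conversations: number;
  turnsPerConversation: number;
  inputTokensPerTurn: number;
  outputTokensPerTurn: number;
}

// All rates are placeholders supplied by the caller, not real prices.
interface Rates {
  inputPer1K: number; // $X
  outputPer1K: number; // $Y
  storagePerConversation: number; // $0.01 in the example above
  inputCacheDiscount: number; // 0.3 in the example above
}

// Estimate total cost for both APIs using the formulas from the example.
function estimateCosts(s: Scenario, r: Rates): { chatCompletions: number; responses: number } {
  const turns = s.conversations * s.turnsPerConversation;
  const inputCost = ((turns * s.inputTokensPerTurn) / 1000) * r.inputPer1K; // $A
  const outputCost = ((turns * s.outputTokensPerTurn) / 1000) * r.outputPer1K; // $B
  const storageCost = s.conversations * r.storagePerConversation;
  return {
    chatCompletions: inputCost + outputCost,
    responses: inputCost * (1 - r.inputCacheDiscount) + outputCost + storageCost,
  };
}
```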

---

## Migration Path

### Simple Migration

**Before (Chat Completions):**
```typescript
const response = await openai.chat.completions.create({
  model: 'gpt-5',
  messages: [
    { role: 'system', content: 'You are helpful.' },
    { role: 'user', content: 'Hello!' },
  ],
});

console.log(response.choices[0].message.content);
```

**After (Responses):**
```typescript
const response = await openai.responses.create({
  model: 'gpt-5',
  input: [
    { role: 'developer', content: 'You are helpful.' },
    { role: 'user', content: 'Hello!' },
  ],
});

console.log(response.output_text);
```

**Changes:**
1. `chat.completions.create` → `responses.create`
2. `messages` → `input`
3. `system` → `developer`
4. `choices[0].message.content` → `output_text`

---

## When to Migrate

### ✅ Migrate Now If:

- Building new applications
- Need stateful conversations
- Using agentic patterns (reasoning + tools)
- Want better performance (preserved reasoning)
- Need built-in tools (Code Interpreter, File Search, etc.)

### ⏸️ Stay on Chat Completions If:

- Simple one-off generations
- Legacy integrations (migration effort)
- No need for state management
- Very simple use cases

---

## Summary

**Responses API** is the future of OpenAI's API for agentic applications. It provides:
- ✅ Better performance (+5% on TAUBench)
- ✅ Lower latency and cost (40-80% better cache utilization)
- ✅ Simpler code (automatic state management)
- ✅ More features (built-in tools, MCP, reasoning preservation)

**Chat Completions** is still great for:
- ✅ Simple one-off text generation
- ✅ Legacy integrations
- ✅ When you need maximum simplicity

**Recommendation:** Use Responses for new projects, especially agentic workflows. Chat Completions remains valid for simple use cases.