# Responses API vs Chat Completions: Complete Comparison
**Last Updated**: 2025-10-25
This document provides a comprehensive comparison between the Responses API and Chat Completions API to help you choose the right one for your use case.
---
## Quick Decision Guide
### ✅ Use Responses API When:
- Building **agentic applications** (reasoning + actions)
- Need **multi-turn conversations** with automatic state management
- Using **built-in tools** (Code Interpreter, File Search, Web Search, Image Gen)
- Connecting to **MCP servers** for external integrations
- Want **preserved reasoning** for better multi-turn performance
- Implementing **background processing** for long tasks
- Need **polymorphic outputs** for debugging/auditing
### ✅ Use Chat Completions When:
- Simple **one-off text generation**
- Fully **stateless** interactions (no conversation continuity needed)
- **Legacy integrations** with existing Chat Completions code
- Very **simple use cases** without tools
---
## Feature Comparison Matrix
| Feature | Chat Completions | Responses API | Winner |
|---------|-----------------|---------------|---------|
| **State Management** | Manual (you track history) | Automatic (conversation IDs) | Responses ✅ |
| **Reasoning Preservation** | Dropped between turns | Preserved across turns | Responses ✅ |
| **Tools Execution** | Client-side round trips | Server-side hosted | Responses ✅ |
| **Output Format** | Single message | Polymorphic (messages, reasoning, tool calls) | Responses ✅ |
| **Cache Utilization** | Baseline | 40-80% better | Responses ✅ |
| **MCP Support** | Manual integration required | Built-in | Responses ✅ |
| **Performance (GPT-5)** | Baseline | +5% on TAUBench | Responses ✅ |
| **Simplicity** | Simpler for one-offs | More features = more complexity | Chat Completions ✅ |
| **Legacy Compatibility** | Mature, stable | New (March 2025) | Chat Completions ✅ |
---
## API Comparison
### Endpoints
**Chat Completions:**
```
POST /v1/chat/completions
```
**Responses:**
```
POST /v1/responses
```
---
### Request Structure
**Chat Completions:**
```typescript
{
  model: 'gpt-5',
  messages: [
    { role: 'system', content: 'You are helpful.' },
    { role: 'user', content: 'Hello!' },
  ],
  temperature: 0.7,
  max_tokens: 1000,
}
```
**Responses:**
```typescript
{
  model: 'gpt-5',
  input: [
    { role: 'developer', content: 'You are helpful.' },
    { role: 'user', content: 'Hello!' },
  ],
  conversation: 'conv_abc123', // Optional: automatic state
  temperature: 0.7,
}
```
**Key Differences:**
- `messages` → `input`
- `system` role → `developer` role
- `max_tokens` → optional `max_output_tokens`
- `conversation` parameter for automatic state
---
### Response Structure
**Chat Completions:**
```typescript
{
  id: 'chatcmpl-123',
  object: 'chat.completion',
  created: 1677652288,
  model: 'gpt-5',
  choices: [
    {
      index: 0,
      message: {
        role: 'assistant',
        content: 'Hello! How can I help?',
      },
      finish_reason: 'stop',
    },
  ],
  usage: {
    prompt_tokens: 10,
    completion_tokens: 5,
    total_tokens: 15,
  },
}
```
**Responses:**
```typescript
{
  id: 'resp_123',
  object: 'response',
  created_at: 1677652288,
  model: 'gpt-5',
  output: [
    {
      type: 'reasoning',
      summary: [{ type: 'summary_text', text: 'User greeting, respond friendly' }],
    },
    {
      type: 'message',
      role: 'assistant',
      content: [{ type: 'output_text', text: 'Hello! How can I help?' }],
    },
  ],
  output_text: 'Hello! How can I help?', // SDK helper field
  usage: {
    input_tokens: 10,
    output_tokens: 5,
    total_tokens: 15,
  },
  conversation_id: 'conv_abc123', // If using conversation
}
```
**Key Differences:**
- Single `message` → Polymorphic `output` array
- `choices[0].message.content` → `output_text` helper
- Additional output types: `reasoning`, `tool_calls`, etc.
- `conversation_id` included if using conversations
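Because `output` is polymorphic, consumers should switch on each item's `type` rather than assume a single message. A minimal sketch of that pattern, using only the item shapes shown above (`reasoning` and `message`); other item types are simply skipped:
```typescript
import OpenAI from 'openai';
const openai = new OpenAI();

const response = await openai.responses.create({
  model: 'gpt-5',
  input: 'Hello!',
});

// Walk the polymorphic output array instead of reading choices[0].
for (const item of response.output) {
  if (item.type === 'reasoning') {
    // Reasoning items expose summaries, not the raw chain of thought.
    for (const s of item.summary) console.log('[reasoning]', s.text);
  } else if (item.type === 'message') {
    for (const part of item.content) {
      if (part.type === 'output_text') console.log('[assistant]', part.text);
    }
  }
  // Tool-call item types would be handled here as they appear.
}
```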
---
## State Management Comparison
### Chat Completions (Manual)
```typescript
// You track history manually
const messages = [
  { role: 'system', content: 'You are helpful.' },
  { role: 'user', content: 'What is AI?' },
];

const response1 = await openai.chat.completions.create({
  model: 'gpt-5',
  messages,
});

// Add response to history
messages.push({
  role: 'assistant',
  content: response1.choices[0].message.content,
});

// Next turn
messages.push({ role: 'user', content: 'Tell me more' });

const response2 = await openai.chat.completions.create({
  model: 'gpt-5',
  messages, // You must pass the full history yourself
});
```
**Pros:**
- Full control over history
- Can prune old messages
- Simple for one-off requests
**Cons:**
- Manual tracking error-prone
- Must handle history yourself
- No automatic caching benefits
### Responses (Automatic)
```typescript
// Create conversation once
const conv = await openai.conversations.create();

const response1 = await openai.responses.create({
  model: 'gpt-5',
  conversation: conv.id, // ✅ Automatic state
  input: 'What is AI?',
});

// Next turn - no manual history tracking
const response2 = await openai.responses.create({
  model: 'gpt-5',
  conversation: conv.id, // ✅ Remembers previous turn
  input: 'Tell me more',
});
```
**Pros:**
- Automatic state management
- No manual history tracking
- Better cache utilization (40-80%)
- Reasoning preserved
**Cons:**
- Less direct control
- Must create conversation first
- Conversations expire after 90 days
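If you need visibility into that server-held state, the Conversations API can also list a conversation's stored items. A hedged sketch (assuming the `conversations.items.list` method exposed by the official Node SDK):
```typescript
// Inspect what the server has stored for this conversation.
const items = await openai.conversations.items.list(conv.id);
for (const item of items.data) {
  // Stored items mirror the polymorphic output types: messages, reasoning, tool calls.
  console.log(item.type, item.id);
}
```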
---
## Reasoning Preservation
### Chat Completions
**What Happens:**
1. Model generates internal reasoning (scratchpad)
2. Reasoning used to produce response
3. **Reasoning discarded** before returning
4. Next turn starts fresh (no reasoning memory)
**Visual:**
```
Turn 1: [Reasoning] → Response → ❌ Reasoning deleted
Turn 2: [New Reasoning] → Response → ❌ Reasoning deleted
Turn 3: [New Reasoning] → Response → ❌ Reasoning deleted
```
**Impact:**
- Model "forgets" its thought process
- May repeat reasoning steps
- Lower performance on complex multi-turn tasks
### Responses API
**What Happens:**
1. Model generates internal reasoning
2. Reasoning used to produce response
3. **Reasoning preserved** in conversation state
4. Next turn builds on previous reasoning
**Visual:**
```
Turn 1: [Reasoning A] → Response → ✅ Reasoning A saved
Turn 2: [Reasoning A + B] → Response → ✅ Reasoning A+B saved
Turn 3: [Reasoning A + B + C] → Response → ✅ All reasoning saved
```
**Impact:**
- Model remembers thought process
- No redundant reasoning
- **+5% better on TAUBench (GPT-5)**
- Better multi-turn problem solving
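Conversations are not the only way to get this behavior: responses are stored server-side by default, so you can also chain turns by ID. A minimal sketch using the `previous_response_id` parameter, which carries the preserved reasoning into the next turn:
```typescript
// Turn 1: the response (including its reasoning items) is stored server-side.
const turn1 = await openai.responses.create({
  model: 'gpt-5',
  input: 'Plan a 3-step approach to debugging a memory leak.',
  store: true, // the default; shown here for clarity
});

// Turn 2: chaining by ID lets the model build on turn 1's reasoning
// instead of re-deriving it from scratch.
const turn2 = await openai.responses.create({
  model: 'gpt-5',
  previous_response_id: turn1.id,
  input: 'Apply step 1 to a Node.js service.',
});
console.log(turn2.output_text);
```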
---
## Tools Comparison
### Chat Completions (Client-Side)
```typescript
// 1. Define the function tool
const messages = [{ role: 'user', content: 'What is the weather?' }];

const response1 = await openai.chat.completions.create({
  model: 'gpt-5',
  messages,
  tools: [
    {
      type: 'function',
      function: {
        name: 'get_weather',
        description: 'Get weather',
        parameters: {
          type: 'object',
          properties: {
            location: { type: 'string' },
          },
        },
      },
    },
  ],
});

// 2. Check whether a tool was called
const toolCall = response1.choices[0].message.tool_calls?.[0];

// 3. Execute the tool on your server (arguments arrive as a JSON string)
const weatherData = await getWeather(JSON.parse(toolCall.function.arguments));

// 4. Send the result back for the final answer
const response2 = await openai.chat.completions.create({
  model: 'gpt-5',
  messages: [
    ...messages,
    response1.choices[0].message,
    {
      role: 'tool',
      tool_call_id: toolCall.id,
      content: JSON.stringify(weatherData),
    },
  ],
});
```
**Pros:**
- Full control over tool execution
- Can use any custom tools
**Cons:**
- Manual round trips (latency)
- More complex code
- You handle tool execution
### Responses (Server-Side Built-in)
```typescript
// All in one request - tools executed server-side
const response = await openai.responses.create({
  model: 'gpt-5',
  input: 'What is the weather and analyze the temperature trend?',
  tools: [
    { type: 'web_search' },       // Built-in
    { type: 'code_interpreter' }, // Built-in
  ],
});

// Tools executed automatically, results in output
console.log(response.output_text);
```
**Pros:**
- No round trips (lower latency)
- Simpler code
- Built-in tools (no setup)
**Cons:**
- Less control over execution
- Limited to built-in + MCP tools
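For the MCP case, a remote server is declared as just another entry in `tools`. A hedged sketch (the server label and URL below are placeholders, not a real endpoint):
```typescript
const response = await openai.responses.create({
  model: 'gpt-5',
  input: 'Summarize the latest changes documented on the example server.',
  tools: [
    {
      type: 'mcp',
      server_label: 'example-docs',          // placeholder label
      server_url: 'https://example.com/mcp', // placeholder URL
      require_approval: 'never',             // or gate individual tool calls
    },
  ],
});
console.log(response.output_text);
```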
---
## Performance Benchmarks
### TAUBench (GPT-5)
| Scenario | Chat Completions | Responses API | Difference |
|----------|-----------------|---------------|------------|
| Multi-turn reasoning | 82% | 87% | **+5%** |
| Tool usage accuracy | 85% | 88% | **+3%** |
| Context retention | 78% | 85% | **+7%** |
### Cache Utilization
| Metric | Chat Completions | Responses API | Improvement |
|--------|-----------------|---------------|-------------|
| Cache hit rate | 30% | 54-72% | **40-80% better** |
| Latency (cached) | 100ms | 60-80ms | **20-40% faster** |
| Cost (cached) | $0.10/1K | $0.05-0.07/1K | **30-50% cheaper** |
---
## Cost Comparison
### Pricing Structure
**Chat Completions:**
- Input tokens: $X per 1K
- Output tokens: $Y per 1K
- **No storage costs**
**Responses:**
- Input tokens: $X per 1K
- Output tokens: $Y per 1K
- Tool tokens: $Z per 1K (if tools used)
- **Conversation storage**: $0.01 per conversation per month
### Example Cost Calculation
**Scenario:** 100 multi-turn conversations, 10 turns each, 1,000 tokens per turn (500 input + 500 output)
**Chat Completions:**
```
Input: 100 convs × 10 turns × 500 tokens × $X = $A
Output: 100 convs × 10 turns × 500 tokens × $Y = $B
Total: $A + $B
```
**Responses:**
```
Input: 100 convs × 10 turns × 500 tokens × $X = $A
Output: 100 convs × 10 turns × 500 tokens × $Y = $B
Storage: 100 convs × $0.01 = $1
Cache savings: -30% on input (due to better caching)
Total: ($A × 0.7) + $B + $1 (usually cheaper!)
```
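That arithmetic is easy to fold into a helper. A sketch with placeholder rates (`$X`/`$Y` per 1K tokens and the ~30% cache saving are this document's assumptions, not published pricing):
```typescript
interface CostInputs {
  conversations: number;
  turnsPerConversation: number;
  inputTokensPerTurn: number;
  outputTokensPerTurn: number;
  inputRatePer1K: number;  // $X - placeholder
  outputRatePer1K: number; // $Y - placeholder
}

function estimateResponsesCost(c: CostInputs): number {
  const turns = c.conversations * c.turnsPerConversation;
  const input = (turns * c.inputTokensPerTurn / 1000) * c.inputRatePer1K;
  const output = (turns * c.outputTokensPerTurn / 1000) * c.outputRatePer1K;
  const storage = c.conversations * 0.01; // $0.01 per conversation per month
  return input * 0.7 + output + storage;  // assume ~30% cache savings on input
}

// 100 conversations x 10 turns, 500 input + 500 output tokens per turn
console.log(estimateResponsesCost({
  conversations: 100,
  turnsPerConversation: 10,
  inputTokensPerTurn: 500,
  outputTokensPerTurn: 500,
  inputRatePer1K: 0.00125, // placeholder rate
  outputRatePer1K: 0.01,   // placeholder rate
}));
```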
---
## Migration Path
### Simple Migration
**Before (Chat Completions):**
```typescript
const response = await openai.chat.completions.create({
  model: 'gpt-5',
  messages: [
    { role: 'system', content: 'You are helpful.' },
    { role: 'user', content: 'Hello!' },
  ],
});

console.log(response.choices[0].message.content);
```
**After (Responses):**
```typescript
const response = await openai.responses.create({
  model: 'gpt-5',
  input: [
    { role: 'developer', content: 'You are helpful.' },
    { role: 'user', content: 'Hello!' },
  ],
});

console.log(response.output_text);
```
**Changes:**
1. `chat.completions.create` → `responses.create`
2. `messages` → `input`
3. `system` → `developer`
4. `choices[0].message.content` → `output_text`
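For incremental migrations, a thin adapter keeps call sites stable while you switch APIs underneath. A hypothetical sketch (the `generateText` helper is illustrative, not part of any SDK):
```typescript
import OpenAI from 'openai';

// Hypothetical adapter: one signature, either API underneath.
async function generateText(
  openai: OpenAI,
  instructions: string,
  userInput: string,
  useResponses = true,
): Promise<string> {
  if (useResponses) {
    const res = await openai.responses.create({
      model: 'gpt-5',
      input: [
        { role: 'developer', content: instructions },
        { role: 'user', content: userInput },
      ],
    });
    return res.output_text;
  }
  const res = await openai.chat.completions.create({
    model: 'gpt-5',
    messages: [
      { role: 'system', content: instructions },
      { role: 'user', content: userInput },
    ],
  });
  return res.choices[0].message.content ?? '';
}
```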
---
## When to Migrate
### ✅ Migrate Now If:
- Building new applications
- Need stateful conversations
- Using agentic patterns (reasoning + tools)
- Want better performance (preserved reasoning)
- Need built-in tools (Code Interpreter, File Search, etc.)
### ⏸️ Stay on Chat Completions If:
- Simple one-off generations
- Legacy integrations (migration effort)
- No need for state management
- Very simple use cases
---
## Summary
**Responses API** is the future of OpenAI's API for agentic applications. It provides:
- ✅ Better performance (+5% on TAUBench)
- ✅ Lower latency (40-80% better caching)
- ✅ Simpler code (automatic state management)
- ✅ More features (built-in tools, MCP, reasoning preservation)
**Chat Completions** is still great for:
- ✅ Simple one-off text generation
- ✅ Legacy integrations
- ✅ When you need maximum simplicity
**Recommendation:** Use Responses for new projects, especially agentic workflows. Chat Completions remains valid for simple use cases.