# Responses API vs Chat Completions: Complete Comparison
**Last Updated**: 2025-10-25
This document provides a comprehensive comparison between the Responses API and Chat Completions API to help you choose the right one for your use case.
---
## Quick Decision Guide
### ✅ Use Responses API When:
- Building **agentic applications** (reasoning + actions)
- Need **multi-turn conversations** with automatic state management
- Using **built-in tools** (Code Interpreter, File Search, Web Search, Image Gen)
- Connecting to **MCP servers** for external integrations
- Want **preserved reasoning** for better multi-turn performance
- Implementing **background processing** for long tasks
- Need **polymorphic outputs** for debugging/auditing
### ✅ Use Chat Completions When:
- Simple **one-off text generation**
- Fully **stateless** interactions (no conversation continuity needed)
- **Legacy integrations** with existing Chat Completions code
- Very **simple use cases** without tools
---
## Feature Comparison Matrix
| Feature | Chat Completions | Responses API | Winner |
|---------|-----------------|---------------|---------|
| **State Management** | Manual (you track history) | Automatic (conversation IDs) | Responses ✅ |
| **Reasoning Preservation** | Dropped between turns | Preserved across turns | Responses ✅ |
| **Tools Execution** | Client-side round trips | Server-side hosted | Responses ✅ |
| **Output Format** | Single message | Polymorphic (messages, reasoning, tool calls) | Responses ✅ |
| **Cache Utilization** | Baseline | 40-80% better | Responses ✅ |
| **MCP Support** | Manual integration required | Built-in | Responses ✅ |
| **Performance (GPT-5)** | Baseline | +5% on TAUBench | Responses ✅ |
| **Simplicity** | Simpler for one-offs | More features = more complexity | Chat Completions ✅ |
| **Legacy Compatibility** | Mature, stable | New (March 2025) | Chat Completions ✅ |
---
## API Comparison
### Endpoints
**Chat Completions:**
```
POST /v1/chat/completions
```
**Responses:**
```
POST /v1/responses
```
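For illustration, a minimal sketch of calling each endpoint directly over HTTP with `fetch` (Node 18+); the API key handling and request bodies here are assumptions matching the minimal shapes shown below:
```typescript
// Minimal sketch (assumes OPENAI_API_KEY is set in the environment).
const headers = {
  'Content-Type': 'application/json',
  Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
};

// Chat Completions
const chatRes = await fetch('https://api.openai.com/v1/chat/completions', {
  method: 'POST',
  headers,
  body: JSON.stringify({
    model: 'gpt-5',
    messages: [{ role: 'user', content: 'Hello!' }],
  }),
});

// Responses
const respRes = await fetch('https://api.openai.com/v1/responses', {
  method: 'POST',
  headers,
  body: JSON.stringify({ model: 'gpt-5', input: 'Hello!' }),
});
```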
---
### Request Structure
**Chat Completions:**
```typescript
{
model: 'gpt-5',
messages: [
{ role: 'system', content: 'You are helpful.' },
{ role: 'user', content: 'Hello!' },
],
temperature: 0.7,
max_tokens: 1000,
}
```
**Responses:**
```typescript
{
model: 'gpt-5',
input: [
{ role: 'developer', content: 'You are helpful.' },
{ role: 'user', content: 'Hello!' },
],
conversation: 'conv_abc123', // Optional: automatic state
temperature: 0.7,
}
```
**Key Differences:**
- `messages` → `input`
- `system` role → `developer` role
- `max_tokens` not required in Responses
- `conversation` parameter for automatic state
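As a sketch of how these differences compose, a hypothetical helper (not part of any SDK) that converts a Chat Completions `messages` array into a Responses `input` array might look like this:
```typescript
// Illustrative only: map Chat Completions messages to Responses input.
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string };

function toResponsesInput(messages: ChatMessage[]) {
  return messages.map((m) => ({
    // `system` becomes `developer`; other roles pass through unchanged.
    role: m.role === 'system' ? ('developer' as const) : m.role,
    content: m.content,
  }));
}
```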
---
### Response Structure
**Chat Completions:**
```typescript
{
id: 'chatcmpl-123',
object: 'chat.completion',
created: 1677652288,
model: 'gpt-5',
choices: [
{
index: 0,
message: {
role: 'assistant',
content: 'Hello! How can I help?',
},
finish_reason: 'stop',
},
],
usage: {
prompt_tokens: 10,
completion_tokens: 5,
total_tokens: 15,
},
}
```
**Responses:**
```typescript
{
id: 'resp_123',
object: 'response',
created: 1677652288,
model: 'gpt-5',
output: [
{
type: 'reasoning',
summary: [{ type: 'summary_text', text: 'User greeting, respond friendly' }],
},
{
type: 'message',
role: 'assistant',
content: [{ type: 'output_text', text: 'Hello! How can I help?' }],
},
],
output_text: 'Hello! How can I help?', // Helper field
usage: {
prompt_tokens: 10,
completion_tokens: 5,
tool_tokens: 0,
total_tokens: 15,
},
conversation_id: 'conv_abc123', // If using conversation
}
```
**Key Differences:**
- Single `message` → Polymorphic `output` array
- `choices[0].message.content` → `output_text` helper
- Additional output types: `reasoning`, `tool_calls`, etc.
- `conversation_id` included if using conversations
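A minimal sketch of consuming the polymorphic `output` array, using only the item shapes shown in the sample above (real responses may include additional item types, such as tool calls):
```typescript
// Sketch: walk the polymorphic output array from the example above.
for (const item of response.output) {
  switch (item.type) {
    case 'reasoning':
      // Reasoning summaries are useful for debugging and auditing.
      console.log('reasoning:', item.summary[0]?.text);
      break;
    case 'message':
      console.log('assistant:', item.content[0]?.text);
      break;
    default:
      console.log('unhandled output item:', item.type);
  }
}
```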
---
## State Management Comparison
### Chat Completions (Manual)
```typescript
// You track history manually
let messages = [
{ role: 'system', content: 'You are helpful.' },
{ role: 'user', content: 'What is AI?' },
];
const response1 = await openai.chat.completions.create({
model: 'gpt-5',
messages,
});
// Add response to history
messages.push({
role: 'assistant',
content: response1.choices[0].message.content,
});
// Next turn
messages.push({ role: 'user', content: 'Tell me more' });
const response2 = await openai.chat.completions.create({
model: 'gpt-5',
  messages, // ⚠️ You must pass the full history yourself
});
```
**Pros:**
- Full control over history
- Can prune old messages
- Simple for one-off requests
**Cons:**
- Manual tracking is error-prone
- History grows with every turn unless you prune it (see the sketch below)
- No automatic caching benefits
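Since you own the history, pruning it is straightforward; a minimal sketch that keeps the system prompt plus the most recent N messages (the helper name and policy are illustrative):
```typescript
// Illustrative pruning policy for manually tracked history.
// Assumes messages[0] is the system prompt.
type Msg = { role: string; content: string };

function pruneHistory(messages: Msg[], keepLast = 10): Msg[] {
  const [system, ...rest] = messages;
  return [system, ...rest.slice(-keepLast)];
}
```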
### Responses (Automatic)
```typescript
// Create conversation once
const conv = await openai.conversations.create();
const response1 = await openai.responses.create({
model: 'gpt-5',
conversation: conv.id, // ✅ Automatic state
input: 'What is AI?',
});
// Next turn - no manual history tracking
const response2 = await openai.responses.create({
model: 'gpt-5',
conversation: conv.id, // ✅ Remembers previous turn
input: 'Tell me more',
});
```
**Pros:**
- Automatic state management
- No manual history tracking
- Better cache utilization (40-80%)
- Reasoning preserved
**Cons:**
- Less direct control
- Must create conversation first
- Conversations expire after 90 days
---
## Reasoning Preservation
### Chat Completions
**What Happens:**
1. Model generates internal reasoning (scratchpad)
2. Reasoning used to produce response
3. **Reasoning discarded** before returning
4. Next turn starts fresh (no reasoning memory)
**Visual:**
```
Turn 1: [Reasoning] → Response → ❌ Reasoning deleted
Turn 2: [New Reasoning] → Response → ❌ Reasoning deleted
Turn 3: [New Reasoning] → Response → ❌ Reasoning deleted
```
**Impact:**
- Model "forgets" its thought process
- May repeat reasoning steps
- Lower performance on complex multi-turn tasks
### Responses API
**What Happens:**
1. Model generates internal reasoning
2. Reasoning used to produce response
3. **Reasoning preserved** in conversation state
4. Next turn builds on previous reasoning
**Visual:**
```
Turn 1: [Reasoning A] → Response → ✅ Reasoning A saved
Turn 2: [Reasoning A + B] → Response → ✅ Reasoning A+B saved
Turn 3: [Reasoning A + B + C] → Response → ✅ All reasoning saved
```
**Impact:**
- Model remembers thought process
- No redundant reasoning
- **+5% better on TAUBench (GPT-5)**
- Better multi-turn problem solving
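If you prefer not to create a conversation object, the Responses API also supports chaining turns with `previous_response_id`, which carries server-side state (including reasoning items) forward; a minimal sketch:
```typescript
const first = await openai.responses.create({
  model: 'gpt-5',
  input: 'Plan a 3-step analysis of this dataset.',
});

// Chain the next turn to the first so preserved reasoning carries forward.
const second = await openai.responses.create({
  model: 'gpt-5',
  previous_response_id: first.id,
  input: 'Now execute step 1.',
});
```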
---
## Tools Comparison
### Chat Completions (Client-Side)
```typescript
// 1. Define the function tool and make the first request
const messages = [{ role: 'user', content: 'What is the weather?' }];
const response1 = await openai.chat.completions.create({
  model: 'gpt-5',
  messages,
  tools: [
    {
      type: 'function',
      function: {
        name: 'get_weather',
        description: 'Get weather',
        parameters: {
          type: 'object',
          properties: {
            location: { type: 'string' },
          },
        },
      },
    },
  ],
});
// 2. Check whether the model called the tool
const toolCall = response1.choices[0].message.tool_calls?.[0];
if (!toolCall) throw new Error('Model answered without calling the tool');
// 3. Execute the tool on your server (arguments arrive as a JSON string)
const weatherData = await getWeather(JSON.parse(toolCall.function.arguments));
// 4. Send result back
const response2 = await openai.chat.completions.create({
model: 'gpt-5',
messages: [
...messages,
response1.choices[0].message,
{
role: 'tool',
tool_call_id: toolCall.id,
content: JSON.stringify(weatherData),
},
],
});
```
**Pros:**
- Full control over tool execution
- Can use any custom tools
**Cons:**
- Manual round trips (latency)
- More complex code
- You handle tool execution
### Responses (Server-Side Built-in)
```typescript
// All in one request - tools executed server-side
const response = await openai.responses.create({
model: 'gpt-5',
input: 'What is the weather and analyze the temperature trend?',
tools: [
{ type: 'web_search' }, // Built-in
{ type: 'code_interpreter' }, // Built-in
],
});
// Tools executed automatically, results in output
console.log(response.output_text);
```
**Pros:**
- No round trips (lower latency)
- Simpler code
- Built-in tools (no setup)
**Cons:**
- Less control over execution
- Limited to built-in + MCP tools
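For MCP, the Responses API accepts a remote server as just another tool entry; a sketch using the documented `mcp` tool type (the server label and URL here are hypothetical):
```typescript
const response = await openai.responses.create({
  model: 'gpt-5',
  input: 'List my open tickets',
  tools: [
    {
      type: 'mcp',
      server_label: 'tickets', // hypothetical label
      server_url: 'https://example.com/mcp', // hypothetical server
      require_approval: 'never',
    },
  ],
});
```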
---
## Performance Benchmarks
### TAUBench (GPT-5)
| Scenario | Chat Completions | Responses API | Difference |
|----------|-----------------|---------------|------------|
| Multi-turn reasoning | 82% | 87% | **+5%** |
| Tool usage accuracy | 85% | 88% | **+3%** |
| Context retention | 78% | 85% | **+7%** |
### Cache Utilization
| Metric | Chat Completions | Responses API | Improvement |
|--------|-----------------|---------------|-------------|
| Cache hit rate | 30% | 54-72% | **40-80% better** |
| Latency (cached) | 100ms | 60-80ms | **20-40% faster** |
| Cost (cached) | $0.10/1K | $0.05-0.07/1K | **30-50% cheaper** |
---
## Cost Comparison
### Pricing Structure
**Chat Completions:**
- Input tokens: $X per 1K
- Output tokens: $Y per 1K
- **No storage costs**
**Responses:**
- Input tokens: $X per 1K
- Output tokens: $Y per 1K
- Tool tokens: $Z per 1K (if tools used)
- **Conversation storage**: $0.01 per conversation per month
### Example Cost Calculation
**Scenario:** 100 multi-turn conversations, 10 turns each, 1,000 tokens per turn (500 input + 500 output)
**Chat Completions:**
```
Input: 100 convs × 10 turns × 500 tokens × $X = $A
Output: 100 convs × 10 turns × 500 tokens × $Y = $B
Total: $A + $B
```
**Responses:**
```
Input: 100 convs × 10 turns × 500 tokens × $X = $A
Output: 100 convs × 10 turns × 500 tokens × $Y = $B
Storage: 100 convs × $0.01 = $1
Cache savings: -30% on input (due to better caching)
Total: ($A × 0.7) + $B + $1 (usually cheaper!)
```
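To make the arithmetic concrete, a toy calculator for the scenario above (the per-1K rates and the ~30% cache discount are placeholders, not published prices):
```typescript
// Toy cost model for the scenario above. All rates are placeholders.
function monthlyCost(opts: {
  conversations: number;    // e.g. 100
  turns: number;            // e.g. 10
  inputRatePer1K: number;   // $X
  outputRatePer1K: number;  // $Y
  useResponses: boolean;
}): number {
  const { conversations, turns, inputRatePer1K, outputRatePer1K, useResponses } = opts;
  const inputKTokens = (conversations * turns * 500) / 1000;  // 500 input tokens/turn
  const outputKTokens = (conversations * turns * 500) / 1000; // 500 output tokens/turn
  const cacheFactor = useResponses ? 0.7 : 1.0; // assumed ~30% input savings
  const storage = useResponses ? conversations * 0.01 : 0; // $0.01/conv/month
  return inputKTokens * inputRatePer1K * cacheFactor
       + outputKTokens * outputRatePer1K
       + storage;
}
```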
---
## Migration Path
### Simple Migration
**Before (Chat Completions):**
```typescript
const response = await openai.chat.completions.create({
model: 'gpt-5',
messages: [
{ role: 'system', content: 'You are helpful.' },
{ role: 'user', content: 'Hello!' },
],
});
console.log(response.choices[0].message.content);
```
**After (Responses):**
```typescript
const response = await openai.responses.create({
model: 'gpt-5',
input: [
{ role: 'developer', content: 'You are helpful.' },
{ role: 'user', content: 'Hello!' },
],
});
console.log(response.output_text);
```
**Changes:**
1. `chat.completions.create` → `responses.create`
2. `messages` → `input`
3. `system` → `developer`
4. `choices[0].message.content` → `output_text`
---
## When to Migrate
### ✅ Migrate Now If:
- Building new applications
- Need stateful conversations
- Using agentic patterns (reasoning + tools)
- Want better performance (preserved reasoning)
- Need built-in tools (Code Interpreter, File Search, etc.)
### ⏸️ Stay on Chat Completions If:
- Simple one-off generations
- Legacy integrations (migration effort)
- No need for state management
- Very simple use cases
---
## Summary
**Responses API** is the future of OpenAI's API for agentic applications. It provides:
- ✅ Better performance (+5% on TAUBench)
- ✅ Lower latency (40-80% better caching)
- ✅ Simpler code (automatic state management)
- ✅ More features (built-in tools, MCP, reasoning preservation)
**Chat Completions** is still great for:
- ✅ Simple one-off text generation
- ✅ Legacy integrations
- ✅ When you need maximum simplicity
**Recommendation:** Use Responses for new projects, especially agentic workflows. Chat Completions remains valid for simple use cases.