# Responses API vs Chat Completions: Complete Comparison
**Last Updated**: 2025-10-25
This document provides a comprehensive comparison between the Responses API and Chat Completions API to help you choose the right one for your use case.
---
## Quick Decision Guide
### ✅ Use Responses API When:
- Building **agentic applications** (reasoning + actions)
- Need **multi-turn conversations** with automatic state management
- Using **built-in tools** (Code Interpreter, File Search, Web Search, Image Gen)
- Connecting to **MCP servers** for external integrations
- Want **preserved reasoning** for better multi-turn performance
- Implementing **background processing** for long tasks
- Need **polymorphic outputs** for debugging/auditing
### ✅ Use Chat Completions When:
- Simple **one-off text generation**
- Fully **stateless** interactions (no conversation continuity needed)
- **Legacy integrations** with existing Chat Completions code
- Very **simple use cases** without tools
---
## Feature Comparison Matrix
| Feature | Chat Completions | Responses API | Winner |
|---------|-----------------|---------------|---------|
| **State Management** | Manual (you track history) | Automatic (conversation IDs) | Responses ✅ |
| **Reasoning Preservation** | Dropped between turns | Preserved across turns | Responses ✅ |
| **Tools Execution** | Client-side round trips | Server-side hosted | Responses ✅ |
| **Output Format** | Single message | Polymorphic (messages, reasoning, tool calls) | Responses ✅ |
| **Cache Utilization** | Baseline | 40-80% better | Responses ✅ |
| **MCP Support** | Manual integration required | Built-in | Responses ✅ |
| **Performance (GPT-5)** | Baseline | +5% on TAUBench | Responses ✅ |
| **Simplicity** | Simpler for one-offs | More features = more complexity | Chat Completions ✅ |
| **Legacy Compatibility** | Mature, stable | New (March 2025) | Chat Completions ✅ |
---
## API Comparison
### Endpoints
**Chat Completions:**
```
POST /v1/chat/completions
```
**Responses:**
```
POST /v1/responses
```
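For illustration, a minimal sketch of calling each endpoint directly over HTTP with `fetch` (Node 18+); the API key handling and request bodies here are assumptions matching the minimal shapes shown below:
```typescript
// Minimal sketch (assumes OPENAI_API_KEY is set in the environment).
const headers = {
  'Content-Type': 'application/json',
  Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
};

// Chat Completions
const chatRes = await fetch('https://api.openai.com/v1/chat/completions', {
  method: 'POST',
  headers,
  body: JSON.stringify({
    model: 'gpt-5',
    messages: [{ role: 'user', content: 'Hello!' }],
  }),
});

// Responses
const respRes = await fetch('https://api.openai.com/v1/responses', {
  method: 'POST',
  headers,
  body: JSON.stringify({ model: 'gpt-5', input: 'Hello!' }),
});
```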
---
### Request Structure
**Chat Completions:**
```typescript
{
model: 'gpt-5',
messages: [
{ role: 'system', content: 'You are helpful.' },
{ role: 'user', content: 'Hello!' },
],
temperature: 0.7,
max_tokens: 1000,
}
```
**Responses:**
```typescript
{
model: 'gpt-5',
input: [
{ role: 'developer', content: 'You are helpful.' },
{ role: 'user', content: 'Hello!' },
],
conversation: 'conv_abc123', // Optional: automatic state
temperature: 0.7,
}
```
**Key Differences:**
- `messages` → `input`
- `system` role → `developer` role
- `max_tokens` not required in Responses
- `conversation` parameter for automatic state
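As a sketch of how these differences compose, a hypothetical helper (not part of any SDK) that converts a Chat Completions `messages` array into a Responses `input` array might look like this:
```typescript
// Illustrative only: map Chat Completions messages to Responses input.
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string };

function toResponsesInput(messages: ChatMessage[]) {
  return messages.map((m) => ({
    // `system` becomes `developer`; other roles pass through unchanged.
    role: m.role === 'system' ? ('developer' as const) : m.role,
    content: m.content,
  }));
}
```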
---
### Response Structure
**Chat Completions:**
```typescript
{
id: 'chatcmpl-123',
object: 'chat.completion',
created: 1677652288,
model: 'gpt-5',
choices: [
{
index: 0,
message: {
role: 'assistant',
content: 'Hello! How can I help?',
},
finish_reason: 'stop',
},
],
usage: {
prompt_tokens: 10,
completion_tokens: 5,
total_tokens: 15,
},
}
```
**Responses:**
```typescript
{
id: 'resp_123',
object: 'response',
created: 1677652288,
model: 'gpt-5',
output: [
{
type: 'reasoning',
summary: [{ type: 'summary_text', text: 'User greeting, respond friendly' }],
},
{
type: 'message',
role: 'assistant',
content: [{ type: 'output_text', text: 'Hello! How can I help?' }],
},
],
output_text: 'Hello! How can I help?', // Helper field
usage: {
prompt_tokens: 10,
completion_tokens: 5,
tool_tokens: 0,
total_tokens: 15,
},
conversation_id: 'conv_abc123', // If using conversation
}
```
**Key Differences:**
- Single `message` → Polymorphic `output` array
- `choices[0].message.content` → `output_text` helper
- Additional output types: `reasoning`, `tool_calls`, etc.
- `conversation_id` included if using conversations
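A minimal sketch of consuming the polymorphic `output` array, using only the item shapes shown in the sample above (real responses may include additional item types, such as tool calls):
```typescript
// Sketch: walk the polymorphic output array from the example above.
for (const item of response.output) {
  switch (item.type) {
    case 'reasoning':
      // Reasoning summaries are useful for debugging and auditing.
      console.log('reasoning:', item.summary[0]?.text);
      break;
    case 'message':
      console.log('assistant:', item.content[0]?.text);
      break;
    default:
      console.log('unhandled output item:', item.type);
  }
}
```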
---
## State Management Comparison
### Chat Completions (Manual)
```typescript
// You track history manually
let messages = [
{ role: 'system', content: 'You are helpful.' },
{ role: 'user', content: 'What is AI?' },
];
const response1 = await openai.chat.completions.create({
model: 'gpt-5',
messages,
});
// Add response to history
messages.push({
role: 'assistant',
content: response1.choices[0].message.content,
});
// Next turn
messages.push({ role: 'user', content: 'Tell me more' });
const response2 = await openai.chat.completions.create({
model: 'gpt-5',
  messages, // ⚠️ You must pass the full history yourself
});
```
**Pros:**
- Full control over history
- Can prune old messages
- Simple for one-off requests
**Cons:**
- Manual tracking is error-prone
- History grows with every turn unless you prune it (see the sketch below)
- No automatic caching benefits
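Since you own the history, pruning it is straightforward; a minimal sketch that keeps the system prompt plus the most recent N messages (the helper name and policy are illustrative):
```typescript
// Illustrative pruning policy for manually tracked history.
// Assumes messages[0] is the system prompt.
type Msg = { role: string; content: string };

function pruneHistory(messages: Msg[], keepLast = 10): Msg[] {
  const [system, ...rest] = messages;
  return [system, ...rest.slice(-keepLast)];
}
```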
### Responses (Automatic)
```typescript
// Create conversation once
const conv = await openai.conversations.create();
const response1 = await openai.responses.create({
model: 'gpt-5',
conversation: conv.id, // ✅ Automatic state
input: 'What is AI?',
});
// Next turn - no manual history tracking
const response2 = await openai.responses.create({
model: 'gpt-5',
conversation: conv.id, // ✅ Remembers previous turn
input: 'Tell me more',
});
```
**Pros:**
- Automatic state management
- No manual history tracking
- Better cache utilization (40-80%)
- Reasoning preserved
**Cons:**
- Less direct control
- Must create conversation first
- Conversations expire after 90 days
---
## Reasoning Preservation
### Chat Completions
**What Happens:**
1. Model generates internal reasoning (scratchpad)
2. Reasoning used to produce response
3. **Reasoning discarded** before returning
4. Next turn starts fresh (no reasoning memory)
**Visual:**
```
Turn 1: [Reasoning] → Response → ❌ Reasoning deleted
Turn 2: [New Reasoning] → Response → ❌ Reasoning deleted
Turn 3: [New Reasoning] → Response → ❌ Reasoning deleted
```
**Impact:**
- Model "forgets" its thought process
- May repeat reasoning steps
- Lower performance on complex multi-turn tasks
### Responses API
**What Happens:**
1. Model generates internal reasoning
2. Reasoning used to produce response
3. **Reasoning preserved** in conversation state
4. Next turn builds on previous reasoning
**Visual:**
```
Turn 1: [Reasoning A] → Response → ✅ Reasoning A saved
Turn 2: [Reasoning A + B] → Response → ✅ Reasoning A+B saved
Turn 3: [Reasoning A + B + C] → Response → ✅ All reasoning saved
```
**Impact:**
- Model remembers thought process
- No redundant reasoning
- **+5% better on TAUBench (GPT-5)**
- Better multi-turn problem solving
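If you prefer not to create a conversation object, the Responses API also supports chaining turns with `previous_response_id`, which carries server-side state (including reasoning items) forward; a minimal sketch:
```typescript
const first = await openai.responses.create({
  model: 'gpt-5',
  input: 'Plan a 3-step analysis of this dataset.',
});

// Chain the next turn to the first so preserved reasoning carries forward.
const second = await openai.responses.create({
  model: 'gpt-5',
  previous_response_id: first.id,
  input: 'Now execute step 1.',
});
```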
---
## Tools Comparison
### Chat Completions (Client-Side)
```typescript
// 1. Define the function tool and make the first request
const messages = [{ role: 'user', content: 'What is the weather?' }];
const response1 = await openai.chat.completions.create({
  model: 'gpt-5',
  messages,
  tools: [
    {
      type: 'function',
      function: {
        name: 'get_weather',
        description: 'Get weather',
        parameters: {
          type: 'object',
          properties: {
            location: { type: 'string' },
          },
        },
      },
    },
  ],
});
// 2. Check whether the model called the tool
const toolCall = response1.choices[0].message.tool_calls?.[0];
if (!toolCall) throw new Error('Model answered without calling the tool');
// 3. Execute the tool on your server (arguments arrive as a JSON string)
const weatherData = await getWeather(JSON.parse(toolCall.function.arguments));
// 4. Send result back
const response2 = await openai.chat.completions.create({
model: 'gpt-5',
messages: [
...messages,
response1.choices[0].message,
{
role: 'tool',
tool_call_id: toolCall.id,
content: JSON.stringify(weatherData),
},
],
});
```
**Pros:**
- Full control over tool execution
- Can use any custom tools
**Cons:**
- Manual round trips (latency)
- More complex code
- You handle tool execution
### Responses (Server-Side Built-in)
```typescript
// All in one request - tools executed server-side
const response = await openai.responses.create({
model: 'gpt-5',
input: 'What is the weather and analyze the temperature trend?',
tools: [
{ type: 'web_search' }, // Built-in
{ type: 'code_interpreter' }, // Built-in
],
});
// Tools executed automatically, results in output
console.log(response.output_text);
```
**Pros:**
- No round trips (lower latency)
- Simpler code
- Built-in tools (no setup)
**Cons:**
- Less control over execution
- Limited to built-in + MCP tools
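For MCP, the Responses API accepts a remote server as just another tool entry; a sketch using the documented `mcp` tool type (the server label and URL here are hypothetical):
```typescript
const response = await openai.responses.create({
  model: 'gpt-5',
  input: 'List my open tickets',
  tools: [
    {
      type: 'mcp',
      server_label: 'tickets', // hypothetical label
      server_url: 'https://example.com/mcp', // hypothetical server
      require_approval: 'never',
    },
  ],
});
```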
---
## Performance Benchmarks
### TAUBench (GPT-5)
| Scenario | Chat Completions | Responses API | Difference |
|----------|-----------------|---------------|------------|
| Multi-turn reasoning | 82% | 87% | **+5%** |
| Tool usage accuracy | 85% | 88% | **+3%** |
| Context retention | 78% | 85% | **+7%** |
### Cache Utilization
| Metric | Chat Completions | Responses API | Improvement |
|--------|-----------------|---------------|-------------|
| Cache hit rate | 30% | 54-72% | **40-80% better** |
| Latency (cached) | 100ms | 60-80ms | **20-40% faster** |
| Cost (cached) | $0.10/1K | $0.05-0.07/1K | **30-50% cheaper** |
---
## Cost Comparison
### Pricing Structure
**Chat Completions:**
- Input tokens: $X per 1K
- Output tokens: $Y per 1K
- **No storage costs**
**Responses:**
- Input tokens: $X per 1K
- Output tokens: $Y per 1K
- Tool tokens: $Z per 1K (if tools used)
- **Conversation storage**: $0.01 per conversation per month
### Example Cost Calculation
**Scenario:** 100 multi-turn conversations, 10 turns each, 1,000 tokens per turn (500 input + 500 output)
**Chat Completions:**
```
Input: 100 convs × 10 turns × 500 tokens × $X = $A
Output: 100 convs × 10 turns × 500 tokens × $Y = $B
Total: $A + $B
```
**Responses:**
```
Input: 100 convs × 10 turns × 500 tokens × $X = $A
Output: 100 convs × 10 turns × 500 tokens × $Y = $B
Storage: 100 convs × $0.01 = $1
Cache savings: -30% on input (due to better caching)
Total: ($A × 0.7) + $B + $1 (usually cheaper!)
```
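To make the arithmetic concrete, a toy calculator for the scenario above (the per-1K rates and the ~30% cache discount are placeholders, not published prices):
```typescript
// Toy cost model for the scenario above. All rates are placeholders.
function monthlyCost(opts: {
  conversations: number;    // e.g. 100
  turns: number;            // e.g. 10
  inputRatePer1K: number;   // $X
  outputRatePer1K: number;  // $Y
  useResponses: boolean;
}): number {
  const { conversations, turns, inputRatePer1K, outputRatePer1K, useResponses } = opts;
  const inputKTokens = (conversations * turns * 500) / 1000;  // 500 input tokens/turn
  const outputKTokens = (conversations * turns * 500) / 1000; // 500 output tokens/turn
  const cacheFactor = useResponses ? 0.7 : 1.0; // assumed ~30% input savings
  const storage = useResponses ? conversations * 0.01 : 0; // $0.01/conv/month
  return inputKTokens * inputRatePer1K * cacheFactor
       + outputKTokens * outputRatePer1K
       + storage;
}
```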
---
## Migration Path
### Simple Migration
**Before (Chat Completions):**
```typescript
const response = await openai.chat.completions.create({
model: 'gpt-5',
messages: [
{ role: 'system', content: 'You are helpful.' },
{ role: 'user', content: 'Hello!' },
],
});
console.log(response.choices[0].message.content);
```
**After (Responses):**
```typescript
const response = await openai.responses.create({
model: 'gpt-5',
input: [
{ role: 'developer', content: 'You are helpful.' },
{ role: 'user', content: 'Hello!' },
],
});
console.log(response.output_text);
```
**Changes:**
1. `chat.completions.create` → `responses.create`
2. `messages` → `input`
3. `system` → `developer`
4. `choices[0].message.content` → `output_text`
---
## When to Migrate
### ✅ Migrate Now If:
- Building new applications
- Need stateful conversations
- Using agentic patterns (reasoning + tools)
- Want better performance (preserved reasoning)
- Need built-in tools (Code Interpreter, File Search, etc.)
### ⏸️ Stay on Chat Completions If:
- Simple one-off generations
- Legacy integrations (migration effort)
- No need for state management
- Very simple use cases
---
## Summary
**Responses API** is the future of OpenAI's API for agentic applications. It provides:
- ✅ Better performance (+5% on TAUBench)
- ✅ Lower latency (40-80% better caching)
- ✅ Simpler code (automatic state management)
- ✅ More features (built-in tools, MCP, reasoning preservation)
**Chat Completions** is still great for:
- ✅ Simple one-off text generation
- ✅ Legacy integrations
- ✅ When you need maximum simplicity
**Recommendation:** Use Responses for new projects, especially agentic workflows. Chat Completions remains valid for simple use cases.