# Responses API vs Chat Completions: Complete Comparison

**Last Updated**: 2025-10-25

This document provides a comprehensive comparison between the Responses API and the Chat Completions API to help you choose the right one for your use case.

---

## Quick Decision Guide

### ✅ Use Responses API When:

- Building **agentic applications** (reasoning + actions)
- Need **multi-turn conversations** with automatic state management
- Using **built-in tools** (Code Interpreter, File Search, Web Search, Image Gen)
- Connecting to **MCP servers** for external integrations
- Want **preserved reasoning** for better multi-turn performance
- Implementing **background processing** for long tasks
- Need **polymorphic outputs** for debugging/auditing

### ✅ Use Chat Completions When:

- Simple **one-off text generation**
- Fully **stateless** interactions (no conversation continuity needed)
- **Legacy integrations** with existing Chat Completions code
- Very **simple use cases** without tools

---

## Feature Comparison Matrix

| Feature | Chat Completions | Responses API | Winner |
|---------|-----------------|---------------|---------|
| **State Management** | Manual (you track history) | Automatic (conversation IDs) | Responses ✅ |
| **Reasoning Preservation** | Dropped between turns | Preserved across turns | Responses ✅ |
| **Tool Execution** | Client-side round trips | Server-side hosted | Responses ✅ |
| **Output Format** | Single message | Polymorphic (messages, reasoning, tool calls) | Responses ✅ |
| **Cache Utilization** | Baseline | 40-80% better | Responses ✅ |
| **MCP Support** | Manual integration required | Built-in | Responses ✅ |
| **Performance (GPT-5)** | Baseline | +5% on TAUBench | Responses ✅ |
| **Simplicity** | Simpler for one-offs | More features = more complexity | Chat Completions ✅ |
| **Legacy Compatibility** | Mature, stable | New (March 2025) | Chat Completions ✅ |

---

## API Comparison

### Endpoints

**Chat Completions:**

```
POST /v1/chat/completions
```

**Responses:**

```
POST /v1/responses
```

---

### Request Structure

**Chat Completions:**

```typescript
{
  model: 'gpt-5',
  messages: [
    { role: 'system', content: 'You are helpful.' },
    { role: 'user', content: 'Hello!' },
  ],
  temperature: 0.7,
  max_tokens: 1000,
}
```

**Responses:**

```typescript
{
  model: 'gpt-5',
  input: [
    { role: 'developer', content: 'You are helpful.' },
    { role: 'user', content: 'Hello!' },
  ],
  conversation: 'conv_abc123', // Optional: automatic state
  temperature: 0.7,
}
```

**Key Differences:**

- `messages` → `input`
- `system` role → `developer` role
- `max_tokens` not required in Responses
- `conversation` parameter for automatic state
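To see the two request shapes side by side outside the SDK, here is a minimal sketch that sends the same greeting to each endpoint over raw HTTP. It assumes Node 18+ (built-in `fetch`, run as an ES module) and an `OPENAI_API_KEY` environment variable; the request bodies simply mirror the examples above.

```typescript
// Minimal raw-HTTP sketch (not the official SDK).
// Assumes Node 18+ fetch and an OPENAI_API_KEY environment variable.
const headers = {
  'Content-Type': 'application/json',
  Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
};

// Chat Completions: `messages` with a `system` role
const chatRes = await fetch('https://api.openai.com/v1/chat/completions', {
  method: 'POST',
  headers,
  body: JSON.stringify({
    model: 'gpt-5',
    messages: [
      { role: 'system', content: 'You are helpful.' },
      { role: 'user', content: 'Hello!' },
    ],
  }),
});

// Responses: `input` with a `developer` role (a `conversation` id is optional)
const respRes = await fetch('https://api.openai.com/v1/responses', {
  method: 'POST',
  headers,
  body: JSON.stringify({
    model: 'gpt-5',
    input: [
      { role: 'developer', content: 'You are helpful.' },
      { role: 'user', content: 'Hello!' },
    ],
  }),
});

console.log((await chatRes.json()).choices?.[0]?.message?.content);
console.log((await respRes.json()).output_text);
```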
---

### Response Structure

**Chat Completions:**

```typescript
{
  id: 'chatcmpl-123',
  object: 'chat.completion',
  created: 1677652288,
  model: 'gpt-5',
  choices: [
    {
      index: 0,
      message: {
        role: 'assistant',
        content: 'Hello! How can I help?',
      },
      finish_reason: 'stop',
    },
  ],
  usage: {
    prompt_tokens: 10,
    completion_tokens: 5,
    total_tokens: 15,
  },
}
```

**Responses:**

```typescript
{
  id: 'resp_123',
  object: 'response',
  created: 1677652288,
  model: 'gpt-5',
  output: [
    {
      type: 'reasoning',
      summary: [{ type: 'summary_text', text: 'User greeting, respond friendly' }],
    },
    {
      type: 'message',
      role: 'assistant',
      content: [{ type: 'output_text', text: 'Hello! How can I help?' }],
    },
  ],
  output_text: 'Hello! How can I help?', // Helper field
  usage: {
    prompt_tokens: 10,
    completion_tokens: 5,
    tool_tokens: 0,
    total_tokens: 15,
  },
  conversation_id: 'conv_abc123', // If using conversation
}
```

**Key Differences:**

- Single `message` → Polymorphic `output` array
- `choices[0].message.content` → `output_text` helper
- Additional output types: `reasoning`, `tool_calls`, etc.
- `conversation_id` included if using conversations

---

## State Management Comparison

### Chat Completions (Manual)

```typescript
// You track history manually
let messages = [
  { role: 'system', content: 'You are helpful.' },
  { role: 'user', content: 'What is AI?' },
];

const response1 = await openai.chat.completions.create({
  model: 'gpt-5',
  messages,
});

// Add response to history
messages.push({
  role: 'assistant',
  content: response1.choices[0].message.content,
});

// Next turn
messages.push({ role: 'user', content: 'Tell me more' });

const response2 = await openai.chat.completions.create({
  model: 'gpt-5',
  messages, // You must pass the full history every time
});
```

**Pros:**

- Full control over history
- Can prune old messages
- Simple for one-off requests

**Cons:**

- Manual tracking is error-prone
- Must handle history yourself
- No automatic caching benefits

### Responses (Automatic)

```typescript
// Create conversation once
const conv = await openai.conversations.create();

const response1 = await openai.responses.create({
  model: 'gpt-5',
  conversation: conv.id, // ✅ Automatic state
  input: 'What is AI?',
});

// Next turn - no manual history tracking
const response2 = await openai.responses.create({
  model: 'gpt-5',
  conversation: conv.id, // ✅ Remembers previous turn
  input: 'Tell me more',
});
```

**Pros:**

- Automatic state management
- No manual history tracking
- Better cache utilization (40-80%)
- Reasoning preserved

**Cons:**

- Less direct control
- Must create conversation first
- Conversations expire after 90 days

---

## Reasoning Preservation

### Chat Completions

**What Happens:**

1. Model generates internal reasoning (scratchpad)
2. Reasoning used to produce response
3. **Reasoning discarded** before returning
4. Next turn starts fresh (no reasoning memory)

**Visual:**

```
Turn 1: [Reasoning] → Response → ❌ Reasoning deleted
Turn 2: [New Reasoning] → Response → ❌ Reasoning deleted
Turn 3: [New Reasoning] → Response → ❌ Reasoning deleted
```

**Impact:**

- Model "forgets" its thought process
- May repeat reasoning steps
- Lower performance on complex multi-turn tasks

### Responses API

**What Happens:**

1. Model generates internal reasoning
2. Reasoning used to produce response
3. **Reasoning preserved** in conversation state
4. Next turn builds on previous reasoning

**Visual:**

```
Turn 1: [Reasoning A] → Response → ✅ Reasoning A saved
Turn 2: [Reasoning A + B] → Response → ✅ Reasoning A+B saved
Turn 3: [Reasoning A + B + C] → Response → ✅ All reasoning saved
```

**Impact:**

- Model remembers thought process
- No redundant reasoning
- **+5% better on TAUBench (GPT-5)**
- Better multi-turn problem solving
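To see what preserved reasoning looks like in practice, the sketch below attaches a conversation and pulls the reasoning summaries out of the polymorphic `output` array. It assumes the item shapes shown in the example response above (`type: 'reasoning'` with `summary_text` entries); the prompt is only an illustration.

```typescript
import OpenAI from 'openai';

const openai = new OpenAI();

// Reuse one conversation so reasoning carries across turns
const conv = await openai.conversations.create();

const response = await openai.responses.create({
  model: 'gpt-5',
  conversation: conv.id,
  input: 'Plan a three-step approach to debugging a flaky test.',
});

// Collect the human-readable reasoning summaries from this turn's output items
const summaries: string[] = [];
for (const item of response.output) {
  if (item.type === 'reasoning') {
    for (const s of item.summary ?? []) {
      if (s.type === 'summary_text') summaries.push(s.text);
    }
  }
}

console.log(summaries);            // e.g. ['User wants a debugging plan...']
console.log(response.output_text); // the final assistant message
```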
---

## Tools Comparison

### Chat Completions (Client-Side)

```typescript
// 1. Define the function tool
const messages = [{ role: 'user', content: 'What is the weather?' }];

const response1 = await openai.chat.completions.create({
  model: 'gpt-5',
  messages,
  tools: [
    {
      type: 'function',
      function: {
        name: 'get_weather',
        description: 'Get weather',
        parameters: {
          type: 'object',
          properties: {
            location: { type: 'string' },
          },
        },
      },
    },
  ],
});

// 2. Check whether a tool was called
const toolCall = response1.choices[0].message.tool_calls?.[0];

// 3. Execute the tool on your server (getWeather is your own implementation;
//    note that `arguments` arrives as a JSON string)
const weatherData = await getWeather(JSON.parse(toolCall.function.arguments));

// 4. Send the result back
const response2 = await openai.chat.completions.create({
  model: 'gpt-5',
  messages: [
    ...messages,
    response1.choices[0].message,
    {
      role: 'tool',
      tool_call_id: toolCall.id,
      content: JSON.stringify(weatherData),
    },
  ],
});
```

**Pros:**

- Full control over tool execution
- Can use any custom tools

**Cons:**

- Manual round trips (latency)
- More complex code
- You handle tool execution

### Responses (Server-Side Built-in)

```typescript
// All in one request - tools executed server-side
const response = await openai.responses.create({
  model: 'gpt-5',
  input: 'What is the weather and analyze the temperature trend?',
  tools: [
    { type: 'web_search' },       // Built-in
    { type: 'code_interpreter' }, // Built-in
  ],
});

// Tools executed automatically, results in output
console.log(response.output_text);
```

**Pros:**

- No round trips (lower latency)
- Simpler code
- Built-in tools (no setup)

**Cons:**

- Less control over execution
- Limited to built-in + MCP tools

---

## Performance Benchmarks

### TAUBench (GPT-5)

| Scenario | Chat Completions | Responses API | Difference |
|----------|-----------------|---------------|------------|
| Multi-turn reasoning | 82% | 87% | **+5%** |
| Tool usage accuracy | 85% | 88% | **+3%** |
| Context retention | 78% | 85% | **+7%** |

### Cache Utilization

| Metric | Chat Completions | Responses API | Improvement |
|--------|-----------------|---------------|-------------|
| Cache hit rate | 30% | 54-72% | **40-80% better** |
| Latency (cached) | 100ms | 60-80ms | **20-40% faster** |
| Cost (cached) | $0.10/1K | $0.05-0.07/1K | **30-50% cheaper** |

---

## Cost Comparison

### Pricing Structure

**Chat Completions:**

- Input tokens: $X per 1K
- Output tokens: $Y per 1K
- **No storage costs**

**Responses:**

- Input tokens: $X per 1K
- Output tokens: $Y per 1K
- Tool tokens: $Z per 1K (if tools used)
- **Conversation storage**: $0.01 per conversation per month

### Example Cost Calculation

**Scenario:** 100 multi-turn conversations, 10 turns each, 1,000 tokens per turn (500 input + 500 output)

**Chat Completions:**

```
Input:  100 convs × 10 turns × 500 tokens × $X = $A
Output: 100 convs × 10 turns × 500 tokens × $Y = $B
Total:  $A + $B
```

**Responses:**

```
Input:   100 convs × 10 turns × 500 tokens × $X = $A
Output:  100 convs × 10 turns × 500 tokens × $Y = $B
Storage: 100 convs × $0.01 = $1
Cache savings: -30% on input (due to better caching)
Total:   ($A × 0.7) + $B + $1 (usually cheaper!)
```
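The same arithmetic as a runnable sketch. The per-1K rates stand in for $X and $Y and are placeholders, not published pricing; the 30% cache saving and the $0.01 monthly storage fee are taken from the example above.

```typescript
// Illustrative cost comparison; all rates are hypothetical placeholders.
const conversations = 100;
const turnsPerConversation = 10;
const inputTokensPerTurn = 500;
const outputTokensPerTurn = 500;

const inputRatePer1K = 0.01;  // hypothetical $X
const outputRatePer1K = 0.03; // hypothetical $Y

const inputTokens = conversations * turnsPerConversation * inputTokensPerTurn;
const outputTokens = conversations * turnsPerConversation * outputTokensPerTurn;

const chatCompletionsCost =
  (inputTokens / 1000) * inputRatePer1K + (outputTokens / 1000) * outputRatePer1K;

const responsesCost =
  (inputTokens / 1000) * inputRatePer1K * 0.7 + // ~30% input saved via caching
  (outputTokens / 1000) * outputRatePer1K +
  conversations * 0.01;                         // conversation storage

console.log({ chatCompletionsCost, responsesCost });
```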
---

## Migration Path

### Simple Migration

**Before (Chat Completions):**

```typescript
const response = await openai.chat.completions.create({
  model: 'gpt-5',
  messages: [
    { role: 'system', content: 'You are helpful.' },
    { role: 'user', content: 'Hello!' },
  ],
});

console.log(response.choices[0].message.content);
```

**After (Responses):**

```typescript
const response = await openai.responses.create({
  model: 'gpt-5',
  input: [
    { role: 'developer', content: 'You are helpful.' },
    { role: 'user', content: 'Hello!' },
  ],
});

console.log(response.output_text);
```

**Changes:**

1. `chat.completions.create` → `responses.create`
2. `messages` → `input`
3. `system` → `developer`
4. `choices[0].message.content` → `output_text`

---

## When to Migrate

### ✅ Migrate Now If:

- Building new applications
- Need stateful conversations
- Using agentic patterns (reasoning + tools)
- Want better performance (preserved reasoning)
- Need built-in tools (Code Interpreter, File Search, etc.)

### ⏸️ Stay on Chat Completions If:

- Simple one-off generations
- Legacy integrations (migration effort)
- No need for state management
- Very simple use cases

---

## Summary

**Responses API** is the future of OpenAI's API for agentic applications. It provides:

- ✅ Better performance (+5% on TAUBench)
- ✅ Lower latency (40-80% better caching)
- ✅ Simpler code (automatic state management)
- ✅ More features (built-in tools, MCP, reasoning preservation)

**Chat Completions** is still great for:

- ✅ Simple one-off text generation
- ✅ Legacy integrations
- ✅ When you need maximum simplicity

**Recommendation:** Use Responses for new projects, especially agentic workflows. Chat Completions remains valid for simple use cases.