
Responses API vs Chat Completions: Complete Comparison

Last Updated: 2025-10-25

This document provides a comprehensive comparison between the Responses API and Chat Completions API to help you choose the right one for your use case.


Quick Decision Guide

Use Responses API When:

  • Building agentic applications (reasoning + actions)
  • Need multi-turn conversations with automatic state management
  • Using built-in tools (Code Interpreter, File Search, Web Search, Image Gen)
  • Connecting to MCP servers for external integrations
  • Want preserved reasoning for better multi-turn performance
  • Implementing background processing for long tasks
  • Need polymorphic outputs for debugging/auditing

Use Chat Completions When:

  • Simple one-off text generation
  • Fully stateless interactions (no conversation continuity needed)
  • Legacy integrations with existing Chat Completions code
  • Very simple use cases without tools

Feature Comparison Matrix

| Feature | Chat Completions | Responses API | Winner |
|---|---|---|---|
| State management | Manual (you track history) | Automatic (conversation IDs) | Responses |
| Reasoning preservation | Dropped between turns | Preserved across turns | Responses |
| Tool execution | Client-side round trips | Server-side hosted | Responses |
| Output format | Single message | Polymorphic (messages, reasoning, tool calls) | Responses |
| Cache utilization | Baseline | 40-80% better | Responses |
| MCP support | Manual integration required | Built-in | Responses |
| Performance (GPT-5) | Baseline | +5% on TAUBench | Responses |
| Simplicity | Simpler for one-offs | More features = more complexity | Chat Completions |
| Legacy compatibility | Mature, stable | New (March 2025) | Chat Completions |

API Comparison

Endpoints

Chat Completions:

POST /v1/chat/completions

Responses:

POST /v1/responses

Request Structure

Chat Completions:

{
  model: 'gpt-5',
  messages: [
    { role: 'system', content: 'You are helpful.' },
    { role: 'user', content: 'Hello!' },
  ],
  temperature: 0.7,
  max_tokens: 1000,
}

Responses:

{
  model: 'gpt-5',
  input: [
    { role: 'developer', content: 'You are helpful.' },
    { role: 'user', content: 'Hello!' },
  ],
  conversation: 'conv_abc123', // Optional: automatic state
  temperature: 0.7,
}

Key Differences:

  • messages → input
  • system role → developer role
  • max_tokens not required in Responses
  • conversation parameter for automatic state
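
The request-shape differences above can be sketched as a small converter. Note that toResponsesRequest is a hypothetical helper name for illustration, not part of the OpenAI SDK:

```javascript
// Hypothetical helper: maps a Chat Completions request body onto the
// Responses request shape described above.
function toResponsesRequest(chatRequest) {
  // Drop max_tokens (not required by Responses) and rename messages → input.
  const { messages, max_tokens, ...rest } = chatRequest;
  return {
    ...rest,
    // The system role becomes the developer role.
    input: messages.map((msg) =>
      msg.role === 'system' ? { ...msg, role: 'developer' } : msg
    ),
  };
}

const converted = toResponsesRequest({
  model: 'gpt-5',
  messages: [
    { role: 'system', content: 'You are helpful.' },
    { role: 'user', content: 'Hello!' },
  ],
  max_tokens: 1000,
});

console.log(converted.input[0].role); // 'developer'
```

The conversation parameter has no Chat Completions equivalent, so a converter like this only covers the overlapping fields.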

Response Structure

Chat Completions:

{
  id: 'chatcmpl-123',
  object: 'chat.completion',
  created: 1677652288,
  model: 'gpt-5',
  choices: [
    {
      index: 0,
      message: {
        role: 'assistant',
        content: 'Hello! How can I help?',
      },
      finish_reason: 'stop',
    },
  ],
  usage: {
    prompt_tokens: 10,
    completion_tokens: 5,
    total_tokens: 15,
  },
}

Responses:

{
  id: 'resp_123',
  object: 'response',
  created: 1677652288,
  model: 'gpt-5',
  output: [
    {
      type: 'reasoning',
      summary: [{ type: 'summary_text', text: 'User greeting, respond friendly' }],
    },
    {
      type: 'message',
      role: 'assistant',
      content: [{ type: 'output_text', text: 'Hello! How can I help?' }],
    },
  ],
  output_text: 'Hello! How can I help?', // Helper field
  usage: {
    prompt_tokens: 10,
    completion_tokens: 5,
    tool_tokens: 0,
    total_tokens: 15,
  },
  conversation_id: 'conv_abc123', // If using conversation
}

Key Differences:

  • Single message → Polymorphic output array
  • choices[0].message.content → output_text helper
  • Additional output types: reasoning, tool_calls, etc.
  • conversation_id included if using conversations
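
To make the polymorphic shape concrete, here is a hand-rolled equivalent of what the output_text helper gives you (illustrative only; in practice you read the field directly rather than walking the array):

```javascript
// Illustrative only: collect assistant text from a polymorphic `output`
// array, skipping reasoning and tool-call items.
function extractOutputText(output) {
  return output
    .filter((item) => item.type === 'message')
    .flatMap((item) => item.content)
    .filter((part) => part.type === 'output_text')
    .map((part) => part.text)
    .join('');
}

// Sample output array matching the Responses example above.
const output = [
  {
    type: 'reasoning',
    summary: [{ type: 'summary_text', text: 'User greeting, respond friendly' }],
  },
  {
    type: 'message',
    role: 'assistant',
    content: [{ type: 'output_text', text: 'Hello! How can I help?' }],
  },
];

console.log(extractOutputText(output)); // 'Hello! How can I help?'
```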

State Management Comparison

Chat Completions (Manual)

// You track history manually
let messages = [
  { role: 'system', content: 'You are helpful.' },
  { role: 'user', content: 'What is AI?' },
];

const response1 = await openai.chat.completions.create({
  model: 'gpt-5',
  messages,
});

// Add response to history
messages.push({
  role: 'assistant',
  content: response1.choices[0].message.content,
});

// Next turn
messages.push({ role: 'user', content: 'Tell me more' });

const response2 = await openai.chat.completions.create({
  model: 'gpt-5',
  messages, // ✅ You must pass full history
});

Pros:

  • Full control over history
  • Can prune old messages
  • Simple for one-off requests

Cons:

  • Manual tracking error-prone
  • Must handle history yourself
  • No automatic caching benefits

Responses (Automatic)

// Create conversation once
const conv = await openai.conversations.create();

const response1 = await openai.responses.create({
  model: 'gpt-5',
  conversation: conv.id, // ✅ Automatic state
  input: 'What is AI?',
});

// Next turn - no manual history tracking
const response2 = await openai.responses.create({
  model: 'gpt-5',
  conversation: conv.id, // ✅ Remembers previous turn
  input: 'Tell me more',
});

Pros:

  • Automatic state management
  • No manual history tracking
  • Better cache utilization (40-80%)
  • Reasoning preserved

Cons:

  • Less direct control
  • Must create conversation first
  • Conversations expire after 90 days

Reasoning Preservation

Chat Completions

What Happens:

  1. Model generates internal reasoning (scratchpad)
  2. Reasoning used to produce response
  3. Reasoning discarded before returning
  4. Next turn starts fresh (no reasoning memory)

Visual:

Turn 1: [Reasoning] → Response → ❌ Reasoning deleted
Turn 2: [New Reasoning] → Response → ❌ Reasoning deleted
Turn 3: [New Reasoning] → Response → ❌ Reasoning deleted

Impact:

  • Model "forgets" its thought process
  • May repeat reasoning steps
  • Lower performance on complex multi-turn tasks

Responses API

What Happens:

  1. Model generates internal reasoning
  2. Reasoning used to produce response
  3. Reasoning preserved in conversation state
  4. Next turn builds on previous reasoning

Visual:

Turn 1: [Reasoning A] → Response → ✅ Reasoning A saved
Turn 2: [Reasoning A + B] → Response → ✅ Reasoning A+B saved
Turn 3: [Reasoning A + B + C] → Response → ✅ All reasoning saved

Impact:

  • Model remembers thought process
  • No redundant reasoning
  • +5% better on TAUBench (GPT-5)
  • Better multi-turn problem solving

Tools Comparison

Chat Completions (Client-Side)

// 1. Define the function tool (keep history in `messages` for the follow-up)
const messages = [{ role: 'user', content: 'What is the weather?' }];

const response1 = await openai.chat.completions.create({
  model: 'gpt-5',
  messages,
  tools: [
    {
      type: 'function',
      function: {
        name: 'get_weather',
        description: 'Get weather',
        parameters: {
          type: 'object',
          properties: {
            location: { type: 'string' },
          },
        },
      },
    },
  ],
});

// 2. Check if tool called
const toolCall = response1.choices[0].message.tool_calls?.[0];

// 3. Execute tool on your server (arguments arrive as a JSON string)
const weatherData = await getWeather(JSON.parse(toolCall.function.arguments));

// 4. Send result back
const response2 = await openai.chat.completions.create({
  model: 'gpt-5',
  messages: [
    ...messages,
    response1.choices[0].message,
    {
      role: 'tool',
      tool_call_id: toolCall.id,
      content: JSON.stringify(weatherData),
    },
  ],
});

Pros:

  • Full control over tool execution
  • Can use any custom tools

Cons:

  • Manual round trips (latency)
  • More complex code
  • You handle tool execution

Responses (Server-Side Built-in)

// All in one request - tools executed server-side
const response = await openai.responses.create({
  model: 'gpt-5',
  input: 'What is the weather and analyze the temperature trend?',
  tools: [
    { type: 'web_search' },       // Built-in
    { type: 'code_interpreter' }, // Built-in
  ],
});

// Tools executed automatically, results in output
console.log(response.output_text);

Pros:

  • No round trips (lower latency)
  • Simpler code
  • Built-in tools (no setup)

Cons:

  • Less control over execution
  • Limited to built-in + MCP tools

Performance Benchmarks

TAUBench (GPT-5)

| Scenario | Chat Completions | Responses API | Difference |
|---|---|---|---|
| Multi-turn reasoning | 82% | 87% | +5% |
| Tool usage accuracy | 85% | 88% | +3% |
| Context retention | 78% | 85% | +7% |

Cache Utilization

| Metric | Chat Completions | Responses API | Improvement |
|---|---|---|---|
| Cache hit rate | 30% | 54-72% | 40-80% better |
| Latency (cached) | 100ms | 60-80ms | 20-40% faster |
| Cost (cached) | $0.10/1K | $0.05-0.07/1K | 30-50% cheaper |

Cost Comparison

Pricing Structure

Chat Completions:

  • Input tokens: $X per 1K
  • Output tokens: $Y per 1K
  • No storage costs

Responses:

  • Input tokens: $X per 1K
  • Output tokens: $Y per 1K
  • Tool tokens: $Z per 1K (if tools used)
  • Conversation storage: $0.01 per conversation per month

Example Cost Calculation

Scenario: 100 multi-turn conversations, 10 turns each, 1000 tokens per turn

Chat Completions:

Input: 100 convs × 10 turns × 500 tokens × $X = $A
Output: 100 convs × 10 turns × 500 tokens × $Y = $B
Total: $A + $B

Responses:

Input: 100 convs × 10 turns × 500 tokens × $X = $A
Output: 100 convs × 10 turns × 500 tokens × $Y = $B
Storage: 100 convs × $0.01 = $1
Cache savings: -30% on input (due to better caching)
Total: ($A × 0.7) + $B + $1 (usually cheaper!)
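
The arithmetic above can be checked with a short script. The per-1K prices below are placeholders standing in for $X and $Y, and the 30% input saving is the caching assumption used in the example, not a guaranteed figure:

```javascript
// Illustrative cost model; prices are placeholders, not real OpenAI pricing.
function monthlyCost({ convs, turns, tokensPerTurn,
                       inputPricePer1K, outputPricePer1K, responses }) {
  // Each 1000-token turn is split 500 input / 500 output, as in the example.
  const inputTokens = convs * turns * (tokensPerTurn / 2);
  const outputTokens = convs * turns * (tokensPerTurn / 2);
  let inputCost = (inputTokens / 1000) * inputPricePer1K;
  const outputCost = (outputTokens / 1000) * outputPricePer1K;
  let storageCost = 0;
  if (responses) {
    inputCost *= 0.7;            // assumed ~30% input saving from caching
    storageCost = convs * 0.01;  // $0.01 per conversation per month
  }
  return inputCost + outputCost + storageCost;
}

const scenario = {
  convs: 100, turns: 10, tokensPerTurn: 1000,
  inputPricePer1K: 0.1, outputPricePer1K: 0.1,
};

console.log(monthlyCost({ ...scenario, responses: false })); // 100
console.log(monthlyCost({ ...scenario, responses: true }));  // 86
```

With these placeholder prices the Responses total comes out cheaper despite the storage fee, matching the "usually cheaper" note above.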

Migration Path

Simple Migration

Before (Chat Completions):

const response = await openai.chat.completions.create({
  model: 'gpt-5',
  messages: [
    { role: 'system', content: 'You are helpful.' },
    { role: 'user', content: 'Hello!' },
  ],
});

console.log(response.choices[0].message.content);

After (Responses):

const response = await openai.responses.create({
  model: 'gpt-5',
  input: [
    { role: 'developer', content: 'You are helpful.' },
    { role: 'user', content: 'Hello!' },
  ],
});

console.log(response.output_text);

Changes:

  1. chat.completions.create → responses.create
  2. messages → input
  3. system → developer
  4. choices[0].message.content → output_text

When to Migrate

✅ Migrate Now If:

  • Building new applications
  • Need stateful conversations
  • Using agentic patterns (reasoning + tools)
  • Want better performance (preserved reasoning)
  • Need built-in tools (Code Interpreter, File Search, etc.)

⏸️ Stay on Chat Completions If:

  • Simple one-off generations
  • Legacy integrations (migration effort)
  • No need for state management
  • Very simple use cases
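
The two checklists can be collapsed into a tiny heuristic. This is illustrative only; recommendApi and its flag names are made up for this sketch:

```javascript
// Illustrative heuristic mirroring the checklists above: any Responses-only
// need tips the decision; otherwise stay on Chat Completions.
function recommendApi(needs) {
  const responsesSignals = [
    'statefulConversations', // automatic state management
    'agenticPatterns',       // reasoning + tools
    'builtInTools',          // Code Interpreter, File Search, etc.
  ];
  return responsesSignals.some((flag) => needs[flag])
    ? 'responses'
    : 'chat.completions';
}

console.log(recommendApi({ agenticPatterns: true })); // 'responses'
console.log(recommendApi({}));                        // 'chat.completions'
```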

Summary

Responses API is the future of OpenAI's API for agentic applications. It provides:

  • Better performance (+5% on TAUBench)
  • Lower latency (40-80% better caching)
  • Simpler code (automatic state management)
  • More features (built-in tools, MCP, reasoning preservation)

Chat Completions is still great for:

  • Simple one-off text generation
  • Legacy integrations
  • When you need maximum simplicity

Recommendation: Use Responses for new projects, especially agentic workflows. Chat Completions remains valid for simple use cases.