Responses API vs Chat Completions: Complete Comparison
Last Updated: 2025-10-25
This document compares the Responses API and the Chat Completions API to help you choose the right one for your use case.
Quick Decision Guide
✅ Use Responses API When:
- Building agentic applications (reasoning + actions)
- Need multi-turn conversations with automatic state management
- Using built-in tools (Code Interpreter, File Search, Web Search, Image Gen)
- Connecting to MCP servers for external integrations
- Want preserved reasoning for better multi-turn performance
- Implementing background processing for long tasks
- Need polymorphic outputs for debugging/auditing
✅ Use Chat Completions When:
- Simple one-off text generation
- Fully stateless interactions (no conversation continuity needed)
- Legacy integrations with existing Chat Completions code
- Very simple use cases without tools
Feature Comparison Matrix
| Feature | Chat Completions | Responses API | Winner |
|---|---|---|---|
| State Management | Manual (you track history) | Automatic (conversation IDs) | Responses ✅ |
| Reasoning Preservation | Dropped between turns | Preserved across turns | Responses ✅ |
| Tools Execution | Client-side round trips | Server-side hosted | Responses ✅ |
| Output Format | Single message | Polymorphic (messages, reasoning, tool calls) | Responses ✅ |
| Cache Utilization | Baseline | 40-80% better | Responses ✅ |
| MCP Support | Manual integration required | Built-in | Responses ✅ |
| Performance (GPT-5) | Baseline | +5% on TAUBench | Responses ✅ |
| Simplicity | Simpler for one-offs | More features = more complexity | Chat Completions ✅ |
| Legacy Compatibility | Mature, stable | New (March 2025) | Chat Completions ✅ |
API Comparison
Endpoints
Chat Completions:

```
POST /v1/chat/completions
```

Responses:

```
POST /v1/responses
```
Request Structure
Chat Completions:

```javascript
{
  model: 'gpt-5',
  messages: [
    { role: 'system', content: 'You are helpful.' },
    { role: 'user', content: 'Hello!' },
  ],
  temperature: 0.7,
  max_tokens: 1000,
}
```
Responses:

```javascript
{
  model: 'gpt-5',
  input: [
    { role: 'developer', content: 'You are helpful.' },
    { role: 'user', content: 'Hello!' },
  ],
  conversation: 'conv_abc123', // Optional: automatic state
  temperature: 0.7,
}
```
Key Differences:
- `messages` → `input`
- `system` role → `developer` role
- `max_tokens` is not required in Responses
- `conversation` parameter enables automatic state
Response Structure
Chat Completions:

```javascript
{
  id: 'chatcmpl-123',
  object: 'chat.completion',
  created: 1677652288,
  model: 'gpt-5',
  choices: [
    {
      index: 0,
      message: {
        role: 'assistant',
        content: 'Hello! How can I help?',
      },
      finish_reason: 'stop',
    },
  ],
  usage: {
    prompt_tokens: 10,
    completion_tokens: 5,
    total_tokens: 15,
  },
}
```
Responses:

```javascript
{
  id: 'resp_123',
  object: 'response',
  created: 1677652288,
  model: 'gpt-5',
  output: [
    {
      type: 'reasoning',
      summary: [{ type: 'summary_text', text: 'User greeting, respond friendly' }],
    },
    {
      type: 'message',
      role: 'assistant',
      content: [{ type: 'output_text', text: 'Hello! How can I help?' }],
    },
  ],
  output_text: 'Hello! How can I help?', // Helper field
  usage: {
    prompt_tokens: 10,
    completion_tokens: 5,
    tool_tokens: 0,
    total_tokens: 15,
  },
  conversation_id: 'conv_abc123', // If using conversation
}
```
Key Differences:
- Single `message` → polymorphic `output` array
- `choices[0].message.content` → `output_text` helper
- Additional output types: `reasoning`, `tool_calls`, etc.
- `conversation_id` included if using conversations
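If you work with the raw `output` array rather than the helper field, the aggregation is straightforward to do yourself. A minimal sketch, assuming the item shapes shown above (`message` items whose `content` holds `output_text` parts):

```javascript
// Collect the assistant text from a polymorphic `output` array,
// mirroring what the `output_text` helper field contains.
function extractOutputText(output) {
  return output
    .filter((item) => item.type === 'message')
    .flatMap((item) => item.content)
    .filter((part) => part.type === 'output_text')
    .map((part) => part.text)
    .join('');
}

const output = [
  { type: 'reasoning', summary: [{ type: 'summary_text', text: 'User greeting, respond friendly' }] },
  { type: 'message', role: 'assistant', content: [{ type: 'output_text', text: 'Hello! How can I help?' }] },
];
console.log(extractOutputText(output)); // 'Hello! How can I help?'
```

Because non-message items are filtered out, the result stays correct even when the model emits reasoning or tool-call entries before the final message.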
State Management Comparison
Chat Completions (Manual)
```javascript
// You track history manually
let messages = [
  { role: 'system', content: 'You are helpful.' },
  { role: 'user', content: 'What is AI?' },
];

const response1 = await openai.chat.completions.create({
  model: 'gpt-5',
  messages,
});

// Add response to history
messages.push({
  role: 'assistant',
  content: response1.choices[0].message.content,
});

// Next turn
messages.push({ role: 'user', content: 'Tell me more' });

const response2 = await openai.chat.completions.create({
  model: 'gpt-5',
  messages, // ✅ You must pass full history
});
```
Pros:
- Full control over history
- Can prune old messages
- Simple for one-off requests
Cons:
- Manual tracking error-prone
- Must handle history yourself
- No automatic caching benefits
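The pruning mentioned above is one concrete benefit of owning the history. A minimal sketch; the window size and the choice to always keep the system message are illustrative decisions, not part of the API:

```javascript
// Keep the system message plus only the most recent messages,
// so long conversations stay within the model's context window.
function pruneHistory(messages, maxRecent = 6) {
  const system = messages.filter((m) => m.role === 'system');
  const rest = messages.filter((m) => m.role !== 'system');
  return [...system, ...rest.slice(-maxRecent)];
}

// Before each request: messages = pruneHistory(messages);
```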
Responses (Automatic)
```javascript
// Create conversation once
const conv = await openai.conversations.create();

const response1 = await openai.responses.create({
  model: 'gpt-5',
  conversation: conv.id, // ✅ Automatic state
  input: 'What is AI?',
});

// Next turn - no manual history tracking
const response2 = await openai.responses.create({
  model: 'gpt-5',
  conversation: conv.id, // ✅ Remembers previous turn
  input: 'Tell me more',
});
```
Pros:
- Automatic state management
- No manual history tracking
- Better cache utilization (40-80%)
- Reasoning preserved
Cons:
- Less direct control
- Must create conversation first
- Conversations expire after 90 days
Reasoning Preservation
Chat Completions
What Happens:
- Model generates internal reasoning (scratchpad)
- Reasoning used to produce response
- Reasoning discarded before returning
- Next turn starts fresh (no reasoning memory)
Visual:
```
Turn 1: [Reasoning] → Response → ❌ Reasoning deleted
Turn 2: [New Reasoning] → Response → ❌ Reasoning deleted
Turn 3: [New Reasoning] → Response → ❌ Reasoning deleted
```
Impact:
- Model "forgets" its thought process
- May repeat reasoning steps
- Lower performance on complex multi-turn tasks
Responses API
What Happens:
- Model generates internal reasoning
- Reasoning used to produce response
- Reasoning preserved in conversation state
- Next turn builds on previous reasoning
Visual:
```
Turn 1: [Reasoning A] → Response → ✅ Reasoning A saved
Turn 2: [Reasoning A + B] → Response → ✅ Reasoning A+B saved
Turn 3: [Reasoning A + B + C] → Response → ✅ All reasoning saved
```
Impact:
- Model remembers thought process
- No redundant reasoning
- +5% better on TAUBench (GPT-5)
- Better multi-turn problem solving
Tools Comparison
Chat Completions (Client-Side)
```javascript
// 1. Define the function tool; you also track message history yourself
const messages = [{ role: 'user', content: 'What is the weather?' }];

const response1 = await openai.chat.completions.create({
  model: 'gpt-5',
  messages,
  tools: [
    {
      type: 'function',
      function: {
        name: 'get_weather',
        description: 'Get weather',
        parameters: {
          type: 'object',
          properties: {
            location: { type: 'string' },
          },
        },
      },
    },
  ],
});

// 2. Check if a tool was called
const toolCall = response1.choices[0].message.tool_calls?.[0];

// 3. Execute the tool on your server (arguments arrive as a JSON string)
const weatherData = await getWeather(JSON.parse(toolCall.function.arguments));

// 4. Send the result back
const response2 = await openai.chat.completions.create({
  model: 'gpt-5',
  messages: [
    ...messages,
    response1.choices[0].message,
    {
      role: 'tool',
      tool_call_id: toolCall.id,
      content: JSON.stringify(weatherData),
    },
  ],
});
```
Pros:
- Full control over tool execution
- Can use any custom tools
Cons:
- Manual round trips (latency)
- More complex code
- You handle tool execution
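Step 3 above usually means parsing the JSON-encoded arguments and routing by function name. A minimal dispatch sketch; `get_weather` here is a hypothetical stub standing in for your real implementation:

```javascript
// Local tool implementations, keyed by the function name the model calls.
const toolImplementations = {
  get_weather: ({ location }) => ({ location, tempC: 21 }), // hypothetical stub
};

// Parse the JSON-string arguments and route the call to the matching tool.
function executeToolCall(toolCall) {
  const impl = toolImplementations[toolCall.function.name];
  if (!impl) throw new Error(`Unknown tool: ${toolCall.function.name}`);
  return impl(JSON.parse(toolCall.function.arguments));
}
```

In a real server the implementations would typically be async; the synchronous stub keeps the sketch self-contained.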
Responses (Server-Side Built-in)
```javascript
// All in one request - tools executed server-side
const response = await openai.responses.create({
  model: 'gpt-5',
  input: 'What is the weather and analyze the temperature trend?',
  tools: [
    { type: 'web_search' }, // Built-in
    { type: 'code_interpreter' }, // Built-in
  ],
});

// Tools executed automatically, results in output
console.log(response.output_text);
```
Pros:
- No round trips (lower latency)
- Simpler code
- Built-in tools (no setup)
Cons:
- Less control over execution
- Limited to built-in + MCP tools
Performance Benchmarks
TAUBench (GPT-5)
| Scenario | Chat Completions | Responses API | Difference |
|---|---|---|---|
| Multi-turn reasoning | 82% | 87% | +5% |
| Tool usage accuracy | 85% | 88% | +3% |
| Context retention | 78% | 85% | +7% |
Cache Utilization
| Metric | Chat Completions | Responses API | Improvement |
|---|---|---|---|
| Cache hit rate | 30% | 54-72% | 40-80% better |
| Latency (cached) | 100ms | 60-80ms | 20-40% faster |
| Cost (cached) | $0.10/1K | $0.05-0.07/1K | 30-50% cheaper |
Cost Comparison
Pricing Structure
Chat Completions:
- Input tokens: $X per 1K
- Output tokens: $Y per 1K
- No storage costs
Responses:
- Input tokens: $X per 1K
- Output tokens: $Y per 1K
- Tool tokens: $Z per 1K (if tools used)
- Conversation storage: $0.01 per conversation per month
Example Cost Calculation
Scenario: 100 multi-turn conversations, 10 turns each, 1,000 tokens per turn (500 input + 500 output)
Chat Completions:

```
Input:  100 convs × 10 turns × 500 tokens × $X = $A
Output: 100 convs × 10 turns × 500 tokens × $Y = $B
Total:  $A + $B
```

Responses:

```
Input:   100 convs × 10 turns × 500 tokens × $X = $A
Output:  100 convs × 10 turns × 500 tokens × $Y = $B
Storage: 100 convs × $0.01 = $1
Cache savings: −30% on input (due to better caching)
Total:   ($A × 0.7) + $B + $1 (usually cheaper!)
```
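The arithmetic above can be captured in a small calculator. The per-1K prices are placeholders (the $X and $Y of the scenario), and the 30% cache saving and $0.01/conversation storage figures come from the scenario itself, not from a published price sheet:

```javascript
// Monthly cost for a batch of conversations, per the scenario above.
// Prices are per 1K tokens; cacheSaving applies to input tokens only.
function monthlyCost({ convs, turns, inTokens, outTokens, inPrice, outPrice, cacheSaving = 0, storagePerConv = 0 }) {
  const input = (convs * turns * inTokens / 1000) * inPrice * (1 - cacheSaving);
  const output = (convs * turns * outTokens / 1000) * outPrice;
  return input + output + convs * storagePerConv;
}

// Placeholder prices: $1.00/1K input, $2.00/1K output
const scenario = { convs: 100, turns: 10, inTokens: 500, outTokens: 500, inPrice: 1.0, outPrice: 2.0 };
const chat = monthlyCost(scenario);                                                     // $A + $B
const responses = monthlyCost({ ...scenario, cacheSaving: 0.3, storagePerConv: 0.01 }); // 0.7·$A + $B + $1
console.log(chat, responses); // 1500, ~1351 — cheaper despite the storage fee
```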
Migration Path
Simple Migration
Before (Chat Completions):
```javascript
const response = await openai.chat.completions.create({
  model: 'gpt-5',
  messages: [
    { role: 'system', content: 'You are helpful.' },
    { role: 'user', content: 'Hello!' },
  ],
});

console.log(response.choices[0].message.content);
```
After (Responses):
```javascript
const response = await openai.responses.create({
  model: 'gpt-5',
  input: [
    { role: 'developer', content: 'You are helpful.' },
    { role: 'user', content: 'Hello!' },
  ],
});

console.log(response.output_text);
```
Changes:
- `chat.completions.create` → `responses.create`
- `messages` → `input`
- `system` → `developer`
- `choices[0].message.content` → `output_text`
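The renames above are mechanical enough to capture in a small translator. This sketch covers only the fields shown in this document; a real migration would also need to handle tools, streaming, and other options:

```javascript
// Translate a Chat Completions request body into a Responses request body,
// applying the renames listed above. max_tokens is dropped because it is
// not required by the Responses API.
function migrateRequest(chatRequest) {
  const { messages, max_tokens, ...rest } = chatRequest;
  return {
    ...rest,
    input: messages.map((m) =>
      m.role === 'system' ? { ...m, role: 'developer' } : m
    ),
  };
}

const migrated = migrateRequest({
  model: 'gpt-5',
  messages: [
    { role: 'system', content: 'You are helpful.' },
    { role: 'user', content: 'Hello!' },
  ],
  max_tokens: 1000,
});
// migrated.input[0] now has role 'developer'; messages and max_tokens are gone
```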
When to Migrate
✅ Migrate Now If:
- Building new applications
- Need stateful conversations
- Using agentic patterns (reasoning + tools)
- Want better performance (preserved reasoning)
- Need built-in tools (Code Interpreter, File Search, etc.)
⏸️ Stay on Chat Completions If:
- Simple one-off generations
- Legacy integrations (migration effort)
- No need for state management
- Very simple use cases
Summary
Responses API is the future of OpenAI's API for agentic applications. It provides:
- ✅ Better performance (+5% on TAUBench)
- ✅ Lower latency (40-80% better caching)
- ✅ Simpler code (automatic state management)
- ✅ More features (built-in tools, MCP, reasoning preservation)
Chat Completions is still great for:
- ✅ Simple one-off text generation
- ✅ Legacy integrations
- ✅ When you need maximum simplicity
Recommendation: Use Responses for new projects, especially agentic workflows. Chat Completions remains valid for simple use cases.