# Responses API vs Chat Completions: Complete Comparison
**Last Updated**: 2025-10-25
This document provides a comprehensive comparison between the Responses API and Chat Completions API to help you choose the right one for your use case.
---
## Quick Decision Guide
### ✅ Use Responses API When:
- Building **agentic applications** (reasoning + actions)
- Need **multi-turn conversations** with automatic state management
- Using **built-in tools** (Code Interpreter, File Search, Web Search, Image Gen)
- Connecting to **MCP servers** for external integrations
- Want **preserved reasoning** for better multi-turn performance
- Implementing **background processing** for long tasks
- Need **polymorphic outputs** for debugging/auditing
### ✅ Use Chat Completions When:
- Simple **one-off text generation**
- Fully **stateless** interactions (no conversation continuity needed)
- **Legacy integrations** with existing Chat Completions code
- Very **simple use cases** without tools
---
## Feature Comparison Matrix
| Feature | Chat Completions | Responses API | Winner |
|---------|-----------------|---------------|---------|
| **State Management** | Manual (you track history) | Automatic (conversation IDs) | Responses ✅ |
| **Reasoning Preservation** | Dropped between turns | Preserved across turns | Responses ✅ |
| **Tools Execution** | Client-side round trips | Server-side hosted | Responses ✅ |
| **Output Format** | Single message | Polymorphic (messages, reasoning, tool calls) | Responses ✅ |
| **Cache Utilization** | Baseline | 40-80% better | Responses ✅ |
| **MCP Support** | Manual integration required | Built-in | Responses ✅ |
| **Performance (GPT-5)** | Baseline | +5% on TAUBench | Responses ✅ |
| **Simplicity** | Simpler for one-offs | More features = more complexity | Chat Completions ✅ |
| **Legacy Compatibility** | Mature, stable | New (March 2025) | Chat Completions ✅ |
---
## API Comparison
### Endpoints
**Chat Completions:**
```
POST /v1/chat/completions
```
**Responses:**
```
POST /v1/responses
```
---
### Request Structure
**Chat Completions:**
```typescript
{
  model: 'gpt-5',
  messages: [
    { role: 'system', content: 'You are helpful.' },
    { role: 'user', content: 'Hello!' },
  ],
  temperature: 0.7,
  max_tokens: 1000,
}
```
**Responses:**
```typescript
{
  model: 'gpt-5',
  input: [
    { role: 'developer', content: 'You are helpful.' },
    { role: 'user', content: 'Hello!' },
  ],
  conversation: 'conv_abc123', // Optional: automatic state
  temperature: 0.7,
}
```
**Key Differences:**
- `messages` → `input`
- `system` role → `developer` role
- `max_tokens` → optional `max_output_tokens`
- `conversation` parameter for automatic state
---
### Response Structure
**Chat Completions:**
```typescript
{
  id: 'chatcmpl-123',
  object: 'chat.completion',
  created: 1677652288,
  model: 'gpt-5',
  choices: [
    {
      index: 0,
      message: {
        role: 'assistant',
        content: 'Hello! How can I help?',
      },
      finish_reason: 'stop',
    },
  ],
  usage: {
    prompt_tokens: 10,
    completion_tokens: 5,
    total_tokens: 15,
  },
}
```
**Responses:**
```typescript
{
  id: 'resp_123',
  object: 'response',
  created_at: 1677652288,
  model: 'gpt-5',
  output: [
    {
      type: 'reasoning',
      summary: [{ type: 'summary_text', text: 'User greeting, respond friendly' }],
    },
    {
      type: 'message',
      role: 'assistant',
      content: [{ type: 'output_text', text: 'Hello! How can I help?' }],
    },
  ],
  output_text: 'Hello! How can I help?', // SDK helper field
  usage: {
    input_tokens: 10,
    output_tokens: 5,
    total_tokens: 15,
  },
  conversation_id: 'conv_abc123', // If using conversation
}
```
**Key Differences:**
- Single `message` → Polymorphic `output` array
- `choices[0].message.content` → `output_text` helper
- Additional output types: `reasoning`, `tool_calls`, etc.
- `conversation_id` included if using conversations
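Because `output` is polymorphic, consumers should switch on each item's `type` rather than assume a single message. A minimal sketch of that pattern, using only the item shapes shown above (`reasoning` and `message`); other item types are simply skipped:
```typescript
import OpenAI from 'openai';
const openai = new OpenAI();

const response = await openai.responses.create({
  model: 'gpt-5',
  input: 'Hello!',
});

// Walk the polymorphic output array instead of reading choices[0].
for (const item of response.output) {
  if (item.type === 'reasoning') {
    // Reasoning items expose summaries, not the raw chain of thought.
    for (const s of item.summary) console.log('[reasoning]', s.text);
  } else if (item.type === 'message') {
    for (const part of item.content) {
      if (part.type === 'output_text') console.log('[assistant]', part.text);
    }
  }
  // Tool-call item types would be handled here as they appear.
}
```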
---
## State Management Comparison
### Chat Completions (Manual)
```typescript
// You track history manually
const messages = [
  { role: 'system', content: 'You are helpful.' },
  { role: 'user', content: 'What is AI?' },
];

const response1 = await openai.chat.completions.create({
  model: 'gpt-5',
  messages,
});

// Add response to history
messages.push({
  role: 'assistant',
  content: response1.choices[0].message.content,
});

// Next turn
messages.push({ role: 'user', content: 'Tell me more' });

const response2 = await openai.chat.completions.create({
  model: 'gpt-5',
  messages, // You must pass the full history yourself
});
```
**Pros:**
- Full control over history
- Can prune old messages
- Simple for one-off requests
**Cons:**
- Manual tracking error-prone
- Must handle history yourself
- No automatic caching benefits
### Responses (Automatic)
```typescript
// Create conversation once
const conv = await openai.conversations.create();

const response1 = await openai.responses.create({
  model: 'gpt-5',
  conversation: conv.id, // ✅ Automatic state
  input: 'What is AI?',
});

// Next turn - no manual history tracking
const response2 = await openai.responses.create({
  model: 'gpt-5',
  conversation: conv.id, // ✅ Remembers previous turn
  input: 'Tell me more',
});
```
**Pros:**
- Automatic state management
- No manual history tracking
- Better cache utilization (40-80%)
- Reasoning preserved
**Cons:**
- Less direct control
- Must create conversation first
- Conversations expire after 90 days
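If you need visibility into that server-held state, the Conversations API can also list a conversation's stored items. A hedged sketch (assuming the `conversations.items.list` method exposed by the official Node SDK):
```typescript
// Inspect what the server has stored for this conversation.
const items = await openai.conversations.items.list(conv.id);
for (const item of items.data) {
  // Stored items mirror the polymorphic output types: messages, reasoning, tool calls.
  console.log(item.type, item.id);
}
```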
---
## Reasoning Preservation
### Chat Completions
**What Happens:**
1. Model generates internal reasoning (scratchpad)
2. Reasoning used to produce response
3. **Reasoning discarded** before returning
4. Next turn starts fresh (no reasoning memory)
**Visual:**
```
Turn 1: [Reasoning] → Response → ❌ Reasoning deleted
Turn 2: [New Reasoning] → Response → ❌ Reasoning deleted
Turn 3: [New Reasoning] → Response → ❌ Reasoning deleted
```
**Impact:**
- Model "forgets" its thought process
- May repeat reasoning steps
- Lower performance on complex multi-turn tasks
### Responses API
**What Happens:**
1. Model generates internal reasoning
2. Reasoning used to produce response
3. **Reasoning preserved** in conversation state
4. Next turn builds on previous reasoning
**Visual:**
```
Turn 1: [Reasoning A] → Response → ✅ Reasoning A saved
Turn 2: [Reasoning A + B] → Response → ✅ Reasoning A+B saved
Turn 3: [Reasoning A + B + C] → Response → ✅ All reasoning saved
```
**Impact:**
- Model remembers thought process
- No redundant reasoning
- **+5% better on TAUBench (GPT-5)**
- Better multi-turn problem solving
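Conversations are not the only way to get this behavior: responses are stored server-side by default, so you can also chain turns by ID. A minimal sketch using the `previous_response_id` parameter, which carries the preserved reasoning into the next turn:
```typescript
// Turn 1: the response (including its reasoning items) is stored server-side.
const turn1 = await openai.responses.create({
  model: 'gpt-5',
  input: 'Plan a 3-step approach to debugging a memory leak.',
  store: true, // the default; shown here for clarity
});

// Turn 2: chaining by ID lets the model build on turn 1's reasoning
// instead of re-deriving it from scratch.
const turn2 = await openai.responses.create({
  model: 'gpt-5',
  previous_response_id: turn1.id,
  input: 'Apply step 1 to a Node.js service.',
});
console.log(turn2.output_text);
```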
---
## Tools Comparison
### Chat Completions (Client-Side)
```typescript
// 1. Define the function tool
const messages = [{ role: 'user', content: 'What is the weather?' }];

const response1 = await openai.chat.completions.create({
  model: 'gpt-5',
  messages,
  tools: [
    {
      type: 'function',
      function: {
        name: 'get_weather',
        description: 'Get weather',
        parameters: {
          type: 'object',
          properties: {
            location: { type: 'string' },
          },
        },
      },
    },
  ],
});

// 2. Check whether a tool was called
const toolCall = response1.choices[0].message.tool_calls?.[0];

// 3. Execute the tool on your server (arguments arrive as a JSON string)
const weatherData = await getWeather(JSON.parse(toolCall.function.arguments));

// 4. Send the result back for the final answer
const response2 = await openai.chat.completions.create({
  model: 'gpt-5',
  messages: [
    ...messages,
    response1.choices[0].message,
    {
      role: 'tool',
      tool_call_id: toolCall.id,
      content: JSON.stringify(weatherData),
    },
  ],
});
```
**Pros:**
- Full control over tool execution
- Can use any custom tools
**Cons:**
- Manual round trips (latency)
- More complex code
- You handle tool execution
### Responses (Server-Side Built-in)
```typescript
// All in one request - tools executed server-side
const response = await openai.responses.create({
  model: 'gpt-5',
  input: 'What is the weather and analyze the temperature trend?',
  tools: [
    { type: 'web_search' },       // Built-in
    { type: 'code_interpreter' }, // Built-in
  ],
});

// Tools executed automatically, results in output
console.log(response.output_text);
```
**Pros:**
- No round trips (lower latency)
- Simpler code
- Built-in tools (no setup)
**Cons:**
- Less control over execution
- Limited to built-in + MCP tools
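For the MCP case, a remote server is declared as just another entry in `tools`. A hedged sketch (the server label and URL below are placeholders, not a real endpoint):
```typescript
const response = await openai.responses.create({
  model: 'gpt-5',
  input: 'Summarize the latest changes documented on the example server.',
  tools: [
    {
      type: 'mcp',
      server_label: 'example-docs',          // placeholder label
      server_url: 'https://example.com/mcp', // placeholder URL
      require_approval: 'never',             // or gate individual tool calls
    },
  ],
});
console.log(response.output_text);
```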
---
## Performance Benchmarks
### TAUBench (GPT-5)
| Scenario | Chat Completions | Responses API | Difference |
|----------|-----------------|---------------|------------|
| Multi-turn reasoning | 82% | 87% | **+5%** |
| Tool usage accuracy | 85% | 88% | **+3%** |
| Context retention | 78% | 85% | **+7%** |
### Cache Utilization
| Metric | Chat Completions | Responses API | Improvement |
|--------|-----------------|---------------|-------------|
| Cache hit rate | 30% | 54-72% | **40-80% better** |
| Latency (cached) | 100ms | 60-80ms | **20-40% faster** |
| Cost (cached) | $0.10/1K | $0.05-0.07/1K | **30-50% cheaper** |
---
## Cost Comparison
### Pricing Structure
**Chat Completions:**
- Input tokens: $X per 1K
- Output tokens: $Y per 1K
- **No storage costs**
**Responses:**
- Input tokens: $X per 1K
- Output tokens: $Y per 1K
- Tool tokens: $Z per 1K (if tools used)
- **Conversation storage**: $0.01 per conversation per month
### Example Cost Calculation
**Scenario:** 100 multi-turn conversations, 10 turns each, 1,000 tokens per turn (500 input + 500 output)
**Chat Completions:**
```
Input: 100 convs × 10 turns × 500 tokens × $X = $A
Output: 100 convs × 10 turns × 500 tokens × $Y = $B
Total: $A + $B
```
**Responses:**
```
Input: 100 convs × 10 turns × 500 tokens × $X = $A
Output: 100 convs × 10 turns × 500 tokens × $Y = $B
Storage: 100 convs × $0.01 = $1
Cache savings: -30% on input (due to better caching)
Total: ($A × 0.7) + $B + $1 (usually cheaper!)
```
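That arithmetic is easy to fold into a helper. A sketch with placeholder rates (`$X`/`$Y` per 1K tokens and the ~30% cache saving are this document's assumptions, not published pricing):
```typescript
interface CostInputs {
  conversations: number;
  turnsPerConversation: number;
  inputTokensPerTurn: number;
  outputTokensPerTurn: number;
  inputRatePer1K: number;  // $X - placeholder
  outputRatePer1K: number; // $Y - placeholder
}

function estimateResponsesCost(c: CostInputs): number {
  const turns = c.conversations * c.turnsPerConversation;
  const input = (turns * c.inputTokensPerTurn / 1000) * c.inputRatePer1K;
  const output = (turns * c.outputTokensPerTurn / 1000) * c.outputRatePer1K;
  const storage = c.conversations * 0.01; // $0.01 per conversation per month
  return input * 0.7 + output + storage;  // assume ~30% cache savings on input
}

// 100 conversations x 10 turns, 500 input + 500 output tokens per turn
console.log(estimateResponsesCost({
  conversations: 100,
  turnsPerConversation: 10,
  inputTokensPerTurn: 500,
  outputTokensPerTurn: 500,
  inputRatePer1K: 0.00125, // placeholder rate
  outputRatePer1K: 0.01,   // placeholder rate
}));
```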
---
## Migration Path
### Simple Migration
**Before (Chat Completions):**
```typescript
const response = await openai.chat.completions.create({
  model: 'gpt-5',
  messages: [
    { role: 'system', content: 'You are helpful.' },
    { role: 'user', content: 'Hello!' },
  ],
});

console.log(response.choices[0].message.content);
```
**After (Responses):**
```typescript
const response = await openai.responses.create({
  model: 'gpt-5',
  input: [
    { role: 'developer', content: 'You are helpful.' },
    { role: 'user', content: 'Hello!' },
  ],
});

console.log(response.output_text);
```
**Changes:**
1. `chat.completions.create` → `responses.create`
2. `messages` → `input`
3. `system` → `developer`
4. `choices[0].message.content` → `output_text`
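For incremental migrations, a thin adapter keeps call sites stable while you switch APIs underneath. A hypothetical sketch (the `generateText` helper is illustrative, not part of any SDK):
```typescript
import OpenAI from 'openai';

// Hypothetical adapter: one signature, either API underneath.
async function generateText(
  openai: OpenAI,
  instructions: string,
  userInput: string,
  useResponses = true,
): Promise<string> {
  if (useResponses) {
    const res = await openai.responses.create({
      model: 'gpt-5',
      input: [
        { role: 'developer', content: instructions },
        { role: 'user', content: userInput },
      ],
    });
    return res.output_text;
  }
  const res = await openai.chat.completions.create({
    model: 'gpt-5',
    messages: [
      { role: 'system', content: instructions },
      { role: 'user', content: userInput },
    ],
  });
  return res.choices[0].message.content ?? '';
}
```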
---
## When to Migrate
### ✅ Migrate Now If:
- Building new applications
- Need stateful conversations
- Using agentic patterns (reasoning + tools)
- Want better performance (preserved reasoning)
- Need built-in tools (Code Interpreter, File Search, etc.)
### ⏸️ Stay on Chat Completions If:
- Simple one-off generations
- Legacy integrations (migration effort)
- No need for state management
- Very simple use cases
---
## Summary
**Responses API** is the future of OpenAI's API for agentic applications. It provides:
- ✅ Better performance (+5% on TAUBench)
- ✅ Lower latency (40-80% better caching)
- ✅ Simpler code (automatic state management)
- ✅ More features (built-in tools, MCP, reasoning preservation)
**Chat Completions** is still great for:
- ✅ Simple one-off text generation
- ✅ Legacy integrations
- ✅ When you need maximum simplicity
**Recommendation:** Use Responses for new projects, especially agentic workflows. Chat Completions remains valid for simple use cases.