# Realtime Transport Options: WebRTC vs WebSocket

This reference explains the two transport options for realtime voice agents and when to use each.

---

## Overview

The OpenAI Agents Realtime SDK (`@openai/agents-realtime`) supports two transport mechanisms:
1. **WebRTC** (Web Real-Time Communication)
2. **WebSocket** (WebSocket Protocol)

Both enable bidirectional audio streaming, but they differ in latency, setup complexity, and where they can run.

---

## WebRTC Transport

### Characteristics
- **Lower latency**: ~100-200ms typical
- **Better audio quality**: Built-in adaptive bitrate
- **Peer-to-peer optimizations**: Direct media paths when possible
- **Browser-native**: Designed for browser environments

### When to Use
- ✅ Browser-based voice UI
- ✅ Low latency critical (conversational AI)
- ✅ Real-time voice interactions
- ✅ Production voice applications

### Browser Example
```typescript
import { RealtimeSession, RealtimeAgent } from '@openai/agents-realtime';

const voiceAgent = new RealtimeAgent({
  name: 'Voice Assistant',
  instructions: 'You are helpful.',
  voice: 'alloy',
});

const session = new RealtimeSession(voiceAgent, {
  transport: 'webrtc', // ← WebRTC
});

// sessionApiKey is the short-lived client key minted by your backend
await session.connect({ apiKey: sessionApiKey });
```

### Pros
- Best latency for voice
- Handles network jitter better
- Automatic echo cancellation
- NAT traversal built-in

### Cons
- Requires browser environment (or WebRTC libraries in Node.js)
- Slightly more complex setup
- STUN/TURN servers may be needed for some networks

---

## WebSocket Transport

### Characteristics
- **Slightly higher latency**: ~300-500ms typical
- **Simpler protocol**: Standard WebSocket connection
- **Works anywhere**: Node.js, browser, serverless
- **Easier debugging**: Text-based protocol

### When to Use
- ✅ Node.js server environments
- ✅ Simpler implementation preferred
- ✅ Testing and development
- ✅ Non-latency-critical use cases

### Node.js Example
```typescript
import {
  RealtimeAgent,
  RealtimeSession,
  OpenAIRealtimeWebSocket,
} from '@openai/agents-realtime';

const apiKey = process.env.OPENAI_API_KEY;
if (!apiKey) throw new Error('OPENAI_API_KEY is not set');

const voiceAgent = new RealtimeAgent({
  name: 'Voice Assistant',
  instructions: 'You are helpful.',
  voice: 'alloy',
});

const transport = new OpenAIRealtimeWebSocket({ apiKey });

// Sessions are constructed directly; the transport instance is passed as an option
const session = new RealtimeSession(voiceAgent, {
  transport, // ← WebSocket
});

await session.connect({ apiKey });
```

### Browser Example
```typescript
const session = new RealtimeSession(voiceAgent, {
  transport: 'websocket', // ← WebSocket
});

await session.connect({ apiKey: sessionApiKey });
```

### Pros
- Works in Node.js without extra libraries
- Simpler to debug (frames are visible in the browser DevTools Network tab)
- More predictable behavior
- Easier proxy/firewall setup

### Cons
- Higher latency than WebRTC
- No built-in jitter buffering
- Manual echo cancellation needed

---

## Comparison Table

| Feature | WebRTC | WebSocket |
|---------|--------|-----------|
| **Latency** | ~100-200ms | ~300-500ms |
| **Audio Quality** | Adaptive bitrate | Fixed bitrate |
| **Browser Support** | Native | Native |
| **Node.js Support** | Requires libraries | Native |
| **Setup Complexity** | Medium | Low |
| **Debugging** | Harder | Easier |
| **Best For** | Production voice UI | Development, Node.js |

---

## Audio I/O Handling

### Automatic (Default)
In the browser, the WebRTC transport handles audio I/O automatically. The WebSocket transport exchanges raw PCM16 audio instead, so capture and playback are left to your application.

```typescript
const session = new RealtimeSession(voiceAgent, {
  transport: 'webrtc',
});

// Audio automatically captured from microphone
// Audio automatically played through speakers
await session.connect({ apiKey: sessionApiKey });
```

### Manual (Advanced)
For custom audio sources/sinks:

```typescript
import { RealtimeSession, OpenAIRealtimeWebRTC } from '@openai/agents-realtime';

// Custom media stream (e.g., tab or screen capture with audio).
// audio: true is required, otherwise the stream may carry no audio track.
const customStream = await navigator.mediaDevices.getDisplayMedia({
  video: true,
  audio: true,
});

const transport = new OpenAIRealtimeWebRTC({
  mediaStream: customStream,
});

const session = new RealtimeSession(voiceAgent, {
  transport,
});
```
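
When audio is fed manually over the WebSocket transport, samples generally have to be supplied as raw 16-bit PCM rather than the Float32 values that the Web Audio API produces. The following is a minimal sketch of that conversion, assuming the session uses the `pcm16` audio format; `floatTo16BitPCM` is a hypothetical helper name, not part of the SDK.

```typescript
// Hypothetical helper (not an SDK function): convert Web Audio samples
// (Float32, nominal range [-1, 1]) into 16-bit signed PCM.
function floatTo16BitPCM(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range input cannot wrap around.
    const s = Math.max(-1, Math.min(1, samples[i]));
    // Negative values scale towards -32768, positive values towards 32767.
    out[i] = Math.round(s < 0 ? s * 0x8000 : s * 0x7fff);
  }
  return out;
}

const pcm = floatTo16BitPCM(new Float32Array([0, 1, -1, 0.25, 2]));
console.log(Array.from(pcm)); // [0, 32767, -32768, 8192, 32767]
```

The underlying `pcm.buffer` is an `ArrayBuffer` that can be passed to whatever raw-audio method your session exposes. The SDK documents `session.sendAudio()` for this purpose, but verify the name against your installed version.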

---

## Network Considerations

### WebRTC
- **Firewall**: May require STUN/TURN servers
- **NAT Traversal**: Handles automatically
- **Bandwidth**: Adaptive (Opus-compressed voice typically needs tens of Kbps)
- **Port**: Dynamic (UDP preferred)

### WebSocket
- **Firewall**: Standard HTTPS port (443)
- **NAT Traversal**: Not needed
- **Bandwidth**: Depends on the audio format (uncompressed 24 kHz PCM16 is 384 Kbps before base64 overhead)
- **Port**: 443 (wss://) or 80 (ws://)
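
As a sanity check on bandwidth figures, the one-way bitrate of uncompressed audio can be computed directly. This sketch assumes 24 kHz, 16-bit, mono PCM that is base64-encoded inside JSON events; those parameters and both function names are assumptions for illustration, and the result ignores JSON framing and TLS overhead.

```typescript
// Raw bitrate of uncompressed PCM audio, in kilobits per second.
function pcmKbps(sampleRateHz: number, bitsPerSample: number, channels: number): number {
  return (sampleRateHz * bitsPerSample * channels) / 1000;
}

// Base64 encodes every 3 bytes as 4 characters, so payloads grow by 4/3.
function base64Kbps(rawKbps: number): number {
  return (rawKbps * 4) / 3;
}

const raw = pcmKbps(24000, 16, 1); // assumed format: 24 kHz, 16-bit, mono
const encoded = base64Kbps(raw);
console.log(`raw: ${raw} kbps, base64: ${encoded} kbps`);
// prints "raw: 384 kbps, base64: 512 kbps"
```

Compressed codecs such as Opus, which WebRTC negotiates for voice, need far less than this, which is one reason measured bandwidth differs between the two transports.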

---

## Security

### WebRTC
- Encrypted by default (DTLS-SRTP)
- Peer identity verification
- Media plane encryption

### WebSocket
- TLS encryption (wss://)
- Standard HTTPS security model

**Both are secure for production use.**

---

## Debugging Tips

### WebRTC
```javascript
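// Inspect live connections in Chrome at chrome://webrtc-internals

// Monitor connection stats. `pc` is the RTCPeerConnection behind the transport;
// how it is exposed depends on the SDK version, so treat this access as an assumption.
pc.getStats().then((stats) => {
  // getStats() resolves to a Map-like RTCStatsReport, one entry per stats object
  stats.forEach((report) => {
    if (report.type === 'candidate-pair' && report.nominated) {
      console.log('RTT (s):', report.currentRoundTripTime);
    }
    if (report.type === 'inbound-rtp' && report.kind === 'audio') {
      console.log('Jitter (s):', report.jitter);
    }
  });
});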
```

### WebSocket
```javascript
// Monitor WebSocket frames in browser DevTools (Network tab)

// Or log every parsed event the transport receives
session.transport.on('*', (event) => {
  console.log('Realtime event:', event);
});
```
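
Logging every event quickly becomes noisy during an audio stream. One option is to collect events and tally them by type. The sketch below assumes each parsed event carries a string `type` field, as Realtime API events do; `countByType` and the sample event names are illustrative only.

```typescript
// Minimal shape assumed for a parsed event: a string `type` field.
interface LoggedEvent {
  type: string;
}

// Hypothetical helper (not an SDK function): tally events by their type.
function countByType(events: LoggedEvent[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const event of events) {
    counts.set(event.type, (counts.get(event.type) ?? 0) + 1);
  }
  return counts;
}

// Sample data for illustration; real event names depend on the API version.
const sample: LoggedEvent[] = [
  { type: 'response.audio.delta' },
  { type: 'response.audio.delta' },
  { type: 'response.done' },
];
console.log(Array.from(countByType(sample)));
// [['response.audio.delta', 2], ['response.done', 1]]
```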

---

## Recommendations

### Production Voice UI (Browser)
```typescript
// Use WebRTC for best latency
transport: 'webrtc'
```

### Backend Processing (Node.js)
```typescript
// Use WebSocket for simplicity
const transport = new OpenAIRealtimeWebSocket({
  apiKey: process.env.OPENAI_API_KEY,
});
```

### Development/Testing
```typescript
// Use WebSocket for easier debugging
transport: 'websocket'
```

### Mobile Apps
```typescript
// Use WebRTC for better quality
// Ensure WebRTC support in your framework
transport: 'webrtc'
```
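
The recommendations above can be condensed into a small selection helper. This is a sketch under the assumptions of this section, not SDK functionality: `RuntimeInfo` and `chooseTransport` are hypothetical names, and the rules should be adjusted to your own constraints.

```typescript
type Transport = 'webrtc' | 'websocket';

// Hypothetical description of where the session will run.
interface RuntimeInfo {
  environment: 'browser' | 'mobile' | 'node';
  debugging?: boolean; // true when inspecting traffic matters more than latency
}

// Encodes the recommendations in this section.
function chooseTransport(info: RuntimeInfo): Transport {
  if (info.environment === 'node') return 'websocket'; // no extra WebRTC libraries
  if (info.debugging) return 'websocket'; // easier frame inspection
  return 'webrtc'; // lowest latency for voice UI
}

console.log(chooseTransport({ environment: 'browser' })); // webrtc
console.log(chooseTransport({ environment: 'browser', debugging: true })); // websocket
console.log(chooseTransport({ environment: 'node' })); // websocket
```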

---

## Migration Between Transports

Switching transports is simple: change the `transport` option.

```typescript
// From WebSocket
const wsSession = new RealtimeSession(agent, {
  transport: 'websocket',
});

// To WebRTC (just change transport)
const rtcSession = new RealtimeSession(agent, {
  transport: 'webrtc',
});

// The agent definition and the other session options stay the same
```

---

**Last Updated**: 2025-10-26

**Source**: [OpenAI Agents Docs - Voice Agents](https://openai.github.io/openai-agents-js/guides/voice-agents)