2025-11-30 08:25:09 +08:00


Realtime Transport Options: WebRTC vs WebSocket

This reference explains the two transport options for realtime voice agents and when to use each.


Overview

The realtime package of the OpenAI Agents SDK (@openai/agents-realtime) supports two built-in transport mechanisms:

  1. WebRTC (Web Real-Time Communication)
  2. WebSocket (WebSocket Protocol)

Both enable bidirectional audio streaming, but have different characteristics.


WebRTC Transport

Characteristics

  • Lower latency: ~100-200ms typical
  • Better audio quality: Built-in adaptive bitrate
  • Direct media path: ICE negotiates the most direct network route between the client and OpenAI, preferring UDP
  • Browser-native: Designed for browser environments

When to Use

  • Browser-based voice UI
  • Low latency critical (conversational AI)
  • Real-time voice interactions
  • Production voice applications

Browser Example

import { RealtimeSession, RealtimeAgent } from '@openai/agents-realtime';

const voiceAgent = new RealtimeAgent({
  name: 'Voice Assistant',
  instructions: 'You are helpful.',
  voice: 'alloy',
});

const session = new RealtimeSession(voiceAgent, {
  transport: 'webrtc', // ← WebRTC
});

// sessionApiKey: a short-lived client key issued by your backend
await session.connect({ apiKey: sessionApiKey });
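The browser examples assume sessionApiKey is a short-lived client key minted by your backend, so your standard API key never reaches the browser. The sketch below builds that backend request. It assumes the GA Realtime endpoint POST https://api.openai.com/v1/realtime/client_secrets, a body of the form { session: { type: 'realtime', model } }, and the gpt-realtime model name; buildClientSecretRequest is a hypothetical helper, so confirm the endpoint and body against the current API reference.

```typescript
// Hypothetical server-side helper: builds the fetch() arguments for minting an
// ephemeral Realtime client key. Endpoint and body shape are assumptions based
// on the GA Realtime API; confirm them in the current API reference.
interface ClientSecretRequest {
  url: string;
  init: { method: 'POST'; headers: Record<string, string>; body: string };
}

function buildClientSecretRequest(
  apiKey: string, // standard API key: keep it on the server only
  model = 'gpt-realtime',
): ClientSecretRequest {
  return {
    url: 'https://api.openai.com/v1/realtime/client_secrets',
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ session: { type: 'realtime', model } }),
    },
  };
}

const req = buildClientSecretRequest('sk-example');
console.log(req.url, req.init.body);
```

On the server, pass url and init to fetch() and return the key from the JSON response (assumed to be in its value field, starting with ek_) to the browser, which then uses it as sessionApiKey.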

Pros

  • Best latency for voice
  • Handles network jitter better
  • Automatic echo cancellation
  • NAT traversal built-in

Cons

  • Requires browser environment (or WebRTC libraries in Node.js)
  • Slightly more complex setup
  • STUN/TURN servers may be needed for some networks

WebSocket Transport

Characteristics

  • Slightly higher latency: ~300-500ms typical
  • Simpler protocol: Standard WebSocket connection
  • Works anywhere: Node.js, browser, serverless
  • Easier debugging: readable JSON event messages (audio is base64-encoded inside them)
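To make "text-based" concrete: over WebSocket, input audio travels inside JSON event messages as base64-encoded 16-bit PCM. A minimal sketch, assuming the Realtime API's input_audio_buffer.append client event (a type plus a base64 audio field) and samples already in the default pcm16 format (24 kHz mono); pcm16ToAppendEvent is a hypothetical name.

```typescript
// Hypothetical helper: wraps 16-bit PCM samples in an
// `input_audio_buffer.append` client event, as sent over the WebSocket transport.
function pcm16ToAppendEvent(samples: Int16Array): string {
  // View the samples as raw bytes. Int16Array uses platform byte order,
  // which is little-endian on typical hardware, matching pcm16.
  const bytes = new Uint8Array(samples.buffer, samples.byteOffset, samples.byteLength);
  let binary = '';
  bytes.forEach((b) => {
    binary += String.fromCharCode(b);
  });
  return JSON.stringify({
    type: 'input_audio_buffer.append',
    audio: btoa(binary), // base64-encoded PCM16
  });
}

const event = pcm16ToAppendEvent(new Int16Array([0, 1, -1]));
console.log(event);
```

The SDK normally builds these messages for you; knowing their shape helps when reading frames in DevTools, and it explains why each audio chunk grows by about a third on the wire.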

When to Use

  • Node.js server environments
  • Simpler implementation preferred
  • Testing and development
  • Non-latency-critical use cases

Node.js Example

import { RealtimeAgent, RealtimeSession, OpenAIRealtimeWebSocket } from '@openai/agents-realtime';

const voiceAgent = new RealtimeAgent({
  name: 'Voice Assistant',
  instructions: 'You are helpful.',
  voice: 'alloy',
});

const transport = new OpenAIRealtimeWebSocket({
  apiKey: process.env.OPENAI_API_KEY,
});

const session = new RealtimeSession(voiceAgent, {
  transport, // ← WebSocket
});

// Node.js has no microphone/speaker handling: send and receive PCM16 audio yourself
await session.connect({ apiKey: process.env.OPENAI_API_KEY });

Browser Example

const session = new RealtimeSession(voiceAgent, {
  transport: 'websocket', // ← WebSocket
});

// Unlike WebRTC, audio capture and playback are not automatic here
await session.connect({ apiKey: sessionApiKey });

Pros

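  • Works in Node.js without extra libraries
  • Simpler to debug: every event is a readable JSON message (e.g., in the browser DevTools Network tab)
  • More predictable behavior: no ICE/STUN/TURN negotiation
  • Easier proxy/firewall setup (standard HTTPS port 443)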

Cons

  • Higher latency than WebRTC
  • No built-in jitter buffering
  • No automatic microphone capture or playback: you handle audio yourself, including echo cancellation
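Because the WebSocket transport has no built-in jitter buffering, a common approach is a small playout buffer: queue incoming audio chunks and start playback only after a target amount of audio has accumulated, so uneven network arrival doesn't cause audible gaps. A minimal sketch; the PlayoutBuffer class, the 24 kHz rate and the 100 ms target are illustrative assumptions, and actual playback (e.g., through Web Audio) is left out.

```typescript
// Hypothetical playout buffer for PCM16 chunks received over WebSocket.
// It holds audio until `targetMs` is queued, then releases chunks in order.
class PlayoutBuffer {
  private readonly sampleRate: number;
  private readonly targetMs: number;
  private queue: Int16Array[] = [];
  private queuedSamples = 0;
  private started = false;

  constructor(sampleRate = 24000, targetMs = 100) {
    this.sampleRate = sampleRate; // assumed pcm16 sample rate
    this.targetMs = targetMs; // illustrative pre-buffer target
  }

  /** Add a chunk as it arrives from the network. */
  push(chunk: Int16Array): void {
    this.queue.push(chunk);
    this.queuedSamples += chunk.length;
  }

  /** Milliseconds of audio currently queued. */
  get bufferedMs(): number {
    return (this.queuedSamples * 1000) / this.sampleRate;
  }

  /**
   * Next chunk to play, or null while pre-buffering or when empty.
   * After an underrun, playback waits to refill to the target again.
   */
  pull(): Int16Array | null {
    if (!this.started) {
      if (this.bufferedMs < this.targetMs) return null;
      this.started = true;
    }
    const chunk = this.queue.shift();
    if (chunk === undefined) {
      this.started = false; // underrun: re-buffer before resuming
      return null;
    }
    this.queuedSamples -= chunk.length;
    return chunk;
  }
}

const playout = new PlayoutBuffer();
playout.push(new Int16Array(2400)); // 100 ms at 24 kHz
const next = playout.pull(); // released: the target has been reached
```

A larger target smooths rougher networks at the cost of extra latency, which is the same trade-off WebRTC's adaptive jitter buffer makes automatically.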

Comparison Table

| Feature          | WebRTC              | WebSocket            |
|------------------|---------------------|----------------------|
| Latency          | ~100-200ms          | ~300-500ms           |
| Audio Quality    | Adaptive bitrate    | Fixed bitrate        |
| Browser Support  | Native              | Native               |
| Node.js Support  | Requires libraries  | Native               |
| Setup Complexity | Medium              | Low                  |
| Debugging        | Harder              | Easier               |
| Best For         | Production voice UI | Development, Node.js |

Audio I/O Handling

Automatic (Default)

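In the browser, the WebRTC transport handles audio I/O automatically:

const session = new RealtimeSession(voiceAgent, {
  transport: 'webrtc', // the default transport in the browser
});

// Audio is automatically captured from the microphone
// and played through the speakers
await session.connect({ apiKey: sessionApiKey });

The WebSocket transport does not do this: you capture audio yourself, send it as PCM16 (for example with session.sendAudio()), and play back the audio the session emits.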

Manual (Advanced)

For custom audio sources/sinks:

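import { RealtimeSession, OpenAIRealtimeWebRTC } from '@openai/agents-realtime';

// Custom media stream, e.g., a screen/tab capture that includes audio
const customStream = await navigator.mediaDevices.getDisplayMedia({
  video: true, // some browsers require video to be requested
  audio: true,
});

const transport = new OpenAIRealtimeWebRTC({
  mediaStream: customStream,
});

const session = new RealtimeSession(voiceAgent, {
  transport,
});

await session.connect({ apiKey: sessionApiKey });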
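When you handle audio yourself (for example with the WebSocket transport), browser capture through the Web Audio API yields 32-bit float samples in [-1, 1], while the default pcm16 format expects 16-bit integers. A minimal conversion sketch; it assumes the input is already mono at the session's sample rate (resampling is not shown), and floatTo16BitPCM is a hypothetical name.

```typescript
// Convert Web Audio float samples (range [-1, 1]) to 16-bit PCM.
// Assumes mono input already at the session's sample rate (24 kHz by default).
function floatTo16BitPCM(input: Float32Array): Int16Array {
  return Int16Array.from(input, (v) => {
    // Clamp to avoid wrap-around on out-of-range samples.
    const s = Math.max(-1, Math.min(1, v));
    // Scale asymmetrically: -1 maps to -32768, +1 maps to 32767.
    // Storing into Int16Array truncates any fractional part.
    return s < 0 ? s * 0x8000 : s * 0x7fff;
  });
}

const pcm = floatTo16BitPCM(Float32Array.from([0, 1, -1, 2, -2, 0.5]));
console.log(Array.from(pcm));
```

The resulting pcm.buffer is what you would hand to session.sendAudio(), assuming that method accepts an ArrayBuffer of PCM16 audio as in current SDK versions.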

Network Considerations

WebRTC

  • Firewall: May require STUN/TURN servers
  • NAT Traversal: Handles automatically
  • Bandwidth: Adaptive; compressed audio (Opus in typical browser setups) usually needs well under 100 Kbps
  • Port: Dynamic (UDP preferred)

WebSocket

  • Firewall: Standard HTTPS port (443)
  • NAT Traversal: Not needed
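  • Bandwidth: about 500 Kbps per direction while streaming the default 24 kHz, 16-bit PCM audio (base64-encoded in JSON); much lower with G.711 formats
  • Port: 443 (wss://); the OpenAI endpoint is TLS-only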
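The WebSocket bandwidth figure follows from the audio format. Assuming the API's default pcm16 format (16-bit mono at 24 kHz) sent base64-encoded inside JSON, here is a back-of-the-envelope calculation per direction, ignoring JSON and TLS framing; the function names are illustrative.

```typescript
// Bitrate of uncompressed PCM audio, in Kbps.
function pcmBitrateKbps(sampleRate: number, bitsPerSample: number, channels = 1): number {
  return (sampleRate * bitsPerSample * channels) / 1000;
}

// Base64 emits 4 output bytes for every 3 input bytes.
function base64Kbps(rawKbps: number): number {
  return (rawKbps * 4) / 3;
}

const raw = pcmBitrateKbps(24000, 16); // default pcm16: 384 Kbps
const onWire = base64Kbps(raw); // 512 Kbps before JSON and TLS framing
console.log(`pcm16 raw: ${raw} Kbps, base64: ${onWire} Kbps`);

const g711 = pcmBitrateKbps(8000, 8); // G.711 (8 kHz, 8-bit): 64 Kbps
console.log(`G.711 raw: ${g711} Kbps, base64: ${base64Kbps(g711).toFixed(1)} Kbps`);
```

This is why the WebSocket path is the heavier of the two on the network even though it is simpler to set up: WebRTC sends compressed audio, while the WebSocket path carries uncompressed samples as text.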

Security

WebRTC

  • Encrypted by default (DTLS-SRTP)
  • Peer identity verification
  • Media plane encryption

WebSocket

  • TLS encryption (wss://)
  • Standard HTTPS security model

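Both are secure for production use, provided browser clients connect with short-lived client keys from your backend rather than your standard API key.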


Debugging Tips

WebRTC

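// Enable the SDK's debug logs (the SDK logs through the debug package;
// reload the page after setting this)
localStorage.setItem('debug', 'openai-agents*');

// Monitor connection stats via the underlying RTCPeerConnection
// (assumes the WebRTC transport exposes it as connectionState.peerConnection)
const pc = session.transport.connectionState?.peerConnection;
const report = await pc?.getStats();
report?.forEach((stat) => {
  if (stat.type === 'candidate-pair' && stat.nominated && stat.state === 'succeeded') {
    console.log('RTT (s):', stat.currentRoundTripTime);
  }
  if (stat.type === 'inbound-rtp' && stat.kind === 'audio') {
    console.log('Jitter (s):', stat.jitter);
  }
});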

WebSocket

// Monitor WebSocket frames in browser DevTools (Network tab)

// Or programmatically: log raw events from the transport layer
session.on('transport_event', (event) => {
  console.log('Transport event:', event.type, event);
});
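Raw event logs get noisy because audio events carry large base64 payloads. A small sketch that truncates long top-level string fields before logging; the 64-character limit and the summarizeEvent name are illustrative choices, and nested objects are passed through unchanged.

```typescript
// Hypothetical helper: returns a shallow copy of an event with long top-level
// string fields (e.g., base64 audio) truncated, for readable logs.
function summarizeEvent(
  event: Record<string, unknown>,
  maxLen = 64,
): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(event)) {
    out[key] =
      typeof value === 'string' && value.length > maxLen
        ? `${value.slice(0, maxLen)}... (${value.length} chars)`
        : value;
  }
  return out;
}

const sample = { type: 'input_audio_buffer.append', audio: 'A'.repeat(5000) };
console.log(summarizeEvent(sample));
```

Call summarizeEvent(event) inside whichever event handler you use for logging, instead of logging the raw event.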

Recommendations

Production Voice UI (Browser)

// Use WebRTC for best latency
const session = new RealtimeSession(voiceAgent, { transport: 'webrtc' });

Backend Processing (Node.js)

// Use WebSocket for simplicity
const transport = new OpenAIRealtimeWebSocket({
  apiKey: process.env.OPENAI_API_KEY,
});

Development/Testing

// Use WebSocket for easier debugging
const session = new RealtimeSession(voiceAgent, { transport: 'websocket' });

Mobile Apps

// Use WebRTC for better quality
// Ensure WebRTC support in your framework
const session = new RealtimeSession(voiceAgent, { transport: 'webrtc' });
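The recommendations above can be folded into a small helper. A sketch that assumes you detect the runtime yourself; Runtime, TransportName and pickTransport are hypothetical names, and the returned strings match the transport option values used in this reference.

```typescript
type Runtime = 'browser' | 'node' | 'mobile';
type TransportName = 'webrtc' | 'websocket';

// Hypothetical helper mirroring the recommendations in this reference.
function pickTransport(runtime: Runtime, debugging = false): TransportName {
  if (runtime === 'node') return 'websocket'; // backend processing: simplicity
  if (debugging) return 'websocket'; // development: readable JSON events
  return 'webrtc'; // browser or mobile voice UI: latency and quality
}

console.log(pickTransport('browser'), pickTransport('node'));
```

Choosing 'websocket' in the browser still means handling audio capture and playback yourself, so the debugging branch is best kept to development builds.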

Migration Between Transports

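Switching transports only requires changing the transport option, but in the browser, moving from WebRTC to WebSocket also means handling audio capture and playback yourself:

// WebSocket
const wsSession = new RealtimeSession(agent, {
  transport: 'websocket',
});

// WebRTC (only the transport option changes)
const rtcSession = new RealtimeSession(agent, {
  transport: 'webrtc',
});

// The agent definition and the connect() call stay the same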

Last Updated: 2025-10-26
Source: OpenAI Agents Docs - Voice Agents