zhongwei/gh-jezweb-claude-skills-skills-openai-agents

Zhongwei Li 9475095985 Initial commit

2025-11-30 08:25:09 +08:00

Realtime Transport Options: WebRTC vs WebSocket

This reference explains the two transport options for realtime voice agents and when to use each.

Overview

The OpenAI Agents Realtime SDK supports two transport mechanisms:

WebRTC (Web Real-Time Communication)
WebSocket (WebSocket Protocol)

Both enable bidirectional audio streaming, but they differ in latency, setup complexity, and where they can run.

WebRTC Transport

Characteristics

Lower latency: ~100-200ms typical
Better audio quality: Built-in adaptive bitrate
Peer-to-peer optimizations: Direct media paths when possible
Browser-native: Designed for browser environments

When to Use

✅ Browser-based voice UI
✅ Low latency critical (conversational AI)
✅ Real-time voice interactions
✅ Production voice applications

Browser Example

import { RealtimeSession, RealtimeAgent } from '@openai/agents-realtime';

const voiceAgent = new RealtimeAgent({
  name: 'Voice Assistant',
  instructions: 'You are helpful.',
  voice: 'alloy',
});

const session = new RealtimeSession(voiceAgent, {
  apiKey: sessionApiKey, // Ephemeral client key minted by your backend, not your main API key
  transport: 'webrtc', // ← WebRTC
});

await session.connect();
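The `sessionApiKey` above has to come from somewhere. The following is a minimal backend-side sketch of how such a key might be minted. It assumes the Realtime `client_secrets` REST endpoint, a request body of the form `{ session: { type, model } }`, and a response with a top-level string `value`; verify all three against the current OpenAI API reference. The helper names (`buildClientSecretRequest`, `mintEphemeralKey`, `FetchLike`) and the default model name are hypothetical and not part of the SDK.

```typescript
// Hypothetical helpers: none of these names come from the SDK.
interface ClientSecretRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Minimal fetch-shaped type so the sketch does not depend on DOM typings.
type FetchLike = (
  url: string,
  init: ClientSecretRequest['init'],
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Builds the HTTP request. Endpoint and body shape are assumptions to verify.
function buildClientSecretRequest(
  serverApiKey: string,
  model: string = 'gpt-realtime',
): ClientSecretRequest {
  return {
    url: 'https://api.openai.com/v1/realtime/client_secrets',
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${serverApiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ session: { type: 'realtime', model } }),
    },
  };
}

// Sends the request and returns the short-lived key for the browser.
async function mintEphemeralKey(
  serverApiKey: string,
  fetchImpl: FetchLike,
): Promise<string> {
  const { url, init } = buildClientSecretRequest(serverApiKey);
  const res = await fetchImpl(url, init);
  if (!res.ok) {
    throw new Error(`Client secret request failed: ${res.status}`);
  }
  const data = (await res.json()) as { value?: unknown };
  if (typeof data.value !== 'string') {
    throw new Error('Response did not contain a string `value` field');
  }
  return data.value;
}
```

On a Node.js backend this could be called as `mintEphemeralKey(process.env.OPENAI_API_KEY, fetch)` inside a route handler, with the result returned to the browser. Passing `fetch` in as a parameter keeps the main API key on the server and makes the helper easy to test with a stub.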

Pros

Best latency for voice
Handles network jitter better
Automatic echo cancellation
NAT traversal built-in

Cons

Requires browser environment (or WebRTC libraries in Node.js)
Slightly more complex setup
STUN/TURN servers may be needed for some networks

WebSocket Transport

Characteristics

Slightly higher latency: ~300-500ms typical
Simpler protocol: Standard WebSocket connection
Works anywhere: Node.js, browser, serverless
Easier debugging: Text-based protocol

When to Use

✅ Node.js server environments
✅ Simpler implementation preferred
✅ Testing and development
✅ Non-latency-critical use cases

Node.js Example

import {
  RealtimeAgent,
  RealtimeSession,
  OpenAIRealtimeWebSocket,
} from '@openai/agents-realtime';

const voiceAgent = new RealtimeAgent({
  name: 'Voice Assistant',
  instructions: 'You are helpful.',
  voice: 'alloy',
});

// Server-side only: never ship the main API key to a browser
const transport = new OpenAIRealtimeWebSocket({
  apiKey: process.env.OPENAI_API_KEY,
});

// Same constructor as the browser examples, with a transport instance
const session = new RealtimeSession(voiceAgent, {
  transport, // ← WebSocket
});

await session.connect();

Browser Example

const session = new RealtimeSession(voiceAgent, {
  apiKey: sessionApiKey,
  transport: 'websocket', // ← WebSocket
});

Pros

Works in Node.js without extra libraries
Simpler to debug (Wireshark, browser DevTools)
More predictable behavior
Easier proxy/firewall setup

Cons

Higher latency than WebRTC
No built-in jitter buffering
Manual echo cancellation needed
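To illustrate what "no built-in jitter buffering" means in practice, here is a minimal sketch of a playout buffer you might put between incoming audio chunks and your audio output. It is a generic technique, not something the SDK provides; the class name `JitterBuffer` and the `minChunks` threshold are hypothetical.

```typescript
// Hypothetical playout buffer: holds audio back until a few chunks are
// queued, so short network stalls do not immediately cause gaps.
class JitterBuffer<T> {
  private queue: T[] = [];
  private playing = false;
  private readonly minChunks: number;

  constructor(minChunks: number) {
    this.minChunks = minChunks;
  }

  // Called whenever a chunk arrives from the network.
  push(chunk: T): void {
    this.queue.push(chunk);
  }

  // Called by the playback loop. Returns null while (re)buffering.
  pull(): T | null {
    if (!this.playing) {
      if (this.queue.length < this.minChunks) return null;
      this.playing = true;
    }
    const next = this.queue.shift();
    if (next === undefined) {
      // Underrun: stop and wait for the buffer to refill.
      this.playing = false;
      return null;
    }
    return next;
  }
}
```

A larger `minChunks` tolerates more jitter at the cost of added latency, which is the same trade-off WebRTC makes internally and adaptively.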

Comparison Table

Feature	WebRTC	WebSocket
Latency	~100-200ms	~300-500ms
Audio Quality	Adaptive bitrate	Fixed bitrate
Browser Support	Native	Native
Node.js Support	Requires libraries	Native
Setup Complexity	Medium	Low
Debugging	Harder	Easier
Best For	Production voice UI	Development, Node.js

Audio I/O Handling

Automatic (Default)

In the browser, the WebRTC transport handles audio I/O automatically. The WebSocket transport does not: you capture and play raw audio yourself.

const session = new RealtimeSession(voiceAgent, {
  transport: 'webrtc',
});

// Microphone audio is captured automatically
// Response audio is played through the speakers automatically
await session.connect();

Manual (Advanced)

For custom audio sources/sinks:

import { RealtimeSession, OpenAIRealtimeWebRTC } from '@openai/agents-realtime';

// Custom media stream (e.g., screen or tab capture with audio).
// getDisplayMedia() returns no audio track unless audio is requested.
const customStream = await navigator.mediaDevices.getDisplayMedia({
  video: true,
  audio: true,
});

const transport = new OpenAIRealtimeWebRTC({
  mediaStream: customStream,
});

const session = new RealtimeSession(voiceAgent, {
  transport,
});
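When you feed audio yourself rather than handing the transport a media stream, browser audio APIs typically give you 32-bit float samples, while realtime speech APIs commonly expect 16-bit PCM. The sketch below shows that conversion. The function name `floatTo16BitPCM` is hypothetical, and the assumption that your session accepts little-endian PCM16 should be checked against the audio format you have configured.

```typescript
// Converts Web Audio style float samples in [-1, 1] to little-endian PCM16.
// Hypothetical helper: not part of the SDK.
function floatTo16BitPCM(samples: Float32Array): ArrayBuffer {
  const buffer = new ArrayBuffer(samples.length * 2);
  const view = new DataView(buffer);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range input cannot wrap around.
    const clamped = Math.max(-1, Math.min(1, samples[i] ?? 0));
    // Negative and positive ranges differ by one step in 16-bit audio.
    const scaled = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    view.setInt16(i * 2, scaled, true);
  }
  return buffer;
}
```

The resulting `ArrayBuffer` is the kind of payload you would pass to whatever send-audio call your transport or session exposes.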

Network Considerations

WebRTC

Firewall: May require STUN/TURN servers
NAT Traversal: Handles automatically
Bandwidth: Adaptive (300 Kbps typical)
Port: Dynamic (UDP preferred)

WebSocket

Firewall: Standard HTTPS port (443)
NAT Traversal: Not needed
Bandwidth: ~100 Kbps typical
Port: 443 (wss://) or 80 (ws://)
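The bandwidth figures above depend heavily on the audio format in use. As a rough worked example, the sketch below estimates the audio payload rate for uncompressed formats sent as base64 text, which is how audio travels inside JSON events over a WebSocket. The two formats shown are illustrative assumptions, the helper name `estimateKbps` is hypothetical, and the estimate ignores JSON framing and protocol overhead.

```typescript
interface AudioFormat {
  sampleRateHz: number;
  bitsPerSample: number;
  channels: number;
}

// Payload bitrate in Kbps. Base64 encodes 3 bytes as 4 characters,
// so it inflates the payload by a factor of 4/3.
function estimateKbps(format: AudioFormat, base64Encoded: boolean): number {
  const rawBitsPerSecond =
    format.sampleRateHz * format.bitsPerSample * format.channels;
  const overhead = base64Encoded ? 4 / 3 : 1;
  return Math.round((rawBitsPerSecond * overhead) / 1000);
}

// Illustrative formats (assumptions, not measurements of any session).
const pcm16At24kHz: AudioFormat = { sampleRateHz: 24000, bitsPerSample: 16, channels: 1 };
const g711At8kHz: AudioFormat = { sampleRateHz: 8000, bitsPerSample: 8, channels: 1 };

console.log(estimateKbps(pcm16At24kHz, true)); // 512
console.log(estimateKbps(g711At8kHz, true)); // 85
```

Under these assumptions, 24 kHz PCM16 costs roughly six times the bandwidth of 8 kHz G.711, so the audio format matters more to WebSocket bandwidth than the transport itself.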

Security

WebRTC

Encrypted by default (DTLS-SRTP)
Peer identity verification
Media plane encryption

WebSocket

TLS encryption (wss://)
Standard HTTPS security model

Both are secure for production use.

Debugging Tips

WebRTC

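// Enable SDK debug logs (assumption: the SDK logs through the `debug`
// package under the `openai-agents` namespace; confirm for your version)
localStorage.setItem('debug', 'openai-agents:*');

// Monitor connection stats. getStats() resolves to an RTCStatsReport, so
// RTT and jitter are read from individual entries, in seconds.
// Assumption: the transport exposes its RTCPeerConnection here.
const pc = session.transport.connectionState.peerConnection;
const stats = await pc.getStats();
stats.forEach((report) => {
  if (report.type === 'candidate-pair' && report.nominated) {
    console.log('RTT:', report.currentRoundTripTime);
  }
  if (report.type === 'inbound-rtp' && report.kind === 'audio') {
    console.log('Jitter:', report.jitter);
  }
});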

WebSocket

// Monitor WebSocket frames in browser DevTools (Network tab)

// Or programmatically: '*' subscribes to every raw event from the transport
session.transport.on('*', (event) => {
  console.log('WS event:', event);
});

Recommendations

Production Voice UI (Browser)

// Use WebRTC for best latency
transport: 'webrtc'

Backend Processing (Node.js)

// Use WebSocket for simplicity
const transport = new OpenAIRealtimeWebSocket({
  apiKey: process.env.OPENAI_API_KEY,
});

Development/Testing

// Use WebSocket for easier debugging
transport: 'websocket'

Mobile Apps

// Use WebRTC for better quality
// Ensure WebRTC support in your framework
transport: 'webrtc'
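The four recommendations above can be collapsed into a single lookup. This is only a sketch of the decision table in this section; the `UseCase` labels and the `recommendTransport` name are hypothetical.

```typescript
type Transport = 'webrtc' | 'websocket';
type UseCase = 'browser-voice-ui' | 'node-backend' | 'dev-testing' | 'mobile-app';

// Mirrors the recommendations in this section, nothing more.
function recommendTransport(useCase: UseCase): Transport {
  switch (useCase) {
    case 'browser-voice-ui':
    case 'mobile-app':
      return 'webrtc'; // lowest latency, better audio handling
    case 'node-backend':
    case 'dev-testing':
      return 'websocket'; // simpler setup, easier debugging
  }
}
```

The result can be passed straight to the `transport` option of a session, for example `transport: recommendTransport('browser-voice-ui')`.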

Migration Between Transports

Switching transports in the browser is a one-line change to the session options:

// WebSocket
const wsSession = new RealtimeSession(agent, {
  transport: 'websocket',
});

// WebRTC (only the transport value changes)
const rtcSession = new RealtimeSession(agent, {
  transport: 'webrtc',
});

// The agent definition, tools, and event handlers stay the same.
// Audio capture and playback are only automatic with WebRTC.

Last Updated: 2025-10-26
Source: OpenAI Agents Docs - Voice Agents
