# Realtime Transport Options: WebRTC vs WebSocket

This reference explains the two transport options for realtime voice agents and when to use each.

---

## Overview

The OpenAI Agents Realtime SDK supports two transport mechanisms:

1. **WebRTC** (Web Real-Time Communication)
2. **WebSocket**

Both enable bidirectional audio streaming but have different characteristics, and both are selected through the session's `transport` option, as sketched below.
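
A minimal sketch of the two forms (the agent name and instructions are placeholders; details for each transport follow in the sections below):

```typescript
import { RealtimeAgent, RealtimeSession } from '@openai/agents-realtime';

const agent = new RealtimeAgent({
  name: 'Voice Assistant',
  instructions: 'You are helpful.',
});

// Same agent, different transports; only the `transport` option changes.
const webrtcSession = new RealtimeSession(agent, { transport: 'webrtc' });
const websocketSession = new RealtimeSession(agent, { transport: 'websocket' });
```
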
---

## WebRTC Transport

### Characteristics
- **Lower latency**: ~100-200ms typical
- **Better audio quality**: Built-in adaptive bitrate
- **Peer-to-peer optimizations**: Direct media paths when possible
- **Browser-native**: Designed for browser environments

### When to Use
- ✅ Browser-based voice UI
- ✅ Low latency is critical (conversational AI)
- ✅ Real-time voice interactions
- ✅ Production voice applications

### Browser Example
```typescript
import { RealtimeAgent, RealtimeSession } from '@openai/agents-realtime';

const voiceAgent = new RealtimeAgent({
  name: 'Voice Assistant',
  instructions: 'You are helpful.',
  voice: 'alloy',
});

const session = new RealtimeSession(voiceAgent, {
  transport: 'webrtc', // ← WebRTC (the default in the browser)
});

// Use an ephemeral client key minted by your backend (see the sketch below),
// never your standard API key.
await session.connect({ apiKey: sessionApiKey });
```
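
The `sessionApiKey` above should be a short-lived client key minted by your backend so that your standard API key never reaches the browser. A minimal sketch of such a backend helper; the endpoint path, model name, and response shape are assumptions to verify against the current Realtime API reference:

```typescript
// Backend-only: mints a short-lived client key for the browser.
// NOTE: route, model name, and response shape are assumptions; check the
// current OpenAI Realtime API reference before relying on them.
export async function mintSessionApiKey(): Promise<string> {
  const response = await fetch('https://api.openai.com/v1/realtime/sessions', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ model: 'gpt-realtime' }),
  });

  if (!response.ok) {
    throw new Error(`Failed to mint client key: ${response.status}`);
  }

  const data = await response.json();
  return data.client_secret.value; // Hand this to the browser as sessionApiKey
}
```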

### Pros
- Best latency for voice
- Handles network jitter better
- Automatic echo cancellation
- NAT traversal built-in

### Cons
- Requires a browser environment (or WebRTC libraries in Node.js)
- Slightly more complex setup
- STUN/TURN servers may be needed on some networks

---

## WebSocket Transport

### Characteristics
- **Higher latency**: ~300-500ms typical
- **Simpler protocol**: Standard WebSocket connection
- **Works anywhere**: Node.js, browser, serverless
- **Easier debugging**: Text-based protocol

### When to Use
- ✅ Node.js server environments
- ✅ Simpler implementation preferred
- ✅ Testing and development
- ✅ Non-latency-critical use cases

### Node.js Example
```typescript
import {
  RealtimeAgent,
  RealtimeSession,
  OpenAIRealtimeWebSocket,
} from '@openai/agents-realtime';

const voiceAgent = new RealtimeAgent({
  name: 'Voice Assistant',
  instructions: 'You are helpful.',
  voice: 'alloy',
});

const session = new RealtimeSession(voiceAgent, {
  transport: new OpenAIRealtimeWebSocket(), // ← WebSocket
});

// In Node.js you can authenticate with your standard API key.
await session.connect({ apiKey: process.env.OPENAI_API_KEY });
```

### Browser Example
```typescript
const session = new RealtimeSession(voiceAgent, {
  transport: 'websocket', // ← WebSocket
});

await session.connect({ apiKey: sessionApiKey });
```

### Pros
- Works in Node.js without extra libraries
- Simpler to debug (Wireshark, browser DevTools)
- More predictable behavior
- Easier proxy/firewall setup

### Cons
- Higher latency than WebRTC
- No built-in jitter buffering
- Manual echo cancellation needed (see the capture sketch below)
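
On that last point: if you manage microphone capture yourself in the browser, you can still ask the browser to apply its built-in echo cancellation at capture time via `getUserMedia` constraints. This is plain Media Capture API usage, not an SDK feature, and how a custom stream plugs into the WebSocket transport depends on the SDK (the WebRTC transport accepts a `mediaStream`, as shown in the Manual audio section below):

```typescript
// Request a microphone stream with browser-level echo cancellation,
// noise suppression, and auto gain control enabled (standard Web API,
// independent of the transport you choose).
const micStream = await navigator.mediaDevices.getUserMedia({
  audio: {
    echoCancellation: true,
    noiseSuppression: true,
    autoGainControl: true,
  },
});
```
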
---

## Comparison Table

| Feature | WebRTC | WebSocket |
|---------|--------|-----------|
| **Latency** | ~100-200ms | ~300-500ms |
| **Audio Quality** | Adaptive bitrate | Fixed bitrate |
| **Browser Support** | Native | Native |
| **Node.js Support** | Requires libraries | Native |
| **Setup Complexity** | Medium | Low |
| **Debugging** | Harder | Easier |
| **Best For** | Production voice UI | Development, Node.js |
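
Putting the table into practice, a common pattern is to pick the transport from the runtime environment: WebRTC in the browser, WebSocket elsewhere. A minimal sketch (the environment check is deliberately simplistic and the agent definition is a placeholder):

```typescript
import {
  RealtimeAgent,
  RealtimeSession,
  OpenAIRealtimeWebSocket,
} from '@openai/agents-realtime';

const agent = new RealtimeAgent({
  name: 'Voice Assistant',
  instructions: 'You are helpful.',
});

// Crude environment check: `window` exists in browsers, not in Node.js.
const inBrowser = typeof window !== 'undefined';

const session = new RealtimeSession(agent, {
  transport: inBrowser ? 'webrtc' : new OpenAIRealtimeWebSocket(),
});
```
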
---

## Audio I/O Handling

### Automatic (Default)
In the browser, both transports handle audio I/O automatically:

```typescript
const session = new RealtimeSession(voiceAgent, {
  transport: 'webrtc', // or 'websocket'
});

// Audio is automatically captured from the microphone
// and played back through the speakers.
await session.connect({ apiKey: sessionApiKey });
```

### Manual (Advanced)
For custom audio sources/sinks:

```typescript
import { OpenAIRealtimeWebRTC, RealtimeSession } from '@openai/agents-realtime';

// Custom media stream (e.g., a screen/tab capture that includes audio)
const customStream = await navigator.mediaDevices.getDisplayMedia({
  video: true,
  audio: true,
});

const transport = new OpenAIRealtimeWebRTC({
  mediaStream: customStream,
});

const session = new RealtimeSession(voiceAgent, { transport });
await session.connect({ apiKey: sessionApiKey });
```

---

## Network Considerations

### WebRTC
- **Firewall**: May require STUN/TURN servers
- **NAT Traversal**: Handled automatically
- **Bandwidth**: Adaptive (~300 Kbps typical)
- **Ports**: Dynamic (UDP preferred)

### WebSocket
- **Firewall**: Standard HTTPS port (443)
- **NAT Traversal**: Not needed
- **Bandwidth**: ~100 Kbps typical
- **Port**: 443 (wss://) or 80 (ws://)

---

## Security

### WebRTC
- Encrypted by default (DTLS-SRTP)
- Peer identity verification
- Media-plane encryption

### WebSocket
- TLS encryption (wss://)
- Standard HTTPS security model

**Both are secure for production use.**

---

## Debugging Tips

### WebRTC
```javascript
// Enable WebRTC debug logs
localStorage.setItem('debug', 'webrtc:*');

// Monitor connection stats
session.transport.getStats().then(stats => {
  console.log('RTT:', stats.roundTripTime);
  console.log('Jitter:', stats.jitter);
});
```

### WebSocket
```javascript
// Monitor WebSocket frames in browser DevTools (Network tab)

// Or programmatically
session.transport.on('message', (data) => {
  console.log('WS message:', data);
});
```

---

## Recommendations

### Production Voice UI (Browser)
```typescript
// Use WebRTC for best latency
transport: 'webrtc'
```

### Backend Processing (Node.js)
```typescript
// Use WebSocket for simplicity
const transport = new OpenAIRealtimeWebSocket();
// Authenticate when connecting: session.connect({ apiKey: process.env.OPENAI_API_KEY })
```

### Development/Testing
```typescript
// Use WebSocket for easier debugging
transport: 'websocket'
```

### Mobile Apps
```typescript
// Use WebRTC for better quality
// Ensure WebRTC support in your framework
transport: 'webrtc'
```

---

## Migration Between Transports

Switching transports is a one-line change:

```typescript
// From WebSocket
const session = new RealtimeSession(agent, {
  transport: 'websocket',
});

// To WebRTC (just change transport)
const session = new RealtimeSession(agent, {
  transport: 'webrtc',
});

// Everything else stays the same!
```

---

**Last Updated**: 2025-10-26
**Source**: [OpenAI Agents Docs - Voice Agents](https://openai.github.io/openai-agents-js/guides/voice-agents)