Initial commit

Zhongwei Li · 2025-11-30 08:25:09 +08:00 · commit 9475095985
30 changed files with 5609 additions and 0 deletions

# Agent Orchestration Patterns
This reference explains different approaches to coordinating multiple agents with the OpenAI Agents SDK.
---
## Pattern 1: LLM-Based Orchestration
**What**: Let the LLM autonomously decide how to route tasks and execute tools.
**When to Use**:
- Requirements are complex and context-dependent
- You want adaptive, intelligent routing
- Task decomposition benefits from reasoning
**How It Works**:
1. Create a "manager" agent with instructions and tools/handoffs
2. LLM plans task execution based on instructions
3. LLM decides which tools to call or agents to delegate to
4. Self-critique and improvement loops possible
**Example**:
```typescript
const managerAgent = Agent.create({
name: 'Project Manager',
instructions: `You coordinate project work. You have access to:
- Database agent for data operations
- API agent for external integrations
- UI agent for frontend tasks
Analyze the request and route to appropriate agents.`,
handoffs: [databaseAgent, apiAgent, uiAgent],
});
```
**Best Practices**:
- Write clear, detailed instructions
- Define tool/handoff descriptions precisely
- Implement monitoring and logging
- Create evaluation frameworks
- Iterate based on observed failures
**Pros**:
- Flexible and adaptive
- Handles complex scenarios
- Can self-improve with feedback
**Cons**:
- Less predictable
- Higher token usage
- Requires good prompt engineering
---
## Pattern 2: Code-Based Orchestration
**What**: Use explicit programming logic to control agent execution flow.
**When to Use**:
- Workflow is deterministic and well-defined
- You need guaranteed execution order
- Debugging and testing are priorities
- Cost control is important
**How It Works**:
1. Define agents for specific tasks
2. Use code to sequence execution
3. Pass outputs as inputs to next steps
4. Implement conditional logic manually
**Example**:
```typescript
// Sequential execution
const summary = await run(summarizerAgent, article);
const sentiment = await run(sentimentAgent, summary.finalOutput);
const recommendations = await run(recommenderAgent, sentiment.finalOutput);
// Conditional routing
if (sentiment.finalOutput.score < 0.3) {
await run(escalationAgent, article);
} else {
await run(responseAgent, article);
}
// Parallel execution (renamed to avoid redeclaring `summary` from above)
const [articleSummary, keywords, entities] = await Promise.all([
  run(summarizerAgent, article),
  run(keywordAgent, article),
  run(entityAgent, article),
]);
// Feedback loops (bounded so a low-scoring draft cannot retry forever)
let result = await run(writerAgent, prompt);
let quality = await run(evaluatorAgent, result.finalOutput);
let attempts = 0;
while (quality.finalOutput.score < 8 && attempts++ < 3) {
  result = await run(writerAgent, `Improve: ${result.finalOutput}`);
  quality = await run(evaluatorAgent, result.finalOutput);
}
```
**Best Practices**:
- Break complex tasks into discrete steps
- Use structured outputs for reliable routing
- Implement error handling at each step
- Log execution flow for debugging
**Pros**:
- Predictable and deterministic
- Easy to debug and test
- Full control over execution
- Lower token usage
**Cons**:
- Less flexible
- Requires upfront planning
- Manual routing logic
---
## Pattern 3: Agents as Tools
**What**: Wrap agents as tools for a manager LLM, which decides when to invoke them.
**When to Use**:
- You want LLM routing but keep the manager in control
- Sub-agents produce specific outputs (data, not conversation)
- You need manager to summarize/synthesize results
**How It Works**:
1. Create specialist agents with `outputType`
2. Convert agents to tools
3. Manager agent calls them as needed
4. Manager synthesizes final response
**Example**:
```typescript
const weatherAgent = new Agent({
name: 'Weather Service',
instructions: 'Return weather data',
outputType: z.object({
temperature: z.number(),
conditions: z.string(),
}),
});
// Convert to tool
const weatherTool = tool({
name: 'get_weather',
description: 'Get weather data',
parameters: z.object({ city: z.string() }),
execute: async ({ city }) => {
const result = await run(weatherAgent, city);
return result.finalOutput;
},
});
const managerAgent = new Agent({
name: 'Assistant',
instructions: 'Help users with various tasks',
tools: [weatherTool, /* other agent-tools */],
});
```
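If your SDK version exposes it, `Agent.asTool()` can replace the manual `tool()` wrapper above. A hedged sketch (the exact option names are an assumption; verify against the current SDK docs):
```typescript
// Assumed helper: wraps the specialist agent as a callable tool in one step.
const weatherToolAlt = weatherAgent.asTool({
  toolName: 'get_weather',
  toolDescription: 'Get weather data for a city',
});
```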
**Pros**:
- Manager maintains conversation control
- Clean separation of concerns
- Reusable specialist agents
**Cons**:
- Extra layer of complexity
- Slightly higher latency
---
## Pattern 4: Parallel Execution
**What**: Run multiple agents concurrently and select/combine results.
**When to Use**:
- Independent tasks can run simultaneously
- You want to generate multiple options
- Time to result matters
**Example Use Cases**:
- Generate 3 marketing copy variants
- Parallel research tasks (summary, pros/cons, stats, quotes)
- Quality voting (best result selection; sketched below)
**See Templates**:
- `templates/text-agents/agent-parallel.ts`
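As noted above, a minimal quality-voting sketch using the same `run` + `Promise.all` primitives from Pattern 2; the `copywriterAgent` and `judgeAgent` definitions and the judge's scoring schema are illustrative assumptions, not SDK built-ins:
```typescript
import { z } from 'zod';
import { Agent, run } from '@openai/agents';

// Illustrative agents: three copy variants in parallel, one judge to pick the best.
const copywriterAgent = new Agent({
  name: 'Copywriter',
  instructions: 'Write one short marketing blurb for the given product.',
});
const judgeAgent = new Agent({
  name: 'Judge',
  instructions: 'Score the candidate blurb from 0 to 10 for clarity and appeal.',
  outputType: z.object({ score: z.number() }),
});

// Generate three candidates concurrently, score each, keep the highest-scoring one.
const candidates = await Promise.all(
  Array.from({ length: 3 }, () => run(copywriterAgent, 'Eco-friendly water bottle')),
);
const scored = await Promise.all(
  candidates.map(async (c) => ({
    text: c.finalOutput,
    score: (await run(judgeAgent, String(c.finalOutput))).finalOutput?.score ?? 0,
  })),
);
const best = scored.sort((a, b) => b.score - a.score)[0];
console.log('Winning variant:', best.text);
```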
---
## Pattern 5: Human-in-the-Loop
**What**: Require human approval for specific actions.
**When to Use**:
- High-stakes actions (payments, deletions, emails)
- Compliance requirements
- Building trust in AI systems
**How It Works**:
1. Mark tools with `requiresApproval: true`
2. Handle `ToolApprovalItem` interruptions
3. Prompt user for approval
4. Resume the run with the approve/reject decision (see the sketch below)
**See Templates**:
- `templates/text-agents/agent-human-approval.ts`
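A condensed sketch of the approval flow above. It follows the option and item names used in the steps (`requiresApproval`, approval interruptions); exact property and method spellings vary by SDK version, so treat the template above as the reference implementation and `promptHuman` as a placeholder for your own UI:
```typescript
import { z } from 'zod';
import { Agent, run, tool } from '@openai/agents';

// Step 1: mark a high-stakes tool as requiring approval (option name per the steps above).
const sendEmailTool = tool({
  name: 'send_email',
  description: 'Send an email to a customer',
  parameters: z.object({ to: z.string(), body: z.string() }),
  requiresApproval: true,
  execute: async ({ to }) => `Email sent to ${to}`,
});

const supportAgent = new Agent({
  name: 'Support Agent',
  instructions: 'Draft replies and send them by email once approved.',
  tools: [sendEmailTool],
});

// Steps 2-4: surface each pending approval to a human, then resume the run.
let result = await run(supportAgent, 'Reply to ticket #123 and email the customer.');
for (const interruption of result.interruptions ?? []) {
  const approved = await promptHuman(interruption); // placeholder for your approval UI
  approved ? result.state.approve(interruption) : result.state.reject(interruption);
}
result = await run(supportAgent, result.state);
```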
---
## Choosing a Pattern
| Requirement | Recommended Pattern |
|-------------|---------------------|
| Adaptive routing | LLM-Based |
| Deterministic flow | Code-Based |
| Cost control | Code-Based |
| Complex reasoning | LLM-Based |
| Multiple options | Parallel |
| Safety requirements | Human-in-the-Loop |
| Manager + specialists | Agents as Tools |
---
## Combining Patterns
You can mix patterns:
```typescript
// Code-based orchestration with parallel execution and HITL
const [research1, research2] = await Promise.all([
run(researchAgent1, topic),
run(researchAgent2, topic),
]);
// LLM-based synthesis
const synthesis = await run(synthesizerAgent, {
research1: research1.finalOutput,
research2: research2.finalOutput,
});
// Human approval for final output (requestApproval is a placeholder for your own approval UI/flow)
const approved = await requestApproval(synthesis.finalOutput);
if (approved) {
await run(publishAgent, synthesis.finalOutput);
}
```
---
**Last Updated**: 2025-10-26
**Source**: [OpenAI Agents Docs - Multi-Agent Guide](https://openai.github.io/openai-agents-js/guides/multi-agent)

# Cloudflare Workers Integration
**Status**: Experimental Support
The OpenAI Agents SDK has experimental support for Cloudflare Workers: some features work, others have limitations.
---
## Compatibility
### What Works ✅
- Text agents (`Agent`, `run()`)
- Basic tool calling
- Structured outputs with Zod
- Streaming responses (with caveats)
- Environment variable access
### What Doesn't Work ❌
- Realtime voice agents (WebRTC not supported in Workers)
- Some Node.js APIs (timers, crypto edge cases)
- Long-running operations (CPU time limits)
### What's Experimental ⚠️
- Multi-agent handoffs (works but untested at scale)
- Large context windows (may hit memory limits)
- Complex tool executions (CPU time limits)
---
## Setup
### 1. Install Dependencies
```bash
npm install @openai/agents zod hono
```
### 2. Configure wrangler.jsonc
```jsonc
{
"name": "openai-agents-worker",
"main": "src/index.ts",
"compatibility_date": "2025-10-26",
"compatibility_flags": ["nodejs_compat"],
"node_compat": true, // Required for OpenAI SDK
"observability": {
"enabled": true
},
"limits": {
"cpu_ms": 30000 // Adjust based on agent complexity
}
}
```
### 3. Set Environment Variable
```bash
# Set OPENAI_API_KEY secret
wrangler secret put OPENAI_API_KEY
# Enter your OpenAI API key when prompted
```
---
## Basic Worker Example
```typescript
import { Agent, run } from '@openai/agents';
export default {
async fetch(request: Request, env: Env): Promise<Response> {
if (request.method !== 'POST') {
return new Response('Method not allowed', { status: 405 });
}
try {
const { message } = await request.json();
// Set API key from environment
process.env.OPENAI_API_KEY = env.OPENAI_API_KEY;
const agent = new Agent({
name: 'Assistant',
instructions: 'You are helpful.',
model: 'gpt-4o-mini', // Use smaller models for faster response
});
const result = await run(agent, message, {
maxTurns: 5, // Limit turns to control execution time
});
return new Response(JSON.stringify({
response: result.finalOutput,
tokens: result.usage.totalTokens,
}), {
headers: { 'Content-Type': 'application/json' },
});
} catch (error) {
return new Response(JSON.stringify({ error: error.message }), {
status: 500,
headers: { 'Content-Type': 'application/json' },
});
}
},
};
interface Env {
OPENAI_API_KEY: string;
}
```
---
## Hono Integration
```typescript
import { Hono } from 'hono';
import { Agent, run } from '@openai/agents';
const app = new Hono<{ Bindings: { OPENAI_API_KEY: string } }>();
app.post('/api/agent', async (c) => {
const { message } = await c.req.json();
process.env.OPENAI_API_KEY = c.env.OPENAI_API_KEY;
const agent = new Agent({
name: 'Assistant',
instructions: 'You are helpful.',
});
const result = await run(agent, message);
return c.json({
response: result.finalOutput,
});
});
export default app;
```
**See Template**: `templates/cloudflare-workers/worker-agent-hono.ts`
---
## Streaming Responses
Streaming works but requires careful handling:
```typescript
const stream = await run(agent, message, { stream: true });
const { readable, writable } = new TransformStream();
const writer = writable.getWriter();
const encoder = new TextEncoder();
// Stream in background
(async () => {
try {
for await (const event of stream) {
if (event.type === 'raw_model_stream_event') {
const chunk = event.data?.choices?.[0]?.delta?.content || '';
if (chunk) {
await writer.write(encoder.encode(`data: ${chunk}\n\n`));
}
}
}
await stream.completed;
} finally {
await writer.close();
}
})();
return new Response(readable, {
headers: {
'Content-Type': 'text/event-stream',
'Cache-Control': 'no-cache',
},
});
```
---
## Known Limitations
### 1. CPU Time Limits
Workers have CPU time limits (default 50ms, up to 30s with paid plans).
**Solution**: Use smaller models and limit `maxTurns`:
```typescript
const agent = new Agent({
  name: 'Assistant',
  instructions: 'You are helpful.',
  model: 'gpt-4o-mini', // Faster model, set on the agent
});
const result = await run(agent, message, {
  maxTurns: 3, // Limit turns
});
```
### 2. Memory Limits
Large context windows may hit memory limits (128MB default).
**Solution**: Keep conversations concise, summarize history:
```typescript
const agent = new Agent({
instructions: 'Keep responses concise. Summarize context when needed.',
});
```
### 3. No Realtime Voice
WebRTC not supported in Workers runtime.
**Solution**: Use realtime agents in Next.js or other Node.js environments.
### 4. Cold Starts
First request after inactivity may be slow.
**Solution**: Use warm-up requests or keep Workers warm with cron triggers.
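A minimal warm-up sketch, assuming you add a `triggers.crons` entry to wrangler.jsonc and export a `scheduled` handler next to `fetch`; the health-check URL is a placeholder:
```typescript
// wrangler.jsonc (assumed addition): "triggers": { "crons": ["*/5 * * * *"] }
export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // ...normal agent handling as shown above
    return new Response('ok');
  },
  // Runs every 5 minutes so the Worker stays warm between user requests.
  async scheduled(controller: ScheduledController, env: Env, ctx: ExecutionContext): Promise<void> {
    ctx.waitUntil(fetch('https://your-worker.example.workers.dev/health'));
  },
};
```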
---
## Performance Tips
### 1. Use Smaller Models
```typescript
model: 'gpt-4o-mini' // Faster than gpt-4o
```
### 2. Limit Turns
```typescript
maxTurns: 3 // Prevent long-running loops
```
### 3. Stream Responses
```typescript
stream: true // Start returning data faster
```
### 4. Cache Results
```typescript
// Cache frequent queries in KV (KV stores strings, so serialize the output)
const cached = await env.KV.get(cacheKey);
if (cached) return new Response(cached);
const result = await run(agent, message);
await env.KV.put(cacheKey, JSON.stringify(result.finalOutput), { expirationTtl: 3600 });
```
### 5. Use Durable Objects for State
```typescript
// Store agent state in Durable Objects for long conversations
// (types come from @cloudflare/workers-types)
export class AgentSession {
  constructor(private state: DurableObjectState) {}
  async fetch(request: Request): Promise<Response> {
    // Maintain conversation state across requests
    const history = (await this.state.storage.get<string[]>('history')) ?? [];
    // ...run the agent with `history` as context, append the new turn, then persist:
    // await this.state.storage.put('history', history);
    return new Response('ok');
  }
}
```
---
## Deployment
```bash
# Build and deploy
npm run build
wrangler deploy
# Test locally
wrangler dev
```
---
## Cost Considerations
**Workers Costs**:
- Requests: $0.15 per million (after 100k free/day)
- CPU Time: $0.02 per million CPU-ms (after 10ms free per request)
**OpenAI Costs**:
- GPT-4o-mini: $0.15 / 1M input tokens, $0.60 / 1M output tokens
- GPT-4o: $2.50 / 1M input tokens, $10.00 / 1M output tokens
**Example**: 1M agent requests (avg 500 tokens each)
- Workers: ~$1.50
- GPT-4o-mini: ~$75
- **Total**: ~$76.50
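Back-of-envelope arithmetic behind those figures (assumes all 500 tokens billed at the gpt-4o-mini input rate and roughly 75 ms of CPU per request; longer outputs or heavier CPU cost more):
```typescript
const requests = 1_000_000;
const tokensPerRequest = 500;

// Workers: $0.15 per million requests + $0.02 per million CPU-ms beyond 10 ms free/request.
const requestCost = (requests / 1_000_000) * 0.15;                  // $0.15
const billableCpuMs = requests * (75 - 10);                          // assumed 75 ms CPU each
const cpuCost = (billableCpuMs / 1_000_000) * 0.02;                  // ~$1.30
// OpenAI: gpt-4o-mini input tokens at $0.15 per 1M.
const tokenCost = ((requests * tokensPerRequest) / 1_000_000) * 0.15; // ~$75
console.log((requestCost + cpuCost + tokenCost).toFixed(2));          // ~$76.45
```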
**Use gpt-4o-mini for cost efficiency!**
---
## Monitoring
```typescript
// Log execution time
const start = Date.now();
const result = await run(agent, message);
const duration = Date.now() - start;
console.log(`Agent execution: ${duration}ms`);
console.log(`Tokens used: ${result.usage.totalTokens}`);
```
Enable Workers observability in wrangler.jsonc:
```jsonc
"observability": {
"enabled": true,
"head_sampling_rate": 0.1
}
```
---
## Error Handling
```typescript
try {
const result = await run(agent, message, {
maxTurns: 5,
});
return result;
} catch (error) {
if (error.message.includes('CPU time limit')) {
// Hit Workers CPU limit - reduce complexity
return { error: 'Request too complex' };
}
if (error.message.includes('memory')) {
// Hit memory limit - reduce context
return { error: 'Context too large' };
}
throw error;
}
```
---
## Alternatives
If Workers limitations are problematic:
1. **Cloudflare Pages Functions** (same runtime, may not help)
2. **Next.js on Vercel** (better Node.js support)
3. **Node.js on Railway/Render** (full Node.js environment)
4. **AWS Lambda** (longer timeouts, more memory)
---
**Last Updated**: 2025-10-26
**Status**: Experimental - test thoroughly before production use

# Common Errors and Solutions
This reference documents known issues with the OpenAI Agents SDK and their workarounds.
---
## Error 1: Zod Schema Type Errors with Tool Parameters
**Issue**: Type errors occur when using Zod schemas as tool parameters, even when structurally compatible.
**GitHub Issue**: [#188](https://github.com/openai/openai-agents-js/issues/188)
**Symptoms**:
```typescript
// This causes TypeScript errors
const myTool = tool({
name: 'my_tool',
parameters: myZodSchema, // ❌ Type error
execute: async (input) => { /* ... */ },
});
```
**Workaround**:
```typescript
// Define schema inline
const myTool = tool({
name: 'my_tool',
parameters: z.object({
field1: z.string(),
field2: z.number(),
}), // ✅ Works
execute: async (input) => { /* ... */ },
});
// Or use type assertion (temporary fix)
const myTool = tool({
name: 'my_tool',
parameters: myZodSchema as any, // ⚠️ Loses type safety
execute: async (input) => { /* ... */ },
});
```
**Status**: Known issue as of SDK v0.2.1
**Expected Fix**: Future SDK version
---
## Error 2: MCP Server Tracing Errors
**Issue**: "No existing trace found" error when initializing RealtimeAgent with MCP servers.
**GitHub Issue**: [#580](https://github.com/openai/openai-agents-js/issues/580)
**Symptoms**:
```
UnhandledPromiseRejection: Error: No existing trace found
at RealtimeAgent.init with MCP server
```
**Workaround**:
```typescript
// Ensure tracing is initialized before creating agent
import { initializeTracing } from '@openai/agents/tracing';
await initializeTracing();
// Then create realtime agent with MCP
const agent = new RealtimeAgent({
// ... agent config with MCP servers
});
```
**Status**: Reported October 2025
**Affects**: @openai/agents-realtime v0.0.8 - v0.1.9
---
## Error 3: MaxTurnsExceededError
**Issue**: Agent enters infinite loop and hits turn limit.
**Cause**: Agent keeps calling tools or delegating without reaching conclusion.
**Symptoms**:
```
MaxTurnsExceededError: Agent exceeded maximum turns (10)
```
**Solutions**:
1. **Increase maxTurns**:
```typescript
const result = await run(agent, input, {
maxTurns: 20, // Increase limit
});
```
2. **Improve Instructions**:
```typescript
const agent = new Agent({
instructions: `You are a helpful assistant.
IMPORTANT: After using tools or delegating, provide a final answer.
Do not endlessly loop or delegate back and forth.`,
});
```
3. **Add Exit Criteria**:
```typescript
const agent = new Agent({
instructions: `Answer the question using up to 3 tool calls.
After 3 tool calls, synthesize a final answer.`,
});
```
**Prevention**: Write clear instructions with explicit completion criteria.
---
## Error 4: ToolCallError (Transient Failures)
**Issue**: Tool execution fails temporarily (network, rate limits, external API issues).
**Symptoms**:
```
ToolCallError: Failed to execute tool 'search_api'
```
**Solution**: Implement retry logic with exponential backoff.
```typescript
import { run, ToolCallError } from '@openai/agents';
async function runWithRetry(agent, input, maxRetries = 3) {
for (let attempt = 1; attempt <= maxRetries; attempt++) {
try {
return await run(agent, input);
} catch (error) {
if (error instanceof ToolCallError && attempt < maxRetries) {
const delay = 1000 * Math.pow(2, attempt - 1);
await new Promise(resolve => setTimeout(resolve, delay));
continue;
}
throw error;
}
}
}
```
**See Template**: `templates/shared/error-handling.ts`
---
## Error 5: GuardrailExecutionError with Fallback
**Issue**: Guardrail itself fails (e.g., guardrail agent unavailable).
**Symptoms**:
```
GuardrailExecutionError: Guardrail 'safety_check' failed to execute
```
**Solution**: Implement fallback guardrails.
```typescript
import { Agent, GuardrailExecutionError, run } from '@openai/agents';
const primaryGuardrail = { /* ... */ };
const fallbackGuardrail = { /* simple keyword filter */ };
const agent = new Agent({
inputGuardrails: [primaryGuardrail],
});
try {
const result = await run(agent, input);
} catch (error) {
if (error instanceof GuardrailExecutionError && error.state) {
// Retry with fallback guardrail
agent.inputGuardrails = [fallbackGuardrail];
const result = await run(agent, error.state);
}
}
```
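For reference, a hedged sketch of the "simple keyword filter" fallback; the guardrail object shape shown here (`name` plus an `execute` that returns `tripwireTriggered`/`outputInfo`) is an assumption about the input-guardrail interface, so confirm the exact fields against the template below:
```typescript
// Assumed input-guardrail shape; the execute argument may be richer than a plain string.
const fallbackGuardrail = {
  name: 'keyword_filter',
  execute: async ({ input }: { input: string }) => {
    const blocked = ['password', 'ssn', 'credit card'];
    const hit = blocked.find((word) => input.toLowerCase().includes(word));
    return {
      tripwireTriggered: Boolean(hit),
      outputInfo: hit ? { matchedKeyword: hit } : {},
    };
  },
};
```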
**See Template**: `templates/text-agents/agent-guardrails-input.ts`
---
## Error 6: Schema Mismatch (outputType vs Actual Output)
**Issue**: Agent returns data that doesn't match declared `outputType` schema.
**Cause**: Model sometimes deviates from schema despite instructions.
**Symptoms**:
```
Validation Error: Output does not match schema
```
**Solutions**:
1. **Add Validation Instructions**:
```typescript
const agent = new Agent({
instructions: `You MUST return data matching this exact schema.
Double-check your output before finalizing.`,
outputType: mySchema,
});
```
2. **Use Stricter Models**:
```typescript
const agent = new Agent({
model: 'gpt-4o', // More reliable than gpt-4o-mini for structured output
outputType: mySchema,
});
```
3. **Catch and Retry**:
```typescript
try {
const result = await run(agent, input);
// Validate output
mySchema.parse(result.finalOutput);
} catch (error) {
// Retry with stronger prompt
const retryResult = await run(agent,
`CRITICAL: Your previous output was invalid. Return valid JSON matching the schema exactly. ${input}`
);
}
```
---
## Error 7: Ollama Integration Failures
**Issue**: The TypeScript Agents SDK fails to connect to Ollama models.
**GitHub Issue**: [#136](https://github.com/openai/openai-agents-js/issues/136)
**Symptoms**:
```
TypeError: Cannot read properties of undefined (reading 'completions')
```
**Cause**: SDK designed for OpenAI API format; Ollama requires adapter.
**Workaround**: Use Vercel AI SDK adapter or stick to OpenAI-compatible models.
**Status**: Experimental support; not officially supported.
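A sketch of the Vercel AI SDK adapter route mentioned above. Both the `aisdk` helper from `@openai/agents-extensions` and the community `ollama-ai-provider` package are assumptions here; verify the package names and current APIs before relying on this:
```typescript
import { Agent, run } from '@openai/agents';
import { aisdk } from '@openai/agents-extensions'; // assumed adapter for Vercel AI SDK models
import { ollama } from 'ollama-ai-provider';        // assumed community Ollama provider

// Route the agent's model calls through the AI SDK instead of the OpenAI API directly.
const localAgent = new Agent({
  name: 'Local Assistant',
  instructions: 'You are helpful.',
  model: aisdk(ollama('llama3.1')),
});

const result = await run(localAgent, 'Hello from a local model');
console.log(result.finalOutput);
```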
---
## Error 8: Built-in webSearchTool Intermittent Errors
**Issue**: Built-in `webSearchTool()` sometimes throws exceptions.
**Symptoms**: Unpredictable failures when invoking web search.
**Workaround**:
```typescript
// Use custom search tool with error handling
const customSearchTool = tool({
name: 'search',
description: 'Search the web',
parameters: z.object({ query: z.string() }),
execute: async ({ query }) => {
try {
// Your search API (Tavily, Google, etc.)
const results = await fetch(`https://api.example.com/search?q=${query}`);
return await results.json();
} catch (error) {
return { error: 'Search temporarily unavailable' };
}
},
});
```
**Status**: Known issue in early SDK versions.
---
## Error 9: Agent Builder Export Bugs
**Issue**: Code exported from Agent Builder has bugs (template string escaping, state typing).
**Source**: [OpenAI Community](https://community.openai.com/t/bugs-in-agent-builder-exported-code-typescript-template-string-escaping-state-typing-and-property-naming/1362119)
**Symptoms**: Exported code doesn't compile or run.
**Solution**: Manually review and fix exported code before use.
---
## General Error Handling Pattern
**Comprehensive error handling template**:
```typescript
import {
  run,
  MaxTurnsExceededError,
InputGuardrailTripwireTriggered,
OutputGuardrailTripwireTriggered,
ToolCallError,
GuardrailExecutionError,
ModelBehaviorError,
} from '@openai/agents';
try {
const result = await run(agent, input, { maxTurns: 10 });
return result;
} catch (error) {
if (error instanceof MaxTurnsExceededError) {
// Agent hit turn limit - logic issue
console.error('Agent looped too many times');
throw error;
} else if (error instanceof InputGuardrailTripwireTriggered) {
// Input blocked by guardrail - don't retry
console.error('Input blocked:', error.outputInfo);
return { error: 'Input not allowed' };
} else if (error instanceof OutputGuardrailTripwireTriggered) {
// Output blocked by guardrail - don't retry
console.error('Output blocked:', error.outputInfo);
return { error: 'Response blocked for safety' };
} else if (error instanceof ToolCallError) {
// Tool failed - retry with backoff
console.error('Tool failed:', error.toolName);
return retryWithBackoff(agent, input);
} else if (error instanceof GuardrailExecutionError) {
// Guardrail failed - use fallback
console.error('Guardrail failed');
return runWithFallbackGuardrail(agent, input);
} else if (error instanceof ModelBehaviorError) {
// Unexpected model behavior - don't retry
console.error('Model behavior error');
throw error;
} else {
// Unknown error
console.error('Unknown error:', error);
throw error;
}
}
```
**See Template**: `templates/shared/error-handling.ts`
---
**Last Updated**: 2025-10-26
**Sources**:
- [GitHub Issues](https://github.com/openai/openai-agents-js/issues)
- [OpenAI Community](https://community.openai.com/)
- SDK Documentation

# Official Links and Resources
Quick reference to official OpenAI Agents SDK documentation and resources.
---
## Official Documentation
### Main Documentation
- **Homepage**: https://openai.github.io/openai-agents-js/
- **Getting Started**: https://openai.github.io/openai-agents-js/getting-started
- **API Reference**: https://openai.github.io/openai-agents-js/api
### Guides
- **Quickstart**: https://openai.github.io/openai-agents-js/guides/quickstart
- **Agents**: https://openai.github.io/openai-agents-js/guides/agents
- **Handoffs**: https://openai.github.io/openai-agents-js/guides/handoffs
- **Tools**: https://openai.github.io/openai-agents-js/guides/tools
- **Guardrails**: https://openai.github.io/openai-agents-js/guides/guardrails
- **Human-in-the-Loop**: https://openai.github.io/openai-agents-js/guides/human-in-the-loop
- **Streaming**: https://openai.github.io/openai-agents-js/guides/streaming
- **Multi-Agent**: https://openai.github.io/openai-agents-js/guides/multi-agent
- **Voice Agents**: https://openai.github.io/openai-agents-js/guides/voice-agents
- **Results**: https://openai.github.io/openai-agents-js/guides/results
- **Running Agents**: https://openai.github.io/openai-agents-js/guides/running-agents
---
## GitHub Repository
### Main Repo
- **Source Code**: https://github.com/openai/openai-agents-js
- **Issues**: https://github.com/openai/openai-agents-js/issues
- **Releases**: https://github.com/openai/openai-agents-js/releases
- **Examples**: https://github.com/openai/openai-agents-js/tree/main/examples
### Related Repos
- **Python SDK**: https://github.com/openai/openai-agents-python
- **Go SDK**: https://github.com/nlpodyssey/openai-agents-go
- **Realtime Examples**: https://github.com/openai/openai-realtime-agents
---
## npm Packages
### Core Packages
- **@openai/agents**: https://www.npmjs.com/package/@openai/agents
- **@openai/agents-realtime**: https://www.npmjs.com/package/@openai/agents-realtime
### Installation
```bash
npm install @openai/agents zod@3
npm install @openai/agents-realtime # For voice agents
```
---
## OpenAI Platform
### API Documentation
- **API Overview**: https://platform.openai.com/docs/overview
- **Authentication**: https://platform.openai.com/docs/api-reference/authentication
- **Models**: https://platform.openai.com/docs/models
- **Realtime API**: https://platform.openai.com/docs/guides/realtime
### Pricing
- **Pricing Page**: https://openai.com/api/pricing/
- **GPT-4o**: $2.50 / 1M input tokens, $10.00 / 1M output tokens
- **GPT-4o-mini**: $0.15 / 1M input tokens, $0.60 / 1M output tokens
### Account
- **API Keys**: https://platform.openai.com/api-keys
- **Usage Dashboard**: https://platform.openai.com/usage
- **Playground**: https://platform.openai.com/playground
---
## Community
### Forums
- **OpenAI Community**: https://community.openai.com/
- **Developer Forum**: https://community.openai.com/c/api/7
- **Agents Discussion**: Search "agents sdk" on the community forum
### Social
- **OpenAI Twitter**: https://twitter.com/OpenAI
- **OpenAI Blog**: https://openai.com/blog/
---
## Engineering Blog
### Key Articles
- **Agents SDK Announcement**: Check OpenAI blog for official announcement
- **Swarm to Agents Migration**: (Agents SDK is successor to experimental Swarm)
---
## Related Tools
### Development Tools
- **Zod**: https://zod.dev/ (Schema validation)
- **TypeScript**: https://www.typescriptlang.org/
- **Vercel AI SDK**: https://ai-sdk.dev/ (For multi-provider support)
### Frameworks
- **Next.js**: https://nextjs.org/
- **Hono**: https://hono.dev/
- **Cloudflare Workers**: https://developers.cloudflare.com/workers/
---
## Examples and Templates
### Official Examples
- **Basic Examples**: https://github.com/openai/openai-agents-js/tree/main/examples
- **Voice Examples**: https://github.com/openai/openai-agents-js/tree/main/examples/realtime
- **Multi-Agent Examples**: https://github.com/openai/openai-agents-js/tree/main/examples/agent-patterns
### Community Examples
- Check GitHub for "openai-agents-js" topic: https://github.com/topics/openai-agents
---
## Support
### Getting Help
1. **Documentation**: Start with official docs
2. **GitHub Issues**: Search existing issues first
3. **Community Forum**: Ask in OpenAI Community
4. **Stack Overflow**: Tag with `openai-agents-js`
### Reporting Bugs
- **GitHub Issues**: https://github.com/openai/openai-agents-js/issues/new
- Include: SDK version, code snippet, error message, environment
---
## Version Information
### Current Versions (as of 2025-10-26)
- **@openai/agents**: 0.2.1
- **@openai/agents-realtime**: 0.2.1
- **Required zod**: ^3.x
### Version History
- Check releases: https://github.com/openai/openai-agents-js/releases
### Migration Guides
- Check docs for breaking changes between versions
- Always test after upgrading
---
## Comparison with Other Frameworks
### vs Swarm
- **Swarm**: Experimental project (deprecated)
- **Agents SDK**: Production-ready successor
### vs LangChain
- **LangChain**: Framework-agnostic, many providers
- **Agents SDK**: OpenAI-focused, simpler API
### vs OpenAI Assistants API
- **Assistants API**: Managed state, threads, files
- **Agents SDK**: Full control, custom orchestration
---
## Changelog
### v0.2.1 (2025-10)
- Realtime voice agent improvements
- Bug fixes for MCP integration
- Performance optimizations
### v0.1.x
- Initial public release
- Core agent features
- Handoffs and tools
---
**Last Updated**: 2025-10-26
**SDK Version**: 0.2.1
**Note**: Links verified as of the date above; check official sources for the latest updates.

# Realtime Transport Options: WebRTC vs WebSocket
This reference explains the two transport options for realtime voice agents and when to use each.
---
## Overview
The OpenAI Agents Realtime SDK supports two transport mechanisms:
1. **WebRTC** (Web Real-Time Communication)
2. **WebSocket** (WebSocket Protocol)
Both enable bidirectional audio streaming, but have different characteristics.
---
## WebRTC Transport
### Characteristics
- **Lower latency**: ~100-200ms typical
- **Better audio quality**: Built-in adaptive bitrate
- **Peer-to-peer optimizations**: Direct media paths when possible
- **Browser-native**: Designed for browser environments
### When to Use
- ✅ Browser-based voice UI
- ✅ Low latency critical (conversational AI)
- ✅ Real-time voice interactions
- ✅ Production voice applications
### Browser Example
```typescript
import { RealtimeSession, RealtimeAgent } from '@openai/agents-realtime';
const voiceAgent = new RealtimeAgent({
name: 'Voice Assistant',
instructions: 'You are helpful.',
voice: 'alloy',
});
const session = new RealtimeSession(voiceAgent, {
apiKey: sessionApiKey, // From your backend
transport: 'webrtc', // ← WebRTC
});
await session.connect();
```
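The `sessionApiKey` above should be a short-lived key minted by your backend rather than your real API key. A minimal sketch of such an endpoint, assuming the Realtime API's ephemeral-session endpoint (`POST /v1/realtime/sessions`) and its `client_secret` response field; check the Realtime API docs for the current endpoint and request shape:
```typescript
// Server-side route (any Node.js backend): exchange your real key for an ephemeral one.
export async function POST(): Promise<Response> {
  const resp = await fetch('https://api.openai.com/v1/realtime/sessions', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ model: 'gpt-4o-realtime-preview', voice: 'alloy' }),
  });
  const session = await resp.json();
  // The browser uses this value as sessionApiKey when constructing RealtimeSession.
  return Response.json({ sessionApiKey: session.client_secret?.value });
}
```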
### Pros
- Best latency for voice
- Handles network jitter better
- Automatic echo cancellation
- NAT traversal built-in
### Cons
- Requires browser environment (or WebRTC libraries in Node.js)
- Slightly more complex setup
- STUN/TURN servers may be needed for some networks
---
## WebSocket Transport
### Characteristics
- **Slightly higher latency**: ~300-500ms typical
- **Simpler protocol**: Standard WebSocket connection
- **Works anywhere**: Node.js, browser, serverless
- **Easier debugging**: Text-based protocol
### When to Use
- ✅ Node.js server environments
- ✅ Simpler implementation preferred
- ✅ Testing and development
- ✅ Non-latency-critical use cases
### Node.js Example
```typescript
import { OpenAIRealtimeWebSocket, RealtimeAgent } from '@openai/agents-realtime';
const voiceAgent = new RealtimeAgent({
name: 'Voice Assistant',
instructions: 'You are helpful.',
voice: 'alloy',
});
const transport = new OpenAIRealtimeWebSocket({
apiKey: process.env.OPENAI_API_KEY,
});
const session = await voiceAgent.createSession({
transport, // ← WebSocket
});
await session.connect();
```
### Browser Example
```typescript
const session = new RealtimeSession(voiceAgent, {
apiKey: sessionApiKey,
transport: 'websocket', // ← WebSocket
});
```
### Pros
- Works in Node.js without extra libraries
- Simpler to debug (Wireshark, browser DevTools)
- More predictable behavior
- Easier proxy/firewall setup
### Cons
- Higher latency than WebRTC
- No built-in jitter buffering
- Manual echo cancellation needed
---
## Comparison Table
| Feature | WebRTC | WebSocket |
|---------|--------|-----------|
| **Latency** | ~100-200ms | ~300-500ms |
| **Audio Quality** | Adaptive bitrate | Fixed bitrate |
| **Browser Support** | Native | Native |
| **Node.js Support** | Requires libraries | Native |
| **Setup Complexity** | Medium | Low |
| **Debugging** | Harder | Easier |
| **Best For** | Production voice UI | Development, Node.js |
---
## Audio I/O Handling
### Automatic (Default)
Both transports handle audio I/O automatically in browser:
```typescript
const session = new RealtimeSession(voiceAgent, {
transport: 'webrtc', // or 'websocket'
});
// Audio automatically captured from microphone
// Audio automatically played through speakers
await session.connect();
```
### Manual (Advanced)
For custom audio sources/sinks:
```typescript
import { OpenAIRealtimeWebRTC } from '@openai/agents-realtime';
// Custom media stream (e.g., screen or tab capture via getDisplayMedia)
const customStream = await navigator.mediaDevices.getDisplayMedia();
const transport = new OpenAIRealtimeWebRTC({
mediaStream: customStream,
});
const session = await voiceAgent.createSession({
transport,
});
```
---
## Network Considerations
### WebRTC
- **Firewall**: May require STUN/TURN servers
- **NAT Traversal**: Handles automatically
- **Bandwidth**: Adaptive (300 Kbps typical)
- **Port**: Dynamic (UDP preferred)
### WebSocket
- **Firewall**: Standard HTTPS port (443)
- **NAT Traversal**: Not needed
- **Bandwidth**: ~100 Kbps typical
- **Port**: 443 (wss://) or 80 (ws://)
---
## Security
### WebRTC
- Encrypted by default (DTLS-SRTP)
- Peer identity verification
- Media plane encryption
### WebSocket
- TLS encryption (wss://)
- Standard HTTPS security model
**Both are secure for production use.**
---
## Debugging Tips
### WebRTC
```javascript
// Enable WebRTC debug logs
localStorage.setItem('debug', 'webrtc:*');
// Monitor connection stats
session.transport.getStats().then(stats => {
console.log('RTT:', stats.roundTripTime);
console.log('Jitter:', stats.jitter);
});
```
### WebSocket
```javascript
// Monitor WebSocket frames in browser DevTools (Network tab)
// Or programmatically
session.transport.on('message', (data) => {
console.log('WS message:', data);
});
```
---
## Recommendations
### Production Voice UI (Browser)
```typescript
// Use WebRTC for best latency
transport: 'webrtc'
```
### Backend Processing (Node.js)
```typescript
// Use WebSocket for simplicity
const transport = new OpenAIRealtimeWebSocket({
apiKey: process.env.OPENAI_API_KEY,
});
```
### Development/Testing
```typescript
// Use WebSocket for easier debugging
transport: 'websocket'
```
### Mobile Apps
```typescript
// Use WebRTC for better quality
// Ensure WebRTC support in your framework
transport: 'webrtc'
```
---
## Migration Between Transports
Switching transports only requires changing one line:
```typescript
// From WebSocket
const session = new RealtimeSession(agent, {
transport: 'websocket',
});
// To WebRTC (just change transport)
const session = new RealtimeSession(agent, {
transport: 'webrtc',
});
// Everything else stays the same!
```
---
**Last Updated**: 2025-10-26
**Source**: [OpenAI Agents Docs - Voice Agents](https://openai.github.io/openai-agents-js/guides/voice-agents)