# Cloudflare Workers Integration

**Status**: Experimental Support

The OpenAI Agents SDK has experimental support for Cloudflare Workers: some features work well, while others run into hard platform limits.

---

## Compatibility

### What Works ✅

- Text agents (`Agent`, `run()`)
- Basic tool calling
- Structured outputs with Zod
- Streaming responses (with caveats)
- Environment variable access

### What Doesn't Work ❌

- Realtime voice agents (WebRTC not supported in Workers)
- Some Node.js APIs (timers, crypto edge cases)
- Long-running operations (CPU time limits)

### What's Experimental ⚠️

- Multi-agent handoffs (work, but untested at scale)
- Large context windows (may hit memory limits)
- Complex tool executions (CPU time limits)

---

## Setup

### 1. Install Dependencies

```bash
npm install @openai/agents zod hono
```

### 2. Configure wrangler.jsonc

```jsonc
{
  "name": "openai-agents-worker",
  "main": "src/index.ts",
  "compatibility_date": "2025-10-26",
  "compatibility_flags": ["nodejs_compat"], // Required for the OpenAI SDK's Node.js APIs

  "observability": {
    "enabled": true
  },

  "limits": {
    "cpu_ms": 30000 // Adjust based on agent complexity
  }
}
```

Note: use the `nodejs_compat` compatibility flag rather than the legacy `node_compat` option; wrangler rejects configs that set both.

### 3. Set Environment Variable

```bash
# Set OPENAI_API_KEY secret
wrangler secret put OPENAI_API_KEY

# Enter your OpenAI API key when prompted
```
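
For local development with `wrangler dev`, the same key can come from a `.dev.vars` file (dotenv-style, kept out of source control) instead of a deployed secret:

```bash
# .dev.vars - loaded automatically by `wrangler dev`, never committed
OPENAI_API_KEY=sk-...
```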

---

## Basic Worker Example

```typescript
import { Agent, run } from '@openai/agents';

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    if (request.method !== 'POST') {
      return new Response('Method not allowed', { status: 405 });
    }

    try {
      const { message } = (await request.json()) as { message: string };

      // Set the API key from the Worker environment
      process.env.OPENAI_API_KEY = env.OPENAI_API_KEY;

      const agent = new Agent({
        name: 'Assistant',
        instructions: 'You are helpful.',
        model: 'gpt-4o-mini', // Use smaller models for faster responses
      });

      const result = await run(agent, message, {
        maxTurns: 5, // Limit turns to control execution time
      });

      return new Response(JSON.stringify({
        response: result.finalOutput,
        tokens: result.usage.totalTokens,
      }), {
        headers: { 'Content-Type': 'application/json' },
      });

    } catch (error) {
      // `error` is `unknown` in TypeScript; narrow it before reading .message
      const message = error instanceof Error ? error.message : String(error);
      return new Response(JSON.stringify({ error: message }), {
        status: 500,
        headers: { 'Content-Type': 'application/json' },
      });
    }
  },
};

interface Env {
  OPENAI_API_KEY: string;
}
```
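
With the Worker running locally (`wrangler dev` serves on port 8787 by default), a quick smoke test might look like:

```bash
curl -X POST http://localhost:8787 \
  -H 'Content-Type: application/json' \
  -d '{"message": "What is the capital of France?"}'
```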

---

## Hono Integration

```typescript
import { Hono } from 'hono';
import { Agent, run } from '@openai/agents';

const app = new Hono<{ Bindings: { OPENAI_API_KEY: string } }>();

app.post('/api/agent', async (c) => {
  const { message } = await c.req.json();

  process.env.OPENAI_API_KEY = c.env.OPENAI_API_KEY;

  const agent = new Agent({
    name: 'Assistant',
    instructions: 'You are helpful.',
  });

  const result = await run(agent, message);

  return c.json({
    response: result.finalOutput,
  });
});

export default app;
```

**See Template**: `templates/cloudflare-workers/worker-agent-hono.ts`

---

## Streaming Responses

Streaming works but requires careful handling:

```typescript
const stream = await run(agent, message, { stream: true });

const { readable, writable } = new TransformStream();
const writer = writable.getWriter();
const encoder = new TextEncoder();

// Pump the agent stream into the response body in the background
(async () => {
  try {
    for await (const event of stream) {
      if (event.type === 'raw_model_stream_event') {
        // Pull the text delta out of the raw model event
        const chunk = event.data?.choices?.[0]?.delta?.content || '';
        if (chunk) {
          await writer.write(encoder.encode(`data: ${chunk}\n\n`));
        }
      }
    }
    await stream.completed;
  } finally {
    await writer.close();
  }
})();

return new Response(readable, {
  headers: {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
  },
});
```
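
On the client side, the SSE body can be consumed with a plain `fetch` reader. A minimal sketch, assuming the Worker is deployed at a placeholder URL:

```typescript
const response = await fetch('https://your-worker.example.workers.dev', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ message: 'Tell me a joke' }),
});

// Read the event stream incrementally as chunks arrive
const reader = response.body!.getReader();
const decoder = new TextDecoder();

for (;;) {
  const { done, value } = await reader.read();
  if (done) break;
  // Each chunk carries one or more `data: ...` SSE lines
  console.log(decoder.decode(value, { stream: true }));
}
```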

---

## Known Limitations

### 1. CPU Time Limits
Workers CPU time is capped (10 ms per request on the Free plan; the Paid plan defaults to 30 s, adjustable via `limits.cpu_ms`).

**Solution**: Use smaller models and limit `maxTurns`. Note that the model is set on the Agent, not in the `run()` options:

```typescript
const agent = new Agent({
  name: 'Assistant',
  instructions: 'You are helpful.',
  model: 'gpt-4o-mini', // Faster model
});

const result = await run(agent, message, {
  maxTurns: 3, // Limit turns
});
```

### 2. Memory Limits
Large context windows may hit the Workers memory limit (128 MB per isolate by default).

**Solution**: Keep conversations concise and summarize history:

```typescript
const agent = new Agent({
  name: 'Assistant',
  instructions: 'Keep responses concise. Summarize context when needed.',
});
```
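
Beyond prompt instructions, you can bound memory growth by trimming the history you carry between turns. A minimal sketch of that idea; it assumes the SDK's multi-turn pattern of passing `result.history` back into `run()`, and the `MAX_HISTORY_ITEMS` cap is a hypothetical number to tune:

```typescript
import { Agent, run } from '@openai/agents';

const MAX_HISTORY_ITEMS = 20; // hypothetical cap; tune to your context budget

const agent = new Agent({ name: 'Assistant', instructions: 'You are helpful.' });

let result = await run(agent, 'First question');

// On later turns, carry forward only the most recent items so
// memory and token usage stay bounded
const trimmed = result.history.slice(-MAX_HISTORY_ITEMS);
result = await run(agent, [
  ...trimmed,
  { role: 'user', content: 'Follow-up question' },
]);
```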

### 3. No Realtime Voice
WebRTC is not supported in the Workers runtime.

**Solution**: Use realtime agents in Next.js or other Node.js environments.

### 4. Cold Starts
The first request after a period of inactivity may be slow.

**Solution**: Use warm-up requests or keep the Worker warm with cron triggers, as sketched below.
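
A minimal warm-up setup, assuming a five-minute cadence suits your traffic: add a cron trigger in `wrangler.jsonc` and a `scheduled` handler next to `fetch`:

```jsonc
"triggers": {
  "crons": ["*/5 * * * *"] // fire every 5 minutes
}
```

```typescript
export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // ... agent handling as in the Basic Worker Example
    return new Response('ok');
  },

  // Invoked by the cron trigger; even a no-op keeps the isolate warm
  async scheduled(controller: ScheduledController, env: Env, ctx: ExecutionContext): Promise<void> {
    console.log('Warm-up tick at', new Date(controller.scheduledTime).toISOString());
  },
};
```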

---

## Performance Tips

### 1. Use Smaller Models
```typescript
model: 'gpt-4o-mini' // Faster than gpt-4o
```

### 2. Limit Turns
```typescript
maxTurns: 3 // Prevent long-running loops
```

### 3. Stream Responses
```typescript
stream: true // Start returning data faster
```

### 4. Cache Results
```typescript
// Cache frequent responses in KV (requires a KV namespace bound as `KV`)
const cacheKey = `agent:${message}`;
const cached = await env.KV.get(cacheKey);
if (cached) {
  return new Response(cached, { headers: { 'Content-Type': 'application/json' } });
}

const result = await run(agent, message);
const body = JSON.stringify({ response: result.finalOutput });
await env.KV.put(cacheKey, body, { expirationTtl: 3600 }); // Cache for 1 hour
return new Response(body, { headers: { 'Content-Type': 'application/json' } });
```
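
The `env.KV` binding above assumes a KV namespace declared in `wrangler.jsonc`; the `id` below is a placeholder for your own namespace:

```jsonc
"kv_namespaces": [
  { "binding": "KV", "id": "<your-namespace-id>" }
]
```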

### 5. Use Durable Objects for State
```typescript
// Store agent state in a Durable Object for long conversations
export class AgentSession {
  constructor(private state: DurableObjectState) {}

  async fetch(request: Request): Promise<Response> {
    // Load history from this.state.storage, run the agent,
    // then persist the updated history across requests
    return new Response('...');
  }
}
```
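
Durable Objects also need a binding and a migration entry in `wrangler.jsonc`; a minimal sketch using the `AgentSession` class above:

```jsonc
"durable_objects": {
  "bindings": [{ "name": "AGENT_SESSION", "class_name": "AgentSession" }]
},
"migrations": [
  { "tag": "v1", "new_classes": ["AgentSession"] }
]
```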

---

## Deployment

```bash
# Build and deploy
npm run build
wrangler deploy

# Test locally
wrangler dev
```

---

## Cost Considerations

**Workers Costs** (Paid plan):
- Requests: $0.30 per million beyond the 10 million included per month (Free plan: 100k requests/day)
- CPU time: $0.02 per million CPU-ms beyond the 30 million included per month

**OpenAI Costs**:
- GPT-4o-mini: $0.15 / 1M input tokens, $0.60 / 1M output tokens
- GPT-4o: $2.50 / 1M input tokens, $10.00 / 1M output tokens

**Example**: 1M agent requests (avg 500 tokens each)
- Workers: ~$5 (the Paid plan base fee; 1M requests fits in the included quota)
- GPT-4o-mini: ~$75
- **Total**: ~$80

**Use gpt-4o-mini for cost efficiency!**

---

## Monitoring

```typescript
// Log execution time
const start = Date.now();
const result = await run(agent, message);
const duration = Date.now() - start;

console.log(`Agent execution: ${duration}ms`);
console.log(`Tokens used: ${result.usage.totalTokens}`);
```

Enable Workers observability in wrangler.jsonc:

```jsonc
"observability": {
  "enabled": true,
  "head_sampling_rate": 0.1 // Sample 10% of requests
}
```

---

## Error Handling

```typescript
try {
  const result = await run(agent, message, {
    maxTurns: 5,
  });
  return result;

} catch (error) {
  // `error` is `unknown` in TypeScript; narrow it before inspecting
  const msg = error instanceof Error ? error.message : String(error);

  if (msg.includes('CPU time limit')) {
    // Hit the Workers CPU limit - reduce complexity
    return { error: 'Request too complex' };
  }

  if (msg.includes('memory')) {
    // Hit the memory limit - reduce context
    return { error: 'Context too large' };
  }

  throw error;
}
```
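
Because a Worker that exceeds its CPU budget is terminated abruptly, it can help to fail fast on wall-clock time before the platform does. A minimal sketch using `Promise.race`; the helper name and the 25-second budget are assumptions, not SDK features:

```typescript
// Hypothetical helper: race an agent run against a soft deadline
async function runWithDeadline<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error('Agent run timed out')), ms);
  });
  try {
    return await Promise.race([work, deadline]);
  } finally {
    clearTimeout(timer);
  }
}

const result = await runWithDeadline(run(agent, message, { maxTurns: 5 }), 25_000);
```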

---

## Alternatives

If Workers limitations are problematic:

1. **Cloudflare Pages Functions** (same runtime, so the same limits apply)
2. **Next.js on Vercel** (better Node.js support)
3. **Node.js on Railway/Render** (full Node.js environment)
4. **AWS Lambda** (longer timeouts, more memory)

---

**Last Updated**: 2025-10-26

**Status**: Experimental - test thoroughly before production use