# Cloudflare Workers Integration

**Status**: Experimental Support

The OpenAI Agents SDK has experimental support for Cloudflare Workers: some features work, while others are limited or unsupported.

---

## Compatibility

### What Works ✅
- Text agents (`Agent`, `run()`)
- Basic tool calling (see the sketch after these lists)
- Structured outputs with Zod
- Streaming responses (with caveats)
- Environment variable access

### What Doesn't Work ❌
- Realtime voice agents (WebRTC is not supported in Workers)
- Some Node.js APIs (timers, crypto edge cases)
- Long-running operations (CPU time limits)

### What's Experimental ⚠️
- Multi-agent handoffs (work, but untested at scale)
- Large context windows (may hit memory limits)
- Complex tool executions (CPU time limits)
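
To make the ✅ items concrete, here is a minimal sketch of a tool-calling agent with a Zod-typed structured output. It assumes the `tool` helper and the `outputType` agent option from `@openai/agents`; the weather tool and all its names are illustrative, not part of this reference.

```typescript
import { Agent, run, tool } from '@openai/agents';
import { z } from 'zod';

// Hypothetical tool: name, schema, and implementation are illustrative
const getWeather = tool({
  name: 'get_weather',
  description: 'Return the weather for a city',
  parameters: z.object({ city: z.string() }),
  execute: async ({ city }) => `Sunny in ${city}`,
});

const agent = new Agent({
  name: 'WeatherAgent',
  instructions: 'Answer weather questions using the tool.',
  tools: [getWeather],
  // Structured output: finalOutput is parsed against this schema
  outputType: z.object({ city: z.string(), forecast: z.string() }),
});

const result = await run(agent, 'Weather in Lisbon?');
// result.finalOutput is typed per outputType: { city, forecast }
```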

---

## Setup

### 1. Install Dependencies

```bash
npm install @openai/agents zod hono
```

### 2. Configure wrangler.jsonc

```jsonc
{
  "name": "openai-agents-worker",
  "main": "src/index.ts",
  "compatibility_date": "2025-10-26",
  "compatibility_flags": ["nodejs_compat"], // Required for the OpenAI SDK

  "observability": {
    "enabled": true
  },

  "limits": {
    "cpu_ms": 30000 // Paid plans only; adjust based on agent complexity
  }
}
```

Note: the legacy `node_compat` field cannot be combined with the `nodejs_compat` compatibility flag; use the flag only.

### 3. Set Environment Variable

```bash
# Set OPENAI_API_KEY secret
wrangler secret put OPENAI_API_KEY

# Enter your OpenAI API key when prompted
```
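
For local development with `wrangler dev`, put `OPENAI_API_KEY` in a `.dev.vars` file instead; `wrangler secret` only affects deployed Workers.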

---

## Basic Worker Example

```typescript
import { Agent, run } from '@openai/agents';

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    if (request.method !== 'POST') {
      return new Response('Method not allowed', { status: 405 });
    }

    try {
      const { message } = await request.json() as { message: string };

      // Make the key visible to the SDK (requires nodejs_compat)
      process.env.OPENAI_API_KEY = env.OPENAI_API_KEY;

      const agent = new Agent({
        name: 'Assistant',
        instructions: 'You are helpful.',
        model: 'gpt-4o-mini', // Use smaller models for faster responses
      });

      const result = await run(agent, message, {
        maxTurns: 5, // Limit turns to control execution time
      });

      return new Response(JSON.stringify({
        response: result.finalOutput,
        tokens: result.usage.totalTokens,
      }), {
        headers: { 'Content-Type': 'application/json' },
      });

    } catch (error) {
      // Narrow the unknown error before reading its message
      const msg = error instanceof Error ? error.message : String(error);
      return new Response(JSON.stringify({ error: msg }), {
        status: 500,
        headers: { 'Content-Type': 'application/json' },
      });
    }
  },
};

interface Env {
  OPENAI_API_KEY: string;
}
```
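
A quick smoke test against the deployed Worker (the URL is a placeholder for your own `workers.dev` subdomain):

```typescript
// Hypothetical client-side check; adjust the URL to your deployment
const res = await fetch('https://openai-agents-worker.example.workers.dev', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ message: 'Hello!' }),
});
console.log(await res.json()); // { response: "...", tokens: ... }
```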

---

## Hono Integration

```typescript
import { Hono } from 'hono';
import { Agent, run } from '@openai/agents';

const app = new Hono<{ Bindings: { OPENAI_API_KEY: string } }>();

app.post('/api/agent', async (c) => {
  const { message } = await c.req.json<{ message: string }>();

  process.env.OPENAI_API_KEY = c.env.OPENAI_API_KEY;

  const agent = new Agent({
    name: 'Assistant',
    instructions: 'You are helpful.',
  });

  const result = await run(agent, message);

  return c.json({
    response: result.finalOutput,
  });
});

export default app;
```

**See Template**: `templates/cloudflare-workers/worker-agent-hono.ts`

---

## Streaming Responses

Streaming works, but the SDK's event stream has to be bridged to a `Response` by hand:

```typescript
const stream = await run(agent, message, { stream: true });

const { readable, writable } = new TransformStream();
const writer = writable.getWriter();
const encoder = new TextEncoder();

// Stream in background
(async () => {
  try {
    for await (const event of stream) {
      if (event.type === 'raw_model_stream_event') {
        const chunk = event.data?.choices?.[0]?.delta?.content || '';
        if (chunk) {
          await writer.write(encoder.encode(`data: ${chunk}\n\n`));
        }
      }
    }
    await stream.completed;
  } finally {
    await writer.close();
  }
})();

return new Response(readable, {
  headers: {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
  },
});
```
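
On the client, the resulting SSE body can be read with `fetch` and a stream reader; this browser-side sketch assumes an endpoint wired to the handler above:

```typescript
// Hypothetical consumer of the text/event-stream response
const res = await fetch('/api/agent-stream', {
  method: 'POST',
  body: JSON.stringify({ message: 'Hello!' }),
});
const reader = res.body!.getReader();
const decoder = new TextDecoder();
for (;;) {
  const { done, value } = await reader.read();
  if (done) break;
  // Each chunk contains one or more "data: ..." SSE lines
  console.log(decoder.decode(value, { stream: true }));
}
```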

---

## Known Limitations

### 1. CPU Time Limits
Workers cap CPU time per request: 10 ms on the free plan, and 30 s by default on paid plans (adjustable via `limits.cpu_ms`).

**Solution**: Use smaller models and limit `maxTurns`:

```typescript
// Note: the model is set on the Agent, not in run() options
const agent = new Agent({ name: 'Assistant', model: 'gpt-4o-mini' }); // Faster model
const result = await run(agent, message, {
  maxTurns: 3, // Limit turns
});
```

### 2. Memory Limits
Large context windows may hit the 128 MB Worker memory limit.

**Solution**: Keep conversations concise and summarize history:

```typescript
const agent = new Agent({
  name: 'Assistant',
  instructions: 'Keep responses concise. Summarize context when needed.',
});
```
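
Beyond prompt instructions, history can be capped before each run. A sketch, assuming `history` holds input items carried over from a prior run's result (the exact item shape follows the SDK's input types):

```typescript
// Keep only the most recent items to bound memory and token usage
const MAX_ITEMS = 20;
const trimmed = history.slice(-MAX_ITEMS);
const result = await run(agent, [
  ...trimmed,
  { role: 'user', content: message },
]);
```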

### 3. No Realtime Voice
WebRTC is not supported in the Workers runtime.

**Solution**: Use realtime agents in Next.js or other Node.js environments.

### 4. Cold Starts
The first request after a period of inactivity may be slow.

**Solution**: Use warm-up requests or keep Workers warm with cron triggers, as sketched below.
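
A minimal sketch of the cron approach, assuming a `"triggers": { "crons": ["*/5 * * * *"] }` entry in wrangler.jsonc (Cloudflare does not guarantee isolates stay warm, so treat this as best-effort):

```typescript
// The scheduled handler runs on the cron schedule alongside fetch
export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // ... normal agent handling ...
    return new Response('ok');
  },
  async scheduled(controller: ScheduledController, env: Env, ctx: ExecutionContext): Promise<void> {
    // No-op invocation; its only purpose is to exercise the Worker periodically
  },
};
```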

---

## Performance Tips

### 1. Use Smaller Models
```typescript
model: 'gpt-4o-mini' // Faster than gpt-4o
```

### 2. Limit Turns
```typescript
maxTurns: 3 // Prevent long-running loops
```

### 3. Stream Responses
```typescript
stream: true // Start returning data faster
```

### 4. Cache Results
```typescript
// Cache frequent queries in KV (assumes a KV namespace bound as `KV`)
const cached = await env.KV.get(cacheKey);
if (cached) return new Response(cached);

const result = await run(agent, message);
// KV stores strings, so serialize the output
await env.KV.put(cacheKey, String(result.finalOutput), { expirationTtl: 3600 });
```
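
The snippet assumes a `cacheKey`; one way to derive it from the message with Web Crypto (available in Workers):

```typescript
// SHA-256 of the message, hex-encoded, as a stable cache key
const digest = await crypto.subtle.digest(
  'SHA-256',
  new TextEncoder().encode(message),
);
const cacheKey = [...new Uint8Array(digest)]
  .map((b) => b.toString(16).padStart(2, '0'))
  .join('');
```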

### 5. Use Durable Objects for State
```typescript
// Store conversation state in a Durable Object for long conversations
export class AgentSession {
  constructor(private state: DurableObjectState) {}

  async fetch(request: Request): Promise<Response> {
    // Load history from this.state.storage, run the agent,
    // persist the updated history, then return the reply
    return new Response('TODO');
  }
}
```
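
Requests reach a session through a Durable Object binding; assuming one bound as `AGENT_SESSION` and a per-conversation ID:

```typescript
// One Durable Object instance per conversation
const id = env.AGENT_SESSION.idFromName(conversationId);
const session = env.AGENT_SESSION.get(id);
return session.fetch(request);
```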

---

## Deployment

```bash
# Test locally
wrangler dev

# Build and deploy
npm run build
wrangler deploy
```

---

## Cost Considerations

**Workers Costs**:
- Requests: $0.15 per million (after 100k free/day)
- CPU Time: $0.02 per million CPU-ms (after 10ms free per request)

**OpenAI Costs**:
- GPT-4o-mini: $0.15 / 1M input tokens, $0.60 / 1M output tokens
- GPT-4o: $2.50 / 1M input tokens, $10.00 / 1M output tokens

**Example**: 1M agent requests (avg 500 input tokens each)
- Workers: ~$1.50
- GPT-4o-mini: ~$75
- **Total**: ~$76.50
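
A quick check of the arithmetic above, at the listed rates:

```typescript
// Back-of-envelope verification of the example (rates as listed above)
const requests = 1_000_000;
const inputTokens = requests * 500;            // 500 input tokens per request
const openaiCost = (inputTokens / 1e6) * 0.15; // = $75 (gpt-4o-mini input rate)
const requestCost = (requests / 1e6) * 0.15;   // = $0.15
// The remaining ~$1.35 of the Workers estimate is billable CPU time
```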

**Use gpt-4o-mini for cost efficiency!**

---

## Monitoring

```typescript
// Log execution time
const start = Date.now();
const result = await run(agent, message);
const duration = Date.now() - start;

console.log(`Agent execution: ${duration}ms`);
console.log(`Tokens used: ${result.usage.totalTokens}`);
```

Enable Workers observability in wrangler.jsonc:

```jsonc
"observability": {
  "enabled": true,
  "head_sampling_rate": 0.1
}
```

---

## Error Handling

```typescript
try {
  const result = await run(agent, message, {
    maxTurns: 5,
  });
  return result;

} catch (error) {
  // Narrow the unknown error before inspecting its message
  const msg = error instanceof Error ? error.message : String(error);

  if (msg.includes('CPU time limit')) {
    // Hit Workers CPU limit - reduce complexity
    return { error: 'Request too complex' };
  }

  if (msg.includes('memory')) {
    // Hit memory limit - reduce context
    return { error: 'Context too large' };
  }

  throw error;
}
```

---

## Alternatives

If Workers limitations are problematic:

1. **Cloudflare Pages Functions** (same runtime, so the same limits apply)
2. **Next.js on Vercel** (better Node.js support)
3. **Node.js on Railway/Render** (full Node.js environment)
4. **AWS Lambda** (longer timeouts, more memory)

---

**Last Updated**: 2025-10-26
**Status**: Experimental - test thoroughly before production use