# Cloudflare Workers Integration
**Status**: Experimental Support
The OpenAI Agents SDK has experimental support for Cloudflare Workers. Some features work; others have limitations.
---
## Compatibility
### What Works ✅
- Text agents (`Agent`, `run()`)
- Basic tool calling
- Structured outputs with Zod
- Streaming responses (with caveats)
- Environment variable access
### What Doesn't Work ❌
- Realtime voice agents (WebRTC not supported in Workers)
- Some Node.js APIs (timers, crypto edge cases)
- Long-running operations (CPU time limits)
### What's Experimental ⚠️
- Multi-agent handoffs (works but untested at scale)
- Large context windows (may hit memory limits)
- Complex tool executions (CPU time limits)
---
## Setup
### 1. Install Dependencies
```bash
npm install @openai/agents zod hono
```
### 2. Configure wrangler.jsonc
```jsonc
{
  "name": "openai-agents-worker",
  "main": "src/index.ts",
  "compatibility_date": "2025-10-26",
  "compatibility_flags": ["nodejs_compat"], // Node.js APIs required by the OpenAI SDK
  "observability": {
    "enabled": true
  },
  "limits": {
    "cpu_ms": 30000 // Adjust based on agent complexity (paid plans only)
  }
}
```
### 3. Set Environment Variable
```bash
# Set OPENAI_API_KEY secret
wrangler secret put OPENAI_API_KEY
# Enter your OpenAI API key when prompted
```
---
## Basic Worker Example
```typescript
import { Agent, run } from '@openai/agents';

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    if (request.method !== 'POST') {
      return new Response('Method not allowed', { status: 405 });
    }
    try {
      const { message } = (await request.json()) as { message: string };
      // Set API key from environment
      process.env.OPENAI_API_KEY = env.OPENAI_API_KEY;
      const agent = new Agent({
        name: 'Assistant',
        instructions: 'You are helpful.',
        model: 'gpt-4o-mini', // Use smaller models for faster responses
      });
      const result = await run(agent, message, {
        maxTurns: 5, // Limit turns to control execution time
      });
      return new Response(
        JSON.stringify({
          response: result.finalOutput,
          tokens: result.usage.totalTokens,
        }),
        { headers: { 'Content-Type': 'application/json' } },
      );
    } catch (error) {
      // Catch parameters are `unknown` in TypeScript, so narrow before reading .message
      const msg = error instanceof Error ? error.message : String(error);
      return new Response(JSON.stringify({ error: msg }), {
        status: 500,
        headers: { 'Content-Type': 'application/json' },
      });
    }
  },
};

interface Env {
  OPENAI_API_KEY: string;
}
```
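Once deployed, the endpoint can be smoke-tested with curl (substitute your own `workers.dev` subdomain):
```bash
curl -X POST https://openai-agents-worker.<your-subdomain>.workers.dev \
  -H 'Content-Type: application/json' \
  -d '{"message": "Hello!"}'
```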
---
## Hono Integration
```typescript
import { Hono } from 'hono';
import { Agent, run } from '@openai/agents';

const app = new Hono<{ Bindings: { OPENAI_API_KEY: string } }>();

app.post('/api/agent', async (c) => {
  const { message } = await c.req.json<{ message: string }>();
  process.env.OPENAI_API_KEY = c.env.OPENAI_API_KEY;
  const agent = new Agent({
    name: 'Assistant',
    instructions: 'You are helpful.',
  });
  const result = await run(agent, message);
  return c.json({
    response: result.finalOutput,
  });
});

export default app;
```
**See Template**: `templates/cloudflare-workers/worker-agent-hono.ts`
---
## Streaming Responses
Streaming works but requires careful handling:
```typescript
const stream = await run(agent, message, { stream: true });

const { readable, writable } = new TransformStream();
const writer = writable.getWriter();
const encoder = new TextEncoder();

// Stream in the background; the open response body keeps the
// request context alive while the client reads
(async () => {
  try {
    for await (const event of stream) {
      if (event.type === 'raw_model_stream_event') {
        const chunk = event.data?.choices?.[0]?.delta?.content || '';
        if (chunk) {
          await writer.write(encoder.encode(`data: ${chunk}\n\n`));
        }
      }
    }
    await stream.completed;
  } finally {
    await writer.close();
  }
})();

return new Response(readable, {
  headers: {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
  },
});
```
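On the client side, the SSE stream can be consumed with `fetch` and a reader. A minimal sketch (the `/api/agent` path is a placeholder for wherever you mount the handler above):
```typescript
// Consume the SSE stream emitted by the Worker above
const res = await fetch('/api/agent', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ message: 'Hello!' }),
});

const reader = res.body!.getReader();
const decoder = new TextDecoder();
let text = '';
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // Each chunk holds one or more `data: ...` SSE lines
  text += decoder.decode(value, { stream: true });
}
console.log(text);
```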
---
## Known Limitations
### 1. CPU Time Limits
Workers enforce CPU time limits: 10 ms per request on the Free plan, and up to 30 s on paid plans (configurable via `limits.cpu_ms`).
**Solution**: Use smaller models and limit `maxTurns`:
```typescript
const agent = new Agent({
  name: 'Assistant',
  instructions: 'You are helpful.',
  model: 'gpt-4o-mini', // Faster model; set on the agent, not the run() call
});
const result = await run(agent, message, {
  maxTurns: 3, // Limit turns
});
```
### 2. Memory Limits
Large context windows may hit the Workers memory limit (128 MB per isolate).
**Solution**: Keep conversations concise, summarize history:
```typescript
const agent = new Agent({
  name: 'Assistant',
  instructions: 'Keep responses concise. Summarize context when needed.',
});
```
### 3. No Realtime Voice
WebRTC not supported in Workers runtime.
**Solution**: Use realtime agents in Next.js or other Node.js environments.
### 4. Cold Starts
First request after inactivity may be slow.
**Solution**: Use warm-up requests or keep Workers warm with cron triggers, as sketched below.
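A hedged sketch of the cron approach (assumes a `triggers.crons` entry such as `"*/5 * * * *"` in wrangler.jsonc; note that warm isolates in every location are not guaranteed):
```typescript
export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // ... normal agent handling as in the basic example ...
    return new Response('ok');
  },
  // Invoked on the cron schedule; a near no-op body is enough to exercise the Worker
  async scheduled(controller: ScheduledController, env: Env, ctx: ExecutionContext): Promise<void> {
    console.log('warm-up tick', controller.scheduledTime);
  },
};
```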
---
## Performance Tips
### 1. Use Smaller Models
```typescript
model: 'gpt-4o-mini' // Faster than gpt-4o
```
### 2. Limit Turns
```typescript
maxTurns: 3 // Prevent long-running loops
```
### 3. Stream Responses
```typescript
stream: true // Start returning data faster
```
### 4. Cache Results
```typescript
// Cache frequent queries in KV (derive cacheKey from the input, e.g. a hash of the message)
const cached = await env.KV.get(cacheKey);
if (cached) return new Response(cached);
const result = await run(agent, message);
// KV stores strings, so serialize the output before caching
await env.KV.put(cacheKey, String(result.finalOutput), { expirationTtl: 3600 });
```
### 5. Use Durable Objects for State
```typescript
// Store agent state in a Durable Object for long conversations
export class AgentSession {
  async fetch(request: Request): Promise<Response> {
    // Maintain conversation state across requests
    return new Response('TODO');
  }
}
```
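A slightly fuller sketch using the classic fetch-based Durable Object API; the `history` key and response shape are illustrative, and the class must still be bound in wrangler.jsonc:
```typescript
export class AgentSession {
  constructor(private state: DurableObjectState) {}

  async fetch(request: Request): Promise<Response> {
    const { message } = (await request.json()) as { message: string };

    // Load prior turns for this session from durable storage
    const history = (await this.state.storage.get<string[]>('history')) ?? [];
    history.push(message);

    // ... run the agent with the accumulated history here ...

    await this.state.storage.put('history', history);
    return new Response(JSON.stringify({ turns: history.length }), {
      headers: { 'Content-Type': 'application/json' },
    });
  }
}
```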
---
## Deployment
```bash
# Test locally
wrangler dev

# Build and deploy
npm run build
wrangler deploy
```
---
## Cost Considerations
**Workers Costs**:
- Requests: $0.15 per million (after 100k free/day)
- CPU Time: $0.02 per million CPU-ms (after 10ms free per request)
**OpenAI Costs**:
- GPT-4o-mini: $0.15 / 1M input tokens, $0.60 / 1M output tokens
- GPT-4o: $2.50 / 1M input tokens, $10.00 / 1M output tokens
**Example**: 1M agent requests (avg 500 tokens each)
- Workers: ~$1.50
- GPT-4o-mini: ~$75
- **Total**: ~$76.50
**Use gpt-4o-mini for cost efficiency!**
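As a sanity check on the figures above, a back-of-the-envelope calculation (assuming ~65 CPU-ms per request and mostly input tokens; both are assumptions, not measurements):
```typescript
const requests = 1_000_000;
const cpuMsPerRequest = 65;   // assumption: typical agent request
const tokensPerRequest = 500; // assumption: mostly input tokens

const workersRequests = (requests / 1e6) * 0.15;                // $0.15 per M requests
const workersCpu = ((requests * cpuMsPerRequest) / 1e6) * 0.02; // $0.02 per M CPU-ms
const openai = ((requests * tokensPerRequest) / 1e6) * 0.15;    // gpt-4o-mini input rate

console.log((workersRequests + workersCpu).toFixed(2)); // ≈ 1.45 (Workers)
console.log(openai.toFixed(2));                         // ≈ 75.00 (OpenAI)
```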
---
## Monitoring
```typescript
// Log execution time
const start = Date.now();
const result = await run(agent, message);
const duration = Date.now() - start;
console.log(`Agent execution: ${duration}ms`);
console.log(`Tokens used: ${result.usage.totalTokens}`);
```
Enable Workers observability in wrangler.jsonc:
```jsonc
"observability": {
"enabled": true,
"head_sampling_rate": 0.1
}
```
---
## Error Handling
```typescript
try {
  const result = await run(agent, message, {
    maxTurns: 5,
  });
  return result;
} catch (error) {
  // Catch parameters are `unknown` in TypeScript, so narrow before reading .message
  const msg = error instanceof Error ? error.message : String(error);
  if (msg.includes('CPU time limit')) {
    // Hit the Workers CPU limit - reduce complexity
    return { error: 'Request too complex' };
  }
  if (msg.includes('memory')) {
    // Hit the memory limit - reduce context
    return { error: 'Context too large' };
  }
  throw error;
}
```
---
## Alternatives
If Workers limitations are problematic:
1. **Cloudflare Pages Functions** (same runtime and limits, so it rarely helps)
2. **Next.js on Vercel** (better Node.js support)
3. **Node.js on Railway/Render** (full Node.js environment)
4. **AWS Lambda** (longer timeouts, more memory)
---
**Last Updated**: 2025-10-26
**Status**: Experimental - test thoroughly before production use