Embedding Models Reference
Complete guide for generating vector embeddings with Workers AI and OpenAI.
Model Comparison
| Model | Provider | Dimensions | Metric | Cost | Performance |
|---|---|---|---|---|---|
| @cf/baai/bge-base-en-v1.5 | Workers AI | 768 | cosine | Free | Fast, edge-optimized |
| text-embedding-3-small | OpenAI | 1536 | cosine | $0.02/1M tokens | High quality, affordable |
| text-embedding-3-large | OpenAI | 3072 | cosine | $0.13/1M tokens | Highest accuracy |
| text-embedding-ada-002 | OpenAI (legacy) | 1536 | cosine | $0.10/1M tokens | Superseded by the 3-series |
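Since the index dimension must match the model exactly (see Dimension Mismatch Debugging below), it helps to keep these settings in one place. A minimal sketch; the MODEL_CONFIG map is an illustrative convention, not part of either SDK:
// One source of truth for model-to-index settings (illustrative)
const MODEL_CONFIG = {
  '@cf/baai/bge-base-en-v1.5': { dimensions: 768, metric: 'cosine' },
  'text-embedding-3-small': { dimensions: 1536, metric: 'cosine' },
  'text-embedding-3-large': { dimensions: 3072, metric: 'cosine' },
} as const;

type EmbeddingModel = keyof typeof MODEL_CONFIG;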
Workers AI (@cf/baai/bge-base-en-v1.5)
Best for: Production apps requiring free, fast embeddings with good quality.
Configuration
# Create index with 768 dimensions
npx wrangler vectorize create my-index \
--dimensions=768 \
--metric=cosine
Wrangler Binding
{
"ai": {
"binding": "AI"
},
"vectorize": [
{
"binding": "VECTORIZE_INDEX",
"index_name": "my-index"
}
]
}
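The matching Worker types, assuming @cloudflare/workers-types is installed (it provides the Ai and VectorizeIndex binding types):
export interface Env {
  AI: Ai;
  VECTORIZE_INDEX: VectorizeIndex;
}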
Single Text
const embedding = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
text: "Cloudflare Workers are serverless functions."
});
// embedding.data[0] is number[] with 768 dimensions
await env.VECTORIZE_INDEX.upsert([{
id: 'doc-1',
values: embedding.data[0],
metadata: { title: 'Workers Intro' }
}]);
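Querying follows the same pattern: embed the search text with the same model, then pass the raw vector to the index. A minimal sketch:
// Embed the query with the same model used at index time
const query = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
  text: "What are Workers?"
});

// topK caps how many nearest neighbors come back
const results = await env.VECTORIZE_INDEX.query(query.data[0], {
  topK: 5,
  returnMetadata: true
});
// results.matches: [{ id, score, metadata }, ...]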
Batch Embeddings
const texts = [
"Document 1 content",
"Document 2 content",
"Document 3 content"
];
const embeddings = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
text: texts // Array of strings
});
// embeddings.data is number[][] (array of 768-dim vectors)
const vectors = texts.map((text, i) => ({
id: `doc-${i}`,
values: embeddings.data[i],
metadata: { content: text }
}));
await env.VECTORIZE_INDEX.upsert(vectors);
Error Handling
try {
const embedding = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
text: userQuery
});
if (!embedding?.data?.[0]) {
throw new Error('No embedding returned');
}
// Use embedding
} catch (error) {
console.error('Embedding generation failed:', error);
// Fallback logic
}
Limits
- Max input length: ~512 tokens (~2000 characters)
- Batch size: Up to 100 texts per request (see the chunking sketch below)
- Rate limits: Generous (Workers AI scales automatically)
- Cost: Covered by the Workers AI free daily allocation; sustained usage beyond it is billed under Workers AI pricing
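To embed more than 100 texts, split the input into batches first. A small sketch (the chunk helper is an illustrative utility, not part of the Workers AI API):
// Split an array into chunks of at most `size` items
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Embed in batches of 100 to stay within the per-request limit
const allVectors: number[][] = [];
for (const batch of chunk(texts, 100)) {
  const result = await env.AI.run('@cf/baai/bge-base-en-v1.5', { text: batch });
  allVectors.push(...result.data);
}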
OpenAI Embeddings
Best for: Higher-quality embeddings, longer inputs (8191 tokens), or multilingual content.
API Key Setup
Store API key as environment variable:
npx wrangler secret put OPENAI_API_KEY
text-embedding-3-small (1536 dimensions)
Best for: Cost-effective, high-quality embeddings.
Configuration
# Create index with 1536 dimensions
npx wrangler vectorize create my-index \
--dimensions=1536 \
--metric=cosine
Worker Code
import OpenAI from 'openai';
export interface Env {
OPENAI_API_KEY: string;
VECTORIZE_INDEX: VectorizeIndex;
}
export default {
async fetch(request: Request, env: Env): Promise<Response> {
const openai = new OpenAI({ apiKey: env.OPENAI_API_KEY });
// Single embedding
const response = await openai.embeddings.create({
model: "text-embedding-3-small",
input: "Text to embed",
encoding_format: "float" // Default
});
await env.VECTORIZE_INDEX.upsert([{
id: 'doc-1',
values: response.data[0].embedding, // 1536 dimensions
metadata: { model: 'openai-3-small' }
}]);
return Response.json({ success: true });
}
};
Batch Embeddings
const response = await openai.embeddings.create({
model: "text-embedding-3-small",
input: [
"Document 1",
"Document 2",
"Document 3"
]
});
// response.data preserves input order (each item also carries an .index field)
const vectors = response.data.map((item, i) => ({
id: `doc-${i}`,
values: item.embedding,
metadata: { index: i }
}));
await env.VECTORIZE_INDEX.upsert(vectors);
text-embedding-3-large (3072 dimensions)
Best for: Maximum accuracy, research, or high-stakes applications.
# Create index with 3072 dimensions
npx wrangler vectorize create high-accuracy-index \
--dimensions=3072 \
--metric=cosine
const response = await openai.embeddings.create({
model: "text-embedding-3-large",
input: "Text requiring high accuracy embedding"
});
await env.VECTORIZE_INDEX.upsert([{
id: 'doc-1',
values: response.data[0].embedding, // 3072 dimensions
metadata: { model: 'openai-3-large' }
}]);
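Both text-embedding-3 models also accept a dimensions parameter that returns a shortened, normalized embedding, trading some accuracy for a smaller index. For example, to fit 3-large into a 1024-dimension index:
// Request a shortened 1024-dim embedding from text-embedding-3-large
const response = await openai.embeddings.create({
  model: "text-embedding-3-large",
  input: "Text to embed",
  dimensions: 1024 // must match the index's configured dimension count
});
// response.data[0].embedding.length === 1024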
OpenAI Error Handling
try {
  const response = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: text
  });
  return response.data[0].embedding;
} catch (error) {
  // catch variables are `unknown` in TypeScript; narrow before reading .status
  if (error instanceof OpenAI.APIError) {
    if (error.status === 429) {
      console.error('Rate limited');
      // Retry with backoff (see the helper sketch below)
    } else if (error.status === 401) {
      console.error('Invalid API key');
    } else {
      console.error('OpenAI error:', error);
    }
  }
  throw error;
}
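For the 429 branch, a simple exponential backoff usually suffices. A minimal sketch assuming the openai v4 SDK; embedWithRetry, the retry count, and the delays are illustrative choices, not OpenAI recommendations:
async function embedWithRetry(
  openai: OpenAI,
  text: string,
  maxRetries = 3
): Promise<number[]> {
  for (let attempt = 0; ; attempt++) {
    try {
      const response = await openai.embeddings.create({
        model: "text-embedding-3-small",
        input: text
      });
      return response.data[0].embedding;
    } catch (error) {
      // Only retry rate limits, and only maxRetries times
      const rateLimited = error instanceof OpenAI.APIError && error.status === 429;
      if (!rateLimited || attempt >= maxRetries) throw error;
      // 500ms, 1s, 2s, ... exponential backoff
      await new Promise((resolve) => setTimeout(resolve, 500 * 2 ** attempt));
    }
  }
}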
OpenAI Limits
- text-embedding-3-small: 8191 tokens input
- text-embedding-3-large: 8191 tokens input
- Batch size: Up to 2048 inputs per request
- Rate limits: Varies by tier (check OpenAI dashboard)
Model Selection Guide
Use Workers AI (@cf/baai/bge-base-en-v1.5) when:
✅ Building production apps with budget constraints
✅ Need fast, edge-optimized embeddings
✅ Working with English text
✅ Don't need extremely high accuracy
✅ Want zero per-request costs
Use OpenAI text-embedding-3-small when:
✅ Need higher quality than Workers AI
✅ Budget allows ($0.02/1M tokens is affordable)
✅ Working with multilingual content
✅ Need longer context (8191 tokens)
✅ Willing to pay for better accuracy
Use OpenAI text-embedding-3-large when:
✅ Accuracy is critical (legal, medical, research)
✅ Large budget ($0.13/1M tokens)
✅ Need best possible search quality
✅ Working with complex or nuanced content
Embedding Best Practices
1. Consistent Model Usage
Always use the SAME model for indexing and querying!
// ❌ Wrong: Different models
// Index with Workers AI
const indexEmbedding = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
text: document
});
// Query with OpenAI (WRONG!)
const queryEmbedding = await openai.embeddings.create({
model: "text-embedding-3-small",
input: query
});
// This won't work: the vectors come from different embedding spaces
// (and different dimensions: 768 vs 1536), so similarity scores are meaningless
// ✅ Right: Same model
const indexEmbedding = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
text: document
});
const queryEmbedding = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
text: query
});
2. Text Preprocessing
function preprocessText(text: string): string {
return text
.trim() // Remove leading/trailing whitespace
.replace(/\s+/g, ' ') // Normalize whitespace
    .slice(0, 2000); // Stay within bge-base-en-v1.5's ~512-token limit
}
const embedding = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
text: preprocessText(rawText)
});
3. Batch for Efficiency
// ✅ Good: Batch processing
const embeddings = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
text: arrayOf100Texts
});
// ❌ Bad: Individual requests
for (const text of texts) {
const embedding = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
text
});
}
4. Cache Embeddings
// Store embeddings, don't regenerate
await env.VECTORIZE_INDEX.upsert([{
id: 'doc-1',
values: embedding,
metadata: {
content: text, // Store original
model: 'bge-base-en-v1.5',
generated_at: Date.now()
}
}]);
// Later: Retrieve embedding instead of regenerating
const vectors = await env.VECTORIZE_INDEX.getByIds(['doc-1']);
const cachedEmbedding = vectors[0].values; // Reuse!
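A get-or-create wrapper makes the pattern explicit: check the index before spending an AI call. A minimal sketch, assuming the Env interface from the Workers AI section; getOrCreateEmbedding is an illustrative name:
// Return the cached vector if it exists; otherwise embed, store, and return it
async function getOrCreateEmbedding(
  id: string,
  text: string,
  env: Env
): Promise<number[]> {
  // Reuse the stored vector if this document was embedded before
  const existing = await env.VECTORIZE_INDEX.getByIds([id]);
  if (existing.length > 0) {
    return Array.from(existing[0].values);
  }
  // Otherwise embed once, persist, and return
  const result = await env.AI.run('@cf/baai/bge-base-en-v1.5', { text });
  await env.VECTORIZE_INDEX.upsert([
    { id, values: result.data[0], metadata: { content: text } }
  ]);
  return result.data[0];
}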
5. Handle Failures Gracefully
async function generateEmbedding(text: string, env: Env): Promise<number[]> {
try {
const response = await env.AI.run('@cf/baai/bge-base-en-v1.5', { text });
return response.data[0];
  } catch (error) {
    console.error('Primary embedding failed, trying fallback:', error);
    // Fallback to OpenAI if Workers AI fails.
    // Caution: the fallback vector is 1536-dim and lives in a different
    // embedding space than bge-base-en-v1.5, so it must target a separate
    // index (see best practice #1 above)
    const openai = new OpenAI({ apiKey: env.OPENAI_API_KEY });
const fallback = await openai.embeddings.create({
model: "text-embedding-3-small",
input: text
});
return fallback.data[0].embedding;
}
}
Testing Embedding Quality
Compare Similarity Scores
// Test known similar texts
const text1 = "Cloudflare Workers are serverless functions";
const text2 = "Workers are serverless code running on Cloudflare's edge";
const text3 = "Unrelated content about cooking recipes";
const [emb1, emb2, emb3] = await Promise.all([
env.AI.run('@cf/baai/bge-base-en-v1.5', { text: text1 }),
env.AI.run('@cf/baai/bge-base-en-v1.5', { text: text2 }),
env.AI.run('@cf/baai/bge-base-en-v1.5', { text: text3 }),
]);
const similar = cosineSimilarity(emb1.data[0], emb2.data[0]); // Should be high (>0.7)
const different = cosineSimilarity(emb1.data[0], emb3.data[0]); // Should be low (<0.3)
Cosine Similarity Helper
function cosineSimilarity(a: number[], b: number[]): number {
  // Guard against silent NaN results from mismatched vector lengths
  if (a.length !== b.length) throw new Error('Vector length mismatch');
  const dotProduct = a.reduce((sum, val, i) => sum + val * b[i], 0);
  const magA = Math.sqrt(a.reduce((sum, val) => sum + val * val, 0));
  const magB = Math.sqrt(b.reduce((sum, val) => sum + val * val, 0));
  return dotProduct / (magA * magB);
}
Dimension Mismatch Debugging
// Check actual dimensions before upserting
const embedding = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
text: "test"
});
console.log('Embedding dimensions:', embedding.data[0].length);
// Verify against index
const indexInfo = await fetch(
`https://api.cloudflare.com/client/v4/accounts/${accountId}/vectorize/v2/indexes/my-index`,
{ headers: { 'Authorization': `Bearer ${apiToken}` } }
);
const indexConfig = await indexInfo.json() as {
  result: { config: { dimensions: number } };
};
console.log('Index dimensions:', indexConfig.result.config.dimensions);
// Must match!
if (embedding.data[0].length !== indexConfig.result.config.dimensions) {
throw new Error('Dimension mismatch!');
}