Cloudflare Vectorize Integration
Complete guide for using Gemini embeddings with Cloudflare Vectorize.
Quick Start
1. Create Vectorize Index
# Create index with 768 dimensions (recommended for Gemini)
npx wrangler vectorize create gemini-embeddings --dimensions 768 --metric cosine
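# Alternative: 1536 dimensions (more accurate but larger). Vectorize indexes accept at most
# 1536 dimensions, so Gemini's default 3072-dimension output must be reduced via outputDimensionality.
npx wrangler vectorize create gemini-embeddings-large --dimensions 1536 --metric cosine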
2. Bind to Worker
Add to wrangler.jsonc:
{
"name": "my-rag-worker",
"main": "src/index.ts",
"compatibility_date": "2025-10-25",
"vectorize": [
{
"binding": "VECTORIZE",
"index_name": "gemini-embeddings"
}
]
}
3. Generate and Store Embeddings
// Generate embedding
const response = await fetch(
'https://generativelanguage.googleapis.com/v1beta/models/gemini-embedding-001:embedContent',
{
method: 'POST',
headers: {
'x-goog-api-key': env.GEMINI_API_KEY,
'Content-Type': 'application/json'
},
body: JSON.stringify({
content: { parts: [{ text: 'Your document text' }] },
taskType: 'RETRIEVAL_DOCUMENT',
outputDimensionality: 768 // MUST match index dimensions
})
}
);
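if (!response.ok) {
throw new Error(`Gemini embedding request failed: ${response.status}`);
}
const data = await response.json() as { embedding: { values: number[] } };
const embedding = data.embedding.values;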
// Insert into Vectorize
await env.VECTORIZE.insert([{
id: 'doc-1',
values: embedding,
metadata: { text: 'Your document text', source: 'manual' }
}]);
Dimension Configuration
CRITICAL: Embedding dimensions MUST match Vectorize index dimensions.
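| Gemini Dimensions | Storage per vector (float32) | Recommended For |
|---|---|---|
| 768 | 3 KB | Most use cases, cost-effective |
| 1536 | 6 KB | Higher accuracy at twice the storage; the largest size a Vectorize index accepts |
| 3072 | 12 KB | Maximum accuracy, but exceeds Vectorize's 1536-dimension limit |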
Create index to match your embeddings:
# For 768-dim embeddings
npx wrangler vectorize create my-index --dimensions 768 --metric cosine
# For 1536-dim embeddings
npx wrangler vectorize create my-index --dimensions 1536 --metric cosine
# 3072-dim embeddings (Gemini default) exceed Vectorize's 1536-dimension limit.
# Request 768 or 1536 dimensions via outputDimensionality instead.
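The per-vector storage figures above follow from simple arithmetic. The sketch below assumes each dimension is stored as a 32-bit float (4 bytes) and counts raw vector values only, not IDs, metadata, or index overhead. The function names are illustrative, not part of any Cloudflare API.

```typescript
// Assumption: each dimension is stored as a 32-bit float (4 bytes).
const BYTES_PER_DIMENSION = 4;

// Raw size of one vector's values, in bytes.
function bytesPerVector(dimensions: number): number {
  return dimensions * BYTES_PER_DIMENSION;
}

// Raw size of all vector values in an index, in bytes.
function estimateIndexBytes(dimensions: number, vectorCount: number): number {
  return bytesPerVector(dimensions) * vectorCount;
}

for (const dims of [768, 1536, 3072]) {
  console.log(`${dims} dims: ${bytesPerVector(dims) / 1024} KB per vector`);
}

// Rough raw footprint of 100,000 vectors at 768 dimensions, in MiB.
console.log(estimateIndexBytes(768, 100_000) / (1024 * 1024));
```

Doubling the dimension count doubles the raw storage, which is why 768 dimensions is the cheaper default when its accuracy is sufficient.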
Metric Selection
Vectorize supports 3 distance metrics:
Cosine (Recommended)
npx wrangler vectorize create my-index --dimensions 768 --metric cosine
When to use:
- ✅ Semantic search (most common)
- ✅ Document similarity
- ✅ RAG systems
Range: -1 (opposite) to 1 (identical); higher scores mean closer matches
Euclidean
npx wrangler vectorize create my-index --dimensions 768 --metric euclidean
When to use:
- ✅ Absolute distance matters
- ✅ Magnitude is important
Range: 0 (identical) to ∞ (very different)
Dot Product
npx wrangler vectorize create my-index --dimensions 768 --metric dot-product
When to use:
- ✅ Pre-normalized vectors
- ✅ Performance optimization
Range: -1 to 1 for normalized vectors; unbounded otherwise
Recommendation: Use cosine for Gemini embeddings. It compares direction only, so scores stay comparable even when the vectors are not unit length.
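To make the difference between the three metrics concrete, here is a small sketch that computes each one by hand on toy vectors. It is for illustration only: Vectorize computes scores server-side, and the helper names below are not part of its API. The last line shows why dot product suits pre-normalized vectors: after L2 normalization it equals cosine similarity (up to floating-point rounding).

```typescript
function dot(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += a[i] * b[i];
  }
  return sum;
}

function l2Norm(v: readonly number[]): number {
  return Math.sqrt(dot(v, v));
}

// Direction only: insensitive to vector length.
function cosineSimilarity(a: readonly number[], b: readonly number[]): number {
  return dot(a, b) / (l2Norm(a) * l2Norm(b));
}

// Absolute distance: sensitive to vector length.
function euclideanDistance(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

// Scale a vector to unit length.
function l2Normalize(v: readonly number[]): number[] {
  const norm = l2Norm(v);
  if (norm === 0) throw new Error("cannot normalize a zero vector");
  return v.map((x) => x / norm);
}

const shortVec = [3, 4];
const longVec = [6, 8]; // same direction as shortVec, twice the length

console.log("cosine:", cosineSimilarity(shortVec, longVec));
console.log("dot product:", dot(shortVec, longVec));
console.log("euclidean:", euclideanDistance(shortVec, longVec));
console.log("dot after normalizing:", dot(l2Normalize(shortVec), l2Normalize(longVec)));
```

If you choose dot-product for an index, normalizing embeddings with something like `l2Normalize` before inserting and querying keeps scores within the -1 to 1 range.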
Insert Patterns
Single Insert
await env.VECTORIZE.insert([{
id: 'doc-1',
values: embedding,
metadata: {
text: 'Document content',
timestamp: Date.now(),
category: 'documentation'
}
}]);
Batch Insert
const vectors = documents.map((doc, i) => ({
id: `doc-${i}`,
values: doc.embedding,
metadata: { text: doc.text }
}));
// Insert the whole array in one call. Keep batches modest (around 100 vectors)
// and split larger sets into several calls.
await env.VECTORIZE.insert(vectors);
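When the document set is larger than one batch, the array can be split before inserting. The following is a minimal sketch: `chunk` and `insertInBatches` are hypothetical helpers, the batch size of 100 mirrors the figure used above rather than a documented limit, and `index` stands for anything with an `insert` method, such as the `env.VECTORIZE` binding.

```typescript
interface VectorRecord {
  id: string;
  values: number[];
  metadata?: Record<string, string | number | boolean>;
}

// Split an array into consecutive batches of at most `size` items.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Insert batches one after another; returns the number of vectors submitted.
async function insertInBatches(
  index: { insert(vectors: VectorRecord[]): Promise<unknown> },
  vectors: VectorRecord[],
  batchSize = 100
): Promise<number> {
  let submitted = 0;
  for (const batch of chunk(vectors, batchSize)) {
    await index.insert(batch);
    submitted += batch.length;
  }
  return submitted;
}

// Example: 250 items are split into three batches.
const exampleBatches = chunk(Array.from({ length: 250 }, (_, i) => i), 100);
console.log(exampleBatches.map((b) => b.length));
```

Inside a Worker this would be called as `await insertInBatches(env.VECTORIZE, vectors)`. Sequential awaits keep the sketch simple; whether parallel inserts are worthwhile depends on your volume.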
Upsert (Update or Insert)
// upsert() overwrites the vector if the ID already exists.
// insert() leaves an existing ID untouched, so it cannot be used to update.
await env.VECTORIZE.upsert([{
id: 'doc-1', // Existing ID
values: newEmbedding,
metadata: { text: 'Updated content' }
}]);
Query Patterns
Basic Query
const results = await env.VECTORIZE.query(queryEmbedding, {
topK: 5
});
console.log(results.matches);
// [{ id: 'doc-1', score: 0.95 }, ...]
Query with Metadata
const results = await env.VECTORIZE.query(queryEmbedding, {
topK: 5,
returnMetadata: true
});
results.matches.forEach(match => {
console.log(match.id); // 'doc-1'
console.log(match.score); // 0.95
console.log(match.metadata.text); // 'Document content'
});
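In a RAG pipeline the matches are usually turned into a context string for the generation step. The sketch below is one way to do that, assuming a cosine index (higher score is better) and a `text` field in metadata as in the examples above. `buildContext`, the 0.7 threshold, and the 4000-character budget are illustrative choices, not recommended values.

```typescript
interface Match {
  id: string;
  score: number;
  metadata?: Record<string, unknown>;
}

// Keep matches at or above minScore that carry text, in ranked order,
// until the character budget is used up.
function buildContext(matches: Match[], minScore = 0.7, maxChars = 4000): string {
  const parts: string[] = [];
  let used = 0;
  for (const match of matches) {
    const text = match.metadata?.text;
    if (match.score < minScore || typeof text !== "string") continue;
    if (used + text.length > maxChars) break;
    parts.push(`[${match.id}] ${text}`);
    used += text.length;
  }
  return parts.join("\n\n");
}

const exampleMatches: Match[] = [
  { id: "doc-1", score: 0.95, metadata: { text: "Vectorize stores embeddings." } },
  { id: "doc-2", score: 0.41, metadata: { text: "Unrelated content." } },
  { id: "doc-3", score: 0.82 }, // no metadata returned
];

console.log(buildContext(exampleMatches));
```

A threshold that is too strict silently produces an empty context, so it is worth logging raw scores while tuning it.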
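Query with Metadata Filtering
// Filtering requires a metadata index on the filtered property, created before the vectors are inserted:
// npx wrangler vectorize create-metadata-index gemini-embeddings --property-name=category --type=string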
const results = await env.VECTORIZE.query(queryEmbedding, {
topK: 5,
filter: { category: 'documentation' }
});
Metadata Best Practices
What to Store
await env.VECTORIZE.insert([{
id: 'doc-1',
values: embedding,
metadata: {
// ✅ Store these
text: 'The actual document content', // For retrieval
title: 'Document title',
url: 'https://example.com/doc',
timestamp: Date.now(),
category: 'product',
// ❌ Don't store these (shown commented out)
// embedding: embedding, // Already stored as values
// largeObject: { /* ... */ } // Keep metadata small
}
}]);
Metadata Limits
- Max size: 10 KiB of metadata per vector (per Cloudflare's published Vectorize limits at the time of writing)
- Best practice: Store only what you need for retrieval/display
- For large data: Store minimal metadata, fetch full data from D1/KV using ID
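One way to stay within a metadata budget is to measure the serialized size and shorten the text field until it fits. This is a rough sketch: it assumes the JSON-serialized UTF-8 byte length is a reasonable proxy for how the limit is counted, and `jsonBytes` and `trimTextToFit` are hypothetical helpers. The budget is passed in rather than hard-coded so that you can use the limit that applies to your index.

```typescript
// UTF-8 byte length of a value once serialized as JSON.
function jsonBytes(value: unknown): number {
  return new TextEncoder().encode(JSON.stringify(value)).length;
}

// Shorten metadata.text until the whole object fits within maxBytes.
// If the other fields alone exceed the budget, text ends up empty
// and the caller should check the size again.
function trimTextToFit<T extends { text: string }>(metadata: T, maxBytes: number): T {
  let text = metadata.text;
  while (text.length > 0 && jsonBytes({ ...metadata, text }) > maxBytes) {
    // Drop roughly 10% of the remaining text (at least one character) per pass.
    const drop = Math.max(1, Math.ceil(text.length / 10));
    text = text.slice(0, text.length - drop);
  }
  return { ...metadata, text };
}

const trimmedExample = trimTextToFit(
  { text: "x".repeat(500), source: "manual" },
  128 // hypothetical budget in bytes, chosen small to show the trimming
);
console.log(trimmedExample.text.length, jsonBytes(trimmedExample));
```

For documents that are routinely larger than the budget, storing a short preview here and the full text in D1 or KV under the same ID avoids trimming altogether.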
Complete RAG Example
interface Env {
GEMINI_API_KEY: string;
VECTORIZE: VectorizeIndex;
}
export default {
async fetch(request: Request, env: Env): Promise<Response> {
const url = new URL(request.url);
// Ingest: POST /ingest with { text: "..." }
if (url.pathname === '/ingest' && request.method === 'POST') {
const { text } = await request.json() as { text: string };
// 1. Generate embedding
const embeddingRes = await fetch(
'https://generativelanguage.googleapis.com/v1beta/models/gemini-embedding-001:embedContent',
{
method: 'POST',
headers: {
'x-goog-api-key': env.GEMINI_API_KEY,
'Content-Type': 'application/json'
},
body: JSON.stringify({
content: { parts: [{ text }] },
taskType: 'RETRIEVAL_DOCUMENT',
outputDimensionality: 768
})
}
);
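if (!embeddingRes.ok) {
return new Response('Embedding request failed', { status: 502 });
}
const embeddingData = await embeddingRes.json() as { embedding: { values: number[] } };
const embedding = embeddingData.embedding.values;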
// 2. Store in Vectorize
await env.VECTORIZE.insert([{
id: `doc-${crypto.randomUUID()}`, // unique even when two requests arrive in the same millisecond
values: embedding,
metadata: { text, timestamp: Date.now() }
}]);
return Response.json({ success: true });
}
// Query: POST /query with { query: "..." }
if (url.pathname === '/query' && request.method === 'POST') {
const { query } = await request.json() as { query: string };
// 1. Generate query embedding
const embeddingRes = await fetch(
'https://generativelanguage.googleapis.com/v1beta/models/gemini-embedding-001:embedContent',
{
method: 'POST',
headers: {
'x-goog-api-key': env.GEMINI_API_KEY,
'Content-Type': 'application/json'
},
body: JSON.stringify({
content: { parts: [{ text: query }] },
taskType: 'RETRIEVAL_QUERY',
outputDimensionality: 768
})
}
);
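if (!embeddingRes.ok) {
return new Response('Embedding request failed', { status: 502 });
}
const embeddingData = await embeddingRes.json() as { embedding: { values: number[] } };
const embedding = embeddingData.embedding.values;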
// 2. Search Vectorize
const results = await env.VECTORIZE.query(embedding, {
topK: 5,
returnMetadata: true
});
return Response.json({
query,
results: results.matches.map(m => ({
id: m.id,
score: m.score,
text: m.metadata?.text
}))
});
}
return new Response('Not found', { status: 404 });
}
};
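Both routes above repeat the same embedding call, differing only in the text and the task type. One possible refactor is a shared helper, sketched below. The names `buildEmbedBody`, `extractEmbedding`, `embedText`, and the `FetchLike` type are hypothetical. The fetch function is passed in as a parameter, an assumption made here so that the helper can be exercised without a network.

```typescript
type TaskType = "RETRIEVAL_DOCUMENT" | "RETRIEVAL_QUERY";

interface EmbedRequestBody {
  content: { parts: { text: string }[] };
  taskType: TaskType;
  outputDimensionality: number;
}

// Minimal structural type covering only what the helper uses.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

const EMBED_URL =
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-embedding-001:embedContent";

// Build the request body used by both the ingest and query routes.
function buildEmbedBody(text: string, taskType: TaskType, dimensions = 768): EmbedRequestBody {
  return { content: { parts: [{ text }] }, taskType, outputDimensionality: dimensions };
}

// Validate the response shape instead of trusting data.embedding.values blindly.
function extractEmbedding(data: unknown): number[] {
  const values = (data as { embedding?: { values?: unknown } } | null)?.embedding?.values;
  if (!Array.isArray(values) || !values.every((v) => typeof v === "number")) {
    throw new Error("Unexpected embedding response shape");
  }
  return values as number[];
}

async function embedText(
  fetchImpl: FetchLike,
  apiKey: string,
  text: string,
  taskType: TaskType,
  dimensions = 768
): Promise<number[]> {
  const res = await fetchImpl(EMBED_URL, {
    method: "POST",
    headers: { "x-goog-api-key": apiKey, "Content-Type": "application/json" },
    body: JSON.stringify(buildEmbedBody(text, taskType, dimensions)),
  });
  if (!res.ok) {
    throw new Error(`Gemini embedding request failed with status ${res.status}`);
  }
  return extractEmbedding(await res.json());
}
```

In the Worker, the ingest route would then call something like `await embedText(fetch, env.GEMINI_API_KEY, text, 'RETRIEVAL_DOCUMENT')` and the query route the same with `'RETRIEVAL_QUERY'`.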
Index Management
List Indexes
npx wrangler vectorize list
Get Index Info
npx wrangler vectorize get gemini-embeddings
Delete Index
npx wrangler vectorize delete gemini-embeddings
CRITICAL: Deleting an index deletes all vectors permanently.
Limitations & Quotas
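| Feature | Free Plan | Paid Plans |
|---|---|---|
| Indexes per account | 100 | 50,000 |
| Vectors per index | 5,000,000 | 5,000,000 |
| Queried vector dimensions per month | 30,000,000 | 50,000,000 included, then usage-based |
| Stored vector dimensions | 5,000,000 | 10,000,000 included, then usage-based |
| Dimensions per vector | Up to 1536 | Up to 1536 |
Source (values as published at the time of writing): https://developers.cloudflare.com/vectorize/platform/pricing/ and https://developers.cloudflare.com/vectorize/platform/limits/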
Best Practices
1. Choose Dimensions Wisely
// ✅ 768 dimensions (recommended)
// - Good accuracy
// - Low storage
// - Fast queries
// ⚠️ 1536 dimensions (if accuracy is critical)
// - Higher accuracy
// - 2x storage
// - Slower queries
// Note: 3072 dimensions (Gemini default) exceed Vectorize's 1536-dimension limit
2. Use Metadata for Context
await env.VECTORIZE.insert([{
id: 'doc-1',
values: embedding,
metadata: {
text: 'Store the actual text here for retrieval',
url: 'https://...',
timestamp: Date.now()
}
}]);
3. Implement Caching
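// Cache embeddings in KV, keyed by a hash of the input text.
// Assumes env.KV is a KV namespace binding and generateEmbedding is your own helper.
// Include the task type and dimension count in the key if you vary them.
async function getCachedEmbedding(env: Env, text: string, textHash: string): Promise<number[]> {
const cached = await env.KV.get(`embedding:${textHash}`);
if (cached) {
return JSON.parse(cached);
}
const embedding = await generateEmbedding(text);
await env.KV.put(`embedding:${textHash}`, JSON.stringify(embedding), {
expirationTtl: 86400 // 24 hours
});
return embedding;
}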
4. Monitor Usage
# Check index configuration (dimensions, metric type)
npx wrangler vectorize get gemini-embeddings
# Check index stats (vector count, last processed mutation)
npx wrangler vectorize info gemini-embeddings
Troubleshooting
Dimension Mismatch Error
Error: Vector dimensions do not match. Expected 768, got 3072
Solution: Ensure embedding outputDimensionality matches index dimensions.
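A cheap guard before each insert or query surfaces this mismatch in your own code, with a message that names the fix. The sketch below assumes a single shared constant for the index size. `INDEX_DIMENSIONS` and `assertDimensions` are illustrative names.

```typescript
// Must equal the --dimensions value used when the index was created.
const INDEX_DIMENSIONS = 768;

// Throw early, before calling Vectorize, if the embedding has the wrong length.
function assertDimensions(
  values: readonly number[],
  expected: number = INDEX_DIMENSIONS
): void {
  if (values.length !== expected) {
    throw new Error(
      `Embedding has ${values.length} dimensions but the index expects ${expected}. ` +
        `Set outputDimensionality to ${expected} or recreate the index.`
    );
  }
}

// Passes: length matches the index.
assertDimensions(new Array(768).fill(0));

// Fails: a 3072-dimension embedding from a request that omitted outputDimensionality.
try {
  assertDimensions(new Array(3072).fill(0));
} catch (err) {
  console.log((err as Error).message);
}
```

Using the same constant for `outputDimensionality` in the embedding request keeps the two values from drifting apart.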
No Results Found
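Possible causes:
- Index is empty (no vectors inserted)
- Vectors were inserted moments ago and are not yet queryable (Vectorize applies inserts asynchronously)
- Query embedding uses the wrong task type (use RETRIEVAL_QUERY for queries and RETRIEVAL_DOCUMENT for stored text)
- Similarity threshold applied to the results is too high
Solution: Confirm the index contains vectors, use the correct task types, and lower or remove any score threshold.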
Official Documentation
- Vectorize Docs: https://developers.cloudflare.com/vectorize/
- Pricing: https://developers.cloudflare.com/vectorize/platform/pricing/
- Wrangler CLI: https://developers.cloudflare.com/workers/wrangler/