Initial commit

Zhongwei Li
2025-11-30 08:24:54 +08:00
commit 7927519669
17 changed files with 4377 additions and 0 deletions

@@ -0,0 +1,310 @@
# Choosing the Right Embedding Dimensions
Guide to selecting optimal dimensions for your use case with Gemini embeddings.
---
## Quick Decision Table
| Your Priority | Recommended Dimensions | Why |
|--------------|----------------------|-----|
| **Balanced (default)** | **768** | Best accuracy-to-cost ratio |
| **Maximum accuracy** | 3072 | Gemini's full capability |
| **Storage-limited** | 512 or lower | Reduce storage/compute |
| **OpenAI compatibility** | 1536 | Match OpenAI dimensions |
---
## Available Dimensions
Gemini supports **any dimension from 128 to 3072** using Matryoshka Representation Learning.
### Common Choices
| Dimensions | Storage/Vector | Search Speed | Accuracy | Use Case |
|------------|---------------|--------------|----------|----------|
| **768** | ~3 KB | Fast | Good | **Recommended default** |
| 1536 | ~6 KB | Medium | Better | Match OpenAI, large datasets |
| 3072 | ~12 KB | Slower | Best | Maximum accuracy needed |
| 512 | ~2 KB | Very fast | Acceptable | Storage-constrained |
| 256 | ~1 KB | Ultra fast | Lower | Extreme constraints |
---
## Matryoshka Representation Learning
Gemini's flexible dimensions work because of **Matryoshka Representation Learning**: the model learns nested representations in which the first N dimensions already carry the core semantics, and each additional dimension adds progressively finer detail.
```
Dimensions 1-256: Core semantic information
Dimensions 257-512: Additional nuance
Dimensions 513-768: Fine-grained details
Dimensions 769-1536: Subtle distinctions
Dimensions 1537-3072: Maximum precision
```
**Key Point**: Lower dimensions aren't "worse" - they're **compressed** versions of the full embedding.
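Because the representations are nested, you can also truncate a full-length embedding client-side and re-normalize it instead of calling the API again at a lower `outputDimensionality`. A minimal sketch (the `fullEmbedding3072` input is assumed to come from a prior full-dimension call):
```typescript
// Keep the first `dims` values of a Matryoshka embedding and re-normalize
// to unit length so cosine similarity still behaves as expected.
function truncateEmbedding(fullEmbedding: number[], dims: number): number[] {
  const prefix = fullEmbedding.slice(0, dims);
  const norm = Math.sqrt(prefix.reduce((sum, v) => sum + v * v, 0));
  return norm === 0 ? prefix : prefix.map(v => v / norm);
}
// Example: derive a compact 768-dim vector from one 3072-dim response
const compact = truncateEmbedding(fullEmbedding3072, 768);
```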
---
## Storage Impact
### Example: 100,000 Documents
| Dimensions | Storage Required | Monthly Cost (R2)* |
|------------|-----------------|-------------------|
| 256 | ~100 MB | $0.01 |
| 512 | ~200 MB | $0.02 |
| **768** | **~300 MB** | **$0.03** |
| 1536 | ~600 MB | $0.06 |
| 3072 | ~1.2 GB | $0.12 |
\*Assuming 4 bytes per float, R2 pricing $0.015/GB/month
**For 1M vectors** (arithmetic sketched below):
- 768 dims: ~3 GB storage
- 3072 dims: ~12 GB storage (4x more expensive)
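The storage figures above are plain float32 arithmetic (vectors × dimensions × 4 bytes):
```typescript
// Raw storage for float32 vectors: count × dimensions × 4 bytes
function storageGB(numVectors: number, dims: number): number {
  return (numVectors * dims * 4) / 1e9;
}
console.log(storageGB(100_000, 768).toFixed(2));    // "0.31" → ~300 MB
console.log(storageGB(1_000_000, 3072).toFixed(2)); // "12.29" → ~12 GB
```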
---
## Accuracy Trade-offs
Based on MTEB benchmarks (approximate):
| Dimensions | Retrieval Accuracy | Relative to 3072 |
|------------|-------------------|------------------|
| 256 | ~85% | -15% |
| 512 | ~92% | -8% |
| **768** | **~96%** | **-4%** |
| 1536 | ~98% | -2% |
| 3072 | 100% (baseline) | 0% |
**Diminishing returns**: Going from 768 → 3072 dims only improves accuracy by ~4% while quadrupling storage.
---
## Query Performance
Search latency (approximate, 100k vectors):
| Dimensions | Query Latency | Throughput (QPS) |
|------------|--------------|------------------|
| 256 | ~10ms | ~1000 |
| 512 | ~15ms | ~700 |
| **768** | **~20ms** | **~500** |
| 1536 | ~35ms | ~300 |
| 3072 | ~60ms | ~170 |
**Note**: Actual performance depends on Vectorize implementation and hardware.
---
## When to Use Each
### 768 Dimensions (Recommended Default)
**Use when**:
- ✅ Building standard RAG systems
- ✅ General semantic search
- ✅ Cost-effectiveness matters
- ✅ Storage is a consideration
**Don't use when**:
- ❌ You need absolute maximum accuracy
- ❌ Migrating from OpenAI 1536-dim embeddings
**Example**:
```typescript
const embedding = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: text,
config: {
taskType: 'RETRIEVAL_DOCUMENT',
outputDimensionality: 768 // ← Recommended
}
});
```
---
### 3072 Dimensions (Maximum Accuracy)
**Use when**:
- ✅ Accuracy is critical (legal, medical, research)
- ✅ Budget allows 4x storage cost
- ✅ Query latency isn't a concern
- ✅ Small dataset (<10k vectors)
**Don't use when**:
- ❌ Cost-sensitive project
- ❌ Large dataset (>100k vectors)
- ❌ Real-time search required
**Example**:
```typescript
const embedding = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: text,
config: {
taskType: 'RETRIEVAL_DOCUMENT',
outputDimensionality: 3072 // ← Maximum accuracy
}
});
```
---
### 1536 Dimensions (OpenAI Compatibility)
**Use when**:
- ✅ Migrating from OpenAI text-embedding-3-small
- ✅ Need compatibility with existing infrastructure
- ✅ Balancing accuracy and cost
**Example**:
```typescript
const embedding = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: text,
config: {
taskType: 'RETRIEVAL_DOCUMENT',
outputDimensionality: 1536 // ← Match OpenAI
}
});
```
---
### 512 or Lower (Storage-Constrained)
**Use when**:
- ✅ Extreme storage constraints
- ✅ Millions of vectors
- ✅ Acceptable to sacrifice some accuracy
- ✅ Ultra-fast queries required
**Example**:
```typescript
const embedding = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: text,
config: {
taskType: 'RETRIEVAL_DOCUMENT',
outputDimensionality: 512 // ← Compact
}
});
```
---
## Migration Between Dimensions
**CRITICAL**: You cannot mix different dimensions in the same index.
### Option 1: Recreate Index
```bash
# Delete old index
npx wrangler vectorize delete my-index
# Create new index with different dimensions
npx wrangler vectorize create my-index --dimensions 768 --metric cosine
# Re-generate all embeddings with new dimensions
# Re-insert all vectors
```
### Option 2: Create New Index
```bash
# Keep old index running
# Create new index
npx wrangler vectorize create my-index-768 --dimensions 768 --metric cosine
# Gradually migrate vectors
# Switch over when ready
# Delete old index
```
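Whichever option you choose, every vector has to be regenerated at the new dimensionality. A minimal sketch of that re-embedding loop, assuming the same `ai` client used in the examples above, a `documents` array of `{ id, text }` records, and a `VECTORIZE` binding pointing at the new 768-dimension index:
```typescript
// Re-embed every document at the new dimensionality and insert into the new index
async function migrateToNewIndex(documents: { id: string; text: string }[], env: Env) {
  for (const doc of documents) {
    const response = await ai.models.embedContent({
      model: 'gemini-embedding-001',
      content: doc.text,
      config: { taskType: 'RETRIEVAL_DOCUMENT', outputDimensionality: 768 }
    });
    await env.VECTORIZE.insert([{
      id: doc.id,
      values: response.embedding.values,
      metadata: { text: doc.text }
    }]);
  }
}
```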
---
## Testing Methodology
To test if lower dimensions work for your use case:
```typescript
// 1. Generate test embeddings with different dimensions
const dims = [256, 512, 768, 1536, 3072];
const testEmbeddings = await Promise.all(
dims.map(dim => ai.models.embedContent({
model: 'gemini-embedding-001',
content: testText,
config: { outputDimensionality: dim }
}))
);
// 2. Test retrieval accuracy
const queries = ['query1', 'query2', 'query3'];
for (const dim of dims) {
const accuracy = await testRetrievalAccuracy(queries, dim);
console.log(`${dim} dims: ${accuracy}% accuracy`);
}
// 3. Measure performance
for (const dim of dims) {
const latency = await measureQueryLatency(dim);
console.log(`${dim} dims: ${latency}ms latency`);
}
```
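The `testRetrievalAccuracy` and `measureQueryLatency` helpers above are placeholders. One way to implement the accuracy side, assuming a small labeled set of query/expected-document pairs and one Vectorize index per dimensionality (signature simplified accordingly):
```typescript
// Hypothetical helper: percentage of labeled queries whose expected document
// appears in the top-K results of an index built at the given dimensionality.
interface LabeledQuery { query: string; expectedDocId: string; }
async function testRetrievalAccuracy(
  labeled: LabeledQuery[],
  dim: number,
  index: VectorizeIndex,
  topK = 5
): Promise<number> {
  let hits = 0;
  for (const { query, expectedDocId } of labeled) {
    const response = await ai.models.embedContent({
      model: 'gemini-embedding-001',
      content: query,
      config: { taskType: 'RETRIEVAL_QUERY', outputDimensionality: dim }
    });
    const results = await index.query(response.embedding.values, { topK });
    if (results.matches.some(m => m.id === expectedDocId)) hits++;
  }
  return (hits / labeled.length) * 100;
}
```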
---
## Recommendations by Use Case
### RAG for Documentation
- **Recommended**: 768 dims
- **Reasoning**: Good accuracy, reasonable storage, fast queries
### E-commerce Search
- **Recommended**: 512-768 dims
- **Reasoning**: Speed matters, millions of products
### Legal Document Search
- **Recommended**: 3072 dims
- **Reasoning**: Accuracy is critical, smaller datasets
### Customer Support Chatbot
- **Recommended**: 768 dims
- **Reasoning**: Balance accuracy and response time
### Research Paper Search
- **Recommended**: 1536-3072 dims
- **Reasoning**: Nuanced understanding needed
---
## Summary
**Default Choice**: **768 dimensions**
- 96% of 3072-dim accuracy
- 75% less storage
- 3x faster queries
- Best balance for most applications
**Only use 3072 if**:
- You need every percentage point of accuracy
- You have budget for 4x storage
- You have a small dataset
**Consider lower (<768) if**:
- You have millions of vectors
- Storage cost is a major concern
- Ultra-fast queries are required
---
## Official Documentation
- **Matryoshka Learning**: https://arxiv.org/abs/2205.13147
- **Gemini Embeddings**: https://ai.google.dev/gemini-api/docs/embeddings
- **MTEB Benchmark**: https://github.com/embeddings-benchmark/mteb

@@ -0,0 +1,236 @@
# Embedding Model Comparison
Comparison of Google Gemini, OpenAI, and Cloudflare Workers AI embedding models to help you choose the right one for your use case.
---
## Quick Comparison Table
| Feature | Gemini (gemini-embedding-001) | OpenAI (text-embedding-3-small) | OpenAI (text-embedding-3-large) | Workers AI (bge-base-en-v1.5) |
|---------|------------------------------|--------------------------------|--------------------------------|-------------------------------|
| **Dimensions** | 128-3072 (flexible) | 1536 (fixed) | 3072 (fixed) | 768 (fixed) |
| **Default Dims** | 3072 | 1536 | 3072 | 768 |
| **Context Window** | 2,048 tokens | 8,191 tokens | 8,191 tokens | 512 tokens |
| **Cost (per 1M tokens)** | Free tier, then $0.025 | $0.020 | $0.130 | Free on Cloudflare |
| **Rate Limit (Free)** | 100 RPM, 30k TPM | 3,000 RPM | 3,000 RPM | Unlimited |
| **Task Types** | 8 types | None | None | None |
| **Matryoshka** | ✅ Yes | ✅ Yes (shortening) | ✅ Yes (shortening) | ❌ No |
| **Best For** | RAG, semantic search | General purpose | High accuracy needed | Edge computing, Cloudflare stack |
---
## Detailed Comparison
### 1. Google Gemini (gemini-embedding-001)
**Strengths**:
- Flexible dimensions (128-3072) using Matryoshka Representation Learning
- 8 task types for optimization (RETRIEVAL_QUERY, RETRIEVAL_DOCUMENT, etc.)
- Free tier with generous limits
- Same API as Gemini text generation (unified ecosystem)
**Weaknesses**:
- Smaller context window (2,048 tokens vs OpenAI's 8,191)
- Newer model (less community knowledge)
**Recommended For**:
- RAG systems (optimized task types)
- Projects already using Gemini API
- Budget-conscious projects (free tier)
**Pricing**:
- Free: 100 RPM, 30k TPM, 1k RPD
- Paid: $0.025 per 1M tokens (Tier 1+)
---
### 2. OpenAI text-embedding-3-small
**Strengths**:
- Larger context window (8,191 tokens)
- Well-documented and widely used
- Good balance of cost and performance
- Can shorten dimensions (Matryoshka)
**Weaknesses**:
- Fixed 1536 dimensions (unless shortened)
- No task type optimization
- Costs from day one (no free tier for embeddings)
**Recommended For**:
- General-purpose semantic search
- Projects with long documents (>2k tokens)
- OpenAI ecosystem integration
**Pricing**:
- $0.020 per 1M tokens
---
### 3. OpenAI text-embedding-3-large
**Strengths**:
- Highest accuracy of OpenAI models
- 3072 dimensions (same as Gemini default)
- Large context window (8,191 tokens)
**Weaknesses**:
- Most expensive ($0.130 per 1M tokens)
- Fixed dimensions
- Overkill for most use cases
**Recommended For**:
- Mission-critical applications requiring maximum accuracy
- Well-funded projects
**Pricing**:
- $0.130 per 1M tokens (6.5x more expensive than text-embedding-3-small)
---
### 4. Cloudflare Workers AI (bge-base-en-v1.5)
**Strengths**:
- **Free** on Cloudflare Workers
- Fast (edge inference)
- Good for English text
- Simple integration with Vectorize
**Weaknesses**:
- Small context window (512 tokens)
- Fixed 768 dimensions
- No task type optimization
- English-only (limited multilingual support)
**Recommended For**:
- Cloudflare-first stacks
- Cost-sensitive projects
- Short documents (<512 tokens)
- Edge inference requirements
**Pricing**:
- Free (included with Cloudflare Workers)
**Example**:
```typescript
const response = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
text: 'Your text here'
});
// Returns: { data: number[] } with 768 dimensions
```
---
## When to Use Which
### Use Gemini Embeddings When:
- ✅ Building RAG systems (task type optimization)
- ✅ Need flexible dimensions (save storage/compute)
- ✅ Already using Gemini API
- ✅ Want free tier for development
### Use OpenAI text-embedding-3-small When:
- ✅ Documents > 2,048 tokens
- ✅ Using OpenAI for generation
- ✅ Need proven, well-documented solution
- ✅ General-purpose semantic search
### Use OpenAI text-embedding-3-large When:
- ✅ Maximum accuracy required
- ✅ Budget allows ($0.130 per 1M tokens)
- ✅ Mission-critical applications
### Use Workers AI (BGE) When:
- ✅ Building on Cloudflare
- ✅ Short documents (<512 tokens)
- ✅ Cost is primary concern (free)
- ✅ English-only content
- ✅ Need edge inference
---
## Dimension Recommendations
| Use Case | Gemini | OpenAI Small | OpenAI Large | Workers AI |
|----------|--------|--------------|--------------|------------|
| **General RAG** | 768 | 1536 | 3072 | 768 |
| **Storage-limited** | 128-512 | 512 (shortened) | 1024 (shortened) | 768 (fixed) |
| **Maximum accuracy** | 3072 | 1536 (fixed) | 3072 | 768 (fixed) |
---
## Migration Guide
### From OpenAI to Gemini
```typescript
// Before (OpenAI)
const response = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: 'Your text here'
});
const embedding = response.data[0].embedding; // 1536 dims
// After (Gemini)
const response = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: 'Your text here',
config: {
taskType: 'SEMANTIC_SIMILARITY',
outputDimensionality: 768 // or 1536 to match OpenAI
}
});
const embedding = response.embedding.values; // 768 dims
```
**CRITICAL**: If migrating, you must regenerate all embeddings. Embeddings from different models are not comparable.
---
## Performance Benchmarks
Based on MTEB (Massive Text Embedding Benchmark):
| Model | Retrieval Score | Clustering Score | Overall Score |
|-------|----------------|------------------|---------------|
| OpenAI text-embedding-3-large | **64.6** | 49.0 | **54.9** |
| OpenAI text-embedding-3-small | 62.3 | **49.0** | 54.0 |
| Gemini gemini-embedding-001 | ~60.0* | ~47.0* | ~52.0* |
| Workers AI bge-base-en-v1.5 | 53.2 | 42.0 | 48.0 |
*Estimated based on available benchmarks
**Source**: https://github.com/embeddings-benchmark/mteb
---
## Summary
**Best Overall**: Gemini gemini-embedding-001
- Flexible dimensions
- Task type optimization
- Free tier
- Good performance
**Best for Accuracy**: OpenAI text-embedding-3-large
- Highest MTEB scores
- Large context window
- Most expensive
**Best for Budget**: Cloudflare Workers AI (BGE)
- Completely free
- Edge inference
- Limited context window
**Best for Long Documents**: OpenAI models
- 8,191 token context
- vs 2,048 (Gemini) or 512 (Workers AI)
---
## Official Documentation
- **Gemini**: https://ai.google.dev/gemini-api/docs/embeddings
- **OpenAI**: https://platform.openai.com/docs/guides/embeddings
- **Workers AI**: https://developers.cloudflare.com/workers-ai/models/embedding/
- **MTEB Leaderboard**: https://github.com/embeddings-benchmark/mteb

references/rag-patterns.md

@@ -0,0 +1,483 @@
# RAG Implementation Patterns
Complete guide to Retrieval Augmented Generation patterns using Gemini embeddings and Cloudflare Vectorize.
---
## RAG Workflow Overview
```
┌─────────────────────────────────────────────────────────┐
│ DOCUMENT INGESTION (Offline)                            │
└─────────────────────────────────────────────────────────┘
Documents
    ↓
Chunking (500 words)
    ↓
Generate Embeddings (RETRIEVAL_DOCUMENT)
    ↓
Store in Vectorize + Metadata

┌─────────────────────────────────────────────────────────┐
│ QUERY PROCESSING (Runtime)                              │
└─────────────────────────────────────────────────────────┘
User Query
    ↓
Generate Embedding (RETRIEVAL_QUERY)
    ↓
Vector Search (top-K)
    ↓
Retrieve Documents
    ↓
Generate Response (LLM + Context)
    ↓
Stream to User
```
---
## Pattern 1: Basic RAG
**Use when**: Simple Q&A over a knowledge base
```typescript
async function basicRAG(query: string, env: Env): Promise<string> {
// 1. Embed query
const queryEmbedding = await generateEmbedding(query, env.GEMINI_API_KEY, 'RETRIEVAL_QUERY');
// 2. Search Vectorize
const results = await env.VECTORIZE.query(queryEmbedding, { topK: 3 });
// 3. Concatenate context
const context = results.matches
.map(m => m.metadata?.text)
.join('\n\n');
// 4. Generate response
const response = await generateResponse(context, query, env.GEMINI_API_KEY);
return response;
}
```
---
## Pattern 2: Chunked RAG (Recommended)
**Use when**: Documents are longer than 2,048 tokens
### Chunking Strategies
```typescript
// Strategy A: Fixed-size chunks with overlap
function chunkWithOverlap(text: string, size = 500, overlap = 50): string[] {
const words = text.split(/\s+/);
const chunks: string[] = [];
for (let i = 0; i < words.length; i += size - overlap) {
chunks.push(words.slice(i, i + size).join(' '));
}
return chunks;
}
// Strategy B: Sentence-based chunks
function chunkBySentences(text: string, maxSentences = 10): string[] {
const sentences = text.match(/[^.!?]+[.!?]+/g) || [];
const chunks: string[] = [];
for (let i = 0; i < sentences.length; i += maxSentences) {
chunks.push(sentences.slice(i, i + maxSentences).join(' '));
}
return chunks;
}
// Strategy C: Semantic chunks (preserves paragraphs)
function chunkByParagraphs(text: string): string[] {
return text.split(/\n\n+/).filter(p => p.trim().length > 50);
}
```
### Implementation
```typescript
async function ingestWithChunking(doc: Document, env: Env) {
const chunks = chunkWithOverlap(doc.text, 500, 50);
const vectors = [];
for (let i = 0; i < chunks.length; i++) {
const embedding = await generateEmbedding(chunks[i], env.GEMINI_API_KEY, 'RETRIEVAL_DOCUMENT');
vectors.push({
id: `${doc.id}-chunk-${i}`,
values: embedding,
metadata: {
documentId: doc.id,
chunkIndex: i,
text: chunks[i],
title: doc.title
}
});
}
await env.VECTORIZE.insert(vectors);
}
```
---
## Pattern 3: Hybrid Search (Keyword + Semantic)
**Use when**: You need both exact keyword matches and semantic understanding
```typescript
async function hybridSearch(query: string, env: Env) {
// 1. Vector search
const queryEmbedding = await generateEmbedding(query, env.GEMINI_API_KEY, 'RETRIEVAL_QUERY');
const vectorResults = await env.VECTORIZE.query(queryEmbedding, { topK: 10 });
// 2. Keyword search (using metadata or D1)
const keywordResults = await env.D1.prepare(
'SELECT * FROM documents WHERE text LIKE ? ORDER BY relevance DESC LIMIT 10'
).bind(`%${query}%`).all();
// 3. Merge and re-rank
const combined = mergeResults(vectorResults.matches, keywordResults.results);
// 4. Generate response from top results
const context = combined.slice(0, 5).map(r => r.text).join('\n\n');
return await generateResponse(context, query, env.GEMINI_API_KEY);
}
```
---
## Pattern 4: Filtered RAG
**Use when**: Need to filter by category, date, or metadata
```typescript
async function filteredRAG(query: string, filters: { category?: string; minDate?: number }, env: Env) {
// 1. Vector search
const queryEmbedding = await generateEmbedding(query, env.GEMINI_API_KEY, 'RETRIEVAL_QUERY');
const results = await env.VECTORIZE.query(queryEmbedding, { topK: 20 }); // Fetch more
// 2. Filter in application layer (until Vectorize supports metadata filtering)
const filtered = results.matches.filter(match => {
if (filters.category && match.metadata?.category !== filters.category) return false;
if (filters.minDate && match.metadata?.timestamp < filters.minDate) return false;
return true;
});
// 3. Take top 5 after filtering
const topResults = filtered.slice(0, 5);
// 4. Generate response
const context = topResults.map(r => r.metadata?.text).join('\n\n');
return await generateResponse(context, query, env.GEMINI_API_KEY);
}
```
---
## Pattern 5: Streaming RAG
**Use when**: Real-time responses with immediate feedback
```typescript
async function streamingRAG(query: string, env: Env): Promise<ReadableStream> {
// 1. Embed query and search
const queryEmbedding = await generateEmbedding(query, env.GEMINI_API_KEY, 'RETRIEVAL_QUERY');
const results = await env.VECTORIZE.query(queryEmbedding, { topK: 3 });
const context = results.matches.map(m => m.metadata?.text).join('\n\n');
// 2. Stream response from Gemini
const response = await fetch(
'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:streamGenerateContent',
{
method: 'POST',
headers: {
'x-goog-api-key': env.GEMINI_API_KEY,
'Content-Type': 'application/json'
},
body: JSON.stringify({
contents: [{
parts: [{ text: `Context:\n${context}\n\nQuestion: ${query}\n\nAnswer:` }]
}]
})
}
);
return response.body!;
}
```
---
## Pattern 6: Multi-Query RAG
**Use when**: Query might be ambiguous or multi-faceted
```typescript
async function multiQueryRAG(query: string, env: Env) {
// 1. Generate multiple query variations
const queryVariations = await generateQueryVariations(query, env.GEMINI_API_KEY);
// Returns: ["original query", "rephrased version 1", "rephrased version 2"]
// 2. Search with each variation
const allResults = await Promise.all(
queryVariations.map(async q => {
const embedding = await generateEmbedding(q, env.GEMINI_API_KEY, 'RETRIEVAL_QUERY');
return await env.VECTORIZE.query(embedding, { topK: 3 });
})
);
// 3. Merge and deduplicate
const uniqueResults = deduplicateById(allResults.flatMap(r => r.matches));
// 4. Generate response
const context = uniqueResults.slice(0, 5).map(r => r.metadata?.text).join('\n\n');
return await generateResponse(context, query, env.GEMINI_API_KEY);
}
```
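A minimal `deduplicateById`, keeping the highest-scoring match per vector ID:
```typescript
// Keep only the best-scoring match for each vector ID across all query variations
function deduplicateById(matches: VectorizeMatch[]): VectorizeMatch[] {
  const best = new Map<string, VectorizeMatch>();
  for (const match of matches) {
    const existing = best.get(match.id);
    if (!existing || match.score > existing.score) best.set(match.id, match);
  }
  return [...best.values()].sort((a, b) => b.score - a.score);
}
```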
---
## Pattern 7: Conversational RAG
**Use when**: Multi-turn conversations with context
```typescript
interface ConversationHistory {
role: 'user' | 'assistant';
content: string;
}
async function conversationalRAG(
query: string,
history: ConversationHistory[],
env: Env
) {
// 1. Create contextualized query from history
const contextualizedQuery = await reformulateQuery(query, history, env.GEMINI_API_KEY);
// 2. Search with contextualized query
const embedding = await generateEmbedding(contextualizedQuery, env.GEMINI_API_KEY, 'RETRIEVAL_QUERY');
const results = await env.VECTORIZE.query(embedding, { topK: 3 });
const retrievedContext = results.matches.map(m => m.metadata?.text).join('\n\n');
// 3. Generate response with conversation history
const prompt = `
Conversation history:
${history.map(h => `${h.role}: ${h.content}`).join('\n')}
Retrieved context:
${retrievedContext}
User: ${query}
Assistant:`;
return await generateResponse(prompt, query, env.GEMINI_API_KEY);
}
```
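`reformulateQuery` can itself be a small Gemini call that rewrites the follow-up into a standalone question. A sketch using the same generateContent endpoint as the streaming pattern (the prompt wording is illustrative):
```typescript
// Rewrite a follow-up question as a standalone search query using the history
async function reformulateQuery(
  query: string,
  history: ConversationHistory[],
  apiKey: string
): Promise<string> {
  const prompt =
    'Rewrite the last user question as a standalone search query.\n\n' +
    history.map(h => `${h.role}: ${h.content}`).join('\n') +
    `\nuser: ${query}\n\nStandalone query:`;
  const response = await fetch(
    'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent',
    {
      method: 'POST',
      headers: { 'x-goog-api-key': apiKey, 'Content-Type': 'application/json' },
      body: JSON.stringify({ contents: [{ parts: [{ text: prompt }] }] })
    }
  );
  const data = await response.json() as {
    candidates?: { content: { parts: { text: string }[] } }[];
  };
  return data.candidates?.[0]?.content?.parts?.[0]?.text?.trim() || query;
}
```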
---
## Pattern 8: Citation RAG
**Use when**: Need to cite sources in responses
```typescript
async function citationRAG(query: string, env: Env) {
const queryEmbedding = await generateEmbedding(query, env.GEMINI_API_KEY, 'RETRIEVAL_QUERY');
const results = await env.VECTORIZE.query(queryEmbedding, { topK: 5, returnMetadata: true });
// Build context with citations
const contextWithCitations = results.matches.map((match, i) =>
`[${i + 1}] ${match.metadata?.text}\nSource: ${match.metadata?.url || match.id}`
).join('\n\n');
const prompt = `Answer the question using the provided sources. Include citations [1], [2], etc. in your answer.
Sources:
${contextWithCitations}
Question: ${query}
Answer (with citations):`;
const response = await generateResponse(prompt, query, env.GEMINI_API_KEY);
return {
answer: response,
sources: results.matches.map((m, i) => ({
citation: i + 1,
text: m.metadata?.text,
url: m.metadata?.url,
score: m.score
}))
};
}
```
---
## Best Practices
### 1. Chunk Size Optimization
```typescript
// Test different chunk sizes for your use case
const chunkSizes = [200, 500, 1000, 1500];
for (const size of chunkSizes) {
const accuracy = await testRetrievalAccuracy(size);
console.log(`Chunk size ${size}: ${accuracy}% accuracy`);
}
// Recommendation: 500-1000 words with 10% overlap
```
### 2. Context Window Management
```typescript
// Don't exceed LLM context window
function truncateContext(chunks: string[], maxTokens = 4000): string {
let context = '';
let estimatedTokens = 0;
for (const chunk of chunks) {
const chunkTokens = chunk.split(/\s+/).length * 1.3; // Rough estimate
if (estimatedTokens + chunkTokens > maxTokens) break;
context += chunk + '\n\n';
estimatedTokens += chunkTokens;
}
return context;
}
```
### 3. Re-ranking
```typescript
// Re-rank results after retrieval
function rerank(results: VectorizeMatch[], query: string): VectorizeMatch[] {
return results
.map(result => ({
...result,
rerankScore: calculateRelevance(result.metadata?.text, query)
}))
.sort((a, b) => b.rerankScore - a.rerankScore);
}
```
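`calculateRelevance` can be anything from a cross-encoder call to a cheap lexical score; a naive keyword-overlap placeholder looks like this:
```typescript
// Naive lexical relevance: fraction of query terms found in the text.
// A real re-ranker (e.g. a cross-encoder) will do much better.
function calculateRelevance(text: string | undefined, query: string): number {
  if (!text) return 0;
  const terms = query.toLowerCase().split(/\s+/).filter(t => t.length > 2);
  if (terms.length === 0) return 0;
  const lowered = text.toLowerCase();
  return terms.filter(term => lowered.includes(term)).length / terms.length;
}
```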
### 4. Fallback Strategies
```typescript
async function ragWithFallback(query: string, env: Env) {
const results = await searchVectorize(query, env);
if (results.matches.length === 0 || results.matches[0].score < 0.7) {
// Fallback: Use LLM without RAG
return await generateResponse('', query, env.GEMINI_API_KEY);
}
// Normal RAG flow
const context = results.matches.map(m => m.metadata?.text).join('\n\n');
return await generateResponse(context, query, env.GEMINI_API_KEY);
}
```
---
## Performance Optimization
### 1. Caching
```typescript
// Cache embeddings
const embeddingCache = new Map<string, number[]>();
async function getCachedEmbedding(text: string, apiKey: string) {
const key = hashText(text);
if (embeddingCache.has(key)) {
return embeddingCache.get(key)!;
}
const embedding = await generateEmbedding(text, apiKey, 'RETRIEVAL_QUERY');
embeddingCache.set(key, embedding);
return embedding;
}
```
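`hashText` only needs to produce a stable cache key, so a small synchronous hash is enough (swap in `crypto.subtle.digest` for SHA-256 if you prefer, noting that call is async):
```typescript
// FNV-1a string hash: fast, synchronous, fine for cache keys (not for security)
function hashText(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16);
}
```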
### 2. Batch Processing
```typescript
// Ingest documents in parallel
async function batchIngest(documents: Document[], env: Env, concurrency = 5) {
for (let i = 0; i < documents.length; i += concurrency) {
const batch = documents.slice(i, i + concurrency);
await Promise.all(
batch.map(doc => ingestDocument(doc, env))
);
}
}
```
---
## Common Pitfalls
### ❌ Don't: Use same task type for queries and documents
```typescript
// Wrong
const embedding = await generateEmbedding(query, apiKey, 'RETRIEVAL_DOCUMENT');
```
### ✅ Do: Use correct task types
```typescript
// Correct
const queryEmbedding = await generateEmbedding(query, apiKey, 'RETRIEVAL_QUERY');
const docEmbedding = await generateEmbedding(doc, apiKey, 'RETRIEVAL_DOCUMENT');
```
### ❌ Don't: Return too many or too few results
```typescript
// Too few (might miss relevant info)
const results = await env.VECTORIZE.query(embedding, { topK: 1 });
// Too many (noise, cost)
const results = await env.VECTORIZE.query(embedding, { topK: 50 });
```
### ✅ Do: Find optimal topK for your use case
```typescript
// Test different topK values
const topK = 5; // Good default for most use cases
const results = await env.VECTORIZE.query(embedding, { topK });
```
---
## Complete Example
See `templates/rag-with-vectorize.ts` for a production-ready implementation combining these patterns.
---
## Official Documentation
- **Gemini Embeddings**: https://ai.google.dev/gemini-api/docs/embeddings
- **Vectorize**: https://developers.cloudflare.com/vectorize/
- **RAG Best Practices**: https://ai.google.dev/gemini-api/docs/document-processing

references/top-errors.md

@@ -0,0 +1,460 @@
# Top 8 Embedding Errors (And How to Fix Them)
This document lists the 8 most common errors when working with Gemini embeddings, their root causes, and proven solutions.
---
## Error 1: Dimension Mismatch
### Error Message
```
Error: Vector dimensions do not match. Expected 768, got 3072
```
### Why It Happens
- Generated embedding with default dimensions (3072) but Vectorize index expects 768
- Mixed embeddings from different dimension settings
### Root Cause
Not specifying `outputDimensionality` parameter when generating embeddings.
### Prevention
```typescript
// ❌ BAD: No outputDimensionality (defaults to 3072)
const embedding = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: text
});
// ✅ GOOD: Match Vectorize index dimensions
const embedding = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: text,
config: { outputDimensionality: 768 } // ← Match your index
});
```
### Fix
1. **Option A**: Regenerate embeddings with correct dimensions
2. **Option B**: Recreate Vectorize index with 3072 dimensions
```bash
# Recreate index with correct dimensions
npx wrangler vectorize create my-index --dimensions 768 --metric cosine
```
**Sources**:
- https://ai.google.dev/gemini-api/docs/embeddings#embedding-dimensions
- Cloudflare Vectorize Docs: https://developers.cloudflare.com/vectorize/
---
## Error 2: Batch Size Limit Exceeded
### Error Message
```
Error: Request contains too many texts. Maximum: 100
```
### Why It Happens
- Tried to embed more texts than API allows in single request
- Different limits for single vs batch endpoints
### Root Cause
Gemini API limits the number of texts per batch request.
### Prevention
```typescript
// ❌ BAD: Trying to embed 500 texts at once
const embeddings = await ai.models.embedContent({
model: 'gemini-embedding-001',
contents: largeArray, // 500 texts
config: { taskType: 'RETRIEVAL_DOCUMENT' }
});
// ✅ GOOD: Chunk into batches
async function batchEmbed(texts: string[], batchSize = 100) {
const allEmbeddings: number[][] = [];
for (let i = 0; i < texts.length; i += batchSize) {
const batch = texts.slice(i, i + batchSize);
const response = await ai.models.embedContent({
model: 'gemini-embedding-001',
contents: batch,
config: { taskType: 'RETRIEVAL_DOCUMENT', outputDimensionality: 768 }
});
allEmbeddings.push(...response.embeddings.map(e => e.values));
// Rate limiting delay
if (i + batchSize < texts.length) {
await new Promise(resolve => setTimeout(resolve, 1000));
}
}
return allEmbeddings;
}
```
**Sources**:
- Gemini API Limits: https://ai.google.dev/gemini-api/docs/rate-limits
---
## Error 3: Rate Limiting (429 Too Many Requests)
### Error Message
```
Error: 429 Too Many Requests - Rate limit exceeded
```
### Why It Happens
- Exceeded 100 requests per minute (free tier)
- Exceeded tokens per minute limit
- No exponential backoff implemented
### Root Cause
Free tier rate limits: 100 RPM, 30k TPM, 1k RPD
### Prevention
```typescript
// ❌ BAD: No rate limiting
for (const text of texts) {
await ai.models.embedContent({ /* ... */ }); // Will hit 429 after 100 requests
}
// ✅ GOOD: Exponential backoff
async function embedWithRetry(text: string, maxRetries = 3) {
for (let attempt = 0; attempt < maxRetries; attempt++) {
try {
return await ai.models.embedContent({
model: 'gemini-embedding-001',
content: text,
config: { taskType: 'SEMANTIC_SIMILARITY', outputDimensionality: 768 }
});
} catch (error: any) {
if (error.status === 429 && attempt < maxRetries - 1) {
const delay = Math.pow(2, attempt) * 1000; // 1s, 2s, 4s
console.log(`Rate limit hit. Retrying in ${delay / 1000}s...`);
await new Promise(resolve => setTimeout(resolve, delay));
continue;
}
throw error;
}
}
}
```
**Rate Limits**:
| Tier | RPM | TPM | RPD |
|------|-----|-----|-----|
| Free | 100 | 30,000 | 1,000 |
| Tier 1 | 3,000 | 1,000,000 | - |
**Sources**:
- https://ai.google.dev/gemini-api/docs/rate-limits
---
## Error 4: Text Truncation (Input Length Limit)
### Error Message
No error! Text is **silently truncated** at 2,048 tokens.
### Why It Happens
- Input text exceeds 2,048 token limit
- No warning or error is raised
- Embeddings represent incomplete text
### Root Cause
Gemini embeddings model has 2,048 token input limit.
### Prevention
```typescript
// ❌ BAD: Long text (silently truncated)
const longText = "...".repeat(10000); // Very long
const embedding = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: longText // Truncated to ~2,048 tokens
});
// ✅ GOOD: Chunk long texts
function chunkText(text: string, maxTokens = 2000): string[] {
const words = text.split(/\s+/);
const chunks: string[] = [];
let currentChunk: string[] = [];
for (const word of words) {
currentChunk.push(word);
// Rough estimate: 1 token ≈ 0.75 words, so tokens ≈ words / 0.75
if (currentChunk.length / 0.75 >= maxTokens) {
chunks.push(currentChunk.join(' '));
currentChunk = [];
}
}
if (currentChunk.length > 0) {
chunks.push(currentChunk.join(' '));
}
return chunks;
}
const chunks = chunkText(longText, 2000);
const embeddings = await ai.models.embedContent({
model: 'gemini-embedding-001',
contents: chunks,
config: { taskType: 'RETRIEVAL_DOCUMENT', outputDimensionality: 768 }
});
```
**Sources**:
- https://ai.google.dev/gemini-api/docs/models/gemini#gemini-embedding-001
---
## Error 5: Cosine Similarity Calculation Errors
### Error Message
```
Error: Similarity values out of range (-1.5 to 1.2)
```
### Why It Happens
- Incorrect formula (using dot product instead of cosine similarity)
- Not normalizing magnitudes
- Division by zero for zero vectors
### Root Cause
Improper implementation of cosine similarity formula.
### Prevention
```typescript
// ❌ BAD: Just dot product (not cosine similarity)
function badSimilarity(a: number[], b: number[]): number {
let sum = 0;
for (let i = 0; i < a.length; i++) {
sum += a[i] * b[i];
}
return sum; // Wrong! This is unbounded
}
// ✅ GOOD: Proper cosine similarity
function cosineSimilarity(a: number[], b: number[]): number {
if (a.length !== b.length) {
throw new Error('Vector dimensions must match');
}
let dotProduct = 0;
let magnitudeA = 0;
let magnitudeB = 0;
for (let i = 0; i < a.length; i++) {
dotProduct += a[i] * b[i];
magnitudeA += a[i] * a[i];
magnitudeB += b[i] * b[i];
}
if (magnitudeA === 0 || magnitudeB === 0) {
return 0; // Handle zero vectors
}
return dotProduct / (Math.sqrt(magnitudeA) * Math.sqrt(magnitudeB));
}
```
**Formula**:
```
cosine_similarity(A, B) = (A · B) / (||A|| × ||B||)
```
Where:
- `A · B` = dot product
- `||A||` = magnitude of vector A = √(a₁² + a₂² + ... + aₙ²)
**Result Range**: Always between -1 and 1
- 1 = identical direction
- 0 = perpendicular
- -1 = opposite direction
**Sources**:
- https://en.wikipedia.org/wiki/Cosine_similarity
---
## Error 6: Incorrect Task Type (Reduces Quality)
### Error Message
No error, but search quality is poor (10-30% worse).
### Why It Happens
- Using `RETRIEVAL_DOCUMENT` for queries
- Using `RETRIEVAL_QUERY` for documents
- Not specifying task type at all
### Root Cause
Task types optimize embeddings for specific use cases.
### Prevention
```typescript
// ❌ BAD: Wrong task type for RAG
const queryEmbedding = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: userQuery,
config: { taskType: 'RETRIEVAL_DOCUMENT' } // ← Wrong! Should be RETRIEVAL_QUERY
});
// ✅ GOOD: Correct task types
// For user queries
const queryEmbedding = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: userQuery,
config: { taskType: 'RETRIEVAL_QUERY', outputDimensionality: 768 }
});
// For documents to index
const docEmbedding = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: documentText,
config: { taskType: 'RETRIEVAL_DOCUMENT', outputDimensionality: 768 }
});
```
**Task Types Cheat Sheet**:
| Task Type | Use For | Example |
|-----------|---------|---------|
| `RETRIEVAL_QUERY` | User queries | "What is RAG?" |
| `RETRIEVAL_DOCUMENT` | Documents to index | Knowledge base articles |
| `SEMANTIC_SIMILARITY` | Comparing texts | Duplicate detection |
| `CLUSTERING` | Grouping texts | Topic modeling |
| `CLASSIFICATION` | Categorizing texts | Spam detection |
**Impact**: Using correct task type improves search relevance by 10-30%.
**Sources**:
- https://ai.google.dev/gemini-api/docs/embeddings#task-types
---
## Error 7: Vector Storage Precision Loss
### Error Message
```
Warning: Similarity scores inconsistent after storage/retrieval
```
### Why It Happens
- Storing embeddings as integers instead of floats
- Rounding to fewer decimal places
- Using lossy compression
### Root Cause
Embeddings are high-precision floating-point numbers.
### Prevention
```typescript
// ❌ BAD: Rounding to integers
const embedding = response.embedding.values;
const rounded = embedding.map(v => Math.round(v)); // Precision loss!
await db.insert({
id: '1',
embedding: rounded // ← Will degrade search quality
});
// ✅ GOOD: Store full precision
const embedding = response.embedding.values; // Keep as-is
await db.insert({
id: '1',
embedding: embedding // ← Full float32 precision
});
// For JSON storage, use full precision
const json = JSON.stringify({
id: '1',
embedding: embedding // JavaScript numbers are float64
});
```
**Storage Recommendations**:
- **Vectorize**: Handles float32 automatically ✅
- **D1/SQLite**: Use BLOB for binary float32 array (see the sketch below)
- **KV**: Store as JSON (float64 precision)
- **R2**: Store as binary float32 array
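A minimal sketch of the D1 BLOB approach, packing the vector into a `Float32Array` on write and restoring it on read (the `embeddings` table and column names are illustrative):
```typescript
// Assumed schema: CREATE TABLE embeddings (id TEXT PRIMARY KEY, vector BLOB)
async function storeEmbedding(id: string, embedding: number[], db: D1Database) {
  const blob = new Uint8Array(new Float32Array(embedding).buffer); // full float32 precision
  await db.prepare('INSERT OR REPLACE INTO embeddings (id, vector) VALUES (?, ?)')
    .bind(id, blob)
    .run();
}
async function loadEmbedding(id: string, db: D1Database): Promise<number[] | null> {
  const row = await db.prepare('SELECT vector FROM embeddings WHERE id = ?')
    .bind(id)
    .first<{ vector: ArrayBuffer }>();
  return row ? Array.from(new Float32Array(row.vector)) : null;
}
```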
**Sources**:
- Cloudflare Vectorize: https://developers.cloudflare.com/vectorize/
---
## Error 8: Model Version Confusion
### Error Message
```
Error: Model 'gemini-embedding-exp-03-07' is deprecated
```
### Why It Happens
- Using experimental or deprecated model
- Mixing embeddings from different model versions
- Not keeping up with model updates
### Root Cause
Gemini has stable and experimental embedding models.
### Prevention
```typescript
// ❌ BAD: Using experimental/deprecated model
const embedding = await ai.models.embedContent({
model: 'gemini-embedding-exp-03-07', // Deprecated October 2025
content: text
});
// ✅ GOOD: Use stable model
const embedding = await ai.models.embedContent({
model: 'gemini-embedding-001', // Stable production model
content: text,
config: {
taskType: 'SEMANTIC_SIMILARITY',
outputDimensionality: 768
}
});
```
**Model Status**:
| Model | Status | Recommendation |
|-------|--------|----------------|
| `gemini-embedding-001` | ✅ Stable | Use this |
| `gemini-embedding-exp-03-07` | ❌ Deprecated (Oct 2025) | Migrate to gemini-embedding-001 |
**CRITICAL**: Never mix embeddings from different models. They use different vector spaces and are not comparable.
**Sources**:
- https://ai.google.dev/gemini-api/docs/models/gemini#text-embeddings
---
## Summary Checklist
Before deploying to production, verify:
- [ ] `outputDimensionality` matches Vectorize index dimensions
- [ ] Batch size ≤ API limits (chunk large datasets)
- [ ] Rate limiting implemented with exponential backoff
- [ ] Long texts are chunked (≤ 2,048 tokens)
- [ ] Cosine similarity formula is correct
- [ ] Correct task types used (RETRIEVAL_QUERY vs RETRIEVAL_DOCUMENT)
- [ ] Embeddings stored with full precision (float32)
- [ ] Using stable model (`gemini-embedding-001`)
**Following this checklist avoids all eight errors documented above.**
---
## Additional Resources
- **Official Docs**: https://ai.google.dev/gemini-api/docs/embeddings
- **Rate Limits**: https://ai.google.dev/gemini-api/docs/rate-limits
- **Vectorize Docs**: https://developers.cloudflare.com/vectorize/
- **Model Specs**: https://ai.google.dev/gemini-api/docs/models/gemini#gemini-embedding-001

@@ -0,0 +1,469 @@
# Cloudflare Vectorize Integration
Complete guide for using Gemini embeddings with Cloudflare Vectorize.
---
## Quick Start
### 1. Create Vectorize Index
```bash
# Create index with 768 dimensions (recommended for Gemini)
npx wrangler vectorize create gemini-embeddings --dimensions 768 --metric cosine
# Alternative: 3072 dimensions (Gemini default, more accurate but larger)
npx wrangler vectorize create gemini-embeddings-large --dimensions 3072 --metric cosine
```
### 2. Bind to Worker
Add to `wrangler.jsonc`:
```jsonc
{
"name": "my-rag-worker",
"main": "src/index.ts",
"compatibility_date": "2025-10-25",
"vectorize": {
"bindings": [
{
"binding": "VECTORIZE",
"index_name": "gemini-embeddings"
}
]
}
}
```
### 3. Generate and Store Embeddings
```typescript
// Generate embedding
const response = await fetch(
'https://generativelanguage.googleapis.com/v1beta/models/gemini-embedding-001:embedContent',
{
method: 'POST',
headers: {
'x-goog-api-key': env.GEMINI_API_KEY,
'Content-Type': 'application/json'
},
body: JSON.stringify({
content: { parts: [{ text: 'Your document text' }] },
taskType: 'RETRIEVAL_DOCUMENT',
outputDimensionality: 768 // MUST match index dimensions
})
}
);
const data = await response.json();
const embedding = data.embedding.values;
// Insert into Vectorize
await env.VECTORIZE.insert([{
id: 'doc-1',
values: embedding,
metadata: { text: 'Your document text', source: 'manual' }
}]);
```
---
## Dimension Configuration
**CRITICAL**: Embedding dimensions MUST match Vectorize index dimensions.
| Gemini Dimensions | Storage (per vector) | Recommended For |
|-------------------|---------------------|-----------------|
| 768 | 3 KB | Most use cases, cost-effective |
| 1536 | 6 KB | Balance accuracy/storage |
| 3072 | 12 KB | Maximum accuracy |
**Create index to match your embeddings**:
```bash
# For 768-dim embeddings
npx wrangler vectorize create my-index --dimensions 768 --metric cosine
# For 1536-dim embeddings
npx wrangler vectorize create my-index --dimensions 1536 --metric cosine
# For 3072-dim embeddings (Gemini default)
npx wrangler vectorize create my-index --dimensions 3072 --metric cosine
```
---
## Metric Selection
Vectorize supports 3 distance metrics:
### Cosine (Recommended)
```bash
npx wrangler vectorize create my-index --dimensions 768 --metric cosine
```
**When to use**:
- ✅ Semantic search (most common)
- ✅ Document similarity
- ✅ RAG systems
**Range**: -1 to 1; typical text-embedding scores fall between 0 (unrelated) and 1 (identical)
### Euclidean
```bash
npx wrangler vectorize create my-index --dimensions 768 --metric euclidean
```
**When to use**:
- ✅ Absolute distance matters
- ✅ Magnitude is important
**Range**: 0 (identical) to ∞ (very different)
### Dot Product
```bash
npx wrangler vectorize create my-index --dimensions 768 --metric dot-product
```
**When to use**:
- ✅ Pre-normalized vectors
- ✅ Performance optimization
**Range**: -1 to 1 (for normalized vectors)
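If you do choose dot-product, normalize vectors to unit length before inserting and querying; a minimal helper:
```typescript
// Normalize a vector to unit length so dot product behaves like cosine similarity
function normalize(vector: number[]): number[] {
  const magnitude = Math.sqrt(vector.reduce((sum, v) => sum + v * v, 0));
  return magnitude === 0 ? vector : vector.map(v => v / magnitude);
}
// Use the normalized values both at insert time and at query time
await env.VECTORIZE.insert([{ id: 'doc-1', values: normalize(embedding), metadata: { text } }]);
```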
**Recommendation**: Use **cosine** for Gemini embeddings (most common and intuitive).
---
## Insert Patterns
### Single Insert
```typescript
await env.VECTORIZE.insert([{
id: 'doc-1',
values: embedding,
metadata: {
text: 'Document content',
timestamp: Date.now(),
category: 'documentation'
}
}]);
```
### Batch Insert
```typescript
const vectors = documents.map((doc, i) => ({
id: `doc-${i}`,
values: doc.embedding,
metadata: { text: doc.text }
}));
// Insert up to 100 vectors at once
await env.VECTORIZE.insert(vectors);
```
### Upsert (Update or Insert)
```typescript
// insert() will not overwrite an existing ID; use upsert() to update in place
await env.VECTORIZE.upsert([{
id: 'doc-1', // Existing ID
values: newEmbedding,
metadata: { text: 'Updated content' }
}]);
```
---
## Query Patterns
### Basic Query
```typescript
const results = await env.VECTORIZE.query(queryEmbedding, {
topK: 5
});
console.log(results.matches);
// [{ id: 'doc-1', score: 0.95 }, ...]
```
### Query with Metadata
```typescript
const results = await env.VECTORIZE.query(queryEmbedding, {
topK: 5,
returnMetadata: true
});
results.matches.forEach(match => {
console.log(match.id); // 'doc-1'
console.log(match.score); // 0.95
console.log(match.metadata.text); // 'Document content'
});
```
### Query with Metadata Filtering (Future)
```typescript
// Coming soon: Filter by metadata
const results = await env.VECTORIZE.query(queryEmbedding, {
topK: 5,
filter: { category: 'documentation' }
});
```
---
## Metadata Best Practices
### What to Store
```typescript
await env.VECTORIZE.insert([{
id: 'doc-1',
values: embedding,
metadata: {
// ✅ Store these
text: 'The actual document content', // For retrieval
title: 'Document title',
url: 'https://example.com/doc',
timestamp: Date.now(),
category: 'product',
// ❌ Don't store these
embedding: embedding, // Already stored as values
largeObject: { /* ... */ } // Keep metadata small
}
}]);
```
### Metadata Limits
- **Max size**: ~1 KB per vector
- **Best practice**: Store only what you need for retrieval/display
- **For large data**: Store minimal metadata, fetch full data from D1/KV using ID (see the sketch below)
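A sketch of that split: keep only an ID and title in Vectorize, then pull the full record from KV after the search (the `DOCS` KV binding is illustrative):
```typescript
// Keep Vectorize metadata small; fetch the full document from KV by vector ID.
// Assumes full records were stored earlier with env.DOCS.put(id, JSON.stringify(doc)).
const results = await env.VECTORIZE.query(queryEmbedding, { topK: 5, returnMetadata: true });
const documents = await Promise.all(
  results.matches.map(async match => ({
    id: match.id,
    score: match.score,
    title: match.metadata?.title,
    document: await env.DOCS.get(match.id, 'json')
  }))
);
```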
---
## Complete RAG Example
```typescript
interface Env {
GEMINI_API_KEY: string;
VECTORIZE: VectorizeIndex;
}
export default {
async fetch(request: Request, env: Env): Promise<Response> {
const url = new URL(request.url);
// Ingest: POST /ingest with { text: "..." }
if (url.pathname === '/ingest' && request.method === 'POST') {
const { text } = await request.json();
// 1. Generate embedding
const embeddingRes = await fetch(
'https://generativelanguage.googleapis.com/v1beta/models/gemini-embedding-001:embedContent',
{
method: 'POST',
headers: {
'x-goog-api-key': env.GEMINI_API_KEY,
'Content-Type': 'application/json'
},
body: JSON.stringify({
content: { parts: [{ text }] },
taskType: 'RETRIEVAL_DOCUMENT',
outputDimensionality: 768
})
}
);
const embeddingData = await embeddingRes.json();
const embedding = embeddingData.embedding.values;
// 2. Store in Vectorize
await env.VECTORIZE.insert([{
id: `doc-${Date.now()}`,
values: embedding,
metadata: { text, timestamp: Date.now() }
}]);
return new Response(JSON.stringify({ success: true }));
}
// Query: POST /query with { query: "..." }
if (url.pathname === '/query' && request.method === 'POST') {
const { query } = await request.json();
// 1. Generate query embedding
const embeddingRes = await fetch(
'https://generativelanguage.googleapis.com/v1beta/models/gemini-embedding-001:embedContent',
{
method: 'POST',
headers: {
'x-goog-api-key': env.GEMINI_API_KEY,
'Content-Type': 'application/json'
},
body: JSON.stringify({
content: { parts: [{ text: query }] },
taskType: 'RETRIEVAL_QUERY',
outputDimensionality: 768
})
}
);
const embeddingData = await embeddingRes.json();
const embedding = embeddingData.embedding.values;
// 2. Search Vectorize
const results = await env.VECTORIZE.query(embedding, {
topK: 5,
returnMetadata: true
});
return new Response(JSON.stringify({
query,
results: results.matches.map(m => ({
id: m.id,
score: m.score,
text: m.metadata?.text
}))
}));
}
return new Response('Not found', { status: 404 });
}
};
```
---
## Index Management
### List Indexes
```bash
npx wrangler vectorize list
```
### Get Index Info
```bash
npx wrangler vectorize get gemini-embeddings
```
### Delete Index
```bash
npx wrangler vectorize delete gemini-embeddings
```
**CRITICAL**: Deleting an index deletes all vectors permanently.
---
## Limitations & Quotas
| Feature | Free Plan | Paid Plans |
|---------|-----------|------------|
| Indexes per account | 100 | 100 |
| Vectors per index | 200,000 | 5,000,000+ |
| Queries per day | 30,000,000 | Unlimited |
| Dimensions | Up to 1536 | Up to 3072 |
**Source**: https://developers.cloudflare.com/vectorize/platform/pricing/
---
## Best Practices
### 1. Choose Dimensions Wisely
```typescript
// ✅ 768 dimensions (recommended)
// - Good accuracy
// - Low storage
// - Fast queries
// ⚠️ 3072 dimensions (if accuracy is critical)
// - Best accuracy
// - 4x storage
// - Slower queries
```
### 2. Use Metadata for Context
```typescript
await env.VECTORIZE.insert([{
id: 'doc-1',
values: embedding,
metadata: {
text: 'Store the actual text here for retrieval',
url: 'https://...',
timestamp: Date.now()
}
}]);
```
### 3. Implement Caching
```typescript
// Cache embeddings in KV
const cached = await env.KV.get(`embedding:${textHash}`);
if (cached) {
return JSON.parse(cached);
}
const embedding = await generateEmbedding(text);
await env.KV.put(`embedding:${textHash}`, JSON.stringify(embedding), {
expirationTtl: 86400 // 24 hours
});
```
### 4. Monitor Usage
```bash
# Check index stats
npx wrangler vectorize get gemini-embeddings
# Shows:
# - Total vectors
# - Dimensions
# - Metric type
```
---
## Troubleshooting
### Dimension Mismatch Error
```
Error: Vector dimensions do not match. Expected 768, got 3072
```
**Solution**: Ensure embedding `outputDimensionality` matches index dimensions.
### No Results Found
**Possible causes**:
1. Index is empty (no vectors inserted)
2. Query embedding is wrong task type (use RETRIEVAL_QUERY)
3. Similarity threshold too high
**Solution**: Check index has vectors, use correct task types.
---
## Official Documentation
- **Vectorize Docs**: https://developers.cloudflare.com/vectorize/
- **Pricing**: https://developers.cloudflare.com/vectorize/platform/pricing/
- **Wrangler CLI**: https://developers.cloudflare.com/workers/wrangler/