Initial commit

Zhongwei Li
2025-11-30 08:24:54 +08:00
commit 7927519669
17 changed files with 4377 additions and 0 deletions

@@ -0,0 +1,310 @@
# Choosing the Right Embedding Dimensions
Guide to selecting optimal dimensions for your use case with Gemini embeddings.
---
## Quick Decision Table
| Your Priority | Recommended Dimensions | Why |
|--------------|----------------------|-----|
| **Balanced (default)** | **768** | Best accuracy-to-cost ratio |
| **Maximum accuracy** | 3072 | Gemini's full capability |
| **Storage-limited** | 512 or lower | Reduce storage/compute |
| **OpenAI compatibility** | 1536 | Match OpenAI dimensions |
---
## Available Dimensions
Gemini supports **any dimension from 128 to 3072** using Matryoshka Representation Learning.
### Common Choices
| Dimensions | Storage/Vector | Search Speed | Accuracy | Use Case |
|------------|---------------|--------------|----------|----------|
| **768** | ~3 KB | Fast | Good | **Recommended default** |
| 1536 | ~6 KB | Medium | Better | Match OpenAI, large datasets |
| 3072 | ~12 KB | Slower | Best | Maximum accuracy needed |
| 512 | ~2 KB | Very fast | Acceptable | Storage-constrained |
| 256 | ~1 KB | Ultra fast | Lower | Extreme constraints |
---
## Matryoshka Representation Learning
Gemini's flexible dimensions work because of **Matryoshka Representation Learning**: the model learns nested representations in which the first N dimensions already carry the core semantics, and each additional dimension adds progressively finer detail.
```
Dimensions 1-256: Core semantic information
Dimensions 257-512: Additional nuance
Dimensions 513-768: Fine-grained details
Dimensions 769-1536: Subtle distinctions
Dimensions 1537-3072: Maximum precision
```
**Key Point**: Lower dimensions aren't "worse" - they're **compressed** versions of the full embedding.
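Because the representations are nested, you can also truncate a full-length embedding client-side and re-normalize it instead of calling the API again at a lower `outputDimensionality`. A minimal sketch (the `fullEmbedding3072` input is assumed to come from a prior full-dimension call):
```typescript
// Keep the first `dims` values of a Matryoshka embedding and re-normalize
// to unit length so cosine similarity still behaves as expected.
function truncateEmbedding(fullEmbedding: number[], dims: number): number[] {
  const prefix = fullEmbedding.slice(0, dims);
  const norm = Math.sqrt(prefix.reduce((sum, v) => sum + v * v, 0));
  return norm === 0 ? prefix : prefix.map(v => v / norm);
}
// Example: derive a compact 768-dim vector from one 3072-dim response
const compact = truncateEmbedding(fullEmbedding3072, 768);
```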
---
## Storage Impact
### Example: 100,000 Documents
| Dimensions | Storage Required | Monthly Cost (R2)* |
|------------|-----------------|-------------------|
| 256 | ~100 MB | $0.01 |
| 512 | ~200 MB | $0.02 |
| **768** | **~300 MB** | **$0.03** |
| 1536 | ~600 MB | $0.06 |
| 3072 | ~1.2 GB | $0.12 |
\*Assuming 4 bytes per float, R2 pricing $0.015/GB/month
**For 1M vectors** (arithmetic sketched below):
- 768 dims: ~3 GB storage
- 3072 dims: ~12 GB storage (4x more expensive)
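The storage figures above are plain float32 arithmetic (vectors × dimensions × 4 bytes):
```typescript
// Raw storage for float32 vectors: count × dimensions × 4 bytes
function storageGB(numVectors: number, dims: number): number {
  return (numVectors * dims * 4) / 1e9;
}
console.log(storageGB(100_000, 768).toFixed(2));    // "0.31" → ~300 MB
console.log(storageGB(1_000_000, 3072).toFixed(2)); // "12.29" → ~12 GB
```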
---
## Accuracy Trade-offs
Based on MTEB benchmarks (approximate):
| Dimensions | Retrieval Accuracy | Relative to 3072 |
|------------|-------------------|------------------|
| 256 | ~85% | -15% |
| 512 | ~92% | -8% |
| **768** | **~96%** | **-4%** |
| 1536 | ~98% | -2% |
| 3072 | 100% (baseline) | 0% |
**Diminishing returns**: Going from 768 → 3072 dims only improves accuracy by ~4% while quadrupling storage.
---
## Query Performance
Search latency (approximate, 100k vectors):
| Dimensions | Query Latency | Throughput (QPS) |
|------------|--------------|------------------|
| 256 | ~10ms | ~1000 |
| 512 | ~15ms | ~700 |
| **768** | **~20ms** | **~500** |
| 1536 | ~35ms | ~300 |
| 3072 | ~60ms | ~170 |
**Note**: Actual performance depends on Vectorize implementation and hardware.
---
## When to Use Each
### 768 Dimensions (Recommended Default)
**Use when**:
- ✅ Building standard RAG systems
- ✅ General semantic search
- ✅ Cost-effectiveness matters
- ✅ Storage is a consideration
**Don't use when**:
- ❌ You need absolute maximum accuracy
- ❌ Migrating from OpenAI 1536-dim embeddings
**Example**:
```typescript
const embedding = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: text,
config: {
taskType: 'RETRIEVAL_DOCUMENT',
outputDimensionality: 768 // ← Recommended
}
});
```
---
### 3072 Dimensions (Maximum Accuracy)
**Use when**:
- ✅ Accuracy is critical (legal, medical, research)
- ✅ Budget allows 4x storage cost
- ✅ Query latency isn't a concern
- ✅ Small dataset (<10k vectors)
**Don't use when**:
- ❌ Cost-sensitive project
- ❌ Large dataset (>100k vectors)
- ❌ Real-time search required
**Example**:
```typescript
const embedding = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: text,
config: {
taskType: 'RETRIEVAL_DOCUMENT',
outputDimensionality: 3072 // ← Maximum accuracy
}
});
```
---
### 1536 Dimensions (OpenAI Compatibility)
**Use when**:
- ✅ Migrating from OpenAI text-embedding-3-small
- ✅ Need compatibility with existing infrastructure
- ✅ Balancing accuracy and cost
**Example**:
```typescript
const embedding = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: text,
config: {
taskType: 'RETRIEVAL_DOCUMENT',
outputDimensionality: 1536 // ← Match OpenAI
}
});
```
---
### 512 or Lower (Storage-Constrained)
**Use when**:
- ✅ Extreme storage constraints
- ✅ Millions of vectors
- ✅ Acceptable to sacrifice some accuracy
- ✅ Ultra-fast queries required
**Example**:
```typescript
const embedding = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: text,
config: {
taskType: 'RETRIEVAL_DOCUMENT',
outputDimensionality: 512 // ← Compact
}
});
```
---
## Migration Between Dimensions
**CRITICAL**: You cannot mix different dimensions in the same index.
### Option 1: Recreate Index
```bash
# Delete old index
npx wrangler vectorize delete my-index
# Create new index with different dimensions
npx wrangler vectorize create my-index --dimensions 768 --metric cosine
# Re-generate all embeddings with new dimensions
# Re-insert all vectors
```
### Option 2: Create New Index
```bash
# Keep old index running
# Create new index
npx wrangler vectorize create my-index-768 --dimensions 768 --metric cosine
# Gradually migrate vectors
# Switch over when ready
# Delete old index
```
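Whichever option you choose, every vector has to be regenerated at the new dimensionality. A minimal sketch of that re-embedding loop, assuming the same `ai` client used in the examples above, a `documents` array of `{ id, text }` records, and a `VECTORIZE` binding pointing at the new 768-dimension index:
```typescript
// Re-embed every document at the new dimensionality and insert into the new index
async function migrateToNewIndex(documents: { id: string; text: string }[], env: Env) {
  for (const doc of documents) {
    const response = await ai.models.embedContent({
      model: 'gemini-embedding-001',
      content: doc.text,
      config: { taskType: 'RETRIEVAL_DOCUMENT', outputDimensionality: 768 }
    });
    await env.VECTORIZE.insert([{
      id: doc.id,
      values: response.embedding.values,
      metadata: { text: doc.text }
    }]);
  }
}
```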
---
## Testing Methodology
To test if lower dimensions work for your use case:
```typescript
// 1. Generate test embeddings with different dimensions
const dims = [256, 512, 768, 1536, 3072];
const testEmbeddings = await Promise.all(
dims.map(dim => ai.models.embedContent({
model: 'gemini-embedding-001',
content: testText,
config: { outputDimensionality: dim }
}))
);
// 2. Test retrieval accuracy
const queries = ['query1', 'query2', 'query3'];
for (const dim of dims) {
const accuracy = await testRetrievalAccuracy(queries, dim);
console.log(`${dim} dims: ${accuracy}% accuracy`);
}
// 3. Measure performance
for (const dim of dims) {
const latency = await measureQueryLatency(dim);
console.log(`${dim} dims: ${latency}ms latency`);
}
```
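The `testRetrievalAccuracy` and `measureQueryLatency` helpers above are placeholders. One way to implement the accuracy side, assuming a small labeled set of query/expected-document pairs and one Vectorize index per dimensionality (signature simplified accordingly):
```typescript
// Hypothetical helper: percentage of labeled queries whose expected document
// appears in the top-K results of an index built at the given dimensionality.
interface LabeledQuery { query: string; expectedDocId: string; }
async function testRetrievalAccuracy(
  labeled: LabeledQuery[],
  dim: number,
  index: VectorizeIndex,
  topK = 5
): Promise<number> {
  let hits = 0;
  for (const { query, expectedDocId } of labeled) {
    const response = await ai.models.embedContent({
      model: 'gemini-embedding-001',
      content: query,
      config: { taskType: 'RETRIEVAL_QUERY', outputDimensionality: dim }
    });
    const results = await index.query(response.embedding.values, { topK });
    if (results.matches.some(m => m.id === expectedDocId)) hits++;
  }
  return (hits / labeled.length) * 100;
}
```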
---
## Recommendations by Use Case
### RAG for Documentation
- **Recommended**: 768 dims
- **Reasoning**: Good accuracy, reasonable storage, fast queries
### E-commerce Search
- **Recommended**: 512-768 dims
- **Reasoning**: Speed matters, millions of products
### Legal Document Search
- **Recommended**: 3072 dims
- **Reasoning**: Accuracy is critical, smaller datasets
### Customer Support Chatbot
- **Recommended**: 768 dims
- **Reasoning**: Balance accuracy and response time
### Research Paper Search
- **Recommended**: 1536-3072 dims
- **Reasoning**: Nuanced understanding needed
---
## Summary
**Default Choice**: **768 dimensions**
- 96% of 3072-dim accuracy
- 75% less storage
- 3x faster queries
- Best balance for most applications
**Only use 3072 if**:
- You need every percentage point of accuracy
- You have budget for 4x storage
- You have a small dataset
**Consider lower (<768) if**:
- You have millions of vectors
- Storage cost is a major concern
- Ultra-fast queries are required
---
## Official Documentation
- **Matryoshka Learning**: https://arxiv.org/abs/2205.13147
- **Gemini Embeddings**: https://ai.google.dev/gemini-api/docs/embeddings
- **MTEB Benchmark**: https://github.com/embeddings-benchmark/mteb

@@ -0,0 +1,236 @@
# Embedding Model Comparison
Comparison of Google Gemini, OpenAI, and Cloudflare Workers AI embedding models to help you choose the right one for your use case.
---
## Quick Comparison Table
| Feature | Gemini (gemini-embedding-001) | OpenAI (text-embedding-3-small) | OpenAI (text-embedding-3-large) | Workers AI (bge-base-en-v1.5) |
|---------|------------------------------|--------------------------------|--------------------------------|-------------------------------|
| **Dimensions** | 128-3072 (flexible) | 1536 (fixed) | 3072 (fixed) | 768 (fixed) |
| **Default Dims** | 3072 | 1536 | 3072 | 768 |
| **Context Window** | 2,048 tokens | 8,191 tokens | 8,191 tokens | 512 tokens |
| **Cost (per 1M tokens)** | Free tier, then $0.025 | $0.020 | $0.130 | Free on Cloudflare |
| **Rate Limit (Free)** | 100 RPM, 30k TPM | 3,000 RPM | 3,000 RPM | Unlimited |
| **Task Types** | 8 types | None | None | None |
| **Matryoshka** | ✅ Yes | ✅ Yes (shortening) | ✅ Yes (shortening) | ❌ No |
| **Best For** | RAG, semantic search | General purpose | High accuracy needed | Edge computing, Cloudflare stack |
---
## Detailed Comparison
### 1. Google Gemini (gemini-embedding-001)
**Strengths**:
- Flexible dimensions (128-3072) using Matryoshka Representation Learning
- 8 task types for optimization (RETRIEVAL_QUERY, RETRIEVAL_DOCUMENT, etc.)
- Free tier with generous limits
- Same API as Gemini text generation (unified ecosystem)
**Weaknesses**:
- Smaller context window (2,048 tokens vs OpenAI's 8,191)
- Newer model (less community knowledge)
**Recommended For**:
- RAG systems (optimized task types)
- Projects already using Gemini API
- Budget-conscious projects (free tier)
**Pricing**:
- Free: 100 RPM, 30k TPM, 1k RPD
- Paid: $0.025 per 1M tokens (Tier 1+)
---
### 2. OpenAI text-embedding-3-small
**Strengths**:
- Larger context window (8,191 tokens)
- Well-documented and widely used
- Good balance of cost and performance
- Can shorten dimensions (Matryoshka)
**Weaknesses**:
- Fixed 1536 dimensions (unless shortened)
- No task type optimization
- Costs from day one (no free tier for embeddings)
**Recommended For**:
- General-purpose semantic search
- Projects with long documents (>2k tokens)
- OpenAI ecosystem integration
**Pricing**:
- $0.020 per 1M tokens
---
### 3. OpenAI text-embedding-3-large
**Strengths**:
- Highest accuracy of OpenAI models
- 3072 dimensions (same as Gemini default)
- Large context window (8,191 tokens)
**Weaknesses**:
- Most expensive ($0.130 per 1M tokens)
- Fixed dimensions
- Overkill for most use cases
**Recommended For**:
- Mission-critical applications requiring maximum accuracy
- Well-funded projects
**Pricing**:
- $0.130 per 1M tokens (6.5x more expensive than text-embedding-3-small)
---
### 4. Cloudflare Workers AI (bge-base-en-v1.5)
**Strengths**:
- **Free** on Cloudflare Workers
- Fast (edge inference)
- Good for English text
- Simple integration with Vectorize
**Weaknesses**:
- Small context window (512 tokens)
- Fixed 768 dimensions
- No task type optimization
- English-only (limited multilingual support)
**Recommended For**:
- Cloudflare-first stacks
- Cost-sensitive projects
- Short documents (<512 tokens)
- Edge inference requirements
**Pricing**:
- Free (included with Cloudflare Workers)
**Example**:
```typescript
const response = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
text: 'Your text here'
});
// Returns: { data: number[] } with 768 dimensions
```
---
## When to Use Which
### Use Gemini Embeddings When:
- ✅ Building RAG systems (task type optimization)
- ✅ Need flexible dimensions (save storage/compute)
- ✅ Already using Gemini API
- ✅ Want free tier for development
### Use OpenAI text-embedding-3-small When:
- ✅ Documents > 2,048 tokens
- ✅ Using OpenAI for generation
- ✅ Need proven, well-documented solution
- ✅ General-purpose semantic search
### Use OpenAI text-embedding-3-large When:
- ✅ Maximum accuracy required
- ✅ Budget allows ($0.130 per 1M tokens)
- ✅ Mission-critical applications
### Use Workers AI (BGE) When:
- ✅ Building on Cloudflare
- ✅ Short documents (<512 tokens)
- ✅ Cost is primary concern (free)
- ✅ English-only content
- ✅ Need edge inference
---
## Dimension Recommendations
| Use Case | Gemini | OpenAI Small | OpenAI Large | Workers AI |
|----------|--------|--------------|--------------|------------|
| **General RAG** | 768 | 1536 | 3072 | 768 |
| **Storage-limited** | 128-512 | 512 (shortened) | 1024 (shortened) | 768 (fixed) |
| **Maximum accuracy** | 3072 | 1536 (fixed) | 3072 | 768 (fixed) |
---
## Migration Guide
### From OpenAI to Gemini
```typescript
// Before (OpenAI)
const response = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: 'Your text here'
});
const embedding = response.data[0].embedding; // 1536 dims
// After (Gemini)
const response = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: 'Your text here',
config: {
taskType: 'SEMANTIC_SIMILARITY',
outputDimensionality: 768 // or 1536 to match OpenAI
}
});
const embedding = response.embedding.values; // 768 dims
```
**CRITICAL**: If migrating, you must regenerate all embeddings. Embeddings from different models are not comparable.
---
## Performance Benchmarks
Based on MTEB (Massive Text Embedding Benchmark):
| Model | Retrieval Score | Clustering Score | Overall Score |
|-------|----------------|------------------|---------------|
| OpenAI text-embedding-3-large | **64.6** | 49.0 | **54.9** |
| OpenAI text-embedding-3-small | 62.3 | **49.0** | 54.0 |
| Gemini gemini-embedding-001 | ~60.0* | ~47.0* | ~52.0* |
| Workers AI bge-base-en-v1.5 | 53.2 | 42.0 | 48.0 |
*Estimated based on available benchmarks
**Source**: https://github.com/embeddings-benchmark/mteb
---
## Summary
**Best Overall**: Gemini gemini-embedding-001
- Flexible dimensions
- Task type optimization
- Free tier
- Good performance
**Best for Accuracy**: OpenAI text-embedding-3-large
- Highest MTEB scores
- Large context window
- Most expensive
**Best for Budget**: Cloudflare Workers AI (BGE)
- Completely free
- Edge inference
- Limited context window
**Best for Long Documents**: OpenAI models
- 8,191 token context
- vs 2,048 (Gemini) or 512 (Workers AI)
---
## Official Documentation
- **Gemini**: https://ai.google.dev/gemini-api/docs/embeddings
- **OpenAI**: https://platform.openai.com/docs/guides/embeddings
- **Workers AI**: https://developers.cloudflare.com/workers-ai/models/embedding/
- **MTEB Leaderboard**: https://github.com/embeddings-benchmark/mteb

references/rag-patterns.md

@@ -0,0 +1,483 @@
# RAG Implementation Patterns
Complete guide to Retrieval Augmented Generation patterns using Gemini embeddings and Cloudflare Vectorize.
---
## RAG Workflow Overview
```
┌─────────────────────────────────────────────────────────┐
│ DOCUMENT INGESTION (Offline)                            │
└─────────────────────────────────────────────────────────┘
Documents
    ↓
Chunking (500 words)
    ↓
Generate Embeddings (RETRIEVAL_DOCUMENT)
    ↓
Store in Vectorize + Metadata

┌─────────────────────────────────────────────────────────┐
│ QUERY PROCESSING (Runtime)                              │
└─────────────────────────────────────────────────────────┘
User Query
    ↓
Generate Embedding (RETRIEVAL_QUERY)
    ↓
Vector Search (top-K)
    ↓
Retrieve Documents
    ↓
Generate Response (LLM + Context)
    ↓
Stream to User
```
---
## Pattern 1: Basic RAG
**Use when**: Simple Q&A over a knowledge base
```typescript
async function basicRAG(query: string, env: Env): Promise<string> {
// 1. Embed query
const queryEmbedding = await generateEmbedding(query, env.GEMINI_API_KEY, 'RETRIEVAL_QUERY');
// 2. Search Vectorize
const results = await env.VECTORIZE.query(queryEmbedding, { topK: 3 });
// 3. Concatenate context
const context = results.matches
.map(m => m.metadata?.text)
.join('\n\n');
// 4. Generate response
const response = await generateResponse(context, query, env.GEMINI_API_KEY);
return response;
}
```
---
## Pattern 2: Chunked RAG (Recommended)
**Use when**: Documents are longer than 2,048 tokens
### Chunking Strategies
```typescript
// Strategy A: Fixed-size chunks with overlap
function chunkWithOverlap(text: string, size = 500, overlap = 50): string[] {
const words = text.split(/\s+/);
const chunks: string[] = [];
for (let i = 0; i < words.length; i += size - overlap) {
chunks.push(words.slice(i, i + size).join(' '));
}
return chunks;
}
// Strategy B: Sentence-based chunks
function chunkBySentences(text: string, maxSentences = 10): string[] {
const sentences = text.match(/[^.!?]+[.!?]+/g) || [];
const chunks: string[] = [];
for (let i = 0; i < sentences.length; i += maxSentences) {
chunks.push(sentences.slice(i, i + maxSentences).join(' '));
}
return chunks;
}
// Strategy C: Semantic chunks (preserves paragraphs)
function chunkByParagraphs(text: string): string[] {
return text.split(/\n\n+/).filter(p => p.trim().length > 50);
}
```
### Implementation
```typescript
async function ingestWithChunking(doc: Document, env: Env) {
const chunks = chunkWithOverlap(doc.text, 500, 50);
const vectors = [];
for (let i = 0; i < chunks.length; i++) {
const embedding = await generateEmbedding(chunks[i], env.GEMINI_API_KEY, 'RETRIEVAL_DOCUMENT');
vectors.push({
id: `${doc.id}-chunk-${i}`,
values: embedding,
metadata: {
documentId: doc.id,
chunkIndex: i,
text: chunks[i],
title: doc.title
}
});
}
await env.VECTORIZE.insert(vectors);
}
```
---
## Pattern 3: Hybrid Search (Keyword + Semantic)
**Use when**: You need both exact keyword matches and semantic understanding
```typescript
async function hybridSearch(query: string, env: Env) {
// 1. Vector search
const queryEmbedding = await generateEmbedding(query, env.GEMINI_API_KEY, 'RETRIEVAL_QUERY');
const vectorResults = await env.VECTORIZE.query(queryEmbedding, { topK: 10 });
// 2. Keyword search (using metadata or D1)
const keywordResults = await env.D1.prepare(
'SELECT * FROM documents WHERE text LIKE ? ORDER BY relevance DESC LIMIT 10'
).bind(`%${query}%`).all();
// 3. Merge and re-rank
const combined = mergeResults(vectorResults.matches, keywordResults.results);
// 4. Generate response from top results
const context = combined.slice(0, 5).map(r => r.text).join('\n\n');
return await generateResponse(context, query, env.GEMINI_API_KEY);
}
```
---
## Pattern 4: Filtered RAG
**Use when**: Need to filter by category, date, or metadata
```typescript
async function filteredRAG(query: string, filters: { category?: string; minDate?: number }, env: Env) {
// 1. Vector search
const queryEmbedding = await generateEmbedding(query, env.GEMINI_API_KEY, 'RETRIEVAL_QUERY');
const results = await env.VECTORIZE.query(queryEmbedding, { topK: 20 }); // Fetch more
// 2. Filter in application layer (until Vectorize supports metadata filtering)
const filtered = results.matches.filter(match => {
if (filters.category && match.metadata?.category !== filters.category) return false;
if (filters.minDate && match.metadata?.timestamp < filters.minDate) return false;
return true;
});
// 3. Take top 5 after filtering
const topResults = filtered.slice(0, 5);
// 4. Generate response
const context = topResults.map(r => r.metadata?.text).join('\n\n');
return await generateResponse(context, query, env.GEMINI_API_KEY);
}
```
---
## Pattern 5: Streaming RAG
**Use when**: Real-time responses with immediate feedback
```typescript
async function streamingRAG(query: string, env: Env): Promise<ReadableStream> {
// 1. Embed query and search
const queryEmbedding = await generateEmbedding(query, env.GEMINI_API_KEY, 'RETRIEVAL_QUERY');
const results = await env.VECTORIZE.query(queryEmbedding, { topK: 3 });
const context = results.matches.map(m => m.metadata?.text).join('\n\n');
// 2. Stream response from Gemini
const response = await fetch(
'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:streamGenerateContent',
{
method: 'POST',
headers: {
'x-goog-api-key': env.GEMINI_API_KEY,
'Content-Type': 'application/json'
},
body: JSON.stringify({
contents: [{
parts: [{ text: `Context:\n${context}\n\nQuestion: ${query}\n\nAnswer:` }]
}]
})
}
);
return response.body!;
}
```
---
## Pattern 6: Multi-Query RAG
**Use when**: Query might be ambiguous or multi-faceted
```typescript
async function multiQueryRAG(query: string, env: Env) {
// 1. Generate multiple query variations
const queryVariations = await generateQueryVariations(query, env.GEMINI_API_KEY);
// Returns: ["original query", "rephrased version 1", "rephrased version 2"]
// 2. Search with each variation
const allResults = await Promise.all(
queryVariations.map(async q => {
const embedding = await generateEmbedding(q, env.GEMINI_API_KEY, 'RETRIEVAL_QUERY');
return await env.VECTORIZE.query(embedding, { topK: 3 });
})
);
// 3. Merge and deduplicate
const uniqueResults = deduplicateById(allResults.flatMap(r => r.matches));
// 4. Generate response
const context = uniqueResults.slice(0, 5).map(r => r.metadata?.text).join('\n\n');
return await generateResponse(context, query, env.GEMINI_API_KEY);
}
```
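A minimal `deduplicateById`, keeping the highest-scoring match per vector ID:
```typescript
// Keep only the best-scoring match for each vector ID across all query variations
function deduplicateById(matches: VectorizeMatch[]): VectorizeMatch[] {
  const best = new Map<string, VectorizeMatch>();
  for (const match of matches) {
    const existing = best.get(match.id);
    if (!existing || match.score > existing.score) best.set(match.id, match);
  }
  return [...best.values()].sort((a, b) => b.score - a.score);
}
```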
---
## Pattern 7: Conversational RAG
**Use when**: Multi-turn conversations with context
```typescript
interface ConversationHistory {
role: 'user' | 'assistant';
content: string;
}
async function conversationalRAG(
query: string,
history: ConversationHistory[],
env: Env
) {
// 1. Create contextualized query from history
const contextualizedQuery = await reformulateQuery(query, history, env.GEMINI_API_KEY);
// 2. Search with contextualized query
const embedding = await generateEmbedding(contextualizedQuery, env.GEMINI_API_KEY, 'RETRIEVAL_QUERY');
const results = await env.VECTORIZE.query(embedding, { topK: 3 });
const retrievedContext = results.matches.map(m => m.metadata?.text).join('\n\n');
// 3. Generate response with conversation history
const prompt = `
Conversation history:
${history.map(h => `${h.role}: ${h.content}`).join('\n')}
Retrieved context:
${retrievedContext}
User: ${query}
Assistant:`;
return await generateResponse(prompt, query, env.GEMINI_API_KEY);
}
```
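`reformulateQuery` can itself be a small Gemini call that rewrites the follow-up into a standalone question. A sketch using the same generateContent endpoint as the streaming pattern (the prompt wording is illustrative):
```typescript
// Rewrite a follow-up question as a standalone search query using the history
async function reformulateQuery(
  query: string,
  history: ConversationHistory[],
  apiKey: string
): Promise<string> {
  const prompt =
    'Rewrite the last user question as a standalone search query.\n\n' +
    history.map(h => `${h.role}: ${h.content}`).join('\n') +
    `\nuser: ${query}\n\nStandalone query:`;
  const response = await fetch(
    'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent',
    {
      method: 'POST',
      headers: { 'x-goog-api-key': apiKey, 'Content-Type': 'application/json' },
      body: JSON.stringify({ contents: [{ parts: [{ text: prompt }] }] })
    }
  );
  const data = await response.json() as {
    candidates?: { content: { parts: { text: string }[] } }[];
  };
  return data.candidates?.[0]?.content?.parts?.[0]?.text?.trim() || query;
}
```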
---
## Pattern 8: Citation RAG
**Use when**: Need to cite sources in responses
```typescript
async function citationRAG(query: string, env: Env) {
const queryEmbedding = await generateEmbedding(query, env.GEMINI_API_KEY, 'RETRIEVAL_QUERY');
const results = await env.VECTORIZE.query(queryEmbedding, { topK: 5, returnMetadata: true });
// Build context with citations
const contextWithCitations = results.matches.map((match, i) =>
`[${i + 1}] ${match.metadata?.text}\nSource: ${match.metadata?.url || match.id}`
).join('\n\n');
const prompt = `Answer the question using the provided sources. Include citations [1], [2], etc. in your answer.
Sources:
${contextWithCitations}
Question: ${query}
Answer (with citations):`;
const response = await generateResponse(prompt, query, env.GEMINI_API_KEY);
return {
answer: response,
sources: results.matches.map((m, i) => ({
citation: i + 1,
text: m.metadata?.text,
url: m.metadata?.url,
score: m.score
}))
};
}
```
---
## Best Practices
### 1. Chunk Size Optimization
```typescript
// Test different chunk sizes for your use case
const chunkSizes = [200, 500, 1000, 1500];
for (const size of chunkSizes) {
const accuracy = await testRetrievalAccuracy(size);
console.log(`Chunk size ${size}: ${accuracy}% accuracy`);
}
// Recommendation: 500-1000 words with 10% overlap
```
### 2. Context Window Management
```typescript
// Don't exceed LLM context window
function truncateContext(chunks: string[], maxTokens = 4000): string {
let context = '';
let estimatedTokens = 0;
for (const chunk of chunks) {
const chunkTokens = chunk.split(/\s+/).length * 1.3; // Rough estimate
if (estimatedTokens + chunkTokens > maxTokens) break;
context += chunk + '\n\n';
estimatedTokens += chunkTokens;
}
return context;
}
```
### 3. Re-ranking
```typescript
// Re-rank results after retrieval
function rerank(results: VectorizeMatch[], query: string): VectorizeMatch[] {
return results
.map(result => ({
...result,
rerankScore: calculateRelevance(result.metadata?.text, query)
}))
.sort((a, b) => b.rerankScore - a.rerankScore);
}
```
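`calculateRelevance` can be anything from a cross-encoder call to a cheap lexical score; a naive keyword-overlap placeholder looks like this:
```typescript
// Naive lexical relevance: fraction of query terms found in the text.
// A real re-ranker (e.g. a cross-encoder) will do much better.
function calculateRelevance(text: string | undefined, query: string): number {
  if (!text) return 0;
  const terms = query.toLowerCase().split(/\s+/).filter(t => t.length > 2);
  if (terms.length === 0) return 0;
  const lowered = text.toLowerCase();
  return terms.filter(term => lowered.includes(term)).length / terms.length;
}
```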
### 4. Fallback Strategies
```typescript
async function ragWithFallback(query: string, env: Env) {
const results = await searchVectorize(query, env);
if (results.matches.length === 0 || results.matches[0].score < 0.7) {
// Fallback: Use LLM without RAG
return await generateResponse('', query, env.GEMINI_API_KEY);
}
// Normal RAG flow
const context = results.matches.map(m => m.metadata?.text).join('\n\n');
return await generateResponse(context, query, env.GEMINI_API_KEY);
}
```
---
## Performance Optimization
### 1. Caching
```typescript
// Cache embeddings
const embeddingCache = new Map<string, number[]>();
async function getCachedEmbedding(text: string, apiKey: string) {
const key = hashText(text);
if (embeddingCache.has(key)) {
return embeddingCache.get(key)!;
}
const embedding = await generateEmbedding(text, apiKey, 'RETRIEVAL_QUERY');
embeddingCache.set(key, embedding);
return embedding;
}
```
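`hashText` only needs to produce a stable cache key, so a small synchronous hash is enough (swap in `crypto.subtle.digest` for SHA-256 if you prefer, noting that call is async):
```typescript
// FNV-1a string hash: fast, synchronous, fine for cache keys (not for security)
function hashText(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16);
}
```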
### 2. Batch Processing
```typescript
// Ingest documents in parallel
async function batchIngest(documents: Document[], env: Env, concurrency = 5) {
for (let i = 0; i < documents.length; i += concurrency) {
const batch = documents.slice(i, i + concurrency);
await Promise.all(
batch.map(doc => ingestDocument(doc, env))
);
}
}
```
---
## Common Pitfalls
### ❌ Don't: Use same task type for queries and documents
```typescript
// Wrong
const embedding = await generateEmbedding(query, apiKey, 'RETRIEVAL_DOCUMENT');
```
### ✅ Do: Use correct task types
```typescript
// Correct
const queryEmbedding = await generateEmbedding(query, apiKey, 'RETRIEVAL_QUERY');
const docEmbedding = await generateEmbedding(doc, apiKey, 'RETRIEVAL_DOCUMENT');
```
### ❌ Don't: Return too many or too few results
```typescript
// Too few (might miss relevant info)
const results = await env.VECTORIZE.query(embedding, { topK: 1 });
// Too many (noise, cost)
const results = await env.VECTORIZE.query(embedding, { topK: 50 });
```
### ✅ Do: Find optimal topK for your use case
```typescript
// Test different topK values
const topK = 5; // Good default for most use cases
const results = await env.VECTORIZE.query(embedding, { topK });
```
---
## Complete Example
See `templates/rag-with-vectorize.ts` for a production-ready implementation combining these patterns.
---
## Official Documentation
- **Gemini Embeddings**: https://ai.google.dev/gemini-api/docs/embeddings
- **Vectorize**: https://developers.cloudflare.com/vectorize/
- **RAG Best Practices**: https://ai.google.dev/gemini-api/docs/document-processing

references/top-errors.md

@@ -0,0 +1,460 @@
# Top 8 Embedding Errors (And How to Fix Them)
This document lists the 8 most common errors when working with Gemini embeddings, their root causes, and proven solutions.
---
## Error 1: Dimension Mismatch
### Error Message
```
Error: Vector dimensions do not match. Expected 768, got 3072
```
### Why It Happens
- Generated embedding with default dimensions (3072) but Vectorize index expects 768
- Mixed embeddings from different dimension settings
### Root Cause
Not specifying `outputDimensionality` parameter when generating embeddings.
### Prevention
```typescript
// ❌ BAD: No outputDimensionality (defaults to 3072)
const embedding = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: text
});
// ✅ GOOD: Match Vectorize index dimensions
const embedding = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: text,
config: { outputDimensionality: 768 } // ← Match your index
});
```
### Fix
1. **Option A**: Regenerate embeddings with correct dimensions
2. **Option B**: Recreate Vectorize index with 3072 dimensions
```bash
# Recreate index with correct dimensions
npx wrangler vectorize create my-index --dimensions 768 --metric cosine
```
**Sources**:
- https://ai.google.dev/gemini-api/docs/embeddings#embedding-dimensions
- Cloudflare Vectorize Docs: https://developers.cloudflare.com/vectorize/
---
## Error 2: Batch Size Limit Exceeded
### Error Message
```
Error: Request contains too many texts. Maximum: 100
```
### Why It Happens
- Tried to embed more texts than API allows in single request
- Different limits for single vs batch endpoints
### Root Cause
Gemini API limits the number of texts per batch request.
### Prevention
```typescript
// ❌ BAD: Trying to embed 500 texts at once
const embeddings = await ai.models.embedContent({
model: 'gemini-embedding-001',
contents: largeArray, // 500 texts
config: { taskType: 'RETRIEVAL_DOCUMENT' }
});
// ✅ GOOD: Chunk into batches
async function batchEmbed(texts: string[], batchSize = 100) {
const allEmbeddings: number[][] = [];
for (let i = 0; i < texts.length; i += batchSize) {
const batch = texts.slice(i, i + batchSize);
const response = await ai.models.embedContent({
model: 'gemini-embedding-001',
contents: batch,
config: { taskType: 'RETRIEVAL_DOCUMENT', outputDimensionality: 768 }
});
allEmbeddings.push(...response.embeddings.map(e => e.values));
// Rate limiting delay
if (i + batchSize < texts.length) {
await new Promise(resolve => setTimeout(resolve, 1000));
}
}
return allEmbeddings;
}
```
**Sources**:
- Gemini API Limits: https://ai.google.dev/gemini-api/docs/rate-limits
---
## Error 3: Rate Limiting (429 Too Many Requests)
### Error Message
```
Error: 429 Too Many Requests - Rate limit exceeded
```
### Why It Happens
- Exceeded 100 requests per minute (free tier)
- Exceeded tokens per minute limit
- No exponential backoff implemented
### Root Cause
Free tier rate limits: 100 RPM, 30k TPM, 1k RPD
### Prevention
```typescript
// ❌ BAD: No rate limiting
for (const text of texts) {
await ai.models.embedContent({ /* ... */ }); // Will hit 429 after 100 requests
}
// ✅ GOOD: Exponential backoff
async function embedWithRetry(text: string, maxRetries = 3) {
for (let attempt = 0; attempt < maxRetries; attempt++) {
try {
return await ai.models.embedContent({
model: 'gemini-embedding-001',
content: text,
config: { taskType: 'SEMANTIC_SIMILARITY', outputDimensionality: 768 }
});
} catch (error: any) {
if (error.status === 429 && attempt < maxRetries - 1) {
const delay = Math.pow(2, attempt) * 1000; // 1s, 2s, 4s
console.log(`Rate limit hit. Retrying in ${delay / 1000}s...`);
await new Promise(resolve => setTimeout(resolve, delay));
continue;
}
throw error;
}
}
}
```
**Rate Limits**:
| Tier | RPM | TPM | RPD |
|------|-----|-----|-----|
| Free | 100 | 30,000 | 1,000 |
| Tier 1 | 3,000 | 1,000,000 | - |
**Sources**:
- https://ai.google.dev/gemini-api/docs/rate-limits
---
## Error 4: Text Truncation (Input Length Limit)
### Error Message
No error! Text is **silently truncated** at 2,048 tokens.
### Why It Happens
- Input text exceeds 2,048 token limit
- No warning or error is raised
- Embeddings represent incomplete text
### Root Cause
Gemini embeddings model has 2,048 token input limit.
### Prevention
```typescript
// ❌ BAD: Long text (silently truncated)
const longText = "...".repeat(10000); // Very long
const embedding = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: longText // Truncated to ~2,048 tokens
});
// ✅ GOOD: Chunk long texts
function chunkText(text: string, maxTokens = 2000): string[] {
const words = text.split(/\s+/);
const chunks: string[] = [];
let currentChunk: string[] = [];
for (const word of words) {
currentChunk.push(word);
// Rough estimate: 1 token ≈ 0.75 words, so tokens ≈ words / 0.75
if (currentChunk.length / 0.75 >= maxTokens) {
chunks.push(currentChunk.join(' '));
currentChunk = [];
}
}
if (currentChunk.length > 0) {
chunks.push(currentChunk.join(' '));
}
return chunks;
}
const chunks = chunkText(longText, 2000);
const embeddings = await ai.models.embedContent({
model: 'gemini-embedding-001',
contents: chunks,
config: { taskType: 'RETRIEVAL_DOCUMENT', outputDimensionality: 768 }
});
```
**Sources**:
- https://ai.google.dev/gemini-api/docs/models/gemini#gemini-embedding-001
---
## Error 5: Cosine Similarity Calculation Errors
### Error Message
```
Error: Similarity values out of range (-1.5 to 1.2)
```
### Why It Happens
- Incorrect formula (using dot product instead of cosine similarity)
- Not normalizing magnitudes
- Division by zero for zero vectors
### Root Cause
Improper implementation of cosine similarity formula.
### Prevention
```typescript
// ❌ BAD: Just dot product (not cosine similarity)
function badSimilarity(a: number[], b: number[]): number {
let sum = 0;
for (let i = 0; i < a.length; i++) {
sum += a[i] * b[i];
}
return sum; // Wrong! This is unbounded
}
// ✅ GOOD: Proper cosine similarity
function cosineSimilarity(a: number[], b: number[]): number {
if (a.length !== b.length) {
throw new Error('Vector dimensions must match');
}
let dotProduct = 0;
let magnitudeA = 0;
let magnitudeB = 0;
for (let i = 0; i < a.length; i++) {
dotProduct += a[i] * b[i];
magnitudeA += a[i] * a[i];
magnitudeB += b[i] * b[i];
}
if (magnitudeA === 0 || magnitudeB === 0) {
return 0; // Handle zero vectors
}
return dotProduct / (Math.sqrt(magnitudeA) * Math.sqrt(magnitudeB));
}
```
**Formula**:
```
cosine_similarity(A, B) = (A · B) / (||A|| × ||B||)
```
Where:
- `A · B` = dot product
- `||A||` = magnitude of vector A = √(a₁² + a₂² + ... + aₙ²)
**Result Range**: Always between -1 and 1
- 1 = identical direction
- 0 = perpendicular
- -1 = opposite direction
**Sources**:
- https://en.wikipedia.org/wiki/Cosine_similarity
---
## Error 6: Incorrect Task Type (Reduces Quality)
### Error Message
No error, but search quality is poor (10-30% worse).
### Why It Happens
- Using `RETRIEVAL_DOCUMENT` for queries
- Using `RETRIEVAL_QUERY` for documents
- Not specifying task type at all
### Root Cause
Task types optimize embeddings for specific use cases.
### Prevention
```typescript
// ❌ BAD: Wrong task type for RAG
const queryEmbedding = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: userQuery,
config: { taskType: 'RETRIEVAL_DOCUMENT' } // ← Wrong! Should be RETRIEVAL_QUERY
});
// ✅ GOOD: Correct task types
// For user queries
const queryEmbedding = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: userQuery,
config: { taskType: 'RETRIEVAL_QUERY', outputDimensionality: 768 }
});
// For documents to index
const docEmbedding = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: documentText,
config: { taskType: 'RETRIEVAL_DOCUMENT', outputDimensionality: 768 }
});
```
**Task Types Cheat Sheet**:
| Task Type | Use For | Example |
|-----------|---------|---------|
| `RETRIEVAL_QUERY` | User queries | "What is RAG?" |
| `RETRIEVAL_DOCUMENT` | Documents to index | Knowledge base articles |
| `SEMANTIC_SIMILARITY` | Comparing texts | Duplicate detection |
| `CLUSTERING` | Grouping texts | Topic modeling |
| `CLASSIFICATION` | Categorizing texts | Spam detection |
**Impact**: Using correct task type improves search relevance by 10-30%.
**Sources**:
- https://ai.google.dev/gemini-api/docs/embeddings#task-types
---
## Error 7: Vector Storage Precision Loss
### Error Message
```
Warning: Similarity scores inconsistent after storage/retrieval
```
### Why It Happens
- Storing embeddings as integers instead of floats
- Rounding to fewer decimal places
- Using lossy compression
### Root Cause
Embeddings are high-precision floating-point numbers.
### Prevention
```typescript
// ❌ BAD: Rounding to integers
const embedding = response.embedding.values;
const rounded = embedding.map(v => Math.round(v)); // Precision loss!
await db.insert({
id: '1',
embedding: rounded // ← Will degrade search quality
});
// ✅ GOOD: Store full precision
const embedding = response.embedding.values; // Keep as-is
await db.insert({
id: '1',
embedding: embedding // ← Full float32 precision
});
// For JSON storage, use full precision
const json = JSON.stringify({
id: '1',
embedding: embedding // JavaScript numbers are float64
});
```
**Storage Recommendations**:
- **Vectorize**: Handles float32 automatically ✅
- **D1/SQLite**: Use BLOB for binary float32 array (see the sketch below)
- **KV**: Store as JSON (float64 precision)
- **R2**: Store as binary float32 array
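A minimal sketch of the D1 BLOB approach, packing the vector into a `Float32Array` on write and restoring it on read (the `embeddings` table and column names are illustrative):
```typescript
// Assumed schema: CREATE TABLE embeddings (id TEXT PRIMARY KEY, vector BLOB)
async function storeEmbedding(id: string, embedding: number[], db: D1Database) {
  const blob = new Uint8Array(new Float32Array(embedding).buffer); // full float32 precision
  await db.prepare('INSERT OR REPLACE INTO embeddings (id, vector) VALUES (?, ?)')
    .bind(id, blob)
    .run();
}
async function loadEmbedding(id: string, db: D1Database): Promise<number[] | null> {
  const row = await db.prepare('SELECT vector FROM embeddings WHERE id = ?')
    .bind(id)
    .first<{ vector: ArrayBuffer }>();
  return row ? Array.from(new Float32Array(row.vector)) : null;
}
```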
**Sources**:
- Cloudflare Vectorize: https://developers.cloudflare.com/vectorize/
---
## Error 8: Model Version Confusion
### Error Message
```
Error: Model 'gemini-embedding-exp-03-07' is deprecated
```
### Why It Happens
- Using experimental or deprecated model
- Mixing embeddings from different model versions
- Not keeping up with model updates
### Root Cause
Gemini has stable and experimental embedding models.
### Prevention
```typescript
// ❌ BAD: Using experimental/deprecated model
const embedding = await ai.models.embedContent({
model: 'gemini-embedding-exp-03-07', // Deprecated October 2025
content: text
});
// ✅ GOOD: Use stable model
const embedding = await ai.models.embedContent({
model: 'gemini-embedding-001', // Stable production model
content: text,
config: {
taskType: 'SEMANTIC_SIMILARITY',
outputDimensionality: 768
}
});
```
**Model Status**:
| Model | Status | Recommendation |
|-------|--------|----------------|
| `gemini-embedding-001` | ✅ Stable | Use this |
| `gemini-embedding-exp-03-07` | ❌ Deprecated (Oct 2025) | Migrate to gemini-embedding-001 |
**CRITICAL**: Never mix embeddings from different models. They use different vector spaces and are not comparable.
**Sources**:
- https://ai.google.dev/gemini-api/docs/models/gemini#text-embeddings
---
## Summary Checklist
Before deploying to production, verify:
- [ ] `outputDimensionality` matches Vectorize index dimensions
- [ ] Batch size ≤ API limits (chunk large datasets)
- [ ] Rate limiting implemented with exponential backoff
- [ ] Long texts are chunked (≤ 2,048 tokens)
- [ ] Cosine similarity formula is correct
- [ ] Correct task types used (RETRIEVAL_QUERY vs RETRIEVAL_DOCUMENT)
- [ ] Embeddings stored with full precision (float32)
- [ ] Using stable model (`gemini-embedding-001`)
**Following this checklist avoids all eight errors documented above.**
---
## Additional Resources
- **Official Docs**: https://ai.google.dev/gemini-api/docs/embeddings
- **Rate Limits**: https://ai.google.dev/gemini-api/docs/rate-limits
- **Vectorize Docs**: https://developers.cloudflare.com/vectorize/
- **Model Specs**: https://ai.google.dev/gemini-api/docs/models/gemini#gemini-embedding-001

@@ -0,0 +1,469 @@
# Cloudflare Vectorize Integration
Complete guide for using Gemini embeddings with Cloudflare Vectorize.
---
## Quick Start
### 1. Create Vectorize Index
```bash
# Create index with 768 dimensions (recommended for Gemini)
npx wrangler vectorize create gemini-embeddings --dimensions 768 --metric cosine
# Alternative: 3072 dimensions (Gemini default, more accurate but larger)
npx wrangler vectorize create gemini-embeddings-large --dimensions 3072 --metric cosine
```
### 2. Bind to Worker
Add to `wrangler.jsonc`:
```jsonc
{
"name": "my-rag-worker",
"main": "src/index.ts",
"compatibility_date": "2025-10-25",
"vectorize": {
"bindings": [
{
"binding": "VECTORIZE",
"index_name": "gemini-embeddings"
}
]
}
}
```
### 3. Generate and Store Embeddings
```typescript
// Generate embedding
const response = await fetch(
'https://generativelanguage.googleapis.com/v1beta/models/gemini-embedding-001:embedContent',
{
method: 'POST',
headers: {
'x-goog-api-key': env.GEMINI_API_KEY,
'Content-Type': 'application/json'
},
body: JSON.stringify({
content: { parts: [{ text: 'Your document text' }] },
taskType: 'RETRIEVAL_DOCUMENT',
outputDimensionality: 768 // MUST match index dimensions
})
}
);
const data = await response.json();
const embedding = data.embedding.values;
// Insert into Vectorize
await env.VECTORIZE.insert([{
id: 'doc-1',
values: embedding,
metadata: { text: 'Your document text', source: 'manual' }
}]);
```
---
## Dimension Configuration
**CRITICAL**: Embedding dimensions MUST match Vectorize index dimensions.
| Gemini Dimensions | Storage (per vector) | Recommended For |
|-------------------|---------------------|-----------------|
| 768 | 3 KB | Most use cases, cost-effective |
| 1536 | 6 KB | Balance accuracy/storage |
| 3072 | 12 KB | Maximum accuracy |
**Create index to match your embeddings**:
```bash
# For 768-dim embeddings
npx wrangler vectorize create my-index --dimensions 768 --metric cosine
# For 1536-dim embeddings
npx wrangler vectorize create my-index --dimensions 1536 --metric cosine
# For 3072-dim embeddings (Gemini default)
npx wrangler vectorize create my-index --dimensions 3072 --metric cosine
```
---
## Metric Selection
Vectorize supports 3 distance metrics:
### Cosine (Recommended)
```bash
npx wrangler vectorize create my-index --dimensions 768 --metric cosine
```
**When to use**:
- ✅ Semantic search (most common)
- ✅ Document similarity
- ✅ RAG systems
**Range**: -1 to 1; typical text-embedding scores fall between 0 (unrelated) and 1 (identical)
### Euclidean
```bash
npx wrangler vectorize create my-index --dimensions 768 --metric euclidean
```
**When to use**:
- ✅ Absolute distance matters
- ✅ Magnitude is important
**Range**: 0 (identical) to ∞ (very different)
### Dot Product
```bash
npx wrangler vectorize create my-index --dimensions 768 --metric dot-product
```
**When to use**:
- ✅ Pre-normalized vectors
- ✅ Performance optimization
**Range**: -1 to 1 (for normalized vectors)
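If you do choose dot-product, normalize vectors to unit length before inserting and querying; a minimal helper:
```typescript
// Normalize a vector to unit length so dot product behaves like cosine similarity
function normalize(vector: number[]): number[] {
  const magnitude = Math.sqrt(vector.reduce((sum, v) => sum + v * v, 0));
  return magnitude === 0 ? vector : vector.map(v => v / magnitude);
}
// Use the normalized values both at insert time and at query time
await env.VECTORIZE.insert([{ id: 'doc-1', values: normalize(embedding), metadata: { text } }]);
```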
**Recommendation**: Use **cosine** for Gemini embeddings (most common and intuitive).
---
## Insert Patterns
### Single Insert
```typescript
await env.VECTORIZE.insert([{
id: 'doc-1',
values: embedding,
metadata: {
text: 'Document content',
timestamp: Date.now(),
category: 'documentation'
}
}]);
```
### Batch Insert
```typescript
const vectors = documents.map((doc, i) => ({
id: `doc-${i}`,
values: doc.embedding,
metadata: { text: doc.text }
}));
// Insert up to 100 vectors at once
await env.VECTORIZE.insert(vectors);
```
### Upsert (Update or Insert)
```typescript
// insert() will not overwrite an existing ID; use upsert() to update in place
await env.VECTORIZE.upsert([{
id: 'doc-1', // Existing ID
values: newEmbedding,
metadata: { text: 'Updated content' }
}]);
```
---
## Query Patterns
### Basic Query
```typescript
const results = await env.VECTORIZE.query(queryEmbedding, {
topK: 5
});
console.log(results.matches);
// [{ id: 'doc-1', score: 0.95 }, ...]
```
### Query with Metadata
```typescript
const results = await env.VECTORIZE.query(queryEmbedding, {
topK: 5,
returnMetadata: true
});
results.matches.forEach(match => {
console.log(match.id); // 'doc-1'
console.log(match.score); // 0.95
console.log(match.metadata.text); // 'Document content'
});
```
### Query with Metadata Filtering (Future)
```typescript
// Coming soon: Filter by metadata
const results = await env.VECTORIZE.query(queryEmbedding, {
topK: 5,
filter: { category: 'documentation' }
});
```
---
## Metadata Best Practices
### What to Store
```typescript
await env.VECTORIZE.insert([{
id: 'doc-1',
values: embedding,
metadata: {
// ✅ Store these
text: 'The actual document content', // For retrieval
title: 'Document title',
url: 'https://example.com/doc',
timestamp: Date.now(),
category: 'product',
// ❌ Don't store these
embedding: embedding, // Already stored as values
largeObject: { /* ... */ } // Keep metadata small
}
}]);
```
### Metadata Limits
- **Max size**: ~1 KB per vector
- **Best practice**: Store only what you need for retrieval/display
- **For large data**: Store minimal metadata, fetch full data from D1/KV using ID (see the sketch below)
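A sketch of that split: keep only an ID and title in Vectorize, then pull the full record from KV after the search (the `DOCS` KV binding is illustrative):
```typescript
// Keep Vectorize metadata small; fetch the full document from KV by vector ID.
// Assumes full records were stored earlier with env.DOCS.put(id, JSON.stringify(doc)).
const results = await env.VECTORIZE.query(queryEmbedding, { topK: 5, returnMetadata: true });
const documents = await Promise.all(
  results.matches.map(async match => ({
    id: match.id,
    score: match.score,
    title: match.metadata?.title,
    document: await env.DOCS.get(match.id, 'json')
  }))
);
```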
---
## Complete RAG Example
```typescript
interface Env {
GEMINI_API_KEY: string;
VECTORIZE: VectorizeIndex;
}
export default {
async fetch(request: Request, env: Env): Promise<Response> {
const url = new URL(request.url);
// Ingest: POST /ingest with { text: "..." }
if (url.pathname === '/ingest' && request.method === 'POST') {
const { text } = await request.json();
// 1. Generate embedding
const embeddingRes = await fetch(
'https://generativelanguage.googleapis.com/v1beta/models/gemini-embedding-001:embedContent',
{
method: 'POST',
headers: {
'x-goog-api-key': env.GEMINI_API_KEY,
'Content-Type': 'application/json'
},
body: JSON.stringify({
content: { parts: [{ text }] },
taskType: 'RETRIEVAL_DOCUMENT',
outputDimensionality: 768
})
}
);
const embeddingData = await embeddingRes.json();
const embedding = embeddingData.embedding.values;
// 2. Store in Vectorize
await env.VECTORIZE.insert([{
id: `doc-${Date.now()}`,
values: embedding,
metadata: { text, timestamp: Date.now() }
}]);
return new Response(JSON.stringify({ success: true }));
}
// Query: POST /query with { query: "..." }
if (url.pathname === '/query' && request.method === 'POST') {
const { query } = await request.json();
// 1. Generate query embedding
const embeddingRes = await fetch(
'https://generativelanguage.googleapis.com/v1beta/models/gemini-embedding-001:embedContent',
{
method: 'POST',
headers: {
'x-goog-api-key': env.GEMINI_API_KEY,
'Content-Type': 'application/json'
},
body: JSON.stringify({
content: { parts: [{ text: query }] },
taskType: 'RETRIEVAL_QUERY',
outputDimensionality: 768
})
}
);
const embeddingData = await embeddingRes.json();
const embedding = embeddingData.embedding.values;
// 2. Search Vectorize
const results = await env.VECTORIZE.query(embedding, {
topK: 5,
returnMetadata: true
});
return new Response(JSON.stringify({
query,
results: results.matches.map(m => ({
id: m.id,
score: m.score,
text: m.metadata?.text
}))
}));
}
return new Response('Not found', { status: 404 });
}
};
```
---
## Index Management
### List Indexes
```bash
npx wrangler vectorize list
```
### Get Index Info
```bash
npx wrangler vectorize get gemini-embeddings
```
### Delete Index
```bash
npx wrangler vectorize delete gemini-embeddings
```
**CRITICAL**: Deleting an index deletes all vectors permanently.
---
## Limitations & Quotas
| Feature | Free Plan | Paid Plans |
|---------|-----------|------------|
| Indexes per account | 100 | 100 |
| Vectors per index | 200,000 | 5,000,000+ |
| Queries per day | 30,000,000 | Unlimited |
| Dimensions | Up to 1536 | Up to 3072 |
**Source**: https://developers.cloudflare.com/vectorize/platform/pricing/
---
## Best Practices
### 1. Choose Dimensions Wisely
```typescript
// ✅ 768 dimensions (recommended)
// - Good accuracy
// - Low storage
// - Fast queries
// ⚠️ 3072 dimensions (if accuracy is critical)
// - Best accuracy
// - 4x storage
// - Slower queries
```
### 2. Use Metadata for Context
```typescript
await env.VECTORIZE.insert([{
id: 'doc-1',
values: embedding,
metadata: {
text: 'Store the actual text here for retrieval',
url: 'https://...',
timestamp: Date.now()
}
}]);
```
### 3. Implement Caching
```typescript
// Cache embeddings in KV
const cached = await env.KV.get(`embedding:${textHash}`);
if (cached) {
return JSON.parse(cached);
}
const embedding = await generateEmbedding(text);
await env.KV.put(`embedding:${textHash}`, JSON.stringify(embedding), {
expirationTtl: 86400 // 24 hours
});
```
### 4. Monitor Usage
```bash
# Check index stats
npx wrangler vectorize get gemini-embeddings
# Shows:
# - Total vectors
# - Dimensions
# - Metric type
```
---
## Troubleshooting
### Dimension Mismatch Error
```
Error: Vector dimensions do not match. Expected 768, got 3072
```
**Solution**: Ensure embedding `outputDimensionality` matches index dimensions.
### No Results Found
**Possible causes**:
1. Index is empty (no vectors inserted)
2. Query embedding is wrong task type (use RETRIEVAL_QUERY)
3. Similarity threshold too high
**Solution**: Check index has vectors, use correct task types.
---
## Official Documentation
- **Vectorize Docs**: https://developers.cloudflare.com/vectorize/
- **Pricing**: https://developers.cloudflare.com/vectorize/platform/pricing/
- **Wrangler CLI**: https://developers.cloudflare.com/workers/wrangler/