# Choosing the Right Embedding Dimensions
Guide to selecting optimal dimensions for your use case with Gemini embeddings.
---
## Quick Decision Table
| Your Priority | Recommended Dimensions | Why |
|--------------|----------------------|-----|
| **Balanced (default)** | **768** | Best accuracy-to-cost ratio |
| **Maximum accuracy** | 3072 | Gemini's full capability |
| **Storage-limited** | 512 or lower | Reduce storage/compute |
| **OpenAI compatibility** | 1536 | Match OpenAI dimensions |
---
## Available Dimensions
Gemini supports **any dimension from 128 to 3072** using Matryoshka Representation Learning.
### Common Choices
| Dimensions | Storage/Vector | Search Speed | Accuracy | Use Case |
|------------|---------------|--------------|----------|----------|
| **768** | ~3 KB | Fast | Good | **Recommended default** |
| 1536 | ~6 KB | Medium | Better | Match OpenAI, large datasets |
| 3072 | ~12 KB | Slower | Best | Maximum accuracy needed |
| 512 | ~2 KB | Very fast | Acceptable | Storage-constrained |
| 256 | ~1 KB | Ultra fast | Lower | Extreme constraints |
---
## Matryoshka Representation Learning
Gemini's flexible dimensions work because of **Matryoshka Representation Learning**: the model learns nested representations in which the leading dimensions carry the core semantic information and each additional block of dimensions adds progressively finer detail. Truncating an embedding to its first N dimensions therefore still yields a usable, smaller embedding.
```
Dimensions 1-256: Core semantic information
Dimensions 257-512: Additional nuance
Dimensions 513-768: Fine-grained details
Dimensions 769-1536: Subtle distinctions
Dimensions 1537-3072: Maximum precision
```
**Key Point**: Lower dimensions aren't "worse"; they're **compressed** versions of the full embedding.
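Because the representations are nested, you can also shrink a full embedding yourself by keeping only its leading values. A minimal sketch, assuming the embedding arrives as a plain `number[]`; note that a truncated vector is generally no longer unit-length, so re-normalize it before cosine comparisons:
```typescript
// Truncate a Matryoshka-style embedding to its leading `dims` values,
// then re-normalize to unit length so cosine similarity stays meaningful.
function truncateEmbedding(fullEmbedding: number[], dims: number): number[] {
  const truncated = fullEmbedding.slice(0, dims);
  const norm = Math.sqrt(truncated.reduce((sum, v) => sum + v * v, 0));
  return truncated.map(v => v / norm);
}

// Example: derive a compact 768-dim vector from a 3072-dim embedding.
// const compact = truncateEmbedding(fullEmbedding, 768);
```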
---
## Storage Impact
### Example: 100,000 Documents
| Dimensions | Storage Required | Monthly Cost (R2)* |
|------------|-----------------|-------------------|
| 256 | ~100 MB | ~$0.002 |
| 512 | ~200 MB | ~$0.003 |
| **768** | **~300 MB** | **~$0.005** |
| 1536 | ~600 MB | ~$0.009 |
| 3072 | ~1.2 GB | ~$0.018 |
\*Assuming 4 bytes per float32 and R2 storage at $0.015/GB/month. At this scale the absolute cost is negligible; what matters is the 4x spread, which scales linearly with vector count.
**For 1M vectors**:
- 768 dims: ~3 GB storage
- 3072 dims: ~12 GB storage (4x more expensive)
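These figures are straightforward arithmetic: raw bytes ≈ vectors × dimensions × 4 (float32), before any index overhead or metadata. A quick sketch:
```typescript
// Back-of-the-envelope raw vector storage: float32 values only,
// excluding index structures and metadata.
function storageBytes(vectorCount: number, dimensions: number): number {
  return vectorCount * dimensions * 4;
}

console.log(storageBytes(100_000, 768) / 1e6);    // ≈ 307 MB
console.log(storageBytes(1_000_000, 3072) / 1e9); // ≈ 12.3 GB
```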
---
## Accuracy Trade-offs
Based on MTEB benchmarks (approximate):
| Dimensions | Retrieval Accuracy | Relative to 3072 |
|------------|-------------------|------------------|
| 256 | ~85% | -15% |
| 512 | ~92% | -8% |
| **768** | **~96%** | **-4%** |
| 1536 | ~98% | -2% |
| 3072 | 100% (baseline) | 0% |
**Diminishing returns**: Going from 768 → 3072 dims only improves accuracy by ~4% while quadrupling storage.
---
## Query Performance
Search latency (approximate, 100k vectors):
| Dimensions | Query Latency | Throughput (QPS) |
|------------|--------------|------------------|
| 256 | ~10ms | ~1000 |
| 512 | ~15ms | ~700 |
| **768** | **~20ms** | **~500** |
| 1536 | ~35ms | ~300 |
| 3072 | ~60ms | ~170 |
**Note**: Actual performance depends on Vectorize implementation and hardware.
---
## When to Use Each
### 768 Dimensions (Recommended Default)
**Use when**:
- ✅ Building standard RAG systems
- ✅ General semantic search
- ✅ Cost-effectiveness matters
- ✅ Storage is a consideration
**Don't use when**:
- ❌ You need absolute maximum accuracy
- ❌ Migrating from OpenAI 1536-dim embeddings
**Example**:
```typescript
const embedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: text,
  config: {
    taskType: 'RETRIEVAL_DOCUMENT',
    outputDimensionality: 768 // ← Recommended
  }
});
```
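At query time, embed with the same dimensionality the index was built with; only the task type changes. A sketch of the matching query-side call, assuming the same `ai` client and a `query` string:
```typescript
const queryEmbedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: query,
  config: {
    taskType: 'RETRIEVAL_QUERY',  // queries use RETRIEVAL_QUERY, documents RETRIEVAL_DOCUMENT
    outputDimensionality: 768     // ← must match the index dimension
  }
});
```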
---
### 3072 Dimensions (Maximum Accuracy)
**Use when**:
- ✅ Accuracy is critical (legal, medical, research)
- ✅ Budget allows 4x storage cost
- ✅ Query latency isn't a concern
- ✅ Small dataset (<10k vectors)
**Don't use when**:
- ❌ Cost-sensitive project
- ❌ Large dataset (>100k vectors)
- ❌ Real-time search required
**Example**:
```typescript
const embedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: text,
  config: {
    taskType: 'RETRIEVAL_DOCUMENT',
    outputDimensionality: 3072 // ← Maximum accuracy
  }
});
```
---
### 1536 Dimensions (OpenAI Compatibility)
**Use when**:
- ✅ Migrating from OpenAI text-embedding-3-small
- ✅ Need compatibility with existing infrastructure
- ✅ Balancing accuracy and cost
**Note**: Matching dimensions keeps existing 1536-dim infrastructure compatible, but Gemini and OpenAI embeddings occupy different vector spaces, so every document must still be re-embedded with Gemini.
**Example**:
```typescript
const embedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: text,
  config: {
    taskType: 'RETRIEVAL_DOCUMENT',
    outputDimensionality: 1536 // ← Match OpenAI
  }
});
```
---
### 512 or Lower (Storage-Constrained)
**Use when**:
- ✅ Extreme storage constraints
- ✅ Millions of vectors
- ✅ Acceptable to sacrifice some accuracy
- ✅ Ultra-fast queries required
**Example**:
```typescript
const embedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: text,
  config: {
    taskType: 'RETRIEVAL_DOCUMENT',
    outputDimensionality: 512 // ← Compact
  }
});
```
---
## Migration Between Dimensions
**CRITICAL**: An index's dimension count is fixed at creation, so you cannot mix vectors of different dimensions in the same index.
### Option 1: Recreate Index
```bash
# Delete old index
npx wrangler vectorize delete my-index
# Create new index with different dimensions
npx wrangler vectorize create my-index --dimensions 768 --metric cosine
# Re-generate all embeddings with new dimensions
# Re-insert all vectors
```
### Option 2: Create New Index
```bash
# Keep old index running
# Create new index
npx wrangler vectorize create my-index-768 --dimensions 768 --metric cosine
# Gradually migrate vectors
# Switch over when ready
# Delete old index
```
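For either option, the re-embed and re-insert step looks the same. A hedged sketch of that loop in a Worker, assuming a `docs` array of `{ id, text }` records and a Vectorize binding named `VECTORIZE` (both names are illustrative):
```typescript
import { GoogleGenAI } from '@google/genai';

// Re-embed every document at the new dimensionality and upsert into the
// new index. `docs` and the VECTORIZE binding name are assumptions.
async function migrate(
  env: { VECTORIZE: VectorizeIndex; GEMINI_API_KEY: string },
  docs: { id: string; text: string }[]
) {
  const ai = new GoogleGenAI({ apiKey: env.GEMINI_API_KEY });
  for (let i = 0; i < docs.length; i += 100) { // batch to stay under request limits
    const batch = docs.slice(i, i + 100);
    const res = await ai.models.embedContent({
      model: 'gemini-embedding-001',
      contents: batch.map(d => d.text),
      config: { taskType: 'RETRIEVAL_DOCUMENT', outputDimensionality: 768 },
    });
    await env.VECTORIZE.upsert(
      batch.map((d, j) => ({ id: d.id, values: res.embeddings![j].values! }))
    );
  }
}
```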
---
## Testing Methodology
To test if lower dimensions work for your use case:
```typescript
// Assumes an initialized client: const ai = new GoogleGenAI({ apiKey });

// 1. Generate test embeddings at each candidate dimensionality
const dims = [256, 512, 768, 1536, 3072];
const testEmbeddings = await Promise.all(
  dims.map(dim => ai.models.embedContent({
    model: 'gemini-embedding-001',
    contents: testText,
    config: { outputDimensionality: dim }
  }))
);

// 2. Test retrieval accuracy against a labeled query set
// (testRetrievalAccuracy is a placeholder you implement; see below)
const queries = ['query1', 'query2', 'query3'];
for (const dim of dims) {
  const accuracy = await testRetrievalAccuracy(queries, dim);
  console.log(`${dim} dims: ${accuracy}% accuracy`);
}

// 3. Measure query latency per dimensionality
// (measureQueryLatency is likewise a placeholder that times index queries)
for (const dim of dims) {
  const latency = await measureQueryLatency(dim);
  console.log(`${dim} dims: ${latency}ms latency`);
}
```
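`testRetrievalAccuracy` and `measureQueryLatency` above are placeholders you supply. One way to implement the accuracy check, assuming a labeled set of (query, expected document id) pairs and one Vectorize index per dimensionality (a fuller signature than the placeholder call above; `measureQueryLatency` would simply time the same `index.query` calls):
```typescript
// Recall@k over a labeled query set: the fraction of queries whose
// expected document shows up in the top-k results. Assumes an `ai`
// client in scope, as in the snippet above.
async function testRetrievalAccuracy(
  labeled: { query: string; expectedId: string }[],
  index: VectorizeIndex,
  dim: number,
  k = 5
): Promise<number> {
  let hits = 0;
  for (const { query, expectedId } of labeled) {
    const emb = await ai.models.embedContent({
      model: 'gemini-embedding-001',
      contents: query,
      config: { taskType: 'RETRIEVAL_QUERY', outputDimensionality: dim },
    });
    const res = await index.query(emb.embeddings![0].values!, { topK: k });
    if (res.matches.some(m => m.id === expectedId)) hits++;
  }
  return (hits / labeled.length) * 100;
}
```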
---
## Recommendations by Use Case
### RAG for Documentation
- **Recommended**: 768 dims
- **Reasoning**: Good accuracy, reasonable storage, fast queries
### E-commerce Search
- **Recommended**: 512-768 dims
- **Reasoning**: Speed matters, millions of products
### Legal Document Search
- **Recommended**: 3072 dims
- **Reasoning**: Accuracy is critical, smaller datasets
### Customer Support Chatbot
- **Recommended**: 768 dims
- **Reasoning**: Balance accuracy and response time
### Research Paper Search
- **Recommended**: 1536-3072 dims
- **Reasoning**: Nuanced understanding needed
---
## Summary
**Default Choice**: **768 dimensions**
- 96% of 3072-dim accuracy
- 75% less storage
- 3x faster queries
- Best balance for most applications
**Only use 3072 if**:
- You need every percentage point of accuracy
- You have budget for 4x storage
- You have a small dataset
**Consider lower (<768) if**:
- You have millions of vectors
- Storage cost is a major concern
- Ultra-fast queries are required
---
## Official Documentation
- **Matryoshka Learning**: https://arxiv.org/abs/2205.13147
- **Gemini Embeddings**: https://ai.google.dev/gemini-api/docs/embeddings
- **MTEB Benchmark**: https://github.com/embeddings-benchmark/mteb