# Choosing the Right Embedding Dimensions

Guide to selecting optimal dimensions for your use case with Gemini embeddings.

---

## Quick Decision Table

| Your Priority | Recommended Dimensions | Why |
|--------------|----------------------|-----|
| **Balanced (default)** | **768** | Best accuracy-to-cost ratio |
| **Maximum accuracy** | 3072 | Gemini's full capability |
| **Storage-limited** | 512 or lower | Reduce storage/compute |
| **OpenAI compatibility** | 1536 | Match OpenAI dimensions |

---

## Available Dimensions

Gemini supports **any dimension from 128 to 3072** using Matryoshka Representation Learning.

### Common Choices

| Dimensions | Storage/Vector | Search Speed | Accuracy | Use Case |
|------------|---------------|--------------|----------|----------|
| **768** | ~3 KB | Fast | Good | **Recommended default** |
| 1536 | ~6 KB | Medium | Better | Match OpenAI, large datasets |
| 3072 | ~12 KB | Slower | Best | Maximum accuracy needed |
| 512 | ~2 KB | Very fast | Acceptable | Storage-constrained |
| 256 | ~1 KB | Ultra fast | Lower | Extreme constraints |

---

## Matryoshka Representation Learning

Gemini's flexible dimensions work because of **Matryoshka Representation Learning**: the model learns nested representations in which any prefix of the embedding is itself a usable embedding. The earliest dimensions carry the most important semantic information, and each additional block of dimensions adds progressively finer detail.

```
Dimensions 1-256:     Core semantic information
Dimensions 257-512:   Additional nuance
Dimensions 513-768:   Fine-grained details
Dimensions 769-1536:  Subtle distinctions
Dimensions 1537-3072: Maximum precision
```

**Key Point**: Lower dimensions aren't "worse"; they're **compressed** versions of the full embedding.
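A practical consequence: if you already store full 3072-dimensional vectors, you can derive a smaller embedding yourself by keeping only the leading dimensions, rather than re-calling the API. One caveat: the Gemini embedding docs recommend re-normalizing truncated vectors, since only the full 3072-dimensional output is guaranteed to be unit-length. A minimal sketch (the helper name is illustrative, not part of any SDK):

```typescript
// Minimal sketch: derive a lower-dimensional embedding from a stored
// full-size Matryoshka embedding by truncating and re-normalizing.
// Assumes `full` came from gemini-embedding-001 at 3072 dimensions.
function truncateEmbedding(full: number[], dims: number): number[] {
  const truncated = full.slice(0, dims);                            // keep leading dims
  const norm = Math.sqrt(truncated.reduce((s, v) => s + v * v, 0)); // L2 norm
  return truncated.map(v => v / norm);                              // unit length again
}

// Example: turn a stored 3072-dim vector into a 768-dim one
// const compact = truncateEmbedding(full3072, 768);
```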
---

## Storage Impact

### Example: 100,000 Documents

| Dimensions | Storage Required | Monthly Cost (R2)* |
|------------|-----------------|-------------------|
| 256 | ~100 MB | ~$0.002 |
| 512 | ~200 MB | ~$0.003 |
| **768** | **~300 MB** | **~$0.005** |
| 1536 | ~600 MB | ~$0.009 |
| 3072 | ~1.2 GB | ~$0.018 |

\*Assuming 4 bytes per float, R2 pricing $0.015/GB/month

**For 1M vectors**:

- 768 dims: ~3 GB storage
- 3072 dims: ~12 GB storage (4x more expensive)

---

## Accuracy Trade-offs

Based on MTEB benchmarks (approximate):

| Dimensions | Retrieval Accuracy | Relative to 3072 |
|------------|-------------------|------------------|
| 256 | ~85% | -15% |
| 512 | ~92% | -8% |
| **768** | **~96%** | **-4%** |
| 1536 | ~98% | -2% |
| 3072 | 100% (baseline) | 0% |

**Diminishing returns**: Going from 768 to 3072 dims improves accuracy by only ~4% while quadrupling storage.

---

## Query Performance

Search latency (approximate, 100k vectors):

| Dimensions | Query Latency | Throughput (QPS) |
|------------|--------------|------------------|
| 256 | ~10ms | ~1000 |
| 512 | ~15ms | ~700 |
| **768** | **~20ms** | **~500** |
| 1536 | ~35ms | ~300 |
| 3072 | ~60ms | ~170 |

**Note**: Actual performance depends on the Vectorize implementation and hardware.

---

## When to Use Each

### 768 Dimensions (Recommended Default)

**Use when**:

- ✅ Building standard RAG systems
- ✅ General semantic search
- ✅ Cost-effectiveness matters
- ✅ Storage is a consideration

**Don't use when**:

- ❌ You need absolute maximum accuracy
- ❌ Migrating from OpenAI 1536-dim embeddings

**Example**:

```typescript
const response = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: text,
  config: {
    taskType: 'RETRIEVAL_DOCUMENT',
    outputDimensionality: 768 // ← Recommended
  }
});
// Vector is in response.embeddings[0].values
```

---

### 3072 Dimensions (Maximum Accuracy)

**Use when**:

- ✅ Accuracy is critical (legal, medical, research)
- ✅ Budget allows 4x storage cost
- ✅ Query latency isn't a concern
- ✅ Small dataset (<10k vectors)

**Don't use when**:

- ❌ Cost-sensitive project
- ❌ Large dataset (>100k vectors)
- ❌ Real-time search required

**Example**:

```typescript
const response = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: text,
  config: {
    taskType: 'RETRIEVAL_DOCUMENT',
    outputDimensionality: 3072 // ← Maximum accuracy
  }
});
```

---

### 1536 Dimensions (OpenAI Compatibility)

**Use when**:

- ✅ Migrating from OpenAI text-embedding-3-small
- ✅ Need compatibility with existing infrastructure
- ✅ Balancing accuracy and cost

**Example**:

```typescript
const response = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: text,
  config: {
    taskType: 'RETRIEVAL_DOCUMENT',
    outputDimensionality: 1536 // ← Match OpenAI
  }
});
```

---

### 512 or Lower (Storage-Constrained)

**Use when**:

- ✅ Extreme storage constraints
- ✅ Millions of vectors
- ✅ Acceptable to sacrifice some accuracy
- ✅ Ultra-fast queries required

**Example**:

```typescript
const response = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: text,
  config: {
    taskType: 'RETRIEVAL_DOCUMENT',
    outputDimensionality: 512 // ← Compact
  }
});
```

---

## Migration Between Dimensions

**CRITICAL**: You cannot mix different dimensions in the same index.

### Option 1: Recreate Index

```bash
# Delete old index
npx wrangler vectorize delete my-index

# Create new index with different dimensions
npx wrangler vectorize create my-index --dimensions 768 --metric cosine

# Re-generate all embeddings with new dimensions
# Re-insert all vectors
```

### Option 2: Create New Index

```bash
# Keep old index running
# Create new index
npx wrangler vectorize create my-index-768 --dimensions 768 --metric cosine

# Gradually migrate vectors
# Switch over when ready
# Delete old index
```

---

## Testing Methodology

To test whether lower dimensions work for your use case:

```typescript
// 1. Generate test embeddings with different dimensions
const dims = [256, 512, 768, 1536, 3072];

const testEmbeddings = await Promise.all(
  dims.map(dim =>
    ai.models.embedContent({
      model: 'gemini-embedding-001',
      contents: testText,
      config: { outputDimensionality: dim }
    })
  )
);

// 2. Test retrieval accuracy
const queries = ['query1', 'query2', 'query3'];

for (const dim of dims) {
  const accuracy = await testRetrievalAccuracy(queries, dim);
  console.log(`${dim} dims: ${accuracy}% accuracy`);
}

// 3. Measure performance
for (const dim of dims) {
  const latency = await measureQueryLatency(dim);
  console.log(`${dim} dims: ${latency}ms latency`);
}
```
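The `testRetrievalAccuracy` and `measureQueryLatency` helpers above are left undefined because they depend on your setup. As a rough sketch, assuming one Vectorize index per candidate dimension and a small labeled query set, they could look like this (`indexByDim`, `relevantIds`, and the hit-rate metric are illustrative assumptions, not part of any API):

```typescript
// Sketch only. Assumes `ai` is a GoogleGenAI client and `indexByDim` maps
// each candidate dimension to a Vectorize index populated at that dimension.
const indexByDim: Record<number, VectorizeIndex> = {
  // 256: env.EVAL_INDEX_256, 512: env.EVAL_INDEX_512, ...
};

// Ground truth: which document IDs each test query should retrieve.
const relevantIds: Record<string, string[]> = {
  query1: ['doc-a'],
  query2: ['doc-b'],
  query3: ['doc-c'],
};

// Hit rate @ k: percentage of queries whose top-k results include at
// least one known-relevant document.
async function testRetrievalAccuracy(queries: string[], dim: number, k = 5): Promise<number> {
  const index = indexByDim[dim];
  let hits = 0;
  for (const query of queries) {
    const { embeddings } = await ai.models.embedContent({
      model: 'gemini-embedding-001',
      contents: query,
      config: { taskType: 'RETRIEVAL_QUERY', outputDimensionality: dim }
    });
    const { matches } = await index.query(embeddings![0].values!, { topK: k });
    if (matches.some(m => relevantIds[query]?.includes(m.id))) hits++;
  }
  return Math.round((hits / queries.length) * 100);
}

// Average latency of a fixed probe query against the index, in ms.
async function measureQueryLatency(dim: number, runs = 10): Promise<number> {
  const index = indexByDim[dim];
  const probe = new Array(dim).fill(0.01); // any vector of the right length
  const start = Date.now();
  for (let i = 0; i < runs; i++) {
    await index.query(probe, { topK: 10 });
  }
  return Math.round((Date.now() - start) / runs);
}
```

Run the accuracy loop first; if 768 dims already meets your quality bar, the latency and storage numbers usually settle the question.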
---

## Recommendations by Use Case

### RAG for Documentation

- **Recommended**: 768 dims
- **Reasoning**: Good accuracy, reasonable storage, fast queries

### E-commerce Search

- **Recommended**: 512-768 dims
- **Reasoning**: Speed matters, millions of products

### Legal Document Search

- **Recommended**: 3072 dims
- **Reasoning**: Accuracy is critical, datasets are smaller

### Customer Support Chatbot

- **Recommended**: 768 dims
- **Reasoning**: Balance accuracy and response time

### Research Paper Search

- **Recommended**: 1536-3072 dims
- **Reasoning**: Nuanced understanding needed

---

## Summary

**Default Choice**: **768 dimensions**

- 96% of 3072-dim accuracy
- 75% less storage
- 3x faster queries
- Best balance for most applications

**Only use 3072 if**:

- You need every percentage point of accuracy
- You have budget for 4x storage
- You have a small dataset

**Consider lower (<768) if**:

- You have millions of vectors
- Storage cost is a major concern
- Ultra-fast queries are required

---

## Official Documentation

- **Matryoshka Representation Learning**: https://arxiv.org/abs/2205.13147
- **Gemini Embeddings**: https://ai.google.dev/gemini-api/docs/embeddings
- **MTEB Benchmark**: https://github.com/embeddings-benchmark/mteb