Choosing the Right Embedding Dimensions
Guide to selecting optimal dimensions for your use case with Gemini embeddings.
Quick Decision Table
| Your Priority | Recommended Dimensions | Why |
|---|---|---|
| Balanced (default) | 768 | Best accuracy-to-cost ratio |
| Maximum accuracy | 3072 | Gemini's full capability |
| Storage-limited | 512 or lower | Reduce storage/compute |
| OpenAI compatibility | 1536 | Match OpenAI dimensions |
Available Dimensions
Gemini supports any dimension from 128 to 3072 using Matryoshka Representation Learning.
Common Choices
| Dimensions | Storage/Vector | Search Speed | Accuracy | Use Case |
|---|---|---|---|---|
| 768 | ~3 KB | Fast | Good | Recommended default |
| 1536 | ~6 KB | Medium | Better | Match OpenAI, large datasets |
| 3072 | ~12 KB | Slower | Best | Maximum accuracy needed |
| 512 | ~2 KB | Very fast | Acceptable | Storage-constrained |
| 256 | ~1 KB | Ultra fast | Lower | Extreme constraints |
Matryoshka Representation Learning
Gemini's flexible dimensions work because of Matryoshka Representation Learning: the model learns nested representations in which the first N dimensions form a usable embedding on their own, and each additional block of dimensions adds progressively finer detail.
- Dimensions 1-256: Core semantic information
- Dimensions 257-512: Additional nuance
- Dimensions 513-768: Fine-grained details
- Dimensions 769-1536: Subtle distinctions
- Dimensions 1537-3072: Maximum precision
Key Point: Lower dimensions aren't "worse" - they're compressed versions of the full embedding.
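Because the dimensions are nested, a full embedding can also be truncated client-side. One caveat: a truncated Matryoshka embedding is no longer a unit vector, so it should be re-normalized before computing cosine similarity. A minimal sketch:

```javascript
// Truncate a full embedding to its first `dim` values, then re-normalize
// to unit length so cosine similarity remains meaningful.
function truncateEmbedding(values, dim) {
  const head = values.slice(0, dim);
  const norm = Math.sqrt(head.reduce((sum, v) => sum + v * v, 0));
  return head.map(v => v / norm);
}
```

In most cases it is simpler to request the target size directly via `outputDimensionality`, but truncation is useful if you have already stored full-size embeddings.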
Storage Impact
Example: 100,000 Documents
| Dimensions | Storage Required | Monthly Cost (R2)* |
|---|---|---|
| 256 | ~100 MB | ~$0.0015 |
| 512 | ~200 MB | ~$0.003 |
| 768 | ~300 MB | ~$0.0045 |
| 1536 | ~600 MB | ~$0.009 |
| 3072 | ~1.2 GB | ~$0.018 |
*Assuming 4 bytes per float32 and R2 storage pricing of $0.015/GB/month. At 100k documents, raw storage cost is negligible at any dimensionality; the gap matters at millions of vectors.
For 1M vectors:
- 768 dims: ~3 GB storage
- 3072 dims: ~12 GB storage (4x more expensive)
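The figures above follow from a simple formula: bytes ≈ vectors × dimensions × 4 (float32). A quick estimator, using the R2 rate quoted above as the default:

```javascript
// Estimate raw float32 vector storage and monthly cost at a given $/GB rate.
function estimateStorage(numVectors, dims, pricePerGBMonth = 0.015) {
  const bytes = numVectors * dims * 4; // 4 bytes per float32 component
  const gb = bytes / (1024 ** 3);
  return { gb, monthlyCost: gb * pricePerGBMonth };
}
```

Note this covers raw vector data only; index structures and metadata add overhead on top.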
Accuracy Trade-offs
Based on MTEB benchmarks (approximate):
| Dimensions | Retrieval Accuracy | Relative to 3072 |
|---|---|---|
| 256 | ~85% | -15% |
| 512 | ~92% | -8% |
| 768 | ~96% | -4% |
| 1536 | ~98% | -2% |
| 3072 | 100% (baseline) | 0% |
Diminishing returns: Going from 768 → 3072 dims only improves accuracy by ~4% while quadrupling storage.
Query Performance
Search latency (approximate, 100k vectors):
| Dimensions | Query Latency | Throughput (QPS) |
|---|---|---|
| 256 | ~10ms | ~1000 |
| 512 | ~15ms | ~700 |
| 768 | ~20ms | ~500 |
| 1536 | ~35ms | ~300 |
| 3072 | ~60ms | ~170 |
Note: Actual performance depends on Vectorize implementation and hardware.
When to Use Each
768 Dimensions (Recommended Default)
Use when:
- ✅ Building standard RAG systems
- ✅ General semantic search
- ✅ Cost-effectiveness matters
- ✅ Storage is a consideration
Don't use when:
- ❌ You need absolute maximum accuracy
- ❌ Migrating from OpenAI 1536-dim embeddings
Example:

```js
const embedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: text,
  config: {
    taskType: 'RETRIEVAL_DOCUMENT',
    outputDimensionality: 768 // ← Recommended
  }
});
```
3072 Dimensions (Maximum Accuracy)
Use when:
- ✅ Accuracy is critical (legal, medical, research)
- ✅ Budget allows 4x storage cost
- ✅ Query latency isn't a concern
- ✅ Small dataset (<10k vectors)
Don't use when:
- ❌ Cost-sensitive project
- ❌ Large dataset (>100k vectors)
- ❌ Real-time search required
Example:

```js
const embedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: text,
  config: {
    taskType: 'RETRIEVAL_DOCUMENT',
    outputDimensionality: 3072 // ← Maximum accuracy
  }
});
```
1536 Dimensions (OpenAI Compatibility)
Use when:
- ✅ Migrating from OpenAI text-embedding-3-small
- ✅ Need compatibility with existing infrastructure
- ✅ Balancing accuracy and cost
Example:

```js
const embedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: text,
  config: {
    taskType: 'RETRIEVAL_DOCUMENT',
    outputDimensionality: 1536 // ← Match OpenAI
  }
});
```
512 or Lower (Storage-Constrained)
Use when:
- ✅ Extreme storage constraints
- ✅ Millions of vectors
- ✅ Acceptable to sacrifice some accuracy
- ✅ Ultra-fast queries required
Example:

```js
const embedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: text,
  config: {
    taskType: 'RETRIEVAL_DOCUMENT',
    outputDimensionality: 512 // ← Compact
  }
});
```
Migration Between Dimensions
CRITICAL: You cannot mix vectors of different dimensions in the same index; a Vectorize index's dimension count is fixed at creation time.
Option 1: Recreate Index
```sh
# Delete the old index
npx wrangler vectorize delete my-index

# Create a new index with different dimensions
npx wrangler vectorize create my-index --dimensions 768 --metric cosine

# Re-generate all embeddings with the new dimensions
# Re-insert all vectors
```
Option 2: Create New Index
```sh
# Keep the old index running

# Create a new index alongside it
npx wrangler vectorize create my-index-768 --dimensions 768 --metric cosine

# Gradually migrate vectors
# Switch over when ready
# Delete the old index
```
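Either way, re-embedding and re-inserting is easiest in fixed-size batches. A sketch of the batching step (the `embedDoc()` helper and the `env.VECTORIZE_NEW` binding name are hypothetical, for illustration only):

```javascript
// Split an array into fixed-size batches for bulk re-insertion.
function toBatches(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Usage sketch (hypothetical helpers: embedDoc() returns the new 768-dim
// embedding; env.VECTORIZE_NEW is the binding for the new index):
//
// for (const batch of toBatches(allDocs, 500)) {
//   const vectors = await Promise.all(batch.map(async doc => ({
//     id: doc.id,
//     values: await embedDoc(doc.text),
//   })));
//   await env.VECTORIZE_NEW.insert(vectors);
// }
```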
Testing Methodology
To test if lower dimensions work for your use case:
```js
// 1. Generate test embeddings with different dimensions
const dims = [256, 512, 768, 1536, 3072];
const testEmbeddings = await Promise.all(
  dims.map(dim => ai.models.embedContent({
    model: 'gemini-embedding-001',
    contents: testText,
    config: { outputDimensionality: dim }
  }))
);

// 2. Test retrieval accuracy
const queries = ['query1', 'query2', 'query3'];
for (const dim of dims) {
  const accuracy = await testRetrievalAccuracy(queries, dim);
  console.log(`${dim} dims: ${accuracy}% accuracy`);
}

// 3. Measure performance
for (const dim of dims) {
  const latency = await measureQueryLatency(dim);
  console.log(`${dim} dims: ${latency}ms latency`);
}
```
Recommendations by Use Case
RAG for Documentation
- Recommended: 768 dims
- Reasoning: Good accuracy, reasonable storage, fast queries
E-commerce Search
- Recommended: 512-768 dims
- Reasoning: Speed matters, millions of products
Legal Document Search
- Recommended: 3072 dims
- Reasoning: Accuracy is critical, smaller datasets
Customer Support Chatbot
- Recommended: 768 dims
- Reasoning: Balance accuracy and response time
Research Paper Search
- Recommended: 1536-3072 dims
- Reasoning: Nuanced understanding needed
Summary
Default Choice: 768 dimensions
- 96% of 3072-dim accuracy
- 75% less storage
- 3x faster queries
- Best balance for most applications
Only use 3072 if:
- You need every percentage point of accuracy
- You have budget for 4x storage
- You have a small dataset
Consider lower (<768) if:
- You have millions of vectors
- Storage cost is a major concern
- Ultra-fast queries are required
Official Documentation
- Matryoshka Learning: https://arxiv.org/abs/2205.13147
- Gemini Embeddings: https://ai.google.dev/gemini-api/docs/embeddings
- MTEB Benchmark: https://github.com/embeddings-benchmark/mteb