# Choosing the Right Embedding Dimensions

Guide to selecting optimal dimensions for your use case with Gemini embeddings.

---

## Quick Decision Table

| Your Priority | Recommended Dimensions | Why |
|---------------|------------------------|-----|
| **Balanced (default)** | **768** | Best accuracy-to-cost ratio |
| **Maximum accuracy** | 3072 | Gemini's full capability |
| **Storage-limited** | 512 or lower | Reduces storage and compute |
| **OpenAI compatibility** | 1536 | Matches OpenAI dimensions |

---
## Available Dimensions

Gemini supports **any dimension from 128 to 3072** using Matryoshka Representation Learning.

### Common Choices

| Dimensions | Storage/Vector | Search Speed | Accuracy | Use Case |
|------------|----------------|--------------|----------|----------|
| **768** | ~3 KB | Fast | Good | **Recommended default** |
| 1536 | ~6 KB | Medium | Better | Match OpenAI, large datasets |
| 3072 | ~12 KB | Slower | Best | Maximum accuracy needed |
| 512 | ~2 KB | Very fast | Acceptable | Storage-constrained |
| 256 | ~1 KB | Ultra fast | Lower | Extreme constraints |

---
## Matryoshka Representation Learning

Gemini's flexible dimensions work because of **Matryoshka Representation Learning**: the model learns nested representations, so the first N dimensions form a usable embedding on their own, and each additional block of dimensions adds progressively finer-grained information.

```
Dimensions 1-256:     Core semantic information
Dimensions 257-512:   Additional nuance
Dimensions 513-768:   Fine-grained details
Dimensions 769-1536:  Subtle distinctions
Dimensions 1537-3072: Maximum precision
```

**Key Point**: Lower dimensions aren't "worse" - they're **compressed** versions of the full embedding.
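Because of this nesting, you can also shorten a full-length embedding client-side by keeping only its first N values. A minimal sketch (assuming you already hold a full embedding as a `number[]`; a truncated prefix is generally no longer unit-length, so it should be re-normalized before cosine comparisons):

```typescript
// Sketch: shorten a full Matryoshka embedding client-side by keeping
// its first `dims` values, then re-normalizing to unit length so
// cosine similarity still behaves.
function truncateEmbedding(full: number[], dims: number): number[] {
  const prefix = full.slice(0, dims);
  const norm = Math.sqrt(prefix.reduce((sum, v) => sum + v * v, 0));
  return prefix.map((v) => v / norm);
}

// Example: a toy 4-dim "embedding" truncated to its first 2 dims.
console.log(truncateEmbedding([3, 4, 0.1, 0.1], 2)); // [0.6, 0.8]
```

In practice it is usually simpler to request the target size directly via `outputDimensionality`; truncation is mainly useful when you stored full vectors and later want a smaller working copy.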
---
## Storage Impact

### Example: 100,000 Documents

| Dimensions | Storage Required | Monthly Cost (R2)* |
|------------|------------------|--------------------|
| 256 | ~100 MB | $0.01 |
| 512 | ~200 MB | $0.02 |
| **768** | **~300 MB** | **$0.03** |
| 1536 | ~600 MB | $0.06 |
| 3072 | ~1.2 GB | $0.12 |

\*Assuming 4 bytes per float, R2 pricing $0.015/GB/month

**For 1M vectors**:
- 768 dims: ~3 GB storage
- 3072 dims: ~12 GB storage (4x more expensive)
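The table values come from simple arithmetic: raw storage is dimensions × vector count × bytes per float. A quick estimator (a sketch assuming float32 and ignoring any index overhead):

```typescript
// Sketch: raw vector storage, assuming float32 (4 bytes per value)
// and ignoring index overhead.
function storageBytes(dims: number, vectorCount: number): number {
  return dims * vectorCount * 4;
}

const mb = (b: number) => (b / 2 ** 20).toFixed(0);
const gb = (b: number) => (b / 2 ** 30).toFixed(1);

console.log(mb(storageBytes(768, 100_000)));    // "293" (~300 MB, as in the table)
console.log(gb(storageBytes(3072, 1_000_000))); // "11.4" (~12 GB for 1M vectors)
```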
---
## Accuracy Trade-offs

Based on MTEB benchmarks (approximate):

| Dimensions | Retrieval Accuracy | Relative to 3072 |
|------------|--------------------|------------------|
| 256 | ~85% | -15% |
| 512 | ~92% | -8% |
| **768** | **~96%** | **-4%** |
| 1536 | ~98% | -2% |
| 3072 | 100% (baseline) | 0% |

**Diminishing returns**: Going from 768 to 3072 dims only improves accuracy by ~4% while quadrupling storage.

---
## Query Performance

Search latency (approximate, 100k vectors):

| Dimensions | Query Latency | Throughput (QPS) |
|------------|---------------|------------------|
| 256 | ~10ms | ~1000 |
| 512 | ~15ms | ~700 |
| **768** | **~20ms** | **~500** |
| 1536 | ~35ms | ~300 |
| 3072 | ~60ms | ~170 |

**Note**: Actual performance depends on the Vectorize implementation and hardware.
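The scaling in the table follows from the math: a brute-force cosine comparison touches every dimension once, so per-comparison cost grows roughly linearly with dimension count. A sketch of that inner loop:

```typescript
// Sketch: cosine similarity is O(dims) per vector pair, which is why
// query latency grows roughly linearly with dimension count.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 0], [1, 0])); // 1 (identical direction)
console.log(cosineSimilarity([1, 0], [0, 1])); // 0 (orthogonal)
```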
---
## When to Use Each

### 768 Dimensions (Recommended Default)

**Use when**:
- ✅ Building standard RAG systems
- ✅ General semantic search
- ✅ Cost-effectiveness matters
- ✅ Storage is a consideration

**Don't use when**:
- ❌ You need absolute maximum accuracy
- ❌ Migrating from OpenAI 1536-dim embeddings

**Example**:
```typescript
const embedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: text,
  config: {
    taskType: 'RETRIEVAL_DOCUMENT',
    outputDimensionality: 768 // ← Recommended
  }
});
```

---
### 3072 Dimensions (Maximum Accuracy)

**Use when**:
- ✅ Accuracy is critical (legal, medical, research)
- ✅ Budget allows 4x storage cost
- ✅ Query latency isn't a concern
- ✅ Small dataset (<10k vectors)

**Don't use when**:
- ❌ Cost-sensitive project
- ❌ Large dataset (>100k vectors)
- ❌ Real-time search required

**Example**:
```typescript
const embedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: text,
  config: {
    taskType: 'RETRIEVAL_DOCUMENT',
    outputDimensionality: 3072 // ← Maximum accuracy
  }
});
```

---
### 1536 Dimensions (OpenAI Compatibility)

**Use when**:
- ✅ Migrating from OpenAI text-embedding-3-small
- ✅ Need compatibility with existing infrastructure
- ✅ Balancing accuracy and cost

**Example**:
```typescript
const embedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: text,
  config: {
    taskType: 'RETRIEVAL_DOCUMENT',
    outputDimensionality: 1536 // ← Match OpenAI
  }
});
```

---
### 512 or Lower (Storage-Constrained)

**Use when**:
- ✅ Extreme storage constraints
- ✅ Millions of vectors
- ✅ Acceptable to sacrifice some accuracy
- ✅ Ultra-fast queries required

**Example**:
```typescript
const embedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: text,
  config: {
    taskType: 'RETRIEVAL_DOCUMENT',
    outputDimensionality: 512 // ← Compact
  }
});
```

---
## Migration Between Dimensions

**CRITICAL**: You cannot mix different dimensions in the same index.

### Option 1: Recreate Index

```bash
# Delete old index
npx wrangler vectorize delete my-index

# Create new index with different dimensions
npx wrangler vectorize create my-index --dimensions 768 --metric cosine

# Re-generate all embeddings with new dimensions
# Re-insert all vectors
```

### Option 2: Create New Index

```bash
# Keep old index running
# Create new index
npx wrangler vectorize create my-index-768 --dimensions 768 --metric cosine

# Gradually migrate vectors
# Switch over when ready
# Delete old index
```
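Since mixed dimensions are rejected by the index, it is worth validating vector lengths before upserting during a migration. A hypothetical guard (the `PendingVector` shape and `assertDimensions` helper are illustrative, not part of any SDK; `indexDims` is whatever you passed to `--dimensions`):

```typescript
// Sketch: reject vectors whose length doesn't match the index's
// configured dimension count before they reach an upsert call.
interface PendingVector {
  id: string;
  values: number[];
}

function assertDimensions(vectors: PendingVector[], indexDims: number): void {
  for (const v of vectors) {
    if (v.values.length !== indexDims) {
      throw new Error(
        `Vector ${v.id} has ${v.values.length} dims; index expects ${indexDims}`
      );
    }
  }
}

// Passes silently when every vector matches the index.
assertDimensions([{ id: 'doc-1', values: [0.1, 0.2] }], 2);
```

Failing fast here is cheaper than discovering a mismatch after half the migration has been written.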
---
## Testing Methodology

To test whether lower dimensions work for your use case:

```typescript
// 1. Generate test embeddings with different dimensions
const dims = [256, 512, 768, 1536, 3072];
const testEmbeddings = await Promise.all(
  dims.map(dim => ai.models.embedContent({
    model: 'gemini-embedding-001',
    contents: testText,
    config: { outputDimensionality: dim }
  }))
);

// 2. Test retrieval accuracy (testRetrievalAccuracy is a placeholder
// for your own evaluation harness)
const queries = ['query1', 'query2', 'query3'];
for (const dim of dims) {
  const accuracy = await testRetrievalAccuracy(queries, dim);
  console.log(`${dim} dims: ${accuracy}% accuracy`);
}

// 3. Measure performance (measureQueryLatency is likewise a placeholder)
for (const dim of dims) {
  const latency = await measureQueryLatency(dim);
  console.log(`${dim} dims: ${latency}ms latency`);
}
```

---
## Recommendations by Use Case

### RAG for Documentation
- **Recommended**: 768 dims
- **Reasoning**: Good accuracy, reasonable storage, fast queries

### E-commerce Search
- **Recommended**: 512-768 dims
- **Reasoning**: Speed matters, millions of products

### Legal Document Search
- **Recommended**: 3072 dims
- **Reasoning**: Accuracy is critical, smaller datasets

### Customer Support Chatbot
- **Recommended**: 768 dims
- **Reasoning**: Balance accuracy and response time

### Research Paper Search
- **Recommended**: 1536-3072 dims
- **Reasoning**: Nuanced understanding needed

---
## Summary

**Default Choice**: **768 dimensions**
- ~96% of 3072-dim accuracy
- 75% less storage
- ~3x faster queries
- Best balance for most applications

**Only use 3072 if**:
- You need every percentage point of accuracy
- You have budget for 4x storage
- You have a small dataset

**Consider lower (<768) if**:
- You have millions of vectors
- Storage cost is a major concern
- Ultra-fast queries are required

---
## Official Documentation

- **Matryoshka Learning**: https://arxiv.org/abs/2205.13147
- **Gemini Embeddings**: https://ai.google.dev/gemini-api/docs/embeddings
- **MTEB Benchmark**: https://github.com/embeddings-benchmark/mteb