# Choosing the Right Embedding Dimensions

Guide to selecting the right output dimensions for your use case with Gemini embeddings.

---

## Quick Decision Table

| Your Priority | Recommended Dimensions | Why |
|---------------|------------------------|-----|
| **Balanced (default)** | **768** | Best accuracy-to-cost ratio |
| **Maximum accuracy** | 3072 | Full, untruncated Gemini embedding |
| **Storage-limited** | 512 or lower | Less storage and compute per vector |
| **OpenAI-sized index** | 1536 | Same vector size as OpenAI `text-embedding-3-small`; documents must still be re-embedded |

---

## Available Dimensions

Gemini (`gemini-embedding-001`) accepts **any output dimension from 128 to 3072** through the `outputDimensionality` setting, which works because of Matryoshka Representation Learning.
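If the dimension comes from configuration, it can be checked before any API call. A minimal sketch; the function name is hypothetical, and the 128-3072 bounds are the range stated above, so confirm them against the current Gemini API reference.

```typescript
// Bounds as stated in this guide; confirm against the current Gemini API reference.
const MIN_OUTPUT_DIMS = 128;
const MAX_OUTPUT_DIMS = 3072;

/** Returns `dims` unchanged if it is a valid outputDimensionality, otherwise throws. */
function validateOutputDimensionality(dims: number): number {
  if (!Number.isInteger(dims)) {
    throw new RangeError(`outputDimensionality must be an integer, got ${dims}`);
  }
  if (dims < MIN_OUTPUT_DIMS || dims > MAX_OUTPUT_DIMS) {
    throw new RangeError(
      `outputDimensionality must be between ${MIN_OUTPUT_DIMS} and ${MAX_OUTPUT_DIMS}, got ${dims}`,
    );
  }
  return dims;
}

// Usage: validate a value read from configuration before calling the API.
const configuredDims = validateOutputDimensionality(768);
console.log(`Using ${configuredDims} dimensions`);
```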
### Common Choices

| Dimensions | Storage per Vector (float32) | Search Speed | Accuracy | Use Case |
|------------|------------------------------|--------------|----------|----------|
| **768** | ~3 KB | Fast | Good | **Recommended default** |
| 1536 | ~6 KB | Medium | Better | Existing 1536-dim (OpenAI-sized) indexes, large datasets |
| 3072 | ~12 KB | Slower | Best | Maximum accuracy needed |
| 512 | ~2 KB | Very fast | Acceptable | Storage-constrained |
| 256 | ~1 KB | Ultra fast | Lower | Extreme constraints |

---

## Matryoshka Representation Learning

Gemini's flexible dimensions work because of **Matryoshka Representation Learning (MRL)**: the model is trained so that the first N dimensions of the full embedding form a usable embedding on their own, with the earliest dimensions carrying the most information. Conceptually:

```
Dimensions 1-256:     Core semantic information
Dimensions 257-512:   Additional nuance
Dimensions 513-768:   Fine-grained details
Dimensions 769-1536:  Subtle distinctions
Dimensions 1537-3072: Maximum precision
```

**Key Point**: Lower dimensions are not a separate, weaker model; they are **compressed** (truncated) prefixes of the full embedding that trade a small amount of accuracy for size and speed.
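Because of this nesting, a shorter vector can be illustrated by keeping the first N values of a full embedding. A minimal sketch, assuming the full vector is already available as a `number[]`; the helper names are hypothetical. Requesting a smaller `outputDimensionality` is the documented route, and truncation is shown here only to make the nesting concrete. The sketch re-normalizes after truncation because a prefix of a unit vector is generally shorter than 1; Google's Gemini embeddings documentation notes that only the 3072-dimension output is returned normalized and recommends normalizing smaller outputs yourself (check the current docs for your model version).

```typescript
/** Keep the first `dims` values of a Matryoshka (MRL) embedding. */
function truncateEmbedding(full: number[], dims: number): number[] {
  if (!Number.isInteger(dims) || dims < 1 || dims > full.length) {
    throw new RangeError(`dims must be an integer in [1, ${full.length}], got ${dims}`);
  }
  return full.slice(0, dims);
}

/** Scale a vector to unit L2 length so that dot product equals cosine similarity. */
function l2Normalize(v: number[]): number[] {
  const norm = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  if (norm === 0) return v.slice(); // all-zero vector: nothing to scale
  return v.map((x) => x / norm);
}

// Usage: derive a unit-length 768-dim vector from a full 3072-dim embedding,
// e.g. `response.embeddings?.[0]?.values` from a 3072-dim request.
// const compact = l2Normalize(truncateEmbedding(fullEmbedding, 768));
```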
---

## Storage Impact

### Example: 100,000 Documents

| Dimensions | Storage Required | Monthly Cost (R2)* |
|------------|------------------|--------------------|
| 256 | ~100 MB | ~$0.0015 |
| 512 | ~200 MB | ~$0.0031 |
| **768** | **~300 MB** | **~$0.0046** |
| 1536 | ~600 MB | ~$0.0092 |
| 3072 | ~1.2 GB | ~$0.018 |

\*Raw vector data only: 4 bytes per float32 value, R2 storage priced at $0.015/GB-month. IDs, metadata, and index overhead are not included.

**For 1M vectors**:
- 768 dims: ~3 GB storage
- 3072 dims: ~12 GB storage (4x the storage cost)
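The table's arithmetic is count × dimensions × 4 bytes, then GB × price. A minimal sketch that reproduces it; the function names are hypothetical, and the $0.015/GB-month price is the R2 figure assumed in the footnote above.

```typescript
const BYTES_PER_FLOAT32 = 4;

/** Raw bytes needed to store `count` float32 vectors of `dims` dimensions. */
function rawVectorBytes(count: number, dims: number): number {
  return count * dims * BYTES_PER_FLOAT32;
}

/** Monthly storage cost in USD, using decimal GB (1 GB = 1e9 bytes). */
function monthlyStorageCostUsd(bytes: number, pricePerGbMonth: number): number {
  return (bytes / 1e9) * pricePerGbMonth;
}

// Reproduce the table: 100,000 vectors at an assumed $0.015/GB-month.
for (const dims of [256, 512, 768, 1536, 3072]) {
  const bytes = rawVectorBytes(100000, dims);
  const cost = monthlyStorageCostUsd(bytes, 0.015);
  console.log(`${dims} dims: ${(bytes / 1e6).toFixed(1)} MB, $${cost.toFixed(4)}/month`);
}
```

Swap in your own vector count and your store's price; for managed vector databases, pricing may be based on stored dimensions rather than raw bytes.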
---

## Accuracy Trade-offs

Approximate retrieval accuracy relative to the 3072-dim baseline, based on MTEB benchmarks. Treat these as rough guides; the gap varies by dataset and task.

| Dimensions | Relative Retrieval Accuracy | Difference vs. 3072 |
|------------|-----------------------------|---------------------|
| 256 | ~85% | -15% |
| 512 | ~92% | -8% |
| **768** | **~96%** | **-4%** |
| 1536 | ~98% | -2% |
| 3072 | 100% (baseline) | 0% |

**Diminishing returns**: Going from 768 to 3072 dims improves accuracy by only ~4 percentage points while quadrupling storage.
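To see how much of this gap shows up on your own corpus, you can compare the top-k results at a lower dimension with the 3072-dim results for the same queries. A minimal sketch under these assumptions: vectors are plain `number[]`, search is brute-force cosine similarity, and lower-dimensional vectors are prefixes of the full ones (MRL truncation). All function names are hypothetical.

```typescript
/** Cosine similarity of two equal-length vectors (0 if either is all zeros). */
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

/** Indices of the k documents most similar to `query` (brute force). */
function topK(query: number[], docs: number[][], k: number): number[] {
  return docs
    .map((doc, index) => ({ index, score: cosine(query, doc) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((hit) => hit.index);
}

/**
 * Fraction of the full-dimension top-k that is also retrieved when queries and
 * documents are truncated to `dims` values (1.0 = identical result sets).
 */
function topKOverlap(queries: number[][], docs: number[][], dims: number, k: number): number {
  const cut = (v: number[]) => v.slice(0, dims);
  const truncatedDocs = docs.map(cut);
  let shared = 0;
  for (const q of queries) {
    const reference = new Set(topK(q, docs, k));
    for (const idx of topK(cut(q), truncatedDocs, k)) {
      if (reference.has(idx)) shared++;
    }
  }
  return shared / (queries.length * Math.min(k, docs.length));
}
```

An overlap close to 1.0 at 768 suggests the smaller index would return nearly the same results as 3072 for your queries.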
---

## Query Performance

Search latency (approximate and illustrative, 100k vectors):

| Dimensions | Query Latency | Throughput (QPS) |
|------------|---------------|------------------|
| 256 | ~10ms | ~1000 |
| 512 | ~15ms | ~700 |
| **768** | **~20ms** | **~500** |
| 1536 | ~35ms | ~300 |
| 3072 | ~60ms | ~170 |

**Note**: Actual performance depends on your vector database (for example, Cloudflare Vectorize), its index type, and hardware; measure with your own workload.
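For exact (brute-force) search, each similarity computation costs time proportional to the number of dimensions, which is one reason latency grows with dimension count. A minimal Node.js micro-benchmark sketch that times brute-force dot-product scoring over random unit vectors; the function names and the 5,000-vector size are hypothetical choices, and it measures in-process arithmetic only, not a vector database with ANN indexes, so read the results as relative.

```typescript
import { performance } from 'node:perf_hooks';

/** Random vectors with unit L2 norm, stored flat in one Float32Array. */
function randomUnitVectors(count: number, dims: number): Float32Array {
  const data = new Float32Array(count * dims);
  for (let i = 0; i < count; i++) {
    let norm = 0;
    for (let j = 0; j < dims; j++) {
      const x = Math.random() * 2 - 1;
      data[i * dims + j] = x;
      norm += x * x;
    }
    norm = Math.sqrt(norm);
    for (let j = 0; j < dims; j++) data[i * dims + j] /= norm;
  }
  return data;
}

/** Index of the best-scoring vector by dot product (equals cosine for unit vectors). */
function bruteForceSearch(query: Float32Array, data: Float32Array, dims: number): number {
  let best = -1;
  let bestScore = -Infinity;
  const count = data.length / dims;
  for (let i = 0; i < count; i++) {
    let score = 0;
    const offset = i * dims;
    for (let j = 0; j < dims; j++) score += query[j] * data[offset + j];
    if (score > bestScore) {
      bestScore = score;
      best = i;
    }
  }
  return best;
}

// Time one query per dimension against N stored vectors.
const N = 5000;
for (const dims of [256, 512, 768, 1536, 3072]) {
  const data = randomUnitVectors(N, dims);
  const query = data.subarray(0, dims); // reuse the first stored vector as the query
  const start = performance.now();
  const hit = bruteForceSearch(query, data, dims);
  const ms = performance.now() - start;
  console.log(`${dims} dims: best=${hit}, ${ms.toFixed(2)} ms`);
}
```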
---

## When to Use Each

### 768 Dimensions (Recommended Default)

**Use when**:
- ✅ Building standard RAG systems
- ✅ General semantic search
- ✅ Cost-effectiveness matters
- ✅ Storage is a consideration

**Don't use when**:
- ❌ You need absolute maximum accuracy
- ❌ You want to keep an existing 1536-dim index (e.g. one built for OpenAI embeddings)

**Example**:
```typescript
import { GoogleGenAI } from '@google/genai';

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
const text = 'Document text to embed'; // placeholder input

const response = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: text,
  config: {
    taskType: 'RETRIEVAL_DOCUMENT',
    outputDimensionality: 768 // ← Recommended
  }
});
const vector = response.embeddings?.[0]?.values; // number[] with 768 values
```
---

### 3072 Dimensions (Maximum Accuracy)

**Use when**:
- ✅ Accuracy is critical (legal, medical, research)
- ✅ Budget allows 4x storage cost
- ✅ Query latency isn't a concern
- ✅ Small dataset (<10k vectors)

**Don't use when**:
- ❌ Cost-sensitive project
- ❌ Large dataset (>100k vectors)
- ❌ Real-time search required
- ❌ Your vector database cannot store 3072-dim vectors (check its limits; Cloudflare Vectorize has documented a 1536-dimension maximum per vector)

**Example**:
```typescript
// Assumes `ai` is a GoogleGenAI client from @google/genai and `text` is the document string
const response = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: text,
  config: {
    taskType: 'RETRIEVAL_DOCUMENT',
    outputDimensionality: 3072 // ← Maximum accuracy
  }
});
```
---

### 1536 Dimensions (OpenAI-Sized Indexes)

**Use when**:
- ✅ Migrating from OpenAI text-embedding-3-small and keeping the same index size
- ✅ Need compatibility with existing 1536-dim infrastructure
- ✅ Balancing accuracy and cost

**Note**: Matching the dimension count only keeps your index schema unchanged. Gemini and OpenAI vectors live in different embedding spaces, so all stored documents must be re-embedded with Gemini, and queries must be embedded with the same model and dimension.

**Example**:
```typescript
// Assumes `ai` is a GoogleGenAI client from @google/genai and `text` is the document string
const response = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: text,
  config: {
    taskType: 'RETRIEVAL_DOCUMENT',
    outputDimensionality: 1536 // ← Same size as OpenAI text-embedding-3-small
  }
});
```
---

### 512 or Lower (Storage-Constrained)

**Use when**:
- ✅ Extreme storage constraints
- ✅ Millions of vectors
- ✅ Acceptable to sacrifice some accuracy
- ✅ Ultra-fast queries required

**Example**:
```typescript
// Assumes `ai` is a GoogleGenAI client from @google/genai and `text` is the document string
const response = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: text,
  config: {
    taskType: 'RETRIEVAL_DOCUMENT',
    outputDimensionality: 512 // ← Compact
  }
});
```
---

## Migration Between Dimensions

**CRITICAL**: You cannot mix different dimensions in the same index. A Vectorize index's dimension count is fixed when it is created, and query embeddings must use the same `outputDimensionality` as the stored vectors.

### Option 1: Recreate Index

```bash
# Delete old index (search is unavailable until the new index is repopulated)
npx wrangler vectorize delete my-index

# Create new index with different dimensions
npx wrangler vectorize create my-index --dimensions 768 --metric cosine

# Re-generate all embeddings with new dimensions
# Re-insert all vectors
```

### Option 2: Create New Index

```bash
# Keep old index running
# Create new index
npx wrangler vectorize create my-index-768 --dimensions 768 --metric cosine

# Gradually migrate vectors
# Switch over when ready (point your Worker's Vectorize binding at my-index-768)
# Delete old index
npx wrangler vectorize delete my-index
```
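The "gradually migrate vectors" step can be structured as a batched re-embedding loop. A minimal sketch, assuming hypothetical `embedBatch` and `upsert` callbacks that you would back with your embedding client and the new index (for example, a Vectorize binding's `upsert`); the dimension check guards against writing mixed sizes into the new index.

```typescript
interface SourceDoc {
  id: string;
  text: string;
}

interface VectorRecord {
  id: string;
  values: number[];
}

/**
 * Re-embed documents in batches at `targetDims` and write them to the new index.
 * `embedBatch` and `upsert` are hypothetical callbacks supplied by the caller.
 * Returns the number of documents migrated.
 */
async function migrateToDimensions(
  docs: SourceDoc[],
  targetDims: number,
  embedBatch: (texts: string[], dims: number) => Promise<number[][]>,
  upsert: (records: VectorRecord[]) => Promise<void>,
  batchSize = 100,
): Promise<number> {
  let migrated = 0;
  for (let start = 0; start < docs.length; start += batchSize) {
    const batch = docs.slice(start, start + batchSize);
    const vectors = await embedBatch(batch.map((d) => d.text), targetDims);
    if (vectors.length !== batch.length) {
      throw new Error(`expected ${batch.length} embeddings, got ${vectors.length}`);
    }
    const records = batch.map((doc, i) => {
      if (vectors[i].length !== targetDims) {
        // Never write mixed dimensions into the new index.
        throw new Error(`doc ${doc.id}: expected ${targetDims} dims, got ${vectors[i].length}`);
      }
      return { id: doc.id, values: vectors[i] };
    });
    await upsert(records);
    migrated += records.length;
  }
  return migrated;
}
```

Writing by ID means a failed run can be restarted without duplicating vectors, provided your store's upsert overwrites existing IDs.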
---

## Testing Methodology

To test whether lower dimensions work for your use case:

```typescript
import { GoogleGenAI } from '@google/genai';

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Hypothetical helpers you implement against your own data and index.
declare function testRetrievalAccuracy(queries: string[], dim: number): Promise<number>;
declare function measureQueryLatency(dim: number): Promise<number>;

// 1. Generate test embeddings with different dimensions
const dims = [256, 512, 768, 1536, 3072];
const testText = 'A representative document from your corpus'; // placeholder input
const testEmbeddings = await Promise.all(
  dims.map(dim => ai.models.embedContent({
    model: 'gemini-embedding-001',
    contents: testText,
    config: { taskType: 'RETRIEVAL_DOCUMENT', outputDimensionality: dim }
  }))
);

// 2. Test retrieval accuracy
const queries = ['query1', 'query2', 'query3'];
for (const dim of dims) {
  const accuracy = await testRetrievalAccuracy(queries, dim);
  console.log(`${dim} dims: ${accuracy}% accuracy`);
}

// 3. Measure performance
for (const dim of dims) {
  const latency = await measureQueryLatency(dim);
  console.log(`${dim} dims: ${latency}ms latency`);
}
```
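One possible metric behind `testRetrievalAccuracy` is recall@k against a small labeled set: for each query, the share of known-relevant documents that appear in the top k results. A minimal sketch, assuming you have human-labeled relevant document IDs per query and ranked result IDs from a search run at a given dimension; the types and function name are hypothetical.

```typescript
interface LabeledQuery {
  query: string;
  relevantIds: string[]; // document IDs judged relevant for this query
}

/**
 * Mean recall@k over queries that have at least one relevant ID: for each query,
 * the fraction of its relevant IDs found in the first k retrieved IDs. Range [0, 1].
 */
function meanRecallAtK(
  labeled: LabeledQuery[],
  retrieved: Map<string, string[]>, // query -> ranked result IDs
  k: number,
): number {
  const scored = labeled.filter((q) => q.relevantIds.length > 0);
  if (scored.length === 0) return 0;
  let total = 0;
  for (const { query, relevantIds } of scored) {
    const top = new Set((retrieved.get(query) ?? []).slice(0, k));
    const relevant = new Set(relevantIds);
    let hits = 0;
    relevant.forEach((id) => {
      if (top.has(id)) hits++;
    });
    total += hits / relevant.size;
  }
  return total / scored.length;
}
```

Multiply the result by 100 to get the percentage printed in the accuracy loop, and run it once per dimension using results from that dimension's index.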
---

## Recommendations by Use Case

### RAG for Documentation
- **Recommended**: 768 dims
- **Reasoning**: Good accuracy, reasonable storage, fast queries

### E-commerce Search
- **Recommended**: 512-768 dims
- **Reasoning**: Speed matters, millions of products

### Legal Document Search
- **Recommended**: 3072 dims
- **Reasoning**: Accuracy is critical, smaller datasets

### Customer Support Chatbot
- **Recommended**: 768 dims
- **Reasoning**: Balance accuracy and response time

### Research Paper Search
- **Recommended**: 1536-3072 dims
- **Reasoning**: Nuanced understanding needed
---

## Summary

**Default Choice**: **768 dimensions**
- ~96% of 3072-dim retrieval accuracy (approximate)
- 75% less storage than 3072
- ~3x faster queries (per the approximate latency figures)
- Best balance for most applications

**Only use 3072 if**:
- You need every percentage point of accuracy
- You have budget for 4x storage
- You have a small dataset

**Consider lower (<768) if**:
- You have millions of vectors
- Storage cost is a major concern
- Ultra-fast queries are required
---

## Official Documentation

- **Matryoshka Representation Learning (paper)**: https://arxiv.org/abs/2205.13147
- **Gemini Embeddings**: https://ai.google.dev/gemini-api/docs/embeddings
- **MTEB Benchmark**: https://github.com/embeddings-benchmark/mteb