# Choosing the Right Embedding Dimensions
Guide to selecting optimal dimensions for your use case with Gemini embeddings.
---
## Quick Decision Table
| Your Priority | Recommended Dimensions | Why |
|--------------|----------------------|-----|
| **Balanced (default)** | **768** | Best accuracy-to-cost ratio |
| **Maximum accuracy** | 3072 | Gemini's full capability |
| **Storage-limited** | 512 or lower | Reduce storage/compute |
| **OpenAI compatibility** | 1536 | Match OpenAI dimensions |
---
## Available Dimensions
Gemini supports **any dimension from 128 to 3072** using Matryoshka Representation Learning.
### Common Choices
| Dimensions | Storage/Vector | Search Speed | Accuracy | Use Case |
|------------|---------------|--------------|----------|----------|
| **768** | ~3 KB | Fast | Good | **Recommended default** |
| 1536 | ~6 KB | Medium | Better | Match OpenAI, large datasets |
| 3072 | ~12 KB | Slower | Best | Maximum accuracy needed |
| 512 | ~2 KB | Very fast | Acceptable | Storage-constrained |
| 256 | ~1 KB | Ultra fast | Lower | Extreme constraints |
---
## Matryoshka Representation Learning
Gemini's flexible dimensions work because of **Matryoshka Representation Learning**: the model learns nested representations in which the leading dimensions carry the core semantic information and each additional block of dimensions adds progressively finer detail. Truncating an embedding to its first N dimensions therefore still yields a usable, smaller embedding.
```
Dimensions 1-256: Core semantic information
Dimensions 257-512: Additional nuance
Dimensions 513-768: Fine-grained details
Dimensions 769-1536: Subtle distinctions
Dimensions 1537-3072: Maximum precision
```
**Key Point**: Lower dimensions aren't "worse"; they're **compressed** versions of the full embedding.
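Because the representations are nested, you can also shrink a full embedding yourself by keeping only its leading values. A minimal sketch, assuming the embedding arrives as a plain `number[]`; note that a truncated vector is generally no longer unit-length, so re-normalize it before cosine comparisons:
```typescript
// Truncate a Matryoshka-style embedding to its leading `dims` values,
// then re-normalize to unit length so cosine similarity stays meaningful.
function truncateEmbedding(fullEmbedding: number[], dims: number): number[] {
  const truncated = fullEmbedding.slice(0, dims);
  const norm = Math.sqrt(truncated.reduce((sum, v) => sum + v * v, 0));
  return truncated.map(v => v / norm);
}

// Example: derive a compact 768-dim vector from a 3072-dim embedding.
// const compact = truncateEmbedding(fullEmbedding, 768);
```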
---
## Storage Impact
### Example: 100,000 Documents
| Dimensions | Storage Required | Monthly Cost (R2)* |
|------------|-----------------|-------------------|
| 256 | ~100 MB | ~$0.002 |
| 512 | ~200 MB | ~$0.003 |
| **768** | **~300 MB** | **~$0.005** |
| 1536 | ~600 MB | ~$0.009 |
| 3072 | ~1.2 GB | ~$0.018 |
\*Assuming 4 bytes per float32 and R2 storage at $0.015/GB/month. At this scale the absolute cost is negligible; what matters is the 4x spread, which scales linearly with vector count.
**For 1M vectors**:
- 768 dims: ~3 GB storage
- 3072 dims: ~12 GB storage (4x more expensive)
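These figures are straightforward arithmetic: raw bytes ≈ vectors × dimensions × 4 (float32), before any index overhead or metadata. A quick sketch:
```typescript
// Back-of-the-envelope raw vector storage: float32 values only,
// excluding index structures and metadata.
function storageBytes(vectorCount: number, dimensions: number): number {
  return vectorCount * dimensions * 4;
}

console.log(storageBytes(100_000, 768) / 1e6);    // ≈ 307 MB
console.log(storageBytes(1_000_000, 3072) / 1e9); // ≈ 12.3 GB
```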
---
## Accuracy Trade-offs
Based on MTEB benchmarks (approximate):
| Dimensions | Retrieval Accuracy | Relative to 3072 |
|------------|-------------------|------------------|
| 256 | ~85% | -15% |
| 512 | ~92% | -8% |
| **768** | **~96%** | **-4%** |
| 1536 | ~98% | -2% |
| 3072 | 100% (baseline) | 0% |
**Diminishing returns**: Going from 768 → 3072 dims only improves accuracy by ~4% while quadrupling storage.
---
## Query Performance
Search latency (approximate, 100k vectors):
| Dimensions | Query Latency | Throughput (QPS) |
|------------|--------------|------------------|
| 256 | ~10ms | ~1000 |
| 512 | ~15ms | ~700 |
| **768** | **~20ms** | **~500** |
| 1536 | ~35ms | ~300 |
| 3072 | ~60ms | ~170 |
**Note**: Actual performance depends on Vectorize implementation and hardware.
---
## When to Use Each
### 768 Dimensions (Recommended Default)
**Use when**:
- ✅ Building standard RAG systems
- ✅ General semantic search
- ✅ Cost-effectiveness matters
- ✅ Storage is a consideration
**Don't use when**:
- ❌ You need absolute maximum accuracy
- ❌ Migrating from OpenAI 1536-dim embeddings
**Example**:
```typescript
const embedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: text,
  config: {
    taskType: 'RETRIEVAL_DOCUMENT',
    outputDimensionality: 768 // ← Recommended
  }
});
```
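At query time, embed with the same dimensionality the index was built with; only the task type changes. A sketch of the matching query-side call, assuming the same `ai` client and a `query` string:
```typescript
const queryEmbedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: query,
  config: {
    taskType: 'RETRIEVAL_QUERY',  // queries use RETRIEVAL_QUERY, documents RETRIEVAL_DOCUMENT
    outputDimensionality: 768     // ← must match the index dimension
  }
});
```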
---
### 3072 Dimensions (Maximum Accuracy)
**Use when**:
- ✅ Accuracy is critical (legal, medical, research)
- ✅ Budget allows 4x storage cost
- ✅ Query latency isn't a concern
- ✅ Small dataset (<10k vectors)
**Don't use when**:
- ❌ Cost-sensitive project
- ❌ Large dataset (>100k vectors)
- ❌ Real-time search required
**Example**:
```typescript
const embedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: text,
  config: {
    taskType: 'RETRIEVAL_DOCUMENT',
    outputDimensionality: 3072 // ← Maximum accuracy
  }
});
```
---
### 1536 Dimensions (OpenAI Compatibility)
**Use when**:
- ✅ Migrating from OpenAI text-embedding-3-small
- ✅ Need compatibility with existing infrastructure
- ✅ Balancing accuracy and cost
**Note**: Matching dimensions keeps existing 1536-dim infrastructure compatible, but Gemini and OpenAI embeddings occupy different vector spaces, so every document must still be re-embedded with Gemini.
**Example**:
```typescript
const embedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: text,
  config: {
    taskType: 'RETRIEVAL_DOCUMENT',
    outputDimensionality: 1536 // ← Match OpenAI
  }
});
```
---
### 512 or Lower (Storage-Constrained)
**Use when**:
- ✅ Extreme storage constraints
- ✅ Millions of vectors
- ✅ Acceptable to sacrifice some accuracy
- ✅ Ultra-fast queries required
**Example**:
```typescript
const embedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: text,
  config: {
    taskType: 'RETRIEVAL_DOCUMENT',
    outputDimensionality: 512 // ← Compact
  }
});
```
---
## Migration Between Dimensions
**CRITICAL**: An index's dimension count is fixed at creation, so you cannot mix vectors of different dimensions in the same index.
### Option 1: Recreate Index
```bash
# Delete old index
npx wrangler vectorize delete my-index
# Create new index with different dimensions
npx wrangler vectorize create my-index --dimensions 768 --metric cosine
# Re-generate all embeddings with new dimensions
# Re-insert all vectors
```
### Option 2: Create New Index
```bash
# Keep old index running
# Create new index
npx wrangler vectorize create my-index-768 --dimensions 768 --metric cosine
# Gradually migrate vectors
# Switch over when ready
# Delete old index
```
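For either option, the re-embed and re-insert step looks the same. A hedged sketch of that loop in a Worker, assuming a `docs` array of `{ id, text }` records and a Vectorize binding named `VECTORIZE` (both names are illustrative):
```typescript
import { GoogleGenAI } from '@google/genai';

// Re-embed every document at the new dimensionality and upsert into the
// new index. `docs` and the VECTORIZE binding name are assumptions.
async function migrate(
  env: { VECTORIZE: VectorizeIndex; GEMINI_API_KEY: string },
  docs: { id: string; text: string }[]
) {
  const ai = new GoogleGenAI({ apiKey: env.GEMINI_API_KEY });
  for (let i = 0; i < docs.length; i += 100) { // batch to stay under request limits
    const batch = docs.slice(i, i + 100);
    const res = await ai.models.embedContent({
      model: 'gemini-embedding-001',
      contents: batch.map(d => d.text),
      config: { taskType: 'RETRIEVAL_DOCUMENT', outputDimensionality: 768 },
    });
    await env.VECTORIZE.upsert(
      batch.map((d, j) => ({ id: d.id, values: res.embeddings![j].values! }))
    );
  }
}
```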
---
## Testing Methodology
To test if lower dimensions work for your use case:
```typescript
// Assumes an initialized client: const ai = new GoogleGenAI({ apiKey });

// 1. Generate test embeddings at each candidate dimensionality
const dims = [256, 512, 768, 1536, 3072];
const testEmbeddings = await Promise.all(
  dims.map(dim => ai.models.embedContent({
    model: 'gemini-embedding-001',
    contents: testText,
    config: { outputDimensionality: dim }
  }))
);

// 2. Test retrieval accuracy against a labeled query set
// (testRetrievalAccuracy is a placeholder you implement; see below)
const queries = ['query1', 'query2', 'query3'];
for (const dim of dims) {
  const accuracy = await testRetrievalAccuracy(queries, dim);
  console.log(`${dim} dims: ${accuracy}% accuracy`);
}

// 3. Measure query latency per dimensionality
// (measureQueryLatency is likewise a placeholder that times index queries)
for (const dim of dims) {
  const latency = await measureQueryLatency(dim);
  console.log(`${dim} dims: ${latency}ms latency`);
}
```
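`testRetrievalAccuracy` and `measureQueryLatency` above are placeholders you supply. One way to implement the accuracy check, assuming a labeled set of (query, expected document id) pairs and one Vectorize index per dimensionality (a fuller signature than the placeholder call above; `measureQueryLatency` would simply time the same `index.query` calls):
```typescript
// Recall@k over a labeled query set: the fraction of queries whose
// expected document shows up in the top-k results. Assumes an `ai`
// client in scope, as in the snippet above.
async function testRetrievalAccuracy(
  labeled: { query: string; expectedId: string }[],
  index: VectorizeIndex,
  dim: number,
  k = 5
): Promise<number> {
  let hits = 0;
  for (const { query, expectedId } of labeled) {
    const emb = await ai.models.embedContent({
      model: 'gemini-embedding-001',
      contents: query,
      config: { taskType: 'RETRIEVAL_QUERY', outputDimensionality: dim },
    });
    const res = await index.query(emb.embeddings![0].values!, { topK: k });
    if (res.matches.some(m => m.id === expectedId)) hits++;
  }
  return (hits / labeled.length) * 100;
}
```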
---
## Recommendations by Use Case
### RAG for Documentation
- **Recommended**: 768 dims
- **Reasoning**: Good accuracy, reasonable storage, fast queries
### E-commerce Search
- **Recommended**: 512-768 dims
- **Reasoning**: Speed matters, millions of products
### Legal Document Search
- **Recommended**: 3072 dims
- **Reasoning**: Accuracy is critical, smaller datasets
### Customer Support Chatbot
- **Recommended**: 768 dims
- **Reasoning**: Balance accuracy and response time
### Research Paper Search
- **Recommended**: 1536-3072 dims
- **Reasoning**: Nuanced understanding needed
---
## Summary
**Default Choice**: **768 dimensions**
- 96% of 3072-dim accuracy
- 75% less storage
- 3x faster queries
- Best balance for most applications
**Only use 3072 if**:
- You need every percentage point of accuracy
- You have budget for 4x storage
- You have a small dataset
**Consider lower (<768) if**:
- You have millions of vectors
- Storage cost is a major concern
- Ultra-fast queries are required
---
## Official Documentation
- **Matryoshka Learning**: https://arxiv.org/abs/2205.13147
- **Gemini Embeddings**: https://ai.google.dev/gemini-api/docs/embeddings
- **MTEB Benchmark**: https://github.com/embeddings-benchmark/mteb