# Choosing the Right Embedding Dimensions

Guide to selecting optimal dimensions for your use case with Gemini embeddings.

---

## Quick Decision Table

| Your Priority | Recommended Dimensions | Why |
|---------------|------------------------|-----|
| **Balanced (default)** | **768** | Best accuracy-to-cost ratio |
| **Maximum accuracy** | 3072 | Gemini's full capability |
| **Storage-limited** | 512 or lower | Reduces storage and compute |
| **OpenAI compatibility** | 1536 | Matches OpenAI dimensions |

---
## Available Dimensions

Gemini supports **any dimension from 128 to 3072** using Matryoshka Representation Learning.

### Common Choices

| Dimensions | Storage/Vector | Search Speed | Accuracy | Use Case |
|------------|----------------|--------------|----------|----------|
| **768** | ~3 KB | Fast | Good | **Recommended default** |
| 1536 | ~6 KB | Medium | Better | Match OpenAI, large datasets |
| 3072 | ~12 KB | Slower | Best | Maximum accuracy needed |
| 512 | ~2 KB | Very fast | Acceptable | Storage-constrained |
| 256 | ~1 KB | Ultra fast | Lower | Extreme constraints |

---
## Matryoshka Representation Learning

Gemini's flexible dimensions work because of **Matryoshka Representation Learning**: the model learns nested representations, so the first N dimensions form a usable embedding on their own, and each additional block of dimensions adds progressively finer-grained information.

```
Dimensions 1-256:     Core semantic information
Dimensions 257-512:   Additional nuance
Dimensions 513-768:   Fine-grained details
Dimensions 769-1536:  Subtle distinctions
Dimensions 1537-3072: Maximum precision
```

**Key Point**: Lower dimensions aren't "worse" - they're **compressed** versions of the full embedding.
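Because of this nesting, you can also shorten a full-length embedding client-side by keeping only its first N values. A minimal sketch (assuming you already hold a full embedding as a `number[]`; a truncated prefix is generally no longer unit-length, so it should be re-normalized before cosine comparisons):

```typescript
// Sketch: shorten a full Matryoshka embedding client-side by keeping
// its first `dims` values, then re-normalizing to unit length so
// cosine similarity still behaves.
function truncateEmbedding(full: number[], dims: number): number[] {
  const prefix = full.slice(0, dims);
  const norm = Math.sqrt(prefix.reduce((sum, v) => sum + v * v, 0));
  return prefix.map((v) => v / norm);
}

// Example: a toy 4-dim "embedding" truncated to its first 2 dims.
console.log(truncateEmbedding([3, 4, 0.1, 0.1], 2)); // [0.6, 0.8]
```

In practice it is usually simpler to request the target size directly via `outputDimensionality`; truncation is mainly useful when you stored full vectors and later want a smaller working copy.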
---
## Storage Impact

### Example: 100,000 Documents

| Dimensions | Storage Required | Monthly Cost (R2)* |
|------------|------------------|--------------------|
| 256 | ~100 MB | $0.01 |
| 512 | ~200 MB | $0.02 |
| **768** | **~300 MB** | **$0.03** |
| 1536 | ~600 MB | $0.06 |
| 3072 | ~1.2 GB | $0.12 |

\*Assuming 4 bytes per float, R2 pricing $0.015/GB/month

**For 1M vectors**:
- 768 dims: ~3 GB storage
- 3072 dims: ~12 GB storage (4x more expensive)
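The table values come from simple arithmetic: raw storage is dimensions × vector count × bytes per float. A quick estimator (a sketch assuming float32 and ignoring any index overhead):

```typescript
// Sketch: raw vector storage, assuming float32 (4 bytes per value)
// and ignoring index overhead.
function storageBytes(dims: number, vectorCount: number): number {
  return dims * vectorCount * 4;
}

const mb = (b: number) => (b / 2 ** 20).toFixed(0);
const gb = (b: number) => (b / 2 ** 30).toFixed(1);

console.log(mb(storageBytes(768, 100_000)));    // "293" (~300 MB, as in the table)
console.log(gb(storageBytes(3072, 1_000_000))); // "11.4" (~12 GB for 1M vectors)
```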
---
## Accuracy Trade-offs

Based on MTEB benchmarks (approximate):

| Dimensions | Retrieval Accuracy | Relative to 3072 |
|------------|--------------------|------------------|
| 256 | ~85% | -15% |
| 512 | ~92% | -8% |
| **768** | **~96%** | **-4%** |
| 1536 | ~98% | -2% |
| 3072 | 100% (baseline) | 0% |

**Diminishing returns**: Going from 768 to 3072 dims only improves accuracy by ~4% while quadrupling storage.

---
## Query Performance

Search latency (approximate, 100k vectors):

| Dimensions | Query Latency | Throughput (QPS) |
|------------|---------------|------------------|
| 256 | ~10ms | ~1000 |
| 512 | ~15ms | ~700 |
| **768** | **~20ms** | **~500** |
| 1536 | ~35ms | ~300 |
| 3072 | ~60ms | ~170 |

**Note**: Actual performance depends on the Vectorize implementation and hardware.
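The scaling in the table follows from the math: a brute-force cosine comparison touches every dimension once, so per-comparison cost grows roughly linearly with dimension count. A sketch of that inner loop:

```typescript
// Sketch: cosine similarity is O(dims) per vector pair, which is why
// query latency grows roughly linearly with dimension count.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 0], [1, 0])); // 1 (identical direction)
console.log(cosineSimilarity([1, 0], [0, 1])); // 0 (orthogonal)
```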
---
## When to Use Each

### 768 Dimensions (Recommended Default)

**Use when**:
- ✅ Building standard RAG systems
- ✅ General semantic search
- ✅ Cost-effectiveness matters
- ✅ Storage is a consideration

**Don't use when**:
- ❌ You need absolute maximum accuracy
- ❌ Migrating from OpenAI 1536-dim embeddings

**Example**:
```typescript
const embedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: text,
  config: {
    taskType: 'RETRIEVAL_DOCUMENT',
    outputDimensionality: 768 // ← Recommended
  }
});
```

---
### 3072 Dimensions (Maximum Accuracy)

**Use when**:
- ✅ Accuracy is critical (legal, medical, research)
- ✅ Budget allows 4x storage cost
- ✅ Query latency isn't a concern
- ✅ Small dataset (<10k vectors)

**Don't use when**:
- ❌ Cost-sensitive project
- ❌ Large dataset (>100k vectors)
- ❌ Real-time search required

**Example**:
```typescript
const embedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: text,
  config: {
    taskType: 'RETRIEVAL_DOCUMENT',
    outputDimensionality: 3072 // ← Maximum accuracy
  }
});
```

---
### 1536 Dimensions (OpenAI Compatibility)

**Use when**:
- ✅ Migrating from OpenAI text-embedding-3-small
- ✅ Need compatibility with existing infrastructure
- ✅ Balancing accuracy and cost

**Example**:
```typescript
const embedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: text,
  config: {
    taskType: 'RETRIEVAL_DOCUMENT',
    outputDimensionality: 1536 // ← Match OpenAI
  }
});
```

---
### 512 or Lower (Storage-Constrained)

**Use when**:
- ✅ Extreme storage constraints
- ✅ Millions of vectors
- ✅ Acceptable to sacrifice some accuracy
- ✅ Ultra-fast queries required

**Example**:
```typescript
const embedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: text,
  config: {
    taskType: 'RETRIEVAL_DOCUMENT',
    outputDimensionality: 512 // ← Compact
  }
});
```

---
## Migration Between Dimensions

**CRITICAL**: You cannot mix different dimensions in the same index.

### Option 1: Recreate Index

```bash
# Delete old index
npx wrangler vectorize delete my-index

# Create new index with different dimensions
npx wrangler vectorize create my-index --dimensions 768 --metric cosine

# Re-generate all embeddings with new dimensions
# Re-insert all vectors
```

### Option 2: Create New Index

```bash
# Keep old index running
# Create new index
npx wrangler vectorize create my-index-768 --dimensions 768 --metric cosine

# Gradually migrate vectors
# Switch over when ready
# Delete old index
```
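Since mixed dimensions are rejected by the index, it is worth validating vector lengths before upserting during a migration. A hypothetical guard (the `PendingVector` shape and `assertDimensions` helper are illustrative, not part of any SDK; `indexDims` is whatever you passed to `--dimensions`):

```typescript
// Sketch: reject vectors whose length doesn't match the index's
// configured dimension count before they reach an upsert call.
interface PendingVector {
  id: string;
  values: number[];
}

function assertDimensions(vectors: PendingVector[], indexDims: number): void {
  for (const v of vectors) {
    if (v.values.length !== indexDims) {
      throw new Error(
        `Vector ${v.id} has ${v.values.length} dims; index expects ${indexDims}`
      );
    }
  }
}

// Passes silently when every vector matches the index.
assertDimensions([{ id: 'doc-1', values: [0.1, 0.2] }], 2);
```

Failing fast here is cheaper than discovering a mismatch after half the migration has been written.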
---
## Testing Methodology

To test whether lower dimensions work for your use case:

```typescript
// 1. Generate test embeddings with different dimensions
const dims = [256, 512, 768, 1536, 3072];
const testEmbeddings = await Promise.all(
  dims.map(dim => ai.models.embedContent({
    model: 'gemini-embedding-001',
    contents: testText,
    config: { outputDimensionality: dim }
  }))
);

// 2. Test retrieval accuracy (testRetrievalAccuracy is a placeholder
// for your own evaluation harness)
const queries = ['query1', 'query2', 'query3'];
for (const dim of dims) {
  const accuracy = await testRetrievalAccuracy(queries, dim);
  console.log(`${dim} dims: ${accuracy}% accuracy`);
}

// 3. Measure performance (measureQueryLatency is likewise a placeholder)
for (const dim of dims) {
  const latency = await measureQueryLatency(dim);
  console.log(`${dim} dims: ${latency}ms latency`);
}
```

---
## Recommendations by Use Case

### RAG for Documentation
- **Recommended**: 768 dims
- **Reasoning**: Good accuracy, reasonable storage, fast queries

### E-commerce Search
- **Recommended**: 512-768 dims
- **Reasoning**: Speed matters, millions of products

### Legal Document Search
- **Recommended**: 3072 dims
- **Reasoning**: Accuracy is critical, smaller datasets

### Customer Support Chatbot
- **Recommended**: 768 dims
- **Reasoning**: Balance accuracy and response time

### Research Paper Search
- **Recommended**: 1536-3072 dims
- **Reasoning**: Nuanced understanding needed

---
## Summary

**Default Choice**: **768 dimensions**
- ~96% of 3072-dim accuracy
- 75% less storage
- ~3x faster queries
- Best balance for most applications

**Only use 3072 if**:
- You need every percentage point of accuracy
- You have budget for 4x storage
- You have a small dataset

**Consider lower (<768) if**:
- You have millions of vectors
- Storage cost is a major concern
- Ultra-fast queries are required

---
## Official Documentation

- **Matryoshka Learning**: https://arxiv.org/abs/2205.13147
- **Gemini Embeddings**: https://ai.google.dev/gemini-api/docs/embeddings
- **MTEB Benchmark**: https://github.com/embeddings-benchmark/mteb