Choosing the Right Embedding Dimensions

Guide to selecting optimal dimensions for your use case with Gemini embeddings.


Quick Decision Table

| Your Priority | Recommended Dimensions | Why |
|---|---|---|
| Balanced (default) | 768 | Best accuracy-to-cost ratio |
| Maximum accuracy | 3072 | Gemini's full capability |
| Storage-limited | 512 or lower | Reduce storage/compute |
| OpenAI compatibility | 1536 | Match OpenAI dimensions |

Available Dimensions

Gemini supports any dimension from 128 to 3072 using Matryoshka Representation Learning.

Common Choices

| Dimensions | Storage per Vector | Search Speed | Accuracy | Use Case |
|---|---|---|---|---|
| 768 | ~3 KB | Fast | Good | Recommended default |
| 1536 | ~6 KB | Medium | Better | Match OpenAI, large datasets |
| 3072 | ~12 KB | Slower | Best | Maximum accuracy needed |
| 512 | ~2 KB | Very fast | Acceptable | Storage-constrained |
| 256 | ~1 KB | Ultra fast | Lower | Extreme constraints |

Matryoshka Representation Learning

Gemini's flexible dimensions work because of Matryoshka Representation Learning: the model learns nested representations, so the first N dimensions form a usable embedding on their own and each additional block of dimensions adds finer detail.

Dimensions 1-256:   Core semantic information
Dimensions 257-512: Additional nuance
Dimensions 513-768: Fine-grained details
Dimensions 769-1536: Subtle distinctions
Dimensions 1537-3072: Maximum precision

Key Point: Lower dimensions aren't "worse" - they're compressed versions of the full embedding.
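Because the nesting is built into training, truncating a full embedding to its first N values (and re-normalizing) yields a valid lower-dimensional embedding. A minimal sketch in plain JavaScript; truncateEmbedding is an illustrative helper, not part of any SDK:

// Keep the first N dimensions of a full embedding, then re-normalize to unit length
// so cosine similarity still behaves correctly.
function truncateEmbedding(values, dims) {
  const truncated = values.slice(0, dims);
  const norm = Math.sqrt(truncated.reduce((sum, v) => sum + v * v, 0));
  return truncated.map(v => v / norm);
}

// e.g. reduce a 3072-dim vector to its nested 768-dim representation:
// const compact = truncateEmbedding(fullEmbedding, 768);

In practice you rarely do this by hand: requesting outputDimensionality (as in the examples below) returns the reduced vector directly.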


Storage Impact

Example: 100,000 Documents

| Dimensions | Storage Required | Monthly Cost (R2)* |
|---|---|---|
| 256 | ~100 MB | <$0.01 |
| 512 | ~200 MB | <$0.01 |
| 768 | ~300 MB | <$0.01 |
| 1536 | ~600 MB | ~$0.01 |
| 3072 | ~1.2 GB | ~$0.02 |

*Assuming 4 bytes per float, R2 pricing $0.015/GB/month

For 1M vectors:

  • 768 dims: ~3 GB storage
  • 3072 dims: ~12 GB storage (4x more expensive)
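These figures follow directly from vectors × dimensions × 4 bytes (float32). A quick sketch of the arithmetic; the function name is illustrative:

// Raw storage for float32 vectors: count × dimensions × 4 bytes.
function storageBytes(numVectors, dims) {
  return numVectors * dims * 4;
}

console.log(storageBytes(100_000, 768) / 1e6);    // ≈ 307 (MB) for 100k docs at 768 dims
console.log(storageBytes(1_000_000, 3072) / 1e9); // ≈ 12.3 (GB) for 1M docs at 3072 dims

Real-world usage will be somewhat higher once metadata and index overhead are included.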

Accuracy Trade-offs

Based on MTEB benchmarks (approximate):

| Dimensions | Retrieval Accuracy | Relative to 3072 |
|---|---|---|
| 256 | ~85% | -15% |
| 512 | ~92% | -8% |
| 768 | ~96% | -4% |
| 1536 | ~98% | -2% |
| 3072 | 100% (baseline) | 0% |

Diminishing returns: Going from 768 → 3072 dims only improves accuracy by ~4% while quadrupling storage.


Query Performance

Search latency (approximate, 100k vectors):

| Dimensions | Query Latency | Throughput (QPS) |
|---|---|---|
| 256 | ~10ms | ~1000 |
| 512 | ~15ms | ~700 |
| 768 | ~20ms | ~500 |
| 1536 | ~35ms | ~300 |
| 3072 | ~60ms | ~170 |

Note: Actual performance depends on Vectorize implementation and hardware.


When to Use Each

768 Dimensions (Recommended Default)

Use when:

  • Building standard RAG systems
  • General semantic search
  • Cost-effectiveness matters
  • Storage is a consideration

Don't use when:

  • You need absolute maximum accuracy
  • Migrating from OpenAI 1536-dim embeddings

Example:

const embedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: text,
  config: {
    taskType: 'RETRIEVAL_DOCUMENT',
    outputDimensionality: 768 // ← Recommended
  }
});
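
Queries against a 768-dim index must request the same dimensionality. A companion sketch, assuming the same client; userQuery is a placeholder variable, and RETRIEVAL_QUERY is the task type Gemini documents for search queries:

const queryEmbedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: userQuery,
  config: {
    taskType: 'RETRIEVAL_QUERY',
    outputDimensionality: 768 // must match the document embeddings
  }
});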

3072 Dimensions (Maximum Accuracy)

Use when:

  • Accuracy is critical (legal, medical, research)
  • Budget allows 4x storage cost
  • Query latency isn't a concern
  • Small dataset (<10k vectors)

Don't use when:

  • Cost-sensitive project
  • Large dataset (>100k vectors)
  • Real-time search required

Example:

const embedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: text,
  config: {
    taskType: 'RETRIEVAL_DOCUMENT',
    outputDimensionality: 3072 // ← Maximum accuracy
  }
});

1536 Dimensions (OpenAI Compatibility)

Use when:

  • Migrating from OpenAI text-embedding-3-small (the index dimensions stay the same, but all vectors must still be re-generated with Gemini)
  • Need compatibility with existing 1536-dim infrastructure
  • Balancing accuracy and cost

Example:

const embedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: text,
  config: {
    taskType: 'RETRIEVAL_DOCUMENT',
    outputDimensionality: 1536 // ← Match OpenAI
  }
});

512 or Lower (Storage-Constrained)

Use when:

  • Extreme storage constraints
  • Millions of vectors
  • Acceptable to sacrifice some accuracy
  • Ultra-fast queries required

Example:

const embedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: text,
  config: {
    taskType: 'RETRIEVAL_DOCUMENT',
    outputDimensionality: 512 // ← Compact
  }
});

Migration Between Dimensions

CRITICAL: You cannot mix different dimensions in the same index.

Option 1: Recreate Index

# Delete old index
npx wrangler vectorize delete my-index

# Create new index with different dimensions
npx wrangler vectorize create my-index --dimensions 768 --metric cosine

# Re-generate all embeddings with new dimensions
# Re-insert all vectors

Option 2: Create New Index

# Keep old index running
# Create new index
npx wrangler vectorize create my-index-768 --dimensions 768 --metric cosine

# Gradually migrate vectors
# Switch over when ready
# Delete old index
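
A minimal sketch of the "gradually migrate vectors" step, assuming a Cloudflare Worker where the new index is bound as NEW_INDEX (a hypothetical binding name), docs come from your own source of truth, and the response shape follows @google/genai. Vectors must be re-embedded at the new dimensionality, not copied across:

// Re-embed source documents at the new dimensionality and upsert into the new index.
async function migrateBatch(env, ai, docs) {
  const vectors = [];
  for (const doc of docs) {
    const result = await ai.models.embedContent({
      model: 'gemini-embedding-001',
      contents: doc.text,
      config: { taskType: 'RETRIEVAL_DOCUMENT', outputDimensionality: 768 }
    });
    vectors.push({ id: doc.id, values: result.embeddings[0].values, metadata: doc.metadata });
  }
  await env.NEW_INDEX.upsert(vectors); // the old index keeps serving queries until cutover
}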

Testing Methodology

To test if lower dimensions work for your use case:

// 1. Generate test embeddings at each candidate dimensionality
const dims = [256, 512, 768, 1536, 3072];
const testEmbeddings = await Promise.all(
  dims.map(dim => ai.models.embedContent({
    model: 'gemini-embedding-001',
    contents: testText,
    config: { outputDimensionality: dim }
  }))
);

// 2. Test retrieval accuracy against a labelled query set.
// testRetrievalAccuracy is a helper you implement for your own data;
// remember that each dimensionality needs its own Vectorize index.
const queries = ['query1', 'query2', 'query3'];
for (const dim of dims) {
  const accuracy = await testRetrievalAccuracy(queries, dim);
  console.log(`${dim} dims: ${accuracy}% accuracy`);
}

// 3. Measure query performance at each dimensionality.
// measureQueryLatency is another user-provided helper (see the sketch below).
for (const dim of dims) {
  const latency = await measureQueryLatency(dim);
  console.log(`${dim} dims: ${latency}ms latency`);
}
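
One possible shape for measureQueryLatency, assuming a Cloudflare Worker where each dimensionality has its own test index bound as TEST_INDEX_<dims> (hypothetical binding names) and queryVectors maps each dimensionality to pre-computed query embeddings:

// Hypothetical helper matching the call above; `env` and `queryVectors` are assumed in scope.
async function measureQueryLatency(dim) {
  const index = env[`TEST_INDEX_${dim}`];        // one Vectorize index per dimensionality
  const vectors = queryVectors[dim];             // pre-computed query embeddings at this dimensionality
  const start = Date.now();
  for (const values of vectors) {
    await index.query(values, { topK: 5 });      // nearest-neighbour search against the test index
  }
  return (Date.now() - start) / vectors.length;  // average milliseconds per query
}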

Recommendations by Use Case

RAG for Documentation

  • Recommended: 768 dims
  • Reasoning: Good accuracy, reasonable storage, fast queries

E-commerce Product Search

  • Recommended: 512-768 dims
  • Reasoning: Speed matters, millions of products

Legal/Medical Document Search

  • Recommended: 3072 dims
  • Reasoning: Accuracy is critical, smaller datasets

Customer Support Chatbot

  • Recommended: 768 dims
  • Reasoning: Balance accuracy and response time

Research Paper Analysis

  • Recommended: 1536-3072 dims
  • Reasoning: Nuanced understanding needed

Summary

Default Choice: 768 dimensions

  • 96% of 3072-dim accuracy
  • 75% less storage
  • 3x faster queries
  • Best balance for most applications

Only use 3072 if:

  • You need every percentage point of accuracy
  • You have budget for 4x storage
  • You have a small dataset

Consider lower (<768) if:

  • You have millions of vectors
  • Storage cost is a major concern
  • Ultra-fast queries are required

Official Documentation