Choosing the Right Embedding Dimensions

Guide to selecting optimal dimensions for your use case with Gemini embeddings.


Quick Decision Table

| Your Priority | Recommended Dimensions | Why |
|---|---|---|
| Balanced (default) | 768 | Best accuracy-to-cost ratio |
| Maximum accuracy | 3072 | Gemini's full capability |
| Storage-limited | 512 or lower | Reduce storage/compute |
| OpenAI compatibility | 1536 | Match OpenAI dimensions |

Available Dimensions

Gemini supports any dimension from 128 to 3072 using Matryoshka Representation Learning.

Common Choices

| Dimensions | Storage per Vector | Search Speed | Accuracy | Use Case |
|---|---|---|---|---|
| 768 | ~3 KB | Fast | Good | Recommended default |
| 1536 | ~6 KB | Medium | Better | Match OpenAI, large datasets |
| 3072 | ~12 KB | Slower | Best | Maximum accuracy needed |
| 512 | ~2 KB | Very fast | Acceptable | Storage-constrained |
| 256 | ~1 KB | Ultra fast | Lower | Extreme constraints |

Matryoshka Representation Learning

Gemini's flexible dimensions work because of Matryoshka Representation Learning: the model learns nested representations, so the first N dimensions form a usable embedding on their own and each additional block of dimensions adds finer detail.

Dimensions 1-256:   Core semantic information
Dimensions 257-512: Additional nuance
Dimensions 513-768: Fine-grained details
Dimensions 769-1536: Subtle distinctions
Dimensions 1537-3072: Maximum precision

Key Point: Lower dimensions aren't "worse" - they're compressed versions of the full embedding.
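Because the nesting is built into training, truncating a full embedding to its first N values (and re-normalizing) yields a valid lower-dimensional embedding. A minimal sketch in plain JavaScript; truncateEmbedding is an illustrative helper, not part of any SDK:

// Keep the first N dimensions of a full embedding, then re-normalize to unit length
// so cosine similarity still behaves correctly.
function truncateEmbedding(values, dims) {
  const truncated = values.slice(0, dims);
  const norm = Math.sqrt(truncated.reduce((sum, v) => sum + v * v, 0));
  return truncated.map(v => v / norm);
}

// e.g. reduce a 3072-dim vector to its nested 768-dim representation:
// const compact = truncateEmbedding(fullEmbedding, 768);

In practice you rarely do this by hand: requesting outputDimensionality (as in the examples below) returns the reduced vector directly.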


Storage Impact

Example: 100,000 Documents

| Dimensions | Storage Required | Monthly Cost (R2)* |
|---|---|---|
| 256 | ~100 MB | <$0.01 |
| 512 | ~200 MB | <$0.01 |
| 768 | ~300 MB | <$0.01 |
| 1536 | ~600 MB | ~$0.01 |
| 3072 | ~1.2 GB | ~$0.02 |

*Assuming 4 bytes per float, R2 pricing $0.015/GB/month

For 1M vectors:

  • 768 dims: ~3 GB storage
  • 3072 dims: ~12 GB storage (4x more expensive)
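These figures follow directly from vectors × dimensions × 4 bytes (float32). A quick sketch of the arithmetic; the function name is illustrative:

// Raw storage for float32 vectors: count × dimensions × 4 bytes.
function storageBytes(numVectors, dims) {
  return numVectors * dims * 4;
}

console.log(storageBytes(100_000, 768) / 1e6);    // ≈ 307 (MB) for 100k docs at 768 dims
console.log(storageBytes(1_000_000, 3072) / 1e9); // ≈ 12.3 (GB) for 1M docs at 3072 dims

Real-world usage will be somewhat higher once metadata and index overhead are included.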

Accuracy Trade-offs

Based on MTEB benchmarks (approximate):

| Dimensions | Retrieval Accuracy | Relative to 3072 |
|---|---|---|
| 256 | ~85% | -15% |
| 512 | ~92% | -8% |
| 768 | ~96% | -4% |
| 1536 | ~98% | -2% |
| 3072 | 100% (baseline) | 0% |

Diminishing returns: Going from 768 → 3072 dims only improves accuracy by ~4% while quadrupling storage.


Query Performance

Search latency (approximate, 100k vectors):

| Dimensions | Query Latency | Throughput (QPS) |
|---|---|---|
| 256 | ~10ms | ~1000 |
| 512 | ~15ms | ~700 |
| 768 | ~20ms | ~500 |
| 1536 | ~35ms | ~300 |
| 3072 | ~60ms | ~170 |

Note: Actual performance depends on Vectorize implementation and hardware.


When to Use Each

768 Dimensions (Recommended Default)

Use when:

  • Building standard RAG systems
  • General semantic search
  • Cost-effectiveness matters
  • Storage is a consideration

Don't use when:

  • You need absolute maximum accuracy
  • Migrating from OpenAI 1536-dim embeddings

Example:

const embedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: text,
  config: {
    taskType: 'RETRIEVAL_DOCUMENT',
    outputDimensionality: 768 // ← Recommended
  }
});
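
Queries against a 768-dim index must request the same dimensionality. A companion sketch, assuming the same client; userQuery is a placeholder variable, and RETRIEVAL_QUERY is the task type Gemini documents for search queries:

const queryEmbedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: userQuery,
  config: {
    taskType: 'RETRIEVAL_QUERY',
    outputDimensionality: 768 // must match the document embeddings
  }
});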

3072 Dimensions (Maximum Accuracy)

Use when:

  • Accuracy is critical (legal, medical, research)
  • Budget allows 4x storage cost
  • Query latency isn't a concern
  • Small dataset (<10k vectors)

Don't use when:

  • Cost-sensitive project
  • Large dataset (>100k vectors)
  • Real-time search required

Example:

const embedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: text,
  config: {
    taskType: 'RETRIEVAL_DOCUMENT',
    outputDimensionality: 3072 // ← Maximum accuracy
  }
});

1536 Dimensions (OpenAI Compatibility)

Use when:

  • Migrating from OpenAI text-embedding-3-small (the index dimensions stay the same, but all vectors must still be re-generated with Gemini)
  • Need compatibility with existing 1536-dim infrastructure
  • Balancing accuracy and cost

Example:

const embedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: text,
  config: {
    taskType: 'RETRIEVAL_DOCUMENT',
    outputDimensionality: 1536 // ← Match OpenAI
  }
});

512 or Lower (Storage-Constrained)

Use when:

  • Extreme storage constraints
  • Millions of vectors
  • Acceptable to sacrifice some accuracy
  • Ultra-fast queries required

Example:

const embedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: text,
  config: {
    taskType: 'RETRIEVAL_DOCUMENT',
    outputDimensionality: 512 // ← Compact
  }
});

Migration Between Dimensions

CRITICAL: You cannot mix different dimensions in the same index.

Option 1: Recreate Index

# Delete old index
npx wrangler vectorize delete my-index

# Create new index with different dimensions
npx wrangler vectorize create my-index --dimensions 768 --metric cosine

# Re-generate all embeddings with new dimensions
# Re-insert all vectors

Option 2: Create New Index

# Keep old index running
# Create new index
npx wrangler vectorize create my-index-768 --dimensions 768 --metric cosine

# Gradually migrate vectors
# Switch over when ready
# Delete old index
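
A minimal sketch of the "gradually migrate vectors" step, assuming a Cloudflare Worker where the new index is bound as NEW_INDEX (a hypothetical binding name), docs come from your own source of truth, and the response shape follows @google/genai. Vectors must be re-embedded at the new dimensionality, not copied across:

// Re-embed source documents at the new dimensionality and upsert into the new index.
async function migrateBatch(env, ai, docs) {
  const vectors = [];
  for (const doc of docs) {
    const result = await ai.models.embedContent({
      model: 'gemini-embedding-001',
      contents: doc.text,
      config: { taskType: 'RETRIEVAL_DOCUMENT', outputDimensionality: 768 }
    });
    vectors.push({ id: doc.id, values: result.embeddings[0].values, metadata: doc.metadata });
  }
  await env.NEW_INDEX.upsert(vectors); // the old index keeps serving queries until cutover
}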

Testing Methodology

To test if lower dimensions work for your use case:

// 1. Generate test embeddings at each candidate dimensionality
const dims = [256, 512, 768, 1536, 3072];
const testEmbeddings = await Promise.all(
  dims.map(dim => ai.models.embedContent({
    model: 'gemini-embedding-001',
    contents: testText,
    config: { outputDimensionality: dim }
  }))
);

// 2. Test retrieval accuracy against a labelled query set.
// testRetrievalAccuracy is a helper you implement for your own data;
// remember that each dimensionality needs its own Vectorize index.
const queries = ['query1', 'query2', 'query3'];
for (const dim of dims) {
  const accuracy = await testRetrievalAccuracy(queries, dim);
  console.log(`${dim} dims: ${accuracy}% accuracy`);
}

// 3. Measure query performance at each dimensionality.
// measureQueryLatency is another user-provided helper (see the sketch below).
for (const dim of dims) {
  const latency = await measureQueryLatency(dim);
  console.log(`${dim} dims: ${latency}ms latency`);
}
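
One possible shape for measureQueryLatency, assuming a Cloudflare Worker where each dimensionality has its own test index bound as TEST_INDEX_<dims> (hypothetical binding names) and queryVectors maps each dimensionality to pre-computed query embeddings:

// Hypothetical helper matching the call above; `env` and `queryVectors` are assumed in scope.
async function measureQueryLatency(dim) {
  const index = env[`TEST_INDEX_${dim}`];        // one Vectorize index per dimensionality
  const vectors = queryVectors[dim];             // pre-computed query embeddings at this dimensionality
  const start = Date.now();
  for (const values of vectors) {
    await index.query(values, { topK: 5 });      // nearest-neighbour search against the test index
  }
  return (Date.now() - start) / vectors.length;  // average milliseconds per query
}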

Recommendations by Use Case

RAG for Documentation

  • Recommended: 768 dims
  • Reasoning: Good accuracy, reasonable storage, fast queries

E-commerce Product Search

  • Recommended: 512-768 dims
  • Reasoning: Speed matters, millions of products

Legal/Medical Document Search

  • Recommended: 3072 dims
  • Reasoning: Accuracy is critical, smaller datasets

Customer Support Chatbot

  • Recommended: 768 dims
  • Reasoning: Balance accuracy and response time

Research Paper Analysis

  • Recommended: 1536-3072 dims
  • Reasoning: Nuanced understanding needed

Summary

Default Choice: 768 dimensions

  • 96% of 3072-dim accuracy
  • 75% less storage
  • 3x faster queries
  • Best balance for most applications

Only use 3072 if:

  • You need every percentage point of accuracy
  • You have budget for 4x storage
  • You have a small dataset

Consider lower (<768) if:

  • You have millions of vectors
  • Storage cost is a major concern
  • Ultra-fast queries are required

Official Documentation