
Embedding Model Comparison

Comparison of Google Gemini, OpenAI, and Cloudflare Workers AI embedding models to help you choose the right one for your use case.


Quick Comparison Table

| Feature | Gemini (gemini-embedding-001) | OpenAI (text-embedding-3-small) | OpenAI (text-embedding-3-large) | Workers AI (bge-base-en-v1.5) |
|---|---|---|---|---|
| Dimensions | 128-3072 (flexible) | 1536 (fixed) | 3072 (fixed) | 768 (fixed) |
| Default Dims | 3072 | 1536 | 3072 | 768 |
| Context Window | 2,048 tokens | 8,191 tokens | 8,191 tokens | 512 tokens |
| Cost (per 1M tokens) | Free tier, then $0.025 | $0.020 | $0.130 | Free on Cloudflare |
| Rate Limit (Free) | 100 RPM, 30k TPM | 3,000 RPM | 3,000 RPM | Unlimited |
| Task Types | 8 types | None | None | None |
| Matryoshka | Yes | Yes (shortening) | Yes (shortening) | No |
| Best For | RAG, semantic search | General purpose | High accuracy needed | Edge computing, Cloudflare stack |
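The per-1M-token prices above translate directly into a cost estimate for a given workload. A minimal sketch (`estimateCost` is a hypothetical helper, with prices hard-coded from the table, not pulled from any API):

```javascript
// Hypothetical helper: estimates embedding cost in USD for a token count,
// using the per-1M-token prices from the comparison table above.
const PRICE_PER_MILLION_TOKENS = {
  'gemini-embedding-001': 0.025,     // paid tier; free below rate limits
  'text-embedding-3-small': 0.020,
  'text-embedding-3-large': 0.130,
  'bge-base-en-v1.5': 0.0,           // free on Cloudflare Workers
};

function estimateCost(model, totalTokens) {
  const price = PRICE_PER_MILLION_TOKENS[model];
  if (price === undefined) throw new Error(`Unknown model: ${model}`);
  return (totalTokens / 1_000_000) * price;
}

// Embedding 10M tokens:
console.log(estimateCost('text-embedding-3-small', 10_000_000)); // ≈ $0.20
console.log(estimateCost('text-embedding-3-large', 10_000_000)); // ≈ $1.30
```

At 10M tokens the 6.5x price gap between the two OpenAI models is already a dollar-scale difference; at RAG-corpus scale it compounds quickly.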

Detailed Comparison

1. Google Gemini (gemini-embedding-001)

Strengths:

  • Flexible dimensions (128-3072) using Matryoshka Representation Learning
  • 8 task types for optimization (RETRIEVAL_QUERY, RETRIEVAL_DOCUMENT, etc.)
  • Free tier with generous limits
  • Same API as Gemini text generation (unified ecosystem)

Weaknesses:

  • Smaller context window (2,048 tokens vs OpenAI's 8,191)
  • Newer model (less community knowledge)

Recommended For:

  • RAG systems (optimized task types)
  • Projects already using Gemini API
  • Budget-conscious projects (free tier)

Pricing:

  • Free: 100 RPM, 30k TPM, 1k RPD
  • Paid: $0.025 per 1M tokens (Tier 1+)
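The eight task types are passed as request config. A hedged sketch of the request shape — `buildEmbedRequest` is a hypothetical helper (not part of any SDK) that validates the task type before the call; the task-type names reflect Gemini's documented embedding task types:

```javascript
// Hypothetical helper: validates a Gemini embedding task type and builds
// the argument object for an embedContent-style call.
const GEMINI_TASK_TYPES = [
  'RETRIEVAL_QUERY', 'RETRIEVAL_DOCUMENT', 'SEMANTIC_SIMILARITY',
  'CLASSIFICATION', 'CLUSTERING', 'QUESTION_ANSWERING',
  'FACT_VERIFICATION', 'CODE_RETRIEVAL_QUERY',
];

function buildEmbedRequest(text, taskType, outputDimensionality = 768) {
  if (!GEMINI_TASK_TYPES.includes(taskType)) {
    throw new Error(`Unknown task type: ${taskType}`);
  }
  return {
    model: 'gemini-embedding-001',
    contents: text,
    config: { taskType, outputDimensionality },
  };
}

// Usage (assumes an initialized @google/genai client `ai`):
// const response = await ai.models.embedContent(
//   buildEmbedRequest('user query', 'RETRIEVAL_QUERY'));
```

Using `RETRIEVAL_QUERY` for queries and `RETRIEVAL_DOCUMENT` for the indexed documents is what the RAG optimization refers to: the two sides are embedded asymmetrically.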

2. OpenAI text-embedding-3-small

Strengths:

  • Larger context window (8,191 tokens)
  • Well-documented and widely used
  • Good balance of cost and performance
  • Can shorten dimensions (Matryoshka)

Weaknesses:

  • Fixed 1536 dimensions (unless shortened)
  • No task type optimization
  • Costs from day one (no free tier for embeddings)

Recommended For:

  • General-purpose semantic search
  • Projects with long documents (>2k tokens)
  • OpenAI ecosystem integration

Pricing:

  • $0.020 per 1M tokens

3. OpenAI text-embedding-3-large

Strengths:

  • Highest accuracy of OpenAI models
  • 3072 dimensions (same as Gemini default)
  • Large context window (8,191 tokens)

Weaknesses:

  • Most expensive ($0.130 per 1M tokens)
  • Fixed dimensions
  • Overkill for most use cases

Recommended For:

  • Mission-critical applications requiring maximum accuracy
  • Well-funded projects

Pricing:

  • $0.130 per 1M tokens (6.5x more expensive than text-embedding-3-small)

4. Cloudflare Workers AI (bge-base-en-v1.5)

Strengths:

  • Free on Cloudflare Workers
  • Fast (edge inference)
  • Good for English text
  • Simple integration with Vectorize

Weaknesses:

  • Small context window (512 tokens)
  • Fixed 768 dimensions
  • No task type optimization
  • English-only (limited multilingual support)

Recommended For:

  • Cloudflare-first stacks
  • Cost-sensitive projects
  • Short documents (<512 tokens)
  • Edge inference requirements

Pricing:

  • Free (included with Cloudflare Workers)

Example:

```javascript
const response = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
  text: 'Your text here'
});
// Response has the shape { shape, data }, where data is number[][]
// (one 768-dimension vector per input text).
```
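The 512-token limit means longer documents must be chunked before embedding. A rough sketch using whitespace words as a proxy for tokens — the ~1.3 tokens-per-word ratio is an assumption, and a real pipeline should count with the model's actual tokenizer:

```javascript
// Rough chunker for the 512-token BGE limit. Splits on whitespace and
// caps chunks at ~380 words (assumption: ~1.3 tokens/word keeps each
// chunk safely under 512 tokens).
function chunkText(text, maxWords = 380) {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  for (let i = 0; i < words.length; i += maxWords) {
    chunks.push(words.slice(i, i + maxWords).join(' '));
  }
  return chunks;
}

const chunks = chunkText('word '.repeat(1000));
console.log(chunks.length); // 3 (1000 words -> 380 + 380 + 240)
```

Each chunk is then embedded separately and stored as its own vector (e.g. in Vectorize), with metadata linking it back to the source document.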

When to Use Which

Use Gemini Embeddings When:

  • Building RAG systems (task type optimization)
  • Need flexible dimensions (save storage/compute)
  • Already using Gemini API
  • Want free tier for development

Use OpenAI text-embedding-3-small When:

  • Documents > 2,048 tokens
  • Using OpenAI for generation
  • Need proven, well-documented solution
  • General-purpose semantic search

Use OpenAI text-embedding-3-large When:

  • Maximum accuracy required
  • Budget allows ($0.130 per 1M tokens)
  • Mission-critical applications

Use Workers AI (BGE) When:

  • Building on Cloudflare
  • Short documents (<512 tokens)
  • Cost is primary concern (free)
  • English-only content
  • Need edge inference

Dimension Recommendations

| Use Case | Gemini | OpenAI Small | OpenAI Large | Workers AI |
|---|---|---|---|---|
| General RAG | 768 | 1536 | 3072 | 768 |
| Storage-limited | 128-512 | 512 (shortened) | 1024 (shortened) | 768 (fixed) |
| Maximum accuracy | 3072 | 1536 (fixed) | 3072 | 768 (fixed) |
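Matryoshka-style shortening (supported by Gemini and both OpenAI models) works by keeping only the first N dimensions and re-normalizing the result to unit length. A minimal sketch of that operation:

```javascript
// Truncate a Matryoshka-trained embedding to `dims` dimensions and
// re-normalize to unit length so cosine / dot-product scores stay valid.
function shortenEmbedding(embedding, dims) {
  const truncated = embedding.slice(0, dims);
  const norm = Math.hypot(...truncated);
  return truncated.map((v) => v / norm);
}

const full = [3, 4, 0, 0];                   // toy 4-dim "embedding"
const short2 = shortenEmbedding(full, 2);
console.log(short2);                          // [0.6, 0.8] — unit length
```

This only produces meaningful vectors for models trained with Matryoshka Representation Learning; truncating a fixed-dimension model like bge-base-en-v1.5 this way would discard information arbitrarily.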

Migration Guide

From OpenAI to Gemini

```javascript
// Before (OpenAI)
const response = await openai.embeddings.create({
  model: 'text-embedding-3-small',
  input: 'Your text here'
});
const embedding = response.data[0].embedding; // 1536 dims

// After (Gemini, @google/genai SDK)
const response = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: 'Your text here',
  config: {
    taskType: 'SEMANTIC_SIMILARITY',
    outputDimensionality: 768 // or 1536 to match OpenAI
  }
});
const embedding = response.embeddings[0].values; // 768 dims
```

CRITICAL: If migrating, you must regenerate all embeddings. Embeddings from different models are not comparable.
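"Not comparable" refers to similarity scores: a vector is only meaningful relative to other vectors from the same model at the same dimensionality. A minimal cosine-similarity helper for comparing vectors within one model's space:

```javascript
// Cosine similarity between two vectors from the SAME embedding model.
// Mixing vectors across models (or dimension settings) yields garbage scores,
// which is why a migration requires re-embedding the whole corpus.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error('Dimension mismatch');
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 0], [1, 0])); // 1 (identical direction)
console.log(cosineSimilarity([1, 0], [0, 1])); // 0 (orthogonal)
```

The dimension-mismatch check doubles as a cheap migration guard: a 1536-dim OpenAI vector meeting a 768-dim Gemini vector fails fast instead of silently producing a wrong score.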


Performance Benchmarks

Based on MTEB (Massive Text Embedding Benchmark):

| Model | Retrieval Score | Clustering Score | Overall Score |
|---|---|---|---|
| OpenAI text-embedding-3-large | 64.6 | 49.0 | 54.9 |
| OpenAI text-embedding-3-small | 62.3 | 49.0 | 54.0 |
| Gemini gemini-embedding-001 | ~60.0* | ~47.0* | ~52.0* |
| Workers AI bge-base-en-v1.5 | 53.2 | 42.0 | 48.0 |

*Estimated based on available benchmarks

Source: https://github.com/embeddings-benchmark/mteb


Summary

Best Overall: Gemini gemini-embedding-001

  • Flexible dimensions
  • Task type optimization
  • Free tier
  • Good performance

Best for Accuracy: OpenAI text-embedding-3-large

  • Highest MTEB scores
  • Large context window
  • Most expensive

Best for Budget: Cloudflare Workers AI (BGE)

  • Completely free
  • Edge inference
  • Limited context window

Best for Long Documents: OpenAI models

  • 8,191 token context
  • vs 2,048 (Gemini) or 512 (Workers AI)

Official Documentation