Embedding Model Comparison
Comparison of Google Gemini, OpenAI, and Cloudflare Workers AI embedding models to help you choose the right one for your use case.
Quick Comparison Table
| Feature | Gemini (gemini-embedding-001) | OpenAI (text-embedding-3-small) | OpenAI (text-embedding-3-large) | Workers AI (bge-base-en-v1.5) |
|---|---|---|---|---|
| Dimensions | 128-3072 (flexible) | 1536 (fixed) | 3072 (fixed) | 768 (fixed) |
| Default Dims | 3072 | 1536 | 3072 | 768 |
| Context Window | 2,048 tokens | 8,191 tokens | 8,191 tokens | 512 tokens |
| Cost (per 1M tokens) | Free tier, then $0.025 | $0.020 | $0.130 | Free on Cloudflare |
| Rate Limit (entry tier) | 100 RPM, 30k TPM | 3,000 RPM | 3,000 RPM | Workers AI free daily allocation |
| Task Types | 8 types | None | None | None |
| Matryoshka | ✅ Yes | ✅ Yes (shortening) | ✅ Yes (shortening) | ❌ No |
| Best For | RAG, semantic search | General purpose | High accuracy needed | Edge computing, Cloudflare stack |
Detailed Comparison
1. Google Gemini (gemini-embedding-001)
Strengths:
- Flexible dimensions (128-3072) using Matryoshka Representation Learning
- 8 task types for optimization (RETRIEVAL_QUERY, RETRIEVAL_DOCUMENT, etc.)
- Free tier with generous limits
- Same API as Gemini text generation (unified ecosystem)
Weaknesses:
- Smaller context window (2,048 tokens vs OpenAI's 8,191)
- Newer model (less community knowledge)
Recommended For:
- RAG systems (optimized task types)
- Projects already using Gemini API
- Budget-conscious projects (free tier)
Pricing:
- Free: 100 RPM, 30k TPM, 1k RPD
- Paid: $0.025 per 1M tokens (Tier 1+)
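A minimal sketch of the task-type and dimension options described above, assuming the @google/genai JS SDK and a GEMINI_API_KEY environment variable (both assumptions, not shown elsewhere in this document):

```js
import { GoogleGenAI } from '@google/genai';

// Sketch: embed a document for RAG with a reduced output dimension.
// Assumes the @google/genai SDK and GEMINI_API_KEY in the environment.
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const response = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: 'Your document text here',
  config: {
    taskType: 'RETRIEVAL_DOCUMENT', // use RETRIEVAL_QUERY for search queries
    outputDimensionality: 768       // anywhere from 128 to 3072
  }
});

const vector = response.embeddings[0].values; // 768 numbers
```

Using RETRIEVAL_DOCUMENT for stored chunks and RETRIEVAL_QUERY for user queries is what the task-type optimization refers to: the model embeds the two sides of a retrieval pair differently.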
2. OpenAI text-embedding-3-small
Strengths:
- Larger context window (8,191 tokens)
- Well-documented and widely used
- Good balance of cost and performance
- Can shorten dimensions (Matryoshka)
Weaknesses:
- Fixed 1536 dimensions (unless shortened)
- No task type optimization
- Costs from day one (no free tier for embeddings)
Recommended For:
- General-purpose semantic search
- Projects with long documents (>2k tokens)
- OpenAI ecosystem integration
Pricing:
- $0.020 per 1M tokens
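A short sketch of dimension shortening with the official openai Node SDK; the `dimensions` parameter is what enables the Matryoshka-style truncation noted in the table (assumes OPENAI_API_KEY is set in the environment):

```js
import OpenAI from 'openai';

// Sketch: assumes OPENAI_API_KEY is set in the environment.
const openai = new OpenAI();

const response = await openai.embeddings.create({
  model: 'text-embedding-3-small',
  input: 'Your text here',
  dimensions: 512 // optional: shorten from the default 1536
});

const embedding = response.data[0].embedding; // 512 numbers
```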
3. OpenAI text-embedding-3-large
Strengths:
- Highest accuracy of OpenAI models
- 3072 dimensions (same as Gemini default)
- Large context window (8,191 tokens)
Weaknesses:
- Most expensive ($0.130 per 1M tokens)
- Fixed dimensions
- Overkill for most use cases
Recommended For:
- Mission-critical applications requiring maximum accuracy
- Well-funded projects
Pricing:
- $0.130 per 1M tokens (6.5x more expensive than text-embedding-3-small)
4. Cloudflare Workers AI (bge-base-en-v1.5)
Strengths:
- Free on Cloudflare Workers
- Fast (edge inference)
- Good for English text
- Simple integration with Vectorize
Weaknesses:
- Small context window (512 tokens)
- Fixed 768 dimensions
- No task type optimization
- Optimized for English only (limited multilingual support)
Recommended For:
- Cloudflare-first stacks
- Cost-sensitive projects
- Short documents (<512 tokens)
- Edge inference requirements
Pricing:
- Free (included with Cloudflare Workers)
Example:
```js
const response = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
  text: 'Your text here'
});
// Returns { shape, data } where data is an array of embeddings;
// response.data[0] is the 768-dimension vector for the input text.
```
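As a sketch of the Vectorize integration mentioned in the strengths above, assuming bindings named AI and VECTORIZE in your wrangler configuration and a Vectorize index created with 768 dimensions (the binding names and document id are placeholders):

```js
// Sketch: embed with Workers AI and store the vector in Vectorize.
// Assumes AI and VECTORIZE bindings and a 768-dimension index.
export default {
  async fetch(request, env) {
    const { data } = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
      text: ['Your text here']
    });

    await env.VECTORIZE.upsert([
      { id: 'doc-1', values: data[0], metadata: { source: 'example' } }
    ]);

    return new Response('stored');
  }
};
```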
When to Use Which
Use Gemini Embeddings When:
- ✅ Building RAG systems (task type optimization)
- ✅ Need flexible dimensions (save storage/compute)
- ✅ Already using Gemini API
- ✅ Want free tier for development
Use OpenAI text-embedding-3-small When:
- ✅ Documents > 2,048 tokens
- ✅ Using OpenAI for generation
- ✅ Need proven, well-documented solution
- ✅ General-purpose semantic search
Use OpenAI text-embedding-3-large When:
- ✅ Maximum accuracy required
- ✅ Budget allows ($0.130 per 1M tokens)
- ✅ Mission-critical applications
Use Workers AI (BGE) When:
- ✅ Building on Cloudflare
- ✅ Short documents (<512 tokens)
- ✅ Cost is primary concern (free)
- ✅ English-only content
- ✅ Need edge inference
Dimension Recommendations
| Use Case | Gemini | OpenAI Small | OpenAI Large | Workers AI |
|---|---|---|---|---|
| General RAG | 768 | 1536 | 3072 | 768 |
| Storage-limited | 128-512 | 512 (shortened) | 1024 (shortened) | 768 (fixed) |
| Maximum accuracy | 3072 | 1536 (fixed) | 3072 | 768 (fixed) |
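One caveat when reducing dimensions: if you truncate vectors yourself (rather than letting the API shorten them server-side), the result is generally no longer unit-length, so re-normalize before using a plain dot product as your similarity metric. A hypothetical helper:

```js
// Hypothetical helper: L2-normalize a vector after manually truncating it.
// (APIs that shorten dimensions server-side may already return normalized vectors.)
function truncateAndNormalize(vector, dims) {
  const truncated = vector.slice(0, dims);
  const norm = Math.sqrt(truncated.reduce((sum, x) => sum + x * x, 0));
  return truncated.map((x) => x / norm);
}

const short = truncateAndNormalize(embedding, 512); // e.g. 3072 -> 512 dims
```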
Migration Guide
From OpenAI to Gemini
```js
// Before (OpenAI)
const response = await openai.embeddings.create({
  model: 'text-embedding-3-small',
  input: 'Your text here'
});
const embedding = response.data[0].embedding; // 1536 dims
```

```js
// After (Gemini, @google/genai JS SDK)
const response = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: 'Your text here',
  config: {
    taskType: 'SEMANTIC_SIMILARITY',
    outputDimensionality: 768 // or 1536 to match OpenAI
  }
});
const embedding = response.embeddings[0].values; // 768 dims
```
CRITICAL: If migrating, you must regenerate all embeddings. Embeddings from different models are not comparable.
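A rough sketch of what that re-embedding pass might look like; `loadAllDocuments` and `saveVector` are hypothetical placeholders for your own storage layer:

```js
// Hypothetical migration loop: re-embed every stored document with Gemini
// and overwrite the old OpenAI vectors. loadAllDocuments and saveVector
// stand in for your own data layer.
async function reembedAll(ai) {
  const documents = await loadAllDocuments();

  for (const doc of documents) {
    const response = await ai.models.embedContent({
      model: 'gemini-embedding-001',
      contents: doc.text,
      config: { taskType: 'RETRIEVAL_DOCUMENT', outputDimensionality: 768 }
    });

    await saveVector(doc.id, response.embeddings[0].values);
  }
}
```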
Performance Benchmarks
Based on MTEB (Massive Text Embedding Benchmark):
| Model | Retrieval Score | Clustering Score | Overall Score |
|---|---|---|---|
| OpenAI text-embedding-3-large | 64.6 | 49.0 | 54.9 |
| OpenAI text-embedding-3-small | 62.3 | 49.0 | 54.0 |
| Gemini gemini-embedding-001 | ~60.0* | ~47.0* | ~52.0* |
| Workers AI bge-base-en-v1.5 | 53.2 | 42.0 | 48.0 |
*Estimated based on available benchmarks
Source: https://github.com/embeddings-benchmark/mteb
Summary
Best Overall: Gemini gemini-embedding-001
- Flexible dimensions
- Task type optimization
- Free tier
- Good performance
Best for Accuracy: OpenAI text-embedding-3-large
- Highest MTEB scores
- Large context window
- Most expensive
Best for Budget: Cloudflare Workers AI (BGE)
- Completely free
- Edge inference
- Limited context window
Best for Long Documents: OpenAI models
- 8,191 token context
- Compared with 2,048 tokens (Gemini) or 512 tokens (Workers AI)