# Embedding Model Comparison

Comparison of Google Gemini, OpenAI, and Cloudflare Workers AI embedding models to help you choose the right one for your use case.

---

## Quick Comparison Table

| Feature | Gemini (gemini-embedding-001) | OpenAI (text-embedding-3-small) | OpenAI (text-embedding-3-large) | Workers AI (bge-base-en-v1.5) |
|---------|------------------------------|--------------------------------|--------------------------------|-------------------------------|
| **Dimensions** | 128-3072 (flexible) | 1536 (fixed) | 3072 (fixed) | 768 (fixed) |
| **Default Dims** | 3072 | 1536 | 3072 | 768 |
| **Context Window** | 2,048 tokens | 8,191 tokens | 8,191 tokens | 512 tokens |
| **Cost (per 1M tokens)** | Free tier, then $0.025 | $0.020 | $0.130 | Free on Cloudflare |
| **Rate Limit (Free)** | 100 RPM, 30k TPM | 3,000 RPM | 3,000 RPM | Unlimited |
| **Task Types** | 8 types | None | None | None |
| **Matryoshka** | ✅ Yes | ✅ Yes (shortening) | ✅ Yes (shortening) | ❌ No |
| **Best For** | RAG, semantic search | General purpose | High accuracy needed | Edge computing, Cloudflare stack |

---

## Detailed Comparison

### 1. Google Gemini (gemini-embedding-001)

**Strengths**:
- Flexible dimensions (128-3072) using Matryoshka Representation Learning
- 8 task types for optimization (RETRIEVAL_QUERY, RETRIEVAL_DOCUMENT, etc.)
- Free tier with generous limits
- Same API as Gemini text generation (unified ecosystem)

**Weaknesses**:
- Smaller context window (2,048 tokens vs OpenAI's 8,191)
- Newer model (less community knowledge)

**Recommended For**:
- RAG systems (optimized task types)
- Projects already using Gemini API
- Budget-conscious projects (free tier)

**Pricing**:
- Free: 100 RPM, 30k TPM, 1k RPD
- Paid: $0.025 per 1M tokens (Tier 1+)
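
A minimal sketch of both of these features (task types and flexible output size) using the `@google/genai` JavaScript SDK; the query text, task type, and dimension value below are illustrative, and a `GEMINI_API_KEY` environment variable is assumed:

```typescript
import { GoogleGenAI } from '@google/genai';

// Sketch only: assumes GEMINI_API_KEY is set in the environment.
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const response = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: 'How do I rotate an API key?',
  config: {
    taskType: 'RETRIEVAL_QUERY',   // use RETRIEVAL_DOCUMENT when indexing
    outputDimensionality: 768,     // any value from 128 to 3072
  },
});

const embedding = response.embeddings?.[0]?.values; // number[] of length 768
```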
---

### 2. OpenAI text-embedding-3-small

**Strengths**:
- Larger context window (8,191 tokens)
- Well-documented and widely used
- Good balance of cost and performance
- Can shorten dimensions (Matryoshka)

**Weaknesses**:
- Fixed 1536 dimensions (unless shortened)
- No task type optimization
- Costs from day one (no free tier for embeddings)

**Recommended For**:
- General-purpose semantic search
- Projects with long documents (>2k tokens)
- OpenAI ecosystem integration

**Pricing**:
- $0.020 per 1M tokens
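
A minimal sketch with the official `openai` SDK, using the optional `dimensions` parameter to shorten the vector; the input text and target size are illustrative, and an `OPENAI_API_KEY` environment variable is assumed:

```typescript
import OpenAI from 'openai';

// Sketch only: assumes OPENAI_API_KEY is set in the environment.
const openai = new OpenAI();

const response = await openai.embeddings.create({
  model: 'text-embedding-3-small',
  input: 'How do I rotate an API key?',
  dimensions: 512, // optional: shorten from the default 1536
});

const embedding = response.data[0].embedding; // number[] of length 512
```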
---

### 3. OpenAI text-embedding-3-large

**Strengths**:
- Highest accuracy of OpenAI models
- 3072 dimensions (same as Gemini default)
- Large context window (8,191 tokens)

**Weaknesses**:
- Most expensive ($0.130 per 1M tokens)
- Fixed 3072 dimensions (unless shortened)
- Overkill for most use cases

**Recommended For**:
- Mission-critical applications requiring maximum accuracy
- Well-funded projects

**Pricing**:
- $0.130 per 1M tokens (6.5x more expensive than text-embedding-3-small)
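
Since the 6.5x price gap is usually the deciding factor, a rough back-of-the-envelope comparison can help; the corpus size below is a made-up example:

```typescript
// Rough cost sketch using the prices listed above (USD per 1M tokens).
const pricePer1M = {
  'text-embedding-3-small': 0.02,
  'text-embedding-3-large': 0.13,
};

// Hypothetical corpus: 50,000 documents averaging 800 tokens each.
const totalTokens = 50_000 * 800; // 40M tokens

for (const [model, price] of Object.entries(pricePer1M)) {
  const cost = (totalTokens / 1_000_000) * price;
  console.log(`${model}: ~$${cost.toFixed(2)} to embed the corpus once`);
}
// text-embedding-3-small: ~$0.80, text-embedding-3-large: ~$5.20
```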
---

### 4. Cloudflare Workers AI (bge-base-en-v1.5)

**Strengths**:
- **Free** on Cloudflare Workers
- Fast (edge inference)
- Good for English text
- Simple integration with Vectorize (sketched below)

**Weaknesses**:
- Small context window (512 tokens)
- Fixed 768 dimensions
- No task type optimization
- English-focused (limited multilingual support)

**Recommended For**:
- Cloudflare-first stacks
- Cost-sensitive projects
- Short documents (<512 tokens)
- Edge inference requirements

**Pricing**:
- Free (included with Cloudflare Workers)

**Example**:

```typescript
const response = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
  text: 'Your text here'
});
// Returns { shape, data }, where data is an array of 768-dimensional
// vectors (one per input string)
const embedding = response.data[0];
```
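
As a sketch of the Vectorize integration mentioned above: the `AI` and `VECTORIZE` binding names, the types (from `@cloudflare/workers-types`), and the 768-dimension index are assumptions based on a typical Wrangler setup, not a prescribed configuration:

```typescript
// Sketch only: assumes `AI` and `VECTORIZE` bindings in the wrangler config and
// a Vectorize index created with 768 dimensions to match bge-base-en-v1.5.
interface Env {
  AI: Ai;
  VECTORIZE: Vectorize;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const text = 'Your text here';

    // 1. Embed the text at the edge (one 768-dimensional vector).
    const { data } = await env.AI.run('@cf/baai/bge-base-en-v1.5', { text: [text] });

    // 2. Upsert the vector into Vectorize along with some metadata.
    await env.VECTORIZE.upsert([
      { id: crypto.randomUUID(), values: data[0], metadata: { text } },
    ]);

    // 3. Query the index for the 3 nearest neighbours of the same vector.
    const matches = await env.VECTORIZE.query(data[0], { topK: 3 });
    return Response.json(matches);
  },
};
```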
---

## When to Use Which

### Use Gemini Embeddings When:
- ✅ Building RAG systems (task type optimization)
- ✅ Need flexible dimensions (save storage/compute)
- ✅ Already using Gemini API
- ✅ Want free tier for development

### Use OpenAI text-embedding-3-small When:
- ✅ Documents > 2,048 tokens
- ✅ Using OpenAI for generation
- ✅ Need proven, well-documented solution
- ✅ General-purpose semantic search

### Use OpenAI text-embedding-3-large When:
- ✅ Maximum accuracy required
- ✅ Budget allows ($0.130 per 1M tokens)
- ✅ Mission-critical applications

### Use Workers AI (BGE) When:
- ✅ Building on Cloudflare
- ✅ Short documents (<512 tokens)
- ✅ Cost is primary concern (free)
- ✅ English-only content
- ✅ Need edge inference
---

## Dimension Recommendations

| Use Case | Gemini | OpenAI Small | OpenAI Large | Workers AI |
|----------|--------|--------------|--------------|------------|
| **General RAG** | 768 | 1536 | 3072 | 768 |
| **Storage-limited** | 128-512 | 512 (shortened) | 1024 (shortened) | 768 (fixed) |
| **Maximum accuracy** | 3072 | 1536 (fixed) | 3072 | 768 (fixed) |
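
If you shorten vectors yourself rather than letting the API do it, it is generally recommended to re-normalize the truncated vector before computing cosine similarity. A minimal sketch, assuming the full-length embedding came from one of the Matryoshka-capable models above:

```typescript
// Sketch: truncate a Matryoshka-style embedding and re-normalize to unit length.
function shortenEmbedding(embedding: number[], targetDims: number): number[] {
  const truncated = embedding.slice(0, targetDims);
  const norm = Math.sqrt(truncated.reduce((sum, x) => sum + x * x, 0));
  return norm === 0 ? truncated : truncated.map((x) => x / norm);
}

// Example: shrink a 3072-dim vector to 768 dims before storing it.
// const short = shortenEmbedding(fullEmbedding, 768);
```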
---

## Migration Guide

### From OpenAI to Gemini

```typescript
// Assumes the official `openai` and `@google/genai` SDKs.
import OpenAI from 'openai';
import { GoogleGenAI } from '@google/genai';

// Before (OpenAI)
const openai = new OpenAI();
const openaiResponse = await openai.embeddings.create({
  model: 'text-embedding-3-small',
  input: 'Your text here'
});
const openaiEmbedding = openaiResponse.data[0].embedding; // 1536 dims

// After (Gemini)
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
const geminiResponse = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: 'Your text here',
  config: {
    taskType: 'SEMANTIC_SIMILARITY',
    outputDimensionality: 768 // or 1536 to match OpenAI
  }
});
const geminiEmbedding = geminiResponse.embeddings?.[0]?.values; // 768 dims
```

**CRITICAL**: If you migrate, you must regenerate all embeddings. Vectors produced by different models live in different embedding spaces and are not comparable.
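
A sketch of what that regeneration pass might look like; `loadAllDocuments` and `vectorStore` are hypothetical stand-ins for your own document source and vector database client, not real APIs:

```typescript
import { GoogleGenAI } from '@google/genai';

// Hypothetical stand-ins for your own storage layer (not real APIs).
declare function loadAllDocuments(): Promise<{ id: string; text: string }[]>;
declare const vectorStore: {
  upsert(v: { id: string; values: number[] }): Promise<void>;
};

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

for (const doc of await loadAllDocuments()) {
  const response = await ai.models.embedContent({
    model: 'gemini-embedding-001',
    contents: doc.text,
    config: { taskType: 'RETRIEVAL_DOCUMENT', outputDimensionality: 768 },
  });

  await vectorStore.upsert({
    id: doc.id, // keep the same IDs so existing document metadata stays linked
    values: response.embeddings?.[0]?.values ?? [],
  });
}
```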
---

## Performance Benchmarks

Based on MTEB (Massive Text Embedding Benchmark):

| Model | Retrieval Score | Clustering Score | Overall Score |
|-------|----------------|------------------|---------------|
| OpenAI text-embedding-3-large | **64.6** | 49.0 | **54.9** |
| OpenAI text-embedding-3-small | 62.3 | **49.0** | 54.0 |
| Gemini gemini-embedding-001 | ~60.0* | ~47.0* | ~52.0* |
| Workers AI bge-base-en-v1.5 | 53.2 | 42.0 | 48.0 |

*Estimated based on available benchmarks

**Source**: https://github.com/embeddings-benchmark/mteb
---

## Summary

**Best Overall**: Gemini gemini-embedding-001
- Flexible dimensions
- Task type optimization
- Free tier
- Good performance

**Best for Accuracy**: OpenAI text-embedding-3-large
- Highest MTEB scores
- Large context window
- Most expensive

**Best for Budget**: Cloudflare Workers AI (BGE)
- Completely free
- Edge inference
- Limited context window

**Best for Long Documents**: OpenAI models
- 8,191 token context
- vs 2,048 (Gemini) or 512 (Workers AI)
---

## Official Documentation

- **Gemini**: https://ai.google.dev/gemini-api/docs/embeddings
- **OpenAI**: https://platform.openai.com/docs/guides/embeddings
- **Workers AI**: https://developers.cloudflare.com/workers-ai/models/embedding/
- **MTEB Leaderboard**: https://github.com/embeddings-benchmark/mteb