# Embedding Model Comparison
Comparison of Google Gemini, OpenAI, and Cloudflare Workers AI embedding models to help you choose the right one for your use case.
---
## Quick Comparison Table
| Feature | Gemini (gemini-embedding-001) | OpenAI (text-embedding-3-small) | OpenAI (text-embedding-3-large) | Workers AI (bge-base-en-v1.5) |
|---------|------------------------------|--------------------------------|--------------------------------|-------------------------------|
| **Dimensions** | 128-3072 (flexible) | 1536 (fixed) | 3072 (fixed) | 768 (fixed) |
| **Default Dims** | 3072 | 1536 | 3072 | 768 |
| **Context Window** | 2,048 tokens | 8,191 tokens | 8,191 tokens | 512 tokens |
| **Cost (per 1M tokens)** | Free tier, then $0.025 | $0.020 | $0.130 | Free on Cloudflare |
| **Rate Limit (entry tier)** | 100 RPM, 30k TPM | 3,000 RPM | 3,000 RPM | Unlimited |
| **Task Types** | 8 types | None | None | None |
| **Matryoshka** | ✅ Yes | ✅ Yes (shortening) | ✅ Yes (shortening) | ❌ No |
| **Best For** | RAG, semantic search | General purpose | High accuracy needed | Edge computing, Cloudflare stack |
---
## Detailed Comparison
### 1. Google Gemini (gemini-embedding-001)
**Strengths**:
- Flexible dimensions (128-3072) using Matryoshka Representation Learning
- 8 task types for optimization (RETRIEVAL_QUERY, RETRIEVAL_DOCUMENT, etc.)
- Free tier with generous limits
- Same API as Gemini text generation (unified ecosystem)
**Weaknesses**:
- Smaller context window (2,048 tokens vs OpenAI's 8,191)
- Newer model (less community knowledge)
**Recommended For**:
- RAG systems (optimized task types)
- Projects already using Gemini API
- Budget-conscious projects (free tier)
**Pricing**:
- Free: 100 RPM, 30k TPM, 1k RPD
- Paid: $0.025 per 1M tokens (Tier 1+)
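**Example** (a minimal sketch using the `@google/genai` SDK; the client setup, API-key handling, and example strings are assumptions, not part of this skill):
```typescript
import { GoogleGenAI } from '@google/genai';

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Embed documents and queries with different task types so retrieval is
// optimized on both sides of the search.
const doc = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: 'Vectorize is a vector database that runs on Cloudflare.',
  config: { taskType: 'RETRIEVAL_DOCUMENT', outputDimensionality: 768 }
});
const query = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: 'What vector database does Cloudflare offer?',
  config: { taskType: 'RETRIEVAL_QUERY', outputDimensionality: 768 }
});

const docVector = doc.embeddings?.[0]?.values;     // 768 dimensions
const queryVector = query.embeddings?.[0]?.values; // 768 dimensions
```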
---
### 2. OpenAI text-embedding-3-small
**Strengths**:
- Larger context window (8,191 tokens)
- Well-documented and widely used
- Good balance of cost and performance
- Can shorten dimensions (Matryoshka)
**Weaknesses**:
- Fixed 1536 dimensions (unless shortened)
- No task type optimization
- Costs from day one (no free tier for embeddings)
**Recommended For**:
- General-purpose semantic search
- Projects with long documents (>2k tokens)
- OpenAI ecosystem integration
**Pricing**:
- $0.020 per 1M tokens
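**Example** (a minimal sketch with the official `openai` npm package; client setup is an assumption):
```typescript
import OpenAI from 'openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// The `dimensions` parameter shortens the embedding (Matryoshka-style);
// the API returns a re-normalized vector, so cosine similarity still works.
const response = await openai.embeddings.create({
  model: 'text-embedding-3-small',
  input: 'Your text here',
  dimensions: 512 // omit for the default 1536
});
const embedding = response.data[0].embedding; // 512 dimensions
```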
---
### 3. OpenAI text-embedding-3-large
**Strengths**:
- Highest accuracy of OpenAI models
- 3072 dimensions (same as Gemini default)
- Large context window (8,191 tokens)
**Weaknesses**:
- Most expensive ($0.130 per 1M tokens)
- Fixed dimensions
- Overkill for most use cases
**Recommended For**:
- Mission-critical applications requiring maximum accuracy
- Well-funded projects
**Pricing**:
- $0.130 per 1M tokens (6.5x more expensive than text-embedding-3-small)
---
### 4. Cloudflare Workers AI (bge-base-en-v1.5)
**Strengths**:
- **Free** on Cloudflare Workers
- Fast (edge inference)
- Good for English text
- Simple integration with Vectorize
**Weaknesses**:
- Small context window (512 tokens)
- Fixed 768 dimensions
- No task type optimization
- English-focused (limited multilingual support)
**Recommended For**:
- Cloudflare-first stacks
- Cost-sensitive projects
- Short documents (<512 tokens)
- Edge inference requirements
**Pricing**:
- Free (included with Cloudflare Workers)
**Example**:
```typescript
const response = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
  text: 'Your text here'
});
// response.data[0] is the embedding vector (768 dimensions)
```
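The Vectorize integration mentioned above can look roughly like this. It is a sketch, not a drop-in handler: the `VECTORIZE` binding name, the 768-dimension index configuration, and the document ID are assumptions to adapt to your own Wrangler config.
```typescript
export default {
  async fetch(request: Request, env: { AI: Ai; VECTORIZE: VectorizeIndex }) {
    // Embed the text with Workers AI (data is an array of vectors).
    const { data } = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
      text: ['Your text here']
    });

    // Store the 768-dimension vector in a Vectorize index.
    await env.VECTORIZE.upsert([
      { id: 'doc-1', values: data[0], metadata: { source: 'example' } }
    ]);

    return Response.json({ stored: 1 });
  }
};
```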
---
## When to Use Which
### Use Gemini Embeddings When:
- ✅ Building RAG systems (task type optimization)
- ✅ Need flexible dimensions (save storage/compute)
- ✅ Already using Gemini API
- ✅ Want free tier for development
### Use OpenAI text-embedding-3-small When:
- ✅ Documents > 2,048 tokens
- ✅ Using OpenAI for generation
- ✅ Need proven, well-documented solution
- ✅ General-purpose semantic search
### Use OpenAI text-embedding-3-large When:
- ✅ Maximum accuracy required
- ✅ Budget allows ($0.130 per 1M tokens)
- ✅ Mission-critical applications
### Use Workers AI (BGE) When:
- ✅ Building on Cloudflare
- ✅ Short documents (<512 tokens)
- ✅ Cost is primary concern (free)
- ✅ English-only content
- ✅ Need edge inference
---
## Dimension Recommendations
| Use Case | Gemini | OpenAI Small | OpenAI Large | Workers AI |
|----------|--------|--------------|--------------|------------|
| **General RAG** | 768 | 1536 | 3072 | 768 |
| **Storage-limited** | 128-512 | 512 (shortened) | 1024 (shortened) | 768 (fixed) |
| **Maximum accuracy** | 3072 | 1536 (fixed) | 3072 | 768 (fixed) |
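If you shorten stored vectors yourself (instead of requesting fewer dimensions from the API), truncate and then re-normalize before comparing them. This only applies to the Matryoshka-capable models above; fixed-dimension BGE vectors cannot be safely truncated. A minimal sketch:
```typescript
// Truncate a Matryoshka-style embedding to `dims` and re-normalize to unit
// length so cosine-similarity / dot-product comparisons stay meaningful.
function shorten(embedding: number[], dims: number): number[] {
  const truncated = embedding.slice(0, dims);
  const norm = Math.hypot(...truncated);
  return truncated.map((v) => v / norm);
}

// e.g. const compact = shorten(fullEmbedding, 512);
```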
---
## Migration Guide
### From OpenAI to Gemini
```typescript
// Before (OpenAI)
const openaiResponse = await openai.embeddings.create({
  model: 'text-embedding-3-small',
  input: 'Your text here'
});
const openaiEmbedding = openaiResponse.data[0].embedding; // 1536 dims

// After (Gemini, @google/genai SDK)
const geminiResponse = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: 'Your text here',
  config: {
    taskType: 'SEMANTIC_SIMILARITY',
    outputDimensionality: 768 // or 1536 to match OpenAI
  }
});
const geminiEmbedding = geminiResponse.embeddings[0].values; // 768 dims
```
**CRITICAL**: If migrating, you must regenerate all embeddings. Embeddings from different models are not comparable.
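A rough sketch of that regeneration step, reusing the Gemini client from the snippet above; `listDocuments()` and `saveEmbedding()` are hypothetical stand-ins for your own storage layer:
```typescript
// Hypothetical backfill: re-embed every stored document with the new model.
for (const doc of await listDocuments()) {
  const response = await ai.models.embedContent({
    model: 'gemini-embedding-001',
    contents: doc.text,
    config: { taskType: 'RETRIEVAL_DOCUMENT', outputDimensionality: 768 }
  });
  await saveEmbedding(doc.id, response.embeddings?.[0]?.values ?? []);
}
```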
---
## Performance Benchmarks
Based on MTEB (Massive Text Embedding Benchmark):
| Model | Retrieval Score | Clustering Score | Overall Score |
|-------|----------------|------------------|---------------|
| OpenAI text-embedding-3-large | **64.6** | **49.0** | **54.9** |
| OpenAI text-embedding-3-small | 62.3 | **49.0** | 54.0 |
| Gemini gemini-embedding-001 | ~60.0* | ~47.0* | ~52.0* |
| Workers AI bge-base-en-v1.5 | 53.2 | 42.0 | 48.0 |
*Estimated based on available benchmarks
**Source**: https://github.com/embeddings-benchmark/mteb
---
## Summary
**Best Overall**: Gemini gemini-embedding-001
- Flexible dimensions
- Task type optimization
- Free tier
- Good performance
**Best for Accuracy**: OpenAI text-embedding-3-large
- Highest MTEB scores
- Large context window
- Most expensive
**Best for Budget**: Cloudflare Workers AI (BGE)
- Completely free
- Edge inference
- Limited context window
**Best for Long Documents**: OpenAI models
- 8,191-token context vs 2,048 (Gemini) or 512 (Workers AI)
---
## Official Documentation
- **Gemini**: https://ai.google.dev/gemini-api/docs/embeddings
- **OpenAI**: https://platform.openai.com/docs/guides/embeddings
- **Workers AI**: https://developers.cloudflare.com/workers-ai/models/embedding/
- **MTEB Leaderboard**: https://github.com/embeddings-benchmark/mteb