# Embedding Model Comparison
Comparison of Google Gemini, OpenAI, and Cloudflare Workers AI embedding models to help you choose the right one for your use case.
---
## Quick Comparison Table
| Feature | Gemini (gemini-embedding-001) | OpenAI (text-embedding-3-small) | OpenAI (text-embedding-3-large) | Workers AI (bge-base-en-v1.5) |
|---------|------------------------------|--------------------------------|--------------------------------|-------------------------------|
| **Dimensions** | 128-3072 (flexible) | 1536 (fixed) | 3072 (fixed) | 768 (fixed) |
| **Default Dims** | 3072 | 1536 | 3072 | 768 |
| **Context Window** | 2,048 tokens | 8,191 tokens | 8,191 tokens | 512 tokens |
| **Cost (per 1M tokens)** | Free tier, then $0.025 | $0.020 | $0.130 | Free on Cloudflare |
| **Rate Limit (entry tier)** | 100 RPM, 30k TPM | 3,000 RPM | 3,000 RPM | Unlimited |
| **Task Types** | 8 types | None | None | None |
| **Matryoshka** | ✅ Yes | ✅ Yes (shortening) | ✅ Yes (shortening) | ❌ No |
| **Best For** | RAG, semantic search | General purpose | High accuracy needed | Edge computing, Cloudflare stack |
---
## Detailed Comparison
### 1. Google Gemini (gemini-embedding-001)
**Strengths**:
- Flexible dimensions (128-3072) using Matryoshka Representation Learning
- 8 task types for optimization (RETRIEVAL_QUERY, RETRIEVAL_DOCUMENT, etc.)
- Free tier with generous limits
- Same API as Gemini text generation (unified ecosystem)
**Weaknesses**:
- Smaller context window (2,048 tokens vs OpenAI's 8,191)
- Newer model (less community knowledge)
**Recommended For**:
- RAG systems (optimized task types)
- Projects already using Gemini API
- Budget-conscious projects (free tier)
**Pricing**:
- Free: 100 RPM, 30k TPM, 1k RPD
- Paid: $0.025 per 1M tokens (Tier 1+)
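**Example** (a minimal sketch using the `@google/genai` SDK; the client setup, API-key handling, and example strings are assumptions, not part of this skill):
```typescript
import { GoogleGenAI } from '@google/genai';

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Embed documents and queries with different task types so retrieval is
// optimized on both sides of the search.
const doc = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: 'Vectorize is a vector database that runs on Cloudflare.',
  config: { taskType: 'RETRIEVAL_DOCUMENT', outputDimensionality: 768 }
});
const query = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: 'What vector database does Cloudflare offer?',
  config: { taskType: 'RETRIEVAL_QUERY', outputDimensionality: 768 }
});

const docVector = doc.embeddings?.[0]?.values;     // 768 dimensions
const queryVector = query.embeddings?.[0]?.values; // 768 dimensions
```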
---
### 2. OpenAI text-embedding-3-small
**Strengths**:
- Larger context window (8,191 tokens)
- Well-documented and widely used
- Good balance of cost and performance
- Can shorten dimensions (Matryoshka)
**Weaknesses**:
- Fixed 1536 dimensions (unless shortened)
- No task type optimization
- Costs from day one (no free tier for embeddings)
**Recommended For**:
- General-purpose semantic search
- Projects with long documents (>2k tokens)
- OpenAI ecosystem integration
**Pricing**:
- $0.020 per 1M tokens
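**Example** (a minimal sketch with the official `openai` npm package; client setup is an assumption):
```typescript
import OpenAI from 'openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// The `dimensions` parameter shortens the embedding (Matryoshka-style);
// the API returns a re-normalized vector, so cosine similarity still works.
const response = await openai.embeddings.create({
  model: 'text-embedding-3-small',
  input: 'Your text here',
  dimensions: 512 // omit for the default 1536
});
const embedding = response.data[0].embedding; // 512 dimensions
```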
---
### 3. OpenAI text-embedding-3-large
**Strengths**:
- Highest accuracy of OpenAI models
- 3072 dimensions (same as Gemini default)
- Large context window (8,191 tokens)
**Weaknesses**:
- Most expensive ($0.130 per 1M tokens)
- Fixed dimensions
- Overkill for most use cases
**Recommended For**:
- Mission-critical applications requiring maximum accuracy
- Well-funded projects
**Pricing**:
- $0.130 per 1M tokens (6.5x more expensive than text-embedding-3-small)
---
### 4. Cloudflare Workers AI (bge-base-en-v1.5)
**Strengths**:
- **Free** on Cloudflare Workers
- Fast (edge inference)
- Good for English text
- Simple integration with Vectorize
**Weaknesses**:
- Small context window (512 tokens)
- Fixed 768 dimensions
- No task type optimization
- English-focused (limited multilingual support)
**Recommended For**:
- Cloudflare-first stacks
- Cost-sensitive projects
- Short documents (<512 tokens)
- Edge inference requirements
**Pricing**:
- Free (included with Cloudflare Workers)
**Example**:
```typescript
const response = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
  text: 'Your text here'
});
// response.data[0] is the embedding vector (768 dimensions)
```
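The Vectorize integration mentioned above can look roughly like this. It is a sketch, not a drop-in handler: the `VECTORIZE` binding name, the 768-dimension index configuration, and the document ID are assumptions to adapt to your own Wrangler config.
```typescript
export default {
  async fetch(request: Request, env: { AI: Ai; VECTORIZE: VectorizeIndex }) {
    // Embed the text with Workers AI (data is an array of vectors).
    const { data } = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
      text: ['Your text here']
    });

    // Store the 768-dimension vector in a Vectorize index.
    await env.VECTORIZE.upsert([
      { id: 'doc-1', values: data[0], metadata: { source: 'example' } }
    ]);

    return Response.json({ stored: 1 });
  }
};
```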
---
## When to Use Which
### Use Gemini Embeddings When:
- ✅ Building RAG systems (task type optimization)
- ✅ Need flexible dimensions (save storage/compute)
- ✅ Already using Gemini API
- ✅ Want free tier for development
### Use OpenAI text-embedding-3-small When:
- ✅ Documents > 2,048 tokens
- ✅ Using OpenAI for generation
- ✅ Need proven, well-documented solution
- ✅ General-purpose semantic search
### Use OpenAI text-embedding-3-large When:
- ✅ Maximum accuracy required
- ✅ Budget allows ($0.130 per 1M tokens)
- ✅ Mission-critical applications
### Use Workers AI (BGE) When:
- ✅ Building on Cloudflare
- ✅ Short documents (<512 tokens)
- ✅ Cost is primary concern (free)
- ✅ English-only content
- ✅ Need edge inference
---
## Dimension Recommendations
| Use Case | Gemini | OpenAI Small | OpenAI Large | Workers AI |
|----------|--------|--------------|--------------|------------|
| **General RAG** | 768 | 1536 | 3072 | 768 |
| **Storage-limited** | 128-512 | 512 (shortened) | 1024 (shortened) | 768 (fixed) |
| **Maximum accuracy** | 3072 | 1536 (fixed) | 3072 | 768 (fixed) |
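If you shorten stored vectors yourself (instead of requesting fewer dimensions from the API), truncate and then re-normalize before comparing them. This only applies to the Matryoshka-capable models above; fixed-dimension BGE vectors cannot be safely truncated. A minimal sketch:
```typescript
// Truncate a Matryoshka-style embedding to `dims` and re-normalize to unit
// length so cosine-similarity / dot-product comparisons stay meaningful.
function shorten(embedding: number[], dims: number): number[] {
  const truncated = embedding.slice(0, dims);
  const norm = Math.hypot(...truncated);
  return truncated.map((v) => v / norm);
}

// e.g. const compact = shorten(fullEmbedding, 512);
```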
---
## Migration Guide
### From OpenAI to Gemini
```typescript
// Before (OpenAI)
const openaiResponse = await openai.embeddings.create({
  model: 'text-embedding-3-small',
  input: 'Your text here'
});
const openaiEmbedding = openaiResponse.data[0].embedding; // 1536 dims

// After (Gemini, @google/genai SDK)
const geminiResponse = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: 'Your text here',
  config: {
    taskType: 'SEMANTIC_SIMILARITY',
    outputDimensionality: 768 // or 1536 to match OpenAI
  }
});
const geminiEmbedding = geminiResponse.embeddings[0].values; // 768 dims
```
**CRITICAL**: If migrating, you must regenerate all embeddings. Embeddings from different models are not comparable.
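A rough sketch of that regeneration step, reusing the Gemini client from the snippet above; `listDocuments()` and `saveEmbedding()` are hypothetical stand-ins for your own storage layer:
```typescript
// Hypothetical backfill: re-embed every stored document with the new model.
for (const doc of await listDocuments()) {
  const response = await ai.models.embedContent({
    model: 'gemini-embedding-001',
    contents: doc.text,
    config: { taskType: 'RETRIEVAL_DOCUMENT', outputDimensionality: 768 }
  });
  await saveEmbedding(doc.id, response.embeddings?.[0]?.values ?? []);
}
```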
---
## Performance Benchmarks
Based on MTEB (Massive Text Embedding Benchmark):
| Model | Retrieval Score | Clustering Score | Overall Score |
|-------|----------------|------------------|---------------|
| OpenAI text-embedding-3-large | **64.6** | **49.0** | **54.9** |
| OpenAI text-embedding-3-small | 62.3 | **49.0** | 54.0 |
| Gemini gemini-embedding-001 | ~60.0* | ~47.0* | ~52.0* |
| Workers AI bge-base-en-v1.5 | 53.2 | 42.0 | 48.0 |
*Estimated based on available benchmarks
**Source**: https://github.com/embeddings-benchmark/mteb
---
## Summary
**Best Overall**: Gemini gemini-embedding-001
- Flexible dimensions
- Task type optimization
- Free tier
- Good performance
**Best for Accuracy**: OpenAI text-embedding-3-large
- Highest MTEB scores
- Large context window
- Most expensive
**Best for Budget**: Cloudflare Workers AI (BGE)
- Completely free
- Edge inference
- Limited context window
**Best for Long Documents**: OpenAI models
- 8,191-token context vs 2,048 (Gemini) or 512 (Workers AI)
---
## Official Documentation
- **Gemini**: https://ai.google.dev/gemini-api/docs/embeddings
- **OpenAI**: https://platform.openai.com/docs/guides/embeddings
- **Workers AI**: https://developers.cloudflare.com/workers-ai/models/embedding/
- **MTEB Leaderboard**: https://github.com/embeddings-benchmark/mteb