# Embedding Model Comparison Comparison of Google Gemini, OpenAI, and Cloudflare Workers AI embedding models to help you choose the right one for your use case. --- ## Quick Comparison Table | Feature | Gemini (gemini-embedding-001) | OpenAI (text-embedding-3-small) | OpenAI (text-embedding-3-large) | Workers AI (bge-base-en-v1.5) | |---------|------------------------------|--------------------------------|--------------------------------|-------------------------------| | **Dimensions** | 128-3072 (flexible) | 1536 (fixed) | 3072 (fixed) | 768 (fixed) | | **Default Dims** | 3072 | 1536 | 3072 | 768 | | **Context Window** | 2,048 tokens | 8,191 tokens | 8,191 tokens | 512 tokens | | **Cost (per 1M tokens)** | Free tier, then $0.025 | $0.020 | $0.130 | Free on Cloudflare | | **Rate Limit (Free)** | 100 RPM, 30k TPM | 3,000 RPM | 3,000 RPM | Unlimited | | **Task Types** | 8 types | None | None | None | | **Matryoshka** | ✅ Yes | ✅ Yes (shortening) | ✅ Yes (shortening) | ❌ No | | **Best For** | RAG, semantic search | General purpose | High accuracy needed | Edge computing, Cloudflare stack | --- ## Detailed Comparison ### 1. Google Gemini (gemini-embedding-001) **Strengths**: - Flexible dimensions (128-3072) using Matryoshka Representation Learning - 8 task types for optimization (RETRIEVAL_QUERY, RETRIEVAL_DOCUMENT, etc.) - Free tier with generous limits - Same API as Gemini text generation (unified ecosystem) **Weaknesses**: - Smaller context window (2,048 tokens vs OpenAI's 8,191) - Newer model (less community knowledge) **Recommended For**: - RAG systems (optimized task types) - Projects already using Gemini API - Budget-conscious projects (free tier) **Pricing**: - Free: 100 RPM, 30k TPM, 1k RPD - Paid: $0.025 per 1M tokens (Tier 1+) --- ### 2. OpenAI text-embedding-3-small **Strengths**: - Larger context window (8,191 tokens) - Well-documented and widely used - Good balance of cost and performance - Can shorten dimensions (Matryoshka) **Weaknesses**: - Fixed 1536 dimensions (unless shortened) - No task type optimization - Costs from day one (no free tier for embeddings) **Recommended For**: - General-purpose semantic search - Projects with long documents (>2k tokens) - OpenAI ecosystem integration **Pricing**: - $0.020 per 1M tokens --- ### 3. OpenAI text-embedding-3-large **Strengths**: - Highest accuracy of OpenAI models - 3072 dimensions (same as Gemini default) - Large context window (8,191 tokens) **Weaknesses**: - Most expensive ($0.130 per 1M tokens) - Fixed dimensions - Overkill for most use cases **Recommended For**: - Mission-critical applications requiring maximum accuracy - Well-funded projects **Pricing**: - $0.130 per 1M tokens (6.5x more expensive than text-embedding-3-small) --- ### 4. Cloudflare Workers AI (bge-base-en-v1.5) **Strengths**: - **Free** on Cloudflare Workers - Fast (edge inference) - Good for English text - Simple integration with Vectorize **Weaknesses**: - Small context window (512 tokens) - Fixed 768 dimensions - No task type optimization - English-only (limited multilingual support) **Recommended For**: - Cloudflare-first stacks - Cost-sensitive projects - Short documents (<512 tokens) - Edge inference requirements **Pricing**: - Free (included with Cloudflare Workers) **Example**: ```typescript const response = await env.AI.run('@cf/baai/bge-base-en-v1.5', { text: 'Your text here' }); // Returns: { data: number[] } with 768 dimensions ``` --- ## When to Use Which ### Use Gemini Embeddings When: - ✅ Building RAG systems (task type optimization) - ✅ Need flexible dimensions (save storage/compute) - ✅ Already using Gemini API - ✅ Want free tier for development ### Use OpenAI text-embedding-3-small When: - ✅ Documents > 2,048 tokens - ✅ Using OpenAI for generation - ✅ Need proven, well-documented solution - ✅ General-purpose semantic search ### Use OpenAI text-embedding-3-large When: - ✅ Maximum accuracy required - ✅ Budget allows ($0.130 per 1M tokens) - ✅ Mission-critical applications ### Use Workers AI (BGE) When: - ✅ Building on Cloudflare - ✅ Short documents (<512 tokens) - ✅ Cost is primary concern (free) - ✅ English-only content - ✅ Need edge inference --- ## Dimension Recommendations | Use Case | Gemini | OpenAI Small | OpenAI Large | Workers AI | |----------|--------|--------------|--------------|------------| | **General RAG** | 768 | 1536 | 3072 | 768 | | **Storage-limited** | 128-512 | 512 (shortened) | 1024 (shortened) | 768 (fixed) | | **Maximum accuracy** | 3072 | 1536 (fixed) | 3072 | 768 (fixed) | --- ## Migration Guide ### From OpenAI to Gemini ```typescript // Before (OpenAI) const response = await openai.embeddings.create({ model: 'text-embedding-3-small', input: 'Your text here' }); const embedding = response.data[0].embedding; // 1536 dims // After (Gemini) const response = await ai.models.embedContent({ model: 'gemini-embedding-001', content: 'Your text here', config: { taskType: 'SEMANTIC_SIMILARITY', outputDimensionality: 768 // or 1536 to match OpenAI } }); const embedding = response.embedding.values; // 768 dims ``` **CRITICAL**: If migrating, you must regenerate all embeddings. Embeddings from different models are not comparable. --- ## Performance Benchmarks Based on MTEB (Massive Text Embedding Benchmark): | Model | Retrieval Score | Clustering Score | Overall Score | |-------|----------------|------------------|---------------| | OpenAI text-embedding-3-large | **64.6** | 49.0 | **54.9** | | OpenAI text-embedding-3-small | 62.3 | **49.0** | 54.0 | | Gemini gemini-embedding-001 | ~60.0* | ~47.0* | ~52.0* | | Workers AI bge-base-en-v1.5 | 53.2 | 42.0 | 48.0 | *Estimated based on available benchmarks **Source**: https://github.com/embeddings-benchmark/mteb --- ## Summary **Best Overall**: Gemini gemini-embedding-001 - Flexible dimensions - Task type optimization - Free tier - Good performance **Best for Accuracy**: OpenAI text-embedding-3-large - Highest MTEB scores - Large context window - Most expensive **Best for Budget**: Cloudflare Workers AI (BGE) - Completely free - Edge inference - Limited context window **Best for Long Documents**: OpenAI models - 8,191 token context - vs 2,048 (Gemini) or 512 (Workers AI) --- ## Official Documentation - **Gemini**: https://ai.google.dev/gemini-api/docs/embeddings - **OpenAI**: https://platform.openai.com/docs/guides/embeddings - **Workers AI**: https://developers.cloudflare.com/workers-ai/models/embedding/ - **MTEB Leaderboard**: https://github.com/embeddings-benchmark/mteb