# Choosing the Right Embedding Dimensions

Guide to selecting optimal dimensions for your use case with Gemini embeddings.

---

## Quick Decision Table

| Your Priority | Recommended Dimensions | Why |
|---------------|------------------------|-----|
| **Balanced (default)** | **768** | Best accuracy-to-cost ratio |
| **Maximum accuracy** | 3072 | Gemini's full capability |
| **Storage-limited** | 512 or lower | Reduce storage/compute |
| **OpenAI compatibility** | 1536 | Match OpenAI dimensions |

---

## Available Dimensions

Gemini supports **any dimension from 128 to 3072** using Matryoshka Representation Learning.

### Common Choices

| Dimensions | Storage/Vector | Search Speed | Accuracy | Use Case |
|------------|----------------|--------------|----------|----------|
| **768** | ~3 KB | Fast | Good | **Recommended default** |
| 1536 | ~6 KB | Medium | Better | Match OpenAI, large datasets |
| 3072 | ~12 KB | Slower | Best | Maximum accuracy needed |
| 512 | ~2 KB | Very fast | Acceptable | Storage-constrained |
| 256 | ~1 KB | Ultra fast | Lower | Extreme constraints |

---

## Matryoshka Representation Learning

Gemini's flexible dimensions work because of **Matryoshka Representation Learning**: the model learns nested representations, so every prefix of the full vector is itself a usable embedding, with the most important information packed into the earliest dimensions.

```
Dimensions 1-256:     Core semantic information
Dimensions 257-512:   Additional nuance
Dimensions 513-768:   Fine-grained details
Dimensions 769-1536:  Subtle distinctions
Dimensions 1537-3072: Maximum precision
```

**Key Point**: Lower dimensions aren't "worse" - they're **compressed** versions of the full embedding.
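In practice you can either request a lower `outputDimensionality` directly, or derive a smaller vector yourself from a full 3072-dim result. A minimal sketch of client-side truncation (assuming `fullEmbedding` holds a 3072-dim vector from `gemini-embedding-001`); note that truncated prefixes are generally no longer unit-length, so re-normalize before cosine comparisons:

```typescript
// Sketch: derive a compact embedding from a full 3072-dim vector by
// truncating and re-normalizing (truncated prefixes are not unit-length).
function truncateEmbedding(fullEmbedding: number[], dims: number): number[] {
  const sliced = fullEmbedding.slice(0, dims);
  const norm = Math.sqrt(sliced.reduce((sum, v) => sum + v * v, 0));
  return norm === 0 ? sliced : sliced.map(v => v / norm);
}

const fullEmbedding: number[] = []; // in practice: 3072 values from the API
const compact = truncateEmbedding(fullEmbedding, 768);
```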
---

## Storage Impact

### Example: 100,000 Documents

| Dimensions | Storage Required | Monthly Cost (R2)* |
|------------|------------------|--------------------|
| 256 | ~100 MB | ~$0.002 |
| 512 | ~200 MB | ~$0.003 |
| **768** | **~300 MB** | **~$0.005** |
| 1536 | ~600 MB | ~$0.009 |
| 3072 | ~1.2 GB | ~$0.018 |

\*Assuming 4 bytes per float and R2 storage pricing of $0.015/GB-month

**For 1M vectors**:
- 768 dims: ~3 GB storage
- 3072 dims: ~12 GB storage (4x more expensive)
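The raw numbers fall out of one multiplication, vectors × dimensions × 4 bytes per float32:

```typescript
// Back-of-envelope storage estimate: vectors × dimensions × 4 bytes (float32).
function storageBytes(vectorCount: number, dimensions: number): number {
  return vectorCount * dimensions * 4;
}

storageBytes(1_000_000, 768);  // 3,072,000,000 bytes ≈ 3 GB
storageBytes(1_000_000, 3072); // 12,288,000,000 bytes ≈ 12 GB
```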
---

## Accuracy Trade-offs

Based on MTEB benchmarks (approximate):

| Dimensions | Retrieval Accuracy | Relative to 3072 |
|------------|--------------------|------------------|
| 256 | ~85% | -15% |
| 512 | ~92% | -8% |
| **768** | **~96%** | **-4%** |
| 1536 | ~98% | -2% |
| 3072 | 100% (baseline) | 0% |

**Diminishing returns**: Going from 768 → 3072 dims only improves accuracy by ~4% while quadrupling storage.

---

## Query Performance

Search latency (approximate, 100k vectors):

| Dimensions | Query Latency | Throughput (QPS) |
|------------|---------------|------------------|
| 256 | ~10ms | ~1000 |
| 512 | ~15ms | ~700 |
| **768** | **~20ms** | **~500** |
| 1536 | ~35ms | ~300 |
| 3072 | ~60ms | ~170 |

**Note**: Actual performance depends on Vectorize implementation and hardware.

---

## When to Use Each

### 768 Dimensions (Recommended Default)

**Use when**:
- ✅ Building standard RAG systems
- ✅ General semantic search
- ✅ Cost-effectiveness matters
- ✅ Storage is a consideration

**Don't use when**:
- ❌ You need absolute maximum accuracy
- ❌ Migrating from OpenAI 1536-dim embeddings

**Example**:
```typescript
const embedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  content: text,
  config: {
    taskType: 'RETRIEVAL_DOCUMENT',
    outputDimensionality: 768 // ← Recommended
  }
});
```

---

### 3072 Dimensions (Maximum Accuracy)

**Use when**:
- ✅ Accuracy is critical (legal, medical, research)
- ✅ Budget allows 4x storage cost
- ✅ Query latency isn't a concern
- ✅ Small dataset (<10k vectors)

**Don't use when**:
- ❌ Cost-sensitive project
- ❌ Large dataset (>100k vectors)
- ❌ Real-time search required

**Example**:
```typescript
const embedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  content: text,
  config: {
    taskType: 'RETRIEVAL_DOCUMENT',
    outputDimensionality: 3072 // ← Maximum accuracy
  }
});
```

---

### 1536 Dimensions (OpenAI Compatibility)

**Use when**:
- ✅ Migrating from OpenAI text-embedding-3-small
- ✅ Need compatibility with existing infrastructure
- ✅ Balancing accuracy and cost

**Example**:
```typescript
const embedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  content: text,
  config: {
    taskType: 'RETRIEVAL_DOCUMENT',
    outputDimensionality: 1536 // ← Match OpenAI
  }
});
```

---

### 512 or Lower (Storage-Constrained)

**Use when**:
- ✅ Extreme storage constraints
- ✅ Millions of vectors
- ✅ Acceptable to sacrifice some accuracy
- ✅ Ultra-fast queries required

**Example**:
```typescript
const embedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  content: text,
  config: {
    taskType: 'RETRIEVAL_DOCUMENT',
    outputDimensionality: 512 // ← Compact
  }
});
```

---

## Migration Between Dimensions

**CRITICAL**: You cannot mix different dimensions in the same index.

### Option 1: Recreate Index

```bash
# Delete old index
npx wrangler vectorize delete my-index

# Create new index with different dimensions
npx wrangler vectorize create my-index --dimensions 768 --metric cosine

# Re-generate all embeddings with new dimensions
# Re-insert all vectors
```

### Option 2: Create New Index

```bash
# Keep old index running
# Create new index
npx wrangler vectorize create my-index-768 --dimensions 768 --metric cosine

# Gradually migrate vectors
# Switch over when ready
# Delete old index
```

---
## Testing Methodology

To test if lower dimensions work for your use case:

```typescript
// 1. Generate test embeddings with different dimensions
const dims = [256, 512, 768, 1536, 3072];
const testEmbeddings = await Promise.all(
  dims.map(dim => ai.models.embedContent({
    model: 'gemini-embedding-001',
    content: testText,
    config: { outputDimensionality: dim }
  }))
);

// 2. Test retrieval accuracy (testRetrievalAccuracy is your own evaluation
// harness: embed a labeled query set at each dimension and measure how often
// the expected document lands in the top-K)
const queries = ['query1', 'query2', 'query3'];
for (const dim of dims) {
  const accuracy = await testRetrievalAccuracy(queries, dim);
  console.log(`${dim} dims: ${accuracy}% accuracy`);
}

// 3. Measure performance (measureQueryLatency is likewise user-supplied:
// time a batch of Vectorize queries at each dimension)
for (const dim of dims) {
  const latency = await measureQueryLatency(dim);
  console.log(`${dim} dims: ${latency}ms latency`);
}
```
---

## Recommendations by Use Case

### RAG for Documentation
- **Recommended**: 768 dims
- **Reasoning**: Good accuracy, reasonable storage, fast queries

### E-commerce Search
- **Recommended**: 512-768 dims
- **Reasoning**: Speed matters, millions of products

### Legal Document Search
- **Recommended**: 3072 dims
- **Reasoning**: Accuracy is critical, smaller datasets

### Customer Support Chatbot
- **Recommended**: 768 dims
- **Reasoning**: Balance accuracy and response time

### Research Paper Search
- **Recommended**: 1536-3072 dims
- **Reasoning**: Nuanced understanding needed

---

## Summary

**Default Choice**: **768 dimensions**
- 96% of 3072-dim accuracy
- 75% less storage
- 3x faster queries
- Best balance for most applications

**Only use 3072 if**:
- You need every percentage point of accuracy
- You have budget for 4x storage
- You have a small dataset

**Consider lower (<768) if**:
- You have millions of vectors
- Storage cost is a major concern
- Ultra-fast queries are required

---

## Official Documentation

- **Matryoshka Learning**: https://arxiv.org/abs/2205.13147
- **Gemini Embeddings**: https://ai.google.dev/gemini-api/docs/embeddings
- **MTEB Benchmark**: https://github.com/embeddings-benchmark/mteb
# Embedding Model Comparison

Comparison of Google Gemini, OpenAI, and Cloudflare Workers AI embedding models to help you choose the right one for your use case.

---

## Quick Comparison Table

| Feature | Gemini (gemini-embedding-001) | OpenAI (text-embedding-3-small) | OpenAI (text-embedding-3-large) | Workers AI (bge-base-en-v1.5) |
|---------|-------------------------------|---------------------------------|---------------------------------|-------------------------------|
| **Dimensions** | 128-3072 (flexible) | 1536 (fixed) | 3072 (fixed) | 768 (fixed) |
| **Default Dims** | 3072 | 1536 | 3072 | 768 |
| **Context Window** | 2,048 tokens | 8,191 tokens | 8,191 tokens | 512 tokens |
| **Cost (per 1M tokens)** | Free tier, then $0.025 | $0.020 | $0.130 | Free on Cloudflare |
| **Rate Limit (Free)** | 100 RPM, 30k TPM | 3,000 RPM | 3,000 RPM | Unlimited |
| **Task Types** | 8 types | None | None | None |
| **Matryoshka** | ✅ Yes | ✅ Yes (shortening) | ✅ Yes (shortening) | ❌ No |
| **Best For** | RAG, semantic search | General purpose | High accuracy needed | Edge computing, Cloudflare stack |

---

## Detailed Comparison

### 1. Google Gemini (gemini-embedding-001)

**Strengths**:
- Flexible dimensions (128-3072) using Matryoshka Representation Learning
- 8 task types for optimization (RETRIEVAL_QUERY, RETRIEVAL_DOCUMENT, etc.)
- Free tier with generous limits
- Same API as Gemini text generation (unified ecosystem)

**Weaknesses**:
- Smaller context window (2,048 tokens vs OpenAI's 8,191)
- Newer model (less community knowledge)

**Recommended For**:
- RAG systems (optimized task types)
- Projects already using Gemini API
- Budget-conscious projects (free tier)

**Pricing**:
- Free: 100 RPM, 30k TPM, 1k RPD
- Paid: $0.025 per 1M tokens (Tier 1+)

---

### 2. OpenAI text-embedding-3-small

**Strengths**:
- Larger context window (8,191 tokens)
- Well-documented and widely used
- Good balance of cost and performance
- Can shorten dimensions (Matryoshka)

**Weaknesses**:
- Fixed 1536 dimensions (unless shortened)
- No task type optimization
- Costs from day one (no free tier for embeddings)

**Recommended For**:
- General-purpose semantic search
- Projects with long documents (>2k tokens)
- OpenAI ecosystem integration

**Pricing**:
- $0.020 per 1M tokens

---

### 3. OpenAI text-embedding-3-large

**Strengths**:
- Highest accuracy of OpenAI models
- 3072 dimensions (same as Gemini default)
- Large context window (8,191 tokens)

**Weaknesses**:
- Most expensive ($0.130 per 1M tokens)
- Fixed dimensions
- Overkill for most use cases

**Recommended For**:
- Mission-critical applications requiring maximum accuracy
- Well-funded projects

**Pricing**:
- $0.130 per 1M tokens (6.5x more expensive than text-embedding-3-small)

---

### 4. Cloudflare Workers AI (bge-base-en-v1.5)

**Strengths**:
- **Free** on Cloudflare Workers
- Fast (edge inference)
- Good for English text
- Simple integration with Vectorize

**Weaknesses**:
- Small context window (512 tokens)
- Fixed 768 dimensions
- No task type optimization
- English-only (limited multilingual support)

**Recommended For**:
- Cloudflare-first stacks
- Cost-sensitive projects
- Short documents (<512 tokens)
- Edge inference requirements

**Pricing**:
- Free (included with Cloudflare Workers)

**Example**:
```typescript
const response = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
  text: 'Your text here'
});
// Returns: { data: number[][] }, one 768-dimension vector per input text
```

---

## When to Use Which

### Use Gemini Embeddings When:
- ✅ Building RAG systems (task type optimization)
- ✅ Need flexible dimensions (save storage/compute)
- ✅ Already using Gemini API
- ✅ Want free tier for development

### Use OpenAI text-embedding-3-small When:
- ✅ Documents > 2,048 tokens
- ✅ Using OpenAI for generation
- ✅ Need proven, well-documented solution
- ✅ General-purpose semantic search

### Use OpenAI text-embedding-3-large When:
- ✅ Maximum accuracy required
- ✅ Budget allows ($0.130 per 1M tokens)
- ✅ Mission-critical applications

### Use Workers AI (BGE) When:
- ✅ Building on Cloudflare
- ✅ Short documents (<512 tokens)
- ✅ Cost is primary concern (free)
- ✅ English-only content
- ✅ Need edge inference

---

## Dimension Recommendations

| Use Case | Gemini | OpenAI Small | OpenAI Large | Workers AI |
|----------|--------|--------------|--------------|------------|
| **General RAG** | 768 | 1536 | 3072 | 768 |
| **Storage-limited** | 128-512 | 512 (shortened) | 1024 (shortened) | 768 (fixed) |
| **Maximum accuracy** | 3072 | 1536 (fixed) | 3072 | 768 (fixed) |

---

## Migration Guide

### From OpenAI to Gemini

```typescript
// Before (OpenAI)
const response = await openai.embeddings.create({
  model: 'text-embedding-3-small',
  input: 'Your text here'
});
const embedding = response.data[0].embedding; // 1536 dims

// After (Gemini)
const response = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  content: 'Your text here',
  config: {
    taskType: 'SEMANTIC_SIMILARITY',
    outputDimensionality: 768 // or 1536 to match OpenAI
  }
});
const embedding = response.embedding.values; // 768 dims
```

**CRITICAL**: If migrating, you must regenerate all embeddings. Embeddings from different models are not comparable.
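A hedged sketch of what that regeneration pass can look like, assuming your source documents are available as a `documents` array and a fresh 768-dim Vectorize index is bound as `env.VECTORIZE`:

```typescript
// Re-embed every document with Gemini and write the vectors to the new index.
// Batching, rate limiting, and error handling are omitted for brevity.
for (const doc of documents) {
  const response = await ai.models.embedContent({
    model: 'gemini-embedding-001',
    content: doc.text,
    config: { taskType: 'RETRIEVAL_DOCUMENT', outputDimensionality: 768 }
  });

  await env.VECTORIZE.insert([{
    id: doc.id,
    values: response.embedding.values,
    metadata: { text: doc.text }
  }]);
}
```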
---

## Performance Benchmarks

Based on MTEB (Massive Text Embedding Benchmark):

| Model | Retrieval Score | Clustering Score | Overall Score |
|-------|-----------------|------------------|---------------|
| OpenAI text-embedding-3-large | **64.6** | **49.0** | **54.9** |
| OpenAI text-embedding-3-small | 62.3 | **49.0** | 54.0 |
| Gemini gemini-embedding-001 | ~60.0* | ~47.0* | ~52.0* |
| Workers AI bge-base-en-v1.5 | 53.2 | 42.0 | 48.0 |

\*Estimated based on available benchmarks

**Source**: https://github.com/embeddings-benchmark/mteb

---

## Summary

**Best Overall**: Gemini gemini-embedding-001
- Flexible dimensions
- Task type optimization
- Free tier
- Good performance

**Best for Accuracy**: OpenAI text-embedding-3-large
- Highest MTEB scores
- Large context window
- Most expensive

**Best for Budget**: Cloudflare Workers AI (BGE)
- Completely free
- Edge inference
- Limited context window

**Best for Long Documents**: OpenAI models
- 8,191 token context
- vs 2,048 (Gemini) or 512 (Workers AI)

---

## Official Documentation

- **Gemini**: https://ai.google.dev/gemini-api/docs/embeddings
- **OpenAI**: https://platform.openai.com/docs/guides/embeddings
- **Workers AI**: https://developers.cloudflare.com/workers-ai/models/embedding/
- **MTEB Leaderboard**: https://github.com/embeddings-benchmark/mteb
# RAG Implementation Patterns

Complete guide to Retrieval Augmented Generation patterns using Gemini embeddings and Cloudflare Vectorize.

---

## RAG Workflow Overview

```
┌─────────────────────────────────────────────────────────┐
│              DOCUMENT INGESTION (Offline)               │
└─────────────────────────────────────────────────────────┘
Documents
    ↓
Chunking (500 words)
    ↓
Generate Embeddings (RETRIEVAL_DOCUMENT)
    ↓
Store in Vectorize + Metadata

┌─────────────────────────────────────────────────────────┐
│               QUERY PROCESSING (Runtime)                │
└─────────────────────────────────────────────────────────┘
User Query
    ↓
Generate Embedding (RETRIEVAL_QUERY)
    ↓
Vector Search (top-K)
    ↓
Retrieve Documents
    ↓
Generate Response (LLM + Context)
    ↓
Stream to User
```

---

## Pattern 1: Basic RAG

**Use when**: Simple Q&A over a knowledge base

```typescript
async function basicRAG(query: string, env: Env): Promise<string> {
  // 1. Embed query
  const queryEmbedding = await generateEmbedding(query, env.GEMINI_API_KEY, 'RETRIEVAL_QUERY');

  // 2. Search Vectorize
  const results = await env.VECTORIZE.query(queryEmbedding, { topK: 3 });

  // 3. Concatenate context
  const context = results.matches
    .map(m => m.metadata?.text)
    .join('\n\n');

  // 4. Generate response
  const response = await generateResponse(context, query, env.GEMINI_API_KEY);

  return response;
}
```
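The patterns in this guide lean on two helpers, `generateEmbedding` and `generateResponse`, that the snippets don't define. A minimal sketch of both, built on the same REST endpoints used elsewhere in this repo (error handling omitted; the 768 `outputDimensionality` is an assumption that must match your index):

```typescript
async function generateEmbedding(
  text: string,
  apiKey: string,
  taskType: 'RETRIEVAL_QUERY' | 'RETRIEVAL_DOCUMENT' | 'SEMANTIC_SIMILARITY'
): Promise<number[]> {
  const res = await fetch(
    'https://generativelanguage.googleapis.com/v1beta/models/gemini-embedding-001:embedContent',
    {
      method: 'POST',
      headers: { 'x-goog-api-key': apiKey, 'Content-Type': 'application/json' },
      body: JSON.stringify({
        content: { parts: [{ text }] },
        taskType,
        outputDimensionality: 768 // must match your Vectorize index
      })
    }
  );
  const data = await res.json() as { embedding: { values: number[] } };
  return data.embedding.values;
}

async function generateResponse(context: string, query: string, apiKey: string): Promise<string> {
  const res = await fetch(
    'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent',
    {
      method: 'POST',
      headers: { 'x-goog-api-key': apiKey, 'Content-Type': 'application/json' },
      body: JSON.stringify({
        contents: [{ parts: [{ text: `Context:\n${context}\n\nQuestion: ${query}\n\nAnswer:` }] }]
      })
    }
  );
  const data = await res.json() as {
    candidates: { content: { parts: { text: string }[] } }[];
  };
  return data.candidates[0].content.parts[0].text;
}
```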
---

## Pattern 2: Chunked RAG (Recommended)

**Use when**: Documents are longer than 2,048 tokens

### Chunking Strategies

```typescript
// Strategy A: Fixed-size chunks with overlap
function chunkWithOverlap(text: string, size = 500, overlap = 50): string[] {
  const words = text.split(/\s+/);
  const chunks: string[] = [];

  for (let i = 0; i < words.length; i += size - overlap) {
    chunks.push(words.slice(i, i + size).join(' '));
  }

  return chunks;
}

// Strategy B: Sentence-based chunks
function chunkBySentences(text: string, maxSentences = 10): string[] {
  const sentences = text.match(/[^.!?]+[.!?]+/g) || [];
  const chunks: string[] = [];

  for (let i = 0; i < sentences.length; i += maxSentences) {
    chunks.push(sentences.slice(i, i + maxSentences).join(' '));
  }

  return chunks;
}

// Strategy C: Semantic chunks (preserves paragraphs)
function chunkByParagraphs(text: string): string[] {
  return text.split(/\n\n+/).filter(p => p.trim().length > 50);
}
```

### Implementation

```typescript
async function ingestWithChunking(doc: Document, env: Env) {
  const chunks = chunkWithOverlap(doc.text, 500, 50);

  const vectors = [];
  for (let i = 0; i < chunks.length; i++) {
    const embedding = await generateEmbedding(chunks[i], env.GEMINI_API_KEY, 'RETRIEVAL_DOCUMENT');

    vectors.push({
      id: `${doc.id}-chunk-${i}`,
      values: embedding,
      metadata: {
        documentId: doc.id,
        chunkIndex: i,
        text: chunks[i],
        title: doc.title
      }
    });
  }

  await env.VECTORIZE.insert(vectors);
}
```

---

## Pattern 3: Hybrid Search (Keyword + Semantic)

**Use when**: You need both exact keyword matches and semantic understanding

```typescript
async function hybridSearch(query: string, env: Env) {
  // 1. Vector search
  const queryEmbedding = await generateEmbedding(query, env.GEMINI_API_KEY, 'RETRIEVAL_QUERY');
  const vectorResults = await env.VECTORIZE.query(queryEmbedding, { topK: 10 });

  // 2. Keyword search (using metadata or D1; assumes a precomputed
  // `relevance` column or an FTS rank in your documents table)
  const keywordResults = await env.D1.prepare(
    'SELECT * FROM documents WHERE text LIKE ? ORDER BY relevance DESC LIMIT 10'
  ).bind(`%${query}%`).all();

  // 3. Merge and re-rank (see the mergeResults sketch below)
  const combined = mergeResults(vectorResults.matches, keywordResults.results);

  // 4. Generate response from top results
  const context = combined.slice(0, 5).map(r => r.text).join('\n\n');
  return await generateResponse(context, query, env.GEMINI_API_KEY);
}
```
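`mergeResults` is left undefined above. One reasonable implementation is reciprocal rank fusion (RRF), which combines the two ranked lists by rank rather than by their incomparable raw scores; the keyword row shape here is an assumption, so adapt it to your D1 schema:

```typescript
function mergeResults(
  vectorMatches: { id: string; metadata?: { text?: string } }[],
  keywordRows: { id: string; text: string }[],
  k = 60
): { id: string; text: string; score: number }[] {
  const fused = new Map<string, { text: string; score: number }>();

  // Each list contributes 1 / (k + rank) per item; IDs found in both
  // lists accumulate a higher fused score.
  const addList = (items: { id: string; text: string }[]) => {
    items.forEach((item, rank) => {
      const prev = fused.get(item.id);
      fused.set(item.id, {
        text: item.text,
        score: (prev?.score ?? 0) + 1 / (k + rank + 1)
      });
    });
  };

  addList(vectorMatches.map(m => ({ id: m.id, text: String(m.metadata?.text ?? '') })));
  addList(keywordRows);

  return [...fused.entries()]
    .map(([id, v]) => ({ id, ...v }))
    .sort((a, b) => b.score - a.score);
}
```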
---

## Pattern 4: Filtered RAG

**Use when**: Need to filter by category, date, or metadata

```typescript
async function filteredRAG(query: string, filters: { category?: string; minDate?: number }, env: Env) {
  // 1. Vector search
  const queryEmbedding = await generateEmbedding(query, env.GEMINI_API_KEY, 'RETRIEVAL_QUERY');
  const results = await env.VECTORIZE.query(queryEmbedding, { topK: 20 }); // Fetch more

  // 2. Filter in the application layer (alternatively, Vectorize can filter
  // server-side on properties that have a metadata index)
  const filtered = results.matches.filter(match => {
    if (filters.category && match.metadata?.category !== filters.category) return false;
    if (filters.minDate && match.metadata?.timestamp < filters.minDate) return false;
    return true;
  });

  // 3. Take top 5 after filtering
  const topResults = filtered.slice(0, 5);

  // 4. Generate response
  const context = topResults.map(r => r.metadata?.text).join('\n\n');
  return await generateResponse(context, query, env.GEMINI_API_KEY);
}
```

---

## Pattern 5: Streaming RAG

**Use when**: Real-time responses with immediate feedback

```typescript
async function streamingRAG(query: string, env: Env): Promise<ReadableStream> {
  // 1. Embed query and search
  const queryEmbedding = await generateEmbedding(query, env.GEMINI_API_KEY, 'RETRIEVAL_QUERY');
  const results = await env.VECTORIZE.query(queryEmbedding, { topK: 3 });

  const context = results.matches.map(m => m.metadata?.text).join('\n\n');

  // 2. Stream response from Gemini
  const response = await fetch(
    'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:streamGenerateContent',
    {
      method: 'POST',
      headers: {
        'x-goog-api-key': env.GEMINI_API_KEY,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        contents: [{
          parts: [{ text: `Context:\n${context}\n\nQuestion: ${query}\n\nAnswer:` }]
        }]
      })
    }
  );

  return response.body!;
}
```

---

## Pattern 6: Multi-Query RAG

**Use when**: Query might be ambiguous or multi-faceted

```typescript
async function multiQueryRAG(query: string, env: Env) {
  // 1. Generate multiple query variations (an LLM call that rephrases the query)
  const queryVariations = await generateQueryVariations(query, env.GEMINI_API_KEY);
  // Returns: ["original query", "rephrased version 1", "rephrased version 2"]

  // 2. Search with each variation
  const allResults = await Promise.all(
    queryVariations.map(async q => {
      const embedding = await generateEmbedding(q, env.GEMINI_API_KEY, 'RETRIEVAL_QUERY');
      return await env.VECTORIZE.query(embedding, { topK: 3 });
    })
  );

  // 3. Merge and deduplicate (see the deduplicateById sketch below)
  const uniqueResults = deduplicateById(allResults.flatMap(r => r.matches));

  // 4. Generate response
  const context = uniqueResults.slice(0, 5).map(r => r.metadata?.text).join('\n\n');
  return await generateResponse(context, query, env.GEMINI_API_KEY);
}
```
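A minimal sketch of `deduplicateById`; since each result list arrives ordered by score, keeping the first occurrence keeps the best-scoring copy of each chunk:

```typescript
function deduplicateById<T extends { id: string }>(matches: T[]): T[] {
  const seen = new Set<string>();
  return matches.filter(m => {
    if (seen.has(m.id)) return false;
    seen.add(m.id);
    return true;
  });
}
```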
---

## Pattern 7: Conversational RAG

**Use when**: Multi-turn conversations with context

```typescript
interface ConversationHistory {
  role: 'user' | 'assistant';
  content: string;
}

async function conversationalRAG(
  query: string,
  history: ConversationHistory[],
  env: Env
) {
  // 1. Create contextualized query from history
  const contextualizedQuery = await reformulateQuery(query, history, env.GEMINI_API_KEY);

  // 2. Search with contextualized query
  const embedding = await generateEmbedding(contextualizedQuery, env.GEMINI_API_KEY, 'RETRIEVAL_QUERY');
  const results = await env.VECTORIZE.query(embedding, { topK: 3 });

  const retrievedContext = results.matches.map(m => m.metadata?.text).join('\n\n');

  // 3. Generate response with conversation history
  const prompt = `
Conversation history:
${history.map(h => `${h.role}: ${h.content}`).join('\n')}

Retrieved context:
${retrievedContext}

User: ${query}
Assistant:`;

  return await generateResponse(prompt, query, env.GEMINI_API_KEY);
}
```
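`reformulateQuery` is the one piece this pattern leaves undefined. A sketch that reuses the `generateResponse` helper from Pattern 1; the prompt wording is illustrative, not a fixed API:

```typescript
async function reformulateQuery(
  query: string,
  history: ConversationHistory[],
  apiKey: string
): Promise<string> {
  const prompt = `Rewrite the user's last message as a standalone search query,
resolving any pronouns or references using the conversation.

Conversation:
${history.map(h => `${h.role}: ${h.content}`).join('\n')}

Last message: ${query}

Standalone query:`;

  // Reuse the generation helper; no retrieved context is needed here,
  // so the context slot is left empty and the instruction rides in the query.
  return await generateResponse('', prompt, apiKey);
}
```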
---

## Pattern 8: Citation RAG

**Use when**: Need to cite sources in responses

```typescript
async function citationRAG(query: string, env: Env) {
  const queryEmbedding = await generateEmbedding(query, env.GEMINI_API_KEY, 'RETRIEVAL_QUERY');
  const results = await env.VECTORIZE.query(queryEmbedding, { topK: 5, returnMetadata: true });

  // Build context with citations
  const contextWithCitations = results.matches.map((match, i) =>
    `[${i + 1}] ${match.metadata?.text}\nSource: ${match.metadata?.url || match.id}`
  ).join('\n\n');

  const prompt = `Answer the question using the provided sources. Include citations [1], [2], etc. in your answer.

Sources:
${contextWithCitations}

Question: ${query}

Answer (with citations):`;

  const response = await generateResponse(prompt, query, env.GEMINI_API_KEY);

  return {
    answer: response,
    sources: results.matches.map((m, i) => ({
      citation: i + 1,
      text: m.metadata?.text,
      url: m.metadata?.url,
      score: m.score
    }))
  };
}
```

---

## Best Practices

### 1. Chunk Size Optimization

```typescript
// Test different chunk sizes for your use case
// (testRetrievalAccuracy is your own evaluation harness)
const chunkSizes = [200, 500, 1000, 1500];

for (const size of chunkSizes) {
  const accuracy = await testRetrievalAccuracy(size);
  console.log(`Chunk size ${size}: ${accuracy}% accuracy`);
}

// Recommendation: 500-1000 words with 10% overlap
```

### 2. Context Window Management

```typescript
// Don't exceed the LLM context window
function truncateContext(chunks: string[], maxTokens = 4000): string {
  let context = '';
  let estimatedTokens = 0;

  for (const chunk of chunks) {
    const chunkTokens = chunk.split(/\s+/).length * 1.3; // Rough estimate: ~1.3 tokens per word
    if (estimatedTokens + chunkTokens > maxTokens) break;

    context += chunk + '\n\n';
    estimatedTokens += chunkTokens;
  }

  return context;
}
```

### 3. Re-ranking

```typescript
// Re-rank results after retrieval (see the calculateRelevance sketch below)
function rerank(results: VectorizeMatch[], query: string): VectorizeMatch[] {
  return results
    .map(result => ({
      ...result,
      rerankScore: calculateRelevance(result.metadata?.text, query)
    }))
    .sort((a, b) => b.rerankScore - a.rerankScore);
}
```
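A deliberately cheap sketch of `calculateRelevance`: the fraction of query terms that appear in the chunk. A cross-encoder or LLM judge scores better but costs more per query:

```typescript
function calculateRelevance(text: string | undefined, query: string): number {
  if (!text) return 0;
  const haystack = text.toLowerCase();
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  if (terms.length === 0) return 0;
  const hits = terms.filter(term => haystack.includes(term)).length;
  return hits / terms.length; // 0 (no overlap) to 1 (all terms present)
}
```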
### 4. Fallback Strategies

```typescript
async function ragWithFallback(query: string, env: Env) {
  const results = await searchVectorize(query, env);

  if (results.matches.length === 0 || results.matches[0].score < 0.7) {
    // Fallback: Use LLM without RAG
    return await generateResponse('', query, env.GEMINI_API_KEY);
  }

  // Normal RAG flow
  const context = results.matches.map(m => m.metadata?.text).join('\n\n');
  return await generateResponse(context, query, env.GEMINI_API_KEY);
}
```

---

## Performance Optimization

### 1. Caching

```typescript
// Cache embeddings in memory (per Worker isolate; use KV for a shared cache)
const embeddingCache = new Map<string, number[]>();

async function getCachedEmbedding(text: string, apiKey: string) {
  const key = await hashText(text); // see the hashText sketch below

  if (embeddingCache.has(key)) {
    return embeddingCache.get(key)!;
  }

  const embedding = await generateEmbedding(text, apiKey, 'RETRIEVAL_QUERY');
  embeddingCache.set(key, embedding);

  return embedding;
}
```
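A sketch of `hashText` using the Web Crypto API that Workers expose (hence the `await` above): SHA-256 of the input, hex-encoded, as a stable cache key:

```typescript
async function hashText(text: string): Promise<string> {
  const digest = await crypto.subtle.digest('SHA-256', new TextEncoder().encode(text));
  return [...new Uint8Array(digest)]
    .map(b => b.toString(16).padStart(2, '0'))
    .join('');
}
```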
### 2. Batch Processing

```typescript
// Ingest documents in parallel
async function batchIngest(documents: Document[], env: Env, concurrency = 5) {
  for (let i = 0; i < documents.length; i += concurrency) {
    const batch = documents.slice(i, i + concurrency);

    await Promise.all(
      batch.map(doc => ingestDocument(doc, env))
    );
  }
}
```

---

## Common Pitfalls

### ❌ Don't: Use same task type for queries and documents

```typescript
// Wrong
const embedding = await generateEmbedding(query, apiKey, 'RETRIEVAL_DOCUMENT');
```

### ✅ Do: Use correct task types

```typescript
// Correct
const queryEmbedding = await generateEmbedding(query, apiKey, 'RETRIEVAL_QUERY');
const docEmbedding = await generateEmbedding(doc, apiKey, 'RETRIEVAL_DOCUMENT');
```

### ❌ Don't: Return too many or too few results

```typescript
// Too few (might miss relevant info)
const results = await env.VECTORIZE.query(embedding, { topK: 1 });

// Too many (noise, cost)
const results = await env.VECTORIZE.query(embedding, { topK: 50 });
```

### ✅ Do: Find optimal topK for your use case

```typescript
// Test different topK values
const topK = 5; // Good default for most use cases
const results = await env.VECTORIZE.query(embedding, { topK });
```

---

## Complete Example

See `templates/rag-with-vectorize.ts` for a production-ready implementation combining these patterns.

---

## Official Documentation

- **Gemini Embeddings**: https://ai.google.dev/gemini-api/docs/embeddings
- **Vectorize**: https://developers.cloudflare.com/vectorize/
- **RAG Best Practices**: https://ai.google.dev/gemini-api/docs/document-processing
# Top 8 Embedding Errors (And How to Fix Them)

This document lists the 8 most common errors when working with Gemini embeddings, their root causes, and proven solutions.

---

## Error 1: Dimension Mismatch

### Error Message
```
Error: Vector dimensions do not match. Expected 768, got 3072
```

### Why It Happens
- Generated embedding with default dimensions (3072) but Vectorize index expects 768
- Mixed embeddings from different dimension settings

### Root Cause
Not specifying the `outputDimensionality` parameter when generating embeddings.

### Prevention
```typescript
// ❌ BAD: No outputDimensionality (defaults to 3072)
const embedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  content: text
});

// ✅ GOOD: Match Vectorize index dimensions
const embedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  content: text,
  config: { outputDimensionality: 768 } // ← Match your index
});
```

### Fix
1. **Option A**: Regenerate embeddings with correct dimensions
2. **Option B**: Recreate Vectorize index with 3072 dimensions

```bash
# Recreate index with correct dimensions
npx wrangler vectorize create my-index --dimensions 768 --metric cosine
```

**Sources**:
- https://ai.google.dev/gemini-api/docs/embeddings#embedding-dimensions
- Cloudflare Vectorize Docs: https://developers.cloudflare.com/vectorize/

---

## Error 2: Batch Size Limit Exceeded

### Error Message
```
Error: Request contains too many texts. Maximum: 100
```

### Why It Happens
- Tried to embed more texts than the API allows in a single request
- Different limits for single vs batch endpoints

### Root Cause
The Gemini API limits the number of texts per batch request.

### Prevention
```typescript
// ❌ BAD: Trying to embed 500 texts at once
const embeddings = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: largeArray, // 500 texts
  config: { taskType: 'RETRIEVAL_DOCUMENT' }
});

// ✅ GOOD: Chunk into batches
async function batchEmbed(texts: string[], batchSize = 100) {
  const allEmbeddings: number[][] = [];

  for (let i = 0; i < texts.length; i += batchSize) {
    const batch = texts.slice(i, i + batchSize);
    const response = await ai.models.embedContent({
      model: 'gemini-embedding-001',
      contents: batch,
      config: { taskType: 'RETRIEVAL_DOCUMENT', outputDimensionality: 768 }
    });
    allEmbeddings.push(...response.embeddings.map(e => e.values));

    // Rate limiting delay
    if (i + batchSize < texts.length) {
      await new Promise(resolve => setTimeout(resolve, 1000));
    }
  }

  return allEmbeddings;
}
```

**Sources**:
- Gemini API Limits: https://ai.google.dev/gemini-api/docs/rate-limits

---

## Error 3: Rate Limiting (429 Too Many Requests)

### Error Message
```
Error: 429 Too Many Requests - Rate limit exceeded
```

### Why It Happens
- Exceeded 100 requests per minute (free tier)
- Exceeded tokens per minute limit
- No exponential backoff implemented

### Root Cause
Free tier rate limits: 100 RPM, 30k TPM, 1k RPD

### Prevention
```typescript
// ❌ BAD: No rate limiting
for (const text of texts) {
  await ai.models.embedContent({ /* ... */ }); // Will hit 429 after 100 requests
}

// ✅ GOOD: Exponential backoff
async function embedWithRetry(text: string, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await ai.models.embedContent({
        model: 'gemini-embedding-001',
        content: text,
        config: { taskType: 'SEMANTIC_SIMILARITY', outputDimensionality: 768 }
      });
    } catch (error: any) {
      if (error.status === 429 && attempt < maxRetries - 1) {
        const delay = Math.pow(2, attempt) * 1000; // 1s, 2s, 4s
        console.log(`Rate limit hit. Retrying in ${delay / 1000}s...`);
        await new Promise(resolve => setTimeout(resolve, delay));
        continue;
      }
      throw error;
    }
  }
}
```

**Rate Limits**:
| Tier | RPM | TPM | RPD |
|------|-----|-----|-----|
| Free | 100 | 30,000 | 1,000 |
| Tier 1 | 3,000 | 1,000,000 | - |

**Sources**:
- https://ai.google.dev/gemini-api/docs/rate-limits
## Error 4: Text Truncation (Input Length Limit)

### Error Message
No error! Text is **silently truncated** at 2,048 tokens.

### Why It Happens
- Input text exceeds the 2,048-token limit
- No warning or error is raised
- Embeddings represent incomplete text

### Root Cause
The Gemini embedding model has a 2,048-token input limit.

### Prevention
```typescript
// ❌ BAD: Long text (silently truncated)
const longText = "...".repeat(10000); // Very long
const embedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  content: longText // Truncated to ~2,048 tokens
});

// ✅ GOOD: Chunk long texts
function chunkText(text: string, maxTokens = 2000): string[] {
  const words = text.split(/\s+/);
  const chunks: string[] = [];
  let currentChunk: string[] = [];

  for (const word of words) {
    currentChunk.push(word);

    // Rough estimate: 1 token ≈ 0.75 words, so a budget of maxTokens
    // tokens corresponds to about maxTokens * 0.75 words
    if (currentChunk.length >= maxTokens * 0.75) {
      chunks.push(currentChunk.join(' '));
      currentChunk = [];
    }
  }

  if (currentChunk.length > 0) {
    chunks.push(currentChunk.join(' '));
  }

  return chunks;
}

const chunks = chunkText(longText, 2000);
const embeddings = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: chunks,
  config: { taskType: 'RETRIEVAL_DOCUMENT', outputDimensionality: 768 }
});
```

**Sources**:
- https://ai.google.dev/gemini-api/docs/models/gemini#gemini-embedding-001
---

## Error 5: Cosine Similarity Calculation Errors

### Error Message
```
Error: Similarity values out of range (-1.5 to 1.2)
```

### Why It Happens
- Incorrect formula (using dot product instead of cosine similarity)
- Not normalizing by magnitudes
- Division by zero for zero vectors

### Root Cause
Improper implementation of the cosine similarity formula.

### Prevention
```typescript
// ❌ BAD: Just dot product (not cosine similarity)
function badSimilarity(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += a[i] * b[i];
  }
  return sum; // Wrong! This is unbounded
}

// ✅ GOOD: Proper cosine similarity
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Vector dimensions must match');
  }

  let dotProduct = 0;
  let magnitudeA = 0;
  let magnitudeB = 0;

  for (let i = 0; i < a.length; i++) {
    dotProduct += a[i] * b[i];
    magnitudeA += a[i] * a[i];
    magnitudeB += b[i] * b[i];
  }

  if (magnitudeA === 0 || magnitudeB === 0) {
    return 0; // Handle zero vectors
  }

  return dotProduct / (Math.sqrt(magnitudeA) * Math.sqrt(magnitudeB));
}
```

**Formula**:
```
cosine_similarity(A, B) = (A · B) / (||A|| × ||B||)
```

Where:
- `A · B` = dot product
- `||A||` = magnitude of vector A = √(a₁² + a₂² + ... + aₙ²)

**Result Range**: Always between -1 and 1
- 1 = identical direction
- 0 = perpendicular
- -1 = opposite direction

**Sources**:
- https://en.wikipedia.org/wiki/Cosine_similarity

---

## Error 6: Incorrect Task Type (Reduces Quality)

### Error Message
No error, but search quality is poor (10-30% worse).

### Why It Happens
- Using `RETRIEVAL_DOCUMENT` for queries
- Using `RETRIEVAL_QUERY` for documents
- Not specifying a task type at all

### Root Cause
Task types optimize embeddings for specific use cases.

### Prevention
```typescript
// ❌ BAD: Wrong task type for RAG
const queryEmbedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  content: userQuery,
  config: { taskType: 'RETRIEVAL_DOCUMENT' } // ← Wrong! Should be RETRIEVAL_QUERY
});

// ✅ GOOD: Correct task types
// For user queries
const queryEmbedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  content: userQuery,
  config: { taskType: 'RETRIEVAL_QUERY', outputDimensionality: 768 }
});

// For documents to index
const docEmbedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  content: documentText,
  config: { taskType: 'RETRIEVAL_DOCUMENT', outputDimensionality: 768 }
});
```

**Task Types Cheat Sheet**:
| Task Type | Use For | Example |
|-----------|---------|---------|
| `RETRIEVAL_QUERY` | User queries | "What is RAG?" |
| `RETRIEVAL_DOCUMENT` | Documents to index | Knowledge base articles |
| `SEMANTIC_SIMILARITY` | Comparing texts | Duplicate detection |
| `CLUSTERING` | Grouping texts | Topic modeling |
| `CLASSIFICATION` | Categorizing texts | Spam detection |

**Impact**: Using the correct task type improves search relevance by 10-30%.

**Sources**:
- https://ai.google.dev/gemini-api/docs/embeddings#task-types

---

## Error 7: Vector Storage Precision Loss

### Error Message
```
Warning: Similarity scores inconsistent after storage/retrieval
```

### Why It Happens
- Storing embeddings as integers instead of floats
- Rounding to fewer decimal places
- Using lossy compression

### Root Cause
Embeddings are high-precision floating-point numbers.

### Prevention
```typescript
// ❌ BAD: Rounding to integers
const embedding = response.embedding.values;
const rounded = embedding.map(v => Math.round(v)); // Precision loss!

await db.insert({
  id: '1',
  embedding: rounded // ← Will degrade search quality
});

// ✅ GOOD: Store full precision
const embedding = response.embedding.values; // Keep as-is

await db.insert({
  id: '1',
  embedding: embedding // ← Full float32 precision
});

// For JSON storage, use full precision
const json = JSON.stringify({
  id: '1',
  embedding: embedding // JavaScript numbers are float64
});
```

**Storage Recommendations**:
- **Vectorize**: Handles float32 automatically ✅
- **D1/SQLite**: Use BLOB for a binary float32 array (see the sketch below)
- **KV**: Store as JSON (float64 precision)
- **R2**: Store as a binary float32 array
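A minimal sketch of the float32 round-trip for BLOB-style storage, using only standard typed arrays:

```typescript
// Pack an embedding into a binary float32 buffer (e.g., a D1 BLOB or R2 object)...
function embeddingToBlob(embedding: number[]): ArrayBuffer {
  return new Float32Array(embedding).buffer;
}

// ...and unpack it again on read.
function blobToEmbedding(blob: ArrayBuffer): number[] {
  return Array.from(new Float32Array(blob));
}
```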
**Sources**:
- Cloudflare Vectorize: https://developers.cloudflare.com/vectorize/

---

## Error 8: Model Version Confusion

### Error Message
```
Error: Model 'gemini-embedding-exp-03-07' is deprecated
```

### Why It Happens
- Using an experimental or deprecated model
- Mixing embeddings from different model versions
- Not keeping up with model updates

### Root Cause
Gemini has stable and experimental embedding models.

### Prevention
```typescript
// ❌ BAD: Using experimental/deprecated model
const embedding = await ai.models.embedContent({
  model: 'gemini-embedding-exp-03-07', // Deprecated October 2025
  content: text
});

// ✅ GOOD: Use stable model
const embedding = await ai.models.embedContent({
  model: 'gemini-embedding-001', // Stable production model
  content: text,
  config: {
    taskType: 'SEMANTIC_SIMILARITY',
    outputDimensionality: 768
  }
});
```

**Model Status**:
| Model | Status | Recommendation |
|-------|--------|----------------|
| `gemini-embedding-001` | ✅ Stable | Use this |
| `gemini-embedding-exp-03-07` | ❌ Deprecated (Oct 2025) | Migrate to gemini-embedding-001 |

**CRITICAL**: Never mix embeddings from different models. They use different vector spaces and are not comparable.

**Sources**:
- https://ai.google.dev/gemini-api/docs/models/gemini#text-embeddings

---

## Summary Checklist

Before deploying to production, verify:

- [ ] `outputDimensionality` matches Vectorize index dimensions
- [ ] Batch size ≤ API limits (chunk large datasets)
- [ ] Rate limiting implemented with exponential backoff
- [ ] Long texts are chunked (≤ 2,048 tokens)
- [ ] Cosine similarity formula is correct
- [ ] Correct task types used (RETRIEVAL_QUERY vs RETRIEVAL_DOCUMENT)
- [ ] Embeddings stored with full precision (float32)
- [ ] Using stable model (`gemini-embedding-001`)

**Following these guidelines prevents all eight errors documented above.**

---

## Additional Resources

- **Official Docs**: https://ai.google.dev/gemini-api/docs/embeddings
- **Rate Limits**: https://ai.google.dev/gemini-api/docs/rate-limits
- **Vectorize Docs**: https://developers.cloudflare.com/vectorize/
- **Model Specs**: https://ai.google.dev/gemini-api/docs/models/gemini#gemini-embedding-001
# Cloudflare Vectorize Integration

Complete guide for using Gemini embeddings with Cloudflare Vectorize.

---

## Quick Start

### 1. Create Vectorize Index

```bash
# Create index with 768 dimensions (recommended for Gemini)
npx wrangler vectorize create gemini-embeddings --dimensions 768 --metric cosine

# Alternative: 3072 dimensions (Gemini default, more accurate but larger)
npx wrangler vectorize create gemini-embeddings-large --dimensions 3072 --metric cosine
```

### 2. Bind to Worker

Add to `wrangler.jsonc` (note that `vectorize` takes an array of bindings):

```jsonc
{
  "name": "my-rag-worker",
  "main": "src/index.ts",
  "compatibility_date": "2025-10-25",
  "vectorize": [
    {
      "binding": "VECTORIZE",
      "index_name": "gemini-embeddings"
    }
  ]
}
```
### 3. Generate and Store Embeddings

```typescript
// Generate embedding
const response = await fetch(
  'https://generativelanguage.googleapis.com/v1beta/models/gemini-embedding-001:embedContent',
  {
    method: 'POST',
    headers: {
      'x-goog-api-key': env.GEMINI_API_KEY,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      content: { parts: [{ text: 'Your document text' }] },
      taskType: 'RETRIEVAL_DOCUMENT',
      outputDimensionality: 768 // MUST match index dimensions
    })
  }
);

const data = await response.json();
const embedding = data.embedding.values;

// Insert into Vectorize
await env.VECTORIZE.insert([{
  id: 'doc-1',
  values: embedding,
  metadata: { text: 'Your document text', source: 'manual' }
}]);
```

---

## Dimension Configuration

**CRITICAL**: Embedding dimensions MUST match Vectorize index dimensions.

| Gemini Dimensions | Storage (per vector) | Recommended For |
|-------------------|----------------------|-----------------|
| 768 | 3 KB | Most use cases, cost-effective |
| 1536 | 6 KB | Balance accuracy/storage |
| 3072 | 12 KB | Maximum accuracy |

**Create index to match your embeddings**:

```bash
# For 768-dim embeddings
npx wrangler vectorize create my-index --dimensions 768 --metric cosine

# For 1536-dim embeddings
npx wrangler vectorize create my-index --dimensions 1536 --metric cosine

# For 3072-dim embeddings (Gemini default)
npx wrangler vectorize create my-index --dimensions 3072 --metric cosine
```

---

## Metric Selection

Vectorize supports 3 distance metrics:

### Cosine (Recommended)

```bash
npx wrangler vectorize create my-index --dimensions 768 --metric cosine
```

**When to use**:
- ✅ Semantic search (most common)
- ✅ Document similarity
- ✅ RAG systems

**Range**: -1 (opposite) to 1 (identical); scores for typical text embeddings fall between 0 and 1

### Euclidean

```bash
npx wrangler vectorize create my-index --dimensions 768 --metric euclidean
```

**When to use**:
- ✅ Absolute distance matters
- ✅ Magnitude is important

**Range**: 0 (identical) to ∞ (very different)

### Dot Product

```bash
npx wrangler vectorize create my-index --dimensions 768 --metric dot-product
```

**When to use**:
- ✅ Pre-normalized vectors
- ✅ Performance optimization

**Range**: -1 to 1 (for normalized vectors)

**Recommendation**: Use **cosine** for Gemini embeddings (most common and intuitive).

---

## Insert Patterns

### Single Insert

```typescript
await env.VECTORIZE.insert([{
  id: 'doc-1',
  values: embedding,
  metadata: {
    text: 'Document content',
    timestamp: Date.now(),
    category: 'documentation'
  }
}]);
```

### Batch Insert

```typescript
const vectors = documents.map((doc, i) => ({
  id: `doc-${i}`,
  values: doc.embedding,
  metadata: { text: doc.text }
}));

// Insert up to 100 vectors at once
await env.VECTORIZE.insert(vectors);
```

### Upsert (Update or Insert)

```typescript
// Use upsert to overwrite an existing ID; insert() keeps the existing
// vector when the ID is already present
await env.VECTORIZE.upsert([{
  id: 'doc-1', // Existing ID
  values: newEmbedding,
  metadata: { text: 'Updated content' }
}]);
```
---

## Query Patterns

### Basic Query

```typescript
const results = await env.VECTORIZE.query(queryEmbedding, {
  topK: 5
});

console.log(results.matches);
// [{ id: 'doc-1', score: 0.95 }, ...]
```

### Query with Metadata

```typescript
const results = await env.VECTORIZE.query(queryEmbedding, {
  topK: 5,
  returnMetadata: true
});

results.matches.forEach(match => {
  console.log(match.id);            // 'doc-1'
  console.log(match.score);         // 0.95
  console.log(match.metadata.text); // 'Document content'
});
```

### Query with Metadata Filtering

```typescript
// Requires a metadata index on the filtered property; see the Vectorize
// docs for creating metadata indexes
const results = await env.VECTORIZE.query(queryEmbedding, {
  topK: 5,
  filter: { category: 'documentation' }
});
```

---

## Metadata Best Practices

### What to Store

```typescript
await env.VECTORIZE.insert([{
  id: 'doc-1',
  values: embedding,
  metadata: {
    // ✅ Store these
    text: 'The actual document content', // For retrieval
    title: 'Document title',
    url: 'https://example.com/doc',
    timestamp: Date.now(),
    category: 'product',

    // ❌ Don't store these
    embedding: embedding, // Already stored as values
    largeObject: { /* ... */ } // Keep metadata small
  }
}]);
```

### Metadata Limits

- **Max size**: ~1 KB per vector
- **Best practice**: Store only what you need for retrieval/display
- **For large data**: Store minimal metadata and fetch the full record from D1/KV by ID (see the sketch below)
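A sketch of that split, assuming a KV namespace bound as `env.KV` alongside `env.VECTORIZE` (add `KV: KVNamespace` to your `Env` interface):

```typescript
// Keep Vectorize metadata tiny; the full document lives in KV under its ID.
async function ingestSlim(id: string, text: string, embedding: number[], env: Env) {
  await env.KV.put(`doc:${id}`, text);
  await env.VECTORIZE.insert([{
    id,
    values: embedding,
    metadata: { preview: text.slice(0, 200) }
  }]);
}

// Hydrate matches with their full text at query time.
async function hydrateMatches(queryEmbedding: number[], env: Env) {
  const results = await env.VECTORIZE.query(queryEmbedding, { topK: 5 });
  return Promise.all(
    results.matches.map(async m => ({
      id: m.id,
      score: m.score,
      text: await env.KV.get(`doc:${m.id}`)
    }))
  );
}
```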
---

## Complete RAG Example

```typescript
interface Env {
  GEMINI_API_KEY: string;
  VECTORIZE: VectorizeIndex;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const url = new URL(request.url);

    // Ingest: POST /ingest with { text: "..." }
    if (url.pathname === '/ingest' && request.method === 'POST') {
      const { text } = await request.json();

      // 1. Generate embedding
      const embeddingRes = await fetch(
        'https://generativelanguage.googleapis.com/v1beta/models/gemini-embedding-001:embedContent',
        {
          method: 'POST',
          headers: {
            'x-goog-api-key': env.GEMINI_API_KEY,
            'Content-Type': 'application/json'
          },
          body: JSON.stringify({
            content: { parts: [{ text }] },
            taskType: 'RETRIEVAL_DOCUMENT',
            outputDimensionality: 768
          })
        }
      );

      const embeddingData = await embeddingRes.json();
      const embedding = embeddingData.embedding.values;

      // 2. Store in Vectorize
      await env.VECTORIZE.insert([{
        id: `doc-${Date.now()}`,
        values: embedding,
        metadata: { text, timestamp: Date.now() }
      }]);

      return new Response(JSON.stringify({ success: true }));
    }

    // Query: POST /query with { query: "..." }
    if (url.pathname === '/query' && request.method === 'POST') {
      const { query } = await request.json();

      // 1. Generate query embedding
      const embeddingRes = await fetch(
        'https://generativelanguage.googleapis.com/v1beta/models/gemini-embedding-001:embedContent',
        {
          method: 'POST',
          headers: {
            'x-goog-api-key': env.GEMINI_API_KEY,
            'Content-Type': 'application/json'
          },
          body: JSON.stringify({
            content: { parts: [{ text: query }] },
            taskType: 'RETRIEVAL_QUERY',
            outputDimensionality: 768
          })
        }
      );

      const embeddingData = await embeddingRes.json();
      const embedding = embeddingData.embedding.values;

      // 2. Search Vectorize
      const results = await env.VECTORIZE.query(embedding, {
        topK: 5,
        returnMetadata: true
      });

      return new Response(JSON.stringify({
        query,
        results: results.matches.map(m => ({
          id: m.id,
          score: m.score,
          text: m.metadata?.text
        }))
      }));
    }

    return new Response('Not found', { status: 404 });
  }
};
```

---

## Index Management

### List Indexes

```bash
npx wrangler vectorize list
```

### Get Index Info

```bash
npx wrangler vectorize get gemini-embeddings
```

### Delete Index

```bash
npx wrangler vectorize delete gemini-embeddings
```

**CRITICAL**: Deleting an index deletes all vectors permanently.

---

## Limitations & Quotas

| Feature | Free Plan | Paid Plans |
|---------|-----------|------------|
| Indexes per account | 100 | 100 |
| Vectors per index | 200,000 | 5,000,000+ |
| Queries per day | 30,000,000 | Unlimited |
| Dimensions | Up to 1536 | Up to 3072 |

**Source**: https://developers.cloudflare.com/vectorize/platform/pricing/

---

## Best Practices

### 1. Choose Dimensions Wisely

```typescript
// ✅ 768 dimensions (recommended)
// - Good accuracy
// - Low storage
// - Fast queries

// ⚠️ 3072 dimensions (if accuracy is critical)
// - Best accuracy
// - 4x storage
// - Slower queries
```

### 2. Use Metadata for Context

```typescript
await env.VECTORIZE.insert([{
  id: 'doc-1',
  values: embedding,
  metadata: {
    text: 'Store the actual text here for retrieval',
    url: 'https://...',
    timestamp: Date.now()
  }
}]);
```

### 3. Implement Caching

```typescript
// Cache embeddings in KV (textHash = a stable key, e.g. a SHA-256 hex
// digest of the text)
const cached = await env.KV.get(`embedding:${textHash}`);
if (cached) {
  return JSON.parse(cached);
}

const embedding = await generateEmbedding(text);
await env.KV.put(`embedding:${textHash}`, JSON.stringify(embedding), {
  expirationTtl: 86400 // 24 hours
});
```

### 4. Monitor Usage

```bash
# Check index stats
npx wrangler vectorize get gemini-embeddings

# Shows:
# - Total vectors
# - Dimensions
# - Metric type
```

---

## Troubleshooting

### Dimension Mismatch Error

```
Error: Vector dimensions do not match. Expected 768, got 3072
```

**Solution**: Ensure the embedding `outputDimensionality` matches the index dimensions.

### No Results Found

**Possible causes**:
1. Index is empty (no vectors inserted)
2. Query embedding uses the wrong task type (use RETRIEVAL_QUERY)
3. Similarity threshold too high

**Solution**: Check that the index has vectors and that the correct task types are used.

---

## Official Documentation

- **Vectorize Docs**: https://developers.cloudflare.com/vectorize/
- **Pricing**: https://developers.cloudflare.com/vectorize/platform/pricing/
- **Wrangler CLI**: https://developers.cloudflare.com/workers/wrangler/