# Top 8 Embedding Errors (And How to Fix Them)
This document lists the 8 most common errors when working with Gemini embeddings, their root causes, and proven solutions.
---
## Error 1: Dimension Mismatch
### Error Message
```
Error: Vector dimensions do not match. Expected 768, got 3072
```
### Why It Happens
- Generated embedding with default dimensions (3072) but Vectorize index expects 768
- Mixed embeddings from different dimension settings
### Root Cause
Not specifying `outputDimensionality` parameter when generating embeddings.
### Prevention
```typescript
// ❌ BAD: No outputDimensionality (defaults to 3072)
const embedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: text
});

// ✅ GOOD: Match Vectorize index dimensions
const embedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: text,
  config: { outputDimensionality: 768 } // ← Match your index
});
```
### Fix
1. **Option A**: Regenerate the embeddings with `outputDimensionality` set to match your index
2. **Option B**: Recreate the Vectorize index so its dimensions match your embeddings
```bash
# Recreate index with correct dimensions
npx wrangler vectorize create my-index --dimensions 768 --metric cosine
```
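A cheap runtime guard catches mismatches before a vector ever reaches the index. A minimal sketch; the `assertDimensions` helper and `EXPECTED_DIMS` constant are illustrative, not part of any SDK:

```typescript
// Expected dimensionality of the Vectorize index (assumption: 768).
const EXPECTED_DIMS = 768;

// Throw before upserting, rather than letting the index reject the vector.
function assertDimensions(vector: number[], expected: number = EXPECTED_DIMS): number[] {
  if (vector.length !== expected) {
    throw new Error(
      `Vector dimensions do not match. Expected ${expected}, got ${vector.length}`
    );
  }
  return vector;
}
```

Run every embedding through the guard before upserting, so a misconfigured `outputDimensionality` fails loudly at generation time instead of at query time.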
**Sources**:
- https://ai.google.dev/gemini-api/docs/embeddings#embedding-dimensions
- Cloudflare Vectorize Docs: https://developers.cloudflare.com/vectorize/
---
## Error 2: Batch Size Limit Exceeded
### Error Message
```
Error: Request contains too many texts. Maximum: 100
```
### Why It Happens
- Tried to embed more texts than API allows in single request
- Different limits for single vs batch endpoints
### Root Cause
Gemini API limits the number of texts per batch request.
### Prevention
```typescript
// ❌ BAD: Trying to embed 500 texts at once
const embeddings = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: largeArray, // 500 texts
  config: { taskType: 'RETRIEVAL_DOCUMENT' }
});

// ✅ GOOD: Chunk into batches
async function batchEmbed(texts: string[], batchSize = 100) {
  const allEmbeddings: number[][] = [];
  for (let i = 0; i < texts.length; i += batchSize) {
    const batch = texts.slice(i, i + batchSize);
    const response = await ai.models.embedContent({
      model: 'gemini-embedding-001',
      contents: batch,
      config: { taskType: 'RETRIEVAL_DOCUMENT', outputDimensionality: 768 }
    });
    allEmbeddings.push(...response.embeddings.map(e => e.values));
    // Rate-limiting delay between batches
    if (i + batchSize < texts.length) {
      await new Promise(resolve => setTimeout(resolve, 1000));
    }
  }
  return allEmbeddings;
}
```
**Sources**:
- Gemini API Limits: https://ai.google.dev/gemini-api/docs/rate-limits
---
## Error 3: Rate Limiting (429 Too Many Requests)
### Error Message
```
Error: 429 Too Many Requests - Rate limit exceeded
```
### Why It Happens
- Exceeded 100 requests per minute (free tier)
- Exceeded tokens per minute limit
- No exponential backoff implemented
### Root Cause
Free tier rate limits: 100 RPM, 30k TPM, 1k RPD
### Prevention
```typescript
// ❌ BAD: No rate limiting
for (const text of texts) {
  await ai.models.embedContent({ /* ... */ }); // Will hit 429 after 100 requests
}

// ✅ GOOD: Exponential backoff
async function embedWithRetry(text: string, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await ai.models.embedContent({
        model: 'gemini-embedding-001',
        contents: text,
        config: { taskType: 'SEMANTIC_SIMILARITY', outputDimensionality: 768 }
      });
    } catch (error: any) {
      if (error.status === 429 && attempt < maxRetries - 1) {
        const delay = Math.pow(2, attempt) * 1000; // 1s, 2s, 4s
        console.log(`Rate limit hit. Retrying in ${delay / 1000}s...`);
        await new Promise(resolve => setTimeout(resolve, delay));
        continue;
      }
      throw error;
    }
  }
}
```
**Rate Limits**:
| Tier | RPM | TPM | RPD |
|------|-----|-----|-----|
| Free | 100 | 30,000 | 1,000 |
| Tier 1 | 3,000 | 1,000,000 | - |
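For steady bulk jobs, proactively pacing requests is often simpler than reacting to 429s. A minimal sketch of deriving the delay from the table above; the `delayForRpm` helper is illustrative:

```typescript
// Minimum delay (ms) between sequential requests to stay under an RPM limit.
function delayForRpm(rpm: number): number {
  if (rpm <= 0) throw new Error('rpm must be positive');
  return Math.ceil(60_000 / rpm);
}

// At the free tier's 100 RPM, wait at least 600 ms between requests.
const pause = delayForRpm(100);
```

Pair this with the retry logic above: pacing keeps you under the limit in the common case, and backoff handles the occasional burst.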
**Sources**:
- https://ai.google.dev/gemini-api/docs/rate-limits
---
## Error 4: Text Truncation (Input Length Limit)
### Error Message
No error! Text is **silently truncated** at 2,048 tokens.
### Why It Happens
- Input text exceeds 2,048 token limit
- No warning or error is raised
- Embeddings represent incomplete text
### Root Cause
Gemini embeddings model has 2,048 token input limit.
### Prevention
```typescript
// ❌ BAD: Long text (silently truncated)
const longText = "...".repeat(10000); // Very long
const embedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: longText // Truncated to ~2,048 tokens
});

// ✅ GOOD: Chunk long texts
function chunkText(text: string, maxTokens = 2000): string[] {
  const words = text.split(/\s+/);
  const chunks: string[] = [];
  let currentChunk: string[] = [];
  for (const word of words) {
    currentChunk.push(word);
    // Rough estimate: 1 token ≈ 0.75 words, so tokens ≈ words / 0.75
    if (currentChunk.length / 0.75 >= maxTokens) {
      chunks.push(currentChunk.join(' '));
      currentChunk = [];
    }
  }
  if (currentChunk.length > 0) {
    chunks.push(currentChunk.join(' '));
  }
  return chunks;
}

const chunks = chunkText(longText, 2000);
const embeddings = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: chunks,
  config: { taskType: 'RETRIEVAL_DOCUMENT', outputDimensionality: 768 }
});
```
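For RAG, chunking with overlap usually retrieves better than hard cuts, because sentences at chunk boundaries keep some surrounding context. A sliding-window variant of the idea above; the word-count sizing uses the same rough heuristic (not a real tokenizer), and `chunkWithOverlap` is an illustrative helper:

```typescript
// Split text into word-based chunks where consecutive chunks share
// `overlap` words. maxWords ≈ 0.75 × the token budget.
function chunkWithOverlap(text: string, maxWords = 1500, overlap = 200): string[] {
  if (overlap >= maxWords) throw new Error('overlap must be smaller than maxWords');
  const words = text.split(/\s+/).filter(w => w.length > 0);
  if (words.length <= maxWords) return [words.join(' ')];
  const chunks: string[] = [];
  const step = maxWords - overlap;
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + maxWords).join(' '));
    if (start + maxWords >= words.length) break; // last window reached the end
  }
  return chunks;
}
```

The overlap costs some extra embedding calls, but it means a fact straddling a boundary appears whole in at least one chunk.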
**Sources**:
- https://ai.google.dev/gemini-api/docs/models/gemini#gemini-embedding-001
---
## Error 5: Cosine Similarity Calculation Errors
### Error Message
```
Error: Similarity values out of range (-1.5 to 1.2)
```
### Why It Happens
- Incorrect formula (using dot product instead of cosine similarity)
- Not normalizing magnitudes
- Division by zero for zero vectors
### Root Cause
Improper implementation of cosine similarity formula.
### Prevention
```typescript
// ❌ BAD: Just dot product (not cosine similarity)
function badSimilarity(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += a[i] * b[i];
  }
  return sum; // Wrong! This is unbounded
}

// ✅ GOOD: Proper cosine similarity
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Vector dimensions must match');
  }
  let dotProduct = 0;
  let magnitudeA = 0;
  let magnitudeB = 0;
  for (let i = 0; i < a.length; i++) {
    dotProduct += a[i] * b[i];
    magnitudeA += a[i] * a[i];
    magnitudeB += b[i] * b[i];
  }
  if (magnitudeA === 0 || magnitudeB === 0) {
    return 0; // Handle zero vectors
  }
  return dotProduct / (Math.sqrt(magnitudeA) * Math.sqrt(magnitudeB));
}
```
**Formula**:
```
cosine_similarity(A, B) = (A · B) / (||A|| × ||B||)
```
Where:
- `A · B` = dot product
- `||A||` = magnitude of vector A = √(a₁² + a₂² + ... + aₙ²)
**Result Range**: Always between -1 and 1
- 1 = identical direction
- 0 = perpendicular
- -1 = opposite direction
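Google's embedding docs note that only the full 3,072-dimension output is pre-normalized; truncated outputs such as 768 generally are not, so normalize them yourself if you want dot product to equal cosine similarity. A minimal sketch (the `normalize` helper is illustrative):

```typescript
// L2-normalize a vector so that dot product equals cosine similarity.
function normalize(v: number[]): number[] {
  const magnitude = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  if (magnitude === 0) return v.slice(); // leave zero vectors untouched
  return v.map(x => x / magnitude);
}
```

Normalizing once at indexing time also keeps stored vectors directly comparable with a plain dot product.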
**Sources**:
- https://en.wikipedia.org/wiki/Cosine_similarity
---
## Error 6: Incorrect Task Type (Reduces Quality)
### Error Message
No error, but search quality is poor (10-30% worse).
### Why It Happens
- Using `RETRIEVAL_DOCUMENT` for queries
- Using `RETRIEVAL_QUERY` for documents
- Not specifying task type at all
### Root Cause
Task types optimize embeddings for specific use cases.
### Prevention
```typescript
// ❌ BAD: Wrong task type for RAG
const queryEmbedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: userQuery,
  config: { taskType: 'RETRIEVAL_DOCUMENT' } // ← Wrong! Should be RETRIEVAL_QUERY
});

// ✅ GOOD: Correct task types
// For user queries
const queryEmbedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: userQuery,
  config: { taskType: 'RETRIEVAL_QUERY', outputDimensionality: 768 }
});

// For documents to index
const docEmbedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: documentText,
  config: { taskType: 'RETRIEVAL_DOCUMENT', outputDimensionality: 768 }
});
```
**Task Types Cheat Sheet**:
| Task Type | Use For | Example |
|-----------|---------|---------|
| `RETRIEVAL_QUERY` | User queries | "What is RAG?" |
| `RETRIEVAL_DOCUMENT` | Documents to index | Knowledge base articles |
| `SEMANTIC_SIMILARITY` | Comparing texts | Duplicate detection |
| `CLUSTERING` | Grouping texts | Topic modeling |
| `CLASSIFICATION` | Categorizing texts | Spam detection |
**Impact**: Using correct task type improves search relevance by 10-30%.
**Sources**:
- https://ai.google.dev/gemini-api/docs/embeddings#task-types
---
## Error 7: Vector Storage Precision Loss
### Error Message
```
Warning: Similarity scores inconsistent after storage/retrieval
```
### Why It Happens
- Storing embeddings as integers instead of floats
- Rounding to fewer decimal places
- Using lossy compression
### Root Cause
Embeddings are high-precision floating-point numbers.
### Prevention
```typescript
// ❌ BAD: Rounding to integers
const embedding = response.embeddings[0].values;
const rounded = embedding.map(v => Math.round(v)); // Precision loss!
await db.insert({
  id: '1',
  embedding: rounded // ← Will degrade search quality
});

// ✅ GOOD: Store full precision
const embedding = response.embeddings[0].values; // Keep as-is
await db.insert({
  id: '1',
  embedding: embedding // ← Full float32 precision
});

// For JSON storage, keep full precision
const json = JSON.stringify({
  id: '1',
  embedding: embedding // JavaScript numbers are float64
});
```
**Storage Recommendations**:
- **Vectorize**: Handles float32 automatically ✅
- **D1/SQLite**: Use BLOB for binary float32 array
- **KV**: Store as JSON (float64 precision)
- **R2**: Store as binary float32 array
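For the D1/SQLite and R2 cases, a `Float32Array` round-trip keeps full float32 precision in a compact binary form (4 bytes per dimension instead of ~10 as JSON text). A minimal sketch; the storage calls themselves are omitted:

```typescript
// Pack an embedding into a binary float32 buffer for BLOB storage.
function toFloat32Blob(embedding: number[]): Uint8Array {
  return new Uint8Array(new Float32Array(embedding).buffer);
}

// Unpack a stored blob back into a plain number array.
function fromFloat32Blob(blob: Uint8Array): number[] {
  const floats = new Float32Array(blob.buffer, blob.byteOffset, blob.byteLength / 4);
  return Array.from(floats);
}
```

Note that float64 → float32 loses a sliver of precision once, on the first pack; every round-trip after that is lossless.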
**Sources**:
- Cloudflare Vectorize: https://developers.cloudflare.com/vectorize/
---
## Error 8: Model Version Confusion
### Error Message
```
Error: Model 'gemini-embedding-exp-03-07' is deprecated
```
### Why It Happens
- Using experimental or deprecated model
- Mixing embeddings from different model versions
- Not keeping up with model updates
### Root Cause
Gemini has stable and experimental embedding models.
### Prevention
```typescript
// ❌ BAD: Using experimental/deprecated model
const embedding = await ai.models.embedContent({
  model: 'gemini-embedding-exp-03-07', // Deprecated October 2025
  contents: text
});

// ✅ GOOD: Use stable model
const embedding = await ai.models.embedContent({
  model: 'gemini-embedding-001', // Stable production model
  contents: text,
  config: {
    taskType: 'SEMANTIC_SIMILARITY',
    outputDimensionality: 768
  }
});
```
**Model Status**:
| Model | Status | Recommendation |
|-------|--------|----------------|
| `gemini-embedding-001` | ✅ Stable | Use this |
| `gemini-embedding-exp-03-07` | ❌ Deprecated (Oct 2025) | Migrate to gemini-embedding-001 |
**CRITICAL**: Never mix embeddings from different models. They use different vector spaces and are not comparable.
**Sources**:
- https://ai.google.dev/gemini-api/docs/models/gemini#text-embeddings
---
## Summary Checklist
Before deploying to production, verify:
- [ ] `outputDimensionality` matches Vectorize index dimensions
- [ ] Batch size ≤ API limits (chunk large datasets)
- [ ] Rate limiting implemented with exponential backoff
- [ ] Long texts are chunked (≤ 2,048 tokens)
- [ ] Cosine similarity formula is correct
- [ ] Correct task types used (RETRIEVAL_QUERY vs RETRIEVAL_DOCUMENT)
- [ ] Embeddings stored with full precision (float32)
- [ ] Using stable model (`gemini-embedding-001`)
**Following this checklist addresses all eight errors documented above.**
---
## Additional Resources
- **Official Docs**: https://ai.google.dev/gemini-api/docs/embeddings
- **Rate Limits**: https://ai.google.dev/gemini-api/docs/rate-limits
- **Vectorize Docs**: https://developers.cloudflare.com/vectorize/
- **Model Specs**: https://ai.google.dev/gemini-api/docs/models/gemini#gemini-embedding-001