# Top 8 Embedding Errors (And How to Fix Them)
This document lists the 8 most common errors when working with Gemini embeddings, their root causes, and proven solutions.
---
## Error 1: Dimension Mismatch
### Error Message
```
Error: Vector dimensions do not match. Expected 768, got 3072
```
### Why It Happens
- Generated embedding with default dimensions (3072) but Vectorize index expects 768
- Mixed embeddings from different dimension settings
### Root Cause
Not specifying `outputDimensionality` parameter when generating embeddings.
### Prevention
```typescript
// ❌ BAD: No outputDimensionality (defaults to 3072)
const embedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: text
});

// ✅ GOOD: Match Vectorize index dimensions
const embedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: text,
  config: { outputDimensionality: 768 } // ← Match your index
});
```
### Fix
1. **Option A**: Regenerate embeddings with `outputDimensionality: 768` to match the existing index
2. **Option B**: Recreate the Vectorize index with 3072 dimensions to match the embeddings
```bash
# Option B: recreate the index to match the 3072-dimension embeddings
npx wrangler vectorize create my-index --dimensions 3072 --metric cosine
```
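Either way, a cheap runtime guard catches mismatches before they reach Vectorize. A minimal sketch (the `EXPECTED_DIMS` constant and its value of 768 are assumptions for illustration; use your own index's dimensions):

```typescript
// Guard: verify an embedding's length before upserting it into an index.
// EXPECTED_DIMS is an assumption for illustration — match your index config.
const EXPECTED_DIMS = 768;

function assertVectorDimensions(vector: number[], expected: number = EXPECTED_DIMS): number[] {
  if (vector.length !== expected) {
    throw new Error(
      `Vector dimensions do not match. Expected ${expected}, got ${vector.length}`
    );
  }
  return vector;
}

// Usage: assertVectorDimensions(embedding.values) before calling index.upsert(...)
```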
**Sources**:
- https://ai.google.dev/gemini-api/docs/embeddings#embedding-dimensions
- Cloudflare Vectorize Docs: https://developers.cloudflare.com/vectorize/
---
## Error 2: Batch Size Limit Exceeded
### Error Message
```
Error: Request contains too many texts. Maximum: 100
```
### Why It Happens
- Tried to embed more texts than API allows in single request
- Different limits for single vs batch endpoints
### Root Cause
Gemini API limits the number of texts per batch request.
### Prevention
```typescript
// ❌ BAD: Trying to embed 500 texts at once
const embeddings = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: largeArray, // 500 texts
  config: { taskType: 'RETRIEVAL_DOCUMENT' }
});

// ✅ GOOD: Chunk into batches
async function batchEmbed(texts: string[], batchSize = 100) {
  const allEmbeddings: number[][] = [];
  for (let i = 0; i < texts.length; i += batchSize) {
    const batch = texts.slice(i, i + batchSize);
    const response = await ai.models.embedContent({
      model: 'gemini-embedding-001',
      contents: batch,
      config: { taskType: 'RETRIEVAL_DOCUMENT', outputDimensionality: 768 }
    });
    allEmbeddings.push(...response.embeddings.map(e => e.values));

    // Rate limiting delay between batches
    if (i + batchSize < texts.length) {
      await new Promise(resolve => setTimeout(resolve, 1000));
    }
  }
  return allEmbeddings;
}
```
**Sources**:
- Gemini API Limits: https://ai.google.dev/gemini-api/docs/rate-limits
---
## Error 3: Rate Limiting (429 Too Many Requests)
### Error Message
```
Error: 429 Too Many Requests - Rate limit exceeded
```
### Why It Happens
- Exceeded 100 requests per minute (free tier)
- Exceeded tokens per minute limit
- No exponential backoff implemented
### Root Cause
Free tier rate limits: 100 RPM, 30k TPM, 1k RPD
### Prevention
```typescript
// ❌ BAD: No rate limiting
for (const text of texts) {
  await ai.models.embedContent({ /* ... */ }); // Will hit 429 after 100 requests
}

// ✅ GOOD: Exponential backoff
async function embedWithRetry(text: string, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await ai.models.embedContent({
        model: 'gemini-embedding-001',
        contents: text,
        config: { taskType: 'SEMANTIC_SIMILARITY', outputDimensionality: 768 }
      });
    } catch (error: any) {
      if (error.status === 429 && attempt < maxRetries - 1) {
        const delay = Math.pow(2, attempt) * 1000; // 1s, 2s, 4s
        console.log(`Rate limit hit. Retrying in ${delay / 1000}s...`);
        await new Promise(resolve => setTimeout(resolve, delay));
        continue;
      }
      throw error;
    }
  }
}
```
**Rate Limits**:
| Tier | RPM | TPM | RPD |
|------|-----|-----|-----|
| Free | 100 | 30,000 | 1,000 |
| Tier 1 | 3,000 | 1,000,000 | - |
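Backoff is reactive; you can also stay under the RPM cap proactively by spacing requests. A minimal throttle sketch (the 100 RPM figure comes from the free-tier table above; `makeThrottle` is an illustrative helper, not an SDK API):

```typescript
// Minimum delay between requests to stay under a requests-per-minute cap.
function minIntervalMs(rpm: number): number {
  return Math.ceil(60_000 / rpm);
}

// Simple throttle: awaits the remaining interval before each call proceeds.
function makeThrottle(rpm: number) {
  let last = 0;
  const interval = minIntervalMs(rpm);
  return async function throttle(): Promise<void> {
    const now = Date.now();
    const wait = Math.max(0, last + interval - now);
    if (wait > 0) await new Promise(r => setTimeout(r, wait));
    last = Date.now();
  };
}

// Usage: const throttle = makeThrottle(100);
// then `await throttle();` before each embedContent call.
```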
**Sources**:
- https://ai.google.dev/gemini-api/docs/rate-limits
---
## Error 4: Text Truncation (Input Length Limit)
### Error Message
No error! Text is **silently truncated** at 2,048 tokens.
### Why It Happens
- Input text exceeds 2,048 token limit
- No warning or error is raised
- Embeddings represent incomplete text
### Root Cause
Gemini embeddings model has 2,048 token input limit.
### Prevention
```typescript
// ❌ BAD: Long text (silently truncated)
const longText = "...".repeat(10000); // Very long
const embedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: longText // Truncated to ~2,048 tokens
});

// ✅ GOOD: Chunk long texts
function chunkText(text: string, maxTokens = 2000): string[] {
  const words = text.split(/\s+/);
  const chunks: string[] = [];
  let currentChunk: string[] = [];
  for (const word of words) {
    currentChunk.push(word);
    // Rough estimate: 1 token ≈ 0.75 words, so tokens ≈ words / 0.75
    if (currentChunk.length / 0.75 >= maxTokens) {
      chunks.push(currentChunk.join(' '));
      currentChunk = [];
    }
  }
  if (currentChunk.length > 0) {
    chunks.push(currentChunk.join(' '));
  }
  return chunks;
}

const chunks = chunkText(longText, 2000);
const embeddings = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: chunks,
  config: { taskType: 'RETRIEVAL_DOCUMENT', outputDimensionality: 768 }
});
```
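When chunks are retrieved independently, a sentence split across a chunk boundary loses its context. A common refinement is to overlap consecutive chunks so boundary text appears in both. A sketch using the same word-count heuristic (the `maxWords` and `overlapWords` defaults are illustrative assumptions, to be tuned against the ~2,048-token limit):

```typescript
// Chunk by word count with overlap, so boundary sentences appear in two chunks.
// maxWords / overlapWords are illustrative defaults, not API values.
function chunkWithOverlap(text: string, maxWords = 1500, overlapWords = 150): string[] {
  const words = text.split(/\s+/).filter(w => w.length > 0);
  const chunks: string[] = [];
  const step = maxWords - overlapWords; // advance less than a full chunk
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + maxWords).join(' '));
    if (start + maxWords >= words.length) break; // last chunk reached the end
  }
  return chunks;
}
```

The trade-off is a modest increase in storage and embedding cost (each overlapping region is embedded twice) in exchange for not losing boundary context.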
**Sources**:
- https://ai.google.dev/gemini-api/docs/models/gemini#gemini-embedding-001
---
## Error 5: Cosine Similarity Calculation Errors
### Error Message
```
Error: Similarity values out of range (-1.5 to 1.2)
```
### Why It Happens
- Incorrect formula (using dot product instead of cosine similarity)
- Not normalizing magnitudes
- Division by zero for zero vectors
### Root Cause
Improper implementation of cosine similarity formula.
### Prevention
```typescript
// ❌ BAD: Just dot product (not cosine similarity)
function badSimilarity(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += a[i] * b[i];
  }
  return sum; // Wrong! This is unbounded
}

// ✅ GOOD: Proper cosine similarity
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Vector dimensions must match');
  }
  let dotProduct = 0;
  let magnitudeA = 0;
  let magnitudeB = 0;
  for (let i = 0; i < a.length; i++) {
    dotProduct += a[i] * b[i];
    magnitudeA += a[i] * a[i];
    magnitudeB += b[i] * b[i];
  }
  if (magnitudeA === 0 || magnitudeB === 0) {
    return 0; // Handle zero vectors
  }
  return dotProduct / (Math.sqrt(magnitudeA) * Math.sqrt(magnitudeB));
}
```
**Formula**:
```
cosine_similarity(A, B) = (A · B) / (||A|| × ||B||)
```
Where:
- `A · B` = dot product
- `||A||` = magnitude of vector A = √(a₁² + a₂² + ... + aₙ²)
**Result Range**: Always between -1 and 1
- 1 = identical direction
- 0 = perpendicular
- -1 = opposite direction
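To sanity-check an implementation, it helps to run it on vectors small enough to verify by hand. A worked example (the compact `cosine` helper below mirrors the formula; the vector values are arbitrary):

```typescript
// Worked example: A = [1, 2, 2], B = [2, 4, 4]
// A · B  = 1·2 + 2·4 + 2·4 = 18
// ||A||  = √(1 + 4 + 4) = 3,   ||B|| = √(4 + 16 + 16) = 6
// result = 18 / (3 × 6) = 1   (B is A scaled by 2: identical direction)
function cosine(a: number[], b: number[]): number {
  const dot = a.reduce((s, v, i) => s + v * b[i], 0);
  const magA = Math.sqrt(a.reduce((s, v) => s + v * v, 0));
  const magB = Math.sqrt(b.reduce((s, v) => s + v * v, 0));
  return dot / (magA * magB);
}

cosine([1, 2, 2], [2, 4, 4]); // → 1  (identical direction)
cosine([1, 0], [0, 1]);       // → 0  (perpendicular)
cosine([1, 0], [-1, 0]);      // → -1 (opposite direction)
```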
**Sources**:
- https://en.wikipedia.org/wiki/Cosine_similarity
---
## Error 6: Incorrect Task Type (Reduces Quality)
### Error Message
No error, but search quality is poor (10-30% worse).
### Why It Happens
- Using `RETRIEVAL_DOCUMENT` for queries
- Using `RETRIEVAL_QUERY` for documents
- Not specifying task type at all
### Root Cause
Task types optimize embeddings for specific use cases.
### Prevention
```typescript
// ❌ BAD: Wrong task type for RAG
const queryEmbedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: userQuery,
  config: { taskType: 'RETRIEVAL_DOCUMENT' } // ← Wrong! Should be RETRIEVAL_QUERY
});

// ✅ GOOD: Correct task types
// For user queries
const queryEmbedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: userQuery,
  config: { taskType: 'RETRIEVAL_QUERY', outputDimensionality: 768 }
});

// For documents to index
const docEmbedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: documentText,
  config: { taskType: 'RETRIEVAL_DOCUMENT', outputDimensionality: 768 }
});
```
**Task Types Cheat Sheet**:
| Task Type | Use For | Example |
|-----------|---------|---------|
| `RETRIEVAL_QUERY` | User queries | "What is RAG?" |
| `RETRIEVAL_DOCUMENT` | Documents to index | Knowledge base articles |
| `SEMANTIC_SIMILARITY` | Comparing texts | Duplicate detection |
| `CLUSTERING` | Grouping texts | Topic modeling |
| `CLASSIFICATION` | Categorizing texts | Spam detection |
**Impact**: Using correct task type improves search relevance by 10-30%.
**Sources**:
- https://ai.google.dev/gemini-api/docs/embeddings#task-types
---
## Error 7: Vector Storage Precision Loss
### Error Message
```
Warning: Similarity scores inconsistent after storage/retrieval
```
### Why It Happens
- Storing embeddings as integers instead of floats
- Rounding to fewer decimal places
- Using lossy compression
### Root Cause
Embeddings are high-precision floating-point numbers.
### Prevention
```typescript
// ❌ BAD: Rounding to integers
const embedding = response.embeddings[0].values;
const rounded = embedding.map(v => Math.round(v)); // Precision loss!
await db.insert({
  id: '1',
  embedding: rounded // ← Will degrade search quality
});

// ✅ GOOD: Store full precision
const embedding = response.embeddings[0].values; // Keep as-is
await db.insert({
  id: '1',
  embedding: embedding // ← Full float32 precision
});

// For JSON storage, use full precision
const json = JSON.stringify({
  id: '1',
  embedding: embedding // JavaScript numbers are float64
});
```
**Storage Recommendations**:
- **Vectorize**: Handles float32 automatically ✅
- **D1/SQLite**: Use BLOB for binary float32 array
- **KV**: Store as JSON (float64 precision)
- **R2**: Store as binary float32 array
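For the D1 and R2 cases, the float32 round-trip looks like the sketch below (storage API calls omitted; only the encoding is shown). Note that float32 encoding drops the tail of float64 precision, which is harmless for similarity search but means round-tripped values are approximately, not exactly, equal:

```typescript
// Encode number[] → float32 bytes for BLOB storage (e.g. D1, R2).
function encodeFloat32(embedding: number[]): Uint8Array {
  return new Uint8Array(Float32Array.from(embedding).buffer);
}

// Decode float32 bytes → number[] after retrieval.
function decodeFloat32(bytes: Uint8Array): number[] {
  return Array.from(
    new Float32Array(bytes.buffer, bytes.byteOffset, bytes.byteLength / 4)
  );
}
```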
**Sources**:
- Cloudflare Vectorize: https://developers.cloudflare.com/vectorize/
---
## Error 8: Model Version Confusion
### Error Message
```
Error: Model 'gemini-embedding-exp-03-07' is deprecated
```
### Why It Happens
- Using experimental or deprecated model
- Mixing embeddings from different model versions
- Not keeping up with model updates
### Root Cause
Gemini has stable and experimental embedding models.
### Prevention
```typescript
// ❌ BAD: Using experimental/deprecated model
const embedding = await ai.models.embedContent({
  model: 'gemini-embedding-exp-03-07', // Deprecated October 2025
  contents: text
});

// ✅ GOOD: Use stable model
const embedding = await ai.models.embedContent({
  model: 'gemini-embedding-001', // Stable production model
  contents: text,
  config: {
    taskType: 'SEMANTIC_SIMILARITY',
    outputDimensionality: 768
  }
});
```
**Model Status**:
| Model | Status | Recommendation |
|-------|--------|----------------|
| `gemini-embedding-001` | ✅ Stable | Use this |
| `gemini-embedding-exp-03-07` | ❌ Deprecated (Oct 2025) | Migrate to gemini-embedding-001 |
**CRITICAL**: Never mix embeddings from different models. They use different vector spaces and are not comparable.
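One way to enforce this is to tag every stored vector with the model that produced it and refuse cross-model comparisons. A minimal sketch (the `StoredVector` record shape is an assumption for illustration, not a Vectorize API):

```typescript
// Tag each stored vector with its source model; refuse cross-model comparison.
// StoredVector is an illustrative record shape, not an SDK type.
interface StoredVector {
  id: string;
  model: string;    // e.g. 'gemini-embedding-001'
  values: number[];
}

function assertSameModel(a: StoredVector, b: StoredVector): void {
  if (a.model !== b.model) {
    throw new Error(
      `Cannot compare vectors from different models: ${a.model} vs ${b.model}`
    );
  }
}
```

In Vectorize, the same idea can be applied by storing the model name in each vector's metadata and checking it at query time.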
**Sources**:
- https://ai.google.dev/gemini-api/docs/models/gemini#text-embeddings
---
## Summary Checklist
Before deploying to production, verify:
- [ ] `outputDimensionality` matches Vectorize index dimensions
- [ ] Batch size ≤ API limits (chunk large datasets)
- [ ] Rate limiting implemented with exponential backoff
- [ ] Long texts are chunked (≤ 2,048 tokens)
- [ ] Cosine similarity formula is correct
- [ ] Correct task types used (RETRIEVAL_QUERY vs RETRIEVAL_DOCUMENT)
- [ ] Embeddings stored with full precision (float32)
- [ ] Using stable model (`gemini-embedding-001`)
**Following these guidelines prevents the eight errors documented above.**
---
## Additional Resources
- **Official Docs**: https://ai.google.dev/gemini-api/docs/embeddings
- **Rate Limits**: https://ai.google.dev/gemini-api/docs/rate-limits
- **Vectorize Docs**: https://developers.cloudflare.com/vectorize/
- **Model Specs**: https://ai.google.dev/gemini-api/docs/models/gemini#gemini-embedding-001