Top 8 Embedding Errors (And How to Fix Them)

This document lists the 8 most common errors when working with Gemini embeddings, their root causes, and proven solutions.


Error 1: Dimension Mismatch

Error Message

Error: Vector dimensions do not match. Expected 768, got 3072

Why It Happens

  • Generated embedding with default dimensions (3072) but Vectorize index expects 768
  • Mixed embeddings from different dimension settings

Root Cause

Not specifying outputDimensionality parameter when generating embeddings.

Prevention

// ❌ BAD: No outputDimensionality (defaults to 3072)
const embedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: text
});

// ✅ GOOD: Match Vectorize index dimensions
const embedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: text,
  config: { outputDimensionality: 768 } // ← Match your index
});

Fix

  1. Option A: Regenerate all embeddings with outputDimensionality set to match the existing index (see the sketch below)
  2. Option B: Recreate the Vectorize index to match the dimensions of your existing embeddings (3072 in this example)
# Option B: recreate the index at the embeddings' dimensionality
npx wrangler vectorize create my-index --dimensions 3072 --metric cosine
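
A sketch of Option A, assuming the @google/genai client (ai) used throughout this document and a Cloudflare Worker with a Vectorize binding named env.VECTORIZE; the binding name and document id are illustrative:

// Option A sketch: regenerate the embedding at the index's dimensionality (768)
// and upsert it over the mismatched vector under the same id
const response = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: documentText,
  config: { taskType: 'RETRIEVAL_DOCUMENT', outputDimensionality: 768 }
});

await env.VECTORIZE.upsert([
  { id: 'doc-1', values: response.embeddings[0].values } // 'doc-1' is a placeholder id
]);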



Error 2: Batch Size Limit Exceeded

Error Message

Error: Request contains too many texts. Maximum: 100

Why It Happens

  • Tried to embed more texts than API allows in single request
  • Different limits for single vs batch endpoints

Root Cause

The Gemini API limits how many texts can be embedded in a single request.

Prevention

// ❌ BAD: Trying to embed 500 texts at once
const embeddings = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: largeArray, // 500 texts
  config: { taskType: 'RETRIEVAL_DOCUMENT' }
});

// ✅ GOOD: Chunk into batches
async function batchEmbed(texts: string[], batchSize = 100) {
  const allEmbeddings: number[][] = [];

  for (let i = 0; i < texts.length; i += batchSize) {
    const batch = texts.slice(i, i + batchSize);
    const response = await ai.models.embedContent({
      model: 'gemini-embedding-001',
      contents: batch,
      config: { taskType: 'RETRIEVAL_DOCUMENT', outputDimensionality: 768 }
    });
    allEmbeddings.push(...response.embeddings.map(e => e.values));

    // Rate limiting delay
    if (i + batchSize < texts.length) {
      await new Promise(resolve => setTimeout(resolve, 1000));
    }
  }

  return allEmbeddings;
}



Error 3: Rate Limiting (429 Too Many Requests)

Error Message

Error: 429 Too Many Requests - Rate limit exceeded

Why It Happens

  • Exceeded 100 requests per minute (free tier)
  • Exceeded tokens per minute limit
  • No exponential backoff implemented

Root Cause

Free tier rate limits: 100 RPM, 30k TPM, 1k RPD

Prevention

// ❌ BAD: No rate limiting
for (const text of texts) {
  await ai.models.embedContent({ /* ... */ }); // Will hit 429 after 100 requests
}

// ✅ GOOD: Exponential backoff
async function embedWithRetry(text: string, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await ai.models.embedContent({
        model: 'gemini-embedding-001',
        contents: text,
        config: { taskType: 'SEMANTIC_SIMILARITY', outputDimensionality: 768 }
      });
    } catch (error: any) {
      if (error.status === 429 && attempt < maxRetries - 1) {
        const delay = Math.pow(2, attempt) * 1000; // 1s, 2s, 4s
        console.log(`Rate limit hit. Retrying in ${delay / 1000}s...`);
        await new Promise(resolve => setTimeout(resolve, delay));
        continue;
      }
      throw error;
    }
  }
}

Rate Limits:

| Tier   | RPM   | TPM       | RPD   |
|--------|-------|-----------|-------|
| Free   | 100   | 30,000    | 1,000 |
| Tier 1 | 3,000 | 1,000,000 | -     |
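
The backoff above is reactive. To avoid hitting 429 at all, you can also throttle proactively. A minimal single-process sketch that spaces out calls to fit an RPM budget (100 RPM matching the free-tier row above); parallel Workers would need shared state:

// Proactive throttle: allows at most rpm request starts per minute by spacing calls
function createThrottle(rpm: number) {
  const minIntervalMs = 60_000 / rpm;
  let nextSlot = 0;
  return async function throttle(): Promise<void> {
    const now = Date.now();
    const wait = Math.max(0, nextSlot - now);
    nextSlot = Math.max(now, nextSlot) + minIntervalMs;
    if (wait > 0) await new Promise(resolve => setTimeout(resolve, wait));
  };
}

// Usage: reserve a slot before each call
// const throttle = createThrottle(100); // free-tier RPM from the table above
// await throttle();
// await embedWithRetry(text);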



Error 4: Text Truncation (Input Length Limit)

Error Message

No error! Text is silently truncated at 2,048 tokens.

Why It Happens

  • Input text exceeds 2,048 token limit
  • No warning or error is raised
  • Embeddings represent incomplete text

Root Cause

The Gemini embedding model has a 2,048-token input limit.

Prevention

// ❌ BAD: Long text (silently truncated)
const longText = "...".repeat(10000); // Very long
const embedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: longText // Truncated to ~2,048 tokens
});

// ✅ GOOD: Chunk long texts
function chunkText(text: string, maxTokens = 2000): string[] {
  const words = text.split(/\s+/);
  const chunks: string[] = [];
  let currentChunk: string[] = [];

  for (const word of words) {
    currentChunk.push(word);

    // Rough estimate: 1 token ≈ 0.75 words, so tokens ≈ words / 0.75
    if (currentChunk.length / 0.75 >= maxTokens) {
      chunks.push(currentChunk.join(' '));
      currentChunk = [];
    }
  }

  if (currentChunk.length > 0) {
    chunks.push(currentChunk.join(' '));
  }

  return chunks;
}

const chunks = chunkText(longText, 2000);
const embeddings = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: chunks,
  config: { taskType: 'RETRIEVAL_DOCUMENT', outputDimensionality: 768 }
});



Error 5: Cosine Similarity Calculation Errors

Error Message

Error: Similarity values out of range (-1.5 to 1.2)

Why It Happens

  • Incorrect formula (using dot product instead of cosine similarity)
  • Not normalizing magnitudes
  • Division by zero for zero vectors

Root Cause

Improper implementation of cosine similarity formula.

Prevention

// ❌ BAD: Just dot product (not cosine similarity)
function badSimilarity(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += a[i] * b[i];
  }
  return sum; // Wrong! This is unbounded
}

// ✅ GOOD: Proper cosine similarity
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Vector dimensions must match');
  }

  let dotProduct = 0;
  let magnitudeA = 0;
  let magnitudeB = 0;

  for (let i = 0; i < a.length; i++) {
    dotProduct += a[i] * b[i];
    magnitudeA += a[i] * a[i];
    magnitudeB += b[i] * b[i];
  }

  if (magnitudeA === 0 || magnitudeB === 0) {
    return 0; // Handle zero vectors
  }

  return dotProduct / (Math.sqrt(magnitudeA) * Math.sqrt(magnitudeB));
}

Formula:

cosine_similarity(A, B) = (A · B) / (||A|| × ||B||)

Where:

  • A · B = dot product
  • ||A|| = magnitude of vector A = √(a₁² + a₂² + ... + aₙ²)

Result Range: Always between -1 and 1

  • 1 = identical direction
  • 0 = perpendicular
  • -1 = opposite direction
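
A related note: for unit-length vectors, cosine similarity reduces to a plain dot product. Truncated Gemini embeddings (e.g. outputDimensionality: 768) are not guaranteed to be unit-normalized, so normalize them first if you want that shortcut. A minimal sketch:

// L2-normalize a vector; afterwards cosineSimilarity(a, b) equals the dot product
function normalize(v: number[]): number[] {
  const norm = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return norm === 0 ? v.slice() : v.map(x => x / norm); // leave zero vectors unchanged
}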



Error 6: Incorrect Task Type (Reduces Quality)

Error Message

No error, but search quality is poor (10-30% worse).

Why It Happens

  • Using RETRIEVAL_DOCUMENT for queries
  • Using RETRIEVAL_QUERY for documents
  • Not specifying task type at all

Root Cause

Task types optimize embeddings for specific use cases.

Prevention

// ❌ BAD: Wrong task type for RAG
const queryEmbedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: userQuery,
  config: { taskType: 'RETRIEVAL_DOCUMENT' } // ← Wrong! Should be RETRIEVAL_QUERY
});

// ✅ GOOD: Correct task types
// For user queries
const queryEmbedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: userQuery,
  config: { taskType: 'RETRIEVAL_QUERY', outputDimensionality: 768 }
});

// For documents to index
const docEmbedding = await ai.models.embedContent({
  model: 'gemini-embedding-001',
  contents: documentText,
  config: { taskType: 'RETRIEVAL_DOCUMENT', outputDimensionality: 768 }
});

Task Types Cheat Sheet:

| Task Type           | Use For            | Example                 |
|---------------------|--------------------|-------------------------|
| RETRIEVAL_QUERY     | User queries       | "What is RAG?"          |
| RETRIEVAL_DOCUMENT  | Documents to index | Knowledge base articles |
| SEMANTIC_SIMILARITY | Comparing texts    | Duplicate detection     |
| CLUSTERING          | Grouping texts     | Topic modeling          |
| CLASSIFICATION      | Categorizing texts | Spam detection          |

Impact: Using the correct task type improves search relevance by 10-30%.



Error 7: Vector Storage Precision Loss

Error Message

Warning: Similarity scores inconsistent after storage/retrieval

Why It Happens

  • Storing embeddings as integers instead of floats
  • Rounding to fewer decimal places
  • Using lossy compression

Root Cause

Embeddings are high-precision floating-point numbers.

Prevention

// ❌ BAD: Rounding to integers
const embedding = response.embeddings[0].values;
const rounded = embedding.map(v => Math.round(v)); // Precision loss!

await db.insert({
  id: '1',
  embedding: rounded // ← Will degrade search quality
});

// ✅ GOOD: Store full precision
const embedding = response.embeddings[0].values; // Keep as-is

await db.insert({
  id: '1',
  embedding: embedding // ← Full float32 precision
});

// For JSON storage, use full precision
const json = JSON.stringify({
  id: '1',
  embedding: embedding // JavaScript numbers are float64
});

Storage Recommendations:

  • Vectorize: Handles float32 automatically
  • D1/SQLite: Use BLOB for a binary float32 array (see the sketch after this list)
  • KV: Store as JSON (float64 precision)
  • R2: Store as binary float32 array
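
For the D1/SQLite and R2 rows, a sketch of packing an embedding into raw float32 bytes and back; the table and column names in the usage comment are hypothetical:

// Pack an embedding as float32 bytes for BLOB storage (half the size of JSON float64)
function toFloat32Blob(embedding: number[]): Uint8Array {
  return new Uint8Array(new Float32Array(embedding).buffer);
}

// Unpack a float32 BLOB back into a number[] for similarity math
function fromFloat32Blob(buffer: ArrayBuffer): number[] {
  return Array.from(new Float32Array(buffer));
}

// Hypothetical D1 usage:
// await env.DB.prepare('INSERT INTO vectors (id, embedding) VALUES (?, ?)')
//   .bind('1', toFloat32Blob(embedding)).run();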



Error 8: Model Version Confusion

Error Message

Error: Model 'gemini-embedding-exp-03-07' is deprecated

Why It Happens

  • Using experimental or deprecated model
  • Mixing embeddings from different model versions
  • Not keeping up with model updates

Root Cause

Gemini has stable and experimental embedding models.

Prevention

// ❌ BAD: Using experimental/deprecated model
const embedding = await ai.models.embedContent({
  model: 'gemini-embedding-exp-03-07', // Deprecated October 2025
  contents: text
});

// ✅ GOOD: Use stable model
const embedding = await ai.models.embedContent({
  model: 'gemini-embedding-001', // Stable production model
  contents: text,
  config: {
    taskType: 'SEMANTIC_SIMILARITY',
    outputDimensionality: 768
  }
});

Model Status:

| Model                      | Status                | Recommendation                  |
|----------------------------|-----------------------|---------------------------------|
| gemini-embedding-001       | Stable                | Use this                        |
| gemini-embedding-exp-03-07 | Deprecated (Oct 2025) | Migrate to gemini-embedding-001 |

CRITICAL: Never mix embeddings from different models. They use different vector spaces and are not comparable.
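
One way to make mixing fail loudly instead of silently: record the producing model alongside every stored vector and check it before comparing. A minimal sketch; the StoredVector shape is an assumption for illustration, not a Vectorize type:

// Tag each vector with its provenance so cross-model comparisons fail loudly
interface StoredVector {
  id: string;
  values: number[];
  model: string; // e.g. 'gemini-embedding-001'
}

function assertComparable(a: StoredVector, b: StoredVector): void {
  if (a.model !== b.model || a.values.length !== b.values.length) {
    throw new Error(
      `Incompatible vectors: ${a.model} (${a.values.length}d) vs ${b.model} (${b.values.length}d)`
    );
  }
}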



Summary Checklist

Before deploying to production, verify:

  • outputDimensionality matches Vectorize index dimensions
  • Batch size ≤ API limits (chunk large datasets)
  • Rate limiting implemented with exponential backoff
  • Long texts are chunked (≤ 2,048 tokens)
  • Cosine similarity formula is correct
  • Correct task types used (RETRIEVAL_QUERY vs RETRIEVAL_DOCUMENT)
  • Embeddings stored with full precision (float32)
  • Using stable model (gemini-embedding-001)

Following this checklist addresses all eight of the errors documented above.
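
Several of these checks can run automatically before every upsert. A minimal preflight sketch; EXPECTED_DIMENSIONS is an assumed constant that must match your own index:

const EXPECTED_DIMENSIONS = 768; // assumption: set to your index's dimensionality

// Preflight check before upserting: catches errors 1 (dimensions) and 7 (precision/NaN) early
function validateEmbedding(values: number[]): void {
  if (values.length !== EXPECTED_DIMENSIONS) {
    throw new Error(`Dimension mismatch: expected ${EXPECTED_DIMENSIONS}, got ${values.length}`);
  }
  if (!values.every(Number.isFinite)) {
    throw new Error('Embedding contains non-finite values (NaN or Infinity)');
  }
}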


Additional Resources