
Cloudflare Vectorize Integration

Complete guide for using Gemini embeddings with Cloudflare Vectorize.


Quick Start

1. Create Vectorize Index

# Create index with 768 dimensions (recommended for Gemini)
npx wrangler vectorize create gemini-embeddings --dimensions 768 --metric cosine

# Alternative: 3072 dimensions (Gemini default, more accurate but larger)
npx wrangler vectorize create gemini-embeddings-large --dimensions 3072 --metric cosine

2. Bind to Worker

Add to wrangler.jsonc:

{
  "name": "my-rag-worker",
  "main": "src/index.ts",
  "compatibility_date": "2025-10-25",
  "vectorize": {
    "bindings": [
      {
        "binding": "VECTORIZE",
        "index_name": "gemini-embeddings"
      }
    ]
  }
}
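
With the binding in place, the Worker can declare it in its environment type. A minimal sketch, assuming the GEMINI_API_KEY secret used in the next step (set with npx wrangler secret put GEMINI_API_KEY) and the VectorizeIndex type from @cloudflare/workers-types (npx wrangler types can generate binding types for you):

// Bindings available to the Worker (names match wrangler.jsonc above)
interface Env {
  GEMINI_API_KEY: string;    // secret set via wrangler
  VECTORIZE: VectorizeIndex; // the "VECTORIZE" binding declared above
}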

3. Generate and Store Embeddings

// Generate embedding
const response = await fetch(
  'https://generativelanguage.googleapis.com/v1beta/models/gemini-embedding-001:embedContent',
  {
    method: 'POST',
    headers: {
      'x-goog-api-key': env.GEMINI_API_KEY,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      content: { parts: [{ text: 'Your document text' }] },
      taskType: 'RETRIEVAL_DOCUMENT',
      outputDimensionality: 768 // MUST match index dimensions
    })
  }
);

const data = await response.json();
const embedding = data.embedding.values;

// Insert into Vectorize
await env.VECTORIZE.insert([{
  id: 'doc-1',
  values: embedding,
  metadata: { text: 'Your document text', source: 'manual' }
}]);
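
The embedding call above is worth wrapping in a small helper so ingestion, querying, and the caching example later in this guide can share it. A minimal sketch under the same assumptions as above (768 dimensions, gemini-embedding-001); the name generateEmbedding is introduced here for illustration:

// Returns a 768-dimension Gemini embedding for the given text.
// Use 'RETRIEVAL_DOCUMENT' when ingesting and 'RETRIEVAL_QUERY' when searching.
async function generateEmbedding(
  text: string,
  apiKey: string,
  taskType: 'RETRIEVAL_DOCUMENT' | 'RETRIEVAL_QUERY' = 'RETRIEVAL_DOCUMENT'
): Promise<number[]> {
  const res = await fetch(
    'https://generativelanguage.googleapis.com/v1beta/models/gemini-embedding-001:embedContent',
    {
      method: 'POST',
      headers: {
        'x-goog-api-key': apiKey,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        content: { parts: [{ text }] },
        taskType,
        outputDimensionality: 768 // must match the Vectorize index dimensions
      })
    }
  );
  if (!res.ok) {
    throw new Error(`Embedding request failed: ${res.status}`);
  }
  const data = (await res.json()) as { embedding: { values: number[] } };
  return data.embedding.values;
}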

Dimension Configuration

CRITICAL: Embedding dimensions MUST match Vectorize index dimensions.

| Gemini dimensions | Storage (per vector) | Recommended for |
|-------------------|----------------------|-----------------|
| 768 | 3 KB | Most use cases, cost-effective |
| 1536 | 6 KB | Balance accuracy/storage |
| 3072 | 12 KB | Maximum accuracy |

Create index to match your embeddings:

# For 768-dim embeddings
npx wrangler vectorize create my-index --dimensions 768 --metric cosine

# For 1536-dim embeddings
npx wrangler vectorize create my-index --dimensions 1536 --metric cosine

# For 3072-dim embeddings (Gemini default)
npx wrangler vectorize create my-index --dimensions 3072 --metric cosine

Metric Selection

Vectorize supports 3 distance metrics:

Cosine

npx wrangler vectorize create my-index --dimensions 768 --metric cosine

When to use:

  • Semantic search (most common)
  • Document similarity
  • RAG systems

Range: -1 (opposite) to 1 (identical); 0 means the vectors are unrelated

Euclidean

npx wrangler vectorize create my-index --dimensions 768 --metric euclidean

When to use:

  • Absolute distance matters
  • Magnitude is important

Range: 0 (identical) to ∞ (very different)

Dot Product

npx wrangler vectorize create my-index --dimensions 768 --metric dot-product

When to use:

  • Pre-normalized vectors
  • Performance optimization

Range: -1 to 1 (for normalized vectors)

Recommendation: Use cosine for Gemini embeddings (most common and intuitive).
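
For intuition, cosine similarity is the dot product of the two vectors divided by the product of their lengths. Vectorize computes the score for you; this sketch only illustrates what the cosine score measures:

// Cosine similarity: dot(a, b) / (|a| * |b|), ranging from -1 to 1.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}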


Insert Patterns

Single Insert

await env.VECTORIZE.insert([{
  id: 'doc-1',
  values: embedding,
  metadata: {
    text: 'Document content',
    timestamp: Date.now(),
    category: 'documentation'
  }
}]);

Batch Insert

const vectors = documents.map((doc, i) => ({
  id: `doc-${i}`,
  values: doc.embedding,
  metadata: { text: doc.text }
}));

// Insert up to 100 vectors at once
await env.VECTORIZE.insert(vectors);
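
For larger document sets, split the array into chunks before calling insert. A minimal sketch, assuming the per-call limit quoted above (check the Vectorize limits documentation for the current value); VectorizeIndex and VectorizeVector are the types from @cloudflare/workers-types:

// Insert vectors in chunks to stay under the per-call batch limit.
// Usage: await insertInBatches(env.VECTORIZE, vectors);
async function insertInBatches(
  index: VectorizeIndex,
  vectors: VectorizeVector[],
  batchSize = 100
): Promise<void> {
  for (let i = 0; i < vectors.length; i += batchSize) {
    await index.insert(vectors.slice(i, i + batchSize));
  }
}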

Upsert (Update or Insert)

// upsert() overwrites the vector when the ID already exists
// (insert() keeps the existing vector and skips IDs that are already present)
await env.VECTORIZE.upsert([{
  id: 'doc-1', // Existing ID
  values: newEmbedding,
  metadata: { text: 'Updated content' }
}]);
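
The binding also exposes lookups and deletion by ID, which helps keep the index in sync when source documents change. A short sketch (method names per the Vectorize Workers API; confirm against the current API reference):

// Fetch stored vectors (including metadata) by ID
const stored = await env.VECTORIZE.getByIds(['doc-1']);

// Remove vectors whose source documents have been deleted
await env.VECTORIZE.deleteByIds(['doc-1']);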

Query Patterns

Basic Query

const results = await env.VECTORIZE.query(queryEmbedding, {
  topK: 5
});

console.log(results.matches);
// [{ id: 'doc-1', score: 0.95 }, ...]

Query with Metadata

const results = await env.VECTORIZE.query(queryEmbedding, {
  topK: 5,
  returnMetadata: true
});

results.matches.forEach(match => {
  console.log(match.id);           // 'doc-1'
  console.log(match.score);        // 0.95
  console.log(match.metadata?.text); // 'Document content'
});

Query with Metadata Filtering

Vectorize can filter query results on metadata, but only for properties that have a metadata index. Create the metadata index before inserting the vectors you want to filter (vectors written before the metadata index exists are not indexed for filtering):

npx wrangler vectorize create-metadata-index gemini-embeddings --property-name=category --type=string

Then pass a filter alongside the query:

const results = await env.VECTORIZE.query(queryEmbedding, {
  topK: 5,
  filter: { category: 'documentation' }
});

Metadata Best Practices

What to Store

await env.VECTORIZE.insert([{
  id: 'doc-1',
  values: embedding,
  metadata: {
    // ✅ Store these
    text: 'The actual document content', // For retrieval
    title: 'Document title',
    url: 'https://example.com/doc',
    timestamp: Date.now(),
    category: 'product',

    // ❌ Don't store these
    embedding: embedding, // Already stored as values
    largeObject: { /* ... */ } // Keep metadata small
  }
}]);

Metadata Limits

  • Max size: ~1 KB per vector
  • Best practice: Store only what you need for retrieval/display
  • For large data: Store minimal metadata and fetch the full record from D1/KV using the vector ID (see the sketch below)
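
A minimal sketch of that pattern, assuming a KV namespace bound as DOCS that stores each full document as JSON under the same ID used for its vector (the binding name and record shape are illustrative):

// 1. Vector search returns IDs, scores, and lightweight metadata only
const results = await env.VECTORIZE.query(queryEmbedding, { topK: 5 });

// 2. Hydrate the full documents from KV using the matched IDs
const docs = await Promise.all(
  results.matches.map(async (match) => {
    const record = await env.DOCS.get(match.id, 'json');
    return { id: match.id, score: match.score, doc: record };
  })
);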

Complete RAG Example

interface Env {
  GEMINI_API_KEY: string;
  VECTORIZE: VectorizeIndex;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const url = new URL(request.url);

    // Ingest: POST /ingest with { text: "..." }
    if (url.pathname === '/ingest' && request.method === 'POST') {
      const { text } = await request.json() as { text: string };

      // 1. Generate embedding
      const embeddingRes = await fetch(
        'https://generativelanguage.googleapis.com/v1beta/models/gemini-embedding-001:embedContent',
        {
          method: 'POST',
          headers: {
            'x-goog-api-key': env.GEMINI_API_KEY,
            'Content-Type': 'application/json'
          },
          body: JSON.stringify({
            content: { parts: [{ text }] },
            taskType: 'RETRIEVAL_DOCUMENT',
            outputDimensionality: 768
          })
        }
      );

      const embeddingData = await embeddingRes.json() as { embedding: { values: number[] } };
      const embedding = embeddingData.embedding.values;

      // 2. Store in Vectorize
      await env.VECTORIZE.insert([{
        id: `doc-${Date.now()}`,
        values: embedding,
        metadata: { text, timestamp: Date.now() }
      }]);

      return new Response(JSON.stringify({ success: true }));
    }

    // Query: POST /query with { query: "..." }
    if (url.pathname === '/query' && request.method === 'POST') {
      const { query } = await request.json() as { query: string };

      // 1. Generate query embedding
      const embeddingRes = await fetch(
        'https://generativelanguage.googleapis.com/v1beta/models/gemini-embedding-001:embedContent',
        {
          method: 'POST',
          headers: {
            'x-goog-api-key': env.GEMINI_API_KEY,
            'Content-Type': 'application/json'
          },
          body: JSON.stringify({
            content: { parts: [{ text: query }] },
            taskType: 'RETRIEVAL_QUERY',
            outputDimensionality: 768
          })
        }
      );

      const embeddingData = await embeddingRes.json() as { embedding: { values: number[] } };
      const embedding = embeddingData.embedding.values;

      // 2. Search Vectorize
      const results = await env.VECTORIZE.query(embedding, {
        topK: 5,
        returnMetadata: true
      });

      return new Response(JSON.stringify({
        query,
        results: results.matches.map(m => ({
          id: m.id,
          score: m.score,
          text: m.metadata?.text
        }))
      }));
    }

    return new Response('Not found', { status: 404 });
  }
};
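
To exercise both endpoints, a quick client-side sketch (the base URL is hypothetical; substitute your deployed Worker's URL):

const BASE = 'https://my-rag-worker.example.workers.dev';

// Ingest a document
await fetch(`${BASE}/ingest`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ text: 'Cloudflare Vectorize is a vector database.' })
});

// Query the ingested documents
const res = await fetch(`${BASE}/query`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ query: 'What is Vectorize?' })
});
console.log(await res.json());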

Index Management

List Indexes

npx wrangler vectorize list

Get Index Info

npx wrangler vectorize get gemini-embeddings

Delete Index

npx wrangler vectorize delete gemini-embeddings

CRITICAL: Deleting an index deletes all vectors permanently.


Limitations & Quotas

| Feature | Free plan | Paid plans |
|---------|-----------|------------|
| Indexes per account | 100 | 100 |
| Vectors per index | 200,000 | 5,000,000+ |
| Queries per day | 30,000,000 | Unlimited |
| Dimensions | Up to 1536 | Up to 3072 |

Source: https://developers.cloudflare.com/vectorize/platform/pricing/


Best Practices

1. Choose Dimensions Wisely

// ✅ 768 dimensions (recommended)
// - Good accuracy
// - Low storage
// - Fast queries

// ⚠️ 3072 dimensions (if accuracy is critical)
// - Best accuracy
// - 4x storage
// - Slower queries

2. Use Metadata for Context

await env.VECTORIZE.insert([{
  id: 'doc-1',
  values: embedding,
  metadata: {
    text: 'Store the actual text here for retrieval',
    url: 'https://...',
    timestamp: Date.now()
  }
}]);

3. Implement Caching

// Cache embeddings in KV
const cached = await env.KV.get(`embedding:${textHash}`);
if (cached) {
  return JSON.parse(cached);
}

const embedding = await generateEmbedding(text);
await env.KV.put(`embedding:${textHash}`, JSON.stringify(embedding), {
  expirationTtl: 86400 // 24 hours
});
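
The snippet above assumes a textHash value and the generateEmbedding helper from the Quick Start; a minimal sketch of the hash using the Web Crypto API available in Workers:

// SHA-256 hash of the input text, hex-encoded, used as a stable cache key.
async function hashText(text: string): Promise<string> {
  const digest = await crypto.subtle.digest('SHA-256', new TextEncoder().encode(text));
  return [...new Uint8Array(digest)]
    .map((b) => b.toString(16).padStart(2, '0'))
    .join('');
}

const textHash = await hashText(text);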

4. Monitor Usage

# Check index stats
npx wrangler vectorize get gemini-embeddings

# Shows:
# - Total vectors
# - Dimensions
# - Metric type

Troubleshooting

Dimension Mismatch Error

Error: Vector dimensions do not match. Expected 768, got 3072

Solution: Ensure embedding outputDimensionality matches index dimensions.
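
A cheap guard before inserting catches this early. A minimal sketch assuming a 768-dimension index (EXPECTED_DIMENSIONS is an illustrative constant):

const EXPECTED_DIMENSIONS = 768; // must equal the value used when the index was created

if (embedding.length !== EXPECTED_DIMENSIONS) {
  throw new Error(
    `Embedding has ${embedding.length} dimensions, index expects ${EXPECTED_DIMENSIONS}`
  );
}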

No Results Found

Possible causes:

  1. Index is empty (no vectors inserted)
  2. Query embedding was generated with the wrong task type (use RETRIEVAL_QUERY for queries, RETRIEVAL_DOCUMENT for ingestion), which degrades match quality
  3. An application-level score threshold is too strict (Vectorize returns up to topK matches regardless of score)

Solution: Confirm the index contains vectors, use the correct task types, and relax any score threshold.


Official Documentation