Files
gh-jezweb-claude-skills-ski…/references/index-operations.md
2025-11-30 08:24:34 +08:00

8.0 KiB

Index Operations Guide

Complete guide for creating and managing Vectorize indexes.

Index Configuration

Critical Decisions (Cannot Be Changed!)

When creating an index, these settings are permanent:

  1. Dimensions: Vector width (must match embedding model)
  2. Distance Metric: How similarity is calculated

Choose carefully - you cannot change these after creation!

Dimensions

Dimensions must match your embedding model's output:

Model Provider Dimensions Recommended Metric
@cf/baai/bge-base-en-v1.5 Workers AI 768 cosine
text-embedding-3-small OpenAI 1536 cosine
text-embedding-3-large OpenAI 3072 cosine
text-embedding-ada-002 OpenAI (legacy) 1536 cosine
embed-english-v3.0 Cohere 1024 cosine

Common Mistake: Creating an index with 1536 dimensions but using a 768-dim model!

Distance Metrics

Choose based on your embedding model and use case:

Cosine Similarity (cosine)

  • Best for: Normalized embeddings (most common)
  • Range: -1 (opposite) to 1 (identical)
  • Use when: Embeddings are L2-normalized
  • Most common choice - works with Workers AI, OpenAI, Cohere
npx wrangler vectorize create my-index \
  --dimensions=768 \
  --metric=cosine

Euclidean Distance (euclidean)

  • Best for: Absolute distance matters
  • Range: 0 (identical) to ∞ (different)
  • Use when: Magnitude of vectors is important
  • Example: Geographic coordinates, image features
npx wrangler vectorize create geo-index \
  --dimensions=2 \
  --metric=euclidean

Dot Product (dot-product)

  • Best for: Non-normalized embeddings
  • Range: -∞ to ∞
  • Use when: Embeddings are not normalized
  • Less common - most models produce normalized embeddings
npx wrangler vectorize create sparse-index \
  --dimensions=1024 \
  --metric=dot-product

Creating Indexes

Via Wrangler CLI

npx wrangler vectorize create <name> \
  --dimensions=<number> \
  --metric=<metric> \
  [--description="<text>"]

Via REST API

const response = await fetch(
  `https://api.cloudflare.com/client/v4/accounts/${accountId}/vectorize/v2/indexes`,
  {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiToken}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      name: 'my-index',
      description: 'Production semantic search',
      config: {
        dimensions: 768,
        metric: 'cosine',
      },
    }),
  }
);

Metadata Indexes

⚠️ CRITICAL TIMING: Create metadata indexes IMMEDIATELY after creating the main index, BEFORE inserting any vectors!

Why Timing Matters

Vectorize builds metadata indexes only for vectors inserted AFTER the metadata index was created. Vectors inserted before won't be filterable!

Best Practice Workflow

# 1. Create main index
npx wrangler vectorize create docs-search \
  --dimensions=768 \
  --metric=cosine

# 2. IMMEDIATELY create all metadata indexes
npx wrangler vectorize create-metadata-index docs-search \
  --property-name=category --type=string

npx wrangler vectorize create-metadata-index docs-search \
  --property-name=timestamp --type=number

npx wrangler vectorize create-metadata-index docs-search \
  --property-name=published --type=boolean

# 3. Verify metadata indexes exist
npx wrangler vectorize list-metadata-index docs-search

# 4. NOW safe to start inserting vectors

Metadata Index Limits

  • Max 10 metadata indexes per Vectorize index
  • String type: First 64 bytes indexed (UTF-8 boundaries)
  • Number type: Float64 precision
  • Boolean type: true/false

Choosing What to Index

Only create metadata indexes for fields you'll filter on:

Good candidates:

  • category (string) - "docs", "tutorials", "guides"
  • language (string) - "en", "es", "fr"
  • published_at (number) - Unix timestamp
  • status (string) - "published", "draft", "archived"
  • verified (boolean) - true/false

Bad candidates (don't need indexes):

  • title (string) - only for display, not filtering
  • content (string) - stored in metadata but not filtered
  • url (string) - unless filtering by URL prefix

Wrangler Binding

After creating an index, bind it to your Worker:

wrangler.jsonc

{
  "name": "my-worker",
  "main": "src/index.ts",
  "vectorize": [
    {
      "binding": "VECTORIZE_INDEX",
      "index_name": "docs-search"
    }
  ]
}

TypeScript Types

export interface Env {
  VECTORIZE_INDEX: VectorizeIndex;
}

Index Management Operations

List All Indexes

npx wrangler vectorize list

Get Index Details

npx wrangler vectorize get my-index

Returns:

{
  "name": "my-index",
  "description": "Production search",
  "config": {
    "dimensions": 768,
    "metric": "cosine"
  },
  "created_on": "2024-01-15T10:30:00Z",
  "modified_on": "2024-01-15T10:30:00Z"
}

Get Index Info (Vector Count)

npx wrangler vectorize info my-index

Returns:

{
  "vectorsCount": 12543,
  "lastProcessedMutation": {
    "id": "abc123...",
    "timestamp": "2024-01-20T14:22:00Z"
  }
}

Delete Index

# With confirmation
npx wrangler vectorize delete my-index

# Skip confirmation (use with caution!)
npx wrangler vectorize delete my-index --force

⚠️ WARNING: Deletion is irreversible! All vectors are permanently lost.

Index Naming Best Practices

Good Names

  • production-docs-search - Environment + purpose
  • dev-product-recommendations - Environment + use case
  • customer-support-rag - Descriptive use case
  • en-knowledge-base - Language + type

Bad Names

  • index1 - Not descriptive
  • my_index - Use dashes, not underscores
  • PRODUCTION - Use lowercase
  • this-is-a-very-long-index-name-that-exceeds-limits - Too long

Naming Rules

  • Lowercase letters and numbers only
  • Dashes allowed (not underscores or spaces)
  • Must start with a letter
  • Max 32 characters
  • No special characters

Common Patterns

Multi-Environment Setup

# Development
npx wrangler vectorize create dev-docs-search \
  --dimensions=768 --metric=cosine

# Staging
npx wrangler vectorize create staging-docs-search \
  --dimensions=768 --metric=cosine

# Production
npx wrangler vectorize create prod-docs-search \
  --dimensions=768 --metric=cosine
// wrangler.jsonc
{
  "env": {
    "dev": {
      "vectorize": [
        { "binding": "VECTORIZE", "index_name": "dev-docs-search" }
      ]
    },
    "staging": {
      "vectorize": [
        { "binding": "VECTORIZE", "index_name": "staging-docs-search" }
      ]
    },
    "production": {
      "vectorize": [
        { "binding": "VECTORIZE", "index_name": "prod-docs-search" }
      ]
    }
  }
}

Multi-Tenant with Namespaces

Instead of creating separate indexes per customer, use one index with namespaces:

# Single index for all tenants
npx wrangler vectorize create multi-tenant-index \
  --dimensions=768 --metric=cosine
// Insert with namespace
await env.VECTORIZE_INDEX.upsert([{
  id: 'doc-1',
  values: embedding,
  namespace: 'customer-abc123', // Isolates by customer
  metadata: { title: 'Customer document' }
}]);

// Query within namespace
const results = await env.VECTORIZE_INDEX.query(queryVector, {
  topK: 5,
  namespace: 'customer-abc123' // Only search this customer's data
});

Troubleshooting

"Index name already exists"

# Check existing indexes
npx wrangler vectorize list

# Delete old index if needed
npx wrangler vectorize delete old-name --force

"Cannot change dimensions"

No fix - must create new index and re-insert all vectors.

"Wrangler version 3.71.0 required"

# Update Wrangler
npm install -g wrangler@latest

# Or use npx
npx wrangler@latest vectorize create ...

See Also