Index Operations Guide
Complete guide for creating and managing Vectorize indexes.
Index Configuration
Critical Decisions (Cannot Be Changed!)
When creating an index, these settings are permanent:
- Dimensions: Vector width (must match embedding model)
- Distance Metric: How similarity is calculated
Choose carefully - you cannot change these after creation!
Dimensions
Dimensions must match your embedding model's output:
| Model | Provider | Dimensions | Recommended Metric |
|---|---|---|---|
| @cf/baai/bge-base-en-v1.5 | Workers AI | 768 | cosine |
| text-embedding-3-small | OpenAI | 1536 | cosine |
| text-embedding-3-large | OpenAI | 3072 | cosine |
| text-embedding-ada-002 | OpenAI (legacy) | 1536 | cosine |
| embed-english-v3.0 | Cohere | 1024 | cosine |
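Common Mistake: Creating an index with 1536 dimensions and then inserting vectors from a 768-dimension model. The index only accepts vectors of its configured width, so the two numbers must match exactly.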
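One way to catch this mismatch early is to check the vector length before inserting. The sketch below is illustrative only: `MODEL_DIMENSIONS` and `assertDimensions` are hypothetical names, and the widths are copied from the table above.

```typescript
// Output widths copied from the table above (hypothetical lookup, not an API).
const MODEL_DIMENSIONS: Record<string, number> = {
  "@cf/baai/bge-base-en-v1.5": 768,
  "text-embedding-3-small": 1536,
  "text-embedding-3-large": 3072,
  "text-embedding-ada-002": 1536,
  "embed-english-v3.0": 1024,
};

// Hypothetical guard: throws when a vector's length differs from the
// dimensions the index was created with.
function assertDimensions(vector: number[], indexDimensions: number): void {
  if (vector.length !== indexDimensions) {
    throw new Error(
      `Embedding has ${vector.length} dimensions but the index expects ${indexDimensions}`
    );
  }
}
```

Calling `assertDimensions(embedding, 768)` just before an insert turns a silent configuration error into an immediate, readable failure.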
Distance Metrics
Choose based on your embedding model and use case:
Cosine Similarity (cosine)
- Best for: Normalized embeddings (most common)
- Range: -1 (opposite) to 1 (identical)
- Use when: Embeddings are L2-normalized
- Most common choice - works with Workers AI, OpenAI, Cohere
npx wrangler vectorize create my-index \
--dimensions=768 \
--metric=cosine
Euclidean Distance (euclidean)
- Best for: Absolute distance matters
- Range: 0 (identical) to ∞ (different)
- Use when: Magnitude of vectors is important
- Example: Geographic coordinates, image features
npx wrangler vectorize create geo-index \
--dimensions=2 \
--metric=euclidean
Dot Product (dot-product)
- Best for: Non-normalized embeddings
- Range: -∞ to ∞
- Use when: Embeddings are not normalized
- Less common - most models produce normalized embeddings
npx wrangler vectorize create sparse-index \
--dimensions=1024 \
--metric=dot-product
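To make the difference between the three metrics concrete, here is a small sketch of their textbook definitions. It is not Vectorize's implementation, and the scores Vectorize returns may use different sign or scaling conventions; the function names are illustrative.

```typescript
// Sum of element-wise products.
function dotProduct(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Cosine similarity: dot product divided by both magnitudes,
// so only the direction of the vectors matters.
function cosineSimilarity(a: number[], b: number[]): number {
  const magnitudeA = Math.sqrt(dotProduct(a, a));
  const magnitudeB = Math.sqrt(dotProduct(b, b));
  return dotProduct(a, b) / (magnitudeA * magnitudeB);
}

// Euclidean (L2) distance: straight-line distance between the two points.
function euclideanDistance(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    sum += diff * diff;
  }
  return Math.sqrt(sum);
}
```

For `[1, 0]` and `[2, 0]`, cosine similarity is 1 (same direction), while the Euclidean distance is 1 and the dot product is 2, because both of those are sensitive to magnitude.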
Creating Indexes
Via Wrangler CLI
npx wrangler vectorize create <name> \
--dimensions=<number> \
--metric=<metric> \
[--description="<text>"]
Via REST API
// accountId and apiToken are assumed to be defined elsewhere,
// for example loaded from environment variables.
const response = await fetch(
  `https://api.cloudflare.com/client/v4/accounts/${accountId}/vectorize/v2/indexes`,
  {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiToken}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      name: 'my-index',
      description: 'Production semantic search',
      config: {
        dimensions: 768,
        metric: 'cosine',
      },
    }),
  }
);
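Cloudflare's v4 API wraps responses in an envelope with `success`, `errors`, and `result` fields. The sketch below shows one way to unwrap it; `CloudflareEnvelope` and `unwrapResult` are hypothetical names, and only the fields used here are modeled.

```typescript
// Partial model of the API response envelope (only the fields used below).
interface CloudflareEnvelope<T> {
  success: boolean;
  errors: { code: number; message: string }[];
  result: T | null;
}

// Returns the result on success, otherwise throws with the API's error messages.
function unwrapResult<T>(envelope: CloudflareEnvelope<T>): T {
  if (!envelope.success || envelope.result === null) {
    const detail = envelope.errors
      .map((e) => `${e.code}: ${e.message}`)
      .join("; ");
    throw new Error(`Vectorize API request failed: ${detail || "unknown error"}`);
  }
  return envelope.result;
}
```

With the request above, this could be used as `const index = unwrapResult(await response.json());`, which surfaces a failed creation instead of silently continuing.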
Metadata Indexes
⚠️ CRITICAL TIMING: Create metadata indexes IMMEDIATELY after creating the main index, BEFORE inserting any vectors!
Why Timing Matters
Vectorize adds a vector's metadata to a metadata index only if the vector is inserted or upserted AFTER that metadata index was created. Vectors inserted earlier cannot be filtered on that property until they are upserted again.
Best Practice Workflow
# 1. Create main index
npx wrangler vectorize create docs-search \
--dimensions=768 \
--metric=cosine
# 2. IMMEDIATELY create all metadata indexes
npx wrangler vectorize create-metadata-index docs-search \
--property-name=category --type=string
npx wrangler vectorize create-metadata-index docs-search \
--property-name=timestamp --type=number
npx wrangler vectorize create-metadata-index docs-search \
--property-name=published --type=boolean
# 3. Verify metadata indexes exist
npx wrangler vectorize list-metadata-index docs-search
# 4. NOW safe to start inserting vectors
Metadata Index Limits
- Max 10 metadata indexes per Vectorize index
- String type: Only the first 64 bytes are indexed (truncated on UTF-8 character boundaries)
- Number type: Float64 precision
- Boolean type: true/false
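To see how much of a string value falls within the indexed prefix, the 64-byte truncation can be approximated locally. This is a sketch of the rule as stated above, not Vectorize's actual implementation; `utf8Prefix` is a hypothetical helper.

```typescript
// Returns the longest prefix of `value` whose UTF-8 encoding fits in
// `maxBytes`, without splitting a character.
function utf8Prefix(value: string, maxBytes: number = 64): string {
  let bytes = 0;
  let end = 0; // index into the UTF-16 code units of `value`
  while (end < value.length) {
    const unit = value.charCodeAt(end);
    const next = end + 1 < value.length ? value.charCodeAt(end + 1) : 0;
    const isPair =
      unit >= 0xd800 && unit <= 0xdbff && next >= 0xdc00 && next <= 0xdfff;
    // UTF-8 width: 1 byte for ASCII, 2 below U+0800,
    // 4 for a surrogate pair, otherwise 3.
    const size = unit < 0x80 ? 1 : unit < 0x800 ? 2 : isPair ? 4 : 3;
    if (bytes + size > maxBytes) break;
    bytes += size;
    end += isPair ? 2 : 1;
  }
  return value.slice(0, end);
}
```

Two values that share the same `utf8Prefix` would be indistinguishable to a filter under this rule, which is worth checking for long fields such as URLs.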
Choosing What to Index
Only create metadata indexes for fields you'll filter on:
✅ Good candidates:
- category (string) - "docs", "tutorials", "guides"
- language (string) - "en", "es", "fr"
- published_at (number) - Unix timestamp
- status (string) - "published", "draft", "archived"
- verified (boolean) - true/false
❌ Bad candidates (don't need indexes):
- title (string) - only for display, not filtering
- content (string) - stored in metadata but not filtered
- url (string) - unless filtering by URL prefix
Wrangler Binding
After creating an index, bind it to your Worker:
wrangler.jsonc
{
  "name": "my-worker",
  "main": "src/index.ts",
  "vectorize": [
    {
      "binding": "VECTORIZE_INDEX",
      "index_name": "docs-search"
    }
  ]
}
TypeScript Types
export interface Env {
  VECTORIZE_INDEX: VectorizeIndex;
}
Index Management Operations
List All Indexes
npx wrangler vectorize list
Get Index Details
npx wrangler vectorize get my-index
Returns:
{
  "name": "my-index",
  "description": "Production search",
  "config": {
    "dimensions": 768,
    "metric": "cosine"
  },
  "created_on": "2024-01-15T10:30:00Z",
  "modified_on": "2024-01-15T10:30:00Z"
}
Get Index Info (Vector Count)
npx wrangler vectorize info my-index
Returns:
{
  "vectorsCount": 12543,
  "lastProcessedMutation": {
    "id": "abc123...",
    "timestamp": "2024-01-20T14:22:00Z"
  }
}
Delete Index
# With confirmation
npx wrangler vectorize delete my-index
# Skip confirmation (use with caution!)
npx wrangler vectorize delete my-index --force
⚠️ WARNING: Deletion is irreversible! All vectors are permanently lost.
Index Naming Best Practices
Good Names
- production-docs-search - Environment + purpose
- dev-product-recommendations - Environment + use case
- customer-support-rag - Descriptive use case
- en-knowledge-base - Language + type
Bad Names
- index1 - Not descriptive
- my_index - Use dashes, not underscores
- PRODUCTION - Use lowercase
- this-is-a-very-long-index-name-that-exceeds-limits - Too long
Naming Rules
- Lowercase letters and numbers only
- Dashes allowed (not underscores or spaces)
- Must start with a letter
- Max 32 characters
- No special characters
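The rules above can be condensed into a single check that runs before calling Wrangler or the API. This is a sketch based only on the rules listed here; `INDEX_NAME_PATTERN` and `isValidIndexName` are hypothetical names, and Cloudflare's own validation may differ in details.

```typescript
// Starts with a lowercase letter, followed by up to 31 lowercase letters,
// digits, or dashes, for a maximum of 32 characters in total.
const INDEX_NAME_PATTERN = /^[a-z][a-z0-9-]{0,31}$/;

// Returns true when `name` satisfies the naming rules listed above.
function isValidIndexName(name: string): boolean {
  return INDEX_NAME_PATTERN.test(name);
}
```

Note that a name such as `index1` passes this check: it is syntactically valid, just not descriptive.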
Common Patterns
Multi-Environment Setup
# Development
npx wrangler vectorize create dev-docs-search \
--dimensions=768 --metric=cosine
# Staging
npx wrangler vectorize create staging-docs-search \
--dimensions=768 --metric=cosine
# Production
npx wrangler vectorize create prod-docs-search \
--dimensions=768 --metric=cosine
// wrangler.jsonc
{
  "env": {
    "dev": {
      "vectorize": [
        { "binding": "VECTORIZE", "index_name": "dev-docs-search" }
      ]
    },
    "staging": {
      "vectorize": [
        { "binding": "VECTORIZE", "index_name": "staging-docs-search" }
      ]
    },
    "production": {
      "vectorize": [
        { "binding": "VECTORIZE", "index_name": "prod-docs-search" }
      ]
    }
  }
}
Multi-Tenant with Namespaces
Instead of creating separate indexes per customer, use one index with namespaces:
# Single index for all tenants
npx wrangler vectorize create multi-tenant-index \
--dimensions=768 --metric=cosine
// Insert with namespace
await env.VECTORIZE_INDEX.upsert([{
  id: 'doc-1',
  values: embedding,
  namespace: 'customer-abc123', // Isolates by customer
  metadata: { title: 'Customer document' }
}]);
// Query within namespace
const results = await env.VECTORIZE_INDEX.query(queryVector, {
  topK: 5,
  namespace: 'customer-abc123' // Only search this customer's data
});
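Because tenant isolation here depends on every query passing the right namespace, it can help to derive the namespace in one place. The sketch below assumes a `customer-<id>` naming convention; `NamespacedQueryIndex`, `tenantNamespace`, and `queryForTenant` are hypothetical names, and the interface models only the part of the binding that is used.

```typescript
// Minimal structural subset of the index binding used here (an assumption;
// the real VectorizeIndex type has more methods and options).
interface NamespacedQueryIndex {
  query(
    vector: number[],
    options: { topK: number; namespace: string }
  ): Promise<{ matches: { id: string; score: number }[] }>;
}

// Hypothetical convention: one namespace per customer ID.
function tenantNamespace(customerId: string): string {
  return `customer-${customerId}`;
}

// Routes a query through the tenant's namespace so that the namespace
// is set in exactly one place.
function queryForTenant(
  index: NamespacedQueryIndex,
  customerId: string,
  vector: number[],
  topK: number = 5
): Promise<{ matches: { id: string; score: number }[] }> {
  return index.query(vector, { topK, namespace: tenantNamespace(customerId) });
}
```

In a Worker this might be called as `queryForTenant(env.VECTORIZE_INDEX, 'abc123', queryVector)`, assuming the binding is structurally compatible with the interface above.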
Troubleshooting
"Index name already exists"
# Check existing indexes
npx wrangler vectorize list
# Delete old index if needed
npx wrangler vectorize delete old-name --force
"Cannot change dimensions"
There is no in-place fix. Dimensions are permanent, so you must create a new index with the correct dimensions and re-insert all vectors.
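Re-inserting a large set of vectors into a replacement index is usually done in batches. The helper below is a generic sketch; `chunk` is a hypothetical name, and the right batch size is an assumption to check against current Vectorize limits.

```typescript
// Splits `items` into consecutive batches of at most `size` elements.
function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1 || Math.floor(size) !== size) {
    throw new Error("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

Each batch could then be passed to `upsert` on the new index's binding, one call per batch.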
"Wrangler version 3.71.0 required"
# Update Wrangler
npm install -g wrangler@latest
# Or use npx
npx wrangler@latest vectorize create ...