# Index Operations Guide

Complete guide for creating and managing Vectorize indexes.

## Index Configuration

### Critical Decisions (Cannot Be Changed!)

When creating an index, these settings are **permanent**:

1. **Dimensions**: Vector width (must match embedding model)
2. **Distance Metric**: How similarity is calculated

Choose carefully - changing either one later means creating a new index and re-inserting every vector!

### Dimensions

Dimensions must match your embedding model's output:

| Model | Provider | Dimensions | Recommended Metric |
|-------|----------|------------|-------------------|
| @cf/baai/bge-base-en-v1.5 | Workers AI | 768 | cosine |
| text-embedding-3-small | OpenAI | 1536 | cosine |
| text-embedding-3-large | OpenAI | 3072 | cosine |
| text-embedding-ada-002 | OpenAI (legacy) | 1536 | cosine |
| embed-english-v3.0 | Cohere | 1024 | cosine |

**Common Mistake**: Creating an index with 1536 dimensions but using a 768-dim model! Every vector's length must equal the index's dimensions exactly.
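
One way to catch this mistake early is to check the vector length in your own code before inserting. The sketch below is illustrative only: `assertDimensions` and `INDEX_DIMENSIONS` are hypothetical names, not part of the Vectorize API.

```typescript
// Hypothetical guard: not part of the Vectorize API.
// Throws if an embedding's length differs from the index's dimensions.
function assertDimensions(values: number[], expected: number): void {
  if (values.length !== expected) {
    throw new Error(
      `Embedding has ${values.length} dimensions, but the index expects ${expected}`
    );
  }
}

// Assumed to equal the --dimensions value used when the index was created.
const INDEX_DIMENSIONS = 768;

// A 768-dim vector passes silently; a 1536-dim vector would throw.
assertDimensions(new Array<number>(768).fill(0), INDEX_DIMENSIONS);
```

Calling the guard right before each insert or query turns a model/index mismatch into a clear local error message.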

### Distance Metrics

Choose based on your embedding model and use case:

#### Cosine Similarity (`cosine`)
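- **Best for**: Text embeddings where direction, not magnitude, carries the meaning (most common)
- **Range**: -1 (opposite) to 1 (identical)
- **Use when**: Similarity should ignore vector length; cosine divides by the magnitudes, so embeddings do not need to be L2-normalized first
- **Most common choice** - works with Workers AI, OpenAI, Cohere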

```bash
npx wrangler vectorize create my-index \
  --dimensions=768 \
  --metric=cosine
```

#### Euclidean Distance (`euclidean`)
- **Best for**: Absolute distance matters
- **Range**: 0 (identical) to ∞ (different)
- **Use when**: Magnitude of vectors is important
- **Example**: Geographic coordinates, image features

```bash
npx wrangler vectorize create geo-index \
  --dimensions=2 \
  --metric=euclidean
```

#### Dot Product (`dot-product`)
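- **Best for**: Embeddings whose magnitude carries meaning
- **Range**: -∞ to ∞
- **Use when**: The model was trained for dot-product scoring, or vectors are deliberately not normalized
- **Less common** - for L2-normalized embeddings it ranks results the same way as cosine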

```bash
npx wrangler vectorize create sparse-index \
  --dimensions=1024 \
  --metric=dot-product
```
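
To compare the three metrics above side by side, here is a minimal sketch with plain implementations of each. It is for intuition only: the function names are illustrative, and the scores Vectorize returns may be scaled or signed differently from these textbook definitions.

```typescript
// Textbook definitions of the three metrics (illustrative, not Vectorize internals).
function dot(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

function cosine(a: number[], b: number[]): number {
  return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
}

function euclidean(a: number[], b: number[]): number {
  return Math.sqrt(
    a.reduce((sum, x, i) => {
      const d = x - b[i];
      return sum + d * d;
    }, 0)
  );
}

const unit = [1, 0];
const scaled = [3, 0]; // same direction as `unit`, three times the length

console.log(cosine(unit, scaled));    // 1 (direction is identical, length ignored)
console.log(dot(unit, scaled));       // 3 (grows with magnitude)
console.log(euclidean(unit, scaled)); // 2 (the points are 2 apart)
```

The two vectors point the same way, so cosine treats them as identical while the other two metrics are sensitive to the difference in length.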

## Creating Indexes

### Via Wrangler CLI

```bash
npx wrangler vectorize create <name> \
  --dimensions=<number> \
  --metric=<metric> \
  [--description="<text>"]
```

### Via REST API

```typescript
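// accountId and apiToken are assumed to come from your own configuration or secrets
const response = await fetch(
  `https://api.cloudflare.com/client/v4/accounts/${accountId}/vectorize/v2/indexes`,
  {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiToken}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      name: 'my-index',
      description: 'Production semantic search',
      config: {
        dimensions: 768,
        metric: 'cosine',
      },
    }),
  }
);

// Surface API errors (for example, a duplicate index name) instead of failing silently
if (!response.ok) {
  throw new Error(`Index creation failed: ${response.status} ${await response.text()}`);
}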
```

## Metadata Indexes

**⚠️ CRITICAL TIMING**: Create metadata indexes IMMEDIATELY after creating the main index, BEFORE inserting any vectors!

### Why Timing Matters

Vectorize indexes a metadata property **only for vectors inserted AFTER** that metadata index was created. Vectors inserted earlier won't be filterable on that property until you upsert them again!

### Best Practice Workflow

```bash
# 1. Create main index
npx wrangler vectorize create docs-search \
  --dimensions=768 \
  --metric=cosine

# 2. IMMEDIATELY create all metadata indexes
npx wrangler vectorize create-metadata-index docs-search \
  --property-name=category --type=string

npx wrangler vectorize create-metadata-index docs-search \
  --property-name=timestamp --type=number

npx wrangler vectorize create-metadata-index docs-search \
  --property-name=published --type=boolean

# 3. Verify metadata indexes exist
npx wrangler vectorize list-metadata-index docs-search

# 4. NOW safe to start inserting vectors
```

### Metadata Index Limits

- **Max 10 metadata indexes** per Vectorize index
- **String type**: Only the first 64 bytes are indexed, truncated on a UTF-8 character boundary
- **Number type**: Float64 precision
- **Boolean type**: true/false
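
The 64-byte limit for strings counts bytes, not characters, so non-ASCII values reach it sooner. The sketch below approximates which prefix of a value would be indexed, assuming truncation never splits a UTF-8 character. `indexedPrefix` is a hypothetical helper, not a Vectorize function.

```typescript
// Hypothetical helper: approximates the indexed portion of a string value,
// assuming at most `maxBytes` UTF-8 bytes are kept and characters are never split.
function indexedPrefix(value: string, maxBytes = 64): string {
  let bytes = 0;
  let i = 0;
  while (i < value.length) {
    const cp = value.codePointAt(i)!;
    // UTF-8 encoded size of this code point
    const size = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
    if (bytes + size > maxBytes) break;
    bytes += size;
    // Code points above U+FFFF occupy two UTF-16 units in a JS string
    i += cp > 0xffff ? 2 : 1;
  }
  return value.slice(0, i);
}

console.log(indexedPrefix("a".repeat(100)).length); // 64 (one byte each)
console.log(indexedPrefix("é".repeat(40)).length);  // 32 (two bytes each)
```

If filter values can be long, such as full URLs or titles, it may be safer to filter on a shorter derived field like a slug or an ID.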

### Choosing What to Index

Only create metadata indexes for fields you'll **filter** on:

✅ **Good candidates**:
- `category` (string) - "docs", "tutorials", "guides"
- `language` (string) - "en", "es", "fr"
- `published_at` (number) - Unix timestamp
- `status` (string) - "published", "draft", "archived"
- `verified` (boolean) - true/false

❌ **Bad candidates** (don't need indexes):
- `title` (string) - only for display, not filtering
- `content` (string) - stored in metadata but not filtered
- `url` (string) - unless filtering by URL prefix
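
To show why the good candidates above are worth indexing, here is a minimal sketch that builds a query filter from them. `buildFilter` and `SearchOptions` are hypothetical names, and the `$gte` operator is assumed from Vectorize's metadata filtering syntax; check the Metadata Guide before relying on it.

```typescript
// Filter shape assumed from Vectorize's metadata filtering syntax:
// plain values mean equality, { $gte: n } is an assumed range operator.
type FilterValue = string | number | boolean | { $gte: number };
type MetadataFilter = Record<string, FilterValue>;

// Hypothetical search options for a docs-search style Worker.
interface SearchOptions {
  category?: string;
  publishedAfter?: number; // Unix timestamp
  verifiedOnly?: boolean;
}

function buildFilter(opts: SearchOptions): MetadataFilter {
  const filter: MetadataFilter = {};
  if (opts.category !== undefined) filter["category"] = opts.category;
  if (opts.publishedAfter !== undefined) {
    filter["published_at"] = { $gte: opts.publishedAfter };
  }
  if (opts.verifiedOnly) filter["verified"] = true;
  return filter;
}

const docsFilter = buildFilter({ category: "docs", publishedAfter: 1700000000 });
console.log(JSON.stringify(docsFilter));
// {"category":"docs","published_at":{"$gte":1700000000}}
```

The resulting object would be passed as the `filter` option of `query()`, alongside `topK`. Every property it mentions needs its own metadata index.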

## Wrangler Binding

After creating an index, bind it to your Worker:

### wrangler.jsonc

```jsonc
{
  "name": "my-worker",
  "main": "src/index.ts",
  "vectorize": [
    {
      "binding": "VECTORIZE_INDEX",
      "index_name": "docs-search"
    }
  ]
}
```

### TypeScript Types

```typescript
export interface Env {
  VECTORIZE_INDEX: VectorizeIndex;
}
```

## Index Management Operations

### List All Indexes

```bash
npx wrangler vectorize list
```

### Get Index Details

```bash
npx wrangler vectorize get my-index
```

**Returns**:
```json
{
  "name": "my-index",
  "description": "Production search",
  "config": {
    "dimensions": 768,
    "metric": "cosine"
  },
  "created_on": "2024-01-15T10:30:00Z",
  "modified_on": "2024-01-15T10:30:00Z"
}
```

### Get Index Info (Vector Count)

```bash
npx wrangler vectorize info my-index
```

**Returns**:
```json
{
  "vectorsCount": 12543,
  "lastProcessedMutation": {
    "id": "abc123...",
    "timestamp": "2024-01-20T14:22:00Z"
  }
}
```

### Delete Index

```bash
# With confirmation
npx wrangler vectorize delete my-index

# Skip confirmation (use with caution!)
npx wrangler vectorize delete my-index --force
```

**⚠️ WARNING**: Deletion is **irreversible**! All vectors are permanently lost.

## Index Naming Best Practices

### Good Names

- `production-docs-search` - Environment + purpose
- `dev-product-recommendations` - Environment + use case
- `customer-support-rag` - Descriptive use case
- `en-knowledge-base` - Language + type

### Bad Names

- `index1` - Not descriptive
- `my_index` - Use dashes, not underscores
- `PRODUCTION` - Use lowercase
- `this-is-a-very-long-index-name-that-exceeds-limits` - Too long

### Naming Rules

- Lowercase letters and numbers only
- Dashes allowed (not underscores or spaces)
- Must start with a letter
- Max 32 characters
- No special characters
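
The rules above can be checked before calling Wrangler. This is a minimal sketch that encodes the list exactly as written, including the 32-character limit stated here; `validateIndexName` is a hypothetical helper, not a Wrangler or Vectorize function.

```typescript
// Hypothetical validator encoding the naming rules listed above.
// Returns a list of problems; an empty list means the name passes.
function validateIndexName(name: string, maxLength = 32): string[] {
  const problems: string[] = [];
  if (name.length > maxLength) {
    problems.push(`longer than ${maxLength} characters`);
  }
  if (!/^[a-z]/.test(name)) {
    problems.push("must start with a lowercase letter");
  }
  if (!/^[a-z0-9-]*$/.test(name)) {
    problems.push("only lowercase letters, numbers, and dashes are allowed");
  }
  return problems;
}

console.log(validateIndexName("production-docs-search")); // []
console.log(validateIndexName("my_index"));
// ["only lowercase letters, numbers, and dashes are allowed"]
```

Note that `index1` passes this check: the validator only covers the syntax rules, not whether a name is descriptive.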

## Common Patterns

### Multi-Environment Setup

```bash
# Development
npx wrangler vectorize create dev-docs-search \
  --dimensions=768 --metric=cosine

# Staging
npx wrangler vectorize create staging-docs-search \
  --dimensions=768 --metric=cosine

# Production
npx wrangler vectorize create prod-docs-search \
  --dimensions=768 --metric=cosine
```

```jsonc
// wrangler.jsonc
{
  "env": {
    "dev": {
      "vectorize": [
        { "binding": "VECTORIZE", "index_name": "dev-docs-search" }
      ]
    },
    "staging": {
      "vectorize": [
        { "binding": "VECTORIZE", "index_name": "staging-docs-search" }
      ]
    },
    "production": {
      "vectorize": [
        { "binding": "VECTORIZE", "index_name": "prod-docs-search" }
      ]
    }
  }
}
```

### Multi-Tenant with Namespaces

Instead of creating separate indexes per customer, use one index with namespaces:

```bash
# Single index for all tenants
npx wrangler vectorize create multi-tenant-index \
  --dimensions=768 --metric=cosine
```

```typescript
// Insert with namespace
await env.VECTORIZE_INDEX.upsert([{
  id: 'doc-1',
  values: embedding,
  namespace: 'customer-abc123', // Isolates by customer
  metadata: { title: 'Customer document' }
}]);

// Query within namespace
const results = await env.VECTORIZE_INDEX.query(queryVector, {
  topK: 5,
  namespace: 'customer-abc123' // Only search this customer's data
});
```

## Troubleshooting

### "Index name already exists"

```bash
# Check existing indexes
npx wrangler vectorize list

# Delete old index if needed
npx wrangler vectorize delete old-name --force
```

### "Cannot change dimensions"

**No fix** - dimensions and metric are permanent. Create a new index with the correct settings and re-insert all vectors.
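
Re-inserting a large set of vectors is usually done in batches. The sketch below shows one way to split them up. `toBatches` is a hypothetical helper, and the default batch size of 1000 is an assumption to verify against current Vectorize limits.

```typescript
// Hypothetical helper: splits items into fixed-size batches for re-insertion.
// The default of 1000 is an assumption; check current Vectorize limits.
function toBatches<T>(items: T[], batchSize = 1000): T[][] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new Error("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

console.log(toBatches([1, 2, 3, 4, 5], 2)); // [[1, 2], [3, 4], [5]]

// Inside a Worker, with NEW_INDEX as a hypothetical binding to the new index:
// for (const batch of toBatches(vectors)) {
//   await env.NEW_INDEX.upsert(batch);
// }
```

If the dimensions changed because the embedding model changed, the source content has to be re-embedded with the new model first; the old vectors cannot be reused.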

### "Wrangler version 3.71.0 required"

```bash
# Update Wrangler
npm install -g wrangler@latest

# Or use npx
npx wrangler@latest vectorize create ...
```

## See Also

- [Wrangler Commands](./wrangler-commands.md)
- [Vector Operations](./vector-operations.md)
- [Metadata Guide](./metadata-guide.md)