Vector Stores - Complete Reference

In-depth guide to OpenAI's Vector Stores for the Assistants API.


Overview

Vector Stores provide scalable semantic search infrastructure for the file_search tool:

  • Capacity: Up to 10,000 files per vector store
  • Automatic: Chunking, embedding, indexing
  • Search: Vector + keyword hybrid with reranking
  • Pricing: $0.10/GB/day (first 1 GB free)

Creating Vector Stores

Basic Creation

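import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
const vectorStore = await openai.beta.vectorStores.create({
  name: "Company Knowledge Base",
});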

With Auto-Expiration

const vectorStore = await openai.beta.vectorStores.create({
  name: "Temporary KB",
  expires_after: {
    anchor: "last_active_at",
    days: 7,
  },
});

Anchors:

  • last_active_at: Expires N days after last use (recommended)
  • created_at: Expires N days after creation (not yet available)
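
To make the last_active_at anchor concrete, the sketch below estimates when a store would expire. It assumes expiry is simply the last-use timestamp plus N whole days, with timestamps in Unix seconds like created_at and expires_at; the helper name estimateExpiry is hypothetical, and the authoritative value is always the expires_at field returned by the API.

```typescript
// Sketch only: the service computes expires_at itself; this mirrors the rule
// "expires N days after last use" under the assumption stated above.
const SECONDS_PER_DAY = 24 * 60 * 60;

function estimateExpiry(lastActiveAt: number, days: number): number {
  if (!Number.isInteger(days) || days < 1) {
    throw new RangeError("days must be a positive integer");
  }
  return lastActiveAt + days * SECONDS_PER_DAY;
}

// A store last used at 2025-01-01T00:00:00Z with `days: 7`
const lastActive = Date.UTC(2025, 0, 1) / 1000;
const expiry = estimateExpiry(lastActive, 7);
console.log(new Date(expiry * 1000).toISOString()); // 2025-01-08T00:00:00.000Z
```

Each use of the store moves last_active_at forward, so the estimate moves with it.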

With Metadata

const vectorStore = await openai.beta.vectorStores.create({
  name: "Q4 2025 Documentation",
  metadata: {
    department: "sales",
    quarter: "Q4-2025",
    version: "1.0",
  },
});

Adding Files

Single File Upload

import fs from "fs";

// 1. Upload file to OpenAI
const file = await openai.files.create({
  file: fs.createReadStream("document.pdf"),
  purpose: "assistants",
});

// 2. Add to vector store
await openai.beta.vectorStores.files.create(vectorStore.id, {
  file_id: file.id,
});

Batch Upload

// Upload multiple files
const files = await Promise.all([
  openai.files.create({ file: fs.createReadStream("doc1.pdf"), purpose: "assistants" }),
  openai.files.create({ file: fs.createReadStream("doc2.md"), purpose: "assistants" }),
  openai.files.create({ file: fs.createReadStream("doc3.docx"), purpose: "assistants" }),
]);

// Batch add to vector store
const batch = await openai.beta.vectorStores.fileBatches.create(vectorStore.id, {
  file_ids: files.map(f => f.id),
});

// Monitor progress
let batchStatus = batch;
while (batchStatus.status === 'in_progress') {
  await new Promise(r => setTimeout(r, 1000));
  batchStatus = await openai.beta.vectorStores.fileBatches.retrieve(
    vectorStore.id,
    batch.id
  );
  console.log(`${batchStatus.file_counts.completed}/${batchStatus.file_counts.total}`);
}

Benefits of Batch Upload:

  • Faster processing (parallel indexing)
  • Single operation to track
  • Better error handling

Vector Store States

State         Description
in_progress   Files being indexed
completed     All files indexed successfully
failed        Indexing failed
expired       Auto-expiration triggered

Important: Wait for completed before using with assistants.
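
One way to turn the state table into a decision is a small pure function over the retrieved object. This is a sketch: StoreSnapshot lists only the fields used here, the Readiness labels are invented for illustration, and it assumes file_counts carries in_progress and failed counters alongside completed and total.

```typescript
// Minimal shape of the fields this check reads; the real object has more.
interface StoreSnapshot {
  status: string;
  file_counts: { in_progress: number; completed: number; failed: number; total: number };
}

type Readiness = "ready" | "wait" | "inspect_failures";

function readiness(store: StoreSnapshot): Readiness {
  // Anything still indexing means the store is not searchable in full yet.
  if (store.status === "in_progress" || store.file_counts.in_progress > 0) return "wait";
  // Completed with no failed files: safe to attach to an assistant.
  if (store.status === "completed" && store.file_counts.failed === 0) return "ready";
  // Failed, expired, or completed with failed files: a human should look.
  return "inspect_failures";
}
```

Passing the result of openai.beta.vectorStores.retrieve(...) to readiness() gives a single value to branch on before attaching the store.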


File Management

List Files in Vector Store

const files = await openai.beta.vectorStores.files.list(vectorStore.id, {
  limit: 100,
});

for (const file of files.data) {
  console.log(`${file.id}: ${file.status}`);
}

Remove File from Vector Store

await openai.beta.vectorStores.files.del(vectorStore.id, fileId);

Note: This removes the file from the vector store but doesn't delete the file from OpenAI's storage.
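
If the underlying file should go as well, the two deletions can be chained. The sketch below keeps the API calls behind a small interface (FileOps and removeFileEverywhere are hypothetical names) so the ordering is easy to see and to test without network access.

```typescript
// The two operations needed, abstracted so a fake can stand in for the client.
interface FileOps {
  removeFromStore(vectorStoreId: string, fileId: string): Promise<unknown>;
  deleteFile(fileId: string): Promise<unknown>;
}

async function removeFileEverywhere(
  ops: FileOps,
  vectorStoreId: string,
  fileId: string,
): Promise<void> {
  // Detach first so the store never references a file that no longer exists.
  await ops.removeFromStore(vectorStoreId, fileId);
  // Then delete the uploaded file from storage.
  await ops.deleteFile(fileId);
}
```

With the client used elsewhere in this guide, ops could plausibly be wired as removeFromStore: (vs, f) => openai.beta.vectorStores.files.del(vs, f) and deleteFile: (f) => openai.files.del(f). A file shared by several vector stores should be detached from all of them before deleting it.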

Check File Status

const file = await openai.beta.vectorStores.files.retrieve(vectorStore.id, fileId);

console.log(file.status); // "in_progress", "completed", "failed"

if (file.status === 'failed') {
  console.error(file.last_error);
}

Pricing & Cost Management

Pricing Structure

  • Storage: $0.10 per GB per day
  • Free tier: First 1 GB
  • Calculation: Total vector store size (not original file size)

Example Costs:

Original Files   Vector Store Size   Daily Cost   Monthly Cost
500 MB           0.5 GB              $0.00        $0.00 (free tier)
2 GB             2 GB                $0.10        $3.00
10 GB            10 GB               $0.90        $27.00
50 GB            50 GB               $4.90        $147.00
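
The rows above follow from one formula: subtract the free 1 GB, multiply by $0.10, and use a 30-day month. The sketch below reproduces them; dailyCostUsd is a hypothetical helper, and it assumes the free tier is applied once to the total size passed in.

```typescript
const PRICE_PER_GB_DAY = 0.10; // USD, from the pricing structure above
const FREE_GB = 1;

function dailyCostUsd(sizeGB: number): number {
  const billableGB = Math.max(0, sizeGB - FREE_GB);
  // Round to whole cents to hide floating-point noise (9 * 0.1 is not exactly 0.9).
  return Math.round(billableGB * PRICE_PER_GB_DAY * 100) / 100;
}

for (const sizeGB of [0.5, 2, 10, 50]) {
  const daily = dailyCostUsd(sizeGB);
  console.log(`${sizeGB} GB: $${daily.toFixed(2)}/day, $${(daily * 30).toFixed(2)}/month`);
}
```

Rounding to cents is fine for estimates at this scale; for billing-grade figures, keep the unrounded value.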

Monitor Usage

const store = await openai.beta.vectorStores.retrieve(vectorStoreId);

const sizeGB = store.usage_bytes / (1024 * 1024 * 1024);
const costPerDay = Math.max(0, (sizeGB - 1) * 0.10);
const costPerMonth = costPerDay * 30;

console.log(`Storage: ${sizeGB.toFixed(2)} GB`);
console.log(`Cost: $${costPerDay.toFixed(4)}/day ($${costPerMonth.toFixed(2)}/month)`);

Cost Optimization

1. Auto-Expiration:

expires_after: {
  anchor: "last_active_at",
  days: 30,
}

2. Regular Cleanup:

async function cleanupUnusedVectorStores() {
  const stores = await openai.beta.vectorStores.list({ limit: 100 });

  for (const store of stores.data) {
    // Measure idle time from last use; fall back to creation time if never used
    const lastUsed = store.last_active_at ?? store.created_at;
    const idleDays = (Date.now() / 1000 - lastUsed) / (60 * 60 * 24);

    if (idleDays > 90) { // unused for more than 90 days
      await openai.beta.vectorStores.del(store.id);
      console.log(`Deleted: ${store.name}`);
    }
  }
}

3. Deduplicate Content:

  • Remove duplicate files before upload
  • Combine similar documents
  • Archive old versions
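
Exact duplicates can be caught before upload by hashing file contents. This is a minimal sketch, assuming files are already loaded into memory as Buffers; Candidate and dedupeByContent are hypothetical names, and near-duplicates (reformatted or lightly edited copies) are not detected.

```typescript
import { createHash } from "node:crypto";

interface Candidate {
  name: string;
  data: Buffer;
}

// Keeps the first file seen for each distinct SHA-256 content digest.
function dedupeByContent(files: Candidate[]): Candidate[] {
  const seen = new Set<string>();
  const unique: Candidate[] = [];
  for (const file of files) {
    const digest = createHash("sha256").update(file.data).digest("hex");
    if (!seen.has(digest)) {
      seen.add(digest);
      unique.push(file);
    }
  }
  return unique;
}
```

For large files, streaming each file through the hash instead of buffering it would keep memory use flat.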

Using with Assistants

Attach to Assistant

const assistant = await openai.beta.assistants.create({
  name: "Support Bot",
  tools: [{ type: "file_search" }],
  tool_resources: {
    file_search: {
      vector_store_ids: [vectorStore.id],
    },
  },
  model: "gpt-4o",
});

Multiple Vector Stores

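// vector_store_ids is an array, so several knowledge bases can be listed
tool_resources: {
  file_search: {
    vector_store_ids: [generalKBId, productDocsId, policyDocsId],
  },
}

Limit: The current API accepts at most 1 vector store per assistant (subject to change), so the multi-ID form above only becomes usable if that limit is raised. Until then, merge the knowledge bases into a single vector store.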


Advanced Operations

Update Metadata

const updated = await openai.beta.vectorStores.update(vectorStoreId, {
  name: "Updated Name",
  metadata: {
    version: "2.0",
    last_updated: new Date().toISOString(),
  },
});

Retrieve Vector Store Details

const store = await openai.beta.vectorStores.retrieve(vectorStoreId);

console.log({
  id: store.id,
  name: store.name,
  status: store.status,
  usage_bytes: store.usage_bytes,
  file_counts: store.file_counts,
  created_at: new Date(store.created_at * 1000),
  expires_at: store.expires_at ? new Date(store.expires_at * 1000) : null,
  metadata: store.metadata,
});

List All Vector Stores

const stores = await openai.beta.vectorStores.list({
  limit: 20,
  order: "desc",
});

for (const store of stores.data) {
  console.log(`${store.name}: ${store.file_counts.completed} files`);
}

Best Practices

1. Pre-Process Documents

  • Remove headers/footers
  • Clean formatting
  • Extract text from images (OCR separately)
  • Organize with clear structure
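
As one example of the header/footer step, the sketch below drops lines that repeat on most pages of a document that has already been split into page strings. It is a heuristic, not a general solution: stripRepeatedLines and the 0.8 default share are assumptions chosen for illustration, and footers that vary per page (such as page numbers) are left untouched.

```typescript
// Removes lines that appear on at least `minShare` of all pages (and on at least two).
function stripRepeatedLines(pages: string[], minShare = 0.8): string[] {
  const pageCount = new Map<string, number>();
  for (const page of pages) {
    // Count each distinct non-empty line once per page.
    const distinct = Array.from(new Set(page.split("\n").map((l) => l.trim())));
    for (const line of distinct) {
      if (line) pageCount.set(line, (pageCount.get(line) ?? 0) + 1);
    }
  }
  const threshold = Math.max(2, Math.ceil(pages.length * minShare));
  return pages.map((page) =>
    page
      .split("\n")
      .filter((l) => (pageCount.get(l.trim()) ?? 0) < threshold)
      .join("\n"),
  );
}
```

Body text that legitimately repeats on most pages would also be removed, so the output is worth spot-checking before upload.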

2. Monitor Indexing

async function waitForIndexing(vectorStoreId: string, batchId: string) {
  let batch;
  const startTime = Date.now();

  do {
    batch = await openai.beta.vectorStores.fileBatches.retrieve(vectorStoreId, batchId);

    if (batch.status === 'failed') {
      throw new Error('Batch indexing failed');
    }

    console.log(`Progress: ${batch.file_counts.completed}/${batch.file_counts.total}`);

    await new Promise(r => setTimeout(r, 2000));

    // Timeout after 10 minutes
    if (Date.now() - startTime > 600000) {
      throw new Error('Indexing timeout');
    }
  } while (batch.status === 'in_progress');

  return batch;
}

3. Set Reasonable Expiration

// For temporary projects
expires_after: { anchor: "last_active_at", days: 7 }

// For active knowledge bases
expires_after: { anchor: "last_active_at", days: 90 }

// For permanent KB (no expiration)
// Don't set expires_after

4. Tag with Metadata

metadata: {
  project: "project-alpha",
  environment: "production",
  version: "1.0",
  owner: "team@company.com",
}
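
Metadata is size-limited, so it can help to validate tags before sending them. The limits in this sketch (16 pairs, 64-character keys, 512-character values) are what OpenAI's API reference states for object metadata as far as I know; treat them as assumptions to verify, and metadataProblems as a hypothetical helper.

```typescript
// Assumed limits; confirm against the current API reference.
const MAX_PAIRS = 16;
const MAX_KEY_LENGTH = 64;
const MAX_VALUE_LENGTH = 512;

// Returns a list of human-readable problems; an empty list means the metadata passes.
function metadataProblems(metadata: Record<string, string>): string[] {
  const problems: string[] = [];
  const entries = Object.entries(metadata);
  if (entries.length > MAX_PAIRS) {
    problems.push(`too many keys: ${entries.length} > ${MAX_PAIRS}`);
  }
  for (const [key, value] of entries) {
    if (key.length > MAX_KEY_LENGTH) problems.push(`key too long: ${key.slice(0, 20)}...`);
    if (value.length > MAX_VALUE_LENGTH) problems.push(`value too long for key: ${key}`);
  }
  return problems;
}
```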

Troubleshooting

Files Not Indexing

Check file status:

const file = await openai.beta.vectorStores.files.retrieve(vectorStoreId, fileId);

if (file.status === 'failed') {
  console.error(file.last_error);
}

Common causes:

  • Unsupported file format
  • Corrupted file
  • File too large (>512 MB)
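
Two of these causes can be screened locally before uploading. In the sketch below, the extension set is an illustrative subset rather than the official supported list, 512 MB is interpreted as 512 * 1024 * 1024 bytes, and preflight is a hypothetical helper; corruption cannot be detected this way.

```typescript
import { extname } from "node:path";

const MAX_FILE_BYTES = 512 * 1024 * 1024; // assumed reading of the 512 MB limit
// Illustrative subset only; check the current list of supported formats.
const ALLOWED_EXTENSIONS = new Set([".pdf", ".md", ".docx", ".txt"]);

// Returns a reason to skip the file, or null if it passes both checks.
function preflight(fileName: string, sizeBytes: number): string | null {
  const ext = extname(fileName).toLowerCase();
  if (!ALLOWED_EXTENSIONS.has(ext)) return `unsupported extension: ${ext || "(none)"}`;
  if (sizeBytes > MAX_FILE_BYTES) return `file too large: ${sizeBytes} bytes`;
  return null;
}
```

The size can come from fs.statSync(path).size, so the check costs nothing compared with a failed upload.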

Vector Store Shows failed Status

Check batch details:

const batch = await openai.beta.vectorStores.fileBatches.retrieve(vectorStoreId, batchId);
console.log(batch.file_counts); // Check failed count

Solutions:

  • Remove failed files
  • Re-upload with correct format
  • Check error messages
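
To decide which of these solutions applies, it helps to group failed files by error code. This sketch works on an array shaped like the data returned by the file listing call shown earlier; StoreFile covers only the fields used, and failuresByCode is a hypothetical helper.

```typescript
// Minimal shape of a vector store file entry as used in this guide.
interface StoreFile {
  id: string;
  status: string;
  last_error: { code: string; message: string } | null;
}

// Maps each error code to the IDs of the files that failed with it.
function failuresByCode(files: StoreFile[]): Map<string, string[]> {
  const grouped = new Map<string, string[]>();
  for (const file of files) {
    if (file.status !== "failed") continue;
    const code = file.last_error?.code ?? "unknown";
    const ids = grouped.get(code) ?? [];
    ids.push(file.id);
    grouped.set(code, ids);
  }
  return grouped;
}
```

Files grouped under a format-related code are candidates for conversion and re-upload, while the rest may simply be worth retrying.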

High Storage Costs

Audit vector stores:

const stores = await openai.beta.vectorStores.list({ limit: 100 });
let totalGB = 0;

for (const store of stores.data) {
  const sizeGB = store.usage_bytes / (1024 * 1024 * 1024);
  totalGB += sizeGB;
  console.log(`${store.name}: ${sizeGB.toFixed(2)} GB`);
}

console.log(`Total: ${totalGB.toFixed(2)} GB = $${(Math.max(0, totalGB - 1) * 0.10).toFixed(2)}/day`);

Limits

Resource                    Limit
Files per vector store      10,000
Vector stores per account   Not documented
File size                   512 MB
Storage (billable)          Unlimited (pay per GB)
Indexing time               Varies by size

Last Updated: 2025-10-25