Initial commit

2025-11-30 08:25:15 +08:00
commit 0c577730d5
20 changed files with 5085 additions and 0 deletions
--- a/references/vector-stores.md
+++ b/references/vector-stores.md
@@ -0,0 +1,422 @@
+# Vector Stores - Complete Reference
+
+In-depth guide to OpenAI's Vector Stores for the Assistants API.
+
+---
+
+## Overview
+
+Vector Stores provide scalable semantic search infrastructure for the file_search tool:
+- **Capacity**: Up to 10,000 files per vector store
+- **Automatic**: Chunking, embedding, indexing
+- **Search**: Vector + keyword hybrid with reranking
+- **Pricing**: $0.10/GB/day (first 1GB free)
+
+---
+
+## Creating Vector Stores
+
+### Basic Creation
+
+```typescript
+const vectorStore = await openai.beta.vectorStores.create({
+  name: "Company Knowledge Base",
+});
+```
+
+### With Auto-Expiration
+
+```typescript
+const vectorStore = await openai.beta.vectorStores.create({
+  name: "Temporary KB",
+  expires_after: {
+    anchor: "last_active_at",
+    days: 7,
+  },
+});
+```
+
+**Anchors**:
+- `last_active_at`: Expires N days after last use (recommended)
+- `created_at`: Expires N days after creation (not yet available)
+
+### With Metadata
+
+```typescript
+const vectorStore = await openai.beta.vectorStores.create({
+  name: "Q4 2025 Documentation",
+  metadata: {
+    department: "sales",
+    quarter: "Q4-2025",
+    version: "1.0",
+  },
+});
+```
+
+---
+
+## Adding Files
+
+### Single File Upload
+
+```typescript
+import fs from "fs";
+
+// 1. Upload file to OpenAI
+const file = await openai.files.create({
+  file: fs.createReadStream("document.pdf"),
+  purpose: "assistants",
+});
+
+// 2. Add to vector store
+await openai.beta.vectorStores.files.create(vectorStore.id, {
+  file_id: file.id,
+});
+```
+
+### Batch Upload (Recommended)
+
+```typescript
+// Upload multiple files
+const files = await Promise.all([
+  openai.files.create({ file: fs.createReadStream("doc1.pdf"), purpose: "assistants" }),
+  openai.files.create({ file: fs.createReadStream("doc2.md"), purpose: "assistants" }),
+  openai.files.create({ file: fs.createReadStream("doc3.docx"), purpose: "assistants" }),
+]);
+
+// Batch add to vector store
+const batch = await openai.beta.vectorStores.fileBatches.create(vectorStore.id, {
+  file_ids: files.map(f => f.id),
+});
+
+// Monitor progress
+let batchStatus = batch;
+while (batchStatus.status === 'in_progress') {
+  await new Promise(r => setTimeout(r, 1000));
+  batchStatus = await openai.beta.vectorStores.fileBatches.retrieve(
+    vectorStore.id,
+    batch.id
+  );
+  console.log(`${batchStatus.file_counts.completed}/${batchStatus.file_counts.total}`);
+}
+```
+
+**Benefits of Batch Upload**:
+- Faster processing (parallel indexing)
+- Single operation to track
+- Better error handling
+
+---
+
+## Vector Store States
+
+| State | Description |
+|-------|-------------|
+| `in_progress` | Files being indexed |
+| `completed` | All files indexed successfully |
+| `failed` | Indexing failed (reported on individual files and file batches) |
+| `expired` | Auto-expiration triggered |
+
+**Important**: Wait for `completed` before using with assistants.
+
+---
+
+## File Management
+
+### List Files in Vector Store
+
+```typescript
+const files = await openai.beta.vectorStores.files.list(vectorStore.id, {
+  limit: 100,
+});
+
+for (const file of files.data) {
+  console.log(`${file.id}: ${file.status}`);
+}
+```
+
+### Remove File from Vector Store
+
+```typescript
+await openai.beta.vectorStores.files.del(vectorStore.id, fileId);
+```
+
+**Note**: This removes the file from the vector store but doesn't delete the file from OpenAI's storage.
+
+### Check File Status
+
+```typescript
+const file = await openai.beta.vectorStores.files.retrieve(vectorStore.id, fileId);
+
+console.log(file.status); // "in_progress", "completed", "cancelled", or "failed"
+
+if (file.status === 'failed') {
+  console.error(file.last_error);
+}
+```
+
+---
+
+## Pricing & Cost Management
+
+### Pricing Structure
+
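+- **Storage**: $0.10 per GB per day
+- **Free tier**: First 1 GB
+- **Calculation**: Billed on the total vector store size (`usage_bytes`), not the original file size
+
+**Example Costs** (assuming the indexed size matches the original file size, and a 30-day month):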
+| Original Files | Vector Store Size | Daily Cost | Monthly Cost |
+|----------------|-------------------|------------|--------------|
+| 500 MB | 0.5 GB | $0.00 | $0.00 (free tier) |
+| 2 GB | 2 GB | $0.10 | $3.00 |
+| 10 GB | 10 GB | $0.90 | $27.00 |
+| 50 GB | 50 GB | $4.90 | $147.00 |
+
+### Monitor Usage
+
+```typescript
+const store = await openai.beta.vectorStores.retrieve(vectorStoreId);
+
+const sizeGB = store.usage_bytes / (1024 * 1024 * 1024);
+const costPerDay = Math.max(0, (sizeGB - 1) * 0.10);
+const costPerMonth = costPerDay * 30;
+
+console.log(`Storage: ${sizeGB.toFixed(2)} GB`);
+console.log(`Cost: $${costPerDay.toFixed(4)}/day ($${costPerMonth.toFixed(2)}/month)`);
+```
+
+### Cost Optimization
+
+**1. Auto-Expiration**:
+```typescript
+expires_after: {
+  anchor: "last_active_at",
+  days: 30,
+}
+```
+
+**2. Regular Cleanup**:
+```typescript
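+async function cleanupUnusedVectorStores() {
+  // Note: this only inspects the first page (up to 100 stores)
+  const stores = await openai.beta.vectorStores.list({ limit: 100 });
+  const nowSeconds = Date.now() / 1000;
+
+  for (const store of stores.data) {
+    // Measure idle time, falling back to created_at if the store was never used
+    const lastUsed = store.last_active_at ?? store.created_at;
+    const idleDays = (nowSeconds - lastUsed) / (60 * 60 * 24);
+
+    if (idleDays > 90) { // idle for more than 90 days
+      await openai.beta.vectorStores.del(store.id);
+      console.log(`Deleted: ${store.name}`);
+    }
+  }
+}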
+```
+
+**3. Deduplicate Content**:
+- Remove duplicate files before upload
+- Combine similar documents
+- Archive old versions
+
+---
+
+## Using with Assistants
+
+### Attach to Assistant
+
+```typescript
+const assistant = await openai.beta.assistants.create({
+  name: "Support Bot",
+  tools: [{ type: "file_search" }],
+  tool_resources: {
+    file_search: {
+      vector_store_ids: [vectorStore.id],
+    },
+  },
+  model: "gpt-4o",
+});
+```
+
+### Multiple Vector Stores
+
+```typescript
+// vector_store_ids is an array, but currently only one ID is accepted,
+// so merge knowledge bases into a single store before attaching it
+tool_resources: {
+  file_search: {
+    vector_store_ids: [combinedKBId],
+  },
+}
+```
+
+**Limit**: Maximum of 1 vector store per assistant in current API (subject to change).
+
+---
+
+## Advanced Operations
+
+### Update Metadata
+
+```typescript
+const updated = await openai.beta.vectorStores.update(vectorStoreId, {
+  name: "Updated Name",
+  metadata: {
+    version: "2.0",
+    last_updated: new Date().toISOString(),
+  },
+});
+```
+
+### Retrieve Vector Store Details
+
+```typescript
+const store = await openai.beta.vectorStores.retrieve(vectorStoreId);
+
+console.log({
+  id: store.id,
+  name: store.name,
+  status: store.status,
+  usage_bytes: store.usage_bytes,
+  file_counts: store.file_counts,
+  created_at: new Date(store.created_at * 1000),
+  expires_at: store.expires_at ? new Date(store.expires_at * 1000) : null,
+  metadata: store.metadata,
+});
+```
+
+### List All Vector Stores
+
+```typescript
+const stores = await openai.beta.vectorStores.list({
+  limit: 20,
+  order: "desc",
+});
+
+for (const store of stores.data) {
+  console.log(`${store.name}: ${store.file_counts.completed} files`);
+}
+```
+
+---
+
+## Best Practices
+
+### 1. Pre-Process Documents
+
+- Remove headers/footers
+- Clean formatting
+- Extract text from images (OCR separately)
+- Organize with clear structure
+
+### 2. Monitor Indexing
+
+```typescript
+async function waitForIndexing(vectorStoreId: string, batchId: string) {
+  const startTime = Date.now();
+
+  while (true) {
+    const batch = await openai.beta.vectorStores.fileBatches.retrieve(vectorStoreId, batchId);
+
+    if (batch.status === 'failed') {
+      throw new Error('Batch indexing failed');
+    }
+
+    console.log(`Progress: ${batch.file_counts.completed}/${batch.file_counts.total}`);
+
+    // Return as soon as the batch leaves in_progress, without an extra sleep
+    if (batch.status !== 'in_progress') {
+      return batch;
+    }
+
+    // Timeout after 10 minutes (checked only while still in progress)
+    if (Date.now() - startTime > 600000) {
+      throw new Error('Indexing timeout');
+    }
+
+    await new Promise(r => setTimeout(r, 2000));
+  }
+}
+```
+
+### 3. Set Reasonable Expiration
+
+```typescript
+// For temporary projects
+expires_after: { anchor: "last_active_at", days: 7 }
+
+// For active knowledge bases
+expires_after: { anchor: "last_active_at", days: 90 }
+
+// For permanent KB (no expiration)
+// Don't set expires_after
+```
+
+### 4. Tag with Metadata
+
+```typescript
+metadata: {
+  project: "project-alpha",
+  environment: "production",
+  version: "1.0",
+  owner: "team@company.com",
+}
+```
+
+---
+
+## Troubleshooting
+
+### Files Not Indexing
+
+**Check file status**:
+```typescript
+const file = await openai.beta.vectorStores.files.retrieve(vectorStoreId, fileId);
+
+if (file.status === 'failed') {
+  console.error(file.last_error);
+}
+```
+
+**Common causes**:
+- Unsupported file format
+- Corrupted file
+- File too large (>512 MB)
+
+### Vector Store Shows `failed` Status
+
+**Check batch details**:
+```typescript
+const batch = await openai.beta.vectorStores.fileBatches.retrieve(vectorStoreId, batchId);
+console.log(batch.file_counts); // Check failed count
+```
+
+**Solutions**:
+- Remove failed files
+- Re-upload with correct format
+- Check error messages
+
+### High Storage Costs
+
+**Audit vector stores**:
+```typescript
+const stores = await openai.beta.vectorStores.list({ limit: 100 });
+let totalGB = 0;
+
+for (const store of stores.data) {
+  const sizeGB = store.usage_bytes / (1024 * 1024 * 1024);
+  totalGB += sizeGB;
+  console.log(`${store.name}: ${sizeGB.toFixed(2)} GB`);
+}
+
+console.log(`Total: ${totalGB.toFixed(2)} GB = $${(Math.max(0, totalGB - 1) * 0.10).toFixed(2)}/day`);
+```
+
+---
+
+## Limits
+
+| Resource | Limit |
+|----------|-------|
+| Files per vector store | 10,000 |
+| Vector stores per account | Not documented |
+| File size | 512 MB |
+| Storage (billable) | Unlimited (pay per GB) |
+| Indexing time | Varies by size |
+
+---
+
+**Last Updated**: 2025-10-25
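
The limits in the table above (512 MB per file, 10,000 files per vector store) can be checked locally before anything is uploaded. The following is a minimal sketch and not part of the committed file: `CandidateFile`, `preflight`, and `EXAMPLE_EXTENSIONS` are hypothetical names, and the extension allow-list only mirrors the file types used in this reference's examples, not the full set of formats the API accepts.

```typescript
// Limits taken from the "Limits" table in this reference.
const MAX_FILE_BYTES = 512 * 1024 * 1024; // 512 MB per file
const MAX_FILES_PER_STORE = 10000;        // files per vector store

// Hypothetical shape for a file that is about to be uploaded.
interface CandidateFile {
  name: string;
  sizeBytes: number;
}

interface PreflightResult {
  accepted: CandidateFile[];
  rejected: { file: CandidateFile; reason: string }[];
}

// Illustrative allow-list only (assumption): the extensions that appear in
// this reference's upload examples, not the API's full supported list.
const EXAMPLE_EXTENSIONS = [".pdf", ".md", ".docx"];

function preflight(
  files: CandidateFile[],
  existingFileCount = 0,
  allowedExtensions: string[] = EXAMPLE_EXTENSIONS,
): PreflightResult {
  const result: PreflightResult = { accepted: [], rejected: [] };
  let remainingSlots = MAX_FILES_PER_STORE - existingFileCount;

  for (const file of files) {
    const dot = file.name.lastIndexOf(".");
    const ext = dot >= 0 ? file.name.slice(dot).toLowerCase() : "";

    if (allowedExtensions.indexOf(ext) === -1) {
      result.rejected.push({ file, reason: `extension "${ext}" not in allow-list` });
    } else if (file.sizeBytes > MAX_FILE_BYTES) {
      result.rejected.push({ file, reason: "larger than 512 MB" });
    } else if (remainingSlots <= 0) {
      result.rejected.push({ file, reason: "vector store file limit reached" });
    } else {
      result.accepted.push(file);
      remainingSlots -= 1;
    }
  }
  return result;
}

// Usage: one valid file, one unsupported extension, one oversized file.
const report = preflight([
  { name: "handbook.pdf", sizeBytes: 4 * 1024 * 1024 },
  { name: "video.mp4", sizeBytes: 50 * 1024 * 1024 },
  { name: "archive.pdf", sizeBytes: 600 * 1024 * 1024 },
]);
console.log(report.accepted.map(f => f.name));
console.log(report.rejected.map(r => `${r.file.name}: ${r.reason}`));
```

Running a check like this first should keep oversized or unsupported files out of a batch, so the failures described under Troubleshooting surface before any upload starts.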