File Search & RAG Guide
Complete guide to implementing Retrieval-Augmented Generation (RAG) with the Assistants API.
What is File Search?
A built-in tool for semantic search over documents using vector stores:
- Capacity: Up to 10,000 files per assistant (vs 20 in v1)
- Technology: Vector + keyword search with reranking
- Automatic: Chunking, embedding, and indexing handled by OpenAI
- Pricing: $0.10/GB/day (first 1GB free)
Architecture
Documents (PDF, DOCX, MD, etc.)
↓
Vector Store (chunking + embeddings)
↓
Assistant with file_search tool
↓
Semantic Search + Reranking
↓
Retrieved Context + LLM Generation
Quick Setup
1. Create Vector Store
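import fs from "fs";
import OpenAI from "openai";

// Reads OPENAI_API_KEY from the environment
const openai = new OpenAI();
const vectorStore = await openai.beta.vectorStores.create({
  name: "Product Documentation",
  expires_after: {
    anchor: "last_active_at",
    days: 30,
  },
});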
2. Upload Documents
const files = await Promise.all([
  openai.files.create({ file: fs.createReadStream("doc1.pdf"), purpose: "assistants" }),
  openai.files.create({ file: fs.createReadStream("doc2.md"), purpose: "assistants" }),
]);
const batch = await openai.beta.vectorStores.fileBatches.create(vectorStore.id, {
  file_ids: files.map(f => f.id),
});
3. Wait for Indexing
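// Poll until the batch leaves "in_progress". A separate variable is used
// because `batch` is already declared with const in step 2.
let batchStatus = await openai.beta.vectorStores.fileBatches.retrieve(vectorStore.id, batch.id);
while (batchStatus.status === 'in_progress') {
  await new Promise(r => setTimeout(r, 2000));
  batchStatus = await openai.beta.vectorStores.fileBatches.retrieve(vectorStore.id, batch.id);
}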
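The loop above polls until the batch leaves in_progress, but it never gives up and does not distinguish success from failure. A minimal sketch of a bounded poll follows. `waitForBatch`, its `retrieve` callback, and the option names are hypothetical helpers, not part of the OpenAI SDK; `retrieve` is assumed to wrap the fileBatches.retrieve call shown above.

```javascript
// Hypothetical helper: poll until the batch leaves "in_progress" or a timeout elapses.
// `retrieve` is any async function that resolves to an object with a `status` field, e.g.
//   () => openai.beta.vectorStores.fileBatches.retrieve(vectorStore.id, batch.id)
async function waitForBatch(retrieve, { intervalMs = 2000, timeoutMs = 300000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const current = await retrieve();
    if (current.status !== "in_progress") {
      // Any terminal status other than "completed" is treated as an error.
      if (current.status !== "completed") {
        throw new Error(`File batch ended with status "${current.status}"`);
      }
      return current;
    }
    if (Date.now() >= deadline) {
      throw new Error(`File batch still in progress after ${timeoutMs} ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Passing the retrieval call in as a function keeps the helper independent of the SDK, so it can be exercised with a stub before it is pointed at a real vector store.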
4. Create Assistant
const assistant = await openai.beta.assistants.create({
  name: "Knowledge Base Assistant",
  instructions: "Answer questions using the file search tool. Always cite your sources.",
  tools: [{ type: "file_search" }],
  tool_resources: {
    file_search: {
      vector_store_ids: [vectorStore.id],
    },
  },
  model: "gpt-4o",
});
Supported File Formats
- .pdf: PDFs (most common)
- .docx: Word documents
- .md, .txt: Plain text
- .html: HTML documents
- .json: JSON data
- .py, .js, .ts, .cpp, .java: Code files
Size Limits:
- Per file: 512 MB
- Total per vector store: Limited by pricing ($0.10/GB/day)
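A pre-upload check can catch unsupported formats and oversized files before any API call is made. The sketch below is a hypothetical helper (`validateUpload` is not part of the OpenAI SDK); its extension list simply mirrors the formats named in this guide and may not be exhaustive.

```javascript
// Extensions taken from the format list in this guide; not necessarily exhaustive.
const SUPPORTED_EXTENSIONS = new Set([
  ".pdf", ".docx", ".md", ".txt", ".html", ".json",
  ".py", ".js", ".ts", ".cpp", ".java",
]);
const MAX_FILE_BYTES = 512 * 1024 * 1024; // 512 MB per-file limit quoted above

// Hypothetical helper: returns a list of problems; an empty list means the file looks OK.
function validateUpload(fileName, sizeBytes) {
  const problems = [];
  const dot = fileName.lastIndexOf(".");
  const ext = dot === -1 ? "" : fileName.slice(dot).toLowerCase();
  if (!SUPPORTED_EXTENSIONS.has(ext)) {
    problems.push(`unsupported extension "${ext}"`);
  }
  if (sizeBytes > MAX_FILE_BYTES) {
    problems.push(`file is ${sizeBytes} bytes; limit is ${MAX_FILE_BYTES}`);
  }
  return problems;
}
```

Returning a list rather than throwing lets a caller report every problem for a batch of files in one pass.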
Chunking Strategy
OpenAI automatically chunks documents using:
- Max chunk size: ~800 tokens by default (adjustable through the chunking_strategy option when adding files)
- Overlap: Ensures context continuity
- Hierarchy: Preserves document structure (headers, sections)
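To make the size-plus-overlap idea concrete, here is an illustrative sliding-window chunker. It is not OpenAI's implementation: it treats whitespace-separated words as stand-ins for tokens, and `chunkWithOverlap` and its parameters are hypothetical names. The 800 default mirrors the figure quoted above; the 400-word overlap is an assumption chosen for illustration.

```javascript
// Illustrative only: words approximate tokens, and the defaults are assumptions.
function chunkWithOverlap(text, maxTokens = 800, overlapTokens = 400) {
  if (overlapTokens >= maxTokens) {
    throw new Error("overlapTokens must be smaller than maxTokens");
  }
  const words = text.split(/\s+/).filter(Boolean);
  const step = maxTokens - overlapTokens; // how far each window advances
  const chunks = [];
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + maxTokens).join(" "));
    // Stop once a window has reached the end of the text.
    if (start + maxTokens >= words.length) break;
  }
  return chunks;
}
```

Because each window starts before the previous one ends, a sentence that straddles a chunk boundary still appears whole in at least one chunk, which is the continuity the overlap is meant to provide.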
Optimize for Better Results
Document Structure:
# Main Topic
## Subtopic 1
Content here...
## Subtopic 2
Content here...
- Clear Sections: Use headers to organize content
- Concise Paragraphs: Avoid very long paragraphs (500+ words)
- Self-Contained: Each section should make sense independently
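The paragraph-length guideline can be checked mechanically before upload. The following is a rough sketch: `findLongParagraphs` is a hypothetical helper that splits Markdown on blank lines and reports paragraphs at or above a word threshold, together with the nearest preceding header.

```javascript
// Hypothetical pre-upload lint: flag paragraphs with maxWords or more words.
function findLongParagraphs(markdown, maxWords = 500) {
  const findings = [];
  let currentHeader = "(no header)";
  // Blocks are separated by one or more blank lines.
  for (const block of markdown.split(/\n\s*\n/)) {
    const body = [];
    for (const line of block.split("\n")) {
      if (/^#{1,6}\s/.test(line)) {
        currentHeader = line.replace(/^#+\s*/, "").trim(); // remember the latest header
      } else {
        body.push(line);
      }
    }
    const wordCount = body.join(" ").split(/\s+/).filter(Boolean).length;
    if (wordCount >= maxWords) {
      findings.push({ header: currentHeader, wordCount });
    }
  }
  return findings;
}
```

Each finding names the section it came from, so an author knows where to split the text or add a subheading.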
Improving Search Quality
1. Better Instructions
const assistant = await openai.beta.assistants.create({
  instructions: `You are a support assistant. When answering:
1. Use file_search to find relevant information
2. Synthesize information from multiple sources
3. Always provide citations with file names
4. If information isn't found, say so clearly
5. Don't make up information not in the documents`,
  tools: [{ type: "file_search" }],
  // ...
});
2. Query Refinement
Encourage users to be specific:
- ❌ "How do I install?"
- ✅ "How do I install the product on Windows 10?"
3. Multi-Document Answers
File Search automatically retrieves from multiple documents and combines information.
Citations
Accessing Citations
const messages = await openai.beta.threads.messages.list(thread.id);
const response = messages.data[0]; // Newest message first (default order)
for (const content of response.content) {
  if (content.type === 'text') {
    console.log('Answer:', content.text.value);
    // Citations
    for (const annotation of content.text.annotations ?? []) {
      if (annotation.type === 'file_citation') {
        console.log('Source:', annotation.file_citation.file_id);
        // `quote` may be absent depending on the API version
        if (annotation.file_citation.quote) {
          console.log('Quote:', annotation.file_citation.quote);
        }
      }
    }
  }
}
Displaying Citations
let answer = response.content[0].text.value;
// Replace citation markers with clickable links
for (const annotation of response.content[0].text.annotations) {
  if (annotation.type === 'file_citation') {
    const citation = `[${annotation.text}](source: ${annotation.file_citation.file_id})`;
    answer = answer.replace(annotation.text, citation);
  }
}
console.log(answer);
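An alternative presentation is numbered footnotes, which keeps raw file IDs out of the sentence itself. The sketch below assumes annotations shaped as in the code above (`type`, `text`, `file_citation.file_id`). `renderCitations` and the optional `fileNames` lookup (file ID to display name) are hypothetical and not part of the SDK.

```javascript
// Hypothetical helper: replace citation markers with [n] and append a source list.
function renderCitations(text, annotations, fileNames = {}) {
  const sources = []; // unique file IDs, in order of first appearance
  let rendered = text;
  for (const annotation of annotations ?? []) {
    if (annotation.type !== "file_citation") continue;
    const fileId = annotation.file_citation.file_id;
    let index = sources.indexOf(fileId);
    if (index === -1) {
      sources.push(fileId);
      index = sources.length - 1;
    }
    // split/join replaces every occurrence of the marker, not just the first.
    rendered = rendered.split(annotation.text).join(`[${index + 1}]`);
  }
  const footnotes = sources.map((id, i) => `[${i + 1}] ${fileNames[id] ?? id}`);
  return footnotes.length ? `${rendered}\n\n${footnotes.join("\n")}` : rendered;
}
```

Two markers that point at the same file share one footnote number, so the source list stays short even when a single document is cited repeatedly.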
Cost Management
Pricing Structure
- Storage: $0.10/GB/day
- Free tier: First 1GB
- Example: 5 GB stored = 4 GB billable after the free 1 GB = $0.40/day, or about $12 per 30-day month
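The arithmetic behind the example can be written as a small function. This is a sketch under the rates quoted in this guide ($0.10/GB/day, first 1 GB free); `estimateStorageCost` is a hypothetical name, and actual billing may round or meter differently.

```javascript
const FREE_GB = 1;             // free tier quoted above
const PRICE_PER_GB_DAY = 0.10; // storage rate quoted above, in USD

// Hypothetical helper: project storage cost for a given size and number of days.
function estimateStorageCost(totalGB, days = 30) {
  const billableGB = Math.max(0, totalGB - FREE_GB);
  const perDay = billableGB * PRICE_PER_GB_DAY;
  return { billableGB, perDay, total: perDay * days };
}

// Usage: 5 GB stored for 30 days
const estimate = estimateStorageCost(5);
console.log(`Billable: ${estimate.billableGB} GB, about $${estimate.total.toFixed(2)} per 30 days`);
```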
Optimization Strategies
- Auto-Expiration:
const vectorStore = await openai.beta.vectorStores.create({
  expires_after: {
    anchor: "last_active_at",
    days: 7, // Delete after 7 days of inactivity
  },
});
- Cleanup Old Stores:
async function cleanupOldVectorStores() {
  const stores = await openai.beta.vectorStores.list({ limit: 100 });
  for (const store of stores.data) {
    const ageDays = (Date.now() / 1000 - store.created_at) / (60 * 60 * 24);
    if (ageDays > 30) {
      await openai.beta.vectorStores.del(store.id);
    }
  }
}
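Because deletion is irreversible, it can help to separate choosing which stores are stale from deleting them, so the selection can be reviewed or dry-run first. A minimal sketch follows. `selectStaleStores` is a hypothetical helper that expects store objects with `id` and `created_at` (Unix seconds), as used in the cleanup function above.

```javascript
// Hypothetical helper: return the IDs of stores created more than maxAgeDays ago.
// `nowSeconds` is injectable so the selection can be tested deterministically.
function selectStaleStores(stores, maxAgeDays = 30, nowSeconds = Date.now() / 1000) {
  const cutoff = nowSeconds - maxAgeDays * 24 * 60 * 60;
  return stores
    .filter((store) => store.created_at < cutoff)
    .map((store) => store.id);
}
```

The returned IDs can be logged for review and then passed one by one to the delete call shown above.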
- Monitor Usage:
const store = await openai.beta.vectorStores.retrieve(vectorStoreId);
const sizeGB = store.usage_bytes / (1024 * 1024 * 1024);
const costPerDay = Math.max(0, (sizeGB - 1) * 0.10);
console.log(`Daily cost: $${costPerDay.toFixed(4)}`);
Advanced Patterns
Pattern: Multi-Tenant Knowledge Bases
// Separate vector store per tenant
const tenantStore = await openai.beta.vectorStores.create({
  name: `Tenant ${tenantId} KB`,
  metadata: { tenant_id: tenantId },
});
// Or: Single store with namespace simulation via file metadata
await openai.files.create({
  file: fs.createReadStream("doc.pdf"),
  purpose: "assistants",
  metadata: { tenant_id: tenantId }, // Coming soon
});
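With one store per tenant, an application needs a way to find or lazily create the right store without creating duplicates under concurrent requests. One possible sketch is below. `getTenantStoreId` and the in-memory `tenantStores` map are hypothetical, and `createStore` is assumed to wrap the vectorStores.create call shown above. A real deployment would likely persist the mapping in a database rather than in process memory.

```javascript
const tenantStores = new Map(); // tenantId -> Promise resolving to a vector store ID

// Hypothetical helper: return the tenant's store ID, creating the store on first use.
// `createStore` is any async function (tenantId) => ({ id: string, ... }).
function getTenantStoreId(tenantId, createStore) {
  if (!tenantStores.has(tenantId)) {
    const pending = createStore(tenantId).then((store) => store.id);
    // Drop the cache entry on failure so a later call can retry.
    pending.catch(() => tenantStores.delete(tenantId));
    tenantStores.set(tenantId, pending);
  }
  return tenantStores.get(tenantId);
}
```

Caching the promise rather than the resolved ID means two simultaneous requests for the same tenant share a single creation call.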
Pattern: Versioned Documentation
// Version 1.0
const v1Store = await openai.beta.vectorStores.create({
  name: "Docs v1.0",
  metadata: { version: "1.0" },
});
// Version 2.0
const v2Store = await openai.beta.vectorStores.create({
  name: "Docs v2.0",
  metadata: { version: "2.0" },
});
// Switch based on user preference
const storeId = userVersion === "1.0" ? v1Store.id : v2Store.id;
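The ternary above silently sends every version other than "1.0" to the v2 store, which stops scaling once a third version exists. A lookup table with an explicit default is one way to generalize it. `resolveStoreId` and the shape of `storesByVersion` are hypothetical.

```javascript
// Hypothetical helper: map a requested docs version to a vector store ID,
// falling back to a named default version when the request is unknown.
function resolveStoreId(storesByVersion, requestedVersion, defaultVersion) {
  const id = storesByVersion[requestedVersion] ?? storesByVersion[defaultVersion];
  if (!id) {
    throw new Error(
      `No vector store for version "${requestedVersion}" or default "${defaultVersion}"`
    );
  }
  return id;
}

// Usage with placeholder IDs (real IDs would come from v1Store.id, v2Store.id, ...)
const storesByVersion = { "1.0": "vs_docs_v1", "2.0": "vs_docs_v2" };
console.log(resolveStoreId(storesByVersion, "1.0", "2.0"));
```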
Pattern: Hybrid Search (File Search + Code Interpreter)
const assistant = await openai.beta.assistants.create({
  model: "gpt-4o", // Required; same model as in Quick Setup
  tools: [
    { type: "file_search" },
    { type: "code_interpreter" },
  ],
  tool_resources: {
    file_search: {
      vector_store_ids: [docsVectorStoreId],
    },
  },
});
// Assistant can search docs AND analyze attached data files
await openai.beta.threads.messages.create(thread.id, {
  role: "user", // Required when creating a message
  content: "Compare this sales data against the targets in our planning docs",
  attachments: [{
    file_id: salesDataFileId,
    tools: [{ type: "code_interpreter" }],
  }],
});
Troubleshooting
No Results Found
Causes:
- Vector store not fully indexed
- Poor query formulation
- Documents lack relevant content
Solutions:
- Wait for status: "completed"
- Refine query to be more specific
- Check document quality and structure
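A batch can report "completed" while some of its files failed to index, which also produces empty search results. The sketch below summarizes a batch object. `describeBatch` is a hypothetical helper, and it assumes the batch carries a `file_counts` object with `in_progress`, `completed`, `failed`, and `cancelled` fields.

```javascript
// Hypothetical helper: decide whether a file batch is fully usable for search.
function describeBatch(batch) {
  const counts = batch.file_counts ?? {};
  const failed = (counts.failed ?? 0) + (counts.cancelled ?? 0);
  return {
    ready: batch.status === "completed" && failed === 0,
    failed,                            // files that will not be searchable
    pending: counts.in_progress ?? 0,  // files still being indexed
  };
}
```

If `failed` is non-zero, re-uploading the affected files is usually a better first step than rewording the query.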
Irrelevant Results
Causes:
- Poor document structure
- Too much noise in documents
- Vague queries
Solutions:
- Add clear section headers
- Remove boilerplate/repetitive content
- Improve query specificity
High Costs
Causes:
- Too many vector stores
- Large files that don't expire
- Duplicate content
Solutions:
- Set auto-expiration
- Deduplicate documents
- Delete unused stores
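Deduplication can happen locally before anything is uploaded. The following is a minimal sketch that treats two documents as duplicates when their text is identical after whitespace normalization. `dedupeDocuments` and the `{ name, content }` document shape are hypothetical, and near-duplicates with small wording differences are not detected.

```javascript
// Hypothetical helper: keep the first document for each distinct normalized content.
function dedupeDocuments(docs) {
  const seen = new Set();
  const unique = [];
  for (const doc of docs) {
    // Collapse runs of whitespace so formatting differences do not hide duplicates.
    const key = doc.content.replace(/\s+/g, " ").trim();
    if (seen.has(key)) continue;
    seen.add(key);
    unique.push(doc);
  }
  return unique;
}
```

For large corpora, hashing the normalized text instead of storing it whole in the set would reduce memory use.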
Best Practices
- Structure documents with clear headers and sections
- Wait for indexing before using vector store
- Set auto-expiration to manage costs
- Monitor storage regularly
- Provide citations in responses
- Refine queries for better results
- Clean up old vector stores
Last Updated: 2025-10-25