Initial commit

skills/rag/SKILL.md (new file, 286 lines)

---
name: rag-implementation
description: Build Retrieval-Augmented Generation (RAG) systems for AI applications with vector databases and semantic search. Use when implementing knowledge-grounded AI, building document Q&A systems, or integrating LLMs with external knowledge bases.
allowed-tools: Read, Write, Bash
category: ai-engineering
tags: [rag, vector-databases, embeddings, retrieval, semantic-search]
version: 1.0.0
---

# RAG Implementation

Build Retrieval-Augmented Generation systems that extend AI capabilities with external knowledge sources.

## Overview

RAG (Retrieval-Augmented Generation) enhances AI applications by retrieving relevant information from knowledge bases and incorporating it into AI responses, reducing hallucinations and providing accurate, grounded answers.

## When to Use

Use this skill when:

- Building Q&A systems over proprietary documents
- Creating chatbots with current, factual information
- Implementing semantic search with natural language queries
- Reducing hallucinations with grounded responses
- Enabling AI systems to access domain-specific knowledge
- Building documentation assistants
- Creating research tools with source citation
- Developing knowledge management systems

## Core Components

### Vector Databases
Store and efficiently retrieve document embeddings for semantic search.

**Key Options:**
- **Pinecone**: Managed, scalable, production-ready
- **Weaviate**: Open-source, hybrid search capabilities
- **Milvus**: High performance, on-premise deployment
- **Chroma**: Lightweight, easy local development
- **Qdrant**: Fast, advanced filtering
- **FAISS**: Meta's library, full control

### Embedding Models
Convert text to numerical vectors for similarity search.

**Popular Models:**
- **text-embedding-ada-002** (OpenAI): General purpose, 1536 dimensions
- **all-MiniLM-L6-v2**: Fast, lightweight, 384 dimensions
- **e5-large-v2**: High quality, multilingual
- **bge-large-en-v1.5**: State-of-the-art performance

### Retrieval Strategies
Find relevant content based on user queries.

**Approaches:**
- **Dense Retrieval**: Semantic similarity via embeddings
- **Sparse Retrieval**: Keyword matching (BM25, TF-IDF)
- **Hybrid Search**: Combine dense + sparse for best results (see the sketch below)
- **Multi-Query**: Generate multiple query variations
- **Contextual Compression**: Extract only relevant parts
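
A minimal sketch of the hybrid idea in plain Java. The `denseScores` and `sparseScores` maps are hypothetical stand-ins for whatever your embedding search and BM25 search return (assumed already normalized to [0, 1]); the blending weight is an assumption you would tune for your corpus.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class HybridSearchSketch {

    /** Blend dense (semantic) and sparse (keyword) scores: w * dense + (1 - w) * sparse. */
    static Map<String, Double> hybridScores(Map<String, Double> denseScores,
                                            Map<String, Double> sparseScores,
                                            double denseWeight) {
        Set<String> docIds = new HashSet<>(denseScores.keySet());
        docIds.addAll(sparseScores.keySet());

        Map<String, Double> blended = new HashMap<>();
        for (String id : docIds) {
            double dense = denseScores.getOrDefault(id, 0.0);
            double sparse = sparseScores.getOrDefault(id, 0.0);
            blended.put(id, denseWeight * dense + (1 - denseWeight) * sparse);
        }
        return blended;
    }

    public static void main(String[] args) {
        // Hypothetical normalized scores from an embedding search and a BM25 search.
        Map<String, Double> dense = Map.of("doc-1", 0.82, "doc-2", 0.40);
        Map<String, Double> sparse = Map.of("doc-2", 0.90, "doc-3", 0.55);

        hybridScores(dense, sparse, 0.7)
                .forEach((id, score) -> System.out.println(id + " -> " + score));
    }
}
```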
## Quick Implementation

### Basic RAG Setup

```java
// Load documents from file system
List<Document> documents = FileSystemDocumentLoader.loadDocuments("/path/to/docs");

// Create embedding store
InMemoryEmbeddingStore<TextSegment> embeddingStore = new InMemoryEmbeddingStore<>();

// Ingest documents into the store
EmbeddingStoreIngestor.ingest(documents, embeddingStore);

// Create AI service with RAG capability
Assistant assistant = AiServices.builder(Assistant.class)
    .chatModel(chatModel)
    .chatMemory(MessageWindowChatMemory.withMaxMessages(10))
    .contentRetriever(EmbeddingStoreContentRetriever.from(embeddingStore))
    .build();
```

### Document Processing Pipeline

```java
// Split documents into chunks
DocumentSplitter splitter = new RecursiveCharacterTextSplitter(
    500,  // chunk size
    100   // overlap
);

// Create embedding model
EmbeddingModel embeddingModel = OpenAiEmbeddingModel.builder()
    .apiKey("your-api-key")
    .build();

// Create embedding store
EmbeddingStore<TextSegment> embeddingStore = PgVectorEmbeddingStore.builder()
    .host("localhost")
    .database("postgres")
    .user("postgres")
    .password("password")
    .table("embeddings")
    .dimension(1536)
    .build();

// Process and store documents
for (Document document : documents) {
    List<TextSegment> segments = splitter.split(document);
    for (TextSegment segment : segments) {
        Embedding embedding = embeddingModel.embed(segment).content();
        embeddingStore.add(embedding, segment);
    }
}
```

## Implementation Patterns

### Pattern 1: Simple Document Q&A

Create a basic Q&A system over your documents.

```java
public interface DocumentAssistant {
    String answer(String question);
}

DocumentAssistant assistant = AiServices.builder(DocumentAssistant.class)
    .chatModel(chatModel)
    .contentRetriever(retriever)
    .build();
```

### Pattern 2: Metadata-Filtered Retrieval

Filter results based on document metadata.

```java
// Add metadata during document loading
Document document = Document.builder()
    .text("Content here")
    .metadata("source", "technical-manual.pdf")
    .metadata("category", "technical")
    .metadata("date", "2024-01-15")
    .build();

// Filter during retrieval
EmbeddingStoreContentRetriever retriever = EmbeddingStoreContentRetriever.builder()
    .embeddingStore(embeddingStore)
    .embeddingModel(embeddingModel)
    .maxResults(5)
    .minScore(0.7)
    .filter(metadataKey("category").isEqualTo("technical"))
    .build();
```

### Pattern 3: Multi-Source Retrieval

Combine results from multiple knowledge sources.

```java
ContentRetriever webRetriever = EmbeddingStoreContentRetriever.from(webStore);
ContentRetriever documentRetriever = EmbeddingStoreContentRetriever.from(documentStore);
ContentRetriever databaseRetriever = EmbeddingStoreContentRetriever.from(databaseStore);

// Combine results
List<Content> allResults = new ArrayList<>();
allResults.addAll(webRetriever.retrieve(query));
allResults.addAll(documentRetriever.retrieve(query));
allResults.addAll(databaseRetriever.retrieve(query));

// Rerank combined results
List<Content> rerankedResults = reranker.reorder(query, allResults);
```

## Best Practices

### Document Preparation
- Clean and preprocess documents before ingestion
- Remove irrelevant content and formatting artifacts
- Standardize document structure for consistent processing
- Add relevant metadata for filtering and context

### Chunking Strategy
- Use 500-1000 tokens per chunk for optimal balance
- Include 10-20% overlap to preserve context at boundaries (see the sketch below)
- Consider document structure when determining chunk boundaries
- Test different chunk sizes for your specific use case
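
A minimal sketch of fixed-size chunking with overlap, in plain Java and independent of any splitter library; the 500-character window and 20% overlap are illustrative values, not recommendations from a specific framework.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkingSketch {

    /** Split text into windows of chunkSize characters that overlap by `overlap` characters. */
    static List<String> chunk(String text, int chunkSize, int overlap) {
        if (overlap >= chunkSize) {
            throw new IllegalArgumentException("overlap must be smaller than chunkSize");
        }
        List<String> chunks = new ArrayList<>();
        int stride = chunkSize - overlap;
        for (int start = 0; start < text.length(); start += stride) {
            int end = Math.min(start + chunkSize, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) {
                break; // last window reached the end of the document
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        String doc = "A long document would go here... ".repeat(50);
        List<String> chunks = chunk(doc, 500, 100); // 100 / 500 = 20% overlap
        System.out.println("Produced " + chunks.size() + " chunks");
    }
}
```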
### Retrieval Optimization
- Start with high k values (10-20), then filter/rerank
- Use metadata filtering to improve relevance
- Combine multiple retrieval strategies for better coverage
- Monitor retrieval quality and user feedback

### Performance Considerations
- Cache embeddings for frequently accessed content (see the cache sketch below)
- Use batch processing for document ingestion
- Optimize vector store configuration for your scale
- Monitor query performance and system resources
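
A minimal sketch of an embedding cache using only the JDK. The `embed` function is a hypothetical callback into whatever embedding model you use (not a specific library API); the cache keeps the most recently used entries and evicts the oldest once full.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

public class EmbeddingCacheSketch {

    private final int maxEntries;
    private final Function<String, float[]> embed; // hypothetical call into your embedding model
    private final Map<String, float[]> cache;

    public EmbeddingCacheSketch(int maxEntries, Function<String, float[]> embed) {
        this.maxEntries = maxEntries;
        this.embed = embed;
        // Access-ordered LinkedHashMap: evicts the least recently used entry once the limit is exceeded.
        this.cache = new LinkedHashMap<String, float[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, float[]> eldest) {
                return size() > EmbeddingCacheSketch.this.maxEntries;
            }
        };
    }

    /** Return a cached embedding if present; otherwise compute and remember it. */
    public synchronized float[] embedding(String text) {
        return cache.computeIfAbsent(text, embed);
    }
}
```

Under these assumptions, `new EmbeddingCacheSketch(1000, myModel::embed)` would keep the 1,000 most recently used embeddings in memory, where `myModel::embed` is whatever function turns a string into a vector in your setup.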
## Common Issues and Solutions

### Poor Retrieval Quality
**Problem**: Retrieved documents don't match user queries
**Solutions**:
- Improve document preprocessing and cleaning
- Adjust chunk size and overlap parameters
- Try different embedding models
- Use hybrid search combining semantic and keyword matching

### Irrelevant Results
**Problem**: Retrieved documents contain relevant information but are not specific enough
**Solutions**:
- Add metadata filtering for domain-specific constraints
- Implement reranking with cross-encoder models
- Use contextual compression to extract relevant parts
- Fine-tune retrieval parameters (k values, similarity thresholds)

### Performance Issues
**Problem**: Slow response times during retrieval
**Solutions**:
- Optimize vector store configuration and indexing
- Implement caching for frequently retrieved content
- Use smaller embedding models for faster inference
- Consider approximate nearest neighbor algorithms

### Hallucination Prevention
**Problem**: AI generates information not present in retrieved documents
**Solutions**:
- Improve prompt engineering to emphasize grounding (see the prompt sketch below)
- Add verification steps to check answer alignment
- Include confidence scoring for responses
- Implement fact-checking mechanisms
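
One way to emphasize grounding is to make the retrieved context and the refusal rule explicit in the prompt. A minimal sketch in plain Java; the template wording is an illustrative assumption, not a prescribed LangChain4j prompt.

```java
public class GroundedPromptSketch {

    /** Build a prompt that instructs the model to answer only from the supplied context. */
    static String groundedPrompt(String context, String question) {
        return """
                Answer the question using ONLY the context below.
                If the context does not contain the answer, reply "I don't know".

                Context:
                %s

                Question: %s
                """.formatted(context, question);
    }

    public static void main(String[] args) {
        String context = "The warranty period for model X-200 is 24 months.";
        System.out.println(groundedPrompt(context, "How long is the X-200 warranty?"));
    }
}
```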
## Evaluation Framework

### Retrieval Metrics
- **Precision@k**: Percentage of relevant documents in top-k results
- **Recall@k**: Percentage of all relevant documents found in top-k results
- **Mean Reciprocal Rank (MRR)**: Average of the reciprocal rank of the first relevant result across queries (see the sketch below)
- **Normalized Discounted Cumulative Gain (nDCG)**: Ranking quality metric
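
A minimal sketch of Precision@k and the reciprocal rank that MRR averages, in plain Java; it assumes you already have relevance judgments (which document IDs count as relevant for each query), which is an assumption of this example rather than something the skill provides.

```java
import java.util.List;
import java.util.Set;

public class RetrievalMetricsSketch {

    /** Fraction of the top-k retrieved IDs that are relevant. */
    static double precisionAtK(List<String> retrieved, Set<String> relevant, int k) {
        long hits = retrieved.stream().limit(k).filter(relevant::contains).count();
        return (double) hits / k;
    }

    /** Reciprocal rank of the first relevant ID (0 if none is found). */
    static double reciprocalRank(List<String> retrieved, Set<String> relevant) {
        for (int i = 0; i < retrieved.size(); i++) {
            if (relevant.contains(retrieved.get(i))) {
                return 1.0 / (i + 1);
            }
        }
        return 0.0;
    }

    public static void main(String[] args) {
        // Hypothetical judgments for a single query.
        List<String> retrieved = List.of("doc-3", "doc-7", "doc-1", "doc-9", "doc-2");
        Set<String> relevant = Set.of("doc-1", "doc-2");

        System.out.println("Precision@5 = " + precisionAtK(retrieved, relevant, 5)); // 0.4
        System.out.println("RR          = " + reciprocalRank(retrieved, relevant));  // 1/3; MRR averages this over queries
    }
}
```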
### Answer Quality Metrics
- **Faithfulness**: Degree to which answers are grounded in retrieved documents
- **Answer Relevance**: How well answers address user questions
- **Context Recall**: Percentage of relevant context used in answers
- **Context Precision**: Percentage of retrieved context that is relevant

### User Experience Metrics
- **Response Time**: Time from query to answer
- **User Satisfaction**: Feedback ratings on answer quality
- **Task Completion**: Rate of successful task completion
- **Engagement**: User interaction patterns with the system

## Resources

### Reference Documentation
- [Vector Database Comparison](references/vector-databases.md) - Detailed comparison of vector database options
- [Embedding Models Guide](references/embedding-models.md) - Model selection and optimization
- [Retrieval Strategies](references/retrieval-strategies.md) - Advanced retrieval techniques
- [Document Chunking](references/document-chunking.md) - Chunking strategies and best practices
- [LangChain4j RAG Guide](references/langchain4j-rag-guide.md) - Official implementation patterns

### Assets
- `assets/vector-store-config.yaml` - Configuration templates for different vector stores
- `assets/retriever-pipeline.java` - Complete RAG pipeline implementation
- `assets/evaluation-metrics.java` - Evaluation framework code

## Constraints and Limitations

1. **Token Limits**: Respect model context window limitations
2. **API Rate Limits**: Manage external API rate limits and costs
3. **Data Privacy**: Ensure compliance with data protection regulations
4. **Resource Requirements**: Consider memory and computational requirements
5. **Maintenance**: Plan for regular updates and system monitoring

## Security Considerations

- Secure access to vector databases and embedding services
- Implement proper authentication and authorization
- Validate and sanitize user inputs
- Monitor for abuse and unusual usage patterns
- Conduct regular security audits and penetration testing

skills/rag/assets/retriever-pipeline.java (new file, 307 lines)

package com.example.rag;

import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.DocumentSplitter;
import dev.langchain4j.data.document.parser.TextDocumentParser;
import dev.langchain4j.data.document.splitter.RecursiveCharacterTextSplitter;
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.model.openai.OpenAiEmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingStore;
import dev.langchain4j.store.embedding.EmbeddingStoreIngestor;
import dev.langchain4j.store.embedding.inmemory.InMemoryEmbeddingStore;
import dev.langchain4j.store.embedding.pinecone.PineconeEmbeddingStore;
import dev.langchain4j.store.embedding.chroma.ChromaEmbeddingStore;
import dev.langchain4j.store.embedding.qdrant.QdrantEmbeddingStore;
import dev.langchain4j.data.document.loader.FileSystemDocumentLoader;
import dev.langchain4j.store.embedding.filter.Filter;
import dev.langchain4j.store.embedding.filter.MetadataFilterBuilder;

import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.HashMap;

/**
 * Complete RAG Pipeline Implementation
 *
 * This class provides a comprehensive implementation of a RAG (Retrieval-Augmented Generation)
 * system with support for multiple vector stores and advanced retrieval strategies.
 */
public class RAGPipeline {

    private final EmbeddingModel embeddingModel;
    private final EmbeddingStore<TextSegment> embeddingStore;
    private final DocumentSplitter documentSplitter;
    private final RAGConfig config;

    /**
     * Configuration class for RAG pipeline
     */
    public static class RAGConfig {
        private String vectorStoreType = "chroma";
        private String openAiApiKey;
        private String pineconeApiKey;
        private String pineconeEnvironment;
        private String pineconeIndex = "rag-documents";
        private String chromaCollection = "rag-documents";
        private String chromaPersistPath = "./chroma_db";
        private String qdrantHost = "localhost";
        private int qdrantPort = 6333;
        private String qdrantCollection = "rag-documents";
        private int chunkSize = 1000;
        private int chunkOverlap = 200;
        private int embeddingDimension = 1536;

        // Getters and setters
        public String getVectorStoreType() { return vectorStoreType; }
        public void setVectorStoreType(String vectorStoreType) { this.vectorStoreType = vectorStoreType; }
        public String getOpenAiApiKey() { return openAiApiKey; }
        public void setOpenAiApiKey(String openAiApiKey) { this.openAiApiKey = openAiApiKey; }
        public String getPineconeApiKey() { return pineconeApiKey; }
        public void setPineconeApiKey(String pineconeApiKey) { this.pineconeApiKey = pineconeApiKey; }
        public String getPineconeEnvironment() { return pineconeEnvironment; }
        public void setPineconeEnvironment(String pineconeEnvironment) { this.pineconeEnvironment = pineconeEnvironment; }
        public String getPineconeIndex() { return pineconeIndex; }
        public void setPineconeIndex(String pineconeIndex) { this.pineconeIndex = pineconeIndex; }
        public String getChromaCollection() { return chromaCollection; }
        public void setChromaCollection(String chromaCollection) { this.chromaCollection = chromaCollection; }
        public String getChromaPersistPath() { return chromaPersistPath; }
        public void setChromaPersistPath(String chromaPersistPath) { this.chromaPersistPath = chromaPersistPath; }
        public String getQdrantHost() { return qdrantHost; }
        public void setQdrantHost(String qdrantHost) { this.qdrantHost = qdrantHost; }
        public int getQdrantPort() { return qdrantPort; }
        public void setQdrantPort(int qdrantPort) { this.qdrantPort = qdrantPort; }
        public String getQdrantCollection() { return qdrantCollection; }
        public void setQdrantCollection(String qdrantCollection) { this.qdrantCollection = qdrantCollection; }
        public int getChunkSize() { return chunkSize; }
        public void setChunkSize(int chunkSize) { this.chunkSize = chunkSize; }
        public int getChunkOverlap() { return chunkOverlap; }
        public void setChunkOverlap(int chunkOverlap) { this.chunkOverlap = chunkOverlap; }
        public int getEmbeddingDimension() { return embeddingDimension; }
        public void setEmbeddingDimension(int embeddingDimension) { this.embeddingDimension = embeddingDimension; }
    }

    /**
     * Constructor
     */
    public RAGPipeline(RAGConfig config) {
        this.config = config;
        this.embeddingModel = createEmbeddingModel();
        this.embeddingStore = createEmbeddingStore();
        this.documentSplitter = createDocumentSplitter();
    }

    /**
     * Create embedding model based on configuration
     */
    private EmbeddingModel createEmbeddingModel() {
        return OpenAiEmbeddingModel.builder()
                .apiKey(config.getOpenAiApiKey())
                .modelName("text-embedding-ada-002")
                .build();
    }

    /**
     * Create embedding store based on configuration
     */
    private EmbeddingStore<TextSegment> createEmbeddingStore() {
        switch (config.getVectorStoreType().toLowerCase()) {
            case "pinecone":
                return PineconeEmbeddingStore.builder()
                        .apiKey(config.getPineconeApiKey())
                        .environment(config.getPineconeEnvironment())
                        .index(config.getPineconeIndex())
                        .dimension(config.getEmbeddingDimension())
                        .build();

            case "chroma":
                return ChromaEmbeddingStore.builder()
                        .collectionName(config.getChromaCollection())
                        .persistDirectory(config.getChromaPersistPath())
                        .build();

            case "qdrant":
                return QdrantEmbeddingStore.builder()
                        .host(config.getQdrantHost())
                        .port(config.getQdrantPort())
                        .collectionName(config.getQdrantCollection())
                        .dimension(config.getEmbeddingDimension())
                        .build();

            case "memory":
            default:
                return new InMemoryEmbeddingStore<>();
        }
    }

    /**
     * Create document splitter
     */
    private DocumentSplitter createDocumentSplitter() {
        return new RecursiveCharacterTextSplitter(
                config.getChunkSize(),
                config.getChunkOverlap()
        );
    }
    /**
     * Load documents from directory
     */
    public List<Document> loadDocuments(String directoryPath) {
        try {
            Path directory = Paths.get(directoryPath);
            List<Document> documents = FileSystemDocumentLoader.loadDocuments(directory);

            // Add metadata to documents and write the enriched copy back into the list
            // (reassigning the loop variable alone would have no effect)
            for (int i = 0; i < documents.size(); i++) {
                Document document = documents.get(i);
                Map<String, Object> metadata = new HashMap<>(document.metadata().toMap());
                metadata.put("loaded_at", System.currentTimeMillis());
                metadata.put("source_directory", directoryPath);

                documents.set(i, Document.from(document.text(), metadata));
            }

            return documents;
        } catch (Exception e) {
            throw new RuntimeException("Failed to load documents from " + directoryPath, e);
        }
    }
    /**
     * Process and ingest documents
     */
    public void ingestDocuments(List<Document> documents) {
        // Split documents into segments
        List<TextSegment> segments = documentSplitter.split(documents);

        // Add additional metadata to segments
        for (int i = 0; i < segments.size(); i++) {
            TextSegment segment = segments.get(i);
            Map<String, Object> metadata = new HashMap<>(segment.metadata().toMap());
            metadata.put("segment_index", i);
            metadata.put("total_segments", segments.size());
            metadata.put("processed_at", System.currentTimeMillis());

            segments.set(i, TextSegment.from(segment.text(), metadata));
        }

        // Ingest into embedding store
        EmbeddingStoreIngestor.ingest(segments, embeddingStore);

        System.out.println("Ingested " + documents.size() + " documents into " +
                segments.size() + " segments");
    }

    /**
     * Search documents with optional filtering
     */
    public List<TextSegment> search(String query, int maxResults, Filter filter) {
        Embedding queryEmbedding = embeddingModel.embed(query).content();

        return embeddingStore.findRelevant(queryEmbedding, maxResults, filter);
    }
    /**
     * Search documents with metadata filtering
     */
    public List<TextSegment> searchWithMetadataFilter(String query, int maxResults,
                                                      Map<String, Object> metadataFilters) {
        Filter filter = null;

        if (metadataFilters != null && !metadataFilters.isEmpty()) {
            for (Map.Entry<String, Object> entry : metadataFilters.entrySet()) {
                String key = entry.getKey();
                Object value = entry.getValue();

                // Build an equality filter per entry; add more type handling as needed
                Filter next;
                if (value instanceof Number) {
                    next = MetadataFilterBuilder.metadataKey(key).isEqualTo(((Number) value).doubleValue());
                } else {
                    next = MetadataFilterBuilder.metadataKey(key).isEqualTo(String.valueOf(value));
                }

                // AND the per-key filters together
                filter = (filter == null) ? next : Filter.and(filter, next);
            }
        }

        return search(query, maxResults, filter);
    }
    /**
     * Get statistics about the stored documents
     */
    public RAGStatistics getStatistics() {
        // This is a simplified implementation
        // In practice, you might want to track more detailed statistics
        return new RAGStatistics(
                embeddingStore.getClass().getSimpleName(),
                config.getVectorStoreType()
        );
    }

    /**
     * Statistics holder class
     */
    public static class RAGStatistics {
        private final String storeType;
        private final String implementation;

        public RAGStatistics(String storeType, String implementation) {
            this.storeType = storeType;
            this.implementation = implementation;
        }

        public String getStoreType() { return storeType; }
        public String getImplementation() { return implementation; }

        @Override
        public String toString() {
            return "RAGStatistics{" +
                    "storeType='" + storeType + '\'' +
                    ", implementation='" + implementation + '\'' +
                    '}';
        }
    }

    /**
     * Example usage
     */
    public static void main(String[] args) {
        // Configure the pipeline
        RAGConfig config = new RAGConfig();
        config.setVectorStoreType("chroma"); // or "pinecone", "qdrant", "memory"
        config.setOpenAiApiKey("your-openai-api-key");
        config.setChunkSize(1000);
        config.setChunkOverlap(200);

        // Create pipeline
        RAGPipeline pipeline = new RAGPipeline(config);

        // Load documents
        List<Document> documents = pipeline.loadDocuments("./documents");

        // Ingest documents
        pipeline.ingestDocuments(documents);

        // Search for relevant content
        List<TextSegment> results = pipeline.search("What is machine learning?", 5, null);

        // Print results
        for (int i = 0; i < results.size(); i++) {
            TextSegment segment = results.get(i);
            System.out.println("Result " + (i + 1) + ":");
            System.out.println("Content: " + segment.text().substring(0, Math.min(200, segment.text().length())) + "...");
            System.out.println("Metadata: " + segment.metadata());
            System.out.println();
        }

        // Print statistics
        System.out.println("Pipeline Statistics: " + pipeline.getStatistics());
    }
}

skills/rag/assets/vector-store-config.yaml (new file, 127 lines)

# Vector Store Configuration Templates
# This file contains configuration templates for different vector databases

# Chroma (Local/Development)
chroma:
  type: chroma
  settings:
    persist_directory: "./chroma_db"
    collection_name: "rag_documents"
    host: "localhost"
    port: 8000

# Recommended for: Development, small-scale applications
# Pros: Easy setup, local deployment, free
# Cons: Limited scalability, single-node only

# Pinecone (Cloud/Production)
pinecone:
  type: pinecone
  settings:
    api_key: "${PINECONE_API_KEY}"
    environment: "us-west1-gcp"
    index_name: "rag-documents"
    dimension: 1536
    metric: "cosine"
    pods: 1
    pod_type: "p1.x1"

# Recommended for: Production applications, large-scale
# Pros: Managed service, scalable, fast
# Cons: Cost, requires internet connection

# Weaviate (Open-source/Cloud)
weaviate:
  type: weaviate
  settings:
    url: "http://localhost:8080"
    api_key: "${WEAVIATE_API_KEY}"
    class_name: "Document"
    text_key: "content"
    vectorizer: "text2vec-openai"
    module_config:
      text2vec-openai:
        model: "ada"
        modelVersion: "002"
        type: "text"
        baseUrl: "https://api.openai.com/v1"

# Recommended for: Hybrid search, GraphQL API
# Pros: Open-source, hybrid search, flexible
# Cons: More complex setup

# Qdrant (Performance-focused)
qdrant:
  type: qdrant
  settings:
    host: "localhost"
    port: 6333
    collection_name: "rag_documents"
    vector_size: 1536
    distance: "Cosine"
    api_key: "${QDRANT_API_KEY}"

# Recommended for: Performance, advanced filtering
# Pros: Fast, good filtering, open-source
# Cons: Newer project, smaller community

# Milvus (Enterprise/Scale)
milvus:
  type: milvus
  settings:
    host: "localhost"
    port: 19530
    collection_name: "rag_documents"
    dimension: 1536
    index_type: "IVF_FLAT"
    metric_type: "COSINE"
    nlist: 1024

# Recommended for: Enterprise, large-scale deployments
# Pros: High performance, distributed
# Cons: Complex setup, resource intensive

# FAISS (Local/Research)
faiss:
  type: faiss
  settings:
    index_type: "IndexFlatL2"
    dimension: 1536
    save_path: "./faiss_index"

# Recommended for: Research, local processing
# Pros: Fast, local, no dependencies
# Cons: No persistence, limited features

# Common Configuration Parameters
common:
  chunking:
    chunk_size: 1000
    chunk_overlap: 200
    separators: ["\n\n", "\n", " ", ""]

  embedding:
    model: "text-embedding-ada-002"
    batch_size: 100
    max_retries: 3
    timeout: 30

  retrieval:
    default_k: 5
    similarity_threshold: 0.7
    max_results: 20

  performance:
    cache_embeddings: true
    cache_size: 1000
    parallel_processing: true
    batch_size: 50

# Environment Variables Template
# Copy this to a .env file and fill in your values
environment:
  OPENAI_API_KEY: "your-openai-api-key-here"
  PINECONE_API_KEY: "your-pinecone-api-key-here"
  PINECONE_ENVIRONMENT: "us-west1-gcp"
  WEAVIATE_API_KEY: "your-weaviate-api-key-here"
  QDRANT_API_KEY: "your-qdrant-api-key-here"

skills/rag/references/document-chunking.md (new file, 137 lines)

# Document Chunking Strategies

## Overview
Document chunking is the process of breaking large documents into smaller, manageable pieces that can be effectively embedded and retrieved.

## Chunking Strategies

### 1. Recursive Character Text Splitter
**Method**: Split text based on character count, trying separators in order
**Use Case**: General purpose text splitting
**Advantages**: Preserves sentence and paragraph boundaries when possible

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    length_function=len,
    separators=["\n\n", "\n", " ", ""]  # Try these in order
)
chunks = splitter.split_documents(documents)
```

### 2. Token-Based Splitting
**Method**: Split based on token count rather than characters
**Use Case**: When working with token limits of language models
**Advantages**: Better control over context window usage

```python
from langchain.text_splitter import TokenTextSplitter

splitter = TokenTextSplitter(
    chunk_size=512,
    chunk_overlap=50
)
chunks = splitter.split_documents(documents)
```

### 3. Semantic Chunking
**Method**: Split based on semantic similarity
**Use Case**: When maintaining semantic coherence is important
**Advantages**: Chunks are more semantically meaningful

```python
# SemanticChunker ships in the langchain_experimental package
from langchain_experimental.text_splitter import SemanticChunker

splitter = SemanticChunker(
    embeddings=OpenAIEmbeddings(),
    breakpoint_threshold_type="percentile"
)
chunks = splitter.split_documents(documents)
```

### 4. Markdown Header Splitter
**Method**: Split based on markdown headers
**Use Case**: Structured documents with clear hierarchical organization
**Advantages**: Maintains document structure and context

```python
from langchain.text_splitter import MarkdownHeaderTextSplitter

headers_to_split_on = [
    ("#", "Header 1"),
    ("##", "Header 2"),
    ("###", "Header 3"),
]

splitter = MarkdownHeaderTextSplitter(headers_to_split_on=headers_to_split_on)
chunks = splitter.split_documents(documents)
```

### 5. HTML Splitter
**Method**: Split based on HTML tags
**Use Case**: Web pages and HTML documents
**Advantages**: Preserves HTML structure and metadata

```python
from langchain.text_splitter import HTMLHeaderTextSplitter

headers_to_split_on = [
    ("h1", "Header 1"),
    ("h2", "Header 2"),
    ("h3", "Header 3"),
]

splitter = HTMLHeaderTextSplitter(headers_to_split_on=headers_to_split_on)
chunks = splitter.split_documents(documents)
```

## Parameter Tuning

### Chunk Size
- **Small chunks (200-400 tokens)**: More precise retrieval, but may lose context
- **Medium chunks (500-1000 tokens)**: Good balance of precision and context
- **Large chunks (1000-2000 tokens)**: More context, but less precise retrieval

### Chunk Overlap
- **Purpose**: Preserve context at chunk boundaries
- **Typical range**: 10-20% of chunk size (see the worked example below)
- **Higher overlap**: Better context preservation, but more redundancy
- **Lower overlap**: Less redundancy, but may lose important context
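
A worked example of how size and overlap interact, written as a small Java-style calculation (the 10,000-token document length is just an illustrative assumption):

```java
int chunkSize = 1000, overlap = 200, documentTokens = 10_000;
int stride = chunkSize - overlap;                                            // 800 tokens of new text per chunk
int chunks = (int) Math.ceil((documentTokens - overlap) / (double) stride);  // ~13 chunks, each sharing 200 tokens with its neighbor
```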
### Separators
- **Hierarchical separators**: Start with larger boundaries (paragraphs), then smaller (sentences)
- **Custom separators**: Add domain-specific separators for better results
- **Language-specific**: Adjust for different languages and writing styles

## Best Practices

1. **Preserve Context**: Ensure chunks contain enough surrounding context
2. **Maintain Coherence**: Keep semantically related content together
3. **Respect Boundaries**: Avoid breaking sentences or important phrases
4. **Consider Query Types**: Adapt chunking strategy to typical user queries
5. **Test and Iterate**: Evaluate different chunking strategies for your specific use case

## Evaluation Metrics

1. **Retrieval Quality**: How well chunks answer user queries
2. **Context Preservation**: Whether important context is maintained
3. **Chunk Distribution**: Evenness of chunk sizes
4. **Boundary Quality**: How natural chunk boundaries are
5. **Retrieval Efficiency**: Impact on retrieval speed and accuracy

## Advanced Techniques

### Adaptive Chunking
Adjust chunk size based on document structure and content density.

### Hierarchical Chunking
Create multiple levels of chunks for different retrieval scenarios.

### Query-Aware Chunking
Optimize chunk boundaries based on typical query patterns.

### Domain-Specific Splitting
Use specialized splitters for specific document types (legal, medical, technical).

skills/rag/references/embedding-models.md (new file, 88 lines)

# Embedding Models Guide

## Overview
Embedding models convert text into numerical vectors that capture semantic meaning for similarity search in RAG systems.
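
Similarity between two embeddings is most often measured with cosine similarity. A minimal, library-independent sketch (in Java, matching the skill's main examples; the two short vectors are made up for illustration, real embeddings have hundreds or thousands of dimensions):

```java
public class CosineSimilaritySketch {

    /** Cosine similarity = dot(a, b) / (|a| * |b|); values near 1 indicate semantically similar texts. */
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] query = {0.1, 0.3, 0.5, 0.1};
        double[] document = {0.1, 0.25, 0.55, 0.05};
        System.out.println("similarity = " + cosine(query, document));
    }
}
```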
## Popular Embedding Models

### 1. text-embedding-ada-002 (OpenAI)
- **Dimensions**: 1536
- **Type**: General purpose
- **Use Case**: Most applications requiring high quality embeddings
- **Performance**: Excellent balance of quality and speed

### 2. all-MiniLM-L6-v2 (Sentence Transformers)
- **Dimensions**: 384
- **Type**: Lightweight
- **Use Case**: Applications requiring fast inference
- **Performance**: Good quality, very fast

### 3. e5-large-v2
- **Dimensions**: 1024
- **Type**: High quality
- **Use Case**: Applications needing superior performance
- **Performance**: Excellent quality, multilingual support

### 4. Instructor
- **Dimensions**: Variable (768)
- **Type**: Task-specific
- **Use Case**: Domain-specific applications
- **Performance**: Can be fine-tuned for specific tasks

### 5. bge-large-en-v1.5
- **Dimensions**: 1024
- **Type**: State-of-the-art
- **Use Case**: Applications requiring best possible quality
- **Performance**: SOTA performance on benchmarks

## Selection Criteria

1. **Quality vs Speed**: Balance between embedding quality and inference speed
2. **Dimension Size**: Impact on storage and retrieval performance
3. **Domain**: Specific language or domain requirements
4. **Cost**: API costs vs local deployment
5. **Batch Size**: Throughput requirements
6. **Language**: Multilingual support needs

## Usage Examples

### OpenAI Embeddings
```python
from langchain.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
vector = embeddings.embed_query("Your text here")
```

### Sentence Transformers
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')
vector = model.encode("Your text here")
```

### Hugging Face Models
```python
from langchain.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
```

## Optimization Tips

1. **Batch Processing**: Process multiple texts together for efficiency
2. **Model Quantization**: Reduce model size for faster inference
3. **Caching**: Cache embeddings for frequently used texts
4. **GPU Acceleration**: Use GPU for faster processing when available
5. **Model Selection**: Choose appropriate model size for your use case

## Evaluation Metrics

1. **Semantic Similarity**: How well embeddings capture meaning
2. **Retrieval Performance**: Quality of retrieved documents
3. **Speed**: Inference time per document
4. **Memory Usage**: RAM requirements for the model
5. **Cost**: API costs or infrastructure requirements

skills/rag/references/langchain4j-rag-guide.md (new file, 94 lines)

# LangChain4j RAG Implementation Guide

## Overview
RAG (Retrieval-Augmented Generation) extends LLM knowledge by finding and injecting relevant information from your data into prompts before sending them to the LLM.

## What is RAG?
RAG helps LLMs answer questions using domain-specific knowledge by retrieving relevant information to reduce hallucinations.

## RAG Flavors in LangChain4j

### 1. Easy RAG
Simplest way to start with minimal setup. Handles document loading, splitting, and embedding automatically.

### 2. Core RAG APIs
Modular components including:
- Document
- TextSegment
- EmbeddingModel
- EmbeddingStore
- DocumentSplitter

### 3. Advanced RAG
Complex pipelines supporting:
- Query transformation
- Multi-source retrieval
- Re-ranking with components like QueryTransformer and ContentRetriever

## RAG Stages

### 1. Indexing
Pre-process documents for efficient search.

### 2. Retrieval
Find relevant content based on user queries.

## Core Components

### Documents with metadata
Structured representation of your content with associated metadata for filtering and context.

### Text segments (chunks)
Smaller, manageable pieces of documents that are embedded and stored in vector databases.

### Embedding models
Convert text segments into numerical vectors for similarity search.

### Embedding stores (vector databases)
Store and efficiently retrieve embedded text segments.

### Content retrievers
Find relevant content based on user queries.

### Query transformers
Transform and optimize user queries for better retrieval.

### Content aggregators
Combine and rank retrieved content.

## Advanced Features

- Query transformation and routing
- Multiple retrievers for different data sources
- Re-ranking models for improved relevance
- Metadata filtering for targeted retrieval
- Parallel processing for performance

## Implementation Example (Easy RAG)

```java
// Load documents
List<Document> documents = FileSystemDocumentLoader.loadDocuments("/path/to/docs");

// Create embedding store
InMemoryEmbeddingStore<TextSegment> embeddingStore = new InMemoryEmbeddingStore<>();

// Ingest documents
EmbeddingStoreIngestor.ingest(documents, embeddingStore);

// Create AI service
Assistant assistant = AiServices.builder(Assistant.class)
    .chatModel(chatModel)
    .chatMemory(MessageWindowChatMemory.withMaxMessages(10))
    .contentRetriever(EmbeddingStoreContentRetriever.from(embeddingStore))
    .build();
```

## Best Practices

1. **Document Preparation**: Clean and structure documents before ingestion
2. **Chunk Size**: Balance between context preservation and retrieval precision
3. **Metadata Strategy**: Include relevant metadata for filtering and context
4. **Embedding Model Selection**: Choose models appropriate for your domain
5. **Retrieval Strategy**: Select appropriate k values and filtering criteria
6. **Evaluation**: Continuously evaluate retrieval quality and answer accuracy

skills/rag/references/retrieval-strategies.md (new file, 161 lines)

# Advanced Retrieval Strategies

## Overview
Different retrieval approaches for finding relevant documents in RAG systems, each with specific strengths and use cases.

## Retrieval Approaches

### 1. Dense Retrieval
**Method**: Semantic similarity via embeddings
**Use Case**: Understanding meaning and context
**Example**: Finding documents about "machine learning" when the query is "AI algorithms"

```python
from langchain.vectorstores import Chroma

vectorstore = Chroma.from_documents(chunks, embeddings)
results = vectorstore.similarity_search("query", k=5)
```

### 2. Sparse Retrieval
**Method**: Keyword matching (BM25, TF-IDF)
**Use Case**: Exact term matching and keyword-specific queries
**Example**: Finding documents containing specific technical terms

```python
from langchain.retrievers import BM25Retriever

bm25_retriever = BM25Retriever.from_documents(chunks)
bm25_retriever.k = 5
results = bm25_retriever.get_relevant_documents("query")
```

### 3. Hybrid Search
**Method**: Combine dense + sparse retrieval
**Use Case**: Balance between semantic understanding and keyword matching

```python
from langchain.retrievers import BM25Retriever, EnsembleRetriever

# Sparse retriever (BM25)
bm25_retriever = BM25Retriever.from_documents(chunks)
bm25_retriever.k = 5

# Dense retriever (embeddings)
embedding_retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

# Combine with weights
ensemble_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, embedding_retriever],
    weights=[0.3, 0.7]
)
```

### 4. Multi-Query Retrieval
**Method**: Generate multiple query variations
**Use Case**: Complex queries that can be interpreted in multiple ways

```python
from langchain.retrievers.multi_query import MultiQueryRetriever

# Generate multiple query perspectives
retriever = MultiQueryRetriever.from_llm(
    retriever=vectorstore.as_retriever(),
    llm=OpenAI()
)

# Single query → multiple variations → combined results
results = retriever.get_relevant_documents("What is the main topic?")
```

### 5. HyDE (Hypothetical Document Embeddings)
**Method**: Generate hypothetical documents for better retrieval
**Use Case**: When queries are very different from document style

```python
# Generate hypothetical document based on query
hypothetical_doc = llm.generate(f"Write a document about: {query}")
# Use hypothetical doc for retrieval
results = vectorstore.similarity_search(hypothetical_doc, k=5)
```

## Advanced Retrieval Patterns

### Contextual Compression
Compress retrieved documents to only include relevant parts.

```python
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

compressor = LLMChainExtractor.from_llm(llm)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vectorstore.as_retriever()
)
```

### Parent Document Retriever
Store small chunks for retrieval, return larger chunks for context.

```python
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore

store = InMemoryStore()
child_splitter = RecursiveCharacterTextSplitter(chunk_size=400)
parent_splitter = RecursiveCharacterTextSplitter(chunk_size=2000)

retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=store,
    child_splitter=child_splitter,
    parent_splitter=parent_splitter
)
```

## Retrieval Optimization Techniques

### 1. Metadata Filtering
Filter results based on document metadata.

```python
results = vectorstore.similarity_search(
    "query",
    filter={"category": "technical", "date": {"$gte": "2023-01-01"}},
    k=5
)
```

### 2. Maximal Marginal Relevance (MMR)
Balance relevance with diversity.

```python
results = vectorstore.max_marginal_relevance_search(
    "query",
    k=5,
    fetch_k=20,
    lambda_mult=0.5  # 0 = max diversity, 1 = max relevance
)
```

### 3. Reranking
Improve top results with a cross-encoder.

```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')
query = "query"
candidates = vectorstore.similarity_search(query, k=20)
pairs = [[query, doc.page_content] for doc in candidates]
scores = reranker.predict(pairs)
reranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)[:5]
```

## Selection Guidelines

1. **Query Type**: Choose strategy based on typical query patterns
2. **Document Type**: Consider document structure and content
3. **Performance Requirements**: Balance quality vs speed
4. **Domain Knowledge**: Leverage domain-specific patterns
5. **User Expectations**: Match retrieval behavior to user expectations

skills/rag/references/vector-databases.md (new file, 86 lines)

# Vector Database Comparison and Configuration

## Overview
Vector databases store and efficiently retrieve document embeddings for semantic search in RAG systems.

## Popular Vector Database Options

### 1. Pinecone
- **Type**: Managed cloud service
- **Features**: Scalable, fast queries, managed infrastructure
- **Use Case**: Production applications requiring high availability

### 2. Weaviate
- **Type**: Open-source, hybrid search
- **Features**: Combines vector and keyword search, GraphQL API
- **Use Case**: Applications needing both semantic and traditional search

### 3. Milvus
- **Type**: High performance, on-premise
- **Features**: Distributed architecture, GPU acceleration
- **Use Case**: Large-scale deployments with custom infrastructure

### 4. Chroma
- **Type**: Lightweight, easy to use
- **Features**: Local deployment, simple API
- **Use Case**: Development and small-scale applications

### 5. Qdrant
- **Type**: Fast, filtered search
- **Features**: Advanced filtering, payload support
- **Use Case**: Applications requiring complex metadata filtering

### 6. FAISS
- **Type**: Meta's library, local deployment
- **Features**: High performance, CPU/GPU optimized
- **Use Case**: Research and applications needing full control

## Configuration Examples

### Pinecone Setup
```python
import pinecone
from langchain.vectorstores import Pinecone

pinecone.init(api_key="your-api-key", environment="us-west1-gcp")
index = pinecone.Index("your-index-name")
vectorstore = Pinecone(index, embeddings.embed_query, "text")
```

### Weaviate Setup
```python
import weaviate
from langchain.vectorstores import Weaviate

client = weaviate.Client("http://localhost:8080")
vectorstore = Weaviate(client, "Document", "content", embeddings)
```

### Chroma Local Setup
```python
from langchain.vectorstores import Chroma

vectorstore = Chroma(
    collection_name="my_collection",
    embedding_function=embeddings,
    persist_directory="./chroma_db"
)
```

## Selection Criteria

1. **Scale**: Number of documents and expected query volume
2. **Performance**: Latency requirements and throughput needs
3. **Deployment**: Cloud vs on-premise preferences
4. **Features**: Filtering, hybrid search, metadata support
5. **Cost**: Budget constraints and operational overhead
6. **Maintenance**: Team expertise and available resources

## Best Practices

1. **Indexing Strategy**: Choose appropriate distance metrics (cosine, euclidean)
2. **Sharding**: Distribute data for large-scale deployments
3. **Monitoring**: Track query performance and system health
4. **Backups**: Implement regular backup procedures
5. **Security**: Secure access to sensitive data
6. **Optimization**: Tune parameters for your specific use case