---
name: rag-implementation
description: Build Retrieval-Augmented Generation (RAG) systems for AI applications with vector databases and semantic search. Use when implementing knowledge-grounded AI, building document Q&A systems, or integrating LLMs with external knowledge bases.
allowed-tools: Read, Write, Bash
category: ai-engineering
tags: [rag, vector-databases, embeddings, retrieval, semantic-search]
version: 1.0.0
---

# RAG Implementation

Build Retrieval-Augmented Generation systems that extend AI capabilities with external knowledge sources.

## Overview

RAG (Retrieval-Augmented Generation) enhances AI applications by retrieving relevant information from knowledge bases and incorporating it into AI responses, reducing hallucinations and providing accurate, grounded answers.

## When to Use

Use this skill when:

- Building Q&A systems over proprietary documents
- Creating chatbots with current, factual information
- Implementing semantic search with natural language queries
- Reducing hallucinations with grounded responses
- Enabling AI systems to access domain-specific knowledge
- Building documentation assistants
- Creating research tools with source citation
- Developing knowledge management systems

## Core Components

### Vector Databases

Store and efficiently retrieve document embeddings for semantic search.

**Key Options:**

- **Pinecone**: Managed, scalable, production-ready
- **Weaviate**: Open-source, hybrid search capabilities
- **Milvus**: High performance, on-premise deployment
- **Chroma**: Lightweight, easy local development
- **Qdrant**: Fast, advanced filtering
- **FAISS**: Meta's library, full control

### Embedding Models

Convert text to numerical vectors for similarity search.

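To make "similarity search" concrete, the following is a minimal sketch of cosine similarity, a common way to compare two embedding vectors. It is plain Java with no library dependency; the class name `CosineSimilarity` and the three-dimensional toy vectors are assumptions for illustration only (real embeddings have hundreds or thousands of dimensions), and a vector database would normally perform this comparison for you.

```java
public class CosineSimilarity {

    /** Returns the cosine of the angle between two equal-length vectors. */
    public static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("Cosine similarity is undefined for a zero vector");
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Toy vectors standing in for real embeddings (illustrative values only)
        double[] query = {0.9, 0.1, 0.0};
        double[] closeDoc = {0.8, 0.2, 0.1};
        double[] farDoc = {0.0, 0.1, 0.9};

        System.out.println("query vs closeDoc: " + cosine(query, closeDoc));
        System.out.println("query vs farDoc:   " + cosine(query, farDoc));
    }
}
```

In this sketch `closeDoc` scores higher than `farDoc` because it points in nearly the same direction as `query`; a retriever ranks stored chunks by a score of this kind.
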
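
**Popular Models:**

- **text-embedding-ada-002** (OpenAI): General purpose, 1536 dimensions
- **all-MiniLM-L6-v2**: Fast, lightweight, 384 dimensions
- **e5-large-v2**: High quality, English (use multilingual-e5-large for multilingual text)
- **bge-large-en-v1.5**: Strong English retrieval performance, 1024 dimensions

### Retrieval Strategies

Find relevant content based on user queries.

**Approaches:**

- **Dense Retrieval**: Semantic similarity via embeddings
- **Sparse Retrieval**: Keyword matching (BM25, TF-IDF)
- **Hybrid Search**: Combine dense + sparse for best results
- **Multi-Query**: Generate multiple query variations
- **Contextual Compression**: Extract only relevant parts

## Quick Implementation

### Basic RAG Setup

```java
import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.loader.FileSystemDocumentLoader;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.memory.chat.MessageWindowChatMemory;
import dev.langchain4j.rag.content.retriever.EmbeddingStoreContentRetriever;
import dev.langchain4j.service.AiServices;
import dev.langchain4j.store.embedding.EmbeddingStoreIngestor;
import dev.langchain4j.store.embedding.inmemory.InMemoryEmbeddingStore;

// Load documents from file system
List<Document> documents = FileSystemDocumentLoader.loadDocuments("/path/to/docs");

// Create embedding store
InMemoryEmbeddingStore<TextSegment> embeddingStore = new InMemoryEmbeddingStore<>();

// Ingest documents into the store
// (needs an embedding model on the classpath, e.g. via the langchain4j-easy-rag module)
EmbeddingStoreIngestor.ingest(documents, embeddingStore);

// Create AI service with RAG capability
// (Assistant is your own interface; chatModel is a chat model configured elsewhere)
Assistant assistant = AiServices.builder(Assistant.class)
    .chatModel(chatModel)
    .chatMemory(MessageWindowChatMemory.withMaxMessages(10))
    .contentRetriever(EmbeddingStoreContentRetriever.from(embeddingStore))
    .build();
```

### Document Processing Pipeline

```java
import dev.langchain4j.data.document.DocumentSplitter;
import dev.langchain4j.data.document.splitter.DocumentSplitters;
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.model.openai.OpenAiEmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingStore;
import dev.langchain4j.store.embedding.pgvector.PgVectorEmbeddingStore;

// Split documents into chunks
DocumentSplitter splitter = DocumentSplitters.recursive(
    500, // max chunk size, in characters
    100  // max overlap, in characters
);

// Create embedding model (read the API key from the environment, not source code)
EmbeddingModel embeddingModel = OpenAiEmbeddingModel.builder()
    .apiKey(System.getenv("OPENAI_API_KEY"))
    .modelName("text-embedding-ada-002")
    .build();

// Create embedding store
EmbeddingStore<TextSegment> embeddingStore = PgVectorEmbeddingStore.builder()
    .host("localhost")
    .port(5432)
    .database("postgres")
    .user("postgres")
    .password("password")
    .table("embeddings")
    .dimension(1536) // must match the embedding model's output size
    .build();

// Process and store documents
for (Document document : documents) {
    List<TextSegment> segments = splitter.split(document);
    for (TextSegment segment : segments) {
        Embedding embedding = embeddingModel.embed(segment).content();
        embeddingStore.add(embedding, segment);
    }
}
```

## Implementation Patterns

### Pattern 1: Simple Document Q&A

Create a basic Q&A system over your documents.

```java
public interface DocumentAssistant {
    String answer(String question);
}

// chatModel and retriever are configured elsewhere (see Quick Implementation)
DocumentAssistant assistant = AiServices.builder(DocumentAssistant.class)
    .chatModel(chatModel)
    .contentRetriever(retriever)
    .build();
```

### Pattern 2: Metadata-Filtered Retrieval

Filter results based on document metadata.

```java
import dev.langchain4j.data.document.Metadata;
import static dev.langchain4j.store.embedding.filter.MetadataFilterBuilder.metadataKey;

// Add metadata during document loading
Metadata metadata = new Metadata()
    .put("source", "technical-manual.pdf")
    .put("category", "technical")
    .put("date", "2024-01-15");
Document document = Document.from("Content here", metadata);

// Filter during retrieval
EmbeddingStoreContentRetriever retriever = EmbeddingStoreContentRetriever.builder()
    .embeddingStore(embeddingStore)
    .embeddingModel(embeddingModel)
    .maxResults(5)
    .minScore(0.7)
    .filter(metadataKey("category").isEqualTo("technical"))
    .build();
```

### Pattern 3: Multi-Source Retrieval

Combine results from multiple knowledge sources.
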

```java
// webStore, documentStore, and databaseStore are EmbeddingStore<TextSegment> instances created elsewhere
ContentRetriever webRetriever = EmbeddingStoreContentRetriever.from(webStore);
ContentRetriever documentRetriever = EmbeddingStoreContentRetriever.from(documentStore);
ContentRetriever databaseRetriever = EmbeddingStoreContentRetriever.from(databaseStore);

// Combine results
// (query is a dev.langchain4j.rag.query.Query, for example Query.from(userQuestion))
List<Content> allResults = new ArrayList<>();
allResults.addAll(webRetriever.retrieve(query));
allResults.addAll(documentRetriever.retrieve(query));
allResults.addAll(databaseRetriever.retrieve(query));

// Rerank combined results. `reranker` is an application-defined component
// (for example, a wrapper around a ScoringModel), not a LangChain4j class;
// the built-in alternative is ReRankingContentAggregator.
List<Content> rerankedResults = reranker.reorder(query, allResults);
```

## Best Practices

### Document Preparation

- Clean and preprocess documents before ingestion
- Remove irrelevant content and formatting artifacts
- Standardize document structure for consistent processing
- Add relevant metadata for filtering and context

### Chunking Strategy

- Start with 500-1000 tokens per chunk to balance retrieval precision against preserved context
- Include 10-20% overlap to preserve context at boundaries
- Consider document structure when determining chunk boundaries
- Test different chunk sizes for your specific use case

### Retrieval Optimization

- Start with a high k (10-20), then filter or rerank down to the final set
- Use metadata filtering to improve relevance
- Combine multiple retrieval strategies for better coverage
- Monitor retrieval quality and user feedback

### Performance Considerations

- Cache embeddings for frequently accessed content
- Use batch processing for document ingestion
- Optimize vector store configuration for your scale
- Monitor query performance and system resources

## Common Issues and Solutions

### Poor Retrieval Quality

**Problem**: Retrieved documents don't match user queries

**Solutions**:

- Improve document preprocessing and cleaning
- Adjust chunk size and overlap parameters
- Try different embedding models
- Use hybrid search combining semantic and keyword matching

### Irrelevant Results

**Problem**: Retrieved documents are on topic but not specific enough to answer the query

**Solutions**:

- Add metadata filtering for domain-specific constraints
- Implement reranking with cross-encoder models
- Use contextual compression to extract relevant parts
- Fine-tune retrieval parameters (k values, similarity thresholds)

### Performance Issues

**Problem**: Slow response times during retrieval

**Solutions**:

- Optimize vector store configuration and indexing
- Implement caching for frequently retrieved content
- Use smaller embedding models for faster inference
- Consider approximate nearest neighbor algorithms

### Hallucination Prevention

**Problem**: AI generates information not present in retrieved documents

**Solutions**:

- Improve prompt engineering to emphasize grounding
- Add verification steps to check answer alignment
- Include confidence scoring for responses
- Implement fact-checking mechanisms

## Evaluation Framework

### Retrieval Metrics

- **Precision@k**: Percentage of relevant documents in top-k results
- **Recall@k**: Percentage of all relevant documents found in top-k results
- **Mean Reciprocal Rank (MRR)**: Average, across queries, of the reciprocal rank (1/rank) of the first relevant result
- **Normalized Discounted Cumulative Gain (nDCG)**: Ranking quality metric that rewards relevant results near the top, normalized against the ideal ordering

### Answer Quality Metrics

- **Faithfulness**: Degree to which answers are grounded in retrieved documents
- **Answer Relevance**: How well answers address user questions
- **Context Recall**: Percentage of relevant context used in answers
- **Context Precision**: Percentage of retrieved context that is relevant

### User Experience Metrics

- **Response Time**: Time from query to answer
- **User Satisfaction**: Feedback ratings on answer quality
- **Task Completion**: Rate of successful task completion
- **Engagement**: User interaction patterns with the system

## Resources

### Reference Documentation

- [Vector Database Comparison](references/vector-databases.md) - Detailed comparison of vector database options
- [Embedding Models Guide](references/embedding-models.md) - Model selection and optimization
- [Retrieval Strategies](references/retrieval-strategies.md) - Advanced retrieval techniques
- [Document Chunking](references/document-chunking.md) - Chunking strategies and best practices
- [LangChain4j RAG Guide](references/langchain4j-rag-guide.md) - Official implementation patterns

### Assets

- `assets/vector-store-config.yaml` - Configuration templates for different vector stores
- `assets/retriever-pipeline.java` - Complete RAG pipeline implementation
- `assets/evaluation-metrics.java` - Evaluation framework code

## Constraints and Limitations

1. **Token Limits**: Respect model context window limitations
2. **API Rate Limits**: Manage external API rate limits and costs
3. **Data Privacy**: Ensure compliance with data protection regulations
4. **Resource Requirements**: Consider memory and computational requirements
5. **Maintenance**: Plan for regular updates and system monitoring

## Security Considerations

- Secure access to vector databases and embedding services
- Implement proper authentication and authorization
- Validate and sanitize user inputs
- Monitor for abuse and unusual usage patterns
- Run regular security audits and penetration tests
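
As a worked illustration of the retrieval metrics listed under Evaluation Framework, the sketch below computes Precision@k, Recall@k, and Mean Reciprocal Rank in plain Java. The class name `RetrievalMetrics`, the method names, and the document IDs are hypothetical and are not taken from `assets/evaluation-metrics.java`. It assumes each ranked list contains unique document IDs and that relevance judgments are available as a set per query.

```java
import java.util.List;
import java.util.Set;

public class RetrievalMetrics {

    /** Fraction of the top-k results that are relevant (denominator is k). */
    public static double precisionAtK(List<String> ranked, Set<String> relevant, int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        long hits = ranked.stream().limit(k).filter(relevant::contains).count();
        return (double) hits / k;
    }

    /** Fraction of all relevant documents that appear in the top-k results. */
    public static double recallAtK(List<String> ranked, Set<String> relevant, int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        if (relevant.isEmpty()) {
            return 0.0; // nothing to recall
        }
        long hits = ranked.stream().limit(k).filter(relevant::contains).count();
        return (double) hits / relevant.size();
    }

    /** 1/rank of the first relevant result, or 0 if none is found. */
    public static double reciprocalRank(List<String> ranked, Set<String> relevant) {
        for (int i = 0; i < ranked.size(); i++) {
            if (relevant.contains(ranked.get(i))) {
                return 1.0 / (i + 1);
            }
        }
        return 0.0;
    }

    /** Mean of the reciprocal ranks over a set of queries. */
    public static double meanReciprocalRank(List<List<String>> rankedPerQuery,
                                            List<Set<String>> relevantPerQuery) {
        if (rankedPerQuery.size() != relevantPerQuery.size()) {
            throw new IllegalArgumentException("Need one relevance set per query");
        }
        if (rankedPerQuery.isEmpty()) {
            return 0.0;
        }
        double sum = 0.0;
        for (int i = 0; i < rankedPerQuery.size(); i++) {
            sum += reciprocalRank(rankedPerQuery.get(i), relevantPerQuery.get(i));
        }
        return sum / rankedPerQuery.size();
    }

    public static void main(String[] args) {
        // Hypothetical retriever output and relevance judgments for one query
        List<String> ranked = List.of("d3", "d1", "d7", "d2");
        Set<String> relevant = Set.of("d1", "d2", "d9");

        System.out.println("Precision@4: " + precisionAtK(ranked, relevant, 4));
        System.out.println("Recall@4:    " + recallAtK(ranked, relevant, 4));
        System.out.println("RR:          " + reciprocalRank(ranked, relevant));
    }
}
```

For the sample query, two of the four returned documents are relevant, two of the three relevant documents are found, and the first relevant document sits at rank 2. Averaging these per-query values over a labeled query set gives a baseline to compare against when changing chunk size, embedding model, or k.
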