--- name: langchain4j-rag-implementation-patterns description: Implement Retrieval-Augmented Generation (RAG) systems with LangChain4j. Build document ingestion pipelines, embedding stores, vector search strategies, and knowledge-enhanced AI applications. Use when creating question-answering systems over document collections or AI assistants with external knowledge bases. allowed-tools: Read, Write, Bash category: ai-development tags: [langchain4j, rag, retrieval-augmented-generation, embedding, vector-search, document-ingestion, java] version: 1.1.0 --- # LangChain4j RAG Implementation Patterns ## When to Use This Skill Use this skill when: - Building knowledge-based AI applications requiring external document access - Implementing question-answering systems over large document collections - Creating AI assistants with access to company knowledge bases - Building semantic search capabilities for document repositories - Implementing chat systems that reference specific information sources - Creating AI applications requiring source attribution - Building domain-specific AI systems with curated knowledge - Implementing hybrid search combining vector similarity with traditional search - Creating AI applications requiring real-time document updates - Building multi-modal RAG systems with text, images, and other content types ## Overview Implement complete Retrieval-Augmented Generation (RAG) systems with LangChain4j. RAG enhances language models by providing relevant context from external knowledge sources, improving accuracy and reducing hallucinations. ## Instructions ### Initialize RAG Project Create a new Spring Boot project with required dependencies: **pom.xml**: ```xml dev.langchain4j langchain4j-spring-boot-starter 1.8.0 dev.langchain4j langchain4j-open-ai 1.8.0 ``` ### Setup Document Ingestion Configure document loading and processing: ```java @Configuration public class RAGConfiguration { @Bean public EmbeddingModel embeddingModel() { return OpenAiEmbeddingModel.builder() .apiKey(System.getenv("OPENAI_API_KEY")) .modelName("text-embedding-3-small") .build(); } @Bean public EmbeddingStore embeddingStore() { return new InMemoryEmbeddingStore<>(); } } ``` Create document ingestion service: ```java @Service @RequiredArgsConstructor public class DocumentIngestionService { private final EmbeddingModel embeddingModel; private final EmbeddingStore embeddingStore; public void ingestDocument(String filePath, Map metadata) { Document document = FileSystemDocumentLoader.loadDocument(filePath); document.metadata().putAll(metadata); DocumentSplitter splitter = DocumentSplitters.recursive( 500, 50, new OpenAiTokenCountEstimator("text-embedding-3-small") ); List segments = splitter.split(document); List embeddings = embeddingModel.embedAll(segments).content(); embeddingStore.addAll(embeddings, segments); } } ``` ### Configure Content Retrieval Setup content retrieval with filtering: ```java @Configuration public class ContentRetrieverConfiguration { @Bean public ContentRetriever contentRetriever( EmbeddingStore embeddingStore, EmbeddingModel embeddingModel) { return EmbeddingStoreContentRetriever.builder() .embeddingStore(embeddingStore) .embeddingModel(embeddingModel) .maxResults(5) .minScore(0.7) .build(); } } ``` ### Create RAG-Enabled AI Service Define AI service with context retrieval: ```java interface KnowledgeAssistant { @SystemMessage(""" You are a knowledgeable assistant with access to a comprehensive knowledge base. When answering questions: 1. Use the provided context from the knowledge base 2. If information is not in the context, clearly state this 3. Provide accurate, helpful responses 4. When possible, reference specific sources 5. If the context is insufficient, ask for clarification """) String answerQuestion(String question); } @Service @RequiredArgsConstructor public class KnowledgeService { private final KnowledgeAssistant assistant; public KnowledgeService(ChatModel chatModel, ContentRetriever contentRetriever) { this.assistant = AiServices.builder(KnowledgeAssistant.class) .chatModel(chatModel) .contentRetriever(contentRetriever) .build(); } public String answerQuestion(String question) { return assistant.answerQuestion(question); } } ``` ## Examples ### Basic Document Processing ```java public class BasicRAGExample { public static void main(String[] args) { var embeddingStore = new InMemoryEmbeddingStore(); var embeddingModel = OpenAiEmbeddingModel.builder() .apiKey(System.getenv("OPENAI_API_KEY")) .modelName("text-embedding-3-small") .build(); var ingestor = EmbeddingStoreIngestor.builder() .embeddingModel(embeddingModel) .embeddingStore(embeddingStore) .build(); ingestor.ingest(Document.from("Spring Boot is a framework for building Java applications with minimal configuration.")); var retriever = EmbeddingStoreContentRetriever.builder() .embeddingStore(embeddingStore) .embeddingModel(embeddingModel) .build(); } } ``` ### Multi-Domain Assistant ```java interface MultiDomainAssistant { @SystemMessage(""" You are an expert assistant with access to multiple knowledge domains: - Technical documentation - Company policies - Product information - Customer support guides Tailor your response based on the type of question and available context. Always indicate which domain the information comes from. """) String answerQuestion(@MemoryId String userId, String question); } ``` ### Hierarchical RAG ```java @Service @RequiredArgsConstructor public class HierarchicalRAGService { private final EmbeddingStore chunkStore; private final EmbeddingStore summaryStore; private final EmbeddingModel embeddingModel; public String performHierarchicalRetrieval(String query) { List> summaryMatches = searchSummaries(query); List relevantChunks = new ArrayList<>(); for (EmbeddingMatch summaryMatch : summaryMatches) { String documentId = summaryMatch.embedded().metadata().getString("documentId"); List> chunkMatches = searchChunksInDocument(query, documentId); chunkMatches.stream() .map(EmbeddingMatch::embedded) .forEach(relevantChunks::add); } return generateResponseWithChunks(query, relevantChunks); } } ``` ## Best Practices ### Document Segmentation - Use recursive splitting with 500-1000 token chunks for most applications - Maintain 20-50 token overlap between chunks for context preservation - Consider document structure (headings, paragraphs) when splitting - Use token-aware splitters for optimal embedding generation ### Metadata Strategy - Include rich metadata for filtering and attribution: - User and tenant identifiers for multi-tenancy - Document type and category classification - Creation and modification timestamps - Version and author information - Confidentiality and access level tags ### Query Processing - Implement query preprocessing and cleaning - Consider query expansion for better recall - Apply dynamic filtering based on user context - Use re-ranking for improved result quality ### Performance Optimization - Cache embeddings for repeated queries - Use batch embedding generation for bulk operations - Implement pagination for large result sets - Consider asynchronous processing for long operations ## Common Patterns ### Simple RAG Pipeline ```java @RequiredArgsConstructor @Service public class SimpleRAGPipeline { private final EmbeddingModel embeddingModel; private final EmbeddingStore embeddingStore; private final ChatModel chatModel; public String answerQuestion(String question) { Embedding queryEmbedding = embeddingModel.embed(question).content(); EmbeddingSearchRequest request = EmbeddingSearchRequest.builder() .queryEmbedding(queryEmbedding) .maxResults(3) .build(); List segments = embeddingStore.search(request).matches().stream() .map(EmbeddingMatch::embedded) .collect(Collectors.toList()); String context = segments.stream() .map(TextSegment::text) .collect(Collectors.joining("\n\n")); return chatModel.generate(context + "\n\nQuestion: " + question + "\nAnswer:"); } } ``` ### Hybrid Search (Vector + Keyword) ```java @Service @RequiredArgsConstructor public class HybridSearchService { private final EmbeddingStore vectorStore; private final FullTextSearchEngine keywordEngine; private final EmbeddingModel embeddingModel; public List hybridSearch(String query, int maxResults) { // Vector search List vectorResults = performVectorSearch(query, maxResults); // Keyword search List keywordResults = performKeywordSearch(query, maxResults); // Combine and re-rank using RRF algorithm return combineResults(vectorResults, keywordResults, maxResults); } } ``` ## Troubleshooting ### Common Issues **Poor Retrieval Results** - Check document chunk size and overlap settings - Verify embedding model compatibility - Ensure metadata filters are not too restrictive - Consider adding re-ranking step **Slow Performance** - Use cached embeddings for frequent queries - Optimize database indexing for vector stores - Implement pagination for large datasets - Consider async processing for bulk operations **High Memory Usage** - Use disk-based embedding stores for large datasets - Implement proper pagination and filtering - Clean up unused embeddings periodically - Monitor and optimize chunk sizes ## References - [API Reference](references/references.md) - Complete API documentation and interfaces - [Examples](references/examples.md) - Production-ready examples and patterns - [Official LangChain4j Documentation](https://docs.langchain4j.dev/)