# Embedding Models Guide

## Overview
Embedding models convert text into dense numerical vectors that capture semantic meaning, enabling similarity search in retrieval-augmented generation (RAG) systems.
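To make "similarity search" concrete, here is a minimal sketch of cosine similarity, the measure most vector stores use to compare embeddings; the four-dimensional vectors below are toy values, not real model output:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """1.0 = same direction (similar meaning), 0.0 = orthogonal (unrelated)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors for illustration; real models produce 384-1536 dimensions.
query_vec = np.array([0.1, 0.3, -0.2, 0.9])
doc_vec = np.array([0.2, 0.25, -0.1, 0.8])

print(cosine_similarity(query_vec, doc_vec))  # close to 1.0 -> likely relevant
```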
## Popular Embedding Models
1. text-embedding-ada-002 (OpenAI)
   - Dimensions: 1536
   - Type: General purpose
   - Use Case: Most applications requiring high-quality hosted embeddings (note that OpenAI has since released the text-embedding-3 series as its recommended successor)
   - Performance: Excellent balance of quality and speed
2. all-MiniLM-L6-v2 (Sentence Transformers)
   - Dimensions: 384
   - Type: Lightweight
   - Use Case: Applications requiring fast inference
   - Performance: Good quality, very fast
3. e5-large-v2
   - Dimensions: 1024
   - Type: High quality
   - Use Case: Applications needing superior retrieval performance
   - Performance: Excellent quality for English; a separate multilingual-e5 family covers other languages. Note that e5 models expect "query: "/"passage: " input prefixes (see the sketch after this list).
4. Instructor
   - Dimensions: 768 (constant across the instructor model sizes)
   - Type: Task-specific
   - Use Case: Domain-specific applications
   - Performance: Adapts to new tasks via a natural-language instruction prepended to each input, without fine-tuning
5. bge-large-en-v1.5
   - Dimensions: 1024
   - Type: State-of-the-art (at release)
   - Use Case: Applications requiring the best possible open-model quality
   - Performance: Strong results on the MTEB benchmark at release
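One practical wrinkle from the list above is worth showing in code: the e5 family is trained with role prefixes on its inputs. A minimal sketch, assuming sentence-transformers is installed; the model id intfloat/e5-large-v2 and the "query: "/"passage: " convention come from the e5 model card:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("intfloat/e5-large-v2")

# e5 models expect these prefixes; omitting them degrades retrieval quality.
query_emb = model.encode("query: how do embedding models work?",
                         normalize_embeddings=True)
passage_emb = model.encode("passage: Embedding models map text to dense vectors.",
                           normalize_embeddings=True)

# With normalized vectors, the dot product equals cosine similarity.
print(query_emb @ passage_emb)
```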
## Selection Criteria
- Quality vs Speed: Balance between embedding quality and inference speed
- Dimension Size: Impact on storage footprint and retrieval latency (see the back-of-envelope sketch after this list)
- Domain: Fit with domain-specific vocabulary and text style
- Cost: API costs vs local deployment
- Batch Size: Throughput requirements
- Language: Multilingual support needs
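To illustrate the dimension/storage trade-off, a back-of-envelope sketch; the 1-million-chunk corpus is a made-up figure, and real indexes add overhead on top of the raw vectors:

```python
def index_size_gb(num_docs: int, dims: int, bytes_per_value: int = 4) -> float:
    """Raw float32 vector storage, ignoring index overhead (HNSW links, metadata)."""
    return num_docs * dims * bytes_per_value / 1024**3

# Hypothetical corpus of 1 million chunks:
print(index_size_gb(1_000_000, 1536))  # ~5.7 GB (text-embedding-ada-002)
print(index_size_gb(1_000_000, 384))   # ~1.4 GB (all-MiniLM-L6-v2)
```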
## Usage Examples
### OpenAI Embeddings

```python
from langchain.embeddings import OpenAIEmbeddings

# Requires the OPENAI_API_KEY environment variable.
# (Newer LangChain versions move this class to the langchain_openai package.)
embeddings = OpenAIEmbeddings()
vector = embeddings.embed_query("Your text here")  # list of 1536 floats
```
### Sentence Transformers

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')  # downloaded on first use
vector = model.encode("Your text here")  # numpy array of shape (384,)
```
### Hugging Face Models

```python
from langchain.embeddings import HuggingFaceEmbeddings

# Runs locally via sentence-transformers; no API key required.
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
vector = embeddings.embed_query("Your text here")
```
## Optimization Tips
- Batch Processing: Process multiple texts per call instead of one at a time (see the sketch after this list)
- Model Quantization: Reduce model precision (e.g., to int8) for faster inference
- Caching: Cache embeddings for frequently used texts
- GPU Acceleration: Use a GPU for faster processing when available
- Model Selection: Choose a model size appropriate to your use case
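A minimal sketch combining three of these tips, batching, caching, and optional GPU use, with sentence-transformers; the in-memory dict cache is a deliberately simplistic illustration, not production code:

```python
from sentence_transformers import SentenceTransformer
import torch

model = SentenceTransformer(
    "all-MiniLM-L6-v2",
    device="cuda" if torch.cuda.is_available() else "cpu",  # GPU when available
)

_cache: dict[str, list[float]] = {}  # naive cache keyed by raw text

def embed_all(texts: list[str]) -> list[list[float]]:
    missing = [t for t in texts if t not in _cache]
    if missing:
        # One batched call instead of a Python loop over single texts.
        vectors = model.encode(missing, batch_size=64, show_progress_bar=False)
        for text, vec in zip(missing, vectors):
            _cache[text] = vec.tolist()
    return [_cache[t] for t in texts]
```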
## Evaluation Metrics
- Semantic Similarity: How well embeddings capture meaning
- Retrieval Performance: Quality of retrieved documents, e.g., hit rate or recall@k (see the sketch after this list)
- Speed: Inference time per document
- Memory Usage: RAM requirements for the model
- Cost: API costs or infrastructure requirements
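Retrieval performance is often summarized as the fraction of queries whose known-relevant document lands in the top-k results. Below is a minimal sketch assuming query and document embeddings are already computed as NumPy arrays; the function name and setup are placeholders, not a standard API:

```python
import numpy as np

def hit_rate_at_k(query_vecs: np.ndarray, doc_vecs: np.ndarray,
                  relevant_idx: list[int], k: int = 5) -> float:
    """Fraction of queries whose relevant doc appears in the top-k by cosine similarity."""
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = q @ d.T                                # cosine similarity matrix
    top_k = np.argsort(-scores, axis=1)[:, :k]      # best k doc indices per query
    return float(np.mean([rel in row for rel, row in zip(relevant_idx, top_k)]))
```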