Initial commit

2025-11-29 18:28:34 +08:00
commit 390afca02b
220 changed files with 86013 additions and 0 deletions
--- a/skills/rag/references/vector-databases.md
+++ b/skills/rag/references/vector-databases.md
@@ -0,0 +1,86 @@
+# Vector Database Comparison and Configuration
+
+## Overview
+Vector databases store and efficiently retrieve document embeddings for semantic search in RAG systems.
+
+## Popular Vector Database Options
+
+### 1. Pinecone
+- **Type**: Managed cloud service
+- **Features**: Scalable, fast queries, managed infrastructure
+- **Use Case**: Production applications requiring high availability
+
+### 2. Weaviate
+- **Type**: Open-source, hybrid search
+- **Features**: Combines vector and keyword search, GraphQL API
+- **Use Case**: Applications needing both semantic and traditional search
+
+### 3. Milvus
+- **Type**: High performance, on-premise
+- **Features**: Distributed architecture, GPU acceleration
+- **Use Case**: Large-scale deployments with custom infrastructure
+
+### 4. Chroma
+- **Type**: Lightweight, easy to use
+- **Features**: Local deployment, simple API
+- **Use Case**: Development and small-scale applications
+
+### 5. Qdrant
+- **Type**: Fast, filtered search
+- **Features**: Advanced filtering, payload support
+- **Use Case**: Applications requiring complex metadata filtering
+
+### 6. FAISS
+- **Type**: Meta's library, local deployment
+- **Features**: High performance, CPU/GPU optimized
+- **Use Case**: Research and applications needing full control
+
+## Configuration Examples
+
+### Pinecone Setup
+```python
+import pinecone
+from langchain.vectorstores import Pinecone
+
+pinecone.init(api_key="your-api-key", environment="us-west1-gcp")
+index = pinecone.Index("your-index-name")
+vectorstore = Pinecone(index, embeddings.embed_query, "text")
+```
+
+### Weaviate Setup
+```python
+import weaviate
+from langchain.vectorstores import Weaviate
+
+client = weaviate.Client("http://localhost:8080")
+vectorstore = Weaviate(client, "Document", "content", embeddings)
+```
+
+### Chroma Local Setup
+```python
+from langchain.vectorstores import Chroma
+
+vectorstore = Chroma(
+    collection_name="my_collection",
+    embedding_function=embeddings,
+    persist_directory="./chroma_db"
+)
+```
+
+## Selection Criteria
+
+1. **Scale**: Number of documents and expected query volume
+2. **Performance**: Latency requirements and throughput needs
+3. **Deployment**: Cloud vs on-premise preferences
+4. **Features**: Filtering, hybrid search, metadata support
+5. **Cost**: Budget constraints and operational overhead
+6. **Maintenance**: Team expertise and available resources
+
+## Best Practices
+
+1. **Indexing Strategy**: Choose appropriate distance metrics (cosine, euclidean)
+2. **Sharding**: Distribute data for large-scale deployments
+3. **Monitoring**: Track query performance and system health
+4. **Backups**: Implement regular backup procedures
+5. **Security**: Secure access to sensitive data
+6. **Optimization**: Tune parameters for your specific use case