Initial commit
@@ -0,0 +1,60 @@
---
slug: /pyseekdb-sdk-get-started
---

# Get started

## pyseekdb

pyseekdb is a Python client provided by OceanBase Database. You can use it to connect to seekdb in embedded mode or server mode, or to OceanBase Database in server mode.

:::tip
OceanBase Database is a fully self-developed, enterprise-level, native distributed database provided by OceanBase. It achieves financial-grade high availability on ordinary hardware and sets a new standard for automatic, lossless disaster recovery across cities with the "five IDCs across three regions" architecture. It also sets a new benchmark in the TPC-C standard test, with a single cluster scale exceeding 1,500 nodes. It is cloud-native, highly consistent, and highly compatible with Oracle and MySQL.
:::

pyseekdb is supported on Linux, macOS, and Windows. The supported connection modes vary by operating system, as shown in the following table.

| Operating system | Embedded seekdb | Server mode seekdb | Server mode OceanBase Database |
|----|---|---|---|
| Linux | Supported | Supported | Supported |
| macOS | Not supported | Supported | Supported |
| Windows | Not supported | Supported | Supported |

On Linux, installing this client also installs seekdb in embedded mode, so you can connect to it directly and perform operations such as creating a database. Alternatively, you can connect to a deployed seekdb or OceanBase Database instance in client/server mode.

## Install pyseekdb

### Prerequisites

Make sure that your environment meets the following requirements:

* Operating system: Linux (glibc >= 2.28), macOS, or Windows
* Python version: Python 3.11 or later
* System architecture: x86_64 or aarch64
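
The requirements above can be checked from Python before installation. The following is a minimal sketch using only the standard library; the helper names `parse_version` and `check_prerequisites` are hypothetical and not part of pyseekdb, and treating `amd64` and `arm64` as aliases of `x86_64` and `aarch64` is an assumption about how different operating systems report the architecture.

```python
import platform
import sys

MIN_PYTHON = (3, 11)
MIN_GLIBC = (2, 28)
# Assumption: some platforms report x86_64 as "AMD64" and aarch64 as "arm64".
SUPPORTED_ARCHS = {"x86_64", "amd64", "aarch64", "arm64"}
SUPPORTED_SYSTEMS = {"Linux", "Darwin", "Windows"}  # "Darwin" is macOS


def parse_version(text):
    """Turn a dotted version string such as '2.31' into a tuple of ints."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def check_prerequisites(python_version, system, machine, libc_name, libc_version):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if tuple(python_version[:2]) < MIN_PYTHON:
        problems.append("Python 3.11 or later is required")
    if system not in SUPPORTED_SYSTEMS:
        problems.append(f"unsupported operating system: {system}")
    if machine.lower() not in SUPPORTED_ARCHS:
        problems.append(f"unsupported architecture: {machine}")
    # The glibc check only applies on Linux when glibc is detected.
    if system == "Linux" and libc_name == "glibc":
        if parse_version(libc_version) < MIN_GLIBC:
            problems.append("glibc 2.28 or later is required")
    return problems


if __name__ == "__main__":
    libc_name, libc_version = platform.libc_ver()
    issues = check_prerequisites(
        sys.version_info, platform.system(), platform.machine(), libc_name, libc_version
    )
    print("OK" if not issues else "\n".join(issues))
```

The check is split into a pure function so that it can be exercised with arbitrary values, independent of the machine it runs on.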

### Procedure

Use pip to install pyseekdb. pip automatically detects the default Python version and platform.

```shell
pip install pyseekdb
```

If your pip version is outdated, upgrade it before installation.

```shell
pip install --upgrade pip
```

## What to do next

* After installing pyseekdb, you can connect to seekdb and perform operations. For information about the APIs supported by pyseekdb, see [API Reference](../50.apis/10.api-overview.md).

* To try pyseekdb quickly, see the provided SDK samples:

  * [Simple sample](50.sdk-samples/10.pyseekdb-simple-sample.md)

  * [Complete sample](50.sdk-samples/50.pyseekdb-complete-sample.md)

  * [Hybrid search sample](50.sdk-samples/100.pyseekdb-hybrid-search-sample.md)

@@ -0,0 +1,130 @@
---
slug: /pyseekdb-simple-sample
---

# Simple Example

This example demonstrates the basic use of embedding functions with seekdb in embedded mode. It performs the following steps:

1. Connect to seekdb.
2. Create a collection with an embedding function.
3. Add data using documents (vectors are generated automatically).
4. Query using text (the query vector is generated automatically).
5. Print the query results.

## Prerequisites

This example uses seekdb in embedded mode. Before running it, make sure that you have deployed seekdb in embedded mode.

For information about how to deploy seekdb in embedded mode, see [Embedded Mode](../../../../400.guides/400.deploy/600.python-seekdb.md).

## Example

```python
import pyseekdb

# ==================== Step 1: Create Client Connection ====================
# You can use embedded mode, server mode, or OceanBase mode

# Embedded mode (local seekdb)
client = pyseekdb.Client()
# Alternative: Server mode (connecting to remote seekdb server)
# client = pyseekdb.Client(
#     host="127.0.0.1",
#     port=2881,
#     database="test",
#     user="root",
#     password=""
# )

# Alternative: Remote server mode (OceanBase Server)
# client = pyseekdb.Client(
#     host="127.0.0.1",
#     port=2881,
#     tenant="test",  # OceanBase default tenant
#     database="test",
#     user="root",
#     password=""
# )

# ==================== Step 2: Create a Collection with Embedding Function ====================
# A collection is like a table that stores documents with vector embeddings
collection_name = "my_simple_collection"

# Create collection with default embedding function
# The embedding function will automatically convert documents to embeddings
collection = client.create_collection(
    name=collection_name,
)

print(f"Created collection '{collection_name}' with dimension: {collection.dimension}")
print(f"Embedding function: {collection.embedding_function}")

# ==================== Step 3: Add Data to Collection ====================
# With embedding function, you can add documents directly without providing embeddings
# The embedding function will automatically generate embeddings from documents

documents = [
    "Machine learning is a subset of artificial intelligence",
    "Python is a popular programming language",
    "Vector databases enable semantic search",
    "Neural networks are inspired by the human brain",
    "Natural language processing helps computers understand text"
]

ids = ["id1", "id2", "id3", "id4", "id5"]

# Add data with documents only - embeddings will be auto-generated by embedding function
collection.add(
    ids=ids,
    documents=documents,  # embeddings will be automatically generated
    metadatas=[
        {"category": "AI", "index": 0},
        {"category": "Programming", "index": 1},
        {"category": "Database", "index": 2},
        {"category": "AI", "index": 3},
        {"category": "NLP", "index": 4}
    ]
)

print(f"\nAdded {len(documents)} documents to collection")
print("Note: Embeddings were automatically generated from documents using the embedding function")

# ==================== Step 4: Query the Collection ====================
# With embedding function, you can query using text directly
# The embedding function will automatically convert query text to query vector

# Query using text - query vector will be auto-generated by embedding function
query_text = "artificial intelligence and machine learning"

results = collection.query(
    query_texts=query_text,  # Query text - will be embedded automatically
    n_results=3  # Return top 3 most similar documents
)

print(f"\nQuery: '{query_text}'")
print(f"Query results: {len(results['ids'][0])} items found")

# ==================== Step 5: Print Query Results ====================
for i in range(len(results['ids'][0])):
    print(f"\nResult {i+1}:")
    print(f" ID: {results['ids'][0][i]}")
    print(f" Distance: {results['distances'][0][i]:.4f}")
    if results.get('documents'):
        print(f" Document: {results['documents'][0][i]}")
    if results.get('metadatas'):
        print(f" Metadata: {results['metadatas'][0][i]}")

# ==================== Step 6: Cleanup ====================
# Delete the collection
client.delete_collection(collection_name)
print(f"\nDeleted collection '{collection_name}'")
```
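
Step 5 above indexes the result as `results['ids'][0][i]`, which suggests one inner list per query text. The sketch below reshapes such a nested dictionary into one dictionary per hit. The `flatten_results` helper is hypothetical and not part of pyseekdb, and `sample_results` is a hand-written placeholder whose distances are illustrative values, not real query output.

```python
def flatten_results(results, query_index=0):
    """Convert the nested result lists for one query into a list of row dicts."""
    rows = []
    for position, item_id in enumerate(results["ids"][query_index]):
        row = {"id": item_id}
        # Optional columns may be missing or None, as Step 5 guards with .get().
        for key in ("distances", "documents", "metadatas"):
            column = results.get(key)
            if column:
                # Drop the plural "s": distances -> distance, and so on.
                row[key[:-1]] = column[query_index][position]
        rows.append(row)
    return rows


# Hand-written placeholder shaped like the result used in Step 5.
sample_results = {
    "ids": [["id1", "id4"]],
    "distances": [[0.25, 0.5]],  # illustrative values only
    "documents": [[
        "Machine learning is a subset of artificial intelligence",
        "Neural networks are inspired by the human brain",
    ]],
    "metadatas": [[{"category": "AI", "index": 0}, {"category": "AI", "index": 3}]],
}

for row in flatten_results(sample_results):
    print(row["id"], row["distance"], row["document"])
```

Working with one dictionary per hit avoids repeating the `[0][i]` indexing at every access.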

## References

* For information about the APIs supported by pyseekdb, see [API Reference](../../50.apis/10.api-overview.md).

* [Complete Example](50.pyseekdb-complete-sample.md)

* [Hybrid Search Example](100.pyseekdb-hybrid-search-sample.md)
@@ -0,0 +1,350 @@
---
slug: /pyseekdb-hybrid-search-sample
---

# Hybrid search example

This example demonstrates the advantages of `hybrid_search()` over `query()`.

The main advantages of `hybrid_search()` are:

* Supports full-text search and vector similarity search simultaneously

* Allows separate filtering conditions for full-text and vector search

* Combines the ranked results of both searches using the Reciprocal Rank Fusion (RRF) algorithm to improve relevance

* Handles complex scenarios that `query()` cannot handle
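
To make the fusion step concrete, here is a minimal sketch of Reciprocal Rank Fusion in plain Python. It shows the general technique only and is not seekdb's implementation. The function name `reciprocal_rank_fusion`, the constant `k=60` (a value commonly used for RRF), and the two input rankings are assumptions chosen for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, n_results=5):
    """Fuse several ranked lists of IDs into one list.

    Each ID scores 1 / (k + rank) in every ranking that contains it,
    where rank starts at 1. The scores are summed across rankings.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in ordered[:n_results]]


# Hypothetical rankings from a full-text search and a vector search.
full_text_ranking = ["doc_2", "doc_1", "doc_8"]
vector_ranking = ["doc_1", "doc_3", "doc_2"]

print(reciprocal_rank_fusion([full_text_ranking, vector_ranking]))
# -> ['doc_1', 'doc_2', 'doc_3', 'doc_8']
```

IDs that appear in both rankings (`doc_1`, `doc_2`) accumulate two contributions and rise above IDs found by only one search, which is the behavior the list above describes.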

## Example

```python
import pyseekdb

# Setup
client = pyseekdb.Client()
collection = client.get_or_create_collection(
    name="hybrid_search_demo"
)

# Sample data
documents = [
    "Machine learning is revolutionizing artificial intelligence and data science",
    "Python programming language is essential for machine learning developers",
    "Deep learning neural networks enable advanced AI applications",
    "Data science combines statistics, programming, and domain expertise",
    "Natural language processing uses machine learning to understand text",
    "Computer vision algorithms process images using deep learning techniques",
    "Reinforcement learning trains agents through reward-based feedback",
    "Python libraries like TensorFlow and PyTorch simplify machine learning",
    "Artificial intelligence systems can learn from large datasets",
    "Neural networks mimic the structure of biological brain connections"
]

metadatas = [
    {"category": "AI", "topic": "machine learning", "year": 2023, "popularity": 95},
    {"category": "Programming", "topic": "python", "year": 2023, "popularity": 88},
    {"category": "AI", "topic": "deep learning", "year": 2024, "popularity": 92},
    {"category": "Data Science", "topic": "data analysis", "year": 2023, "popularity": 85},
    {"category": "AI", "topic": "nlp", "year": 2024, "popularity": 90},
    {"category": "AI", "topic": "computer vision", "year": 2023, "popularity": 87},
    {"category": "AI", "topic": "reinforcement learning", "year": 2024, "popularity": 89},
    {"category": "Programming", "topic": "python", "year": 2023, "popularity": 91},
    {"category": "AI", "topic": "general ai", "year": 2023, "popularity": 93},
    {"category": "AI", "topic": "neural networks", "year": 2024, "popularity": 94}
]

ids = [f"doc_{i+1}" for i in range(len(documents))]
collection.add(ids=ids, documents=documents, metadatas=metadatas)

print("=" * 100)
print("SCENARIO 1: Keyword + Semantic Search")
print("=" * 100)
print("Goal: Find documents similar to 'AI research' AND containing 'machine learning'\n")

# query() approach
query_result1 = collection.query(
    query_texts=["AI research"],
    where_document={"$contains": "machine learning"},
    n_results=5
)

# hybrid_search() approach
hybrid_result1 = collection.hybrid_search(
    query={"where_document": {"$contains": "machine learning"}, "n_results": 10},
    knn={"query_texts": ["AI research"], "n_results": 10},
    rank={"rrf": {}},
    n_results=5
)

print("query() Results:")
for i, doc_id in enumerate(query_result1['ids'][0]):
    idx = ids.index(doc_id)
    print(f" {i+1}. {documents[idx]}")

print("\nhybrid_search() Results:")
for i, doc_id in enumerate(hybrid_result1['ids'][0]):
    idx = ids.index(doc_id)
    print(f" {i+1}. {documents[idx]}")

print("\nAnalysis:")
print(" query() ranks 'Deep learning neural networks...' first because it's semantically similar to 'AI research',")
print(" but 'machine learning' is not its primary focus. hybrid_search() correctly prioritizes documents that")
print(" explicitly contain 'machine learning' (from full-text search) while also being semantically relevant")
print(" to 'AI research' (from vector search). The RRF fusion ensures documents matching both criteria rank higher.")

print("\n" + "=" * 100)
print("SCENARIO 2: Independent Filters for Different Search Types")
print("=" * 100)
print("Goal: Full-text='neural' (year=2024) + Vector='deep learning' (popularity>=90)\n")

# query() - same filter applies to both conditions
query_result2 = collection.query(
    query_texts=["deep learning"],
    where={"year": {"$eq": 2024}, "popularity": {"$gte": 90}},
    where_document={"$contains": "neural"},
    n_results=5
)

# hybrid_search() - different filters for each search type
hybrid_result2 = collection.hybrid_search(
    query={"where_document": {"$contains": "neural"}, "where": {"year": {"$eq": 2024}}, "n_results": 10},
    knn={"query_texts": ["deep learning"], "where": {"popularity": {"$gte": 90}}, "n_results": 10},
    rank={"rrf": {}},
    n_results=5
)

print("query() Results (same filter for both):")
for i, doc_id in enumerate(query_result2['ids'][0]):
    idx = ids.index(doc_id)
    print(f" {i+1}. {documents[idx]}")
    print(f" {metadatas[idx]}")

print("\nhybrid_search() Results (independent filters):")
for i, doc_id in enumerate(hybrid_result2['ids'][0]):
    idx = ids.index(doc_id)
    print(f" {i+1}. {documents[idx]}")
    print(f" {metadatas[idx]}")

print("\nAnalysis:")
print(" query() only returns 2 results because it requires documents to satisfy BOTH year=2024 AND popularity>=90")
print(" simultaneously. hybrid_search() returns 5 results by applying year=2024 filter to full-text search")
print(" and popularity>=90 filter to vector search independently, then fusing the results. This approach")
print(" captures more relevant documents that satisfy one criterion strongly even if they do not meet the other.")

print("\n" + "=" * 100)
print("SCENARIO 3: Combining Multiple Search Strategies")
print("=" * 100)
print("Goal: Find documents about 'machine learning algorithms'\n")

# query() - vector search only
query_result3 = collection.query(
    query_texts=["machine learning algorithms"],
    n_results=5
)

# hybrid_search() - combines full-text and vector
hybrid_result3 = collection.hybrid_search(
    query={"where_document": {"$contains": "machine learning"}, "n_results": 10},
    knn={"query_texts": ["machine learning algorithms"], "n_results": 10},
    rank={"rrf": {}},
    n_results=5
)

print("query() Results (vector similarity only):")
for i, doc_id in enumerate(query_result3['ids'][0]):
    idx = ids.index(doc_id)
    print(f" {i+1}. {documents[idx]}")

print("\nhybrid_search() Results (full-text + vector fusion):")
for i, doc_id in enumerate(hybrid_result3['ids'][0]):
    idx = ids.index(doc_id)
    print(f" {i+1}. {documents[idx]}")

print("\nAnalysis:")
print(" query() returns 'Artificial intelligence systems...' as the result, which doesn't explicitly")
print(" mention 'machine learning'. hybrid_search() combines full-text search (for 'machine learning')")
print(" with vector search (for semantic similarity to 'machine learning algorithms'), ensuring that")
print(" documents containing the exact keyword rank higher while still capturing semantically relevant content.")

print("\n" + "=" * 100)
print("SCENARIO 4: Complex Multi-Criteria Search")
print("=" * 100)
print("Goal: Full-text='learning' (category=AI) + Vector='artificial intelligence' (year>=2023)\n")

# query() - limited to single search with combined filters
query_result4 = collection.query(
    query_texts=["artificial intelligence"],
    where={"category": {"$eq": "AI"}, "year": {"$gte": 2023}},
    where_document={"$contains": "learning"},
    n_results=5
)

# hybrid_search() - separate criteria for each search type
hybrid_result4 = collection.hybrid_search(
    query={"where_document": {"$contains": "learning"}, "where": {"category": {"$eq": "AI"}}, "n_results": 10},
    knn={"query_texts": ["artificial intelligence"], "where": {"year": {"$gte": 2023}}, "n_results": 10},
    rank={"rrf": {}},
    n_results=5
)

print("query() Results:")
for i, doc_id in enumerate(query_result4['ids'][0]):
    idx = ids.index(doc_id)
    print(f" {i+1}. {documents[idx]}")
    print(f" {metadatas[idx]}")

print("\nhybrid_search() Results:")
for i, doc_id in enumerate(hybrid_result4['ids'][0]):
    idx = ids.index(doc_id)
    print(f" {i+1}. {documents[idx]}")
    print(f" {metadatas[idx]}")

print("\nAnalysis:")
print(" While both methods return similar documents, hybrid_search() provides better ranking by prioritizing")
print(" documents that score highly in both full-text search (containing 'learning' with category=AI) and")
print(" vector search (semantically similar to 'artificial intelligence' with year>=2023). The RRF fusion")
print(" algorithm ensures that 'Deep learning neural networks...' ranks first because it strongly matches")
print(" both search criteria, whereas query() applies filters sequentially which may not optimize ranking.")

print("\n" + "=" * 100)
print("SCENARIO 5: Result Quality - RRF Fusion")
print("=" * 100)
print("Goal: Search for 'Python machine learning'\n")

# query() - single ranking
query_result5 = collection.query(
    query_texts=["Python machine learning"],
    n_results=5
)

# hybrid_search() - RRF fusion of multiple rankings
hybrid_result5 = collection.hybrid_search(
    query={"where_document": {"$contains": "Python"}, "n_results": 10},
    knn={"query_texts": ["Python machine learning"], "n_results": 10},
    rank={"rrf": {}},
    n_results=5
)

print("query() Results (single ranking):")
for i, doc_id in enumerate(query_result5['ids'][0]):
    idx = ids.index(doc_id)
    print(f" {i+1}. {documents[idx]}")

print("\nhybrid_search() Results (RRF fusion):")
for i, doc_id in enumerate(hybrid_result5['ids'][0]):
    idx = ids.index(doc_id)
    print(f" {i+1}. {documents[idx]}")

print("\nAnalysis:")
print(" Both methods return identical results in this case, but hybrid_search() achieves this through RRF")
print(" (Reciprocal Rank Fusion) which combines rankings from full-text search (for 'Python') and vector")
print(" search (for 'Python machine learning'). RRF provides more stable and robust ranking by considering")
print(" multiple signals, making it less sensitive to variations in individual search algorithms and ensuring")
print(" consistent high-quality results across different query formulations.")

print("\n" + "=" * 100)
print("SCENARIO 6: Different Filter Criteria for Each Search")
print("=" * 100)
print("Goal: Full-text='neural' (high popularity) + Vector='deep learning' (recent year)\n")

# query() - cannot separate filters for keyword vs semantic
query_result6 = collection.query(
    query_texts=["deep learning"],
    where={"popularity": {"$gte": 90}, "year": {"$gte": 2023}},
    where_document={"$contains": "neural"},
    n_results=5
)

# hybrid_search() - different filters for keyword search vs semantic search
hybrid_result6 = collection.hybrid_search(
    query={"where_document": {"$contains": "neural"}, "where": {"popularity": {"$gte": 90}}, "n_results": 10},
    knn={"query_texts": ["deep learning"], "where": {"year": {"$gte": 2023}}, "n_results": 10},
    rank={"rrf": {}},
    n_results=5
)

print("query() Results:")
for i, doc_id in enumerate(query_result6['ids'][0]):
    idx = ids.index(doc_id)
    print(f" {i+1}. {documents[idx]}")
    print(f" {metadatas[idx]}")

print("\nhybrid_search() Results:")
for i, doc_id in enumerate(hybrid_result6['ids'][0]):
    idx = ids.index(doc_id)
    print(f" {i+1}. {documents[idx]}")
    print(f" {metadatas[idx]}")

print("\nAnalysis:")
print(" query() only returns 2 results because it requires documents to satisfy BOTH popularity>=90 AND")
print(" year>=2023 simultaneously, along with containing 'neural' and being semantically similar to")
print(" 'deep learning'. hybrid_search() returns 5 results by applying popularity>=90 filter to full-text")
print(" search (for 'neural') and year>=2023 filter to vector search (for 'deep learning') independently.")
print(" The fusion then combines results from both searches, capturing documents that strongly match either")
print(" criterion while still being relevant to the overall query intent.")

print("\n" + "=" * 100)
print("SCENARIO 7: Partial Keyword Match + Semantic Similarity")
print("=" * 100)
print("Goal: Documents containing 'Python' + Semantically similar to 'data science'\n")

# query() - filter applied after vector search
query_result7 = collection.query(
    query_texts=["data science"],
    where_document={"$contains": "Python"},
    n_results=5
)

# hybrid_search() - parallel searches then fusion
hybrid_result7 = collection.hybrid_search(
    query={"where_document": {"$contains": "Python"}, "n_results": 10},
    knn={"query_texts": ["data science"], "n_results": 10},
    rank={"rrf": {}},
    n_results=5
)

print("query() Results:")
for i, doc_id in enumerate(query_result7['ids'][0]):
    idx = ids.index(doc_id)
    print(f" {i+1}. {documents[idx]}")

print("\nhybrid_search() Results:")
for i, doc_id in enumerate(hybrid_result7['ids'][0]):
    idx = ids.index(doc_id)
    print(f" {i+1}. {documents[idx]}")

print("\nAnalysis:")
print(" query() only returns 2 results because it first performs vector search for 'data science', then")
print(" filters to documents containing 'Python', which severely limits the result set. hybrid_search()")
print(" returns 5 results by running full-text search (for 'Python') and vector search (for 'data science')")
print(" in parallel, then fusing the results. This captures documents that contain 'Python' (even if not")
print(" semantically closest to 'data science') and documents semantically similar to 'data science' (even")
print(" if they don't contain 'Python'), providing better recall and more comprehensive results.")

print("\n" + "=" * 100)
print("SUMMARY")
print("=" * 100)
print("""
query() limitations:
  - Single search type (vector similarity)
  - Filters applied after search (may miss relevant docs)
  - Cannot combine full-text and vector search results
  - Same filter criteria for all conditions

hybrid_search() advantages:
  - Simultaneous full-text + vector search
  - Independent filters for each search type
  - Intelligent result fusion using RRF
  - Better recall for complex queries
  - Handles scenarios requiring both keyword and semantic matching
""")
```
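
The difference in result counts in Scenario 2 can be reasoned about without a database. The sketch below reuses the sample documents and metadata from the example and counts how many documents are eligible under a combined filter versus independent filters. It is a mental model only: it assumes `$contains` behaves like a case-insensitive substring match, it ignores vector similarity ranking, and the `contains` helper is hypothetical.

```python
# (id, document, year, popularity) taken from the sample data in the example.
ROWS = [
    ("doc_1", "Machine learning is revolutionizing artificial intelligence and data science", 2023, 95),
    ("doc_2", "Python programming language is essential for machine learning developers", 2023, 88),
    ("doc_3", "Deep learning neural networks enable advanced AI applications", 2024, 92),
    ("doc_4", "Data science combines statistics, programming, and domain expertise", 2023, 85),
    ("doc_5", "Natural language processing uses machine learning to understand text", 2024, 90),
    ("doc_6", "Computer vision algorithms process images using deep learning techniques", 2023, 87),
    ("doc_7", "Reinforcement learning trains agents through reward-based feedback", 2024, 89),
    ("doc_8", "Python libraries like TensorFlow and PyTorch simplify machine learning", 2023, 91),
    ("doc_9", "Artificial intelligence systems can learn from large datasets", 2023, 93),
    ("doc_10", "Neural networks mimic the structure of biological brain connections", 2024, 94),
]


def contains(text, keyword):
    """Assumed stand-in for $contains: case-insensitive substring match."""
    return keyword.lower() in text.lower()


# query(): every condition must hold for the same document.
combined = [
    doc_id for doc_id, text, year, popularity in ROWS
    if contains(text, "neural") and year == 2024 and popularity >= 90
]

# hybrid_search(): each search applies only its own filter.
full_text_side = [
    doc_id for doc_id, text, year, _ in ROWS
    if contains(text, "neural") and year == 2024
]
vector_side = [doc_id for doc_id, _, _, popularity in ROWS if popularity >= 90]

# Documents eligible for fusion, kept in their original order.
eligible = set(full_text_side) | set(vector_side)
fused_pool = [doc_id for doc_id, *_ in ROWS if doc_id in eligible]

print(len(combined), combined)      # 2 eligible documents
print(len(fused_pool), fused_pool)  # 6 eligible documents before top-n truncation
```

Under these assumptions, the combined filter leaves two candidates, while the independent filters leave six, enough to fill the five requested results.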

## References

* For information about the APIs supported by pyseekdb, see [API Reference](../../50.apis/10.api-overview.md).

* [Simple example](10.pyseekdb-simple-sample.md)

* [Complete example](50.pyseekdb-complete-sample.md)
@@ -0,0 +1,440 @@
---
slug: /pyseekdb-complete-sample
---

# Complete Example

This example demonstrates the full capabilities of pyseekdb.

The example includes the following operations:

1. Connection, including all connection modes
2. Collection management
3. DML operations, including add, update, upsert, and delete
4. DQL operations, including query, get, and hybrid_search
5. Filter operators
6. Collection information methods
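
The filter operators used in this example are written as nested dictionaries passed to `where`. As a reading aid, the sketch below evaluates such a dictionary against one metadata record in plain Python. It covers only the operators that appear in these samples (`$eq`, `$gte`, `$in`, `$and`, `$or`, and plain equality). The `matches` and `_compare` helpers are hypothetical and describe the intended semantics as an assumption; they are not how seekdb evaluates filters.

```python
def _compare(value, op, operand):
    """Apply one comparison operator to a metadata value."""
    if op == "$eq":
        return value == operand
    if op == "$gte":
        return value is not None and value >= operand
    if op == "$in":
        return value in operand
    raise ValueError(f"operator not covered by this sketch: {op}")


def matches(metadata, where):
    """Return True if the metadata record satisfies the where dictionary."""
    for key, condition in where.items():
        if key == "$and":
            if not all(matches(metadata, sub) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches(metadata, sub) for sub in condition):
                return False
        elif isinstance(condition, dict):
            # Operator form, for example {"score": {"$gte": 90}}.
            value = metadata.get(key)
            if not all(_compare(value, op, operand) for op, operand in condition.items()):
                return False
        elif metadata.get(key) != condition:
            # Simplified equality, for example {"category": "AI"}.
            return False
    return True


record = {"category": "AI", "score": 95, "tag": "ml", "year": 2023}
print(matches(record, {"category": "AI"}))                                  # True
print(matches(record, {"tag": {"$in": ["python", "neural"]}}))              # False
print(matches(record, {"$or": [{"category": "CV"}, {"tag": "ml"}]}))        # True
```

Several keys in one dictionary are treated here as an implicit AND, which matches how the hybrid search sample describes combined `where` conditions.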

## Example

```python
import uuid
import random
import pyseekdb

# ============================================================================
# PART 1: CLIENT CONNECTION
# ============================================================================

# Option 1: Embedded mode (local seekdb)
client = pyseekdb.Client(
    # path="./seekdb",
    # database="test"
)

# Option 2: Server mode (remote seekdb server)
# client = pyseekdb.Client(
#     host="127.0.0.1",
#     port=2881,
#     database="test",
#     user="root",
#     password=""
# )

# Option 3: Remote server mode (OceanBase Server)
# client = pyseekdb.Client(
#     host="127.0.0.1",
#     port=2881,
#     tenant="test",  # OceanBase default tenant
#     database="test",
#     user="root",
#     password=""
# )

# ============================================================================
# PART 2: COLLECTION MANAGEMENT
# ============================================================================

collection_name = "comprehensive_example"
dimension = 128

# 2.1 Create a collection
from pyseekdb import HNSWConfiguration
config = HNSWConfiguration(dimension=dimension, distance='cosine')
collection = client.get_or_create_collection(
    name=collection_name,
    configuration=config,
    embedding_function=None  # Explicitly set to None since we're using custom 128-dim embeddings
)

# 2.2 Check if collection exists
exists = client.has_collection(collection_name)

# 2.3 Get collection object
retrieved_collection = client.get_collection(collection_name, embedding_function=None)

# 2.4 List all collections
all_collections = client.list_collections()

# 2.5 Get or create collection (creates if doesn't exist)
config2 = HNSWConfiguration(dimension=64, distance='cosine')
collection2 = client.get_or_create_collection(
    name="another_collection",
    configuration=config2,
    embedding_function=None  # Explicitly set to None since we're using custom 64-dim embeddings
)

# ============================================================================
# PART 3: DML OPERATIONS - ADD DATA
# ============================================================================

# Generate sample data
random.seed(42)
documents = [
    "Machine learning is transforming the way we solve problems",
    "Python programming language is widely used in data science",
    "Vector databases enable efficient similarity search",
    "Neural networks mimic the structure of the human brain",
    "Natural language processing helps computers understand human language",
    "Deep learning requires large amounts of training data",
    "Reinforcement learning agents learn through trial and error",
    "Computer vision enables machines to interpret visual information"
]

# Generate embeddings (in real usage, use an embedding model)
embeddings = []
for i in range(len(documents)):
    vector = [random.random() for _ in range(dimension)]
    embeddings.append(vector)

ids = [str(uuid.uuid4()) for _ in documents]

# 3.1 Add single item
single_id = str(uuid.uuid4())
collection.add(
    ids=single_id,
    documents="This is a single document",
    embeddings=[random.random() for _ in range(dimension)],
    metadatas={"type": "single", "category": "test"}
)

# 3.2 Add multiple items
collection.add(
    ids=ids,
    documents=documents,
    embeddings=embeddings,
    metadatas=[
        {"category": "AI", "score": 95, "tag": "ml", "year": 2023},
        {"category": "Programming", "score": 88, "tag": "python", "year": 2022},
        {"category": "Database", "score": 92, "tag": "vector", "year": 2023},
        {"category": "AI", "score": 90, "tag": "neural", "year": 2022},
        {"category": "NLP", "score": 87, "tag": "language", "year": 2023},
        {"category": "AI", "score": 93, "tag": "deep", "year": 2023},
        {"category": "AI", "score": 85, "tag": "reinforcement", "year": 2022},
        {"category": "CV", "score": 91, "tag": "vision", "year": 2023}
    ]
)

# 3.3 Add with only embeddings (no documents)
vector_only_ids = [str(uuid.uuid4()) for _ in range(2)]
collection.add(
    ids=vector_only_ids,
    embeddings=[[random.random() for _ in range(dimension)] for _ in range(2)],
    metadatas=[{"type": "vector_only"}, {"type": "vector_only"}]
)

# ============================================================================
# PART 4: DML OPERATIONS - UPDATE DATA
# ============================================================================

# 4.1 Update single item
collection.update(
    ids=ids[0],
    metadatas={"category": "AI", "score": 98, "tag": "ml", "year": 2024, "updated": True}
)

# 4.2 Update multiple items
collection.update(
    ids=ids[1:3],
    documents=["Updated document 1", "Updated document 2"],
    embeddings=[[random.random() for _ in range(dimension)] for _ in range(2)],
    metadatas=[
        {"category": "Programming", "score": 95, "updated": True},
        {"category": "Database", "score": 97, "updated": True}
    ]
)

# 4.3 Update embeddings
new_embeddings = [[random.random() for _ in range(dimension)] for _ in range(2)]
collection.update(
    ids=ids[2:4],
    embeddings=new_embeddings
)

# ============================================================================
# PART 5: DML OPERATIONS - UPSERT DATA
# ============================================================================

# 5.1 Upsert existing item (will update)
collection.upsert(
    ids=ids[0],
    documents="Upserted document (was updated)",
    embeddings=[random.random() for _ in range(dimension)],
    metadatas={"category": "AI", "upserted": True}
)

# 5.2 Upsert new item (will insert)
new_id = str(uuid.uuid4())
collection.upsert(
    ids=new_id,
    documents="This is a new document from upsert",
    embeddings=[random.random() for _ in range(dimension)],
    metadatas={"category": "New", "upserted": True}
)

# 5.3 Upsert multiple items
upsert_ids = [ids[4], str(uuid.uuid4())]  # One existing, one new
collection.upsert(
    ids=upsert_ids,
    documents=["Upserted doc 1", "Upserted doc 2"],
    embeddings=[[random.random() for _ in range(dimension)] for _ in range(2)],
    metadatas=[{"upserted": True}, {"upserted": True}]
)
|
||||
|
||||
# ============================================================================
|
||||
# PART 6: DQL OPERATIONS - QUERY (VECTOR SIMILARITY SEARCH)
|
||||
# ============================================================================
|
||||
|
||||
# 6.1 Basic vector similarity query
|
||||
query_vector = embeddings[0] # Query with first document's vector
|
||||
results = collection.query(
|
||||
query_embeddings=query_vector,
|
||||
n_results=3
|
||||
)
|
||||
print(f"Query results: {len(results['ids'][0])} items")
|
||||
|
||||
# 6.2 Query with metadata filter (simplified equality)
|
||||
results = collection.query(
|
||||
query_embeddings=query_vector,
|
||||
where={"category": "AI"},
|
||||
n_results=5
|
||||
)
|
||||
|
||||
# 6.3 Query with comparison operators
|
||||
results = collection.query(
|
||||
query_embeddings=query_vector,
|
||||
where={"score": {"$gte": 90}},
|
||||
n_results=5
|
||||
)
|
||||
|
||||
# 6.4 Query with $in operator
|
||||
results = collection.query(
|
||||
query_embeddings=query_vector,
|
||||
where={"tag": {"$in": ["ml", "python", "neural"]}},
|
||||
n_results=5
|
||||
)
|
||||
|
||||
# 6.5 Query with logical operators ($or) - simplified equality
|
||||
results = collection.query(
|
||||
query_embeddings=query_vector,
|
||||
where={
|
||||
"$or": [
|
||||
{"category": "AI"},
|
||||
{"tag": "python"}
|
||||
]
|
||||
},
|
||||
n_results=5
|
||||
)
|
||||
|
||||
# 6.6 Query with logical operators ($and) - simplified equality
|
||||
results = collection.query(
|
||||
query_embeddings=query_vector,
|
||||
where={
|
||||
"$and": [
|
||||
{"category": "AI"},
|
||||
{"score": {"$gte": 90}}
|
||||
]
|
||||
},
|
||||
n_results=5
|
||||
)
|
||||
|
||||
# 6.7 Query with document filter
|
||||
results = collection.query(
|
||||
query_embeddings=query_vector,
|
||||
where_document={"$contains": "machine learning"},
|
||||
n_results=5
|
||||
)
|
||||
|
||||
# 6.8 Query with combined filters (simplified equality)
|
||||
results = collection.query(
|
||||
query_embeddings=query_vector,
|
||||
where={"category": "AI", "year": {"$gte": 2023}},
|
||||
where_document={"$contains": "learning"},
|
||||
n_results=5
|
||||
)
|
||||
|
||||
# 6.9 Query with multiple embeddings (batch query)
|
||||
batch_embeddings = [embeddings[0], embeddings[1]]
|
||||
batch_results = collection.query(
|
||||
query_embeddings=batch_embeddings,
|
||||
n_results=2
|
||||
)
|
||||
# batch_results["ids"][0] contains results for first query
|
||||
# batch_results["ids"][1] contains results for second query
|
||||
|
||||
# 6.10 Query with specific fields
|
||||
results = collection.query(
|
||||
query_embeddings=query_vector,
|
||||
include=["documents", "metadatas", "embeddings"],
|
||||
n_results=2
|
||||
)
|
||||
|
||||
# ============================================================================
|
||||
# PART 7: DQL OPERATIONS - GET (RETRIEVE BY IDS OR FILTERS)
|
||||
# ============================================================================
|
||||
|
||||
# 7.1 Get by single ID
|
||||
result = collection.get(ids=ids[0])
|
||||
# result["ids"] contains [ids[0]]
|
||||
# result["documents"] contains document for ids[0]
|
||||
|
||||
# 7.2 Get by multiple IDs
|
||||
results = collection.get(ids=ids[:3])
|
||||
# results["ids"] contains ids[:3]
|
||||
# results["documents"] contains documents for all IDs
|
||||
|
||||
# 7.3 Get by metadata filter (simplified equality)
|
||||
results = collection.get(
|
||||
where={"category": "AI"},
|
||||
limit=5
|
||||
)
|
||||
|
||||
# 7.4 Get with comparison operators
|
||||
results = collection.get(
|
||||
where={"score": {"$gte": 90}},
|
||||
limit=5
|
||||
)
|
||||
|
||||
# 7.5 Get with $in operator
|
||||
results = collection.get(
|
||||
where={"tag": {"$in": ["ml", "python"]}},
|
||||
limit=5
|
||||
)
|
||||
|
||||
# 7.6 Get with logical operators (simplified equality)
|
||||
results = collection.get(
|
||||
where={
|
||||
"$or": [
|
||||
{"category": "AI"},
|
||||
{"category": "Programming"}
|
||||
]
|
||||
},
|
||||
limit=5
|
||||
)
|
||||
|
||||
# 7.7 Get by document filter
|
||||
results = collection.get(
|
||||
where_document={"$contains": "Python"},
|
||||
limit=5
|
||||
)
|
||||
|
||||
# 7.8 Get with pagination
|
||||
results_page1 = collection.get(limit=2, offset=0)
|
||||
results_page2 = collection.get(limit=2, offset=2)
|
||||
|
||||
# 7.9 Get with specific fields
|
||||
results = collection.get(
|
||||
ids=ids[:2],
|
||||
include=["documents", "metadatas", "embeddings"]
|
||||
)
|
||||
|
||||
# 7.10 Get all data (up to 100 items)
|
||||
all_results = collection.get(limit=100)
|
||||
|
||||
# ============================================================================
|
||||
# PART 8: DQL OPERATIONS - HYBRID SEARCH
|
||||
# ============================================================================
|
||||
|
||||
# 8.1 Hybrid search with full-text and vector search
|
||||
# Note: This requires query_embeddings to be provided directly
|
||||
# In real usage, you might have an embedding function
|
||||
hybrid_results = collection.hybrid_search(
|
||||
query={
|
||||
"where_document": {"$contains": "machine learning"},
|
||||
"where": {"category": "AI"}, # Simplified equality
|
||||
"n_results": 10
|
||||
},
|
||||
knn={
|
||||
"query_embeddings": [embeddings[0]],
|
||||
"where": {"year": {"$gte": 2022}},
|
||||
"n_results": 10
|
||||
},
|
||||
rank={"rrf": {}}, # Reciprocal Rank Fusion
|
||||
n_results=5,
|
||||
include=["documents", "metadatas"]
|
||||
)
|
||||
# hybrid_results["ids"][0] contains IDs for the hybrid search
|
||||
# hybrid_results["documents"][0] contains documents for the hybrid search
|
||||
print(f"Hybrid search: {len(hybrid_results.get('ids', [[]])[0])} results")
|
||||
|
||||
# ============================================================================
|
||||
# PART 9: DML OPERATIONS - DELETE DATA
|
||||
# ============================================================================
|
||||
|
||||
# 9.1 Delete by IDs
|
||||
delete_ids = [vector_only_ids[0], new_id]
|
||||
collection.delete(ids=delete_ids)
|
||||
|
||||
# 9.2 Delete by metadata filter
|
||||
collection.delete(where={"type": {"$eq": "vector_only"}})
|
||||
|
||||
# 9.3 Delete by document filter
|
||||
collection.delete(where_document={"$contains": "Updated document"})
|
||||
|
||||
# 9.4 Delete with combined filters
|
||||
collection.delete(
|
||||
where={"category": {"$eq": "CV"}},
|
||||
where_document={"$contains": "vision"}
|
||||
)
|
||||
|
||||
# ============================================================================
|
||||
# PART 10: COLLECTION INFORMATION
|
||||
# ============================================================================
|
||||
|
||||
# 10.1 Get collection count
|
||||
count = collection.count()
|
||||
print(f"Collection count: {count} items")
|
||||
|
||||
|
||||
# 10.3 Preview first few items in collection (returns all columns by default)
|
||||
preview = collection.peek(limit=5)
|
||||
print(f"Preview: {len(preview['ids'])} items")
|
||||
for i in range(len(preview['ids'])):
|
||||
    print(f" ID: {preview['ids'][i]}, Document: {preview['documents'][i]}")
|
||||
    print(f" Metadata: {preview['metadatas'][i]}, Embedding dim: {len(preview['embeddings'][i]) if preview['embeddings'][i] else 0}")
|
||||
|
||||
# 10.4 Count collections in database
|
||||
collection_count = client.count_collection()
|
||||
print(f"Database has {collection_count} collections")
|
||||
|
||||
# ============================================================================
|
||||
# PART 11: CLEANUP
|
||||
# ============================================================================
|
||||
|
||||
# Delete test collections
|
||||
try:
|
||||
    client.delete_collection("another_collection")
|
||||
except Exception as e:
|
||||
    print(f"Could not delete 'another_collection': {e}")
|
||||
|
||||
# Delete the main collection (comment out this line to keep it)
|
||||
client.delete_collection(collection_name)
|
||||
```
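
The hybrid search in Part 8 passes `rank={"rrf": {}}`, which selects Reciprocal Rank Fusion. As a rough illustration of the general technique (not of seekdb's internal implementation), RRF scores each ID by summing `1 / (k + rank)` over every ranked list in which it appears. The function name `rrf_fuse`, the constant `k=60` (a commonly used default), and the sample rankings are assumptions made for this sketch.

```python
from collections import defaultdict


def rrf_fuse(ranked_lists, k=60, n_results=5):
    """Fuse several ranked ID lists with Reciprocal Rank Fusion.

    Illustrative only: `rrf_fuse` and `k=60` are assumptions for this
    sketch and do not describe how seekdb computes its ranking.
    """
    scores = defaultdict(float)
    for ranked in ranked_lists:
        # Ranks are 1-based: the best hit contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in ordered[:n_results]]


full_text_hits = ["a", "b", "c"]  # hypothetical full-text ranking
vector_hits = ["b", "d", "a"]     # hypothetical vector ranking
print(rrf_fuse([full_text_hits, vector_hits]))
```

An ID that ranks well in both lists ends up ahead of one that appears in only a single list, which is the behavior you would typically want from a fused ranking.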
|
||||
|
||||
## References
|
||||
|
||||
* For information about the API interfaces supported by pyseekdb, see [API Reference](../../50.apis/10.api-overview.md).
|
||||
|
||||
* [Simple Example](../50.sdk-samples/10.pyseekdb-simple-sample.md)
|
||||
|
||||
* [Hybrid Search Example](../50.sdk-samples/100.pyseekdb-hybrid-search-sample.md)
|
||||
@@ -0,0 +1,66 @@
|
||||
---
|
||||
slug: /api-overview
|
||||
---
|
||||
|
||||
# API Reference
|
||||
|
||||
You can work with seekdb through the APIs described in this topic.
|
||||
|
||||
## APIs
|
||||
|
||||
The following APIs are supported.
|
||||
|
||||
### Database
|
||||
|
||||
:::info
|
||||
You can use this API only when you connect to seekdb by using the `AdminClient`. For more information about the `AdminClient`, see [Admin Client](../50.apis/100.admin-client.md).
|
||||
:::
|
||||
|
||||
| API | Description | Documentation |
|
||||
|---|---|---|
|
||||
| `create_database()` | Creates a database. | [Documentation](110.database/200.create-database-of-api.md) |
|
||||
| `get_database()` | Retrieves a specified database. |[Documentation](110.database/300.get-database-of-api.md)|
|
||||
| `list_databases()` | Retrieves a list of databases in an instance. |[Documentation](110.database/400.list-database-of-api.md)|
|
||||
| `delete_database()` | Deletes a specified database.|[Documentation](110.database/500.delete-database-of-api.md)|
|
||||
|
||||
|
||||
### Collection
|
||||
|
||||
:::info
|
||||
You can use this API only when you connect to seekdb by using the `Client`. For more information about the `Client`, see [Client](../50.apis/50.client.md).
|
||||
:::
|
||||
|
||||
| API | Description | Documentation |
|
||||
|---|---|---|
|
||||
| `create_collection()` | Creates a collection. | [Documentation](200.collection/100.create-collection-of-api.md) |
|
||||
| `get_collection()` | Retrieves a specified collection. |[Documentation](200.collection/200.get-collection-of-api.md)|
|
||||
| `get_or_create_collection()` | Retrieves a collection, or creates it first if it does not exist in the database. |[Documentation](200.collection/250.get-or-create-collection-of-api.md)|
|
||||
| `list_collections()` | Retrieves the collection list in a database. |[Documentation](200.collection/300.list-collection-of-api.md)|
|
||||
| `count_collection()` | Counts the number of collections in a database. |[Documentation](200.collection/350.count-collection-of-api.md)|
|
||||
| `delete_collection()` | Deletes a specified collection.|[Documentation](200.collection/400.delete-collection-of-api.md)|
|
||||
|
||||
|
||||
### DML
|
||||
|
||||
:::info
|
||||
You can use this API only when you connect to seekdb by using the `Client`. For more information about the `Client`, see [Client](../50.apis/50.client.md).
|
||||
:::
|
||||
|
||||
| API | Description | Documentation |
|
||||
|---|---|---|
|
||||
| `add()` | Inserts a new record into a collection. | [Documentation](300.dml/200.add-data-of-api.md) |
|
||||
| `update()` | Updates an existing record in a collection. |[Documentation](300.dml/300.update-data-of-api.md)|
|
||||
| `upsert()` | Inserts a new record or updates an existing record. |[Documentation](300.dml/400.upsert-data-of-api.md)|
|
||||
| `delete()` | Deletes a record from a collection.|[Documentation](300.dml/500.delete-data-of-api.md)|
|
||||
|
||||
### DQL
|
||||
|
||||
:::info
|
||||
You can use this API only when you connect to seekdb by using the `Client`. For more information about the `Client`, see [Client](../50.apis/50.client.md).
|
||||
:::
|
||||
|
||||
| API | Description | Documentation |
|
||||
|---|---|---|
|
||||
| `query()` | Performs vector similarity search. | [Documentation](400.dql/200.query-interfaces-of-api.md) |
|
||||
| `get()` | Retrieves data from a collection by ID, document filter, or metadata filter (no vector search). |[Documentation](400.dql/300.get-interfaces-of-api.md)|
|
||||
| `hybrid_search()` | Combines full-text search and vector similarity search, and merges the two result sets by ranking. |[Documentation](400.dql/400.hybrid-search-of-api.md)|
|
||||
@@ -0,0 +1,93 @@
|
||||
---
|
||||
slug: /admin-client
|
||||
---
|
||||
|
||||
# Admin Client
|
||||
|
||||
`AdminClient` provides database management operations. It uses the same database connection mode as `Client`, but only supports database management-related operations.
|
||||
|
||||
## Connect to an embedded seekdb instance
|
||||
|
||||
Connect to a local embedded seekdb instance by using `AdminClient`.
|
||||
|
||||
```python
|
||||
import pyseekdb
|
||||
|
||||
# Embedded mode - Database management
|
||||
admin = pyseekdb.AdminClient(path="./seekdb")
|
||||
```
|
||||
|
||||
Parameter description:
|
||||
|
||||
| Parameter | Value Type | Required | Description | Example Value |
|
||||
| --- | --- | --- | --- | --- |
|
||||
| `path` | string | Optional | The path of the seekdb data directory. seekdb stores database files in this directory and loads them when it starts. | `./seekdb` |
|
||||
|
||||
## Connect to a remote server
|
||||
|
||||
Connect to a remote server by using `AdminClient`. This way, you can connect to a seekdb instance or an OceanBase Database instance.
|
||||
|
||||
:::tip
|
||||
|
||||
Before you connect to a remote server, make sure that you have deployed a server mode seekdb instance or an OceanBase Database instance.<br/>For information about how to deploy a server mode seekdb instance, see [Overview](../../../400.guides/400.deploy/50.deploy-overview.md).<br/>For information about how to deploy an OceanBase Database instance, see [Overview](https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000003976427).
|
||||
|
||||
:::
|
||||
|
||||
Example: Connect to a server mode seekdb instance
|
||||
|
||||
```python
|
||||
import pyseekdb
|
||||
|
||||
# Remote server mode - Database management
|
||||
admin = pyseekdb.AdminClient(
|
||||
host="127.0.0.1",
|
||||
port=2881,
|
||||
user="root",
|
||||
password="" # Can be retrieved from SEEKDB_PASSWORD environment variable
|
||||
)
|
||||
```
|
||||
|
||||
Parameter description:
|
||||
|
||||
| Parameter | Value Type | Required | Description | Example Value |
|
||||
| --- | --- | --- | --- | --- |
|
||||
| `host` | string | Yes | The IP address of the server where the instance resides. | `127.0.0.1` |
|
||||
| `port` | int | Yes | The port of the instance. The default value is 2881. | `2881` |
|
||||
| `user` | string | Yes | The username. The default value is root. | `root` |
|
||||
| `password` | string | Yes | The password corresponding to the username. If you do not specify `password` or specify an empty string, the system retrieves the password from the `SEEKDB_PASSWORD` environment variable. | |
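
The `password` row above describes a fallback to the `SEEKDB_PASSWORD` environment variable. The helper below only mimics that documented rule so that the precedence is easy to see. `resolve_password` is a hypothetical name, not a pyseekdb function, and returning an empty string when the variable is unset is an assumption.

```python
import os


def resolve_password(password=None, env=None):
    """Mimic the documented rule: explicit password first, then SEEKDB_PASSWORD.

    Hypothetical helper for illustration only; it is not part of pyseekdb.
    """
    env = os.environ if env is None else env
    if password:  # a non-empty explicit password wins
        return password
    # None or "" falls back to the environment variable.
    # Assumption: an unset variable results in an empty password.
    return env.get("SEEKDB_PASSWORD", "")


print(resolve_password("", env={"SEEKDB_PASSWORD": "from-env"}))
```

In practice you would export `SEEKDB_PASSWORD` in the shell that launches your program and leave `password` empty in code, which keeps the secret out of source files.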
|
||||
|
||||
Example: Connect to an OceanBase Database instance
|
||||
|
||||
```python
|
||||
import pyseekdb
|
||||
|
||||
# Remote server mode - Database management
|
||||
admin = pyseekdb.AdminClient(
|
||||
host="127.0.0.1",
|
||||
port=2881,
|
||||
tenant="test",
|
||||
user="root",
|
||||
password="" # Can be retrieved from SEEKDB_PASSWORD environment variable
|
||||
)
|
||||
```
|
||||
|
||||
Parameter description:
|
||||
|
||||
| Parameter | Value Type | Required | Description | Example Value |
|
||||
| --- | --- | --- | --- | --- |
|
||||
| `host` | string | Yes | The IP address of the server where the database resides. | `127.0.0.1` |
|
||||
| `port` | int | Yes | The port of the OceanBase Database instance. The default value is 2881. | `2881` |
|
||||
| `tenant` | string | No | The name of the tenant. This parameter is not required for a server mode seekdb instance, but is required for an OceanBase Database instance. The default value is sys. | `test` |
|
||||
| `user` | string | Yes | The username corresponding to the tenant. The default value is root. | `root` |
|
||||
| `password` | string | Yes | The password corresponding to the username. If you do not specify `password` or specify an empty string, the system retrieves the password from the `SEEKDB_PASSWORD` environment variable. | |
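
The two remote examples differ only in the `tenant` argument. The sketch below builds the keyword arguments for either target, so the difference is explicit. `admin_connection_kwargs` is a hypothetical helper introduced for illustration; the parameter names and defaults are taken from the tables above.

```python
def admin_connection_kwargs(host="127.0.0.1", port=2881, user="root",
                            password="", tenant=None):
    """Build AdminClient keyword arguments for a remote server.

    Hypothetical helper: pass `tenant` only when the target is an
    OceanBase Database instance, and omit it for server mode seekdb.
    """
    kwargs = {"host": host, "port": port, "user": user, "password": password}
    if tenant is not None:
        kwargs["tenant"] = tenant
    return kwargs


seekdb_kwargs = admin_connection_kwargs()
oceanbase_kwargs = admin_connection_kwargs(tenant="test")
print(seekdb_kwargs)
print(oceanbase_kwargs)
# With pyseekdb installed and a server running, you could then call:
# admin = pyseekdb.AdminClient(**oceanbase_kwargs)
```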
|
||||
|
||||
## APIs supported when you use AdminClient to connect to a database
|
||||
|
||||
The following APIs are supported when you use `AdminClient` to connect to a database.
|
||||
|
||||
| API | Description | Documentation Link |
|
||||
| --- | --- | --- |
|
||||
| `create_database` | Creates a new database. |[Documentation](110.database/200.create-database-of-api.md)|
|
||||
| `get_database` | Queries a specified database. |[Documentation](110.database/300.get-database-of-api.md)|
|
||||
| `delete_database` | Deletes a specified database. |[Documentation](110.database/500.delete-database-of-api.md)|
|
||||
| `list_databases` | Lists all databases. |[Documentation](110.database/400.list-database-of-api.md)|
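
To show how the four calls fit together, here is a minimal sketch of a create, inspect, list, and delete cycle. The function `database_lifecycle` is a hypothetical wrapper written for this page. It accepts any object that exposes the four methods in the table, and it assumes that returned database objects have a `name` attribute, as the request examples in the database topics suggest.

```python
def database_lifecycle(admin, name):
    """Create a database, confirm that it exists, list databases, then delete it.

    `admin` is any object exposing create_database, get_database,
    list_databases, and delete_database (for example, an AdminClient).
    Returns the database names seen right after creation.
    """
    admin.create_database(name)
    db = admin.get_database(name)
    if db.name != name:
        raise RuntimeError(f"unexpected database returned: {db.name}")
    names = [d.name for d in admin.list_databases()]
    admin.delete_database(name)
    return names
```

With pyseekdb installed on Linux, you could call it as `database_lifecycle(pyseekdb.AdminClient(path="./seekdb"), "my_database")`.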
|
||||
@@ -0,0 +1,16 @@
|
||||
---
|
||||
slug: /database-overview-of-api
|
||||
---
|
||||
|
||||
# Database Management
|
||||
|
||||
A database contains tables, indexes, and metadata of database objects. You can create, query, and delete databases as needed.
|
||||
|
||||
The following APIs are available for database operations.
|
||||
|
||||
| API | Description | Documentation |
|
||||
|---|---|---|
|
||||
| `create_database()` | Creates a database. | [Documentation](200.create-database-of-api.md) |
|
||||
| `get_database()` | Gets a specified database. |[Documentation](300.get-database-of-api.md)|
|
||||
| `list_databases()` | Gets the list of databases in the instance. |[Documentation](400.list-database-of-api.md)|
|
||||
| `delete_database()` | Deletes a specified database.|[Documentation](500.delete-database-of-api.md)|
|
||||
@@ -0,0 +1,76 @@
|
||||
---
|
||||
slug: /create-database-of-api
|
||||
---
|
||||
|
||||
# create_database - Create a database
|
||||
|
||||
The `create_database()` function is used to create a new database.
|
||||
|
||||
:::info
|
||||
* This interface can only be used when you are connected to the database using `AdminClient`. For more information about `AdminClient`, see [Admin Client](../100.admin-client.md).
|
||||
|
||||
* Currently, when you use `create_database` to create a database, you cannot specify the database properties. The database will be created based on the default values of the properties. If you want to create a database with specific properties, you can try to create it using SQL. For more information about how to create a database using SQL, see [Create a database](https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000003977077).
|
||||
:::
|
||||
|
||||
## Prerequisites
|
||||
|
||||
* You have installed pyseekdb. For more information about how to install pyseekdb, see [Get Started](../../10.pyseekdb-sdk/10.pyseekdb-sdk-get-started.md).
|
||||
|
||||
* You are connected to the database. For more information about how to connect to the database, see [Admin Client](../100.admin-client.md).
|
||||
|
||||
* If you are using server mode of seekdb or OceanBase Database, make sure that the connected user has the `CREATE` privilege. For more information about how to check the privileges of the current user, see [View user privileges](https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000003980135). If the user does not have this privilege, contact the administrator to grant it. For more information about how to directly grant privileges, see [Directly grant privileges](https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000003980140).
|
||||
|
||||
## Limitations
|
||||
|
||||
* In a seekdb instance or OceanBase Database, the name of each database must be globally unique.
|
||||
|
||||
* The maximum length of a database name is 128 characters.
|
||||
|
||||
* The name can contain only uppercase and lowercase letters, digits, underscores, dollar signs, and Chinese characters.
|
||||
|
||||
* Avoid using reserved keywords as database names.
|
||||
|
||||
For more information about reserved keywords, see [Reserved keywords](https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000003976774).
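
The length and character limits above can be checked on the client side before you call `create_database()`. The following is a minimal sketch, not a pyseekdb feature: `is_valid_database_name` is a hypothetical helper, and "Chinese characters" is approximated by the CJK Unified Ideographs block (U+4E00 to U+9FFF), which is an assumption. It does not check uniqueness or reserved keywords.

```python
import re

# Letters, digits, underscores, dollar signs, and Chinese characters.
# Assumption: "Chinese characters" means the CJK Unified Ideographs block.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9_$\u4e00-\u9fff]+")
MAX_DATABASE_NAME_LENGTH = 128


def is_valid_database_name(name):
    """Return True if `name` meets the documented length and character limits."""
    if not 0 < len(name) <= MAX_DATABASE_NAME_LENGTH:
        return False
    return _NAME_PATTERN.fullmatch(name) is not None


for candidate in ["my_database", "app$db_1", "my-database", ""]:
    print(repr(candidate), is_valid_database_name(candidate))
```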
|
||||
|
||||
## Recommendations
|
||||
|
||||
* We recommend that you give the database a meaningful name that reflects its purpose and content. For example, you can use `Application Identifier_Sub-application name (optional)_db` as the database name.
|
||||
|
||||
* We recommend that you create the database and related users using the root user and assign only the necessary privileges to ensure the security and controllability of the database.
|
||||
|
||||
* You can create a database with a name consisting only of digits by enclosing the name in backticks (`), but this is not recommended. This is because names consisting only of digits have no clear meaning, and queries require the use of backticks (`), which can lead to unnecessary complexity and confusion.
|
||||
|
||||
|
||||
## Request parameters
|
||||
|
||||
```python
|
||||
create_database(name, tenant=DEFAULT_TENANT)
|
||||
```
|
||||
|
||||
|Parameter|Type|Required|Description|Example value|
|
||||
|---|---|---|---|---|
|
||||
|`name`|string|Yes|The name of the database to be created. |`my_database`|
|
||||
|`tenant`|string|No<ul><li>When using embedded seekdb or server mode of seekdb, this parameter is not required.</li><li>When using OceanBase Database, this parameter is required.</li></ul>|The tenant to which the database belongs. |`test_tenant`|
|
||||
|
||||
## Request example
|
||||
|
||||
```python
|
||||
import pyseekdb
|
||||
|
||||
# Embedded mode
|
||||
admin = pyseekdb.AdminClient(path="./seekdb")
|
||||
|
||||
# Create database
|
||||
admin.create_database("my_database")
|
||||
```
|
||||
|
||||
## Response parameters
|
||||
|
||||
None
|
||||
|
||||
|
||||
## References
|
||||
|
||||
* [Get a specific database](300.get-database-of-api.md)
|
||||
* [Delete a database](500.delete-database-of-api.md)
|
||||
* [List databases](400.list-database-of-api.md)
|
||||
@@ -0,0 +1,65 @@
|
||||
---
|
||||
slug: /get-database-of-api
|
||||
---
|
||||
|
||||
# get_database - Get the specified database
|
||||
|
||||
The `get_database()` method is used to obtain the information of the specified database.
|
||||
|
||||
:::info
|
||||
|
||||
This method can be used only when you connect to the database by using the `AdminClient`. For more information about the `AdminClient`, see [Admin Client](../100.admin-client.md).
|
||||
|
||||
:::
|
||||
|
||||
## Prerequisites
|
||||
|
||||
* You have installed pyseekdb. For more information about how to install pyseekdb, see [Get Started](../../10.pyseekdb-sdk/10.pyseekdb-sdk-get-started.md).
|
||||
|
||||
* You have connected to the database. For more information about how to connect to the database, see [Admin Client](../100.admin-client.md).
|
||||
|
||||
## Request parameters
|
||||
|
||||
```python
|
||||
get_database(name, tenant=DEFAULT_TENANT)
|
||||
```
|
||||
|
||||
|Parameter|Type|Required|Description|Example value|
|
||||
|---|---|---|---|---|
|
||||
|`name`|string|Yes|The name of the database to be queried. |`my_database`|
|
||||
|`tenant`|string|No<ul><li>When you use embedded seekdb or server mode seekdb, you do not need to specify this parameter.</li><li>When you use OceanBase Database, you must specify this parameter.</li></ul>|The tenant to which the database belongs. |`test_tenant`|
|
||||
|
||||
## Request example
|
||||
|
||||
```python
|
||||
import pyseekdb
|
||||
|
||||
# Embedded mode
|
||||
admin = pyseekdb.AdminClient(path="./seekdb")
|
||||
|
||||
# Get database
|
||||
db = admin.get_database("my_database")
|
||||
print(f"Database: {db.name}, Charset: {db.charset}, collation:{db.collation}, metadata:{db.metadata}")
|
||||
```
|
||||
|
||||
## Response parameters
|
||||
|
||||
|Parameter|Type|Required|Description|Example value|
|
||||
|---|---|---|---|---|
|
||||
|`name`|string|Yes|The name of the queried database. |`my_database`|
|
||||
|`tenant`|string|No<br/>When you use embedded seekdb or server mode seekdb, this parameter is not returned. |The tenant to which the queried database belongs. |`test_tenant`|
|
||||
|`charset`|string|No|The character set used by the queried database. |`utf8mb4`|
|
||||
|`collation`|string|No|The collation used by the queried database. |`utf8mb4_general_ci`|
|
||||
|`metadata`|dict|No|Reserved field. | {} |
|
||||
|
||||
## Response example
|
||||
|
||||
```python
|
||||
Database: my_database, Charset: utf8mb4, collation:utf8mb4_general_ci, metadata:{}
|
||||
```
|
||||
|
||||
## References
|
||||
|
||||
* [Create a database](200.create-database-of-api.md)
|
||||
* [Delete a database](500.delete-database-of-api.md)
|
||||
* [Get the database list](400.list-database-of-api.md)
|
||||
@@ -0,0 +1,70 @@
|
||||
---
|
||||
slug: /list-database-of-api
|
||||
---
|
||||
|
||||
# list_databases - Get the database list
|
||||
|
||||
The `list_databases()` method is used to retrieve the database list in the instance.
|
||||
|
||||
:::info
|
||||
|
||||
This API is only available when using the `AdminClient`. For more information about the `AdminClient`, see [Admin Client](../100.admin-client.md).
|
||||
|
||||
:::
|
||||
|
||||
## Prerequisites
|
||||
|
||||
* You have installed pyseekdb. For more information about how to install pyseekdb, see [Get Started](../../10.pyseekdb-sdk/10.pyseekdb-sdk-get-started.md).
|
||||
|
||||
* You have connected to the database. For more information about how to connect to the database, see [Admin Client](../100.admin-client.md).
|
||||
|
||||
## Request parameters
|
||||
|
||||
```python
|
||||
list_databases(limit=None, offset=None, tenant=DEFAULT_TENANT)
|
||||
```
|
||||
|
||||
|Parameter|Type|Required|Description|Example value|
|
||||
|---|---|---|---|---|
|
||||
|`limit`|int|Optional|The maximum number of databases to return. |2|
|
||||
|`offset`|int|Optional|The number of databases to skip. |3|
|
||||
|`tenant`|string|Optional<ul><li>When using embedded seekdb and server mode seekdb, this parameter is not required.</li><li>When using OceanBase Database, this parameter is required. The default value is `sys`.</li></ul>|The tenant to which the queried database belongs. |test_tenant|
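
The interaction between `limit` and `offset` is easiest to see on a plain list. The sketch below assumes the usual SQL-style semantics (skip `offset` items, then return at most `limit` items). `paginate` and the sample names are hypothetical and only illustrate the parameters; they do not call seekdb.

```python
def paginate(items, limit=None, offset=None):
    """Skip `offset` items, then return at most `limit` items."""
    start = offset or 0
    end = None if limit is None else start + limit
    return items[start:end]


names = ["db0", "db1", "db2", "db3", "db4", "db5"]  # hypothetical database names
print(paginate(names, limit=2, offset=3))  # ['db3', 'db4']
print(paginate(names))                     # no limit or offset: everything
```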
|
||||
|
||||
## Request example
|
||||
|
||||
```python
|
||||
# List all databases
|
||||
import pyseekdb
|
||||
|
||||
# Embedded mode
|
||||
admin = pyseekdb.AdminClient(path="./seekdb")
|
||||
|
||||
# List up to 2 databases, skipping the first 3
|
||||
databases = admin.list_databases(limit=2, offset=3)
|
||||
for db in databases:
|
||||
    print(f"Database: {db.name}, Charset: {db.charset}, collation:{db.collation}, metadata:{db.metadata}")
|
||||
```
|
||||
|
||||
## Response parameters
|
||||
|
||||
|Parameter|Type|Required|Description|Example value|
|
||||
|---|---|---|---|---|
|
||||
|`name`|string|Yes|The name of the queried database. |`my_database`|
|
||||
|`tenant`|string|Optional<br/>When using embedded seekdb or server mode seekdb, this parameter is not returned. |The tenant to which the queried database belongs. |`test_tenant`|
|
||||
|`charset`|string|Optional|The character set of the queried database. |`utf8mb4`|
|
||||
|`collation`|string|Optional|The collation of the queried database. |`utf8mb4_general_ci`|
|
||||
|`metadata`|dict|Optional|Reserved field. No data is returned. | {} |
|
||||
|
||||
|
||||
## Response example
|
||||
|
||||
```python
|
||||
Database: test, Charset: utf8mb4, collation:utf8mb4_general_ci, metadata:{}
|
||||
Database: my_database, Charset: utf8mb4, collation:utf8mb4_general_ci, metadata:{}
|
||||
```
|
||||
|
||||
## References
|
||||
|
||||
* [Create a database](200.create-database-of-api.md)
|
||||
* [Delete a database](500.delete-database-of-api.md)
|
||||
* [Get a specific database](300.get-database-of-api.md)
|
||||
@@ -0,0 +1,54 @@
|
||||
---
|
||||
slug: /delete-database-of-api
|
||||
---
|
||||
|
||||
# delete_database - Delete a database
|
||||
|
||||
The `delete_database()` method is used to delete a database.
|
||||
|
||||
:::info
|
||||
|
||||
This method is only available when using the `AdminClient`. For more information about the `AdminClient`, see [Admin Client](../100.admin-client.md).
|
||||
|
||||
:::
|
||||
|
||||
## Prerequisites
|
||||
|
||||
* You have installed pyseekdb. For more information about how to install pyseekdb, see [Get Started](../../10.pyseekdb-sdk/10.pyseekdb-sdk-get-started.md).
|
||||
|
||||
* You have connected to the database. For more information about how to connect to the database, see [Admin Client](../100.admin-client.md).
|
||||
|
||||
* If you are using server mode of seekdb or OceanBase Database, ensure that the user has the `DROP` privilege. For more information about how to view the privileges of the current user, see [View User Privileges](https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000003980135). If the user does not have the privilege, contact the administrator to grant the privilege. For more information about how to directly grant privileges, see [Directly Grant Privileges](https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000003980140).
|
||||
|
||||
## Request parameters
|
||||
|
||||
```python
|
||||
delete_database(name, tenant=DEFAULT_TENANT)
|
||||
```
|
||||
|
||||
|Parameter|Type|Required|Description|Example Value|
|
||||
|---|---|---|---|---|
|
||||
|`name`|string|Yes|The name of the database to be deleted. |my_database|
|
||||
|`tenant`|string|No<ul><li>If you are using embedded seekdb or server mode of seekdb, you do not need to specify this parameter.</li><li>If you are using OceanBase Database, this parameter is required. The default value is `sys`.</li></ul>|The tenant to which the database belongs. |test_tenant|
|
||||
|
||||
## Request example
|
||||
|
||||
```python
|
||||
import pyseekdb
|
||||
|
||||
# Embedded mode
|
||||
admin = pyseekdb.AdminClient(path="./seekdb")
|
||||
|
||||
# Delete database
|
||||
admin.delete_database("my_database")
|
||||
```
|
||||
|
||||
## Response parameters
|
||||
|
||||
None
|
||||
|
||||
## References
|
||||
|
||||
* [Create a database](200.create-database-of-api.md)
|
||||
* [Get a specific database](300.get-database-of-api.md)
|
||||
* [Obtain a database list](400.list-database-of-api.md)
|
||||
@@ -0,0 +1,93 @@
|
||||
---
|
||||
slug: /create-collection-of-api
|
||||
---
|
||||
|
||||
# create_collection - Create a collection
|
||||
|
||||
`create_collection()` is used to create a new collection, which is a table in the database.
|
||||
|
||||
:::info
|
||||
|
||||
This API is only available when you are connected to the database using a client. For more information about the client, see [Client](../50.client.md).
|
||||
|
||||
:::
|
||||
|
||||
## Prerequisites
|
||||
|
||||
* You have installed pyseekdb. For more information about how to install pyseekdb, see [Get Started](../../10.pyseekdb-sdk/10.pyseekdb-sdk-get-started.md).
|
||||
|
||||
* You are connected to the database. For more information about how to connect to the database, see [Client](../50.client.md).
|
||||
|
||||
* If you are using seekdb in server mode or OceanBase Database, make sure that the user has the `CREATE` privilege. For more information about how to view the privileges of the current user, see [View user privileges](https://en.oceanbase.com/docs/common-oceanbase-database-10000000001971368). If the user does not have the privilege, contact the administrator to grant it. For more information about how to directly grant privileges, see [Directly grant privileges](https://en.oceanbase.com/docs/common-oceanbase-database-10000000001974754).
|
||||
|
||||
## Define the table name
|
||||
|
||||
When creating a table, you must first define its name. The following requirements apply when defining the table name:
|
||||
|
||||
* In seekdb, each table name must be unique within the database.
|
||||
|
||||
* The table name cannot exceed 64 characters.
|
||||
|
||||
* We recommend that you give the table a meaningful name instead of using generic names such as t1 or table1. For more information about table naming conventions, see [Table naming conventions](https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000003977289).
|
||||
|
||||
|
||||
## Request parameters
|
||||
|
||||
```python
|
||||
create_collection(name=name, configuration=configuration, embedding_function=embedding_function)
|
||||
```
|
||||
|
||||
|Parameter|Type|Required|Description|Example value|
|
||||
|---|---|---|---|---|
|
||||
|`name`|string|Yes|The name of the collection to be created. |my_collection|
|
||||
|`configuration`|HNSWConfiguration|No|The index configuration, which specifies the dimension and distance metric. If not provided, the default values `dimension=384` and `distance='cosine'` are used. If set to `None`, the dimension is calculated from the `embedding_function` value. |HNSWConfiguration(dimension=384, distance='cosine')|
|
||||
|`embedding_function`|EmbeddingFunction|No|The function that converts data into vectors. If not provided, `DefaultEmbeddingFunction()` (384 dimensions) is used. If set to `None`, the collection has no embedding function and vectors must be provided manually. If provided together with `configuration`, its output dimension must match `configuration.dimension`.|DefaultEmbeddingFunction()|
|
||||
|
||||
:::info
|
||||
|
||||
When you provide `embedding_function`, the system will automatically calculate the vector dimension by calling this function. If you also provide `configuration.dimension`, it must match the dimension of `embedding_function`. Otherwise, a ValueError will be raised.
|
||||
|
||||
:::
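
The dimension rule in the note above can be illustrated with a small stand-alone sketch. This is not pyseekdb's implementation: `resolve_dimension` and `fake_embed` are hypothetical names, and the behavior when neither an embedding function nor a dimension is given is an assumption made for the example.

```python
from typing import Callable, List, Optional

Vector = List[float]


def resolve_dimension(
    embedding_function: Optional[Callable[[List[str]], List[Vector]]],
    configured_dimension: Optional[int],
) -> int:
    """Illustrative only: mirror the dimension rule described in the note."""
    if embedding_function is None:
        if configured_dimension is None:
            # Assumption for this sketch: some dimension must be known.
            raise ValueError("a dimension is required when no embedding function is set")
        return configured_dimension
    # Call the function once to learn its output dimension.
    actual = len(embedding_function(["probe"])[0])
    if configured_dimension is not None and configured_dimension != actual:
        raise ValueError(
            f"configuration.dimension={configured_dimension} does not match "
            f"the embedding function dimension {actual}"
        )
    return actual


def fake_embed(texts: List[str]) -> List[Vector]:
    # Stand-in embedding function that always yields 3-dimensional vectors.
    return [[0.0, 0.0, 0.0] for _ in texts]


print(resolve_dimension(fake_embed, None))
print(resolve_dimension(None, 384))
```

Calling `resolve_dimension(fake_embed, 384)` in this sketch raises `ValueError`, which matches the mismatch case described in the note.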
|
||||
|
||||
## Request example
|
||||
|
||||
```python
|
||||
import pyseekdb
|
||||
from pyseekdb import DefaultEmbeddingFunction, HNSWConfiguration
|
||||
|
||||
# Create a client
|
||||
client = pyseekdb.Client()
|
||||
|
||||
# Create a collection with default embedding function (auto-calculates dimension)
|
||||
collection = client.create_collection(
|
||||
name="my_collection"
|
||||
)
|
||||
|
||||
# Create a collection with custom embedding function
|
||||
ef = UserDefinedEmbeddingFunction()  # Placeholder: define your own embedding function (see section 6)
|
||||
config = HNSWConfiguration(dimension=384, distance='cosine') # Must match EF dimension
|
||||
collection = client.create_collection(
|
||||
name="my_collection2",
|
||||
configuration=config,
|
||||
embedding_function=ef
|
||||
)
|
||||
|
||||
# Create a collection without embedding function (vectors must be provided manually)
|
||||
collection = client.create_collection(
|
||||
name="my_collection3",
|
||||
configuration=HNSWConfiguration(dimension=384, distance='cosine'),
|
||||
embedding_function=None # Explicitly disable embedding function
|
||||
)
|
||||
```
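
The request example uses a `UserDefinedEmbeddingFunction` placeholder without defining it. The sketch below shows one possible shape for such a class, under the assumption that an embedding function is a callable object that maps one or more texts to a list of fixed-length vectors. The class name `HashEmbeddingFunction` is hypothetical, and this page does not confirm the exact interface pyseekdb expects.

```python
import hashlib
from typing import List, Union


class HashEmbeddingFunction:
    """Toy embedding function: deterministic, but carries no semantic meaning."""

    def __init__(self, dimension: int = 8) -> None:
        # A SHA-256 digest has 32 bytes, which bounds the dimension here.
        if not 1 <= dimension <= 32:
            raise ValueError("dimension must be between 1 and 32")
        self.dimension = dimension

    def __call__(self, input: Union[str, List[str]]) -> List[List[float]]:
        texts = [input] if isinstance(input, str) else list(input)
        vectors = []
        for text in texts:
            digest = hashlib.sha256(text.encode("utf-8")).digest()
            # Map the first `dimension` bytes to floats in [0, 1].
            vectors.append([digest[i] / 255.0 for i in range(self.dimension)])
        return vectors


ef = HashEmbeddingFunction(dimension=8)
vectors = ef(["hello", "world"])
print(len(vectors), len(vectors[0]))
```

A real embedding function would call a model instead of a hash. The point of the sketch is only that the output dimension is fixed and must agree with `configuration.dimension`.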
|
||||
|
||||
## Response parameters
|
||||
|
||||
None
|
||||
|
||||
## References
|
||||
|
||||
* [Query a collection](200.get-collection-of-api.md)
|
||||
* [Create or query a collection](250.get-or-create-collection-of-api.md)
|
||||
* [Get a collection list](300.list-collection-of-api.md)
|
||||
* [Count the number of collections](350.count-collection-of-api.md)
|
||||
* [Delete a collection](400.delete-collection-of-api.md)
|
||||
@@ -0,0 +1,89 @@
|
||||
---
|
||||
slug: /get-collection-of-api
|
||||
---
|
||||
|
||||
# get_collection - Get a collection
|
||||
|
||||
The `get_collection()` function is used to retrieve a specified collection.
|
||||
|
||||
:::info
|
||||
|
||||
This API is only available when connected using a Client. For more information about the Client, see [Client](../50.client.md).
|
||||
|
||||
:::
|
||||
|
||||
## Prerequisites
|
||||
|
||||
* You have installed pyseekdb. For more information about how to install pyseekdb, see [Quick Start](../../10.pyseekdb-sdk/10.pyseekdb-sdk-get-started.md).
|
||||
|
||||
* You have connected to the database. For more information about how to connect, see [Client](../50.client.md).
|
||||
|
||||
* The collection you want to retrieve exists. If the collection does not exist, an error will be returned.
|
||||
|
||||
## Request parameters
|
||||
|
||||
```python
|
||||
client.get_collection(name, configuration=configuration, embedding_function=embedding_function)
|
||||
```
|
||||
|
||||
|Parameter|Type|Required|Description|Example value|
|
||||
|---|---|---|---|---|
|
||||
|`name`|string|Yes|The name of the collection to retrieve. |my_collection|
|
||||
|`configuration`|HNSWConfiguration|No|The index configuration, which specifies the dimension and distance metric. If not provided, the default value `dimension=384, distance='cosine'` will be used. If set to `None`, the dimension will be calculated from the `embedding_function` value. |HNSWConfiguration(dimension=384, distance='cosine')|
|
||||
|`embedding_function`|EmbeddingFunction|No|The function used to convert text to vectors. If not provided, `DefaultEmbeddingFunction()` (384 dimensions) is used. If set to `None`, the collection has no embedding function. If provided together with `configuration`, its output dimension must match `configuration.dimension`.|DefaultEmbeddingFunction()|
|
||||
|
||||
:::info
|
||||
|
||||
When vectors are not provided for documents/texts, the embedding function set here will be used for all operations on this collection, including add, upsert, update, query, and hybrid_search.
|
||||
|
||||
:::
|
||||
|
||||
## Request example
|
||||
|
||||
```python
|
||||
import pyseekdb
|
||||
|
||||
# Create a client
|
||||
client = pyseekdb.Client()
|
||||
|
||||
# Get an existing collection (uses default embedding function if collection doesn't have one)
|
||||
collection = client.get_collection("my_collection")
|
||||
print(f"Database: {collection.name}, dimension: {collection.dimension}, embedding_function:{collection.embedding_function}, distance:{collection.distance}, metadata:{collection.metadata}")
|
||||
|
||||
# Get collection with specific embedding function
|
||||
ef = UserDefinedEmbeddingFunction()  # Placeholder: define your own embedding function (see section 6)
|
||||
collection = client.get_collection("my_collection", embedding_function=ef)
|
||||
print(f"Database: {collection.name}, dimension: {collection.dimension}, embedding_function:{collection.embedding_function}, distance:{collection.distance}, metadata:{collection.metadata}")
|
||||
|
||||
# Get collection without embedding function
|
||||
collection = client.get_collection("my_collection", embedding_function=None)
|
||||
# Check if collection exists
|
||||
if client.has_collection("my_collection"):
    collection = client.get_collection("my_collection")
    print(f"Database: {collection.name}, dimension: {collection.dimension}, embedding_function:{collection.embedding_function}, distance:{collection.distance}, metadata:{collection.metadata}")
|
||||
```
|
||||
|
||||
## Response parameters
|
||||
|
||||
|Parameter|Type|Required|Description|Example value|
|
||||
|---|---|---|---|---|
|
||||
|`name`|string|Yes|The name of the collection to query. |my_collection|
|
||||
|`dimension`|int|No|The vector dimension of the collection. |384|
|
||||
|`embedding_function`|EmbeddingFunction|No|The embedding function associated with the collection. |DefaultEmbeddingFunction(model_name='all-MiniLM-L6-v2')|
|
||||
|`distance`|string|No|The distance metric of the collection. |cosine|
|
||||
|`metadata`|dict|No|Reserved field, currently no data| {} |
|
||||
|
||||
## Response example
|
||||
|
||||
```python
|
||||
Database: my_collection, dimension: 384, embedding_function:DefaultEmbeddingFunction(model_name='all-MiniLM-L6-v2'), distance:cosine, metadata:{}
|
||||
Database: my_collection1, dimension: 384, embedding_function:DefaultEmbeddingFunction(model_name='all-MiniLM-L6-v2'), distance:cosine, metadata:{}
|
||||
```
|
||||
|
||||
## References
|
||||
|
||||
* [Create a collection](100.create-collection-of-api.md)
|
||||
* [Create or query a collection](250.get-or-create-collection-of-api.md)
|
||||
* [Get a list of collections](300.list-collection-of-api.md)
|
||||
* [Count the number of collections](350.count-collection-of-api.md)
|
||||
* [Delete a collection](400.delete-collection-of-api.md)
|
||||
@@ -0,0 +1,79 @@
|
||||
---
|
||||
slug: /get-or-create-collection-of-api
|
||||
---
|
||||
|
||||
# get_or_create_collection - Create or query a collection
|
||||
|
||||
The `get_or_create_collection()` function gets a collection by name and creates it first if necessary. If the collection does not exist in the database, it is created. If it already exists, the existing collection is returned.
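
The get-or-create behavior can be pictured with a small in-memory model. This is a sketch only: `ToyCatalog` is a hypothetical stand-in for a database, not the pyseekdb client, and returning the stored collection unchanged when the settings differ is an assumption of the model.

```python
from typing import Dict


class ToyCatalog:
    """In-memory stand-in for a database; not the pyseekdb client."""

    def __init__(self) -> None:
        self._collections: Dict[str, dict] = {}

    def get_or_create(self, name: str, dimension: int = 384) -> dict:
        if name not in self._collections:
            # First call: the collection is created with the given settings.
            self._collections[name] = {"name": name, "dimension": dimension}
        # Later calls: the stored collection is returned as it is.
        return self._collections[name]


catalog = ToyCatalog()
first = catalog.get_or_create("my_collection4", dimension=384)
second = catalog.get_or_create("my_collection4", dimension=384)
print(first is second)
```

Because the call is safe to repeat, it suits setup code that may run more than once against the same database.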
|
||||
|
||||
:::info
|
||||
|
||||
This API is only available when using a client. For more information about the client, see [Client](../50.client.md).
|
||||
|
||||
:::
|
||||
|
||||
## Prerequisites
|
||||
|
||||
* You have installed pyseekdb. For more information about how to install pyseekdb, see [Quick Start](../../10.pyseekdb-sdk/10.pyseekdb-sdk-get-started.md).
|
||||
|
||||
* You have connected to the database. For more information about how to connect, see [Client](../50.client.md).
|
||||
|
||||
* If you are using seekdb in server mode or OceanBase Database, ensure that the connected user has the `CREATE` privilege. For more information about how to check the privileges of the current user, see [Check User Privileges](https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000003980135). If the user does not have this privilege, contact the administrator to grant it. For more information about how to directly grant privileges, see [Directly Grant Privileges](https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000003980140).
|
||||
|
||||
## Define a table name
|
||||
|
||||
When creating a table, you need to define a table name. The following requirements must be met:
|
||||
|
||||
* In seekdb, each table name must be unique within the database.
|
||||
|
||||
* The table name must be no longer than 64 characters.
|
||||
|
||||
* It is recommended to use meaningful names for tables instead of generic names like t1 or table1. For more information about table naming conventions, see [Table Naming Conventions](https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000003977289).
|
||||
|
||||
|
||||
## Request parameters
|
||||
|
||||
```python
|
||||
get_or_create_collection(name=name, configuration=configuration, embedding_function=embedding_function)
|
||||
```
|
||||
|
||||
|Parameter|Value Type|Required|Description|Example Value|
|
||||
|---|---|---|---|---|
|
||||
|`name`|string|Yes|The name of the collection to get or create. |my_collection|
|
||||
|`configuration`|HNSWConfiguration|No|The index configuration with dimension and distance metric. If not provided, the default value is used, which is `dimension=384, distance='cosine'`. If set to `None`, the dimension will be calculated from the `embedding_function` value. |HNSWConfiguration(dimension=384, distance='cosine')|
|
||||
|`embedding_function`|EmbeddingFunction|No|The function that converts data into vectors. If not provided, `DefaultEmbeddingFunction()` (384 dimensions) is used. If set to `None`, the collection has no embedding function. If provided together with `configuration`, its output dimension must match `configuration.dimension`. |DefaultEmbeddingFunction()|
|
||||
|
||||
:::info
|
||||
|
||||
When `embedding_function` is provided, the system will automatically calculate the vector dimension by calling the function. If `configuration.dimension` is also provided, it must match the dimension of `embedding_function`, otherwise a ValueError will be raised.
|
||||
|
||||
:::
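
The `distance='cosine'` setting in the configuration refers to cosine distance between vectors. The sketch below uses the usual definition, 1 minus the cosine similarity. Whether seekdb reports exactly this value is an assumption, and `cosine_distance` is a hypothetical helper written for illustration.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cos(angle between a and b)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


print(cosine_distance([1.0, 0.0], [0.0, 1.0]))
```

Under this definition, vectors that point in the same direction have distance 0, orthogonal vectors have distance 1, and opposite vectors have distance 2. The length check also shows why every vector in a collection must share the configured dimension.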
|
||||
|
||||
## Request example
|
||||
|
||||
```python
|
||||
import pyseekdb
|
||||
from pyseekdb import DefaultEmbeddingFunction, HNSWConfiguration
|
||||
|
||||
# Create a client
|
||||
client = pyseekdb.Client()
|
||||
|
||||
# Get or create collection (creates if doesn't exist)
|
||||
collection = client.get_or_create_collection(
|
||||
name="my_collection4",
|
||||
configuration=HNSWConfiguration(dimension=384, distance='cosine'),
|
||||
embedding_function=DefaultEmbeddingFunction()
|
||||
)
|
||||
```
|
||||
|
||||
## Response parameters
|
||||
|
||||
None
|
||||
|
||||
## References
|
||||
|
||||
* [Create a collection](100.create-collection-of-api.md)
|
||||
* [Query a collection](200.get-collection-of-api.md)
|
||||
* [Get a list of collections](300.list-collection-of-api.md)
|
||||
* [Count collections](350.count-collection-of-api.md)
|
||||
* [Delete a collection](400.delete-collection-of-api.md)
|
||||
@@ -0,0 +1,65 @@
|
||||
---
|
||||
slug: /list-collection-of-api
|
||||
---
|
||||
|
||||
|
||||
# list_collections - Get a list of collections
|
||||
|
||||
The `list_collections()` API is used to obtain all collections.
|
||||
|
||||
:::info
|
||||
|
||||
This API is supported only when you use a Client. For more information about the Client, see [Client](../50.client.md).
|
||||
|
||||
:::
|
||||
|
||||
## Prerequisites
|
||||
|
||||
* You have installed pyseekdb. For more information about how to install pyseekdb, see [Quick Start](../../10.pyseekdb-sdk/10.pyseekdb-sdk-get-started.md).
|
||||
|
||||
* You have connected to the database. For more information about how to connect to the database, see [Client](../50.client.md).
|
||||
|
||||
## Request parameters
|
||||
|
||||
```python
|
||||
client.list_collections()
|
||||
```
|
||||
|
||||
## Request example
|
||||
|
||||
```python
|
||||
import pyseekdb
|
||||
|
||||
# Create a client
|
||||
client = pyseekdb.Client()
|
||||
|
||||
# List all collections
|
||||
collections = client.list_collections()
|
||||
for coll in collections:
    print(f"Collection: {coll.name}, Dimension: {coll.dimension}, embedding_function: {coll.embedding_function}, distance: {coll.distance}, metadata: {coll.metadata}")
|
||||
```
|
||||
|
||||
## Response parameters
|
||||
|
||||
|Parameter|Type|Required|Description|Example value|
|
||||
|---|---|---|---|---|
|
||||
|`name`|string|Yes|The name of the queried collection. |my_collection|
|
||||
|`dimension`|int|No|The vector dimension of the collection. | 384 |
|
||||
|`embedding_function`|EmbeddingFunction|No|The embedding function associated with the collection. |DefaultEmbeddingFunction(model_name='all-MiniLM-L6-v2')|
|
||||
|`distance`|string|No|The distance metric of the collection. |cosine|
|
||||
|`metadata`|dict|No|Reserved field. No data is returned. | {} |
|
||||
|
||||
## Response example
|
||||
|
||||
```python
|
||||
Collection: my_collection, Dimension: 384, embedding_function: DefaultEmbeddingFunction(model_name='all-MiniLM-L6-v2'), distance: cosine, metadata: {}
|
||||
```
|
||||
|
||||
## References
|
||||
|
||||
* [Create a collection](100.create-collection-of-api.md)
|
||||
* [Query a collection](200.get-collection-of-api.md)
|
||||
* [Create or query a collection](250.get-or-create-collection-of-api.md)
|
||||
* [Count collections](350.count-collection-of-api.md)
|
||||
* [Delete a collection](400.delete-collection-of-api.md)
|
||||
@@ -0,0 +1,56 @@
|
||||
---
|
||||
slug: /count-collection-of-api
|
||||
---
|
||||
|
||||
# count_collection - Count the number of collections
|
||||
|
||||
The `count_collection()` method is used to count the number of collections in the database.
|
||||
|
||||
:::info
|
||||
|
||||
This API is only available when you are connected to the database using a Client. For more information about the Client, see [Client](../50.client.md).
|
||||
|
||||
:::
|
||||
|
||||
## Prerequisites
|
||||
|
||||
* You have installed pyseekdb. For more information about how to install pyseekdb, see [Quick Start](../../10.pyseekdb-sdk/10.pyseekdb-sdk-get-started.md).
|
||||
|
||||
* You are connected to the database. For more information about how to connect to the database, see [Client](../50.client.md).
|
||||
|
||||
## Request parameters
|
||||
|
||||
```python
|
||||
client.count_collection()
|
||||
```
|
||||
|
||||
## Request example
|
||||
|
||||
```python
|
||||
import pyseekdb
|
||||
|
||||
# Create a client
|
||||
client = pyseekdb.Client()
|
||||
|
||||
# Count collections in database
|
||||
collection_count = client.count_collection()
|
||||
print(f"Database has {collection_count} collections")
|
||||
```
|
||||
|
||||
## Return parameters
|
||||
|
||||
The number of collections in the database, returned as an integer.
|
||||
|
||||
## Return example
|
||||
|
||||
```python
|
||||
Database has 1 collections
|
||||
```
|
||||
|
||||
## Related operations
|
||||
|
||||
* [Create a collection](100.create-collection-of-api.md)
|
||||
* [Query a collection](200.get-collection-of-api.md)
|
||||
* [Create or query a collection](250.get-or-create-collection-of-api.md)
|
||||
* [Get a collection list](300.list-collection-of-api.md)
|
||||
* [Delete a collection](400.delete-collection-of-api.md)
|
||||
@@ -0,0 +1,55 @@
|
||||
---
|
||||
slug: /delete-collection-of-api
|
||||
---
|
||||
|
||||
# delete_collection - Delete a Collection
|
||||
|
||||
The `delete_collection()` method is used to delete a specified Collection.
|
||||
|
||||
:::info
|
||||
|
||||
This API is only available when you are connected to the database using a client. For more information about the client, see [Client](../50.client.md).
|
||||
|
||||
:::
|
||||
|
||||
## Prerequisites
|
||||
|
||||
* You have installed pyseekdb. For more information about how to install pyseekdb, see [Get Started](../../10.pyseekdb-sdk/10.pyseekdb-sdk-get-started.md).
|
||||
|
||||
* You are connected to the database. For more information about how to connect to the database, see [Client](../50.client.md).
|
||||
|
||||
* The Collection you want to delete exists. If the Collection does not exist, an error will be returned.
|
||||
|
||||
## Request parameters
|
||||
|
||||
```python
|
||||
client.delete_collection(name)
|
||||
```
|
||||
|
||||
|Parameter|Type|Required|Description|Example value|
|
||||
|---|---|---|---|---|
|
||||
|`name`|string|Yes|The name of the Collection to be deleted. |my_collection|
|
||||
|
||||
## Request example
|
||||
|
||||
```python
|
||||
import pyseekdb
|
||||
|
||||
# Create a client
|
||||
client = pyseekdb.Client()
|
||||
|
||||
# Delete a collection
|
||||
client.delete_collection("my_collection")
|
||||
```
|
||||
|
||||
## Response parameters
|
||||
|
||||
None
|
||||
|
||||
## References
|
||||
|
||||
* [Create a collection](100.create-collection-of-api.md)
|
||||
* [Query a collection](200.get-collection-of-api.md)
|
||||
* [Create or query a collection](250.get-or-create-collection-of-api.md)
|
||||
* [Get a collection list](300.list-collection-of-api.md)
|
||||
* [Count the number of collections](350.count-collection-of-api.md)
|
||||
@@ -0,0 +1,18 @@
|
||||
---
|
||||
slug: /collection-overview-of-api
|
||||
---
|
||||
|
||||
# Manage collections
|
||||
|
||||
In pyseekdb, a collection is similar to a table in a database. You can create, query, list, count, and delete collections.
|
||||
|
||||
The following API interfaces are supported for managing collections.
|
||||
|
||||
| API interface | Description | Documentation |
|
||||
|---|---|---|
|
||||
| `create_collection()` | Creates a collection. | [Documentation](100.create-collection-of-api.md) |
|
||||
| `get_collection()` | Gets a specified collection. |[Documentation](200.get-collection-of-api.md)|
|
||||
| `get_or_create_collection()` | Creates or queries a collection. If the collection does not exist in the database, it is created. If the collection exists, the corresponding result is obtained. |[Documentation](250.get-or-create-collection-of-api.md)|
|
||||
| `list_collections()` | Gets the collection list of a database. |[Documentation](300.list-collection-of-api.md)|
|
||||
| `count_collection()` | Counts the number of collections in a database. |[Documentation](350.count-collection-of-api.md)|
|
||||
| `delete_collection()` | Deletes a specified collection.|[Documentation](400.delete-collection-of-api.md)|
|
||||
@@ -0,0 +1,16 @@
|
||||
---
|
||||
slug: /dml-overview-of-api
|
||||
---
|
||||
|
||||
# DML operations
|
||||
|
||||
DML (Data Manipulation Language) operations allow you to insert, update, and delete data in a collection.
|
||||
|
||||
For DML operations, you can use the following APIs.
|
||||
|
||||
| API | Description | Documentation |
|
||||
|---|---|---|
|
||||
| `add()` | Inserts a new record into a collection. | [Documentation](200.add-data-of-api.md) |
|
||||
| `update()` | Updates an existing record in a collection. |[Documentation](300.update-data-of-api.md)|
|
||||
| `upsert()` | Inserts a new record or updates an existing record. |[Documentation](400.upsert-data-of-api.md)|
|
||||
| `delete()` | Deletes a record from a collection.|[Documentation](500.delete-data-of-api.md)|
|
||||
@@ -0,0 +1,117 @@
|
||||
---
|
||||
slug: /add-data-of-api
|
||||
---
|
||||
|
||||
# add - Insert data
|
||||
|
||||
The `add()` method inserts new data into a collection. If a record with the same ID already exists, an error is returned.
|
||||
|
||||
:::info
|
||||
|
||||
This API is only available when using a Client. For more information about the Client, see [Client](../50.client.md).
|
||||
|
||||
:::
|
||||
|
||||
## Prerequisites
|
||||
|
||||
* You have installed pyseekdb. For more information about how to install pyseekdb, see [Quick Start](../../10.pyseekdb-sdk/10.pyseekdb-sdk-get-started.md).
|
||||
|
||||
* You have connected to the database. For more information about how to connect to the database, see [Client](../50.client.md).
|
||||
|
||||
* If you are using seekdb in server mode or OceanBase Database, make sure that the connected user has the `INSERT` privilege on the target table. For more information about how to view the privileges of the current user, see [View user privileges](https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000003980135). If the user does not have the privilege, contact the administrator to grant it. For more information about how to directly grant privileges, see [Directly grant privileges](https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000003980140).
|
||||
|
||||
## Request parameters
|
||||
|
||||
```python
|
||||
add(
|
||||
ids=ids,
|
||||
embeddings=embeddings,
|
||||
documents=documents,
|
||||
metadatas=metadatas
|
||||
)
|
||||
```
|
||||
|
||||
|Parameter|Type|Required|Description|Example value|
|
||||
|---|---|---|---|---|
|
||||
|`ids`|string or List[str]|Yes|The ID of the data to be inserted. You can specify a single ID or an array of IDs.|item1|
|
||||
|`embeddings`|List[float] or List[List[float]]|No|The vector or vectors of the data to be inserted. If you specify this parameter, the value of `embedding_function` is ignored. If you do not specify this parameter, you must specify `documents`, and the `collection` must have an `embedding_function`.|[0.1, 0.2, 0.3]|
|
||||
|`documents`|string or List[str]|No|The document or documents to be inserted. If you do not specify `embeddings`, `documents` will be converted to vectors using the `embedding_function` of the `collection`.|"This is a document"|
|
||||
|`metadatas`|dict or List[dict]|No|The metadata or metadata list of the data to be inserted. |`{"category": "AI", "score": 95}`|
|
||||
|
||||
:::info
|
||||
|
||||
The `embedding_function` associated with the collection is set during `create_collection()` or `get_collection()`. You cannot override it for each operation.
|
||||
|
||||
:::
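
Each parameter of `add()` accepts either a single value or a list. The sketch below shows how such inputs could be normalized to lists before a batch insert. `normalize_add_arguments` is a hypothetical helper, not a pyseekdb function, and the rule that every list must have the same length as `ids` is an assumption made for this example.

```python
def normalize_add_arguments(ids, embeddings=None, documents=None, metadatas=None):
    """Hypothetical helper: coerce single values to lists and check lengths."""
    id_list = [ids] if isinstance(ids, str) else list(ids)

    def as_list(value, is_single):
        if value is None:
            return None
        return [value] if is_single(value) else list(value)

    # A single vector is a flat list of numbers; a batch is a list of such lists.
    emb_list = as_list(embeddings, lambda v: len(v) > 0 and isinstance(v[0], (int, float)))
    doc_list = as_list(documents, lambda v: isinstance(v, str))
    meta_list = as_list(metadatas, lambda v: isinstance(v, dict))

    # The parameter table requires documents when embeddings are omitted.
    if emb_list is None and doc_list is None:
        raise ValueError("provide embeddings, documents, or both")
    for label, values in (("embeddings", emb_list), ("documents", doc_list), ("metadatas", meta_list)):
        if values is not None and len(values) != len(id_list):
            raise ValueError(f"{label} has {len(values)} entries but ids has {len(id_list)}")
    return id_list, emb_list, doc_list, meta_list


print(normalize_add_arguments("item1", [0.1, 0.2, 0.3], "This is a document", {"category": "AI"}))
```

Normalizing first means the single-item and multi-item forms can share one code path.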
|
||||
|
||||
## Request example
|
||||
|
||||
```python
|
||||
import pyseekdb
|
||||
from pyseekdb import DefaultEmbeddingFunction, HNSWConfiguration
|
||||
|
||||
# Create a client
|
||||
client = pyseekdb.Client()
|
||||
|
||||
collection = client.create_collection(
|
||||
name="my_collection",
|
||||
configuration=HNSWConfiguration(dimension=3, distance='cosine'),
|
||||
embedding_function=None
|
||||
)
|
||||
|
||||
# Add single item
|
||||
collection.add(
|
||||
ids="item1",
|
||||
embeddings=[0.1, 0.2, 0.3],
|
||||
documents="This is a document",
|
||||
metadatas={"category": "AI", "score": 95}
|
||||
)
|
||||
|
||||
# Add multiple items
|
||||
collection.add(
|
||||
ids=["item4", "item2", "item3"],
|
||||
embeddings=[
|
||||
[0.1, 0.2, 0.4],
|
||||
[0.4, 0.5, 0.6],
|
||||
[0.7, 0.8, 0.9]
|
||||
],
|
||||
documents=[
|
||||
"Document 1",
|
||||
"Document 2",
|
||||
"Document 3"
|
||||
],
|
||||
metadatas=[
|
||||
{"category": "AI", "score": 95},
|
||||
{"category": "ML", "score": 88},
|
||||
{"category": "DL", "score": 92}
|
||||
]
|
||||
)
|
||||
|
||||
# Add with only embeddings
|
||||
collection.add(
|
||||
ids=["vec1", "vec2"],
|
||||
embeddings=[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
|
||||
)
|
||||
|
||||
collection1 = client.create_collection(
|
||||
name="my_collection1"
|
||||
)
|
||||
|
||||
# Add with only documents - embeddings auto-generated by embedding_function
|
||||
# Requires: collection must have embedding_function set
|
||||
collection1.add(
|
||||
ids=["doc1", "doc2"],
|
||||
documents=["Text document 1", "Text document 2"],
|
||||
metadatas=[{"tag": "A"}, {"tag": "B"}]
|
||||
)
|
||||
```
|
||||
|
||||
## Response parameters
|
||||
|
||||
None
|
||||
|
||||
## References
|
||||
|
||||
* [Update data](300.update-data-of-api.md)
|
||||
* [Update or insert data](400.upsert-data-of-api.md)
|
||||
* [Delete data](500.delete-data-of-api.md)
|
||||
@@ -0,0 +1,88 @@
|
||||
---
|
||||
slug: /update-data-of-api
|
||||
---
|
||||
|
||||
# update - Update data
|
||||
|
||||
The `update()` method is used to update existing records in a collection. The record must exist, otherwise an error will be raised.
|
||||
|
||||
:::info
|
||||
|
||||
This API is only available when using a Client. For more information about the Client, see [Client](../50.client.md).
|
||||
|
||||
:::
|
||||
|
||||
## Prerequisites
|
||||
|
||||
* You have installed pyseekdb. For more information about how to install pyseekdb, see [Get Started](../../10.pyseekdb-sdk/10.pyseekdb-sdk-get-started.md).
|
||||
|
||||
* You have connected to the database. For more information about how to connect, see [Client](../50.client.md).
|
||||
|
||||
* If you are using seekdb in server mode or OceanBase Database, make sure that the connected user has the `UPDATE` privilege on the target table. For more information about how to view the privileges of the current user, see [View User Privileges](https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000003980135). If the user does not have this privilege, contact the administrator to grant it. For more information about how to directly grant privileges, see [Directly Grant Privileges](https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000003980140).
|
||||
|
||||
## Request parameters
|
||||
|
||||
```python
|
||||
update(
|
||||
ids=ids,
|
||||
embeddings=embeddings,
|
||||
documents=documents,
|
||||
metadatas=metadatas
|
||||
)
|
||||
```
|
||||
|
||||
|Parameter|Type|Required|Description|Example value|
|
||||
|---|---|---|---|---|
|
||||
|`ids`|string or List[str]|Yes|The ID to be modified. It can be a single ID or an array of IDs.|item1|
|
||||
|`embeddings`|List[float] or List[List[float]]|No|The new vectors. If provided, they will be used directly (ignoring `embedding_function`). If not provided, you can provide `documents` to automatically generate vectors.|[[0.9, 0.8, 0.7], [0.6, 0.5, 0.4]]|
|
||||
|`documents`|string or List[str]|No|The new documents. If `embeddings` are not provided, `documents` will be converted to vectors using the collection's `embedding_function`.|"New document text"|
|
||||
|`metadatas`|dict or List[dict]|No|The new metadata.|`{"category": "AI"}`|
|
||||
|
||||
:::info
|
||||
|
||||
You can update `metadatas` alone, without providing `embeddings` or `documents`. When vectors are generated from `documents`, the `embedding_function` associated with the collection is used.
|
||||
|
||||
:::
|
||||
|
||||
## Request example
|
||||
|
||||
```python
|
||||
import pyseekdb
|
||||
|
||||
# Create a client
|
||||
client = pyseekdb.Client()
|
||||
|
||||
collection = client.get_collection("my_collection")
|
||||
collection1 = client.get_collection("my_collection1")
|
||||
|
||||
# Update single item
|
||||
collection.update(
|
||||
ids="item1",
|
||||
metadatas={"category": "AI", "score": 98} # Update metadata only
|
||||
)
|
||||
|
||||
# Update multiple items
|
||||
collection.update(
|
||||
ids=["item1", "item2"],
|
||||
embeddings=[[0.9, 0.8, 0.7], [0.6, 0.5, 0.4]], # Update embeddings
|
||||
documents=["Updated document 1", "Updated document 2"] # Update documents
|
||||
)
|
||||
|
||||
# Update with documents only - embeddings auto-generated by embedding_function
|
||||
# Requires: collection must have embedding_function set
|
||||
collection1.update(
|
||||
ids="doc1",
|
||||
documents="New document text", # Embeddings will be auto-generated
|
||||
metadatas={"category": "AI"}
|
||||
)
|
||||
```
|
||||
|
||||
## Response parameters
|
||||
|
||||
None
|
||||
|
||||
## References
|
||||
|
||||
* [Insert data](200.add-data-of-api.md)
|
||||
* [Update or insert data](400.upsert-data-of-api.md)
|
||||
* [Delete data](500.delete-data-of-api.md)
|
||||
@@ -0,0 +1,93 @@
|
||||
---
|
||||
slug: /upsert-data-of-api
|
||||
---
|
||||
|
||||
# upsert - Update or insert data
|
||||
|
||||
The `upsert()` method is used to insert new records or update existing records. If a record with the given ID already exists, it will be updated; otherwise, a new record will be inserted.
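
The difference between `add()`, `update()`, and `upsert()` can be summarized with a toy in-memory model. This is a sketch of the semantics only: `ToyCollection` is hypothetical, takes one record per call, and the exception types it raises are assumptions rather than the errors pyseekdb actually returns.

```python
class ToyCollection:
    """In-memory model of the three write semantics; not the pyseekdb API."""

    def __init__(self) -> None:
        self.records = {}

    def add(self, record_id: str, document: str) -> None:
        # add: the ID must not exist yet.
        if record_id in self.records:
            raise ValueError(f"record {record_id!r} already exists")
        self.records[record_id] = document

    def update(self, record_id: str, document: str) -> None:
        # update: the ID must already exist.
        if record_id not in self.records:
            raise KeyError(f"record {record_id!r} does not exist")
        self.records[record_id] = document

    def upsert(self, record_id: str, document: str) -> None:
        # upsert: insert when missing, overwrite when present.
        self.records[record_id] = document


toy = ToyCollection()
toy.upsert("item1", "Document text")       # inserts, because item1 is new
toy.upsert("item1", "Updated text")        # updates, because item1 exists
print(toy.records["item1"])
```

`upsert()` is therefore the convenient choice when the caller does not know whether a record already exists.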
|
||||
|
||||
:::info
|
||||
|
||||
This API is only available when using a Client connection. For more information about the Client, see [Client](../50.client.md).
|
||||
|
||||
:::
|
||||
|
||||
## Prerequisites
|
||||
|
||||
* You have installed pyseekdb. For more information about how to install pyseekdb, see [Get Started](../../10.pyseekdb-sdk/10.pyseekdb-sdk-get-started.md).
|
||||
|
||||
* You have connected to the database. For more information about how to connect, see [Client](../50.client.md).
|
||||
|
||||
* If you are using seekdb in server mode or OceanBase Database, ensure that the connected user has the `INSERT` and `UPDATE` privileges on the target table. For more information about how to view the current user privileges, see [View user privileges](https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000003980135). If the user does not have the required privileges, contact the administrator to grant them. For more information about how to directly grant privileges, see [Directly grant privileges](https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000003980140).
|
||||
|
||||
## Request parameters
|
||||
|
||||
```python
|
||||
upsert(
|
||||
ids=ids,
|
||||
embeddings=embeddings,
|
||||
documents=documents,
|
||||
metadatas=metadatas
|
||||
)
|
||||
```
|
||||
|
||||
|Parameter|Type|Required|Description|Example value|
|
||||
|---|---|---|---|---|
|
||||
|`ids`|string or List[str]|Yes|The ID to be added or modified. It can be a single ID or an array of IDs.|item1|
|
||||
|`embeddings`|List[float] or List[List[float]]|No|The vectors. If provided, they will be used directly (ignoring `embedding_function`). If not provided, you can provide `documents` to automatically generate vectors.|[0.1, 0.2, 0.3]|
|
||||
|`documents`|string or List[str]|No|The documents. If `embeddings` are not provided, `documents` will be converted to vectors using the collection's `embedding_function`.|"Document text"|
|
||||
|`metadatas`|dict or List[dict]|No|The metadata. |`{"category": "AI"}`|
|
||||
|
||||
## Request example
|
||||
|
||||
```python
|
||||
import pyseekdb
|
||||
|
||||
# Create a client
|
||||
client = pyseekdb.Client()
|
||||
|
||||
collection = client.get_collection("my_collection")
|
||||
collection1 = client.get_collection("my_collection1")
|
||||
|
||||
# Upsert single item (insert or update)
|
||||
collection.upsert(
|
||||
ids="item1",
|
||||
embeddings=[0.1, 0.2, 0.3],
|
||||
documents="Document text",
|
||||
metadatas={"category": "AI", "score": 95}
|
||||
)
|
||||
|
||||
# Upsert multiple items
|
||||
collection.upsert(
|
||||
ids=["item1", "item2", "item3"],
|
||||
embeddings=[
|
||||
[0.1, 0.2, 0.3],
|
||||
[0.4, 0.5, 0.6],
|
||||
[0.7, 0.8, 0.9]
|
||||
],
|
||||
documents=["Doc 1", "Doc 2", "Doc 3"],
|
||||
metadatas=[
|
||||
{"category": "AI"},
|
||||
{"category": "ML"},
|
||||
{"category": "DL"}
|
||||
]
|
||||
)
|
||||
|
||||
# Upsert with documents only - embeddings auto-generated by embedding_function
|
||||
# Requires: collection must have embedding_function set
|
||||
collection1.upsert(
|
||||
ids=["item1", "item2"],
|
||||
documents=["Document 1", "Document 2"],
|
||||
metadatas=[{"category": "AI"}, {"category": "ML"}]
|
||||
)
|
||||
```
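
The insert-or-update behavior shown above can be pictured with a plain dictionary keyed by ID. The following is a minimal sketch, not pyseekdb code: the `upsert_records` helper and the `store` dictionary are hypothetical stand-ins, and the sketch assumes that fields passed in an upsert overwrite the stored ones while omitted fields are left untouched.

```python
from typing import Dict, List, Optional

# Hypothetical in-memory stand-in for a collection: record ID -> stored fields.
Store = Dict[str, dict]


def upsert_records(store: Store, ids: List[str],
                   documents: Optional[List[str]] = None,
                   metadatas: Optional[List[dict]] = None) -> None:
    """Insert each ID if it is new, otherwise update the fields that were passed."""
    for i, record_id in enumerate(ids):
        record = store.setdefault(record_id, {})  # creates the record when missing
        if documents is not None:
            record["document"] = documents[i]
        if metadatas is not None:
            record["metadata"] = metadatas[i]


store: Store = {}
upsert_records(store, ["item1"], documents=["Doc 1"], metadatas=[{"category": "AI"}])
upsert_records(store, ["item1", "item2"], documents=["Doc 1 (revised)", "Doc 2"])
print(sorted(store))               # ['item1', 'item2']
print(store["item1"]["document"])  # Doc 1 (revised)
```

The second call updates `item1` and inserts `item2` in one step, which is the difference between an upsert and a plain insert or update.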
|
||||
|
||||
## Response parameters
|
||||
|
||||
None
|
||||
|
||||
## References
|
||||
|
||||
* [Insert data](200.add-data-of-api.md)
|
||||
* [Update data](300.update-data-of-api.md)
|
||||
* [Delete data](400.upsert-data-of-api.md)
|
||||
@@ -0,0 +1,87 @@
|
||||
---
|
||||
slug: /delete-data-of-api
|
||||
---
|
||||
|
||||
# delete - Delete data
|
||||
|
||||
`delete()` is used to delete records from a collection. You can delete records by ID, metadata filter, or document filter.
|
||||
|
||||
:::info
|
||||
|
||||
This API is only available when you are connected to the database using a Client. For more information about the Client, see [Client](../50.client.md).
|
||||
|
||||
:::
|
||||
|
||||
## Prerequisites
|
||||
|
||||
* You have installed pyseekdb. For more information about how to install pyseekdb, see [Get Started](../../10.pyseekdb-sdk/10.pyseekdb-sdk-get-started.md).
|
||||
|
||||
* You are connected to the database. For more information about how to connect to the database, see [Client](../50.client.md).
|
||||
|
||||
* If you connect to seekdb or OceanBase Database in server mode, make sure that the user you connect as has the `DELETE` privilege on the table to be operated on. For more information about how to view the privileges of the current user, see [View user privileges](https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000003980135). If you do not have this privilege, contact the administrator to grant it to you. For more information about how to directly grant privileges, see [Directly grant privileges](https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000003980140).
|
||||
|
||||
## Request parameters
|
||||
|
||||
```python
|
||||
delete(
    ids=ids,
    where=where,
    where_document=where_document
)
|
||||
```
|
||||
|
||||
|Parameter|Type|Required|Description|Example value|
|
||||
|---|---|---|---|---|
|
||||
|`ids`|string or List[str]|Optional|The ID of the record to be deleted. You can specify a single ID or an array of IDs.|item1|
|
||||
|`where`|dict|Optional|The metadata filter.|`{"category": {"$eq": "AI"}}`|
|
||||
|`where_document`|dict|Optional|The document filter.|`{"$contains": "obsolete"}`|
|
||||
|
||||
:::info
|
||||
|
||||
At least one of the `ids`, `where`, or `where_document` parameters must be specified.
|
||||
|
||||
:::
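
The rule above can be expressed as a small pre-flight check. This is a sketch only: `check_delete_args` is a hypothetical helper written for illustration, and the error that pyseekdb itself raises for a call with no selector may differ.

```python
from typing import List, Optional, Union


def check_delete_args(ids: Optional[Union[str, List[str]]] = None,
                      where: Optional[dict] = None,
                      where_document: Optional[dict] = None) -> List[str]:
    """Return the names of the selectors that were supplied; raise if none were."""
    supplied = [name for name, value in
                (("ids", ids), ("where", where), ("where_document", where_document))
                if value is not None]
    if not supplied:
        raise ValueError("specify at least one of ids, where, or where_document")
    return supplied


print(check_delete_args(ids="item1"))  # ['ids']
print(check_delete_args(where={"category": {"$eq": "AI"}},
                        where_document={"$contains": "deprecated"}))
# ['where', 'where_document']
```

Running a check like this before calling `delete()` avoids sending a request that selects nothing.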
|
||||
|
||||
## Request examples
|
||||
|
||||
```python
|
||||
import pyseekdb
|
||||
|
||||
|
||||
# Create a client
|
||||
client = pyseekdb.Client()
|
||||
|
||||
collection = client.get_collection("my_collection")
|
||||
|
||||
# Delete by IDs
|
||||
collection.delete(ids=["item1", "item2", "item3"])
|
||||
|
||||
# Delete by single ID
|
||||
collection.delete(ids="item1")
|
||||
|
||||
# Delete by metadata filter
|
||||
collection.delete(where={"category": {"$eq": "AI"}})
|
||||
|
||||
# Delete by comparison operator
|
||||
collection.delete(where={"score": {"$lt": 50}})
|
||||
|
||||
# Delete by document filter
|
||||
collection.delete(where_document={"$contains": "obsolete"})
|
||||
|
||||
# Delete with combined filters
|
||||
collection.delete(
|
||||
where={"category": {"$eq": "AI"}},
|
||||
where_document={"$contains": "deprecated"}
|
||||
)
|
||||
```
|
||||
|
||||
## Response parameters
|
||||
|
||||
None
|
||||
|
||||
## References
|
||||
|
||||
* [Insert data](200.add-data-of-api.md)
|
||||
* [Update data](300.update-data-of-api.md)
|
||||
* [Update or insert data](400.upsert-data-of-api.md)
|
||||
@@ -0,0 +1,15 @@
|
||||
---
|
||||
slug: /dql-overview-of-api
|
||||
---
|
||||
|
||||
# Overview of DQL
|
||||
|
||||
DQL (Data Query Language) operations allow you to retrieve data from collections using various query methods.
|
||||
|
||||
For DQL operations, the following API interfaces are supported.
|
||||
|
||||
| API Interface | Description | Documentation Link |
|
||||
|---|---|---|
|
||||
| `query()` | A vector similarity search method. | [Documentation](200.query-interfaces-of-api.md) |
|
||||
| `get()` | Queries specific data from a table using an ID, document, or metadata (excluding vectors). | [Documentation](300.get-interfaces-of-api.md) |
|
||||
| `hybrid_search()` | Combines full-text search and vector similarity search using a ranking method. | [Documentation](400.hybrid-search-of-api.md) |
|
||||
@@ -0,0 +1,161 @@
|
||||
---
|
||||
slug: /query-interfaces-of-api
|
||||
---
|
||||
|
||||
# query - Vector query
|
||||
|
||||
The `query()` method performs a vector similarity search and returns the documents most similar to the query vector.
|
||||
|
||||
:::info
|
||||
|
||||
This interface is only available when using the Client. For more information about the Client, see [Client](../50.client.md).
|
||||
|
||||
:::
|
||||
|
||||
## Prerequisites
|
||||
|
||||
* You have installed pyseekdb. For more information about how to install pyseekdb, see [Get Started](../../10.pyseekdb-sdk/10.pyseekdb-sdk-get-started.md).
|
||||
|
||||
* You have connected to the database. For more information about how to connect to the database, see [Client](../50.client.md).
|
||||
|
||||
* You have created a collection and inserted data. For more information about how to create a collection and insert data, see [create_collection - Create a collection](../200.collection/100.create-collection-of-api.md) and [add - Insert data](../300.dml/200.add-data-of-api.md).
|
||||
|
||||
## Request parameters
|
||||
|
||||
```python
|
||||
query()
|
||||
```
|
||||
|
||||
|Parameter|Value type|Required|Description|Example value|
|---|---|---|---|---|
|`query_embeddings`|List[float] or List[List[float]]|No|A single vector, or a list of vectors for batch queries. If provided, it is used directly and the collection's `embedding_function` is ignored. If not provided, `query_texts` must be provided and the collection must have an `embedding_function`.|`[1.0, 2.0, 3.0]`|
|`query_texts`|str or List[str]|No|A single text, or a list of texts for batch queries. The collection's `embedding_function` converts the texts to query vectors, so the collection must have an `embedding_function`. Required if `query_embeddings` is not provided.|`["my query text"]`|
|`n_results`|int|No|The number of similar results to return. The default value is 10.|`3`|
|`where`|dict|No|The metadata filter conditions.|`{"category": {"$eq": "AI"}}`|
|`where_document`|dict|No|The document filter conditions.|`{"$contains": "machine"}`|
|`include`|List[str]|No|The list of fields to include in the result: `["documents", "metadatas", "embeddings"]`.|`["documents", "metadatas", "embeddings"]`|
|
||||
|
||||
:::info
|
||||
|
||||
The `embedding_function` used is associated with the collection (set during `create_collection()` or `get_collection()`). You cannot override it for each operation.
|
||||
|
||||
:::
|
||||
|
||||
## Request example
|
||||
|
||||
```python
|
||||
import pyseekdb
|
||||
|
||||
# Create a client
|
||||
client = pyseekdb.Client()
|
||||
|
||||
collection = client.get_collection("my_collection")
|
||||
collection1 = client.get_collection("my_collection1")
|
||||
|
||||
# Basic vector similarity query (embedding_function not used)
|
||||
results = collection.query(
|
||||
query_embeddings=[1.0, 2.0, 3.0],
|
||||
n_results=3
|
||||
)
|
||||
|
||||
# Iterate over results
|
||||
for i in range(len(results["ids"][0])):
    print(f"ID: {results['ids'][0][i]}, Distance: {results['distances'][0][i]}")
    if results.get("documents"):
        print(f"Document: {results['documents'][0][i]}")
    if results.get("metadatas"):
        print(f"Metadata: {results['metadatas'][0][i]}")
|
||||
|
||||
# Query by texts - vectors auto-generated by embedding_function
|
||||
# Requires: collection must have embedding_function set
|
||||
results = collection1.query(
|
||||
query_texts=["my query text"],
|
||||
n_results=10
|
||||
)
|
||||
# The collection's embedding_function will automatically convert query_texts to query_embeddings
|
||||
|
||||
# Query by multiple texts (batch query)
|
||||
results = collection1.query(
|
||||
query_texts=["query text 1", "query text 2"],
|
||||
n_results=5
|
||||
)
|
||||
# Returns dict with lists of lists, one list per query text
|
||||
for i in range(len(results["ids"])):
    print(f"Query {i}: {len(results['ids'][i])} results")
|
||||
|
||||
# Query with metadata filter (using query_texts)
|
||||
results = collection1.query(
|
||||
query_texts=["AI research"],
|
||||
where={"category": {"$eq": "AI"}},
|
||||
n_results=5
|
||||
)
|
||||
|
||||
# Query with comparison operator (using query_texts)
|
||||
results = collection1.query(
|
||||
query_texts=["machine learning"],
|
||||
where={"score": {"$gte": 90}},
|
||||
n_results=5
|
||||
)
|
||||
|
||||
# Query with document filter (using query_texts)
|
||||
results = collection1.query(
|
||||
query_texts=["neural networks"],
|
||||
where_document={"$contains": "machine learning"},
|
||||
n_results=5
|
||||
)
|
||||
|
||||
# Query with combined filters (using query_texts)
|
||||
results = collection1.query(
|
||||
query_texts=["AI research"],
|
||||
where={"category": {"$eq": "AI"}, "score": {"$gte": 90}},
|
||||
where_document={"$contains": "machine"},
|
||||
n_results=5
|
||||
)
|
||||
|
||||
# Query with multiple vectors (batch query)
|
||||
results = collection.query(
|
||||
query_embeddings=[[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]],
|
||||
n_results=2
|
||||
)
|
||||
# Returns dict with lists of lists, one list per query vector
|
||||
for i in range(len(results["ids"])):
    print(f"Query {i}: {len(results['ids'][i])} results")
|
||||
|
||||
# Query with specific fields
|
||||
results = collection.query(
|
||||
query_embeddings=[1.0, 2.0, 3.0],
|
||||
include=["documents", "metadatas", "embeddings"],
|
||||
n_results=3
|
||||
)
|
||||
```
|
||||
|
||||
## Return parameters
|
||||
|
||||
|Parameter|Value type|Required|Description|Example value|
|---|---|---|---|---|
|`ids`|List[List[str]]|Yes|The IDs of the matching records, one inner list per query.|`[["vec1", "vec2"]]`|
|`embeddings`|List[List[List[float]]]|No|The vectors of the matching records, one inner list per query.|`[[[0.1, 0.2, 0.3]]]`|
|`documents`|List[List[str]]|No|The documents of the matching records, one inner list per query.|`[["Document text"]]`|
|`metadatas`|List[List[Dict]]|No|The metadata of the matching records, one inner list per query.|`[[{"category": "AI"}]]`|
|`distances`|List[List[float]]|No|The distance between each matching record and the query vector, one inner list per query.|`[[0.0, 0.025368153802923787]]`|
|
||||
|
||||
## Return example
|
||||
|
||||
```python
|
||||
ID: vec1, Distance: 0.0
|
||||
Document: None
|
||||
Metadata: {}
|
||||
ID: vec2, Distance: 0.025368153802923787
|
||||
Document: None
|
||||
Metadata: {}
|
||||
Query 0: 4 results
|
||||
Query 1: 4 results
|
||||
Query 0: 2 results
|
||||
Query 1: 2 results
|
||||
```
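
Because the result holds parallel lists of lists (one inner list per query), it is often convenient to zip them into one row per hit. The sketch below assumes the result is a dictionary with the keys used in the request example (`ids`, `distances`, `documents`, `metadatas`); the `rows_for_query` helper and the `sample` dictionary are hypothetical and hand-written, not output from a real query.

```python
from typing import Dict, List


def rows_for_query(results: Dict[str, list], query_index: int = 0) -> List[dict]:
    """Zip the parallel inner lists of one query into one dict per hit."""
    ids = results["ids"][query_index]
    rows = []
    for pos, record_id in enumerate(ids):
        row = {"id": record_id}
        for key in ("distances", "documents", "metadatas"):
            column = results.get(key)
            if column:  # the key may be absent or None, depending on `include`
                row[key] = column[query_index][pos]
        rows.append(row)
    return rows


# Hand-written sample with the same shape as the printed output (not real query output).
sample = {
    "ids": [["vec1", "vec2"]],
    "distances": [[0.0, 0.025368153802923787]],
    "documents": [[None, None]],
    "metadatas": [[{}, {}]],
}
for row in rows_for_query(sample):
    print(row)
```

For a batch query, call the helper once per query index.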
|
||||
|
||||
## Related operations
|
||||
|
||||
* [get - Retrieve](300.get-interfaces-of-api.md)
|
||||
* [Hybrid search](400.hybrid-search-of-api.md)
|
||||
* [Operators](500.filter-operators-of-api.md)
|
||||
@@ -0,0 +1,127 @@
|
||||
---
|
||||
slug: /get-interfaces-of-api
|
||||
---
|
||||
|
||||
# get - Retrieve
|
||||
|
||||
`get()` is used to retrieve documents from a collection without performing vector similarity search.
|
||||
|
||||
It supports filtering by IDs, metadata, and documents.
|
||||
|
||||
:::info
|
||||
|
||||
This interface is only available when using the Client. For more information about the Client, see [Client](../50.client.md).
|
||||
|
||||
:::
|
||||
|
||||
## Prerequisites
|
||||
|
||||
* You have installed pyseekdb. For more information about how to install pyseekdb, see [Get Started](../../10.pyseekdb-sdk/10.pyseekdb-sdk-get-started.md).
|
||||
|
||||
* You have connected to the database. For more information about how to connect to the database, see [Client](../50.client.md).
|
||||
|
||||
* You have created a collection and inserted data. For more information about how to create a collection and insert data, see [create_collection - Create a collection](../200.collection/100.create-collection-of-api.md) and [add - Insert data](../300.dml/200.add-data-of-api.md).
|
||||
|
||||
## Request parameters
|
||||
|
||||
```python
|
||||
get()
|
||||
```
|
||||
|
||||
|Parameter|Type|Required|Description|Example value|
|---|---|---|---|---|
|`ids`|str or List[str]|No|The ID or list of IDs to retrieve.|`["1", "2", "3"]`|
|`where`|dict|No|The metadata filter.|`{"category": {"$eq": "AI"}}`|
|`where_document`|dict|No|The document filter.|`{"$contains": "machine"}`|
|`limit`|int|No|The maximum number of results to return.|`10`|
|`offset`|int|No|The number of results to skip for pagination.|`1`|
|`include`|List[str]|No|The list of fields to include: `["documents", "metadatas", "embeddings"]`.|`["documents", "metadatas", "embeddings"]`|
|
||||
|
||||
:::info
|
||||
|
||||
If no parameters are provided, all data is returned.
|
||||
|
||||
:::
|
||||
|
||||
## Request example
|
||||
|
||||
```python
|
||||
import pyseekdb
|
||||
|
||||
# Create a client
|
||||
client = pyseekdb.Client()
|
||||
|
||||
collection = client.get_collection("my_collection")
|
||||
|
||||
# Get by single ID
|
||||
results = collection.get(ids="123")
|
||||
|
||||
# Get by multiple IDs
|
||||
results = collection.get(ids=["1", "2", "3"])
|
||||
|
||||
# Get by metadata filter
|
||||
results = collection.get(
|
||||
where={"category": {"$eq": "AI"}},
|
||||
limit=10
|
||||
)
|
||||
|
||||
# Get by comparison operator
|
||||
results = collection.get(
|
||||
where={"score": {"$gte": 90}},
|
||||
limit=10
|
||||
)
|
||||
|
||||
# Get by $in operator
|
||||
results = collection.get(
|
||||
where={"tag": {"$in": ["ml", "python"]}},
|
||||
limit=10
|
||||
)
|
||||
|
||||
# Get by logical operators ($or)
|
||||
results = collection.get(
|
||||
where={
|
||||
"$or": [
|
||||
{"category": {"$eq": "AI"}},
|
||||
{"tag": {"$eq": "python"}}
|
||||
]
|
||||
},
|
||||
limit=10
|
||||
)
|
||||
|
||||
# Get by document content filter
|
||||
results = collection.get(
|
||||
where_document={"$contains": "machine learning"},
|
||||
limit=10
|
||||
)
|
||||
|
||||
# Get with combined filters
|
||||
results = collection.get(
|
||||
where={"category": {"$eq": "AI"}},
|
||||
where_document={"$contains": "machine"},
|
||||
limit=10
|
||||
)
|
||||
|
||||
# Get with pagination
|
||||
results = collection.get(limit=2, offset=1)
|
||||
|
||||
# Get with specific fields
|
||||
results = collection.get(
|
||||
ids=["1", "2"],
|
||||
include=["documents", "metadatas", "embeddings"]
|
||||
)
|
||||
|
||||
# Get all data (up to limit)
|
||||
results = collection.get(limit=100)
|
||||
```
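
The `limit` and `offset` parameters can be combined into a paging loop. The sketch below is illustrative only: `iter_pages` and `fake_get` are hypothetical names, `fake_get` stands in for `collection.get` so that the code runs without a database, and the sketch assumes that the result is a dictionary with a flat `ids` list.

```python
from typing import Callable, Dict, Iterator, List


def iter_pages(fetch: Callable[..., Dict[str, list]], page_size: int) -> Iterator[List[str]]:
    """Yield successive pages of IDs until a page comes back short or empty."""
    offset = 0
    while True:
        page = fetch(limit=page_size, offset=offset)["ids"]
        if page:
            yield page
        if len(page) < page_size:
            break
        offset += page_size


# Stand-in for collection.get, backed by a list (assumption: flat "ids" list in the result).
_all_ids = ["1", "2", "3", "4", "5"]


def fake_get(limit: int, offset: int) -> Dict[str, list]:
    return {"ids": _all_ids[offset:offset + limit]}


print(list(iter_pages(fake_get, page_size=2)))  # [['1', '2'], ['3', '4'], ['5']]
```

Stopping on a short page avoids one extra request when the total is not a multiple of the page size.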
|
||||
|
||||
## Response parameters
|
||||
|
||||
* If a single ID is provided: The result contains the record for that ID.
|
||||
* If multiple IDs are provided: A list of QueryResult objects, one for each ID.
|
||||
* If filters are provided: A QueryResult object containing all matching results.
|
||||
|
||||
## Related operations
|
||||
|
||||
* [Vector query](200.query-interfaces-of-api.md)
|
||||
* [Hybrid search](400.hybrid-search-of-api.md)
|
||||
* [Operators](500.filter-operators-of-api.md)
|
||||
@@ -0,0 +1,140 @@
|
||||
---
|
||||
slug: /hybrid-search-of-api
|
||||
---
|
||||
|
||||
# hybrid_search - Hybrid search
|
||||
|
||||
`hybrid_search()` combines full-text search and vector similarity search with ranking.
|
||||
|
||||
:::info
|
||||
|
||||
This API is only available when using the Client. For more information about the Client, see [Client](../50.client.md).
|
||||
|
||||
:::
|
||||
|
||||
## Prerequisites
|
||||
|
||||
* You have installed pyseekdb. For more information about how to install pyseekdb, see [Get Started](../../10.pyseekdb-sdk/10.pyseekdb-sdk-get-started.md).
|
||||
|
||||
* You have connected to the database. For more information about how to connect to the database, see [Client](../50.client.md).
|
||||
|
||||
* You have created a collection and inserted data. For more information about how to create a collection and insert data, see [create_collection - Create a collection](../200.collection/100.create-collection-of-api.md) and [add - Insert Data](../300.dml/200.add-data-of-api.md).
|
||||
|
||||
## Request parameters
|
||||
|
||||
```python
|
||||
hybrid_search(
    query={
        "where_document": where_document,
        "where": where,
        "n_results": n_results
    },
    knn={
        "query_texts": query_texts,
        "where": where,
        "n_results": n_results
    },
    rank=rank,
    n_results=n_results,
    include=include
)
|
||||
```
|
||||
|
||||
|
||||
* query: full-text search configuration, including the following parameters:
|
||||
|
||||
|Parameter|Type|Required|Description|Example value|
|
||||
|---|---|---|---|---|
|
||||
|`where`|dict |Optional|Metadata filter conditions. |`{"category": {"$eq": "AI"}}`|
|
||||
|`where_document`|dict|Optional|Document filter conditions. |`{"$contains": "machine"}`|
|
||||
|`n_results`|int|Yes|Number of results for full-text search.||
|
||||
|
||||
* knn: vector search configuration, including the following parameters:
|
||||
|
||||
|Parameter|Type|Required|Description|Example value|
|---|---|---|---|---|
|`query_embeddings`|List[float] or List[List[float]]|No|A single vector, or a list of vectors for batch queries. If provided, it is used directly and the collection's `embedding_function` is ignored. If not provided, `query_texts` must be provided and the collection must have an `embedding_function`.|`[1.0, 2.0, 3.0]`|
|`query_texts`|str or List[str]|No|A single text, or a list of texts for batch queries. The collection's `embedding_function` converts the texts to query vectors, so the collection must have an `embedding_function`. Required if `query_embeddings` is not provided.|`["my query text"]`|
|`where`|dict|No|Metadata filter conditions.|`{"category": {"$eq": "AI"}}`|
|`n_results`|int|Yes|Number of results for vector search.||
|
||||
|
||||
* Other parameters are as follows:
|
||||
|
||||
|Parameter|Type|Required|Description|Example value|
|---|---|---|---|---|
|`rank`|dict|No|Ranking configuration, for example: `{"rrf": {"rank_window_size": 60, "rank_constant": 60}}`.|`{"rrf": {}}`|
|`n_results`|int|No|Number of results to return. The default value is 10.|`3`|
|`include`|List[str]|No|List of fields to include: `["documents", "metadatas", "embeddings"]`.|`["documents", "metadatas", "embeddings"]`|
|
||||
|
||||
|
||||
:::info
|
||||
|
||||
The `embedding_function` used is associated with the collection (set during `create_collection()` or `get_collection()`). You cannot override it for each operation.
|
||||
|
||||
:::
|
||||
|
||||
## Request example
|
||||
|
||||
```python
|
||||
import pyseekdb
|
||||
|
||||
# Create a client
|
||||
client = pyseekdb.Client()
|
||||
|
||||
collection = client.get_collection("my_collection")
|
||||
collection1 = client.get_collection("my_collection1")
|
||||
|
||||
# Hybrid search with query_embeddings (embedding_function not used)
|
||||
results = collection.hybrid_search(
|
||||
query={
|
||||
"where_document": {"$contains": "machine learning"},
|
||||
"n_results": 10
|
||||
},
|
||||
knn={
|
||||
"query_embeddings": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], # Used directly
|
||||
"n_results": 10
|
||||
},
|
||||
rank={"rrf": {}},
|
||||
n_results=5
|
||||
)
|
||||
|
||||
# Hybrid search with both full-text and vector search (using query_texts)
|
||||
results = collection1.hybrid_search(
|
||||
query={
|
||||
"where_document": {"$contains": "machine learning"},
|
||||
"where": {"category": {"$eq": "science"}},
|
||||
"n_results": 10
|
||||
},
|
||||
knn={
|
||||
"query_texts": ["AI research"], # Will be embedded automatically
|
||||
"where": {"year": {"$gte": 2020}},
|
||||
"n_results": 10
|
||||
},
|
||||
rank={"rrf": {}}, # Reciprocal Rank Fusion
|
||||
n_results=5,
|
||||
include=["documents", "metadatas", "embeddings"]
|
||||
)
|
||||
|
||||
# Hybrid search with multiple query texts (batch)
|
||||
results = collection1.hybrid_search(
|
||||
query={
|
||||
"where_document": {"$contains": "AI"},
|
||||
"n_results": 10
|
||||
},
|
||||
knn={
|
||||
"query_texts": ["machine learning", "neural networks"], # Multiple queries
|
||||
"n_results": 10
|
||||
},
|
||||
rank={"rrf": {}},
|
||||
n_results=5
|
||||
)
|
||||
```
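
The `rrf` ranking option refers to Reciprocal Rank Fusion. The sketch below shows the textbook form of the technique, in which each ranked list contributes `1 / (rank_constant + rank)` to a record's score. It is not seekdb's implementation: the `rrf_fuse` helper and the sample ID lists are hypothetical, and the server-side details (for example, how `rank_window_size` is applied) may differ.

```python
from collections import defaultdict
from typing import Dict, List


def rrf_fuse(rankings: List[List[str]], rank_constant: int = 60) -> List[str]:
    """Fuse ranked ID lists: each list adds 1 / (rank_constant + rank) to an ID's score."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, record_id in enumerate(ranking, start=1):
            scores[record_id] += 1.0 / (rank_constant + rank)
    # Highest fused score first; ties are broken by ID for a deterministic order.
    return sorted(scores, key=lambda rid: (-scores[rid], rid))


full_text_hits = ["doc_a", "doc_b", "doc_c"]  # hypothetical full-text ranking
vector_hits = ["doc_b", "doc_d", "doc_a"]     # hypothetical vector ranking
print(rrf_fuse([full_text_hits, vector_hits]))
# ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Records that appear in both lists (`doc_a`, `doc_b`) rise to the top, which is why fusion tends to favor results that both search methods agree on.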
|
||||
|
||||
## Return parameters
|
||||
|
||||
A dictionary containing the search results, including IDs, distances, metadata, and documents.
|
||||
|
||||
## Related operations
|
||||
|
||||
* [Vector query](200.query-interfaces-of-api.md)
|
||||
* [get - Retrieve](300.get-interfaces-of-api.md)
|
||||
* [Operators](500.filter-operators-of-api.md)
|
||||
@@ -0,0 +1,151 @@
|
||||
---
|
||||
slug: /filter-operators-of-api
|
||||
---
|
||||
|
||||
# Operators
|
||||
|
||||
Operators are used to connect operands or parameters and return results. In terms of syntax, operators can appear before, after, or between operands.
|
||||
|
||||
## Operator examples
|
||||
|
||||
### Data filtering (where)
|
||||
|
||||
#### Equal to
|
||||
|
||||
Use `$eq` to indicate equal to, as shown in the following example:
|
||||
|
||||
```python
|
||||
where={"category": {"$eq": "AI"}}
|
||||
```
|
||||
|
||||
#### Not equal to
|
||||
|
||||
Use `$ne` to indicate not equal to, as shown in the following example:
|
||||
|
||||
```python
|
||||
where={"status": {"$ne": "deleted"}}
|
||||
```
|
||||
|
||||
#### Greater than
|
||||
|
||||
Use `$gt` to indicate greater than, as shown in the following example:
|
||||
|
||||
```python
|
||||
where={"score": {"$gt": 90}}
|
||||
```
|
||||
|
||||
#### Greater than or equal to
|
||||
|
||||
Use `$gte` to indicate greater than or equal to, as shown in the following example:
|
||||
|
||||
```python
|
||||
where={"score": {"$gte": 90}}
|
||||
```
|
||||
|
||||
#### Less than
|
||||
|
||||
Use `$lt` to indicate less than, as shown in the following example:
|
||||
|
||||
```python
|
||||
where={"score": {"$lt": 50}}
|
||||
```
|
||||
|
||||
#### Less than or equal to
|
||||
|
||||
Use `$lte` to indicate less than or equal to, as shown in the following example:
|
||||
|
||||
```python
|
||||
where={"score": {"$lte": 50}}
|
||||
```
|
||||
|
||||
#### Contains
|
||||
|
||||
Use `$in` to match a field whose value is contained in the given list, as shown in the following example:
|
||||
|
||||
```python
|
||||
where={"tag": {"$in": ["ml", "python", "ai"]}}
|
||||
```
|
||||
|
||||
#### Does not contain
|
||||
|
||||
Use `$nin` to match a field whose value is not contained in the given list, as shown in the following example:
|
||||
|
||||
```python
|
||||
where={"tag": {"$nin": ["deprecated", "old"]}}
|
||||
```
|
||||
|
||||
#### Logical OR
|
||||
|
||||
Use `$or` to indicate logical OR, as shown in the following example:
|
||||
|
||||
```python
|
||||
where={
|
||||
"$or": [
|
||||
{"category": {"$eq": "AI"}},
|
||||
{"tag": {"$eq": "python"}}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
#### Logical AND
|
||||
|
||||
Use `$and` to indicate logical AND, as shown in the following example:
|
||||
|
||||
```python
|
||||
where={
|
||||
"$and": [
|
||||
{"category": {"$eq": "AI"}},
|
||||
{"score": {"$gte": 90}}
|
||||
]
|
||||
}
|
||||
```
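
To make the semantics of the `where` operators concrete, the following sketch evaluates a filter against a single metadata dictionary in plain Python. It is an illustration under stated assumptions, not the database's implementation: the `matches` helper is hypothetical, multiple keys in one filter are treated as an implicit AND (as in the combined-filter examples), and a field that is missing from the metadata is assumed never to match.

```python
from typing import Any, Callable, Dict

# Comparison operators, applied as op(stored_value, operand).
_COMPARE: Dict[str, Callable[[Any, Any], bool]] = {
    "$eq": lambda a, b: a == b,
    "$ne": lambda a, b: a != b,
    "$gt": lambda a, b: a > b,
    "$gte": lambda a, b: a >= b,
    "$lt": lambda a, b: a < b,
    "$lte": lambda a, b: a <= b,
    "$in": lambda a, b: a in b,
    "$nin": lambda a, b: a not in b,
}


def matches(metadata: dict, where: dict) -> bool:
    """Return True if `metadata` satisfies every condition in `where`."""
    for key, condition in where.items():
        if key == "$and":
            if not all(matches(metadata, sub) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches(metadata, sub) for sub in condition):
                return False
        else:
            if key not in metadata:
                return False  # assumption: a missing field never matches
            for op, operand in condition.items():
                if not _COMPARE[op](metadata[key], operand):
                    return False
    return True


record = {"category": "AI", "score": 95, "tag": "ml"}
print(matches(record, {"category": {"$eq": "AI"}, "score": {"$gte": 90}}))  # True
print(matches(record, {"$or": [{"tag": {"$in": ["python"]}},
                               {"score": {"$lt": 50}}]}))                   # False
```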
|
||||
|
||||
### Text filtering (where_document)
|
||||
|
||||
#### Full-text search (contains substring)
|
||||
|
||||
Use `$contains` to indicate full-text search, as shown in the following example:
|
||||
|
||||
```python
|
||||
where_document={"$contains": "machine learning"}
|
||||
```
|
||||
|
||||
#### Regular expression
|
||||
|
||||
Use `$regex` to indicate regular expression, as shown in the following example:
|
||||
|
||||
```python
|
||||
where_document={"$regex": "pattern.*"}
|
||||
```
|
||||
|
||||
#### Logical OR
|
||||
|
||||
Use `$or` to indicate logical OR, as shown in the following example:
|
||||
|
||||
```python
|
||||
where_document={
|
||||
"$or": [
|
||||
{"$contains": "machine learning"},
|
||||
{"$contains": "artificial intelligence"}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
#### Logical AND
|
||||
|
||||
Use `$and` to indicate logical AND, as shown in the following example:
|
||||
|
||||
```python
|
||||
where_document={
|
||||
"$and": [
|
||||
{"$contains": "machine"},
|
||||
{"$contains": "learning"}
|
||||
]
|
||||
}
|
||||
```
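
The `where_document` operators can be sketched the same way. The `document_matches` helper below is hypothetical and makes two simplifying assumptions: `$contains` is treated as a case-sensitive substring test, and `$regex` follows Python's `re.search`. The database's full-text search may tokenize text or handle case differently.

```python
import re


def document_matches(document: str, where_document: dict) -> bool:
    """Return True if `document` satisfies every operator in `where_document`."""
    for op, operand in where_document.items():
        if op == "$contains":
            ok = operand in document  # assumption: plain substring test
        elif op == "$regex":
            ok = re.search(operand, document) is not None  # assumption: re.search semantics
        elif op == "$and":
            ok = all(document_matches(document, sub) for sub in operand)
        elif op == "$or":
            ok = any(document_matches(document, sub) for sub in operand)
        else:
            raise ValueError(f"unsupported operator: {op}")
        if not ok:
            return False
    return True


text = "An introduction to machine learning"
print(document_matches(text, {"$and": [{"$contains": "machine"},
                                       {"$contains": "learning"}]}))  # True
print(document_matches(text, {"$regex": "^deep"}))                    # False
```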
|
||||
|
||||
## Related operations
|
||||
|
||||
* [Vector query](200.query-interfaces-of-api.md)
|
||||
* [get - Retrieve](300.get-interfaces-of-api.md)
|
||||
* [Hybrid search](400.hybrid-search-of-api.md)
|
||||
@@ -0,0 +1,107 @@
|
||||
---
|
||||
slug: /client
|
||||
---
|
||||
|
||||
# Client
|
||||
|
||||
The `Client` class is used to connect to a database in either embedded mode or server mode. It automatically selects the appropriate connection mode based on the provided parameters.
|
||||
|
||||
:::tip
|
||||
OceanBase Database is a fully self-developed, enterprise-level, native distributed database developed by OceanBase. It achieves financial-grade high availability on ordinary hardware and sets a new standard for automatic, lossless disaster recovery across five IDCs in three regions. It also sets a new benchmark in the TPC-C benchmark test, with a single cluster size exceeding 1,500 nodes. OceanBase Database is cloud-native, highly consistent, and highly compatible with Oracle and MySQL. For more information about OceanBase Database, see [OceanBase Database](https://www.oceanbase.com/docs/oceanbase-database-cn).
|
||||
:::
|
||||
|
||||
## Connect to an embedded seekdb instance
|
||||
|
||||
Use the `Client` class to connect to a local embedded seekdb instance.
|
||||
|
||||
```python
|
||||
import pyseekdb
|
||||
|
||||
# Create embedded client
|
||||
client = pyseekdb.Client(
|
||||
#path="./seekdb", # Path to SeekDB data directory
|
||||
#database="test" # Database name
|
||||
)
|
||||
```
|
||||
|
||||
The following table describes the parameters.
|
||||
|
||||
| Parameter | Value type | Required | Description | Example value |
|
||||
| --- | --- | --- | --- | --- |
|
||||
| `path` | string | No | The path to the seekdb data directory. seekdb stores database files in this directory and loads them when it starts. | `./seekdb` |
|
||||
| `database` | string | No | The name of the database. | `test` |
|
||||
|
||||
## Connect to a remote server
|
||||
|
||||
Use the `Client` class to connect to a remote server, which runs seekdb or OceanBase Database.
|
||||
|
||||
:::tip
|
||||
|
||||
Before you connect to a remote server, make sure that you have deployed a server instance of seekdb or OceanBase Database. <br/>For information about how to deploy a server instance of seekdb, see [Overview](../../../400.guides/400.deploy/50.deploy-overview.md).<br/>For information about how to deploy OceanBase Database, see [Overview](https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000003976427).
|
||||
|
||||
:::
|
||||
|
||||
Example: Connect to a server instance of seekdb
|
||||
|
||||
```python
|
||||
import pyseekdb
|
||||
|
||||
# Create remote server client (SeekDB Server)
|
||||
client = pyseekdb.Client(
|
||||
host="127.0.0.1", # Server host
|
||||
port=2881, # Server port
|
||||
database="test", # Database name
|
||||
user="root", # Username
|
||||
password="" # Password (can be retrieved from SEEKDB_PASSWORD environment variable)
|
||||
)
|
||||
```
|
||||
|
||||
The following table describes the parameters.
|
||||
|
||||
| Parameter | Value type | Required | Description | Example value |
|
||||
| --- | --- | --- | --- | --- |
|
||||
| `host` | string | Yes | The IP address of the server where the instance is located. | `127.0.0.1` |
|
||||
| `port` | int | Yes | The port number of the instance. The default value is 2881. | `2881` |
|
||||
| `database` | string | Yes | The name of the database. | `test` |
|
||||
| `user` | string | Yes | The username. The default value is root. | `root` |
|
||||
| `password` | string | Yes | The password corresponding to the user. If you do not provide the `password` parameter or specify an empty string, the system retrieves the password from the `SEEKDB_PASSWORD` environment variable. ||
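
The password fallback described in the table can be pictured as follows. This is a sketch of the documented behavior, not the client's actual code: `resolve_password` is a hypothetical helper, and the value assigned to `SEEKDB_PASSWORD` is a placeholder for the demonstration.

```python
import os
from typing import Optional


def resolve_password(password: Optional[str] = None) -> str:
    """Use the explicit password if it is non-empty, else fall back to SEEKDB_PASSWORD."""
    if password:
        return password
    return os.environ.get("SEEKDB_PASSWORD", "")


os.environ["SEEKDB_PASSWORD"] = "example-secret"  # placeholder value for the demo
print(resolve_password(""))          # example-secret
print(resolve_password("explicit"))  # explicit
```

Keeping the password in an environment variable avoids hard-coding credentials in scripts.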
|
||||
|
||||
Example: Connect to OceanBase Database
|
||||
|
||||
```python
|
||||
import pyseekdb
|
||||
|
||||
# Create remote server client (OceanBase Server)
|
||||
client = pyseekdb.Client(
|
||||
host="127.0.0.1", # Server host
|
||||
port=2881, # Server port (default: 2881)
|
||||
tenant="test", # Tenant name
|
||||
database="test", # Database name
|
||||
user="root", # Username (default: "root")
|
||||
password="" # Password (can be retrieved from SEEKDB_PASSWORD environment variable)
|
||||
)
|
||||
```
|
||||
|
||||
The following table describes the parameters.
|
||||
|
||||
| Parameter | Value type | Required | Description | Example value |
|
||||
| --- | --- | --- | --- | --- |
|
||||
| `host` | string | Yes | The IP address of the server where the database is located. | `127.0.0.1` |
|
||||
| `port` | int | Yes | The port number of OceanBase Database. The default value is 2881. | `2881` |
|
||||
| `tenant` | string | No | The name of the tenant. This parameter is not required for seekdb. For OceanBase Database, the default value is sys. | `test` |
|
||||
| `database` | string | Yes | The name of the database. | `test` |
|
||||
| `user` | string | Yes | The username corresponding to the tenant. The default value is root. | `root` |
|
||||
| `password` | string | Yes | The password corresponding to the user. If you do not provide the `password` parameter or specify an empty string, the system retrieves the password from the `SEEKDB_PASSWORD` environment variable. ||
|
||||
|
||||
## APIs supported when you use the Client class to connect to a database
|
||||
|
||||
When you use the `Client` class to connect to a database, you can call the following APIs.
|
||||
|
||||
| API | Description | Document link |
|
||||
| --- | --- | --- |
|
||||
| `create_collection()` | Creates a new collection. | [Document](200.collection/100.create-collection-of-api.md) |
|
||||
| `get_collection()` | Queries a specified collection. |[Document](200.collection/200.get-collection-of-api.md)|
|
||||
| `delete_collection()` | Deletes a specified collection. |[Document](200.collection/400.delete-collection-of-api.md)|
|
||||
| `list_collections()` | Lists all collections in the current database.|[Document](200.collection/300.list-collection-of-api.md)|
|
||||
| `get_or_create_collection()` | Queries a specified collection. If the collection does not exist, it is created.|[Document](200.collection/250.get-or-create-collection-of-api.md)|
|
||||
| `count_collection()` | Queries the number of collections in the current database. |[Document](200.collection/350.count-collection-of-api.md)|
|
||||
@@ -0,0 +1,35 @@
|
||||
---
|
||||
slug: /default-embedding-function-of-api
|
||||
---
|
||||
|
||||
# Default embedding function
|
||||
|
||||
An embedding function converts text documents into vector embeddings for similarity search. pyseekdb supports built-in and custom embedding functions.
|
||||
|
||||
The `DefaultEmbeddingFunction` is the default embedding function if none is specified. This function is already available in seekdb and does not need to be created separately.
|
||||
|
||||
Here is an example:
|
||||
|
||||
```python
|
||||
from pyseekdb import DefaultEmbeddingFunction
|
||||
|
||||
# Use default model (all-MiniLM-L6-v2, 384 dimensions)
|
||||
ef = DefaultEmbeddingFunction()
|
||||
|
||||
# Use custom model
|
||||
ef = DefaultEmbeddingFunction(model_name='all-MiniLM-L6-v2')
|
||||
|
||||
# Get embedding dimension
|
||||
print(f"Dimension: {ef.dimension}") # 384
|
||||
|
||||
# Generate embeddings
|
||||
embeddings = ef(["Hello world", "How are you?"])
|
||||
print(f"Generated {len(embeddings)} embeddings, each with {len(embeddings[0])} dimensions")
|
||||
```
|
||||
|
||||
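To make "similarity search" over embeddings concrete, the following minimal sketch compares vectors by cosine similarity. It is illustrative only: the `cosine_similarity` helper and the 3-dimensional sample vectors are assumptions made for this example, not part of pyseekdb, and how seekdb computes distances internally is not shown here.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as not similar to anything.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional vectors standing in for real 384-dimensional embeddings.
query = [1.0, 0.0, 1.0]
close = [2.0, 0.0, 2.0]      # same direction as the query
unrelated = [0.0, 1.0, 0.0]  # orthogonal to the query

print(cosine_similarity(query, close))
print(cosine_similarity(query, unrelated))
```

Vectors that point in the same direction score close to 1, and orthogonal vectors score 0, which is why both vectors must have the same dimension before they can be compared.
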
## Related operations

To use a custom embedding function instead of the default one, see the following topics:

* [Create a custom embedding function](200.create-custim-embedding-functions-of-api.md)
* [Use a custom embedding function](300.using-custom-embedding-functions-of-api.md)
@@ -0,0 +1,271 @@
---
slug: /create-custim-embedding-functions-of-api
---

# Create a custom embedding function

You can create a custom embedding function by implementing the `EmbeddingFunction` protocol. A custom embedding function does the following:

* Implements the `__call__` method, which accepts `Documents` (`str` or `List[str]`) and returns `Embeddings` (`List[List[float]]`).

* Optionally implements a `dimension` attribute that returns the vector dimension.

## Prerequisites

Before you create a custom embedding function, make sure that it meets the following requirements:

* Implement the `__call__` method:

  * Input: a single document or multiple documents, of type `str` or `List[str]`.
  * Output: the embedding vectors, of type `List[List[float]]`.
  * Each vector must have the same dimension.

* (Recommended) Implement the `dimension` attribute:

  * Output: the dimension of the vectors generated by this function, of type `int`.
  * Providing the dimension helps verify consistency when you create a collection.

* Handle special cases:

  * Convert a single string input to a list.
  * Return an empty list for empty inputs.
  * All vectors in the output must have the same dimension.

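The requirements above can be sketched with a small, dependency-free class. This is a toy under stated assumptions: `HashEmbeddingFunction` is a hypothetical name, its hash-based vectors carry no semantic meaning, and it does not subclass pyseekdb's `EmbeddingFunction` so that it runs on its own. To use such a class with pyseekdb, you would subclass `EmbeddingFunction[Documents]` as the examples in this topic do.

```python
import hashlib
from typing import List, Union

Documents = Union[str, List[str]]
Embeddings = List[List[float]]


class HashEmbeddingFunction:
    """Toy embedding function: deterministic and dependency-free, not semantic."""

    def __init__(self, dimension: int = 8):
        # SHA-256 yields 32 bytes, so at most 32 components are available.
        if not 1 <= dimension <= 32:
            raise ValueError("dimension must be between 1 and 32")
        self._dimension = dimension

    @property
    def dimension(self) -> int:
        """Length of every vector returned by __call__."""
        return self._dimension

    def __call__(self, input: Documents) -> Embeddings:
        # Convert a single string input to a list.
        if isinstance(input, str):
            input = [input]
        # Return an empty list for empty inputs.
        if not input:
            return []
        vectors = []
        for text in input:
            digest = hashlib.sha256(text.encode("utf-8")).digest()
            # Map the first `dimension` bytes to floats in [0.0, 1.0],
            # so every vector has the same length.
            vectors.append([b / 255.0 for b in digest[: self._dimension]])
        return vectors


ef = HashEmbeddingFunction(dimension=8)
single = ef("Hello world")
batch = ef(["Hello world", "How are you?"])
print(len(single), len(batch), ef.dimension)
```

A single string and a one-element list produce the same result, and every returned vector has exactly `ef.dimension` components.
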
## Example 1: Sentence Transformer custom embedding function

```python
from typing import List, Union
from pyseekdb import EmbeddingFunction, Client, HNSWConfiguration

Documents = Union[str, List[str]]
Embeddings = List[List[float]]

class SentenceTransformerCustomEmbeddingFunction(EmbeddingFunction[Documents]):
    """
    A custom embedding function using sentence-transformers with a specific model.
    """

    def __init__(self, model_name: str = "all-mpnet-base-v2", device: str = "cpu"):  # TODO: your own model name and device
        """
        Initialize the sentence-transformer embedding function.

        Args:
            model_name: Name of the sentence-transformers model to use
            device: Device to run the model on ('cpu' or 'cuda')
        """
        self.model_name = model_name
        self.device = device
        self._model = None
        self._dimension = None

    def _ensure_model_loaded(self):
        """Lazy load the embedding model"""
        if self._model is None:
            try:
                from sentence_transformers import SentenceTransformer
                self._model = SentenceTransformer(self.model_name, device=self.device)
                # Get dimension from model
                test_embedding = self._model.encode(["test"], convert_to_numpy=True)
                self._dimension = len(test_embedding[0])
            except ImportError:
                raise ImportError(
                    "sentence-transformers is not installed. "
                    "Please install it with: pip install sentence-transformers"
                )

    @property
    def dimension(self) -> int:
        """Get the dimension of embeddings produced by this function"""
        self._ensure_model_loaded()
        return self._dimension

    def __call__(self, input: Documents) -> Embeddings:
        """
        Generate embeddings for the given documents.

        Args:
            input: Single document (str) or list of documents (List[str])

        Returns:
            List of embedding vectors
        """
        self._ensure_model_loaded()

        # Handle single string input
        if isinstance(input, str):
            input = [input]

        # Handle empty input
        if not input:
            return []

        # Generate embeddings
        embeddings = self._model.encode(
            input,
            convert_to_numpy=True,
            show_progress_bar=False
        )

        # Convert numpy arrays to lists
        return [embedding.tolist() for embedding in embeddings]

# Use the custom embedding function
client = Client()

# Initialize embedding function with all-mpnet-base-v2 model (768 dimensions)
ef = SentenceTransformerCustomEmbeddingFunction(
    model_name='all-mpnet-base-v2',  # TODO: your own model name
    device='cpu'  # TODO: your own device
)

# Get the dimension from the embedding function
dimension = ef.dimension
print(f"Embedding dimension: {dimension}")

# Create collection with matching dimension
collection_name = "my_collection"
if client.has_collection(collection_name):
    client.delete_collection(collection_name)

collection = client.create_collection(
    name=collection_name,
    configuration=HNSWConfiguration(dimension=dimension, distance='cosine'),
    embedding_function=ef
)

# Test the embedding function
print("\nTesting embedding function...")
test_documents = ["Hello world", "This is a test", "Sentence transformers are great"]
embeddings = ef(test_documents)
print(f"Generated {len(embeddings)} embeddings")
print(f"Each embedding has {len(embeddings[0])} dimensions")

# Add some documents to the collection
print("\nAdding documents to collection...")
collection.add(
    ids=["1", "2", "3"],
    documents=test_documents,
    metadatas=[{"source": "test1"}, {"source": "test2"}, {"source": "test3"}]
)

# Query the collection
print("\nQuerying collection...")
results = collection.query(
    query_texts="Hello",
    n_results=2
)

print("\nQuery results:")
for i in range(len(results['ids'][0])):
    print(f"ID: {results['ids'][0][i]}")
    print(f"Document: {results['documents'][0][i]}")
    print(f"Distance: {results['distances'][0][i]}")
    print()

# Clean up
client.delete_collection(name=collection_name)
print("Test completed successfully!")
```

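Example 1 passes `ef.dimension` to `HNSWConfiguration`, so the declared dimension and the actual vector length need to agree. A quick pre-check before creating the collection might look like the following sketch. The `check_embedding_function` helper and the `FixedEmbedding` stand-in are hypothetical names introduced for this illustration and are not part of pyseekdb.

```python
from typing import Callable, List, Sequence


def check_embedding_function(
    ef: Callable[[List[str]], List[List[float]]],
    samples: Sequence[str] = ("probe one", "probe two"),
) -> int:
    """Return the common vector length produced by ef, or raise ValueError."""
    vectors = ef(list(samples))
    if len(vectors) != len(samples):
        raise ValueError(f"expected {len(samples)} vectors, got {len(vectors)}")
    lengths = {len(v) for v in vectors}
    if len(lengths) != 1:
        raise ValueError(f"inconsistent vector lengths: {sorted(lengths)}")
    actual = lengths.pop()
    # Compare against the optional `dimension` attribute when it is present.
    declared = getattr(ef, "dimension", None)
    if declared is not None and declared != actual:
        raise ValueError(f"declared dimension {declared} != actual {actual}")
    return actual


class FixedEmbedding:
    """Stand-in embedding function used only to exercise the check."""

    dimension = 4

    def __call__(self, input: List[str]) -> List[List[float]]:
        return [[float(len(text)), 0.0, 0.0, 0.0] for text in input]


print(check_embedding_function(FixedEmbedding()))
```

Running the check once at startup surfaces a mismatch as a clear `ValueError` instead of a failure later, when documents are added to the collection.
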
## Example 2: OpenAI embedding function

```python
from typing import List, Union
import os
from openai import OpenAI
from pyseekdb import EmbeddingFunction
import pyseekdb

Documents = Union[str, List[str]]
Embeddings = List[List[float]]

class QWenEmbeddingFunction(EmbeddingFunction[Documents]):
    """
    A custom embedding function using OpenAI's embedding API.
    """

    def __init__(self, model_name: str = "", api_key: str = ""):  # TODO: your own model name and api key
        """
        Initialize the OpenAI embedding function.

        Args:
            model_name: Name of the OpenAI embedding model
            api_key: OpenAI API key (if not provided, uses OPENAI_API_KEY env var)
        """
        self.model_name = model_name
        self.api_key = api_key or os.environ.get('OPENAI_API_KEY')  # TODO: your own api key
        if not self.api_key:
            raise ValueError("OpenAI API key is required")

        self._dimension = 1024  # TODO: your own dimension

    @property
    def dimension(self) -> int:
        """Get the dimension of embeddings produced by this function"""
        if self._dimension is None:
            # Call API to get dimension (or use known values)
            raise ValueError("Dimension not set for this model")
        return self._dimension

    def __call__(self, input: Documents) -> Embeddings:
        """
        Generate embeddings using OpenAI API.

        Args:
            input: Single document (str) or list of documents (List[str])

        Returns:
            List of embedding vectors
        """
        # Handle single string input
        if isinstance(input, str):
            input = [input]

        # Handle empty input
        if not input:
            return []

        # Call OpenAI API
        client = OpenAI(
            api_key=self.api_key,
            base_url=""  # TODO: your own base url
        )
        response = client.embeddings.create(
            model=self.model_name,
            input=input
        )

        # Extract embeddings
        embeddings = [item.embedding for item in response.data]
        return embeddings

# Use the custom embedding function
collection_name = "my_collection"
ef = QWenEmbeddingFunction()
client = pyseekdb.Client()

if client.has_collection(collection_name):
    client.delete_collection(collection_name)

collection = client.create_collection(
    name=collection_name,
    embedding_function=ef
)

collection.add(
    ids=["1", "2", "3"],
    documents=["Hello", "World", "Hello World"],
    metadatas=[{"tag": "A"}, {"tag": "B"}, {"tag": "C"}]
)

results = collection.query(
    query_texts="Hello",
    n_results=2
)
for i in range(len(results['ids'][0])):
    print(results['ids'][0][i])
    print(results['documents'][0][i])
    print(results['metadatas'][0][i])
    print(results['distances'][0][i])
    print()

client.delete_collection(name=collection_name)
```
@@ -0,0 +1,41 @@
---
slug: /using-custom-embedding-functions-of-api
---

# Use a custom embedding function

After you create a custom embedding function, you can pass it to the client when you create or get a collection.

Here is an example. It assumes that the `SentenceTransformerCustomEmbeddingFunction` class from [Create a custom embedding function](200.create-custim-embedding-functions-of-api.md) is already defined:

```python
import pyseekdb
from pyseekdb import HNSWConfiguration

# Create a client
client = pyseekdb.Client()

# Create collection with custom embedding function
ef = SentenceTransformerCustomEmbeddingFunction()
collection = client.create_collection(
    name="my_collection",
    configuration=HNSWConfiguration(dimension=ef.dimension, distance='cosine'),
    embedding_function=ef
)

# Get collection with custom embedding function
collection = client.get_collection("my_collection", embedding_function=ef)

# Use the collection - documents will be automatically embedded
collection.add(
    ids=["doc1", "doc2"],
    documents=["Document 1", "Document 2"],  # Vectors auto-generated
    metadatas=[{"tag": "A"}, {"tag": "B"}]
)

# Query with texts - query vectors auto-generated
results = collection.query(
    query_texts=["my query"],
    n_results=10
)
```
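The examples in this documentation read query results column by column, as in `results['ids'][0][i]`. If row-oriented records are more convenient, a small helper along the following lines could reshape them. This is a sketch under assumptions: `flatten_results` and `FIELDS` are hypothetical names, the helper assumes that the result behaves like a dictionary of nested lists, and the sample data is made up to mimic that shape rather than taken from a real query.

```python
from typing import Any, Dict, List

# Maps the plural column names used in the results to singular row keys.
FIELDS = {"ids": "id", "documents": "document", "metadatas": "metadata", "distances": "distance"}


def flatten_results(results: Dict[str, List[List[Any]]], query_index: int = 0) -> List[Dict[str, Any]]:
    """Turn the column-oriented result for one query text into a list of row dicts."""
    present = [key for key in FIELDS if results.get(key)]
    columns = [results[key][query_index] for key in present]
    names = [FIELDS[key] for key in present]
    return [dict(zip(names, row)) for row in zip(*columns)]


# Hypothetical data shaped like the results indexed in the examples; not real query output.
sample = {
    "ids": [["doc1", "doc2"]],
    "documents": [["Document 1", "Document 2"]],
    "metadatas": [[{"tag": "A"}, {"tag": "B"}]],
    "distances": [[0.12, 0.34]],
}

for row in flatten_results(sample):
    print(row["id"], row["document"], row["distance"])
```

The `query_index` parameter selects which query text to unpack when several are passed in `query_texts`.
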