---
slug: /using-seekdb-in-python-sdk
---

# Experience embedded seekdb with Python SDK

This example shows how to get started with embedded seekdb on Linux by using pyseekdb, a Python client provided by OceanBase.

:::tip
pyseekdb also runs on macOS and Windows, but on those platforms it supports only server-mode seekdb. For more information about using pyseekdb on macOS and Windows, see [Get started with pyseekdb](../../200.develop/900.sdk/10.pyseekdb-sdk/10.pyseekdb-sdk-get-started.md).
:::

## Background information

### pyseekdb

pyseekdb is a Python client provided by OceanBase. It exposes a unified API with three connection modes: embedded-mode seekdb, server-mode seekdb, and OceanBase Database.

Installing the client also installs embedded-mode seekdb, so you can connect to it directly and perform operations such as creating databases. Alternatively, you can connect remotely to a seekdb instance deployed in client/server mode or to an OceanBase database.

### seekdb deployment modes

seekdb provides flexible deployment modes that cover everything from rapid prototyping to large-scale user workloads.

* Embedded mode

  seekdb runs as a lightweight library that you install with a single pip command. This mode is well suited to personal learning and prototyping, and it runs easily on a variety of end devices.

* Client/Server mode

  A lightweight, easy-to-use deployment mode recommended for both testing and production. It provides stable and efficient service.

  For information about using seekdb in client/server mode, see [Experience seekdb in client/server mode](../100.client-server-mode/10.deploy-seekdb-testing-environment.md).

## Install pyseekdb

### Prerequisites

Ensure that your environment meets the following requirements:

* Operating system: Linux (glibc >= 2.28)
* Python version: Python 3.11 or later
* System architecture: x86_64 or aarch64

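The requirements above can be checked from Python before installing. The following is a minimal sketch that uses only the standard library; the helper names `parse_version` and `check_environment` are hypothetical and are not part of pyseekdb. It assumes that `platform.libc_ver()` reports the C library of the running interpreter, which may be empty on some builds.

```python
import platform
import re
import sys

# Requirements taken from the list above
SUPPORTED_ARCHS = {"x86_64", "aarch64"}
MIN_PYTHON = (3, 11)
MIN_GLIBC = (2, 28)


def parse_version(text):
    """Turn a dotted version string such as '2.31' into a tuple of ints."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def check_environment(system, machine, python_version, libc_name, libc_version):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if system != "Linux":
        problems.append(f"operating system is {system}, expected Linux")
    elif libc_name != "glibc" or parse_version(libc_version) < MIN_GLIBC:
        found = f"{libc_name} {libc_version}".strip() or "unknown libc"
        problems.append(f"glibc >= 2.28 is required, found {found}")
    if tuple(python_version[:2]) < MIN_PYTHON:
        problems.append("Python 3.11 or later is required")
    if machine not in SUPPORTED_ARCHS:
        problems.append(f"architecture is {machine}, expected x86_64 or aarch64")
    return problems


if __name__ == "__main__":
    libc_name, libc_version = platform.libc_ver()
    found_problems = check_environment(
        platform.system(), platform.machine(), sys.version_info,
        libc_name, libc_version,
    )
    if found_problems:
        for problem in found_problems:
            print(f"Not satisfied: {problem}")
    else:
        print("Environment meets the listed requirements")
```

An empty result only means that the listed requirements are met; it does not guarantee that the installation succeeds.
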
### Installation

Install pyseekdb with pip, which automatically detects the default Python version and platform.

```shell
pip install pyseekdb
```

If your pip version is outdated, upgrade pip before installing.

```shell
pip install --upgrade pip
```

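After installation, you may want to confirm that the package is visible to the interpreter you plan to use. The sketch below relies on the standard `importlib.metadata` module and assumes that the installed distribution is named `pyseekdb`, matching the pip command above. The helper `installed_version` is a hypothetical name used only for illustration.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("pyseekdb")
if version is None:
    print("pyseekdb is not installed for this interpreter")
else:
    print(f"pyseekdb {version} is installed")
```

If the package is reported as missing, check that `pip` and `python` refer to the same environment.
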
## Experience seekdb with Python SDK

The following example uses embedded-mode seekdb to demonstrate basic operations with embedding functions:

1. Connect to seekdb.
2. Create a collection with an embedding function.
3. Add data as documents (vectors are generated automatically).
4. Query with text (vectors are generated automatically).
5. Print the query results.
6. Delete the collection to clean up.

```python
import pyseekdb

# ==================== Step 1: Create Client Connection ====================
# You can use embedded mode, server mode, or OceanBase mode
# This example uses embedded mode (you can switch to server-mode seekdb or OceanBase)

# Embedded mode (local seekdb)
client = pyseekdb.Client()

# Alternative: Server mode (connecting to a remote seekdb server)
# client = pyseekdb.Client(
#     host="127.0.0.1",
#     port=2881,
#     database="test",
#     user="root",
#     password=""
# )

# Alternative: Remote server mode (OceanBase Server)
# client = pyseekdb.Client(
#     host="127.0.0.1",
#     port=2881,
#     tenant="test",  # OceanBase default tenant
#     database="test",
#     user="root",
#     password=""
# )

# ==================== Step 2: Create a Collection with Embedding Function ====================
# A collection is like a table that stores documents with vector embeddings
collection_name = "my_simple_collection"

# Create the collection with the default embedding function
# The embedding function automatically converts documents to embeddings
collection = client.create_collection(
    name=collection_name,
)

print(f"Created collection '{collection_name}' with dimension: {collection.dimension}")
print(f"Embedding function: {collection.embedding_function}")

# ==================== Step 3: Add Data to Collection ====================
# With an embedding function, you can add documents directly without providing embeddings
# The embedding function automatically generates embeddings from the documents

documents = [
    "Machine learning is a subset of artificial intelligence",
    "Python is a popular programming language",
    "Vector databases enable semantic search",
    "Neural networks are inspired by the human brain",
    "Natural language processing helps computers understand text"
]

ids = ["id1", "id2", "id3", "id4", "id5"]

# Add data with documents only; embeddings are generated by the embedding function
collection.add(
    ids=ids,
    documents=documents,  # embeddings are generated automatically
    metadatas=[
        {"category": "AI", "index": 0},
        {"category": "Programming", "index": 1},
        {"category": "Database", "index": 2},
        {"category": "AI", "index": 3},
        {"category": "NLP", "index": 4}
    ]
)

print(f"\nAdded {len(documents)} documents to collection")
print("Note: Embeddings were automatically generated from documents using the embedding function")

# ==================== Step 4: Query the Collection ====================
# With an embedding function, you can query using text directly
# The embedding function automatically converts the query text to a query vector

query_text = "artificial intelligence and machine learning"

results = collection.query(
    query_texts=query_text,  # Query text, embedded automatically
    n_results=3  # Return the top 3 most similar documents
)

print(f"\nQuery: '{query_text}'")
print(f"Query results: {len(results['ids'][0])} items found")

# ==================== Step 5: Print Query Results ====================
for i in range(len(results['ids'][0])):
    print(f"\nResult {i+1}:")
    print(f"  ID: {results['ids'][0][i]}")
    print(f"  Distance: {results['distances'][0][i]:.4f}")
    if results.get('documents'):
        print(f"  Document: {results['documents'][0][i]}")
    if results.get('metadatas'):
        print(f"  Metadata: {results['metadatas'][0][i]}")

# ==================== Step 6: Cleanup ====================
# Delete the collection
client.delete_collection(collection_name)
print(f"\nDeleted collection '{collection_name}'")
```

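The example indexes the query result as `results['ids'][0][i]`, which suggests one inner list per query text. The sketch below flattens that layout into one record per hit so that it is easier to print or post-process. It assumes that the result behaves like a dictionary with exactly the keys used in the example; `rows_from_results` is a hypothetical helper, not part of pyseekdb, and the sample values are illustrative placeholders rather than real seekdb output.

```python
def rows_from_results(results, query_index=0):
    """Flatten the nested result layout into one dict per returned item."""
    ids = results["ids"][query_index]
    distances = results["distances"][query_index]
    documents = results.get("documents")
    metadatas = results.get("metadatas")

    rows = []
    for i, doc_id in enumerate(ids):
        rows.append({
            "id": doc_id,
            "distance": distances[i],
            # Optional fields may be missing or empty, as guarded in the example
            "document": documents[query_index][i] if documents else None,
            "metadata": metadatas[query_index][i] if metadatas else None,
        })
    return rows


# Placeholder data shaped like the result in the example (values are made up)
sample = {
    "ids": [["id1", "id4"]],
    "distances": [[0.25, 0.5]],
    "documents": [[
        "Machine learning is a subset of artificial intelligence",
        "Neural networks are inspired by the human brain",
    ]],
    "metadatas": [[
        {"category": "AI", "index": 0},
        {"category": "AI", "index": 3},
    ]],
}

for row in rows_from_results(sample):
    print(f"{row['id']}: distance={row['distance']:.4f}, document={row['document']}")
```

In a real run, you would pass the value returned by `collection.query(...)` instead of `sample`.
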
## More information

* For a more detailed introduction to pyseekdb and its usage, see [pyseekdb](../../200.develop/900.sdk/10.pyseekdb-sdk/10.pyseekdb-sdk-get-started.md).

* For more pyseekdb usage examples, see:

  * [Complete example](../../200.develop/900.sdk/10.pyseekdb-sdk/50.sdk-samples/50.pyseekdb-complete-sample.md): Demonstrates all capabilities currently supported by pyseekdb.

  * [Hybrid search example](../../200.develop/900.sdk/10.pyseekdb-sdk/50.sdk-samples/100.pyseekdb-hybrid-search-sample.md): Demonstrates how to use seekdb hybrid search.

* In addition to the Python SDK, seekdb also supports operations through SQL. For SQL usage, see [Experience seekdb in client/server mode](../100.client-server-mode/10.deploy-seekdb-testing-environment.md).