Initial commit

This commit is contained in:
Zhongwei Li
2025-11-30 08:44:54 +08:00
commit eb309b7b59
133 changed files with 21979 additions and 0 deletions

@@ -0,0 +1,66 @@
---
slug: /api-overview
---
# API Reference
You can use seekdb through the pyseekdb APIs listed below.
## APIs
The following APIs are supported.
### Database
:::info
These APIs are available only when you connect to seekdb by using the `AdminClient`. For more information about the `AdminClient`, see [Admin Client](../50.apis/100.admin-client.md).
:::
| API | Description | Documentation |
|---|---|---|
| `create_database()` | Creates a database. | [Documentation](110.database/200.create-database-of-api.md) |
| `get_database()` | Retrieves a specified database. |[Documentation](110.database/300.get-database-of-api.md)|
| `list_databases()` | Retrieves a list of databases in an instance. |[Documentation](110.database/400.list-database-of-api.md)|
| `delete_database()` | Deletes a specified database.|[Documentation](110.database/500.delete-database-of-api.md)|
### Collection
:::info
These APIs are available only when you connect to seekdb by using the `Client`. For more information about the `Client`, see [Client](../50.apis/50.client.md).
:::
| API | Description | Documentation |
|---|---|---|
| `create_collection()` | Creates a collection. | [Documentation](200.collection/100.create-collection-of-api.md) |
| `get_collection()` | Retrieves a specified collection. |[Documentation](200.collection/200.get-collection-of-api.md)|
| `get_or_create_collection()` | Retrieves a collection, or creates it if it does not exist in the database. |[Documentation](200.collection/250.get-or-create-collection-of-api.md)|
| `list_collections()` | Retrieves the collection list in a database. |[Documentation](200.collection/300.list-collection-of-api.md)|
| `count_collection()` | Counts the number of collections in a database. |[Documentation](200.collection/350.count-collection-of-api.md)|
| `delete_collection()` | Deletes a specified collection.|[Documentation](200.collection/400.delete-collection-of-api.md)|
### DML
:::info
These APIs are available only when you connect to seekdb by using the `Client`. For more information about the `Client`, see [Client](../50.apis/50.client.md).
:::
| API | Description | Documentation |
|---|---|---|
| `add()` | Inserts a new record into a collection. | [Documentation](300.dml/200.add-data-of-api.md) |
| `update()` | Updates an existing record in a collection. |[Documentation](300.dml/300.update-data-of-api.md)|
| `upsert()` | Inserts a new record or updates an existing record. |[Documentation](300.dml/400.upsert-data-of-api.md)|
| `delete()` | Deletes a record from a collection.|[Documentation](300.dml/500.delete-data-of-api.md)|
### DQL
:::info
These APIs are available only when you connect to seekdb by using the `Client`. For more information about the `Client`, see [Client](../50.apis/50.client.md).
:::
| API | Description | Documentation |
|---|---|---|
| `query()` | Performs vector similarity search. | [Documentation](400.dql/200.query-interfaces-of-api.md) |
| `get()` | Retrieves data from a collection by ID, document, or metadata filter (a non-vector query). |[Documentation](400.dql/300.get-interfaces-of-api.md)|
| `hybrid_search()` | Combines full-text search and vector similarity search and ranks the merged results. |[Documentation](400.dql/400.hybrid-search-of-api.md)|
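
`query()` ranks results by vector distance. As a rough illustration of the `cosine` metric used in the collection examples, the following sketch computes cosine distance as 1 minus cosine similarity in plain Python. The helper name `cosine_distance` is hypothetical, and this is the textbook definition, not seekdb's internal implementation.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity between two equal-length vectors.

    Hypothetical helper for illustration only; seekdb computes distances internally.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


# Vectors pointing the same way have distance 0; orthogonal vectors have distance 1.
print(cosine_distance([1.0, 0.0], [2.0, 0.0]))  # 0.0
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0
```

A smaller distance means two vectors point in more similar directions, so a similarity search that uses this metric returns the closest entries first.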

@@ -0,0 +1,93 @@
---
slug: /admin-client
---
# Admin Client
`AdminClient` provides database management operations. It connects to seekdb in the same way as `Client`, but supports only database management operations.
## Connect to an embedded seekdb instance
Connect to a local embedded seekdb instance by using `AdminClient`.
```python
import pyseekdb
# Embedded mode - Database management
admin = pyseekdb.AdminClient(path="./seekdb")
```
Parameter description:
| Parameter | Value Type | Required | Description | Example Value |
| --- | --- | --- | --- | --- |
| `path` | string | Optional | The path of the seekdb data directory. seekdb stores database files in this directory and loads them when it starts. | `./seekdb` |
## Connect to a remote server
Use `AdminClient` to connect to a remote server. The server can be a server mode seekdb instance or an OceanBase Database instance.
:::tip
Before you connect to a remote server, make sure that you have deployed a server mode seekdb instance or an OceanBase Database instance.<br/>For information about how to deploy a server mode seekdb instance, see [Overview](../../../400.guides/400.deploy/50.deploy-overview.md).<br/>For information about how to deploy an OceanBase Database instance, see [Overview](https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000003976427).
:::
Example: Connect to a server mode seekdb instance
```python
import pyseekdb
# Remote server mode - Database management
admin = pyseekdb.AdminClient(
host="127.0.0.1",
port=2881,
user="root",
password="" # Can be retrieved from SEEKDB_PASSWORD environment variable
)
```
Parameter description:
| Parameter | Value Type | Required | Description | Example Value |
| --- | --- | --- | --- | --- |
| `host` | string | Yes | The IP address of the server where the instance resides. | `127.0.0.1` |
| `port` | int | Yes | The port of the instance. The default value is 2881. | `2881` |
| `user` | string | Yes | The username. The default value is root. | `root` |
| `password` | string | Yes | The password corresponding to the username. If you do not specify `password` or specify an empty string, the system retrieves the password from the `SEEKDB_PASSWORD` environment variable. | |
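
The connection examples pass `password=""`, which relies on the `SEEKDB_PASSWORD` fallback described in the `password` row. The sketch below restates that rule as a small function so the precedence is explicit: an explicit non-empty password wins, otherwise the environment variable is used. The helper `resolve_password` is hypothetical (pyseekdb performs this lookup itself), and returning an empty string when the variable is unset is an assumption.

```python
import os
from typing import Optional


def resolve_password(password: Optional[str] = None) -> str:
    """Mimic the documented fallback for the `password` parameter.

    Hypothetical helper: a non-empty explicit password is used as-is; otherwise the
    value of SEEKDB_PASSWORD is used. Returning "" when the variable is unset is an
    assumption of this sketch.
    """
    if password:  # non-empty string passed explicitly
        return password
    return os.environ.get("SEEKDB_PASSWORD", "")


os.environ["SEEKDB_PASSWORD"] = "s3cret"  # normally exported in the shell instead
print(resolve_password(""))          # s3cret
print(resolve_password("explicit"))  # explicit
```

In practice, this lets you keep the password out of source code: export `SEEKDB_PASSWORD` in the shell that runs your script and leave `password` empty.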
Example: Connect to an OceanBase Database instance
```python
import pyseekdb
# Remote server mode - Database management
admin = pyseekdb.AdminClient(
host="127.0.0.1",
port=2881,
tenant="test",
user="root",
password="" # Can be retrieved from SEEKDB_PASSWORD environment variable
)
```
Parameter description:
| Parameter | Value Type | Required | Description | Example Value |
| --- | --- | --- | --- | --- |
| `host` | string | Yes | The IP address of the server where the database resides. | `127.0.0.1` |
| `port` | int | Yes | The port of the OceanBase Database instance. The default value is 2881. | `2881` |
| `tenant` | string | No | The name of the tenant. This parameter is not required for a server mode seekdb instance, but is required for an OceanBase Database instance. The default value is sys. | `test` |
| `user` | string | Yes | The username corresponding to the tenant. The default value is root. | `root` |
| `password` | string | Yes | The password corresponding to the username. If you do not specify `password` or specify an empty string, the system retrieves the password from the `SEEKDB_PASSWORD` environment variable. | |
## APIs supported when you use AdminClient to connect to a database
The following APIs are supported when you use `AdminClient` to connect to a database.
| API | Description | Documentation Link |
| --- | --- | --- |
| `create_database` | Creates a new database. |[Documentation](110.database/200.create-database-of-api.md)|
| `get_database` | Queries a specified database. |[Documentation](110.database/300.get-database-of-api.md)|
| `delete_database` | Deletes a specified database. |[Documentation](110.database/500.delete-database-of-api.md)|
| `list_databases` | Lists all databases. |[Documentation](110.database/400.list-database-of-api.md)|

@@ -0,0 +1,16 @@
---
slug: /database-overview-of-api
---
# Database Management
A database contains tables, indexes, and metadata of database objects. You can create, query, and delete databases as needed.
The following APIs are available for database operations.
| API | Description | Documentation |
|---|---|---|
| `create_database()` | Creates a database. | [Documentation](200.create-database-of-api.md) |
| `get_database()` | Gets a specified database. |[Documentation](300.get-database-of-api.md)|
| `list_databases()` | Gets the list of databases in the instance. |[Documentation](400.list-database-of-api.md)|
| `delete_database()` | Deletes a specified database.|[Documentation](500.delete-database-of-api.md)|

@@ -0,0 +1,76 @@
---
slug: /create-database-of-api
---
# create_database - Create a database
The `create_database()` function is used to create a new database.
:::info
* This interface can only be used when you are connected to the database using `AdminClient`. For more information about `AdminClient`, see [Admin Client](../100.admin-client.md).
* Currently, you cannot specify database properties when you create a database by using `create_database`; the database is created with the default property values. To create a database with specific properties, use SQL instead. For more information about how to create a database by using SQL, see [Create a database](https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000003977077).
:::
## Prerequisites
* You have installed pyseekdb. For more information about how to install pyseekdb, see [Get Started](../../10.pyseekdb-sdk/10.pyseekdb-sdk-get-started.md).
* You are connected to the database. For more information about how to connect to the database, see [Admin Client](../100.admin-client.md).
* If you are using server mode of seekdb or OceanBase Database, make sure that the connected user has the `CREATE` privilege. For more information about how to check the privileges of the current user, see [View user privileges](https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000003980135). If the user does not have this privilege, contact the administrator to grant it. For more information about how to directly grant privileges, see [Directly grant privileges](https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000003980140).
## Limitations
* In a seekdb instance or OceanBase Database, the name of each database must be globally unique.
* The maximum length of a database name is 128 characters.
* The name can contain only uppercase and lowercase letters, digits, underscores, dollar signs, and Chinese characters.
* Avoid using reserved keywords as database names.
For more information about reserved keywords, see [Reserved keywords](https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000003976774).
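
If you generate database names programmatically, a client-side pre-check can catch obvious violations before you call `create_database()`. The following sketch encodes the length and character rules above. It approximates "Chinese characters" with the CJK Unified Ideographs block (U+4E00 to U+9FFF) and does not check reserved keywords, so the server remains the final authority. The function name `is_valid_database_name` is hypothetical.

```python
import re

# Letters, digits, underscore, dollar sign, and CJK Unified Ideographs
# (an approximation of "Chinese characters"). Inside [], "$" is a literal.
_DB_NAME_CHARS = re.compile(r"[A-Za-z0-9_$\u4e00-\u9fff]+")
MAX_DB_NAME_LENGTH = 128


def is_valid_database_name(name: str) -> bool:
    """Check a name against the documented length and character rules.

    Hypothetical pre-check; reserved keywords are not checked here.
    fullmatch() is used so that a trailing newline cannot slip through.
    """
    return 0 < len(name) <= MAX_DB_NAME_LENGTH and _DB_NAME_CHARS.fullmatch(name) is not None


print(is_valid_database_name("my_database"))  # True
print(is_valid_database_name("订单_db"))       # True
print(is_valid_database_name("my-database"))  # False (hyphen is not allowed)
print(is_valid_database_name("x" * 129))      # False (too long)
```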
## Recommendations
* We recommend that you give the database a meaningful name that reflects its purpose and content. For example, you can use `Application Identifier_Sub-application name (optional)_db` as the database name.
* We recommend that you create the database and related users using the root user and assign only the necessary privileges to ensure the security and controllability of the database.
* You can create a database with a name consisting only of digits by enclosing the name in backticks (`), but this is not recommended. This is because names consisting only of digits have no clear meaning, and queries require the use of backticks (`), which can lead to unnecessary complexity and confusion.
## Request parameters
```python
create_database(name, tenant=DEFAULT_TENANT)
```
|Parameter|Type|Required|Description|Example value|
|---|---|---|---|---|
|`name`|string|Yes|The name of the database to be created. |`my_database`|
|`tenant`|string|No<ul><li>When using embedded seekdb or server mode of seekdb, this parameter is not required.</li><li>When using OceanBase Database, this parameter is required.</li></ul>|The tenant to which the database belongs. |`test_tenant`|
## Request example
```python
import pyseekdb
# Embedded mode
admin = pyseekdb.AdminClient(path="./seekdb")
# Create database
admin.create_database("my_database")
```
## Response parameters
None
## References
* [Get a specific database](300.get-database-of-api.md)
* [Delete a database](500.delete-database-of-api.md)
* [List databases](400.list-database-of-api.md)

@@ -0,0 +1,65 @@
---
slug: /get-database-of-api
---
# get_database - Get the specified database
The `get_database()` method returns information about the specified database.
:::info
This method can be used only when you connect to the database by using the `AdminClient`. For more information about the `AdminClient`, see [Admin Client](../100.admin-client.md).
:::
## Prerequisites
* You have installed pyseekdb. For more information about how to install pyseekdb, see [Quick Start](../../10.pyseekdb-sdk/10.pyseekdb-sdk-get-started.md).
* You have connected to the database. For more information about how to connect to the database, see [Admin Client](../100.admin-client.md).
## Request parameters
```python
get_database(name, tenant=DEFAULT_TENANT)
```
|Parameter|Type|Required|Description|Example value|
|---|---|---|---|---|
|`name`|string|Yes|The name of the database to be queried. |`my_database`|
|`tenant`|string|No<ul><li>When you use embedded seekdb and server mode seekdb, you do not need to specify this parameter.</li><li>When you use OceanBase Database, you must specify this parameter.</li></ul>|The tenant to which the database belongs. |test_tenant|
## Request example
```python
import pyseekdb
# Embedded mode
admin = pyseekdb.AdminClient(path="./seekdb")
# Get database
db = admin.get_database("my_database")
print(f"Database: {db.name}, Charset: {db.charset}, collation:{db.collation}, metadata:{db.metadata}")
```
## Response parameters
|Parameter|Type|Required|Description|Example value|
|---|---|---|---|---|
|`name`|string|Yes|The name of the queried database. |`my_database`|
|`tenant`|string|No<br/>This parameter does not exist when you use embedded seekdb or server mode seekdb. |The tenant to which the queried database belongs. |`test_tenant`|
|`charset`|string|No|The character set used by the queried database. |`utf8mb4`|
|`collation`|string|No|The collation used by the queried database. |`utf8mb4_general_ci`|
|`metadata`|dict|No|Reserved field. | {} |
## Response example
```python
Database: my_database, Charset: utf8mb4, collation:utf8mb4_general_ci, metadata:{}
```
## References
* [Create a database](200.create-database-of-api.md)
* [Delete a database](500.delete-database-of-api.md)
* [Get the database list](400.list-database-of-api.md)

@@ -0,0 +1,70 @@
---
slug: /list-database-of-api
---
# list_databases - Get the database list
The `list_databases()` method is used to retrieve the database list in the instance.
:::info
This API is only available when using the `AdminClient`. For more information about the `AdminClient`, see [Admin Client](../100.admin-client.md).
:::
## Prerequisites
* You have installed pyseekdb. For more information about how to install pyseekdb, see [Quick Start](../../10.pyseekdb-sdk/10.pyseekdb-sdk-get-started.md).
* You have connected to the database. For more information about how to connect to the database, see [Admin Client](../100.admin-client.md).
## Request parameters
```python
list_databases(limit=None, offset=None, tenant=DEFAULT_TENANT)
```
|Parameter|Type|Required|Description|Example value|
|---|---|---|---|---|
|`limit`|int|Optional|The maximum number of databases to return. |2|
|`offset`|int|Optional|The number of databases to skip. |3|
|`tenant`|string|Optional<ul><li>When using embedded seekdb and server mode seekdb, this parameter is not required.</li><li>When using OceanBase Database, this parameter is required. The default value is `sys`.</li></ul>|The tenant to which the queried database belongs. |test_tenant|
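
`limit` and `offset` follow the usual skip-then-take pagination pattern: with `limit=2` and `offset=3`, the first three databases are skipped and at most two are returned. The sketch below applies the same semantics to an ordinary Python list. `paginate` is a hypothetical helper, and the list contents are placeholders rather than real `list_databases()` output.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def paginate(items: Sequence[T], limit: Optional[int] = None, offset: Optional[int] = None) -> list[T]:
    """Skip `offset` items, then return at most `limit` items (None means no bound)."""
    start = offset or 0
    if start < 0 or (limit is not None and limit < 0):
        raise ValueError("limit and offset must be non-negative")
    end = None if limit is None else start + limit
    return list(items[start:end])


names = ["db_a", "db_b", "db_c", "db_d", "db_e", "db_f"]  # placeholder names
print(paginate(names, limit=2, offset=3))  # ['db_d', 'db_e']
print(paginate(names, offset=5))           # ['db_f']
```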
## Request example
```python
import pyseekdb
# Embedded mode
admin = pyseekdb.AdminClient(path="./seekdb")
# List databases: skip the first 3 and return at most 2
databases = admin.list_databases(limit=2, offset=3)
for db in databases:
    print(f"Database: {db.name}, Charset: {db.charset}, collation:{db.collation}, metadata:{db.metadata}")
```
## Response parameters
|Parameter|Type|Required|Description|Example value|
|---|---|---|---|---|
|`name`|string|Yes|The name of the queried database. |`my_database`|
|`tenant`|string|Optional<br/>This parameter is not available when you use embedded seekdb or server mode seekdb. |The tenant to which the queried database belongs. |`test_tenant`|
|`charset`|string|Optional|The character set of the queried database. |`utf8mb4`|
|`collation`|string|Optional|The collation of the queried database. |`utf8mb4_general_ci`|
|`metadata`|dict|Optional|Reserved field. No data is returned. | {} |
## Response example
```python
Database: test, Charset: utf8mb4, collation:utf8mb4_general_ci, metadata:{}
Database: my_database, Charset: utf8mb4, collation:utf8mb4_general_ci, metadata:{}
```
## References
* [Create a database](200.create-database-of-api.md)
* [Delete a database](500.delete-database-of-api.md)
* [Get a specific database](300.get-database-of-api.md)

@@ -0,0 +1,54 @@
---
slug: /delete-database-of-api
---
# delete_database - Delete a database
The `delete_database()` method is used to delete a database.
:::info
This method is only available when using the `AdminClient`. For more information about the `AdminClient`, see [Admin Client](../100.admin-client.md).
:::
## Prerequisites
* You have installed pyseekdb. For more information about how to install pyseekdb, see [Quick Start](../../10.pyseekdb-sdk/10.pyseekdb-sdk-get-started.md).
* You have connected to the database. For more information about how to connect to the database, see [Admin Client](../100.admin-client.md).
* If you are using server mode of seekdb or OceanBase Database, ensure that the user has the `DROP` privilege. For more information about how to view the privileges of the current user, see [View User Privileges](https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000003980135). If the user does not have the privilege, contact the administrator to grant the privilege. For more information about how to directly grant privileges, see [Directly Grant Privileges](https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000003980140).
## Request parameters
```python
delete_database(name, tenant=DEFAULT_TENANT)
```
|Parameter|Type|Required|Description|Example Value|
|---|---|---|---|---|
|`name`|string|Yes|The name of the database to be deleted. |my_database|
|`tenant`|string|No<ul><li>If you are using embedded seekdb or server mode of seekdb, you do not need to specify this parameter.</li><li>If you are using OceanBase Database, this parameter is required. The default value is `sys`.</li></ul>|The tenant to which the database belongs. |test_tenant|
## Request example
```python
import pyseekdb
# Embedded mode
admin = pyseekdb.AdminClient(path="./seekdb")
# Delete database
admin.delete_database("my_database")
```
## Response parameters
None
## References
* [Create a database](200.create-database-of-api.md)
* [Get a specific database](300.get-database-of-api.md)
* [Obtain a database list](400.list-database-of-api.md)

@@ -0,0 +1,93 @@
---
slug: /create-collection-of-api
---
# create_collection - Create a collection
`create_collection()` creates a new collection. A collection corresponds to a table in the database.
:::info
This API is only available when you are connected to the database using a client. For more information about the client, see [Client](../50.client.md).
:::
## Prerequisites
* You have installed pyseekdb. For more information about how to install pyseekdb, see [Quick Start](../../10.pyseekdb-sdk/10.pyseekdb-sdk-get-started.md).
* You are connected to the database. For more information about how to connect to the database, see [Client](../50.client.md).
* If you are using seekdb in server mode or OceanBase Database, make sure that the user has the `CREATE` privilege. For more information about how to view the privileges of the current user, see [View user privileges](https://en.oceanbase.com/docs/common-oceanbase-database-10000000001971368). If the user does not have the privilege, contact the administrator to grant it. For more information about how to directly grant privileges, see [Directly grant privileges](https://en.oceanbase.com/docs/common-oceanbase-database-10000000001974754).
## Define the table name
When creating a table, you must first define its name. The following requirements apply when defining the table name:
* In seekdb, each table name must be unique within the database.
* The table name cannot exceed 64 characters.
* We recommend that you give the table a meaningful name instead of using generic names such as t1 or table1. For more information about table naming conventions, see [Table naming conventions](https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000003977289).
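
The sketch below expresses the two hard rules above (at most 64 characters, unique within the database) as a pre-check you could run before `create_collection()`, assuming the collection name is used as the table name. `check_table_name` is hypothetical, `existing_names` stands in for the names already used in the target database, and the exact-match (case-sensitive) comparison is an assumption; seekdb enforces the real rules itself.

```python
MAX_TABLE_NAME_LENGTH = 64


def check_table_name(name: str, existing_names: set[str]) -> None:
    """Raise ValueError if `name` breaks the documented table-name rules.

    Hypothetical pre-check; uniqueness is tested with exact (case-sensitive)
    string comparison, which is an assumption of this sketch.
    """
    if not name or len(name) > MAX_TABLE_NAME_LENGTH:
        raise ValueError(f"table name must be 1-{MAX_TABLE_NAME_LENGTH} characters: {name!r}")
    if name in existing_names:
        raise ValueError(f"table {name!r} already exists in this database")


check_table_name("product_docs", {"my_collection"})  # passes silently
try:
    check_table_name("my_collection", {"my_collection"})
except ValueError as err:
    print(err)  # table 'my_collection' already exists in this database
```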
## Request parameters
```python
create_collection(name, configuration=configuration, embedding_function=embedding_function)
```
|Parameter|Type|Required|Description|Example value|
|---|---|---|---|---|
|`name`|string|Yes|The name of the collection to be created. |my_collection|
|`configuration`|HNSWConfiguration|No|The index configuration, which specifies the dimension and distance metric. If not provided, the default values `dimension=384` and `distance='cosine'` are used. If set to `None`, the dimension is calculated from the `embedding_function` value. |HNSWConfiguration(dimension=384, distance='cosine')|
|`embedding_function`|EmbeddingFunction|No|The function that converts data into vectors. If not provided, `DefaultEmbeddingFunction()` (384 dimensions) is used. If set to `None`, the collection has no embedding function and you must provide vectors yourself. If provided, its output dimension must match `configuration.dimension`.|DefaultEmbeddingFunction()|
:::info
When you provide `embedding_function`, the system will automatically calculate the vector dimension by calling this function. If you also provide `configuration.dimension`, it must match the dimension of `embedding_function`. Otherwise, a ValueError will be raised.
:::
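
The rule in the note above can be sketched as follows: the dimension is inferred by calling the embedding function once, and a configured dimension must agree with it. `resolve_dimension`, the probe text, and `toy_embed` are hypothetical. Raising an error when neither a dimension nor an embedding function is available is an assumption of this sketch, not documented behavior.

```python
from typing import Callable, Optional, Sequence

# Hypothetical type: an embedding function maps a batch of texts to a batch of vectors.
EmbeddingFn = Callable[[Sequence[str]], list[list[float]]]


def resolve_dimension(config_dimension: Optional[int], embedding_function: Optional[EmbeddingFn]) -> int:
    """Return the vector dimension, enforcing the dimension-consistency rule."""
    if embedding_function is None:
        # Nothing to infer from; requiring a configured dimension is an assumption.
        if config_dimension is None:
            raise ValueError("dimension is required when no embedding function is set")
        return config_dimension
    # Infer the dimension by embedding a single probe text.
    inferred = len(embedding_function(["dimension probe"])[0])
    if config_dimension is not None and config_dimension != inferred:
        raise ValueError(
            f"configuration dimension {config_dimension} does not match "
            f"embedding function dimension {inferred}"
        )
    return inferred


def toy_embed(texts: Sequence[str]) -> list[list[float]]:
    """Hypothetical 3-dimensional embedding function used only for this example."""
    return [[float(len(t)), 1.0, 0.0] for t in texts]


print(resolve_dimension(None, toy_embed))  # 3
print(resolve_dimension(3, toy_embed))     # 3
try:
    resolve_dimension(384, toy_embed)
except ValueError as err:
    print(err)
```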
## Request example
```python
import pyseekdb
from pyseekdb import DefaultEmbeddingFunction, HNSWConfiguration
# Create a client
client = pyseekdb.Client()
# Create a collection with default embedding function (auto-calculates dimension)
collection = client.create_collection(
name="my_collection"
)
# Create a collection with custom embedding function
ef = UserDefinedEmbeddingFunction()  # Placeholder: define your own embedding function class (see section 6)
config = HNSWConfiguration(dimension=384, distance='cosine') # Must match EF dimension
collection = client.create_collection(
name="my_collection2",
configuration=config,
embedding_function=ef
)
# Create a collection without embedding function (vectors must be provided manually)
collection = client.create_collection(
name="my_collection3",
configuration=HNSWConfiguration(dimension=384, distance='cosine'),
embedding_function=None # Explicitly disable embedding function
)
```
## Response parameters
None
## References
* [Query a collection](200.get-collection-of-api.md)
* [Create or query a collection](250.get-or-create-collection-of-api.md)
* [Get a collection list](300.list-collection-of-api.md)
* [Count the number of collections](350.count-collection-of-api.md)
* [Delete a collection](400.delete-collection-of-api.md)

@@ -0,0 +1,89 @@
---
slug: /get-collection-of-api
---
# get_collection - Get a collection
The `get_collection()` function is used to retrieve a specified collection.
:::info
This API is only available when connected using a Client. For more information about the Client, see [Client](../50.client.md).
:::
## Prerequisites
* You have installed pyseekdb. For more information about how to install pyseekdb, see [Quick Start](../../10.pyseekdb-sdk/10.pyseekdb-sdk-get-started.md).
* You have connected to the database. For more information about how to connect, see [Client](../50.client.md).
* The collection you want to retrieve exists. If the collection does not exist, an error will be returned.
## Request parameters
```python
client.get_collection(name, configuration=configuration, embedding_function=embedding_function)
```
|Parameter|Type|Required|Description|Example value|
|---|---|---|---|---|
|`name`|string|Yes|The name of the collection to retrieve. |my_collection|
|`configuration`|HNSWConfiguration|No|The index configuration, which specifies the dimension and distance metric. If not provided, the default value `dimension=384, distance='cosine'` will be used. If set to `None`, the dimension will be calculated from the `embedding_function` value. |HNSWConfiguration(dimension=384, distance='cosine')|
|`embedding_function`|EmbeddingFunction|No|The function used to convert text into vectors. If not provided, `DefaultEmbeddingFunction()` (384 dimensions) is used. If set to `None`, the collection has no embedding function. If provided, its output dimension must match the dimension of the collection.|DefaultEmbeddingFunction()|
:::info
When vectors are not provided for documents/texts, the embedding function set here will be used for all operations on this collection, including add, upsert, update, query, and hybrid_search.
:::
## Request example
```python
import pyseekdb
# Create a client
client = pyseekdb.Client()
# Get an existing collection (uses the default embedding function if the collection doesn't have one)
collection = client.get_collection("my_collection")
print(f"Collection: {collection.name}, dimension: {collection.dimension}, embedding_function:{collection.embedding_function}, distance:{collection.distance}, metadata:{collection.metadata}")
# Get a collection with a specific embedding function
ef = UserDefinedEmbeddingFunction()  # Placeholder: define your own embedding function class (see section 6)
collection = client.get_collection("my_collection", embedding_function=ef)
print(f"Collection: {collection.name}, dimension: {collection.dimension}, embedding_function:{collection.embedding_function}, distance:{collection.distance}, metadata:{collection.metadata}")
# Get a collection without an embedding function
collection = client.get_collection("my_collection", embedding_function=None)
# Check whether the collection exists before getting it
if client.has_collection("my_collection"):
    collection = client.get_collection("my_collection")
    print(f"Collection: {collection.name}, dimension: {collection.dimension}, embedding_function:{collection.embedding_function}, distance:{collection.distance}, metadata:{collection.metadata}")
```
## Response parameters
|Parameter|Type|Required|Description|Example value|
|---|---|---|---|---|
|`name`|string|Yes|The name of the queried collection. |my_collection|
|`dimension`|int|No|The vector dimension of the collection. |384|
|`embedding_function`|EmbeddingFunction|No|The embedding function used by the collection. |DefaultEmbeddingFunction(model_name='all-MiniLM-L6-v2')|
|`distance`|string|No|The distance metric of the vector index. |cosine|
|`metadata`|dict|No|Reserved field. No data is returned. | {} |
## Response example
```python
Collection: my_collection, dimension: 384, embedding_function:DefaultEmbeddingFunction(model_name='all-MiniLM-L6-v2'), distance:cosine, metadata:{}
Collection: my_collection1, dimension: 384, embedding_function:DefaultEmbeddingFunction(model_name='all-MiniLM-L6-v2'), distance:cosine, metadata:{}
```
## References
* [Create a collection](100.create-collection-of-api.md)
* [Create or query a collection](250.get-or-create-collection-of-api.md)
* [Get a list of collections](300.list-collection-of-api.md)
* [Count the number of collections](350.count-collection-of-api.md)
* [Delete a collection](400.delete-collection-of-api.md)

@@ -0,0 +1,79 @@
---
slug: /get-or-create-collection-of-api
---
# get_or_create_collection - Create or query a collection
The `get_or_create_collection()` function retrieves a collection, or creates it if it does not exist in the database.
:::info
This API is only available when using a client. For more information about the client, see [Client](../50.client.md).
:::
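
Conceptually, `get_or_create_collection()` is a lookup followed by a create-on-miss. The in-memory sketch below models that behavior with a dictionary. `InMemoryCatalog` and the `(collection, created)` return value are hypothetical additions for illustration (the real API returns only the collection), and the sketch assumes that settings passed for an existing collection are ignored rather than validated; check the actual behavior if your configuration may differ.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryCatalog:
    """Hypothetical in-memory stand-in for the collections in one database."""

    collections: dict[str, dict] = field(default_factory=dict)

    def get_or_create(self, name: str, **settings) -> tuple[dict, bool]:
        """Return (collection, created); an existing collection is returned unchanged."""
        if name in self.collections:
            return self.collections[name], False
        collection = {"name": name, **settings}
        self.collections[name] = collection
        return collection, True


catalog = InMemoryCatalog()
first, created = catalog.get_or_create("my_collection4", dimension=384, distance="cosine")
print(created)  # True
again, created = catalog.get_or_create("my_collection4", dimension=768)
print(created, again["dimension"])  # False 384
```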
## Prerequisites
* You have installed pyseekdb. For more information about how to install pyseekdb, see [Quick Start](../../10.pyseekdb-sdk/10.pyseekdb-sdk-get-started.md).
* You have connected to the database. For more information about how to connect, see [Client](../50.client.md).
* If you are using seekdb in server mode or OceanBase Database, ensure that the connected user has the `CREATE` privilege. For more information about how to check the privileges of the current user, see [Check User Privileges](https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000003980135). If the user does not have this privilege, contact the administrator to grant it. For more information about how to directly grant privileges, see [Directly Grant Privileges](https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000003980140).
## Define a table name
When creating a table, you need to define a table name. The following requirements must be met:
* In seekdb, each table name must be unique within the database.
* The table name must be no longer than 64 characters.
* It is recommended to use meaningful names for tables instead of generic names like t1 or table1. For more information about table naming conventions, see [Table Naming Conventions](https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000003977289).
## Request parameters
```python
get_or_create_collection(name, configuration=configuration, embedding_function=embedding_function)
```
|Parameter|Value Type|Required|Description|Example Value|
|---|---|---|---|---|
|`name`|string|Yes|The name of the collection to create or retrieve. |my_collection|
|`configuration`|HNSWConfiguration|No|The index configuration with dimension and distance metric. If not provided, the default value is used, which is `dimension=384, distance='cosine'`. If set to `None`, the dimension will be calculated from the `embedding_function` value. |HNSWConfiguration(dimension=384, distance='cosine')|
|`embedding_function`|EmbeddingFunction|No|The function that converts data into vectors. If not provided, `DefaultEmbeddingFunction()` (384 dimensions) is used. If set to `None`, the collection has no embedding function. If provided, its output dimension must match `configuration.dimension`. |DefaultEmbeddingFunction()|
:::info
When `embedding_function` is provided, the system will automatically calculate the vector dimension by calling the function. If `configuration.dimension` is also provided, it must match the dimension of `embedding_function`, otherwise a ValueError will be raised.
:::
## Request example
```python
import pyseekdb
from pyseekdb import DefaultEmbeddingFunction, HNSWConfiguration
# Create a client
client = pyseekdb.Client()
# Get or create collection (creates if doesn't exist)
collection = client.get_or_create_collection(
name="my_collection4",
configuration=HNSWConfiguration(dimension=384, distance='cosine'),
embedding_function=DefaultEmbeddingFunction()
)
```
## Response parameters
None
## References
* [Create a collection](100.create-collection-of-api.md)
* [Query a collection](200.get-collection-of-api.md)
* [Get a list of collections](300.list-collection-of-api.md)
* [Count collections](350.count-collection-of-api.md)
* [Delete a collection](400.delete-collection-of-api.md)

@@ -0,0 +1,65 @@
---
slug: /list-collection-of-api
---
# list_collections - Get a list of collections
The `list_collections()` API returns all collections in the current database.
:::info
This API is supported only when you use a Client. For more information about the Client, see [Client](../50.client.md).
:::
## Prerequisites
* You have installed pyseekdb. For more information about how to install pyseekdb, see [Quick Start](../../10.pyseekdb-sdk/10.pyseekdb-sdk-get-started.md).
* You have connected to the database. For more information about how to connect to the database, see [Client](../50.client.md).
## Request parameters
```python
client.list_collections()
```
## Request example
```python
import pyseekdb
# Create a client
client = pyseekdb.Client()
# List all collections
collections = client.list_collections()
for coll in collections:
    print(f"Collection: {coll.name}, Dimension: {coll.dimension}, embedding_function: {coll.embedding_function}, distance: {coll.distance}, metadata: {coll.metadata}")
```
## Response parameters
|Parameter|Type|Required|Description|Example value|
|---|---|---|---|---|
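|`name`|string|Yes|The name of the collection.|my_collection|
|`dimension`|int|No|The vector dimension of the collection.|384|
|`embedding_function`|EmbeddingFunction|No|The embedding function associated with the collection.|DefaultEmbeddingFunction(model_name='all-MiniLM-L6-v2')|
|`distance`|string|No|The distance metric of the collection's vector index.|cosine|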
|`metadata`|dict|No|Reserved field. No data is returned. | {} |
## Response example
```text
Collection: my_collection, Dimension: 384, embedding_function: DefaultEmbeddingFunction(model_name='all-MiniLM-L6-v2'), distance: cosine, metadata: {}
```
## References
* [Create a collection](100.create-collection-of-api.md)
* [Query a collection](200.get-collection-of-api.md)
* [Create or query a collection](250.get-or-create-collection-of-api.md)
* [Count collections](350.count-collection-of-api.md)
* [Delete a collection](400.delete-collection-of-api.md)


@@ -0,0 +1,56 @@
---
slug: /count-collection-of-api
---
# count_collection - Count the number of collections
The `count_collection()` method is used to count the number of collections in the database.
:::info
This API is only available when you are connected to the database using a Client. For more information about the Client, see [Client](../50.client.md).
:::
## Prerequisites
* You have installed pyseekdb. For more information about how to install pyseekdb, see [Quick Start](../../10.pyseekdb-sdk/10.pyseekdb-sdk-get-started.md).
* You are connected to the database. For more information about how to connect to the database, see [Client](../50.client.md).
## Request parameters
```python
client.count_collection()
```
## Request example
```python
import pyseekdb
# Create a client
client = pyseekdb.Client()
# Count collections in database
collection_count = client.count_collection()
print(f"Database has {collection_count} collections")
```
## Return parameters
Returns an `int`: the number of collections in the database.
## Return example
```text
Database has 1 collections
```
## Related operations
* [Create a collection](100.create-collection-of-api.md)
* [Query a collection](200.get-collection-of-api.md)
* [Create or query a collection](250.get-or-create-collection-of-api.md)
* [Get a collection list](300.list-collection-of-api.md)
* [Delete a collection](400.delete-collection-of-api.md)


@@ -0,0 +1,55 @@
---
slug: /delete-collection-of-api
---
# delete_collection - Delete a collection
The `delete_collection()` method deletes a specified collection.
:::info
This API is only available when you are connected to the database using a Client. For more information about the Client, see [Client](../50.client.md).
:::
## Prerequisites
* You have installed pyseekdb. For more information about how to install pyseekdb, see [Get Started](../../10.pyseekdb-sdk/10.pyseekdb-sdk-get-started.md).
* You are connected to the database. For more information about how to connect to the database, see [Client](../50.client.md).
* The collection to delete exists. If it does not exist, an error is returned.
## Request parameters
```python
client.delete_collection(name)
```
|Parameter|Type|Required|Description|Example value|
|---|---|---|---|---|
|`name`|string|Yes|The name of the collection to delete.|my_collection|
## Request example
```python
import pyseekdb
# Create a client
client = pyseekdb.Client()
# Delete a collection
client.delete_collection("my_collection")
```
## Response parameters
None
## References
* [Create a collection](100.create-collection-of-api.md)
* [Query a collection](200.get-collection-of-api.md)
* [Create or query a collection](250.get-or-create-collection-of-api.md)
* [Get a collection list](300.list-collection-of-api.md)
* [Count the number of collections](350.count-collection-of-api.md)


@@ -0,0 +1,18 @@
---
slug: /collection-overview-of-api
---
# Manage collections
In pyseekdb, a collection is similar to a table in a relational database. You can create, query, and delete collections.
The following APIs are supported for managing collections.
| API | Description | Documentation |
|---|---|---|
| `create_collection()` | Creates a collection. | [Documentation](100.create-collection-of-api.md) |
| `get_collection()` | Gets a specified collection. |[Documentation](200.get-collection-of-api.md)|
| `get_or_create_collection()` | Creates or queries a collection. If the collection does not exist in the database, it is created. If the collection exists, the corresponding result is obtained. |[Documentation](250.get-or-create-collection-of-api.md)|
| `list_collections()` | Gets the collection list of a database. |[Documentation](300.list-collection-of-api.md)|
| `count_collection()` | Counts the number of collections in a database. |[Documentation](350.count-collection-of-api.md)|
| `delete_collection()` | Deletes a specified collection.|[Documentation](400.delete-collection-of-api.md)|


@@ -0,0 +1,16 @@
---
slug: /dml-overview-of-api
---
# DML operations
DML (Data Manipulation Language) operations allow you to insert, update, and delete data in a collection.
For DML operations, you can use the following APIs.
| API | Description | Documentation |
|---|---|---|
| `add()` | Inserts a new record into a collection. | [Documentation](200.add-data-of-api.md) |
| `update()` | Updates an existing record in a collection. |[Documentation](300.update-data-of-api.md)|
| `upsert()` | Inserts a new record or updates an existing record. |[Documentation](400.upsert-data-of-api.md)|
| `delete()` | Deletes a record from a collection.|[Documentation](500.delete-data-of-api.md)|


@@ -0,0 +1,117 @@
---
slug: /add-data-of-api
---
# add - Insert data
The `add()` method inserts new data into a collection. If a record with the same ID already exists, an error is returned.
:::info
This API is only available when using a Client. For more information about the Client, see [Client](../50.client.md).
:::
## Prerequisites
* You have installed pyseekdb. For more information about how to install pyseekdb, see [Quick Start](../../10.pyseekdb-sdk/10.pyseekdb-sdk-get-started.md).
* You have connected to the database. For more information about how to connect to the database, see [Client](../50.client.md).
* If you are using seekdb or OceanBase Database in client mode, make sure that the connected user has the `INSERT` privilege on the target table. For more information about how to view the privileges of the current user, see [View user privileges](https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000003980135). If you do not have the required privilege, contact the administrator to grant it. For more information about how to directly grant a privilege, see [Directly grant a privilege](https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000003980140).
## Request parameters
```python
collection.add(
ids=ids,
embeddings=embeddings,
documents=documents,
metadatas=metadatas
)
```
|Parameter|Type|Required|Description|Example value|
|---|---|---|---|---|
|`ids`|string or List[str]|Yes|The ID of the data to be inserted. You can specify a single ID or an array of IDs.|item1|
|`embeddings`|List[float] or List[List[float]]|No|The vector or vectors of the data to be inserted. If you specify this parameter, the collection's `embedding_function` is not used. If you do not specify this parameter, you must specify `documents`, and the collection must have an `embedding_function`.|[0.1, 0.2, 0.3]|
|`documents`|string or List[str]|No|The document or documents to be inserted. If you do not specify `embeddings`, `documents` are converted to vectors using the collection's `embedding_function`.|"This is a document"|
|`metadatas`|dict or List[dict]|No|The metadata of the data to be inserted: a single dict or a list of dicts.|`{"category": "AI", "score": 95}`|
:::info
The `embedding_function` associated with the collection is set during `create_collection()` or `get_collection()`. You cannot override it for each operation.
:::
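Each argument can be a single value or a list, and the table ties `embeddings`, `documents`, and the collection's `embedding_function` together. The sketch below shows one way those rules line up record by record. The helper name `normalize_add_args()`, the record layout, and the length checks are assumptions for illustration, not pyseekdb's internal logic.

```python
from typing import Any, Dict, List, Optional, Union

Vector = List[float]


def normalize_add_args(
    ids: Union[str, List[str]],
    embeddings: Optional[Union[Vector, List[Vector]]] = None,
    documents: Optional[Union[str, List[str]]] = None,
    metadatas: Optional[Union[Dict[str, Any], List[Dict[str, Any]]]] = None,
    has_embedding_function: bool = False,
) -> List[Dict[str, Any]]:
    """Turn single-or-batch add() arguments into one record dict per ID (hypothetical helper)."""
    # A single string ID means every other argument describes a single record.
    single = isinstance(ids, str)
    id_list = [ids] if single else list(ids)
    if single:
        emb_list = None if embeddings is None else [embeddings]
        doc_list = None if documents is None else [documents]
        meta_list = None if metadatas is None else [metadatas]
    else:
        emb_list, doc_list, meta_list = embeddings, documents, metadatas

    # Vectors come either from `embeddings` or from `documents` via the embedding function.
    if emb_list is None and (doc_list is None or not has_embedding_function):
        raise ValueError("provide embeddings, or documents with a collection embedding_function")

    # Every provided list must have one entry per ID.
    for name, values in (("embeddings", emb_list), ("documents", doc_list), ("metadatas", meta_list)):
        if values is not None and len(values) != len(id_list):
            raise ValueError(f"{name} has {len(values)} entries but ids has {len(id_list)}")

    return [
        {
            "id": record_id,
            "embedding": None if emb_list is None else emb_list[i],
            "document": None if doc_list is None else doc_list[i],
            "metadata": None if meta_list is None else meta_list[i],
        }
        for i, record_id in enumerate(id_list)
    ]
```

For example, `normalize_add_args(["vec1", "vec2"], embeddings=[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])` yields two records without documents, which corresponds to the "Add with only embeddings" case in the request example.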
## Request example
```python
import pyseekdb
from pyseekdb import DefaultEmbeddingFunction, HNSWConfiguration
# Create a client
client = pyseekdb.Client()
collection = client.create_collection(
name="my_collection",
configuration=HNSWConfiguration(dimension=3, distance='cosine'),
embedding_function=None
)
# Add single item
collection.add(
ids="item1",
embeddings=[0.1, 0.2, 0.3],
documents="This is a document",
metadatas={"category": "AI", "score": 95}
)
# Add multiple items
collection.add(
ids=["item4", "item2", "item3"],
embeddings=[
[0.1, 0.2, 0.4],
[0.4, 0.5, 0.6],
[0.7, 0.8, 0.9]
],
documents=[
"Document 1",
"Document 2",
"Document 3"
],
metadatas=[
{"category": "AI", "score": 95},
{"category": "ML", "score": 88},
{"category": "DL", "score": 92}
]
)
# Add with only embeddings
collection.add(
ids=["vec1", "vec2"],
embeddings=[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
)
collection1 = client.create_collection(
name="my_collection1"
)
# Add with only documents - embeddings auto-generated by embedding_function
# Requires: collection must have embedding_function set
collection1.add(
ids=["doc1", "doc2"],
documents=["Text document 1", "Text document 2"],
metadatas=[{"tag": "A"}, {"tag": "B"}]
)
```
## Response parameters
None
## References
* [Update data](300.update-data-of-api.md)
* [Update or insert data](400.upsert-data-of-api.md)
* [Delete data](500.delete-data-of-api.md)


@@ -0,0 +1,88 @@
---
slug: /update-data-of-api
---
# update - Update data
The `update()` method updates existing records in a collection. The records must already exist; otherwise, an error is raised.
:::info
This API is only available when using a Client. For more information about the Client, see [Client](../50.client.md).
:::
## Prerequisites
* You have installed pyseekdb. For more information about how to install pyseekdb, see [Get Started](../../10.pyseekdb-sdk/10.pyseekdb-sdk-get-started.md).
* You have connected to the database. For more information about how to connect, see [Client](../50.client.md).
* If you are using seekdb in client mode or OceanBase Database, make sure that the connected user has the `UPDATE` privilege on the target table. For more information about how to view the privileges of the current user, see [View User Privileges](https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000003980135). If you do not have this privilege, contact the administrator to grant it to you. For more information about how to directly grant privileges, see [Directly Grant Privileges](https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000003980140).
## Request parameters
```python
collection.update(
ids=ids,
embeddings=embeddings,
documents=documents,
metadatas=metadatas
)
```
|Parameter|Type|Required|Description|Example value|
|---|---|---|---|---|
|`ids`|string or List[str]|Yes|The ID to be modified. It can be a single ID or an array of IDs.|item1|
|`embeddings`|List[float] or List[List[float]]|No|The new vectors. If provided, they will be used directly (ignoring `embedding_function`). If not provided, you can provide `documents` to automatically generate vectors.|[[0.9, 0.8, 0.7], [0.6, 0.5, 0.4]]|
|`documents`|string or List[str]|No|The new documents. If `embeddings` are not provided, `documents` are converted to vectors using the collection's `embedding_function`.|"New document text"|
|`metadatas`|dict or List[dict]|No|The new metadata.|`{"category": "AI"}`|
:::info
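You can update `metadatas` alone, without providing `embeddings` or `documents`. When `documents` are provided without `embeddings`, the vectors are generated with the `embedding_function` associated with the collection.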
:::
## Request example
```python
import pyseekdb
# Create a client
client = pyseekdb.Client()
collection = client.get_collection("my_collection")
collection1 = client.get_collection("my_collection1")
# Update single item
collection.update(
ids="item1",
metadatas={"category": "AI", "score": 98} # Update metadata only
)
# Update multiple items
collection.update(
ids=["item1", "item2"],
embeddings=[[0.9, 0.8, 0.7], [0.6, 0.5, 0.4]], # Update embeddings
documents=["Updated document 1", "Updated document 2"] # Update documents
)
# Update with documents only - embeddings auto-generated by embedding_function
# Requires: collection must have embedding_function set
collection1.update(
ids="doc1",
documents="New document text", # Embeddings will be auto-generated
metadatas={"category": "AI"}
)
```
## Response parameters
None
## References
* [Insert data](200.add-data-of-api.md)
* [Update or insert data](400.upsert-data-of-api.md)
* [Delete data](500.delete-data-of-api.md)


@@ -0,0 +1,93 @@
---
slug: /upsert-data-of-api
---
# upsert - Update or insert data
The `upsert()` method is used to insert new records or update existing records. If a record with the given ID already exists, it will be updated; otherwise, a new record will be inserted.
:::info
This API is only available when using a Client connection. For more information about the Client, see [Client](../50.client.md).
:::
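The difference between `add()`, `update()`, and `upsert()` comes down to how each treats an ID that does or does not exist yet. The toy in-memory model below illustrates those documented rules. `InMemoryStore` is hypothetical, `KeyError` stands in for whatever error pyseekdb actually raises, and keeping fields that an update does not mention is an assumption of this sketch.

```python
from typing import Any, Dict


class InMemoryStore:
    """Toy model of the documented add/update/upsert rules (hypothetical, not pyseekdb)."""

    def __init__(self) -> None:
        self.records: Dict[str, Dict[str, Any]] = {}

    def add(self, record_id: str, **fields: Any) -> None:
        # add(): the ID must not exist yet.
        if record_id in self.records:
            raise KeyError(f"record {record_id!r} already exists")
        self.records[record_id] = dict(fields)

    def update(self, record_id: str, **fields: Any) -> None:
        # update(): the ID must already exist; only the given fields change.
        if record_id not in self.records:
            raise KeyError(f"record {record_id!r} does not exist")
        self.records[record_id].update(fields)

    def upsert(self, record_id: str, **fields: Any) -> None:
        # upsert(): update when the ID exists, otherwise insert.
        if record_id in self.records:
            self.update(record_id, **fields)
        else:
            self.add(record_id, **fields)
```

Calling `upsert()` twice with the same ID therefore never fails, whereas a second `add()` with that ID does.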
## Prerequisites
* You have installed pyseekdb. For more information about how to install pyseekdb, see [Get Started](../../10.pyseekdb-sdk/10.pyseekdb-sdk-get-started.md).
* You have connected to the database. For more information about how to connect, see [Client](../50.client.md).
* If you are using seekdb or OceanBase Database in client mode, ensure that the connected user has the `INSERT` and `UPDATE` privileges on the target table. For more information about how to view the current user privileges, see [View user privileges](https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000003980135). If the user does not have the required privileges, contact the administrator to grant them. For more information about how to directly grant privileges, see [Directly grant privileges](https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000003980140).
## Request parameters
```python
collection.upsert(
ids=ids,
embeddings=embeddings,
documents=documents,
metadatas=metadatas
)
```
|Parameter|Type|Required|Description|Example value|
|---|---|---|---|---|
|`ids`|string or List[str]|Yes|The ID to be added or modified. It can be a single ID or an array of IDs.|item1|
|`embeddings`|List[float] or List[List[float]]|No|The vectors. If provided, they will be used directly (ignoring `embedding_function`). If not provided, you can provide `documents` to automatically generate vectors.|[0.1, 0.2, 0.3]|
|`documents`|string or List[str]|No|The documents. If `embeddings` are not provided, `documents` are converted to vectors using the collection's `embedding_function`.|"Document text"|
|`metadatas`|dict or List[dict]|No|The metadata. |`{"category": "AI"}`|
## Request example
```python
import pyseekdb
# Create a client
client = pyseekdb.Client()
collection = client.get_collection("my_collection")
collection1 = client.get_collection("my_collection1")
# Upsert single item (insert or update)
collection.upsert(
ids="item1",
embeddings=[0.1, 0.2, 0.3],
documents="Document text",
metadatas={"category": "AI", "score": 95}
)
# Upsert multiple items
collection.upsert(
ids=["item1", "item2", "item3"],
embeddings=[
[0.1, 0.2, 0.3],
[0.4, 0.5, 0.6],
[0.7, 0.8, 0.9]
],
documents=["Doc 1", "Doc 2", "Doc 3"],
metadatas=[
{"category": "AI"},
{"category": "ML"},
{"category": "DL"}
]
)
# Upsert with documents only - embeddings auto-generated by embedding_function
# Requires: collection must have embedding_function set
collection1.upsert(
ids=["item1", "item2"],
documents=["Document 1", "Document 2"],
metadatas=[{"category": "AI"}, {"category": "ML"}]
)
```
## Response parameters
None
## References
* [Insert data](200.add-data-of-api.md)
* [Update data](300.update-data-of-api.md)
* [Delete data](500.delete-data-of-api.md)


@@ -0,0 +1,87 @@
---
slug: /delete-data-of-api
---
# delete - Delete data
`delete()` is used to delete records from a collection. You can delete records by ID, metadata filter, or document filter.
:::info
This API is only available when you are connected to the database using a Client. For more information about the Client, see [Client](../50.client.md).
:::
## Prerequisites
* You have installed pyseekdb. For more information about how to install pyseekdb, see [Quick Start](../../10.pyseekdb-sdk/10.pyseekdb-sdk-get-started.md).
* You are connected to the database. For more information about how to connect to the database, see [Client](../50.client.md).
* If you are using seekdb or OceanBase Database in client mode, make sure that the connected user has the `DELETE` privilege on the target table. For more information about how to view the privileges of the current user, see [View user privileges](https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000003980135). If you do not have this privilege, contact the administrator to grant it to you. For more information about how to directly grant privileges, see [Directly grant privileges](https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000003980140).
## Request parameters
```python
collection.delete(
    ids=ids,
    where=where,
    where_document=where_document
)
```
|Parameter|Type|Required|Description|Example value|
|---|---|---|---|---|
|`ids`|string or List[str]|Optional|The ID of the record to be deleted. You can specify a single ID or an array of IDs.|item1|
|`where`|dict|Optional|The metadata filter.|`{"category": {"$eq": "AI"}}`|
|`where_document`|dict|Optional|The document filter.|`{"$contains": "obsolete"}`|
:::info
At least one of the `ids`, `where`, or `where_document` parameters must be specified.
:::
## Request examples
```python
import pyseekdb
# Create a client
client = pyseekdb.Client()
collection = client.get_collection("my_collection")
# Delete by IDs
collection.delete(ids=["item1", "item2", "item3"])
# Delete by single ID
collection.delete(ids="item1")
# Delete by metadata filter
collection.delete(where={"category": {"$eq": "AI"}})
# Delete by comparison operator
collection.delete(where={"score": {"$lt": 50}})
# Delete by document filter
collection.delete(where_document={"$contains": "obsolete"})
# Delete with combined filters
collection.delete(
where={"category": {"$eq": "AI"}},
where_document={"$contains": "deprecated"}
)
```
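To make the selection rules concrete, the toy function below computes which IDs a `delete()` call would remove from an in-memory dict of records. It is a hypothetical sketch: it models only the `$eq` metadata operator and the `$contains` document operator, and it assumes that combined selectors must all match (AND semantics), which the examples suggest but this page does not state explicitly.

```python
from typing import Any, Dict, List, Optional, Union


def records_to_delete(
    records: Dict[str, Dict[str, Any]],
    ids: Optional[Union[str, List[str]]] = None,
    where: Optional[Dict[str, Dict[str, Any]]] = None,
    where_document: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Return the IDs a delete() call would remove (toy model, AND semantics assumed)."""
    if ids is None and where is None and where_document is None:
        raise ValueError("specify at least one of ids, where, or where_document")
    id_set = None if ids is None else ({ids} if isinstance(ids, str) else set(ids))

    matched: List[str] = []
    for record_id, record in records.items():
        if id_set is not None and record_id not in id_set:
            continue
        # Only the $eq metadata operator is modeled here.
        if where is not None and any(
            record["metadata"].get(field) != cond["$eq"] for field, cond in where.items()
        ):
            continue
        # Only the $contains document operator is modeled here.
        if where_document is not None and where_document["$contains"] not in record["document"]:
            continue
        matched.append(record_id)
    return matched
```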
## Response parameters
None
## References
* [Insert data](200.add-data-of-api.md)
* [Update data](300.update-data-of-api.md)
* [Update or insert data](400.upsert-data-of-api.md)


@@ -0,0 +1,15 @@
---
slug: /dql-overview-of-api
---
# Overview of DQL
DQL (Data Query Language) operations allow you to retrieve data from collections using various query methods.
For DQL operations, the following API interfaces are supported.
| API Interface | Description | Documentation Link |
|---|---|---|
| `query()` | A vector similarity search method. | [Documentation](200.query-interfaces-of-api.md) |
| `get()` | Retrieves records from a collection by ID, metadata filter, or document filter, without performing vector similarity search. | [Documentation](300.get-interfaces-of-api.md) |
| `hybrid_search()` | Combines full-text search and vector similarity search using a ranking method. | [Documentation](400.hybrid-search-of-api.md) |


@@ -0,0 +1,161 @@
---
slug: /query-interfaces-of-api
---
# query - Vector query
The `query()` method performs a vector similarity search to find the documents most similar to the query vector.
:::info
This interface is only available when using the Client. For more information about the Client, see [Client](../50.client.md).
:::
## Prerequisites
* You have installed pyseekdb. For more information about how to install pyseekdb, see [Get Started](../../10.pyseekdb-sdk/10.pyseekdb-sdk-get-started.md).
* You have connected to the database. For more information about how to connect to the database, see [Client](../50.client.md).
* You have created a collection and inserted data. For more information about how to create a collection and insert data, see [create_collection - Create a collection](../200.collection/100.create-collection-of-api.md) and [add - Insert data](../300.dml/200.add-data-of-api.md).
## Request parameters
```python
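collection.query(query_embeddings=query_embeddings, query_texts=query_texts, n_results=n_results,
                 where=where, where_document=where_document, include=include)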
```
|Parameter|Value type|Required|Description|Example value|
|---|---|---|---|---|
|`query_embeddings`|List[float] or List[List[float]] |No|A single vector, or a list of vectors for a batch query. If provided, it is used directly and the `embedding_function` is not called. If not provided, `query_texts` must be provided, and the collection must have an `embedding_function`.|[1.0, 2.0, 3.0]|
|`query_texts`|str or List[str]|No|A single query text, or a list of texts for a batch query. Used when `query_embeddings` is not provided; the texts are converted to vectors with the collection's `embedding_function`.|["my query text"]|
|`n_results`|int|No|The number of similar results to return. Default value: 10.|3|
|`where`|dict |No|Metadata filter conditions.|`{"category": {"$eq": "AI"}}`|
|`where_document`|dict|No|Document filter conditions.|`{"$contains": "machine"}`|
|`include`|List[str]|No|List of fields to include: `["documents", "metadatas", "embeddings"]`|["documents", "metadatas", "embeddings"]|
:::info
The `embedding_function` used is associated with the collection (set during `create_collection()` or `get_collection()`). You cannot override it for each operation.
:::
## Request example
```python
import pyseekdb
# Create a client
client = pyseekdb.Client()
collection = client.get_collection("my_collection")
collection1 = client.get_collection("my_collection1")
# Basic vector similarity query (embedding_function not used)
results = collection.query(
query_embeddings=[1.0, 2.0, 3.0],
n_results=3
)
# Iterate over results
for i in range(len(results["ids"][0])):
print(f"ID: {results['ids'][0][i]}, Distance: {results['distances'][0][i]}")
if results.get("documents"):
print(f"Document: {results['documents'][0][i]}")
if results.get("metadatas"):
print(f"Metadata: {results['metadatas'][0][i]}")
# Query by texts - vectors auto-generated by embedding_function
# Requires: collection must have embedding_function set
results = collection1.query(
query_texts=["my query text"],
n_results=10
)
# The collection's embedding_function will automatically convert query_texts to query_embeddings
# Query by multiple texts (batch query)
results = collection1.query(
query_texts=["query text 1", "query text 2"],
n_results=5
)
# Returns dict with lists of lists, one list per query text
for i in range(len(results["ids"])):
print(f"Query {i}: {len(results['ids'][i])} results")
# Query with metadata filter (using query_texts)
results = collection1.query(
query_texts=["AI research"],
where={"category": {"$eq": "AI"}},
n_results=5
)
# Query with comparison operator (using query_texts)
results = collection1.query(
query_texts=["machine learning"],
where={"score": {"$gte": 90}},
n_results=5
)
# Query with document filter (using query_texts)
results = collection1.query(
query_texts=["neural networks"],
where_document={"$contains": "machine learning"},
n_results=5
)
# Query with combined filters (using query_texts)
results = collection1.query(
query_texts=["AI research"],
where={"category": {"$eq": "AI"}, "score": {"$gte": 90}},
where_document={"$contains": "machine"},
n_results=5
)
# Query with multiple vectors (batch query)
results = collection.query(
query_embeddings=[[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]],
n_results=2
)
# Returns dict with lists of lists, one list per query vector
for i in range(len(results["ids"])):
print(f"Query {i}: {len(results['ids'][i])} results")
# Query with specific fields
results = collection.query(
query_embeddings=[1.0, 2.0, 3.0],
include=["documents", "metadatas", "embeddings"],
n_results=3
)
```
## Return parameters
|Parameter|Value type|Required|Description|Example value|
|---|---|---|---|---|
|`ids`|List[List[str]]|Yes|The IDs of the matched records, with one inner list per query.|[["vec1", "vec2"]]|
|`embeddings`|List[List[List[float]]]|No|The vectors of the matched records, with one inner list per query.|[[[1.0, 2.0, 3.0]]]|
|`documents`|List[List[str]]|No|The documents of the matched records, with one inner list per query.|[["Document text"]]|
|`metadatas`|List[List[Dict]]|No|The metadata of the matched records, with one inner list per query.|[[{"category": "AI"}]]|
|`distances`|List[List[float]]|No|The distance between each matched record and the query vector, with one inner list per query. Smaller values indicate more similar records.|[[0.0, 0.025]]|
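Because every returned field holds one inner list per query, results from a batch query are often easier to handle as flat rows. The helper below is a hypothetical sketch, not part of pyseekdb; it assumes the dict shape described in the table and skips fields that are absent or `None`.

```python
from typing import Any, Dict, List


def flatten_query_results(results: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn a batched query() result into one row per (query, hit) pair.

    Hypothetical helper: assumes every field is a list with one inner list per query.
    """
    rows: List[Dict[str, Any]] = []
    for q, ids in enumerate(results["ids"]):
        for i, record_id in enumerate(ids):
            row: Dict[str, Any] = {"query": q, "id": record_id}
            for field in ("distances", "documents", "metadatas", "embeddings"):
                values = results.get(field)
                # Skip fields that were not returned (absent, None, or empty).
                if values and values[q] is not None:
                    row[field[:-1]] = values[q][i]  # "distances" -> "distance", etc.
            rows.append(row)
    return rows
```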
## Return example
```python
ID: vec1, Distance: 0.0
Document: None
Metadata: {}
ID: vec2, Distance: 0.025368153802923787
Document: None
Metadata: {}
Query 0: 4 results
Query 1: 4 results
Query 0: 2 results
Query 1: 2 results
```
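The distances in the example output are consistent with cosine distance (1 minus cosine similarity), which matches the `distance='cosine'` configuration of `my_collection` in the add example, assuming the query vector `[1.0, 2.0, 3.0]` was compared with `vec1` (`[1.0, 2.0, 3.0]`) and `vec2` (`[4.0, 5.0, 6.0]`). The following plain-Python check recomputes them; it is not seekdb's internal implementation.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


query = [1.0, 2.0, 3.0]
print(cosine_distance(query, [1.0, 2.0, 3.0]))  # vec1: identical direction, distance ~0
print(cosine_distance(query, [4.0, 5.0, 6.0]))  # vec2: approximately 0.025368
```

Floating-point rounding can make the first value differ from `0.0` by a tiny amount, so compare distances with a tolerance rather than exact equality.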
## Related operations
* [get - Retrieve](300.get-interfaces-of-api.md)
* [Hybrid search](400.hybrid-search-of-api.md)
* [Operators](500.filter-operators-of-api.md)


@@ -0,0 +1,127 @@
---
slug: /get-interfaces-of-api
---
# get - Retrieve
`get()` is used to retrieve documents from a collection without performing vector similarity search.
It supports filtering by IDs, metadata, and documents.
:::info
This interface is only available when using the Client. For more information about the Client, see [Client](../50.client.md).
:::
## Prerequisites
* You have installed pyseekdb. For more information about how to install pyseekdb, see [Get Started](../../10.pyseekdb-sdk/10.pyseekdb-sdk-get-started.md).
* You have connected to the database. For more information about how to connect to the database, see [Client](../50.client.md).
* You have created a collection and inserted data. For more information about how to create a collection and insert data, see [create_collection - Create a collection](../200.collection/100.create-collection-of-api.md) and [add - Insert data](../300.dml/200.add-data-of-api.md).
## Request parameters
```python
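collection.get(ids=ids, where=where, where_document=where_document, limit=limit, offset=offset,
               include=include)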
```
|Parameter|Type|Required|Description|Example value|
|---|---|---|---|---|
|`ids`|str or List[str]|No|The ID or list of IDs to retrieve.|["1", "2"]|
|`where`|dict |No|The metadata filter. |`{"category": {"$eq": "AI"}}`|
|`where_document`|dict|No|The document filter. |`{"$contains": "machine"}`|
|`limit`|int|No|The maximum number of results to return.|10|
|`offset`|int|No|The number of results to skip, used for pagination.|1|
|`include`|List[str]|No|The list of fields to include: `["documents", "metadatas", "embeddings"]`. |["documents", "metadatas", "embeddings"]|
:::info
If no parameters are provided, all data is returned.
:::
## Request example
```python
import pyseekdb
# Create a client
client = pyseekdb.Client()
collection = client.get_collection("my_collection")
# Get by single ID
results = collection.get(ids="123")
# Get by multiple IDs
results = collection.get(ids=["1", "2", "3"])
# Get by metadata filter
results = collection.get(
where={"category": {"$eq": "AI"}},
limit=10
)
# Get by comparison operator
results = collection.get(
where={"score": {"$gte": 90}},
limit=10
)
# Get by $in operator
results = collection.get(
where={"tag": {"$in": ["ml", "python"]}},
limit=10
)
# Get by logical operators ($or)
results = collection.get(
where={
"$or": [
{"category": {"$eq": "AI"}},
{"tag": {"$eq": "python"}}
]
},
limit=10
)
# Get by document content filter
results = collection.get(
where_document={"$contains": "machine learning"},
limit=10
)
# Get with combined filters
results = collection.get(
where={"category": {"$eq": "AI"}},
where_document={"$contains": "machine"},
limit=10
)
# Get with pagination
results = collection.get(limit=2, offset=1)
# Get with specific fields
results = collection.get(
ids=["1", "2"],
include=["documents", "metadatas", "embeddings"]
)
# Get all data (up to limit)
results = collection.get(limit=100)
```
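The filters in the example combine field conditions, comparison operators, `$in`, and `$or`, together with `limit` and `offset` for pagination. The toy evaluator below illustrates those semantics on in-memory metadata dicts. `matches()` and `apply_get()` are hypothetical names; treating several top-level keys as an implicit AND follows the combined-filter examples, and the result ordering used for pagination here (input order) is an assumption.

```python
from typing import Any, Dict, List, Optional

# Hypothetical evaluator for the metadata filter syntax used in the examples.
_COMPARATORS = {
    "$eq": lambda value, operand: value == operand,
    "$ne": lambda value, operand: value != operand,
    "$gt": lambda value, operand: value is not None and value > operand,
    "$gte": lambda value, operand: value is not None and value >= operand,
    "$lt": lambda value, operand: value is not None and value < operand,
    "$lte": lambda value, operand: value is not None and value <= operand,
    "$in": lambda value, operand: value in operand,
}


def matches(metadata: Dict[str, Any], where: Dict[str, Any]) -> bool:
    """Return True if `metadata` satisfies the `where` filter (toy model)."""
    for key, condition in where.items():
        if key == "$or":
            # At least one sub-filter must match.
            if not any(matches(metadata, sub) for sub in condition):
                return False
        else:
            # Field condition: {"field": {"$op": operand, ...}}; all operators must hold.
            value = metadata.get(key)
            for op, operand in condition.items():
                if not _COMPARATORS[op](value, operand):
                    return False
    return True


def apply_get(records: List[Dict[str, Any]], where: Optional[Dict[str, Any]] = None,
              limit: Optional[int] = None, offset: int = 0) -> List[Dict[str, Any]]:
    """Filter records, then skip `offset` matches and return at most `limit` of them."""
    hits = [r for r in records if where is None or matches(r, where)]
    end = None if limit is None else offset + limit
    return hits[offset:end]
```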
## Response parameters
* If a single ID is provided: the result contains the record for that ID.
* If multiple IDs are provided: A list of QueryResult objects, one for each ID.
* If filters are provided: A QueryResult object containing all matching results.
## Related operations
* [Vector query](200.query-interfaces-of-api.md)
* [Hybrid search](400.hybrid-search-of-api.md)
* [Operators](500.filter-operators-of-api.md)


@@ -0,0 +1,140 @@
---
slug: /hybrid-search-of-api
---
# hybrid_search - Hybrid search
`hybrid_search()` combines full-text search and vector similarity search, and merges the two result lists with a ranking method.
:::info
This API is only available when using the Client. For more information about the Client, see [Client](../50.client.md).
:::
## Prerequisites
* You have installed pyseekdb. For more information about how to install pyseekdb, see [Get Started](../../10.pyseekdb-sdk/10.pyseekdb-sdk-get-started.md).
* You have connected to the database. For more information about how to connect to the database, see [Client](../50.client.md).
* You have created a collection and inserted data. For more information about how to create a collection and insert data, see [create_collection - Create a collection](../200.collection/100.create-collection-of-api.md) and [add - Insert Data](../300.dml/200.add-data-of-api.md).
## Request parameters
```python
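collection.hybrid_search(
    query={
        "where_document": where_document,
        "where": where,
        "n_results": n_results,
    },
    knn={
        "query_texts": query_texts,  # or "query_embeddings": query_embeddings
        "where": where,
        "n_results": n_results,
    },
    rank=rank,
    n_results=n_results,
    include=include
)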
```
* query: full-text search configuration, including the following parameters:
|Parameter|Type|Required|Description|Example value|
|---|---|---|---|---|
|`where`|dict |Optional|Metadata filter conditions. |`{"category": {"$eq": "AI"}}`|
|`where_document`|dict|Optional|Document filter conditions. |`{"$contains": "machine"}`|
|`n_results`|int|Yes|Number of results for full-text search.|10|
* knn: vector search configuration, including the following parameters:
|Parameter|Type|Required|Description|Example value|
|---|---|---|---|---|
|`query_embeddings`|List[float] or List[List[float]] |Optional|A single vector, or a list of vectors for batch queries. If provided, it is used directly and the `embedding_function` is not called. If not provided, `query_texts` must be provided, and the collection must have an `embedding_function`.|[1.0, 2.0, 3.0]|
|`query_texts`|str or List[str]|Optional|A single query text or a list of query texts. Used when `query_embeddings` is not provided; the texts are converted to vectors with the collection's `embedding_function`.|["my query text"]|
|`where`|dict |Optional|Metadata filter conditions. |`{"category": {"$eq": "AI"}}`|
|`n_results`|int|Yes|Number of results for vector search.|10|
* Other parameters are as follows:
|Parameter|Type|Required|Description|Example value|
|---|---|---|---|---|
|`rank`|dict |Optional|Ranking configuration, for example: `{"rrf": {"rank_window_size": 60, "rank_constant": 60}}`|`{"rrf": {}}`|
|`n_results`|int|Optional|Number of similar results to return. Default value: 10.|3|
|`include`|List[str]|Optional|List of fields to include: `["documents", "metadatas", "embeddings"]`.|["documents", "metadatas", "embeddings"]|
:::info
The `embedding_function` used is associated with the collection (set during `create_collection()` or `get_collection()`). You cannot override it for each operation.
:::
## Request example
```python
import pyseekdb
# Create a client
client = pyseekdb.Client()
collection = client.get_collection("my_collection")
collection1 = client.get_collection("my_collection1")
# Hybrid search with query_embeddings (embedding_function not used)
results = collection.hybrid_search(
    query={
        "where_document": {"$contains": "machine learning"},
        "n_results": 10
    },
    knn={
        "query_embeddings": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],  # Used directly
        "n_results": 10
    },
    rank={"rrf": {}},
    n_results=5
)

# Hybrid search with both full-text and vector search (using query_texts)
results = collection1.hybrid_search(
    query={
        "where_document": {"$contains": "machine learning"},
        "where": {"category": {"$eq": "science"}},
        "n_results": 10
    },
    knn={
        "query_texts": ["AI research"],  # Embedded automatically by the collection's embedding_function
        "where": {"year": {"$gte": 2020}},
        "n_results": 10
    },
    rank={"rrf": {}},  # Reciprocal Rank Fusion
    n_results=5,
    include=["documents", "metadatas", "embeddings"]
)

# Hybrid search with multiple query texts (batch)
results = collection1.hybrid_search(
    query={
        "where_document": {"$contains": "AI"},
        "n_results": 10
    },
    knn={
        "query_texts": ["machine learning", "neural networks"],  # Multiple queries
        "n_results": 10
    },
    rank={"rrf": {}},
    n_results=5
)
```
## Return parameters
A dictionary containing the search results, such as IDs, distances, metadata, and documents.
## Related operations
* [Vector query](200.query-interfaces-of-api.md)
* [get - Retrieve](300.get-interfaces-of-api.md)
* [Operators](500.filter-operators-of-api.md)

---
slug: /filter-operators-of-api
---
# Operators
Operators define the filter conditions that you pass in the `where` parameter (metadata filtering) and the `where_document` parameter (document text filtering). Each operator is written as a dictionary key that starts with `$`.
## Operator examples
### Metadata filtering (where)
#### Equal to
Use `$eq` to indicate equal to, as shown in the following example:
```python
where={"category": {"$eq": "AI"}}
```
#### Not equal to
Use `$ne` to indicate not equal to, as shown in the following example:
```python
where={"status": {"$ne": "deleted"}}
```
#### Greater than
Use `$gt` to indicate greater than, as shown in the following example:
```python
where={"score": {"$gt": 90}}
```
#### Greater than or equal to
Use `$gte` to indicate greater than or equal to, as shown in the following example:
```python
where={"score": {"$gte": 90}}
```
#### Less than
Use `$lt` to indicate less than, as shown in the following example:
```python
where={"score": {"$lt": 50}}
```
#### Less than or equal to
Use `$lte` to indicate less than or equal to, as shown in the following example:
```python
where={"score": {"$lte": 50}}
```
#### In
Use `$in` to match records whose field value equals any value in the given list, as shown in the following example:
```python
where={"tag": {"$in": ["ml", "python", "ai"]}}
```
#### Not in
Use `$nin` to match records whose field value equals none of the values in the given list, as shown in the following example:
```python
where={"tag": {"$nin": ["deprecated", "old"]}}
```
#### Logical OR
Use `$or` to indicate logical OR, as shown in the following example:
```python
where={
    "$or": [
        {"category": {"$eq": "AI"}},
        {"tag": {"$eq": "python"}}
    ]
}
```
#### Logical AND
Use `$and` to indicate logical AND, as shown in the following example:
```python
where={
    "$and": [
        {"category": {"$eq": "AI"}},
        {"score": {"$gte": 90}}
    ]
}
```
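To make the semantics of these operators concrete, the following is a minimal local sketch that evaluates a `where` filter against a single metadata dictionary in plain Python. `matches_where` and `COMPARATORS` are hypothetical names for illustration only; they are not part of pyseekdb, and the actual filtering is performed by seekdb. The sketch also assumes that a record missing the filtered field never matches.

```python
import operator
from typing import Any, Callable, Dict

# Hypothetical mapping from filter operators to Python comparisons.
COMPARATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$in": lambda value, options: value in options,
    "$nin": lambda value, options: value not in options,
}


def matches_where(metadata: Dict[str, Any], where: Dict[str, Any]) -> bool:
    """Return True if `metadata` satisfies the `where` filter (local illustration only)."""
    if "$and" in where:
        return all(matches_where(metadata, clause) for clause in where["$and"])
    if "$or" in where:
        return any(matches_where(metadata, clause) for clause in where["$or"])
    for field, condition in where.items():
        # Assumption: a record without the field never matches.
        if field not in metadata:
            return False
        for op, operand in condition.items():
            if not COMPARATORS[op](metadata[field], operand):
                return False
    return True


record = {"category": "AI", "score": 95, "tag": "python"}
print(matches_where(record, {"category": {"$eq": "AI"}}))               # True
print(matches_where(record, {"tag": {"$nin": ["deprecated", "old"]}}))  # True
print(matches_where(record, {"$and": [{"category": {"$eq": "AI"}}, {"score": {"$gte": 90}}]}))  # True
print(matches_where(record, {"score": {"$lt": 50}}))                    # False
```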
### Text filtering (where_document)
#### Full-text search (contains substring)
Use `$contains` to indicate full-text search, as shown in the following example:
```python
where_document={"$contains": "machine learning"}
```
#### Regular expression
Use `$regex` to match documents against a regular expression, as shown in the following example:
```python
where_document={"$regex": "pattern.*"}
```
#### Logical OR
Use `$or` to indicate logical OR, as shown in the following example:
```python
where_document={
    "$or": [
        {"$contains": "machine learning"},
        {"$contains": "artificial intelligence"}
    ]
}
```
#### Logical AND
Use `$and` to indicate logical AND, as shown in the following example:
```python
where_document={
    "$and": [
        {"$contains": "machine"},
        {"$contains": "learning"}
    ]
}
```
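The following sketch shows, in plain Python, how the `where_document` operators combine. `matches_where_document` is a hypothetical helper for illustration, not part of pyseekdb. It assumes `$contains` is a plain, case-sensitive substring match; seekdb's server-side full-text search may tokenize or analyze text differently.

```python
import re
from typing import Any, Dict


def matches_where_document(document: str, condition: Dict[str, Any]) -> bool:
    """Return True if `document` satisfies the `where_document` condition (local illustration only)."""
    if "$and" in condition:
        return all(matches_where_document(document, c) for c in condition["$and"])
    if "$or" in condition:
        return any(matches_where_document(document, c) for c in condition["$or"])
    if "$contains" in condition:
        # Assumption: plain, case-sensitive substring match.
        return condition["$contains"] in document
    if "$regex" in condition:
        # re.search matches the pattern anywhere in the document.
        return re.search(condition["$regex"], document) is not None
    raise ValueError(f"Unsupported where_document condition: {condition}")


doc = "An introduction to machine learning pipelines"
print(matches_where_document(doc, {"$contains": "machine learning"}))  # True
print(matches_where_document(doc, {"$and": [{"$contains": "machine"}, {"$contains": "learning"}]}))  # True
print(matches_where_document(doc, {"$or": [{"$contains": "artificial intelligence"}, {"$regex": "deep.*"}]}))  # False
```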
## Related operations
* [Vector query](200.query-interfaces-of-api.md)
* [get - Retrieve](300.get-interfaces-of-api.md)
* [Hybrid search](400.hybrid-search-of-api.md)

---
slug: /client
---
# Client
The `Client` class connects to a database in either embedded mode or server mode. It selects the connection mode automatically based on the parameters you provide: without `host`, it opens a local embedded instance; with `host` and `port`, it connects to a remote server.
:::tip
OceanBase Database is an enterprise-level, native distributed database developed entirely in-house by OceanBase. It achieves financial-grade high availability on ordinary hardware and set a new standard for automatic, lossless disaster recovery across five IDCs in three regions. It also set a record in the TPC-C benchmark with a single cluster of more than 1,500 nodes. OceanBase Database is cloud-native, strongly consistent, and highly compatible with Oracle and MySQL. For more information, see [OceanBase Database](https://www.oceanbase.com/docs/oceanbase-database-cn).
:::
## Connect to an embedded seekdb instance
Use the `Client` class to connect to a local embedded seekdb instance.
```python
import pyseekdb
# Create an embedded client; both parameters are optional
client = pyseekdb.Client(
    # path="./seekdb",  # Path to the seekdb data directory
    # database="test",  # Database name
)
```
The following table describes the parameters.
| Parameter | Value type | Required | Description | Example value |
| --- | --- | --- | --- | --- |
| `path` | string | No | The path to the seekdb data directory. seekdb stores database files in this directory and loads them when it starts. | `./seekdb` |
| `database` | string | No | The name of the database. | `test` |
## Connect to a remote server
Use the `Client` class to connect to a remote server that runs seekdb or OceanBase Database.
:::tip
Before you connect to a remote server, make sure that you have deployed a server instance of seekdb or OceanBase Database. <br/>For information about how to deploy a server instance of seekdb, see [Overview](../../../400.guides/400.deploy/50.deploy-overview.md).<br/>For information about how to deploy OceanBase Database, see [Overview](https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000003976427).
:::
Example: Connect to a server instance of seekdb
```python
import pyseekdb
# Create a remote server client (seekdb server)
client = pyseekdb.Client(
    host="127.0.0.1",  # Server host
    port=2881,         # Server port
    database="test",   # Database name
    user="root",       # Username
    password=""        # Password (if empty, read from the SEEKDB_PASSWORD environment variable)
)
```
The following table describes the parameters.
| Parameter | Value type | Required | Description | Example value |
| --- | --- | --- | --- | --- |
| `host` | string | Yes | The IP address of the server where the instance is located. | `127.0.0.1` |
| `port` | int | No | The port number of the instance. The default value is 2881. | `2881` |
| `database` | string | Yes | The name of the database. | `test` |
| `user` | string | No | The username. The default value is root. | `root` |
| `password` | string | No | The password corresponding to the user. If you do not provide the `password` parameter or specify an empty string, the system retrieves the password from the `SEEKDB_PASSWORD` environment variable. ||
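
The password fallback described above can be sketched as a small helper that builds the keyword arguments for `pyseekdb.Client`. `server_client_kwargs` is a hypothetical name used only for illustration; because the client already applies the same fallback itself, the helper simply makes the documented rule explicit and testable.

```python
import os
from typing import Any, Dict, Optional


def server_client_kwargs(
    host: str,
    database: str,
    port: int = 2881,
    user: str = "root",
    password: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for pyseekdb.Client (hypothetical helper)."""
    # Mirror the documented rule: a missing or empty password falls back to SEEKDB_PASSWORD.
    if not password:
        password = os.environ.get("SEEKDB_PASSWORD", "")
    return {"host": host, "port": port, "database": database, "user": user, "password": password}
```

You could then create the client with `pyseekdb.Client(**server_client_kwargs("127.0.0.1", "test"))`, which keeps the password out of your source code.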
Example: Connect to OceanBase Database
```python
import pyseekdb
# Create a remote server client (OceanBase Database)
client = pyseekdb.Client(
    host="127.0.0.1",  # Server host
    port=2881,         # Server port (default: 2881)
    tenant="test",     # Tenant name
    database="test",   # Database name
    user="root",       # Username (default: "root")
    password=""        # Password (if empty, read from the SEEKDB_PASSWORD environment variable)
)
```
The following table describes the parameters.
| Parameter | Value type | Required | Description | Example value |
| --- | --- | --- | --- | --- |
| `host` | string | Yes | The IP address of the server where the database is located. | `127.0.0.1` |
| `port` | int | No | The port number of OceanBase Database. The default value is 2881. | `2881` |
| `tenant` | string | No | The name of the tenant. This parameter is not required for seekdb. For OceanBase Database, the default value is sys. | `test` |
| `database` | string | Yes | The name of the database. | `test` |
| `user` | string | No | The username corresponding to the tenant. The default value is root. | `root` |
| `password` | string | No | The password corresponding to the user. If you do not provide the `password` parameter or specify an empty string, the system retrieves the password from the `SEEKDB_PASSWORD` environment variable. ||
## APIs supported when you use the Client class to connect to a database
When you use the `Client` class to connect to a database, you can call the following APIs.
| API | Description | Document link |
| --- | --- | --- |
| `create_collection()` | Creates a new collection. | [Document](200.collection/100.create-collection-of-api.md) |
| `get_collection()` | Queries a specified collection. |[Document](200.collection/200.get-collection-of-api.md)|
| `delete_collection()` | Deletes a specified collection. |[Document](200.collection/400.delete-collection-of-api.md)|
| `list_collections()` | Lists all collections in the current database.|[Document](200.collection/300.list-collection-of-api.md)|
| `get_or_create_collection()` | Queries a specified collection. If the collection does not exist, it is created.|[Document](200.collection/250.get-or-create-collection-of-api.md)|
| `count_collection()` | Queries the number of collections in the current database. |[Document](200.collection/350.count-collection-of-api.md)|