Initial commit

Zhongwei Li
2025-11-30 08:44:54 +08:00
commit eb309b7b59
133 changed files with 21979 additions and 0 deletions


@@ -0,0 +1,15 @@
---
slug: /dql-overview-of-api
---
# Overview of DQL
DQL (Data Query Language) operations allow you to retrieve data from collections using various query methods.
For DQL operations, the following API interfaces are supported.
| API Interface | Description | Documentation Link |
|---|---|---|
| `query()` | A vector similarity search method. | [Documentation](200.query-interfaces-of-api.md) |
| `get()` | Retrieves specific data from a collection by ID, document, or metadata, without vector similarity search. | [Documentation](300.get-interfaces-of-api.md) |
| `hybrid_search()` | Combines full-text search and vector similarity search using a ranking method. | [Documentation](400.hybrid-search-of-api.md) |


@@ -0,0 +1,161 @@
---
slug: /query-interfaces-of-api
---
# query - vector query
The `query()` method performs a vector similarity search to find the documents most similar to the query vector.
:::info
This interface is only available when using the Client. For more information about the Client, see [Client](../50.client.md).
:::
## Prerequisites
* You have installed pyseekdb. For more information about how to install pyseekdb, see [Get Started](../../10.pyseekdb-sdk/10.pyseekdb-sdk-get-started.md).
* You have connected to the database. For more information about how to connect to the database, see [Client](../50.client.md).
* You have created a collection and inserted data. For more information about how to create a collection and insert data, see [create_collection - Create a collection](../200.collection/100.create-collection-of-api.md) and [add - Insert data](../300.dml/200.add-data-of-api.md).
## Request parameters
```python
query()
```
|Parameter|Value type|Required|Description|Example value|
|---|---|---|---|---|
|`query_embeddings`|List[float] or List[List[float]]|No|A single vector, or a list of vectors for batch queries. If provided, it is used directly and the `embedding_function` is ignored. If omitted, `query_texts` must be provided and the collection must have an `embedding_function`.|[1.0, 2.0, 3.0]|
|`query_texts`|str or List[str]|No|A single text, or a list of texts for batch queries. The collection's `embedding_function` converts the texts to query vectors. Required if `query_embeddings` is omitted.|["my query text"]|
|`n_results`|int|No|The number of similar results to return. The default is 10.|3|
|`where`|dict |No|Metadata filter conditions.|`{"category": {"$eq": "AI"}}`|
|`where_document`|dict|No|Document filter conditions.|`{"$contains": "machine"}`|
|`include`|List[str]|No|List of fields to include: `["documents", "metadatas", "embeddings"]`|["documents", "metadatas", "embeddings"]|
:::info
The `embedding_function` used is associated with the collection (set during `create_collection()` or `get_collection()`). You cannot override it for each operation.
:::
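Because `query_embeddings` accepts either a single vector or a list of vectors, calling code may find it convenient to normalize its input to the batch form first. The helper below is a minimal sketch of that idea in plain Python. `as_batch` is a hypothetical name and is not part of pyseekdb.

```python
from typing import List, Sequence, Union

Vector = Sequence[float]


def as_batch(query_embeddings: Union[Vector, Sequence[Vector]]) -> List[List[float]]:
    """Return a list of vectors, wrapping a single vector in an outer list."""
    if len(query_embeddings) == 0:
        raise ValueError("query_embeddings must not be empty")
    first = query_embeddings[0]
    # A single vector has numbers at the top level; a batch has sequences.
    if isinstance(first, (int, float)):
        return [[float(x) for x in query_embeddings]]
    return [[float(x) for x in vector] for vector in query_embeddings]


print(as_batch([1.0, 2.0, 3.0]))
print(as_batch([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]))
```

With the input in batch form, the result can always be read as one inner list per query vector.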
## Request example
```python
import pyseekdb
# Create a client
client = pyseekdb.Client()
collection = client.get_collection("my_collection")
collection1 = client.get_collection("my_collection1")
# Basic vector similarity query (embedding_function not used)
results = collection.query(
    query_embeddings=[1.0, 2.0, 3.0],
    n_results=3
)
# Iterate over the results of the first (and only) query vector
for i in range(len(results["ids"][0])):
    print(f"ID: {results['ids'][0][i]}, Distance: {results['distances'][0][i]}")
    if results.get("documents"):
        print(f"Document: {results['documents'][0][i]}")
    if results.get("metadatas"):
        print(f"Metadata: {results['metadatas'][0][i]}")
# Query by texts - vectors auto-generated by embedding_function
# Requires: collection must have embedding_function set
results = collection1.query(
    query_texts=["my query text"],
    n_results=10
)
# The collection's embedding_function will automatically convert query_texts to query_embeddings
# Query by multiple texts (batch query)
results = collection1.query(
    query_texts=["query text 1", "query text 2"],
    n_results=5
)
# Returns dict with lists of lists, one list per query text
for i in range(len(results["ids"])):
    print(f"Query {i}: {len(results['ids'][i])} results")
# Query with metadata filter (using query_texts)
results = collection1.query(
    query_texts=["AI research"],
    where={"category": {"$eq": "AI"}},
    n_results=5
)
# Query with comparison operator (using query_texts)
results = collection1.query(
    query_texts=["machine learning"],
    where={"score": {"$gte": 90}},
    n_results=5
)
# Query with document filter (using query_texts)
results = collection1.query(
    query_texts=["neural networks"],
    where_document={"$contains": "machine learning"},
    n_results=5
)
# Query with combined filters (using query_texts)
results = collection1.query(
    query_texts=["AI research"],
    where={"category": {"$eq": "AI"}, "score": {"$gte": 90}},
    where_document={"$contains": "machine"},
    n_results=5
)
# Query with multiple vectors (batch query)
results = collection.query(
    query_embeddings=[[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]],
    n_results=2
)
# Returns dict with lists of lists, one list per query vector
for i in range(len(results["ids"])):
    print(f"Query {i}: {len(results['ids'][i])} results")
# Query with specific fields
results = collection.query(
    query_embeddings=[1.0, 2.0, 3.0],
    include=["documents", "metadatas", "embeddings"],
    n_results=3
)
```
## Return parameters
|Parameter|Value type|Required|Description|Example value|
|---|---|---|---|---|
|`ids`|List[List[str]]|Yes|The IDs of the matched records, one inner list per query vector or query text.|`[["vec1", "vec2"]]`|
|`embeddings`|List[List[List[float]]]|No|The vectors of the matched records, returned when requested through `include`.|`[[[0.1, 0.2, 0.3]]]`|
|`documents`|List[List[str]]|No|The document text of the matched records.|`[["Document text"]]`|
|`metadatas`|List[List[Dict]]|No|The metadata of the matched records.|`[[{"category": "AI"}]]`|
|`distances`|List[List[float]]|No|The distance between the query vector and each matched record.|`[[0.0, 0.025368153802923787]]`|
## Return example
```text
ID: vec1, Distance: 0.0
Document: None
Metadata: {}
ID: vec2, Distance: 0.025368153802923787
Document: None
Metadata: {}
Query 0: 4 results
Query 1: 4 results
Query 0: 2 results
Query 1: 2 results
```
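The nested lists can be awkward to work with when several queries are sent at once. The sketch below flattens a result into one row per match. It assumes the result is a dict of nested lists, as the indexing in the request example (`results["ids"][0][i]`) suggests. `flatten_results` is a hypothetical helper, and the sample dict is hand-written to mirror the return example rather than real output.

```python
from typing import Any, Dict, List


def flatten_results(results: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn the nested per-query lists into one flat list of rows."""
    rows = []
    for query_index, ids in enumerate(results["ids"]):
        for rank, record_id in enumerate(ids):
            row = {"query": query_index, "rank": rank, "id": record_id}
            # Optional fields may be missing or None, depending on `include`.
            for field in ("distances", "documents", "metadatas"):
                values = results.get(field)
                if values is not None:
                    # Drop the trailing "s": "distances" -> "distance", etc.
                    row[field[:-1]] = values[query_index][rank]
            rows.append(row)
    return rows


# Hand-written sample shaped like the return example (an assumption).
sample = {
    "ids": [["vec1", "vec2"]],
    "distances": [[0.0, 0.025368153802923787]],
    "documents": [[None, None]],
    "metadatas": [[{}, {}]],
}

for row in flatten_results(sample):
    print(row)
```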
## Related operations
* [get - Retrieve](300.get-interfaces-of-api.md)
* [Hybrid search](400.hybrid-search-of-api.md)
* [Operators](500.filter-operators-of-api.md)


@@ -0,0 +1,127 @@
---
slug: /get-interfaces-of-api
---
# get - Retrieve
`get()` is used to retrieve documents from a collection without performing vector similarity search.
It supports filtering by IDs, metadata, and documents.
:::info
This interface is only available when using the Client. For more information about the Client, see [Client](../50.client.md).
:::
## Prerequisites
* You have installed pyseekdb. For more information about how to install pyseekdb, see [Get Started](../../10.pyseekdb-sdk/10.pyseekdb-sdk-get-started.md).
* You have connected to the database. For more information about how to connect to the database, see [Client](../50.client.md).
* You have created a collection and inserted data. For more information about how to create a collection and insert data, see [create_collection - Create a collection](../200.collection/100.create-collection-of-api.md) and [add - Insert data](../300.dml/200.add-data-of-api.md).
## Request parameters
```python
get()
```
|Parameter|Type|Required|Description|Example value|
|---|---|---|---|---|
|`ids`|str or List[str]|No|The ID or list of IDs to retrieve.|["1", "2", "3"]|
|`where`|dict |No|The metadata filter. |`{"category": {"$eq": "AI"}}`|
|`where_document`|dict|No|The document filter. |`{"$contains": "machine"}`|
|`limit`|int|No|The maximum number of results to return. |10|
|`offset`|int|No|The number of results to skip for pagination. |1|
|`include`|List[str]|No|The list of fields to include: `["documents", "metadatas", "embeddings"]`. |["documents", "metadatas", "embeddings"]|
:::info
If no parameters are provided, all data is returned.
:::
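The `limit` and `offset` parameters can be combined to read a collection page by page. The sketch below shows one way to structure such a loop. It assumes that the object returned by `get()` exposes a flat `ids` list, which this page does not confirm. `iter_pages` and `fake_get` are hypothetical names, and `fake_get` stands in for `collection.get` so that the example runs without a database.

```python
from typing import Any, Callable, Dict, Iterator, List


def iter_pages(get: Callable[..., Dict[str, Any]], page_size: int) -> Iterator[List[str]]:
    """Yield lists of IDs page by page until a page comes back empty."""
    offset = 0
    while True:
        page = get(limit=page_size, offset=offset)
        ids = page["ids"]  # assumed result shape
        if not ids:
            break
        yield ids
        offset += len(ids)


# Stand-in for `collection.get`, so the sketch runs without a database.
ALL_IDS = ["1", "2", "3", "4", "5"]


def fake_get(limit: int, offset: int) -> Dict[str, Any]:
    return {"ids": ALL_IDS[offset:offset + limit]}


for ids in iter_pages(fake_get, page_size=2):
    print(ids)
```

Advancing `offset` by the number of IDs actually returned, rather than by `page_size`, keeps the loop correct when the last page is short.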
## Request example
```python
import pyseekdb
# Create a client
client = pyseekdb.Client()
collection = client.get_collection("my_collection")
# Get by single ID
results = collection.get(ids="123")
# Get by multiple IDs
results = collection.get(ids=["1", "2", "3"])
# Get by metadata filter
results = collection.get(
    where={"category": {"$eq": "AI"}},
    limit=10
)
# Get by comparison operator
results = collection.get(
    where={"score": {"$gte": 90}},
    limit=10
)
# Get by $in operator
results = collection.get(
    where={"tag": {"$in": ["ml", "python"]}},
    limit=10
)
# Get by logical operators ($or)
results = collection.get(
    where={
        "$or": [
            {"category": {"$eq": "AI"}},
            {"tag": {"$eq": "python"}}
        ]
    },
    limit=10
)
# Get by document content filter
results = collection.get(
    where_document={"$contains": "machine learning"},
    limit=10
)
# Get with combined filters
results = collection.get(
    where={"category": {"$eq": "AI"}},
    where_document={"$contains": "machine"},
    limit=10
)
# Get with pagination
results = collection.get(limit=2, offset=1)
# Get with specific fields
results = collection.get(
    ids=["1", "2"],
    include=["documents", "metadatas", "embeddings"]
)
# Get all data (up to limit)
results = collection.get(limit=100)
```
## Response parameters
* If a single ID is provided: the result contains the retrieved object for that ID.
* If multiple IDs are provided: a list of QueryResult objects, one for each ID.
* If filters are provided: a QueryResult object containing all matching results.
## Related operations
* [Vector query](200.query-interfaces-of-api.md)
* [Hybrid search](400.hybrid-search-of-api.md)
* [Operators](500.filter-operators-of-api.md)


@@ -0,0 +1,140 @@
---
slug: /hybrid-search-of-api
---
# hybrid_search - Hybrid search
`hybrid_search()` combines full-text search and vector similarity search with ranking.
:::info
This API is only available when using the Client. For more information about the Client, see [Client](../50.client.md).
:::
## Prerequisites
* You have installed pyseekdb. For more information about how to install pyseekdb, see [Get Started](../../10.pyseekdb-sdk/10.pyseekdb-sdk-get-started.md).
* You have connected to the database. For more information about how to connect to the database, see [Client](../50.client.md).
* You have created a collection and inserted data. For more information about how to create a collection and insert data, see [create_collection - Create a collection](../200.collection/100.create-collection-of-api.md) and [add - Insert Data](../300.dml/200.add-data-of-api.md).
## Request parameters
```python
hybrid_search(
    query={
        "where_document": ...,
        "where": ...,
        "n_results": ...
    },
    knn={
        "query_texts": ...,
        "where": ...,
        "n_results": ...
    },
    rank=...,
    n_results=...,
    include=...
)
```
* `query`: the full-text search configuration, with the following parameters:
|Parameter|Type|Required|Description|Example value|
|---|---|---|---|---|
|`where`|dict |Optional|Metadata filter conditions. |`{"category": {"$eq": "AI"}}`|
|`where_document`|dict|Optional|Document filter conditions. |`{"$contains": "machine"}`|
|`n_results`|int|Yes|Number of results for full-text search.|10|
* `knn`: the vector search configuration, with the following parameters:
|Parameter|Type|Required|Description|Example value|
|---|---|---|---|---|
|`query_embeddings`|List[float] or List[List[float]]|Optional|A single vector, or a list of vectors for batch queries. If provided, it is used directly and the `embedding_function` is ignored. If omitted, `query_texts` must be provided and the collection must have an `embedding_function`.|[1.0, 2.0, 3.0]|
|`query_texts`|str or List[str]|Optional|A single text, or a list of texts for batch queries. The collection's `embedding_function` converts the texts to query vectors. Required if `query_embeddings` is omitted.|["my query text"]|
|`where`|dict |Optional|Metadata filter conditions. |`{"category": {"$eq": "AI"}}`|
|`n_results`|int|Yes|Number of results for vector search.|10|
* Other parameters:
|Parameter|Type|Required|Description|Example value|
|---|---|---|---|---|
|`rank`|dict |Optional|Ranking configuration, for example: `{"rrf": {"rank_window_size": 60, "rank_constant": 60}}`|`{"rrf": {}}`|
|`n_results`|int|Optional|Number of results to return after ranking. The default is 10.|3|
|`include`|List[str]|Optional|List of fields to include: `["documents", "metadatas", "embeddings"]`.|["documents", "metadatas", "embeddings"]|
:::info
The `embedding_function` used is associated with the collection (set during `create_collection()` or `get_collection()`). You cannot override it for each operation.
:::
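The `rrf` key in the `rank` parameter refers to Reciprocal Rank Fusion. The sketch below shows the textbook form of that method, in which each document scores the sum of `1 / (rank_constant + rank)` over the ranked lists it appears in. It only illustrates the idea. How seekdb applies `rank_window_size` and `rank_constant` internally is not described on this page, and `rrf_fuse` and the sample rankings are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def rrf_fuse(rankings: List[List[str]], rank_constant: int = 60,
             rank_window_size: int = 60, n_results: int = 10) -> List[str]:
    """Fuse several ranked ID lists with textbook Reciprocal Rank Fusion."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Only the top `rank_window_size` entries of each list contribute.
        for rank, doc_id in enumerate(ranking[:rank_window_size], start=1):
            scores[doc_id] += 1.0 / (rank_constant + rank)
    # Sort by score (highest first); break ties by ID for a stable order.
    ordered = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return ordered[:n_results]


full_text_hits = ["a", "b", "c"]  # hypothetical full-text ranking
vector_hits = ["c", "a", "d"]     # hypothetical vector ranking
print(rrf_fuse([full_text_hits, vector_hits], n_results=3))  # ['a', 'c', 'b']
```

Documents that rank well in both lists ("a" and "c") come out ahead of documents that appear in only one, which is the behavior hybrid search relies on.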
## Request example
```python
import pyseekdb
# Create a client
client = pyseekdb.Client()
collection = client.get_collection("my_collection")
collection1 = client.get_collection("my_collection1")
# Hybrid search with query_embeddings (embedding_function not used)
results = collection.hybrid_search(
    query={
        "where_document": {"$contains": "machine learning"},
        "n_results": 10
    },
    knn={
        "query_embeddings": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],  # Used directly
        "n_results": 10
    },
    rank={"rrf": {}},
    n_results=5
)
# Hybrid search with both full-text and vector search (using query_texts)
results = collection1.hybrid_search(
    query={
        "where_document": {"$contains": "machine learning"},
        "where": {"category": {"$eq": "science"}},
        "n_results": 10
    },
    knn={
        "query_texts": ["AI research"],  # Will be embedded automatically
        "where": {"year": {"$gte": 2020}},
        "n_results": 10
    },
    rank={"rrf": {}},  # Reciprocal Rank Fusion
    n_results=5,
    include=["documents", "metadatas", "embeddings"]
)
# Hybrid search with multiple query texts (batch)
results = collection1.hybrid_search(
    query={
        "where_document": {"$contains": "AI"},
        "n_results": 10
    },
    knn={
        "query_texts": ["machine learning", "neural networks"],  # Multiple queries
        "n_results": 10
    },
    rank={"rrf": {}},
    n_results=5
)
```
## Return parameters
A dictionary containing the search results, including `ids`, `distances`, `metadatas`, and `documents`.
## Related operations
* [Vector query](200.query-interfaces-of-api.md)
* [get - Retrieve](300.get-interfaces-of-api.md)
* [Operators](500.filter-operators-of-api.md)


@@ -0,0 +1,151 @@
---
slug: /filter-operators-of-api
---
# Operators
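Operators combine field names and values into filter conditions for the `where` and `where_document` parameters of `query()`, `get()`, and `hybrid_search()`. Comparison operators appear inside the condition for a field, while logical operators wrap a list of conditions.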
## Operator examples
### Data filtering (where)
#### Equal to
Use `$eq` to indicate equal to, as shown in the following example:
```python
where={"category": {"$eq": "AI"}}
```
#### Not equal to
Use `$ne` to indicate not equal to, as shown in the following example:
```python
where={"status": {"$ne": "deleted"}}
```
#### Greater than
Use `$gt` to indicate greater than, as shown in the following example:
```python
where={"score": {"$gt": 90}}
```
#### Greater than or equal to
Use `$gte` to indicate greater than or equal to, as shown in the following example:
```python
where={"score": {"$gte": 90}}
```
#### Less than
Use `$lt` to indicate less than, as shown in the following example:
```python
where={"score": {"$lt": 50}}
```
#### Less than or equal to
Use `$lte` to indicate less than or equal to, as shown in the following example:
```python
where={"score": {"$lte": 50}}
```
#### In
Use `$in` to match records whose field value equals any value in the given list, as shown in the following example:
```python
where={"tag": {"$in": ["ml", "python", "ai"]}}
```
#### Not in
Use `$nin` to match records whose field value equals none of the values in the given list, as shown in the following example:
```python
where={"tag": {"$nin": ["deprecated", "old"]}}
```
#### Logical OR
Use `$or` to indicate logical OR, as shown in the following example:
```python
where={
    "$or": [
        {"category": {"$eq": "AI"}},
        {"tag": {"$eq": "python"}}
    ]
}
```
#### Logical AND
Use `$and` to indicate logical AND, as shown in the following example:
```python
where={
    "$and": [
        {"category": {"$eq": "AI"}},
        {"score": {"$gte": 90}}
    ]
}
```
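To make the semantics of the `where` operators concrete, the sketch below evaluates a filter against a single metadata dict in plain Python. It is an illustration only and not how seekdb evaluates filters. `matches` is a hypothetical helper, and two behaviors are assumptions: several fields in one filter are combined with AND, and a missing field never satisfies an ordering comparison.

```python
from typing import Any, Dict

# Comparison operators from the sections above, as plain Python predicates.
COMPARATORS = {
    "$eq": lambda value, target: value == target,
    "$ne": lambda value, target: value != target,
    "$gt": lambda value, target: value is not None and value > target,
    "$gte": lambda value, target: value is not None and value >= target,
    "$lt": lambda value, target: value is not None and value < target,
    "$lte": lambda value, target: value is not None and value <= target,
    "$in": lambda value, target: value in target,
    "$nin": lambda value, target: value not in target,
}


def matches(metadata: Dict[str, Any], where: Dict[str, Any]) -> bool:
    """Return True if `metadata` satisfies the `where` filter."""
    for key, condition in where.items():
        if key == "$and":
            if not all(matches(metadata, sub) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches(metadata, sub) for sub in condition):
                return False
        else:
            value = metadata.get(key)
            # Several operators on one field must all hold (assumption).
            for operator, target in condition.items():
                if not COMPARATORS[operator](value, target):
                    return False
    return True


record = {"category": "AI", "score": 95, "tag": "ml"}
print(matches(record, {"category": {"$eq": "AI"}, "score": {"$gte": 90}}))  # True
print(matches(record, {"tag": {"$nin": ["ml", "old"]}}))                     # False
```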
### Text filtering (where_document)
#### Full-text search (contains substring)
Use `$contains` to indicate full-text search, as shown in the following example:
```python
where_document={"$contains": "machine learning"}
```
#### Regular expression
Use `$regex` to indicate regular expression, as shown in the following example:
```python
where_document={"$regex": "pattern.*"}
```
#### Logical OR
Use `$or` to indicate logical OR, as shown in the following example:
```python
where_document={
    "$or": [
        {"$contains": "machine learning"},
        {"$contains": "artificial intelligence"}
    ]
}
```
#### Logical AND
Use `$and` to indicate logical AND, as shown in the following example:
```python
where_document={
    "$and": [
        {"$contains": "machine"},
        {"$contains": "learning"}
    ]
}
```
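The `where_document` operators can be illustrated the same way. The sketch below treats `$contains` as a case-sensitive substring test and `$regex` as a Python `re.search` pattern. Both are assumptions: a real full-text index may tokenize text or use a different regular expression dialect. `document_matches` is a hypothetical helper, not part of pyseekdb.

```python
import re
from typing import Any, Dict


def document_matches(document: str, where_document: Dict[str, Any]) -> bool:
    """Return True if `document` satisfies the `where_document` filter."""
    for operator, operand in where_document.items():
        if operator == "$contains":
            ok = operand in document  # substring test (assumption)
        elif operator == "$regex":
            ok = re.search(operand, document) is not None
        elif operator == "$and":
            ok = all(document_matches(document, sub) for sub in operand)
        elif operator == "$or":
            ok = any(document_matches(document, sub) for sub in operand)
        else:
            raise ValueError(f"unsupported operator: {operator}")
        if not ok:
            return False
    return True


text = "An introduction to machine learning"
print(document_matches(text, {"$contains": "machine learning"}))  # True
print(document_matches(text, {"$and": [{"$contains": "machine"},
                                       {"$contains": "vision"}]}))  # False
```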
## Related operations
* [Vector query](200.query-interfaces-of-api.md)
* [get - Retrieve](300.get-interfaces-of-api.md)
* [Hybrid search](400.hybrid-search-of-api.md)