---
slug: /hybrid-search-of-api
---

# hybrid_search - Hybrid search

`hybrid_search()` combines full-text search and vector similarity search with ranking.

:::info

This API is only available when using the Client. For more information about the Client, see [Client](../50.client.md).

:::

## Prerequisites

* You have installed pyseekdb. For installation instructions, see [Get Started](../../10.pyseekdb-sdk/10.pyseekdb-sdk-get-started.md).

* You have connected to the database. For connection instructions, see [Client](../50.client.md).

* You have created a collection and inserted data. For more information, see [create_collection - Create a collection](../200.collection/100.create-collection-of-api.md) and [add - Insert Data](../300.dml/200.add-data-of-api.md).

## Request parameters

```python
hybrid_search(
    query={
        "where_document": ,
        "where": ,
        "n_results":
    },
    knn={
        "query_texts": ,
        "where": ,
        "n_results":
    },
    rank=,
    n_results=,
    include=
)
```

* `query`: full-text search configuration, including the following parameters:

|Parameter|Type|Required|Description|Example value|
|---|---|---|---|---|
|`where`|dict|Optional|Metadata filter conditions.|`{"category": {"$eq": "AI"}}`|
|`where_document`|dict|Optional|Document filter conditions.|`{"$contains": "machine"}`|
|`n_results`|int|Yes|Number of results for full-text search.||

* `knn`: vector search configuration, including the following parameters:

|Parameter|Type|Required|Description|Example value|
|---|---|---|---|---|
|`query_embeddings`|List[float] or List[List[float]]|Conditional|A single vector or a list of vectors for batch queries. If provided, it is used directly and the `embedding_function` is ignored. If not provided, `query_texts` must be provided and the collection must have an `embedding_function`.|`[1.0, 2.0, 3.0]`|
|`query_texts`|str or List[str]|Conditional|A single query text or a list of query texts for batch queries. The texts are converted to vectors by the collection's `embedding_function`. Required if `query_embeddings` is not provided.|`["my query text"]`|
|`where`|dict|Optional|Metadata filter conditions.|`{"category": {"$eq": "AI"}}`|
|`n_results`|int|Yes|Number of results for vector search.||

* Other parameters are as follows:

|Parameter|Type|Required|Description|Example value|
|---|---|---|---|---|
|`rank`|dict|Optional|Ranking configuration.|`{"rrf": {"rank_window_size": 60, "rank_constant": 60}}`|
|`n_results`|int|Optional|Number of similar results to return. The default value is 10.|`3`|
|`include`|List[str]|Optional|List of fields to include in the results: `["documents", "metadatas", "embeddings"]`.|`["documents", "metadatas", "embeddings"]`|

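The `rrf` ranking option refers to Reciprocal Rank Fusion. The sketch below illustrates the general technique in plain Python so that the roles of `rank_constant` and `rank_window_size` are easier to see. It is an assumption-laden illustration, not the database's implementation: the function `rrf_fuse`, the sample document IDs, and the interpretation of `rank_window_size` as "only the top N entries of each list contribute" are all hypothetical.

```python
from collections import defaultdict


def rrf_fuse(ranked_lists, rank_constant=60, rank_window_size=60, n_results=10):
    """Fuse several ranked ID lists with Reciprocal Rank Fusion (illustrative only)."""
    scores = defaultdict(float)
    for ranked_ids in ranked_lists:
        # Assumption: only the top `rank_window_size` entries of each list count.
        for rank, doc_id in enumerate(ranked_ids[:rank_window_size], start=1):
            # Each list contributes 1 / (rank_constant + rank) for the document.
            scores[doc_id] += 1.0 / (rank_constant + rank)
    # Highest fused score first; ties are broken by ID to keep the order stable.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused[:n_results]


full_text_hits = ["doc3", "doc1", "doc7"]  # hypothetical full-text ranking
vector_hits = ["doc1", "doc5", "doc3"]     # hypothetical vector ranking

for doc_id, score in rrf_fuse([full_text_hits, vector_hits], n_results=3):
    print(doc_id, round(score, 5))
```

In this sample, `doc1` is ranked first because it sits near the top of both lists, while documents that appear in only one list receive a lower fused score.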
:::info

The `embedding_function` used is the one associated with the collection (set during `create_collection()` or `get_collection()`). You cannot override it for individual operations.

:::

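Because `knn` accepts either `query_embeddings` or `query_texts`, a small client-side check can catch a malformed configuration before the request is sent. The following is a minimal sketch in plain Python. The helper `check_knn_config` and its `has_embedding_function` flag are hypothetical and not part of pyseekdb, and the rule that `n_results` must be positive is an assumption.

```python
def check_knn_config(knn, has_embedding_function):
    """Return a list of problems found in a `knn` dict (empty if none).

    Hypothetical helper: it only mirrors the rules described in the table above.
    """
    problems = []
    has_embeddings = knn.get("query_embeddings") is not None
    has_texts = knn.get("query_texts") is not None

    if not has_embeddings and not has_texts:
        problems.append("provide query_embeddings or query_texts")
    # query_texts can only be embedded if the collection has an embedding_function.
    if has_texts and not has_embeddings and not has_embedding_function:
        problems.append("query_texts needs a collection embedding_function")

    n_results = knn.get("n_results")
    # Assumption: n_results is expected to be a positive integer.
    if not isinstance(n_results, int) or n_results <= 0:
        problems.append("n_results must be a positive integer")
    return problems


print(check_knn_config({"query_texts": ["AI research"], "n_results": 10}, True))
print(check_knn_config({"n_results": 10}, True))
```

The first call reports no problems, while the second reports that neither `query_embeddings` nor `query_texts` was supplied.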
## Request example

```python
import pyseekdb

# Create a client
client = pyseekdb.Client()

collection = client.get_collection("my_collection")
collection1 = client.get_collection("my_collection1")

# Hybrid search with query_embeddings (embedding_function not used)
results = collection.hybrid_search(
    query={
        "where_document": {"$contains": "machine learning"},
        "n_results": 10
    },
    knn={
        "query_embeddings": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],  # Used directly
        "n_results": 10
    },
    rank={"rrf": {}},
    n_results=5
)

# Hybrid search with both full-text and vector search (using query_texts)
results = collection1.hybrid_search(
    query={
        "where_document": {"$contains": "machine learning"},
        "where": {"category": {"$eq": "science"}},
        "n_results": 10
    },
    knn={
        "query_texts": ["AI research"],  # Will be embedded automatically
        "where": {"year": {"$gte": 2020}},
        "n_results": 10
    },
    rank={"rrf": {}},  # Reciprocal Rank Fusion
    n_results=5,
    include=["documents", "metadatas", "embeddings"]
)

# Hybrid search with multiple query texts (batch)
results = collection1.hybrid_search(
    query={
        "where_document": {"$contains": "AI"},
        "n_results": 10
    },
    knn={
        "query_texts": ["machine learning", "neural networks"],  # Multiple queries
        "n_results": 10
    },
    rank={"rrf": {}},
    n_results=5
)
```

## Return parameters

A dictionary containing the search results, such as IDs, distances, metadatas, and documents.

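To make the returned dictionary easier to work with, the sketch below turns it into one record per hit. It rests on an assumption that this page does not confirm: that the result uses parallel, per-query nested lists under the keys `ids`, `documents`, `metadatas`, and `distances`. The helper `iter_hits` and the sample data are hypothetical, so check the actual structure of your results before relying on it.

```python
def iter_hits(results, query_index=0):
    """Yield one dict per hit, assuming parallel per-query lists in `results`."""
    ids = results.get("ids") or [[]]
    for position, doc_id in enumerate(ids[query_index]):
        hit = {"id": doc_id}
        for key in ("documents", "metadatas", "distances"):
            column = results.get(key)
            if column:  # A field may be missing if it was not requested.
                hit[key] = column[query_index][position]
        yield hit


# Hypothetical result shaped according to the assumption above.
sample_results = {
    "ids": [["id1", "id2"]],
    "documents": [["Machine learning basics", "Neural network notes"]],
    "metadatas": [[{"category": "AI"}, {"category": "AI"}]],
    "distances": [[0.12, 0.34]],
}

for hit in iter_hits(sample_results):
    print(hit["id"], hit["distances"], hit["documents"])
```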
## Related operations

* [Vector query](200.query-interfaces-of-api.md)
* [get - Retrieve](300.get-interfaces-of-api.md)
* [Operators](500.filter-operators-of-api.md)