2025-11-30 08:44:54 +08:00

slug: /hybrid-search-of-api

hybrid_search - Hybrid search

hybrid_search() runs a full-text search and a vector similarity search, then ranks the combined results.

:::info

This API is only available when using the Client. For more information about the Client, see Client.

:::

Prerequisites

  • You have installed pyseekdb. For more information about how to install pyseekdb, see Get Started.

  • You have connected to the database. For more information about how to connect to the database, see Client.

  • You have created a collection and inserted data. For more information, see "create_collection - Create a collection" and "add - Insert Data".

Request parameters

hybrid_search(
    query={
        "where_document": <where_document>,
        "where": <where>,
        "n_results": <n_results>
    },
    knn={
        "query_texts": <query_texts>,
        "where": <where>,
        "n_results": <n_results>
    },
    rank=<rank>,
    n_results=<n_results>,
    include=<include>
)
  • query: full-text search configuration, including the following parameters:

    | Parameter | Type | Required | Description | Example value |
    |---|---|---|---|---|
    | where | dict | Optional | Metadata filter conditions. | {"category": {"$eq": "AI"}} |
    | where_document | dict | Optional | Document filter conditions. | {"$contains": "machine"} |
    | n_results | int | Yes | Number of results for full-text search. | |

  • knn: vector search configuration, including the following parameters:

    | Parameter | Type | Required | Description | Example value |
    |---|---|---|---|---|
    | query_embeddings | List[float] or List[List[float]] | Yes, unless query_texts is provided | A single vector or a list of vectors for batch queries. If provided, it is used directly and embedding_function is ignored. If not provided, query_texts must be provided and the collection must have an embedding_function. | [1.0, 2.0, 3.0] |
    | query_texts | str or List[str] | Optional | A single query text or a list of query texts for batch queries. Used when query_embeddings is not provided; the texts are embedded automatically, so the collection must have an embedding_function. | ["my query text"] |
    | where | dict | Optional | Metadata filter conditions. | {"category": {"$eq": "AI"}} |
    | n_results | int | Yes | Number of results for vector search. | |

  • Other parameters are as follows:

    | Parameter | Type | Required | Description | Example value |
    |---|---|---|---|---|
    | rank | dict | Optional | Ranking configuration, for example: {"rrf": {"rank_window_size": 60, "rank_constant": 60}} | {"rrf": {}} |
    | n_results | int | Optional | Number of results to return. Default value is 10. | 3 |
    | include | List[str] | Optional | List of fields to include: ["documents", "metadatas", "embeddings"]. | ["documents", "metadatas", "embeddings"] |
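
The rank parameter names Reciprocal Rank Fusion (RRF). As a rough illustration of what such a ranking step does, the sketch below fuses two ranked ID lists in plain Python. It is not pyseekdb's implementation: the function name rrf_fuse is hypothetical, and it assumes rank_constant and rank_window_size follow the common RRF convention (score = sum of 1 / (rank_constant + rank), counting only the top rank_window_size entries of each list).

```python
from collections import defaultdict


def rrf_fuse(rankings, rank_constant=60, rank_window_size=60, n_results=10):
    """Fuse several ranked ID lists with Reciprocal Rank Fusion (illustrative only)."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Assumption: only the top rank_window_size entries of each list contribute.
        for rank, doc_id in enumerate(ranking[:rank_window_size], start=1):
            scores[doc_id] += 1.0 / (rank_constant + rank)
    # Highest fused score first; ties are broken by ID so the output is deterministic.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in fused[:n_results]]


# Hypothetical ranked IDs from the full-text branch and the vector branch.
full_text_hits = ["a", "b", "c"]
vector_hits = ["b", "d", "a"]
print(rrf_fuse([full_text_hits, vector_hits], n_results=3))  # ['b', 'a', 'd']
```

Documents that appear near the top of both lists ("a" and "b" here) outrank documents that appear in only one, which is why fusion by rank works without having to compare full-text scores with vector distances.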

:::info

The embedding_function is bound to the collection (set during create_collection() or get_collection()). You cannot override it per operation.

:::
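
The rule for choosing between query_embeddings and query_texts can be captured in a small helper that builds the knn dictionary. This is a minimal sketch: build_knn is a hypothetical convenience function, not part of pyseekdb, and it only assembles the dictionary keys listed in the knn table.

```python
def build_knn(n_results, query_embeddings=None, query_texts=None, where=None):
    """Build a knn dict for hybrid_search (hypothetical helper, not a pyseekdb API)."""
    if query_embeddings is None and query_texts is None:
        raise ValueError("provide query_embeddings or query_texts")
    knn = {"n_results": n_results}
    if query_embeddings is not None:
        # Precomputed vectors are used directly; no embedding_function is needed.
        knn["query_embeddings"] = query_embeddings
    else:
        # Texts rely on the collection's embedding_function to be embedded.
        knn["query_texts"] = query_texts
    if where is not None:
        knn["where"] = where
    return knn


print(build_knn(10, query_texts=["AI research"], where={"year": {"$gte": 2020}}))
```

The resulting dictionary can be passed as the knn argument of hybrid_search().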

Request example

import pyseekdb

# Create a client
client = pyseekdb.Client()

collection = client.get_collection("my_collection")
collection1 = client.get_collection("my_collection1")

# Hybrid search with query_embeddings (embedding_function not used)
results = collection.hybrid_search(
    query={
        "where_document": {"$contains": "machine learning"},
        "n_results": 10
    },
    knn={
        "query_embeddings": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],  # Used directly
        "n_results": 10
    },
    rank={"rrf": {}},
    n_results=5
)

# Hybrid search with both full-text and vector search (using query_texts)
results = collection1.hybrid_search(
    query={
        "where_document": {"$contains": "machine learning"},
        "where": {"category": {"$eq": "science"}},
        "n_results": 10
    },
    knn={
        "query_texts": ["AI research"],  # Will be embedded automatically
        "where": {"year": {"$gte": 2020}},
        "n_results": 10
    },
    rank={"rrf": {}},  # Reciprocal Rank Fusion
    n_results=5,
    include=["documents", "metadatas", "embeddings"]
)

# Hybrid search with multiple query texts (batch)
results = collection1.hybrid_search(
    query={
        "where_document": {"$contains": "AI"},
        "n_results": 10
    },
    knn={
        "query_texts": ["machine learning", "neural networks"],  # Multiple queries
        "n_results": 10
    },
    rank={"rrf": {}},
    n_results=5
)
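
To make the filter dictionaries in the examples above easier to read, here is a small local sketch of how they might be interpreted. It is an assumption-laden illustration, not the database's implementation: the helpers matches_where and matches_where_document are hypothetical, only the operators that appear on this page ($eq, $gte, $contains) are handled, and $contains is simplified to a case-sensitive substring test, which may differ from how the full-text engine actually matches.

```python
import operator

# Only the metadata operators shown on this page; others are out of scope here.
_METADATA_OPS = {"$eq": operator.eq, "$gte": operator.ge}


def matches_where(metadata, where):
    """Return True if every field condition in `where` holds for `metadata`."""
    for field, condition in where.items():
        for op, expected in condition.items():
            if op not in _METADATA_OPS:
                raise ValueError(f"unsupported operator in this sketch: {op}")
            if field not in metadata or not _METADATA_OPS[op](metadata[field], expected):
                return False
    return True


def matches_where_document(document, where_document):
    """Return True if `document` satisfies every $contains condition (substring test)."""
    for op, needle in where_document.items():
        if op != "$contains":
            raise ValueError(f"unsupported operator in this sketch: {op}")
        if needle not in document:
            return False
    return True


# Hypothetical record used only for illustration.
doc = "A survey of machine learning methods"
meta = {"category": "science", "year": 2021}
print(matches_where(meta, {"category": {"$eq": "science"}}))           # True
print(matches_where(meta, {"year": {"$gte": 2022}}))                   # False
print(matches_where_document(doc, {"$contains": "machine learning"}))  # True
```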

Return parameters

A dictionary containing the search results, such as IDs, distances, metadatas, and documents.