zhongwei/gh-oceanbase-ecology-plugins-seekdb-claudecode-plugin

Zhongwei Li eb309b7b59 Initial commit

2025-11-30 08:44:54 +08:00

5.7 KiB

slug: /query-interfaces-of-api

query - vector query

The `query()` method performs a vector similarity search and returns the documents that are most similar to the query vector.
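
Conceptually, a similarity search ranks stored vectors by their distance to the query vector and keeps the closest `n_results`. The sketch below is a brute-force illustration in plain Python, not a description of how seekdb executes the query. The cosine distance metric and the `brute_force_query` helper are assumptions made for this example.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity. Assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def brute_force_query(stored, query_embedding, n_results=10):
    """Rank every stored vector by distance to the query and keep the closest.

    `stored` maps an ID to its vector. Hypothetical helper, for illustration only.
    """
    scored = [
        (cosine_distance(vec, query_embedding), doc_id)
        for doc_id, vec in stored.items()
    ]
    scored.sort()  # smallest distance first
    return [(doc_id, dist) for dist, doc_id in scored[:n_results]]


# Made-up sample data
stored = {
    "vec1": [1.0, 2.0, 3.0],
    "vec2": [2.0, 3.0, 4.0],
    "vec3": [-1.0, 0.0, 1.0],
}
top = brute_force_query(stored, [1.0, 2.0, 3.0], n_results=2)
for doc_id, dist in top:
    print(f"ID: {doc_id}, Distance: {dist:.4f}")
```

A real vector database would typically rely on an index rather than scanning every vector, but the ranking idea is the same.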

:::info

This interface is only available when using the Client. For more information about the Client, see Client.

:::

Prerequisites

You have installed pyseekdb. For more information about how to install pyseekdb, see Get Started.
You have connected to the database. For more information about how to connect to the database, see Client.
You have created a collection and inserted data. For more information about how to create a collection and insert data, see create_collection - Create a collection and add - Insert data.

Request parameters

query()

Parameter	Value type	Required	Description	Example value
`query_embeddings`	List[float] or List[List[float]]	No	A single vector or a list of vectors for batch queries. If provided, it is used directly and the collection's `embedding_function` is ignored. If not provided, `query_texts` must be provided and the collection must have an `embedding_function`.	[1.0, 2.0, 3.0]
`query_texts`	str or List[str]	No	A single text or a list of texts for batch queries. The texts are converted to vectors by the collection's `embedding_function`. If not provided, `query_embeddings` must be provided.	["my query text"]
`n_results`	int	No	The number of similar results to return. Default: 10.	3
`where`	dict	No	Metadata filter conditions.	`{"category": {"$eq": "AI"}}`
`where_document`	dict	No	Document filter conditions.	`{"$contains": "machine"}`
`include`	List[str]	No	List of fields to include: `["documents", "metadatas", "embeddings"]`	["documents", "metadatas", "embeddings"]

:::info

The `embedding_function` is associated with the collection (set during `create_collection()` or `get_collection()`). It cannot be overridden for an individual operation.

:::
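
Because a query needs either `query_embeddings` or `query_texts`, it can help to check the arguments before calling `query()`. The following is a minimal sketch of such a check. `build_query_kwargs` is a hypothetical helper, not part of pyseekdb, and it only enforces the rule stated in the parameter table.

```python
def build_query_kwargs(query_embeddings=None, query_texts=None, n_results=10,
                       where=None, where_document=None, include=None):
    """Assemble keyword arguments for collection.query().

    Hypothetical helper: it only checks that embeddings or texts are given.
    """
    if query_embeddings is None and query_texts is None:
        raise ValueError("Provide query_embeddings or query_texts")
    kwargs = {"n_results": n_results}
    optional = {
        "query_embeddings": query_embeddings,
        "query_texts": query_texts,
        "where": where,
        "where_document": where_document,
        "include": include,
    }
    # Forward only the arguments the caller actually set.
    kwargs.update({key: value for key, value in optional.items() if value is not None})
    return kwargs


kwargs = build_query_kwargs(
    query_texts=["AI research"],
    where={"category": {"$eq": "AI"}},
    n_results=5,
)
print(kwargs)
# With a collection that has an embedding_function:
# results = collection.query(**kwargs)
```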

Request example

import pyseekdb

# Create a client
client = pyseekdb.Client()

collection = client.get_collection("my_collection")
collection1 = client.get_collection("my_collection1")

# Basic vector similarity query (embedding_function not used)
results = collection.query(
    query_embeddings=[1.0, 2.0, 3.0],
    n_results=3
)

# Iterate over results
for i in range(len(results["ids"][0])):
    print(f"ID: {results['ids'][0][i]}, Distance: {results['distances'][0][i]}")
    if results.get("documents"):
        print(f"Document: {results['documents'][0][i]}")
    if results.get("metadatas"):
        print(f"Metadata: {results['metadatas'][0][i]}")

# Query by texts - vectors auto-generated by embedding_function
# Requires: collection must have embedding_function set
results = collection1.query(
    query_texts=["my query text"],
    n_results=10
)
# The collection's embedding_function will automatically convert query_texts to query_embeddings

# Query by multiple texts (batch query)
results = collection1.query(
    query_texts=["query text 1", "query text 2"],
    n_results=5
)
# Returns dict with lists of lists, one list per query text
for i in range(len(results["ids"])):
    print(f"Query {i}: {len(results['ids'][i])} results")

# Query with metadata filter (using query_texts)
results = collection1.query(
    query_texts=["AI research"],
    where={"category": {"$eq": "AI"}},
    n_results=5
)

# Query with comparison operator (using query_texts)
results = collection1.query(
    query_texts=["machine learning"],
    where={"score": {"$gte": 90}},
    n_results=5
)

# Query with document filter (using query_texts)
results = collection1.query(
    query_texts=["neural networks"],
    where_document={"$contains": "machine learning"},
    n_results=5
)

# Query with combined filters (using query_texts)
results = collection1.query(
    query_texts=["AI research"],
    where={"category": {"$eq": "AI"}, "score": {"$gte": 90}},
    where_document={"$contains": "machine"},
    n_results=5
)

# Query with multiple vectors (batch query)
results = collection.query(
    query_embeddings=[[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]],
    n_results=2
)
# Returns dict with lists of lists, one list per query vector
for i in range(len(results["ids"])):
    print(f"Query {i}: {len(results['ids'][i])} results")

# Query with specific fields
results = collection.query(
    query_embeddings=[1.0, 2.0, 3.0],
    include=["documents", "metadatas", "embeddings"],
    n_results=3
)
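
To make the filter semantics more concrete, here is a rough in-memory sketch of how the operators used above (`$eq`, `$gte`, `$contains`) could be evaluated against a single record. It assumes that multiple keys in `where` are combined with AND, which is how the combined-filter example reads, and that `$contains` is a case-sensitive substring match. The `matches_where` and `matches_where_document` helpers are hypothetical and do not reflect how seekdb evaluates filters internally.

```python
import operator

# Only the operators that appear in the examples above are covered.
_COMPARATORS = {"$eq": operator.eq, "$gte": operator.ge}


def matches_where(metadata, where):
    """Return True if `metadata` satisfies every condition in `where` (assumed AND)."""
    for field, condition in where.items():
        if field not in metadata:
            return False
        for op_name, expected in condition.items():
            if not _COMPARATORS[op_name](metadata[field], expected):
                return False
    return True


def matches_where_document(document, where_document):
    """Return True if the text contains the `$contains` substring (assumed case-sensitive)."""
    needle = where_document.get("$contains")
    return needle is None or needle in document


# Made-up record for illustration
record = {
    "document": "A survey of machine learning",
    "metadata": {"category": "AI", "score": 95},
}
where = {"category": {"$eq": "AI"}, "score": {"$gte": 90}}
keep = matches_where(record["metadata"], where) and matches_where_document(
    record["document"], {"$contains": "machine"}
)
print(keep)
```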

Return parameters

Parameter	Value type	Required	Description	Example value
`ids`	List[List[str]]	Yes	The IDs of the matching records, one inner list per query.	[["vec1", "vec2"]]
`embeddings`	List[List[List[float]]]	No	The vectors of the matching records, one inner list per query. Returned when `embeddings` is listed in `include`.	[[[0.1, 0.2, 0.3]]]
`documents`	List[List[str]]	No	The document text of the matching records, one inner list per query.	[["Document text"]]
`metadatas`	List[List[Dict]]	No	The metadata of the matching records, one inner list per query.	`[[{"category": "AI"}]]`
`distances`	List[List[float]]	No	The distance between each matching record and the query vector, one inner list per query.	[[0.0, 0.025]]
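
Since every returned field is a list of lists (one inner list per query), it can be handy to reshape the result into one list of records per query. The sketch below works on a hand-written dictionary shaped like the return value described above. `flatten_results` is a hypothetical helper, and the sample values are made up for illustration.

```python
# Maps each result field to the key used in the flattened record.
_FIELD_NAMES = {
    "documents": "document",
    "metadatas": "metadata",
    "distances": "distance",
    "embeddings": "embedding",
}


def flatten_results(results):
    """Turn the nested query result into one list of record dicts per query."""
    per_query = []
    for q, ids in enumerate(results["ids"]):
        records = []
        for i, item_id in enumerate(ids):
            record = {"id": item_id}
            for field, name in _FIELD_NAMES.items():
                values = results.get(field)
                # Fields that were not returned may be missing or None.
                if values:
                    record[name] = values[q][i]
            records.append(record)
        per_query.append(records)
    return per_query


# Hand-written sample: one query with two matches
sample = {
    "ids": [["vec1", "vec2"]],
    "distances": [[0.0, 0.025]],
    "documents": None,
    "metadatas": [[{}, {"category": "AI"}]],
}
flat = flatten_results(sample)
for records in flat:
    for record in records:
        print(record)
```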

Return example

ID: vec1, Distance: 0.0
Document: None
Metadata: {}
ID: vec2, Distance: 0.025368153802923787
Document: None
Metadata: {}
Query 0: 4 results
Query 1: 4 results
Query 0: 2 results
Query 1: 2 results
