---
slug: /experience-hybrid-search
---

# Experience hybrid search in seekdb

This tutorial introduces seekdb's hybrid search feature. It shows how hybrid search combines keyword matching on full-text indexes with semantic search on vector indexes, and walks through a practical example of each search mode.

## Overview

Hybrid search combines vector-based semantic retrieval with keyword retrieval based on full-text indexes, and ranks the merged results to return more accurate and comprehensive matches. Vector search excels at approximate semantic matching but is weak at matching exact keywords, numbers, and proper nouns; full-text retrieval compensates for this weakness. seekdb provides hybrid search through the `DBMS_HYBRID_SEARCH` system package, which supports the following scenarios:

* Pure vector search: Find relevant content based on semantic similarity, suitable for semantic search, recommendation systems, and other scenarios.
* Pure full-text search: Find content based on keyword matching, suitable for document search, product search, and other scenarios.
* Hybrid search: Combine keyword matching and semantic understanding to provide more accurate and comprehensive search results.

This feature is widely used in intelligent search, document search, product recommendation, and other scenarios.

## Prerequisites

* Contact the administrator to obtain the database connection string, then run the following command to connect to the database:

```shell
# host: seekdb database connection IP.
# port: seekdb database connection port.
# database_name: Name of the database to access.
# user_name: Database username.
# password: Database password.
obclient -h$host -P$port -u$user_name -p$password -D$database_name
```

* A test table has been created with a vector index and full-text indexes:

:::collapse
```sql
CREATE TABLE doc_table(
    c1 INT,
    vector VECTOR(3),
    query VARCHAR(255),
    content VARCHAR(255),
    VECTOR INDEX idx1(vector) WITH (distance=l2, type=hnsw, lib=vsag),
    FULLTEXT INDEX idx2(query),
    FULLTEXT INDEX idx3(content)
);

INSERT INTO doc_table VALUES
    (1, '[1,2,3]', "hello world", "oceanbase Elasticsearch database"),
    (2, '[1,2,1]', "hello world, what is your name", "oceanbase mysql database"),
    (3, '[1,1,1]', "hello world, how are you", "oceanbase oracle database"),
    (4, '[1,3,1]', "real world, where are you from", "postgres oracle database"),
    (5, '[1,3,2]', "real world, how old are you", "redis oracle database"),
    (6, '[2,1,1]', "hello world, where are you from", "starrocks oceanbase database");
```
:::

## Step 1: Pure vector search

Vector search finds semantically relevant content by calculating vector similarity. It is suitable for semantic search, recommendation systems, and other scenarios.

Set the search parameters to find the three records (`k` is 3) most similar to the query vector `[1,2,3]`:

```sql
SET @parm = '{
  "knn": {
    "field": "vector",
    "k": 3,
    "query_vector": [1,2,3]
  }
}';

SELECT JSON_PRETTY(DBMS_HYBRID_SEARCH.SEARCH('doc_table', @parm));
```

The following result is returned:

:::collapse
```shell
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| JSON_PRETTY(DBMS_HYBRID_SEARCH.SEARCH('doc_table', @parm)) |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| [
  {
    "c1": 1,
    "query": "hello world",
    "_score": 1.0,
    "vector": "[1,2,3]",
    "content": "oceanbase Elasticsearch database"
  },
  {
    "c1": 5,
    "query": "real world, how old are you",
    "_score": 0.41421356,
    "vector": "[1,3,2]",
    "content": "redis oracle database"
  },
  {
    "c1": 2,
    "query": "hello world, what is your name",
    "_score": 0.33333333,
    "vector": "[1,2,1]",
    "content": "oceanbase mysql database"
  }
] |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set
```
:::

The results are sorted by vector similarity in descending order. `_score` is the similarity score: the higher the score, the more similar the record is to the query vector.

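The scores in this sample output can be reproduced by hand. The sketch below is a brute-force check in Python, not seekdb's implementation: it assumes that `_score` equals `1 / (1 + d)`, where `d` is the L2 distance to the query vector. That formula is inferred from the three values shown above and is not taken from seekdb documentation; the HNSW index itself performs an approximate search rather than this exhaustive scan.

```python
import math

# Vectors copied from the INSERT statement in the prerequisites (c1 -> vector).
ROWS = {
    1: [1, 2, 3],
    2: [1, 2, 1],
    3: [1, 1, 1],
    4: [1, 3, 1],
    5: [1, 3, 2],
    6: [2, 1, 1],
}


def l2_distance(a, b):
    """Euclidean (L2) distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similarity(a, b):
    # Assumption: score = 1 / (1 + L2 distance), inferred from the sample output.
    return 1.0 / (1.0 + l2_distance(a, b))


def knn(query_vector, k):
    """Exhaustively score every row and return the k best (c1, score) pairs."""
    scored = [(c1, similarity(vec, query_vector)) for c1, vec in ROWS.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


top3 = knn([1, 2, 3], 3)
for c1, score in top3:
    print(c1, round(score, 8))
```

Under this assumption the script ranks records 1, 5, and 2 first, with scores that round to the `_score` values in the result above.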
## Step 2: Pure full-text search

Full-text search finds content through keyword matching. It is suitable for document search, product search, and other scenarios.

Set the search parameters to match the keywords `hello oceanbase` against the `query` and `content` fields:

```sql
SET @parm = '{
  "query": {
    "query_string": {
      "fields": ["query", "content"],
      "query": "hello oceanbase"
    }
  }
}';

SELECT JSON_PRETTY(DBMS_HYBRID_SEARCH.SEARCH('doc_table', @parm));
```

The following result is returned:

:::collapse
```shell
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| JSON_PRETTY(DBMS_HYBRID_SEARCH.SEARCH('doc_table', @parm)) |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| [
  {
    "c1": 1,
    "query": "hello world",
    "_score": 0.37162162162162166,
    "vector": "[1,2,3]",
    "content": "oceanbase Elasticsearch database"
  },
  {
    "c1": 2,
    "query": "hello world, what is your name",
    "_score": 0.3503184713375797,
    "vector": "[1,2,1]",
    "content": "oceanbase mysql database"
  },
  {
    "c1": 3,
    "query": "hello world, how are you",
    "_score": 0.3503184713375797,
    "vector": "[1,1,1]",
    "content": "oceanbase oracle database"
  },
  {
    "c1": 6,
    "query": "hello world, where are you from",
    "_score": 0.3503184713375797,
    "vector": "[2,1,1]",
    "content": "starrocks oceanbase database"
  }
] |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set
```
:::

The results are sorted by how well each record matches the keywords, in descending order. `_score` is the matching score: the higher the score, the better the match.

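The search parameters used in Steps 1 and 2 are ordinary JSON documents, so an application can generate them instead of writing the literal by hand. The following Python sketch shows one way to do that with the standard `json` module. The helper name `build_search_param` and its arguments are hypothetical and exist only for illustration; only the JSON keys (`query`, `query_string`, `fields`, `knn`, `field`, `k`, `query_vector`) come from this tutorial.

```python
import json


def build_search_param(text=None, fields=None, vector=None,
                       vector_field="vector", k=3):
    """Build the JSON string passed as the second argument of
    DBMS_HYBRID_SEARCH.SEARCH. Hypothetical helper, not part of any SDK."""
    param = {}
    if text is not None:
        # Full-text part: keywords matched against the listed fields.
        param["query"] = {
            "query_string": {"fields": list(fields or []), "query": text}
        }
    if vector is not None:
        # Vector part: k nearest neighbours of the query vector.
        param["knn"] = {
            "field": vector_field,
            "k": k,
            "query_vector": list(vector),
        }
    if not param:
        raise ValueError("provide text, vector, or both")
    return json.dumps(param)


vector_param = build_search_param(vector=[1, 2, 3], k=3)
fulltext_param = build_search_param(
    text="hello oceanbase", fields=["query", "content"]
)
print(vector_param)
print(fulltext_param)
```

Each resulting string can be assigned to `@parm` in place of the hand-written literal. Passing both `text` and `vector` produces a parameter that contains both the `query` and `knn` parts.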
## Step 3: Hybrid search

Hybrid search combines keyword matching and semantic understanding, drawing on both full-text indexes and vector indexes to provide more accurate and comprehensive search results.

Set the search parameters to run a full-text search and a vector search in the same request:

```sql
SET @parm = '{
  "query": {
    "query_string": {
      "fields": ["query", "content"],
      "query": "hello oceanbase"
    }
  },
  "knn": {
    "field": "vector",
    "k": 5,
    "query_vector": [1,2,3]
  }
}';

SELECT json_pretty(DBMS_HYBRID_SEARCH.SEARCH('doc_table', @parm));
```

The following result is returned:

:::collapse
```shell
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| json_pretty(DBMS_HYBRID_SEARCH.SEARCH('doc_table', @parm)) |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| [
  {
    "c1": 1,
    "query": "hello world",
    "_score": 1.3716216216216217,
    "vector": "[1,2,3]",
    "content": "oceanbase Elasticsearch database"
  },
  {
    "c1": 2,
    "query": "hello world, what is your name",
    "_score": 0.6836518013375796,
    "vector": "[1,2,1]",
    "content": "oceanbase mysql database"
  },
  {
    "c1": 3,
    "query": "hello world, how are you",
    "_score": 0.6593354613375797,
    "vector": "[1,1,1]",
    "content": "oceanbase oracle database"
  },
  {
    "c1": 5,
    "query": "real world, how old are you",
    "_score": 0.41421356,
    "vector": "[1,3,2]",
    "content": "redis oracle database"
  },
  {
    "c1": 6,
    "query": "hello world, where are you from",
    "_score": 0.3503184713375797,
    "vector": "[2,1,1]",
    "content": "starrocks oceanbase database"
  },
  {
    "c1": 4,
    "query": "real world, where are you from",
    "_score": 0.30901699,
    "vector": "[1,3,1]",
    "content": "postgres oracle database"
  }
] |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set
```
:::

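The hybrid search result combines the keyword matching score (`_keyword_score`) and the semantic similarity score (`_semantic_score`). The final `_score` is the sum of the two and determines the ranking. For example, record 2 scores about 0.3503 in the full-text search of Step 2 and 0.3333 in the vector search of Step 1, which gives about 0.6837 here. Records matched by only one of the two searches, such as records 4, 5, and 6, keep that single score.
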
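The ranking above can be reproduced from the earlier steps. The Python sketch below is an illustration under two assumptions, not a description of seekdb internals: the vector score is taken to be `1 / (1 + d)` with `d` the L2 distance (inferred from the sample output), and the hybrid result is taken to be the union of the keyword hits and the `k` nearest vectors, with the two scores added per record. The keyword scores are copied from the Step 2 output.

```python
import math

QUERY_VECTOR = [1, 2, 3]

# Vectors copied from the INSERT statement in the prerequisites (c1 -> vector).
VECTORS = {
    1: [1, 2, 3],
    2: [1, 2, 1],
    3: [1, 1, 1],
    4: [1, 3, 1],
    5: [1, 3, 2],
    6: [2, 1, 1],
}

# Keyword scores copied from the Step 2 output (c1 -> _score).
KEYWORD_SCORES = {
    1: 0.37162162162162166,
    2: 0.3503184713375797,
    3: 0.3503184713375797,
    6: 0.3503184713375797,
}


def vector_score(vec):
    # Assumption: score = 1 / (1 + L2 distance), inferred from the sample output.
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(vec, QUERY_VECTOR)))
    return 1.0 / (1.0 + dist)


def hybrid_rank(k):
    """Union the keyword hits with the k nearest vectors and sum the scores."""
    nearest = sorted(VECTORS, key=lambda c1: vector_score(VECTORS[c1]),
                     reverse=True)[:k]
    semantic = {c1: vector_score(VECTORS[c1]) for c1 in nearest}
    combined = {
        c1: KEYWORD_SCORES.get(c1, 0.0) + semantic.get(c1, 0.0)
        for c1 in set(KEYWORD_SCORES) | set(semantic)
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


ranking = hybrid_rank(k=5)
for c1, score in ranking:
    print(c1, round(score, 6))
```

With `k` set to 5, this yields the same order as the result above (records 1, 2, 3, 5, 6, 4). The summed scores agree with the sample output to about six decimal places; the small remaining difference arises because the output appears to carry the vector score at eight digits.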
## Parameter tuning

In hybrid search, you can adjust the relative weight of full-text search and vector search through the `boost` parameter to tune the ranking. For example, to increase the weight of full-text search:

```sql
SET @parm = '{
  "query": {
    "query_string": {
      "fields": ["query", "content"],
      "query": "hello oceanbase",
      "boost": 2.0
    }
  },
  "knn": {
    "field": "vector",
    "k": 5,
    "query_vector": [1,2,3],
    "boost": 1.0
  }
}';

SELECT json_pretty(DBMS_HYBRID_SEARCH.SEARCH('doc_table', @parm));
```

By adjusting `boost`, you control how much keyword search and semantic search each contribute to the final ranking. To emphasize keyword matching, increase the `boost` value of `query_string`; to emphasize semantic similarity, increase the `boost` value of `knn`.

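To get a feel for what a change in `boost` might do, the Python sketch below re-ranks the sample records offline. It assumes that each `boost` acts as a plain multiplier on its sub-score before the two are added; this tutorial does not show the boosted output, so treat both the formula and the resulting order as an assumption to verify against a real query. The keyword scores are copied from the Step 2 output, and the semantic scores are taken or derived from the Step 1 and Step 3 outputs.

```python
# Keyword scores copied from the Step 2 output (c1 -> score).
KEYWORD_SCORES = {
    1: 0.37162162162162166,
    2: 0.3503184713375797,
    3: 0.3503184713375797,
    6: 0.3503184713375797,
}

# Semantic scores of the five nearest vectors (c1 -> score), taken or
# derived from the Step 1 and Step 3 outputs.
SEMANTIC_SCORES = {
    1: 1.0,
    5: 0.41421356,
    2: 0.33333333,
    3: 0.30901699,
    4: 0.30901699,
}


def rank(keyword_boost=1.0, knn_boost=1.0):
    """Return record ids ordered by the boosted sum of both scores.

    Assumption: boost is a linear multiplier applied to each sub-score.
    """
    ids = set(KEYWORD_SCORES) | set(SEMANTIC_SCORES)
    combined = {
        c1: keyword_boost * KEYWORD_SCORES.get(c1, 0.0)
        + knn_boost * SEMANTIC_SCORES.get(c1, 0.0)
        for c1 in ids
    }
    return sorted(combined, key=combined.get, reverse=True)


default_order = rank()
keyword_heavy_order = rank(keyword_boost=2.0, knn_boost=1.0)
print(default_order)
print(keyword_heavy_order)
```

Under this assumption, doubling the keyword weight moves record 6, which matches only the keywords, ahead of record 5, which matches only the vector search.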
## Summary

In this tutorial, you have learned the core features of seekdb hybrid search:

* Pure vector search: Find relevant content through semantic similarity, suitable for semantic search scenarios.
* Pure full-text search: Find content through keyword matching, suitable for precise search scenarios.
* Hybrid search: Combine keyword matching and semantic understanding to provide more comprehensive and accurate search results.

Hybrid search is well suited to processing massive unstructured data and to building intelligent search and recommendation systems, where it improves the accuracy and comprehensiveness of retrieval results.

### What's next

* Explore [AI function service features](../../200.develop/300.ai-function/200.ai-function.md)
* View [hybrid vector index](../../200.develop/300.ai-function/200.ai-function.md) to simplify vector search processes

## More information

For more guides on experiencing seekdb's AI Native features and building AI applications based on seekdb, see:

* [Experience vector search](30.experience-vector-search.md)
* [Experience full-text indexing](40.experience-full-text-indexing.md)
* [Experience AI function service](60.experience-ai-function.md)
* [Experience semantic indexing](70.experience-hybrid-vector-index.md)
* [Experience the Vibe Coding paradigm with Cursor Agent + OceanBase MCP](80.experience-vibe-coding-paradigm-with-cursor-agent-oceanbase-mcp.md)
* [Build a knowledge base desktop application based on seekdb](../../500.tutorials/100.create-ai-app-demo/100.build-kb-in-seekdb.md)
* [Build a cultural tourism assistant with multi-model integration based on seekdb](../../500.tutorials/100.create-ai-app-demo/300.build-multi-model-application-based-on-oceanbase.md)
* [Build an image search application based on seekdb](../../500.tutorials/100.create-ai-app-demo/400.build-image-search-app-in-seekdb.md)

In addition to SQL, you can use the Python SDK (pyseekdb) provided by seekdb. For usage instructions, see [Experience embedded seekdb using Python SDK](../50.embedded-mode/25.using-seekdb-in-python-sdk.md) and [pyseekdb overview](../../200.develop/900.sdk/10.pyseekdb-sdk/10.pyseekdb-sdk-get-started.md).