---
slug: /dense-vector-index
---

# Dense vector index

This topic describes how to create, query, maintain, and drop a dense vector index in seekdb.

## Index types

The following table describes the vector index types supported by seekdb.

| Index type | Description | Scenarios |
|------------|-------------|-----------|
| HNSW | The maximum dimension of indexed columns is 4096. The HNSW index is a memory-based index that must be fully loaded into memory. It supports DML and real-time queries. | |
| HNSW_SQ | The HNSW_SQ index offers construction speed, query performance, and recall similar to those of the HNSW index, but reduces overall memory usage to 1/2 to 1/3 of the original. | Scenarios with high performance and recall requirements. |
| HNSW_BQ | The HNSW_BQ index has a slightly lower recall than the HNSW index, but significantly reduces memory usage. The BQ quantization compression algorithm (RaBitQ) can compress vectors to 1/32 of their original size. The memory savings of the HNSW_BQ index become more pronounced as the vector dimension increases. | |
| IVF | An IVF index implemented on database tables, which does not require resident memory. | Scenarios with lower performance requirements but large data volumes and cost sensitivity. |
| IVF_PQ | An IVF_PQ index implemented on database tables, which does not require resident memory. PQ quantization is applied on top of IVF. The recall of the index is slightly lower than that of the IVF index, but the performance is higher. The PQ quantization compression algorithm can generally compress vectors to 1/16 to 1/32 of their original size. | Scenarios with lower performance requirements but large data volumes and cost sensitivity. |
| IVF_SQ (Experimental feature) | An IVF_SQ index implemented on database tables, which does not require resident memory. SQ quantization is applied on top of IVF. The recall of the index is slightly lower than that of the IVF index, but the performance is higher. The SQ quantization compression algorithm can generally compress vectors to 1/3 to 1/4 of their original size. | Scenarios with lower performance requirements but large data volumes and cost sensitivity. |
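
As a rough illustration of the compression ratios in the table, the following arithmetic sketch estimates the raw vector payload for a hypothetical dataset of 1,000,000 vectors with 768 `float32` dimensions (4 bytes each). The dataset size and dimension are assumptions chosen for the example, and the figures cover only the vector payload, not graph or cluster metadata, so treat them as order-of-magnitude guidance rather than a sizing result.

```sql
-- Hypothetical dataset: 1,000,000 vectors x 768 dimensions x 4 bytes (float32).
SELECT
    1000000 * 768 * 4                                AS raw_bytes,       -- uncompressed vectors
    ROUND(1000000 * 768 * 4 / POW(1024, 3), 2)       AS raw_gib,         -- same value in GiB
    ROUND(1000000 * 768 * 4 / 32 / POW(1024, 2), 2)  AS bq_payload_mib,  -- 1/32 ratio quoted for HNSW_BQ
    ROUND(1000000 * 768 * 4 / 16 / POW(1024, 2), 2)  AS pq_payload_mib,  -- 1/16 ratio quoted for IVF_PQ
    ROUND(1000000 * 768 * 4 / 4 / POW(1024, 2), 2)   AS sq_payload_mib;  -- 1/4 ratio quoted for IVF_SQ
```

For an actual sizing estimate, the `DBMS_VECTOR` estimation procedures described in this topic are the appropriate tool.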

Additional notes:

* Dense vector indexes support L2, inner product (IP), and cosine distance as the index distance algorithm.
* Vector index queries can call some of the distance functions. For more information, see [Use SQL functions](../250.vector-function.md).
* Vector queries with filter conditions are supported. The filter conditions can be scalar conditions or spatial relationships, such as `ST_Intersects`. Multi-value indexes, full-text indexes, and global indexes cannot be used as pre-filters.
* You can create vector and full-text indexes on the same table.
* For more information about how vector indexes support offline DDL operations, see [Offline DDL](https://en.oceanbase.com/docs/common-oceanbase-database-10000000001974221).

Limitations:

* In V1.0.0, creating columnstore vector indexes is not supported.

## Index memory estimation and actual usage query

You can estimate the memory required for vector indexes by using the `DBMS_VECTOR` system package:

* Before creating a table, you can estimate index memory requirements by using the [INDEX_VECTOR_MEMORY_ADVISOR](https://en.oceanbase.com/docs/common-oceanbase-database-10000000002754002) procedure.
* After a table is created and data is inserted, you can estimate index memory requirements by using the [INDEX_VECTOR_MEMORY_ESTIMATE](https://en.oceanbase.com/docs/common-oceanbase-database-10000000002754001) procedure.

The memory estimate provides two key pieces of information: the minimum memory configuration required to create a vector index, and the actual memory usage after creating HNSW_SQ and IVF indexes.

The configuration item `load_vector_index_on_follower` controls whether followers automatically load in-memory vector indexes. For syntax and examples, see [load_vector_index_on_follower](https://en.oceanbase.com/docs/common-oceanbase-database-10000000002969407). If you do not need weak reads, you can disable this configuration item to reduce the memory used by vector indexes.
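
A minimal sketch of disabling follower loading. It assumes that `load_vector_index_on_follower` is set with the same `ALTER SYSTEM SET` form that this topic uses for `vector_index_optimize_duty_time`, and that it accepts a Boolean value. Check the linked reference for the exact syntax and scope before applying it.

```sql
-- Assumption: Boolean configuration item set through ALTER SYSTEM SET.
-- When disabled, followers no longer load in-memory vector indexes,
-- which is only appropriate if weak reads are not needed.
ALTER SYSTEM SET load_vector_index_on_follower = false;
```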

## Creation syntax and description

You can create a seekdb vector index either during table creation or after the table is created. When creating a vector index, note the following:

* The `VECTOR` keyword is required when creating a vector index.
* The parameters and descriptions for an index created after the table is created are the same as those for an index created during table creation.
* If a large amount of data is involved, we recommend that you write the data first and then create the index to achieve optimal query performance.
* We recommend that you create HNSW_SQ, IVF, IVF_SQ, and IVF_PQ indexes after data is inserted, and rebuild the indexes after a significant amount of new data is added. For detailed instructions on creating each index, see the specific examples below.

:::tab
tab HNSW/HNSW_SQ/HNSW_BQ

Syntax for creating an index during table creation:

```sql
CREATE TABLE table_name (
    column_name1 data_type1,
    column_name2 data_type2,
    ...,
    VECTOR INDEX index_name (column_name) WITH (param1=value1, param2=value2, ...)
);
```

Syntax for creating an index after table creation:

```sql
-- When you create an index after table creation, you can set a degree of parallelism to speed up index construction.
-- The degree of parallelism should not exceed twice the number of CPU cores.
CREATE [/*+ parallel $value*/] VECTOR INDEX index_name ON table_name(column_name) WITH (param1=value1, param2=value2, ...);
```

`param` parameter description:

| Parameter | Default value | Value range | Required | Description | Remarks |
|-----------|---------------|-------------|----------|-------------|---------|
| distance | | l2/inner_product/cosine | Yes | The vector distance function type. | `l2` indicates the Euclidean distance, `inner_product` indicates the inner product distance, and `cosine` indicates the cosine distance. |
| type | | `hnsw` / `hnsw_sq` / `hnsw_bq` | Yes | The index type. | |
| lib | vsag | vsag | No | The vector index library type. | At present, only the VSAG vector library is supported. |
| m | 16 | [5,128] | No | The maximum number of neighbors of each node. | The larger the value, the slower the index construction, but the better the query performance. |
| ef_construction | 200 | [5,1000] | No | The size of the candidate set during index construction. | The larger the value, the slower the index construction, but the better the index quality. `ef_construction` must be greater than `m`. |
| ef_search | 64 | [1,1000] | No | The size of the candidate set during a query. | The larger the value, the slower the query, but the higher the recall rate. |
| extra_info_max_size | 0 | [0,16384] | No | The maximum size of the primary key information for each row (in bytes). Storing the primary key of the table in the index can speed up queries. | <code>0</code>: The primary key information is not stored.<br/><code>1</code>: The primary key information is forcibly stored, regardless of the size limit. In this case, the primary key type (see below) must be a supported type.<br/><code>Greater than 1</code>: The maximum size of the primary key information (in bytes) is specified. In this case, the following conditions must be met:<ul><li>The size of the primary key information (see the calculation method below) must be less than the specified size limit.</li><li>The primary key type must be a supported type.</li><li>The table must have a primary key.</li></ul> |
| refine_k | 4.0 | [1.0,1000.0] | No | <main id="notice" type="notice"><p>This parameter is supported starting from V1.0.0. You can specify this parameter only when you create an HNSW_BQ index. </p></main> This parameter is a floating-point number used to adjust the rearrangement ratio for quantized vector indexes. | This parameter can be specified when you create an index or during a query:<ul><li>If this parameter is not specified during a query, the value specified when the index is created is used. </li><li>If this parameter is specified during a query, the value specified during the query is used. </li></ul> |
| refine_type | sq8 | | | <main id="notice" type="notice"><p>This parameter is supported starting from V1.0.0. You can specify this parameter only when you create an HNSW_BQ index. </p></main> This parameter specifies the construction precision of quantized vector indexes. | This parameter improves the efficiency of index construction by reducing the memory usage and the construction time, but may affect the recall rate. |
| bq_bits_query | 32 | 0/4/32 | No | <main id="notice" type="notice"><p>This parameter is supported starting from V1.0.0. You can specify this parameter only when you create an HNSW_BQ index. </p></main> This parameter specifies the query precision of quantized vector indexes in bits. | This parameter improves the efficiency of index construction by reducing the memory usage and the construction time, but may affect the recall rate. |
| bq_use_fht | true | | | <main id="notice" type="notice"><p>This parameter is supported starting from V1.0.0. You can specify this parameter only when you create an HNSW_BQ index. </p></main> This parameter specifies whether to use FHT for queries. FHT (Fast Hadamard Transform) is an algorithm used to accelerate vector inner product calculations. | |

The supported primary key types for `extra_info_max_size` include:

* [Numeric types](https://en.oceanbase.com/docs/common-oceanbase-database-10000000001975803): integer types, floating-point types, and BIT_VALUE types.
* [Datetime types](https://en.oceanbase.com/docs/common-oceanbase-database-10000000001975805)
* [Character types](https://en.oceanbase.com/docs/common-oceanbase-database-10000000001975810): the VARCHAR type is supported.

The size of the primary key information is calculated as follows:

```sql
SET @table_name = 'test'; -- Replace this with the name of the table to be queried.

SELECT
    CASE
        WHEN COUNT(*) <> COUNT(result_value) THEN 'not support'
        ELSE COALESCE(SUM(result_value), 'not support')
    END AS extra_info_size
FROM (
    SELECT
        CASE
            WHEN vdt.data_type_class IN (1, 2, 3, 4, 6, 8, 9, 14, 27, 28) THEN 8 -- For numeric types, extra_info_size += 8
            WHEN oc.data_type = 22 THEN oc.data_length -- For varchar types, extra_info_size += data_length
            ELSE NULL -- Other types are not supported
        END AS result_value
    FROM
        oceanbase.__all_column oc
    JOIN
        oceanbase.__all_virtual_data_type vdt
    ON
        oc.data_type = vdt.data_type
    WHERE
        oc.rowkey_position != 0
        AND oc.table_id = (SELECT table_id FROM oceanbase.__all_table WHERE table_name = @table_name)
) AS result_table;

-- In this example, the result is 8 (bytes).
```
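
To illustrate how the calculated size relates to `extra_info_max_size`, here is a minimal sketch. The table and index names are hypothetical. The table has a single `INT` primary key, which the calculation above counts as 8 bytes, so a limit of 16 bytes satisfies the rule that the key size must be less than the specified limit.

```sql
-- Hypothetical names, for illustration only.
-- Primary key: one INT column, counted as 8 bytes by the calculation above.
CREATE TABLE items_with_pk_info (
    id INT,
    embedding VECTOR(3),
    PRIMARY KEY (id),
    -- 16 > 8, so the primary key information fits within the limit.
    VECTOR INDEX idx_embedding (embedding)
        WITH (distance=l2, type=hnsw, lib=vsag, extra_info_max_size=16)
);
```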

tab IVF/IVF_SQ (Experimental feature)/IVF_PQ

Syntax for creating an index during table creation:

```sql
CREATE TABLE table_name (
    column_name1 data_type1,
    column_name2 data_type2,
    ...,
    VECTOR INDEX index_name (column_name) WITH (param1=value1, param2=value2, ...)
);
```

Syntax for creating an index after table creation:

```sql
-- When you create an index after table creation, you can set a degree of parallelism to speed up index construction.
-- The degree of parallelism should not exceed twice the number of CPU cores.
CREATE [/*+ parallel $value*/] VECTOR INDEX index_name ON table_name(column_name) WITH (param1=value1, param2=value2, ...);
```

`param` parameter description:

| Parameter | Default value | Value range | Required | Description | Remarks |
|-----------|---------------|-------------|----------|-------------|---------|
| distance | | l2/inner_product/cosine | Yes | The vector distance function type. | `l2` indicates the Euclidean distance, `inner_product` indicates the inner product distance, and `cosine` indicates the cosine distance. |
| type | | ivf_flat/ivf_sq8/ivf_pq | Yes | The IVF index type. | |
| lib | ob | ob | No | The vector index library type. | |
| nlist | 128 | [1,65536] | No | The number of clusters. | |
| sample_per_nlist | 256 | [1,int64_max] | No | The number of samples for each cluster center, which is used when creating an index after table creation. | |
| nbits | 8 | [1,24] | No | The number of quantization bits.<main id="notice" type="notice"><p>This parameter is supported starting from V1.0.0. You can specify this parameter only when you create an IVF_PQ index.</p></main> | The recommended value is 8, and the recommended value range is [8,10]. The larger the value, the higher the quantization accuracy and query accuracy, but the lower the query performance. |
| m | No default value, must be specified | [1,65536] | Yes (IVF_PQ only) | The dimension of the quantized vectors.<main id="notice" type="notice"><p>This parameter is supported starting from V1.0.0. You can specify this parameter only when you create an IVF_PQ index.</p></main> | The larger the value, the slower the index construction and the higher the query accuracy, but the lower the query performance. |

:::
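
The examples later in this topic cover `ivf_flat` only, so the following is a minimal sketch of an IVF_PQ index that sets the PQ-specific parameters from the table above. The table name, index name, and parameter values are hypothetical. Choosing `m` so that it evenly divides the vector dimension is a common product quantization convention and is an assumption here, not a rule stated in this topic.

```sql
-- Hypothetical names and values, for illustration only.
CREATE TABLE vec_table_ivf_pq (id INT, c2 VECTOR(4), PRIMARY KEY (id));

-- Load data before building the index, as recommended for IVF_PQ indexes.

CREATE VECTOR INDEX idx_ivf_pq ON vec_table_ivf_pq(c2)
    WITH (distance=l2, type=ivf_pq, lib=ob,
          nlist=128,             -- number of clusters (default)
          sample_per_nlist=256,  -- samples per cluster center (default)
          nbits=8,               -- quantization bits (recommended value)
          m=2);                  -- must be specified for ivf_pq; 2 divides the dimension 4 (assumption)
```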

## Query syntax and description

Vector index queries are approximate nearest neighbor (ANN) queries and do not guarantee 100% accuracy. The accuracy of a vector query is measured by recall. For example, if a query for the 10 nearest neighbors consistently returns 9 correct results, the recall is 90%. Note the following about recall:

* Recall is affected by both the build parameters and the query parameters.
* The index query parameters are specified when the index is created and cannot be modified. However, you can override them with session variables: the `ob_hnsw_ef_search` variable applies to HNSW/HNSW_SQ/HNSW_BQ indexes, and the `ob_ivf_nprobes` variable applies to IVF indexes. If a session variable is set, its value takes precedence. For more information, see [ob_hnsw_ef_search](https://en.oceanbase.com/docs/common-oceanbase-database-10000000001976680) and [ob_ivf_nprobes](https://en.oceanbase.com/docs/common-oceanbase-database-10000000002179539).
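
A minimal sketch of overriding the query parameters for the current session. The values are illustrative assumptions, and the standard `SET` form for session variables is assumed; see the linked references for the supported ranges.

```sql
-- Applies to HNSW/HNSW_SQ/HNSW_BQ indexes: a larger candidate set
-- trades query speed for recall.
SET ob_hnsw_ef_search = 200;

-- Applies to IVF indexes: the value 16 is an illustrative assumption.
SET ob_ivf_nprobes = 16;
```

Because session values take precedence over the values stored with the index, this is a way to tune recall without rebuilding the index.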

The query syntax for dense vector indexes is as follows:

```sql
SELECT ... FROM $table_name ORDER BY $distance_function($column_name, $vector_expr) [APPROXIMATE|APPROX] LIMIT $num [OFFSET $num];
```

Query usage notes:

* Syntax requirements:
    * The `APPROXIMATE`/`APPROX` keyword must be specified for the query to use the vector index instead of a full table scan.
    * The query must include the `ORDER BY` and `LIMIT` clauses.
    * The `ORDER BY` clause supports only a single vector condition.
    * The value of `LIMIT + OFFSET` must be in the range `(0, 16384]`.

* Rules for distance functions:
    * If `APPROXIMATE`/`APPROX` is specified and a supported distance function that matches the vector index algorithm is called, the query uses the vector index.
    * If `APPROXIMATE`/`APPROX` is specified but the distance function does not match the vector index algorithm, the query does not use the vector index, and no error is returned.
    * If `APPROXIMATE`/`APPROX` is specified but the distance function is not supported in the current version, the query does not use the vector index, and an error is returned.
    * If `APPROXIMATE`/`APPROX` is not specified and a supported distance function is called, the query does not use the vector index, and no error is returned.

* Other notes:
    * The `WHERE` condition is applied as a filter after the vector index query.
    * The `LIMIT` clause is required; otherwise, an error is returned.
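
The rules above can be sketched with a pair of queries against the `t1` table from the HNSW example in this topic. The filter value and `LIMIT` are illustrative assumptions. The first query satisfies the syntax requirements and matches the `l2` distance of the index, so it is eligible to use the vector index. The second omits `APPROXIMATE`, so it does not use the vector index and can serve as a reference result when you check recall.

```sql
-- ANN query with a scalar filter: APPROXIMATE + ORDER BY + LIMIT, distance matches the index (l2).
SELECT c1, c0
FROM t1
WHERE c0 >= 2
ORDER BY l2_distance(c2, [0.712338,0.603321,0.133444,0.428146,0.876387,0.763293,0.408760,0.765300,0.560072,0.900498]) APPROXIMATE
LIMIT 2;

-- Same query without APPROXIMATE: the vector index is not used.
SELECT c1, c0
FROM t1
WHERE c0 >= 2
ORDER BY l2_distance(c2, [0.712338,0.603321,0.133444,0.428146,0.876387,0.763293,0.408760,0.765300,0.560072,0.900498])
LIMIT 2;
```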

## Create, query, and delete examples

### Create an index during table creation

#### Example of dense vector index

##### HNSW example

:::tip

When you create an HNSW index, the index name must be shorter than 25 characters. Otherwise, an exception may occur because the auxiliary table name exceeds the <code>index_name</code> limit. Longer index names will be supported in future versions.

:::

Create a test table.

```sql
CREATE TABLE t1(c1 INT, c0 INT, c2 VECTOR(10), c3 VECTOR(10), PRIMARY KEY(c1), VECTOR INDEX idx1(c2) WITH (distance=l2, type=hnsw, lib=vsag), VECTOR INDEX idx2(c3) WITH (distance=l2, type=hnsw, lib=vsag));
```

Write test data.

```sql
INSERT INTO t1 VALUES(1, 1, '[0.203846,0.205289,0.880265,0.824340,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833]', '[0.203846,0.205289,0.880265,0.824340,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833]');

INSERT INTO t1 VALUES(2, 2, '[0.735541,0.670776,0.903237,0.447223,0.232028,0.659316,0.765661,0.226980,0.579658,0.933939]', '[0.213846,0.205289,0.880265,0.824340,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833]');

INSERT INTO t1 VALUES(3, 3, '[0.327936,0.048756,0.084670,0.389642,0.970982,0.370915,0.181664,0.940780,0.013905,0.628127]', '[0.223846,0.205289,0.880265,0.824340,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833]');
```

Perform an approximate nearest neighbor query.

```sql
SELECT * FROM t1 ORDER BY l2_distance(c2, [0.712338,0.603321,0.133444,0.428146,0.876387,0.763293,0.408760,0.765300,0.560072,0.900498]) APPROXIMATE LIMIT 1;
```

The query result is as follows:

```shell
+----+------+-------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+
| c1 | c0   | c2                                                                                        | c3                                                                                         |
+----+------+-------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+
|  3 |    3 | [0.327936,0.048756,0.08467,0.389642,0.970982,0.370915,0.181664,0.94078,0.013905,0.628127] | [0.223846,0.205289,0.880265,0.82434,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833] |
+----+------+-------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+
1 row in set
```

##### HNSW_SQ example

```sql
CREATE TABLE t2 (c1 INT AUTO_INCREMENT, c2 VECTOR(3), PRIMARY KEY(c1), VECTOR INDEX idx1(c2) WITH (distance=l2, type=hnsw_sq, lib=vsag));
```

##### HNSW_BQ example

```sql
CREATE TABLE t3 (c1 INT AUTO_INCREMENT, c2 VECTOR(3), PRIMARY KEY(c1), VECTOR INDEX idx3(c2) WITH (distance=l2, type=hnsw_bq, lib=vsag));
```

For an HNSW_BQ index, the `distance` parameter supports only the `l2` value.

##### IVF example

:::tip

When you create an IVF index, the index name must be shorter than 33 characters. Otherwise, an exception may occur because the auxiliary table name exceeds the <code>index_name</code> limit. Longer index names will be supported in future versions.

:::

```sql
CREATE TABLE ivf_vecindex_suite_table_test (c1 INT, c2 VECTOR(3), PRIMARY KEY(c1), VECTOR INDEX idx2(c2) WITH (distance=l2, type=ivf_flat));
```
|
||||
|
||||
### Create an index after table creation
|
||||
|
||||
:::tip
|
||||
|
||||
Currently, only dense vector indexes can be created after table creation.
|
||||
:::
|
||||
|
||||
#### Example of HNSW index
|
||||
|
||||
Create a test table.
|
||||
|
||||
```sql
|
||||
CREATE TABLE vec_table_hnsw (id INT, c2 VECTOR(10));
|
||||
```
|
||||
|
||||
Create an HNSW index.
|
||||
|
||||
```sql
|
||||
CREATE VECTOR INDEX vec_idx1 ON vec_table_hnsw(c2) WITH (distance=l2, type=hnsw);
|
||||
```
|
||||
|
||||
View the created table.
|
||||
|
||||
```sql
|
||||
SHOW CREATE TABLE vec_table_hnsw;
|
||||
```
|
||||
|
||||
The return result is as follows:
|
||||
|
||||
```shell
|
||||
+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|
||||
| Table | Create Table |
|
||||
+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|
||||
| vec_table_hnsw | CREATE TABLE `vec_table_hnsw` (
|
||||
`id` int(11) DEFAULT NULL,
|
||||
`c2` VECTOR(10) DEFAULT NULL,
|
||||
VECTOR KEY `vec_idx1` (`c2`) WITH (DISTANCE=L2, TYPE=HNSW, LIB=VSAG, M=16, EF_CONSTRUCTION=200, EF_SEARCH=64) BLOCK_SIZE 16384
|
||||
) DEFAULT CHARSET = utf8mb4 ROW_FORMAT = DYNAMIC COMPRESSION = 'zstd_1.3.8' REPLICA_NUM = 2 BLOCK_SIZE = 16384 USE_BLOOM_FILTER = FALSE TABLET_SIZE = 134217728 PCTFREE = 0 |
|
||||
+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|
||||
1 row in set
|
||||
```
|
||||
|
||||
```sql
|
||||
SHOW INDEX FROM vec_table_hnsw;
|
||||
+-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+-----------+---------------+---------+------------+
|
||||
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
|
||||
+-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+-----------+---------------+---------+------------+
|
||||
| vec_table | 1 | vec_idx1 | 1 | c2 | A | NULL | NULL | NULL | YES | VECTOR | available | | YES | NULL |
|
||||
+-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+-----------+---------------+---------+------------+
|
||||
1 row in set
|
||||
```

#### Example of HNSW_SQ index

Create a test table.

```sql
CREATE TABLE vec_table_hnsw_sq (c1 INT AUTO_INCREMENT, c2 VECTOR(3), PRIMARY KEY(c1));
```

Create an HNSW_SQ index.

```sql
CREATE VECTOR INDEX vec_idx2 ON vec_table_hnsw_sq(c2) WITH (distance=l2, type=hnsw_sq, lib=vsag, m=16, ef_construction = 200);
```

#### Example of HNSW_BQ index

This example assumes that a table named `vec_table_hnsw_bq` with a vector column `c2` already exists.

```sql
CREATE VECTOR INDEX vec_idx3 ON vec_table_hnsw_bq(c2) WITH (distance=l2, type=hnsw_bq, lib=vsag, m=16, ef_construction = 200);
```

For an HNSW_BQ index, the `distance` parameter supports only the `l2` value.

#### Example of IVF index

Create a test table.

```sql
CREATE TABLE vec_table_ivf (c1 INT, c2 VECTOR(3), PRIMARY KEY(c1));
```

Create an IVF index.

```sql
CREATE VECTOR INDEX vec_idx3 ON vec_table_ivf(c2) WITH (distance=l2, type=ivf_flat);
```

### Drop an index

Drop the `vec_idx1` index that was created on the `vec_table_hnsw` table.

```sql
DROP INDEX vec_idx1 ON vec_table_hnsw;
```

Verify that the index has been dropped.

```sql
SHOW INDEX FROM vec_table_hnsw;
```

The return result is as follows:

```shell
Empty set
```

<!--## Monitoring

seekdb vector indexes provide monitoring capabilities:

* You can view the basic information and real-time status of HNSW/HNSW_SQ/HNSW_BQ indexes through the [GV$OB_HNSW_INDEX_INFO](https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000004017373) view.
* You can view the basic information and real-time status of IVF/IVF_SQ/IVF_PQ indexes through the [GV$OB_IVF_INDEX_INFO](https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000004017374) view.-->

## Maintenance

When a large amount of incremental data accumulates, query performance decreases. To reduce the amount of incremental data in the table, seekdb provides the `DBMS_VECTOR` package for maintaining vector indexes.

### Incremental refresh

:::tip

IVF/IVF_SQ/IVF_PQ indexes do not support incremental refresh.

:::

If a large amount of data is written after the index is created, we recommend that you perform an incremental refresh by using the `REFRESH_INDEX` procedure. For more information, see [REFRESH_INDEX](https://en.oceanbase.com/docs/common-oceanbase-database-10000000002753999).

The system checks for incremental data every 15 minutes. If it finds more than 10,000 incremental records, it automatically performs an incremental refresh.
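
A minimal sketch of a manual incremental refresh on the `idx1` index of the `t1` table from the HNSW example in this topic. The argument order (index name, table name, vector column) is an assumption. Confirm the signature and any optional arguments in the linked `REFRESH_INDEX` reference before use.

```sql
-- Assumed argument order: index name, table name, vector column.
CALL DBMS_VECTOR.REFRESH_INDEX('idx1', 't1', 'c2');
```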

### Full refresh (rebuild)

#### Manual full table rebuild

If a large amount of data is updated or deleted after an index is created, we recommend that you use the `REBUILD_INDEX` procedure to perform a full refresh. For details and examples, see [REBUILD_INDEX](https://en.oceanbase.com/docs/common-oceanbase-database-10000000002754000).

The system checks whether a full refresh is needed every 24 hours. If the newly added data exceeds 20% of the original data, a full refresh is triggered automatically. The full refresh runs asynchronously in the background: a new index is created first, and then the old index is replaced. During the rebuild, the old index remains available, but the overall process is relatively slow.

The configuration item `vector_index_memory_saving_mode` controls memory usage during index rebuild. Enabling this mode can reduce memory consumption when vector indexes are rebuilt on partitioned tables. A vector index rebuild typically requires memory equal to twice the index size. With the memory-saving mode enabled, the system temporarily deletes the in-memory index of a partition after that partition is built, which releases memory and reduces the total memory required for the rebuild. For syntax and examples, see [vector_index_memory_saving_mode](https://en.oceanbase.com/docs/common-oceanbase-database-10000000002969408).

Notes:

* When you execute offline DDL operations (such as `ALTER TABLE` to modify the table structure or primary key), the index table is rebuilt. Because a degree of parallelism cannot be specified for this index rebuild, the system uses single-threaded execution by default. Therefore, when the data volume is large, the rebuild is slow and reduces the efficiency of the entire offline DDL operation.
* If you need to modify index parameters when rebuilding an index, you must specify both `type` and `distance` in the parameter list, and they must match the original index. For example, if the original index type is `hnsw` and the distance algorithm is `l2`, you must specify both `type=hnsw` and `distance=l2` during the rebuild.
* When rebuilding an index, the following are supported:
    * Modifying the `m`, `ef_search`, and `ef_construction` values.
    * Online rebuild of the `ef_search` parameter.
    * Rebuilding between the `hnsw` and `hnsw_sq` index types.
    * Rebuilding an IVF index as the same type: `ivf_flat` to `ivf_flat`, `ivf_sq8` to `ivf_sq8`, and `ivf_pq` to `ivf_pq`.
    * Setting a degree of parallelism during the rebuild. For examples, see [REBUILD_INDEX](https://en.oceanbase.com/docs/common-oceanbase-database-10000000002754000).
* When rebuilding an index, the following are not supported:
    * Modifying the `type` and `distance` values.
    * Rebuilding between `hnsw` and `ivf`.
    * Rebuilding between `hnsw` and `hnsw_bq`.
    * Cross rebuilding among `ivf_flat`, `ivf_pq`, and `ivf_sq8`.
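
A minimal sketch of a manual full rebuild that keeps the existing index parameters, again using the `idx1` index of the `t1` table from the HNSW example. The argument order (index name, table name, vector column) is an assumption. The linked `REBUILD_INDEX` reference describes the additional arguments for changing parameters or setting a degree of parallelism.

```sql
-- Assumed argument order: index name, table name, vector column.
-- No parameter list is passed, so type and distance stay as originally created.
CALL DBMS_VECTOR.REBUILD_INDEX('idx1', 't1', 'c2');
```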

#### Automatic partition rebuild (recommended)

:::tip
<ul><li>This feature is supported starting from V1.0.0. If your vector database is upgraded from an earlier version to V1.0.0, you need to manually rebuild all vector indexes for the entire table after the upgrade. Otherwise, automatic partition rebuild tasks may not be executed after the upgrade.</li><li>This feature supports only HNSW/HNSW_SQ/HNSW_BQ indexes.</li></ul>
:::

In the current version, automatic partition rebuild tasks are triggered in two scenarios:

* When a vector index query statement is executed.
* During scheduled checks, whose execution cycle is configurable.

1. Configure the execution cycle

    In the `seekdb` database, configure the execution cycle through the configuration item `vector_index_optimize_duty_time`. Example:

    ```sql
    ALTER SYSTEM SET vector_index_optimize_duty_time='[23:00:00, 24:00:00]';
    ```

    With this configuration, partition rebuild tasks are executed only between 23:00:00 and 24:00:00 and are not initiated at other times. For detailed parameter descriptions, see the corresponding configuration item documentation.

2. View task progress and history

    You can view task progress through the `DBA_OB_VECTOR_INDEX_TASKS` or `CDB_OB_VECTOR_INDEX_TASKS` view, and task history through the `DBA_OB_VECTOR_INDEX_TASK_HISTORY` or `CDB_OB_VECTOR_INDEX_TASK_HISTORY` view.

    Determine the current task status through the `status` field:

    * 0 (PREPARE): The task is waiting to be executed.
    * 1 (RUNNING): The task is being executed.
    * 2 (PENDING): The task is paused.
    * 3 (FINISHED): The task has been completed.

    Completed tasks, that is, tasks with `status=FINISHED`, are archived to the history table regardless of whether they succeeded. For detailed usage examples, see the corresponding view documentation.

3. Cancel a task

    To cancel a task, obtain the `trace_id` from the `DBA_OB_VECTOR_INDEX_TASKS` or `CDB_OB_VECTOR_INDEX_TASKS` view, and then execute the following command:

    ```sql
    ALTER SYSTEM CANCEL TASK <trace_id>;
    ```

    Example:

    ```sql
    ALTER SYSTEM CANCEL TASK "Y61480BA2D976-00063084E80435E2-0-1";
    ```
|
||||
|
||||
## Performance optimization
|
||||
|
||||
:::tip
|
||||
This feature applies only to IVF indexes.
|
||||
:::
|
||||
|
||||
seekdb provides an automatic performance optimization mechanism for the IVF index to improve query performance through cache management and regular maintenance.
|
||||
|
||||
### Optimization mechanism
|
||||
|
||||
IVF index performance optimization includes two types of automated tasks:
|
||||
|
||||
1. Cache warming task: Periodically checks all IVF indexes. If it finds that the cache corresponding to an index does not exist, it automatically triggers cache warming and loads the index data into memory. Additionally, cache warming is automatically performed when an IVF index is created.
|
||||
2. Cache cleanup task: Periodically checks all IVF caches. If it finds that the cache corresponds to an index that has been deleted, it automatically cleans up the invalid cache and releases memory resources. Additionally, cache cleanup is automatically performed when an IVF index is deleted.
|
||||
|
||||
### Configure the optimization cycle
|
||||
|
||||
The system allows you to customize the execution time window for performance optimization tasks to avoid impacting performance during peak business hours.
|
||||
|
||||
In the `seekdb` database, you can set the execution cycle using the `vector_index_optimize_duty_time` parameter:
|
||||
|
||||
```sql
|
||||
ALTER SYSTEM SET vector_index_optimize_duty_time='[23:00:00, 24:00:00]';
|
||||
```
|
||||
|
||||
The configuration is described as follows:
|
||||
|
||||
* The time format is `[start time, end time]`.
|
||||
* The above configuration means that optimization tasks will only be executed between 23:00:00 and 24:00:00.
|
||||
* Optimization tasks will not be initiated at other times to avoid impacting normal business operations.
|
||||
|
||||
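The following sketch sets a different window and then reads the value back. The window `[01:00:00, 05:00:00]` is a hypothetical off-peak period, and the `SHOW PARAMETERS` statement is assumed to be available as in OceanBase-compatible systems; it is not confirmed by this topic.

```sql
-- Hypothetical window: run optimization tasks only between 01:00:00 and 05:00:00.
ALTER SYSTEM SET vector_index_optimize_duty_time='[01:00:00, 05:00:00]';

-- Assumption: SHOW PARAMETERS is supported for reading back configuration items.
SHOW PARAMETERS LIKE 'vector_index_optimize_duty_time';
```

Choose a window that falls outside your peak business hours, because optimization tasks are not initiated at any other time.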
### Monitor optimization tasks
|
||||
|
||||
seekdb vector indexes provide monitoring capabilities for optimization tasks:
|
||||
|
||||
* You can view tasks that are being executed or waiting to be executed through the `DBA_OB_VECTOR_INDEX_TASKS` view.
|
||||
* You can view historical task records through the `DBA_OB_VECTOR_INDEX_TASK_HISTORY` view.
|
||||
|
||||
Usage examples:
|
||||
|
||||
1. View the current task status
|
||||
|
||||
View tasks that are being executed or waiting to be executed through the `DBA_OB_VECTOR_INDEX_TASKS` view:
|
||||
|
||||
```sql
|
||||
SELECT * FROM oceanbase.DBA_OB_VECTOR_INDEX_TASKS;
|
||||
```
|
||||
|
||||
Sample return result:
|
||||
|
||||
```shell
|
||||
+----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+
|
||||
| TABLE_ID | TABLET_ID | TASK_ID | START_TIME | MODIFY_TIME | TRIGGER_TYPE | STATUS | TASK_TYPE | TASK_SCN | RET_CODE | TRACE_ID |
|
||||
+----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+
|
||||
| 500020 | 1152921504606846990 | 2002281 | 1970-08-23 17:10:23.174127 | 1970-08-23 17:10:23.174137 | USER | FINISHED | 2 | 1750671687770026 | 0 | YAFF00B9E4D97-00063839E6BD9BBC-0-1 |
|
||||
+----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+
|
||||
1 row in set
|
||||
```
|
||||
|
||||
Description of the task status:
|
||||
|
||||
* `STATUS = 0`: PREPARE, the task is waiting to be executed.
|
||||
* `STATUS = 1`: RUNNING, the task is being executed.
|
||||
* `STATUS = 3`: FINISHED, the task has been completed.
|
||||
|
||||
Description of the task type:
|
||||
|
||||
* `TASK_TYPE = 2`: IVF cache warming task.
|
||||
* `TASK_TYPE = 3`: IVF invalid cache cleanup task.
|
||||
|
||||
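To see at a glance how many IVF optimization tasks exist in each state, you can group the view by task type and status. This is a sketch that relies only on the `TASK_TYPE` and `STATUS` columns shown in the sample output above.

```sql
-- Count IVF cache warming (2) and cache cleanup (3) tasks per status.
SELECT TASK_TYPE, STATUS, COUNT(*) AS task_count
FROM oceanbase.DBA_OB_VECTOR_INDEX_TASKS
WHERE TASK_TYPE IN (2, 3)
GROUP BY TASK_TYPE, STATUS;
```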
2. View historical task records
|
||||
|
||||
Completed tasks (with `STATUS = 3`) are automatically archived to the history table every 10 seconds, regardless of whether they succeeded. View the history through the `DBA_OB_VECTOR_INDEX_TASK_HISTORY` view:
|
||||
|
||||
```sql
|
||||
-- Query the history of a specified task ID
|
||||
SELECT * FROM oceanbase.DBA_OB_VECTOR_INDEX_TASK_HISTORY WHERE TASK_ID=2002281;
|
||||
```
|
||||
|
||||
Sample return result:
|
||||
|
||||
```shell
|
||||
+----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+
|
||||
| TABLE_ID | TABLET_ID | TASK_ID | START_TIME | MODIFY_TIME | TRIGGER_TYPE | STATUS | TASK_TYPE | TASK_SCN | RET_CODE | TRACE_ID |
|
||||
+----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+
|
||||
| 500020 | 1152921504606846990 | 2002281 | 1970-08-23 17:10:23.174127 | 1970-08-23 17:10:23.174137 | AUTO | FINISHED | 2 | 1750671687770026 | 0 | YAFF00B9E4D97-00063839E6BD9BBC-0-1 |
|
||||
+----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+
|
||||
1 row in set
|
||||
```
|
||||
|
||||
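Because tasks are archived whether or not they succeeded, it can help to filter the history for tasks that did not end cleanly. The sketch below assumes that a non-zero `RET_CODE` indicates a failed or canceled task, which is inferred from the sample outputs in this topic (`0` for the successful task, `-4072` for the canceled one) rather than stated explicitly.

```sql
-- Show the 20 most recent archived tasks that ended with a non-zero return code.
-- Assumption: RET_CODE = 0 means success; any other value means failure or cancellation.
SELECT TASK_ID, TABLE_ID, TABLET_ID, TASK_TYPE, RET_CODE, TRACE_ID, START_TIME
FROM oceanbase.DBA_OB_VECTOR_INDEX_TASK_HISTORY
WHERE RET_CODE <> 0
ORDER BY START_TIME DESC
LIMIT 20;
```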
### Cancel an optimization task
|
||||
|
||||
You can cancel a specified task by using the following command:
|
||||
|
||||
```sql
|
||||
-- trace_id is obtained from the DBA_OB_VECTOR_INDEX_TASK_HISTORY view
|
||||
ALTER SYSTEM CANCEL TASK <trace_id>;
|
||||
```
|
||||
|
||||
:::tip
|
||||
You can cancel a task only in the failed retry phase by executing the <code>ALTER SYSTEM CANCEL TASK</code> statement. If a background task is stuck in a specific execution phase, it cannot be canceled by using this statement.
|
||||
:::
|
||||
|
||||
Example:
|
||||
|
||||
```sql
|
||||
-- Log in to the system and obtain the trace_id of the specified task
|
||||
SELECT * FROM oceanbase.DBA_OB_VECTOR_INDEX_TASK_HISTORY WHERE TASK_ID=2037736;
|
||||
+----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+
|
||||
| TABLE_ID | TABLET_ID | TASK_ID | START_TIME | MODIFY_TIME | TRIGGER_TYPE | STATUS | TASK_TYPE | TASK_SCN | RET_CODE | TRACE_ID |
|
||||
+----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+
|
||||
| 500041 | 1152921504606847008 | 2037736 | 1970-08-23 17:10:23.203821 | 1970-08-23 17:10:23.203821 | USER | PREPARED | 2 | 1750682301145225 | -1 | YAFF00B9E4D97-00063839E6BDDEE0-0-1 |
|
||||
+----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+
|
||||
1 row in set
|
||||
```
|
||||
|
||||
```sql
|
||||
-- Cancel the task
|
||||
ALTER SYSTEM CANCEL TASK "YAFF00B9E4D97-00063839E6BDDEE0-0-1";
|
||||
```
|
||||
|
||||
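After the task is canceled, it is archived to the history view. In the sample output below, the canceled task has `STATUS` set to `FINISHED` and a non-zero `RET_CODE` (`-4072`).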
|
||||
|
||||
```sql
|
||||
-- Log in to the user database and query the task status
|
||||
SELECT * FROM oceanbase.DBA_OB_VECTOR_INDEX_TASK_HISTORY;
|
||||
+----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+
|
||||
| TABLE_ID | TABLET_ID | TASK_ID | START_TIME | MODIFY_TIME | TRIGGER_TYPE | STATUS | TASK_TYPE | TASK_SCN | RET_CODE | TRACE_ID |
|
||||
+----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+
|
||||
| 500041 | 1152921504606847008 | 2037736 | 1970-08-23 17:10:23.203821 | 1970-08-23 17:10:23.203821 | USER | FINISHED | 2 | 1750682301145225 | -4072 | YAFF00B9E4D97-00063839E6BDDEE0-0-1 |
|
||||
+----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+
|
||||
1 row in set
|
||||
```
|
||||
|
||||
## References
|
||||
|
||||
* [Use SQL functions](../250.vector-function.md)
|
||||
@@ -0,0 +1,374 @@
|
||||
---
|
||||
|
||||
slug: /hybrid-vector-index
|
||||
---
|
||||
|
||||
# Create a hybrid vector index
|
||||
|
||||
This topic describes how to create a hybrid vector index in seekdb.
|
||||
|
||||
## Overview
|
||||
|
||||
Hybrid vector indexes use seekdb's built-in embedding capabilities to simplify working with vector indexes. They make vectors transparent to users: you write the raw data (such as text) that needs to be stored, and seekdb automatically converts it to vectors and builds the index internally. During retrieval, you provide only the raw query content, and seekdb automatically embeds it and searches the vector index.
|
||||
|
||||
Considering the performance overhead of embedding models, hybrid vector indexes provide two embedding modes for users to choose from:
|
||||
* Synchronous mode: Embedding and indexing are performed immediately after data is written, ensuring real-time data visibility.
|
||||
* Asynchronous mode: Background tasks perform data embedding and indexing in batches, which can significantly improve write performance. You can flexibly set the trigger cycle of background tasks based on your requirements for real-time data visibility.
|
||||
|
||||
This feature also supports brute-force search on hybrid vector indexes to help verify the correctness of search results. A brute-force search performs a full table scan to obtain the exact n nearest rows.
|
||||
|
||||
## Feature support
|
||||
|
||||
:::tip
|
||||
This feature currently supports only HNSW/HNSW_BQ indexes.
|
||||
:::
|
||||
|
||||
This feature supports the full lifecycle of hybrid vector indexes, including creation, update, deletion, and retrieval, and is compatible with `REFRESH_INDEX` and `REBUILD_INDEX` in the `DBMS_VECTOR` system package. The syntax for update, deletion, and retrieval is exactly the same as that for regular vector indexes. In asynchronous mode, `REFRESH_INDEX` will additionally trigger data embedding. For details about creation and retrieval, see the sections below.
|
||||
|
||||
The supported features are as follows:
|
||||
|
||||
| Module | Feature | Description |
|
||||
|------|--------|------|
|
||||
| DDL | Create a hybrid vector index during table creation | You can create a hybrid vector index on a `VARCHAR` column when creating a table |
|
||||
| DDL | Create a hybrid vector index after table creation | Supports creating a hybrid vector index on a `VARCHAR` column of an existing table |
|
||||
| Retrieval | `semantic_distance` function | Pass raw data through this function for vector retrieval |
|
||||
| Retrieval | `semantic_vector_distance` function | Pass vectors through this function for retrieval. There are two usage modes: <ul><li>When the SQL statement includes the `APPROXIMATE`/`APPROX` clause, vector index retrieval is used.</li><li>When the `APPROXIMATE`/`APPROX` clause is not included, brute-force search using a full table scan is performed.</li></ul> |
|
||||
| DBMS_VECTOR | `REFRESH_INDEX` | The usage is the same as that for regular vector indexes. Performs incremental index refresh and embedding in asynchronous mode |
|
||||
| DBMS_VECTOR | `REBUILD_INDEX` | The usage is the same as that for regular vector indexes. Performs full index rebuild |
|
||||
|
||||
Some usage notes are as follows:
|
||||
|
||||
* In synchronous mode, write performance may be affected by embedding performance. In asynchronous mode, data visibility will be delayed.
|
||||
* For repeated retrieval scenarios, it is recommended to use AI Function Service to pre-obtain query vectors to avoid embedding for each retrieval.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
Before using hybrid vector indexes, you must register an embedding model and endpoint. The following is a registration example:
|
||||
|
||||
```sql
|
||||
CALL DBMS_AI_SERVICE.DROP_AI_MODEL ('ob_embed');
|
||||
CALL DBMS_AI_SERVICE.DROP_AI_MODEL_ENDPOINT ('ob_embed_endpoint');
|
||||
|
||||
CALL DBMS_AI_SERVICE.CREATE_AI_MODEL(
|
||||
'ob_embed', '{
|
||||
"type": "dense_embedding",
|
||||
"model_name": "BAAI/bge-m3"
|
||||
}');
|
||||
|
||||
CALL DBMS_AI_SERVICE.CREATE_AI_MODEL_ENDPOINT (
|
||||
'ob_embed_endpoint', '{
|
||||
"ai_model_name": "ob_embed",
|
||||
"url": "https://api.siliconflow.cn/v1/embeddings",
|
||||
"access_key": "sk-xxxxxxxxxxxxxxxxxxxxxxxxxxx",
|
||||
"provider": "siliconflow"
|
||||
}');
|
||||
```
|
||||
|
||||
:::info
|
||||
Replace <code>access_key</code> with your actual API Key. The BAAI/bge-m3 model has a vector dimension of 1024, so you need to use <code>dim=1024</code> when creating a hybrid vector index.
|
||||
:::
|
||||
|
||||
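After registration, a quick smoke test can confirm that the model and endpoint respond before you create an index on top of them. This sketch reuses the `AI_EMBED` call that appears in the retrieval examples later in this topic; the exact format of the returned value is not described here, so treat the output as something to inspect rather than rely on.

```sql
-- Request one embedding from the registered model.
-- An error here suggests the model name, endpoint URL, or access key needs attention.
SELECT AI_EMBED('ob_embed', 'Sunflower') AS embedding;
```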
## Creation syntax and description
|
||||
|
||||
Hybrid vector indexes support two creation methods: **creation during table creation** and **creation after table creation**. When creating an index, note the following:
|
||||
|
||||
* The index must be created on a column of the `VARCHAR` type.
|
||||
* The `model` and `sync_mode` parameters are not supported for regular vector indexes.
|
||||
* The parameters and descriptions for an index created after table creation are the same as those for an index created during table creation.
|
||||
|
||||
### Create during table creation
|
||||
|
||||
You can use the `CREATE TABLE` statement to create a hybrid vector index. The index parameters determine whether embedding runs synchronously or asynchronously. In synchronous mode, `VARCHAR` data is converted to vector data as soon as it is inserted. In asynchronous mode, the conversion is performed periodically by background tasks or triggered manually.
|
||||
|
||||
#### Syntax
|
||||
|
||||
```sql
|
||||
CREATE TABLE table_name (
|
||||
column_name1 data_type1,
|
||||
column_name2 VARCHAR, -- Text column
|
||||
...,
|
||||
VECTOR INDEX index_name (column_name2) WITH (param1=value1, param2=value2, ...)
|
||||
);
|
||||
```
|
||||
|
||||
#### Parameter description
|
||||
|
||||
| Parameter | Default value | Value range | Required | Description | Remarks |
|
||||
|------|--------|----------|----------|------|------|
|
||||
| `distance` | | `l2`/`inner_product`/`cosine` | Yes | Specifies the vector distance algorithm type. | `l2` indicates Euclidean distance, `inner_product` indicates inner product distance, and `cosine` indicates cosine distance. |
|
||||
| `type` | | Currently supports `hnsw` / `hnsw_bq` | Yes | Specifies the index algorithm type. | |
|
||||
| `lib` | `vsag` | `vsag` | No | Specifies the vector index library type. | Currently, only the VSAG vector library is supported. |
|
||||
| `model` | | Registered model name | Yes | Specifies the name of the model used for embedding. | The model must be registered using AI Function Service before creating the index.<main id="notice" type='notice'><h4>Note</h4><p>Regular vector indexes do not support setting this parameter.</p></main> |
|
||||
| `dim` | | Positive integer, maximum 4096 | Yes | Specifies the vector dimension after embedding. | Must match the dimension provided by the model. |
|
||||
| `sync_mode` | `async` | `immediate`/`manual`/`async` | No | Specifies the data and index synchronization mode. | `immediate` indicates synchronous mode, `manual` indicates manual mode, and `async` indicates asynchronous mode.<main id="notice" type='notice'><h4>Note</h4><p>Regular vector indexes do not support setting this parameter.</p></main> |
|
||||
| `sync_interval` | `10s` | Time interval, such as `10s`, `1h`, `1d`, etc. | No | Sets the trigger cycle of background tasks in asynchronous mode. | The numeric part must be positive. Units supported include seconds (s), hours (h), days (d), etc. |
|
||||
|
||||
The usage of other vector index parameters (such as `m`, `ef_construction`, `ef_search`, etc.) is the same as that for regular vector indexes. For details, see the related documentation.
|
||||
|
||||
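The following sketch combines several of the parameters above in one definition. The table name `articles`, the index name `articles_idx`, and the values `m=16` and `ef_construction=200` are illustrative assumptions; the other values are taken from the value ranges in the table.

```sql
-- Hybrid vector index with cosine distance, refreshed by a background task every hour.
-- Assumes the ob_embed model (1024 dimensions) is registered as described in "Prerequisites".
CREATE TABLE articles (
  id BIGINT PRIMARY KEY,
  body VARCHAR(1000),
  VECTOR INDEX articles_idx(body)
    WITH (distance=cosine, lib=vsag, type=hnsw, model=ob_embed, dim=1024,
          sync_mode=async, sync_interval=1h,
          m=16, ef_construction=200)  -- illustrative HNSW tuning values
);
```

A longer `sync_interval` reduces how often embedding runs in the background, at the cost of newly written rows taking longer to become searchable.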
### Create after table creation
|
||||
|
||||
You can create a hybrid vector index on a `VARCHAR` column of an existing table. The index parameters determine whether embedding runs synchronously or asynchronously. In synchronous mode, all existing `VARCHAR` data is converted to vector data. In asynchronous mode, the conversion is performed periodically by background tasks or triggered manually.
|
||||
|
||||
#### Syntax
|
||||
|
||||
```sql
|
||||
CREATE VECTOR INDEX index_name
|
||||
ON table_name(varchar_column_name)
|
||||
WITH (param1=value1, param2=value2, ...);
|
||||
```
|
||||
|
||||
#### Parameter description
|
||||
|
||||
The parameter description is the same as that for creating an index during table creation. For details, see the section above.
|
||||
|
||||
## Create, update, and delete examples
|
||||
|
||||
DML operations (`INSERT`, `UPDATE`, `DELETE`) for hybrid vector indexes are exactly the same as those for regular vector indexes. When inserting or updating data of the `VARCHAR` type, the system automatically or asynchronously performs embedding based on the `sync_mode` parameter setting.
|
||||
|
||||
### Create during table creation
|
||||
|
||||
Create the `vector_idx` index when creating the test table `items`:
|
||||
|
||||
```sql
|
||||
-- Assume that the ob_embed model has been created previously (please refer to the "Prerequisites" section to register the model)
|
||||
CREATE TABLE items (
|
||||
id BIGINT PRIMARY KEY,
|
||||
doc VARCHAR(100),
|
||||
VECTOR INDEX vector_idx(doc)
|
||||
WITH (distance=l2, lib=vsag, type=hnsw, model=ob_embed, dim=1024, sync_mode=async, sync_interval=10s)
|
||||
);
|
||||
```
|
||||
|
||||
Insert a row of data into the test table `items`. Because the index uses `sync_mode=async`, a background task performs the embedding at the next trigger cycle (`sync_interval=10s` in this example):
|
||||
|
||||
```sql
|
||||
INSERT INTO items(id, doc) VALUES(1, 'Rose');
|
||||
```
|
||||
|
||||
### Create after table creation
|
||||
|
||||
After creating the test table `items`, use the `CREATE VECTOR INDEX` statement to create the `vector_idx` index:
|
||||
|
||||
```sql
|
||||
CREATE TABLE items (
|
||||
id BIGINT PRIMARY KEY,
|
||||
doc VARCHAR(100)
|
||||
);
|
||||
|
||||
-- Assume that the ob_embed model has been created previously (please refer to the "Prerequisites" section to register the model)
|
||||
CREATE VECTOR INDEX vector_idx
|
||||
ON items (doc)
|
||||
WITH (distance=l2, lib=vsag, type=hnsw, model=ob_embed, dim=1024, sync_mode=async, sync_interval=10s);
|
||||
```
|
||||
|
||||
Insert a row of data into the test table `items`. Because the index uses `sync_mode=async`, a background task performs the embedding at the next trigger cycle (`sync_interval=10s` in this example):
|
||||
|
||||
```sql
|
||||
INSERT INTO items(id, doc) VALUES(1, 'Rose');
|
||||
```
|
||||
|
||||
### Update
|
||||
|
||||
When updating data of the `VARCHAR` type, the system will re-perform embedding:
|
||||
|
||||
* Synchronous mode: Re-embedding is performed immediately after update.
|
||||
* Asynchronous mode: Re-embedding is performed by background tasks at the next trigger cycle after update.
|
||||
|
||||
Usage example:
|
||||
|
||||
```sql
|
||||
UPDATE items SET doc = 'Lily' WHERE id = 1;
|
||||
```
|
||||
|
||||
### Delete
|
||||
|
||||
The delete operation is the same as that for regular vector indexes. You can directly delete the data.
|
||||
|
||||
Usage example:
|
||||
|
||||
```sql
|
||||
DELETE FROM items WHERE id = 1;
|
||||
```
|
||||
|
||||
## Retrieval
|
||||
|
||||
Hybrid vector indexes support two retrieval methods:
|
||||
|
||||
* Retrieve using raw text
|
||||
* Retrieve using vectors
|
||||
|
||||
For detailed usage of the `APPROXIMATE`/`APPROX` clause, see the related documentation on creating vector indexes at the end of this topic.
|
||||
|
||||
### Retrieve using raw text
|
||||
|
||||
Use the `semantic_distance` expression to pass raw text for vector retrieval.
|
||||
|
||||
#### Syntax
|
||||
|
||||
```sql
|
||||
SELECT ... FROM table_name
|
||||
ORDER BY semantic_distance(column_name, 'query_text') [APPROXIMATE|APPROX]
|
||||
LIMIT n;
|
||||
```
|
||||
|
||||
Where:
|
||||
* `column_name`: The text column specified when creating the hybrid vector index.
|
||||
* `query_text`: The raw text for retrieval.
|
||||
* `n`: The number of result rows to return.
|
||||
|
||||
#### Usage example
|
||||
|
||||
```sql
|
||||
-- Assume that the ob_embed model has been created previously
|
||||
CREATE TABLE items (
|
||||
id INT PRIMARY KEY,
|
||||
doc varchar(100),
|
||||
VECTOR INDEX vector_idx(doc)
|
||||
WITH (distance=l2, lib=vsag, type=hnsw, model=ob_embed, dim=1024, sync_mode=immediate)
|
||||
);
|
||||
|
||||
INSERT INTO items(id, doc) VALUES(1, 'Rose');
|
||||
INSERT INTO items(id, doc) VALUES(2, 'Sunflower');
|
||||
INSERT INTO items(id, doc) VALUES(3, 'Rose');
|
||||
INSERT INTO items(id, doc) VALUES(4, 'Sunflower');
|
||||
INSERT INTO items(id, doc) VALUES(5, 'Rose');
|
||||
|
||||
-- Retrieve using raw text
|
||||
SELECT id, doc FROM items
|
||||
ORDER BY semantic_distance(doc, 'Sunflower')
|
||||
APPROXIMATE LIMIT 3;
|
||||
```
|
||||
|
||||
The return result is as follows:
|
||||
|
||||
```shell
|
||||
+----+-----------+
|
||||
| id | doc |
|
||||
+----+-----------+
|
||||
| 2 | Sunflower |
|
||||
| 4 | Sunflower |
|
||||
| 5 | Rose |
|
||||
+----+-----------+
|
||||
3 rows in set
|
||||
```
|
||||
|
||||
### Retrieve using vectors (with APPROXIMATE clause)
|
||||
|
||||
Use the `semantic_vector_distance` expression to pass vectors for retrieval. When the retrieval statement includes the `APPROXIMATE`/`APPROX` clause, vector index retrieval is used.
|
||||
|
||||
#### Syntax
|
||||
|
||||
```sql
|
||||
SELECT ... FROM table_name
|
||||
ORDER BY semantic_vector_distance(column_name, 'query_vector') [APPROXIMATE|APPROX]
|
||||
LIMIT n;
|
||||
```
|
||||
|
||||
Where:
|
||||
* `column_name`: The text column specified when creating the hybrid vector index.
|
||||
* `query_vector`: The query vector.
|
||||
* `n`: The number of result rows to return.
|
||||
|
||||
#### Usage example
|
||||
|
||||
```sql
|
||||
-- Assume that the ob_embed model has been created previously (please refer to the "Prerequisites" section to register the model)
|
||||
CREATE TABLE items (
|
||||
id INT PRIMARY KEY,
|
||||
doc varchar(100),
|
||||
VECTOR INDEX vector_idx(doc)
|
||||
WITH (distance=l2, lib=vsag, type=hnsw, model=ob_embed, dim=1024, sync_mode=immediate)
|
||||
);
|
||||
|
||||
INSERT INTO items(id, doc) VALUES(1, 'Rose');
|
||||
INSERT INTO items(id, doc) VALUES(2, 'Lily');
|
||||
INSERT INTO items(id, doc) VALUES(3, 'Sunflower');
|
||||
INSERT INTO items(id, doc) VALUES(4, 'Rose');
|
||||
|
||||
-- First, obtain the query vector
|
||||
SET @query_vector = AI_EMBED('ob_embed', 'Sunflower');
|
||||
|
||||
-- Retrieve using vectors with index
|
||||
SELECT id, doc FROM items
|
||||
ORDER BY semantic_vector_distance(doc, @query_vector)
|
||||
APPROXIMATE LIMIT 3;
|
||||
```
|
||||
|
||||
The return result is as follows:
|
||||
|
||||
```shell
|
||||
+----+-----------+
|
||||
| id | doc |
|
||||
+----+-----------+
|
||||
| 3 | Sunflower |
|
||||
| 1 | Rose |
|
||||
| 4 | Rose |
|
||||
+----+-----------+
|
||||
3 rows in set
|
||||
```
|
||||
|
||||
### Retrieve using vectors (without APPROXIMATE clause)
|
||||
|
||||
Use the `semantic_vector_distance` expression to pass vectors for retrieval. When the `APPROXIMATE`/`APPROX` clause is not included, brute-force search using a full table scan is performed to obtain the exact results of the n nearest rows. During retrieval execution, the `distance` type is obtained from the table schema, and then a full table scan is performed. Vector distance is calculated for each row to ensure accurate results.
|
||||
|
||||
#### Syntax
|
||||
|
||||
```sql
|
||||
SELECT ... FROM table_name
|
||||
ORDER BY semantic_vector_distance(column_name, 'query_vector')
|
||||
LIMIT n;
|
||||
```
|
||||
|
||||
Where:
|
||||
* `column_name`: The text column specified when creating the hybrid vector index.
|
||||
* `query_vector`: The query vector.
|
||||
* `n`: The number of result rows to return.
|
||||
|
||||
#### Usage example
|
||||
|
||||
```sql
|
||||
-- Retrieve using vectors with brute-force search (exact results)
|
||||
SELECT id, doc FROM items
|
||||
ORDER BY semantic_vector_distance(doc, @query_vector)
|
||||
LIMIT 3;
|
||||
```
|
||||
|
||||
The return result is as follows:
|
||||
|
||||
```shell
|
||||
+----+-----------+
|
||||
| id | doc |
|
||||
+----+-----------+
|
||||
| 3 | Sunflower |
|
||||
| 4 | Rose |
|
||||
| 1 | Rose |
|
||||
+----+-----------+
|
||||
3 rows in set
|
||||
```
|
||||
|
||||
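Since brute-force search is intended for verifying results, one way to compare the two modes is to count how many rows they have in common. The query below is only a sketch: it assumes that both forms can be used inside derived tables, which this topic does not confirm, and it reuses the `items` table and the `@query_vector` variable from the examples above.

```sql
-- Count how many of the top 3 approximate results also appear in the exact top 3.
-- Assumption: vector retrieval is allowed inside derived tables.
SELECT COUNT(*) AS overlap
FROM (
  SELECT id FROM items
  ORDER BY semantic_vector_distance(doc, @query_vector)
  APPROXIMATE LIMIT 3
) AS approx_result
JOIN (
  SELECT id FROM items
  ORDER BY semantic_vector_distance(doc, @query_vector)
  LIMIT 3
) AS exact_result
ON approx_result.id = exact_result.id;
```

An `overlap` equal to the `LIMIT` value means the index returned the same set of rows as the full table scan for this query. Rows with identical distances, such as the duplicate `Rose` rows, may appear in a different order without indicating a problem.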
## Index maintenance
|
||||
|
||||
Hybrid vector indexes support using the `DBMS_VECTOR` system package for index maintenance, including incremental refresh and full rebuild.
|
||||
|
||||
### Incremental refresh
|
||||
|
||||
If a large amount of data is written after the index is created, it is recommended to use the `REFRESH_INDEX` procedure for incremental refresh. For descriptions and examples, see the related documentation.
|
||||
|
||||
Special notes for hybrid vector indexes:
|
||||
* The usage is the same as that for regular vector indexes. For details, see the related documentation.
|
||||
* In asynchronous mode, `REFRESH_INDEX` will additionally trigger data embedding to ensure that incremental data is correctly converted to vectors and added to the index.
|
||||
|
||||
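A minimal sketch of an incremental refresh on the `items` example table is shown below. It assumes the procedure accepts the index name and table name as its first two arguments, as it does for regular vector indexes; see the `REFRESH_INDEX` reference for the full parameter list.

```sql
-- Refresh the hybrid vector index; in asynchronous mode this also triggers embedding.
-- Assumption: the remaining procedure arguments are optional and keep their defaults.
CALL DBMS_VECTOR.REFRESH_INDEX('vector_idx', 'items');
```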
### Full refresh (rebuild)
|
||||
|
||||
If a large amount of data is updated or deleted after the index is created, it is recommended to use the `REBUILD_INDEX` procedure for full refresh. For descriptions and examples, see the related documentation.
|
||||
|
||||
Special notes for hybrid vector indexes:
|
||||
* The usage is the same as that for regular vector indexes. For details, see the related documentation.
|
||||
* The task merges incremental data and snapshots.
|
||||
|
||||
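A full rebuild of the same example index can be sketched in the same way. As above, the two-argument form (index name, table name) is an assumption based on the usage for regular vector indexes; see the `REBUILD_INDEX` reference for the full parameter list.

```sql
-- Rebuild the hybrid vector index on the items table from scratch.
-- Assumption: the remaining procedure arguments are optional and keep their defaults.
CALL DBMS_VECTOR.REBUILD_INDEX('vector_idx', 'items');
```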
## Related documentation
|
||||
|
||||
* [AI Function Service](../../300.ai-function/200.ai-function.md)
|
||||
* [Create a vector index](200.dense-vector-index.md)
|
||||
* [REFRESH_INDEX](https://en.oceanbase.com/docs/common-oceanbase-database-10000000002753999)
|
||||
* [REBUILD_INDEX](https://en.oceanbase.com/docs/common-oceanbase-database-10000000002754000)
|
||||
@@ -0,0 +1,279 @@
|
||||
---
|
||||
|
||||
slug: /in-memory-sparse-vector-index
|
||||
---
|
||||
|
||||
# In-memory sparse vector index
|
||||
|
||||
This topic describes how to create, query, and use in-memory sparse vector indexes in seekdb.
|
||||
|
||||
## Overview
|
||||
|
||||
In-memory sparse vector indexes are an efficient index type provided by seekdb for sparse vector data (vectors where most elements are zero). In-memory sparse vector indexes must be fully loaded into memory and support DML and real-time queries.
|
||||
|
||||
To improve the query performance of sparse vectors, seekdb integrates the sparse vector index (SINDI) from the VSAG algorithm library. This index performs better than disk-based sparse vector indexes and is suitable for use when memory resources are sufficient.
|
||||
|
||||
## Feature support
|
||||
|
||||
In-memory sparse vector indexes support the following features:
|
||||
|
||||
| Module | Feature | Description |
|
||||
|------|--------|------|
|
||||
| DDL | Create a sparse vector index during table creation | You can create a sparse vector index on a `SPARSEVECTOR` column when creating a table. The maximum supported dimension is 500,000. |
|
||||
| DDL | Create a sparse vector index after table creation | Supports creating a sparse vector index on a `SPARSEVECTOR` column of an existing table. The maximum supported dimension is 500,000. |
|
||||
| DML | Insert, update, delete | The syntax for DML operations is exactly the same as that for regular vector indexes. |
|
||||
| Retrieval | Vector retrieval | Supports retrieval using SQL functions. |
|
||||
| Retrieval | Query parameters | Supports setting query-level parameters through the `parameters` clause during retrieval. |
|
||||
| DBMS_VECTOR | `REFRESH_INDEX` | Performs incremental index refresh. |
|
||||
| DBMS_VECTOR | `REBUILD_INDEX` | Performs full index rebuild. |
|
||||
|
||||
## Index memory estimation and actual usage query
|
||||
|
||||
You can estimate index memory through the `DBMS_VECTOR` system package. The usage is the same as that for dense vector indexes. Only the requirement specific to sparse vector indexes is described here:
|
||||
|
||||
* The `IDX_TYPE` parameter must be set to `SINDI`, case-insensitive.
|
||||
|
||||
## Creation syntax and description
|
||||
|
||||
In-memory sparse vector indexes support two creation methods: **creation during table creation** and **creation after table creation**. When creating an index, note the following:
|
||||
|
||||
* The maximum supported dimension for columns on which sparse vector indexes are created is 500,000.
|
||||
* Sparse vector indexes must be created on columns of the `SPARSEVECTOR` type.
|
||||
* The `VECTOR` keyword is required when creating an index.
|
||||
* The index type must be set to `sindi`, which indicates creating an in-memory sparse vector index.
|
||||
* Only the `inner_product` (inner product) distance algorithm is supported.
|
||||
* The parameters and descriptions for an index created after table creation are the same as those for an index created during table creation.
|
||||
|
||||
### Create during table creation
|
||||
|
||||
You can use the `CREATE TABLE` statement to create a sparse vector index.
|
||||
|
||||
#### Syntax
|
||||
|
||||
```sql
|
||||
CREATE TABLE table_name (
|
||||
column_name1 data_type1,
|
||||
column_name2 SPARSEVECTOR,
|
||||
...,
|
||||
VECTOR INDEX index_name (column_name2) WITH (param1=value1, param2=value2, ...)
|
||||
);
|
||||
```
|
||||
|
||||
#### Parameter description
|
||||
|
||||
| Parameter | Default value | Value range | Required | Description | Remarks |
|
||||
|------|--------|----------|----------|------|------|
|
||||
| `distance` | | `inner_product` | Yes | Specifies the vector distance algorithm type. | Sparse vector indexes support only inner product (`inner_product`) as the distance algorithm. |
|
||||
| `type` | | `sindi` | Yes | Specifies the index algorithm type. | Indicates creating an in-memory sparse vector index. |
|
||||
| `lib` | `vsag` | `vsag` | No | Specifies the vector index library type. | Currently, only the VSAG vector library is supported. |
|
||||
| `prune` | `false` | `true`/`false` | No | Whether to perform pruning on vectors. | When `prune` is `true`, you need to set the `refine` and `drop_ratio_build` parameters. When `prune` is `false`, full-precision retrieval is provided; in this case, setting `refine` to `true` or `drop_ratio_build` to a non-zero value returns an error. |
|
||||
| `refine` | `false` | `true`/`false` | No | Whether reranking is needed. | When set to `true`, the original sparse vectors are retrieved for the search results to perform high-precision distance calculation and reranking, which means an additional copy of the original vector data needs to be stored. Can be set only when `prune=true`. |
|
||||
| `drop_ratio_build` | `0` | `[0, 0.9]` | No | The pruning ratio for sparse vector data. | When a new sparse vector is inserted, the `query_length * drop_ratio_build` smallest values are pruned based on value size. If `refine` is `true`, the original vector data is preserved. Otherwise, only the pruned data is retained. Can be set only when `prune=true`. |
|
||||
| `drop_ratio_search` | `0` | `[0, 0.9]` | No | The pruning ratio for sparse vector values during retrieval. | The larger the value, the more pruning is performed, the lower the accuracy, and the higher the performance. Can also be set through the `parameters` clause during retrieval, and query parameters have higher priority. |
|
||||
| `refine_k` | `4.0` | `[1.0, 1000.0]` | No | Indicates the proportion of results participating in reranking. | Retrieves `limit_k * refine_k` results and obtains the original vectors for reranking. Meaningful only when `refine=true`. Can also be set through the `parameters` clause during retrieval, and query parameters have higher priority. |
|
||||
|
||||
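
The constraints in the table can be read as a small set of validation rules. The following Python sketch restates them for illustration only; `validate_sindi_params` is a hypothetical helper, not a seekdb API, and the error messages are invented.

```python
def validate_sindi_params(params):
    """Check a WITH (...) parameter set against the documented rules.

    Returns the parameters with the documented defaults filled in,
    or raises ValueError. Illustrative only; not part of seekdb.
    """
    defaults = {
        "lib": "vsag",
        "prune": False,
        "refine": False,
        "drop_ratio_build": 0.0,
        "drop_ratio_search": 0.0,
        "refine_k": 4.0,
    }
    # distance and type are required, and each accepts a single value.
    if params.get("distance") != "inner_product":
        raise ValueError("distance is required and must be inner_product")
    if params.get("type") != "sindi":
        raise ValueError("type is required and must be sindi")

    merged = {**defaults, **params}
    if merged["lib"] != "vsag":
        raise ValueError("lib must be vsag")
    for name in ("drop_ratio_build", "drop_ratio_search"):
        if not 0.0 <= merged[name] <= 0.9:
            raise ValueError(f"{name} must be in [0, 0.9]")
    if not 1.0 <= merged["refine_k"] <= 1000.0:
        raise ValueError("refine_k must be in [1.0, 1000.0]")
    # refine and drop_ratio_build can be set only when prune=true.
    if not merged["prune"] and (merged["refine"] or merged["drop_ratio_build"] != 0):
        raise ValueError("refine and drop_ratio_build require prune=true")
    return merged


checked = validate_sindi_params(
    {"distance": "inner_product", "type": "sindi",
     "prune": True, "refine": True, "drop_ratio_build": 0.1}
)
print(checked["refine_k"])  # 4.0 (documented default)
```

A parameter set such as `prune=false, refine=true` fails the last check, which mirrors the error case described in the `prune` row.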
### Create after table creation

You can also create a sparse vector index on a `SPARSEVECTOR` column of an existing table.

#### Syntax

```sql
CREATE VECTOR INDEX index_name ON table_name(column_name) WITH (param1=value1, param2=value2, ...);
```

#### Parameter description

The parameters are the same as those for creating an index during table creation. For details, see the section above.

## Create, update, and delete examples

### Create during table creation

Create the test table `sparse_t1` and create a sparse vector index:

```sql
CREATE TABLE sparse_t1 (
  c1 INT PRIMARY KEY,
  c2 SPARSEVECTOR,
  VECTOR INDEX sparse_idx1(c2)
    WITH (lib=vsag, type=sindi, distance=inner_product)
);
```

Insert sparse vector data into the test table:

```sql
INSERT INTO sparse_t1 VALUES(1, '{1:0.1, 2:0.2, 3:0.3}');
INSERT INTO sparse_t1 VALUES(2, '{3:0.3, 2:0.2, 4:0.4}');
INSERT INTO sparse_t1 VALUES(3, '{3:0.3, 4:0.4, 5:0.5}');
```

Query the test table:

```sql
SELECT * FROM sparse_t1;
```

The return result is as follows:

```shell
+----+---------------------+
| c1 | c2                  |
+----+---------------------+
|  1 | {1:0.1,2:0.2,3:0.3} |
|  2 | {2:0.2,3:0.3,4:0.4} |
|  3 | {3:0.3,4:0.4,5:0.5} |
+----+---------------------+
3 rows in set
```

### Create after table creation

Create a sparse vector index after creating the test table:

```sql
CREATE TABLE sparse_t2 (
  c1 INT PRIMARY KEY,
  c2 SPARSEVECTOR
);

CREATE VECTOR INDEX sparse_idx2 ON sparse_t2(c2)
  WITH (lib=vsag, type=sindi, distance=inner_product,
        prune=true, refine=true, drop_ratio_build=0.1,
        drop_ratio_search=0.5, refine_k=2.0);
```

Insert sparse vector data into the test table:

```sql
INSERT INTO sparse_t2 VALUES(1, '{1:0.1, 2:0.2, 3:0.3}');
```

Query the test table:

```sql
SELECT * FROM sparse_t2;
```

The return result is as follows:

```shell
+----+---------------------+
| c1 | c2                  |
+----+---------------------+
|  1 | {1:0.1,2:0.2,3:0.3} |
+----+---------------------+
1 row in set
```

### Update

When you update sparse vector data, the index is maintained automatically:

```sql
UPDATE sparse_t1 SET c2 = '{1:0.1}' WHERE c1 = 1;
```

### Delete

Deleting data works the same way as with regular vector indexes. Delete the rows directly:

```sql
DELETE FROM sparse_t1 WHERE c1 = 1;
```

## Retrieval

The retrieval syntax for sparse vector indexes is similar to that for dense vector indexes: use the `APPROXIMATE` or `APPROX` keyword to perform approximate nearest neighbor retrieval.

### Syntax

```sql
SELECT ... FROM table_name
ORDER BY negative_inner_product(column_name, query_vector) [APPROXIMATE|APPROX]
LIMIT n [PARAMETERS(param1=value1, param2=value2)];
```

Where:

* `column_name`: The `SPARSEVECTOR` column specified when creating the sparse vector index.
* `query_vector`: The query vector, which can be a string in sparse vector format, such as `'{1:2.4, 3:1.5}'`.
* `n`: The number of result rows to return.
* `PARAMETERS`: Optional query-level parameters for setting `drop_ratio_search` and `refine_k`.

### Retrieval usage notes

For the general requirements, see [Dense vector index](../200.dense-vector-index.md). This section describes only the requirements specific to sparse vector indexes:

* Query parameter priority: query-level parameters set by `PARAMETERS` > query parameters set when building the index > default values.
* `drop_ratio_search`: Value range `[0, 0.9]`, default value `0`. The pruning ratio for sparse vector values during retrieval. The larger the value, the more values are pruned, which lowers accuracy and improves performance. The `query_length * drop_ratio_search` smallest values are pruned based on value size. Because pruning every value would be meaningless, at least one value is always retained.
* `refine_k`: Value range `[1.0, 1000.0]`, default value `4.0`. Indicates the proportion of results participating in reranking. The query retrieves `limit_k * refine_k` results and obtains the original vectors for reranking. Effective only when `refine=true`.

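
The priority rule and the `limit_k * refine_k` arithmetic can be sketched in a few lines of Python. This is an illustration only: `effective_search_params` and `rerank_candidates` are hypothetical names, and rounding the candidate count up is an assumption, since the text does not say how fractional results are handled.

```python
import math

# Documented defaults for the two query-tunable parameters.
DEFAULTS = {"drop_ratio_search": 0.0, "refine_k": 4.0}


def effective_search_params(index_params, query_params):
    """Resolve each parameter as: PARAMETERS clause > index definition > default."""
    return {
        name: query_params.get(name, index_params.get(name, default))
        for name, default in DEFAULTS.items()
    }


def rerank_candidates(limit_k, refine_k):
    """Number of results fetched for reranking: limit_k * refine_k.

    Rounding up is an assumption made for this sketch.
    """
    return math.ceil(limit_k * refine_k)


index_params = {"drop_ratio_search": 0.5, "refine_k": 2.0}  # values used for sparse_idx2
query_params = {"refine_k": 10.0}  # hypothetical PARAMETERS(refine_k=10.0)

params = effective_search_params(index_params, query_params)
print(params)  # {'drop_ratio_search': 0.5, 'refine_k': 10.0}
print(rerank_candidates(4, params["refine_k"]))  # 40
```

Here `refine_k` comes from the query, `drop_ratio_search` falls back to the index definition, and a `LIMIT 4` query would rerank 40 candidates under these assumptions.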
### Usage examples

#### Regular query

```sql
CREATE TABLE t1 (
  c1 INT PRIMARY KEY,
  c2 SPARSEVECTOR,
  VECTOR INDEX idx1(c2)
    WITH (lib=vsag, type=sindi, distance=inner_product)
);

INSERT INTO t1 VALUES(1, '{1:0.1, 2:0.2, 3:0.3}');
INSERT INTO t1 VALUES(2, '{3:0.3, 2:0.2, 4:0.4}');
INSERT INTO t1 VALUES(3, '{3:0.3, 4:0.4, 5:0.5}');
INSERT INTO t1 VALUES(4, '{5:0.5, 4:0.4, 6:0.6}');
INSERT INTO t1 VALUES(5, '{5:0.5, 6:0.6, 7:0.7}');

SELECT * FROM t1
ORDER BY negative_inner_product(c2, '{3:0.3, 4:0.4}')
APPROXIMATE LIMIT 4;
```

The return result is as follows:

```shell
+----+---------------------+
| c1 | c2                  |
+----+---------------------+
|  2 | {2:0.2,3:0.3,4:0.4} |
|  3 | {3:0.3,4:0.4,5:0.5} |
|  4 | {4:0.4,5:0.5,6:0.6} |
|  1 | {1:0.1,2:0.2,3:0.3} |
+----+---------------------+
```

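
To see why the rows come back in this order, the ranking can be reproduced with an exact (non-approximate) computation outside the database. The Python sketch below reimplements the negative inner product over the five inserted rows; it is a plain brute-force calculation and does not model the index itself.

```python
# The five rows inserted into t1, keyed by c1 (dimension -> value).
rows = {
    1: {1: 0.1, 2: 0.2, 3: 0.3},
    2: {2: 0.2, 3: 0.3, 4: 0.4},
    3: {3: 0.3, 4: 0.4, 5: 0.5},
    4: {4: 0.4, 5: 0.5, 6: 0.6},
    5: {5: 0.5, 6: 0.6, 7: 0.7},
}
query = {3: 0.3, 4: 0.4}


def negative_inner_product(vec, q):
    """Negated dot product over the dimensions that both vectors share."""
    return -sum(weight * vec.get(dim, 0.0) for dim, weight in q.items())


# Ascending order on the negated inner product puts the most similar rows first.
ranked = sorted(rows, key=lambda c1: negative_inner_product(rows[c1], query))
top4 = ranked[:4]
print(top4)  # [2, 3, 4, 1]
```

Rows 2 and 3 tie at an inner product of about 0.25, row 4 follows at about 0.16, and row 1 at 0.09. Row 5 shares no dimension with the query and falls outside `LIMIT 4`.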
#### Use query parameters

```sql
SELECT *, negative_inner_product(c2, '{3:0.3, 4:0.4}')
  AS score FROM t1
ORDER BY score APPROXIMATE LIMIT 4
PARAMETERS(drop_ratio_search=0.5);
```

The return result is as follows:

```shell
+----+---------------------+---------------------+
| c1 | c2                  | score               |
+----+---------------------+---------------------+
|  4 | {4:0.4,5:0.5,6:0.6} | -0.1600000113248825 |
|  3 | {3:0.3,4:0.4,5:0.5} | -0.2500000149011612 |
|  2 | {2:0.2,3:0.3,4:0.4} | -0.2500000149011612 |
+----+---------------------+---------------------+
3 rows in set
```

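
Only three rows are returned even though the query asks for four. One plausible reading, sketched below in Python, is that `drop_ratio_search=0.5` prunes the smaller of the two query values, so only rows that contain dimension `4` can match. The sketch rests on assumptions: the prune count is rounded down, and rows with no overlap with the pruned query are not returned. `prune_query` is a hypothetical helper, not a seekdb function.

```python
import math


def prune_query(q, drop_ratio):
    """Drop the floor(len(q) * drop_ratio) smallest values, keeping at least one.

    Rounding down is an assumption made for this sketch.
    """
    n_drop = max(0, min(math.floor(len(q) * drop_ratio), len(q) - 1))
    largest_first = sorted(q.items(), key=lambda kv: kv[1], reverse=True)
    return dict(largest_first[: len(q) - n_drop])


rows = {
    1: {1: 0.1, 2: 0.2, 3: 0.3},
    2: {2: 0.2, 3: 0.3, 4: 0.4},
    3: {3: 0.3, 4: 0.4, 5: 0.5},
    4: {4: 0.4, 5: 0.5, 6: 0.6},
    5: {5: 0.5, 6: 0.6, 7: 0.7},
}

pruned = prune_query({3: 0.3, 4: 0.4}, 0.5)
print(pruned)  # {4: 0.4}

# Rows that share at least one dimension with the pruned query.
matches = [c1 for c1, vec in rows.items() if any(dim in vec for dim in pruned)]
print(matches)  # [2, 3, 4]
```

Under this reading, rows 2, 3, and 4 all score the same against the pruned query, which would explain why the result is not sorted by the exact `score` column: that column is computed from the full query vector.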
## Index monitoring and maintenance

In-memory sparse vector indexes provide monitoring views and can be maintained with the `DBMS_VECTOR` system package, which supports incremental refresh and full rebuild. The usage is the same as for dense vector indexes.

## Related documentation

* For detailed information about sparse vector data types, see [Vector data type](../../700.vector-search-reference/100.vector-data-type.md).
* For detailed information about vector distance functions, see [Vector functions](../../250.vector-function.md).
* For monitoring and maintenance of dense vector indexes, see [Vector index monitoring/maintenance](../200.dense-vector-index.md).
* For index memory estimation and actual usage query of vector indexes, see [Index memory estimation and actual usage query](../200.dense-vector-index.md).