Initial commit

This commit is contained in:
Zhongwei Li
2025-11-30 08:44:54 +08:00
commit eb309b7b59
133 changed files with 21979 additions and 0 deletions

View File

@@ -0,0 +1,611 @@
---
slug: /vector-function
---
# Use SQL functions
This topic describes the vector functions supported by seekdb and the considerations for using them.
## Considerations
* Vectors with different dimensions are not allowed to perform the following operations. An error `different vector dimensions %d and %d` is returned.
* When the result exceeds the floating-point number range, an error `value out of range: overflow / underflow` is returned.
* Dense vector indexes support L2, inner product, and cosine distance as index distance algorithms. Memory-based sparse vector indexes support inner product as the distance algorithm. For details, see [Create vector indexes](200.vector-index/200.dense-vector-index.md).
* Vector index search supports calling the `L2_distance`, `Cosine_distance`, and `Inner_product` distance functions in this document.
## Distance functions
Distance functions are used to calculate the distance between two vectors. The calculation method varies depending on the distance algorithm used.
### L2_distance
Euclidean distance reflects the distance between the coordinates of the compared vectors -- essentially the straight-line distance between two vectors. It is calculated by applying the Pythagorean theorem to vector coordinates:
![Pythagorean theorem](https://obbusiness-private.oss-cn-shanghai.aliyuncs.com/doc/img/observer/V4.3.5/vector_search/%E5%8B%BE%E8%82%A1%E5%AE%9A%E7%90%86.jpg)
The function syntax is as follows:
```sql
l2_distance(vector v1, vector v2)
```
The parameters are described as follows:
* Apart from the vector type, other types that can be forcibly converted to the vector type are accepted, including single-level array types (such as `[1,2,3,..]`) and string types (such as `'[1,2,3]'`).
* The dimensions of the two parameters must be the same.
* If a single-level array type parameter exists, its elements cannot be `NULL`.
The return values are described as follows:
* The return value is a `distance(double)` distance value.
* If any parameter is `NULL`, the return value is `NULL`.
Here is an example:
```sql
CREATE TABLE t1(c1 vector(3));
INSERT INTO t1 VALUES('[1,2,3]');
SELECT l2_distance(c1, [1,2,3]), l2_distance([1,2,3],[1,1,1]), l2_distance('[1,1,1]','[1,2,3]') FROM t1;
```
The return result is as follows:
```shell
+--------------------------+------------------------------+----------------------------------+
| l2_distance(c1, [1,2,3]) | l2_distance([1,2,3],[1,1,1]) | l2_distance('[1,1,1]','[1,2,3]') |
+--------------------------+------------------------------+----------------------------------+
| 0 | 2.23606797749979 | 2.23606797749979 |
+--------------------------+------------------------------+----------------------------------+
1 row in set
```
### L2_squared
L2 Squared distance is the square of the Euclidean distance (L2 Distance). It omits the square root operation in the Euclidean distance formula, thereby reducing computational cost while maintaining the relative order of distances. The calculation method is as follows:
![L2 Squared](https://obbusiness-private.oss-cn-shanghai.aliyuncs.com/doc/img/observer/V4.4.1/%E5%90%91%E9%87%8F/l2-squared.jpg)
The syntax is as follows:
```sql
l2_squared(vector v1, vector v2)
```
The parameters are described as follows:
* Apart from the vector type, other types that can be forcibly converted to the vector type are accepted, including single-level array types (such as `[1,2,3,..]`) and string types (such as `'[1,2,3]'`).
* The dimensions of the two parameters must be the same.
* If a single-level array type parameter exists, its elements cannot be `NULL`.
The return values are described as follows:
* The return value is a `distance(double)` distance value.
* If any parameter is `NULL`, the return value is `NULL`.
Here is an example:
```sql
CREATE TABLE t1(c1 vector(3));
INSERT INTO t1 VALUES('[1,2,3]');
SELECT l2_squared(c1, [1,2,3]), l2_squared([1,2,3],[1,1,1]), l2_squared('[1,1,1]','[1,2,3]') FROM t1;
```
The return result is as follows:
```shell
+-------------------------+-----------------------------+---------------------------------+
| l2_squared(c1, [1,2,3]) | l2_squared([1,2,3],[1,1,1]) | l2_squared('[1,1,1]','[1,2,3]') |
+-------------------------+-----------------------------+---------------------------------+
| 0 | 5 | 5 |
+-------------------------+-----------------------------+---------------------------------+
1 row in set
```
### L1_distance
The Manhattan distance is used to calculate the sum of absolute axis distances between two points in a standard coordinate system. The calculation formula is as follows:
![Manhattan distance](https://obbusiness-private.oss-cn-shanghai.aliyuncs.com/doc/img/observer/V4.3.5/vector_search/l1.jpg)
The function syntax is as follows:
```sql
l1_distance(vector v1, vector v2)
```
The parameters are described as follows:
* Apart from the vector type, other types that can be forcibly converted to the vector type are accepted, including single-level array types (such as `[1,2,3,..]`) and string types (such as `'[1,2,3]'`).
* The dimensions of the two parameters must be the same.
* If a single-level array type parameter exists, its elements cannot be `NULL`.
The return values are described as follows:
* The return value is a `distance(double)` distance value.
* If any parameter is `NULL`, the return value is `NULL`.
Here is an example:
```sql
CREATE TABLE t2(c1 vector(3));
INSERT INTO t2 VALUES('[1,2,3]');
INSERT INTO t2 VALUES('[1,1,1]');
SELECT l1_distance(c1, [1,2,3]) FROM t2;
```
The return result is as follows:
```shell
+--------------------------+
| l1_distance(c1, [1,2,3]) |
+--------------------------+
| 0 |
| 3 |
+--------------------------+
2 rows in set
```
### Cosine_distance
Cosine similarity measures the angular difference between two vectors and reflects their directional similarity, regardless of their lengths (magnitude). The value range of cosine similarity is `[-1, 1]`, where `1` indicates vectors in exactly the same direction, `0` indicates orthogonality, and `-1` indicates completely opposite directions.
The calculation method for cosine similarity is as follows:
![Cosine similarity](https://obbusiness-private.oss-cn-shanghai.aliyuncs.com/doc/img/observer/V4.3.5/vector_search/%E4%BD%99%E5%BC%A6%E7%9B%B8%E4%BC%BC%E5%BA%A6.jpg)
Since cosine similarity closer to 1 indicates greater similarity, cosine distance (or cosine dissimilarity) is sometimes used as a measure of distance between vectors. Cosine distance can be calculated by subtracting cosine similarity from 1:
![Cosine distance](https://obbusiness-private.oss-cn-shanghai.aliyuncs.com/doc/img/observer/V4.3.5/vector_search/%E4%BD%99%E5%BC%A6%E8%B7%9D%E7%A6%BB.jpg)
The value range of cosine distance is `[0, 2]`, where `0` indicates exactly the same direction (no distance) and `2` indicates completely opposite directions.
The function syntax is as follows:
```sql
cosine_distance(vector v1, vector v2)
```
The parameters are described as follows:
* Apart from the vector type, other types that can be forcibly converted to the vector type are accepted, including single-level array types (such as `[1,2,3,..]`) and string types (such as `'[1,2,3]'`).
* The dimensions of the two parameters must be the same.
* If a single-level array type parameter exists, its elements cannot be `NULL`.
The return values are described as follows:
* The return value is a `distance(double)` distance value.
* If any parameter is `NULL`, the return value is `NULL`.
Here is an example:
```sql
CREATE TABLE t3(c1 vector(3));
INSERT INTO t3 VALUES('[1,2,3]');
INSERT INTO t3 VALUES('[1,2,1]');
SELECT cosine_distance(c1, [1,2,3]) FROM t3;
```
```shell
+------------------------------+
| cosine_distance(c1, [1,2,3]) |
+------------------------------+
| 0 |
| 0.12712843905603044 |
+------------------------------+
2 rows in set
```
### Inner_product
The inner product, also known as the dot product or scalar product, represents a type of multiplication between two vectors. In geometric terms, the inner product indicates the direction and magnitude relationship between two vectors. The calculation method for the inner product is as follows:
![Inner product](https://obbusiness-private.oss-cn-shanghai.aliyuncs.com/doc/img/observer/V4.3.5/vector_search/%E5%86%85%E7%A7%AF.jpg)
The syntax is as follows:
```sql
inner_product(vector v1, vector v2)
```
The parameters are described as follows:
* Apart from the vector type, other types that can be forcibly converted to the vector type are accepted, including single-level array types (such as `[1,2,3,..]`) and string types (such as `'[1,2,3]'`).
* The dimensions of the two parameters must be the same.
* If a single-level array type parameter exists, its elements cannot be `NULL`.
* For sparse vectors using this function, one parameter can be a string in sparse vector format, such as `c2,'{1:2.4}'`. Two parameters cannot both be strings.
The return values are described as follows:
* The return value is a `distance(double)` distance value.
* If any parameter is `NULL`, the return value is `NULL`.
Dense vector example:
```sql
CREATE TABLE t4(c1 vector(3));
INSERT INTO t4 VALUES('[1,2,3]');
INSERT INTO t4 VALUES('[1,2,1]');
SELECT inner_product(c1, [1,2,3]) FROM t4;
```
The return result is as follows:
```shell
+----------------------------+
| inner_product(c1, [1,2,3]) |
+----------------------------+
| 14 |
| 8 |
+----------------------------+
2 rows in set
```
Sparse vector example:
```sql
CREATE TABLE t4(c1 INT, c2 SPARSEVECTOR, c3 SPARSEVECTOR);
INSERT INTO t4 VALUES(1, '{1:1.1, 2:2.2}', '{1:2.4}');
INSERT INTO t4 VALUES(2, '{1:1.5, 3:3.6}', '{4:4.5}');
SELECT inner_product(c2,c3) FROM t4;
The return result is as follows:
```shell
+----------------------+
| inner_product(c2,c3) |
+----------------------+
| 2.640000104904175 |
| 0 |
+----------------------+
2 rows in set
```
### Vector_distance
The vector_distance function calculates the distance between two vectors. You can specify parameters to select different distance algorithms.
The syntax is as follows:
```sql
vector_distance(vector v1, vector v2 [, string metric])
```
The `vector v1/v2` parameters are described as follows:
* Apart from the vector type, other types that can be forcibly converted to the vector type are accepted, including single-level array types (such as `[1,2,3,..]`) and string types (such as `'[1,2,3]'`).
* The dimensions of the two parameters must be the same.
* If a single-level array type parameter exists, its elements cannot be `NULL`.
The `metric` parameter is used to specify the distance algorithm. Options:
* If not specified, the default algorithm is `euclidean`.
* If specified, the only valid values are:
* `euclidean`. Represents Euclidean distance, same as L2_distance.
* `manhattan`. Represents Manhattan distance, same as L1_distance.
* `cosine`. Represents cosine distance, same as Cosine_distance.
* `dot`. Represents inner product, same as Inner_product.
The return values are described as follows:
* The return value is a `distance(double)` distance value.
* If any parameter is `NULL`, the return value is `NULL`.
Here is an example:
```sql
CREATE TABLE t5(c1 vector(3));
INSERT INTO t5 VALUES('[1,2,3]');
INSERT INTO t5 VALUES('[1,2,1]');
SELECT vector_distance(c1, [1,2,3], euclidean) FROM t5;
```
The return result is as follows:
```shell
+-----------------------------------------+
| vector_distance(c1, [1,2,3], euclidean) |
+-----------------------------------------+
| 0 |
| 2 |
+-----------------------------------------+
2 rows in set
```
## Arithmetic functions
Arithmetic functions provide element-wise addition (+), subtraction (-), and multiplication (*) operations between vector types and vector types, single-level array types, and special string types, as well as between single-level array types and single-level array types, and special string types. The calculation method is element-wise, as shown for addition:
![Addition](https://obbusiness-private.oss-cn-shanghai.aliyuncs.com/doc/img/observer/V4.3.5/vector_search/%E5%8A%A0%E6%B3%95.jpg)
The syntax is as follows:
```sql
v1 + v2
v1 - v2
v1 * v2
```
The parameters are described as follows:
* Apart from the vector type, other types that can be forcibly converted to the vector type are accepted, including single-level array types (such as `[1,2,3,..]`) and string types (such as `'[1,2,3]'`). **Note**: Two parameters cannot both be string types. At least one parameter must be a vector or single-level array type.
* The dimensions of the two parameters must be the same.
* If a single-level array type parameter exists, its elements cannot be `NULL`.
The return values are described as follows:
* When at least one of the two parameters is a vector type, the return value is of the same vector type as the vector parameter.
* When both parameters are single-level array types, the return value is of the `array(float)` type.
* If any parameter is `NULL`, the return value is `NULL`.
Here is an example:
```sql
CREATE TABLE t6(c1 vector(3));
INSERT INTO t6 VALUES('[1,2,3]');
SELECT [1,2,3] + '[1.12,1000.0001, -1.2222]', c1 - [1,2,3] FROM t6;
```
The return result is as follows:
```shell
+---------------------------------------+--------------+
| [1,2,3] + '[1.12,1000.0001, -1.2222]' | c1 - [1,2,3] |
+---------------------------------------+--------------+
| [2.12,1002,1.7778] | [0,0,0] |
+---------------------------------------+--------------+
1 row in set
```
## Comparison functions
Comparison functions provide comparison calculations between vector types and vector types, single-level array types, and special string types, including the comparison operators `=`, `!=`, `>`, `>=`, `<`, and `<=`. The calculation method is element-wise dictionary order comparison.
The syntax is as follows:
```sql
v1 = v2
v1 != v2
v1 > v2
v1 < v2
v1 >= v2
v1 <= v2
```
The parameters are described as follows:
* Apart from the vector type, other types that can be forcibly converted to the vector type are accepted, including single-level array types (such as `[1,2,3,..]`) and string types (such as `'[1,2,3]'`).
:::tip
One of the two parameters must be a vector type.
:::
* The dimensions of the two parameters must be the same.
* If a single-level array type parameter exists, its elements cannot be `NULL`.
The return values are described as follows:
* The return value is of the bool type.
* If any parameter is `NULL`, the return value is `NULL`.
Here is an example:
```sql
CREATE TABLE t7(c1 vector(3));
INSERT INTO t7 VALUES('[1,2,3]');
SELECT c1 = '[1,2,3]' FROM t7;
```
The return result is as follows:
```shell
+----------------+
| c1 = '[1,2,3]' |
+----------------+
| 1 |
+----------------+
1 row in set
```
## Aggregate functions
:::tip
Vector columns cannot be used as GROUP BY conditions, and DISTINCT is not supported.
:::
### Sum
The Sum function is used to calculate the sum of vectors in a vector column of a table, using element-wise accumulation to obtain the sum vector.
The syntax is as follows:
```sql
sum(vector v1)
```
The parameters are described as follows:
* Only the vector type is supported.
The return values are described as follows:
* The return value is a `sum (vector)` value.
Here is an example:
```sql
CREATE TABLE t8(c1 vector(3));
INSERT INTO t8 VALUES('[1,2,3]');
SELECT sum(c1) FROM t8;
```
The return result is as follows:
```shell
+---------+
| sum(c1) |
+---------+
| [1,2,3] |
+---------+
1 row in set
```
### Avg
The Avg function is used to calculate the average of vectors in a vector column of a table.
The syntax is as follows:
```sql
avg(vector v1)
```
The parameters are described as follows:
* Only the vector type is supported.
The return values are described as follows:
* The return value is an `avg (vector)` value.
* `NULL` rows in the vector column are not counted.
* When the input parameter is empty, the output is `NULL`.
Here is an example:
```sql
CREATE TABLE t9(c1 vector(3));
INSERT INTO t9 VALUES('[1,2,3]');
SELECT avg(c1) FROM t9;
```
The return result is as follows:
```shell
+---------+
| avg(c1) |
+---------+
| [1,2,3] |
+---------+
1 row in set
```
## Other common vector functions
### Vector_norm
The Vector_norm function calculates the Euclidean norm (or length) of a vector, which represents the Euclidean distance between the vector and the origin. The calculation formula is as follows:
![Euclidean norm](https://obbusiness-private.oss-cn-shanghai.aliyuncs.com/doc/img/observer/V4.3.5/vector_search/%E6%AC%A7%E5%87%A0%E9%87%8C%E5%BE%B7.jpg)
The syntax is as follows:
```sql
vector_norm(vector v1)
```
The parameters are described as follows:
* Apart from the vector type, other types that can be forcibly converted to the vector type are accepted, including single-level array types (such as `[1,2,3,..]`) and string types (such as `'[1,2,3]'`).
* If a single-level array type parameter exists, its elements cannot be `NULL`.
The return values are described as follows:
* The return value is a `norm(double)` modulus value.
* If the parameter is `NULL`, the return value is `NULL`.
Here is an example:
```sql
CREATE TABLE t10(c1 vector(3));
INSERT INTO t10 VALUES('[1,2,3]');
SELECT vector_norm(c1),vector_norm([1,2,3]) FROM t10;
```
The return result is as follows:
```shell
+--------------------+----------------------+
| vector_norm(c1) | vector_norm([1,2,3]) |
+--------------------+----------------------+
| 3.7416573867739413 | 3.7416573867739413 |
+--------------------+----------------------+
1 row in set
```
### Vector_dims
The Vector_dims function is used to return the vector dimension.
The syntax is as follows:
```sql
vector_dims(vector v1)
```
The parameters are described as follows:
* Apart from the vector type, other types that can be forcibly converted to the vector type are accepted, including single-level array types (such as `[1,2,3,..]`) and string types (such as `'[1,2,3]'`).
The return values are described as follows:
* The return value is a `dims(int64)` dimension value.
* If the parameter is `NULL`, an error is returned.
Here is an example:
```sql
CREATE TABLE t11(c1 vector(3));
INSERT INTO t11 VALUES('[1,2,3]');
INSERT INTO t11 VALUES('[1,1,1]');
SELECT vector_dims(c1), vector_dims('[1,2,3]') FROM t11;
```
The return result is as follows:
```shell
+-----------------+------------------------+
| vector_dims(c1) | vector_dims('[1,2,3]') |
+-----------------+------------------------+
| 3 | 3 |
| 3 | 3 |
+-----------------+------------------------+
2 rows in set
```