---
slug: /ai-function
---

# Use cases and examples of AI functions

This topic describes the features of AI functions in seekdb.

AI functions integrate AI model capabilities directly into data processing within databases through SQL expressions. This greatly simplifies operations such as data extraction, analysis, summarization, and storage using large AI models, making it an important new feature in the fields of databases and data warehouses. seekdb provides AI model and endpoint management through the `DBMS_AI_SERVICE` package, introduces several built-in AI function expressions, and supports monitoring AI model usage through views.
## Prerequisites

Before using AI functions, make sure you have the necessary privileges. For more information, see [AI function privileges](100.ai-function-permission.md).
## Considerations

Hybrid search relies on the model management and embedding capabilities of the AI function service. Before deleting an AI model, check whether it is referenced by hybrid search to avoid potential issues.
## AI model management

The `DBMS_AI_SERVICE` package provides the ability to manage AI models and endpoints. It supports the following operations:

| **Operation** | **Description** |
| ------ | ------ |
| CREATE_AI_MODEL | Creates an AI model object. |
| DROP_AI_MODEL | Drops an AI model object. |
| CREATE_AI_MODEL_ENDPOINT | Creates an AI model endpoint object. |
| ALTER_AI_MODEL_ENDPOINT | Modifies an AI model endpoint object. |
| DROP_AI_MODEL_ENDPOINT | Drops an AI model endpoint object. |

By using this system package, you can directly manage AI models and endpoints within seekdb, without relying on external services.
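
As a rough illustration of how the package is called (the procedure names come from the table above, but the parameter lists and JSON fields below are assumptions rather than documented signatures; see the `DBMS_AI_SERVICE` reference for the actual syntax):

```sql
-- Hypothetical sketch only: the parameter names and JSON layout are assumptions,
-- not the documented signatures of DBMS_AI_SERVICE.
CALL DBMS_AI_SERVICE.CREATE_AI_MODEL('ob_complete', '{"provider": "example_vendor", "model_name": "example-llm"}');
CALL DBMS_AI_SERVICE.CREATE_AI_MODEL_ENDPOINT('ob_complete_endpoint', '{"ai_model_name": "ob_complete", "url": "https://api.example.com/v1", "access_key": "****"}');
```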
## Monitor AI model usage

seekdb allows you to query and monitor information about AI models and their usage through the following views:

* CDB/DBA_OB_AI_MODELS: Query information about AI models.
* CDB/DBA_OB_AI_MODEL_ENDPOINTS: Monitor the calls of AI models.
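
For example, to list the registered models and endpoints (assuming the views are exposed under the `oceanbase` schema, as is typical for `DBA_OB_*` views in MySQL-compatible mode):

```sql
-- List registered AI models and their endpoints.
SELECT * FROM oceanbase.DBA_OB_AI_MODELS;
SELECT * FROM oceanbase.DBA_OB_AI_MODEL_ENDPOINTS;
```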
## AI function expressions

seekdb supports the following AI function expressions, allowing you to call AI models directly within seekdb using SQL statements and greatly simplifying the process:

| Function | Description |
|-----|------|
| `AI_COMPLETE` | Calls a specified text generation large language model (LLM) to process prompts and data, and then parses the results. |
| `AI_PROMPT` | Constructs and formats prompts. Supports dynamic data insertion. |
| `AI_EMBED` | Calls an embedding model to convert text data into vector data. |
| `AI_RERANK` | Calls a reranking model to sort texts by their similarity to a query. |

:::info
When using AI function expressions, make sure you have registered the AI models and endpoint information in the database.
:::
### AI_COMPLETE and AI_PROMPT

The `AI_COMPLETE` function uses the `model_key` parameter to specify a registered large language model (LLM) for text generation, processes the user-provided prompt and data, and returns the text generated by the model. You can customize the prompt and the format of the data from the database through the `prompt` parameter. This approach not only enables flexible processing of textual data, but also allows for batch processing directly within the database, effectively avoiding the overhead of repeatedly transferring data between the database and the language model.

In many AI application scenarios, prompts are often highly structured and require dynamic injection of specific data. Manually concatenating prompts and input content with functions such as `CONCAT` is not only costly in terms of development, but also prone to formatting errors. To support prompt reuse and dynamic combination of prompts and data, seekdb provides the `AI_PROMPT` function. `AI_PROMPT` upgrades prompts from static text to reusable, parameterizable templates that can be used directly in place of the `prompt` parameter within `AI_COMPLETE`. This greatly simplifies the process of constructing prompts, improving both development efficiency and accuracy.
#### AI_PROMPT function

The `AI_PROMPT` function is used to construct and format prompts, supporting dynamic data insertion.

##### Syntax

The syntax for the `AI_PROMPT` function is as follows:

```sql
AI_PROMPT('template', expr0 [, expr1, ...]);
```
Parameters:

| Parameter | Description | Type | Nullable |
|-----------|-------------|------|----------|
| template | The prompt template entered by the user. | VARCHAR(max_length) | No |
| expr | The data entered by the user. | VARCHAR(max_length) | No |

Both the `template` and `expr` parameters are required and cannot be null. The `expr` parameter only supports the `VARCHAR` type and does not support the `JSON` type.

Return value:

* JSON: the formatted prompt, containing the template and its arguments.
##### Examples

The `AI_PROMPT` function organizes the template string and dynamic data into JSON format:

* The first parameter (the template string `template`) is placed in the `template` field of the returned JSON.
* Subsequent parameters (data values `expr0`, `expr1`, ...) are placed in the `args` array of the returned JSON.
* Placeholders in the template such as `{0}`, `{1}`, etc., correspond by index to the data in the `args` array and will be automatically replaced when used in the `AI_COMPLETE` function.

For example:
```sql
SELECT AI_PROMPT('Recommend {0} of the most popular {1} to me.', 'ten', 'mobile phones');
```

Return result:

```json
{
  "template": "Recommend {0} of the most popular {1} to me.",
  "args": ["ten", "mobile phones"]
}
```

Building on the previous example, the following statement uses the `AI_PROMPT` function within the `AI_COMPLETE` function:
```sql
SELECT AI_COMPLETE("ob_complete", AI_PROMPT('Recommend {0} of the most popular {1} to me. just output name in json array format', 'two', 'mobile phones')) AS ans;
```

Return result:

```sql
+--------------------------------------------------+
| ans                                              |
+--------------------------------------------------+
| ["iPhone 15 Pro Max","Samsung Galaxy S24 Ultra"] |
+--------------------------------------------------+
```
#### AI_COMPLETE function

##### Syntax

The syntax for the `AI_COMPLETE` function is as follows:

```sql
AI_COMPLETE(model_key, prompt[, parameters])

-- If you use the AI_PROMPT function, pass it in place of the prompt parameter. See the AI_PROMPT function example.
AI_COMPLETE(model_key, AI_PROMPT(prompt_template, data))
```
Parameters:

| Parameter | Description | Type | Nullable |
|------|------|------|----------|
| model_key | The model registered in the database. | VARCHAR(128) | No |
| prompt | The prompt provided by the user. | VARCHAR/TEXT(LONGTEXT) | No |
| parameters | Optional configuration for the API, such as `temperature`, `top_p`, and `max_tokens`. These options vary by vendor and are added directly to the message body. Typically, you can use the default settings without specifying these options. | JSON | Yes |

Both `model_key` and `prompt` are required. If either is `NULL`, the function will return an error.
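
For reference, a call that passes vendor-specific options through `parameters` might look like the following. This is an illustrative sketch only; whether options such as `temperature` and `max_tokens` take effect depends on the model provider.

```sql
-- Illustrative only: the options are forwarded to the vendor API as part of the request body.
SELECT AI_COMPLETE("ob_complete",
                   "Write a one-sentence summary of what an embedding model does.",
                   '{"temperature": 0.2, "max_tokens": 128}') AS ans;
```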
Return value:

* text: The text generated by the LLM based on the prompt.

##### Examples
1. Sentiment analysis example

```sql
SELECT AI_COMPLETE("ob_complete","Your task is to perform sentiment analysis on the provided text and determine whether the sentiment is positive or negative.
The text to analyze is as follows:
<text>
What a beautiful day!
</text>
Judgment criteria:
If the text expresses a positive sentiment, output 1; if it expresses a negative sentiment, output -1. Do not output anything else.\n") AS ans;
```

Result:

```sql
+-----+
| ans |
+-----+
|   1 |
+-----+
```
2. Translation example

```sql
CREATE TABLE comments (
    id INT AUTO_INCREMENT PRIMARY KEY,
    content TEXT
);

INSERT INTO comments (content) VALUES ('hello world!');

-- Build the prompt from a table column with the concat expression, so the data is processed in batch inside the database instead of being copied back and forth between the database and the LLM.
SELECT AI_COMPLETE("ob_complete",
       concat("You are a professional translator. Please translate the following English text into Chinese. The text to be translated is:<text>",
       content,
       "</text>")) AS ans FROM comments;
```

Result:

```sql
+-------------+
| ans         |
+-------------+
| 你好,世界! |
+-------------+
```
3. Classification example

```sql
SELECT AI_COMPLETE("ob_complete","You are a classification expert. You will receive various issue texts and need to categorize them into the appropriate department. The department list is [\"Hardware\",\"Software\",\"Other\"]. The text to analyze is as follows:
<text>
The screen quality is terrible.
</text>") AS res;
```

Result:

```sql
+----------+
| res      |
+----------+
| Hardware |
+----------+
```
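
The translation example above builds the prompt with `concat`. The same batch processing can also be written with `AI_PROMPT`, which keeps the template reusable. The following is an illustrative sketch that assumes the `comments` table created in the translation example; the output is omitted because it depends on the model.

```sql
-- Illustrative alternative to the concat-based prompt: the {0} placeholder is filled with the content column of each row.
SELECT AI_COMPLETE("ob_complete",
       AI_PROMPT('You are a professional translator. Please translate the following English text into Chinese. The text to be translated is:<text>{0}</text>', content)) AS ans
FROM comments;
```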
### AI_EMBED

The `AI_EMBED` function uses the `model_key` parameter to specify a registered embedding model, which converts your text data into vector representations. If the model supports multiple dimensions, you can use the `dim` parameter to specify the output dimension.

#### Use AI_EMBED

Syntax:
```sql
AI_EMBED(model_key, input[, dim])
```

Parameters:

| Parameter | Description | Type | Nullable |
|------|------|------|----------|
| model_key | The embedding model registered in your database. | VARCHAR(128) | No |
| input | The text you want to convert into a vector. | VARCHAR | No |
| dim | Specifies the output dimension of the vector. Some model providers support configuring this value. | INT64 | Yes |

Both `model_key` and `input` are required. If either is `NULL`, the function will return an error.

Return value:

* A string in vector format, that is, the embedding model's vector representation of your text.
#### Examples

1. Embed a single row of data.

```sql
SELECT AI_EMBED("ob_embed","Hello world") AS embedding;
```

Result:

```sql
+----------------+
| embedding      |
+----------------+
| [0.1, 0.2, 0.3]|
+----------------+
```
2. Embed table columns.

```sql
CREATE TABLE comments (
    id INT AUTO_INCREMENT PRIMARY KEY,
    content TEXT
);

INSERT INTO comments (content) VALUES ('hello world!');

SELECT AI_EMBED("ob_embed",content) AS embedding FROM comments;
```

Result:

```sql
+----------------+
| embedding      |
+----------------+
| [0.1, 0.2, 0.3]|
+----------------+
```
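
If the registered embedding model supports configurable output dimensions, you can also pass the optional `dim` argument. The following is an illustrative sketch; the value 1024 is an assumption and must be a dimension that the model provider actually supports.

```sql
-- Illustrative only: request a 1024-dimensional vector from a model that supports this dimension.
SELECT AI_EMBED("ob_embed", "Hello world", 1024) AS embedding;
```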
### AI_RERANK

The `AI_RERANK` function uses the `model_key` parameter to specify a registered reranking model. It organizes your query and document list according to the provider's rules, sends them to the specified model, and returns the sorted results. This function is suitable for reranking scenarios in Retrieval-Augmented Generation (RAG).

#### Use AI_RERANK

Syntax:

```sql
AI_RERANK(model_key, query, documents[, document_key])
```
Parameters:

| Parameter | Description | Type | Nullable |
|------|------|------|----------|
| model_key | The reranking model registered in your database. | VARCHAR(128) | No |
| query | The search text you want to use. | VARCHAR(1024) | No |
| documents | The list of documents to be ranked. | JSON array, for example, `'["apple", "banana"]'` | No |

The `model_key`, `query`, and `documents` parameters are all required. If any of them is `NULL`, the function will return an error.

Return value:

* A JSON array containing the documents and their relevance scores, sorted in descending order by relevance.
#### Examples

```sql
SELECT AI_RERANK("ob_rerank","Apple",'["apple","banana","fruit","vegetable"]');
```

Result:
```sql
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ai_rerank("ob_rerank","Apple",'["apple","banana","fruit","vegetable"]') |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| [{"index": 0, "document": {"text": "apple"}, "relevance_score": 0.9912109375}, {"index": 1, "document": {"text": "banana"}, "relevance_score": 0.0033512115478515625}, {"index": 2, "document": {"text": "fruit"}, "relevance_score": 0.0003669261932373047}, {"index": 3, "document": {"text": "vegetable"}, "relevance_score": 0.00001996755599975586}] |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
```
## References

* [Vector embedding technology](../100.vector-search/150.vector-embedding-technology.md)
* [Privilege types in MySQL-compatible mode](https://en.oceanbase.com/docs/common-oceanbase-database-10000000001974758)