---
sidebar_label: Jina AI
slug: /jina
---

# Integrate seekdb vector search with Jina AI

seekdb supports vector data storage, vector indexes, and embedding-based vector search. You can store vectorized data in seekdb for further search.

Jina AI is an AI platform focused on multimodal search and vector search. It offers core components and tools for building enterprise-grade Retrieval-Augmented Generation (RAG) applications based on multimodal search, helping organizations and developers create advanced search-driven generative AI solutions.

## Prerequisites

* You have deployed seekdb.

* You have an existing MySQL database and account available in your environment, and the database account has been granted read and write privileges.

* You have installed Python 3.11 or later.

* You have installed the required dependencies:

```shell
python3 -m pip install pyobvector requests sqlalchemy
```

## Step 1: Obtain the database connection information

Contact your seekdb deployment engineer or administrator to obtain the database connection string. For example:

```sql
obclient -h$host -P$port -u$user_name -p$password -D$database_name
```

**Parameters:**

* `$host`: The IP address for connecting to seekdb.
* `$port`: The port number for connecting to seekdb. Default is `2881`.
* `$database_name`: The name of the database to access.

:::tip
The connected user must have <code>CREATE</code>, <code>INSERT</code>, <code>DROP</code>, and <code>SELECT</code> privileges on the database.
:::

* `$user_name`: The username for connecting to the database.
* `$password`: The password for the account.
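These parameters map directly onto the connection arguments that `pyobvector`'s `ObVecClient` expects later in this guide: `$host` and `$port` combine into a `host:port` URI, while user, password, and database name are passed separately. A small sketch of that mapping (the `build_connection_args` helper is illustrative, not part of pyobvector; values are placeholders):

```python
def build_connection_args(host: str, port: int, user: str, password: str, db_name: str) -> dict:
    """Return keyword arguments in the shape ObVecClient takes.

    ObVecClient takes a single `uri` in "host:port" form plus separate
    user/password/db_name arguments, rather than a full MySQL DSN.
    """
    if not (0 < port < 65536):
        raise ValueError(f"invalid port: {port}")
    return {
        "uri": f"{host}:{port}",
        "user": user,
        "password": password,
        "db_name": db_name,
    }

args = build_connection_args("127.0.0.1", 2881, "test_user001", "******", "test")
print(args["uri"])  # 127.0.0.1:2881
```

Validating the parameters up front like this surfaces a typo before the first connection attempt fails.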

## Step 2: Build your AI assistant

### Set your Jina AI API key as an environment variable

Get your [Jina AI API key](https://jina.ai/api-dashboard/reader) and configure it, along with your seekdb connection details, as environment variables:

```shell
export OCEANBASE_DATABASE_URL=YOUR_OCEANBASE_DATABASE_URL
export OCEANBASE_DATABASE_USER=YOUR_OCEANBASE_DATABASE_USER
export OCEANBASE_DATABASE_DB_NAME=YOUR_OCEANBASE_DATABASE_DB_NAME
export OCEANBASE_DATABASE_PASSWORD=YOUR_OCEANBASE_DATABASE_PASSWORD
export JINAAI_API_KEY=YOUR_JINAAI_API_KEY
```
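A missing variable otherwise only shows up later as an opaque connection or authentication error. A small sketch that checks the configuration up front (the `missing_vars` helper is illustrative, not from any library; the variable names match the exports above):

```python
import os

# The variable names used in the export commands above.
REQUIRED_VARS = [
    "OCEANBASE_DATABASE_URL",
    "OCEANBASE_DATABASE_USER",
    "OCEANBASE_DATABASE_DB_NAME",
    "OCEANBASE_DATABASE_PASSWORD",
    "JINAAI_API_KEY",
]

def missing_vars(env=None) -> list:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_vars()` before constructing the client makes configuration gaps easy to spot.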

### Example code snippets

#### Get embeddings from Jina AI

Jina AI offers several embedding models. You can choose the one that best fits your needs.

| Model | Parameter size | Embedding dimension | Description |
| --- | --- | --- | --- |
| [jina-embeddings-v3](https://zilliz.com/ai-models/jina-embeddings-v3) | 570M | Flexible embedding size (default: 1024) | Multilingual text embeddings; supports 94 languages in total |
| [jina-embeddings-v2-small-en](https://zilliz.com/ai-models/jina-embeddings-v2-small-en) | 33M | 512 | English monolingual embeddings |
| [jina-embeddings-v2-base-en](https://zilliz.com/ai-models/jina-embeddings-v2-base-en) | 137M | 768 | English monolingual embeddings |
| [jina-embeddings-v2-base-zh](https://zilliz.com/ai-models/jina-embeddings-v2-base-zh) | 161M | 768 | Chinese-English bilingual embeddings |
| [jina-embeddings-v2-base-de](https://zilliz.com/ai-models/jina-embeddings-v2-base-de) | 161M | 768 | German-English bilingual embeddings |
| [jina-embeddings-v2-base-code](https://zilliz.com/ai-models/jina-embeddings-v2-base-code) | 161M | 768 | English and programming languages |

Here is an example using `jina-embeddings-v3`. The following helper function, `generate_embeddings`, calls the Jina AI embedding API:

```python
import os

import requests
from sqlalchemy import Column, Integer, String
from pyobvector import ObVecClient, VECTOR, IndexParam, cosine_distance

JINAAI_API_KEY = os.getenv('JINAAI_API_KEY')

# Step 1. Vectorize the text data.
def generate_embeddings(text: str):
    JINAAI_API_URL = 'https://api.jina.ai/v1/embeddings'
    JINAAI_HEADERS = {
        'Content-Type': 'application/json',
        'Authorization': f'Bearer {JINAAI_API_KEY}'
    }
    JINAAI_REQUEST_DATA = {
        'input': [text],
        'model': 'jina-embeddings-v3'
    }

    response = requests.post(JINAAI_API_URL, headers=JINAAI_HEADERS, json=JINAAI_REQUEST_DATA)
    response_json = response.json()
    return response_json['data'][0]['embedding']


TEXTS = [
    'Jina AI offers best-in-class embeddings, reranker and prompt optimizer, enabling advanced multimodal AI.',
    'OceanBase Database is an enterprise-level, native distributed database independently developed by the OceanBase team. It is cloud-native, highly consistent, and highly compatible with Oracle and MySQL.',
    'OceanBase is a native distributed relational database that supports HTAP hybrid transaction analysis and processing. It features enterprise-level characteristics such as high availability, transparent scalability, and multi-tenancy, and is compatible with MySQL/Oracle protocols.'
]
data = []
for text in TEXTS:
    # Generate the embedding for the text via the Jina AI API.
    embedding = generate_embeddings(text)
    data.append({
        'content': text,
        'content_vec': embedding
    })

print(f"Successfully processed {len(data)} texts")
```

#### Define the vector table structure and store vectors in seekdb

Create a table called `jinaai_oceanbase_demo_documents` with columns for the text (`content`) and the embedding vector (`content_vec`), plus a vector index on the embedding column. Then insert the vector data into seekdb:

```python
# Step 2. Connect to seekdb.
OCEANBASE_DATABASE_URL = os.getenv('OCEANBASE_DATABASE_URL')
OCEANBASE_DATABASE_USER = os.getenv('OCEANBASE_DATABASE_USER')
OCEANBASE_DATABASE_DB_NAME = os.getenv('OCEANBASE_DATABASE_DB_NAME')
OCEANBASE_DATABASE_PASSWORD = os.getenv('OCEANBASE_DATABASE_PASSWORD')

client = ObVecClient(
    uri=OCEANBASE_DATABASE_URL,
    user=OCEANBASE_DATABASE_USER,
    password=OCEANBASE_DATABASE_PASSWORD,
    db_name=OCEANBASE_DATABASE_DB_NAME
)

# Step 3. Create the vector table.
table_name = "jinaai_oceanbase_demo_documents"
client.drop_table_if_exist(table_name)

cols = [
    Column("id", Integer, primary_key=True, autoincrement=True),
    Column("content", String(500), nullable=False),
    Column("content_vec", VECTOR(1024))
]

# Create the vector index.
vector_index_params = IndexParam(
    index_name="idx_content_vec",
    field_name="content_vec",
    index_type="HNSW",
    distance_metric="cosine"
)

client.create_table_with_index_params(
    table_name=table_name,
    columns=cols,
    vidxs=[vector_index_params]
)

print('- Inserting Data to OceanBase...')
client.insert(table_name, data=data)
```

#### Semantic search

Use the Jina AI embedding API to generate an embedding for your query text. Then, search for the most relevant document by calculating the cosine distance between the query embedding and each embedding in the vector table:

```python
# Step 4. Query the most relevant document based on the query.
query = 'What is OceanBase?'
# Generate the embedding for the query via the Jina AI API.
query_embedding = generate_embeddings(query)

res = client.ann_search(
    table_name,
    vec_data=query_embedding,
    vec_column_name="content_vec",
    distance_func=cosine_distance,  # Use the cosine distance function.
    with_dist=True,
    topk=1,
    output_column_names=["id", "content"],
)

print('- The Most Relevant Document and Its Distance to the Query:')
for row in res.fetchall():
    print(f' - ID: {row[0]}\n'
          f'   content: {row[1]}\n'
          f'   distance: {row[2]}')
```

#### Expected result

```plain
 - ID: 2
   content: OceanBase Database is an enterprise-level, native distributed database independently developed by the OceanBase team. It is cloud-native, highly consistent, and highly compatible with Oracle and MySQL.
   distance: 0.14733879001870276
```
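The `distance` value above is the cosine distance between the query embedding and the stored embedding: 0 means identical direction, 2 means opposite. A quick illustration in plain Python, assuming the conventional 1 − cosine-similarity definition (toy 3-dimensional vectors, not real embeddings):

```python
import math

def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

print(cosine_distance([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]))  # 0.0: identical direction
print(cosine_distance([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # 1.0: orthogonal
```

Smaller distances therefore indicate more semantically similar documents, which is why the top-1 result above is the closest match to "What is OceanBase?".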
---
sidebar_label: OpenAI
slug: /openai
---

# OpenAI

OpenAI is an artificial intelligence company that has developed several large language models. These models excel at understanding and generating natural language, making them highly effective for tasks such as text generation, answering questions, and engaging in conversations. Access to these models is available through an API.

seekdb offers features such as vector storage, vector indexing, and embedding-based vector search. By using OpenAI's API, you can convert data into vectors, store these vectors in seekdb, and then take advantage of seekdb's vector search capabilities to find relevant data.

## Prerequisites

* You have deployed seekdb.
* You have an existing MySQL database and account available in your environment, and the database account has been granted read and write privileges.
* You have installed [Python 3.9 or later](https://www.python.org/downloads/) and [pip](https://pip.pypa.io/en/stable/installation/).
* You have installed [Poetry](https://python-poetry.org/docs/), [Pyobvector](https://github.com/oceanbase/pyobvector), and the OpenAI SDK. The installation commands are as follows:

```shell
python3 -m pip install poetry
python3 -m pip install pyobvector
python3 -m pip install openai
```

* You have obtained an [OpenAI API key](https://platform.openai.com/api-keys).

## Step 1: Obtain the connection string of seekdb

Contact the seekdb deployment engineer or administrator to obtain the connection string of seekdb, for example:

```sql
obclient -h$host -P$port -u$user_name -p$password -D$database_name
```

**Parameters:**

* `$host`: The IP address for connecting to seekdb.
* `$port`: The port number for connecting to seekdb. Default is `2881`.
* `$database_name`: The name of the database to be accessed.

:::tip
The user for connection must have the <code>CREATE</code>, <code>INSERT</code>, <code>DROP</code>, and <code>SELECT</code> privileges on the database.
:::

* `$user_name`: The database account.
* `$password`: The password of the account.

**Here is an example:**

```shell
obclient -hxxx.xxx.xxx.xxx -P2881 -utest_user001 -p****** -Dtest
```

## Step 2: Register an LLM account

Obtain an OpenAI API key:

1. Log in to the [OpenAI](https://platform.openai.com/) platform.

2. Click **API Keys** in the upper-right corner.

3. Click **Create API Key**.

4. Specify the required information and click **Create API Key**.

Then set the API key as an environment variable.

* For a Unix-based system such as Ubuntu or macOS, run the following command in a terminal:

```shell
export OPENAI_API_KEY='your-api-key'
```

* For a Windows system, run the following command in Command Prompt:

```shell
set OPENAI_API_KEY=your-api-key
```

Replace `your-api-key` with your actual OpenAI API key.

## Step 3: Store vector data in seekdb

### Store vector data in seekdb

1. Prepare the test data.

    Download the [CSV file](https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20240827/srxyhu/fine_food_reviews.csv) that already contains the vectorized data. This CSV file includes 1,000 food review entries, and the last column contains the vector values, so you do not need to calculate the vectors yourself. If you want to recalculate the embeddings for the `embedding` column (the vector column), you can use the following code to generate a new CSV file:

    ```python
    from openai import OpenAI
    import pandas as pd

    input_datapath = "./fine_food_reviews.csv"
    client = OpenAI()

    # Here the text-embedding-ada-002 model is used. You can change the model as needed.
    def embedding_text(text, model="text-embedding-ada-002"):
        # For more information about how to create embedding vectors, see https://community.openai.com/t/embeddings-api-documentation-needs-to-updated/475663.
        res = client.embeddings.create(input=text, model=model)
        return res.data[0].embedding

    df = pd.read_csv(input_datapath, index_col=0)
    # It takes a few minutes to generate the CSV file by calling the OpenAI Embedding API row by row.
    df["embedding"] = df.combined.apply(embedding_text)
    output_datapath = './fine_food_reviews_self_embeddings.csv'
    df.to_csv(output_datapath)
    ```

2. Run the following script to insert the test data into seekdb. The script must be located in the same directory as the test data.

    ```python
    import os
    import sys
    import csv
    import json
    from pyobvector import *
    from sqlalchemy import Column, Integer, String

    # Connect to seekdb by using pyobvector. Replace any at sign (@) in the username or password with %40.
    client = ObVecClient(uri="host:port", user="username", password="****", db_name="test")

    # The test dataset has been vectorized and is stored in the same directory as the Python script by default. If you vectorized the dataset again, specify the new file.
    file_name = "fine_food_reviews.csv"
    file_path = os.path.join("./", file_name)

    # Define the columns. The last column is a vector column.
    cols = [
        Column('id', Integer, primary_key=True, autoincrement=False),
        Column('product_id', String(256), nullable=True),
        Column('user_id', String(256), nullable=True),
        Column('score', Integer, nullable=True),
        Column('summary', String(2048), nullable=True),
        Column('text', String(8192), nullable=True),
        Column('combined', String(8192), nullable=True),
        Column('n_tokens', Integer, nullable=True),
        Column('embedding', VECTOR(1536))
    ]

    # Define the table name.
    table_name = 'fine_food_reviews'

    # If the table does not exist, create it.
    if not client.check_table_exists(table_name):
        client.create_table(table_name, columns=cols)
        # Create an index on the vector column.
        client.create_index(
            table_name=table_name,
            is_vec_index=True,
            index_name='vidx',
            column_names=['embedding'],
            vidx_params='distance=l2, type=hnsw, lib=vsag',
        )

    # Open and read the CSV file.
    with open(file_name, mode='r', newline='', encoding='utf-8') as csvfile:
        csvreader = csv.reader(csvfile)
        # Read the header line.
        headers = next(csvreader)
        print("Headers:", headers)

        batch = []  # Buffer rows and insert them into the database 10 at a time.
        for i, row in enumerate(csvreader):
            # The CSV file contains nine columns: `id`, `product_id`, `user_id`, `score`, `summary`, `text`, `combined`, `n_tokens`, and `embedding`.
            if not row:
                break
            food_review_line = {
                'id': row[0], 'product_id': row[1], 'user_id': row[2], 'score': row[3],
                'summary': row[4], 'text': row[5], 'combined': row[6],
                'n_tokens': row[7], 'embedding': json.loads(row[8])
            }
            batch.append(food_review_line)
            # Insert 10 rows each time.
            if (i + 1) % 10 == 0:
                client.insert(table_name, batch)
                batch = []  # Clear the buffer.

    # Insert the remaining rows, if any.
    if batch:
        client.insert(table_name, batch)

    # Check the data in the table and make sure that all data has been inserted.
    count_sql = f"select count(*) from {table_name};"
    cursor = client.perform_raw_text_sql(count_sql)
    result = cursor.fetchone()
    print(f"Total number of inserted rows: {result[0]}")
    ```
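The batching pattern in the script above (buffer rows, flush every 10, then flush the remainder) is worth isolating, since it applies to any bulk-insert client. A minimal generic sketch (the `batched_insert` helper name is illustrative):

```python
def batched_insert(rows, insert_fn, batch_size=10):
    """Group rows into batches of `batch_size` and pass each batch to `insert_fn`.

    Returns the number of batches flushed.
    """
    batch = []
    flushed = 0
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            insert_fn(batch)
            flushed += 1
            batch = []
    if batch:  # Flush the remainder.
        insert_fn(batch)
        flushed += 1
    return flushed

calls = []
batched_insert(range(25), calls.append, batch_size=10)
print([len(c) for c in calls])  # [10, 10, 5]
```

Batching keeps the number of round trips to the database low without holding the whole dataset in memory at once.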

### Query seekdb data

1. Save the following Python script as `openAIQuery.py`.

    ```python
    import os
    import sys
    import csv
    import json
    from pyobvector import *
    from sqlalchemy import func
    from openai import OpenAI

    # Obtain command-line arguments.
    if len(sys.argv) != 2:
        print("Enter a query statement.")
        sys.exit()
    queryStatement = sys.argv[1]

    # Connect to seekdb by using pyobvector. Replace any at sign (@) in the username or password with %40.
    client = ObVecClient(uri="host:port", user="username", password="****", db_name="test")

    openAIclient = OpenAI()

    # Define the function for generating text vectors.
    def generate_embeddings(text, model="text-embedding-ada-002"):
        # For more information about how to create embedding vectors, see https://community.openai.com/t/embeddings-api-documentation-needs-to-updated/475663.
        res = openAIclient.embeddings.create(input=text, model=model)
        return res.data[0].embedding


    def query_ob(query, tableName, vector_name="embedding", top_k=1):
        embedding = generate_embeddings(query)
        # Perform an approximate nearest neighbor search (ANNS).
        res = client.ann_search(
            table_name=tableName,
            vec_data=embedding,
            vec_column_name=vector_name,
            distance_func=func.l2_distance,
            topk=top_k,
            output_column_names=['combined']
        )
        for row in res:
            print(str(row[0]).replace("Title: ", "").replace("; Content: ", ": "))

    # Specify the table name.
    table_name = 'fine_food_reviews'
    query_ob(queryStatement, table_name, 'embedding', 1)
    ```

2. Enter a question to get an answer.

    ```shell
    python3 openAIQuery.py 'pet food'
    ```

    The expected result is as follows:

    ```shell
    Crack for dogs.: These thing are like crack for dogs. I am not sure of the make-up but the doggies sure love them.
    ```
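The script ranks rows with `func.l2_distance`, that is, the Euclidean (L2) distance between the query embedding and each stored embedding; smaller values mean more similar texts. The metric itself, in plain Python:

```python
import math

def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(l2_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
```

Unlike cosine distance, L2 distance is sensitive to vector magnitude as well as direction, which is why the choice of metric here must match the `distance=l2` setting used when the index was created.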
---
sidebar_label: Qwen
slug: /qwen
---

# Qwen

[Tongyi Qianwen (Qwen)](https://tongyi.aliyun.com) is a large language model (LLM) developed by Alibaba Cloud for interpreting and analyzing user inputs. You can access the Qwen API through [Alibaba Cloud Model Studio](https://bailian.console.alibabacloud.com/?spm=a2c63.p38356.0.0.948073b58ycZ3f&accounttraceid=ffba8dd7c8ef4dfd95c06513316ac8cfacdj#/home).

seekdb offers features such as vector storage, vector indexing, and embedding-based vector search. By using Qwen's API, you can convert data into vectors, store these vectors in seekdb, and then take advantage of seekdb's vector search capabilities to find relevant data.

## Prerequisites

* You have deployed seekdb.
* You have an existing MySQL database and account available in your environment, and the database account has been granted read and write privileges.
* You have installed [Python 3.9 or later](https://www.python.org/downloads/) and [pip](https://pip.pypa.io/en/stable/installation/).
* You have installed [Poetry](https://python-poetry.org/docs/), [Pyobvector](https://github.com/oceanbase/pyobvector), and the DashScope SDK. The installation commands are as follows:

```shell
pip install poetry
pip install pyobvector
pip install dashscope
```

* You have obtained a [Qwen API key](https://help.aliyun.com/zh/model-studio/developer-reference/get-api-key).

## Step 1: Obtain the connection string of seekdb

Contact the seekdb deployment engineer or administrator to obtain the connection string of seekdb, for example:

```sql
obclient -h$host -P$port -u$user_name -p$password -D$database_name
```

**Parameters:**

* `$host`: The IP address for connecting to seekdb.
* `$port`: The port number for connecting to seekdb. Default is `2881`.
* `$database_name`: The name of the database to be accessed.

:::tip
The user for connection must have the <code>CREATE</code>, <code>INSERT</code>, <code>DROP</code>, and <code>SELECT</code> privileges on the database.
:::

* `$user_name`: The database account.
* `$password`: The password of the account.

## Step 2: Configure the environment variable for the Qwen API key

For a Unix-based system (such as Ubuntu or macOS), run the following command in the terminal:

```shell
export DASHSCOPE_API_KEY="YOUR_DASHSCOPE_API_KEY"
```

For Windows, run the following command in Command Prompt:

```shell
set DASHSCOPE_API_KEY=YOUR_DASHSCOPE_API_KEY
```

Replace `YOUR_DASHSCOPE_API_KEY` with your actual Qwen API key.

## Step 3: Store the vector data in seekdb

1. Prepare the test data.

    Download the [CSV file](https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20240827/srxyhu/fine_food_reviews.csv) that already contains the vectorized data. This CSV file includes 1,000 food review entries, and the last column contains the vector values, so you do not need to calculate the vectors yourself. If you want to recalculate the embeddings for the `embedding` column (the vector column), you can use the following code to generate a new CSV file:

    ```python
    import dashscope
    import pandas as pd

    input_datapath = "./fine_food_reviews.csv"

    # Here the text_embedding_v1 model is used. You can change the model as needed.
    def generate_embeddings(text):
        rsp = dashscope.TextEmbedding.call(model=dashscope.TextEmbedding.Models.text_embedding_v1, input=text)
        embeddings = [record['embedding'] for record in rsp.output['embeddings']]
        return embeddings if isinstance(text, list) else embeddings[0]

    df = pd.read_csv(input_datapath, index_col=0)
    # It takes a few minutes to generate the CSV file by calling the Qwen Embedding API row by row.
    df["embedding"] = df.combined.apply(generate_embeddings)
    output_datapath = './fine_food_reviews_self_embeddings.csv'
    df.to_csv(output_datapath)
    ```
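`generate_embeddings` above accepts either a single string or a list of strings and mirrors the input shape in its return value: one vector for a string, a list of vectors for a list. The dispatch pattern in isolation, with a stubbed embedding call (the stub is illustrative and does not call the DashScope API):

```python
def fake_embed_batch(texts):
    """Stand-in for a batch embedding API: returns one vector per input text."""
    return [[float(len(t))] for t in texts]

def embed(text):
    """Accept a str or a list of str; return one vector or a list of vectors."""
    batch = text if isinstance(text, list) else [text]
    embeddings = fake_embed_batch(batch)
    return embeddings if isinstance(text, list) else embeddings[0]

print(embed("abc"))        # [3.0]
print(embed(["a", "ab"]))  # [[1.0], [2.0]]
```

Mirroring the input shape keeps call sites simple: `df.combined.apply(generate_embeddings)` gets a single vector per row, while a batched caller gets a list back.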

2. Run the following script to insert the test data into seekdb. The script must be located in the same directory as the test data.

    ```python
    import os
    import sys
    import csv
    import json
    from pyobvector import *
    from sqlalchemy import Column, Integer, String

    # Use pyobvector to connect to seekdb. If the username or password contains @, replace it with %40.
    client = ObVecClient(uri="host:port", user="username", password="****", db_name="test")

    # The test dataset is prepared in advance and has been vectorized. By default, it is placed in the same directory as the Python script. If you have vectorized it yourself, replace it with the corresponding file.
    file_name = "fine_food_reviews.csv"
    file_path = os.path.join("./", file_name)

    # Define the columns. The vector column is the last field.
    cols = [
        Column('id', Integer, primary_key=True, autoincrement=False),
        Column('product_id', String(256), nullable=True),
        Column('user_id', String(256), nullable=True),
        Column('score', Integer, nullable=True),
        Column('summary', String(2048), nullable=True),
        Column('text', String(8192), nullable=True),
        Column('combined', String(8192), nullable=True),
        Column('n_tokens', Integer, nullable=True),
        Column('embedding', VECTOR(1536))
    ]

    # Define the table name.
    table_name = 'fine_food_reviews'

    # If the table does not exist, create it.
    if not client.check_table_exists(table_name):
        client.create_table(table_name, columns=cols)
        # Create an index for the vector column.
        client.create_index(
            table_name=table_name,
            is_vec_index=True,
            index_name='vidx',
            column_names=['embedding'],
            vidx_params='distance=l2, type=hnsw, lib=vsag',
        )

    # Open and read the CSV file.
    with open(file_name, mode='r', newline='', encoding='utf-8') as csvfile:
        csvreader = csv.reader(csvfile)
        # Read the header row.
        headers = next(csvreader)
        print("Headers:", headers)

        batch = []  # Buffer rows and insert them into the database every 10 rows.
        for i, row in enumerate(csvreader):
            # The CSV file has 9 fields: id, product_id, user_id, score, summary, text, combined, n_tokens, embedding.
            if not row:
                break
            food_review_line = {
                'id': row[0], 'product_id': row[1], 'user_id': row[2], 'score': row[3],
                'summary': row[4], 'text': row[5], 'combined': row[6],
                'n_tokens': row[7], 'embedding': json.loads(row[8])
            }
            batch.append(food_review_line)
            # Insert data every 10 rows.
            if (i + 1) % 10 == 0:
                client.insert(table_name, batch)
                batch = []  # Clear the buffer.

    # Insert the remaining rows (if any).
    if batch:
        client.insert(table_name, batch)

    # Check the data in the table to ensure that all data has been inserted.
    count_sql = f"select count(*) from {table_name};"
    cursor = client.perform_raw_text_sql(count_sql)
    result = cursor.fetchone()
    print(f"Total number of inserted rows: {result[0]}")
    ```

## Step 4: Query seekdb data

1. Save the following Python script as `query.py`.

    ```python
    import os
    import sys
    import csv
    import json
    from pyobvector import *
    from sqlalchemy import func
    import dashscope

    # Get command-line arguments.
    if len(sys.argv) != 2:
        print("Please enter a query statement.")
        sys.exit()
    queryStatement = sys.argv[1]

    # Use pyobvector to connect to seekdb. If the username or password contains @, replace it with %40.
    client = ObVecClient(uri="host:port", user="username", password="****", db_name="test")

    # Define a function to generate text vectors.
    def generate_embeddings(text):
        rsp = dashscope.TextEmbedding.call(model=dashscope.TextEmbedding.Models.text_embedding_v1, input=text)
        embeddings = [record['embedding'] for record in rsp.output['embeddings']]
        return embeddings if isinstance(text, list) else embeddings[0]


    def query_ob(query, tableName, vector_name="embedding", top_k=1):
        embedding = generate_embeddings(query)
        # Execute an approximate nearest neighbor search.
        res = client.ann_search(
            table_name=tableName,
            vec_data=embedding,
            vec_column_name=vector_name,
            distance_func=func.l2_distance,
            topk=top_k,
            output_column_names=['combined']
        )
        for row in res:
            print(str(row[0]).replace("Title: ", "").replace("; Content: ", ": "))

    # Specify the table name.
    table_name = 'fine_food_reviews'
    query_ob(queryStatement, table_name, 'embedding', 1)
    ```

2. Enter a question and obtain the related answer.

    ```shell
    python3 query.py 'pet food'
    ```

    The expected result is as follows:

    ```shell
    This is so good!: I purchased this after my sister sent a small bag to me in a gift box. I loved it so much I wanted to find it to buy for myself and keep it around. I always look on Amazon because you can find everything here and true enough, I found this wonderful candy. It is nice to keep in your purse for when you are out and about and get a dry throat or a tickle in the back of your throat. It is also nice to have in a candy dish at home for guests to try.
    ```
---
sidebar_label: LangChain
slug: /langchain
---

# Integrate seekdb vector search with LangChain

seekdb supports vector data storage, vector indexing, and embedding-based vector search. You can store vectorized data in seekdb for further search.

LangChain is a framework for developing language model-driven applications. It enables an application to have the following capabilities:

* Context awareness: The application can connect language models to context sources, such as prompt instructions, a few examples, and content requiring responses.
* Reasoning: The application can perform reasoning based on language models. For example, it can decide how to answer a question or what actions to take based on the provided context.

This topic describes how to integrate the [vector search feature](../../200.develop/100.vector-search/100.vector-search-overview/100.vector-search-intro.md) of seekdb with the [Tongyi Qianwen (Qwen) API](https://www.alibabacloud.com/en/solutions/generative-ai/qwen?_p_lc=1) and [LangChain](https://python.langchain.com/) for Document Question Answering (DQA).

## Prerequisites

* You have deployed seekdb.
* Your environment has a database and an account with read and write privileges.
* You have installed Python 3.9 or later.
* You have installed the required dependencies:

```shell
python3 -m pip install -U langchain-oceanbase
python3 -m pip install langchain_community
python3 -m pip install dashscope
```

* You can set the `ob_vector_memory_limit_percentage` parameter to enable vector search. We recommend keeping the default value of `0` (adaptive mode). For more precise configuration, see the relevant configuration documentation.

## Step 1: Obtain the database connection information

Contact the seekdb database deployment personnel or administrator to obtain the database connection string. For example:

```sql
obclient -h$host -P$port -u$user_name -p$password -D$database_name
```

**Parameters:**

* `$host`: The IP address for connecting to the seekdb database.
* `$port`: The port for connecting to the seekdb database. The default value is `2881`, which can be customized during deployment.
* `$database_name`: The name of the database to access.

<main id="notice" type='notice'>
<h4>Notice</h4>
<p>The user connecting to the database must have the <code>CREATE</code>, <code>INSERT</code>, <code>DROP</code>, and <code>SELECT</code> privileges on the database.</p>
</main>

* `$user_name`: The database account, in the format of `username`.
* `$password`: The password for the account.

For more information about the connection string, see [Connect to OceanBase Database by using OBClient](https://en.oceanbase.com/docs/common-oceanbase-database-10000000001971649).
## Step 2: Build your AI assistant

### Set the environment variable for the Qwen API key

Create a [Qwen API key](https://www.alibabacloud.com/help/en/model-studio/get-api-key?spm=a2c63.l28256.help-menu-2400256.d_2.47db1b76nM44Ut) and [configure it in the environment variables](https://www.alibabacloud.com/help/en/model-studio/configure-api-key-through-environment-variables?spm=a2c63.p38356.help-menu-2400256.d_2_0_1.56069f6b3m576u).

```shell
export DASHSCOPE_API_KEY="YOUR_DASHSCOPE_API_KEY"
```

### Load and split the documents

Download the sample data and split it into chunks of approximately 1,000 characters using the `CharacterTextSplitter` class.
```python
from langchain_community.document_loaders import TextLoader
from langchain_community.embeddings import DashScopeEmbeddings
from langchain_text_splitters import CharacterTextSplitter
from langchain_oceanbase.vectorstores import OceanbaseVectorStore

import os
import requests

DASHSCOPE_API = os.environ.get("DASHSCOPE_API_KEY", "")
embeddings = DashScopeEmbeddings(
    model="text-embedding-v1", dashscope_api_key=DASHSCOPE_API
)

url = "https://raw.githubusercontent.com/GITHUBear/langchain/refs/heads/master/docs/docs/how_to/state_of_the_union.txt"
res = requests.get(url)
with open("state_of_the_union.txt", "w") as f:
    f.write(res.text)

loader = TextLoader("./state_of_the_union.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
```
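Under the hood, a character splitter splits the text on a separator and then packs the pieces into chunks of at most `chunk_size` characters. A minimal sketch of the packing idea (a simplified illustration, not the library's implementation; it ignores `chunk_overlap` and keeps oversized pieces whole):

```python
def pack_chunks(text: str, chunk_size: int = 1000, separator: str = "\n\n") -> list[str]:
    """Greedily pack separator-delimited pieces into chunks of at most chunk_size."""
    chunks, current = [], ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = piece  # an oversized piece becomes its own chunk
    if current:
        chunks.append(current)
    return chunks

text = "para one.\n\npara two.\n\n" + "x" * 30
chunks = pack_chunks(text, chunk_size=25)
print(chunks)
```

With `chunk_size=1000` and `chunk_overlap=0`, as in the call above, each paragraph lands in exactly one chunk and chunk boundaries fall on paragraph boundaries whenever possible.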
### Insert the data into seekdb

```python
connection_args = {
    "host": "127.0.0.1",
    "port": "2881",
    "user": "root@user_name",
    "password": "",
    "db_name": "test",
}
DEMO_TABLE_NAME = "demo_ann"
ob = OceanbaseVectorStore(
    embedding_function=embeddings,
    table_name=DEMO_TABLE_NAME,
    connection_args=connection_args,
    drop_old=True,
    normalize=True,
)
res = ob.add_documents(documents=docs)
```
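The `normalize=True` option asks the store to L2-normalize embeddings before storage (an inference from the option name). For unit vectors, squared L2 distance and cosine similarity are linked by d² = 2(1 − cos θ), so a smaller score indicates a closer match. A quick numeric check of that identity in plain Python, independent of seekdb:

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine(a, b):
    # For unit vectors the dot product equals the cosine of the angle.
    return sum(x * y for x, y in zip(a, b))

a = l2_normalize([3.0, 4.0])
b = l2_normalize([4.0, 3.0])

d = l2_distance(a, b)
cos = cosine(a, b)
# Identity for unit vectors: d^2 == 2 * (1 - cos)
print(d * d, 2 * (1 - cos))
```

Read this way, the `Score` values returned by `similarity_search_with_score` in the next step are most likely distances, where smaller means more similar.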
### Vector search

This step shows how to run the query `"What did the president say about Ketanji Brown Jackson"` against the document `state_of_the_union.txt`.

```python
query = "What did the president say about Ketanji Brown Jackson"
docs_with_score = ob.similarity_search_with_score(query, k=3)

for doc, score in docs_with_score:
    print("-" * 80)
    print("Score: ", score)
    print(doc.page_content)
    print("-" * 80)
```
Expected output:

```shell
--------------------------------------------------------------------------------
Score: 1.204783671324283
Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections.

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Score: 1.2146663629717394
It is going to transform America and put us on a path to win the economic competition of the 21st Century that we face with the rest of the world—particularly with China.

As I’ve told Xi Jinping, it is never a good bet to bet against the American people.

We’ll create good jobs for millions of Americans, modernizing roads, airports, ports, and waterways all across America.

And we’ll do it all to withstand the devastating effects of the climate crisis and promote environmental justice.

We’ll build a national network of 500,000 electric vehicle charging stations, begin to replace poisonous lead pipes—so every child—and every American—has clean water to drink at home and at school, provide affordable high-speed internet for every American—urban, suburban, rural, and tribal communities.

4,000 projects have already been announced.

And tonight, I’m announcing that this year we will start fixing over 65,000 miles of highway and 1,500 bridges in disrepair.
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Score: 1.2193955178945004
Vice President Harris and I ran for office with a new economic vision for America.

Invest in America. Educate Americans. Grow the workforce. Build the economy from the bottom up
and the middle out, not from the top down.

Because we know that when the middle class grows, the poor have a ladder up and the wealthy do very well.

America used to have the best roads, bridges, and airports on Earth.

Now our infrastructure is ranked 13th in the world.

We won’t be able to compete for the jobs of the 21st Century if we don’t fix that.

That’s why it was so important to pass the Bipartisan Infrastructure Law—the most sweeping investment to rebuild America in history.

This was a bipartisan effort, and I want to thank the members of both parties who worked to make it happen.

We’re done talking about infrastructure weeks.

We’re going to have an infrastructure decade.
--------------------------------------------------------------------------------
```
@@ -0,0 +1,125 @@
---
sidebar_label: LlamaIndex
slug: /llamaindex
---

# Integrate seekdb vector search with LlamaIndex

seekdb supports vector data storage, vector indexing, and embedding-based vector search. You can store vectorized data in seekdb for further search.

LlamaIndex is a framework for building context-augmented generative AI applications with large language models (LLMs), including agents and workflows. It provides rich capabilities such as data connectors, data indexes, agents, observability/evaluation integrations, and workflows.

This topic demonstrates how to integrate the vector search feature of seekdb with the Tongyi Qianwen (Qwen) API and LlamaIndex for Document Question Answering (DQA).

## Prerequisites

* You have deployed seekdb.
* Your environment has an available database and account, and the account has been granted read and write privileges.
* You can set the `ob_vector_memory_limit_percentage` parameter to enable vector search. We recommend keeping the default value of `0` (adaptive mode). For more fine-grained settings, see the configuration documentation.
* You have installed Python 3.9 or later.
* You have installed the required dependencies:

```shell
python3 -m pip install llama-index-vector-stores-oceanbase llama-index
python3 -m pip install llama-index-embeddings-dashscope
python3 -m pip install llama-index-llms-dashscope
```

* You have obtained the Qwen API key.
## Step 1: Obtain the database connection information

Contact your seekdb deployment engineer or administrator to obtain the database connection string. For example:

```sql
obclient -h$host -P$port -u$user_name -p$password -D$database_name
```

**Parameters:**

* `$host`: The IP address for connecting to seekdb.
* `$port`: The port for connecting to seekdb. The default value is `2881`, which can be customized during deployment.
* `$database_name`: The name of the database to access.

<main id="notice" type='notice'>
<h4>Notice</h4>
<p>The user connecting to the database must have the <code>CREATE</code>, <code>INSERT</code>, <code>DROP</code>, and <code>SELECT</code> privileges on the database.</p>
</main>

* `$user_name`: The username for connecting to the database.
* `$password`: The password for the account.

For more information about the connection string, see [Connect to OceanBase Database by using OBClient](https://en.oceanbase.com/docs/common-oceanbase-database-10000000001971649).
## Step 2: Build your AI assistant

### Set the environment variable for the Qwen API key

Create a [Qwen API key](https://www.alibabacloud.com/help/en/model-studio/get-api-key?spm=a2c63.l28256.help-menu-2400256.d_2.47db1b76nM44Ut) and [configure it in the environment variables](https://www.alibabacloud.com/help/en/model-studio/configure-api-key-through-environment-variables?spm=a2c63.p38356.help-menu-2400256.d_2_0_1.56069f6b3m576u).

```shell
export DASHSCOPE_API_KEY="YOUR_DASHSCOPE_API_KEY"
```

### Download the sample data

```shell
mkdir -p '/root/llamaindex/paul_graham/'
wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O '/root/llamaindex/paul_graham/paul_graham_essay.txt'
```
### Load the data text

```python
import os

from pyobvector import ObVecClient
from llama_index.core import Settings
from llama_index.embeddings.dashscope import DashScopeEmbedding
from llama_index.core import (
    SimpleDirectoryReader,
    load_index_from_storage,
    VectorStoreIndex,
    StorageContext,
)
from llama_index.vector_stores.oceanbase import OceanBaseVectorStore
from llama_index.llms.dashscope import DashScope, DashScopeGenerationModels

# Set up the seekdb client.
client = ObVecClient(uri="127.0.0.1:2881", user="root@test", password="", db_name="test")

# Global settings: use DashScope embeddings.
Settings.embed_model = DashScopeEmbedding()

# Configure the LLM.
dashscope_llm = DashScope(
    model_name=DashScopeGenerationModels.QWEN_MAX,
    api_key=os.environ.get("DASHSCOPE_API_KEY", ""),
)

# Load the documents and create the vector store.
documents = SimpleDirectoryReader("/root/llamaindex/paul_graham/").load_data()
oceanbase = OceanBaseVectorStore(
    client=client,
    dim=1536,
    drop_old=True,
    normalize=True,
)

storage_context = StorageContext.from_defaults(vector_store=oceanbase)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)
```
### Vector search

This step shows how to run the query `"What did the author do growing up?"` against the document `paul_graham_essay.txt`.

```python
# Set logging to DEBUG for more detailed outputs.
query_engine = index.as_query_engine(llm=dashscope_llm)
res = query_engine.query("What did the author do growing up?")
res.response
```

Expected result:

```python
'Growing up, the author worked on writing and programming outside of school. In terms of writing, he wrote short stories, which he now considers to be awful, as they had very little plot and focused mainly on characters with strong feelings. For programming, he started in 9th grade by trying to write programs on an IBM 1401 at his school, using an early version of Fortran. Later, after getting a TRS-80 microcomputer, he began to write more practical programs, including simple games, a program to predict the flight height of model rockets, and a word processor that his father used for writing.'
```
@@ -0,0 +1,338 @@
---
sidebar_label: Spring AI
slug: /springai
---

# Integrate seekdb vector search with Spring AI Alibaba

seekdb supports vector data storage, vector indexing, and embedding-based vector search. You can store vectorized data in seekdb for further search.

Spring AI Alibaba is an open-source project built on Spring AI that provides best practices for developing AI-powered Java applications. It simplifies the AI application development process, adapts to cloud-native infrastructure, and helps developers quickly build AI applications.

This topic describes how to integrate the vector search capability of seekdb with Spring AI Alibaba to implement data import and similarity search. By configuring the vector storage and search services, developers can easily build AI applications based on seekdb that support advanced features such as text similarity search and content recommendation.

## Prerequisites

* You have deployed seekdb.

* Download [JDK 17+](https://www.oracle.com/java/technologies/downloads/#java17). Make sure that you have installed Java 17 and configured the environment variables.

* Download [Maven](https://dlcdn.apache.org/maven/). Make sure that you have installed Maven 3.6+ for building the project and managing dependencies.

* Download [IntelliJ IDEA](https://www.jetbrains.com/idea/download/) or [Eclipse](https://www.eclipse.org/downloads/). Choose the version that suits your operating system and install it.

## Step 1: Obtain the database connection information

Contact your seekdb deployment engineer or administrator to obtain the database connection string. For example:
```sql
obclient -h$host -P$port -u$user_name -p$password -D$database_name
```

**Parameters:**

* `$host`: The IP address for connecting to seekdb.
* `$port`: The port for connecting to seekdb. The default value is `2881`.
* `$database_name`: The name of the database to access.

<main id="notice" type='notice'>
<h4>Notice</h4>
<p>The user connecting to the database must have the <code>CREATE</code>, <code>INSERT</code>, <code>DROP</code>, and <code>SELECT</code> privileges on the database.</p>
</main>

* `$user_name`: The username for connecting to the database.
* `$password`: The password for the account.
## Step 2: Set up the Maven project

Maven is the project management and build tool used in this topic. This step describes how to create a Maven project and add the project dependencies by configuring the `pom.xml` file.

### Create a project

1. Run the following Maven command to create a project:

    ```shell
    mvn archetype:generate -DgroupId=com.alibaba.cloud.ai.example -DartifactId=vector-oceanbase-example -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false
    ```

2. Go to the project directory:

    ```shell
    cd vector-oceanbase-example
    ```

### Configure the `pom.xml` file

The `pom.xml` file is the core configuration file of a Maven project, used to manage project dependencies, plugins, and other configuration. To build the project, modify the `pom.xml` file to add Spring AI Alibaba, seekdb vector storage, and other necessary dependencies.

Open the `pom.xml` file and replace the existing content with the following:
```xml
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>com.alibaba.cloud.ai.example</groupId>
        <artifactId>spring-ai-alibaba-vector-databases-example</artifactId>
        <version>1.0.0</version>
    </parent>

    <artifactId>vector-oceanbase-example</artifactId>

    <properties>
        <maven.compiler.source>17</maven.compiler.source>
        <maven.compiler.target>17</maven.compiler.target>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

    <dependencies>
        <!-- Alibaba Cloud AI starter -->
        <dependency>
            <groupId>com.alibaba.cloud.ai</groupId>
            <artifactId>spring-ai-alibaba-starter</artifactId>
        </dependency>

        <!-- Spring Boot Web support -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>

        <!-- Spring AI auto-configuration -->
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-spring-boot-autoconfigure</artifactId>
        </dependency>

        <!-- Spring JDBC support -->
        <dependency>
            <groupId>org.springframework</groupId>
            <artifactId>spring-jdbc</artifactId>
        </dependency>

        <!-- Transformers model support -->
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-transformers</artifactId>
        </dependency>

        <!-- OceanBase Vector Database starter -->
        <dependency>
            <groupId>com.alibaba.cloud.ai</groupId>
            <artifactId>spring-ai-alibaba-starter-oceanbase-store</artifactId>
            <version>1.0.0-M6.2-SNAPSHOT</version>
        </dependency>

        <!-- OceanBase JDBC driver -->
        <dependency>
            <groupId>com.oceanbase</groupId>
            <artifactId>oceanbase-client</artifactId>
            <version>2.4.14</version>
        </dependency>
    </dependencies>

    <!-- SNAPSHOT repository configuration -->
    <repositories>
        <repository>
            <id>sonatype-snapshots</id>
            <url>https://oss.sonatype.org/content/repositories/snapshots/</url>
            <releases>
                <enabled>false</enabled>
            </releases>
            <snapshots>
                <enabled>true</enabled>
            </snapshots>
        </repository>
    </repositories>
</project>
```
## Step 3: Configure the connection information of seekdb

This step configures the `application.yml` file to add the connection information of seekdb.

Create the `application.yml` file in the `src/main/resources` directory of the project and add the following content:

```yaml
server:
  port: 8080

spring:
  application:
    name: oceanbase-example
  ai:
    dashscope:
      api-key: ${DASHSCOPE_API_KEY} # Replace with your DashScope API key
    vectorstore:
      oceanbase:
        enabled: true
        url: jdbc:oceanbase://xxx:xxx/xxx # URL for connecting to seekdb
        username: xxx # Username of seekdb
        password: xxx # Password of seekdb
        tableName: vector_table # Name of the vector table (created automatically)
        defaultTopK: 2 # Default number of similar results to return
        defaultSimilarityThreshold: 0.8 # Similarity threshold (0-1; values closer to 1 require higher similarity)
```
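The two defaults above combine as rank-then-filter: candidates are ordered by similarity, the top K are taken, and anything below the threshold is dropped. A schematic sketch of that semantics (illustrative only; the function and the similarity scale are assumptions for explanation, not Spring AI's implementation):

```python
def search(candidates, top_k=2, similarity_threshold=0.8):
    """candidates: list of (doc_id, similarity) pairs, similarity in [0, 1]."""
    # Rank by similarity, most similar first.
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    # Keep the top K, then drop anything below the threshold.
    return [(doc, sim) for doc, sim in ranked[:top_k] if sim >= similarity_threshold]

candidates = [("a", 0.95), ("b", 0.85), ("c", 0.90), ("d", 0.40)]
print(search(candidates))  # the two most similar documents above the 0.8 threshold
```

With `defaultTopK: 2` and `defaultSimilarityThreshold: 0.8`, a query therefore returns at most two results, and fewer when not enough candidates clear the threshold.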
## Step 4: Create the main application class and controller

Create the startup class and controller class of the Spring Boot application to implement the data import and similarity search features.

### Create an application startup class

Create a file named `OceanBaseApplication.java` in the `src/main/java/com/alibaba/cloud/ai/example/vector` directory, and add the following code to the file:

```java
package com.alibaba.cloud.ai.example.vector; // The package name must be consistent with the directory structure.

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication // Enable Spring Boot auto-configuration.
public class OceanBaseApplication {
    public static void main(String[] args) {
        SpringApplication.run(OceanBaseApplication.class, args); // Start the Spring Boot application.
    }
}
```

The sample code creates the core startup class of the project, which is used to start the Spring Boot application.

### Create a vector storage controller

Create a file named `OceanBaseController.java` in the `src/main/java/com/alibaba/cloud/ai/example/vector/controller` directory and add the following code:
```java
package com.alibaba.cloud.ai.example.vector.controller; // The package name must be consistent with the directory structure.

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

import java.util.HashMap;
import java.util.List;
import java.util.Map;

import com.alibaba.cloud.ai.vectorstore.oceanbase.OceanBaseVectorStore;

@RestController // Mark the class as a REST controller.
@RequestMapping("/oceanbase") // Set the base path to /oceanbase.
public class OceanBaseController {

    private static final Logger logger = LoggerFactory.getLogger(OceanBaseController.class); // The logger.

    @Autowired // Automatically inject the seekdb vector store service.
    private OceanBaseVectorStore oceanBaseVectorStore;

    // The data import endpoint.
    @GetMapping("/import")
    public void importData() {
        logger.info("Start importing data");

        // Create sample metadata.
        HashMap<String, Object> map = new HashMap<>();
        map.put("id", "12345");
        map.put("year", "2025");
        map.put("name", "yingzi");

        // Create a list that contains three documents.
        List<Document> documents = List.of(
                new Document("The World is Big and Salvation Lurks Around the Corner"),
                new Document("You walk forward facing the past and you turn back toward the future.", Map.of("year", 2024)),
                new Document("Spring AI rocks!! Spring AI rocks!! Spring AI rocks!! Spring AI rocks!! Spring AI rocks!!", map)
        );

        // Add the documents to the vector store.
        oceanBaseVectorStore.add(documents);
    }

    // The similar-document search endpoint.
    @GetMapping("/search")
    public List<Document> search() {
        logger.info("Start searching data");

        // Perform a similarity search for documents related to "Spring" and return the two most similar results.
        return oceanBaseVectorStore.similaritySearch(SearchRequest.builder()
                .query("Spring")
                .topK(2)
                .build());
    }
}
```
## Step 5: Start and test the Maven project

### Start the project using an IDE

The following example shows how to start the project using IntelliJ IDEA:

1. Open the project by clicking **File** > **Open** and selecting `pom.xml`.
2. Select **Open as a project**.
3. Find the main class `OceanBaseApplication.java`.
4. Right-click it and select **Run 'OceanBaseApplication.main()'**.

### Test the project

1. Import the test data by visiting the following URL:

    ```shell
    http://localhost:8080/oceanbase/import
    ```

2. Perform a vector search by visiting the following URL:

    ```shell
    http://localhost:8080/oceanbase/search
    ```

The expected result is as follows:
```json
[
    {
        "id": "03fe9aad-13cc-4d25-807b-ca1bc314f571",
        "text": "Spring AI rocks!! Spring AI rocks!! Spring AI rocks!! Spring AI rocks!! Spring AI rocks!!",
        "metadata": {
            "name": "yingzi",
            "id": "12345",
            "year": "2025",
            "distance": "7.274442499114312"
        }
    },
    {
        "id": "75864954-0a23-4fa1-8e18-b78fd870d474",
        "text": "Spring AI rocks!! Spring AI rocks!! Spring AI rocks!! Spring AI rocks!! Spring AI rocks!!",
        "metadata": {
            "name": "yingzi",
            "id": "12345",
            "year": "2025",
            "distance": "7.274442499114312"
        }
    }
]
```
## FAQ

### seekdb connection failure

* Cause: The URL, username, or password is incorrect.
* Solution: Check the seekdb configuration in `application.yml` and make sure the database service is running.

### Dependency conflict

* Cause: Conflicts between multiple Spring Boot versions.
* Solution: Run `mvn dependency:tree` to view the dependency tree and exclude the conflicting versions.

### SNAPSHOT dependency cannot be downloaded

* Cause: The SNAPSHOT repository is not configured.
* Solution: Make sure that the `sonatype-snapshots` repository is added in `pom.xml`.
@@ -0,0 +1,72 @@
---
sidebar_label: Dify
slug: /dify
---

# Integrate seekdb vector search with Dify

seekdb supports vector data storage, vector indexing, and embedding-based vector search. You can store vectorized data in seekdb for further search.

Dify is an open-source large language model (LLM) application development platform. Combining Backend-as-a-Service (BaaS) and LLMOps concepts, it enables developers to quickly build production-ready generative AI applications. Even non-technical users can participate in defining AI applications and managing data operations.

Dify includes the essential technologies for building LLM applications: support for hundreds of models, an intuitive prompt orchestration interface, a high-quality RAG engine, a robust agent framework, and flexible workflow orchestration, along with user-friendly interfaces and APIs. This eliminates redundant development effort, enabling developers to focus on innovation and business needs.

This topic describes how to integrate the vector search capability of seekdb with Dify.

## Prerequisites

* Before deploying Dify, ensure that your machine meets the following minimum system requirements:

    * CPU: 2 cores
    * Memory: 4 GB

* This integration tutorial runs on the Docker container platform. Ensure that you have set up [Docker](https://docs.docker.com/get-started/get-docker/).

* You have deployed seekdb.

## Step 1: Obtain the database connection information

Contact your seekdb deployment engineer or administrator to obtain the database connection string. For example:
```sql
obclient -h$host -P$port -u$user_name -p$password -D$database_name
```

**Parameters:**

* `$host`: The IP address for connecting to seekdb.
* `$port`: The port for connecting to seekdb. The default value is `2881`.
* `$database_name`: The name of the database to access.

<main id="notice" type='notice'>
<h4>Notice</h4>
<p>The user connecting to the database must have the <code>CREATE</code>, <code>INSERT</code>, <code>DROP</code>, and <code>SELECT</code> privileges on the database.</p>
</main>

* `$user_name`: The username for connecting to the database.
* `$password`: The password for the account.
## Step 2: Deploy Dify

### Method 1: Deploy with Docker Compose

To deploy Dify, refer to [Deploy with Docker Compose](https://docs.dify.ai/getting-started/install-self-hosted/docker-compose) with these modifications:

* Change the value of the `VECTOR_STORE` variable to `oceanbase` in the `.env` file.
* Start the services with `docker compose --profile oceanbase up -d`.

### Method 2: Use Dify on MySQL

Alternatively, you can refer to [Dify on MySQL](https://github.com/oceanbase/dify-on-mysql) to quickly start the Dify service.

To start the service, run the following commands:

```shell
cd docker
bash setup-mysql-env.sh
docker compose up -d
```

## Step 3: Use Dify

For information about connecting LLMs in Dify, refer to [Model Configuration](https://docs.dify.ai/guides/model-configuration).
@@ -0,0 +1,265 @@
---
sidebar_label: n8n
slug: /n8n
---

# Integrate seekdb vector search with n8n

n8n is a workflow automation platform with native AI capabilities, providing technical teams with the flexibility of code and the speed of no-code. With over 400 integrations, native AI features, and a fair-code license, n8n lets you build robust automations while maintaining full control over your data and deployments.

This topic demonstrates how to build a Chat to seekdb workflow template using n8n.

## Prerequisites

* You have deployed seekdb.

* This integration tutorial runs in a Docker container. Make sure that you have [set up Docker](https://docs.docker.com/get-started/get-docker/).

## Step 1: Obtain the database connection information

Contact your seekdb deployment engineer or administrator to obtain the database connection string. For example:

```sql
obclient -h$host -P$port -u$user_name -p$password -D$database_name
```

**Parameters:**

* `$host`: The IP address for connecting to seekdb.
* `$port`: The port for connecting to seekdb. The default value is `2881`.
* `$database_name`: The name of the database to access.

<main id="notice" type='notice'>
<h4>Notice</h4>
<p>The user connecting to the database must have the <code>CREATE</code>, <code>INSERT</code>, <code>DROP</code>, and <code>SELECT</code> privileges on the database.</p>
</main>

* `$user_name`: The username for connecting to the database.
* `$password`: The password for the account.

## Step 2: Create a test table and insert data

Before you build the workflow, create a sample table in seekdb to store book information and insert some sample data.
```sql
CREATE TABLE books (
    id VARCHAR(255) PRIMARY KEY,
    isbn13 VARCHAR(255),
    author TEXT,
    title VARCHAR(255),
    publisher VARCHAR(255),
    category TEXT,
    pages INT,
    price DECIMAL(10,2),
    format VARCHAR(50),
    rating DECIMAL(3,1),
    release_year YEAR
);

INSERT INTO books (
    id, isbn13, author, title, publisher, category, pages, price, format, rating, release_year
) VALUES (
    'database-internals',
    '978-1492040347',
    '"Alex Petrov"',
    'Database Internals: A deep-dive into how distributed data systems work',
    'O\'Reilly',
    '["databases","information systems"]',
    350,
    47.28,
    'paperback',
    4.5,
    2019
);

INSERT INTO books (
    id, isbn13, author, title, publisher, category, pages, price, format, rating, release_year
) VALUES (
    'designing-data-intensive-applications',
    '978-1449373320',
    '"Martin Kleppmann"',
    'Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems',
    'O\'Reilly',
    '["databases"]',
    590,
    31.06,
    'paperback',
    4.4,
    2017
);

INSERT INTO books (
    id, isbn13, author, title, publisher, category, pages, price, format, rating, release_year
) VALUES (
    'kafka-the-definitive-guide',
    '978-1491936160',
    '["Neha Narkhede", "Gwen Shapira", "Todd Palino"]',
    'Kafka: The Definitive Guide: Real-time data and stream processing at scale',
    'O\'Reilly',
    '["databases"]',
    297,
    37.31,
    'paperback',
    3.9,
    2017
);

INSERT INTO books (
    id, isbn13, author, title, publisher, category, pages, price, format, rating, release_year
) VALUES (
    'effective-java',
    '978-1491936160',
    '"Joshua Bloch"',
    'Effective Java',
    'Addison-Wesley',
    '["programming languages", "java"]',
    412,
    27.91,
    'paperback',
    4.2,
    2017
);

INSERT INTO books (
    id, isbn13, author, title, publisher, category, pages, price, format, rating, release_year
) VALUES (
    'daemon',
    '978-1847249616',
    '"Daniel Suarez"',
    'Daemon',
    'Quercus',
    '["dystopia","novel"]',
    448,
    12.03,
    'paperback',
    4.0,
    2011
);

INSERT INTO books (
    id, isbn13, author, title, publisher, category, pages, price, format, rating, release_year
) VALUES (
    'cryptonomicon',
    '978-1847249616',
    '"Neal Stephenson"',
    'Cryptonomicon',
    'Avon',
    '["thriller", "novel"]',
    1152,
    6.99,
    'paperback',
    4.0,
    2002
);

INSERT INTO books (
    id, isbn13, author, title, publisher, category, pages, price, format, rating, release_year
) VALUES (
    'garbage-collection-handbook',
    '978-1420082791',
    '["Richard Jones", "Antony Hosking", "Eliot Moss"]',
    'The Garbage Collection Handbook: The Art of Automatic Memory Management',
    'Taylor & Francis',
    '["programming algorithms"]',
    511,
    87.85,
    'paperback',
    5.0,
    2011
);

INSERT INTO books (
    id, isbn13, author, title, publisher, category, pages, price, format, rating, release_year
) VALUES (
    'radical-candor',
    '978-1250258403',
    '"Kim Scott"',
    'Radical Candor: Be a Kick-Ass Boss Without Losing Your Humanity',
    'Macmillan',
    '["human resources","management", "new work"]',
    404,
    7.29,
    'paperback',
    4.0,
    2018
);
```
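Note that the `author` and `category` columns store JSON-encoded text in `TEXT` columns rather than native arrays. A short Python sketch of how an application could decode such values after fetching a row (the row literals are copied from the sample data above; the decoding step itself is illustrative, not part of the workflow):

```python
import json

# One row as it would come back from the books table; author and
# category hold JSON-encoded values stored as plain text.
row = {
    "id": "kafka-the-definitive-guide",
    "author": '["Neha Narkhede", "Gwen Shapira", "Todd Palino"]',
    "category": '["databases"]',
}

authors = json.loads(row["author"])        # -> list of author names
categories = json.loads(row["category"])   # -> list of category tags

assert authors[0] == "Neha Narkhede"
assert "databases" in categories
```

Storing these fields as JSON text keeps the schema simple at the cost of doing the decoding in the application layer.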
## Step 3: Deploy the tools

### Private deployment of n8n

n8n is a workflow automation platform based on Node.js. It provides extensive integration capabilities and flexible extensibility. By privately deploying n8n, you can better control the runtime environment of your workflows and ensure the security and privacy of your data. This section describes how to deploy n8n in a Docker environment.

```shell
sudo docker run -d --name n8n -p 5678:5678 -e N8N_SECURE_COOKIE=false n8nio/n8n
```
### Deploy the Qwen3 model using Ollama

Ollama is an open-source AI model server that supports the deployment and management of multiple AI models. With Ollama, you can easily deploy the Qwen3 model locally to enable AI Agent functionality. This section describes how to deploy Ollama in a Docker environment, and then use Ollama to deploy the Qwen3 model.

```shell
# Deploy Ollama in a Docker environment
sudo docker run -d -p 11434:11434 --name ollama ollama/ollama
# Deploy the Qwen3 model
sudo docker exec -it ollama sh -c 'ollama run qwen3:latest'
```
## Step 4: Build an AI Agent workflow

n8n provides a variety of nodes to build an AI Agent workflow. This section shows you how to build a Chat to seekdb workflow template. The workflow consists of five nodes, and the steps are as follows:

1. Add a trigger

    Add an HTTP trigger node to receive HTTP requests.

    

2. Add an AI Agent node

    Add an AI Agent node to process AI Agent requests.

    

3. Add an Ollama Chat Model node

    Select a free Ollama chat model, such as Qwen3, and configure the Ollama account.

    

    

4. Add a Simple Memory node

    The Simple Memory node provides short-term memory and remembers the previous five interactions in the chat.

    

5. Add a Tool node

    The Tool node is used to perform database operations in seekdb. Add a MySQL tool under **AI Agent** > **Tool**.

    

    Configure the MySQL tool as follows:

    

    Click the edit icon shown in the preceding figure to configure the MySQL connection information.

    

    After the configuration is complete, close the window. Click **Test step** in the upper-right corner of the configuration panel to test the database connection with a simple SQL statement, or click **Back to canvas** to return to the main interface.

6. Save the workflow

    After all five nodes are configured, click **Save** to complete the workflow construction. You can then test the workflow.

    <!--  -->
## Workflow demo

The completed workflow is displayed as follows:


---
sidebar_label: Cursor
slug: /cursor
---

# Integrate OceanBase MCP Server with Cursor

[MCP (Model Context Protocol)](https://modelcontextprotocol.io/introduction) is an open-source protocol introduced by Anthropic in November 2024. It allows large language models to interact with external tools or data sources. With MCP, you do not need to manually copy and execute the output of large language models. Instead, the large language model can directly instruct tools to perform specific actions.

[MCP Server](https://github.com/oceanbase/mcp-oceanbase/tree/main/src/oceanbase_mcp_server) enables large language models to interact with OceanBase Database through the MCP protocol and execute SQL statements. With the right client, you can quickly build project prototypes. The server has been open-sourced on GitHub.

[Cursor](https://cursordocs.com) is an AI-powered code editor that supports multiple operating systems, including Windows, macOS, and Linux.

This topic demonstrates how to integrate Cursor with OceanBase MCP Server to quickly build a backend application.
## Prerequisites

* You have deployed seekdb.

* You have installed [Python 3.11 or later](https://www.python.org/downloads/) and the corresponding [pip](https://pip.pypa.io/en/stable/installation/). If your machine has an older version of Python, you can use Miniconda to create a new environment with Python 3.11 or above. For more information, see [Miniconda installation guide](https://docs.anaconda.com/miniconda/install/).

* You have installed [Git](https://git-scm.com/downloads) based on your operating system.

* You have installed the Python package manager uv. After the installation, run the `uv --version` command to verify that the installation succeeded:

  ```shell
  pip install uv
  uv --version
  ```

* You have downloaded [Cursor](https://cursor.com/cn/downloads) and installed the version that matches your operating system. When you use Cursor for the first time, you need to register a new account or log in with an existing one. After logging in, you can create a new project or open an existing project.
## Step 1: Obtain the database connection information

Contact your seekdb deployment engineer or administrator to obtain the database connection string. For example:

```sql
obclient -h$host -P$port -u$user_name -p$password -D$database_name
```

**Parameters:**

* `$host`: The IP address for connecting to seekdb.
* `$port`: The port number for connecting to seekdb. Default is `2881`.
* `$database_name`: The name of the database to access.

:::tip
The connected user must have <code>CREATE</code>, <code>INSERT</code>, <code>DROP</code>, and <code>SELECT</code> privileges on the database.
:::

* `$user_name`: The username for connecting to the database.
* `$password`: The password for the account.
## Step 2: Configure the OceanBase MCP Server

### Clone the OceanBase MCP Server repository

Run the following command to download the source code to your local device:

```shell
git clone https://github.com/oceanbase/mcp-oceanbase.git
```

Go to the source code directory:

```shell
cd mcp-oceanbase
```

### Install dependencies

Run the following command in the `mcp-oceanbase` directory to create a virtual environment and install dependencies:

```shell
uv venv
source .venv/bin/activate
uv pip install .
```

### Create a working directory for the Cursor client

Manually create a working directory (such as `cursor`) for the Cursor client and open it with Cursor. The files generated by Cursor will be stored in this directory.
### Add and configure the OceanBase MCP Server

1. This example uses Cursor V2.0.64. Click the **Open Settings** icon in the upper-right corner, select **Tools & MCP**, and click **New MCP Server**.

    

2. Edit the `mcp.json` configuration file.

    

    Replace `path/to/your/mcp-oceanbase/src/oceanbase_mcp_server` with the absolute path of the `oceanbase_mcp_server` folder. Replace `OB_HOST`, `OB_PORT`, `OB_USER`, `OB_PASSWORD`, and `OB_DATABASE` with the corresponding information of your database:

    ```json
    {
      "mcpServers": {
        "oceanbase": {
          "command": "uv",
          "args": [
            "--directory",
            "/path/to/your/mcp-oceanbase/src/oceanbase_mcp_server",
            "run",
            "oceanbase_mcp_server"
          ],
          "env": {
            "OB_HOST": "***",
            "OB_PORT": "***",
            "OB_USER": "***",
            "OB_PASSWORD": "***",
            "OB_DATABASE": "***"
          }
        }
      }
    }
    ```

3. If the configuration is successful, the MCP Server is displayed in ready status.

    
### Test the MCP Server

1. In the chat dialog box, enter the prompt: `How many tables are there in the dataanalysis_english database?`. The Cursor client displays the SQL statement to be executed. Confirm that it is correct and click the `Run` button to execute the query. The Cursor client then displays all the table names in the `dataanalysis_english` database, indicating that you have successfully connected to seekdb.

    
### Use FastAPI to quickly create a RESTful API project

You can use FastAPI, a Python web framework for building RESTful APIs, to quickly create a RESTful API project.

1. Create a customer table

    In the dialog box, enter the prompt: `Create a customer table with the ID as the primary key and name, age, telephone, and location as fields`, confirm the SQL statement, and click `Run` to execute the query.

    

2. Insert test data

    In the dialog box, enter the prompt: `Insert 10 rows of data into the customer table`, confirm the SQL statement, and click `Run` to execute the query. After the data is inserted, a message is displayed: `Inserted 10 rows into the customer table. The data includes...`.

    

3. Create a FastAPI project

    In the dialog box, enter the prompt: `Create a FastAPI project and generate a RESTful API based on the customer table`, confirm the suggested changes, and click `Run` to apply them.

    

    This step automatically generates the necessary files. We recommend selecting `Accept All` the first time, because the content of AI-generated files may vary; you can adjust the files as needed later.

4. Create a virtual environment and install dependencies

    Run the following command to create a virtual environment with the uv package manager and install the dependencies in the current directory:

    ```shell
    uv venv
    source .venv/bin/activate
    uv pip install -r requirements.txt
    ```

5. Start the FastAPI project

    Run the following command to start the FastAPI project:

    ```shell
    uvicorn main:app --reload
    ```

6. View the data in the table

    Run the following command in the command line, or use another request tool, to view the data in the table:

    ```shell
    curl http://127.0.0.1:8000/customers
    ```

    The return result is as follows:

    ```json
    [{"ID":1,"name":"John Smith","age":28,"telephone":"555-0101","location":"New York, NY"},{"ID":2,"name":"Emily Johnson","age":35,"telephone":"555-0102","location":"Los Angeles, CA"},{"ID":3,"name":"Michael Brown","age":42,"telephone":"555-0103","location":"Chicago, IL"},{"ID":4,"name":"Sarah Davis","age":29,"telephone":"555-0104","location":"Houston, TX"},{"ID":5,"name":"David Wilson","age":51,"telephone":"555-0105","location":"Phoenix, AZ"},{"ID":6,"name":"Jessica Martinez","age":33,"telephone":"555-0106","location":"Philadelphia, PA"},{"ID":7,"name":"Robert Taylor","age":45,"telephone":"555-0107","location":"San Antonio, TX"},{"ID":8,"name":"Amanda Anderson","age":27,"telephone":"555-0108","location":"San Diego, CA"},{"ID":9,"name":"James Thomas","age":38,"telephone":"555-0109","location":"Dallas, TX"},{"ID":10,"name":"Lisa Jackson","age":31,"telephone":"555-0110","location":"San Jose, CA"}]
    ```
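To consume such a response programmatically, the JSON array can be decoded with the standard library. A short sketch (the two-record string below is an abridged copy of the sample output; in practice the string would come from an HTTP client):

```python
import json

# Abridged copy of the /customers response body shown above.
raw = (
    '[{"ID":1,"name":"John Smith","age":28,'
    '"telephone":"555-0101","location":"New York, NY"},'
    '{"ID":2,"name":"Emily Johnson","age":35,'
    '"telephone":"555-0102","location":"Los Angeles, CA"}]'
)

customers = json.loads(raw)            # list of dicts, one per row
names = [c["name"] for c in customers]
print(names)  # ['John Smith', 'Emily Johnson']
```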
You can see that the RESTful APIs for creating, deleting, updating, and querying data have been successfully generated:

```python
from fastapi import FastAPI, HTTPException, Depends
from pydantic import BaseModel
from typing import List
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker, Session

# seekdb connection configuration (modify it as needed)
DATABASE_URL = "mysql://***:***@***:***/***"

engine = create_engine(DATABASE_URL, echo=True)
SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)
Base = declarative_base()

class Customer(Base):
    __tablename__ = "customer"
    id = Column(Integer, primary_key=True, index=True)
    name = Column(String(100))
    age = Column(Integer)
    telephone = Column(String(20))
    location = Column(String(100))

class CustomerCreate(BaseModel):
    id: int
    name: str
    age: int
    telephone: str
    location: str

class CustomerUpdate(BaseModel):
    name: str = None
    age: int = None
    telephone: str = None
    location: str = None

class CustomerOut(BaseModel):
    id: int
    name: str
    age: int
    telephone: str
    location: str
    class Config:
        orm_mode = True

def get_db():
    db = SessionLocal()
    try:
        yield db
    finally:
        db.close()

app = FastAPI()

@app.post("/customers/", response_model=CustomerOut)
def create_customer(customer: CustomerCreate, db: Session = Depends(get_db)):
    db_customer = Customer(**customer.dict())
    db.add(db_customer)
    try:
        db.commit()
        db.refresh(db_customer)
    except Exception as e:
        db.rollback()
        raise HTTPException(status_code=400, detail=str(e))
    return db_customer

@app.get("/customers/", response_model=List[CustomerOut])
def read_customers(skip: int = 0, limit: int = 100, db: Session = Depends(get_db)):
    return db.query(Customer).offset(skip).limit(limit).all()

@app.get("/customers/{customer_id}", response_model=CustomerOut)
def read_customer(customer_id: int, db: Session = Depends(get_db)):
    customer = db.query(Customer).filter(Customer.id == customer_id).first()
    if customer is None:
        raise HTTPException(status_code=404, detail="Customer not found")
    return customer

@app.put("/customers/{customer_id}", response_model=CustomerOut)
def update_customer(customer_id: int, customer: CustomerUpdate, db: Session = Depends(get_db)):
    db_customer = db.query(Customer).filter(Customer.id == customer_id).first()
    if db_customer is None:
        raise HTTPException(status_code=404, detail="Customer not found")
    for var, value in vars(customer).items():
        if value is not None:
            setattr(db_customer, var, value)
    db.commit()
    db.refresh(db_customer)
    return db_customer

@app.delete("/customers/{customer_id}")
def delete_customer(customer_id: int, db: Session = Depends(get_db)):
    db_customer = db.query(Customer).filter(Customer.id == customer_id).first()
    if db_customer is None:
        raise HTTPException(status_code=404, detail="Customer not found")
    db.delete(db_customer)
    db.commit()
    return {"ok": True}
```
---
sidebar_label: Cline
slug: /cline
---

# Integrate OceanBase MCP Server with Cline

seekdb supports vector data storage, vector indexing, and embedding-based vector search. You can store vectorized data in seekdb for further search.

[Cline](https://cline.bot/) is an open-source AI coding assistant that supports the MCP protocol.

This topic uses Cline to demonstrate how to quickly build a backend application using OceanBase MCP Server.
## Prerequisites

* You have deployed seekdb.

* You have an existing MySQL database and account available in your environment, and the database account has been granted read and write privileges.

* You have installed [Python 3.11 or later](https://www.python.org/downloads/) and the corresponding [pip](https://pip.pypa.io/en/stable/installation/). If your machine has an older version of Python, you can use Miniconda to create a new environment with Python 3.11 or above. For more information, see [Miniconda installation guide](https://docs.anaconda.com/miniconda/install/).

* You have installed [Git](https://git-scm.com/downloads) based on your operating system.

* You have installed the Python package manager uv. After the installation, run the `uv --version` command to verify the installation:

  ```shell
  pip install uv
  uv --version
  ```

* You have installed Cline:

  * If you are using the Visual Studio Code IDE, search for the Cline extension in the `Extensions` section and install it. The extension name is `Cline`. After the installation, click the settings icon to configure the large model API for Cline as follows:

    

  * If you do not have an IDE, download Cline from [Cline](https://cline.bot/) and follow the [installation guide](https://docs.cline.bot/getting-started/installing-cline).
## Step 1: Obtain the database connection information

Contact your seekdb deployment engineer or administrator to obtain the database connection string. For example:

```sql
obclient -h$host -P$port -u$user_name -p$password -D$database_name
```

**Parameters:**

* `$host`: The IP address for connecting to seekdb.
* `$port`: The port number for connecting to seekdb. Default is `2881`.
* `$database_name`: The name of the database to access.

:::tip
The connected user must have <code>CREATE</code>, <code>INSERT</code>, <code>DROP</code>, and <code>SELECT</code> privileges on the database.
:::

* `$user_name`: The username for connecting to the database.
* `$password`: The password for the account.
## Step 2: Configure the OceanBase MCP Server

This example uses Visual Studio Code to demonstrate how to configure the OceanBase MCP Server.

### Clone the OceanBase MCP Server repository

Run the following command to download the source code to your local device:

```shell
git clone https://github.com/oceanbase/mcp-oceanbase.git
```

Go to the source code directory:

```shell
cd mcp-oceanbase
```

### Install dependencies

Run the following command in the `mcp-oceanbase` directory to create a virtual environment and install dependencies:

```shell
uv venv
source .venv/bin/activate
uv pip install .
```

### Create a working directory for Visual Studio Code

Manually create a working directory for Visual Studio Code on your local device and open it with Visual Studio Code. The files generated by Cline will be placed in this directory. The name of the sample directory is `cline-generate`.

<!--  -->
### Configure the OceanBase MCP Server in Cline

Click the Cline icon in the left-side navigation pane to open the Cline dialog box.



### Add and configure MCP servers

1. Click the **MCP Servers** icon as shown in the following figure.

    

2. Manually configure the OceanBase MCP Server according to the numbered instructions in the following figure.

    

3. Edit the configuration file.

    In the `cline_mcp_settings.json` file opened in the previous step, enter the following configuration information and save the file. Replace `/path/to/your/mcp-oceanbase/src/oceanbase_mcp_server` with the absolute path of the `oceanbase_mcp_server` folder, and replace `OB_HOST`, `OB_PORT`, `OB_USER`, `OB_PASSWORD`, and `OB_DATABASE` with your database information.

    The configuration file is as follows:

    ```json
    {
      "mcpServers": {
        "oceanbase": {
          "command": "uv",
          "args": [
            "--directory",
            "/path/to/your/mcp-oceanbase/src/oceanbase_mcp_server",
            "run",
            "oceanbase_mcp_server"
          ],
          "env": {
            "OB_HOST": "***",
            "OB_PORT": "***",
            "OB_USER": "***",
            "OB_PASSWORD": "***",
            "OB_DATABASE": "***"
          }
        }
      }
    }
    ```

4. If the configuration is successful, the MCP Server is displayed in ready status, and the MCP `Tools` and `Resources` information is displayed, as shown in the following figure:

    

5. Click the switch shown in the following figure to enable the MCP Server so that Cline can use it:

    
### Test the MCP Server

Open the Cline session dialog box and enter the prompt `How many tables are there in the dataanalysis_english database?`. Cline displays the SQL statement about to be executed. Confirm the SQL statement and click the `Act` button.



Cline then displays the table names in the `dataanalysis_english` database, indicating that it can properly connect to seekdb.


### Create a RESTful API project using FastAPI

You can use FastAPI, a Python web framework for building RESTful APIs efficiently, to quickly create a RESTful API project.

1. Create the customer table

    In the dialog box, enter the prompt: `Create a "customer" table with "ID" as the primary key, including the fields "name", "age", "telephone", and "location"`. Confirm the SQL statement and click the `Act` button.

    

2. Insert test data

    In the dialog box, enter the prompt: `Insert 10 rows of test data`. Confirm the SQL statement and click the `Act` button.

    

    After the data is inserted, the execution result is displayed.

    

3. Create a FastAPI project

    In the dialog box, enter the prompt: `Create a FastAPI project and generate a RESTful API based on the "customer" table`. Confirm the suggested changes and click the `Act` button.

    

    This step automatically generates files. We recommend selecting "Accept All" the first time, because the content of AI-generated files may vary; you can adjust them later as needed.

4. Create a virtual environment and install dependencies

    Run the following command to create a virtual environment using the uv package manager and install the dependency packages in the current directory:

    ```shell
    uv venv
    source .venv/bin/activate
    uv pip install -r requirements.txt
    ```

5. Start the FastAPI project

    Run the following command to start the FastAPI project:

    ```shell
    uvicorn main:app --reload
    ```

6. View data in the table

    Run the following command in the command line, or use another request tool, to view the data in the table:

    ```shell
    curl http://127.0.0.1:8000/customers
    ```

    The return result is as follows:

    ```json
    [{"ID":1,"name":"Alice Johnson","age":28,"telephone":"123-456-7890","location":"New York"},{"ID":2,"name":"Bob Smith","age":34,"telephone":"234-567-8901","location":"Los Angeles"},{"ID":3,"name":"Charlie Brown","age":45,"telephone":"345-678-9012","location":"Chicago"},{"ID":4,"name":"David Wilson","age":56,"telephone":"456-789-0123","location":"Houston"},{"ID":5,"name":"Eve Davis","age":67,"telephone":"567-890-1234","location":"Phoenix"},{"ID":6,"name":"Frank Garcia","age":78,"telephone":"678-901-2345","location":"Philadelphia"},{"ID":7,"name":"Grace Martinez","age":89,"telephone":"789-012-3456","location":"San Antonio"},{"ID":8,"name":"Hannah Robinson","age":19,"telephone":"890-123-4567","location":"San Diego"},{"ID":9,"name":"Ian Clark","age":23,"telephone":"901-234-5678","location":"Dallas"},{"ID":10,"name":"Julia Lewis","age":31,"telephone":"012-345-6789","location":"San Jose"}]
    ```

You can see that the RESTful APIs for creating, reading, updating, and deleting data have been successfully generated:
```python
from fastapi import FastAPI, Depends, HTTPException
from sqlalchemy.orm import Session
from models import Customer
from database import SessionLocal, engine
from pydantic import BaseModel

app = FastAPI()

# Database dependency
def get_db():
    db = SessionLocal()
    try:
        yield db
    finally:
        db.close()

# Request model
class CustomerCreate(BaseModel):
    name: str
    age: int
    telephone: str
    location: str

# Response model
class CustomerResponse(CustomerCreate):
    id: int

    class Config:
        from_attributes = True

@app.post("/customers/")
def create_customer(customer: CustomerCreate, db: Session = Depends(get_db)):
    db_customer = Customer(**customer.model_dump())
    db.add(db_customer)
    db.commit()
    db.refresh(db_customer)
    return db_customer

@app.get("/customers/{customer_id}")
def read_customer(customer_id: int, db: Session = Depends(get_db)):
    customer = db.query(Customer).filter(Customer.id == customer_id).first()
    if customer is None:
        raise HTTPException(status_code=404, detail="Customer not found")
    return customer

@app.get("/customers/")
def read_customers(skip: int = 0, limit: int = 10, db: Session = Depends(get_db)):
    return db.query(Customer).offset(skip).limit(limit).all()

@app.put("/customers/{customer_id}")
def update_customer(customer_id: int, customer: CustomerCreate, db: Session = Depends(get_db)):
    db_customer = db.query(Customer).filter(Customer.id == customer_id).first()
    if db_customer is None:
        raise HTTPException(status_code=404, detail="Customer not found")
    for field, value in customer.model_dump().items():
        setattr(db_customer, field, value)
    db.commit()
    db.refresh(db_customer)
    return db_customer

@app.delete("/customers/{customer_id}")
def delete_customer(customer_id: int, db: Session = Depends(get_db)):
    customer = db.query(Customer).filter(Customer.id == customer_id).first()
    if customer is None:
        raise HTTPException(status_code=404, detail="Customer not found")
    db.delete(customer)
    db.commit()
    return {"message": "Customer deleted successfully"}
```
---
sidebar_label: Continue
slug: /continue
---

# Integrate OceanBase MCP Server with Continue

[MCP (Model Context Protocol)](https://modelcontextprotocol.io/introduction) is an open-source protocol released by Anthropic in November 2024. It enables large language models to interact with external tools or data sources. With MCP, users do not need to manually copy and execute the output of large models; instead, the models can directly instruct tools to perform specific actions.

[MCP Server](https://github.com/oceanbase/mcp-oceanbase/tree/main/src/oceanbase_mcp_server) provides the capability for large models to interact with seekdb through the MCP protocol, allowing the execution of SQL statements. With the right client, you can quickly build a project prototype. The server is open-source on GitHub.

[Continue](https://continue.dev) is an IDE extension that integrates with the MCP Server, supporting Visual Studio Code and IntelliJ IDEA.

This topic shows how to integrate Continue with the OceanBase MCP Server to quickly build backend applications.
## Prerequisites

* You have deployed seekdb.

* You have installed [Python 3.11 or later](https://www.python.org/downloads/) and the corresponding [pip](https://pip.pypa.io/en/stable/installation/). If your machine has an older version of Python, you can use Miniconda to create a new environment with Python 3.11 or above. For more information, see [Miniconda installation guide](https://docs.anaconda.com/miniconda/install/).

* You have installed [Git](https://git-scm.com/downloads) based on your operating system.

* You have installed the Python package manager uv. After the installation, run the `uv --version` command to verify the installation:

  ```shell
  pip install uv
  uv --version
  ```

* You have installed the Continue extension in Visual Studio Code or IntelliJ IDEA. The extension name is `Continue`.

  

* After the installation is complete, click `Add Models` to configure the large model API for Continue. The API configuration is as follows:

  

* The configuration file is as follows:

  ```yaml
  name: Local Assistant
  version: 1.0.0
  schema: v1
  models:
    # Model name
    - name: DeepSeek-R1-671B
      # Model provider
      provider: deepseek
      # Model type
      model: DeepSeek-R1-671B
      # URL address for accessing the model
      apiBase: *********
      # API key for accessing the model
      apiKey: ********
  # Context provider
  context:
    - provider: code
    - provider: docs
    - provider: diff
    - provider: terminal
    - provider: problems
    - provider: folder
    - provider: codebase
  ```
## Step 1: Obtain the database connection information

Contact your seekdb deployment engineer or administrator to obtain the database connection string. For example:

```sql
obclient -h$host -P$port -u$user_name -p$password -D$database_name
```

**Parameters:**

* `$host`: The IP address for connecting to seekdb.
* `$port`: The port number for connecting to seekdb. Default is `2881`.
* `$database_name`: The name of the database to access.

:::tip
The connected user must have <code>CREATE</code>, <code>INSERT</code>, <code>DROP</code>, and <code>SELECT</code> privileges on the database.
:::

* `$user_name`: The username for connecting to the database.
* `$password`: The password for the account.
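
Before configuring the MCP Server, you can optionally verify these values from Python. The following is a minimal sketch, assuming the `mysql-connector-python` driver is installed; all values shown are placeholders, not real connection information:

```python
# Placeholder connection parameters; replace them with your own values.
config = {
    "host": "127.0.0.1",   # $host
    "port": 2881,          # $port (seekdb default)
    "user": "root",        # $user_name (hypothetical)
    "password": "******",  # $password
    "database": "test",    # $database_name (hypothetical)
}

def check_connection(cfg: dict) -> str:
    """Connect and return the server version string."""
    # Deferred import: assumes `pip install mysql-connector-python`.
    import mysql.connector
    cnx = mysql.connector.connect(**cfg)
    try:
        cursor = cnx.cursor()
        cursor.execute("SELECT VERSION()")
        return cursor.fetchone()[0]
    finally:
        cnx.close()
```

If the call raises an access-denied or connection error, fix the credentials before moving on to the MCP configuration.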

## Step 2: Configure the OceanBase MCP Server

### Clone the OceanBase MCP Server repository

Run the following command to download the source code to your local device:

```shell
git clone https://github.com/oceanbase/mcp-oceanbase.git
```

Go to the source code directory:

```shell
cd mcp-oceanbase
```

### Install dependencies

Run the following commands in the `mcp-oceanbase` directory to create a virtual environment and install dependencies:

```shell
uv venv
source .venv/bin/activate
uv pip install .
```

### Add and configure MCP servers

1. Click the button in the upper-right corner of the menu bar to open the MCP panel.

    

2. Click **Add MCP Servers**.

    :::tip
    MCP can be used only in the Continue Agent mode.
    :::

    

3. Fill in the configuration file and click **OK**.

    Replace `/path/to/your/mcp-oceanbase/src/oceanbase_mcp_server` with the absolute path of the `oceanbase_mcp_server` folder. Replace `OB_HOST`, `OB_PORT`, `OB_USER`, `OB_PASSWORD`, and `OB_DATABASE` with the corresponding information of your database:

    ```yaml
    name: SeekDB
    version: 0.0.1
    schema: v1
    mcpServers:
      - name: SeekDB
        command: uv
        args:
          - --directory
          - /path/to/your/mcp-oceanbase/src/oceanbase_mcp_server
          - run
          - oceanbase_mcp_server
        env:
          OB_HOST: "****"
          OB_PORT: "***"
          OB_USER: "***"
          OB_PASSWORD: "***"
          OB_DATABASE: "***"
    ```

4. If the configuration is successful, the following message is displayed:

    

@@ -0,0 +1,291 @@

---
sidebar_label: TRAE
slug: /trae
---

# Integrate OceanBase MCP Server with TRAE

[Model Context Protocol (MCP)](https://modelcontextprotocol.io/introduction) is an open-source protocol introduced by Anthropic in November 2024. It allows large language models to interact with external tools or data sources. With MCP, you do not need to manually copy and execute the output of large language models. Instead, the large language model can directly command tools to perform specific actions.

[MCP Server](https://github.com/oceanbase/mcp-oceanbase/tree/main/src/oceanbase_mcp_server) enables large language models to interact with OceanBase Database through the MCP protocol and execute SQL statements. It allows you to quickly build a project prototype with the help of an appropriate client and has been open-sourced on GitHub.

[TRAE](https://www.trae.ai/) is an IDE that can integrate with the MCP Server and can be downloaded from its official website.

This topic guides you through integrating TRAE IDE with the OceanBase MCP Server to quickly build a backend application.

## Prerequisites

* You have deployed seekdb.

* You have installed [Python 3.11 or later](https://www.python.org/downloads/) and the corresponding [pip](https://pip.pypa.io/en/stable/installation/). If your system has an older Python version, you can use Miniconda to create a new Python 3.11 or later environment. For more information, see [Install Miniconda](https://docs.anaconda.com/miniconda/install/).

* You have installed [Git](https://git-scm.com/downloads) for your operating system.

* You have installed uv, a Python package manager. After the installation, run the `uv --version` command to check whether the installation was successful:

    ```shell
    pip install uv
    uv --version
    ```

* You have downloaded [TRAE IDE](https://www.trae.ai/download) and installed the version suitable for your operating system.

## Step 1: Obtain the database connection information

Contact your seekdb deployment engineer or administrator to obtain the database connection string. For example:

```sql
obclient -h$host -P$port -u$user_name -p$password -D$database_name
```

**Parameters:**

* `$host`: The IP address for connecting to seekdb.
* `$port`: The port number for connecting to seekdb. Default is `2881`.
* `$database_name`: The name of the database to access.

:::tip
The connected user must have <code>CREATE</code>, <code>INSERT</code>, <code>DROP</code>, and <code>SELECT</code> privileges on the database.
:::

* `$user_name`: The username for connecting to the database.
* `$password`: The password for the account.

## Step 2: Configure the OceanBase MCP Server

### Clone the OceanBase MCP Server repository

Run the following command to download the source code to your local device:

```shell
git clone https://github.com/oceanbase/mcp-oceanbase.git
```

Go to the source code directory:

```shell
cd mcp-oceanbase
```

### Install the dependencies

Run the following commands in the `mcp-oceanbase` directory to create a virtual environment and install the dependencies:

```shell
uv venv
source .venv/bin/activate
uv pip install .
```

### Create a working directory for the TRAE client

Manually create a working directory for TRAE and open it. TRAE will generate files in this directory. The example directory name is `trae-generate`.



### Configure the OceanBase MCP Server in TRAE

Press `Ctrl + U` (Windows) or `Command + U` (macOS) to open the chat box. Click the gear icon in the upper-right corner and select **MCP**.



### Add and configure MCP servers

1. Click **Add MCP Servers** and select **Add Manually**.

    

    

2. Delete the sample content in the edit box.

    

    Then enter the following contents:

    ```json
    {
      "mcpServers": {
        "oceanbase": {
          "command": "uv",
          "args": [
            "--directory",
            // Replace with the absolute path of the oceanbase_mcp_server folder.
            "/path/to/your/mcp-oceanbase/src/oceanbase_mcp_server",
            "run",
            "oceanbase_mcp_server"
          ],
          "env": {
            // Replace with your OceanBase Database connection information.
            "OB_HOST": "***",
            "OB_PORT": "***",
            "OB_USER": "***",
            "OB_PASSWORD": "***",
            "OB_DATABASE": "***"
          }
        }
      }
    }
    ```
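
Note that the configuration above uses `//` comments as inline guidance, which strict JSON parsers such as Python's `json.loads` reject. If you reuse this snippet somewhere that requires strict JSON, strip the comment lines first. A minimal stdlib sketch, using a shortened hypothetical sample in the same style:

```python
import json

# Hypothetical sample in the same comment-annotated style as the config above.
JSONC = """
{
  "mcpServers": {
    "oceanbase": {
      // Replace with the actual command.
      "command": "uv"
    }
  }
}
"""

def strip_line_comments(text: str) -> str:
    """Drop whole-line //-comments (the only kind this config uses)."""
    kept = [ln for ln in text.splitlines() if not ln.strip().startswith("//")]
    return "\n".join(kept)

cfg = json.loads(strip_line_comments(JSONC))
print(cfg["mcpServers"]["oceanbase"]["command"])  # uv
```
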

3. If the configuration is successful, the following message is displayed:

    

### Test the MCP server

1. Select the **Builder with MCP** agent.

    

2. In the dialog box, enter `How many tables are there in the test database`. The TRAE client displays the SQL statement to be executed. Confirm the SQL statement and click the `Run` button.

    

3. The TRAE client displays the number of tables in the `test` database, which indicates that you have successfully connected to seekdb.

    

### Create a RESTful API project using FastAPI

You can use FastAPI to quickly create a RESTful API project. FastAPI is a Python web framework that enables you to build RESTful APIs efficiently.

1. Create the customer table.

    In the dialog box, enter `Create a "customer" table with "Id" as the primary key, including the fields of "name", "age", "telephone", and "location"`. Confirm the SQL statement and click the `Run` button.

    

2. Insert test data.

    In the dialog box, enter `Insert 10 test data entries`. Confirm the SQL statement and click the `Run` button.

    

    The execution result is displayed after the insertion succeeds:

    

3. Create a FastAPI project.

    In the dialog box, enter `Create a FastAPI project and generate a RESTful API based on the "customer" table`. Confirm the generated operations and click the `Run` button.

    

    This step generates three files. We recommend that you select **Accept All** on first use, because files generated by AI may contain uncertain contents. You can adjust them based on your actual needs later.

4. Create a virtual environment and install dependencies.

    Run the following commands to create a virtual environment with the uv package manager and install the required packages in the current directory:

    ```shell
    uv venv
    source .venv/bin/activate
    uv pip install -r requirements.txt
    ```

5. Start the FastAPI project.

    Run the following command to start the FastAPI project:

    ```shell
    uvicorn main:app --reload
    ```

6. View the data in the table.

    Run the following command in the command line, or use another request tool, to view the data in the table:

    ```shell
    curl http://127.0.0.1:8000/customers
    ```

    The return result is as follows:

    ```json
    [{"Id":1,"name":"Alice","age":25,"telephone":"123-***-7890","location":"New York"},{"Id":2,"name":"Bob","age":30,"telephone":"234-***-8901","location":"Los Angeles"},{"Id":3,"name":"Charlie","age":35,"telephone":"345-***-9012","location":"Chicago"},{"Id":4,"name":"David","age":40,"telephone":"456-***-0123","location":"Houston"},{"Id":5,"name":"Eve","age":45,"telephone":"567-***-1234","location":"Miami"},{"Id":6,"name":"Frank","age":50,"telephone":"678-***-2345","location":"Seattle"},{"Id":7,"name":"Grace","age":55,"telephone":"789-***-3456","location":"Denver"},{"Id":8,"name":"Heidi","age":60,"telephone":"890-***-4567","location":"Boston"},{"Id":9,"name":"Ivan","age":65,"telephone":"901-***-5678","location":"Philadelphia"},{"Id":10,"name":"Judy","age":70,"telephone":"012-***-6789","location":"San Francisco"}]
    ```

You can see that the RESTful APIs for CRUD operations have been successfully generated:

```python
from fastapi import FastAPI
from pydantic import BaseModel
import mysql.connector

app = FastAPI()

# Database connection configuration
config = {
    'user': '*******',
    'password': '******',
    'host': 'xx.xxx.xxx.xx',
    'database': 'test',
    'port': xxxx,
    'raise_on_warnings': True
}

class Customer(BaseModel):
    id: int
    name: str
    age: int
    telephone: str
    location: str

@app.get('/customers')
async def get_customers():
    cnx = mysql.connector.connect(**config)
    cursor = cnx.cursor(dictionary=True)
    query = 'SELECT * FROM customer'
    cursor.execute(query)
    results = cursor.fetchall()
    cursor.close()
    cnx.close()
    return results

@app.get('/customers/{customer_id}')
async def get_customer(customer_id: int):
    cnx = mysql.connector.connect(**config)
    cursor = cnx.cursor(dictionary=True)
    query = 'SELECT * FROM customer WHERE ID = %s'
    cursor.execute(query, (customer_id,))
    result = cursor.fetchone()
    cursor.close()
    cnx.close()
    return result

@app.post('/customers')
async def create_customer(customer: Customer):
    cnx = mysql.connector.connect(**config)
    cursor = cnx.cursor()
    query = 'INSERT INTO customer (ID, name, age, telephone, location) VALUES (%s, %s, %s, %s, %s)'
    data = (customer.id, customer.name, customer.age, customer.telephone, customer.location)
    cursor.execute(query, data)
    cnx.commit()
    cursor.close()
    cnx.close()
    return {'message': 'Customer created successfully'}

@app.put('/customers/{customer_id}')
async def update_customer(customer_id: int, customer: Customer):
    cnx = mysql.connector.connect(**config)
    cursor = cnx.cursor()
    query = 'UPDATE customer SET name = %s, age = %s, telephone = %s, location = %s WHERE ID = %s'
    data = (customer.name, customer.age, customer.telephone, customer.location, customer_id)
    cursor.execute(query, data)
    cnx.commit()
    cursor.close()
    cnx.close()
    return {'message': 'Customer updated successfully'}

@app.delete('/customers/{customer_id}')
async def delete_customer(customer_id: int):
    cnx = mysql.connector.connect(**config)
    cursor = cnx.cursor()
    query = 'DELETE FROM customer WHERE ID = %s'
    cursor.execute(query, (customer_id,))
    cnx.commit()
    cursor.close()
    cnx.close()
    return {'message': 'Customer deleted successfully'}
```
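
Every endpoint above follows the same DB-API flow: open a connection, execute a parameterized query (`%s` placeholders for `mysql.connector`), commit writes, and close. As a self-contained illustration of that flow, here is the delete path rebuilt against the stdlib `sqlite3` driver (which uses `?` placeholders instead of `%s`) and an in-memory stand-in table:

```python
import sqlite3

# In-memory stand-in for the `customer` table created earlier.
cnx = sqlite3.connect(":memory:")
cursor = cnx.cursor()
cursor.execute("CREATE TABLE customer (ID INTEGER PRIMARY KEY, name TEXT)")
cursor.execute("INSERT INTO customer (ID, name) VALUES (?, ?)", (1, "Alice"))
cnx.commit()

def delete_customer(customer_id: int) -> dict:
    """Same cursor flow as the generated endpoint, with sqlite3 placeholders."""
    cursor.execute("DELETE FROM customer WHERE ID = ?", (customer_id,))
    cnx.commit()
    return {"message": "Customer deleted successfully"}

print(delete_customer(1))  # {'message': 'Customer deleted successfully'}
cursor.execute("SELECT COUNT(*) FROM customer")
print(cursor.fetchone()[0])  # 0
```

Passing values as a tuple to `cursor.execute` (rather than formatting them into the SQL string) is what protects these endpoints from SQL injection, in both the `sqlite3` sketch and the generated `mysql.connector` code.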