---
sidebar_label: LangChain
slug: /langchain
---

# Integrate seekdb vector search with LangChain

seekdb supports vector data storage, vector indexing, and embedding-based vector search. You can store vectorized data in seekdb for further search.

LangChain is a framework for developing language model-driven applications. It enables an application to have the following capabilities:

* Context awareness: The application can connect language models to context sources, such as prompt instructions, a few examples, and content requiring responses.
* Reasoning: The application can perform reasoning based on language models. For example, it can decide how to answer a question or what actions to take based on the provided context.

This topic describes how to integrate the [vector search feature](../../200.develop/100.vector-search/100.vector-search-overview/100.vector-search-intro.md) of seekdb with the [Tongyi Qianwen (Qwen) API](https://www.alibabacloud.com/en/solutions/generative-ai/qwen?_p_lc=1) and [LangChain](https://python.langchain.com/) for Document Question Answering (DQA).

## Prerequisites

* You have deployed seekdb.
* Your environment has a database and account with read and write privileges.
* You have installed Python 3.9 or later.
* You have installed required dependencies:

  ```shell
  python3 -m pip install -U langchain-oceanbase
  python3 -m pip install langchain_community
  python3 -m pip install dashscope
  ```

* You can set the `ob_vector_memory_limit_percentage` parameter to enable vector search. We recommend keeping the default value of `0` (adaptive mode). For more precise configuration settings, see the relevant configuration documentation.
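
  As a minimal sketch (the exact statement and valid range depend on your seekdb version; consult the configuration documentation), the parameter can typically be adjusted with an `ALTER SYSTEM` statement:

  ```sql
  -- Example only: cap vector index memory at 30% instead of the adaptive default of 0.
  ALTER SYSTEM SET ob_vector_memory_limit_percentage = 30;
  ```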

## Step 1: Obtain the database connection information

Contact the seekdb database deployment personnel or administrator to obtain the database connection string. For example:

```sql
obclient -h$host -P$port -u$user_name -p$password -D$database_name
```

**Parameters:**

* `$host`: The IP address for connecting to the seekdb database.
* `$port`: The port for connecting to the seekdb database. The default value is `2881`, which can be customized during deployment.
* `$database_name`: The name of the database to access.

<main id="notice" type='notice'>
<h4>Notice</h4>
<p>The user connecting to the database must have the <code>CREATE</code>, <code>INSERT</code>, <code>DROP</code>, and <code>SELECT</code> privileges on the database.</p>
</main>

* `$user_name`: The database account.
* `$password`: The password for the account.

For more information about the connection string, see [Connect to OceanBase Database by using OBClient](https://en.oceanbase.com/docs/common-oceanbase-database-10000000001971649).

## Step 2: Build your AI assistant

### Set the environment variable for the Qwen API key

Create a [Qwen API key](https://www.alibabacloud.com/help/en/model-studio/get-api-key?spm=a2c63.l28256.help-menu-2400256.d_2.47db1b76nM44Ut) and [configure it in the environment variables](https://www.alibabacloud.com/help/en/model-studio/configure-api-key-through-environment-variables?spm=a2c63.p38356.help-menu-2400256.d_2_0_1.56069f6b3m576u).

```shell
export DASHSCOPE_API_KEY="YOUR_DASHSCOPE_API_KEY"
```

### Load and split the documents

Download the sample data and split it into chunks of approximately 1000 characters using the `CharacterTextSplitter` class.

```python
import os

import requests
from langchain_community.document_loaders import TextLoader
from langchain_community.embeddings import DashScopeEmbeddings
from langchain_oceanbase.vectorstores import OceanbaseVectorStore
from langchain_text_splitters import CharacterTextSplitter

DASHSCOPE_API = os.environ.get("DASHSCOPE_API_KEY", "")
embeddings = DashScopeEmbeddings(
    model="text-embedding-v1", dashscope_api_key=DASHSCOPE_API
)

# Download the sample document.
url = "https://raw.githubusercontent.com/GITHUBear/langchain/refs/heads/master/docs/docs/how_to/state_of_the_union.txt"
res = requests.get(url)
with open("state_of_the_union.txt", "w") as f:
    f.write(res.text)

# Load the document and split it into roughly 1000-character chunks.
loader = TextLoader("./state_of_the_union.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
```

### Insert the data into seekdb

```python
# Replace the connection information below with your own seekdb deployment details.
connection_args = {
    "host": "127.0.0.1",
    "port": "2881",
    "user": "root@user_name",
    "password": "",
    "db_name": "test",
}
DEMO_TABLE_NAME = "demo_ann"

# Create the vector store, dropping any existing table with the same name.
ob = OceanbaseVectorStore(
    embedding_function=embeddings,
    table_name=DEMO_TABLE_NAME,
    connection_args=connection_args,
    drop_old=True,
    normalize=True,
)
res = ob.add_documents(documents=docs)
```

### Vector search

This step shows how to query `"What did the president say about Ketanji Brown Jackson"` from the document `state_of_the_union.txt`.

```python
query = "What did the president say about Ketanji Brown Jackson"
docs_with_score = ob.similarity_search_with_score(query, k=3)

for doc, score in docs_with_score:
    print("-" * 80)
    print("Score: ", score)
    print(doc.page_content)
    print("-" * 80)
```

Expected output:

```shell
--------------------------------------------------------------------------------
Score: 1.204783671324283
Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections.

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Score: 1.2146663629717394
It is going to transform America and put us on a path to win the economic competition of the 21st Century that we face with the rest of the world—particularly with China.

As I’ve told Xi Jinping, it is never a good bet to bet against the American people.

We’ll create good jobs for millions of Americans, modernizing roads, airports, ports, and waterways all across America.

And we’ll do it all to withstand the devastating effects of the climate crisis and promote environmental justice.

We’ll build a national network of 500,000 electric vehicle charging stations, begin to replace poisonous lead pipes—so every child—and every American—has clean water to drink at home and at school, provide affordable high-speed internet for every American—urban, suburban, rural, and tribal communities.

4,000 projects have already been announced.

And tonight, I’m announcing that this year we will start fixing over 65,000 miles of highway and 1,500 bridges in disrepair.
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Score: 1.2193955178945004
Vice President Harris and I ran for office with a new economic vision for America.

Invest in America. Educate Americans. Grow the workforce. Build the economy from the bottom up
and the middle out, not from the top down.

Because we know that when the middle class grows, the poor have a ladder up and the wealthy do very well.

America used to have the best roads, bridges, and airports on Earth.

Now our infrastructure is ranked 13th in the world.

We won’t be able to compete for the jobs of the 21st Century if we don’t fix that.

That’s why it was so important to pass the Bipartisan Infrastructure Law—the most sweeping investment to rebuild America in history.

This was a bipartisan effort, and I want to thank the members of both parties who worked to make it happen.

We’re done talking about infrastructure weeks.

We’re going to have an infrastructure decade.
--------------------------------------------------------------------------------
```
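
To close the loop on document question answering, you can pass the retrieved chunks to a Qwen model as context. The following is a minimal sketch using the `Tongyi` LLM wrapper from `langchain_community`; the `qwen-max` model name and the prompt wording are illustrative assumptions rather than part of the original tutorial:

```python
from langchain_community.llms import Tongyi

# Illustrative model choice; any Qwen model available to your DashScope account works.
llm = Tongyi(model_name="qwen-max", dashscope_api_key=DASHSCOPE_API)

# Reuse the retrieved chunks as grounding context for the answer.
context = "\n\n".join(doc.page_content for doc, _ in docs_with_score)
prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {query}"
)
print(llm.invoke(prompt))
```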
---
sidebar_label: LlamaIndex
slug: /llamaindex
---

# Integrate seekdb vector search with LlamaIndex

seekdb supports vector data storage, vector indexing, and embedding-based vector search. You can store vectorized data in seekdb for further search.

LlamaIndex is a framework for building context-augmented generative AI applications by using large language models (LLMs), including agents and workflows. It provides a wealth of capabilities such as data connectors, data indexes, agents, observability/evaluation integrations, and workflows.

This topic demonstrates how to integrate the vector search feature of seekdb with the Tongyi Qianwen (Qwen) API and LlamaIndex for Document Question Answering (DQA).

## Prerequisites

* You have deployed seekdb.
* Your environment has a database and account with read and write privileges.
* You can set the `ob_vector_memory_limit_percentage` parameter to enable vector search. We recommend keeping the default value of `0` (adaptive mode). For more precise configuration settings, see the relevant configuration documentation.
* You have installed Python 3.9 or later.
* You have installed the required dependencies:

  ```shell
  python3 -m pip install llama-index-vector-stores-oceanbase llama-index
  python3 -m pip install llama-index-embeddings-dashscope
  python3 -m pip install llama-index-llms-dashscope
  ```

* You have obtained the Qwen API key.

## Step 1: Obtain the database connection information

Contact the seekdb database deployment personnel or administrator to obtain the database connection string. For example:

```sql
obclient -h$host -P$port -u$user_name -p$password -D$database_name
```

**Parameters:**

* `$host`: The IP address for connecting to the seekdb database.
* `$port`: The port for connecting to the seekdb database. The default value is `2881`, which can be customized during deployment.
* `$database_name`: The name of the database to access.

<main id="notice" type='notice'>
<h4>Notice</h4>
<p>The user connecting to the database must have the <code>CREATE</code>, <code>INSERT</code>, <code>DROP</code>, and <code>SELECT</code> privileges on the database.</p>
</main>

* `$user_name`: The database account.
* `$password`: The password for the account.

For more information about the connection string, see [Connect to OceanBase Database by using OBClient](https://en.oceanbase.com/docs/common-oceanbase-database-10000000001971649).

## Step 2: Build your AI assistant

### Set the environment variable for the Qwen API key

Create a [Qwen API key](https://www.alibabacloud.com/help/en/model-studio/get-api-key?spm=a2c63.l28256.help-menu-2400256.d_2.47db1b76nM44Ut) and [configure it in the environment variables](https://www.alibabacloud.com/help/en/model-studio/configure-api-key-through-environment-variables?spm=a2c63.p38356.help-menu-2400256.d_2_0_1.56069f6b3m576u).

```shell
export DASHSCOPE_API_KEY="YOUR_DASHSCOPE_API_KEY"
```

### Download the sample data

```shell
mkdir -p '/root/llamaindex/paul_graham/'
wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O '/root/llamaindex/paul_graham/paul_graham_essay.txt'
```

### Load the data text

```python
import os

from llama_index.core import (
    Settings,
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
)
from llama_index.embeddings.dashscope import DashScopeEmbedding
from llama_index.llms.dashscope import DashScope, DashScopeGenerationModels
from llama_index.vector_stores.oceanbase import OceanBaseVectorStore
from pyobvector import ObVecClient

# Set up the seekdb client.
client = ObVecClient(uri="127.0.0.1:2881", user="root@test", password="", db_name="test")

# Global settings: use DashScope embeddings for all indexing and querying.
Settings.embed_model = DashScopeEmbedding()

# Configure the LLM.
dashscope_llm = DashScope(
    model_name=DashScopeGenerationModels.QWEN_MAX,
    api_key=os.environ.get("DASHSCOPE_API_KEY", ""),
)

# Load the documents.
documents = SimpleDirectoryReader("/root/llamaindex/paul_graham/").load_data()

# Create the vector store; dim must match the embedding dimension.
oceanbase = OceanBaseVectorStore(
    client=client,
    dim=1536,
    drop_old=True,
    normalize=True,
)

storage_context = StorageContext.from_defaults(vector_store=oceanbase)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)
```

### Vector search

This step shows how to query `"What did the author do growing up?"` from the document `paul_graham_essay.txt`.

```python
# Set logging to DEBUG for more detailed outputs.
query_engine = index.as_query_engine(llm=dashscope_llm)
res = query_engine.query("What did the author do growing up?")
print(res.response)
```

Expected result:

```shell
Growing up, the author worked on writing and programming outside of school. In terms of writing, he wrote short stories, which he now considers to be awful, as they had very little plot and focused mainly on characters with strong feelings. For programming, he started in 9th grade by trying to write programs on an IBM 1401 at his school, using an early version of Fortran. Later, after getting a TRS-80 microcomputer, he began to write more practical programs, including simple games, a program to predict the flight height of model rockets, and a word processor that his father used for writing.
```
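
Because the embeddings are persisted in seekdb, later sessions do not need to re-load or re-embed the documents. As a minimal sketch (assuming the same `client` and vector store configuration as above), you can rebuild the index directly from the existing vector store:

```python
# Rebuild the index from the vectors already stored in seekdb,
# skipping document loading and embedding.
index = VectorStoreIndex.from_vector_store(vector_store=oceanbase)
query_engine = index.as_query_engine(llm=dashscope_llm)
print(query_engine.query("What did the author do growing up?").response)
```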
---
sidebar_label: Spring AI
slug: /springai
---

# Integrate seekdb vector search with Spring AI Alibaba

seekdb supports vector data storage, vector indexing, and embedding-based vector search. You can store vectorized data in seekdb for further search.

Spring AI Alibaba is an open-source project built on Spring AI that provides best practices for developing AI applications in Java. It simplifies the AI application development process, adapts to cloud-native infrastructure, and helps developers quickly build AI applications.

This topic describes how to integrate the vector search capability of seekdb with Spring AI Alibaba to implement data import and similarity search features. By configuring vector storage and search services, developers can easily build AI application scenarios based on seekdb, supporting advanced features such as text similarity search and content recommendation.

## Prerequisites

* You have deployed seekdb.

* Download [JDK 17+](https://www.oracle.com/java/technologies/downloads/#java17). Make sure that you have installed Java 17 and configured the environment variables.

* Download [Maven](https://dlcdn.apache.org/maven/). Make sure that you have installed Maven 3.6+ for building the project and managing dependencies.

* Download [IntelliJ IDEA](https://www.jetbrains.com/idea/download/) or [Eclipse](https://www.eclipse.org/downloads/). Choose the version that suits your operating system and install it.

## Step 1: Obtain the database connection information

Contact the seekdb deployment personnel or administrator to obtain the database connection string. For example:

```sql
obclient -h$host -P$port -u$user_name -p$password -D$database_name
```

**Parameters:**

* `$host`: The IP address for connecting to the seekdb database.
* `$port`: The port for connecting to the seekdb database. The default value is `2881`.
* `$database_name`: The name of the database to access.

<main id="notice" type='notice'>
<h4>Notice</h4>
<p>The user connecting to the database must have the <code>CREATE</code>, <code>INSERT</code>, <code>DROP</code>, and <code>SELECT</code> privileges on the database.</p>
</main>

* `$user_name`: The database account.
* `$password`: The password for the account.

## Step 2: Set up the Maven project

Maven is a project management and build tool used in this topic. This step describes how to create a Maven project and add project dependencies by configuring the `pom.xml` file.

### Create a project

1. Run the following Maven command to create a project.

   ```shell
   mvn archetype:generate -DgroupId=com.alibaba.cloud.ai.example -DartifactId=vector-oceanbase-example -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false
   ```

2. Go to the project directory.

   ```shell
   cd vector-oceanbase-example
   ```

### Configure the `pom.xml` file

The `pom.xml` file is the core configuration file of the Maven project, used to manage project dependencies, plugins, configurations, and other information. To build the project, you need to modify the `pom.xml` file and add Spring AI Alibaba, seekdb vector storage, and other necessary dependencies.

Open the `pom.xml` file and replace the existing content with the following:

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>com.alibaba.cloud.ai.example</groupId>
        <artifactId>spring-ai-alibaba-vector-databases-example</artifactId>
        <version>1.0.0</version>
    </parent>

    <artifactId>vector-oceanbase-example</artifactId>

    <properties>
        <maven.compiler.source>17</maven.compiler.source>
        <maven.compiler.target>17</maven.compiler.target>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

    <dependencies>
        <!-- Alibaba Cloud AI starter -->
        <dependency>
            <groupId>com.alibaba.cloud.ai</groupId>
            <artifactId>spring-ai-alibaba-starter</artifactId>
        </dependency>

        <!-- Spring Boot Web support -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>

        <!-- Spring AI auto-configuration -->
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-spring-boot-autoconfigure</artifactId>
        </dependency>

        <!-- Spring JDBC support -->
        <dependency>
            <groupId>org.springframework</groupId>
            <artifactId>spring-jdbc</artifactId>
        </dependency>

        <!-- Transformers model support -->
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-transformers</artifactId>
        </dependency>

        <!-- OceanBase Vector Database starter -->
        <dependency>
            <groupId>com.alibaba.cloud.ai</groupId>
            <artifactId>spring-ai-alibaba-starter-oceanbase-store</artifactId>
            <version>1.0.0-M6.2-SNAPSHOT</version>
        </dependency>

        <!-- OceanBase JDBC driver -->
        <dependency>
            <groupId>com.oceanbase</groupId>
            <artifactId>oceanbase-client</artifactId>
            <version>2.4.14</version>
        </dependency>
    </dependencies>

    <!-- SNAPSHOT repository configuration -->
    <repositories>
        <repository>
            <id>sonatype-snapshots</id>
            <url>https://oss.sonatype.org/content/repositories/snapshots/</url>
            <releases>
                <enabled>false</enabled>
            </releases>
            <snapshots>
                <enabled>true</enabled>
            </snapshots>
        </repository>
    </repositories>
</project>
```
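
Optionally, verify that the dependencies resolve and the project compiles (note that the `spring-ai-alibaba-vector-databases-example` parent POM and the SNAPSHOT starter must be resolvable from your configured repositories):

```shell
mvn clean compile
```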

## Step 3: Configure the connection information of seekdb

This step configures the `application.yml` file to add the connection information of seekdb.

Create the `application.yml` file in the `src/main/resources` directory of the project and add the following content:

```yaml
server:
  port: 8080

spring:
  application:
    name: oceanbase-example
  ai:
    dashscope:
      api-key: ${DASHSCOPE_API_KEY} # Replace with your DashScope API key
    vectorstore:
      oceanbase:
        enabled: true
        url: jdbc:oceanbase://xxx:xxx/xxx # JDBC URL for connecting to seekdb
        username: xxx # Username of seekdb
        password: xxx # Password of seekdb
        tableName: vector_table # Name of the vector table (automatically created)
        defaultTopK: 2 # Default number of similar results to return
        defaultSimilarityThreshold: 0.8 # Similarity threshold (0-1; higher values require closer matches)
```

## Step 4: Create the main application class and controller

Create the startup class and controller class of the Spring Boot application to implement the data import and similarity search features.

### Create an application startup class

Create a file named `OceanBaseApplication.java` in the `src/main/java/com/alibaba/cloud/ai/example/vector` directory, and add the following code to the file:

```java
package com.alibaba.cloud.ai.example.vector; // The package name must be consistent with the directory structure.

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication // Enable Spring Boot auto-configuration.
public class OceanBaseApplication {
    public static void main(String[] args) {
        SpringApplication.run(OceanBaseApplication.class, args); // Start the Spring Boot application.
    }
}
```

The sample code creates the core startup class for the project, which is used to start the Spring Boot application.

### Create a vector storage controller

Create the `OceanBaseController.java` file in the `src/main/java/com/alibaba/cloud/ai/example/vector` directory and add the following code:

```java
package com.alibaba.cloud.ai.example.vector.controller; // The package name must be consistent with the directory structure.

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

import java.util.HashMap;
import java.util.List;
import java.util.Map;

import com.alibaba.cloud.ai.vectorstore.oceanbase.OceanBaseVectorStore;

@RestController // Mark the class as a REST controller.
@RequestMapping("/oceanbase") // Set the base path to /oceanbase.
public class OceanBaseController {

    private static final Logger logger = LoggerFactory.getLogger(OceanBaseController.class); // The logger.

    @Autowired // Automatically inject the seekdb vector store service.
    private OceanBaseVectorStore oceanBaseVectorStore;

    // The data import endpoint.
    @GetMapping("/import")
    public void importData() {
        logger.info("Start importing data");

        // Create sample metadata.
        HashMap<String, Object> map = new HashMap<>();
        map.put("id", "12345");
        map.put("year", "2025");
        map.put("name", "yingzi");

        // Create a list that contains three documents.
        List<Document> documents = List.of(
                new Document("The World is Big and Salvation Lurks Around the Corner"),
                new Document("You walk forward facing the past and you turn back toward the future.", Map.of("year", 2024)),
                new Document("Spring AI rocks!! Spring AI rocks!! Spring AI rocks!! Spring AI rocks!! Spring AI rocks!!", map)
        );

        // Add the documents to the vector store.
        oceanBaseVectorStore.add(documents);
    }

    // The similar document search endpoint.
    @GetMapping("/search")
    public List<Document> search() {
        logger.info("Start searching data");

        // Perform a similarity search for documents that contain "Spring" and return the two most similar results.
        return oceanBaseVectorStore.similaritySearch(SearchRequest.builder()
                .query("Spring")
                .topK(2)
                .build());
    }
}
```

## Step 5: Start and test the Maven project

### Start the project using an IDE

The following example shows how to start the project using IntelliJ IDEA:

1. Open the project by clicking **File** > **Open** and selecting `pom.xml`.
2. Select **Open as a project**.
3. Find the main class `OceanBaseApplication.java`.
4. Right-click it and select **Run 'OceanBaseApplication.main()'**.

### Test the project

1. Import the test data by visiting the following URL:

   ```shell
   http://localhost:8080/oceanbase/import
   ```

2. Perform vector search by visiting the following URL:

   ```shell
   http://localhost:8080/oceanbase/search
   ```
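
Equivalently, you can exercise both endpoints from the command line, assuming `curl` is installed:

```shell
curl http://localhost:8080/oceanbase/import
curl http://localhost:8080/oceanbase/search
```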

The expected result is as follows:

```json
[
    {
        "id": "03fe9aad-13cc-4d25-807b-ca1bc314f571",
        "text": "Spring AI rocks!! Spring AI rocks!! Spring AI rocks!! Spring AI rocks!! Spring AI rocks!!",
        "metadata": {
            "name": "yingzi",
            "id": "12345",
            "year": "2025",
            "distance": "7.274442499114312"
        }
    },
    {
        "id": "75864954-0a23-4fa1-8e18-b78fd870d474",
        "text": "Spring AI rocks!! Spring AI rocks!! Spring AI rocks!! Spring AI rocks!! Spring AI rocks!!",
        "metadata": {
            "name": "yingzi",
            "id": "12345",
            "year": "2025",
            "distance": "7.274442499114312"
        }
    }
]
```

## FAQ

### seekdb connection failure

* Cause: The URL, username, or password is incorrect.
* Solution: Check the seekdb configuration in `application.yml` and make sure the database service is running.

### Dependency conflict

* Cause: Conflicts between multiple Spring Boot versions.
* Solution: Run `mvn dependency:tree` to view the dependency tree and exclude the conflicting versions.

### SNAPSHOT dependency cannot be downloaded

* Cause: The SNAPSHOT repository is not configured.
* Solution: Make sure that the `sonatype-snapshots` repository is added in `pom.xml`.
---
sidebar_label: Dify
slug: /dify
---

# Integrate seekdb vector search with Dify

seekdb supports vector data storage, vector indexing, and embedding-based vector search. You can store vectorized data in seekdb for further search.

Dify is an open-source Large Language Model (LLM) application development platform. Combining Backend as a Service (BaaS) and LLMOps concepts, it enables developers to quickly build production-ready generative AI applications. Even non-technical users can participate in defining AI applications and managing data operations.

Dify includes essential technologies for building LLM applications: support for hundreds of models, an intuitive prompt orchestration interface, a high-quality RAG engine, a robust agent framework, flexible workflow orchestration, and user-friendly interfaces and APIs. This eliminates redundant development effort, enabling developers to focus on innovation and business needs.

This topic describes how to integrate the vector search capability of seekdb with Dify.

## Prerequisites

* Before deploying Dify, ensure that your machine meets the following minimum system requirements:

  * CPU: 2 cores
  * Memory: 4 GB

* This integration tutorial runs on Docker. Make sure that you have [set up Docker](https://docs.docker.com/get-started/get-docker/).

* You have deployed seekdb.

## Step 1: Obtain the database connection information

Contact the seekdb deployment personnel or administrator to obtain the database connection string. For example:

```sql
obclient -h$host -P$port -u$user_name -p$password -D$database_name
```

**Parameters:**

* `$host`: The IP address for connecting to the seekdb database.
* `$port`: The port for connecting to the seekdb database. The default value is `2881`.
* `$database_name`: The name of the database to access.

<main id="notice" type='notice'>
<h4>Notice</h4>
<p>The user connecting to the database must have the <code>CREATE</code>, <code>INSERT</code>, <code>DROP</code>, and <code>SELECT</code> privileges on the database.</p>
</main>

* `$user_name`: The database account.
* `$password`: The password for the account.

## Step 2: Deploy Dify

### Method 1

For Dify deployment, refer to [Deploy with Docker Compose](https://docs.dify.ai/getting-started/install-self-hosted/docker-compose) with these modifications (see the sketch after this list):

* Change the `VECTOR_STORE` variable value to `oceanbase` in the `.env` file.
* Start the services using `docker compose --profile oceanbase up -d`.
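
As a minimal sketch of the `.env` changes (the variable names follow Dify's `.env.example`; replace the values with the connection information from Step 1):

```shell
# In dify/docker/.env
VECTOR_STORE=oceanbase
OCEANBASE_VECTOR_HOST=127.0.0.1
OCEANBASE_VECTOR_PORT=2881
OCEANBASE_VECTOR_USER=root@test
OCEANBASE_VECTOR_PASSWORD=
OCEANBASE_VECTOR_DATABASE=test
```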

### Method 2

Alternatively, you can refer to [Dify on MySQL](https://github.com/oceanbase/dify-on-mysql) to quickly start the Dify service.

To start the service, run the following commands from a clone of the repository:

```shell
cd docker
bash setup-mysql-env.sh
docker compose up -d
```
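
With the default compose configuration, the Dify console is typically served on port 80; open `http://localhost/install` in your browser to complete the initial setup.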

## Step 3: Use Dify

For information about connecting LLMs in Dify, refer to [Model Configuration](https://docs.dify.ai/guides/model-configuration).
---
sidebar_label: n8n
slug: /n8n
---

# Integrate seekdb vector search with n8n

n8n is a workflow automation platform with native AI capabilities, providing technical teams with the flexibility of code and the speed of no-code. With over 400 integrations, native AI features, and a fair-code license, n8n lets you build robust automations while maintaining full control over your data and deployments.

This topic demonstrates how to build a "Chat to seekdb" workflow template using the powerful features of n8n.

## Prerequisites

* You have deployed seekdb.

* This integration tutorial runs in a Docker container. Make sure that you have [set up Docker](https://docs.docker.com/get-started/get-docker/).

## Step 1: Obtain the database connection information

Contact the seekdb deployment personnel or administrator to obtain the database connection string. For example:

```sql
obclient -h$host -P$port -u$user_name -p$password -D$database_name
```

**Parameters:**

* `$host`: The IP address for connecting to the seekdb database.
* `$port`: The port for connecting to the seekdb database. The default value is `2881`.
* `$database_name`: The name of the database to access.

<main id="notice" type='notice'>
<h4>Notice</h4>
<p>The user connecting to the database must have the <code>CREATE</code>, <code>INSERT</code>, <code>DROP</code>, and <code>SELECT</code> privileges on the database.</p>
</main>

* `$user_name`: The database account.
* `$password`: The password for the account.

## Step 2: Create a test table and insert data

Before you build the workflow, you need to create a sample table in seekdb to store book information and insert some sample data.

```sql
CREATE TABLE books (
    id VARCHAR(255) PRIMARY KEY,
    isbn13 VARCHAR(255),
    author TEXT,
    title VARCHAR(255),
    publisher VARCHAR(255),
    category TEXT,
    pages INT,
    price DECIMAL(10,2),
    format VARCHAR(50),
    rating DECIMAL(3,1),
    release_year YEAR
);

INSERT INTO books (
    id, isbn13, author, title, publisher, category, pages, price, format, rating, release_year
) VALUES (
    'database-internals',
    '978-1492040347',
    '"Alexander Petrov"',
    'Database Internals: A deep-dive into how distributed data systems work',
    'O\'Reilly',
    '["databases","information systems"]',
    350,
    47.28,
    'paperback',
    4.5,
    2019
);

INSERT INTO books (
    id, isbn13, author, title, publisher, category, pages, price, format, rating, release_year
) VALUES (
    'designing-data-intensive-applications',
    '978-1449373320',
    '"Martin Kleppmann"',
    'Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems',
    'O\'Reilly',
    '["databases"]',
    590,
    31.06,
    'paperback',
    4.4,
    2017
);

INSERT INTO books (
    id, isbn13, author, title, publisher, category, pages, price, format, rating, release_year
) VALUES (
    'kafka-the-definitive-guide',
    '978-1491936160',
    '["Neha Narkhede", "Gwen Shapira", "Todd Palino"]',
    'Kafka: The Definitive Guide: Real-time data and stream processing at scale',
    'O\'Reilly',
    '["databases"]',
    297,
    37.31,
    'paperback',
    3.9,
    2017
);

INSERT INTO books (
    id, isbn13, author, title, publisher, category, pages, price, format, rating, release_year
) VALUES (
    'effective-java',
    '978-1491936160',
    '"Joshua Block"',
    'Effective Java',
    'Addison-Wesley',
    '["programming languages", "java"]',
    412,
    27.91,
    'paperback',
    4.2,
    2017
);

INSERT INTO books (
    id, isbn13, author, title, publisher, category, pages, price, format, rating, release_year
) VALUES (
    'daemon',
    '978-1847249616',
    '"Daniel Suarez"',
    'Daemon',
    'Quercus',
    '["dystopia","novel"]',
    448,
    12.03,
    'paperback',
    4.0,
    2011
);

INSERT INTO books (
    id, isbn13, author, title, publisher, category, pages, price, format, rating, release_year
) VALUES (
    'cryptonomicon',
    '978-1847249616',
    '"Neal Stephenson"',
    'Cryptonomicon',
    'Avon',
    '["thriller", "novel"]',
    1152,
    6.99,
    'paperback',
    4.0,
    2002
);

INSERT INTO books (
    id, isbn13, author, title, publisher, category, pages, price, format, rating, release_year
) VALUES (
    'garbage-collection-handbook',
    '978-1420082791',
    '["Richard Jones", "Antony Hosking", "Eliot Moss"]',
    'The Garbage Collection Handbook: The Art of Automatic Memory Management',
    'Taylor & Francis',
    '["programming algorithms"]',
    511,
    87.85,
    'paperback',
    5.0,
    2011
);

INSERT INTO books (
    id, isbn13, author, title, publisher, category, pages, price, format, rating, release_year
) VALUES (
    'radical-candor',
    '978-1250258403',
    '"Kim Scott"',
    'Radical Candor: Be a Kick-Ass Boss Without Losing Your Humanity',
    'Macmillan',
    '["human resources","management", "new work"]',
    404,
    7.29,
    'paperback',
    4.0,
    2018
);
```
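
Optionally, run a quick check to confirm that the eight sample rows were inserted:

```sql
SELECT id, title, release_year FROM books ORDER BY release_year DESC;
```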

## Step 3: Deploy the tools

### Private deployment of n8n

n8n is a workflow automation platform based on Node.js. It provides extensive integration capabilities and flexible extensibility. By privately deploying n8n, you can better control the runtime environment of your workflows and ensure the security and privacy of your data. This section describes how to deploy n8n in a Docker environment.

```shell
sudo docker run -d --name n8n -p 5678:5678 -e N8N_SECURE_COOKIE=false n8nio/n8n
```
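
After the container starts, the n8n editor is available at `http://localhost:5678` (the port mapped in the command above).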

### Deploy the Qwen3 model using Ollama

Ollama is an open-source AI model server that supports the deployment and management of multiple AI models. With Ollama, you can easily deploy the Qwen3 model locally to enable AI Agent functionality. This section describes how to deploy Ollama in a Docker environment, and then use Ollama to deploy the Qwen3 model.

```shell
# Deploy Ollama in a Docker environment
sudo docker run -d -p 11434:11434 --name ollama ollama/ollama
# Deploy the Qwen3 model
sudo docker exec -it ollama sh -c 'ollama run qwen3:latest'
```
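
As an optional check that the Ollama API is reachable and the model was pulled, list the locally installed models:

```shell
curl http://localhost:11434/api/tags
```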

## Step 4: Build an AI Agent workflow

n8n provides a variety of nodes to build an AI Agent workflow. This section shows you how to build a "Chat to seekdb" workflow template. The workflow consists of five nodes, and the steps are as follows:

1. Add a trigger.

   Add an HTTP trigger node to receive HTTP requests.

   

2. Add an AI Agent node.

   Add an AI Agent node to process AI Agent requests.

   

3. Add an Ollama Chat Model node.

   Select a free Ollama chat model, such as Qwen3, and configure the Ollama account.

   

   

4. Add a Simple Memory node.

   The Simple Memory node provides short-term memory and can remember the previous five interactions in the chat.

   

5. Add a Tool node.

   The Tool node is used to perform database operations in seekdb. Add a MySQL tool under the AI Agent node's **Tool** connector.

   

   Configure the MySQL tool as follows:

   

   Click the edit icon shown in the preceding figure to configure the MySQL connection information.

   

   After the configuration is completed, close the window. Click **Test step** in the upper-right corner of the configuration panel to test the database connection with a simple SQL statement, or click **Back to canvas** to return to the main interface.

6. Save the workflow.

   After all five nodes are configured, click **Save** to complete the workflow construction. You can then test the workflow.

## Workflow demo

The completed workflow is displayed as follows:

