---
sidebar_label: LangChain
slug: /langchain
---

# Integrate seekdb vector search with LangChain

seekdb supports vector data storage, vector indexing, and embedding-based vector search. You can store vectorized data in seekdb and search it later.

LangChain is a framework for developing applications driven by language models. It gives an application the following capabilities:

* Context awareness: The application can connect language models to sources of context, such as prompt instructions, few-shot examples, and content to ground its responses in.
* Reasoning: The application can rely on language models to reason. For example, it can decide how to answer a question or which actions to take based on the provided context.

This topic describes how to integrate the [vector search feature](../../200.develop/100.vector-search/100.vector-search-overview/100.vector-search-intro.md) of seekdb with the [Tongyi Qianwen (Qwen) API](https://www.alibabacloud.com/en/solutions/generative-ai/qwen?_p_lc=1) and [LangChain](https://python.langchain.com/) for Document Question Answering (DQA).

## Prerequisites

* You have deployed seekdb.
* Your environment has a database and an account with read and write privileges.
* You have installed Python 3.9 or later.
* You have installed the required dependencies:

    ```shell
    python3 -m pip install -U langchain-oceanbase
    python3 -m pip install langchain_community
    python3 -m pip install dashscope
    ```

* You can set the `ob_vector_memory_limit_percentage` parameter to enable vector search. We recommend keeping the default value of `0` (adaptive mode). For more precise settings, see the relevant configuration documentation.

## Step 1: Obtain the database connection information

Contact the seekdb deployment personnel or administrator to obtain the database connection string. For example:

```shell
obclient -h$host -P$port -u$user_name -p$password -D$database_name
```

**Parameters:**

* `$host`: The IP address for connecting to the seekdb database.
* `$port`: The port for connecting to the seekdb database. The default value is `2881`, which can be customized during deployment.
* `$database_name`: The name of the database to access.

    <main id="notice" type='notice'>
    <h4>Notice</h4>
    <p>The user connecting to the database must have the <code>CREATE</code>, <code>INSERT</code>, <code>DROP</code>, and <code>SELECT</code> privileges on the database.</p>
    </main>

* `$user_name`: The database account, in the format of `username`.
* `$password`: The password for the account.

For more information about the connection string, see [Connect to OceanBase Database by using OBClient](https://en.oceanbase.com/docs/common-oceanbase-database-10000000001971649).

## Step 2: Build your AI assistant

### Set the environment variable for the Qwen API key

Create a [Qwen API key](https://www.alibabacloud.com/help/en/model-studio/get-api-key?spm=a2c63.l28256.help-menu-2400256.d_2.47db1b76nM44Ut) and [configure it in the environment variables](https://www.alibabacloud.com/help/en/model-studio/configure-api-key-through-environment-variables?spm=a2c63.p38356.help-menu-2400256.d_2_0_1.56069f6b3m576u).

```shell
export DASHSCOPE_API_KEY="YOUR_DASHSCOPE_API_KEY"
```

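Before running the Python snippets in this topic, it can help to confirm that the key is actually visible to the Python process. The following is a minimal sketch that uses only the standard library. The helper name `require_api_key` is hypothetical and is not part of DashScope or LangChain.

```python
import os


def require_api_key(name: str = "DASHSCOPE_API_KEY") -> str:
    """Return the API key stored in the environment variable `name`.

    Raises RuntimeError if the variable is unset or empty, so that a missing
    key is reported here rather than as an authentication error later on.
    (Hypothetical helper, shown for illustration only.)
    """
    key = os.environ.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Environment variable {name} is not set")
    return key
```

Calling `require_api_key()` at the top of a script and passing the result as `dashscope_api_key` is one way to fail early when the variable was exported in a different shell session.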
### Load and split the documents

Download the sample data and split it into chunks of approximately 1000 characters using the `CharacterTextSplitter` class.

```python
from langchain_community.document_loaders import TextLoader
from langchain_community.embeddings import DashScopeEmbeddings
from langchain_text_splitters import CharacterTextSplitter
from langchain_oceanbase.vectorstores import OceanbaseVectorStore
import os
import requests

# Read the API key configured in the previous step.
DASHSCOPE_API = os.environ.get("DASHSCOPE_API_KEY", "")
embeddings = DashScopeEmbeddings(
    model="text-embedding-v1", dashscope_api_key=DASHSCOPE_API
)

# Download the sample document and save it locally.
url = "https://raw.githubusercontent.com/GITHUBear/langchain/refs/heads/master/docs/docs/how_to/state_of_the_union.txt"
res = requests.get(url)
res.raise_for_status()
with open("state_of_the_union.txt", "w", encoding="utf-8") as f:
    f.write(res.text)

loader = TextLoader('./state_of_the_union.txt', encoding="utf-8")

# Load the file and split it into chunks of about 1000 characters.
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
```

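To make "approximately 1000 characters" more concrete, the sketch below shows the general idea of separator-based chunking: split the text on a separator, then greedily merge neighboring pieces while the result stays within the size limit. This is a simplified illustration and not LangChain's implementation. The function name `split_by_separator` is hypothetical, and using a blank line as the separator is an assumption made for the example.

```python
def split_by_separator(text: str, chunk_size: int = 1000, separator: str = "\n\n") -> list[str]:
    """Greedily merge separator-delimited pieces into chunks of at most
    `chunk_size` characters. A single piece longer than `chunk_size` is kept
    whole, which is why chunk sizes are only approximate.
    (Simplified illustration, not the LangChain implementation.)
    """
    chunks: list[str] = []
    current = ""
    for piece in text.split(separator):
        if not piece:
            continue  # skip empty pieces produced by repeated separators
        candidate = piece if not current else current + separator + piece
        if not current or len(candidate) <= chunk_size:
            current = candidate  # the piece still fits, or it starts a new chunk
        else:
            chunks.append(current)  # flush the full chunk and start a new one
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "aaaa\n\nbbbb\n\ncccccccccccc\n\ndd"
print(split_by_separator(sample, chunk_size=10))
# ['aaaa\n\nbbbb', 'cccccccccccc', 'dd']
```

In the sample, the first two paragraphs fit into one 10-character chunk, while the 12-character paragraph exceeds the limit on its own and is therefore emitted as an oversized chunk.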
### Insert the data into seekdb

Create an `OceanbaseVectorStore` with the connection information from Step 1, and then insert the document chunks.

```python
# Replace these values with the connection information from Step 1.
connection_args = {
    "host": "127.0.0.1",
    "port": "2881",
    "user": "root@user_name",
    "password": "",
    "db_name": "test",
}
DEMO_TABLE_NAME = "demo_ann"
ob = OceanbaseVectorStore(
    embedding_function=embeddings,
    table_name=DEMO_TABLE_NAME,
    connection_args=connection_args,
    drop_old=True,  # drop the existing table with the same name, if any
    normalize=True,
)
# Embed the chunks and write them to the table.
res = ob.add_documents(documents=docs)
```

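Instead of hard-coding credentials, the connection values from Step 1 could be read from environment variables. The following is a minimal sketch under that assumption. The helper `connection_args_from_env` and the variable names (`SEEKDB_HOST`, `SEEKDB_PORT`, `SEEKDB_USER`, `SEEKDB_PASSWORD`, `SEEKDB_DB_NAME`) are hypothetical and are not defined by seekdb or `langchain-oceanbase`. The fallback values are illustrative only.

```python
import os


def connection_args_from_env(prefix: str = "SEEKDB_") -> dict:
    """Build a `connection_args` dictionary from environment variables.

    The variable names and fallback values are assumptions made for this
    example; adjust them to match your deployment.
    """
    return {
        "host": os.environ.get(prefix + "HOST", "127.0.0.1"),
        "port": os.environ.get(prefix + "PORT", "2881"),
        "user": os.environ.get(prefix + "USER", "root"),
        "password": os.environ.get(prefix + "PASSWORD", ""),
        "db_name": os.environ.get(prefix + "DB_NAME", "test"),
    }
```

The returned dictionary has the same keys as the `connection_args` literal used in this topic, so it can be passed to `OceanbaseVectorStore` in its place.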
### Vector search

This step searches the document `state_of_the_union.txt` for the query `"What did the president say about Ketanji Brown Jackson"` and prints the three closest chunks with their scores.

```python
query = "What did the president say about Ketanji Brown Jackson"
docs_with_score = ob.similarity_search_with_score(query, k=3)

# Print each returned chunk together with its score.
for doc, score in docs_with_score:
    print("-" * 80)
    print("Score: ", score)
    print(doc.page_content)
    print("-" * 80)
```

Expected output:

```shell
--------------------------------------------------------------------------------
Score: 1.204783671324283
Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections.

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Score: 1.2146663629717394
It is going to transform America and put us on a path to win the economic competition of the 21st Century that we face with the rest of the world—particularly with China.

As I’ve told Xi Jinping, it is never a good bet to bet against the American people.

We’ll create good jobs for millions of Americans, modernizing roads, airports, ports, and waterways all across America.

And we’ll do it all to withstand the devastating effects of the climate crisis and promote environmental justice.

We’ll build a national network of 500,000 electric vehicle charging stations, begin to replace poisonous lead pipes—so every child—and every American—has clean water to drink at home and at school, provide affordable high-speed internet for every American—urban, suburban, rural, and tribal communities.

4,000 projects have already been announced.

And tonight, I’m announcing that this year we will start fixing over 65,000 miles of highway and 1,500 bridges in disrepair.
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Score: 1.2193955178945004
Vice President Harris and I ran for office with a new economic vision for America.

Invest in America. Educate Americans. Grow the workforce. Build the economy from the bottom up
and the middle out, not from the top down.

Because we know that when the middle class grows, the poor have a ladder up and the wealthy do very well.

America used to have the best roads, bridges, and airports on Earth.

Now our infrastructure is ranked 13th in the world.

We won’t be able to compete for the jobs of the 21st Century if we don’t fix that.

That’s why it was so important to pass the Bipartisan Infrastructure Law—the most sweeping investment to rebuild America in history.

This was a bipartisan effort, and I want to thank the members of both parties who worked to make it happen.

We’re done talking about infrastructure weeks.

We’re going to have an infrastructure decade.
--------------------------------------------------------------------------------
```
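
In this output, the most relevant chunk has the lowest score, which suggests that the score is a distance rather than a similarity. If the scores are Euclidean (L2) distances between unit-length vectors, the corresponding cosine similarity follows from the identity d² = 2 − 2·cos. This is an assumption based on `normalize=True`; the metric actually used depends on how the vector store is configured. The sketch below applies the identity. The function name `l2_to_cosine` is hypothetical.

```python
import math


def l2_to_cosine(distance: float) -> float:
    """Convert the L2 distance between two unit-length vectors to their
    cosine similarity, using d^2 = 2 - 2 * cos.
    (Valid only under the assumption that both vectors are normalized.)
    """
    return 1.0 - (distance ** 2) / 2.0


# Check the identity on two hand-picked unit vectors.
a = (1.0, 0.0)
b = (0.6, 0.8)
distance = math.dist(a, b)                   # Euclidean distance
cosine = sum(x * y for x, y in zip(a, b))    # dot product of unit vectors
assert math.isclose(l2_to_cosine(distance), cosine)
```

Under this assumption, the top score of about 1.205 would correspond to a cosine similarity of roughly 0.27.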