---
sidebar_label: LlamaIndex
slug: /llamaindex
---
# Integrate seekdb vector search with LlamaIndex
seekdb supports vector data storage, vector indexing, and embedding-based vector search. You can store vectorized data in seekdb for further search.
LlamaIndex is a framework for building context-augmented generative AI applications with large language models (LLMs), including agents and workflows. It provides a wealth of capabilities such as data connectors, data indexes, agents, observability/evaluation integrations, and workflows.
This topic demonstrates how to integrate the vector search feature of seekdb with the Tongyi Qianwen (Qwen) API and LlamaIndex for Document Question Answering (DQA).
## Prerequisites
* You have deployed the seekdb database.
* Your environment has a database and account with read and write privileges.
* Vector search is controlled by the `ob_vector_memory_limit_percentage` parameter. We recommend keeping its default value of `0` (adaptive mode). For more precise tuning, see the relevant configuration documentation and the snippet after this list.
* You have installed Python 3.9 or later.
* You have installed the required dependencies:
```shell
python3 -m pip install llama-index-vector-stores-oceanbase llama-index
python3 -m pip install llama-index-embeddings-dashscope
python3 -m pip install llama-index-llms-dashscope
```
* You have obtained the Qwen API key.
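If you want to verify or adjust the vector memory parameter mentioned in the prerequisites, the following is a minimal sketch, assuming an OceanBase-compatible SQL interface; the `30` percent cap is an illustrative value only:
```sql
-- Check the current value (0 means adaptive mode).
SHOW PARAMETERS LIKE 'ob_vector_memory_limit_percentage';

-- Illustrative only: cap vector index memory at 30% of available memory.
ALTER SYSTEM SET ob_vector_memory_limit_percentage = 30;
```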
## Step 1: Obtain the database connection information
Contact the seekdb database deployment personnel or administrator to obtain the database connection string. For example:
```shell
obclient -h$host -P$port -u$user_name -p$password -D$database_name
```
**Parameters:**
* `$host`: The IP address for connecting to the seekdb database.
* `$port`: The port for connecting to the seekdb database. The default value is `2881` and can be customized during deployment.
* `$user_name`: The database account, in the format `username`.
* `$password`: The password of the account.
* `$database_name`: The name of the database to access.

<main id="notice" type='notice'>
<h4>Notice</h4>
<p>The user connecting to the database must have the <code>CREATE</code>, <code>INSERT</code>, <code>DROP</code>, and <code>SELECT</code> privileges on the database.</p>
</main>
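As a sketch of the privilege requirement in the notice above, assuming a hypothetical account `llamaindex_user` and a database named `test`, the grant could look like this:
```sql
-- Hypothetical account and database names; adjust to your deployment.
GRANT CREATE, INSERT, DROP, SELECT ON test.* TO llamaindex_user;
```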
For more information about the connection string, see [Connect to OceanBase Database by using OBClient](https://en.oceanbase.com/docs/common-oceanbase-database-10000000001971649).
## Step 2: Build your AI assistant
### Set the environment variable for the Qwen API key
Create a [Qwen API key](https://www.alibabacloud.com/help/en/model-studio/get-api-key?spm=a2c63.l28256.help-menu-2400256.d_2.47db1b76nM44Ut) and [configure it in the environment variables](https://www.alibabacloud.com/help/en/model-studio/configure-api-key-through-environment-variables?spm=a2c63.p38356.help-menu-2400256.d_2_0_1.56069f6b3m576u).
```shell
export DASHSCOPE_API_KEY="YOUR_DASHSCOPE_API_KEY"
```
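Before moving on, you can confirm the key is visible to your Python process; this quick check uses only the standard library:
```python
import os

# Fail fast if the key was not exported in the current shell session.
if not os.environ.get("DASHSCOPE_API_KEY"):
    raise RuntimeError("DASHSCOPE_API_KEY is not set; export it first.")
```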
### Download the sample data
```shell
mkdir -p '/root/llamaindex/paul_graham/'
wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O '/root/llamaindex/paul_graham/paul_graham_essay.txt'
```
### Load and index the data
```python
import os

from llama_index.core import (
    Settings,
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
)
from llama_index.embeddings.dashscope import DashScopeEmbedding
from llama_index.llms.dashscope import DashScope, DashScopeGenerationModels
from llama_index.vector_stores.oceanbase import OceanBaseVectorStore
from pyobvector import ObVecClient

# Connect to seekdb. Replace the URI, user, and password with your own values.
client = ObVecClient(uri="127.0.0.1:2881", user="root@test", password="", db_name="test")

# Use the DashScope embedding model globally.
Settings.embed_model = DashScopeEmbedding()

# Configure the Qwen LLM.
dashscope_llm = DashScope(
    model_name=DashScopeGenerationModels.QWEN_MAX,
    api_key=os.environ.get("DASHSCOPE_API_KEY", ""),
)

# Load the sample document.
documents = SimpleDirectoryReader("/root/llamaindex/paul_graham/").load_data()

# Store the embeddings in seekdb. drop_old=True recreates the table on each run.
oceanbase = OceanBaseVectorStore(
    client=client,
    dim=1536,
    drop_old=True,
    normalize=True,
)

storage_context = StorageContext.from_defaults(vector_store=oceanbase)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)
```
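On later runs there is no need to re-embed the documents; note that `drop_old=True` above recreates the table on every run. As a sketch, with `drop_old` left as `False` you can rebuild the index object directly from the existing store:
```python
# Reconnect to embeddings already stored in seekdb (no re-ingestion).
oceanbase = OceanBaseVectorStore(
    client=client,
    dim=1536,
    normalize=True,
)
index = VectorStoreIndex.from_vector_store(vector_store=oceanbase)
```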
## Step 3: Perform a vector search
This step queries `"What did the author do growing up?"` against the ingested document `paul_graham_essay.txt`.
```python
# Set logging to DEBUG for more detailed outputs.
query_engine = index.as_query_engine(llm=dashscope_llm)
res = query_engine.query("What did the author do growing up?")
res.response
```
Expected result:
```python
'Growing up, the author worked on writing and programming outside of school. In terms of writing, he wrote short stories, which he now considers to be awful, as they had very little plot and focused mainly on characters with strong feelings. For programming, he started in 9th grade by trying to write programs on an IBM 1401 at his school, using an early version of Fortran. Later, after getting a TRS-80 microcomputer, he began to write more practical programs, including simple games, a program to predict the flight height of model rockets, and a word processor that his father used for writing.'
```
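To see which chunks the answer was grounded in, the response object exposes the retrieved nodes and their similarity scores; a short inspection sketch:
```python
# Print each retrieved chunk's similarity score and a content preview.
for node_with_score in res.source_nodes:
    print("score:", node_with_score.score)
    print(node_with_score.node.get_content()[:200], "...")
```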