---
sidebar_label: OpenAI
slug: /openai
---
# OpenAI
OpenAI is an artificial intelligence company that has developed several large language models. These models excel at understanding and generating natural language, making them highly effective for tasks such as text generation, answering questions, and engaging in conversations. Access to these models is available through an API.
seekdb offers features such as vector storage, vector indexing, and embedding-based vector search. By using OpenAI's API, you can convert data into vectors, store these vectors in seekdb, and then take advantage of seekdb's vector search capabilities to find relevant data.
## Prerequisites
* You have deployed seekdb.
* You have an existing MySQL database and account available in your environment, and the database account has been granted read and write privileges.
* You have installed [Python 3.9 or later](https://www.python.org/downloads/) and [pip](https://pip.pypa.io/en/stable/installation/).
* You have installed [Poetry](https://python-poetry.org/docs/), [Pyobvector](https://github.com/oceanbase/pyobvector), and OpenAI SDK. The installation commands are as follows:
```shell
python3 pip install poetry
python3 pip install pyobvector
python3 pip install openai
```
* You have obtained an [OpenAI API key](https://platform.openai.com/api-keys).
## Step 1: Obtain the connection string of seekdb
Contact the seekdb deployment engineer or administrator to obtain the connection string of seekdb, for example:
```sql
obclient -h$host -P$port -u$user_name -p$password -D$database_name
```
**Parameters:**
* `$host`: The IP address for connecting to seekdb.
* `$port`: The port number for connecting to seekdb. Default is `2881`.
* `$database_name`: The name of the database to be accessed.
:::tip
The user for connection must have the CREATE, INSERT, DROP, and SELECT privileges on the database.
:::
* `$user_name`: The database account.
* `$password`: The password of the account.
**Here is an example:**
```shell
obclient -hxxx.xxx.xxx.xxx -P2881 -utest_user001 -p****** -Dtest
```
## Step 2: Register an LLM account
Obtain an OpenAI API key.
1. Log in to the [OpenAI](https://platform.openai.com/) platform.
2. Click **API Keys** in the upper-right corner.
3. Click **Create API Key**.
4. Specify the required information and click **Create API Key**.
Specify the API key for the relevant environment variable.
* For a Unix-based system such as Ubuntu or macOS, you can run the following command in a terminal:
```shell
export OPENAI_API_KEY='your-api-key'
```
* For a Windows system, you can run the following command in Command Prompt:
```shell
set OPENAI_API_KEY=your-api-key
```
You must replace `your-api-key` with the actual OpenAI API key.
## Step 3: Store vector data in seekdb
### Store vector data in seekdb
1. Prepare test data.
Download the [CSV file](https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20240827/srxyhu/fine_food_reviews.csv) that already contains the vectorized data. This CSV file includes 1,000 food review entries, and the last column contains the vector values. Therefore, you do not need to calculate the vectors yourself. If you want to recalculate the embeddings for the "embedding" column (the vector column), you can use the following code to generate a new CSV file:
```shell
from openai import OpenAI
import pandas as pd
input_datapath = "./fine_food_reviews.csv"
client = OpenAI()
# Here the text-embedding-ada-002 model is used. You can change the model as needed.
def embedding_text(text, model="text-embedding-ada-002"):
# For more information about how to create embedding vectors, see https://community.openai.com/t/embeddings-api-documentation-needs-to-updated/475663.
res = client.embeddings.create(input=text, model=model)
return res.data[0].embedding
df = pd.read_csv(input_datapath, index_col=0)
# It takes a few minutes to generate the CSV file by calling the OpenAI Embedding API row by row.
df["embedding"] = df.combined.apply(embedding_text)
output_datapath = './fine_food_reviews_self_embeddings.csv'
df.to_csv(output_datapath)
```
2. Run the following script to insert the test data into seekdb. The script must be located in the same directory as the test data.
```shell
import os
import sys
import csv
import json
from pyobvector import *
from sqlalchemy import Column, Integer, String
# Connect to seekdb by using pyobvector and replace the at (@) sign in the username and password with %40, if any.
client = ObVecClient(uri="host:port", user="username",password="****",db_name="test")
# The test dataset has been vectorized and is stored in the same directory as the Python script by default. If you vectorize the dataset again, specify the new file.
file_name = "fine_food_reviews.csv"
file_path = os.path.join("./", file_name)
# Define columns. The last column is a vector column.
cols = [
Column('id', Integer, primary_key=True, autoincrement=False),
Column('product_id', String(256), nullable=True),
Column('user_id', String(256), nullable=True),
Column('score', Integer, nullable=True),
Column('summary', String(2048), nullable=True),
Column('text', String(8192), nullable=True),
Column('combined', String(8192), nullable=True),
Column('n_tokens', Integer, nullable=True),
Column('embedding', VECTOR(1536))
]
# Define the table name.
table_name = 'fine_food_reviews'
# If the table does not exist, create it.
if not client.check_table_exists(table_name):
client.create_table(table_name,columns=cols)
# Create an index on the vector column.
client.create_index(
table_name=table_name,
is_vec_index=True,
index_name='vidx',
column_names=['embedding'],
vidx_params='distance=l2, type=hnsw, lib=vsag',
)
# Open and read the CSV file.
with open(file_name, mode='r', newline='', encoding='utf-8') as csvfile:
csvreader = csv.reader(csvfile)
# Read the header line.
headers = next(csvreader)
print("Headers:", headers)
batch = [] # Store data by inserting 10 rows into the database each time.
for i, row in enumerate(csvreader):
# The CSV file contains nine columns: `id`, `product_id`, `user_id`, `score`, `summary`, `text`, `combined`, `n_tokens`, and `embedding`.
if not row:
break
food_review_line= {'id':row[0],'product_id':row[1],'user_id':row[2],'score':row[3],'summary':row[4],'text':row[5],\
'combined':row[6],'n_tokens':row[7],'embedding':json.loads(row[8])}
batch.append(food_review_line)
# Insert 10 rows each time.
if (i + 1) % 10 == 0:
client.insert(table_name,batch)
batch = [] # Clear the cache.
# Insert the rest rows, if any.
if batch:
client.insert(table_name,batch)
# Check the data in the table and make sure that all data has been inserted.
count_sql = f"select count(*) from {table_name};"
cursor = client.perform_raw_text_sql(count_sql)
result = cursor.fetchone()
print(f"Total number of inserted rows:{result[0]}")
```
### Query seekdb data
1. Save the following Python script and name it as `openAIQuery.py`.
```shell
import os
import sys
import csv
import json
from pyobvector import *
from sqlalchemy import func
from openai import OpenAI
# Obtain command-line options.
if len(sys.argv) != 2:
print("Enter a query statement." )
sys.exit()
queryStatement = sys.argv[1]
# Connect to seekdb by using pyobvector and replace the at (@) sign in the username and password with %40, if any.
client = ObVecClient(uri="host:port", user="usename",password="****",db_name="test")
openAIclient = OpenAI()
# Define the function for generating text vectors.
def generate_embeddings(text, model="text-embedding-ada-002"):
# For more information about how to create embedding vectors, see https://community.openai.com/t/embeddings-api-documentation-needs-to-updated/475663.
res = openAIclient.embeddings.create(input=text, model=model)
return res.data[0].embedding
def query_ob(query, tableName, vector_name="embedding", top_k=1):
embedding = generate_embeddings(query)
# Perform an approximate nearest neighbor search (ANNS).
res = client.ann_search(
table_name=tableName,
vec_data=embedding,
vec_column_name=vector_name,
distance_func=func.l2_distance,
topk=top_k,
output_column_names=['combined']
)
for row in res:
print(str(row[0]).replace("Title: ", "").replace("; Content: ", ": "))
# Specify the table name.
table_name = 'fine_food_reviews'
query_ob(queryStatement,table_name,'embedding',1)
```
2. Enter a question for an answer.
```shell
python3 openAIQuery.py 'pet food'
```
The expected result is as follows:
```shell
Crack for dogs.: These thing are like crack for dogs. I am not sure of the make-up but the doggies sure love them.
```