Initial commit
This commit is contained in:
@@ -0,0 +1,520 @@
|
||||
---
|
||||
|
||||
slug: /using-seekdb-in-python-mode
|
||||
---
|
||||
|
||||
# Experience embedded seekdb
|
||||
|
||||
seekdb provides an embedded product form that can be integrated into user applications as a library, offering developers a more powerful and flexible data management solution. This enables data management everywhere (microcontrollers, IoT devices, edge computing, mobile applications, data centers, etc.), allowing users to quickly get started with seekdb's All-in-one (TP, AP, AI Native) capabilities.
|
||||
|
||||

|
||||
|
||||
## Installation and configuration
|
||||
|
||||
### Environment requirements
|
||||
|
||||
* Supported operating systems: Linux (glibc >= 2.28)
|
||||
|
||||
* Supported Python versions: CPython 3.8 ~ 3.14
|
||||
|
||||
* Supported system architectures: x86_64, aarch64
|
||||
|
||||
You can run the following command to check whether your environment meets the requirements.
|
||||
|
||||
```python
|
||||
python -c 'import sys;import platform; print(f"Python: {platform.python_implementation()} {platform.python_version()}, System: {platform.system()} {platform.machine()}, {platform.libc_ver()[0]}: {platform.libc_ver()[1]}");'
|
||||
```
|
||||
|
||||
The output should be like this:
|
||||
|
||||
```python
|
||||
Python: CPython 3.8.17, System: Linux x86_64, glibc: 2.32
|
||||
```
|
||||
|
||||
### Installation
|
||||
|
||||
Use pip to install. It automatically detects the default Python version and platform.
|
||||
|
||||
```python
|
||||
pip install pylibseekdb
|
||||
# Or specify a mirror source for faster installation
|
||||
pip install pylibseekdb -i https://pypi.tuna.tsinghua.edu.cn/simple
|
||||
```
|
||||
|
||||
If your pip version is low, upgrade pip first before installation:
|
||||
|
||||
```bash
|
||||
pip install --upgrade pip
|
||||
```
|
||||
|
||||
## Experience seekdb
|
||||
|
||||
After completing the installation of seekdb, you can start experiencing seekdb.
|
||||
|
||||
#### Considerations
|
||||
|
||||
* Multi-statement queries are not supported. By default, only the first statement is executed. For example:
|
||||
|
||||
```python
|
||||
cur.execute("insert into t1 values(100);insert into t1 values(200)")
|
||||
```
|
||||
|
||||
* The tmpfs file system cannot be used as the database directory.
|
||||
* Only streaming query mode is supported. For example:
|
||||
|
||||
```python
|
||||
cur = con.cursor()
|
||||
cur.execute("select * from t1")
|
||||
cur.fetchall()
|
||||
```
|
||||
|
||||
* The execute method does not support parameterization.
|
||||
|
||||
#### Experience basic seekdb operations
|
||||
|
||||
The following examples demonstrate some basic operations of seekdb. You can create databases, connect to databases, create tables, write and query data, and more.
|
||||
|
||||
:::info
|
||||
For detailed information about seekdb SQL syntax, see <a href="https://en.oceanbase.com/docs/common-oceanbase-database-10000000001972315">SQL syntax</a>.
|
||||
:::
|
||||
|
||||
seekdb provides the `test` database by default. The following example demonstrates how to open and connect to the `test` database using default parameters, and how to create tables, write data, commit transactions, query data, and safely close the database.
|
||||
|
||||
```python
|
||||
import pylibseekdb
|
||||
|
||||
# Open the database directory seekdb by default
|
||||
pylibseekdb.open()
|
||||
# Connect to the test database by default
|
||||
conn = pylibseekdb.connect()
|
||||
# Create a cursor for data operations
|
||||
cursor = conn.cursor()
|
||||
|
||||
# Execute table creation statement
|
||||
cursor.execute("create table t1(c1 int primary key, c2 int)")
|
||||
# Execute data insertion
|
||||
cursor.execute("insert into t1 values(1, 100)")
|
||||
cursor.execute("insert into t1 values(2, 200)")
|
||||
# Manually commit the transaction
|
||||
conn.commit()
|
||||
|
||||
# Execute query
|
||||
cursor.execute("select * from t1")
|
||||
# Fetch results
|
||||
print(cursor.fetchall())
|
||||
|
||||
# Close connections
|
||||
cursor.close()
|
||||
conn.close()
|
||||
```
|
||||
|
||||
You can also manually specify the database directory and create and use a new database.
|
||||
|
||||
```python
|
||||
import pylibseekdb
|
||||
|
||||
# Specify the database directory
|
||||
pylibseekdb.open("mydb")
|
||||
# Do not connect to any database
|
||||
conn = pylibseekdb.connect("")
|
||||
# Create a cursor for data operations
|
||||
cursor = conn.cursor()
|
||||
# Manually create a database
|
||||
cursor.execute("create database db1")
|
||||
# Use the newly created database
|
||||
cursor.execute("use db1")
|
||||
|
||||
# Close connections
|
||||
cursor.close()
|
||||
conn.close()
|
||||
```
|
||||
|
||||
The following example demonstrates how to enable autocommit mode for transactions.
|
||||
|
||||
```python
|
||||
import pylibseekdb
|
||||
|
||||
# Specify the database directory
|
||||
pylibseekdb.open("seekdb")
|
||||
# Connect to the test database
|
||||
conn = pylibseekdb.connect(database="test", autocommit=True)
|
||||
# Create a cursor for data operations
|
||||
cursor = conn.cursor()
|
||||
|
||||
# Execute table creation statement
|
||||
cursor.execute("create table t2(c1 int primary key, c2 int)")
|
||||
# Execute data insertion, transaction is automatically committed
|
||||
cursor.execute("insert into t2 values(1, 100)")
|
||||
# Execute data insertion, transaction is automatically committed
|
||||
cursor.execute("insert into t2 values(2, 200)")
|
||||
|
||||
# Query data using a new connection
|
||||
conn2 = pylibseekdb.connect("test")
|
||||
cursor2=conn2.cursor()
|
||||
cursor2.execute("select * from t2")
|
||||
# View data row by row
|
||||
print(cursor2.fetchone())
|
||||
print(cursor2.fetchone())
|
||||
|
||||
# Close connections
|
||||
cursor.close()
|
||||
conn.close()
|
||||
cursor2.close()
|
||||
conn2.close()
|
||||
```
|
||||
|
||||
### Experience AI Native
|
||||
|
||||
#### Experience vector search
|
||||
|
||||
seekdb supports up to 16,000 dimensions of float-type dense vectors, sparse vectors, and various types of vector distance calculations such as Manhattan distance, Euclidean distance, inner product, and cosine distance. It supports creating vector indexes based on HNSW/IVF, and supports incremental updates and deletions without affecting recall.
|
||||
|
||||
:::info
|
||||
|
||||
For more detailed information about seekdb vector search, see <a href="https://en.oceanbase.com/docs/common-oceanbase-database-10000000001976351">Vector search</a>.
|
||||
:::
|
||||
|
||||
The following example demonstrates how to use vector search in seekdb.
|
||||
|
||||
```python
|
||||
import pylibseekdb
|
||||
|
||||
pylibseekdb.open("seekdb")
|
||||
conn = pylibseekdb.connect("test")
|
||||
cursor = conn.cursor()
|
||||
|
||||
# Create a table with a vector index
|
||||
cursor.execute("create table test_vector(c1 int primary key, c2 vector(2), vector index idx1(c2) with (distance=l2, type=hnsw, lib=vsag))")
|
||||
|
||||
# Insert data
|
||||
cursor.execute("insert into test_vector values(1, [1, 1])")
|
||||
cursor.execute("insert into test_vector values(2, [1, 2])")
|
||||
cursor.execute("insert into test_vector values(3, [1, 3])")
|
||||
conn.commit()
|
||||
|
||||
# Execute vector search
|
||||
cursor.execute("SELECT c1,c2 FROM test_vector ORDER BY l2_distance(c2, '[1, 2.5]') APPROXIMATE LIMIT 2;")
|
||||
# Fetch results
|
||||
print(cursor.fetchall())
|
||||
|
||||
# Close connections
|
||||
cursor.close()
|
||||
conn.close()
|
||||
```
|
||||
|
||||
#### Experience full-text search
|
||||
|
||||
seekdb provides full-text indexing capabilities. By building full-text indexes, you can comprehensively index entire documents or large text content, significantly improving query performance when dealing with large-scale text data and complex search requirements, enabling users to obtain the required information more efficiently.
|
||||
|
||||
The following example demonstrates how to use seekdb's full-text search feature.
|
||||
|
||||
```python
|
||||
import pylibseekdb
|
||||
|
||||
pylibseekdb.open("seekdb")
|
||||
conn = pylibseekdb.connect("test")
|
||||
cursor = conn.cursor()
|
||||
|
||||
# Create a table with a full-text index
|
||||
sql='''create table articles (title VARCHAR(200) primary key, body Text,
|
||||
FULLTEXT fts_idx(title, body));
|
||||
'''
|
||||
cursor.execute(sql)
|
||||
|
||||
# Insert data
|
||||
sql='''insert into articles(title, body) values
|
||||
('OceanBase Tutorial', 'This is a tutorial about OceanBase Fulltext.'),
|
||||
('Fulltext Index', 'Fulltext index can be very useful.'),
|
||||
('OceanBase Test Case', 'Writing test cases helps ensure quality.')
|
||||
'''
|
||||
cursor.execute(sql)
|
||||
conn.commit()
|
||||
|
||||
# Execute full-text search
|
||||
sql='''select
|
||||
title,
|
||||
match (title, body) against ("OceanBase") as score
|
||||
from
|
||||
articles
|
||||
where
|
||||
match (title, body) against ("OceanBase")
|
||||
order by
|
||||
score desc
|
||||
'''
|
||||
cursor.execute(sql)
|
||||
# Fetch results
|
||||
print(cursor.fetchall())
|
||||
|
||||
# Close connections
|
||||
cursor.close()
|
||||
conn.close()
|
||||
```
|
||||
|
||||
#### Experience hybrid search
|
||||
|
||||
Hybrid Search combines vector-based semantic search and full-text index-based keyword search, providing more accurate and comprehensive search results through comprehensive ranking. Vector search excels at semantic approximate matching but is weak at matching exact keywords, numbers, and proper nouns, while full-text search effectively compensates for this deficiency. Therefore, hybrid search has become one of the key features of vector databases and is widely used in various products.
|
||||
|
||||
Based on multi-model integration, seekdb provides hybrid search capabilities for multi-modal data on the basis of SQL+AI, enabling fusion queries of multiple types of data in a single database system.
|
||||
|
||||
The following example demonstrates how to use seekdb's hybrid search feature.
|
||||
|
||||
```python
|
||||
import pylibseekdb
|
||||
|
||||
pylibseekdb.open("seekdb")
|
||||
conn = pylibseekdb.connect("test")
|
||||
cursor = conn.cursor()
|
||||
|
||||
# Create a table with vector indexes and full-text indexes
|
||||
cursor.execute("create table doc_table(c1 int, vector vector(3), query varchar(255), content varchar(255), vector index idx1(vector) with (distance=l2, type=hnsw, lib=vsag), fulltext idx2(query), fulltext idx3(content))")
|
||||
|
||||
# Insert data
|
||||
sql = '''insert into doc_table values(1, '[1,2,3]', "hello world", "oceanbase Elasticsearch database"),
|
||||
(2, '[1,2,1]', "hello world, what is your name", "oceanbase mysql database"),
|
||||
(3, '[1,1,1]', "hello world, how are you", "oceanbase oracle database"),
|
||||
(4, '[1,3,1]', "real world, where are you from", "postgres oracle database"),
|
||||
(5, '[1,3,2]', "real world, how old are you", "redis oracle database"),
|
||||
(6, '[2,1,1]', "hello world, where are you from", "starrocks oceanbase database");'''
|
||||
cursor.execute(sql)
|
||||
conn.commit()
|
||||
|
||||
|
||||
sql = '''set @parm = '{
|
||||
"query": {
|
||||
"bool": {
|
||||
"must": [
|
||||
{"match": {"query": "hi hello"}},
|
||||
{"match": { "content": "oceanbase mysql" }}
|
||||
]
|
||||
}
|
||||
},
|
||||
"knn" : {
|
||||
"field": "vector",
|
||||
"k": 5,
|
||||
"num_candidates": 10,
|
||||
"query_vector": [1,2,3],
|
||||
"boost": 0.7
|
||||
},
|
||||
"_source" : ["query", "content", "_keyword_score", "_semantic_score"]
|
||||
}';'''
|
||||
cursor.execute(sql)
|
||||
|
||||
# Execute hybrid search
|
||||
sql = '''select dbms_hybrid_search.search('doc_table', @parm);'''
|
||||
cursor.execute(sql)
|
||||
# Fetch results
|
||||
print(cursor.fetchall())
|
||||
|
||||
# Close connections
|
||||
cursor.close()
|
||||
conn.close()
|
||||
```
|
||||
|
||||
### Experience analytical capabilities (OLAP)
|
||||
|
||||
seekdb combines transaction processing (TP) with analytical processing (AP). Based on the LSM-Tree architecture, it achieves unified row and column storage, and introduces a new vectorized engine and cost evaluation model based on column storage, significantly improving the efficiency of processing wide tables and enhancing query performance in AP scenarios. It also supports real-time import, secondary indexes, high-concurrency primary key queries, and other common real-time OLAP requirements.
|
||||
|
||||
#### Experience data import
|
||||
|
||||
seekdb supports various flexible data import methods, allowing you to import data from multiple data sources into the database. Different import methods are suitable for different scenarios. You can choose appropriate import tools for data import based on data source types and business scenarios. As scenarios become more complex and diverse, multiple import methods can be used together. When importing data, in addition to considering data sources, data file formats should also be considered along with the support of import tools. When business scenarios have clearly defined data sources and data file formats, you need to start from the data source and consider the design of the import solution in combination with import tools. When businesses have import tools they are familiar with, you need to consider the tool's support and the possibility of import in combination with business scenarios.
|
||||
|
||||
The following example uses the `load data` method to demonstrate how to quickly import CSV data into seekdb.
|
||||
|
||||
1. Create an external data source
|
||||
|
||||
```bash
|
||||
cat /data/1/example.csv
|
||||
1,10
|
||||
2,20
|
||||
3,30
|
||||
```
|
||||
|
||||
2. Import external data using the embedded method.
|
||||
|
||||
```python
|
||||
import pylibseekdb
|
||||
|
||||
pylibseekdb.open("seekdb")
|
||||
conn = pylibseekdb.connect("test")
|
||||
cursor = conn.cursor()
|
||||
|
||||
# Create a table
|
||||
cursor.execute("create table test_olap(c1 int, c2 int)")
|
||||
# Execute fast import
|
||||
cursor.execute("load data /*+ direct(true, 0) */ infile '/data/1/example.csv' into table test_olap fields terminated by ','")
|
||||
# Query data
|
||||
cursor.execute("select count(*) from test_olap")
|
||||
# Fetch results
|
||||
print(cursor.fetchall())
|
||||
|
||||
# Close connections
|
||||
cursor.close()
|
||||
conn.close()
|
||||
```
|
||||
|
||||
#### Experience columnar storage
|
||||
|
||||
In scenarios involving complex analysis of large-scale data or ad-hoc queries on massive data, columnar storage is one of the key capabilities of AP databases. The seekdb storage engine has been further enhanced on the basis of supporting row storage, achieving support for column storage and unified storage. With one codebase, one architecture, and one instance, columnar data and row data coexist.
|
||||
|
||||
The following example demonstrates how to create a columnar table in seekdb.
|
||||
|
||||
```python
|
||||
import pylibseekdb
|
||||
|
||||
pylibseekdb.open("seekdb")
|
||||
conn = pylibseekdb.connect("test")
|
||||
cursor = conn.cursor()
|
||||
|
||||
# Create a columnar table
|
||||
sql='''create table each_column_group (col1 varchar(30) not null, col2 varchar(30) not null, col3 varchar(30) not null, col4 varchar(30) not null, col5 int)
|
||||
with column group (each column);
|
||||
'''
|
||||
cursor.execute(sql)
|
||||
|
||||
# Insert data
|
||||
sql='''insert into each_column_group values('a', 'b', 'c', 'd', 1)
|
||||
'''
|
||||
cursor.execute(sql)
|
||||
conn.commit()
|
||||
|
||||
# Execute query
|
||||
cursor.execute("select col1,col2 from each_column_group")
|
||||
|
||||
# Fetch results
|
||||
print(cursor.fetchall())
|
||||
|
||||
# Close connections
|
||||
cursor.close()
|
||||
conn.close()
|
||||
```
|
||||
|
||||
#### Experience materialized views
|
||||
|
||||
Materialized views are a key feature supporting AP business. They improve query performance and simplify complex query logic by precomputing and storing query results of views, reducing real-time computation. They are commonly used in fast report generation and data analysis scenarios. seekdb supports non-real-time and real-time materialized views, supports specifying primary keys or creating indexes for materialized views, and introduces nested materialized views, which can significantly improve query performance.
|
||||
|
||||
The following example demonstrates how to use materialized views in seekdb.
|
||||
|
||||
```python
|
||||
import pylibseekdb
|
||||
import time
|
||||
|
||||
pylibseekdb.open("seekdb")
|
||||
conn = pylibseekdb.connect("test")
|
||||
cursor = conn.cursor()
|
||||
|
||||
# Create base tables
|
||||
cursor.execute("create table base_t1(a int primary key, b int)")
|
||||
cursor.execute("create table base_t2(c int primary key, d int)")
|
||||
|
||||
# Create materialized view logs
|
||||
cursor.execute("create materialized view log on base_t1 with(b)")
|
||||
cursor.execute("create materialized view log on base_t2 with(d)")
|
||||
|
||||
# Create a materialized view named mv based on tables base_t1 and base_t2, specify the refresh strategy as incremental refresh, and set the initial refresh time in the refresh plan to the current date, then refresh the materialized view every 1 second thereafter.
|
||||
cursor.execute("create materialized view mv REFRESH fast START WITH sysdate() NEXT sysdate() + INTERVAL 1 second as select a,b,c,d from base_t1 join base_t2 on base_t1.a=base_t2.c")
|
||||
|
||||
# Insert data into base tables
|
||||
cursor.execute("insert into base_t1 values(1, 10)")
|
||||
cursor.execute("insert into base_t2 values(1, 100)")
|
||||
conn.commit()
|
||||
|
||||
# Wait for the materialized view background refresh to complete
|
||||
time.sleep(10)
|
||||
|
||||
# Query data
|
||||
cursor.execute("select * from mv")
|
||||
# Fetch results
|
||||
print(cursor.fetchall())
|
||||
|
||||
# Close connections
|
||||
cursor.close()
|
||||
conn.close()
|
||||
```
|
||||
|
||||
#### Experience external tables
|
||||
|
||||
Typically, table data in a database is stored in the database's storage space, while external table data is stored in external storage services. When creating an external table, you need to define the data file path and data file format. After creation, users can read data from files in external storage services through external tables.
|
||||
|
||||
The following example demonstrates how to access external CSV files through seekdb's external table feature.
|
||||
|
||||
1. Create an external data source.
|
||||
|
||||
```bash
|
||||
cat /data/1/example.csv
|
||||
1,10
|
||||
2,20
|
||||
3,30
|
||||
```
|
||||
|
||||
2. Access external table data using the embedded method.
|
||||
|
||||
```python
|
||||
import pylibseekdb
|
||||
|
||||
pylibseekdb.open("seekdb")
|
||||
conn = pylibseekdb.connect("test")
|
||||
cursor = conn.cursor()
|
||||
|
||||
# Create an external table
|
||||
sql='''CREATE EXTERNAL TABLE test_external_table(c1 int, c2 int) LOCATION='/data/1' FORMAT=(TYPE='CSV' FIELD_DELIMITER=',') PATTERN='example.csv';
|
||||
'''
|
||||
cursor.execute(sql)
|
||||
|
||||
# Query data
|
||||
cursor.execute("select * from test_external_table")
|
||||
|
||||
# Fetch results
|
||||
print(cursor.fetchall())
|
||||
|
||||
# Close connections
|
||||
cursor.close()
|
||||
conn.close()
|
||||
```
|
||||
|
||||
### Experience transaction capabilities (OLTP)
|
||||
|
||||
The following example demonstrates seekdb's transaction capabilities.
|
||||
|
||||
```python
|
||||
import pylibseekdb
|
||||
|
||||
pylibseekdb.open("seekdb")
|
||||
conn = pylibseekdb.connect("test")
|
||||
cursor = conn.cursor()
|
||||
|
||||
# Create a table
|
||||
cursor.execute("create table test_oltp(c1 int primary key, c2 int)")
|
||||
|
||||
# Insert data
|
||||
cursor.execute("insert into test_oltp values(1, 10)")
|
||||
cursor.execute("insert into test_oltp values(2, 20)")
|
||||
cursor.execute("insert into test_oltp values(3, 30)")
|
||||
# Commit transaction
|
||||
conn.commit()
|
||||
|
||||
# Query data, ORA_ROWSCN is the data commit version number
|
||||
cursor.execute("select *,ORA_ROWSCN from test_oltp")
|
||||
# Fetch results
|
||||
print(cursor.fetchall())
|
||||
|
||||
# Close connections
|
||||
cursor.close()
|
||||
conn.close()
|
||||
```
|
||||
|
||||
## Smooth transition to the distributed version
|
||||
|
||||
After users quickly validate product prototypes through the embedded version, if they want to switch to seekdb Server mode or use OceanBase's distributed version cluster processing capabilities, they only need to modify the import package and related configuration, while the main application logic remains unchanged.
|
||||
|
||||
```python
|
||||
import pylibseekdb
|
||||
pylibseekdb.open()
|
||||
conn = pylibseekdb.connect()
|
||||
```
|
||||
|
||||
Simply replace the three lines above with the two lines below. Use the pymysql package to replace pylibseekdb, remove the pylibseekdb open phase, and use pymysql's connect method to connect to the database server.
|
||||
|
||||
```python
|
||||
import pymysql
|
||||
conn = pymysql.connect(host='127.0.0.1', port=11002, user='root@sys', database='test')
|
||||
```
|
||||
Reference in New Issue
Block a user