---
slug: /build-kb-in-seekdb
---

# Build a knowledge base desktop application based on seekdb

This tutorial guides you through building MineKB (Mine Knowledge Base), a personal local knowledge base desktop application, with seekdb, and demonstrates how to implement intelligent Q&A through vector search and large language models.

## Overview

Core features of the application:

* Multi-project management: Supports creating multiple independent knowledge base projects.
* Document processing: Supports multiple formats such as TXT, MD, PDF, DOC, DOCX, and RTF, with automatic text extraction and vectorization.
* Intelligent search: Efficient semantic search based on seekdb's vector indexes (HNSW).
* Conversational Q&A: Query the knowledge base through AI conversations to obtain answers grounded in document content.
* Local storage: All data is stored locally to protect privacy and security.

Reasons for choosing seekdb (a minimal usage sketch follows the list):

* Embedded deployment: Embedded in the application as a library, no standalone service required.
* Native vector support: Built-in vector type and HNSW indexes, improving vector search performance by 10-100x.
* All-in-One: Supports transactions, analytics, and vector search in a single database.
* SQL interface: Standard SQL syntax, developer-friendly.
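
To make the "embedded database with a SQL interface" point concrete, here is a minimal sketch of using seekdb from Python. The `seekdb.connect()` call and the table layout are assumptions for illustration only; the cursor/commit usage mirrors the Python bridge snippet shown later in this tutorial.

```python
import seekdb  # embedded database, installed via `pip install seekdb`

# Hypothetical: open (or create) a local, embedded database file.
# The exact connect API is an assumption; no standalone server is involved.
conn = seekdb.connect("./mine_kb.db")
cursor = conn.cursor()

# Standard SQL works against the embedded database.
cursor.execute("""
    CREATE TABLE IF NOT EXISTS projects (
        id         VARCHAR(36) PRIMARY KEY,
        name       VARCHAR(255) NOT NULL,
        status     VARCHAR(16) NOT NULL,
        created_at DATETIME
    )
""")
conn.commit()
```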

## Prerequisites

### Environment requirements

The following environment is required to develop and run the knowledge base desktop application:

* Operating system: Linux (Ubuntu 20.04+ recommended).
* Node.js: 16.x or later for frontend development (18.x LTS recommended).
* Rust: 1.70 or later, required by Tauri (1.75+ recommended).
* Python: 3.x (3.9+ recommended).

### Technology stack and dependencies

* Frontend technology stack (see `package.json` for details)
  * `@tauri-apps/api`: Tauri frontend API for calling Rust commands
  * `@radix-ui/*`: Accessible UI component library
  * `react-markdown`: Markdown rendering
  * `react-syntax-highlighter`: Code highlighting
  * `lucide-react`: Icon library
* Backend technology stack (see `Cargo.toml` for details)
  * `tauri`: Tauri framework core
  * `tokio`: Async runtime
  * `reqwest`: HTTP client (for calling AI APIs)
  * `pdf-extract`, `docx-rs`: Document parsing
  * `nalgebra`: Vector computation

### Python dependencies (see `requirements.txt` for details)

```text
seekdb==0.0.1.dev4
```

### Install seekdb

Ensure seekdb is installed and verify the installation:

```shell
pip install seekdb -i https://pypi.tuna.tsinghua.edu.cn/simple/

# Verify the installation:
python3 -c "import seekdb; print(seekdb.__version__)"
```

### API key configuration

MineKB requires the Alibaba Cloud Bailian API to provide embedding and LLM services. Register an [Alibaba Cloud Bailian](https://bailian.console.aliyun.com/) account, enable model services, and obtain an API key.

After obtaining the API key, fill it in the configuration file `src-tauri/config.json`:

```json
{
  "api": {
    "dashscope": {
      "api_key": "<sk-your-api-key-here>",
      "base_url": "https://dashscope.aliyuncs.com/api/v1",
      "embedding_model": "text-embedding-v1",
      "chat_model": "qwen-plus"
    }
  },
  "database": {
    "path": "./mine_kb.db",
    "name": "mine_kb"
  }
}
```

:::tip

* Qwen LLM provides a certain amount of free usage quota. Monitor your free quota usage, because exceeding it will incur charges.
* This tutorial uses Qwen LLM as an example of how to build a Q&A bot. You can also use another LLM; in that case, update the `api_key`, `chat_model`, and `base_url` parameters in the `src-tauri/config.json` file accordingly.

:::
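
As a quick sanity check that the key works, the sketch below reads `src-tauri/config.json` and requests a single embedding. The `requests` library, the endpoint path, and the request/response shape are assumptions based on DashScope's public REST API, not code taken from MineKB (whose Rust backend uses `reqwest`).

```python
import json
import requests  # assumption: any HTTP client works for this check

with open("src-tauri/config.json") as f:
    cfg = json.load(f)["api"]["dashscope"]

# Assumption: DashScope's public REST embedding endpoint.
resp = requests.post(
    f"{cfg['base_url']}/services/embeddings/text-embedding/text-embedding",
    headers={"Authorization": f"Bearer {cfg['api_key']}"},
    json={"model": cfg["embedding_model"], "input": {"texts": ["hello seekdb"]}},
    timeout=30,
)
resp.raise_for_status()
embedding = resp.json()["output"]["embeddings"][0]["embedding"]
print(len(embedding))  # text-embedding-v1 returns 1536-dimensional vectors
```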

## Run the application locally

### Step 1: Build and start

1. Clone the project and install dependencies.

   ```shell
   # Clone the project
   git clone https://github.com/ob-labs/mine-kb.git
   cd mine-kb

   # Install frontend dependencies
   npm install

   # Install Python dependencies
   pip install seekdb==0.0.1.dev4 -i https://pypi.tuna.tsinghua.edu.cn/simple/
   ```

2. Configure the API key.

   ```shell
   cp src-tauri/config.example.json src-tauri/config.json
   # Edit the configuration file and fill in your API key
   nano src-tauri/config.json
   ```

3. Start the application.

   ```shell
   npm run tauri:dev
   ```

When a user starts the MineKB application, the system executes the following initialization flow in sequence (a sketch of the database-initialization step follows the list):

* Application initialization (see `src-tauri/src/main.rs` for code details)
  * Initialize the logging system
  * Determine the application data directory
  * Load the configuration file
  * Initialize the Python environment
  * Initialize the seekdb database
  * Initialize the database schema
  * Create application state
  * Start the Tauri application
* Frontend initialization (see `src/main.tsx` for code details)
  * Mount the React application
  * Call the `list_projects` command to get the project list
  * Render the project panel and conversation panel
  * Wait for user operations

### Step 2: Create a knowledge base

We recommend using the [seekdb documentation](https://github.com/oceanbase/seekdb-doc) as test content.

After the user clicks the `Create Project` button, the system executes the following flow:

* Frontend interaction implementation
  * See `ProjectPanel.tsx` for code details
* Backend processing implementation
  * See `commands/projects.rs` for code details
* Database operations
  * See `services/project_service.rs` for code details
* Database layer (`seekdb_adapter.rs` → Python bridge → seekdb), code as follows:

```python
# Python bridge receives command:
# {
#   "command": "execute",
#   "params": {
#     "sql": "INSERT INTO projects (...) VALUES (?, ?, ?, ?, ?, ?, ?)",
#     "values": ["uuid-here", "My Project", "Description", "active", 0, "2025-11-05T...", "2025-11-05T..."]
#   }
# }

# Convert to seekdb SQL
cursor.execute("""
    INSERT INTO projects (id, name, description, status, document_count, created_at, updated_at)
    VALUES ('uuid-here', 'My Project', 'Description', 'active', 0, '2025-11-05T...', '2025-11-05T...')
""")
conn.commit()

# Return success response:
# {
#   "status": "success",
#   "data": null
# }
```

In summary, creating a knowledge base performs the following tasks (an illustrative sketch follows the list):

1. Generate a unique project ID (UUID v4).
2. Validate the project name (non-empty, no duplicates).
3. Initialize the project status as Active.
4. Record the creation time and update time.
5. Write the project information to the `projects` table in seekdb.
6. Commit the transaction to ensure data persistence.
7. Return the project information to the frontend.
8. The frontend updates the project list and displays the new project.
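
Put together, the bridge-side create-project operation could look like the sketch below. It follows the numbered flow above, but the function name, the duplicate-name check, and the way values are inlined into SQL are simplifications for illustration, not the exact logic in `project_service.rs`.

```python
import uuid
from datetime import datetime, timezone

def create_project(conn, name: str, description: str = "") -> dict:
    """Hypothetical bridge-side implementation of the create-project flow."""
    if not name.strip():
        raise ValueError("project name must not be empty")

    cursor = conn.cursor()

    # Step 2: reject duplicate names.
    cursor.execute(f"SELECT COUNT(*) FROM projects WHERE name = '{name}'")
    if cursor.fetchone()[0] > 0:
        raise ValueError(f"project '{name}' already exists")

    now = datetime.now(timezone.utc).isoformat()
    project = {
        "id": str(uuid.uuid4()),   # step 1: UUID v4
        "name": name,
        "description": description,
        "status": "active",        # step 3
        "document_count": 0,
        "created_at": now,         # step 4
        "updated_at": now,
    }

    # Step 5: write to the projects table (a real implementation should escape values).
    cursor.execute(
        "INSERT INTO projects (id, name, description, status, document_count, created_at, updated_at) "
        f"VALUES ('{project['id']}', '{project['name']}', '{project['description']}', "
        f"'{project['status']}', {project['document_count']}, "
        f"'{project['created_at']}', '{project['updated_at']}')"
    )
    conn.commit()   # step 6
    return project  # step 7: returned to the frontend
```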

### Step 3: Start a conversation

After the user enters a question in the dialog box, the system executes the following flow:

* Frontend sends the message
  * See `ChatPanel.tsx` for code details
* Backend processing
  * See `commands/chat.rs` for code details
* Vector search
  * See `services/vector_db.rs` for code details
* LLM streaming call
  * See `services/llm_client.rs` for code details

In summary, starting a conversation performs the following tasks (an illustrative end-to-end sketch follows the list):

1. The user enters a question.
2. Save the user message to the database.
3. Call the Alibaba Cloud Bailian API to generate a query vector (1536-dimensional).
4. Execute a vector search in seekdb (using the HNSW index).
5. Get the top 20 most similar document chunks.
6. Calculate similarity scores and filter (threshold 0.3).
7. Use the relevant documents as context.
8. Build a prompt (context + user question).
9. Call the LLM in streaming mode to generate an answer.
10. Send the answer to the frontend in real time for display.
11. Save the AI reply and source information to the database.
12. Update the conversation's last-updated time.
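
The sketch below strings steps 3-9 together on the bridge side. `embed_query()` and `stream_chat()` are hypothetical helpers standing in for the DashScope calls made by `llm_client.rs`; the `cosine_distance(...) APPROXIMATE` query and the `1 - distance` similarity conversion are assumptions modeled on OceanBase-style vector search, not the exact SQL in `vector_db.rs`.

```python
def answer_question(conn, project_id, question, embed_query, stream_chat,
                    top_k=20, threshold=0.3):
    """Hypothetical RAG flow: embed -> vector search -> filter -> prompt -> stream."""
    # Step 3: 1536-dimensional query vector from the embedding API.
    query_vec = embed_query(question)
    vec_literal = "[" + ",".join(str(x) for x in query_vec) + "]"

    # Steps 4-5: approximate nearest-neighbor search over the HNSW index
    # (vector SQL dialect is an assumption).
    cursor = conn.cursor()
    cursor.execute(f"""
        SELECT content, cosine_distance(embedding, '{vec_literal}') AS dist
        FROM document_chunks
        WHERE project_id = '{project_id}'
        ORDER BY cosine_distance(embedding, '{vec_literal}') APPROXIMATE
        LIMIT {top_k}
    """)
    rows = cursor.fetchall()

    # Step 6: convert distance to a similarity score and drop weak matches.
    context_chunks = [content for content, dist in rows if (1.0 - dist) >= threshold]

    # Steps 7-8: build the prompt from the retrieved context.
    prompt = (
        "Answer the question using only the context below.\n\n"
        "Context:\n" + "\n---\n".join(context_chunks) +
        f"\n\nQuestion: {question}"
    )

    # Step 9: stream the LLM answer; the Rust layer forwards tokens to the frontend.
    yield from stream_chat(prompt)
```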

## Summary

### Advantages of seekdb in desktop application development

The MineKB project demonstrates the following significant advantages of seekdb for building desktop applications.

#### High development efficiency

| Comparison item | Traditional solution | seekdb solution |
| :-- | :-- | :-- |
| Database deployment | Requires installing and configuring a standalone service | Embedded, no installation required |
| Vector search implementation | Manually implement vector indexes and search algorithms | Native HNSW indexes, ready to use |
| Data management | Manage relational data and vector data separately | Unified management through a SQL interface |
| Cross-platform support | Database must be compiled/packaged per platform | `pip install` adapts to the platform automatically |

#### Excellent performance

Vector search performance test (10,000 document chunks, 1536-dimensional vectors):

| Operation | seekdb (HNSW) | SQLite (manual search) | Improvement |
| :-- | :-- | :-- | :-- |
| Top-10 search | 15 ms | 1200 ms | 80x |
| Top-20 search | 25 ms | 2500 ms | 100x |
| Top-50 search | 45 ms | Cannot complete | ∞ |

Why seekdb is faster:

* HNSW index: O(log N) search complexity.
* Native vector type: no serialization overhead.
* Columnar storage optimization: only the required fields are read, reducing I/O.

#### Data privacy and security

| Feature | Description | Value |
| :-- | :-- | :-- |
| Local storage | Database files are stored on the user's device | Zero privacy leakage |
| No network required | All operations work offline except AI conversations | Sensitive documents are never uploaded |
| User control | Users can back up and migrate database files | Data ownership stays with the user |
| ACID transactions | Ensures data consistency | No data loss |

#### All-in-One capabilities

seekdb's integrated capabilities leave plenty of room for future expansion (an illustrative hybrid-search sketch follows the list):

* Relational data management: projects, documents, sessions, etc.
* Transaction support: ACID properties.
* Vector search: semantic search.
* Full-text search: use seekdb's `FULLTEXT INDEX`.
* Hybrid search: combine semantic search and keyword search.
* Analytical queries: use OLAP capabilities for knowledge-base statistics.
* External table queries: directly query external files such as CSV.
* Smooth upgrade: data can be migrated to the OceanBase distributed edition.
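
As one example of where this could go, the sketch below mixes keyword relevance and vector similarity in a single query. Both the `MATCH ... AGAINST` full-text syntax and the `cosine_distance` call are assumptions (MySQL-compatible and OceanBase-style syntax, respectively); MineKB does not ship this feature today.

```python
# Hypothetical hybrid search: weighted mix of keyword relevance and vector similarity.
# The full-text and vector syntax are assumptions; adjust to seekdb's actual dialect.
HYBRID_SEARCH_SQL = """
    SELECT content,
           0.4 * MATCH(content) AGAINST ('{keywords}')              AS keyword_score,
           0.6 * (1 - cosine_distance(embedding, '{query_vec}'))    AS semantic_score
    FROM document_chunks
    WHERE project_id = '{project_id}'
    ORDER BY keyword_score + semantic_score DESC
    LIMIT 10
"""
```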

### MineKB project summary

The MineKB project shows that seekdb + Tauri is an excellent combination for building AI-native desktop applications.

Key success factors:

1. seekdb: provides powerful vector search capabilities.
2. Tauri: provides a lightweight cross-platform desktop application framework.
3. Python bridge: enables seamless integration between Rust and seekdb.
4. RAG architecture: fully leverages the advantages of vector search.

Applicable scenarios:

* Personal knowledge base management
* Enterprise document retrieval systems
* AI-assisted programming tools
* Study notes and research assistants
* Any desktop application that requires semantic search