---
slug: /pyseekdb-sdk-get-started
---

# Get started

## pyseekdb

pyseekdb is a Python client provided by OceanBase. It lets you connect to seekdb in embedded mode, or connect remotely to seekdb in server mode or to OceanBase Database.

:::tip
OceanBase Database is a fully self-developed, enterprise-level, native distributed database provided by OceanBase. It achieves financial-grade high availability on ordinary hardware and sets a new standard for automatic, lossless disaster recovery across cities with the "five IDCs across three regions" architecture. It also sets a new benchmark in the TPC-C standard test, with a single cluster exceeding 1,500 nodes. OceanBase Database is cloud native, strongly consistent, and highly compatible with Oracle and MySQL.
:::

pyseekdb is supported on Linux, macOS, and Windows. The supported database connection modes vary by operating system, as shown in the following table.

| System | Embedded seekdb | Server mode seekdb | Server mode OceanBase Database |
|----|---|---|---|
| Linux | Supported | Supported | Supported |
| macOS | Not supported | Supported | Supported |
| Windows | Not supported | Supported | Supported |

On Linux, installing this client also installs seekdb in embedded mode, so you can connect to it directly and perform operations such as creating a database. Alternatively, you can connect to a deployed seekdb or OceanBase Database in client/server mode.

## Install pyseekdb

### Prerequisites

Make sure that your environment meets the following requirements:

* Operating system: Linux (glibc >= 2.28), macOS, or Windows
* Python version: Python 3.11 or later
* System architecture: x86_64 or aarch64
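The version and architecture checks above can be scripted before you install. A minimal sketch (the thresholds mirror the list above; treating macOS's `arm64` as equivalent to `aarch64` is an assumption):

```python
import platform
import sys

# Minimum requirements from the prerequisites list above
MIN_PYTHON = (3, 11)
SUPPORTED_ARCHES = {"x86_64", "aarch64", "arm64"}  # arm64: macOS's name for aarch64 (assumption)

def environment_ok(version=sys.version_info, arch=platform.machine()):
    """Return True if the Python version and CPU architecture meet the prerequisites."""
    return tuple(version[:2]) >= MIN_PYTHON and arch in SUPPORTED_ARCHES

print("Environment OK:", environment_ok())
```

Note that this does not check the glibc version, which matters only for the embedded mode on Linux.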

### Procedure

Use pip to install pyseekdb. pip automatically detects the default Python version and platform.

```shell
pip install pyseekdb
```

If your pip version is outdated, upgrade it before installation.

```shell
pip install --upgrade pip
```

## What to do next

* After installing pyseekdb, you can connect to seekdb to perform operations. For information about the API interfaces supported by pyseekdb, see [API Reference](../50.apis/10.api-overview.md).

* You can also refer to the provided SDK samples to quickly try out pyseekdb.

  * [Simple sample](50.sdk-samples/10.pyseekdb-simple-sample.md)

  * [Complete sample](50.sdk-samples/50.pyseekdb-complete-sample.md)

  * [Hybrid search sample](50.sdk-samples/100.pyseekdb-hybrid-search-sample.md)
---
slug: /pyseekdb-simple-sample
---

# Simple Example

This example demonstrates the basic operations of embedding functions with seekdb in embedded mode, helping you understand how to use them. It performs the following steps:

1. Connect to seekdb.
2. Create a collection with an embedding function.
3. Add data using documents (vectors are generated automatically).
4. Query using text (the query vector is generated automatically).
5. Print the query results.

## Prerequisites

This example uses seekdb in embedded mode. Before you run it, make sure that you have deployed seekdb in embedded mode.

For information about how to deploy seekdb in embedded mode, see [Embedded Mode](../../../../400.guides/400.deploy/600.python-seekdb.md).

## Example
```python
import pyseekdb

# ==================== Step 1: Create Client Connection ====================
# You can use embedded mode, server mode, or OceanBase mode

# Embedded mode (local SeekDB)
client = pyseekdb.Client()

# Alternative: Server mode (connecting to remote SeekDB server)
# client = pyseekdb.Client(
#     host="127.0.0.1",
#     port=2881,
#     database="test",
#     user="root",
#     password=""
# )

# Alternative: Remote server mode (OceanBase Server)
# client = pyseekdb.Client(
#     host="127.0.0.1",
#     port=2881,
#     tenant="test",  # OceanBase default tenant
#     database="test",
#     user="root",
#     password=""
# )

# ==================== Step 2: Create a Collection with Embedding Function ====================
# A collection is like a table that stores documents with vector embeddings
collection_name = "my_simple_collection"

# Create collection with default embedding function
# The embedding function will automatically convert documents to embeddings
collection = client.create_collection(
    name=collection_name,
)

print(f"Created collection '{collection_name}' with dimension: {collection.dimension}")
print(f"Embedding function: {collection.embedding_function}")

# ==================== Step 3: Add Data to Collection ====================
# With embedding function, you can add documents directly without providing embeddings
# The embedding function will automatically generate embeddings from documents

documents = [
    "Machine learning is a subset of artificial intelligence",
    "Python is a popular programming language",
    "Vector databases enable semantic search",
    "Neural networks are inspired by the human brain",
    "Natural language processing helps computers understand text"
]

ids = ["id1", "id2", "id3", "id4", "id5"]

# Add data with documents only - embeddings will be auto-generated by embedding function
collection.add(
    ids=ids,
    documents=documents,  # embeddings will be automatically generated
    metadatas=[
        {"category": "AI", "index": 0},
        {"category": "Programming", "index": 1},
        {"category": "Database", "index": 2},
        {"category": "AI", "index": 3},
        {"category": "NLP", "index": 4}
    ]
)

print(f"\nAdded {len(documents)} documents to collection")
print("Note: Embeddings were automatically generated from documents using the embedding function")

# ==================== Step 4: Query the Collection ====================
# With embedding function, you can query using text directly
# The embedding function will automatically convert query text to query vector

# Query using text - query vector will be auto-generated by embedding function
query_text = "artificial intelligence and machine learning"

results = collection.query(
    query_texts=query_text,  # Query text - will be embedded automatically
    n_results=3  # Return top 3 most similar documents
)

print(f"\nQuery: '{query_text}'")
print(f"Query results: {len(results['ids'][0])} items found")

# ==================== Step 5: Print Query Results ====================
for i in range(len(results['ids'][0])):
    print(f"\nResult {i+1}:")
    print(f"  ID: {results['ids'][0][i]}")
    print(f"  Distance: {results['distances'][0][i]:.4f}")
    if results.get('documents'):
        print(f"  Document: {results['documents'][0][i]}")
    if results.get('metadatas'):
        print(f"  Metadata: {results['metadatas'][0][i]}")

# ==================== Step 6: Cleanup ====================
# Delete the collection
client.delete_collection(collection_name)
print(f"\nDeleted collection '{collection_name}'")
```

## References

* For information about the APIs supported by pyseekdb, see [API Reference](../../50.apis/10.api-overview.md).

* [Complete Example](50.pyseekdb-complete-sample.md)

* [Hybrid Search Example](100.pyseekdb-hybrid-search-sample.md)
---
slug: /pyseekdb-hybrid-search-sample
---

# Hybrid search example

This example demonstrates the advantages of `hybrid_search()` over `query()`.

The main advantages of `hybrid_search()` are:

* Supports full-text search and vector similarity search simultaneously

* Allows separate filtering conditions for full-text search and vector search

* Combines the ranked results of both searches using the Reciprocal Rank Fusion (RRF) algorithm to improve relevance

* Handles complex scenarios that `query()` cannot handle
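Reciprocal Rank Fusion scores each document by summing `1 / (k + rank)` over every result list it appears in, so documents that rank well in both searches rise to the top. A minimal sketch of the fusion step, independent of pyseekdb (the smoothing constant `k = 60` is the value commonly used in the RRF literature, not necessarily what seekdb uses):

```python
def rrf_fuse(rankings, k=60):
    """Fuse several ranked ID lists with Reciprocal Rank Fusion.

    Each document's score is the sum of 1 / (k + rank) over every
    ranking it appears in (rank is 1-based).
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# doc_1 ranks high in both lists, so it comes out first after fusion,
# ahead of documents that appear in only one list.
fulltext = ["doc_1", "doc_2", "doc_4"]
vector = ["doc_3", "doc_1", "doc_2"]
print(rrf_fuse([fulltext, vector]))
```

This is why, in the scenarios below, `hybrid_search()` favors documents that match both the keyword condition and the semantic query.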

## Example

```python
import pyseekdb

# Setup
client = pyseekdb.Client()
collection = client.get_or_create_collection(
    name="hybrid_search_demo"
)

# Sample data
documents = [
    "Machine learning is revolutionizing artificial intelligence and data science",
    "Python programming language is essential for machine learning developers",
    "Deep learning neural networks enable advanced AI applications",
    "Data science combines statistics, programming, and domain expertise",
    "Natural language processing uses machine learning to understand text",
    "Computer vision algorithms process images using deep learning techniques",
    "Reinforcement learning trains agents through reward-based feedback",
    "Python libraries like TensorFlow and PyTorch simplify machine learning",
    "Artificial intelligence systems can learn from large datasets",
    "Neural networks mimic the structure of biological brain connections"
]

metadatas = [
    {"category": "AI", "topic": "machine learning", "year": 2023, "popularity": 95},
    {"category": "Programming", "topic": "python", "year": 2023, "popularity": 88},
    {"category": "AI", "topic": "deep learning", "year": 2024, "popularity": 92},
    {"category": "Data Science", "topic": "data analysis", "year": 2023, "popularity": 85},
    {"category": "AI", "topic": "nlp", "year": 2024, "popularity": 90},
    {"category": "AI", "topic": "computer vision", "year": 2023, "popularity": 87},
    {"category": "AI", "topic": "reinforcement learning", "year": 2024, "popularity": 89},
    {"category": "Programming", "topic": "python", "year": 2023, "popularity": 91},
    {"category": "AI", "topic": "general ai", "year": 2023, "popularity": 93},
    {"category": "AI", "topic": "neural networks", "year": 2024, "popularity": 94}
]

ids = [f"doc_{i+1}" for i in range(len(documents))]
collection.add(ids=ids, documents=documents, metadatas=metadatas)

print("=" * 100)
print("SCENARIO 1: Keyword + Semantic Search")
print("=" * 100)
print("Goal: Find documents similar to 'AI research' AND containing 'machine learning'\n")

# query() approach
query_result1 = collection.query(
    query_texts=["AI research"],
    where_document={"$contains": "machine learning"},
    n_results=5
)

# hybrid_search() approach
hybrid_result1 = collection.hybrid_search(
    query={"where_document": {"$contains": "machine learning"}, "n_results": 10},
    knn={"query_texts": ["AI research"], "n_results": 10},
    rank={"rrf": {}},
    n_results=5
)

print("query() Results:")
for i, doc_id in enumerate(query_result1['ids'][0]):
    idx = ids.index(doc_id)
    print(f"  {i+1}. {documents[idx]}")

print("\nhybrid_search() Results:")
for i, doc_id in enumerate(hybrid_result1['ids'][0]):
    idx = ids.index(doc_id)
    print(f"  {i+1}. {documents[idx]}")

print("\nAnalysis:")
print("  query() ranks 'Deep learning neural networks...' first because it's semantically similar to 'AI research',")
print("  but 'machine learning' is not its primary focus. hybrid_search() correctly prioritizes documents that")
print("  explicitly contain 'machine learning' (from full-text search) while also being semantically relevant")
print("  to 'AI research' (from vector search). The RRF fusion ensures documents matching both criteria rank higher.")

print("\n" + "=" * 100)
print("SCENARIO 2: Independent Filters for Different Search Types")
print("=" * 100)
print("Goal: Full-text='neural' (year=2024) + Vector='deep learning' (popularity>=90)\n")

# query() - same filter applies to both conditions
query_result2 = collection.query(
    query_texts=["deep learning"],
    where={"year": {"$eq": 2024}, "popularity": {"$gte": 90}},
    where_document={"$contains": "neural"},
    n_results=5
)

# hybrid_search() - different filters for each search type
hybrid_result2 = collection.hybrid_search(
    query={"where_document": {"$contains": "neural"}, "where": {"year": {"$eq": 2024}}, "n_results": 10},
    knn={"query_texts": ["deep learning"], "where": {"popularity": {"$gte": 90}}, "n_results": 10},
    rank={"rrf": {}},
    n_results=5
)

print("query() Results (same filter for both):")
for i, doc_id in enumerate(query_result2['ids'][0]):
    idx = ids.index(doc_id)
    print(f"  {i+1}. {documents[idx]}")
    print(f"     {metadatas[idx]}")

print("\nhybrid_search() Results (independent filters):")
for i, doc_id in enumerate(hybrid_result2['ids'][0]):
    idx = ids.index(doc_id)
    print(f"  {i+1}. {documents[idx]}")
    print(f"     {metadatas[idx]}")

print("\nAnalysis:")
print("  query() only returns 2 results because it requires documents to satisfy BOTH year=2024 AND popularity>=90")
print("  simultaneously. hybrid_search() returns 5 results by applying the year=2024 filter to full-text search")
print("  and the popularity>=90 filter to vector search independently, then fusing the results. This approach")
print("  captures more relevant documents that satisfy one criterion strongly while only partially meeting the other.")

print("\n" + "=" * 100)
print("SCENARIO 3: Combining Multiple Search Strategies")
print("=" * 100)
print("Goal: Find documents about 'machine learning algorithms'\n")

# query() - vector search only
query_result3 = collection.query(
    query_texts=["machine learning algorithms"],
    n_results=5
)

# hybrid_search() - combines full-text and vector
hybrid_result3 = collection.hybrid_search(
    query={"where_document": {"$contains": "machine learning"}, "n_results": 10},
    knn={"query_texts": ["machine learning algorithms"], "n_results": 10},
    rank={"rrf": {}},
    n_results=5
)

print("query() Results (vector similarity only):")
for i, doc_id in enumerate(query_result3['ids'][0]):
    idx = ids.index(doc_id)
    print(f"  {i+1}. {documents[idx]}")

print("\nhybrid_search() Results (full-text + vector fusion):")
for i, doc_id in enumerate(hybrid_result3['ids'][0]):
    idx = ids.index(doc_id)
    print(f"  {i+1}. {documents[idx]}")

print("\nAnalysis:")
print("  query() returns 'Artificial intelligence systems...' as the result, which doesn't explicitly")
print("  mention 'machine learning'. hybrid_search() combines full-text search (for 'machine learning')")
print("  with vector search (for semantic similarity to 'machine learning algorithms'), ensuring that")
print("  documents containing the exact keyword rank higher while still capturing semantically relevant content.")

print("\n" + "=" * 100)
print("SCENARIO 4: Complex Multi-Criteria Search")
print("=" * 100)
print("Goal: Full-text='learning' (category=AI) + Vector='artificial intelligence' (year>=2023)\n")

# query() - limited to single search with combined filters
query_result4 = collection.query(
    query_texts=["artificial intelligence"],
    where={"category": {"$eq": "AI"}, "year": {"$gte": 2023}},
    where_document={"$contains": "learning"},
    n_results=5
)

# hybrid_search() - separate criteria for each search type
hybrid_result4 = collection.hybrid_search(
    query={"where_document": {"$contains": "learning"}, "where": {"category": {"$eq": "AI"}}, "n_results": 10},
    knn={"query_texts": ["artificial intelligence"], "where": {"year": {"$gte": 2023}}, "n_results": 10},
    rank={"rrf": {}},
    n_results=5
)

print("query() Results:")
for i, doc_id in enumerate(query_result4['ids'][0]):
    idx = ids.index(doc_id)
    print(f"  {i+1}. {documents[idx]}")
    print(f"     {metadatas[idx]}")

print("\nhybrid_search() Results:")
for i, doc_id in enumerate(hybrid_result4['ids'][0]):
    idx = ids.index(doc_id)
    print(f"  {i+1}. {documents[idx]}")
    print(f"     {metadatas[idx]}")

print("\nAnalysis:")
print("  While both methods return similar documents, hybrid_search() provides better ranking by prioritizing")
print("  documents that score highly in both full-text search (containing 'learning' with category=AI) and")
print("  vector search (semantically similar to 'artificial intelligence' with year>=2023). The RRF fusion")
print("  algorithm ensures that 'Deep learning neural networks...' ranks first because it strongly matches")
print("  both search criteria, whereas query() applies filters sequentially, which may not optimize ranking.")

print("\n" + "=" * 100)
print("SCENARIO 5: Result Quality - RRF Fusion")
print("=" * 100)
print("Goal: Search for 'Python machine learning'\n")

# query() - single ranking
query_result5 = collection.query(
    query_texts=["Python machine learning"],
    n_results=5
)

# hybrid_search() - RRF fusion of multiple rankings
hybrid_result5 = collection.hybrid_search(
    query={"where_document": {"$contains": "Python"}, "n_results": 10},
    knn={"query_texts": ["Python machine learning"], "n_results": 10},
    rank={"rrf": {}},
    n_results=5
)

print("query() Results (single ranking):")
for i, doc_id in enumerate(query_result5['ids'][0]):
    idx = ids.index(doc_id)
    print(f"  {i+1}. {documents[idx]}")

print("\nhybrid_search() Results (RRF fusion):")
for i, doc_id in enumerate(hybrid_result5['ids'][0]):
    idx = ids.index(doc_id)
    print(f"  {i+1}. {documents[idx]}")

print("\nAnalysis:")
print("  Both methods return identical results in this case, but hybrid_search() achieves this through RRF")
print("  (Reciprocal Rank Fusion), which combines rankings from full-text search (for 'Python') and vector")
print("  search (for 'Python machine learning'). RRF provides more stable and robust ranking by considering")
print("  multiple signals, making it less sensitive to variations in individual search algorithms and ensuring")
print("  consistent high-quality results across different query formulations.")

print("\n" + "=" * 100)
print("SCENARIO 6: Different Filter Criteria for Each Search")
print("=" * 100)
print("Goal: Full-text='neural' (high popularity) + Vector='deep learning' (recent year)\n")

# query() - cannot separate filters for keyword vs semantic
query_result6 = collection.query(
    query_texts=["deep learning"],
    where={"popularity": {"$gte": 90}, "year": {"$gte": 2023}},
    where_document={"$contains": "neural"},
    n_results=5
)

# hybrid_search() - different filters for keyword search vs semantic search
hybrid_result6 = collection.hybrid_search(
    query={"where_document": {"$contains": "neural"}, "where": {"popularity": {"$gte": 90}}, "n_results": 10},
    knn={"query_texts": ["deep learning"], "where": {"year": {"$gte": 2023}}, "n_results": 10},
    rank={"rrf": {}},
    n_results=5
)

print("query() Results:")
for i, doc_id in enumerate(query_result6['ids'][0]):
    idx = ids.index(doc_id)
    print(f"  {i+1}. {documents[idx]}")
    print(f"     {metadatas[idx]}")

print("\nhybrid_search() Results:")
for i, doc_id in enumerate(hybrid_result6['ids'][0]):
    idx = ids.index(doc_id)
    print(f"  {i+1}. {documents[idx]}")
    print(f"     {metadatas[idx]}")

print("\nAnalysis:")
print("  query() only returns 2 results because it requires documents to satisfy BOTH popularity>=90 AND")
print("  year>=2023 simultaneously, along with containing 'neural' and being semantically similar to")
print("  'deep learning'. hybrid_search() returns 5 results by applying the popularity>=90 filter to full-text")
print("  search (for 'neural') and the year>=2023 filter to vector search (for 'deep learning') independently.")
print("  The fusion then combines results from both searches, capturing documents that strongly match either")
print("  criterion while still being relevant to the overall query intent.")

print("\n" + "=" * 100)
print("SCENARIO 7: Partial Keyword Match + Semantic Similarity")
print("=" * 100)
print("Goal: Documents containing 'Python' + Semantically similar to 'data science'\n")

# query() - filter applied after vector search
query_result7 = collection.query(
    query_texts=["data science"],
    where_document={"$contains": "Python"},
    n_results=5
)

# hybrid_search() - parallel searches, then fusion
hybrid_result7 = collection.hybrid_search(
    query={"where_document": {"$contains": "Python"}, "n_results": 10},
    knn={"query_texts": ["data science"], "n_results": 10},
    rank={"rrf": {}},
    n_results=5
)

print("query() Results:")
for i, doc_id in enumerate(query_result7['ids'][0]):
    idx = ids.index(doc_id)
    print(f"  {i+1}. {documents[idx]}")

print("\nhybrid_search() Results:")
for i, doc_id in enumerate(hybrid_result7['ids'][0]):
    idx = ids.index(doc_id)
    print(f"  {i+1}. {documents[idx]}")

print("\nAnalysis:")
print("  query() only returns 2 results because it first performs vector search for 'data science', then")
print("  filters to documents containing 'Python', which severely limits the result set. hybrid_search()")
print("  returns 5 results by running full-text search (for 'Python') and vector search (for 'data science')")
print("  in parallel, then fusing the results. This captures documents that contain 'Python' (even if not")
print("  semantically closest to 'data science') and documents semantically similar to 'data science' (even")
print("  if they don't contain 'Python'), providing better recall and more comprehensive results.")

print("\n" + "=" * 100)
print("SUMMARY")
print("=" * 100)
print("""
query() limitations:
- Single search type (vector similarity)
- Filters applied after search (may miss relevant docs)
- Cannot combine full-text and vector search results
- Same filter criteria for all conditions

hybrid_search() advantages:
- Simultaneous full-text + vector search
- Independent filters for each search type
- Intelligent result fusion using RRF
- Better recall for complex queries
- Handles scenarios requiring both keyword and semantic matching
""")
```

## References

* For information about the APIs supported by pyseekdb, see [API Reference](../../50.apis/10.api-overview.md).

* [Simple example](10.pyseekdb-simple-sample.md)

* [Complete example](50.pyseekdb-complete-sample.md)
---
slug: /pyseekdb-complete-sample
---

# Complete Example

This example demonstrates the full capabilities of pyseekdb.

The example includes the following operations:

1. Connection, including all connection modes
2. Collection management
3. DML operations, including add, update, upsert, and delete
4. DQL operations, including query, get, and hybrid_search
5. Filter operators
6. Collection information methods

## Example

```python
|
||||
import uuid
|
||||
import random
|
||||
import pyseekdb
|
||||
|
||||
# ============================================================================
|
||||
# PART 1: CLIENT CONNECTION
|
||||
# ============================================================================
|
||||
|
||||
# Option 1: Embedded mode (local SeekDB)
|
||||
client = pyseekdb.Client(
|
||||
#path="./seekdb",
|
||||
#database="test"
|
||||
)
|
||||
|
||||
# Option 2: Server mode (remote SeekDB server)
|
||||
# client = pyseekdb.Client(
|
||||
# host="127.0.0.1",
|
||||
# port=2881,
|
||||
# database="test",
|
||||
# user="root",
|
||||
# password=""
|
||||
# )
|
||||
|
||||
# Option 3: Remote server mode (OceanBase Server)
|
||||
# client = pyseekdb.Client(
|
||||
# host="127.0.0.1",
|
||||
# port=2881,
|
||||
# tenant="test", # OceanBase default tenant
|
||||
# database="test",
|
||||
# user="root",
|
||||
# password=""
|
||||
# )
|
||||
|
||||
# ============================================================================
|
||||
# PART 2: COLLECTION MANAGEMENT
|
||||
# ============================================================================
|
||||
|
||||
collection_name = "comprehensive_example"
|
||||
dimension = 128
|
||||
|
||||
# 2.1 Create a collection
|
||||
from pyseekdb import HNSWConfiguration
|
||||
config = HNSWConfiguration(dimension=dimension, distance='cosine')
|
||||
collection = client.get_or_create_collection(
|
||||
name=collection_name,
|
||||
configuration=config,
|
||||
embedding_function=None # Explicitly set to None since we're using custom 128-dim embeddings
|
||||
)
|
||||
|
||||
# 2.2 Check if collection exists
|
||||
exists = client.has_collection(collection_name)
|
||||
|
||||
# 2.3 Get collection object
|
||||
retrieved_collection = client.get_collection(collection_name, embedding_function=None)
|
||||
|
||||
# 2.4 List all collections
|
||||
all_collections = client.list_collections()
|
||||
|
||||
# 2.5 Get or create collection (creates if doesn't exist)
|
||||
config2 = HNSWConfiguration(dimension=64, distance='cosine')
|
||||
collection2 = client.get_or_create_collection(
|
||||
name="another_collection",
|
||||
configuration=config2,
|
||||
embedding_function=None # Explicitly set to None since we're using custom 64-dim embeddings
|
||||
)
|
||||
|
||||
# ============================================================================
|
||||
# PART 3: DML OPERATIONS - ADD DATA
|
||||
# ============================================================================
|
||||
|
||||
# Generate sample data
|
||||
random.seed(42)
|
||||
documents = [
|
||||
"Machine learning is transforming the way we solve problems",
|
||||
"Python programming language is widely used in data science",
|
||||
"Vector databases enable efficient similarity search",
|
||||
"Neural networks mimic the structure of the human brain",
|
||||
"Natural language processing helps computers understand human language",
|
||||
"Deep learning requires large amounts of training data",
|
||||
"Reinforcement learning agents learn through trial and error",
|
||||
"Computer vision enables machines to interpret visual information"
|
||||
]
|
||||
|
||||
# Generate embeddings (in real usage, use an embedding model)
|
||||
embeddings = []
|
||||
for i in range(len(documents)):
|
||||
vector = [random.random() for _ in range(dimension)]
|
||||
embeddings.append(vector)
|
||||
|
||||
ids = [str(uuid.uuid4()) for _ in documents]
|
||||
|
||||
# 3.1 Add single item
|
||||
single_id = str(uuid.uuid4())
|
||||
collection.add(
|
||||
ids=single_id,
|
||||
documents="This is a single document",
|
||||
embeddings=[random.random() for _ in range(dimension)],
|
||||
metadatas={"type": "single", "category": "test"}
|
||||
)
|
||||
|
||||
# 3.2 Add multiple items
|
||||
collection.add(
|
||||
ids=ids,
|
||||
documents=documents,
|
||||
embeddings=embeddings,
|
||||
metadatas=[
|
||||
{"category": "AI", "score": 95, "tag": "ml", "year": 2023},
|
||||
{"category": "Programming", "score": 88, "tag": "python", "year": 2022},
|
||||
{"category": "Database", "score": 92, "tag": "vector", "year": 2023},
|
||||
{"category": "AI", "score": 90, "tag": "neural", "year": 2022},
|
||||
{"category": "NLP", "score": 87, "tag": "language", "year": 2023},
|
||||
{"category": "AI", "score": 93, "tag": "deep", "year": 2023},
|
||||
{"category": "AI", "score": 85, "tag": "reinforcement", "year": 2022},
|
||||
{"category": "CV", "score": 91, "tag": "vision", "year": 2023}
|
||||
]
|
||||
)
|
||||
|
||||
# 3.3 Add with only embeddings (no documents)
|
||||
vector_only_ids = [str(uuid.uuid4()) for _ in range(2)]
|
||||
collection.add(
|
||||
ids=vector_only_ids,
|
||||
embeddings=[[random.random() for _ in range(dimension)] for _ in range(2)],
|
||||
metadatas=[{"type": "vector_only"}, {"type": "vector_only"}]
|
||||
)
|
||||
|
||||
# ============================================================================
|
||||
# PART 4: DML OPERATIONS - UPDATE DATA
|
||||
# ============================================================================
|
||||
|
||||
# 4.1 Update single item
|
||||
collection.update(
|
||||
ids=ids[0],
|
||||
metadatas={"category": "AI", "score": 98, "tag": "ml", "year": 2024, "updated": True}
|
||||
)
|
||||
|
||||
# 4.2 Update multiple items
|
||||
collection.update(
|
||||
ids=ids[1:3],
|
||||
documents=["Updated document 1", "Updated document 2"],
|
||||
embeddings=[[random.random() for _ in range(dimension)] for _ in range(2)],
|
||||
metadatas=[
|
||||
{"category": "Programming", "score": 95, "updated": True},
|
||||
{"category": "Database", "score": 97, "updated": True}
|
||||
]
|
||||
)
|
||||
|
||||
# 4.3 Update embeddings
|
||||
new_embeddings = [[random.random() for _ in range(dimension)] for _ in range(2)]
|
||||
collection.update(
|
||||
ids=ids[2:4],
|
||||
embeddings=new_embeddings
|
||||
)

# ============================================================================
# PART 5: DML OPERATIONS - UPSERT DATA
# ============================================================================

# 5.1 Upsert existing item (will update)
collection.upsert(
    ids=ids[0],
    documents="Upserted document (was updated)",
    embeddings=[random.random() for _ in range(dimension)],
    metadatas={"category": "AI", "upserted": True}
)

# 5.2 Upsert new item (will insert)
new_id = str(uuid.uuid4())
collection.upsert(
    ids=new_id,
    documents="This is a new document from upsert",
    embeddings=[random.random() for _ in range(dimension)],
    metadatas={"category": "New", "upserted": True}
)

# 5.3 Upsert multiple items
upsert_ids = [ids[4], str(uuid.uuid4())]  # One existing, one new
collection.upsert(
    ids=upsert_ids,
    documents=["Upserted doc 1", "Upserted doc 2"],
    embeddings=[[random.random() for _ in range(dimension)] for _ in range(2)],
    metadatas=[{"upserted": True}, {"upserted": True}]
)

# ============================================================================
# PART 6: DQL OPERATIONS - QUERY (VECTOR SIMILARITY SEARCH)
# ============================================================================

# 6.1 Basic vector similarity query
query_vector = embeddings[0]  # Query with first document's vector
results = collection.query(
    query_embeddings=query_vector,
    n_results=3
)
print(f"Query results: {len(results['ids'][0])} items")

# 6.2 Query with metadata filter (simplified equality)
results = collection.query(
    query_embeddings=query_vector,
    where={"category": "AI"},
    n_results=5
)

# 6.3 Query with comparison operators
results = collection.query(
    query_embeddings=query_vector,
    where={"score": {"$gte": 90}},
    n_results=5
)

# 6.4 Query with $in operator
results = collection.query(
    query_embeddings=query_vector,
    where={"tag": {"$in": ["ml", "python", "neural"]}},
    n_results=5
)

# 6.5 Query with logical operators ($or) - simplified equality
results = collection.query(
    query_embeddings=query_vector,
    where={
        "$or": [
            {"category": "AI"},
            {"tag": "python"}
        ]
    },
    n_results=5
)

# 6.6 Query with logical operators ($and) - simplified equality
results = collection.query(
    query_embeddings=query_vector,
    where={
        "$and": [
            {"category": "AI"},
            {"score": {"$gte": 90}}
        ]
    },
    n_results=5
)

# 6.7 Query with document filter
results = collection.query(
    query_embeddings=query_vector,
    where_document={"$contains": "machine learning"},
    n_results=5
)

# 6.8 Query with combined filters (simplified equality)
results = collection.query(
    query_embeddings=query_vector,
    where={"category": "AI", "year": {"$gte": 2023}},
    where_document={"$contains": "learning"},
    n_results=5
)

# 6.9 Query with multiple embeddings (batch query)
batch_embeddings = [embeddings[0], embeddings[1]]
batch_results = collection.query(
    query_embeddings=batch_embeddings,
    n_results=2
)
# batch_results["ids"][0] contains results for first query
# batch_results["ids"][1] contains results for second query

# 6.10 Query with specific fields
results = collection.query(
    query_embeddings=query_vector,
    include=["documents", "metadatas", "embeddings"],
    n_results=2
)
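
# 6.11 Pairing up query results (helper sketch)
# The query() result is column-oriented: results["ids"][i], results["documents"][i]
# and results["metadatas"][i] are parallel lists for the i-th query vector.
# The helper below is an illustrative sketch, not part of the pyseekdb API;
# it pairs those parallel lists into per-item rows:
def rows_from_query(res, qi=0):
    return list(zip(res["ids"][qi], res["documents"][qi], res["metadatas"][qi]))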

# ============================================================================
# PART 7: DQL OPERATIONS - GET (RETRIEVE BY IDS OR FILTERS)
# ============================================================================

# 7.1 Get by single ID
result = collection.get(ids=ids[0])
# result["ids"] contains [ids[0]]
# result["documents"] contains document for ids[0]

# 7.2 Get by multiple IDs
results = collection.get(ids=ids[:3])
# results["ids"] contains ids[:3]
# results["documents"] contains documents for all IDs

# 7.3 Get by metadata filter (simplified equality)
results = collection.get(
    where={"category": "AI"},
    limit=5
)

# 7.4 Get with comparison operators
results = collection.get(
    where={"score": {"$gte": 90}},
    limit=5
)

# 7.5 Get with $in operator
results = collection.get(
    where={"tag": {"$in": ["ml", "python"]}},
    limit=5
)

# 7.6 Get with logical operators (simplified equality)
results = collection.get(
    where={
        "$or": [
            {"category": "AI"},
            {"category": "Programming"}
        ]
    },
    limit=5
)

# 7.7 Get by document filter
results = collection.get(
    where_document={"$contains": "Python"},
    limit=5
)

# 7.8 Get with pagination
results_page1 = collection.get(limit=2, offset=0)
results_page2 = collection.get(limit=2, offset=2)

# 7.9 Get with specific fields
results = collection.get(
    ids=ids[:2],
    include=["documents", "metadatas", "embeddings"]
)

# 7.10 Get all data (up to the given limit)
all_results = collection.get(limit=100)

# ============================================================================
# PART 8: DQL OPERATIONS - HYBRID SEARCH
# ============================================================================

# 8.1 Hybrid search with full-text and vector search
# Note: This requires query_embeddings to be provided directly
# In real usage, you might have an embedding function
hybrid_results = collection.hybrid_search(
    query={
        "where_document": {"$contains": "machine learning"},
        "where": {"category": "AI"},  # Simplified equality
        "n_results": 10
    },
    knn={
        "query_embeddings": [embeddings[0]],
        "where": {"year": {"$gte": 2022}},
        "n_results": 10
    },
    rank={"rrf": {}},  # Reciprocal Rank Fusion
    n_results=5,
    include=["documents", "metadatas"]
)
# hybrid_results["ids"][0] contains IDs for the hybrid search
# hybrid_results["documents"][0] contains documents for the hybrid search
print(f"Hybrid search: {len(hybrid_results.get('ids', [[]])[0])} results")
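
# 8.2 How RRF fusion works (illustrative)
# Reciprocal Rank Fusion scores each ID as sum(1 / (k + rank)) over the
# ranked lists it appears in (k is commonly 60). Illustrative sketch only -
# the actual fusion happens server-side and its parameters may differ:
def rrf_fuse(ranked_lists, k=60):
    scores = {}
    for ranked in ranked_lists:
        for rank, item_id in enumerate(ranked, start=1):
            scores[item_id] = scores.get(item_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)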

# ============================================================================
# PART 9: DML OPERATIONS - DELETE DATA
# ============================================================================

# 9.1 Delete by IDs
delete_ids = [vector_only_ids[0], new_id]
collection.delete(ids=delete_ids)

# 9.2 Delete by metadata filter
collection.delete(where={"type": {"$eq": "vector_only"}})

# 9.3 Delete by document filter
collection.delete(where_document={"$contains": "Updated document"})

# 9.4 Delete with combined filters
collection.delete(
    where={"category": {"$eq": "CV"}},
    where_document={"$contains": "vision"}
)

# ============================================================================
# PART 10: COLLECTION INFORMATION
# ============================================================================

# 10.1 Get collection count
count = collection.count()
print(f"Collection count: {count} items")

# 10.2 Preview first few items in collection (returns all columns by default)
preview = collection.peek(limit=5)
print(f"Preview: {len(preview['ids'])} items")
for i in range(len(preview['ids'])):
    print(f" ID: {preview['ids'][i]}, Document: {preview['documents'][i]}")
    print(f" Metadata: {preview['metadatas'][i]}, Embedding dim: {len(preview['embeddings'][i]) if preview['embeddings'][i] else 0}")

# 10.3 Count collections in database
collection_count = client.count_collection()
print(f"Database has {collection_count} collections")

# ============================================================================
# PART 11: CLEANUP
# ============================================================================

# Delete test collections
try:
    client.delete_collection("another_collection")
except Exception as e:
    print(f"Could not delete 'another_collection': {e}")

# Delete the main collection
client.delete_collection(collection_name)
```
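
The `limit`/`offset` pagination shown in step 7.8 can be wrapped in a small generator that walks an entire collection page by page. This is an illustrative sketch, not part of the pyseekdb API; it assumes only that `collection.get(limit=..., offset=...)` behaves as shown above:

```python
def iter_pages(collection, page_size=100):
    """Yield successive result pages from collection.get() until exhausted."""
    offset = 0
    while True:
        page = collection.get(limit=page_size, offset=offset)
        if not page["ids"]:  # An empty page means we've read everything
            break
        yield page
        offset += len(page["ids"])
```

You can then process arbitrarily large collections without loading everything at once, for example `for page in iter_pages(collection): ...`.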

## References

* For information about the API interfaces supported by pyseekdb, see [API Reference](../../50.apis/10.api-overview.md).
* [Simple Example](../50.sdk-samples/10.pyseekdb-simple-sample.md)
* [Hybrid Search Example](../50.sdk-samples/100.pyseekdb-hybrid-search-sample.md)