Initial commit

Zhongwei Li
2025-11-30 08:30:10 +08:00
commit f0bd18fb4e
824 changed files with 331919 additions and 0 deletions

# LaminDB Annotation & Validation
This document covers data curation, validation, schema management, and annotation best practices in LaminDB.
## Overview
LaminDB's curation process ensures datasets are both validated and queryable through three essential steps:
1. **Validation**: Confirming datasets match desired schemas
2. **Standardization**: Fixing inconsistencies like typos and mapping synonyms
3. **Annotation**: Linking datasets to metadata entities so they become queryable (compressed into the sketch below)
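The sketch below compresses these three steps into a few lines; `df` and `schema` stand for a DataFrame and a saved schema of the kind defined in the sections that follow.
```python
import lamindb as ln

# df and schema are assumed to exist, as set up in the workflow below
curator = ln.curators.DataFrameCurator(df, schema)
if not curator.validate():                # 1. validation
    curator.cat.standardize("cell_type")  # 2. standardization (example column)
    curator.validate()
artifact = curator.save_artifact(         # 3. annotation: validated metadata is linked on save
    key="curated/experiment.parquet"
)
```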
## Schema Design
Schemas define expected data structure, types, and validation rules. LaminDB supports three main schema approaches:
### 1. Flexible Schema
Validates only columns matching Feature registry names, allowing additional metadata:
```python
import lamindb as ln
# Create flexible schema
schema = ln.Schema(
name="valid_features",
itype=ln.Feature # Validates against Feature registry
).save()
# Any column matching a Feature name will be validated
# Additional columns are permitted but not validated
```
### 2. Minimal Required Schema
Specifies essential columns while permitting extra metadata:
```python
# Define required features
required_features = [
ln.Feature.get(name="cell_type"),
ln.Feature.get(name="tissue"),
ln.Feature.get(name="donor_id")
]
# Create schema with required features
schema = ln.Schema(
name="minimal_immune_schema",
features=required_features,
flexible=True # Allows additional columns
).save()
```
### 3. Strict Schema
Enforces complete control over data structure:
```python
# Define all allowed features
all_features = [
ln.Feature.get(name="cell_type"),
ln.Feature.get(name="tissue"),
ln.Feature.get(name="donor_id"),
ln.Feature.get(name="disease")
]
# Create strict schema
schema = ln.Schema(
name="strict_immune_schema",
features=all_features,
flexible=False # No additional columns allowed
).save()
```
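A small sketch contrasting how the minimal (flexible) and strict schemas defined above treat the same DataFrame when it carries an extra, unregistered column; the toy data is illustrative.
```python
import pandas as pd
import lamindb as ln

df = pd.DataFrame({
    "cell_type": ["T cell"],
    "tissue": ["blood"],
    "donor_id": ["D1"],
    "disease": ["healthy"],
    "notes": ["free-text column not registered as a Feature"],
})

flexible = ln.Schema.get(name="minimal_immune_schema")
strict = ln.Schema.get(name="strict_immune_schema")

# Flexible: the unregistered "notes" column is carried along unvalidated
ln.curators.DataFrameCurator(df, flexible).validate()
# Strict: the unregistered "notes" column is reported as a violation
ln.curators.DataFrameCurator(df, strict).validate()
```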
## DataFrame Curation Workflow
The typical curation process involves six key steps:
### Step 1-2: Load Data and Establish Registries
```python
import pandas as pd
import lamindb as ln
# Load data
df = pd.read_csv("experiment.csv")
# Define and save features
ln.Feature(name="cell_type", dtype=str).save()
ln.Feature(name="tissue", dtype=str).save()
ln.Feature(name="gene_count", dtype=int).save()
ln.Feature(name="experiment_date", dtype="date").save()
# Populate valid values (if using controlled vocabulary)
import bionty as bt
bt.CellType.import_source()
bt.Tissue.import_source()
```
### Step 3: Create Schema
```python
# Link features to schema
features = [
ln.Feature.get(name="cell_type"),
ln.Feature.get(name="tissue"),
ln.Feature.get(name="gene_count"),
ln.Feature.get(name="experiment_date")
]
schema = ln.Schema(
name="experiment_schema",
features=features,
flexible=True
).save()
```
### Step 4: Initialize Curator and Validate
```python
# Initialize curator
curator = ln.curators.DataFrameCurator(df, schema)
# Validate dataset
validation = curator.validate()
# Check validation results
if validation:
print("✓ Validation passed")
else:
print("✗ Validation failed")
curator.non_validated # See problematic fields
```
### Step 5: Fix Validation Issues
#### Standardize Values
```python
# Fix typos and synonyms in categorical columns
curator.cat.standardize("cell_type")
curator.cat.standardize("tissue")
# View standardization mapping
curator.cat.inspect_standardize("cell_type")
```
#### Map to Ontologies
```python
# Map values to ontology terms
curator.cat.add_ontology("cell_type", bt.CellType)
curator.cat.add_ontology("tissue", bt.Tissue)
# Look up public ontologies for unmapped terms
curator.cat.lookup(public=True).cell_type # Interactive lookup
```
#### Add New Terms
```python
# Add new valid terms to registry
curator.cat.add_new_from("cell_type")
# Or manually create records
new_cell_type = bt.CellType(name="my_novel_cell_type").save()
```
#### Rename Columns
```python
# Rename columns to match feature names
df = df.rename(columns={"celltype": "cell_type"})
# Re-initialize curator with fixed DataFrame
curator = ln.curators.DataFrameCurator(df, schema)
```
### Step 6: Save Curated Artifact
```python
# Save with schema linkage
artifact = curator.save_artifact(
key="experiments/curated_data.parquet",
description="Validated and annotated experimental data"
)
# Verify artifact has schema
artifact.schema # Returns the schema object
artifact.describe() # Shows validation status
```
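As a quick check that annotation made the dataset queryable, a feature-based query (covered in more detail under "Querying Validated Data") should now return the artifact, assuming the curated table contains T cells:
```python
# The curated artifact is discoverable via its annotated feature values
ln.Artifact.filter(cell_type="T cell").to_dataframe()
```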
## AnnData Curation
For composite structures like AnnData, use "slots" to validate different components:
### Defining AnnData Schemas
```python
# Create schemas for different slots
obs_schema = ln.Schema(
name="cell_metadata",
features=[
ln.Feature.get(name="cell_type"),
ln.Feature.get(name="tissue"),
ln.Feature.get(name="donor_id")
]
).save()
var_schema = ln.Schema(
name="gene_ids",
features=[ln.Feature.get(name="ensembl_gene_id")]
).save()
# Create composite AnnData schema
anndata_schema = ln.Schema(
name="scrna_schema",
otype="AnnData",
slots={
"obs": obs_schema,
"var.T": var_schema # .T indicates transposition
}
).save()
```
### Curating AnnData Objects
```python
import anndata as ad
# Load AnnData
adata = ad.read_h5ad("data.h5ad")
# Initialize curator
curator = ln.curators.AnnDataCurator(adata, anndata_schema)
# Validate all slots
validation = curator.validate()
# Fix issues by slot
curator.cat.standardize("obs", "cell_type")
curator.cat.add_ontology("obs", "cell_type", bt.CellType)
curator.cat.standardize("var.T", "ensembl_gene_id")
# Save curated artifact
artifact = curator.save_artifact(
key="scrna/validated_data.h5ad",
description="Curated single-cell RNA-seq data"
)
```
## MuData Curation
MuData supports multi-modal data through modality-specific slots:
```python
# Define schemas for each modality
rna_obs_schema = ln.Schema(name="rna_obs_schema", features=[...]).save()
protein_obs_schema = ln.Schema(name="protein_obs_schema", features=[...]).save()
# Create MuData schema
mudata_schema = ln.Schema(
name="multimodal_schema",
otype="MuData",
slots={
"rna:obs": rna_obs_schema,
"protein:obs": protein_obs_schema
}
).save()
# Curate
curator = ln.curators.MuDataCurator(mdata, mudata_schema)
curator.validate()
```
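For context, a minimal sketch of assembling the `mdata` object referenced above from two AnnData modalities using the `mudata` package; the toy matrices are placeholders.
```python
import anndata as ad
import mudata as md
import numpy as np
import pandas as pd

obs = pd.DataFrame(index=[f"cell_{i}" for i in range(4)])
rna = ad.AnnData(X=np.random.rand(4, 3), obs=obs.copy())
protein = ad.AnnData(X=np.random.rand(4, 2), obs=obs.copy())

# Modality names ("rna", "protein") match the slot prefixes in the schema above
mdata = md.MuData({"rna": rna, "protein": protein})
```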
## SpatialData Curation
For spatial transcriptomics data:
```python
# Define spatial schema
spatial_schema = ln.Schema(
name="spatial_schema",
otype="SpatialData",
slots={
"tables:cell_metadata.obs": cell_schema,
"attrs:bio": bio_metadata_schema
}
).save()
# Curate
curator = ln.curators.SpatialDataCurator(sdata, spatial_schema)
curator.validate()
```
## TileDB-SOMA Curation
For scalable array-backed data:
```python
# Define SOMA schema
soma_schema = ln.Schema(
name="soma_schema",
otype="tiledbsoma",
slots={
"obs": obs_schema,
"ms:RNA.T": var_schema # measurement:modality.T
}
).save()
# Curate
curator = ln.curators.TileDBSOMACurator(soma_exp, soma_schema)
curator.validate()
```
## Feature Validation
### Data Type Validation
```python
# Define typed features
ln.Feature(name="age", dtype=int).save()
ln.Feature(name="weight", dtype=float).save()
ln.Feature(name="is_treated", dtype=bool).save()
ln.Feature(name="collection_date", dtype="date").save()
# Coerce types during validation
ln.Feature(name="age_str", dtype=int, coerce_dtype=True).save() # Auto-convert strings to int
```
### Value Validation
```python
# Validate against allowed values
cell_type_feature = ln.Feature(name="cell_type", dtype=str).save()
# Link to registry for controlled vocabulary
cell_type_feature.link_to_registry(bt.CellType)
# Now validation checks against CellType registry
curator = ln.curators.DataFrameCurator(df, schema)
curator.validate() # Errors if cell_type values not in registry
```
## Standardization Strategies
### Using Public Ontologies
```python
# Look up standardized terms from public sources
curator.cat.lookup(public=True).cell_type
# Returns auto-complete object with public ontology terms
# User can select correct term interactively
```
### Synonym Mapping
```python
# Add synonyms to records
t_cell = bt.CellType.get(name="T cell")
t_cell.add_synonym("T lymphocyte")
t_cell.add_synonym("T-cell")
# Now standardization maps synonyms automatically
curator.cat.standardize("cell_type")
# "T lymphocyte" → "T cell"
# "T-cell" → "T cell"
```
### Custom Standardization
```python
# Manual mapping
mapping = {
"TCell": "T cell",
"t cell": "T cell",
"T-cells": "T cell"
}
# Apply mapping
df["cell_type"] = df["cell_type"].map(lambda x: mapping.get(x, x))
```
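Equivalently, pandas' `Series.replace` applies the same mapping without a lambda:
```python
# Unmapped values pass through unchanged, mirroring the dict.get fallback above
df["cell_type"] = df["cell_type"].replace(mapping)
```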
## Handling Validation Errors
### Common Issues and Solutions
**Issue: Column not in schema**
```python
# Solution 1: Rename column
df = df.rename(columns={"old_name": "feature_name"})
# Solution 2: Add feature to schema
new_feature = ln.Feature(name="new_column", dtype=str).save()
schema.features.add(new_feature)
```
**Issue: Invalid values**
```python
# Solution 1: Standardize
curator.cat.standardize("column_name")
# Solution 2: Add new valid values
curator.cat.add_new_from("column_name")
# Solution 3: Map to ontology
curator.cat.add_ontology("column_name", bt.Registry)
```
**Issue: Data type mismatch**
```python
# Solution 1: Convert data type
df["column"] = df["column"].astype(int)
# Solution 2: Enable coercion in feature
feature = ln.Feature.get(name="column")
feature.coerce_dtype = True
feature.save()
```
## Schema Versioning
Schemas can be versioned like other records:
```python
# Create initial schema
schema_v1 = ln.Schema(name="experiment_schema", features=[...]).save()
# Update schema with new features
schema_v2 = ln.Schema(
name="experiment_schema",
features=[...], # Updated list
version="2"
).save()
# Link artifacts to specific schema versions
artifact.schema = schema_v2
artifact.save()
```
## Querying Validated Data
Once data is validated and annotated, it becomes queryable:
```python
# Find all validated artifacts
ln.Artifact.filter(is_valid=True).to_dataframe()
# Find artifacts with specific schema
ln.Artifact.filter(schema=schema).to_dataframe()
# Query by annotated features
ln.Artifact.filter(cell_type="T cell", tissue="blood").to_dataframe()
# Include features in results
ln.Artifact.filter(is_valid=True).to_dataframe(include="features")
```
## Best Practices
1. **Define features first**: Create Feature registry before curation
2. **Use public ontologies**: Leverage `lookup(public=True)` on bionty registries (e.g. `bt.CellType.lookup(public=True)`) for standardization
3. **Start flexible**: Use flexible schemas initially, tighten as understanding grows
4. **Document slots**: Clearly specify transposition (.T) in composite schemas
5. **Standardize early**: Fix typos and synonyms before validation
6. **Validate incrementally**: Check each slot separately for composite structures
7. **Version schemas**: Track schema changes over time
8. **Add synonyms**: Register common variations to simplify future curation
9. **Coerce types cautiously**: Enable dtype coercion only when safe
10. **Test on samples**: Validate small subsets before full dataset curation (see the sketch after this list)
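A minimal sketch of practice 10, assuming the `df` and `schema` from the workflow above: validate a random sample before committing to curating the full table.
```python
# Validate a small random sample first to catch schema problems cheaply
sample_df = df.sample(n=min(len(df), 500), random_state=0)
if ln.curators.DataFrameCurator(sample_df, schema).validate():
    curator = ln.curators.DataFrameCurator(df, schema)
    curator.validate()
```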
## Advanced: Custom Validators
Create custom validation logic:
```python
def validate_gene_expression(df):
"""Custom validator for gene expression values."""
# Check non-negative
if (df < 0).any().any():
return False, "Negative expression values found"
# Check reasonable range
if (df > 1e6).any().any():
return False, "Unreasonably high expression values"
return True, "Valid"
# Apply during curation
is_valid, message = validate_gene_expression(df)
if not is_valid:
print(f"Validation failed: {message}")
```
## Tracking Curation Provenance
```python
# Curated artifacts track curation lineage
ln.track() # Start tracking
# Perform curation
curator = ln.curators.DataFrameCurator(df, schema)
curator.validate()
curator.cat.standardize("cell_type")
artifact = curator.save_artifact(key="curated.parquet")
ln.finish() # Complete tracking
# View curation lineage
artifact.run.describe() # Shows curation transform
artifact.view_lineage() # Visualizes curation process
```

# LaminDB Core Concepts
This document covers the fundamental concepts and building blocks of LaminDB: Artifacts, Records, Runs, Transforms, Features, and data lineage tracking.
## Artifacts
Artifacts represent datasets in various formats (DataFrames, AnnData, SpatialData, Parquet, Zarr, etc.). They serve as the primary data objects in LaminDB.
### Creating and Saving Artifacts
**From file:**
```python
import lamindb as ln
# Save a file as artifact
ln.Artifact("sample.fasta", key="sample.fasta").save()
# With description
artifact = ln.Artifact(
"data/analysis.h5ad",
key="experiments/scrna_batch1.h5ad",
description="Single-cell RNA-seq batch 1"
).save()
```
**From DataFrame:**
```python
import pandas as pd
df = pd.read_csv("data.csv")
artifact = ln.Artifact.from_dataframe(
df,
key="datasets/processed_data.parquet",
description="Processed experimental data"
).save()
```
**From AnnData:**
```python
import anndata as ad
adata = ad.read_h5ad("data.h5ad")
artifact = ln.Artifact.from_anndata(
adata,
key="scrna/experiment1.h5ad",
description="scRNA-seq data with QC"
).save()
```
### Retrieving Artifacts
```python
# By key
artifact = ln.Artifact.get(key="sample.fasta")
# By UID
artifact = ln.Artifact.get("aRt1Fact0uid000")
# By filter
artifact = ln.Artifact.filter(suffix=".h5ad").first()
```
### Accessing Artifact Content
```python
# Get cached local path
local_path = artifact.cache()
# Load into memory
data = artifact.load() # Returns DataFrame, AnnData, etc.
# Streaming access (for large files)
with artifact.open() as f:
# Read incrementally
chunk = f.read(1000)
```
### Artifact Metadata
```python
# View all metadata
artifact.describe()
# Access specific metadata
artifact.size # File size in bytes
artifact.suffix # File extension
artifact.created_at # Timestamp
artifact.created_by # User who created it
artifact.run # Associated run
artifact.transform # Associated transform
artifact.version # Version string
```
## Records
Records represent experimental entities such as samples, perturbations, instruments, and cell lines, along with any other metadata entity you need to track. They support hierarchical relationships through type definitions.
### Creating Records
```python
# Define a type
sample_type = ln.Record(name="Sample", is_type=True).save()
# Create instances of that type
ln.Record(name="P53mutant1", type=sample_type).save()
ln.Record(name="P53mutant2", type=sample_type).save()
ln.Record(name="WT-control", type=sample_type).save()
```
### Searching Records
```python
# Text search
ln.Record.search("p53").to_dataframe()
# Filter by fields
ln.Record.filter(type=sample_type).to_dataframe()
# Get specific record
record = ln.Record.get(name="P53mutant1")
```
### Hierarchical Relationships
```python
# Establish parent-child relationships
parent_record = ln.Record.get(name="P53mutant1")
child_record = ln.Record(name="P53mutant1-replicate1", type=sample_type).save()
child_record.parents.add(parent_record)
# Query relationships
parent_record.children.to_dataframe()
child_record.parents.to_dataframe()
```
## Runs & Transforms
These capture computational lineage. A **Transform** represents a reusable analysis step (notebook, script, or function), while a **Run** documents a specific execution instance.
### Basic Tracking Workflow
```python
import lamindb as ln
# Start tracking (beginning of notebook/script)
ln.track()
# Your analysis code
data = ln.Artifact.get(key="input.csv").load()
# ... perform analysis ...
result.to_csv("output.csv")
artifact = ln.Artifact("output.csv", key="output.csv").save()
# Finish tracking (end of notebook/script)
ln.finish()
```
### Tracking with Parameters
```python
ln.track(params={
"learning_rate": 0.01,
"batch_size": 32,
"epochs": 100,
"downsample": True
})
# Query runs by parameters
ln.Run.filter(params__learning_rate=0.01).to_dataframe()
ln.Run.filter(params__downsample=True).to_dataframe()
```
### Tracking with Projects
```python
# Associate with project
ln.track(project="Cancer Drug Screen 2025")
# Query by project
project = ln.Project.get(name="Cancer Drug Screen 2025")
ln.Artifact.filter(projects=project).to_dataframe()
ln.Run.filter(project=project).to_dataframe()
```
### Function-Level Tracking
Use the `@ln.tracked()` decorator for fine-grained lineage:
```python
@ln.tracked()
def preprocess_data(input_key: str, output_key: str, normalize: bool = True) -> None:
"""Preprocess raw data and save result."""
# Load input (automatically tracked)
artifact = ln.Artifact.get(key=input_key)
data = artifact.load()
# Process
if normalize:
data = (data - data.mean()) / data.std()
# Save output (automatically tracked)
ln.Artifact.from_dataframe(data, key=output_key).save()
# Each call creates a separate Transform and Run
preprocess_data("raw/batch1.csv", "processed/batch1.csv", normalize=True)
preprocess_data("raw/batch2.csv", "processed/batch2.csv", normalize=False)
```
### Accessing Lineage Information
```python
# From artifact to run
artifact = ln.Artifact.get(key="output.csv")
run = artifact.run
transform = run.transform
# View details
run.describe() # Run metadata
transform.describe() # Transform metadata
# Access inputs
run.inputs.to_dataframe()
# Visualize lineage graph
artifact.view_lineage()
```
## Features
Features define typed metadata fields for validation and querying. They enable structured annotation and searching.
### Defining Features
```python
from datetime import date
# Numeric feature
ln.Feature(name="gc_content", dtype=float).save()
ln.Feature(name="read_count", dtype=int).save()
# Date feature
ln.Feature(name="experiment_date", dtype=date).save()
# Categorical feature
ln.Feature(name="cell_type", dtype=str).save()
ln.Feature(name="treatment", dtype=str).save()
```
### Annotating Artifacts with Features
```python
# Single values
artifact.features.add_values({
"gc_content": 0.55,
"experiment_date": "2025-10-31"
})
# Using feature registry records
gc_content_feature = ln.Feature.get(name="gc_content")
artifact.features.add(gc_content_feature)
```
### Querying by Features
```python
# Filter by feature value
ln.Artifact.filter(gc_content=0.55).to_dataframe()
ln.Artifact.filter(experiment_date="2025-10-31").to_dataframe()
# Comparison operators
ln.Artifact.filter(read_count__gt=1000000).to_dataframe()
ln.Artifact.filter(gc_content__gte=0.5, gc_content__lte=0.6).to_dataframe()
# Check for presence of annotation
ln.Artifact.filter(cell_type__isnull=False).to_dataframe()
# Include features in output
ln.Artifact.filter(treatment="DMSO").to_dataframe(include="features")
```
### Nested Dictionary Features
For complex metadata stored as dictionaries:
```python
# Access nested values
ln.Artifact.filter(study_metadata__detail1="123").to_dataframe()
ln.Artifact.filter(study_metadata__assay__type="RNA-seq").to_dataframe()
```
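A hedged sketch of how such nested metadata might be attached in the first place, assuming a dict-typed feature named `study_metadata` (the feature name and values here are illustrative):
```python
# Register a dict-typed feature for nested metadata (illustrative name)
ln.Feature(name="study_metadata", dtype=dict).save()

# Attach nested values to an artifact; they become queryable via __ lookups
artifact.features.add_values({
    "study_metadata": {"detail1": "123", "assay": {"type": "RNA-seq"}},
})
```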
## Data Lineage Tracking
LaminDB automatically captures execution context and relationships between data, code, and runs.
### What Gets Tracked
- **Source code**: Script/notebook content and git commit
- **Environment**: Python packages and versions
- **Input artifacts**: Data loaded during execution
- **Output artifacts**: Data created during execution
- **Execution metadata**: Timestamps, user, parameters
- **Computational dependencies**: Transform relationships
### Viewing Lineage
```python
# Visualize full lineage graph
artifact.view_lineage()
# View captured metadata
artifact.describe()
# Access related entities
artifact.run # The run that created it
artifact.run.transform # The transform (code) used
artifact.run.inputs # Input artifacts
artifact.run.report # Execution report
```
### Querying Lineage
```python
# Find all outputs from a transform
transform = ln.Transform.get(name="preprocessing.py")
ln.Artifact.filter(transform=transform).to_dataframe()
# Find all artifacts from a specific user
user = ln.User.get(handle="researcher123")
ln.Artifact.filter(created_by=user).to_dataframe()
# Find artifacts using specific inputs
input_artifact = ln.Artifact.get(key="raw/data.csv")
runs = ln.Run.filter(inputs=input_artifact)
ln.Artifact.filter(run__in=runs).to_dataframe()
```
## Versioning
LaminDB manages artifact versioning automatically when source data or code changes.
### Automatic Versioning
```python
# First version
artifact_v1 = ln.Artifact("data.csv", key="experiment/data.csv").save()
# Modify and save again - creates new version
# (modify data.csv)
artifact_v2 = ln.Artifact("data.csv", key="experiment/data.csv").save()
```
### Working with Versions
```python
# Get latest version (default)
artifact = ln.Artifact.get(key="experiment/data.csv")
# View all versions
artifact.versions.to_dataframe()
# Get specific version
artifact_v1 = artifact.versions.filter(version="1").first()
# Compare versions
v1_data = artifact_v1.load()
v2_data = artifact.load()
```
## Best Practices
1. **Use meaningful keys**: Structure keys hierarchically (e.g., `project/experiment/sample.h5ad`)
2. **Add descriptions**: Help future users understand artifact contents
3. **Track consistently**: Call `ln.track()` at the start of every analysis
4. **Define features upfront**: Create feature registry before annotation
5. **Use typed features**: Specify dtypes for better validation
6. **Leverage versioning**: Don't create new keys for minor changes
7. **Document transforms**: Add docstrings to tracked functions
8. **Set projects**: Group related work for easier organization and access control
9. **Query efficiently**: Use filters before loading large datasets
10. **Visualize lineage**: Use `view_lineage()` to understand data provenance

# LaminDB Data Management
This document covers querying, searching, filtering, and streaming data in LaminDB, as well as best practices for organizing and accessing datasets.
## Registry Overview
View available registries and their contents:
```python
import lamindb as ln
# View all registries across modules
ln.view()
# View latest 100 artifacts
ln.Artifact.to_dataframe()
# View other registries
ln.Transform.to_dataframe()
ln.Run.to_dataframe()
ln.User.to_dataframe()
```
## Lookup for Quick Access
For registries with fewer than 100k records, `Lookup` objects enable convenient auto-complete:
```python
# Create lookup
records = ln.Record.lookup()
# Access by name (auto-complete enabled in IDEs)
experiment_1 = records.experiment_1
sample_a = records.sample_a
# Works with biological ontologies too
import bionty as bt
cell_types = bt.CellType.lookup()
t_cell = cell_types.t_cell
```
## Retrieving Single Records
### Using get()
Retrieve exactly one record (errors if zero or multiple matches):
```python
# By UID
artifact = ln.Artifact.get("aRt1Fact0uid000")
# By field
artifact = ln.Artifact.get(key="data/experiment.h5ad")
user = ln.User.get(handle="researcher123")
# By ontology ID (for bionty)
cell_type = bt.CellType.get(ontology_id="CL:0000084")
```
### Using one() and one_or_none()
```python
# Get exactly one from QuerySet (errors if 0 or >1)
artifact = ln.Artifact.filter(key="data.csv").one()
# Get one or None (errors if >1)
artifact = ln.Artifact.filter(key="maybe_data.csv").one_or_none()
# Get first match
artifact = ln.Artifact.filter(suffix=".h5ad").first()
```
## Filtering Data
The `filter()` method returns a QuerySet for flexible retrieval:
```python
# Basic filtering
artifacts = ln.Artifact.filter(suffix=".h5ad")
artifacts.to_dataframe()
# Multiple conditions (AND logic)
artifacts = ln.Artifact.filter(
suffix=".h5ad",
created_by=user
)
# Comparison operators
ln.Artifact.filter(size__gt=1e6).to_dataframe() # Greater than
ln.Artifact.filter(size__gte=1e6).to_dataframe() # Greater than or equal
ln.Artifact.filter(size__lt=1e9).to_dataframe() # Less than
ln.Artifact.filter(size__lte=1e9).to_dataframe() # Less than or equal
# Range queries
ln.Artifact.filter(size__gte=1e6, size__lte=1e9).to_dataframe()
```
## Text and String Queries
```python
# Exact match
ln.Artifact.filter(description="Experiment 1").to_dataframe()
# Contains (case-sensitive)
ln.Artifact.filter(description__contains="RNA").to_dataframe()
# Case-insensitive contains
ln.Artifact.filter(description__icontains="rna").to_dataframe()
# Starts with
ln.Artifact.filter(key__startswith="experiments/").to_dataframe()
# Ends with
ln.Artifact.filter(key__endswith=".csv").to_dataframe()
# IN list
ln.Artifact.filter(suffix__in=[".h5ad", ".csv", ".parquet"]).to_dataframe()
```
## Feature-Based Queries
Query artifacts by their annotated features:
```python
# Filter by feature value
ln.Artifact.filter(cell_type="T cell").to_dataframe()
ln.Artifact.filter(treatment="DMSO").to_dataframe()
# Include features in output
ln.Artifact.filter(treatment="DMSO").to_dataframe(include="features")
# Nested dictionary access
ln.Artifact.filter(study_metadata__assay="RNA-seq").to_dataframe()
ln.Artifact.filter(study_metadata__detail1="123").to_dataframe()
# Check annotation status
ln.Artifact.filter(cell_type__isnull=False).to_dataframe() # Has annotation
ln.Artifact.filter(treatment__isnull=True).to_dataframe() # Missing annotation
```
## Traversing Related Registries
Django's double-underscore syntax enables queries across related tables:
```python
# Find artifacts by creator handle
ln.Artifact.filter(created_by__handle="researcher123").to_dataframe()
ln.Artifact.filter(created_by__handle__startswith="test").to_dataframe()
# Find artifacts by transform name
ln.Artifact.filter(transform__name="preprocess.py").to_dataframe()
# Find artifacts measuring specific genes
ln.Artifact.filter(feature_sets__genes__symbol="CD8A").to_dataframe()
ln.Artifact.filter(feature_sets__genes__ensembl_gene_id="ENSG00000153563").to_dataframe()
# Find runs with specific parameters
ln.Run.filter(params__learning_rate=0.01).to_dataframe()
ln.Run.filter(params__downsample=True).to_dataframe()
# Find artifacts from specific project
project = ln.Project.get(name="Cancer Study")
ln.Artifact.filter(projects=project).to_dataframe()
```
## Ordering Results
```python
# Order by field (ascending)
ln.Artifact.filter(suffix=".h5ad").order_by("created_at").to_dataframe()
# Order descending
ln.Artifact.filter(suffix=".h5ad").order_by("-created_at").to_dataframe()
# Multiple order fields
ln.Artifact.order_by("-created_at", "size").to_dataframe()
```
## Advanced Logical Queries
### OR Logic
```python
from lamindb import Q
# OR condition
artifacts = ln.Artifact.filter(
Q(suffix=".jpg") | Q(suffix=".png")
).to_dataframe()
# Complex OR with multiple conditions
artifacts = ln.Artifact.filter(
Q(suffix=".h5ad", size__gt=1e6) | Q(suffix=".csv", size__lt=1e3)
).to_dataframe()
```
### NOT Logic
```python
# Exclude condition
artifacts = ln.Artifact.filter(
~Q(suffix=".tmp")
).to_dataframe()
# Complex exclusion
artifacts = ln.Artifact.filter(
~Q(created_by__handle="testuser")
).to_dataframe()
```
### Combining AND, OR, NOT
```python
# Complex query
artifacts = ln.Artifact.filter(
(Q(suffix=".h5ad") | Q(suffix=".csv")) &
Q(size__gt=1e6) &
~Q(created_by__handle__startswith="test")
).to_dataframe()
```
## Search Functionality
Full-text search across registry fields:
```python
# Basic search
ln.Artifact.search("iris").to_dataframe()
ln.User.search("smith").to_dataframe()
# Search in specific registry
bt.CellType.search("T cell").to_dataframe()
bt.Gene.search("CD8").to_dataframe()
```
## Working with QuerySets
QuerySets are lazy; they don't hit the database until they are evaluated:
```python
# Create query (no database hit)
qs = ln.Artifact.filter(suffix=".h5ad")
# Evaluate in different ways
df = qs.to_dataframe() # As pandas DataFrame
list_records = list(qs) # As Python list
count = qs.count() # Count only
exists = qs.exists() # Boolean check
# Iteration
for artifact in qs:
print(artifact.key, artifact.size)
# Slicing
first_10 = qs[:10]
next_10 = qs[10:20]
```
## Chaining Filters
```python
# Build query incrementally
qs = ln.Artifact.filter(suffix=".h5ad")
qs = qs.filter(size__gt=1e6)
qs = qs.filter(created_at__year=2025)
qs = qs.order_by("-created_at")
# Execute
results = qs.to_dataframe()
```
## Streaming Large Datasets
For datasets too large to fit in memory, use streaming access:
### Streaming Files
```python
# Open file stream
artifact = ln.Artifact.get(key="large_file.csv")
with artifact.open() as f:
# Read in chunks
chunk = f.read(10000) # Read 10KB
# Process chunk
```
### Array Slicing
For array-based formats (Zarr, HDF5, AnnData):
```python
# Get backing file without loading
artifact = ln.Artifact.get(key="large_data.h5ad")
adata = artifact.backed() # Returns backed AnnData
# Slice specific portions
subset = adata[:1000, :] # First 1000 cells
genes_of_interest = adata[:, ["CD4", "CD8A", "CD8B"]]
# Stream batches
for i in range(0, adata.n_obs, 1000):
batch = adata[i:i+1000, :]
# Process batch
```
### Iterator Access
```python
# Process large collections incrementally
artifacts = ln.Artifact.filter(suffix=".fastq.gz")
for artifact in artifacts.iterator(chunk_size=10):
# Process 10 at a time
path = artifact.cache()
# Analyze file
```
## Aggregation and Statistics
```python
# Count records
ln.Artifact.filter(suffix=".h5ad").count()
# Distinct values
ln.Artifact.values_list("suffix", flat=True).distinct()
# Aggregation (requires Django ORM knowledge)
from django.db.models import Sum, Avg, Max, Min
# Total size of all artifacts
ln.Artifact.aggregate(Sum("size"))
# Average artifact size by suffix
ln.Artifact.values("suffix").annotate(avg_size=Avg("size"))
```
## Caching and Performance
```python
# Check cache location
ln.settings.cache_dir
# Configure the cache directory from the shell:
#   lamin cache set /path/to/cache
# Clear cache for specific artifact
artifact.delete_cache()
# Get cached path (downloads if needed)
path = artifact.cache()
# Check if cached
if artifact.is_cached():
path = artifact.cache()
```
## Organizing Data with Keys
Best practices for structuring keys:
```python
# Hierarchical organization
ln.Artifact("data.h5ad", key="project/experiment/batch1/data.h5ad").save()
ln.Artifact("data.h5ad", key="scrna/2025/oct/sample_001.h5ad").save()
# Browse by prefix
ln.Artifact.filter(key__startswith="scrna/2025/oct/").to_dataframe()
# Version in key (alternative to built-in versioning)
ln.Artifact("data.h5ad", key="data/processed/v1/final.h5ad").save()
ln.Artifact("data.h5ad", key="data/processed/v2/final.h5ad").save()
```
## Collections
Group related artifacts into collections:
```python
# Create collection
collection = ln.Collection(
[artifact1, artifact2, artifact3],
name="scRNA-seq batch 1-3",
description="Complete dataset across three batches"
).save()
# Access collection members
for artifact in collection.artifacts:
print(artifact.key)
# Query collections
ln.Collection.filter(name__contains="batch").to_dataframe()
```
## Best Practices
1. **Use filters before loading**: Query metadata before accessing file contents
2. **Leverage QuerySets**: Build queries incrementally for complex conditions
3. **Stream large files**: Don't load entire datasets into memory unnecessarily
4. **Structure keys hierarchically**: Makes browsing and filtering easier
5. **Use search for discovery**: When you don't know exact field values
6. **Cache strategically**: Configure cache location based on storage capacity
7. **Index features**: Define features upfront for efficient feature-based queries
8. **Use collections**: Group related artifacts for dataset-level operations
9. **Order results**: Sort by creation date or other fields for consistent retrieval
10. **Check existence**: Use `exists()` or `one_or_none()` to avoid errors (see the sketch after this list)
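A small sketch of practice 10: guarded retrieval that avoids exceptions when a record may not exist yet (the key is illustrative).
```python
# Returns None instead of raising when no artifact matches
artifact = ln.Artifact.filter(key="experiments/data.h5ad").one_or_none()
if artifact is None:
    artifact = ln.Artifact("data.h5ad", key="experiments/data.h5ad").save()
```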
## Common Query Patterns
```python
# Recent artifacts
ln.Artifact.order_by("-created_at")[:10].to_dataframe()
# My artifacts
me = ln.setup.settings.user
ln.Artifact.filter(created_by=me).to_dataframe()
# Large files
ln.Artifact.filter(size__gt=1e9).order_by("-size").to_dataframe()
# This month's data
from datetime import datetime
ln.Artifact.filter(
created_at__year=2025,
created_at__month=10
).to_dataframe()
# Validated datasets with specific features
ln.Artifact.filter(
is_valid=True,
cell_type__isnull=False
).to_dataframe(include="features")
```

# LaminDB Integrations
This document covers LaminDB integrations with workflow managers, MLOps platforms, visualization tools, and other third-party systems.
## Overview
LaminDB supports extensive integrations across data storage, computational workflows, machine learning platforms, and visualization tools, enabling seamless incorporation into existing data science and bioinformatics pipelines.
## Data Storage Integrations
### Local Filesystem
```python
import lamindb as ln
# Initialize with local storage (run in the shell):
#   lamin init --storage ./mydata
# Save artifacts to local storage
artifact = ln.Artifact("data.csv", key="local/data.csv").save()
# Load from local storage
data = artifact.load()
```
### AWS S3
```python
# Initialize with S3 storage (run in the shell):
#   lamin init --storage s3://my-bucket/path \
#       --db postgresql://user:pwd@host:port/db
# Artifacts automatically sync to S3
artifact = ln.Artifact("data.csv", key="experiments/data.csv").save()
# Transparent S3 access
data = artifact.load() # Downloads from S3 if not cached
```
### S3-Compatible Services
Support for MinIO, Cloudflare R2, and other S3-compatible endpoints:
```bash
# Initialize with custom S3 endpoint
lamin init --storage 's3://bucket?endpoint_url=http://minio.example.com:9000'
# Configure credentials
export AWS_ACCESS_KEY_ID=minioadmin
export AWS_SECRET_ACCESS_KEY=minioadmin
```
### Google Cloud Storage
```python
# Install GCP extras and initialize with GCS (run in the shell):
#   pip install 'lamindb[gcp]'
#   lamin init --storage gs://my-bucket/path \
#       --db postgresql://user:pwd@host:port/db
# Artifacts sync to GCS
artifact = ln.Artifact("data.csv", key="experiments/data.csv").save()
```
### HTTP/HTTPS (Read-Only)
```python
# Access remote files without copying
artifact = ln.Artifact(
"https://example.com/data.csv",
key="remote/data.csv"
).save()
# Stream remote content
with artifact.open() as f:
data = f.read()
```
### HuggingFace Datasets
```python
# Access HuggingFace datasets
from datasets import load_dataset
dataset = load_dataset("squad", split="train")
# Register as LaminDB artifact
artifact = ln.Artifact.from_dataframe(
dataset.to_pandas(),
key="hf/squad_train.parquet",
description="SQuAD training data from HuggingFace"
).save()
```
## Workflow Manager Integrations
### Nextflow
Track Nextflow pipeline execution and outputs:
```python
# In your Nextflow process script
import lamindb as ln
# Initialize tracking
ln.track()
# Your Nextflow process logic
input_artifact = ln.Artifact.get(key="${input_key}")
data = input_artifact.load()
# Process data
result = process_data(data)
# Save output
output_artifact = ln.Artifact.from_dataframe(
result,
key="${output_key}"
).save()
ln.finish()
```
**Nextflow config example:**
```nextflow
process ANALYZE {
input:
val input_key
output:
path "result.csv"
script:
"""
#!/usr/bin/env python
import lamindb as ln
ln.track()
artifact = ln.Artifact.get(key="${input_key}")
# Process and save
ln.finish()
"""
}
```
### Snakemake
Integrate LaminDB into Snakemake workflows:
```python
# In Snakemake rule
rule process_data:
input:
"data/input.csv"
output:
"data/output.csv"
run:
import lamindb as ln
ln.track()
# Load input artifact
artifact = ln.Artifact.get(key="inputs/data.csv")
data = artifact.load()
# Process
result = analyze(data)
# Save output
result.to_csv(output[0])
ln.Artifact(output[0], key="outputs/result.csv").save()
ln.finish()
```
### Redun
Track Redun task execution:
```python
from redun import task
import lamindb as ln
@task()
@ln.tracked()
def process_dataset(input_key: str, output_key: str):
"""Redun task with LaminDB tracking."""
# Load input
artifact = ln.Artifact.get(key=input_key)
data = artifact.load()
# Process
result = transform(data)
# Save output
ln.Artifact.from_dataframe(result, key=output_key).save()
return output_key
# Redun automatically tracks lineage alongside LaminDB
```
## MLOps Platform Integrations
### Weights & Biases (W&B)
Combine W&B experiment tracking with LaminDB data management:
```python
import wandb
import lamindb as ln
# Initialize both
wandb.init(project="my-project", name="experiment-1")
ln.track(params={"learning_rate": 0.01, "batch_size": 32})
# Load training data
train_artifact = ln.Artifact.get(key="datasets/train.parquet")
train_data = train_artifact.load()
# Train model
model = train_model(train_data)
# Log to W&B
wandb.log({"accuracy": 0.95, "loss": 0.05})
# Save model in LaminDB
import joblib
joblib.dump(model, "model.pkl")
model_artifact = ln.Artifact(
"model.pkl",
key="models/experiment-1.pkl",
description=f"Model from W&B run {wandb.run.id}"
).save()
# Link W&B run ID
model_artifact.features.add_values({"wandb_run_id": wandb.run.id})
ln.finish()
wandb.finish()
```
### MLflow
Integrate MLflow model tracking with LaminDB:
```python
import mlflow
import lamindb as ln
# Start runs
mlflow.start_run()
ln.track()
# Log parameters to both
params = {"max_depth": 5, "n_estimators": 100}
mlflow.log_params(params)
ln.context.params = params
# Load data from LaminDB
data_artifact = ln.Artifact.get(key="datasets/features.parquet")
X = data_artifact.load()
# Train and log model
model = train_model(X)
mlflow.sklearn.log_model(model, "model")
# Save to LaminDB
import joblib
joblib.dump(model, "model.pkl")
model_artifact = ln.Artifact(
"model.pkl",
key=f"models/{mlflow.active_run().info.run_id}.pkl"
).save()
mlflow.end_run()
ln.finish()
```
### HuggingFace Transformers
Track model fine-tuning with LaminDB:
```python
from transformers import Trainer, TrainingArguments
import lamindb as ln
ln.track(params={"model": "bert-base", "epochs": 3})
# Load training data
train_artifact = ln.Artifact.get(key="datasets/train_tokenized.parquet")
train_dataset = train_artifact.load()
# Configure trainer
training_args = TrainingArguments(
output_dir="./results",
num_train_epochs=3,
)
trainer = Trainer(
model=model,
args=training_args,
train_dataset=train_dataset,
)
# Train
trainer.train()
# Save model to LaminDB
trainer.save_model("./model")
model_artifact = ln.Artifact(
"./model",
key="models/bert_finetuned",
description="BERT fine-tuned on custom dataset"
).save()
ln.finish()
```
### scVI-tools
Single-cell analysis with scVI and LaminDB:
```python
import scvi
import lamindb as ln
ln.track()
# Load data
adata_artifact = ln.Artifact.get(key="scrna/raw_counts.h5ad")
adata = adata_artifact.load()
# Setup scVI
scvi.model.SCVI.setup_anndata(adata, layer="counts")
# Train model
model = scvi.model.SCVI(adata)
model.train()
# Save latent representation
adata.obsm["X_scvi"] = model.get_latent_representation()
# Save results
result_artifact = ln.Artifact.from_anndata(
adata,
key="scrna/scvi_latent.h5ad",
description="scVI latent representation"
).save()
ln.finish()
```
## Array Store Integrations
### TileDB-SOMA
Scalable array storage with cellxgene support:
```python
import tiledbsoma as soma
import lamindb as ln
# Create SOMA experiment
uri = "tiledb://my-namespace/experiment"
with soma.Experiment.create(uri) as exp:
# Add measurements
exp.add_new_collection("RNA")
# Register in LaminDB
artifact = ln.Artifact(
uri,
key="cellxgene/experiment.soma",
description="TileDB-SOMA experiment"
).save()
# Query with SOMA
with soma.Experiment.open(uri) as exp:
obs = exp.obs.read().to_pandas()
```
### DuckDB
Query artifacts with DuckDB:
```python
import duckdb
import lamindb as ln
# Get artifact
artifact = ln.Artifact.get(key="datasets/large_data.parquet")
# Query with DuckDB (without loading full file)
path = artifact.cache()
result = duckdb.query(f"""
SELECT cell_type, COUNT(*) as count
FROM read_parquet('{path}')
GROUP BY cell_type
ORDER BY count DESC
""").to_df()
# Save query result
result_artifact = ln.Artifact.from_dataframe(
result,
key="analysis/cell_type_counts.parquet"
).save()
```
## Visualization Integrations
### Vitessce
Create interactive visualizations:
```python
from vitessce import VitessceConfig
import lamindb as ln
# Load spatial data
artifact = ln.Artifact.get(key="spatial/visium_slide.h5ad")
adata = artifact.load()
# Create Vitessce configuration
vc = VitessceConfig.from_object(adata)
# Save configuration
import json
config_file = "vitessce_config.json"
with open(config_file, "w") as f:
json.dump(vc.to_dict(), f)
# Register configuration
config_artifact = ln.Artifact(
config_file,
key="visualizations/spatial_config.json",
description="Vitessce visualization config"
).save()
```
## Schema Module Integrations
### Bionty (Biological Ontologies)
```python
import bionty as bt
# Import biological ontologies
bt.CellType.import_source()
bt.Gene.import_source(organism="human")
# Use in data curation
cell_types = bt.CellType.from_values(adata.obs.cell_type)
```
### WetLab
Track wet lab experiments:
```python
# Install the wetlab module first (shell): pip install 'lamindb[wetlab]'
# Use wetlab registries
import lamindb_wetlab as wetlab
# Track experiments, samples, protocols
experiment = wetlab.Experiment(name="RNA-seq batch 1").save()
```
### Clinical Data (OMOP)
```python
# Install clinical module
pip install 'lamindb[clinical]'
# Use OMOP common data model
import lamindb_clinical as clinical
# Track clinical data
patient = clinical.Patient(patient_id="P001").save()
```
## Git Integration
### Sync with Git Repositories
```python
# Configure git sync via an environment variable (shell):
#   export LAMINDB_SYNC_GIT_REPO=https://github.com/user/repo.git
# Or programmatically
ln.settings.sync_git_repo = "https://github.com/user/repo.git"
# Set the development directory (shell): lamin settings set dev-dir .
# Scripts tracked with git commits
ln.track() # Automatically captures git commit hash
# ... your code ...
ln.finish()
# View git information
transform = ln.Transform.get(name="analysis.py")
transform.source_code # Shows code at git commit
transform.hash # Git commit hash
```
## Enterprise Integrations
### Benchling
Sync with Benchling registries (requires team/enterprise plan):
```python
# Configure Benchling connection (contact LaminDB team)
# Syncs schemas and data from Benchling registries
# Access synced Benchling data
# Details available through enterprise support
```
## Custom Integration Patterns
### REST API Integration
```python
import requests
import lamindb as ln
ln.track()
# Fetch from API
response = requests.get("https://api.example.com/data")
data = response.json()
# Convert to DataFrame
import pandas as pd
df = pd.DataFrame(data)
# Save to LaminDB
artifact = ln.Artifact.from_dataframe(
df,
key="api/fetched_data.parquet",
description="Data fetched from external API"
).save()
artifact.features.add_values({"api_url": response.url})
ln.finish()
```
### Database Integration
```python
import pandas as pd
import sqlalchemy as sa
import lamindb as ln
ln.track()
# Connect to external database
engine = sa.create_engine("postgresql://user:pwd@host:port/db")
# Query data
query = "SELECT * FROM experiments WHERE date > '2025-01-01'"
df = pd.read_sql(query, engine)
# Save to LaminDB
artifact = ln.Artifact.from_dataframe(
df,
key="external_db/experiments_2025.parquet",
description="Experiments from external database"
).save()
ln.finish()
```
## Croissant Metadata
Export datasets with Croissant metadata format:
```python
# Create artifact with rich metadata
artifact = ln.Artifact.from_dataframe(
df,
key="datasets/published_data.parquet",
description="Published dataset with Croissant metadata"
).save()
# Export Croissant metadata (requires additional configuration)
# Enables dataset discovery and interoperability
```
## Best Practices for Integrations
1. **Track consistently**: Use `ln.track()` in all integrated workflows
2. **Link IDs**: Store external system IDs (W&B run ID, MLflow experiment ID) as features
3. **Centralize data**: Use LaminDB as single source of truth for data artifacts
4. **Sync parameters**: Log parameters to both LaminDB and ML platforms
5. **Version together**: Keep code (git), data (LaminDB), and experiments (ML platform) in sync
6. **Cache strategically**: Configure appropriate cache locations for cloud storage
7. **Use feature sets**: Link ontology terms from Bionty to artifacts
8. **Document integrations**: Add descriptions explaining integration context
9. **Test incrementally**: Verify integration with small datasets first
10. **Monitor lineage**: Use `view_lineage()` to ensure integration tracking works
## Troubleshooting Common Issues
**Issue: S3 credentials not found**
```bash
export AWS_ACCESS_KEY_ID=your_key
export AWS_SECRET_ACCESS_KEY=your_secret
export AWS_DEFAULT_REGION=us-east-1
```
**Issue: GCS authentication failure**
```bash
gcloud auth application-default login
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json
```
**Issue: Git sync not working**
```bash
# Ensure git repo is set
lamin settings get sync-git-repo
# Ensure you're in git repo
git status
# Commit changes before tracking
git add .
git commit -m "Update analysis"
ln.track()
```
**Issue: MLflow artifacts not syncing**
```python
# Save explicitly to both systems
mlflow.log_artifact("model.pkl")
ln.Artifact("model.pkl", key="models/model.pkl").save()
```

# LaminDB Ontology Management
This document covers biological ontology management in LaminDB through the Bionty plugin, including accessing, searching, and annotating data with standardized biological terms.
## Overview
LaminDB integrates the `bionty` plugin to manage standardized biological ontologies, enabling consistent metadata curation and data annotation across research projects. Bionty provides access to 20+ curated biological ontologies covering genes, proteins, cell types, tissues, diseases, and more.
## Available Ontologies
LaminDB provides access to multiple curated ontology sources:
| Registry | Ontology Source | Description |
|----------|----------------|-------------|
| **Gene** | Ensembl | Genes across organisms (human, mouse, etc.) |
| **Protein** | UniProt | Protein sequences and annotations |
| **CellType** | Cell Ontology (CL) | Standardized cell type classifications |
| **CellLine** | Cell Line Ontology (CLO) | Cell line annotations |
| **Tissue** | Uberon | Anatomical structures and tissues |
| **Disease** | Mondo, DOID | Disease classifications |
| **Phenotype** | Human Phenotype Ontology (HPO) | Phenotypic abnormalities |
| **Pathway** | Gene Ontology (GO) | Biological pathways and processes |
| **ExperimentalFactor** | Experimental Factor Ontology (EFO) | Experimental variables |
| **DevelopmentalStage** | Various | Developmental stages across organisms |
| **Ethnicity** | HANCESTRO | Human ancestry ontology |
| **Drug** | DrugBank | Drug compounds |
| **Organism** | NCBITaxon | Taxonomic classifications |
## Installation and Import
```python
# Install from the shell first: pip install lamindb (bionty is included with lamindb)
# Import
import lamindb as ln
import bionty as bt
```
## Importing Public Ontologies
Populate your registry with a public ontology source:
```python
# Import Cell Ontology
bt.CellType.import_source()
# Import organism-specific genes
bt.Gene.import_source(organism="human")
bt.Gene.import_source(organism="mouse")
# Import tissues
bt.Tissue.import_source()
# Import diseases
bt.Disease.import_source(source="mondo") # Mondo Disease Ontology
bt.Disease.import_source(source="doid") # Disease Ontology
```
## Searching and Accessing Records
### Keyword Search
```python
# Search cell types
bt.CellType.search("T cell").to_dataframe()
bt.CellType.search("gamma-delta").to_dataframe()
# Search genes
bt.Gene.search("CD8").to_dataframe()
bt.Gene.search("TP53").to_dataframe()
# Search diseases
bt.Disease.search("cancer").to_dataframe()
# Search tissues
bt.Tissue.search("brain").to_dataframe()
```
### Auto-Complete Lookup
For registries with fewer than 100k records:
```python
# Create lookup object
cell_types = bt.CellType.lookup()
# Access by name (auto-complete in IDEs)
t_cell = cell_types.t_cell
hsc = cell_types.hematopoietic_stem_cell
# Similarly for other registries
genes = bt.Gene.lookup()
cd8a = genes.cd8a
```
### Exact Field Matching
```python
# By ontology ID
cell_type = bt.CellType.get(ontology_id="CL:0000798")
disease = bt.Disease.get(ontology_id="MONDO:0004992")
# By name
cell_type = bt.CellType.get(name="T cell")
gene = bt.Gene.get(symbol="CD8A")
# By Ensembl ID
gene = bt.Gene.get(ensembl_gene_id="ENSG00000153563")
```
## Ontological Hierarchies
### Exploring Relationships
```python
# Get a cell type
gdt_cell = bt.CellType.get(ontology_id="CL:0000798") # gamma-delta T cell
# View direct parents
gdt_cell.parents.to_dataframe()
# Collect all ancestors by walking every parent path
ancestors = set()
frontier = list(gdt_cell.parents.all())
while frontier:
    parent = frontier.pop()
    if parent not in ancestors:
        ancestors.add(parent)
        frontier.extend(parent.parents.all())
# View direct children
gdt_cell.children.to_dataframe()
# View all descendants recursively
gdt_cell.query_children().to_dataframe()
```
### Visualizing Hierarchies
```python
# Visualize parent hierarchy
gdt_cell.view_parents()
# Include children in visualization
gdt_cell.view_parents(with_children=True)
# Get all related terms for visualization
t_cell = bt.CellType.get(name="T cell")
t_cell.view_parents(with_children=True) # Shows T cell subtypes
```
## Standardizing and Validating Data
### Validation
Check if terms exist in the ontology:
```python
# Validate cell types
bt.CellType.validate(["T cell", "B cell", "invalid_cell"])
# Returns: [True, True, False]
# Validate genes
bt.Gene.validate(["CD8A", "TP53", "FAKEGENE"], organism="human")
# Returns: [True, True, False]
# Check which terms are invalid
terms = ["T cell", "fat cell", "neuron", "invalid_term"]
invalid = [t for t, valid in zip(terms, bt.CellType.validate(terms)) if not valid]
print(f"Invalid terms: {invalid}")
```
### Standardization with Synonyms
Convert non-standard terms to validated names:
```python
# Standardize cell type names
bt.CellType.standardize(["fat cell", "blood forming stem cell"])
# Returns: ['adipocyte', 'hematopoietic stem cell']
# Standardize genes
bt.Gene.standardize(["BRCA-1", "p53"], organism="human")
# Returns: ['BRCA1', 'TP53']
# Handle mixed valid/invalid terms
terms = ["T cell", "T lymphocyte", "invalid"]
standardized = bt.CellType.standardize(terms)
# Returns standardized names where possible
```
### Loading Validated Records
```python
# Load records from values (including synonyms)
records = bt.CellType.from_values(["fat cell", "blood forming stem cell"])
# Returns list of CellType records
for record in records:
print(record.name, record.ontology_id)
# Use with gene symbols
genes = bt.Gene.from_values(["CD8A", "CD8B"], organism="human")
```
## Annotating Datasets
### Annotating AnnData
```python
import anndata as ad
import lamindb as ln
# Load example data
adata = ad.read_h5ad("data.h5ad")
# Validate and retrieve matching records
cell_types = bt.CellType.from_values(adata.obs.cell_type)
# Create artifact with annotations
artifact = ln.Artifact.from_anndata(
adata,
key="scrna/annotated_data.h5ad",
description="scRNA-seq data with validated cell type annotations"
).save()
# Link ontology records to artifact
artifact.feature_sets.add_ontology(cell_types)
```
### Annotating DataFrames
```python
import pandas as pd
# Create DataFrame with biological entities
df = pd.DataFrame({
"cell_type": ["T cell", "B cell", "NK cell"],
"tissue": ["blood", "spleen", "liver"],
"disease": ["healthy", "lymphoma", "healthy"]
})
# Validate and standardize
df["cell_type"] = bt.CellType.standardize(df["cell_type"])
df["tissue"] = bt.Tissue.standardize(df["tissue"])
# Create artifact
artifact = ln.Artifact.from_dataframe(
df,
key="metadata/sample_info.parquet"
).save()
# Link ontology records
cell_type_records = bt.CellType.from_values(df["cell_type"])
tissue_records = bt.Tissue.from_values(df["tissue"])
artifact.feature_sets.add_ontology(cell_type_records)
artifact.feature_sets.add_ontology(tissue_records)
```
## Managing Custom Terms and Hierarchies
### Adding Custom Terms
```python
# Register new term not in public ontology
my_celltype = bt.CellType(name="my_novel_T_cell_subtype").save()
# Establish parent-child relationship
parent = bt.CellType.get(name="T cell")
my_celltype.parents.add(parent)
# Verify relationship
my_celltype.parents.to_dataframe()
parent.children.to_dataframe() # Should include my_celltype
```
### Adding Synonyms
```python
# Add synonyms for standardization
hsc = bt.CellType.get(name="hematopoietic stem cell")
hsc.add_synonym("HSC")
hsc.add_synonym("blood stem cell")
hsc.add_synonym("hematopoietic progenitor")
# Set abbreviation
hsc.set_abbr("HSC")
# Now standardization works with synonyms
bt.CellType.standardize(["HSC", "blood stem cell"])
# Returns: ['hematopoietic stem cell', 'hematopoietic stem cell']
```
### Creating Custom Hierarchies
```python
# Build custom cell type hierarchy
immune_cell = bt.CellType.get(name="immune cell")
# Add custom subtypes
my_subtype1 = bt.CellType(name="custom_immune_subtype_1").save()
my_subtype2 = bt.CellType(name="custom_immune_subtype_2").save()
# Link to parent
my_subtype1.parents.add(immune_cell)
my_subtype2.parents.add(immune_cell)
# Create sub-subtypes
my_subsubtype = bt.CellType(name="custom_sub_subtype").save()
my_subsubtype.parents.add(my_subtype1)
# Visualize custom hierarchy
immune_cell.view_parents(with_children=True)
```
## Multi-Organism Support
For organism-aware registries like Gene:
```python
# Set global organism
bt.settings.organism = "human"
# Validate human genes
bt.Gene.validate(["TCF7", "CD8A"], organism="human")
# Load genes for specific organism
human_genes = bt.Gene.from_values(["CD8A", "TP53"], organism="human")
mouse_genes = bt.Gene.from_values(["Cd8a", "Trp53"], organism="mouse")
# Search organism-specific genes
bt.Gene.search("CD8", organism="human").to_dataframe()
bt.Gene.search("Cd8", organism="mouse").to_dataframe()
# Switch organism context
bt.settings.organism = "mouse"
genes = bt.Gene.from_source(symbol="Ap5b1")
```
## Public Ontology Lookup
Access terms from public ontologies without importing:
```python
# Interactive lookup in public sources
cell_types_public = bt.CellType.lookup(public=True)
# Access public terms
hepatocyte = cell_types_public.hepatocyte
# Import specific term
hepatocyte_local = bt.CellType.from_source(name="hepatocyte")
# Or import by ontology ID
specific_cell = bt.CellType.from_source(ontology_id="CL:0000182")
```
## Version Tracking
LaminDB automatically tracks ontology versions:
```python
# View current source versions
bt.Source.filter(currently_used=True).to_dataframe()
# Check which source a record derives from
cell_type = bt.CellType.get(name="hepatocyte")
cell_type.source # Returns Source metadata
# View source details
source = cell_type.source
print(source.name) # e.g., "cl"
print(source.version) # e.g., "2023-05-18"
print(source.url) # Ontology URL
```
## Ontology Integration Workflows
### Workflow 1: Validate Existing Data
```python
# Load data with biological annotations
adata = ad.read_h5ad("uncurated_data.h5ad")
# Validate cell types
validation = bt.CellType.validate(adata.obs.cell_type)
# Identify invalid terms
invalid_idx = [i for i, v in enumerate(validation) if not v]
invalid_terms = adata.obs.cell_type.iloc[invalid_idx].unique()
print(f"Invalid cell types: {invalid_terms}")
# Fix invalid terms manually or with standardization
adata.obs["cell_type"] = bt.CellType.standardize(adata.obs.cell_type)
# Re-validate
validation = bt.CellType.validate(adata.obs.cell_type)
assert all(validation), "All terms should now be valid"
```
### Workflow 2: Curate and Annotate
```python
import lamindb as ln
ln.track() # Start tracking
# Load data
df = pd.read_csv("experimental_data.csv")
# Standardize using ontologies
df["cell_type"] = bt.CellType.standardize(df["cell_type"])
df["tissue"] = bt.Tissue.standardize(df["tissue"])
# Create curated artifact
artifact = ln.Artifact.from_dataframe(
df,
key="curated/experiment_2025_10.parquet",
description="Curated experimental data with ontology-validated annotations"
).save()
# Link ontology records
artifact.feature_sets.add_ontology(bt.CellType.from_values(df["cell_type"]))
artifact.feature_sets.add_ontology(bt.Tissue.from_values(df["tissue"]))
ln.finish() # Complete tracking
```
### Workflow 3: Cross-Organism Gene Mapping
```python
# Get human genes
human_genes = ["CD8A", "CD8B", "TP53"]
human_records = bt.Gene.from_values(human_genes, organism="human")
# Find mouse orthologs (requires external mapping)
# LaminDB doesn't provide built-in ortholog mapping
# Use external tools like Ensembl BioMart or homologene
mouse_orthologs = ["Cd8a", "Cd8b", "Trp53"]
mouse_records = bt.Gene.from_values(mouse_orthologs, organism="mouse")
```
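A hedged sketch of applying an externally obtained ortholog table; the `human_to_mouse` dictionary is a hypothetical placeholder you would populate from BioMart or a similar resource.
```python
# Hypothetical ortholog table exported from an external resource
human_to_mouse = {"CD8A": "Cd8a", "CD8B": "Cd8b", "TP53": "Trp53"}

mouse_symbols = [human_to_mouse[g] for g in human_genes if g in human_to_mouse]
mouse_records = bt.Gene.from_values(mouse_symbols, organism="mouse")
```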
## Querying Ontology-Annotated Data
```python
# Find all datasets with specific cell type
t_cell = bt.CellType.get(name="T cell")
ln.Artifact.filter(feature_sets__cell_types=t_cell).to_dataframe()
# Find datasets measuring specific genes
cd8a = bt.Gene.get(symbol="CD8A", organism="human")
ln.Artifact.filter(feature_sets__genes=cd8a).to_dataframe()
# Query across ontology hierarchy
# Find all datasets with T cell or T cell subtypes
t_cell_subtypes = t_cell.query_children()
ln.Artifact.filter(
feature_sets__cell_types__in=t_cell_subtypes
).to_dataframe()
```
## Best Practices
1. **Import ontologies first**: Call `import_source()` before validation
2. **Use standardization**: Leverage synonym mapping to handle variations
3. **Validate early**: Check terms before creating artifacts
4. **Set organism context**: Specify organism for gene-related queries
5. **Add custom synonyms**: Register common variations in your domain (see the sketch after this list)
6. **Use public lookup**: Access `lookup(public=True)` for term discovery
7. **Track versions**: Monitor ontology source versions for reproducibility
8. **Build hierarchies**: Link custom terms to existing ontology structures
9. **Query hierarchically**: Use `query_children()` for comprehensive searches
10. **Document mappings**: Track custom term additions and relationships
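As a brief illustration of points 5 and 8, the sketch below registers a domain-specific synonym and links a custom term under an existing ontology parent. The synonym string and the custom term name are hypothetical placeholders, not terms from the CL ontology.

```python
import bionty as bt

# 5. Add a custom synonym so future standardization maps it automatically
t_cell = bt.CellType.get(name="T cell")
t_cell.add_synonym("T-lymphocyte")  # hypothetical in-house spelling

# 8. Register a custom term and attach it to the existing hierarchy
custom = bt.CellType(name="my custom T cell state").save()
custom.parents.add(t_cell)
```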
## Common Ontology Operations
```python
# Check if term exists
exists = bt.CellType.filter(name="T cell").exists()
# Count terms in registry
n_cell_types = bt.CellType.filter().count()
# Get all terms with specific parent
immune_cells = bt.CellType.filter(parents__name="immune cell")
# Find orphan terms (no parents)
orphans = bt.CellType.filter(parents__isnull=True)
# Get recently added terms
from datetime import datetime, timedelta
recent = bt.CellType.filter(
created_at__gte=datetime.now() - timedelta(days=7)
)
```

# LaminDB Setup & Deployment
This document covers installation, configuration, instance management, storage options, and deployment strategies for LaminDB.
## Installation
### Basic Installation
```bash
# Install LaminDB
pip install lamindb
# Or with pip3
pip3 install lamindb
```
### Installation with Extras
Install optional dependencies for specific functionality:
```bash
# Google Cloud Platform support
pip install 'lamindb[gcp]'
# Flow cytometry formats
pip install 'lamindb[fcs]'
# Array storage and streaming (Zarr support)
pip install 'lamindb[zarr]'
# AWS S3 support (usually included by default)
pip install 'lamindb[aws]'
# Multiple extras
pip install 'lamindb[gcp,zarr,fcs]'
```
### Module Plugins
```bash
# Biological ontologies (Bionty)
pip install bionty
# Wet lab functionality
pip install lamindb-wetlab
# Clinical data (OMOP CDM)
pip install lamindb-clinical
```
### Verify Installation
```python
import lamindb as ln
print(ln.__version__)
# Check available modules
import bionty as bt
print(bt.__version__)
```
## Authentication
### Creating an Account
1. Visit https://lamin.ai
2. Sign up for a free account
3. Navigate to account settings to generate an API key
### Logging In
```bash
# Login with API key
lamin login
# You'll be prompted to enter your API key
# API key is stored locally at ~/.lamin/
```
### Authentication Details
**Data Privacy:** LaminDB authentication only collects basic metadata (email, user information). Your actual data remains private and is not sent to LaminDB servers.
**Local vs Cloud:** Authentication is required even for local-only usage to enable collaboration features and instance management.
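To confirm which account and instance are active after logging in, the settings objects can be inspected from Python (a minimal check, using the same settings attributes shown later in this document):

```python
import lamindb as ln

# Who is logged in, and which instance is connected?
print(ln.setup.settings.user)      # handle, email, name
print(ln.setup.settings.instance)  # instance name and storage location
```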
## Instance Initialization
### Local SQLite Instance
For local development and small datasets:
```bash
# Initialize in current directory
lamin init --storage ./mydata
# Initialize in specific directory
lamin init --storage /path/to/data
# Initialize with specific modules
lamin init --storage ./mydata --modules bionty
# Initialize with multiple modules
lamin init --storage ./mydata --modules bionty,wetlab
```
### Cloud Storage with SQLite
Use cloud storage but local SQLite database:
```bash
# AWS S3
lamin init --storage s3://my-bucket/path
# Google Cloud Storage
lamin init --storage gs://my-bucket/path
# S3-compatible (MinIO, Cloudflare R2)
lamin init --storage 's3://bucket?endpoint_url=http://endpoint:9000'
```
### Cloud Storage with PostgreSQL
For production deployments:
```bash
# S3 + PostgreSQL
lamin init --storage s3://my-bucket/path \
--db postgresql://user:password@hostname:5432/dbname \
--modules bionty
# GCS + PostgreSQL
lamin init --storage gs://my-bucket/path \
--db postgresql://user:password@hostname:5432/dbname \
--modules bionty
```
### Instance Naming
```bash
# Specify instance name
lamin init --storage ./mydata --name my-project
# Default name uses directory name
lamin init --storage ./mydata # Instance name: "mydata"
```
## Connecting to Instances
### Connect to Your Own Instance
```bash
# By name
lamin connect my-project
# By full path
lamin connect account_handle/my-project
```
### Connect to Shared Instance
```bash
# Connect to someone else's instance
lamin connect other-user/their-project
# Requires appropriate permissions
```
### Switching Between Instances
```bash
# List available instances
lamin info
# Switch instance
lamin connect another-instance
# Close current instance
lamin close
```
## Storage Configuration
### Local Storage
**Advantages:**
- Fast access
- No internet required
- Simple setup
**Setup:**
```bash
lamin init --storage ./data
```
### AWS S3 Storage
**Advantages:**
- Scalable
- Collaborative
- Durable
**Setup:**
```bash
# Set credentials
export AWS_ACCESS_KEY_ID=your_key_id
export AWS_SECRET_ACCESS_KEY=your_secret_key
export AWS_DEFAULT_REGION=us-east-1
# Initialize
lamin init --storage s3://my-bucket/project-data \
--db postgresql://user:pwd@host:5432/db
```
**S3 Permissions Required:**
```json
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:PutObject",
"s3:DeleteObject",
"s3:ListBucket"
],
"Resource": [
"arn:aws:s3:::my-bucket/*",
"arn:aws:s3:::my-bucket"
]
}
]
}
```
### Google Cloud Storage
**Setup:**
```bash
# Authenticate
gcloud auth application-default login
# Or use service account
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json
# Initialize
lamin init --storage gs://my-bucket/project-data \
--db postgresql://user:pwd@host:5432/db
```
### S3-Compatible Storage
For MinIO, Cloudflare R2, or other S3-compatible services:
```bash
# MinIO example
export AWS_ACCESS_KEY_ID=minioadmin
export AWS_SECRET_ACCESS_KEY=minioadmin
lamin init --storage 's3://my-bucket?endpoint_url=http://minio.example.com:9000'
# Cloudflare R2 example
export AWS_ACCESS_KEY_ID=your_r2_access_key
export AWS_SECRET_ACCESS_KEY=your_r2_secret_key
lamin init --storage 's3://bucket?endpoint_url=https://account-id.r2.cloudflarestorage.com'
```
## Database Configuration
### SQLite (Default)
**Advantages:**
- No separate database server
- Simple setup
- Good for development
**Limitations:**
- Not suitable for concurrent writes
- Limited scalability
**Setup:**
```bash
# SQLite is default
lamin init --storage ./data
# Database stored at ./data/.lamindb/
```
### PostgreSQL
**Advantages:**
- Production-ready
- Concurrent access
- Better performance at scale
**Setup:**
```bash
# Full connection string
lamin init --storage s3://bucket/path \
--db postgresql://username:password@hostname:5432/database
# With SSL
lamin init --storage s3://bucket/path \
--db "postgresql://user:pwd@host:5432/db?sslmode=require"
```
**PostgreSQL Versions:** Compatible with PostgreSQL 12+
### Database Schema Management
```bash
# Check current schema version
lamin migrate check
# Upgrade schema
lamin migrate deploy
# View migration history
lamin migrate history
```
## Cache Configuration
### Cache Directory
LaminDB maintains a local cache for cloud files:
```python
import lamindb as ln
# View cache location
print(ln.settings.cache_dir)
```
### Configure Cache Location
```bash
# Set cache directory
lamin cache set /path/to/cache
# View current cache settings
lamin cache get
```
### System-Wide Cache (Multi-User)
For shared systems with multiple users:
```bash
# Create system settings file
sudo mkdir -p /system/settings
sudo nano /system/settings/system.env
```
Add to `system.env`:
```bash
lamindb_cache_path=/shared/cache/lamindb
```
Ensure permissions:
```bash
sudo chmod 755 /shared/cache/lamindb
sudo chown -R shared-user:shared-group /shared/cache/lamindb
```
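After configuring a shared cache, a quick check from Python confirms that the configured path is picked up and writable. This is a minimal sketch; the path is assumed to match the example above.

```python
import os
import lamindb as ln

cache_dir = ln.settings.cache_dir
print(cache_dir)  # should point at /shared/cache/lamindb
print("writable:", os.access(cache_dir, os.W_OK))
```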
### Cache Management
```python
import lamindb as ln
# Clear cache for specific artifact
artifact = ln.Artifact.get(key="data.h5ad")
artifact.delete_cache()
# Check if artifact is cached
if artifact.is_cached():
    print("Already cached")
# Manually clear entire cache
import shutil
shutil.rmtree(ln.settings.cache_dir)
```
## Settings Management
### View Current Settings
```python
import lamindb as ln
# User settings
print(ln.setup.settings.user)
# User(handle='username', email='user@email.com', name='Full Name')
# Instance settings
print(ln.setup.settings.instance)
# Instance(name='my-project', storage='s3://bucket/path')
```
### Configure Settings
```bash
# Set development directory for relative keys
lamin settings set dev-dir /path/to/project
# Configure git sync
lamin settings set sync-git-repo https://github.com/user/repo.git
# View all settings
lamin settings
```
### Environment Variables
```bash
# Cache directory
export LAMIN_CACHE_DIR=/path/to/cache
# Settings directory
export LAMIN_SETTINGS_DIR=/path/to/settings
# Git sync
export LAMINDB_SYNC_GIT_REPO=https://github.com/user/repo.git
```
## Instance Management
### Viewing Instance Information
```bash
# Current instance info
lamin info
# List all instances
lamin ls
# View instance details
lamin instance details
```
### Instance Collaboration
```bash
# Set instance visibility (requires LaminHub)
lamin instance set-visibility public
lamin instance set-visibility private
# Invite collaborators (requires LaminHub)
lamin instance invite user@email.com
```
### Instance Migration
```bash
# Backup instance
lamin backup create
# Restore from backup
lamin backup restore backup_id
# Export instance metadata
lamin export instance-metadata.json
```
### Deleting Instances
```bash
# Delete instance (preserves data, removes metadata)
lamin delete --force instance-name
# This only removes the LaminDB metadata
# Actual data in storage location remains
```
## Production Deployment Patterns
### Pattern 1: Local Development → Cloud Production
**Development:**
```bash
# Local development
lamin init --storage ./dev-data --modules bionty
```
**Production:**
```bash
# Cloud production
lamin init --storage s3://prod-bucket/data \
--db postgresql://user:pwd@db-host:5432/prod-db \
--modules bionty \
--name production
```
**Migration:** Export artifacts from dev, import to prod
```python
import shutil
from pathlib import Path

export_dir = Path("/tmp/export/")
export_dir.mkdir(parents=True, exist_ok=True)

# Export from dev: download each artifact into a local directory
for artifact in ln.Artifact.filter():
    local_path = artifact.cache()  # downloads (or reuses) a local copy
    shutil.copy(local_path, export_dir / local_path.name)

# Switch to prod (run in a shell): lamin connect production

# Import to prod
for file in export_dir.glob("*"):
    ln.Artifact(str(file), key=file.name).save()
```
### Pattern 2: Multi-Region Deployment
Deploy instances in multiple regions for data sovereignty:
```bash
# US instance
lamin init --storage s3://us-bucket/data \
--db postgresql://user:pwd@us-db:5432/db \
--name us-production
# EU instance
lamin init --storage s3://eu-bucket/data \
--db postgresql://user:pwd@eu-db:5432/db \
--name eu-production
```
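When working programmatically across regions, the appropriate instance can be selected at runtime. The sketch below assumes `ln.connect()` is available in your LaminDB version and that the account handle `my-org` and the instance names created above are placeholders for your own.

```python
import lamindb as ln

# Route a session to the region that owns the data
region = "eu"  # e.g., derived from user location or a data-residency policy
ln.connect(f"my-org/{region}-production")

# From here on, artifacts are registered in the regional instance
```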
### Pattern 3: Shared Storage, Personal Instances
Multiple users, shared data:
```bash
# Shared storage with user-specific DB
lamin init --storage s3://shared-bucket/data \
--db postgresql://user1:pwd@host:5432/user1_db \
--name user1-workspace
lamin init --storage s3://shared-bucket/data \
--db postgresql://user2:pwd@host:5432/user2_db \
--name user2-workspace
```
## Performance Optimization
### Database Performance
```python
# Use connection pooling for PostgreSQL
# Configure in database server settings
# Optimize queries with indexes
# LaminDB creates indexes automatically for common queries
```
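As a complement to the notes above, the sketch below contrasts a per-record loop with a single bulk query, which lets the database (and its indexes) do the filtering in one round trip. The example keys are placeholders.

```python
import lamindb as ln

keys = ["data/a.parquet", "data/b.parquet", "data/c.parquet"]

# Avoid: one query per key (N round trips to the database)
artifacts = [ln.Artifact.get(key=k) for k in keys]

# Prefer: a single filtered query returning all matches at once
artifacts = list(ln.Artifact.filter(key__in=keys))
```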
### Storage Performance
```bash
# Use appropriate storage classes
# S3: STANDARD for frequent access, INTELLIGENT_TIERING for mixed access
# Configure multipart upload thresholds via the AWS CLI configuration
aws configure set default.s3.multipart_threshold 64MB
aws configure set default.s3.multipart_chunksize 16MB
```
### Cache Optimization
```python
# Pre-cache frequently used artifacts
artifacts = ln.Artifact.filter(key__startswith="reference/")
for artifact in artifacts:
    artifact.cache()  # download to local cache
# Use backed mode for large arrays
adata = artifact.backed() # Don't load into memory
```
## Security Best Practices
1. **Credentials Management:**
   - Use environment variables, not hardcoded credentials (see the sketch after this list)
- Use IAM roles on AWS/GCP instead of access keys
- Rotate credentials regularly
2. **Access Control:**
- Use PostgreSQL for multi-user access control
- Configure storage bucket policies
- Enable audit logging
3. **Network Security:**
- Use SSL/TLS for database connections
- Use VPCs for cloud deployments
- Restrict IP addresses when possible
4. **Data Protection:**
- Enable encryption at rest (S3, GCS)
- Use encryption in transit (HTTPS, SSL)
- Implement backup strategies
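To illustrate point 1, the sketch below assembles the PostgreSQL connection string from environment variables instead of hardcoding credentials, then hands it to the `lamin init` command shown earlier. The variable names and bucket path are hypothetical placeholders.

```python
import os
import subprocess

# Credentials are injected via the environment (e.g., by a secrets manager);
# the variable names here are hypothetical
db_url = (
    "postgresql://"
    f"{os.environ['LAMIN_DB_USER']}:{os.environ['LAMIN_DB_PASSWORD']}"
    f"@{os.environ['LAMIN_DB_HOST']}:5432/lamin"
)

# Pass the URL to the CLI without writing it to a file or shell history
subprocess.run(
    ["lamin", "init", "--storage", "s3://my-bucket/data", "--db", db_url],
    check=True,
)
```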
## Monitoring and Maintenance
### Health Checks
```python
import lamindb as ln
# Check database connection
try:
    ln.Artifact.filter().count()
    print("✓ Database connected")
except Exception as e:
    print(f"✗ Database error: {e}")

# Check storage access by writing and deleting a small test artifact
try:
    from pathlib import Path
    Path("healthcheck.txt").write_text("ok")
    test_artifact = ln.Artifact("healthcheck.txt", key="healthcheck.txt").save()
    test_artifact.delete(permanent=True)
    print("✓ Storage accessible")
except Exception as e:
    print(f"✗ Storage error: {e}")
```
### Logging
```python
# Enable debug logging
import logging
logging.basicConfig(level=logging.DEBUG)
# LaminDB operations will produce detailed logs
```
### Backup Strategy
```bash
# Regular database backups (PostgreSQL)
pg_dump -h hostname -U username -d database > backup_$(date +%Y%m%d).sql
# Storage backups (S3 versioning)
aws s3api put-bucket-versioning \
--bucket my-bucket \
--versioning-configuration Status=Enabled
# Metadata export
lamin export metadata_backup.json
```
## Troubleshooting
### Common Issues
**Issue: Cannot connect to instance**
```bash
# Check instance exists
lamin ls
# Verify authentication
lamin login
# Re-connect
lamin connect instance-name
```
**Issue: Storage permissions denied**
```bash
# Check AWS credentials
aws s3 ls s3://your-bucket/
# Check GCS credentials
gsutil ls gs://your-bucket/
# Verify IAM permissions
```
**Issue: Database connection error**
```bash
# Test PostgreSQL connection
psql postgresql://user:pwd@host:5432/db
# Check database version compatibility
lamin migrate check
```
**Issue: Cache full**
```python
# Clear cache
import lamindb as ln
import shutil
shutil.rmtree(ln.settings.cache_dir)
# Set larger cache location
lamin cache set /larger/disk/cache
```
## Upgrade and Migration
### Upgrading LaminDB
```bash
# Upgrade to latest version
pip install --upgrade lamindb
# Upgrade database schema
lamin migrate deploy
```
### Schema Compatibility
Check the compatibility matrix to ensure your database schema version is compatible with your installed LaminDB version.
### Breaking Changes
Major version upgrades may require migration:
```bash
# Check for breaking changes
lamin migrate check
# Review migration plan
lamin migrate plan
# Execute migration
lamin migrate deploy
```
## Best Practices
1. **Start local, scale cloud**: Develop locally, deploy to cloud for production
2. **Use PostgreSQL for production**: SQLite is only for development
3. **Configure appropriate cache**: Size cache based on working set
4. **Enable versioning**: Use S3/GCS versioning for data protection
5. **Monitor costs**: Track storage and compute costs in cloud deployments
6. **Document configuration**: Keep infrastructure-as-code for reproducibility
7. **Test backups**: Regularly verify backup and restore procedures
8. **Set up monitoring**: Implement health checks and alerting
9. **Use modules strategically**: Only install needed plugins to reduce complexity
10. **Plan for scale**: Consider concurrent users and data growth