zhongwei/gh-k-dense-ai-claude-scientific-skills-scientific-skills


Zhongwei Li f0bd18fb4e Initial commit

2025-11-30 08:30:10 +08:00

10 KiB


Specialized Modality Models

This document covers models for specialized single-cell data modalities in scvi-tools.

MethylVI / MethylANVI (Methylation Analysis)

Purpose: Analysis of single-cell bisulfite sequencing (scBS-seq) data for DNA methylation.

Key Features:

Models methylation patterns at single-cell resolution
Handles sparsity in methylation data
Batch correction for methylation experiments
Label transfer (MethylANVI) for cell type annotation

When to Use:

Analyzing scBS-seq or similar methylation data
Studying DNA methylation patterns across cell types
Integrating methylation data across batches
Cell type annotation based on methylation profiles

Data Requirements:

Methylation count matrices (methylated vs. total reads per CpG site)
Format: Cells × CpG sites with methylation ratios or counts
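As a minimal illustration of this data layout, the sketch below turns methylated and total read counts for a few cells and sites into per-site methylation fractions. Sites with no coverage are marked as missing rather than 0. The toy matrices and the helper name methylation_fractions are hypothetical. The point of the sketch is that fractions alone discard coverage, which a count-based model can use.

```python
from typing import List, Optional

Matrix = List[List[int]]


def methylation_fractions(methylated: Matrix, total: Matrix) -> List[List[Optional[float]]]:
    """Per-cell, per-site methylated/total ratio; None where a site has no coverage."""
    fractions = []
    for meth_row, total_row in zip(methylated, total):
        row = []
        for m, t in zip(meth_row, total_row):
            if m > t:
                raise ValueError("methylated count cannot exceed total count")
            # No reads cover this CpG in this cell: the level is unknown, not zero
            row.append(m / t if t > 0 else None)
        fractions.append(row)
    return fractions


# Hypothetical toy data: 2 cells x 3 CpG sites
methylated = [[3, 0, 1], [0, 0, 5]]
total = [[4, 0, 1], [2, 0, 10]]
print(methylation_fractions(methylated, total))
# [[0.75, None, 1.0], [0.0, None, 0.5]]
```

Note that 1.0 from a single read and 0.5 from ten reads carry very different certainty. This is why the counts themselves, not just the ratios, are worth keeping.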

MethylVI (Unsupervised)

Basic Usage:

import scvi

# Setup methylation data
scvi.model.METHYLVI.setup_anndata(
    adata,
    layer="methylation_counts",  # Methylation data
    batch_key="batch"
)

model = scvi.model.METHYLVI(adata)
model.train()

# Get latent representation
latent = model.get_latent_representation()

# Get normalized methylation values
normalized_meth = model.get_normalized_methylation()

MethylANVI (Semi-supervised with cell types)

Basic Usage:

# Setup with cell type labels
scvi.model.METHYLANVI.setup_anndata(
    adata,
    layer="methylation_counts",
    batch_key="batch",
    labels_key="cell_type",
    unlabeled_category="Unknown"
)

model = scvi.model.METHYLANVI(adata)
model.train()

# Predict cell types
predictions = model.predict()
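MethylANVI treats cells whose label equals unlabeled_category as unlabeled. The sketch below builds such a label column, assuming annotations arrive as a list with None or an empty string for unannotated cells. The helper name fill_unlabeled and the toy labels are hypothetical. In practice, the result would be stored in the column passed as labels_key, for example adata.obs["cell_type"].

```python
from collections import Counter
from typing import List, Optional


def fill_unlabeled(labels: List[Optional[str]], unlabeled_category: str = "Unknown") -> List[str]:
    """Replace missing (None or empty) annotations with the unlabeled placeholder."""
    return [label if label else unlabeled_category for label in labels]


# Hypothetical annotations: only some cells have a known type
raw = ["Neuron", None, "Astrocyte", "", "Neuron"]
labels = fill_unlabeled(raw)
print(labels)
# ['Neuron', 'Unknown', 'Astrocyte', 'Unknown', 'Neuron']

# How many cells will actually supervise the classifier
counts = Counter(labels)
n_labeled = len(labels) - counts["Unknown"]
print(f"{n_labeled}/{len(labels)} cells labeled")  # 3/5 cells labeled
```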

Key Parameters:

n_latent: Latent dimensionality
region_factors: Model region-specific effects

Use Cases:

Epigenetic heterogeneity analysis
Cell type identification via methylation
Integration with gene expression data (separate analysis)
Differential methylation analysis

CytoVI (Flow and Mass Cytometry)

Purpose: Batch correction and integration of flow cytometry and mass cytometry (CyTOF) data.

Key Features:

Handles antibody-based protein measurements
Corrects batch effects in cytometry data
Enables integration across experiments
Designed for high-dimensional protein panels

When to Use:

Analyzing flow cytometry or CyTOF data
Integrating cytometry experiments across batches
Batch correction for protein panels
Cross-study cytometry integration

Data Requirements:

Protein expression matrix (cells × proteins)
Flow cytometry or CyTOF measurements
Batch/experiment annotations

Basic Usage:

scvi.model.CYTOVI.setup_anndata(
    adata,
    protein_expression_obsm_key="protein_expression",
    batch_key="batch"
)

model = scvi.model.CYTOVI(adata)
model.train()

# Get batch-corrected representation
latent = model.get_latent_representation()

# Get normalized protein values
normalized = model.get_normalized_expression()

Key Parameters:

n_latent: Latent space dimensionality
n_layers: Network depth

Typical Workflow:

import scanpy as sc
import scvi

# 1. Load cytometry data
adata = sc.read_h5ad("cytof_data.h5ad")

# 2. Train CytoVI
scvi.model.CYTOVI.setup_anndata(
    adata,
    protein_expression_obsm_key="protein",
    batch_key="experiment"
)
model = scvi.model.CYTOVI(adata)
model.train()

# 3. Get batch-corrected values
latent = model.get_latent_representation()
adata.obsm["X_CytoVI"] = latent

# 4. Downstream analysis
sc.pp.neighbors(adata, use_rep="X_CytoVI")
sc.tl.umap(adata)
sc.tl.leiden(adata)

# 5. Visualize batch correction
sc.pl.umap(adata, color=["experiment", "leiden"])
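A rough numerical check can complement the UMAP in step 5. For each cell, measure what fraction of its nearest neighbors in the latent space come from the same batch. Values near the overall batch proportion suggest good mixing, and values near 1 suggest residual batch effects.

The sketch below is a simplified, brute-force version of that idea. It is in the spirit of kNN-based metrics such as kBET or LISI but is not an implementation of either. It runs on hypothetical toy coordinates; for real data, the latent array would come from model.get_latent_representation().

```python
import math
from typing import Sequence


def same_batch_fraction(latent: Sequence[Sequence[float]], batches: Sequence[str], k: int) -> float:
    """Mean fraction of each cell's k nearest neighbours (Euclidean) that share its batch."""
    n = len(latent)
    if not 0 < k < n:
        raise ValueError("k must be between 1 and n_cells - 1")
    total = 0.0
    for i in range(n):
        # Distances from cell i to every other cell (brute force, O(n^2) overall)
        dists = sorted(
            (math.dist(latent[i], latent[j]), j) for j in range(n) if j != i
        )
        neighbours = [j for _, j in dists[:k]]
        total += sum(batches[j] == batches[i] for j in neighbours) / k
    return total / n


# Hypothetical latent coordinates where the two batches do not mix at all
separated = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
print(same_batch_fraction(separated, ["A"] * 3 + ["B"] * 3, k=2))  # 1.0
```

For large datasets, a proper nearest-neighbor index (such as the graph built by sc.pp.neighbors) would replace the brute-force search.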

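SysVI (Cross-System Integration)

Purpose: Integration of scRNA-seq datasets with substantial batch effects between "systems" while preserving biological variation. Examples of such systems include different species, organoids vs. primary tissue, and single-cell vs. single-nucleus protocols.

Key Features:

Conditional VAE combining a VampPrior with cycle-consistency constraints
Removes strong technical and system-level effects while retaining biological signal
Designed for large-scale integration studies

When to Use:

Batch effects too strong for standard cVAE-based integration such as scVI
Cross-species, organoid vs. tissue, or cell vs. nucleus integration
Need to preserve subtle biological variation across systems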

Basic Usage:

scvi.model.SYSVI.setup_anndata(
    adata,
    layer="counts",
    batch_key="batch"
)

model = scvi.model.SYSVI(adata)
model.train()

latent = model.get_latent_representation()

Decipher (Trajectory Inference)

Purpose: Trajectory inference and pseudotime analysis for single-cell data.

Key Features:

Learns cellular trajectories and differentiation paths
Pseudotime estimation
Accounts for uncertainty in trajectory structure
Compatible with scVI embeddings

When to Use:

Studying cellular differentiation
Time-course or developmental datasets
Understanding cell state transitions
Identifying branching points in development

Basic Usage:

# Typically used after scVI for embeddings
scvi.model.SCVI.setup_anndata(adata, layer="counts")
scvi_model = scvi.model.SCVI(adata)
scvi_model.train()

# Decipher for trajectory
scvi.model.DECIPHER.setup_anndata(adata)
decipher_model = scvi.model.DECIPHER(adata, scvi_model)
decipher_model.train()

# Get pseudotime
pseudotime = decipher_model.get_pseudotime()
adata.obs["pseudotime"] = pseudotime

Visualization:

import scanpy as sc

# Plot pseudotime on UMAP
sc.pl.umap(adata, color="pseudotime", cmap="viridis")

# Gene expression along pseudotime
sc.pl.scatter(adata, x="pseudotime", y="gene_of_interest")
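To summarize a gene's trend along pseudotime rather than only scatter-plotting it, a common approach is to bin cells by pseudotime and average expression per bin. The minimal sketch below uses equal-width bins and assumes that pseudotime and one gene's expression are available as plain lists, for example from adata.obs["pseudotime"] and the corresponding gene column. The helper name binned_trend and the toy values are hypothetical.

```python
from typing import List, Optional, Sequence


def binned_trend(pseudotime: Sequence[float], expression: Sequence[float], n_bins: int) -> List[Optional[float]]:
    """Mean expression in equal-width pseudotime bins; None for empty bins."""
    if len(pseudotime) != len(expression) or not pseudotime or n_bins < 1:
        raise ValueError("need equal-length, non-empty inputs and n_bins >= 1")
    lo, hi = min(pseudotime), max(pseudotime)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width if all times are equal
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for t, x in zip(pseudotime, expression):
        # Clamp so the maximum pseudotime falls in the last bin
        b = min(int((t - lo) / width), n_bins - 1)
        sums[b] += x
        counts[b] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


pt = [0.0, 0.1, 0.4, 0.6, 0.9, 1.0]
expr = [1.0, 3.0, 4.0, 6.0, 8.0, 10.0]
print(binned_trend(pt, expr, n_bins=2))  # [2.6666666666666665, 8.0]
```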

peRegLM (Peak Regulatory Linear Model)

Purpose: Linking chromatin accessibility to gene expression for regulatory analysis.

Key Features:

Links ATAC-seq peaks to gene expression
Identifies regulatory relationships
Works with paired multiome data

When to Use:

Multiome data (RNA + ATAC from same cells)
Understanding gene regulation
Linking peaks to target genes
Regulatory network construction

Basic Usage:

# Requires paired RNA + ATAC data
scvi.model.PEREGLM.setup_anndata(
    multiome_adata,
    rna_layer="counts",
    atac_layer="atac_counts"
)

model = scvi.model.PEREGLM(multiome_adata)
model.train()

# Get peak-gene links
peak_gene_links = model.get_regulatory_links()
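peRegLM's own formulation is not described here. For comparison, a common naive baseline for linking peaks to genes in paired multiome data is to correlate each peak's accessibility with a gene's expression across cells. Only peaks within a fixed window of the gene's transcription start site (TSS) are considered.

The sketch below implements that baseline, not peRegLM. The window size, the data layout (peak center positions assumed to be on the same chromosome as the gene), and all names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    denom = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return cov / denom if denom else 0.0


def link_peaks(
    gene_expr: Sequence[float],
    gene_tss: int,
    peaks: Dict[str, Tuple[int, Sequence[float]]],
    window: int = 50_000,
) -> List[Tuple[str, float]]:
    """Correlate each peak within +/- window bp of the TSS with expression across cells."""
    links = []
    for name, (center, accessibility) in peaks.items():
        if abs(center - gene_tss) <= window:
            links.append((name, pearson(accessibility, gene_expr)))
    # Strongest positive links first
    return sorted(links, key=lambda item: item[1], reverse=True)


# Hypothetical data for 4 cells
expr = [0.0, 1.0, 2.0, 3.0]
peaks = {
    "near_up": (10_500, [0.0, 1.0, 2.0, 3.0]),  # tracks expression
    "near_flat": (9_000, [1.0, 1.0, 1.0, 1.0]),  # constant accessibility
    "far": (500_000, [0.0, 1.0, 2.0, 3.0]),  # outside the window, ignored
}
print(link_peaks(expr, gene_tss=10_000, peaks=peaks))
# [('near_up', 1.0), ('near_flat', 0.0)]
```

Raw correlations ignore confounders such as cell type composition and sequencing depth. This is one reason dedicated models are used instead.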

Model-Specific Best Practices

MethylVI/MethylANVI

Sparsity: Methylation data is inherently sparse; the model accounts for this
CpG selection: Filter CpGs with very low coverage
Biological interpretation: Consider genomic context (promoters, enhancers)
Integration: For multi-omics, analyze separately then integrate results
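The CpG selection point can be sketched as keeping only sites that are covered (total count > 0) in at least a minimum number of cells. The sketch works on a plain cells × sites list of total counts. The threshold and the helper name covered_sites are hypothetical choices, not values recommended by scvi-tools.

```python
from typing import List, Sequence


def covered_sites(total: Sequence[Sequence[int]], min_cells: int) -> List[int]:
    """Indices of CpG sites with at least one read in >= min_cells cells."""
    n_sites = len(total[0]) if total else 0
    keep = []
    for site in range(n_sites):
        n_covered = sum(1 for cell in total if cell[site] > 0)
        if n_covered >= min_cells:
            keep.append(site)
    return keep


# Hypothetical total-read counts: 4 cells x 3 sites
total = [
    [5, 0, 1],
    [2, 0, 0],
    [7, 1, 0],
    [3, 0, 0],
]
print(covered_sites(total, min_cells=2))  # [0]
```

The returned indices can subset an AnnData along its variable axis (adata[:, keep]). Unlike sc.pp.filter_genes, which counts nonzero entries of .X, this filter is defined explicitly on the coverage matrix.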

CytoVI

Protein QC: Remove low-quality or uninformative proteins
Compensation: Ensure proper spectral compensation before analysis
Batch design: Include biological and technical replicates
Controls: Use control samples to validate batch correction
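For the protein QC point, one simple heuristic is to drop markers whose values barely vary across cells, since they carry little information for integration. The sketch assumes intensities are already on a transformed scale (for example arcsinh-transformed, as is common in cytometry) and stored as a dict of marker → per-cell values. The variance threshold and marker names are hypothetical.

```python
from statistics import pvariance
from typing import Dict, List, Sequence


def informative_markers(intensities: Dict[str, Sequence[float]], min_variance: float) -> List[str]:
    """Markers whose population variance across cells reaches min_variance."""
    return [
        marker
        for marker, values in intensities.items()
        if pvariance(values) >= min_variance
    ]


# Hypothetical transformed intensities for 4 cells
panel = {
    "CD3": [0.1, 3.2, 2.9, 0.2],
    "CD19": [2.5, 0.1, 0.3, 2.8],
    "Empty_channel": [0.05, 0.05, 0.06, 0.05],
}
print(informative_markers(panel, min_variance=0.1))  # ['CD3', 'CD19']
```

Variance alone can also flag a marker that is legitimately expressed by only a rare population, so dropped channels are worth a manual look.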

SysVI

Sample size: Designed for large-scale integration
Batch definition: Carefully define batch structure
Biological validation: Verify biological signals preserved
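For the batch definition point, a quick check is to cross-tabulate batch against a known biological label. A cell type found in only one batch, or a batch containing a single cell type, is confounded with batch, and integration may remove real biology along with the batch effect. The stdlib sketch below uses hypothetical labels; with AnnData, the two lists would come from adata.obs columns.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


def crosstab(batches: Sequence[str], cell_types: Sequence[str]) -> Dict[str, Counter]:
    """Cell type counts per batch."""
    table: Dict[str, Counter] = defaultdict(Counter)
    for b, ct in zip(batches, cell_types):
        table[b][ct] += 1
    return dict(table)


def batch_specific_types(batches: Sequence[str], cell_types: Sequence[str]) -> List[str]:
    """Cell types observed in exactly one batch (possible batch/biology confounding)."""
    seen_in = defaultdict(set)
    for b, ct in zip(batches, cell_types):
        seen_in[ct].add(b)
    return sorted(ct for ct, bs in seen_in.items() if len(bs) == 1)


batches = ["s1", "s1", "s1", "s2", "s2", "s2"]
types = ["T", "B", "T", "T", "B", "NK"]
print(dict(crosstab(batches, types)["s1"]))  # {'T': 2, 'B': 1}
print(batch_specific_types(batches, types))  # ['NK']
```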

Decipher

Start point: Define trajectory start cells if known
Branching: Specify expected number of branches
Validation: Use known markers to validate pseudotime
Integration: Works well with scVI embeddings
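To validate pseudotime with known markers, one option is a rank (Spearman) correlation between pseudotime and a marker expected to rise or fall along the trajectory. A strong correlation with the expected sign supports the ordering. The minimal stdlib sketch below uses average ranks for ties; the values are hypothetical.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    denom = math.sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))
    return cov / denom if denom else 0.0


pseudotime = [0.05, 0.2, 0.4, 0.7, 0.9]
early_marker = [9.0, 7.5, 4.0, 1.0, 0.5]  # expected to fall along the trajectory
print(spearman(pseudotime, early_marker))  # -1.0
```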

Integration with Other Models

Specialized models can be combined with other scvi-tools models:

Methylation + Expression:

import scvi

# Analyze separately, then integrate
# (each AnnData must first be registered with the corresponding model's setup method)
methylvi_model = scvi.model.METHYLVI(meth_adata)
scvi_model = scvi.model.SCVI(rna_adata)

# Integrate results at analysis level
# E.g., correlate methylation and expression patterns
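At the analysis level, one simple way to relate the two modalities is to compare per-cluster averages. For example, correlate a gene's mean promoter methylation with its mean expression across clusters that have been matched between the methylation and RNA analyses. The sketch assumes that this cluster matching and the per-cluster means already exist; the cluster names and values are hypothetical. Promoter methylation is often negatively associated with expression, so a negative correlation is a pattern one might look for, though not a guaranteed one.

```python
import math
from typing import Dict


def cluster_correlation(meth_by_cluster: Dict[str, float], expr_by_cluster: Dict[str, float]) -> float:
    """Pearson correlation of per-cluster means over clusters present in both modalities."""
    shared = sorted(set(meth_by_cluster) & set(expr_by_cluster))
    if len(shared) < 2:
        raise ValueError("need at least two shared clusters")
    x = [meth_by_cluster[c] for c in shared]
    y = [expr_by_cluster[c] for c in shared]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    denom = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return cov / denom if denom else 0.0


# Hypothetical per-cluster means for one gene
promoter_meth = {"0": 0.9, "1": 0.5, "2": 0.1, "3": 0.7}
expression = {"0": 0.2, "1": 1.0, "2": 1.8}  # cluster "3" has no RNA match
r = cluster_correlation(promoter_meth, expression)
print(round(r, 6))  # -1.0
```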

Cytometry + CITE-seq:

# CytoVI for flow/CyTOF
cyto_model = scvi.model.CYTOVI(cyto_adata)

# totalVI for CITE-seq
cite_model = scvi.model.TOTALVI(cite_adata)

# Compare protein measurements across platforms

ATAC + RNA (Multiome):

# MultiVI for joint analysis
multivi_model = scvi.model.MULTIVI(multiome_adata)

# peRegLM for regulatory links
pereglm_model = scvi.model.PEREGLM(multiome_adata)

Choosing Specialized Models

Decision Tree

What data modality?
- Methylation → MethylVI/MethylANVI
- Flow/CyTOF → CytoVI
- scRNA-seq with many or strongly differing batches → SysVI
- scRNA-seq with a differentiation or time-course structure → Decipher
- Paired RNA + ATAC (multiome) → peRegLM
For methylation data, do you have cell type labels?
- Yes, for at least some cells → MethylANVI
- No → MethylVI
What's your main goal?
- Batch correction → CytoVI, SysVI
- Trajectory/pseudotime → Decipher
- Peak-gene links → peRegLM
- Methylation patterns → MethylVI/MethylANVI

Example: Complete Methylation Analysis

import scvi
import scanpy as sc

# 1. Load methylation data
meth_adata = sc.read_h5ad("methylation_data.h5ad")

# 2. QC: drop CpG sites that are nonzero in .X in fewer than 10 cells
sc.pp.filter_genes(meth_adata, min_cells=10)

# 3. Setup MethylVI
scvi.model.METHYLVI.setup_anndata(
    meth_adata,
    layer="methylation",
    batch_key="batch"
)

# 4. Train model
model = scvi.model.METHYLVI(meth_adata, n_latent=15)
model.train(max_epochs=400)

# 5. Get latent representation
latent = model.get_latent_representation()
meth_adata.obsm["X_MethylVI"] = latent

# 6. Clustering
sc.pp.neighbors(meth_adata, use_rep="X_MethylVI")
sc.tl.umap(meth_adata)
sc.tl.leiden(meth_adata)

# 7. Differential methylation
dm_results = model.differential_methylation(
    groupby="leiden",
    group1="0",
    group2="1"
)

# 8. Save
model.save("methylvi_model")
meth_adata.write("methylation_analyzed.h5ad")
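model.differential_methylation performs a model-based comparison. As a sanity check on its output, a naive baseline is the per-site difference in pooled methylation level between the two groups, computed directly from methylated and total counts. The sketch below implements that baseline, not what MethylVI computes; the data and helper names are hypothetical.

```python
from typing import List, Optional, Sequence

Counts = Sequence[Sequence[int]]


def pooled_levels(methylated: Counts, total: Counts, cells: Sequence[int]) -> List[Optional[float]]:
    """Per-site methylated/total pooled over the given cells; None if uncovered."""
    n_sites = len(total[0])
    levels = []
    for site in range(n_sites):
        m = sum(methylated[c][site] for c in cells)
        t = sum(total[c][site] for c in cells)
        levels.append(m / t if t else None)
    return levels


def naive_delta(methylated: Counts, total: Counts, group1: Sequence[int], group2: Sequence[int]) -> List[Optional[float]]:
    """group1 minus group2 pooled methylation level per site (None if either is uncovered)."""
    a = pooled_levels(methylated, total, group1)
    b = pooled_levels(methylated, total, group2)
    return [x - y if x is not None and y is not None else None for x, y in zip(a, b)]


# Hypothetical counts: 4 cells x 2 sites; cells 0-1 in cluster "0", cells 2-3 in cluster "1"
methylated = [[8, 0], [7, 1], [3, 0], [2, 0]]
total = [[10, 0], [10, 2], [10, 0], [10, 0]]
print(naive_delta(methylated, total, group1=[0, 1], group2=[2, 3]))  # [0.5, None]
```

Pooling ignores cell-to-cell variability and coverage differences between cells, which is what a model-based test is meant to handle.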

External Tools Integration

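Some additional tools build on trained scvi-tools models:

SOLO (doublet detection, in the scvi.external module):

import scvi
from scvi.external import SOLO

# scvi_model: a trained scvi.model.SCVI model
solo = SOLO.from_scvi_model(scvi_model)
solo.train()
scores = solo.predict()  # per-cell doublet/singlet scores
labels = solo.predict(soft=False)  # "doublet" or "singlet" per cell

scArches (reference mapping):

# scArches-style mapping is built into the models (no separate SCARCHES class);
# reference_model is a trained model, query_adata a new AnnData with the same genes
scvi.model.SCVI.prepare_query_anndata(query_adata, reference_model)
query_model = scvi.model.SCVI.load_query_data(query_adata, reference_model)
query_model.train(max_epochs=200, plan_kwargs={"weight_decay": 0.0})

These tools extend scvi-tools functionality for specific use cases.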

Summary Table

Model	Data Type	Primary Use	Supervised?
MethylVI	Methylation	Unsupervised analysis	No
MethylANVI	Methylation	Cell type annotation	Semi
CytoVI	Cytometry	Batch correction	No
SysVI	scRNA-seq	Large-scale integration	No
Decipher	scRNA-seq	Trajectory inference	No
peRegLM	Multiome	Peak-gene links	No
SOLO	scRNA-seq	Doublet detection	Semi
