Initial commit
408
skills/scvi-tools/references/models-specialized.md
Normal file
@@ -0,0 +1,408 @@
# Specialized Modality Models

This document covers models for specialized single-cell data modalities in scvi-tools.

## MethylVI / MethylANVI (Methylation Analysis)

**Purpose**: Analysis of single-cell bisulfite sequencing (scBS-seq) data for DNA methylation.

**Key Features**:
- Models methylation patterns at single-cell resolution
- Handles sparsity in methylation data
- Batch correction for methylation experiments
- Label transfer (MethylANVI) for cell type annotation

**When to Use**:
- Analyzing scBS-seq or similar methylation data
- Studying DNA methylation patterns across cell types
- Integrating methylation data across batches
- Cell type annotation based on methylation profiles

**Data Requirements**:
- Methylation count matrices (methylated vs. total reads per CpG site)
- Format: Cells × CpG sites (or genomic regions), with methylated-read counts and total-read coverage stored as separate matrices
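
To make the data layout concrete, the sketch below builds a toy pair of matrices (methylated reads and total coverage) and derives per-site methylation fractions from them. The names `mc`, `cov`, and `methylation_fraction` are illustrative assumptions and not part of scvi-tools; the point is only that sites without coverage carry no information and should not be read as "unmethylated".

```python
import numpy as np


def methylation_fraction(mc, cov):
    """Per-cell, per-site methylation fraction; NaN where a site has no coverage."""
    mc = np.asarray(mc, dtype=float)
    cov = np.asarray(cov, dtype=float)
    if mc.shape != cov.shape:
        raise ValueError("mc and cov must have the same shape")
    if np.any(mc > cov):
        raise ValueError("methylated counts cannot exceed coverage")
    frac = np.full(mc.shape, np.nan)
    # Divide only where coverage is positive; other entries stay NaN
    np.divide(mc, cov, out=frac, where=cov > 0)
    return frac


# Toy data: 2 cells x 3 CpG sites
mc = np.array([[2, 0, 1], [0, 0, 4]])   # methylated reads
cov = np.array([[4, 0, 1], [5, 0, 4]])  # total reads
frac = methylation_fraction(mc, cov)
```

Keeping both count matrices, rather than only the ratio, preserves how much evidence supports each value: 1 of 1 reads and 40 of 40 reads give the same fraction but very different certainty.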

### MethylVI (Unsupervised)

**Basic Usage**:
```python
import scvi

# MethylVI lives in scvi.external. It expects two count layers:
# methylated reads and total coverage (layer names here are placeholders).
# Assumption: a release whose setup_anndata accepts AnnData; older releases
# register a MuData object through setup_mudata instead.
scvi.external.METHYLVI.setup_anndata(
    adata,
    mc_layer="mc",
    cov_layer="cov",
    batch_key="batch",
)

model = scvi.external.METHYLVI(adata)
model.train()

# Get latent representation
latent = model.get_latent_representation()

# Get normalized methylation values
normalized_meth = model.get_normalized_methylation()
```

### MethylANVI (Semi-supervised with cell types)

**Basic Usage**:
```python
# Setup with cell type labels; unlabeled cells carry the unlabeled_category value.
# mc_layer / cov_layer hold methylated-read and coverage counts
# (placeholder names; older releases use setup_mudata with a MuData object).
scvi.external.METHYLANVI.setup_anndata(
    adata,
    mc_layer="mc",
    cov_layer="cov",
    batch_key="batch",
    labels_key="cell_type",
    unlabeled_category="Unknown",
)

model = scvi.external.METHYLANVI(adata)
model.train()

# Predict cell types
predictions = model.predict()
```

**Key Parameters**:
- `n_latent`: Latent dimensionality
- `n_hidden`, `n_layers`: Width and depth of the encoder and decoder networks

**Use Cases**:
- Epigenetic heterogeneity analysis
- Cell type identification via methylation
- Integration with gene expression data (separate analysis)
- Differential methylation analysis

## CytoVI (Flow and Mass Cytometry)

**Purpose**: Batch correction and integration of flow cytometry and mass cytometry (CyTOF) data.

**Key Features**:
- Handles antibody-based protein measurements
- Corrects batch effects in cytometry data
- Enables integration across experiments
- Designed for high-dimensional protein panels

**When to Use**:
- Analyzing flow cytometry or CyTOF data
- Integrating cytometry experiments across batches
- Batch correction for protein panels
- Cross-study cytometry integration

**Data Requirements**:
- Protein expression matrix (cells × proteins)
- Flow cytometry or CyTOF measurements
- Batch/experiment annotations

**Basic Usage**:
```python
import scvi

# CytoVI lives in scvi.external. Assumption: the layer below (placeholder
# name "scaled") holds transformed, scaled marker intensities per cell.
scvi.external.CYTOVI.setup_anndata(
    adata,
    layer="scaled",
    batch_key="batch",
)

model = scvi.external.CYTOVI(adata)
model.train()

# Get batch-corrected representation
latent = model.get_latent_representation()

# Get normalized protein values
normalized = model.get_normalized_expression()
```

**Key Parameters**:
- `n_latent`: Latent space dimensionality
- `n_layers`: Network depth

**Typical Workflow**:
```python
import scanpy as sc
import scvi

# 1. Load cytometry data
adata = sc.read_h5ad("cytof_data.h5ad")

# 2. Train CytoVI (assumes preprocessed intensities in the "scaled" layer)
scvi.external.CYTOVI.setup_anndata(
    adata,
    layer="scaled",
    batch_key="experiment",
)
model = scvi.external.CYTOVI(adata)
model.train()

# 3. Get batch-corrected values
latent = model.get_latent_representation()
adata.obsm["X_CytoVI"] = latent

# 4. Downstream analysis
sc.pp.neighbors(adata, use_rep="X_CytoVI")
sc.tl.umap(adata)
sc.tl.leiden(adata)

# 5. Visualize batch correction (color by the same key used as batch_key)
sc.pl.umap(adata, color=["experiment", "leiden"])
```

## SysVI (Cross-System Integration)

**Purpose**: Batch effect correction across datasets with substantial differences (for example species, organoid versus tissue, or sequencing protocol), with emphasis on preserving biological variation.

**Key Features**:
- Specialized batch integration approach
- Preserves biological signals while removing technical effects
- Designed for large-scale integration studies

**When to Use**:
- Large-scale multi-batch integration
- Need to preserve subtle biological variation
- Systems-level analysis across many studies

**Basic Usage**:
```python
import scvi

# SysVI lives in scvi.external (class name SysVI).
# Assumption: adata.X holds normalized, log-transformed expression
# rather than raw counts.
scvi.external.SysVI.setup_anndata(
    adata,
    batch_key="batch",
)

model = scvi.external.SysVI(adata)
model.train()

latent = model.get_latent_representation(adata=adata)
```

## Decipher (Trajectory Inference)

**Purpose**: Trajectory inference and pseudotime analysis for single-cell data.

**Key Features**:
- Learns cellular trajectories and differentiation paths
- Pseudotime estimation
- Accounts for uncertainty in trajectory structure
- Compatible with scVI embeddings

**When to Use**:
- Studying cellular differentiation
- Time-course or developmental datasets
- Understanding cell state transitions
- Identifying branching points in development

**Basic Usage**:
```python
import scanpy as sc
import scvi

# Decipher lives in scvi.external and is trained directly on counts;
# it does not take a trained scVI model as input.
scvi.external.Decipher.setup_anndata(adata, layer="counts")
decipher_model = scvi.external.Decipher(adata)
decipher_model.train()

# Store the Decipher latent space
adata.obsm["X_decipher"] = decipher_model.get_latent_representation()

# Get pseudotime. Assumption: computed downstream with scanpy's diffusion
# pseudotime on the Decipher latent space; the root cell index is a placeholder.
sc.pp.neighbors(adata, use_rep="X_decipher")
sc.tl.diffmap(adata)
adata.uns["iroot"] = 0
sc.tl.dpt(adata)
adata.obs["pseudotime"] = adata.obs["dpt_pseudotime"]
```

**Visualization**:
```python
import scanpy as sc

# Plot pseudotime on UMAP (the embedding needs a neighbors graph)
sc.tl.umap(adata)
sc.pl.umap(adata, color="pseudotime", cmap="viridis")

# Gene expression along pseudotime
sc.pl.scatter(adata, x="pseudotime", y="gene_of_interest")
```

## peRegLM (Peak Regulatory Linear Model)

**Purpose**: Linking chromatin accessibility to gene expression for regulatory analysis.

**Key Features**:
- Links ATAC-seq peaks to gene expression
- Identifies regulatory relationships
- Works with paired multiome data

**When to Use**:
- Multiome data (RNA + ATAC from same cells)
- Understanding gene regulation
- Linking peaks to target genes
- Regulatory network construction

**Basic Usage**:
```python
# Requires paired RNA + ATAC data
scvi.model.PEREGLM.setup_anndata(
    multiome_adata,
    rna_layer="counts",
    atac_layer="atac_counts"
)

model = scvi.model.PEREGLM(multiome_adata)
model.train()

# Get peak-gene links
peak_gene_links = model.get_regulatory_links()
```

## Model-Specific Best Practices

### MethylVI/MethylANVI
1. **Sparsity**: Methylation data is inherently sparse; the model accounts for this
2. **CpG selection**: Filter CpGs with very low coverage
3. **Biological interpretation**: Consider genomic context (promoters, enhancers)
4. **Integration**: For multi-omics, analyze separately, then integrate results
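
As a rough illustration of the CpG selection step, the sketch below keeps only sites that are covered in a minimum number of cells. The function name `filter_low_coverage_sites` and both thresholds are hypothetical choices for this example; suitable cutoffs depend on the protocol and sequencing depth.

```python
import numpy as np


def filter_low_coverage_sites(mc, cov, min_cells=10, min_reads=1):
    """Keep sites with at least `min_reads` reads in at least `min_cells` cells."""
    mc = np.asarray(mc)
    cov = np.asarray(cov)
    # Number of cells in which each site reaches the read threshold
    cells_covered = (cov >= min_reads).sum(axis=0)
    keep = cells_covered >= min_cells
    return mc[:, keep], cov[:, keep], keep


# Toy data: 4 cells x 3 sites; only the middle site is covered in 2+ cells
cov = np.array([[0, 3, 1], [0, 2, 0], [1, 5, 0], [0, 1, 0]])
mc = np.array([[0, 1, 1], [0, 2, 0], [1, 0, 0], [0, 1, 0]])
mc_f, cov_f, keep = filter_low_coverage_sites(mc, cov, min_cells=2)
```

The same boolean mask is applied to both matrices so methylated counts and coverage stay aligned.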

### CytoVI
1. **Protein QC**: Remove low-quality or uninformative proteins
2. **Compensation**: Ensure proper spectral compensation before analysis
3. **Batch design**: Include biological and technical replicates
4. **Controls**: Use control samples to validate batch correction
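
One possible way to act on the controls item: if the same control sample is measured in every batch, its cells should intermix in the latent space after correction. The sketch below scores this as the average fraction of each cell's nearest neighbors that come from a different batch. `neighbor_batch_mixing` is a hypothetical helper written for this example, not a scvi-tools function, and it uses a dense distance matrix, so it is only suited to small control subsets.

```python
import numpy as np


def neighbor_batch_mixing(latent, batches, k=5):
    """Mean fraction of each cell's k nearest neighbors that belong to another batch."""
    latent = np.asarray(latent, dtype=float)
    batches = np.asarray(batches)
    # Pairwise squared Euclidean distances (dense; fine for small control sets)
    diff = latent[:, None, :] - latent[None, :, :]
    dist = (diff ** 2).sum(axis=-1)
    np.fill_diagonal(dist, np.inf)  # a cell is not its own neighbor
    nn_idx = np.argsort(dist, axis=1)[:, :k]
    other_batch = batches[nn_idx] != batches[:, None]
    return float(other_batch.mean())


# Two batches far apart: no neighbor comes from the other batch
separated = np.vstack([np.zeros((10, 2)), np.full((10, 2), 100.0)])
labels = np.array(["a"] * 10 + ["b"] * 10)
score_separated = neighbor_batch_mixing(separated, labels, k=5)

# Batches alternating along a line: almost every neighbor is from the other batch
line = np.arange(20, dtype=float)[:, None]
alternating = np.array(["a", "b"] * 10)
score_mixed = neighbor_batch_mixing(line, alternating, k=2)
```

In practice, `latent` would be the rows of `adata.obsm["X_CytoVI"]` for the control cells. A score near 0 suggests the batches remain separated; for well-mixed batches the score approaches the share of cells belonging to other batches.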

### SysVI
1. **Sample size**: Designed for large-scale integration
2. **Batch definition**: Carefully define batch structure
3. **Biological validation**: Verify that biological signals are preserved

### Decipher
1. **Start point**: Define trajectory start cells if known
2. **Branching**: Specify expected number of branches
3. **Validation**: Use known markers to validate pseudotime
4. **Integration**: Works well with scVI embeddings
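
A simple form of the marker validation in item 3 is a rank correlation between pseudotime and the expression of a gene whose direction of change is already known. The sketch below implements Spearman correlation with NumPy only; the helper names and the toy values are assumptions made for illustration.

```python
import numpy as np


def rank(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman(x, y):
    """Spearman rank correlation between two 1-D arrays."""
    return float(np.corrcoef(rank(x), rank(y))[0, 1])


pseudotime = np.array([0.0, 0.1, 0.3, 0.6, 0.8, 1.0])
late_marker = np.array([0.0, 0.5, 0.4, 2.0, 3.5, 9.0])   # expected to rise
early_marker = np.array([5.0, 4.0, 3.0, 2.0, 1.0, 0.0])  # expected to fall

rho_late = spearman(pseudotime, late_marker)
rho_early = spearman(pseudotime, early_marker)
```

A strongly positive value for a late marker and a strongly negative value for an early marker support the inferred ordering; values near zero, or with the wrong sign, suggest the root cell or the trajectory should be revisited.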

## Integration with Other Models

Many specialized models work well in combination:

**Methylation + Expression**:
```python
# Analyze separately, then integrate
methylvi_model = scvi.external.METHYLVI(meth_adata)
scvi_model = scvi.model.SCVI(rna_adata)

# Integrate results at analysis level
# E.g., correlate methylation and expression patterns
```

**Cytometry + CITE-seq**:
```python
# CytoVI for flow/CyTOF
cyto_model = scvi.external.CYTOVI(cyto_adata)

# totalVI for CITE-seq
cite_model = scvi.model.TOTALVI(cite_adata)

# Compare protein measurements across platforms
```

**ATAC + RNA (Multiome)**:
```python
# MultiVI for joint analysis
multivi_model = scvi.model.MULTIVI(multiome_adata)

# peRegLM for regulatory links
pereglm_model = scvi.model.PEREGLM(multiome_adata)
```

## Choosing Specialized Models

### Decision Tree

1. **What data modality?**
   - Methylation → MethylVI/MethylANVI
   - Flow/CyTOF → CytoVI
   - Trajectory → Decipher
   - Multi-batch integration → SysVI
   - Regulatory links → peRegLM

2. **Do you have labels?**
   - Yes → MethylANVI (methylation)
   - No → MethylVI (methylation)

3. **What's your main goal?**
   - Batch correction → CytoVI, SysVI
   - Trajectory/pseudotime → Decipher
   - Peak-gene links → peRegLM
   - Methylation patterns → MethylVI/ANVI
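
The decision tree can also be written as a small lookup, which is handy when a pipeline has to pick a model from configuration. `choose_model` and its keyword strings are hypothetical names for this sketch; it only returns the model name listed in the tree and does not import scvi-tools.

```python
def choose_model(modality: str, has_labels: bool = False) -> str:
    """Return the model name the decision tree suggests for a task."""
    key = modality.strip().lower()
    if key == "methylation":
        # Labels decide between the semi-supervised and unsupervised variant
        return "MethylANVI" if has_labels else "MethylVI"
    suggestions = {
        "cytometry": "CytoVI",  # flow cytometry or CyTOF
        "trajectory": "Decipher",
        "multi-batch integration": "SysVI",
        "regulatory links": "peRegLM",
    }
    if key not in suggestions:
        raise ValueError(f"No specialized model listed for {modality!r}")
    return suggestions[key]


print(choose_model("methylation", has_labels=True))  # MethylANVI
print(choose_model("Cytometry"))                     # CytoVI
```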

## Example: Complete Methylation Analysis

```python
import scvi
import scanpy as sc

# 1. Load methylation data
meth_adata = sc.read_h5ad("methylation_data.h5ad")

# 2. QC: filter low-coverage CpG sites (filter_genes counts nonzero entries of .X)
sc.pp.filter_genes(meth_adata, min_cells=10)

# 3. Setup MethylVI (placeholder layer names: methylated reads and coverage;
#    assumes a release whose setup_anndata accepts AnnData)
scvi.external.METHYLVI.setup_anndata(
    meth_adata,
    mc_layer="mc",
    cov_layer="cov",
    batch_key="batch",
)

# 4. Train model
model = scvi.external.METHYLVI(meth_adata, n_latent=15)
model.train(max_epochs=400)

# 5. Get latent representation
latent = model.get_latent_representation()
meth_adata.obsm["X_MethylVI"] = latent

# 6. Clustering
sc.pp.neighbors(meth_adata, use_rep="X_MethylVI")
sc.tl.umap(meth_adata)
sc.tl.leiden(meth_adata)

# 7. Differential methylation
dm_results = model.differential_methylation(
    groupby="leiden",
    group1="0",
    group2="1",
)

# 8. Save
model.save("methylvi_model")
meth_adata.write("methylation_analyzed.h5ad")
```

## External Tools Integration

Some additional tools ship with scvi-tools alongside the core models:

**SOLO** (doublet detection):
```python
from scvi.external import SOLO

solo = SOLO.from_scvi_model(scvi_model)
solo.train()
doublets = solo.predict()
```

**scArches** (reference mapping), exposed on the models through `load_query_data`:
```python
# For transfer learning and query-to-reference mapping
query_model = scvi.model.SCVI.load_query_data(query_adata, scvi_model)
query_model.train(max_epochs=200, plan_kwargs={"weight_decay": 0.0})
query_latent = query_model.get_latent_representation()
```

These tools extend scvi-tools functionality for specific use cases.

## Summary Table

| Model | Data Type | Primary Use | Supervised? |
|-------|-----------|-------------|-------------|
| MethylVI | Methylation | Unsupervised analysis | No |
| MethylANVI | Methylation | Cell type annotation | Semi |
| CytoVI | Cytometry | Batch correction | No |
| SysVI | scRNA-seq | Large-scale integration | No |
| Decipher | scRNA-seq | Trajectory inference | No |
| peRegLM | Multiome | Peak-gene links | No |
| SOLO | scRNA-seq | Doublet detection | Semi |