# Scanpy API Quick Reference
Quick reference for commonly used scanpy functions organized by module.
## Import Convention
```python
import scanpy as sc
```
## Reading and Writing Data (sc.read_*)
### Reading Functions
```python
sc.read_10x_h5(filename) # Read 10X HDF5 file
sc.read_10x_mtx(path) # Read 10X mtx directory
sc.read_h5ad(filename) # Read h5ad (AnnData) file
sc.read_csv(filename) # Read CSV file
sc.read_excel(filename, sheet) # Read Excel file (the sheet argument is required)
sc.read_loom(filename) # Read loom file
sc.read_text(filename) # Read text file
sc.read_visium(path) # Read Visium spatial data (deprecated in recent scanpy; squidpy.read.visium is the suggested replacement)
```
### Writing Functions
```python
adata.write_h5ad(filename) # Write to h5ad format
adata.write_csvs(dirname) # Write to CSV files
adata.write_loom(filename) # Write to loom format
adata.write_zarr(filename) # Write to zarr format
```
## Preprocessing (sc.pp.*)
### Quality Control
```python
sc.pp.calculate_qc_metrics(adata, qc_vars=['mt'], inplace=True)
sc.pp.filter_cells(adata, min_genes=200)
sc.pp.filter_genes(adata, min_cells=3)
```
### Normalization and Transformation
```python
sc.pp.normalize_total(adata, target_sum=1e4) # Normalize to target sum
sc.pp.log1p(adata) # Log(x + 1) transformation
sc.pp.sqrt(adata) # Square root transformation
```
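To make the arithmetic behind `normalize_total` followed by `log1p` concrete, here is a plain NumPy sketch on a hypothetical dense count matrix. It mirrors the default behavior (per-cell scaling to `target_sum`, then natural log of `x + 1`) and is not scanpy's actual implementation.

```python
import numpy as np

# Hypothetical raw counts: 2 cells x 3 genes.
counts = np.array([[1.0, 3.0, 6.0],
                   [20.0, 0.0, 20.0]])
target_sum = 1e4

# Scale each cell (row) so that its counts add up to target_sum.
totals = counts.sum(axis=1, keepdims=True)
normalized = counts / totals * target_sum

# Natural-log transform with a pseudocount of 1.
logged = np.log1p(normalized)
```

After the first step every row of `normalized` sums to `target_sum`, which is what makes cells with different sequencing depth comparable.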
### Feature Selection
```python
sc.pp.highly_variable_genes(adata, min_mean=0.0125, max_mean=3, min_disp=0.5)
sc.pp.highly_variable_genes(adata, flavor='seurat_v3', n_top_genes=2000) # Expects raw counts; requires scikit-misc
```
### Scaling and Regression
```python
sc.pp.scale(adata, max_value=10) # Scale to zero mean and unit variance, clip values above 10
sc.pp.regress_out(adata, ['total_counts', 'pct_counts_mt']) # Regress out unwanted variation
```
### Dimensionality Reduction (Preprocessing)
```python
sc.pp.pca(adata, n_comps=50) # Principal component analysis
sc.pp.neighbors(adata, n_neighbors=10, n_pcs=40) # Compute neighborhood graph
```
### Batch Correction
```python
sc.pp.combat(adata, key='batch') # ComBat batch correction
```
## Tools (sc.tl.*)
### Dimensionality Reduction
```python
sc.tl.pca(adata, svd_solver='arpack') # PCA
sc.tl.umap(adata) # UMAP embedding
sc.tl.tsne(adata) # t-SNE embedding
sc.tl.diffmap(adata) # Diffusion map
sc.tl.draw_graph(adata, layout='fa') # Force-directed graph
```
### Clustering
```python
sc.tl.leiden(adata, resolution=0.5) # Leiden clustering (recommended)
sc.tl.louvain(adata, resolution=0.5) # Louvain clustering
# scanpy has no sc.tl.kmeans; for k-means, run scikit-learn's KMeans on adata.obsm['X_pca']
```
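scanpy does not provide a k-means function of its own, so k-means is typically run with scikit-learn on the PCA coordinates. The sketch below uses a synthetic array as a stand-in for `adata.obsm['X_pca']`; the data, the cluster count, and the `'kmeans'` column name are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for adata.obsm['X_pca']: two well-separated groups of 20 cells.
rng = np.random.default_rng(0)
X_pca = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(20, 5)),
    rng.normal(loc=5.0, scale=0.1, size=(20, 5)),
])

# Fit k-means and get one integer label per cell.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_pca)

# String labels plot as discrete categories in scanpy; with a real AnnData object:
# adata.obs['kmeans'] = pd.Categorical(labels.astype(str))
kmeans_labels = labels.astype(str)
```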
### Marker Genes and Differential Expression
```python
sc.tl.rank_genes_groups(adata, groupby='leiden', method='wilcoxon')
sc.tl.rank_genes_groups(adata, groupby='leiden', method='t-test')
sc.tl.rank_genes_groups(adata, groupby='leiden', method='logreg')
# Get results as dataframe
sc.get.rank_genes_groups_df(adata, group='0')
```
### Trajectory Inference
```python
sc.tl.paga(adata, groups='leiden') # PAGA trajectory
sc.tl.dpt(adata) # Diffusion pseudotime (run sc.tl.diffmap and set adata.uns['iroot'] first)
```
### Gene Scoring
```python
sc.tl.score_genes(adata, gene_list, score_name='score')
sc.tl.score_genes_cell_cycle(adata, s_genes, g2m_genes)
```
### Embeddings and Projections
```python
sc.tl.ingest(adata, adata_ref) # Map to reference
sc.tl.embedding_density(adata, basis='umap', groupby='leiden')
```
## Plotting (sc.pl.*)
### Basic Embeddings
```python
sc.pl.umap(adata, color='leiden') # UMAP plot
sc.pl.tsne(adata, color='gene_name') # t-SNE plot
sc.pl.pca(adata, color='leiden') # PCA plot
sc.pl.diffmap(adata, color='leiden') # Diffusion map plot
```
### Heatmaps and Dot Plots
```python
sc.pl.heatmap(adata, var_names=genes, groupby='leiden')
sc.pl.dotplot(adata, var_names=genes, groupby='leiden')
sc.pl.matrixplot(adata, var_names=genes, groupby='leiden')
sc.pl.stacked_violin(adata, var_names=genes, groupby='leiden')
```
### Violin and Scatter Plots
```python
sc.pl.violin(adata, keys=['gene1', 'gene2'], groupby='leiden')
sc.pl.scatter(adata, x='gene1', y='gene2', color='leiden')
```
### Marker Gene Visualization
```python
sc.pl.rank_genes_groups(adata, n_genes=25, sharey=False)
sc.pl.rank_genes_groups_violin(adata, groups='0')
sc.pl.rank_genes_groups_heatmap(adata, n_genes=10)
sc.pl.rank_genes_groups_dotplot(adata, n_genes=5)
```
### Trajectory Visualization
```python
sc.pl.paga(adata, color='leiden') # PAGA graph
sc.pl.dpt_timeseries(adata) # DPT timeseries
```
### QC Plots
```python
sc.pl.highest_expr_genes(adata, n_top=20)
sc.pl.violin(adata, ['n_genes_by_counts', 'total_counts', 'pct_counts_mt'])
sc.pl.scatter(adata, x='total_counts', y='n_genes_by_counts')
```
### Advanced Plots
```python
sc.pl.dendrogram(adata, groupby='leiden')
sc.pl.correlation_matrix(adata, groupby='leiden')
sc.pl.tracksplot(adata, var_names=genes, groupby='leiden')
```
## Common Parameters
### Color Parameters
- `color`: Variable(s) to color by (gene name, obs column)
- `use_raw`: Whether to plot values from `adata.raw` instead of `adata.X`
- `palette`: Color palette to use
- `vmin`, `vmax`: Color scale limits
### Layout Parameters
- `basis`: Embedding basis ('umap', 'tsne', 'pca', etc.)
- `legend_loc`: Legend location ('on data', 'right margin', etc.)
- `size`: Point size
- `alpha`: Point transparency
### Saving Parameters
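- `save`: `True` or a string appended to the default filename; the figure is written to `sc.settings.figdir`
- `show`: Whether to display the plot; with `show=False` the axes are returned for further editing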
## AnnData Structure
```python
adata.X # Expression matrix (cells × genes)
adata.obs # Cell annotations (DataFrame)
adata.var # Gene annotations (DataFrame)
adata.uns # Unstructured annotations (dict)
adata.obsm # Multi-dimensional cell annotations (e.g., PCA, UMAP)
adata.varm # Multi-dimensional gene annotations
adata.layers # Additional data layers
adata.raw # Raw data backup
# Access
adata.obs_names # Cell barcodes
adata.var_names # Gene names
adata.shape # (n_cells, n_genes)
# Slicing
adata[cell_indices, gene_indices]
adata[:, adata.var_names.isin(gene_list)]
adata[adata.obs['leiden'] == '0', :]
```
## Settings
```python
sc.settings.verbosity = 3 # 0=error, 1=warning, 2=info, 3=hint
sc.settings.set_figure_params(dpi=80, facecolor='white')
sc.settings.autoshow = False # Don't show plots automatically
sc.settings.autosave = True # Autosave figures
sc.settings.figdir = './figures/' # Figure directory
sc.settings.cachedir = './cache/' # Cache directory
sc.settings.n_jobs = 8 # Number of parallel jobs
```
## Useful Utilities
```python
sc.logging.print_versions() # Print version information
sc.logging.print_memory_usage() # Print memory usage
adata.copy() # Create a copy of AnnData object
ad.concat([adata1, adata2]) # Concatenate AnnData objects (import anndata as ad; replaces the deprecated adata.concatenate)
```