Files
2025-11-30 08:30:10 +08:00

353 lines
9.5 KiB
Markdown

# Scanpy Plotting Guide
Comprehensive guide for creating publication-quality visualizations with scanpy.
## General Plotting Principles
All scanpy plotting functions follow consistent patterns:
- Functions in `sc.pl.*` mirror analysis functions in `sc.tl.*`
- Most accept `color` parameter for gene names or metadata columns
- Results are saved via `save` parameter
- Multiple plots can be generated in a single call
## Essential Quality Control Plots
### Visualize QC Metrics
```python
# Violin plots for QC metrics
sc.pl.violin(adata, ['n_genes_by_counts', 'total_counts', 'pct_counts_mt'],
jitter=0.4, multi_panel=True, save='_qc_violin.pdf')
# Scatter plots to identify outliers
sc.pl.scatter(adata, x='total_counts', y='pct_counts_mt', save='_qc_mt.pdf')
sc.pl.scatter(adata, x='total_counts', y='n_genes_by_counts', save='_qc_genes.pdf')
# Highest expressing genes
sc.pl.highest_expr_genes(adata, n_top=20, save='_highest_expr.pdf')
```
### Post-filtering QC
```python
# Compare before and after filtering
sc.pl.violin(adata, ['n_genes_by_counts', 'total_counts'],
groupby='sample', save='_post_filter.pdf')
```
## Dimensionality Reduction Visualizations
### PCA Plots
```python
# Basic PCA
sc.pl.pca(adata, color='leiden', save='_pca.pdf')
# PCA colored by gene expression
sc.pl.pca(adata, color=['gene1', 'gene2', 'gene3'], save='_pca_genes.pdf')
# Variance ratio plot (elbow plot)
sc.pl.pca_variance_ratio(adata, log=True, n_pcs=50, save='_variance.pdf')
# PCA loadings
sc.pl.pca_loadings(adata, components=[1, 2, 3], save='_loadings.pdf')
```
### UMAP Plots
```python
# Basic UMAP with clusters
sc.pl.umap(adata, color='leiden', legend_loc='on data', save='_umap_leiden.pdf')
# UMAP colored by multiple variables
sc.pl.umap(adata, color=['leiden', 'cell_type', 'batch'],
save='_umap_multi.pdf')
# UMAP with gene expression
sc.pl.umap(adata, color=['CD3D', 'CD14', 'MS4A1'],
use_raw=False, save='_umap_genes.pdf')
# Customize appearance
sc.pl.umap(adata, color='leiden',
palette='Set2',
size=50,
alpha=0.8,
frameon=False,
title='Cell Types',
save='_umap_custom.pdf')
```
### t-SNE Plots
```python
# t-SNE with clusters
sc.pl.tsne(adata, color='leiden', legend_loc='right margin', save='_tsne.pdf')
# Multiple t-SNE perplexities (if computed)
sc.pl.tsne(adata, color='leiden', save='_tsne_default.pdf')
```
## Clustering Visualizations
### Basic Cluster Plots
```python
# UMAP with cluster annotations
sc.pl.umap(adata, color='leiden', add_outline=True,
legend_loc='on data', legend_fontsize=12,
legend_fontoutline=2, frameon=False,
save='_clusters.pdf')
# Show cluster proportions
sc.pl.umap(adata, color='leiden', size=50, edges=True,
edges_width=0.1, save='_clusters_edges.pdf')
```
### Cluster Comparison
```python
# Compare clustering results
sc.pl.umap(adata, color=['leiden', 'louvain'],
save='_cluster_comparison.pdf')
# Cluster dendrogram
sc.tl.dendrogram(adata, groupby='leiden')
sc.pl.dendrogram(adata, groupby='leiden', save='_dendrogram.pdf')
```
## Marker Gene Visualizations
### Ranked Marker Genes
```python
# Overview of top markers per cluster
sc.pl.rank_genes_groups(adata, n_genes=25, sharey=False,
save='_marker_overview.pdf')
# Heatmap of top markers
sc.pl.rank_genes_groups_heatmap(adata, n_genes=10, groupby='leiden',
show_gene_labels=True,
save='_marker_heatmap.pdf')
# Dot plot of markers
sc.pl.rank_genes_groups_dotplot(adata, n_genes=5,
save='_marker_dotplot.pdf')
# Stacked violin plots
sc.pl.rank_genes_groups_stacked_violin(adata, n_genes=5,
save='_marker_violin.pdf')
# Matrix plot
sc.pl.rank_genes_groups_matrixplot(adata, n_genes=5,
save='_marker_matrix.pdf')
```
### Specific Gene Expression
```python
# Violin plots for specific genes
marker_genes = ['CD3D', 'CD14', 'MS4A1', 'NKG7', 'FCGR3A']
sc.pl.violin(adata, keys=marker_genes, groupby='leiden',
save='_markers_violin.pdf')
# Dot plot for curated markers
sc.pl.dotplot(adata, var_names=marker_genes, groupby='leiden',
save='_markers_dotplot.pdf')
# Heatmap for specific genes
sc.pl.heatmap(adata, var_names=marker_genes, groupby='leiden',
swap_axes=True, save='_markers_heatmap.pdf')
# Stacked violin for gene sets
sc.pl.stacked_violin(adata, var_names=marker_genes, groupby='leiden',
save='_markers_stacked.pdf')
```
### Gene Expression on Embeddings
```python
# Multiple genes on UMAP
genes = ['CD3D', 'CD14', 'MS4A1', 'NKG7']
sc.pl.umap(adata, color=genes, cmap='viridis',
save='_umap_markers.pdf')
# Gene expression with custom colormap
sc.pl.umap(adata, color='CD3D', cmap='Reds',
vmin=0, vmax=3, save='_umap_cd3d.pdf')
```
## Trajectory and Pseudotime Visualizations
### PAGA Plots
```python
# PAGA graph
sc.pl.paga(adata, color='leiden', save='_paga.pdf')
# PAGA with gene expression
sc.pl.paga(adata, color=['leiden', 'dpt_pseudotime'],
save='_paga_pseudotime.pdf')
# PAGA overlaid on UMAP
sc.pl.umap(adata, color='leiden', save='_umap_with_paga.pdf',
edges=True, edges_color='gray')
```
### Pseudotime Plots
```python
# DPT pseudotime on UMAP
sc.pl.umap(adata, color='dpt_pseudotime', save='_umap_dpt.pdf')
# Gene expression along pseudotime
sc.pl.dpt_timeseries(adata, save='_dpt_timeseries.pdf')
# Heatmap ordered by pseudotime
sc.pl.heatmap(adata, var_names=genes, groupby='leiden',
use_raw=False, show_gene_labels=True,
save='_pseudotime_heatmap.pdf')
```
## Advanced Visualizations
### Tracks Plot (Gene Expression Trends)
```python
# Show gene expression across cell types
sc.pl.tracksplot(adata, var_names=marker_genes, groupby='leiden',
save='_tracks.pdf')
```
### Correlation Matrix
```python
# Correlation between clusters
sc.pl.correlation_matrix(adata, groupby='leiden',
save='_correlation.pdf')
```
### Embedding Density
```python
# Cell density on UMAP
sc.tl.embedding_density(adata, basis='umap', groupby='cell_type')
sc.pl.embedding_density(adata, basis='umap', key='umap_density_cell_type',
save='_density.pdf')
```
## Multi-Panel Figures
### Creating Panel Figures
```python
import matplotlib.pyplot as plt
# Create multi-panel figure
fig, axes = plt.subplots(2, 2, figsize=(12, 12))
# Plot on specific axes
sc.pl.umap(adata, color='leiden', ax=axes[0, 0], show=False)
sc.pl.umap(adata, color='CD3D', ax=axes[0, 1], show=False)
sc.pl.umap(adata, color='CD14', ax=axes[1, 0], show=False)
sc.pl.umap(adata, color='MS4A1', ax=axes[1, 1], show=False)
plt.tight_layout()
plt.savefig('figures/multi_panel.pdf')
plt.show()
```
## Publication-Quality Customization
### High-Quality Settings
```python
# Set publication-quality defaults
sc.settings.set_figure_params(dpi=300, frameon=False, figsize=(5, 5),
facecolor='white')
# Vector graphics output
sc.settings.figdir = './figures/'
sc.settings.file_format_figs = 'pdf' # or 'svg'
```
### Custom Color Palettes
```python
# Use custom colors
custom_colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728']
sc.pl.umap(adata, color='leiden', palette=custom_colors,
save='_custom_colors.pdf')
# Continuous color maps
sc.pl.umap(adata, color='CD3D', cmap='viridis', save='_viridis.pdf')
sc.pl.umap(adata, color='CD3D', cmap='RdBu_r', save='_rdbu.pdf')
```
### Remove Axes and Frames
```python
# Clean plot without axes
sc.pl.umap(adata, color='leiden', frameon=False,
save='_clean.pdf')
# No legend
sc.pl.umap(adata, color='leiden', legend_loc=None,
save='_no_legend.pdf')
```
## Exporting Plots
### Save Individual Plots
```python
# Automatic saving with save parameter
sc.pl.umap(adata, color='leiden', save='_leiden.pdf')
# Saves to: sc.settings.figdir + 'umap_leiden.pdf'
# Manual saving
import matplotlib.pyplot as plt
fig = sc.pl.umap(adata, color='leiden', show=False, return_fig=True)
fig.savefig('figures/my_umap.pdf', dpi=300, bbox_inches='tight')
```
### Batch Export
```python
# Save multiple versions
for gene in ['CD3D', 'CD14', 'MS4A1']:
sc.pl.umap(adata, color=gene, save=f'_{gene}.pdf')
```
## Common Customization Parameters
### Layout Parameters
- `figsize`: Figure size (width, height)
- `frameon`: Show frame around plot
- `title`: Plot title
- `legend_loc`: 'right margin', 'on data', 'best', or None
- `legend_fontsize`: Font size for legend
- `size`: Point size
### Color Parameters
- `color`: Variable(s) to color by
- `palette`: Color palette (e.g., 'Set1', 'viridis')
- `cmap`: Colormap for continuous variables
- `vmin`, `vmax`: Color scale limits
- `use_raw`: Use raw counts for gene expression
### Saving Parameters
- `save`: Filename suffix for saving
- `show`: Whether to display plot
- `dpi`: Resolution for raster formats
## Tips for Publication Figures
1. **Use vector formats**: PDF or SVG for scalable graphics
2. **High DPI**: Set dpi=300 or higher for raster images
3. **Consistent styling**: Use the same color palette across figures
4. **Clear labels**: Ensure gene names and cell types are readable
5. **White background**: Use `facecolor='white'` for publications
6. **Remove clutter**: Set `frameon=False` for cleaner appearance
7. **Legend placement**: Use 'on data' for compact figures
8. **Color blind friendly**: Consider palettes like 'colorblind' or 'Set2'