deepTools Complete Tool Reference
This document provides a comprehensive reference for all deepTools command-line utilities organized by category.
BAM and bigWig File Processing Tools
multiBamSummary
Computes read coverages for genomic regions across multiple BAM files, outputting compressed numpy arrays for downstream correlation and PCA analysis.
Modes:
- bins: Genome-wide analysis using consecutive equal-sized windows (default 10kb)
- BED-file: Restricts analysis to user-specified genomic regions
Key Parameters:
- --bamfiles, -b: Indexed BAM files (space-separated, required)
- --outFileName, -o: Output coverage matrix file (required)
- --BED: Region specification file (BED-file mode only)
- --binSize: Window size in bases (default: 10,000)
- --labels: Custom sample identifiers
- --minMappingQuality: Quality threshold for read inclusion
- --numberOfProcessors, -p: Parallel processing cores
- --extendReads: Fragment size extension
- --ignoreDuplicates: Remove PCR duplicates
- --outRawCounts: Export tab-delimited file with coordinate columns and per-sample counts
Output: Compressed numpy array (.npz) for plotCorrelation and plotPCA
Common Usage:
# Genome-wide comparison
multiBamSummary bins --bamfiles sample1.bam sample2.bam -o results.npz
# Peak region comparison
multiBamSummary BED-file --BED peaks.bed --bamfiles sample1.bam sample2.bam -o results.npz
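Before running plotCorrelation or plotPCA, the --outRawCounts table can be used to check how many reads each sample contributes to the bins. The sketch below is a hypothetical helper (raw_count_totals is not a deepTools command). It assumes the file starts with a header line beginning with "#", followed by tab-separated chr, start, end and one count column per BAM file in --bamfiles order.

```shell
# raw_count_totals FILE
# Hypothetical helper: sums each sample column of a multiBamSummary
# --outRawCounts table. Assumes a "#" header line, then tab-separated
# chr, start, end, and one count column per BAM file.
raw_count_totals() {
  awk -F '\t' '
    /^#/ { next }                       # skip the header line
    {
      for (i = 4; i <= NF; i++) total[i] += $i
      if (NF > ncol) ncol = NF
    }
    END {
      # Samples are numbered in the order the BAM files were given
      for (i = 4; i <= ncol; i++) print "sample" (i - 3) "\t" total[i]
    }' "$1"
}

# Example:
# multiBamSummary bins --bamfiles sample1.bam sample2.bam -o results.npz \
#   --outRawCounts counts.tab
# raw_count_totals counts.tab
```

Large differences between the totals are a reminder that raw counts still carry sequencing-depth differences.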
multiBigwigSummary
Similar to multiBamSummary but operates on bigWig files instead of BAM files. Used for comparing coverage tracks across samples.
Modes:
- bins: Genome-wide analysis
- BED-file: Region-specific analysis
Key Parameters: Similar to multiBamSummary, but the input bigWig files are given with --bwfiles (-b); read-level options such as --extendReads do not apply.
bamCoverage
Converts BAM alignment files into normalized coverage tracks in bigWig or bedGraph formats. Calculates coverage as number of reads per bin.
Key Parameters:
- --bam, -b: Input BAM file (required)
- --outFileName, -o: Output filename (required)
- --outFileFormat, -of: Output type (bigwig or bedgraph)
- --normalizeUsing: Normalization method
  - RPKM: Reads Per Kilobase per Million mapped reads
  - CPM: Counts Per Million mapped reads
  - BPM: Bins Per Million mapped reads
  - RPGC: Reads per genomic content (requires --effectiveGenomeSize)
  - None: No normalization (default)
- --effectiveGenomeSize: Mappable genome size (required for RPGC)
- --binSize: Resolution in base pairs (default: 50)
- --extendReads, -e: Extend reads to fragment length (recommended for ChIP-seq, NOT for RNA-seq)
- --centerReads: Center reads with respect to the fragment length for sharper signals
- --ignoreDuplicates: Count identical reads only once
- --minMappingQuality: Filter reads below quality threshold
- --minFragmentLength / --maxFragmentLength: Fragment length filtering
- --smoothLength: Window averaging for noise reduction
- --MNase: Analyze MNase-seq data for nucleosome positioning
- --Offset: Position-specific offsets (useful for RiboSeq, GROseq)
- --filterRNAstrand: Separate forward/reverse strand reads
- --ignoreForNormalization: Exclude chromosomes from normalization (e.g., sex chromosomes)
- --numberOfProcessors, -p: Parallel processing
Important Notes:
- For RNA-seq: Do NOT use --extendReads (would extend over splice junctions)
- For ChIP-seq: Use --extendReads with smaller bin sizes
- Never apply --ignoreDuplicates after GC bias correction
Common Usage:
# Basic coverage with RPKM normalization
bamCoverage --bam input.bam --outFileName coverage.bw --normalizeUsing RPKM
# ChIP-seq with extension
bamCoverage --bam chip.bam --outFileName chip_coverage.bw \
--binSize 10 --extendReads 200 --ignoreDuplicates
# Strand-specific RNA-seq
bamCoverage --bam rnaseq.bam --outFileName forward.bw \
--filterRNAstrand forward
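The --normalizeUsing methods reduce to simple per-bin formulas. The sketch below applies them to a single bin count so the resulting scales can be compared. normalize_bin and its argument layout are hypothetical, and it ignores everything bamCoverage does before counting (read filtering, extension, duplicate removal). The RPGC scaling factor is assumed to be (mapped reads × fragment length) / effective genome size.

```shell
# normalize_bin METHOD COUNT BIN_SIZE MAPPED_READS [EXTRA]
# Hypothetical helper applying the per-bin normalization formulas.
#   BPM:  EXTRA = sum of read counts over all bins
#   RPGC: EXTRA = FRAGMENT_LENGTH:EFFECTIVE_GENOME_SIZE
normalize_bin() {
  local method=$1 count=$2 bin=$3 mapped=$4 extra=${5:-}
  case "$method" in
    RPKM) # reads / (mapped reads in millions * bin length in kb)
      awk -v c="$count" -v b="$bin" -v m="$mapped" \
        'BEGIN { printf "%.4f\n", c / ((m / 1e6) * (b / 1000)) }' ;;
    CPM)  # reads / mapped reads in millions
      awk -v c="$count" -v m="$mapped" \
        'BEGIN { printf "%.4f\n", c / (m / 1e6) }' ;;
    BPM)  # reads / (sum of all bin counts in millions)
      awk -v c="$count" -v s="$extra" \
        'BEGIN { printf "%.4f\n", c / (s / 1e6) }' ;;
    RPGC) # reads / (mapped reads * fragment length / effective genome size)
      awk -v c="$count" -v m="$mapped" -v x="$extra" \
        'BEGIN { split(x, p, ":"); printf "%.4f\n", c / (m * p[1] / p[2]) }' ;;
    None) printf '%s\n' "$count" ;;
    *) echo "unknown method: $method" >&2; return 1 ;;
  esac
}

# Example: 20 reads in a 50 bp bin, 10 million mapped reads
# normalize_bin RPKM 20 50 10000000
# normalize_bin RPGC 20 50 10000000 200:1000000000
```

With these inputs, RPKM gives 40 while CPM gives 2. RPKM additionally divides by the bin length in kilobases, so the two values are not directly comparable.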
bamCompare
Compares two BAM files and writes the result as a bigWig or bedGraph file, normalizing for differences in sequencing depth. The genome is divided into equal-sized bins, and the comparison is computed separately for each bin.
Comparison Methods:
- log2 (default): Log2 ratio of samples
- ratio: Direct ratio calculation
- subtract: Difference between files
- add: Sum of samples
- mean: Average across samples
- reciprocal_ratio: Like ratio, but ratios below 1 are reported as the negative inverse (-1/ratio), so values read as signed fold changes
- first/second: Output scaled signal from single file
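Normalization Methods:
- readCount (default, set with --scaleFactorsMethod): Scales the samples to compensate for sequencing depth
- SES (set with --scaleFactorsMethod): Signal extraction scaling
- RPKM (set with --normalizeUsing): Reads per kilobase per million
- CPM (set with --normalizeUsing): Counts per million
- BPM (set with --normalizeUsing): Bins per million
- RPGC (set with --normalizeUsing): Reads per genomic content (requires --effectiveGenomeSize)
The --normalizeUsing methods are only applied when --scaleFactorsMethod is set to None.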
Key Parameters:
- --bamfile1, -b1: First BAM file (required)
- --bamfile2, -b2: Second BAM file (required)
- --outFileName, -o: Output filename (required)
- --outFileFormat, -of: bigwig or bedgraph
- --operation: Comparison method (see above)
- --scaleFactorsMethod: readCount (default), SES, or None
- --normalizeUsing: RPKM, CPM, BPM, RPGC, or None (only applied with --scaleFactorsMethod None)
- --binSize: Bin width for output (default: 50bp)
- --pseudocount: Value added to each bin to avoid division by zero (default: 1)
- --extendReads: Extend reads to fragment length
- --ignoreDuplicates: Count identical reads once
- --minMappingQuality: Quality threshold
- --numberOfProcessors, -p: Parallelization
Common Usage:
# Log2 ratio of treatment vs control
bamCompare -b1 treatment.bam -b2 control.bam -o log2ratio.bw
# Subtract control from treatment
bamCompare -b1 treatment.bam -b2 control.bam -o difference.bw \
--operation subtract --scaleFactorsMethod readCount
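The per-bin arithmetic behind the ratio operations can be sketched as follows. The sketch assumes that readCount scales the larger library down to the size of the smaller one, and that the pseudocount is added to both scaled bins before dividing. compare_bin is a hypothetical helper, not part of deepTools, and it only covers log2, ratio and reciprocal_ratio.

```shell
# compare_bin OPERATION COUNT1 COUNT2 TOTAL1 TOTAL2 [PSEUDOCOUNT]
# Hypothetical helper: one bin of a bamCompare-style comparison.
#   COUNT1/COUNT2: reads in the bin for -b1/-b2
#   TOTAL1/TOTAL2: library sizes used for readCount scaling
compare_bin() {
  awk -v op="$1" -v c1="$2" -v c2="$3" -v t1="$4" -v t2="$5" -v pc="${6:-1}" 'BEGIN {
    t1 += 0; t2 += 0
    small = (t1 < t2) ? t1 : t2
    # readCount scaling (assumed): the larger library is scaled down
    a = c1 * small / t1 + pc
    b = c2 * small / t2 + pc
    r = a / b
    if (op == "log2") v = log(r) / log(2)
    else if (op == "ratio") v = r
    else if (op == "reciprocal_ratio") v = (r >= 1) ? r : -1 / r
    else { print "unsupported operation: " op > "/dev/stderr"; exit 1 }
    printf "%.4f\n", v
  }'
}

# Example: treatment library twice the size of the control
# compare_bin log2 30 7 20000000 10000000
```

In the example, the 30 treatment reads scale to 15. The ratio (15 + 1) / (7 + 1) is 2, so log2 reports 1.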
correctGCBias / computeGCBias
computeGCBias: Identifies GC-content bias from sequencing and PCR amplification.
correctGCBias: Corrects BAM files for GC bias detected by computeGCBias.
Key Parameters (computeGCBias):
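- --bamfile, -b: Input BAM file
- --effectiveGenomeSize: Mappable genome size
- --genome, -g: Reference genome in 2bit format
- --GCbiasFrequenciesFile: Output file for the GC frequencies used by correctGCBias (required)
- --fragmentLength, -l: Fragment length (for single-end)
- --biasPlot: Output diagnostic plot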
Key Parameters (correctGCBias):
- --bamfile, -b: Input BAM file
- --effectiveGenomeSize: Mappable genome size
- --genome, -g: Reference genome in 2bit format
- --GCbiasFrequenciesFile: Frequencies from computeGCBias
- --correctedFile, -o: Output corrected BAM
Important: Never use --ignoreDuplicates after GC bias correction
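The two GC tools run in sequence: computeGCBias writes a frequency table (--GCbiasFrequenciesFile), and correctGCBias reads that table. The wrapper below sketches this order. gc_correct, the DRY_RUN switch and the output file names are hypothetical. The effective genome size must be the one appropriate for your assembly and reads.

```shell
# run CMD...: execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# gc_correct BAM GENOME_2BIT EFFECTIVE_GENOME_SIZE FRAGMENT_LENGTH PREFIX
# Hypothetical wrapper: measure GC bias, then write a corrected BAM.
gc_correct() {
  local bam=$1 genome=$2 egs=$3 fraglen=$4 prefix=$5
  # Step 1: frequency table (input for step 2) plus a diagnostic plot
  run computeGCBias -b "$bam" --effectiveGenomeSize "$egs" -g "$genome" \
    -l "$fraglen" --GCbiasFrequenciesFile "$prefix.freq.txt" \
    --biasPlot "$prefix.gcbias.png" || return
  # Step 2: same BAM, genome and genome size, plus the frequency table
  run correctGCBias -b "$bam" --effectiveGenomeSize "$egs" -g "$genome" \
    --GCbiasFrequenciesFile "$prefix.freq.txt" -o "$prefix.gccorrected.bam"
}

# Preview the commands without running them (genome size is a placeholder):
# DRY_RUN=1 gc_correct sample.bam genome.2bit 2150570000 200 sample
```

Downstream tools should then read the corrected BAM without --ignoreDuplicates.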
alignmentSieve
Filters BAM files by various quality metrics on-the-fly. Useful for creating filtered BAM files for specific analyses.
Key Parameters:
- --bam, -b: Input BAM file
- --outFile, -o: Output BAM file
- --minMappingQuality: Minimum mapping quality
- --ignoreDuplicates: Remove duplicates
- --minFragmentLength / --maxFragmentLength: Fragment length filters
- --samFlagInclude / --samFlagExclude: SAM flag filtering
- --shift: Shift reads (e.g., for ATAC-seq Tn5 correction)
- --ATACshift: Automatically shift for ATAC-seq data
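--samFlagInclude and --samFlagExclude take integer bitmasks built from SAM flag bits. The sketch below shows how to combine bits into a mask and how a filter would treat an individual read. It assumes that include requires every listed bit to be set and that exclude rejects a read carrying any listed bit. combine_flags and passes_flag_filter are hypothetical helpers.

```shell
# SAM flag bits, as defined in the SAM specification
FLAG_PAIRED=1; FLAG_PROPER_PAIR=2; FLAG_UNMAPPED=4; FLAG_REVERSE=16
FLAG_FIRST_IN_PAIR=64; FLAG_SECOND_IN_PAIR=128; FLAG_SECONDARY=256
FLAG_QC_FAIL=512; FLAG_DUPLICATE=1024; FLAG_SUPPLEMENTARY=2048

# combine_flags BIT...: OR the given bits into one mask
combine_flags() {
  local mask=0 f
  for f in "$@"; do mask=$(( mask | f )); done
  printf '%s\n' "$mask"
}

# passes_flag_filter FLAG INCLUDE EXCLUDE: succeeds if the read is kept
passes_flag_filter() {
  local flag=$1 include=${2:-0} exclude=${3:-0}
  [ $(( flag & include )) -eq "$include" ] && [ $(( flag & exclude )) -eq 0 ]
}

# Example: drop secondary, supplementary and QC-failed alignments
# alignmentSieve -b in.bam -o out.bam --samFlagExclude \
#   "$(combine_flags "$FLAG_SECONDARY" "$FLAG_SUPPLEMENTARY" "$FLAG_QC_FAIL")"
```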
computeMatrix
Calculates scores per genomic region and prepares matrices for plotHeatmap and plotProfile. Processes bigWig score files and BED/GTF region files.
Modes:
- reference-point: Signal distribution relative to specific position (TSS, TES, or center)
- scale-regions: Signal across regions standardized to uniform lengths
Key Parameters:
- -R: Region file(s) in BED/GTF format (required)
- -S: BigWig score file(s) (required)
- -o: Output matrix file (required)
- --referencePoint: TSS (default), TES, or center (reference-point only)
- -b: Distance upstream of the reference point (region start in scale-regions)
- -a: Distance downstream of the reference point (region end in scale-regions)
- -m: Region body length (scale-regions only)
- -bs, --binSize: Bin size for averaging scores (default: 10)
- --skipZeros: Skip regions with all zeros
- --minThreshold / --maxThreshold: Filter by signal intensity
- --sortRegions: ascending, descending, keep, no
- --sortUsing: mean, median, max, min, sum, region_length
- -p, --numberOfProcessors: Parallel processing
- --averageTypeBins: Statistical method (mean, median, min, max, sum, std)
Output Options:
- --outFileNameMatrix: Export tab-delimited data
- --outFileSortedRegions: Save filtered/sorted BED file
Common Usage:
# TSS analysis
computeMatrix reference-point -S signal.bw -R genes.bed \
-o matrix.gz -b 2000 -a 2000 --referencePoint TSS
# Scaled gene body
computeMatrix scale-regions -S signal.bw -R genes.bed \
-o matrix.gz -b 1000 -a 1000 -m 3000
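The number of bins each sample contributes to the computeMatrix output follows from the -b, -a and -m lengths and the bin size. The sketch below does that arithmetic. matrix_columns is a hypothetical helper that ignores any options not listed above. It assumes each length is a multiple of the bin size (computeMatrix default: 10 bp).

```shell
# matrix_columns MODE BEFORE AFTER [BODY] [BIN_SIZE]
# Hypothetical helper: number of bins per sample in a computeMatrix matrix.
#   MODE: reference-point or scale-regions
#   BEFORE/AFTER: the -b and -a lengths; BODY: the -m length (scale-regions)
matrix_columns() {
  local mode=$1 before=$2 after=$3 body=${4:-0} bin=${5:-10} span len
  case "$mode" in
    reference-point) span=$(( before + after )) ;;
    scale-regions)   span=$(( before + body + after )) ;;
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac
  # This sketch only handles lengths that split evenly into bins
  for len in "$before" "$after" "$body"; do
    if [ $(( len % bin )) -ne 0 ]; then
      echo "length $len is not a multiple of bin size $bin" >&2; return 1
    fi
  done
  printf '%s\n' $(( span / bin ))
}

# The two examples above, with the default bin size:
# matrix_columns reference-point 2000 2000      -> 400
# matrix_columns scale-regions 1000 1000 3000   -> 500
```

For reference-point mode with a custom bin size, pass 0 as BODY, e.g. `matrix_columns reference-point 2000 2000 0 50`.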
Quality Control Tools
plotFingerprint
Quality control tool primarily for ChIP-seq experiments. Assesses whether antibody enrichment was successful. Generates cumulative read coverage profiles to distinguish signal from noise.
Key Parameters:
- --bamfiles, -b: Indexed BAM files (required)
- --plotFile, -plot, -o: Output image filename (required)
- --extendReads, -e: Extend reads to fragment length
- --ignoreDuplicates: Count identical reads once
- --minMappingQuality: Mapping quality filter
- --centerReads: Center reads with respect to the fragment length
- --minFragmentLength / --maxFragmentLength: Fragment filters
- --outRawCounts: Save per-bin read counts
- --outQualityMetrics: Output QC metrics (Jensen-Shannon distance)
- --labels: Custom sample names
- --numberOfProcessors, -p: Parallel processing
Interpretation:
- Ideal control: Straight diagonal line
- Strong ChIP: Steep rise towards highest rank (concentrated reads in few bins)
- Weak enrichment: Flatter curve approaching diagonal
Common Usage:
plotFingerprint -b input.bam chip1.bam chip2.bam \
--labels Input ChIP1 ChIP2 -o fingerprint.png \
--extendReads 200 --ignoreDuplicates
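The fingerprint curve can be reproduced from per-bin read counts by sorting the bins by coverage and plotting the cumulative share of reads against the bin rank. The sketch below computes those points from counts supplied one per line, for example the counts saved with --outRawCounts. It assumes the y value is the cumulative sum of the sorted counts divided by their total. fingerprint_curve is a hypothetical helper.

```shell
# fingerprint_curve: reads one per-bin read count per line from stdin and
# prints "RANK_FRACTION CUMULATIVE_READ_FRACTION" for each bin, lowest first.
fingerprint_curve() {
  sort -n | awk '
    { count[NR] = $1; total += $1 }
    END {
      if (NR == 0 || total == 0) exit 1     # nothing to plot
      for (i = 1; i <= NR; i++) {
        cum += count[i]
        printf "%.4f %.4f\n", i / NR, cum / total
      }
    }'
}
```

Uniform counts give points on the diagonal (the ideal control). Counts concentrated in a few bins keep the curve near zero until the highest ranks (the strong ChIP case).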
plotCoverage
Visualizes average read distribution across the genome. Shows genome coverage and helps determine if sequencing depth is adequate.
Key Parameters:
- --bamfiles, -b: BAM files to analyze (required)
- --plotFile, -o: Output plot filename (required)
- --ignoreDuplicates: Remove PCR duplicates
- --minMappingQuality: Quality threshold
- --outRawCounts: Save underlying data
- --labels: Sample names
- --numberOfSamples: Number of positions to sample (default: 1,000,000)
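A rough answer to the depth question plotCoverage addresses is the fraction of sampled positions covered at least k times. The sketch below computes this from per-position depths supplied one per line. coverage_at_least is a hypothetical helper. The samtools input in the comment is an assumed convenience and is not what plotCoverage uses internally.

```shell
# coverage_at_least K: reads one per-position depth per line from stdin and
# prints the fraction of positions with depth >= K.
coverage_at_least() {
  awk -v k="$1" '
    BEGIN { k += 0 }
    { n++; if ($1 + 0 >= k) hit++ }
    END {
      if (n == 0) exit 1                   # no positions read
      printf "%.4f\n", hit / n
    }'
}

# Example input (assumes samtools is installed):
# samtools depth -a -r chr1:1-1000000 sample.bam | cut -f 3 | coverage_at_least 10
```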
bamPEFragmentSize
Determines fragment length distribution for paired-end sequencing data. Essential QC to verify expected fragment sizes from library preparation.
Key Parameters:
- --bamfiles, -b: BAM files (required)
- --histogram, -hist: Output histogram filename (required)
- --plotTitle, -T: Plot title
- --maxFragmentLength: Maximum length to consider (default: 1000)
- --logScale: Use logarithmic Y-axis
- --outRawFragmentLengths: Save raw fragment lengths
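As a quick cross-check of the reported fragment sizes, you can summarize the template length (TLEN, SAM column 9) of properly paired reads directly. This sketch is not how bamPEFragmentSize is implemented. median_fragment_length is hypothetical, and it counts each pair once by keeping only the mate with a positive TLEN.

```shell
# median_fragment_length: reads SAM records (header lines allowed) from stdin
# and prints the median TLEN of properly paired reads, counting each pair once.
median_fragment_length() {
  awk '
    /^@/ { next }                                # skip SAM header lines
    int($2 / 2) % 2 == 1 && $9 > 0 { print $9 }  # flag 0x2 = properly paired
  ' | sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      if (NR % 2 == 1) print v[(NR + 1) / 2]
      else print (v[NR / 2] + v[NR / 2 + 1]) / 2
    }'
}

# Example (assumes samtools is installed):
# samtools view -f 2 sample.bam | head -n 200000 | median_fragment_length
```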
plotCorrelation
Analyzes sample correlations from multiBamSummary or multiBigwigSummary outputs. Shows how similar different samples are.
Correlation Methods:
- Pearson: Measures linear correlation of the values themselves; sensitive to outliers; appropriate for normally distributed data
- Spearman: Rank-based; less influenced by outliers; better for non-normal distributions
Visualization Options:
- heatmap: Color intensity with hierarchical clustering (complete linkage)
- scatterplot: Pairwise scatter plots with correlation coefficients
Key Parameters:
- --corData, -in: Input matrix from multiBamSummary/multiBigwigSummary (required)
- --corMethod: pearson or spearman (required)
- --whatToShow: heatmap or scatterplot (required)
- --plotFile, -o: Output filename (required)
- --skipZeros: Exclude zero-value regions
- --removeOutliers: Use median absolute deviation (MAD) filtering
- --outFileCorMatrix: Export correlation matrix
- --labels: Custom sample names
- --plotTitle: Plot title
- --colorMap: Color scheme (50+ options)
- --plotNumbers: Display correlation values on heatmap
Common Usage:
# Heatmap with Pearson correlation
plotCorrelation -in readCounts.npz --corMethod pearson \
--whatToShow heatmap -o correlation_heatmap.png --plotNumbers
# Scatterplot with Spearman correlation
plotCorrelation -in readCounts.npz --corMethod spearman \
--whatToShow scatterplot -o correlation_scatter.png
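A single outlier bin is enough to separate the two methods. The sketch below computes Pearson on the values and Spearman as Pearson on average ranks, for two columns of per-bin counts. correlations is a hypothetical helper intended only for small inputs, since its ranking step is quadratic. It is not how plotCorrelation computes the coefficients.

```shell
# correlations: reads two whitespace-separated columns (sample A, sample B)
# from stdin and prints "PEARSON SPEARMAN".
correlations() {
  awk '
    function pearson(x, y, n,    i, mx, my, sxy, sxx, syy) {
      for (i = 1; i <= n; i++) { mx += x[i]; my += y[i] }
      mx /= n; my /= n
      for (i = 1; i <= n; i++) {
        sxy += (x[i] - mx) * (y[i] - my)
        sxx += (x[i] - mx) ^ 2
        syy += (y[i] - my) ^ 2
      }
      return sxy / sqrt(sxx * syy)
    }
    # Average ranks (1-based): tied values share the mean of their ranks
    function rank(v, r, n,    i, j, less, equal) {
      for (i = 1; i <= n; i++) {
        less = 0; equal = 0
        for (j = 1; j <= n; j++) {
          if (v[j] < v[i]) less++
          else if (v[j] == v[i]) equal++
        }
        r[i] = less + (equal + 1) / 2
      }
    }
    { a[NR] = $1 + 0; b[NR] = $2 + 0; ra[NR] = 0; rb[NR] = 0 }
    END {
      rank(a, ra, NR); rank(b, rb, NR)
      printf "%.4f %.4f\n", pearson(a, b, NR), pearson(ra, rb, NR)
    }'
}

# Example: one extreme bin in sample A
# printf '1 1\n2 2\n3 3\n4 4\n100 5\n' | correlations    # Pearson drops, Spearman stays 1
```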
plotPCA
Generates principal component analysis plots from multiBamSummary or multiBigwigSummary output. Displays sample relationships in reduced dimensionality.
Key Parameters:
- --corData, -in: Coverage file from multiBamSummary/multiBigwigSummary (required)
- --plotFile, -o: Output image (png, eps, pdf, svg) (required)
- --outFileNameData: Export PCA data (loadings/rotation and eigenvalues)
- --labels, -l: Custom sample labels
- --plotTitle, -T: Plot title
- --plotHeight / --plotWidth: Dimensions in centimeters
- --colors: Custom symbol colors
- --markers: Symbol shapes
- --transpose: Perform PCA on transposed matrix (rows=samples)
- --ntop: Use top N variable rows (default: 1000)
- --PCs: Components to plot (default: 1 2)
- --log2: Log2-transform data before analysis
- --rowCenter: Center each row at 0
Common Usage:
plotPCA -in readCounts.npz -o PCA_plot.png \
-T "PCA of read counts" --transpose
Visualization Tools
plotHeatmap
Creates genomic region heatmaps from computeMatrix output. Generates publication-quality visualizations.
Key Parameters:
- --matrixFile, -m: Matrix from computeMatrix (required)
- --outFileName, -o: Output image (png, eps, pdf, svg) (required)
- --outFileSortedRegions: Save regions after filtering
- --outFileNameMatrix: Export matrix values
- --interpolationMethod: auto, nearest, bilinear, bicubic, gaussian
  - Default: auto, which uses nearest for ≤1000 columns and bilinear for >1000 columns
- --dpi: Figure resolution
Clustering:
- --kmeans: k-means clustering
- --hclust: Hierarchical clustering (slower for >1000 regions)
- --silhouette: Calculate cluster quality metrics
Visual Customization:
- --heatmapHeight / --heatmapWidth: Dimensions (3-100 cm)
- --whatToShow: plot, heatmap, colorbar (combinations)
- --alpha: Transparency (0-1)
- --colorMap: 50+ color schemes
- --colorList: Custom gradient colors
- --zMin / --zMax: Intensity scale limits
- --boxAroundHeatmaps: yes/no (default: yes)
Labels:
- --xAxisLabel / --yAxisLabel: Axis labels
- --regionsLabel: Region set identifiers
- --samplesLabel: Sample names
- --refPointLabel: Reference point label
- --startLabel / --endLabel: Region boundary labels
Common Usage:
# Basic heatmap
plotHeatmap -m matrix.gz -o heatmap.png
# With clustering and custom colors
plotHeatmap -m matrix.gz -o heatmap.png \
--kmeans 3 --colorMap RdBu --zMin -3 --zMax 3
plotProfile
Generates profile plots showing scores across genomic regions using computeMatrix output.
Key Parameters:
- --matrixFile, -m: Matrix from computeMatrix (required)
- --outFileName, -o: Output image (png, eps, pdf, svg) (required)
- --plotType: lines, fill, se, std, overlapped_lines, heatmap
- --colors: Color palette (names or hex codes)
- --plotHeight / --plotWidth: Dimensions in centimeters
- --yMin / --yMax: Y-axis range
- --averageType: mean, median, min, max, std, sum
Clustering:
- --kmeans: k-means clustering
- --hclust: Hierarchical clustering
- --silhouette: Cluster quality metrics
Labels:
- --plotTitle: Main heading
- --regionsLabel: Region set identifiers
- --samplesLabel: Sample names
- --startLabel / --endLabel: Region boundary labels (scale-regions mode)
Output Options:
- --outFileNameData: Export data as tab-separated values
- --outFileSortedRegions: Save filtered/sorted regions as BED
Common Usage:
# Line plot
plotProfile -m matrix.gz -o profile.png --plotType lines
# With standard error shading
plotProfile -m matrix.gz -o profile.png --plotType se \
--colors blue red green
plotEnrichment
Calculates and visualizes signal enrichment across genomic regions by measuring the percentage of alignments that overlap each group of regions. Useful for FRiP (Fraction of Reads in Peaks) scores.
Key Parameters:
- --bamfiles, -b: Indexed BAM files (required)
- --BED: Region files in BED/GTF format (required)
- --plotFile, -o: Output visualization (png, pdf, eps, svg)
- --labels, -l: Custom sample identifiers
- --outRawCounts: Export numerical data
- --perSample: Group the plot by sample rather than by feature (by default, it is grouped by feature)
- --regionLabels: Custom region names
Read Processing:
- --minFragmentLength / --maxFragmentLength: Fragment filters
- --minMappingQuality: Quality threshold
- --samFlagInclude / --samFlagExclude: SAM flag filters
- --ignoreDuplicates: Remove duplicates
- --centerReads: Center reads for sharper signal
Common Usage:
plotEnrichment -b Input.bam H3K4me3.bam \
--BED peaks_up.bed peaks_down.bed \
--regionLabels "Up regulated" "Down regulated" \
-o enrichment.png
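FRiP is the share of reads that fall into peaks, expressed as a percentage. The sketch below computes it from two counts. frip is a hypothetical helper. The samtools commands in the comment are one assumed way to obtain the counts, and the result will not exactly match plotEnrichment when its read filters are in use.

```shell
# frip READS_IN_PEAKS TOTAL_READS: percentage of reads that overlap peaks
frip() {
  awk -v in_peaks="$1" -v total="$2" 'BEGIN {
    if (total + 0 <= 0) exit 1          # avoid dividing by zero
    printf "%.2f\n", 100 * in_peaks / total
  }'
}

# Example counts (assumes samtools is installed):
# total=$(samtools view -c -F 4 sample.bam)
# in_peaks=$(samtools view -c -F 4 -L peaks.bed sample.bam)
# frip "$in_peaks" "$total"
```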
Miscellaneous Tools
computeMatrixOperations
Advanced matrix manipulation tool for combining or subsetting matrices from computeMatrix. Enables complex multi-sample, multi-region analyses.
Operations:
- cbind: Combine matrices column-wise
- rbind: Combine matrices row-wise
- subset: Extract specific samples or region groups
- filterStrand: Keep only regions on a specific strand
- filterValues: Apply signal intensity filters
- sort: Reorder regions to match their order in one or more BED/GTF files
- dataRange: Report min/max values
Common Usage:
# Combine matrices
computeMatrixOperations cbind -m matrix1.gz matrix2.gz -o combined.gz
# Extract specific samples (selected by their sample labels)
computeMatrixOperations subset -m matrix.gz --samples sample1 sample3 -o subset.gz
estimateReadFiltering
Predicts the impact of various filtering parameters without actually filtering. Helps optimize filtering strategies before running full analyses.
Key Parameters:
- --bamfiles, -b: BAM files to analyze
- --sampleSize: Number of reads to sample (default: 100,000)
- --binSize: Bin size for analysis
- --distanceBetweenBins: Spacing between sampled bins
Filtration Options to Test:
- --minMappingQuality: Test quality thresholds
- --ignoreDuplicates: Assess duplicate impact
- --minFragmentLength / --maxFragmentLength: Test fragment filters
Common Parameters Across Tools
Many deepTools commands share these filtering and performance options:
Read Filtering:
- --ignoreDuplicates: Remove PCR duplicates
- --minMappingQuality: Filter by alignment confidence
- --samFlagInclude / --samFlagExclude: SAM flag filtering
- --minFragmentLength / --maxFragmentLength: Fragment length bounds
Performance:
- --numberOfProcessors, -p: Enable parallel processing
- --region: Process a specific genomic region (chr:start:end)
Read Processing:
- --extendReads: Extend to fragment length
- --centerReads: Center at fragment midpoint
- --ignoreDuplicates: Count unique reads only