11 KiB
deepTools Common Workflows
This document provides complete workflow examples for common deepTools analyses.
ChIP-seq Quality Control Workflow
Complete quality control assessment for ChIP-seq experiments.
Step 1: Initial Correlation Assessment
Compare replicates and samples to verify experimental quality:
# Generate coverage matrix across genome
multiBamSummary bins \
--bamfiles Input1.bam Input2.bam ChIP1.bam ChIP2.bam \
--labels Input_rep1 Input_rep2 ChIP_rep1 ChIP_rep2 \
-o readCounts.npz \
--numberOfProcessors 8
# Create correlation heatmap
plotCorrelation \
-in readCounts.npz \
--corMethod pearson \
--whatToShow heatmap \
--plotFile correlation_heatmap.png \
--plotNumbers
# Generate PCA plot
plotPCA \
-in readCounts.npz \
-o PCA_plot.png \
-T "PCA of ChIP-seq samples"
Expected Results:
- Replicates should cluster together
- Input samples should be distinct from ChIP samples
Step 2: Coverage and Depth Assessment
# Check sequencing depth and coverage
plotCoverage \
--bamfiles Input1.bam ChIP1.bam ChIP2.bam \
--labels Input ChIP_rep1 ChIP_rep2 \
--plotFile coverage.png \
--ignoreDuplicates \
--numberOfProcessors 8
Interpretation: Assess whether sequencing depth is adequate for downstream analysis.
Step 3: Fragment Size Validation (Paired-end)
# Verify expected fragment sizes
bamPEFragmentSize \
--bamfiles Input1.bam ChIP1.bam ChIP2.bam \
--histogram fragmentSizes.png \
--plotTitle "Fragment Size Distribution"
Expected Results: Fragment sizes should match library preparation protocols (typically 200-600bp for ChIP-seq).
Step 4: GC Bias Detection and Correction
# Compute GC bias
computeGCBias \
--bamfile ChIP1.bam \
--effectiveGenomeSize 2913022398 \
--genome genome.2bit \
--fragmentLength 200 \
--biasPlot GCbias.png \
--frequenciesFile freq.txt
# If bias detected, correct it
correctGCBias \
--bamfile ChIP1.bam \
--effectiveGenomeSize 2913022398 \
--genome genome.2bit \
--GCbiasFrequenciesFile freq.txt \
--correctedFile ChIP1_GCcorrected.bam
Note: Only correct if significant bias is observed. Do NOT use --ignoreDuplicates with GC-corrected files.
Step 5: ChIP Signal Strength Assessment
# Evaluate ChIP enrichment quality
plotFingerprint \
--bamfiles Input1.bam ChIP1.bam ChIP2.bam \
--labels Input ChIP_rep1 ChIP_rep2 \
--plotFile fingerprint.png \
--extendReads 200 \
--ignoreDuplicates \
--numberOfProcessors 8 \
--outQualityMetrics fingerprint_metrics.txt
Interpretation:
- Strong ChIP: Steep rise in cumulative curve
- Weak enrichment: Curve close to diagonal (input-like)
ChIP-seq Analysis Workflow
Complete workflow from BAM files to publication-quality visualizations.
Step 1: Generate Normalized Coverage Tracks
# Input control
bamCoverage \
--bam Input.bam \
--outFileName Input_coverage.bw \
--normalizeUsing RPGC \
--effectiveGenomeSize 2913022398 \
--binSize 10 \
--extendReads 200 \
--ignoreDuplicates \
--numberOfProcessors 8
# ChIP sample
bamCoverage \
--bam ChIP.bam \
--outFileName ChIP_coverage.bw \
--normalizeUsing RPGC \
--effectiveGenomeSize 2913022398 \
--binSize 10 \
--extendReads 200 \
--ignoreDuplicates \
--numberOfProcessors 8
Step 2: Create Log2 Ratio Track
# Compare ChIP to Input
bamCompare \
--bamfile1 ChIP.bam \
--bamfile2 Input.bam \
--outFileName ChIP_vs_Input_log2ratio.bw \
--operation log2 \
--scaleFactorsMethod readCount \
--binSize 10 \
--extendReads 200 \
--ignoreDuplicates \
--numberOfProcessors 8
Result: Log2 ratio track showing enrichment (positive values) and depletion (negative values).
Step 3: Compute Matrix Around TSS
# Prepare data for heatmap/profile around transcription start sites
computeMatrix reference-point \
--referencePoint TSS \
--scoreFileName ChIP_coverage.bw \
--regionsFileName genes.bed \
--beforeRegionStartLength 3000 \
--afterRegionStartLength 3000 \
--binSize 10 \
--sortRegions descend \
--sortUsing mean \
--outFileName matrix_TSS.gz \
--outFileNameMatrix matrix_TSS.tab \
--numberOfProcessors 8
Step 4: Generate Heatmap
# Create heatmap around TSS
plotHeatmap \
--matrixFile matrix_TSS.gz \
--outFileName heatmap_TSS.png \
--colorMap RdBu \
--whatToShow 'plot, heatmap and colorbar' \
--zMin -3 --zMax 3 \
--yAxisLabel "Genes" \
--xAxisLabel "Distance from TSS (bp)" \
--refPointLabel "TSS" \
--heatmapHeight 15 \
--kmeans 3
Step 5: Generate Profile Plot
# Create meta-profile around TSS
plotProfile \
--matrixFile matrix_TSS.gz \
--outFileName profile_TSS.png \
--plotType lines \
--perGroup \
--colors blue \
--plotTitle "ChIP-seq signal around TSS" \
--yAxisLabel "Average signal" \
--xAxisLabel "Distance from TSS (bp)" \
--refPointLabel "TSS"
Step 6: Enrichment at Peaks
# Calculate enrichment in peak regions
plotEnrichment \
--bamfiles Input.bam ChIP.bam \
--BED peaks.bed \
--labels Input ChIP \
--plotFile enrichment.png \
--outRawCounts enrichment_counts.tab \
--extendReads 200 \
--ignoreDuplicates
RNA-seq Coverage Workflow
Generate strand-specific coverage tracks for RNA-seq data.
Forward Strand
bamCoverage \
--bam rnaseq.bam \
--outFileName forward_coverage.bw \
--filterRNAstrand forward \
--normalizeUsing CPM \
--binSize 1 \
--numberOfProcessors 8
Reverse Strand
bamCoverage \
--bam rnaseq.bam \
--outFileName reverse_coverage.bw \
--filterRNAstrand reverse \
--normalizeUsing CPM \
--binSize 1 \
--numberOfProcessors 8
Important: Do NOT use --extendReads for RNA-seq (would extend over splice junctions).
Multi-Sample Comparison Workflow
Compare multiple ChIP-seq samples (e.g., different conditions or time points).
Step 1: Generate Coverage Files
# For each sample
for sample in Control_ChIP Treated_ChIP; do
bamCoverage \
--bam ${sample}.bam \
--outFileName ${sample}.bw \
--normalizeUsing RPGC \
--effectiveGenomeSize 2913022398 \
--binSize 10 \
--extendReads 200 \
--ignoreDuplicates \
--numberOfProcessors 8
done
Step 2: Compute Multi-Sample Matrix
computeMatrix scale-regions \
--scoreFileName Control_ChIP.bw Treated_ChIP.bw \
--regionsFileName genes.bed \
--beforeRegionStartLength 1000 \
--afterRegionStartLength 1000 \
--regionBodyLength 3000 \
--binSize 10 \
--sortRegions descend \
--sortUsing mean \
--outFileName matrix_multi.gz \
--numberOfProcessors 8
Step 3: Multi-Sample Heatmap
plotHeatmap \
--matrixFile matrix_multi.gz \
--outFileName heatmap_comparison.png \
--colorMap Blues \
--whatToShow 'plot, heatmap and colorbar' \
--samplesLabel Control Treated \
--yAxisLabel "Genes" \
--heatmapHeight 15 \
--kmeans 4
Step 4: Multi-Sample Profile
plotProfile \
--matrixFile matrix_multi.gz \
--outFileName profile_comparison.png \
--plotType lines \
--perGroup \
--colors blue red \
--samplesLabel Control Treated \
--plotTitle "ChIP-seq signal comparison" \
--startLabel "TSS" \
--endLabel "TES"
ATAC-seq Workflow
Specialized workflow for ATAC-seq data with Tn5 offset correction.
Step 1: Shift Reads for Tn5 Correction
alignmentSieve \
--bam atacseq.bam \
--outFile atacseq_shifted.bam \
--ATACshift \
--minFragmentLength 38 \
--maxFragmentLength 2000 \
--ignoreDuplicates
Step 2: Generate Coverage Track
bamCoverage \
--bam atacseq_shifted.bam \
--outFileName atacseq_coverage.bw \
--normalizeUsing RPGC \
--effectiveGenomeSize 2913022398 \
--binSize 1 \
--numberOfProcessors 8
Step 3: Fragment Size Analysis
bamPEFragmentSize \
--bamfiles atacseq.bam \
--histogram fragmentSizes_atac.png \
--maxFragmentLength 1000
Expected Pattern: Nucleosome ladder with peaks at ~50bp (nucleosome-free), ~200bp (mono-nucleosome), ~400bp (di-nucleosome).
Peak Region Analysis Workflow
Analyze ChIP-seq signal specifically at peak regions.
Step 1: Matrix at Peaks
computeMatrix reference-point \
--referencePoint center \
--scoreFileName ChIP_coverage.bw \
--regionsFileName peaks.bed \
--beforeRegionStartLength 2000 \
--afterRegionStartLength 2000 \
--binSize 10 \
--outFileName matrix_peaks.gz \
--numberOfProcessors 8
Step 2: Heatmap at Peaks
plotHeatmap \
--matrixFile matrix_peaks.gz \
--outFileName heatmap_peaks.png \
--colorMap YlOrRd \
--refPointLabel "Peak Center" \
--heatmapHeight 15 \
--sortUsing max
Troubleshooting Common Issues
Issue: Out of Memory
Solution: Use --region parameter to process chromosomes individually:
bamCoverage --bam input.bam -o chr1.bw --region chr1
Issue: BAM Index Missing
Solution: Index BAM files before running deepTools:
samtools index input.bam
Issue: Slow Processing
Solution: Increase --numberOfProcessors:
# Use 8 cores instead of default
--numberOfProcessors 8
Issue: bigWig Files Too Large
Solution: Increase bin size:
--binSize 50 # or larger (default is 10-50)
Performance Tips
- Use multiple processors: Always set
--numberOfProcessorsto available cores - Process regions: Use
--regionfor testing or memory-limited environments - Adjust bin size: Larger bins = faster processing and smaller files
- Pre-filter BAM files: Use
alignmentSieveto create filtered BAM files once, then reuse - Use bigWig over bedGraph: bigWig format is compressed and faster to process
Best Practices
- Always check QC first: Run correlation, coverage, and fingerprint analysis before proceeding
- Document parameters: Save command lines for reproducibility
- Use consistent normalization: Apply same normalization method across samples in a comparison
- Verify reference genome match: Ensure BAM files and region files use same genome build
- Check strand orientation: For RNA-seq, verify correct strand orientation
- Test on small regions first: Use
--region chr1:1-1000000for testing parameters - Keep intermediate files: Save matrices for regenerating plots with different settings