488 lines
10 KiB
Markdown
488 lines
10 KiB
Markdown
# Verified Workflows
|
|
|
|
## Overview
|
|
Latch Verified Workflows are production-ready, pre-built bioinformatics pipelines developed and maintained by Latch engineers. These workflows are used by top pharmaceutical companies and biotech firms for research and discovery.
|
|
|
|
## Available in Python SDK
|
|
|
|
The `latch.verified` module provides programmatic access to verified workflows from Python code.
|
|
|
|
### Importing Verified Workflows
|
|
|
|
```python
|
|
from latch.verified import (
|
|
bulk_rnaseq,
|
|
deseq2,
|
|
mafft,
|
|
trim_galore,
|
|
alphafold,
|
|
colabfold
|
|
)
|
|
```
|
|
|
|
## Core Verified Workflows
|
|
|
|
### Bulk RNA-seq Analysis
|
|
|
|
**Alignment and Quantification:**
|
|
```python
|
|
from latch.verified import bulk_rnaseq
|
|
from latch.types import LatchFile
|
|
|
|
# Run bulk RNA-seq pipeline
|
|
results = bulk_rnaseq(
|
|
fastq_r1=LatchFile("latch:///data/sample_R1.fastq.gz"),
|
|
fastq_r2=LatchFile("latch:///data/sample_R2.fastq.gz"),
|
|
reference_genome="hg38",
|
|
output_dir="latch:///results/rnaseq"
|
|
)
|
|
```
|
|
|
|
**Features:**
|
|
- Read quality control with FastQC
|
|
- Adapter trimming
|
|
- Alignment with STAR or HISAT2
|
|
- Gene-level quantification with featureCounts
|
|
- MultiQC report generation
|
|
|
|
### Differential Expression Analysis
|
|
|
|
**DESeq2:**
|
|
```python
|
|
from latch.verified import deseq2
|
|
from latch.types import LatchFile
|
|
|
|
# Run differential expression analysis
|
|
results = deseq2(
|
|
count_matrix=LatchFile("latch:///data/counts.csv"),
|
|
sample_metadata=LatchFile("latch:///data/metadata.csv"),
|
|
design_formula="~ condition",
|
|
output_dir="latch:///results/deseq2"
|
|
)
|
|
```
|
|
|
|
**Features:**
|
|
- Normalization and variance stabilization
|
|
- Differential expression testing
|
|
- MA plots and volcano plots
|
|
- PCA visualization
|
|
- Annotated results tables
|
|
|
|
### Pathway Analysis
|
|
|
|
**Enrichment Analysis:**
|
|
```python
|
|
from latch.verified import pathway_enrichment
|
|
|
|
results = pathway_enrichment(
|
|
gene_list=LatchFile("latch:///data/deg_list.txt"),
|
|
organism="human",
|
|
databases=["GO_Biological_Process", "KEGG", "Reactome"],
|
|
output_dir="latch:///results/pathways"
|
|
)
|
|
```
|
|
|
|
**Supported Databases:**
|
|
- Gene Ontology (GO)
|
|
- KEGG pathways
|
|
- Reactome
|
|
- WikiPathways
|
|
- MSigDB collections
|
|
|
|
### Sequence Alignment
|
|
|
|
**MAFFT Multiple Sequence Alignment:**
|
|
```python
|
|
from latch.verified import mafft
|
|
from latch.types import LatchFile
|
|
|
|
aligned = mafft(
|
|
input_fasta=LatchFile("latch:///data/sequences.fasta"),
|
|
algorithm="auto",
|
|
output_format="fasta"
|
|
)
|
|
```
|
|
|
|
**Features:**
|
|
- Multiple alignment algorithms (FFT-NS-1, FFT-NS-2, G-INS-i, L-INS-i)
|
|
- Automatic algorithm selection
|
|
- Support for large alignments
|
|
- Various output formats
|
|
|
|
### Adapter and Quality Trimming
|
|
|
|
**Trim Galore:**
|
|
```python
|
|
from latch.verified import trim_galore
|
|
|
|
trimmed = trim_galore(
|
|
fastq_r1=LatchFile("latch:///data/sample_R1.fastq.gz"),
|
|
fastq_r2=LatchFile("latch:///data/sample_R2.fastq.gz"),
|
|
quality_threshold=20,
|
|
adapter_auto_detect=True
|
|
)
|
|
```
|
|
|
|
**Features:**
|
|
- Automatic adapter detection
|
|
- Quality trimming
|
|
- FastQC integration
|
|
- Support for single-end and paired-end
|
|
|
|
## Protein Structure Prediction
|
|
|
|
### AlphaFold
|
|
|
|
**Standard AlphaFold:**
|
|
```python
|
|
from latch.verified import alphafold
|
|
from latch.types import LatchFile
|
|
|
|
structure = alphafold(
|
|
sequence_fasta=LatchFile("latch:///data/protein.fasta"),
|
|
model_preset="monomer",
|
|
use_templates=True,
|
|
output_dir="latch:///results/alphafold"
|
|
)
|
|
```
|
|
|
|
**Features:**
|
|
- Monomer and multimer prediction
|
|
- Template-based modeling option
|
|
- MSA generation
|
|
- Confidence metrics (pLDDT, PAE)
|
|
- PDB structure output
|
|
|
|
**Model Presets:**
|
|
- `monomer`: Single protein chain
|
|
- `monomer_casp14`: CASP14 competition version
|
|
- `monomer_ptm`: With pTM confidence
|
|
- `multimer`: Protein complexes
|
|
|
|
### ColabFold
|
|
|
|
**Optimized AlphaFold Alternative:**
|
|
```python
|
|
from latch.verified import colabfold
|
|
|
|
structure = colabfold(
|
|
sequence_fasta=LatchFile("latch:///data/protein.fasta"),
|
|
num_models=5,
|
|
use_amber_relax=True,
|
|
output_dir="latch:///results/colabfold"
|
|
)
|
|
```
|
|
|
|
**Features:**
|
|
- Faster than standard AlphaFold
|
|
- MMseqs2-based MSA generation
|
|
- Multiple model predictions
|
|
- Amber relaxation
|
|
- Ranking by confidence
|
|
|
|
**Advantages:**
|
|
- 3-5x faster MSA generation
|
|
- Lower compute cost
|
|
- Similar accuracy to AlphaFold
|
|
|
|
## Single-Cell Analysis
|
|
|
|
### ArchR (scATAC-seq)
|
|
|
|
**Chromatin Accessibility Analysis:**
|
|
```python
|
|
from latch.verified import archr
|
|
|
|
results = archr(
|
|
fragments_file=LatchFile("latch:///data/fragments.tsv.gz"),
|
|
genome="hg38",
|
|
output_dir="latch:///results/archr"
|
|
)
|
|
```
|
|
|
|
**Features:**
|
|
- Arrow file generation
|
|
- Quality control metrics
|
|
- Dimensionality reduction
|
|
- Clustering
|
|
- Peak calling
|
|
- Motif enrichment
|
|
|
|
### scVelo (RNA Velocity)
|
|
|
|
**RNA Velocity Analysis:**
|
|
```python
|
|
from latch.verified import scvelo
|
|
|
|
results = scvelo(
|
|
adata_file=LatchFile("latch:///data/adata.h5ad"),
|
|
mode="dynamical",
|
|
output_dir="latch:///results/scvelo"
|
|
)
|
|
```
|
|
|
|
**Features:**
|
|
- Spliced/unspliced quantification
|
|
- Velocity estimation
|
|
- Dynamical modeling
|
|
- Trajectory inference
|
|
- Visualization
|
|
|
|
### emptyDropsR (Cell Calling)
|
|
|
|
**Empty Droplet Detection:**
|
|
```python
|
|
from latch.verified import emptydrops
|
|
|
|
filtered_matrix = emptydrops(
|
|
raw_matrix_dir=LatchDir("latch:///data/raw_feature_bc_matrix"),
|
|
fdr_threshold=0.01
|
|
)
|
|
```
|
|
|
|
**Features:**
|
|
- Distinguish cells from empty droplets
|
|
- FDR-based thresholding
|
|
- Ambient RNA removal
|
|
- Compatible with 10X data
|
|
|
|
## Gene Editing Analysis
|
|
|
|
### CRISPResso2
|
|
|
|
**CRISPR Editing Assessment:**
|
|
```python
|
|
from latch.verified import crispresso2
|
|
|
|
results = crispresso2(
|
|
fastq_r1=LatchFile("latch:///data/sample_R1.fastq.gz"),
|
|
amplicon_sequence="AGCTAGCTAG...",
|
|
guide_rna="GCTAGCTAGC",
|
|
output_dir="latch:///results/crispresso"
|
|
)
|
|
```
|
|
|
|
**Features:**
|
|
- Indel quantification
|
|
- Base editing analysis
|
|
- Prime editing analysis
|
|
- HDR quantification
|
|
- Allele frequency plots
|
|
|
|
## Phylogenetics
|
|
|
|
### Phylogenetic Tree Construction
|
|
|
|
```python
|
|
from latch.verified import phylogenetics
|
|
|
|
tree = phylogenetics(
|
|
alignment_file=LatchFile("latch:///data/aligned.fasta"),
|
|
method="maximum_likelihood",
|
|
bootstrap_replicates=1000,
|
|
output_dir="latch:///results/phylo"
|
|
)
|
|
```
|
|
|
|
**Features:**
|
|
- Multiple tree-building methods
|
|
- Bootstrap support
|
|
- Tree visualization
|
|
- Model selection
|
|
|
|
## Workflow Integration
|
|
|
|
### Using Verified Workflows in Custom Pipelines
|
|
|
|
```python
|
|
from latch import workflow, small_task
|
|
from latch.verified import bulk_rnaseq, deseq2
|
|
from latch.types import LatchFile, LatchDir
|
|
|
|
@workflow
|
|
def complete_rnaseq_analysis(
|
|
fastq_files: List[LatchFile],
|
|
metadata: LatchFile,
|
|
output_dir: LatchDir
|
|
) -> LatchFile:
|
|
"""
|
|
Complete RNA-seq analysis pipeline using verified workflows
|
|
"""
|
|
# Run alignment for each sample
|
|
aligned_samples = []
|
|
for fastq in fastq_files:
|
|
result = bulk_rnaseq(
|
|
fastq_r1=fastq,
|
|
reference_genome="hg38",
|
|
output_dir=output_dir
|
|
)
|
|
aligned_samples.append(result)
|
|
|
|
# Aggregate counts and run differential expression
|
|
count_matrix = aggregate_counts(aligned_samples)
|
|
deseq_results = deseq2(
|
|
count_matrix=count_matrix,
|
|
sample_metadata=metadata,
|
|
design_formula="~ condition"
|
|
)
|
|
|
|
return deseq_results
|
|
```
|
|
|
|
## Best Practices
|
|
|
|
### When to Use Verified Workflows
|
|
|
|
**Use Verified Workflows for:**
|
|
1. Standard analysis pipelines
|
|
2. Well-established methods
|
|
3. Production-ready analyses
|
|
4. Reproducible research
|
|
5. Validated bioinformatics tools
|
|
|
|
**Build Custom Workflows for:**
|
|
1. Novel analysis methods
|
|
2. Custom preprocessing steps
|
|
3. Integration with proprietary tools
|
|
4. Experimental pipelines
|
|
5. Highly specialized workflows
|
|
|
|
### Combining Verified and Custom
|
|
|
|
```python
|
|
from latch import workflow, small_task
|
|
from latch.verified import alphafold
|
|
from latch.types import LatchFile
|
|
|
|
@small_task
|
|
def preprocess_sequence(raw_fasta: LatchFile) -> LatchFile:
|
|
"""Custom preprocessing"""
|
|
# Custom logic here
|
|
return processed_fasta
|
|
|
|
@small_task
|
|
def postprocess_structure(pdb_file: LatchFile) -> LatchFile:
|
|
"""Custom post-analysis"""
|
|
# Custom analysis here
|
|
return analysis_results
|
|
|
|
@workflow
|
|
def custom_structure_pipeline(input_fasta: LatchFile) -> LatchFile:
|
|
"""
|
|
Combine custom steps with verified AlphaFold
|
|
"""
|
|
# Custom preprocessing
|
|
processed = preprocess_sequence(raw_fasta=input_fasta)
|
|
|
|
# Use verified AlphaFold
|
|
structure = alphafold(
|
|
sequence_fasta=processed,
|
|
model_preset="monomer_ptm"
|
|
)
|
|
|
|
# Custom post-processing
|
|
results = postprocess_structure(pdb_file=structure)
|
|
|
|
return results
|
|
```
|
|
|
|
## Accessing Workflow Documentation
|
|
|
|
### In-Platform Documentation
|
|
|
|
Each verified workflow includes:
|
|
- Parameter descriptions
|
|
- Input/output specifications
|
|
- Method details
|
|
- Citation information
|
|
- Example usage
|
|
|
|
### Viewing Available Workflows
|
|
|
|
```python
|
|
from latch.verified import list_workflows
|
|
|
|
# List all available verified workflows
|
|
workflows = list_workflows()
|
|
|
|
for workflow in workflows:
|
|
print(f"{workflow.name}: {workflow.description}")
|
|
```
|
|
|
|
## Version Management
|
|
|
|
### Workflow Versions
|
|
|
|
Verified workflows are versioned and maintained:
|
|
- Bug fixes and improvements
|
|
- New features added
|
|
- Backward compatibility maintained
|
|
- Version pinning available
|
|
|
|
### Using Specific Versions
|
|
|
|
```python
|
|
from latch.verified import bulk_rnaseq
|
|
|
|
# Use specific version
|
|
results = bulk_rnaseq(
|
|
fastq_r1=input_file,
|
|
reference_genome="hg38",
|
|
workflow_version="2.1.0"
|
|
)
|
|
```
|
|
|
|
## Support and Updates
|
|
|
|
### Getting Help
|
|
|
|
- **Documentation**: https://docs.latch.bio
|
|
- **Slack Community**: Latch SDK workspace
|
|
- **Support**: support@latch.bio
|
|
- **GitHub Issues**: Report bugs and request features
|
|
|
|
### Workflow Updates
|
|
|
|
Verified workflows receive regular updates:
|
|
- Tool version upgrades
|
|
- Performance improvements
|
|
- Bug fixes
|
|
- New features
|
|
|
|
Subscribe to release notes for update notifications.
|
|
|
|
## Common Use Cases
|
|
|
|
### Complete RNA-seq Study
|
|
|
|
```python
|
|
# 1. Quality control and alignment
|
|
aligned = bulk_rnaseq(fastq=samples)
|
|
|
|
# 2. Differential expression
|
|
deg = deseq2(counts=aligned)
|
|
|
|
# 3. Pathway enrichment
|
|
pathways = pathway_enrichment(genes=deg)
|
|
```
|
|
|
|
### Protein Structure Analysis
|
|
|
|
```python
|
|
# 1. Predict structure
|
|
structure = alphafold(sequence=protein_seq)
|
|
|
|
# 2. Custom analysis
|
|
results = analyze_structure(pdb=structure)
|
|
```
|
|
|
|
### Single-Cell Workflow
|
|
|
|
```python
|
|
# 1. Filter cells
|
|
filtered = emptydrops(matrix=raw_counts)
|
|
|
|
# 2. RNA velocity
|
|
velocity = scvelo(adata=filtered)
|
|
```
|