Phylogenomics Workflow Templates
This directory contains template scripts for running the phylogenomics pipeline across different computing environments.
Directory Structure
templates/
├── slurm/ # SLURM job scheduler templates
├── pbs/ # PBS/Torque job scheduler templates
└── local/ # Local machine templates (with GNU parallel support)
Template Naming Convention
Templates follow a consistent naming pattern: NN_step_name[_variant].ext
- NN: Step number (e.g., 02 for compleasm, 08a for partition search)
- step_name: Descriptive name of the pipeline step
- _variant: Optional variant (e.g., _first, _parallel, _serial)
- .ext: File extension (.job for schedulers, .sh for local scripts)
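The naming pattern can be checked mechanically. Below is a minimal sketch, assuming the step number is always two digits with an optional lowercase letter and that the only variant suffixes are the three listed above. The function name `parse_template_name` and the `KNOWN_VARIANTS` tuple are hypothetical and not part of the pipeline.

```python
import re

# Pattern for NN_step_name[_variant].ext.
# Assumption: NN is two digits plus an optional lowercase letter (02, 08a).
TEMPLATE_NAME = re.compile(
    r"^(?P<step>\d{2}[a-z]?)_(?P<rest>[a-z0-9_]+)\.(?P<ext>job|sh)$"
)

# Assumption: only the variant suffixes mentioned in this README are recognized.
KNOWN_VARIANTS = ("first", "parallel", "serial")


def parse_template_name(filename):
    """Split a template filename into step, name, variant, and extension."""
    match = TEMPLATE_NAME.match(filename)
    if match is None:
        raise ValueError(f"{filename!r} does not follow NN_step_name[_variant].ext")
    name, variant = match.group("rest"), None
    for candidate in KNOWN_VARIANTS:
        suffix = "_" + candidate
        if name.endswith(suffix):
            # Strip the recognized variant suffix from the step name.
            name, variant = name[: -len(suffix)], candidate
            break
    return {
        "step": match.group("step"),
        "name": name,
        "variant": variant,
        "ext": match.group("ext"),
    }


print(parse_template_name("02_compleasm_first.job"))
print(parse_template_name("08a_partition_search.sh"))
```

A suffix outside `KNOWN_VARIANTS` (such as `_array`) is kept as part of the step name in this sketch.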
Available Templates
Step 2: Ortholog Identification (compleasm)
SLURM:
- 02_compleasm_first.job - Process first genome to download lineage database
- 02_compleasm_parallel.job - Array job for remaining genomes
PBS:
- 02_compleasm_first.job - Process first genome to download lineage database
- 02_compleasm_parallel.job - Array job for remaining genomes
Local:
- 02_compleasm_first.sh - Process first genome to download lineage database
- 02_compleasm_parallel.sh - GNU parallel for remaining genomes
Step 8A: Partition Model Selection
SLURM:
- 08a_partition_search.job - IQ-TREE partition model search with TESTMERGEONLY
PBS:
- 08a_partition_search.job - IQ-TREE partition model search with TESTMERGEONLY
Local:
- 08a_partition_search.sh - IQ-TREE partition model search with TESTMERGEONLY
Step 8C: Individual Gene Trees
SLURM:
- 08c_gene_trees_array.job - Array job for parallel gene tree estimation
PBS:
- 08c_gene_trees_array.job - Array job for parallel gene tree estimation
Local:
- 08c_gene_trees_parallel.sh - GNU parallel for gene tree estimation
- 08c_gene_trees_serial.sh - Serial processing (for debugging/limited resources)
Placeholders
Templates contain placeholders that must be replaced with user-specific values:
| Placeholder | Description | Example |
|---|---|---|
| TOTAL_THREADS | Total CPU cores available | 64 |
| THREADS_PER_JOB | Threads per concurrent job | 16 |
| NUM_GENOMES | Number of genomes in analysis | 20 |
| NUM_LOCI | Number of loci/alignments | 2795 |
| LINEAGE | BUSCO lineage dataset | insecta_odb10 |
| MODEL_SET | Comma-separated substitution models | LG,WAG,JTT,Q.pfam |
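Placeholder replacement can be done with whole-word matching, so that a token such as LINEAGE is not rewritten inside a longer identifier like LINEAGE_DIR. The sketch below is one possible approach, not the pipeline's own code. The helper `fill_placeholders` and the sample text are hypothetical, and the sample does not reproduce any real template.

```python
import re

# Placeholder names taken from the table above.
PLACEHOLDERS = (
    "TOTAL_THREADS",
    "THREADS_PER_JOB",
    "NUM_GENOMES",
    "NUM_LOCI",
    "LINEAGE",
    "MODEL_SET",
)

# \b treats "_" as a word character, so LINEAGE_DIR is left untouched.
PLACEHOLDER_RE = re.compile(r"\b(" + "|".join(PLACEHOLDERS) + r")\b")


def fill_placeholders(template_text, values):
    """Substitute known placeholders; raise if one appears without a value."""
    unfilled = set()

    def substitute(match):
        name = match.group(1)
        if name in values:
            return str(values[name])
        unfilled.add(name)
        return name

    filled = PLACEHOLDER_RE.sub(substitute, template_text)
    if unfilled:
        raise KeyError(f"No value supplied for: {', '.join(sorted(unfilled))}")
    return filled


# Hypothetical sample text, for illustration only.
sample = "#SBATCH --cpus-per-task=TOTAL_THREADS\necho lineage=LINEAGE dir=LINEAGE_DIR\n"
print(fill_placeholders(sample, {"TOTAL_THREADS": 64, "LINEAGE": "insecta_odb10"}))
```

Raising on a missing value, rather than silently leaving the token in place, makes it harder to hand a user a script that still contains an unreplaced placeholder.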
Usage
For Claude (LLM)
When a user requests scripts for a specific computing environment:
- Read the appropriate template using the Read tool
- Replace placeholders with user-specified values
- Present the customized script to the user
- Provide setup instructions (e.g., how many genomes, how to calculate thread allocation)
Example:
# Read the template. Read(...) stands for the Read tool, not a Python built-in.
template = Read("templates/slurm/02_compleasm_first.job")
# Replace placeholders with the user's values
script = template.replace("TOTAL_THREADS", "64")
script = script.replace("LINEAGE", "insecta_odb10")
# Present the customized script to the user
print(script)
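For the thread-allocation part of the setup instructions, one plausible calculation is to divide TOTAL_THREADS by THREADS_PER_JOB to get the number of jobs that can run at once. This is an assumption about how the two placeholders relate, based on their descriptions in the table; the function `plan_threads` is hypothetical.

```python
def plan_threads(total_threads, threads_per_job):
    """Return (concurrent_jobs, idle_threads) for a given thread budget."""
    if threads_per_job < 1:
        raise ValueError("threads_per_job must be at least 1")
    if total_threads < threads_per_job:
        raise ValueError("total_threads must be >= threads_per_job")
    # Whole jobs that fit into the budget; leftover threads stay idle.
    concurrent_jobs = total_threads // threads_per_job
    idle_threads = total_threads - concurrent_jobs * threads_per_job
    return concurrent_jobs, idle_threads


# Example values from the placeholder table: 64 cores, 16 threads per job.
jobs, idle = plan_threads(64, 16)
print(f"{jobs} concurrent jobs, {idle} idle threads")  # 4 concurrent jobs, 0 idle threads
```

A non-zero idle count suggests choosing a THREADS_PER_JOB value that divides TOTAL_THREADS evenly.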
For Users
Templates are not meant to be used directly. Instead:
- Follow the workflow in SKILL.md
- Answer Claude's questions about your setup
- Claude will fetch the appropriate template and customize it for you
- Copy the customized script Claude provides
Benefits of This Structure
- Reduced token usage: Claude only reads templates when needed
- Easier maintenance: Update one template file instead of multiple locations in SKILL.md
- Consistency: All users get the same base template structure
- Clarity: Separate files are easier to review than inline code
- Extensibility: Easy to add new templates for additional tools or variants
Adding New Templates
When adding new templates:
- Follow naming convention: NN_descriptive_name[_variant].ext
- Include clear comments: Explain what the script does
- Use consistent placeholders: Match existing placeholder names
- Test thoroughly: Ensure placeholders are complete and correct
- Update this README: Add the new template to the "Available Templates" section
- Update SKILL.md: Reference the new template in the appropriate workflow step
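When testing a new template, a quick scan for near-miss placeholder names (for example NUM_GENOME instead of NUM_GENOMES) can catch inconsistencies before they reach users. The sketch below uses Python's standard `difflib` for fuzzy matching. The function `find_placeholder_typos`, the 0.8 similarity cutoff, and the sample text are assumptions chosen for illustration.

```python
import re
from difflib import get_close_matches

# Placeholder names documented in this README.
PLACEHOLDERS = (
    "TOTAL_THREADS",
    "THREADS_PER_JOB",
    "NUM_GENOMES",
    "NUM_LOCI",
    "LINEAGE",
    "MODEL_SET",
)

# Upper-case tokens of three or more characters, e.g. NUM_GENOMES or SBATCH.
TOKEN_RE = re.compile(r"\b[A-Z][A-Z0-9_]{2,}\b")


def find_placeholder_typos(template_text, cutoff=0.8):
    """Map each suspicious token to the documented placeholder it resembles."""
    suspects = {}
    for token in set(TOKEN_RE.findall(template_text)):
        if token in PLACEHOLDERS:
            continue  # exact match, nothing to report
        close = get_close_matches(token, PLACEHOLDERS, n=1, cutoff=cutoff)
        if close:
            suspects[token] = close[0]
    return suspects


# Hypothetical template text containing one misspelled placeholder.
sample = "#SBATCH --array=1-NUM_GENOME\necho $SLURM_ARRAY_TASK_ID LINEAGE\n"
print(find_placeholder_typos(sample))
```

Scheduler variables such as SLURM_ARRAY_TASK_ID are not similar enough to any documented placeholder to be flagged at this cutoff, so the report stays focused on likely typos.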