223 lines
7.9 KiB
Markdown
223 lines
7.9 KiB
Markdown
# BioGeoBEARS Biogeographic Analysis Skill
|
|
|
|
A Claude skill for setting up and executing phylogenetic biogeographic analyses using BioGeoBEARS in R.
|
|
|
|
## Overview
|
|
|
|
This skill automates the complete workflow for biogeographic analysis on phylogenetic trees, from raw data validation to publication-ready visualizations. It helps users reconstruct ancestral geographic ranges by:
|
|
|
|
- Validating and reformatting input files (phylogenetic tree + geographic distribution data)
|
|
- Setting up organized analysis folder structures
|
|
- Generating customized RMarkdown analysis scripts
|
|
- Guiding parameter selection (maximum range size, model choices)
|
|
- Producing visualizations with pie charts and text labels showing ancestral ranges
|
|
- Comparing multiple biogeographic models with statistical tests
|
|
|
|
## When to Use
|
|
|
|
Use this skill when you need to:
|
|
- Reconstruct ancestral geographic ranges on a phylogeny
|
|
- Test different biogeographic models (DEC, DIVALIKE, BAYAREALIKE)
|
|
- Analyze how species distributions evolved over time
|
|
- Determine whether founder-event speciation (+J parameter) is important
|
|
- Generate publication-ready biogeographic visualizations
|
|
|
|
## Required Inputs
|
|
|
|
Users must provide:
|
|
|
|
1. **Phylogenetic tree** (Newick format: .nwk, .tre, or .tree)
|
|
- Must be rooted
|
|
- Tip labels must match species in geography file
|
|
- Branch lengths required
|
|
|
|
2. **Geographic distribution data** (any tabular format)
|
|
- Species names matching tree tips
|
|
- Presence/absence data for different geographic areas
|
|
- Accepts CSV, TSV, Excel, or PHYLIP format
|
|
|
|
## What the Skill Does
|
|
|
|
### 1. Data Validation and Reformatting
|
|
|
|
The skill includes a Python script (`validate_geography_file.py`) that:
|
|
- Validates geography file format (PHYLIP-like with specific tab/spacing requirements)
|
|
- Checks for common errors (spaces in species names, tab delimiters, binary code length)
|
|
- Reformats CSV/TSV files to proper BioGeoBEARS format
|
|
- Cross-validates species names against tree tip labels
|
|
|
|
### 2. Analysis Setup
|
|
|
|
Creates an organized directory structure:
|
|
```
|
|
biogeobears_analysis/
|
|
├── input/
|
|
│ ├── tree.nwk # Phylogenetic tree
|
|
│ ├── geography.data # Validated geography file
|
|
│ └── original_data/ # Original input files
|
|
├── scripts/
|
|
│ └── run_biogeobears.Rmd # Customized RMarkdown script
|
|
├── results/ # Analysis outputs
|
|
│ ├── [MODEL]_result.Rdata # Saved model results
|
|
│ └── plots/ # Visualizations
|
|
│ ├── [MODEL]_pie.pdf
|
|
│ └── [MODEL]_text.pdf
|
|
└── README.md # Documentation
|
|
```
|
|
|
|
### 3. RMarkdown Analysis Template
|
|
|
|
Generates a complete RMarkdown script that:
|
|
- Loads and validates input data
|
|
- Fits 6 biogeographic models:
|
|
- DEC (Dispersal-Extinction-Cladogenesis)
|
|
- DEC+J (DEC with founder-event speciation)
|
|
- DIVALIKE (vicariance-focused)
|
|
- DIVALIKE+J
|
|
- BAYAREALIKE (sympatry-focused)
|
|
- BAYAREALIKE+J
|
|
- Compares models using AIC, AICc, and AIC weights
|
|
- Performs likelihood ratio tests for nested models
|
|
- Estimates parameters (d=dispersal, e=extinction, j=founder-event rates)
|
|
- Generates visualizations on the phylogeny
|
|
- Creates HTML report with all results
|
|
|
|
### 4. Visualization
|
|
|
|
Produces two types of plots:
|
|
- **Pie charts**: Show probability distributions for ancestral ranges (conveys uncertainty)
|
|
- **Text labels**: Show maximum likelihood ancestral states (cleaner, easier to read)
|
|
|
|
Colors represent geographic areas:
|
|
- Single areas: Bright primary colors
|
|
- Multi-area ranges: Blended colors
|
|
- All areas: White
|
|
|
|
## Workflow
|
|
|
|
1. **Gather information**: Ask user for tree file, geography file, and parameters
|
|
2. **Validate tree**: Check if rooted and extract tip labels
|
|
3. **Validate/reformat geography file**: Use validation script to check format or convert from CSV/TSV
|
|
4. **Set up analysis folder**: Create organized directory structure
|
|
5. **Generate RMarkdown script**: Customize template with user parameters
|
|
6. **Create documentation**: Generate README and run scripts
|
|
7. **Provide instructions**: Clear steps for running the analysis
|
|
|
|
## Analysis Parameters
|
|
|
|
The skill helps users choose:
|
|
|
|
### Maximum Range Size
|
|
- How many areas can a species occupy simultaneously?
|
|
- Options: Conservative (# areas - 1), Permissive (all areas), Data-driven (max observed)
|
|
- Larger values increase computation time exponentially
|
|
|
|
### Models to Compare
|
|
- Default: All 6 models (recommended for comprehensive comparison)
|
|
- Alternative: Only base models or only +J models
|
|
- Rationale: Model comparison is key to biogeographic inference
|
|
|
|
### Visualization Type
|
|
- Pie charts (show probabilities and uncertainty)
|
|
- Text labels (show most likely states, cleaner)
|
|
- Both (default in template)
|
|
|
|
## Bundled Resources
|
|
|
|
### scripts/
|
|
|
|
**validate_geography_file.py**
|
|
- Validates BioGeoBEARS geography file format
|
|
- Reformats from CSV/TSV to PHYLIP
|
|
- Cross-validates with tree tip labels
|
|
- Usage: `python validate_geography_file.py --help`
|
|
|
|
**biogeobears_analysis_template.Rmd**
|
|
- Complete RMarkdown analysis template
|
|
- Parameterized via YAML header
|
|
- Fits all models, compares, and visualizes
|
|
- Generates self-contained HTML report
|
|
|
|
### references/
|
|
|
|
**biogeobears_details.md**
|
|
- Detailed model descriptions (DEC, DIVALIKE, BAYAREALIKE, +J parameter)
|
|
- Input file format specifications with examples
|
|
- Parameter interpretation guidelines
|
|
- Plotting options and customization
|
|
- Complete citations for publications
|
|
- Computational considerations and troubleshooting
|
|
|
|
## Example Output
|
|
|
|
The analysis produces:
|
|
- `biogeobears_report.html` - Interactive HTML report with all results
|
|
- `[MODEL]_result.Rdata` - Saved R objects for each model
|
|
- `plots/[MODEL]_pie.pdf` - Ancestral ranges shown as pie charts on tree
|
|
- `plots/[MODEL]_text.pdf` - Ancestral ranges shown as text labels on tree
|
|
|
|
## Interpretation Guidance
|
|
|
|
The skill helps users understand:
|
|
|
|
### Model Selection
|
|
- **AIC weights**: Probability each model is best
|
|
- **ΔAIC thresholds**: <2 (equivalent), 2-7 (less support), >10 (no support)
|
|
|
|
### Parameter Estimates
|
|
- **d (dispersal)**: Rate of range expansion
|
|
- **e (extinction)**: Rate of local extinction
|
|
- **j (founder-event)**: Rate of jump dispersal at speciation
|
|
- **d/e ratio**: >1 favors expansion, <1 favors contraction
|
|
|
|
### Statistical Tests
|
|
- **LRT p < 0.05**: +J parameter significantly improves fit
|
|
- Model uncertainty: Report results from multiple models if weights similar
|
|
|
|
## Installation Requirements
|
|
|
|
Users must have:
|
|
- R (≥4.0)
|
|
- BioGeoBEARS R package
|
|
- Supporting R packages: ape, rmarkdown, knitr, kableExtra
|
|
- Python 3 (for validation script)
|
|
|
|
Installation instructions are included in generated README.md files.
|
|
|
|
## Expected Runtime
|
|
|
|
**Skill setup time**: 5-10 minutes (file validation and directory setup)
|
|
|
|
**Analysis runtime** (separate from skill execution):
|
|
- Small datasets (<50 tips, ≤5 areas): 10-30 minutes
|
|
- Medium datasets (50-100 tips, 5-6 areas): 30-90 minutes
|
|
- Large datasets (>100 tips, >5 areas): 1-6 hours
|
|
|
|
## Common Issues Handled
|
|
|
|
The skill troubleshoots:
|
|
- Species name mismatches between tree and geography file
|
|
- Unrooted trees (guides user to root with outgroup)
|
|
- Geography file formatting errors (tabs, spaces, binary codes)
|
|
- Optimization convergence failures
|
|
- Slow runtime with many areas/tips
|
|
|
|
## Citations
|
|
|
|
Based on:
|
|
- **BioGeoBEARS** package by Nicholas Matzke
|
|
- Tutorial resources from http://phylo.wikidot.com/biogeobears
|
|
- Example workflows from BioGeoBEARS GitHub repository
|
|
|
|
## Skill Details
|
|
|
|
- **Skill Type**: Workflow-based bioinformatics skill
|
|
- **Domain**: Phylogenetic biogeography, historical biogeography
|
|
- **Output**: Complete analysis setup with scripts, documentation, and ready-to-run workflow
|
|
- **Automation Level**: High (validates, reformats, generates all scripts)
|
|
- **User Input Required**: File paths and parameter choices via guided questions
|
|
|
|
## See Also
|
|
|
|
- [phylo_from_buscos](../phylo_from_buscos/README.md) - Complementary skill for generating phylogenies from genomes
|