Initial commit
This commit is contained in:
222
skills/biogeobears/README.md
Normal file
222
skills/biogeobears/README.md
Normal file
@@ -0,0 +1,222 @@
|
||||
# BioGeoBEARS Biogeographic Analysis Skill
|
||||
|
||||
A Claude skill for setting up and executing phylogenetic biogeographic analyses using BioGeoBEARS in R.
|
||||
|
||||
## Overview
|
||||
|
||||
This skill automates the complete workflow for biogeographic analysis on phylogenetic trees, from raw data validation to publication-ready visualizations. It helps users reconstruct ancestral geographic ranges by:
|
||||
|
||||
- Validating and reformatting input files (phylogenetic tree + geographic distribution data)
|
||||
- Setting up organized analysis folder structures
|
||||
- Generating customized RMarkdown analysis scripts
|
||||
- Guiding parameter selection (maximum range size, model choices)
|
||||
- Producing visualizations with pie charts and text labels showing ancestral ranges
|
||||
- Comparing multiple biogeographic models with statistical tests
|
||||
|
||||
## When to Use
|
||||
|
||||
Use this skill when you need to:
|
||||
- Reconstruct ancestral geographic ranges on a phylogeny
|
||||
- Test different biogeographic models (DEC, DIVALIKE, BAYAREALIKE)
|
||||
- Analyze how species distributions evolved over time
|
||||
- Determine whether founder-event speciation (+J parameter) is important
|
||||
- Generate publication-ready biogeographic visualizations
|
||||
|
||||
## Required Inputs
|
||||
|
||||
Users must provide:
|
||||
|
||||
1. **Phylogenetic tree** (Newick format: .nwk, .tre, or .tree)
|
||||
- Must be rooted
|
||||
- Tip labels must match species in geography file
|
||||
- Branch lengths required
|
||||
|
||||
2. **Geographic distribution data** (any tabular format)
|
||||
- Species names matching tree tips
|
||||
- Presence/absence data for different geographic areas
|
||||
- Accepts CSV, TSV, Excel, or PHYLIP format
|
||||
|
||||
## What the Skill Does
|
||||
|
||||
### 1. Data Validation and Reformatting
|
||||
|
||||
The skill includes a Python script (`validate_geography_file.py`) that:
|
||||
- Validates geography file format (PHYLIP-like with specific tab/spacing requirements)
|
||||
- Checks for common errors (spaces in species names, tab delimiters, binary code length)
|
||||
- Reformats CSV/TSV files to proper BioGeoBEARS format
|
||||
- Cross-validates species names against tree tip labels
|
||||
|
||||
### 2. Analysis Setup
|
||||
|
||||
Creates an organized directory structure:
|
||||
```
|
||||
biogeobears_analysis/
|
||||
├── input/
|
||||
│ ├── tree.nwk # Phylogenetic tree
|
||||
│ ├── geography.data # Validated geography file
|
||||
│ └── original_data/ # Original input files
|
||||
├── scripts/
|
||||
│ └── run_biogeobears.Rmd # Customized RMarkdown script
|
||||
├── results/ # Analysis outputs
|
||||
│ ├── [MODEL]_result.Rdata # Saved model results
|
||||
│ └── plots/ # Visualizations
|
||||
│ ├── [MODEL]_pie.pdf
|
||||
│ └── [MODEL]_text.pdf
|
||||
└── README.md # Documentation
|
||||
```
|
||||
|
||||
### 3. RMarkdown Analysis Template
|
||||
|
||||
Generates a complete RMarkdown script that:
|
||||
- Loads and validates input data
|
||||
- Fits 6 biogeographic models:
|
||||
- DEC (Dispersal-Extinction-Cladogenesis)
|
||||
- DEC+J (DEC with founder-event speciation)
|
||||
- DIVALIKE (vicariance-focused)
|
||||
- DIVALIKE+J
|
||||
- BAYAREALIKE (sympatry-focused)
|
||||
- BAYAREALIKE+J
|
||||
- Compares models using AIC, AICc, and AIC weights
|
||||
- Performs likelihood ratio tests for nested models
|
||||
- Estimates parameters (d=dispersal, e=extinction, j=founder-event rates)
|
||||
- Generates visualizations on the phylogeny
|
||||
- Creates HTML report with all results
|
||||
|
||||
### 4. Visualization
|
||||
|
||||
Produces two types of plots:
|
||||
- **Pie charts**: Show probability distributions for ancestral ranges (conveys uncertainty)
|
||||
- **Text labels**: Show maximum likelihood ancestral states (cleaner, easier to read)
|
||||
|
||||
Colors represent geographic areas:
|
||||
- Single areas: Bright primary colors
|
||||
- Multi-area ranges: Blended colors
|
||||
- All areas: White
|
||||
|
||||
## Workflow
|
||||
|
||||
1. **Gather information**: Ask user for tree file, geography file, and parameters
|
||||
2. **Validate tree**: Check if rooted and extract tip labels
|
||||
3. **Validate/reformat geography file**: Use validation script to check format or convert from CSV/TSV
|
||||
4. **Set up analysis folder**: Create organized directory structure
|
||||
5. **Generate RMarkdown script**: Customize template with user parameters
|
||||
6. **Create documentation**: Generate README and run scripts
|
||||
7. **Provide instructions**: Clear steps for running the analysis
|
||||
|
||||
## Analysis Parameters
|
||||
|
||||
The skill helps users choose:
|
||||
|
||||
### Maximum Range Size
|
||||
- How many areas can a species occupy simultaneously?
|
||||
- Options: Conservative (# areas - 1), Permissive (all areas), Data-driven (max observed)
|
||||
- Larger values increase computation time exponentially
|
||||
|
||||
### Models to Compare
|
||||
- Default: All 6 models (recommended for comprehensive comparison)
|
||||
- Alternative: Only base models or only +J models
|
||||
- Rationale: Model comparison is key to biogeographic inference
|
||||
|
||||
### Visualization Type
|
||||
- Pie charts (show probabilities and uncertainty)
|
||||
- Text labels (show most likely states, cleaner)
|
||||
- Both (default in template)
|
||||
|
||||
## Bundled Resources
|
||||
|
||||
### scripts/
|
||||
|
||||
**validate_geography_file.py**
|
||||
- Validates BioGeoBEARS geography file format
|
||||
- Reformats from CSV/TSV to PHYLIP
|
||||
- Cross-validates with tree tip labels
|
||||
- Usage: `python validate_geography_file.py --help`
|
||||
|
||||
**biogeobears_analysis_template.Rmd**
|
||||
- Complete RMarkdown analysis template
|
||||
- Parameterized via YAML header
|
||||
- Fits all models, compares, and visualizes
|
||||
- Generates self-contained HTML report
|
||||
|
||||
### references/
|
||||
|
||||
**biogeobears_details.md**
|
||||
- Detailed model descriptions (DEC, DIVALIKE, BAYAREALIKE, +J parameter)
|
||||
- Input file format specifications with examples
|
||||
- Parameter interpretation guidelines
|
||||
- Plotting options and customization
|
||||
- Complete citations for publications
|
||||
- Computational considerations and troubleshooting
|
||||
|
||||
## Example Output
|
||||
|
||||
The analysis produces:
|
||||
- `biogeobears_report.html` - Interactive HTML report with all results
|
||||
- `[MODEL]_result.Rdata` - Saved R objects for each model
|
||||
- `plots/[MODEL]_pie.pdf` - Ancestral ranges shown as pie charts on tree
|
||||
- `plots/[MODEL]_text.pdf` - Ancestral ranges shown as text labels on tree
|
||||
|
||||
## Interpretation Guidance
|
||||
|
||||
The skill helps users understand:
|
||||
|
||||
### Model Selection
|
||||
- **AIC weights**: Probability each model is best
|
||||
- **ΔAIC thresholds**: <2 (equivalent), 2-7 (less support), >10 (no support)
|
||||
|
||||
### Parameter Estimates
|
||||
- **d (dispersal)**: Rate of range expansion
|
||||
- **e (extinction)**: Rate of local extinction
|
||||
- **j (founder-event)**: Rate of jump dispersal at speciation
|
||||
- **d/e ratio**: >1 favors expansion, <1 favors contraction
|
||||
|
||||
### Statistical Tests
|
||||
- **LRT p < 0.05**: +J parameter significantly improves fit
|
||||
- Model uncertainty: Report results from multiple models if weights similar
|
||||
|
||||
## Installation Requirements
|
||||
|
||||
Users must have:
|
||||
- R (≥4.0)
|
||||
- BioGeoBEARS R package
|
||||
- Supporting R packages: ape, rmarkdown, knitr, kableExtra
|
||||
- Python 3 (for validation script)
|
||||
|
||||
Installation instructions are included in generated README.md files.
|
||||
|
||||
## Expected Runtime
|
||||
|
||||
**Skill setup time**: 5-10 minutes (file validation and directory setup)
|
||||
|
||||
**Analysis runtime** (separate from skill execution):
|
||||
- Small datasets (<50 tips, ≤5 areas): 10-30 minutes
|
||||
- Medium datasets (50-100 tips, 5-6 areas): 30-90 minutes
|
||||
- Large datasets (>100 tips, >5 areas): 1-6 hours
|
||||
|
||||
## Common Issues Handled
|
||||
|
||||
The skill troubleshoots:
|
||||
- Species name mismatches between tree and geography file
|
||||
- Unrooted trees (guides user to root with outgroup)
|
||||
- Geography file formatting errors (tabs, spaces, binary codes)
|
||||
- Optimization convergence failures
|
||||
- Slow runtime with many areas/tips
|
||||
|
||||
## Citations
|
||||
|
||||
Based on:
|
||||
- **BioGeoBEARS** package by Nicholas Matzke
|
||||
- Tutorial resources from http://phylo.wikidot.com/biogeobears
|
||||
- Example workflows from BioGeoBEARS GitHub repository
|
||||
|
||||
## Skill Details
|
||||
|
||||
- **Skill Type**: Workflow-based bioinformatics skill
|
||||
- **Domain**: Phylogenetic biogeography, historical biogeography
|
||||
- **Output**: Complete analysis setup with scripts, documentation, and ready-to-run workflow
|
||||
- **Automation Level**: High (validates, reformats, generates all scripts)
|
||||
- **User Input Required**: File paths and parameter choices via guided questions
|
||||
|
||||
## See Also
|
||||
|
||||
- [phylo_from_buscos](../phylo_from_buscos/README.md) - Complementary skill for generating phylogenies from genomes
|
||||
Reference in New Issue
Block a user