zhongwei/gh-brunoasm-my-claude-skills-bioinfo-skills

Zhongwei Li c1d9dee646 Initial commit

2025-11-29 18:02:37 +08:00

BioGeoBEARS Biogeographic Analysis Skill

A Claude skill for setting up and executing phylogenetic biogeographic analyses using BioGeoBEARS in R.

Overview

This skill automates the complete workflow for biogeographic analysis on phylogenetic trees, from raw data validation to publication-ready visualizations. It helps users reconstruct ancestral geographic ranges by:

Validating and reformatting input files (phylogenetic tree + geographic distribution data)
Setting up organized analysis folder structures
Generating customized RMarkdown analysis scripts
Guiding parameter selection (maximum range size, model choices)
Producing visualizations with pie charts and text labels showing ancestral ranges
Comparing multiple biogeographic models with statistical tests

When to Use

Use this skill when you need to:

Reconstruct ancestral geographic ranges on a phylogeny
Test different biogeographic models (DEC, DIVALIKE, BAYAREALIKE)
Analyze how species distributions evolved over time
Determine whether founder-event speciation (+J parameter) is important
Generate publication-ready biogeographic visualizations

Required Inputs

Users must provide:

Phylogenetic tree (Newick format: .nwk, .tre, or .tree)
- Must be rooted
- Tip labels must match species in geography file
- Branch lengths required
Geographic distribution data (any tabular format)
- Species names matching tree tips
- Presence/absence data for different geographic areas
- Accepts CSV, TSV, Excel, or PHYLIP format
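The rooting requirement can be checked roughly before any R code runs. The sketch below counts the children of the outermost Newick node: two children is the usual sign of a rooted binary tree, three usually indicates an unrooted one. The function names are hypothetical, and the code assumes plain Newick without quoted labels or bracketed comments; it is not the check the skill itself performs.

```python
def root_child_count(newick):
    """Count the direct children of the outermost node in a Newick string.

    Assumes no quoted labels or [comments], which could contain commas.
    """
    depth = 0
    commas = 0
    for ch in newick:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 1:
            commas += 1  # a comma at depth 1 separates two children of the root
    return commas + 1


def looks_rooted(newick):
    """Heuristic: a root with exactly two children is treated as rooted."""
    return root_child_count(newick) == 2
```

A basal trichotomy such as `(A:1,B:1,C:1);` fails this check, which is the case where rooting with an outgroup is normally needed.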

What the Skill Does

1. Data Validation and Reformatting

The skill includes a Python script (validate_geography_file.py) that:

Validates geography file format (PHYLIP-like with specific tab/spacing requirements)
Checks for common errors (spaces in species names, tab delimiters, binary code length)
Reformats CSV/TSV files to proper BioGeoBEARS format
Cross-validates species names against tree tip labels
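As an illustration of the kinds of checks listed above, here is a minimal sketch of a geography-file validator. It is not the bundled validate_geography_file.py; the function name is hypothetical, and it assumes the header line has the form `<n_species><TAB><n_areas> (<area names>)` followed by one `name<TAB>binary code` row per species.

```python
import re


def validate_geography(text):
    """Return a list of problems found in PHYLIP-like geography text."""
    problems = []
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return ["file is empty"]

    # Assumed header layout: "3<TAB>2 (A B)"
    header = re.match(r"^(\d+)\t(\d+) \(([^)]*)\)\s*$", lines[0])
    if header is None:
        return ["header is not '<n_species><TAB><n_areas> (<area names>)'"]
    n_species, n_areas = int(header.group(1)), int(header.group(2))
    areas = header.group(3).split()
    if len(areas) != n_areas:
        problems.append(f"header lists {len(areas)} area names, expected {n_areas}")

    rows = lines[1:]
    if len(rows) != n_species:
        problems.append(f"found {len(rows)} species rows, expected {n_species}")

    for row in rows:
        parts = row.split("\t")
        if len(parts) != 2:
            problems.append(f"row is not 'name<TAB>code': {row!r}")
            continue
        name, code = parts[0], parts[1].strip()
        if " " in name:
            problems.append(f"species name contains a space: {name!r}")
        if not re.fullmatch(r"[01]+", code) or len(code) != n_areas:
            problems.append(f"{name}: code {code!r} is not {n_areas} binary digits")
    return problems
```

An empty list means no problems were detected under these assumptions; each string otherwise describes one issue to fix before running the analysis.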

2. Analysis Setup

Creates an organized directory structure:

biogeobears_analysis/
├── input/
│   ├── tree.nwk                    # Phylogenetic tree
│   ├── geography.data              # Validated geography file
│   └── original_data/              # Original input files
├── scripts/
│   └── run_biogeobears.Rmd         # Customized RMarkdown script
├── results/                        # Analysis outputs
│   ├── [MODEL]_result.Rdata        # Saved model results
│   └── plots/                      # Visualizations
│       ├── [MODEL]_pie.pdf
│       └── [MODEL]_text.pdf
└── README.md                       # Documentation
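The folder layout above could be created with a few lines of Python. This is a sketch only: the function name is hypothetical, and the files inside each folder (tree, geography file, RMarkdown script, README) would still be written by later steps.

```python
from pathlib import Path


def create_analysis_dirs(base):
    """Create the biogeobears_analysis folder tree under `base`."""
    root = Path(base) / "biogeobears_analysis"
    for sub in ("input/original_data", "scripts", "results/plots"):
        # parents=True also creates input/ and results/ on the way down
        (root / sub).mkdir(parents=True, exist_ok=True)
    return root
```

Using `exist_ok=True` makes the call safe to repeat, so rerunning setup does not fail on an existing analysis folder.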

3. RMarkdown Analysis Template

Generates a complete RMarkdown script that:

Loads and validates input data
Fits 6 biogeographic models:
- DEC (Dispersal-Extinction-Cladogenesis)
- DEC+J (DEC with founder-event speciation)
- DIVALIKE (vicariance-focused)
- DIVALIKE+J
- BAYAREALIKE (sympatry-focused)
- BAYAREALIKE+J
Compares models using AIC, AICc, and AIC weights
Performs likelihood ratio tests for nested models
Estimates parameters (d = dispersal rate, e = extinction rate, j = relative weight of founder-event speciation)
Generates visualizations on the phylogeny
Creates HTML report with all results

4. Visualization

Produces two types of plots:

Pie charts: Show probability distributions for ancestral ranges (conveys uncertainty)
Text labels: Show maximum likelihood ancestral states (cleaner, easier to read)

Colors represent geographic areas:

Single areas: Bright primary colors
Multi-area ranges: Blended colors
All areas: White

Workflow

Gather information: Ask user for tree file, geography file, and parameters
Validate tree: Check if rooted and extract tip labels
Validate/reformat geography file: Use validation script to check format or convert from CSV/TSV
Set up analysis folder: Create organized directory structure
Generate RMarkdown script: Customize template with user parameters
Create documentation: Generate the README and the scripts used to run the analysis
Provide instructions: Clear steps for running the analysis

Analysis Parameters

The skill helps users choose:

Maximum Range Size

How many areas can a species occupy simultaneously?
Options: Conservative (# areas - 1), Permissive (all areas), Data-driven (max observed)
Larger values sharply increase computation time, because the number of possible ranges (model states) grows combinatorially, up to 2^n for n areas
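To see how quickly the state space grows, the number of candidate ranges can be counted directly. The sketch below assumes a range is any set of up to `max_range_size` areas and that the empty (null) range is counted as a state; the function name is hypothetical.

```python
from math import comb


def count_ranges(n_areas, max_range_size, include_null=True):
    """Number of geographic ranges with at most `max_range_size` areas."""
    total = sum(comb(n_areas, k) for k in range(1, max_range_size + 1))
    return total + 1 if include_null else total


# Example: 6 areas, comparing a capped and an uncapped maximum range size
print(count_ranges(6, 3))  # 42
print(count_ranges(6, 6))  # 64
```

Because the likelihood calculation works with a transition matrix whose dimensions equal the number of states, runtime tends to grow faster than the state count itself, which is why capping the maximum range size matters most when there are many areas.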

Models to Compare

Default: All 6 models (recommended for comprehensive comparison)
Alternative: Only base models or only +J models
Rationale: Model comparison is key to biogeographic inference

Visualization Type

Pie charts (show probabilities and uncertainty)
Text labels (show most likely states, cleaner)
Both (default in template)

Bundled Resources

scripts/

validate_geography_file.py

Validates BioGeoBEARS geography file format
Reformats from CSV/TSV to PHYLIP
Cross-validates with tree tip labels
Usage: python validate_geography_file.py --help
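The CSV-to-PHYLIP reformatting step can be pictured with the sketch below. It is not the bundled script: the function name is hypothetical, and it assumes a CSV whose first row is a header (`species` followed by one column per area) and whose cells are 0 or 1.

```python
import csv
import io


def csv_to_geography(csv_text):
    """Convert presence/absence CSV text to PHYLIP-like geography text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    areas = rows[0][1:]                    # area names from the header row
    body = [r for r in rows[1:] if r]      # skip blank lines
    lines = [f"{len(body)}\t{len(areas)} ({' '.join(areas)})"]
    for name, *flags in body:
        species = name.strip().replace(" ", "_")   # no spaces in names
        code = "".join("1" if f.strip() == "1" else "0" for f in flags)
        lines.append(f"{species}\t{code}")
    return "\n".join(lines) + "\n"
```

For a TSV input the same approach applies with `csv.reader(..., delimiter="\t")`.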

biogeobears_analysis_template.Rmd

Complete RMarkdown analysis template
Parameterized via YAML header
Fits all models, compares, and visualizes
Generates self-contained HTML report

references/

biogeobears_details.md

Detailed model descriptions (DEC, DIVALIKE, BAYAREALIKE, +J parameter)
Input file format specifications with examples
Parameter interpretation guidelines
Plotting options and customization
Complete citations for publications
Computational considerations and troubleshooting

Example Output

The analysis produces:

biogeobears_report.html - Interactive HTML report with all results
[MODEL]_result.Rdata - Saved R objects for each model
plots/[MODEL]_pie.pdf - Ancestral ranges shown as pie charts on tree
plots/[MODEL]_text.pdf - Ancestral ranges shown as text labels on tree

Interpretation Guidance

The skill helps users understand:

Model Selection

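AIC weights: Relative support for each model, interpretable as the probability that it is the best of the models compared
ΔAIC rules of thumb: <2 (substantial support, roughly equivalent to the best model), 4-7 (considerably less support), >10 (essentially no support); values between these bands are judgment calls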
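The quantities behind model selection follow standard formulas: AIC = 2k - 2 lnL, AICc = AIC + 2k(k+1)/(n-k-1), and each AIC weight is exp(-ΔAIC/2) normalized over all models. A minimal sketch, with a hypothetical function name, made-up log-likelihoods, and the assumption that the sample size n is the number of tips:

```python
import math


def compare_models(models, n_obs):
    """models maps name -> (log-likelihood, number of free parameters)."""
    rows = {}
    for name, (lnl, k) in models.items():
        aic = 2 * k - 2 * lnl
        aicc = aic + (2 * k * (k + 1)) / (n_obs - k - 1)  # needs n_obs > k + 1
        rows[name] = {"AIC": aic, "AICc": aicc}
    best = min(r["AIC"] for r in rows.values())
    rel = {name: math.exp(-0.5 * (r["AIC"] - best)) for name, r in rows.items()}
    total = sum(rel.values())
    for name, r in rows.items():
        r["dAIC"] = r["AIC"] - best
        r["weight"] = rel[name] / total
    return rows


# Illustrative values only, not results from a real analysis
table = compare_models({"DEC": (-40.0, 2), "DEC+J": (-35.0, 3)}, n_obs=30)
```

With these made-up numbers DEC has ΔAIC = 8 relative to DEC+J, so nearly all of the weight falls on DEC+J.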

Parameter Estimates

d (dispersal): Rate of range expansion
e (extinction): Rate of local extinction
j (founder-event): Relative weight of jump dispersal at speciation (a per-event weight at cladogenesis, not a rate along branches)
d/e ratio: >1 favors expansion, <1 favors contraction

Statistical Tests

LRT p < 0.05: +J parameter significantly improves fit
Model uncertainty: Report results from multiple models if their weights are similar
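Each base model is nested within its +J counterpart (it is the +J model with j fixed at 0), so the likelihood ratio test has one degree of freedom. The sketch below computes the statistic and p-value with the standard library only, using the fact that the chi-square survival function for one degree of freedom equals erfc(sqrt(x/2)). The function name and the log-likelihoods in the example are hypothetical.

```python
import math


def lrt_one_df(lnl_null, lnl_alt):
    """Likelihood ratio test for nested models differing by one parameter."""
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)  # clamp tiny negative values
    p_value = math.erfc(math.sqrt(stat / 2.0))   # chi-square tail, 1 d.f.
    return stat, p_value


# Illustrative values only: DEC (null) versus DEC+J (alternative)
stat, p = lrt_one_df(lnl_null=-40.0, lnl_alt=-35.0)
```

Here the statistic is 10, well above the 3.84 cutoff that corresponds to p = 0.05 at one degree of freedom.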

Installation Requirements

Users must have:

R (≥4.0)
BioGeoBEARS R package
Supporting R packages: ape, rmarkdown, knitr, kableExtra
Python 3 (for validation script)

Installation instructions are included in generated README.md files.

Expected Runtime

Skill setup time: 5-10 minutes (file validation and directory setup)

Analysis runtime (separate from skill execution):

Small datasets (<50 tips, ≤5 areas): 10-30 minutes
Medium datasets (50-100 tips, 5-6 areas): 30-90 minutes
Large datasets (>100 tips, >5 areas): 1-6 hours

Common Issues Handled

The skill troubleshoots:

Species name mismatches between tree and geography file
Unrooted trees (guides user to root with outgroup)
Geography file formatting errors (tabs, spaces, binary codes)
Optimization convergence failures
Slow runtime with many areas/tips
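Name mismatches are often small formatting differences such as a space where the tree has an underscore. A minimal sketch of how such mismatches could be reported, with a hypothetical function name and an arbitrary similarity cutoff of 0.8:

```python
import difflib


def report_name_mismatches(tree_tips, geography_species):
    """List names unique to each source and suggest likely pairings."""
    tips, species = set(tree_tips), set(geography_species)
    only_tree = sorted(tips - species)
    only_geog = sorted(species - tips)
    # For each unmatched tip, propose the closest unmatched geography name
    suggestions = {
        name: difflib.get_close_matches(name, only_geog, n=1, cutoff=0.8)
        for name in only_tree
    }
    return only_tree, only_geog, suggestions
```

The suggestions are only hints; a near match should be confirmed by the user before any names are changed.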

Citations

Based on:

BioGeoBEARS package by Nicholas Matzke
Tutorial resources from http://phylo.wikidot.com/biogeobears
Example workflows from BioGeoBEARS GitHub repository

Skill Details

Skill Type: Workflow-based bioinformatics skill
Domain: Phylogenetic biogeography, historical biogeography
Output: Complete analysis setup with scripts, documentation, and ready-to-run workflow
Automation Level: High (validates, reformats, generates all scripts)
User Input Required: File paths and parameter choices via guided questions
