Initial commit

2025-11-30 08:30:10 +08:00
commit f0bd18fb4e
824 changed files with 331919 additions and 0 deletions
--- a/skills/biomni/SKILL.md
+++ b/skills/biomni/SKILL.md
@@ -0,0 +1,309 @@
+---
+name: biomni
+description: Autonomous biomedical AI agent framework for executing complex research tasks across genomics, drug discovery, molecular biology, and clinical analysis. Use this skill when conducting multi-step biomedical research including CRISPR screening design, single-cell RNA-seq analysis, ADMET prediction, GWAS interpretation, rare disease diagnosis, or lab protocol optimization. Leverages LLM reasoning with code execution and integrated biomedical databases.
+---
+
+# Biomni
+
+## Overview
+
+Biomni is an open-source biomedical AI agent framework from Stanford's SNAP lab that autonomously executes complex research tasks across biomedical domains. Use this skill when working on multi-step biological reasoning tasks, analyzing biomedical data, or conducting research spanning genomics, drug discovery, molecular biology, and clinical analysis.
+
+## Core Capabilities
+
+Biomni excels at:
+
+1. **Multi-step biological reasoning** - Autonomous task decomposition and planning for complex biomedical queries
+2. **Code generation and execution** - Dynamic analysis pipeline creation for data processing
+3. **Knowledge retrieval** - Access to ~11GB of integrated biomedical databases and literature
+4. **Cross-domain problem solving** - Unified interface for genomics, proteomics, drug discovery, and clinical tasks
+
+## When to Use This Skill
+
+Use biomni for:
+- **CRISPR screening** - Design screens, prioritize genes, analyze knockout effects
+- **Single-cell RNA-seq** - Cell type annotation, differential expression, trajectory analysis
+- **Drug discovery** - ADMET prediction, target identification, compound optimization
+- **GWAS analysis** - Variant interpretation, causal gene identification, pathway enrichment
+- **Clinical genomics** - Rare disease diagnosis, variant pathogenicity, phenotype-genotype mapping
+- **Lab protocols** - Protocol optimization, literature synthesis, experimental design
+
+## Quick Start
+
+### Installation and Setup
+
+Install Biomni and configure API keys for LLM providers:
+
+```bash
+uv pip install biomni --upgrade
+```
+
+Configure API keys (store in `.env` file or environment variables):
+```bash
+export ANTHROPIC_API_KEY="your-key-here"
+# Optional: OpenAI, Azure, Google, Groq, AWS Bedrock keys
+```
+
+Use `scripts/setup_environment.py` for interactive setup assistance.
+
+### Basic Usage Pattern
+
+```python
+from biomni.agent import A1
+
+# Initialize agent with data path and LLM choice
+agent = A1(path='./data', llm='claude-sonnet-4-20250514')
+
+# Execute biomedical task autonomously
+agent.go("Your biomedical research question or task")
+
+# Save conversation history and results
+agent.save_conversation_history("report.pdf")
+```
+
+## Working with Biomni
+
+### 1. Agent Initialization
+
+The A1 class is the primary interface for biomni:
+
+```python
+from biomni.agent import A1
+from biomni.config import default_config
+
+# Basic initialization
+agent = A1(
+    path='./data',  # Path to data lake (~11GB downloaded on first use)
+    llm='claude-sonnet-4-20250514'  # LLM model selection
+)
+
+# Advanced configuration
+default_config.llm = "gpt-4"
+default_config.timeout_seconds = 1200
+default_config.max_iterations = 50
+```
+
+**Supported LLM Providers:**
+- Anthropic Claude (recommended): `claude-sonnet-4-20250514`, `claude-opus-4-20250514`
+- OpenAI: `gpt-4`, `gpt-4-turbo`
+- Azure OpenAI: via Azure configuration
+- Google Gemini: `gemini-2.0-flash-exp`
+- Groq: `llama-3.3-70b-versatile`
+- AWS Bedrock: Various models via Bedrock API
+
+See `references/llm_providers.md` for detailed LLM configuration instructions.
+
+### 2. Task Execution Workflow
+
+Biomni follows an autonomous agent workflow:
+
+```python
+# Step 1: Initialize agent
+agent = A1(path='./data', llm='claude-sonnet-4-20250514')
+
+# Step 2: Execute task with natural language query
+result = agent.go("""
+Design a CRISPR screen to identify genes regulating autophagy in
+HEK293 cells. Prioritize genes based on essentiality and pathway
+relevance.
+""")
+
+# Step 3: Review generated code and analysis
+# Agent autonomously:
+# - Decomposes task into sub-steps
+# - Retrieves relevant biological knowledge
+# - Generates and executes analysis code
+# - Interprets results and provides insights
+
+# Step 4: Save results
+agent.save_conversation_history("autophagy_screen_report.pdf")
+```
+
+### 3. Common Task Patterns
+
+#### CRISPR Screening Design
+```python
+agent.go("""
+Design a genome-wide CRISPR knockout screen for identifying genes
+affecting [phenotype] in [cell type]. Include:
+1. sgRNA library design
+2. Gene prioritization criteria
+3. Expected hit genes based on pathway analysis
+""")
+```
+
+#### Single-Cell RNA-seq Analysis
+```python
+agent.go("""
+Analyze this single-cell RNA-seq dataset:
+- Perform quality control and filtering
+- Identify cell populations via clustering
+- Annotate cell types using marker genes
+- Conduct differential expression between conditions
+File path: [path/to/data.h5ad]
+""")
+```
+
+#### Drug ADMET Prediction
+```python
+agent.go("""
+Predict ADMET properties for these drug candidates:
+[SMILES strings or compound IDs]
+Focus on:
+- Absorption (Caco-2 permeability, HIA)
+- Distribution (plasma protein binding, BBB penetration)
+- Metabolism (CYP450 interaction)
+- Excretion (clearance)
+- Toxicity (hERG liability, hepatotoxicity)
+""")
+```
+
+#### GWAS Variant Interpretation
+```python
+agent.go("""
+Interpret GWAS results for [trait/disease]:
+- Identify genome-wide significant variants
+- Map variants to causal genes
+- Perform pathway enrichment analysis
+- Predict functional consequences
+Summary statistics file: [path/to/gwas_summary.txt]
+""")
+```
+
+See `references/use_cases.md` for comprehensive task examples across all biomedical domains.
+
+### 4. Data Integration
+
+Biomni integrates ~11GB of biomedical knowledge sources:
+- **Gene databases** - Ensembl, NCBI Gene, UniProt
+- **Protein structures** - PDB, AlphaFold
+- **Clinical datasets** - ClinVar, OMIM, HPO
+- **Literature indices** - PubMed abstracts, biomedical ontologies
+- **Pathway databases** - KEGG, Reactome, GO
+
+Data is automatically downloaded to the specified `path` on first use.
+
+### 5. MCP Server Integration
+
+Extend biomni with external tools via Model Context Protocol:
+
+```python
+# MCP servers can provide:
+# - FDA drug databases
+# - Web search for literature
+# - Custom biomedical APIs
+# - Laboratory equipment interfaces
+
+# Configure MCP servers in .biomni/mcp_config.json
+```
+
+### 6. Evaluation Framework
+
+Benchmark agent performance on biomedical tasks:
+
+```python
+from biomni.eval import BiomniEval1
+
+evaluator = BiomniEval1()
+
+# Evaluate on specific task types
+score = evaluator.evaluate(
+    task_type='crispr_design',
+    instance_id='test_001',
+    answer=agent_output
+)
+
+# Access evaluation dataset
+dataset = evaluator.load_dataset()
+```
+
+## Best Practices
+
+### Task Formulation
+- **Be specific** - Include biological context, organism, cell type, conditions
+- **Specify outputs** - Clearly state desired analysis outputs and formats
+- **Provide data paths** - Include file paths for datasets to analyze
+- **Set constraints** - Mention time/computational limits if relevant
+
+### Security Considerations
+⚠️ **Important**: Biomni executes LLM-generated code with full system privileges. For production use:
+- Run in isolated environments (Docker, VMs)
+- Avoid exposing sensitive credentials
+- Review generated code before execution in sensitive contexts
+- Use sandboxed execution environments when possible
+
+### Performance Optimization
+- **Choose appropriate LLMs** - Claude Sonnet 4 recommended for balance of speed/quality
+- **Set reasonable timeouts** - Adjust `default_config.timeout_seconds` for complex tasks
+- **Monitor iterations** - Track `max_iterations` to prevent runaway loops
+- **Cache data** - Reuse downloaded data lake across sessions
+
+### Result Documentation
+```python
+# Always save conversation history for reproducibility
+agent.save_conversation_history("results/project_name_YYYYMMDD.pdf")
+
+# Include in reports:
+# - Original task description
+# - Generated analysis code
+# - Results and interpretations
+# - Data sources used
+```
+
+## Resources
+
+### References
+Detailed documentation available in the `references/` directory:
+
+- **`api_reference.md`** - Complete API documentation for A1 class, configuration, and evaluation
+- **`llm_providers.md`** - LLM provider setup (Anthropic, OpenAI, Azure, Google, Groq, AWS)
+- **`use_cases.md`** - Comprehensive task examples for all biomedical domains
+
+### Scripts
+Helper scripts in the `scripts/` directory:
+
+- **`setup_environment.py`** - Interactive environment and API key configuration
+- **`generate_report.py`** - Enhanced PDF report generation with custom formatting
+
+### External Resources
+- **GitHub**: https://github.com/snap-stanford/biomni
+- **Web Platform**: https://biomni.stanford.edu
+- **Paper**: https://www.biorxiv.org/content/10.1101/2025.05.30.656746v1
+- **Model**: https://huggingface.co/biomni/Biomni-R0-32B-Preview
+- **Evaluation Dataset**: https://huggingface.co/datasets/biomni/Eval1
+
+## Troubleshooting
+
+### Common Issues
+
+**Data download fails**
+```python
+# Manually trigger data lake download
+agent = A1(path='./data', llm='your-llm')
+# First .go() call will download data
+```
+
+**API key errors**
+```bash
+# Verify environment variables
+echo $ANTHROPIC_API_KEY
+# Or check .env file in working directory
+```
+
+**Timeout on complex tasks**
+```python
+from biomni.config import default_config
+default_config.timeout_seconds = 3600  # 1 hour
+```
+
+**Memory issues with large datasets**
+- Use streaming for large files
+- Process data in chunks
+- Increase system memory allocation
+
+### Getting Help
+
+For issues or questions:
+- GitHub Issues: https://github.com/snap-stanford/biomni/issues
+- Documentation: Check `references/` files for detailed guidance
+- Community: Stanford SNAP lab and biomni contributors