---
name: tooluniverse
description: Use this skill when working with scientific research tools and workflows across bioinformatics, cheminformatics, genomics, structural biology, proteomics, and drug discovery. This skill provides access to 600+ scientific tools including machine learning models, datasets, APIs, and analysis packages. Use when searching for scientific tools, executing computational biology workflows, composing multi-step research pipelines, accessing databases like OpenTargets/PubChem/UniProt/PDB/ChEMBL, performing tool discovery for research tasks, or integrating scientific computational resources into LLM workflows.
---
# ToolUniverse
## Overview
ToolUniverse is a unified ecosystem that enables AI agents to function as research scientists by providing standardized access to 600+ scientific resources. Use this skill to discover, execute, and compose scientific tools across multiple research domains including bioinformatics, cheminformatics, genomics, structural biology, proteomics, and drug discovery.
**Key Capabilities:**
- Access 600+ scientific tools, models, datasets, and APIs
- Discover tools using natural language, semantic search, or keywords
- Execute tools through the standardized AI-Tool Interaction Protocol
- Compose multi-step workflows for complex research problems
- Integrate with Claude Desktop/Code via Model Context Protocol (MCP)
## When to Use This Skill
Use this skill when:
- Searching for scientific tools by function or domain (e.g., "find protein structure prediction tools")
- Executing computational biology workflows (e.g., disease target identification, drug discovery, genomics analysis)
- Accessing scientific databases (OpenTargets, PubChem, UniProt, PDB, ChEMBL, KEGG, etc.)
- Composing multi-step research pipelines (e.g., target discovery → structure prediction → virtual screening)
- Working with bioinformatics, cheminformatics, or structural biology tasks
- Analyzing gene expression, protein sequences, molecular structures, or clinical data
- Performing literature searches, pathway enrichment, or variant annotation
- Building automated scientific research workflows
## Quick Start
### Basic Setup
```python
from tooluniverse import ToolUniverse

# Initialize and load tools
tu = ToolUniverse()
tu.load_tools()  # Loads 600+ scientific tools

# Discover tools
tools = tu.run({
    "name": "Tool_Finder_Keyword",
    "arguments": {
        "description": "disease target associations",
        "limit": 10
    }
})

# Execute a tool
result = tu.run({
    "name": "OpenTargets_get_associated_targets_by_disease_efoId",
    "arguments": {"efoId": "EFO_0000537"}  # Hypertension
})
```
### Model Context Protocol (MCP)
For Claude Desktop/Code integration:
```bash
tooluniverse-smcp
```
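MCP clients such as Claude Desktop usually register servers in a JSON config file under an `mcpServers` key. The sketch below generates such an entry. The `tooluniverse` label and the `mcp_server_entry` helper are illustrative assumptions, and it assumes `tooluniverse-smcp` is on `PATH` and uses a transport the client supports; check `references/installation.md` for the exact command and arguments.

```python
import json


def mcp_server_entry(command="tooluniverse-smcp", args=None):
    """Build an `mcpServers` fragment for an MCP client config (hypothetical helper)."""
    return {
        "mcpServers": {
            # The key "tooluniverse" is an arbitrary label chosen for this sketch.
            "tooluniverse": {"command": command, "args": list(args or [])}
        }
    }


config = mcp_server_entry()
print(json.dumps(config, indent=2))
```

Merge the printed fragment into the client's existing config rather than overwriting the file, since other servers may already be registered there.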
## Core Workflows
### 1. Tool Discovery
Find relevant tools for your research task:
**Three discovery methods:**
- `Tool_Finder` - Embedding-based semantic search (requires GPU)
- `Tool_Finder_LLM` - LLM-based semantic search (no GPU required)
- `Tool_Finder_Keyword` - Fast keyword search
**Example:**
```python
# Search by natural language description
tools = tu.run({
    "name": "Tool_Finder_LLM",
    "arguments": {
        "description": "Find tools for RNA sequencing differential expression analysis",
        "limit": 10
    }
})

# Review available tools
for tool in tools:
    print(f"{tool['name']}: {tool['description']}")
```
**See `references/tool-discovery.md` for:**
- Detailed discovery methods and search strategies
- Domain-specific keyword suggestions
- Best practices for finding tools
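One way to combine the discovery methods is to try the fast keyword finder first and fall back to the LLM finder only when nothing matches. This is a minimal sketch: `find_tools` is a hypothetical helper, `fake_run` stands in for `tu.run`, and it assumes each finder returns a list that is empty when there are no matches (the real return shape may differ).

```python
def find_tools(run, description, limit=10,
               finders=("Tool_Finder_Keyword", "Tool_Finder_LLM")):
    """Try each finder in order; return (finder_name, results) for the first non-empty result."""
    for name in finders:
        results = run({
            "name": name,
            "arguments": {"description": description, "limit": limit},
        })
        if results:
            return name, results
    return None, []


def fake_run(query):
    """Stand-in for tu.run: only the LLM finder 'finds' something."""
    if query["name"] == "Tool_Finder_LLM":
        return [{"name": "example_tool", "description": "placeholder result"}]
    return []


finder, tools = find_tools(fake_run, "RNA-seq differential expression", limit=5)
print(finder, [t["name"] for t in tools])
```

In real use, pass `tu.run` in place of `fake_run`.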
### 2. Tool Execution
Execute individual tools through the standardized interface:
**Example:**
```python
# Execute disease-target lookup
targets = tu.run({
    "name": "OpenTargets_get_associated_targets_by_disease_efoId",
    "arguments": {"efoId": "EFO_0000616"}  # Breast cancer
})

# Get protein structure
structure = tu.run({
    "name": "AlphaFold_get_structure",
    "arguments": {"uniprot_id": "P12345"}
})

# Calculate molecular properties
properties = tu.run({
    "name": "RDKit_calculate_descriptors",
    "arguments": {"smiles": "CCO"}  # Ethanol
})
```
**See `references/tool-execution.md` for:**
- Real-world execution examples across domains
- Tool parameter handling and validation
- Result processing and error handling
- Best practices for production use
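Input validation and error handling can be sketched as below. `is_uniprot_accession` and `run_with_retry` are hypothetical helpers, not part of ToolUniverse; the regular expression is the accession pattern published in the UniProt help pages. The retry wrapper assumes failures surface as exceptions, which may not hold if a tool reports errors inside its result.

```python
import re
import time

# UniProt accession pattern (6 or 10 characters).
UNIPROT_RE = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)


def is_uniprot_accession(value):
    """Return True if value looks like a UniProt accession."""
    return bool(UNIPROT_RE.match(value))


def run_with_retry(run, query, retries=3, delay=0.0):
    """Call run(query); on an exception, retry up to `retries` attempts in total."""
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return run(query)
        except Exception as exc:  # broad on purpose: the error types are unknown here
            last_error = exc
            if attempt < retries:
                time.sleep(delay * attempt)  # linear backoff
    raise RuntimeError(f"{query['name']} failed after {retries} attempts") from last_error


attempts = {"n": 0}


def flaky_run(query):
    """Stand-in for tu.run that fails twice, then succeeds."""
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("temporary failure")
    return {"ok": True, "tool": query["name"]}


query = {"name": "AlphaFold_get_structure", "arguments": {"uniprot_id": "P12345"}}
if is_uniprot_accession(query["arguments"]["uniprot_id"]):
    result = run_with_retry(flaky_run, query)
    print(result)
```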
### 3. Tool Composition and Workflows
Compose multiple tools for complex research workflows:
**Drug Discovery Example:**
```python
# 1. Find disease targets
targets = tu.run({
    "name": "OpenTargets_get_associated_targets_by_disease_efoId",
    "arguments": {"efoId": "EFO_0000616"}
})

# 2. Get protein structures
structures = []
for target in targets[:5]:
    structure = tu.run({
        "name": "AlphaFold_get_structure",
        "arguments": {"uniprot_id": target['uniprot_id']}
    })
    structures.append(structure)

# 3. Screen compounds
hits = []
for structure in structures:
    compounds = tu.run({
        "name": "ZINC_virtual_screening",
        "arguments": {
            "structure": structure,
            "library": "lead-like",
            "top_n": 100
        }
    })
    hits.extend(compounds)

# 4. Evaluate drug-likeness
drug_candidates = []
for compound in hits:
    props = tu.run({
        "name": "RDKit_calculate_drug_properties",
        "arguments": {"smiles": compound['smiles']}
    })
    if props['lipinski_pass']:
        drug_candidates.append(compound)
```
**See `references/tool-composition.md` for:**
- Complete workflow examples (drug discovery, genomics, clinical)
- Sequential and parallel tool composition patterns
- Output processing hooks
- Workflow best practices
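When steps are independent, such as fetching structures for several targets, they can run concurrently. The sketch below uses the standard library's `ThreadPoolExecutor`. `run_parallel` is a hypothetical helper, `fake_run` stands in for `tu.run`, and it assumes `tu.run` is safe to call from multiple threads, which is not confirmed here.

```python
from concurrent.futures import ThreadPoolExecutor


def run_parallel(run, queries, max_workers=4):
    """Run independent queries concurrently; results are returned in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run, queries))


def fake_run(query):
    """Stand-in for tu.run that echoes the requested accession."""
    return {"tool": query["name"], "id": query["arguments"]["uniprot_id"]}


queries = [
    {"name": "AlphaFold_get_structure", "arguments": {"uniprot_id": uid}}
    for uid in ["P12345", "Q9Y6K9", "O15111"]
]
structures = run_parallel(fake_run, queries)
print([s["id"] for s in structures])
```

Keep `max_workers` small when the tools call remote APIs, so that concurrent requests stay within rate limits.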
## Scientific Domains
ToolUniverse supports 600+ tools across major scientific domains:
**Bioinformatics:**
- Sequence analysis, alignment, BLAST
- Gene expression (RNA-seq, DESeq2)
- Pathway enrichment (KEGG, Reactome, GO)
- Variant annotation (VEP, ClinVar)
**Cheminformatics:**
- Molecular descriptors and fingerprints
- Drug discovery and virtual screening
- ADMET prediction and drug-likeness
- Chemical databases (PubChem, ChEMBL, ZINC)
**Structural Biology:**
- Protein structure prediction (AlphaFold)
- Structure retrieval (PDB)
- Binding site detection
- Protein-protein interactions
**Proteomics:**
- Mass spectrometry analysis
- Protein databases (UniProt, STRING)
- Post-translational modifications
**Genomics:**
- Genome assembly and annotation
- Copy number variation
- Clinical genomics workflows
**Medical/Clinical:**
- Disease databases (OpenTargets, OMIM)
- Clinical trials and FDA data
- Variant classification
**See `references/domains.md` for:**
- Complete domain categorization
- Tool examples by discipline
- Cross-domain applications
- Search strategies by domain
## Reference Documentation
This skill includes reference files that cover specific aspects in detail:
- **`references/installation.md`** - Installation, setup, MCP configuration, platform integration
- **`references/tool-discovery.md`** - Discovery methods, search strategies, listing tools
- **`references/tool-execution.md`** - Execution patterns, real-world examples, error handling
- **`references/tool-composition.md`** - Workflow composition, complex pipelines, parallel execution
- **`references/domains.md`** - Tool categorization by domain, use case examples
- **`references/api_reference.md`** - Python API documentation, hooks, protocols
**Workflow:** When helping with specific tasks, reference the appropriate file for detailed instructions. For example, if searching for tools, consult `references/tool-discovery.md` for search strategies.
## Example Scripts
Two executable example scripts demonstrate common use cases:
**`scripts/example_tool_search.py`** - Demonstrates all three discovery methods:
- Keyword-based search
- LLM-based search
- Domain-specific searches
- Getting detailed tool information
**`scripts/example_workflow.py`** - Complete workflow examples:
- Drug discovery pipeline (disease → targets → structures → screening → candidates)
- Genomics analysis (expression data → differential analysis → pathways)
Run the examples to see typical usage patterns and workflow composition.
## Best Practices
1. **Tool Discovery:**
   - Start with broad searches, then refine based on results
   - Use `Tool_Finder_Keyword` for fast searches with known terms
   - Use `Tool_Finder_LLM` for complex semantic queries
   - Set appropriate `limit` parameter (default: 10)
2. **Tool Execution:**
   - Always verify tool parameters before execution
   - Implement error handling for production workflows
   - Validate input data formats (SMILES, UniProt IDs, gene symbols)
   - Check result types and structures
3. **Workflow Composition:**
   - Test each step individually before composing full workflows
   - Implement checkpointing for long workflows
   - Consider rate limits for remote APIs
   - Use parallel execution when tools are independent
4. **Integration:**
   - Initialize ToolUniverse once and reuse the instance
   - Call `load_tools()` once at startup
   - Cache frequently used tool information
   - Enable logging for debugging
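Checkpointing for long workflows can be as simple as saving each step's result to disk and reusing it on the next run. This is a minimal sketch: `checkpointed` is a hypothetical helper, and it assumes step results are JSON-serializable, which may not hold for every tool.

```python
import json
import tempfile
from pathlib import Path


def checkpointed(step_name, compute, checkpoint_dir):
    """Return the saved result for step_name, computing and saving it if absent."""
    path = Path(checkpoint_dir) / f"{step_name}.json"
    if path.exists():
        return json.loads(path.read_text())
    result = compute()
    path.write_text(json.dumps(result))
    return result


compute_calls = []


def find_targets():
    """Stand-in for an expensive tu.run call."""
    compute_calls.append("targets")
    return [{"uniprot_id": "P12345"}]


with tempfile.TemporaryDirectory() as tmp:
    first = checkpointed("targets", find_targets, tmp)   # computed and saved
    second = checkpointed("targets", find_targets, tmp)  # loaded from disk

print(first == second, len(compute_calls))
```

In a real pipeline, use a persistent directory instead of a temporary one so that a restarted workflow resumes from the last completed step.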
## Key Terminology
- **Tool**: A scientific resource (model, dataset, API, package) accessible through ToolUniverse
- **Tool Discovery**: Finding relevant tools using one of the search methods (`Tool_Finder`, `Tool_Finder_LLM`, `Tool_Finder_Keyword`)
- **Tool Execution**: Running a tool with specific arguments via `tu.run()`
- **Tool Composition**: Chaining multiple tools for multi-step workflows
- **MCP**: Model Context Protocol for integration with Claude Desktop/Code
- **AI-Tool Interaction Protocol**: Standardized interface for LLM-tool communication
## Resources
- **Official Website**: https://aiscientist.tools
- **GitHub**: https://github.com/mims-harvard/ToolUniverse
- **Documentation**: https://zitniklab.hms.harvard.edu/ToolUniverse/
- **Installation**: `uv pip install tooluniverse`
- **MCP Server**: `tooluniverse-smcp`