Initial commit
290
skills/tooluniverse/SKILL.md
Normal file
@@ -0,0 +1,290 @@
---
name: tooluniverse
description: Use this skill when working with scientific research tools and workflows across bioinformatics, cheminformatics, genomics, structural biology, proteomics, and drug discovery. This skill provides access to 600+ scientific tools including machine learning models, datasets, APIs, and analysis packages. Use when searching for scientific tools, executing computational biology workflows, composing multi-step research pipelines, accessing databases like OpenTargets/PubChem/UniProt/PDB/ChEMBL, performing tool discovery for research tasks, or integrating scientific computational resources into LLM workflows.
---

# ToolUniverse

## Overview

ToolUniverse is a unified ecosystem that lets AI agents act as research scientists by giving them standardized access to 600+ scientific resources. Use this skill to discover, execute, and compose scientific tools across bioinformatics, cheminformatics, genomics, structural biology, proteomics, and drug discovery.

**Key Capabilities:**
- Access 600+ scientific tools, models, datasets, and APIs
- Discover tools using natural language, semantic search, or keywords
- Execute tools through the standardized AI-Tool Interaction Protocol
- Compose multi-step workflows for complex research problems
- Integrate with Claude Desktop/Code via the Model Context Protocol (MCP)

## When to Use This Skill

Use this skill when:
- Searching for scientific tools by function or domain (e.g., "find protein structure prediction tools")
- Executing computational biology workflows (e.g., disease target identification, drug discovery, genomics analysis)
- Accessing scientific databases (OpenTargets, PubChem, UniProt, PDB, ChEMBL, KEGG, etc.)
- Composing multi-step research pipelines (e.g., target discovery → structure prediction → virtual screening)
- Working with bioinformatics, cheminformatics, or structural biology tasks
- Analyzing gene expression, protein sequences, molecular structures, or clinical data
- Performing literature searches, pathway enrichment, or variant annotation
- Building automated scientific research workflows

## Quick Start

### Basic Setup
```python
from tooluniverse import ToolUniverse

# Initialize and load tools
tu = ToolUniverse()
tu.load_tools()  # Loads 600+ scientific tools

# Discover tools
tools = tu.run({
    "name": "Tool_Finder_Keyword",
    "arguments": {
        "description": "disease target associations",
        "limit": 10
    }
})

# Execute a tool
result = tu.run({
    "name": "OpenTargets_get_associated_targets_by_disease_efoId",
    "arguments": {"efoId": "EFO_0000537"}  # Hypertension
})
```
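
Every call in the setup example passes `tu.run()` a dictionary with a `name` and an `arguments` mapping. A small helper can build that dictionary so call sites stay short. This is only a sketch: `tool_call` is a hypothetical name, not part of the ToolUniverse API, and it constructs the request without executing anything.

```python
from typing import Any


def tool_call(name: str, **arguments: Any) -> dict[str, Any]:
    """Build the {"name": ..., "arguments": {...}} request passed to tu.run().

    Hypothetical helper for illustration; ToolUniverse does not provide it.
    """
    if not name:
        raise ValueError("tool name must be a non-empty string")
    return {"name": name, "arguments": dict(arguments)}


request = tool_call(
    "OpenTargets_get_associated_targets_by_disease_efoId",
    efoId="EFO_0000537",  # Hypertension, as in the setup example
)
print(request)
```

With a loaded instance, the request would then be executed as `tu.run(request)`.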

### Model Context Protocol (MCP)
For Claude Desktop/Code integration, start the MCP server:
```bash
tooluniverse-smcp
```

## Core Workflows

### 1. Tool Discovery

Find relevant tools for your research task:

**Three discovery methods:**
- `Tool_Finder` - Embedding-based semantic search (requires GPU)
- `Tool_Finder_LLM` - LLM-based semantic search (no GPU required)
- `Tool_Finder_Keyword` - Fast keyword search

**Example:**
```python
# Search by natural language description
tools = tu.run({
    "name": "Tool_Finder_LLM",
    "arguments": {
        "description": "Find tools for RNA sequencing differential expression analysis",
        "limit": 10
    }
})

# Review available tools
for tool in tools:
    print(f"{tool['name']}: {tool['description']}")
```

**See `references/tool-discovery.md` for:**
- Detailed discovery methods and search strategies
- Domain-specific keyword suggestions
- Best practices for finding tools
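
One way to combine the discovery methods is to try the fast keyword finder first and fall back to the LLM finder only when nothing matches. The sketch below assumes, as the example above does, that a finder returns a list of dictionaries with `name` and `description` keys. `find_tools` and `fake_run` are hypothetical names; `fake_run` stands in for `tu.run` so the snippet runs without ToolUniverse installed.

```python
from typing import Any, Callable

Runner = Callable[[dict[str, Any]], Any]


def find_tools(run: Runner, description: str, limit: int = 10) -> list[dict[str, Any]]:
    """Try the keyword finder first, then fall back to the LLM finder."""
    for finder in ("Tool_Finder_Keyword", "Tool_Finder_LLM"):
        result = run({
            "name": finder,
            "arguments": {"description": description, "limit": limit},
        })
        # Assumption: a finder returns a list of dicts; anything else counts as no match.
        if isinstance(result, list) and result:
            return result[:limit]
    return []


def fake_run(request: dict[str, Any]) -> Any:
    """Stand-in for tu.run: only the LLM finder returns a (placeholder) hit."""
    if request["name"] == "Tool_Finder_LLM":
        return [{"name": "example_tool", "description": "placeholder result"}]
    return []


for tool in find_tools(fake_run, "RNA sequencing differential expression"):
    print(f"{tool['name']}: {tool['description']}")
```

In real use, pass `tu.run` in place of `fake_run`.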

### 2. Tool Execution

Execute individual tools through the standardized interface:

**Example:**
```python
# Execute disease-target lookup
targets = tu.run({
    "name": "OpenTargets_get_associated_targets_by_disease_efoId",
    "arguments": {"efoId": "EFO_0000616"}  # Breast cancer
})

# Get protein structure
structure = tu.run({
    "name": "AlphaFold_get_structure",
    "arguments": {"uniprot_id": "P12345"}
})

# Calculate molecular properties
properties = tu.run({
    "name": "RDKit_calculate_descriptors",
    "arguments": {"smiles": "CCO"}  # Ethanol
})
```

**See `references/tool-execution.md` for:**
- Real-world execution examples across domains
- Tool parameter handling and validation
- Result processing and error handling
- Best practices for production use
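
Parameter validation and error handling can be wrapped around any call. The sketch below checks a few argument names used in the examples above before executing, and returns failures as data instead of raising. The UniProt pattern follows the accession format published in the UniProt help pages; the EFO pattern (`EFO_` plus seven digits) and the SMILES check (non-empty only) are simplifying assumptions. `validate_arguments`, `safe_run`, and `fake_run` are hypothetical names, not ToolUniverse functions.

```python
import re
from typing import Any, Callable

# UniProt accession format as documented in the UniProt help pages.
UNIPROT_RE = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)
# Assumption: EFO identifiers look like "EFO_" followed by seven digits.
EFO_RE = re.compile(r"^EFO_\d{7}$")


def validate_arguments(arguments: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    if "uniprot_id" in arguments and not UNIPROT_RE.match(str(arguments["uniprot_id"])):
        problems.append(f"uniprot_id looks malformed: {arguments['uniprot_id']!r}")
    if "efoId" in arguments and not EFO_RE.match(str(arguments["efoId"])):
        problems.append(f"efoId looks malformed: {arguments['efoId']!r}")
    # Only checks for an empty string; real SMILES validation needs a chemistry toolkit.
    if "smiles" in arguments and not str(arguments["smiles"]).strip():
        problems.append("smiles is empty")
    return problems


def safe_run(run: Callable[[dict[str, Any]], Any], name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Validate, execute, and report failures as a dict instead of raising."""
    problems = validate_arguments(arguments)
    if problems:
        return {"ok": False, "error": "; ".join(problems)}
    try:
        return {"ok": True, "result": run({"name": name, "arguments": arguments})}
    except Exception as exc:  # broad on purpose: tool failure modes are not known here
        return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}


def fake_run(request: dict[str, Any]) -> Any:
    """Stand-in for tu.run that simply echoes the tool name."""
    return f"ran {request['name']}"


print(safe_run(fake_run, "AlphaFold_get_structure", {"uniprot_id": "P12345"}))
print(safe_run(fake_run, "AlphaFold_get_structure", {"uniprot_id": "12345"}))
```

The second call is rejected before any tool runs, which keeps malformed identifiers from reaching a remote API.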

### 3. Tool Composition and Workflows

Compose multiple tools for complex research workflows:

**Drug Discovery Example:**
```python
# 1. Find disease targets
targets = tu.run({
    "name": "OpenTargets_get_associated_targets_by_disease_efoId",
    "arguments": {"efoId": "EFO_0000616"}
})

# 2. Get protein structures
structures = []
for target in targets[:5]:
    structure = tu.run({
        "name": "AlphaFold_get_structure",
        "arguments": {"uniprot_id": target['uniprot_id']}
    })
    structures.append(structure)

# 3. Screen compounds
hits = []
for structure in structures:
    compounds = tu.run({
        "name": "ZINC_virtual_screening",
        "arguments": {
            "structure": structure,
            "library": "lead-like",
            "top_n": 100
        }
    })
    hits.extend(compounds)

# 4. Evaluate drug-likeness
drug_candidates = []
for compound in hits:
    props = tu.run({
        "name": "RDKit_calculate_drug_properties",
        "arguments": {"smiles": compound['smiles']}
    })
    if props['lipinski_pass']:
        drug_candidates.append(compound)
```
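
Step 4 filters on a `lipinski_pass` field. For readers unfamiliar with it, Lipinski's rule of five flags a compound when its molecular weight exceeds 500 Da, its logP exceeds 5, it has more than 5 hydrogen-bond donors, or it has more than 10 hydrogen-bond acceptors. The sketch below applies those thresholds to precomputed descriptors. It is not the tool's implementation: the `Descriptors` class, the function names, and the default of tolerating one violation are assumptions made for illustration, and the ethanol values are approximate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    mol_weight: float  # molecular weight in daltons
    logp: float        # octanol-water partition coefficient
    h_donors: int      # hydrogen-bond donors
    h_acceptors: int   # hydrogen-bond acceptors


def lipinski_violations(d: Descriptors) -> list[str]:
    """List which rule-of-five thresholds the compound exceeds."""
    violations = []
    if d.mol_weight > 500:
        violations.append("molecular weight > 500 Da")
    if d.logp > 5:
        violations.append("logP > 5")
    if d.h_donors > 5:
        violations.append("more than 5 H-bond donors")
    if d.h_acceptors > 10:
        violations.append("more than 10 H-bond acceptors")
    return violations


def lipinski_pass(d: Descriptors, max_violations: int = 1) -> bool:
    """Pass when the violation count stays within the chosen tolerance."""
    return len(lipinski_violations(d)) <= max_violations


# Approximate descriptor values for ethanol (SMILES "CCO").
ethanol = Descriptors(mol_weight=46.07, logp=-0.31, h_donors=1, h_acceptors=1)
print(lipinski_violations(ethanol), lipinski_pass(ethanol))
```

Set `max_violations=0` for a strict filter; conventions differ between projects, so check what the tool you call actually reports.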

**See `references/tool-composition.md` for:**
- Complete workflow examples (drug discovery, genomics, clinical)
- Sequential and parallel tool composition patterns
- Output processing hooks
- Workflow best practices
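
When the calls in a step do not depend on each other, such as fetching structures for several targets, they can run concurrently. The sketch below uses a thread pool from the standard library and keeps results in input order. Whether `tu.run` is safe to call from several threads is not stated in this document, so treat that as an assumption to verify. `run_parallel` and `fake_run` are hypothetical names, and `fake_run` replaces `tu.run` so the snippet is self-contained.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


def run_parallel(
    run: Callable[[dict[str, Any]], Any],
    requests: list[dict[str, Any]],
    max_workers: int = 4,
) -> list[dict[str, Any]]:
    """Run independent requests concurrently; results keep the input order."""

    def call(request: dict[str, Any]) -> dict[str, Any]:
        try:
            return {"ok": True, "result": run(request)}
        except Exception as exc:  # capture per-request failures instead of aborting the batch
            return {"ok": False, "error": str(exc)}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(call, requests))


def fake_run(request: dict[str, Any]) -> str:
    """Stand-in for tu.run; fails when the identifier is empty."""
    uniprot_id = request["arguments"]["uniprot_id"]
    if not uniprot_id:
        raise ValueError("missing uniprot_id")
    return f"structure for {uniprot_id}"


batch = [
    {"name": "AlphaFold_get_structure", "arguments": {"uniprot_id": uid}}
    for uid in ("P12345", "", "Q8N158")
]
results = run_parallel(fake_run, batch)
for item in results:
    print(item)
```

Keep `max_workers` small for remote APIs so the batch does not exceed their rate limits.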

## Scientific Domains

ToolUniverse supports 600+ tools across major scientific domains:

**Bioinformatics:**
- Sequence analysis, alignment, BLAST
- Gene expression (RNA-seq, DESeq2)
- Pathway enrichment (KEGG, Reactome, GO)
- Variant annotation (VEP, ClinVar)

**Cheminformatics:**
- Molecular descriptors and fingerprints
- Drug discovery and virtual screening
- ADMET prediction and drug-likeness
- Chemical databases (PubChem, ChEMBL, ZINC)

**Structural Biology:**
- Protein structure prediction (AlphaFold)
- Structure retrieval (PDB)
- Binding site detection
- Protein-protein interactions

**Proteomics:**
- Mass spectrometry analysis
- Protein databases (UniProt, STRING)
- Post-translational modifications

**Genomics:**
- Genome assembly and annotation
- Copy number variation
- Clinical genomics workflows

**Medical/Clinical:**
- Disease databases (OpenTargets, OMIM)
- Clinical trials and FDA data
- Variant classification

**See `references/domains.md` for:**
- Complete domain categorization
- Tool examples by discipline
- Cross-domain applications
- Search strategies by domain

## Reference Documentation

This skill includes reference files with detailed information on specific topics:

- **`references/installation.md`** - Installation, setup, MCP configuration, platform integration
- **`references/tool-discovery.md`** - Discovery methods, search strategies, listing tools
- **`references/tool-execution.md`** - Execution patterns, real-world examples, error handling
- **`references/tool-composition.md`** - Workflow composition, complex pipelines, parallel execution
- **`references/domains.md`** - Tool categorization by domain, use case examples
- **`references/api_reference.md`** - Python API documentation, hooks, protocols

**Workflow:** When helping with a specific task, consult the matching reference file for detailed instructions. For example, when searching for tools, consult `references/tool-discovery.md` for search strategies.

## Example Scripts

Two executable example scripts demonstrate common use cases:

**`scripts/example_tool_search.py`** - Demonstrates tool discovery, including:
- Keyword-based search
- LLM-based search
- Domain-specific searches
- Getting detailed tool information

**`scripts/example_workflow.py`** - Complete workflow examples:
- Drug discovery pipeline (disease → targets → structures → screening → candidates)
- Genomics analysis (expression data → differential analysis → pathways)

Run the examples to see typical usage patterns and workflow composition.

## Best Practices

1. **Tool Discovery:**
   - Start with broad searches, then refine based on results
   - Use `Tool_Finder_Keyword` for fast searches with known terms
   - Use `Tool_Finder_LLM` for complex semantic queries
   - Set an appropriate `limit` parameter (default: 10)

2. **Tool Execution:**
   - Always verify tool parameters before execution
   - Implement error handling for production workflows
   - Validate input data formats (SMILES, UniProt IDs, gene symbols)
   - Check result types and structures

3. **Workflow Composition:**
   - Test each step individually before composing full workflows
   - Implement checkpointing for long workflows
   - Consider rate limits for remote APIs
   - Use parallel execution when tools are independent

4. **Integration:**
   - Initialize ToolUniverse once and reuse the instance
   - Call `load_tools()` once at startup
   - Cache frequently used tool information
   - Enable logging for debugging
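
Checkpointing and rate limiting from item 3 can be combined in a few lines. The sketch below saves each result to a JSON file as soon as it is produced, skips requests already recorded there when the workflow is rerun, and optionally pauses between calls. It assumes tool results are JSON-serializable, which this document does not guarantee. `run_with_checkpoint` and `fake_run` are hypothetical names, and the step names are placeholders rather than real tools.

```python
import json
import tempfile
import time
from pathlib import Path
from typing import Any, Callable


def run_with_checkpoint(
    run: Callable[[dict[str, Any]], Any],
    requests: list[dict[str, Any]],
    checkpoint: Path,
    pause_seconds: float = 0.0,
) -> list[Any]:
    """Run requests in order, persisting each result so a rerun can resume."""
    done = json.loads(checkpoint.read_text()) if checkpoint.exists() else {}
    for request in requests:
        key = json.dumps(request, sort_keys=True)  # stable key for this request
        if key in done:
            continue  # finished in an earlier run
        done[key] = run(request)  # assumes the result is JSON-serializable
        checkpoint.write_text(json.dumps(done))
        time.sleep(pause_seconds)  # crude spacing between calls to remote APIs
    return [done[json.dumps(request, sort_keys=True)] for request in requests]


calls = []


def fake_run(request: dict[str, Any]) -> dict[str, Any]:
    """Stand-in for tu.run that records each invocation."""
    calls.append(request["name"])
    return {"echo": request["arguments"]}


steps = [
    {"name": "step_one", "arguments": {"x": 1}},
    {"name": "step_two", "arguments": {"x": 2}},
]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "checkpoint.json"
    first = run_with_checkpoint(fake_run, steps, path)
    second = run_with_checkpoint(fake_run, steps, path)  # served from the checkpoint file

print(len(calls))  # 2: the second pass made no new calls
```

Keying on the full request means a changed argument triggers a fresh call, while an identical rerun costs nothing.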

## Key Terminology

- **Tool**: A scientific resource (model, dataset, API, package) accessible through ToolUniverse
- **Tool Discovery**: Finding relevant tools with one of the search methods (`Tool_Finder`, `Tool_Finder_LLM`, `Tool_Finder_Keyword`)
- **Tool Execution**: Running a tool with specific arguments via `tu.run()`
- **Tool Composition**: Chaining multiple tools for multi-step workflows
- **MCP**: Model Context Protocol for integration with Claude Desktop/Code
- **AI-Tool Interaction Protocol**: Standardized interface for LLM-tool communication

## Resources

- **Official Website**: https://aiscientist.tools
- **GitHub**: https://github.com/mims-harvard/ToolUniverse
- **Documentation**: https://zitniklab.hms.harvard.edu/ToolUniverse/
- **Installation**: `uv pip install tooluniverse`
- **MCP Server**: `tooluniverse-smcp`