Initial commit

skills/tooluniverse/SKILL.md

---
name: tooluniverse
description: Use this skill when working with scientific research tools and workflows across bioinformatics, cheminformatics, genomics, structural biology, proteomics, and drug discovery. This skill provides access to 600+ scientific tools including machine learning models, datasets, APIs, and analysis packages. Use when searching for scientific tools, executing computational biology workflows, composing multi-step research pipelines, accessing databases like OpenTargets/PubChem/UniProt/PDB/ChEMBL, performing tool discovery for research tasks, or integrating scientific computational resources into LLM workflows.
---

# ToolUniverse

## Overview

ToolUniverse is a unified ecosystem that enables AI agents to function as research scientists by providing standardized access to 600+ scientific resources. Use this skill to discover, execute, and compose scientific tools across multiple research domains including bioinformatics, cheminformatics, genomics, structural biology, proteomics, and drug discovery.

**Key Capabilities:**
- Access 600+ scientific tools, models, datasets, and APIs
- Discover tools using natural language, semantic search, or keywords
- Execute tools through the standardized AI-Tool Interaction Protocol
- Compose multi-step workflows for complex research problems
- Integrate with Claude Desktop/Code via the Model Context Protocol (MCP)

## When to Use This Skill

Use this skill when:
- Searching for scientific tools by function or domain (e.g., "find protein structure prediction tools")
- Executing computational biology workflows (e.g., disease target identification, drug discovery, genomics analysis)
- Accessing scientific databases (OpenTargets, PubChem, UniProt, PDB, ChEMBL, KEGG, etc.)
- Composing multi-step research pipelines (e.g., target discovery → structure prediction → virtual screening)
- Working with bioinformatics, cheminformatics, or structural biology tasks
- Analyzing gene expression, protein sequences, molecular structures, or clinical data
- Performing literature searches, pathway enrichment, or variant annotation
- Building automated scientific research workflows

## Quick Start

### Basic Setup

```python
from tooluniverse import ToolUniverse

# Initialize and load tools
tu = ToolUniverse()
tu.load_tools()  # Loads 600+ scientific tools

# Discover tools
tools = tu.run({
    "name": "Tool_Finder_Keyword",
    "arguments": {
        "description": "disease target associations",
        "limit": 10
    }
})

# Execute a tool
result = tu.run({
    "name": "OpenTargets_get_associated_targets_by_disease_efoId",
    "arguments": {"efoId": "EFO_0000537"}  # Hypertension
})
```

### Model Context Protocol (MCP)

For Claude Desktop/Code integration:

```bash
tooluniverse-smcp
```

## Core Workflows

### 1. Tool Discovery

Find relevant tools for your research task:

**Three discovery methods:**
- `Tool_Finder` - Embedding-based semantic search (requires GPU)
- `Tool_Finder_LLM` - LLM-based semantic search (no GPU required)
- `Tool_Finder_Keyword` - Fast keyword search

**Example:**
```python
# Search by natural language description
tools = tu.run({
    "name": "Tool_Finder_LLM",
    "arguments": {
        "description": "Find tools for RNA sequencing differential expression analysis",
        "limit": 10
    }
})

# Review available tools
for tool in tools:
    print(f"{tool['name']}: {tool['description']}")
```

**See `references/tool-discovery.md` for:**
- Detailed discovery methods and search strategies
- Domain-specific keyword suggestions
- Best practices for finding tools

### 2. Tool Execution

Execute individual tools through the standardized interface:

**Example:**
```python
# Execute disease-target lookup
targets = tu.run({
    "name": "OpenTargets_get_associated_targets_by_disease_efoId",
    "arguments": {"efoId": "EFO_0000616"}  # Breast cancer
})

# Get protein structure
structure = tu.run({
    "name": "AlphaFold_get_structure",
    "arguments": {"uniprot_id": "P12345"}
})

# Calculate molecular properties
properties = tu.run({
    "name": "RDKit_calculate_descriptors",
    "arguments": {"smiles": "CCO"}  # Ethanol
})
```

**See `references/tool-execution.md` for:**
- Real-world execution examples across domains
- Tool parameter handling and validation
- Result processing and error handling
- Best practices for production use
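Identifier formats can be sanity-checked before calling `tu.run()`. A minimal sketch with illustrative (not authoritative) patterns for UniProt accessions and SMILES strings:

```python
import re

# Simplified UniProt accession pattern (covers the common 6- and 10-character forms)
UNIPROT_RE = re.compile(
    r"^[OPQ][0-9][A-Z0-9]{3}[0-9]$|^[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}$"
)

def looks_like_uniprot_id(value: str) -> bool:
    return bool(UNIPROT_RE.match(value))

def looks_like_smiles(value: str) -> bool:
    # Very rough check: non-empty, no whitespace, only SMILES-legal characters.
    # For real validation, parse with RDKit (Chem.MolFromSmiles).
    return bool(value) and re.fullmatch(
        r"[A-Za-z0-9@+\-\[\]\(\)=#$/\\.%:*]+", value
    ) is not None

print(looks_like_uniprot_id("P12345"))  # True
print(looks_like_smiles("CCO"))         # True
```

Rejecting malformed inputs early gives clearer errors than letting a remote tool fail mid-workflow.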

### 3. Tool Composition and Workflows

Compose multiple tools for complex research workflows:

**Drug Discovery Example:**
```python
# 1. Find disease targets
targets = tu.run({
    "name": "OpenTargets_get_associated_targets_by_disease_efoId",
    "arguments": {"efoId": "EFO_0000616"}
})

# 2. Get protein structures
structures = []
for target in targets[:5]:
    structure = tu.run({
        "name": "AlphaFold_get_structure",
        "arguments": {"uniprot_id": target['uniprot_id']}
    })
    structures.append(structure)

# 3. Screen compounds
hits = []
for structure in structures:
    compounds = tu.run({
        "name": "ZINC_virtual_screening",
        "arguments": {
            "structure": structure,
            "library": "lead-like",
            "top_n": 100
        }
    })
    hits.extend(compounds)

# 4. Evaluate drug-likeness
drug_candidates = []
for compound in hits:
    props = tu.run({
        "name": "RDKit_calculate_drug_properties",
        "arguments": {"smiles": compound['smiles']}
    })
    if props['lipinski_pass']:
        drug_candidates.append(compound)
```

**See `references/tool-composition.md` for:**
- Complete workflow examples (drug discovery, genomics, clinical)
- Sequential and parallel tool composition patterns
- Output processing hooks
- Workflow best practices

## Scientific Domains

ToolUniverse supports 600+ tools across major scientific domains:

**Bioinformatics:**
- Sequence analysis, alignment, BLAST
- Gene expression (RNA-seq, DESeq2)
- Pathway enrichment (KEGG, Reactome, GO)
- Variant annotation (VEP, ClinVar)

**Cheminformatics:**
- Molecular descriptors and fingerprints
- Drug discovery and virtual screening
- ADMET prediction and drug-likeness
- Chemical databases (PubChem, ChEMBL, ZINC)

**Structural Biology:**
- Protein structure prediction (AlphaFold)
- Structure retrieval (PDB)
- Binding site detection
- Protein-protein interactions

**Proteomics:**
- Mass spectrometry analysis
- Protein databases (UniProt, STRING)
- Post-translational modifications

**Genomics:**
- Genome assembly and annotation
- Copy number variation
- Clinical genomics workflows

**Medical/Clinical:**
- Disease databases (OpenTargets, OMIM)
- Clinical trials and FDA data
- Variant classification

**See `references/domains.md` for:**
- Complete domain categorization
- Tool examples by discipline
- Cross-domain applications
- Search strategies by domain

## Reference Documentation

This skill includes comprehensive reference files with detailed information on specific topics:

- **`references/installation.md`** - Installation, setup, MCP configuration, platform integration
- **`references/tool-discovery.md`** - Discovery methods, search strategies, listing tools
- **`references/tool-execution.md`** - Execution patterns, real-world examples, error handling
- **`references/tool-composition.md`** - Workflow composition, complex pipelines, parallel execution
- **`references/domains.md`** - Tool categorization by domain, use case examples
- **`references/api_reference.md`** - Python API documentation, hooks, protocols

**Workflow:** When helping with specific tasks, consult the appropriate file for detailed instructions. For example, when searching for tools, consult `references/tool-discovery.md` for search strategies.

## Example Scripts

Two executable example scripts demonstrate common use cases:

**`scripts/example_tool_search.py`** - Demonstrates all three discovery methods:
- Keyword-based search
- LLM-based search
- Domain-specific searches
- Getting detailed tool information

**`scripts/example_workflow.py`** - Complete workflow examples:
- Drug discovery pipeline (disease → targets → structures → screening → candidates)
- Genomics analysis (expression data → differential analysis → pathways)

Run these examples to see typical usage patterns and workflow composition.

## Best Practices

1. **Tool Discovery:**
   - Start with broad searches, then refine based on results
   - Use `Tool_Finder_Keyword` for fast searches with known terms
   - Use `Tool_Finder_LLM` for complex semantic queries
   - Set an appropriate `limit` parameter (default: 10)

2. **Tool Execution:**
   - Always verify tool parameters before execution
   - Implement error handling for production workflows
   - Validate input data formats (SMILES, UniProt IDs, gene symbols)
   - Check result types and structures

3. **Workflow Composition:**
   - Test each step individually before composing full workflows
   - Implement checkpointing for long workflows
   - Consider rate limits for remote APIs
   - Use parallel execution when tools are independent

4. **Integration:**
   - Initialize ToolUniverse once and reuse the instance
   - Call `load_tools()` once at startup
   - Cache frequently used tool information
   - Enable logging for debugging
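The parallel-execution practice above can be sketched with a thread pool. In this minimal sketch the `run_call` stub stands in for `tu.run`, so the pattern runs without a live ToolUniverse instance:

```python
from concurrent.futures import ThreadPoolExecutor

def run_call(call: dict) -> dict:
    # Stub standing in for tu.run(call); replace with the real instance.
    return {"name": call["name"], "status": "success"}

# Three independent structure lookups (UniProt IDs are illustrative)
calls = [
    {"name": "AlphaFold_get_structure", "arguments": {"uniprot_id": uid}}
    for uid in ["P12345", "Q9Y261", "O15111"]
]

# The calls do not depend on each other, so they can run concurrently;
# pool.map preserves input order in the results.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(run_call, calls))

print([r["status"] for r in results])  # ['success', 'success', 'success']
```

Keep the worker count modest when the underlying tools call rate-limited remote APIs.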

## Key Terminology

- **Tool**: A scientific resource (model, dataset, API, package) accessible through ToolUniverse
- **Tool Discovery**: Finding relevant tools using search methods (Finder, LLM, Keyword)
- **Tool Execution**: Running a tool with specific arguments via `tu.run()`
- **Tool Composition**: Chaining multiple tools for multi-step workflows
- **MCP**: Model Context Protocol for integration with Claude Desktop/Code
- **AI-Tool Interaction Protocol**: Standardized interface for LLM-tool communication

## Resources

- **Official Website**: https://aiscientist.tools
- **GitHub**: https://github.com/mims-harvard/ToolUniverse
- **Documentation**: https://zitniklab.hms.harvard.edu/ToolUniverse/
- **Installation**: `uv pip install tooluniverse`
- **MCP Server**: `tooluniverse-smcp`

skills/tooluniverse/references/api_reference.md

# ToolUniverse Python API Reference

## Core Classes

### ToolUniverse

Main class for interacting with the ToolUniverse ecosystem.

```python
from tooluniverse import ToolUniverse

tu = ToolUniverse()
```

#### Methods

##### `load_tools()`

Load all available tools into the ToolUniverse instance.

```python
tu.load_tools()
```

**Returns:** None

**Side effects:** Loads 600+ tools into memory for discovery and execution.

---

##### `run(tool_config)`

Execute a tool with specified arguments.

**Parameters:**
- `tool_config` (dict): Configuration dictionary with keys:
  - `name` (str): Tool name to execute
  - `arguments` (dict): Tool-specific arguments

**Returns:** Tool-specific output (dict, list, str, or other types)

**Example:**
```python
result = tu.run({
    "name": "OpenTargets_get_associated_targets_by_disease_efoId",
    "arguments": {
        "efoId": "EFO_0000537"
    }
})
```

---

##### `list_tools(limit=None)`

List all available tools or a subset.

**Parameters:**
- `limit` (int, optional): Maximum number of tools to return. If None, returns all tools.

**Returns:** List of tool dictionaries

**Example:**
```python
# List all tools
all_tools = tu.list_tools()

# List first 20 tools
tools = tu.list_tools(limit=20)
```

---

##### `get_tool_info(tool_name)`

Get detailed information about a specific tool.

**Parameters:**
- `tool_name` (str): Name of the tool

**Returns:** Dictionary containing tool metadata, parameters, and documentation

**Example:**
```python
info = tu.get_tool_info("AlphaFold_get_structure")
print(info['description'])
print(info['parameters'])
```

---

## Built-in Discovery Tools

These are special tools that help find other tools in the ecosystem.

### Tool_Finder

Embedding-based semantic search for tools. Requires GPU.

```python
tools = tu.run({
    "name": "Tool_Finder",
    "arguments": {
        "description": "protein structure prediction",
        "limit": 10
    }
})
```

**Parameters:**
- `description` (str): Natural language description of desired functionality
- `limit` (int): Maximum number of tools to return

**Returns:** List of relevant tools with similarity scores

---

### Tool_Finder_LLM

LLM-based semantic search for tools. No GPU required.

```python
tools = tu.run({
    "name": "Tool_Finder_LLM",
    "arguments": {
        "description": "Find tools for RNA sequencing analysis",
        "limit": 10
    }
})
```

**Parameters:**
- `description` (str): Natural language query
- `limit` (int): Maximum number of tools to return

**Returns:** List of relevant tools

---

### Tool_Finder_Keyword

Fast keyword-based search through tool names and descriptions.

```python
tools = tu.run({
    "name": "Tool_Finder_Keyword",
    "arguments": {
        "description": "pathway enrichment",
        "limit": 10
    }
})
```

**Parameters:**
- `description` (str): Keywords to search for
- `limit` (int): Maximum number of tools to return

**Returns:** List of matching tools

---

## Tool Output Hooks

Post-processing hooks for tool results.

### Summarization Hook
```python
result = tu.run(
    {
        "name": "some_tool",
        "arguments": {"param": "value"}
    },
    hooks={
        "summarize": {
            "format": "brief"  # or "detailed"
        }
    }
)
```

### File Saving Hook
```python
result = tu.run(
    {
        "name": "some_tool",
        "arguments": {"param": "value"}
    },
    hooks={
        "save_to_file": {
            "filename": "output.json",
            "format": "json"  # or "csv", "txt"
        }
    }
)
```

---

## Model Context Protocol (MCP)

### Starting MCP Server

Command-line interface:
```bash
tooluniverse-smcp
```

This launches an MCP server that exposes all ToolUniverse tools through the Model Context Protocol.

**Configuration:**
- Default port: Automatically assigned
- Protocol: MCP standard
- Authentication: None required for local use

---

## Integration Modules

### OpenRouter Integration

Access 100+ LLMs through the OpenRouter API:

```python
from tooluniverse import OpenRouterClient

client = OpenRouterClient(api_key="your_key")
response = client.chat("Analyze this protein sequence", model="anthropic/claude-3-5-sonnet")
```

---

## AI-Tool Interaction Protocol

ToolUniverse uses a standardized protocol for LLM-tool communication:

**Request Format:**
```json
{
    "name": "tool_name",
    "arguments": {
        "param1": "value1",
        "param2": "value2"
    }
}
```

**Response Format:**
```json
{
    "status": "success",
    "data": { ... },
    "metadata": {
        "execution_time": 1.23,
        "tool_version": "1.0.0"
    }
}
```
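Responses in this envelope can be validated before downstream use. A minimal sketch based on the format above (illustrative only; individual tools may return other shapes):

```python
def is_valid_response(resp) -> bool:
    """Check that a response matches the documented envelope:
    a string 'status', a 'data' key, and a dict 'metadata'."""
    return (
        isinstance(resp, dict)
        and isinstance(resp.get("status"), str)
        and "data" in resp
        and isinstance(resp.get("metadata"), dict)
    )

ok = is_valid_response({
    "status": "success",
    "data": {"targets": []},
    "metadata": {"execution_time": 1.23, "tool_version": "1.0.0"},
})
print(ok)  # True
```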

---

## Error Handling

```python
# The exception classes below follow this reference's naming; their exact
# import path in the tooluniverse package may differ.
try:
    result = tu.run({
        "name": "some_tool",
        "arguments": {"param": "value"}
    })
except ToolNotFoundError as e:
    print(f"Tool not found: {e}")
except InvalidArgumentError as e:
    print(f"Invalid arguments: {e}")
except ToolExecutionError as e:
    print(f"Execution failed: {e}")
```

---

## Type Hints

```python
from typing import Any, Dict

def run_tool(
    tu: ToolUniverse,
    tool_name: str,
    arguments: Dict[str, Any]
) -> Any:
    """Execute a tool with type-safe arguments."""
    return tu.run({
        "name": tool_name,
        "arguments": arguments
    })
```

---

## Best Practices

1. **Initialize Once**: Create a single ToolUniverse instance and reuse it
2. **Load Tools Early**: Call `load_tools()` once at startup
3. **Cache Tool Info**: Store frequently used tool information
4. **Error Handling**: Always wrap tool execution in try-except blocks
5. **Type Validation**: Validate argument types before execution
6. **Resource Management**: Consider rate limits for remote APIs
7. **Logging**: Enable logging for production environments
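The tool-info caching practice can be sketched with `functools.lru_cache`. The `fetch_tool_info` stub below stands in for `tu.get_tool_info` and returns a hashable placeholder so the cache can store it:

```python
from functools import lru_cache

@lru_cache(maxsize=256)
def fetch_tool_info(tool_name: str) -> tuple:
    # Stub standing in for tu.get_tool_info(tool_name). A real wrapper might
    # instead keep the raw dict results in a plain dictionary, since dicts
    # are not hashable and cannot be lru_cache return-value keys themselves.
    return (tool_name, "description for " + tool_name)

fetch_tool_info("AlphaFold_get_structure")
fetch_tool_info("AlphaFold_get_structure")  # second call served from cache
print(fetch_tool_info.cache_info().hits)    # 1
```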

skills/tooluniverse/references/domains.md

# ToolUniverse Tool Domains and Categories

## Overview

ToolUniverse integrates 600+ scientific tools across multiple research domains. This document categorizes tools by scientific discipline and use case.

## Major Scientific Domains

### Bioinformatics

**Sequence Analysis:**
- Sequence alignment and comparison
- Multiple sequence alignment (MSA)
- BLAST and homology searches
- Motif finding and pattern matching

**Genomics:**
- Gene expression analysis
- RNA-seq data processing
- Variant calling and annotation
- Genome assembly and annotation
- Copy number variation analysis

**Functional Analysis:**
- Gene Ontology (GO) enrichment
- Pathway analysis (KEGG, Reactome)
- Gene set enrichment analysis (GSEA)
- Protein domain analysis

**Example Tools:**
- GEO data download and analysis
- DESeq2 differential expression
- KEGG pathway enrichment
- UniProt sequence retrieval
- VEP variant annotation

### Cheminformatics

**Molecular Descriptors:**
- Chemical property calculation
- Molecular fingerprints
- SMILES/InChI conversion
- 3D conformer generation

**Drug Discovery:**
- Virtual screening
- Molecular docking
- ADMET prediction
- Drug-likeness assessment (Lipinski's Rule of Five)
- Toxicity prediction

**Chemical Databases:**
- PubChem compound search
- ChEMBL bioactivity data
- ZINC compound libraries
- DrugBank drug information

**Example Tools:**
- RDKit molecular descriptors
- AutoDock molecular docking
- ZINC library screening
- ChEMBL target-compound associations

### Structural Biology

**Protein Structure:**
- AlphaFold structure prediction
- PDB structure retrieval
- Structure alignment and comparison
- Binding site prediction
- Protein-protein interaction prediction

**Structure Analysis:**
- Secondary structure prediction
- Solvent accessibility calculation
- Structure quality assessment
- Ramachandran plot analysis

**Example Tools:**
- AlphaFold structure prediction
- PDB structure download
- Fpocket binding site detection
- DSSP secondary structure assignment

### Proteomics

**Protein Analysis:**
- Mass spectrometry data analysis
- Protein identification
- Post-translational modification analysis
- Protein quantification

**Protein Databases:**
- UniProt protein information
- STRING protein interactions
- IntAct interaction databases

**Example Tools:**
- UniProt data retrieval
- STRING interaction networks
- Mass spec peak analysis

### Machine Learning

**Model Types:**
- Classification models
- Regression models
- Clustering algorithms
- Neural networks
- Deep learning models

**Applications:**
- Predictive modeling
- Feature selection
- Dimensionality reduction
- Pattern recognition
- Biomarker discovery

**Example Tools:**
- Scikit-learn models
- TensorFlow/PyTorch models
- XGBoost predictors
- Random forest classifiers

### Medical/Clinical

**Disease Databases:**
- OpenTargets disease-target associations
- OMIM genetic disorders
- ClinVar pathogenic variants
- DisGeNET disease-gene associations

**Clinical Data:**
- Electronic health records analysis
- Clinical trial data
- Diagnostic tools
- Treatment recommendations

**Example Tools:**
- OpenTargets disease queries
- ClinVar variant classification
- OMIM disease lookup
- FDA drug approval data

### Neuroscience

**Brain Imaging:**
- fMRI data analysis
- Brain atlas mapping
- Connectivity analysis
- Neuroimaging pipelines

**Neural Data:**
- Electrophysiology analysis
- Spike train analysis
- Neural network simulation

### Image Processing

**Biomedical Imaging:**
- Microscopy image analysis
- Cell segmentation
- Object detection
- Image enhancement
- Feature extraction

**Image Analysis:**
- ImageJ/Fiji tools
- CellProfiler pipelines
- Deep learning segmentation

### Systems Biology

**Network Analysis:**
- Biological network construction
- Network topology analysis
- Module identification
- Hub gene identification

**Modeling:**
- Systems biology models
- Metabolic network modeling
- Signaling pathway simulation

## Tool Categories by Use Case

### Literature and Knowledge

**Literature Search:**
- PubMed article search
- Article summarization
- Citation analysis
- Knowledge extraction

**Knowledge Bases:**
- Ontology queries (GO, DO, HPO)
- Database cross-referencing
- Entity recognition

### Data Access

**Public Repositories:**
- GEO (Gene Expression Omnibus)
- SRA (Sequence Read Archive)
- PDB (Protein Data Bank)
- ChEMBL (bioactivity database)

**API Access:**
- RESTful API clients
- Database query tools
- Batch data retrieval

### Visualization

**Plot Generation:**
- Heatmaps
- Volcano plots
- Manhattan plots
- Network graphs
- Molecular structures

### Utilities

**Data Processing:**
- Format conversion
- Data normalization
- Statistical analysis
- Quality control

**Workflow Management:**
- Pipeline construction
- Task orchestration
- Result aggregation

## Finding Tools by Domain

Use domain-specific keywords with the Tool_Finder family of tools:

```python
# Bioinformatics
tools = tu.run({
    "name": "Tool_Finder_Keyword",
    "arguments": {"description": "RNA-seq genomics", "limit": 10}
})

# Cheminformatics
tools = tu.run({
    "name": "Tool_Finder_Keyword",
    "arguments": {"description": "molecular docking SMILES", "limit": 10}
})

# Structural biology
tools = tu.run({
    "name": "Tool_Finder_Keyword",
    "arguments": {"description": "protein structure PDB", "limit": 10}
})

# Clinical
tools = tu.run({
    "name": "Tool_Finder_Keyword",
    "arguments": {"description": "disease clinical variants", "limit": 10}
})
```

## Cross-Domain Applications

Many scientific problems require tools from multiple domains:

- **Precision Medicine**: Genomics + Clinical + Proteomics
- **Drug Discovery**: Cheminformatics + Structural Biology + Machine Learning
- **Cancer Research**: Genomics + Pathways + Literature
- **Neurodegenerative Diseases**: Genomics + Proteomics + Imaging

skills/tooluniverse/references/installation.md

# ToolUniverse Installation and Setup

## Installation

```bash
uv pip install tooluniverse
```

## Basic Setup

### Python SDK

```python
from tooluniverse import ToolUniverse

# Initialize ToolUniverse
tu = ToolUniverse()

# Load all available tools (600+ scientific tools)
tu.load_tools()
```

## Model Context Protocol (MCP) Setup

ToolUniverse provides native MCP support for integration with Claude Desktop, Claude Code, and other MCP-compatible systems.

### Starting MCP Server

```bash
tooluniverse-smcp
```

This launches an MCP server that exposes ToolUniverse's 600+ tools through the Model Context Protocol.

### Claude Desktop Integration

Add to the Claude Desktop configuration (`~/.config/Claude/claude_desktop_config.json`):

```json
{
    "mcpServers": {
        "tooluniverse": {
            "command": "tooluniverse-smcp"
        }
    }
}
```

### Claude Code Integration

The ToolUniverse MCP server works natively with Claude Code through the MCP protocol.

## Integration with Other Platforms

### OpenRouter Integration

ToolUniverse integrates with OpenRouter for access to 100+ LLMs through a single API:
- GPT-5, Claude, Gemini
- Qwen, Deepseek
- Open-source models

### Supported LLM Platforms

- Claude Desktop and Claude Code
- Gemini CLI
- Qwen Code
- ChatGPT API
- GPT Codex CLI

## Requirements

- Python 3.8+
- For Tool_Finder (embedding-based search): GPU recommended
- For Tool_Finder_LLM: no GPU required (uses LLM-based search)

## Verification

Test the installation:

```python
from tooluniverse import ToolUniverse

tu = ToolUniverse()
tu.load_tools()

# List first 5 tools to verify setup
tools = tu.list_tools(limit=5)
print(f"Loaded {len(tools)} tools successfully")
```

skills/tooluniverse/references/tool-composition.md

||||
# Tool Composition and Workflows in ToolUniverse
|
||||
|
||||
## Overview
|
||||
|
||||
ToolUniverse enables chaining multiple tools together to create complex scientific workflows. Tools can be composed sequentially or in parallel to solve multi-step research problems.
|
||||
|
||||
## Sequential Tool Composition
|
||||
|
||||
Execute tools in sequence where each tool's output feeds into the next tool.
|
||||
|
||||
### Basic Pattern
|
||||
```python
|
||||
from tooluniverse import ToolUniverse
|
||||
|
||||
tu = ToolUniverse()
|
||||
tu.load_tools()
|
||||
|
||||
# Step 1: Get disease-associated targets
|
||||
targets = tu.run({
|
||||
"name": "OpenTargets_get_associated_targets_by_disease_efoId",
|
||||
"arguments": {"efoId": "EFO_0000537"} # Hypertension
|
||||
})
|
||||
|
||||
# Step 2: For each target, get protein structure
|
||||
structures = []
|
||||
for target in targets[:5]: # First 5 targets
|
||||
structure = tu.run({
|
||||
"name": "AlphaFold_get_structure",
|
||||
"arguments": {"uniprot_id": target['uniprot_id']}
|
||||
})
|
||||
structures.append(structure)
|
||||
|
||||
# Step 3: Analyze structures
|
||||
for structure in structures:
|
||||
analysis = tu.run({
|
||||
"name": "ProteinAnalysis_calculate_properties",
|
||||
"arguments": {"structure": structure}
|
||||
})
|
||||
```
|
||||
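Long sequential chains benefit from checkpointing intermediate results so a failed run can resume where it stopped. A minimal sketch that caches each step's output in a JSON file; `run_step` is a stub standing in for `tu.run`:

```python
import json
import os

def run_step(call: dict) -> dict:
    # Stub standing in for tu.run(call); replace with the real instance.
    return {"tool": call["name"], "result": "ok"}

def run_with_checkpoint(call: dict, path: str) -> dict:
    """Return a cached result from disk if present, else run and save it."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    result = run_step(call)
    with open(path, "w") as f:
        json.dump(result, f)
    return result

out = run_with_checkpoint(
    {"name": "OpenTargets_get_associated_targets_by_disease_efoId",
     "arguments": {"efoId": "EFO_0000537"}},
    "step1_targets.json",
)
```

Re-running the script skips any step whose checkpoint file already exists; delete the file to force recomputation.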
|
||||
## Complex Workflow Examples

### Drug Discovery Workflow

Complete workflow from disease to drug candidates:

```python
# 1. Find disease-associated targets
print("Finding disease targets...")
targets = tu.run({
    "name": "OpenTargets_get_associated_targets_by_disease_efoId",
    "arguments": {"efoId": "EFO_0000616"}  # Breast cancer
})
top_targets = targets[:10]

# 2. Get target protein sequences
print("Retrieving protein sequences...")
sequences = []
for target in top_targets:
    seq = tu.run({
        "name": "UniProt_get_sequence",
        "arguments": {"uniprot_id": target['uniprot_id']}
    })
    sequences.append(seq)

# 3. Retrieve structure predictions (AlphaFold models are keyed by UniProt ID)
print("Predicting structures...")
structures = []
for target in top_targets:
    structure = tu.run({
        "name": "AlphaFold_get_structure",
        "arguments": {"uniprot_id": target['uniprot_id']}
    })
    structures.append(structure)

# 4. Find binding sites
print("Identifying binding sites...")
binding_sites = []
for structure in structures:
    sites = tu.run({
        "name": "Fpocket_find_binding_sites",
        "arguments": {"structure": structure}
    })
    binding_sites.append(sites)

# 5. Screen compound libraries
print("Screening compounds...")
hits = []
for site in binding_sites:
    compounds = tu.run({
        "name": "ZINC_virtual_screening",
        "arguments": {
            "binding_site": site,
            "library": "lead-like",
            "top_n": 100
        }
    })
    hits.extend(compounds)

# 6. Calculate drug-likeness
print("Evaluating drug-likeness...")
drug_candidates = []
for compound in hits:
    properties = tu.run({
        "name": "RDKit_calculate_drug_properties",
        "arguments": {"smiles": compound['smiles']}
    })
    if properties['lipinski_pass']:
        drug_candidates.append(compound)

print(f"Found {len(drug_candidates)} drug candidates")
```

### Genomics Analysis Workflow

```python
# 1. Download gene expression data
expression_data = tu.run({
    "name": "GEO_download_dataset",
    "arguments": {"geo_id": "GSE12345"}
})

# 2. Perform differential expression analysis
de_genes = tu.run({
    "name": "DESeq2_differential_expression",
    "arguments": {
        "data": expression_data,
        "condition1": "control",
        "condition2": "treated"
    }
})

# 3. Pathway enrichment analysis
pathways = tu.run({
    "name": "KEGG_pathway_enrichment",
    "arguments": {
        "gene_list": de_genes['significant_genes'],
        "organism": "hsa"
    }
})

# 4. Find relevant literature
papers = tu.run({
    "name": "PubMed_search",
    "arguments": {
        "query": f"{pathways[0]['pathway_name']} AND cancer",
        "max_results": 20
    }
})

# 5. Summarize findings
summary = tu.run({
    "name": "LLM_summarize",
    "arguments": {
        "text": papers,
        "focus": "therapeutic implications"
    }
})
```

### Clinical Genomics Workflow

```python
# 1. Load patient variants
variants = tu.run({
    "name": "VCF_parse",
    "arguments": {"vcf_file": "patient_001.vcf"}
})

# 2. Annotate variants
annotated = tu.run({
    "name": "VEP_annotate_variants",
    "arguments": {"variants": variants}
})

# 3. Filter pathogenic variants
pathogenic = tu.run({
    "name": "ClinVar_filter_pathogenic",
    "arguments": {"variants": annotated}
})

# 4. Find disease associations
diseases = tu.run({
    "name": "OMIM_disease_lookup",
    "arguments": {"genes": pathogenic['affected_genes']}
})

# 5. Generate clinical report
report = tu.run({
    "name": "Report_generator",
    "arguments": {
        "variants": pathogenic,
        "diseases": diseases,
        "format": "clinical"
    }
})
```

## Parallel Tool Execution

Execute multiple tools simultaneously when they don't depend on each other:

```python
import concurrent.futures

def run_tool(tu, tool_config):
    return tu.run(tool_config)

# Define independent tasks
tasks = [
    {"name": "PubMed_search", "arguments": {"query": "cancer", "max_results": 10}},
    {"name": "OpenTargets_get_diseases", "arguments": {"therapeutic_area": "oncology"}},
    {"name": "ChEMBL_search_compounds", "arguments": {"target": "EGFR"}}
]

# Execute in parallel; iterating over `futures` (rather than as_completed)
# keeps results in the same order as `tasks`
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(run_tool, tu, task) for task in tasks]
    results = [future.result() for future in futures]
```

## Output Processing Hooks

ToolUniverse supports post-processing hooks for:
- Summarization
- File saving
- Data transformation
- Visualization

```python
# Example: Save results to a file and produce a brief summary
result = tu.run({
    "name": "some_tool",
    "arguments": {"param": "value"}
},
hooks={
    "save_to_file": {"filename": "results.json"},
    "summarize": {"format": "brief"}
})
```

## Best Practices

1. **Error Handling**: Wrap each tool call in the workflow in a try-except block
2. **Data Validation**: Verify the output of each step before passing it to the next tool
3. **Checkpointing**: Save intermediate results for long workflows
4. **Logging**: Track progress through complex workflows
5. **Resource Management**: Consider rate limits and computational resources
6. **Modularity**: Break complex workflows into reusable functions
7. **Testing**: Test each step individually before composing the full workflow
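
Practices 1 and 3 can be combined in a small wrapper. A minimal sketch — the `run_with_checkpoint` helper and its on-disk layout are illustrative, not part of the ToolUniverse API:

```python
import json
from pathlib import Path

def run_with_checkpoint(step_name, fn, checkpoint_dir="checkpoints"):
    """Run fn() once and cache its JSON-serializable result on disk.

    On a re-run after a crash, the cached result is loaded instead of
    re-executing the (possibly expensive) tool call.
    """
    path = Path(checkpoint_dir) / f"{step_name}.json"
    if path.exists():
        return json.loads(path.read_text())
    result = fn()  # e.g. lambda: tu.run({...})
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result))
    return result
```

A workflow step then becomes `targets = run_with_checkpoint("targets", lambda: tu.run({...}))`, and re-running the script skips already-completed steps.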

---

`skills/tooluniverse/references/tool-discovery.md`

# Tool Discovery in ToolUniverse

## Overview

ToolUniverse provides multiple ways to discover and search its 600+ scientific tools using natural language, keywords, or embeddings.

## Discovery Methods

### 1. Tool_Finder (Embedding-Based Search)

Uses semantic embeddings to find relevant tools. **Requires a GPU** for optimal performance.

```python
from tooluniverse import ToolUniverse

tu = ToolUniverse()
tu.load_tools()

# Search by natural language description
tools = tu.run({
    "name": "Tool_Finder",
    "arguments": {
        "description": "protein structure prediction",
        "limit": 10
    }
})

print(tools)
```

**When to use:**
- Natural language queries
- Semantic similarity search
- When a GPU is available

### 2. Tool_Finder_LLM (LLM-Based Search)

An alternative to embedding-based search that uses LLM reasoning. **No GPU required.**

```python
tools = tu.run({
    "name": "Tool_Finder_LLM",
    "arguments": {
        "description": "Find tools for analyzing gene expression data",
        "limit": 10
    }
})
```

**When to use:**
- When a GPU is not available
- Complex queries requiring reasoning
- When semantic understanding is needed

### 3. Tool_Finder_Keyword (Keyword Search)

Fast keyword-based search through tool names and descriptions.

```python
tools = tu.run({
    "name": "Tool_Finder_Keyword",
    "arguments": {
        "description": "disease target associations",
        "limit": 10
    }
})
```

**When to use:**
- Fast searches
- Known keywords
- Exact term matching

## Listing Available Tools

### List All Tools
```python
all_tools = tu.list_tools()
print(f"Total tools available: {len(all_tools)}")
```

### List Tools with Limit
```python
tools = tu.list_tools(limit=20)
for tool in tools:
    print(f"{tool['name']}: {tool['description']}")
```

## Tool Information

### Get Tool Details
```python
# After finding a tool, inspect its details
tool_info = tu.get_tool_info("OpenTargets_get_associated_targets_by_disease_efoId")
print(tool_info)
```

## Search Strategies

### By Domain
Use domain-specific keywords:
- Bioinformatics: "sequence alignment", "genomics", "RNA-seq"
- Cheminformatics: "molecular dynamics", "drug design", "SMILES"
- Machine Learning: "classification", "prediction", "neural network"
- Structural Biology: "protein structure", "PDB", "crystallography"

### By Functionality
Search by what you want to accomplish:
- "Find disease-gene associations"
- "Predict protein interactions"
- "Analyze clinical trial data"
- "Generate molecular descriptors"

### By Data Source
Search for specific databases or APIs:
- "OpenTargets", "PubChem", "UniProt"
- "AlphaFold", "ChEMBL", "PDB"
- "KEGG", "Reactome", "STRING"
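
To build intuition for how keyword queries match tool metadata, here is a toy token-overlap ranker. This is purely illustrative — it is not ToolUniverse's actual `Tool_Finder_Keyword` implementation, and `keyword_score`/`rank_tools` are hypothetical names:

```python
def keyword_score(query, description):
    """Fraction of query tokens that appear in the description (case-insensitive)."""
    q = set(query.lower().split())
    d = set(description.lower().split())
    return len(q & d) / len(q) if q else 0.0

def rank_tools(query, tools, limit=10):
    """Rank (name, description) pairs by keyword overlap with the query."""
    scored = sorted(tools, key=lambda t: keyword_score(query, t[1]), reverse=True)
    return [name for name, _ in scored[:limit]]
```

The takeaway for query formulation: terms that literally appear in tool descriptions (database names, domain vocabulary) score well, which is why the domain, functionality, and data-source keywords above work.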

## Best Practices

1. **Start Broad**: Begin with general terms, then refine
2. **Use Multiple Methods**: Try a different discovery method if results aren't satisfactory
3. **Set Appropriate Limits**: Use the `limit` parameter to control result size (default: 10)
4. **Check Tool Descriptions**: Review returned tool descriptions to verify relevance
5. **Iterate**: Refine search terms based on initial results


---

`skills/tooluniverse/references/tool-execution.md`

# Tool Execution in ToolUniverse

## Overview

Execute individual tools through ToolUniverse's standardized interface using the `run()` method.

## Basic Tool Execution

### Standard Pattern
```python
from tooluniverse import ToolUniverse

tu = ToolUniverse()
tu.load_tools()

# Execute a tool
result = tu.run({
    "name": "tool_name_here",
    "arguments": {
        "param1": "value1",
        "param2": "value2"
    }
})

print(result)
```

## Real-World Examples

### Example 1: Disease-Target Associations (OpenTargets)
```python
# Find targets associated with hypertension
result = tu.run({
    "name": "OpenTargets_get_associated_targets_by_disease_efoId",
    "arguments": {
        "efoId": "EFO_0000537"  # Hypertension
    }
})

print(f"Found {len(result)} targets associated with hypertension")
```

### Example 2: Protein Structure Prediction
```python
# Get AlphaFold structure prediction
result = tu.run({
    "name": "AlphaFold_get_structure",
    "arguments": {
        "uniprot_id": "P12345"
    }
})
```

### Example 3: Chemical Property Calculation
```python
# Calculate molecular descriptors
result = tu.run({
    "name": "RDKit_calculate_descriptors",
    "arguments": {
        "smiles": "CCO"  # Ethanol
    }
})
```

### Example 4: Gene Expression Analysis
```python
# Analyze differential gene expression
result = tu.run({
    "name": "GeneExpression_differential_analysis",
    "arguments": {
        "dataset_id": "GSE12345",
        "condition1": "control",
        "condition2": "treatment"
    }
})
```

## Tool Execution Workflow

### 1. Discover the Tool
```python
# Find relevant tools
tools = tu.run({
    "name": "Tool_Finder_Keyword",
    "arguments": {
        "description": "pathway enrichment",
        "limit": 5
    }
})

# Review available tools
for tool in tools:
    print(f"Name: {tool['name']}")
    print(f"Description: {tool['description']}")
    print(f"Parameters: {tool['parameters']}")
    print("---")
```

### 2. Check Tool Parameters
```python
# Get detailed tool information
tool_info = tu.get_tool_info("KEGG_pathway_enrichment")
print(tool_info['parameters'])
```

### 3. Execute with Proper Arguments
```python
# Execute the tool
result = tu.run({
    "name": "KEGG_pathway_enrichment",
    "arguments": {
        "gene_list": ["TP53", "BRCA1", "EGFR"],
        "organism": "hsa"  # Homo sapiens
    }
})
```

## Handling Tool Results

### Check Result Type
```python
result = tu.run({
    "name": "some_tool",
    "arguments": {"param": "value"}
})

# Results can be various types
if isinstance(result, dict):
    print("Dictionary result")
elif isinstance(result, list):
    print(f"List with {len(result)} items")
elif isinstance(result, str):
    print("String result")
```

### Process Results
```python
# Example: Processing multiple results
results = tu.run({
    "name": "PubMed_search",
    "arguments": {
        "query": "cancer immunotherapy",
        "max_results": 10
    }
})

for idx, paper in enumerate(results, 1):
    print(f"{idx}. {paper['title']}")
    print(f"   PMID: {paper['pmid']}")
    print(f"   Authors: {', '.join(paper['authors'][:3])}")
    print()
```

## Error Handling

```python
try:
    result = tu.run({
        "name": "some_tool",
        "arguments": {"param": "value"}
    })
except Exception as e:
    print(f"Tool execution failed: {e}")
    # On failure: check that the tool exists,
    # verify parameter names and types,
    # and review the tool documentation
```
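
Transient failures (rate limits, timeouts) are often worth retrying before giving up. A minimal exponential-backoff sketch around any zero-argument call, such as `lambda: tu.run({...})` — the `run_with_retry` helper is illustrative, not part of the ToolUniverse API:

```python
import time

def run_with_retry(call, retries=3, base_delay=1.0, sleep=time.sleep):
    """Retry a zero-argument callable with exponential backoff.

    Delays grow as base_delay * 2**attempt; `sleep` is injectable so the
    policy can be tested without actually waiting.
    """
    last_err = None
    for attempt in range(retries):
        try:
            return call()
        except Exception as e:
            last_err = e
            sleep(base_delay * (2 ** attempt))
    raise last_err
```

For example, `run_with_retry(lambda: tu.run({"name": "PubMed_search", "arguments": {"query": "cancer"}}))` absorbs a couple of transient API errors before raising.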

## Best Practices

1. **Verify Tool Parameters**: Always check required parameters before execution
2. **Start Simple**: Test with simple cases before building complex workflows
3. **Handle Results Appropriately**: Check the result type and structure
4. **Error Recovery**: Implement try-except blocks in production code
5. **Documentation**: Review tool descriptions for parameter requirements and output formats
6. **Rate Limiting**: Be aware of API rate limits for remote tools
7. **Data Validation**: Validate input data formats (e.g., SMILES, UniProt IDs, gene symbols)
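
Practice 7 can be partly automated with cheap format checks before a tool call. A sketch — the UniProt accession regex follows the pattern published in the UniProt documentation, the EFO pattern matches IDs used in this document, and the SMILES check is only a character whitelist, not a real parse:

```python
import re

# UniProt accession pattern (per the UniProt help docs), a simple EFO
# identifier pattern, and a loose SMILES character whitelist.
_UNIPROT = re.compile(
    r"^([OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)
_EFO = re.compile(r"^EFO_\d{7}$")
_SMILES_CHARS = re.compile(r"^[A-Za-z0-9@+\-\[\]\(\)\\/=#$%.*:]+$")


def is_uniprot_id(s: str) -> bool:
    return bool(_UNIPROT.match(s))


def is_efo_id(s: str) -> bool:
    return bool(_EFO.match(s))


def looks_like_smiles(s: str) -> bool:
    # Rejects empty strings and strings with characters never found in SMILES;
    # a passing string is not guaranteed to be chemically valid.
    return bool(s) and bool(_SMILES_CHARS.match(s))
```

Rejecting a malformed ID before calling `tu.run` gives a clearer error than a failed remote lookup.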

---

`skills/tooluniverse/scripts/example_tool_search.py`

#!/usr/bin/env python3
"""
Example script demonstrating tool discovery in ToolUniverse.

This script shows how to search for tools using different methods:
- Embedding-based search (Tool_Finder)
- LLM-based search (Tool_Finder_LLM)
- Keyword-based search (Tool_Finder_Keyword)
"""

from tooluniverse import ToolUniverse


def main():
    # Initialize ToolUniverse
    print("Initializing ToolUniverse...")
    tu = ToolUniverse()
    tu.load_tools()
    print(f"Loaded {len(tu.list_tools())} tools\n")

    # Example 1: Keyword-based search (fastest)
    print("=" * 60)
    print("Example 1: Keyword Search for Disease-Target Tools")
    print("=" * 60)

    tools = tu.run({
        "name": "Tool_Finder_Keyword",
        "arguments": {
            "description": "disease target associations",
            "limit": 5
        }
    })

    print(f"Found {len(tools)} tools:")
    for idx, tool in enumerate(tools, 1):
        print(f"\n{idx}. {tool['name']}")
        print(f"   Description: {tool['description']}")

    # Example 2: LLM-based search (no GPU required)
    print("\n" + "=" * 60)
    print("Example 2: LLM Search for Protein Structure Tools")
    print("=" * 60)

    tools = tu.run({
        "name": "Tool_Finder_LLM",
        "arguments": {
            "description": "Find tools for predicting protein structures from sequences",
            "limit": 5
        }
    })

    print(f"Found {len(tools)} tools:")
    for idx, tool in enumerate(tools, 1):
        print(f"\n{idx}. {tool['name']}")
        print(f"   Description: {tool['description']}")

    # Example 3: Search by specific domain
    print("\n" + "=" * 60)
    print("Example 3: Search for Cheminformatics Tools")
    print("=" * 60)

    tools = tu.run({
        "name": "Tool_Finder_Keyword",
        "arguments": {
            "description": "molecular docking SMILES compound",
            "limit": 5
        }
    })

    print(f"Found {len(tools)} tools:")
    for idx, tool in enumerate(tools, 1):
        print(f"\n{idx}. {tool['name']}")
        print(f"   Description: {tool['description']}")

    # Example 4: Get detailed tool information
    print("\n" + "=" * 60)
    print("Example 4: Get Tool Details")
    print("=" * 60)

    if tools:
        tool_name = tools[0]['name']
        print(f"Getting details for: {tool_name}")

        tool_info = tu.get_tool_info(tool_name)
        print(f"\nTool: {tool_info['name']}")
        print(f"Description: {tool_info['description']}")
        print(f"Parameters: {tool_info.get('parameters', 'No parameters listed')}")


if __name__ == "__main__":
    main()


---

`skills/tooluniverse/scripts/example_workflow.py`

#!/usr/bin/env python3
"""
Example workflow demonstrating tool composition in ToolUniverse.

This script shows a complete drug discovery workflow:
1. Find disease-associated targets
2. Retrieve protein sequences
3. Get structure predictions
4. Screen compound libraries
5. Calculate drug-likeness properties
"""

from tooluniverse import ToolUniverse


def drug_discovery_workflow(disease_efo_id: str, max_targets: int = 3):
    """
    Execute a drug discovery workflow for a given disease.

    Args:
        disease_efo_id: EFO ID for the disease (e.g., "EFO_0000537" for hypertension)
        max_targets: Maximum number of targets to process
    """
    tu = ToolUniverse()
    tu.load_tools()

    print("=" * 70)
    print("DRUG DISCOVERY WORKFLOW")
    print("=" * 70)

    # Step 1: Find disease-associated targets
    print(f"\nStep 1: Finding targets for disease {disease_efo_id}...")
    targets = tu.run({
        "name": "OpenTargets_get_associated_targets_by_disease_efoId",
        "arguments": {"efoId": disease_efo_id}
    })
    print(f"✓ Found {len(targets)} disease-associated targets")

    # Process top targets
    top_targets = targets[:max_targets]
    print(f"  Processing top {len(top_targets)} targets:")
    for idx, target in enumerate(top_targets, 1):
        print(f"  {idx}. {target.get('target_name', 'Unknown')} ({target.get('uniprot_id', 'N/A')})")

    # Step 2: Get protein sequences
    print("\nStep 2: Retrieving protein sequences...")
    sequences = []
    for target in top_targets:
        try:
            seq = tu.run({
                "name": "UniProt_get_sequence",
                "arguments": {"uniprot_id": target['uniprot_id']}
            })
            sequences.append({
                "target": target,
                "sequence": seq
            })
            print(f"  ✓ Retrieved sequence for {target.get('target_name', 'Unknown')}")
        except Exception as e:
            print(f"  ✗ Failed to get sequence: {e}")

    # Step 3: Predict protein structures
    print("\nStep 3: Predicting protein structures...")
    structures = []
    for seq_data in sequences:
        try:
            structure = tu.run({
                "name": "AlphaFold_get_structure",
                "arguments": {"uniprot_id": seq_data['target']['uniprot_id']}
            })
            structures.append({
                "target": seq_data['target'],
                "structure": structure
            })
            print(f"  ✓ Predicted structure for {seq_data['target'].get('target_name', 'Unknown')}")
        except Exception as e:
            print(f"  ✗ Failed to predict structure: {e}")

    # Step 4: Find binding sites
    print("\nStep 4: Identifying binding sites...")
    binding_sites = []
    for struct_data in structures:
        try:
            sites = tu.run({
                "name": "Fpocket_find_binding_sites",
                "arguments": {"structure": struct_data['structure']}
            })
            binding_sites.append({
                "target": struct_data['target'],
                "sites": sites
            })
            print(f"  ✓ Found {len(sites)} binding sites for {struct_data['target'].get('target_name', 'Unknown')}")
        except Exception as e:
            print(f"  ✗ Failed to find binding sites: {e}")

    # Step 5: Virtual screening (simplified)
    print("\nStep 5: Screening compound libraries...")
    all_hits = []
    for site_data in binding_sites:
        for site in site_data['sites'][:1]:  # Top site only
            try:
                compounds = tu.run({
                    "name": "ZINC_virtual_screening",
                    "arguments": {
                        "binding_site": site,
                        "library": "lead-like",
                        "top_n": 10
                    }
                })
                all_hits.extend(compounds)
                print(f"  ✓ Found {len(compounds)} hit compounds for {site_data['target'].get('target_name', 'Unknown')}")
            except Exception as e:
                print(f"  ✗ Screening failed: {e}")

    # Step 6: Calculate drug-likeness
    print("\nStep 6: Evaluating drug-likeness...")
    drug_candidates = []
    for compound in all_hits:
        try:
            properties = tu.run({
                "name": "RDKit_calculate_drug_properties",
                "arguments": {"smiles": compound['smiles']}
            })

            if properties.get('lipinski_pass', False):
                drug_candidates.append({
                    "compound": compound,
                    "properties": properties
                })
        except Exception as e:
            print(f"  ✗ Property calculation failed: {e}")

    print(f"\n✓ Identified {len(drug_candidates)} drug candidates passing Lipinski's Rule of Five")

    # Summary
    print("\n" + "=" * 70)
    print("WORKFLOW SUMMARY")
    print("=" * 70)
    print(f"Disease targets processed: {len(top_targets)}")
    print(f"Protein structures predicted: {len(structures)}")
    print(f"Binding sites identified: {sum(len(s['sites']) for s in binding_sites)}")
    print(f"Compounds screened: {len(all_hits)}")
    print(f"Drug candidates identified: {len(drug_candidates)}")
    print("=" * 70)

    return drug_candidates


def genomics_workflow(geo_id: str):
    """
    Execute a genomics analysis workflow.

    Args:
        geo_id: GEO dataset ID (e.g., "GSE12345")
    """
    tu = ToolUniverse()
    tu.load_tools()

    print("=" * 70)
    print("GENOMICS ANALYSIS WORKFLOW")
    print("=" * 70)

    # Step 1: Download gene expression data
    print(f"\nStep 1: Downloading dataset {geo_id}...")
    try:
        expression_data = tu.run({
            "name": "GEO_download_dataset",
            "arguments": {"geo_id": geo_id}
        })
        print("  ✓ Downloaded expression data")
    except Exception as e:
        print(f"  ✗ Failed: {e}")
        return

    # Step 2: Differential expression analysis
    print("\nStep 2: Performing differential expression analysis...")
    try:
        de_genes = tu.run({
            "name": "DESeq2_differential_expression",
            "arguments": {
                "data": expression_data,
                "condition1": "control",
                "condition2": "treated"
            }
        })
        print(f"  ✓ Found {len(de_genes.get('significant_genes', []))} differentially expressed genes")
    except Exception as e:
        print(f"  ✗ Failed: {e}")
        return

    # Step 3: Pathway enrichment
    print("\nStep 3: Running pathway enrichment analysis...")
    try:
        pathways = tu.run({
            "name": "KEGG_pathway_enrichment",
            "arguments": {
                "gene_list": de_genes['significant_genes'],
                "organism": "hsa"
            }
        })
        print(f"  ✓ Found {len(pathways)} enriched pathways")
        if pathways:
            print(f"  Top pathway: {pathways[0].get('pathway_name', 'Unknown')}")
    except Exception as e:
        print(f"  ✗ Failed: {e}")

    print("\n" + "=" * 70)


if __name__ == "__main__":
    # Example 1: Drug discovery workflow for hypertension
    print("EXAMPLE 1: Drug Discovery for Hypertension")
    candidates = drug_discovery_workflow("EFO_0000537", max_targets=2)

    print("\n\n")

    # Example 2: Genomics workflow
    print("EXAMPLE 2: Genomics Analysis")
    genomics_workflow("GSE12345")