Initial commit
471
skills/denario/references/research_pipeline.md
Normal file
@@ -0,0 +1,471 @@
# Research Pipeline API Reference

## Core Classes

### Denario

The main class for orchestrating research workflows.

#### Initialization

```python
from denario import Denario

den = Denario(project_dir="path/to/project")
```

**Parameters:**
- `project_dir` (str): Path to the research project directory where all outputs will be stored

#### Methods

##### set_data_description()

Define the research context by describing available data and analytical tools.

```python
den.set_data_description(description: str)
```

**Parameters:**
- `description` (str): Text describing the dataset, available tools, research domain, and any relevant context

**Example:**
```python
den.set_data_description("""
Available data: Time-series temperature measurements from 2010-2023
Tools: pandas, scipy, sklearn, matplotlib
Domain: Climate science
Research interest: Identifying seasonal patterns and long-term trends
""")
```

**Purpose:** This establishes the foundation for automated idea generation by providing context about what data is available and what analyses are feasible.
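
One way to keep descriptions consistent across projects is to assemble them from named fields. The sketch below uses a hypothetical helper, `build_data_description`, which is not part of denario; it only produces text in the same layout as the example above, which could then be passed to `den.set_data_description()`.

```python
def build_data_description(data, tools, domain, interest):
    """Assemble a description string (hypothetical helper, not a denario API)."""
    lines = [
        f"Available data: {data}",
        f"Tools: {', '.join(tools)}",
        f"Domain: {domain}",
        f"Research interest: {interest}",
    ]
    return "\n".join(lines)


description = build_data_description(
    data="Time-series temperature measurements from 2010-2023",
    tools=["pandas", "scipy", "sklearn", "matplotlib"],
    domain="Climate science",
    interest="Identifying seasonal patterns and long-term trends",
)
print(description)
```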

##### get_idea()

Generate research hypotheses based on the data description.

```python
den.get_idea()
```

**Returns:** Research idea/hypothesis (stored internally in project directory)

**Output:** Creates a file containing the generated research question or hypothesis

**Example:**
```python
den.get_idea()
# Generates ideas like: "Investigate the correlation between seasonal temperature
# variations and long-term warming trends using time-series decomposition"
```

##### set_idea()

Manually specify a research idea instead of generating one.

```python
den.set_idea(idea: str)
```

**Parameters:**
- `idea` (str): The research hypothesis or question to investigate

**Example:**
```python
den.set_idea("Analyze the impact of El Niño events on regional temperature anomalies")
```

**Use case:** When you have a specific research direction and want to skip automated idea generation.

##### get_method()

Develop a research methodology based on the idea and data description.

```python
den.get_method()
```

**Returns:** Methodology document (stored internally in project directory)

**Output:** Creates a structured methodology including:
- Analytical approach
- Statistical methods to apply
- Validation strategies
- Expected outputs

**Example:**
```python
den.get_method()
# Generates methodology: "Apply seasonal decomposition, compute correlation coefficients,
# perform statistical significance tests, generate visualization plots..."
```

##### set_method()

Provide a custom methodology instead of generating one.

```python
den.set_method(method: str)
den.set_method(method: Path)  # Can also accept file paths
```

**Parameters:**
- `method` (str or Path): Methodology description or path to markdown file containing methodology

**Example:**
```python
# From string
den.set_method("""
1. Apply seasonal decomposition using STL
2. Compute Pearson correlation coefficients
3. Perform Mann-Kendall trend test
4. Generate time-series plots with confidence intervals
""")

# From file
den.set_method("methodology.md")
```
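
The dual `str`/`Path` signature suggests that the argument is resolved to text before use. How denario does this internally is not documented here, so the following is only a sketch of one plausible convention. The helper `resolve_text` is hypothetical: it reads the file when given a `Path` and returns a plain string unchanged.

```python
from pathlib import Path
import tempfile


def resolve_text(value):
    """Return file contents for a Path, or the string itself (hypothetical helper)."""
    if isinstance(value, Path):
        return value.read_text(encoding="utf-8")
    return value


# Demonstrate both input forms with a temporary methodology file.
with tempfile.TemporaryDirectory() as tmp:
    method_file = Path(tmp) / "methodology.md"
    method_file.write_text("1. Apply seasonal decomposition using STL\n", encoding="utf-8")

    from_file = resolve_text(method_file)
    from_string = resolve_text("1. Apply seasonal decomposition using STL\n")

print(from_file == from_string)
```

In your own wrapper code, an explicit `Path` object removes any ambiguity about whether a short string is a file name or the methodology text itself.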

##### get_results()

Execute the methodology, perform computations, and generate results.

```python
den.get_results()
```

**Returns:** Results document with analysis outputs (stored internally in project directory)

**Output:** Creates results including:
- Computed statistics
- Generated figures and visualizations
- Data tables
- Analysis findings

**Example:**
```python
den.get_results()
# Executes the methodology, runs analyses, creates plots, compiles findings
```

**Note:** This is where the actual computational work happens. The agent executes code to perform the analyses specified in the methodology.

##### set_results()

Provide pre-computed results instead of generating them.

```python
den.set_results(results: str)
den.set_results(results: Path)  # Can also accept file paths
```

**Parameters:**
- `results` (str or Path): Results description or path to markdown file containing results

**Example:**
```python
# From string
den.set_results("""
Analysis Results:
- Correlation coefficient: 0.78 (p < 0.001)
- Seasonal amplitude: 5.2°C
- Long-term trend: +0.15°C per decade
- Figure 1: Seasonal decomposition (see attached)
""")

# From file
den.set_results("results.md")
```

**Use case:** When analyses were performed externally or when iterating on paper writing without re-running computations.
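
When the numbers come from an external analysis, it may help to render them into the same bullet layout before handing them over. The sketch below assumes the metrics are already available as a dictionary. `format_results_markdown` is a hypothetical helper, not a denario function, and the values are copied from the example above.

```python
def format_results_markdown(metrics, figures=()):
    """Render metrics and figure captions as a markdown results block (hypothetical helper)."""
    lines = ["Analysis Results:"]
    for name, value in metrics.items():
        lines.append(f"- {name}: {value}")
    for number, caption in enumerate(figures, start=1):
        lines.append(f"- Figure {number}: {caption}")
    return "\n".join(lines) + "\n"


results_text = format_results_markdown(
    {
        "Correlation coefficient": "0.78 (p < 0.001)",
        "Seasonal amplitude": "5.2°C",
        "Long-term trend": "+0.15°C per decade",
    },
    figures=["Seasonal decomposition"],
)
print(results_text)
```

The resulting string could be passed to `den.set_results()` directly or written to `results.md` first.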

##### get_paper()

Generate a publication-ready LaTeX paper with the research findings.

```python
den.get_paper(journal: Journal = None)
```

**Parameters:**
- `journal` (Journal, optional): Target journal for formatting. Defaults to generic format.

**Returns:** LaTeX paper with proper formatting (stored in project directory)

**Output:** Creates:
- Complete LaTeX source file
- Compiled PDF (if LaTeX is available)
- Integrated figures and tables
- Properly formatted bibliography

**Example:**
```python
from denario import Journal

den.get_paper(journal=Journal.APS)
# Generates paper.tex and paper.pdf formatted for APS journals
```

### Journal Enum

Enumeration of supported journal formats.

```python
from denario import Journal
```

#### Available Journals

- `Journal.APS` - American Physical Society format
  - Suitable for Physical Review, Physical Review Letters, etc.
  - Uses RevTeX document class

Additional journal formats may be available. Check the latest denario documentation for the complete list.
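
Assuming `Journal` is a standard Python `Enum` (the section title suggests so, but this reference does not show its definition), its members can be listed by iterating over the class. The stand-in `JournalSketch` below is hypothetical and contains only the one member documented here, with an invented value. With denario installed, the same helper applied to `Journal` should show what your version supports.

```python
from enum import Enum


class JournalSketch(Enum):
    """Hypothetical stand-in for denario's Journal; the member value is invented."""
    APS = "aps"


def available_formats(journal_enum):
    """Return the member names of any Enum class."""
    return [member.name for member in journal_enum]


print(available_formats(JournalSketch))
# Members can also be looked up by name, which helps when the choice comes from a config file.
print(JournalSketch["APS"] is JournalSketch.APS)
```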

#### Usage

```python
from denario import Denario, Journal

den = Denario(project_dir="./research")
# ... complete workflow ...
den.get_paper(journal=Journal.APS)
```

## Workflow Patterns

### Fully Automated Pipeline

Let denario handle every stage:

```python
from denario import Denario, Journal

den = Denario(project_dir="./automated_research")

# Define context
den.set_data_description("""
Dataset: Sensor readings from IoT devices
Tools: pandas, numpy, sklearn, matplotlib
Goal: Anomaly detection in sensor networks
""")

# Automate entire pipeline
den.get_idea()        # Generate research idea
den.get_method()      # Develop methodology
den.get_results()     # Execute analysis
den.get_paper(journal=Journal.APS)  # Create paper
```

### Custom Idea, Automated Execution

Provide your research question and automate the rest:

```python
from denario import Denario, Journal

den = Denario(project_dir="./custom_idea")

den.set_data_description("Dataset: Financial time-series data...")

# Manual idea
den.set_idea("Investigate predictive models for stock market volatility using LSTM networks")

# Automated execution
den.get_method()
den.get_results()
den.get_paper(journal=Journal.APS)
```

### Fully Manual with Template Generation

Use denario only for paper formatting:

```python
from denario import Denario, Journal

den = Denario(project_dir="./manual_research")

# Provide everything manually
den.set_data_description("Pre-existing dataset description...")
den.set_idea("Pre-defined research hypothesis")
den.set_method("methodology.md")  # Load from file
den.set_results("results.md")      # Load from file

# Generate formatted paper
den.get_paper(journal=Journal.APS)
```

### Iterative Refinement

Refine specific stages without re-running everything:

```python
from denario import Denario, Journal

den = Denario(project_dir="./iterative")

# Initial run
den.set_data_description("Dataset description...")
den.get_idea()
den.get_method()
den.get_results()

# Refine methodology after reviewing results
den.set_method("""
Revised methodology:
- Use a different statistical test
- Add sensitivity analysis
- Include cross-validation
""")

# Re-run only downstream stages
den.get_results()  # Re-execute with new method
den.get_paper(journal=Journal.APS)
```
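
The rule behind this pattern is that replacing one stage invalidates only the stages after it. A minimal sketch of that rule follows. The stage names and both helpers are illustrative and not part of denario, and the linear order is an assumption drawn from the workflows in this reference.

```python
# Assumed linear stage order: each stage builds on the ones before it.
STAGES = ["data_description", "idea", "method", "results", "paper"]


def downstream_stages(changed):
    """Return the stages that need re-running after `changed` is replaced."""
    index = STAGES.index(changed)  # raises ValueError for an unknown stage name
    return STAGES[index + 1:]


def upstream_stages(target):
    """Return the stages that must exist before `target` can run."""
    return STAGES[:STAGES.index(target)]


print(downstream_stages("method"))
print(upstream_stages("results"))
```

For a changed `method`, the first call yields `results` and `paper`, matching the two calls re-run in the example above.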

## Project Directory Structure

After running a complete workflow, the project directory contains:

```
project_dir/
├── data_description.txt    # Input: data context
├── idea.md                 # Generated or provided research idea
├── methodology.md          # Generated or provided methodology
├── results.md              # Generated or provided results
├── figures/                # Generated visualizations
│   ├── figure_1.png
│   ├── figure_2.png
│   └── ...
├── paper.tex               # Generated LaTeX source
├── paper.pdf               # Compiled PDF (if LaTeX available)
└── logs/                   # Agent execution logs
    └── ...
```
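
To see at a glance which stages have produced output, the layout above can be checked with a few lines of standard-library Python. This is a sketch only: `summarize_project` is a hypothetical helper, and the file names are taken from the tree above, so they may need adjusting if your denario version writes different names.

```python
from pathlib import Path
import tempfile

# File names taken from the layout above; adjust if your denario version differs.
EXPECTED_OUTPUTS = [
    "data_description.txt",
    "idea.md",
    "methodology.md",
    "results.md",
    "paper.tex",
    "paper.pdf",
]


def summarize_project(project_dir):
    """Map each expected output file to whether it exists in project_dir."""
    root = Path(project_dir)
    return {name: (root / name).is_file() for name in EXPECTED_OUTPUTS}


# Demo on a temporary directory holding only the first two stage outputs.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "data_description.txt").write_text("Dataset description...", encoding="utf-8")
    (Path(tmp) / "idea.md").write_text("Research idea...", encoding="utf-8")
    status = summarize_project(tmp)

for name, present in status.items():
    print(f"{name}: {'found' if present else 'missing'}")
```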

## Advanced Features

### Multiagent Orchestration

Denario uses the AG2 and LangGraph frameworks to coordinate multiple specialized agents:

- **Idea Agent**: Generates research hypotheses from data descriptions
- **Method Agent**: Develops analytical methodologies
- **Execution Agent**: Runs computations and creates visualizations
- **Writing Agent**: Produces publication-ready manuscripts

These agents collaborate automatically, with each stage building on previous outputs.
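
The idea of stages building on earlier outputs can be illustrated without any agent framework. The sketch below is purely conceptual: it does not use AG2 or LangGraph and says nothing about denario's internals. Plain functions stand in for agents, and all names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PipelineState:
    """Shared state passed between stages (illustrative only)."""
    data_description: str
    outputs: dict = field(default_factory=dict)


def idea_stage(state):
    # Reads the data description and records an idea.
    state.outputs["idea"] = f"Idea based on: {state.data_description}"


def method_stage(state):
    # Reads the idea written by the previous stage.
    state.outputs["method"] = f"Method for: {state.outputs['idea']}"


def run_pipeline(state, stages):
    """Run stages in order; each one reads what earlier stages wrote."""
    for stage in stages:
        stage(state)
    return state


state = run_pipeline(
    PipelineState("Sensor readings from IoT devices"),
    [idea_stage, method_stage],
)
print(state.outputs["method"])
```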

### Integration with Scientific Tools

Denario integrates with common scientific Python libraries:

- **pandas**: Data manipulation and analysis
- **scikit-learn**: Machine learning algorithms
- **scipy**: Scientific computing and statistics
- **matplotlib/seaborn**: Visualization
- **numpy**: Numerical operations

When generating results, denario can automatically write and execute code using these libraries.

### Reproducibility

All stages produce structured outputs saved to the project directory:

- Version-control friendly (markdown and LaTeX)
- Auditable (logs of agent decisions and code execution)
- Reproducible (saved methodologies can be re-run)

### Literature Search

Denario can run literature searches to provide context for research ideas and methodology development. See `examples.md` for literature search workflows.

## Error Handling

### Common Issues

**Missing data description:**
```python
from denario import Denario

den = Denario(project_dir="./project")
den.get_idea()  # Error: must call set_data_description() first
```

**Solution:** Always set the data description before generating ideas.

**Missing prerequisite stages:**
```python
from denario import Denario

den = Denario(project_dir="./project")
den.get_results()  # Error: must have idea and method first
```

**Solution:** Follow the workflow order or manually set prerequisite stages.

**LaTeX compilation errors:**
```python
den.get_paper()  # May fail if LaTeX is not installed
```

**Solution:** Install a LaTeX distribution or use a Docker image with LaTeX pre-installed.
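
A quick pre-flight check can show whether a LaTeX compiler is on the `PATH` before the paper stage runs. The sketch below uses only the standard library. Which compiler denario actually invokes is not stated in this reference, so the candidate list is an assumption.

```python
import shutil


def find_latex_compiler(candidates=("pdflatex", "xelatex", "lualatex")):
    """Return the path of the first LaTeX compiler found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


compiler = find_latex_compiler()
if compiler is None:
    print("No LaTeX compiler found on PATH; PDF compilation will likely fail.")
else:
    print(f"LaTeX compiler found: {compiler}")
```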

## Best Practices

### Data Description Quality

Provide detailed context for better idea generation:

```python
# Good: Detailed and specific
den.set_data_description("""
Dataset: 10 years of daily temperature readings from 50 weather stations
Format: CSV with columns [date, station_id, temperature, humidity]
Tools available: pandas, scipy, sklearn, matplotlib, seaborn
Domain: Climatology
Research interests: Climate change, seasonal patterns, regional variations
Known challenges: Missing data in 2015, station 23 has calibration issues
""")

# Bad: Too vague
den.set_data_description("Temperature data from weather stations")
```
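
A simple check can flag descriptions that omit the labelled fields used in the "Good" example. The sketch below is a hypothetical heuristic, not a denario feature. It only looks for `Label:` prefixes, and the list of recommended labels is an assumption based on that example.

```python
# Field labels taken from the "Good" example above; the check is a hypothetical heuristic.
RECOMMENDED_FIELDS = ["Dataset", "Format", "Tools available", "Domain", "Research interests"]


def missing_fields(description):
    """Return recommended labels that never appear as a `Label:` prefix on a line."""
    labels = {
        line.split(":", 1)[0].strip()
        for line in description.splitlines()
        if ":" in line
    }
    return [name for name in RECOMMENDED_FIELDS if name not in labels]


vague = "Temperature data from weather stations"
detailed = """
Dataset: 10 years of daily temperature readings from 50 weather stations
Format: CSV with columns [date, station_id, temperature, humidity]
Tools available: pandas, scipy, sklearn, matplotlib, seaborn
Domain: Climatology
Research interests: Climate change, seasonal patterns, regional variations
"""

print(missing_fields(vague))
print(missing_fields(detailed))
```

The vague description is reported as missing every recommended field, while the detailed one passes with an empty list.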

### Methodology Validation

Review generated methodologies before executing:

```python
den.get_method()
# Review the methodology.md file in project_dir
# If needed, refine with set_method()
```

### Incremental Development

Build the research pipeline incrementally:

```python
from denario import Denario, Journal

den = Denario(project_dir="./project")

# Stage 1: Validate idea generation
den.set_data_description("...")
den.get_idea()
# Review idea.md, adjust if needed

# Stage 2: Validate methodology
den.get_method()
# Review methodology.md, adjust if needed

# Stage 3: Execute and validate results
den.get_results()
# Review results.md and figures/

# Stage 4: Generate paper
den.get_paper(journal=Journal.APS)
```

### Version Control Integration

Initialize git in the project directory to track changes:

```bash
cd project_dir
git init
git add .
git commit -m "Initial research workflow"
```

Commit after each stage to track the evolution of your research.
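
Per-stage commits can also be scripted from the same Python session that drives the workflow. The following is a minimal sketch using `subprocess`. The helper names and the commit message format are hypothetical, and `commit_stage` assumes git is installed and the project directory is already an initialized repository.

```python
import subprocess


def commit_commands(stage):
    """Build the git commands for snapshotting the project after a stage."""
    return [
        ["git", "add", "."],
        ["git", "commit", "-m", f"Complete stage: {stage}"],
    ]


def commit_stage(project_dir, stage):
    """Run the commands inside project_dir; raises CalledProcessError on failure."""
    for command in commit_commands(stage):
        subprocess.run(command, cwd=project_dir, check=True)


# Show what would be run after the idea stage, without touching any repository.
for command in commit_commands("idea"):
    print(" ".join(command))
```

Note that `git commit` exits with a non-zero status when there is nothing to commit, so `commit_stage` raises in that case. Wrap the call in a `try`/`except` if that should be tolerated.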