Files
2025-11-30 08:30:10 +08:00

472 lines
12 KiB
Markdown

# Research Pipeline API Reference
## Core Classes
### Denario
The main class for orchestrating research workflows.
#### Initialization
```python
from denario import Denario
den = Denario(project_dir="path/to/project")
```
**Parameters:**
- `project_dir` (str): Path to the research project directory where all outputs will be stored
#### Methods
##### set_data_description()
Define the research context by describing available data and analytical tools.
```python
den.set_data_description(description: str)
```
**Parameters:**
- `description` (str): Text describing the dataset, available tools, research domain, and any relevant context
**Example:**
```python
den.set_data_description("""
Available data: Time-series temperature measurements from 2010-2023
Tools: pandas, scipy, sklearn, matplotlib
Domain: Climate science
Research interest: Identifying seasonal patterns and long-term trends
""")
```
**Purpose:** This establishes the foundation for automated idea generation by providing context about what data is available and what analyses are feasible.
##### get_idea()
Generate research hypotheses based on the data description.
```python
den.get_idea()
```
**Returns:** Research idea/hypothesis (stored internally in project directory)
**Output:** Creates a file containing the generated research question or hypothesis
**Example:**
```python
den.get_idea()
# Generates ideas like: "Investigate the correlation between seasonal temperature
# variations and long-term warming trends using time-series decomposition"
```
##### set_idea()
Manually specify a research idea instead of generating one.
```python
den.set_idea(idea: str)
```
**Parameters:**
- `idea` (str): The research hypothesis or question to investigate
**Example:**
```python
den.set_idea("Analyze the impact of El Niño events on regional temperature anomalies")
```
**Use case:** When you have a specific research direction and want to skip automated idea generation.
##### get_method()
Develop a research methodology based on the idea and data description.
```python
den.get_method()
```
**Returns:** Methodology document (stored internally in project directory)
**Output:** Creates a structured methodology including:
- Analytical approach
- Statistical methods to apply
- Validation strategies
- Expected outputs
**Example:**
```python
den.get_method()
# Generates methodology: "Apply seasonal decomposition, compute correlation coefficients,
# perform statistical significance tests, generate visualization plots..."
```
##### set_method()
Provide a custom methodology instead of generating one.
```python
den.set_method(method: str)
den.set_method(method: Path) # Can also accept file paths
```
**Parameters:**
- `method` (str or Path): Methodology description or path to markdown file containing methodology
**Example:**
```python
# From string
den.set_method("""
1. Apply seasonal decomposition using STL
2. Compute Pearson correlation coefficients
3. Perform Mann-Kendall trend test
4. Generate time-series plots with confidence intervals
""")
# From file
den.set_method("methodology.md")
```
##### get_results()
Execute the methodology, perform computations, and generate results.
```python
den.get_results()
```
**Returns:** Results document with analysis outputs (stored internally in project directory)
**Output:** Creates results including:
- Computed statistics
- Generated figures and visualizations
- Data tables
- Analysis findings
**Example:**
```python
den.get_results()
# Executes the methodology, runs analyses, creates plots, compiles findings
```
**Note:** This is where the actual computational work happens. The agent executes code to perform the analyses specified in the methodology.
##### set_results()
Provide pre-computed results instead of generating them.
```python
den.set_results(results: str)
den.set_results(results: Path) # Can also accept file paths
```
**Parameters:**
- `results` (str or Path): Results description or path to markdown file containing results
**Example:**
```python
# From string
den.set_results("""
Analysis Results:
- Correlation coefficient: 0.78 (p < 0.001)
- Seasonal amplitude: 5.2°C
- Long-term trend: +0.15°C per decade
- Figure 1: Seasonal decomposition (see attached)
""")
# From file
den.set_results("results.md")
```
**Use case:** When analyses were performed externally or when iterating on paper writing without re-running computations.
##### get_paper()
Generate a publication-ready LaTeX paper with the research findings.
```python
den.get_paper(journal: Journal = None)
```
**Parameters:**
- `journal` (Journal, optional): Target journal for formatting. Defaults to generic format.
**Returns:** LaTeX paper with proper formatting (stored in project directory)
**Output:** Creates:
- Complete LaTeX source file
- Compiled PDF (if LaTeX is available)
- Integrated figures and tables
- Properly formatted bibliography
**Example:**
```python
from denario import Journal
den.get_paper(journal=Journal.APS)
# Generates paper.tex and paper.pdf formatted for APS journals
```
### Journal Enum
Enumeration of supported journal formats.
```python
from denario import Journal
```
#### Available Journals
- `Journal.APS` - American Physical Society format
- Suitable for Physical Review, Physical Review Letters, etc.
- Uses RevTeX document class
Additional journal formats may be available. Check the latest denario documentation for the complete list.
#### Usage
```python
from denario import Denario, Journal
den = Denario(project_dir="./research")
# ... complete workflow ...
den.get_paper(journal=Journal.APS)
```
## Workflow Patterns
### Fully Automated Pipeline
Let denario handle every stage:
```python
from denario import Denario, Journal
den = Denario(project_dir="./automated_research")
# Define context
den.set_data_description("""
Dataset: Sensor readings from IoT devices
Tools: pandas, numpy, sklearn, matplotlib
Goal: Anomaly detection in sensor networks
""")
# Automate entire pipeline
den.get_idea() # Generate research idea
den.get_method() # Develop methodology
den.get_results() # Execute analysis
den.get_paper(journal=Journal.APS) # Create paper
```
### Custom Idea, Automated Execution
Provide your research question, automate the rest:
```python
den = Denario(project_dir="./custom_idea")
den.set_data_description("Dataset: Financial time-series data...")
# Manual idea
den.set_idea("Investigate predictive models for stock market volatility using LSTM networks")
# Automated execution
den.get_method()
den.get_results()
den.get_paper(journal=Journal.APS)
```
### Fully Manual with Template Generation
Use denario only for paper formatting:
```python
den = Denario(project_dir="./manual_research")
# Provide everything manually
den.set_data_description("Pre-existing dataset description...")
den.set_idea("Pre-defined research hypothesis")
den.set_method("methodology.md") # Load from file
den.set_results("results.md") # Load from file
# Generate formatted paper
den.get_paper(journal=Journal.APS)
```
### Iterative Refinement
Refine specific stages without re-running everything:
```python
den = Denario(project_dir="./iterative")
# Initial run
den.set_data_description("Dataset description...")
den.get_idea()
den.get_method()
den.get_results()
# Refine methodology after reviewing results
den.set_method("""
Revised methodology:
- Use different statistical test
- Add sensitivity analysis
- Include cross-validation
""")
# Re-run only downstream stages
den.get_results() # Re-execute with new method
den.get_paper(journal=Journal.APS)
```
## Project Directory Structure
After running a complete workflow, the project directory contains:
```
project_dir/
├── data_description.txt # Input: data context
├── idea.md # Generated or provided research idea
├── methodology.md # Generated or provided methodology
├── results.md # Generated or provided results
├── figures/ # Generated visualizations
│ ├── figure_1.png
│ ├── figure_2.png
│ └── ...
├── paper.tex # Generated LaTeX source
├── paper.pdf # Compiled PDF (if LaTeX available)
└── logs/ # Agent execution logs
└── ...
```
## Advanced Features
### Multiagent Orchestration
Denario uses AG2 and LangGraph frameworks to coordinate multiple specialized agents:
- **Idea Agent**: Generates research hypotheses from data descriptions
- **Method Agent**: Develops analytical methodologies
- **Execution Agent**: Runs computations and creates visualizations
- **Writing Agent**: Produces publication-ready manuscripts
These agents collaborate automatically, with each stage building on previous outputs.
### Integration with Scientific Tools
Denario integrates with common scientific Python libraries:
- **pandas**: Data manipulation and analysis
- **scikit-learn**: Machine learning algorithms
- **scipy**: Scientific computing and statistics
- **matplotlib/seaborn**: Visualization
- **numpy**: Numerical operations
When generating results, denario can automatically write and execute code using these libraries.
### Reproducibility
All stages produce structured outputs saved to the project directory:
- Version control friendly (markdown and LaTeX)
- Auditable (logs of agent decisions and code execution)
- Reproducible (saved methodologies can be re-run)
### Literature Search
Denario includes capabilities for literature searches to provide context for research ideas and methodology development. See `examples.md` for literature search workflows.
## Error Handling
### Common Issues
**Missing data description:**
```python
den = Denario(project_dir="./project")
den.get_idea() # Error: must call set_data_description() first
```
**Solution:** Always set data description before generating ideas.
**Missing prerequisite stages:**
```python
den = Denario(project_dir="./project")
den.get_results() # Error: must have idea and method first
```
**Solution:** Follow the workflow order or manually set prerequisite stages.
**LaTeX compilation errors:**
```python
den.get_paper() # May fail if LaTeX not installed
```
**Solution:** Install LaTeX distribution or use Docker image with pre-installed LaTeX.
## Best Practices
### Data Description Quality
Provide detailed context for better idea generation:
```python
# Good: Detailed and specific
den.set_data_description("""
Dataset: 10 years of daily temperature readings from 50 weather stations
Format: CSV with columns [date, station_id, temperature, humidity]
Tools available: pandas, scipy, sklearn, matplotlib, seaborn
Domain: Climatology
Research interests: Climate change, seasonal patterns, regional variations
Known challenges: Missing data in 2015, station 23 has calibration issues
""")
# Bad: Too vague
den.set_data_description("Temperature data from weather stations")
```
### Methodology Validation
Review generated methodologies before executing:
```python
den.get_method()
# Review the methodology.md file in project_dir
# If needed, refine with set_method()
```
### Incremental Development
Build the research pipeline incrementally:
```python
# Stage 1: Validate idea generation
den.set_data_description("...")
den.get_idea()
# Review idea.md, adjust if needed
# Stage 2: Validate methodology
den.get_method()
# Review methodology.md, adjust if needed
# Stage 3: Execute and validate results
den.get_results()
# Review results.md and figures/
# Stage 4: Generate paper
den.get_paper(journal=Journal.APS)
```
### Version Control Integration
Initialize git in project directory for tracking:
```bash
cd project_dir
git init
git add .
git commit -m "Initial research workflow"
```
Commit after each stage to track the evolution of your research.