# Research Pipeline API Reference

## Core Classes

### Denario

The main class for orchestrating research workflows.

#### Initialization

```python
from denario import Denario

den = Denario(project_dir="path/to/project")
```

**Parameters:**
- `project_dir` (str): Path to the research project directory where all outputs will be stored

#### Methods

##### set_data_description()

Define the research context by describing available data and analytical tools.

```python
den.set_data_description(description: str)
```

**Parameters:**
- `description` (str): Text describing the dataset, available tools, research domain, and any relevant context

**Example:**
```python
den.set_data_description("""
Available data: Time-series temperature measurements from 2010-2023
Tools: pandas, scipy, sklearn, matplotlib
Domain: Climate science
Research interest: Identifying seasonal patterns and long-term trends
""")
```

**Purpose:** This establishes the foundation for automated idea generation by providing context about what data is available and what analyses are feasible.

##### get_idea()

Generate research hypotheses based on the data description.

```python
den.get_idea()
```

**Returns:** Research idea/hypothesis (stored internally in project directory)

**Output:** Creates a file containing the generated research question or hypothesis

**Example:**
```python
den.get_idea()
# Generates ideas like: "Investigate the correlation between seasonal temperature
# variations and long-term warming trends using time-series decomposition"
```

##### set_idea()

Manually specify a research idea instead of generating one.

```python
den.set_idea(idea: str)
```

**Parameters:**
- `idea` (str): The research hypothesis or question to investigate

**Example:**
```python
den.set_idea("Analyze the impact of El Niño events on regional temperature anomalies")
```

**Use case:** When you have a specific research direction and want to skip automated idea generation.

##### get_method()

Develop a research methodology based on the idea and data description.

```python
den.get_method()
```

**Returns:** Methodology document (stored internally in project directory)

**Output:** Creates a structured methodology including:
- Analytical approach
- Statistical methods to apply
- Validation strategies
- Expected outputs

**Example:**
```python
den.get_method()
# Generates methodology: "Apply seasonal decomposition, compute correlation coefficients,
# perform statistical significance tests, generate visualization plots..."
```

##### set_method()

Provide a custom methodology instead of generating one.

```python
den.set_method(method: str)
den.set_method(method: Path)  # Can also accept file paths
```

**Parameters:**
- `method` (str or Path): Methodology text, or a path to a markdown file containing the methodology

**Example:**
```python
# From string
den.set_method("""
1. Apply seasonal decomposition using STL
2. Compute Pearson correlation coefficients
3. Perform Mann-Kendall trend test
4. Generate time-series plots with confidence intervals
""")

# From file
den.set_method("methodology.md")
```
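The example above passes the file name as a plain string. If you would rather not depend on how a given version tells a file name apart from methodology text, a hedged alternative is to read the file yourself and pass its contents, since `set_method()` accepts text. `read_stage_text()` below is a hypothetical helper, not part of denario; it treats `pathlib.Path` objects and single-line strings naming an existing file as files, and everything else as literal text.

```python
from pathlib import Path
from typing import Union


def read_stage_text(source: Union[str, Path]) -> str:
    """Return stage text from a Path, an existing file name, or literal text.

    Hypothetical helper (not part of denario): it only normalizes input so
    that set_method()/set_results() always receive plain text.
    """
    if isinstance(source, Path):
        # An explicit Path must exist; read it as UTF-8 markdown.
        return source.read_text(encoding="utf-8")
    if "\n" not in source:
        try:
            # A single-line string naming an existing file is read as a file.
            if Path(source).is_file():
                return Path(source).read_text(encoding="utf-8")
        except OSError:
            pass  # e.g. the name is too long for the OS: treat it as text
    # Anything else is assumed to be the methodology text itself.
    return source
```

Usage, under the same assumption: `den.set_method(read_stage_text("methodology.md"))`.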

##### get_results()

Execute the methodology, perform computations, and generate results.

```python
den.get_results()
```

**Returns:** Results document with analysis outputs (stored internally in project directory)

**Output:** Creates results including:
- Computed statistics
- Generated figures and visualizations
- Data tables
- Analysis findings

**Example:**
```python
den.get_results()
# Executes the methodology, runs analyses, creates plots, compiles findings
```

**Note:** This is where the actual computational work happens. The agent executes code to perform the analyses specified in the methodology.

##### set_results()

Provide pre-computed results instead of generating them.

```python
den.set_results(results: str)
den.set_results(results: Path)  # Can also accept file paths
```

**Parameters:**
- `results` (str or Path): Results text, or a path to a markdown file containing the results

**Example:**
```python
# From string
den.set_results("""
Analysis Results:
- Correlation coefficient: 0.78 (p < 0.001)
- Seasonal amplitude: 5.2°C
- Long-term trend: +0.15°C per decade
- Figure 1: Seasonal decomposition (see attached)
""")

# From file
den.set_results("results.md")
```

**Use case:** When analyses were performed externally or when iterating on paper writing without re-running computations.
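When the numbers come from an external analysis, a small formatter keeps the text passed to `set_results()` consistent across iterations. `format_results()` below is a hypothetical helper, not part of denario; its output layout simply mirrors the string example above, and values are passed in already formatted so no precision is added or lost.

```python
from typing import Iterable, Mapping


def format_results(stats: Mapping[str, str], figures: Iterable[str] = ()) -> str:
    """Render externally computed results as a markdown list for set_results().

    Hypothetical helper: keys are labels, values are pre-formatted strings
    (units and p-values included), figures are captions in display order.
    """
    lines = ["Analysis Results:"]
    lines += [f"- {label}: {value}" for label, value in stats.items()]
    # Figures are numbered in the order given, starting at 1.
    lines += [f"- Figure {i}: {caption}" for i, caption in enumerate(figures, start=1)]
    return "\n".join(lines) + "\n"
```

For example, `den.set_results(format_results({"Correlation coefficient": "0.78 (p < 0.001)"}, ["Seasonal decomposition"]))` reproduces the first and last bullets of the string example.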

##### get_paper()

Generate a publication-ready LaTeX paper with the research findings.

```python
den.get_paper(journal: Journal = None)
```

**Parameters:**
- `journal` (Journal, optional): Target journal for formatting. Defaults to generic format.

**Returns:** LaTeX paper with proper formatting (stored in project directory)

**Output:** Creates:
- Complete LaTeX source file
- Compiled PDF (if LaTeX is available)
- Integrated figures and tables
- Properly formatted bibliography

**Example:**
```python
from denario import Journal

den.get_paper(journal=Journal.APS)
# Generates paper.tex and paper.pdf formatted for APS journals
```

### Journal Enum

Enumeration of supported journal formats.

```python
from denario import Journal
```

#### Available Journals

- `Journal.APS` - American Physical Society format
  - Suitable for Physical Review, Physical Review Letters, etc.
  - Uses RevTeX document class

Additional journal formats may be available. Check the latest denario documentation for the complete list.

#### Usage

```python
from denario import Denario, Journal

den = Denario(project_dir="./research")
# ... complete workflow ...
den.get_paper(journal=Journal.APS)
```

## Workflow Patterns

### Fully Automated Pipeline

Let denario handle every stage:

```python
from denario import Denario, Journal

den = Denario(project_dir="./automated_research")

# Define context
den.set_data_description("""
Dataset: Sensor readings from IoT devices
Tools: pandas, numpy, sklearn, matplotlib
Goal: Anomaly detection in sensor networks
""")

# Automate entire pipeline
den.get_idea()        # Generate research idea
den.get_method()      # Develop methodology
den.get_results()     # Execute analysis
den.get_paper(journal=Journal.APS)  # Create paper
```
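Because each stage of the automated pipeline can run for a long time, it may help to drive the calls through a small wrapper that reports which stage failed and how long each one took. `run_stages()` is a hypothetical helper, not part of denario; it relies only on the method names shown above and passes any keyword arguments (such as `journal`) through unchanged.

```python
import time
from typing import Any, Dict, List, Mapping, Tuple


def run_stages(den: Any, stages: List[Tuple[str, Mapping[str, Any]]]) -> Dict[str, float]:
    """Call each named method on `den` in order; return seconds spent per stage.

    Hypothetical helper: if a stage raises, a RuntimeError names the failing
    stage and the stages already completed, chaining the original exception.
    """
    timings: Dict[str, float] = {}
    for name, kwargs in stages:
        start = time.perf_counter()
        try:
            getattr(den, name)(**kwargs)
        except Exception as exc:
            done = ", ".join(timings) or "none"
            raise RuntimeError(f"stage {name!r} failed (completed: {done})") from exc
        timings[name] = time.perf_counter() - start
    return timings
```

Usage: `run_stages(den, [("get_idea", {}), ("get_method", {}), ("get_results", {}), ("get_paper", {"journal": Journal.APS})])`.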

### Custom Idea, Automated Execution

Provide your research question, automate the rest:

```python
from denario import Denario, Journal

den = Denario(project_dir="./custom_idea")

den.set_data_description("Dataset: Financial time-series data...")

# Manual idea
den.set_idea("Investigate predictive models for stock market volatility using LSTM networks")

# Automated execution
den.get_method()
den.get_results()
den.get_paper(journal=Journal.APS)
```

### Fully Manual with Template Generation

Use denario only to write the paper from your own inputs:

```python
from denario import Denario, Journal

den = Denario(project_dir="./manual_research")

# Provide everything manually
den.set_data_description("Pre-existing dataset description...")
den.set_idea("Pre-defined research hypothesis")
den.set_method("methodology.md")  # Load from file
den.set_results("results.md")      # Load from file

# Generate formatted paper
den.get_paper(journal=Journal.APS)
```

### Iterative Refinement

Refine specific stages without re-running everything:

```python
from denario import Denario, Journal

den = Denario(project_dir="./iterative")

# Initial run
den.set_data_description("Dataset description...")
den.get_idea()
den.get_method()
den.get_results()

# Refine methodology after reviewing results
den.set_method("""
Revised methodology:
- Use a different statistical test
- Add sensitivity analysis
- Include cross-validation
""")

# Re-run only downstream stages
den.get_results()  # Re-execute with new method
den.get_paper(journal=Journal.APS)
```
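The rule behind "re-run only downstream stages" is that changing one stage invalidates every stage after it in the data description, idea, method, results, paper chain. The sketch below encodes that ordering as a plain list; the stage names are labels chosen for illustration, not identifiers from denario.

```python
from typing import List

# Stage order as described in this reference; names are illustrative labels.
STAGES: List[str] = ["data_description", "idea", "method", "results", "paper"]


def stages_to_rerun(changed: str) -> List[str]:
    """Return the stages that must be regenerated after `changed` is edited."""
    if changed not in STAGES:
        raise ValueError(f"unknown stage {changed!r}; expected one of {STAGES}")
    # Everything strictly after the changed stage depends on it.
    return STAGES[STAGES.index(changed) + 1:]
```

After `set_method()`, `stages_to_rerun("method")` yields `["results", "paper"]`, which matches the two calls in the example.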

## Project Directory Structure

After running a complete workflow, the project directory contains:

```
project_dir/
├── data_description.txt    # Input: data context
├── idea.md                 # Generated or provided research idea
├── methodology.md          # Generated or provided methodology
├── results.md              # Generated or provided results
├── figures/                # Generated visualizations
│   ├── figure_1.png
│   ├── figure_2.png
│   └── ...
├── paper.tex               # Generated LaTeX source
├── paper.pdf               # Compiled PDF (if LaTeX available)
└── logs/                   # Agent execution logs
    └── ...
```
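To see which stages have already produced output, you can check the project directory against the layout above. The file names in `EXPECTED` are copied from that listing and are an assumption; if your denario version writes different names or uses subfolders, adjust the mapping. `stage_status()` is a hypothetical helper.

```python
from pathlib import Path
from typing import Dict, Union

# Assumed layout, copied from the directory listing in this reference.
EXPECTED: Dict[str, str] = {
    "data_description": "data_description.txt",
    "idea": "idea.md",
    "method": "methodology.md",
    "results": "results.md",
    "paper_tex": "paper.tex",
    "paper_pdf": "paper.pdf",
}


def stage_status(project_dir: Union[str, Path]) -> Dict[str, bool]:
    """Map each stage to whether its expected output file exists."""
    root = Path(project_dir)
    status = {stage: (root / name).is_file() for stage, name in EXPECTED.items()}
    # Figures count as present only if figures/ holds at least one PNG.
    figures = root / "figures"
    status["figures"] = figures.is_dir() and any(figures.glob("*.png"))
    return status
```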

## Advanced Features

### Multiagent Orchestration

Denario uses the AG2 and LangGraph frameworks to coordinate multiple specialized agents:

- **Idea Agent**: Generates research hypotheses from data descriptions
- **Method Agent**: Develops analytical methodologies
- **Execution Agent**: Runs computations and creates visualizations
- **Writing Agent**: Produces publication-ready manuscripts

These agents collaborate automatically, with each stage building on previous outputs.

### Integration with Scientific Tools

Denario integrates with common scientific Python libraries:

- **pandas**: Data manipulation and analysis
- **scikit-learn**: Machine learning algorithms
- **scipy**: Scientific computing and statistics
- **matplotlib/seaborn**: Visualization
- **numpy**: Numerical operations

When generating results, denario can automatically write and execute code using these libraries.

### Reproducibility

All stages produce structured outputs saved to the project directory:

- Version control friendly (markdown and LaTeX)
- Auditable (logs of agent decisions and code execution)
- Reproducible (saved methodologies can be re-run)

### Literature Search

Denario can run literature searches to provide context for idea generation and methodology development. See `examples.md` for literature search workflows.

## Error Handling

### Common Issues

**Missing data description:**
```python
den = Denario(project_dir="./project")
den.get_idea()  # Error: must call set_data_description() first
```

**Solution:** Always set the data description before generating ideas.

**Missing prerequisite stages:**
```python
den = Denario(project_dir="./project")
den.get_results()  # Error: must have idea and method first
```

**Solution:** Follow the workflow order, or supply the missing stages manually with `set_idea()` and `set_method()`.
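The prerequisite rules in this section can be modeled as a small state machine. `StageTracker` below is a hypothetical stand-in written only to illustrate the ordering; it is not denario's implementation, and the errors denario actually raises may differ. Here a manual `set_*` call and a generated `get_*` call both count as completing a stage.

```python
from typing import Dict, Set

# Prerequisites per stage, following the order described in this reference.
REQUIRES: Dict[str, Set[str]] = {
    "idea": {"data_description"},
    "method": {"data_description", "idea"},
    "results": {"data_description", "idea", "method"},
    "paper": {"data_description", "idea", "method", "results"},
}


class StageTracker:
    """Hypothetical tracker that records completed stages and enforces order."""

    def __init__(self) -> None:
        self.done: Set[str] = set()

    def complete(self, stage: str) -> None:
        """Mark `stage` as done, failing if any prerequisite is missing."""
        missing = REQUIRES.get(stage, set()) - self.done
        if missing:
            raise RuntimeError(f"cannot run {stage!r}; missing: {sorted(missing)}")
        self.done.add(stage)
```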

**LaTeX compilation errors:**
```python
den.get_paper()  # May fail if LaTeX is not installed
```

**Solution:** Install a LaTeX distribution (such as TeX Live or MiKTeX), or use a Docker image with LaTeX preinstalled.
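Before calling `get_paper()`, you can check whether a LaTeX engine is on your `PATH`. The list of executable names is an assumption about which engines are relevant; it does not reflect which one denario actually invokes. `find_latex()` is a hypothetical helper built on the standard library's `shutil.which`.

```python
import shutil
from typing import Iterable, Optional

# Common LaTeX engines; which one denario uses internally is not assumed here.
DEFAULT_ENGINES = ("pdflatex", "xelatex", "lualatex", "latexmk")


def find_latex(engines: Iterable[str] = DEFAULT_ENGINES) -> Optional[str]:
    """Return the full path of the first engine found on PATH, or None."""
    for name in engines:
        path = shutil.which(name)
        if path is not None:
            return path
    return None
```

A `None` result is a cue to install LaTeX (or switch to a container) before starting a long run.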

## Best Practices

### Data Description Quality

Provide detailed context for better idea generation:

```python
# Good: Detailed and specific
den.set_data_description("""
Dataset: 10 years of daily temperature readings from 50 weather stations
Format: CSV with columns [date, station_id, temperature, humidity]
Tools available: pandas, scipy, sklearn, matplotlib, seaborn
Domain: Climatology
Research interests: Climate change, seasonal patterns, regional variations
Known challenges: Missing data in 2015, station 23 has calibration issues
""")

# Bad: Too vague
den.set_data_description("Temperature data from weather stations")
```
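One way to keep descriptions detailed and consistently structured is to assemble them from named fields. `build_data_description()` is a hypothetical helper; its labels simply copy those in the "good" example above, and denario itself only receives the resulting string.

```python
from typing import Mapping, Sequence, Union

FieldValue = Union[str, Sequence[str]]


def build_data_description(fields: Mapping[str, FieldValue]) -> str:
    """Join labeled fields into the 'Label: value' lines used above.

    Sequence values (e.g. a tool list) are comma-separated; blank fields are
    dropped so empty labels never appear in the description.
    """
    lines = []
    for label, value in fields.items():
        text = value if isinstance(value, str) else ", ".join(value)
        if text.strip():
            lines.append(f"{label}: {text.strip()}")
    return "\n".join(lines)
```

Usage: `den.set_data_description(build_data_description({"Dataset": "...", "Tools available": ["pandas", "scipy"]}))`.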

### Methodology Validation

Review generated methodologies before executing them:

```python
den.get_method()
# Review the methodology.md file in project_dir
# If needed, refine with set_method()
```
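A small helper can return the start of a generated file so you can review it before calling `get_results()`. It assumes the `methodology.md` name from the directory listing in this reference; `preview_stage_file()` is hypothetical and works equally for `idea.md` or `results.md`.

```python
from pathlib import Path
from typing import Union


def preview_stage_file(project_dir: Union[str, Path], filename: str = "methodology.md",
                       max_lines: int = 20) -> str:
    """Return the first `max_lines` lines of a stage file for manual review."""
    path = Path(project_dir) / filename
    if not path.is_file():
        raise FileNotFoundError(f"{path} not found; has the stage run yet?")
    lines = path.read_text(encoding="utf-8").splitlines()
    shown = "\n".join(lines[:max_lines])
    # Flag truncation so a long methodology is not mistaken for a short one.
    if len(lines) > max_lines:
        shown += f"\n... ({len(lines) - max_lines} more lines)"
    return shown
```

Usage: `print(preview_stage_file("./project"))`, then call `set_method()` if the plan needs changes.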

### Incremental Development

Build the research pipeline incrementally:

```python
# Stage 1: Validate idea generation
den.set_data_description("...")
den.get_idea()
# Review idea.md, adjust if needed

# Stage 2: Validate methodology
den.get_method()
# Review methodology.md, adjust if needed

# Stage 3: Execute and validate results
den.get_results()
# Review results.md and figures/

# Stage 4: Generate paper
den.get_paper(journal=Journal.APS)
```

### Version Control Integration

Initialize a git repository in the project directory to track changes:

```bash
cd project_dir
git init
git add .
git commit -m "Initial research workflow"
```

Commit after each stage to track the evolution of your research.
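To commit after each stage from the same Python session, you can wrap the git commands. `commit_stage()` is a hypothetical helper that issues only standard `git add` and `git commit` invocations; it assumes `git init` has already been run in the project directory. The runner is injectable so the command sequence can be inspected without touching a repository.

```python
import subprocess
from pathlib import Path
from typing import Callable, List, Union

Runner = Callable[..., object]


def commit_stage(project_dir: Union[str, Path], stage: str,
                 run: Runner = subprocess.run) -> List[List[str]]:
    """Stage all changes and commit them with a message naming the stage.

    Returns the commands issued. With the default runner, check=True makes a
    failing git call raise subprocess.CalledProcessError.
    """
    commands = [
        ["git", "add", "--all"],
        ["git", "commit", "-m", f"denario: complete {stage} stage"],
    ]
    for cmd in commands:
        run(cmd, cwd=str(project_dir), check=True)
    return commands
```

Usage: call `commit_stage("./project", "method")` right after `den.get_method()`. If a stage changed nothing, `git commit` exits with a non-zero status and the helper raises, which is a useful signal that the stage produced no new output.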