Initial commit

2025-11-30 08:30:10 +08:00
commit f0bd18fb4e
824 changed files with 331919 additions and 0 deletions
--- a/skills/denario/references/research_pipeline.md
+++ b/skills/denario/references/research_pipeline.md
@@ -0,0 +1,471 @@
+# Research Pipeline API Reference
+
+## Core Classes
+
+### Denario
+
+The main class for orchestrating research workflows.
+
+#### Initialization
+
+```python
+from denario import Denario
+
+den = Denario(project_dir="path/to/project")
+```
+
+**Parameters:**
+- `project_dir` (str): Path to the research project directory where all outputs will be stored
+
+#### Methods
+
+##### set_data_description()
+
+Define the research context by describing available data and analytical tools.
+
+```python
+den.set_data_description(description: str)
+```
+
+**Parameters:**
+- `description` (str): Text describing the dataset, available tools, research domain, and any relevant context
+
+**Example:**
+```python
+den.set_data_description("""
+Available data: Time-series temperature measurements from 2010-2023
+Tools: pandas, scipy, sklearn, matplotlib
+Domain: Climate science
+Research interest: Identifying seasonal patterns and long-term trends
+""")
+```
+
+**Purpose:** This establishes the foundation for automated idea generation by providing context about what data is available and what analyses are feasible.
+
+##### get_idea()
+
+Generate research hypotheses based on the data description.
+
+```python
+den.get_idea()
+```
+
+**Returns:** The research idea or hypothesis, stored in the project directory
+
+**Output:** Creates a file containing the generated research question or hypothesis
+
+**Example:**
+```python
+den.get_idea()
+# Generates ideas like: "Investigate the correlation between seasonal temperature
+# variations and long-term warming trends using time-series decomposition"
+```
+
+##### set_idea()
+
+Manually specify a research idea instead of generating one.
+
+```python
+den.set_idea(idea: str)
+```
+
+**Parameters:**
+- `idea` (str): The research hypothesis or question to investigate
+
+**Example:**
+```python
+den.set_idea("Analyze the impact of El Niño events on regional temperature anomalies")
+```
+
+**Use case:** When you have a specific research direction and want to skip automated idea generation.
+
+##### get_method()
+
+Develop a research methodology based on the idea and data description.
+
+```python
+den.get_method()
+```
+
+**Returns:** The methodology document, stored in the project directory
+
+**Output:** Creates a structured methodology including:
+- Analytical approach
+- Statistical methods to apply
+- Validation strategies
+- Expected outputs
+
+**Example:**
+```python
+den.get_method()
+# Generates methodology: "Apply seasonal decomposition, compute correlation coefficients,
+# perform statistical significance tests, generate visualization plots..."
+```
+
+##### set_method()
+
+Provide a custom methodology instead of generating one.
+
+```python
+den.set_method(method: str | Path)  # methodology text, or path to a markdown file
+```
+
+**Parameters:**
+- `method` (str or Path): Methodology description or path to markdown file containing methodology
+
+**Example:**
+```python
+# From string
+den.set_method("""
+1. Apply seasonal decomposition using STL
+2. Compute Pearson correlation coefficients
+3. Perform Mann-Kendall trend test
+4. Generate time-series plots with confidence intervals
+""")
+
+# From file
+den.set_method("methodology.md")
+```
+
+##### get_results()
+
+Execute the methodology, perform computations, and generate results.
+
+```python
+den.get_results()
+```
+
+**Returns:** The results document with analysis outputs, stored in the project directory
+
+**Output:** Creates results including:
+- Computed statistics
+- Generated figures and visualizations
+- Data tables
+- Analysis findings
+
+**Example:**
+```python
+den.get_results()
+# Executes the methodology, runs analyses, creates plots, compiles findings
+```
+
+**Note:** This is where the actual computational work happens. The agent executes code to perform the analyses specified in the methodology.
+
+##### set_results()
+
+Provide pre-computed results instead of generating them.
+
+```python
+den.set_results(results: str | Path)  # results text, or path to a markdown file
+```
+
+**Parameters:**
+- `results` (str or Path): Results description or path to markdown file containing results
+
+**Example:**
+```python
+# From string
+den.set_results("""
+Analysis Results:
+- Correlation coefficient: 0.78 (p < 0.001)
+- Seasonal amplitude: 5.2°C
+- Long-term trend: +0.15°C per decade
+- Figure 1: Seasonal decomposition (see attached)
+""")
+
+# From file
+den.set_results("results.md")
+```
+
+**Use case:** When analyses were performed externally or when iterating on paper writing without re-running computations.
+
+##### get_paper()
+
+Generate a publication-ready LaTeX paper with the research findings.
+
+```python
+den.get_paper(journal: Journal = None)
+```
+
+**Parameters:**
+- `journal` (Journal, optional): Target journal for formatting. Defaults to generic format.
+
+**Returns:** The formatted LaTeX paper, stored in the project directory
+
+**Output:** Creates:
+- Complete LaTeX source file
+- Compiled PDF (if LaTeX is available)
+- Integrated figures and tables
+- Properly formatted bibliography
+
+**Example:**
+```python
+from denario import Journal
+
+den.get_paper(journal=Journal.APS)
+# Generates paper.tex and paper.pdf formatted for APS journals
+```
+
+### Journal Enum
+
+Enumeration of supported journal formats.
+
+```python
+from denario import Journal
+```
+
+#### Available Journals
+
+- `Journal.APS` - American Physical Society format
+  - Suitable for Physical Review, Physical Review Letters, etc.
+  - Uses RevTeX document class
+
+Additional journal formats may be available. Check the latest denario documentation for the complete list.
+
+#### Usage
+
+```python
+from denario import Denario, Journal
+
+den = Denario(project_dir="./research")
+# ... complete workflow ...
+den.get_paper(journal=Journal.APS)
+```
+
+## Workflow Patterns
+
+### Fully Automated Pipeline
+
+Let denario handle every stage:
+
+```python
+from denario import Denario, Journal
+
+den = Denario(project_dir="./automated_research")
+
+# Define context
+den.set_data_description("""
+Dataset: Sensor readings from IoT devices
+Tools: pandas, numpy, sklearn, matplotlib
+Goal: Anomaly detection in sensor networks
+""")
+
+# Automate entire pipeline
+den.get_idea()        # Generate research idea
+den.get_method()      # Develop methodology
+den.get_results()     # Execute analysis
+den.get_paper(journal=Journal.APS)  # Create paper
+```
+
+### Custom Idea, Automated Execution
+
+Provide your research question, automate the rest:
+
+```python
+from denario import Denario, Journal
+
+den = Denario(project_dir="./custom_idea")
+
+den.set_data_description("Dataset: Financial time-series data...")
+
+# Manual idea
+den.set_idea("Investigate predictive models for stock market volatility using LSTM networks")
+
+# Automated execution
+den.get_method()
+den.get_results()
+den.get_paper(journal=Journal.APS)
+```
+
+### Fully Manual with Template Generation
+
+Use denario only for paper formatting:
+
+```python
+from denario import Denario, Journal
+
+den = Denario(project_dir="./manual_research")
+
+# Provide everything manually
+den.set_data_description("Pre-existing dataset description...")
+den.set_idea("Pre-defined research hypothesis")
+den.set_method("methodology.md")  # Load from file
+den.set_results("results.md")      # Load from file
+
+# Generate formatted paper
+den.get_paper(journal=Journal.APS)
+```
+
+### Iterative Refinement
+
+Refine specific stages without re-running everything:
+
+```python
+from denario import Denario, Journal
+
+den = Denario(project_dir="./iterative")
+
+# Initial run
+den.set_data_description("Dataset description...")
+den.get_idea()
+den.get_method()
+den.get_results()
+
+# Refine methodology after reviewing results
+den.set_method("""
+Revised methodology:
+- Use different statistical test
+- Add sensitivity analysis
+- Include cross-validation
+""")
+
+# Re-run only downstream stages
+den.get_results()  # Re-execute with new method
+den.get_paper(journal=Journal.APS)
+```
+
+## Project Directory Structure
+
+After running a complete workflow, the project directory contains:
+
+```
+project_dir/
+├── data_description.txt    # Input: data context
+├── idea.md                 # Generated or provided research idea
+├── methodology.md          # Generated or provided methodology
+├── results.md              # Generated or provided results
+├── figures/                # Generated visualizations
+│   ├── figure_1.png
+│   ├── figure_2.png
+│   └── ...
+├── paper.tex               # Generated LaTeX source
+├── paper.pdf               # Compiled PDF (if LaTeX available)
+└── logs/                   # Agent execution logs
+    └── ...
+```
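+
+The layout above lends itself to a quick progress check. The following is a minimal sketch using only the standard library. It assumes the file names shown in the tree; `STAGE_FILES` and `stage_status` are hypothetical names that are not part of denario, and the actual file names may differ in your installed version.
+
+```python
+from pathlib import Path
+
+# Assumed file names, copied from the directory tree above.
+STAGE_FILES = {
+    "data_description": "data_description.txt",
+    "idea": "idea.md",
+    "method": "methodology.md",
+    "results": "results.md",
+    "paper": "paper.tex",
+}
+
+
+def stage_status(project_dir):
+    """Map each stage name to True if its output file exists in project_dir."""
+    root = Path(project_dir)
+    return {stage: (root / name).is_file() for stage, name in STAGE_FILES.items()}
+
+
+if __name__ == "__main__":
+    # Print one line per stage for a placeholder project path.
+    for stage, done in stage_status("path/to/project").items():
+        print(f"{stage:<18}{'done' if done else 'missing'}")
+```
+
+A check like this can help decide which `get_*` or `set_*` call to make next when resuming work in an existing project directory.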
+
+## Advanced Features
+
+### Multiagent Orchestration
+
+Denario uses the AG2 and LangGraph frameworks to coordinate multiple specialized agents:
+
+- **Idea Agent**: Generates research hypotheses from data descriptions
+- **Method Agent**: Develops analytical methodologies
+- **Execution Agent**: Runs computations and creates visualizations
+- **Writing Agent**: Produces publication-ready manuscripts
+
+These agents collaborate automatically, with each stage building on previous outputs.
+
+### Integration with Scientific Tools
+
+Denario integrates with common scientific Python libraries:
+
+- **pandas**: Data manipulation and analysis
+- **scikit-learn**: Machine learning algorithms
+- **scipy**: Scientific computing and statistics
+- **matplotlib/seaborn**: Visualization
+- **numpy**: Numerical operations
+
+When generating results, denario can automatically write and execute code using these libraries.
+
+### Reproducibility
+
+All stages produce structured outputs saved to the project directory:
+
+- Version control friendly (markdown and LaTeX)
+- Auditable (logs of agent decisions and code execution)
+- Reproducible (saved methodologies can be re-run)
+
+### Literature Search
+
+Denario can run literature searches to provide context for research ideas and methodology development. See `examples.md` for literature search workflows.
+
+## Error Handling
+
+### Common Issues
+
+**Missing data description:**
+```python
+from denario import Denario
+
+den = Denario(project_dir="./project")
+den.get_idea()  # Error: must call set_data_description() first
+```
+
+**Solution:** Always set the data description before generating ideas.
+
+**Missing prerequisite stages:**
+```python
+from denario import Denario
+
+den = Denario(project_dir="./project")
+den.get_results()  # Error: must have idea and method first
+```
+
+**Solution:** Follow the workflow order or manually set prerequisite stages.
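+
+The ordering rule can be made explicit with a small dependency table. This is only a sketch: the table is inferred from the workflow order described in this document, not taken from denario's internals, and `PREREQUISITES` and `missing_prerequisites` are hypothetical names.
+
+```python
+# Assumed stage dependencies, inferred from the workflow order in this document.
+PREREQUISITES = {
+    "idea": ["data_description"],
+    "method": ["data_description", "idea"],
+    "results": ["idea", "method"],
+    "paper": ["results"],
+}
+
+
+def missing_prerequisites(stage, completed):
+    """Return the prerequisites of `stage` that are not yet in `completed`."""
+    return [dep for dep in PREREQUISITES.get(stage, []) if dep not in completed]
+
+
+if __name__ == "__main__":
+    done = {"data_description"}
+    print(missing_prerequisites("idea", done))     # []
+    print(missing_prerequisites("results", done))  # ['idea', 'method']
+```
+
+An empty list means the stage can run; otherwise each listed stage needs a `get_*` or `set_*` call first.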
+
+**LaTeX compilation errors:**
+```python
+den.get_paper()  # May fail if LaTeX not installed
+```
+
+**Solution:** Install a LaTeX distribution or use a Docker image with LaTeX pre-installed.
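+
+Whether a LaTeX compiler is installed can be checked before calling `get_paper()`. The sketch below rests on assumptions: `latex_available` is a hypothetical helper, and `pdflatex` is assumed as the compiler name; denario may invoke a different program.
+
+```python
+import shutil
+
+
+def latex_available(compiler="pdflatex"):
+    """Return True if `compiler` is found on PATH (compiler name is an assumption)."""
+    return shutil.which(compiler) is not None
+
+
+if __name__ == "__main__":
+    if latex_available():
+        print("LaTeX compiler found; PDF compilation should be possible.")
+    else:
+        print("No LaTeX compiler on PATH; install a distribution or use Docker.")
+```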
+
+## Best Practices
+
+### Data Description Quality
+
+Provide detailed context for better idea generation:
+
+```python
+# Good: Detailed and specific
+den.set_data_description("""
+Dataset: 10 years of daily temperature readings from 50 weather stations
+Format: CSV with columns [date, station_id, temperature, humidity]
+Tools available: pandas, scipy, sklearn, matplotlib, seaborn
+Domain: Climatology
+Research interests: Climate change, seasonal patterns, regional variations
+Known challenges: Missing data in 2015, station 23 has calibration issues
+""")
+
+# Bad: Too vague
+den.set_data_description("Temperature data from weather stations")
+```
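+
+When descriptions are assembled from several sources, building the text from named fields can help keep it complete. The helper below is a hypothetical sketch, not part of denario; the field names simply mirror the labels used in the example above.
+
+```python
+def build_data_description(**fields):
+    """Join non-empty fields into 'Label: value' lines, one per field."""
+    lines = []
+    for key, value in fields.items():
+        if value:  # skip empty or missing entries
+            label = key.replace("_", " ").capitalize()
+            lines.append(f"{label}: {value}")
+    return "\n".join(lines)
+
+
+if __name__ == "__main__":
+    text = build_data_description(
+        dataset="10 years of daily temperature readings from 50 weather stations",
+        format="CSV with columns [date, station_id, temperature, humidity]",
+        tools_available="pandas, scipy, sklearn, matplotlib, seaborn",
+        known_challenges="Missing data in 2015, station 23 has calibration issues",
+    )
+    print(text)
+```
+
+The resulting string can then be passed to `den.set_data_description(text)`.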
+
+### Methodology Validation
+
+Review generated methodologies before executing:
+
+```python
+den.get_method()
+# Review the methodology.md file in project_dir
+# If needed, refine with set_method()
+```
+
+### Incremental Development
+
+Build the research pipeline incrementally:
+
+```python
+from denario import Denario, Journal
+
+den = Denario(project_dir="./project")
+
+# Stage 1: Validate idea generation
+den.set_data_description("...")
+den.get_idea()
+# Review idea.md, adjust if needed
+
+# Stage 2: Validate methodology
+den.get_method()
+# Review methodology.md, adjust if needed
+
+# Stage 3: Execute and validate results
+den.get_results()
+# Review results.md and figures/
+
+# Stage 4: Generate paper
+den.get_paper(journal=Journal.APS)
+```
+
+### Version Control Integration
+
+Initialize git in the project directory to track changes:
+
+```bash
+cd project_dir
+git init
+git add .
+git commit -m "Initial research workflow"
+```
+
+Commit after each stage to track the evolution of your research.