# Denario Examples

## Complete End-to-End Research Example

This example demonstrates a full research pipeline from data to publication.

### Setup

```python
from denario import Denario, Journal
import os

# Create project directory
os.makedirs("climate_research", exist_ok=True)
den = Denario(project_dir="./climate_research")
```
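
denario's pipeline stages are driven by LLM backends, so provider credentials usually need to be in place before the first `get_*` call. A minimal pre-flight sketch, assuming keys are supplied through environment variables — the exact variable names denario reads should be confirmed in its documentation; `GEMINI_API_KEY` and `OPENAI_API_KEY` here are assumptions:

```python
import os

# Hypothetical variable names -- confirm the exact keys denario reads in its docs
for key in ("GEMINI_API_KEY", "OPENAI_API_KEY"):
    if not os.environ.get(key):
        print(f"Warning: {key} is not set; the corresponding LLM backend may fail.")
```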

### Define Research Context

```python
den.set_data_description("""
Available data: Global temperature anomaly dataset (1880-2023)
- Monthly mean temperature deviations from the 1951-1980 baseline
- Global coverage with land and ocean measurements
- Format: CSV with columns [year, month, temperature_anomaly]

Available tools:
- pandas for data manipulation
- scipy for statistical analysis
- sklearn for regression modeling
- matplotlib and seaborn for visualization

Research domain: Climate science
Research goal: Quantify and characterize long-term global warming trends

Data source: NASA GISTEMP
Known characteristics: Strong autocorrelation, seasonal patterns, missing data pre-1900
""")
```

### Execute Full Pipeline

```python
# Generate research idea
den.get_idea()
# Output: "Quantify the rate of global temperature increase using
# linear regression and assess acceleration in warming trends"

# Develop methodology
den.get_method()
# Output: Creates methodology including:
# - Time-series preprocessing
# - Linear trend analysis
# - Moving average smoothing
# - Statistical significance testing
# - Visualization of trends

# Execute analysis
den.get_results()
# Output: Runs the analysis and generates:
# - Computed trend: +0.18°C per decade
# - Statistical tests: p < 0.001
# - Figure 1: Temperature anomaly over time with trend line
# - Figure 2: Decadal averages
# - Figure 3: Acceleration analysis

# Generate publication
den.get_paper(journal=Journal.APS)
# Output: Creates a formatted LaTeX paper with:
# - Title, abstract, introduction
# - Methods section
# - Results with embedded figures
# - Discussion and conclusions
# - References
```
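
Each stage writes its output as a markdown file in the project directory (see the listing below), so intermediate products can be inspected programmatically between calls. A minimal sketch using only the files the pipeline is documented to produce:

```python
from pathlib import Path

project = Path("climate_research")
# idea.md, methodology.md, and results.md are written by the corresponding stages
for name in ("idea.md", "methodology.md", "results.md"):
    stage_file = project / name
    if stage_file.exists():
        print(f"--- {name} ---")
        print(stage_file.read_text()[:500])  # preview the first 500 characters
```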

### Review Outputs

```bash
tree climate_research/
# climate_research/
# ├── data_description.txt
# ├── idea.md
# ├── methodology.md
# ├── results.md
# ├── figures/
# │   ├── temperature_trend.png
# │   ├── decadal_averages.png
# │   └── acceleration_analysis.png
# ├── paper.tex
# └── paper.pdf
```

## Enhancing Input Descriptions

Improve data descriptions for better idea generation.

### Basic Description

```python
den = Denario(project_dir="./enhanced_input")

# Start with a minimal description
den.set_data_description("Gene expression data from cancer patients")
```

### Enhanced Description

```python
# Enhance with specifics
den.set_data_description("""
Dataset: Gene expression microarray data from breast cancer patients
- Sample size: 500 patients (250 responders, 250 non-responders to therapy)
- Features: Expression levels of 20,000 genes
- Format: CSV matrix (samples × genes)
- Clinical metadata: Age, tumor stage, treatment response, survival time

Available analytical tools:
- pandas for data processing
- sklearn for machine learning (PCA, random forests, SVM)
- lifelines for survival analysis
- matplotlib/seaborn for visualization

Research objectives:
- Identify gene signatures predictive of treatment response
- Discover potential therapeutic targets
- Validate findings using cross-validation

Data characteristics:
- Normalized log2 expression values
- Some missing data (<5% of values)
- Batch effects corrected
""")

den.get_idea()
# Now generates more specific and relevant research ideas
```

## Literature Search Integration

Incorporate existing research into your workflow.

### Example: Finding Related Work

```python
den = Denario(project_dir="./literature_review")

# Define research area
den.set_data_description("""
Research area: Machine learning for protein structure prediction
Available data: Protein sequence database with known structures
Tools: Biopython, TensorFlow, scikit-learn
""")

# Set the research idea directly
den.set_idea("Develop a deep learning model for predicting protein secondary structure from amino acid sequences")

# NOTE: Literature search functionality would be integrated here.
# Check denario's documentation for the specific literature-search API.
# Conceptual usage:
# den.search_literature(keywords=["protein structure prediction", "deep learning", "LSTM"])
# This would inform the methodology and provide citations for the paper.
```
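
Until the built-in search API is confirmed, related work can be pulled manually from arXiv's public Atom feed and pasted into the data description. A minimal sketch using only the standard library and arXiv's documented query endpoint — no denario calls are assumed:

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Query arXiv's public API for related work
query = urllib.parse.quote("protein secondary structure prediction deep learning")
url = f"http://export.arxiv.org/api/query?search_query=all:{query}&max_results=5"

with urllib.request.urlopen(url) as response:
    feed = ET.fromstring(response.read())

# Entry titles live under the Atom namespace
ns = {"atom": "http://www.w3.org/2005/Atom"}
for entry in feed.findall("atom:entry", ns):
    print(entry.find("atom:title", ns).text.strip())
```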

## Generate Research Ideas from Data

Focus on idea generation without full pipeline execution.

### Example: Brainstorming Research Questions

```python
den = Denario(project_dir="./idea_generation")

# Provide a comprehensive data description
den.set_data_description("""
Available datasets:
1. Social media sentiment data (1M tweets, 2020-2023)
2. Stock market prices (S&P 500, daily, 2020-2023)
3. Economic indicators (GDP, unemployment, inflation)

Tools: pandas, sklearn, statsmodels, Prophet, VADER sentiment analysis

Domain: Computational social science and finance
Research interests: Market prediction, sentiment analysis, causal inference
""")

# Generate an idea (rerun to brainstorm alternatives; see the sketch
# after this example -- multi-idea support depends on the denario API)
den.get_idea()

# Review the generated idea in idea.md
# Decide whether to proceed or regenerate
```
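
To brainstorm several candidates, one workable pattern is to regenerate and archive each `idea.md` before the next run overwrites it. A sketch under the assumption that `get_idea()` rewrites `idea.md` in place on every call:

```python
import shutil
from pathlib import Path

project = Path("idea_generation")
# Assumes each get_idea() call overwrites idea.md in the project directory
for i in range(3):
    den.get_idea()
    shutil.copy(project / "idea.md", project / f"idea_candidate_{i}.md")

# Compare the archived candidates, then set the winner explicitly:
# den.set_idea((project / "idea_candidate_1.md").read_text())
```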

## Writing a Paper from Existing Results

Use denario for paper generation when the analysis is already complete.

### Example: Formatting Existing Research

```python
den = Denario(project_dir="./paper_generation")

# Provide all components manually
den.set_data_description("""
Completed analysis of traffic pattern data from urban sensors
Dataset: 6 months of traffic flow measurements from 100 intersections
Analysis completed using R and Python
""")

den.set_idea("""
Research question: Optimize traffic light timing using reinforcement learning
to reduce congestion and improve traffic flow efficiency
""")

den.set_method("""
# Methodology

## Data Collection
Traffic flow data collected from 100 intersections in the downtown area from
January-June 2023. Measurements include vehicle counts, wait times, and
queue lengths at 1-minute intervals.

## Model Development
Developed a Deep Q-Network (DQN) reinforcement learning agent to optimize
traffic light timing. The state space includes current queue lengths and
historical flow patterns. Actions correspond to light timing adjustments.

## Training
Trained the agent on historical data with a reward function based on
total wait time reduction. Used experience replay and target networks for
stable learning.

## Validation
Validated on held-out test data against:
- Current fixed-timing system
- Actuated control system
- Alternative RL algorithms (A3C, PPO)

## Metrics
- Average wait time reduction
- Total throughput improvement
- Queue length distribution
- Computational efficiency
""")

den.set_results("""
# Results

## Training Performance
The DQN agent converged after 500,000 training episodes. Training time: 12 hours
on an NVIDIA V100 GPU.

## Wait Time Reduction
- Current system: average wait time 45.2 seconds
- DQN system: average wait time 32.8 seconds
- Improvement: 27.4% reduction (p < 0.001)

## Throughput Analysis
- Vehicles processed per hour increased from 2,850 to 3,420 (+20%)
- Peak-hour congestion reduced by 35%

## Comparison with Baselines
- Actuated control: 38.1 seconds average wait (DQN 14% better)
- A3C: 34.5 seconds (DQN 5% better)
- PPO: 33.2 seconds (DQN 1% better)

## Queue Length Analysis
Maximum queue length reduced from 42 vehicles to 28 vehicles during peak hours.

## Figures
- Figure 1: Training curve showing convergence
- Figure 2: Wait time distribution comparison
- Figure 3: Throughput over time of day
- Figure 4: Heatmap of queue lengths across intersections
""")

# Generate a publication-ready paper
den.get_paper(journal=Journal.APS)
```

## Fast Mode with Gemini

Use Google's Gemini models for faster execution.

### Example: Rapid Prototyping

```python
# Configure for fast mode (conceptual - check denario's documentation
# for how to select the LLM backend)

den = Denario(project_dir="./fast_research")

# Same workflow, optimized for speed
den.set_data_description("""
Quick analysis needed: Monthly sales data (2 years)
Goal: Identify seasonal patterns and forecast next quarter
Tools: pandas, Prophet
""")

# Fast execution
den.get_idea()
den.get_method()
den.get_results()
den.get_paper()

# Trade-off: faster execution, potentially less detailed analysis
```
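
Gemini-backed runs need Google API credentials in place before the pipeline starts. A minimal sketch, assuming denario picks the key up from the environment — the variable name is an assumption; Google's client libraries commonly read `GEMINI_API_KEY` or `GOOGLE_API_KEY`:

```python
import os

# Assumed variable name -- confirm which one denario actually reads
if "GEMINI_API_KEY" not in os.environ:
    raise RuntimeError("Set GEMINI_API_KEY before running the fast pipeline.")
```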

## Hybrid Workflow: Custom Idea + Automated Method

Combine manual and automated approaches.

### Example: Directed Research

```python
den = Denario(project_dir="./hybrid_workflow")

# Describe data
den.set_data_description("""
Medical imaging dataset: 10,000 chest X-rays
Labels: Normal, pneumonia, COVID-19
Format: 224x224 grayscale PNG files
Tools: TensorFlow, Keras, scikit-learn, OpenCV
""")

# Provide a specific research direction
den.set_idea("""
Develop a transfer learning approach using a pre-trained ResNet50 for multi-class
classification of chest X-rays, with a focus on interpretability using Grad-CAM
to identify diagnostic regions
""")

# Let denario develop the methodology
den.get_method()

# Review methodology.md, then execute
den.get_results()

# Generate paper
den.get_paper(journal=Journal.APS)
```
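
When the methodology needs sign-off before any compute is spent, the review step can be made explicit by pausing the script. A minimal sketch, assuming the method stage writes `methodology.md` into the project directory as in the earlier examples:

```python
from pathlib import Path

den.get_method()

# Show the generated methodology and wait for explicit approval
print(Path("hybrid_workflow/methodology.md").read_text())
if input("Proceed with execution? [y/N] ").strip().lower() == "y":
    den.get_results()
    den.get_paper(journal=Journal.APS)
```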

## Time-Series Analysis Example

Specialized example for temporal data.

### Example: Economic Forecasting

```python
den = Denario(project_dir="./time_series_analysis")

den.set_data_description("""
Dataset: Monthly unemployment rates (US, 1950-2023)
Additional features: GDP growth, inflation, interest rates
Format: Multivariate time-series DataFrame
Tools: statsmodels, Prophet, pmdarima, sklearn

Analysis goals:
- Model unemployment trends
- Forecast next 12 months
- Identify leading indicators
- Assess forecast uncertainty

Data characteristics:
- Seasonal patterns (annual cycles)
- Structural breaks (recessions)
- Autocorrelation present
- Non-stationary (unit root)
""")

den.get_idea()
# Might generate: "Develop a SARIMAX model incorporating economic indicators
# as exogenous variables to forecast unemployment with confidence intervals"

den.get_method()
den.get_results()
den.get_paper(journal=Journal.APS)
```
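
For reference, the model such an idea points at is a few lines in statsmodels. A minimal sketch on synthetic stand-in data — the `order` and `seasonal_order` values are illustrative, not tuned:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Synthetic stand-ins for the unemployment series and one exogenous indicator
rng = np.random.default_rng(0)
idx = pd.date_range("2000-01-01", periods=240, freq="MS")
y = pd.Series(5 + np.sin(np.arange(240) * 2 * np.pi / 12) + rng.normal(0, 0.3, 240), index=idx)
exog = pd.Series(rng.normal(2, 0.5, 240), index=idx, name="gdp_growth")

# Illustrative orders; a real analysis would select them via AIC or pmdarima
model = SARIMAX(y, exog=exog, order=(1, 1, 1), seasonal_order=(1, 0, 1, 12))
fit = model.fit(disp=False)

# 12-month forecast with confidence intervals, holding the indicator flat
future_idx = pd.date_range(idx[-1] + pd.offsets.MonthBegin(), periods=12, freq="MS")
future_exog = pd.Series([2.0] * 12, index=future_idx, name="gdp_growth")
print(fit.get_forecast(steps=12, exog=future_exog).summary_frame())
```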

## Machine Learning Pipeline Example

Complete ML workflow with validation.

### Example: Predictive Modeling

```python
den = Denario(project_dir="./ml_pipeline")

den.set_data_description("""
Dataset: Customer churn prediction
- 50,000 customers, 30 features (demographics, usage patterns, service history)
- Binary target: churned (1) or retained (0)
- Imbalanced: 20% churn rate
- Features: mixed numerical and categorical

Available tools:
- pandas for preprocessing
- sklearn and xgboost for modeling (RF, gradient boosting, logistic regression)
- imblearn for handling imbalance
- SHAP for feature importance

Goals:
- Build a predictive model for churn
- Identify key churn factors
- Provide actionable insights
- Achieve >85% AUC-ROC
""")

den.get_idea()
# Might generate: "Develop an ensemble model combining XGBoost and Random Forest
# with SMOTE oversampling, and use SHAP values to identify interpretable
# churn risk factors"

den.get_method()
# Will include: train/test split, cross-validation, hyperparameter tuning,
# performance metrics, feature importance analysis

den.get_results()
# Executes the full ML pipeline and generates:
# - Model performance metrics
# - ROC curves
# - Feature importance plots
# - Confusion matrices

den.get_paper(journal=Journal.APS)
```
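
The core of the idea denario might propose takes only a few lines with the tools named above. A minimal sketch on synthetic imbalanced data, with SMOTE inside an imblearn `Pipeline` so oversampling happens only on training folds:

```python
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in: 20% positive class, mirroring the described churn rate
X, y = make_classification(n_samples=5000, n_features=30, weights=[0.8, 0.2], random_state=0)

# SMOTE is applied per training fold, avoiding leakage into validation folds
pipeline = Pipeline([
    ("smote", SMOTE(random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
])
scores = cross_val_score(pipeline, X, y, cv=5, scoring="roc_auc")
print(f"AUC-ROC: {scores.mean():.3f} ± {scores.std():.3f}")
```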

## Tips for Effective Usage

### Provide Rich Context

More context → better ideas and methodologies:

```python
# Include:
# - Data characteristics (size, format, quality issues)
# - Available tools and libraries
# - Domain-specific knowledge
# - Research objectives and constraints
# - Known challenges or considerations
```
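
One way to make that checklist routine is a small template helper. A hedged sketch — the section labels follow the examples above rather than any denario requirement, and `den` is assumed to be an existing `Denario` instance:

```python
def build_description(data, tools, domain, goals, characteristics):
    """Assemble a data description covering the checklist above."""
    return f"""
Dataset: {data}

Available tools:
{tools}

Research domain: {domain}
Research goals:
{goals}

Data characteristics:
{characteristics}
"""

den.set_data_description(build_description(
    data="Monthly sales records, 2 years, CSV",
    tools="- pandas\n- Prophet",
    domain="Retail analytics",
    goals="- Identify seasonal patterns\n- Forecast next quarter",
    characteristics="- Strong yearly seasonality\n- A few missing months",
))
```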

### Iterate on Intermediate Outputs

Review and refine at each stage:

```python
# Generate
den.get_idea()

# Review idea.md
# If needed, refine:
den.set_idea("Refined version of the idea")

# Continue
den.get_method()
# Review methodology.md
# Refine if needed, then proceed
```

### Save Your Workflow

Document the complete pipeline:

```python
# Save the workflow script
with open("research_workflow.py", "w") as f:
    f.write("""
from denario import Denario, Journal

den = Denario(project_dir="./project")
den.set_data_description("...")
den.get_idea()
den.get_method()
den.get_results()
den.get_paper(journal=Journal.APS)
""")
```

### Use Version Control

Track research evolution:

```bash
cd project_dir
git init
git add .
git commit -m "Initial data description"

# After each stage
git add .
git commit -m "Generated research idea"
# ... continue committing after each stage
```