Files
2025-11-30 08:30:10 +08:00

495 lines
12 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Denario Examples
## Complete End-to-End Research Example
This example demonstrates a full research pipeline from data to publication.
### Setup
```python
from denario import Denario, Journal
import os
# Create project directory
os.makedirs("climate_research", exist_ok=True)
den = Denario(project_dir="./climate_research")
```
### Define Research Context
```python
den.set_data_description("""
Available data: Global temperature anomaly dataset (1880-2023)
- Monthly mean temperature deviations from 1951-1980 baseline
- Global coverage with land and ocean measurements
- Format: CSV with columns [year, month, temperature_anomaly]
Available tools:
- pandas for data manipulation
- scipy for statistical analysis
- sklearn for regression modeling
- matplotlib and seaborn for visualization
Research domain: Climate science
Research goal: Quantify and characterize long-term global warming trends
Data source: NASA GISTEMP
Known characteristics: Strong autocorrelation, seasonal patterns, missing data pre-1900
""")
```
### Execute Full Pipeline
```python
# Generate research idea
den.get_idea()
# Output: "Quantify the rate of global temperature increase using
# linear regression and assess acceleration in warming trends"
# Develop methodology
den.get_method()
# Output: Creates methodology including:
# - Time-series preprocessing
# - Linear trend analysis
# - Moving average smoothing
# - Statistical significance testing
# - Visualization of trends
# Execute analysis
den.get_results()
# Output: Runs the analysis, generates:
# - Computed trend: +0.18°C per decade
# - Statistical tests: p < 0.001
# - Figure 1: Temperature anomaly over time with trend line
# - Figure 2: Decadal averages
# - Figure 3: Acceleration analysis
# Generate publication
den.get_paper(journal=Journal.APS)
# Output: Creates formatted LaTeX paper with:
# - Title, abstract, introduction
# - Methods section
# - Results with embedded figures
# - Discussion and conclusions
# - References
```
### Review Outputs
```bash
tree climate_research/
# climate_research/
# ├── data_description.txt
# ├── idea.md
# ├── methodology.md
# ├── results.md
# ├── figures/
# │ ├── temperature_trend.png
# │ ├── decadal_averages.png
# │ └── acceleration_analysis.png
# ├── paper.tex
# └── paper.pdf
```
## Enhancing Input Descriptions
Improve data descriptions for better idea generation.
### Basic Description
```python
den = Denario(project_dir="./enhanced_input")
# Start with minimal description
den.set_data_description("Gene expression data from cancer patients")
```
### Enhanced Description
```python
# Enhance with specifics
den.set_data_description("""
Dataset: Gene expression microarray data from breast cancer patients
- Sample size: 500 patients (250 responders, 250 non-responders to therapy)
- Features: Expression levels of 20,000 genes
- Format: CSV matrix (samples × genes)
- Clinical metadata: Age, tumor stage, treatment response, survival time
Available analytical tools:
- pandas for data processing
- sklearn for machine learning (PCA, random forests, SVM)
- lifelines for survival analysis
- matplotlib/seaborn for visualization
Research objectives:
- Identify gene signatures predictive of treatment response
- Discover potential therapeutic targets
- Validate findings using cross-validation
Data characteristics:
- Normalized log2 expression values
- Some missing data (<5% of values)
- Batch effects corrected
""")
den.get_idea()
# Now generates more specific and relevant research ideas
```
## Literature Search Integration
Incorporate existing research into your workflow.
### Example: Finding Related Work
```python
den = Denario(project_dir="./literature_review")
# Define research area
den.set_data_description("""
Research area: Machine learning for protein structure prediction
Available data: Protein sequence database with known structures
Tools: Biopython, TensorFlow, scikit-learn
""")
# Generate idea
den.set_idea("Develop a deep learning model for predicting protein secondary structure from amino acid sequences")
# NOTE: Literature search functionality would be integrated here
# The specific API for literature search should be checked in denario's documentation
# Example conceptual usage:
# den.search_literature(keywords=["protein structure prediction", "deep learning", "LSTM"])
# This would inform methodology and provide citations for the paper
```
## Generate Research Ideas from Data
Focus on idea generation without full pipeline execution.
### Example: Brainstorming Research Questions
```python
den = Denario(project_dir="./idea_generation")
# Provide comprehensive data description
den.set_data_description("""
Available datasets:
1. Social media sentiment data (1M tweets, 2020-2023)
2. Stock market prices (S&P 500, daily, 2020-2023)
3. Economic indicators (GDP, unemployment, inflation)
Tools: pandas, sklearn, statsmodels, Prophet, VADER sentiment analysis
Domain: Computational social science and finance
Research interests: Market prediction, sentiment analysis, causal inference
""")
# Generate multiple ideas (conceptual - depends on denario API)
den.get_idea()
# Review the generated idea in idea.md
# Decide whether to proceed or regenerate
```
## Writing a Paper from Existing Results
Use denario for paper generation when analysis is already complete.
### Example: Formatting Existing Research
```python
den = Denario(project_dir="./paper_generation")
# Provide all components manually
den.set_data_description("""
Completed analysis of traffic pattern data from urban sensors
Dataset: 6 months of traffic flow measurements from 100 intersections
Analysis completed using R and Python
""")
den.set_idea("""
Research question: Optimize traffic light timing using reinforcement learning
to reduce congestion and improve traffic flow efficiency
""")
den.set_method("""
# Methodology
## Data Collection
Traffic flow data collected from 100 intersections in downtown area from
January-June 2023. Measurements include vehicle counts, wait times, and
queue lengths at 1-minute intervals.
## Model Development
Developed a Deep Q-Network (DQN) reinforcement learning agent to optimize
traffic light timing. State space includes current queue lengths and
historical flow patterns. Actions correspond to light timing adjustments.
## Training
Trained the agent using historical data with a reward function based on
total wait time reduction. Used experience replay and target networks for
stable learning.
## Validation
Validated using held-out test data and compared against:
- Current fixed-timing system
- Actuated control system
- Alternative RL algorithms (A3C, PPO)
## Metrics
- Average wait time reduction
- Total throughput improvement
- Queue length distribution
- Computational efficiency
""")
den.set_results("""
# Results
## Training Performance
The DQN agent converged after 500,000 training episodes. Training time: 12 hours
on NVIDIA V100 GPU.
## Wait Time Reduction
- Current system: Average wait time 45.2 seconds
- DQN system: Average wait time 32.8 seconds
- Improvement: 27.4% reduction (p < 0.001)
## Throughput Analysis
- Vehicles processed per hour increased from 2,850 to 3,420 (+20%)
- Peak hour congestion reduced by 35%
## Comparison with Baselines
- Actuated control: 38.1 seconds average wait (DQN still 14% better)
- A3C: 34.5 seconds (DQN slightly better, 5%)
- PPO: 33.2 seconds (DQN marginally better, 1%)
## Queue Length Analysis
Maximum queue length reduced from 42 vehicles to 28 vehicles during peak hours.
## Figures
- Figure 1: Training curve showing convergence
- Figure 2: Wait time distribution comparison
- Figure 3: Throughput over time of day
- Figure 4: Heatmap of queue lengths across intersections
""")
# Generate publication-ready paper
den.get_paper(journal=Journal.APS)
```
## Fast Mode with Gemini
Use Google's Gemini models for faster execution.
### Example: Rapid Prototyping
```python
# Configure for fast mode (conceptual - check denario documentation)
# This would involve setting appropriate LLM backend
den = Denario(project_dir="./fast_research")
# Same workflow, optimized for speed
den.set_data_description("""
Quick analysis needed: Monthly sales data (2 years)
Goal: Identify seasonal patterns and forecast next quarter
Tools: pandas, Prophet
""")
# Fast execution
den.get_idea()
den.get_method()
den.get_results()
den.get_paper()
# Trade-off: Faster execution, potentially less detailed analysis
```
## Hybrid Workflow: Custom Idea + Automated Method
Combine manual and automated approaches.
### Example: Directed Research
```python
den = Denario(project_dir="./hybrid_workflow")
# Describe data
den.set_data_description("""
Medical imaging dataset: 10,000 chest X-rays
Labels: Normal, pneumonia, COVID-19
Format: 224x224 grayscale PNG files
Tools: TensorFlow, Keras, scikit-learn, OpenCV
""")
# Provide specific research direction
den.set_idea("""
Develop a transfer learning approach using pre-trained ResNet50 for multi-class
classification of chest X-rays, with focus on interpretability using Grad-CAM
to identify diagnostic regions
""")
# Let denario develop the methodology
den.get_method()
# Review methodology, then execute
den.get_results()
# Generate paper
den.get_paper(journal=Journal.APS)
```
## Time-Series Analysis Example
Specialized example for temporal data.
### Example: Economic Forecasting
```python
den = Denario(project_dir="./time_series_analysis")
den.set_data_description("""
Dataset: Monthly unemployment rates (US, 1950-2023)
Additional features: GDP growth, inflation, interest rates
Format: Multivariate time-series DataFrame
Tools: statsmodels, Prophet, pmdarima, sklearn
Analysis goals:
- Model unemployment trends
- Forecast next 12 months
- Identify leading indicators
- Assess forecast uncertainty
Data characteristics:
- Seasonal patterns (annual cycles)
- Structural breaks (recessions)
- Autocorrelation present
- Non-stationary (unit root)
""")
den.get_idea()
# Might generate: "Develop a SARIMAX model incorporating economic indicators
# as exogenous variables to forecast unemployment with confidence intervals"
den.get_method()
den.get_results()
den.get_paper(journal=Journal.APS)
```
## Machine Learning Pipeline Example
Complete ML workflow with validation.
### Example: Predictive Modeling
```python
den = Denario(project_dir="./ml_pipeline")
den.set_data_description("""
Dataset: Customer churn prediction
- 50,000 customers, 30 features (demographics, usage patterns, service history)
- Binary target: churned (1) or retained (0)
- Imbalanced: 20% churn rate
- Features: Numerical and categorical mixed
Available tools:
- pandas for preprocessing
- sklearn for modeling (RF, XGBoost, logistic regression)
- imblearn for handling imbalance
- SHAP for feature importance
Goals:
- Build predictive model for churn
- Identify key churn factors
- Provide actionable insights
- Achieve >85% AUC-ROC
""")
den.get_idea()
# Might generate: "Develop an ensemble model combining XGBoost and Random Forest
# with SMOTE oversampling, and use SHAP values to identify interpretable
# churn risk factors"
den.get_method()
# Will include: train/test split, cross-validation, hyperparameter tuning,
# performance metrics, feature importance analysis
den.get_results()
# Executes full ML pipeline, generates:
# - Model performance metrics
# - ROC curves
# - Feature importance plots
# - Confusion matrices
den.get_paper(journal=Journal.APS)
```
## Tips for Effective Usage
### Provide Rich Context
More context → better ideas and methodologies:
```python
# Include:
# - Data characteristics (size, format, quality issues)
# - Available tools and libraries
# - Domain-specific knowledge
# - Research objectives and constraints
# - Known challenges or considerations
```
### Iterate on Intermediate Outputs
Review and refine at each stage:
```python
# Generate
den.get_idea()
# Review idea.md
# If needed, refine:
den.set_idea("Refined version of the idea")
# Continue
den.get_method()
# Review methodology.md
# Refine if needed, then proceed
```
### Save Your Workflow
Document the complete pipeline:
```python
# Save workflow script
with open("research_workflow.py", "w") as f:
f.write("""
from denario import Denario, Journal
den = Denario(project_dir="./project")
den.set_data_description("...")
den.get_idea()
den.get_method()
den.get_results()
den.get_paper(journal=Journal.APS)
""")
```
### Use Version Control
Track research evolution:
```bash
cd project_dir
git init
git add .
git commit -m "Initial data description"
# After each stage
git add .
git commit -m "Generated research idea"
# ... continue committing after each stage
```