# Denario Examples

## Complete End-to-End Research Example

This example demonstrates a full research pipeline from data to publication.

### Setup

```python
from denario import Denario, Journal
import os

# Create project directory
os.makedirs("climate_research", exist_ok=True)
den = Denario(project_dir="./climate_research")
```
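
denario's pipeline stages are driven by LLM backends, so provider credentials usually need to be in place before the first `get_*` call. A minimal pre-flight sketch, assuming keys are supplied through environment variables — the exact variable names denario reads should be confirmed in its documentation; `GEMINI_API_KEY` and `OPENAI_API_KEY` here are assumptions:

```python
import os

# Hypothetical variable names -- confirm the exact keys denario reads in its docs
for key in ("GEMINI_API_KEY", "OPENAI_API_KEY"):
    if not os.environ.get(key):
        print(f"Warning: {key} is not set; the corresponding LLM backend may fail.")
```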

### Define Research Context

```python
den.set_data_description("""
Available data: Global temperature anomaly dataset (1880-2023)
- Monthly mean temperature deviations from the 1951-1980 baseline
- Global coverage with land and ocean measurements
- Format: CSV with columns [year, month, temperature_anomaly]

Available tools:
- pandas for data manipulation
- scipy for statistical analysis
- sklearn for regression modeling
- matplotlib and seaborn for visualization

Research domain: Climate science
Research goal: Quantify and characterize long-term global warming trends

Data source: NASA GISTEMP
Known characteristics: Strong autocorrelation, seasonal patterns, missing data pre-1900
""")
```

### Execute Full Pipeline

```python
# Generate research idea
den.get_idea()
# Output: "Quantify the rate of global temperature increase using
# linear regression and assess acceleration in warming trends"

# Develop methodology
den.get_method()
# Output: Creates methodology including:
# - Time-series preprocessing
# - Linear trend analysis
# - Moving average smoothing
# - Statistical significance testing
# - Visualization of trends

# Execute analysis
den.get_results()
# Output: Runs the analysis and generates:
# - Computed trend: +0.18°C per decade
# - Statistical tests: p < 0.001
# - Figure 1: Temperature anomaly over time with trend line
# - Figure 2: Decadal averages
# - Figure 3: Acceleration analysis

# Generate publication
den.get_paper(journal=Journal.APS)
# Output: Creates a formatted LaTeX paper with:
# - Title, abstract, introduction
# - Methods section
# - Results with embedded figures
# - Discussion and conclusions
# - References
```
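
Each stage writes its output as a markdown file in the project directory (see the listing below), so intermediate products can be inspected programmatically between calls. A minimal sketch using only the files the pipeline is documented to produce:

```python
from pathlib import Path

project = Path("climate_research")
# idea.md, methodology.md, and results.md are written by the corresponding stages
for name in ("idea.md", "methodology.md", "results.md"):
    stage_file = project / name
    if stage_file.exists():
        print(f"--- {name} ---")
        print(stage_file.read_text()[:500])  # preview the first 500 characters
```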

### Review Outputs

```bash
tree climate_research/
# climate_research/
# ├── data_description.txt
# ├── idea.md
# ├── methodology.md
# ├── results.md
# ├── figures/
# │   ├── temperature_trend.png
# │   ├── decadal_averages.png
# │   └── acceleration_analysis.png
# ├── paper.tex
# └── paper.pdf
```

## Enhancing Input Descriptions

Improve data descriptions for better idea generation.

### Basic Description

```python
den = Denario(project_dir="./enhanced_input")

# Start with a minimal description
den.set_data_description("Gene expression data from cancer patients")
```

### Enhanced Description

```python
# Enhance with specifics
den.set_data_description("""
Dataset: Gene expression microarray data from breast cancer patients
- Sample size: 500 patients (250 responders, 250 non-responders to therapy)
- Features: Expression levels of 20,000 genes
- Format: CSV matrix (samples × genes)
- Clinical metadata: Age, tumor stage, treatment response, survival time

Available analytical tools:
- pandas for data processing
- sklearn for machine learning (PCA, random forests, SVM)
- lifelines for survival analysis
- matplotlib/seaborn for visualization

Research objectives:
- Identify gene signatures predictive of treatment response
- Discover potential therapeutic targets
- Validate findings using cross-validation

Data characteristics:
- Normalized log2 expression values
- Some missing data (<5% of values)
- Batch effects corrected
""")

den.get_idea()
# Now generates more specific and relevant research ideas
```

## Literature Search Integration

Incorporate existing research into your workflow.

### Example: Finding Related Work

```python
den = Denario(project_dir="./literature_review")

# Define research area
den.set_data_description("""
Research area: Machine learning for protein structure prediction
Available data: Protein sequence database with known structures
Tools: Biopython, TensorFlow, scikit-learn
""")

# Set the research idea directly
den.set_idea("Develop a deep learning model for predicting protein secondary structure from amino acid sequences")

# NOTE: Literature search functionality would be integrated here.
# Check denario's documentation for the specific literature-search API.
# Conceptual usage:
# den.search_literature(keywords=["protein structure prediction", "deep learning", "LSTM"])
# This would inform the methodology and provide citations for the paper.
```
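
Until the built-in search API is confirmed, related work can be pulled manually from arXiv's public Atom feed and pasted into the data description. A minimal sketch using only the standard library and arXiv's documented query endpoint — no denario calls are assumed:

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Query arXiv's public API for related work
query = urllib.parse.quote("protein secondary structure prediction deep learning")
url = f"http://export.arxiv.org/api/query?search_query=all:{query}&max_results=5"

with urllib.request.urlopen(url) as response:
    feed = ET.fromstring(response.read())

# Entry titles live under the Atom namespace
ns = {"atom": "http://www.w3.org/2005/Atom"}
for entry in feed.findall("atom:entry", ns):
    print(entry.find("atom:title", ns).text.strip())
```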

## Generate Research Ideas from Data

Focus on idea generation without full pipeline execution.

### Example: Brainstorming Research Questions

```python
den = Denario(project_dir="./idea_generation")

# Provide a comprehensive data description
den.set_data_description("""
Available datasets:
1. Social media sentiment data (1M tweets, 2020-2023)
2. Stock market prices (S&P 500, daily, 2020-2023)
3. Economic indicators (GDP, unemployment, inflation)

Tools: pandas, sklearn, statsmodels, Prophet, VADER sentiment analysis

Domain: Computational social science and finance
Research interests: Market prediction, sentiment analysis, causal inference
""")

# Generate an idea (rerun to brainstorm alternatives; see the sketch
# after this example -- multi-idea support depends on the denario API)
den.get_idea()

# Review the generated idea in idea.md
# Decide whether to proceed or regenerate
```
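
To brainstorm several candidates, one workable pattern is to regenerate and archive each `idea.md` before the next run overwrites it. A sketch under the assumption that `get_idea()` rewrites `idea.md` in place on every call:

```python
import shutil
from pathlib import Path

project = Path("idea_generation")
# Assumes each get_idea() call overwrites idea.md in the project directory
for i in range(3):
    den.get_idea()
    shutil.copy(project / "idea.md", project / f"idea_candidate_{i}.md")

# Compare the archived candidates, then set the winner explicitly:
# den.set_idea((project / "idea_candidate_1.md").read_text())
```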

## Writing a Paper from Existing Results

Use denario for paper generation when the analysis is already complete.

### Example: Formatting Existing Research

```python
den = Denario(project_dir="./paper_generation")

# Provide all components manually
den.set_data_description("""
Completed analysis of traffic pattern data from urban sensors
Dataset: 6 months of traffic flow measurements from 100 intersections
Analysis completed using R and Python
""")

den.set_idea("""
Research question: Optimize traffic light timing using reinforcement learning
to reduce congestion and improve traffic flow efficiency
""")

den.set_method("""
# Methodology

## Data Collection
Traffic flow data collected from 100 intersections in the downtown area from
January-June 2023. Measurements include vehicle counts, wait times, and
queue lengths at 1-minute intervals.

## Model Development
Developed a Deep Q-Network (DQN) reinforcement learning agent to optimize
traffic light timing. The state space includes current queue lengths and
historical flow patterns. Actions correspond to light timing adjustments.

## Training
Trained the agent on historical data with a reward function based on
total wait time reduction. Used experience replay and target networks for
stable learning.

## Validation
Validated on held-out test data against:
- Current fixed-timing system
- Actuated control system
- Alternative RL algorithms (A3C, PPO)

## Metrics
- Average wait time reduction
- Total throughput improvement
- Queue length distribution
- Computational efficiency
""")

den.set_results("""
# Results

## Training Performance
The DQN agent converged after 500,000 training episodes. Training time: 12 hours
on an NVIDIA V100 GPU.

## Wait Time Reduction
- Current system: average wait time 45.2 seconds
- DQN system: average wait time 32.8 seconds
- Improvement: 27.4% reduction (p < 0.001)

## Throughput Analysis
- Vehicles processed per hour increased from 2,850 to 3,420 (+20%)
- Peak-hour congestion reduced by 35%

## Comparison with Baselines
- Actuated control: 38.1 seconds average wait (DQN 14% better)
- A3C: 34.5 seconds (DQN 5% better)
- PPO: 33.2 seconds (DQN 1% better)

## Queue Length Analysis
Maximum queue length reduced from 42 vehicles to 28 vehicles during peak hours.

## Figures
- Figure 1: Training curve showing convergence
- Figure 2: Wait time distribution comparison
- Figure 3: Throughput over time of day
- Figure 4: Heatmap of queue lengths across intersections
""")

# Generate a publication-ready paper
den.get_paper(journal=Journal.APS)
```

## Fast Mode with Gemini

Use Google's Gemini models for faster execution.

### Example: Rapid Prototyping

```python
# Configure for fast mode (conceptual - check denario's documentation
# for how to select the LLM backend)

den = Denario(project_dir="./fast_research")

# Same workflow, optimized for speed
den.set_data_description("""
Quick analysis needed: Monthly sales data (2 years)
Goal: Identify seasonal patterns and forecast next quarter
Tools: pandas, Prophet
""")

# Fast execution
den.get_idea()
den.get_method()
den.get_results()
den.get_paper()

# Trade-off: faster execution, potentially less detailed analysis
```
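
Gemini-backed runs need Google API credentials in place before the pipeline starts. A minimal sketch, assuming denario picks the key up from the environment — the variable name is an assumption; Google's client libraries commonly read `GEMINI_API_KEY` or `GOOGLE_API_KEY`:

```python
import os

# Assumed variable name -- confirm which one denario actually reads
if "GEMINI_API_KEY" not in os.environ:
    raise RuntimeError("Set GEMINI_API_KEY before running the fast pipeline.")
```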

## Hybrid Workflow: Custom Idea + Automated Method

Combine manual and automated approaches.

### Example: Directed Research

```python
den = Denario(project_dir="./hybrid_workflow")

# Describe data
den.set_data_description("""
Medical imaging dataset: 10,000 chest X-rays
Labels: Normal, pneumonia, COVID-19
Format: 224x224 grayscale PNG files
Tools: TensorFlow, Keras, scikit-learn, OpenCV
""")

# Provide a specific research direction
den.set_idea("""
Develop a transfer learning approach using a pre-trained ResNet50 for multi-class
classification of chest X-rays, with a focus on interpretability using Grad-CAM
to identify diagnostic regions
""")

# Let denario develop the methodology
den.get_method()

# Review methodology.md, then execute
den.get_results()

# Generate paper
den.get_paper(journal=Journal.APS)
```
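
When the methodology needs sign-off before any compute is spent, the review step can be made explicit by pausing the script. A minimal sketch, assuming the method stage writes `methodology.md` into the project directory as in the earlier examples:

```python
from pathlib import Path

den.get_method()

# Show the generated methodology and wait for explicit approval
print(Path("hybrid_workflow/methodology.md").read_text())
if input("Proceed with execution? [y/N] ").strip().lower() == "y":
    den.get_results()
    den.get_paper(journal=Journal.APS)
```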

## Time-Series Analysis Example

Specialized example for temporal data.

### Example: Economic Forecasting

```python
den = Denario(project_dir="./time_series_analysis")

den.set_data_description("""
Dataset: Monthly unemployment rates (US, 1950-2023)
Additional features: GDP growth, inflation, interest rates
Format: Multivariate time-series DataFrame
Tools: statsmodels, Prophet, pmdarima, sklearn

Analysis goals:
- Model unemployment trends
- Forecast next 12 months
- Identify leading indicators
- Assess forecast uncertainty

Data characteristics:
- Seasonal patterns (annual cycles)
- Structural breaks (recessions)
- Autocorrelation present
- Non-stationary (unit root)
""")

den.get_idea()
# Might generate: "Develop a SARIMAX model incorporating economic indicators
# as exogenous variables to forecast unemployment with confidence intervals"

den.get_method()
den.get_results()
den.get_paper(journal=Journal.APS)
```
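
For reference, the model such an idea points at is a few lines in statsmodels. A minimal sketch on synthetic stand-in data — the `order` and `seasonal_order` values are illustrative, not tuned:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Synthetic stand-ins for the unemployment series and one exogenous indicator
rng = np.random.default_rng(0)
idx = pd.date_range("2000-01-01", periods=240, freq="MS")
y = pd.Series(5 + np.sin(np.arange(240) * 2 * np.pi / 12) + rng.normal(0, 0.3, 240), index=idx)
exog = pd.Series(rng.normal(2, 0.5, 240), index=idx, name="gdp_growth")

# Illustrative orders; a real analysis would select them via AIC or pmdarima
model = SARIMAX(y, exog=exog, order=(1, 1, 1), seasonal_order=(1, 0, 1, 12))
fit = model.fit(disp=False)

# 12-month forecast with confidence intervals, holding the indicator flat
future_idx = pd.date_range(idx[-1] + pd.offsets.MonthBegin(), periods=12, freq="MS")
future_exog = pd.Series([2.0] * 12, index=future_idx, name="gdp_growth")
print(fit.get_forecast(steps=12, exog=future_exog).summary_frame())
```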

## Machine Learning Pipeline Example

Complete ML workflow with validation.

### Example: Predictive Modeling

```python
den = Denario(project_dir="./ml_pipeline")

den.set_data_description("""
Dataset: Customer churn prediction
- 50,000 customers, 30 features (demographics, usage patterns, service history)
- Binary target: churned (1) or retained (0)
- Imbalanced: 20% churn rate
- Features: mixed numerical and categorical

Available tools:
- pandas for preprocessing
- sklearn and xgboost for modeling (RF, gradient boosting, logistic regression)
- imblearn for handling imbalance
- SHAP for feature importance

Goals:
- Build a predictive model for churn
- Identify key churn factors
- Provide actionable insights
- Achieve >85% AUC-ROC
""")

den.get_idea()
# Might generate: "Develop an ensemble model combining XGBoost and Random Forest
# with SMOTE oversampling, and use SHAP values to identify interpretable
# churn risk factors"

den.get_method()
# Will include: train/test split, cross-validation, hyperparameter tuning,
# performance metrics, feature importance analysis

den.get_results()
# Executes the full ML pipeline and generates:
# - Model performance metrics
# - ROC curves
# - Feature importance plots
# - Confusion matrices

den.get_paper(journal=Journal.APS)
```
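
The core of the idea denario might propose takes only a few lines with the tools named above. A minimal sketch on synthetic imbalanced data, with SMOTE inside an imblearn `Pipeline` so oversampling happens only on training folds:

```python
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in: 20% positive class, mirroring the described churn rate
X, y = make_classification(n_samples=5000, n_features=30, weights=[0.8, 0.2], random_state=0)

# SMOTE is applied per training fold, avoiding leakage into validation folds
pipeline = Pipeline([
    ("smote", SMOTE(random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
])
scores = cross_val_score(pipeline, X, y, cv=5, scoring="roc_auc")
print(f"AUC-ROC: {scores.mean():.3f} ± {scores.std():.3f}")
```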

## Tips for Effective Usage

### Provide Rich Context

More context → better ideas and methodologies:

```python
# Include:
# - Data characteristics (size, format, quality issues)
# - Available tools and libraries
# - Domain-specific knowledge
# - Research objectives and constraints
# - Known challenges or considerations
```
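
One way to make that checklist routine is a small template helper. A hedged sketch — the section labels follow the examples above rather than any denario requirement, and `den` is assumed to be an existing `Denario` instance:

```python
def build_description(data, tools, domain, goals, characteristics):
    """Assemble a data description covering the checklist above."""
    return f"""
Dataset: {data}

Available tools:
{tools}

Research domain: {domain}
Research goals:
{goals}

Data characteristics:
{characteristics}
"""

den.set_data_description(build_description(
    data="Monthly sales records, 2 years, CSV",
    tools="- pandas\n- Prophet",
    domain="Retail analytics",
    goals="- Identify seasonal patterns\n- Forecast next quarter",
    characteristics="- Strong yearly seasonality\n- A few missing months",
))
```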

### Iterate on Intermediate Outputs

Review and refine at each stage:

```python
# Generate
den.get_idea()

# Review idea.md
# If needed, refine:
den.set_idea("Refined version of the idea")

# Continue
den.get_method()
# Review methodology.md
# Refine if needed, then proceed
```

### Save Your Workflow

Document the complete pipeline:

```python
# Save the workflow script
with open("research_workflow.py", "w") as f:
    f.write("""
from denario import Denario, Journal

den = Denario(project_dir="./project")
den.set_data_description("...")
den.get_idea()
den.get_method()
den.get_results()
den.get_paper(journal=Journal.APS)
""")
```

### Use Version Control

Track research evolution:

```bash
cd project_dir
git init
git add .
git commit -m "Initial data description"

# After each stage
git add .
git commit -m "Generated research idea"
# ... continue committing after each stage
```