
Zhongwei Li f0bd18fb4e Initial commit

2025-11-30 08:30:10 +08:00


Denario Examples

Complete End-to-End Research Example

This example demonstrates a full research pipeline from data to publication.

Setup

from denario import Denario, Journal
import os

# Create project directory
os.makedirs("climate_research", exist_ok=True)
den = Denario(project_dir="./climate_research")

Define Research Context

den.set_data_description("""
Available data: Global temperature anomaly dataset (1880-2023)
- Monthly mean temperature deviations from 1951-1980 baseline
- Global coverage with land and ocean measurements
- Format: CSV with columns [year, month, temperature_anomaly]

Available tools:
- pandas for data manipulation
- scipy for statistical analysis
- sklearn for regression modeling
- matplotlib and seaborn for visualization

Research domain: Climate science
Research goal: Quantify and characterize long-term global warming trends

Data source: NASA GISTEMP
Known characteristics: Strong autocorrelation, seasonal patterns, missing data pre-1900
""")

Execute Full Pipeline

# Generate research idea
den.get_idea()
# Example output (illustrative; generated text varies between runs):
# "Quantify the rate of global temperature increase using
# linear regression and assess acceleration in warming trends"

# Develop methodology
den.get_method()
# Writes a methodology, for example covering:
# - Time-series preprocessing
# - Linear trend analysis
# - Moving average smoothing
# - Statistical significance testing
# - Visualization of trends

# Execute analysis
den.get_results()
# Runs the analysis; illustrative outputs (actual values depend on the run):
# - Computed trend: +0.18°C per decade
# - Statistical tests: p < 0.001
# - Figure 1: Temperature anomaly over time with trend line
# - Figure 2: Decadal averages
# - Figure 3: Acceleration analysis

# Generate publication
den.get_paper(journal=Journal.APS)
# Produces a LaTeX paper formatted for the chosen journal, with:
# - Title, abstract, introduction
# - Methods section
# - Results with embedded figures
# - Discussion and conclusions
# - References
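The "+0.18°C per decade" figure above is the slope of a linear trend expressed per ten years. Below is a minimal standard-library sketch of that computation. It assumes monthly rows shaped like the CSV described earlier (`year, month, temperature_anomaly`) and averages them to annual means before the least-squares fit. `annual_means` and `decadal_trend` are hypothetical helpers, not part of denario, and the data is synthetic.

```python
from collections import defaultdict
from statistics import fmean

def annual_means(rows):
    """Average monthly (year, month, anomaly) rows into one value per year."""
    by_year = defaultdict(list)
    for year, _month, anomaly in rows:
        by_year[year].append(anomaly)
    years = sorted(by_year)
    return years, [fmean(by_year[y]) for y in years]

def decadal_trend(rows):
    """Ordinary least-squares slope of the annual means, in degrees per decade."""
    years, values = annual_means(rows)
    mx, my = fmean(years), fmean(values)
    sxy = sum((x - mx) * (y - my) for x, y in zip(years, values))
    sxx = sum((x - mx) ** 2 for x in years)
    return sxy / sxx * 10  # slope is per year; x10 gives per decade

# Synthetic series warming at exactly 0.02 °C/year (illustration only, not GISTEMP)
rows = [(y, m, 0.02 * (y - 1980)) for y in range(1980, 2024) for m in range(1, 13)]
print(round(decadal_trend(rows), 3))  # 0.2
```

The description notes strong autocorrelation. Because of that, the slope is a reasonable point estimate, but naive OLS standard errors and p-values will be too optimistic unless the methodology accounts for it.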

Review Outputs

tree climate_research/
# Example layout (illustrative; exact file names may vary by denario version):
# climate_research/
# ├── data_description.txt
# ├── idea.md
# ├── methodology.md
# ├── results.md
# ├── figures/
# │   ├── temperature_trend.png
# │   ├── decadal_averages.png
# │   └── acceleration_analysis.png
# ├── paper.tex
# └── paper.pdf
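The same check can be scripted so a missing stage output is noticed before moving on. This is a minimal `pathlib` sketch. The expected names are copied from the listing above and are an assumption about what your denario version writes. `missing_outputs` and `list_figures` are hypothetical helpers.

```python
from pathlib import Path

# Names taken from the listing above; adjust to what your denario version writes.
EXPECTED = ["data_description.txt", "idea.md", "methodology.md", "results.md", "paper.tex"]

def missing_outputs(project_dir, expected=EXPECTED):
    """Return the expected files that do not exist under project_dir."""
    root = Path(project_dir)
    return [name for name in expected if not (root / name).is_file()]

def list_figures(project_dir):
    """Return the sorted names of PNG files in project_dir/figures (empty if absent)."""
    fig_dir = Path(project_dir) / "figures"
    if not fig_dir.is_dir():
        return []
    return sorted(p.name for p in fig_dir.glob("*.png"))

# Example usage:
# print("missing:", missing_outputs("climate_research"))
# print("figures:", list_figures("climate_research"))
```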

Enhancing Input Descriptions

Improve data descriptions for better idea generation.

Basic Description

den = Denario(project_dir="./enhanced_input")

# Start with minimal description
den.set_data_description("Gene expression data from cancer patients")

Enhanced Description

# Enhance with specifics
den.set_data_description("""
Dataset: Gene expression microarray data from breast cancer patients
- Sample size: 500 patients (250 responders, 250 non-responders to therapy)
- Features: Expression levels of 20,000 genes
- Format: CSV matrix (samples × genes)
- Clinical metadata: Age, tumor stage, treatment response, survival time

Available analytical tools:
- pandas for data processing
- sklearn for machine learning (PCA, random forests, SVM)
- lifelines for survival analysis
- matplotlib/seaborn for visualization

Research objectives:
- Identify gene signatures predictive of treatment response
- Discover potential therapeutic targets
- Validate findings using cross-validation

Data characteristics:
- Normalized log2 expression values
- Some missing data (<5% of values)
- Batch effects corrected
""")

den.get_idea()
# A richer description typically yields more specific, relevant research ideas

Literature Search Integration

Incorporate existing research into your workflow.

den = Denario(project_dir="./literature_review")

# Define research area
den.set_data_description("""
Research area: Machine learning for protein structure prediction
Available data: Protein sequence database with known structures
Tools: Biopython, TensorFlow, scikit-learn
""")

# Set the idea directly instead of generating it
den.set_idea("Develop a deep learning model for predicting protein secondary structure from amino acid sequences")

# NOTE: Literature search functionality would be integrated here
# The specific API for literature search should be checked in denario's documentation
# Example conceptual usage:
# den.search_literature(keywords=["protein structure prediction", "deep learning", "LSTM"])
# This would inform methodology and provide citations for the paper

Generate Research Ideas from Data

Focus on idea generation without full pipeline execution.

Example: Brainstorming Research Questions

den = Denario(project_dir="./idea_generation")

# Provide comprehensive data description
den.set_data_description("""
Available datasets:
1. Social media sentiment data (1M tweets, 2020-2023)
2. Stock market prices (S&P 500, daily, 2020-2023)
3. Economic indicators (GDP, unemployment, inflation)

Tools: pandas, sklearn, statsmodels, Prophet, VADER sentiment analysis

Domain: Computational social science and finance
Research interests: Market prediction, sentiment analysis, causal inference
""")

# Generate a candidate idea; to explore alternatives, review it and re-run
# (conceptual; how repeated calls behave depends on the denario API)
den.get_idea()

# Review the generated idea in idea.md
# Decide whether to proceed or regenerate
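Regenerating may overwrite `idea.md`. This is an assumption, so check how your denario version stores ideas. If it does, snapshot each version first so alternatives can be compared side by side. The following is a minimal stdlib sketch. `archive_version` is a hypothetical helper, and the `idea.md` location comes from the output listing earlier.

```python
import shutil
from pathlib import Path

def archive_version(path):
    """Copy path to the next free numbered sibling (idea.v1.md, idea.v2.md, ...).

    Returns the archive path, or None if the file does not exist yet.
    """
    src = Path(path)
    if not src.is_file():
        return None
    n = 1
    while True:
        dest = src.with_name(f"{src.stem}.v{n}{src.suffix}")
        if not dest.exists():
            shutil.copy2(src, dest)  # copy2 also preserves timestamps
            return dest
        n += 1

# Hypothetical usage between regenerations:
# archive_version("idea_generation/idea.md")
# den.get_idea()
```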

Writing a Paper from Existing Results

Use denario for paper generation when analysis is already complete.

Example: Formatting Existing Research

den = Denario(project_dir="./paper_generation")

# Provide all components manually
den.set_data_description("""
Completed analysis of traffic pattern data from urban sensors
Dataset: 6 months of traffic flow measurements from 100 intersections
Analysis completed using R and Python
""")

den.set_idea("""
Research question: Optimize traffic light timing using reinforcement learning
to reduce congestion and improve traffic flow efficiency
""")

den.set_method("""
# Methodology

## Data Collection
Traffic flow data collected from 100 intersections in a downtown area from
January-June 2023. Measurements include vehicle counts, wait times, and
queue lengths at 1-minute intervals.

## Model Development
Developed a Deep Q-Network (DQN) reinforcement learning agent to optimize
traffic light timing. State space includes current queue lengths and
historical flow patterns. Actions correspond to light timing adjustments.

## Training
Trained the agent using historical data with a reward function based on
total wait time reduction. Used experience replay and target networks for
stable learning.

## Validation
Validated using held-out test data and compared against:
- Current fixed-timing system
- Actuated control system
- Alternative RL algorithms (A3C, PPO)

## Metrics
- Average wait time reduction
- Total throughput improvement
- Queue length distribution
- Computational efficiency
""")

den.set_results("""
# Results

## Training Performance
The DQN agent converged after 500,000 training episodes. Training time: 12 hours
on an NVIDIA V100 GPU.

## Wait Time Reduction
- Current system: Average wait time 45.2 seconds
- DQN system: Average wait time 32.8 seconds
- Improvement: 27.4% reduction (p < 0.001)

## Throughput Analysis
- Vehicles processed per hour increased from 2,850 to 3,420 (+20%)
- Peak hour congestion reduced by 35%

## Comparison with Baselines
- Actuated control: 38.1 seconds average wait (DQN still 14% better)
- A3C: 34.5 seconds (DQN slightly better, 5%)
- PPO: 33.2 seconds (DQN marginally better, 1%)

## Queue Length Analysis
Maximum queue length reduced from 42 vehicles to 28 vehicles during peak hours.

## Figures
- Figure 1: Training curve showing convergence
- Figure 2: Wait time distribution comparison
- Figure 3: Throughput over time of day
- Figure 4: Heatmap of queue lengths across intersections
""")

# Generate publication-ready paper
den.get_paper(journal=Journal.APS)
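The paper is written from the numbers you supply, so recompute derived percentages before generating it. Here is a small check of the figures in the results text above. `pct_change` is a hypothetical helper.

```python
def pct_change(old, new):
    """Percentage change from old to new (negative means a reduction)."""
    return (new - old) / old * 100

# Figures copied from the results text above
print(round(pct_change(45.2, 32.8), 1))  # -27.4  (wait time vs. current system)
print(round(pct_change(2850, 3420)))     # 20     (vehicles per hour)
for name, baseline in [("Actuated", 38.1), ("A3C", 34.5), ("PPO", 33.2)]:
    # DQN improvement relative to each baseline's average wait
    print(name, round(-pct_change(baseline, 32.8)))  # 14, 5, 1
```

All of the stated percentages (27.4%, +20%, 14%, 5%, 1%) agree with the raw values.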

Fast Mode with Gemini

Use Google's Gemini models for faster execution.

Example: Rapid Prototyping

# Configure fast mode (conceptual; check the denario documentation)
# This would involve selecting a Gemini model as the LLM backend

den = Denario(project_dir="./fast_research")

# Same workflow; any speed-up comes from the configured backend
den.set_data_description("""
Quick analysis needed: Monthly sales data (2 years)
Goal: Identify seasonal patterns and forecast next quarter
Tools: pandas, Prophet
""")

# Run all four stages
den.get_idea()
den.get_method()
den.get_results()
den.get_paper()

# Trade-off: Faster execution, potentially less detailed analysis
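To judge that trade-off on your own project, time each stage under each backend configuration and compare the results. This is a minimal standard-library sketch. `timed` is a hypothetical helper, and nothing here configures denario's LLM backend.

```python
import time

def timed(label, fn, *args, **kwargs):
    """Call fn, print how long it took, and return (result, seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.1f}s")
    return result, elapsed

# Hypothetical usage with the calls shown above:
# for label, stage in [("idea", den.get_idea), ("method", den.get_method),
#                      ("results", den.get_results), ("paper", den.get_paper)]:
#     timed(label, stage)
```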

Hybrid Workflow: Custom Idea + Automated Method

Combine manual and automated approaches.

Example: Directed Research

den = Denario(project_dir="./hybrid_workflow")

# Describe data
den.set_data_description("""
Medical imaging dataset: 10,000 chest X-rays
Labels: Normal, pneumonia, COVID-19
Format: 224x224 grayscale PNG files
Tools: TensorFlow, Keras, scikit-learn, OpenCV
""")

# Provide specific research direction
den.set_idea("""
Develop a transfer learning approach using pre-trained ResNet50 for multi-class
classification of chest X-rays, with focus on interpretability using Grad-CAM
to identify diagnostic regions
""")

# Let denario develop the methodology
den.get_method()

# Review methodology, then execute
den.get_results()

# Generate paper
den.get_paper(journal=Journal.APS)

Time-Series Analysis Example

Specialized example for temporal data.

Example: Economic Forecasting

den = Denario(project_dir="./time_series_analysis")

den.set_data_description("""
Dataset: Monthly unemployment rates (US, 1950-2023)
Additional features: GDP growth, inflation, interest rates
Format: Multivariate time-series DataFrame
Tools: statsmodels, Prophet, pmdarima, sklearn

Analysis goals:
- Model unemployment trends
- Forecast next 12 months
- Identify leading indicators
- Assess forecast uncertainty

Data characteristics:
- Seasonal patterns (annual cycles)
- Structural breaks (recessions)
- Autocorrelation present
- Non-stationary (unit root)
""")

den.get_idea()
# Might generate: "Develop a SARIMAX model incorporating economic indicators
# as exogenous variables to forecast unemployment with confidence intervals"

den.get_method()
den.get_results()
den.get_paper(journal=Journal.APS)
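The data characteristics listed above (autocorrelation, annual seasonality, a unit root) are things the generated methodology should diagnose and handle. For intuition, here is a minimal stdlib sketch of two standard diagnostics: lag-k autocorrelation and (seasonal) differencing. In practice, statsmodels provides tested versions, such as `acf` and the augmented Dickey-Fuller test `adfuller` in `statsmodels.tsa.stattools`.

```python
from statistics import fmean

def autocorr(xs, lag=1):
    """Sample autocorrelation at the given lag (standard biased estimator)."""
    m = fmean(xs)
    denom = sum((x - m) ** 2 for x in xs)
    num = sum((xs[i] - m) * (xs[i + lag] - m) for i in range(len(xs) - lag))
    return num / denom

def difference(xs, lag=1):
    """Lagged differences: lag=1 removes a linear trend, lag=12 a fixed monthly cycle."""
    return [xs[i] - xs[i - lag] for i in range(lag, len(xs))]

# A trending monthly series with a repeating 12-month pattern (synthetic)
series = [0.5 * t + (t % 12) for t in range(48)]
print(round(autocorr(series), 2))       # strongly autocorrelated
print(set(difference(series, lag=12)))  # {6.0}: seasonal differencing leaves only the trend step
```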

Machine Learning Pipeline Example

Complete ML workflow with validation.

Example: Predictive Modeling

den = Denario(project_dir="./ml_pipeline")

den.set_data_description("""
Dataset: Customer churn prediction
- 50,000 customers, 30 features (demographics, usage patterns, service history)
- Binary target: churned (1) or retained (0)
- Imbalanced: 20% churn rate
- Feature types: mixed numerical and categorical

Available tools:
- pandas for preprocessing
- sklearn for modeling (random forest, logistic regression)
- xgboost for gradient-boosted trees (sklearn-compatible API)
- imblearn for handling imbalance
- SHAP for feature importance

Goals:
- Build predictive model for churn
- Identify key churn factors
- Provide actionable insights
- Achieve AUC-ROC > 0.85
""")

den.get_idea()
# Might generate: "Develop an ensemble model combining XGBoost and Random Forest
# with SMOTE oversampling, and use SHAP values to identify interpretable
# churn risk factors"

den.get_method()
# Typically includes: train/test split, cross-validation, hyperparameter tuning,
# performance metrics, feature importance analysis

den.get_results()
# Runs the ML pipeline; expected outputs include:
# - Model performance metrics
# - ROC curves
# - Feature importance plots
# - Confusion matrices

den.get_paper(journal=Journal.APS)
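The AUC-ROC target in the goals above has a direct interpretation. It is the probability that a randomly chosen churner gets a higher risk score than a randomly chosen retained customer. Here is a small sketch of that definition for intuition. For real evaluation, use `sklearn.metrics.roc_auc_score`.

```python
def auc_roc(labels, scores):
    """AUC-ROC as the probability that a random positive outranks a random negative.

    Ties count as half a win. O(n_pos * n_neg): fine for checks, not for 50,000 rows.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))

print(auc_roc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

AUC compares positives only against negatives, so it does not change with the 20% churn rate itself. This is one reason it is a common target for imbalanced problems.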

Tips for Effective Usage

Provide Rich Context

More context → better ideas and methodologies:

# Include:
# - Data characteristics (size, format, quality issues)
# - Available tools and libraries
# - Domain-specific knowledge
# - Research objectives and constraints
# - Known challenges or considerations
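To follow this checklist consistently, you can assemble the description from named sections so that empty items stand out before anything is sent. This is a minimal sketch. `build_description` is a hypothetical helper, and the section names simply mirror the checklist above; denario does not require them.

```python
def build_description(sections):
    """Join named sections into one plain-text data description.

    `sections` maps a heading to a string or a list of bullet strings.
    Returns (text, missing), where missing lists headings left empty.
    """
    blocks, missing = [], []
    for heading, body in sections.items():
        if not body:
            missing.append(heading)
            continue
        if isinstance(body, str):
            text = body.strip()
        else:
            text = "\n".join(f"- {item}" for item in body)
        blocks.append(f"{heading}:\n{text}")
    return "\n\n".join(blocks), missing

desc, missing = build_description({
    "Dataset": ["Monthly sales data (2 years)"],
    "Available tools": ["pandas", "Prophet"],
    "Research objectives": ["Identify seasonal patterns", "Forecast next quarter"],
    "Known challenges": [],
})
if missing:
    print("Consider filling in:", ", ".join(missing))  # Consider filling in: Known challenges
# den.set_data_description(desc)  # hypothetical hand-off to denario
```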

Iterate on Intermediate Outputs

Review and refine at each stage:

# Generate
den.get_idea()

# Review idea.md
# If needed, refine:
den.set_idea("Refined version of the idea")

# Continue
den.get_method()
# Review methodology.md
# Refine if needed, then proceed
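This review loop can be wrapped so that each stage pauses for a decision before the next, more expensive one runs. This is a minimal sketch. `run_with_review` and the interactive `review` callback are hypothetical. Refinements, such as calling `den.set_idea`, happen between runs.

```python
def run_with_review(stages, review):
    """Run (name, callable) stages in order, calling review(name) after each.

    Stops early, returning the names completed so far, if review returns False.
    """
    done = []
    for name, stage in stages:
        stage()
        done.append(name)
        if not review(name):
            break
    return done

# Hypothetical usage with the calls above:
# def review(name):
#     answer = input(f"Inspect the {name} output; continue? [y/N] ")
#     return answer.strip().lower() == "y"
# run_with_review([("idea", den.get_idea), ("method", den.get_method),
#                  ("results", den.get_results)], review)
```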

Save Your Workflow

Document the complete pipeline:

# Save workflow script
with open("research_workflow.py", "w") as f:
    f.write("""
from denario import Denario, Journal

den = Denario(project_dir="./project")
den.set_data_description("...")
den.get_idea()
den.get_method()
den.get_results()
den.get_paper(journal=Journal.APS)
""")

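The template above saves a fixed placeholder description. If you want the saved script to reproduce a specific run, you can generate it with the real inputs embedded. This is a minimal sketch. `render_workflow` is a hypothetical helper, and the denario calls it writes are exactly the ones shown in this document.

```python
from textwrap import dedent

def render_workflow(project_dir, description, journal="APS"):
    """Return a denario script with the real inputs embedded.

    repr() quotes the strings safely, so descriptions containing quotes or
    newlines cannot break the generated file. `journal` must name a Journal
    member; APS is the one used throughout these examples.
    """
    return dedent(f"""\
        from denario import Denario, Journal

        den = Denario(project_dir={project_dir!r})
        den.set_data_description({description!r})
        den.get_idea()
        den.get_method()
        den.get_results()
        den.get_paper(journal=Journal.{journal})
        """)

# Example usage:
# with open("research_workflow.py", "w") as f:
#     f.write(render_workflow("./project", "Monthly sales data (2 years)"))
```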
Use Version Control

Track research evolution:

cd project_dir
git init
git add .
git commit -m "Initial data description"

# After each stage
git add .
git commit -m "Generated research idea"
# ... continue committing after each stage
