zhongwei/gh-k-dense-ai-claude-scientific-writer

Zhongwei Li 1dd5bee3b4 Initial commit

2025-11-30 08:30:14 +08:00

OpenRouter Integration for MarkItDown

Overview

This MarkItDown skill has been configured to use OpenRouter instead of direct OpenAI API access. OpenRouter provides a unified API gateway to access 100+ AI models from different providers through a single, OpenAI-compatible interface.

Why OpenRouter?

Benefits

Multiple Model Access: Access GPT-4, Claude, Gemini, and 100+ other models through one API
No Vendor Lock-in: Switch between models without code changes
Competitive Pricing: Often better rates than going direct
Simple Migration: OpenAI-compatible API means minimal code changes
Flexible Choice: Choose the best model for each task

Popular Models for Image Description

Model	Provider	Use Case	Vision Support
`anthropic/claude-sonnet-4.5`	Anthropic	Recommended - Best overall for scientific analysis	✅
`anthropic/claude-3.5-sonnet`	Anthropic	Excellent technical analysis	✅
`openai/gpt-4o`	OpenAI	Strong vision understanding	✅
`openai/gpt-4-vision`	OpenAI	GPT-4 with vision	✅
`google/gemini-pro-vision`	Google	Cost-effective option	✅

See https://openrouter.ai/models for the complete list.

Getting Started

1. Get an API Key

Visit https://openrouter.ai/keys
Sign up or log in
Create a new API key
Copy the key (starts with sk-or-v1-...)

2. Set Environment Variable

# Add to your environment
export OPENROUTER_API_KEY="sk-or-v1-..."

# Make it permanent
echo 'export OPENROUTER_API_KEY="sk-or-v1-..."' >> ~/.zshrc  # macOS
echo 'export OPENROUTER_API_KEY="sk-or-v1-..."' >> ~/.bashrc # Linux

# Reload shell
source ~/.zshrc  # or source ~/.bashrc
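
Once the variable is exported, a Python script can read it instead of hard-coding the key. The following is a minimal sketch: `load_openrouter_key` is a hypothetical helper name (not part of MarkItDown or the OpenAI SDK), and the prefix check assumes keys follow the `sk-or-v1-...` pattern shown in step 1.

```python
import os


def load_openrouter_key(env=None):
    """Return the OpenRouter API key from the environment, or raise a clear error.

    `env` defaults to os.environ; passing a dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    key = env.get("OPENROUTER_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENROUTER_API_KEY is not set; export it before running.")
    if not key.startswith("sk-or-v1-"):
        # Assumption: keys use the sk-or-v1- prefix described in this guide.
        raise RuntimeError("OPENROUTER_API_KEY does not look like an OpenRouter key.")
    return key
```

The returned value can then be passed as `api_key=` when constructing the `OpenAI` client, so a missing key fails early with a readable message.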

3. Use in Python

from markitdown import MarkItDown
from openai import OpenAI

# Initialize OpenRouter client (OpenAI-compatible)
client = OpenAI(
    api_key="your-openrouter-api-key",  # or use env var
    base_url="https://openrouter.ai/api/v1"
)

# Create MarkItDown with AI support
md = MarkItDown(
    llm_client=client,
    llm_model="anthropic/claude-sonnet-4.5"  # Choose your model
)

# Convert with AI-enhanced descriptions
result = md.convert("presentation.pptx")
print(result.text_content)

Using the Scripts

All skill scripts have been updated to use OpenRouter:

convert_with_ai.py

# Set API key
export OPENROUTER_API_KEY="sk-or-v1-..."

# Convert with default model (Claude Sonnet 4.5)
python scripts/convert_with_ai.py paper.pdf output.md --prompt-type scientific

# Use GPT-4o as alternative
python scripts/convert_with_ai.py paper.pdf output.md \
  --model openai/gpt-4o \
  --prompt-type scientific

# Use Gemini Pro Vision (cost-effective)
python scripts/convert_with_ai.py slides.pptx output.md \
  --model google/gemini-pro-vision \
  --prompt-type presentation

# List available prompt types
python scripts/convert_with_ai.py --list-prompts

Choosing the Right Model

# For scientific papers - use Claude Sonnet 4.5 for technical analysis
python scripts/convert_with_ai.py research.pdf output.md \
  --model anthropic/claude-sonnet-4.5 \
  --prompt-type scientific

# For presentations - use Claude Sonnet 4.5 for vision
python scripts/convert_with_ai.py slides.pptx output.md \
  --model anthropic/claude-sonnet-4.5 \
  --prompt-type presentation

# For data visualizations - use Claude Sonnet 4.5
python scripts/convert_with_ai.py charts.pdf output.md \
  --model anthropic/claude-sonnet-4.5 \
  --prompt-type data_viz

# For medical images - use Claude Sonnet 4.5 for detailed analysis
python scripts/convert_with_ai.py xray.jpg output.md \
  --model anthropic/claude-sonnet-4.5 \
  --prompt-type medical
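
When these commands are launched from Python rather than typed by hand, the content-to-prompt mapping can live in one place. This is a sketch only: `build_convert_command` and `PROMPT_TYPE_BY_CONTENT` are hypothetical names, and the content labels (`paper`, `slides`, and so on) are illustrative. Only the `--model` and `--prompt-type` flags and the four prompt types come from the commands above.

```python
import sys

# Prompt types taken from the commands above; the script may accept others
# (run it with --list-prompts to check).
PROMPT_TYPE_BY_CONTENT = {
    "paper": "scientific",
    "slides": "presentation",
    "chart": "data_viz",
    "medical": "medical",
}


def build_convert_command(input_path, output_path, content,
                          model="anthropic/claude-sonnet-4.5"):
    """Build the argv list for scripts/convert_with_ai.py for one content kind."""
    prompt_type = PROMPT_TYPE_BY_CONTENT[content]  # KeyError on unknown kinds
    return [
        sys.executable, "scripts/convert_with_ai.py",
        input_path, output_path,
        "--model", model,
        "--prompt-type", prompt_type,
    ]
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`. Passing a list avoids shell quoting problems with file names that contain spaces.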

Code Examples

Basic Usage

from markitdown import MarkItDown
from openai import OpenAI
import os

# Initialize OpenRouter client
client = OpenAI(
    api_key=os.environ.get("OPENROUTER_API_KEY"),
    base_url="https://openrouter.ai/api/v1"
)

# Use Claude Sonnet 4.5 for image descriptions
md = MarkItDown(
    llm_client=client,
    llm_model="anthropic/claude-sonnet-4.5"
)

result = md.convert("document.pptx")
print(result.text_content)

Switching Models Dynamically

from markitdown import MarkItDown
from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ["OPENROUTER_API_KEY"],
    base_url="https://openrouter.ai/api/v1"
)

# Choose the prompt (and, if desired, the model) per file type.
# Every branch below uses Claude Sonnet 4.5; swap in another model ID as needed.
def convert_with_best_model(filepath):
    if filepath.lower().endswith('.pdf'):
        # Use Claude Sonnet 4.5 for technical PDFs
        md = MarkItDown(
            llm_client=client,
            llm_model="anthropic/claude-sonnet-4.5",
            llm_prompt="Describe scientific figures with technical precision"
        )
    elif filepath.lower().endswith('.pptx'):
        # Use Claude Sonnet 4.5 for presentations
        md = MarkItDown(
            llm_client=client,
            llm_model="anthropic/claude-sonnet-4.5",
            llm_prompt="Describe slide content and visual elements"
        )
    else:
        # Use Claude Sonnet 4.5 as default
        md = MarkItDown(
            llm_client=client,
            llm_model="anthropic/claude-sonnet-4.5"
        )
    
    return md.convert(filepath)

# Use it
result = convert_with_best_model("paper.pdf")
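
The same selection logic can also be written as a lookup table, which keeps the model and prompt for each file type in one place. This is a sketch under stated assumptions: `pick_config`, `CONFIG_BY_EXTENSION`, and `DEFAULT_MODEL` are hypothetical names. The model ID and prompts are copied from the example above.

```python
import os

DEFAULT_MODEL = "anthropic/claude-sonnet-4.5"

# extension -> (model, prompt); change the model ID per row to mix providers.
CONFIG_BY_EXTENSION = {
    ".pdf": (DEFAULT_MODEL, "Describe scientific figures with technical precision"),
    ".pptx": (DEFAULT_MODEL, "Describe slide content and visual elements"),
}


def pick_config(filepath):
    """Return MarkItDown keyword arguments chosen by file extension."""
    ext = os.path.splitext(filepath)[1].lower()  # case-insensitive match
    model, prompt = CONFIG_BY_EXTENSION.get(ext, (DEFAULT_MODEL, None))
    kwargs = {"llm_model": model}
    if prompt is not None:
        kwargs["llm_prompt"] = prompt  # omit the key to keep the default prompt
    return kwargs
```

With a client already created, the call becomes `MarkItDown(llm_client=client, **pick_config(path)).convert(path)`. Supporting a new file type then means adding a row to the table rather than another branch.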

Custom Prompts per Model

from markitdown import MarkItDown
from openai import OpenAI

client = OpenAI(
    api_key="your-openrouter-api-key",
    base_url="https://openrouter.ai/api/v1"
)

# Scientific analysis with Claude Sonnet 4.5
scientific_prompt = """
Analyze this scientific figure. Provide:
1. Type of visualization and methodology
2. Quantitative data points and trends
3. Statistical significance
4. Technical interpretation
Be precise and use scientific terminology.
"""

md_scientific = MarkItDown(
    llm_client=client,
    llm_model="anthropic/claude-sonnet-4.5",
    llm_prompt=scientific_prompt
)

# Visual analysis with Claude Sonnet 4.5
visual_prompt = """
Describe this image comprehensively:
1. Main visual elements and composition
2. Colors, layout, and design
3. Text and labels
4. Overall message
"""

md_visual = MarkItDown(
    llm_client=client,
    llm_model="anthropic/claude-sonnet-4.5",
    llm_prompt=visual_prompt
)

Model Comparison

For Scientific Content

Recommended: anthropic/claude-sonnet-4.5

Excellent at technical analysis
Superior reasoning capabilities
Best at understanding scientific figures
Most detailed and accurate explanations
Advanced vision capabilities

Alternative: openai/gpt-4o

Good vision understanding
Fast processing
Good at charts and graphs

For Presentations

Recommended: anthropic/claude-sonnet-4.5

Superior vision capabilities
Excellent at understanding slide layouts
Fast and reliable
Best technical comprehension

For Cost-Effectiveness

Recommended: google/gemini-pro-vision

Lower cost per request
Good quality
Fast processing

Pricing Considerations

OpenRouter pricing varies by model. Check current rates at https://openrouter.ai/models

Tips for Cost Optimization:

Reserve Claude Sonnet 4.5 for complex scientific content, where quality matters most
Use cheaper models (Gemini) for simple images
Batch process similar content with the same model
Use appropriate prompts to get better results in fewer retries

Troubleshooting

API Key Issues

# Check if key is set
echo $OPENROUTER_API_KEY

# Should show: sk-or-v1-...
# If empty, set it:
export OPENROUTER_API_KEY="sk-or-v1-..."

Model Not Found

If you get a "model not found" error, check:

Model name format: provider/model-name
Model availability: https://openrouter.ai/models
Vision support: Ensure model supports vision for image description
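
The first check in this list can be automated before any request is sent. The sketch below tests only the `provider/model-name` shape. The regular expression is an assumption based on the IDs shown in this guide (lowercase slugs, with an optional `:variant` suffix), and `looks_like_model_id` is a hypothetical helper. It does not confirm that the model exists or supports vision.

```python
import re

# Assumed shape: provider slug, one slash, model slug, optional ":variant".
_MODEL_ID = re.compile(r"[a-z0-9][a-z0-9._-]*/[a-z0-9][a-z0-9._-]*(:[a-z0-9._-]+)?")


def looks_like_model_id(name):
    """Return True if `name` has the provider/model-name shape."""
    return _MODEL_ID.fullmatch(name) is not None
```

A bare name such as `gpt-4o` fails this check, which is the most common cause of the error after migrating from the direct OpenAI API.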

Rate Limits

OpenRouter has rate limits. If you hit them:

Add delays between requests
Use the batch processing scripts and tune the --workers parameter to limit concurrent requests
Consider upgrading your OpenRouter plan
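
Adding delays between requests is often done with exponential backoff. The following is a generic sketch, not code from this skill's scripts: `call_with_backoff` is a hypothetical helper, and the caller decides which exception type counts as a rate-limit error.

```python
import time


def call_with_backoff(func, retry_on, max_attempts=5, base_delay=1.0,
                      sleep=time.sleep):
    """Call func(); on a `retry_on` exception, wait and retry with doubling delays.

    Delays are base_delay * 2**attempt (1s, 2s, 4s, ... by default). The last
    failure is re-raised once max_attempts is exhausted.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * (2 ** attempt))
```

One possible use is `call_with_backoff(lambda: md.convert("paper.pdf"), openai.RateLimitError)`. This assumes the rate-limit error raised by the OpenAI SDK propagates out of `convert`, which may depend on the MarkItDown version.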

Migration Notes

This skill was updated from direct OpenAI API to OpenRouter. Key changes:

Environment Variable: OPENAI_API_KEY → OPENROUTER_API_KEY
Client Initialization: Added base_url="https://openrouter.ai/api/v1"
Model Names: gpt-4o → openai/gpt-4o (with provider prefix)
Script Updates: All scripts now use OpenRouter by default
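
For code that still passes bare OpenAI model names, the model-name change can be handled in one helper. This is a sketch: `to_openrouter_model` is a hypothetical name, and defaulting to the `openai` provider assumes the bare names came from the earlier direct-OpenAI setup.

```python
def to_openrouter_model(name, default_provider="openai"):
    """Add a provider prefix to a bare model name; leave prefixed names as-is."""
    return name if "/" in name else f"{default_provider}/{name}"
```

For example, `to_openrouter_model("gpt-4o")` gives `openai/gpt-4o`, while `anthropic/claude-sonnet-4.5` passes through unchanged.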

Resources

OpenRouter Website: https://openrouter.ai
Get API Keys: https://openrouter.ai/keys
Model List: https://openrouter.ai/models
Pricing: https://openrouter.ai/models (click on model for details)
Documentation: https://openrouter.ai/docs
Support: https://openrouter.ai/discord

Example Workflow

Here's a complete workflow using OpenRouter:

# 1. Set up API key
export OPENROUTER_API_KEY="sk-or-v1-your-key-here"

# 2. Convert a scientific paper with Claude
python scripts/convert_with_ai.py \
  research_paper.pdf \
  output.md \
  --model anthropic/claude-3.5-sonnet \
  --prompt-type scientific

# 3. Convert presentation with GPT-4o
python scripts/convert_with_ai.py \
  talk_slides.pptx \
  slides.md \
  --model openai/gpt-4o \
  --prompt-type presentation

# 4. Batch convert a directory of images
python scripts/batch_convert.py \
  images/ \
  markdown_output/ \
  --extensions .jpg .png

Support

For OpenRouter-specific issues:

Discord: https://openrouter.ai/discord
Email: support@openrouter.ai

For MarkItDown skill issues:

Check documentation in this skill directory
Review examples in assets/example_usage.md
