OpenRouter Integration for MarkItDown
Overview
This MarkItDown skill has been configured to use OpenRouter instead of direct OpenAI API access. OpenRouter provides a unified API gateway to access 100+ AI models from different providers through a single, OpenAI-compatible interface.
Why OpenRouter?
Benefits
- Multiple Model Access: Access GPT-4, Claude, Gemini, and 100+ other models through one API
- No Vendor Lock-in: Switch between models without code changes
- Competitive Pricing: Often better rates than going direct
- Simple Migration: OpenAI-compatible API means minimal code changes
- Flexible Choice: Choose the best model for each task
Popular Models for Image Description
| Model | Provider | Use Case | Vision Support |
|---|---|---|---|
| anthropic/claude-sonnet-4.5 | Anthropic | Recommended - Best overall for scientific analysis | ✅ |
| anthropic/claude-3.5-sonnet | Anthropic | Excellent technical analysis | ✅ |
| openai/gpt-4o | OpenAI | Strong vision understanding | ✅ |
| openai/gpt-4-vision | OpenAI | GPT-4 with vision | ✅ |
| google/gemini-pro-vision | Google | Cost-effective option | ✅ |
See https://openrouter.ai/models for the complete list.
Getting Started
1. Get an API Key
- Visit https://openrouter.ai/keys
- Sign up or log in
- Create a new API key
- Copy the key (starts with sk-or-v1-...)
2. Set Environment Variable
# Add to your environment
export OPENROUTER_API_KEY="sk-or-v1-..."
# Make it permanent
echo 'export OPENROUTER_API_KEY="sk-or-v1-..."' >> ~/.zshrc # macOS
echo 'export OPENROUTER_API_KEY="sk-or-v1-..."' >> ~/.bashrc # Linux
# Reload shell
source ~/.zshrc # or source ~/.bashrc
3. Use in Python
from markitdown import MarkItDown
from openai import OpenAI
# Initialize OpenRouter client (OpenAI-compatible)
client = OpenAI(
    api_key="your-openrouter-api-key",  # or use env var
    base_url="https://openrouter.ai/api/v1"
)
# Create MarkItDown with AI support
md = MarkItDown(
    llm_client=client,
    llm_model="anthropic/claude-sonnet-4.5"  # Choose your model
)
# Convert with AI-enhanced descriptions
result = md.convert("presentation.pptx")
print(result.text_content)
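The "or use env var" comment above can be made concrete. The following is a minimal sketch, assuming the key lives in OPENROUTER_API_KEY as set in step 2 and starts with the sk-or-v1- prefix mentioned in step 1. The helper name `openrouter_client_kwargs` is hypothetical and not part of MarkItDown or the openai package.

```python
import os

# Base URL taken from the example above.
OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"


def openrouter_client_kwargs(env=None):
    """Build keyword arguments for OpenAI(...) from environment variables.

    Hypothetical helper: raises early with a readable message instead of
    letting the first API call fail on a missing or malformed key.
    """
    env = os.environ if env is None else env
    key = env.get("OPENROUTER_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENROUTER_API_KEY is not set")
    if not key.startswith("sk-or-v1-"):
        # Assumption: all OpenRouter keys use this prefix, as stated in step 1.
        raise RuntimeError("OPENROUTER_API_KEY does not look like an OpenRouter key")
    return {"api_key": key, "base_url": OPENROUTER_BASE_URL}
```

With this in place, the client can be created as `OpenAI(**openrouter_client_kwargs())`, which keeps the key out of the source file.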
Using the Scripts
All skill scripts have been updated to use OpenRouter:
convert_with_ai.py
# Set API key
export OPENROUTER_API_KEY="sk-or-v1-..."
# Convert with default model (Claude Sonnet 4.5)
python scripts/convert_with_ai.py paper.pdf output.md --prompt-type scientific
# Use GPT-4o as alternative
python scripts/convert_with_ai.py paper.pdf output.md \
    --model openai/gpt-4o \
    --prompt-type scientific
# Use Gemini Pro Vision (cost-effective)
python scripts/convert_with_ai.py slides.pptx output.md \
    --model google/gemini-pro-vision \
    --prompt-type presentation
# List available prompt types
python scripts/convert_with_ai.py --list-prompts
Choosing the Right Model
# For scientific papers - use Claude Sonnet 4.5 for technical analysis
python scripts/convert_with_ai.py research.pdf output.md \
    --model anthropic/claude-sonnet-4.5 \
    --prompt-type scientific
# For presentations - use Claude Sonnet 4.5 for vision
python scripts/convert_with_ai.py slides.pptx output.md \
    --model anthropic/claude-sonnet-4.5 \
    --prompt-type presentation
# For data visualizations - use Claude Sonnet 4.5
python scripts/convert_with_ai.py charts.pdf output.md \
    --model anthropic/claude-sonnet-4.5 \
    --prompt-type data_viz
# For medical images - use Claude Sonnet 4.5 for detailed analysis
python scripts/convert_with_ai.py xray.jpg output.md \
    --model anthropic/claude-sonnet-4.5 \
    --prompt-type medical
Code Examples
Basic Usage
from markitdown import MarkItDown
from openai import OpenAI
import os
# Initialize OpenRouter client
client = OpenAI(
    api_key=os.environ.get("OPENROUTER_API_KEY"),
    base_url="https://openrouter.ai/api/v1"
)
# Use Claude Sonnet 4.5 for image descriptions
md = MarkItDown(
    llm_client=client,
    llm_model="anthropic/claude-sonnet-4.5"
)
result = md.convert("document.pptx")
print(result.text_content)
Switching Models Dynamically
from markitdown import MarkItDown
from openai import OpenAI
import os
client = OpenAI(
    api_key=os.environ["OPENROUTER_API_KEY"],
    base_url="https://openrouter.ai/api/v1"
)
# Pick the model and prompt per file type
# (change llm_model in any branch to switch models)
def convert_with_best_model(filepath):
    lowered = filepath.lower()  # match .PDF and .pdf alike
    if lowered.endswith('.pdf'):
        # Use Claude Sonnet 4.5 for technical PDFs
        md = MarkItDown(
            llm_client=client,
            llm_model="anthropic/claude-sonnet-4.5",
            llm_prompt="Describe scientific figures with technical precision"
        )
    elif lowered.endswith('.pptx'):
        # Use Claude Sonnet 4.5 for presentations
        md = MarkItDown(
            llm_client=client,
            llm_model="anthropic/claude-sonnet-4.5",
            llm_prompt="Describe slide content and visual elements"
        )
    else:
        # Use Claude Sonnet 4.5 as default
        md = MarkItDown(
            llm_client=client,
            llm_model="anthropic/claude-sonnet-4.5"
        )
    return md.convert(filepath)
# Use it
result = convert_with_best_model("paper.pdf")
Custom Prompts per Model
from markitdown import MarkItDown
from openai import OpenAI
client = OpenAI(
    api_key="your-openrouter-api-key",
    base_url="https://openrouter.ai/api/v1"
)
# Scientific analysis with Claude Sonnet 4.5
scientific_prompt = """
Analyze this scientific figure. Provide:
1. Type of visualization and methodology
2. Quantitative data points and trends
3. Statistical significance
4. Technical interpretation
Be precise and use scientific terminology.
"""
md_scientific = MarkItDown(
    llm_client=client,
    llm_model="anthropic/claude-sonnet-4.5",
    llm_prompt=scientific_prompt
)
# Visual analysis with Claude Sonnet 4.5
visual_prompt = """
Describe this image comprehensively:
1. Main visual elements and composition
2. Colors, layout, and design
3. Text and labels
4. Overall message
"""
md_visual = MarkItDown(
    llm_client=client,
    llm_model="anthropic/claude-sonnet-4.5",
    llm_prompt=visual_prompt
)
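When several prompt and model pairs are in play, a small lookup table can keep them in one place. This is only a sketch: the `PROMPTS` dictionary and the `markitdown_kwargs` helper are hypothetical names, and the prompt strings are the short ones used earlier in this document. The returned dictionary uses the `llm_model` and `llm_prompt` keyword names shown in the examples above.

```python
DEFAULT_MODEL = "anthropic/claude-sonnet-4.5"

# Hypothetical registry: prompt type -> prompt text.
PROMPTS = {
    "scientific": "Describe scientific figures with technical precision",
    "presentation": "Describe slide content and visual elements",
}


def markitdown_kwargs(prompt_type, model=DEFAULT_MODEL):
    """Return the llm_model / llm_prompt keyword arguments for one prompt type."""
    if prompt_type not in PROMPTS:
        known = ", ".join(sorted(PROMPTS))
        raise ValueError(f"unknown prompt type {prompt_type!r}; expected one of: {known}")
    return {"llm_model": model, "llm_prompt": PROMPTS[prompt_type]}
```

Usage would then look like `MarkItDown(llm_client=client, **markitdown_kwargs("scientific"))`.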
Model Comparison
For Scientific Content
Recommended: anthropic/claude-sonnet-4.5
- Excellent at technical analysis
- Superior reasoning capabilities
- Best at understanding scientific figures
- Most detailed and accurate explanations
- Advanced vision capabilities
Alternative: openai/gpt-4o
- Good vision understanding
- Fast processing
- Good at charts and graphs
For Presentations
Recommended: anthropic/claude-sonnet-4.5
- Superior vision capabilities
- Excellent at understanding slide layouts
- Fast and reliable
- Best technical comprehension
For Cost-Effectiveness
Recommended: google/gemini-pro-vision
- Lower cost per request
- Good quality
- Fast processing
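The recommendations above can be captured as data so that scripts do not hard-code model names in several places. The sketch below is illustrative only: the category names and the `preferred_models` function are assumptions, and the ordering simply mirrors the "Recommended" and "Alternative" entries in this section.

```python
# Hypothetical mapping from content category to models, best choice first.
MODEL_PREFERENCES = {
    "scientific": ["anthropic/claude-sonnet-4.5", "openai/gpt-4o"],
    "presentation": ["anthropic/claude-sonnet-4.5"],
    "cost": ["google/gemini-pro-vision"],
}


def preferred_models(category):
    """Return candidate models for a category, falling back to the scientific list."""
    # Copy the list so callers cannot mutate the shared table.
    return list(MODEL_PREFERENCES.get(category, MODEL_PREFERENCES["scientific"]))
```

A caller could try the first entry and move to the next one if a model is unavailable.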
Pricing Considerations
OpenRouter pricing varies by model. Check current rates at https://openrouter.ai/models
Tips for Cost Optimization:
- Use Claude Sonnet 4.5 for best quality on complex scientific content
- Use cheaper models (Gemini) for simple images
- Batch process similar content with the same model
- Use appropriate prompts to get better results in fewer retries
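The batching tip above can be sketched as a grouping step that runs before any conversion. This is a minimal illustration using only the standard library; `group_by_extension` is a hypothetical helper, and treating the file extension as a proxy for "similar content" is an assumption.

```python
from collections import defaultdict
from pathlib import Path


def group_by_extension(paths):
    """Group file paths by lowercased extension so each group can share one model."""
    groups = defaultdict(list)
    for path in paths:
        groups[Path(path).suffix.lower()].append(str(path))
    return dict(groups)
```

Each group can then be converted with a single MarkItDown instance, for example a cheaper model for the image group and Claude Sonnet 4.5 for the PDF group.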
Troubleshooting
API Key Issues
# Check if key is set
echo $OPENROUTER_API_KEY
# Should show: sk-or-v1-...
# If empty, set it:
export OPENROUTER_API_KEY="sk-or-v1-..."
Model Not Found
If you get a "model not found" error, check:
- Model name format: provider/model-name
- Model availability: https://openrouter.ai/models
- Vision support: Ensure the model supports vision for image description
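The first check, the provider/model-name format, can be done locally before any request is sent. The sketch below is an approximation: the exact character set OpenRouter allows in model identifiers is an assumption here, and `looks_like_openrouter_model` is a hypothetical helper. It only catches obvious mistakes such as a missing provider prefix; it does not confirm that the model exists.

```python
import re

# Assumed shape: lowercase provider, a slash, then the model name.
MODEL_ID_PATTERN = re.compile(r"^[a-z0-9][a-z0-9._-]*/[a-z0-9][a-z0-9._:-]*$")


def looks_like_openrouter_model(name):
    """Return True if name has the provider/model-name shape."""
    return bool(MODEL_ID_PATTERN.match(name))
```

A name such as `gpt-4o` fails this check, while `openai/gpt-4o` passes.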
Rate Limits
OpenRouter has rate limits. If you hit them:
- Add delays between requests
- Use batch processing scripts with the --workers parameter
- Consider upgrading your OpenRouter plan
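Adding delays between requests is often done with exponential backoff. The following is a generic sketch rather than anything built into MarkItDown or the skill scripts; `call_with_backoff` and its parameters are hypothetical. With the openai Python package (version 1.x), `openai.RateLimitError` would be a natural value for `retry_on`.

```python
import time


def call_with_backoff(func, retry_on=(Exception,), max_attempts=5,
                      base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying on the given exceptions with doubling delays."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Wait 1x, 2x, 4x, ... the base delay before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
```

For example, `call_with_backoff(lambda: md.convert("paper.pdf"))` would retry a conversion that fails transiently.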
Migration Notes
This skill was updated from direct OpenAI API to OpenRouter. Key changes:
- Environment Variable: OPENAI_API_KEY → OPENROUTER_API_KEY
- Client Initialization: Added base_url="https://openrouter.ai/api/v1"
- Model Names: gpt-4o → openai/gpt-4o (with provider prefix)
- Script Updates: All scripts now use OpenRouter by default
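For code that still passes bare OpenAI model names, the model-name change can be handled with a small shim. This is a sketch under the assumption that any name without a provider prefix was an OpenAI model; `to_openrouter_model` is a hypothetical helper.

```python
def to_openrouter_model(name, default_provider="openai"):
    """Add a provider prefix to a bare model name; leave prefixed names alone."""
    return name if "/" in name else f"{default_provider}/{name}"
```

Calling `to_openrouter_model("gpt-4o")` gives `openai/gpt-4o`, and names that already include a provider pass through unchanged.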
Resources
- OpenRouter Website: https://openrouter.ai
- Get API Keys: https://openrouter.ai/keys
- Model List: https://openrouter.ai/models
- Pricing: https://openrouter.ai/models (click on model for details)
- Documentation: https://openrouter.ai/docs
- Support: https://openrouter.ai/discord
Example Workflow
Here's a complete workflow using OpenRouter:
# 1. Set up API key
export OPENROUTER_API_KEY="sk-or-v1-your-key-here"
# 2. Convert a scientific paper with Claude
python scripts/convert_with_ai.py \
    research_paper.pdf \
    output.md \
    --model anthropic/claude-3.5-sonnet \
    --prompt-type scientific
# 3. Convert presentation with GPT-4o
python scripts/convert_with_ai.py \
    talk_slides.pptx \
    slides.md \
    --model openai/gpt-4o \
    --prompt-type presentation
# 4. Batch convert with cost-effective model
python scripts/batch_convert.py \
    images/ \
    markdown_output/ \
    --extensions .jpg .png
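The same workflow can be driven from Python when the conversions are part of a larger pipeline. The sketch below only assembles the command line, using the `--model` and `--prompt-type` flags shown above; `build_convert_command` is a hypothetical helper, and the script path is assumed to be relative to the skill directory.

```python
import sys


def build_convert_command(src, dst, model=None, prompt_type=None,
                          script="scripts/convert_with_ai.py"):
    """Assemble the argument list for one convert_with_ai.py invocation."""
    cmd = [sys.executable, script, str(src), str(dst)]
    if model:
        cmd += ["--model", model]
    if prompt_type:
        cmd += ["--prompt-type", prompt_type]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`; OPENROUTER_API_KEY must be set in the environment, as in step 1.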
Support
For OpenRouter-specific issues:
- Discord: https://openrouter.ai/discord
- Email: support@openrouter.ai
For MarkItDown skill issues:
- Check documentation in this skill directory
- Review examples in assets/example_usage.md