Initial commit

2025-11-30 08:30:14 +08:00
commit 1dd5bee3b4
335 changed files with 147360 additions and 0 deletions
--- a/skills/markitdown/OPENROUTER_INTEGRATION.md
+++ b/skills/markitdown/OPENROUTER_INTEGRATION.md
@@ -0,0 +1,359 @@
+# OpenRouter Integration for MarkItDown
+
+## Overview
+
+This MarkItDown skill has been configured to use **OpenRouter** instead of direct OpenAI API access. OpenRouter provides a unified API gateway to access 100+ AI models from different providers through a single, OpenAI-compatible interface.
+
+## Why OpenRouter?
+
+### Benefits
+
+1. **Multiple Model Access**: Access GPT-4, Claude, Gemini, and 100+ other models through one API
+2. **No Vendor Lock-in**: Switch between models by changing only the model ID
+3. **Competitive Pricing**: Rates vary by model and can differ from going direct, so compare before committing
+4. **Simple Migration**: OpenAI-compatible API means minimal code changes
+5. **Flexible Choice**: Choose the best model for each task
+
+### Popular Models for Image Description
+
+| Model | Provider | Use Case | Vision Support |
+|-------|----------|----------|----------------|
+| `anthropic/claude-sonnet-4.5` | Anthropic | **Recommended** - Best overall for scientific analysis | ✅ |
+| `anthropic/claude-3.5-sonnet` | Anthropic | Excellent technical analysis | ✅ |
+| `openai/gpt-4o` | OpenAI | Strong vision understanding | ✅ |
+| `openai/gpt-4-vision` | OpenAI | GPT-4 with vision | ✅ |
+| `google/gemini-pro-vision` | Google | Cost-effective option | ✅ |
+
+See https://openrouter.ai/models for the complete list.
+
+## Getting Started
+
+### 1. Get an API Key
+
+1. Visit https://openrouter.ai/keys
+2. Sign up or log in
+3. Create a new API key
+4. Copy the key (starts with `sk-or-v1-...`)
+
+### 2. Set Environment Variable
+
+```bash
+# Add to your environment
+export OPENROUTER_API_KEY="sk-or-v1-..."
+
+# Make it permanent (run only the line that matches your shell)
+echo 'export OPENROUTER_API_KEY="sk-or-v1-..."' >> ~/.zshrc  # zsh (macOS default)
+echo 'export OPENROUTER_API_KEY="sk-or-v1-..."' >> ~/.bashrc # bash (most Linux distros)
+
+# Reload shell
+source ~/.zshrc  # or source ~/.bashrc
+```
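
A missing or mistyped key tends to surface later as an opaque authentication error. The sketch below checks the variable up front. The helper name `load_openrouter_key` and the `env` parameter are hypothetical, and the prefix check assumes keys keep the `sk-or-v1-` form shown in step 1.

```python
import os

KEY_PREFIX = "sk-or-v1-"  # prefix quoted in the setup steps; treated as an assumption


def load_openrouter_key(env=None):
    """Return the key from OPENROUTER_API_KEY or raise a descriptive error."""
    env = os.environ if env is None else env  # injectable mapping for testing
    key = env.get("OPENROUTER_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENROUTER_API_KEY is not set; export it first")
    if not key.startswith(KEY_PREFIX):
        raise RuntimeError(f"OPENROUTER_API_KEY should start with {KEY_PREFIX!r}")
    return key
```

Passing the result as `api_key=load_openrouter_key()` when building the client keeps the failure close to its cause.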
+
+### 3. Use in Python
+
+```python
+import os
+
+from markitdown import MarkItDown
+from openai import OpenAI
+
+# Initialize OpenRouter client (OpenAI-compatible)
+client = OpenAI(
+    api_key=os.environ["OPENROUTER_API_KEY"],  # set in step 2
+    base_url="https://openrouter.ai/api/v1"
+)
+
+# Create MarkItDown with AI support
+md = MarkItDown(
+    llm_client=client,
+    llm_model="anthropic/claude-sonnet-4.5"  # Choose your model
+)
+
+# Convert with AI-enhanced descriptions
+result = md.convert("presentation.pptx")
+print(result.text_content)
+```
+
+## Using the Scripts
+
+All skill scripts have been updated to use OpenRouter:
+
+### convert_with_ai.py
+
+```bash
+# Set API key
+export OPENROUTER_API_KEY="sk-or-v1-..."
+
+# Convert with default model (Claude Sonnet 4.5)
+python scripts/convert_with_ai.py paper.pdf output.md --prompt-type scientific
+
+# Use GPT-4o as alternative
+python scripts/convert_with_ai.py paper.pdf output.md \
+  --model openai/gpt-4o \
+  --prompt-type scientific
+
+# Use Gemini Pro Vision (cost-effective)
+python scripts/convert_with_ai.py slides.pptx output.md \
+  --model google/gemini-pro-vision \
+  --prompt-type presentation
+
+# List available prompt types
+python scripts/convert_with_ai.py --list-prompts
+```
+
+### Choosing the Right Model
+
+```bash
+# For scientific papers - use Claude Sonnet 4.5 for technical analysis
+python scripts/convert_with_ai.py research.pdf output.md \
+  --model anthropic/claude-sonnet-4.5 \
+  --prompt-type scientific
+
+# For presentations - use Claude Sonnet 4.5 for vision
+python scripts/convert_with_ai.py slides.pptx output.md \
+  --model anthropic/claude-sonnet-4.5 \
+  --prompt-type presentation
+
+# For data visualizations - use Claude Sonnet 4.5
+python scripts/convert_with_ai.py charts.pdf output.md \
+  --model anthropic/claude-sonnet-4.5 \
+  --prompt-type data_viz
+
+# For medical images - use Claude Sonnet 4.5 for detailed analysis
+python scripts/convert_with_ai.py xray.jpg output.md \
+  --model anthropic/claude-sonnet-4.5 \
+  --prompt-type medical
+```
+
+## Code Examples
+
+### Basic Usage
+
+```python
+from markitdown import MarkItDown
+from openai import OpenAI
+import os
+
+# Initialize OpenRouter client; a KeyError here means the variable is unset
+client = OpenAI(
+    api_key=os.environ["OPENROUTER_API_KEY"],
+    base_url="https://openrouter.ai/api/v1"
+)
+
+# Use Claude Sonnet 4.5 for image descriptions
+md = MarkItDown(
+    llm_client=client,
+    llm_model="anthropic/claude-sonnet-4.5"
+)
+
+result = md.convert("document.pptx")
+print(result.text_content)
+```
+
+### Switching Models Dynamically
+
+```python
+from markitdown import MarkItDown
+from openai import OpenAI
+import os
+
+client = OpenAI(
+    api_key=os.environ["OPENROUTER_API_KEY"],
+    base_url="https://openrouter.ai/api/v1"
+)
+
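+# Pick the model and prompt per file type. Every branch below uses the same
+# model; change a branch's llm_model to route that file type elsewhere.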
+def convert_with_best_model(filepath):
+    if filepath.endswith('.pdf'):
+        # Use Claude Sonnet 4.5 for technical PDFs
+        md = MarkItDown(
+            llm_client=client,
+            llm_model="anthropic/claude-sonnet-4.5",
+            llm_prompt="Describe scientific figures with technical precision"
+        )
+    elif filepath.endswith('.pptx'):
+        # Use Claude Sonnet 4.5 for presentations
+        md = MarkItDown(
+            llm_client=client,
+            llm_model="anthropic/claude-sonnet-4.5",
+            llm_prompt="Describe slide content and visual elements"
+        )
+    else:
+        # Use Claude Sonnet 4.5 as default
+        md = MarkItDown(
+            llm_client=client,
+            llm_model="anthropic/claude-sonnet-4.5"
+        )
+    
+    return md.convert(filepath)
+
+# Use it
+result = convert_with_best_model("paper.pdf")
+```
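
The branching above can also be expressed as a lookup table, which keeps the model and prompt choices in one place. This is a minimal sketch: `ROUTES`, `DEFAULT_MODEL`, and `markitdown_kwargs` are hypothetical names, and the entries simply mirror the example above.

```python
from pathlib import Path

DEFAULT_MODEL = "anthropic/claude-sonnet-4.5"

# File extension -> (model ID, prompt); edit an entry to route that type elsewhere.
ROUTES = {
    ".pdf": (DEFAULT_MODEL, "Describe scientific figures with technical precision"),
    ".pptx": (DEFAULT_MODEL, "Describe slide content and visual elements"),
}


def markitdown_kwargs(filepath):
    """Return the llm_* keyword arguments to use for a given file."""
    suffix = Path(filepath).suffix.lower()  # case-insensitive extension match
    model, prompt = ROUTES.get(suffix, (DEFAULT_MODEL, None))
    kwargs = {"llm_model": model}
    if prompt is not None:
        kwargs["llm_prompt"] = prompt  # omitted for the default route
    return kwargs
```

With a client in scope, a call would then look like `MarkItDown(llm_client=client, **markitdown_kwargs("paper.pdf")).convert("paper.pdf")`.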
+
+### Custom Prompts per Model
+
+```python
+import os
+
+from markitdown import MarkItDown
+from openai import OpenAI
+
+client = OpenAI(
+    api_key=os.environ["OPENROUTER_API_KEY"],
+    base_url="https://openrouter.ai/api/v1"
+)
+
+# Scientific analysis with Claude Sonnet 4.5
+scientific_prompt = """
+Analyze this scientific figure. Provide:
+1. Type of visualization and methodology
+2. Quantitative data points and trends
+3. Statistical significance
+4. Technical interpretation
+Be precise and use scientific terminology.
+"""
+
+md_scientific = MarkItDown(
+    llm_client=client,
+    llm_model="anthropic/claude-sonnet-4.5",
+    llm_prompt=scientific_prompt
+)
+
+# Visual analysis with Claude Sonnet 4.5
+visual_prompt = """
+Describe this image comprehensively:
+1. Main visual elements and composition
+2. Colors, layout, and design
+3. Text and labels
+4. Overall message
+"""
+
+md_visual = MarkItDown(
+    llm_client=client,
+    llm_model="anthropic/claude-sonnet-4.5",
+    llm_prompt=visual_prompt
+)
+```
+
+## Model Comparison
+
+### For Scientific Content
+
+**Recommended: anthropic/claude-sonnet-4.5**
+- Excellent at technical analysis
+- Superior reasoning capabilities
+- Best at understanding scientific figures
+- Most detailed and accurate explanations
+- Advanced vision capabilities
+
+**Alternative: openai/gpt-4o**
+- Good vision understanding
+- Fast processing
+- Good at charts and graphs
+
+### For Presentations
+
+**Recommended: anthropic/claude-sonnet-4.5**
+- Superior vision capabilities
+- Excellent at understanding slide layouts
+- Fast and reliable
+- Best technical comprehension
+
+### For Cost-Effectiveness
+
+**Recommended: google/gemini-pro-vision**
+- Lower cost per request
+- Good quality
+- Fast processing
+
+## Pricing Considerations
+
+OpenRouter pricing varies by model. Check current rates at https://openrouter.ai/models
+
+**Tips for Cost Optimization:**
+1. Use Claude Sonnet 4.5 for best quality on complex scientific content
+2. Use cheaper models (Gemini) for simple images
+3. Batch process similar content with the same model
+4. Use appropriate prompts to get better results in fewer retries
+
+## Troubleshooting
+
+### API Key Issues
+
+```bash
+# Check if key is set
+echo $OPENROUTER_API_KEY
+
+# Should show: sk-or-v1-...
+# If empty, set it:
+export OPENROUTER_API_KEY="sk-or-v1-..."
+```
+
+### Model Not Found
+
+If you get a "model not found" error, check:
+1. Model name format: `provider/model-name`
+2. Model availability: https://openrouter.ai/models
+3. Vision support: Ensure model supports vision for image description
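
The first check can be automated before any request is sent. The sketch below only tests the `provider/model-name` shape. The regular expression is an assumption inferred from the IDs listed in this document, not an official grammar, and it cannot confirm that a model exists or supports vision.

```python
import re

# Assumed shape: lowercase provider, a slash, then a model name that may
# contain dots, dashes, underscores, or a colon-separated variant suffix.
MODEL_ID = re.compile(r"[a-z0-9][a-z0-9._-]*/[a-z0-9][a-z0-9._:-]*")


def looks_like_model_id(name):
    """Return True if name has the provider/model-name shape."""
    return MODEL_ID.fullmatch(name) is not None
```

A name that passes this check can still be rejected by the API, so the model list remains the source of truth.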
+
+### Rate Limits
+
+OpenRouter has rate limits. If you hit them:
+1. Add delays between requests
+2. Lower concurrency in the batch processing scripts via the `--workers` parameter
+3. Consider upgrading your OpenRouter plan
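
Adding delays between requests is commonly done with exponential backoff. The following is a generic sketch, not part of the skill's scripts: `call_with_backoff` and its parameters are hypothetical, and which exception signals a rate limit depends on the client library in use.

```python
import random
import time


def call_with_backoff(func, retries=4, base_delay=1.0,
                      sleep=time.sleep, retry_on=(Exception,)):
    """Call func(), retrying with exponentially growing, jittered delays."""
    for attempt in range(retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == retries:
                raise  # out of retries: surface the last error
            delay = base_delay * (2 ** attempt)  # 1s, 2s, 4s, ...
            # Jitter spreads out retries from parallel workers.
            sleep(delay + random.uniform(0, delay / 2))
```

A conversion could then be wrapped as `call_with_backoff(lambda: md.convert("paper.pdf"))`, with `retry_on` narrowed to whatever rate-limit exception the installed libraries actually raise.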
+
+## Migration Notes
+
+This skill was migrated from the direct OpenAI API to OpenRouter. Key changes:
+
+1. **Environment Variable**: `OPENAI_API_KEY` → `OPENROUTER_API_KEY`
+2. **Client Initialization**: Added `base_url="https://openrouter.ai/api/v1"`
+3. **Model Names**: `gpt-4o` → `openai/gpt-4o` (with provider prefix)
+4. **Script Updates**: All scripts now use OpenRouter by default
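
Configuration written before the migration may still hold bare names such as `gpt-4o`. A small shim can add the provider prefix described in item 3. The function name and the `default_provider` parameter are hypothetical, and the sketch assumes every unprefixed name is an OpenAI model.

```python
def to_openrouter_model(name, default_provider="openai"):
    """Prefix a bare model name with its provider; leave prefixed IDs alone."""
    name = name.strip()
    if "/" in name:
        return name  # already in provider/model-name form
    return f"{default_provider}/{name}"
```

For example, `to_openrouter_model("gpt-4o")` yields `openai/gpt-4o`, while `anthropic/claude-sonnet-4.5` passes through unchanged.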
+
+## Resources
+
+- **OpenRouter Website**: https://openrouter.ai
+- **Get API Keys**: https://openrouter.ai/keys
+- **Model List**: https://openrouter.ai/models
+- **Pricing**: https://openrouter.ai/models (click on model for details)
+- **Documentation**: https://openrouter.ai/docs
+- **Support**: https://openrouter.ai/discord
+
+## Example Workflow
+
+Here's a complete workflow using OpenRouter:
+
+```bash
+# 1. Set up API key
+export OPENROUTER_API_KEY="sk-or-v1-your-key-here"
+
+# 2. Convert a scientific paper with Claude
+python scripts/convert_with_ai.py \
+  research_paper.pdf \
+  output.md \
+  --model anthropic/claude-3.5-sonnet \
+  --prompt-type scientific
+
+# 3. Convert presentation with GPT-4o
+python scripts/convert_with_ai.py \
+  talk_slides.pptx \
+  slides.md \
+  --model openai/gpt-4o \
+  --prompt-type presentation
+
+# 4. Batch convert a folder of images
+python scripts/batch_convert.py \
+  images/ \
+  markdown_output/ \
+  --extensions .jpg .png
+```
+
+## Support
+
+For OpenRouter-specific issues:
+- Discord: https://openrouter.ai/discord
+- Email: support@openrouter.ai
+
+For MarkItDown skill issues:
+- Check documentation in this skill directory
+- Review examples in `assets/example_usage.md`
+