Advanced Integrations Reference
This document provides detailed information about advanced MarkItDown features, including Azure Document Intelligence integration, LLM-powered image descriptions, and the plugin system.
Azure Document Intelligence Integration
Azure Document Intelligence (formerly Form Recognizer) provides superior PDF processing with advanced table extraction and layout analysis.
Setup
Prerequisites:
- Azure subscription
- Document Intelligence resource created in Azure
- Endpoint URL and API key
Create Azure Resource:
# Using Azure CLI
az cognitiveservices account create \
    --name my-doc-intelligence \
    --resource-group my-resource-group \
    --kind FormRecognizer \
    --sku F0 \
    --location eastus
Basic Usage
from markitdown import MarkItDown
md = MarkItDown(
    docintel_endpoint="https://YOUR-RESOURCE.cognitiveservices.azure.com/",
    docintel_key="YOUR-API-KEY"
)
result = md.convert("complex_document.pdf")
print(result.text_content)
Configuration from Environment Variables
import os
from markitdown import MarkItDown
# Set environment variables (or export them in your shell)
os.environ['AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT'] = 'YOUR-ENDPOINT'
os.environ['AZURE_DOCUMENT_INTELLIGENCE_KEY'] = 'YOUR-KEY'
# Read the credentials from the environment
md = MarkItDown(
    docintel_endpoint=os.getenv('AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT'),
    docintel_key=os.getenv('AZURE_DOCUMENT_INTELLIGENCE_KEY')
)
result = md.convert("document.pdf")
When to Use Azure Document Intelligence
Use for:
- Complex PDFs with sophisticated tables
- Multi-column layouts
- Forms and structured documents
- Scanned documents requiring OCR
- PDFs with mixed content types
- Documents with intricate formatting
Benefits over standard extraction:
- Superior table extraction - Better handling of merged cells, complex layouts
- Layout analysis - Understands document structure (headers, footers, columns)
- Form fields - Extracts key-value pairs from forms
- Reading order - Maintains correct text flow in complex layouts
- OCR quality - High-quality text extraction from scanned documents
Comparison Example
Standard extraction:
from markitdown import MarkItDown
md = MarkItDown()
result = md.convert("complex_table.pdf")
# May struggle with complex tables
Azure Document Intelligence:
from markitdown import MarkItDown
md = MarkItDown(
    docintel_endpoint="YOUR-ENDPOINT",
    docintel_key="YOUR-KEY"
)
result = md.convert("complex_table.pdf")
# Better table reconstruction and layout understanding
Cost Considerations
Azure Document Intelligence is a paid service:
- Free tier: 500 pages per month
- Paid tiers: Pay per page processed
- Monitor usage to control costs
- Use standard extraction for simple documents
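One way to act on the last point is to route only documents that actually need layout analysis through Azure and convert everything else locally. The sketch below is illustrative: the needs_layout_analysis() heuristic and its 1 MB threshold are assumptions to replace with your own criteria, and the environment variable names match the ones used earlier in this document.
import os
from markitdown import MarkItDown

def needs_layout_analysis(file_path):
    # Hypothetical heuristic: treat large PDFs as complex; tune for your data
    return file_path.lower().endswith('.pdf') and os.path.getsize(file_path) > 1_000_000

def cost_aware_convert(file_path):
    if needs_layout_analysis(file_path):
        # Complex document: worth paying for Azure Document Intelligence
        md = MarkItDown(
            docintel_endpoint=os.getenv('AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT'),
            docintel_key=os.getenv('AZURE_DOCUMENT_INTELLIGENCE_KEY')
        )
    else:
        # Simple document: free, local extraction
        md = MarkItDown()
    return md.convert(file_path)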
Error Handling
from markitdown import MarkItDown
md = MarkItDown(
    docintel_endpoint="YOUR-ENDPOINT",
    docintel_key="YOUR-KEY"
)
try:
    result = md.convert("document.pdf")
    print(result.text_content)
except Exception as e:
    print(f"Document Intelligence error: {e}")
    # Common issues: authentication, quota exceeded, unsupported file
LLM-Powered Image Descriptions
Generate detailed, contextual descriptions for images using large language models.
Setup with OpenAI
from markitdown import MarkItDown
from openai import OpenAI
client = OpenAI(api_key="YOUR-OPENAI-API-KEY")
md = MarkItDown(llm_client=client, llm_model="gpt-4o")
result = md.convert("image.jpg")
print(result.text_content)
Supported Use Cases
Images in documents:
from markitdown import MarkItDown
from openai import OpenAI
client = OpenAI()
md = MarkItDown(llm_client=client, llm_model="gpt-4o")
# PowerPoint with images
result = md.convert("presentation.pptx")
# Word documents with images
result = md.convert("report.docx")
# Standalone images
result = md.convert("diagram.png")
Custom Prompts
Customize the LLM prompt for specific needs:
from markitdown import MarkItDown
from openai import OpenAI
client = OpenAI()
# For diagrams
md = MarkItDown(
    llm_client=client,
    llm_model="gpt-4o",
    llm_prompt="Analyze this diagram and explain all components, connections, and relationships in detail"
)
# For charts
md = MarkItDown(
    llm_client=client,
    llm_model="gpt-4o",
    llm_prompt="Describe this chart, including the type, axes, data points, trends, and key insights"
)
# For UI screenshots
md = MarkItDown(
    llm_client=client,
    llm_model="gpt-4o",
    llm_prompt="Describe this user interface screenshot, listing all UI elements, their layout, and functionality"
)
# For scientific figures
md = MarkItDown(
    llm_client=client,
    llm_model="gpt-4o",
    llm_prompt="Describe this scientific figure in detail, including methodology, results shown, and significance"
)
Model Selection
GPT-4o (Recommended):
- Best vision capabilities
- High-quality descriptions
- Good at understanding context
- Higher cost per image
GPT-4o-mini:
- Lower cost alternative
- Good for simpler images
- Faster processing
- May miss subtle details
from markitdown import MarkItDown
from openai import OpenAI
client = OpenAI()
# High quality (more expensive)
md_quality = MarkItDown(llm_client=client, llm_model="gpt-4o")
# Budget option (less expensive)
md_budget = MarkItDown(llm_client=client, llm_model="gpt-4o-mini")
Configuration from Environment
import os
from markitdown import MarkItDown
from openai import OpenAI
# Set API key in environment
os.environ['OPENAI_API_KEY'] = 'YOUR-API-KEY'
client = OpenAI() # Uses env variable
md = MarkItDown(llm_client=client, llm_model="gpt-4o")
Alternative LLM Providers
Anthropic Claude:
from markitdown import MarkItDown
from anthropic import Anthropic
# Note: Check current compatibility with MarkItDown
client = Anthropic(api_key="YOUR-API-KEY")
# May require adapter for MarkItDown compatibility
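A minimal adapter sketch, assuming MarkItDown talks to the LLM through an OpenAI-style chat.completions.create() call and reads response.choices[0].message.content (verify this against the MarkItDown version you use). The AnthropicChatAdapter class and its message translation are illustrative, not part of either library.
from types import SimpleNamespace
from anthropic import Anthropic
from markitdown import MarkItDown

class AnthropicChatAdapter:
    """Illustrative adapter exposing an OpenAI-style interface over the Anthropic SDK."""

    def __init__(self, client):
        self._client = client
        # MarkItDown is assumed to call client.chat.completions.create(...)
        self.chat = SimpleNamespace(completions=SimpleNamespace(create=self._create))

    def _create(self, model, messages, **kwargs):
        # Note: system-role messages are not handled in this sketch
        converted = [{"role": m["role"], "content": self._convert_content(m["content"])}
                     for m in messages]
        response = self._client.messages.create(model=model, max_tokens=1024, messages=converted)
        text = "".join(block.text for block in response.content if block.type == "text")
        # Mimic the OpenAI response shape: response.choices[0].message.content
        return SimpleNamespace(choices=[SimpleNamespace(message=SimpleNamespace(content=text))])

    def _convert_content(self, content):
        if isinstance(content, str):
            return content
        parts = []
        for part in content:
            if part.get("type") == "text":
                parts.append({"type": "text", "text": part["text"]})
            elif part.get("type") == "image_url":
                # OpenAI-style data URI: data:image/png;base64,<data>
                header, data = part["image_url"]["url"].split(",", 1)
                media_type = header.split(";")[0].removeprefix("data:")
                parts.append({"type": "image",
                              "source": {"type": "base64", "media_type": media_type, "data": data}})
        return parts

client = AnthropicChatAdapter(Anthropic(api_key="YOUR-API-KEY"))
md = MarkItDown(llm_client=client, llm_model="claude-3-5-sonnet-latest")  # any vision-capable Claude model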
Azure OpenAI:
from markitdown import MarkItDown
from openai import AzureOpenAI
client = AzureOpenAI(
    api_key="YOUR-AZURE-KEY",
    api_version="2024-02-01",
    azure_endpoint="https://YOUR-RESOURCE.openai.azure.com"
)
md = MarkItDown(llm_client=client, llm_model="gpt-4o")
Cost Management
Strategies to reduce LLM costs:
- Selective processing:
from markitdown import MarkItDown
from openai import OpenAI
client = OpenAI()
# Only use LLM for important documents
if is_important_document(file):
    md = MarkItDown(llm_client=client, llm_model="gpt-4o")
else:
    md = MarkItDown()  # Standard processing
result = md.convert(file)
- Image filtering:
# Pre-process to identify images that need descriptions
# Only use LLM for complex/important images
- Batch processing:
# Process multiple images in batches
# Monitor costs and set limits
- Model selection:
# Use gpt-4o-mini for simple images
# Reserve gpt-4o for complex visualizations
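As a concrete sketch of the model-selection strategy, the helper below sends small standalone images to gpt-4o-mini and everything else to gpt-4o. The choose_model() helper and its 500 KB threshold are illustrative assumptions, not MarkItDown features.
import os
from markitdown import MarkItDown
from openai import OpenAI

client = OpenAI()

def choose_model(file_path):
    # Illustrative heuristic: small standalone images get the cheaper model
    simple_image = file_path.lower().endswith(('.png', '.jpg', '.jpeg'))
    small = os.path.getsize(file_path) < 500_000  # ~500 KB, tune for your data
    return "gpt-4o-mini" if simple_image and small else "gpt-4o"

def convert_with_budget(file_path):
    md = MarkItDown(llm_client=client, llm_model=choose_model(file_path))
    return md.convert(file_path)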
Performance Considerations
LLM processing adds latency:
- Each image requires an API call
- Processing time: 1-5 seconds per image
- Network dependent
- Consider parallel processing for multiple images
Batch optimization:
from markitdown import MarkItDown
from openai import OpenAI
import concurrent.futures
client = OpenAI()
md = MarkItDown(llm_client=client, llm_model="gpt-4o")
def process_image(image_path):
    return md.convert(image_path)

# Process multiple images in parallel
images = ["img1.jpg", "img2.jpg", "img3.jpg"]
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    results = list(executor.map(process_image, images))
Combined Advanced Features
Azure Document Intelligence + LLM Descriptions
Combine both for maximum quality:
from markitdown import MarkItDown
from openai import OpenAI
client = OpenAI()
md = MarkItDown(
    llm_client=client,
    llm_model="gpt-4o",
    docintel_endpoint="YOUR-AZURE-ENDPOINT",
    docintel_key="YOUR-AZURE-KEY"
)
# Best possible PDF conversion with image descriptions
result = md.convert("complex_report.pdf")
Use cases:
- Research papers with figures
- Business reports with charts
- Technical documentation with diagrams
- Presentations with visual data
Smart Document Processing Pipeline
from markitdown import MarkItDown
from openai import OpenAI
import os
def smart_convert(file_path):
    """Intelligently choose processing method based on file type."""
    client = OpenAI()
    ext = os.path.splitext(file_path)[1].lower()
    # PDFs with complex tables: use Azure Document Intelligence
    if ext == '.pdf':
        md = MarkItDown(
            docintel_endpoint=os.getenv('AZURE_ENDPOINT'),
            docintel_key=os.getenv('AZURE_KEY')
        )
    # Documents/presentations with images: use the LLM
    elif ext in ['.pptx', '.docx']:
        md = MarkItDown(
            llm_client=client,
            llm_model="gpt-4o"
        )
    # Simple formats: standard processing
    else:
        md = MarkItDown()
    return md.convert(file_path)
# Use it
result = smart_convert("document.pdf")
Plugin System
MarkItDown supports custom plugins for extending functionality.
Plugin Architecture
Plugins are disabled by default for security:
from markitdown import MarkItDown
# Enable plugins
md = MarkItDown(enable_plugins=True)
Creating Custom Plugins
Plugin structure:
class CustomConverter:
    """Custom converter plugin for MarkItDown."""

    def can_convert(self, file_path):
        """Check if this plugin can handle the file."""
        return file_path.endswith('.custom')

    def convert(self, file_path):
        """Convert file to Markdown."""
        # Your conversion logic here
        return {
            'text_content': '# Converted Content\n\n...'
        }
Plugin Registration
from markitdown import MarkItDown
md = MarkItDown(enable_plugins=True)
# Register custom plugin
md.register_plugin(CustomConverter())
# Use normally
result = md.convert("file.custom")
Plugin Use Cases
Custom formats:
- Proprietary document formats
- Specialized scientific data formats
- Legacy file formats
Enhanced processing:
- Custom OCR engines
- Specialized table extraction
- Domain-specific parsing
Integration:
- Enterprise document systems
- Custom databases
- Specialized APIs
Plugin Security
Important security considerations:
- Plugins run with full system access
- Only enable for trusted plugins
- Validate plugin code before use
- Disable plugins in production unless required
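One way to follow the last point is to gate plugin loading behind an explicit opt-in flag so production deployments keep plugins off by default. The MARKITDOWN_ENABLE_PLUGINS variable below is an assumption for this sketch, not something MarkItDown reads on its own.
import os
from markitdown import MarkItDown

# Plugins stay disabled unless the deployment explicitly opts in
allow_plugins = os.getenv("MARKITDOWN_ENABLE_PLUGINS") == "1"
md = MarkItDown(enable_plugins=allow_plugins)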
Error Handling for Advanced Features
from markitdown import MarkItDown
from openai import OpenAI
import os

def robust_convert(file_path):
    """Convert with fallback strategies."""
    try:
        # Try with all advanced features
        client = OpenAI()
        md = MarkItDown(
            llm_client=client,
            llm_model="gpt-4o",
            docintel_endpoint=os.getenv('AZURE_ENDPOINT'),
            docintel_key=os.getenv('AZURE_KEY')
        )
        return md.convert(file_path)
    except Exception as azure_error:
        print(f"Azure failed: {azure_error}")
        try:
            # Fallback: LLM only
            client = OpenAI()
            md = MarkItDown(llm_client=client, llm_model="gpt-4o")
            return md.convert(file_path)
        except Exception as llm_error:
            print(f"LLM failed: {llm_error}")
            # Final fallback: standard processing
            md = MarkItDown()
            return md.convert(file_path)
# Use it
result = robust_convert("document.pdf")
Best Practices
Azure Document Intelligence
- Use for complex PDFs only (cost optimization)
- Monitor usage and costs
- Store credentials securely
- Handle quota limits gracefully
- Fall back to standard processing if needed
LLM Integration
- Use appropriate models for task complexity
- Customize prompts for specific use cases
- Monitor API costs
- Implement rate limiting
- Cache results when possible (see the caching sketch after this list)
- Handle API errors gracefully
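A simple way to combine the caching and cost points above is to memoize results keyed by a hash of the file contents, so re-running a pipeline over unchanged documents never triggers a second round of API calls. The cache layout below is an illustrative sketch, not a MarkItDown feature.
import hashlib
import os
from markitdown import MarkItDown
from openai import OpenAI

CACHE_DIR = ".markitdown_cache"  # illustrative location

def cached_convert(md, file_path):
    """Return cached Markdown if this exact file content was converted before."""
    with open(file_path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    cache_path = os.path.join(CACHE_DIR, f"{digest}.md")
    if os.path.exists(cache_path):
        with open(cache_path, "r", encoding="utf-8") as f:
            return f.read()
    markdown = md.convert(file_path).text_content
    os.makedirs(CACHE_DIR, exist_ok=True)
    with open(cache_path, "w", encoding="utf-8") as f:
        f.write(markdown)
    return markdown

client = OpenAI()
md = MarkItDown(llm_client=client, llm_model="gpt-4o")
text = cached_convert(md, "report.docx")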
Combined Features
- Test cost/quality tradeoffs
- Use selectively for important documents
- Implement intelligent routing
- Monitor performance and costs
- Have fallback strategies
Security
- Store API keys securely (environment variables, secrets manager)
- Never commit credentials to code
- Disable plugins unless required
- Validate all inputs
- Use least privilege access
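A small pattern that covers the first two points: read every secret from the environment (populated by your shell, CI system, or secrets manager) and fail fast when one is missing, so credentials never live in source code. The require_env() helper is illustrative; the variable names match those used earlier in this document.
import os
from markitdown import MarkItDown
from openai import OpenAI

def require_env(name):
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

client = OpenAI(api_key=require_env("OPENAI_API_KEY"))
md = MarkItDown(
    llm_client=client,
    llm_model="gpt-4o",
    docintel_endpoint=require_env("AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT"),
    docintel_key=require_env("AZURE_DOCUMENT_INTELLIGENCE_KEY")
)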