# Advanced Integrations Reference

This document provides detailed information about advanced MarkItDown features, including Azure Document Intelligence integration, LLM-powered image descriptions, and the plugin system.

## Azure Document Intelligence Integration

Azure Document Intelligence (formerly Form Recognizer) provides superior PDF processing with advanced table extraction and layout analysis.

### Setup

**Prerequisites:**

1. Azure subscription
2. Document Intelligence resource created in Azure
3. Endpoint URL and API key

**Create Azure Resource:**

```bash
# Using Azure CLI
az cognitiveservices account create \
  --name my-doc-intelligence \
  --resource-group my-resource-group \
  --kind FormRecognizer \
  --sku F0 \
  --location eastus
```

### Basic Usage

```python
from markitdown import MarkItDown

md = MarkItDown(
    docintel_endpoint="https://YOUR-RESOURCE.cognitiveservices.azure.com/",
    docintel_key="YOUR-API-KEY"
)

result = md.convert("complex_document.pdf")
print(result.text_content)
```

### Configuration from Environment Variables

```python
import os
from markitdown import MarkItDown

# Set environment variables
os.environ['AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT'] = 'YOUR-ENDPOINT'
os.environ['AZURE_DOCUMENT_INTELLIGENCE_KEY'] = 'YOUR-KEY'

# Read credentials from the environment instead of hard-coding them
md = MarkItDown(
    docintel_endpoint=os.getenv('AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT'),
    docintel_key=os.getenv('AZURE_DOCUMENT_INTELLIGENCE_KEY')
)

result = md.convert("document.pdf")
```

### When to Use Azure Document Intelligence

**Use for:**

- Complex PDFs with sophisticated tables
- Multi-column layouts
- Forms and structured documents
- Scanned documents requiring OCR
- PDFs with mixed content types
- Documents with intricate formatting

**Benefits over standard extraction:**

- **Superior table extraction** - Better handling of merged cells and complex layouts
- **Layout analysis** - Understands document structure (headers, footers, columns)
- **Form fields** - Extracts key-value pairs from forms
- **Reading order** - Maintains correct text flow in complex layouts
- **OCR quality** - High-quality text extraction from scanned documents

### Comparison Example

**Standard extraction:**

```python
from markitdown import MarkItDown

md = MarkItDown()
result = md.convert("complex_table.pdf")
# May struggle with complex tables
```

**Azure Document Intelligence:**

```python
from markitdown import MarkItDown

md = MarkItDown(
    docintel_endpoint="YOUR-ENDPOINT",
    docintel_key="YOUR-KEY"
)
result = md.convert("complex_table.pdf")
# Better table reconstruction and layout understanding
```

### Cost Considerations

Azure Document Intelligence is a paid service:

- **Free tier**: 500 pages per month
- **Paid tiers**: Pay per page processed
- Monitor usage to control costs
- Use standard extraction for simple documents (see the routing sketch after the error-handling example below)

### Error Handling

```python
from markitdown import MarkItDown

md = MarkItDown(
    docintel_endpoint="YOUR-ENDPOINT",
    docintel_key="YOUR-KEY"
)

try:
    result = md.convert("document.pdf")
    print(result.text_content)
except Exception as e:
    print(f"Document Intelligence error: {e}")
    # Common issues: authentication, quota exceeded, unsupported file
```
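To keep Azure usage (and cost) down, route only the documents that are likely to need it. The sketch below uses page count as a rough heuristic for routing; it assumes `pypdf` is installed and reuses the environment variable names from the configuration example above. The threshold and heuristic are illustrative, not part of MarkItDown.

```python
import os
from pypdf import PdfReader  # assumption: pypdf is available for a quick page count
from markitdown import MarkItDown

PAGE_THRESHOLD = 10  # arbitrary cutoff; tune for your documents

def convert_pdf(file_path: str):
    """Route short PDFs to standard extraction and longer ones to Azure."""
    page_count = len(PdfReader(file_path).pages)
    if page_count > PAGE_THRESHOLD:
        # Longer documents are more likely to contain complex tables and layouts
        md = MarkItDown(
            docintel_endpoint=os.getenv('AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT'),
            docintel_key=os.getenv('AZURE_DOCUMENT_INTELLIGENCE_KEY')
        )
    else:
        md = MarkItDown()  # free, local extraction
    return md.convert(file_path)
```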
## LLM-Powered Image Descriptions

Generate detailed, contextual descriptions for images using large language models.

### Setup with OpenAI

```python
from markitdown import MarkItDown
from openai import OpenAI

client = OpenAI(api_key="YOUR-OPENAI-API-KEY")

md = MarkItDown(llm_client=client, llm_model="gpt-4o")
result = md.convert("image.jpg")
print(result.text_content)
```

### Supported Use Cases

**Images in documents:**

```python
from markitdown import MarkItDown
from openai import OpenAI

client = OpenAI()
md = MarkItDown(llm_client=client, llm_model="gpt-4o")

# PowerPoint with images
result = md.convert("presentation.pptx")

# Word documents with images
result = md.convert("report.docx")

# Standalone images
result = md.convert("diagram.png")
```

### Custom Prompts

Customize the LLM prompt for specific needs:

```python
from markitdown import MarkItDown
from openai import OpenAI

client = OpenAI()

# For diagrams
md = MarkItDown(
    llm_client=client,
    llm_model="gpt-4o",
    llm_prompt="Analyze this diagram and explain all components, connections, and relationships in detail"
)

# For charts
md = MarkItDown(
    llm_client=client,
    llm_model="gpt-4o",
    llm_prompt="Describe this chart, including the type, axes, data points, trends, and key insights"
)

# For UI screenshots
md = MarkItDown(
    llm_client=client,
    llm_model="gpt-4o",
    llm_prompt="Describe this user interface screenshot, listing all UI elements, their layout, and functionality"
)

# For scientific figures
md = MarkItDown(
    llm_client=client,
    llm_model="gpt-4o",
    llm_prompt="Describe this scientific figure in detail, including methodology, results shown, and significance"
)
```

### Model Selection

**GPT-4o (Recommended):**

- Best vision capabilities
- High-quality descriptions
- Good at understanding context
- Higher cost per image

**GPT-4o-mini:**

- Lower-cost alternative
- Good for simpler images
- Faster processing
- May miss subtle details

```python
from markitdown import MarkItDown
from openai import OpenAI

client = OpenAI()

# High quality (more expensive)
md_quality = MarkItDown(llm_client=client, llm_model="gpt-4o")

# Budget option (less expensive)
md_budget = MarkItDown(llm_client=client, llm_model="gpt-4o-mini")
```

### Configuration from Environment

```python
import os
from markitdown import MarkItDown
from openai import OpenAI

# Set API key in environment
os.environ['OPENAI_API_KEY'] = 'YOUR-API-KEY'

client = OpenAI()  # Uses OPENAI_API_KEY from the environment
md = MarkItDown(llm_client=client, llm_model="gpt-4o")
```

### Alternative LLM Providers

**Anthropic Claude:**

```python
from markitdown import MarkItDown
from anthropic import Anthropic

# Note: Check current compatibility with MarkItDown
client = Anthropic(api_key="YOUR-API-KEY")
# May require an adapter for MarkItDown compatibility
```

**Azure OpenAI:**

```python
from markitdown import MarkItDown
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key="YOUR-AZURE-KEY",
    api_version="2024-02-01",
    azure_endpoint="https://YOUR-RESOURCE.openai.azure.com"
)

# With Azure OpenAI, llm_model should be the name of your deployment
md = MarkItDown(llm_client=client, llm_model="gpt-4o")
```

### Cost Management

**Strategies to reduce LLM costs:**

1. **Selective processing:**

   ```python
   from markitdown import MarkItDown
   from openai import OpenAI

   client = OpenAI()

   # Only use the LLM for important documents
   if is_important_document(file):  # your own heuristic
       md = MarkItDown(llm_client=client, llm_model="gpt-4o")
   else:
       md = MarkItDown()  # Standard processing

   result = md.convert(file)
   ```

2. **Image filtering:**

   ```python
   # Pre-process to identify images that need descriptions
   # Only use the LLM for complex/important images
   ```

3. **Batch processing:**

   ```python
   # Process multiple images in batches
   # Monitor costs and set limits
   ```

4. **Model selection:**

   ```python
   # Use gpt-4o-mini for simple images
   # Reserve gpt-4o for complex visualizations
   ```
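A related strategy (see "Cache results when possible" under Best Practices below) is to cache converted output so repeated conversions of the same file make no API calls at all. A minimal sketch, assuming a local JSON cache keyed by the SHA-256 of the file contents; the cache path and helper names are illustrative:

```python
import hashlib
import json
import os
from markitdown import MarkItDown
from openai import OpenAI

CACHE_PATH = "markitdown_cache.json"  # illustrative location

def _file_hash(path: str) -> str:
    """Hash file contents so renamed-but-identical files still hit the cache."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def convert_with_cache(path: str) -> str:
    cache = {}
    if os.path.exists(CACHE_PATH):
        with open(CACHE_PATH, encoding="utf-8") as f:
            cache = json.load(f)

    key = _file_hash(path)
    if key in cache:
        return cache[key]  # no LLM calls needed

    md = MarkItDown(llm_client=OpenAI(), llm_model="gpt-4o")
    text = md.convert(path).text_content

    cache[key] = text
    with open(CACHE_PATH, "w", encoding="utf-8") as f:
        json.dump(cache, f)
    return text
```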
### Performance Considerations

**LLM processing adds latency:**

- Each image requires an API call
- Processing time: 1-5 seconds per image
- Network dependent
- Consider parallel processing for multiple images

**Batch optimization:**

```python
from markitdown import MarkItDown
from openai import OpenAI
import concurrent.futures

client = OpenAI()
md = MarkItDown(llm_client=client, llm_model="gpt-4o")

def process_image(image_path):
    return md.convert(image_path)

# Process multiple images in parallel
images = ["img1.jpg", "img2.jpg", "img3.jpg"]
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    results = list(executor.map(process_image, images))
```

## Combined Advanced Features

### Azure Document Intelligence + LLM Descriptions

Combine both for maximum quality:

```python
from markitdown import MarkItDown
from openai import OpenAI

client = OpenAI()

md = MarkItDown(
    llm_client=client,
    llm_model="gpt-4o",
    docintel_endpoint="YOUR-AZURE-ENDPOINT",
    docintel_key="YOUR-AZURE-KEY"
)

# Best possible PDF conversion with image descriptions
result = md.convert("complex_report.pdf")
```

**Use cases:**

- Research papers with figures
- Business reports with charts
- Technical documentation with diagrams
- Presentations with visual data

### Smart Document Processing Pipeline

```python
from markitdown import MarkItDown
from openai import OpenAI
import os

def smart_convert(file_path):
    """Intelligently choose the processing method based on file type."""
    client = OpenAI()
    ext = os.path.splitext(file_path)[1].lower()

    # PDFs with complex tables: use Azure
    if ext == '.pdf':
        md = MarkItDown(
            docintel_endpoint=os.getenv('AZURE_ENDPOINT'),
            docintel_key=os.getenv('AZURE_KEY')
        )
    # Documents/presentations with images: use LLM
    elif ext in ['.pptx', '.docx']:
        md = MarkItDown(
            llm_client=client,
            llm_model="gpt-4o"
        )
    # Simple formats: standard processing
    else:
        md = MarkItDown()

    return md.convert(file_path)

# Use it
result = smart_convert("document.pdf")
```

## Plugin System

MarkItDown supports custom plugins for extending functionality.

### Plugin Architecture

Plugins are disabled by default for security:

```python
from markitdown import MarkItDown

# Enable plugins
md = MarkItDown(enable_plugins=True)
```

### Creating Custom Plugins

**Plugin structure:**

```python
class CustomConverter:
    """Custom converter plugin for MarkItDown."""

    def can_convert(self, file_path):
        """Check if this plugin can handle the file."""
        return file_path.endswith('.custom')

    def convert(self, file_path):
        """Convert file to Markdown."""
        # Your conversion logic here
        return {
            'text_content': '# Converted Content\n\n...'
        }
```
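As a concrete illustration of that structure, the sketch below fills in the conversion logic for a hypothetical `.custom` format consisting of `key: value` lines. It follows the `can_convert`/`convert` interface shown above; check the MarkItDown plugin documentation for the exact interface expected by the version you are using.

```python
class KeyValueConverter:
    """Example converter for a hypothetical 'key: value' text format."""

    def can_convert(self, file_path):
        return file_path.endswith('.custom')

    def convert(self, file_path):
        with open(file_path, encoding='utf-8') as f:
            lines = [line.strip() for line in f if line.strip()]

        # Render each 'key: value' line as a Markdown list entry
        entries = []
        for line in lines:
            key, _, value = line.partition(':')
            entries.append(f"- **{key.strip()}**: {value.strip()}")

        return {
            'text_content': '# Converted Content\n\n' + '\n'.join(entries)
        }
```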
### Plugin Registration

```python
from markitdown import MarkItDown

md = MarkItDown(enable_plugins=True)

# Register custom plugin
md.register_plugin(CustomConverter())

# Use normally
result = md.convert("file.custom")
```

### Plugin Use Cases

**Custom formats:**

- Proprietary document formats
- Specialized scientific data formats
- Legacy file formats

**Enhanced processing:**

- Custom OCR engines
- Specialized table extraction
- Domain-specific parsing

**Integration:**

- Enterprise document systems
- Custom databases
- Specialized APIs

### Plugin Security

**Important security considerations:**

- Plugins run with full system access
- Only enable trusted plugins
- Validate plugin code before use
- Disable plugins in production unless required

## Error Handling for Advanced Features

```python
import os

from markitdown import MarkItDown
from openai import OpenAI

def robust_convert(file_path):
    """Convert with fallback strategies."""
    try:
        # Try with all advanced features
        client = OpenAI()
        md = MarkItDown(
            llm_client=client,
            llm_model="gpt-4o",
            docintel_endpoint=os.getenv('AZURE_ENDPOINT'),
            docintel_key=os.getenv('AZURE_KEY')
        )
        return md.convert(file_path)
    except Exception as azure_error:
        print(f"Azure failed: {azure_error}")

        try:
            # Fallback: LLM only
            client = OpenAI()
            md = MarkItDown(llm_client=client, llm_model="gpt-4o")
            return md.convert(file_path)
        except Exception as llm_error:
            print(f"LLM failed: {llm_error}")

            # Final fallback: standard processing
            md = MarkItDown()
            return md.convert(file_path)

# Use it
result = robust_convert("document.pdf")
```

## Best Practices

### Azure Document Intelligence

- Use for complex PDFs only (cost optimization)
- Monitor usage and costs
- Store credentials securely
- Handle quota limits gracefully
- Fall back to standard processing if needed

### LLM Integration

- Use appropriate models for task complexity
- Customize prompts for specific use cases
- Monitor API costs
- Implement rate limiting
- Cache results when possible
- Handle API errors gracefully

### Combined Features

- Test cost/quality tradeoffs
- Use selectively for important documents
- Implement intelligent routing
- Monitor performance and costs
- Have fallback strategies

### Security

- Store API keys securely (environment variables, secrets manager)
- Never commit credentials to code
- Disable plugins unless required
- Validate all inputs
- Use least-privilege access
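To keep credentials out of source code, load them from the environment (or a secrets manager) and fail fast when they are missing. A minimal sketch, reusing the environment variable names from the configuration examples above; the helper name is illustrative:

```python
import os
from markitdown import MarkItDown
from openai import OpenAI

def _require_env(name: str) -> str:
    """Read a required environment variable or fail with a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

# Credentials come from the environment; nothing is hard-coded or committed
md = MarkItDown(
    llm_client=OpenAI(api_key=_require_env("OPENAI_API_KEY")),
    llm_model="gpt-4o",
    docintel_endpoint=_require_env("AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT"),
    docintel_key=_require_env("AZURE_DOCUMENT_INTELLIGENCE_KEY"),
)
```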