# Advanced Integrations Reference

This document covers advanced MarkItDown features: Azure Document Intelligence integration, LLM-powered image descriptions, and the plugin system.

## Azure Document Intelligence Integration

Azure Document Intelligence (formerly Form Recognizer) is a cloud document-analysis service. Routing PDFs through it typically yields better table extraction and layout analysis than MarkItDown's built-in PDF converter.

### Setup

**Prerequisites:**
1. Azure subscription
2. Document Intelligence resource created in Azure
3. Endpoint URL and API key

**Create Azure Resource:**
```bash
# Using Azure CLI
az cognitiveservices account create \
  --name my-doc-intelligence \
  --resource-group my-resource-group \
  --kind FormRecognizer \
  --sku F0 \
  --location eastus
```

### Basic Usage

```python
from markitdown import MarkItDown

md = MarkItDown(
    docintel_endpoint="https://YOUR-RESOURCE.cognitiveservices.azure.com/",
    docintel_key="YOUR-API-KEY"
)

result = md.convert("complex_document.pdf")
print(result.text_content)
```

### Configuration from Environment Variables

```python
import os
from markitdown import MarkItDown

# Read credentials from the environment. Set these variables in your shell
# or deployment configuration, not in code, so secrets stay out of the source.
# os.environ[...] raises KeyError if a variable is missing.
endpoint = os.environ['AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT']
key = os.environ['AZURE_DOCUMENT_INTELLIGENCE_KEY']

md = MarkItDown(docintel_endpoint=endpoint, docintel_key=key)

result = md.convert("document.pdf")
```
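
A small helper can make missing credentials fail with a clear message before any conversion starts. The sketch below is one way to do that, not part of MarkItDown: `load_docintel_kwargs` is a hypothetical helper name, and it assumes the same environment variable names used in this section.

```python
import os
from typing import Mapping, Optional

ENDPOINT_VAR = "AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT"
KEY_VAR = "AZURE_DOCUMENT_INTELLIGENCE_KEY"


def load_docintel_kwargs(env: Optional[Mapping[str, str]] = None) -> dict:
    """Return MarkItDown keyword arguments, or raise if a variable is missing."""
    env = os.environ if env is None else env
    # Treat unset and empty values the same way
    missing = [name for name in (ENDPOINT_VAR, KEY_VAR) if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "docintel_endpoint": env[ENDPOINT_VAR],
        "docintel_key": env[KEY_VAR],
    }
```

With this in place, the instance can be created as `MarkItDown(**load_docintel_kwargs())`.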

### When to Use Azure Document Intelligence

**Use for:**
- Complex PDFs with sophisticated tables
- Multi-column layouts
- Forms and structured documents
- Scanned documents requiring OCR
- PDFs with mixed content types
- Documents with intricate formatting

**Benefits over standard extraction:**
- **Superior table extraction** - Better handling of merged cells, complex layouts
- **Layout analysis** - Understands document structure (headers, footers, columns)
- **Form fields** - Extracts key-value pairs from forms
- **Reading order** - Maintains correct text flow in complex layouts
- **OCR quality** - High-quality text extraction from scanned documents

### Comparison Example

**Standard extraction:**
```python
from markitdown import MarkItDown

md = MarkItDown()
result = md.convert("complex_table.pdf")
# May struggle with complex tables
```

**Azure Document Intelligence:**
```python
from markitdown import MarkItDown

md = MarkItDown(
    docintel_endpoint="YOUR-ENDPOINT",
    docintel_key="YOUR-KEY"
)
result = md.convert("complex_table.pdf")
# Better table reconstruction and layout understanding
```

### Cost Considerations

Azure Document Intelligence is billed per page:
- **Free tier (F0)**: 500 pages per month
- **Paid tiers**: Pay per page processed
- Monitor usage to control costs
- Use standard extraction for simple documents
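
Usage monitoring can start as a simple in-process counter. The sketch below is an illustration under assumptions: `PageBudget` is a hypothetical class, the default limit reuses the 500-page figure quoted above, and the page count of each document has to be obtained separately. The counter lives in memory only, so a real deployment would persist it.

```python
from dataclasses import dataclass


@dataclass
class PageBudget:
    """Track pages sent to a metered service against a monthly limit."""

    monthly_limit: int = 500  # free-tier figure quoted in this section
    used: int = 0

    def can_afford(self, pages: int) -> bool:
        """Return True if processing `pages` more pages stays within the limit."""
        return self.used + pages <= self.monthly_limit

    def record(self, pages: int) -> None:
        """Add processed pages to the running total, refusing to exceed the limit."""
        if not self.can_afford(pages):
            raise ValueError("page budget exceeded")
        self.used += pages
```

A caller could check `budget.can_afford(n)` before choosing the Document Intelligence path and fall back to standard extraction otherwise.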

### Error Handling

```python
from markitdown import MarkItDown

md = MarkItDown(
    docintel_endpoint="YOUR-ENDPOINT",
    docintel_key="YOUR-KEY"
)

try:
    result = md.convert("document.pdf")
    print(result.text_content)
except Exception as e:
    print(f"Document Intelligence error: {e}")
    # Common issues: authentication, quota exceeded, unsupported file
```
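
Transient failures such as throttling can often be handled by retrying with a growing delay. The following is a minimal sketch of exponential backoff; `retry_with_backoff` is a hypothetical helper, not a MarkItDown function, and it retries on every exception for brevity. In real code, narrow the `except` clause so that authentication errors and unsupported files fail immediately.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation(), retrying on any exception with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            # Delay doubles each time: base, 2*base, 4*base, ...
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

Usage would look like `retry_with_backoff(lambda: md.convert("document.pdf"))`. The `sleep` parameter exists so the delay can be replaced in tests.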

## LLM-Powered Image Descriptions

Generate detailed, contextual descriptions for images using large language models.

### Setup with OpenAI

```python
from markitdown import MarkItDown
from openai import OpenAI

client = OpenAI(api_key="YOUR-OPENAI-API-KEY")
md = MarkItDown(llm_client=client, llm_model="gpt-4o")

result = md.convert("image.jpg")
print(result.text_content)
```

### Supported Use Cases

**Images in documents:**
```python
from markitdown import MarkItDown
from openai import OpenAI

client = OpenAI()
md = MarkItDown(llm_client=client, llm_model="gpt-4o")

# PowerPoint with images
result = md.convert("presentation.pptx")

# Word documents with images
result = md.convert("report.docx")

# Standalone images
result = md.convert("diagram.png")
```

### Custom Prompts

Customize the LLM prompt for specific needs:

```python
from markitdown import MarkItDown
from openai import OpenAI

client = OpenAI()

# For diagrams
md = MarkItDown(
    llm_client=client,
    llm_model="gpt-4o",
    llm_prompt="Analyze this diagram and explain all components, connections, and relationships in detail"
)

# For charts
md = MarkItDown(
    llm_client=client,
    llm_model="gpt-4o",
    llm_prompt="Describe this chart, including the type, axes, data points, trends, and key insights"
)

# For UI screenshots
md = MarkItDown(
    llm_client=client,
    llm_model="gpt-4o",
    llm_prompt="Describe this user interface screenshot, listing all UI elements, their layout, and functionality"
)

# For scientific figures
md = MarkItDown(
    llm_client=client,
    llm_model="gpt-4o",
    llm_prompt="Describe this scientific figure in detail, including methodology, results shown, and significance"
)
```

### Model Selection

**GPT-4o (Recommended):**
- Stronger vision capabilities of the two models listed here
- High-quality descriptions
- Good at understanding context
- Higher cost per image

**GPT-4o-mini:**
- Lower cost alternative
- Good for simpler images
- Faster processing
- May miss subtle details

```python
from markitdown import MarkItDown
from openai import OpenAI

client = OpenAI()

# High quality (more expensive)
md_quality = MarkItDown(llm_client=client, llm_model="gpt-4o")

# Budget option (less expensive)
md_budget = MarkItDown(llm_client=client, llm_model="gpt-4o-mini")
```

### Configuration from Environment

```python
import os
from markitdown import MarkItDown
from openai import OpenAI

# Set API key in environment
os.environ['OPENAI_API_KEY'] = 'YOUR-API-KEY'

client = OpenAI()  # Uses env variable
md = MarkItDown(llm_client=client, llm_model="gpt-4o")
```

### Alternative LLM Providers

**Anthropic Claude:**
```python
from markitdown import MarkItDown
from anthropic import Anthropic

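# MarkItDown calls llm_client through the OpenAI chat-completions interface
# (client.chat.completions.create), which the Anthropic client does not expose.
# Passing this client directly is not expected to work: use an
# OpenAI-compatible endpoint or write an adapter, and check the current
# MarkItDown documentation for supported clients.
client = Anthropic(api_key="YOUR-API-KEY")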
```

**Azure OpenAI:**
```python
from markitdown import MarkItDown
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key="YOUR-AZURE-KEY",
    api_version="2024-02-01",
    azure_endpoint="https://YOUR-RESOURCE.openai.azure.com"
)

md = MarkItDown(llm_client=client, llm_model="gpt-4o")
```

### Cost Management

**Strategies to reduce LLM costs:**

1. **Selective processing:**
```python
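from markitdown import MarkItDown
from openai import OpenAI

def is_important_document(path):
    """Hypothetical policy for illustration; replace with your own criteria."""
    return "report" in path.lower()

file = "quarterly_report.pptx"  # example path

# Only use LLM for important documents
if is_important_document(file):
    md = MarkItDown(llm_client=OpenAI(), llm_model="gpt-4o")
else:
    md = MarkItDown()  # Standard processing

result = md.convert(file)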
```

2. **Image filtering:**
```python
# Pre-process to identify images that need descriptions
# Only use LLM for complex/important images
```

3. **Batch processing:**
```python
# Process multiple images in batches
# Monitor costs and set limits
```

4. **Model selection:**
```python
# Use gpt-4o-mini for simple images
# Reserve gpt-4o for complex visualizations
```
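
Strategies 2 and 4 can be combined in one routing helper. The sketch below is a rough heuristic under stated assumptions: `choose_llm_model` is a hypothetical function, the byte thresholds are arbitrary placeholders, and file size is only a crude proxy for how complex an image is.

```python
import os
from typing import Optional

IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".webp"}


def choose_llm_model(
    path: str,
    skip_below_bytes: int = 10_000,
    large_above_bytes: int = 500_000,
) -> Optional[str]:
    """Return a model name for an image, or None to skip LLM description."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in IMAGE_EXTENSIONS:
        return None  # not an image: no description needed
    size = os.path.getsize(path)
    if size < skip_below_bytes:
        return None  # likely an icon or decorative image
    if size < large_above_bytes:
        return "gpt-4o-mini"  # cheaper model for mid-sized images
    return "gpt-4o"  # reserve the larger model for big, detailed images
```

When the function returns `None`, a plain `MarkItDown()` instance can be used; otherwise pass the returned name as `llm_model`.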

### Performance Considerations

**LLM processing adds latency:**
- Each image requires an API call
- Processing time: typically a few seconds per image (roughly 1-5 s)
- Actual latency depends on the model, image size, and network conditions
- Consider parallel processing for multiple images

**Batch optimization:**
```python
from markitdown import MarkItDown
from openai import OpenAI
import concurrent.futures

client = OpenAI()
md = MarkItDown(llm_client=client, llm_model="gpt-4o")

def process_image(image_path):
    return md.convert(image_path)

# Process multiple images in parallel
images = ["img1.jpg", "img2.jpg", "img3.jpg"]
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    results = list(executor.map(process_image, images))
```
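
One caveat with `executor.map` is that a single failed conversion raises when its result is reached and the remaining results are lost. The sketch below collects successes and failures separately instead. `convert_many` is a hypothetical helper; it takes the conversion function as a parameter, and like the example above it assumes that sharing one converter across threads is acceptable.

```python
import concurrent.futures
from typing import Any, Callable, Dict, Iterable, Tuple


def convert_many(
    convert: Callable[[str], Any],
    paths: Iterable[str],
    max_workers: int = 3,
) -> Tuple[Dict[str, Any], Dict[str, Exception]]:
    """Convert paths in parallel; return (results, errors) keyed by path."""
    results: Dict[str, Any] = {}
    errors: Dict[str, Exception] = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_path = {executor.submit(convert, p): p for p in paths}
        for future in concurrent.futures.as_completed(future_to_path):
            path = future_to_path[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # keep going; report the failure later
                errors[path] = exc
    return results, errors
```

Usage would be `results, errors = convert_many(md.convert, images)`, after which `errors` can be logged or retried.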

## Combined Advanced Features

### Azure Document Intelligence + LLM Descriptions

Both sets of options can be passed to the same instance:

```python
from markitdown import MarkItDown
from openai import OpenAI

client = OpenAI()
md = MarkItDown(
    llm_client=client,
    llm_model="gpt-4o",
    docintel_endpoint="YOUR-AZURE-ENDPOINT",
    docintel_key="YOUR-AZURE-KEY"
)

# Which converter handles a file, and whether image descriptions are added,
# can vary by MarkItDown version: verify the output on a sample document
result = md.convert("complex_report.pdf")
```

**Use cases:**
- Research papers with figures
- Business reports with charts
- Technical documentation with diagrams
- Presentations with visual data

### Smart Document Processing Pipeline

```python
from markitdown import MarkItDown
from openai import OpenAI
import os

def smart_convert(file_path):
    """Intelligently choose processing method based on file type."""
    ext = os.path.splitext(file_path)[1].lower()

    # PDFs with complex tables: Use Azure
    if ext == '.pdf':
        md = MarkItDown(
            docintel_endpoint=os.getenv('AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT'),
            docintel_key=os.getenv('AZURE_DOCUMENT_INTELLIGENCE_KEY')
        )

    # Documents/presentations with images: Use LLM
    elif ext in ['.pptx', '.docx']:
        # Create the client only when needed; OpenAI() requires OPENAI_API_KEY
        md = MarkItDown(
            llm_client=OpenAI(),
            llm_model="gpt-4o"
        )

    # Simple formats: Standard processing
    else:
        md = MarkItDown()

    return md.convert(file_path)

# Use it
result = smart_convert("document.pdf")
```

## Plugin System

MarkItDown supports custom plugins for extending functionality.

### Plugin Architecture

Plugins are disabled by default for security:

```python
from markitdown import MarkItDown

# Enable plugins
md = MarkItDown(enable_plugins=True)
```

### Creating Custom Plugins

**Plugin structure:**
```python
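from typing import Any, BinaryIO

from markitdown import DocumentConverter, DocumentConverterResult, StreamInfo


class CustomConverter(DocumentConverter):
    """Custom converter for .custom files (API as of MarkItDown 0.1.x)."""

    def accepts(self, file_stream: BinaryIO, stream_info: StreamInfo, **kwargs: Any) -> bool:
        """Check if this converter can handle the stream."""
        return (stream_info.extension or "").lower() == ".custom"

    def convert(self, file_stream: BinaryIO, stream_info: StreamInfo, **kwargs: Any) -> DocumentConverterResult:
        """Convert the stream to Markdown."""
        # Your conversion logic here
        return DocumentConverterResult(markdown="# Converted Content\n\n...")
```

### Plugin Registration

```python
from markitdown import MarkItDown

md = MarkItDown()

# Register the converter directly on this instance. Packaged plugins are
# instead discovered through the "markitdown.plugin" entry point group when
# enable_plugins=True, and expose register_converters(markitdown, **kwargs).
md.register_converter(CustomConverter())

# Use normally
result = md.convert("file.custom")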
```

### Plugin Use Cases

**Custom formats:**
- Proprietary document formats
- Specialized scientific data formats
- Legacy file formats

**Enhanced processing:**
- Custom OCR engines
- Specialized table extraction
- Domain-specific parsing

**Integration:**
- Enterprise document systems
- Custom databases
- Specialized APIs

### Plugin Security

**Important security considerations:**
- Plugins run arbitrary Python code with the same privileges as the host process
- Only enable for trusted plugins
- Validate plugin code before use
- Disable plugins in production unless required

## Error Handling for Advanced Features

```python
import os

from markitdown import MarkItDown
from openai import OpenAI

def robust_convert(file_path):
    """Convert with fallback strategies."""
    try:
        # Try with all advanced features
        client = OpenAI()
        md = MarkItDown(
            llm_client=client,
            llm_model="gpt-4o",
            docintel_endpoint=os.getenv('AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT'),
            docintel_key=os.getenv('AZURE_DOCUMENT_INTELLIGENCE_KEY')
        )
        return md.convert(file_path)

    except Exception as full_error:
        print(f"Full-featured conversion failed: {full_error}")

        try:
            # Fallback: LLM only
            client = OpenAI()
            md = MarkItDown(llm_client=client, llm_model="gpt-4o")
            return md.convert(file_path)

        except Exception as llm_error:
            print(f"LLM failed: {llm_error}")

            # Final fallback: Standard processing
            md = MarkItDown()
            return md.convert(file_path)

# Use it
result = robust_convert("document.pdf")
```

## Best Practices

### Azure Document Intelligence
- Use for complex PDFs only (cost optimization)
- Monitor usage and costs
- Store credentials securely
- Handle quota limits gracefully
- Fall back to standard processing if needed

### LLM Integration
- Use appropriate models for task complexity
- Customize prompts for specific use cases
- Monitor API costs
- Implement rate limiting
- Cache results when possible
- Handle API errors gracefully
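
Caching can be as simple as keying results by a hash of the file contents. The sketch below is one possible approach, not a MarkItDown feature: `cached_convert` and the `.markitdown_cache` directory are hypothetical names, and the conversion function is passed in as a parameter. The cache key ignores the model and prompt, so include them in the hash if they vary between runs.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cached_convert(
    path: str,
    convert_to_text: Callable[[str], str],
    cache_dir: str = ".markitdown_cache",
) -> str:
    """Return Markdown for path, reusing a cached copy keyed by file content."""
    digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
    cache_file = Path(cache_dir) / f"{digest}.md"
    if cache_file.exists():
        return cache_file.read_text(encoding="utf-8")  # cache hit: no API call
    text = convert_to_text(path)
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(text, encoding="utf-8")
    return text
```

Usage would look like `cached_convert("diagram.png", lambda p: md.convert(p).text_content)`.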

### Combined Features
- Test cost/quality tradeoffs
- Use selectively for important documents
- Implement intelligent routing
- Monitor performance and costs
- Have fallback strategies

### Security
- Store API keys securely (environment variables, secrets manager)
- Never commit credentials to code
- Disable plugins unless required
- Validate all inputs
- Use least privilege access