Files
2025-11-30 08:30:18 +08:00

308 lines
8.6 KiB
Markdown

# MarkItDown Skill - Creation Summary
## Overview
A comprehensive skill for using Microsoft's MarkItDown tool has been created for the Claude Scientific Writer. This skill enables conversion of 15+ file formats to Markdown, optimized for LLM processing and scientific workflows.
## What Was Created
### Core Documentation
1. **SKILL.md** (Main skill file)
- Complete guide to MarkItDown
- Quick start examples
- All supported formats
- Advanced features (AI, Azure DI)
- Best practices
- Use cases and examples
2. **README.md**
- Skill overview
- Key features
- Quick reference
- Integration guide
3. **QUICK_REFERENCE.md**
- Cheat sheet for common tasks
- Quick syntax reference
- Common commands
- Troubleshooting tips
4. **INSTALLATION_GUIDE.md**
- Step-by-step installation
- System dependencies
- Virtual environment setup
- Optional features
- Troubleshooting
### Reference Documentation
Located in `references/`:
1. **api_reference.md**
- Complete API documentation
- Class and method references
- Custom converter development
- Plugin system
- Error handling
- Breaking changes guide
2. **file_formats.md**
- Detailed format-specific guides
- 15+ supported formats
- Format capabilities and limitations
- Best practices per format
- Example outputs
### Utility Scripts
Located in `scripts/`:
1. **batch_convert.py**
- Parallel batch conversion
- Multi-format support
- Recursive directory search
- Progress tracking
- Error reporting
- Command-line interface
2. **convert_with_ai.py**
- AI-enhanced conversions
- Predefined prompt types (scientific, medical, data viz, etc.)
- Custom prompt support
- Multiple model support
- OpenRouter integration (Claude Sonnet 4.5 default)
3. **convert_literature.py**
- Scientific literature conversion
- Metadata extraction from filenames
- Year-based organization
- Automatic index generation
- JSON catalog creation
- Front matter support
### Assets
Located in `assets/`:
1. **example_usage.md**
- 20+ practical examples
- Basic conversions
- Scientific workflows
- AI-enhanced processing
- Batch operations
- Error handling patterns
- Integration examples
### License
- **LICENSE.txt** - MIT License from Microsoft
## Skill Structure
```
.claude/skills/markitdown/
├── SKILL.md # Main skill documentation
├── README.md # Skill overview
├── QUICK_REFERENCE.md # Quick reference guide
├── INSTALLATION_GUIDE.md # Installation instructions
├── SKILL_SUMMARY.md # This file
├── LICENSE.txt # MIT License
├── references/
│ ├── api_reference.md # Complete API docs
│ └── file_formats.md # Format-specific guides
├── scripts/
│ ├── batch_convert.py # Batch conversion utility
│ ├── convert_with_ai.py # AI-enhanced conversion
│ └── convert_literature.py # Literature conversion
└── assets/
└── example_usage.md # Practical examples
```
## Capabilities
### File Format Support
- **Documents**: PDF, DOCX, PPTX, XLSX, XLS, EPUB
- **Images**: JPEG, PNG, GIF, WebP (with OCR)
- **Audio**: WAV, MP3 (with transcription)
- **Web**: HTML, YouTube URLs
- **Data**: CSV, JSON, XML
- **Archives**: ZIP files
- **Email**: Outlook MSG files
### Advanced Features
1. **AI Enhancement via OpenRouter**
- Access to 100+ AI models through OpenRouter
- Multiple preset prompts (scientific, medical, data viz)
- Custom prompt support
- Default: Claude Sonnet 4.5 (best for scientific vision)
- Choose best model for each task
2. **Azure Integration**
- Azure Document Intelligence for complex PDFs
- Enhanced layout understanding
- Better table extraction
3. **Batch Processing**
- Parallel conversion with configurable workers
- Recursive directory processing
- Progress tracking and error reporting
- Format-specific organization
4. **Scientific Workflows**
- Literature conversion with metadata
- Automatic index generation
- Year-based organization
- Citation-friendly output
## Integration with Scientific Writer
The skill has been added to the Scientific Writer's skill catalog:
- **Location**: `.claude/skills/markitdown/`
- **Skill Number**: #5 in Document Manipulation Skills
- **SKILLS.md**: Updated with complete skill description
### Usage Examples
```
> Convert all PDFs in the literature folder to Markdown
> Convert this PowerPoint presentation to Markdown with AI-generated descriptions
> Extract tables from this Excel file
> Transcribe this lecture recording
```
## Scripts Usage
### Batch Convert
```bash
python scripts/batch_convert.py input_dir/ output_dir/ --extensions .pdf .docx --workers 4
```
### AI-Enhanced Convert
```bash
export OPENROUTER_API_KEY="sk-or-v1-..."
python scripts/convert_with_ai.py paper.pdf output.md \
--model anthropic/claude-sonnet-4.5 \
--prompt-type scientific
```
### Literature Convert
```bash
python scripts/convert_literature.py papers/ markdown/ --organize-by-year --create-index
```
## Key Features
1. **Token-Efficient Output**: Markdown optimized for LLM processing
2. **Comprehensive Format Support**: 15+ file types
3. **AI Enhancement**: Detailed image descriptions via OpenAI
4. **OCR Support**: Extract text from scanned documents
5. **Audio Transcription**: Speech-to-text for audio files
6. **YouTube Support**: Video transcript extraction
7. **Plugin System**: Extensible architecture
8. **Batch Processing**: Efficient parallel conversion
9. **Error Handling**: Robust error management
10. **Scientific Focus**: Optimized for research workflows
## Installation
```bash
# Full installation
pip install 'markitdown[all]'
# Selective installation
pip install 'markitdown[pdf,docx,pptx,xlsx]'
```
## Quick Start
```python
from markitdown import MarkItDown
# Basic usage
md = MarkItDown()
result = md.convert("document.pdf")
print(result.text_content)
# With AI via OpenRouter
from openai import OpenAI
client = OpenAI(
api_key="your-openrouter-api-key",
base_url="https://openrouter.ai/api/v1"
)
md = MarkItDown(
llm_client=client,
llm_model="anthropic/claude-sonnet-4.5" # or openai/gpt-4o
)
result = md.convert("presentation.pptx")
```
## Documentation Files
| File | Purpose | Lines |
|------|---------|-------|
| SKILL.md | Main documentation | 400+ |
| api_reference.md | API documentation | 500+ |
| file_formats.md | Format guides | 600+ |
| example_usage.md | Practical examples | 500+ |
| batch_convert.py | Batch conversion | 200+ |
| convert_with_ai.py | AI conversion | 200+ |
| convert_literature.py | Literature conversion | 250+ |
| QUICK_REFERENCE.md | Quick reference | 300+ |
| INSTALLATION_GUIDE.md | Installation guide | 300+ |
**Total**: ~3,000+ lines of documentation and code
## Use Cases
1. **Literature Review**: Convert research papers to Markdown for analysis
2. **Data Extraction**: Extract tables from Excel/PDF for processing
3. **Presentation Processing**: Convert slides with AI descriptions
4. **Document Analysis**: Prepare documents for LLM consumption
5. **Lecture Transcription**: Convert audio recordings to text
6. **YouTube Analysis**: Extract video transcripts
7. **Archive Processing**: Batch convert document collections
## Next Steps
1. Install MarkItDown: `pip install 'markitdown[all]'`
2. Read `QUICK_REFERENCE.md` for common tasks
3. Try example scripts in `scripts/` directory
4. Explore `SKILL.md` for comprehensive guide
5. Check `example_usage.md` for practical examples
## Resources
- **MarkItDown GitHub**: https://github.com/microsoft/markitdown
- **PyPI**: https://pypi.org/project/markitdown/
- **OpenRouter**: https://openrouter.ai (AI model access)
- **OpenRouter API Keys**: https://openrouter.ai/keys
- **OpenRouter Models**: https://openrouter.ai/models
- **License**: MIT (Microsoft Corporation)
- **Python**: 3.10+ required
- **Skill Location**: `.claude/skills/markitdown/`
## Success Criteria
✅ Comprehensive skill documentation created
✅ Complete API reference provided
✅ Format-specific guides included
✅ Utility scripts implemented
✅ Practical examples documented
✅ Installation guide created
✅ Quick reference guide added
✅ Integration with Scientific Writer complete
✅ SKILLS.md updated
✅ Scripts made executable
✅ MIT License included
## Skill Status
**Status**: ✅ Complete and Ready to Use
The MarkItDown skill is fully integrated into the Claude Scientific Writer and ready for use. All documentation, scripts, and examples are in place.