zhongwei/gh-dnvriend-ollama-deepseek-ocr-tool-plugins-ollama-deepseek-ocr-tool

Zhongwei Li 3182431ecf Initial commit

2025-11-29 18:23:31 +08:00

7.0 KiB

name	description
skill-ollama-deepseek-ocr-tool	Batch OCR processing with DeepSeek-OCR via Ollama

When to use

Convert textbook/lecture images to markdown notes
Batch OCR processing of scanned documents
Extract text from image sequences (iPhone photos, screenshots)
Create searchable markdown from visual content
Process documents privately without cloud services

ollama-deepseek-ocr-tool Skill

Purpose

This skill provides access to ollama-deepseek-ocr-tool, a CLI tool for fast, private batch OCR processing using DeepSeek-OCR via Ollama. It converts a sequence of images (textbook pages, slides, scans) into a single coherent markdown document.

Key capabilities:

⚡ Fast processing (~3s per image on M4)
🔒 Private - runs entirely locally
📝 Clean markdown output (tables, headings, lists)
🔄 Natural sorting (IMG_1 < IMG_2 < IMG_10)
💰 Free - no API costs or rate limits
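
The natural ordering mentioned above (IMG_1 < IMG_2 < IMG_10) can be illustrated with a small sort key. This is a minimal sketch, not the tool's actual implementation; `natural_key` is a hypothetical helper name introduced here for illustration.

```python
import re


def natural_key(name: str) -> list:
    """Split a filename into text and integer chunks so digit runs compare numerically.

    Hypothetical helper for illustration; the tool's own sorting may differ.
    """
    parts = re.split(r"(\d+)", name)
    # Odd positions hold the digit runs captured by the regex group.
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


files = ["IMG_10.png", "IMG_2.png", "IMG_1.png"]
print(sorted(files))                   # plain string sort: IMG_1, IMG_10, IMG_2
print(sorted(files, key=natural_key))  # natural sort: IMG_1, IMG_2, IMG_10
```

A plain string sort places IMG_10 before IMG_2 because it compares character by character, which is why sequentially numbered photos need a natural sort to stay in page order.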

When to Use This Skill

Use this skill when:

Converting textbook chapters to Obsidian notes
Processing lecture slides or handouts to markdown
Extracting text from scanned documents
Creating searchable study materials from images
Looking up comprehensive examples and troubleshooting steps for the tool

Do NOT use this skill for:

Cloud-based OCR (this is local-only)
Describing image content (extracts text only)
Handwritten text recognition (printed text only)
Real-time streaming OCR (batch processing only)

CLI Tool: ollama-deepseek-ocr-tool

The ollama-deepseek-ocr-tool processes multiple images in sequence and creates a single markdown document with extracted text. Images are sorted naturally and text is appended sequentially for coherent reading.

Installation

# Clone and install
git clone https://github.com/dnvriend/ollama-deepseek-ocr-tool.git
cd ollama-deepseek-ocr-tool
uv tool install .

Prerequisites

Ollama - Local LLM runtime
```
brew install ollama
ollama serve
```
DeepSeek-OCR model (~6GB download)
```
ollama pull deepseek-ocr
```
Python 3.14+ and uv package manager

Quick Start

# Example 1: Process textbook chapter from iPhone photos
ollama-deepseek-ocr-tool "IMG_*.png" chapter-3-notes.md

# Example 2: Convert lecture slides to markdown
ollama-deepseek-ocr-tool "lecture-week5/*.jpg" week5-summary.md

# Example 3: With verbose logging to debug issues
ollama-deepseek-ocr-tool "*.png" output.md -vv

Main Command - Batch OCR Processing

Process images matching a glob pattern and create a markdown document.

Usage:

ollama-deepseek-ocr-tool GLOB_PATTERN OUTPUT_FILE [OPTIONS]

Arguments:

GLOB_PATTERN: Pattern to match images (e.g., "*.png", "dir/*.jpg"). Quote the pattern so the shell passes it to the tool unexpanded.
OUTPUT_FILE: Path to output markdown file (will be overwritten)

Options:

-v/-vv/-vvv: Verbosity (INFO/DEBUG/TRACE)
--help: Show comprehensive help with examples
--version: Show version

Examples:

# Basic: Process all PNGs in current directory
ollama-deepseek-ocr-tool "*.png" output.md

# Process specific directory
ollama-deepseek-ocr-tool "textbook-ch3/*.jpg" chapter-3.md

# With verbose logging
ollama-deepseek-ocr-tool "*.png" output.md -vv

# Preview help (shows all examples)
ollama-deepseek-ocr-tool --help

Output Format:

<!-- Source: IMG_4170.png -->

[extracted text from image 1]

---

<!-- Source: IMG_4171.png -->

[extracted text from image 2]
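
The layout above can be reproduced with a few lines of Python. This is a sketch under assumptions: `build_document` is a hypothetical name, and the exact blank-line placement is inferred from the sample output rather than taken from the tool's source.

```python
def build_document(pages: list[tuple[str, str]]) -> str:
    """Join (filename, text) pairs using the source-comment and '---' layout.

    Assumption: blank-line placement is inferred from the sample output above.
    """
    sections = [f"<!-- Source: {name} -->\n\n{text.strip()}" for name, text in pages]
    return "\n\n---\n\n".join(sections) + "\n"


doc = build_document([
    ("IMG_4170.png", "# Chapter 3\n\nFirst page text."),
    ("IMG_4171.png", "Second page text."),
])
print(doc)
```

Because each section starts with an HTML comment naming its source image, a suspicious passage in the output can be traced back to the page it came from.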

⚙️ Advanced Features

Multi-Level Verbosity Logging

Control logging detail with progressive verbosity levels. All logs output to stderr.

Logging Levels:

Flag	Level	Output	Use Case
(none)	WARNING	Errors and warnings only	Production, quiet mode
`-v`	INFO	+ High-level operations	Normal debugging
`-vv`	DEBUG	+ Detailed info, full tracebacks	Development, troubleshooting
`-vvv`	TRACE	+ Library internals	Deep debugging

Examples:

# INFO level - see operations
ollama-deepseek-ocr-tool "*.png" output.md -v

# DEBUG level - see detailed info
ollama-deepseek-ocr-tool "*.png" output.md -vv

# TRACE level - see all internals
ollama-deepseek-ocr-tool "*.png" output.md -vvv
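
For readers wondering how a count of `-v` flags might translate into the levels in the table, here is a minimal sketch. It is an assumption about the mapping, not the tool's code: Python's standard `logging` module has no TRACE level, so the numeric value 5 and the name `level_for` are hypothetical.

```python
import logging

TRACE = 5  # assumed custom level below DEBUG (10); not part of the standard library


def level_for(verbosity: int) -> int:
    """Map a count of -v flags to a logging level, per the table above."""
    levels = [logging.WARNING, logging.INFO, logging.DEBUG, TRACE]
    # Clamp so that -vvvv and beyond behave like -vvv.
    return levels[min(max(verbosity, 0), len(levels) - 1)]


logging.addLevelName(TRACE, "TRACE")
for count in range(4):
    print(count, logging.getLevelName(level_for(count)))
```

Each extra flag lowers the threshold, so every level includes everything shown at the levels before it.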

What Can Be Extracted

Text & Formatting:

✅ Headings (H1, H2, H3)
✅ Body text with bold/italic
✅ Bulleted and numbered lists
✅ Multi-column layouts

Tables:

✅ Clean markdown table format
✅ Headers and structure preserved
✅ Merged cells handled

Diagrams & Figures:

✅ Text labels extracted
✅ Figure captions captured
❌ Visual content not described
❌ Flowchart arrows not preserved

Performance Characteristics

Speed: ~3 seconds per image (M4 MacBook)
Memory: ~6GB (DeepSeek-OCR model)
Throughput: ~20 images per minute
Scalability: Sequential processing (no parallel batching)
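
Since processing is sequential, total time scales linearly with the number of images. The sketch below turns the ~3 seconds per image figure into a rough batch estimate; `estimate_seconds` is a hypothetical helper, and actual timings depend on hardware and page complexity.

```python
def estimate_seconds(image_count: int, seconds_per_image: float = 3.0) -> float:
    """Estimate total wall-clock time for sequential processing."""
    return image_count * seconds_per_image


for n in (20, 100, 400):
    total = estimate_seconds(n)
    print(f"{n} images: about {total / 60:.0f} min")
```

At the quoted rate, 20 images take about a minute and a 400-page book about 20 minutes.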

🔧 Troubleshooting

Common Issues

Issue: "No files match pattern"

# Check your glob pattern and current directory
ls *.png  # Verify files exist

# Use absolute or relative paths correctly
ollama-deepseek-ocr-tool "./images/*.png" output.md
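
To preview what a quoted pattern would match before running any OCR, a short Python check can help. This sketch assumes the tool expands patterns with glob-style rules similar to Python's `glob` module, which the source does not confirm; `preview_matches` is a hypothetical helper.

```python
import glob


def preview_matches(pattern: str) -> list[str]:
    """Return the paths a glob pattern matches, sorted, without running any OCR."""
    return sorted(glob.glob(pattern))


matches = preview_matches("./images/*.png")
print(f"{len(matches)} file(s) match")
for path in matches:
    print(" ", path)
```

Relative patterns are resolved against the current working directory, so a count of zero often means the command is being run from the wrong folder.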

Issue: "Connection refused" / "OCR extraction failed"

# Ensure Ollama is running
ollama serve

# Verify model is installed
ollama list | grep deepseek-ocr

# Pull model if missing
ollama pull deepseek-ocr

Issue: Poor quality extraction

Use -vv flag to see word counts and verify extraction
Check image quality (resolution, clarity)
For complex layouts, results may vary
Tables and diagrams work best with clear text
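
Word counts can also be checked after the fact by splitting the combined output on its source comments. This is a rough sketch that assumes the output layout shown under Output Format; `words_per_source` is a hypothetical helper, and markdown table rules or horizontal rules inside a page will skew the counts slightly.

```python
import re

SOURCE_MARKER = re.compile(r"<!-- Source: (.+?) -->")


def words_per_source(markdown: str) -> dict[str, int]:
    """Count whitespace-separated words in each section of the combined output."""
    pieces = SOURCE_MARKER.split(markdown)
    # pieces = [preamble, name1, body1, name2, body2, ...]
    counts = {}
    for name, body in zip(pieces[1::2], pieces[2::2]):
        text = body.replace("\n---\n", "\n")  # drop the separator between pages
        counts[name] = len(text.split())
    return counts


sample = (
    "<!-- Source: IMG_4170.png -->\n\nFirst page text.\n\n---\n\n"
    "<!-- Source: IMG_4171.png -->\n\n"
)
for name, count in words_per_source(sample).items():
    flag = "  <- check this image" if count == 0 else ""
    print(f"{name}: {count} words{flag}")
```

A section with zero or very few words usually points at a blurry, cropped, or mostly graphical image worth re-shooting.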

Issue: Slow processing

Expected: ~3 seconds per image on M4
Check if Ollama is using GPU acceleration
Processing is sequential by design: the ~6GB model handles one image at a time

Getting Help

# Show comprehensive help with examples
ollama-deepseek-ocr-tool --help

# Use verbose logging to debug
ollama-deepseek-ocr-tool "*.png" output.md -vv

Exit Codes

0: Success - all images processed
1: Validation error - no files match pattern or invalid arguments
2: Runtime error - Ollama connection failed or model not found
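
When calling the tool from a script, the exit codes above can drive error handling. The sketch below is illustrative only: `run_and_explain` is a hypothetical wrapper, and it runs a stand-in Python command so it works without the tool installed.

```python
import subprocess
import sys

EXIT_MEANINGS = {
    0: "success: all images processed",
    1: "validation error: no files match pattern or invalid arguments",
    2: "runtime error: Ollama connection failed or model not found",
}


def run_and_explain(command: list[str]) -> tuple[int, str]:
    """Run a command and pair its exit code with the meaning listed above."""
    code = subprocess.run(command).returncode
    return code, EXIT_MEANINGS.get(code, "unexpected exit code")


if __name__ == "__main__":
    # Stand-in command so the sketch runs without the tool installed;
    # replace with ["ollama-deepseek-ocr-tool", "*.png", "output.md"].
    code, meaning = run_and_explain([sys.executable, "-c", "raise SystemExit(2)"])
    print(code, meaning)
```

Passing the arguments as a list means no shell is involved, so the glob pattern reaches the tool unexpanded, just as it does when quoted on the command line.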

Best Practices

Organize images before processing: Name files sequentially (IMG_001, IMG_002) for natural sorting
Use descriptive output names: chapter-3-entrepreneurship.md not output.md
Start with small batches: Test with 2-3 images first to verify quality
Enable verbose logging for debugging: Use -vv to see extraction progress and word counts
Review output after processing: OCR may miss formatting or misread complex layouts
Keep images at good resolution: Higher quality = better extraction
Process similar content together: Keep textbook pages separate from diagrams

Resources

GitHub: https://github.com/dnvriend/ollama-deepseek-ocr-tool
Python Package Index: https://pypi.org/project/ollama-deepseek-ocr-tool/
Documentation:
