2025-11-29 18:23:31 +08:00

name: skill-ollama-deepseek-ocr-tool
description: Batch OCR processing with DeepSeek-OCR via Ollama

When to use

  • Convert textbook/lecture images to markdown notes
  • Batch OCR processing of scanned documents
  • Extract text from image sequences (iPhone photos, screenshots)
  • Create searchable markdown from visual content
  • Process documents privately without cloud services

ollama-deepseek-ocr-tool Skill

Purpose

This skill provides access to ollama-deepseek-ocr-tool, a CLI tool for fast, private batch OCR processing using DeepSeek-OCR via Ollama. It converts sequences of images (textbook pages, slides, scans) into a single coherent markdown document.

Key capabilities:

  • Fast processing (~3s per image on M4)
  • 🔒 Private - runs entirely locally
  • 📝 Clean markdown output (tables, headings, lists)
  • 🔄 Natural sorting (IMG_1 < IMG_2 < IMG_10)
  • 💰 Free - no API costs or rate limits

When to Use This Skill

Use this skill when:

  • Converting textbook chapters to Obsidian notes
  • Processing lecture slides or handouts to markdown
  • Extracting text from scanned documents
  • Creating searchable study materials from images
  • Looking up comprehensive examples and troubleshooting steps for the tool

Do NOT use this skill for:

  • Cloud-based OCR (this is local-only)
  • Describing image content (extracts text only)
  • Handwritten text recognition (printed text only)
  • Real-time streaming OCR (batch processing only)

CLI Tool: ollama-deepseek-ocr-tool

The ollama-deepseek-ocr-tool processes multiple images in sequence and creates a single markdown document with extracted text. Images are sorted naturally and text is appended sequentially for coherent reading.
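
To illustrate what natural sorting means here, the following is a minimal Python sketch. It is not the tool's actual implementation; the helper name natural_key is hypothetical. It only shows why IMG_2 ends up before IMG_10, which plain string sorting would get wrong.

```python
import re

def natural_key(name: str) -> list:
    """Split a filename into text and integer chunks so digits compare numerically."""
    # re.split with a capture group keeps the digit runs: "IMG_10.png" -> ["IMG_", "10", ".png"]
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]

files = ["IMG_10.png", "IMG_2.png", "IMG_1.png"]
ordered = sorted(files, key=natural_key)
print(ordered)  # ['IMG_1.png', 'IMG_2.png', 'IMG_10.png']
```

A plain sorted(files) would place IMG_10.png before IMG_2.png, which would scramble page order in the output document.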

Installation

# Clone and install
git clone https://github.com/dnvriend/ollama-deepseek-ocr-tool.git
cd ollama-deepseek-ocr-tool
uv tool install .

Prerequisites

  1. Ollama - Local LLM runtime

    brew install ollama
    ollama serve
    
  2. DeepSeek-OCR model (~6GB download)

    ollama pull deepseek-ocr
    
  3. Python 3.14+ and uv package manager

Quick Start

# Example 1: Process textbook chapter from iPhone photos
ollama-deepseek-ocr-tool "IMG_*.png" chapter-3-notes.md

# Example 2: Convert lecture slides to markdown
ollama-deepseek-ocr-tool "lecture-week5/*.jpg" week5-summary.md

# Example 3: With verbose logging to debug issues
ollama-deepseek-ocr-tool "*.png" output.md -vv

Main Command - Batch OCR Processing

Process images matching a glob pattern and create a markdown document.

Usage:

ollama-deepseek-ocr-tool GLOB_PATTERN OUTPUT_FILE [OPTIONS]

Arguments:

  • GLOB_PATTERN: Pattern to match images (e.g., "*.png", "dir/*.jpg")
  • OUTPUT_FILE: Path to output markdown file (will be overwritten)

Options:

  • -v/-vv/-vvv: Verbosity (INFO/DEBUG/TRACE)
  • --help: Show comprehensive help with examples
  • --version: Show version

Examples:

# Basic: Process all PNGs in current directory
ollama-deepseek-ocr-tool "*.png" output.md

# Process specific directory
ollama-deepseek-ocr-tool "textbook-ch3/*.jpg" chapter-3.md

# With verbose logging
ollama-deepseek-ocr-tool "*.png" output.md -vv

# Preview help (shows all examples)
ollama-deepseek-ocr-tool --help

Output Format:

<!-- Source: IMG_4170.png -->

[extracted text from image 1]

---

<!-- Source: IMG_4171.png -->

[extracted text from image 2]
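
The layout above can be approximated in a few lines of Python. This is a sketch under assumptions, not the tool's source: build_document is a hypothetical helper, and the exact blank-line spacing is inferred from the sample output.

```python
def build_document(pages: list[tuple[str, str]]) -> str:
    """Join (filename, extracted_text) pairs into one markdown string."""
    # Each section starts with an HTML comment naming its source image.
    sections = [f"<!-- Source: {name} -->\n\n{text.strip()}" for name, text in pages]
    # Sections are separated by a horizontal rule.
    return "\n\n---\n\n".join(sections) + "\n"

doc = build_document([
    ("IMG_4170.png", "First page text"),
    ("IMG_4171.png", "Second page text"),
])
print(doc)
```

Because the source markers are HTML comments, they stay invisible in rendered markdown while still letting you trace any passage back to its image.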

⚙️ Advanced Features

Multi-Level Verbosity Logging

Control logging detail with progressive verbosity levels. All logs output to stderr.

Logging Levels:

Flag     Level     Output                              Use Case
(none)   WARNING   Errors and warnings only            Production, quiet mode
-v       INFO      + High-level operations             Normal debugging
-vv      DEBUG     + Detailed info, full tracebacks    Development, troubleshooting
-vvv     TRACE     + Library internals                 Deep debugging
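
One common way to implement this kind of progressive verbosity in Python is to count repeated -v flags and map the count to a log level. The sketch below assumes that approach; it is not taken from the tool's code. TRACE is not a standard library level, so the value 5 and the names LEVELS and level_for are assumptions made for illustration.

```python
import argparse
import logging
import sys

TRACE = 5  # hypothetical custom level below DEBUG (10); not defined by the stdlib
logging.addLevelName(TRACE, "TRACE")

# Index = number of -v flags given on the command line.
LEVELS = [logging.WARNING, logging.INFO, logging.DEBUG, TRACE]

def level_for(verbosity: int) -> int:
    """Map a -v count to a log level, clamping anything past -vvv to TRACE."""
    return LEVELS[min(verbosity, len(LEVELS) - 1)]

parser = argparse.ArgumentParser()
parser.add_argument("-v", action="count", default=0, dest="verbosity")
args = parser.parse_args(["-vv"])  # simulate passing -vv

# Logs go to stderr so they never mix with data written to stdout.
logging.basicConfig(stream=sys.stderr, level=level_for(args.verbosity))
```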

Examples:

# INFO level - see operations
ollama-deepseek-ocr-tool "*.png" output.md -v

# DEBUG level - see detailed info
ollama-deepseek-ocr-tool "*.png" output.md -vv

# TRACE level - see all internals
ollama-deepseek-ocr-tool "*.png" output.md -vvv

What Can Be Extracted

Text & Formatting:

  • Headings (H1, H2, H3)
  • Body text with bold/italic
  • Bulleted and numbered lists
  • Multi-column layouts

Tables:

  • Clean markdown table format
  • Headers and structure preserved
  • Merged cells handled

Diagrams & Figures:

  • Text labels extracted
  • Figure captions captured
  • Visual content not described
  • Flowchart arrows not preserved

Performance Characteristics

  • Speed: ~3 seconds per image (M4 MacBook)
  • Memory: ~6GB (DeepSeek-OCR model)
  • Throughput: ~20 images per minute
  • Scalability: Sequential processing (no parallel batching)
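
Since processing is sequential, total run time scales linearly with the number of images. The following rough estimate assumes the ~3 seconds per image figure quoted above; your hardware may differ, and estimate_minutes is a hypothetical helper.

```python
SECONDS_PER_IMAGE = 3.0  # approximate figure quoted above for an M4 MacBook

def estimate_minutes(image_count: int, seconds_per_image: float = SECONDS_PER_IMAGE) -> float:
    """Estimate total processing time in minutes for a sequential batch."""
    return image_count * seconds_per_image / 60

print(estimate_minutes(20))   # 1.0
print(estimate_minutes(250))  # 12.5
```

At this rate a 250-page textbook would take roughly twelve and a half minutes.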

🔧 Troubleshooting

Common Issues

Issue: "No files match pattern"

# Check your glob pattern and current directory
ls *.png  # Verify files exist

# Use absolute or relative paths correctly
ollama-deepseek-ocr-tool "./images/*.png" output.md
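
If ls is inconvenient, you can preview what a pattern matches from Python. This sketch uses the standard glob module; the tool's own matching may differ in details such as case sensitivity, so treat the result as a sanity check only. preview_matches is a hypothetical helper.

```python
import glob

def preview_matches(pattern: str) -> list[str]:
    """Return the files a glob pattern matches, sorted by name."""
    return sorted(glob.glob(pattern))

matches = preview_matches("./images/*.png")
print(f"{len(matches)} file(s) match")
for path in matches[:5]:  # show at most the first five
    print(" ", path)
```

An empty result usually means the pattern is relative to a different working directory than the one you are in.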

Issue: "Connection refused" / "OCR extraction failed"

# Ensure Ollama is running
ollama serve

# Verify model is installed
ollama list | grep deepseek-ocr

# Pull model if missing
ollama pull deepseek-ocr

Issue: Poor quality extraction

  • Use the -vv flag to see per-image word counts and verify that text was extracted
  • Check image quality (resolution, clarity)
  • For complex layouts, results may vary
  • Tables and diagrams work best with clear text

Issue: Slow processing

  • Expected: ~3 seconds per image on M4
  • Check if Ollama is using GPU acceleration
  • Sequential processing is by design (6GB model)

Getting Help

# Show comprehensive help with examples
ollama-deepseek-ocr-tool --help

# Use verbose logging to debug
ollama-deepseek-ocr-tool "*.png" output.md -vv

Exit Codes

  • 0: Success - all images processed
  • 1: Validation error - no files match pattern or invalid arguments
  • 2: Runtime error - Ollama connection failed or model not found
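
When calling the tool from a script, the exit codes listed above let you distinguish a bad pattern from an Ollama problem. This is a minimal sketch that assumes ollama-deepseek-ocr-tool is on your PATH; describe_exit and run_ocr are hypothetical helpers.

```python
import subprocess

# Meanings as documented in the exit code list above.
EXIT_MEANINGS = {
    0: "success: all images processed",
    1: "validation error: no files matched or arguments were invalid",
    2: "runtime error: Ollama connection failed or model not found",
}

def describe_exit(code: int) -> str:
    """Translate an exit code into a human-readable message."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")

def run_ocr(pattern: str, output: str) -> int:
    """Run the CLI and return its exit code."""
    # Passing a list (no shell) hands the glob pattern to the tool unexpanded,
    # which matches the quoted patterns used in the examples.
    result = subprocess.run(["ollama-deepseek-ocr-tool", pattern, output])
    print(describe_exit(result.returncode))
    return result.returncode
```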

Best Practices

  1. Organize images before processing: Name files sequentially (IMG_001, IMG_002) for natural sorting
  2. Use descriptive output names: chapter-3-entrepreneurship.md rather than output.md
  3. Start with small batches: Test with 2-3 images first to verify quality
  4. Enable verbose logging for debugging: Use -vv to see extraction progress and word counts
  5. Review output after processing: OCR may miss formatting or misread complex layouts
  6. Keep images at good resolution: Higher quality = better extraction
  7. Process similar content together: Keep textbook pages separate from diagrams
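
To support the review step, the source markers in the output file make it easy to spot pages where little or no text was extracted. The sketch below assumes the output layout shown under Output Format. words_per_source is a hypothetical helper, and the word count is a rough whitespace split, not the tool's own metric.

```python
import re

SOURCE_MARKER = re.compile(r"<!-- Source: (.+?) -->")

def words_per_source(markdown: str) -> dict[str, int]:
    """Map each source filename to the word count of the text that follows it."""
    # Splitting on a pattern with one group yields
    # [preamble, name1, text1, name2, text2, ...]
    parts = SOURCE_MARKER.split(markdown)
    counts = {}
    for name, text in zip(parts[1::2], parts[2::2]):
        body = text.strip().removesuffix("---")  # drop the trailing section separator
        counts[name] = len(body.split())
    return counts

sample = (
    "<!-- Source: IMG_4170.png -->\n\nHello world\n\n---\n\n"
    "<!-- Source: IMG_4171.png -->\n\n"
)
print(words_per_source(sample))  # {'IMG_4170.png': 2, 'IMG_4171.png': 0}
```

A count of zero, or one far below its neighbours, is a hint to recheck that image's quality.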

Resources