Initial commit

2025-11-29 18:23:31 +08:00
commit 3182431ecf
5 changed files with 386 additions and 0 deletions
--- a/.claude-plugin/plugin.json
+++ b/.claude-plugin/plugin.json
@@ -0,0 +1,15 @@
+{
+  "name": "ollama-deepseek-ocr-tool",
+  "description": "A CLI tool that loads images, sends a prompt to ollama to invoke deepseek-ocr and instructs it to return the image as markdown",
+  "version": "0.1.0",
+  "author": {
+    "name": "Dennis Vriend",
+    "email": "dvriend@ilionx.com"
+  },
+  "skills": [
+    "./skills"
+  ],
+  "commands": [
+    "./commands"
+  ]
+}
--- a/README.md
+++ b/README.md
@@ -0,0 +1,3 @@
+# ollama-deepseek-ocr-tool
+
+A CLI tool that loads images, sends a prompt to ollama to invoke deepseek-ocr and instructs it to return the image as markdown
--- a/commands/help.md
+++ b/commands/help.md
@@ -0,0 +1,61 @@
+---
+description: Show help information for ollama-deepseek-ocr-tool
+argument-hint: none
+---
+
+Display help information for ollama-deepseek-ocr-tool, a CLI tool for batch OCR processing
+using DeepSeek-OCR via Ollama. The tool converts image sequences into markdown documents.
+
+## Usage
+
+```bash
+# Show full help with examples and troubleshooting
+ollama-deepseek-ocr-tool --help
+
+# Show version
+ollama-deepseek-ocr-tool --version
+```
+
+## Options
+
+- `--help` / `-h`: Show help with examples, prerequisites, and troubleshooting
+- `--version`: Show version number
+- `-v`, `-vv`, `-vvv`: Verbosity levels (INFO, DEBUG, TRACE)
+
+## Examples
+
+```bash
+# Get help (shows progressive examples)
+ollama-deepseek-ocr-tool --help
+
+# Check version
+ollama-deepseek-ocr-tool --version
+
+# Basic usage (from help)
+ollama-deepseek-ocr-tool "*.png" output.md
+
+# With verbose logging
+ollama-deepseek-ocr-tool "*.png" output.md -vv
+```
+
+## What Help Shows
+
+The `--help` output includes:
+- **Description**: What the tool does
+- **Arguments**: GLOB_PATTERN and OUTPUT_FILE
+- **Examples**: Progressive from simple to complex
+- **Output Format**: How the markdown is structured
+- **Prerequisites**: Ollama setup steps
+- **Troubleshooting**: Common errors and solutions
+
+## Quick Start
+
+```bash
+# 1. Install prerequisites
+brew install ollama
+ollama serve  # runs in the foreground; keep it running in a separate terminal
+ollama pull deepseek-ocr
+
+# 2. Process images
+ollama-deepseek-ocr-tool "IMG_*.png" chapter.md
+```
--- a/plugin.lock.json
+++ b/plugin.lock.json
@@ -0,0 +1,49 @@
+{
+  "$schema": "internal://schemas/plugin.lock.v1.json",
+  "pluginId": "gh:dnvriend/ollama-deepseek-ocr-tool:plugins/ollama-deepseek-ocr-tool",
+  "normalized": {
+    "repo": null,
+    "ref": "refs/tags/v20251128.0",
+    "commit": "ac93b6127e632488400c0081e5614a45ce98d40f",
+    "treeHash": "ad6930ae20fed449775ceca7701a19abba47996c443b36f8c5559f2c18ba12f7",
+    "generatedAt": "2025-11-28T10:16:36.987596Z",
+    "toolVersion": "publish_plugins.py@0.2.0"
+  },
+  "origin": {
+    "remote": "git@github.com:zhongweili/42plugin-data.git",
+    "branch": "master",
+    "commit": "aa1497ed0949fd50e99e70d6324a29c5b34f9390",
+    "repoRoot": "/Users/zhongweili/projects/openmind/42plugin-data"
+  },
+  "manifest": {
+    "name": "ollama-deepseek-ocr-tool",
+    "description": "A CLI tool that loads images, sends a prompt to ollama to invoke deepseek-ocr and instructs it to return the image as markdown",
+    "version": "0.1.0"
+  },
+  "content": {
+    "files": [
+      {
+        "path": "README.md",
+        "sha256": "2b3554b3c7e485b93d6a2a94e0a8ae17fd2aaa37bb714fc6e0effc0b64bc0398"
+      },
+      {
+        "path": ".claude-plugin/plugin.json",
+        "sha256": "7842278bcd56e5dc45cd0b49c2f14e1cbf2f508bcb5b897c893160d0ff04a185"
+      },
+      {
+        "path": "commands/help.md",
+        "sha256": "09ac30b7e0bab611f436f02b8bdeb075c8069da8ba7b8d2caa37646d0f4da39c"
+      },
+      {
+        "path": "skills/ollama-deepseek-ocr-tool/SKILL.md",
+        "sha256": "0145f404eb17895139f9af37889e87e77b4ee7fce5c5ef312c457aabe9d51127"
+      }
+    ],
+    "dirSha256": "ad6930ae20fed449775ceca7701a19abba47996c443b36f8c5559f2c18ba12f7"
+  },
+  "security": {
+    "scannedAt": null,
+    "scannerVersion": null,
+    "flags": []
+  }
+}
--- a/skills/ollama-deepseek-ocr-tool/SKILL.md
+++ b/skills/ollama-deepseek-ocr-tool/SKILL.md
@@ -0,0 +1,258 @@
+---
+name: skill-ollama-deepseek-ocr-tool
+description: Batch OCR processing with DeepSeek-OCR via Ollama
+---
+
+# When to use
+
+- Convert textbook/lecture images to markdown notes
+- Batch OCR processing of scanned documents
+- Extract text from image sequences (iPhone photos, screenshots)
+- Create searchable markdown from visual content
+- Process documents privately without cloud services
+
+# ollama-deepseek-ocr-tool Skill
+
+## Purpose
+
+This skill provides access to `ollama-deepseek-ocr-tool`, a CLI tool for fast, private batch OCR processing using DeepSeek-OCR via Ollama. It converts sequences of images (textbook pages, slides, scans) into a single coherent markdown document.
+
+**Key capabilities:**
+- ⚡ Fast processing (~3s per image on M4)
+- 🔒 Private - runs entirely locally
+- 📝 Clean markdown output (tables, headings, lists)
+- 🔄 Natural sorting (IMG_1 < IMG_2 < IMG_10)
+- 💰 Free - no API costs or rate limits
+
+## When to Use This Skill
+
+**Use this skill when:**
+- Converting textbook chapters to Obsidian notes
+- Processing lecture slides or handouts to markdown
+- Extracting text from scanned documents
+- Creating searchable study materials from images
+- Looking up comprehensive usage examples and troubleshooting steps
+
+**Do NOT use this skill for:**
+- Cloud-based OCR (this is local-only)
+- Describing image content (extracts text only)
+- Handwritten text recognition (printed text only)
+- Real-time streaming OCR (batch processing only)
+
+## CLI Tool: ollama-deepseek-ocr-tool
+
+`ollama-deepseek-ocr-tool` processes multiple images in sequence and creates a single markdown document with the extracted text. Images are sorted naturally, and their text is appended in that order so the document reads coherently.
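+
+As a rough illustration of what natural ordering means for numbered files, the sketch below previews the order with `sort -V`. It is not the tool's own sorting code: `preview_order` is a hypothetical helper, and it assumes a `sort` that supports version sort (`-V`), which GNU coreutils does and other implementations may not.
+
+```bash
+# preview_order: hypothetical helper, not part of ollama-deepseek-ocr-tool.
+# Prints the given file names in natural (version) order, one per line.
+# Assumes sort(1) supports -V.
+preview_order() {
+  printf '%s\n' "$@" | LC_ALL=C sort -V
+}
+
+# Plain lexical sorting would put IMG_10.png before IMG_2.png.
+preview_order IMG_10.png IMG_2.png IMG_1.png
+```
+
+Running something like `preview_order IMG_*.png` before a large batch may help confirm that pages will be appended in the intended order.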
+
+### Installation
+
+```bash
+# Clone and install
+git clone https://github.com/dnvriend/ollama-deepseek-ocr-tool.git
+cd ollama-deepseek-ocr-tool
+uv tool install .
+```
+
+### Prerequisites
+
+1. **Ollama** - Local LLM runtime
+   ```bash
+   brew install ollama
+   ollama serve  # runs in the foreground; keep it running in a separate terminal
+   ```
+
+2. **DeepSeek-OCR model** (~6GB download)
+   ```bash
+   ollama pull deepseek-ocr
+   ```
+
+3. **Python 3.14+** and **uv package manager**
+
+### Quick Start
+
+```bash
+# Example 1: Process textbook chapter from iPhone photos
+ollama-deepseek-ocr-tool "IMG_*.png" chapter-3-notes.md
+
+# Example 2: Convert lecture slides to markdown
+ollama-deepseek-ocr-tool "lecture-week5/*.jpg" week5-summary.md
+
+# Example 3: With verbose logging to debug issues
+ollama-deepseek-ocr-tool "*.png" output.md -vv
+```
+
+### Main Command - Batch OCR Processing
+
+Process images matching a glob pattern and create a markdown document.
+
+**Usage:**
+```bash
+ollama-deepseek-ocr-tool GLOB_PATTERN OUTPUT_FILE [OPTIONS]
+```
+
+**Arguments:**
+- `GLOB_PATTERN`: Pattern to match images (e.g., "*.png", "dir/*.jpg")
+- `OUTPUT_FILE`: Path to the output markdown file (an existing file will be overwritten)
+
+**Options:**
+- `-v/-vv/-vvv`: Verbosity (INFO/DEBUG/TRACE)
+- `--help`: Show comprehensive help with examples
+- `--version`: Show version
+
+**Examples:**
+```bash
+# Basic: Process all PNGs in current directory
+ollama-deepseek-ocr-tool "*.png" output.md
+
+# Process specific directory
+ollama-deepseek-ocr-tool "textbook-ch3/*.jpg" chapter-3.md
+
+# With verbose logging
+ollama-deepseek-ocr-tool "*.png" output.md -vv
+
+# Preview help (shows all examples)
+ollama-deepseek-ocr-tool --help
+```
+
+**Output Format:**
+```markdown
+<!-- Source: IMG_4170.png -->
+
+[extracted text from image 1]
+
+---
+
+<!-- Source: IMG_4171.png -->
+
+[extracted text from image 2]
+```
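+
+Because each page is introduced by a `<!-- Source: ... -->` comment, the markers can be counted to check that every image made it into the document. A minimal sketch, assuming the output follows the format shown above exactly; `count_sources` is a hypothetical helper, not a feature of the tool.
+
+```bash
+# count_sources: hypothetical helper, not part of ollama-deepseek-ocr-tool.
+# Counts the "<!-- Source: ... -->" marker lines in a generated markdown file.
+count_sources() {
+  grep -c '^<!-- Source: .* -->$' "$1"
+}
+```
+
+For example, the result of `count_sources chapter-3.md` could be compared with `ls textbook-ch3/*.jpg | wc -l` to spot skipped pages.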
+
+<details>
+<summary><strong>⚙️  Advanced Features (Click to expand)</strong></summary>
+
+### Multi-Level Verbosity Logging
+
+Control logging detail with progressive verbosity levels. All logs are written to stderr.
+
+**Logging Levels:**
+
+| Flag | Level | Output | Use Case |
+|------|-------|--------|----------|
+| (none) | WARNING | Errors and warnings only | Production, quiet mode |
+| `-v` | INFO | + High-level operations | Normal debugging |
+| `-vv` | DEBUG | + Detailed info, full tracebacks | Development, troubleshooting |
+| `-vvv` | TRACE | + Library internals | Deep debugging |
+
+**Examples:**
+```bash
+# INFO level - see operations
+ollama-deepseek-ocr-tool "*.png" output.md -v
+
+# DEBUG level - see detailed info
+ollama-deepseek-ocr-tool "*.png" output.md -vv
+
+# TRACE level - see all internals
+ollama-deepseek-ocr-tool "*.png" output.md -vvv
+```
+
+---
+
+### What Can Be Extracted
+
+**Text & Formatting:**
+- ✅ Headings (H1, H2, H3)
+- ✅ Body text with bold/italic
+- ✅ Bulleted and numbered lists
+- ✅ Multi-column layouts
+
+**Tables:**
+- ✅ Clean markdown table format
+- ✅ Headers and structure preserved
+- ✅ Merged cells handled
+
+**Diagrams & Figures:**
+- ✅ Text labels extracted
+- ✅ Figure captions captured
+- ❌ Visual content not described
+- ❌ Flowchart arrows not preserved
+
+### Performance Characteristics
+
+- **Speed**: ~3 seconds per image (M4 MacBook)
+- **Memory**: ~6GB (DeepSeek-OCR model)
+- **Throughput**: ~20 images per minute
+- **Scalability**: Sequential processing (no parallel batching)
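+
+The figures above allow a quick back-of-the-envelope estimate for a batch. The sketch assumes the ~3 seconds per image quoted for an M4 and strictly sequential processing; `estimate_minutes` is a hypothetical helper, and other hardware will differ.
+
+```bash
+# estimate_minutes: hypothetical helper, not part of ollama-deepseek-ocr-tool.
+# Usage: estimate_minutes IMAGE_COUNT [SECONDS_PER_IMAGE]
+# Prints the estimated batch time in whole minutes, rounded up.
+estimate_minutes() {
+  local images=$1
+  local secs_per_image=${2:-3}  # default: ~3 s per image (M4 figure quoted above)
+  echo $(( (images * secs_per_image + 59) / 60 ))
+}
+
+estimate_minutes 200  # 200 images * 3 s = 600 s, prints 10
+```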
+
+</details>
+
+<details>
+<summary><strong>🔧 Troubleshooting (Click to expand)</strong></summary>
+
+### Common Issues
+
+**Issue: "No files match pattern"**
+```bash
+# Check your glob pattern and current directory
+ls *.png  # Verify files exist
+
+# Use absolute or relative paths correctly
+ollama-deepseek-ocr-tool "./images/*.png" output.md
+```
+
+**Issue: "Connection refused" / "OCR extraction failed"**
+```bash
+# Ensure Ollama is running
+ollama serve
+
+# Verify model is installed
+ollama list | grep deepseek-ocr
+
+# Pull model if missing
+ollama pull deepseek-ocr
+```
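+
+The checks above can be combined into a small preflight function to run before a batch. This is a sketch under assumptions: `preflight` is a hypothetical helper, it relies only on the `ollama list` command already shown, and it treats a failing `ollama list` as a sign that the server is not reachable.
+
+```bash
+# preflight: hypothetical helper, not part of ollama-deepseek-ocr-tool.
+# Returns 0 when the ollama CLI is available, `ollama list` succeeds,
+# and a model whose name contains "deepseek-ocr" is listed.
+preflight() {
+  if ! command -v ollama >/dev/null 2>&1; then
+    echo "ollama not found: install it first" >&2
+    return 1
+  fi
+  local models
+  if ! models="$(ollama list 2>/dev/null)"; then
+    echo "ollama list failed: is 'ollama serve' running?" >&2
+    return 1
+  fi
+  if ! printf '%s\n' "$models" | grep -q 'deepseek-ocr'; then
+    echo "model missing: run 'ollama pull deepseek-ocr'" >&2
+    return 1
+  fi
+}
+```
+
+A typical use would be `preflight && ollama-deepseek-ocr-tool "*.png" output.md`, so the batch only starts when the checks pass.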
+
+**Issue: Poor quality extraction**
+- Use the `-vv` flag to see word counts and verify extraction
+- Check image quality (resolution, clarity)
+- For complex layouts, results may vary
+- Tables and diagrams work best with clear text
+
+**Issue: Slow processing**
+- Expected: ~3 seconds per image on M4
+- Check if Ollama is using GPU acceleration
+- Sequential processing is by design (6GB model)
+
+### Getting Help
+
+```bash
+# Show comprehensive help with examples
+ollama-deepseek-ocr-tool --help
+
+# Use verbose logging to debug
+ollama-deepseek-ocr-tool "*.png" output.md -vv
+```
+
+</details>
+
+## Exit Codes
+
+- `0`: Success - all images processed
+- `1`: Validation error - no files match pattern or invalid arguments
+- `2`: Runtime error - Ollama connection failed or model not found
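+
+In a script, the exit code can be used to decide what to do next. A minimal sketch based only on the three codes listed above; `describe_exit` and `run_ocr` are hypothetical helpers, and any other code is reported as unexpected.
+
+```bash
+# describe_exit: hypothetical helper that maps a documented exit code to a hint.
+describe_exit() {
+  case "$1" in
+    0) echo "success: all images processed" ;;
+    1) echo "validation error: check the glob pattern and arguments" ;;
+    2) echo "runtime error: check that Ollama is running and the model is pulled" ;;
+    *) echo "unexpected exit code: $1" ;;
+  esac
+}
+
+# run_ocr: hypothetical wrapper. Runs the given command, prints a hint
+# on stderr, and passes the original exit code through.
+run_ocr() {
+  "$@"
+  local status=$?
+  describe_exit "$status" >&2
+  return "$status"
+}
+```
+
+For example: `run_ocr ollama-deepseek-ocr-tool "*.png" output.md || exit 1`.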
+
+## Best Practices
+
+1. **Organize images before processing**: Name files sequentially (IMG_001, IMG_002) for natural sorting
+2. **Use descriptive output names**: `chapter-3-entrepreneurship.md` not `output.md`
+3. **Start with small batches**: Test with 2-3 images first to verify quality
+4. **Enable verbose logging for debugging**: Use `-vv` to see extraction progress and word counts
+5. **Review output after processing**: OCR may miss formatting or misread complex layouts
+6. **Keep images at good resolution**: Higher quality = better extraction
+7. **Process similar content together**: Keep textbook pages separate from diagrams
+
+## Resources
+
+- **GitHub**: https://github.com/dnvriend/ollama-deepseek-ocr-tool
+- **Python Package Index**: https://pypi.org/project/ollama-deepseek-ocr-tool/