Initial commit

Zhongwei Li
2025-11-29 18:18:23 +08:00
commit ed84e15471
6 changed files with 620 additions and 0 deletions


@@ -0,0 +1,12 @@
{
"name": "markdown-tools",
"description": "Convert documents (PDFs, Word, PowerPoint, Confluence exports) to markdown with Windows/WSL path handling support",
"version": "0.0.0-2025.11.28",
"author": {
"name": "daymade",
"email": "daymadev89@gmail.com"
},
"skills": [
"./skills/markdown-tools"
]
}

README.md Normal file

@@ -0,0 +1,3 @@
# markdown-tools
Convert documents (PDFs, Word, PowerPoint, Confluence exports) to markdown with Windows/WSL path handling support

plugin.lock.json Normal file

@@ -0,0 +1,52 @@
{
"$schema": "internal://schemas/plugin.lock.v1.json",
"pluginId": "gh:daymade/claude-code-skills:markdown-tools",
"normalized": {
"repo": null,
"ref": "refs/tags/v20251128.0",
"commit": "62cf537b7bdb5ed19d740e664111a5912fbdfa95",
"treeHash": "17aa71a1268617ab036dc7319909a3154d7456e0f0f527921cdcc271c71a11fb",
"generatedAt": "2025-11-28T10:16:11.418407Z",
"toolVersion": "publish_plugins.py@0.2.0"
},
"origin": {
"remote": "git@github.com:zhongweili/42plugin-data.git",
"branch": "master",
"commit": "aa1497ed0949fd50e99e70d6324a29c5b34f9390",
"repoRoot": "/Users/zhongweili/projects/openmind/42plugin-data"
},
"manifest": {
"name": "markdown-tools",
"description": "Convert documents (PDFs, Word, PowerPoint, Confluence exports) to markdown with Windows/WSL path handling support"
},
"content": {
"files": [
{
"path": "README.md",
"sha256": "17f4045f916f7ad75a0224052fe5392b994d2779af4d7062989feedd7b7d0d5e"
},
{
"path": ".claude-plugin/plugin.json",
"sha256": "ae25054ec70038a07c517c230298edd2ab7e4ab5d69cacab5100ba9976f11971"
},
{
"path": "skills/markdown-tools/SKILL.md",
"sha256": "e6e9aa2e4f75b4824d8ab664cbe004e4b3ec6849db43cef5b858940633d0fb4f"
},
{
"path": "skills/markdown-tools/references/conversion-examples.md",
"sha256": "184b501c4e277f7df9db9fc09df8f0d1190c12071730ccd2d16d8fbcbffdaf71"
},
{
"path": "skills/markdown-tools/scripts/convert_path.py",
"sha256": "09e22a4b8cce524c1f9b496e16378bde7db58b4825cf3a7599a4b2b4c5f9a853"
}
],
"dirSha256": "17aa71a1268617ab036dc7319909a3154d7456e0f0f527921cdcc271c71a11fb"
},
"security": {
"scannedAt": null,
"scannerVersion": null,
"flags": []
}
}


@@ -0,0 +1,146 @@
---
name: markdown-tools
description: Converts documents to markdown (PDFs, Word docs, PowerPoint, Confluence exports) with Windows/WSL path handling. Activates when converting .doc/.docx/PDF/PPTX files to markdown, processing Confluence exports, handling Windows/WSL path conversions, or working with the markitdown utility.
---
# Markdown Tools
## Overview
This skill converts documents (PDF, Word, PowerPoint, Confluence exports) to markdown with the markitdown utility, and converts Windows paths to their WSL equivalents so they can be used in bash commands.
## Core Capabilities
### 1. Markdown Conversion
Convert documents to markdown format with automatic Windows/WSL path handling.
### 2. Confluence Export Processing
Handle Confluence .doc exports with special characters for knowledge base integration.
## Quick Start
### Convert Any Document to Markdown
```bash
# Basic conversion
markitdown "path/to/document.pdf" > output.md
# WSL path example
markitdown "/mnt/c/Users/username/Documents/file.docx" > output.md
```
See `references/conversion-examples.md` for detailed examples of various conversion scenarios.
### Convert Confluence Export
```bash
# Direct conversion for simple exports
markitdown "confluence-export.doc" > output.md
# For exports with special characters, see references/
```
## Path Conversion
### Windows to WSL Path Format
Windows paths must be converted to WSL format before use in bash commands.
**Conversion rules:**
- Replace the drive prefix with `/mnt/<lowercase drive letter>/` (for example, `C:\` becomes `/mnt/c/`)
- Replace `\` with `/`
- Preserve spaces and special characters
- Use quotes for paths with spaces
**Example conversions:**
```bash
# Windows path
C:\Users\username\Documents\file.doc
# WSL path
/mnt/c/Users/username/Documents/file.doc
```
**Helper script:** Use `scripts/convert_path.py` to automate conversion:
```bash
python scripts/convert_path.py "C:\Users\username\Downloads\document.doc"
```
See `references/conversion-examples.md` for detailed path conversion examples.
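
The conversion rules above can be sketched in a few lines of Python. This is a minimal illustration built on the standard library's `PureWindowsPath`, not the bundled `scripts/convert_path.py`; the function name `to_wsl_path` is hypothetical, and the sketch assumes the default WSL mount point `/mnt/<drive>/`.

```python
from pathlib import PureWindowsPath


def to_wsl_path(windows_path: str) -> str:
    """Map a drive-letter Windows path to its /mnt/<drive>/ equivalent."""
    path = PureWindowsPath(windows_path)
    drive = path.drive  # e.g. "C:"; empty for relative or POSIX-style input
    if len(drive) != 2 or drive[1] != ":" or not path.root:
        return windows_path  # not an absolute drive-letter path: leave untouched
    # parts[0] is the anchor ("C:\\"); the remaining parts are path components
    tail = "/".join(path.parts[1:])
    return f"/mnt/{drive[0].lower()}/{tail}"


print(to_wsl_path(r"C:\Users\John Doe\Documents\report.pdf"))
```

Spaces are preserved as-is, so the result still needs quoting when it is passed to a bash command.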
## Document Conversion Workflows
### Workflow 1: Simple Markdown Conversion
For straightforward document conversions (PDF, .docx without special characters):
1. Convert Windows path to WSL format (if needed)
2. Run markitdown
3. Redirect output to .md file
See `references/conversion-examples.md` for detailed examples.
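
The same workflow can be driven from Python. The sketch below is an assumption-laden wrapper, not part of this skill: `convert_to_markdown` is a hypothetical name, it expects `markitdown` to be on `PATH`, and it simply captures the command's standard output the way the shell redirect does. The `run` parameter exists only so the command runner can be swapped out in tests.

```python
import subprocess
from pathlib import Path


def convert_to_markdown(source: str, output: str, run=subprocess.run) -> Path:
    """Run `markitdown <source>` and write its stdout to `output`."""
    # Equivalent to `markitdown "source" > output`, except that a non-zero
    # exit status raises CalledProcessError before the output file is written.
    result = run(
        ["markitdown", source],
        capture_output=True,
        text=True,
        encoding="utf-8",
        check=True,
    )
    out_path = Path(output)
    out_path.write_text(result.stdout, encoding="utf-8")
    return out_path
```

A call such as `convert_to_markdown("/mnt/c/Users/username/Documents/file.docx", "output.md")` corresponds to steps 2 and 3; the path passed in should already be in WSL format.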
### Workflow 2: Confluence Export with Special Characters
For Confluence .doc exports that contain special characters or complex formatting:
1. Save .doc file to accessible location
2. Use appropriate conversion method (see references)
3. Verify output formatting
See `references/conversion-examples.md` for step-by-step command examples.
## Error Handling
### Common Issues and Solutions
**markitdown not found:**
```bash
# Install markitdown via pip ([all] pulls in the optional PDF/Word/PowerPoint converters)
pip install 'markitdown[all]'
# Or via uv tools
uv tool install 'markitdown[all]'
```
**Path not found:**
```bash
# Verify path exists
ls -la "/mnt/c/Users/username/Documents/file.doc"
# Use convert_path.py helper
python scripts/convert_path.py "C:\Users\username\Documents\file.doc"
```
**Encoding issues:**
- Ensure files are UTF-8 encoded
- Check for special characters in filenames
- Use quotes around paths with spaces
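
For text files such as converted markdown output, the UTF-8 check can be done with a short Python sketch. `is_utf8` is a hypothetical helper, not part of this skill; it only reports whether a file's bytes decode as UTF-8 and is not meaningful for binary inputs like .docx or .pdf.

```python
from pathlib import Path


def is_utf8(path: str) -> bool:
    """Return True if the file's bytes decode cleanly as UTF-8."""
    try:
        Path(path).read_bytes().decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```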
## Resources
### references/conversion-examples.md
Comprehensive examples for all conversion scenarios including:
- Simple document conversions (PDF, Word, PowerPoint)
- Confluence export handling
- Path conversion examples for Windows/WSL
- Batch conversion operations
- Error recovery and troubleshooting examples
Load this reference when users need specific command examples or encounter conversion issues.
### scripts/convert_path.py
Python script to automate Windows to WSL path conversion. Handles:
- Drive letter conversion (C:\ → /mnt/c/)
- Backslash to forward slash
- Special characters and spaces
## Best Practices
1. **Convert Windows paths to WSL format** before bash operations
2. **Verify paths exist** before operations using ls or test commands
3. **Check output quality** after conversion
4. **Use markitdown directly** for simple conversions
5. **Test incrementally** - Verify each conversion step before proceeding
6. **Preserve directory structure** when doing batch conversions


@@ -0,0 +1,346 @@
# Document Conversion Examples
Comprehensive examples for converting various document formats to markdown.
## Basic Document Conversions
### PDF to Markdown
```bash
# Simple PDF conversion
markitdown "document.pdf" > output.md
# WSL path example
markitdown "/mnt/c/Users/username/Documents/report.pdf" > report.md
# With explicit output
markitdown "slides.pdf" > "slides.md"
```
### Word Documents to Markdown
```bash
# Modern Word document (.docx)
markitdown "document.docx" > output.md
# Legacy Word document (.doc)
markitdown "legacy-doc.doc" > output.md
# Preserve directory structure
markitdown "/path/to/docs/file.docx" > "/path/to/output/file.md"
```
### PowerPoint to Markdown
```bash
# Convert presentation
markitdown "presentation.pptx" > slides.md
# WSL path
markitdown "/mnt/c/Users/username/Desktop/slides.pptx" > slides.md
```
---
## Windows/WSL Path Conversion
### Basic Path Conversion Rules
```bash
# Windows path
C:\Users\username\Documents\file.doc
# WSL equivalent
/mnt/c/Users/username/Documents/file.doc
```
### Conversion Examples
```bash
# Single backslash to forward slash
C:\folder\file.txt
→ /mnt/c/folder/file.txt
# Path with spaces (must use quotes)
C:\Users\John Doe\Documents\report.pdf
→ "/mnt/c/Users/John Doe/Documents/report.pdf"
# OneDrive path
C:\Users\username\OneDrive\Documents\file.doc
→ "/mnt/c/Users/username/OneDrive/Documents/file.doc"
# Different drive letters
D:\Projects\document.docx
→ /mnt/d/Projects/document.docx
```
### Using convert_path.py Helper
```bash
# Automatic conversion
python scripts/convert_path.py "C:\Users\username\Downloads\document.doc"
# Output: /mnt/c/Users/username/Downloads/document.doc
# Use in conversion command
wsl_path=$(python scripts/convert_path.py "C:\Users\username\file.docx")
markitdown "$wsl_path" > output.md
```
---
## Batch Conversions
### Convert Multiple Files
```bash
# Convert all PDFs in a directory
for pdf in /path/to/pdfs/*.pdf; do
    filename=$(basename "$pdf" .pdf)
    markitdown "$pdf" > "/path/to/output/${filename}.md"
done
# Convert all Word documents
for doc in /path/to/docs/*.docx; do
    filename=$(basename "$doc" .docx)
    markitdown "$doc" > "/path/to/output/${filename}.md"
done
```
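
The pairing of source files with output names in the loops above can be sketched in Python. `plan_batch` is a hypothetical helper that only computes the (source, output) pairs; it does not call markitdown.

```python
from pathlib import Path


def plan_batch(src_dir: str, out_dir: str, suffix: str = ".pdf") -> list[tuple[Path, Path]]:
    """Pair each matching source file with a flat <out_dir>/<stem>.md target."""
    pairs = []
    for source in sorted(Path(src_dir).glob(f"*{suffix}")):
        pairs.append((source, Path(out_dir) / f"{source.stem}.md"))
    return pairs
```

One difference from the shell version: when nothing matches, this returns an empty list, whereas a bash `for` loop without `nullglob` runs once with the unexpanded pattern.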
### Batch Conversion with Path Conversion
```powershell
# Windows batch (PowerShell)
Get-ChildItem "C:\Documents\*.pdf" | ForEach-Object {
    $wslPath = "/mnt/c/Documents/$($_.Name)"
    # PowerShell performs the write, so the output path must be a Windows path
    $outFile = "C:\Output\$($_.BaseName).md"
    wsl markitdown $wslPath | Out-File -Encoding utf8 $outFile
}
```
---
## Confluence Export Handling
### Simple Confluence Export
```bash
# Direct conversion for exports without special characters
markitdown "confluence-export.doc" > output.md
```
### Export with Special Characters
For Confluence exports containing special characters:
1. Save the .doc file to an accessible location
2. Try direct conversion first:
```bash
markitdown "confluence-export.doc" > output.md
```
3. If special characters cause issues:
- Open in Word and save as .docx
- Or use LibreOffice to convert: `libreoffice --headless --convert-to docx export.doc`
- Then convert the .docx file
### Handling Encoding Issues
```bash
# Check file encoding
file -i "document.doc"
# Convert if needed (using iconv)
iconv -f ISO-8859-1 -t UTF-8 input.md > output.md
```
---
## Advanced Conversion Scenarios
### Preserving Directory Structure
```bash
# Mirror directory structure
src_dir="/mnt/c/Users/username/Documents"
out_dir="/path/to/output"
# IFS= and -r keep leading spaces and backslashes in file names intact
find "$src_dir" -name "*.docx" | while IFS= read -r file; do
    # Get relative path
    rel_path="${file#"$src_dir"/}"
    out_file="$out_dir/${rel_path%.docx}.md"
    # Create output directory
    mkdir -p "$(dirname "$out_file")"
    # Convert
    markitdown "$file" > "$out_file"
done
```
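
The path arithmetic in the loop above (strip the source prefix, swap the extension, re-root under the output directory) can be sketched in Python. `mirrored_output` is a hypothetical name; `PurePosixPath` is used because WSL paths are POSIX-style.

```python
from pathlib import PurePosixPath


def mirrored_output(source: str, src_dir: str, out_dir: str) -> PurePosixPath:
    """Return the .md path under out_dir that mirrors source's place in src_dir."""
    # relative_to raises ValueError when source is not inside src_dir
    relative = PurePosixPath(source).relative_to(src_dir)
    return PurePosixPath(out_dir) / relative.with_suffix(".md")


print(mirrored_output(
    "/mnt/c/Users/username/Documents/specs/api/design.docx",
    "/mnt/c/Users/username/Documents",
    "/path/to/output",
))
# → /path/to/output/specs/api/design.md
```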
### Conversion with Metadata
```bash
# Add frontmatter to converted file
file="document.pdf"
{
    echo "---"
    echo "title: $(basename "$file" .pdf)"
    echo "converted: $(date -I)"
    echo "source: $file"
    echo "---"
    echo ""
    markitdown "$file"
} > output.md
```
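
The front matter block written above can also be built as a string in Python. This is a sketch with a hypothetical `front_matter` helper; it mirrors the three fields from the shell example and does no YAML escaping, so titles containing characters such as `:` would need quoting.

```python
from datetime import date
from pathlib import PurePosixPath


def front_matter(source: str, converted: date) -> str:
    """Build the YAML front matter block for a converted document."""
    title = PurePosixPath(source).stem  # file name without its last extension
    lines = [
        "---",
        f"title: {title}",
        f"converted: {converted.isoformat()}",  # YYYY-MM-DD, like `date -I`
        f"source: {source}",
        "---",
        "",
    ]
    return "\n".join(lines) + "\n"
```

Prepending `front_matter(path, date.today())` to the markitdown output gives the same layout as the brace group in the shell version.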
---
## Error Recovery
### Handling Failed Conversions
```bash
# Check if markitdown succeeded
if markitdown "document.pdf" > output.md 2> error.log; then
    echo "Conversion successful"
else
    echo "Conversion failed, check error.log"
fi
```
### Retry Logic
```bash
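# Retry failed conversions
for file in *.pdf; do
    output="${file%.pdf}.md"
    # -s is false for missing or empty files, so earlier failures are retried
    if ! [ -s "$output" ]; then
        echo "Converting $file..."
        if ! markitdown "$file" > "$output"; then
            echo "Failed: $file" >> failed.txt
            rm -f "$output"  # drop the partial output left by the redirect
        fi
    fi
done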
```
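
A shell redirect creates the output file even when the command fails, so a skip check is more robust if it treats an empty output as unconverted. The Python sketch below shows that check in isolation; `needs_conversion` and `pending` are hypothetical names, and the output is assumed to sit next to the source with an `.md` extension.

```python
from pathlib import Path


def needs_conversion(output: str) -> bool:
    """True when the output is missing or empty, i.e. worth (re)converting."""
    out = Path(output)
    return not out.is_file() or out.stat().st_size == 0


def pending(sources: list[str]) -> list[str]:
    """Filter sources down to those without a usable .md file next to them."""
    return [s for s in sources if needs_conversion(str(Path(s).with_suffix(".md")))]
```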
---
## Quality Verification
### Check Conversion Quality
```bash
# Count lines in the converted output (a very low count suggests a failed extraction)
wc -l output.md
# Check for common issues
grep -nE "TODO|ERROR|MISSING" output.md
# Preview first/last lines
head -n 20 output.md
tail -n 20 output.md
```
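
The line count and marker search can be combined into one pass in Python. This is a sketch only: `scan_output` is a hypothetical helper, and the marker words are the same ones used in the grep example rather than anything markitdown itself emits.

```python
import re

MARKERS = re.compile(r"TODO|ERROR|MISSING")


def scan_output(markdown: str) -> tuple[int, list[tuple[int, str]]]:
    """Return the line count and the (line number, text) of lines with markers."""
    lines = markdown.splitlines()
    hits = [(n, line) for n, line in enumerate(lines, start=1) if MARKERS.search(line)]
    return len(lines), hits
```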
### Validate Output
```bash
# Check for empty files
if [ ! -s output.md ]; then
    echo "Warning: Output file is empty"
fi
# Verify markdown syntax
# Use a markdown linter if available
markdownlint output.md
```
---
## Best Practices
### 1. Path Handling
- Always quote paths with spaces
- Verify paths exist before conversion
- Use absolute paths for scripts
### 2. Batch Processing
- Log conversions for audit trail
- Handle errors gracefully
- Preserve original files
### 3. Output Organization
- Mirror source directory structure
- Use consistent naming conventions
- Separate by document type or date
### 4. Quality Assurance
- Spot-check random conversions
- Validate critical documents manually
- Keep conversion logs
### 5. Performance
- Use parallel processing for large batches
- Skip already converted files
- Clean up temporary files
---
## Common Patterns
### Pattern: Convert and Review
```bash
#!/bin/bash
file="$1"
output="${file%.*}.md"
# Convert
markitdown "$file" > "$output"
# Open in editor for review
${EDITOR:-vim} "$output"
```
### Pattern: Safe Conversion
```bash
#!/bin/bash
file="$1"
backup="${file}.backup"
output="${file%.*}.md"
# Backup original
cp "$file" "$backup"
# Convert with error handling
if markitdown "$file" > "$output" 2> conversion.log; then
    echo "Success: $output"
    rm "$backup"
else
    echo "Failed: Check conversion.log"
    mv "$backup" "$file"
fi
```
### Pattern: Metadata Preservation
```bash
#!/bin/bash
# Extract and preserve document metadata
file="$1"
output="${file%.*}.md"
# Get file metadata
created=$(stat -c %w "$file" 2>/dev/null || stat -f %SB "$file")
modified=$(stat -c %y "$file" 2>/dev/null || stat -f %Sm "$file")
# Convert with metadata
{
    echo "---"
    echo "original_file: $(basename "$file")"
    echo "created: $created"
    echo "modified: $modified"
    echo "converted: $(date -I)"
    echo "---"
    echo ""
    markitdown "$file"
} > "$output"
```


@@ -0,0 +1,61 @@
#!/usr/bin/env python3
"""
Convert Windows paths to WSL format.

Usage:
    python convert_path.py "C:\\Users\\username\\Downloads\\file.doc"

Output:
    /mnt/c/Users/username/Downloads/file.doc
"""

import sys
import re


def convert_windows_to_wsl(windows_path: str) -> str:
    """
    Convert a Windows path to WSL format.

    Args:
        windows_path: Windows path (e.g., "C:\\Users\\username\\file.doc")

    Returns:
        WSL path (e.g., "/mnt/c/Users/username/file.doc")
    """
    # Remove quotes if present
    path = windows_path.strip('"').strip("'")

    # Handle drive letter (C:\ or C:/)
    drive_pattern = r'^([A-Za-z]):[\\\/]'
    match = re.match(drive_pattern, path)

    if not match:
        # Already a WSL path or relative path
        return path

    drive_letter = match.group(1).lower()
    path_without_drive = path[3:]  # Remove "C:\"

    # Replace backslashes with forward slashes
    path_without_drive = path_without_drive.replace('\\', '/')

    # Construct WSL path
    wsl_path = f"/mnt/{drive_letter}/{path_without_drive}"

    return wsl_path


def main():
    if len(sys.argv) < 2:
        print("Usage: python convert_path.py <windows_path>")
        print('Example: python convert_path.py "C:\\Users\\username\\Downloads\\file.doc"')
        sys.exit(1)

    windows_path = sys.argv[1]
    wsl_path = convert_windows_to_wsl(windows_path)
    print(wsl_path)


if __name__ == "__main__":
    main()