Initial commit

2025-11-30 09:08:11 +08:00
commit e9e441dcb1
8 changed files with 769 additions and 0 deletions
--- a/.claude-plugin/plugin.json
+++ b/.claude-plugin/plugin.json
@@ -0,0 +1,12 @@
 {
  "name": "html-to-pdf",
  "description": "Convert HTML files to PDF or PNG format with multiple rendering options. Supports multi-page PDFs, single-page long-image PDFs, background colors, and Chinese/CJK characters.",
  "version": "0.0.0-2025.11.28",
  "author": {
    "name": "Yong Gao",
    "email": "zhongweili@tubi.tv"
  },
  "skills": [
    "./"
  ]
 }
--- a/EXAMPLES.md
+++ b/EXAMPLES.md
@@ -0,0 +1,183 @@
 # Usage Examples
 ## Real-World Examples
 ### Example 1: Business Plan Report
 ```bash
 # Original file: 202510_Alpha_Intelligence_BP.html (71 KB)
 python ~/.claude/skills/html-to-pdf/html_to_long_image.py 202510_Alpha_Intelligence_BP.html
 # Output:
 # - 202510_Alpha_Intelligence_BP_fullpage.png (6.1 MB)
 # - 202510_Alpha_Intelligence_BP_fullpage.pdf (1.6 MB, 1 page)
 ```
 **Result**: Single-page PDF with no page breaks, perfect for online presentation!
 ### Example 2: Simple Test
 ```bash
 # Create test HTML
 echo '<h1>Test</h1><p>Hello World</p>' > test.html
 # Convert
 python ~/.claude/skills/html-to-pdf/html_to_long_image.py test.html
 # Output:
 # - test_fullpage.png (12 KB)
 # - test_fullpage.pdf (20 KB)
 ```
 ### Example 3: Multi-File Batch Conversion
 ```bash
 # Convert all HTML files in a directory
 cd /path/to/html/files
 for file in *.html; do
    echo "Converting $file..."
    python ~/.claude/skills/html-to-pdf/html_to_long_image.py "$file"
 done
 # Check results
 ls -lh *_fullpage.pdf
 ```
 ### Example 4: With Custom Path
 ```bash
 # Specify full paths
 python ~/.claude/skills/html-to-pdf/html_to_long_image.py \
    /Users/yonggao/Documents/report.html
 ```
 ## Integration Examples
 ### Use in Shell Script
 ```bash
 #!/bin/bash
 # convert_reports.sh
 HTML_CONVERTER=~/.claude/skills/html-to-pdf/html_to_long_image.py
 for report in reports/*.html; do
    python "$HTML_CONVERTER" "$report"
    echo "✓ Converted: $report"
 done
 echo "All reports converted!"
 ```
 ### Use in Makefile
 ```makefile
 # Makefile
 CONVERTER = python ~/.claude/skills/html-to-pdf/html_to_long_image.py
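 # The converter writes <name>_fullpage.pdf, so the rule targets that file
 %_fullpage.pdf: %.html
 	$(CONVERTER) $<
 all: report_fullpage.pdf presentation_fullpage.pdf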
 clean:
 	rm -f *_fullpage.pdf *_fullpage.png
 ```
 ### Use in Python Script
 ```python
 import os
 import subprocess
 import sys
 def convert_html_to_pdf(html_file):
    """Convert HTML to PDF using the skill."""
    # subprocess does not expand "~", so resolve the home directory explicitly
    converter = os.path.expanduser("~/.claude/skills/html-to-pdf/html_to_long_image.py")
    result = subprocess.run(
        [sys.executable, converter, html_file],
        capture_output=True,
        text=True
    )
    return result.returncode == 0
 # Usage
 if convert_html_to_pdf("report.html"):
    print("Success!")
 ```
 ## Comparison with Original Files
 | File | Type | Size | Pages | Notes |
 |------|------|------|-------|-------|
 | 202510_Alpha_Intelligence_BP.html | HTML | 71 KB | - | Source file |
 | 202510_Alpha_Intelligence_BP_fullpage.png | PNG | 6.1 MB | 1 | High-quality screenshot |
 | 202510_Alpha_Intelligence_BP_fullpage.pdf | PDF | 1.6 MB | 1 | **Best for viewing** |
 | 202510_Alpha_Intelligence_BP_final.pdf | PDF | 5.7 MB | 22 | Multi-page (with breaks) |
 ## Performance Data
 Based on real usage:
 | HTML Size | Processing Time | PNG Size | PDF Size |
 |-----------|----------------|----------|----------|
 | 71 KB | ~5 seconds | 6.1 MB | 1.6 MB |
 | 5 KB | ~3 seconds | 12 KB | 20 KB |
 | 200 KB | ~8 seconds | 15 MB | 4 MB |
 ## Tips & Tricks
 ### Optimize for File Size
 The PNG is usually much larger than the PDF for long pages (for very small pages, as in Example 2, the PDF can be the larger file). If you only need the PDF:
 ```bash
 # Generate both, then delete PNG
 python ~/.claude/skills/html-to-pdf/html_to_long_image.py report.html
 rm report_fullpage.png
 ```
 ### Preview Before Converting
 ```bash
 # Open HTML in browser first
 open report.html
 # Then convert
 python ~/.claude/skills/html-to-pdf/html_to_long_image.py report.html
 ```
 ### Auto-open Result
 ```bash
 python ~/.claude/skills/html-to-pdf/html_to_long_image.py report.html && \
    open report_fullpage.pdf
 ```
 ### Check Page Count
 ```bash
 python -c "
 from pypdf import PdfReader
 r = PdfReader('report_fullpage.pdf')
 print(f'Pages: {len(r.pages)}')
 "
 ```
 ## Troubleshooting Examples
 ### Problem: Script not found
 ```bash
 # Solution: Use full path
 python ~/.claude/skills/html-to-pdf/html_to_long_image.py file.html
 ```
 ### Problem: Permission denied
 ```bash
 # Solution: Make executable
 chmod +x ~/.claude/skills/html-to-pdf/html_to_long_image.py
 ```
 ### Problem: Playwright not installed
 ```bash
 # Solution: Install dependencies
 pip install playwright pillow pypdf
 playwright install chromium
 ```
 ### Problem: Content appears cut off
 ```bash
 # This is automatically handled by the script which:
 # 1. Scrolls through entire page
 # 2. Waits for animations
 # 3. Forces all content visible
 # No action needed!
 ```
--- a/QUICK_START.md
+++ b/QUICK_START.md
@@ -0,0 +1,58 @@
 # Quick Start Guide
 ## One-Line Usage
 ```bash
 python ~/.claude/skills/html-to-pdf/html_to_long_image.py your_file.html
 ```
 ## What You Get
 - `your_file_fullpage.png` - Complete screenshot
 - `your_file_fullpage.pdf` - Single-page PDF (no page breaks!)
 ## Common Tasks
 ### Convert HTML in current directory
 ```bash
 python ~/.claude/skills/html-to-pdf/html_to_long_image.py report.html
 ```
 ### Convert and open immediately
 ```bash
 python ~/.claude/skills/html-to-pdf/html_to_long_image.py report.html && open report_fullpage.pdf
 ```
 ### Batch convert all HTML files
 ```bash
 for file in *.html; do
    python ~/.claude/skills/html-to-pdf/html_to_long_image.py "$file"
 done
 ```
 ## Tips
 1. **For presentations**: Use the PDF output (smaller, 1-2 MB)
 2. **For high-quality images**: Use the PNG output (6-10 MB)
 3. **First-time setup**: Run `playwright install chromium` once
 ## Comparison
 | Method | Pages | Size | Best For |
 |--------|-------|------|----------|
 | Long Image (this skill) | 1 page | 1-2 MB | Online viewing, presentations |
 | Standard A4 | 20+ pages | 5-6 MB | Printing, archiving |
 ## Success!
 If you see output like this (the script prints its status messages in Chinese), it worked:
 ```
 ✅ 成功生成长图！
   大小: 6263.9 KB
 ✅ PDF生成成功！大小: 1632.6 KB
 💡 打开查看:
   长图: open your_file_fullpage.png
   PDF:  open your_file_fullpage.pdf
 ```
--- a/README.md
+++ b/README.md
@@ -0,0 +1,3 @@
 # html-to-pdf
 Convert HTML files to PDF or PNG format with multiple rendering options. Supports multi-page PDFs, single-page long-image PDFs, background colors, and Chinese/CJK characters.
--- a/SKILL.md
+++ b/SKILL.md
@@ -0,0 +1,252 @@
 ---
 name: html-to-pdf
 description: Converts HTML files to PDF or PNG format. Use this skill when the user asks to convert, export, or generate PDF/PNG from HTML files, or when they want to create printable documents, presentations, or long-form images from web pages or HTML reports.
 ---
 # HTML to PDF/PNG Converter Skill
 This skill helps you convert HTML files to PDF or PNG format with various options for output quality and page layout.
 ## When to Use This Skill
 Use this skill when the user wants to:
 - Convert HTML files to PDF
 - Generate PNG screenshots from HTML
 - Create printable documents from web pages
 - Export HTML reports as PDFs
 - Generate long-form images without page breaks
 - Create presentation-ready PDFs from HTML
 ## Available Conversion Methods
 ### Method 1: Multi-Page PDF (Standard A4)
 Best for: Printing, archiving, traditional documents
 **Command:**
 ```bash
 python html_to_pdf_final.py input.html output.pdf
 ```
 **Features:**
 - Standard A4 page format
 - Zero margins for seamless appearance
 - Background colors and gradients preserved
 - Multiple pages with page breaks
 - Optimized file size (~5-6 MB for typical reports)
 ### Method 2: Single-Page Long Image PDF
 Best for: Online viewing, presentations, no page breaks
 **Command:**
 ```bash
 python html_to_long_image.py input.html
 ```
 **Features:**
 - Generates a full-page PNG screenshot first
 - Converts PNG to single-page PDF
 - NO page breaks - entire content on one page
 - Perfect for presentations and online viewing
 - Smaller file size (~1-2 MB)
 - Two output files: `.png` and `.pdf`
 ### Method 3: Advanced Multi-Method Converter
 Best for: Fallback options, compatibility
 **Command:**
 ```bash
 python html_to_pdf_converter.py input.html output.pdf
 ```
 **Features:**
 - Tries WeasyPrint first (best CSS support)
 - Falls back to Playwright if needed
 - Automatic dependency installation
 - Handles complex CSS and gradients
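The "try one renderer, fall back to the next" behavior described above can be sketched as follows. This is only an illustration of the pattern, not the contents of `html_to_pdf_converter.py`; the function name `convert_with_fallback` and the `(name, callable)` backend shape are assumptions made for the example.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical shape: a backend is (name, function); the function takes
# (html_path, pdf_path) and raises an exception if it cannot convert.
Backend = Tuple[str, Callable[[str, str], None]]


def convert_with_fallback(html_path: str, pdf_path: str,
                          backends: Sequence[Backend]) -> str:
    """Try each backend in order; return the name of the first that succeeds."""
    errors = []
    for name, convert in backends:
        try:
            convert(html_path, pdf_path)
            return name
        except Exception as exc:  # a real tool would catch narrower errors
            errors.append(f"{name}: {exc}")
    # Every backend failed: report all collected reasons at once
    raise RuntimeError("all backends failed: " + "; ".join(errors))
```

A caller would pass the backends in preference order, for example a WeasyPrint wrapper first and a Playwright wrapper second, and use the returned name to log which renderer produced the file.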
 ## Required Dependencies
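 The skill requires Playwright and Pillow; `pypdf` is only needed for the optional page-count check. `html_to_long_image.py` installs the Playwright and Pillow Python packages with pip if they are missing, but the Chromium browser must be installed separately: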
 ```bash
 pip install playwright pillow pypdf
 playwright install chromium
 ```
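To see which Python packages are missing before running a converter, a small check along these lines may help. It is a sketch using only the standard library; `missing_modules` is a hypothetical helper, and the import names `playwright`, `PIL` (Pillow), and `pypdf` are the ones the commands above install.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    # Pillow is imported as "PIL"
    missing = missing_modules(["playwright", "PIL", "pypdf"])
    print("Missing:", ", ".join(missing) if missing else "none")
```

This only checks the Python packages; it does not detect whether the Chromium browser itself has been downloaded.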
 ## Step-by-Step Instructions
 ### For Standard Multi-Page PDF:
 1. **Verify the HTML file exists:**
   ```bash
   ls -la *.html
   ```
 2. **Run the converter:**
   ```bash
   python html_to_pdf_final.py your_file.html
   ```
 3. **Open the result:**
   ```bash
   open your_file_final.pdf
   ```
 ### For Single-Page Long Image PDF:
 1. **Verify the HTML file exists:**
   ```bash
   ls -la *.html
   ```
 2. **Run the long image converter:**
   ```bash
   python html_to_long_image.py your_file.html
   ```
 3. **Check outputs:**
   ```bash
   # View the PNG screenshot
   open your_file_fullpage.png
   # View the PDF version
   open your_file_fullpage.pdf
   ```
 ## Troubleshooting
 ### Issue: Playwright browser not found
 **Solution:**
 ```bash
 playwright install chromium
 ```
 ### Issue: Page breaks visible in PDF
 **Solution:** Use the long image method instead:
 ```bash
 python html_to_long_image.py your_file.html
 ```
 ### Issue: Content appears cut off
 **Causes & Solutions:**
 - **CSS animations not complete**: Script waits 2 seconds for animations
 - **Lazy loading**: Script scrolls through entire page to trigger loading
 - **Large file size**: Scripts handle files up to 20MB+
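For pages whose height grows as lazy content loads, one possible variation on the scroll step is to re-measure the height after each pass and repeat until it stops growing. This is a sketch, not what the shipped script does (it measures once); `scroll_until_stable` and its parameters are hypothetical, and `page` is assumed to expose the `evaluate(expression)` method of Playwright's sync API.

```python
import time


def scroll_until_stable(page, step: int = 1000, pause: float = 0.1,
                        max_passes: int = 5) -> int:
    """Scroll down in steps, re-measuring until the page height stops growing.

    Returns the final document height in pixels.
    """
    height = page.evaluate("document.body.scrollHeight")
    for _ in range(max_passes):
        for y in range(0, height, step):
            page.evaluate(f"window.scrollTo(0, {y})")
            time.sleep(pause)  # give lazy-loaded content time to appear
        new_height = page.evaluate("document.body.scrollHeight")
        if new_height <= height:
            break  # nothing new was loaded during this pass
        height = new_height
    page.evaluate("window.scrollTo(0, 0)")  # return to the top before capture
    return height
```

The `max_passes` cap keeps an infinitely scrolling page from looping forever.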
 ### Issue: Blank PDF output
 **Solution:** Use the long image method which uses screenshot instead of PDF rendering:
 ```bash
 python html_to_long_image.py your_file.html
 ```
 ## Output Files
 After conversion, you'll get:
 **Multi-Page PDF:**
 - `filename_final.pdf` - Standard A4 multi-page PDF
 **Long Image Method:**
 - `filename_fullpage.png` - Complete screenshot as PNG (6-10 MB)
 - `filename_fullpage.pdf` - Single-page PDF from image (1-2 MB)
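The long-image output names are derived from the input file name, so they can be predicted in a wrapper script. A minimal sketch that mirrors the `_fullpage` naming convention; the helper name `fullpage_outputs` is hypothetical.

```python
from pathlib import Path
from typing import Tuple


def fullpage_outputs(html_path: str) -> Tuple[Path, Path]:
    """Return the (PNG, PDF) paths the long-image method writes for html_path."""
    src = Path(html_path)
    # Outputs are written next to the input, with "_fullpage" appended to the stem
    png = src.parent / f"{src.stem}_fullpage.png"
    return png, png.with_suffix(".pdf")
```

For example, `fullpage_outputs("reports/q3.html")` yields `reports/q3_fullpage.png` and `reports/q3_fullpage.pdf`.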
 ## Best Practices
 1. **For online viewing/presentations:** Use `html_to_long_image.py`
   - No page breaks
   - Smooth scrolling experience
   - Smaller file size
 2. **For printing/archiving:** Use `html_to_pdf_final.py`
   - Standard A4 pages
   - Better for physical printing
   - Professional document format
 3. **For complex CSS:** Use `html_to_pdf_converter.py`
   - Multiple fallback methods
   - Better compatibility
 ## Implementation Notes
 ### Script Locations
 All scripts should be in the project directory:
 - `html_to_pdf_final.py` - Main multi-page converter
 - `html_to_long_image.py` - Long image generator
 - `html_to_pdf_converter.py` - Advanced multi-method converter
 ### Key Features Implemented
 1. **Animation Handling**: All scripts disable CSS animations/transitions
 2. **Lazy Loading**: Scripts scroll through content to trigger loading
 3. **Background Preservation**: All gradients and colors render correctly
 4. **Zero Margins**: Seamless page appearance without visible borders
 5. **Chinese Font Support**: Handles CJK characters properly
 ## Examples
 ### Example 1: Convert BP Report to Multi-Page PDF
 ```bash
 python html_to_pdf_final.py 202510_Alpha_Intelligence_BP.html
 # Output: 202510_Alpha_Intelligence_BP_final.pdf (22 pages, 5.7 MB)
 ```
 ### Example 2: Create Single-Page Presentation PDF
 ```bash
 python html_to_long_image.py 202510_Alpha_Intelligence_BP.html
 # Output:
 #   - 202510_Alpha_Intelligence_BP_fullpage.png (6.1 MB)
 #   - 202510_Alpha_Intelligence_BP_fullpage.pdf (1.6 MB, 1 page)
 ```
 ### Example 3: Batch Convert Multiple Files
 ```bash
 for file in *.html; do
    python html_to_long_image.py "$file"
 done
 ```
 ## Advanced Usage
 ### Custom Output Paths
 ```bash
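 # html_to_pdf_final.py accepts an explicit output path
 python html_to_pdf_final.py input.html custom_output.pdf
 # html_to_long_image.py takes only the input path; outputs are named <input>_fullpage.png and <input>_fullpage.pdf
 python html_to_long_image.py input.html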
 ```
 ### Check PDF Page Count
 ```bash
 python -c "from pypdf import PdfReader; r = PdfReader('output.pdf'); print(f'Pages: {len(r.pages)}')"
 ```
 ### Verify File Sizes
 ```bash
 ls -lh *.pdf *.png
 ```
 ## Performance Expectations
 - **Processing Speed**: ~5-10 seconds for typical HTML files
 - **Memory Usage**: ~100-200 MB during conversion
 - **PDF File Size**: 1-6 MB depending on method
 - **PNG File Size**: 6-10 MB for full-page screenshots
 ## Success Criteria
 A successful conversion should:
 1. ✅ Generate PDF/PNG without errors
 2. ✅ Include all HTML content (no truncation)
 3. ✅ Preserve colors, gradients, and styling
 4. ✅ Handle Chinese/CJK characters correctly
 5. ✅ Keep output files a manageable size (< 10 MB)
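The file-level criteria (outputs exist, are non-empty, and stay under 10 MB) can be checked automatically; content, styling, and CJK rendering still need a visual check. A minimal sketch, where `check_outputs` and `MAX_BYTES` are hypothetical names:

```python
from pathlib import Path
from typing import List

MAX_BYTES = 10 * 1024 * 1024  # the 10 MB guideline from the list above


def check_outputs(paths: List[str], max_bytes: int = MAX_BYTES) -> List[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    for p in map(Path, paths):
        if not p.is_file():
            problems.append(f"{p}: missing")
            continue
        size = p.stat().st_size
        if size == 0:
            problems.append(f"{p}: empty")
        elif size >= max_bytes:
            problems.append(f"{p}: {size} bytes exceeds limit")
    return problems
```

Calling `check_outputs(["report_fullpage.png", "report_fullpage.pdf"])` after a conversion returns an empty list when both files look sane.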
 ## Quick Reference
 | Need | Use | Output |
 |------|-----|--------|
 | Printing | `html_to_pdf_final.py` | Multi-page A4 PDF |
 | Online viewing | `html_to_long_image.py` | Single-page PDF + PNG |
 | Maximum compatibility | `html_to_pdf_converter.py` | Multi-page PDF with fallbacks |
--- a/VERSION
+++ b/VERSION
@@ -0,0 +1,34 @@
 # HTML to PDF Skill - Version History
 ## Version 1.0.0 (2025-10-23)
 ### Initial Release
 **Features:**
 - HTML to PNG screenshot conversion
 - PNG to single-page PDF conversion
 - Auto-install dependencies (playwright, Pillow)
 - CSS animation disabling
 - Lazy content loading via scrolling
 - Zero-margin seamless output
 - Chinese/CJK character support
 **Scripts:**
 - `html_to_long_image.py` - Main converter script
 **Documentation:**
 - SKILL.md - Main skill definition
 - README.md - Installation and usage
 - QUICK_START.md - One-line quick reference
 - EXAMPLES.md - Real-world examples
 **Tested With:**
 - Python 3.12
 - Playwright 1.40+
 - Pillow 11.0+
 - macOS (arm64)
 **Known Limitations:**
 - Single-page output only (no multi-page A4)
 - Requires Chromium browser installation
 - Large HTML files (>5MB) may take longer to process
--- a/html_to_long_image.py
+++ b/html_to_long_image.py
@@ -0,0 +1,166 @@
 #!/usr/bin/env python3
 """
 HTML-to-long-image tool.
 Renders an HTML file as one full-length image (PNG), which can then be converted to PDF.
 """
 import os
 import sys
 import time
 from pathlib import Path
 def html_to_long_image(html_path: str, output_path: str = None) -> str:
    """
    Convert an HTML file into a single full-length PNG image.
    Args:
        html_path: path to the HTML file
        output_path: path for the output image (optional)
    Returns:
        Path of the generated image
    """
    try:
        from playwright.sync_api import sync_playwright
    except ImportError:
        print("正在安装 Playwright...")
        import subprocess
        subprocess.check_call([sys.executable, "-m", "pip", "install", "playwright", "-q"])
        from playwright.sync_api import sync_playwright
    if not os.path.exists(html_path):
        raise FileNotFoundError(f"HTML文件不存在: {html_path}")
    if output_path is None:
        html_file = Path(html_path)
        output_path = str(html_file.parent / f"{html_file.stem}_fullpage.png")
    print(f"\n📄 转换HTML为完整长图")
    print(f"   输入: {Path(html_path).name}")
    print(f"   输出: {Path(output_path).name}\n")
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page(viewport={'width': 1200, 'height': 800})
        # Load the HTML; as_uri() percent-encodes spaces and other special characters
        html_uri = Path(html_path).resolve().as_uri()
        print("⏳ 加载HTML...")
        page.goto(html_uri, wait_until='networkidle')
        # Disable all animations
        print("🎨 禁用动画...")
        page.add_style_tag(content="""
            *, *::before, *::after {
                animation: none !important;
                transition: none !important;
            }
            .section, .cover {
                opacity: 1 !important;
                transform: none !important;
            }
        """)
        # Force all content to be visible
        page.evaluate("""
            () => {
                document.querySelectorAll('.section, .cover').forEach(el => {
                    el.style.opacity = '1';
                    el.style.transform = 'none';
                });
            }
        """)
        # Scroll through the page to trigger lazy loading
        print("📜 加载所有内容...")
        total_height = page.evaluate("document.body.scrollHeight")
        for y in range(0, total_height, 1000):
            page.evaluate(f"window.scrollTo(0, {y})")
            time.sleep(0.1)
        page.evaluate("window.scrollTo(0, 0)")
        time.sleep(0.5)
        # Capture the full page
        print("📸 截取完整页面...")
        page.screenshot(path=output_path, full_page=True)
        browser.close()
    size_kb = os.path.getsize(output_path) / 1024
    print(f"\n✅ 成功生成长图！")
    print(f"   大小: {size_kb:.1f} KB\n")
    return str(output_path)
 def image_to_pdf(image_path: str, pdf_path: str = None) -> str:
    """
    Convert an image to PDF.
    Args:
        image_path: path to the image
        pdf_path: path for the output PDF (optional)
    Returns:
        Path of the generated PDF
    """
    try:
        from PIL import Image
    except ImportError:
        print("正在安装 Pillow...")
        import subprocess
        subprocess.check_call([sys.executable, "-m", "pip", "install", "Pillow", "-q"])
        from PIL import Image
    if pdf_path is None:
        image_file = Path(image_path)
        pdf_path = str(image_file.with_suffix('.pdf'))
    print(f"📄 转换图片为PDF: {Path(pdf_path).name}")
    # Open the image
    image = Image.open(image_path)
    # Convert to RGB (required for PDF output)
    if image.mode != 'RGB':
        image = image.convert('RGB')
    # Save as PDF
    image.save(pdf_path, 'PDF', resolution=100.0)
    size_kb = os.path.getsize(pdf_path) / 1024
    print(f"✅ PDF生成成功！大小: {size_kb:.1f} KB\n")
    return str(pdf_path)
 def main():
    default_html = "202510_Alpha_Intelligence_BP.html"
    html_path = sys.argv[1] if len(sys.argv) > 1 else default_html
    print("\n" + "=" * 70)
    print("HTML转完整长图工具 - 无分页断开")
    print("=" * 70)
    try:
        # Generate the long image
        image_path = html_to_long_image(html_path)
        # Convert it to PDF
        pdf_path = image_to_pdf(image_path)
        print(f"💡 打开查看:")
        print(f"   长图: open {Path(image_path).name}")
        print(f"   PDF:  open {Path(pdf_path).name}\n")
    except Exception as e:
        print(f"\n❌ 错误: {e}\n")
        import traceback
        traceback.print_exc()
        sys.exit(1)
 if __name__ == "__main__":
    main()
--- a/plugin.lock.json
+++ b/plugin.lock.json
@@ -0,0 +1,61 @@
 {
  "$schema": "internal://schemas/plugin.lock.v1.json",
  "pluginId": "gh:yonggao/claude-plugins:skills/html-to-pdf",
  "normalized": {
    "repo": null,
    "ref": "refs/tags/v20251128.0",
    "commit": "d9a7bdd2b1535728f687605158566929b17670e8",
    "treeHash": "6792dd0d55d9a678dadebc65721c1a3864c4d30c619e5d185be09158b9de027d",
    "generatedAt": "2025-11-28T10:29:12.715718Z",
    "toolVersion": "publish_plugins.py@0.2.0"
  },
  "origin": {
    "remote": "git@github.com:zhongweili/42plugin-data.git",
    "branch": "master",
    "commit": "aa1497ed0949fd50e99e70d6324a29c5b34f9390",
    "repoRoot": "/Users/zhongweili/projects/openmind/42plugin-data"
  },
  "manifest": {
    "name": "html-to-pdf",
    "description": "Convert HTML files to PDF or PNG format with multiple rendering options. Supports multi-page PDFs, single-page long-image PDFs, background colors, and Chinese/CJK characters.",
    "version": null
  },
  "content": {
    "files": [
      {
        "path": "EXAMPLES.md",
        "sha256": "3e0a847cf1df915d9cbab1aada54f1e6cdc86976fc4ea9c0f25e85bd5a56da5a"
      },
      {
        "path": "html_to_long_image.py",
        "sha256": "450db8662b54dab9a17c8c2fd326ae6b0ac783dddef21b5c32c9d56627bfea74"
      },
      {
        "path": "README.md",
        "sha256": "d896d73dfc7ca8ac44703f1dfe09caccf05252ffdec97c696211293447c49e4a"
      },
      {
        "path": "VERSION",
        "sha256": "7086eb17c2adb4a5a4808c02065c4aa5bafb70ce833bbf31e237733265b0ef28"
      },
      {
        "path": "SKILL.md",
        "sha256": "2776f5b726b2bf7c8be9475cea8aad5bdbda7f70fe48a5aead366a161fa59cd0"
      },
      {
        "path": "QUICK_START.md",
        "sha256": "b7908dc8aeb6a687aa965debe8183ab9b37c8eae0392e7a7293a652946ad87c9"
      },
      {
        "path": ".claude-plugin/plugin.json",
        "sha256": "5fc173f27bc0ff64119a367e2f395ab3b4da752e7facb25ad019019ab0d5904b"
      }
    ],
    "dirSha256": "6792dd0d55d9a678dadebc65721c1a3864c4d30c619e5d185be09158b9de027d"
  },
  "security": {
    "scannedAt": null,
    "scannerVersion": null,
    "flags": []
  }
 }