Initial commit
SKILL.md
---
name: html-to-pdf
description: Converts HTML files to PDF or PNG format. Use this skill when the user asks to convert, export, or generate PDF/PNG from HTML files, or when they want to create printable documents, presentations, or long-form images from web pages or HTML reports.
---

# HTML to PDF/PNG Converter Skill

This skill converts HTML files to PDF or PNG, with a choice between paginated A4 output and a single continuous page.

## When to Use This Skill

Use this skill when the user wants to:
- Convert HTML files to PDF
- Generate PNG screenshots from HTML
- Create printable documents from web pages
- Export HTML reports as PDFs
- Generate long-form images without page breaks
- Create presentation-ready PDFs from HTML

## Available Conversion Methods

### Method 1: Multi-Page PDF (Standard A4)
Best for: Printing, archiving, traditional documents

**Command:**
```bash
python html_to_pdf_final.py input.html output.pdf
```

The output path appears to be optional: the step-by-step instructions run the script with the input only and open `your_file_final.pdf` afterwards.

**Features:**
- Standard A4 page format
- Zero margins for seamless appearance
- Background colors and gradients preserved
- Multiple pages with page breaks
- Optimized file size (~5-6 MB for typical reports)
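
The features listed above correspond closely to options of Playwright's `page.pdf()`. The following is a minimal sketch under that assumption, not the actual contents of `html_to_pdf_final.py`; the helper names `build_pdf_options` and `render_a4_pdf` are hypothetical.

```python
from pathlib import Path


def build_pdf_options(output_path: str) -> dict:
    """Keyword arguments for Playwright's page.pdf(): A4, no margins, backgrounds kept."""
    return {
        "path": output_path,
        "format": "A4",
        # Without this, Chromium drops background colors and gradients when printing.
        "print_background": True,
        "margin": {"top": "0mm", "right": "0mm", "bottom": "0mm", "left": "0mm"},
    }


def render_a4_pdf(html_path: str, output_path: str) -> None:
    """Hypothetical helper: load a local HTML file and print it to a paginated PDF."""
    # Imported lazily so build_pdf_options() works even when Playwright is absent.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(Path(html_path).resolve().as_uri())
        page.pdf(**build_pdf_options(output_path))
        browser.close()
```

Keeping the options in a separate function makes it easy to inspect or adjust the page format without launching a browser.
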

### Method 2: Single-Page Long Image PDF
Best for: Online viewing, presentations, no page breaks

**Command:**
```bash
python html_to_long_image.py input.html
```

**Features:**
- Generates a full-page PNG screenshot first
- Converts the PNG to a single-page PDF
- No page breaks: the entire content sits on one page
- Well suited to presentations and online viewing
- Smaller PDF (~1-2 MB); the intermediate PNG is typically 6-10 MB
- Two output files: `.png` and `.pdf`
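
The screenshot-then-convert flow described above can be sketched with Playwright and Pillow. This is an illustration under stated assumptions rather than the real `html_to_long_image.py`: the function names are hypothetical, and the `_fullpage` naming is inferred from the output file names used elsewhere in this document.

```python
from pathlib import Path


def fullpage_output_paths(html_path: str) -> tuple[Path, Path]:
    """Derive <stem>_fullpage.png and <stem>_fullpage.pdf next to the input file."""
    src = Path(html_path)
    return (
        src.with_name(f"{src.stem}_fullpage.png"),
        src.with_name(f"{src.stem}_fullpage.pdf"),
    )


def capture_fullpage_png(html_path: str, png_path: Path) -> None:
    """Take one screenshot covering the whole scrollable page."""
    from playwright.sync_api import sync_playwright  # lazy import

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(Path(html_path).resolve().as_uri())
        page.screenshot(path=str(png_path), full_page=True)
        browser.close()


def png_to_single_page_pdf(png_path: Path, pdf_path: Path) -> None:
    """Wrap the PNG in a one-page PDF using Pillow."""
    from PIL import Image  # lazy import

    with Image.open(png_path) as img:
        # Pillow's PDF writer does not accept an alpha channel, so convert to RGB.
        img.convert("RGB").save(pdf_path, "PDF")
```

A caller would chain the three helpers: compute the paths, capture the PNG, then convert it to PDF.
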

### Method 3: Advanced Multi-Method Converter
Best for: Fallback options, compatibility

**Command:**
```bash
python html_to_pdf_converter.py input.html output.pdf
```

**Features:**
- Tries WeasyPrint first (best CSS support)
- Falls back to Playwright if needed
- Automatic dependency installation
- Handles complex CSS and gradients
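
The try-then-fall-back behavior described above can be expressed as a small loop over converter functions. The sketch below is an assumption about the general pattern, not the code of `html_to_pdf_converter.py`; `convert_with_fallbacks` and `weasyprint_converter` are hypothetical names.

```python
from typing import Callable, Sequence

Converter = Callable[[str, str], None]


def weasyprint_converter(html_path: str, pdf_path: str) -> None:
    """First choice: render with WeasyPrint (raises ImportError if it is not installed)."""
    from weasyprint import HTML  # lazy import so a missing package triggers the fallback

    HTML(filename=html_path).write_pdf(pdf_path)


def convert_with_fallbacks(
    html_path: str,
    pdf_path: str,
    converters: Sequence[tuple[str, Converter]],
) -> str:
    """Try each converter in order and return the name of the first that succeeds."""
    errors = []
    for name, convert in converters:
        try:
            convert(html_path, pdf_path)
            return name
        except Exception as exc:  # any failure moves on to the next method
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all converters failed: " + "; ".join(errors))
```

A Playwright-based converter with the same `(html_path, pdf_path)` signature would be passed as the second entry in `converters`.
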

## Required Dependencies

The skill requires Playwright and Pillow; pypdf is used for the page-count check under Advanced Usage. WeasyPrint, which Method 3 tries first, is not part of this list. The scripts attempt to install missing dependencies automatically, but you can also install them up front:

```bash
pip install playwright pillow pypdf
playwright install chromium
```
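
Before running a conversion, it can be useful to see which of these packages are actually importable. The sketch below only reports what is missing and does not install anything; the `REQUIRED` mapping and `missing_packages` function are hypothetical and are not taken from the scripts.

```python
import importlib.util

# Import name -> pip package name (the two differ for Pillow).
REQUIRED = {"playwright": "playwright", "PIL": "pillow", "pypdf": "pypdf"}


def missing_packages(required: dict = REQUIRED) -> list:
    """Return the pip names of packages whose import name cannot be found."""
    return [
        pip_name
        for module_name, pip_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages. Install with: pip install " + " ".join(missing))
    else:
        print("All Python dependencies are importable.")
```

Note that this checks Python packages only; the Chromium browser still has to be installed with `playwright install chromium`.
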

## Step-by-Step Instructions

### For Standard Multi-Page PDF:

1. **Verify the HTML file exists:**
   ```bash
   ls -la *.html
   ```

2. **Run the converter:**
   ```bash
   python html_to_pdf_final.py your_file.html
   ```

3. **Open the result:**
   ```bash
   # "open" is the macOS command; on Linux use xdg-open
   open your_file_final.pdf
   ```

### For Single-Page Long Image PDF:

1. **Verify the HTML file exists:**
   ```bash
   ls -la *.html
   ```

2. **Run the long image converter:**
   ```bash
   python html_to_long_image.py your_file.html
   ```

3. **Check outputs:**
   ```bash
   # View the PNG screenshot
   open your_file_fullpage.png

   # View the PDF version
   open your_file_fullpage.pdf
   ```

## Troubleshooting

### Issue: Playwright browser not found

**Solution:**
```bash
playwright install chromium
```

### Issue: Page breaks visible in PDF

**Solution:** Use the long image method instead:
```bash
python html_to_long_image.py your_file.html
```

### Issue: Content appears cut off

**Causes & Solutions:**
- **CSS animations not complete**: The script waits 2 seconds for animations to finish before capturing
- **Lazy loading**: The script scrolls through the entire page to trigger loading
- **Large file size**: The scripts handle files of 20 MB and larger

### Issue: Blank PDF output

**Solution:** Use the long image method, which takes a screenshot instead of relying on PDF rendering:
```bash
python html_to_long_image.py your_file.html
```

## Output Files

After conversion, you'll get:

**Multi-Page PDF:**
- `filename_final.pdf` - Standard A4 multi-page PDF

**Long Image Method:**
- `filename_fullpage.png` - Complete screenshot as PNG (6-10 MB)
- `filename_fullpage.pdf` - Single-page PDF from image (1-2 MB)

## Best Practices

1. **For online viewing/presentations:** Use `html_to_long_image.py`
   - No page breaks
   - Smooth scrolling experience
   - Smaller file size

2. **For printing/archiving:** Use `html_to_pdf_final.py`
   - Standard A4 pages
   - Better for physical printing
   - Professional document format

3. **For complex CSS:** Use `html_to_pdf_converter.py`
   - Multiple fallback methods
   - Better compatibility

## Implementation Notes

### Script Locations
All scripts should be in the project directory:
- `html_to_pdf_final.py` - Main multi-page converter
- `html_to_long_image.py` - Long image generator
- `html_to_pdf_converter.py` - Advanced multi-method converter

### Key Features Implemented

1. **Animation Handling**: All scripts disable CSS animations/transitions
2. **Lazy Loading**: Scripts scroll through content to trigger loading
3. **Background Preservation**: All gradients and colors render correctly
4. **Zero Margins**: Seamless page appearance without visible borders
5. **Chinese Font Support**: Handles CJK characters properly
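
The first two features in the list above are typically implemented by injecting a stylesheet and running a scroll script before capture. The sketch below shows one way to do that with Playwright's `page.add_style_tag()` and `page.evaluate()`; it is an assumption about the approach, and the constants and `prepare_page` function are hypothetical names rather than code from the scripts.

```python
# CSS that switches off animations and transitions for every element.
DISABLE_ANIMATIONS_CSS = """
*, *::before, *::after {
  animation: none !important;
  transition: none !important;
}
"""

# JavaScript that scrolls down one viewport at a time so lazy-loaded
# content is requested, then returns to the top of the page.
SCROLL_THROUGH_PAGE_JS = """
async () => {
  const step = window.innerHeight;
  for (let y = 0; y < document.body.scrollHeight; y += step) {
    window.scrollTo(0, y);
    await new Promise(resolve => setTimeout(resolve, 100));
  }
  window.scrollTo(0, 0);
}
"""


def prepare_page(page) -> None:
    """Freeze animations, then scroll through the page to trigger lazy loading.

    `page` is expected to be a Playwright sync-API Page (or any object that
    offers add_style_tag(content=...) and evaluate(expression)).
    """
    page.add_style_tag(content=DISABLE_ANIMATIONS_CSS)
    page.evaluate(SCROLL_THROUGH_PAGE_JS)
```

`prepare_page(page)` would be called after `page.goto(...)` and before `page.pdf(...)` or `page.screenshot(...)`. The 100 ms pause per step is an arbitrary illustrative value.
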

## Examples

### Example 1: Convert BP Report to Multi-Page PDF
```bash
python html_to_pdf_final.py 202510_Alpha_Intelligence_BP.html
# Output: 202510_Alpha_Intelligence_BP_final.pdf (22 pages, 5.7 MB)
```

### Example 2: Create Single-Page Presentation PDF
```bash
python html_to_long_image.py 202510_Alpha_Intelligence_BP.html
# Output:
# - 202510_Alpha_Intelligence_BP_fullpage.png (6.1 MB)
# - 202510_Alpha_Intelligence_BP_fullpage.pdf (1.6 MB, 1 page)
```

### Example 3: Batch Convert Multiple Files
```bash
for file in *.html; do
  # Skip the literal "*.html" when no HTML files are present
  [ -e "$file" ] || continue
  python html_to_long_image.py "$file"
done
```

## Advanced Usage

### Custom Output Paths
```bash
# html_to_pdf_final.py takes the output path as its second argument
python html_to_pdf_final.py input.html custom_output.pdf
# html_to_long_image.py is shown with the input only; outputs are named input_fullpage.png and input_fullpage.pdf
python html_to_long_image.py input.html
```

### Check PDF Page Count
```bash
python -c "from pypdf import PdfReader; r = PdfReader('output.pdf'); print(f'Pages: {len(r.pages)}')"
```

### Verify File Sizes
```bash
ls -lh *.pdf *.png
```

## Performance Expectations

- **Processing Speed**: ~5-10 seconds for typical HTML files
- **Memory Usage**: ~100-200 MB during conversion
- **PDF File Size**: 1-6 MB depending on method
- **PNG File Size**: 6-10 MB for full-page screenshots

## Success Criteria

A successful conversion should:
1. ✅ Generate PDF/PNG without errors
2. ✅ Include all HTML content (no truncation)
3. ✅ Preserve colors, gradients, and styling
4. ✅ Handle Chinese/CJK characters correctly
5. ✅ Produce reasonable file sizes (< 10 MB)
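
Criteria 1 and 5 can be checked mechanically; the others need a visual review of the output. The sketch below is one possible check, with a hypothetical `check_output` function: it confirms that the file exists, is non-empty, stays under the 10 MB limit, and starts with the PDF or PNG file signature.

```python
from pathlib import Path

MAX_BYTES = 10 * 1024 * 1024  # the "< 10 MB" limit from the criteria above

# Leading bytes that identify each format.
MAGIC = {".pdf": b"%PDF-", ".png": b"\x89PNG\r\n\x1a\n"}


def check_output(path: str, max_bytes: int = MAX_BYTES) -> list:
    """Return a list of problems found with an output file (empty list means OK)."""
    p = Path(path)
    if not p.is_file():
        return [f"{p} was not created"]
    size = p.stat().st_size
    if size == 0:
        return [f"{p} is empty"]

    problems = []
    if size >= max_bytes:
        problems.append(f"{p} is {size} bytes, at or above the {max_bytes}-byte limit")

    expected = MAGIC.get(p.suffix.lower())
    with p.open("rb") as f:
        header = f.read(8)
    if expected and not header.startswith(expected):
        problems.append(f"{p} does not start with a valid {p.suffix} signature")
    return problems
```

For example, `check_output("your_file_final.pdf")` returns an empty list when the file passes all four checks.
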

## Quick Reference

| Need | Use | Output |
|------|-----|--------|
| Printing | `html_to_pdf_final.py` | Multi-page A4 PDF |
| Online viewing | `html_to_long_image.py` | Single-page PDF + PNG |
| Maximum compatibility | `html_to_pdf_converter.py` | Multi-page PDF with fallbacks |