188 lines
5.3 KiB
Markdown
188 lines
5.3 KiB
Markdown
# Paper2Web: Academic Homepage Generation
|
|
|
|
## Overview
|
|
|
|
Paper2Web converts academic papers into interactive, explorable academic homepages. Unlike traditional approaches (direct generation, template-based, or HTML conversion), Paper2Web creates layout-aware, interactive websites through an iterative refinement process.
|
|
|
|
## Core Capabilities
|
|
|
|
### 1. Layout-Aware Generation
|
|
- Analyzes paper structure and content organization
|
|
- Creates responsive, multi-section layouts
|
|
- Adapts design based on paper type (research article, review, preprint, etc.)
|
|
|
|
### 2. Interactive Elements
|
|
- Expandable sections for detailed content
|
|
- Interactive figures and tables
|
|
- Embedded citations and references
|
|
- Navigation menu for easy browsing
|
|
- Mobile-responsive design
|
|
|
|
### 3. Content Refinement
|
|
The system uses an iterative pipeline:
|
|
1. Initial content extraction and structuring
|
|
2. Layout generation with visual hierarchy
|
|
3. Interactive element integration
|
|
4. Aesthetic refinement
|
|
5. Quality assessment and validation
|
|
|
|
## Usage
|
|
|
|
### Basic Website Generation
|
|
|
|
```bash
|
|
python pipeline_all.py \
|
|
--input-dir "path/to/papers" \
|
|
--output-dir "path/to/output" \
|
|
--model-choice 1
|
|
```
|
|
|
|
### Parameters
|
|
|
|
- `--input-dir`: Directory containing paper files (PDF or LaTeX)
|
|
- `--output-dir`: Directory for generated website files
|
|
- `--model-choice`: LLM model selection (1=GPT-4, 2=GPT-4.1)
|
|
- `--enable-logo-search`: Use Google Search API to find institution logos (optional)
|
|
|
|
### Input Format Requirements
|
|
|
|
**Supported Input Formats:**
|
|
1. **LaTeX source** (preferred for best results)
|
|
- Main file: `main.tex`
|
|
- Include all referenced figures, tables, and bibliography files
|
|
- Organize in a single directory per paper
|
|
|
|
2. **PDF files**
|
|
- High-quality PDF with selectable text
|
|
- Embedded figures should be high resolution
|
|
- Proper section headers and structure
|
|
|
|
**Directory Structure:**
|
|
```
|
|
input/
|
|
└── paper_name/
|
|
├── main.tex # LaTeX source
|
|
├── bibliography.bib # References
|
|
├── figures/ # Figure files
|
|
│ ├── fig1.png
|
|
│ └── fig2.pdf
|
|
└── tables/ # Table files
|
|
```
|
|
|
|
## Output Structure
|
|
|
|
Generated websites include:
|
|
|
|
```
|
|
output/paper_name/website/
|
|
├── index.html # Main webpage
|
|
├── styles.css # Styling
|
|
├── script.js # Interactive features
|
|
├── assets/ # Images and media
|
|
│ ├── figures/
|
|
│ └── logos/
|
|
└── data/ # Structured data (optional)
|
|
```
|
|
|
|
## Customization Options
|
|
|
|
### Visual Design
|
|
The generated websites automatically include:
|
|
- Professional color schemes based on paper content
|
|
- Typography optimized for readability
|
|
- Consistent spacing and visual hierarchy
|
|
- Dark mode support (optional)
|
|
|
|
### Content Sections
|
|
Standard sections include:
|
|
- Abstract
|
|
- Key findings/contributions
|
|
- Methodology overview
|
|
- Results and visualizations
|
|
- Discussion and implications
|
|
- References and citations
|
|
- Author information and affiliations
|
|
|
|
Additional sections are automatically added based on paper content:
|
|
- Code repositories
|
|
- Dataset links
|
|
- Supplementary materials
|
|
- Related publications
|
|
|
|
## Quality Assessment
|
|
|
|
Paper2Web includes built-in evaluation:
|
|
|
|
### Aesthetic Metrics
|
|
- Layout balance and spacing
|
|
- Color harmony
|
|
- Typography consistency
|
|
- Visual hierarchy effectiveness
|
|
|
|
### Informativeness Metrics
|
|
- Content completeness
|
|
- Key finding clarity
|
|
- Method explanation adequacy
|
|
- Results presentation quality
|
|
|
|
### Technical Metrics
|
|
- Page load time
|
|
- Mobile responsiveness
|
|
- Browser compatibility
|
|
- Accessibility compliance
|
|
|
|
## Advanced Features
|
|
|
|
### Logo Discovery
|
|
When enabled with Google Search API:
|
|
- Automatically finds institution logos
|
|
- Matches author affiliations
|
|
- Downloads and optimizes logo images
|
|
- Integrates into website header
|
|
|
|
### Citation Integration
|
|
- Interactive reference list
|
|
- Hover previews for citations
|
|
- Links to DOI and external sources
|
|
- Citation count tracking (if available)
|
|
|
|
### Figure Enhancement
|
|
- High-resolution figure rendering
|
|
- Zoom and pan functionality
|
|
- Caption and description integration
|
|
- Multi-panel figure navigation
|
|
|
|
## Best Practices
|
|
|
|
### Input Preparation
|
|
1. **Use LaTeX when possible**: Provides best structure extraction
|
|
2. **Include all assets**: Figures, tables, and bibliography files
|
|
3. **Clean formatting**: Remove compilation artifacts and temporary files
|
|
4. **High-quality figures**: Use vector formats (PDF, SVG) when available
|
|
|
|
### Model Selection
|
|
- **GPT-4**: Best balance of quality and cost
|
|
- **GPT-4.1**: Latest features, higher cost
|
|
- **GPT-3.5-turbo**: Faster processing, acceptable for simple papers
|
|
|
|
### Output Optimization
|
|
1. Review generated content for accuracy
|
|
2. Check that all figures render correctly
|
|
3. Test interactive elements functionality
|
|
4. Verify mobile responsiveness
|
|
5. Validate external links
|
|
|
|
## Limitations
|
|
|
|
- Complex mathematical equations may require manual review
|
|
- Multi-column layouts in PDF may affect extraction quality
|
|
- Large papers (>50 pages) may require extended processing time
|
|
- Some specialized figure types may need manual adjustment
|
|
|
|
## Integration with Other Components
|
|
|
|
Paper2Web can be combined with:
|
|
- **Paper2Video**: Generate companion video for the website
|
|
- **Paper2Poster**: Create matching poster design
|
|
- **AutoPR**: Generate promotional content linking to website
|