Initial commit
136
skills/benchmarking/report-creator/reference/pdf-style.css
Normal file
@@ -0,0 +1,136 @@
/* Academic Report PDF Styling
 * For use with pandoc + weasyprint
 * Based on paralleLLM empathy-experiment-v1.0.pdf conventions
 */

body {
  font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Helvetica, Arial, sans-serif;
  line-height: 1.6;
  max-width: 800px;
  margin: 0 auto;
  padding: 2em;
}

h1, h2, h3, h4 {
  margin-top: 1.5em;
  margin-bottom: 0.5em;
}

/* Academic-style tables */
table {
  width: 100%;
  border-collapse: collapse;
  margin: 1em 0;
  page-break-inside: avoid;
}

table th, table td {
  padding: 0.5em 0.75em;
  text-align: left;
  vertical-align: top;
}

table thead th {
  border-top: 2px solid #000;
  border-bottom: 2px solid #000;
  font-weight: bold;
}

table tbody td {
  border-bottom: 1px solid #ddd;
}

table tbody tr:last-child td {
  border-bottom: 2px solid #000;
}

/* Blockquotes for prompts/examples */
blockquote {
  border-left: 4px solid #ddd;
  margin: 1em 0;
  padding-left: 1em;
  color: #555;
  page-break-inside: avoid;
}

/* Code blocks */
code {
  background: #f5f5f5;
  padding: 0.2em 0.4em;
  border-radius: 3px;
  font-size: 0.9em;
}

pre {
  background: #f5f5f5;
  padding: 1em;
  overflow-x: auto;
  border-radius: 5px;
  page-break-inside: avoid;
}

pre code {
  background: none;
  padding: 0;
}

/* Horizontal rules as section dividers */
hr {
  border: none;
  border-top: 1px solid #ddd;
  margin: 2em 0;
}

/* Page break control */
@page {
  margin: 2cm;
}

figure {
  page-break-inside: avoid;
  margin: 1.5em 0;
}

figure img {
  max-width: 100%;
  height: auto;
  display: block;
}

figcaption {
  text-align: center;
  font-style: italic;
  margin-top: 0.5em;
  font-size: 0.9em;
}

/* Keep headings with following content */
h2, h3, h4 {
  page-break-after: avoid;
}

h2 {
  page-break-before: auto;
  margin-top: 2em;
}

/* Prevent orphan paragraphs */
p {
  orphans: 3;
  widows: 3;
}

/* Keep lists together */
ul, ol {
  page-break-inside: avoid;
}

/* Reduced spacing for subsections */
h3 {
  margin-top: 1.5em;
}

/* Bold findings/labels stay with content */
p strong:first-child {
  page-break-after: avoid;
}
218
skills/benchmarking/report-creator/reference/report-template.md
Normal file
@@ -0,0 +1,218 @@
# Academic Research Report Template

Complete markdown template for research reports following academic conventions.

## Full Template

````markdown
# [Title]
## [Subtitle - descriptive]

**Date**: [Date]
**Model Tested**: [model-id] (if applicable)
**Trials**: [sample size description]

---

## Abstract

[150-250 word summary of research question, methodology, key findings, implications]

---

## Executive Summary

**Key Finding**: [One-sentence summary of most important result]

| Metric | Result |
|--------|--------|
| Primary hypothesis | [Supported/Rejected]: [brief reason] |
| Secondary hypothesis | [Status]: [brief reason] |
| Sample size | n = [N] |
| Practical implication | [Key takeaway] |

---

## 1. Background and Motivation

### 1.1 Research Context
[Problem statement, why this matters, prior work]

### 1.2 Hypotheses
**H1 (Primary)**: [Testable prediction]
**H2 (Secondary)**: [Additional prediction]

---

## 2. Methodology

### 2.1 Experimental Design

#### 2.1.1 Overview
[Design summary: conditions × scenarios × trials]

#### 2.1.2 Variables

**Independent Variable**: [What you manipulated]

| Level | Description | Example |
|-------|-------------|---------|
| 1. [Condition] | [Description] | [Example framing] |
| 2. [Condition] | [Description] | [Example framing] |

**Dependent Variables**:

| Variable | Type | Measurement |
|----------|------|-------------|
| [Metric] | Continuous (0-1) | [How measured] |

**Control Variables**:
- [List of held-constant factors]

### 2.2 Dataset Design
[Scenario distribution, categories, sampling]

### 2.3 Scoring Logic
[How pass/fail or scores determined]

### 2.4 Experimental Protocol
```
Model: [model-id]
Provider: [API provider]
Test Cases: [N]
Trials per Case: [N]
Total Completions: [N]
Runtime: [duration]
```

### 2.5 Test Infrastructure
[Figure showing pipeline/architecture]

---

## 3. Results

### 3.1 Summary Statistics
[Main results table with all conditions]

### 3.2 [Key Metric] by [Grouping Variable]
[Visualization or detailed breakdown]

### 3.3 Key Observations

**Finding 1: [Title]**
[Description with specific numbers]

**Finding 2: [Title]**
[Description with specific numbers]

---

## 4. Analysis and Discussion

### 4.1 Hypothesis Evaluation

| Hypothesis | Status | Evidence |
|------------|--------|----------|
| H1 | [REJECTED/SUPPORTED] | [Summary] |
| H2 | [REJECTED/SUPPORTED] | [Summary] |

### 4.2 Interpretation
[What the results mean, behavioral modes identified]

### 4.3 Theoretical Implications
[Broader significance, model behavior insights]

### 4.4 Practical Implications
[Deployment recommendations, risk assessment]

---

## 5. Limitations

### 5.1 Methodological Limitations
1. **[Limitation]**: [Explanation]
2. **[Limitation]**: [Explanation]

### 5.2 Dataset Limitations
[Sample size, language, cultural scope]

### 5.3 Evaluation Limitations
[Scoring limitations, validation gaps]

---

## 6. Future Work
1. **[Direction]**: [Description]
2. **[Direction]**: [Description]

---

## 7. Conclusion
[3-5 paragraph synthesis: main findings, implications, bottom line]

---

## Appendix A: [Title]

### A.1 [Subsection]
[Supporting materials, sample prompts, raw data excerpts]

## Appendix B: [Title]

### B.1 [Technical Details]
[Implementation details, indicator lists, architecture diagrams]

---

*Report generated by [Author]*
````

## Section Guidelines

### Abstract (150-250 words)
- Research question or problem
- Methodology summary
- Key findings
- Implications
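
A length check like the one below can catch abstracts that drift outside the 150-250 word range. This is only a minimal sketch: the function names, and the rule that a word is any whitespace-separated token containing a letter or digit, are assumptions made for illustration and are not part of the template.

```python
import re

ABSTRACT_MIN_WORDS = 150  # lower bound taken from the guideline heading
ABSTRACT_MAX_WORDS = 250  # upper bound taken from the guideline heading


def count_words(text: str) -> int:
    """Count whitespace-separated tokens that contain a letter or digit.

    Assumed counting rule: stray punctuation such as a lone "-" is ignored.
    """
    return sum(1 for token in text.split() if re.search(r"\w", token))


def abstract_length_ok(text: str) -> bool:
    """Return True when the word count falls inside the 150-250 range."""
    return ABSTRACT_MIN_WORDS <= count_words(text) <= ABSTRACT_MAX_WORDS
```

Both bounds are treated as inclusive here, so an abstract of exactly 150 or 250 words is accepted.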

### Executive Summary
- One-sentence key finding
- Metrics table with hypothesis status
- Sample size and practical takeaway

### Background
- Why this research matters
- Prior work context
- Clear, testable hypotheses

### Methodology
- Experimental design overview
- Variables table (IV, DV, control)
- Dataset description
- Scoring criteria
- Protocol details
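
The protocol details in the template list test cases, trials per case, and total completions, and a small helper can keep those three numbers consistent. The sketch below assumes that total completions equal test cases multiplied by trials per case, which the template suggests but does not state outright. `total_completions` and `protocol_block` are hypothetical names.

```python
def total_completions(test_cases: int, trials_per_case: int) -> int:
    """Assumed relationship: every test case is run trials_per_case times."""
    if test_cases < 0 or trials_per_case < 0:
        raise ValueError("counts must be non-negative")
    return test_cases * trials_per_case


def protocol_block(model: str, provider: str, test_cases: int,
                   trials_per_case: int, runtime: str) -> str:
    """Render the Experimental Protocol fields, one per line."""
    lines = [
        f"Model: {model}",
        f"Provider: {provider}",
        f"Test Cases: {test_cases}",
        f"Trials per Case: {trials_per_case}",
        f"Total Completions: {total_completions(test_cases, trials_per_case)}",
        f"Runtime: {runtime}",
    ]
    return "\n".join(lines)
```

If some trials are dropped or retried, the derived total will not match the real number of completions, and the measured count should be reported instead.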

### Results
- Summary statistics table
- Visualizations or breakdowns
- Numbered findings with specific data

### Discussion
- Hypothesis evaluation table
- Interpretation of findings
- Theoretical implications
- Practical recommendations

### Limitations
- Methodological constraints
- Dataset scope limitations
- Evaluation gaps

### Future Work
- Numbered research directions
- Extensions of current work

### Conclusion
- Synthesis of findings
- Bottom-line takeaway
@@ -0,0 +1,78 @@
# Table Formatting Patterns

Academic-style table patterns for research reports.

## Summary Statistics Table

```markdown
| Identity | N | Mean Score | Std Dev | Pass Rate | Metric |
|----------|---|------------|---------|-----------|--------|
| condition_a | 100 | 0.584 | 0.259 | 59.0% | **8.0%** |
| condition_b | 99 | 0.658 | 0.138 | **87.9%** | 12.1% |
```
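
Rows in this shape can be generated from raw scores rather than typed by hand. The following is a sketch under stated assumptions: `summary_row` is a hypothetical helper, the standard deviation is the sample (n - 1) form, and the final "Metric" column is passed in as preformatted text because this document does not define it.

```python
from statistics import mean, stdev
from typing import Sequence


def summary_row(label: str, scores: Sequence[float],
                pass_count: int, metric: str) -> str:
    """Format one row: label, N, mean, std dev, pass rate, extra metric."""
    n = len(scores)
    if n == 0:
        raise ValueError("scores must not be empty")
    if not 0 <= pass_count <= n:
        raise ValueError("pass_count must be between 0 and len(scores)")
    # Assumption: sample standard deviation; undefined for n == 1, so use 0.0.
    spread = stdev(scores) if n > 1 else 0.0
    pass_rate = 100.0 * pass_count / n
    return (f"| {label} | {n} | {mean(scores):.3f} | {spread:.3f} "
            f"| {pass_rate:.1f}% | {metric} |")
```

For example, `summary_row("condition_a", [0.2, 0.4, 0.6, 0.8], 2, "n/a")` returns `| condition_a | 4 | 0.500 | 0.258 | 50.0% | n/a |`. Bold emphasis for the best value in a column is left to the caller.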

## Hypothesis Evaluation Table

```markdown
| Hypothesis | Status | Evidence |
|------------|--------|----------|
| H1 | **REJECTED** | [Brief summary with numbers] |
| H2 | **NOT SUPPORTED** | [Brief summary with numbers] |
| H3 | **PARTIALLY SUPPORTED** | [Brief summary with numbers] |
```
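
When the same statuses recur across reports, a small renderer can keep the labels consistent. This is an illustrative sketch, not part of the reference files: `hypothesis_table` is a hypothetical name, and the allowed status list is assumed from the labels that appear in this document.

```python
from typing import Iterable, Tuple

# Assumption: the status vocabulary is limited to labels used in this document.
ALLOWED_STATUSES = ("SUPPORTED", "PARTIALLY SUPPORTED", "NOT SUPPORTED", "REJECTED")


def hypothesis_table(rows: Iterable[Tuple[str, str, str]]) -> str:
    """Render (hypothesis, status, evidence) tuples as a markdown table."""
    lines = [
        "| Hypothesis | Status | Evidence |",
        "|------------|--------|----------|",
    ]
    for name, status, evidence in rows:
        if status not in ALLOWED_STATUSES:
            raise ValueError(f"unknown status: {status!r}")
        # Escape pipes so free-text evidence cannot break the table layout.
        safe_evidence = evidence.replace("|", "\\|")
        lines.append(f"| {name} | **{status}** | {safe_evidence} |")
    return "\n".join(lines)
```

Rejecting unknown statuses keeps variants such as "Rejected" or "FAILED" from slipping into a report unnoticed.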

## Variables Table (3-column)

```markdown
| Level | Description | Example Framing |
|-------|-------------|-----------------|
| 1. Direct Human | Control condition | "Hi, I'm Sarah..." |
| 2. AI Testing | Explicit evaluation | "[AUTOMATED TEST]..." |
```

## Dependent Variables Table

```markdown
| Variable | Type | Measurement |
|----------|------|-------------|
| Score | Continuous (0-1) | API response parsing |
| Pass Rate | Binary | Score >= threshold |
```
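
The "Score >= threshold" rule in the table above can be written down directly. The sketch below assumes scores lie in the closed range 0 to 1, as the "Continuous (0-1)" type suggests, and that the comparison is inclusive. `is_pass` and `pass_rate` are hypothetical names, and the threshold value itself is left to the study.

```python
from typing import Iterable


def is_pass(score: float, threshold: float) -> bool:
    """Binary outcome for one completion: pass when score >= threshold."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range 0-1: {score}")
    return score >= threshold


def pass_rate(scores: Iterable[float], threshold: float) -> float:
    """Fraction of scores that pass, as a value between 0 and 1."""
    flags = [is_pass(score, threshold) for score in scores]
    if not flags:
        raise ValueError("scores must not be empty")
    return sum(flags) / len(flags)
```

Because the comparison is inclusive, a score exactly equal to the threshold counts as a pass. Multiply the result by 100 to report it as a percentage.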

## Executive Summary Metrics

```markdown
| Metric | Result |
|--------|--------|
| Primary hypothesis | [Supported/Rejected]: [brief reason] |
| Secondary hypothesis | [Status]: [brief reason] |
| Sample size | n = [N] |
| Practical implication | [Key takeaway] |
```

## Figure Embedding

### Standard Figure
```html
<figure style="margin: 2em auto; page-break-inside: avoid; text-align: center;">
<img src="figure-1.png" alt="Description" style="max-width: 100%; height: auto;">
<figcaption>Figure 1: Descriptive caption explaining what the figure shows.</figcaption>
</figure>
```

### Figure with Border
```html
<figure style="margin: 2em auto; page-break-inside: avoid; text-align: center; border: 1px solid #eee; padding: 1em; border-radius: 8px;">
<img src="architecture.png" alt="System architecture">
<figcaption>Figure 2: System architecture showing data flow.</figcaption>
</figure>
```
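
If figures are numbered and captioned programmatically, the two snippets above can be produced by one helper. This is a sketch only: `figure_html` and its parameters are hypothetical, and unlike the bordered example above it always sets the image size style, which is an assumption made for consistency.

```python
from html import escape

BASE_STYLE = "margin: 2em auto; page-break-inside: avoid; text-align: center;"
BORDER_STYLE = " border: 1px solid #eee; padding: 1em; border-radius: 8px;"


def figure_html(src: str, alt: str, number: int, caption: str,
                bordered: bool = False) -> str:
    """Build a <figure> block following the patterns shown above."""
    style = BASE_STYLE + (BORDER_STYLE if bordered else "")
    # Escape user-supplied text so quotes or angle brackets cannot break the markup.
    return (
        f'<figure style="{style}">\n'
        f'<img src="{escape(src)}" alt="{escape(alt)}" '
        f'style="max-width: 100%; height: auto;">\n'
        f"<figcaption>Figure {number}: {escape(caption)}</figcaption>\n"
        f"</figure>"
    )
```

Calling `figure_html("figure-1.png", "Description", 1, "Descriptive caption explaining what the figure shows.")` reproduces the standard figure markup line for line.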

## Typography Conventions

| Element | Usage |
|---------|-------|
| **Bold** | Key findings, important metrics, hypothesis status |
| *Italic* | Figure captions, emphasis, Latin terms |
| `code` | Model IDs, technical terms, file names |
| > Blockquote | Sample prompts, user messages, system messages |