# Evaluation Rubrics Templates

Quick-start templates for purpose definition, criteria selection, scale design, descriptor writing, and rubric formats.

## Workflow

```
Rubric Development Progress:
- [ ] Step 1: Define purpose and scope
- [ ] Step 2: Identify evaluation criteria
- [ ] Step 3: Design the scale
- [ ] Step 4: Write performance descriptors
- [ ] Step 5: Test and calibrate
- [ ] Step 6: Use and iterate
```

**Step 1: Define purpose and scope**

Use [Purpose Definition Template](#purpose-definition-template) to clarify evaluation context and constraints.

**Step 2: Identify evaluation criteria**

Brainstorm and prioritize quality dimensions using [Criteria Identification Template](#criteria-identification-template).

**Step 3: Design the scale**

Select scale type and levels using [Scale Selection Template](#scale-selection-template).

**Step 4: Write performance descriptors**

Write clear, observable descriptors using [Descriptor Writing Template](#descriptor-writing-template).

**Step 5: Test and calibrate**

Conduct inter-rater reliability testing using [Calibration Session Template](#calibration-session-template), then refine the rubric.

**Step 6: Use and iterate**

Apply the rubric, collect feedback using [Feedback Template](#feedback-template), and revise as needed.

---

## Purpose Definition Template

**What are we evaluating?**
- Artifact type: [e.g., code pull requests, research proposals, design mockups, student essays]
- Specific context: [e.g., internal code review, grant competition, course assignment]

**Who will evaluate?**
- Number of evaluators: [Single reviewer or multiple?]
- Evaluator expertise: [Subject matter experts, peers, instructors, automated systems]
- Evaluator availability: [Time per evaluation? Total volume?]

**Who are the evaluatees?**
- Audience: [Students, employees, vendors, applicants]
- Skill level: [Novice, intermediate, expert]
- Will they see the rubric before evaluation? [Yes/No - if yes, rubric serves as guide]

**What decisions depend on scores?**
- High stakes: [Pass/fail, hiring, funding, promotion, grades]
- Medium stakes: [Feedback for improvement, prioritization, awards]
- Low stakes: [Self-assessment, informal feedback]

**Success criteria for rubric:**
- [ ] Enables consistent scoring across evaluators (target: ≥70% agreement within 1 point)
- [ ] Provides actionable feedback for improvement
- [ ] Takes reasonable time to use (target: X minutes per evaluation)
- [ ] Acceptable to evaluators (not overly complex or rigid)
- [ ] Acceptable to evaluatees (perceived as fair and transparent)

---

## Criteria Identification Template

### Brainstorming Quality Dimensions

**Product criteria** (artifact itself):
- Correctness/Accuracy: [Is it right? Factually accurate? Meets requirements?]
- Completeness: [Covers all necessary elements? No major gaps?]
- Clarity: [Easy to understand? Well-organized? Clear communication?]
- Quality/Craftsmanship: [Attention to detail? Polished? Professional?]
- Originality/Creativity: [Novel approach? Innovative? Goes beyond expected?]
- Performance: [Fast? Efficient? Scalable? Meets technical specs?]

**Process criteria** (how it was made):
- Methodology: [Followed appropriate process? Research methods sound?]
- Collaboration: [Teamwork? Communication? Used feedback?]
- Iteration: [Multiple drafts? Refinement? Responsiveness to critique?]
- Time management: [Completed on time? Paced work appropriately?]

**Impact criteria** (effects/outcomes):
- Usability: [User-friendly? Accessible? Intuitive?]
- Value: [Solves problem? Addresses need? Business impact?]
- Learning demonstrated: [Shows understanding? Growth from previous work?]

**Meta criteria** (quality of quality):
- Maintainability: [Can others work with this? Documented? Modular?]
- Testability: [Can be verified? Validated? Measured?]
- Extensibility: [Can be built upon? Flexible? Adaptable?]

### Prioritization

**Rate each candidate criterion:**

| Criterion | Importance (H/M/L) | Observable (Y/N) | Distinct from others (Y/N) | Include? |
|-----------|-------------------|------------------|---------------------------|----------|
| [Criterion 1] | | | | |
| [Criterion 2] | | | | |
| [Criterion 3] | | | | |

**Selection rules:**
- Must be High or Medium importance
- Must be Observable (can two reviewers score consistently?)
- Must be Distinct (not overlapping with other criteria)
- Aim for 4-8 criteria (balance coverage vs. simplicity)

**Final criteria** (4-8 selected):
1. [Criterion]: [Brief definition]
2. [Criterion]: [Brief definition]
3. [Criterion]: [Brief definition]
4. [Criterion]: [Brief definition]

---
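
The selection rules in the Criteria Identification Template above can be applied mechanically once the prioritization table is filled in. The following is a minimal sketch, not a prescribed tool: the `Candidate` class, the `select_criteria` function, and the sample criteria are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One row of the prioritization table (hypothetical structure)."""
    name: str
    importance: str   # "H", "M", or "L"
    observable: bool
    distinct: bool


def select_criteria(candidates, min_count=4, max_count=8):
    """Apply the selection rules; return (selected, warning_or_None)."""
    selected = [
        c for c in candidates
        if c.importance in ("H", "M") and c.observable and c.distinct
    ]
    # List High-importance criteria first; sorted() is stable, so ties keep input order.
    selected = sorted(selected, key=lambda c: c.importance != "H")
    warning = None
    if len(selected) < min_count:
        warning = f"only {len(selected)} criteria pass; aim for {min_count}-{max_count}"
    elif len(selected) > max_count:
        warning = f"{len(selected)} criteria pass; consider trimming to {max_count}"
    return selected, warning


candidates = [
    Candidate("Correctness", "H", True, True),
    Candidate("Clarity", "M", True, True),
    Candidate("Effort", "M", False, True),       # not observable, so excluded
    Candidate("Polish", "L", True, True),        # low importance, so excluded
    Candidate("Completeness", "H", True, True),
]
selected, warning = select_criteria(candidates)
print([c.name for c in selected])  # ['Correctness', 'Completeness', 'Clarity']
print(warning)                     # only 3 criteria pass; aim for 4-8
```

The warning is advisory only: the 4-8 range is a target for balancing coverage against simplicity, so the sketch reports a shortfall rather than rejecting the list.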

## Scale Selection Template

**Scale type options:**

### Numeric Scales

**1-3 scale** (Low/Medium/High)
- Use when: Quick categorization, clear tiers sufficient
- Levels: 1=Below standard, 2=Meets standard, 3=Exceeds standard

**1-4 scale** (Forced choice, no middle)
- Use when: Want to avoid central tendency, need clear differentiation
- Levels: 1=Poor, 2=Fair, 3=Good, 4=Excellent

**1-5 scale** (Most common, allows neutral)
- Use when: General purpose, familiar to evaluators
- Levels: 1=Poor, 2=Fair, 3=Adequate, 4=Good, 5=Excellent

**1-10 scale** (Fine gradations)
- Use when: Large sample, need statistical analysis, can distinguish subtle differences
- Levels: 1-2=Poor, 3-4=Fair, 5-6=Adequate, 7-8=Good, 9-10=Excellent

### Qualitative Scales

- **Developmental**: Novice → Developing → Proficient → Expert
- **Standards-based**: Below Standard → Approaching → Meets → Exceeds
- **Competency**: Not Yet Competent → Partially Competent → Competent → Highly Competent

### Binary

**Pass/Fail, Yes/No, Present/Absent**
- Use when: Compliance checks, minimum thresholds, clear criteria

**Selected scale for this rubric**: [Choose one]
- **Type**: [Numeric 1-5, Qualitative, etc.]
- **Levels**: [List with labels]
- **Rationale**: [Why this scale fits purpose]

---
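
The 1-10 scale in the Scale Selection Template above groups scores into five two-point bands that reuse the 1-5 labels. A small sketch of that mapping follows; the function name `band_label` is an assumption made for illustration.

```python
# Labels shared by the 1-5 scale and the 1-10 bands.
LABELS = ["Poor", "Fair", "Adequate", "Good", "Excellent"]


def band_label(score):
    """Map a 1-10 score to its band label (1-2=Poor ... 9-10=Excellent)."""
    if not 1 <= score <= 10:
        raise ValueError("score must be between 1 and 10")
    # Scores 1-2 fall in band 0, 3-4 in band 1, and so on.
    return LABELS[(score - 1) // 2]


for score in (1, 4, 6, 7, 10):
    print(score, band_label(score))
```

The same arithmetic converts a 1-10 score to the equivalent 1-5 level: `(score + 1) // 2`.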

## Descriptor Writing Template

For each criterion, write descriptors at each scale level.

### Criterion: [Name]

**Definition**: [What does this criterion assess? 1-2 sentences]

**Why it matters**: [Importance to overall quality]

**Scale descriptors:**

#### Level 5 (or highest): [Label]
**Observable characteristics**:
- [Concrete, observable feature 1]
- [Concrete, observable feature 2]
- [Concrete, observable feature 3]

**Example**: [Specific instance of work at this level]

#### Level 4: [Label]
**Observable characteristics**:
- [How this differs from Level 5 - what's missing or less strong]
- [Concrete observable feature]

**Example**: [Specific instance]

#### Level 3: [Label] (Baseline/Adequate)
**Observable characteristics**:
- [Minimum acceptable performance]
- [Observable feature]

**Example**: [Specific instance]

#### Level 2: [Label]
**Observable characteristics**:
- [What's lacking compared to Level 3]
- [Observable deficiency]

**Example**: [Specific instance]

#### Level 1 (or lowest): [Label]
**Observable characteristics**:
- [Significant deficiencies]
- [Observable problems]

**Example**: [Specific instance]

---

### Descriptor Writing Guidelines

**DO:**
- Use observable, measurable language ("Contains 3+ bugs" not "poor quality")
- Provide concrete examples or anchors for each level
- Focus on what IS present at each level, not just "less than" higher level
- Use parallel structure across levels (same aspects addressed at each level)
- Specify quantities when possible ("All 5 requirements met" vs "Most requirements met")

**DON'T:**
- Use subjective terms without definition ("creative", "professional", "excellent effort")
- Rely on comparative language only ("better than", "more sophisticated")
- Make assumptions about process ("spent time", "worked hard" - unless observable)
- Penalize for things not mentioned in descriptor (hidden expectations)

---
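
A draft descriptor can be screened against the wording guidelines above before a calibration session. The sketch below is only a rough aid under stated assumptions: the term lists are copied from the examples in the guidelines and are far from exhaustive, and `lint_descriptor` is a hypothetical helper name. A phrase it flags may still be fine if the rubric defines it elsewhere.

```python
import re

# Illustrative lists taken from the guideline examples; extend them per rubric.
SUBJECTIVE_TERMS = ["creative", "professional", "excellent effort", "poor quality"]
COMPARATIVE_PHRASES = ["better than", "more sophisticated"]


def lint_descriptor(text):
    """Return a list of warnings for wording the guidelines advise against."""
    lowered = text.lower()
    warnings = []
    for term in SUBJECTIVE_TERMS:
        # Whole-word match so "professional" does not fire on "unprofessionally".
        if re.search(rf"\b{re.escape(term)}\b", lowered):
            warnings.append(f"subjective term without definition: '{term}'")
    for phrase in COMPARATIVE_PHRASES:
        if re.search(rf"\b{re.escape(phrase)}\b", lowered):
            warnings.append(f"comparative-only language: '{phrase}'")
    return warnings


print(lint_descriptor("Contains 3+ bugs"))
print(lint_descriptor("More sophisticated and professional than Level 2"))
```

The first call returns an empty list; the second returns two warnings, one per flagged phrase.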

## Analytic Rubric Template

Most common format: Multiple criteria (rows) × Multiple levels (columns)

### Rubric for: [Artifact Type]

**Purpose**: [Brief description]

**Scale**: [1-5, 1-4, etc. with labels]

| Criterion | 1 | 2 | 3 | 4 | 5 | Weight |
|-----------|---|---|---|---|---|--------|
| **[Criterion 1]** | [Descriptor] | [Descriptor] | [Descriptor] | [Descriptor] | [Descriptor] | [×N or %] |
| **[Criterion 2]** | [Descriptor] | [Descriptor] | [Descriptor] | [Descriptor] | [Descriptor] | [×N or %] |
| **[Criterion 3]** | [Descriptor] | [Descriptor] | [Descriptor] | [Descriptor] | [Descriptor] | [×N or %] |
| **[Criterion 4]** | [Descriptor] | [Descriptor] | [Descriptor] | [Descriptor] | [Descriptor] | [×N or %] |

**Scoring:**
- Calculate: ((Score1 × Weight1) + (Score2 × Weight2) + ...) / Total Weights
- Threshold: [e.g., Must average ≥3.0 to pass, ≥4 on critical criteria]

**Usage notes:**
- Score each criterion independently before looking at others (avoid halo effect)
- Provide brief justification for each score
- Flag areas for improvement in feedback

---
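
The scoring calculation in the Analytic Rubric Template above is a weighted average: sum each score times its weight, then divide the whole sum by the total of the weights. The sketch below assumes scores and weights are kept in dictionaries keyed by criterion name; `weighted_average` and the sample criteria are illustrative names. It also shows that multiplier weights (×N) and percentage weights give the same result when they are in the same proportion.

```python
def weighted_average(scores, weights):
    """Return sum(score * weight) / sum(weights) over all criteria."""
    if scores.keys() != weights.keys():
        raise ValueError("scores and weights must cover the same criteria")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[c] * weights[c] for c in scores) / total_weight


scores = {"Correctness": 4, "Clarity": 3, "Completeness": 5, "Style": 2}
multipliers = {"Correctness": 3, "Clarity": 2, "Completeness": 2, "Style": 1}
percentages = {"Correctness": 37.5, "Clarity": 25, "Completeness": 25, "Style": 12.5}

print(weighted_average(scores, multipliers))  # 30 / 8 = 3.75
print(weighted_average(scores, percentages))  # 375 / 100 = 3.75

PASS_AVERAGE = 3.0  # example threshold from the template
print(weighted_average(scores, multipliers) >= PASS_AVERAGE)  # True
```

Dividing by the total weight keeps the result on the original score scale (here 1-5), whichever weighting style is used.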

## Holistic Rubric Template

Single overall score integrating multiple criteria.

### Rubric for: [Artifact Type]

**Purpose**: [Brief description]

#### Level 5: Excellent
**Overall quality**: [Integrated description touching all important aspects]
- Criterion A: [How it manifests at this level]
- Criterion B: [How it manifests at this level]
- Criterion C: [How it manifests at this level]

**Example**: [Work that exemplifies this level]

#### Level 4: Good
**Overall quality**: [Integrated description]
- Differences from Level 5: [What's less strong]
- Key characteristics: [Observable features]

**Example**: [Work that exemplifies this level]

#### Level 3: Adequate
**Overall quality**: [Integrated description of baseline acceptable]
- Meets minimum standards: [What's required]
- May have: [Acceptable weaknesses]

**Example**: [Work that exemplifies this level]

#### Level 2: Weak
**Overall quality**: [Integrated description of below standard]
- Falls short because: [Key deficiencies]
- Problems include: [Observable issues]

**Example**: [Work that exemplifies this level]

#### Level 1: Poor
**Overall quality**: [Integrated description of unacceptable]
- Major problems: [Significant deficiencies across multiple aspects]

**Example**: [Work that exemplifies this level]

---

## Single-Point Rubric Template

Lists criteria with "meets standard" description only, space to note exceeds/below.

### Rubric for: [Artifact Type]

| Criterion | Concerns (Below Standard) | Meets Standard | Advanced (Exceeds Standard) |
|-----------|---------------------------|----------------|----------------------------|
| **[Criterion 1]** | | [Clear description of standard] | |
| **[Criterion 2]** | | [Clear description of standard] | |
| **[Criterion 3]** | | [Clear description of standard] | |
| **[Criterion 4]** | | [Clear description of standard] | |

**Usage:**
- Check if work meets standard for each criterion
- Note specific strengths in "Advanced" column (e.g., "+Exceptionally clear examples")
- Note specific areas for improvement in "Concerns" column (e.g., "-Missing citations for 3 claims")

---

## Checklist Template

Binary yes/no for must-have requirements.

### Checklist for: [Artifact Type]

#### Category 1: [e.g., Completeness]
- [ ] [Specific requirement 1]
- [ ] [Specific requirement 2]
- [ ] [Specific requirement 3]

#### Category 2: [e.g., Quality]
- [ ] [Specific requirement 4]
- [ ] [Specific requirement 5]

#### Category 3: [e.g., Compliance]
- [ ] [Specific requirement 6]
- [ ] [Specific requirement 7]

**Pass/Fail Criteria:**
- **Pass**: All items checked, OR all items in critical categories checked plus X% of the others
- **Fail**: Any critical item unchecked, OR fewer than Y% of all items checked

---
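
The pass/fail rule in the Checklist Template above can be sketched as a small function. This version is an assumption-laden simplification: it implements only the "all critical items plus X% of others" branch, treats X as the parameter `min_noncritical_pct`, and represents each item as a hypothetical `(checked, is_critical)` tuple. The separate Y% floor on total items is left out because the template does not fix how X and Y interact.

```python
def checklist_result(items, min_noncritical_pct):
    """items: list of (checked, is_critical) tuples. Returns "Pass" or "Fail"."""
    critical = [checked for checked, is_critical in items if is_critical]
    others = [checked for checked, is_critical in items if not is_critical]
    if not all(critical):
        return "Fail"  # any unchecked critical item fails the checklist
    if not others:
        return "Pass"  # only critical items exist and all are checked
    pct_others = 100 * sum(others) / len(others)
    return "Pass" if pct_others >= min_noncritical_pct else "Fail"


items = [
    (True, True),    # critical, checked
    (True, True),    # critical, checked
    (True, False),   # non-critical, checked
    (False, False),  # non-critical, unchecked
    (True, False),   # non-critical, checked
]
print(checklist_result(items, min_noncritical_pct=60))  # Pass (2 of 3 others checked)
print(checklist_result(items, min_noncritical_pct=75))  # Fail
```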

## Weighted Scoring Template

When criteria have different importance.

### Weighted Rubric for: [Artifact Type]

| Criterion | Score (1-5) | Weight | Weighted Score |
|-----------|-------------|--------|----------------|
| [Criterion 1] | | ×3 (Critical) | Score × 3 = |
| [Criterion 2] | | ×2 (Important) | Score × 2 = |
| [Criterion 3] | | ×2 (Important) | Score × 2 = |
| [Criterion 4] | | ×1 (Desirable) | Score × 1 = |
| **Total** | | **8** | **[Sum] / 8 =** |

**Weight categories:**
- ×3 = Critical (must be strong, threshold: ≥4 required)
- ×2 = Important (significant impact on overall quality)
- ×1 = Desirable (nice to have, less critical)

---
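
The weight categories in the Weighted Scoring Template above combine two checks: the overall weighted score and a minimum score on every critical (×3) criterion. The sketch below illustrates why both are needed; the function `evaluate`, the constants, and the sample criteria are hypothetical.

```python
CRITICAL_WEIGHT = 3      # ×3 = Critical
CRITICAL_MIN_SCORE = 4   # threshold: ≥4 required on critical criteria


def evaluate(rows):
    """rows: list of (criterion, score, weight). Returns (average, failed_critical)."""
    total_weight = sum(weight for _, _, weight in rows)
    average = sum(score * weight for _, score, weight in rows) / total_weight
    # A critical criterion fails the gate when its score is below the threshold.
    failed_critical = [
        name for name, score, weight in rows
        if weight == CRITICAL_WEIGHT and score < CRITICAL_MIN_SCORE
    ]
    return average, failed_critical


rows = [
    ("Security", 3, 3),     # critical
    ("Readability", 5, 2),  # important
    ("Tests", 5, 2),        # important
    ("Docs", 4, 1),         # desirable
]
average, failed = evaluate(rows)
print(average)  # 33 / 8 = 4.125
print(failed)   # ['Security']
```

Here the weighted score is high, yet the work still misses the critical threshold, which is the situation the ≥4 rule is meant to catch.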

## Calibration Session Template

**Pre-calibration:**
1. Select 3-5 sample works spanning quality range (low, medium, high)
2. Have each reviewer independently score samples using rubric
3. Record scores without discussion

**During calibration:**
4. Compare scores across reviewers for each sample
5. For discrepancies (>1 point difference):
   - Discuss what each reviewer saw
   - Identify ambiguous descriptors
   - Clarify criterion boundaries
   - Refine rubric language
6. Re-score samples using refined rubric

**Post-calibration:**
7. Calculate inter-rater reliability (% agreement, Kappa)
8. Target: ≥70% agreement (within 1 point) or Kappa ≥0.6
9. If below target: Iterate with more refinement + calibration
10. Document calibration decisions and rubric changes

---
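
The two reliability measures named in the Calibration Session Template above can be computed with a few lines of standard-library Python. This is a sketch under assumptions: it handles exactly two reviewers, uses unweighted Cohen's kappa (the template does not say which kappa variant is intended), and the score lists are made-up sample data.

```python
from collections import Counter


def percent_agreement(a, b, tolerance=1):
    """Percentage of items where the two scores differ by at most `tolerance`."""
    if len(a) != len(b) or not a:
        raise ValueError("score lists must be non-empty and the same length")
    hits = sum(abs(x - y) <= tolerance for x, y in zip(a, b))
    return 100 * hits / len(a)


def cohens_kappa(a, b):
    """Unweighted Cohen's kappa for two raters: (p_o - p_e) / (1 - p_e)."""
    if len(a) != len(b) or not a:
        raise ValueError("score lists must be non-empty and the same length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n  # exact matches only
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: probability both raters pick the same level independently.
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical level throughout
    return (observed - expected) / (1 - expected)


reviewer_a = [5, 4, 3, 2, 1, 3, 4, 2]
reviewer_b = [5, 3, 3, 2, 2, 3, 4, 4]
print(percent_agreement(reviewer_a, reviewer_b))       # 87.5
print(round(cohens_kappa(reviewer_a, reviewer_b), 2))  # 0.51
```

In this sample the reviewers meet the within-1-point agreement target but fall short of Kappa ≥0.6, because unweighted kappa counts only exact matches and corrects for chance. It is worth deciding before calibration which of the two targets governs.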

## Feedback Template

**For: [Evaluatee Name]**

**Overall Score**: [X.X / 5.0 or Level]

**Criterion-by-Criterion Scores:**

| Criterion | Score | Feedback |
|-----------|-------|----------|
| [Criterion 1] | X/5 | **Strengths**: [What was done well]<br>**Areas for improvement**: [Specific suggestions] |
| [Criterion 2] | X/5 | **Strengths**: [What was done well]<br>**Areas for improvement**: [Specific suggestions] |
| [Criterion 3] | X/5 | **Strengths**: [What was done well]<br>**Areas for improvement**: [Specific suggestions] |

**Summary:**
- **Greatest strengths**: [2-3 specific strengths]
- **Priority improvements**: [2-3 most important areas to address]
- **Next steps**: [Actionable recommendations]

**Overall assessment**: [Pass/Fail or qualitative judgment]