# Evaluation Rubrics Templates
Quick-start templates for purpose definition, criteria selection, scale design, descriptor writing, and rubric formats.
## Workflow
```
Rubric Development Progress:
- [ ] Step 1: Define purpose and scope
- [ ] Step 2: Identify evaluation criteria
- [ ] Step 3: Design the scale
- [ ] Step 4: Write performance descriptors
- [ ] Step 5: Test and calibrate
- [ ] Step 6: Use and iterate
```
**Step 1: Define purpose and scope**
Use [Purpose Definition Template](#purpose-definition-template) to clarify evaluation context and constraints.
**Step 2: Identify evaluation criteria**
Brainstorm and prioritize quality dimensions using [Criteria Identification Template](#criteria-identification-template).
**Step 3: Design the scale**
Select scale type and levels using [Scale Selection Template](#scale-selection-template).
**Step 4: Write performance descriptors**
Write clear, observable descriptors using [Descriptor Writing Template](#descriptor-writing-template).
**Step 5: Test and calibrate**
Conduct inter-rater reliability testing and refine rubric.
**Step 6: Use and iterate**
Apply rubric, collect feedback, revise as needed.
---
## Purpose Definition Template
**What are we evaluating?**
- Artifact type: [e.g., code pull requests, research proposals, design mockups, student essays]
- Specific context: [e.g., internal code review, grant competition, course assignment]
**Who will evaluate?**
- Number of evaluators: [Single reviewer or multiple?]
- Evaluator expertise: [Subject matter experts, peers, instructors, automated systems]
- Evaluator availability: [Time per evaluation? Total volume?]
**Who are the evaluatees?**
- Audience: [Students, employees, vendors, applicants]
- Skill level: [Novice, intermediate, expert]
- Will they see the rubric before evaluation? [Yes/No - if yes, the rubric also serves as a guide]
**What decisions depend on scores?**
- High stakes: [Pass/fail, hiring, funding, promotion, grades]
- Medium stakes: [Feedback for improvement, prioritization, awards]
- Low stakes: [Self-assessment, informal feedback]
**Success criteria for rubric:**
- [ ] Enables consistent scoring across evaluators (inter-rater reliability ≥70%)
- [ ] Provides actionable feedback for improvement
- [ ] Takes reasonable time to use (target: X minutes per evaluation)
- [ ] Acceptable to evaluators (not overly complex or rigid)
- [ ] Acceptable to evaluatees (perceived as fair and transparent)
---
## Criteria Identification Template
### Brainstorming Quality Dimensions
**Product criteria** (artifact itself):
- Correctness/Accuracy: [Is it right? Factually accurate? Meets requirements?]
- Completeness: [Covers all necessary elements? No major gaps?]
- Clarity: [Easy to understand? Well-organized? Clear communication?]
- Quality/Craftsmanship: [Attention to detail? Polished? Professional?]
- Originality/Creativity: [Novel approach? Innovative? Goes beyond expected?]
- Performance: [Fast? Efficient? Scalable? Meets technical specs?]
**Process criteria** (how it was made):
- Methodology: [Followed appropriate process? Research methods sound?]
- Collaboration: [Teamwork? Communication? Used feedback?]
- Iteration: [Multiple drafts? Refinement? Responsiveness to critique?]
- Time management: [Completed on time? Paced work appropriately?]
**Impact criteria** (effects/outcomes):
- Usability: [User-friendly? Accessible? Intuitive?]
- Value: [Solves problem? Addresses need? Business impact?]
- Learning demonstrated: [Shows understanding? Growth from previous work?]
**Meta criteria** (fitness for future work):
- Maintainability: [Can others work with this? Documented? Modular?]
- Testability: [Can be verified? Validated? Measured?]
- Extensibility: [Can be built upon? Flexible? Adaptable?]
### Prioritization
**Rate each candidate criterion:**
| Criterion | Importance (H/M/L) | Observable (Y/N) | Distinct from others (Y/N) | Include? |
|-----------|-------------------|------------------|---------------------------|----------|
| [Criterion 1] | | | | |
| [Criterion 2] | | | | |
| [Criterion 3] | | | | |
**Selection rules:**
- Must be High or Medium importance
- Must be Observable (can two reviewers score consistently?)
- Must be Distinct (not overlapping with other criteria)
- Aim for 4-8 criteria (balance coverage vs. simplicity)
**Final criteria** (4-8 selected):
1. [Criterion]: [Brief definition]
2. [Criterion]: [Brief definition]
3. [Criterion]: [Brief definition]
4. [Criterion]: [Brief definition]
---
## Scale Selection Template
**Scale type options:**
### Numeric Scales
**1-3 scale** (Low/Medium/High)
- Use when: Quick categorization, clear tiers sufficient
- Levels: 1=Below standard, 2=Meets standard, 3=Exceeds standard
**1-4 scale** (Forced choice, no middle)
- Use when: You want to avoid central tendency bias and need clear differentiation
- Levels: 1=Poor, 2=Fair, 3=Good, 4=Excellent
**1-5 scale** (Most common, allows a neutral midpoint)
- Use when: General purpose, familiar to evaluators
- Levels: 1=Poor, 2=Fair, 3=Adequate, 4=Good, 5=Excellent
**1-10 scale** (Fine gradations)
- Use when: Large sample, need statistical analysis, can distinguish subtle differences
- Levels: 1-2=Poor, 3-4=Fair, 5-6=Adequate, 7-8=Good, 9-10=Excellent
### Qualitative Scales
**Developmental**: Novice → Developing → Proficient → Expert
**Standards-based**: Below Standard → Approaching → Meets → Exceeds
**Competency**: Not Yet Competent → Partially Competent → Competent → Highly Competent
### Binary
**Pass/Fail, Yes/No, Present/Absent**
- Use when: Compliance checks, minimum thresholds, clear criteria
**Selected scale for this rubric**: [Choose one]
- **Type**: [Numeric 1-5, Qualitative, etc.]
- **Levels**: [List with labels]
- **Rationale**: [Why this scale fits purpose]
---
## Descriptor Writing Template
For each criterion, write descriptors at each scale level.
### Criterion: [Name]
**Definition**: [What does this criterion assess? 1-2 sentences]
**Why it matters**: [Importance to overall quality]
**Scale descriptors:**
#### Level 5 (or highest): [Label]
**Observable characteristics**:
- [Concrete, observable feature 1]
- [Concrete, observable feature 2]
- [Concrete, observable feature 3]
**Example**: [Specific instance of work at this level]
#### Level 4: [Label]
**Observable characteristics**:
- [How this differs from Level 5 - what's missing or less strong]
- [Concrete observable feature]
**Example**: [Specific instance]
#### Level 3: [Label] (Baseline/Adequate)
**Observable characteristics**:
- [Minimum acceptable performance]
- [Observable feature]
**Example**: [Specific instance]
#### Level 2: [Label]
**Observable characteristics**:
- [What's lacking compared to Level 3]
- [Observable deficiency]
**Example**: [Specific instance]
#### Level 1 (or lowest): [Label]
**Observable characteristics**:
- [Significant deficiencies]
- [Observable problems]
**Example**: [Specific instance]
---
### Descriptor Writing Guidelines
**DO:**
- Use observable, measurable language ("Contains 3+ bugs" not "poor quality")
- Provide concrete examples or anchors for each level
- Focus on what IS present at each level, not only on what falls short of the level above
- Use parallel structure across levels (same aspects addressed at each level)
- Specify quantities when possible ("All 5 requirements met" vs "Most requirements met")
**DON'T:**
- Use subjective terms without definition ("creative", "professional", "excellent effort")
- Rely on comparative language only ("better than", "more sophisticated")
- Make assumptions about process ("spent time", "worked hard" - unless observable)
- Penalize for things not mentioned in descriptor (hidden expectations)
---
## Analytic Rubric Template
Most common format: Multiple criteria (rows) × Multiple levels (columns)
### Rubric for: [Artifact Type]
**Purpose**: [Brief description]
**Scale**: [1-5, 1-4, etc. with labels]
| Criterion | 1 | 2 | 3 | 4 | 5 | Weight |
|-----------|---|---|---|---|---|--------|
| **[Criterion 1]** | [Descriptor] | [Descriptor] | [Descriptor] | [Descriptor] | [Descriptor] | [×N or %] |
| **[Criterion 2]** | [Descriptor] | [Descriptor] | [Descriptor] | [Descriptor] | [Descriptor] | [×N or %] |
| **[Criterion 3]** | [Descriptor] | [Descriptor] | [Descriptor] | [Descriptor] | [Descriptor] | [×N or %] |
| **[Criterion 4]** | [Descriptor] | [Descriptor] | [Descriptor] | [Descriptor] | [Descriptor] | [×N or %] |
**Scoring:**
- Calculate: [(Score1 × Weight1) + (Score2 × Weight2) + ...] ÷ (sum of weights) - see the sketch below
- Threshold: [e.g., Must average ≥3.0 to pass, ≥4 on critical criteria]
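A minimal sketch of this weighted-average calculation, assuming numeric ×N weights and 1-5 scores; the criterion names, weights, and scores are illustrative placeholders, not values prescribed by the template:

```python
def weighted_average(scores: dict[str, int], weights: dict[str, int]) -> float:
    """Weighted average: sum(score * weight) / sum of weights."""
    total = sum(scores[c] * weights[c] for c in scores)
    return total / sum(weights[c] for c in scores)

# Placeholder criteria: four criteria whose weights sum to 8
weights = {"Correctness": 3, "Clarity": 2, "Completeness": 2, "Originality": 1}
scores = {"Correctness": 4, "Clarity": 5, "Completeness": 3, "Originality": 4}

overall = weighted_average(scores, weights)
print(f"Overall: {overall:.2f} / 5.0")        # (12 + 10 + 6 + 4) / 8 = 4.00
print("Pass" if overall >= 3.0 else "Fail")   # example >=3.0 threshold
```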
**Usage notes:**
- Score each criterion independently before looking at others (avoid halo effect)
- Provide brief justification for each score
- Flag areas for improvement in feedback
---
## Holistic Rubric Template
Single overall score integrating multiple criteria.
### Rubric for: [Artifact Type]
**Purpose**: [Brief description]
#### Level 5: Excellent
**Overall quality**: [Integrated description touching all important aspects]
- Criterion A: [How it manifests at this level]
- Criterion B: [How it manifests at this level]
- Criterion C: [How it manifests at this level]
**Example**: [Work that exemplifies this level]
#### Level 4: Good
**Overall quality**: [Integrated description]
- Differences from Level 5: [What's less strong]
- Key characteristics: [Observable features]
**Example**: [Work that exemplifies this level]
#### Level 3: Adequate
**Overall quality**: [Integrated description of baseline acceptable]
- Meets minimum standards: [What's required]
- May have: [Acceptable weaknesses]
**Example**: [Work that exemplifies this level]
#### Level 2: Weak
**Overall quality**: [Integrated description of below standard]
- Falls short because: [Key deficiencies]
- Problems include: [Observable issues]
**Example**: [Work that exemplifies this level]
#### Level 1: Poor
**Overall quality**: [Integrated description of unacceptable]
- Major problems: [Significant deficiencies across multiple aspects]
**Example**: [Work that exemplifies this level]
---
## Single-Point Rubric Template
Lists each criterion with a "meets standard" description only, leaving space to note where work exceeds or falls below it.
### Rubric for: [Artifact Type]
| Criterion | Concerns (Below Standard) | Meets Standard | Advanced (Exceeds Standard) |
|-----------|---------------------------|----------------|----------------------------|
| **[Criterion 1]** | | [Clear description of standard] | |
| **[Criterion 2]** | | [Clear description of standard] | |
| **[Criterion 3]** | | [Clear description of standard] | |
| **[Criterion 4]** | | [Clear description of standard] | |
**Usage:**
- Check if work meets standard for each criterion
- Note specific strengths in "Advanced" column (e.g., "+Exceptionally clear examples")
- Note specific areas for improvement in "Concerns" column (e.g., "-Missing citations for 3 claims")
---
## Checklist Template
Binary yes/no for must-have requirements.
### Checklist for: [Artifact Type]
#### Category 1: [e.g., Completeness]
- [ ] [Specific requirement 1]
- [ ] [Specific requirement 2]
- [ ] [Specific requirement 3]
#### Category 2: [e.g., Quality]
- [ ] [Specific requirement 4]
- [ ] [Specific requirement 5]
#### Category 3: [e.g., Compliance]
- [ ] [Specific requirement 6]
- [ ] [Specific requirement 7]
**Pass/Fail Criteria:**
- **Pass**: All items checked, OR all items in critical categories plus ≥X% of the others
- **Fail**: Any critical item unchecked, OR <Y% of total items checked
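One way to encode this pass/fail rule, assuming a hypothetical 80% threshold in place of X and treating any unchecked critical item as an automatic fail; substitute your own X/Y values:

```python
def checklist_passes(critical: list[bool], others: list[bool],
                     others_threshold: float = 0.80) -> bool:
    """Pass = all critical items checked AND enough of the other items checked."""
    if not all(critical):   # any unchecked critical item fails outright
        return False
    if not others:          # no non-critical items: the critical items decide alone
        return True
    return sum(others) / len(others) >= others_threshold

# Example: 3/3 critical items met, 4/5 others met (80%) -> passes
print(checklist_passes([True, True, True], [True, True, True, True, False]))
```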
---
## Weighted Scoring Template
When criteria have different importance.
### Weighted Rubric for: [Artifact Type]
| Criterion | Score (1-5) | Weight | Weighted Score |
|-----------|-------------|--------|----------------|
| [Criterion 1] | | ×3 (Critical) | Score × 3 = |
| [Criterion 2] | | ×2 (Important) | Score × 2 = |
| [Criterion 3] | | ×2 (Important) | Score × 2 = |
| [Criterion 4] | | ×1 (Desirable) | Score × 1 = |
| **Total** | | **8** | **[Sum] / 8 =** |
**Weight categories:**
- ×3 = Critical (must be strong, threshold: ≥4 required)
- ×2 = Important (significant impact on overall quality)
- ×1 = Desirable (nice to have, less critical)
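A sketch that adds the critical-criterion threshold to the weighted total, following the weight categories above; the rows mirror the table's placeholder criteria, and the scores are invented for illustration:

```python
# Minimum score required per weight category (x3 Critical needs >=4; others have none)
MIN_SCORE = {3: 4, 2: None, 1: None}

def score_rubric(rows: list[tuple[str, int, int]]) -> tuple[float, bool]:
    """rows = (criterion, score 1-5, weight). Returns (weighted avg, thresholds met)."""
    avg = sum(s * w for _, s, w in rows) / sum(w for _, _, w in rows)
    ok = all(MIN_SCORE[w] is None or s >= MIN_SCORE[w] for _, s, w in rows)
    return avg, ok

rows = [("Criterion 1", 4, 3), ("Criterion 2", 3, 2),
        ("Criterion 3", 5, 2), ("Criterion 4", 2, 1)]
avg, ok = score_rubric(rows)
print(f"Weighted average: {avg:.2f} / 5.0")   # (12 + 6 + 10 + 2) / 8 = 3.75
print(f"Critical thresholds met: {ok}")       # Criterion 1 scored 4 >= 4 -> True
```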
---
## Calibration Session Template
**Pre-calibration:**
1. Select 3-5 sample works spanning the quality range (low, medium, high)
2. Have each reviewer independently score samples using rubric
3. Record scores without discussion
**During calibration:**
4. Compare scores across reviewers for each sample
5. For discrepancies (>1 point difference):
- Discuss what each reviewer saw
- Identify ambiguous descriptors
- Clarify criterion boundaries
- Refine rubric language
6. Re-score samples using refined rubric
**Post-calibration:**
7. Calculate inter-rater reliability (% agreement, Cohen's kappa - see the sketch after this list)
8. Target: ≥70% agreement (within 1 point) or Kappa ≥0.6
9. If below target: Iterate with more refinement + calibration
10. Document calibration decisions and rubric changes
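A minimal sketch of step 7 for two raters on a numeric scale: within-1-point agreement plus Cohen's kappa for exact agreement, both computed from scratch. The sample scores are hypothetical; for weighted kappa or more than two raters, reach for a statistics library instead.

```python
from collections import Counter

def agreement_within_one(a: list[int], b: list[int]) -> float:
    """Fraction of samples where the two raters differ by at most 1 point."""
    return sum(abs(x - y) <= 1 for x, y in zip(a, b)) / len(a)

def cohens_kappa(a: list[int], b: list[int]) -> float:
    """Cohen's kappa for exact agreement: (observed - chance) / (1 - chance)."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    chance = sum((ca[k] / n) * (cb[k] / n) for k in ca.keys() | cb.keys())
    return (observed - chance) / (1 - chance)

# Hypothetical scores from two reviewers on five calibration samples (1-5 scale)
r1 = [2, 3, 4, 5, 3]
r2 = [2, 4, 4, 5, 2]
print(f"Within-1 agreement: {agreement_within_one(r1, r2):.0%}")  # 100% -> meets target
print(f"Cohen's kappa: {cohens_kappa(r1, r2):.2f}")               # 0.50 -> below 0.6
```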
---
## Feedback Template
**For: [Evaluatee Name]**
**Overall Score**: [X.X / 5.0 or Level]
**Criterion-by-Criterion Scores:**
| Criterion | Score | Feedback |
|-----------|-------|----------|
| [Criterion 1] | X/5 | **Strengths**: [What was done well]<br>**Areas for improvement**: [Specific suggestions] |
| [Criterion 2] | X/5 | **Strengths**: [What was done well]<br>**Areas for improvement**: [Specific suggestions] |
| [Criterion 3] | X/5 | **Strengths**: [What was done well]<br>**Areas for improvement**: [Specific suggestions] |
**Summary:**
- **Greatest strengths**: [2-3 specific strengths]
- **Priority improvements**: [2-3 most important areas to address]
- **Next steps**: [Actionable recommendations]
**Overall assessment**: [Pass/Fail or qualitative judgment]