# ScholarEval Evaluation Framework
## Overview
This document provides detailed evaluation criteria, rubrics, and quality indicators for each dimension of the ScholarEval framework. Use these standards when conducting systematic evaluations of scholarly work.

---
## Dimension 1: Problem Formulation & Research Questions
### Quality Indicators
**Excellent (5):**
- Research question is specific, measurable, and clearly articulated
- Problem addresses significant gap in literature with high impact potential
- Scope is appropriate and feasible within constraints
- Novel contribution is clearly differentiated from existing work
- Theoretical or practical significance is compellingly justified
**Good (4):**
- Research question is clear with minor ambiguities
- Problem is relevant with moderate impact potential
- Scope is generally appropriate with minor feasibility concerns
- Contribution is identifiable though not groundbreaking
- Significance is adequately justified
**Adequate (3):**
- Research question is present but lacks specificity
- Problem relevance is unclear, or the advance is only incremental
- Scope may be too broad or narrow
- Contribution is unclear or overlaps heavily with existing work
- Significance justification is weak
**Needs Improvement (2):**
- Research question is vague or poorly defined
- Problem lacks clear relevance or significance
- Scope is inappropriate or infeasible
- Contribution is not articulated
- No clear justification for significance
**Poor (1):**
- No clear research question
- Problem is trivial or irrelevant
- Scope is fundamentally flawed
- No identifiable contribution
- No significance justification
### Assessment Checklist
- [ ] Is the research question clearly stated?
- [ ] Can the question be answered with the proposed approach?
- [ ] Is the problem significant to the field?
- [ ] Is the scope feasible within resource constraints?
- [ ] Is the novelty/contribution clearly articulated?
- [ ] Are key assumptions explicitly stated?
- [ ] Are success criteria or expected outcomes defined?
---
## Dimension 2: Literature Review
### Quality Indicators
**Excellent (5):**
- Comprehensive coverage of relevant literature across key areas
- Critical synthesis identifying patterns, contradictions, and gaps
- Literature is current (majority from last 3-5 years for rapidly evolving fields)
- Sources are authoritative and peer-reviewed
- Clear positioning of current work within scholarly conversation
- Identifies genuine research gaps that the work addresses
**Good (4):**
- Good coverage with minor gaps in key areas
- Mostly synthesis with some description
- Literature is mostly current with some older foundational works
- Sources are generally authoritative
- Work positioning is present but could be stronger
- Research gaps are identified but may not be critical
**Adequate (3):**
- Partial coverage with notable gaps
- More descriptive summarization than synthesis
- Literature is a mix of current and dated sources
- Mix of authoritative and less rigorous sources
- Weak positioning within existing literature
- Research gaps are vague or questionable
**Needs Improvement (2):**
- Minimal coverage with major gaps
- Purely descriptive without synthesis
- Literature is largely outdated
- Sources lack authority or rigor
- Little to no positioning of current work
- No clear research gaps identified
**Poor (1):**
- Inadequate or absent literature review
- No synthesis
- Outdated or inappropriate sources
- No engagement with scholarly conversation
- No gap identification
### Assessment Checklist
- [ ] Does the review cover all major relevant areas?
- [ ] Is literature synthesized rather than just summarized?
- [ ] Are sources current and authoritative?
- [ ] Are contrasting viewpoints presented?
- [ ] Are research gaps clearly identified?
- [ ] Is the current work positioned within existing literature?
- [ ] Is citation balance appropriate (not over-relying on few authors)?
- [ ] Are seminal/foundational works included?
### Common Issues
- **Insufficient coverage**: Missing key papers or research streams
- **Descriptive listing**: Summarizing papers sequentially without synthesis
- **Outdated sources**: Relying mainly on literature more than 5-10 years old when more recent work exists (foundational works excepted)
- **Cherry-picking**: Only citing work that supports hypothesis
- **Poor organization**: Lack of thematic or conceptual structure
- **Weak gap identification**: Gaps are trivial or not actually gaps
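The currency criterion and the "Outdated sources" issue above can be checked roughly by counting how many references fall inside a recent window. The sketch below is illustrative only: the function name `recent_share` and the default 5-year window are assumptions, not part of the framework, and the right window depends on the field.

```python
def recent_share(publication_years, current_year, window=5):
    """Fraction of references published within the last `window` years.

    `window` defaults to 5 as an assumption; the framework suggests
    3-5 years for rapidly evolving fields.
    """
    if not publication_years:
        raise ValueError("no references supplied")
    cutoff = current_year - window
    recent = sum(1 for year in publication_years if year >= cutoff)
    return recent / len(publication_years)


# Hypothetical reference list: three of five sources are from 2020 or later.
share = recent_share([2024, 2023, 2015, 2021, 1998], current_year=2025)
print(share)  # 0.6
```

A share above 0.5 matches the "majority" wording of the Excellent level, but older foundational works should not be counted against a review.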
---
## Dimension 3: Methodology & Research Design
### Quality Indicators
**Excellent (5):**
- Research design perfectly aligned with research questions
- Methods are rigorous, valid, and reliable
- Procedures are detailed enough for replication
- Controls, randomization, or triangulation are used appropriately
- Potential biases acknowledged and mitigated
- Ethical considerations addressed comprehensively
- Limitations are explicitly discussed
**Good (4):**
- Design is appropriate with minor alignment issues
- Methods are sound with small validity concerns
- Procedures are mostly replicable
- Some controls or validation present
- Major biases addressed
- Ethical considerations mentioned
- Some limitations discussed
**Adequate (3):**
- Design partially appropriate for questions
- Methods have notable validity concerns
- Procedures lack detail for full replication
- Limited controls or validation
- Bias mitigation is minimal
- Ethics addressed superficially
- Limitations minimally discussed
**Needs Improvement (2):**
- Design poorly aligned with research questions
- Methods have serious validity issues
- Procedures too vague to replicate
- No controls or validation
- Biases not addressed
- Ethical concerns not addressed
- No limitation discussion
**Poor (1):**
- Inappropriate or absent methodology
- Methods fundamentally flawed
- Not replicable
- No validity considerations
- No ethical considerations
- No acknowledgment of limitations
### Assessment Checklist
- [ ] Is methodology appropriate for research questions?
- [ ] Are procedures described in sufficient detail?
- [ ] Can the study be replicated from the description?
- [ ] Are validity and reliability addressed?
- [ ] Are potential biases identified and mitigated?
- [ ] Are ethical considerations discussed?
- [ ] Are limitations acknowledged?
- [ ] Is sample size justified (for quantitative work)?
- [ ] Are qualitative methods rigorous (if applicable)?
### Design-Specific Considerations
**Quantitative Studies:**
- Sample size with power analysis
- Control groups and randomization
- Measurement validity and reliability
- Statistical assumptions checking
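When checking whether a sample size is justified by a power analysis, a quick approximation can help an evaluator sanity-check the reported number. The sketch below assumes a two-sided, two-group comparison of means and uses the normal approximation; the function name `n_per_group` is hypothetical, and an exact t-test calculation gives slightly larger values.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sample comparison.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2,
    where d is the standardized effect size (Cohen's d).
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


print(n_per_group(0.5))  # 63 per group for a medium effect
```

If a study reports far fewer participants than this for its stated effect size, the justification deserves a closer look.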
**Qualitative Studies:**
- Sampling strategy and saturation
- Data collection procedures
- Coding and analysis framework
- Trustworthiness criteria (credibility, transferability, dependability, confirmability)
**Mixed Methods:**
- Integration rationale
- Sequencing justification
- Data convergence strategy
---
## Dimension 4: Data Collection & Sources
### Quality Indicators
**Excellent (5):**
- Data sources are highly credible and appropriate
- Sample size is sufficient and well-justified
- Data collection procedures are rigorous and systematic
- Data quality controls are in place
- Sampling strategy ensures representativeness
- Missing data is minimal and handled appropriately
**Good (4):**
- Data sources are credible with minor concerns
- Sample size is adequate
- Collection procedures are systematic
- Some quality controls present
- Sampling is reasonable
- Missing data is addressed
**Adequate (3):**
- Data sources are acceptable but not optimal
- Sample size is marginal
- Collection procedures lack some rigor
- Limited quality controls
- Sampling may have bias concerns
- Missing data handling is basic
**Needs Improvement (2):**
- Data sources have credibility issues
- Sample size is insufficient
- Collection procedures are ad hoc
- No quality controls
- Sampling is clearly biased
- Missing data not addressed
**Poor (1):**
- Data sources are inappropriate or unreliable
- Sample size is inadequate
- Collection is unsystematic
- No quality considerations
- Sampling is fundamentally flawed
- Excessive missing data
### Assessment Checklist
- [ ] Are data sources credible and appropriate?
- [ ] Is sample size sufficient for conclusions?
- [ ] Is sampling strategy clearly described?
- [ ] Is the sample representative of target population?
- [ ] Are data collection procedures systematic?
- [ ] Are data quality controls described?
- [ ] Is missing data addressed?
- [ ] Are any potential data biases discussed?
---
## Dimension 5: Analysis & Interpretation
### Quality Indicators
**Excellent (5):**
- Analytical methods perfectly suited to data and questions
- Analysis is rigorous with appropriate techniques
- Results interpretation is logical and well-supported
- Alternative explanations are considered
- Claims are proportionate to evidence
- Assumptions are validated
- Analysis is transparent and reproducible
**Good (4):**
- Methods are appropriate with minor issues
- Analysis is sound
- Interpretation is mostly logical
- Some alternatives considered
- Claims generally match evidence
- Key assumptions checked
- Analysis is mostly transparent
**Adequate (3):**
- Methods are acceptable but not optimal
- Analysis has some technical issues
- Interpretation has logical gaps
- Alternatives not thoroughly explored
- Some claims exceed evidence
- Assumptions not fully validated
- Analysis transparency is limited
**Needs Improvement (2):**
- Methods are questionable for data/questions
- Analysis has significant technical flaws
- Interpretation is poorly supported
- No alternative explanations
- Claims significantly exceed evidence
- Assumptions not checked
- Analysis is not transparent
**Poor (1):**
- Methods are inappropriate
- Analysis is fundamentally flawed
- Interpretation is illogical
- No consideration of alternatives
- Claims unsupported by evidence
- No assumption validation
- Analysis is opaque
### Assessment Checklist
- [ ] Are analytical methods appropriate?
- [ ] Are statistical tests/qualitative methods properly applied?
- [ ] Are assumptions tested?
- [ ] Is interpretation logical and well-supported?
- [ ] Are alternative explanations considered?
- [ ] Do claims align with evidence strength?
- [ ] Is analysis reproducible from description?
- [ ] Are uncertainties acknowledged?
### Quantitative Analysis
- Appropriate statistical tests
- Assumptions checked (normality, homogeneity of variance, etc.)
- Effect sizes reported
- Confidence intervals provided
- Multiple testing corrections (if applicable)
- Model diagnostics performed
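As an illustration of the multiple testing item above, the following sketch applies the Holm-Bonferroni step-down procedure, one common correction among several. The framework does not prescribe a particular method, and the function name `holm_reject` is an assumption.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm-Bonferroni step-down procedure.

    Returns a list of booleans, in the original order, marking which
    hypotheses are rejected while controlling the family-wise error rate.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # The k-th smallest p-value is compared against alpha / (m - k).
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject


print(holm_reject([0.01, 0.04, 0.03, 0.005]))
# [True, False, False, True]
```

Here 0.03 and 0.04 would pass an uncorrected 0.05 threshold but do not survive the correction, which is the kind of discrepancy worth flagging when a paper reports many tests.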
### Qualitative Analysis
- Coding framework is clear
- Inter-rater reliability (if applicable)
- Saturation discussed
- Negative cases examined
- Member checking or validation
- Clear audit trail
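Where inter-rater reliability is reported, Cohen's kappa is one widely used statistic for two coders assigning categorical codes. The sketch below is a minimal implementation for checking a reported value; the function name and the handling of the degenerate single-code case are assumptions of this example.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters coding the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must code the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same code.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal code frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical code throughout
    return (observed - expected) / (1 - expected)


coder_1 = ["x", "x", "y", "y"]
coder_2 = ["x", "y", "y", "y"]
print(cohens_kappa(coder_1, coder_2))  # 0.5
```

Raw percent agreement here is 75%, but kappa is 0.5 once chance agreement is removed, which is why agreement percentages alone are weak evidence of coding rigor.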
---
## Dimension 6: Results & Findings
### Quality Indicators
**Excellent (5):**
- Results are clearly and comprehensively presented
- Visualizations are effective and appropriate
- Statistical or qualitative rigor is evident
- Key findings are highlighted effectively
- Results directly address research questions
- Patterns and relationships are clearly shown
- Negative and null results are reported
**Good (4):**
- Results are clear with minor presentation issues
- Visualizations are generally effective
- Rigor is present
- Main findings are identifiable
- Results mostly address questions
- Patterns are shown
- Some negative results included
**Adequate (3):**
- Results presentation is adequate but could be clearer
- Visualizations are basic or have issues
- Rigor is questionable in places
- Findings are present but not emphasized
- Partial alignment with questions
- Patterns are unclear
- Negative results may be omitted
**Needs Improvement (2):**
- Results presentation is unclear or confusing
- Visualizations are poor or misleading
- Lack of rigor
- Findings are difficult to identify
- Weak alignment with questions
- No clear patterns
- Only positive results shown
**Poor (1):**
- Results are poorly presented or absent
- Visualizations are inappropriate or missing
- No evidence of rigor
- Findings are unclear
- Results don't address questions
- No identifiable patterns
- Results appear selective
### Assessment Checklist
- [ ] Are results clearly presented?
- [ ] Do results directly address research questions?
- [ ] Are visualizations appropriate and effective?
- [ ] Are key findings highlighted?
- [ ] Are negative/null results reported?
- [ ] Are appropriate statistics reported with suitable precision (p-values, confidence intervals, effect sizes)?
- [ ] Are qualitative findings supported by data excerpts?
- [ ] Is the reporting complete, with no signs of selective reporting?
### Presentation Quality
**Tables:**
- Clear labels and captions
- Appropriate precision
- Organized logically
- Not overly complex
**Figures:**
- Clear axes and legends
- Appropriate chart type
- Professional appearance
- Accessible (color-blind friendly)
**Text:**
- Highlights key findings
- Avoids redundancy with tables/figures
- Uses appropriate statistical language
---
## Dimension 7: Scholarly Writing & Presentation
### Quality Indicators
**Excellent (5):**
- Writing is clear, concise, and precise
- Organization is logical with excellent flow
- Academic tone is appropriate and consistent
- Grammar and mechanics are flawless
- Technical terms are used correctly
- Accessible to target audience
- Abstract/summary is comprehensive and accurate
**Good (4):**
- Writing is clear with minor awkwardness
- Organization is logical with good flow
- Tone is mostly appropriate
- Few grammar/mechanical errors
- Technical terms mostly correct
- Generally accessible
- Abstract is adequate
**Adequate (3):**
- Writing is understandable but has clarity issues
- Organization has some logical gaps
- Tone inconsistencies
- Noticeable grammar/mechanical errors
- Some technical term misuse
- Accessibility issues for target audience
- Abstract is incomplete or vague
**Needs Improvement (2):**
- Writing is often unclear or verbose
- Poor organization and flow
- Tone is often inappropriate
- Frequent grammar/mechanical errors
- Technical terminology problems
- Not accessible to target audience
- Abstract is poor or missing
**Poor (1):**
- Writing is unclear and difficult to follow
- No clear organization
- Tone is inappropriate
- Pervasive grammar/mechanical errors
- Incorrect technical terminology
- Inaccessible
- No adequate abstract
### Assessment Checklist
- [ ] Is writing clear and concise?
- [ ] Is organization logical?
- [ ] Is tone appropriate for academic writing?
- [ ] Are grammar and mechanics correct?
- [ ] Are technical terms used appropriately?
- [ ] Is jargon explained when necessary?
- [ ] Does the abstract accurately summarize the work?
- [ ] Are transitions between sections smooth?
- [ ] Is the target audience clear?
### Common Writing Issues
- **Wordiness**: Unnecessarily complex or lengthy prose
- **Passive voice overuse**: Reduces clarity and directness
- **Paragraph structure**: Lack of topic sentences or coherence
- **Redundancy**: Repeating information unnecessarily
- **Logical flow**: Poor transitions between ideas
- **Precision**: Vague or ambiguous language
- **Accessibility**: Too technical or not technical enough
---
## Dimension 8: Citations & References
### Quality Indicators
**Excellent (5):**
- All claims are appropriately cited
- Sources are authoritative and current
- Citations are accurate and complete
- Diverse perspectives are represented
- Citation format is consistent and correct
- Appropriate balance between self-citation and citation of others' work
- Primary sources used appropriately
**Good (4):**
- Most claims are cited
- Sources are generally authoritative
- Few citation errors
- Reasonable diversity of sources
- Format is mostly consistent
- Citation balance is good
- Mix of primary and secondary sources
**Adequate (3):**
- Some claims lack citations
- Source quality is mixed
- Several citation errors
- Limited source diversity
- Format inconsistencies
- Citation balance issues
- Over-reliance on secondary sources
**Needs Improvement (2):**
- Many claims uncited
- Sources are questionable
- Numerous citation errors
- Narrow source base
- Format is inconsistent
- Excessive self-citation or narrow citing
- Inappropriate sources (e.g., only secondary)
**Poor (1):**
- Inadequate citations
- Unreliable sources
- Pervasive citation errors
- Minimal source diversity
- No consistent format
- Severe citation imbalance
- Inappropriate source types
### Assessment Checklist
- [ ] Are all factual claims cited?
- [ ] Are primary sources cited where appropriate?
- [ ] Are sources authoritative and peer-reviewed?
- [ ] Is there balance in perspectives cited?
- [ ] Are citations accurate (authors, dates, pages)?
- [ ] Is citation format consistent?
- [ ] Are self-citations appropriate (typically <20%)?
- [ ] Are sources current (for time-sensitive topics)?
- [ ] Are classic/seminal works included where relevant?
### Citation Quality Assessment
**Source Types (in order of preference for most academic work):**
1. Peer-reviewed journal articles
2. Academic books from reputable publishers
3. Conference proceedings (field-dependent)
4. Technical reports from reputable institutions
5. Dissertations/theses
6. Preprints (with caution, field-dependent)
7. Grey literature (limited use)
8. Websites (rarely appropriate, except for factual data)
**Red Flags:**
- Wikipedia as a primary source
- Excessive self-citation (>30%)
- Only citing papers that support hypothesis
- Outdated sources when current ones exist
- Missing key papers in the field
- Citing abstracts only when full papers are available
- Inconsistent or incorrect citation format
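The self-citation guidance in this section (typically under 20%, with more than 30% treated as a red flag) can be applied with a simple count. The sketch below is a rough aid, not part of the framework: the function names are hypothetical, the "borderline" label for the 20-30% range is an assumption, and matching authors by exact name string will miss variant spellings.

```python
def self_citation_share(reference_authors, work_authors):
    """Fraction of references that share at least one author with the work.

    `reference_authors` is a list of author-name lists, one per reference.
    """
    if not reference_authors:
        raise ValueError("no references supplied")
    own = {name.strip().lower() for name in work_authors}
    hits = sum(
        1
        for authors in reference_authors
        if own & {name.strip().lower() for name in authors}
    )
    return hits / len(reference_authors)


def self_citation_flag(share):
    """Map a share to the thresholds stated in this section."""
    if share > 0.30:
        return "red flag"
    if share < 0.20:
        return "typical"
    return "borderline"  # assumed label for the unstated 20-30% range


refs = [["A. Smith", "B. Lee"], ["C. Wong"], ["a. smith"], ["D. Patel"], ["E. Kim"]]
share = self_citation_share(refs, work_authors=["A. Smith"])
print(share, self_citation_flag(share))  # 0.4 red flag
```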
---
## Cross-Cutting Considerations
### Reproducibility
Assess across dimensions:
- Are methods detailed enough to replicate?
- Are data and code available (or availability explained)?
- Are analysis steps transparent?
- Are materials/instruments specified?
### Ethics
Consider:
- IRB approval (for human subjects)
- Informed consent
- Privacy and confidentiality
- Conflicts of interest
- Research integrity
- Data sharing ethics
### Bias and Limitations
Evaluate whether:
- Potential biases are acknowledged
- Limitations are discussed honestly
- Boundary conditions are specified
- Generalizability is appropriately claimed
### Impact and Significance
Consider:
- Theoretical contribution
- Practical implications
- Policy relevance
- Methodological innovation
- Field advancement
---
## Scoring Guidelines
### Dimension Weighting (Suggested, Adjust by Context)
- Problem Formulation: 15%
- Literature Review: 15%
- Methodology: 20%
- Data Collection: 10%
- Analysis: 15%
- Results: 10%
- Writing: 10%
- Citations: 5%
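A minimal sketch of how these suggested weights might be combined into an overall score, assuming each dimension is scored on the 1-5 scale. The dictionary keys and the function name are illustrative choices, and renormalizing the weights when a dimension is not applicable is an assumption of this example rather than a rule of the framework.

```python
# Suggested weights from the list above; they sum to 1.0.
WEIGHTS = {
    "problem_formulation": 0.15,
    "literature_review": 0.15,
    "methodology": 0.20,
    "data_collection": 0.10,
    "analysis": 0.15,
    "results": 0.10,
    "writing": 0.10,
    "citations": 0.05,
}


def overall_score(scores, weights=WEIGHTS):
    """Weighted mean of 1-5 dimension scores.

    Dimensions missing from `scores` are skipped and the remaining
    weights are renormalized (an assumption of this sketch).
    """
    used = {dim: w for dim, w in weights.items() if dim in scores}
    if not used:
        raise ValueError("no scored dimensions")
    for dim in used:
        if not 1 <= scores[dim] <= 5:
            raise ValueError(f"{dim}: score must be between 1 and 5")
    total_weight = sum(used.values())
    return sum(scores[dim] * w for dim, w in used.items()) / total_weight


example = {
    "problem_formulation": 4, "literature_review": 3, "methodology": 5,
    "data_collection": 4, "analysis": 3, "results": 4,
    "writing": 4, "citations": 3,
}
print(round(overall_score(example), 2))  # 3.85
```

Passing a custom `weights` dictionary is how the "adjust by context" part of the heading would be expressed in this sketch.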
### Overall Assessment Thresholds
- **Exceptional (4.5-5.0)**: Ready for top-tier publication
- **Strong (4.0-4.4)**: Publication-ready with minor revisions
- **Good (3.5-3.9)**: Major revisions required, promising work
- **Acceptable (3.0-3.4)**: Significant revisions needed
- **Weak (2.0-2.9)**: Fundamental issues, major rework required
- **Poor (<2.0)**: Not suitable for publication without complete revision
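A weighted average is rarely a round number, so one way to apply these bands is to treat each band's lower bound as a cutoff. The sketch below takes that approach; the function name `assessment_band` and the cutoff interpretation (for example, placing 4.45 in Strong) are assumptions, not something the framework states.

```python
# (lower bound, label) pairs, highest first, taken from the bands above.
BANDS = [
    (4.5, "Exceptional"),
    (4.0, "Strong"),
    (3.5, "Good"),
    (3.0, "Acceptable"),
    (2.0, "Weak"),
]


def assessment_band(score):
    """Return the overall assessment label for a 1.0-5.0 score."""
    if not 1.0 <= score <= 5.0:
        raise ValueError("score must be between 1.0 and 5.0")
    for lower_bound, label in BANDS:
        if score >= lower_bound:
            return label
    return "Poor"


print(assessment_band(3.85))  # Good
print(assessment_band(4.45))  # Strong
```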
### Contextual Adjustments
Adjust standards based on the following factors (each listed from least to most demanding expectations):
- **Stage**: Proposal < Draft < Final submission
- **Venue**: Student thesis < Conference < Journal < Top-tier journal
- **Type**: Theoretical < Empirical < Meta-analysis
- **Field**: Standards vary by discipline
- **Purpose**: Educational < Professional < Publication
---
## Using This Framework
1. **Read the work thoroughly** before beginning evaluation
2. **Score each dimension** using the 5-point scale
3. **Document evidence** for each score with specific examples
4. **Consider context** and adjust expectations appropriately
5. **Synthesize findings** across dimensions
6. **Provide actionable feedback** prioritized by impact
7. **Balance criticism with recognition** of strengths

This framework is a guide, not a rigid checklist. Professional judgment should always be applied in context.
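Steps 2, 3, and 6 above (scoring, documenting evidence, and prioritizing feedback) can be sketched as a small record type. Everything here is hypothetical: the class and field names are invented for illustration, and ranking by weight times shortfall from the top score is just one possible reading of "prioritized by impact".

```python
from dataclasses import dataclass, field


@dataclass
class DimensionAssessment:
    dimension: str
    score: int                      # 1-5 scale
    weight: float                   # share of the overall score
    evidence: list = field(default_factory=list)  # specific examples

    def __post_init__(self):
        if not 1 <= self.score <= 5:
            raise ValueError("score must be between 1 and 5")


def prioritize(assessments):
    """Order dimensions by weighted shortfall from the top score.

    Assumed heuristic: a low score on a heavily weighted dimension
    is addressed first.
    """
    return sorted(
        assessments,
        key=lambda a: a.weight * (5 - a.score),
        reverse=True,
    )


review = [
    DimensionAssessment("Writing", 4, 0.10, ["Abstract omits sample size"]),
    DimensionAssessment("Methodology", 3, 0.20, ["No power analysis reported"]),
    DimensionAssessment("Citations", 1, 0.05, ["Half of the claims are uncited"]),
]
for item in prioritize(review):
    print(item.dimension, item.evidence[0])
```

Keeping evidence next to each score makes the feedback traceable, and the ordering is only a starting point for the evaluator's own judgment.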