ScholarEval Evaluation Framework
Overview
This document provides detailed evaluation criteria, rubrics, and quality indicators for each dimension of the ScholarEval framework. Use these standards when conducting systematic evaluations of scholarly work.
Dimension 1: Problem Formulation & Research Questions
Quality Indicators
Excellent (5):
- Research question is specific, measurable, and clearly articulated
- Problem addresses significant gap in literature with high impact potential
- Scope is appropriate and feasible within constraints
- Novel contribution is clearly differentiated from existing work
- Theoretical or practical significance is compellingly justified
Good (4):
- Research question is clear with minor ambiguities
- Problem is relevant with moderate impact potential
- Scope is generally appropriate with minor feasibility concerns
- Contribution is identifiable though not groundbreaking
- Significance is adequately justified
Adequate (3):
- Research question is present but lacks specificity
- Problem relevance is unclear or incremental
- Scope may be too broad or narrow
- Contribution is unclear or overlaps heavily with existing work
- Significance justification is weak
Needs Improvement (2):
- Research question is vague or poorly defined
- Problem lacks clear relevance or significance
- Scope is inappropriate or infeasible
- Contribution is not articulated
- No clear justification for significance
Poor (1):
- No clear research question
- Problem is trivial or irrelevant
- Scope is fundamentally flawed
- No identifiable contribution
- No significance justification
Assessment Checklist
- Is the research question clearly stated?
- Can the question be answered with the proposed approach?
- Is the problem significant to the field?
- Is the scope feasible within resource constraints?
- Is the novelty/contribution clearly articulated?
- Are key assumptions explicitly stated?
- Are success criteria or expected outcomes defined?
Dimension 2: Literature Review
Quality Indicators
Excellent (5):
- Comprehensive coverage of relevant literature across key areas
- Critical synthesis identifying patterns, contradictions, and gaps
- Literature is current (majority from last 3-5 years for rapidly evolving fields)
- Sources are authoritative and peer-reviewed
- Clear positioning of current work within scholarly conversation
- Identifies genuine research gaps that the work addresses
Good (4):
- Good coverage with minor gaps in key areas
- Mostly synthesis with some description
- Literature is mostly current with some older foundational works
- Sources are generally authoritative
- Work positioning is present but could be stronger
- Research gaps are identified but may not be critical
Adequate (3):
- Partial coverage with notable gaps
- More descriptive summarization than synthesis
- Literature is a mix of current and dated sources
- Mix of authoritative and less rigorous sources
- Weak positioning within existing literature
- Research gaps are vague or questionable
Needs Improvement (2):
- Minimal coverage with major gaps
- Purely descriptive without synthesis
- Literature is largely outdated
- Sources lack authority or rigor
- Little to no positioning of current work
- No clear research gaps identified
Poor (1):
- Inadequate or absent literature review
- No synthesis
- Outdated or inappropriate sources
- No engagement with scholarly conversation
- No gap identification
Assessment Checklist
- Does review cover all major relevant areas?
- Is literature synthesized rather than just summarized?
- Are sources current and authoritative?
- Are contrasting viewpoints presented?
- Are research gaps clearly identified?
- Is the current work positioned within existing literature?
- Is citation balance appropriate (not over-relying on a few authors)?
- Are seminal/foundational works included?
Common Issues
- Insufficient coverage: Missing key papers or research streams
- Descriptive listing: Summarizing papers sequentially without synthesis
- Outdated sources: Relying primarily on literature more than 5-10 years old when newer work exists
- Cherry-picking: Only citing work that supports the hypothesis
- Poor organization: Lack of thematic or conceptual structure
- Weak gap identification: Gaps are trivial or not actually gaps
Dimension 3: Methodology & Research Design
Quality Indicators
Excellent (5):
- Research design perfectly aligned with research questions
- Methods are rigorous, valid, and reliable
- Procedures are detailed enough for replication
- Appropriate controls, randomization, or triangulation are employed
- Potential biases acknowledged and mitigated
- Ethical considerations addressed comprehensively
- Limitations are explicitly discussed
Good (4):
- Design is appropriate with minor alignment issues
- Methods are sound with small validity concerns
- Procedures are mostly replicable
- Some controls or validation present
- Major biases addressed
- Ethical considerations mentioned
- Some limitations discussed
Adequate (3):
- Design partially appropriate for questions
- Methods have notable validity concerns
- Procedures lack detail for full replication
- Limited controls or validation
- Bias mitigation is minimal
- Ethics addressed superficially
- Limitations minimally discussed
Needs Improvement (2):
- Design poorly aligned with research questions
- Methods have serious validity issues
- Procedures too vague to replicate
- No controls or validation
- Biases not addressed
- Ethical concerns not addressed
- No limitation discussion
Poor (1):
- Inappropriate or absent methodology
- Methods fundamentally flawed
- Not replicable
- No validity considerations
- No ethical considerations
- No acknowledgment of limitations
Assessment Checklist
- Is methodology appropriate for research questions?
- Are procedures described in sufficient detail?
- Can the study be replicated from the description?
- Are validity and reliability addressed?
- Are potential biases identified and mitigated?
- Are ethical considerations discussed?
- Are limitations acknowledged?
- Is sample size justified (for quantitative work)?
- Are qualitative methods rigorous (if applicable)?
Design-Specific Considerations
Quantitative Studies:
- Sample size with power analysis (see the power-analysis sketch below)
- Control groups and randomization
- Measurement validity and reliability
- Statistical assumptions checking
Qualitative Studies:
- Sampling strategy and saturation
- Data collection procedures
- Coding and analysis framework
- Trustworthiness criteria (credibility, transferability, etc.)
Mixed Methods:
- Integration rationale
- Sequencing justification
- Data convergence strategy
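To make the sample-size item under Quantitative Studies concrete, the sketch below shows a minimal power analysis using statsmodels. The effect size, alpha, and power values are assumptions chosen for illustration; adjust them to the conventions of the field under review.

```python
# Minimal power-analysis sketch for an independent-samples design.
# The effect size, alpha, and power below are assumed example values.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(
    effect_size=0.5,  # assumed medium effect (Cohen's d)
    alpha=0.05,       # two-sided significance level
    power=0.8,        # desired statistical power
)
print(f"Required sample size per group: {n_per_group:.0f}")  # roughly 64 per group
```

A manuscript that reports a comparable calculation, or explicitly justifies why one is unnecessary, scores higher on the sample-size item.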
Dimension 4: Data Collection & Sources
Quality Indicators
Excellent (5):
- Data sources are highly credible and appropriate
- Sample size is sufficient and well-justified
- Data collection procedures are rigorous and systematic
- Data quality controls are in place
- Sampling strategy ensures representativeness
- Missing data is minimal and handled appropriately
Good (4):
- Data sources are credible with minor concerns
- Sample size is adequate
- Collection procedures are systematic
- Some quality controls present
- Sampling is reasonable
- Missing data is addressed
Adequate (3):
- Data sources are acceptable but not optimal
- Sample size is marginal
- Collection procedures lack some rigor
- Limited quality controls
- Sampling may have bias concerns
- Missing data handling is basic
Needs Improvement (2):
- Data sources have credibility issues
- Sample size is insufficient
- Collection procedures are ad hoc
- No quality controls
- Sampling is clearly biased
- Missing data not addressed
Poor (1):
- Data sources are inappropriate or unreliable
- Sample size is clearly inadequate for any meaningful conclusions
- Collection is unsystematic
- No quality considerations
- Sampling is fundamentally flawed
- Excessive missing data
Assessment Checklist
- Are data sources credible and appropriate?
- Is sample size sufficient for conclusions?
- Is sampling strategy clearly described?
- Is the sample representative of target population?
- Are data collection procedures systematic?
- Are data quality controls described?
- Is missing data addressed?
- Are any potential data biases discussed?
Dimension 5: Analysis & Interpretation
Quality Indicators
Excellent (5):
- Analytical methods perfectly suited to data and questions
- Analysis is rigorous with appropriate techniques
- Results interpretation is logical and well-supported
- Alternative explanations are considered
- Claims are proportionate to evidence
- Assumptions are validated
- Analysis is transparent and reproducible
Good (4):
- Methods are appropriate with minor issues
- Analysis is sound
- Interpretation is mostly logical
- Some alternatives considered
- Claims generally match evidence
- Key assumptions checked
- Analysis is mostly transparent
Adequate (3):
- Methods are acceptable but not optimal
- Analysis has some technical issues
- Interpretation has logical gaps
- Alternatives not thoroughly explored
- Some claims exceed evidence
- Assumptions not fully validated
- Analysis transparency is limited
Needs Improvement (2):
- Methods are questionable for data/questions
- Analysis has significant technical flaws
- Interpretation is poorly supported
- No alternative explanations
- Claims significantly exceed evidence
- Assumptions not checked
- Analysis is not transparent
Poor (1):
- Methods are inappropriate
- Analysis is fundamentally flawed
- Interpretation is illogical
- No consideration of alternatives
- Claims unsupported by evidence
- No assumption validation
- Analysis is opaque
Assessment Checklist
- Are analytical methods appropriate?
- Are statistical tests/qualitative methods properly applied?
- Are assumptions tested?
- Is interpretation logical and well-supported?
- Are alternative explanations considered?
- Do claims align with evidence strength?
- Is analysis reproducible from description?
- Are uncertainties acknowledged?
Quantitative Analysis
- Appropriate statistical tests
- Assumptions checked (normality, homogeneity of variance, etc.)
- Effect sizes reported
- Confidence intervals provided
- Multiple testing corrections (if applicable)
- Model diagnostics performed
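As a hedged illustration of several items on the list above, the sketch below runs a normality check, reports an effect size and a confidence interval alongside the test statistic, and applies a Benjamini-Hochberg correction to a family of p-values. The data are generated for the example, and the specific choices (Shapiro-Wilk, Cohen's d, FDR correction) are assumptions, not requirements.

```python
# Illustrative quantitative checks on simulated data (all values are example assumptions).
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)
group_a = rng.normal(loc=10.0, scale=2.0, size=40)
group_b = rng.normal(loc=11.0, scale=2.0, size=40)

# Assumption check: Shapiro-Wilk normality test per group.
print("Normality p-values:",
      stats.shapiro(group_a).pvalue, stats.shapiro(group_b).pvalue)

# Primary test reported with an effect size, not just a p-value.
result = stats.ttest_ind(group_a, group_b)
pooled_sd = np.sqrt((group_a.var(ddof=1) + group_b.var(ddof=1)) / 2)
cohens_d = (group_a.mean() - group_b.mean()) / pooled_sd
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}, d = {cohens_d:.2f}")

# 95% confidence interval for the mean difference (normal approximation).
diff = group_a.mean() - group_b.mean()
se = np.sqrt(group_a.var(ddof=1) / len(group_a) + group_b.var(ddof=1) / len(group_b))
print(f"95% CI for mean difference: [{diff - 1.96 * se:.2f}, {diff + 1.96 * se:.2f}]")

# Multiple-testing correction across a family of p-values (Benjamini-Hochberg FDR).
p_family = [0.01, 0.04, 0.03, 0.20]
reject, p_adjusted, _, _ = multipletests(p_family, alpha=0.05, method="fdr_bh")
print("Adjusted p-values:", p_adjusted.round(3))
```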
Qualitative Analysis
- Coding framework is clear
- Inter-rater reliability (if applicable; see the sketch below)
- Saturation discussed
- Negative cases examined
- Member checking or validation
- Clear audit trail
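Where inter-rater reliability applies, an agreement statistic such as Cohen's kappa is a common way to substantiate the coding framework. The sketch below is a minimal example using scikit-learn on invented codes from two raters; the theme labels and the interpretive note in the comment are assumptions for illustration.

```python
# Illustrative inter-rater agreement on invented coding decisions from two raters.
from sklearn.metrics import cohen_kappa_score

rater_1 = ["theme_a", "theme_b", "theme_a", "theme_c", "theme_b",
           "theme_a", "theme_c", "theme_b", "theme_a", "theme_b"]
rater_2 = ["theme_a", "theme_b", "theme_a", "theme_b", "theme_b",
           "theme_a", "theme_c", "theme_b", "theme_c", "theme_b"]

kappa = cohen_kappa_score(rater_1, rater_2)
print(f"Cohen's kappa: {kappa:.2f}")  # values around 0.6-0.8 are often read as substantial agreement
```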
Dimension 6: Results & Findings
Quality Indicators
Excellent (5):
- Results are clearly and comprehensively presented
- Visualizations are effective and appropriate
- Statistical or qualitative rigor is evident
- Key findings are highlighted effectively
- Results directly address research questions
- Patterns and relationships are clearly shown
- Negative and null results are reported
Good (4):
- Results are clear with minor presentation issues
- Visualizations are generally effective
- Rigor is present
- Main findings are identifiable
- Results mostly address questions
- Patterns are shown
- Some negative results included
Adequate (3):
- Results presentation is adequate but could be clearer
- Visualizations are basic or have issues
- Rigor is questionable in places
- Findings are present but not emphasized
- Partial alignment with questions
- Patterns are unclear
- Negative results may be omitted
Needs Improvement (2):
- Results presentation is unclear or confusing
- Visualizations are poor or misleading
- Lack of rigor
- Findings are difficult to identify
- Weak alignment with questions
- No clear patterns
- Only positive results shown
Poor (1):
- Results are poorly presented or absent
- Visualizations are inappropriate or missing
- No evidence of rigor
- Findings are unclear
- Results don't address questions
- No identifiable patterns
- Results appear selective
Assessment Checklist
- Are results clearly presented?
- Do results directly address research questions?
- Are visualizations appropriate and effective?
- Are key findings highlighted?
- Are negative/null results reported?
- Is appropriate precision reported (p-values, CIs, effect sizes)?
- Are qualitative findings supported by data excerpts?
- Is there evidence of selective reporting?
Presentation Quality
Tables:
- Clear labels and captions
- Appropriate precision
- Organized logically
- Not overly complex
Figures:
- Clear axes and legends
- Appropriate chart type
- Professional appearance
- Accessible (color-blind friendly)
Text:
- Highlights key findings
- Avoids redundancy with tables/figures
- Uses appropriate statistical language
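As a hedged sketch of the figure criteria above, the example below produces a simple plot with labelled axes and units, a clear legend, and a colour-blind-friendly style bundled with matplotlib; the data and style choice are assumptions for illustration, not the only acceptable options.

```python
# Minimal figure sketch following the criteria above (simulated data, assumed style).
import matplotlib.pyplot as plt
import numpy as np

plt.style.use("tableau-colorblind10")  # colour-blind-friendly palette shipped with matplotlib

x = np.linspace(0, 10, 50)
fig, ax = plt.subplots(figsize=(5, 3))
ax.plot(x, np.sin(x), label="Condition A")
ax.plot(x, np.cos(x), label="Condition B", linestyle="--")  # vary line style, not colour alone
ax.set_xlabel("Time (s)")              # labelled axes with units
ax.set_ylabel("Response (arbitrary units)")
ax.legend(frameon=False)               # clear legend
fig.tight_layout()
fig.savefig("figure_1.png", dpi=300)   # publication-quality resolution
```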
Dimension 7: Scholarly Writing & Presentation
Quality Indicators
Excellent (5):
- Writing is clear, concise, and precise
- Organization is logical with excellent flow
- Academic tone is appropriate and consistent
- Grammar and mechanics are flawless
- Technical terms are used correctly
- Accessible to target audience
- Abstract/summary is comprehensive and accurate
Good (4):
- Writing is clear with minor awkwardness
- Organization is logical with good flow
- Tone is mostly appropriate
- Few grammar/mechanical errors
- Technical terms mostly correct
- Generally accessible
- Abstract is adequate
Adequate (3):
- Writing is understandable but has clarity issues
- Organization has some logical gaps
- Tone is inconsistent
- Noticeable grammar/mechanical errors
- Some technical term misuse
- Accessibility issues for target audience
- Abstract is incomplete or vague
Needs Improvement (2):
- Writing is often unclear or verbose
- Poor organization and flow
- Tone is often inappropriate
- Frequent grammar/mechanical errors
- Technical terminology problems
- Not accessible to target audience
- Abstract is poor or missing
Poor (1):
- Writing is unclear and difficult to follow
- No clear organization
- Tone is consistently inappropriate
- Pervasive grammar/mechanical errors
- Incorrect technical terminology
- Inaccessible
- No adequate abstract
Assessment Checklist
- Is writing clear and concise?
- Is organization logical?
- Is tone appropriate for academic writing?
- Are grammar and mechanics correct?
- Are technical terms used appropriately?
- Is jargon explained when necessary?
- Does abstract accurately summarize the work?
- Are transitions between sections smooth?
- Is the target audience clear?
Common Writing Issues
- Wordiness: Unnecessarily complex or lengthy prose
- Passive voice overuse: Reduces clarity and directness
- Paragraph structure: Lack of topic sentences or coherence
- Redundancy: Repeating information unnecessarily
- Logical flow: Poor transitions between ideas
- Precision: Vague or ambiguous language
- Accessibility: Too technical or not technical enough
Dimension 8: Citations & References
Quality Indicators
Excellent (5):
- All claims are appropriately cited
- Sources are authoritative and current
- Citations are accurate and complete
- Diverse perspectives are represented
- Citation format is consistent and correct
- Appropriate balance between self-citations and citations of others' work
- Primary sources used appropriately
Good (4):
- Most claims are cited
- Sources are generally authoritative
- Few citation errors
- Reasonable diversity of sources
- Format is mostly consistent
- Citation balance is good
- Mix of primary and secondary sources
Adequate (3):
- Some claims lack citations
- Source quality is mixed
- Several citation errors
- Limited source diversity
- Format inconsistencies
- Citation balance issues
- Over-reliance on secondary sources
Needs Improvement (2):
- Many claims uncited
- Sources are questionable
- Numerous citation errors
- Narrow source base
- Format is inconsistent
- Excessive self-citation or a narrow citation base
- Inappropriate sources (e.g., only secondary)
Poor (1):
- Inadequate citations
- Unreliable sources
- Pervasive citation errors
- Minimal source diversity
- No consistent format
- Severe citation imbalance
- Inappropriate source types
Assessment Checklist
- Are all factual claims cited?
- Do citations point to primary sources where appropriate?
- Are sources authoritative and peer-reviewed?
- Is there balance in perspectives cited?
- Are citations accurate (authors, dates, pages)?
- Is citation format consistent?
- Are self-citations appropriate (typically <20%)?
- Are sources current (for time-sensitive topics)?
- Are classic/seminal works included where relevant?
Citation Quality Assessment
Source Types (in order of preference for most academic work):
- Peer-reviewed journal articles
- Academic books from reputable publishers
- Conference proceedings (field-dependent)
- Technical reports from reputable institutions
- Dissertations/theses
- Preprints (with caution, field-dependent)
- Grey literature (limited use)
- Websites (rarely appropriate, except for factual data)
Red Flags:
- Wikipedia as a primary source
- Excessive self-citation (>30%)
- Only citing papers that support hypothesis
- Outdated sources when current ones exist
- Missing key papers in the field
- Citing only abstracts when full papers are available
- Inconsistent or incorrect citation format
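The self-citation thresholds used in this dimension (roughly <20% acceptable, >30% a red flag) are simple proportions. The sketch below illustrates the arithmetic on an invented reference list; the author names and the surname-matching rule are assumptions for the example.

```python
# Illustrative self-citation rate on an invented reference list.
manuscript_authors = {"Garcia", "Lee"}  # hypothetical surnames of the manuscript's authors

# Each entry is the set of author surnames on one cited reference (invented data).
references = [
    {"Garcia", "Chen"}, {"Okafor"}, {"Lee", "Garcia"}, {"Smith", "Patel"},
    {"Nguyen"}, {"Garcia"}, {"Kim", "Lee"}, {"Silva"}, {"Brown"}, {"Ahmed", "Ross"},
]

self_cited = sum(1 for ref in references if ref & manuscript_authors)
rate = self_cited / len(references)
print(f"Self-citation rate: {rate:.0%}")  # 40% here, above the >30% red-flag threshold
```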
Cross-Cutting Considerations
Reproducibility
Assess across dimensions:
- Are methods detailed enough to replicate?
- Are data and code available (or is their unavailability explained)?
- Are analysis steps transparent?
- Are materials/instruments specified?
Ethics
Consider:
- IRB approval (for human subjects)
- Informed consent
- Privacy and confidentiality
- Conflicts of interest
- Research integrity
- Data sharing ethics
Bias and Limitations
Evaluate whether:
- Potential biases are acknowledged
- Limitations are discussed honestly
- Boundary conditions are specified
- Generalizability is appropriately claimed
Impact and Significance
Consider:
- Theoretical contribution
- Practical implications
- Policy relevance
- Methodological innovation
- Field advancement
Scoring Guidelines
Dimension Weighting (Suggested, Adjust by Context)
- Problem Formulation: 15%
- Literature Review: 15%
- Methodology: 20%
- Data Collection: 10%
- Analysis: 15%
- Results: 10%
- Writing: 10%
- Citations: 5%
Overall Assessment Thresholds
- Exceptional (4.5-5.0): Ready for top-tier publication
- Strong (4.0-4.4): Publication-ready with minor revisions
- Good (3.5-3.9): Major revisions required, promising work
- Acceptable (3.0-3.4): Significant revisions needed
- Weak (2.0-2.9): Fundamental issues, major rework required
- Poor (<2.0): Not suitable for publication without complete revision
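The suggested weights and thresholds above combine into an overall score by straightforward arithmetic. The sketch below is a minimal illustration; the example dimension scores are invented, and the weights simply mirror the defaults above, which should be adjusted by context.

```python
# Weighted overall score from 1-5 dimension ratings (example scores are invented).
WEIGHTS = {
    "problem_formulation": 0.15, "literature_review": 0.15, "methodology": 0.20,
    "data_collection": 0.10, "analysis": 0.15, "results": 0.10,
    "writing": 0.10, "citations": 0.05,
}

THRESHOLDS = [  # (minimum weighted score, band label), checked from highest to lowest
    (4.5, "Exceptional"), (4.0, "Strong"), (3.5, "Good"),
    (3.0, "Acceptable"), (2.0, "Weak"), (0.0, "Poor"),
]

def overall_assessment(scores: dict) -> tuple:
    """Combine per-dimension scores into a weighted overall score and a band label."""
    weighted = round(sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS), 2)
    label = next(name for cutoff, name in THRESHOLDS if weighted >= cutoff)
    return weighted, label

example_scores = {  # hypothetical ratings for a single manuscript
    "problem_formulation": 4, "literature_review": 3, "methodology": 4,
    "data_collection": 3, "analysis": 4, "results": 4, "writing": 5, "citations": 4,
}
print(overall_assessment(example_scores))  # -> (3.85, 'Good')
```

Scores near a band boundary deserve extra scrutiny rather than mechanical labelling, consistent with the contextual adjustments below.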
Contextual Adjustments
Adjust standards based on the following factors, where expectations increase from left to right:
- Stage: Proposal < Draft < Final submission
- Venue: Student thesis < Conference < Journal < Top-tier journal
- Type: Theoretical < Empirical < Meta-analysis
- Field: Standards vary by discipline
- Purpose: Educational < Professional < Publication
Using This Framework
- Read the work thoroughly before beginning evaluation
- Score each dimension using the 5-point scale
- Document evidence for each score with specific examples
- Consider context and adjust expectations appropriately
- Synthesize findings across dimensions
- Provide actionable feedback prioritized by impact
- Balance criticism with recognition of strengths
This framework is a guide, not a rigid checklist. Professional judgment should always be applied in context.