gh-k-dense-ai-claude-scient…/skills/scholar-evaluation/references/evaluation_framework.md
2025-11-30 08:30:18 +08:00

ScholarEval Evaluation Framework

Overview

This document provides detailed evaluation criteria, rubrics, and quality indicators for each dimension of the ScholarEval framework. Use these standards when conducting systematic evaluations of scholarly work.


Dimension 1: Problem Formulation & Research Questions

Quality Indicators

Excellent (5):

  • Research question is specific, measurable, and clearly articulated
  • Problem addresses significant gap in literature with high impact potential
  • Scope is appropriate and feasible within constraints
  • Novel contribution is clearly differentiated from existing work
  • Theoretical or practical significance is compellingly justified

Good (4):

  • Research question is clear with minor ambiguities
  • Problem is relevant with moderate impact potential
  • Scope is generally appropriate with minor feasibility concerns
  • Contribution is identifiable though not groundbreaking
  • Significance is adequately justified

Adequate (3):

  • Research question is present but lacks specificity
  • Problem relevance is unclear or incremental
  • Scope may be too broad or narrow
  • Contribution is unclear or overlaps heavily with existing work
  • Significance justification is weak

Needs Improvement (2):

  • Research question is vague or poorly defined
  • Problem lacks clear relevance or significance
  • Scope is inappropriate or infeasible
  • Contribution is not articulated
  • No clear justification for significance

Poor (1):

  • No clear research question
  • Problem is trivial or irrelevant
  • Scope is fundamentally flawed
  • No identifiable contribution
  • No significance justification

Assessment Checklist

  • Is the research question clearly stated?
  • Can the question be answered with the proposed approach?
  • Is the problem significant to the field?
  • Is the scope feasible within resource constraints?
  • Is the novelty/contribution clearly articulated?
  • Are key assumptions explicitly stated?
  • Are success criteria or expected outcomes defined?

Dimension 2: Literature Review

Quality Indicators

Excellent (5):

  • Comprehensive coverage of relevant literature across key areas
  • Critical synthesis identifying patterns, contradictions, and gaps
  • Literature is current (majority from last 3-5 years for rapidly evolving fields)
  • Sources are authoritative and peer-reviewed
  • Clear positioning of current work within scholarly conversation
  • Identifies genuine research gaps that the work addresses

Good (4):

  • Good coverage with minor gaps in key areas
  • Mostly synthesis with some description
  • Literature is mostly current with some older foundational works
  • Sources are generally authoritative
  • Work positioning is present but could be stronger
  • Research gaps are identified but may not be critical

Adequate (3):

  • Partial coverage with notable gaps
  • More descriptive summarization than synthesis
  • Literature is a mix of current and dated sources
  • Mix of authoritative and less rigorous sources
  • Weak positioning within existing literature
  • Research gaps are vague or questionable

Needs Improvement (2):

  • Minimal coverage with major gaps
  • Purely descriptive without synthesis
  • Literature is largely outdated
  • Sources lack authority or rigor
  • Little to no positioning of current work
  • No clear research gaps identified

Poor (1):

  • Inadequate or absent literature review
  • No synthesis
  • Outdated or inappropriate sources
  • No engagement with scholarly conversation
  • No gap identification

Assessment Checklist

  • Does the review cover all major relevant areas?
  • Is literature synthesized rather than just summarized?
  • Are sources current and authoritative?
  • Are contrasting viewpoints presented?
  • Are research gaps clearly identified?
  • Is the current work positioned within existing literature?
  • Is citation balance appropriate (not over-relying on few authors)?
  • Are seminal/foundational works included?

Common Issues

  • Insufficient coverage: Missing key papers or research streams
  • Descriptive listing: Summarizing papers sequentially without synthesis
  • Outdated sources: Relying mainly on literature more than 5-10 years old when newer work exists (foundational works excepted)
  • Cherry-picking: Only citing work that supports hypothesis
  • Poor organization: Lack of thematic or conceptual structure
  • Weak gap identification: Gaps are trivial or not actually gaps

Dimension 3: Methodology & Research Design

Quality Indicators

Excellent (5):

  • Research design perfectly aligned with research questions
  • Methods are rigorous, valid, and reliable
  • Procedures are detailed enough for replication
  • Controls, randomization, or triangulation are used appropriately
  • Potential biases acknowledged and mitigated
  • Ethical considerations addressed comprehensively
  • Limitations are explicitly discussed

Good (4):

  • Design is appropriate with minor alignment issues
  • Methods are sound with small validity concerns
  • Procedures are mostly replicable
  • Some controls or validation present
  • Major biases addressed
  • Ethical considerations mentioned
  • Some limitations discussed

Adequate (3):

  • Design partially appropriate for questions
  • Methods have notable validity concerns
  • Procedures lack detail for full replication
  • Limited controls or validation
  • Bias mitigation is minimal
  • Ethics addressed superficially
  • Limitations minimally discussed

Needs Improvement (2):

  • Design poorly aligned with research questions
  • Methods have serious validity issues
  • Procedures too vague to replicate
  • No controls or validation
  • Biases not addressed
  • Ethical concerns not addressed
  • No limitation discussion

Poor (1):

  • Inappropriate or absent methodology
  • Methods fundamentally flawed
  • Not replicable
  • No validity considerations
  • No ethical considerations
  • No acknowledgment of limitations

Assessment Checklist

  • Is methodology appropriate for research questions?
  • Are procedures described in sufficient detail?
  • Can the study be replicated from the description?
  • Are validity and reliability addressed?
  • Are potential biases identified and mitigated?
  • Are ethical considerations discussed?
  • Are limitations acknowledged?
  • Is sample size justified (for quantitative work)?
  • Are qualitative methods rigorous (if applicable)?

Design-Specific Considerations

Quantitative Studies:

  • Sample size with power analysis
  • Control groups and randomization
  • Measurement validity and reliability
  • Statistical assumptions checking

Qualitative Studies:

  • Sampling strategy and saturation
  • Data collection procedures
  • Coding and analysis framework
  • Trustworthiness criteria (credibility, transferability, etc.)

Mixed Methods:

  • Integration rationale
  • Sequencing justification
  • Data convergence strategy
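
The "Sample size with power analysis" item under Quantitative Studies can be spot-checked with a quick calculation. The sketch below is one common normal-approximation formula for a two-sided comparison of two group means, not a method prescribed by this framework. The function name `n_per_group` and its defaults (alpha of 0.05, power of 0.80) are illustrative assumptions.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate per-group sample size for a two-sided, two-sample
    comparison of means, using the normal approximation.

    effect_size is the standardized mean difference (Cohen's d).
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must lie strictly between 0 and 1")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = std_normal.inv_cdf(power)           # quantile for the target power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


# A medium effect (d = 0.5) needs roughly 63 participants per group.
print(n_per_group(0.5))
```

If a study reports far fewer participants per group than this estimate for its expected effect size, the sample-size justification deserves a closer look. Exact t-based calculations give slightly larger numbers.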

Dimension 4: Data Collection & Sources

Quality Indicators

Excellent (5):

  • Data sources are highly credible and appropriate
  • Sample size is sufficient and well-justified
  • Data collection procedures are rigorous and systematic
  • Data quality controls are in place
  • Sampling strategy ensures representativeness
  • Missing data is minimal and handled appropriately

Good (4):

  • Data sources are credible with minor concerns
  • Sample size is adequate
  • Collection procedures are systematic
  • Some quality controls present
  • Sampling is reasonable
  • Missing data is addressed

Adequate (3):

  • Data sources are acceptable but not optimal
  • Sample size is marginal
  • Collection procedures lack some rigor
  • Limited quality controls
  • Sampling may have bias concerns
  • Missing data handling is basic

Needs Improvement (2):

  • Data sources have credibility issues
  • Sample size is insufficient
  • Collection procedures are ad hoc
  • No quality controls
  • Sampling is clearly biased
  • Missing data not addressed

Poor (1):

  • Data sources are inappropriate or unreliable
  • Sample size is inadequate
  • Collection is unsystematic
  • No quality considerations
  • Sampling is fundamentally flawed
  • Excessive missing data

Assessment Checklist

  • Are data sources credible and appropriate?
  • Is sample size sufficient for conclusions?
  • Is sampling strategy clearly described?
  • Is the sample representative of target population?
  • Are data collection procedures systematic?
  • Are data quality controls described?
  • Is missing data addressed?
  • Are any potential data biases discussed?

Dimension 5: Analysis & Interpretation

Quality Indicators

Excellent (5):

  • Analytical methods perfectly suited to data and questions
  • Analysis is rigorous with appropriate techniques
  • Results interpretation is logical and well-supported
  • Alternative explanations are considered
  • Claims are proportionate to evidence
  • Assumptions are validated
  • Analysis is transparent and reproducible

Good (4):

  • Methods are appropriate with minor issues
  • Analysis is sound
  • Interpretation is mostly logical
  • Some alternatives considered
  • Claims generally match evidence
  • Key assumptions checked
  • Analysis is mostly transparent

Adequate (3):

  • Methods are acceptable but not optimal
  • Analysis has some technical issues
  • Interpretation has logical gaps
  • Alternatives not thoroughly explored
  • Some claims exceed evidence
  • Assumptions not fully validated
  • Analysis transparency is limited

Needs Improvement (2):

  • Methods are questionable for data/questions
  • Analysis has significant technical flaws
  • Interpretation is poorly supported
  • No alternative explanations
  • Claims significantly exceed evidence
  • Assumptions not checked
  • Analysis is not transparent

Poor (1):

  • Methods are inappropriate
  • Analysis is fundamentally flawed
  • Interpretation is illogical
  • No consideration of alternatives
  • Claims unsupported by evidence
  • No assumption validation
  • Analysis is opaque

Assessment Checklist

  • Are analytical methods appropriate?
  • Are statistical tests/qualitative methods properly applied?
  • Are assumptions tested?
  • Is interpretation logical and well-supported?
  • Are alternative explanations considered?
  • Do claims align with evidence strength?
  • Is analysis reproducible from description?
  • Are uncertainties acknowledged?

Quantitative Analysis

  • Appropriate statistical tests
  • Assumptions checked (normality, homogeneity, etc.)
  • Effect sizes reported
  • Confidence intervals provided
  • Multiple testing corrections (if applicable)
  • Model diagnostics performed
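
When a paper reports several p-values without any correction, an evaluator can check how the conclusions would hold up under one. The sketch below uses the Holm step-down procedure as one example of a multiple testing correction. The framework does not name a specific method, so both the choice of Holm and the function name `holm_adjust` are assumptions.

```python
def holm_adjust(p_values: list[float]) -> list[float]:
    """Return Holm-adjusted p-values in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Adjusted values must never decrease along the sorted order.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


reported = [0.01, 0.04, 0.03]
print(holm_adjust(reported))
```

A result that is significant only before adjustment is worth flagging under "Claims are proportionate to evidence".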

Qualitative Analysis

  • Coding framework is clear
  • Inter-rater reliability (if applicable)
  • Saturation discussed
  • Negative cases examined
  • Member checking or validation
  • Clear audit trail
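
For the inter-rater reliability item, Cohen's kappa is one widely used statistic for two coders assigning categorical codes. The framework does not require this particular measure, so treat the following as an illustrative sketch. The function name `cohens_kappa` and the sample codes are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa for two raters coding the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of items")
    n = len(rater_a)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Agreement expected by chance, from each rater's marginal code frequencies.
    expected = sum(counts_a[code] * counts_b[code] for code in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical code for every item
    return (observed - expected) / (1 - expected)


coder_1 = ["theme_x", "theme_x", "theme_y", "theme_y"]
coder_2 = ["theme_x", "theme_y", "theme_y", "theme_y"]
print(cohens_kappa(coder_1, coder_2))  # 0.5
```

Raw percent agreement (0.75 here) overstates reliability because it ignores chance agreement. That is why a reported agreement figure without a chance-corrected statistic merits a comment.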

Dimension 6: Results & Findings

Quality Indicators

Excellent (5):

  • Results are clearly and comprehensively presented
  • Visualizations are effective and appropriate
  • Statistical or qualitative rigor is evident
  • Key findings are highlighted effectively
  • Results directly address research questions
  • Patterns and relationships are clearly shown
  • Negative and null results are reported

Good (4):

  • Results are clear with minor presentation issues
  • Visualizations are generally effective
  • Rigor is present
  • Main findings are identifiable
  • Results mostly address questions
  • Patterns are shown
  • Some negative results included

Adequate (3):

  • Results presentation is adequate but could be clearer
  • Visualizations are basic or have issues
  • Rigor is questionable in places
  • Findings are present but not emphasized
  • Partial alignment with questions
  • Patterns are unclear
  • Negative results may be omitted

Needs Improvement (2):

  • Results presentation is unclear or confusing
  • Visualizations are poor or misleading
  • Lack of rigor
  • Findings are difficult to identify
  • Weak alignment with questions
  • No clear patterns
  • Only positive results shown

Poor (1):

  • Results are poorly presented or absent
  • Visualizations are inappropriate or missing
  • No evidence of rigor
  • Findings are unclear
  • Results don't address questions
  • No identifiable patterns
  • Results appear selective

Assessment Checklist

  • Are results clearly presented?
  • Do results directly address research questions?
  • Are visualizations appropriate and effective?
  • Are key findings highlighted?
  • Are negative/null results reported?
  • Are statistics reported completely and with appropriate precision (p-values, confidence intervals, effect sizes)?
  • Are qualitative findings supported by data excerpts?
  • Is the reporting free of signs of selective reporting?

Presentation Quality

Tables:

  • Clear labels and captions
  • Appropriate precision
  • Organized logically
  • Not overly complex

Figures:

  • Clear axes and legends
  • Appropriate chart type
  • Professional appearance
  • Accessible (color-blind friendly)

Text:

  • Highlights key findings
  • Avoids redundancy with tables/figures
  • Uses appropriate statistical language

Dimension 7: Scholarly Writing & Presentation

Quality Indicators

Excellent (5):

  • Writing is clear, concise, and precise
  • Organization is logical with excellent flow
  • Academic tone is appropriate and consistent
  • Grammar and mechanics are flawless
  • Technical terms are used correctly
  • Accessible to target audience
  • Abstract/summary is comprehensive and accurate

Good (4):

  • Writing is clear with minor awkwardness
  • Organization is logical with good flow
  • Tone is mostly appropriate
  • Few grammar/mechanical errors
  • Technical terms mostly correct
  • Generally accessible
  • Abstract is adequate

Adequate (3):

  • Writing is understandable but has clarity issues
  • Organization has some logical gaps
  • Tone inconsistencies
  • Noticeable grammar/mechanical errors
  • Some technical term misuse
  • Accessibility issues for target audience
  • Abstract is incomplete or vague

Needs Improvement (2):

  • Writing is often unclear or verbose
  • Poor organization and flow
  • Tone is inappropriate
  • Frequent grammar/mechanical errors
  • Technical terminology problems
  • Not accessible to target audience
  • Abstract is poor or missing

Poor (1):

  • Writing is unclear and difficult to follow
  • No clear organization
  • Tone is inappropriate
  • Pervasive grammar/mechanical errors
  • Incorrect technical terminology
  • Inaccessible
  • No adequate abstract

Assessment Checklist

  • Is writing clear and concise?
  • Is organization logical?
  • Is tone appropriate for academic writing?
  • Are grammar and mechanics correct?
  • Are technical terms used appropriately?
  • Is jargon explained when necessary?
  • Does abstract accurately summarize the work?
  • Are transitions between sections smooth?
  • Is the target audience clear?

Common Writing Issues

  • Wordiness: Unnecessarily complex or lengthy prose
  • Passive voice overuse: Reduces clarity and directness
  • Paragraph structure: Lack of topic sentences or coherence
  • Redundancy: Repeating information unnecessarily
  • Logical flow: Poor transitions between ideas
  • Precision: Vague or ambiguous language
  • Accessibility: Too technical or not technical enough

Dimension 8: Citations & References

Quality Indicators

Excellent (5):

  • All claims are appropriately cited
  • Sources are authoritative and current
  • Citations are accurate and complete
  • Diverse perspectives are represented
  • Citation format is consistent and correct
  • Appropriate balance between self-citation and citation of others' work
  • Primary sources used appropriately

Good (4):

  • Most claims are cited
  • Sources are generally authoritative
  • Few citation errors
  • Reasonable diversity of sources
  • Format is mostly consistent
  • Citation balance is good
  • Mix of primary and secondary sources

Adequate (3):

  • Some claims lack citations
  • Source quality is mixed
  • Several citation errors
  • Limited source diversity
  • Format inconsistencies
  • Citation balance issues
  • Over-reliance on secondary sources

Needs Improvement (2):

  • Many claims uncited
  • Sources are questionable
  • Numerous citation errors
  • Narrow source base
  • Format is inconsistent
  • Excessive self-citation or narrow citing
  • Inappropriate sources (e.g., only secondary)

Poor (1):

  • Inadequate citations
  • Unreliable sources
  • Pervasive citation errors
  • Minimal source diversity
  • No consistent format
  • Severe citation imbalance
  • Inappropriate source types

Assessment Checklist

  • Are all factual claims cited?
  • Do citations point to primary sources where appropriate?
  • Are sources authoritative and peer-reviewed?
  • Is there balance in perspectives cited?
  • Are citations accurate (authors, dates, pages)?
  • Is citation format consistent?
  • Are self-citations appropriate (typically <20%)?
  • Are sources current (for time-sensitive topics)?
  • Are classic/seminal works included where relevant?

Citation Quality Assessment

Source Types (in order of preference for most academic work):

  1. Peer-reviewed journal articles
  2. Academic books from reputable publishers
  3. Conference proceedings (field-dependent)
  4. Technical reports from reputable institutions
  5. Dissertations/theses
  6. Preprints (with caution, field-dependent)
  7. Grey literature (limited use)
  8. Websites (rarely appropriate, except for factual data)

Red Flags:

  • Wikipedia as a primary source
  • Excessive self-citation (>30%)
  • Only citing papers that support hypothesis
  • Outdated sources when current ones exist
  • Missing key papers in the field
  • Citing abstracts only when full papers are available
  • Inconsistent or incorrect citation format
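
The two self-citation figures in this section (typically under 20%, red flag above 30%) can be applied mechanically once the reference list has been counted. The sketch below assumes the evaluator has already tallied total and self-authored references by hand. The function name `self_citation_check` and the middle "review" label for shares between the two thresholds are assumptions, since the framework does not say how to treat that range.

```python
def self_citation_check(total_refs: int, self_refs: int,
                        typical_max: float = 0.20,
                        red_flag: float = 0.30) -> tuple[float, str]:
    """Return the self-citation share and a coarse label for it."""
    if total_refs <= 0 or not 0 <= self_refs <= total_refs:
        raise ValueError("need total_refs > 0 and 0 <= self_refs <= total_refs")
    share = self_refs / total_refs
    if share > red_flag:
        label = "red flag"   # above the 30% red-flag level
    elif share < typical_max:
        label = "typical"    # below the 20% guideline
    else:
        label = "review"     # in between: needs evaluator judgment (assumed label)
    return share, label


print(self_citation_check(total_refs=40, self_refs=10))  # (0.25, 'review')
```

Field norms vary, so the label is a prompt for closer reading rather than a verdict.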

Cross-Cutting Considerations

Reproducibility

Assess across dimensions:

  • Are methods detailed enough to replicate?
  • Are data and code available (or availability explained)?
  • Are analysis steps transparent?
  • Are materials/instruments specified?

Ethics

Consider:

  • IRB approval (for human subjects)
  • Informed consent
  • Privacy and confidentiality
  • Conflicts of interest
  • Research integrity
  • Data sharing ethics

Bias and Limitations

Evaluate whether:

  • Potential biases are acknowledged
  • Limitations are discussed honestly
  • Boundary conditions are specified
  • Generalizability is appropriately claimed

Impact and Significance

Consider:

  • Theoretical contribution
  • Practical implications
  • Policy relevance
  • Methodological innovation
  • Field advancement

Scoring Guidelines

Dimension Weighting (Suggested, Adjust by Context)

  • Problem Formulation: 15%
  • Literature Review: 15%
  • Methodology: 20%
  • Data Collection: 10%
  • Analysis: 15%
  • Results: 10%
  • Writing: 10%
  • Citations: 5%
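
The suggested weights sum to 100%, so an overall score is the weighted average of the eight dimension scores. The following is a minimal sketch of that arithmetic. The names `WEIGHTS` and `overall_score` and the example scores are illustrative, not part of the framework.

```python
import math

# Suggested weights from the list above, as fractions of 1.
WEIGHTS = {
    "Problem Formulation": 0.15,
    "Literature Review": 0.15,
    "Methodology": 0.20,
    "Data Collection": 0.10,
    "Analysis": 0.15,
    "Results": 0.10,
    "Writing": 0.10,
    "Citations": 0.05,
}


def overall_score(scores: dict[str, float],
                  weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted average of dimension scores on the 1-5 scale."""
    if set(scores) != set(weights):
        raise ValueError("scores must cover exactly the weighted dimensions")
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must sum to 1")
    if any(not 1 <= s <= 5 for s in scores.values()):
        raise ValueError("each score must be between 1 and 5")
    return sum(weights[dim] * scores[dim] for dim in weights)


example = {
    "Problem Formulation": 4, "Literature Review": 3, "Methodology": 5,
    "Data Collection": 4, "Analysis": 3, "Results": 4,
    "Writing": 4, "Citations": 5,
}
print(round(overall_score(example), 2))  # 3.95
```

When weights are adjusted for context, the sum-to-one check catches a common slip: raising one weight without lowering another.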

Overall Assessment Thresholds

  • Exceptional (4.5-5.0): Ready for top-tier publication
  • Strong (4.0-4.4): Publication-ready with minor revisions
  • Good (3.5-3.9): Major revisions required, promising work
  • Acceptable (3.0-3.4): Significant revisions needed
  • Weak (2.0-2.9): Fundamental issues, major rework required
  • Poor (<2.0): Not suitable for publication without complete revision
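
Mapping an overall score to these bands is a simple lookup. In the sketch below each band is taken to start at its lower bound, so a score such as 4.45 counts as Strong. That reading is an assumption, as are the names `BANDS` and `classify`.

```python
# (lower bound, label) pairs, highest band first.
BANDS = [
    (4.5, "Exceptional"),
    (4.0, "Strong"),
    (3.5, "Good"),
    (3.0, "Acceptable"),
    (2.0, "Weak"),
]


def classify(score: float) -> str:
    """Return the assessment band for an overall score on the 1-5 scale."""
    if not 1.0 <= score <= 5.0:
        raise ValueError("score must be between 1.0 and 5.0")
    for lower_bound, label in BANDS:
        if score >= lower_bound:
            return label
    return "Poor"  # anything below 2.0


print(classify(3.95))  # Good
```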

Contextual Adjustments

Adjust standards based on:

  • Stage: Proposal < Draft < Final submission
  • Venue: Student thesis < Conference < Journal < Top-tier journal
  • Type: Theoretical < Empirical < Meta-analysis
  • Field: Standards vary by discipline
  • Purpose: Educational < Professional < Publication

Using This Framework

  1. Read the work thoroughly before beginning evaluation
  2. Score each dimension using the 5-point scale
  3. Document evidence for each score with specific examples
  4. Consider context and adjust expectations appropriately
  5. Synthesize findings across dimensions
  6. Provide actionable feedback prioritized by impact
  7. Balance criticism with recognition of strengths

This framework is a guide, not a rigid checklist. Professional judgment should always be applied in context.