# Evidence Hierarchy and Quality Assessment

## Traditional Evidence Hierarchy (Medical/Clinical)

### Level 1: Systematic Reviews and Meta-Analyses

**Description:** Comprehensive synthesis of all available evidence on a question.

**Strengths:**
- Combines multiple studies for greater power
- Reduces impact of single-study anomalies
- Can identify patterns across studies
- Quantifies overall effect size (see the pooling sketch below)

**Weaknesses:**
- Quality depends on included studies ("garbage in, garbage out")
- Publication bias can distort findings
- Heterogeneity may make pooling inappropriate
- Can mask important differences between studies

**Critical evaluation:**
- Was search comprehensive (multiple databases, grey literature)?
- Were inclusion criteria appropriate and prespecified?
- Was study quality assessed?
- Was heterogeneity explored?
- Was publication bias assessed (funnel plots, fail-safe N)?
- Were appropriate statistical methods used?
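To make the pooling step concrete, here is a minimal sketch of fixed-effect, inverse-variance pooling, the simplest of the statistical methods asked about above. The study names, effect estimates, and standard errors are invented for illustration; a real synthesis would also consider random-effects models and the heterogeneity and publication-bias checks listed above.

```python
import math

# Hypothetical per-study effect estimates (e.g., standardized mean differences)
# with their standard errors; purely illustrative numbers.
studies = [
    {"name": "Trial A", "effect": 0.42, "se": 0.15},
    {"name": "Trial B", "effect": 0.30, "se": 0.20},
    {"name": "Trial C", "effect": 0.55, "se": 0.25},
]

# Fixed-effect (inverse-variance) pooling: weight each study by 1 / SE^2,
# so larger, more precise studies contribute more to the pooled estimate.
weights = [1 / s["se"] ** 2 for s in studies]
pooled = sum(w * s["effect"] for w, s in zip(weights, studies)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

# Cochran's Q as a crude heterogeneity check (compare to chi-square with k - 1 df).
q = sum(w * (s["effect"] - pooled) ** 2 for w, s in zip(weights, studies))

ci_low, ci_high = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"Pooled effect {pooled:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f}), Q = {q:.2f}")
```

The inverse-variance weighting is why meta-analysis gains power over any single study, and also why a few large biased studies can dominate the pooled result.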
### Level 2: Randomized Controlled Trials (RCTs)

**Description:** Experimental studies with random assignment to conditions.

**Strengths:**
- Gold standard for establishing causation
- Controls for known and unknown confounders
- Minimizes selection bias
- Enables causal inference

**Weaknesses:**
- May not be ethical or feasible
- Artificial settings may limit generalizability
- Often short-term with selected populations
- Expensive and time-consuming

**Critical evaluation:**
- Was randomization adequate (sequence generation, allocation concealment)?
- Was blinding implemented (participants, providers, assessors)?
- Was sample size adequate (power analysis)?
- Was intention-to-treat analysis used?
- Was attrition rate acceptable and balanced?
- Are results generalizable?

### Level 3: Cohort Studies

**Description:** Observational studies following groups over time.

**Types:**
- **Prospective:** Follow forward from exposure to outcome
- **Retrospective:** Look backward at existing data

**Strengths:**
- Can study multiple outcomes
- Establishes temporal sequence
- Can calculate incidence and relative risk (see the sketch after the case-control section)
- More feasible than RCTs for many questions

**Weaknesses:**
- Susceptible to confounding
- Selection bias possible
- Attrition can bias results
- Cannot prove causation definitively

**Critical evaluation:**
- Were cohorts comparable at baseline?
- Was exposure measured reliably?
- Was follow-up adequate and complete?
- Were potential confounders measured and controlled?
- Was outcome assessment blinded to exposure?

### Level 4: Case-Control Studies

**Description:** Compare people with the outcome (cases) to those without (controls), looking back at exposures.

**Strengths:**
- Efficient for rare outcomes
- Relatively quick and inexpensive
- Can study multiple exposures
- Useful for generating hypotheses

**Weaknesses:**
- Cannot calculate incidence
- Susceptible to recall bias
- Selection of controls is challenging
- Cannot prove causation

**Critical evaluation:**
- Were cases and controls defined clearly?
- Were controls appropriate (same source population)?
- Was matching appropriate?
- How was exposure ascertained (records vs. recall)?
- Were potential confounders controlled?
- Could recall bias explain findings?
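To illustrate the contrast between the two designs above, the sketch below computes incidence and a relative risk from a hypothetical cohort 2×2 table, and an odds ratio from a hypothetical case-control table, where incidence cannot be estimated because the investigator fixes the numbers of cases and controls. All counts are invented.

```python
# Hypothetical cohort study: follow exposed and unexposed groups forward
# and count who develops the outcome (all counts are invented).
exposed_cases, exposed_total = 30, 1000
unexposed_cases, unexposed_total = 10, 1000

risk_exposed = exposed_cases / exposed_total        # incidence among the exposed
risk_unexposed = unexposed_cases / unexposed_total  # incidence among the unexposed
relative_risk = risk_exposed / risk_unexposed
print(f"Cohort: incidence {risk_exposed:.1%} vs {risk_unexposed:.1%}, RR = {relative_risk:.1f}")

# Hypothetical case-control study: sample by outcome status and look backward
# at exposure. Incidence is not estimable, but the exposure odds ratio is.
cases_exposed, cases_unexposed = 40, 60
controls_exposed, controls_unexposed = 20, 80
odds_ratio = (cases_exposed / cases_unexposed) / (controls_exposed / controls_unexposed)
print(f"Case-control: OR = {odds_ratio:.1f}")
```

When the outcome is rare, the odds ratio approximates the relative risk; when the outcome is common, the two can diverge substantially.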
### Level 5: Cross-Sectional Studies

**Description:** Snapshot observation at a single point in time.

**Strengths:**
- Quick and inexpensive
- Can assess prevalence
- Useful for hypothesis generation
- Can study multiple outcomes and exposures

**Weaknesses:**
- Cannot establish temporal sequence
- Cannot determine causation
- Prevalence-incidence bias
- Survival bias

**Critical evaluation:**
- Was sample representative?
- Were measures validated?
- Could reverse causation explain findings?
- Are confounders acknowledged?

### Level 6: Case Series and Case Reports

**Description:** Description of observations in clinical practice.

**Strengths:**
- Can identify new diseases or effects
- Hypothesis-generating
- Details rare phenomena
- Quick to report

**Weaknesses:**
- No control group
- No statistical inference possible
- Highly susceptible to bias
- Cannot establish causation or frequency

**Use:** Primarily for hypothesis generation and clinical description.

### Level 7: Expert Opinion

**Description:** Statements by recognized authorities.

**Strengths:**
- Synthesizes experience
- Useful when no research available
- May integrate multiple sources

**Weaknesses:**
- Subjective and potentially biased
- May not reflect current evidence
- Appeal to authority fallacy risk
- Individual expertise varies

**Use:** Lowest level of evidence; should be supported by data when possible.

## Nuances and Limitations of Traditional Hierarchy

### When Lower-Level Evidence Can Be Strong

1. **Well-designed observational studies** with:
   - Large effects (hard to confound)
   - Dose-response relationships
   - Consistent findings across contexts
   - Biological plausibility
   - No plausible confounders
2. **Multiple converging lines of evidence** from different study types
3. **Natural experiments** approximating randomization

### When Higher-Level Evidence Can Be Weak

1. **Poor-quality RCTs** with:
   - Inadequate randomization
   - High attrition
   - No blinding when feasible
   - Conflicts of interest
2. **Biased meta-analyses**:
   - Publication bias
   - Selective inclusion
   - Inappropriate pooling
   - Poor search strategy
3. **Not addressing the right question**:
   - Wrong population
   - Wrong comparison
   - Wrong outcome
   - Too artificial to generalize

## Alternative: GRADE System

GRADE (Grading of Recommendations Assessment, Development and Evaluation) assesses evidence quality across four levels:

### High Quality

**Definition:** Very confident that the true effect is close to the estimated effect.

**Characteristics:**
- Well-conducted RCTs
- Overwhelming evidence from observational studies
- Large, consistent effects
- No serious limitations

### Moderate Quality

**Definition:** Moderately confident; the true effect is likely close to the estimate, but could be substantially different.

**Downgrades from high:**
- Some risk of bias
- Inconsistency across studies
- Indirectness (different populations/interventions)
- Imprecision (wide confidence intervals)
- Publication bias suspected

### Low Quality

**Definition:** Limited confidence; the true effect may be substantially different.

**Downgrades:**
- Serious limitations in the above factors
- Observational studies without special strengths

### Very Low Quality

**Definition:** Very limited confidence; the true effect is likely substantially different.

**Characteristics:**
- Very serious limitations
- Expert opinion
- Multiple serious flaws

## Study Quality Assessment Criteria

### Internal Validity (Bias Control)

**Questions:**
- Was randomization adequate?
- Was allocation concealed?
- Were groups similar at baseline?
- Was blinding implemented?
- Was attrition minimal and balanced?
- Was intention-to-treat used?
- Were all outcomes reported?

### External Validity (Generalizability)

**Questions:**
- Is the sample representative of the target population?
- Are inclusion/exclusion criteria too restrictive?
- Is the setting realistic?
- Are results applicable to other populations?
- Are effects consistent across subgroups?

### Statistical Conclusion Validity

**Questions:**
- Was sample size adequate (power)? See the sketch after this list.
- Were statistical tests appropriate?
- Were assumptions checked?
- Were effect sizes and confidence intervals reported?
- Were multiple comparisons addressed?
- Was analysis prespecified?
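The sample-size question can be made concrete with the standard normal-approximation formula for comparing two means. This is only a rough planning sketch, not a full power analysis; the effect sizes below are arbitrary, and it uses only the Python standard library.

```python
import math
from statistics import NormalDist

def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size per group for a two-sample comparison of means.

    Normal approximation: n = 2 * (z_{1 - alpha/2} + z_{power})^2 / d^2,
    where d is the standardized mean difference (Cohen's d).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)

# A "medium" standardized effect (d = 0.5) needs roughly 63 per group at 80% power;
# halving the detectable effect roughly quadruples the required sample size.
print(n_per_group(0.5))   # 63
print(n_per_group(0.25))  # 252
```

A study far below these rough thresholds will produce wide confidence intervals, which GRADE treats as imprecision and a reason to downgrade.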
### Construct Validity (Measurement)

**Questions:**
- Were measures validated and reliable?
- Was the outcome defined clearly and appropriately?
- Were assessors blinded?
- Were exposures measured accurately?
- Was timing of measurement appropriate?

## Critical Appraisal Tools

### For Different Study Types

**RCTs:**
- Cochrane Risk of Bias Tool
- Jadad Scale
- PEDro Scale (for trials in physical therapy)

**Observational Studies:**
- Newcastle-Ottawa Scale
- ROBINS-I (Risk of Bias in Non-randomized Studies)

**Diagnostic Studies:**
- QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies)

**Systematic Reviews:**
- AMSTAR-2 (A Measurement Tool to Assess Systematic Reviews)

**All Study Types:**
- CASP Checklists (Critical Appraisal Skills Programme)

## Domain-Specific Considerations

### Basic Science Research

**Hierarchy differs:**
1. Multiple convergent lines of evidence
2. Mechanistic understanding
3. Reproducible experiments
4. Established theoretical framework

**Key considerations:**
- Replication essential
- Mechanistic plausibility
- Consistency across model systems
- Convergence of methods

### Psychological Research

**Additional concerns:**
- Replication crisis
- Publication bias particularly problematic
- Small effect sizes often expected
- Cultural context matters
- Measures often indirect (self-report)

**Strong evidence includes:**
- Preregistered studies
- Large samples
- Multiple measures
- Behavioral (not just self-report) outcomes
- Cross-cultural replication

### Epidemiology

**Causal inference frameworks:**
- Bradford Hill criteria
- Rothman's causal pies
- Directed Acyclic Graphs (DAGs)

**Strong observational evidence:**
- Dose-response relationships
- Temporal consistency
- Biological plausibility
- Specificity
- Consistency across populations
- Large effects unlikely to be due to confounding

### Social Sciences

**Challenges:**
- Complex interventions
- Context-dependent effects
- Measurement challenges
- Ethical constraints on RCTs

**Strengthening evidence:**
- Mixed methods
- Natural experiments
- Instrumental variables
- Regression discontinuity designs
- Multiple operationalizations

## Synthesizing Evidence Across Studies

### Consistency

**Strong evidence:**
- Multiple studies, different investigators
- Different populations and settings
- Different research designs converge
- Different measurement methods

**Weak evidence:**
- Single study
- Only one research group
- Conflicting results
- Publication bias evident

### Biological/Theoretical Plausibility

**Strengthens evidence:**
- Known mechanism
- Consistent with other knowledge
- Dose-response relationship
- Coherent with animal/in vitro data

**Weakens evidence:**
- No plausible mechanism
- Contradicts established knowledge
- Biological implausibility

### Temporality

**Essential for causation:**
- Cause must precede effect
- Cross-sectional studies cannot establish it
- Reverse causation must be ruled out

### Specificity

**Moderate indicator:**
- Specific cause → specific effect strengthens causation
- But lack of specificity doesn't rule out causation
- Most causes have multiple effects

### Strength of Association

**Strong evidence:**
- Large effects unlikely to be due to confounding
- Dose-response relationships
- All-or-none effects

**Caution:**
- Small effects may still be real
- Large effects can still be confounded

## Red Flags in Evidence Quality

### Study Design Red Flags
- No control group
- Self-selected participants
- No randomization when feasible
- No blinding when feasible
- Very small sample
- Inappropriate statistical tests

### Reporting Red Flags
- Selective outcome reporting
- No study registration/protocol
- Missing methodological details
- No conflicts of interest statement
- Cherry-picked citations
- Results don't match methods

### Interpretation Red Flags
- Causal language from correlational data
- Claiming "proof"
- Ignoring limitations
- Overgeneralizing
- Spinning negative results
- Post hoc rationalization

### Context Red Flags
- Industry funding without independence
- Single study in isolation
- Contradicts preponderance of evidence
- No replication
- Published in predatory journal
- Press release before peer review

## Practical Decision Framework

### When Evaluating Evidence, Ask:

1. **What type of study is this?** (Design)
2. **How well was it conducted?** (Quality)
3. **What does it actually show?** (Results)
4. **How likely is bias?** (Internal validity)
5. **Does it apply to my question?** (External validity)
6. **How does it fit with other evidence?** (Context)
7. **Are the conclusions justified?** (Interpretation)
8. **What are the limitations?** (Uncertainty)

### Making Decisions with Imperfect Evidence

**High-quality evidence:**
- Strong confidence in acting on findings
- Reasonable to change practice/policy

**Moderate-quality evidence:**
- Provisional conclusions
- Consider in conjunction with other factors
- May warrant action depending on stakes

**Low-quality evidence:**
- Weak confidence
- Hypothesis-generating
- Insufficient for major decisions alone
- Consider cost/benefit of waiting for better evidence

**Very low-quality evidence:**
- Very uncertain
- Should not drive decisions alone
- Useful for identifying gaps and research needs

### When Evidence is Conflicting

**Strategies:**
1. Weight by study quality
2. Look for systematic differences (population, methods)
3. Consider publication bias
4. Update with the most recent, rigorous evidence
5. Conduct/await a systematic review
6. Consider whether the question is well-formed

## Communicating Evidence Strength

**Avoid:**
- Absolute certainty ("proves")
- False balance (equal weight to unequal evidence)
- Ignoring uncertainty
- Cherry-picking studies

**Better:**
- Quantify uncertainty
- Describe strength of evidence
- Acknowledge limitations
- Present the range of evidence
- Distinguish established from emerging findings
- Be clear about what is/isn't known