# Evidence Hierarchy and Quality Assessment

## Traditional Evidence Hierarchy (Medical/Clinical)

### Level 1: Systematic Reviews and Meta-Analyses
**Description:** Comprehensive synthesis of all available evidence on a question.

**Strengths:**
- Combines multiple studies for greater power
- Reduces impact of single-study anomalies
- Can identify patterns across studies
- Quantifies overall effect size

**Weaknesses:**
- Quality depends on included studies ("garbage in, garbage out")
- Publication bias can distort findings
- Heterogeneity may make pooling inappropriate
- Can mask important differences between studies

**Critical evaluation:**
- Was the search comprehensive (multiple databases, grey literature)?
- Were inclusion criteria appropriate and prespecified?
- Was study quality assessed?
- Was heterogeneity explored?
- Was publication bias assessed (funnel plots, fail-safe N)?
- Were appropriate statistical methods used?
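
To make "quantifies overall effect size" and "was heterogeneity explored" concrete, here is a minimal sketch of inverse-variance fixed-effect pooling with Cochran's Q and the I² statistic. The function name and the input numbers are hypothetical illustrations, not data from any real review, and a real meta-analysis would usually also consider a random-effects model.

```python
import math

def pool_fixed_effect(effects, std_errors):
    """Inverse-variance fixed-effect pooling with Cochran's Q and I^2 (in %)."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total_w
    pooled_se = math.sqrt(1.0 / total_w)
    # Cochran's Q: weighted squared deviations from the pooled estimate
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I^2: share of variability beyond what chance alone would produce
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    ci95 = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
    return {"pooled": pooled, "se": pooled_se, "ci95": ci95, "Q": q, "I2": i2}

# Hypothetical log risk ratios and standard errors, for illustration only
log_rr = [-0.30, -0.10, -0.45, 0.05]
se = [0.15, 0.20, 0.25, 0.10]
result = pool_fixed_effect(log_rr, se)
print(result)
```

A high I² suggests that pooling into a single number may hide real differences between studies, which is the "heterogeneity may make pooling inappropriate" weakness listed above.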

### Level 2: Randomized Controlled Trials (RCTs)
**Description:** Experimental studies with random assignment to conditions.

**Strengths:**
- Gold standard for establishing causation
- Balances known and unknown confounders across groups (in expectation)
- Minimizes selection bias
- Enables causal inference

**Weaknesses:**
- May not be ethical or feasible
- Artificial settings may limit generalizability
- Often short-term with selected populations
- Expensive and time-consuming

**Critical evaluation:**
- Was randomization adequate (sequence generation, allocation concealment)?
- Was blinding implemented (participants, providers, assessors)?
- Was sample size adequate (power analysis)?
- Was intention-to-treat analysis used?
- Was attrition rate acceptable and balanced?
- Are results generalizable?
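
The sample size question can be checked with a rough calculation. The sketch below uses the standard normal-approximation formula for a two-arm trial comparing means, n per group = 2((z₁₋α/₂ + z₁₋β) / d)², where d is the standardized effect size. The function name and defaults are illustrative assumptions; exact t-based methods give slightly larger numbers.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size_d, alpha=0.05, power=0.80):
    """Approximate participants needed per arm (two-sided test, equal arms)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for two-sided alpha
    z_beta = z.inv_cdf(power)           # quantile matching the desired power
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size_d) ** 2)

# A medium standardized effect (d = 0.5) versus a small one (d = 0.2)
print(n_per_group(0.5), n_per_group(0.2))
```

Comparing a trial's actual enrollment against such an estimate gives a quick sense of whether a null result reflects no effect or simply too few participants.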

### Level 3: Cohort Studies
**Description:** Observational studies following groups over time.

**Types:**
- **Prospective:** Follow forward from exposure to outcome
- **Retrospective:** Look backward at existing data

**Strengths:**
- Can study multiple outcomes
- Establishes temporal sequence
- Can calculate incidence and relative risk
- More feasible than RCTs for many questions

**Weaknesses:**
- Susceptible to confounding
- Selection bias possible
- Attrition can bias results
- Cannot prove causation definitively

**Critical evaluation:**
- Were cohorts comparable at baseline?
- Was exposure measured reliably?
- Was follow-up adequate and complete?
- Were potential confounders measured and controlled?
- Was outcome assessment blinded to exposure?
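
As an illustration of "can calculate incidence and relative risk", the following sketch computes a relative risk and an approximate 95% confidence interval from cohort counts using the usual log-scale standard error. The function name and the counts are hypothetical, and the estimate is unadjusted, so confounding is not addressed.

```python
import math

def relative_risk(exposed_events, exposed_total, unexposed_events, unexposed_total):
    """Unadjusted relative risk with an approximate 95% CI (log method)."""
    risk_exposed = exposed_events / exposed_total        # incidence in exposed
    risk_unexposed = unexposed_events / unexposed_total  # incidence in unexposed
    rr = risk_exposed / risk_unexposed
    se_log_rr = math.sqrt(
        1 / exposed_events - 1 / exposed_total
        + 1 / unexposed_events - 1 / unexposed_total
    )
    lower = math.exp(math.log(rr) - 1.96 * se_log_rr)
    upper = math.exp(math.log(rr) + 1.96 * se_log_rr)
    return rr, lower, upper

# Hypothetical cohort: 30/100 events among exposed, 10/100 among unexposed
print(relative_risk(30, 100, 10, 100))
```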

### Level 4: Case-Control Studies
**Description:** Compare people with outcome (cases) to those without (controls), looking back at exposures.

**Strengths:**
- Efficient for rare outcomes
- Relatively quick and inexpensive
- Can study multiple exposures
- Useful for generating hypotheses

**Weaknesses:**
- Cannot calculate incidence or relative risk directly (odds ratios are used instead)
- Susceptible to recall bias
- Selection of controls is challenging
- Cannot prove causation

**Critical evaluation:**
- Were cases and controls defined clearly?
- Were controls appropriate (same source population)?
- Was matching appropriate?
- How was exposure ascertained (records vs. recall)?
- Were potential confounders controlled?
- Could recall bias explain findings?
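
Because case-control studies sample on the outcome, the usual effect measure is the odds ratio. This is a minimal sketch with a Woolf (log-scale) 95% confidence interval; the function name and counts are hypothetical, and the estimate is unmatched and unadjusted.

```python
import math

def odds_ratio(cases_exposed, cases_unexposed, controls_exposed, controls_unexposed):
    """Unadjusted odds ratio with an approximate 95% CI (Woolf method)."""
    or_ = (cases_exposed * controls_unexposed) / (cases_unexposed * controls_exposed)
    se_log_or = math.sqrt(
        1 / cases_exposed + 1 / cases_unexposed
        + 1 / controls_exposed + 1 / controls_unexposed
    )
    lower = math.exp(math.log(or_) - 1.96 * se_log_or)
    upper = math.exp(math.log(or_) + 1.96 * se_log_or)
    return or_, lower, upper

# Hypothetical study: 40 of 100 cases exposed, 20 of 100 controls exposed
print(odds_ratio(40, 60, 20, 80))
```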

### Level 5: Cross-Sectional Studies
**Description:** Snapshot observation at a single point in time.

**Strengths:**
- Quick and inexpensive
- Can assess prevalence
- Useful for hypothesis generation
- Can study multiple outcomes and exposures

**Weaknesses:**
- Cannot establish temporal sequence
- Cannot determine causation
- Prevalence-incidence bias
- Survivorship bias

**Critical evaluation:**
- Was sample representative?
- Were measures validated?
- Could reverse causation explain findings?
- Are confounders acknowledged?

### Level 6: Case Series and Case Reports
**Description:** Description of observations in clinical practice.

**Strengths:**
- Can identify new diseases or effects
- Hypothesis-generating
- Details rare phenomena
- Quick to report

**Weaknesses:**
- No control group
- No statistical inference possible
- Highly susceptible to bias
- Cannot establish causation or frequency

**Use:** Primarily for hypothesis generation and clinical description.

### Level 7: Expert Opinion
**Description:** Statements by recognized authorities.

**Strengths:**
- Synthesizes experience
- Useful when no research is available
- May integrate multiple sources

**Weaknesses:**
- Subjective and potentially biased
- May not reflect current evidence
- Risk of appeal-to-authority fallacy
- Individual expertise varies

**Use:** Lowest level of evidence; should be supported by data when possible.

## Nuances and Limitations of Traditional Hierarchy

### When Lower-Level Evidence Can Be Strong
1. **Well-designed observational studies** with:
   - Large effects (hard to confound)
   - Dose-response relationships
   - Consistent findings across contexts
   - Biological plausibility
   - No plausible confounders

2. **Multiple converging lines of evidence** from different study types

3. **Natural experiments** approximating randomization

### When Higher-Level Evidence Can Be Weak
1. **Poor-quality RCTs** with:
   - Inadequate randomization
   - High attrition
   - No blinding when feasible
   - Conflicts of interest

2. **Biased meta-analyses**:
   - Publication bias
   - Selective inclusion
   - Inappropriate pooling
   - Poor search strategy

3. **Not addressing the right question**:
   - Wrong population
   - Wrong comparison
   - Wrong outcome
   - Too artificial to generalize

## Alternative: GRADE System

GRADE (Grading of Recommendations Assessment, Development and Evaluation) rates the quality (certainty) of evidence at four levels:

### High Quality
**Definition:** Very confident that true effect is close to estimated effect.

**Characteristics:**
- Well-conducted RCTs
- Overwhelming evidence from observational studies
- Large, consistent effects
- No serious limitations

### Moderate Quality
**Definition:** Moderately confident; true effect likely close to estimated, but could be substantially different.

**Downgrades from high:**
- Some risk of bias
- Inconsistency across studies
- Indirectness (different populations/interventions)
- Imprecision (wide confidence intervals)
- Publication bias suspected

### Low Quality
**Definition:** Limited confidence; true effect may be substantially different.

**Downgrades:**
- Serious limitations in above factors
- Observational studies without special strengths

### Very Low Quality
**Definition:** Very limited confidence; true effect likely substantially different.

**Characteristics:**
- Very serious limitations
- Expert opinion
- Multiple serious flaws
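
The GRADE logic described above can be sketched as simple arithmetic: randomized evidence starts at high, observational evidence starts at low, each serious concern moves the rating down, and special strengths (such as a large effect or a dose-response gradient) can move observational evidence up. The function, the domain keys, and the point values are simplified assumptions for illustration; real GRADE assessments rely on structured judgment rather than a formula.

```python
LEVELS = {4: "High", 3: "Moderate", 2: "Low", 1: "Very low"}

def grade_rating(design, downgrades=None, upgrades=0):
    """Simplified GRADE-style rating.

    design: "rct" starts at High; anything else starts at Low.
    downgrades: dict mapping domain -> 1 (serious) or 2 (very serious).
    upgrades: points for special strengths, e.g. large effect, dose-response.
    """
    start = 4 if design == "rct" else 2
    penalty = sum((downgrades or {}).values())
    score = max(1, min(4, start - penalty + upgrades))  # clamp to 1..4
    return LEVELS[score]

# An RCT body of evidence with serious risk of bias and serious imprecision
print(grade_rating("rct", {"risk_of_bias": 1, "imprecision": 1}))
# Observational evidence with a large effect
print(grade_rating("observational", upgrades=1))
```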

## Study Quality Assessment Criteria

### Internal Validity (Bias Control)
**Questions:**
- Was randomization adequate?
- Was allocation concealed?
- Were groups similar at baseline?
- Was blinding implemented?
- Was attrition minimal and balanced?
- Was intention-to-treat analysis used?
- Were all outcomes reported?

### External Validity (Generalizability)
**Questions:**
- Is sample representative of target population?
- Are inclusion/exclusion criteria too restrictive?
- Is setting realistic?
- Are results applicable to other populations?
- Are effects consistent across subgroups?

### Statistical Conclusion Validity
**Questions:**
- Was sample size adequate (power)?
- Were statistical tests appropriate?
- Were assumptions checked?
- Were effect sizes and confidence intervals reported?
- Were multiple comparisons addressed?
- Was analysis prespecified?
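
One common way to address multiple comparisons is the Holm-Bonferroni step-down procedure, sketched below. It is offered as one example of a correction, not as the only acceptable method; the function name and p-values are hypothetical.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # Compare the k-th smallest p-value against alpha / (m - k + 1)
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject

# Hypothetical p-values from four outcomes in one study
print(holm_bonferroni([0.01, 0.04, 0.03, 0.005]))
```

Here two results that would pass an uncorrected 0.05 threshold (0.03 and 0.04) no longer count as significant once the number of tests is taken into account.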

### Construct Validity (Measurement)
**Questions:**
- Were measures validated and reliable?
- Was outcome defined clearly and appropriately?
- Were assessors blinded?
- Were exposures measured accurately?
- Was timing of measurement appropriate?

## Critical Appraisal Tools

### For Different Study Types

**RCTs:**
- Cochrane Risk of Bias Tool (RoB 2)
- Jadad Scale
- PEDro Scale (for trials in physical therapy)

**Observational Studies:**
- Newcastle-Ottawa Scale
- ROBINS-I (Risk Of Bias In Non-randomized Studies of Interventions)

**Diagnostic Studies:**
- QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies)

**Systematic Reviews:**
- AMSTAR-2 (A Measurement Tool to Assess Systematic Reviews)

**All Study Types:**
- CASP Checklists (Critical Appraisal Skills Programme)

## Domain-Specific Considerations

### Basic Science Research
**Hierarchy differs:**
1. Multiple convergent lines of evidence
2. Mechanistic understanding
3. Reproducible experiments
4. Established theoretical framework

**Key considerations:**
- Replication essential
- Mechanistic plausibility
- Consistency across model systems
- Convergence of methods

### Psychological Research
**Additional concerns:**
- Replication crisis
- Publication bias particularly problematic
- Small effect sizes often expected
- Cultural context matters
- Measures often indirect (self-report)

**Strong evidence includes:**
- Preregistered studies
- Large samples
- Multiple measures
- Behavioral (not just self-report) outcomes
- Cross-cultural replication

### Epidemiology
**Causal inference frameworks:**
- Bradford Hill criteria
- Rothman's causal pies
- Directed Acyclic Graphs (DAGs)

**Strong observational evidence:**
- Dose-response relationships
- Temporality (exposure precedes outcome)
- Biological plausibility
- Specificity
- Consistency across populations
- Large effects unlikely due to confounding

### Social Sciences
**Challenges:**
- Complex interventions
- Context-dependent effects
- Measurement challenges
- Ethical constraints on RCTs

**Strengthening evidence:**
- Mixed methods
- Natural experiments
- Instrumental variables
- Regression discontinuity designs
- Multiple operationalizations

## Synthesizing Evidence Across Studies

### Consistency
**Strong evidence:**
- Multiple studies, different investigators
- Different populations and settings
- Different research designs converge
- Different measurement methods

**Weak evidence:**
- Single study
- Only one research group
- Conflicting results
- Publication bias evident

### Biological/Theoretical Plausibility
**Strengthens evidence:**
- Known mechanism
- Consistent with other knowledge
- Dose-response relationship
- Coherent with animal/in vitro data

**Weakens evidence:**
- No plausible mechanism
- Contradicts established knowledge
- Biological implausibility

### Temporality
**Essential for causation:**
- Cause must precede effect
- Cross-sectional studies cannot establish temporal order
- Reverse causation must be ruled out

### Specificity
**Moderate indicator:**
- Specific cause → specific effect strengthens causation
- But lack of specificity doesn't rule out causation
- Most causes have multiple effects

### Strength of Association
**Strong evidence:**
- Large effects unlikely to be due to confounding
- Dose-response relationships
- All-or-none effects

**Caution:**
- Small effects may still be real
- Large effects can still be confounded
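
One way to quantify "large effects unlikely to be due to confounding" is the E-value proposed by VanderWeele and Ding, a technique not named in this document and introduced here only as an illustration. It gives the minimum strength of association, on the risk ratio scale, that an unmeasured confounder would need with both exposure and outcome to fully explain away an observed risk ratio. The function name is a hypothetical choice.

```python
import math

def e_value(risk_ratio):
    """E-value for an observed risk ratio: RR + sqrt(RR * (RR - 1))."""
    # For protective effects (RR < 1), work with the reciprocal
    rr = risk_ratio if risk_ratio >= 1 else 1 / risk_ratio
    return rr + math.sqrt(rr * (rr - 1))

# A modest association needs only a modest confounder to explain it away;
# a large association needs a much stronger one.
print(e_value(1.2), e_value(5.0))
```

A small E-value is a reminder that a weak association can be produced by plausible confounding, while a large one only makes confounding less likely and does not exclude it.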

## Red Flags in Evidence Quality

### Study Design Red Flags
- No control group
- Self-selected participants
- No randomization when feasible
- No blinding when feasible
- Very small sample
- Inappropriate statistical tests

### Reporting Red Flags
- Selective outcome reporting
- No study registration/protocol
- Missing methodological details
- No conflicts of interest statement
- Cherry-picked citations
- Results don't match methods

### Interpretation Red Flags
- Causal language from correlational data
- Claiming "proof"
- Ignoring limitations
- Overgeneralizing
- Spinning negative results
- Post hoc rationalization

### Context Red Flags
- Industry funding without independence
- Single study in isolation
- Contradicts preponderance of evidence
- No replication
- Published in predatory journal
- Press release before peer review

## Practical Decision Framework

### When Evaluating Evidence, Ask:

1. **What type of study is this?** (Design)
2. **How well was it conducted?** (Quality)
3. **What does it actually show?** (Results)
4. **How likely is bias?** (Internal validity)
5. **Does it apply to my question?** (External validity)
6. **How does it fit with other evidence?** (Context)
7. **Are the conclusions justified?** (Interpretation)
8. **What are the limitations?** (Uncertainty)

### Making Decisions with Imperfect Evidence

**High-quality evidence:**
- Strong confidence in acting on findings
- Reasonable to change practice/policy

**Moderate-quality evidence:**
- Provisional conclusions
- Consider in conjunction with other factors
- May warrant action depending on stakes

**Low-quality evidence:**
- Weak confidence
- Hypothesis-generating
- Insufficient for major decisions alone
- Consider cost/benefit of waiting for better evidence

**Very low-quality evidence:**
- Very uncertain
- Should not drive decisions alone
- Useful for identifying gaps and research needs

### When Evidence is Conflicting

**Strategies:**
1. Weight by study quality
2. Look for systematic differences (population, methods)
3. Consider publication bias
4. Update with most recent, rigorous evidence
5. Conduct/await systematic review
6. Consider whether the question is well-formed

## Communicating Evidence Strength

**Avoid:**
- Absolute certainty ("proves")
- False balance (equal weight to unequal evidence)
- Ignoring uncertainty
- Cherry-picking studies

**Better:**
- Quantify uncertainty
- Describe strength of evidence
- Acknowledge limitations
- Present range of evidence
- Distinguish established from emerging findings
- Be clear about what is/isn't known