# Evidence Hierarchy and Quality Assessment
## Traditional Evidence Hierarchy (Medical/Clinical)
### Level 1: Systematic Reviews and Meta-Analyses
**Description:** Comprehensive synthesis of all available evidence on a question.
**Strengths:**
- Combines multiple studies for greater power
- Reduces impact of single-study anomalies
- Can identify patterns across studies
- Quantifies overall effect size
**Weaknesses:**
- Quality depends on included studies ("garbage in, garbage out")
- Publication bias can distort findings
- Heterogeneity may make pooling inappropriate
- Can mask important differences between studies
**Critical evaluation:**
- Was search comprehensive (multiple databases, grey literature)?
- Were inclusion criteria appropriate and prespecified?
- Was study quality assessed?
- Was heterogeneity explored (e.g. with Cochran's Q or I²; see the sketch after this list)?
- Was publication bias assessed (funnel plots, fail-safe N)?
- Were appropriate statistical methods used?
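The heterogeneity and pooling questions above reduce to a small amount of arithmetic. Below is a minimal sketch in Python, using invented effect estimates and standard errors, of inverse-variance (fixed-effect) pooling together with Cochran's Q and the I² statistic; it illustrates the calculation, not a full meta-analysis workflow.

```python
# Minimal sketch: fixed-effect (inverse-variance) pooling plus Cochran's Q
# and I^2. All effect sizes and standard errors below are invented.
import numpy as np

effects = np.array([-0.35, -0.20, -0.41, -0.05, -0.28])  # log risk ratios
se = np.array([0.12, 0.18, 0.15, 0.22, 0.10])            # standard errors

weights = 1.0 / se**2                                # inverse-variance weights
pooled = np.sum(weights * effects) / np.sum(weights)
pooled_se = np.sqrt(1.0 / np.sum(weights))

q = np.sum(weights * (effects - pooled) ** 2)        # Cochran's Q
df = len(effects) - 1
i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

print(f"Pooled log RR: {pooled:.3f} "
      f"(95% CI {pooled - 1.96*pooled_se:.3f} to {pooled + 1.96*pooled_se:.3f})")
print(f"Cochran's Q = {q:.2f} on {df} df, I^2 = {i_squared:.1f}%")
```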
### Level 2: Randomized Controlled Trials (RCTs)
**Description:** Experimental studies with random assignment to conditions.
**Strengths:**
- Gold standard for establishing causation
- Controls for known and unknown confounders
- Minimizes selection bias
- Enables causal inference
**Weaknesses:**
- May not be ethical or feasible
- Artificial settings may limit generalizability
- Often short-term with selected populations
- Expensive and time-consuming
**Critical evaluation:**
- Was randomization adequate (sequence generation, allocation concealment)?
- Was blinding implemented (participants, providers, assessors)?
- Was sample size adequate (power analysis; see the sketch after this list)?
- Was intention-to-treat analysis used?
- Was attrition rate acceptable and balanced?
- Are results generalizable?
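The sample-size question has a standard arithmetic answer. Here is a minimal sketch assuming a standardized effect of d = 0.5, 80% power, and a two-sided α of 0.05; it uses the normal-approximation formula and cross-checks it against statsmodels' power calculator.

```python
# Minimal sketch: sample size per group for a two-arm trial.
# Assumed inputs: standardized effect d = 0.5, 80% power, two-sided alpha = 0.05.
from scipy.stats import norm

d, alpha, power = 0.5, 0.05, 0.80
z_alpha = norm.ppf(1 - alpha / 2)
z_beta = norm.ppf(power)

n_per_group = 2 * ((z_alpha + z_beta) / d) ** 2      # normal approximation
print(f"Normal approximation: ~{n_per_group:.0f} per group")  # ~63

# Cross-check with statsmodels (solves the t-test power equation)
from statsmodels.stats.power import TTestIndPower
print(TTestIndPower().solve_power(effect_size=d, alpha=alpha, power=power))
```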
### Level 3: Cohort Studies
**Description:** Observational studies following groups over time.
**Types:**
- **Prospective:** Follow forward from exposure to outcome
- **Retrospective:** Look backward at existing data
**Strengths:**
- Can study multiple outcomes
- Establishes temporal sequence
- Can calculate incidence and relative risk (illustrated in the sketch below)
- More feasible than RCTs for many questions
**Weaknesses:**
- Susceptible to confounding
- Selection bias possible
- Attrition can bias results
- Cannot prove causation definitively
**Critical evaluation:**
- Were cohorts comparable at baseline?
- Was exposure measured reliably?
- Was follow-up adequate and complete?
- Were potential confounders measured and controlled?
- Was outcome assessment blinded to exposure?
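As a worked illustration of the incidence and relative-risk calculation mentioned under strengths, the sketch below uses invented counts for an exposed and an unexposed cohort followed over the same period; the confidence interval is the standard large-sample log-scale approximation.

```python
# Minimal sketch: cumulative incidence and relative risk from a cohort,
# using invented counts (exposed vs. unexposed, equal follow-up).
import math

exposed_cases, exposed_total = 30, 1000
unexposed_cases, unexposed_total = 15, 1000

incidence_exposed = exposed_cases / exposed_total        # 0.030
incidence_unexposed = unexposed_cases / unexposed_total  # 0.015
rr = incidence_exposed / incidence_unexposed             # relative risk = 2.0

# Approximate 95% CI for the RR on the log scale
se_log_rr = math.sqrt(1/exposed_cases - 1/exposed_total
                      + 1/unexposed_cases - 1/unexposed_total)
lo = math.exp(math.log(rr) - 1.96 * se_log_rr)
hi = math.exp(math.log(rr) + 1.96 * se_log_rr)
print(f"RR = {rr:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```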
### Level 4: Case-Control Studies
**Description:** Compare people with outcome (cases) to those without (controls), looking back at exposures.
**Strengths:**
- Efficient for rare outcomes
- Relatively quick and inexpensive
- Can study multiple exposures
- Useful for generating hypotheses
**Weaknesses:**
- Cannot calculate incidence (the odds ratio is used instead; see the sketch below)
- Susceptible to recall bias
- Selection of controls is challenging
- Cannot prove causation
**Critical evaluation:**
- Were cases and controls defined clearly?
- Were controls appropriate (same source population)?
- Was matching appropriate?
- How was exposure ascertained (records vs. recall)?
- Were potential confounders controlled?
- Could recall bias explain findings?
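Because a case-control design fixes the numbers of cases and controls, it estimates an odds ratio rather than an incidence or risk ratio. The sketch below uses invented counts to show the cross-product odds ratio and a Woolf (log-scale) confidence interval.

```python
# Minimal sketch: odds ratio from a case-control study with invented counts.
# Incidence cannot be computed, but exposure odds can be compared.
import math

cases_exposed, cases_unexposed = 40, 60        # among people with the outcome
controls_exposed, controls_unexposed = 20, 80  # among people without it

odds_ratio = (cases_exposed * controls_unexposed) / (cases_unexposed * controls_exposed)

# Approximate 95% CI (Woolf method, log scale)
se_log_or = math.sqrt(1/cases_exposed + 1/cases_unexposed
                      + 1/controls_exposed + 1/controls_unexposed)
lo = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
hi = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
print(f"OR = {odds_ratio:.2f}, 95% CI {lo:.2f} to {hi:.2f}")  # OR ≈ 2.67
```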
### Level 5: Cross-Sectional Studies
**Description:** Snapshot observation at single point in time.
**Strengths:**
- Quick and inexpensive
- Can assess prevalence (see the sketch below)
- Useful for hypothesis generation
- Can study multiple outcomes and exposures
**Weaknesses:**
- Cannot establish temporal sequence
- Cannot determine causation
- Prevalence-incidence bias
- Survival bias
**Critical evaluation:**
- Was sample representative?
- Were measures validated?
- Could reverse causation explain findings?
- Are confounders acknowledged?
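Estimating prevalence, noted as a strength above, is simple arithmetic; the minimal sketch below uses invented counts and statsmodels' Wilson interval to quantify the uncertainty around the proportion.

```python
# Minimal sketch: prevalence from a cross-sectional sample (invented counts)
# with a Wilson 95% confidence interval.
from statsmodels.stats.proportion import proportion_confint

with_condition, sample_size = 120, 1500
prevalence = with_condition / sample_size                      # 0.08
lo, hi = proportion_confint(with_condition, sample_size,
                            alpha=0.05, method="wilson")
print(f"Prevalence = {prevalence:.1%}, 95% CI {lo:.1%} to {hi:.1%}")
```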
### Level 6: Case Series and Case Reports
**Description:** Description of observations in clinical practice.
**Strengths:**
- Can identify new diseases or effects
- Hypothesis-generating
- Details rare phenomena
- Quick to report
**Weaknesses:**
- No control group
- No statistical inference possible
- Highly susceptible to bias
- Cannot establish causation or frequency
**Use:** Primarily for hypothesis generation and clinical description.
### Level 7: Expert Opinion
**Description:** Statements by recognized authorities.
**Strengths:**
- Synthesizes experience
- Useful when no research available
- May integrate multiple sources
**Weaknesses:**
- Subjective and potentially biased
- May not reflect current evidence
- Appeal to authority fallacy risk
- Individual expertise varies
**Use:** Lowest level of evidence; should be supported by data when possible.
## Nuances and Limitations of Traditional Hierarchy
### When Lower-Level Evidence Can Be Strong
1. **Well-designed observational studies** with:
- Large effects (hard to confound)
- Dose-response relationships
- Consistent findings across contexts
- Biological plausibility
- No plausible confounders
2. **Multiple converging lines of evidence** from different study types
3. **Natural experiments** approximating randomization
### When Higher-Level Evidence Can Be Weak
1. **Poor-quality RCTs** with:
- Inadequate randomization
- High attrition
- No blinding when feasible
- Conflicts of interest
2. **Biased meta-analyses**:
- Publication bias
- Selective inclusion
- Inappropriate pooling
- Poor search strategy
3. **Not addressing the right question**:
- Wrong population
- Wrong comparison
- Wrong outcome
- Too artificial to generalize
## Alternative: GRADE System
GRADE (Grading of Recommendations Assessment, Development and Evaluation) assesses evidence quality across four levels:
### High Quality
**Definition:** Very confident that true effect is close to estimated effect.
**Characteristics:**
- Well-conducted RCTs
- Overwhelming evidence from observational studies
- Large, consistent effects
- No serious limitations
### Moderate Quality
**Definition:** Moderately confident; true effect likely close to estimated, but could be substantially different.
**Downgrades from high:**
- Some risk of bias
- Inconsistency across studies
- Indirectness (different populations/interventions)
- Imprecision (wide confidence intervals)
- Publication bias suspected
### Low Quality
**Definition:** Limited confidence; true effect may be substantially different.
**Downgrades:**
- Serious limitations in above factors
- Observational studies without special strengths
### Very Low Quality
**Definition:** Very limited confidence; true effect likely substantially different.
**Characteristics:**
- Very serious limitations
- Expert opinion
- Multiple serious flaws
## Study Quality Assessment Criteria
### Internal Validity (Bias Control)
**Questions:**
- Was randomization adequate?
- Was allocation concealed?
- Were groups similar at baseline?
- Was blinding implemented?
- Was attrition minimal and balanced?
- Was intention-to-treat used?
- Were all outcomes reported?
### External Validity (Generalizability)
**Questions:**
- Is sample representative of target population?
- Are inclusion/exclusion criteria too restrictive?
- Is setting realistic?
- Are results applicable to other populations?
- Are effects consistent across subgroups?
### Statistical Conclusion Validity
**Questions:**
- Was sample size adequate (power)?
- Were statistical tests appropriate?
- Were assumptions checked?
- Were effect sizes and confidence intervals reported?
- Were multiple comparisons addressed (see the sketch after this list)?
- Was analysis prespecified?
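The multiple-comparisons question has a mechanical answer once the full set of tests is known. Here is a minimal sketch, with invented p-values, of the Holm and Bonferroni adjustments using statsmodels.

```python
# Minimal sketch: adjusting a set of invented p-values for multiple
# comparisons with the Holm and Bonferroni procedures.
from statsmodels.stats.multitest import multipletests

p_values = [0.001, 0.012, 0.030, 0.047, 0.200]

for method in ("holm", "bonferroni"):
    reject, adjusted, _, _ = multipletests(p_values, alpha=0.05, method=method)
    print(method, [f"{p:.3f}" for p in adjusted], list(reject))
```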
### Construct Validity (Measurement)
**Questions:**
- Were measures validated and reliable?
- Was outcome defined clearly and appropriately?
- Were assessors blinded?
- Were exposures measured accurately?
- Was timing of measurement appropriate?
## Critical Appraisal Tools
### For Different Study Types
**RCTs:**
- Cochrane Risk of Bias Tool
- Jadad Scale
- PEDro Scale (for trials in physical therapy)
**Observational Studies:**
- Newcastle-Ottawa Scale
- ROBINS-I (Risk Of Bias In Non-randomized Studies of Interventions)
**Diagnostic Studies:**
- QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies)
**Systematic Reviews:**
- AMSTAR-2 (A Measurement Tool to Assess Systematic Reviews)
**All Study Types:**
- CASP Checklists (Critical Appraisal Skills Programme)
## Domain-Specific Considerations
### Basic Science Research
**Hierarchy differs:**
1. Multiple convergent lines of evidence
2. Mechanistic understanding
3. Reproducible experiments
4. Established theoretical framework
**Key considerations:**
- Replication essential
- Mechanistic plausibility
- Consistency across model systems
- Convergence of methods
### Psychological Research
**Additional concerns:**
- Replication crisis
- Publication bias particularly problematic
- Small effect sizes often expected
- Cultural context matters
- Measures often indirect (self-report)
**Strong evidence includes:**
- Preregistered studies
- Large samples
- Multiple measures
- Behavioral (not just self-report) outcomes
- Cross-cultural replication
### Epidemiology
**Causal inference frameworks:**
- Bradford Hill criteria
- Rothman's causal pies
- Directed Acyclic Graphs (DAGs)
**Strong observational evidence:**
- Dose-response relationships
- Temporality (exposure clearly precedes outcome)
- Biological plausibility
- Specificity
- Consistency across populations
- Effects too large to be plausibly explained by confounding
### Social Sciences
**Challenges:**
- Complex interventions
- Context-dependent effects
- Measurement challenges
- Ethical constraints on RCTs
**Strengthening evidence:**
- Mixed methods
- Natural experiments
- Instrumental variables
- Regression discontinuity designs
- Multiple operationalizations
## Synthesizing Evidence Across Studies
### Consistency
**Strong evidence:**
- Multiple studies, different investigators
- Different populations and settings
- Different research designs converge
- Different measurement methods
**Weak evidence:**
- Single study
- Only one research group
- Conflicting results
- Publication bias evident
### Biological/Theoretical Plausibility
**Strengthens evidence:**
- Known mechanism
- Consistent with other knowledge
- Dose-response relationship
- Coherent with animal/in vitro data
**Weakens evidence:**
- No plausible mechanism
- Contradicts established knowledge
- Biological implausibility
### Temporality
**Essential for causation:**
- Cause must precede effect
- Cross-sectional studies cannot establish temporal order
- Reverse causation must be ruled out
### Specificity
**Moderate indicator:**
- Specific cause → specific effect strengthens the case for causation
- But lack of specificity doesn't rule out causation
- Most causes have multiple effects
### Strength of Association
**Strong evidence:**
- Large effects unlikely to be due to confounding
- Dose-response relationships
- All-or-none effects
**Caution:**
- Small effects may still be real
- Large effects can still be confounded
## Red Flags in Evidence Quality
### Study Design Red Flags
- No control group
- Self-selected participants
- No randomization when feasible
- No blinding when feasible
- Very small sample
- Inappropriate statistical tests
### Reporting Red Flags
- Selective outcome reporting
- No study registration/protocol
- Missing methodological details
- No conflicts of interest statement
- Cherry-picked citations
- Results don't match methods
### Interpretation Red Flags
- Causal language from correlational data
- Claiming "proof"
- Ignoring limitations
- Overgeneralizing
- Spinning negative results
- Post hoc rationalization
### Context Red Flags
- Industry funding without independence
- Single study in isolation
- Contradicts preponderance of evidence
- No replication
- Published in predatory journal
- Press release before peer review
## Practical Decision Framework
### When Evaluating Evidence, Ask:
1. **What type of study is this?** (Design)
2. **How well was it conducted?** (Quality)
3. **What does it actually show?** (Results)
4. **How likely is bias?** (Internal validity)
5. **Does it apply to my question?** (External validity)
6. **How does it fit with other evidence?** (Context)
7. **Are the conclusions justified?** (Interpretation)
8. **What are the limitations?** (Uncertainty)
### Making Decisions with Imperfect Evidence
**High-quality evidence:**
- Strong confidence in acting on findings
- Reasonable to change practice/policy
**Moderate-quality evidence:**
- Provisional conclusions
- Consider in conjunction with other factors
- May warrant action depending on stakes
**Low-quality evidence:**
- Weak confidence
- Hypothesis-generating
- Insufficient for major decisions alone
- Consider cost/benefit of waiting for better evidence
**Very low-quality evidence:**
- Very uncertain
- Should not drive decisions alone
- Useful for identifying gaps and research needs
### When Evidence is Conflicting
**Strategies:**
1. Weight by study quality
2. Look for systematic differences (population, methods)
3. Consider publication bias
4. Update with most recent, rigorous evidence
5. Conduct/await systematic review
6. Consider if question is well-formed
## Communicating Evidence Strength
**Avoid:**
- Absolute certainty ("proves")
- False balance (equal weight to unequal evidence)
- Ignoring uncertainty
- Cherry-picking studies
**Better:**
- Quantify uncertainty
- Describe strength of evidence
- Acknowledge limitations
- Present range of evidence
- Distinguish established from emerging findings
- Be clear about what is/isn't known