2025-11-30 08:38:26 +08:00


Domain Research: Health Science - Advanced Methodology

Workflow

Health Research Progress:
- [ ] Step 1: Formulate research question (PICOT)
- [ ] Step 2: Assess evidence hierarchy and study design
- [ ] Step 3: Evaluate study quality and bias
- [ ] Step 4: Prioritize and define outcomes
- [ ] Step 5: Synthesize evidence and grade certainty
- [ ] Step 6: Create decision-ready summary

Step 1: Formulate research question (PICOT)

Define precise PICOT elements for an answerable research question (see template.md for the framework).

Step 2: Assess evidence hierarchy and study design

Match study design to question type using 1. Evidence Hierarchy (RCT for therapy, cohort for prognosis, cross-sectional for diagnosis).

Step 3: Evaluate study quality and bias

Apply systematic bias assessment using 2. Bias Assessment (Cochrane RoB 2, ROBINS-I, or QUADAS-2 depending on design).

Step 4: Prioritize and define outcomes

Distinguish patient-important from surrogate outcomes using 6. Outcome Measurement guidance on MCID, composite outcomes, and surrogates.

Step 5: Synthesize evidence and grade certainty

Rate certainty using 3. GRADE Framework (downgrade for bias/inconsistency/indirectness/imprecision/publication bias, upgrade for large effects/dose-response). For multiple studies, apply 4. Meta-Analysis Techniques.

Step 6: Create decision-ready summary

Synthesize findings using 8. Knowledge Translation evidence-to-decision framework, assess applicability per 7. Special Populations & Contexts, and avoid 9. Common Pitfalls.


1. Evidence Hierarchy

Study Design Selection by Question Type

Therapy/Intervention Questions:

  • Gold standard: RCT (randomized controlled trial)
  • When RCT not feasible: Prospective cohort or pragmatic trial
  • Never acceptable: Case series, expert opinion for causal claims
  • Rationale: RCTs minimize confounding through randomization, establishing causation

Diagnostic Accuracy Questions:

  • Gold standard: Cross-sectional study with consecutive enrollment
  • Critical requirement: Compare index test to validated reference standard in same patients
  • Avoid: Case-control design (inflates sensitivity/specificity by selecting extremes)
  • Rationale: Cross-sectional design prevents spectrum bias; consecutive enrollment prevents selection bias

Prognosis/Prediction Questions:

  • Gold standard: Prospective cohort (follow from exposure to outcome)
  • Acceptable: Retrospective cohort with robust data (registries, databases)
  • Avoid: Case-control (can't estimate incidence), cross-sectional (no temporal sequence)
  • Rationale: Cohort design establishes temporal sequence, allows incidence calculation

Harm/Safety Questions:

  • Common harms: RCTs (adequate power for events occurring in >1% of patients)
  • Rare harms: Large observational studies (cohort, case-control, pharmacovigilance)
  • Delayed harms: Long-term cohort studies or registries
  • Rationale: RCTs often lack power/duration for rare or delayed harms; observational studies provide larger samples and longer follow-up

Hierarchy by Evidence Strength

  • Level 1 (Highest): Systematic reviews and meta-analyses of well-designed RCTs
  • Level 2: Individual large, well-designed RCT with low risk of bias
  • Level 3: Well-designed RCTs with some limitations (quasi-randomized, not blinded)
  • Level 4: Cohort studies (prospective better than retrospective)
  • Level 5: Case-control studies
  • Level 6: Cross-sectional surveys (descriptive only, not causal)
  • Level 7: Case series or case reports
  • Level 8: Expert opinion, pathophysiologic rationale

Important: Hierarchy is a starting point. Study quality matters more than design alone. Well-conducted cohort > poorly conducted RCT.


2. Bias Assessment

Cochrane Risk of Bias 2 (RoB 2) for RCTs

Domain 1: Randomization Process

  • Low risk: Computer-generated sequence, central allocation, opaque envelopes
  • Some concerns: Randomization method unclear, baseline imbalances suggesting problems
  • High risk: Non-random sequence (alternation, date of birth), predictable allocation, post-randomization exclusions

Domain 2: Deviations from Intended Interventions

  • Low risk: Double-blind, protocol deviations balanced across groups, intention-to-treat (ITT) analysis
  • Some concerns: Open-label but objective outcomes, minor unbalanced deviations
  • High risk: Open-label with subjective outcomes, substantial deviation (>10% cross-over), per-protocol analysis only

Domain 3: Missing Outcome Data

  • Low risk: <5% loss to follow-up, balanced across groups, multiple imputation if >5%
  • Some concerns: 5-10% loss, ITT analysis used, or reasons for missingness reported
  • High risk: >10% loss, or imbalanced loss (>5% difference between groups), or complete-case analysis with no sensitivity

Domain 4: Measurement of Outcome

  • Low risk: Blinded outcome assessors, objective outcomes (mortality, lab values)
  • Some concerns: Unblinded assessors but objective outcomes
  • High risk: Unblinded assessors with subjective outcomes (pain, quality of life)

Domain 5: Selection of Reported Result

  • Low risk: Protocol published before enrollment, all pre-specified outcomes reported
  • Some concerns: Protocol not available, but outcomes match methods section
  • High risk: Outcomes in results differ from protocol/methods, selective subgroup reporting

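Overall Judgment: If any domain is "high risk" → overall high risk. If all domains are "low risk" → overall low risk. Otherwise → some concerns. RoB 2 also allows an overall high-risk judgment when some concerns in multiple domains substantially lower confidence in the result.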
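The overall-judgment rule is mechanical enough to sketch in Python. The `RoB2` enum, the function name, and the optional `multiple_concerns_threshold` parameter are hypothetical; the threshold is only a stand-in for the assessor's judgment about multiple domains with some concerns, which RoB 2 leaves to the reviewer rather than to a fixed count.

```python
from enum import IntEnum
from typing import Optional, Sequence


class RoB2(IntEnum):
    LOW = 0
    SOME_CONCERNS = 1
    HIGH = 2


def rob2_overall(domains: Sequence[RoB2],
                 multiple_concerns_threshold: Optional[int] = None) -> RoB2:
    """Combine the five RoB 2 domain judgments into an overall judgment.

    multiple_concerns_threshold (hypothetical): if given, this many or more
    "some concerns" domains escalate the overall rating to HIGH, standing in
    for the assessor's judgment that the concerns substantially lower
    confidence in the result.
    """
    if len(domains) != 5:
        raise ValueError("RoB 2 has exactly five domains")
    if RoB2.HIGH in domains:                 # any high-risk domain -> overall high
        return RoB2.HIGH
    if all(d == RoB2.LOW for d in domains):  # every domain low -> overall low
        return RoB2.LOW
    n_concerns = sum(d == RoB2.SOME_CONCERNS for d in domains)
    if multiple_concerns_threshold is not None and n_concerns >= multiple_concerns_threshold:
        return RoB2.HIGH
    return RoB2.SOME_CONCERNS


trial = [RoB2.LOW, RoB2.SOME_CONCERNS, RoB2.LOW, RoB2.LOW, RoB2.LOW]
print(rob2_overall(trial).name)  # SOME_CONCERNS
```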

ROBINS-I for Observational Studies

Domain 1: Confounding

  • Low: All important confounders measured and adjusted (multivariable regression, propensity scores, matching)
  • Moderate: Most confounders adjusted, but some unmeasured
  • Serious: Important confounders not adjusted (e.g., comparing treatment groups without adjusting for severity)
  • Critical: Confounding by indication makes results uninterpretable

Domain 2: Selection of Participants

  • Low: Selection into study unrelated to intervention and outcome (inception cohort, consecutive enrollment)
  • Serious: Post-intervention selection (survivor bias, selecting on outcome)

Domain 3: Classification of Interventions

  • Low: Intervention status well-defined and independently ascertained (pharmacy records, procedure logs)
  • Serious: Intervention status based on patient recall or subjective classification

Domain 4: Deviations from Intended Interventions

  • Low: Intervention/comparator groups received intended interventions, co-interventions balanced
  • Serious: Substantial differences in co-interventions between groups

Domain 5: Missing Data

  • Low: <5% missing outcome data, or multiple imputation with sensitivity analysis
  • Serious: >10% missing, complete-case analysis with no sensitivity

Domain 6: Measurement of Outcomes

  • Low: Blinded outcome assessment or objective outcomes
  • Serious: Unblinded assessment of subjective outcomes, knowledge of intervention may bias assessment

Domain 7: Selection of Reported Result

  • Low: Analysis plan pre-specified and followed
  • Serious: Selective reporting of outcomes or subgroups

QUADAS-2 for Diagnostic Accuracy Studies

Domain 1: Patient Selection

  • Low: Consecutive or random sample, case-control design avoided, appropriate exclusions
  • High: Case-control design (inflates accuracy), inappropriate exclusions (spectrum bias)

Domain 2: Index Test

  • Low: Pre-specified threshold, blinded to reference standard
  • High: Threshold chosen after seeing results, unblinded interpretation

Domain 3: Reference Standard

  • Low: Reference standard correctly classifies condition, interpreted blind to index test
  • High: Imperfect reference standard, differential verification (different reference for positive/negative index)

Domain 4: Flow and Timing

  • Low: All patients receive same reference standard, appropriate interval between tests
  • High: Not all patients receive reference (partial verification), long interval allowing disease status to change

3. GRADE Framework

Starting Certainty

  • RCTs: Start at High certainty
  • Observational studies: Start at Low certainty

Downgrade Factors (Each -1 or -2 levels)

1. Risk of Bias (Study Limitations)

  • Serious (-1): Most studies have some concerns on RoB 2, or observational studies with moderate risk on most ROBINS-I domains
  • Very serious (-2): Most studies high risk of bias, or observational with serious/critical risk on ROBINS-I

2. Inconsistency (Heterogeneity)

  • Serious (-1): I² = 50-75%, or point estimates vary widely, or confidence intervals show minimal overlap
  • Very serious (-2): I² > 75%, opposite directions of effect
  • Do not downgrade if: Heterogeneity explained by subgroup analysis, or all studies show benefit despite variation in magnitude

3. Indirectness (Applicability)

  • Serious (-1): Indirect comparison (no head-to-head trial), surrogate outcome instead of patient-important, PICO mismatch (different population/intervention than question)
  • Very serious (-2): Multiple levels of indirectness (e.g., indirect comparison + surrogate outcome)

4. Imprecision (Statistical Uncertainty)

  • Serious (-1): Confidence interval crosses minimal clinically important difference (MCID) or includes both benefit and harm, or optimal information size (OIS) not met
  • Very serious (-2): Very wide CI, very small sample (<100 total), or very few events (<100 total)
  • Rule of thumb: OIS = sample size required for adequately powered RCT (~400 patients for typical effect size)
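A minimal sketch of the OIS check, assuming the conventional normal-approximation sample-size formulas for a two-arm trial with equal allocation, two-sided α = 0.05 and 80% power. The function names and the example inputs (control risk, relative risk reduction, review size) are hypothetical.

```python
import math
from statistics import NormalDist


def _z_sum(alpha: float, power: float) -> float:
    """z(1 - alpha/2) + z(power) for a two-sided test."""
    z = NormalDist()
    return z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)


def ois_continuous(smd: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Total participants (two equal arms) to detect a standardized mean difference."""
    per_arm = 2 * _z_sum(alpha, power) ** 2 / smd ** 2
    return 2 * math.ceil(per_arm)


def ois_binary(p_control: float, rrr: float,
               alpha: float = 0.05, power: float = 0.80) -> int:
    """Total participants (two equal arms) to detect a relative risk reduction."""
    p_treat = p_control * (1 - rrr)
    variance = p_control * (1 - p_control) + p_treat * (1 - p_treat)
    per_arm = _z_sum(alpha, power) ** 2 * variance / (p_control - p_treat) ** 2
    return 2 * math.ceil(per_arm)


# Hypothetical review: 1,200 participants in total, 10% control risk, 25% RRR of interest.
needed = ois_binary(p_control=0.10, rrr=0.25)
verdict = "OIS met" if 1200 >= needed else "consider rating down for imprecision"
print(needed, verdict)
```

Because the required sample scales with 1/effect², halving the effect size of interest roughly quadruples the OIS.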

5. Publication Bias

  • Serious (-1): Funnel plot asymmetry (Egger's test p<0.10), all studies industry-funded with positive results, or known unpublished negative trials
  • Note: Requires ≥10 studies to assess funnel plot. Consider searching trial registries for unpublished studies.

Upgrade Factors (Observational Studies Only)

1. Large Effect

  • Upgrade +1: RR > 2 or < 0.5 (based on consistent evidence, no plausible confounders)
  • Upgrade +2: RR > 5 or < 0.2 ("very large effect")
  • Example: Smoking → lung cancer (RR ~20, a very large effect) upgraded two levels, from low to high

2. Dose-Response Gradient

  • Upgrade +1: Increasing exposure associated with increasing risk/benefit in consistent pattern
  • Example: More cigarettes/day → higher lung cancer risk

3. All Plausible Confounders Would Reduce Observed Effect

  • Upgrade +1: Despite confounding working against finding effect, effect still observed
  • Example: Healthy user bias would reduce observed benefit, yet benefit still seen

Final Certainty Rating

High (⊕⊕⊕⊕): Very confident true effect is close to estimate. Further research very unlikely to change conclusion.

Moderate (⊕⊕⊕○): Moderately confident. True effect is likely close to estimate, but could be substantially different. Further research may change conclusion.

Low (⊕⊕○○): Limited confidence. True effect may be substantially different. Further research likely to change conclusion.

Very Low (⊕○○○): Very little confidence. True effect is likely substantially different. Any estimate is very uncertain.
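The rating arithmetic can be sketched as a clamp on the four-level scale. This is a hypothetical helper, not an official GRADE tool: GRADE rating is a structured judgment, and the sketch only counts the levels a reviewer has already decided to subtract or add. The cutoffs in `large_effect_upgrade` are the RR thresholds listed under Large Effect, which assume consistent evidence and no plausible confounders.

```python
LEVELS = ["Very low", "Low", "Moderate", "High"]  # index = number of ⊕ symbols - 1


def large_effect_upgrade(rr: float) -> int:
    """Upgrade levels for magnitude of effect, using the RR thresholds above."""
    if rr > 5 or rr < 0.2:
        return 2   # very large effect
    if rr > 2 or rr < 0.5:
        return 1   # large effect
    return 0


def grade_certainty(randomized: bool, downgrades: int = 0, upgrades: int = 0) -> str:
    """Start at High (RCTs) or Low (observational), apply level changes, clamp to the scale."""
    if downgrades < 0 or upgrades < 0:
        raise ValueError("pass downgrades and upgrades as non-negative level counts")
    if randomized and upgrades:
        raise ValueError("upgrade factors apply to observational evidence only")
    start = 3 if randomized else 1
    return LEVELS[max(0, min(3, start - downgrades + upgrades))]


# Observational evidence, very large effect (RR ~20), no downgrades.
print(grade_certainty(randomized=False, upgrades=large_effect_upgrade(20)))  # High
# RCTs with serious risk of bias (-1) and serious imprecision (-1).
print(grade_certainty(randomized=True, downgrades=2))                        # Low
```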


4. Meta-Analysis Techniques

When to Pool (Meta-Analysis)

Pool when:

  • Studies address same PICO question
  • Outcomes measured similarly (same construct, similar timepoints)
  • Low to moderate heterogeneity (I² < 60%)
  • At least 3 studies available

Do not pool when:

  • Substantial heterogeneity (I² > 75%) unexplained by subgroups
  • Different interventions (can't pool aspirin with warfarin for "anticoagulation")
  • Different populations (adults vs children, mild vs severe disease)
  • Methodologically flawed studies (high risk of bias)

Statistical Models

Fixed-effect model: Assumes one true effect, differences due to sampling error only.

  • Use when: I² < 25%, studies very similar
  • Calculation: Inverse-variance weighting (larger studies get more weight)

Random-effects model: Assumes distribution of true effects, accounts for between-study variance.

  • Use when: I² ≥ 25%, clinical heterogeneity expected
  • Calculation: DerSimonian-Laird or REML methods
  • Note: Gives more weight to smaller studies than fixed-effect

Recommendation: Use random-effects as the default when clinical heterogeneity is expected, even if I² is low.
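A minimal sketch of inverse-variance pooling, with the DerSimonian-Laird estimate of between-study variance (τ²) for the random-effects model; REML requires iteration and is not shown. Inputs are assumed to be on an additive scale (log RR, log OR, log HR, MD, or SMD) with their standard errors. The function name and the example studies are hypothetical.

```python
import math
from typing import NamedTuple, Sequence


class Pooled(NamedTuple):
    estimate: float  # pooled effect on the input scale
    se: float        # its standard error
    tau2: float      # between-study variance (0 for fixed-effect)
    q: float         # Cochran's Q
    i2: float        # I² as a proportion (0-1)


def pool(effects: Sequence[float], ses: Sequence[float],
         random_effects: bool = True) -> Pooled:
    """Inverse-variance meta-analysis; DerSimonian-Laird tau² if random_effects."""
    if len(effects) != len(ses) or len(effects) < 2:
        raise ValueError("need at least two studies with matching effects and SEs")
    w = [1 / se ** 2 for se in ses]                     # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    tau2 = 0.0
    if random_effects:
        c = sw - sum(wi ** 2 for wi in w) / sw
        tau2 = max(0.0, (q - df) / c)
    # Adding tau² to every study's variance flattens the weights toward equality,
    # which is why random-effects gives relatively more weight to small studies.
    w_star = [1 / (se ** 2 + tau2) for se in ses]
    estimate = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    return Pooled(estimate, math.sqrt(1 / sum(w_star)), tau2, q, i2)


# Three hypothetical trials reporting risk ratios, pooled on the log scale.
res = pool([math.log(0.80), math.log(0.70), math.log(0.95)], [0.10, 0.15, 0.12])
low, high = (math.exp(res.estimate + k * 1.96 * res.se) for k in (-1, 1))
print(f"RR {math.exp(res.estimate):.2f} (95% CI {low:.2f}-{high:.2f}), I² {res.i2:.0%}")
```

When τ² is estimated as zero, the random-effects weights collapse to the fixed-effect weights and both models give the same answer.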

Effect Measures

Binary outcomes (event yes/no):

  • Risk Ratio (RR): Risk (proportion with the event) in intervention / risk in control. Easier to interpret than OR.
  • Odds Ratio (OR): Required for case-control designs and logistic regression; approximates the RR only when the outcome is rare (<10%).
  • Risk Difference (RD): Absolute difference. Important for clinical interpretation (NNT = 1/RD).
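A sketch computing the three binary effect measures and the NNT from a single 2x2 table; the function name and the trial counts are hypothetical, and zero cells (which need a continuity correction) are not handled.

```python
from typing import NamedTuple


class BinaryEffects(NamedTuple):
    rr: float          # risk ratio
    odds_ratio: float
    rd: float          # risk difference (intervention minus control)
    nnt: float         # number needed to treat (or harm), 1 / |RD|, unrounded


def binary_effects(events_tx: int, n_tx: int, events_ctl: int, n_ctl: int) -> BinaryEffects:
    """Effect measures from a 2x2 table; no zero-cell correction in this sketch."""
    risk_tx, risk_ctl = events_tx / n_tx, events_ctl / n_ctl
    odds_tx = events_tx / (n_tx - events_tx)
    odds_ctl = events_ctl / (n_ctl - events_ctl)
    rd = risk_tx - risk_ctl
    nnt = 1 / abs(rd) if rd != 0 else float("inf")
    return BinaryEffects(risk_tx / risk_ctl, odds_tx / odds_ctl, rd, nnt)


# Hypothetical trial: 30/200 events with intervention vs 40/200 with control.
e = binary_effects(30, 200, 40, 200)
print(f"RR {e.rr:.2f}, OR {e.odds_ratio:.2f}, RD {e.rd:+.3f}, NNT {e.nnt:.0f}")
```

With a 20% control risk the OR (0.71) lies further from 1 than the RR (0.75); the two converge only when events are rare. NNT is conventionally rounded up to the next whole patient.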

Continuous outcomes (measured on scale):

  • Mean Difference (MD): When outcome measured on same scale (e.g., mm Hg blood pressure)
  • Standardized Mean Difference (SMD): When outcome measured on different scales (different QoL questionnaires). Interpret as effect size: SMD 0.2 = small, 0.5 = moderate, 0.8 = large.
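A sketch of the SMD calculation, assuming the common Hedges' g form (Cohen's d on the pooled SD, multiplied by a small-sample correction); other variants such as Glass's Δ standardize differently. The function name and the trial values are hypothetical.

```python
import math


def hedges_g(mean_tx: float, sd_tx: float, n_tx: int,
             mean_ctl: float, sd_ctl: float, n_ctl: int) -> float:
    """Standardized mean difference (Hedges' g) between two independent groups."""
    df = n_tx + n_ctl - 2
    pooled_sd = math.sqrt(((n_tx - 1) * sd_tx ** 2 + (n_ctl - 1) * sd_ctl ** 2) / df)
    d = (mean_tx - mean_ctl) / pooled_sd        # Cohen's d
    j = 1 - 3 / (4 * (n_tx + n_ctl) - 9)        # small-sample bias correction
    return j * d


# Hypothetical QoL trial: 62 vs 55 points, SD 14 in both arms, 50 per arm.
print(round(hedges_g(62, 14, 50, 55, 14, 50), 3))
```

This gives g of about 0.50, roughly a moderate effect on the scale above, which can then be pooled with trials that used a different questionnaire.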

Time-to-event outcomes:

  • Hazard Ratio (HR): Accounts for censoring and time. From Cox proportional hazards models.

Heterogeneity Assessment

I² statistic: % of variability due to heterogeneity rather than chance.

  • I² = 0-25%: Low heterogeneity (might not need subgroup analysis)
  • I² = 25-50%: Moderate heterogeneity (explore sources)
  • I² = 50-75%: Substantial heterogeneity (subgroup analysis essential)
  • I² > 75%: Considerable heterogeneity (consider not pooling)

Cochran's Q test: Tests whether heterogeneity is statistically significant (p<0.10 suggests heterogeneity).

  • Limitation: Low power with few studies, high power with many studies (may detect clinically unimportant heterogeneity)

Exploring heterogeneity:

  1. Visual inspection (forest plot - outliers?)
  2. Subgroup analysis (by population, intervention, setting, risk of bias)
  3. Meta-regression (if ≥10 studies) - test whether study-level characteristics (year, dose, age) explain heterogeneity
  4. Sensitivity analysis (exclude high risk of bias, exclude outliers)

Publication Bias Assessment

Methods (require ≥10 studies):

  • Funnel plot: Plot effect size vs precision (SE). Asymmetry suggests small-study effects/publication bias.
  • Egger's test: Statistical test for funnel plot asymmetry (p<0.10 suggests bias).
  • Trim and fill: Impute missing studies and recalculate pooled effect.
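Egger's test is a linear regression of each study's standard normal deviate (effect / SE) on its precision (1 / SE); an intercept far from zero signals funnel-plot asymmetry. The sketch below, with a hypothetical function name, returns the intercept and its t statistic; the p-value comes from a t distribution with k − 2 degrees of freedom, which the Python standard library does not provide, so a statistics package would be needed for that step. The arithmetic runs with as few as three studies, but the ≥10-study requirement above still applies.

```python
import math
from typing import Sequence, Tuple


def egger_test(effects: Sequence[float], ses: Sequence[float]) -> Tuple[float, float, int]:
    """Egger's regression test: returns (intercept, t statistic, degrees of freedom)."""
    k = len(effects)
    if k != len(ses) or k < 3:
        raise ValueError("need at least three studies with matching effects and SEs")
    x = [1 / s for s in ses]                        # precision
    y = [e / s for e, s in zip(effects, ses)]       # standard normal deviate
    mx, my = sum(x) / k, sum(y) / k
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all studies have the same SE; the regression is undefined")
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_intercept = math.sqrt(rss / (k - 2) * (1 / k + mx ** 2 / sxx))
    if se_intercept == 0:
        raise ValueError("perfect fit; the intercept's standard error is zero")
    return intercept, intercept / se_intercept, k - 2
```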

Limitations: Asymmetry can arise from heterogeneity, not just publication bias; small-study effects ≠ publication bias.

Search mitigation: Search clinical trial registries (ClinicalTrials.gov, EudraCT), contact authors, grey literature.


5. Advanced Study Designs

Pragmatic Trials

Purpose: Evaluate effectiveness in real-world settings (vs efficacy in ideal conditions).

Characteristics:

  • Broad inclusion criteria (representative of clinical practice)
  • Minimal exclusions (include comorbidities, elderly, diverse populations)
  • Flexible interventions (allow adaptations like clinical practice)
  • Clinically relevant comparators (usual care, not placebo)
  • Patient-important outcomes (mortality, QoL, not just biomarkers)
  • Long-term follow-up (capture real-world adherence, adverse events)

PRECIS-2 wheel: Rates trials from explanatory (ideal conditions) to pragmatic (real-world) on 9 domains.

Example: Salford Lung Study (inhaled fluticasone furoate/vilanterol in COPD and asthma) - broad inclusion from routine primary care, minimal study-specific monitoring via electronic health records, usual care comparator.

Non-Inferiority Trials

Purpose: Show new treatment is "not worse" than standard (by pre-defined margin), usually because new treatment has other advantages (cheaper, safer, easier).

Key concepts:

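  • Non-inferiority margin (Δ): Maximum acceptable difference, commonly set so the new treatment preserves ≥50% of the standard's benefit over placebo.
  • One-sided test: Check whether the upper limit of the 95% CI for the difference (new minus standard, oriented so that higher is worse) lies below Δ.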
  • Interpretation: If upper CI < Δ, declare non-inferiority. If CI crosses Δ, inconclusive.
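A sketch of the decision rule, assuming a normal-approximation CI, a difference oriented so that positive values mean the new treatment is worse (e.g., new minus standard recurrence risk), and a positive margin on the same scale. The function name and trial numbers are hypothetical; the 2% margin mirrors the enoxaparin example.

```python
from statistics import NormalDist


def noninferiority_verdict(diff: float, se: float, margin: float, level: float = 0.95) -> str:
    """Compare the CI for (new - standard), positive = worse, against margin Δ > 0."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    lower, upper = diff - z * se, diff + z * se
    if upper < margin:
        return "non-inferior"          # whole CI below Δ
    if lower > margin:
        return "inferior"              # whole CI beyond Δ
    return "inconclusive"              # CI crosses Δ


# Hypothetical recurrent-VTE risk differences (new minus standard), margin 2%.
print(noninferiority_verdict(diff=0.005, se=0.006, margin=0.02))   # non-inferior
print(noninferiority_verdict(diff=0.010, se=0.010, margin=0.02))   # inconclusive
```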

Pitfalls:

  • Large non-inferiority margins (>50% of benefit) allow ineffective treatments
  • Per-protocol analysis bias (favors non-inferiority); need ITT + per-protocol
  • Assay sensitivity: Must show historical evidence that standard > placebo

Example: Enoxaparin vs unfractionated heparin for VTE treatment. Margin = 2% absolute difference in recurrent VTE.

Cluster Randomized Trials

Design: Randomize groups (hospitals, clinics, communities) not individuals.

When used:

  • Intervention delivered at group level (policy, training, quality improvement)
  • Contamination risk if individuals randomized (control group adopts intervention)

Statistical consideration:

  • Intracluster correlation (ICC): Individuals within cluster more similar than across clusters
  • Design effect: Factor by which clustering inflates the required sample size (equivalently, shrinks the effective sample size): Deff = 1 + (m-1) × ICC, where m = cluster size
  • Analysis: Account for clustering (GEE, mixed models, cluster-level analysis)
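A sketch of the design-effect arithmetic, assuming equal cluster sizes; with variable cluster sizes the design effect is larger. Function names and inputs are hypothetical.

```python
import math


def design_effect(cluster_size: float, icc: float) -> float:
    """Deff = 1 + (m - 1) * ICC, assuming equal cluster sizes m."""
    return 1 + (cluster_size - 1) * icc


def effective_sample_size(n_total: int, cluster_size: float, icc: float) -> float:
    """Number of independent participants the clustered sample is worth."""
    return n_total / design_effect(cluster_size, icc)


def clusters_per_arm(n_individual_per_arm: int, cluster_size: int, icc: float) -> int:
    """Clusters per arm needed to match an individually randomized trial's sample size."""
    return math.ceil(n_individual_per_arm * design_effect(cluster_size, icc) / cluster_size)


# Hypothetical: 50 participants per cluster, ICC 0.02.
print(design_effect(50, 0.02))                 # about 1.98
print(effective_sample_size(1000, 50, 0.02))   # about 505 of 1,000 participants
print(clusters_per_arm(300, 50, 0.02))         # 12
```

Even a small ICC nearly halves the effective sample here because each cluster is large, which is why adding clusters helps more than adding participants per cluster.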

Example: COMMIT trial (community-level smoking cessation intervention). Randomized 11 matched pairs of communities, analyzed accounting for clustering.

N-of-1 Trials

Design: Single patient receives multiple crossovers between treatments in random order.

When used:

  • Chronic stable conditions (asthma, arthritis, chronic pain)
  • Rapid onset/offset treatments
  • Substantial inter-patient variability in response
  • Patient wants personalized evidence

Requirements:

  • ≥3 treatment periods per arm (A-B-A-B-A-B)
  • Washout between periods if needed
  • Blind patient and assessor if possible
  • Pre-specify outcome and decision rule

Analysis: Compare outcomes during A vs B periods within patient (paired t-test, meta-analysis across periods).
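A minimal sketch of the paired analysis, assuming each A period is matched with the B period from the same randomized block and that carryover is handled by washout; autocorrelation between periods is ignored. Names and the pain scores are hypothetical. With three pairs the t statistic has only 2 degrees of freedom, so the p-value (from a t table or statistics package) will be imprecise.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def n_of_1_paired(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float, int]:
    """Paired t statistic for matched A/B periods: (mean A - B, t, degrees of freedom)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need at least two matched A/B pairs")
    diffs = [ai - bi for ai, bi in zip(a, b)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("identical differences in every pair; t is undefined")
    d_bar = mean(diffs)
    return d_bar, d_bar / (sd / math.sqrt(len(diffs))), len(diffs) - 1


# Hypothetical weekly mean pain scores (0-10) from an A-B-A-B-A-B sequence.
drug_a = [4.0, 5.0, 3.0]
drug_b = [6.0, 6.0, 6.0]
print(n_of_1_paired(drug_a, drug_b))
```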

Example: Stimulant dose optimization for ADHD. Test 3 doses + placebo in randomized crossover, 1-week periods each.


6. Outcome Measurement

Minimal Clinically Important Difference (MCID)

Definition: Smallest change in outcome that patients perceive as beneficial (and would mandate change in management).

Determination methods:

  1. Anchor-based: Link change to external anchor ("How much has your pain improved?" - "A little" threshold)
  2. Distribution-based: 0.5 SD or 1 SEM (standard error of measurement) as MCID (statistical, not patient-centered)
  3. Delphi consensus: Expert panel agrees on MCID

Examples:

  • Pain VAS (0-100): MCID = 10-15 points
  • 6-minute walk distance: MCID = 30 meters
  • KCCQ (Kansas City Cardiomyopathy Questionnaire): MCID = 5 points
  • FEV₁ (lung function): MCID = 100-140 mL

Interpretation: Effect size must exceed MCID to be clinically meaningful. p<0.05 with effect < MCID = statistically significant but clinically trivial.

Composite Outcomes

Definition: Combines ≥2 outcomes into single endpoint (e.g., "death, MI, or stroke").

Advantages:

  • Increases event rate → reduces required sample size
  • Captures multiple aspects of benefit/harm

Disadvantages:

  • Obscures which component drives effect (mortality reduction? or non-fatal MI?)
  • Components may not be equally important to patients (MI ≠ revascularization)
  • If components affected differently, composite can mislead

Guidelines:

  • Report components separately
  • Verify effect is consistent across components
  • Weight components by importance if possible
  • Avoid composites with many low-importance components

Example: MACE (major adverse cardiac events) = death + MI + stroke (appropriate). But "death, MI, stroke, or revascularization" dilutes with less important outcome.

Surrogate Outcomes

Definition: Biomarker/lab value used as substitute for patient-important outcome.

Valid surrogate criteria (Prentice criteria):

  1. Surrogate associated with clinical outcome (correlation)
  2. Intervention affects surrogate
  3. Intervention's effect on clinical outcome is mediated through surrogate
  4. Effect on surrogate fully captures effect on clinical outcome

Problems:

  • Many surrogates fail criterion #4 (e.g., antiarrhythmics reduce PVCs but increase mortality)
  • Intervention can affect surrogate without affecting clinical outcome

Examples:

  • Good surrogate: Blood pressure for stroke (validated, consistent)
  • Poor surrogate: Bone density for fracture (drugs increase density but not all reduce fracture)
  • Unvalidated: HbA1c for cardiovascular outcomes (association exists, but some HbA1c-lowering strategies have not reduced, or have even increased, cardiovascular events or mortality)

Recommendation: Prioritize patient-important outcomes. Accept surrogates only if validated relationship exists and patient-important outcome infeasible.


7. Special Populations & Contexts

Pediatric Evidence: Age-appropriate outcomes (developmental milestones, parent-reported), pharmacokinetic modeling for dose prediction, extrapolation from adults if justified, expert opinion carries more weight when RCTs infeasible.

Rare Diseases: N-of-1 trials, registries, historical controls (with caution), Bayesian methods to reduce sample requirements. Regulators may accept lower evidence standards (orphan drugs, conditional approval).

Health Technology Assessment: Assesses clinical effectiveness (GRADE), safety, cost-effectiveness (cost per QALY), budget impact, organizational/ethical/social factors. Thresholds vary (£20-30k/QALY UK, $50-150k US). Requires systematic review + economic model + probabilistic sensitivity analysis.
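Cost per QALY is usually reported as an incremental cost-effectiveness ratio (ICER): the difference in costs divided by the difference in QALYs between the new technology and its comparator. A minimal sketch with hypothetical costs and QALYs, checked against the UK threshold range quoted above; a real HTA would propagate uncertainty through probabilistic sensitivity analysis rather than rely on point estimates.

```python
def icer(cost_new: float, qaly_new: float, cost_old: float, qaly_old: float) -> float:
    """Incremental cost per QALY gained (new vs comparator).

    A negative result means the new option is cheaper and more effective (dominant).
    """
    d_cost, d_qaly = cost_new - cost_old, qaly_new - qaly_old
    if d_qaly <= 0:
        # No health gain: the ratio is not interpretable as cost per QALY gained.
        raise ValueError("new option gains no QALYs; assess dominance instead")
    return d_cost / d_qaly


# Hypothetical per-patient lifetime costs (GBP) and QALYs.
ratio = icer(cost_new=12_000, qaly_new=6.5, cost_old=4_000, qaly_old=6.2)
for threshold in (20_000, 30_000):
    verdict = "cost-effective" if ratio <= threshold else "not cost-effective"
    print(f"ICER £{ratio:,.0f}/QALY vs £{threshold:,}/QALY threshold: {verdict}")
```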


8. Knowledge Translation

Evidence-to-Decision Framework (GRADE): Problem priority → Desirable/undesirable effects → Certainty → Values → Balance of benefits/harms → Resources → Equity → Acceptability → Feasibility.

Recommendation strength:

  • Strong ("We recommend"): Most patients would want, few would not
  • Conditional ("We suggest"): Substantial proportion might not want, or uncertainty high

Guideline Development: Scope/PICOT → Systematic review → GRADE profiles → EtD framework → Recommendation (strong vs conditional) → External review → Update plan (3-5 years). COI management critical. AGREE II assesses guideline quality.


9. Common Pitfalls & Fixes

Surrogate outcomes: Using unvalidated biomarkers. Fix: Prioritize patient-important outcomes (mortality, QoL).

Composite outcomes: Obscuring which component drives effect. Fix: Report components separately, verify consistency.

Subgroup proliferation: Data dredging for false positives. Fix: Pre-specify <5 subgroups, test interaction, require plausibility.

Statistical vs clinical significance: p<0.05 with effect below MCID. Fix: Compare to MCID, report absolute effects (NNT).

Publication bias: Missing null results. Fix: Search trial registries (ClinicalTrials.gov), contact authors, assess funnel plot.

Poor applicability: Extrapolating from selected trials. Fix: Assess PICO match, setting differences, patient values.

Causation claims: From observational data. Fix: Use causal language only for RCTs or strong observational evidence (large effect, dose-response).

Industry bias: Uncritical acceptance. Fix: Assess COI, check selective reporting, verify independent analysis.
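For the subgroup fix, "test interaction" means testing whether the effect differs between subgroups, not whether each subgroup is separately significant. A minimal sketch of one common approach: a z-test on the difference between two independent subgroup log risk ratios, with SEs back-calculated from their 95% CIs. The function names and subgroup results are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Tuple


def _log_se(ci: Tuple[float, float]) -> float:
    """SE of a log ratio recovered from its 95% CI."""
    lower, upper = ci
    return (math.log(upper) - math.log(lower)) / (2 * NormalDist().inv_cdf(0.975))


def interaction_test(rr1: float, ci1: Tuple[float, float],
                     rr2: float, ci2: Tuple[float, float]) -> Tuple[float, float]:
    """z statistic and two-sided p-value for a difference between two subgroup RRs."""
    diff = math.log(rr1) - math.log(rr2)
    se = math.sqrt(_log_se(ci1) ** 2 + _log_se(ci2) ** 2)
    z = diff / se
    return z, 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical subgroups: A's CI excludes 1, B's does not.
z, p = interaction_test(0.70, (0.55, 0.89), 0.85, (0.65, 1.11))
print(f"z = {z:.2f}, p = {p:.2f}")
```

Here the interaction test gives p ≈ 0.3, so the data do not show that the effect differs between the subgroups, even though only one subgroup is individually "significant".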