# Domain Research: Health Science - Advanced Methodology ## Workflow ``` Health Research Progress: - [ ] Step 1: Formulate research question (PICOT) - [ ] Step 2: Assess evidence hierarchy and study design - [ ] Step 3: Evaluate study quality and bias - [ ] Step 4: Prioritize and define outcomes - [ ] Step 5: Synthesize evidence and grade certainty - [ ] Step 6: Create decision-ready summary ``` **Step 1: Formulate research question (PICOT)** Define precise PICOT elements for answerable research question (see template.md for framework). **Step 2: Assess evidence hierarchy and study design** Match study design to question type using [1. Evidence Hierarchy](#1-evidence-hierarchy) (RCT for therapy, cohort for prognosis, cross-sectional for diagnosis). **Step 3: Evaluate study quality and bias** Apply systematic bias assessment using [2. Bias Assessment](#2-bias-assessment) (Cochrane RoB 2, ROBINS-I, or QUADAS-2 depending on design). **Step 4: Prioritize and define outcomes** Distinguish patient-important from surrogate outcomes using [6. Outcome Measurement](#6-outcome-measurement) guidance on MCID, composite outcomes, and surrogates. **Step 5: Synthesize evidence and grade certainty** Rate certainty using [3. GRADE Framework](#3-grade-framework) (downgrade for bias/inconsistency/indirectness/imprecision/publication bias, upgrade for large effects/dose-response). For multiple studies, apply [4. Meta-Analysis Techniques](#4-meta-analysis-techniques). **Step 6: Create decision-ready summary** Synthesize findings using [8. Knowledge Translation](#8-knowledge-translation) evidence-to-decision framework, assess applicability per [7. Special Populations & Contexts](#7-special-populations--contexts), and avoid [9. Common Pitfalls](#9-common-pitfalls--fixes). --- ## 1. Evidence Hierarchy ### Study Design Selection by Question Type **Therapy/Intervention Questions**: - **Gold standard**: RCT (randomized controlled trial) - **When RCT not feasible**: Prospective cohort or pragmatic trial - **Never acceptable**: Case series, expert opinion for causal claims - **Rationale**: RCTs minimize confounding through randomization, establishing causation **Diagnostic Accuracy Questions**: - **Gold standard**: Cross-sectional study with consecutive enrollment - **Critical requirement**: Compare index test to validated reference standard in same patients - **Avoid**: Case-control design (inflates sensitivity/specificity by selecting extremes) - **Rationale**: Cross-sectional design prevents spectrum bias; consecutive enrollment prevents selection bias **Prognosis/Prediction Questions**: - **Gold standard**: Prospective cohort (follow from exposure to outcome) - **Acceptable**: Retrospective cohort with robust data (registries, databases) - **Avoid**: Case-control (can't estimate incidence), cross-sectional (no temporal sequence) - **Rationale**: Cohort design establishes temporal sequence, allows incidence calculation **Harm/Safety Questions**: - **Common harms**: RCTs (adequate power for events occurring in >1% patients) - **Rare harms**: Large observational studies (cohort, case-control, pharmacovigilance) - **Delayed harms**: Long-term cohort studies or registries - **Rationale**: RCTs often lack power/duration for rare or delayed harms; observational studies provide larger samples and longer follow-up ### Hierarchy by Evidence Strength **Level 1 (Highest)**: Systematic reviews and meta-analyses of well-designed RCTs **Level 2**: Individual large, well-designed RCT with low risk of bias **Level 3**: Well-designed RCTs with some limitations (quasi-randomized, not blinded) **Level 4**: Cohort studies (prospective better than retrospective) **Level 5**: Case-control studies **Level 6**: Cross-sectional surveys (descriptive only, not causal) **Level 7**: Case series or case reports **Level 8**: Expert opinion, pathophysiologic rationale **Important**: Hierarchy is a starting point. Study quality matters more than design alone. Well-conducted cohort > poorly conducted RCT. --- ## 2. Bias Assessment ### Cochrane Risk of Bias 2 (RoB 2) for RCTs **Domain 1: Randomization Process** - **Low risk**: Computer-generated sequence, central allocation, opaque envelopes - **Some concerns**: Randomization method unclear, baseline imbalances suggesting problems - **High risk**: Non-random sequence (alternation, date of birth), predictable allocation, post-randomization exclusions **Domain 2: Deviations from Intended Interventions** - **Low risk**: Double-blind, protocol deviations balanced across groups, intention-to-treat (ITT) analysis - **Some concerns**: Open-label but objective outcomes, minor unbalanced deviations - **High risk**: Open-label with subjective outcomes, substantial deviation (>10% cross-over), per-protocol analysis only **Domain 3: Missing Outcome Data** - **Low risk**: <5% loss to follow-up, balanced across groups, multiple imputation if >5% - **Some concerns**: 5-10% loss, ITT analysis used, or reasons for missingness reported - **High risk**: >10% loss, or imbalanced loss (>5% difference between groups), or complete-case analysis with no sensitivity **Domain 4: Measurement of Outcome** - **Low risk**: Blinded outcome assessors, objective outcomes (mortality, lab values) - **Some concerns**: Unblinded assessors but objective outcomes - **High risk**: Unblinded assessors with subjective outcomes (pain, quality of life) **Domain 5: Selection of Reported Result** - **Low risk**: Protocol published before enrollment, all pre-specified outcomes reported - **Some concerns**: Protocol not available, but outcomes match methods section - **High risk**: Outcomes in results differ from protocol/methods, selective subgroup reporting **Overall Judgment**: If any domain is "high risk" → Overall high risk. If all domains "low risk" → Overall low risk. Otherwise → Some concerns. ### ROBINS-I for Observational Studies **Domain 1: Confounding** - **Low**: All important confounders measured and adjusted (multivariable regression, propensity scores, matching) - **Moderate**: Most confounders adjusted, but some unmeasured - **Serious**: Important confounders not adjusted (e.g., comparing treatment groups without adjusting for severity) - **Critical**: Confounding by indication makes results uninterpretable **Domain 2: Selection of Participants** - **Low**: Selection into study unrelated to intervention and outcome (inception cohort, consecutive enrollment) - **Serious**: Post-intervention selection (survivor bias, selecting on outcome) **Domain 3: Classification of Interventions** - **Low**: Intervention status well-defined and independently ascertained (pharmacy records, procedure logs) - **Serious**: Intervention status based on patient recall or subjective classification **Domain 4: Deviations from Intended Interventions** - **Low**: Intervention/comparator groups received intended interventions, co-interventions balanced - **Serious**: Substantial differences in co-interventions between groups **Domain 5: Missing Data** - **Low**: <5% missing outcome data, or multiple imputation with sensitivity analysis - **Serious**: >10% missing, complete-case analysis with no sensitivity **Domain 6: Measurement of Outcomes** - **Low**: Blinded outcome assessment or objective outcomes - **Serious**: Unblinded assessment of subjective outcomes, knowledge of intervention may bias assessment **Domain 7: Selection of Reported Result** - **Low**: Analysis plan pre-specified and followed - **Serious**: Selective reporting of outcomes or subgroups ### QUADAS-2 for Diagnostic Accuracy Studies **Domain 1: Patient Selection** - **Low**: Consecutive or random sample, case-control design avoided, appropriate exclusions - **High**: Case-control design (inflates accuracy), inappropriate exclusions (spectrum bias) **Domain 2: Index Test** - **Low**: Pre-specified threshold, blinded to reference standard - **High**: Threshold chosen after seeing results, unblinded interpretation **Domain 3: Reference Standard** - **Low**: Reference standard correctly classifies condition, interpreted blind to index test - **High**: Imperfect reference standard, differential verification (different reference for positive/negative index) **Domain 4: Flow and Timing** - **Low**: All patients receive same reference standard, appropriate interval between tests - **High**: Not all patients receive reference (partial verification), long interval allowing disease status to change --- ## 3. GRADE Framework ### Starting Certainty **RCTs**: Start at High certainty **Observational studies**: Start at Low certainty ### Downgrade Factors (Each -1 or -2 levels) **1. Risk of Bias (Study Limitations)** - **Serious (-1)**: Most studies have some concerns on RoB 2, or observational studies with moderate risk on most ROBINS-I domains - **Very serious (-2)**: Most studies high risk of bias, or observational with serious/critical risk on ROBINS-I **2. Inconsistency (Heterogeneity)** - **Serious (-1)**: I² = 50-75%, or point estimates vary widely, or confidence intervals show minimal overlap - **Very serious (-2)**: I² > 75%, opposite directions of effect - **Do not downgrade if**: Heterogeneity explained by subgroup analysis, or all studies show benefit despite variation in magnitude **3. Indirectness (Applicability)** - **Serious (-1)**: Indirect comparison (no head-to-head trial), surrogate outcome instead of patient-important, PICO mismatch (different population/intervention than question) - **Very serious (-2)**: Multiple levels of indirectness (e.g., indirect comparison + surrogate outcome) **4. Imprecision (Statistical Uncertainty)** - **Serious (-1)**: Confidence interval crosses minimal clinically important difference (MCID) or includes both benefit and harm, or optimal information size (OIS) not met - **Very serious (-2)**: Very wide CI, very small sample (<100 total), or very few events (<100 total) - **Rule of thumb**: OIS = sample size required for adequately powered RCT (~400 patients for typical effect size) **5. Publication Bias** - **Serious (-1)**: Funnel plot asymmetry (Egger's test p<0.10), all studies industry-funded with positive results, or known unpublished negative trials - **Note**: Requires ≥10 studies to assess funnel plot. Consider searching trial registries for unpublished studies. ### Upgrade Factors (Observational Studies Only) **1. Large Effect** - **Upgrade +1**: RR > 2 or < 0.5 (based on consistent evidence, no plausible confounders) - **Upgrade +2**: RR > 5 or < 0.2 ("very large effect") - **Example**: Smoking → lung cancer (RR ~20) upgraded from low to moderate or high **2. Dose-Response Gradient** - **Upgrade +1**: Increasing exposure associated with increasing risk/benefit in consistent pattern - **Example**: More cigarettes/day → higher lung cancer risk **3. All Plausible Confounders Would Reduce Observed Effect** - **Upgrade +1**: Despite confounding working against finding effect, effect still observed - **Example**: Healthy user bias would reduce observed benefit, yet benefit still seen ### Final Certainty Rating **High** (⊕⊕⊕⊕): Very confident true effect is close to estimate. Further research very unlikely to change conclusion. **Moderate** (⊕⊕⊕○): Moderately confident. True effect is likely close to estimate, but could be substantially different. Further research may change conclusion. **Low** (⊕⊕○○): Limited confidence. True effect may be substantially different. Further research likely to change conclusion. **Very Low** (⊕○○○): Very little confidence. True effect is likely substantially different. Any estimate is very uncertain. --- ## 4. Meta-Analysis Techniques ### When to Pool (Meta-Analysis) **Pool when**: - Studies address same PICO question - Outcomes measured similarly (same construct, similar timepoints) - Low to moderate heterogeneity (I² < 60%) - At least 3 studies available **Do not pool when**: - Substantial heterogeneity (I² > 75%) unexplained by subgroups - Different interventions (can't pool aspirin with warfarin for "anticoagulation") - Different populations (adults vs children, mild vs severe disease) - Methodologically flawed studies (high risk of bias) ### Statistical Models **Fixed-effect model**: Assumes one true effect, differences due to sampling error only. - **Use when**: I² < 25%, studies very similar - **Calculation**: Inverse-variance weighting (larger studies get more weight) **Random-effects model**: Assumes distribution of true effects, accounts for between-study variance. - **Use when**: I² ≥ 25%, clinical heterogeneity expected - **Calculation**: DerSimonian-Laird or REML methods - **Note**: Gives more weight to smaller studies than fixed-effect **Recommendation**: Use random-effects as default for clinical heterogeneity, even if I² low. ### Effect Measures **Binary outcomes** (event yes/no): - **Risk Ratio (RR)**: Events in intervention / Events in control. Easier to interpret than OR. - **Odds Ratio (OR)**: Used when outcome rare (<10%) or case-control design. - **Risk Difference (RD)**: Absolute difference. Important for clinical interpretation (NNT = 1/RD). **Continuous outcomes** (measured on scale): - **Mean Difference (MD)**: When outcome measured on same scale (e.g., mm Hg blood pressure) - **Standardized Mean Difference (SMD)**: When outcome measured on different scales (different QoL questionnaires). Interpret as effect size: SMD 0.2 = small, 0.5 = moderate, 0.8 = large. **Time-to-event outcomes**: - **Hazard Ratio (HR)**: Accounts for censoring and time. From Cox proportional hazards models. ### Heterogeneity Assessment **I² statistic**: % of variability due to heterogeneity rather than chance. - **I² = 0-25%**: Low heterogeneity (might not need subgroup analysis) - **I² = 25-50%**: Moderate heterogeneity (explore sources) - **I² = 50-75%**: Substantial heterogeneity (subgroup analysis essential) - **I² > 75%**: Considerable heterogeneity (consider not pooling) **Cochran's Q test**: Tests whether heterogeneity is statistically significant (p<0.10 suggests heterogeneity). - **Limitation**: Low power with few studies, high power with many studies (may detect clinically unimportant heterogeneity) **Exploring heterogeneity**: 1. Visual inspection (forest plot - outliers?) 2. Subgroup analysis (by population, intervention, setting, risk of bias) 3. Meta-regression (if ≥10 studies) - test whether study-level characteristics (year, dose, age) explain heterogeneity 4. Sensitivity analysis (exclude high risk of bias, exclude outliers) ### Publication Bias Assessment **Methods** (require ≥10 studies): - **Funnel plot**: Plot effect size vs precision (SE). Asymmetry suggests small-study effects/publication bias. - **Egger's test**: Statistical test for funnel plot asymmetry (p<0.10 suggests bias). - **Trim and fill**: Impute missing studies and recalculate pooled effect. **Limitations**: Asymmetry can be due to heterogeneity, not just publication bias. Small-study effects != publication bias. **Search mitigation**: Search clinical trial registries (ClinicalTrials.gov, EudraCT), contact authors, grey literature. --- ## 5. Advanced Study Designs ### Pragmatic Trials **Purpose**: Evaluate effectiveness in real-world settings (vs efficacy in ideal conditions). **Characteristics**: - Broad inclusion criteria (representative of clinical practice) - Minimal exclusions (include comorbidities, elderly, diverse populations) - Flexible interventions (allow adaptations like clinical practice) - Clinically relevant comparators (usual care, not placebo) - Patient-important outcomes (mortality, QoL, not just biomarkers) - Long-term follow-up (capture real-world adherence, adverse events) **PRECIS-2 wheel**: Rates trials from explanatory (ideal conditions) to pragmatic (real-world) on 9 domains. **Example**: HOPE-3 trial (polypill for CVD prevention) - broad inclusion, minimal monitoring, usual care comparator, long-term follow-up. ### Non-Inferiority Trials **Purpose**: Show new treatment is "not worse" than standard (by pre-defined margin), usually because new treatment has other advantages (cheaper, safer, easier). **Key concepts**: - **Non-inferiority margin** (Δ): Maximum acceptable difference. New treatment preserves ≥50% of standard's benefit over placebo. - **One-sided test**: Test whether upper limit of 95% CI for difference < Δ. - **Interpretation**: If upper CI < Δ, declare non-inferiority. If CI crosses Δ, inconclusive. **Pitfalls**: - Large non-inferiority margins (>50% of benefit) allow ineffective treatments - Per-protocol analysis bias (favors non-inferiority); need ITT + per-protocol - Assay sensitivity: Must show historical evidence that standard > placebo **Example**: Enoxaparin vs unfractionated heparin for VTE treatment. Margin = 2% absolute difference in recurrent VTE. ### Cluster Randomized Trials **Design**: Randomize groups (hospitals, clinics, communities) not individuals. **When used**: - Intervention delivered at group level (policy, training, quality improvement) - Contamination risk if individuals randomized (control group adopts intervention) **Statistical consideration**: - **Intracluster correlation (ICC)**: Individuals within cluster more similar than across clusters - **Design effect**: Effective sample size reduced: Deff = 1 + (m-1) × ICC, where m = cluster size - **Analysis**: Account for clustering (GEE, mixed models, cluster-level analysis) **Example**: COMMIT trial (smoking cessation at workplace level). Randomized worksites, analyzed accounting for clustering. ### N-of-1 Trials **Design**: Single patient receives multiple crossovers between treatments in random order. **When used**: - Chronic stable conditions (asthma, arthritis, chronic pain) - Rapid onset/offset treatments - Substantial inter-patient variability in response - Patient wants personalized evidence **Requirements**: - ≥3 treatment periods per arm (A-B-A-B-A-B) - Washout between periods if needed - Blind patient and assessor if possible - Pre-specify outcome and decision rule **Analysis**: Compare outcomes during A vs B periods within patient (paired t-test, meta-analysis across periods). **Example**: Stimulant dose optimization for ADHD. Test 3 doses + placebo in randomized crossover, 1-week periods each. --- ## 6. Outcome Measurement ### Minimal Clinically Important Difference (MCID) **Definition**: Smallest change in outcome that patients perceive as beneficial (and would mandate change in management). **Determination methods**: 1. **Anchor-based**: Link change to external anchor ("How much has your pain improved?" - "A little" threshold) 2. **Distribution-based**: 0.5 SD or 1 SE as MCID (statistical, not patient-centered) 3. **Delphi consensus**: Expert panel agrees on MCID **Examples**: - **Pain VAS** (0-100): MCID = 10-15 points - **6-minute walk distance**: MCID = 30 meters - **KCCQ** (Kansas City Cardiomyopathy Questionnaire): MCID = 5 points - **FEV₁** (lung function): MCID = 100-140 mL **Interpretation**: Effect size must exceed MCID to be clinically meaningful. p<0.05 with effect < MCID = statistically significant but clinically trivial. ### Composite Outcomes **Definition**: Combines ≥2 outcomes into single endpoint (e.g., "death, MI, or stroke"). **Advantages**: - Increases event rate → reduces required sample size - Captures multiple aspects of benefit/harm **Disadvantages**: - Obscures which component drives effect (mortality reduction? or non-fatal MI?) - Components may not be equally important to patients (MI ≠ revascularization) - If components affected differently, composite can mislead **Guidelines**: - Report components separately - Verify effect is consistent across components - Weight components by importance if possible - Avoid composites with many low-importance components **Example**: MACE (major adverse cardiac events) = death + MI + stroke (appropriate). But "death, MI, stroke, or revascularization" dilutes with less important outcome. ### Surrogate Outcomes **Definition**: Biomarker/lab value used as substitute for patient-important outcome. **Valid surrogate criteria** (Prentice criteria): 1. Surrogate associated with clinical outcome (correlation) 2. Intervention affects surrogate 3. Intervention's effect on clinical outcome is mediated through surrogate 4. Effect on surrogate fully captures effect on clinical outcome **Problems**: - Many surrogates fail criteria #4 (e.g., antiarrhythmics reduce PVCs but increase mortality) - Intervention can affect surrogate without affecting clinical outcome **Examples**: - **Good surrogate**: Blood pressure for stroke (validated, consistent) - **Poor surrogate**: Bone density for fracture (drugs increase density but not all reduce fracture) - **Unvalidated**: HbA1c for microvascular complications (association exists, but lowering HbA1c doesn't always reduce complications) **Recommendation**: Prioritize patient-important outcomes. Accept surrogates only if validated relationship exists and patient-important outcome infeasible. --- ## 7. Special Populations & Contexts **Pediatric Evidence**: Age-appropriate outcomes (developmental milestones, parent-reported), pharmacokinetic modeling for dose prediction, extrapolation from adults if justified, expert opinion carries more weight when RCTs infeasible. **Rare Diseases**: N-of-1 trials, registries, historical controls (with caution), Bayesian methods to reduce sample requirements. Regulatory allows lower evidence standards (orphan drugs, conditional approval). **Health Technology Assessment**: Assesses clinical effectiveness (GRADE), safety, cost-effectiveness (cost per QALY), budget impact, organizational/ethical/social factors. Thresholds vary (£20-30k/QALY UK, $50-150k US). Requires systematic review + economic model + probabilistic sensitivity analysis. --- ## 8. Knowledge Translation **Evidence-to-Decision Framework** (GRADE): Problem priority → Desirable/undesirable effects → Certainty → Values → Balance of benefits/harms → Resources → Equity → Acceptability → Feasibility. **Recommendation strength**: - **Strong** ("We recommend"): Most patients would want, few would not - **Conditional** ("We suggest"): Substantial proportion might not want, or uncertainty high **Guideline Development**: Scope/PICOT → Systematic review → GRADE profiles → EtD framework → Recommendation (strong vs conditional) → External review → Update plan (3-5 years). COI management critical. AGREE II assesses guideline quality. --- ## 9. Common Pitfalls & Fixes **Surrogate outcomes**: Using unvalidated biomarkers. **Fix**: Prioritize patient-important outcomes (mortality, QoL). **Composite outcomes**: Obscuring which component drives effect. **Fix**: Report components separately, verify consistency. **Subgroup proliferation**: Data dredging for false positives. **Fix**: Pre-specify <5 subgroups, test interaction, require plausibility. **Statistical vs clinical significance**: p<0.05 with effect below MCID. **Fix**: Compare to MCID, report absolute effects (NNT). **Publication bias**: Missing null results. **Fix**: Search trial registries (ClinicalTrials.gov), contact authors, assess funnel plot. **Poor applicability**: Extrapolating from selected trials. **Fix**: Assess PICO match, setting differences, patient values. **Causation claims**: From observational data. **Fix**: Use causal language only for RCTs or strong obs evidence (large effect, dose-response). **Industry bias**: Uncritical acceptance. **Fix**: Assess COI, check selective reporting, verify independent analysis.