# Domain Research: Health Science - Advanced Methodology
## Workflow
```
Health Research Progress:
- [ ] Step 1: Formulate research question (PICOT)
- [ ] Step 2: Assess evidence hierarchy and study design
- [ ] Step 3: Evaluate study quality and bias
- [ ] Step 4: Prioritize and define outcomes
- [ ] Step 5: Synthesize evidence and grade certainty
- [ ] Step 6: Create decision-ready summary
```
**Step 1: Formulate research question (PICOT)**
Define precise PICOT elements for answerable research question (see template.md for framework).
**Step 2: Assess evidence hierarchy and study design**
Match study design to question type using [1. Evidence Hierarchy](#1-evidence-hierarchy) (RCT for therapy, cohort for prognosis, cross-sectional for diagnosis).
**Step 3: Evaluate study quality and bias**
Apply systematic bias assessment using [2. Bias Assessment](#2-bias-assessment) (Cochrane RoB 2, ROBINS-I, or QUADAS-2 depending on design).
**Step 4: Prioritize and define outcomes**
Distinguish patient-important from surrogate outcomes using [6. Outcome Measurement](#6-outcome-measurement) guidance on MCID, composite outcomes, and surrogates.
**Step 5: Synthesize evidence and grade certainty**
Rate certainty using [3. GRADE Framework](#3-grade-framework) (downgrade for bias/inconsistency/indirectness/imprecision/publication bias, upgrade for large effects/dose-response). For multiple studies, apply [4. Meta-Analysis Techniques](#4-meta-analysis-techniques).
**Step 6: Create decision-ready summary**
Synthesize findings using [8. Knowledge Translation](#8-knowledge-translation) evidence-to-decision framework, assess applicability per [7. Special Populations & Contexts](#7-special-populations--contexts), and avoid [9. Common Pitfalls](#9-common-pitfalls--fixes).
---
## 1. Evidence Hierarchy
### Study Design Selection by Question Type
**Therapy/Intervention Questions**:
- **Gold standard**: RCT (randomized controlled trial)
- **When RCT not feasible**: Prospective cohort or pragmatic trial
- **Never acceptable**: Case series, expert opinion for causal claims
- **Rationale**: RCTs minimize confounding through randomization, establishing causation
**Diagnostic Accuracy Questions**:
- **Gold standard**: Cross-sectional study with consecutive enrollment
- **Critical requirement**: Compare index test to validated reference standard in same patients
- **Avoid**: Case-control design (inflates sensitivity/specificity by selecting extremes)
- **Rationale**: Cross-sectional design prevents spectrum bias; consecutive enrollment prevents selection bias
**Prognosis/Prediction Questions**:
- **Gold standard**: Prospective cohort (follow from exposure to outcome)
- **Acceptable**: Retrospective cohort with robust data (registries, databases)
- **Avoid**: Case-control (can't estimate incidence), cross-sectional (no temporal sequence)
- **Rationale**: Cohort design establishes temporal sequence, allows incidence calculation
**Harm/Safety Questions**:
- **Common harms**: RCTs (adequate power for events occurring in >1% of patients)
- **Rare harms**: Large observational studies (cohort, case-control, pharmacovigilance)
- **Delayed harms**: Long-term cohort studies or registries
- **Rationale**: RCTs often lack power/duration for rare or delayed harms; observational studies provide larger samples and longer follow-up
### Hierarchy by Evidence Strength
**Level 1 (Highest)**: Systematic reviews and meta-analyses of well-designed RCTs
**Level 2**: Individual large, well-designed RCT with low risk of bias
**Level 3**: Well-designed RCTs with some limitations (quasi-randomized, not blinded)
**Level 4**: Cohort studies (prospective better than retrospective)
**Level 5**: Case-control studies
**Level 6**: Cross-sectional surveys (descriptive only, not causal)
**Level 7**: Case series or case reports
**Level 8**: Expert opinion, pathophysiologic rationale
**Important**: Hierarchy is a starting point. Study quality matters more than design alone. Well-conducted cohort > poorly conducted RCT.
---
## 2. Bias Assessment
### Cochrane Risk of Bias 2 (RoB 2) for RCTs
**Domain 1: Randomization Process**
- **Low risk**: Computer-generated sequence, central allocation, opaque envelopes
- **Some concerns**: Randomization method unclear, baseline imbalances suggesting problems
- **High risk**: Non-random sequence (alternation, date of birth), predictable allocation, post-randomization exclusions
**Domain 2: Deviations from Intended Interventions**
- **Low risk**: Double-blind, protocol deviations balanced across groups, intention-to-treat (ITT) analysis
- **Some concerns**: Open-label but objective outcomes, minor unbalanced deviations
- **High risk**: Open-label with subjective outcomes, substantial deviation (>10% cross-over), per-protocol analysis only
**Domain 3: Missing Outcome Data**
- **Low risk**: <5% loss to follow-up, balanced across groups, multiple imputation if >5%
- **Some concerns**: 5-10% loss, ITT analysis used, or reasons for missingness reported
- **High risk**: >10% loss, or imbalanced loss (>5% difference between groups), or complete-case analysis with no sensitivity
**Domain 4: Measurement of Outcome**
- **Low risk**: Blinded outcome assessors, objective outcomes (mortality, lab values)
- **Some concerns**: Unblinded assessors but objective outcomes
- **High risk**: Unblinded assessors with subjective outcomes (pain, quality of life)
**Domain 5: Selection of Reported Result**
- **Low risk**: Protocol published before enrollment, all pre-specified outcomes reported
- **Some concerns**: Protocol not available, but outcomes match methods section
- **High risk**: Outcomes in results differ from protocol/methods, selective subgroup reporting
**Overall Judgment**: If any domain is "high risk" → Overall high risk. If all domains "low risk" → Overall low risk. Otherwise → Some concerns (note: "some concerns" in several domains may also justify an overall high-risk judgment if together they substantially lower confidence in the result).
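The overall-judgment rule can be sketched as a small helper (hypothetical function, not official Cochrane tooling; it implements only the basic mapping, whereas real judgments also weigh multiple "some concerns" domains):

```python
# Simplified RoB 2 overall judgment: any high-risk domain -> high;
# all low -> low; otherwise -> some concerns. In practice, "some
# concerns" in several domains can also justify an overall high rating.

def rob2_overall(domains):
    """domains: five ratings, each 'low', 'some concerns', or 'high'."""
    if any(d == "high" for d in domains):
        return "high"
    if all(d == "low" for d in domains):
        return "low"
    return "some concerns"

print(rob2_overall(["low"] * 5))                                    # low
print(rob2_overall(["low", "some concerns", "low", "low", "low"]))  # some concerns
print(rob2_overall(["low", "high", "low", "low", "low"]))           # high
```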
### ROBINS-I for Observational Studies
**Domain 1: Confounding**
- **Low**: All important confounders measured and adjusted (multivariable regression, propensity scores, matching)
- **Moderate**: Most confounders adjusted, but some unmeasured
- **Serious**: Important confounders not adjusted (e.g., comparing treatment groups without adjusting for severity)
- **Critical**: Confounding by indication makes results uninterpretable
**Domain 2: Selection of Participants**
- **Low**: Selection into study unrelated to intervention and outcome (inception cohort, consecutive enrollment)
- **Serious**: Post-intervention selection (survivor bias, selecting on outcome)
**Domain 3: Classification of Interventions**
- **Low**: Intervention status well-defined and independently ascertained (pharmacy records, procedure logs)
- **Serious**: Intervention status based on patient recall or subjective classification
**Domain 4: Deviations from Intended Interventions**
- **Low**: Intervention/comparator groups received intended interventions, co-interventions balanced
- **Serious**: Substantial differences in co-interventions between groups
**Domain 5: Missing Data**
- **Low**: <5% missing outcome data, or multiple imputation with sensitivity analysis
- **Serious**: >10% missing, complete-case analysis with no sensitivity
**Domain 6: Measurement of Outcomes**
- **Low**: Blinded outcome assessment or objective outcomes
- **Serious**: Unblinded assessment of subjective outcomes, knowledge of intervention may bias assessment
**Domain 7: Selection of Reported Result**
- **Low**: Analysis plan pre-specified and followed
- **Serious**: Selective reporting of outcomes or subgroups
### QUADAS-2 for Diagnostic Accuracy Studies
**Domain 1: Patient Selection**
- **Low**: Consecutive or random sample, case-control design avoided, appropriate exclusions
- **High**: Case-control design (inflates accuracy), inappropriate exclusions (spectrum bias)
**Domain 2: Index Test**
- **Low**: Pre-specified threshold, blinded to reference standard
- **High**: Threshold chosen after seeing results, unblinded interpretation
**Domain 3: Reference Standard**
- **Low**: Reference standard correctly classifies condition, interpreted blind to index test
- **High**: Imperfect reference standard, differential verification (different reference for positive/negative index)
**Domain 4: Flow and Timing**
- **Low**: All patients receive same reference standard, appropriate interval between tests
- **High**: Not all patients receive reference (partial verification), long interval allowing disease status to change
---
## 3. GRADE Framework
### Starting Certainty
**RCTs**: Start at High certainty
**Observational studies**: Start at Low certainty
### Downgrade Factors (Each -1 or -2 levels)
**1. Risk of Bias (Study Limitations)**
- **Serious (-1)**: Most studies have some concerns on RoB 2, or observational studies with moderate risk on most ROBINS-I domains
- **Very serious (-2)**: Most studies high risk of bias, or observational with serious/critical risk on ROBINS-I
**2. Inconsistency (Heterogeneity)**
- **Serious (-1)**: I² = 50-75%, or point estimates vary widely, or confidence intervals show minimal overlap
- **Very serious (-2)**: I² > 75%, opposite directions of effect
- **Do not downgrade if**: Heterogeneity explained by subgroup analysis, or all studies show benefit despite variation in magnitude
**3. Indirectness (Applicability)**
- **Serious (-1)**: Indirect comparison (no head-to-head trial), surrogate outcome instead of patient-important, PICO mismatch (different population/intervention than question)
- **Very serious (-2)**: Multiple levels of indirectness (e.g., indirect comparison + surrogate outcome)
**4. Imprecision (Statistical Uncertainty)**
- **Serious (-1)**: Confidence interval crosses minimal clinically important difference (MCID) or includes both benefit and harm, or optimal information size (OIS) not met
- **Very serious (-2)**: Very wide CI, very small sample (<100 total), or very few events (<100 total)
- **Rule of thumb**: OIS = sample size required for adequately powered RCT (~400 patients for typical effect size)
**5. Publication Bias**
- **Serious (-1)**: Funnel plot asymmetry (Egger's test p<0.10), all studies industry-funded with positive results, or known unpublished negative trials
- **Note**: Requires ≥10 studies to assess funnel plot. Consider searching trial registries for unpublished studies.
### Upgrade Factors (Observational Studies Only)
**1. Large Effect**
- **Upgrade +1**: RR > 2 or < 0.5 (based on consistent evidence, no plausible confounders)
- **Upgrade +2**: RR > 5 or < 0.2 ("very large effect")
- **Example**: Smoking → lung cancer (RR ~20) upgraded from low to moderate or high
**2. Dose-Response Gradient**
- **Upgrade +1**: Increasing exposure associated with increasing risk/benefit in consistent pattern
- **Example**: More cigarettes/day → higher lung cancer risk
**3. All Plausible Confounders Would Reduce Observed Effect**
- **Upgrade +1**: Despite confounding working against finding effect, effect still observed
- **Example**: Healthy user bias would reduce observed benefit, yet benefit still seen
### Final Certainty Rating
**High** (⊕⊕⊕⊕): Very confident true effect is close to estimate. Further research very unlikely to change conclusion.
**Moderate** (⊕⊕⊕○): Moderately confident. True effect is likely close to estimate, but could be substantially different. Further research may change conclusion.
**Low** (⊕⊕○○): Limited confidence. True effect may be substantially different. Further research likely to change conclusion.
**Very Low** (⊕○○○): Very little confidence. True effect is likely substantially different. Any estimate is very uncertain.
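The start-then-adjust logic above can be illustrated as simple arithmetic on the four-level scale (a sketch only — GRADE ratings are structured judgments, not mechanical sums; function and variable names are illustrative):

```python
# GRADE certainty arithmetic: RCTs start at high, observational studies
# at low; subtract downgrade levels, add upgrade levels (upgrades apply
# to observational evidence only); clamp to the four-level scale.

LEVELS = ["very low", "low", "moderate", "high"]

def grade_certainty(design, downgrades=0, upgrades=0):
    start = 3 if design == "rct" else 1
    if design != "observational":
        upgrades = 0  # upgrade factors apply only to observational studies
    return LEVELS[max(0, min(3, start - downgrades + upgrades))]

print(grade_certainty("rct", downgrades=2))          # low
print(grade_certainty("observational", upgrades=2))  # high (e.g., smoking -> lung cancer)
```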
---
## 4. Meta-Analysis Techniques
### When to Pool (Meta-Analysis)
**Pool when**:
- Studies address same PICO question
- Outcomes measured similarly (same construct, similar timepoints)
- Low to moderate heterogeneity (I² < 60%)
- At least 3 studies available
**Do not pool when**:
- Substantial heterogeneity (I² > 75%) unexplained by subgroups
- Different interventions (can't pool aspirin with warfarin for "anticoagulation")
- Different populations (adults vs children, mild vs severe disease)
- Methodologically flawed studies (high risk of bias)
### Statistical Models
**Fixed-effect model**: Assumes one true effect, differences due to sampling error only.
- **Use when**: I² < 25%, studies very similar
- **Calculation**: Inverse-variance weighting (larger studies get more weight)
**Random-effects model**: Assumes distribution of true effects, accounts for between-study variance.
- **Use when**: I² ≥ 25%, clinical heterogeneity expected
- **Calculation**: DerSimonian-Laird or REML methods
- **Note**: Weights studies more evenly than the fixed-effect model does, so smaller studies receive relatively more weight
**Recommendation**: Use random-effects as default for clinical heterogeneity, even if I² low.
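The two pooling models can be sketched as follows (a teaching illustration with made-up log risk ratios; real analyses should use dedicated software such as R's metafor or meta packages):

```python
import math

# Inverse-variance pooling on the log scale, with a DerSimonian-Laird
# random-effects variant. Larger studies (smaller variances) get more
# weight; tau^2 spreads the weights more evenly across studies.

def pool(log_effects, variances, model="random"):
    """Pooled log effect under a fixed- or random-effects model."""
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, log_effects)) / sum(w)
    if model == "fixed":
        return fixed
    # DerSimonian-Laird estimate of between-study variance tau^2
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(log_effects) - 1)) / c)
    w_re = [1 / (v + tau2) for v in variances]
    return sum(wi * yi for wi, yi in zip(w_re, log_effects)) / sum(w_re)

# Three hypothetical studies: log risk ratios and their variances.
logs = [math.log(0.5), math.log(1.2), math.log(0.8)]
vars_ = [0.02, 0.03, 0.05]
print(round(math.exp(pool(logs, vars_, "fixed")), 3))   # ~0.726
print(round(math.exp(pool(logs, vars_, "random")), 3))  # ~0.777
```

With this heterogeneous toy data the random-effects weights are nearly equal, so the pooled estimate shifts toward the smaller, less precise studies.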
### Effect Measures
**Binary outcomes** (event yes/no):
- **Risk Ratio (RR)**: Risk (proportion with event) in intervention / risk in control. Easier to interpret than OR.
- **Odds Ratio (OR)**: Used when outcome rare (<10%) or case-control design.
- **Risk Difference (RD)**: Absolute difference. Important for clinical interpretation (NNT = 1/RD).
**Continuous outcomes** (measured on scale):
- **Mean Difference (MD)**: When outcome measured on same scale (e.g., mm Hg blood pressure)
- **Standardized Mean Difference (SMD)**: When outcome measured on different scales (different QoL questionnaires). Interpret as effect size: SMD 0.2 = small, 0.5 = moderate, 0.8 = large.
**Time-to-event outcomes**:
- **Hazard Ratio (HR)**: Accounts for censoring and time. From Cox proportional hazards models.
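The binary measures above can be computed from a single hypothetical 2x2 table (20/100 events on treatment vs 30/100 on control):

```python
# Binary effect measures from a 2x2 table: RR, OR, RD, and NNT = 1/|RD|.
# Numbers are made up for illustration.

def binary_effects(a, n1, c, n2):
    """a events among n1 treated; c events among n2 controls."""
    r1, r2 = a / n1, c / n2
    rr = r1 / r2                           # risk ratio
    or_ = (a / (n1 - a)) / (c / (n2 - c))  # odds ratio
    rd = r1 - r2                           # risk difference (absolute)
    nnt = 1 / abs(rd)                      # number needed to treat
    return rr, or_, rd, nnt

rr, or_, rd, nnt = binary_effects(20, 100, 30, 100)
print(round(rr, 3))   # 0.667: a one-third relative risk reduction
print(round(or_, 3))  # 0.583: OR overstates the RR when events are common
print(round(rd, 2))   # -0.1: 10 percentage points absolute reduction
print(round(nnt))     # 10: treat 10 patients to prevent one event
```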
### Heterogeneity Assessment
**I² statistic**: % of variability due to heterogeneity rather than chance.
- **I² = 0-25%**: Low heterogeneity (might not need subgroup analysis)
- **I² = 25-50%**: Moderate heterogeneity (explore sources)
- **I² = 50-75%**: Substantial heterogeneity (subgroup analysis essential)
- **I² > 75%**: Considerable heterogeneity (consider not pooling)
**Cochran's Q test**: Tests whether heterogeneity is statistically significant (p<0.10 suggests heterogeneity).
- **Limitation**: Low power with few studies, high power with many studies (may detect clinically unimportant heterogeneity)
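Q and I² follow directly from the inverse-variance weights; a minimal sketch with illustrative numbers:

```python
# Cochran's Q and the I² statistic from study effects and variances.
# I² = (Q - df) / Q expresses the share of variability beyond chance.

def heterogeneity(effects, variances):
    w = [1 / v for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2

q, i2 = heterogeneity([-0.69, 0.18, -0.22], [0.02, 0.03, 0.05])
print(round(q, 1), round(i2))  # 15.4 87 -> considerable heterogeneity
```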
**Exploring heterogeneity**:
1. Visual inspection (forest plot - outliers?)
2. Subgroup analysis (by population, intervention, setting, risk of bias)
3. Meta-regression (if ≥10 studies) - test whether study-level characteristics (year, dose, age) explain heterogeneity
4. Sensitivity analysis (exclude high risk of bias, exclude outliers)
### Publication Bias Assessment
**Methods** (require ≥10 studies):
- **Funnel plot**: Plot effect size vs precision (SE). Asymmetry suggests small-study effects/publication bias.
- **Egger's test**: Statistical test for funnel plot asymmetry (p<0.10 suggests bias).
- **Trim and fill**: Impute missing studies and recalculate pooled effect.
**Limitations**: Asymmetry can be due to heterogeneity, not just publication bias. Small-study effects != publication bias.
**Search mitigation**: Search clinical trial registries (ClinicalTrials.gov, EudraCT), contact authors, grey literature.
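Egger's test is ordinary least squares of the standardized effect on precision, with the intercept as the asymmetry statistic. A minimal sketch with made-up data (a stats package should be used for the intercept's p-value):

```python
# Egger's regression sketch: regress effect/SE on 1/SE; an intercept far
# from zero suggests small-study effects / funnel-plot asymmetry.
# Toy numbers: small studies (large SE) show bigger protective effects.

def egger_intercept(effects, ses):
    x = [1 / se for se in ses]                    # precision
    y = [e / se for e, se in zip(effects, ses)]   # standardized effect
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return my - slope * mx                        # OLS intercept

effects = [-0.9, -0.7, -0.5, -0.3]   # log risk ratios
ses = [0.40, 0.30, 0.15, 0.08]       # standard errors
print(round(egger_intercept(effects, ses), 2))  # -1.95, well away from 0
```

The intercept is negative here because the least precise studies report the largest protective effects, the classic asymmetric-funnel pattern.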
---
## 5. Advanced Study Designs
### Pragmatic Trials
**Purpose**: Evaluate effectiveness in real-world settings (vs efficacy in ideal conditions).
**Characteristics**:
- Broad inclusion criteria (representative of clinical practice)
- Minimal exclusions (include comorbidities, elderly, diverse populations)
- Flexible interventions (allow adaptations like clinical practice)
- Clinically relevant comparators (usual care, not placebo)
- Patient-important outcomes (mortality, QoL, not just biomarkers)
- Long-term follow-up (capture real-world adherence, adverse events)
**PRECIS-2 wheel**: Rates trials from explanatory (ideal conditions) to pragmatic (real-world) on 9 domains.
**Example**: HOPE-3 trial (fixed-dose blood-pressure lowering plus statin for primary CVD prevention) - broad inclusion criteria, minimal monitoring, long-term follow-up.
### Non-Inferiority Trials
**Purpose**: Show new treatment is "not worse" than standard (by pre-defined margin), usually because new treatment has other advantages (cheaper, safer, easier).
**Key concepts**:
- **Non-inferiority margin** (Δ): Maximum acceptable difference. New treatment preserves ≥50% of standard's benefit over placebo.
- **One-sided test**: Test whether upper limit of 95% CI for difference < Δ.
- **Interpretation**: If upper CI < Δ, declare non-inferiority. If CI crosses Δ, inconclusive.
**Pitfalls**:
- Large non-inferiority margins (>50% of benefit) allow ineffective treatments
- Per-protocol analysis bias (favors non-inferiority); need ITT + per-protocol
- Assay sensitivity: Must show historical evidence that standard > placebo
**Example**: Enoxaparin vs unfractionated heparin for VTE treatment. Margin = 2% absolute difference in recurrent VTE.
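The CI-versus-margin reading above, made concrete with hypothetical risk-difference numbers against a 2-percentage-point margin:

```python
# Non-inferiority interpretation: with diff = new minus standard event
# rate (higher = worse), compare the upper 95% CI limit to the margin.
# Numbers are hypothetical.

MARGIN = 0.02  # 2 percentage points absolute

def verdict(upper_ci, margin=MARGIN):
    return "non-inferior" if upper_ci < margin else "inconclusive"

print(verdict(0.014))  # non-inferior: upper CI limit below the margin
print(verdict(0.031))  # inconclusive: CI crosses the margin
```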
### Cluster Randomized Trials
**Design**: Randomize groups (hospitals, clinics, communities) not individuals.
**When used**:
- Intervention delivered at group level (policy, training, quality improvement)
- Contamination risk if individuals randomized (control group adopts intervention)
**Statistical consideration**:
- **Intracluster correlation (ICC)**: Individuals within cluster more similar than across clusters
- **Design effect**: Effective sample size reduced: Deff = 1 + (m-1) × ICC, where m = cluster size
- **Analysis**: Account for clustering (GEE, mixed models, cluster-level analysis)
**Example**: COMMIT trial (community-level smoking cessation intervention). Randomized whole communities and analyzed accounting for clustering.
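The design-effect arithmetic can be worked through for a hypothetical trial of 20 clinics with 50 patients each:

```python
# Deff = 1 + (m - 1) * ICC; dividing the enrolled n by Deff gives the
# effective sample size after accounting for within-cluster similarity.
# Trial numbers are hypothetical.

def design_effect(m, icc):
    """m = average cluster size, icc = intracluster correlation."""
    return 1 + (m - 1) * icc

m, icc = 50, 0.05      # 20 clinics of 50 patients each
n = 20 * m             # 1000 patients enrolled
deff = design_effect(m, icc)
print(round(deff, 2))  # 3.45
print(round(n / deff)) # 290 effective patients, far fewer than 1000
```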
### N-of-1 Trials
**Design**: Single patient receives multiple crossovers between treatments in random order.
**When used**:
- Chronic stable conditions (asthma, arthritis, chronic pain)
- Rapid onset/offset treatments
- Substantial inter-patient variability in response
- Patient wants personalized evidence
**Requirements**:
- ≥3 treatment periods per arm (A-B-A-B-A-B)
- Washout between periods if needed
- Blind patient and assessor if possible
- Pre-specify outcome and decision rule
**Analysis**: Compare outcomes during A vs B periods within patient (paired t-test, meta-analysis across periods).
**Example**: Stimulant dose optimization for ADHD. Test 3 doses + placebo in randomized crossover, 1-week periods each.
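The within-patient comparison can be sketched as a paired analysis over matched periods (made-up symptom scores, lower = better):

```python
import math
import statistics

# N-of-1 paired analysis: difference active minus placebo within each
# matched period pair, then a paired t statistic on those differences.

a_periods = [3.1, 2.6, 3.4]   # active treatment periods
b_periods = [5.0, 4.6, 4.9]   # placebo periods

diffs = [a - b for a, b in zip(a_periods, b_periods)]
mean_d = statistics.mean(diffs)
se_d = statistics.stdev(diffs) / math.sqrt(len(diffs))
t = mean_d / se_d             # compare against t distribution, n - 1 df

print(round(mean_d, 1))       # -1.8: about 1.8 points better on active
print(round(t, 2))
```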
---
## 6. Outcome Measurement
### Minimal Clinically Important Difference (MCID)
**Definition**: Smallest change in outcome that patients perceive as beneficial (and would mandate change in management).
**Determination methods**:
1. **Anchor-based**: Link change to external anchor ("How much has your pain improved?" - "A little" threshold)
2. **Distribution-based**: 0.5 SD or 1 SEM (standard error of measurement) as MCID (statistical, not patient-centered)
3. **Delphi consensus**: Expert panel agrees on MCID
**Examples**:
- **Pain VAS** (0-100): MCID = 10-15 points
- **6-minute walk distance**: MCID = 30 meters
- **KCCQ** (Kansas City Cardiomyopathy Questionnaire): MCID = 5 points
- **FEV₁** (lung function): MCID = 100-140 mL
**Interpretation**: Effect size must exceed MCID to be clinically meaningful. p<0.05 with effect < MCID = statistically significant but clinically trivial.
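The statistical-versus-clinical distinction can be made explicit with a small check against the MCID (hypothetical pain-VAS result, names illustrative):

```python
# Compare an effect estimate and its 95% CI to the MCID: significance
# means the CI excludes zero; clinical meaning requires |effect| >= MCID.

def clinical_verdict(effect, ci_low, ci_high, mcid):
    significant = not (ci_low <= 0 <= ci_high)
    meaningful = abs(effect) >= mcid
    if significant and meaningful:
        return "statistically significant and clinically meaningful"
    if significant:
        return "statistically significant but below the MCID"
    return "not statistically significant"

# 6-point VAS reduction, 95% CI 2-10, against a 10-point MCID:
print(clinical_verdict(6, 2, 10, 10))  # significant but below the MCID
```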
### Composite Outcomes
**Definition**: Combines ≥2 outcomes into single endpoint (e.g., "death, MI, or stroke").
**Advantages**:
- Increases event rate → reduces required sample size
- Captures multiple aspects of benefit/harm
**Disadvantages**:
- Obscures which component drives effect (mortality reduction? or non-fatal MI?)
- Components may not be equally important to patients (MI ≠ revascularization)
- If components affected differently, composite can mislead
**Guidelines**:
- Report components separately
- Verify effect is consistent across components
- Weight components by importance if possible
- Avoid composites with many low-importance components
**Example**: MACE (major adverse cardiac events) = death + MI + stroke (appropriate). But "death, MI, stroke, or revascularization" dilutes the composite with a less important component.
### Surrogate Outcomes
**Definition**: Biomarker/lab value used as substitute for patient-important outcome.
**Valid surrogate criteria** (Prentice criteria):
1. Surrogate associated with clinical outcome (correlation)
2. Intervention affects surrogate
3. Intervention's effect on clinical outcome is mediated through surrogate
4. Effect on surrogate fully captures effect on clinical outcome
**Problems**:
- Many surrogates fail criterion 4 (e.g., antiarrhythmic drugs suppress PVCs yet increase mortality)
- Intervention can affect surrogate without affecting clinical outcome
**Examples**:
- **Good surrogate**: Blood pressure for stroke (validated, consistent)
- **Poor surrogate**: Bone density for fracture (drugs increase density but not all reduce fracture)
- **Unvalidated**: HbA1c for microvascular complications (association exists, but lowering HbA1c doesn't always reduce complications)
**Recommendation**: Prioritize patient-important outcomes. Accept surrogates only if validated relationship exists and patient-important outcome infeasible.
---
## 7. Special Populations & Contexts
**Pediatric Evidence**: Age-appropriate outcomes (developmental milestones, parent-reported), pharmacokinetic modeling for dose prediction, extrapolation from adults if justified, expert opinion carries more weight when RCTs infeasible.
**Rare Diseases**: N-of-1 trials, registries, historical controls (with caution), Bayesian methods to reduce sample requirements. Regulatory allows lower evidence standards (orphan drugs, conditional approval).
**Health Technology Assessment**: Assesses clinical effectiveness (GRADE), safety, cost-effectiveness (cost per QALY), budget impact, organizational/ethical/social factors. Thresholds vary (£20-30k/QALY UK, $50-150k US). Requires systematic review + economic model + probabilistic sensitivity analysis.
---
## 8. Knowledge Translation
**Evidence-to-Decision Framework** (GRADE): Problem priority → Desirable/undesirable effects → Certainty → Values → Balance of benefits/harms → Resources → Equity → Acceptability → Feasibility.
**Recommendation strength**:
- **Strong** ("We recommend"): Most patients would want, few would not
- **Conditional** ("We suggest"): Substantial proportion might not want, or uncertainty high
**Guideline Development**: Scope/PICOT → Systematic review → GRADE profiles → EtD framework → Recommendation (strong vs conditional) → External review → Update plan (3-5 years). COI management critical. AGREE II assesses guideline quality.
---
## 9. Common Pitfalls & Fixes
**Surrogate outcomes**: Using unvalidated biomarkers. **Fix**: Prioritize patient-important outcomes (mortality, QoL).
**Composite outcomes**: Obscuring which component drives effect. **Fix**: Report components separately, verify consistency.
**Subgroup proliferation**: Data dredging for false positives. **Fix**: Pre-specify <5 subgroups, test interaction, require plausibility.
**Statistical vs clinical significance**: p<0.05 with effect below MCID. **Fix**: Compare to MCID, report absolute effects (NNT).
**Publication bias**: Missing null results. **Fix**: Search trial registries (ClinicalTrials.gov), contact authors, assess funnel plot.
**Poor applicability**: Extrapolating from selected trials. **Fix**: Assess PICO match, setting differences, patient values.
**Causation claims**: From observational data. **Fix**: Use causal language only for RCTs or strong obs evidence (large effect, dose-response).
**Industry bias**: Uncritical acceptance. **Fix**: Assess COI, check selective reporting, verify independent analysis.