Initial commit

Zhongwei Li
2025-11-30 08:38:26 +08:00
commit 41d9f6b189

---
name: domain-research-health-science
description: Use when formulating clinical research questions (PICOT framework), evaluating health evidence quality (study design hierarchy, bias assessment, GRADE), prioritizing patient-important outcomes, conducting systematic reviews or meta-analyses, creating evidence summaries for guidelines, assessing regulatory evidence, or when user mentions clinical trials, evidence-based medicine, health research methodology, systematic reviews, research protocols, or study quality assessment.
---
# Domain Research: Health Science
## Table of Contents
- [Purpose](#purpose)
- [When to Use](#when-to-use)
- [What Is It?](#what-is-it)
- [Workflow](#workflow)
- [Common Patterns](#common-patterns)
- [Guardrails](#guardrails)
- [Quick Reference](#quick-reference)
## Purpose
This skill helps structure clinical and health science research using evidence-based medicine frameworks. It guides you through formulating precise research questions (PICOT), evaluating study quality (hierarchy of evidence, bias assessment, GRADE), prioritizing outcomes (patient-important vs surrogate), and synthesizing evidence for clinical decision-making.
## When to Use
Use this skill when:
- **Formulating research questions**: Structuring clinical questions using PICOT (Population, Intervention, Comparator, Outcome, Timeframe)
- **Evaluating evidence quality**: Assessing study design strength, risk of bias, certainty of evidence (GRADE framework)
- **Prioritizing outcomes**: Distinguishing patient-important outcomes from surrogate endpoints, creating outcome hierarchies
- **Systematic reviews**: Planning or conducting systematic reviews, meta-analyses, or evidence syntheses
- **Clinical guidelines**: Creating evidence summaries for practice guidelines or decision support
- **Trial design**: Designing RCTs, pragmatic trials, or observational studies with rigorous methodology
- **Regulatory submissions**: Preparing evidence dossiers for drug/device approval or reimbursement decisions
- **Critical appraisal**: Evaluating published research for clinical applicability and methodological quality
Trigger phrases: "clinical trial design", "systematic review", "PICOT question", "evidence quality", "bias assessment", "GRADE", "outcome measures", "research protocol", "evidence synthesis", "study appraisal"
## What Is It?
Domain Research: Health Science applies structured frameworks from evidence-based medicine to ensure clinical research is well-formulated, methodologically sound, and clinically meaningful.
**Quick example:**
**Vague question**: "Does this drug work for heart disease?"
**PICOT-structured question**:
- **P** (Population): Adults >65 with heart failure and reduced ejection fraction
- **I** (Intervention): SGLT2 inhibitor (dapagliflozin 10mg daily)
- **C** (Comparator): Standard care (ACE inhibitor + beta-blocker)
- **O** (Outcome): All-cause mortality (primary); hospitalizations, quality of life (secondary)
- **T** (Timeframe): 12-month follow-up
**Result**: Precise, answerable research question that guides study design, literature search, and outcome selection.
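The five PICOT elements can be captured as a simple structured record so that nothing is left implicit. A minimal Python sketch (the `PICOTQuestion` class and its field names are illustrative only, not part of any library):

```python
from dataclasses import dataclass

@dataclass
class PICOTQuestion:
    """Holds the five PICOT elements of a clinical research question."""
    population: str
    intervention: str
    comparator: str
    outcome: str
    timeframe: str

    def summary(self) -> str:
        # Render the elements as a single answerable question.
        return (f"In {self.population}, does {self.intervention} "
                f"compared with {self.comparator} affect {self.outcome} "
                f"over {self.timeframe}?")

# The heart-failure example above, restated as a structured question
q = PICOTQuestion(
    population="adults >65 with heart failure and reduced ejection fraction",
    intervention="dapagliflozin 10mg daily",
    comparator="standard care (ACE inhibitor + beta-blocker)",
    outcome="all-cause mortality",
    timeframe="12 months",
)
print(q.summary())
```

Forcing every element into its own field makes a vague question ("does this drug work?") impossible to state.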
## Workflow
Copy this checklist and track your progress:
```
Health Research Progress:
- [ ] Step 1: Formulate research question (PICOT)
- [ ] Step 2: Assess evidence hierarchy and study design
- [ ] Step 3: Evaluate study quality and bias
- [ ] Step 4: Prioritize and define outcomes
- [ ] Step 5: Synthesize evidence and grade certainty
- [ ] Step 6: Create decision-ready summary
```
**Step 1: Formulate research question (PICOT)**
Use PICOT framework to structure answerable clinical question. Define Population (demographics, condition, setting), Intervention (treatment, exposure, diagnostic test), Comparator (alternative treatment, placebo, standard care), Outcome (patient-important endpoints), and Timeframe (follow-up duration). See [resources/template.md](resources/template.md#picot-framework) for structured templates.
**Step 2: Assess evidence hierarchy and study design**
Determine appropriate study design based on research question type (therapy: RCT; diagnosis: cross-sectional; prognosis: cohort; harm: case-control or cohort). Understand hierarchy of evidence (systematic reviews > RCTs > cohort > case-control > case series). See [resources/methodology.md](resources/methodology.md#evidence-hierarchy) for design selection guidance.
**Step 3: Evaluate study quality and bias**
Apply risk of bias assessment tools (Cochrane RoB 2 for RCTs, ROBINS-I for observational studies, QUADAS-2 for diagnostic accuracy). Evaluate randomization, blinding, allocation concealment, incomplete outcome data, selective reporting. See [resources/methodology.md](resources/methodology.md#bias-assessment) for detailed criteria.
**Step 4: Prioritize and define outcomes**
Distinguish patient-important outcomes (mortality, symptoms, quality of life, function) from surrogate endpoints (biomarkers, lab values). Create outcome hierarchy: critical (decision-driving), important (informs decision), not important. Define measurement instruments and minimal clinically important differences (MCID). See [resources/template.md](resources/template.md#outcome-hierarchy) for prioritization framework.
**Step 5: Synthesize evidence and grade certainty**
Apply GRADE (Grading of Recommendations Assessment, Development and Evaluation) to rate certainty of evidence (high, moderate, low, very low). Consider study limitations, inconsistency, indirectness, imprecision, publication bias. Upgrade observational evidence for large effects, dose-response gradients, or when plausible residual confounding would reduce the observed effect. See [resources/methodology.md](resources/methodology.md#grade-framework) for rating guidance.
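The start-level and downgrade/upgrade arithmetic of GRADE can be sketched in a few lines. Real GRADE rating is a structured judgment, not a formula; this hypothetical `grade_certainty` helper only illustrates how the levels move:

```python
def grade_certainty(study_design: str, downgrades: int, upgrades: int = 0) -> str:
    """Illustrative GRADE level arithmetic (a sketch, not the full method):
    RCT evidence starts high, observational evidence starts low; each
    serious concern (bias, inconsistency, indirectness, imprecision,
    publication bias) moves the rating down one level, and each upgrade
    factor (large effect, dose-response, confounding reducing effect)
    moves observational evidence up one level."""
    levels = ["very low", "low", "moderate", "high"]
    start = 3 if study_design == "rct" else 1  # index into levels
    idx = max(0, min(3, start - downgrades + upgrades))
    return levels[idx]

# RCT evidence downgraded for serious imprecision and inconsistency
print(grade_certainty("rct", downgrades=2))                        # low
# Observational evidence upgraded for a large effect
print(grade_certainty("observational", downgrades=0, upgrades=1))  # moderate
```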
**Step 6: Create decision-ready summary**
Produce evidence profile or summary of findings table linking outcomes to certainty ratings and effect estimates. Include clinical interpretation, applicability assessment, and evidence gaps. Validate using [resources/evaluators/rubric_domain_research_health_science.json](resources/evaluators/rubric_domain_research_health_science.json). **Minimum standard**: Average score ≥ 3.5.
## Common Patterns
**Pattern 1: Therapy/Intervention Question**
- **PICOT**: Adults with condition → new treatment vs standard care → patient-important outcomes → follow-up period
- **Study design**: RCT preferred (highest quality for causation); systematic review of RCTs for synthesis
- **Key outcomes**: Mortality, morbidity, quality of life, adverse events
- **Bias assessment**: Cochrane RoB 2 (randomization, blinding, attrition, selective reporting)
- **Example**: SGLT2 inhibitors for heart failure → reduced mortality (GRADE: high certainty)
**Pattern 2: Diagnostic Test Accuracy**
- **PICOT**: Patients with suspected condition → new test vs reference standard → sensitivity/specificity → cross-sectional
- **Study design**: Cross-sectional study with consecutive enrollment; avoid case-control (inflates accuracy)
- **Key outcomes**: Sensitivity, specificity, positive/negative predictive values, likelihood ratios
- **Bias assessment**: QUADAS-2 (patient selection, index test, reference standard, flow and timing)
- **Example**: High-sensitivity troponin for MI → sensitivity 95%, specificity 92% (GRADE: moderate certainty)
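The standard 2x2-table formulas behind these accuracy outcomes are easy to compute directly. A self-contained sketch (cell counts are invented to reproduce the 95%/92% troponin example above):

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard diagnostic accuracy metrics from a 2x2 table of
    index test results against the reference standard."""
    sens = tp / (tp + fn)            # true positive rate
    spec = tn / (tn + fp)            # true negative rate
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),       # positive predictive value
        "npv": tn / (tn + fn),       # negative predictive value
        "lr_pos": sens / (1 - spec), # positive likelihood ratio
        "lr_neg": (1 - sens) / spec, # negative likelihood ratio
    }

# Hypothetical counts chosen to match sensitivity 95%, specificity 92%
m = diagnostic_metrics(tp=95, fp=8, fn=5, tn=92)
print(f"Sens {m['sensitivity']:.2f}, Spec {m['specificity']:.2f}, "
      f"LR+ {m['lr_pos']:.1f}, LR- {m['lr_neg']:.2f}")
```

Note that sensitivity and specificity are properties of the test, while predictive values shift with disease prevalence in the studied population.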
**Pattern 3: Prognosis/Risk Prediction**
- **PICOT**: Population with condition/exposure → risk factors → outcomes (death, disease progression) → long-term follow-up
- **Study design**: Prospective cohort (follow from exposure to outcome); avoid retrospective (recall bias)
- **Key outcomes**: Incidence, hazard ratios, absolute risk, risk prediction model performance (C-statistic, calibration)
- **Bias assessment**: ROBINS-I or PROBAST (for prediction models)
- **Example**: Framingham Risk Score for CVD → C-statistic 0.76 (moderate discrimination)
**Pattern 4: Harm/Safety Assessment**
- **PICOT**: Population exposed to intervention → adverse events → timeframe for rare/delayed harms
- **Study design**: RCT for common harms; observational (cohort, case-control) for rare harms (larger sample, longer follow-up)
- **Key outcomes**: Serious adverse events, discontinuations, organ-specific toxicity, long-term safety
- **Bias assessment**: Different for rare vs common harms; consider confounding by indication in observational studies
- **Example**: NSAID cardiovascular risk → observational studies show increased MI risk (GRADE: low certainty due to confounding)
**Pattern 5: Systematic Review/Meta-Analysis**
- **PICOT**: Defined in protocol; guides search strategy, inclusion criteria, outcome extraction
- **Study design**: Comprehensive search, explicit eligibility criteria, duplicate screening/extraction, bias assessment, quantitative synthesis (if appropriate)
- **Key outcomes**: Pooled effect estimates (RR, OR, MD, SMD), heterogeneity (I²), certainty rating (GRADE)
- **Bias assessment**: Individual study RoB + review-level assessment (AMSTAR 2 for review quality)
- **Example**: Statins for primary prevention → RR 0.75 for MI (95% CI 0.70-0.80, I²=12%, GRADE: high certainty)
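The pooling step in Pattern 5 can be illustrated with standard inverse-variance fixed-effect meta-analysis on the log risk ratio scale, including Cochran's Q and I². A minimal sketch with invented study estimates (real reviews would use a dedicated package and consider random-effects models):

```python
import math

def pool_log_rr(log_rrs: list, variances: list):
    """Inverse-variance fixed-effect pooling of log risk ratios,
    with Cochran's Q and the I² heterogeneity statistic."""
    weights = [1 / v for v in variances]
    pooled = sum(w * x for w, x in zip(weights, log_rrs)) / sum(weights)
    # Heterogeneity: Q is the weighted squared deviation from the pooled
    # estimate; I² expresses the share of variability beyond chance.
    q = sum(w * (x - pooled) ** 2 for w, x in zip(weights, log_rrs))
    df = len(log_rrs) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    se = math.sqrt(1 / sum(weights))
    ci = (math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se))
    return math.exp(pooled), ci, i2

# Three hypothetical trials with similar effects (log RR, variance)
rr, ci, i2 = pool_log_rr(
    [math.log(0.72), math.log(0.78), math.log(0.75)],
    [0.004, 0.006, 0.005],
)
print(f"Pooled RR {rr:.2f} (95% CI {ci[0]:.2f}-{ci[1]:.2f}), I^2 {i2:.0f}%")
```

When I² is high (>50%), explore sources of heterogeneity before trusting (or even reporting) a pooled estimate.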
## Guardrails
**Critical requirements:**
1. **Use PICOT for all clinical questions**: Vague questions lead to unfocused research. Always specify Population, Intervention, Comparator, Outcome, Timeframe explicitly. Avoid "does X work?" without defining for whom, compared to what, and measuring which outcomes.
2. **Match study design to question type**: RCTs answer therapy questions (causal inference). Cohort studies answer prognosis. Cross-sectional studies answer diagnosis. Case-control studies answer rare harm or etiology. Don't claim causation from observational data or use case series for treatment effects.
3. **Prioritize patient-important outcomes over surrogates**: Surrogate endpoints (biomarkers, lab values) don't always correlate with patient outcomes. Focus on mortality, morbidity, symptoms, function, quality of life. Only use surrogates if validated relationship to patient outcomes exists.
4. **Assess bias systematically, not informally**: Use validated tools (Cochrane RoB 2, ROBINS-I, QUADAS-2) not subjective judgment. Bias assessment affects certainty of evidence and clinical recommendations. Common biases: selection bias, performance bias (lack of blinding), detection bias, attrition bias, reporting bias.
5. **Apply GRADE to rate certainty of evidence**: Don't conflate study design with certainty. RCTs start as high certainty but can be downgraded (serious limitations, inconsistency, indirectness, imprecision, publication bias). Observational studies start as low but can be upgraded (large effect, dose-response, residual confounding reducing effect).
6. **Distinguish statistical significance from clinical importance**: p < 0.05 doesn't mean clinically meaningful. Consider minimal clinically important difference (MCID), absolute risk reduction, number needed to treat (NNT). Small p-value with tiny effect size is statistically significant but clinically irrelevant.
7. **Assess external validity and applicability**: Evidence from selected trial populations may not apply to your patient. Consider PICO match (are your patients similar?), setting differences (tertiary center vs community), intervention feasibility, patient values and preferences.
8. **State limitations and certainty explicitly**: All evidence has limitations. Specify what's uncertain, where evidence gaps exist, and how this affects confidence in recommendations. Avoid overconfident claims not supported by evidence quality.
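The absolute-effect arithmetic from requirement 6 is simple enough to sketch: given a baseline (control-group) risk and a relative risk, derive the absolute risk reduction and NNT. Numbers below reuse the statin example (RR 0.75) at an assumed 10% baseline risk:

```python
def absolute_effects(control_risk: float, risk_ratio: float):
    """Translate a relative effect into absolute terms for a given
    baseline risk: absolute risk reduction (ARR) and number needed
    to treat (NNT = 1/ARR)."""
    treated_risk = control_risk * risk_ratio
    arr = control_risk - treated_risk
    nnt = round(1 / arr)  # often rounded up in practice; nearest here
    return arr, nnt

# RR 0.75 at an assumed 10% baseline risk over the follow-up period
arr, nnt = absolute_effects(control_risk=0.10, risk_ratio=0.75)
print(f"ARR {arr * 100:.1f} percentage points, NNT {nnt}")
# ARR 2.5 percentage points, NNT 40
```

The same RR 0.75 at a 1% baseline risk yields an ARR of 0.25 percentage points and an NNT of 400, which is why relative effects alone overstate benefit in low-risk populations.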
**Common pitfalls:**
- **Treating all RCTs as high quality**: RCTs can have serious bias (inadequate randomization, unblinded, high attrition). Always assess bias.
- **Ignoring heterogeneity in meta-analysis**: High I² (>50%) suggests important differences across studies. Explore sources (population, intervention, outcome definition) before pooling.
- **Confusing association with causation**: Observational studies show association, not causation. Residual confounding is always possible.
- **Using composite outcomes uncritically**: Composite endpoints (e.g., "death or MI or hospitalization") obscure which component drives effect. Report components separately.
- **Accepting industry-funded evidence uncritically**: Pharmaceutical/device company-sponsored trials may have bias (outcome selection, selective reporting). Assess for conflicts of interest.
- **Over-interpreting subgroup analyses**: Most subgroup effects are chance findings. Only credible if pre-specified, statistically tested for interaction, and biologically plausible.
## Quick Reference
**Key resources:**
- **[resources/template.md](resources/template.md)**: PICOT framework, outcome hierarchy template, evidence table, GRADE summary template
- **[resources/methodology.md](resources/methodology.md)**: Evidence hierarchy, bias assessment tools, GRADE detailed guidance, study design selection, systematic review methods
- **[resources/evaluators/rubric_domain_research_health_science.json](resources/evaluators/rubric_domain_research_health_science.json)**: Quality criteria for research questions, evidence synthesis, and clinical interpretation
**PICOT Template:**
- **P** (Population): [Who? Age, sex, condition, severity, setting]
- **I** (Intervention): [What? Drug, procedure, test, exposure - dose, duration, route]
- **C** (Comparator): [Compared to what? Placebo, standard care, alternative treatment]
- **O** (Outcome): [What matters? Mortality, symptoms, QoL, harms - measurement instrument, timepoint]
- **T** (Timeframe): [How long? Follow-up duration, time to outcome]
**Evidence Hierarchy (Therapy Questions):**
1. Systematic reviews/meta-analyses of RCTs
2. Individual RCTs (large, well-designed)
3. Cohort studies (prospective)
4. Case-control studies
5. Case series, case reports
6. Expert opinion, pathophysiologic rationale
**GRADE Certainty Ratings:**
- **High** (⊕⊕⊕⊕): Very confident true effect is close to estimated effect
- **Moderate** (⊕⊕⊕○): Moderately confident, true effect likely close but could be substantially different
- **Low** (⊕⊕○○): Limited confidence, true effect may be substantially different
- **Very Low** (⊕○○○): Very little confidence, true effect likely substantially different
**Typical workflow time:**
- PICOT formulation: 10-15 minutes
- Single study critical appraisal: 20-30 minutes
- Systematic review protocol: 2-4 hours
- Evidence synthesis with GRADE: 1-2 hours
- Full systematic review: 40-100 hours (depending on scope)
**When to escalate:**
- Complex statistical meta-analysis (network meta-analysis, IPD meta-analysis)
- Advanced causal inference methods (instrumental variables, propensity scores)
- Health technology assessment (cost-effectiveness, budget impact)
- Guideline development panels (requires multi-stakeholder consensus)
→ Consult biostatistician, health economist, or guideline methodologist
**Inputs required:**
- **Research question** (clinical scenario or decision problem)
- **Evidence sources** (studies to appraise, databases for systematic review)
- **Outcome preferences** (which outcomes matter most to patients/clinicians)
- **Context** (setting, patient population, decision urgency)
**Outputs produced:**
- `domain-research-health-science.md`: Structured research question, evidence appraisal, outcome hierarchy, certainty assessment, clinical interpretation

{
"criteria": [
{
"name": "PICOT Question Formulation",
"description": "Is the research question precisely formulated using PICOT framework (Population, Intervention, Comparator, Outcome, Timeframe)?",
"scoring": {
"1": "Vague question without clear PICOT structure. Missing critical elements (no comparator specified, outcomes undefined, population too broad 'adults' without condition/setting). Question unanswerable as stated.",
"3": "Basic PICOT present but incomplete. Population defined but setting/severity missing, intervention specified but dose/duration unclear, outcomes listed but measurement instruments/timepoints undefined. Partially answerable.",
"5": "Complete, precise PICOT: Population with demographics/condition/severity/setting specified, Intervention with dose/duration/delivery details, explicit Comparator, Outcomes with measurement instruments/timepoints/MCID if known, Timeframe justified. Creates answerable, focused research question."
}
},
{
"name": "Study Design Appropriateness",
"description": "Is the study design matched to the research question type (RCT for therapy, cohort for prognosis, cross-sectional for diagnosis)?",
"scoring": {
"1": "Inappropriate design for question type. Using case series for therapy question, case-control for diagnosis (inflates accuracy), or cross-sectional for prognosis (no temporal sequence). Design cannot answer question.",
"3": "Acceptable design but not optimal. Using cohort when RCT feasible for therapy, retrospective cohort when prospective possible. Design can answer question but with more bias/uncertainty.",
"5": "Optimal design for question type: RCT for therapy (if ethical/feasible), cross-sectional with consecutive enrollment for diagnosis, prospective cohort for prognosis, large observational for rare harms. Design minimizes bias and provides strongest evidence."
}
},
{
"name": "Risk of Bias Assessment",
"description": "Is bias systematically assessed using validated tools (Cochrane RoB 2, ROBINS-I, QUADAS-2) rather than subjective judgment?",
"scoring": {
"1": "No bias assessment or subjective 'looks good' judgment. Ignoring obvious biases (open-label with subjective outcomes, 30% loss to follow-up, industry-funded without scrutiny). Uncritical acceptance of published findings.",
"3": "Basic bias assessment using appropriate tool but incomplete. Assessing randomization and blinding but missing attrition or selective reporting domains. Recognizing some biases but not systematically evaluating all domains.",
"5": "Comprehensive, systematic bias assessment: Appropriate tool for study design (RoB 2 for RCT, ROBINS-I for observational, QUADAS-2 for diagnostic), all domains assessed (randomization, blinding, incomplete data, selective reporting, confounding), judgments supported with evidence, overall risk categorized (low/some concerns/high)."
}
},
{
"name": "Outcome Prioritization",
"description": "Are patient-important outcomes (mortality, morbidity, QoL, function) prioritized over surrogate endpoints (biomarkers, lab values)?",
"scoring": {
"1": "Surrogate outcomes as primary without validation. Using biomarkers (HbA1c, bone density, PVCs) without demonstrating relationship to patient outcomes. Ignoring mortality/morbidity/QoL entirely. Claiming benefit based on surrogate alone.",
"3": "Mix of patient-important and surrogate outcomes. Patient-important outcomes included but not clearly prioritized. Surrogates used but relationship to clinical outcomes mentioned. Some outcomes rated by importance (critical vs important).",
"5": "Patient-important outcomes prioritized: Mortality, morbidity, symptoms, function, QoL rated as critical (7-9). Surrogates included only if validated relationship exists. Outcome hierarchy explicit (critical/important/not important). MCID specified for continuous outcomes."
}
},
{
"name": "GRADE Certainty Assessment",
"description": "Is certainty of evidence rated using GRADE framework (high/moderate/low/very low) considering study limitations, inconsistency, indirectness, imprecision, publication bias?",
"scoring": {
"1": "No certainty rating or conflating study design with certainty ('it's an RCT so it's high quality'). Ignoring serious bias, heterogeneity, or imprecision. Overconfident claims not supported by evidence quality.",
"3": "Basic GRADE assessment but incomplete. Downgrading for bias but missing inconsistency/indirectness/imprecision. Starting certainty correct (RCTs high, observational low) but not systematically applying all downgrade/upgrade criteria.",
"5": "Rigorous GRADE assessment: Starting certainty appropriate, all five downgrade domains assessed (bias, inconsistency, indirectness, imprecision, publication bias), upgrade factors considered for observational (large effect, dose-response, confounding), final certainty (⊕⊕⊕⊕ high to ⊕○○○ very low) explicitly stated with justification."
}
},
{
"name": "Statistical Interpretation",
"description": "Are statistical results interpreted correctly, distinguishing statistical significance from clinical importance, and reporting absolute effects alongside relative effects?",
"scoring": {
"1": "Confusing statistical significance with clinical importance. Claiming clinically meaningful based on p<0.05 alone when effect below MCID. Reporting only relative risk without absolute effects. Misinterpreting confidence intervals or p-values.",
"3": "Basic statistical interpretation. Reporting p-values and confidence intervals correctly. Mentioning both relative and absolute effects but not emphasizing clinical importance. Some comparison to MCID but not systematic.",
"5": "Sophisticated statistical interpretation: Comparing effect size to MCID (not just p-value), reporting absolute effects (risk difference, NNT/NNH) alongside relative (RR, OR), interpreting confidence intervals (precision, includes/excludes benefit), distinguishing statistical significance from clinical importance, noting when statistically significant but clinically trivial."
}
},
{
"name": "Heterogeneity & Consistency",
"description": "For evidence synthesis, is heterogeneity (I², visual inspection) assessed and explained? Are inconsistent findings explored rather than ignored?",
"scoring": {
"1": "Ignoring substantial heterogeneity (I²>75%). Pooling studies with opposite directions of effect. No exploration of why studies differ. Presenting pooled estimate without acknowledging uncertainty from inconsistency.",
"3": "Acknowledging heterogeneity (I² reported) but limited exploration. Noting inconsistency but not conducting subgroup analysis or meta-regression. Downgrading GRADE for inconsistency but not identifying source.",
"5": "Comprehensive heterogeneity assessment: I² calculated, forest plot inspected for outliers, sources explored (subgroup analysis by population/intervention/setting/risk of bias, or meta-regression if ≥10 studies), inconsistency explained or acknowledged as unexplained, GRADE downgraded appropriately, decision whether to pool or not justified."
}
},
{
"name": "Applicability & External Validity",
"description": "Is applicability to target population/setting assessed? Are differences between study population and clinical question population considered?",
"scoring": {
"1": "No applicability assessment. Directly extrapolating from highly selected trial population (tertiary center, strict exclusions, supervised delivery) to general population without considering differences. Ignoring PICO mismatch.",
"3": "Basic applicability mentioned. Noting trial population differs from target but not specifying how. Acknowledging setting differences (trial vs real-world) without assessing impact on generalizability.",
"5": "Rigorous applicability assessment: PICO match evaluated (are study patients similar to target?), setting differences assessed (tertiary vs primary care, supervised vs unsupervised), intervention feasibility in target setting considered, indirectness noted if substantial (and GRADE downgraded), patient values/preferences incorporated, limitations to generalizability explicitly stated."
}
},
{
"name": "Bias from Conflicts of Interest",
"description": "Are conflicts of interest (funding source, author COI) assessed and considered when interpreting evidence?",
"scoring": {
"1": "No COI assessment. Accepting industry-funded trials uncritically. Ignoring potential for selective outcome reporting, favorable comparator choice, or statistical manipulation. Not checking for unpublished negative trials from same sponsor.",
"3": "Noting funding source but limited critical appraisal. Mentioning industry sponsorship but not assessing impact on study design/outcomes/reporting. Acknowledging COI exists but not incorporating into bias assessment.",
"5": "Critical COI assessment: Funding source identified for all studies, author COI checked, potential for bias from industry sponsorship assessed (outcome selection, favorable comparator, selective reporting), preference for independent or government-funded studies noted, unpublished trials searched (trial registries), manufacturer-submitted evidence treated with appropriate skepticism."
}
},
{
"name": "Evidence Synthesis & Clarity",
"description": "Is evidence clearly synthesized into actionable summary? Are key findings, certainty, clinical interpretation, and evidence gaps communicated effectively?",
"scoring": {
"1": "Unclear summary. Presenting study-by-study without synthesis. No overall conclusion. Missing certainty ratings or clinical interpretation. User left to figure out implications themselves. No mention of evidence gaps or next steps.",
"3": "Basic synthesis. Key findings summarized but lacking detail. Certainty mentioned but not linked to specific outcomes. Some clinical interpretation but vague ('may be beneficial'). Evidence gaps noted briefly.",
"5": "Exemplary synthesis: Evidence profile/summary of findings table linking outcomes to certainty ratings and effect estimates (absolute + relative), clinical interpretation clear (balance of benefits/harms, strength of recommendation), applicability to target population stated, evidence gaps and research needs identified, next steps actionable (e.g., 'recommend intervention' or 'insufficient evidence, conduct trial')."
}
}
],
"minimum_score": 3.5,
"guidance_by_research_type": {
"Therapy/Intervention Question": {
"target_score": 4.2,
"focus_criteria": [
"Study Design Appropriateness",
"Risk of Bias Assessment",
"GRADE Certainty Assessment"
],
"key_requirements": [
"RCT preferred (or justify why cohort used if RCT not feasible)",
"Cochrane RoB 2 applied to all RCTs (5 domains assessed)",
"Patient-important outcomes (mortality, QoL, morbidity) as primary",
"GRADE certainty rated for each critical outcome",
"Absolute effects reported (NNT calculated)",
"Applicability assessed (can trial results generalize?)",
"Conflicts of interest noted and assessed"
],
"common_pitfalls": [
"Treating all RCTs as high quality without bias assessment",
"Using surrogate outcomes (biomarkers) without validation",
"Ignoring absolute effects, reporting only relative risk",
"Not accounting for heterogeneity in meta-analysis (I²>50%)",
"Extrapolating from highly selected trial population without considering applicability"
]
},
"Diagnostic Accuracy Question": {
"target_score": 4.1,
"focus_criteria": [
"PICOT Question Formulation",
"Study Design Appropriateness",
"Risk of Bias Assessment"
],
"key_requirements": [
"PICOT specifies suspected condition, index test, reference standard, target accuracy",
"Cross-sectional design with consecutive enrollment (avoid case-control)",
"QUADAS-2 bias assessment (patient selection, index test, reference standard, flow/timing)",
"Sensitivity, specificity, predictive values, likelihood ratios reported",
"Reference standard must correctly classify condition (gold standard)",
"Spectrum of disease severity represented (not just severe cases)",
"Blinding of index test interpretation to reference standard results"
],
"common_pitfalls": [
"Case-control design (inflates sensitivity/specificity by selecting extremes)",
"Differential verification bias (different reference standards for positive vs negative index tests)",
"Spectrum bias (only testing in severe cases, overstating accuracy)",
"Unblinded index test interpretation (knowing reference standard results)",
"Partial verification (not all patients get reference standard)"
]
},
"Prognosis/Prediction Question": {
"target_score": 4.0,
"focus_criteria": [
"Study Design Appropriateness",
"Risk of Bias Assessment",
"Applicability & External Validity"
],
"key_requirements": [
"Prospective cohort design (follow from exposure/risk factor to outcome)",
"Inception cohort (patients enrolled at similar point in disease course)",
"ROBINS-I or PROBAST (for prediction models) bias assessment",
"Sufficient follow-up to observe outcomes (time-to-event analysis)",
"Hazard ratios, incidence rates, or risk prediction performance (C-statistic, calibration) reported",
"Confounding assessed and adjusted (multivariable models)",
"Validation in external cohort if prediction model"
],
"common_pitfalls": [
"Retrospective design with recall bias for exposures",
"Case-control design (cannot estimate incidence or cumulative risk)",
"Incomplete follow-up or differential loss (biased estimates)",
"Overfitting prediction models (too many predictors for events, no validation)",
"Reporting only crude estimates without adjusting for confounding"
]
},
"Systematic Review/Meta-Analysis": {
"target_score": 4.3,
"focus_criteria": [
"Risk of Bias Assessment",
"Heterogeneity & Consistency",
"GRADE Certainty Assessment"
],
"key_requirements": [
"Protocol pre-registered (PROSPERO) with pre-specified outcomes",
"Comprehensive search (multiple databases, trial registries, grey literature)",
"Duplicate screening/extraction (two reviewers independently)",
"Bias assessment for all included studies (RoB 2 or ROBINS-I)",
"Heterogeneity assessed (I², Cochran's Q, forest plot inspection)",
"Subgroup analysis or meta-regression to explain heterogeneity (if warranted)",
"GRADE certainty for all critical outcomes in summary of findings table",
"Publication bias assessed (funnel plot if ≥10 studies, trial registry search)"
],
"common_pitfalls": [
"No protocol or post-hoc outcome changes (selective reporting)",
"Incomplete search (only MEDLINE, no trial registries or grey literature)",
"Single reviewer screening (bias in study selection)",
"Pooling studies with substantial heterogeneity (I²>75%) without exploring sources",
"Not assessing publication bias (missing negative trials)",
"No GRADE assessment (certainty unclear)",
"Mixing different populations/interventions inappropriately"
]
},
"Harm/Safety Assessment": {
"target_score": 3.9,
"focus_criteria": [
"Study Design Appropriateness",
"Outcome Prioritization",
"Statistical Interpretation"
],
"key_requirements": [
"RCT for common harms (>1% incidence), observational for rare harms (larger sample, longer follow-up)",
"Serious adverse events, discontinuations, organ-specific toxicity as key outcomes",
"Long-term follow-up for delayed harms (not just trial duration)",
"Confounding by indication assessed for observational studies",
"Absolute risk increase and NNH (number needed to harm) calculated",
"Multiple sources (RCTs + observational + pharmacovigilance) synthesized",
"Dose-response relationship explored if exposure varies"
],
"common_pitfalls": [
"Relying only on RCTs for rare harms (inadequate power/duration)",
"Confounding by indication in observational studies (sicker patients get treatment, appear to do worse)",
"Not reporting absolute risk (only relative risk, hard to interpret clinical importance)",
"Missing long-term harms (trials too short, no post-market surveillance)",
"Dismissing harms as 'not statistically significant' when CI wide and includes important harm"
]
}
},
"guidance_by_complexity": {
"Simple (Single Study Appraisal)": {
"target_score": 3.5,
"characteristics": "Appraising single published RCT or cohort study. Clear PICOT, straightforward outcomes, no complex statistics. Goal: Determine if findings apply to clinical question.",
"focus": "PICOT match assessment, bias assessment with appropriate tool, outcome prioritization (patient-important vs surrogate), basic statistical interpretation (RR, CI, p-value), applicability to target population.",
"examples": "Evaluating RCT of new diabetes drug vs standard for mortality, assessing cohort study of statin use and cardiovascular outcomes, appraising diagnostic accuracy study of troponin for MI.",
"scoring_priorities": [
"PICOT Question Formulation (score ≥4 to ensure clear match)",
"Risk of Bias Assessment (score ≥4 using appropriate tool)",
"Outcome Prioritization (score ≥3 to prioritize patient-important outcomes)",
"Applicability (score ≥3 to assess if results generalize)"
]
},
"Moderate (Evidence Synthesis for Guidelines)": {
"target_score": 4.0,
"characteristics": "Synthesizing 5-15 studies on single PICOT question. Moderate heterogeneity (I²=30-60%). GRADE assessment needed. Goal: Create evidence summary for guideline recommendation.",
"focus": "GRADE certainty rating for each critical outcome, heterogeneity assessment and explanation (subgroup analysis), applicability to guideline target population, balance of benefits/harms, conflicts of interest in included studies.",
"examples": "Synthesizing RCTs of anticoagulation for atrial fibrillation, evaluating diagnostic accuracy studies for D-dimer in PE, assessing observational studies of screening colonoscopy.",
"scoring_priorities": [
"GRADE Certainty Assessment (score ≥4 to rigorously rate evidence)",
"Heterogeneity & Consistency (score ≥4 to explore sources)",
"Evidence Synthesis & Clarity (score ≥4 for actionable summary)",
"All criteria should score ≥3 for moderate complexity"
]
},
"Complex (Comprehensive Systematic Review)": {
"target_score": 4.3,
"characteristics": "Full systematic review with meta-analysis. 20+ studies, substantial heterogeneity (I²>60%), multiple outcomes and subgroups, publication bias likely. Goal: Definitive evidence synthesis.",
"focus": "Comprehensive search and selection (multiple databases, grey literature), duplicate review processes, bias assessment for all studies, heterogeneity exploration (subgroup, meta-regression), publication bias assessment (funnel plot, trial registry search), GRADE for all critical outcomes, detailed evidence profile.",
"examples": "Cochrane systematic review of antihypertensives for CVD, network meta-analysis of diabetes drugs, systematic review for HTA (reimbursement decision).",
"scoring_priorities": [
"All criteria should score ≥4 for complex systematic reviews",
"Risk of Bias Assessment must be comprehensive (score 5)",
"Heterogeneity & Consistency must be rigorously explored (score ≥4)",
"GRADE Certainty Assessment must be detailed for all outcomes (score 5)",
"Conflicts of Interest assessment critical (score ≥4)"
]
}
},
"common_failure_modes": [
{
"failure": "Vague PICOT question (unanswerable)",
"symptom": "Research question like 'Does drug X work for disease Y?' without specifying population (severity, setting), comparator (vs what?), specific outcomes (mortality? symptoms? biomarker?), or timeframe. Can't design study or search literature with this question.",
"detection": "Check if all five PICOT elements are precisely specified. Ask: Can I design a study from this question? Can I search databases? Are inclusion criteria clear?",
"fix": "Specify each PICOT element: Population (age, severity, comorbidities, setting), Intervention (dose, duration, delivery), Comparator (placebo, standard care, or alternative), Outcome (patient-important endpoints with measurement instruments and MCID if known), Timeframe (follow-up duration justified). Example: 'In adults >65 with HFrEF (EF<40%) in primary care, does dapagliflozin 10mg daily vs standard ACEi+BB reduce all-cause mortality (primary) and HF hospitalizations (secondary) over 12 months?'"
},
{
"failure": "Using surrogate outcomes without validation",
"symptom": "Claiming benefit based on biomarker changes (HbA1c reduction, bone density increase, arrhythmia suppression) without demonstrating impact on patient outcomes (microvascular complications, fractures, mortality). Assuming surrogate = patient outcome.",
"detection": "Check if outcomes are patient-important (mortality, morbidity, symptoms, function, QoL) or surrogates (lab values, imaging, biomarkers). If surrogates used, check if validated relationship to patient outcomes has been established.",
"fix": "Prioritize patient-important outcomes. If surrogate necessary (patient outcome requires decades of follow-up), explicitly state surrogate status and limitations. Only accept surrogates with demonstrated relationship to patient outcomes (e.g., blood pressure for stroke, LDL for MI after statin trials, but NOT HbA1c for complications after intensive glycemic control showed harms). Example correction: 'Drug reduces HbA1c by 1% (surrogate outcome, GRADE: moderate) but impact on microvascular complications unknown (no patient-important outcome data available).'"
},
{
"failure": "Ignoring risk of bias in RCTs",
"symptom": "Treating all RCTs as high quality without systematic bias assessment. Missing problems like inadequate randomization (alternation, predictable sequence), lack of blinding with subjective outcomes (pain, QoL), high attrition (>20% loss to follow-up, differential between groups), or selective outcome reporting (protocol shows 10 outcomes, paper reports 3).",
"detection": "Check if Cochrane RoB 2 tool applied with all 5 domains assessed (randomization, deviations, missing data, outcome measurement, selective reporting). Look for justification for each domain judgment (low risk / some concerns / high risk).",
"fix": "Apply Cochrane RoB 2 systematically to all RCTs. Assess: (1) Randomization process (computer-generated sequence? allocation concealment?), (2) Deviations from interventions (blinding? protocol adherence?), (3) Missing outcome data (<5% loss? ITT analysis?), (4) Outcome measurement (blinded assessors? validated instruments?), (5) Selective reporting (protocol available? all outcomes reported?). If any domain high risk → overall high risk. Downgrade GRADE certainty if serious risk of bias. Example: 'Open-label trial with subjective QoL outcome and no blinding of assessors → High risk of bias (Domain 4: measurement of outcome) → Downgrade GRADE from High to Moderate for QoL outcome.'"
},
{
"failure": "Confusing statistical significance with clinical importance",
"symptom": "Claiming clinically meaningful benefit based solely on p<0.05, even when effect size is below minimal clinically important difference (MCID). Example: 'Pain reduced by 3 points on 0-100 VAS, p=0.001' (statistically significant) but MCID for pain is 10-15 points (clinically trivial change).",
"detection": "Check if MCID comparison made. Look for absolute effect sizes (mean difference, risk difference, NNT) alongside p-values. Verify effect exceeds MCID for continuous outcomes, or absolute risk reduction is clinically meaningful for binary outcomes.",
"fix": "Always compare effect size to MCID for continuous outcomes, or calculate absolute effects for binary outcomes. Report: (1) Statistical significance (p-value, CI), (2) Effect size (MD, RR, RD), (3) Clinical importance comparison (does MD exceed MCID? is NNT reasonable?). Example correction: 'Pain reduced by 3 points (95% CI 2-4, p<0.001) which is statistically significant but below MCID of 10 points for VAS pain. Effect is statistically significant but clinically trivial.' Or for positive finding: 'QoL improved by 8 points (95% CI 5-11, p<0.001), exceeding MCID of 5 points for KCCQ. Effect is both statistically significant and clinically meaningful.'"
},
{
"failure": "Not exploring heterogeneity in meta-analysis",
"symptom": "Presenting pooled estimate from meta-analysis with I²=70% (substantial heterogeneity) without investigating why studies differ. Studies show opposite directions of effect (some positive, some negative) yet pooled together. No subgroup analysis or sensitivity analysis conducted.",
"detection": "Check I² statistic. If I²>50%, look for explanation of heterogeneity (subgroup analysis by population, intervention dose, setting, risk of bias) or justification for not pooling. Visually inspect forest plot for outliers or opposite directions.",
"fix": "Assess heterogeneity: Calculate I² and Cochran's Q. If I²>50%, explore sources before pooling. Conduct pre-specified subgroup analyses (e.g., by population severity, intervention dose, setting) or meta-regression if ≥10 studies. If I²>75% and unexplained, consider not pooling (narrative synthesis instead). Downgrade GRADE for inconsistency if heterogeneity unexplained. Example: 'Pooled RR 0.80 (95% CI 0.70-0.91) but I²=65% indicating substantial heterogeneity. Subgroup analysis by disease severity: Moderate disease RR 0.70 (I²=20%), Severe disease RR 0.95 (I²=10%, p for interaction=0.03). Heterogeneity explained by severity. Downgrade GRADE by 1 level for initial inconsistency.'"
},
{
"failure": "Poor applicability assessment (over-extrapolation)",
"symptom": "Directly applying trial results from highly selected population (tertiary center, strict exclusions, age 18-65 only, no comorbidities) to general practice (community setting, elderly, multiple comorbidities, less supervised). Ignoring PICO mismatch between study and target population.",
"detection": "Compare study population characteristics to target population. Check: Are patients similar (age, disease severity, comorbidities)? Is setting similar (tertiary vs primary care)? Is intervention delivered similarly (supervised daily vs real-world adherence)? Are outcomes same priority?",
"fix": "Explicitly assess PICO match. Identify differences between study and target population: (1) Population (study: age 40-65, no diabetes, EF 30-40% in cardiology clinic vs target: age >70, diabetes common, EF 20-50% in primary care), (2) Intervention (study: supervised daily dosing in trial vs target: patient self-administered with variable adherence), (3) Outcome (study: CV mortality vs target: all-cause mortality + QoL more relevant). Downgrade GRADE for indirectness if substantial PICO mismatch. State limitations: 'Trial included younger patients without diabetes in cardiology clinics with supervised dosing. Uncertain if results apply to elderly primary care patients with comorbidities and lower adherence. Downgrade GRADE by 1 level for indirectness.'"
},
{
"failure": "Accepting industry-funded evidence uncritically",
"symptom": "Manufacturer-sponsored trial showing benefit for their drug, no independent replication, selective outcome reporting (protocol registered cardiovascular death but paper reports composite of death/hospitalization to boost event rate), no mention of conflicts of interest in appraisal.",
"detection": "Check funding source. If industry-funded, assess: Are outcomes pre-specified in protocol? Are all outcomes from protocol reported? Is comparator chosen to favor new drug (underdosed standard)? Are unpublished negative trials available (search ClinicalTrials.gov)? Is statistical analysis independent or done by sponsor?",
"fix": "Critical assessment of industry-funded studies: (1) Identify funding source and author COI, (2) Compare protocol to publication (outcome switching? selective reporting?), (3) Assess comparator appropriateness (is standard care dosed optimally?), (4) Search for unpublished trials from same sponsor (ClinicalTrials.gov, regulatory submissions), (5) Prefer independent systematic reviews over manufacturer meta-analyses, (6) Downgrade GRADE if serious publication bias suspected (funnel asymmetry, missing trials). Example: 'Five trials from manufacturer show benefit (RR 0.75), but ClinicalTrials.gov lists two unpublished trials (outcomes not posted). Funnel plot shows asymmetry (Egger p=0.04). Downgrade GRADE by 1 level for publication bias. Independent systematic review needed.'"
},
{
"failure": "No GRADE certainty rating (unclear confidence in evidence)",
"symptom": "Evidence synthesis presented without rating certainty (high/moderate/low/very low). User doesn't know how confident to be in findings. Recommendations made without linking to evidence quality.",
"detection": "Check if GRADE certainty explicitly stated for each critical outcome. Look for ⊕⊕⊕⊕ (high), ⊕⊕⊕○ (moderate), ⊕⊕○○ (low), ⊕○○○ (very low) symbols or verbal equivalent. Verify downgrade/upgrade factors explained.",
"fix": "Apply GRADE systematically: (1) Start with RCTs at High certainty, observational at Low, (2) Assess five downgrade domains (risk of bias, inconsistency, indirectness, imprecision, publication bias) - serious = -1 level, very serious = -2 levels, (3) Consider upgrades for observational (large effect RR>2 or <0.5, dose-response, all plausible confounders would reduce effect), (4) Assign final certainty (High/Moderate/Low/Very Low), (5) Create evidence profile or summary of findings table linking outcomes to certainty. Example: 'Mortality: 5 RCTs, low risk of bias, I²=10% (consistent), direct PICO match, CI excludes no effect (precise), no publication bias detected → GRADE: ⊕⊕⊕⊕ High certainty. QoL: 3 RCTs, high risk of bias (unblinded), I²=60% (inconsistent) → Downgrade by 2 levels → GRADE: ⊕⊕○○ Low certainty.'"
}
]
}

# Domain Research: Health Science - Advanced Methodology
## Workflow
```
Health Research Progress:
- [ ] Step 1: Formulate research question (PICOT)
- [ ] Step 2: Assess evidence hierarchy and study design
- [ ] Step 3: Evaluate study quality and bias
- [ ] Step 4: Prioritize and define outcomes
- [ ] Step 5: Synthesize evidence and grade certainty
- [ ] Step 6: Create decision-ready summary
```
**Step 1: Formulate research question (PICOT)**
Define precise PICOT elements for answerable research question (see template.md for framework).
**Step 2: Assess evidence hierarchy and study design**
Match study design to question type using [1. Evidence Hierarchy](#1-evidence-hierarchy) (RCT for therapy, cohort for prognosis, cross-sectional for diagnosis).
**Step 3: Evaluate study quality and bias**
Apply systematic bias assessment using [2. Bias Assessment](#2-bias-assessment) (Cochrane RoB 2, ROBINS-I, or QUADAS-2 depending on design).
**Step 4: Prioritize and define outcomes**
Distinguish patient-important from surrogate outcomes using [6. Outcome Measurement](#6-outcome-measurement) guidance on MCID, composite outcomes, and surrogates.
**Step 5: Synthesize evidence and grade certainty**
Rate certainty using [3. GRADE Framework](#3-grade-framework) (downgrade for bias/inconsistency/indirectness/imprecision/publication bias, upgrade for large effects/dose-response). For multiple studies, apply [4. Meta-Analysis Techniques](#4-meta-analysis-techniques).
**Step 6: Create decision-ready summary**
Synthesize findings using [8. Knowledge Translation](#8-knowledge-translation) evidence-to-decision framework, assess applicability per [7. Special Populations & Contexts](#7-special-populations--contexts), and avoid [9. Common Pitfalls](#9-common-pitfalls--fixes).
---
## 1. Evidence Hierarchy
### Study Design Selection by Question Type
**Therapy/Intervention Questions**:
- **Gold standard**: RCT (randomized controlled trial)
- **When RCT not feasible**: Prospective cohort or pragmatic trial
- **Never acceptable**: Case series, expert opinion for causal claims
- **Rationale**: RCTs minimize confounding through randomization, establishing causation
**Diagnostic Accuracy Questions**:
- **Gold standard**: Cross-sectional study with consecutive enrollment
- **Critical requirement**: Compare index test to validated reference standard in same patients
- **Avoid**: Case-control design (inflates sensitivity/specificity by selecting extremes)
- **Rationale**: Cross-sectional design prevents spectrum bias; consecutive enrollment prevents selection bias
**Prognosis/Prediction Questions**:
- **Gold standard**: Prospective cohort (follow from exposure to outcome)
- **Acceptable**: Retrospective cohort with robust data (registries, databases)
- **Avoid**: Case-control (can't estimate incidence), cross-sectional (no temporal sequence)
- **Rationale**: Cohort design establishes temporal sequence, allows incidence calculation
**Harm/Safety Questions**:
- **Common harms**: RCTs (adequate power for events occurring in >1% patients)
- **Rare harms**: Large observational studies (cohort, case-control, pharmacovigilance)
- **Delayed harms**: Long-term cohort studies or registries
- **Rationale**: RCTs often lack power/duration for rare or delayed harms; observational studies provide larger samples and longer follow-up
### Hierarchy by Evidence Strength
**Level 1 (Highest)**: Systematic reviews and meta-analyses of well-designed RCTs
**Level 2**: Individual large, well-designed RCT with low risk of bias
**Level 3**: Well-designed RCTs with some limitations (quasi-randomized, not blinded)
**Level 4**: Cohort studies (prospective better than retrospective)
**Level 5**: Case-control studies
**Level 6**: Cross-sectional surveys (descriptive only, not causal)
**Level 7**: Case series or case reports
**Level 8**: Expert opinion, pathophysiologic rationale
**Important**: Hierarchy is a starting point. Study quality matters more than design alone. Well-conducted cohort > poorly conducted RCT.
---
## 2. Bias Assessment
### Cochrane Risk of Bias 2 (RoB 2) for RCTs
**Domain 1: Randomization Process**
- **Low risk**: Computer-generated sequence, central allocation, opaque envelopes
- **Some concerns**: Randomization method unclear, baseline imbalances suggesting problems
- **High risk**: Non-random sequence (alternation, date of birth), predictable allocation, post-randomization exclusions
**Domain 2: Deviations from Intended Interventions**
- **Low risk**: Double-blind, protocol deviations balanced across groups, intention-to-treat (ITT) analysis
- **Some concerns**: Open-label but objective outcomes, minor unbalanced deviations
- **High risk**: Open-label with subjective outcomes, substantial deviation (>10% cross-over), per-protocol analysis only
**Domain 3: Missing Outcome Data**
- **Low risk**: <5% loss to follow-up, balanced across groups, multiple imputation if >5%
- **Some concerns**: 5-10% loss, ITT analysis used, or reasons for missingness reported
- **High risk**: >10% loss, or imbalanced loss (>5% difference between groups), or complete-case analysis with no sensitivity analysis
**Domain 4: Measurement of Outcome**
- **Low risk**: Blinded outcome assessors, objective outcomes (mortality, lab values)
- **Some concerns**: Unblinded assessors but objective outcomes
- **High risk**: Unblinded assessors with subjective outcomes (pain, quality of life)
**Domain 5: Selection of Reported Result**
- **Low risk**: Protocol published before enrollment, all pre-specified outcomes reported
- **Some concerns**: Protocol not available, but outcomes match methods section
- **High risk**: Outcomes in results differ from protocol/methods, selective subgroup reporting
**Overall Judgment**: If any domain is "high risk" → Overall high risk. If all domains "low risk" → Overall low risk. Otherwise → Some concerns.
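The overall-judgment rule above can be sketched as a small helper. This is a simplification: the full RoB 2 algorithm also allows multiple "some concerns" domains to escalate to overall high risk at the reviewer's discretion. The function name and domain keys are illustrative, not part of the tool.

```python
def rob2_overall(domains: dict[str, str]) -> str:
    """Derive the overall RoB 2 judgment from the five domain judgments.

    Each value is 'low', 'some concerns', or 'high'. Simplified rule:
    any 'high' -> high; all 'low' -> low; otherwise some concerns.
    (The official tool may also escalate several 'some concerns' to high.)
    """
    judgments = list(domains.values())
    if "high" in judgments:
        return "high"
    if all(j == "low" for j in judgments):
        return "low"
    return "some concerns"

verdict = rob2_overall({
    "randomization": "low",
    "deviations": "some concerns",
    "missing_data": "low",
    "outcome_measurement": "low",
    "selective_reporting": "low",
})
# verdict == "some concerns"
```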
### ROBINS-I for Observational Studies
**Domain 1: Confounding**
- **Low**: All important confounders measured and adjusted (multivariable regression, propensity scores, matching)
- **Moderate**: Most confounders adjusted, but some unmeasured
- **Serious**: Important confounders not adjusted (e.g., comparing treatment groups without adjusting for severity)
- **Critical**: Confounding by indication makes results uninterpretable
**Domain 2: Selection of Participants**
- **Low**: Selection into study unrelated to intervention and outcome (inception cohort, consecutive enrollment)
- **Serious**: Post-intervention selection (survivor bias, selecting on outcome)
**Domain 3: Classification of Interventions**
- **Low**: Intervention status well-defined and independently ascertained (pharmacy records, procedure logs)
- **Serious**: Intervention status based on patient recall or subjective classification
**Domain 4: Deviations from Intended Interventions**
- **Low**: Intervention/comparator groups received intended interventions, co-interventions balanced
- **Serious**: Substantial differences in co-interventions between groups
**Domain 5: Missing Data**
- **Low**: <5% missing outcome data, or multiple imputation with sensitivity analysis
- **Serious**: >10% missing, complete-case analysis with no sensitivity analysis
**Domain 6: Measurement of Outcomes**
- **Low**: Blinded outcome assessment or objective outcomes
- **Serious**: Unblinded assessment of subjective outcomes, knowledge of intervention may bias assessment
**Domain 7: Selection of Reported Result**
- **Low**: Analysis plan pre-specified and followed
- **Serious**: Selective reporting of outcomes or subgroups
### QUADAS-2 for Diagnostic Accuracy Studies
**Domain 1: Patient Selection**
- **Low**: Consecutive or random sample, case-control design avoided, appropriate exclusions
- **High**: Case-control design (inflates accuracy), inappropriate exclusions (spectrum bias)
**Domain 2: Index Test**
- **Low**: Pre-specified threshold, blinded to reference standard
- **High**: Threshold chosen after seeing results, unblinded interpretation
**Domain 3: Reference Standard**
- **Low**: Reference standard correctly classifies condition, interpreted blind to index test
- **High**: Imperfect reference standard, differential verification (different reference for positive/negative index)
**Domain 4: Flow and Timing**
- **Low**: All patients receive same reference standard, appropriate interval between tests
- **High**: Not all patients receive reference (partial verification), long interval allowing disease status to change
---
## 3. GRADE Framework
### Starting Certainty
**RCTs**: Start at High certainty
**Observational studies**: Start at Low certainty
### Downgrade Factors (Each -1 or -2 levels)
**1. Risk of Bias (Study Limitations)**
- **Serious (-1)**: Most studies have some concerns on RoB 2, or observational studies with moderate risk on most ROBINS-I domains
- **Very serious (-2)**: Most studies high risk of bias, or observational with serious/critical risk on ROBINS-I
**2. Inconsistency (Heterogeneity)**
- **Serious (-1)**: I² = 50-75%, or point estimates vary widely, or confidence intervals show minimal overlap
- **Very serious (-2)**: I² > 75%, opposite directions of effect
- **Do not downgrade if**: Heterogeneity explained by subgroup analysis, or all studies show benefit despite variation in magnitude
**3. Indirectness (Applicability)**
- **Serious (-1)**: Indirect comparison (no head-to-head trial), surrogate outcome instead of patient-important, PICO mismatch (different population/intervention than question)
- **Very serious (-2)**: Multiple levels of indirectness (e.g., indirect comparison + surrogate outcome)
**4. Imprecision (Statistical Uncertainty)**
- **Serious (-1)**: Confidence interval crosses minimal clinically important difference (MCID) or includes both benefit and harm, or optimal information size (OIS) not met
- **Very serious (-2)**: Very wide CI, very small sample (<100 total), or very few events (<100 total)
- **Rule of thumb**: OIS = sample size required for adequately powered RCT (~400 patients for typical effect size)
**5. Publication Bias**
- **Serious (-1)**: Funnel plot asymmetry (Egger's test p<0.10), all studies industry-funded with positive results, or known unpublished negative trials
- **Note**: Requires ≥10 studies to assess funnel plot. Consider searching trial registries for unpublished studies.
### Upgrade Factors (Observational Studies Only)
**1. Large Effect**
- **Upgrade +1**: RR > 2 or < 0.5 (based on consistent evidence, no plausible confounders)
- **Upgrade +2**: RR > 5 or < 0.2 ("very large effect")
- **Example**: Smoking → lung cancer (RR ~20) upgraded from low to moderate or high
**2. Dose-Response Gradient**
- **Upgrade +1**: Increasing exposure associated with increasing risk/benefit in consistent pattern
- **Example**: More cigarettes/day → higher lung cancer risk
**3. All Plausible Confounders Would Reduce Observed Effect**
- **Upgrade +1**: Despite confounding working against finding effect, effect still observed
- **Example**: Healthy user bias would reduce observed benefit, yet benefit still seen
### Final Certainty Rating
**High** (⊕⊕⊕⊕): Very confident true effect is close to estimate. Further research very unlikely to change conclusion.
**Moderate** (⊕⊕⊕○): Moderately confident. True effect is likely close to estimate, but could be substantially different. Further research may change conclusion.
**Low** (⊕⊕○○): Limited confidence. True effect may be substantially different. Further research likely to change conclusion.
**Very Low** (⊕○○○): Very little confidence. True effect is likely substantially different. Any estimate is very uncertain.
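The starting-point, downgrade, and upgrade bookkeeping above can be sketched in a few lines. The numeric encoding and function name are ours; the levels and rules follow the framework as described.

```python
LEVELS = ["Very Low", "Low", "Moderate", "High"]

def grade_certainty(design: str, downgrades: int = 0, upgrades: int = 0) -> str:
    """Start RCTs at High and observational studies at Low, subtract
    downgrade levels (risk of bias, inconsistency, indirectness,
    imprecision, publication bias), add upgrade levels (large effect,
    dose-response, confounding direction) for observational evidence,
    and clamp the result to the Very Low..High range."""
    start = 3 if design == "rct" else 1  # High vs Low
    if design == "rct":
        upgrades = 0  # upgrade factors apply to observational studies only
    level = max(0, min(3, start - downgrades + upgrades))
    return LEVELS[level]

grade_certainty("rct", downgrades=2)          # 'Low'
grade_certainty("observational", upgrades=1)  # 'Moderate'
```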
---
## 4. Meta-Analysis Techniques
### When to Pool (Meta-Analysis)
**Pool when**:
- Studies address same PICO question
- Outcomes measured similarly (same construct, similar timepoints)
- Low to moderate heterogeneity (I² < 60%)
- At least 3 studies available
**Do not pool when**:
- Substantial heterogeneity (I² > 75%) unexplained by subgroups
- Different interventions (can't pool aspirin with warfarin for "anticoagulation")
- Different populations (adults vs children, mild vs severe disease)
- Methodologically flawed studies (high risk of bias)
### Statistical Models
**Fixed-effect model**: Assumes one true effect, differences due to sampling error only.
- **Use when**: I² < 25%, studies very similar
- **Calculation**: Inverse-variance weighting (larger studies get more weight)
**Random-effects model**: Assumes distribution of true effects, accounts for between-study variance.
- **Use when**: I² ≥ 25%, clinical heterogeneity expected
- **Calculation**: DerSimonian-Laird or REML methods
- **Note**: Gives more weight to smaller studies than fixed-effect
**Recommendation**: Use random-effects as default for clinical heterogeneity, even if I² low.
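As a rough sketch of the two models, the following pools log risk ratios by inverse variance, estimates τ² with the DerSimonian-Laird moment estimator, and reports I². The three trials are hypothetical, and real analyses would use a dedicated package (e.g. `metafor` in R) rather than hand-rolled arithmetic.

```python
import math

def pool(effects, ses):
    """Inverse-variance pooling of log effect sizes.

    Returns (fixed_pooled, random_pooled, i_squared), with tau^2 from
    the DerSimonian-Laird moment estimator for the random-effects model.
    """
    w = [1 / se**2 for se in ses]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q and I-squared from the fixed-effect fit
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    # DerSimonian-Laird between-study variance
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    w_re = [1 / (se**2 + tau2) for se in ses]
    random = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    return fixed, random, i2

# three hypothetical trials: log(RR) and standard errors
log_rr = [math.log(0.70), math.log(0.85), math.log(0.95)]
se = [0.10, 0.12, 0.08]
fixed, random, i2 = pool(log_rr, se)
print(f"fixed RR {math.exp(fixed):.2f}, random RR {math.exp(random):.2f}, I² {100*i2:.0f}%")
```

Note how the random-effects weights flatten toward equality as τ² grows, giving smaller studies relatively more influence, as stated above.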
### Effect Measures
**Binary outcomes** (event yes/no):
- **Risk Ratio (RR)**: Events in intervention / Events in control. Easier to interpret than OR.
- **Odds Ratio (OR)**: Used when outcome rare (<10%) or case-control design.
- **Risk Difference (RD)**: Absolute difference. Important for clinical interpretation (NNT = 1/RD).
**Continuous outcomes** (measured on scale):
- **Mean Difference (MD)**: When outcome measured on same scale (e.g., mm Hg blood pressure)
- **Standardized Mean Difference (SMD)**: When outcome measured on different scales (different QoL questionnaires). Interpret as effect size: SMD 0.2 = small, 0.5 = moderate, 0.8 = large.
**Time-to-event outcomes**:
- **Hazard Ratio (HR)**: Accounts for censoring and time. From Cox proportional hazards models.
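A minimal sketch computing the binary-outcome measures from a 2x2 table (the counts are invented for illustration; the function name is ours):

```python
def binary_effects(a, n1, c, n0):
    """Effect measures from a 2x2 table: a/n1 events in the
    intervention group, c/n0 events in the control group."""
    r1, r0 = a / n1, c / n0
    rr = r1 / r0                            # risk ratio
    or_ = (r1 / (1 - r1)) / (r0 / (1 - r0))  # odds ratio
    rd = r1 - r0                            # risk difference
    nnt = abs(1 / rd) if rd != 0 else float("inf")  # NNT = 1/|RD|
    return {"RR": rr, "OR": or_, "RD": rd, "NNT": nnt}

# hypothetical trial: 30/200 events on drug vs 50/200 on control
binary_effects(30, 200, 50, 200)
# RR 0.60, OR ≈ 0.53, RD -0.10, NNT 10
```

The gap between RR 0.60 and OR 0.53 with a 25% control event rate illustrates why the OR overstates effects when the outcome is not rare.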
### Heterogeneity Assessment
**I² statistic**: % of variability due to heterogeneity rather than chance.
- **I² = 0-25%**: Low heterogeneity (might not need subgroup analysis)
- **I² = 25-50%**: Moderate heterogeneity (explore sources)
- **I² = 50-75%**: Substantial heterogeneity (subgroup analysis essential)
- **I² > 75%**: Considerable heterogeneity (consider not pooling)
**Cochran's Q test**: Tests whether heterogeneity is statistically significant (p<0.10 suggests heterogeneity).
- **Limitation**: Low power with few studies, high power with many studies (may detect clinically unimportant heterogeneity)
**Exploring heterogeneity**:
1. Visual inspection (forest plot - outliers?)
2. Subgroup analysis (by population, intervention, setting, risk of bias)
3. Meta-regression (if ≥10 studies) - test whether study-level characteristics (year, dose, age) explain heterogeneity
4. Sensitivity analysis (exclude high risk of bias, exclude outliers)
### Publication Bias Assessment
**Methods** (require ≥10 studies):
- **Funnel plot**: Plot effect size vs precision (SE). Asymmetry suggests small-study effects/publication bias.
- **Egger's test**: Statistical test for funnel plot asymmetry (p<0.10 suggests bias).
- **Trim and fill**: Impute missing studies and recalculate pooled effect.
**Limitations**: Asymmetry can be due to heterogeneity, not just publication bias. Small-study effects ≠ publication bias.
**Search mitigation**: Search clinical trial registries (ClinicalTrials.gov, EudraCT), contact authors, grey literature.
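A minimal sketch of Egger's regression (standardized effect on precision) using plain least squares; it returns the intercept and its t statistic rather than a p-value, which would require a t-distribution lookup with n−2 df. The function name and data are illustrative, and the ≥10-study requirement above still applies.

```python
import math

def egger_intercept(effects, ses):
    """Egger's regression: standardized effect (y = effect/SE) on
    precision (x = 1/SE). An intercept far from zero suggests
    small-study effects / funnel-plot asymmetry."""
    y = [e / s for e, s in zip(effects, ses)]
    x = [1 / s for s in ses]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # standard error of the intercept from the OLS residuals
    se_int = math.sqrt(
        (sum(r**2 for r in resid) / (n - 2)) * (1 / n + mx**2 / sxx)
    )
    return intercept, intercept / se_int

# ten hypothetical studies (effect estimates and standard errors)
intercept, t_stat = egger_intercept(
    [0.2, 0.3, 0.25, 0.4, 0.15, 0.35, 0.28, 0.22, 0.31, 0.18],
    [0.10, 0.12, 0.08, 0.20, 0.09, 0.15, 0.11, 0.10, 0.13, 0.09],
)
```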
---
## 5. Advanced Study Designs
### Pragmatic Trials
**Purpose**: Evaluate effectiveness in real-world settings (vs efficacy in ideal conditions).
**Characteristics**:
- Broad inclusion criteria (representative of clinical practice)
- Minimal exclusions (include comorbidities, elderly, diverse populations)
- Flexible interventions (allow adaptations like clinical practice)
- Clinically relevant comparators (usual care, not placebo)
- Patient-important outcomes (mortality, QoL, not just biomarkers)
- Long-term follow-up (capture real-world adherence, adverse events)
**PRECIS-2 wheel**: Rates trials from explanatory (ideal conditions) to pragmatic (real-world) on 9 domains.
**Example**: HOPE-3 trial (combined BP-lowering and statin therapy for CVD prevention) - broad inclusion, minimal monitoring, usual care comparator, long-term follow-up.
### Non-Inferiority Trials
**Purpose**: Show new treatment is "not worse" than standard (by pre-defined margin), usually because new treatment has other advantages (cheaper, safer, easier).
**Key concepts**:
- **Non-inferiority margin** (Δ): Maximum acceptable difference. New treatment preserves ≥50% of standard's benefit over placebo.
- **One-sided test**: Test whether upper limit of 95% CI for difference < Δ.
- **Interpretation**: If upper CI < Δ, declare non-inferiority. If CI crosses Δ, inconclusive.
**Pitfalls**:
- Large non-inferiority margins (>50% of benefit) allow ineffective treatments
- Per-protocol analysis bias (favors non-inferiority); need ITT + per-protocol
- Assay sensitivity: Must show historical evidence that standard > placebo
**Example**: Enoxaparin vs unfractionated heparin for VTE treatment. Margin = 2% absolute difference in recurrent VTE.
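The decision rule can be sketched with a Wald interval for the risk difference. The event rates, sample sizes, and margin below are invented, and per the pitfalls above a real analysis should report both ITT and per-protocol results.

```python
import math

def noninferiority(p_new, n_new, p_std, n_std, margin, z=1.96):
    """Wald CI for the risk difference (new - standard, higher = worse).
    Declare non-inferiority if the upper limit stays below the
    pre-specified margin; a CI crossing the margin is inconclusive."""
    diff = p_new - p_std
    se = math.sqrt(p_new * (1 - p_new) / n_new + p_std * (1 - p_std) / n_std)
    upper = diff + z * se
    return upper, upper < margin

# hypothetical VTE example: 4.5% vs 4.0% recurrence, 2% absolute margin
upper, ok = noninferiority(0.045, 1500, 0.04, 1500, 0.02)
# upper ≈ 0.019, below the 0.02 margin → non-inferior
```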
### Cluster Randomized Trials
**Design**: Randomize groups (hospitals, clinics, communities) not individuals.
**When used**:
- Intervention delivered at group level (policy, training, quality improvement)
- Contamination risk if individuals randomized (control group adopts intervention)
**Statistical consideration**:
- **Intracluster correlation (ICC)**: Individuals within cluster more similar than across clusters
- **Design effect**: Effective sample size reduced: Deff = 1 + (m-1) × ICC, where m = cluster size
- **Analysis**: Account for clustering (GEE, mixed models, cluster-level analysis)
**Example**: COMMIT trial (smoking cessation at workplace level). Randomized worksites, analyzed accounting for clustering.
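The design-effect arithmetic above can be sketched directly (equal cluster sizes assumed; the function name is ours):

```python
def effective_sample_size(n_total: int, cluster_size: int, icc: float) -> float:
    """Effective N after inflating variance by the design effect
    Deff = 1 + (m - 1) * ICC, for equal cluster sizes m."""
    deff = 1 + (cluster_size - 1) * icc
    return n_total / deff

# 20 clinics x 50 patients, ICC 0.05: Deff = 1 + 49 * 0.05 = 3.45
effective_sample_size(1000, 50, 0.05)  # ≈ 290 patients
```

Even a modest ICC of 0.05 cuts 1,000 enrolled patients to roughly 290 effective patients, which is why cluster trials need many clusters rather than merely large ones.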
### N-of-1 Trials
**Design**: Single patient receives multiple crossovers between treatments in random order.
**When used**:
- Chronic stable conditions (asthma, arthritis, chronic pain)
- Rapid onset/offset treatments
- Substantial inter-patient variability in response
- Patient wants personalized evidence
**Requirements**:
- ≥3 treatment periods per arm (A-B-A-B-A-B)
- Washout between periods if needed
- Blind patient and assessor if possible
- Pre-specify outcome and decision rule
**Analysis**: Compare outcomes during A vs B periods within patient (paired t-test, meta-analysis across periods).
**Example**: Stimulant dose optimization for ADHD. Test 3 doses + placebo in randomized crossover, 1-week periods each.
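The within-patient paired analysis can be sketched as follows. Scores are invented; for a p-value, compare the t statistic against a t distribution with n−1 df, or simply judge the mean difference against the MCID.

```python
from statistics import mean, stdev

def n_of_1_paired(a_periods, b_periods):
    """Paired comparison of outcome scores in A vs B periods, matched
    by period pair. Returns the mean within-pair difference and its
    paired t statistic."""
    diffs = [a - b for a, b in zip(a_periods, b_periods)]
    d_bar = mean(diffs)
    t = d_bar / (stdev(diffs) / len(diffs) ** 0.5)
    return d_bar, t

# hypothetical pain scores (0-10) across three A-B period pairs
n_of_1_paired([6, 5, 7], [3, 4, 3])  # mean difference ≈ 2.7 points, t ≈ 3.0
```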
---
## 6. Outcome Measurement
### Minimal Clinically Important Difference (MCID)
**Definition**: Smallest change in outcome that patients perceive as beneficial (and would mandate change in management).
**Determination methods**:
1. **Anchor-based**: Link change to external anchor ("How much has your pain improved?" - "A little" threshold)
2. **Distribution-based**: 0.5 SD or 1 SE as MCID (statistical, not patient-centered)
3. **Delphi consensus**: Expert panel agrees on MCID
**Examples**:
- **Pain VAS** (0-100): MCID = 10-15 points
- **6-minute walk distance**: MCID = 30 meters
- **KCCQ** (Kansas City Cardiomyopathy Questionnaire): MCID = 5 points
- **FEV₁** (lung function): MCID = 100-140 mL
**Interpretation**: Effect size must exceed MCID to be clinically meaningful. p<0.05 with effect < MCID = statistically significant but clinically trivial.
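That interpretation rule can be sketched as a small classifier (the labels and function name are ours; it assumes a higher effect means more benefit):

```python
def clinical_verdict(effect: float, ci_low: float, ci_high: float,
                     mcid: float, p: float) -> str:
    """Combine statistical significance with the MCID comparison."""
    significant = p < 0.05
    if significant and ci_low >= mcid:
        return "statistically significant and clinically meaningful"
    if significant and effect < mcid:
        return "statistically significant but below MCID (clinically trivial)"
    if significant:
        return "significant; CI spans the MCID (importance uncertain)"
    return "not statistically significant"

# VAS pain example from the text: 3-point reduction (95% CI 2-4), MCID 10
clinical_verdict(3, 2, 4, 10, 0.001)
# 'statistically significant but below MCID (clinically trivial)'
```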
### Composite Outcomes
**Definition**: Combines ≥2 outcomes into single endpoint (e.g., "death, MI, or stroke").
**Advantages**:
- Increases event rate → reduces required sample size
- Captures multiple aspects of benefit/harm
**Disadvantages**:
- Obscures which component drives effect (mortality reduction? or non-fatal MI?)
- Components may not be equally important to patients (MI ≠ revascularization)
- If the intervention affects components in opposite directions, the composite can mislead
**Guidelines**:
- Report components separately
- Verify effect is consistent across components
- Weight components by importance if possible
- Avoid composites with many low-importance components
**Example**: MACE (major adverse cardiac events) = death + MI + stroke (appropriate). But "death, MI, stroke, or revascularization" dilutes the composite with a less patient-important outcome.
### Surrogate Outcomes
**Definition**: Biomarker/lab value used as substitute for patient-important outcome.
**Valid surrogate criteria** (Prentice criteria):
1. Surrogate associated with clinical outcome (correlation)
2. Intervention affects surrogate
3. Intervention's effect on clinical outcome is mediated through surrogate
4. Effect on surrogate fully captures effect on clinical outcome
**Problems**:
- Many surrogates fail criteria #4 (e.g., antiarrhythmics reduce PVCs but increase mortality)
- Intervention can affect surrogate without affecting clinical outcome
**Examples**:
- **Good surrogate**: Blood pressure for stroke (validated, consistent)
- **Poor surrogate**: Bone density for fracture (drugs increase density but not all reduce fracture)
- **Partially validated**: HbA1c — lowering it reduces microvascular complications, but intensive lowering has not consistently reduced macrovascular events (e.g., ACCORD)
**Recommendation**: Prioritize patient-important outcomes. Accept surrogates only if validated relationship exists and patient-important outcome infeasible.
---
## 7. Special Populations & Contexts
**Pediatric Evidence**: Age-appropriate outcomes (developmental milestones, parent-reported), pharmacokinetic modeling for dose prediction, extrapolation from adults if justified, expert opinion carries more weight when RCTs infeasible.
**Rare Diseases**: N-of-1 trials, registries, historical controls (with caution), Bayesian methods to reduce sample requirements. Regulators accept lower evidence standards (orphan drug designation, conditional approval).
**Health Technology Assessment**: Assesses clinical effectiveness (GRADE), safety, cost-effectiveness (cost per QALY), budget impact, organizational/ethical/social factors. Thresholds vary (£20-30k/QALY UK, $50-150k US). Requires systematic review + economic model + probabilistic sensitivity analysis.
---
## 8. Knowledge Translation
**Evidence-to-Decision Framework** (GRADE): Problem priority → Desirable/undesirable effects → Certainty → Values → Balance of benefits/harms → Resources → Equity → Acceptability → Feasibility.
**Recommendation strength**:
- **Strong** ("We recommend"): Most patients would want, few would not
- **Conditional** ("We suggest"): Substantial proportion might not want, or uncertainty high
**Guideline Development**: Scope/PICOT → Systematic review → GRADE profiles → EtD framework → Recommendation (strong vs conditional) → External review → Update plan (3-5 years). COI management critical. AGREE II assesses guideline quality.
---
## 9. Common Pitfalls & Fixes
**Surrogate outcomes**: Using unvalidated biomarkers. **Fix**: Prioritize patient-important outcomes (mortality, QoL).
**Composite outcomes**: Obscuring which component drives effect. **Fix**: Report components separately, verify consistency.
**Subgroup proliferation**: Data dredging for false positives. **Fix**: Pre-specify <5 subgroups, test interaction, require plausibility.
**Statistical vs clinical significance**: p<0.05 with effect below MCID. **Fix**: Compare to MCID, report absolute effects (NNT).
**Publication bias**: Missing null results. **Fix**: Search trial registries (ClinicalTrials.gov), contact authors, assess funnel plot.
**Poor applicability**: Extrapolating from selected trials. **Fix**: Assess PICO match, setting differences, patient values.
**Causation claims**: From observational data. **Fix**: Use causal language only for RCTs or strong obs evidence (large effect, dose-response).
**Industry bias**: Uncritical acceptance. **Fix**: Assess COI, check selective reporting, verify independent analysis.
# Domain Research: Health Science - Templates
## Workflow
```
Health Research Progress:
- [ ] Step 1: Formulate research question (PICOT)
- [ ] Step 2: Assess evidence hierarchy and study design
- [ ] Step 3: Evaluate study quality and bias
- [ ] Step 4: Prioritize and define outcomes
- [ ] Step 5: Synthesize evidence and grade certainty
- [ ] Step 6: Create decision-ready summary
```
**Step 1: Formulate research question (PICOT)**
Use [PICOT Framework](#picot-framework) to structure precise research question with Population, Intervention, Comparator, Outcome, and Timeframe fully specified.
**Step 2: Assess evidence hierarchy and study design**
Select appropriate study design for question type using [Common Question Types](#common-question-types) guidance (RCT for therapy, cross-sectional for diagnosis, cohort for prognosis, observational for rare harms).
**Step 3: Evaluate study quality and bias**
Apply bias assessment using [Evidence Appraisal Template](#evidence-appraisal-template) with appropriate tool (Cochrane RoB 2 for RCTs, ROBINS-I for observational, QUADAS-2 for diagnostic).
**Step 4: Prioritize and define outcomes**
Create hierarchy using [Outcome Hierarchy Template](#outcome-hierarchy-template), prioritizing patient-important outcomes (mortality, QoL) over surrogates (biomarkers), and specify MCID.
**Step 5: Synthesize evidence and grade certainty**
Rate certainty using [GRADE Evidence Profile Template](#grade-evidence-profile-template), assessing study limitations, inconsistency, indirectness, imprecision, and publication bias.
**Step 6: Create decision-ready summary**
Produce evidence summary using [Clinical Interpretation Template](#clinical-interpretation-template) with benefits/harms balance, certainty ratings, applicability, and evidence gaps identified.
---
## PICOT Framework
### Research Question Template
**Clinical scenario**: [Describe the decision problem or knowledge gap]
### PICOT Components
**P (Population)**:
- **Demographics**: [Age range, sex, race/ethnicity if relevant]
- **Condition**: [Disease, severity, stage, diagnostic criteria]
- **Setting**: [Primary care, hospital, community, country/region]
- **Inclusion criteria**: [Key eligibility requirements]
- **Exclusion criteria**: [Factors that make evidence inapplicable]
**I (Intervention)**:
- **Type**: [Drug, procedure, diagnostic test, preventive measure, exposure]
- **Specification**: [Dose, frequency, duration, route, technique details]
- **Co-interventions**: [Other treatments given alongside]
- **Timing**: [When initiated relative to disease course]
**C (Comparator)**:
- **Type**: [Placebo, standard care, alternative treatment, no intervention]
- **Specification**: [Same level of detail as intervention]
- **Rationale**: [Why this comparator?]
**O (Outcome)**:
- **Primary outcome**: [Most important endpoint - typically patient-important]
- Measurement instrument/definition
- Timepoint for assessment
- Minimal clinically important difference (MCID) if known
- **Secondary outcomes**: [Additional endpoints]
- **Safety outcomes**: [Harms, adverse events]
**T (Timeframe)**:
- **Follow-up duration**: [How long? Justification for duration choice]
- **Time to outcome**: [When do you expect to see effect?]
**Structured PICOT statement**:
"In [population], does [intervention] compared to [comparator] affect [outcome] over [timeframe]?"
**Example**: "In adults >65 years with heart failure and reduced ejection fraction (HFrEF), does dapagliflozin 10mg daily compared to standard care (ACE inhibitor + beta-blocker) reduce all-cause mortality over 12 months?"
---
## Outcome Hierarchy Template
### Outcome Prioritization
Rate each outcome as:
- **Critical** (7-9): Essential for decision-making, would change recommendation
- **Important** (4-6): Informs decision but not decisive alone
- **Not important** (1-3): Interesting but doesn't influence decision
| Outcome | Rating (1-9) | Patient-important? | Surrogate? | MCID | Measurement | Timepoint |
|---------|--------------|---------------------|------------|------|-------------|-----------|
| All-cause mortality | 9 (Critical) | Yes | No | N/A | Death registry | 12 months |
| CV mortality | 8 (Critical) | Yes | No | N/A | Adjudicated cause | 12 months |
| Hospitalization (HF) | 7 (Critical) | Yes | No | 1-2 events/year | Hospital admission | 12 months |
| Quality of life (QoL) | 7 (Critical) | Yes | No | 5 points (KCCQ) | KCCQ questionnaire | 6, 12 months |
| 6-minute walk distance | 5 (Important) | Yes | No | 30 meters | 6MWT | 6, 12 months |
| NT-proBNP reduction | 4 (Important) | No | Yes (partial) | 30% reduction | Blood test | 3, 6, 12 months |
| Ejection fraction | 3 (Not important) | No | Yes (weak) | 5% absolute | Echocardiogram | 6, 12 months |
**Notes**:
- Prioritize patient-important outcomes (mortality, symptoms, function, QoL) over surrogates (biomarkers)
- Surrogates only acceptable if validated relationship to patient outcomes exists
- MCID = Minimal Clinically Important Difference (smallest change patients notice as meaningful)
---
## Evidence Appraisal Template
### Study Identification
**Citation**: [Author, Year, Journal]
**Study design**: [RCT, cohort, case-control, cross-sectional, systematic review]
**Research question type**: [Therapy, diagnosis, prognosis, harm, etiology]
**Setting**: [Country, healthcare system, single/multi-center]
**Funding**: [Government, industry, foundation - assess for conflict of interest]
### PICOT Match Assessment
| PICOT Element | Study Population | Your Population | Match? |
|---------------|------------------|-----------------|--------|
| Population | [Study's population] | [Your patient/question] | Yes/Partial/No |
| Intervention | [Study's intervention] | [Your intervention] | Yes/Partial/No |
| Comparator | [Study's comparator] | [Your comparator] | Yes/Partial/No |
| Outcome | [Study's outcomes] | [Your outcomes of interest] | Yes/Partial/No |
| Timeframe | [Study's follow-up] | [Your timeframe] | Yes/Partial/No |
**Applicability**: [Overall assessment - can you apply these results to your question/patient?]
### Risk of Bias Assessment (RCT - Cochrane RoB 2)
| Domain | Judgment | Support |
|--------|----------|---------|
| 1. Randomization process | Low / Some concerns / High | [Was allocation sequence random and concealed?] |
| 2. Deviations from intended interventions | Low / Some concerns / High | [Were participants and personnel blinded? Deviations balanced?] |
| 3. Missing outcome data | Low / Some concerns / High | [Loss to follow-up <10%? Balanced across groups? ITT analysis?] |
| 4. Measurement of outcome | Low / Some concerns / High | [Blinded outcome assessment? Validated instruments?] |
| 5. Selection of reported result | Low / Some concerns / High | [Protocol pre-specified outcomes? Selective reporting?] |
**Overall risk of bias**: Low / Some concerns / High
### Key Results
| Outcome | Intervention group | Control group | Effect estimate | 95% CI | p-value | Clinical interpretation |
|---------|-------------------|---------------|-----------------|--------|---------|-------------------------|
| Mortality | [n/N, %] | [n/N, %] | RR 0.75 | 0.68-0.83 | <0.001 | 25% relative risk reduction |
| QoL change | [Mean ± SD] | [Mean ± SD] | MD 5.2 points | 3.1-7.3 | <0.001 | Exceeds MCID (5 points) |
**Absolute effects**:
- **Risk difference**: [e.g., 5% absolute reduction in mortality]
- **Number needed to treat (NNT)**: [e.g., NNT = 20 to prevent 1 death]
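The conversion from relative to absolute effects is mechanical once the baseline (control-group) risk is known. A minimal sketch, using the example numbers above:

```python
def absolute_effects(control_risk: float, rr: float):
    """Convert a relative risk into absolute terms for a given baseline risk."""
    treated_risk = control_risk * rr
    arr = control_risk - treated_risk   # absolute risk reduction
    nnt = 1 / arr                       # number needed to treat
    return arr, nnt

# Baseline mortality 20% with RR 0.75 -> ARR 5%, NNT 20
arr, nnt = absolute_effects(0.20, 0.75)
```

The same RR implies very different NNTs at different baseline risks, which is why absolute effects should always be reported alongside relative ones.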
---
## GRADE Evidence Profile Template
### Evidence Summary Table
**Question**: [PICOT question]
**Setting**: [Clinical context]
**Bibliography**: [Key studies included]
| Outcomes | Studies (Design) | Sample Size | Effect Estimate (95% CI) | Absolute Effect | Certainty | Importance |
|----------|------------------|-------------|--------------------------|-----------------|-----------|------------|
| Mortality (12mo) | 5 RCTs | N=15,234 | RR 0.75 (0.70-0.80) | 15 fewer per 1000 (from 60 to 45) | ⊕⊕⊕⊕ High | Critical |
| HF hospitalization | 5 RCTs | N=15,234 | RR 0.70 (0.65-0.76) | 90 fewer per 1000 (from 300 to 210) | ⊕⊕⊕○ Moderate¹ | Critical |
| QoL (KCCQ change) | 3 RCTs | N=8,500 | MD 5.2 (3.1-7.3) | 5.2 points higher (MCID=5) | ⊕⊕⊕○ Moderate² | Critical |
| Serious adverse events | 5 RCTs | N=15,234 | RR 0.95 (0.88-1.03) | 15 fewer per 1000 (from 300 to 285) | ⊕⊕⊕○ Moderate³ | Critical |
**Footnotes**:
1. Downgraded for inconsistency (I²=55%, moderate heterogeneity across studies)
2. Downgraded for indirectness (QoL instrument not validated in all subgroups)
3. Downgraded for imprecision (confidence interval includes no effect)
### GRADE Certainty Assessment
| Outcome | Study Design | Risk of Bias | Inconsistency | Indirectness | Imprecision | Publication Bias | Upgrade Factors | Final Certainty |
|---------|--------------|--------------|---------------|--------------|-------------|------------------|-----------------|-----------------|
| Mortality | RCT (High) | No serious (-0) | No serious (-0) | No serious (-0) | No serious (-0) | Undetected (-0) | None | ⊕⊕⊕⊕ High |
| HF hosp | RCT (High) | No serious (-0) | Serious (-1) | No serious (-0) | No serious (-0) | Undetected (-0) | None | ⊕⊕⊕○ Moderate |
| QoL | RCT (High) | No serious (-0) | No serious (-0) | Serious (-1) | No serious (-0) | Undetected (-0) | None | ⊕⊕⊕○ Moderate |
**Certainty definitions**:
- **High** (⊕⊕⊕⊕): Very confident true effect is close to estimate
- **Moderate** (⊕⊕⊕○): Moderately confident; true effect likely close but could differ substantially
- **Low** (⊕⊕○○): Limited confidence; true effect may be substantially different
- **Very Low** (⊕○○○): Very little confidence; true effect likely substantially different
---
## Clinical Interpretation Template
### Evidence-to-Decision
**Benefits**:
- [List benefits with certainty ratings]
- Example: Mortality reduction (RR 0.75, GRADE: High) - clear benefit
**Harms**:
- [List harms with certainty ratings]
- Example: Serious adverse events (RR 0.95, GRADE: Moderate) - no significant increase
**Balance of benefits vs harms**: [Favorable / Unfavorable / Uncertain]
**Certainty of evidence**: [Overall certainty across critical outcomes]
**Patient values and preferences**: [Are there important variations? Uncertainty?]
**Resource implications**: [Cost, accessibility, training required]
**Applicability**: [Can these results be applied to your setting/population?]
- PICO match: [Assess similarity]
- Setting differences: [Trial setting vs your setting]
- Feasibility: [Can intervention be delivered as in trial?]
**Evidence gaps**: [What remains uncertain? Need for further research?]
---
## Systematic Review Protocol Template
### Protocol Information
**Title**: [Systematic review title]
**Registration**: [PROSPERO ID if applicable]
**Review team**: [Names, roles, affiliations]
**Funding**: [Source - declare conflicts of interest]
### Research Question (PICOT)
[Use PICOT template above]
### Eligibility Criteria
**Inclusion criteria**:
- Study designs: [RCTs, cohort, etc.]
- Population: [Specific PICO elements]
- Interventions: [What will be included]
- Comparators: [What will be included]
- Outcomes: [Which outcomes required for inclusion]
- Setting/context: [Geographic, time period]
- Language: [English only? All languages?]
**Exclusion criteria**:
- [Specific exclusions]
### Search Strategy
**Databases**: [MEDLINE, Embase, Cochrane CENTRAL, CINAHL, PsycINFO, Web of Science]
**Search terms**: [Key concepts - population AND intervention AND outcome]
- Population: [MeSH terms, keywords]
- Intervention: [MeSH terms, keywords]
- Outcome: [MeSH terms, keywords]
**Other sources**: [Clinical trial registries, grey literature, reference lists, contact authors]
**Date limits**: [From XXXX to present, or all dates]
### Selection Process
- **Screening**: Two reviewers independently screen titles/abstracts, then full text
- **Disagreement resolution**: Discussion, third reviewer if needed
- **Software**: [Covidence, DistillerSR, or other]
- **PRISMA flow diagram**: Document screening at each stage
### Data Extraction
**Information to extract**:
- Study characteristics: Author, year, country, setting, sample size, funding
- Population: Demographics, condition details, inclusion/exclusion criteria
- Intervention: Specifics of intervention (dose, duration, delivery)
- Comparator: Details of comparison
- Outcomes: Results for each outcome (means, SDs, events, totals)
- Risk of bias domains: [RoB 2 or ROBINS-I elements]
**Extraction tool**: Standardized form, piloted on 5 studies
**Duplicate extraction**: Two reviewers independently, compare and resolve discrepancies
### Risk of Bias Assessment
**Tool**: [Cochrane RoB 2 for RCTs, ROBINS-I for observational studies, QUADAS-2 for diagnostic accuracy]
**Domains assessed**: [List specific domains from chosen tool]
**Process**: Two independent reviewers, disagreements resolved by discussion
### Data Synthesis
**Quantitative synthesis (meta-analysis)**: [If appropriate]
- Statistical method: [Random-effects or fixed-effect]
- Effect measure: [Risk ratio, odds ratio, mean difference, standardized mean difference]
- Software: [RevMan, R, Stata]
- Heterogeneity assessment: [I², Cochran's Q test]
- Subgroup analyses: [Pre-specified]
- Sensitivity analyses: [Exclude high risk of bias, publication bias adjustment]
**Qualitative synthesis**: [If meta-analysis not appropriate]
- Narrative summary organized by [outcome, population, intervention]
### Certainty of Evidence
**GRADE assessment**: Rate certainty (high, moderate, low, very low) for each critical outcome
**Summary of findings table**: Create evidence profile with absolute effects and certainty ratings
---
## Common Question Types
### Therapy Question
**PICOT**: Population with condition → Intervention vs Comparator → Patient-important outcomes → Follow-up
**Best study design**: RCT (if feasible); cohort if RCT not ethical/feasible
**Bias tool**: Cochrane RoB 2 (RCT), ROBINS-I (observational)
**Key outcomes**: Mortality, morbidity, quality of life, adverse events
**Statistical measure**: Risk ratio, hazard ratio, absolute risk reduction, NNT
### Diagnosis Question
**PICOT**: Population with suspected condition → Index test vs Reference standard → Diagnostic accuracy → Cross-sectional
**Best study design**: Cross-sectional with consecutive enrollment
**Bias tool**: QUADAS-2
**Key outcomes**: Sensitivity, specificity, positive/negative predictive values, likelihood ratios
**Statistical measure**: Sensitivity, specificity, diagnostic odds ratio, AUC
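These accuracy measures all follow from the 2×2 table of index test vs reference standard. A minimal sketch (the counts are made up for illustration):

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Sensitivity, specificity, and likelihood ratios from a 2x2 table."""
    sens = tp / (tp + fn)            # true positives among the diseased
    spec = tn / (tn + fp)            # true negatives among the non-diseased
    lr_pos = sens / (1 - spec)       # positive likelihood ratio
    lr_neg = (1 - sens) / spec       # negative likelihood ratio
    return sens, spec, lr_pos, lr_neg

# 90 TP, 20 FP, 10 FN, 80 TN:
sens, spec, lr_pos, lr_neg = diagnostic_accuracy(90, 20, 10, 80)
```

Unlike predictive values, sensitivity, specificity, and likelihood ratios do not depend on disease prevalence, which is why they transfer better across settings.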
### Prognosis Question
**PICOT**: Population with condition/exposure → Prognostic factors → Outcomes → Long-term follow-up
**Best study design**: Prospective cohort
**Bias tool**: ROBINS-I or PROBAST (for prediction models)
**Key outcomes**: Incidence, survival, hazard ratios, risk prediction performance
**Statistical measure**: Hazard ratio, incidence rate, C-statistic, calibration
### Harm Question
**PICOT**: Population exposed to intervention → Adverse outcomes → Timeframe for rare/delayed harms
**Best study design**: RCT for common harms; observational for rare harms
**Bias tool**: Cochrane RoB 2 (RCT), ROBINS-I (observational)
**Key outcomes**: Serious adverse events, discontinuations, organ-specific toxicity
**Statistical measure**: Risk ratio, absolute risk increase, number needed to harm (NNH)
---
## Quick Reference
### Evidence Hierarchy by Question Type
**Therapy**: Systematic review of RCTs > RCT > Cohort > Case-control > Case series
**Diagnosis**: Systematic review > Cross-sectional with consecutive enrollment > Case-control (inflates accuracy)
**Prognosis**: Systematic review > Prospective cohort > Retrospective cohort > Case-control
**Harm**: Systematic review > RCT (common harms) > Observational (rare harms) > Case series
### GRADE Domains
**Downgrade certainty for**:
1. **Risk of bias** (study limitations)
2. **Inconsistency** (unexplained heterogeneity, I² > 50%)
3. **Indirectness** (PICO mismatch, surrogate outcomes)
4. **Imprecision** (wide confidence intervals, small sample)
5. **Publication bias** (funnel plot asymmetry, selective reporting)
**Upgrade certainty for** (observational studies):
1. **Large effect** (RR > 2 or < 0.5; very large RR > 5 or < 0.2)
2. **Dose-response gradient**
3. **All plausible confounders would reduce effect**
### Effect Size Interpretation
**Risk Ratio (RR)**:
- RR = 1.0: No effect
- RR = 0.75: 25% relative risk reduction
- RR = 1.25: 25% relative risk increase
**Minimal Clinically Important Difference (MCID) - Common Scales**:
- **KCCQ** (Kansas City Cardiomyopathy Questionnaire): 5 points
- **SF-36** (Short Form Health Survey): 5-10 points
- **VAS pain** (0-100): 10-15 points
- **6-minute walk test**: 30 meters
- **FEV₁** (lung function): 100-140 mL
### Sample Size Considerations
**Adequate power**: ≥80% power to detect MCID
**Typical requirements**:
- Mortality reduction (5% → 4%): ~10,000 per arm
- QoL improvement (MCID): ~200-500 per arm
- Diagnostic accuracy (sensitivity 85% → 90%): ~300-500 patients
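The two-proportion requirement can be approximated with the standard normal-approximation formula; a sketch only (default z values correspond to two-sided α = 0.05 and 80% power), not a replacement for a proper power calculation:

```python
from math import ceil, sqrt

def n_per_arm(p1: float, p2: float, z_alpha: float = 1.96, z_beta: float = 0.84) -> int:
    """Approximate sample size per arm for comparing two proportions."""
    p_bar = (p1 + p2) / 2
    num = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
           + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(num / (p1 - p2) ** 2)

# Mortality 5% vs 4% at 80% power needs several thousand per arm;
# raising power to 90% (z_beta = 1.28) pushes it toward ~10,000 per arm
n = n_per_arm(0.05, 0.04)
```

The quadratic dependence on the risk difference is what makes small absolute effects on rare outcomes so expensive to detect.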