Initial commit

2025-11-30 08:30:18 +08:00
commit 74bee324ab
335 changed files with 147377 additions and 0 deletions
--- a/skills/scientific-critical-thinking/references/evidence_hierarchy.md
+++ b/skills/scientific-critical-thinking/references/evidence_hierarchy.md
@@ -0,0 +1,484 @@
+# Evidence Hierarchy and Quality Assessment
+
+## Traditional Evidence Hierarchy (Medical/Clinical)
+
+### Level 1: Systematic Reviews and Meta-Analyses
+**Description:** Comprehensive synthesis of all available evidence on a question.
+
+**Strengths:**
+- Combines multiple studies for greater power
+- Reduces impact of single-study anomalies
+- Can identify patterns across studies
+- Quantifies overall effect size
+
+**Weaknesses:**
+- Quality depends on included studies ("garbage in, garbage out")
+- Publication bias can distort findings
+- Heterogeneity may make pooling inappropriate
+- Can mask important differences between studies
+
+**Critical evaluation:**
+- Was search comprehensive (multiple databases, grey literature)?
+- Were inclusion criteria appropriate and prespecified?
+- Was study quality assessed?
+- Was heterogeneity explored?
+- Was publication bias assessed (funnel plots, fail-safe N)?
+- Were appropriate statistical methods used?
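As an illustration of the pooling and heterogeneity checks listed above, here is a minimal sketch of inverse-variance fixed-effect pooling with Cochran's Q and the I² statistic. The function name `pool_fixed_effect` and the input format (one effect estimate and one standard error per study) are assumptions made for this example; a real meta-analysis would normally use a dedicated package and would also consider a random-effects model.

```python
import math


def pool_fixed_effect(effects, std_errors):
    """Inverse-variance fixed-effect pooling with Cochran's Q and I^2.

    effects and std_errors are hypothetical per-study inputs on a common
    scale (for example log odds ratios and their standard errors).
    """
    if len(effects) != len(std_errors) or len(effects) < 2:
        raise ValueError("need at least two studies with matching standard errors")

    # Each study is weighted by the inverse of its variance.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)

    pooled = sum(w * y for w, y in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)

    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1

    # I^2: share of total variation attributed to heterogeneity, floored at 0.
    i2_percent = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    return {"pooled": pooled, "se": pooled_se, "q": q, "i2_percent": i2_percent}
```

A high I² suggests that a single pooled estimate may hide real differences between studies, which is the situation the "inappropriate pooling" weakness warns about.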
+
+### Level 2: Randomized Controlled Trials (RCTs)
+**Description:** Experimental studies with random assignment to conditions.
+
+**Strengths:**
+- Gold standard for establishing causation
+- Balances known and unknown confounders across groups (in expectation)
+- Minimizes selection bias
+- Enables causal inference
+
+**Weaknesses:**
+- May not be ethical or feasible
+- Artificial settings may limit generalizability
+- Often short-term with selected populations
+- Expensive and time-consuming
+
+**Critical evaluation:**
+- Was randomization adequate (sequence generation, allocation concealment)?
+- Was blinding implemented (participants, providers, assessors)?
+- Was sample size adequate (power analysis)?
+- Was intention-to-treat analysis used?
+- Was attrition rate acceptable and balanced?
+- Are results generalizable?
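The power-analysis question above can be made concrete with a rough calculation. The sketch below uses the standard normal approximation for a two-sided comparison of two group means with equal group sizes; the function name `n_per_group` and its defaults are assumptions for this example, and exact t-based calculations give slightly larger numbers.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size_d, alpha=0.05, power=0.80):
    """Approximate participants needed per arm for a two-arm trial.

    effect_size_d is the standardized mean difference (Cohen's d).
    Uses n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2 (normal approximation).
    """
    if effect_size_d <= 0:
        raise ValueError("effect size must be positive")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must lie strictly between 0 and 1")

    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = standard_normal.inv_cdf(power)

    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size_d ** 2)
```

Under these assumptions, `n_per_group(0.5)` gives 63 per arm and `n_per_group(0.2)` gives 393, which shows how quickly the required sample grows as the expected effect shrinks.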
+
+### Level 3: Cohort Studies
+**Description:** Observational studies following groups over time.
+
+**Types:**
+- **Prospective:** Follow forward from exposure to outcome
+- **Retrospective:** Look backward at existing data
+
+**Strengths:**
+- Can study multiple outcomes
+- Establishes temporal sequence
+- Can calculate incidence and relative risk
+- More feasible than RCTs for many questions
+
+**Weaknesses:**
+- Susceptible to confounding
+- Selection bias possible
+- Attrition can bias results
+- Cannot prove causation definitively
+
+**Critical evaluation:**
+- Were cohorts comparable at baseline?
+- Was exposure measured reliably?
+- Was follow-up adequate and complete?
+- Were potential confounders measured and controlled?
+- Was outcome assessment blinded to exposure?
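Since cohort studies allow incidence and relative risk to be calculated, a small worked sketch may help. The function below computes the relative risk from a 2x2 cohort table with a log-scale (Wald-type) confidence interval; the name `relative_risk` and the argument layout are assumptions for this example, and it does not adjust for confounding.

```python
import math
from statistics import NormalDist


def relative_risk(exposed_events, exposed_total,
                  unexposed_events, unexposed_total, confidence=0.95):
    """Relative risk and a log-scale confidence interval from cohort counts."""
    if exposed_events <= 0 or unexposed_events <= 0:
        raise ValueError("this simple formula needs at least one event per group")
    if exposed_events > exposed_total or unexposed_events > unexposed_total:
        raise ValueError("events cannot exceed group size")

    # Incidence (risk) in each cohort over the follow-up period.
    risk_exposed = exposed_events / exposed_total
    risk_unexposed = unexposed_events / unexposed_total
    rr = risk_exposed / risk_unexposed

    # Standard error of log(RR).
    se_log_rr = math.sqrt(
        1 / exposed_events - 1 / exposed_total
        + 1 / unexposed_events - 1 / unexposed_total
    )
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper
```

For example, 30 events among 100 exposed and 10 among 100 unexposed gives a relative risk of 3. An interval that excludes 1 still says nothing about whether the cohorts were comparable at baseline.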
+
+### Level 4: Case-Control Studies
+**Description:** Compares people with the outcome (cases) to those without it (controls), looking back at prior exposures.
+
+**Strengths:**
+- Efficient for rare outcomes
+- Relatively quick and inexpensive
+- Can study multiple exposures
+- Useful for generating hypotheses
+
+**Weaknesses:**
+- Cannot calculate incidence or relative risk directly (odds ratios are used instead)
+- Susceptible to recall bias
+- Selection of controls is challenging
+- Cannot prove causation
+
+**Critical evaluation:**
+- Were cases and controls defined clearly?
+- Were controls appropriate (same source population)?
+- Was matching appropriate?
+- How was exposure ascertained (records vs. recall)?
+- Were potential confounders controlled?
+- Could recall bias explain findings?
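Because a case-control design fixes the number of cases and controls, the usual effect measure is the odds ratio rather than the relative risk. The sketch below computes it from a 2x2 table with a Woolf (log-scale) confidence interval; the function name `odds_ratio` and the argument names are assumptions for this example, and matched designs need a different analysis.

```python
import math
from statistics import NormalDist


def odds_ratio(case_exposed, case_unexposed,
               control_exposed, control_unexposed, confidence=0.95):
    """Exposure odds ratio with a Woolf confidence interval (unmatched design)."""
    cells = (case_exposed, case_unexposed, control_exposed, control_unexposed)
    if any(count <= 0 for count in cells):
        raise ValueError("all four cells must be positive for this simple formula")

    # Odds of exposure among cases divided by odds of exposure among controls.
    estimate = (case_exposed * control_unexposed) / (case_unexposed * control_exposed)

    # Standard error of log(OR): square root of the summed reciprocal counts.
    se_log_or = math.sqrt(sum(1 / count for count in cells))
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    lower = math.exp(math.log(estimate) - z * se_log_or)
    upper = math.exp(math.log(estimate) + z * se_log_or)
    return estimate, lower, upper
```

The interval only reflects sampling error; recall bias or poorly chosen controls can shift the estimate without widening it.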
+
+### Level 5: Cross-Sectional Studies
+**Description:** Snapshot observation at a single point in time.
+
+**Strengths:**
+- Quick and inexpensive
+- Can assess prevalence
+- Useful for hypothesis generation
+- Can study multiple outcomes and exposures
+
+**Weaknesses:**
+- Cannot establish temporal sequence
+- Cannot determine causation
+- Prevalence-incidence bias
+- Survival bias
+
+**Critical evaluation:**
+- Was sample representative?
+- Were measures validated?
+- Could reverse causation explain findings?
+- Are confounders acknowledged?
+
+### Level 6: Case Series and Case Reports
+**Description:** Description of observations in clinical practice.
+
+**Strengths:**
+- Can identify new diseases or effects
+- Hypothesis-generating
+- Details rare phenomena
+- Quick to report
+
+**Weaknesses:**
+- No control group
+- No statistical inference possible
+- Highly susceptible to bias
+- Cannot establish causation or frequency
+
+**Use:** Primarily for hypothesis generation and clinical description.
+
+### Level 7: Expert Opinion
+**Description:** Statements by recognized authorities.
+
+**Strengths:**
+- Synthesizes experience
+- Useful when no research available
+- May integrate multiple sources
+
+**Weaknesses:**
+- Subjective and potentially biased
+- May not reflect current evidence
+- Risk of appeal-to-authority fallacy
+- Individual expertise varies
+
+**Use:** Lowest level of evidence; should be supported by data when possible.
+
+## Nuances and Limitations of Traditional Hierarchy
+
+### When Lower-Level Evidence Can Be Strong
+1. **Well-designed observational studies** with:
+   - Large effects (hard to confound)
+   - Dose-response relationships
+   - Consistent findings across contexts
+   - Biological plausibility
+   - No plausible confounders
+
+2. **Multiple converging lines of evidence** from different study types
+
+3. **Natural experiments** approximating randomization
+
+### When Higher-Level Evidence Can Be Weak
+1. **Poor-quality RCTs** with:
+   - Inadequate randomization
+   - High attrition
+   - No blinding even though it was feasible
+   - Conflicts of interest
+
+2. **Biased meta-analyses**:
+   - Publication bias
+   - Selective inclusion
+   - Inappropriate pooling
+   - Poor search strategy
+
+3. **Not addressing the right question**:
+   - Wrong population
+   - Wrong comparison
+   - Wrong outcome
+   - Too artificial to generalize
+
+## Alternative: GRADE System
+
+GRADE (Grading of Recommendations Assessment, Development and Evaluation) rates the quality of a body of evidence at one of four levels:
+
+### High Quality
+**Definition:** Very confident that true effect is close to estimated effect.
+
+**Characteristics:**
+- Well-conducted RCTs
+- Overwhelming evidence from observational studies
+- Large, consistent effects
+- No serious limitations
+
+### Moderate Quality
+**Definition:** Moderately confident; true effect likely close to estimated, but could be substantially different.
+
+**Downgrades from high:**
+- Some risk of bias
+- Inconsistency across studies
+- Indirectness (different populations/interventions)
+- Imprecision (wide confidence intervals)
+- Publication bias suspected
+
+### Low Quality
+**Definition:** Limited confidence; true effect may be substantially different.
+
+**Downgrades:**
+- Serious limitations in the factors listed above (risk of bias, inconsistency, indirectness, imprecision, publication bias)
+- Observational studies without special strengths
+
+### Very Low Quality
+**Definition:** Very limited confidence; true effect likely substantially different.
+
+**Characteristics:**
+- Very serious limitations
+- Expert opinion
+- Multiple serious flaws
+
+## Study Quality Assessment Criteria
+
+### Internal Validity (Bias Control)
+**Questions:**
+- Was randomization adequate?
+- Was allocation concealed?
+- Were groups similar at baseline?
+- Was blinding implemented?
+- Was attrition minimal and balanced?
+- Was intention-to-treat used?
+- Were all outcomes reported?
+
+### External Validity (Generalizability)
+**Questions:**
+- Is sample representative of target population?
+- Are inclusion/exclusion criteria too restrictive?
+- Is setting realistic?
+- Are results applicable to other populations?
+- Are effects consistent across subgroups?
+
+### Statistical Conclusion Validity
+**Questions:**
+- Was sample size adequate (power)?
+- Were statistical tests appropriate?
+- Were assumptions checked?
+- Were effect sizes and confidence intervals reported?
+- Were multiple comparisons addressed?
+- Was analysis prespecified?
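One common way to address the multiple-comparisons question above is the Holm-Bonferroni step-down procedure, sketched below. It is only one of several valid corrections, and the function name `holm_bonferroni` is an assumption for this example.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Controls the family-wise error rate at alpha using Holm's step-down rule.
    """
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m

    for rank, idx in enumerate(order):
        # The k-th smallest p-value is compared with alpha / (m - k + 1);
        # rank is zero-based, so the divisor is m - rank.
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values also fail
    return reject
```

With p-values `[0.01, 0.04, 0.03, 0.005]` and `alpha=0.05`, the result is `[True, False, False, True]`: two findings that look significant on their own (0.03 and 0.04) do not survive the correction.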
+
+### Construct Validity (Measurement)
+**Questions:**
+- Were measures validated and reliable?
+- Was outcome defined clearly and appropriately?
+- Were assessors blinded?
+- Were exposures measured accurately?
+- Was timing of measurement appropriate?
+
+## Critical Appraisal Tools
+
+### For Different Study Types
+
+**RCTs:**
+- Cochrane Risk of Bias Tool (RoB 2)
+- Jadad Scale
+- PEDro Scale (for trials in physical therapy)
+
+**Observational Studies:**
+- Newcastle-Ottawa Scale
+- ROBINS-I (Risk Of Bias In Non-randomised Studies of Interventions)
+
+**Diagnostic Studies:**
+- QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies)
+
+**Systematic Reviews:**
+- AMSTAR-2 (A Measurement Tool to Assess Systematic Reviews)
+
+**All Study Types:**
+- CASP Checklists (Critical Appraisal Skills Programme)
+
+## Domain-Specific Considerations
+
+### Basic Science Research
+**Hierarchy differs:**
+1. Multiple convergent lines of evidence
+2. Mechanistic understanding
+3. Reproducible experiments
+4. Established theoretical framework
+
+**Key considerations:**
+- Replication essential
+- Mechanistic plausibility
+- Consistency across model systems
+- Convergence of methods
+
+### Psychological Research
+**Additional concerns:**
+- Replication crisis
+- Publication bias particularly problematic
+- Small effect sizes often expected
+- Cultural context matters
+- Measures often indirect (self-report)
+
+**Strong evidence includes:**
+- Preregistered studies
+- Large samples
+- Multiple measures
+- Behavioral (not just self-report) outcomes
+- Cross-cultural replication
+
+### Epidemiology
+**Causal inference frameworks:**
+- Bradford Hill criteria
+- Rothman's causal pies
+- Directed Acyclic Graphs (DAGs)
+
+**Strong observational evidence:**
+- Dose-response relationships
+- Temporality (exposure precedes outcome)
+- Biological plausibility
+- Specificity
+- Consistency across populations
+- Large effects unlikely due to confounding
+
+### Social Sciences
+**Challenges:**
+- Complex interventions
+- Context-dependent effects
+- Measurement challenges
+- Ethical constraints on RCTs
+
+**Strengthening evidence:**
+- Mixed methods
+- Natural experiments
+- Instrumental variables
+- Regression discontinuity designs
+- Multiple operationalizations
+
+## Synthesizing Evidence Across Studies
+
+### Consistency
+**Strong evidence:**
+- Multiple studies, different investigators
+- Different populations and settings
+- Different research designs converge
+- Different measurement methods
+
+**Weak evidence:**
+- Single study
+- Only one research group
+- Conflicting results
+- Publication bias evident
+
+### Biological/Theoretical Plausibility
+**Strengthens evidence:**
+- Known mechanism
+- Consistent with other knowledge
+- Dose-response relationship
+- Coherent with animal/in vitro data
+
+**Weakens evidence:**
+- No plausible mechanism
+- Contradicts established knowledge
+- Biological implausibility
+
+### Temporality
+**Essential for causation:**
+- Cause must precede effect
+- Cross-sectional studies cannot establish temporal order
+- Reverse causation must be ruled out
+
+### Specificity
+**Moderate indicator:**
+- Specific cause → specific effect strengthens causation
+- But lack of specificity doesn't rule out causation
+- Most causes have multiple effects
+
+### Strength of Association
+**Strong evidence:**
+- Large effects unlikely to be due to confounding
+- Dose-response relationships
+- All-or-none effects
+
+**Caution:**
+- Small effects may still be real
+- Large effects can still be confounded
+
+## Red Flags in Evidence Quality
+
+### Study Design Red Flags
+- No control group
+- Self-selected participants
+- No randomization even though it was feasible
+- No blinding even though it was feasible
+- Very small sample
+- Inappropriate statistical tests
+
+### Reporting Red Flags
+- Selective outcome reporting
+- No study registration/protocol
+- Missing methodological details
+- No conflicts of interest statement
+- Cherry-picked citations
+- Results don't match methods
+
+### Interpretation Red Flags
+- Causal language from correlational data
+- Claiming "proof"
+- Ignoring limitations
+- Overgeneralizing
+- Spinning negative results
+- Post hoc rationalization
+
+### Context Red Flags
+- Industry funding without independence
+- Single study in isolation
+- Contradicts preponderance of evidence
+- No replication
+- Published in predatory journal
+- Press release before peer review
+
+## Practical Decision Framework
+
+### When Evaluating Evidence, Ask:
+
+1. **What type of study is this?** (Design)
+2. **How well was it conducted?** (Quality)
+3. **What does it actually show?** (Results)
+4. **How likely is bias?** (Internal validity)
+5. **Does it apply to my question?** (External validity)
+6. **How does it fit with other evidence?** (Context)
+7. **Are the conclusions justified?** (Interpretation)
+8. **What are the limitations?** (Uncertainty)
+
+### Making Decisions with Imperfect Evidence
+
+**High-quality evidence:**
+- Strong confidence in acting on findings
+- Reasonable to change practice/policy
+
+**Moderate-quality evidence:**
+- Provisional conclusions
+- Consider in conjunction with other factors
+- May warrant action depending on stakes
+
+**Low-quality evidence:**
+- Weak confidence
+- Hypothesis-generating
+- Insufficient for major decisions alone
+- Consider cost/benefit of waiting for better evidence
+
+**Very low-quality evidence:**
+- Very uncertain
+- Should not drive decisions alone
+- Useful for identifying gaps and research needs
+
+### When Evidence is Conflicting
+
+**Strategies:**
+1. Weight by study quality
+2. Look for systematic differences (population, methods)
+3. Consider publication bias
+4. Update with most recent, rigorous evidence
+5. Conduct/await systematic review
+6. Consider if question is well-formed
+
+## Communicating Evidence Strength
+
+**Avoid:**
+- Absolute certainty ("proves")
+- False balance (equal weight to unequal evidence)
+- Ignoring uncertainty
+- Cherry-picking studies
+
+**Better:**
+- Quantify uncertainty
+- Describe strength of evidence
+- Acknowledge limitations
+- Present range of evidence
+- Distinguish established from emerging findings
+- Be clear about what is/isn't known