gh-k-dense-ai-claude-scient…/skills/clinical-decision-support/references/outcome_analysis.md
2025-11-30 08:30:18 +08:00

Outcome Analysis and Statistical Methods Guide

Overview

Rigorous outcome analysis is essential for clinical decision support documents. This guide covers survival analysis, response assessment, statistical testing, and data visualization for patient cohort analyses and treatment evaluation.

Survival Analysis

Kaplan-Meier Method

Overview

  • Non-parametric estimator of survival function from time-to-event data
  • Handles censored observations (patients alive at last follow-up)
  • Provides survival probability at each time point
  • Generates characteristic step-function survival curves

Key Concepts

Censoring

  • Right censoring: Most common - patient alive at last follow-up or study end
  • Left censoring: Rare in clinical studies
  • Interval censoring: Event occurred between two assessment times
  • Informative vs non-informative: Censoring should be independent of outcome

Survival Function S(t)

  • S(t) = Probability of surviving beyond time t
  • S(0) = 1.0 (100% alive at time zero)
  • S(t) decreases as time increases
  • Step decreases at each event time

Median Survival

  • Earliest time point at which S(t) falls to 0.50 or below
  • Half of the patients are estimated to have had the event by this time
  • Reported with 95% confidence interval
  • "Not reached (NR)" if the K-M curve never falls to 0.50 during follow-up

Survival Rates at Fixed Time Points

  • 1-year survival rate, 2-year survival rate, 5-year survival rate
  • Read from K-M curve at specific time point
  • Report with 95% CI: S(t) ± 1.96 × SE

Calculation Example

Time  Events  At Risk  Survival Probability
0     0       100      1.000
3     2       100      0.980 (98/100)
5     1       95       0.970 (0.980 × 94/95)
8     3       87       0.936 (0.970 × 84/87)
...
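
The table follows the product-limit recursion: at each event time, the previous survival estimate is multiplied by (at risk − events) / at risk. A minimal sketch of that recursion, assuming the three event rows of the table as input (the function name `kaplan_meier` and the row layout are illustrative, not part of any library):

```python
def kaplan_meier(rows):
    """Product-limit estimate from (time, events, at_risk) rows sorted by time."""
    survival = 1.0
    curve = []
    for time, events, at_risk in rows:
        # Conditional probability of surviving this event time, given at risk.
        survival *= (at_risk - events) / at_risk
        curve.append((time, survival))
    return curve


# Event rows taken from the calculation example (time, events, at risk).
rows = [(3, 2, 100), (5, 1, 95), (8, 3, 87)]
km_curve = kaplan_meier(rows)
for time, s in km_curve:
    print(f"t={time}: S(t)={s:.3f}")
```

The number at risk drops between event times by more than the number of events because censored patients leave the risk set without changing S(t).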

Log-Rank Test

Purpose: Compare survival curves between two or more groups

Null Hypothesis: No difference in survival distributions between groups

Test Statistic

  • Compares observed vs expected events in each group at each time point
  • Weights all time points equally
  • Follows chi-square distribution with df = k-1 (k groups)

Reporting

  • Chi-square statistic, degrees of freedom, p-value
  • Example: χ² = 6.82, df = 1, p = 0.009
  • Interpretation: Significant difference in survival curves
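
The observed-versus-expected comparison can be sketched for two groups in plain Python. This is an illustrative implementation under stated assumptions (two groups, right censoring only, hypergeometric variance); the function names and the toy follow-up data are invented for the example, and a validated package such as R `survival` or Python `lifelines` should be used for real analyses.

```python
import math


def chi2_sf_df1(chi2):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


def logrank_two_groups(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; events are 1 (event) or 0 (censored)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)  # at risk, group A
        n_b = sum(1 for ti, _, g in data if ti >= t and g == 1)  # at risk, group B
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        d_b = sum(1 for ti, e, g in data if ti == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n  # observed minus expected in group A
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / variance
    return chi2, chi2_sf_df1(chi2)


# Toy follow-up times in months (hypothetical data for illustration only).
times_a, events_a = [6, 7, 10, 15, 19, 25], [1, 0, 1, 1, 0, 1]
times_b, events_b = [1, 3, 4, 8, 9, 12], [1, 1, 1, 1, 0, 1]
chi2, p_value = logrank_two_groups(times_a, events_a, times_b, events_b)
print(f"chi-square = {chi2:.2f}, df = 1, p = {p_value:.3f}")
```

The helper `chi2_sf_df1` reproduces the reporting example: a statistic of 6.82 with df = 1 corresponds to p ≈ 0.009.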

Assumptions

  • Censoring is non-informative and independent
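  • Most powerful under proportional hazards (constant HR over time); the test remains valid otherwise but loses power, especially when curves cross
  • If non-proportional, consider weighted tests or time-varying effects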

Alternatives for Non-Proportional Hazards

  • Gehan-Breslow test: Weights early events more heavily
  • Peto-Peto test: Modifies Gehan-Breslow weighting
  • Restricted mean survival time (RMST): Difference in area under K-M curve

Cox Proportional Hazards Regression

Purpose: Multivariable survival analysis, estimate hazard ratios adjusting for covariates

Model: h(t|X) = h₀(t) × exp(β₁X₁ + β₂X₂ + ... + βₚXₚ)

  • h(t|X): Hazard rate for individual with covariates X
  • h₀(t): Baseline hazard function (unspecified)
  • exp(β): Hazard ratio for one-unit change in covariate

Hazard Ratio Interpretation

  • HR = 1.0: No effect
  • HR > 1.0: Increased risk (harmful)
  • HR < 1.0: Decreased risk (beneficial)
  • HR = 0.50: 50% reduction in hazard (risk of event)

Example Output

Variable              HR      95% CI         p-value
Treatment (B vs A)    0.62    0.43-0.89      0.010
Age (per 10 years)    1.15    1.02-1.30      0.021
ECOG PS (2 vs 0-1)    1.85    1.21-2.83      0.004
Biomarker+ (vs -)     0.71    0.48-1.05      0.089

Proportional Hazards Assumption

  • Hazard ratio constant over time
  • Test: Schoenfeld residuals, log-minus-log plots
  • Violation: Time-varying effects, consider stratification or time-dependent covariates

Multivariable vs Univariable

  • Univariable: One covariate at a time, unadjusted HRs
  • Multivariable: Multiple covariates simultaneously, adjusted HRs
  • Report both: Univariable for all variables, multivariable for final model

Model Selection

  • Forward selection: Start with empty model, add significant variables
  • Backward elimination: Start with all variables, remove non-significant
  • Clinical judgment: Include known prognostic factors regardless of p-value
  • Parsimony: Avoid overfitting, rule of thumb 1 variable per 10-15 events

Response Assessment

RECIST v1.1 (Response Evaluation Criteria in Solid Tumors)

Target Lesions

  • Select up to 5 lesions total (maximum 2 per organ)
  • Measurable: ≥10 mm longest diameter (≥15 mm for lymph nodes short axis)
  • Baseline sum of diameters (SLD): longest diameter for non-nodal lesions, short axis for lymph nodes

Response Categories

Complete Response (CR)

  • Disappearance of all target and non-target lesions
  • Lymph nodes must regress to <10 mm short axis
  • Confirmation required at ≥4 weeks

Partial Response (PR)

  • ≥30% decrease in SLD from baseline
  • No new lesions or unequivocal progression of non-target lesions
  • Confirmation required at ≥4 weeks

Stable Disease (SD)

  • Neither PR nor PD criteria met
  • Minimum duration typically 6-8 weeks from baseline

Progressive Disease (PD)

  • ≥20% increase in SLD AND ≥5 mm absolute increase from smallest SLD (nadir)
  • OR appearance of new lesions
  • OR unequivocal progression of non-target lesions

Example Calculation

Baseline SLD: 80 mm (4 target lesions)
Week 6 SLD: 52 mm

Percent change: (52 - 80)/80 × 100% = -35%
Classification: Partial Response (≥30% decrease)

Week 12 SLD: 48 mm (nadir)
Week 18 SLD: 62 mm

Percent change from nadir: (62 - 48)/48 × 100% = +29%
Absolute change: 62 - 48 = 14 mm
Classification: Progressive Disease (≥20% AND ≥5 mm increase from nadir)
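
The target-lesion arithmetic in this example can be expressed as a small function. This is a simplified sketch, assuming target lesions only: it ignores new lesions, non-target lesions, lymph-node short-axis rules for CR, and confirmation scans, all of which affect the overall RECIST v1.1 response. The function name and argument names are hypothetical.

```python
def classify_target_response(baseline_sld, nadir_sld, current_sld):
    """Classify target-lesion response from sums of diameters in mm.

    nadir_sld is the smallest SLD recorded before the current assessment
    (the baseline value if no smaller sum has been seen yet).
    """
    if current_sld == 0:
        return "CR"
    increase = current_sld - nadir_sld
    if nadir_sld > 0:
        relative_increase = increase / nadir_sld
    else:
        relative_increase = float("inf")  # any regrowth from zero
    # PD needs both a >=20% relative and a >=5 mm absolute increase from nadir.
    if increase >= 5 and relative_increase >= 0.20:
        return "PD"
    if (current_sld - baseline_sld) / baseline_sld <= -0.30:
        return "PR"
    return "SD"


week6 = classify_target_response(baseline_sld=80, nadir_sld=80, current_sld=52)
week18 = classify_target_response(baseline_sld=80, nadir_sld=48, current_sld=62)
print(week6, week18)
```

Progression is checked before partial response because a patient can still be more than 30% below baseline while having grown 20% from nadir.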

iRECIST (Immune RECIST)

Purpose: Account for atypical response patterns with immunotherapy

Modifications from RECIST v1.1

iUPD (Immune Unconfirmed Progressive Disease)

  • Initial increase in tumor burden or new lesions
  • Requires confirmation at next assessment (≥4 weeks later)
  • Continue treatment if clinically stable

iCPD (Immune Confirmed Progressive Disease)

  • Confirmed progression at repeat imaging
  • Discontinue immunotherapy

Pseudoprogression

  • Initial apparent progression followed by response
  • Mechanism: Immune cell infiltration increases tumor size
  • Incidence: 5-10% of patients on immunotherapy
  • Management: Continue treatment if patient clinically stable

New Lesions

  • Record size and location but continue treatment
  • Do not automatically classify as PD
  • Confirm progression if new lesions grow or additional new lesions appear

Other Response Criteria

Lugano Classification (Lymphoma)

  • PET-based: Deauville 5-point scale
    • Score 1-3: Negative (metabolic CR)
    • Score 4-5: Positive (residual disease)
  • CT-based: If PET not available
  • Bone marrow: Required for staging in some lymphomas

RANO (Response Assessment in Neuro-Oncology)

  • Glioblastoma-specific: Accounts for pseudoprogression with radiation/temozolomide
  • Enhancing disease: Bidimensional measurements (product of perpendicular diameters)
  • Non-enhancing disease: FLAIR changes assessed separately
  • Corticosteroid dose: Must document, increase may indicate progression

mRECIST (Modified RECIST for HCC)

  • Viable tumor: Enhancing portion only (arterial phase enhancement)
  • Necrosis: Non-enhancing areas excluded from measurements
  • Application: Hepatocellular carcinoma with arterial enhancement

Outcome Metrics

Efficacy Endpoints

Overall Survival (OS)

  • Definition: Time from randomization/treatment start to death from any cause
  • Advantages: Objective, not subject to assessment bias, regulatory gold standard
  • Disadvantages: Requires long follow-up, affected by subsequent therapies
  • Censoring: Last known alive date
  • Analysis: Kaplan-Meier, log-rank test, Cox regression

Progression-Free Survival (PFS)

  • Definition: Time from randomization to progression (RECIST) or death
  • Advantages: Earlier readout than OS, direct treatment effect
  • Disadvantages: Requires regular imaging, subject to assessment timing
  • Censoring: Last tumor assessment without progression
  • Sensitivity Analysis: Assess impact of censoring assumptions

Objective Response Rate (ORR)

  • Definition: Proportion of patients achieving CR or PR (best response)
  • Denominator: Evaluable patients (baseline measurable disease)
  • Reporting: Percentage with 95% CI (exact binomial method)
  • Duration: Time from first response to progression (DOR)
  • Advantage: Binary endpoint, no censoring complications

Disease Control Rate (DCR)

  • Definition: CR + PR + SD (stable disease ≥6-8 weeks)
  • Less Stringent: Captures clinical benefit beyond objective response
  • Reporting: Percentage with 95% CI

Duration of Response (DOR)

  • Definition: Time from first CR or PR to progression (among responders only)
  • Population: Subset analysis of responders
  • Analysis: Kaplan-Meier among responders
  • Reporting: Median DOR with 95% CI

Time to Treatment Failure (TTF)

  • Definition: Time from start to discontinuation for any reason (progression, toxicity, death, patient choice)
  • Advantage: Reflects real-world treatment duration
  • Components: PFS + toxicity-related discontinuations

Safety Endpoints

Adverse Events (CTCAE v5.0)

Grading

  • Grade 1: Mild, asymptomatic or mild symptoms, clinical intervention not indicated
  • Grade 2: Moderate; minimal, local, or noninvasive intervention indicated; limiting age-appropriate instrumental ADL
  • Grade 3: Severe or medically significant but not immediately life-threatening; hospitalization or prolongation of hospitalization indicated; disabling; limiting self-care ADL
  • Grade 4: Life-threatening consequences, urgent intervention indicated
  • Grade 5: Death related to adverse event

Reporting Standards

Adverse Event Summary Table:

AE Term (MedDRA)        Any Grade, n (%)      Grade 3-4, n (%)      Grade 5, n (%)
                        Trt A      Trt B      Trt A      Trt B      Trt A      Trt B
─────────────────────────────────────────────────────────────────────────────────────
Hematologic
  Anemia                45 (90%)   42 (84%)   8 (16%)    6 (12%)    0          0
  Neutropenia           35 (70%)   38 (76%)   15 (30%)   18 (36%)   0          0
  Thrombocytopenia      28 (56%)   25 (50%)   6 (12%)    4 (8%)     0          0
  Febrile neutropenia   4 (8%)     6 (12%)    4 (8%)     6 (12%)    0          0

Gastrointestinal
  Nausea                42 (84%)   40 (80%)   2 (4%)     1 (2%)     0          0
  Diarrhea              31 (62%)   28 (56%)   5 (10%)    3 (6%)     0          0
  Mucositis             18 (36%)   15 (30%)   3 (6%)     2 (4%)     0          0

Any AE                  50 (100%)  50 (100%)  38 (76%)   35 (70%)   1 (2%)     0

Serious Adverse Events (SAEs)

  • SAE incidence and type
  • Relationship to treatment (related vs unrelated)
  • Outcome (resolved, ongoing, fatal)
  • Causality assessment (definite, probable, possible, unlikely, unrelated)

Treatment Modifications

  • Dose reductions: n (%), reason
  • Dose delays: n (%), duration
  • Discontinuations: n (%), reason (toxicity vs progression vs other)
  • Relative dose intensity: (delivered dose intensity / planned dose intensity) × 100%, where dose intensity is dose per unit time

Statistical Analysis Methods

Comparing Continuous Outcomes

Independent Samples t-test

  • Application: Compare means between two independent groups (normally distributed)
  • Assumptions: Normal distribution, equal variances (or use Welch's t-test)
  • Reporting: Mean ± SD for each group, mean difference (95% CI), t-statistic, df, p-value
  • Example: Mean age 62.3 ± 8.4 vs 58.7 ± 9.1 years, difference 3.6 years (95% CI 0.2-7.0, p=0.038)

Mann-Whitney U Test (Wilcoxon Rank-Sum)

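  • Application: Compare two independent groups when data are non-normal or ordinal; tests whether values in one group tend to be larger than in the other
  • Non-parametric: No normality assumption; interpretable as a difference in medians only when the two distributions have similar shape
  • Reporting: Median [IQR] for each group, median difference, U-statistic, p-value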
  • Example: Median time to response 6.2 [4.1-8.3] vs 8.5 [5.9-11.2] weeks, p=0.042

ANOVA (Analysis of Variance)

  • Application: Compare means across three or more groups
  • Output: F-statistic with numerator and denominator df, p-value (overall test)
  • Post-hoc: If significant, pairwise comparisons with Tukey or Bonferroni correction
  • Example: Treatment effect varied by biomarker subgroup (F=4.32, numerator df=2, p=0.016)

Comparing Categorical Outcomes

Chi-Square Test for Independence

  • Application: Compare proportions between two or more groups
  • Assumptions: Expected count ≥5 in at least 80% of cells
  • Reporting: n (%) for each cell, χ², df, p-value
  • Example: ORR 45% vs 30%, χ²=6.21, df=1, p=0.013

Fisher's Exact Test

  • Application: 2×2 tables when expected count <5
  • Exact p-value: No large-sample approximation
  • Two-sided vs one-sided: Typically report two-sided
  • Example: SAE rate 3/20 (15%) vs 8/22 (36%), Fisher's exact two-sided p=0.17
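
A two-sided Fisher's exact p-value can be computed directly from the hypergeometric distribution. The sketch below uses only the standard library and the common convention of summing the probabilities of all tables no more likely than the observed one; the function name is illustrative, and `scipy.stats.fisher_exact` is the usual choice in practice.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k events in row 1, margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / total_tables

    p_observed = prob(a)
    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables at least as extreme (no more probable) as observed.
    return sum(
        prob(k) for k in range(k_min, k_max + 1)
        if prob(k) <= p_observed * (1 + 1e-9)
    )


# SAE example: 3 of 20 patients vs 8 of 22 patients with an SAE.
p_value = fisher_exact_two_sided(3, 17, 8, 14)
print(f"Fisher's exact two-sided p = {p_value:.3f}")
```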

McNemar's Test

  • Application: Paired categorical data (before/after, matched pairs)
  • Example: Response before vs after treatment switch in same patients

Sample Size and Power

Power Analysis Components

  • Alpha (α): Type I error rate, typically 0.05 (two-sided)
  • Beta (β): Type II error rate, typically 0.10 or 0.20
  • Power: 1 - β, typically 0.80 or 0.90 (80-90% power)
  • Effect size: Expected difference (HR, mean difference, proportion difference)
  • Sample size: Number of patients or events needed

Survival Study Sample Size

  • Events-driven: Need sufficient events (deaths, progressions)
  • Rule of thumb: 80% power requires approximately 247 events for HR=0.70 (α=0.05, two-sided, 1:1 allocation, Schoenfeld approximation)
  • Accrual time + follow-up time determines calendar time
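
The number of events for a log-rank comparison is often approximated with the Schoenfeld formula, d = (z₁₋α/₂ + z₁₋β)² / (p(1 − p) · ln(HR)²), where p is the proportion allocated to one arm. A minimal sketch under that assumption (the function name and defaults are illustrative):

```python
from math import ceil, log
from statistics import NormalDist


def required_events(hazard_ratio, alpha=0.05, power=0.80, allocation=0.5):
    """Approximate number of events needed (Schoenfeld formula, two-sided alpha)."""
    normal = NormalDist()
    z_alpha = normal.inv_cdf(1 - alpha / 2)
    z_beta = normal.inv_cdf(power)
    denominator = allocation * (1 - allocation) * log(hazard_ratio) ** 2
    return ceil((z_alpha + z_beta) ** 2 / denominator)


events_80 = required_events(0.70, power=0.80)
events_90 = required_events(0.70, power=0.90)
print(events_80, events_90)
```

The required number of patients is then obtained by dividing the event count by the expected overall event probability during the accrual and follow-up period.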

Response Rate Study

Example: Detect ORR difference 45% vs 30% (15 percentage points)
- α = 0.05 (two-sided)
- Power = 0.80
- Sample size: n = 163 per group (326 total; normal approximation without continuity correction)
- With 10% dropout: n = 182 per group (364 total)
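
The per-group sample size for comparing two proportions can be checked with the standard normal-approximation formula (pooled variance under the null, unpooled under the alternative, no continuity correction). This is a sketch under those assumptions; the function names are illustrative, and a continuity-corrected or exact method gives somewhat larger numbers.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Patients per group to detect p1 vs p2 with a two-sided test."""
    normal = NormalDist()
    z_alpha = normal.inv_cdf(1 - alpha / 2)
    z_beta = normal.inv_cdf(power)
    p_bar = (p1 + p2) / 2
    null_term = z_alpha * sqrt(2 * p_bar * (1 - p_bar))
    alt_term = z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return ceil((null_term + alt_term) ** 2 / (p1 - p2) ** 2)


def inflate_for_dropout(n, dropout_rate):
    """Enlarge n so that the expected number of evaluable patients is still n."""
    return ceil(n / (1 - dropout_rate))


n_evaluable = n_per_group(0.45, 0.30)
n_enrolled = inflate_for_dropout(n_evaluable, 0.10)
print(n_evaluable, n_enrolled)
```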

Data Visualization

Survival Curves

Kaplan-Meier Plot Best Practices

# Key elements for publication-quality survival curve
1. X-axis: Time (months or years), starts at 0
2. Y-axis: Survival probability (0 to 1.0 or 0% to 100%)
3. Step function: Survival curve with steps at event times
4. 95% CI bands: Shaded region around survival curve (optional but recommended)
5. Number at risk table: Below x-axis showing n at risk at time intervals
6. Censoring marks: Vertical tick marks (|) at censored observations
7. Legend: Clearly identify each curve
8. Log-rank p-value: Prominently displayed
9. Median survival: Horizontal line at 0.50, labeled
10. Follow-up: Median follow-up time reported

Number at Risk Table Format

Number at risk
Group A   50    42    35    28    18    10     5
Group B   48    38    29    19    12     6     2
Time      0     6     12    18    24    30    36 (months)

Hazard Ratio Annotation

On plot: HR 0.62 (95% CI 0.43-0.89), p=0.010
Or in caption: Log-rank test p=0.010; Cox model HR=0.62 (95% CI 0.43-0.89)

Waterfall Plots

Purpose: Visualize individual patient responses to treatment

Construction

  • X-axis: Individual patients (anonymized patient IDs)
  • Y-axis: Best % change from baseline tumor burden
  • Bars: Vertical bars, one per patient
    • Positive values: Tumor growth
    • Negative values: Tumor shrinkage
  • Ordering: Sorted from best response (left) to worst (right)
  • Color coding:
    • Green/blue: CR or PR (≥30% decrease)
    • Yellow: SD (-30% to +20%)
    • Red: PD (≥20% increase)
  • Reference lines: Horizontal lines at +20% (PD), -30% (PR)
  • Annotations: Biomarker status, response duration (symbols)

Example Annotations

■ = Biomarker-positive
○ = Biomarker-negative
* = Ongoing response
† = Progressed
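
The ordering and color-coding rules can be prepared before any plotting call. The sketch below sorts patients from deepest shrinkage to largest growth and assigns a color by the −30% and +20% thresholds; it assumes classification by target-lesion change only (a patient with new lesions would be PD regardless of the bar height), and the patient IDs and values are hypothetical.

```python
def waterfall_bars(best_pct_change):
    """Return (patient_id, pct_change, color) tuples ordered best to worst."""

    def color_for(change):
        if change <= -30:
            return "green"   # CR or PR range
        if change >= 20:
            return "red"     # PD range
        return "yellow"      # SD range

    ordered = sorted(best_pct_change.items(), key=lambda item: item[1])
    return [(pid, change, color_for(change)) for pid, change in ordered]


# Hypothetical best percent changes from baseline, for illustration only.
example = {"Pt-001": -45.0, "Pt-002": 12.5, "Pt-003": 27.0, "Pt-004": -100.0}
bars = waterfall_bars(example)
for patient_id, change, color in bars:
    print(f"{patient_id}: {change:+.1f}% ({color})")
```

The resulting list can be passed to a bar-plotting function such as matplotlib's `bar`, with the colors supplied per bar and horizontal reference lines drawn at +20% and −30%.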

Forest Plots

Purpose: Display subgroup analyses with hazard ratios and confidence intervals

Construction

  • Y-axis: Subgroup categories
  • X-axis: Hazard ratio (log scale), vertical line at HR=1.0
  • Points: HR estimate for each subgroup
  • Horizontal lines: 95% confidence interval
  • Square size: Proportional to sample size or precision
  • Overall effect: Diamond at bottom, width represents 95% CI

Subgroups to Display

Subgroup                    n     HR (95% CI)          Favors A  Favors B
──────────────────────────────────────────────────────────────────────────
Overall                     300   0.65 (0.48-0.88)         ●────┤
Age
  <65 years                 180   0.58 (0.39-0.86)        ●────┤
  ≥65 years                 120   0.78 (0.49-1.24)          ●──────┤
Sex
  Male                      175   0.62 (0.43-0.90)        ●────┤
  Female                    125   0.70 (0.44-1.12)         ●─────┤
Biomarker Status
  Positive                  140   0.45 (0.28-0.72)      ●───┤
  Negative                  160   0.89 (0.59-1.34)           ●──────┤
                                  p-interaction=0.041

                                  0.25  0.5   1.0   2.0
                                        Hazard Ratio

Interaction Testing

  • Test whether treatment effect differs across subgroups
  • p-interaction <0.05 suggests heterogeneity
  • Pre-specify subgroups to avoid data mining

Spider Plots

Purpose: Display longitudinal tumor burden changes over time for individual patients

Construction

  • X-axis: Time from treatment start (weeks or months)
  • Y-axis: % change from baseline tumor burden
  • Lines: One line per patient connecting assessments
  • Color coding: By response category or biomarker status
  • Reference lines: 0% (no change), +20% (PD threshold), -30% (PR threshold)

Clinical Insights

  • Identify delayed responders (initial SD then PR)
  • Detect early progression (rapid upward trajectory)
  • Assess depth of response (maximum tumor shrinkage)
  • Duration visualization (when lines cross PD threshold)

Swimmer Plots

Purpose: Display treatment duration and response for individual patients

Construction

  • X-axis: Time from treatment start (weeks or months)
  • Y-axis: Individual patients (one row per patient)
  • Bars: Horizontal bars representing treatment duration
  • Symbols:
    • ● Start of treatment
    • ▼ Ongoing treatment (arrow)
    • ■ Progressive disease (end of bar)
    • ◆ Death
    • | Dose modification
  • Color: Response status (CR=green, PR=blue, SD=yellow, PD=red)

Example

Patient ID    |0   3   6   9   12  15  18  21  24 months
──────────────|──────────────────────────────────────────
Pt-001        ●═══PR═══════════|════════PR══════════▼
Pt-002        ●═══PR═══════════════PD■
Pt-003        ●══════SD══════════PD■
Pt-004        ●PR══════════════════════════════════PR▼
...

Confidence Intervals

Interpretation

95% Confidence Interval

  • Range of plausible values for true population parameter
  • If the study were repeated many times, about 95% of the resulting 95% CIs would contain the true value
  • Not: a 95% probability that the true value lies within this particular interval (frequentist, not Bayesian)

Relationship to p-value

  • If 95% CI excludes null value (HR=1.0, difference=0), p<0.05
  • If 95% CI includes null value, p≥0.05
  • CI provides more information: magnitude and precision of effect

Precision

  • Narrow CI: High precision, large sample size
  • Wide CI: Low precision, small sample size or high variability
  • Example: HR 0.65 (95% CI 0.62-0.68) very precise; HR 0.65 (0.30-1.40) imprecise

Calculation Methods

Hazard Ratio CI

  • From Cox regression output
  • Standard error of log(HR) → exp(log(HR) ± 1.96×SE)
  • Example: HR=0.62, SE(logHR)=0.185 → 95% CI (0.43, 0.89)
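
The back-transformation from the log scale can be verified with a few lines of Python. This sketch assumes a Wald-type interval and two-sided Wald p-value computed from the reported HR and SE(log HR); the function name is illustrative.

```python
from math import erfc, exp, log, sqrt


def hr_wald_summary(hr, se_log_hr, z_crit=1.96):
    """Return (lower, upper, p_value) for a hazard ratio from SE(log HR)."""
    log_hr = log(hr)
    lower = exp(log_hr - z_crit * se_log_hr)
    upper = exp(log_hr + z_crit * se_log_hr)
    z = log_hr / se_log_hr
    p_value = erfc(abs(z) / sqrt(2))  # two-sided normal tail probability
    return lower, upper, p_value


lower, upper, p_value = hr_wald_summary(0.62, 0.185)
print(f"HR 0.62 (95% CI {lower:.2f}-{upper:.2f}), p={p_value:.3f}")
# prints: HR 0.62 (95% CI 0.43-0.89), p=0.010
```

Because the interval is symmetric on the log scale, it is asymmetric around the HR itself, which is why forest plots use a log-scaled axis.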

Survival Rate CI (Greenwood Formula)

  • SE(S(t)) = S(t) × sqrt(Σ[d_i / (n_i × (n_i - d_i))])
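  • 95% CI: S(t) ± 1.96 × SE(S(t)); this plain interval can extend below 0 or above 1 when S(t) is near the boundary
  • A complementary log-log transformation keeps the interval within [0, 1] and behaves better in small samples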
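
Greenwood's formula accumulates d_i / (n_i × (n_i − d_i)) over event times alongside the Kaplan-Meier product. A minimal sketch, assuming rows of (time, events, at risk) and a plain (untransformed) interval truncated to [0, 1]; the function name is illustrative, and the final row must leave at least one patient at risk to avoid division by zero.

```python
from math import sqrt


def km_with_greenwood(rows, z_crit=1.96):
    """Kaplan-Meier estimate with Greenwood SE and plain 95% CI per event time."""
    survival = 1.0
    greenwood_sum = 0.0
    results = []
    for time, events, at_risk in rows:
        survival *= (at_risk - events) / at_risk
        greenwood_sum += events / (at_risk * (at_risk - events))
        se = survival * sqrt(greenwood_sum)
        lower = max(0.0, survival - z_crit * se)
        upper = min(1.0, survival + z_crit * se)
        results.append((time, survival, se, lower, upper))
    return results


# Illustrative rows: (time, events, at risk).
km_rows = km_with_greenwood([(3, 2, 100), (5, 1, 95), (8, 3, 87)])
for time, s, se, lower, upper in km_rows:
    print(f"t={time}: S={s:.3f}, SE={se:.4f}, 95% CI {lower:.3f}-{upper:.3f}")
```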

Proportion CI (Exact Binomial)

  • For ORR, DCR: Use exact method (Clopper-Pearson) for small samples
  • Wilson score interval: Better properties than normal approximation
  • Example: 12/30 responses → ORR 40% (95% CI 22.7-59.4%)
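
The Clopper-Pearson limits are the proportions at which the observed count sits exactly in the α/2 tail of the binomial distribution. The sketch below finds them by bisection using only the standard library; it is an illustration under that definition rather than an optimized routine (statistical packages usually obtain the same limits from beta-distribution quantiles), and the function names are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(k, n, alpha=0.05):
    """Exact (Clopper-Pearson) confidence interval for k responders out of n."""

    def solve(increasing_fn, target):
        # Bisection on [0, 1] for a function that increases with p.
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if increasing_fn(mid) < target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower limit: P(X >= k | p) = alpha / 2.
    lower = 0.0 if k == 0 else solve(lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2)
    # Upper limit: P(X <= k | p) = alpha / 2.
    upper = 1.0 if k == n else solve(lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2)
    return lower, upper


ci_lower, ci_upper = clopper_pearson(12, 30)
print(f"ORR 40% (95% CI {100 * ci_lower:.1f}-{100 * ci_upper:.1f}%)")
```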

Censoring and Missing Data

Types of Censoring

Right Censoring

  • End of study: Patient alive at study termination (administrative censoring)
  • Loss to follow-up: Patient stops attending visits
  • Withdrawal: Patient withdraws consent
  • Competing risk: Death from unrelated cause (in disease-specific survival)

Handling Censoring

  • Assumption: Non-informative - censoring independent of event probability
  • Sensitivity Analysis: Assess impact if assumption violated
    • Best case: All censored patients never progress
    • Worst case: All censored patients progress immediately after censoring
    • Actual result should fall between best/worst case

Missing Data

Mechanisms

  • MCAR (Missing Completely at Random): Missingness unrelated to any variable
  • MAR (Missing at Random): Missingness related to observed but not unobserved variables
  • NMAR (Not Missing at Random): Missingness related to the missing value itself

Handling Strategies

  • Complete case analysis: Exclude patients with missing data (biased if not MCAR)
  • Multiple imputation: Generate multiple plausible datasets, analyze each, pool results
  • Maximum likelihood: Estimate parameters using all available data
  • Sensitivity analysis: Assess robustness to missing data assumptions

Response Assessment Missing Data

  • Unevaluable for response: Baseline measurable disease but post-baseline assessment missing
    • Exclude from ORR denominator or count as non-responder (sensitivity analysis)
  • PFS censoring: Last adequate tumor assessment date if later assessments missing

Reporting Standards

CONSORT Statement (RCTs)

Flow Diagram

  • Assessed for eligibility (n=)
  • Randomized (n=)
  • Allocated to intervention (n=)
  • Lost to follow-up (n=, reasons)
  • Discontinued intervention (n=, reasons)
  • Analyzed (n=)

Baseline Table

  • Demographics and clinical characteristics
  • Baseline prognostic factors
  • Show balance between arms

Outcomes Table

  • Primary endpoint results with CI and p-value
  • Secondary endpoints
  • Safety summary

STROBE Statement (Observational Studies)

Study Design: Cohort, case-control, or cross-sectional

Participants: Eligibility, sources, selection methods, sample size

Variables: Clearly define outcomes, exposures, predictors, confounders

Statistical Methods: Describe all methods, handling of missing data, sensitivity analyses

Results: Participant flow, descriptive data, outcome data, main results, other analyses

Reproducible Research Practices

Statistical Analysis Plan (SAP)

  • Pre-specify all analyses before data lock
  • Primary and secondary endpoints
  • Analysis populations (ITT, per-protocol, safety)
  • Statistical tests and models
  • Subgroup analyses (pre-specified)
  • Interim analyses (if planned)
  • Multiple testing procedures

Transparency

  • Report all pre-specified analyses
  • Distinguish pre-specified from post-hoc exploratory
  • Report both positive and negative results
  • Provide access to anonymized individual patient data (when possible)

Software and Tools

R Packages for Survival Analysis

  • survival: Core package (Surv, survfit, coxph, survdiff)
  • survminer: Publication-ready Kaplan-Meier plots (ggsurvplot)
  • rms: Regression modeling strategies
  • flexsurv: Flexible parametric survival models

Python Libraries

  • lifelines: Kaplan-Meier, Cox regression, survival curves
  • scikit-survival: Machine learning for survival analysis
  • matplotlib: Custom survival curve plotting

Statistical Software

  • R: Most comprehensive for survival analysis
  • Stata: Medical statistics, good for epidemiology
  • SAS: Industry standard for clinical trials
  • GraphPad Prism: User-friendly for basic analyses
  • SPSS: Point-and-click interface, limited survival features