# Outcome Analysis and Statistical Methods Guide
## Overview
Rigorous outcome analysis is essential for clinical decision support documents. This guide covers survival analysis, response assessment, statistical testing, and data visualization for patient cohort analyses and treatment evaluation.
## Survival Analysis
### Kaplan-Meier Method
**Overview**
- Non-parametric estimator of survival function from time-to-event data
- Handles censored observations (patients alive at last follow-up)
- Provides survival probability at each time point
- Generates characteristic step-function survival curves
**Key Concepts**
**Censoring**
- **Right censoring**: Most common - patient alive at last follow-up or study end
- **Left censoring**: Rare in clinical studies
- **Interval censoring**: Event occurred between two assessment times
- **Informative vs non-informative**: Censoring should be independent of outcome
**Survival Function S(t)**
- S(t) = Probability of surviving beyond time t
- S(0) = 1.0 (100% alive at time zero)
- S(t) decreases as time increases
- Step decreases at each event time
**Median Survival**
- Time point where S(t) = 0.50
- 50% of patients alive, 50% have had event
- Reported with 95% confidence interval
- "Not reached (NR)" if the K-M curve has not dropped to 0.50 by the end of follow-up
**Survival Rates at Fixed Time Points**
- 1-year survival rate, 2-year survival rate, 5-year survival rate
- Read from K-M curve at specific time point
- Report with 95% CI: S(t) ± 1.96 × SE
**Calculation Example**
```
Time   Events   At Risk   Survival Probability
0      0        100       1.000
3      2        100       0.980  (98/100)
5      1        95        0.970  (98/100 × 94/95)
8      3        87        0.936  (98/100 × 94/95 × 84/87)
...
```
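The product-limit arithmetic in the table can be reproduced with a minimal sketch. The function name `kaplan_meier` and the `(time, events, at_risk)` row layout are illustrative assumptions, not part of any library; patients censored between event times are reflected only through the at-risk counts.

```python
from typing import List, Tuple

def kaplan_meier(rows: List[Tuple[float, int, int]]) -> List[Tuple[float, float]]:
    """Product-limit estimate from (time, events, at_risk) rows sorted by time."""
    survival = 1.0
    curve = []
    for time, events, at_risk in rows:
        if at_risk <= 0:
            raise ValueError("at_risk must be positive at every listed time")
        # Multiply by the conditional probability of surviving this time point.
        survival *= (at_risk - events) / at_risk
        curve.append((time, survival))
    return curve

# Rows copied from the calculation example: (time, events, number at risk).
table = [(0, 0, 100), (3, 2, 100), (5, 1, 95), (8, 3, 87)]
km_curve = kaplan_meier(table)
for time, s in km_curve:
    print(f"t={time}: S(t)={s:.3f}")
```

Each factor is (at risk - events) / at risk, so censored patients lower later denominators without producing a step in the curve.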
### Log-Rank Test
**Purpose**: Compare survival curves between two or more groups
**Null Hypothesis**: No difference in survival distributions between groups
**Test Statistic**
- Compares observed vs expected events in each group at each time point
- Weights all time points equally
- Follows chi-square distribution with df = k-1 (k groups)
**Reporting**
- Chi-square statistic, degrees of freedom, p-value
- Example: χ² = 6.82, df = 1, p = 0.009
- Interpretation: Significant difference in survival curves
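As a rough illustration of the observed-versus-expected comparison, the sketch below implements the unweighted two-group log-rank statistic with the standard library only. The function name and the example data are hypothetical; production analyses would normally use a validated package such as those listed under Software and Tools.

```python
import math

def logrank_two_groups(time_a, event_a, time_b, event_b):
    """Two-group log-rank test; returns (chi-square, p-value) with df = 1."""
    # Pool both groups as (time, event flag, group index) records.
    records = [(t, e, 0) for t, e in zip(time_a, event_a)]
    records += [(t, e, 1) for t, e in zip(time_b, event_b)]
    observed_a = expected_a = variance = 0.0
    for t in sorted({t for t, e, _ in records if e}):
        at_risk = [sum(1 for ti, _, g in records if g == grp and ti >= t) for grp in (0, 1)]
        deaths = [sum(1 for ti, e, g in records if g == grp and ti == t and e) for grp in (0, 1)]
        n, d = sum(at_risk), sum(deaths)
        observed_a += deaths[0]
        expected_a += d * at_risk[0] / n  # expected events in group A under the null
        if n > 1:
            # Hypergeometric variance of the group A event count at this time.
            variance += d * (at_risk[0] / n) * (at_risk[1] / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("no informative event times")
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2))

# Hypothetical follow-up times (months) and event flags (1 = event, 0 = censored).
chi2, p = logrank_two_groups([5, 8, 12, 14, 20, 24], [1, 1, 0, 1, 0, 0],
                             [2, 3, 4, 6, 7, 9], [1, 1, 1, 1, 1, 0])
print(f"chi2 = {chi2:.2f}, df = 1, p = {p:.3f}")
```

The same chi-square-to-p conversion applied to the reporting example (χ² = 6.82, df = 1) gives p ≈ 0.009.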
**Assumptions**
- Censoring is non-informative and independent
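- Most powerful under proportional hazards (constant HR over time); the test remains valid otherwise but loses power, especially when curves cross
- If hazards are non-proportional, consider weighted tests or time-varying effects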
**Alternatives for Non-Proportional Hazards**
- **Gehan-Breslow test**: Weights early events more heavily
- **Peto-Peto test**: Modifies Gehan-Breslow weighting
- **Restricted mean survival time (RMST)**: Difference in area under K-M curve
### Cox Proportional Hazards Regression
**Purpose**: Multivariable survival analysis, estimate hazard ratios adjusting for covariates
**Model**: h(t|X) = h₀(t) × exp(β₁X₁ + β₂X₂ + ... + βₚXₚ)
- h(t|X): Hazard rate for individual with covariates X
- h₀(t): Baseline hazard function (unspecified)
- exp(β): Hazard ratio for one-unit change in covariate
**Hazard Ratio Interpretation**
- HR = 1.0: No effect
- HR > 1.0: Increased risk (harmful)
- HR < 1.0: Decreased risk (beneficial)
- HR = 0.50: 50% reduction in the hazard (instantaneous event rate), which is not the same as a 50% reduction in cumulative risk
**Example Output**
```
Variable              HR     95% CI      p-value
Treatment (B vs A)    0.62   0.43-0.89   0.010
Age (per 10 years)    1.15   1.02-1.30   0.021
ECOG PS (2 vs 0-1)    1.85   1.21-2.83   0.004
Biomarker+ (vs -)     0.71   0.48-1.05   0.089
```
**Proportional Hazards Assumption**
- Hazard ratio constant over time
- Test: Schoenfeld residuals, log-minus-log plots
- Violation: Time-varying effects, consider stratification or time-dependent covariates
**Multivariable vs Univariable**
- **Univariable**: One covariate at a time, unadjusted HRs
- **Multivariable**: Multiple covariates simultaneously, adjusted HRs
- Report both: Univariable for all variables, multivariable for final model
**Model Selection**
- **Forward selection**: Start with empty model, add significant variables
- **Backward elimination**: Start with all variables, remove non-significant
- **Clinical judgment**: Include known prognostic factors regardless of p-value
- **Parsimony**: Avoid overfitting, rule of thumb 1 variable per 10-15 events
## Response Assessment
### RECIST v1.1 (Response Evaluation Criteria in Solid Tumors)
**Target Lesions**
- Select up to 5 lesions total (maximum 2 per organ)
- Measurable: ≥10 mm longest diameter (≥15 mm for lymph nodes short axis)
- Sum of longest diameters (SLD) at baseline
**Response Categories**
**Complete Response (CR)**
- Disappearance of all target and non-target lesions
- Lymph nodes must regress to <10 mm short axis
- Confirmation at ≥4 weeks required when response is the primary endpoint (typically non-randomized trials)
**Partial Response (PR)**
- ≥30% decrease in SLD from baseline
- No new lesions or unequivocal progression of non-target lesions
- Confirmation at ≥4 weeks required when response is the primary endpoint (typically non-randomized trials)
**Stable Disease (SD)**
- Neither PR nor PD criteria met
- Minimum duration typically 6-8 weeks from baseline
**Progressive Disease (PD)**
- ≥20% increase in SLD AND ≥5 mm absolute increase from smallest SLD (nadir)
- OR appearance of new lesions
- OR unequivocal progression of non-target lesions
**Example Calculation**
```
Baseline SLD: 80 mm (4 target lesions)
Week 6 SLD: 52 mm
Percent change: (52 - 80)/80 × 100% = -35%
Classification: Partial Response (≥30% decrease)
Week 12 SLD: 48 mm (nadir)
Week 18 SLD: 62 mm
Percent change from nadir: (62 - 48)/48 × 100% = +29%
Absolute change: 62 - 48 = 14 mm
Classification: Progressive Disease (≥20% AND ≥5 mm increase from nadir)
```
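The target-lesion thresholds above can be expressed as a small classifier. This is a simplified sketch: the function name and arguments are hypothetical, non-target lesion progression is not modeled, and CR is approximated as a sum of zero (the lymph node <10 mm rule and confirmation requirements are omitted).

```python
def classify_target_response(baseline_sld: float, nadir_sld: float,
                             current_sld: float, new_lesions: bool = False) -> str:
    """Simplified RECIST v1.1 target-lesion category from sums of diameters (mm).

    nadir_sld is the smallest sum recorded before the current assessment
    (the baseline value counts if it is the smallest).
    """
    if new_lesions:
        return "PD"
    growth = current_sld - nadir_sld
    if nadir_sld > 0:
        pct_from_nadir = growth / nadir_sld
    else:
        pct_from_nadir = float("inf") if growth > 0 else 0.0
    # PD needs both the relative (>=20%) and absolute (>=5 mm) increase from nadir.
    if pct_from_nadir >= 0.20 and growth >= 5:
        return "PD"
    if current_sld == 0:
        return "CR"
    if (current_sld - baseline_sld) / baseline_sld <= -0.30:
        return "PR"
    return "SD"

# Values from the example calculation.
week6 = classify_target_response(baseline_sld=80, nadir_sld=80, current_sld=52)
week18 = classify_target_response(baseline_sld=80, nadir_sld=48, current_sld=62)
print(week6, week18)
```

PD is checked before PR because progression is measured from the nadir, so a patient can still be more than 30% below baseline and yet have progressed.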
### iRECIST (Immune RECIST)
**Purpose**: Account for atypical response patterns with immunotherapy
**Modifications from RECIST v1.1**
**iUPD (Immune Unconfirmed Progressive Disease)**
- Initial increase in tumor burden or new lesions
- Requires confirmation at next assessment (≥4 weeks later)
- Continue treatment if clinically stable
**iCPD (Immune Confirmed Progressive Disease)**
- Confirmed progression at repeat imaging
- Discontinue immunotherapy
**Pseudoprogression**
- Initial apparent progression followed by response
- Mechanism: Immune cell infiltration increases tumor size
- Incidence: 5-10% of patients on immunotherapy
- Management: Continue treatment if patient clinically stable
**New Lesions**
- Record size and location but continue treatment
- Do not automatically classify as PD
- Confirm progression if new lesions grow or additional new lesions appear
### Other Response Criteria
**Lugano Classification (Lymphoma)**
- **PET-based**: Deauville 5-point scale
  - Score 1-3: Negative (metabolic CR)
  - Score 4-5: Positive (residual disease)
- **CT-based**: If PET not available
- **Bone marrow**: Required for staging in some lymphomas
**RANO (Response Assessment in Neuro-Oncology)**
- **Glioblastoma-specific**: Accounts for pseudoprogression with radiation/temozolomide
- **Enhancing disease**: Bidimensional measurements (product of perpendicular diameters)
- **Non-enhancing disease**: FLAIR changes assessed separately
- **Corticosteroid dose**: Must document, increase may indicate progression
**mRECIST (Modified RECIST for HCC)**
- **Viable tumor**: Enhancing portion only (arterial phase enhancement)
- **Necrosis**: Non-enhancing areas excluded from measurements
- **Application**: Hepatocellular carcinoma with arterial enhancement
## Outcome Metrics
### Efficacy Endpoints
**Overall Survival (OS)**
- **Definition**: Time from randomization/treatment start to death from any cause
- **Advantages**: Objective, not subject to assessment bias, regulatory gold standard
- **Disadvantages**: Requires long follow-up, affected by subsequent therapies
- **Censoring**: Last known alive date
- **Analysis**: Kaplan-Meier, log-rank test, Cox regression
**Progression-Free Survival (PFS)**
- **Definition**: Time from randomization to progression (RECIST) or death
- **Advantages**: Earlier readout than OS, direct treatment effect
- **Disadvantages**: Requires regular imaging, subject to assessment timing
- **Censoring**: Last tumor assessment without progression
- **Sensitivity Analysis**: Assess impact of censoring assumptions
**Objective Response Rate (ORR)**
- **Definition**: Proportion of patients achieving CR or PR (best response)
- **Denominator**: Evaluable patients (baseline measurable disease)
- **Reporting**: Percentage with 95% CI (exact binomial method)
- **Duration**: Time from first response to progression (DOR)
- **Advantage**: Binary endpoint, no censoring complications
**Disease Control Rate (DCR)**
- **Definition**: CR + PR + SD (stable disease ≥6-8 weeks)
- **Less Stringent**: Captures clinical benefit beyond objective response
- **Reporting**: Percentage with 95% CI
**Duration of Response (DOR)**
- **Definition**: Time from first CR or PR to progression (among responders only)
- **Population**: Subset analysis of responders
- **Analysis**: Kaplan-Meier among responders
- **Reporting**: Median DOR with 95% CI
**Time to Treatment Failure (TTF)**
- **Definition**: Time from start to discontinuation for any reason (progression, toxicity, death, patient choice)
- **Advantage**: Reflects real-world treatment duration
- **Components**: PFS + toxicity-related discontinuations
### Safety Endpoints
**Adverse Events (CTCAE v5.0)**
**Grading**
- **Grade 1**: Mild, asymptomatic or mild symptoms, clinical intervention not indicated
- **Grade 2**: Moderate; minimal, local, or noninvasive intervention indicated; limiting age-appropriate instrumental ADL
- **Grade 3**: Severe or medically significant but not immediately life-threatening; hospitalization or prolongation of hospitalization indicated; disabling; limiting self-care ADL
- **Grade 4**: Life-threatening consequences, urgent intervention indicated
- **Grade 5**: Death related to adverse event
**Reporting Standards**
```
Adverse Event Summary Table:
AE Term (MedDRA)        Any Grade, n (%)         Grade 3-4, n (%)        Grade 5, n (%)
                        Trt A       Trt B        Trt A       Trt B       Trt A     Trt B
──────────────────────────────────────────────────────────────────────────────────────────
Hematologic
  Anemia                45 (90%)    42 (84%)     8 (16%)     6 (12%)     0         0
  Neutropenia           35 (70%)    38 (76%)     15 (30%)    18 (36%)    0         0
  Thrombocytopenia      28 (56%)    25 (50%)     6 (12%)     4 (8%)      0         0
  Febrile neutropenia   4 (8%)      6 (12%)      4 (8%)      6 (12%)     0         0
Gastrointestinal
  Nausea                42 (84%)    40 (80%)     2 (4%)      1 (2%)      0         0
  Diarrhea              31 (62%)    28 (56%)     5 (10%)     3 (6%)      0         0
  Mucositis             18 (36%)    15 (30%)     3 (6%)      2 (4%)      0         0
Any AE                  50 (100%)   50 (100%)    38 (76%)    35 (70%)    1 (2%)    0
```
**Serious Adverse Events (SAEs)**
- SAE incidence and type
- Relationship to treatment (related vs unrelated)
- Outcome (resolved, ongoing, fatal)
- Causality assessment (definite, probable, possible, unlikely, unrelated)
**Treatment Modifications**
- Dose reductions: n (%), reason
- Dose delays: n (%), duration
- Discontinuations: n (%), reason (toxicity vs progression vs other)
- Relative dose intensity: (actual dose delivered / planned dose) × 100%
## Statistical Analysis Methods
### Comparing Continuous Outcomes
**Independent Samples t-test**
- **Application**: Compare means between two independent groups (normally distributed)
- **Assumptions**: Normal distribution, equal variances (or use Welch's t-test)
- **Reporting**: Mean ± SD for each group, mean difference (95% CI), t-statistic, df, p-value
- **Example**: Mean age 62.3 ± 8.4 vs 58.7 ± 9.1 years, difference 3.6 years (95% CI 0.2-7.0, p=0.038)
**Mann-Whitney U Test (Wilcoxon Rank-Sum)**
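- **Application**: Compare distributions (commonly summarized as medians) between two groups when data are non-normal
- **Non-parametric**: No normality assumption; interpreting the result as a difference in medians assumes similarly shaped distributions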
- **Reporting**: Median [IQR] for each group, median difference, U-statistic, p-value
- **Example**: Median time to response 6.2 [4.1-8.3] vs 8.5 [5.9-11.2] weeks, p=0.042
**ANOVA (Analysis of Variance)**
- **Application**: Compare means across three or more groups
- **Output**: F-statistic, p-value (overall test)
- **Post-hoc**: If significant, pairwise comparisons with Tukey or Bonferroni correction
- **Example**: Treatment effect varied by biomarker subgroup (F=4.32, df=2, p=0.016)
### Comparing Categorical Outcomes
**Chi-Square Test for Independence**
- **Application**: Compare proportions between two or more groups
- **Assumptions**: Expected count ≥5 in at least 80% of cells
- **Reporting**: n (%) for each cell, χ², df, p-value
- **Example**: ORR 45% vs 30%, χ²=6.21, df=1, p=0.013
**Fisher's Exact Test**
- **Application**: 2×2 tables when expected count <5
- **Exact p-value**: No large-sample approximation
- **Two-sided vs one-sided**: Typically report two-sided
- **Example**: SAE rate 3/20 (15%) vs 8/22 (36%), Fisher's exact two-sided p=0.17
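A minimal sketch of the two-sided exact test, built on the hypergeometric distribution with `math.comb`. The function name is hypothetical, and the two-sided rule used here (sum all tables no more probable than the observed one) is one common convention; other definitions of the two-sided p-value exist.

```python
from math import comb

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k: int) -> float:
        # Hypergeometric probability of k events in row 1 with margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    # Sum every table that is no more probable than the observed one.
    return sum(prob(k) for k in range(k_min, k_max + 1)
               if prob(k) <= p_observed * (1 + 1e-9))

# SAE example: 3 of 20 patients vs 8 of 22 patients with an event.
p_sae = fisher_exact_two_sided(3, 17, 8, 14)
print(f"Fisher's exact two-sided p = {p_sae:.3f}")
```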
**McNemar's Test**
- **Application**: Paired categorical data (before/after, matched pairs)
- **Example**: Response before vs after treatment switch in same patients
### Sample Size and Power
**Power Analysis Components**
- **Alpha (α)**: Type I error rate, typically 0.05 (two-sided)
- **Beta (β)**: Type II error rate, typically 0.10 or 0.20
- **Power**: 1 - β, typically 0.80 or 0.90 (80-90% power)
- **Effect size**: Expected difference (HR, mean difference, proportion difference)
- **Sample size**: Number of patients or events needed
**Survival Study Sample Size**
- Events-driven: Need sufficient events (deaths, progressions)
- Rule of thumb (Schoenfeld approximation, 1:1 allocation): 80% power requires approximately 247 events for HR=0.70 (α=0.05, two-sided)
- Accrual time + follow-up time determines calendar time
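The number of events needed for a log-rank comparison can be approximated with Schoenfeld's formula, d = (z₁₋α/₂ + z₁₋β)² / (p(1-p) × (ln HR)²), where p is the allocation fraction. The sketch below assumes that formula; the function name and defaults are illustrative.

```python
from math import ceil, log
from statistics import NormalDist

def required_events(hazard_ratio: float, alpha: float = 0.05,
                    power: float = 0.80, allocation: float = 0.5) -> int:
    """Schoenfeld approximation to the events needed for a two-sided log-rank test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    events = (z_alpha + z_beta) ** 2 / (
        allocation * (1 - allocation) * log(hazard_ratio) ** 2)
    return ceil(events)

events_hr_070 = required_events(0.70)
print(f"Events needed for HR=0.70: {events_hr_070}")
```

The result counts events, not patients; the number of patients to enroll then depends on the expected event rate, accrual time, and follow-up.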
**Response Rate Study**
```
Example: Detect ORR difference 45% vs 30% (15 percentage points)
- α = 0.05 (two-sided)
- Power = 0.80
- Sample size: n = 163 per group (326 total), normal approximation without continuity correction
- With 10% dropout: n = 182 per group (364 total)
```
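A sketch of the usual normal-approximation formula for comparing two proportions (pooled variance under the null, unpooled under the alternative, no continuity correction). The function names are hypothetical, and dedicated sample size software may give slightly different numbers depending on the method selected.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Patients per arm to detect p1 vs p2 with a two-sided test, 1:1 allocation."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)

def inflate_for_dropout(n: int, dropout: float) -> int:
    """Enrollment target so that n evaluable patients remain after dropout."""
    return ceil(n / (1 - dropout))

n_orr = n_per_group(0.45, 0.30)
n_enrolled = inflate_for_dropout(n_orr, 0.10)
print(f"n per group: {n_orr}; with 10% dropout: {n_enrolled}")
```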
## Data Visualization
### Survival Curves
**Kaplan-Meier Plot Best Practices**
Key elements for a publication-quality survival curve:
1. X-axis: Time (months or years), starts at 0
2. Y-axis: Survival probability (0 to 1.0 or 0% to 100%)
3. Step function: Survival curve with steps at event times
4. 95% CI bands: Shaded region around survival curve (optional but recommended)
5. Number at risk table: Below x-axis showing n at risk at time intervals
6. Censoring marks: Vertical tick marks (|) at censored observations
7. Legend: Clearly identify each curve
8. Log-rank p-value: Prominently displayed
9. Median survival: Horizontal line at 0.50, labeled
10. Follow-up: Median follow-up time reported
**Number at Risk Table Format**
```
Number at risk
Group A   50   42   35   28   18   10    5
Group B   48   38   29   19   12    6    2
Time       0    6   12   18   24   30   36  (months)
```
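The counts in a number-at-risk table can be derived directly from each patient's follow-up time. A minimal sketch, assuming a patient is at risk at time t when their follow-up (to event or censoring) is at least t; the function name and data are hypothetical.

```python
from typing import List, Sequence

def number_at_risk(follow_up: Sequence[float], grid: Sequence[float]) -> List[int]:
    """Count patients still under observation at each time point in the grid."""
    return [sum(1 for duration in follow_up if duration >= t) for t in grid]

# Hypothetical follow-up times in months (event or censoring, whichever came first).
group_a = [2, 7, 7, 13, 20, 36]
grid = [0, 6, 12, 18]
risk_row = number_at_risk(group_a, grid)
print("Group A", *risk_row)
```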
**Hazard Ratio Annotation**
```
On plot: HR 0.62 (95% CI 0.43-0.89), p=0.010
Or in caption: Log-rank test p=0.010; Cox model HR=0.62 (95% CI 0.43-0.89)
```
### Waterfall Plots
**Purpose**: Visualize individual patient responses to treatment
**Construction**
- **X-axis**: Individual patients (anonymized patient IDs)
- **Y-axis**: Best % change from baseline tumor burden
- **Bars**: Vertical bars, one per patient
  - Positive values: Tumor growth
  - Negative values: Tumor shrinkage
- **Ordering**: Sorted from best response (left) to worst (right)
- **Color coding**:
  - Green/blue: CR or PR (≥30% decrease)
  - Yellow: SD (-30% to +20%)
  - Red: PD (≥20% increase)
- **Reference lines**: Horizontal lines at +20% (PD), -30% (PR)
- **Annotations**: Biomarker status, response duration (symbols)
**Example Annotations**
```
■ = Biomarker-positive
○ = Biomarker-negative
* = Ongoing response
† = Progressed
```
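The ordering and colour rules can be prepared before plotting with a short helper. This sketch is plotting-library agnostic; the function name and patient values are hypothetical, and colouring by percentage change alone is a simplification because the formal response category also depends on new lesions and non-target disease.

```python
from typing import Dict, List, Tuple

def waterfall_bars(best_pct_change: Dict[str, float]) -> List[Tuple[str, float, str]]:
    """Order bars from best to worst response and attach a colour per threshold."""
    def colour(pct: float) -> str:
        if pct <= -30:
            return "green"   # CR or PR range
        if pct >= 20:
            return "red"     # PD range
        return "yellow"      # SD range

    # Ascending sort puts the deepest shrinkage on the left.
    ordered = sorted(best_pct_change.items(), key=lambda item: item[1])
    return [(patient, pct, colour(pct)) for patient, pct in ordered]

bars = waterfall_bars({"Pt-001": -45.0, "Pt-002": 25.0, "Pt-003": -10.0})
for patient, pct, bar_colour in bars:
    print(f"{patient}: {pct:+.0f}% ({bar_colour})")
```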
### Forest Plots
**Purpose**: Display subgroup analyses with hazard ratios and confidence intervals
**Construction**
- **Y-axis**: Subgroup categories
- **X-axis**: Hazard ratio (log scale), vertical line at HR=1.0
- **Points**: HR estimate for each subgroup
- **Horizontal lines**: 95% confidence interval
- **Square size**: Proportional to sample size or precision
- **Overall effect**: Diamond at bottom, width represents 95% CI
**Subgroups to Display**
```
Subgroup             n     HR (95% CI)          Favors A    Favors B
──────────────────────────────────────────────────────────────────────────
Overall              300   0.65 (0.48-0.88)     ●────┤
Age
  <65 years          180   0.58 (0.39-0.86)     ●────┤
  ≥65 years          120   0.78 (0.49-1.24)     ●──────┤
Sex
  Male               175   0.62 (0.43-0.90)     ●────┤
  Female             125   0.70 (0.44-1.12)     ●─────┤
Biomarker Status
  Positive           140   0.45 (0.28-0.72)     ●───┤
  Negative           160   0.89 (0.59-1.34)     ●──────┤
  p-interaction=0.041
                                                0.25   0.5   1.0   2.0
                                                     Hazard Ratio
```
**Interaction Testing**
- Test whether treatment effect differs across subgroups
- p-interaction <0.05 suggests heterogeneity
- Pre-specify subgroups to avoid data mining
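When only subgroup hazard ratios and confidence intervals are available, heterogeneity can be checked approximately by comparing the two log hazard ratios. The sketch below assumes independent subgroups and recovers each standard error from its 95% CI; the function names are hypothetical. Because it works from rounded summary values, it will not exactly reproduce a model-based interaction p-value such as the one shown in the forest plot.

```python
from math import erfc, log, sqrt
from typing import Tuple

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval

def se_from_ci(lower: float, upper: float) -> float:
    """Standard error of log(HR) recovered from a 95% confidence interval."""
    return (log(upper) - log(lower)) / (2 * Z_95)

def interaction_test(hr1: float, ci1: Tuple[float, float],
                     hr2: float, ci2: Tuple[float, float]) -> Tuple[float, float]:
    """z statistic and two-sided p-value for a difference between two log HRs."""
    se_diff = sqrt(se_from_ci(*ci1) ** 2 + se_from_ci(*ci2) ** 2)
    z = (log(hr1) - log(hr2)) / se_diff
    return z, erfc(abs(z) / sqrt(2))

# Biomarker-positive vs biomarker-negative rows from the forest plot.
z_int, p_int = interaction_test(0.45, (0.28, 0.72), 0.89, (0.59, 1.34))
print(f"z = {z_int:.2f}, approximate p-interaction = {p_int:.3f}")
```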
### Spider Plots
**Purpose**: Display longitudinal tumor burden changes over time for individual patients
**Construction**
- **X-axis**: Time from treatment start (weeks or months)
- **Y-axis**: % change from baseline tumor burden
- **Lines**: One line per patient connecting assessments
- **Color coding**: By response category or biomarker status
- **Reference lines**: 0% (no change), +20% (PD threshold), -30% (PR threshold)
**Clinical Insights**
- Identify delayed responders (initial SD then PR)
- Detect early progression (rapid upward trajectory)
- Assess depth of response (maximum tumor shrinkage)
- Duration visualization (when lines cross PD threshold)
### Swimmer Plots
**Purpose**: Display treatment duration and response for individual patients
**Construction**
- **X-axis**: Time from treatment start (weeks or months)
- **Y-axis**: Individual patients (one row per patient)
- **Bars**: Horizontal bars representing treatment duration
- **Symbols**:
  - ● Start of treatment
  - ▼ Ongoing treatment (arrow)
  - ■ Progressive disease (end of bar)
  - ◆ Death
  - | Dose modification
- **Color**: Response status (CR=green, PR=blue, SD=yellow, PD=red)
**Example**
```
Patient ID    |0   3   6   9   12  15  18  21  24 months
──────────────|──────────────────────────────────────────
Pt-001        ●═══PR═══════════|════════PR══════════▼
Pt-002        ●═══PR═══════════════PD■
Pt-003        ●══════SD══════════PD■
Pt-004        ●PR══════════════════════════════════PR▼
...
```
## Confidence Intervals
### Interpretation
**95% Confidence Interval**
- Range of plausible values for true population parameter
- If the study were repeated many times, about 95% of the resulting 95% CIs would contain the true value
- **Not**: 95% probability true value within this interval (frequentist, not Bayesian)
**Relationship to p-value**
- If 95% CI excludes null value (HR=1.0, difference=0), p<0.05
- If 95% CI includes null value, p≥0.05
- CI provides more information: magnitude and precision of effect
**Precision**
- **Narrow CI**: High precision, large sample size
- **Wide CI**: Low precision, small sample size or high variability
- **Example**: HR 0.65 (95% CI 0.62-0.68) very precise; HR 0.65 (0.30-1.40) imprecise
### Calculation Methods
**Hazard Ratio CI**
- From Cox regression output
- Standard error of log(HR) → exp(log(HR) ± 1.96×SE)
- Example: HR=0.62, SE(logHR)=0.185 → 95% CI (0.43, 0.89)
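The back-transformation from the log scale can be checked with a few lines. A minimal sketch using the example's values; the function name is hypothetical, and the Wald p-value assumes the log hazard ratio is approximately normally distributed.

```python
from math import erfc, exp, log, sqrt
from typing import Tuple

def hazard_ratio_ci(hr: float, se_log_hr: float,
                    z: float = 1.959964) -> Tuple[float, float, float]:
    """Return (lower, upper, two-sided Wald p-value) for a hazard ratio."""
    log_hr = log(hr)
    lower = exp(log_hr - z * se_log_hr)
    upper = exp(log_hr + z * se_log_hr)
    p_value = erfc(abs(log_hr / se_log_hr) / sqrt(2))
    return lower, upper, p_value

hr_lower, hr_upper, hr_p = hazard_ratio_ci(0.62, 0.185)
print(f"HR 0.62 (95% CI {hr_lower:.2f}-{hr_upper:.2f}), p = {hr_p:.3f}")
```

The interval is symmetric on the log scale, which is why it is asymmetric around the HR itself.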
**Survival Rate CI (Greenwood Formula)**
- SE(S(t)) = S(t) × sqrt(Σ[d_i / (n_i × (n_i - d_i))])
- 95% CI: S(t) ± 1.96 × SE(S(t))
- Can use complementary log-log transformation for better properties
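Greenwood's formula can be applied to the same event and at-risk counts used in the Kaplan-Meier calculation example. The sketch below uses the plain (untransformed) interval, truncated to [0, 1]; the function name is hypothetical, and it assumes fewer events than patients at risk at every event time.

```python
from math import sqrt
from typing import Sequence, Tuple

def greenwood_ci(rows: Sequence[Tuple[int, int]],
                 z: float = 1.959964) -> Tuple[float, float, float, float]:
    """Return (S(t), SE, lower, upper) from (events, at_risk) rows up to time t."""
    survival, variance_sum = 1.0, 0.0
    for events, at_risk in rows:
        survival *= (at_risk - events) / at_risk
        # Greenwood term d_i / (n_i * (n_i - d_i)); requires events < at_risk.
        variance_sum += events / (at_risk * (at_risk - events))
    se = survival * sqrt(variance_sum)
    return survival, se, max(0.0, survival - z * se), min(1.0, survival + z * se)

# Event times 3, 5, and 8 from the Kaplan-Meier calculation example.
s8, se8, lower8, upper8 = greenwood_ci([(2, 100), (1, 95), (3, 87)])
print(f"S(8) = {s8:.3f}, SE = {se8:.4f}, 95% CI {lower8:.3f}-{upper8:.3f}")
```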
**Proportion CI (Exact Binomial)**
- For ORR, DCR: Use exact method (Clopper-Pearson) for small samples
- Wilson score interval: Better properties than normal approximation
- Example: 12/30 responses → ORR 40% (95% CI 22.7-59.4%)
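The exact (Clopper-Pearson) interval can be obtained by inverting the binomial tail probabilities. This sketch does so by bisection with the standard library only; the function names are hypothetical, and statistical packages typically compute the same limits from beta distribution quantiles.

```python
from math import comb
from typing import Callable, Tuple

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

def _solve_increasing(f: Callable[[float], float], target: float) -> float:
    """Bisection on [0, 1] for a function that increases with p."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Exact two-sided (1 - alpha) confidence interval for a proportion x/n."""
    # Lower limit: P(X >= x) equals alpha/2; upper limit: P(X <= x) equals alpha/2.
    lower = 0.0 if x == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    upper = 1.0 if x == n else _solve_increasing(
        lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper

orr_lower, orr_upper = clopper_pearson(12, 30)
print(f"ORR 40% (95% CI {orr_lower:.1%}-{orr_upper:.1%})")
```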
## Censoring and Missing Data
### Types of Censoring
**Right Censoring**
- **End of study**: Patient alive at study termination (administrative censoring)
- **Loss to follow-up**: Patient stops attending visits
- **Withdrawal**: Patient withdraws consent
- **Competing risk**: Death from unrelated cause (in disease-specific survival)
**Handling Censoring**
- **Assumption**: Non-informative - censoring independent of event probability
- **Sensitivity Analysis**: Assess impact if assumption violated
  - Best case: All censored patients never progress
  - Worst case: All censored patients progress immediately after censoring
  - Actual result should fall between best/worst case
### Missing Data
**Mechanisms**
- **MCAR (Missing Completely at Random)**: Missingness unrelated to any variable
- **MAR (Missing at Random)**: Missingness related to observed but not unobserved variables
- **NMAR (Not Missing at Random; also written MNAR)**: Missingness related to the missing value itself
**Handling Strategies**
- **Complete case analysis**: Exclude patients with missing data (biased if not MCAR)
- **Multiple imputation**: Generate multiple plausible datasets, analyze each, pool results
- **Maximum likelihood**: Estimate parameters using all available data
- **Sensitivity analysis**: Assess robustness to missing data assumptions
**Response Assessment Missing Data**
- **Unevaluable for response**: Baseline measurable disease but post-baseline assessment missing
  - Exclude from ORR denominator or count as non-responder (sensitivity analysis)
- **PFS censoring**: Last adequate tumor assessment date if later assessments missing
## Reporting Standards
### CONSORT Statement (RCTs)
**Flow Diagram**
- Assessed for eligibility (n=)
- Randomized (n=)
- Allocated to intervention (n=)
- Lost to follow-up (n=, reasons)
- Discontinued intervention (n=, reasons)
- Analyzed (n=)
**Baseline Table**
- Demographics and clinical characteristics
- Baseline prognostic factors
- Show balance between arms
**Outcomes Table**
- Primary endpoint results with CI and p-value
- Secondary endpoints
- Safety summary
### STROBE Statement (Observational Studies)
**Study Design**: Cohort, case-control, or cross-sectional
**Participants**: Eligibility, sources, selection methods, sample size
**Variables**: Clearly define outcomes, exposures, predictors, confounders
**Statistical Methods**: Describe all methods, handling of missing data, sensitivity analyses
**Results**: Participant flow, descriptive data, outcome data, main results, other analyses
### Reproducible Research Practices
**Statistical Analysis Plan (SAP)**
- Pre-specify all analyses before data lock
- Primary and secondary endpoints
- Analysis populations (ITT, per-protocol, safety)
- Statistical tests and models
- Subgroup analyses (pre-specified)
- Interim analyses (if planned)
- Multiple testing procedures
**Transparency**
- Report all pre-specified analyses
- Distinguish pre-specified from post-hoc exploratory
- Report both positive and negative results
- Provide access to anonymized individual patient data (when possible)
## Software and Tools
### R Packages for Survival Analysis
- **survival**: Core package (Surv, survfit, coxph, survdiff)
- **survminer**: Publication-ready Kaplan-Meier plots (ggsurvplot)
- **rms**: Regression modeling strategies
- **flexsurv**: Flexible parametric survival models
### Python Libraries
- **lifelines**: Kaplan-Meier, Cox regression, survival curves
- **scikit-survival**: Machine learning for survival analysis
- **matplotlib**: Custom survival curve plotting
### Statistical Software
- **R**: Most comprehensive for survival analysis
- **Stata**: Medical statistics, good for epidemiology
- **SAS**: Industry standard for clinical trials
- **GraphPad Prism**: User-friendly for basic analyses
- **SPSS**: Point-and-click interface, limited survival features