zhongwei/gh-k-dense-ai-claude-scientific-writer-claude-scientific-writer

Zhongwei Li 74bee324ab Initial commit

2025-11-30 08:30:18 +08:00

Outcome Analysis and Statistical Methods Guide

Overview

Rigorous outcome analysis is essential for clinical decision support documents. This guide covers survival analysis, response assessment, statistical testing, and data visualization for patient cohort analyses and treatment evaluation.

Survival Analysis

Kaplan-Meier Method

Overview

Non-parametric estimator of survival function from time-to-event data
Handles censored observations (patients alive at last follow-up)
Provides survival probability at each time point
Generates characteristic step-function survival curves

Key Concepts

Censoring

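Right censoring: Most common; patient alive (or event-free) at last follow-up or study end
Left censoring: Event occurred before the first observation; rare in clinical studies
Interval censoring: Event known only to have occurred between two assessment times
Informative vs non-informative: Standard methods assume non-informative censoring, i.e. censoring independent of the patient's risk of the event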

Survival Function S(t)

S(t) = Probability of surviving beyond time t
S(0) = 1.0 (all patients event-free at time zero)
S(t) is non-increasing over time
The K-M estimate drops only at event times; censoring reduces the number at risk but does not lower S(t)

Median Survival

Smallest time at which S(t) ≤ 0.50
Estimated time by which 50% of patients have had the event
Reported with 95% confidence interval
"Not reached (NR)" if the K-M curve does not fall to 0.50 during follow-up

Survival Rates at Fixed Time Points

1-year survival rate, 2-year survival rate, 5-year survival rate
Read from K-M curve at specific time point
Report with 95% CI: S(t) ± 1.96 × SE

Calculation Example

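Time  Events  At Risk  Survival Probability
0     0       100      1.000
3     2       100      0.980 (98/100)
5     1       95       0.970 (98/100 × 94/95)
8     3       87       0.936 (98/100 × 94/95 × 84/87)
...

Each factor is (at risk - events)/at risk. The at-risk count falls between event times because of both events and censoring (e.g., 98 patients remain after time 3, but only 95 are at risk at time 5 because 3 were censored in between).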
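The product-limit arithmetic in the table can be checked with a few lines of Python. This minimal sketch works from the per-event-time summary (time, events, number at risk) rather than raw patient records; the function name `km_from_table` is illustrative, not from any library.

```python
def km_from_table(rows):
    """Kaplan-Meier product-limit estimate from (time, events, at_risk) rows.

    rows must be sorted by time and list only event times; censored patients
    enter only through the shrinking at-risk counts.
    Returns a list of (time, survival_probability) pairs.
    """
    surv = 1.0
    curve = []
    for time, events, at_risk in rows:
        # Conditional probability of getting past this event time: 1 - d_i / n_i
        surv *= 1.0 - events / at_risk
        curve.append((time, surv))
    return curve


# Event times from the example table
for t, s in km_from_table([(3, 2, 100), (5, 1, 95), (8, 3, 87)]):
    print(f"t={t}: S(t)={s:.3f}")
# t=3: S(t)=0.980
# t=5: S(t)=0.970
# t=8: S(t)=0.936
```

In practice the same estimate comes from `survfit` in R's survival package or an equivalent routine working on patient-level data.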

Log-Rank Test

Purpose: Compare survival curves between two or more groups

Null Hypothesis: No difference in survival distributions between groups

Test Statistic

Compares observed vs expected events in each group at each event time
Weights all event times equally
Under the null hypothesis, approximately chi-square distributed with df = k-1 (k groups)

Reporting

Chi-square statistic, degrees of freedom, p-value
Example: χ² = 6.82, df = 1, p = 0.009
Interpretation: Significant difference between survival curves at α = 0.05
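To make the observed-versus-expected logic concrete, here is a minimal pure-Python sketch of the two-group log-rank statistic with the usual hypergeometric variance. The function names are hypothetical; for real analyses `survdiff` in R's survival package (or an equivalent) is the standard route. The second helper shows how the reported p-value follows from a 1-df chi-square statistic.

```python
import math


def logrank_two_groups(times_a, events_a, times_b, events_b):
    """Two-group log-rank chi-square statistic (1 df).

    times_*: follow-up times; events_*: 1 = event observed, 0 = censored.
    Assumes the summed variance is > 0 (some event time with both groups at risk).
    """
    data = ([(t, e, "A") for t, e in zip(times_a, events_a)]
            + [(t, e, "B") for t, e in zip(times_b, events_b)])
    event_times = sorted({t for t, e, _ in data if e == 1})
    o_minus_e, variance = 0.0, 0.0
    for t in event_times:
        # Risk set: everyone still followed just before t
        n = sum(1 for ti, _, _ in data if ti >= t)
        n_a = sum(1 for ti, _, g in data if ti >= t and g == "A")
        d = sum(1 for ti, e, _ in data if ti == t and e == 1)
        d_a = sum(1 for ti, e, g in data if ti == t and e == 1 and g == "A")
        o_minus_e += d_a - d * n_a / n  # observed minus expected events in group A
        if n > 1:
            # Hypergeometric variance of d_a given the margins at time t
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    return o_minus_e ** 2 / variance


def chi2_1df_pvalue(statistic):
    """Upper-tail p-value for a chi-square statistic with 1 df."""
    return math.erfc(math.sqrt(statistic / 2))


print(round(chi2_1df_pvalue(6.82), 3))  # 0.009, matching the reporting example
```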

Assumptions

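Censoring is non-informative (unrelated to prognosis) in each group
Valid without proportional hazards, but most powerful when hazards are proportional (constant HR over time)
Power drops when hazards are non-proportional (e.g., crossing curves); consider time-varying effects or the alternatives below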

Alternatives for Non-Proportional Hazards

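Gehan-Breslow (generalized Wilcoxon) test: Weights each event time by the number at risk, so early events count more
Peto-Peto test: Weights by an estimate of the survival function; less dependent on the censoring pattern than Gehan-Breslow
Restricted mean survival time (RMST): Difference in area under the K-M curves up to a prespecified time τ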
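RMST has a simple geometric meaning: the area under the K-M step curve from 0 to τ, i.e. the mean event-free time over that window. A minimal sketch, assuming each arm's K-M curve is already available as (time, survival) pairs; the function name and the two example curves are hypothetical.

```python
def rmst(curve, tau):
    """Restricted mean survival time: area under a K-M step curve from 0 to tau.

    curve: (time, survival) pairs sorted by time, where survival is S(t) just
    after that event time; S(t) = 1 before the first listed time.
    """
    area = 0.0
    prev_time, prev_surv = 0.0, 1.0
    for time, surv in curve:
        if time >= tau:
            break  # steps at or beyond tau do not contribute
        area += prev_surv * (time - prev_time)  # flat segment before this drop
        prev_time, prev_surv = time, surv
    area += prev_surv * (tau - prev_time)  # last flat segment up to tau
    return area


# Hypothetical curves (time in months), tau = 4 months
arm_a = [(2.0, 0.5)]
arm_b = [(3.0, 0.6)]
print(rmst(arm_b, 4.0) - rmst(arm_a, 4.0))  # RMST difference in months (about 0.6)
```

Because RMST does not rely on a constant hazard ratio, the difference stays interpretable when curves cross; the choice of τ should be prespecified.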

Cox Proportional Hazards Regression

Purpose: Multivariable survival analysis, estimate hazard ratios adjusting for covariates

Model: h(t|X) = h₀(t) × exp(β₁X₁ + β₂X₂ + ... + βₚXₚ)

h(t|X): Hazard rate for individual with covariates X
h₀(t): Baseline hazard function (unspecified)
exp(β): Hazard ratio for a one-unit change in the covariate, holding the other covariates constant

Hazard Ratio Interpretation

HR = 1.0: No effect
HR > 1.0: Increased risk (harmful)
HR < 1.0: Decreased risk (beneficial)
HR = 0.50: 50% lower hazard (instantaneous event rate) at every time point; this is not the same as a 50% lower cumulative risk of the event

Example Output

Variable              HR      95% CI         p-value
Treatment (B vs A)    0.62    0.43-0.89      0.010
Age (per 10 years)    1.15    1.02-1.30      0.021
ECOG PS (2 vs 0-1)    1.85    1.21-2.83      0.004
Biomarker+ (vs -)     0.71    0.48-1.05      0.089

Proportional Hazards Assumption

Hazard ratio constant over time
Test: Schoenfeld residuals, log-minus-log plots
Violation: Time-varying effects, consider stratification or time-dependent covariates

Multivariable vs Univariable

Univariable: One covariate at a time, unadjusted HRs
Multivariable: Multiple covariates simultaneously, adjusted HRs
Report both: Univariable for all variables, multivariable for final model

Model Selection

Forward selection: Start with empty model, add significant variables
Backward elimination: Start with all variables, remove non-significant
Clinical judgment: Include known prognostic factors regardless of p-value
Parsimony: Avoid overfitting, rule of thumb 1 variable per 10-15 events

Response Assessment

RECIST v1.1 (Response Evaluation Criteria in Solid Tumors)

Target Lesions

Select up to 5 lesions total (maximum 2 per organ)
Measurable: ≥10 mm longest diameter (≥15 mm for lymph nodes short axis)
Sum of longest diameters (SLD) at baseline; for target lymph nodes, the short axis is used in the sum

Response Categories

Complete Response (CR)

Disappearance of all target and non-target lesions
Lymph nodes must regress to <10 mm short axis
Confirmation ≥4 weeks later required when response is the primary endpoint of a non-randomized trial

Partial Response (PR)

≥30% decrease in SLD from baseline
No new lesions and no unequivocal progression of non-target lesions
Confirmation ≥4 weeks later required when response is the primary endpoint of a non-randomized trial

Stable Disease (SD)

Neither sufficient shrinkage for PR nor sufficient increase for PD (measured from the nadir)
Minimum duration typically 6-8 weeks from baseline

Progressive Disease (PD)

≥20% increase in SLD AND ≥5 mm absolute increase from smallest SLD (nadir)
OR appearance of new lesions
OR unequivocal progression of non-target lesions

Example Calculation

Baseline SLD: 80 mm (4 target lesions)
Week 6 SLD: 52 mm

Percent change: (52 - 80)/80 × 100% = -35%
Classification: Partial Response (≥30% decrease)

Week 12 SLD: 48 mm (nadir)
Week 18 SLD: 62 mm

Percent change from nadir: (62 - 48)/48 × 100% = +29%
Absolute change: 62 - 48 = 14 mm
Classification: Progressive Disease (+29% ≥20% AND 14 mm ≥5 mm increase from nadir)
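The target-lesion rules above can be expressed as a small decision function. This is a simplified sketch: it ignores non-target lesions, the <10 mm nodal short-axis rule for CR, and confirmation requirements, and the function name is hypothetical. The order of checks matters, since PD is tested against the nadir before PR is tested against baseline.

```python
def classify_target_response(current_sum, baseline_sum, nadir_sum, new_lesions=False):
    """Simplified RECIST 1.1 target-lesion category from sums of diameters (mm).

    nadir_sum is the smallest sum recorded so far, including baseline.
    Not handled: non-target lesions, the nodal rule for CR, confirmation.
    """
    if new_lesions:
        return "PD"
    increase = current_sum - nadir_sum
    # PD: >=20% above nadir (5 * increase >= nadir avoids float rounding) AND >=5 mm
    if 5 * increase >= nadir_sum and increase >= 5:
        return "PD"
    if current_sum == 0:
        return "CR"  # all target lesions gone (nodal short-axis rule ignored here)
    # PR: >=30% decrease from baseline, i.e. 10 * current <= 7 * baseline
    if 10 * current_sum <= 7 * baseline_sum:
        return "PR"
    return "SD"


print(classify_target_response(52, 80, 80))  # week 6 of the example: PR
print(classify_target_response(62, 80, 48))  # week 18 of the example: PD
```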

iRECIST (Immune RECIST)

Purpose: Account for atypical response patterns with immunotherapy

Modifications from RECIST v1.1

iUPD (Immune Unconfirmed Progressive Disease)

Initial increase in tumor burden or new lesions
Requires confirmation at next assessment (≥4 weeks later)
Continue treatment if clinically stable

iCPD (Immune Confirmed Progressive Disease)

Confirmed progression at repeat imaging
Discontinue immunotherapy

Pseudoprogression

Initial apparent progression followed by response
Mechanism: Immune cell infiltration can transiently increase apparent tumor size
Incidence: 5-10% of patients on immunotherapy
Management: Continue treatment if patient clinically stable

New Lesions

Record size and location but continue treatment
Do not automatically classify as PD
Confirm progression if new lesions grow or additional new lesions appear

Other Response Criteria

Lugano Classification (Lymphoma)

PET-based: Deauville 5-point scale
- Score 1-3: Negative (metabolic CR)
- Score 4-5: Positive (residual disease)
CT-based: If PET not available
Bone marrow: Required for staging in some lymphomas

RANO (Response Assessment in Neuro-Oncology)

High-grade gliomas (including glioblastoma): Accounts for pseudoprogression after radiation/temozolomide
Enhancing disease: Bidimensional measurements (product of perpendicular diameters)
Non-enhancing disease: FLAIR changes assessed separately
Corticosteroid dose: Must document, increase may indicate progression

mRECIST (Modified RECIST for HCC)

Viable tumor: Enhancing portion only (arterial phase enhancement)
Necrosis: Non-enhancing areas excluded from measurements
Application: Hepatocellular carcinoma with arterial enhancement

Outcome Metrics

Efficacy Endpoints

Overall Survival (OS)

Definition: Time from randomization/treatment start to death from any cause
Advantages: Objective, not subject to assessment bias, regulatory gold standard
Disadvantages: Requires long follow-up; can be confounded by subsequent therapies and crossover
Censoring: Last known alive date
Analysis: Kaplan-Meier, log-rank test, Cox regression

Progression-Free Survival (PFS)

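Definition: Time from randomization to progression (RECIST) or death from any cause, whichever occurs first
Advantages: Earlier readout than OS; not diluted by subsequent therapies
Disadvantages: Requires regular imaging; progression is detected only at scheduled assessments, so estimates depend on assessment frequency and timing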
Censoring: Last tumor assessment without progression
Sensitivity Analysis: Assess impact of censoring assumptions

Objective Response Rate (ORR)

Definition: Proportion of patients achieving CR or PR (best response)
Denominator: Evaluable patients (baseline measurable disease)
Reporting: Percentage with 95% CI (exact binomial method)
Duration: Time from first response to progression (DOR)
Advantage: Binary endpoint, no censoring complications

Disease Control Rate (DCR)

Definition: CR + PR + SD (stable disease ≥6-8 weeks)
Less Stringent: Captures clinical benefit beyond objective response
Reporting: Percentage with 95% CI

Duration of Response (DOR)

Definition: Time from first documented CR or PR to progression or death (among responders only)
Population: Subset analysis of responders
Analysis: Kaplan-Meier among responders
Reporting: Median DOR with 95% CI

Time to Treatment Failure (TTF)

Definition: Time from start to discontinuation for any reason (progression, toxicity, death, patient choice)
Advantage: Reflects real-world treatment duration
Components: Progression or death (as in PFS) plus discontinuation for toxicity, patient choice, or other reasons

Safety Endpoints

Adverse Events (CTCAE v5.0)

Grading

Grade 1: Mild; asymptomatic or mild symptoms; clinical intervention not indicated
Grade 2: Moderate; minimal, local, or noninvasive intervention indicated; limiting age-appropriate instrumental ADL
Grade 3: Severe or medically significant but not immediately life-threatening; hospitalization or prolongation of hospitalization indicated; disabling; limiting self-care ADL
Grade 4: Life-threatening consequences; urgent intervention indicated
Grade 5: Death related to adverse event

Reporting Standards

Adverse Event Summary Table:

AE Term (MedDRA)         Any Grade, n (%)        Grade 3-4, n (%)        Grade 5, n (%)
                         Trt A      Trt B        Trt A      Trt B        Trt A    Trt B
──────────────────────────────────────────────────────────────────────────────────────────
Hematologic
  Anemia                 45 (90%)   42 (84%)     8 (16%)    6 (12%)      0        0
  Neutropenia            35 (70%)   38 (76%)     15 (30%)   18 (36%)     0        0
  Thrombocytopenia       28 (56%)   25 (50%)     6 (12%)    4 (8%)       0        0
  Febrile neutropenia    4 (8%)     6 (12%)      4 (8%)     6 (12%)      0        0

Gastrointestinal
  Nausea                 42 (84%)   40 (80%)     2 (4%)     1 (2%)       0        0
  Diarrhea               31 (62%)   28 (56%)     5 (10%)    3 (6%)       0        0
  Mucositis              18 (36%)   15 (30%)     3 (6%)     2 (4%)       0        0

Any AE                   50 (100%)  50 (100%)    38 (76%)   35 (70%)     1 (2%)   0

Serious Adverse Events (SAEs)

SAE incidence and type
Relationship to treatment (related vs unrelated)
Outcome (resolved, ongoing, fatal)
Causality assessment (definite, probable, possible, unlikely, unrelated)

Treatment Modifications

Dose reductions: n (%), reason
Dose delays: n (%), duration
Discontinuations: n (%), reason (toxicity vs progression vs other)
Relative dose intensity: (delivered dose intensity / planned dose intensity) × 100%, where dose intensity is dose per unit time (e.g., mg/m² per week)
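Dose intensity divides total dose by the time over which it was given, so delays lower it even when the full dose is eventually delivered. A minimal sketch with hypothetical numbers (6 planned cycles of 100 mg/m² every 3 weeks); how the treatment window is measured varies by protocol, so the week counts here are assumptions.

```python
def relative_dose_intensity(delivered_dose, delivered_weeks, planned_dose, planned_weeks):
    """Relative dose intensity (%) = delivered dose intensity / planned dose intensity x 100.

    Dose intensity = total dose / duration of treatment (e.g., mg/m^2 per week).
    """
    delivered_intensity = delivered_dose / delivered_weeks
    planned_intensity = planned_dose / planned_weeks
    return 100.0 * delivered_intensity / planned_intensity


# Hypothetical: planned 600 mg/m^2 over 18 weeks; delivered 540 mg/m^2 over 21 weeks
print(round(relative_dose_intensity(540, 21, 600, 18), 1))  # 77.1
```

Here the relative total dose is 90% (540/600), but the 3-week delay brings relative dose intensity down to about 77%.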

Statistical Analysis Methods

Comparing Continuous Outcomes

Independent Samples t-test

Application: Compare means between two independent groups (normally distributed)
Assumptions: Independent observations, approximately normal data (or large samples), equal variances (Welch's t-test drops the equal-variance assumption)
Reporting: Mean ± SD for each group, mean difference (95% CI), t-statistic, df, p-value
Example: Mean age 62.3 ± 8.4 vs 58.7 ± 9.1 years, difference 3.6 years (95% CI 0.2-7.0, p=0.038)

Mann-Whitney U Test (Wilcoxon Rank-Sum)

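Application: Compare a continuous or ordinal outcome between two groups when data are non-normal; tests whether values in one group tend to be larger (a comparison of medians only if the distributions have the same shape)
Non-parametric: No normality assumption
Reporting: Median [IQR] for each group, Hodges-Lehmann median difference (95% CI), U-statistic, p-value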
Example: Median time to response 6.2 [4.1-8.3] vs 8.5 [5.9-11.2] weeks, p=0.042
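The U statistic simply counts pairwise "wins" between the groups, and the Hodges-Lehmann estimate is the median of all pairwise differences. A minimal sketch of both on hypothetical toy data; p-values would normally come from a library routine such as `scipy.stats.mannwhitneyu`.

```python
import statistics


def mann_whitney_u(x, y):
    """U statistic for sample x: number of pairs with x_i > y_j, ties counting 0.5."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


def hodges_lehmann_shift(x, y):
    """Hodges-Lehmann location shift: median of all pairwise differences x_i - y_j."""
    return statistics.median([xi - yj for xi in x for yj in y])


# Hypothetical toy data (weeks to response)
group_a = [1, 2, 3]
group_b = [4, 5, 6]
print(mann_whitney_u(group_a, group_b), hodges_lehmann_shift(group_a, group_b))  # 0.0 -3
```

U for one group plus U for the other always equals n₁ × n₂, which is a handy sanity check.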

ANOVA (Analysis of Variance)

Application: Compare means across three or more groups
Output: F-statistic, p-value (overall test)
Post-hoc: If significant, pairwise comparisons with Tukey's HSD or Bonferroni correction
Example: Mean outcome differed across three biomarker subgroups (F=4.32, df=2 between groups, p=0.016); report the within-group (error) df as well

Comparing Categorical Outcomes

Chi-Square Test for Independence

Application: Compare proportions between two or more groups
Assumptions: Expected count ≥5 in at least 80% of cells and no expected count <1
Reporting: n (%) for each cell, χ², df, p-value
Example: ORR 45% vs 30%, χ²=6.21, df=1, p=0.013
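For a 2×2 table the Pearson statistic has a closed form, and the 1-df p-value can be obtained with the standard library. A minimal sketch (no continuity correction) using hypothetical counts of 45/100 vs 30/100 responders; the function name is illustrative.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2x2 table [[a, b], [c, d]].

    Rows are groups; columns are responder / non-responder. Returns (chi2, p), 1 df.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p


# Hypothetical: 45/100 vs 30/100 responders
chi2, p = chi_square_2x2(45, 55, 30, 70)
print(round(chi2, 2), round(p, 3))  # 4.8 0.028
```

For the same 45% vs 30% difference, the statistic grows in proportion to sample size, so larger trials give larger χ² values.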

Fisher's Exact Test

Application: 2×2 tables when expected count <5
Exact p-value: No large-sample approximation
Two-sided vs one-sided: Typically report two-sided
Example: SAE rate 3/20 (15%) vs 8/22 (36%), Fisher's exact p=0.17 (two-sided)
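Fisher's test enumerates every 2×2 table with the observed margins and sums the hypergeometric probabilities of tables no more likely than the observed one (the two-sided definition used by default in R's `fisher.test` and SciPy's `fisher_exact`). A minimal standard-library sketch (requires `math.comb`, Python 3.8+); the function name is illustrative. For the SAE example (3/20 vs 8/22) it gives p ≈ 0.17.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value for [[a, b], [c, d]] (rows = groups)."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(k):
        # Hypergeometric P(top-left cell = k) with all margins fixed
        return comb(col1, k) * comb(n - col1, row1 - k) / comb(n, row1)

    p_obs = prob(a)
    k_min, k_max = max(0, row1 + col1 - n), min(row1, col1)
    probs = [prob(k) for k in range(k_min, k_max + 1)]
    # Small relative tolerance so tables tied with the observed one are included
    return sum(p for p in probs if p <= p_obs * (1 + 1e-7))


# SAE example: 3 of 20 vs 8 of 22 patients
print(round(fisher_exact_2x2(3, 17, 8, 14), 3))
```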

McNemar's Test

Application: Paired categorical data (before/after, matched pairs)
Example: Response before vs after treatment switch in same patients

Sample Size and Power

Power Analysis Components

Alpha (α): Type I error rate, typically 0.05 (two-sided)
Beta (β): Type II error rate, typically 0.10 or 0.20
Power: 1 - β, typically 0.80 or 0.90 (80-90% power)
Effect size: Expected difference (HR, mean difference, proportion difference)
Sample size: Number of patients or events needed

Survival Study Sample Size

Events-driven: Need sufficient events (deaths, progressions)
Rule of thumb: with 1:1 randomization, 80% power to detect HR=0.70 (α=0.05, two-sided) requires approximately 247 events (Schoenfeld approximation)
Required events, accrual rate, and follow-up duration together determine total study duration
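The events requirement for a log-rank comparison is commonly approximated with Schoenfeld's formula, D = (z₁₋α/₂ + z₁₋β)² / (p(1 - p)(ln HR)²), where p is the allocation fraction. A minimal sketch using `statistics.NormalDist` (Python 3.8+); the function name is illustrative, and dedicated software may differ slightly by method.

```python
import math
from statistics import NormalDist


def schoenfeld_events(hazard_ratio, alpha=0.05, power=0.80, allocation=0.5):
    """Approximate events needed for a two-arm log-rank test (Schoenfeld formula).

    allocation is the fraction randomized to one arm (0.5 for 1:1).
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)  # two-sided alpha
    z_beta = z(power)
    events = (z_alpha + z_beta) ** 2 / (
        allocation * (1 - allocation) * math.log(hazard_ratio) ** 2
    )
    return math.ceil(events)


print(schoenfeld_events(0.70))              # 247 events at 80% power
print(schoenfeld_events(0.70, power=0.90))  # 331 events at 90% power
```

The number of patients then depends on how many events are expected during accrual plus follow-up.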

Response Rate Study

Example: Detect ORR difference 45% vs 30% (15 percentage points)
- α = 0.05 (two-sided)
- Power = 0.80
- Sample size: n = 163 per group (326 total; normal approximation without continuity correction)
- With 10% dropout: n = 182 per group (364 total)
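A minimal sketch of the usual normal-approximation formula for comparing two proportions with 1:1 allocation (no continuity correction; continuity-corrected formulas give a somewhat larger n). The function name is illustrative; for the 45% vs 30% example it gives about 163 per group before dropout inflation.

```python
import math
from statistics import NormalDist


def n_per_group_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Patients per group to compare two proportions (normal approximation,
    no continuity correction, 1:1 allocation)."""
    z = NormalDist().inv_cdf
    z_alpha, z_beta = z(1 - alpha / 2), z(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)


n = n_per_group_two_proportions(0.45, 0.30)
n_with_dropout = math.ceil(n / (1 - 0.10))  # inflate for 10% dropout
print(n, n_with_dropout)  # 163 182
```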

Data Visualization

Survival Curves

Kaplan-Meier Plot Best Practices

Key elements for a publication-quality survival curve:
1. X-axis: Time (months or years), starts at 0
2. Y-axis: Survival probability (0 to 1.0 or 0% to 100%)
3. Step function: Survival curve with steps at event times
4. 95% CI bands: Shaded region around survival curve (optional but recommended)
5. Number at risk table: Below x-axis showing n at risk at time intervals
6. Censoring marks: Vertical tick marks (|) at censored observations
7. Legend: Clearly identify each curve
8. Log-rank p-value: Prominently displayed
9. Median survival: Horizontal line at 0.50, labeled
10. Follow-up: Median follow-up time reported

Number at Risk Table Format

Number at risk
Group A   50    42    35    28    18    10     5
Group B   48    38    29    19    12     6     2
Time      0     6     12    18    24    30    36 (months)
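The number at risk at each tick is the count of patients whose follow-up (to event or censoring) has not yet ended. A minimal sketch with hypothetical follow-up times; it counts patients with time ≥ the tick as still at risk, which is the common convention, and the function name is illustrative.

```python
def number_at_risk(follow_up_times, time_points):
    """Patients still under observation (no event, not yet censored) at each time point."""
    return [sum(1 for t in follow_up_times if t >= tp) for tp in time_points]


# Hypothetical follow-up times in months (event or censoring)
times = [2, 5, 7, 12, 15]
print(number_at_risk(times, [0, 6, 12]))  # [5, 3, 2]
```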

Hazard Ratio Annotation

On plot: HR 0.62 (95% CI 0.43-0.89), p=0.010
Or in caption: Log-rank test p=0.010; Cox model HR=0.62 (95% CI 0.43-0.89)

Waterfall Plots

Purpose: Visualize individual patient responses to treatment

Construction

X-axis: Individual patients (anonymized patient IDs)
Y-axis: Best % change from baseline tumor burden
Bars: Vertical bars, one per patient
- Positive values: Tumor growth
- Negative values: Tumor shrinkage
Ordering: Sorted from best response (left) to worst (right)
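Color coding (by RECIST best overall response; the % thresholds are approximate because PD is measured from nadir and can also result from new lesions):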
- Green/blue: CR or PR (≥30% decrease)
- Yellow: SD (-30% to +20%)
- Red: PD (≥20% increase)
Reference lines: Horizontal lines at +20% (PD), -30% (PR)
Annotations: Biomarker status, response duration (symbols)

Example Annotations

■ = Biomarker-positive
○ = Biomarker-negative
* = Ongoing response
† = Progressed

Forest Plots

Purpose: Display subgroup analyses with hazard ratios and confidence intervals

Construction

Y-axis: Subgroup categories
X-axis: Hazard ratio (log scale), vertical line at HR=1.0
Points: HR estimate for each subgroup
Horizontal lines: 95% confidence interval
Square size: Proportional to sample size or precision
Overall effect: Diamond at bottom, width represents 95% CI

Subgroups to Display

Subgroup                    n     HR (95% CI)          Favors A  Favors B
──────────────────────────────────────────────────────────────────────────
Overall                     300   0.65 (0.48-0.88)         ●────┤
Age
  <65 years                 180   0.58 (0.39-0.86)        ●────┤
  ≥65 years                 120   0.78 (0.49-1.24)          ●──────┤
Sex
  Male                      175   0.62 (0.43-0.90)        ●────┤
  Female                    125   0.70 (0.44-1.12)         ●─────┤
Biomarker Status
  Positive                  140   0.45 (0.28-0.72)      ●───┤
  Negative                  160   0.89 (0.59-1.34)           ●──────┤
                                  p-interaction=0.041

                                  0.25  0.5   1.0   2.0
                                        Hazard Ratio

Interaction Testing

Test whether treatment effect differs across subgroups
p-interaction <0.05 suggests heterogeneity
Pre-specify subgroups to avoid data mining
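When only published subgroup HRs and 95% CIs are available, a rough interaction test compares the log HRs with a z-test (ratio of hazard ratios). This is a summary-data approximation, not the treatment × subgroup term of a Cox model fitted to individual data, and it assumes independent subgroups and CIs symmetric on the log scale; function names are hypothetical. Applied to the biomarker subgroups in the plot it gives p ≈ 0.03, which need not match a model-based interaction p-value exactly.

```python
import math


def log_hr_se_from_ci(lower, upper, z=1.96):
    """Standard error of log(HR) recovered from a reported 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def interaction_test(hr1, ci1, hr2, ci2):
    """Approximate z-test that two subgroup HRs differ. Returns (ratio_of_HRs, two_sided_p)."""
    diff = math.log(hr1) - math.log(hr2)
    se = math.sqrt(log_hr_se_from_ci(*ci1) ** 2 + log_hr_se_from_ci(*ci2) ** 2)
    z = diff / se
    return math.exp(diff), math.erfc(abs(z) / math.sqrt(2))


# Biomarker-positive vs biomarker-negative subgroups from the forest plot
ratio, p = interaction_test(0.45, (0.28, 0.72), 0.89, (0.59, 1.34))
print(round(ratio, 2), round(p, 3))
```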

Spider Plots

Purpose: Display longitudinal tumor burden changes over time for individual patients

Construction

X-axis: Time from treatment start (weeks or months)
Y-axis: % change from baseline tumor burden
Lines: One line per patient connecting assessments
Color coding: By response category or biomarker status
Reference lines: 0% (no change), +20% (PD threshold), -30% (PR threshold)

Clinical Insights

Identify delayed responders (initial SD then PR)
Detect early progression (rapid upward trajectory)
Assess depth of response (maximum tumor shrinkage)
Duration visualization (when lines cross PD threshold)

Swimmer Plots

Purpose: Display treatment duration and response for individual patients

Construction

X-axis: Time from treatment start (weeks or months)
Y-axis: Individual patients (one row per patient)
Bars: Horizontal bars representing treatment duration
Symbols:
- ● Start of treatment
- ▼ Ongoing treatment (arrow)
- ■ Progressive disease (end of bar)
- ◆ Death
- | Dose modification
Color: Response status (CR=green, PR=blue, SD=yellow, PD=red)

Example

Patient ID    |0   3   6   9   12  15  18  21  24 months
──────────────|──────────────────────────────────────────
Pt-001        ●═══PR═══════════|════════PR══════════▼
Pt-002        ●═══PR═══════════════PD■
Pt-003        ●══════SD══════════PD■
Pt-004        ●PR══════════════════════════════════PR▼
...

Confidence Intervals

Interpretation

95% Confidence Interval

Range of plausible values for true population parameter
In repeated sampling, about 95% of 95% CIs would contain the true value (roughly 95 of 100 repeated studies)
Not: 95% probability true value within this interval (frequentist, not Bayesian)

Relationship to p-value

If the 95% CI excludes the null value (HR=1.0, difference=0), p<0.05
If the 95% CI includes the null value, p≥0.05
Both hold when the CI and the test use the same method (e.g., both Wald-based)
CI provides more information: magnitude and precision of effect

Precision

Narrow CI: High precision, large sample size
Wide CI: Low precision, small sample size or high variability
Example: HR 0.65 (95% CI 0.62-0.68) very precise; HR 0.65 (0.30-1.40) imprecise

Calculation Methods

Hazard Ratio CI

From Cox regression output
Standard error of log(HR) → exp(log(HR) ± 1.96×SE)
Example: HR=0.62, SE(logHR)=0.185 → 95% CI (0.43, 0.89)
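The back-transformation from the log scale can be reproduced directly. A minimal sketch (function name illustrative) that also returns the two-sided Wald p-value; with HR=0.62 and SE(log HR)=0.185 it matches the treatment row of the Cox output shown earlier.

```python
import math


def hr_confidence_interval(hr, se_log_hr, z=1.96):
    """95% CI and two-sided Wald p-value for a hazard ratio from its log-scale SE."""
    log_hr = math.log(hr)
    lower = math.exp(log_hr - z * se_log_hr)
    upper = math.exp(log_hr + z * se_log_hr)
    p = math.erfc(abs(log_hr / se_log_hr) / math.sqrt(2))  # Wald z-test
    return lower, upper, p


lower, upper, p = hr_confidence_interval(0.62, 0.185)
print(f"HR 0.62 (95% CI {lower:.2f}-{upper:.2f}), p={p:.3f}")
# HR 0.62 (95% CI 0.43-0.89), p=0.010
```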

Survival Rate CI (Greenwood Formula)

SE(S(t)) = S(t) × sqrt(Σ[d_i / (n_i × (n_i - d_i))])
95% CI: S(t) ± 1.96 × SE(S(t))
Log-log transformation (CI computed for log(-log S(t)) and back-transformed) keeps bounds within 0-1 and has better coverage in the tails
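A minimal sketch of the Greenwood SE with both the plain interval and the log-log interval, applied to the Kaplan-Meier calculation example from the start of this guide. It assumes 0 < S(t) < 1 at every event time (the Greenwood term is undefined when all patients at risk have the event); the function name is illustrative.

```python
import math


def km_greenwood(rows, z=1.96):
    """K-M estimates with Greenwood SE and two 95% CIs.

    rows: (time, events, at_risk) tuples at event times, sorted by time.
    Returns (time, S, SE, plain_lo, plain_hi, loglog_lo, loglog_hi) per event time.
    """
    surv, gw_sum, out = 1.0, 0.0, []
    for time, d, n in rows:
        surv *= 1.0 - d / n
        gw_sum += d / (n * (n - d))  # Greenwood sum of d_i / (n_i (n_i - d_i))
        se = surv * math.sqrt(gw_sum)
        # Plain interval S +/- z*SE, clipped to [0, 1]
        plain = (max(0.0, surv - z * se), min(1.0, surv + z * se))
        # Log-log interval: CI for log(-log S), back-transformed; stays inside (0, 1)
        sigma = math.sqrt(gw_sum) / abs(math.log(surv))
        loglog = (surv ** math.exp(z * sigma), surv ** math.exp(-z * sigma))
        out.append((time, surv, se) + plain + loglog)
    return out


for row in km_greenwood([(3, 2, 100), (5, 1, 95), (8, 3, 87)]):
    print("t={} S={:.3f} SE={:.4f} plain=({:.3f}, {:.3f}) loglog=({:.3f}, {:.3f})".format(*row))
```

Near S(t)=1 the plain interval is clipped at 1, whereas the log-log interval stays asymmetric and inside (0, 1), which is why it is often the default in survival software.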

Proportion CI (Exact Binomial)

For ORR, DCR: Use exact method (Clopper-Pearson) for small samples
Wilson score interval: Better properties than normal approximation
Example: 12/30 responses → ORR 40% (95% CI 22.7-59.4%)
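Both intervals can be computed with the standard library: Wilson has a closed form, and Clopper-Pearson can be found by bisection on the binomial CDF (libraries usually use beta quantiles instead). A minimal sketch with illustrative function names; for 12/30 it reproduces the exact interval above and shows the narrower Wilson interval.

```python
import math


def _binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect_increasing(g, target, tol=1e-10):
    """Solve g(p) = target on [0, 1] for an increasing function g."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact CI: P(X >= x | p_lower) = alpha/2 and P(X <= x | p_upper) = alpha/2."""
    lower = 0.0 if x == 0 else _bisect_increasing(
        lambda p: 1 - _binom_cdf(x - 1, n, p), alpha / 2)
    # P(X <= x) decreases in p, so negate it to reuse the increasing-function solver
    upper = 1.0 if x == n else _bisect_increasing(
        lambda p: -_binom_cdf(x, n, p), -alpha / 2)
    return lower, upper


def wilson(x, n, z=1.96):
    """Wilson score interval for x/n."""
    p_hat = x / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


print("exact:  ({:.3f}, {:.3f})".format(*clopper_pearson(12, 30)))
print("wilson: ({:.3f}, {:.3f})".format(*wilson(12, 30)))
```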

Censoring and Missing Data

Types of Censoring

Right Censoring

End of study: Patient alive at study termination (administrative censoring)
Loss to follow-up: Patient stops attending visits
Withdrawal: Patient withdraws consent
Competing risk: Death from unrelated cause (in disease-specific survival)

Handling Censoring

Assumption: Non-informative - censoring independent of event probability
Sensitivity Analysis: Assess impact if assumption violated
- Best case: All censored patients never progress
- Worst case: All censored patients progress immediately after censoring
- The true result should lie between the best- and worst-case estimates

Missing Data

Mechanisms

MCAR (Missing Completely at Random): Missingness unrelated to any variable
MAR (Missing at Random): Missingness related to observed but not unobserved variables
NMAR (Not Missing at Random): Missingness related to the missing value itself

Handling Strategies

Complete case analysis: Exclude patients with missing data (biased if not MCAR)
Multiple imputation: Generate multiple plausible datasets, analyze each, pool results
Maximum likelihood: Estimate parameters using all available data
Sensitivity analysis: Assess robustness to missing data assumptions

Response Assessment Missing Data

Unevaluable for response: Baseline measurable disease but post-baseline assessment missing
- Exclude from ORR denominator or count as non-responder (sensitivity analysis)
PFS censoring: Last adequate tumor assessment date if later assessments missing

Reporting Standards

CONSORT Statement (RCTs)

Flow Diagram

Assessed for eligibility (n=)
Randomized (n=)
Allocated to intervention (n=)
Lost to follow-up (n=, reasons)
Discontinued intervention (n=, reasons)
Analyzed (n=)

Baseline Table

Demographics and clinical characteristics
Baseline prognostic factors
Show balance between arms

Outcomes Table

Primary endpoint results with CI and p-value
Secondary endpoints
Safety summary

STROBE Statement (Observational Studies)

Study Design: Cohort, case-control, or cross-sectional

Participants: Eligibility, sources, selection methods, sample size

Variables: Clearly define outcomes, exposures, predictors, confounders

Statistical Methods: Describe all methods, handling of missing data, sensitivity analyses

Results: Participant flow, descriptive data, outcome data, main results, other analyses

Reproducible Research Practices

Statistical Analysis Plan (SAP)

Pre-specify all analyses before data lock
Primary and secondary endpoints
Analysis populations (ITT, per-protocol, safety)
Statistical tests and models
Subgroup analyses (pre-specified)
Interim analyses (if planned)
Multiple testing procedures

Transparency

Report all pre-specified analyses
Distinguish pre-specified from post-hoc exploratory
Report both positive and negative results
Provide access to anonymized individual patient data (when possible)

Software and Tools

R Packages for Survival Analysis

survival: Core package (Surv, survfit, coxph, survdiff)
survminer: Publication-ready Kaplan-Meier plots (ggsurvplot)
rms: Regression modeling strategies
flexsurv: Flexible parametric survival models

Python Libraries

lifelines: Kaplan-Meier, Cox regression, survival curves
scikit-survival: Machine learning for survival analysis
matplotlib: Custom survival curve plotting

Statistical Software

R: Most comprehensive for survival analysis
Stata: Medical statistics, good for epidemiology
SAS: Industry standard for clinical trials
GraphPad Prism: User-friendly for basic analyses
SPSS: Point-and-click interface, limited survival features
