# Outcome Analysis and Statistical Methods Guide

## Overview

Rigorous outcome analysis is essential for clinical decision support documents. This guide covers survival analysis, response assessment, statistical testing, and data visualization for patient cohort analyses and treatment evaluation.

## Survival Analysis

### Kaplan-Meier Method

**Overview**

- Non-parametric estimator of the survival function from time-to-event data
- Handles censored observations (e.g., patients alive at last follow-up)
- Provides a survival probability at each event time
- Generates characteristic step-function survival curves

**Key Concepts**

**Censoring**

- **Right censoring**: Most common; the patient is event-free at last follow-up or study end
- **Left censoring**: The event occurred before the first observation; rare in clinical studies
- **Interval censoring**: The event occurred between two assessment times (e.g., progression detected at a scheduled scan)
- **Informative vs non-informative**: Standard methods assume non-informative censoring, i.e., censoring is independent of the outcome

**Survival Function S(t)**

- S(t) = Probability of surviving beyond time t
- S(0) = 1.0 (100% alive at time zero)
- S(t) never increases over time
- The Kaplan-Meier estimate drops in steps at each event time and stays flat between events

**Median Survival**

- Smallest time at which the Kaplan-Meier estimate S(t) falls to 0.50 or below
- Estimated time by which 50% of patients have had the event
- Reported with a 95% confidence interval
- "Not reached (NR)" if the curve never falls to 0.50, typically because fewer than half of patients have had the event

**Survival Rates at Fixed Time Points**

- 1-year, 2-year, and 5-year survival rates
- Read from the K-M curve at the specific time point
- Report with 95% CI, e.g., S(t) ± 1.96 × SE (see the Greenwood formula under Calculation Methods)

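**Calculation Example**

At-risk counts fall by more than the number of events because patients are censored between event times (e.g., 100 − 2 events − 3 censored = 95 at risk at month 5). Each step multiplies the previous estimate by (at risk − events) / at risk.

```
Time  Events  At Risk  Survival Probability
0     0       100      1.000
3     2       100      0.980  (98/100)
5     1       95       0.970  (98/100 × 94/95)
8     3       87       0.936  (98/100 × 94/95 × 84/87)
...
```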
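
The product-limit arithmetic in the table above can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for R's `survival::survfit` or the `lifelines` package; the function name `kaplan_meier` and the cohort below are hypothetical, constructed so that the at-risk counts match the worked example (100, 95 and 87).

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate of S(t).

    times  -- follow-up time for each patient
    events -- 1 if the event was observed, 0 if the patient was censored
    Returns a list of (event_time, n_at_risk, n_events, survival) tuples.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    survival = 1.0
    table = []
    for t in sorted(deaths):
        # At risk: follow-up lasted at least until t (patients censored
        # exactly at t are conventionally still counted as at risk).
        n_at_risk = sum(1 for ti in times if ti >= t)
        d = deaths[t]
        survival *= (n_at_risk - d) / n_at_risk
        table.append((t, n_at_risk, d, survival))
    return table


# Hypothetical cohort of 100: 3 censored between months 3 and 5,
# 7 censored between months 5 and 8, the remaining 84 censored at month 12.
times = [3] * 2 + [4] * 3 + [5] * 1 + [6] * 7 + [8] * 3 + [12] * 84
events = [1] * 2 + [0] * 3 + [1] * 1 + [0] * 7 + [1] * 3 + [0] * 84

for t, n, d, s in kaplan_meier(times, events):
    print(f"{t:>4} {d:>3} {n:>4} {s:.3f}")
```

This prints survival estimates of 0.980, 0.970 and 0.936 at months 3, 5 and 8, matching the table.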

### Log-Rank Test

**Purpose**: Compare survival curves between two or more groups

**Null Hypothesis**: No difference in survival distributions between groups

**Test Statistic**

- At each event time, compares the observed number of events in each group with the number expected if all groups had identical survival
- Weights all event times equally
- Under the null hypothesis, approximately chi-square distributed with df = k-1 (k groups)

**Reporting**

- Chi-square statistic, degrees of freedom, p-value
- Example: χ² = 6.82, df = 1, p = 0.009
- Interpretation: Statistically significant difference between the survival curves at α = 0.05
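
The observed-versus-expected logic above can be sketched for two groups as follows. This is a teaching sketch with a hypothetical function name (`logrank_two_group`), groups coded 0/1, and the usual hypergeometric variance at each event time; for real analyses, `survival::survdiff` in R or `lifelines.statistics.logrank_test` in Python are the standard tools.

```python
import math


def logrank_two_group(times, events, groups):
    """Two-group log-rank test with groups coded 0/1.

    Returns (chi_square, p_value) with df = 1.
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    observed_minus_expected = 0.0  # O - E for group 1
    variance = 0.0
    for t in event_times:
        at_risk = [g for ti, g in zip(times, groups) if ti >= t]
        n = len(at_risk)
        n1 = sum(at_risk)  # number at risk in group 1
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(1 for ti, e, g in zip(times, events, groups)
                 if ti == t and e and g == 1)
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of d1 at this event time
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi_square = observed_minus_expected ** 2 / variance
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


# Hypothetical data: group 1 has longer survival than group 0
chi2, p = logrank_two_group([1, 2, 3, 4, 5, 6], [1] * 6, [0, 0, 0, 1, 1, 1])
print(f"chi2 = {chi2:.2f}, df = 1, p = {p:.3f}")
```

The same `erfc` expression converts the reported χ² = 6.82 into p ≈ 0.009. With more than two groups, the statistic uses a covariance matrix and df = k-1, which this sketch does not cover.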
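
**Assumptions**

- Censoring is non-informative (independent of the event process)
- Most powerful under proportional hazards (constant HR over time); loses power when hazards are non-proportional, e.g., when the curves cross
- If hazards are non-proportional, consider weighted tests, RMST, or models with time-varying effects

**Alternatives for Non-Proportional Hazards**

- **Gehan-Breslow (generalized Wilcoxon) test**: Weights each event time by the number at risk, so early events count more heavily
- **Peto-Peto test**: Weights by an estimate of the survival function instead of the number at risk, making it less sensitive to differences in censoring patterns
- **Restricted mean survival time (RMST)**: Difference in area under the K-M curves up to a pre-specified time τ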
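
RMST is the area under the Kaplan-Meier step function from time 0 to a truncation time τ, so it can be computed by summing rectangles between event times. A minimal sketch with hypothetical function names; τ is assumed to be pre-specified and should not exceed the follow-up available in both arms.

```python
from collections import Counter


def km_steps(times, events):
    """Kaplan-Meier estimate as (event_time, S just after event_time) pairs."""
    deaths = Counter(t for t, e in zip(times, events) if e)
    s, steps = 1.0, []
    for t in sorted(deaths):
        n = sum(1 for ti in times if ti >= t)
        s *= (n - deaths[t]) / n
        steps.append((t, s))
    return steps


def rmst(times, events, tau):
    """Restricted mean survival time: area under the K-M curve on [0, tau]."""
    area, prev_t, s = 0.0, 0.0, 1.0
    for t, s_after in km_steps(times, events):
        if t >= tau:
            break
        area += s * (t - prev_t)  # S(t) is flat between event times
        prev_t, s = t, s_after
    return area + s * (tau - prev_t)


print(rmst([2, 4, 6], [1, 1, 1], tau=5))
```

The between-arm comparison is then `rmst(times_b, events_b, tau) - rmst(times_a, events_a, tau)`, expressed in the same time units as the input.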

### Cox Proportional Hazards Regression

**Purpose**: Multivariable survival analysis; estimates hazard ratios adjusted for covariates

**Model**: h(t|X) = h₀(t) × exp(β₁X₁ + β₂X₂ + ... + βₚXₚ)

- h(t|X): Hazard rate for an individual with covariates X
- h₀(t): Baseline hazard function (left unspecified, which makes the model semi-parametric)
- exp(βⱼ): Hazard ratio for a one-unit change in covariate Xⱼ, holding the other covariates constant

**Hazard Ratio Interpretation**

- HR = 1.0: No effect
- HR > 1.0: Increased hazard (harmful)
- HR < 1.0: Decreased hazard (beneficial)
- HR = 0.50: 50% lower hazard (instantaneous event rate) at any time point; this is not the same as a 50% reduction in cumulative risk
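
For a single covariate, the Cox partial likelihood can be maximized with a few Newton-Raphson steps, which makes the link between β and the hazard ratio concrete. This sketch assumes one covariate and Breslow handling of tied event times; the function names and data are hypothetical, and real analyses should use `survival::coxph` in R or lifelines' `CoxPHFitter`.

```python
import math


def cox_score_info(beta, times, events, x):
    """Score U(beta) and information I(beta) of the one-covariate Cox
    partial likelihood (Breslow handling of ties)."""
    score, info = 0.0, 0.0
    for ti, ei, xi in zip(times, events, x):
        if not ei:
            continue
        # Risk set: everyone still under observation at this event time
        w = [(math.exp(beta * xj), xj) for tj, xj in zip(times, x) if tj >= ti]
        s0 = sum(wj for wj, _ in w)
        s1 = sum(wj * xj for wj, xj in w)
        s2 = sum(wj * xj * xj for wj, xj in w)
        score += xi - s1 / s0
        info += s2 / s0 - (s1 / s0) ** 2
    return score, info


def fit_cox_1d(times, events, x, iterations=25):
    """Newton-Raphson fit; returns (HR, lower 95% CI, upper 95% CI)."""
    beta = 0.0
    for _ in range(iterations):
        score, info = cox_score_info(beta, times, events, x)
        beta += score / info
    _, info = cox_score_info(beta, times, events, x)
    se = 1 / math.sqrt(info)  # SE of log(HR) from the observed information
    return math.exp(beta), math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se)


# Hypothetical data: x = 1 for the experimental arm
times = [5, 8, 12, 3, 9, 15, 20, 6, 11, 4]
events = [1, 1, 0, 1, 1, 0, 1, 1, 1, 1]
x = [1, 1, 1, 0, 0, 1, 1, 0, 1, 0]
hr, lower, upper = fit_cox_1d(times, events, x)
print(f"HR {hr:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

The fitted β is the value at which the score is zero, and exp(β) is the hazard ratio for x = 1 versus x = 0.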

**Example Output**

```
Variable              HR     95% CI       p-value
Treatment (B vs A)    0.62   0.43-0.89    0.010
Age (per 10 years)    1.15   1.02-1.30    0.021
ECOG PS (2 vs 0-1)    1.85   1.21-2.83    0.004
Biomarker+ (vs -)     0.71   0.48-1.05    0.089
```

**Proportional Hazards Assumption**

- Hazard ratio constant over time
- Assessment: Schoenfeld residuals, log-minus-log survival plots
- If violated (time-varying effects), consider stratification or time-dependent covariates

**Multivariable vs Univariable**

- **Univariable**: One covariate at a time; unadjusted HRs
- **Multivariable**: Multiple covariates simultaneously; adjusted HRs
- Report both: univariable results for all candidate variables, multivariable results for the final model

**Model Selection**

- **Forward selection**: Start with an empty model and add significant variables
- **Backward elimination**: Start with all variables and remove non-significant ones
- **Clinical judgment**: Include known prognostic factors regardless of p-value
- **Parsimony**: Avoid overfitting; a common rule of thumb is at least 10-15 events per candidate variable

## Response Assessment

### RECIST v1.1 (Response Evaluation Criteria in Solid Tumors)

**Target Lesions**

- Select up to 5 lesions total (maximum 2 per organ)
- Measurable: ≥10 mm longest diameter (lymph nodes: ≥15 mm in short axis)
- Baseline sum of diameters (SLD): longest diameter for non-nodal lesions plus short axis for nodal lesions

**Response Categories**

**Complete Response (CR)**

- Disappearance of all target and non-target lesions
- Any pathological lymph nodes must regress to <10 mm short axis
- Confirmation at ≥4 weeks required in non-randomized trials where response is the primary endpoint

**Partial Response (PR)**

- ≥30% decrease in SLD from baseline
- No new lesions and no unequivocal progression of non-target lesions
- Confirmation at ≥4 weeks required in non-randomized trials where response is the primary endpoint

**Stable Disease (SD)**

- Neither PR nor PD criteria met
- Minimum duration is protocol-defined, typically 6-8 weeks from baseline

**Progressive Disease (PD)**

- ≥20% increase in SLD AND ≥5 mm absolute increase relative to the smallest SLD recorded (nadir, which may be the baseline value)
- OR appearance of new lesions
- OR unequivocal progression of non-target lesions

**Example Calculation**

```
Baseline SLD: 80 mm (4 target lesions)
Week 6 SLD: 52 mm

Percent change: (52 - 80)/80 × 100% = -35%
Classification: Partial Response (≥30% decrease)

Week 12 SLD: 48 mm (nadir)
Week 18 SLD: 62 mm

Percent change from nadir: (62 - 48)/48 × 100% = +29%
Absolute change: 62 - 48 = 14 mm
Classification: Progressive Disease (≥20% AND ≥5 mm increase from nadir)
```
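
The target-lesion arithmetic above can be encoded as a small function. This is a simplified sketch: it looks only at the sum of diameters and the presence of new lesions, treats CR as a sum of 0 mm (ignoring the rule that nodal targets only need to shrink below 10 mm short axis), and does not handle non-target lesions or confirmation scans. The function name and arguments are hypothetical.

```python
def classify_target_response(baseline_sld, nadir_sld, current_sld,
                             new_lesions=False):
    """Simplified RECIST 1.1 target-lesion category.

    baseline_sld -- sum of diameters at baseline (mm)
    nadir_sld    -- smallest sum recorded before this visit, baseline included
    current_sld  -- sum of diameters at this visit (mm)
    """
    increase = current_sld - nadir_sld
    # PD: >= 20% relative AND >= 5 mm absolute increase over nadir, or new lesions
    if new_lesions or (5 * increase >= nadir_sld and increase >= 5):
        return "PD"
    if current_sld == 0:
        return "CR"
    # PR: >= 30% decrease from baseline (integer-safe form of current <= 0.7 * baseline)
    if 10 * current_sld <= 7 * baseline_sld:
        return "PR"
    return "SD"


# The worked example: baseline 80 mm, then 52, 48 and 62 mm
print(classify_target_response(80, 80, 52))  # week 6  -> PR
print(classify_target_response(80, 52, 48))  # week 12 -> PR
print(classify_target_response(80, 48, 62))  # week 18 -> PD
```

Multiplying instead of dividing keeps the threshold comparisons exact when diameters are whole millimetres, so a decrease of exactly 30% is classified as PR.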

### iRECIST (Immune RECIST)

**Purpose**: Account for atypical response patterns with immunotherapy

**Modifications from RECIST v1.1**

**iUPD (Immune Unconfirmed Progressive Disease)**

- Initial increase in tumor burden or new lesions
- Requires confirmation at the next assessment (≥4 weeks later)
- Continue treatment if the patient is clinically stable

**iCPD (Immune Confirmed Progressive Disease)**

- Progression confirmed at repeat imaging
- Discontinue immunotherapy

**Pseudoprogression**

- Initial apparent progression followed by response
- Mechanism: Immune cell infiltration increases apparent tumor size
- Incidence: 5-10% of patients on immunotherapy
- Management: Continue treatment if the patient is clinically stable

**New Lesions**

- Record size and location but continue treatment
- Do not automatically classify as PD
- Confirm progression if new lesions grow or additional new lesions appear

### Other Response Criteria

**Lugano Classification (Lymphoma)**

- **PET-based**: Deauville 5-point scale
  - Score 1-3: Negative (complete metabolic response)
  - Score 4-5: Positive (residual metabolically active disease)
- **CT-based**: If PET is not available
- **Bone marrow**: Required for staging in some lymphomas

**RANO (Response Assessment in Neuro-Oncology)**

- **Glioblastoma-specific**: Accounts for pseudoprogression after radiation/temozolomide
- **Enhancing disease**: Bidimensional measurements (product of perpendicular diameters)
- **Non-enhancing disease**: FLAIR changes assessed separately
- **Corticosteroid dose**: Must be documented; an increase may indicate progression

**mRECIST (Modified RECIST for HCC)**

- **Viable tumor**: Enhancing portion only (arterial phase enhancement)
- **Necrosis**: Non-enhancing areas excluded from measurements
- **Application**: Hepatocellular carcinoma with arterial enhancement

## Outcome Metrics

### Efficacy Endpoints

**Overall Survival (OS)**

- **Definition**: Time from randomization/treatment start to death from any cause
- **Advantages**: Objective, not subject to assessment bias, regulatory gold standard
- **Disadvantages**: Requires long follow-up; affected by subsequent therapies
- **Censoring**: At the date last known alive
- **Analysis**: Kaplan-Meier, log-rank test, Cox regression

**Progression-Free Survival (PFS)**

- **Definition**: Time from randomization to progression (per RECIST) or death from any cause, whichever occurs first
- **Advantages**: Earlier readout than OS; not confounded by subsequent therapies
- **Disadvantages**: Requires regular imaging; sensitive to the timing and frequency of assessments
- **Censoring**: At the date of the last tumor assessment without progression
- **Sensitivity Analysis**: Assess the impact of censoring assumptions

**Objective Response Rate (ORR)**

- **Definition**: Proportion of patients whose best overall response is CR or PR
- **Denominator**: Evaluable patients (measurable disease at baseline)
- **Reporting**: Percentage with 95% CI (exact binomial method)
- **Duration**: Summarized separately as duration of response (DOR)
- **Advantage**: Binary endpoint, no censoring complications

**Disease Control Rate (DCR)**

- **Definition**: Proportion with best response of CR, PR, or SD (stable disease lasting ≥6-8 weeks)
- **Less Stringent**: Captures clinical benefit beyond objective response
- **Reporting**: Percentage with 95% CI

**Duration of Response (DOR)**

- **Definition**: Time from first documented CR or PR to progression or death (among responders only)
- **Population**: Subset analysis of responders
- **Analysis**: Kaplan-Meier among responders
- **Reporting**: Median DOR with 95% CI

**Time to Treatment Failure (TTF)**

- **Definition**: Time from treatment start to discontinuation for any reason (progression, toxicity, death, patient choice)
- **Advantage**: Reflects real-world treatment duration
- **Components**: PFS events plus discontinuations for toxicity, patient choice, or other reasons

### Safety Endpoints

**Adverse Events (CTCAE v5.0)**

**Grading**

- **Grade 1**: Mild; asymptomatic or mild symptoms; clinical or diagnostic observations only; intervention not indicated
- **Grade 2**: Moderate; minimal, local, or noninvasive intervention indicated; limiting age-appropriate instrumental ADL
- **Grade 3**: Severe or medically significant but not immediately life-threatening; hospitalization or prolongation of hospitalization indicated; disabling; limiting self-care ADL
- **Grade 4**: Life-threatening consequences; urgent intervention indicated
- **Grade 5**: Death related to the adverse event

**Reporting Standards**

Adverse Event Summary Table (values are n (%); N = 50 per arm):

| AE Term (MedDRA) | Any Grade, Trt A | Any Grade, Trt B | Grade 3-4, Trt A | Grade 3-4, Trt B | Grade 5, Trt A | Grade 5, Trt B |
|---|---|---|---|---|---|---|
| **Hematologic** | | | | | | |
| Anemia | 45 (90%) | 42 (84%) | 8 (16%) | 6 (12%) | 0 | 0 |
| Neutropenia | 35 (70%) | 38 (76%) | 15 (30%) | 18 (36%) | 0 | 0 |
| Thrombocytopenia | 28 (56%) | 25 (50%) | 6 (12%) | 4 (8%) | 0 | 0 |
| Febrile neutropenia | 4 (8%) | 6 (12%) | 4 (8%) | 6 (12%) | 0 | 0 |
| **Gastrointestinal** | | | | | | |
| Nausea | 42 (84%) | 40 (80%) | 2 (4%) | 1 (2%) | 0 | 0 |
| Diarrhea | 31 (62%) | 28 (56%) | 5 (10%) | 3 (6%) | 0 | 0 |
| Mucositis | 18 (36%) | 15 (30%) | 3 (6%) | 2 (4%) | 0 | 0 |
| **Any AE** | 50 (100%) | 50 (100%) | 38 (76%) | 35 (70%) | 1 (2%) | 0 |

**Serious Adverse Events (SAEs)**

- SAE incidence and type
- Relationship to treatment (related vs unrelated)
- Outcome (resolved, ongoing, fatal)
- Causality assessment (definite, probable, possible, unlikely, unrelated)

**Treatment Modifications**

- Dose reductions: n (%), reason
- Dose delays: n (%), duration
- Discontinuations: n (%), reason (toxicity vs progression vs other)
- Relative dose intensity: (delivered dose intensity / planned dose intensity) × 100%, where dose intensity is dose per unit time

## Statistical Analysis Methods

### Comparing Continuous Outcomes

**Independent Samples t-test**

- **Application**: Compare means between two independent groups (approximately normally distributed)
- **Assumptions**: Normal distribution, equal variances (otherwise use Welch's t-test)
- **Reporting**: Mean ± SD for each group, mean difference (95% CI), t-statistic, df, p-value
- **Example**: Mean age 62.3 ± 8.4 vs 58.7 ± 9.1 years, difference 3.6 years (95% CI 0.2-7.0, p=0.038)
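
Welch's version of the t-test needs only each group's mean, SD, and size. The sketch below computes the t statistic, the Welch-Satterthwaite degrees of freedom, and the standard error of the difference; the function name is hypothetical, and because the example above does not state group sizes, n = 50 per group is an assumption, so the output will not exactly reproduce the quoted CI and p-value.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch t statistic, Welch-Satterthwaite df and SE of the mean difference."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2  # squared standard errors
    se = math.sqrt(v1 + v2)
    t = (mean1 - mean2) / se
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df, se


# Summary statistics from the age example; n = 50 per group is assumed
t, df, se = welch_t(62.3, 8.4, 50, 58.7, 9.1, 50)
print(f"t = {t:.2f}, df = {df:.1f}, SE = {se:.2f}")
```

The p-value and CI then come from the t distribution with that df, for example via `scipy.stats.ttest_ind_from_stats(..., equal_var=False)` if SciPy is available.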

**Mann-Whitney U Test (Wilcoxon Rank-Sum)**

- **Application**: Compare two independent groups when data are skewed or ordinal; the test asks whether values in one group tend to be larger, and is a comparison of medians only when both distributions have the same shape
- **Non-parametric**: No normality assumption; observations must be independent
- **Reporting**: Median [IQR] for each group, Hodges-Lehmann estimate of the median difference, U-statistic, p-value
- **Example**: Median time to response 6.2 [4.1-8.3] vs 8.5 [5.9-11.2] weeks, p=0.042
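
The U statistic is the rank sum of one group minus its smallest possible value, which equals the number of (x, y) pairs in which x exceeds y, with ties counting one half. The sketch below uses mid-ranks for ties and adds the Hodges-Lehmann median difference; function names are hypothetical, and p-values are best taken from a library such as `scipy.stats.mannwhitneyu`.

```python
from statistics import median


def mann_whitney_u(x, y):
    """U statistic for sample x versus sample y, using mid-ranks for ties."""
    pooled = sorted(x + y)
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2  # mean of 1-based positions i+1..j
        i = j
    rank_sum_x = sum(ranks[v] for v in x)
    return rank_sum_x - len(x) * (len(x) + 1) / 2


def hodges_lehmann(x, y):
    """Median of all pairwise differences x_i - y_j (location-shift estimate)."""
    return median(a - b for a in x for b in y)


# Hypothetical times to response (weeks) in two groups
early = [4.1, 5.0, 6.2, 7.4, 8.3]
late = [5.9, 7.8, 8.5, 10.0, 11.2]
print(mann_whitney_u(early, late), hodges_lehmann(early, late))
```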

**ANOVA (Analysis of Variance)**

- **Application**: Compare means across three or more groups
- **Output**: F-statistic with numerator and denominator df, p-value (overall test)
- **Post-hoc**: If significant, pairwise comparisons with Tukey or Bonferroni correction
- **Example**: Mean outcome differed across three biomarker subgroups (F=4.32, numerator df=2, p=0.016); report the denominator (residual) df as well
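
The F statistic is the ratio of the between-group mean square to the within-group mean square. A minimal sketch (hypothetical function name) that also returns both degrees of freedom, since they should be reported together; the p-value can then be taken from the F distribution, for example `scipy.stats.f.sf(F, df1, df2)`.

```python
def one_way_anova_f(*groups):
    """One-way ANOVA: returns (F, df_between, df_within)."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    # Between-group sum of squares: group size x squared deviation of group mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: deviations from each group's own mean
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


print(one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```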

### Comparing Categorical Outcomes

**Chi-Square Test for Independence**

- **Application**: Compare proportions between two or more groups
- **Assumptions**: Expected count ≥5 in at least 80% of cells, and no expected count <1
- **Reporting**: n (%) for each cell, χ², df, p-value
- **Example**: ORR 45% vs 30%, χ²=6.21, df=1, p=0.013

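
For a 2×2 table the Pearson statistic has a closed form, and with df = 1 its upper-tail probability can be computed with the complementary error function, so no statistics library is needed. This is a sketch only: the function names are hypothetical, no continuity correction is applied, and the cell counts below are illustrative.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square without continuity correction for [[a, b], [c, d]].

    Returns (chi_square, p_value) with df = 1.
    """
    n = a + b + c + d
    chi_square = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return chi_square, math.erfc(math.sqrt(chi_square / 2))


def min_expected_count(a, b, c, d):
    """Smallest expected cell count (row total x column total / n)."""
    n = a + b + c + d
    return min(r * k / n for r in (a + b, c + d) for k in (a + c, b + d))


# Illustrative table: responders / non-responders in two arms of 30
print(chi_square_2x2(10, 20, 20, 10), min_expected_count(10, 20, 20, 10))
```

Checking the smallest expected count first tells you whether the chi-square approximation is reasonable or an exact test is preferable.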

**Fisher's Exact Test**

- **Application**: 2×2 tables, particularly when any expected count is <5
- **Exact p-value**: No large-sample approximation
- **Two-sided vs one-sided**: Typically report two-sided
- **Example**: SAE rate 3/20 (15%) vs 8/22 (36%), two-sided Fisher's exact p=0.166
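
Fisher's test conditions on the table margins and sums the hypergeometric probabilities of all tables that are no more likely than the observed one. The sketch below (hypothetical function name) applies this to the SAE example arranged as [[3, 17], [8, 14]] (SAE yes/no in each arm); working with exact integer weights keeps the "no more likely" comparison free of rounding error. Library implementations such as `scipy.stats.fisher_exact` are preferable in practice.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def weight(k):
        # Proportional to P(top-left cell = k) given fixed margins
        return comb(row1, k) * comb(n - row1, col1 - k)

    observed = weight(a)
    k_min, k_max = max(0, col1 - (n - row1)), min(row1, col1)
    as_extreme = sum(weight(k) for k in range(k_min, k_max + 1)
                     if weight(k) <= observed)
    return as_extreme / comb(n, col1)


# SAE example: 3/20 vs 8/22
print(round(fisher_exact_two_sided(3, 17, 8, 14), 3))  # 0.166
```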

**McNemar's Test**

- **Application**: Paired categorical data (before/after, matched pairs); only the discordant pairs contribute to the test
- **Example**: Response before vs after treatment switch in the same patients
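
With b patients who changed from response to non-response and c who changed the other way, McNemar's statistic is (|b - c| - 1)² / (b + c) with the usual continuity correction, referred to a chi-square distribution with 1 df. A minimal sketch with a hypothetical function name and illustrative counts:

```python
import math


def mcnemar(b, c, continuity=True):
    """McNemar chi-square (df = 1) and p-value from discordant pair counts.

    b -- pairs positive before and negative after
    c -- pairs negative before and positive after
    """
    diff = abs(b - c)
    if continuity:
        diff = max(diff - 1, 0)  # continuity correction
    chi_square = diff ** 2 / (b + c)
    return chi_square, math.erfc(math.sqrt(chi_square / 2))


# Illustrative: 10 patients lost a response after the switch, 3 gained one
print(mcnemar(10, 3))
```

Concordant pairs (response both times, or neither time) do not appear in the formula, which is why only b and c are passed in.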

### Sample Size and Power

**Power Analysis Components**

- **Alpha (α)**: Type I error rate, typically 0.05 (two-sided)
- **Beta (β)**: Type II error rate, typically 0.10 or 0.20
- **Power**: 1 - β, typically 0.80 or 0.90 (80-90% power)
- **Effect size**: Expected difference (HR, mean difference, proportion difference)
- **Sample size**: Number of patients or events needed

**Survival Study Sample Size**

- Events-driven: Power depends on the number of events (deaths, progressions), not on the number of patients enrolled
- Example: With 1:1 randomization, detecting HR = 0.70 with 80% power (α = 0.05, two-sided) requires approximately 247 events (Schoenfeld approximation)
- Accrual time plus follow-up time determines the calendar time needed to observe those events
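
The number of events for a log-rank comparison is commonly approximated with Schoenfeld's formula, D = (z₁₋α/₂ + z₁₋β)² / (p(1 - p) × (ln HR)²), where p is the proportion randomized to one arm. A sketch using only the standard library (`statistics.NormalDist`); the function name is hypothetical, and a real protocol should rely on dedicated sample-size software.

```python
import math
from statistics import NormalDist


def schoenfeld_events(hr, alpha=0.05, power=0.80, allocation=0.5):
    """Events needed for a two-sided log-rank test (Schoenfeld approximation).

    allocation -- proportion of patients randomized to the experimental arm
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    d = (z_alpha + z_beta) ** 2 / (allocation * (1 - allocation) * math.log(hr) ** 2)
    return math.ceil(d)


print(schoenfeld_events(0.70))              # 247 events at 80% power
print(schoenfeld_events(0.70, power=0.90))  # 331 events at 90% power
```

The number of patients to enroll then follows from the expected event probability over the planned accrual and follow-up period.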

**Response Rate Study**

```
Example: Detect ORR difference 45% vs 30% (15 percentage points)
- α = 0.05 (two-sided)
- Power = 0.80
- Sample size: n = 163 per group (326 total), normal approximation with pooled variance
- With 10% dropout: n = 182 per group (364 total), i.e., 163 / 0.90 rounded up
```
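
The per-group sample size for comparing two proportions with a two-sided normal-approximation test can be sketched as below (pooled variance under the null hypothesis, unpooled under the alternative). Function names are hypothetical; other formulas, for example with a continuity correction, give somewhat larger numbers.

```python
import math
from statistics import NormalDist


def n_per_group_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Per-group n for a two-sided test of p1 vs p2 (normal approximation)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)


def inflate_for_dropout(n, dropout):
    """Enrolment target so that n evaluable patients remain after dropout."""
    return math.ceil(n / (1 - dropout))


n = n_per_group_two_proportions(0.45, 0.30)
print(n, inflate_for_dropout(n, 0.10))  # 163 182
```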

## Data Visualization

### Survival Curves

**Kaplan-Meier Plot Best Practices**

Key elements for a publication-quality survival curve:

1. X-axis: Time (months or years), starting at 0
2. Y-axis: Survival probability (0 to 1.0 or 0% to 100%)
3. Step function: Survival curve with steps at event times
4. 95% CI bands: Shaded region around the survival curve (optional but recommended)
5. Number-at-risk table: Below the x-axis, showing n at risk at regular time intervals
6. Censoring marks: Vertical tick marks (|) at censored observations
7. Legend: Clearly identifies each curve
8. Log-rank p-value: Prominently displayed
9. Median survival: Horizontal reference line at 0.50, labeled
10. Follow-up: Median follow-up time reported

**Number at Risk Table Format**

```
Number at risk
Group A   50   42   35   28   18   10    5
Group B   48   38   29   19   12    6    2
Time       0    6   12   18   24   30   36  (months)
```
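
The counts in a number-at-risk table are the patients whose follow-up has not yet ended (no event and not censored) at each time mark. The sketch below assumes the common convention that a patient whose follow-up ends exactly at a mark is still counted there; check which convention your plotting software uses. Function names and the small dataset are hypothetical.

```python
def number_at_risk(times, checkpoints):
    """Patients still under observation (follow-up >= checkpoint) at each checkpoint."""
    return [sum(1 for t in times if t >= c) for c in checkpoints]


def format_risk_table(groups, checkpoints, unit="months"):
    """Render a plain-text number-at-risk table, one row per group."""
    label_width = max(len(name) for name in list(groups) + ["Time"]) + 2
    lines = ["Number at risk"]
    for name, times in groups.items():
        counts = number_at_risk(times, checkpoints)
        lines.append(name.ljust(label_width) + "".join(f"{n:>5}" for n in counts))
    lines.append("Time".ljust(label_width)
                 + "".join(f"{c:>5}" for c in checkpoints) + f"  ({unit})")
    return "\n".join(lines)


# Hypothetical follow-up times in months (event or censoring) for two groups
groups = {"Group A": [2, 5, 6, 7, 12, 13, 20, 30],
          "Group B": [1, 3, 6, 9, 10, 18, 25]}
print(format_risk_table(groups, [0, 6, 12, 18, 24, 30, 36]))
```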

**Hazard Ratio Annotation**

```
On plot: HR 0.62 (95% CI 0.43-0.89), p=0.010
Or in caption: Log-rank test p=0.010; Cox model HR=0.62 (95% CI 0.43-0.89)
```

### Waterfall Plots

**Purpose**: Visualize individual patient responses to treatment

**Construction**

- **X-axis**: Individual patients (anonymized patient IDs)
- **Y-axis**: Best % change from baseline tumor burden
- **Bars**: Vertical bars, one per patient
  - Positive values: Tumor growth
  - Negative values: Tumor shrinkage
- **Ordering**: Sorted from best response (left) to worst (right)
- **Color coding**:
  - Green/blue: CR or PR (≥30% decrease)
  - Yellow: SD (between -30% and +20%)
  - Red: PD (≥20% increase)
- **Reference lines**: Horizontal lines at +20% (PD), -30% (PR)
- **Annotations**: Biomarker status, response duration (symbols)

**Example Annotations**

```
■ = Biomarker-positive
○ = Biomarker-negative
* = Ongoing response
† = Progressed
```

### Forest Plots

**Purpose**: Display subgroup analyses with hazard ratios and confidence intervals

**Construction**

- **Y-axis**: Subgroup categories
- **X-axis**: Hazard ratio (log scale), vertical line at HR=1.0
- **Points**: HR estimate for each subgroup
- **Horizontal lines**: 95% confidence interval
- **Square size**: Proportional to sample size or precision
- **Overall effect**: Diamond at bottom, width represents 95% CI

**Subgroups to Display**

| Subgroup | n | HR (95% CI) |
|---|---|---|
| Overall | 300 | 0.65 (0.48-0.88) |
| **Age** | | |
| <65 years | 180 | 0.58 (0.39-0.86) |
| ≥65 years | 120 | 0.78 (0.49-1.24) |
| **Sex** | | |
| Male | 175 | 0.62 (0.43-0.90) |
| Female | 125 | 0.70 (0.44-1.12) |
| **Biomarker Status** (p-interaction=0.041) | | |
| Positive | 140 | 0.45 (0.28-0.72) |
| Negative | 160 | 0.89 (0.59-1.34) |

Each row is plotted against a log-scale hazard ratio axis (ticks at 0.25, 0.5, 1.0, 2.0), labeled "Favors A" to the left of HR = 1.0 and "Favors B" to the right.

**Interaction Testing**

- Test whether the treatment effect differs across subgroups (e.g., with a treatment × subgroup interaction term in the Cox model)
- p-interaction <0.05 suggests heterogeneity of the treatment effect
- Pre-specify subgroups to avoid data dredging
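
When only published subgroup HRs and CIs are available, a rough check for heterogeneity compares the two log hazard ratios with a z-test, recovering each standard error from the CI width. This is an approximation with hypothetical function names; it will not necessarily reproduce the tabulated p-interaction, which would normally come from an interaction term fitted in the Cox model on individual patient data.

```python
import math


def log_hr_and_se(hr, lower, upper, z=1.96):
    """Recover log(HR) and its SE from a reported 95% CI."""
    return math.log(hr), (math.log(upper) - math.log(lower)) / (2 * z)


def interaction_z_test(subgroup_a, subgroup_b):
    """Compare two subgroup HRs, each given as (HR, lower, upper).

    Returns (ratio of HRs, z, two-sided p-value)."""
    log_a, se_a = log_hr_and_se(*subgroup_a)
    log_b, se_b = log_hr_and_se(*subgroup_b)
    z = (log_a - log_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return math.exp(log_a - log_b), z, p_value


# Biomarker-positive vs biomarker-negative rows of the table above
ratio, z, p = interaction_z_test((0.45, 0.28, 0.72), (0.89, 0.59, 1.34))
print(f"ratio of HRs {ratio:.2f}, z = {z:.2f}, p = {p:.3f}")
```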

### Spider Plots

**Purpose**: Display longitudinal tumor burden changes over time for individual patients

**Construction**

- **X-axis**: Time from treatment start (weeks or months)
- **Y-axis**: % change from baseline tumor burden
- **Lines**: One line per patient connecting assessments
- **Color coding**: By response category or biomarker status
- **Reference lines**: 0% (no change), +20% (PD threshold), -30% (PR threshold)

**Clinical Insights**

- Identify delayed responders (initial SD then PR)
- Detect early progression (rapid upward trajectory)
- Assess depth of response (maximum tumor shrinkage)
- Duration visualization (when lines cross PD threshold)

### Swimmer Plots

**Purpose**: Display treatment duration and response for individual patients

**Construction**

- **X-axis**: Time from treatment start (weeks or months)
- **Y-axis**: Individual patients (one row per patient)
- **Bars**: Horizontal bars representing treatment duration
- **Symbols**:
  - ● Start of treatment
  - ▼ Ongoing treatment (arrow)
  - ■ Progressive disease (end of bar)
  - ◆ Death
  - | Dose modification
- **Color**: Response status (CR=green, PR=blue, SD=yellow, PD=red)

**Example**

```
Patient ID    |0   3   6   9   12   15   18   21   24 months
──────────────|──────────────────────────────────────────
Pt-001        ●═══PR═══════════|════════PR══════════▼
Pt-002        ●═══PR═══════════════PD■
Pt-003        ●══════SD══════════PD■
Pt-004        ●PR══════════════════════════════════PR▼
...
```

## Confidence Intervals

### Interpretation

**95% Confidence Interval**

- Range of values for the true population parameter that are compatible with the data
- If the study were repeated many times, about 95% of the 95% CIs constructed this way would contain the true value
- **Not**: a 95% probability that the true value lies within this particular interval (a frequentist, not Bayesian, interpretation)

**Relationship to p-value**

- When the CI and the test are based on the same method: if the 95% CI excludes the null value (HR=1.0, difference=0), p<0.05
- If the 95% CI includes the null value, p≥0.05
- The CI conveys more information: the magnitude and precision of the effect

**Precision**

- **Narrow CI**: High precision, typically from a large sample size or many events
- **Wide CI**: Low precision, from a small sample size or high variability
- **Example**: HR 0.65 (95% CI 0.62-0.68) is very precise; HR 0.65 (0.30-1.40) is imprecise

### Calculation Methods

**Hazard Ratio CI**

- From Cox regression output
- Computed on the log scale and then exponentiated: exp(log(HR) ± 1.96 × SE(log HR))
- Example: HR=0.62, SE(log HR)=0.185 → 95% CI (0.43, 0.89)
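
The log-scale calculation above, together with the matching Wald p-value, takes only a few lines. The function names are hypothetical, and z = 1.96 is assumed for a 95% interval.

```python
import math


def hr_confidence_interval(hr, se_log_hr, z=1.96):
    """CI for a hazard ratio: exponentiate log(HR) -/+ z * SE(log HR)."""
    log_hr = math.log(hr)
    return math.exp(log_hr - z * se_log_hr), math.exp(log_hr + z * se_log_hr)


def wald_p_value(hr, se_log_hr):
    """Two-sided Wald p-value for H0: HR = 1 (normal approximation)."""
    z = math.log(hr) / se_log_hr
    return math.erfc(abs(z) / math.sqrt(2))


lower, upper = hr_confidence_interval(0.62, 0.185)
print(f"HR 0.62 (95% CI {lower:.2f}-{upper:.2f}), p = {wald_p_value(0.62, 0.185):.3f}")
# HR 0.62 (95% CI 0.43-0.89), p = 0.010
```

Because the interval is symmetric on the log scale, it is asymmetric around the HR itself, and the HR is the geometric mean of its two limits.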
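
**Survival Rate CI (Greenwood Formula)**

- SE(S(t)) = S(t) × sqrt(Σ[d_i / (n_i × (n_i - d_i))]), summed over event times t_i ≤ t, where d_i is the number of events and n_i the number at risk at t_i
- 95% CI: S(t) ± 1.96 × SE(S(t)), truncated to [0, 1]
- A complementary log-log transformation, log(-log S(t)), gives intervals that stay within [0, 1] and behave better in small samples and near the tails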
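
A sketch of Kaplan-Meier estimation with Greenwood standard errors, returning both the plain interval and the log(-log) interval described above. Names are hypothetical; the plain interval is clipped to [0, 1], and where S(t) = 0 the log(-log) interval is undefined, so the point estimate is returned in its place.

```python
import math
from collections import Counter


def km_greenwood(times, events, z=1.96):
    """Kaplan-Meier estimate with Greenwood SE and two 95% CIs.

    Returns rows of (time, S, SE, plain_ci, log_log_ci).
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    s, greenwood_sum, rows = 1.0, 0.0, []
    for t in sorted(deaths):
        n = sum(1 for ti in times if ti >= t)
        d = deaths[t]
        s *= (n - d) / n
        if n > d:
            greenwood_sum += d / (n * (n - d))
        se = s * math.sqrt(greenwood_sum)
        plain = (max(0.0, s - z * se), min(1.0, s + z * se))
        if 0 < s < 1:
            # CI for log(-log S), back-transformed: S ** exp(+/- z * sigma)
            sigma = math.sqrt(greenwood_sum) / abs(math.log(s))
            log_log = (s ** math.exp(z * sigma), s ** math.exp(-z * sigma))
        else:
            log_log = (s, s)
        rows.append((t, s, se, plain, log_log))
    return rows


for t, s, se, plain, log_log in km_greenwood([1, 2, 3, 4, 5], [1] * 5):
    print(t, round(s, 2), round(se, 3), plain, log_log)
```

Without censoring, the Greenwood SE reduces to the binomial SE sqrt(S(1 - S)/n), which is a convenient sanity check.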

**Proportion CI (Exact Binomial)**

- For ORR and DCR: Use the exact (Clopper-Pearson) method, especially for small samples
- Wilson score interval: Better coverage than the normal (Wald) approximation and narrower than the exact method
- Example: 12/30 responses → ORR 40% (95% CI 22.7-59.4%, Clopper-Pearson)
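
Both intervals can be computed with the standard library. Here the Clopper-Pearson limits are found by bisection on the binomial CDF rather than from beta-distribution quantiles (equivalent in principle); function names are hypothetical, and `statsmodels.stats.proportion.proportion_confint` offers both methods if that package is available.

```python
import math
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(x, n, alpha=0.05):
    """Exact binomial CI by bisection on the binomial CDF."""
    def solve(f):
        # f is increasing in p; find the root on [0, 1]
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if f(mid) < 0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower limit: P(X >= x | p) = alpha/2; upper limit: P(X <= x | p) = alpha/2
    lower = 0.0 if x == 0 else solve(lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2)
    upper = 1.0 if x == n else solve(lambda p: alpha / 2 - binom_cdf(x, n, p))
    return lower, upper


def wilson(x, n, z=1.96):
    """Wilson score interval."""
    p = x / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z / denom * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2))
    return centre - half, centre + half


print(clopper_pearson(12, 30), wilson(12, 30))
```

For 12/30 the exact limits round to 22.7% and 59.4%, while the Wilson interval is somewhat narrower.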

## Censoring and Missing Data

### Types of Censoring

**Right Censoring**

- **End of study**: Patient event-free at study termination (administrative censoring)
- **Loss to follow-up**: Patient stops attending visits
- **Withdrawal**: Patient withdraws consent
- **Competing risk**: Death from an unrelated cause (when analyzing disease-specific survival)

**Handling Censoring**

- **Assumption**: Non-informative; censoring is independent of the probability of the event
- **Sensitivity Analysis**: Assess the impact if this assumption is violated
  - Best case: All censored patients never progress
  - Worst case: All censored patients progress immediately after censoring
  - The true result should lie between the best-case and worst-case estimates

### Missing Data

**Mechanisms**

- **MCAR (Missing Completely at Random)**: Missingness unrelated to any variable, observed or unobserved
- **MAR (Missing at Random)**: Missingness depends only on observed variables, not on the unobserved values
- **NMAR (Not Missing at Random, also called MNAR)**: Missingness depends on the missing value itself

**Handling Strategies**

- **Complete case analysis**: Exclude patients with missing data (can be biased unless data are MCAR)
- **Multiple imputation**: Generate multiple plausible datasets, analyze each, and pool the results
- **Maximum likelihood**: Estimate parameters using all available data
- **Sensitivity analysis**: Assess robustness to missing data assumptions

**Response Assessment Missing Data**

- **Unevaluable for response**: Measurable disease at baseline but post-baseline assessment missing
  - Exclude from the ORR denominator, or count as non-responder (sensitivity analysis)
- **PFS censoring**: Censor at the last adequate tumor assessment date if later assessments are missing

## Reporting Standards

### CONSORT Statement (RCTs)

**Flow Diagram**

- Assessed for eligibility (n=)
- Randomized (n=)
- Allocated to intervention (n=)
- Lost to follow-up (n=, reasons)
- Discontinued intervention (n=, reasons)
- Analyzed (n=)

**Baseline Table**

- Demographics and clinical characteristics
- Baseline prognostic factors
- Show balance between arms

**Outcomes Table**

- Primary endpoint results with CI and p-value
- Secondary endpoints
- Safety summary

### STROBE Statement (Observational Studies)

**Study Design**: Cohort, case-control, or cross-sectional

**Participants**: Eligibility, sources, selection methods, sample size

**Variables**: Clearly define outcomes, exposures, predictors, confounders

**Statistical Methods**: Describe all methods, handling of missing data, sensitivity analyses

**Results**: Participant flow, descriptive data, outcome data, main results, other analyses

### Reproducible Research Practices

**Statistical Analysis Plan (SAP)**

- Pre-specify all analyses before database lock
- Primary and secondary endpoints
- Analysis populations (ITT, per-protocol, safety)
- Statistical tests and models
- Subgroup analyses (pre-specified)
- Interim analyses (if planned)
- Multiple testing procedures

**Transparency**

- Report all pre-specified analyses
- Distinguish pre-specified from post-hoc exploratory analyses
- Report both positive and negative results
- Provide access to anonymized individual patient data (when possible)

## Software and Tools

### R Packages for Survival Analysis

- **survival**: Core package (Surv, survfit, coxph, survdiff)
- **survminer**: Publication-ready Kaplan-Meier plots (ggsurvplot)
- **rms**: Regression modeling strategies
- **flexsurv**: Flexible parametric survival models

### Python Libraries

- **lifelines**: Kaplan-Meier, Cox regression, survival curves
- **scikit-survival**: Machine learning for survival analysis
- **matplotlib**: Custom survival curve plotting

### Statistical Software

- **R**: Most comprehensive for survival analysis
- **Stata**: Medical statistics, good for epidemiology
- **SAS**: Industry standard for clinical trials
- **GraphPad Prism**: User-friendly for basic analyses
- **SPSS**: Point-and-click interface, limited survival features