# Competing Risks Analysis ## Overview Competing risks occur when subjects can experience one of several mutually exclusive events (event types). When one event occurs, it prevents ("competes with") the occurrence of other events. ### Examples of Competing Risks **Medical Research:** - Death from cancer vs. death from cardiovascular disease vs. death from other causes - Relapse vs. death without relapse in cancer studies - Different types of infections in transplant patients **Other Applications:** - Job termination: retirement vs. resignation vs. termination for cause - Equipment failure: different failure modes - Customer churn: different reasons for leaving ### Key Concept: Cumulative Incidence Function (CIF) The **Cumulative Incidence Function (CIF)** represents the probability of experiencing a specific event type by time *t*, accounting for the presence of competing risks. **CIF_k(t) = P(T ≤ t, event type = k)** This differs from the Kaplan-Meier estimator, which would overestimate event probabilities when competing risks are present. ## When to Use Competing Risks Analysis **Use competing risks when:** - Multiple mutually exclusive event types exist - Occurrence of one event prevents others - Need to estimate probability of specific event types - Want to understand how covariates affect different event types **Don't use when:** - Only one event type of interest (standard survival analysis) - Events are not mutually exclusive (use recurrent events methods) - Competing events are extremely rare (can treat as censoring) ## Cumulative Incidence with Competing Risks ### cumulative_incidence_competing_risks Function Estimates the cumulative incidence function for each event type. ```python from sksurv.nonparametric import cumulative_incidence_competing_risks from sksurv.datasets import load_leukemia # Load data with competing risks X, y = load_leukemia() # y has event types: 0=censored, 1=relapse, 2=death # Compute cumulative incidence for each event type # Returns: time points, CIF for event 1, CIF for event 2, ... time_points, cif_1, cif_2 = cumulative_incidence_competing_risks(y) # Plot cumulative incidence functions import matplotlib.pyplot as plt plt.figure(figsize=(10, 6)) plt.step(time_points, cif_1, where='post', label='Relapse', linewidth=2) plt.step(time_points, cif_2, where='post', label='Death in remission', linewidth=2) plt.xlabel('Time (weeks)') plt.ylabel('Cumulative Incidence') plt.title('Competing Risks: Relapse vs Death') plt.legend() plt.grid(True, alpha=0.3) plt.show() ``` ### Interpretation - **CIF at time t**: Probability of experiencing that specific event by time t - **Sum of all CIFs**: Total probability of experiencing any event (all cause) - **1 - sum of CIFs**: Probability of being event-free and uncensored ## Data Format for Competing Risks ### Creating Structured Array with Event Types ```python import numpy as np from sksurv.util import Surv # Event types: 0 = censored, 1 = event type 1, 2 = event type 2 event_types = np.array([0, 1, 2, 1, 0, 2, 1]) times = np.array([10.2, 5.3, 8.1, 3.7, 12.5, 6.8, 4.2]) # Create survival array # For competing risks: event=True if any event occurred # Store event type separately or encode in the event field y = Surv.from_arrays( event=(event_types > 0), # True if any event time=times ) # Keep event_types for distinguishing between event types ``` ### Converting Data with Event Types ```python import pandas as pd from sksurv.util import Surv # Assume data has: time, event_type columns # event_type: 0=censored, 1=type1, 2=type2, etc. df = pd.read_csv('competing_risks_data.csv') # Create survival outcome y = Surv.from_arrays( event=(df['event_type'] > 0), time=df['time'] ) # Store event types event_types = df['event_type'].values ``` ## Comparing Cumulative Incidence Between Groups ### Stratified Analysis ```python from sksurv.nonparametric import cumulative_incidence_competing_risks import matplotlib.pyplot as plt # Split by treatment group mask_treatment = X['treatment'] == 'A' mask_control = X['treatment'] == 'B' y_treatment = y[mask_treatment] y_control = y[mask_control] # Compute CIF for each group time_trt, cif1_trt, cif2_trt = cumulative_incidence_competing_risks(y_treatment) time_ctl, cif1_ctl, cif2_ctl = cumulative_incidence_competing_risks(y_control) # Plot comparison fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5)) # Event type 1 ax1.step(time_trt, cif1_trt, where='post', label='Treatment', linewidth=2) ax1.step(time_ctl, cif1_ctl, where='post', label='Control', linewidth=2) ax1.set_xlabel('Time') ax1.set_ylabel('Cumulative Incidence') ax1.set_title('Event Type 1') ax1.legend() ax1.grid(True, alpha=0.3) # Event type 2 ax2.step(time_trt, cif2_trt, where='post', label='Treatment', linewidth=2) ax2.step(time_ctl, cif2_ctl, where='post', label='Control', linewidth=2) ax2.set_xlabel('Time') ax2.set_ylabel('Cumulative Incidence') ax2.set_title('Event Type 2') ax2.legend() ax2.grid(True, alpha=0.3) plt.tight_layout() plt.show() ``` ## Statistical Testing with Competing Risks ### Gray's Test Compare cumulative incidence functions between groups using Gray's test (available in other packages like lifelines). ```python # Note: Gray's test not directly available in scikit-survival # Consider using lifelines or other packages # from lifelines.statistics import multivariate_logrank_test # result = multivariate_logrank_test(times, groups, events, event_of_interest=1) ``` ## Modeling with Competing Risks ### Approach 1: Cause-Specific Hazard Models Fit separate Cox models for each event type, treating other event types as censored. ```python from sksurv.linear_model import CoxPHSurvivalAnalysis from sksurv.util import Surv # Separate outcome for each event type # Event type 1: treat type 2 as censored y_event1 = Surv.from_arrays( event=(event_types == 1), time=times ) # Event type 2: treat type 1 as censored y_event2 = Surv.from_arrays( event=(event_types == 2), time=times ) # Fit cause-specific models cox_event1 = CoxPHSurvivalAnalysis() cox_event1.fit(X, y_event1) cox_event2 = CoxPHSurvivalAnalysis() cox_event2.fit(X, y_event2) # Interpret coefficients for each event type print("Event Type 1 (e.g., Relapse):") print(cox_event1.coef_) print("\nEvent Type 2 (e.g., Death):") print(cox_event2.coef_) ``` **Interpretation:** - Separate model for each competing event - Coefficients show effect on cause-specific hazard for that event type - A covariate may increase risk for one event type but decrease for another ### Approach 2: Fine-Gray Sub-distribution Hazard Model Models the cumulative incidence directly (not available directly in scikit-survival, but can use other packages). ```python # Note: Fine-Gray model not directly in scikit-survival # Consider using lifelines or rpy2 to access R's cmprsk package # from lifelines import CRCSplineFitter # crc = CRCSplineFitter() # crc.fit(df, event_col='event', duration_col='time') ``` ## Practical Example: Complete Competing Risks Analysis ```python import numpy as np import pandas as pd import matplotlib.pyplot as plt from sksurv.nonparametric import cumulative_incidence_competing_risks from sksurv.linear_model import CoxPHSurvivalAnalysis from sksurv.util import Surv # Simulate competing risks data np.random.seed(42) n = 200 # Create features age = np.random.normal(60, 10, n) treatment = np.random.choice(['A', 'B'], n) # Simulate event times and types # Event types: 0=censored, 1=relapse, 2=death times = np.random.exponential(100, n) event_types = np.zeros(n, dtype=int) # Higher age increases both events, treatment A reduces relapse for i in range(n): if times[i] < 150: # Event occurred # Probability of each event type p_relapse = 0.6 if treatment[i] == 'B' else 0.4 event_types[i] = 1 if np.random.rand() < p_relapse else 2 else: times[i] = 150 # Censored at study end # Create DataFrame df = pd.DataFrame({ 'time': times, 'event_type': event_types, 'age': age, 'treatment': treatment }) # Encode treatment df['treatment_A'] = (df['treatment'] == 'A').astype(int) # 1. OVERALL CUMULATIVE INCIDENCE print("=" * 60) print("OVERALL CUMULATIVE INCIDENCE") print("=" * 60) y_all = Surv.from_arrays(event=(df['event_type'] > 0), time=df['time']) time_points, cif_relapse, cif_death = cumulative_incidence_competing_risks(y_all) plt.figure(figsize=(10, 6)) plt.step(time_points, cif_relapse, where='post', label='Relapse', linewidth=2) plt.step(time_points, cif_death, where='post', label='Death', linewidth=2) plt.xlabel('Time (days)') plt.ylabel('Cumulative Incidence') plt.title('Competing Risks: Relapse vs Death') plt.legend() plt.grid(True, alpha=0.3) plt.show() print(f"5-year relapse incidence: {cif_relapse[-1]:.2%}") print(f"5-year death incidence: {cif_death[-1]:.2%}") # 2. STRATIFIED BY TREATMENT print("\n" + "=" * 60) print("CUMULATIVE INCIDENCE BY TREATMENT") print("=" * 60) for trt in ['A', 'B']: mask = df['treatment'] == trt y_trt = Surv.from_arrays( event=(df.loc[mask, 'event_type'] > 0), time=df.loc[mask, 'time'] ) time_trt, cif1_trt, cif2_trt = cumulative_incidence_competing_risks(y_trt) print(f"\nTreatment {trt}:") print(f" 5-year relapse: {cif1_trt[-1]:.2%}") print(f" 5-year death: {cif2_trt[-1]:.2%}") # 3. CAUSE-SPECIFIC MODELS print("\n" + "=" * 60) print("CAUSE-SPECIFIC HAZARD MODELS") print("=" * 60) X = df[['age', 'treatment_A']] # Model for relapse (event type 1) y_relapse = Surv.from_arrays( event=(df['event_type'] == 1), time=df['time'] ) cox_relapse = CoxPHSurvivalAnalysis() cox_relapse.fit(X, y_relapse) print("\nRelapse Model:") print(f" Age: HR = {np.exp(cox_relapse.coef_[0]):.3f}") print(f" Treatment A: HR = {np.exp(cox_relapse.coef_[1]):.3f}") # Model for death (event type 2) y_death = Surv.from_arrays( event=(df['event_type'] == 2), time=df['time'] ) cox_death = CoxPHSurvivalAnalysis() cox_death.fit(X, y_death) print("\nDeath Model:") print(f" Age: HR = {np.exp(cox_death.coef_[0]):.3f}") print(f" Treatment A: HR = {np.exp(cox_death.coef_[1]):.3f}") print("\n" + "=" * 60) ``` ## Important Considerations ### Censoring in Competing Risks - **Administrative censoring**: Subject still at risk at end of study - **Loss to follow-up**: Subject leaves study before event - **Competing event**: Other event occurred - NOT censored for CIF, but censored for cause-specific models ### Choosing Between Cause-Specific and Sub-distribution Models **Cause-Specific Hazard Models:** - Easier to interpret - Direct effect on hazard rate - Better for understanding etiology - Can fit with scikit-survival **Fine-Gray Sub-distribution Models:** - Models cumulative incidence directly - Better for prediction and risk stratification - More appropriate for clinical decision-making - Requires other packages ### Common Mistakes **Mistake 1**: Using Kaplan-Meier to estimate event-specific probabilities - **Wrong**: Kaplan-Meier for event type 1, treating type 2 as censored - **Correct**: Cumulative incidence function accounting for competing risks **Mistake 2**: Ignoring competing risks when they're substantial - If competing event rate > 10-20%, should use competing risks methods **Mistake 3**: Confusing cause-specific and sub-distribution hazards - They answer different questions - Use appropriate model for your research question ## Summary **Key Functions:** - `cumulative_incidence_competing_risks`: Estimate CIF for each event type - Fit separate Cox models for cause-specific hazards - Use stratified analysis to compare groups **Best Practices:** 1. Always plot cumulative incidence functions 2. Report both event-specific and overall incidence 3. Use cause-specific models in scikit-survival 4. Consider Fine-Gray models (other packages) for prediction 5. Be explicit about which events are competing vs censored