Competing Risks Analysis
Overview
Competing risks occur when subjects can experience one of several mutually exclusive event types. When one event occurs, it prevents ("competes with") the occurrence of the others.
Examples of Competing Risks
Medical Research:
- Death from cancer vs. death from cardiovascular disease vs. death from other causes
- Relapse vs. death without relapse in cancer studies
- Different types of infections in transplant patients
Other Applications:
- Job termination: retirement vs. resignation vs. termination for cause
- Equipment failure: different failure modes
- Customer churn: different reasons for leaving
Key Concept: Cumulative Incidence Function (CIF)
The Cumulative Incidence Function (CIF) represents the probability of experiencing a specific event type by time t, accounting for the presence of competing risks.
CIF_k(t) = P(T ≤ t, event type = k), where T is the time of the first event of any type
This differs from the complement of the Kaplan-Meier estimator (1 - KM) computed with competing events treated as censored, which overestimates event probabilities when competing risks are present.
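The overestimation can be seen on a small sample. The following is a minimal pure-Python sketch, not scikit-survival's implementation: the helper name `aalen_johansen` and the seven-subject toy data are assumptions made for illustration. It computes the Aalen-Johansen cumulative incidence for one event type next to the naive 1 - Kaplan-Meier estimate that treats the other event type as censored.

```python
def aalen_johansen(times, events, cause):
    """Return (event_times, cif, naive) for one event type.

    events: 0 = censored, positive integers = event type.
    cif   : Aalen-Johansen cumulative incidence for `cause`.
    naive : 1 - Kaplan-Meier with all other event types treated as censored.
    """
    event_times = sorted({t for t, e in zip(times, events) if e > 0})
    surv_any = 1.0   # probability of no event of any type before t
    km_cause = 1.0   # Kaplan-Meier product with other causes censored
    cif_value = 0.0
    cif, naive = [], []
    for t in event_times:
        at_risk = sum(1 for x in times if x >= t)
        d_cause = sum(1 for x, e in zip(times, events) if x == t and e == cause)
        d_any = sum(1 for x, e in zip(times, events) if x == t and e > 0)
        # Probability of reaching t event-free, then failing from `cause` at t
        cif_value += surv_any * d_cause / at_risk
        surv_any *= 1.0 - d_any / at_risk
        km_cause *= 1.0 - d_cause / at_risk
        cif.append(cif_value)
        naive.append(1.0 - km_cause)
    return event_times, cif, naive


# Toy sample (illustrative): 0 = censored, 1 and 2 = competing event types
times = [10.2, 5.3, 8.1, 3.7, 12.5, 6.8, 4.2]
events = [0, 1, 2, 1, 0, 2, 1]

_, cif_1, naive_1 = aalen_johansen(times, events, cause=1)
_, cif_2, naive_2 = aalen_johansen(times, events, cause=2)
print(f"Type 2 at end: CIF = {cif_2[-1]:.3f}, naive 1 - KM = {naive_2[-1]:.3f}")
```

In this sample the type 2 CIF ends at 2/7 while the naive estimate ends at 1/2, because the naive calculation removes subjects who already had a type 1 event from the risk set as if they could still fail from type 2 later.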
When to Use Competing Risks Analysis
Use competing risks when:
- Multiple mutually exclusive event types exist
- Occurrence of one event prevents others
- Need to estimate probability of specific event types
- Want to understand how covariates affect different event types
Don't use when:
- Only one event type of interest (standard survival analysis)
- Events are not mutually exclusive (use recurrent events methods)
- Competing events are extremely rare (can treat as censoring)
Cumulative Incidence with Competing Risks
cumulative_incidence_competing_risks Function
Estimates the cumulative incidence function for each event type.
from sksurv.nonparametric import cumulative_incidence_competing_risks
from sksurv.datasets import load_bmt
import matplotlib.pyplot as plt
# Load a dataset with competing risks (assumes a scikit-survival release
# that ships load_bmt and cumulative_incidence_competing_risks)
X, y = load_bmt()
# Per the dataset documentation, y["status"] holds integer event types:
# 0=censored, 1=transplant-related mortality, 2=relapse; y["ftime"] is the time
# Compute cumulative incidence for each event type
# Returns: unique time points and an array of shape (n_risks + 1, n_times);
# row 0 is the total (any-event) incidence, row k is the CIF for event type k
time_points, cif = cumulative_incidence_competing_risks(y["status"], y["ftime"])
cif_1, cif_2 = cif[1], cif[2]
# Plot cumulative incidence functions
plt.figure(figsize=(10, 6))
plt.step(time_points, cif_1, where='post', label='Transplant-related mortality', linewidth=2)
plt.step(time_points, cif_2, where='post', label='Relapse', linewidth=2)
plt.xlabel('Time')
plt.ylabel('Cumulative Incidence')
plt.title('Competing Risks: Transplant-Related Mortality vs Relapse')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
Interpretation
- CIF at time t: Probability of experiencing that specific event by time t
- Sum of all CIFs at time t: Probability of experiencing any event by time t (all-cause incidence)
- 1 - sum of CIFs at time t: Probability of remaining free of every event type through time t
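These three readings can be checked numerically. The sketch below assumes a hypothetical helper `step_value` and uses CIF values worked out by hand with the Aalen-Johansen estimator for a seven-subject toy sample (times 10.2, 5.3, 8.1, 3.7, 12.5, 6.8, 4.2 with event types 0, 1, 2, 1, 0, 2, 1); it is an illustration, not output from scikit-survival.

```python
from bisect import bisect_right


def step_value(event_times, values, t):
    """Evaluate a right-continuous step function that is 0 before its first jump."""
    idx = bisect_right(event_times, t)
    return values[idx - 1] if idx > 0 else 0.0


# Hand-computed CIFs for the toy sample described above
event_times = [3.7, 4.2, 5.3, 6.8, 8.1]
cif_1 = [1 / 7, 2 / 7, 3 / 7, 3 / 7, 3 / 7]
cif_2 = [0.0, 0.0, 0.0, 1 / 7, 2 / 7]

t = 7.0
p1 = step_value(event_times, cif_1, t)   # probability of event type 1 by t
p2 = step_value(event_times, cif_2, t)   # probability of event type 2 by t
any_event = p1 + p2                      # all-cause incidence by t
event_free = 1.0 - any_event             # still free of every event type at t
print(f"t={t}: type 1 {p1:.3f}, type 2 {p2:.3f}, event-free {event_free:.3f}")
```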
Data Format for Competing Risks
Creating Structured Array with Event Types
import numpy as np
from sksurv.util import Surv
# Event types: 0 = censored, 1 = event type 1, 2 = event type 2
event_types = np.array([0, 1, 2, 1, 0, 2, 1])
times = np.array([10.2, 5.3, 8.1, 3.7, 12.5, 6.8, 4.2])
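# Create survival array for standard (any-event) survival analysis
# Surv.from_arrays stores a boolean event indicator, so the event type
# itself is not kept: event=True if any event occurred
y = Surv.from_arrays(
    event=(event_types > 0),  # True if any event
    time=times
)
# Keep event_types: cumulative_incidence_competing_risks takes the integer
# event codes and the times directly, not the boolean structured array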
Converting Data with Event Types
import pandas as pd
from sksurv.util import Surv
# Assume data has: time, event_type columns
# event_type: 0=censored, 1=type1, 2=type2, etc.
df = pd.read_csv('competing_risks_data.csv')
# Create survival outcome
y = Surv.from_arrays(
event=(df['event_type'] > 0),
time=df['time']
)
# Store event types
event_types = df['event_type'].values
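Raw data often records the outcome as text labels rather than integer codes. The following sketch shows one way to map labels to the 0/1/2 coding used here and to fail loudly on unexpected values; the label names, the `EVENT_CODES` mapping, and the `encode_event_types` helper are hypothetical.

```python
# Hypothetical label-to-code mapping: 0 must be reserved for censoring
EVENT_CODES = {"censored": 0, "relapse": 1, "death": 2}


def encode_event_types(labels, codes=EVENT_CODES):
    """Map outcome labels to integer event codes, rejecting unknown labels."""
    unknown = sorted(set(labels) - set(codes))
    if unknown:
        raise ValueError(f"Unmapped event labels: {unknown}")
    return [codes[label] for label in labels]


raw = ["censored", "relapse", "death", "relapse", "censored", "death", "relapse"]
event_types = encode_event_types(raw)
# Boolean any-event indicator, as used for the standard survival array
any_event = [code > 0 for code in event_types]
print(event_types)
```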
Comparing Cumulative Incidence Between Groups
Stratified Analysis
from sksurv.nonparametric import cumulative_incidence_competing_risks
import matplotlib.pyplot as plt
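# Split by treatment group (assumes X has a 'treatment' column and that
# event_types and times are NumPy arrays aligned with the rows of X)
mask_treatment = (X['treatment'] == 'A').to_numpy()
mask_control = (X['treatment'] == 'B').to_numpy()
# Compute CIF for each group from integer event codes and times;
# row 0 of each result is the total incidence, row k is event type k
time_trt, cif_trt = cumulative_incidence_competing_risks(
    event_types[mask_treatment], times[mask_treatment]
)
time_ctl, cif_ctl = cumulative_incidence_competing_risks(
    event_types[mask_control], times[mask_control]
)
cif1_trt, cif2_trt = cif_trt[1], cif_trt[2]
cif1_ctl, cif2_ctl = cif_ctl[1], cif_ctl[2]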
# Plot comparison
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))
# Event type 1
ax1.step(time_trt, cif1_trt, where='post', label='Treatment', linewidth=2)
ax1.step(time_ctl, cif1_ctl, where='post', label='Control', linewidth=2)
ax1.set_xlabel('Time')
ax1.set_ylabel('Cumulative Incidence')
ax1.set_title('Event Type 1')
ax1.legend()
ax1.grid(True, alpha=0.3)
# Event type 2
ax2.step(time_trt, cif2_trt, where='post', label='Treatment', linewidth=2)
ax2.step(time_ctl, cif2_ctl, where='post', label='Control', linewidth=2)
ax2.set_xlabel('Time')
ax2.set_ylabel('Cumulative Incidence')
ax2.set_title('Event Type 2')
ax2.legend()
ax2.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
Statistical Testing with Competing Risks
Gray's Test
Compare cumulative incidence functions between groups using Gray's test. It is not available in scikit-survival; a widely used implementation is the cuminc function in R's cmprsk package.
# Note: Gray's test is not available in scikit-survival
# One option is R's cmprsk package (cuminc reports Gray's test when a
# group argument is supplied), which can be called from Python through rpy2
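Gray's test is not reproduced here. As a different, simpler check that needs no extra package, the sketch below runs a two-group log-rank test on the cause-specific hazard of one event type, with other event types treated as censored. It compares cause-specific hazards rather than cumulative incidence functions, so it answers a related but not identical question. The function name, the toy data, and the group labels are assumptions for illustration.

```python
import math


def cause_specific_logrank(times, events, groups, cause):
    """Two-group log-rank test for the cause-specific hazard of `cause`.

    events: 0 = censored, positive integers = event type.
    groups: 0/1 group labels. Other event types count as censored.
    Returns (chi2, p_value) with 1 degree of freedom.
    """
    event_times = sorted({t for t, e in zip(times, events) if e == cause})
    observed = expected = variance = 0.0
    for t in event_times:
        n = sum(1 for x in times if x >= t)
        n1 = sum(1 for x, g in zip(times, groups) if x >= t and g == 1)
        d = sum(1 for x, e in zip(times, events) if x == t and e == cause)
        d1 = sum(
            1 for x, e, g in zip(times, events, groups)
            if x == t and e == cause and g == 1
        )
        observed += d1
        expected += d * n1 / n          # expected events in group 1 under H0
        if n > 1:                       # hypergeometric variance term
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = (observed - expected) ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Toy data (illustrative only)
times = [10.2, 5.3, 8.1, 3.7, 12.5, 6.8, 4.2]
events = [0, 1, 2, 1, 0, 2, 1]
groups = [0, 1, 0, 1, 0, 1, 0]

chi2, p_value = cause_specific_logrank(times, events, groups, cause=1)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

With only seven subjects the chi-square approximation is unreliable; the sample is meant to show the bookkeeping, not to support inference.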
Modeling with Competing Risks
Approach 1: Cause-Specific Hazard Models
Fit separate Cox models for each event type, treating other event types as censored.
from sksurv.linear_model import CoxPHSurvivalAnalysis
from sksurv.util import Surv
# Separate outcome for each event type
# Event type 1: treat type 2 as censored
y_event1 = Surv.from_arrays(
event=(event_types == 1),
time=times
)
# Event type 2: treat type 1 as censored
y_event2 = Surv.from_arrays(
event=(event_types == 2),
time=times
)
# Fit cause-specific models
cox_event1 = CoxPHSurvivalAnalysis()
cox_event1.fit(X, y_event1)
cox_event2 = CoxPHSurvivalAnalysis()
cox_event2.fit(X, y_event2)
# Interpret coefficients for each event type
print("Event Type 1 (e.g., Relapse):")
print(cox_event1.coef_)
print("\nEvent Type 2 (e.g., Death):")
print(cox_event2.coef_)
Interpretation:
- Separate model for each competing event
- Coefficients show effect on cause-specific hazard for that event type
- A covariate may increase risk for one event type but decrease for another
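A cause-specific hazard ratio does not translate one-to-one into a change in cumulative incidence, because the CIF of one event type depends on the hazards of all event types. As a worked illustration, assume constant (exponential) cause-specific hazards, in which case CIF_k(t) = rate_k / rate_total * (1 - exp(-rate_total * t)). The rates and hazard ratios below are made-up values, and `cif_constant_hazards` is a hypothetical helper.

```python
import math


def cif_constant_hazards(rate_k, rate_total, t):
    """CIF of event type k at time t under constant cause-specific hazards."""
    return rate_k / rate_total * (1.0 - math.exp(-rate_total * t))


# Illustrative baseline hazards for event types 1 and 2
base_1, base_2 = 0.02, 0.01
# Hypothetical covariate: no effect on type 1, triples the type 2 hazard
hr_1, hr_2 = 1.0, 3.0
t = 50.0

cif1_unexposed = cif_constant_hazards(base_1, base_1 + base_2, t)
cif1_exposed = cif_constant_hazards(
    base_1 * hr_1, base_1 * hr_1 + base_2 * hr_2, t
)
print(f"Type 1 CIF at t={t}: unexposed {cif1_unexposed:.3f}, exposed {cif1_exposed:.3f}")
```

Here the covariate leaves the type 1 cause-specific hazard unchanged (HR = 1), yet the type 1 cumulative incidence falls, because exposed subjects are removed sooner by the competing event.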
Approach 2: Fine-Gray Sub-distribution Hazard Model
Models the sub-distribution hazard, which links covariates directly to the cumulative incidence. It is not available in scikit-survival, but other packages provide it.
# Note: the Fine-Gray model is not available in scikit-survival
# Consider R's cmprsk package (crr function), which can be called
# from Python through rpy2
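The difference between the two approaches comes down to who counts as "at risk". The sketch below, with a hypothetical `risk_sets` helper and toy data, lists the subject indices in each risk set at a given time. It omits the inverse-probability-of-censoring weights that the Fine-Gray estimator applies to subjects kept after a competing event, so it shows the idea only.

```python
def risk_sets(times, events, cause, t):
    """Return (cause_specific, subdistribution) risk sets at time t.

    events: 0 = censored, positive integers = event type.
    Cause-specific: subjects with no event or censoring before t.
    Sub-distribution: additionally keeps subjects who already had a
    competing event (unweighted simplification).
    """
    cause_specific = [i for i, x in enumerate(times) if x >= t]
    subdistribution = [
        i for i, (x, e) in enumerate(zip(times, events))
        if x >= t or e not in (0, cause)
    ]
    return cause_specific, subdistribution


# Toy data (illustrative only)
times = [10.2, 5.3, 8.1, 3.7, 12.5, 6.8, 4.2]
events = [0, 1, 2, 1, 0, 2, 1]

cs, sd = risk_sets(times, events, cause=2, t=8.1)
print("cause-specific risk set:  ", cs)
print("sub-distribution risk set:", sd)
```

Subjects 1, 3, and 6 had a type 1 event before t = 8.1, so they leave the cause-specific risk set but stay in the sub-distribution risk set for event type 2.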
Practical Example: Complete Competing Risks Analysis
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sksurv.nonparametric import cumulative_incidence_competing_risks
from sksurv.linear_model import CoxPHSurvivalAnalysis
from sksurv.util import Surv
# Simulate competing risks data
np.random.seed(42)
n = 200
# Create features
age = np.random.normal(60, 10, n)
treatment = np.random.choice(['A', 'B'], n)
# Simulate event times and types
# Event types: 0=censored, 1=relapse, 2=death
times = np.random.exponential(100, n)
event_types = np.zeros(n, dtype=int)
# Treatment B shifts events toward relapse; age has no effect in this simulation
for i in range(n):
    if times[i] < 150:  # Event occurred
        # Probability of each event type
        p_relapse = 0.6 if treatment[i] == 'B' else 0.4
        event_types[i] = 1 if np.random.rand() < p_relapse else 2
    else:
        times[i] = 150  # Censored at study end
# Create DataFrame
df = pd.DataFrame({
'time': times,
'event_type': event_types,
'age': age,
'treatment': treatment
})
# Encode treatment
df['treatment_A'] = (df['treatment'] == 'A').astype(int)
# 1. OVERALL CUMULATIVE INCIDENCE
print("=" * 60)
print("OVERALL CUMULATIVE INCIDENCE")
print("=" * 60)
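# The CIF estimator takes integer event codes and times directly;
# row 0 of the result is the total incidence, rows 1 and 2 are relapse and death
time_points, cif = cumulative_incidence_competing_risks(
    df['event_type'].to_numpy(), df['time'].to_numpy()
)
cif_relapse, cif_death = cif[1], cif[2]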
plt.figure(figsize=(10, 6))
plt.step(time_points, cif_relapse, where='post', label='Relapse', linewidth=2)
plt.step(time_points, cif_death, where='post', label='Death', linewidth=2)
plt.xlabel('Time (days)')
plt.ylabel('Cumulative Incidence')
plt.title('Competing Risks: Relapse vs Death')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
print(f"Relapse incidence at end of follow-up (day 150): {cif_relapse[-1]:.2%}")
print(f"Death incidence at end of follow-up (day 150): {cif_death[-1]:.2%}")
# 2. STRATIFIED BY TREATMENT
print("\n" + "=" * 60)
print("CUMULATIVE INCIDENCE BY TREATMENT")
print("=" * 60)
for trt in ['A', 'B']:
    mask = df['treatment'] == trt
    # Pass integer event codes and times for this group only
    time_trt, cif_trt = cumulative_incidence_competing_risks(
        df.loc[mask, 'event_type'].to_numpy(),
        df.loc[mask, 'time'].to_numpy()
    )
    print(f"\nTreatment {trt}:")
    print(f"  Relapse by day 150: {cif_trt[1, -1]:.2%}")
    print(f"  Death by day 150: {cif_trt[2, -1]:.2%}")
# 3. CAUSE-SPECIFIC MODELS
print("\n" + "=" * 60)
print("CAUSE-SPECIFIC HAZARD MODELS")
print("=" * 60)
X = df[['age', 'treatment_A']]
# Model for relapse (event type 1)
y_relapse = Surv.from_arrays(
event=(df['event_type'] == 1),
time=df['time']
)
cox_relapse = CoxPHSurvivalAnalysis()
cox_relapse.fit(X, y_relapse)
print("\nRelapse Model:")
print(f" Age: HR = {np.exp(cox_relapse.coef_[0]):.3f}")
print(f" Treatment A: HR = {np.exp(cox_relapse.coef_[1]):.3f}")
# Model for death (event type 2)
y_death = Surv.from_arrays(
event=(df['event_type'] == 2),
time=df['time']
)
cox_death = CoxPHSurvivalAnalysis()
cox_death.fit(X, y_death)
print("\nDeath Model:")
print(f" Age: HR = {np.exp(cox_death.coef_[0]):.3f}")
print(f" Treatment A: HR = {np.exp(cox_death.coef_[1]):.3f}")
print("\n" + "=" * 60)
Important Considerations
Censoring in Competing Risks
- Administrative censoring: Subject still at risk at end of study
- Loss to follow-up: Subject leaves study before event
- Competing event: Other event occurred - NOT censored for CIF, but censored for cause-specific models
Choosing Between Cause-Specific and Sub-distribution Models
Cause-Specific Hazard Models:
- Easier to interpret
- Direct effect on hazard rate
- Better for understanding etiology
- Can fit with scikit-survival
Fine-Gray Sub-distribution Models:
- Models cumulative incidence directly
- Better for prediction and risk stratification
- More appropriate for clinical decision-making
- Requires other packages
Common Mistakes
Mistake 1: Using Kaplan-Meier to estimate event-specific probabilities
- Wrong: Kaplan-Meier for event type 1, treating type 2 as censored
- Correct: Cumulative incidence function accounting for competing risks
Mistake 2: Ignoring competing risks when they're substantial
- As a rough rule of thumb, if more than about 10-20% of subjects experience a competing event, use competing risks methods
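A quick way to see where a dataset stands is to tabulate the share of subjects with each outcome. The sketch below assumes the 0/1/2 event coding used in this document; the `event_shares` helper and the toy data are illustrative.

```python
from collections import Counter


def event_shares(event_types):
    """Return the fraction of subjects with each event code."""
    counts = Counter(event_types)
    n = len(event_types)
    return {code: counts[code] / n for code in sorted(counts)}


# Toy data (illustrative): 0 = censored, 1 = event of interest, 2 = competing
event_types = [0, 1, 2, 1, 0, 2, 1]
event_of_interest = 1

shares = event_shares(event_types)
# Everything that is neither censoring nor the event of interest is competing
competing_share = sum(
    share for code, share in shares.items()
    if code not in (0, event_of_interest)
)
print(f"Competing events: {competing_share:.1%} of subjects")
```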
Mistake 3: Confusing cause-specific and sub-distribution hazards
- They answer different questions
- Use appropriate model for your research question
Summary
Key Functions:
- cumulative_incidence_competing_risks: Estimate CIF for each event type
- Fit separate Cox models for cause-specific hazards
- Use stratified analysis to compare groups
Best Practices:
- Always plot cumulative incidence functions
- Report both event-specific and overall incidence
- Use cause-specific models in scikit-survival
- Consider Fine-Gray models (other packages) for prediction
- Be explicit about which events are competing vs censored