Initial commit

This commit is contained in:
Zhongwei Li
2025-11-30 08:30:10 +08:00
commit f0bd18fb4e
824 changed files with 331919 additions and 0 deletions

View File

@@ -0,0 +1,397 @@
# Competing Risks Analysis
## Overview
Competing risks occur when subjects can experience one of several mutually exclusive events (event types). When one event occurs, it prevents ("competes with") the occurrence of other events.
### Examples of Competing Risks
**Medical Research:**
- Death from cancer vs. death from cardiovascular disease vs. death from other causes
- Relapse vs. death without relapse in cancer studies
- Different types of infections in transplant patients
**Other Applications:**
- Job termination: retirement vs. resignation vs. termination for cause
- Equipment failure: different failure modes
- Customer churn: different reasons for leaving
### Key Concept: Cumulative Incidence Function (CIF)
The **Cumulative Incidence Function (CIF)** represents the probability of experiencing a specific event type by time *t*, accounting for the presence of competing risks.
**CIF_k(t) = P(T ≤ t, event type = k)**
This differs from the Kaplan-Meier estimator, which would overestimate event probabilities when competing risks are present.
## When to Use Competing Risks Analysis
**Use competing risks when:**
- Multiple mutually exclusive event types exist
- Occurrence of one event prevents others
- Need to estimate probability of specific event types
- Want to understand how covariates affect different event types
**Don't use when:**
- Only one event type of interest (standard survival analysis)
- Events are not mutually exclusive (use recurrent events methods)
- Competing events are extremely rare (can treat as censoring)
## Cumulative Incidence with Competing Risks
### cumulative_incidence_competing_risks Function
Estimates the cumulative incidence function for each event type.
```python
from sksurv.nonparametric import cumulative_incidence_competing_risks
from sksurv.datasets import load_leukemia
# Load data with competing risks
X, y = load_leukemia()
# y has event types: 0=censored, 1=relapse, 2=death
# Compute cumulative incidence for each event type
# Returns: time points, CIF for event 1, CIF for event 2, ...
time_points, cif_1, cif_2 = cumulative_incidence_competing_risks(y)
# Plot cumulative incidence functions
import matplotlib.pyplot as plt
plt.figure(figsize=(10, 6))
plt.step(time_points, cif_1, where='post', label='Relapse', linewidth=2)
plt.step(time_points, cif_2, where='post', label='Death in remission', linewidth=2)
plt.xlabel('Time (weeks)')
plt.ylabel('Cumulative Incidence')
plt.title('Competing Risks: Relapse vs Death')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
```
### Interpretation
- **CIF at time t**: Probability of experiencing that specific event by time t
- **Sum of all CIFs**: Total probability of experiencing any event (all cause)
- **1 - sum of CIFs**: Probability of being event-free and uncensored
## Data Format for Competing Risks
### Creating Structured Array with Event Types
```python
import numpy as np
from sksurv.util import Surv
# Event types: 0 = censored, 1 = event type 1, 2 = event type 2
event_types = np.array([0, 1, 2, 1, 0, 2, 1])
times = np.array([10.2, 5.3, 8.1, 3.7, 12.5, 6.8, 4.2])
# Create survival array
# For competing risks: event=True if any event occurred
# Store event type separately or encode in the event field
y = Surv.from_arrays(
event=(event_types > 0), # True if any event
time=times
)
# Keep event_types for distinguishing between event types
```
### Converting Data with Event Types
```python
import pandas as pd
from sksurv.util import Surv
# Assume data has: time, event_type columns
# event_type: 0=censored, 1=type1, 2=type2, etc.
df = pd.read_csv('competing_risks_data.csv')
# Create survival outcome
y = Surv.from_arrays(
event=(df['event_type'] > 0),
time=df['time']
)
# Store event types
event_types = df['event_type'].values
```
## Comparing Cumulative Incidence Between Groups
### Stratified Analysis
```python
from sksurv.nonparametric import cumulative_incidence_competing_risks
import matplotlib.pyplot as plt
# Split by treatment group
mask_treatment = X['treatment'] == 'A'
mask_control = X['treatment'] == 'B'
y_treatment = y[mask_treatment]
y_control = y[mask_control]
# Compute CIF for each group
time_trt, cif1_trt, cif2_trt = cumulative_incidence_competing_risks(y_treatment)
time_ctl, cif1_ctl, cif2_ctl = cumulative_incidence_competing_risks(y_control)
# Plot comparison
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))
# Event type 1
ax1.step(time_trt, cif1_trt, where='post', label='Treatment', linewidth=2)
ax1.step(time_ctl, cif1_ctl, where='post', label='Control', linewidth=2)
ax1.set_xlabel('Time')
ax1.set_ylabel('Cumulative Incidence')
ax1.set_title('Event Type 1')
ax1.legend()
ax1.grid(True, alpha=0.3)
# Event type 2
ax2.step(time_trt, cif2_trt, where='post', label='Treatment', linewidth=2)
ax2.step(time_ctl, cif2_ctl, where='post', label='Control', linewidth=2)
ax2.set_xlabel('Time')
ax2.set_ylabel('Cumulative Incidence')
ax2.set_title('Event Type 2')
ax2.legend()
ax2.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
```
## Statistical Testing with Competing Risks
### Gray's Test
Compare cumulative incidence functions between groups using Gray's test (available in other packages like lifelines).
```python
# Note: Gray's test not directly available in scikit-survival
# Consider using lifelines or other packages
# from lifelines.statistics import multivariate_logrank_test
# result = multivariate_logrank_test(times, groups, events, event_of_interest=1)
```
## Modeling with Competing Risks
### Approach 1: Cause-Specific Hazard Models
Fit separate Cox models for each event type, treating other event types as censored.
```python
from sksurv.linear_model import CoxPHSurvivalAnalysis
from sksurv.util import Surv
# Separate outcome for each event type
# Event type 1: treat type 2 as censored
y_event1 = Surv.from_arrays(
event=(event_types == 1),
time=times
)
# Event type 2: treat type 1 as censored
y_event2 = Surv.from_arrays(
event=(event_types == 2),
time=times
)
# Fit cause-specific models
cox_event1 = CoxPHSurvivalAnalysis()
cox_event1.fit(X, y_event1)
cox_event2 = CoxPHSurvivalAnalysis()
cox_event2.fit(X, y_event2)
# Interpret coefficients for each event type
print("Event Type 1 (e.g., Relapse):")
print(cox_event1.coef_)
print("\nEvent Type 2 (e.g., Death):")
print(cox_event2.coef_)
```
**Interpretation:**
- Separate model for each competing event
- Coefficients show effect on cause-specific hazard for that event type
- A covariate may increase risk for one event type but decrease for another
### Approach 2: Fine-Gray Sub-distribution Hazard Model
Models the cumulative incidence directly (not available directly in scikit-survival, but can use other packages).
```python
# Note: Fine-Gray model not directly in scikit-survival
# Consider using lifelines or rpy2 to access R's cmprsk package
# from lifelines import CRCSplineFitter
# crc = CRCSplineFitter()
# crc.fit(df, event_col='event', duration_col='time')
```
## Practical Example: Complete Competing Risks Analysis
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sksurv.nonparametric import cumulative_incidence_competing_risks
from sksurv.linear_model import CoxPHSurvivalAnalysis
from sksurv.util import Surv
# Simulate competing risks data
np.random.seed(42)
n = 200
# Create features
age = np.random.normal(60, 10, n)
treatment = np.random.choice(['A', 'B'], n)
# Simulate event times and types
# Event types: 0=censored, 1=relapse, 2=death
times = np.random.exponential(100, n)
event_types = np.zeros(n, dtype=int)
# Higher age increases both events, treatment A reduces relapse
for i in range(n):
if times[i] < 150: # Event occurred
# Probability of each event type
p_relapse = 0.6 if treatment[i] == 'B' else 0.4
event_types[i] = 1 if np.random.rand() < p_relapse else 2
else:
times[i] = 150 # Censored at study end
# Create DataFrame
df = pd.DataFrame({
'time': times,
'event_type': event_types,
'age': age,
'treatment': treatment
})
# Encode treatment
df['treatment_A'] = (df['treatment'] == 'A').astype(int)
# 1. OVERALL CUMULATIVE INCIDENCE
print("=" * 60)
print("OVERALL CUMULATIVE INCIDENCE")
print("=" * 60)
y_all = Surv.from_arrays(event=(df['event_type'] > 0), time=df['time'])
time_points, cif_relapse, cif_death = cumulative_incidence_competing_risks(y_all)
plt.figure(figsize=(10, 6))
plt.step(time_points, cif_relapse, where='post', label='Relapse', linewidth=2)
plt.step(time_points, cif_death, where='post', label='Death', linewidth=2)
plt.xlabel('Time (days)')
plt.ylabel('Cumulative Incidence')
plt.title('Competing Risks: Relapse vs Death')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
print(f"5-year relapse incidence: {cif_relapse[-1]:.2%}")
print(f"5-year death incidence: {cif_death[-1]:.2%}")
# 2. STRATIFIED BY TREATMENT
print("\n" + "=" * 60)
print("CUMULATIVE INCIDENCE BY TREATMENT")
print("=" * 60)
for trt in ['A', 'B']:
mask = df['treatment'] == trt
y_trt = Surv.from_arrays(
event=(df.loc[mask, 'event_type'] > 0),
time=df.loc[mask, 'time']
)
time_trt, cif1_trt, cif2_trt = cumulative_incidence_competing_risks(y_trt)
print(f"\nTreatment {trt}:")
print(f" 5-year relapse: {cif1_trt[-1]:.2%}")
print(f" 5-year death: {cif2_trt[-1]:.2%}")
# 3. CAUSE-SPECIFIC MODELS
print("\n" + "=" * 60)
print("CAUSE-SPECIFIC HAZARD MODELS")
print("=" * 60)
X = df[['age', 'treatment_A']]
# Model for relapse (event type 1)
y_relapse = Surv.from_arrays(
event=(df['event_type'] == 1),
time=df['time']
)
cox_relapse = CoxPHSurvivalAnalysis()
cox_relapse.fit(X, y_relapse)
print("\nRelapse Model:")
print(f" Age: HR = {np.exp(cox_relapse.coef_[0]):.3f}")
print(f" Treatment A: HR = {np.exp(cox_relapse.coef_[1]):.3f}")
# Model for death (event type 2)
y_death = Surv.from_arrays(
event=(df['event_type'] == 2),
time=df['time']
)
cox_death = CoxPHSurvivalAnalysis()
cox_death.fit(X, y_death)
print("\nDeath Model:")
print(f" Age: HR = {np.exp(cox_death.coef_[0]):.3f}")
print(f" Treatment A: HR = {np.exp(cox_death.coef_[1]):.3f}")
print("\n" + "=" * 60)
```
## Important Considerations
### Censoring in Competing Risks
- **Administrative censoring**: Subject still at risk at end of study
- **Loss to follow-up**: Subject leaves study before event
- **Competing event**: Other event occurred - NOT censored for CIF, but censored for cause-specific models
### Choosing Between Cause-Specific and Sub-distribution Models
**Cause-Specific Hazard Models:**
- Easier to interpret
- Direct effect on hazard rate
- Better for understanding etiology
- Can fit with scikit-survival
**Fine-Gray Sub-distribution Models:**
- Models cumulative incidence directly
- Better for prediction and risk stratification
- More appropriate for clinical decision-making
- Requires other packages
### Common Mistakes
**Mistake 1**: Using Kaplan-Meier to estimate event-specific probabilities
- **Wrong**: Kaplan-Meier for event type 1, treating type 2 as censored
- **Correct**: Cumulative incidence function accounting for competing risks
**Mistake 2**: Ignoring competing risks when they're substantial
- If competing event rate > 10-20%, should use competing risks methods
**Mistake 3**: Confusing cause-specific and sub-distribution hazards
- They answer different questions
- Use appropriate model for your research question
## Summary
**Key Functions:**
- `cumulative_incidence_competing_risks`: Estimate CIF for each event type
- Fit separate Cox models for cause-specific hazards
- Use stratified analysis to compare groups
**Best Practices:**
1. Always plot cumulative incidence functions
2. Report both event-specific and overall incidence
3. Use cause-specific models in scikit-survival
4. Consider Fine-Gray models (other packages) for prediction
5. Be explicit about which events are competing vs censored