gh-k-dense-ai-claude-scient…/skills/scikit-survival/references/competing-risks.md

# Competing Risks Analysis

## Overview

Competing risks occur when subjects can experience one of several mutually exclusive events (event types). When one event occurs, it prevents ("competes with") the occurrence of other events.

### Examples of Competing Risks

**Medical Research:**
- Death from cancer vs. death from cardiovascular disease vs. death from other causes
- Relapse vs. death without relapse in cancer studies
- Different types of infections in transplant patients

**Other Applications:**
- Job termination: retirement vs. resignation vs. termination for cause
- Equipment failure: different failure modes
- Customer churn: different reasons for leaving

### Key Concept: Cumulative Incidence Function (CIF)

The **Cumulative Incidence Function (CIF)** represents the probability of experiencing a specific event type by time *t*, accounting for the presence of competing risks.

**CIF_k(t) = P(T ≤ t, event type = k)**

This differs from the Kaplan-Meier estimator, which would overestimate event probabilities when competing risks are present.

## When to Use Competing Risks Analysis

**Use competing risks when:**
- Multiple mutually exclusive event types exist
- Occurrence of one event prevents others
- Need to estimate probability of specific event types
- Want to understand how covariates affect different event types

**Don't use when:**
- Only one event type of interest (standard survival analysis)
- Events are not mutually exclusive (use recurrent events methods)
- Competing events are extremely rare (can treat as censoring)

## Cumulative Incidence with Competing Risks

### cumulative_incidence_competing_risks Function

Estimates the cumulative incidence function for each event type.

```python
from sksurv.nonparametric import cumulative_incidence_competing_risks
from sksurv.datasets import load_leukemia

# Load data with competing risks
X, y = load_leukemia()
# y has event types: 0=censored, 1=relapse, 2=death

# Compute cumulative incidence for each event type
# Returns: time points, CIF for event 1, CIF for event 2, ...
time_points, cif_1, cif_2 = cumulative_incidence_competing_risks(y)

# Plot cumulative incidence functions
import matplotlib.pyplot as plt

plt.figure(figsize=(10, 6))
plt.step(time_points, cif_1, where='post', label='Relapse', linewidth=2)
plt.step(time_points, cif_2, where='post', label='Death in remission', linewidth=2)
plt.xlabel('Time (weeks)')
plt.ylabel('Cumulative Incidence')
plt.title('Competing Risks: Relapse vs Death')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
```

### Interpretation

- **CIF at time t**: Probability of experiencing that specific event by time t
- **Sum of all CIFs**: Total probability of experiencing any event (all cause)
- **1 - sum of CIFs**: Probability of being event-free and uncensored

## Data Format for Competing Risks

### Creating Structured Array with Event Types

```python
import numpy as np
from sksurv.util import Surv

# Event types: 0 = censored, 1 = event type 1, 2 = event type 2
event_types = np.array([0, 1, 2, 1, 0, 2, 1])
times = np.array([10.2, 5.3, 8.1, 3.7, 12.5, 6.8, 4.2])

# Create survival array
# For competing risks: event=True if any event occurred
# Store event type separately or encode in the event field
y = Surv.from_arrays(
    event=(event_types > 0),  # True if any event
    time=times
)

# Keep event_types for distinguishing between event types
```

### Converting Data with Event Types

```python
import pandas as pd
from sksurv.util import Surv

# Assume data has: time, event_type columns
# event_type: 0=censored, 1=type1, 2=type2, etc.

df = pd.read_csv('competing_risks_data.csv')

# Create survival outcome
y = Surv.from_arrays(
    event=(df['event_type'] > 0),
    time=df['time']
)

# Store event types
event_types = df['event_type'].values
```

## Comparing Cumulative Incidence Between Groups

### Stratified Analysis

```python
from sksurv.nonparametric import cumulative_incidence_competing_risks
import matplotlib.pyplot as plt

# Split by treatment group
mask_treatment = X['treatment'] == 'A'
mask_control = X['treatment'] == 'B'

y_treatment = y[mask_treatment]
y_control = y[mask_control]

# Compute CIF for each group
time_trt, cif1_trt, cif2_trt = cumulative_incidence_competing_risks(y_treatment)
time_ctl, cif1_ctl, cif2_ctl = cumulative_incidence_competing_risks(y_control)

# Plot comparison
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# Event type 1
ax1.step(time_trt, cif1_trt, where='post', label='Treatment', linewidth=2)
ax1.step(time_ctl, cif1_ctl, where='post', label='Control', linewidth=2)
ax1.set_xlabel('Time')
ax1.set_ylabel('Cumulative Incidence')
ax1.set_title('Event Type 1')
ax1.legend()
ax1.grid(True, alpha=0.3)

# Event type 2
ax2.step(time_trt, cif2_trt, where='post', label='Treatment', linewidth=2)
ax2.step(time_ctl, cif2_ctl, where='post', label='Control', linewidth=2)
ax2.set_xlabel('Time')
ax2.set_ylabel('Cumulative Incidence')
ax2.set_title('Event Type 2')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
```

## Statistical Testing with Competing Risks

### Gray's Test

Compare cumulative incidence functions between groups using Gray's test (available in other packages like lifelines).

```python
# Note: Gray's test not directly available in scikit-survival
# Consider using lifelines or other packages

# from lifelines.statistics import multivariate_logrank_test
# result = multivariate_logrank_test(times, groups, events, event_of_interest=1)
```

## Modeling with Competing Risks

### Approach 1: Cause-Specific Hazard Models

Fit separate Cox models for each event type, treating other event types as censored.

```python
from sksurv.linear_model import CoxPHSurvivalAnalysis
from sksurv.util import Surv

# Separate outcome for each event type
# Event type 1: treat type 2 as censored
y_event1 = Surv.from_arrays(
    event=(event_types == 1),
    time=times
)

# Event type 2: treat type 1 as censored
y_event2 = Surv.from_arrays(
    event=(event_types == 2),
    time=times
)

# Fit cause-specific models
cox_event1 = CoxPHSurvivalAnalysis()
cox_event1.fit(X, y_event1)

cox_event2 = CoxPHSurvivalAnalysis()
cox_event2.fit(X, y_event2)

# Interpret coefficients for each event type
print("Event Type 1 (e.g., Relapse):")
print(cox_event1.coef_)

print("\nEvent Type 2 (e.g., Death):")
print(cox_event2.coef_)
```

**Interpretation:**
- Separate model for each competing event
- Coefficients show effect on cause-specific hazard for that event type
- A covariate may increase risk for one event type but decrease for another

### Approach 2: Fine-Gray Sub-distribution Hazard Model

Models the cumulative incidence directly (not available directly in scikit-survival, but can use other packages).

```python
# Note: Fine-Gray model not directly in scikit-survival
# Consider using lifelines or rpy2 to access R's cmprsk package

# from lifelines import CRCSplineFitter
# crc = CRCSplineFitter()
# crc.fit(df, event_col='event', duration_col='time')
```

## Practical Example: Complete Competing Risks Analysis

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sksurv.nonparametric import cumulative_incidence_competing_risks
from sksurv.linear_model import CoxPHSurvivalAnalysis
from sksurv.util import Surv

# Simulate competing risks data
np.random.seed(42)
n = 200

# Create features
age = np.random.normal(60, 10, n)
treatment = np.random.choice(['A', 'B'], n)

# Simulate event times and types
# Event types: 0=censored, 1=relapse, 2=death
times = np.random.exponential(100, n)
event_types = np.zeros(n, dtype=int)

# Higher age increases both events, treatment A reduces relapse
for i in range(n):
    if times[i] < 150:  # Event occurred
        # Probability of each event type
        p_relapse = 0.6 if treatment[i] == 'B' else 0.4
        event_types[i] = 1 if np.random.rand() < p_relapse else 2
    else:
        times[i] = 150  # Censored at study end

# Create DataFrame
df = pd.DataFrame({
    'time': times,
    'event_type': event_types,
    'age': age,
    'treatment': treatment
})

# Encode treatment
df['treatment_A'] = (df['treatment'] == 'A').astype(int)

# 1. OVERALL CUMULATIVE INCIDENCE
print("=" * 60)
print("OVERALL CUMULATIVE INCIDENCE")
print("=" * 60)

y_all = Surv.from_arrays(event=(df['event_type'] > 0), time=df['time'])
time_points, cif_relapse, cif_death = cumulative_incidence_competing_risks(y_all)

plt.figure(figsize=(10, 6))
plt.step(time_points, cif_relapse, where='post', label='Relapse', linewidth=2)
plt.step(time_points, cif_death, where='post', label='Death', linewidth=2)
plt.xlabel('Time (days)')
plt.ylabel('Cumulative Incidence')
plt.title('Competing Risks: Relapse vs Death')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

print(f"5-year relapse incidence: {cif_relapse[-1]:.2%}")
print(f"5-year death incidence: {cif_death[-1]:.2%}")

# 2. STRATIFIED BY TREATMENT
print("\n" + "=" * 60)
print("CUMULATIVE INCIDENCE BY TREATMENT")
print("=" * 60)

for trt in ['A', 'B']:
    mask = df['treatment'] == trt
    y_trt = Surv.from_arrays(
        event=(df.loc[mask, 'event_type'] > 0),
        time=df.loc[mask, 'time']
    )
    time_trt, cif1_trt, cif2_trt = cumulative_incidence_competing_risks(y_trt)
    print(f"\nTreatment {trt}:")
    print(f"  5-year relapse: {cif1_trt[-1]:.2%}")
    print(f"  5-year death: {cif2_trt[-1]:.2%}")

# 3. CAUSE-SPECIFIC MODELS
print("\n" + "=" * 60)
print("CAUSE-SPECIFIC HAZARD MODELS")
print("=" * 60)

X = df[['age', 'treatment_A']]

# Model for relapse (event type 1)
y_relapse = Surv.from_arrays(
    event=(df['event_type'] == 1),
    time=df['time']
)
cox_relapse = CoxPHSurvivalAnalysis()
cox_relapse.fit(X, y_relapse)

print("\nRelapse Model:")
print(f"  Age:        HR = {np.exp(cox_relapse.coef_[0]):.3f}")
print(f"  Treatment A: HR = {np.exp(cox_relapse.coef_[1]):.3f}")

# Model for death (event type 2)
y_death = Surv.from_arrays(
    event=(df['event_type'] == 2),
    time=df['time']
)
cox_death = CoxPHSurvivalAnalysis()
cox_death.fit(X, y_death)

print("\nDeath Model:")
print(f"  Age:        HR = {np.exp(cox_death.coef_[0]):.3f}")
print(f"  Treatment A: HR = {np.exp(cox_death.coef_[1]):.3f}")

print("\n" + "=" * 60)
```

## Important Considerations

### Censoring in Competing Risks

- **Administrative censoring**: Subject still at risk at end of study
- **Loss to follow-up**: Subject leaves study before event
- **Competing event**: Other event occurred - NOT censored for CIF, but censored for cause-specific models

### Choosing Between Cause-Specific and Sub-distribution Models

**Cause-Specific Hazard Models:**
- Easier to interpret
- Direct effect on hazard rate
- Better for understanding etiology
- Can fit with scikit-survival

**Fine-Gray Sub-distribution Models:**
- Models cumulative incidence directly
- Better for prediction and risk stratification
- More appropriate for clinical decision-making
- Requires other packages

### Common Mistakes

**Mistake 1**: Using Kaplan-Meier to estimate event-specific probabilities
- **Wrong**: Kaplan-Meier for event type 1, treating type 2 as censored
- **Correct**: Cumulative incidence function accounting for competing risks

**Mistake 2**: Ignoring competing risks when they're substantial
- If competing event rate > 10-20%, should use competing risks methods

**Mistake 3**: Confusing cause-specific and sub-distribution hazards
- They answer different questions
- Use appropriate model for your research question

## Summary

**Key Functions:**
- `cumulative_incidence_competing_risks`: Estimate CIF for each event type
- Fit separate Cox models for cause-specific hazards
- Use stratified analysis to compare groups

**Best Practices:**
1. Always plot cumulative incidence functions
2. Report both event-specific and overall incidence
3. Use cause-specific models in scikit-survival
4. Consider Fine-Gray models (other packages) for prediction
5. Be explicit about which events are competing vs censored