Survival Support Vector Machines
Overview
Survival Support Vector Machines (SVMs) adapt the traditional SVM framework to survival analysis with censored data. They optimize a ranking objective that encourages correct ordering of survival times.
Core Idea
SVMs for survival analysis learn a function f(x) that produces risk scores; the optimization encourages subjects with shorter observed survival times to receive higher risk scores than comparable subjects with longer times, taking censoring into account.
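To make the ranking objective concrete, the sketch below (toy, purely illustrative values; regularization term omitted) enumerates the comparable pairs the loss is defined over and evaluates a squared-hinge ranking penalty on them. A pair (i, j) is comparable when the subject with the shorter observed time actually experienced the event.
import numpy as np
# Toy data: observed times, event indicators (True = event, False = censored),
# and hypothetical risk scores f(x) produced by a model
time = np.array([5.0, 8.0, 3.0, 10.0])
event = np.array([True, False, True, False])
risk = np.array([0.9, 0.1, 1.5, -0.3])
loss = 0.0
n_pairs = 0
for i in range(len(time)):
    for j in range(len(time)):
        # (i, j) is comparable if subject i had an observed event before time[j]
        if event[i] and time[i] < time[j]:
            n_pairs += 1
            # penalize pairs where the shorter-lived subject is not ranked
            # at least a margin of 1 above the longer-lived one (squared hinge)
            loss += max(0.0, 1.0 - (risk[i] - risk[j])) ** 2
print(f"{n_pairs} comparable pairs, average ranking loss = {loss / n_pairs:.3f}")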
When to Use Survival SVMs
Appropriate for:
- Medium-sized datasets (typically 100-10,000 samples)
- Need for non-linear decision boundaries (kernel SVMs)
- Want margin-based learning with regularization
- Have well-defined feature space
Not ideal for:
- Very large datasets (>100,000 samples) - ensemble methods may be faster
- Need interpretable coefficients - use Cox models instead
- Require survival function estimates - use Random Survival Forest
- Very high dimensional data - use regularized Cox or gradient boosting
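All code below assumes a feature matrix X and a structured survival array y in scikit-survival's format, with one event indicator and one observed time per sample. A minimal sketch of preparing them from a purely hypothetical data frame:
import pandas as pd
from sksurv.util import Surv
# Hypothetical data with an event flag and a follow-up time column
df = pd.DataFrame({
    "age": [61, 52, 70],
    "treatment": [1, 0, 1],
    "event": [True, False, True],   # True = event observed, False = censored
    "time": [12.0, 30.5, 7.2],      # follow-up time, e.g. in months
})
X = df.drop(columns=["event", "time"])
y = Surv.from_arrays(event=df["event"], time=df["time"])
print(y.dtype.names)  # ('event', 'time')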
Model Types
FastSurvivalSVM
Linear survival SVM trained with an efficient algorithm (truncated Newton optimization with order-statistic trees) that avoids explicitly enumerating all comparable pairs of samples.
When to Use:
- Linear relationships expected
- Large datasets where speed matters
- Want fast training and prediction
Key Parameters:
- alpha: Regularization parameter (default: 1.0); higher values mean stronger regularization
- rank_ratio: Trade-off between the ranking and regression objectives (default: 1.0, pure ranking)
- max_iter: Maximum number of iterations (default: 20)
- tol: Tolerance for the stopping criterion
from sksurv.svm import FastSurvivalSVM
# Fit linear survival SVM
estimator = FastSurvivalSVM(alpha=1.0, max_iter=100, tol=1e-5, random_state=42)
estimator.fit(X, y)
# Predict risk scores
risk_scores = estimator.predict(X_test)
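The returned values are relative risk scores rather than survival times. A quick sanity check is Harrell's concordance index on held-out data (assuming a y_test structured array with the default 'event' and 'time' field names):
from sksurv.metrics import concordance_index_censored
# Higher risk scores should correspond to earlier events
c_index = concordance_index_censored(y_test["event"], y_test["time"], risk_scores)[0]
print(f"Harrell's C-index: {c_index:.3f}")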
FastKernelSurvivalSVM
Kernel survival SVM for non-linear relationships.
When to Use:
- Non-linear relationships between features and survival
- Medium-sized datasets
- Can afford longer training time for better performance
Kernel Options:
- 'linear': Linear kernel (equivalent to FastSurvivalSVM)
- 'poly': Polynomial kernel
- 'rbf': Radial basis function (Gaussian) kernel - the most common choice
- 'sigmoid': Sigmoid kernel
- 'precomputed' or a custom kernel function
Key Parameters:
- alpha: Regularization parameter (default: 1.0)
- kernel: Kernel function (default: 'rbf')
- gamma: Kernel coefficient for 'rbf', 'poly' and 'sigmoid'; must be a float (defaults to 1/n_features)
- degree: Degree of the polynomial kernel
- coef0: Independent term for 'poly' and 'sigmoid'
- rank_ratio: Trade-off between ranking and regression objectives (default: 1.0)
- max_iter: Maximum number of iterations (default: 20)
from sksurv.svm import FastKernelSurvivalSVM
# Fit RBF kernel survival SVM
estimator = FastKernelSurvivalSVM(
alpha=1.0,
kernel='rbf',
gamma=0.1,  # kernel coefficient must be a float
max_iter=50,
random_state=42
)
estimator.fit(X, y)
# Predict risk scores
risk_scores = estimator.predict(X_test)
HingeLossSurvivalSVM
Survival SVM using the hinge loss, which makes it more similar to a classification SVM.
When to Use:
- Want hinge loss instead of squared hinge
- Sparse solutions desired
- Similar behavior to classification SVMs
Key Parameters:
- alpha: Regularization parameter (default: 1.0)
- kernel: Kernel function (default: 'linear')
from sksurv.svm import HingeLossSurvivalSVM
# Fit hinge loss SVM
estimator = HingeLossSurvivalSVM(alpha=1.0)
estimator.fit(X, y)
# Predict risk scores
risk_scores = estimator.predict(X_test)
NaiveSurvivalSVM
Original pairwise formulation of the survival SVM, which explicitly enumerates comparable pairs of samples during training.
When to Use:
- Small datasets
- Research/benchmarking purposes
- Other methods don't converge
Limitations:
- Slower than Fast variants
- Less scalable
from sksurv.svm import NaiveSurvivalSVM
# Fit naive SVM (slower)
estimator = NaiveSurvivalSVM(alpha=1.0, random_state=42)
estimator.fit(X, y)
# Predict
risk_scores = estimator.predict(X_test)
MinlipSurvivalAnalysis
Survival model related to the survival SVM that uses a minimal Lipschitz smoothness strategy (MINLIP) instead of a maximal margin strategy.
When to Use:
- Want different optimization objective
- Research applications
- Alternative to standard survival SVMs
from sksurv.svm import MinlipSurvivalAnalysis
# Fit Minlip model
estimator = MinlipSurvivalAnalysis(alpha=1.0)
estimator.fit(X, y)
# Predict
risk_scores = estimator.predict(X_test)
Hyperparameter Tuning
Tuning Alpha (Regularization)
from sklearn.model_selection import GridSearchCV
from sksurv.metrics import as_concordance_index_ipcw_scorer
# Define parameter grid (the wrapped estimator's parameters get the 'estimator__' prefix)
param_grid = {
    'estimator__alpha': [0.1, 0.5, 1.0, 5.0, 10.0, 50.0]
}
# Grid search; the wrapper scores models with the IPCW concordance index
cv = GridSearchCV(
    as_concordance_index_ipcw_scorer(FastSurvivalSVM()),
    param_grid,
    cv=5,
    n_jobs=-1
)
cv.fit(X, y)
print(f"Best alpha: {cv.best_params_['estimator__alpha']}")
print(f"Best C-index: {cv.best_score_:.3f}")
Tuning Kernel Parameters
from sklearn.model_selection import GridSearchCV
# Define parameter grid for the kernel SVM (gamma must be a float)
param_grid = {
    'estimator__alpha': [0.1, 1.0, 10.0],
    'estimator__gamma': [0.001, 0.01, 0.1, 1.0]
}
# Grid search; the wrapper scores models with the IPCW concordance index
cv = GridSearchCV(
    as_concordance_index_ipcw_scorer(FastKernelSurvivalSVM(kernel='rbf')),
    param_grid,
    cv=5,
    n_jobs=-1
)
cv.fit(X, y)
print(f"Best parameters: {cv.best_params_}")
print(f"Best C-index: {cv.best_score_:.3f}")
Clinical Kernel Transform
ClinicalKernelTransform
Kernel designed for clinical data that applies a type-appropriate similarity measure to each variable (continuous, ordinal, nominal); it was originally proposed for integrating clinical variables with high-dimensional molecular data in medical applications.
Use Case:
- Have both clinical variables (age, stage, etc.) and high-dimensional molecular data (gene expression, genomics)
- Clinical features are of mixed types (continuous, ordinal, nominal) and need type-appropriate weighting
- Want to integrate heterogeneous data types
Key Parameters:
- fit_once: If True, kernel parameters are determined once up front and reused, e.g. across cross-validation folds; otherwise they are re-estimated on every fit (default: False)
- Apply the transform to the clinical variables; molecular features need their own kernel
from sksurv.kernels import ClinicalKernelTransform
from sksurv.svm import FastKernelSurvivalSVM
from sklearn.pipeline import make_pipeline
# Select the clinical variables (a mix of continuous, ordinal and nominal columns)
clinical_features = ['age', 'stage', 'grade']
X_clinical = X[clinical_features]
# The transform computes the clinical kernel between new samples and the training
# data, so the downstream SVM is configured with a precomputed kernel.
# Molecular features would need their own kernel (e.g. linear) combined with this one.
estimator = make_pipeline(
    ClinicalKernelTransform(),
    FastKernelSurvivalSVM(kernel='precomputed')
)
# Fit model on the clinical variables
estimator.fit(X_clinical, y)
risk_scores = estimator.predict(X_clinical)
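Alternatively, the kernel matrix can be computed directly with sksurv.kernels.clinical_kernel and passed to an SVM configured with a precomputed kernel. A minimal sketch, assuming the clinical columns have already been split into hypothetical X_clinical_train and X_clinical_test data frames:
from sksurv.kernels import clinical_kernel
from sksurv.svm import FastKernelSurvivalSVM
# Kernel between training samples (n_train x n_train)
K_train = clinical_kernel(X_clinical_train)
estimator = FastKernelSurvivalSVM(kernel='precomputed', alpha=1.0, random_state=42)
estimator.fit(K_train, y_train)
# Kernel between test and training samples (n_test x n_train) for prediction
K_test = clinical_kernel(X_clinical_test, X_clinical_train)
risk_scores = estimator.predict(K_test)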
Practical Examples
Example 1: Linear SVM with Cross-Validation
from sksurv.svm import FastSurvivalSVM
from sklearn.model_selection import cross_val_score
from sksurv.metrics import as_concordance_index_ipcw_scorer
from sklearn.preprocessing import StandardScaler
# Standardize features (important for SVMs!)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Create model and wrap it so cross-validation scores with the IPCW concordance index
svm = FastSurvivalSVM(alpha=1.0, max_iter=100, random_state=42)
# Cross-validation (the wrapper's score method is used by default)
scores = cross_val_score(
    as_concordance_index_ipcw_scorer(svm), X_scaled, y,
    cv=5,
    n_jobs=-1
)
print(f"Mean C-index: {scores.mean():.3f} (±{scores.std():.3f})")
Example 2: Kernel SVM with Different Kernels
from sksurv.svm import FastKernelSurvivalSVM
from sklearn.model_selection import train_test_split
from sksurv.metrics import concordance_index_ipcw
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Standardize
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Compare different kernels
kernels = ['linear', 'poly', 'rbf', 'sigmoid']
results = {}
for kernel in kernels:
    # Fit model
    svm = FastKernelSurvivalSVM(kernel=kernel, alpha=1.0, random_state=42)
    svm.fit(X_train_scaled, y_train)
    # Predict
    risk_scores = svm.predict(X_test_scaled)
    # Evaluate
    c_index = concordance_index_ipcw(y_train, y_test, risk_scores)[0]
    results[kernel] = c_index
    print(f"{kernel:10s}: C-index = {c_index:.3f}")
# Best kernel
best_kernel = max(results, key=results.get)
print(f"\nBest kernel: {best_kernel} (C-index = {results[best_kernel]:.3f})")
Example 3: Full Pipeline with Hyperparameter Tuning
from sksurv.svm import FastKernelSurvivalSVM
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sksurv.metrics import concordance_index_ipcw
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create pipeline
pipeline = Pipeline([
('scaler', StandardScaler()),
('svm', FastKernelSurvivalSVM(kernel='rbf'))
])
# Define parameter grid (gamma must be a float)
param_grid = {
    'svm__alpha': [0.1, 1.0, 10.0],
    'svm__gamma': [0.01, 0.1, 1.0]
}
# Grid search; the default score of scikit-survival estimators is Harrell's concordance index
cv = GridSearchCV(
    pipeline,
    param_grid,
    cv=5,
    n_jobs=-1,
    verbose=1
)
cv.fit(X_train, y_train)
# Best model
best_model = cv.best_estimator_
print(f"Best parameters: {cv.best_params_}")
print(f"Best CV C-index: {cv.best_score_:.3f}")
# Evaluate on test set
risk_scores = best_model.predict(X_test)
c_index = concordance_index_ipcw(y_train, y_test, risk_scores)[0]
print(f"Test C-index: {c_index:.3f}")
Important Considerations
Feature Scaling
CRITICAL: Always standardize features before using SVMs!
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
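When scaling is combined with cross-validation or grid search, put the scaler inside a Pipeline (as in Example 3) so it is re-fit on the training portion of every fold and no information leaks from the validation data. A minimal sketch:
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sksurv.svm import FastSurvivalSVM
# The scaler is re-fit on each training fold during cross-validation
model = make_pipeline(StandardScaler(), FastSurvivalSVM(alpha=1.0, random_state=42))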
Computational Complexity
- FastSurvivalSVM: roughly O(n log n + n × p) per iteration - fast
- FastKernelSurvivalSVM: requires the n × n kernel matrix (about O(n² × p) to compute) - scales quadratically with sample size
- NaiveSurvivalSVM: expands the training data into O(n²) comparable pairs - very slow for large datasets
For large datasets (>10,000 samples), prefer:
- FastSurvivalSVM (linear)
- Gradient Boosting
- Random Survival Forest
When SVMs May Not Be Best Choice
- Very large datasets: Ensemble methods are faster
- Need survival functions: Use Random Survival Forest or Cox models
- Need interpretability: Use Cox models
- Very high dimensional: Use penalized Cox (Coxnet) or gradient boosting with feature selection
Model Selection Guide
| Model | Speed | Non-linearity | Scalability | Interpretability |
|---|---|---|---|---|
| FastSurvivalSVM | Fast | No | High | Medium |
| FastKernelSurvivalSVM | Medium | Yes | Medium | Low |
| HingeLossSurvivalSVM | Slow | Yes (kernels) | Low | Low |
| NaiveSurvivalSVM | Slow | No | Low | Medium |
General Recommendations:
- Start with FastSurvivalSVM for baseline
- Try FastKernelSurvivalSVM with RBF if non-linearity expected
- Use grid search to tune alpha and gamma
- Always standardize features
- Compare with Random Survival Forest and Gradient Boosting