# Survival Support Vector Machines

## Overview

Survival Support Vector Machines (SVMs) adapt the traditional SVM framework to survival analysis with censored data. They optimize a ranking objective that encourages the correct ordering of survival times.

### Core Idea

Survival SVMs learn a function f(x) that produces risk scores; the optimization ensures that subjects with shorter survival times receive higher risk scores than subjects with longer survival times.

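To make the ranking objective concrete, here is a sketch of one common squared-hinge formulation (scikit-survival's exact objective may differ in details such as the intercept, the pair set, and sign conventions):

$$
\min_{w}\; \frac{1}{2}\lVert w \rVert^2 + \frac{\alpha}{2} \sum_{(i,j) \in \mathcal{P}} \max\bigl(0,\, 1 - (\langle w, x_i \rangle - \langle w, x_j \rangle)\bigr)^2
$$

where $\mathcal{P} = \{(i, j) : y_i < y_j,\ \delta_i = 1\}$ is the set of comparable pairs: subject $i$ had an observed event ($\delta_i = 1$) before subject $j$'s time, so the model is pushed to assign $x_i$ the higher risk score.
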
## When to Use Survival SVMs

**Appropriate for:**
- Medium-sized datasets (typically 100-10,000 samples)
- Need for non-linear decision boundaries (kernel SVMs)
- Margin-based learning with regularization desired
- A well-defined feature space

**Not ideal for:**
- Very large datasets (>100,000 samples) - ensemble methods may be faster
- Interpretable coefficients needed - use Cox models instead
- Survival function estimates required - use Random Survival Forest
- Very high-dimensional data - use regularized Cox or gradient boosting

## Model Types

### FastSurvivalSVM

Linear survival SVM trained with an efficient truncated Newton optimizer that uses order-statistic trees, avoiding explicit enumeration of all comparable pairs.

**When to Use:**
- Linear relationships expected
- Large datasets where speed matters
- Fast training and prediction required

**Key Parameters:**
- `alpha`: Regularization parameter (default: 1.0)
  - Higher = more regularization
- `rank_ratio`: Trade-off between the ranking and regression objectives (default: 1.0 = pure ranking; 0.0 = pure regression)
- `max_iter`: Maximum iterations (default: 20)
- `tol`: Tolerance for the stopping criterion

```python
from sksurv.svm import FastSurvivalSVM

# Fit linear survival SVM
estimator = FastSurvivalSVM(alpha=1.0, max_iter=100, tol=1e-5, random_state=42)
estimator.fit(X, y)

# Predict risk scores
risk_scores = estimator.predict(X_test)
```

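The `y` used throughout these examples is the structured array scikit-survival expects, holding an event indicator and an observed time. A minimal sketch of building it with `Surv.from_arrays` (the toy values here are made up for illustration):

```python
import numpy as np
from sksurv.util import Surv

# Event indicator: True if the event was observed, False if censored
event = np.array([True, False, True, True])
# Observed follow-up times
time = np.array([10.0, 25.0, 7.0, 18.0])

# Build the structured outcome array expected by all sksurv estimators
y = Surv.from_arrays(event=event, time=time)
```
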
### FastKernelSurvivalSVM

Kernel survival SVM for non-linear relationships.

**When to Use:**
- Non-linear relationships between features and survival
- Medium-sized datasets
- Longer training time is acceptable in exchange for better performance

**Kernel Options:**
- `'linear'`: Linear kernel, equivalent to FastSurvivalSVM
- `'poly'`: Polynomial kernel
- `'rbf'`: Radial basis function (Gaussian) kernel - most common
- `'sigmoid'`: Sigmoid kernel
- Custom kernel function

**Key Parameters:**
- `alpha`: Regularization parameter (default: 1.0)
- `kernel`: Kernel function (default: 'rbf')
- `gamma`: Kernel coefficient for rbf, poly, and sigmoid; a numeric value (string options such as `'scale'` from sklearn's SVC are not supported)
- `degree`: Degree for polynomial kernel
- `coef0`: Independent term for poly and sigmoid
- `rank_ratio`: Trade-off parameter (default: 1.0)
- `max_iter`: Maximum iterations (default: 20)

```python
from sksurv.svm import FastKernelSurvivalSVM

# Fit RBF kernel survival SVM; gamma must be numeric
# (the default, None, is based on the number of features)
estimator = FastKernelSurvivalSVM(
    alpha=1.0,
    kernel='rbf',
    gamma=0.1,
    max_iter=50,
    random_state=42
)
estimator.fit(X, y)

# Predict risk scores
risk_scores = estimator.predict(X_test)
```

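The kernel can also be a Python callable. Following sklearn's `pairwise_kernels` convention (which scikit-survival uses internally), a callable kernel is invoked on each pair of samples and must return a scalar; a minimal sketch with a hypothetical Laplacian-style kernel:

```python
import numpy as np
from sksurv.svm import FastKernelSurvivalSVM

def laplacian_kernel_fn(x, z):
    # Called on one pair of samples (1-D arrays); returns a scalar
    return np.exp(-0.5 * np.sum(np.abs(x - z)))

# Per-pair callables are flexible but slow for large datasets
estimator = FastKernelSurvivalSVM(kernel=laplacian_kernel_fn, random_state=42)
```
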
### HingeLossSurvivalSVM

Survival SVM using the hinge loss, closest in spirit to the classification SVM: it explicitly builds difference vectors between comparable pairs of samples, so memory use grows quickly with dataset size.

**When to Use:**
- Hinge loss preferred over squared hinge
- Sparse solutions desired
- Behavior similar to classification SVMs

**Key Parameters:**
- `alpha`: Regularization parameter (default: 1.0)
- `kernel`: Kernel function (default: 'linear')

```python
from sksurv.svm import HingeLossSurvivalSVM

# Fit hinge loss SVM (solved as a convex program; there is no
# random_state or fit_intercept parameter)
estimator = HingeLossSurvivalSVM(alpha=1.0)
estimator.fit(X, y)

# Predict risk scores
risk_scores = estimator.predict(X_test)
```

### NaiveSurvivalSVM

The straightforward formulation: the ranking problem is reduced to a classification problem over all comparable pairs of samples, which are constructed explicitly.

**When to Use:**
- Small datasets
- Research/benchmarking purposes
- Other methods don't converge

**Limitations:**
- Slower than the Fast variants
- Less scalable: the number of pairs grows quadratically with the number of samples (see the sketch after the code below)

```python
from sksurv.svm import NaiveSurvivalSVM

# Fit naive SVM (slower)
estimator = NaiveSurvivalSVM(alpha=1.0, random_state=42)
estimator.fit(X, y)

# Predict
risk_scores = estimator.predict(X_test)
```

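To see why the explicit pairwise construction scales poorly, this small illustrative sketch counts the comparable pairs, which grow roughly quadratically with the sample size:

```python
import numpy as np

def n_comparable_pairs(event, time):
    """Count pairs (i, j) where subject i had an observed event
    before subject j's time - the pairs a naive survival SVM
    must enumerate explicitly."""
    count = 0
    for i in range(len(time)):
        if event[i]:
            count += int(np.sum(time > time[i]))
    return count

# With no censoring, n samples yield n * (n - 1) / 2 pairs
event = np.ones(1000, dtype=bool)
time = np.arange(1000, dtype=float)
print(n_comparable_pairs(event, time))  # 499500
```
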
### MinlipSurvivalAnalysis

Survival model that minimizes the Lipschitz constant of the prediction function, an alternative objective to the margin-based survival SVMs.

**When to Use:**
- A different optimization objective is desired
- Research applications
- Alternative to standard survival SVMs

```python
from sksurv.svm import MinlipSurvivalAnalysis

# Fit Minlip model (solved as a convex program; no random_state parameter)
estimator = MinlipSurvivalAnalysis(alpha=1.0)
estimator.fit(X, y)

# Predict
risk_scores = estimator.predict(X_test)
```

## Hyperparameter Tuning

### Tuning Alpha (Regularization)

`as_concordance_index_ipcw_scorer` is a wrapper that makes an estimator's `score` method use the IPCW concordance index; it wraps the estimator itself rather than being passed as `scoring=`, so grid keys get an `estimator__` prefix.

```python
from sklearn.model_selection import GridSearchCV
from sksurv.metrics import as_concordance_index_ipcw_scorer
from sksurv.svm import FastSurvivalSVM

# Define parameter grid (note the estimator__ prefix for the wrapper)
param_grid = {
    'estimator__alpha': [0.1, 0.5, 1.0, 5.0, 10.0, 50.0]
}

# Grid search
cv = GridSearchCV(
    as_concordance_index_ipcw_scorer(FastSurvivalSVM()),
    param_grid,
    cv=5,
    n_jobs=-1
)
cv.fit(X, y)

print(f"Best alpha: {cv.best_params_['estimator__alpha']}")
print(f"Best C-index: {cv.best_score_:.3f}")
```

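If IPCW weighting is not needed, the wrapper can be dropped entirely: scikit-survival estimators' default `score` method returns Harrell's concordance index, so a plain grid search over `alpha` also works:

```python
# Plain grid search using the estimator's default score
# (Harrell's concordance index)
cv_simple = GridSearchCV(
    FastSurvivalSVM(max_iter=100, random_state=42),
    {'alpha': [0.1, 1.0, 10.0]},
    cv=5,
    n_jobs=-1
)
cv_simple.fit(X, y)
```
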
### Tuning Kernel Parameters

```python
from sklearn.model_selection import GridSearchCV
from sksurv.metrics import as_concordance_index_ipcw_scorer
from sksurv.svm import FastKernelSurvivalSVM

# Define parameter grid for kernel SVM (gamma must be numeric)
param_grid = {
    'estimator__alpha': [0.1, 1.0, 10.0],
    'estimator__gamma': [0.001, 0.01, 0.1, 1.0]
}

# Grid search
cv = GridSearchCV(
    as_concordance_index_ipcw_scorer(FastKernelSurvivalSVM(kernel='rbf')),
    param_grid,
    cv=5,
    n_jobs=-1
)
cv.fit(X, y)

print(f"Best parameters: {cv.best_params_}")
print(f"Best C-index: {cv.best_score_:.3f}")
```

## Clinical Kernel Transform

### ClinicalKernelTransform

Special kernel designed for clinical data: each feature is compared according to its type (continuous, ordinal, or nominal), and the result can be combined with kernels on high-dimensional molecular data (gene expression, genomics) for medical applications.

**Use Case:**
- Clinical variables of mixed types (age, stage, grade, etc.)
- Optionally integrating clinical features with molecular data
- Heterogeneous data types

**Key Parameters:**
- `fit_once`: If True, prepare the kernel on the full dataset once so it can be reused across cross-validation folds (default: False)

```python
from sksurv.kernels import clinical_kernel
from sksurv.svm import FastKernelSurvivalSVM

# Clinical features of mixed type (continuous, ordinal, nominal)
clinical_features = ['age', 'stage', 'grade']
X_clinical = X[clinical_features]

# Compute the clinical kernel matrix; each column is compared
# according to its type
kernel_matrix = clinical_kernel(X_clinical)

# Use the precomputed kernel matrix with the kernel survival SVM
estimator = FastKernelSurvivalSVM(kernel='precomputed', random_state=42)
estimator.fit(kernel_matrix, y)

# Prediction needs kernel values between test and training samples
risk_scores = estimator.predict(clinical_kernel(X_test[clinical_features], X_clinical))
```

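When the kernel has to be evaluated between cross-validation folds, a precomputed matrix is awkward. A sketch following the pattern in scikit-survival's survival SVM example: prepare a `ClinicalKernelTransform` with `fit_once=True` on the full feature data frame and pass its `pairwise_kernel` method as a callable kernel:

```python
from sklearn.model_selection import GridSearchCV
from sksurv.kernels import ClinicalKernelTransform
from sksurv.svm import FastKernelSurvivalSVM

# Prepare the kernel once on the full feature data frame
transform = ClinicalKernelTransform(fit_once=True)
transform.fit(X_clinical)

# The bound pairwise_kernel method acts as a callable kernel
svm = FastKernelSurvivalSVM(kernel=transform.pairwise_kernel, random_state=42)

cv = GridSearchCV(svm, {'alpha': [0.1, 1.0, 10.0]}, cv=5)
cv.fit(X_clinical.values, y)
```
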
## Practical Examples

### Example 1: Linear SVM with Cross-Validation

```python
from sksurv.svm import FastSurvivalSVM
from sklearn.model_selection import cross_val_score
from sksurv.metrics import as_concordance_index_ipcw_scorer
from sklearn.preprocessing import StandardScaler

# Standardize features (important for SVMs!)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Create model
svm = FastSurvivalSVM(alpha=1.0, max_iter=100, random_state=42)

# Cross-validation: wrap the model so its score method uses the
# IPCW concordance index
scores = cross_val_score(
    as_concordance_index_ipcw_scorer(svm), X_scaled, y,
    cv=5,
    n_jobs=-1
)

print(f"Mean C-index: {scores.mean():.3f} (±{scores.std():.3f})")
```

### Example 2: Kernel SVM with Different Kernels

```python
from sksurv.svm import FastKernelSurvivalSVM
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sksurv.metrics import concordance_index_ipcw

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Compare different kernels
kernels = ['linear', 'poly', 'rbf', 'sigmoid']
results = {}

for kernel in kernels:
    # Fit model
    svm = FastKernelSurvivalSVM(kernel=kernel, alpha=1.0, random_state=42)
    svm.fit(X_train_scaled, y_train)

    # Predict
    risk_scores = svm.predict(X_test_scaled)

    # Evaluate: the censoring distribution is estimated from the training data
    c_index = concordance_index_ipcw(y_train, y_test, risk_scores)[0]
    results[kernel] = c_index

    print(f"{kernel:10s}: C-index = {c_index:.3f}")

# Best kernel
best_kernel = max(results, key=results.get)
print(f"\nBest kernel: {best_kernel} (C-index = {results[best_kernel]:.3f})")
```

### Example 3: Full Pipeline with Hyperparameter Tuning

```python
from sksurv.svm import FastKernelSurvivalSVM
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sksurv.metrics import as_concordance_index_ipcw_scorer, concordance_index_ipcw

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create pipeline
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('svm', FastKernelSurvivalSVM(kernel='rbf'))
])

# Define parameter grid (the scorer wrapper adds an estimator__ prefix;
# gamma must be numeric)
param_grid = {
    'estimator__svm__alpha': [0.1, 1.0, 10.0],
    'estimator__svm__gamma': [0.01, 0.1, 1.0]
}

# Grid search
cv = GridSearchCV(
    as_concordance_index_ipcw_scorer(pipeline),
    param_grid,
    cv=5,
    n_jobs=-1,
    verbose=1
)
cv.fit(X_train, y_train)

# Best model
best_model = cv.best_estimator_
print(f"Best parameters: {cv.best_params_}")
print(f"Best CV C-index: {cv.best_score_:.3f}")

# Evaluate on test set
risk_scores = best_model.predict(X_test)
c_index = concordance_index_ipcw(y_train, y_test, risk_scores)[0]
print(f"Test C-index: {c_index:.3f}")
```

## Important Considerations

### Feature Scaling

**CRITICAL**: Always standardize features before using SVMs!

```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

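In cross-validation, fitting the scaler inside a pipeline keeps test-fold statistics out of the scaling step; a minimal sketch:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sksurv.svm import FastSurvivalSVM

# The scaler is refit on each training fold, preventing data leakage
model = make_pipeline(StandardScaler(), FastSurvivalSVM(alpha=1.0))
```
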
### Computational Complexity

- **FastSurvivalSVM**: roughly O(n log n) work per iteration via tree-based order statistics - fast
- **FastKernelSurvivalSVM**: requires the O(n²) kernel matrix - slower; time and memory scale quadratically
- **NaiveSurvivalSVM**: explicitly constructs up to O(n²) pairwise comparisons - very slow for large datasets

For large datasets (>10,000 samples), prefer:
- FastSurvivalSVM (linear)
- Gradient Boosting
- Random Survival Forest

### When SVMs May Not Be the Best Choice

- **Very large datasets**: Ensemble methods are faster
- **Need survival functions**: Use Random Survival Forest or Cox models
- **Need interpretability**: Use Cox models
- **Very high-dimensional data**: Use penalized Cox (Coxnet) or gradient boosting with feature selection

## Model Selection Guide

| Model | Speed | Non-linearity | Scalability | Interpretability |
|-------|-------|---------------|-------------|------------------|
| FastSurvivalSVM | Fast | No | High | Medium |
| FastKernelSurvivalSVM | Medium | Yes | Medium | Low |
| HingeLossSurvivalSVM | Medium | Yes (kernels) | Low | Medium |
| NaiveSurvivalSVM | Slow | No | Low | Medium |

**General Recommendations:**
- Start with **FastSurvivalSVM** for a baseline
- Try **FastKernelSurvivalSVM** with RBF if non-linearity is expected
- Use grid search to tune alpha and gamma
- Always standardize features
- Compare with Random Survival Forest and Gradient Boosting