# Survival Support Vector Machines
## Overview
Survival Support Vector Machines (SVMs) adapt the traditional SVM framework to survival analysis with censored data. They optimize a ranking objective that encourages correct ordering of survival times.
### Core Idea
Survival SVMs learn a function f(x) that produces risk scores; the optimization encourages subjects with shorter survival times to receive higher risk scores than subjects who survive longer.
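All code in this document assumes the outcome `y` is a structured array holding an event indicator and an observed time, as used throughout scikit-survival. The sketch below builds such an array and illustrates what correct ordering means; the toy times and risk scores are invented purely for illustration:
```python
import numpy as np
from sksurv.util import Surv
from sksurv.metrics import concordance_index_censored

# Outcome: event indicator (True = event observed, False = censored) and follow-up time
event = np.array([True, False, True, True, False])
time = np.array([5.0, 12.0, 3.0, 8.0, 10.0])
y = Surv.from_arrays(event=event, time=time)  # structured array with fields 'event' and 'time'

# Hypothetical risk scores from some f(x): a higher score should mean shorter survival
risk_scores = np.array([0.8, 0.1, 1.5, 0.4, 0.2])

# Concordance index: fraction of comparable pairs that are ordered correctly
c_index = concordance_index_censored(y['event'], y['time'], risk_scores)[0]
print(f"C-index: {c_index:.3f}")
```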
## When to Use Survival SVMs
**Appropriate for:**
- Medium-sized datasets (typically 100-10,000 samples)
- Need for non-linear decision boundaries (kernel SVMs)
- Want margin-based learning with regularization
- Have well-defined feature space
**Not ideal for:**
- Very large datasets (>100,000 samples) - ensemble methods may be faster
- Need interpretable coefficients - use Cox models instead
- Require survival function estimates - use Random Survival Forest
- Very high dimensional data - use regularized Cox or gradient boosting
## Model Types
### FastSurvivalSVM
Linear survival SVM optimized for speed using coordinate descent.
**When to Use:**
- Linear relationships expected
- Large datasets where speed matters
- Want fast training and prediction
**Key Parameters:**
- `alpha`: Regularization parameter (default: 1.0)
- Higher = more regularization
- `rank_ratio`: Trade-off between ranking and regression (default: 1.0)
- `max_iter`: Maximum iterations (default: 20)
- `tol`: Tolerance for stopping criterion (default: 1e-5)
```python
from sksurv.svm import FastSurvivalSVM
# Fit linear survival SVM
estimator = FastSurvivalSVM(alpha=1.0, max_iter=100, tol=1e-5, random_state=42)
estimator.fit(X, y)
# Predict risk scores
risk_scores = estimator.predict(X_test)
```
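After fitting, the linear model's weights are available and the predicted risk scores can be evaluated on held-out data. A brief sketch, assuming `y_test` is a structured array with the default `event`/`time` field names:
```python
from sksurv.metrics import concordance_index_censored

# Per-feature weights of the linear survival SVM (sign and magnitude indicate direction and strength)
print(estimator.coef_)

# Rank-based evaluation: fraction of comparable pairs ordered correctly by the risk scores
c_index = concordance_index_censored(y_test['event'], y_test['time'], risk_scores)[0]
print(f"Test C-index: {c_index:.3f}")
```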
### FastKernelSurvivalSVM
Kernel survival SVM for non-linear relationships.
**When to Use:**
- Non-linear relationships between features and survival
- Medium-sized datasets
- Can afford longer training time for better performance
**Kernel Options:**
- `'linear'`: Linear kernel, equivalent to FastSurvivalSVM
- `'poly'`: Polynomial kernel
- `'rbf'`: Radial basis function (Gaussian) kernel - most common
- `'sigmoid'`: Sigmoid kernel
- Custom kernel function
**Key Parameters:**
- `alpha`: Regularization parameter (default: 1.0)
- `kernel`: Kernel function (default: 'rbf')
- `gamma`: Kernel coefficient for rbf, poly, sigmoid
- `degree`: Degree for polynomial kernel
- `coef0`: Independent term for poly and sigmoid
- `rank_ratio`: Trade-off parameter (default: 1.0)
- `max_iter`: Maximum iterations (default: 20)
```python
from sksurv.svm import FastKernelSurvivalSVM
# Fit RBF kernel survival SVM
estimator = FastKernelSurvivalSVM(
    alpha=1.0,
    kernel='rbf',
    gamma=0.1,  # kernel coefficient; must be numeric, tune via grid search below
    max_iter=50,
    random_state=42
)
estimator.fit(X, y)
# Predict risk scores
risk_scores = estimator.predict(X_test)
```
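The kernel list above also allows custom kernels. One way to sketch this is the precomputed option: compute any Gram matrix yourself and pass `kernel='precomputed'`. Everything below is illustrative; the `laplacian_kernel` choice and the `X_train`/`X_test`/`y_train` names are assumptions, not part of the survival API:
```python
from sklearn.metrics.pairwise import laplacian_kernel
from sksurv.svm import FastKernelSurvivalSVM

# Any user-defined Gram matrix can be plugged in via kernel='precomputed'
K_train = laplacian_kernel(X_train, X_train, gamma=0.05)
estimator = FastKernelSurvivalSVM(kernel='precomputed', alpha=1.0, random_state=42)
estimator.fit(K_train, y_train)

# At prediction time, the kernel is evaluated between test and training samples
K_test = laplacian_kernel(X_test, X_train, gamma=0.05)
risk_scores = estimator.predict(K_test)
```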
### HingeLossSurvivalSVM
Survival SVM that uses the hinge loss instead of the squared hinge loss, making it closer in behavior to the classification SVM.
**When to Use:**
- Want hinge loss instead of squared hinge
- Sparse solutions desired
- Similar behavior to classification SVMs
**Key Parameters:**
- `alpha`: Regularization parameter (default: 1.0)
```python
from sksurv.svm import HingeLossSurvivalSVM
# Fit hinge loss SVM
estimator = HingeLossSurvivalSVM(alpha=1.0)
estimator.fit(X, y)
# Predict risk scores
risk_scores = estimator.predict(X_test)
```
### NaiveSurvivalSVM
Naive implementation of the linear ranking survival SVM: it explicitly builds one sample per comparable pair and trains a standard linear SVM classifier on the differences, so the training set grows quadratically with the number of samples.
**When to Use:**
- Small datasets
- Research/benchmarking purposes
- Other methods don't converge
**Limitations:**
- Slower than Fast variants
- Less scalable
```python
from sksurv.svm import NaiveSurvivalSVM
# Fit naive SVM (slower)
estimator = NaiveSurvivalSVM(alpha=1.0, random_state=42)
estimator.fit(X, y)
# Predict
risk_scores = estimator.predict(X_test)
```
### MinlipSurvivalAnalysis
Survival model related to survival SVMs that uses a minimal Lipschitz smoothness strategy instead of a maximal margin strategy.
**When to Use:**
- Want different optimization objective
- Research applications
- Alternative to standard survival SVMs
```python
from sksurv.svm import MinlipSurvivalAnalysis
# Fit Minlip model
estimator = MinlipSurvivalAnalysis(alpha=1.0)
estimator.fit(X, y)
# Predict
risk_scores = estimator.predict(X_test)
```
## Hyperparameter Tuning
### Tuning Alpha (Regularization)
```python
from sklearn.model_selection import GridSearchCV
from sksurv.svm import FastSurvivalSVM
from sksurv.metrics import as_concordance_index_ipcw_scorer
# Wrap the estimator so that its score() method returns the IPCW concordance
# index; tuned parameters therefore need the 'estimator__' prefix
param_grid = {
    'estimator__alpha': [0.1, 0.5, 1.0, 5.0, 10.0, 50.0]
}
# Grid search
cv = GridSearchCV(
    as_concordance_index_ipcw_scorer(FastSurvivalSVM()),
    param_grid,
    cv=5,
    n_jobs=-1
)
cv.fit(X, y)
print(f"Best alpha: {cv.best_params_['estimator__alpha']}")
print(f"Best C-index: {cv.best_score_:.3f}")
```
### Tuning Kernel Parameters
```python
from sklearn.model_selection import GridSearchCV
from sksurv.svm import FastKernelSurvivalSVM
from sksurv.metrics import as_concordance_index_ipcw_scorer
# Define parameter grid for the kernel SVM (gamma must be numeric)
param_grid = {
    'estimator__alpha': [0.1, 1.0, 10.0],
    'estimator__gamma': [0.001, 0.01, 0.1, 1.0]
}
# Grid search scored by the IPCW concordance index
cv = GridSearchCV(
    as_concordance_index_ipcw_scorer(FastKernelSurvivalSVM(kernel='rbf')),
    param_grid,
    cv=5,
    n_jobs=-1
)
cv.fit(X, y)
print(f"Best parameters: {cv.best_params_}")
print(f"Best C-index: {cv.best_score_:.3f}")
```
## Clinical Kernel Transform
### ClinicalKernelTransform
Kernel designed for clinical variables of mixed type (continuous, ordinal, nominal). Each variable contributes a similarity on a comparable scale, which makes the kernel well suited for medical applications and for combining clinical data with kernels computed on high-dimensional molecular data.
**Use Case:**
- Clinical variables (age, stage, grade, etc.) that mix continuous and categorical types
- Have both clinical variables and high-dimensional molecular data (gene expression, genomics) and want to integrate heterogeneous data types
**Key Parameters:**
- `fit_once`: Whether to fit the kernel parameters once on the full data and reuse them during cross-validation (default: False)
- `sksurv.kernels.clinical_kernel` is the function counterpart that directly returns the kernel matrix for a DataFrame of clinical variables
```python
from sksurv.kernels import clinical_kernel
from sksurv.svm import FastKernelSurvivalSVM
# Clinical variables of mixed type in a pandas DataFrame
clinical_features = ['age', 'stage', 'grade']
X_clinical = X[clinical_features]
# Compute the clinical kernel matrix between all training samples
# (a kernel computed on molecular data could be combined with it, e.g. by averaging)
kernel_matrix = clinical_kernel(X_clinical)
# Fit a kernel survival SVM on the precomputed kernel
estimator = FastKernelSurvivalSVM(kernel='precomputed', alpha=1.0, random_state=42)
estimator.fit(kernel_matrix, y)
```
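To score new patients, the kernel has to be evaluated between the new samples and the training samples before calling `predict`. A minimal sketch, assuming `X_test_clinical` is a DataFrame with the same clinical columns as the training data:
```python
from sksurv.kernels import clinical_kernel

# Kernel between test rows and the training rows the SVM was fitted on
kernel_test = clinical_kernel(X_test_clinical, X_clinical)
risk_scores = estimator.predict(kernel_test)
```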
## Practical Examples
### Example 1: Linear SVM with Cross-Validation
```python
from sksurv.svm import FastSurvivalSVM
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
from sksurv.metrics import as_concordance_index_ipcw_scorer
# Standardize features (important for SVMs!)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Create model
svm = FastSurvivalSVM(alpha=1.0, max_iter=100, random_state=42)
# Cross-validation; wrapping the model makes score() return the IPCW C-index
scores = cross_val_score(
    as_concordance_index_ipcw_scorer(svm),
    X_scaled, y,
    cv=5,
    n_jobs=-1
)
print(f"Mean C-index: {scores.mean():.3f} (+/- {scores.std():.3f})")
```
### Example 2: Kernel SVM with Different Kernels
```python
from sksurv.svm import FastKernelSurvivalSVM
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sksurv.metrics import concordance_index_ipcw
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Standardize
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Compare different kernels
kernels = ['linear', 'poly', 'rbf', 'sigmoid']
results = {}
for kernel in kernels:
    # Fit model
    svm = FastKernelSurvivalSVM(kernel=kernel, alpha=1.0, random_state=42)
    svm.fit(X_train_scaled, y_train)
    # Predict
    risk_scores = svm.predict(X_test_scaled)
    # Evaluate
    c_index = concordance_index_ipcw(y_train, y_test, risk_scores)[0]
    results[kernel] = c_index
    print(f"{kernel:10s}: C-index = {c_index:.3f}")
# Best kernel
best_kernel = max(results, key=results.get)
print(f"\nBest kernel: {best_kernel} (C-index = {results[best_kernel]:.3f})")
```
### Example 3: Full Pipeline with Hyperparameter Tuning
```python
from sksurv.svm import FastKernelSurvivalSVM
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sksurv.metrics import concordance_index_ipcw
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create pipeline
pipeline = Pipeline([
('scaler', StandardScaler()),
('svm', FastKernelSurvivalSVM(kernel='rbf'))
])
# Define parameter grid (gamma must be a numeric value)
param_grid = {
    'svm__alpha': [0.1, 1.0, 10.0],
    'svm__gamma': [0.01, 0.1, 1.0]
}
# Grid search; the default score() of the survival SVM is Harrell's concordance index
cv = GridSearchCV(
    pipeline,
    param_grid,
    cv=5,
    n_jobs=-1,
    verbose=1
)
cv.fit(X_train, y_train)
# Best model
best_model = cv.best_estimator_
print(f"Best parameters: {cv.best_params_}")
print(f"Best CV C-index: {cv.best_score_:.3f}")
# Evaluate on test set
risk_scores = best_model.predict(X_test)
c_index = concordance_index_ipcw(y_train, y_test, risk_scores)[0]
print(f"Test C-index: {c_index:.3f}")
```
## Important Considerations
### Feature Scaling
**CRITICAL**: Always standardize features before using SVMs!
```python
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```
### Computational Complexity
- **FastSurvivalSVM**: O(n × p) per iteration - fast
- **FastKernelSurvivalSVM**: O(n² × p) - slower, scales quadratically
- **NaiveSurvivalSVM**: O(n³) - very slow for large datasets
For large datasets (>10,000 samples), prefer:
- FastSurvivalSVM (linear)
- Gradient Boosting
- Random Survival Forest
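These are rough orders of magnitude. A quick way to see the gap is to time both Fast variants on synthetic data; the sketch below fabricates random features and survival times purely for benchmarking, so the absolute numbers carry no meaning:
```python
import time
import numpy as np
from sksurv.util import Surv
from sksurv.svm import FastSurvivalSVM, FastKernelSurvivalSVM

# Synthetic data: 2000 samples, 20 features, ~70% observed events
rng = np.random.default_rng(0)
n, p = 2000, 20
X_syn = rng.normal(size=(n, p))
y_syn = Surv.from_arrays(event=rng.random(n) < 0.7, time=rng.exponential(scale=10.0, size=n))

# Compare wall-clock fitting time of the linear and the kernel variant
for model in (FastSurvivalSVM(max_iter=50), FastKernelSurvivalSVM(kernel='rbf', max_iter=50)):
    start = time.perf_counter()
    model.fit(X_syn, y_syn)
    print(f"{type(model).__name__}: {time.perf_counter() - start:.2f}s")
```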
### When SVMs May Not Be Best Choice
- **Very large datasets**: Ensemble methods are faster
- **Need survival functions**: Use Random Survival Forest or Cox models
- **Need interpretability**: Use Cox models
- **Very high dimensional**: Use penalized Cox (Coxnet) or gradient boosting with feature selection
## Model Selection Guide
| Model | Speed | Non-linearity | Scalability | Interpretability |
|-------|-------|---------------|-------------|------------------|
| FastSurvivalSVM | Fast | No | High | Medium |
| FastKernelSurvivalSVM | Medium | Yes | Medium | Low |
| HingeLossSurvivalSVM | Fast | No | High | Medium |
| NaiveSurvivalSVM | Slow | No | Low | Medium |
**General Recommendations:**
- Start with **FastSurvivalSVM** for baseline
- Try **FastKernelSurvivalSVM** with RBF if non-linearity expected
- Use grid search to tune alpha and gamma
- Always standardize features
- Compare with Random Survival Forest and Gradient Boosting
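Following the last recommendation, all of these models share the scikit-learn estimator API, so the same cross-validation loop can compare model families directly. A sketch assuming `X_scaled` and `y` from Example 1, relying on each estimator's default `score()`, which in scikit-survival is Harrell's concordance index:
```python
from sklearn.model_selection import cross_val_score
from sksurv.svm import FastSurvivalSVM
from sksurv.ensemble import RandomSurvivalForest, GradientBoostingSurvivalAnalysis

# Candidate models; hyperparameters are illustrative defaults, not tuned values
models = {
    'FastSurvivalSVM': FastSurvivalSVM(alpha=1.0, random_state=42),
    'RandomSurvivalForest': RandomSurvivalForest(n_estimators=100, random_state=42),
    'GradientBoosting': GradientBoostingSurvivalAnalysis(random_state=42),
}
for name, model in models.items():
    scores = cross_val_score(model, X_scaled, y, cv=5)  # default score: Harrell's C-index
    print(f"{name:25s}: {scores.mean():.3f} (+/- {scores.std():.3f})")
```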