# Time Series Clustering

Aeon provides clustering algorithms adapted for temporal data, with specialized distance metrics and averaging methods.

## Partitioning Algorithms

Standard k-means and k-medoids adapted for time series:

- `TimeSeriesKMeans` - K-means with temporal distance metrics (DTW, Euclidean, etc.)
- `TimeSeriesKMedoids` - Uses actual time series as cluster centers
- `TimeSeriesKShape` - Shape-based clustering algorithm
- `TimeSeriesKernelKMeans` - Kernel-based variant for nonlinear patterns

**Use when**: Known number of clusters, spherical cluster shapes expected.
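
To make the "actual time series as cluster centers" idea concrete, here is a minimal pure-Python sketch of medoid selection. The helper names `euclidean` and `medoid` are hypothetical and are not part of aeon; the sketch only illustrates the concept, not how `TimeSeriesKMedoids` is implemented.

```python
def euclidean(a, b):
    """Lock-step distance between two equal-length series."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def medoid(series, dist=euclidean):
    """Return the member with the smallest total distance to all members."""
    return min(series, key=lambda s: sum(dist(s, t) for t in series))


# Hypothetical toy cluster: two identical series and one shifted copy.
cluster = [
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
center = medoid(cluster)
print(center)  # [0.0, 1.0, 0.0, 0.0]
```

Because the center is always one of the inputs, any pairwise distance can be plugged in through `dist` without needing a matching averaging method.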

## Large Dataset Methods

Efficient clustering for large collections:

- `TimeSeriesCLARA` - Clustering Large Applications with sampling
- `TimeSeriesCLARANS` - Randomized search variant of CLARA

**Use when**: Dataset too large for standard k-medoids, need scalability.

## Elastic Distance Clustering

Specialized for alignment-based similarity:

- `KASBA` - K-means variant that uses stochastic subgradient barycenter averaging, designed for the MSM distance
- `ElasticSOM` - Self-organizing map using elastic distances

**Use when**: Time series have temporal shifts or warping.
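
The effect of temporal shifts on distance can be seen in a small pure-Python sketch. The `dtw` function below is a textbook dynamic-programming implementation with squared point-wise cost and no warping window, written here only for illustration; it is an assumption of this example and not aeon's own implementation.

```python
def euclidean(a, b):
    """Lock-step distance: compares points at the same index only."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def dtw(a, b):
    """Classic O(len(a) * len(b)) DTW with squared point-wise cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i][j] = d + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


spike = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
shifted = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]

print(euclidean(spike, shifted))  # about 1.414: the peaks do not line up
print(dtw(spike, shifted))        # 0.0: warping aligns the two peaks
```

A lock-step distance treats the two series as different, while the elastic distance treats them as the same shape, which is the behavior the clusterers in this section rely on.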

## Spectral Methods

Centroid-based clustering with a shift- and scale-invariant distance:

- `KSpectralCentroid` - K-Spectral Centroid (k-SC), a k-means variant built on a distance that is invariant to scaling and shifting

**Use when**: Series share a shape but differ in amplitude or time offset.

## Deep Learning Clustering

Neural network-based clustering with auto-encoders:

- `AEFCNClusterer` - Fully convolutional auto-encoder
- `AEResNetClusterer` - Residual network auto-encoder
- `AEDCNNClusterer` - Dilated CNN auto-encoder
- `AEDRNNClusterer` - Dilated RNN auto-encoder
- `AEBiGRUClusterer` - Bidirectional GRU auto-encoder
- `AEAttentionBiGRUClusterer` - Attention-enhanced BiGRU auto-encoder

**Use when**: Large datasets, need learned representations, or complex patterns.

## Feature-Based Clustering

Transform to feature space before clustering:

- `Catch22Clusterer` - Clusters on 22 canonical features
- `SummaryClusterer` - Uses summary statistics
- `TSFreshClusterer` - Automated tsfresh features

**Use when**: Raw time series not informative, need interpretable features.
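
The "transform first, then cluster" idea can be sketched with the standard library alone. The `summary_features` helper and the four statistics it computes are assumptions chosen for illustration; they are not claimed to match the exact features any aeon clusterer extracts.

```python
import statistics


def summary_features(series):
    """Map one series of any length to a fixed-length feature vector."""
    return [
        statistics.fmean(series),   # level
        statistics.pstdev(series),  # spread
        min(series),
        max(series),
    ]


flat = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
noisy = [0.0, 2.0, 0.0, 2.0, 0.0, 2.0]
long_noisy = [0.0, 2.0] * 10  # different length, same behaviour

for s in (flat, noisy, long_noisy):
    print(summary_features(s))
```

Here `noisy` and `long_noisy` map to the same feature vector even though their lengths differ, so a standard tabular clusterer applied to these vectors would group them together, and each feature remains directly interpretable.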

## Composition

Build custom clustering pipelines:

- `ClustererPipeline` - Chain transformers with clusterers

## Averaging Methods

Compute cluster centers for time series:

- `mean_average` - Arithmetic mean
- `elastic_barycenter_average` - Barycenter averaging (BA) under an elastic distance such as DTW
- `kasba_average` - Stochastic subgradient barycenter averaging, as used by KASBA
- `shift_invariant_average` - Shift-invariant averaging

**Use when**: Need representative cluster centers for visualization or initialization.
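
A small pure-Python sketch shows why the choice of averaging method matters. The helper `mean_of_series` is hypothetical and simply takes the point-wise arithmetic mean; it is meant to illustrate the limitation that alignment-aware averaging addresses, not to reproduce any aeon function.

```python
def mean_of_series(series):
    """Point-wise arithmetic mean of equal-length series."""
    return [sum(vals) / len(vals) for vals in zip(*series)]


# Two series with the same single-peak shape at different times.
early = [0.0, 1.0, 0.0, 0.0, 0.0]
late = [0.0, 0.0, 0.0, 1.0, 0.0]

center = mean_of_series([early, late])
print(center)  # [0.0, 0.5, 0.0, 0.5, 0.0]
```

The arithmetic mean has two half-height peaks and so resembles neither input. Averaging methods that align the series first aim to return a single-peak center instead.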

## Quick Start

```python
from aeon.clustering import TimeSeriesKMeans
from aeon.datasets import load_classification

# Load data (using classification data for clustering)
X_train, _ = load_classification("GunPoint", split="train")

# Cluster time series
clusterer = TimeSeriesKMeans(
    n_clusters=3,
    distance="dtw",  # Use DTW distance
    averaging_method="ba",  # Barycentric averaging
)
labels = clusterer.fit_predict(X_train)
centers = clusterer.cluster_centers_
```

## Algorithm Selection

- **Speed priority**: TimeSeriesKMeans with Euclidean distance
- **Temporal alignment**: KASBA, TimeSeriesKMeans with DTW
- **Large datasets**: TimeSeriesCLARA, TimeSeriesCLARANS
- **Complex patterns**: Deep learning clusterers
- **Interpretability**: Catch22Clusterer, SummaryClusterer
- **Shift and scale invariance**: KSpectralCentroid

## Distance Metrics

Compatible distance metrics include:

- Euclidean, Manhattan, Minkowski (lock-step)
- DTW, DDTW, WDTW (elastic with alignment)
- ERP, EDR, LCSS (edit-based)
- MSM, TWE (specialized elastic)

## Evaluation

Use clustering metrics from sklearn or aeon benchmarking:

- Silhouette score
- Davies-Bouldin index
- Calinski-Harabasz index
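
The silhouette score only needs pairwise distances, which makes it usable with elastic distances as well as Euclidean ones. The following pure-Python sketch follows the standard definition; the function name `silhouette_from_distances` and the toy distance matrix are hypothetical, and the sketch assumes at least two clusters and no two distinct points at zero distance. In practice a library implementation, such as scikit-learn's `silhouette_score` with `metric="precomputed"`, would be the usual choice.

```python
def silhouette_from_distances(dist, labels):
    """Mean silhouette coefficient from a square pairwise distance matrix."""
    n = len(labels)
    clusters = set(labels)
    scores = []
    for i in range(n):
        same = [dist[i][j] for j in range(n) if labels[j] == labels[i] and j != i]
        if not same:  # singleton cluster: coefficient is defined as 0
            scores.append(0.0)
            continue
        a = sum(same) / len(same)  # mean distance within own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(dist[i][j] for j in range(n) if labels[j] == c)
            / sum(1 for j in range(n) if labels[j] == c)
            for c in clusters
            if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / n


# Hypothetical distances: items 0,1 are close, items 2,3 are close.
D = [
    [0.0, 1.0, 9.0, 9.0],
    [1.0, 0.0, 9.0, 9.0],
    [9.0, 9.0, 0.0, 1.0],
    [9.0, 9.0, 1.0, 0.0],
]
good = silhouette_from_distances(D, [0, 0, 1, 1])
bad = silhouette_from_distances(D, [0, 1, 0, 1])
print(good, bad)  # good is close to 1, bad is negative
```

Values near 1 indicate compact, well-separated clusters, while negative values suggest that many series sit closer to another cluster than to their own.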