3.7 KiB
Time Series Clustering
Aeon provides clustering algorithms adapted for temporal data with specialized distance metrics and averaging methods.
Partitioning Algorithms
Standard k-means/k-medoids adapted for time series:
TimeSeriesKMeans- K-means with temporal distance metrics (DTW, Euclidean, etc.)TimeSeriesKMedoids- Uses actual time series as cluster centersTimeSeriesKShape- Shape-based clustering algorithmTimeSeriesKernelKMeans- Kernel-based variant for nonlinear patterns
Use when: Known number of clusters, spherical cluster shapes expected.
Large Dataset Methods
Efficient clustering for large collections:
TimeSeriesCLARA- Clustering Large Applications with samplingTimeSeriesCLARANS- Randomized search variant of CLARA
Use when: Dataset too large for standard k-medoids, need scalability.
Elastic Distance Clustering
Specialized for alignment-based similarity:
KASBA- K-means with shift-invariant elastic averagingElasticSOM- Self-organizing map using elastic distances
Use when: Time series have temporal shifts or warping.
Spectral Methods
Graph-based clustering:
KSpectralCentroid- Spectral clustering with centroid computation
Use when: Non-convex cluster shapes, need graph-based approach.
Deep Learning Clustering
Neural network-based clustering with auto-encoders:
AEFCNClusterer- Fully convolutional auto-encoderAEResNetClusterer- Residual network auto-encoderAEDCNNClusterer- Dilated CNN auto-encoderAEDRNNClusterer- Dilated RNN auto-encoderAEBiGRUClusterer- Bidirectional GRU auto-encoderAEAttentionBiGRUClusterer- Attention-enhanced BiGRU auto-encoder
Use when: Large datasets, need learned representations, or complex patterns.
Feature-Based Clustering
Transform to feature space before clustering:
Catch22Clusterer- Clusters on 22 canonical featuresSummaryClusterer- Uses summary statisticsTSFreshClusterer- Automated tsfresh features
Use when: Raw time series not informative, need interpretable features.
Composition
Build custom clustering pipelines:
ClustererPipeline- Chain transformers with clusterers
Averaging Methods
Compute cluster centers for time series:
mean_average- Arithmetic meanba_average- Barycentric averaging with DTWkasba_average- Shift-invariant averagingshift_invariant_average- General shift-invariant method
Use when: Need representative cluster centers for visualization or initialization.
Quick Start
from aeon.clustering import TimeSeriesKMeans
from aeon.datasets import load_classification
# Load data (using classification data for clustering)
X_train, _ = load_classification("GunPoint", split="train")
# Cluster time series
clusterer = TimeSeriesKMeans(
n_clusters=3,
distance="dtw", # Use DTW distance
averaging_method="ba" # Barycentric averaging
)
labels = clusterer.fit_predict(X_train)
centers = clusterer.cluster_centers_
Algorithm Selection
- Speed priority: TimeSeriesKMeans with Euclidean distance
- Temporal alignment: KASBA, TimeSeriesKMeans with DTW
- Large datasets: TimeSeriesCLARA, TimeSeriesCLARANS
- Complex patterns: Deep learning clusterers
- Interpretability: Catch22Clusterer, SummaryClusterer
- Non-convex clusters: KSpectralCentroid
Distance Metrics
Compatible distance metrics include:
- Euclidean, Manhattan, Minkowski (lock-step)
- DTW, DDTW, WDTW (elastic with alignment)
- ERP, EDR, LCSS (edit-based)
- MSM, TWE (specialized elastic)
Evaluation
Use clustering metrics from sklearn or aeon benchmarking:
- Silhouette score
- Davies-Bouldin index
- Calinski-Harabasz index