Initial commit
skills/aeon/references/clustering.md
@@ -0,0 +1,123 @@
# Time Series Clustering

Aeon provides clustering algorithms adapted for temporal data with specialized distance metrics and averaging methods.

## Partitioning Algorithms

Standard k-means/k-medoids adapted for time series:

- `TimeSeriesKMeans` - K-means with temporal distance metrics (DTW, Euclidean, etc.)
- `TimeSeriesKMedoids` - Uses actual time series as cluster centers
- `TimeSeriesKShape` - Shape-based clustering built on a normalized cross-correlation distance
- `TimeSeriesKernelKMeans` - Kernel-based variant for nonlinear patterns

**Use when**: The number of clusters is known in advance and clusters are expected to be compact around a center.

## Large Dataset Methods

Efficient clustering for large collections:

- `TimeSeriesCLARA` - Clustering Large Applications: runs k-medoids on random samples and keeps the best medoids
- `TimeSeriesCLARANS` - Randomized search variant of CLARA

**Use when**: The dataset is too large for standard k-medoids and scalability matters.
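
The sampling idea behind CLARA can be sketched in plain Python. This is an illustrative toy, not aeon's implementation: the names `clara` and `total_cost`, the exhaustive medoid search standing in for PAM, and the toy data are all assumptions made for this example.

```python
import math
import random
from itertools import combinations


def total_cost(points, medoids, dist):
    """Sum of each point's distance to its nearest medoid."""
    return sum(min(dist(p, m) for m in medoids) for p in points)


def clara(data, k, dist, n_samples=5, sample_size=10, seed=0):
    """CLARA-style search: fit medoids on small samples, score them on all data."""
    rng = random.Random(seed)
    best_medoids, best_cost = None, float("inf")
    for _ in range(n_samples):
        sample = rng.sample(range(len(data)), min(sample_size, len(data)))
        subset = [data[i] for i in sample]
        # Exhaustive k-medoids on the sample (hypothetical stand-in for PAM).
        candidate = min(
            combinations(sample, k),
            key=lambda idx: total_cost(subset, [data[i] for i in idx], dist),
        )
        # The candidate is judged on the full dataset, not just the sample.
        cost = total_cost(data, [data[i] for i in candidate], dist)
        if cost < best_cost:
            best_medoids, best_cost = candidate, cost
    return list(best_medoids), best_cost


# Toy collection: 10 series near level 0 and 10 series near level 10.
data = [[g + 0.1 * j] * 4 for g in (0.0, 10.0) for j in range(10)]
medoids, cost = clara(data, k=2, dist=math.dist)
```

Only the small samples go through the expensive medoid search, which is what keeps the approach tractable on large collections. Any distance function with the same two-argument signature could replace `math.dist`.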

## Elastic Distance Clustering

Specialized for alignment-based similarity:

- `KASBA` - Accelerated k-means using an elastic distance (MSM by default) with stochastic subgradient barycentre averaging
- `ElasticSOM` - Self-organizing map using elastic distances

**Use when**: Time series have temporal shifts or warping.
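
## Spectral Methods

Scale- and shift-invariant centroid clustering:

- `KSpectralCentroid` - K-Spectral Centroid (K-SC): a k-means-style algorithm with a scale- and shift-invariant distance, whose centroids come from an eigenvector computation rather than from graph-based spectral clustering

**Use when**: Series share a characteristic shape but differ in amplitude or timing.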
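
As a rough illustration of the kind of distance K-Spectral Centroid relies on, the sketch below computes a scale- and shift-invariant distance in plain Python. It is a simplified reading of the K-SC idea, not aeon's code: the function name `ksc_distance` is hypothetical, circular shifts are an assumption made for brevity, and both inputs are assumed to be non-zero series of equal length.

```python
import math


def ksc_distance(x, y):
    """Smallest relative residual between x and any shifted, rescaled copy of y."""
    norm_x = math.sqrt(sum(v * v for v in x))
    best = float("inf")
    for q in range(len(y)):
        y_q = y[q:] + y[:q]  # circular shift by q steps (assumption)
        # Closed-form optimal scaling factor for this shift.
        alpha = sum(a * b for a, b in zip(x, y_q)) / sum(b * b for b in y_q)
        residual = math.sqrt(sum((a - alpha * b) ** 2 for a, b in zip(x, y_q)))
        best = min(best, residual / norm_x)
    return best


x = [0, 1, 3, 1, 0, 0]
same_shape = [0, 0, 0, 2, 6, 2]   # x shifted by two steps and doubled
flat = [1, 1, 1, 1, 1, 1]         # a different shape

d_same = ksc_distance(x, same_shape)
d_flat = ksc_distance(x, flat)
```

Here `d_same` is zero because the two series differ only by a shift and a scale factor, while `d_flat` stays well above zero.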

## Deep Learning Clustering

Neural network-based clustering with auto-encoders:

- `AEFCNClusterer` - Fully convolutional auto-encoder
- `AEResNetClusterer` - Residual network auto-encoder
- `AEDCNNClusterer` - Dilated CNN auto-encoder
- `AEDRNNClusterer` - Dilated RNN auto-encoder
- `AEBiGRUClusterer` - Bidirectional GRU auto-encoder
- `AEAttentionBiGRUClusterer` - Attention-enhanced BiGRU auto-encoder

**Use when**: Large datasets, need learned representations, or complex patterns.

## Feature-Based Clustering

Transform to feature space before clustering:

- `Catch22Clusterer` - Clusters on 22 canonical features
- `SummaryClusterer` - Uses summary statistics
- `TSFreshClusterer` - Automated tsfresh features

**Use when**: Raw time series values are not informative on their own, or interpretable features are needed.
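
The feature-based idea can be shown with a few summary statistics computed in plain Python. This is a conceptual sketch only: the chosen features and the name `summary_features` are assumptions for illustration and do not reproduce the feature sets of the aeon clusterers listed above.

```python
import math
import statistics


def summary_features(series):
    """Map one series to a small, interpretable feature vector."""
    return [
        statistics.fmean(series),   # level
        statistics.pstdev(series),  # variability
        min(series),
        max(series),
    ]


series = {
    "flat_low": [1.0, 1.1, 0.9, 1.0, 1.0, 1.1],
    "flat_low_longer": [1.0, 0.9, 1.1, 1.0, 0.9, 1.0, 1.1, 1.0],
    "volatile": [0.0, 4.0, -3.0, 5.0, -4.0, 3.0],
}
features = {name: summary_features(s) for name, s in series.items()}

# Distances are now measured between fixed-length feature vectors.
d_similar = math.dist(features["flat_low"], features["flat_low_longer"])
d_different = math.dist(features["flat_low"], features["volatile"])
```

Once every series is a fixed-length vector, any standard clusterer can be applied. A side effect visible in the toy data is that series of different lengths become directly comparable.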

## Composition

Build custom clustering pipelines:

- `ClustererPipeline` - Chain transformers with clusterers

## Averaging Methods

Compute cluster centers for time series:

- `mean_average` - Arithmetic mean
- `ba_average` - Barycentric averaging with DTW
- `kasba_average` - Stochastic subgradient barycentre averaging, as used by KASBA
- `shift_invariant_average` - Scale- and shift-invariant averaging, as used by K-Spectral Centroid

**Use when**: Need representative cluster centers for visualization or initialization.
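
To see why an alignment-aware average can be preferable to the arithmetic mean, the following plain-Python sketch runs a single barycentre-averaging update in the style of DBA. It is a teaching sketch under stated assumptions, not aeon's implementation: `dtw_path` and `dba_update` are hypothetical helper names, the first series is used as the initial center, and only one update step is performed.

```python
def dtw_path(a, b):
    """Return an optimal DTW alignment of a and b as (i, j) index pairs."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        # Step back to the cheapest predecessor cell.
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    return path[::-1]


def dba_update(center, collection):
    """Average, per center position, all values that DTW aligns to it."""
    buckets = [[] for _ in center]
    for series in collection:
        for i, j in dtw_path(center, series):
            buckets[i].append(series[j])
    return [sum(b) / len(b) for b in buckets]


a = [0, 1, 3, 1, 0, 0, 0]
b = [0, 0, 0, 1, 3, 1, 0]  # same peak, two steps later

arithmetic = [(x + y) / 2 for x, y in zip(a, b)]
barycentre = dba_update(a, [a, b])
```

On this toy pair the arithmetic mean smears the peak into two lower bumps (maximum 1.5), whereas the aligned average keeps a single peak of height 3.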

## Quick Start

```python
from aeon.clustering import TimeSeriesKMeans
from aeon.datasets import load_classification

# Load data (using classification data for clustering)
X_train, _ = load_classification("GunPoint", split="train")

# Cluster time series
clusterer = TimeSeriesKMeans(
    n_clusters=3,
    distance="dtw",  # Use DTW distance
    averaging_method="ba"  # Barycentric averaging
)
labels = clusterer.fit_predict(X_train)
centers = clusterer.cluster_centers_
```

## Algorithm Selection

- **Speed priority**: TimeSeriesKMeans with Euclidean distance
- **Temporal alignment**: KASBA, TimeSeriesKMeans with DTW
- **Large datasets**: TimeSeriesCLARA, TimeSeriesCLARANS
- **Complex patterns**: Deep learning clusterers
- **Interpretability**: Catch22Clusterer, SummaryClusterer
- **Scale- and shift-invariant shapes**: KSpectralCentroid

## Distance Metrics

Compatible distance metrics include:

- Euclidean, Manhattan, Minkowski (lock-step)
- DTW, DDTW, WDTW (elastic with alignment)
- ERP, EDR, LCSS (edit-based)
- MSM, TWE (specialized elastic)
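
The difference between a lock-step and an elastic distance is easiest to see on two copies of the same peak that are offset in time. The sketch below is a minimal plain-Python illustration, not aeon's distance module: the function names are hypothetical, and the DTW variant shown uses a squared pointwise cost, no warping window, and a final square root, which are conventions chosen for this example.

```python
import math


def euclidean(a, b):
    """Lock-step distance: a[i] is only ever compared with b[i]."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a, b):
    """Elastic distance: dynamic program over all monotonic alignments."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of: match, stretch a, stretch b.
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    return math.sqrt(cost[n][m])


peak = [0, 0, 1, 2, 1, 0, 0, 0]
shifted = [0, 0, 0, 1, 2, 1, 0, 0]  # same peak, one step later

d_euclidean = euclidean(peak, shifted)  # 2.0
d_dtw = dtw(peak, shifted)              # 0.0
```

The lock-step distance penalizes the one-step offset, while DTW warps the time axis and finds a perfect alignment. The price is quadratic cost in the series length, which is why Euclidean distance remains the fast option.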

## Evaluation

Use clustering metrics from sklearn or aeon benchmarking:

- Silhouette score
- Davies-Bouldin index
- Calinski-Harabasz index
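
When clusters were formed with an elastic distance, it can make sense to score them with the same distance instead of a default Euclidean one. One way is to work from a precomputed pairwise distance matrix, which scikit-learn's `silhouette_score` also accepts via `metric="precomputed"`. The plain-Python sketch below shows the computation itself; the function name `silhouette` and the toy values are assumptions for illustration, and returning 0 for singleton or single-cluster cases is a simplification.

```python
def silhouette(dist, labels):
    """Mean silhouette coefficient from a precomputed distance matrix."""
    n = len(labels)
    scores = []
    for i in range(n):
        by_cluster = {}
        for j in range(n):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(dist[i][j])
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            scores.append(0.0)  # singleton cluster or only one cluster
            continue
        a = sum(own) / len(own)                                # cohesion
        b = min(sum(d) / len(d) for d in by_cluster.values())  # separation
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / n


# Toy distances between four items located at 0, 1, 10 and 11.
values = [0, 1, 10, 11]
dist = [[abs(u - v) for v in values] for u in values]

good = silhouette(dist, [0, 0, 1, 1])  # groups nearby items together
bad = silhouette(dist, [0, 1, 0, 1])   # splits nearby items apart
```

The matrix `dist` could equally hold DTW or MSM distances between series; the scoring logic does not change.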