# Time Series Clustering
Aeon provides clustering algorithms adapted for temporal data with specialized distance metrics and averaging methods.
## Partitioning Algorithms
Standard k-means/k-medoids adapted for time series:
- `TimeSeriesKMeans` - K-means with temporal distance metrics (DTW, Euclidean, etc.)
- `TimeSeriesKMedoids` - Uses actual time series as cluster centers
- `TimeSeriesKShape` - Shape-based clustering algorithm
- `TimeSeriesKernelKMeans` - Kernel-based variant for nonlinear patterns
**Use when**: Known number of clusters, spherical cluster shapes expected.
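A minimal sketch of medoid-based partitioning on placeholder data; the `distance="msm"` and `random_state` arguments follow the common aeon clusterer interface.
```python
import numpy as np
from aeon.clustering import TimeSeriesKMedoids

# Placeholder collection: 20 univariate series of length 50
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 1, 50))

# Medoid-based partitioning: each cluster center is an actual series from X
km = TimeSeriesKMedoids(n_clusters=2, distance="msm", random_state=0)
labels = km.fit_predict(X)
```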
## Large Dataset Methods
Efficient clustering for large collections:
- `TimeSeriesCLARA` - Clustering Large Applications with sampling
- `TimeSeriesCLARANS` - Randomized search variant of CLARA
**Use when**: Dataset too large for standard k-medoids, need scalability.
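A sketch of CLARA on a larger synthetic collection: it fits k-medoids on repeated subsamples rather than the full dataset. The constructor arguments shown are assumed from the common aeon clusterer interface.
```python
import numpy as np
from aeon.clustering import TimeSeriesCLARA

# Larger placeholder collection where full k-medoids would be expensive
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1, 50))

# CLARA: k-medoids fitted on repeated samples, best set of medoids kept
clara = TimeSeriesCLARA(n_clusters=3, distance="dtw", random_state=0)
labels = clara.fit_predict(X)
```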
## Elastic Distance Clustering
Specialized for alignment-based similarity:
- `KASBA` - K-means with shift-invariant elastic averaging
- `ElasticSOM` - Self-organizing map using elastic distances
**Use when**: Time series have temporal shifts or warping.
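A short sketch of KASBA, assuming it follows the standard aeon clusterer interface (`n_clusters`, `random_state`, `fit_predict`); see its docstring for the elastic-distance options.
```python
import numpy as np
from aeon.clustering import KASBA

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 1, 60))  # placeholder series with possible shifts

# K-means style clustering with shift-invariant elastic averaging
kasba = KASBA(n_clusters=3, random_state=0)
labels = kasba.fit_predict(X)
```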
## Spectral Methods
Graph-based clustering:
- `KSpectralCentroid` - Spectral clustering with centroid computation
**Use when**: Non-convex cluster shapes, need graph-based approach.
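A sketch of `KSpectralCentroid` on placeholder data; the `n_clusters` and `random_state` arguments are assumed from the common aeon clusterer interface.
```python
import numpy as np
from aeon.clustering import KSpectralCentroid

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 1, 60))

# n_clusters / random_state assumed from the common aeon clusterer interface
ksc = KSpectralCentroid(n_clusters=3, random_state=0)
labels = ksc.fit_predict(X)
```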
## Deep Learning Clustering
Neural network-based clustering with auto-encoders:
- `AEFCNClusterer` - Fully convolutional auto-encoder
- `AEResNetClusterer` - Residual network auto-encoder
- `AEDCNNClusterer` - Dilated CNN auto-encoder
- `AEDRNNClusterer` - Dilated RNN auto-encoder
- `AEBiGRUClusterer` - Bidirectional GRU auto-encoder
- `AEAttentionBiGRUClusterer` - Attention-enhanced BiGRU auto-encoder
**Use when**: Large datasets, need learned representations, or complex patterns.
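A hedged sketch of an auto-encoder clusterer: it assumes `AEFCNClusterer` accepts an `estimator` applied to the learned latent space plus Keras-style `n_epochs`/`batch_size` controls. Verify the parameter names against the current API before use.
```python
import numpy as np
from sklearn.cluster import KMeans
from aeon.clustering.deep_learning import AEFCNClusterer

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 1, 100))

# The auto-encoder learns a latent representation; a clusterer is then applied
# to it. estimator / n_epochs / batch_size are assumed parameter names.
ae = AEFCNClusterer(
    estimator=KMeans(n_clusters=2, n_init=10),
    n_epochs=5,     # tiny value, just for the sketch
    batch_size=16,
)
labels = ae.fit_predict(X)
```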
## Feature-Based Clustering
Transform to feature space before clustering:
- `Catch22Clusterer` - Clusters on 22 canonical features
- `SummaryClusterer` - Uses summary statistics
- `TSFreshClusterer` - Automated tsfresh features
**Use when**: Raw time series not informative, need interpretable features.
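A sketch of feature-based clustering, assuming `Catch22Clusterer` accepts an sklearn-style `estimator` to run on the extracted feature matrix.
```python
import numpy as np
from sklearn.cluster import KMeans
from aeon.clustering.feature_based import Catch22Clusterer

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 1, 80))

# Each series is reduced to catch22 features, then clustered in feature space;
# the estimator argument is an assumption about the constructor.
c22 = Catch22Clusterer(estimator=KMeans(n_clusters=3, n_init=10))
labels = c22.fit_predict(X)
```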
## Composition
Build custom clustering pipelines:
- `ClustererPipeline` - Chain transformers with clusterers
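A sketch of a transform-then-cluster pipeline; the keyword names (`transformers=`, `clusterer=`) and the `Normalizer` collection transform are assumptions about the current aeon API.
```python
import numpy as np
from aeon.clustering import TimeSeriesKMeans
from aeon.clustering.compose import ClustererPipeline
from aeon.transformations.collection import Normalizer

rng = np.random.default_rng(0)
X = rng.normal(size=(25, 1, 60))

# Normalise each series, then cluster the transformed collection
pipe = ClustererPipeline(
    transformers=[Normalizer()],  # Normalizer is an assumed transform name
    clusterer=TimeSeriesKMeans(n_clusters=2, distance="euclidean"),
)
labels = pipe.fit_predict(X)
```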
## Averaging Methods
Compute cluster centers for time series:
- `mean_average` - Arithmetic mean
- `ba_average` - Barycentric averaging with DTW
- `kasba_average` - Shift-invariant averaging
- `shift_invariant_average` - General shift-invariant method
**Use when**: Need representative cluster centers for visualization or initialization.
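A minimal sketch of computing a cluster center directly, assuming these averaging functions live in `aeon.clustering.averaging` and take a `(n_cases, n_channels, n_timepoints)` array.
```python
import numpy as np
from aeon.clustering.averaging import mean_average

rng = np.random.default_rng(0)
cluster_members = rng.normal(size=(10, 1, 50))  # series assigned to one cluster

# Arithmetic mean over the case axis; swap in ba_average or kasba_average
# (listed above) for elastic or shift-invariant centers.
center = mean_average(cluster_members)  # expected shape: (1, 50)
```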
## Quick Start
```python
from aeon.clustering import TimeSeriesKMeans
from aeon.datasets import load_classification
# Load data (using classification data for clustering)
X_train, _ = load_classification("GunPoint", split="train")
# Cluster time series
clusterer = TimeSeriesKMeans(
    n_clusters=3,
    distance="dtw",  # Use DTW distance
    averaging_method="ba",  # Barycentric averaging
)
labels = clusterer.fit_predict(X_train)
centers = clusterer.cluster_centers_
```
## Algorithm Selection
- **Speed priority**: TimeSeriesKMeans with Euclidean distance
- **Temporal alignment**: KASBA, TimeSeriesKMeans with DTW
- **Large datasets**: TimeSeriesCLARA, TimeSeriesCLARANS
- **Complex patterns**: Deep learning clusterers
- **Interpretability**: Catch22Clusterer, SummaryClusterer
- **Non-convex clusters**: KSpectralCentroid
## Distance Metrics
Compatible distance metrics include:
- Euclidean, Manhattan, Minkowski (lock-step)
- DTW, DDTW, WDTW (elastic with alignment)
- ERP, EDR, LCSS (edit-based)
- MSM, TWE (specialized elastic)
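These distances are also available as standalone functions; a short sketch using the DTW pair and pairwise helpers from `aeon.distances`.
```python
import numpy as np
from aeon.distances import dtw_distance, dtw_pairwise_distance

rng = np.random.default_rng(0)
x, y = rng.normal(size=(1, 50)), rng.normal(size=(1, 50))
X = rng.normal(size=(10, 1, 50))

d = dtw_distance(x, y)        # elastic distance between two series
D = dtw_pairwise_distance(X)  # (10, 10) matrix, e.g. for precomputed clustering
```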
## Evaluation
Use clustering metrics from sklearn or aeon benchmarking:
- Silhouette score
- Davies-Bouldin index
- Calinski-Harabasz index
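A sketch of scoring a clustering with the sklearn metrics above; since they expect a 2D feature matrix, each series is flattened first (a convenience choice, not an aeon requirement).
```python
import numpy as np
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)
from aeon.clustering import TimeSeriesKMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 1, 50))
labels = TimeSeriesKMeans(n_clusters=3, distance="euclidean", random_state=0).fit_predict(X)

# Flatten each series so the sklearn metrics get an (n_samples, n_features) array
X2d = X.reshape(len(X), -1)
print(silhouette_score(X2d, labels))
print(davies_bouldin_score(X2d, labels))
print(calinski_harabasz_score(X2d, labels))
```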