# Time Series Clustering

Aeon provides clustering algorithms adapted for temporal data with specialized distance metrics and averaging methods.

## Partitioning Algorithms

Standard k-means/k-medoids adapted for time series:

- `TimeSeriesKMeans` - K-means with temporal distance metrics (DTW, Euclidean, etc.)
- `TimeSeriesKMedoids` - Uses actual time series as cluster centers
- `TimeSeriesKShape` - k-Shape: clusters by shape using a normalized cross-correlation distance
- `TimeSeriesKernelKMeans` - Kernel-based variant for nonlinear patterns

**Use when**: Known number of clusters, spherical cluster shapes expected.

## Large Dataset Methods

Efficient clustering for large collections:

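- `TimeSeriesCLARA` - Clustering Large Applications: runs k-medoids on random samples of the data and keeps the best medoids
- `TimeSeriesCLARANS` - Clustering Large Applications based on Randomized Search: k-medoids that evaluates randomly chosen medoid swaps instead of all of them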

**Use when**: Dataset too large for standard k-medoids, need scalability.

## Elastic Distance Clustering

Specialized for alignment-based similarity:

- `KASBA` - K-means using the MSM elastic distance with stochastic subgradient barycenter averaging
- `ElasticSOM` - Self-organizing map using elastic distances

**Use when**: Time series have temporal shifts or warping.

## Spectral Methods

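Scale- and shift-invariant clustering (despite the name, this is not graph-based spectral clustering):

- `KSpectralCentroid` - K-Spectral Centroid (k-SC): a k-means-style algorithm whose distance is invariant to scaling and shifting, with centroids obtained from an eigen-decomposition

**Use when**: Series share a shape but differ in amplitude or time offset.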

## Deep Learning Clustering

Neural network-based clustering with auto-encoders (requires the optional `tensorflow` dependency):

- `AEFCNClusterer` - Fully convolutional auto-encoder
- `AEResNetClusterer` - Residual network auto-encoder
- `AEDCNNClusterer` - Dilated CNN auto-encoder
- `AEDRNNClusterer` - Dilated RNN auto-encoder
- `AEBiGRUClusterer` - Bidirectional GRU auto-encoder
- `AEAttentionBiGRUClusterer` - Attention-enhanced BiGRU auto-encoder

**Use when**: Large datasets, need learned representations, or complex patterns.

## Feature-Based Clustering

Transform to feature space before clustering:

- `Catch22Clusterer` - Clusters on 22 canonical features
- `SummaryClusterer` - Uses summary statistics
- `TSFreshClusterer` - Automated tsfresh features

**Use when**: Raw time series not informative, need interpretable features.

## Composition

Build custom clustering pipelines:

- `ClustererPipeline` - Chain transformers with clusterers

## Averaging Methods

Compute cluster centers for time series:

- `mean_average` - Arithmetic mean
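- `elastic_barycenter_average` - Barycenter averaging under an elastic distance (DTW by default); this is what `averaging_method="ba"` selects
- `kasba_average` - Stochastic subgradient barycenter averaging used by `KASBA`
- `shift_invariant_average` - Shift- and scale-invariant averaging used by `KSpectralCentroid`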

**Use when**: Need representative cluster centers for visualization or initialization.

## Quick Start

```python
from aeon.clustering import TimeSeriesKMeans
from aeon.datasets import load_classification

# Load data (GunPoint is a classification dataset; the labels are discarded)
X_train, _ = load_classification("GunPoint", split="train")

# Cluster time series
clusterer = TimeSeriesKMeans(
    n_clusters=2,  # GunPoint has two classes
    distance="dtw",  # Use DTW distance
    averaging_method="ba",  # Barycentric averaging
    random_state=0,  # Reproducible initialization
)
labels = clusterer.fit_predict(X_train)  # One cluster index per series
centers = clusterer.cluster_centers_  # One averaged series per cluster
```

## Algorithm Selection

- **Speed priority**: TimeSeriesKMeans with Euclidean distance
- **Temporal alignment**: KASBA, TimeSeriesKMeans with DTW
- **Large datasets**: TimeSeriesCLARA, TimeSeriesCLARANS
- **Complex patterns**: Deep learning clusterers
- **Interpretability**: Catch22Clusterer, SummaryClusterer
- **Amplitude or offset differences**: KSpectralCentroid

## Distance Metrics

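Compatible distance metrics include the following; pass the lowercase name as the `distance` argument (for example `distance="msm"`):

- Euclidean, Manhattan, Minkowski (lock-step)
- DTW, DDTW, WDTW (elastic with alignment)
- ERP, EDR, LCSS (edit-based)
- MSM, TWE (elastic distances that are also metrics)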

## Evaluation

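Use clustering metrics from sklearn or aeon benchmarking:

- Silhouette score (`sklearn.metrics.silhouette_score`)
- Davies-Bouldin index (`sklearn.metrics.davies_bouldin_score`)
- Calinski-Harabasz index (`sklearn.metrics.calinski_harabasz_score`)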