257 lines
6.3 KiB
Markdown
257 lines
6.3 KiB
Markdown
# Distance Metrics
|
|
|
|
Aeon provides specialized distance functions for measuring similarity between time series, compatible with both aeon and scikit-learn estimators.
|
|
|
|
## Distance Categories
|
|
|
|
### Elastic Distances
|
|
|
|
Allow flexible temporal alignment between series:
|
|
|
|
**Dynamic Time Warping Family:**
|
|
- `dtw` - Classic Dynamic Time Warping
|
|
- `ddtw` - Derivative DTW (compares derivatives)
|
|
- `wdtw` - Weighted DTW (penalizes warping by location)
|
|
- `wddtw` - Weighted Derivative DTW
|
|
- `shape_dtw` - Shape-based DTW
|
|
|
|
**Edit-Based:**
|
|
- `erp` - Edit distance with Real Penalty
|
|
- `edr` - Edit Distance on Real sequences
|
|
- `lcss` - Longest Common SubSequence
|
|
- `twe` - Time Warp Edit distance
|
|
|
|
**Specialized:**
|
|
- `msm` - Move-Split-Merge distance
|
|
- `adtw` - Amerced DTW
|
|
- `sbd` - Shape-Based Distance
|
|
|
|
**Use when**: Time series may have temporal shifts, speed variations, or phase differences.
|
|
|
|
### Lock-Step Distances
|
|
|
|
Compare time series point-by-point without alignment:
|
|
|
|
- `euclidean` - Euclidean distance (L2 norm)
|
|
- `manhattan` - Manhattan distance (L1 norm)
|
|
- `minkowski` - Generalized Minkowski distance (Lp norm)
|
|
- `squared` - Squared Euclidean distance
|
|
|
|
**Use when**: Series already aligned, need computational speed, or no temporal warping expected.
|
|
|
|
## Usage Patterns
|
|
|
|
### Computing Single Distance
|
|
|
|
```python
|
|
from aeon.distances import dtw_distance
|
|
|
|
# Distance between two time series
|
|
distance = dtw_distance(x, y)
|
|
|
|
# With window constraint (Sakoe-Chiba band)
|
|
distance = dtw_distance(x, y, window=0.1)
|
|
```
|
|
|
|
### Pairwise Distance Matrix
|
|
|
|
```python
|
|
from aeon.distances import dtw_pairwise_distance
|
|
|
|
# All pairwise distances in collection
|
|
X = [series1, series2, series3, series4]
|
|
distance_matrix = dtw_pairwise_distance(X)
|
|
|
|
# Cross-collection distances
|
|
distance_matrix = dtw_pairwise_distance(X_train, X_test)
|
|
```
|
|
|
|
### Cost Matrix and Alignment Path
|
|
|
|
```python
|
|
from aeon.distances import dtw_cost_matrix, dtw_alignment_path
|
|
|
|
# Get full cost matrix
|
|
cost_matrix = dtw_cost_matrix(x, y)
|
|
|
|
# Get optimal alignment path
|
|
path = dtw_alignment_path(x, y)
|
|
# Returns indices: [(0,0), (1,1), (2,1), (2,2), ...]
|
|
```
|
|
|
|
### Using with Estimators
|
|
|
|
```python
|
|
from aeon.classification.distance_based import KNeighborsTimeSeriesClassifier
|
|
|
|
# Use DTW distance in classifier
|
|
clf = KNeighborsTimeSeriesClassifier(
|
|
n_neighbors=5,
|
|
distance="dtw",
|
|
distance_params={"window": 0.2}
|
|
)
|
|
clf.fit(X_train, y_train)
|
|
```
|
|
|
|
## Distance Parameters
|
|
|
|
### Window Constraints
|
|
|
|
Limit warping path deviation (improves speed and prevents pathological warping):
|
|
|
|
```python
|
|
# Sakoe-Chiba band: window as fraction of series length
|
|
dtw_distance(x, y, window=0.1) # Allow 10% deviation
|
|
|
|
# Itakura parallelogram: slopes constrain path
|
|
dtw_distance(x, y, itakura_max_slope=2.0)
|
|
```
|
|
|
|
### Normalization
|
|
|
|
Control whether to z-normalize series before distance computation:
|
|
|
|
```python
|
|
# Most elastic distances support normalization
|
|
distance = dtw_distance(x, y, normalize=True)
|
|
```
|
|
|
|
### Distance-Specific Parameters
|
|
|
|
```python
|
|
# ERP: penalty for gaps
|
|
distance = erp_distance(x, y, g=0.5)
|
|
|
|
# TWE: stiffness and penalty parameters
|
|
distance = twe_distance(x, y, nu=0.001, lmbda=1.0)
|
|
|
|
# LCSS: epsilon threshold for matching
|
|
distance = lcss_distance(x, y, epsilon=0.5)
|
|
```
|
|
|
|
## Algorithm Selection
|
|
|
|
### By Use Case:
|
|
|
|
**Temporal misalignment**: DTW, DDTW, WDTW
|
|
**Speed variations**: DTW with window constraint
|
|
**Shape similarity**: Shape DTW, SBD
|
|
**Edit operations**: ERP, EDR, LCSS
|
|
**Derivative matching**: DDTW
|
|
**Computational speed**: Euclidean, Manhattan
|
|
**Outlier robustness**: Manhattan, LCSS
|
|
|
|
### By Computational Cost:
|
|
|
|
**Fastest**: Euclidean (O(n))
|
|
**Fast**: Constrained DTW (O(nw) where w is window)
|
|
**Medium**: Full DTW (O(n²))
|
|
**Slower**: Complex elastic distances (ERP, TWE, MSM)
|
|
|
|
## Quick Reference Table
|
|
|
|
| Distance | Alignment | Speed | Robustness | Interpretability |
|
|
|----------|-----------|-------|------------|------------------|
|
|
| Euclidean | Lock-step | Very Fast | Low | High |
|
|
| DTW | Elastic | Medium | Medium | Medium |
|
|
| DDTW | Elastic | Medium | High | Medium |
|
|
| WDTW | Elastic | Medium | Medium | Medium |
|
|
| ERP | Edit-based | Slow | High | Low |
|
|
| LCSS | Edit-based | Slow | Very High | Low |
|
|
| Shape DTW | Elastic | Medium | Medium | High |
|
|
|
|
## Best Practices
|
|
|
|
### 1. Normalization
|
|
|
|
Most distances sensitive to scale; normalize when appropriate:
|
|
|
|
```python
|
|
from aeon.transformations.collection import Normalizer
|
|
|
|
normalizer = Normalizer()
|
|
X_normalized = normalizer.fit_transform(X)
|
|
```
|
|
|
|
### 2. Window Constraints
|
|
|
|
For DTW variants, use window constraints for speed and better generalization:
|
|
|
|
```python
|
|
# Start with 10-20% window
|
|
distance = dtw_distance(x, y, window=0.1)
|
|
```
|
|
|
|
### 3. Series Length
|
|
|
|
- Equal-length required: Most lock-step distances
|
|
- Unequal-length supported: Elastic distances (DTW, ERP, etc.)
|
|
|
|
### 4. Multivariate Series
|
|
|
|
Most distances support multivariate time series:
|
|
|
|
```python
|
|
# x.shape = (n_channels, n_timepoints)
|
|
distance = dtw_distance(x_multivariate, y_multivariate)
|
|
```
|
|
|
|
### 5. Performance Optimization
|
|
|
|
- Use numba-compiled implementations (default in aeon)
|
|
- Consider lock-step distances if alignment not needed
|
|
- Use windowed DTW instead of full DTW
|
|
- Precompute distance matrices for repeated use
|
|
|
|
### 6. Choosing the Right Distance
|
|
|
|
```python
|
|
# Quick decision tree:
|
|
if series_aligned:
|
|
use_distance = "euclidean"
|
|
elif need_speed:
|
|
use_distance = "dtw" # with window constraint
|
|
elif temporal_shifts_expected:
|
|
use_distance = "dtw" or "shape_dtw"
|
|
elif outliers_present:
|
|
use_distance = "lcss" or "manhattan"
|
|
elif derivatives_matter:
|
|
use_distance = "ddtw" or "wddtw"
|
|
```
|
|
|
|
## Integration with scikit-learn
|
|
|
|
Aeon distances work with sklearn estimators:
|
|
|
|
```python
|
|
from sklearn.neighbors import KNeighborsClassifier
|
|
from aeon.distances import dtw_pairwise_distance
|
|
|
|
# Precompute distance matrix
|
|
X_train_distances = dtw_pairwise_distance(X_train)
|
|
|
|
# Use with sklearn
|
|
clf = KNeighborsClassifier(metric='precomputed')
|
|
clf.fit(X_train_distances, y_train)
|
|
```
|
|
|
|
## Available Distance Functions
|
|
|
|
Get list of all available distances:
|
|
|
|
```python
|
|
from aeon.distances import get_distance_function_names
|
|
|
|
print(get_distance_function_names())
|
|
# ['dtw', 'ddtw', 'wdtw', 'euclidean', 'erp', 'edr', ...]
|
|
```
|
|
|
|
Retrieve specific distance function:
|
|
|
|
```python
|
|
from aeon.distances import get_distance_function
|
|
|
|
distance_func = get_distance_function("dtw")
|
|
result = distance_func(x, y, window=0.1)
|
|
```
|