Initial commit
This commit is contained in:
256
skills/aeon/references/distances.md
Normal file
256
skills/aeon/references/distances.md
Normal file
@@ -0,0 +1,256 @@
|
||||
# Distance Metrics
|
||||
|
||||
Aeon provides specialized distance functions for measuring similarity between time series, compatible with both aeon and scikit-learn estimators.
|
||||
|
||||
## Distance Categories
|
||||
|
||||
### Elastic Distances
|
||||
|
||||
Allow flexible temporal alignment between series:
|
||||
|
||||
**Dynamic Time Warping Family:**
|
||||
- `dtw` - Classic Dynamic Time Warping
|
||||
- `ddtw` - Derivative DTW (compares derivatives)
|
||||
- `wdtw` - Weighted DTW (penalizes warping by location)
|
||||
- `wddtw` - Weighted Derivative DTW
|
||||
- `shape_dtw` - Shape-based DTW
|
||||
|
||||
**Edit-Based:**
|
||||
- `erp` - Edit distance with Real Penalty
|
||||
- `edr` - Edit Distance on Real sequences
|
||||
- `lcss` - Longest Common SubSequence
|
||||
- `twe` - Time Warp Edit distance
|
||||
|
||||
**Specialized:**
|
||||
- `msm` - Move-Split-Merge distance
|
||||
- `adtw` - Amerced DTW
|
||||
- `sbd` - Shape-Based Distance
|
||||
|
||||
**Use when**: Time series may have temporal shifts, speed variations, or phase differences.
|
||||
|
||||
### Lock-Step Distances
|
||||
|
||||
Compare time series point-by-point without alignment:
|
||||
|
||||
- `euclidean` - Euclidean distance (L2 norm)
|
||||
- `manhattan` - Manhattan distance (L1 norm)
|
||||
- `minkowski` - Generalized Minkowski distance (Lp norm)
|
||||
- `squared` - Squared Euclidean distance
|
||||
|
||||
**Use when**: Series already aligned, need computational speed, or no temporal warping expected.
|
||||
|
||||
## Usage Patterns
|
||||
|
||||
### Computing Single Distance
|
||||
|
||||
```python
|
||||
from aeon.distances import dtw_distance
|
||||
|
||||
# Distance between two time series
|
||||
distance = dtw_distance(x, y)
|
||||
|
||||
# With window constraint (Sakoe-Chiba band)
|
||||
distance = dtw_distance(x, y, window=0.1)
|
||||
```
|
||||
|
||||
### Pairwise Distance Matrix
|
||||
|
||||
```python
|
||||
from aeon.distances import dtw_pairwise_distance
|
||||
|
||||
# All pairwise distances in collection
|
||||
X = [series1, series2, series3, series4]
|
||||
distance_matrix = dtw_pairwise_distance(X)
|
||||
|
||||
# Cross-collection distances
|
||||
distance_matrix = dtw_pairwise_distance(X_train, X_test)
|
||||
```
|
||||
|
||||
### Cost Matrix and Alignment Path
|
||||
|
||||
```python
|
||||
from aeon.distances import dtw_cost_matrix, dtw_alignment_path
|
||||
|
||||
# Get full cost matrix
|
||||
cost_matrix = dtw_cost_matrix(x, y)
|
||||
|
||||
# Get optimal alignment path
|
||||
path = dtw_alignment_path(x, y)
|
||||
# Returns indices: [(0,0), (1,1), (2,1), (2,2), ...]
|
||||
```
|
||||
|
||||
### Using with Estimators
|
||||
|
||||
```python
|
||||
from aeon.classification.distance_based import KNeighborsTimeSeriesClassifier
|
||||
|
||||
# Use DTW distance in classifier
|
||||
clf = KNeighborsTimeSeriesClassifier(
|
||||
n_neighbors=5,
|
||||
distance="dtw",
|
||||
distance_params={"window": 0.2}
|
||||
)
|
||||
clf.fit(X_train, y_train)
|
||||
```
|
||||
|
||||
## Distance Parameters
|
||||
|
||||
### Window Constraints
|
||||
|
||||
Limit warping path deviation (improves speed and prevents pathological warping):
|
||||
|
||||
```python
|
||||
# Sakoe-Chiba band: window as fraction of series length
|
||||
dtw_distance(x, y, window=0.1) # Allow 10% deviation
|
||||
|
||||
# Itakura parallelogram: slopes constrain path
|
||||
dtw_distance(x, y, itakura_max_slope=2.0)
|
||||
```
|
||||
|
||||
### Normalization
|
||||
|
||||
Control whether to z-normalize series before distance computation:
|
||||
|
||||
```python
|
||||
# Most elastic distances support normalization
|
||||
distance = dtw_distance(x, y, normalize=True)
|
||||
```
|
||||
|
||||
### Distance-Specific Parameters
|
||||
|
||||
```python
|
||||
# ERP: penalty for gaps
|
||||
distance = erp_distance(x, y, g=0.5)
|
||||
|
||||
# TWE: stiffness and penalty parameters
|
||||
distance = twe_distance(x, y, nu=0.001, lmbda=1.0)
|
||||
|
||||
# LCSS: epsilon threshold for matching
|
||||
distance = lcss_distance(x, y, epsilon=0.5)
|
||||
```
|
||||
|
||||
## Algorithm Selection
|
||||
|
||||
### By Use Case:
|
||||
|
||||
**Temporal misalignment**: DTW, DDTW, WDTW
|
||||
**Speed variations**: DTW with window constraint
|
||||
**Shape similarity**: Shape DTW, SBD
|
||||
**Edit operations**: ERP, EDR, LCSS
|
||||
**Derivative matching**: DDTW
|
||||
**Computational speed**: Euclidean, Manhattan
|
||||
**Outlier robustness**: Manhattan, LCSS
|
||||
|
||||
### By Computational Cost:
|
||||
|
||||
**Fastest**: Euclidean (O(n))
|
||||
**Fast**: Constrained DTW (O(nw) where w is window)
|
||||
**Medium**: Full DTW (O(n²))
|
||||
**Slower**: Complex elastic distances (ERP, TWE, MSM)
|
||||
|
||||
## Quick Reference Table
|
||||
|
||||
| Distance | Alignment | Speed | Robustness | Interpretability |
|
||||
|----------|-----------|-------|------------|------------------|
|
||||
| Euclidean | Lock-step | Very Fast | Low | High |
|
||||
| DTW | Elastic | Medium | Medium | Medium |
|
||||
| DDTW | Elastic | Medium | High | Medium |
|
||||
| WDTW | Elastic | Medium | Medium | Medium |
|
||||
| ERP | Edit-based | Slow | High | Low |
|
||||
| LCSS | Edit-based | Slow | Very High | Low |
|
||||
| Shape DTW | Elastic | Medium | Medium | High |
|
||||
|
||||
## Best Practices
|
||||
|
||||
### 1. Normalization
|
||||
|
||||
Most distances sensitive to scale; normalize when appropriate:
|
||||
|
||||
```python
|
||||
from aeon.transformations.collection import Normalizer
|
||||
|
||||
normalizer = Normalizer()
|
||||
X_normalized = normalizer.fit_transform(X)
|
||||
```
|
||||
|
||||
### 2. Window Constraints
|
||||
|
||||
For DTW variants, use window constraints for speed and better generalization:
|
||||
|
||||
```python
|
||||
# Start with 10-20% window
|
||||
distance = dtw_distance(x, y, window=0.1)
|
||||
```
|
||||
|
||||
### 3. Series Length
|
||||
|
||||
- Equal-length required: Most lock-step distances
|
||||
- Unequal-length supported: Elastic distances (DTW, ERP, etc.)
|
||||
|
||||
### 4. Multivariate Series
|
||||
|
||||
Most distances support multivariate time series:
|
||||
|
||||
```python
|
||||
# x.shape = (n_channels, n_timepoints)
|
||||
distance = dtw_distance(x_multivariate, y_multivariate)
|
||||
```
|
||||
|
||||
### 5. Performance Optimization
|
||||
|
||||
- Use numba-compiled implementations (default in aeon)
|
||||
- Consider lock-step distances if alignment not needed
|
||||
- Use windowed DTW instead of full DTW
|
||||
- Precompute distance matrices for repeated use
|
||||
|
||||
### 6. Choosing the Right Distance
|
||||
|
||||
```python
|
||||
# Quick decision tree:
|
||||
if series_aligned:
|
||||
use_distance = "euclidean"
|
||||
elif need_speed:
|
||||
use_distance = "dtw" # with window constraint
|
||||
elif temporal_shifts_expected:
|
||||
use_distance = "dtw" or "shape_dtw"
|
||||
elif outliers_present:
|
||||
use_distance = "lcss" or "manhattan"
|
||||
elif derivatives_matter:
|
||||
use_distance = "ddtw" or "wddtw"
|
||||
```
|
||||
|
||||
## Integration with scikit-learn
|
||||
|
||||
Aeon distances work with sklearn estimators:
|
||||
|
||||
```python
|
||||
from sklearn.neighbors import KNeighborsClassifier
|
||||
from aeon.distances import dtw_pairwise_distance
|
||||
|
||||
# Precompute distance matrix
|
||||
X_train_distances = dtw_pairwise_distance(X_train)
|
||||
|
||||
# Use with sklearn
|
||||
clf = KNeighborsClassifier(metric='precomputed')
|
||||
clf.fit(X_train_distances, y_train)
|
||||
```
|
||||
|
||||
## Available Distance Functions
|
||||
|
||||
Get list of all available distances:
|
||||
|
||||
```python
|
||||
from aeon.distances import get_distance_function_names
|
||||
|
||||
print(get_distance_function_names())
|
||||
# ['dtw', 'ddtw', 'wdtw', 'euclidean', 'erp', 'edr', ...]
|
||||
```
|
||||
|
||||
Retrieve specific distance function:
|
||||
|
||||
```python
|
||||
from aeon.distances import get_distance_function
|
||||
|
||||
distance_func = get_distance_function("dtw")
|
||||
result = distance_func(x, y, window=0.1)
|
||||
```
|
||||
Reference in New Issue
Block a user