gh-k-dense-ai-claude-scient…/skills/aeon/references/distances.md
2025-11-30 08:30:10 +08:00

Distance Metrics

Aeon provides specialized distance functions for measuring similarity between time series, compatible with both aeon and scikit-learn estimators.

Distance Categories

Elastic Distances

Allow flexible temporal alignment between series:

Dynamic Time Warping Family:

  • dtw - Classic Dynamic Time Warping
  • ddtw - Derivative DTW (compares derivatives)
  • wdtw - Weighted DTW (penalizes warping by the phase difference between aligned points)
  • wddtw - Weighted Derivative DTW
  • shape_dtw - Shape-based DTW

Edit-Based:

  • erp - Edit distance with Real Penalty
  • edr - Edit Distance on Real sequences
  • lcss - Longest Common SubSequence
  • twe - Time Warp Edit distance

Specialized:

  • msm - Move-Split-Merge distance
  • adtw - Amerced DTW
  • sbd - Shape-Based Distance

Use when: Time series may have temporal shifts, speed variations, or phase differences.
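
To make "flexible temporal alignment" concrete, here is a minimal pure-Python sketch of the classic DTW recurrence, compared against a lock-step squared Euclidean distance. It is an illustration only, not aeon's numba implementation; the names `dtw_sketch` and `squared_euclidean` are hypothetical, and the squared pointwise cost is an assumption of this sketch.

```python
import math


def dtw_sketch(x, y):
    """Classic DTW between two univariate sequences (squared pointwise cost)."""
    n, m = len(x), len(y)
    # cost[i][j] = cheapest cumulative cost of aligning x[:i] with y[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (x[i - 1] - y[j - 1]) ** 2
            # Extend the cheapest of: step in x, step in y, or step in both
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def squared_euclidean(x, y):
    """Lock-step comparison: point i is only ever compared with point i."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


x = [0, 0, 1, 2, 1, 0, 0]
y = [0, 1, 2, 1, 0, 0, 0]  # same bump, one step earlier

elastic = dtw_sketch(x, y)           # warping absorbs the shift
lock_step = squared_euclidean(x, y)  # the shift is counted as error
```

For this pair the elastic distance is zero while the lock-step distance is not, which is the behaviour the "Use when" guidance relies on.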

Lock-Step Distances

Compare time series point-by-point without alignment:

  • euclidean - Euclidean distance (L2 norm)
  • manhattan - Manhattan distance (L1 norm)
  • minkowski - Generalized Minkowski distance (Lp norm)
  • squared - Squared Euclidean distance

Use when: Series are already aligned, computational speed matters, or no temporal warping is expected.
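
The lock-step distances listed above are all members of the Minkowski family. The following sketch, using a hypothetical helper `minkowski_sketch` rather than aeon's own function, shows how the order p recovers the Manhattan (p=1) and Euclidean (p=2) cases.

```python
def minkowski_sketch(x, y, p=2.0):
    """Lp distance between two equal-length univariate sequences."""
    if len(x) != len(y):
        raise ValueError("lock-step distances need equal-length series")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


x = [0.0, 3.0, 0.0]
y = [4.0, 0.0, 0.0]

manhattan = minkowski_sketch(x, y, p=1)  # |−4| + |3| + |0|
euclidean = minkowski_sketch(x, y, p=2)  # sqrt(16 + 9 + 0)
```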

Usage Patterns

Computing Single Distance

from aeon.distances import dtw_distance

# Distance between two time series
distance = dtw_distance(x, y)

# With window constraint (Sakoe-Chiba band)
distance = dtw_distance(x, y, window=0.1)

Pairwise Distance Matrix

from aeon.distances import dtw_pairwise_distance

# All pairwise distances in collection
X = [series1, series2, series3, series4]
distance_matrix = dtw_pairwise_distance(X)

# Cross-collection distances
distance_matrix = dtw_pairwise_distance(X_train, X_test)

Cost Matrix and Alignment Path

from aeon.distances import dtw_cost_matrix, dtw_alignment_path

# Get full cost matrix
cost_matrix = dtw_cost_matrix(x, y)

# Get optimal alignment path together with the DTW distance
path, distance = dtw_alignment_path(x, y)
# path is a list of index pairs: [(0,0), (1,1), (2,1), (2,2), ...]
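
To show how a cost matrix and an alignment path relate, here is a small pure-Python sketch that builds a cumulative cost matrix and then backtracks from the last cell to the first. The helper names are hypothetical and the tie-breaking rule is an arbitrary choice of this sketch, so the path may differ from aeon's output when several paths are equally cheap.

```python
import math


def dtw_cost_matrix_sketch(x, y):
    """Cumulative DTW cost matrix with squared pointwise cost."""
    n, m = len(x), len(y)
    cost = [[math.inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = (x[i] - y[j]) ** 2
            if i == 0 and j == 0:
                best = 0.0
            else:
                best = min(
                    cost[i - 1][j] if i > 0 else math.inf,
                    cost[i][j - 1] if j > 0 else math.inf,
                    cost[i - 1][j - 1] if i > 0 and j > 0 else math.inf,
                )
            cost[i][j] = d + best
    return cost


def alignment_path_sketch(cost):
    """Backtrack from the bottom-right cell to (0, 0) along cheapest predecessors."""
    i, j = len(cost) - 1, len(cost[0]) - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((cost[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((cost[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((cost[i][j - 1], i, j - 1))
        _, i, j = min(candidates)  # ties resolved by smallest indices
        path.append((i, j))
    return path[::-1]


cost = dtw_cost_matrix_sketch([1, 2, 3], [1, 2, 2, 3])
path = alignment_path_sketch(cost)
# path == [(0, 0), (1, 1), (1, 2), (2, 3)]: x[1] is matched to both 2s in y
```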

Using with Estimators

from aeon.classification.distance_based import KNeighborsTimeSeriesClassifier

# Use DTW distance in classifier
clf = KNeighborsTimeSeriesClassifier(
    n_neighbors=5,
    distance="dtw",
    distance_params={"window": 0.2}
)
clf.fit(X_train, y_train)

Distance Parameters

Window Constraints

Limit warping path deviation (improves speed and prevents pathological warping):

# Sakoe-Chiba band: window as fraction of series length
dtw_distance(x, y, window=0.1)  # Allow 10% deviation

# Itakura parallelogram: itakura_max_slope is a proportion between 0 and 1
dtw_distance(x, y, itakura_max_slope=0.5)
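
The effect of a Sakoe-Chiba band can be sketched in pure Python by only filling cost-matrix cells with |i - j| within a radius derived from `window`. This is an illustrative sketch for equal-length series; the function name `windowed_dtw_sketch` and the radius rule `int(window * n)` are assumptions, and aeon's exact bounding-matrix construction may differ.

```python
import math


def windowed_dtw_sketch(x, y, window=None):
    """DTW restricted to a band around the diagonal (equal-length series)."""
    n, m = len(x), len(y)
    # window=None means no constraint; otherwise a fraction of the series length
    radius = max(n, m) if window is None else int(window * max(n, m))
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        # Only visit columns inside the band for this row
        for j in range(max(1, i - radius), min(m, i + radius) + 1):
            d = (x[i - 1] - y[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


x = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
y = [0, 0, 0, 0, 0, 0, 0, 1, 0, 0]  # same spike, four steps later

unconstrained = windowed_dtw_sketch(x, y)           # spike can be warped onto spike
narrow = windowed_dtw_sketch(x, y, window=0.1)      # band too narrow to reach it
wide_enough = windowed_dtw_sketch(x, y, window=0.4) # band covers the 4-step shift
```

A narrow window stops the spikes from being matched, so the distance grows; once the window covers the shift, the result equals the unconstrained one.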

Normalization

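The distance functions do not z-normalize their inputs, so normalize the series beforehand when scale should not affect the result:

# dtw_distance has no normalize argument; z-normalize each univariate NumPy series first
x_norm = (x - x.mean()) / x.std()
y_norm = (y - y.mean()) / y.std()
distance = dtw_distance(x_norm, y_norm)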

Distance-Specific Parameters

from aeon.distances import erp_distance, lcss_distance, twe_distance

# ERP: penalty for gaps
distance = erp_distance(x, y, g=0.5)

# TWE: stiffness and penalty parameters
distance = twe_distance(x, y, nu=0.001, lmbda=1.0)

# LCSS: epsilon threshold for matching
distance = lcss_distance(x, y, epsilon=0.5)
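
As an illustration of what a matching threshold such as `epsilon` does, here is a pure-Python sketch of an LCSS-style distance. It assumes the common definition 1 - LCSS / min(n, m); the function name `lcss_distance_sketch` is hypothetical and details may differ from aeon's implementation.

```python
def lcss_distance_sketch(x, y, epsilon=1.0):
    """1 - (longest common subsequence length / shorter series length)."""
    n, m = len(x), len(y)
    # table[i][j] = LCSS length between x[:i] and y[:j]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if abs(x[i - 1] - y[j - 1]) <= epsilon:
                # Points within epsilon count as a match
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return 1.0 - table[n][m] / min(n, m)


x = [1.0, 2.0, 3.0, 4.0]
y = [1.2, 2.1, 9.0, 4.3]  # third point is an outlier

tolerant = lcss_distance_sketch(x, y, epsilon=0.5)  # 3 of 4 points match
strict = lcss_distance_sketch(x, y, epsilon=0.05)   # nothing matches
```

The outlier simply fails to match instead of contributing a large cost, which is why LCSS is listed as robust to outliers.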

Algorithm Selection

By Use Case:

  • Temporal misalignment: DTW, DDTW, WDTW
  • Speed variations: DTW with window constraint
  • Shape similarity: Shape DTW, SBD
  • Edit operations: ERP, EDR, LCSS
  • Derivative matching: DDTW
  • Computational speed: Euclidean, Manhattan
  • Outlier robustness: Manhattan, LCSS

By Computational Cost:

  • Fastest: Euclidean (O(n))
  • Fast: Constrained DTW (O(nw) where w is window)
  • Medium: Full DTW (O(n²))
  • Slower: Complex elastic distances (ERP, TWE, MSM)
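
As a rough illustration of the O(nw) versus O(n²) difference, the sketch below counts how many cost-matrix cells fall inside a band for two equal-length series. The function name `band_cell_count` and the band rule |i - j| <= int(window * n) are assumptions made for this example.

```python
def band_cell_count(n, window):
    """Number of (i, j) cells with |i - j| <= int(window * n) in an n x n matrix."""
    radius = int(window * n)
    return sum(min(n - 1, i + radius) - max(0, i - radius) + 1 for i in range(n))


full = band_cell_count(100, 1.0)    # every cell of the 100 x 100 matrix
banded = band_cell_count(100, 0.1)  # roughly a fifth of the cells
```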

Quick Reference Table

Distance  | Alignment  | Speed     | Robustness | Interpretability
Euclidean | Lock-step  | Very Fast | Low        | High
DTW       | Elastic    | Medium    | Medium     | Medium
DDTW      | Elastic    | Medium    | High       | Medium
WDTW      | Elastic    | Medium    | Medium     | Medium
ERP       | Edit-based | Slow      | High       | Low
LCSS      | Edit-based | Slow      | Very High  | Low
Shape DTW | Elastic    | Medium    | Medium     | High

Best Practices

1. Normalization

Most distances are sensitive to scale; normalize when appropriate:

from aeon.transformations.collection import Normalizer

normalizer = Normalizer()
X_normalized = normalizer.fit_transform(X)
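
To show why normalization matters, this pure-Python sketch z-normalizes two series that share a shape but differ in scale. The helper `z_normalize` is hypothetical and uses the population standard deviation as an assumption; it is not aeon's transformer.

```python
import statistics


def z_normalize(series):
    """Rescale a univariate series to zero mean and unit standard deviation."""
    mean = statistics.mean(series)
    std = statistics.pstdev(series)
    if std == 0:
        # Constant series: avoid division by zero
        return [0.0 for _ in series]
    return [(value - mean) / std for value in series]


a = [1.0, 2.0, 3.0]
b = [100.0, 200.0, 300.0]  # same shape, 100x the scale

a_norm = z_normalize(a)
b_norm = z_normalize(b)
# After normalization the two series coincide up to floating-point error
```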

2. Window Constraints

For DTW variants, use window constraints for speed and better generalization:

# Start with 10-20% window
distance = dtw_distance(x, y, window=0.1)

3. Series Length

  • Equal-length required: Most lock-step distances
  • Unequal-length supported: Elastic distances (DTW, ERP, etc.)

4. Multivariate Series

Most distances support multivariate time series:

# x.shape = (n_channels, n_timepoints)
distance = dtw_distance(x_multivariate, y_multivariate)

5. Performance Optimization

  • Use numba-compiled implementations (default in aeon)
  • Consider lock-step distances if alignment not needed
  • Use windowed DTW instead of full DTW
  • Precompute distance matrices for repeated use

6. Choosing the Right Distance

# Quick decision tree (the boolean flags describe your data and constraints):
if series_aligned:
    candidates = ["euclidean"]
elif need_speed:
    candidates = ["dtw"]  # with window constraint
elif temporal_shifts_expected:
    candidates = ["dtw", "shape_dtw"]
elif outliers_present:
    candidates = ["lcss", "manhattan"]
elif derivatives_matter:
    candidates = ["ddtw", "wddtw"]

Integration with scikit-learn

Aeon distances work with sklearn estimators:

from sklearn.neighbors import KNeighborsClassifier
from aeon.distances import dtw_pairwise_distance

# Precompute distance matrix
X_train_distances = dtw_pairwise_distance(X_train)

# Use with sklearn
clf = KNeighborsClassifier(metric='precomputed')
clf.fit(X_train_distances, y_train)
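
With metric='precomputed', prediction needs a matrix of distances from each test series to each training series, with shape (n_test, n_train). The pure-Python sketch below mimics that layout with a 1-nearest-neighbour rule; the helper names and the squared Euclidean stand-in distance are assumptions for illustration, not sklearn or aeon code.

```python
def pairwise(A, B, dist):
    """Matrix of dist(a, b) with one row per series in A, one column per series in B."""
    return [[dist(a, b) for b in B] for a in A]


def squared_euclidean(x, y):
    return sum((p - q) ** 2 for p, q in zip(x, y))


def predict_1nn(test_to_train, y_train):
    """Label each test row with the label of its closest training column."""
    return [y_train[min(range(len(row)), key=row.__getitem__)] for row in test_to_train]


X_train = [[0, 0, 0], [5, 5, 5]]
y_train = ["low", "high"]
X_test = [[1, 0, 1], [4, 6, 5]]

# Rows index test series, columns index training series
test_to_train = pairwise(X_test, X_train, squared_euclidean)
predictions = predict_1nn(test_to_train, y_train)
```

In the aeon and sklearn setting, the analogous cross-collection matrix would come from a call of the form dtw_pairwise_distance(X_test, X_train) and be passed to clf.predict.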

Available Distance Functions

Get list of all available distances:

from aeon.distances import get_distance_function_names

print(get_distance_function_names())
# ['dtw', 'ddtw', 'wdtw', 'euclidean', 'erp', 'edr', ...]

Retrieve specific distance function:

from aeon.distances import get_distance_function

distance_func = get_distance_function("dtw")
result = distance_func(x, y, window=0.1)