
Transformations

Aeon provides extensive transformation capabilities for preprocessing, feature extraction, and representation learning from time series data.

Transformation Types

Aeon distinguishes between:

  • CollectionTransformers: Transform multiple time series (collections)
  • SeriesTransformers: Transform individual time series
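
The distinction is easiest to see in the input shapes. A minimal sketch, assuming aeon's usual (n_cases, n_channels, n_timepoints) convention for equal-length collections:

import numpy as np

# A collection: 10 univariate series, each 100 time points long
X_collection = np.random.randn(10, 1, 100)   # input to CollectionTransformers

# A single series: 1 channel, 100 time points
x_series = np.random.randn(1, 100)           # input to SeriesTransformers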

Collection Transformers

Convolution-Based Feature Extraction

Fast, scalable feature generation using random kernels:

  • Rocket - Random convolutional kernels
  • MiniRocket - Simplified ROCKET for speed
  • MultiRocket - Enhanced ROCKET variant
  • HydraTransformer - Multi-resolution dilated convolutions
  • MultiRocketHydraTransformer - Combines ROCKET and Hydra
  • ROCKETGPU - GPU-accelerated variant

Use when: Need fast, scalable features for any ML algorithm, strong baseline performance.
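
For example, a minimal MiniRocket sketch; the random data is illustrative only, and the default feature count is assumed:

import numpy as np
from aeon.transformations.collection.convolution_based import MiniRocket

X = np.random.randn(10, 1, 100)   # 10 univariate series of length 100
mr = MiniRocket()                 # on the order of 10,000 features by default
X_features = mr.fit_transform(X)  # 2D feature matrix, one row per series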

Statistical Feature Extraction

Domain-agnostic features based on time series characteristics:

  • Catch22 - 22 canonical time-series characteristics
  • TSFresh - Comprehensive automated feature extraction (hundreds of candidate features)
  • TSFreshRelevant - Feature extraction with relevance filtering
  • SevenNumberSummary - Descriptive statistics (mean, std, quantiles)

Use when: Need interpretable features, domain-agnostic approach, or feeding traditional ML.
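
A minimal Catch22 sketch on random data (illustrative only):

import numpy as np
from aeon.transformations.collection.feature_based import Catch22

X = np.random.randn(10, 1, 100)    # 10 univariate series of length 100
c22 = Catch22()
X_features = c22.fit_transform(X)  # (10, 22): the 22 canonical features per series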

Dictionary-Based Representations

Symbolic approximations for discrete representations:

  • SAX - Symbolic Aggregate approXimation
  • PAA - Piecewise Aggregate Approximation
  • SFA - Symbolic Fourier Approximation
  • SFAFast - Optimized SFA
  • SFAWhole - SFA on entire series (no windowing)
  • BORF - Bag-of-Receptive-Fields

Use when: Need discrete/symbolic representation, dimensionality reduction, interpretability.
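
A minimal SAX sketch; the parameter names n_segments and alphabet_size are assumed and may differ across aeon versions:

import numpy as np
from aeon.transformations.collection.dictionary_based import SAX

X = np.random.randn(10, 1, 100)
# Assumed parameters: average into 8 segments, discretize with a 4-letter alphabet
sax = SAX(n_segments=8, alphabet_size=4)
X_symbolic = sax.fit_transform(X)  # discrete symbolic representation per series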

Shapelet-Based Features

Discriminative subsequence extraction:

  • RandomShapeletTransform - Random discriminative shapelets
  • RandomDilatedShapeletTransform - Dilated shapelets for multi-scale
  • SAST - Scalable And Accurate Subsequence Transform
  • RSAST - Randomized SAST

Use when: Need interpretable discriminative patterns, phase-invariant features.
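
Shapelet transforms are supervised, so fitting requires labels. A minimal sketch; n_shapelet_samples is an assumed parameter bounding how many candidate shapelets are evaluated:

import numpy as np
from aeon.transformations.collection.shapelet_based import RandomShapeletTransform

X = np.random.randn(20, 1, 100)
y = np.array([0, 1] * 10)             # toy binary labels
st = RandomShapeletTransform(n_shapelet_samples=100, random_state=0)
X_distances = st.fit_transform(X, y)  # one distance feature per retained shapelet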

Interval-Based Features

Statistical summaries from time intervals:

  • RandomIntervals - Features from random intervals
  • SupervisedIntervals - Supervised interval selection
  • QUANTTransformer - Quantile-based interval features

Use when: Predictive patterns localized to specific windows.
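
A minimal sketch; n_intervals is an assumed parameter:

import numpy as np
from aeon.transformations.collection.interval_based import RandomIntervals

X = np.random.randn(10, 1, 100)
ri = RandomIntervals(n_intervals=5, random_state=0)
X_features = ri.fit_transform(X)  # summary statistics from each sampled interval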

Preprocessing Transformations

Data preparation and normalization:

  • MinMaxScaler - Scale to [0, 1] range
  • Normalizer - Z-normalization (zero mean, unit variance)
  • Centerer - Center to zero mean
  • SimpleImputer - Fill missing values
  • DownsampleTransformer - Reduce temporal resolution
  • Tabularizer - Convert time series to tabular format

Use when: Need standardization, missing value handling, format conversion.
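
A minimal sketch combining two of these, assuming both are importable from aeon.transformations.collection:

import numpy as np
from aeon.transformations.collection import Normalizer, Tabularizer

X = np.random.randn(10, 1, 100)
X_norm = Normalizer().fit_transform(X)       # z-normalize each series
X_tab = Tabularizer().fit_transform(X_norm)  # flatten to a 2D (10, 100) table for sklearn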

Specialized Transformations

Advanced analysis methods:

  • MatrixProfile - Computes distance profiles for pattern discovery
  • DWTTransformer - Discrete Wavelet Transform
  • AutocorrelationFunctionTransformer - ACF computation
  • Dobin - Distance-based Outlier BasIs using Neighbors
  • SignatureTransformer - Path signature methods
  • PLATransformer - Piecewise Linear Approximation

Class Imbalance Handling

  • ADASYN - Adaptive Synthetic Sampling
  • SMOTE - Synthetic Minority Over-sampling
  • OHIT - Over-sampling with Highly Imbalanced Time series

Use when: Classification with imbalanced classes.

Pipeline Composition

  • CollectionTransformerPipeline - Chain multiple transformers

Series Transformers

Transform individual time series (e.g., for preprocessing in forecasting).

Statistical Analysis

  • AutoCorrelationSeriesTransformer - Autocorrelation
  • StatsModelsACF - ACF using statsmodels
  • StatsModelsPACF - Partial autocorrelation

Smoothing and Filtering

  • ExponentialSmoothing - Exponentially weighted moving average
  • MovingAverage - Simple or weighted moving average
  • SavitzkyGolayFilter - Polynomial smoothing
  • GaussianFilter - Gaussian kernel smoothing
  • BKFilter - Baxter-King bandpass filter
  • DiscreteFourierApproximation - Fourier-based filtering

Use when: Need noise reduction, trend extraction, or frequency filtering.

Dimensionality Reduction

  • PCASeriesTransformer - Principal component analysis
  • PLASeriesTransformer - Piecewise Linear Approximation

Other Transformations

  • BoxCoxTransformer - Variance stabilization
  • LogTransformer - Logarithmic scaling
  • ClaSPTransformer - Classification Score Profile
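
A minimal BoxCoxTransformer sketch; note that Box-Cox requires strictly positive inputs:

import numpy as np
from aeon.transformations.series import BoxCoxTransformer

y = np.abs(np.random.randn(200)) + 1.0  # toy positive-valued series
bc = BoxCoxTransformer()
y_stable = bc.fit_transform(y)          # variance-stabilized series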

Pipeline Composition

  • SeriesTransformerPipeline - Chain series transformers

Quick Start: Feature Extraction

from aeon.transformations.collection.convolution_based import Rocket
from aeon.classification.sklearn import RotationForest
from aeon.datasets import load_classification

# Load data
X_train, y_train = load_classification("GunPoint", split="train")
X_test, y_test = load_classification("GunPoint", split="test")

# Extract ROCKET features
rocket = Rocket()
X_train_features = rocket.fit_transform(X_train)
X_test_features = rocket.transform(X_test)

# Use with any sklearn classifier
clf = RotationForest()
clf.fit(X_train_features, y_train)
accuracy = clf.score(X_test_features, y_test)

Quick Start: Preprocessing Pipeline

from aeon.transformations.collection import (
    MinMaxScaler,
    SimpleImputer,
    CollectionTransformerPipeline
)

# Build preprocessing pipeline
pipeline = CollectionTransformerPipeline([
    ('imputer', SimpleImputer(strategy='mean')),
    ('scaler', MinMaxScaler())
])

X_transformed = pipeline.fit_transform(X_train)

Quick Start: Series Smoothing

from aeon.transformations.series import MovingAverage
import numpy as np

# Smooth a single noisy series (toy data for illustration)
y = np.sin(np.linspace(0, 10, 200)) + 0.3 * np.random.randn(200)
smoother = MovingAverage(window_size=5)
y_smoothed = smoother.fit_transform(y)

Algorithm Selection

For Feature Extraction:

  • Speed + Performance: MiniRocket
  • Interpretability: Catch22, TSFresh
  • Dimensionality reduction: PAA, SAX, PCA
  • Discriminative patterns: Shapelet transforms
  • Comprehensive features: TSFresh (with longer runtime)

For Preprocessing:

  • Normalization: Normalizer, MinMaxScaler
  • Smoothing: MovingAverage, SavitzkyGolayFilter
  • Missing values: SimpleImputer
  • Frequency analysis: DWTTransformer, Fourier methods

For Symbolic Representation:

  • Fast approximation: PAA
  • Alphabet-based: SAX
  • Frequency-based: SFA, SFAFast

Best Practices

  1. Fit on training data only: Avoid data leakage

    transformer.fit(X_train)
    X_train_tf = transformer.transform(X_train)
    X_test_tf = transformer.transform(X_test)
    
  2. Pipeline composition: Chain transformers for complex workflows

    pipeline = CollectionTransformerPipeline([
        ('imputer', SimpleImputer()),
        ('scaler', Normalizer()),
        ('features', Rocket())
    ])
    
  3. Feature selection: TSFresh can generate many features; consider selection

    from sklearn.feature_selection import SelectKBest
    selector = SelectKBest(k=100)
    X_selected = selector.fit_transform(X_features, y)
    
  4. Memory considerations: Some transformers are memory-intensive on large datasets:

    • Use MiniRocket instead of ROCKET for speed
    • Consider downsampling for very long series
    • Use ROCKETGPU for GPU acceleration
  5. Domain knowledge: Choose transformations that match the domain:

    • Periodic data: Fourier-based methods
    • Noisy data: Smoothing filters
    • Spike detection: Wavelet transforms