# Transformations

Aeon provides extensive transformation capabilities for preprocessing, feature extraction, and representation learning on time series data.

## Transformation Types

Aeon distinguishes between:

- **CollectionTransformers**: Transform a collection of time series, typically stored as a 3D array of shape `(n_cases, n_channels, n_timepoints)`
- **SeriesTransformers**: Transform a single time series

## Collection Transformers

### Convolution-Based Feature Extraction

Fast, scalable feature generation using random convolutional kernels:

- `Rocket` - Random convolutional kernels (ROCKET)
- `MiniRocket` - Mostly deterministic, faster simplification of ROCKET
- `MultiRocket` - MiniRocket extended with extra pooling operators and first-order differences
- `HydraTransformer` - Groups of competing random convolutional kernels (dictionary-style counts)
- `MultiRocketHydraTransformer` - Combines MultiRocket and Hydra features
- `ROCKETGPU` - GPU-accelerated ROCKET

**Use when**: Need fast, scalable features for any ML algorithm, or a strong baseline.

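To make the idea concrete, here is a minimal, dependency-free sketch of what one ROCKET-style kernel computes: a dilated convolution with random, mean-centered weights and a random bias, summarized by the proportion of positive values (PPV) and the maximum. The kernel lengths (7, 9, 11), the sampling ranges, the absence of padding and the helper names `random_kernel` and `apply_kernel` are assumptions for illustration; this is not aeon's implementation, which uses thousands of kernels and compiled code.

```python
import math
import random


def random_kernel(series_length, rng):
    """Sample one ROCKET-style kernel (hypothetical helper; assumes series_length >= 11)."""
    length = rng.choice([7, 9, 11])
    weights = [rng.gauss(0.0, 1.0) for _ in range(length)]
    mean_w = sum(weights) / length
    weights = [w - mean_w for w in weights]  # mean-center the weights
    bias = rng.uniform(-1.0, 1.0)
    # Exponentially scaled dilation, capped so the dilated kernel fits in the series
    max_exponent = math.log2((series_length - 1) / (length - 1))
    dilation = max(1, int(2 ** rng.uniform(0, max_exponent)))
    return weights, bias, dilation


def apply_kernel(series, weights, bias, dilation):
    """Convolve without padding and return the (ppv, max) features."""
    span = (len(weights) - 1) * dilation
    outputs = []
    for start in range(len(series) - span):
        total = bias
        for k, w in enumerate(weights):
            total += w * series[start + k * dilation]
        outputs.append(total)
    ppv = sum(1 for v in outputs if v > 0) / len(outputs)  # proportion of positive values
    return ppv, max(outputs)


# Two features per kernel; ROCKET itself uses 10,000 kernels by default
rng = random.Random(0)
series = [math.sin(0.3 * t) for t in range(100)]
features = []
for _ in range(4):
    weights, bias, dilation = random_kernel(len(series), rng)
    features.extend(apply_kernel(series, weights, bias, dilation))
```

Because the kernels are random and never trained, the only learning happens in the downstream classifier, which is why these features scale well.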
### Statistical Feature Extraction

Domain-agnostic features based on time series characteristics:

- `Catch22` - 22 canonical time-series characteristics
- `TSFresh` - Comprehensive automated feature extraction (100+ features)
- `TSFreshRelevant` - TSFresh extraction followed by relevance filtering
- `SevenNumberSummary` - Descriptive statistics (mean, std, min, max, quartiles)

**Use when**: Need interpretable features, a domain-agnostic approach, or input for traditional ML models.

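As a rough illustration of summary-statistic features, the sketch below maps one series to seven numbers: mean, population standard deviation, minimum, maximum and the three quartiles. The function name is hypothetical and the exact statistics and quantile method used by aeon's `SevenNumberSummary` may differ; the point is that series of any length map to a fixed-length feature vector.

```python
import statistics


def seven_number_summary(series):
    """Map one series to [mean, std, min, max, q25, q50, q75] (illustrative only)."""
    q25, q50, q75 = statistics.quantiles(series, n=4, method="inclusive")
    return [
        statistics.fmean(series),
        statistics.pstdev(series),
        min(series),
        max(series),
        q25,
        q50,
        q75,
    ]


# Series of different lengths still yield a fixed-width feature table
collection = [[1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 2.0, 8.0]]
table = [seven_number_summary(s) for s in collection]
```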
### Dictionary-Based Representations

Symbolic approximations for discrete representations:

- `SAX` - Symbolic Aggregate approXimation
- `PAA` - Piecewise Aggregate Approximation (numeric segment means; the first step of SAX)
- `SFA` - Symbolic Fourier Approximation
- `SFAFast` - Optimized SFA
- `SFAWhole` - SFA on entire series (no windowing)
- `BORF` - Bag-of-Receptive-Fields

**Use when**: Need a discrete or symbolic representation, dimensionality reduction, or interpretability.

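PAA and SAX are simple enough to sketch directly. The code below z-normalizes a series, averages it over equal-width segments (PAA), then maps each segment mean to a letter using breakpoints that split the standard normal distribution into equiprobable regions (SAX). The function names and the assumption that the series length is divisible by the number of segments are simplifications for this illustration, not aeon's API.

```python
import statistics


def znormalize(series):
    """Zero mean, unit population std; a constant series maps to zeros."""
    mu = statistics.fmean(series)
    sigma = statistics.pstdev(series)
    return [(x - mu) / sigma for x in series] if sigma > 0 else [0.0] * len(series)


def paa(series, n_segments):
    """Piecewise Aggregate Approximation; assumes len(series) % n_segments == 0."""
    width = len(series) // n_segments
    return [statistics.fmean(series[i * width:(i + 1) * width]) for i in range(n_segments)]


def sax(series, n_segments, alphabet_size):
    """Map z-normalized PAA means to letters via equiprobable Gaussian breakpoints."""
    normal = statistics.NormalDist()
    breakpoints = [normal.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]
    word = []
    for mean in paa(znormalize(series), n_segments):
        # Symbol index = number of breakpoints below the segment mean
        index = sum(1 for b in breakpoints if mean > b)
        word.append(chr(ord("a") + index))
    return "".join(word)


series = [1, 2, 3, 4, 5, 6, 7, 8]
word = sax(series, n_segments=4, alphabet_size=4)
```

For this steadily rising series, the four segment means fall into the four Gaussian regions in order, giving the word `"abcd"`.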
### Shapelet-Based Features

Discriminative subsequence extraction:

- `RandomShapeletTransform` - Random discriminative shapelets
- `RandomDilatedShapeletTransform` - Dilated shapelets for multi-scale patterns
- `SAST` - Scalable And Accurate Subsequence Transform
- `RSAST` - Randomized SAST

**Use when**: Need interpretable discriminative patterns or phase-invariant features.

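The core quantity behind shapelet transforms is the distance from a candidate shapelet to a series: the smallest distance between the shapelet and any equal-length subsequence, so a pattern is matched wherever it occurs (the source of phase invariance). The sketch below uses plain Euclidean distance on raw values; real implementations commonly z-normalize subsequences, and dilated variants space the shapelet's points apart. The function name is hypothetical.

```python
import math


def shapelet_distance(shapelet, series):
    """Minimum Euclidean distance between `shapelet` and every same-length window of `series`."""
    m = len(shapelet)
    best = math.inf
    for start in range(len(series) - m + 1):
        window = series[start:start + m]
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(shapelet, window)))
        best = min(best, dist)
    return best


# The same bump at different positions gives the same (zero) distance
shapelet = [0.0, 1.0, 0.0]
early = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
late = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0]
features = [shapelet_distance(shapelet, s) for s in (early, late)]
```

Each shapelet contributes one column to the transformed dataset, one distance per case.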
### Interval-Based Features

Statistical summaries from time intervals:

- `RandomIntervals` - Features from random intervals
- `SupervisedIntervals` - Supervised interval selection
- `QUANTTransformer` - Quantile-based interval features

**Use when**: Predictive patterns are localized to specific time windows.

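A minimal sketch of the random-interval idea, assuming the intervals are drawn once and shared by every case, and that each interval is summarized by mean, population standard deviation and least-squares slope. The helper names, interval-length bounds and choice of statistics are illustrative, not aeon's defaults.

```python
import random
import statistics


def slope(values):
    """Least-squares slope of values against their index (needs >= 2 points)."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = statistics.fmean(values)
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def sample_intervals(series_length, n_intervals, min_length, rng):
    """Draw (start, end) pairs once so every case uses the same intervals."""
    intervals = []
    for _ in range(n_intervals):
        length = rng.randint(min_length, series_length)
        start = rng.randint(0, series_length - length)
        intervals.append((start, start + length))
    return intervals


def interval_features(series, intervals):
    """Concatenate [mean, std, slope] for each interval."""
    features = []
    for start, end in intervals:
        window = series[start:end]
        features += [statistics.fmean(window), statistics.pstdev(window), slope(window)]
    return features


rng = random.Random(42)
intervals = sample_intervals(series_length=20, n_intervals=3, min_length=3, rng=rng)
X = [[float(t) for t in range(20)], [float(20 - t) for t in range(20)]]
X_features = [interval_features(s, intervals) for s in X]
```

Sharing the intervals across cases keeps the feature columns aligned, so column j always describes the same window.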
### Preprocessing Transformations

Data preparation and normalization:

- `MinMaxScaler` - Scale to [0, 1] range
- `Normalizer` - Z-normalization (zero mean, unit variance)
- `Centerer` - Center to zero mean
- `SimpleImputer` - Fill missing values
- `DownsampleTransformer` - Reduce temporal resolution
- `Tabularizer` - Convert time series to tabular format

**Use when**: Need standardization, missing-value handling, or format conversion.

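Per-series mean imputation followed by z-normalization can be sketched in a few lines. This assumes missing values are encoded as `float("nan")` and that each series is imputed with its own mean; the function names are hypothetical, and aeon's `SimpleImputer` and `Normalizer` may offer other strategies and options.

```python
import math
import statistics


def impute_mean(series):
    """Replace NaNs with the mean of the observed values in the same series."""
    observed = [x for x in series if not math.isnan(x)]
    fill = statistics.fmean(observed)
    return [fill if math.isnan(x) else x for x in series]


def z_normalize(series):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu = statistics.fmean(series)
    sigma = statistics.pstdev(series)
    if sigma == 0:
        return [0.0 for _ in series]  # constant series: avoid division by zero
    return [(x - mu) / sigma for x in series]


X = [[1.0, float("nan"), 3.0], [10.0, 20.0, 30.0]]
X_clean = [z_normalize(impute_mean(s)) for s in X]
```

Because each series is processed on its own, these particular steps learn nothing from other cases; transformers that estimate statistics across the collection still need to be fitted on training data only.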
### Specialized Transformations

Advanced analysis methods:

- `MatrixProfile` - Computes the matrix profile (nearest-neighbor subsequence distances) for motif and discord discovery
- `DWTTransformer` - Discrete Wavelet Transform
- `AutocorrelationFunctionTransformer` - ACF computation
- `Dobin` - Distance-based Outlier BasIs using Neighbors
- `SignatureTransformer` - Path signature methods
- `PLATransformer` - Piecewise Linear Approximation

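The matrix profile records, for every subsequence of length `m`, the z-normalized Euclidean distance to its nearest non-trivial match elsewhere in the series: low values mark repeated patterns (motifs) and high values mark anomalies (discords). The brute-force sketch below is O(n² · m) and only shows the definition; the exclusion-zone width of `m // 2` is an assumption (libraries choose their own), and the helper names are hypothetical.

```python
import math
import statistics


def znorm(window):
    """Z-normalize one subsequence; constant windows map to zeros."""
    mu = statistics.fmean(window)
    sigma = statistics.pstdev(window)
    return [(x - mu) / sigma if sigma > 0 else 0.0 for x in window]


def matrix_profile(series, m):
    """Nearest-neighbor z-normalized distance for each length-m subsequence."""
    subs = [znorm(series[i:i + m]) for i in range(len(series) - m + 1)]
    exclusion = max(1, m // 2)  # ignore trivial matches that overlap heavily
    profile = []
    for i, a in enumerate(subs):
        best = math.inf
        for j, b in enumerate(subs):
            if abs(i - j) < exclusion:
                continue
            best = min(best, math.dist(a, b))
        profile.append(best)
    return profile


# A repeated pattern: subsequences starting at 0 and 6 are identical, so both score 0
series = [0, 1, 3, 1, 0, 5, 0, 1, 3, 1, 0, 2]
mp = matrix_profile(series, m=4)
```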
### Class Imbalance Handling

- `ADASYN` - Adaptive Synthetic Sampling
- `SMOTE` - Synthetic Minority Over-sampling
- `OHIT` - Over-sampling with Highly Imbalanced Time series

**Use when**: Classification with imbalanced classes.

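SMOTE creates synthetic minority-class examples by interpolating between a minority case and one of its nearest minority-class neighbors. The sketch below applies that idea to equal-length series treated as plain vectors; the neighbor count `k`, the helper name and the use of Euclidean distance are assumptions, and ADASYN and OHIT are more elaborate variants not shown here.

```python
import math
import random


def smote(minority, n_new, k, rng):
    """Generate n_new synthetic series by interpolating towards random nearest neighbors."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbors of `base`, excluding base itself
        others = minority[:i] + minority[i + 1:]
        neighbors = sorted(others, key=lambda s: math.dist(base, s))[:k]
        partner = rng.choice(neighbors)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (p - b) for b, p in zip(base, partner)])
    return synthetic


minority = [[0.0, 1.0, 0.0], [0.0, 2.0, 0.0], [0.0, 3.0, 0.0]]
new_cases = smote(minority, n_new=5, k=2, rng=random.Random(0))
```

Every synthetic case lies on a segment between two real minority cases, so it stays inside the region the minority class already occupies.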
### Pipeline Composition

- `CollectionTransformerPipeline` - Chain multiple transformers

## Series Transformers

Transform individual time series (e.g., for preprocessing in forecasting).

### Statistical Analysis

- `AutoCorrelationSeriesTransformer` - Autocorrelation
- `StatsModelsACF` - ACF using statsmodels
- `StatsModelsPACF` - Partial autocorrelation using statsmodels

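The sample autocorrelation at lag k compares a series with a copy of itself shifted by k steps. A minimal sketch using the common biased estimator, which normalizes every lag by the lag-0 sum of squared deviations; the function name and `max_lag` parameter are illustrative, and the statsmodels-backed transformers may use different defaults.

```python
import statistics


def acf(series, max_lag):
    """Biased sample autocorrelation for lags 1..max_lag."""
    mu = statistics.fmean(series)
    dev = [x - mu for x in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(len(series) - k)) / denom
        for k in range(1, max_lag + 1)
    ]


# An alternating series is strongly negatively correlated at lag 1, positively at lag 2
values = acf([1.0, -1.0] * 10, max_lag=2)
```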
### Smoothing and Filtering

- `ExponentialSmoothing` - Exponentially weighted moving average
- `MovingAverage` - Simple or weighted moving average
- `SavitzkyGolayFilter` - Polynomial smoothing
- `GaussianFilter` - Gaussian kernel smoothing
- `BKFilter` - Baxter-King bandpass filter
- `DiscreteFourierApproximation` - Fourier-based filtering

**Use when**: Need noise reduction, trend extraction, or frequency filtering.

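Two of the smoothers above reduce to short recurrences. A sketch, assuming a trailing window for the moving average (so the output is `window - 1` points shorter than the input) and simple exponential smoothing `s_t = alpha * x_t + (1 - alpha) * s_{t-1}` started from `s_0 = x_0`; the function names are hypothetical and aeon's classes may treat edges and parameters differently.

```python
def moving_average(series, window):
    """Trailing simple moving average; returns len(series) - window + 1 values."""
    total = sum(series[:window])
    out = [total / window]
    for i in range(window, len(series)):
        total += series[i] - series[i - window]  # slide the window by one step
        out.append(total / window)
    return out


def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


noisy = [1.0, 3.0, 1.0, 3.0, 1.0, 3.0]
ma = moving_average(noisy, window=2)
es = exponential_smoothing(noisy, alpha=0.5)
```

The running-sum update keeps the moving average O(n) regardless of window size; a smaller `alpha` makes exponential smoothing react more slowly to new values.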
### Dimensionality Reduction

- `PCASeriesTransformer` - Principal component analysis
- `PLASeriesTransformer` - Piecewise Linear Approximation

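Piecewise Linear Approximation replaces each segment of a series with a straight line. The sketch below uses fixed, equal-width segments with a least-squares fit per segment; that segmentation rule and the helper name are assumptions, since PLA methods often place breakpoints adaptively (for example with sliding-window, top-down or bottom-up algorithms).

```python
def pla(series, n_segments):
    """Fit a least-squares line to each of n_segments equal-width segments.

    Returns the reconstructed series; assumes len(series) % n_segments == 0
    and at least two points per segment.
    """
    width = len(series) // n_segments
    approx = []
    for s in range(n_segments):
        seg = series[s * width:(s + 1) * width]
        x_mean = (width - 1) / 2
        y_mean = sum(seg) / width
        num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(seg))
        den = sum((i - x_mean) ** 2 for i in range(width))
        slope = num / den
        intercept = y_mean - slope * x_mean
        approx.extend(intercept + slope * i for i in range(width))
    return approx


# Two linear pieces are recovered exactly
series = [0.0, 1.0, 2.0, 3.0, 3.0, 2.0, 1.0, 0.0]
approx = pla(series, n_segments=2)
```

The compressed representation is just the (slope, intercept) pair per segment; the reconstruction is returned here to make the approximation easy to inspect.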
### Other Transformations

- `BoxCoxTransformer` - Variance stabilization
- `LogTransformer` - Logarithmic scaling
- `ClaSPTransformer` - Classification Score Profile (ClaSP), used for segmentation

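The Box-Cox transform behind variance stabilization is `(x**lam - 1) / lam` for `lam != 0` and `log(x)` for `lam == 0`, defined only for strictly positive values; a plain log transform is the `lam == 0` case. A sketch with a fixed, user-supplied `lam` (choosing it, for example by maximum likelihood, is not shown, and the function name is hypothetical):

```python
import math


def box_cox(series, lam):
    """Box-Cox transform with a fixed lambda; requires strictly positive values."""
    if any(x <= 0 for x in series):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(x) for x in series]
    return [(x ** lam - 1) / lam for x in series]


growth = [1.0, 10.0, 100.0, 1000.0]
logged = box_cox(growth, lam=0)  # equal ratios become equal steps
rooted = box_cox(growth, lam=0.5)
```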
### Pipeline Composition

- `SeriesTransformerPipeline` - Chain series transformers

## Quick Start: Feature Extraction

```python
import numpy as np
from sklearn.linear_model import RidgeClassifierCV

from aeon.datasets import load_classification
from aeon.transformations.collection.convolution_based import Rocket

# Load data
X_train, y_train = load_classification("GunPoint", split="train")
X_test, y_test = load_classification("GunPoint", split="test")

# Extract ROCKET features: fit on the training split, reuse for the test split
rocket = Rocket()
X_train_features = rocket.fit_transform(X_train)
X_test_features = rocket.transform(X_test)

# Use with any sklearn classifier; a ridge classifier is the usual pairing for ROCKET
clf = RidgeClassifierCV(alphas=np.logspace(-3, 3, 10))
clf.fit(X_train_features, y_train)
accuracy = clf.score(X_test_features, y_test)
```

## Quick Start: Preprocessing Pipeline

```python
from aeon.datasets import load_classification
from aeon.transformations.collection import (
    MinMaxScaler,
    SimpleImputer,
    CollectionTransformerPipeline,
)

X_train, y_train = load_classification("GunPoint", split="train")

# Build preprocessing pipeline: fill missing values, then rescale each series to [0, 1]
pipeline = CollectionTransformerPipeline([
    ("imputer", SimpleImputer(strategy="mean")),
    ("scaler", MinMaxScaler()),
])

X_transformed = pipeline.fit_transform(X_train)
```

## Quick Start: Series Smoothing

```python
import numpy as np
from aeon.transformations.series import MovingAverage

# Example noisy series: a sine wave plus Gaussian noise
rng = np.random.default_rng(0)
y = np.sin(np.linspace(0, 4 * np.pi, 200)) + 0.1 * rng.standard_normal(200)

# Smooth an individual time series
smoother = MovingAverage(window_size=5)
y_smoothed = smoother.fit_transform(y)
```

## Algorithm Selection

### For Feature Extraction:

- **Speed + Performance**: MiniRocket
- **Interpretability**: Catch22, TSFresh
- **Dimensionality reduction**: PAA, SAX, PCA
- **Discriminative patterns**: Shapelet transforms
- **Comprehensive features**: TSFresh (with longer runtime)

### For Preprocessing:

- **Normalization**: Normalizer, MinMaxScaler
- **Smoothing**: MovingAverage, SavitzkyGolayFilter
- **Missing values**: SimpleImputer
- **Frequency analysis**: DWTTransformer, Fourier methods

### For Symbolic Representation:

- **Fast approximation**: PAA
- **Alphabet-based**: SAX
- **Frequency-based**: SFA, SFAFast

## Best Practices

1. **Fit on training data only**: Avoid data leakage by fitting on the training split and reusing the fitted transformer on the test split
   ```python
   from aeon.transformations.collection.convolution_based import MiniRocket

   transformer = MiniRocket()  # any collection transformer works the same way
   transformer.fit(X_train)
   X_train_tf = transformer.transform(X_train)
   X_test_tf = transformer.transform(X_test)
   ```

2. **Pipeline composition**: Chain transformers for complex workflows
   ```python
   from aeon.transformations.collection import CollectionTransformerPipeline, Normalizer, SimpleImputer
   from aeon.transformations.collection.convolution_based import Rocket

   pipeline = CollectionTransformerPipeline([
       ("imputer", SimpleImputer()),
       ("scaler", Normalizer()),
       ("features", Rocket()),
   ])
   ```

3. **Feature selection**: TSFresh can generate hundreds of features; consider selecting a subset
   ```python
   from sklearn.feature_selection import SelectKBest, f_classif

   # X_features: TSFresh output for the training cases; y: their class labels
   selector = SelectKBest(score_func=f_classif, k=100)
   X_selected = selector.fit_transform(X_features, y)
   ```

4. **Memory and runtime**: Some transformers are memory-intensive or slow on large datasets
   - Use MiniRocket instead of ROCKET for speed
   - Consider downsampling very long series
   - Use ROCKETGPU for GPU acceleration

5. **Domain knowledge**: Choose transformations that match the domain
   - Periodic data: Fourier-based methods
   - Noisy data: Smoothing filters
   - Spike detection: Wavelet transforms