Initial commit

This commit is contained in:
Zhongwei Li
2025-11-30 08:30:10 +08:00
commit f0bd18fb4e
824 changed files with 331919 additions and 0 deletions

View File

@@ -0,0 +1,246 @@
# Transformations
Aeon provides extensive transformation capabilities for preprocessing, feature extraction, and representation learning from time series data.
## Transformation Types
Aeon distinguishes between:
- **CollectionTransformers**: Transform multiple time series (collections)
- **SeriesTransformers**: Transform individual time series
## Collection Transformers
### Convolution-Based Feature Extraction
Fast, scalable feature generation using random kernels:
- `RocketTransformer` - Random convolutional kernels
- `MiniRocketTransformer` - Simplified ROCKET for speed
- `MultiRocketTransformer` - Enhanced ROCKET variant
- `HydraTransformer` - Multi-resolution dilated convolutions
- `MultiRocketHydraTransformer` - Combines ROCKET and Hydra
- `ROCKETGPU` - GPU-accelerated variant
**Use when**: Need fast, scalable features for any ML algorithm, strong baseline performance.
### Statistical Feature Extraction
Domain-agnostic features based on time series characteristics:
- `Catch22` - 22 canonical time-series characteristics
- `TSFresh` - Comprehensive automated feature extraction (100+ features)
- `TSFreshRelevant` - Feature extraction with relevance filtering
- `SevenNumberSummary` - Descriptive statistics (mean, std, quantiles)
**Use when**: Need interpretable features, domain-agnostic approach, or feeding traditional ML.
### Dictionary-Based Representations
Symbolic approximations for discrete representations:
- `SAX` - Symbolic Aggregate approXimation
- `PAA` - Piecewise Aggregate Approximation
- `SFA` - Symbolic Fourier Approximation
- `SFAFast` - Optimized SFA
- `SFAWhole` - SFA on entire series (no windowing)
- `BORF` - Bag-of-Receptive-Fields
**Use when**: Need discrete/symbolic representation, dimensionality reduction, interpretability.
### Shapelet-Based Features
Discriminative subsequence extraction:
- `RandomShapeletTransform` - Random discriminative shapelets
- `RandomDilatedShapeletTransform` - Dilated shapelets for multi-scale
- `SAST` - Scalable And Accurate Subsequence Transform
- `RSAST` - Randomized SAST
**Use when**: Need interpretable discriminative patterns, phase-invariant features.
### Interval-Based Features
Statistical summaries from time intervals:
- `RandomIntervals` - Features from random intervals
- `SupervisedIntervals` - Supervised interval selection
- `QUANTTransformer` - Quantile-based interval features
**Use when**: Predictive patterns localized to specific windows.
### Preprocessing Transformations
Data preparation and normalization:
- `MinMaxScaler` - Scale to [0, 1] range
- `Normalizer` - Z-normalization (zero mean, unit variance)
- `Centerer` - Center to zero mean
- `SimpleImputer` - Fill missing values
- `DownsampleTransformer` - Reduce temporal resolution
- `Tabularizer` - Convert time series to tabular format
**Use when**: Need standardization, missing value handling, format conversion.
### Specialized Transformations
Advanced analysis methods:
- `MatrixProfile` - Computes distance profiles for pattern discovery
- `DWTTransformer` - Discrete Wavelet Transform
- `AutocorrelationFunctionTransformer` - ACF computation
- `Dobin` - Distance-based Outlier BasIs using Neighbors
- `SignatureTransformer` - Path signature methods
- `PLATransformer` - Piecewise Linear Approximation
### Class Imbalance Handling
- `ADASYN` - Adaptive Synthetic Sampling
- `SMOTE` - Synthetic Minority Over-sampling
- `OHIT` - Over-sampling with Highly Imbalanced Time series
**Use when**: Classification with imbalanced classes.
### Pipeline Composition
- `CollectionTransformerPipeline` - Chain multiple transformers
## Series Transformers
Transform individual time series (e.g., for preprocessing in forecasting).
### Statistical Analysis
- `AutoCorrelationSeriesTransformer` - Autocorrelation
- `StatsModelsACF` - ACF using statsmodels
- `StatsModelsPACF` - Partial autocorrelation
### Smoothing and Filtering
- `ExponentialSmoothing` - Exponentially weighted moving average
- `MovingAverage` - Simple or weighted moving average
- `SavitzkyGolayFilter` - Polynomial smoothing
- `GaussianFilter` - Gaussian kernel smoothing
- `BKFilter` - Baxter-King bandpass filter
- `DiscreteFourierApproximation` - Fourier-based filtering
**Use when**: Need noise reduction, trend extraction, or frequency filtering.
### Dimensionality Reduction
- `PCASeriesTransformer` - Principal component analysis
- `PlASeriesTransformer` - Piecewise Linear Approximation
### Transformations
- `BoxCoxTransformer` - Variance stabilization
- `LogTransformer` - Logarithmic scaling
- `ClaSPTransformer` - Classification Score Profile
### Pipeline Composition
- `SeriesTransformerPipeline` - Chain series transformers
## Quick Start: Feature Extraction
```python
from aeon.transformations.collection.convolution_based import RocketTransformer
from aeon.classification.sklearn import RotationForest
from aeon.datasets import load_classification
# Load data
X_train, y_train = load_classification("GunPoint", split="train")
X_test, y_test = load_classification("GunPoint", split="test")
# Extract ROCKET features
rocket = RocketTransformer()
X_train_features = rocket.fit_transform(X_train)
X_test_features = rocket.transform(X_test)
# Use with any sklearn classifier
clf = RotationForest()
clf.fit(X_train_features, y_train)
accuracy = clf.score(X_test_features, y_test)
```
## Quick Start: Preprocessing Pipeline
```python
from aeon.transformations.collection import (
MinMaxScaler,
SimpleImputer,
CollectionTransformerPipeline
)
# Build preprocessing pipeline
pipeline = CollectionTransformerPipeline([
('imputer', SimpleImputer(strategy='mean')),
('scaler', MinMaxScaler())
])
X_transformed = pipeline.fit_transform(X_train)
```
## Quick Start: Series Smoothing
```python
from aeon.transformations.series import MovingAverage
# Smooth individual time series
smoother = MovingAverage(window_size=5)
y_smoothed = smoother.fit_transform(y)
```
## Algorithm Selection
### For Feature Extraction:
- **Speed + Performance**: MiniRocketTransformer
- **Interpretability**: Catch22, TSFresh
- **Dimensionality reduction**: PAA, SAX, PCA
- **Discriminative patterns**: Shapelet transforms
- **Comprehensive features**: TSFresh (with longer runtime)
### For Preprocessing:
- **Normalization**: Normalizer, MinMaxScaler
- **Smoothing**: MovingAverage, SavitzkyGolayFilter
- **Missing values**: SimpleImputer
- **Frequency analysis**: DWTTransformer, Fourier methods
### For Symbolic Representation:
- **Fast approximation**: PAA
- **Alphabet-based**: SAX
- **Frequency-based**: SFA, SFAFast
## Best Practices
1. **Fit on training data only**: Avoid data leakage
```python
transformer.fit(X_train)
X_train_tf = transformer.transform(X_train)
X_test_tf = transformer.transform(X_test)
```
2. **Pipeline composition**: Chain transformers for complex workflows
```python
pipeline = CollectionTransformerPipeline([
('imputer', SimpleImputer()),
('scaler', Normalizer()),
('features', RocketTransformer())
])
```
3. **Feature selection**: TSFresh can generate many features; consider selection
```python
from sklearn.feature_selection import SelectKBest
selector = SelectKBest(k=100)
X_selected = selector.fit_transform(X_features, y)
```
4. **Memory considerations**: Some transformers memory-intensive on large datasets
- Use MiniRocket instead of ROCKET for speed
- Consider downsampling for very long series
- Use ROCKETGPU for GPU acceleration
5. **Domain knowledge**: Choose transformations matching domain:
- Periodic data: Fourier-based methods
- Noisy data: Smoothing filters
- Spike detection: Wavelet transforms