Initial commit

2025-11-30 08:30:10 +08:00
commit f0bd18fb4e
824 changed files with 331919 additions and 0 deletions
--- a/skills/aeon/references/transformations.md
+++ b/skills/aeon/references/transformations.md
@@ -0,0 +1,246 @@
+# Transformations
+
+Aeon provides extensive transformation capabilities for preprocessing, feature extraction, and representation learning from time series data.
+
+## Transformation Types
+
+Aeon distinguishes between two kinds of transformer:
+- **CollectionTransformers**: Transform a collection of time series together (typically a 3D array of shape `(n_cases, n_channels, n_timepoints)`)
+- **SeriesTransformers**: Transform a single time series
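+
+Here is a quick illustration of the two input types. It assumes aeon's usual convention: an equal-length collection is a 3D NumPy array of shape `(n_cases, n_channels, n_timepoints)`, and a single series is a 1D array. The data is random and purely illustrative.
+
+```python
+import numpy as np
+
+rng = np.random.default_rng(0)
+
+# Collection: 10 univariate series with 50 time points each
+X_collection = rng.normal(size=(10, 1, 50))  # (n_cases, n_channels, n_timepoints)
+
+# Single univariate series with 50 time points
+y_series = rng.normal(size=50)  # (n_timepoints,)
+
+print(X_collection.shape, y_series.shape)  # (10, 1, 50) (50,)
+```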
+
+## Collection Transformers
+
+### Convolution-Based Feature Extraction
+
+Fast, scalable feature generation using random kernels:
+
+- `RocketTransformer` - Random convolutional kernels
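+- `MiniRocketTransformer` - Faster, almost deterministic variant of ROCKET
+- `MultiRocketTransformer` - ROCKET variant with extra pooling operators and features from the first-order difference
+- `HydraTransformer` - Groups of competing random convolutional kernels (dictionary-style counting)
+- `MultiRocketHydraTransformer` - Combines MultiRocket and Hydra features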
+- `ROCKETGPU` - GPU-accelerated variant
+
+**Use when**: Need fast, scalable features for any ML model, or a strong baseline.
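+
+The core ROCKET idea can be sketched in plain NumPy in three steps:
+
+- Sample random kernels once (the "fit" step).
+- Convolve each series with every kernel.
+- Keep two statistics per kernel: the proportion of positive values (PPV) and the maximum.
+
+This is a simplified illustration, not aeon's implementation. It assumes a fixed kernel length of 9, no padding, and univariate series stored as a 2D array. The helper names `make_kernels` and `apply_kernels` are hypothetical.
+
+```python
+import numpy as np
+
+def make_kernels(n_kernels, series_length, rng, kernel_length=9):
+    """Sample (weights, bias, dilation) once per kernel: the 'fit' step."""
+    # Dilation 2**u with u ~ U(0, A), A chosen so the dilated kernel fits the series
+    max_exp = np.log2((series_length - 1) / (kernel_length - 1))
+    kernels = []
+    for _ in range(n_kernels):
+        weights = rng.normal(size=kernel_length)
+        weights -= weights.mean()  # mean-centered weights
+        bias = rng.uniform(-1.0, 1.0)
+        dilation = int(2 ** rng.uniform(0, max_exp))
+        kernels.append((weights, bias, dilation))
+    return kernels
+
+def apply_kernels(X, kernels):
+    """Return [PPV, max] per kernel for each 1D series in X (no padding)."""
+    features = np.empty((len(X), 2 * len(kernels)))
+    for n, x in enumerate(X):
+        for k, (weights, bias, dilation) in enumerate(kernels):
+            span = (len(weights) - 1) * dilation
+            conv = np.array([
+                np.dot(weights, x[i : i + span + 1 : dilation]) + bias
+                for i in range(len(x) - span)
+            ])
+            features[n, 2 * k] = np.mean(conv > 0)  # proportion of positive values
+            features[n, 2 * k + 1] = conv.max()
+    return features
+
+rng = np.random.default_rng(0)
+X = rng.normal(size=(5, 60))  # 5 univariate series of length 60
+kernels = make_kernels(n_kernels=20, series_length=60, rng=rng)
+F = apply_kernels(X, kernels)
+print(F.shape)  # (5, 40)
+```
+
+The kernels are sampled once and reused, so training and test series map to the same feature columns. That is why the transformer is fitted on training data and only applied to test data.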
+
+### Statistical Feature Extraction
+
+Domain-agnostic features based on time series characteristics:
+
+- `Catch22` - 22 canonical time-series characteristics
+- `TSFresh` - Comprehensive automated feature extraction (100+ features)
+- `TSFreshRelevant` - Feature extraction with relevance filtering
+- `SevenNumberSummary` - Descriptive statistics (mean, std, quantiles)
+
+**Use when**: Need interpretable, domain-agnostic features to feed traditional ML models.
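+
+To make "descriptive statistics" concrete, the sketch below computes one plausible set of seven numbers per series with NumPy. The choice of statistics is an assumption for illustration; aeon's `SevenNumberSummary` may use a different default set.
+
+```python
+import numpy as np
+
+def seven_number_summary(x):
+    """Mean, std, min, max and the 25/50/75% quantiles of one series (illustrative set)."""
+    q25, q50, q75 = np.quantile(x, [0.25, 0.5, 0.75])
+    return np.array([x.mean(), x.std(), x.min(), x.max(), q25, q50, q75])
+
+X = np.array([[1.0, 2.0, 3.0, 4.0, 5.0],
+              [2.0, 2.0, 2.0, 2.0, 2.0]])
+features = np.vstack([seven_number_summary(x) for x in X])
+print(features.shape)  # (2, 7)
+```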
+
+### Dictionary-Based Representations
+
+Symbolic approximations for discrete representations:
+
+- `SAX` - Symbolic Aggregate approXimation
+- `PAA` - Piecewise Aggregate Approximation
+- `SFA` - Symbolic Fourier Approximation
+- `SFAFast` - Optimized SFA
+- `SFAWhole` - SFA on entire series (no windowing)
+- `BORF` - Bag-of-Receptive-Fields
+
+**Use when**: Need discrete/symbolic representation, dimensionality reduction, interpretability.
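+
+PAA and SAX are simple enough to sketch directly. This minimal NumPy version is not aeon's `PAA` or `SAX` code. It assumes an alphabet of size 4, so the Gaussian breakpoints are hard-coded, and a non-constant input series.
+
+```python
+import numpy as np
+
+def paa(x, n_segments):
+    """Piecewise Aggregate Approximation: mean of each (near) equal-width segment."""
+    return np.array([seg.mean() for seg in np.array_split(x, n_segments)])
+
+def sax(x, n_segments, alphabet="abcd"):
+    """SAX word: z-normalize, apply PAA, then map segment means to symbols."""
+    z = (x - x.mean()) / x.std()  # assumes x is not constant
+    # Breakpoints splitting N(0, 1) into 4 equiprobable regions (alphabet size 4)
+    breakpoints = np.array([-0.6745, 0.0, 0.6745])
+    symbols = np.searchsorted(breakpoints, paa(z, n_segments))
+    return "".join(alphabet[s] for s in symbols)
+
+x = np.array([1, 1, 2, 2, 3, 3, 4, 4], dtype=float)
+print(paa(x, 4))  # [1. 2. 3. 4.]
+print(sax(x, 4))  # abcd
+```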
+
+### Shapelet-Based Features
+
+Discriminative subsequence extraction:
+
+- `RandomShapeletTransform` - Random discriminative shapelets
+- `RandomDilatedShapeletTransform` - Dilated shapelets that capture patterns at multiple scales
+- `SAST` - Scalable And Accurate Subsequence Transform
+- `RSAST` - Randomized SAST
+
+**Use when**: Need interpretable, discriminative subsequences whose location in the series does not matter (phase-independent).
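+
+Shapelet features rest on the distance between a candidate shapelet and a series, taken as the minimum over all alignments. Taking the minimum is what makes the feature independent of where the pattern occurs. The sketch below is minimal and assumes z-normalized Euclidean distance; this is a common choice, but implementations differ.
+
+```python
+import numpy as np
+
+def znorm(a):
+    """Z-normalize; constant inputs are only centered to avoid dividing by zero."""
+    std = a.std()
+    return (a - a.mean()) / std if std > 0 else a - a.mean()
+
+def shapelet_distance(series, shapelet):
+    """Minimum distance between the z-normalized shapelet and every
+    z-normalized subsequence of the same length in the series."""
+    L = len(shapelet)
+    s = znorm(shapelet)
+    return min(
+        np.linalg.norm(znorm(series[i : i + L]) - s)
+        for i in range(len(series) - L + 1)
+    )
+
+shapelet = np.array([0.0, 1.0, 0.0])
+series_a = np.array([5.0, 5.0, 0.0, 1.0, 0.0, 5.0])  # contains the peak pattern
+series_b = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])  # monotone, no peak
+print(shapelet_distance(series_a, shapelet))  # 0.0
+```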
+
+### Interval-Based Features
+
+Statistical summaries from time intervals:
+
+- `RandomIntervals` - Features from random intervals
+- `SupervisedIntervals` - Supervised interval selection
+- `QUANTTransformer` - Quantile-based interval features
+
+**Use when**: Predictive patterns are localized to specific time intervals.
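+
+The interval idea can be sketched as follows: sample intervals once, then summarize each series over each interval. The sketch assumes three summary statistics (mean, standard deviation, slope) and univariate series stored as a 2D array. aeon's interval transformers support richer feature sets and multivariate input. The function name is hypothetical.
+
+```python
+import numpy as np
+
+def random_interval_features(X, n_intervals=5, min_length=3, seed=0):
+    """Mean, std and slope over random intervals shared by all series in X."""
+    rng = np.random.default_rng(seed)
+    n_timepoints = X.shape[1]
+    intervals = []
+    for _ in range(n_intervals):
+        start = rng.integers(0, n_timepoints - min_length + 1)
+        end = rng.integers(start + min_length, n_timepoints + 1)  # exclusive end
+        intervals.append((start, end))
+    rows = []
+    for x in X:
+        row = []
+        for start, end in intervals:
+            seg = x[start:end]
+            slope = np.polyfit(np.arange(len(seg)), seg, deg=1)[0]
+            row.extend([seg.mean(), seg.std(), slope])
+        rows.append(row)
+    return np.array(rows), intervals
+
+X = np.tile(np.arange(20, dtype=float), (3, 1))  # 3 identical linear series
+F, intervals = random_interval_features(X)
+print(F.shape)  # (3, 15)
+```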
+
+### Preprocessing Transformations
+
+Data preparation and normalization:
+
+- `MinMaxScaler` - Scale to [0, 1] range
+- `Normalizer` - Z-normalization (zero mean, unit variance)
+- `Centerer` - Center to zero mean
+- `SimpleImputer` - Fill missing values
+- `DownsampleTransformer` - Reduce temporal resolution
+- `Tabularizer` - Convert time series to tabular format
+
+**Use when**: Need standardization, missing value handling, format conversion.
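+
+Below is a sketch of per-series scaling on a 3D collection. It assumes, as is typical for time series, that statistics are computed along the time axis for each series and channel independently rather than across the dataset. Under that assumption, nothing is learned from the training set. The function names are hypothetical.
+
+```python
+import numpy as np
+
+def z_normalize(X):
+    """Zero mean, unit variance per series and channel, along the time axis."""
+    mean = X.mean(axis=-1, keepdims=True)
+    std = X.std(axis=-1, keepdims=True)
+    std[std == 0] = 1.0  # constant series become all zeros instead of dividing by 0
+    return (X - mean) / std
+
+def min_max_scale(X):
+    """Rescale each series and channel to [0, 1] along the time axis."""
+    lo = X.min(axis=-1, keepdims=True)
+    span = X.max(axis=-1, keepdims=True) - lo
+    span[span == 0] = 1.0
+    return (X - lo) / span
+
+X = np.array([[[1.0, 2.0, 3.0]],
+              [[5.0, 5.0, 5.0]]])  # shape (2, 1, 3)
+Z = z_normalize(X)
+M = min_max_scale(X)
+```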
+
+### Specialized Transformations
+
+Advanced analysis methods:
+
+- `MatrixProfile` - Nearest-neighbor subsequence distances for motif and discord discovery
+- `DWTTransformer` - Discrete Wavelet Transform
+- `AutocorrelationFunctionTransformer` - ACF computation
+- `Dobin` - Distance-based Outlier BasIs using Neighbors
+- `SignatureTransformer` - Path signature methods
+- `PLATransformer` - Piecewise Linear Approximation
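+
+The matrix profile can be illustrated with a brute-force NumPy version. For every subsequence of length `m`, it records the z-normalized distance to that subsequence's nearest neighbor elsewhere in the series. This quadratic sketch is for intuition only; production implementations use much faster algorithms. The exclusion zone of `m // 2` is an assumed choice, and the series must not contain constant subsequences.
+
+```python
+import numpy as np
+
+def naive_matrix_profile(x, m):
+    """Distance from each length-m subsequence to its nearest non-trivial match."""
+    n = len(x) - m + 1
+    subs = np.array([x[i : i + m] for i in range(n)])
+    # Z-normalize every subsequence (assumes none is constant)
+    subs = (subs - subs.mean(axis=1, keepdims=True)) / subs.std(axis=1, keepdims=True)
+    exclusion = m // 2  # ignore trivial matches that overlap the query
+    profile = np.full(n, np.inf)
+    for i in range(n):
+        d = np.linalg.norm(subs - subs[i], axis=1)
+        d[max(0, i - exclusion) : min(n, i + exclusion + 1)] = np.inf
+        profile[i] = d.min()
+    return profile
+
+rng = np.random.default_rng(0)
+x = np.tile(rng.normal(size=30), 3)  # the same 30-point pattern repeated 3 times
+profile = naive_matrix_profile(x, m=10)
+```
+
+Low profile values mark repeated patterns (motifs); high values mark subsequences with no close match (discords). In this example every subsequence recurs 30 points away, so the whole profile is close to zero.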
+
+### Class Imbalance Handling
+
+- `ADASYN` - Adaptive Synthetic Sampling
+- `SMOTE` - Synthetic Minority Over-sampling
+- `OHIT` - Over-sampling with Highly Imbalanced Time series
+
+**Use when**: Classification with imbalanced classes.
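+
+SMOTE-style oversampling creates synthetic minority examples by interpolating between a minority series and one of its nearest minority neighbors. The sketch below assumes plain Euclidean distance on flattened series and needs at least `k + 1` minority cases. It is illustrative only and omits the refinements of ADASYN and OHIT. The function name is hypothetical.
+
+```python
+import numpy as np
+
+def smote_like(X_min, n_new, k=3, seed=0):
+    """Interpolate between random minority cases and their k nearest minority neighbors."""
+    rng = np.random.default_rng(seed)
+    flat = X_min.reshape(len(X_min), -1)
+    dists = np.linalg.norm(flat[:, None, :] - flat[None, :, :], axis=-1)
+    np.fill_diagonal(dists, np.inf)  # a case is not its own neighbor
+    neighbors = np.argsort(dists, axis=1)[:, :k]
+    synthetic = []
+    for _ in range(n_new):
+        i = rng.integers(len(X_min))
+        j = rng.choice(neighbors[i])
+        gap = rng.uniform()  # position along the segment between the two cases
+        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
+    return np.array(synthetic)
+
+rng = np.random.default_rng(1)
+X_minority = rng.normal(size=(6, 1, 20))  # 6 minority-class series
+X_synthetic = smote_like(X_minority, n_new=4)
+print(X_synthetic.shape)  # (4, 1, 20)
+```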
+
+### Pipeline Composition
+
+- `CollectionTransformerPipeline` - Chain multiple transformers
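+
+The chaining semantics can be shown with a hypothetical minimal class; `ChainSketch` and the two toy steps are not aeon APIs. During fitting, each step is fitted on the output of the previous step. At transform time, the already-fitted steps are replayed in order, so nothing is re-learned from test data.
+
+```python
+import numpy as np
+
+class ChainSketch:
+    """Hypothetical chain of (name, step) pairs exposing fit/transform."""
+
+    def __init__(self, steps):
+        self.steps = steps
+
+    def fit_transform(self, X, y=None):
+        for _, step in self.steps:
+            X = step.fit(X, y).transform(X)  # fit on the previous step's output
+        return X
+
+    def transform(self, X):
+        for _, step in self.steps:
+            X = step.transform(X)  # reuse parameters learned during fitting
+        return X
+
+class TrainMeanCenterer:
+    """Toy step: subtracts the global mean learned at fit time."""
+    def fit(self, X, y=None):
+        self.mean_ = X.mean()
+        return self
+
+    def transform(self, X):
+        return X - self.mean_
+
+class Doubler:
+    """Toy stateless step."""
+    def fit(self, X, y=None):
+        return self
+
+    def transform(self, X):
+        return 2 * X
+
+chain = ChainSketch([("center", TrainMeanCenterer()), ("double", Doubler())])
+train_out = chain.fit_transform(np.array([[[1.0, 2.0, 3.0]]]))  # learns mean 2.0
+test_out = chain.transform(np.array([[[4.0, 4.0, 4.0]]]))       # (4 - 2) * 2 = 4
+```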
+
+## Series Transformers
+
+Transform individual time series (e.g., for preprocessing in forecasting).
+
+### Statistical Analysis
+
+- `AutoCorrelationSeriesTransformer` - Autocorrelation
+- `StatsModelsACF` - ACF using statsmodels
+- `StatsModelsPACF` - Partial autocorrelation
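+
+The sample autocorrelation at lag k is the lag-k autocovariance divided by the lag-0 variance. Below is a minimal NumPy sketch of this common (biased) estimator. It is meant for intuition, not as a replacement for the transformers above.
+
+```python
+import numpy as np
+
+def acf(x, max_lag):
+    """Sample autocorrelation for lags 0..max_lag."""
+    x = np.asarray(x, dtype=float)
+    d = x - x.mean()
+    denom = np.dot(d, d)  # n times the lag-0 variance
+    return np.array([np.dot(d[: len(x) - k], d[k:]) / denom for k in range(max_lag + 1)])
+
+print(acf([1, 2, 3, 4, 5], max_lag=2))  # [ 1.   0.4 -0.1]
+```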
+
+### Smoothing and Filtering
+
+- `ExponentialSmoothing` - Exponentially weighted moving average
+- `MovingAverage` - Simple or weighted moving average
+- `SavitzkyGolayFilter` - Polynomial smoothing
+- `GaussianFilter` - Gaussian kernel smoothing
+- `BKFilter` - Baxter-King bandpass filter
+- `DiscreteFourierApproximation` - Fourier-based filtering
+
+**Use when**: Need noise reduction, trend extraction, or frequency filtering.
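+
+Two of the smoothers above reduce to a few lines of NumPy. These sketches assume particular edge handling:
+
+- The EWMA is initialized with the first observation.
+- The moving average uses `mode="valid"`, so it returns `len(x) - window + 1` values.
+
+aeon's transformers may handle the boundaries differently.
+
+```python
+import numpy as np
+
+def exponential_smoothing(x, alpha):
+    """EWMA: s[0] = x[0], s[t] = alpha * x[t] + (1 - alpha) * s[t - 1]."""
+    s = np.empty(len(x))
+    s[0] = x[0]
+    for t in range(1, len(x)):
+        s[t] = alpha * x[t] + (1 - alpha) * s[t - 1]
+    return s
+
+def moving_average(x, window):
+    """Unweighted moving average over full windows only."""
+    return np.convolve(x, np.ones(window) / window, mode="valid")
+
+print(exponential_smoothing([0, 0, 10, 0, 0], alpha=0.5))  # [ 0.    0.    5.    2.5   1.25]
+print(moving_average([1, 2, 3, 4, 5], window=3))           # [2. 3. 4.]
+```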
+
+### Dimensionality Reduction
+
+- `PCASeriesTransformer` - Principal component analysis
+- `PLASeriesTransformer` - Piecewise Linear Approximation
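+
+Below is a minimal piecewise linear approximation. It assumes equal-width segments, whereas adaptive methods choose the breakpoints, and at least two points per segment. The function name is hypothetical.
+
+```python
+import numpy as np
+
+def piecewise_linear_approximation(x, n_segments):
+    """Least-squares line per equal-width segment; returns the reconstruction."""
+    x = np.asarray(x, dtype=float)
+    out = np.empty(len(x))
+    for idx in np.array_split(np.arange(len(x)), n_segments):
+        slope, intercept = np.polyfit(idx, x[idx], deg=1)
+        out[idx] = slope * idx + intercept
+    return out
+
+x = np.array([0, 1, 2, 3, 3, 2, 1, 0], dtype=float)  # up-then-down ramp
+approx = piecewise_linear_approximation(x, n_segments=2)
+```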
+
+### Other Transformations
+
+- `BoxCoxTransformer` - Variance stabilization
+- `LogTransformer` - Logarithmic scaling
+- `ClaSPTransformer` - Classification Score Profile (used for time series segmentation)
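+
+For strictly positive values, the Box-Cox transform is `(y**lam - 1) / lam` when `lam != 0` and `log(y)` when `lam == 0`. The log transform is therefore the `lam == 0` case. In this sketch `lam` is supplied by hand; an estimator such as `scipy.stats.boxcox` can fit it from data. Whether aeon's `BoxCoxTransformer` estimates it the same way is not covered here.
+
+```python
+import numpy as np
+
+def box_cox(y, lam):
+    """Box-Cox transform for strictly positive y with a fixed lambda."""
+    y = np.asarray(y, dtype=float)
+    if np.any(y <= 0):
+        raise ValueError("Box-Cox requires strictly positive values")
+    return np.log(y) if lam == 0 else (y**lam - 1) / lam
+
+def inverse_box_cox(z, lam):
+    """Undo box_cox for the same lambda."""
+    z = np.asarray(z, dtype=float)
+    return np.exp(z) if lam == 0 else (lam * z + 1) ** (1 / lam)
+
+print(box_cox([1.0, 4.0], lam=0.5))  # [0. 2.]
+```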
+
+### Pipeline Composition
+
+- `SeriesTransformerPipeline` - Chain series transformers
+
+## Quick Start: Feature Extraction
+
+```python
+import numpy as np
+from sklearn.linear_model import RidgeClassifierCV
+from aeon.transformations.collection.convolution_based import RocketTransformer
+from aeon.datasets import load_classification
+
+# Load the GunPoint train/test splits
+X_train, y_train = load_classification("GunPoint", split="train")
+X_test, y_test = load_classification("GunPoint", split="test")
+
+# Fit the random kernels on the training set only, then transform both splits
+rocket = RocketTransformer()
+X_train_features = rocket.fit_transform(X_train)
+X_test_features = rocket.transform(X_test)
+
+# The features form a 2D table usable by any scikit-learn classifier;
+# a cross-validated ridge classifier is the usual pairing for ROCKET features
+clf = RidgeClassifierCV(alphas=np.logspace(-3, 3, 10))
+clf.fit(X_train_features, y_train)
+accuracy = clf.score(X_test_features, y_test)
+```
+
+## Quick Start: Preprocessing Pipeline
+
+```python
+from aeon.datasets import load_classification
+from aeon.transformations.collection import (
+    MinMaxScaler,
+    SimpleImputer,
+    CollectionTransformerPipeline
+)
+
+X_train, y_train = load_classification("GunPoint", split="train")
+
+# Build preprocessing pipeline: impute missing values, then scale
+pipeline = CollectionTransformerPipeline([
+    ('imputer', SimpleImputer(strategy='mean')),
+    ('scaler', MinMaxScaler())
+])
+
+X_transformed = pipeline.fit_transform(X_train)
+```
+
+## Quick Start: Series Smoothing
+
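+```python
+import numpy as np
+from aeon.transformations.series import MovingAverage
+
+# A noisy example series
+rng = np.random.default_rng(0)
+y = np.sin(np.linspace(0, 6 * np.pi, 200)) + rng.normal(scale=0.3, size=200)
+
+# Smooth the series with a window of 5 time points
+smoother = MovingAverage(window_size=5)
+y_smoothed = smoother.fit_transform(y)
+```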
+
+## Algorithm Selection
+
+### For Feature Extraction:
+- **Speed + Performance**: MiniRocketTransformer
+- **Interpretability**: Catch22, TSFresh
+- **Dimensionality reduction**: PAA, SAX, PCA
+- **Discriminative patterns**: Shapelet transforms
+- **Comprehensive features**: TSFresh (with longer runtime)
+
+### For Preprocessing:
+- **Normalization**: Normalizer, MinMaxScaler
+- **Smoothing**: MovingAverage, SavitzkyGolayFilter
+- **Missing values**: SimpleImputer
+- **Frequency analysis**: DWTTransformer, Fourier methods
+
+### For Symbolic Representation:
+- **Fast approximation**: PAA
+- **Alphabet-based**: SAX
+- **Frequency-based**: SFA, SFAFast
+
+## Best Practices
+
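+1. **Fit on training data only**: Fit transformers on the training set and apply them to both splits, so no test-set information leaks into the features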
+   ```python
+   transformer.fit(X_train)
+   X_train_tf = transformer.transform(X_train)
+   X_test_tf = transformer.transform(X_test)
+   ```
+
+2. **Pipeline composition**: Chain transformers for complex workflows
+   ```python
+   pipeline = CollectionTransformerPipeline([
+       ('imputer', SimpleImputer()),
+       ('scaler', Normalizer()),
+       ('features', RocketTransformer())
+   ])
+   ```
+
+3. **Feature selection**: TSFresh can generate many features; consider selecting a subset
+   ```python
+   from sklearn.feature_selection import SelectKBest, f_classif
+
+   # X_features: 2D output of a feature extractor; y: class labels
+   selector = SelectKBest(score_func=f_classif, k=100)
+   X_selected = selector.fit_transform(X_features, y)
+   ```
+
+4. **Memory considerations**: Some transformers are memory-intensive on large datasets
+   - Use MiniRocket instead of ROCKET for speed
+   - Consider downsampling for very long series
+   - Use ROCKETGPU for GPU acceleration
+
+5. **Domain knowledge**: Choose transformations that match the domain:
+   - Periodic data: Fourier-based methods
+   - Noisy data: Smoothing filters
+   - Spike detection: Wavelet transforms
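+
+Fourier-based filtering of periodic data can be sketched with NumPy's real FFT. Keep the lowest `n_keep` frequency coefficients, zero the rest, and invert. This illustrates the general idea behind Fourier approximations; it is not aeon's `DiscreteFourierApproximation` implementation.
+
+```python
+import numpy as np
+
+def fourier_lowpass(x, n_keep):
+    """Keep the n_keep lowest-frequency rFFT coefficients (incl. DC) and invert."""
+    coeffs = np.fft.rfft(x)
+    coeffs[n_keep:] = 0
+    return np.fft.irfft(coeffs, n=len(x))
+
+t = np.arange(64)
+clean = np.sin(2 * np.pi * 2 * t / 64)                  # slow component (bin 2)
+noisy = clean + 0.5 * np.sin(2 * np.pi * 20 * t / 64)   # fast component (bin 20)
+smoothed = fourier_lowpass(noisy, n_keep=5)             # keeps bins 0-4
+```
+
+Because the slow component sits in bin 2 and the fast one in bin 20, the low-pass output recovers the clean sine up to floating-point error.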