# Deep Learning Networks

Aeon provides neural network architectures specifically designed for time series tasks. These networks serve as building blocks for classification, regression, clustering, and forecasting.
## Core Network Architectures

### Convolutional Networks

**FCNNetwork** - Fully Convolutional Network

- Three convolutional blocks with batch normalization
- Global average pooling for dimensionality reduction
- **Use when**: Need simple yet effective CNN baseline

**ResNetNetwork** - Residual Network

- Residual blocks with skip connections
- Prevents vanishing gradients in deep networks
- **Use when**: Deep networks needed, training stability important

**InceptionNetwork** - Inception Modules

- Multi-scale feature extraction with parallel convolutions
- Different kernel sizes capture patterns at various scales
- **Use when**: Patterns exist at multiple temporal scales
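
The parallel-convolution idea behind inception modules can be illustrated in a few lines of Keras. This is only a sketch of the concept, not aeon's InceptionNetwork implementation; the kernel sizes and filter counts are arbitrary:

```python
# Illustrative only: parallel Conv1D branches with different kernel sizes
# applied to the same input and concatenated into a multi-scale feature map.
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(100, 1))  # (n_timepoints, n_channels)
branches = [
    layers.Conv1D(32, kernel_size=k, padding="same", activation="relu")(inputs)
    for k in (10, 20, 40)  # different temporal scales
]
outputs = layers.Concatenate()(branches)
model = tf.keras.Model(inputs, outputs)
```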

**TimeCNNNetwork** - Standard CNN

- Basic convolutional architecture
- **Use when**: Simple CNN sufficient, interpretability valued

**DisjointCNNNetwork** - Separate Pathways

- Disjoint convolutional pathways
- **Use when**: Different feature extraction strategies needed

**DCNNNetwork** - Dilated CNN

- Dilated convolutions for large receptive fields
- **Use when**: Long-range dependencies without many layers

### Recurrent Networks

**RecurrentNetwork** - RNN/LSTM/GRU

- Configurable cell type (RNN, LSTM, GRU)
- Sequential modeling of temporal dependencies
- **Use when**: Sequential dependencies critical, variable-length series

### Temporal Convolutional Network

**TCNNetwork** - Temporal Convolutional Network

- Dilated causal convolutions
- Large receptive field without recurrence
- **Use when**: Long sequences, need parallelizable architecture
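
The key mechanism, dilated causal convolutions, is easy to sketch in Keras. This is an illustration of the idea, not aeon's TCNNetwork code; layer widths and depths are arbitrary:

```python
# Illustrative only: stacked dilated causal convolutions in the TCN style.
# Doubling the dilation rate per layer grows the receptive field exponentially
# with depth (here, 1 + 2 * (1 + 2 + 4 + 8) = 31 time steps for kernel size 3).
import tensorflow as tf
from tensorflow.keras import layers

x = inputs = tf.keras.Input(shape=(512, 1))
for dilation in (1, 2, 4, 8):
    x = layers.Conv1D(
        32, kernel_size=3, padding="causal", dilation_rate=dilation, activation="relu"
    )(x)
model = tf.keras.Model(inputs, x)
```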

### Multi-Layer Perceptron

**MLPNetwork** - Basic Feedforward

- Simple fully-connected layers
- Flattens time series before processing
- **Use when**: Baseline needed, computational limits, or simple patterns

## Encoder-Based Architectures

Networks designed for representation learning and clustering.

### Autoencoder Variants

**EncoderNetwork** - Generic Encoder

- Flexible encoder structure
- **Use when**: Custom encoding needed

**AEFCNNetwork** - FCN-based Autoencoder

- Fully convolutional encoder-decoder
- **Use when**: Need convolutional representation learning

**AEResNetNetwork** - ResNet Autoencoder

- Residual blocks in encoder-decoder
- **Use when**: Deep autoencoding with skip connections

**AEDCNNNetwork** - Dilated CNN Autoencoder

- Dilated convolutions for compression
- **Use when**: Need large receptive field in autoencoder

**AEDRNNNetwork** - Dilated RNN Autoencoder

- Dilated recurrent connections
- **Use when**: Sequential patterns with long-range dependencies

**AEBiGRUNetwork** - Bidirectional GRU

- Bidirectional recurrent encoding
- **Use when**: Context from both directions helpful

**AEAttentionBiGRUNetwork** - Attention + BiGRU

- Attention mechanism on BiGRU outputs
- **Use when**: Need to focus on important time steps

## Specialized Architectures

**LITENetwork** - Lightweight Inception-based Network

- Efficient inception-based architecture
- LITEMV variant for multivariate series
- **Use when**: Need efficiency with strong performance

**DeepARNetwork** - Probabilistic Forecasting

- Autoregressive RNN for forecasting
- Produces probabilistic predictions
- **Use when**: Need forecast uncertainty quantification

## Usage with Estimators

Networks are typically used within estimators, not directly:

```python
from aeon.classification.deep_learning import FCNClassifier
from aeon.regression.deep_learning import ResNetRegressor
from aeon.clustering.deep_learning import AEFCNClusterer

# Classification with FCN
clf = FCNClassifier(n_epochs=100, batch_size=16)
clf.fit(X_train, y_train)

# Regression with ResNet
reg = ResNetRegressor(n_epochs=100)
reg.fit(X_train, y_train)

# Clustering with autoencoder
clusterer = AEFCNClusterer(n_clusters=3, n_epochs=100)
labels = clusterer.fit_predict(X_train)
```
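
The network classes themselves live in `aeon.networks` and can be used directly when more control is needed. A minimal sketch, assuming the networks expose the `build_network(input_shape)` method used by aeon's deep estimators, which returns Keras input and output layers that still need a task-specific head:

```python
# Hedged sketch of direct network use; the head, shapes, and training loop
# are up to you and are not part of the network class.
import tensorflow as tf
from aeon.networks import FCNNetwork

network = FCNNetwork()
input_layer, output_layer = network.build_network(input_shape=(100, 1))
head = tf.keras.layers.Dense(2, activation="softmax")(output_layer)
model = tf.keras.Model(inputs=input_layer, outputs=head)
model.compile(optimizer="adam", loss="categorical_crossentropy")
```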

## Custom Network Configuration

Many networks accept configuration parameters (check each estimator's signature for the exact names):

```python
# Configure the FCN convolutional layers
clf = FCNClassifier(
    n_epochs=200,
    batch_size=32,
    kernel_size=[7, 5, 3],      # kernel size for each convolutional layer
    n_filters=[128, 256, 128],  # number of filters per layer
)
```
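
The learning rate is usually controlled through the optimizer rather than a dedicated parameter. A hedged sketch, assuming the estimator accepts a Keras optimizer instance via an `optimizer` argument (verify this in the estimator's signature):

```python
import tensorflow as tf

# Pass a custom optimizer to set the learning rate (assumes the estimator
# exposes an `optimizer` parameter, as most aeon deep classifiers do)
clf = FCNClassifier(
    n_epochs=200,
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
)
```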

## Base Classes

- `BaseDeepLearningNetwork` - Abstract base for all networks
- `BaseDeepRegressor` - Base for deep regression
- `BaseDeepClassifier` - Base for deep classification
- `BaseDeepForecaster` - Base for deep forecasting

Extend these to implement custom architectures.
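
A minimal sketch of a custom network. It assumes the convention used by aeon's built-in networks, where `build_network(input_shape)` returns the Keras input and output layers; the exact base-class contract and import path may differ between aeon versions, so consult `BaseDeepLearningNetwork` in your installation:

```python
# Hypothetical custom network following the build_network convention.
import tensorflow as tf
from aeon.networks.base import BaseDeepLearningNetwork


class MyCNNNetwork(BaseDeepLearningNetwork):
    """A small two-layer CNN backbone (illustrative only)."""

    def build_network(self, input_shape, **kwargs):
        input_layer = tf.keras.layers.Input(shape=input_shape)
        x = tf.keras.layers.Conv1D(64, 5, padding="same", activation="relu")(input_layer)
        x = tf.keras.layers.Conv1D(64, 3, padding="same", activation="relu")(x)
        output_layer = tf.keras.layers.GlobalAveragePooling1D()(x)
        return input_layer, output_layer
```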

## Training Considerations

### Hyperparameters

Key hyperparameters to tune:

- `n_epochs` - Training iterations (50-200 typical)
- `batch_size` - Samples per batch (16-64 typical)
- `learning_rate` - Optimizer step size (0.0001-0.01), usually set via the optimizer
- Network-specific: number of layers, filters, kernel sizes

### Callbacks

Many networks support callbacks for training monitoring:

```python
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

clf = FCNClassifier(
    n_epochs=200,
    callbacks=[
        EarlyStopping(patience=20, restore_best_weights=True),
        ReduceLROnPlateau(patience=10, factor=0.5)
    ]
)
```

### GPU Acceleration

Deep learning networks benefit from GPU acceleration:

```python
import os

from aeon.classification.deep_learning import InceptionTimeClassifier

os.environ['CUDA_VISIBLE_DEVICES'] = '0'  # Use first GPU

# Networks automatically use the GPU if one is available
clf = InceptionTimeClassifier(n_epochs=100)
clf.fit(X_train, y_train)
```
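
To confirm TensorFlow can actually see a GPU before starting a long training run:

```python
import tensorflow as tf

# Lists detected GPU devices; an empty list means training will run on CPU
print(tf.config.list_physical_devices("GPU"))
```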

## Architecture Selection

### By Task

- **Classification**: InceptionNetwork, ResNetNetwork, FCNNetwork
- **Regression**: InceptionNetwork, ResNetNetwork, TCNNetwork
- **Forecasting**: TCNNetwork, DeepARNetwork, RecurrentNetwork
- **Clustering**: AEFCNNetwork, AEResNetNetwork, AEAttentionBiGRUNetwork

### By Data Characteristics

- **Long sequences**: TCNNetwork, DCNNNetwork (dilated convolutions)
- **Short sequences**: MLPNetwork, FCNNetwork
- **Multivariate**: InceptionNetwork, FCNNetwork, LITENetwork
- **Variable length**: RecurrentNetwork with masking (see the padding sketch below)
- **Multi-scale patterns**: InceptionNetwork
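
Most aeon deep learning estimators expect equal-length series, so unequal-length collections are often padded first. A hedged sketch, assuming aeon's `Padder` collection transformer is available in your version:

```python
# Pad unequal-length series to a common length before fitting
# (assumes aeon provides the Padder collection transformer)
from aeon.transformations.collection import Padder

padder = Padder()  # pads to the longest series seen in fit by default
X_train_padded = padder.fit_transform(X_train)
X_test_padded = padder.transform(X_test)
```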

### By Computational Resources

- **Limited compute**: MLPNetwork, LITENetwork
- **Moderate compute**: FCNNetwork, TimeCNNNetwork
- **High compute available**: InceptionNetwork, ResNetNetwork
- **GPU available**: Any deep network (major speedup)

## Best Practices

### 1. Data Preparation

Normalize input data:

```python
from aeon.transformations.collection import Normalizer

normalizer = Normalizer()
X_train_norm = normalizer.fit_transform(X_train)
X_test_norm = normalizer.transform(X_test)
```

### 2. Training/Validation Split

Hold out a validation set to monitor generalization:

```python
from sklearn.model_selection import train_test_split

X_train_fit, X_val, y_train_fit, y_val = train_test_split(
    X_train, y_train, test_size=0.2, stratify=y_train
)

clf = FCNClassifier(n_epochs=200)
clf.fit(X_train_fit, y_train_fit)
val_accuracy = clf.score(X_val, y_val)  # evaluate on the held-out set
```

### 3. Start Simple

Begin with simpler architectures before complex ones:

1. Try MLPNetwork or FCNNetwork first
2. If insufficient, try ResNetNetwork or InceptionNetwork
3. Consider ensembles if single models are insufficient

### 4. Hyperparameter Tuning

Use grid search or random search (aeon estimators follow the scikit-learn interface):

```python
from sklearn.model_selection import GridSearchCV

param_grid = {
    'n_epochs': [100, 200],
    'batch_size': [16, 32]
}

clf = FCNClassifier()
grid = GridSearchCV(clf, param_grid, cv=3)
grid.fit(X_train, y_train)
```

### 5. Regularization

Prevent overfitting:

- Use dropout (if the network supports it)
- Early stopping
- Data augmentation (if available)
- Reduce model complexity

### 6. Reproducibility

Set random seeds:

```python
import numpy as np
import random
import tensorflow as tf

seed = 42
np.random.seed(seed)
random.seed(seed)
tf.random.set_seed(seed)
```
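
Many aeon deep learning estimators also accept a `random_state` parameter, which seeds the estimator itself; a brief sketch (assuming the estimator exposes `random_state`, check its signature):

```python
# Fix the estimator-level seed as well for reproducible training runs
clf = FCNClassifier(n_epochs=200, random_state=seed)
```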