# Anomaly Detection

Aeon provides anomaly detection methods for identifying unusual patterns in time series, both within a single series and across a collection of series.
## Collection Anomaly Detectors

Detect anomalous time series within a collection:

- `ClassificationAdapter` - Adapts a classifier for anomaly detection
  - Trains on series labeled as normal or anomalous and uses the classifier's predictions to score new series
  - **Use when**: You have labeled examples of normal and anomalous series and want a supervised, classification-based approach

- `OutlierDetectionAdapter` - Wraps scikit-learn outlier detectors
  - Works with `IsolationForest`, `LocalOutlierFactor`, and `OneClassSVM`
  - **Use when**: You want to apply scikit-learn anomaly detectors to collections
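The adapters above treat each whole series as one sample for an outlier detector. Below is a minimal NumPy sketch of that idea. It assumes equal-length univariate series and scores each series by its mean Euclidean distance to its `k` nearest neighbors in the collection. `knn_collection_scores` is a hypothetical helper, and the k-NN scorer stands in for whichever scikit-learn or PyOD detector an adapter would wrap. It is not aeon's implementation.

```python
import numpy as np


def knn_collection_scores(X, k=3):
    """Hypothetical helper: mean distance from each series to its k nearest
    neighbors in the collection. X has shape (n_series, n_timepoints)."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances between whole series
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Exclude each series' zero distance to itself
    np.fill_diagonal(dists, np.inf)
    # Average over the k closest other series; higher means more anomalous
    return np.sort(dists, axis=1)[:, :k].mean(axis=1)


rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 50)
normal = np.sin(t) + 0.05 * rng.standard_normal((9, 50))  # nine noisy sine waves
odd = np.sign(np.sin(t))[None, :]  # one square wave as the odd one out
X = np.vstack([normal, odd])
scores = knn_collection_scores(X, k=3)
most_anomalous = int(np.argmax(scores))
```

The square wave sits far from every sine wave, so it receives the largest score.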
## Series Anomaly Detectors

Detect anomalous points or subsequences within a single time series.
### Distance-Based Methods

Use distance or similarity measures to identify anomalies:

- `CBLOF` - Cluster-Based Local Outlier Factor
  - Splits the clusters into large and small ones and scores points by their distance to a large cluster
  - **Use when**: Anomalies form small, sparse clusters or lie far from the main clusters

- `KMeansAD` - K-means based anomaly detection
  - Clusters sliding windows with k-means; the distance to the nearest cluster center is the anomaly score
  - **Use when**: Normal patterns cluster well

- `LeftSTAMPi` - Left STAMP incremental
  - Incrementally updates the left matrix profile, which compares each subsequence only with earlier ones
  - **Use when**: Streaming data that needs online detection

- `STOMP` - Scalable Time series Ordered-search Matrix Profile
  - Computes the matrix profile to find subsequence anomalies
  - **Use when**: Offline discord discovery

- `MERLIN` - Discord discovery across a range of subsequence lengths
  - Searches for discords of every length between a minimum and a maximum, so the exact anomaly length need not be known
  - **Use when**: The anomaly length is unknown, or the time series is large

- `LOF` - Local Outlier Factor adapted for time series
  - Density-based outlier detection
  - **Use when**: Anomalies lie in low-density regions

- `ROCKAD` - ROCKET-based semi-supervised detection
  - Scores ROCKET features with an ensemble of k-nearest-neighbor detectors
  - **Use when**: You have anomaly-free training data and want a feature-based approach
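Here is a minimal NumPy sketch of the window-clustering idea behind `KMeansAD`. It works in four steps:

1. Slide a window over the series.
2. Cluster the windows with plain k-means.
3. Score each window by its distance to the nearest center.
4. Average the window scores back onto the time points each window covers.

The function names and defaults are hypothetical. This illustrates the technique and is not aeon's implementation.

```python
import numpy as np


def sliding_windows(y, window_size):
    """Stack every length-window_size subsequence of y (stride 1)."""
    return np.stack([y[i:i + window_size] for i in range(len(y) - window_size + 1)])


def kmeans_centers(X, n_clusters, n_iter=50, seed=0):
    """Plain Lloyd's k-means on the rows of X; returns the cluster centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), n_clusters, replace=False)]  # a copy, not a view
    for _ in range(n_iter):
        # Assign each window to its nearest center
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its windows; keep it if the cluster is empty
        for c in range(n_clusters):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return centers


def kmeans_window_scores(y, window_size=10, n_clusters=5):
    """Distance from each window to its nearest center, mapped back to points."""
    y = np.asarray(y, dtype=float)
    windows = sliding_windows(y, window_size)
    centers = kmeans_centers(windows, n_clusters)
    window_scores = np.sqrt(
        ((windows[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    ).min(axis=1)
    # Reverse windowing: each point gets the mean score of the windows covering it
    totals = np.zeros(len(y))
    counts = np.zeros(len(y))
    for i, score in enumerate(window_scores):
        totals[i:i + window_size] += score
        counts[i:i + window_size] += 1
    return totals / counts


y = np.sin(np.linspace(0, 20 * np.pi, 400))
y[200] += 4.0  # inject a point anomaly
scores = kmeans_window_scores(y, window_size=10, n_clusters=5)
peak = int(np.argmax(scores))
```

Averaging window scores back onto points is one simple choice. Taking the maximum over covering windows would localize spikes more aggressively.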
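The matrix profile behind `STOMP` and `LeftSTAMPi` can be illustrated with a deliberately naive NumPy sketch. It runs in quadratic time, unlike the optimized algorithms.

- For each length-`m` subsequence, the sketch records the z-normalized Euclidean distance to its nearest non-trivial match.
- The subsequence with the largest value is the top discord.
- The `left_only` flag is an illustrative parameter that restricts matches to earlier subsequences. This gives the left matrix profile that incremental detectors maintain.
- The exclusion zone of `m // 2` is an assumption; implementations choose different widths.

The helper name is hypothetical.

```python
import numpy as np


def matrix_profile(y, m, left_only=False):
    """Naive matrix profile: z-normalized distance from each length-m
    subsequence to its nearest non-trivial match (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    n = len(y) - m + 1
    subs = np.stack([y[i:i + m] for i in range(n)])
    # z-normalize each subsequence; flat subsequences keep std 1 to avoid 0/0
    std = subs.std(axis=1, keepdims=True)
    z = (subs - subs.mean(axis=1, keepdims=True)) / np.where(std == 0, 1.0, std)
    excl = m // 2  # exclusion zone that skips trivial self-matches
    mp = np.full(n, np.inf)
    for i in range(n):
        d = np.sqrt(((z - z[i]) ** 2).sum(axis=1))
        d[max(0, i - excl):i + excl + 1] = np.inf
        if left_only:
            d[i:] = np.inf  # only compare with subsequences that started earlier
        mp[i] = d.min()
    return mp


rng = np.random.default_rng(1)
t = np.arange(600)
y = np.sin(2 * np.pi * t / 50) + 0.05 * rng.standard_normal(600)
y[300:320] = 0.0  # flatten part of one cycle: a subsequence anomaly
mp = matrix_profile(y, m=50)
discord_start = int(np.argmax(mp))  # start index of the top discord
left_mp = matrix_profile(y, m=50, left_only=True)  # inf where no earlier match exists
```

The first subsequences have no earlier match, so the left profile starts with infinite values. An online detector has to warm up over that region before its scores are meaningful.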
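### Distribution-Based Methods

Model the statistical distribution of the data:

- `COPOD` - Copula-Based Outlier Detection
  - Estimates tail probabilities from the empirical marginal distributions and combines them through an empirical copula
  - **Use when**: Multi-dimensional time series with complex dependencies

- `DWT_MLEAD` - Discrete Wavelet Transform with Maximum Likelihood Estimation Anomaly Detection
  - Decomposes the series into frequency bands with the discrete wavelet transform and flags coefficients that are unlikely under Gaussian models fitted by maximum likelihood
  - **Use when**: Anomalies appear at specific frequencies or scales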
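Here is a simplified NumPy sketch of the COPOD scoring idea. It assumes a data matrix with one row per observation; for a single series, the rows could be sliding windows.

- Empirical CDFs give the left- and right-tail probabilities in each dimension.
- The score sums their negative logarithms across dimensions and keeps the more extreme tail.

The full method also includes a skewness-corrected term, which is omitted here. `copod_scores` is a hypothetical name, not aeon's API.

```python
import numpy as np


def copod_scores(X):
    """Simplified COPOD-style scores for X of shape (n_samples, n_dims).

    Higher scores mean the row sits further in the joint tails."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    left = np.empty_like(X)
    right = np.empty_like(X)
    for d in range(X.shape[1]):
        col = X[:, d]
        sorted_col = np.sort(col)
        # Empirical P(X_d <= x) and P(X_d >= x); both are at least 1/n
        left[:, d] = np.searchsorted(sorted_col, col, side="right") / n
        right[:, d] = (n - np.searchsorted(sorted_col, col, side="left")) / n
    # Sum of negative log tail probabilities, taking the more extreme tail
    left_tail = -np.log(left).sum(axis=1)
    right_tail = -np.log(right).sum(axis=1)
    return np.maximum(left_tail, right_tail)


rng = np.random.default_rng(2)
X = rng.standard_normal((200, 3))
X[0] = [6.0, 6.0, 6.0]  # extreme in every dimension
scores = copod_scores(X)
top = int(np.argmax(scores))
```

Working on ranks keeps the sketch free of distributional parameters, which is the main appeal of the copula approach.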
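The wavelet-plus-likelihood idea behind `DWT_MLEAD` can be sketched in NumPy with a Haar transform. At each level, the sketch:

1. Fits a Gaussian to the detail coefficients by maximum likelihood.
2. Flags the least likely coefficients.
3. Gives one vote to every time point a flagged coefficient covers.

This is a simplified illustration under stated assumptions: a Haar wavelet and one univariate Gaussian per level. The published method is more involved. The helper names are hypothetical.

```python
import numpy as np


def haar_details(y, levels):
    """Multi-level Haar DWT; returns detail coefficients for levels 1..levels."""
    approx = np.asarray(y, dtype=float)
    details = []
    for _ in range(levels):
        if len(approx) % 2:  # pad odd lengths by repeating the last value
            approx = np.append(approx, approx[-1])
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2))  # high-frequency band
        approx = (even + odd) / np.sqrt(2)  # low-frequency band for the next level
    return details


def dwt_gaussian_votes(y, levels=3, quantile=0.01):
    """Per-point votes from coefficients that are unlikely under a Gaussian
    fitted by maximum likelihood at each level (hypothetical helper)."""
    n = len(y)
    votes = np.zeros(n)
    for level, d in enumerate(haar_details(y, levels), start=1):
        mu, sigma = d.mean(), d.std() + 1e-12  # Gaussian MLE for this level
        loglik = -0.5 * ((d - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))
        flagged = np.flatnonzero(loglik < np.quantile(loglik, quantile))
        span = 2 ** level  # a level-j coefficient covers 2**j time points
        for k in flagged:
            votes[k * span:min((k + 1) * span, n)] += 1
    return votes


rng = np.random.default_rng(3)
t = np.arange(512)
y = np.sin(2 * np.pi * t / 64) + 0.01 * rng.standard_normal(512)
y[272] += 5.0  # spike near a peak of the sine
votes = dwt_gaussian_votes(y, levels=3)
```

A spike disturbs the detail coefficients at every level it touches, so it collects votes from all three levels. Isolated noise rarely does.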
### Isolation, Boundary, and Neighbor-Based Methods

Separate anomalies from normal data by isolating them, learning a boundary around normal data, or comparing neighbor distances:

- `IsolationForest` - Ensemble of random isolation trees
  - Anomalies are isolated with fewer random splits than normal points
  - **Use when**: High-dimensional data, no assumptions about the distribution

- `OneClassSVM` - Support vector machine for novelty detection
  - Learns a boundary around normal data
  - **Use when**: The normal region is well defined and you need a robust boundary

- `STRAY` - Search TRace AnomalY
  - Flags points whose k-nearest-neighbor distances are unusually large; designed for data streams whose distribution changes over time
  - **Use when**: Streaming data, distribution shifts
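The isolation principle behind `IsolationForest` can be sketched in NumPy for one-dimensional values:

- Repeatedly split at a uniform random point between the current minimum and maximum.
- Follow the side that contains the query value.
- Count the splits needed to isolate it.

Averaging over random paths on random subsamples and normalizing by the expected depth `c(n)` gives a score in (0, 1]. Values near 1 are easy to isolate. For brevity, the sketch draws fresh random splits along each query's path instead of storing trees; per point, this gives the same path-length distribution. The function names and defaults are hypothetical.

```python
import numpy as np


def c_factor(n):
    """Expected path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    harmonic = np.sum(1.0 / np.arange(1, n))  # H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def isolation_depth(x, data, rng, depth=0, max_depth=20):
    """Random splits needed to separate x from the rest of `data`."""
    if len(data) <= 1 or depth >= max_depth or data.min() == data.max():
        # Unresolved points get the expected remaining depth, as in isolation forests
        return depth + c_factor(len(data))
    split = rng.uniform(data.min(), data.max())
    side = data[data < split] if x < split else data[data >= split]
    return isolation_depth(x, side, rng, depth + 1, max_depth)


def isolation_scores(values, n_trees=100, sample_size=64, seed=0):
    """Score in (0, 1]; values close to 1 are isolated after few splits."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    psi = min(sample_size, len(values))
    mean_depth = np.zeros(len(values))
    for _ in range(n_trees):
        sample = rng.choice(values, psi, replace=False)  # subsample per "tree"
        mean_depth += [isolation_depth(v, sample, rng) for v in values]
    mean_depth /= n_trees
    return 2.0 ** (-mean_depth / c_factor(psi))


rng = np.random.default_rng(4)
values = np.append(rng.standard_normal(100), 20.0)  # last value is the outlier
scores = isolation_scores(values)
```

Subsampling, here 64 points per tree, is part of the method: it keeps trees shallow and reduces masking by clusters of anomalies.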
### External Library Integration

- `PyODAdapter` - Bridges the PyOD library to aeon
  - Gives access to the 40+ anomaly detectors in PyOD (requires `pyod` to be installed)
  - **Use when**: You need a specific PyOD algorithm
## Quick Start

```python
import numpy as np
from aeon.anomaly_detection import STOMP  # import path may differ between aeon versions

# Create a sine wave with a single spike at index 100
y = np.concatenate([
    np.sin(np.linspace(0, 10, 100)),
    [5.0],  # anomaly spike
    np.sin(np.linspace(10, 20, 100)),
])

# Detect anomalies; window_size is the subsequence length compared by the matrix profile
# (STOMP may require the optional stumpy package)
detector = STOMP(window_size=10)
anomaly_scores = detector.fit_predict(y)

# Higher scores indicate more anomalous points; flag the top 5%
threshold = np.percentile(anomaly_scores, 95)
anomalies = anomaly_scores > threshold
anomaly_indices = np.flatnonzero(anomalies)
```
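Points flagged by a percentile threshold often form contiguous runs, and reporting them as ranges is usually easier to read. The small NumPy sketch below converts a boolean mask into `(start, end)` pairs, with `end` exclusive. `mask_to_ranges` is a hypothetical helper, not part of aeon.

```python
import numpy as np


def mask_to_ranges(mask):
    """Convert a boolean mask into (start, end) index pairs, end exclusive."""
    mask = np.asarray(mask, dtype=bool)
    # Pad with False so every run of True has a rising and a falling edge
    padded = np.concatenate([[False], mask, [False]]).astype(int)
    edges = np.flatnonzero(np.diff(padded))
    return list(zip(edges[0::2].tolist(), edges[1::2].tolist()))


mask = np.array([False, True, True, False, False, True, False, True])
print(mask_to_ranges(mask))  # [(1, 3), (5, 6), (7, 8)]
```

Applied to the quick-start mask, `mask_to_ranges(anomalies)` would list the flagged regions as index ranges.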
## Point vs Subsequence Anomalies

- **Point anomalies**: Single unusual values
  - Use: COPOD, DWT_MLEAD, IsolationForest

- **Subsequence anomalies** (discords): Unusual patterns
  - Use: STOMP, LeftSTAMPi, MERLIN

- **Collective anomalies**: Groups of points that together form an unusual pattern
  - Use: Matrix profile methods, clustering-based methods
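## Evaluation Metrics

Specialized metrics for anomaly detection:

```python
import numpy as np
from aeon.benchmarking.metrics.anomaly_detection import (
    range_f_score,
    range_precision,
    range_recall,
    roc_auc_score,
)

# Illustrative labels: 1 marks an anomalous time point
y_true = np.array([0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0])
y_pred = np.array([0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0])
# Continuous anomaly scores (e.g. from fit_predict) for threshold-free metrics
y_scores = np.array([0.1, 0.2, 0.6, 0.9, 0.8, 0.5, 0.1, 0.2, 0.4, 0.7, 0.1, 0.0])

# Range-based metrics score overlap with whole anomalous ranges, not single points.
# alpha weights the existence reward (hitting a range at all), which matters for recall
precision = range_precision(y_true, y_pred)
recall = range_recall(y_true, y_pred, alpha=0.5)
f1 = range_f_score(y_true, y_pred)  # combines range precision and recall

# Point-wise ROC AUC computed from the raw scores
auc = roc_auc_score(y_true, y_scores)
```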
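To show what range-based metrics reward, here is a simplified NumPy sketch of range-based recall.

- Each true anomalous range earns `alpha` for being detected at all (the existence reward).
- It also earns `1 - alpha` times the fraction of its points that were predicted (the overlap reward).
- Precision is the same computation over the predicted ranges, with `alpha = 0`.

The sketch assumes a flat positional bias and a cardinality factor of 1, so it will not match aeon's functions under other settings. The function names are hypothetical.

```python
import numpy as np


def _ranges(labels):
    """(start, end) pairs, end exclusive, for each run of 1s in a binary array."""
    padded = np.concatenate([[0], np.asarray(labels, dtype=int), [0]])
    edges = np.flatnonzero(np.diff(padded))
    return list(zip(edges[0::2], edges[1::2]))


def range_recall_sketch(y_true, y_pred, alpha=0.0):
    """Mean over true ranges of alpha * existence + (1 - alpha) * overlap."""
    true_ranges = _ranges(y_true)
    y_pred = np.asarray(y_pred, dtype=bool)
    if not true_ranges:
        return 0.0
    total = 0.0
    for start, end in true_ranges:
        overlap = y_pred[start:end].sum() / (end - start)  # flat bias
        existence = 1.0 if overlap > 0 else 0.0
        total += alpha * existence + (1 - alpha) * overlap
    return total / len(true_ranges)


def range_precision_sketch(y_true, y_pred):
    """Same computation over predicted ranges, with no existence reward."""
    return range_recall_sketch(y_pred, y_true, alpha=0.0)


y_true = np.array([0, 1, 1, 1, 1, 0, 0, 1, 1, 0])
y_pred = np.array([0, 0, 1, 1, 0, 0, 0, 0, 0, 0])
recall = range_recall_sketch(y_true, y_pred, alpha=0.5)  # (0.75 + 0.0) / 2 = 0.375
precision = range_precision_sketch(y_true, y_pred)  # the predicted range lies inside a true one: 1.0
```

Point-wise recall would be 2/6 here. The range-based version instead credits the first range for being found at all and ignores how many points each range contains.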
## Algorithm Selection

- **Speed priority**: KMeansAD, IsolationForest
- **Accuracy priority**: STOMP, COPOD
- **Streaming data**: LeftSTAMPi, STRAY
- **Discord discovery**: STOMP, MERLIN
- **Multi-dimensional**: COPOD, PyODAdapter
- **Semi-supervised**: ROCKAD, OneClassSVM
- **No training data**: IsolationForest, STOMP
## Best Practices

1. **Normalize data**: Many methods are sensitive to scale
2. **Choose the window size carefully**: For matrix profile methods the window size is critical; set it close to the expected length of the anomalous pattern
3. **Set a threshold**: Use percentile-based or domain-specific thresholds
4. **Validate results**: Visualize detections to check that they are meaningful
5. **Handle seasonality**: Detrend or deseasonalize the series before detection
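Practices 1, 3, and 5 can be combined in a short NumPy sketch:

1. z-normalize the series.
2. Subtract a per-phase seasonal profile.
3. Threshold the absolute residual at a percentile.

The season length `period` is an assumed input that you would choose from domain knowledge. `preprocess` and `percentile_flags` are hypothetical helpers. The mean seasonal profile is a simple stand-in for a proper detrending or decomposition method.

```python
import numpy as np


def preprocess(y, period):
    """z-normalize y, then subtract the mean value at each seasonal phase."""
    y = np.asarray(y, dtype=float)
    std = y.std()
    z = (y - y.mean()) / (std if std > 0 else 1.0)
    phase = np.arange(len(z)) % period
    # Seasonal profile: average of all points that share the same phase
    profile = np.array([z[phase == p].mean() for p in range(period)])
    return z - profile[phase]


def percentile_flags(scores, q=99.5):
    """Flag points whose score exceeds the q-th percentile."""
    return scores > np.percentile(scores, q)


rng = np.random.default_rng(5)
t = np.arange(480)
y = 10 + 3 * np.sin(2 * np.pi * t / 24) + 0.1 * rng.standard_normal(480)
y[114] += 2.0  # bump at a seasonal trough: unusual, but within the raw value range
residual = preprocess(y, period=24)
flags = percentile_flags(np.abs(residual))
```

A percentile threshold on the raw values would miss this bump because it stays below the seasonal peaks. After deseasonalizing, it becomes the largest residual.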