Anomaly Detection
aeon provides anomaly detection methods at two levels. Series detectors find anomalous points or subsequences within a single time series. Collection detectors find whole anomalous series within a collection.
Collection Anomaly Detectors
Detect anomalous time series within a collection:
- ClassificationAdapter: Adapts a classifier for anomaly detection
  - Trains on series labeled normal or anomalous, then flags anomalous series during prediction
  - Use when: Labeled normal and anomalous series are available, want a classification-based approach
- OutlierDetectionAdapter: Wraps scikit-learn outlier detectors
  - Works with IsolationForest, LOF (LocalOutlierFactor), OneClassSVM
  - Use when: Want to apply sklearn anomaly detectors to collections
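Collection detectors score whole series rather than individual points. The NumPy sketch below illustrates that idea with a simple nearest-neighbor rule: each series gets a score equal to its Euclidean distance from the closest other series in the collection. It is not how ClassificationAdapter or OutlierDetectionAdapter work internally. The function name `nearest_neighbour_scores` and the toy data are hypothetical.

```python
import numpy as np


def nearest_neighbour_scores(X: np.ndarray) -> np.ndarray:
    """Score each series in X, shape (n_cases, n_timepoints), by its
    Euclidean distance to the closest other series in the collection."""
    # Pairwise distances via broadcasting: (n, 1, m) - (1, n, m)
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # A series is always at distance 0 from itself, so exclude the diagonal
    np.fill_diagonal(dists, np.inf)
    return dists.min(axis=1)


rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 50)
# Ten noisy sine waves (normal) plus one square wave (anomalous)
normal = np.sin(t) + 0.1 * rng.standard_normal((10, 50))
square = np.sign(np.sin(t))[None, :]
X = np.vstack([normal, square])

scores = nearest_neighbour_scores(X)
print(scores.argmax())  # → 10 (the square wave)
```

Brute-force pairwise distances are fine for small collections. For large ones, a detector that scales better, such as an sklearn outlier model wrapped by OutlierDetectionAdapter, is the practical choice.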
Series Anomaly Detectors
Detect anomalous points or subsequences within a single time series.
Distance-Based Methods
Use similarity metrics to identify anomalies:
- CBLOF: Cluster-Based Local Outlier Factor
  - Clusters the data, then scores points using cluster size and distance to the nearest large cluster
  - Use when: Anomalies form small, sparse clusters
- KMeansAD: K-means based anomaly detection
  - Clusters sliding windows; distance to the nearest cluster center is the anomaly score
  - Use when: Normal patterns cluster well
- LeftSTAMPi: Left STAMP, incremental
  - Incrementally updated (left) matrix profile for online anomaly detection
  - Use when: Streaming data, need online detection
- STOMP: Scalable Time series Ordered-search Matrix Profile
  - Computes the matrix profile to find subsequence anomalies
  - Use when: Discord discovery, motif detection
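- MERLIN: Discord discovery across subsequence lengths
  - Searches a range of subsequence lengths for discords without computing a full matrix profile for each length
  - Use when: Large time series, need scalability, anomaly length not known in advance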
- LOF: Local Outlier Factor adapted for time series
  - Density-based outlier detection
  - Use when: Anomalies lie in low-density regions
- ROCKAD: ROCKET-based semi-supervised detection
  - Scores windows by nearest-neighbor distance in ROCKET feature space
  - Use when: Anomaly-free training data is available, want a feature-based approach
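The KMeansAD idea can be sketched in plain NumPy. Slice the series into sliding windows, cluster them with a few rounds of Lloyd's k-means, and score each window by its distance to the nearest centroid. This is a minimal illustration under assumed settings (window size 10, k = 8, deterministic initialization from evenly spaced windows), not aeon's implementation. `kmeans_window_scores` is a hypothetical name.

```python
import numpy as np


def kmeans_window_scores(y, window_size=10, k=8, n_iter=20):
    """Score each sliding window by its distance to the nearest k-means centroid."""
    # All overlapping windows, shape (n_windows, window_size)
    windows = np.lib.stride_tricks.sliding_window_view(y, window_size)
    # Deterministic initialization: k evenly spaced windows become the centroids
    init = np.linspace(0, len(windows) - 1, k).astype(int)
    centroids = windows[init].astype(float)
    for _ in range(n_iter):
        # Assignment step: nearest centroid for every window
        dists = np.linalg.norm(windows[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Update step: move each non-empty cluster's centroid to its mean
        for j in range(k):
            members = windows[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    dists = np.linalg.norm(windows[:, None, :] - centroids[None, :, :], axis=-1)
    return dists.min(axis=1)


# A sine wave with a spike at index 100
y = np.concatenate([
    np.sin(np.linspace(0, 10, 100)),
    [5.0],
    np.sin(np.linspace(10, 20, 100)),
])
window_scores = kmeans_window_scores(y)
print(window_scores.argmax())  # start index of a window containing the spike (91 to 100)
```

Windows containing the spike sit far from every centroid, because no cluster is built around them. Regular sine windows, in contrast, lie close to the centroid of their phase.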
Distribution-Based Methods
Analyze statistical distributions:
- COPOD: Copula-Based Outlier Detection
  - Models marginal and joint distributions
  - Use when: Multi-dimensional time series, complex dependencies
- DWT_MLEAD: Discrete Wavelet Transform with Maximum Likelihood Estimation for Anomaly Detection
  - Decomposes the series into frequency bands (wavelet levels) and flags unlikely values at each level
  - Use when: Anomalies appear at specific frequencies
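A simplified sketch of COPOD's scoring rule makes the copula idea concrete. For each dimension, the empirical left-tail and right-tail probabilities of an observation are converted to negative log-probabilities and summed, and the larger of the two sums is the outlier score. The full method's skewness-corrected term is omitted here. `copod_like_scores` is a hypothetical helper, not aeon's or PyOD's code.

```python
import numpy as np


def copod_like_scores(X: np.ndarray) -> np.ndarray:
    """Simplified COPOD-style scores for X of shape (n_samples, n_dims)."""
    n = X.shape[0]
    # Empirical CDF via ranks: rank 1 is the smallest value in each column
    ranks = X.argsort(axis=0).argsort(axis=0) + 1
    left_tail = ranks / n             # empirical P(X_d <= x_d)
    right_tail = (n - ranks + 1) / n  # empirical P(X_d >= x_d)
    # Sum negative log tail probabilities across dimensions
    left_score = -np.log(left_tail).sum(axis=1)
    right_score = -np.log(right_tail).sum(axis=1)
    return np.maximum(left_score, right_score)


rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3))
X[50] = [6.0, 6.0, 6.0]  # one observation in the extreme upper tail of every dimension
scores = copod_like_scores(X)
print(scores.argmax())  # → 50
```

Because the score is built only from per-dimension empirical CDFs, it needs no distance computations or tuning parameters. This is the main appeal of the copula approach.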
Isolation-Based Methods
Use isolation principles:
- IsolationForest: Isolation with an ensemble of random trees
  - Anomalies are isolated with fewer random splits than normal points
  - Use when: High-dimensional data, no assumptions about the distribution
- OneClassSVM: Support vector machine for novelty detection
  - Learns a boundary around normal data
  - Use when: Well-defined normal region, need a robust boundary
- STRAY: Search and TRace AnomalY
  - Robust anomaly detection in data streams whose distribution changes over time (concept drift)
  - Use when: Streaming data, distribution shifts
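The isolation principle can be illustrated with a small NumPy isolation forest. Random axis-parallel splits isolate far-away points after fewer cuts, and the average path length E[h(x)] is mapped to a score with the usual normalization 2^(-E[h(x)]/c(ψ)), where ψ is the subsample size. To stay short, this sketch draws the random splits lazily along each point's own path instead of building and storing shared trees. That yields the same path-length distribution per point but is slower. All names are hypothetical, and this is not the scikit-learn or aeon implementation.

```python
import numpy as np


def c_factor(n: int) -> float:
    """Average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (np.log(n - 1) + np.euler_gamma) - 2.0 * (n - 1) / n


def path_length(x, sample, rng, depth, max_depth):
    """Depth at which x is isolated by random axis-parallel splits of sample."""
    if len(sample) <= 1 or depth >= max_depth:
        # Unresolved leaves get the expected remaining depth c(n)
        return depth + c_factor(len(sample))
    feature = rng.integers(sample.shape[1])
    lo, hi = sample[:, feature].min(), sample[:, feature].max()
    if lo == hi:
        return depth + c_factor(len(sample))
    split = rng.uniform(lo, hi)
    # Keep only the side of the split that x falls on
    if x[feature] < split:
        sample = sample[sample[:, feature] < split]
    else:
        sample = sample[sample[:, feature] >= split]
    return path_length(x, sample, rng, depth + 1, max_depth)


def isolation_scores(X, n_trees=100, sample_size=64, seed=0):
    """Scores in (0, 1]; values close to 1 suggest anomalies."""
    rng = np.random.default_rng(seed)
    psi = min(sample_size, len(X))
    max_depth = int(np.ceil(np.log2(psi)))
    scores = np.empty(len(X))
    for i, x in enumerate(X):
        depths = [
            path_length(x, X[rng.choice(len(X), psi, replace=False)], rng, 0, max_depth)
            for _ in range(n_trees)
        ]
        # s(x) = 2^(-E[h(x)] / c(psi))
        scores[i] = 2.0 ** (-np.mean(depths) / c_factor(psi))
    return scores


data_rng = np.random.default_rng(42)
X = data_rng.standard_normal((100, 2))
X[0] = [6.0, 6.0]  # one point far from the rest
scores = isolation_scores(X)
print(scores.argmax())  # → 0
```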
External Library Integration
- PyODAdapter: Bridges the PyOD library to aeon
  - Gives access to 40+ PyOD anomaly detectors
  - Use when: Need a specific PyOD algorithm
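Window-based adapters such as PyODAdapter typically apply a detector to sliding windows of a series and then map window scores back to time points. The sketch below shows one common way to do that for any scorer that takes a 2D array of windows. It slices the series with a given stride, scores the windows, and gives each time point the mean score of the windows covering it. `windowed_point_scores` and the stand-in scorer `distance_to_median_window` are hypothetical. A fitted PyOD model's scoring method could take the scorer's place, but this is not PyODAdapter's code.

```python
from typing import Callable

import numpy as np


def windowed_point_scores(
    y: np.ndarray,
    window_size: int,
    score_windows: Callable[[np.ndarray], np.ndarray],
    stride: int = 1,
) -> np.ndarray:
    """Turn a detector that scores 2D window arrays into per-point scores."""
    starts = np.arange(0, len(y) - window_size + 1, stride)
    windows = np.stack([y[s:s + window_size] for s in starts])
    window_scores = score_windows(windows)  # one score per window
    # Reverse windowing: each point gets the mean score of the windows covering it
    totals = np.zeros(len(y))
    counts = np.zeros(len(y))
    for s, score in zip(starts, window_scores):
        totals[s:s + window_size] += score
        counts[s:s + window_size] += 1
    # Points that no window covers (possible when stride > 1) are left as NaN
    with np.errstate(invalid="ignore", divide="ignore"):
        return totals / counts


def distance_to_median_window(windows: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a fitted outlier model's scoring method."""
    return np.linalg.norm(windows - np.median(windows, axis=0), axis=1)


y = np.sin(np.linspace(0, 8 * np.pi, 200))
y[120] = 5.0  # injected spike
scores = windowed_point_scores(y, window_size=10, score_windows=distance_to_median_window)
print(scores.argmax())  # → 120
```

A larger stride scores fewer windows and runs faster, at the cost of coarser point scores. It can also leave trailing points unscored.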
Quick Start
```python
import numpy as np
from aeon.anomaly_detection import STOMP  # STOMP needs the optional stumpy package

# Create a sine wave with a single spike anomaly at index 100
y = np.concatenate([
    np.sin(np.linspace(0, 10, 100)),
    [5.0],  # anomaly spike
    np.sin(np.linspace(10, 20, 100)),
])

# Detect anomalies: fit_predict returns one anomaly score per time point
detector = STOMP(window_size=10)
anomaly_scores = detector.fit_predict(y)

# Higher scores indicate more anomalous points; flag roughly the top 5%
threshold = np.percentile(anomaly_scores, 95)
anomalies = anomaly_scores > threshold
```
Point vs Subsequence Anomalies
- Point anomalies: Single unusual values
  - Use: COPOD, DWT_MLEAD, IsolationForest
- Subsequence anomalies (discords): Unusual patterns
  - Use: STOMP, LeftSTAMPi, MERLIN
- Collective anomalies: Groups of points that together form an unusual pattern
  - Use: Matrix profile methods, clustering-based methods
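The discords that STOMP, LeftSTAMPi, and MERLIN search for can be found with a brute-force matrix profile. Z-normalize every length-m subsequence, record the distance to its nearest neighbor outside an exclusion zone around itself, and take the subsequence with the largest such distance as the top discord. This O(n²·m) sketch is for illustration only; STOMP computes the same kind of profile far more efficiently. The exclusion-zone half-width of m/2 and the name `naive_matrix_profile` are assumptions.

```python
import numpy as np


def naive_matrix_profile(y: np.ndarray, m: int) -> np.ndarray:
    """Z-normalized distance from each length-m subsequence to its nearest
    non-overlapping neighbor (brute force, O(n^2 * m))."""
    subs = np.lib.stride_tricks.sliding_window_view(y, m)
    mu = subs.mean(axis=1, keepdims=True)
    sigma = subs.std(axis=1, keepdims=True)
    z = (subs - mu) / np.where(sigma == 0, 1.0, sigma)  # guard flat windows
    n = len(z)
    excl = m // 2  # exclusion-zone half-width to skip trivial self-matches
    profile = np.empty(n)
    for i in range(n):
        d = np.linalg.norm(z - z[i], axis=1)
        d[max(0, i - excl):min(n, i + excl + 1)] = np.inf
        profile[i] = d.min()
    return profile


t = np.arange(400)
y = np.sin(2 * np.pi * t / 40)  # period of exactly 40 samples
y[250] += 3.0                   # the anomaly breaks one cycle
m = 20
profile = naive_matrix_profile(y, m)
discord = int(profile.argmax())
print(discord)  # start of a subsequence containing index 250 (231 to 250)
```

Normal subsequences have an exact repeat one period away, so their profile values are near zero. Only subsequences overlapping the spike lack a close match.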
Evaluation Metrics
Specialized metrics for anomaly detection:
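```python
import numpy as np
from aeon.benchmarking.metrics.anomaly_detection import (
    range_f_score, range_precision, range_recall, roc_auc_score,
)

# Toy example: 1 marks an anomalous time point
y_true = np.array([0, 0, 1, 1, 1, 1, 0, 0, 0, 0])
y_pred = np.array([0, 0, 0, 0, 1, 1, 1, 1, 0, 0])  # binary detections
y_score = np.array([.1, .1, .3, .4, .8, .9, .7, .6, .2, .1])  # raw anomaly scores

# Range-based metrics account for window (range) detection; alpha weights
# the existence reward, i.e. credit for hitting an anomalous range at all
precision = range_precision(y_true, y_pred, alpha=0.5)
recall = range_recall(y_true, y_pred, alpha=0.5)
f1 = range_f_score(y_true, y_pred, p_alpha=0.5, r_alpha=0.5)
# Point-wise ROC AUC takes continuous scores rather than binary labels
auc = roc_auc_score(y_true, y_score)
```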
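To see what the range-based scores reward, here is a simplified version of the range-based precision and recall model (Tatbul et al., 2018) that these metrics build on. Recall mixes an existence reward (was a true anomaly range hit at all?) with an overlap reward, weighted by `alpha`. Precision uses overlap only. The sketch assumes a flat positional bias and omits the cardinality (fragmentation) factor, so its numbers may differ from aeon's. All function names are hypothetical.

```python
import numpy as np


def to_ranges(labels):
    """Inclusive (start, end) index pairs for each run of 1s in a binary array."""
    padded = np.concatenate([[0], np.asarray(labels, dtype=int), [0]])
    edges = np.flatnonzero(np.diff(padded))
    return [(int(s), int(e) - 1) for s, e in zip(edges[::2], edges[1::2])]


def overlap(a, b):
    """Number of time points shared by two inclusive ranges."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def simple_range_recall(y_true, y_pred, alpha=0.0):
    real, pred = to_ranges(y_true), to_ranges(y_pred)
    if not real:
        return 0.0
    per_range = []
    for r in real:
        covered = sum(overlap(r, p) for p in pred)
        existence = 1.0 if covered > 0 else 0.0      # was the range hit at all?
        overlap_reward = covered / (r[1] - r[0] + 1)  # flat bias: positions weigh equally
        per_range.append(alpha * existence + (1 - alpha) * overlap_reward)
    return float(np.mean(per_range))


def simple_range_precision(y_true, y_pred):
    real, pred = to_ranges(y_true), to_ranges(y_pred)
    if not pred:
        return 0.0
    return float(np.mean([
        sum(overlap(p, r) for r in real) / (p[1] - p[0] + 1) for p in pred
    ]))


y_true = np.array([0, 0, 1, 1, 1, 1, 0, 0, 0, 0])
y_pred = np.array([0, 0, 0, 0, 1, 1, 1, 1, 0, 0])
recall = simple_range_recall(y_true, y_pred, alpha=0.5)  # 0.5 * 1 + 0.5 * (2 / 4) = 0.75
precision = simple_range_precision(y_true, y_pred)       # 2 / 4 = 0.5
f1 = 2 * precision * recall / (precision + recall)       # 0.6
```

A point-wise recall would give the same prediction only 2/4 = 0.5. The existence reward is what lifts range recall to 0.75 when a detection merely touches the true anomaly.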
Algorithm Selection
- Speed priority: KMeansAD, IsolationForest
- Accuracy priority: STOMP, COPOD
- Streaming data: LeftSTAMPi, STRAY
- Discord discovery: STOMP, MERLIN
- Multi-dimensional: COPOD, PyODAdapter
- Semi-supervised: ROCKAD, OneClassSVM
- No training data: IsolationForest, STOMP
Best Practices
- Normalize data: Many methods are sensitive to scale
- Choose the window size carefully: For matrix profile methods the window size is critical and should roughly match the expected anomaly length
- Set a threshold: Use percentile-based or domain-specific thresholds
- Validate results: Visualize detections to check that they are meaningful
- Handle seasonality: Detrend and deseasonalize before detection
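Several of these practices fit in a few lines of NumPy. The sketch assumes the seasonal period (50 samples) is known and uses seasonal differencing as one simple way to remove both the seasonal cycle and a linear trend. It then z-normalizes the result and thresholds the absolute values. The percentile rule follows the list above. The median/MAD rule is a swapped-in robust alternative, not something the list prescribes. All helper names are hypothetical.

```python
import numpy as np


def znormalize(y):
    """Zero mean, unit variance (a constant series is only centered)."""
    std = y.std()
    return (y - y.mean()) / (std if std > 0 else 1.0)


def seasonal_difference(y, period):
    """Remove a known seasonal cycle: out[i] = y[i + period] - y[i]."""
    return y[period:] - y[:-period]


def percentile_flags(scores, q=95.0):
    """Flag scores above the q-th percentile."""
    return scores > np.percentile(scores, q)


def mad_flags(scores, k=3.5):
    """Flag scores more than k robust standard deviations above the median."""
    med = np.median(scores)
    mad = 1.4826 * np.median(np.abs(scores - med))  # scaled to match std for Gaussian data
    return scores > med + k * mad


rng = np.random.default_rng(0)
t = np.arange(400)
# Trend + seasonality (period 50, assumed known) + noise, with a spike at t = 300
y = 10 + 0.02 * t + 5 * np.sin(2 * np.pi * t / 50) + 0.3 * rng.standard_normal(400)
y[300] += 8.0

scores = np.abs(znormalize(seasonal_difference(y, period=50)))
flags = percentile_flags(scores)
robust_flags = mad_flags(scores)
```

Seasonal differencing turns one spike into two. The spike at t = 300 appears at differenced index 250 and again, with opposite sign, at index 300 (time 350). Flagged indices therefore need shifting by the period, and the echo needs discounting.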