# Anomaly Detection
Aeon provides anomaly detection methods for identifying unusual patterns in time series at both series and collection levels.
## Collection Anomaly Detectors
Detect anomalous time series within a collection:
- `ClassificationAdapter` - Adapts classifiers for anomaly detection
  - Train on normal data, flag outliers during prediction
  - **Use when**: Have labeled normal data, want a classification-based approach
- `OutlierDetectionAdapter` - Wraps sklearn outlier detectors (see the sketch after this list)
  - Works with IsolationForest, LOF, OneClassSVM
  - **Use when**: Want to apply sklearn anomaly detectors to collections
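As a rough sketch of the collection-level workflow, the example below wraps sklearn's `IsolationForest` in `OutlierDetectionAdapter` to flag unusual series within a collection. The import path, the assumption that the adapter takes the sklearn estimator as its first argument, and the `fit_predict` call follow aeon's usual conventions and may differ between versions.
```python
import numpy as np
from sklearn.ensemble import IsolationForest
from aeon.anomaly_detection.collection import OutlierDetectionAdapter  # path may vary by version

# Collection of 20 univariate series: (n_cases, n_channels, n_timepoints)
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 1, 100))
X[3] += 5.0  # shift one series so it is clearly anomalous

# Assumption: the sklearn detector is passed as the first constructor argument
detector = OutlierDetectionAdapter(IsolationForest(random_state=0))
labels = detector.fit_predict(X)  # one anomaly label per series in the collection
```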
## Series Anomaly Detectors
Detect anomalous points or subsequences within a single time series.
### Distance-Based Methods
Use similarity metrics to identify anomalies:
- `CBLOF` - Cluster-Based Local Outlier Factor
  - Clusters the data, scores points by the size of and distance to their cluster
  - **Use when**: Anomalies fall into small clusters or far from the large ones
- `KMeansAD` - K-means based anomaly detection (see the sketch after this list)
  - Distance to the nearest cluster center indicates anomaly
  - **Use when**: Normal patterns cluster well
- `LeftSTAMPi` - Left STAMP incremental
  - Matrix profile for online anomaly detection
  - **Use when**: Streaming data, need online detection
- `STOMP` - Scalable Time series Ordered-search Matrix Profile
  - Computes the matrix profile for subsequence anomalies
  - **Use when**: Discord discovery, motif detection
- `MERLIN` - Discord discovery over a range of subsequence lengths
  - Searches for discords without committing to a single window size
  - **Use when**: Large time series, need scalability
- `LOF` - Local Outlier Factor adapted for time series
  - Density-based outlier detection
  - **Use when**: Anomalies in low-density regions
- `ROCKAD` - ROCKET-based semi-supervised detection
  - Uses ROCKET features for anomaly identification
  - **Use when**: Have some labeled data, want a feature-based approach
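For example, `KMeansAD` follows the same single-series `fit_predict` interface shown in the Quick Start below; this is a minimal sketch, and the `n_clusters`/`window_size` argument names are assumptions that may vary between aeon versions.
```python
import numpy as np
from aeon.anomaly_detection import KMeansAD

rng = np.random.default_rng(42)
y = np.sin(np.linspace(0, 40, 400)) + 0.1 * rng.normal(size=400)
y[200:210] += 3.0  # inject a short anomalous burst

# Windows are clustered with k-means; distance to the nearest cluster
# center becomes the per-point anomaly score
detector = KMeansAD(n_clusters=10, window_size=20)
scores = detector.fit_predict(y)
anomalies = scores > np.percentile(scores, 95)
```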
### Distribution-Based Methods
Analyze statistical distributions:
- `COPOD` - Copula-Based Outlier Detection
  - Models marginal and joint distributions
  - **Use when**: Multi-dimensional time series, complex dependencies
- `DWT_MLEAD` - Discrete Wavelet Transform with Maximum Likelihood Estimation Anomaly Detection (see the sketch after this list)
  - Decomposes the series into frequency bands and flags unlikely coefficients
  - **Use when**: Anomalies concentrated at specific frequencies
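As a minimal sketch, `DWT_MLEAD` uses the same single-series interface; default parameters are used here because the tuning options (decomposition levels, quantile threshold) differ across aeon versions.
```python
import numpy as np
from aeon.anomaly_detection import DWT_MLEAD

y = np.sin(np.linspace(0, 20, 512))
y[250] += 4.0  # point anomaly

detector = DWT_MLEAD()            # defaults; parameters vary by version
scores = detector.fit_predict(y)  # per-point anomaly scores
```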
### Isolation-Based Methods
Use isolation principles:
- `IsolationForest` - Random forest-based isolation (see the sketch after this list)
  - Anomalies are easier to isolate than normal points
  - **Use when**: High-dimensional data, no assumptions about the distribution
- `OneClassSVM` - Support vector machine for novelty detection
  - Learns a boundary around the normal data
  - **Use when**: Well-defined normal region, need a robust boundary
- `STRAY` - Streaming Robust Anomaly Detection
  - Robust to changes in the data distribution
  - **Use when**: Streaming data, distribution shifts
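A rough sketch of the isolation-based interface, assuming aeon's `IsolationForest` wrapper can be constructed with default parameters (its windowing options vary by version):
```python
import numpy as np
from aeon.anomaly_detection import IsolationForest

rng = np.random.default_rng(7)
y = rng.normal(size=300)
y[150] = 8.0  # obvious point anomaly

# Points that are easy to isolate receive higher anomaly scores
detector = IsolationForest()
scores = detector.fit_predict(y)
```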
### External Library Integration
- `PyODAdapter` - Bridges the PyOD library to aeon (see the sketch after this list)
  - Access 40+ PyOD anomaly detectors
  - **Use when**: Need a specific PyOD algorithm
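A minimal sketch of the PyOD bridge, assuming `PyODAdapter` takes the PyOD estimator as its first argument together with a `window_size` (requires the `pyod` package):
```python
import numpy as np
from pyod.models.lof import LOF  # any PyOD detector
from aeon.anomaly_detection import PyODAdapter

y = np.sin(np.linspace(0, 30, 300))
y[140:150] += 2.5  # anomalous bump

# Assumption: PyOD model first, then the sliding window length
detector = PyODAdapter(LOF(n_neighbors=20), window_size=25)
scores = detector.fit_predict(y)
```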
## Quick Start
```python
from aeon.anomaly_detection import STOMP
import numpy as np
# Create time series with anomaly
y = np.concatenate([
    np.sin(np.linspace(0, 10, 100)),
    [5.0],  # anomaly spike
    np.sin(np.linspace(10, 20, 100)),
])
# Detect anomalies
detector = STOMP(window_size=10)
anomaly_scores = detector.fit_predict(y)
# Higher scores indicate more anomalous points
threshold = np.percentile(anomaly_scores, 95)
anomalies = anomaly_scores > threshold
```
## Point vs Subsequence Anomalies
- **Point anomalies**: Single unusual values
  - Use: COPOD, DWT_MLEAD, IsolationForest
- **Subsequence anomalies** (discords): Unusual patterns
  - Use: STOMP, LeftSTAMPi, MERLIN
- **Collective anomalies**: Groups of points forming an unusual pattern
  - Use: Matrix profile methods, clustering-based detectors
## Evaluation Metrics
Specialized metrics for anomaly detection:
```python
import numpy as np
from aeon.benchmarking.metrics.anomaly_detection import (
    range_precision,
    range_recall,
    range_f_score,
    roc_auc_score,
)

# Binary ground-truth labels and binary predictions over the same time axis
y_true = np.array([0, 0, 1, 1, 1, 0, 0, 0, 1, 0])
y_pred = np.array([0, 0, 0, 1, 1, 1, 0, 0, 1, 0])

# Range-based metrics reward overlap with the true anomalous ranges
# rather than requiring exact point-wise matches
precision = range_precision(y_true, y_pred, alpha=0.5)
recall = range_recall(y_true, y_pred, alpha=0.5)
f1 = range_f_score(y_true, y_pred)
```
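Unlike the range-based metrics, `roc_auc_score` compares the binary ground truth against continuous anomaly scores, so it can be applied to raw detector output without picking a threshold. Continuing the snippet above:
```python
# Continuous scores (e.g. straight from fit_predict) vs. binary ground truth
scores = np.array([0.1, 0.2, 0.9, 0.8, 0.7, 0.3, 0.1, 0.2, 0.95, 0.1])
auc = roc_auc_score(y_true, scores)
```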
## Algorithm Selection
- **Speed priority**: KMeansAD, IsolationForest
- **Accuracy priority**: STOMP, COPOD
- **Streaming data**: LeftSTAMPi, STRAY
- **Discord discovery**: STOMP, MERLIN
- **Multi-dimensional**: COPOD, PyODAdapter
- **Semi-supervised**: ROCKAD, OneClassSVM
- **No training data**: IsolationForest, STOMP
## Best Practices
1. **Normalize data**: Many methods are sensitive to scale (see the sketch after this list)
2. **Choose the window size**: For matrix profile methods, the window size is critical
3. **Set a threshold**: Use percentile-based or domain-specific thresholds
4. **Validate results**: Visualize detections to verify they are meaningful
5. **Handle seasonality**: Detrend/deseasonalize before detection
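To illustrate practices 1 and 5, the sketch below removes a linear trend and z-normalizes a series with plain NumPy before running a detector; the preprocessing is a generic illustration, not an aeon-specific API.
```python
import numpy as np
from aeon.anomaly_detection import STOMP

rng = np.random.default_rng(1)
t = np.arange(500)
y = 0.01 * t + np.sin(t / 10) + 0.1 * rng.normal(size=500)
y[300] += 3.0  # anomaly on top of trend and seasonality

# 1. Remove a linear trend (least-squares fit on the time index)
trend = np.polyval(np.polyfit(t, y, deg=1), t)
detrended = y - trend

# 2. Z-normalize so scale does not dominate distance computations
z = (detrended - detrended.mean()) / detrended.std()

scores = STOMP(window_size=20).fit_predict(z)
anomalies = scores > np.percentile(scores, 99)
```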