# Anomaly Detection
Aeon provides anomaly detection methods for identifying unusual patterns in time series at both series and collection levels.
## Collection Anomaly Detectors
Detect anomalous time series within a collection:
- `ClassificationAdapter` - Adapts classifiers for anomaly detection
  - Train on normal data, flag outliers during prediction
  - **Use when**: Have labeled normal data, want a classification-based approach
- `OutlierDetectionAdapter` - Wraps sklearn outlier detectors (see the sketch after this list)
  - Works with IsolationForest, LOF, OneClassSVM
  - **Use when**: Want to apply sklearn anomaly detectors to collections
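As a rough sketch of the collection-level workflow, the example below wraps sklearn's `IsolationForest` in `OutlierDetectionAdapter` to flag unusual series within a collection. The import path, the assumption that the adapter takes the sklearn estimator as its first argument, and the `fit_predict` call follow aeon's usual conventions and may differ between versions.
```python
import numpy as np
from sklearn.ensemble import IsolationForest
from aeon.anomaly_detection.collection import OutlierDetectionAdapter  # path may vary by version

# Collection of 20 univariate series: (n_cases, n_channels, n_timepoints)
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 1, 100))
X[3] += 5.0  # shift one series so it is clearly anomalous

# Assumption: the sklearn detector is passed as the first constructor argument
detector = OutlierDetectionAdapter(IsolationForest(random_state=0))
labels = detector.fit_predict(X)  # one anomaly label per series in the collection
```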
## Series Anomaly Detectors
Detect anomalous points or subsequences within a single time series.
### Distance-Based Methods
Use similarity metrics to identify anomalies:
- `CBLOF` - Cluster-Based Local Outlier Factor
  - Clusters the data, scores points by the size of and distance to their cluster
  - **Use when**: Anomalies fall into small clusters or far from the large ones
- `KMeansAD` - K-means based anomaly detection (see the sketch after this list)
  - Distance to the nearest cluster center indicates anomaly
  - **Use when**: Normal patterns cluster well
- `LeftSTAMPi` - Left STAMP incremental
  - Matrix profile for online anomaly detection
  - **Use when**: Streaming data, need online detection
- `STOMP` - Scalable Time series Ordered-search Matrix Profile
  - Computes the matrix profile for subsequence anomalies
  - **Use when**: Discord discovery, motif detection
- `MERLIN` - Discord discovery over a range of subsequence lengths
  - Searches for discords without committing to a single window size
  - **Use when**: Large time series, need scalability
- `LOF` - Local Outlier Factor adapted for time series
  - Density-based outlier detection
  - **Use when**: Anomalies in low-density regions
- `ROCKAD` - ROCKET-based semi-supervised detection
  - Uses ROCKET features for anomaly identification
  - **Use when**: Have some labeled data, want a feature-based approach
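For example, `KMeansAD` follows the same single-series `fit_predict` interface shown in the Quick Start below; this is a minimal sketch, and the `n_clusters`/`window_size` argument names are assumptions that may vary between aeon versions.
```python
import numpy as np
from aeon.anomaly_detection import KMeansAD

rng = np.random.default_rng(42)
y = np.sin(np.linspace(0, 40, 400)) + 0.1 * rng.normal(size=400)
y[200:210] += 3.0  # inject a short anomalous burst

# Windows are clustered with k-means; distance to the nearest cluster
# center becomes the per-point anomaly score
detector = KMeansAD(n_clusters=10, window_size=20)
scores = detector.fit_predict(y)
anomalies = scores > np.percentile(scores, 95)
```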
### Distribution-Based Methods
Analyze statistical distributions:
- `COPOD` - Copula-Based Outlier Detection
  - Models marginal and joint distributions
  - **Use when**: Multi-dimensional time series, complex dependencies
- `DWT_MLEAD` - Discrete Wavelet Transform with Maximum Likelihood Estimation Anomaly Detection (see the sketch after this list)
  - Decomposes the series into frequency bands and flags unlikely coefficients
  - **Use when**: Anomalies concentrated at specific frequencies
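As a minimal sketch, `DWT_MLEAD` uses the same single-series interface; default parameters are used here because the tuning options (decomposition levels, quantile threshold) differ across aeon versions.
```python
import numpy as np
from aeon.anomaly_detection import DWT_MLEAD

y = np.sin(np.linspace(0, 20, 512))
y[250] += 4.0  # point anomaly

detector = DWT_MLEAD()            # defaults; parameters vary by version
scores = detector.fit_predict(y)  # per-point anomaly scores
```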
### Isolation-Based Methods
Use isolation principles:
- `IsolationForest` - Random forest-based isolation (see the sketch after this list)
  - Anomalies are easier to isolate than normal points
  - **Use when**: High-dimensional data, no assumptions about the distribution
- `OneClassSVM` - Support vector machine for novelty detection
  - Learns a boundary around the normal data
  - **Use when**: Well-defined normal region, need a robust boundary
- `STRAY` - Streaming Robust Anomaly Detection
  - Robust to changes in the data distribution
  - **Use when**: Streaming data, distribution shifts
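A rough sketch of the isolation-based interface, assuming aeon's `IsolationForest` wrapper can be constructed with default parameters (its windowing options vary by version):
```python
import numpy as np
from aeon.anomaly_detection import IsolationForest

rng = np.random.default_rng(7)
y = rng.normal(size=300)
y[150] = 8.0  # obvious point anomaly

# Points that are easy to isolate receive higher anomaly scores
detector = IsolationForest()
scores = detector.fit_predict(y)
```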
### External Library Integration
- `PyODAdapter` - Bridges the PyOD library to aeon (see the sketch after this list)
  - Access 40+ PyOD anomaly detectors
  - **Use when**: Need a specific PyOD algorithm
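A minimal sketch of the PyOD bridge, assuming `PyODAdapter` takes the PyOD estimator as its first argument together with a `window_size` (requires the `pyod` package):
```python
import numpy as np
from pyod.models.lof import LOF  # any PyOD detector
from aeon.anomaly_detection import PyODAdapter

y = np.sin(np.linspace(0, 30, 300))
y[140:150] += 2.5  # anomalous bump

# Assumption: PyOD model first, then the sliding window length
detector = PyODAdapter(LOF(n_neighbors=20), window_size=25)
scores = detector.fit_predict(y)
```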
## Quick Start
```python
from aeon.anomaly_detection import STOMP
import numpy as np
# Create time series with anomaly
y = np.concatenate([
    np.sin(np.linspace(0, 10, 100)),
    [5.0],  # anomaly spike
    np.sin(np.linspace(10, 20, 100)),
])
# Detect anomalies
detector = STOMP(window_size=10)
anomaly_scores = detector.fit_predict(y)
# Higher scores indicate more anomalous points
threshold = np.percentile(anomaly_scores, 95)
anomalies = anomaly_scores > threshold
```
## Point vs Subsequence Anomalies
- **Point anomalies**: Single unusual values
  - Use: COPOD, DWT_MLEAD, IsolationForest
- **Subsequence anomalies** (discords): Unusual patterns
  - Use: STOMP, LeftSTAMPi, MERLIN
- **Collective anomalies**: Groups of points forming an unusual pattern
  - Use: Matrix profile methods, clustering-based detectors
## Evaluation Metrics
Specialized metrics for anomaly detection:
```python
import numpy as np
from aeon.benchmarking.metrics.anomaly_detection import (
    range_precision,
    range_recall,
    range_f_score,
    roc_auc_score,
)

# Binary ground-truth labels and binary predictions over the same time axis
y_true = np.array([0, 0, 1, 1, 1, 0, 0, 0, 1, 0])
y_pred = np.array([0, 0, 0, 1, 1, 1, 0, 0, 1, 0])

# Range-based metrics reward overlap with the true anomalous ranges
# rather than requiring exact point-wise matches
precision = range_precision(y_true, y_pred, alpha=0.5)
recall = range_recall(y_true, y_pred, alpha=0.5)
f1 = range_f_score(y_true, y_pred)
```
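Unlike the range-based metrics, `roc_auc_score` compares the binary ground truth against continuous anomaly scores, so it can be applied to raw detector output without picking a threshold. Continuing the snippet above:
```python
# Continuous scores (e.g. straight from fit_predict) vs. binary ground truth
scores = np.array([0.1, 0.2, 0.9, 0.8, 0.7, 0.3, 0.1, 0.2, 0.95, 0.1])
auc = roc_auc_score(y_true, scores)
```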
## Algorithm Selection
- **Speed priority**: KMeansAD, IsolationForest
- **Accuracy priority**: STOMP, COPOD
- **Streaming data**: LeftSTAMPi, STRAY
- **Discord discovery**: STOMP, MERLIN
- **Multi-dimensional**: COPOD, PyODAdapter
- **Semi-supervised**: ROCKAD, OneClassSVM
- **No training data**: IsolationForest, STOMP
## Best Practices
1. **Normalize data**: Many methods are sensitive to scale (see the sketch after this list)
2. **Choose the window size**: For matrix profile methods, the window size is critical
3. **Set a threshold**: Use percentile-based or domain-specific thresholds
4. **Validate results**: Visualize detections to verify they are meaningful
5. **Handle seasonality**: Detrend/deseasonalize before detection
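To illustrate practices 1 and 5, the sketch below removes a linear trend and z-normalizes a series with plain NumPy before running a detector; the preprocessing is a generic illustration, not an aeon-specific API.
```python
import numpy as np
from aeon.anomaly_detection import STOMP

rng = np.random.default_rng(1)
t = np.arange(500)
y = 0.01 * t + np.sin(t / 10) + 0.1 * rng.normal(size=500)
y[300] += 3.0  # anomaly on top of trend and seasonality

# 1. Remove a linear trend (least-squares fit on the time index)
trend = np.polyval(np.polyfit(t, y, deg=1), t)
detrended = y - trend

# 2. Z-normalize so scale does not dominate distance computations
z = (detrended - detrended.mean()) / detrended.std()

scores = STOMP(window_size=20).fit_predict(z)
anomalies = scores > np.percentile(scores, 99)
```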