Anomaly Detection

Aeon provides anomaly detection methods for identifying unusual patterns in time series at both series and collection levels.

Collection Anomaly Detectors

Detect anomalous time series within a collection:

  • ClassificationAdapter - Adapts classifiers for anomaly detection

    • Trains a classifier on series labeled normal or anomalous, then predicts anomaly labels for new series
    • Use when: Have anomaly labels for training, want classification-based approach
  • OutlierDetectionAdapter - Wraps sklearn outlier detectors

    • Works with IsolationForest, LOF, OneClassSVM
    • Use when: Want to use sklearn anomaly detectors on collections

Series Anomaly Detectors

Detect anomalous points or subsequences within a single time series.

Distance-Based Methods

Use similarity metrics to identify anomalies:

  • CBLOF - Cluster-Based Local Outlier Factor

    • Clusters data, identifies outliers based on cluster properties
    • Use when: Anomalies form sparse clusters
  • KMeansAD - K-means based anomaly detection

    • Distance to nearest cluster center indicates anomaly
    • Use when: Normal patterns cluster well
  • LeftSTAMPi - Left STAMP incremental

    • Matrix profile for online anomaly detection
    • Use when: Streaming data, need online detection
  • STOMP - Scalable Time series Ordered-search Matrix Profile

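    • Computes matrix profile; subsequences far from their nearest neighbor score as anomalous
    • Use when: Discord discovery with a known subsequence length
  • MERLIN - Discord discovery across a range of subsequence lengths

    • Searches for discords of every length between a minimum and maximum, without a single fixed window size
    • Use when: Anomaly length unknown, large time series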
  • LOF - Local Outlier Factor adapted for time series

    • Density-based outlier detection
    • Use when: Anomalies in low-density regions
  • ROCKAD - ROCKET-based semi-supervised detection

    • Uses ROCKET features with nearest-neighbor scoring for anomaly identification
    • Use when: Have anomaly-free (normal) training data, want feature-based approach
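
To make the matrix profile idea behind STOMP and LeftSTAMPi concrete, here is a minimal brute-force sketch in plain NumPy. It is not aeon's implementation: the function names, the half-window exclusion zone, and the injected anomaly are assumptions chosen for illustration, and the quadratic memory use makes it suitable only for short series.

```python
import numpy as np

def znorm(w):
    """Z-normalize a window; constant windows map to zeros."""
    std = w.std()
    return (w - w.mean()) / std if std > 0 else np.zeros_like(w)

def naive_matrix_profile(y, window_size):
    """Distance from each subsequence to its nearest non-overlapping neighbor."""
    n_sub = len(y) - window_size + 1
    subs = np.array([znorm(y[i:i + window_size]) for i in range(n_sub)])
    # All pairwise Euclidean distances between z-normalized subsequences
    dists = np.linalg.norm(subs[:, None, :] - subs[None, :, :], axis=2)
    # Exclusion zone (assumed: half a window) removes trivial self-matches
    excl = window_size // 2
    for i in range(n_sub):
        lo, hi = max(0, i - excl), min(n_sub, i + excl + 1)
        dists[i, lo:hi] = np.inf
    return dists.min(axis=1)

# Periodic series with a hypothetical injected level shift at 200..209
y = np.sin(np.linspace(0, 20 * np.pi, 400))
y[200:210] += 2.0

profile = naive_matrix_profile(y, window_size=20)
discord_start = int(np.argmax(profile))  # start of the most unusual window
```

The subsequence with the largest profile value is the top discord. Production matrix profile algorithms compute the same quantity far more efficiently.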

Distribution-Based Methods

Analyze statistical distributions:

  • COPOD - Copula-Based Outlier Detection

    • Models marginal and joint distributions
    • Use when: Multi-dimensional time series, complex dependencies
  • DWT_MLEAD - Discrete Wavelet Transform Multi-Level Anomaly Detection

    • Decomposes series into frequency bands
    • Use when: Anomalies at specific frequencies
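
As a rough illustration of the copula-style scoring that COPOD is built on, the sketch below scores each time point of a multivariate series by how far it sits in the tails of the per-dimension empirical distributions. It is a simplification under stated assumptions: it omits the skewness correction of the published method, the function name `ecdf_tail_scores` is hypothetical, and it is not the code aeon or PyOD runs.

```python
import numpy as np

def ecdf_tail_scores(X):
    """Simplified tail-probability score for rows of X (n_samples, n_dims)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Left-tail empirical CDF per dimension: share of samples <= each value
    left = (X[None, :, :] <= X[:, None, :]).sum(axis=1) / n
    # Right-tail empirical CDF per dimension: share of samples >= each value
    right = (X[None, :, :] >= X[:, None, :]).sum(axis=1) / n
    # Sum negative log tail probabilities across dimensions, keep the larger
    left_score = -np.log(left).sum(axis=1)
    right_score = -np.log(right).sum(axis=1)
    return np.maximum(left_score, right_score)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))   # each row is one time point, two channels
X[50] = [6.0, 6.0]              # hypothetical injected outlier

scores = ecdf_tail_scores(X)
```

A point that is extreme in several dimensions at once accumulates a large score, which is why this family of methods suits multi-dimensional series.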

Isolation-Based Methods

Use isolation principles:

  • IsolationForest - Ensemble of random isolation trees

    • Anomalies easier to isolate than normal points
    • Use when: High-dimensional data, no assumptions about distribution
  • OneClassSVM - Support vector machine for novelty detection

    • Learns boundary around normal data
    • Use when: Well-defined normal region, need robust boundary
  • STRAY - Search and TRace AnomalY

    • Robust to data distribution changes
    • Use when: Streaming data, distribution shifts

External Library Integration

  • PyODAdapter - Bridges PyOD library to aeon
    • Access 40+ PyOD anomaly detectors
    • Use when: Need specific PyOD algorithm

Quick Start

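# Import path as documented here; some aeon releases may nest series
# detectors under a submodule, so check the installed version.
# STOMP relies on the optional stumpy dependency.
from aeon.anomaly_detection import STOMP
import numpy as np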

# Create time series with anomaly
y = np.concatenate([
    np.sin(np.linspace(0, 10, 100)),
    [5.0],  # Anomaly spike
    np.sin(np.linspace(10, 20, 100))
])

# Detect anomalies
detector = STOMP(window_size=10)
anomaly_scores = detector.fit_predict(y)

# Higher scores indicate more anomalous points
threshold = np.percentile(anomaly_scores, 95)
anomalies = anomaly_scores > threshold

Point vs Subsequence Anomalies

  • Point anomalies: Single unusual values

    • Use: COPOD, DWT_MLEAD, IsolationForest
  • Subsequence anomalies (discords): Unusual patterns

    • Use: STOMP, LeftSTAMPi, MERLIN
  • Collective anomalies: Groups of points forming unusual pattern

    • Use: Matrix profile methods, clustering-based
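
Subsequence methods produce one score per window, while thresholds and metrics often need one score per time point. One common way to bridge the two is to average the scores of every window covering a point. The sketch below assumes that averaging rule and a hypothetical helper name; a given detector may aggregate differently.

```python
import numpy as np

def window_scores_to_point_scores(window_scores, window_size):
    """Average the scores of all windows that cover each time point."""
    n_windows = len(window_scores)
    n_points = n_windows + window_size - 1
    totals = np.zeros(n_points)
    counts = np.zeros(n_points)
    for start, score in enumerate(window_scores):
        totals[start:start + window_size] += score
        counts[start:start + window_size] += 1
    return totals / counts

# One score per window start; only the window starting at index 2 is unusual
window_scores = np.array([0.0, 0.0, 3.0, 0.0, 0.0])
point_scores = window_scores_to_point_scores(window_scores, window_size=3)
```

Here the three points covered by the anomalous window receive the highest point scores, so a single discord becomes a contiguous run of elevated values.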

Evaluation Metrics

Specialized metrics for anomaly detection:

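import numpy as np
from aeon.benchmarking.metrics.anomaly_detection import (
    range_precision,
    range_recall,
    range_f_score,
    roc_auc_score
)

# Illustrative binary labels (1 = anomalous); replace with your own
y_true = np.array([0, 0, 1, 1, 1, 0, 0, 0, 1, 0])
y_pred = np.array([0, 0, 0, 1, 1, 0, 0, 0, 0, 0])

# Range-based metrics account for window detection
precision = range_precision(y_true, y_pred, alpha=0.5)
recall = range_recall(y_true, y_pred, alpha=0.5)
f1 = range_f_score(y_true, y_pred, alpha=0.5)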
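
To show what `alpha` trades off, here is a simplified range-based recall in plain NumPy. It assumes a flat positional bias and a cardinality factor of one, and the helper names are hypothetical, so values may differ from aeon's metrics under other settings. Each true anomaly range earns `alpha` for being detected at all plus `1 - alpha` times the fraction of its points that were flagged.

```python
import numpy as np

def label_ranges(labels):
    """Return (start, end) pairs, end exclusive, for each run of 1s."""
    padded = np.concatenate([[0], np.asarray(labels, dtype=int), [0]])
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    return list(zip(starts, ends))

def simple_range_recall(y_true, y_pred, alpha=0.5):
    """Mean per-range reward: existence (alpha) plus overlap (1 - alpha)."""
    y_pred = np.asarray(y_pred, dtype=int)
    ranges = label_ranges(y_true)
    if not ranges:
        return 0.0
    rewards = []
    for start, end in ranges:
        hits = y_pred[start:end]
        existence = float(hits.any())   # was the range detected at all?
        overlap = hits.mean()           # share of the range that was flagged
        rewards.append(alpha * existence + (1 - alpha) * overlap)
    return float(np.mean(rewards))

y_true = np.array([0, 0, 1, 1, 1, 0, 0, 0, 1, 0])
y_pred = np.array([0, 0, 0, 1, 1, 0, 0, 0, 0, 0])
recall = simple_range_recall(y_true, y_pred, alpha=0.5)
```

With `alpha=1.0` a single flagged point inside a range counts as a full detection. With `alpha=0.0` only the covered fraction matters.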

Algorithm Selection

  • Speed priority: KMeansAD, IsolationForest
  • Accuracy priority: STOMP, COPOD
  • Streaming data: LeftSTAMPi, STRAY
  • Discord discovery: STOMP, MERLIN
  • Multi-dimensional: COPOD, PyODAdapter
  • Semi-supervised: ROCKAD, OneClassSVM
  • No training data: IsolationForest, STOMP

Best Practices

  1. Normalize data: Many methods sensitive to scale
  2. Choose window size: For matrix profile methods, window size critical
  3. Set threshold: Use percentile-based or domain-specific thresholds
  4. Validate results: Visualize detections to verify meaningfulness
  5. Handle seasonality: Detrend/deseasonalize before detection
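
Practices 1, 3, and 5 can be combined in a short preprocessing sketch. It uses seasonal differencing as one simple way to remove trend and seasonality. The period of 24, the 99th-percentile threshold, and the synthetic series are assumptions for illustration only.

```python
import numpy as np

def seasonal_difference(y, period):
    """Subtract the value one period earlier to remove a repeating pattern."""
    return y[period:] - y[:-period]

def zscore(y):
    """Rescale to zero mean and unit variance."""
    return (y - y.mean()) / y.std()

# Synthetic series: linear trend plus a 24-step cycle
t = np.arange(480)
y = 10.0 + 0.05 * t + 3.0 * np.sin(2 * np.pi * t / 24)
y[300] += 4.0   # hypothetical injected spike

period = 24
residual = zscore(seasonal_difference(y, period))
scores = np.abs(residual)

# Percentile threshold: flags roughly the top 1% of scores by construction
threshold = np.percentile(scores, 99)
flagged = np.flatnonzero(scores > threshold) + period  # back to original indices
```

Differencing leaves an echo one period after the spike (index 324 here), and a percentile threshold always flags a fixed share of points. Both are reasons to visualize detections before trusting them.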