# Time Series Segmentation

Aeon provides algorithms to partition time series into regions with distinct characteristics, identifying change points and boundaries.

## Segmentation Algorithms

### Binary Segmentation
- `BinSegmenter` - Recursive binary segmentation
  - Iteratively splits the series at the most significant change point
  - Parameters: `n_cps` (number of change points to find), `model` (cost function)
  - **Use when**: Known number of segments, hierarchical structure
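
The recursive splitting idea can be sketched in a few lines of plain Python. This is a minimal illustration rather than aeon's implementation: it assumes a squared-error (L2) cost around each segment mean, and `l2_cost`, `best_split`, `binary_segmentation`, and `min_size` are hypothetical names chosen for this example.

```python
def l2_cost(values):
    """Sum of squared deviations from the mean (0 for an empty segment)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(series, start, end, min_size):
    """Return (gain, index) of the split in [start, end) that reduces cost most."""
    whole = l2_cost(series[start:end])
    best = (0.0, None)
    for i in range(start + min_size, end - min_size + 1):
        gain = whole - l2_cost(series[start:i]) - l2_cost(series[i:end])
        if gain > best[0]:
            best = (gain, i)
    return best


def binary_segmentation(series, n_change_points, min_size=2):
    """Greedily add the split with the largest cost reduction."""
    boundaries = [0, len(series)]
    for _ in range(n_change_points):
        candidates = [
            best_split(series, a, b, min_size)
            for a, b in zip(boundaries, boundaries[1:])
        ]
        gain, index = max(candidates, key=lambda c: c[0])
        if index is None:  # no split improves the cost any further
            break
        boundaries = sorted(boundaries + [index])
    return boundaries[1:-1]


# Three flat regimes with jumps at indices 50 and 120
series = [0.0] * 50 + [5.0] * 70 + [-3.0] * 30
print(binary_segmentation(series, n_change_points=2))  # [50, 120]
```

Each pass scans every existing segment for its best split and keeps only the single most valuable one, which is why the result has a hierarchical flavor: early change points are the strongest.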

### Classification-Based
- `ClaSPSegmenter` - Classification Score Profile
  - Uses classification performance to identify boundaries
  - Discovers segments where classification distinguishes neighbors
  - **Use when**: Segments have different temporal patterns

### Fast Pattern-Based
- `FLUSSSegmenter` - Fast Low-cost Unipotent Semantic Segmentation
  - Efficient semantic segmentation using arc crossings
  - Based on matrix profile
  - **Use when**: Large time series, need speed and pattern discovery

### Information Theory
- `InformationGainSegmenter` - Information gain maximization
  - Finds boundaries maximizing information gain
  - **Use when**: Statistical differences between segments

### Gaussian Modeling
- `GreedyGaussianSegmenter` - Greedy Gaussian approximation
  - Models segments as Gaussian distributions
  - Incrementally adds change points
  - **Use when**: Segments follow Gaussian distributions

### Hierarchical Agglomerative
- `EAggloSegmenter` - Bottom-up merging approach
  - Estimates change points via agglomeration
  - **Use when**: Want hierarchical segmentation structure

### Hidden Markov Models
- `HMMSegmenter` - HMM with Viterbi decoding
  - Probabilistic state-based segmentation
  - **Use when**: Segments represent hidden states
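
To make the decoding step concrete, here is a minimal Viterbi sketch in plain Python. It is an illustration only, not aeon's implementation, and it assumes Gaussian emissions with hand-picked means, a shared standard deviation, and a "sticky" transition matrix; `viterbi` and `gaussian_log_pdf` are hypothetical helper names.

```python
import math


def gaussian_log_pdf(x, mean, std):
    """Log density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * std**2) - (x - mean) ** 2 / (2 * std**2)


def viterbi(observations, means, std, transition, initial):
    """Return the most likely hidden-state sequence for the observations."""
    n_states = len(means)
    log_trans = [[math.log(p) for p in row] for row in transition]
    # scores[s] = best log-probability of any path ending in state s
    scores = [
        math.log(initial[s]) + gaussian_log_pdf(observations[0], means[s], std)
        for s in range(n_states)
    ]
    backpointers = []
    for x in observations[1:]:
        step, new_scores = [], []
        for s in range(n_states):
            prev = max(range(n_states), key=lambda p: scores[p] + log_trans[p][s])
            step.append(prev)
            new_scores.append(
                scores[prev] + log_trans[prev][s] + gaussian_log_pdf(x, means[s], std)
            )
        backpointers.append(step)
        scores = new_scores
    # Trace the best path backwards from the best final state
    state = max(range(n_states), key=lambda s: scores[s])
    path = [state]
    for step in reversed(backpointers):
        state = step[state]
        path.append(state)
    return path[::-1]


observations = [0.1, -0.2, 0.3, 4.8, 5.1, 5.3, 0.2, -0.1]
states = viterbi(
    observations,
    means=[0.0, 5.0],
    std=1.0,
    transition=[[0.9, 0.1], [0.1, 0.9]],
    initial=[0.5, 0.5],
)
# A change point is any index where the decoded state switches
change_points = [i for i in range(1, len(states)) if states[i] != states[i - 1]]
print(states)
print(change_points)
```

The decoder yields one state label per time point; change points fall out as the positions where consecutive labels differ.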

### Dimensionality-Based
- `HidalgoSegmenter` - Heterogeneous Intrinsic Dimensionality Algorithm
  - Detects changes in local dimensionality
  - **Use when**: Dimensionality shifts between segments

### Baseline
- `RandomSegmenter` - Random change point generation
  - **Use when**: Need null hypothesis baseline

## Quick Start

```python
import numpy as np
from aeon.segmentation import ClaSPSegmenter

# Create a time series with three regimes of 100 points each
y = np.concatenate([
    np.sin(np.linspace(0, 10, 100)),      # Segment 1
    np.cos(np.linspace(0, 10, 100)),      # Segment 2
    np.sin(2 * np.linspace(0, 10, 100)),  # Segment 3
])

# Segment the series; three regimes imply two change points
segmenter = ClaSPSegmenter(n_cps=2)
change_points = segmenter.fit_predict(y)

print(f"Detected change points: {change_points}")
```

## Output Format

Segmenters return change point indices:

```python
# change_points = [100, 200]  # Boundaries between segments
# This divides the series into: [0:100], [100:200], [200:end]
```
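
A small helper makes the mapping from change points to segments explicit. This is a sketch with a hypothetical function name (`change_points_to_segments`); it assumes the change points are sorted indices that lie strictly inside the series.

```python
def change_points_to_segments(change_points, series_length):
    """Turn sorted change point indices into (start, end) half-open ranges."""
    boundaries = [0, *change_points, series_length]
    return list(zip(boundaries, boundaries[1:]))


segments = change_points_to_segments([100, 200], series_length=300)
print(segments)  # [(0, 100), (100, 200), (200, 300)]
```

Each `(start, end)` pair can be used directly as a slice, for example `y[start:end]`.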

## Algorithm Selection

- **Speed priority**: FLUSSSegmenter, BinSegmenter
- **Accuracy priority**: ClaSPSegmenter, HMMSegmenter
- **Known segment count**: BinSegmenter with the `n_cps` parameter (number of change points, i.e. segments minus one)
- **Unknown segment count**: ClaSPSegmenter, InformationGainSegmenter
- **Pattern changes**: FLUSSSegmenter, ClaSPSegmenter
- **Statistical changes**: InformationGainSegmenter, GreedyGaussianSegmenter
- **State transitions**: HMMSegmenter

## Common Use Cases

### Regime Change Detection
Identify when time series behavior fundamentally changes:

```python
from aeon.segmentation import InformationGainSegmenter

# stock_prices is a placeholder for your own series
segmenter = InformationGainSegmenter(k_max=3)  # Up to 3 change points
change_points = segmenter.fit_predict(stock_prices)
```

### Activity Segmentation
Segment sensor data into activities:

```python
from aeon.segmentation import ClaSPSegmenter

segmenter = ClaSPSegmenter()
boundaries = segmenter.fit_predict(accelerometer_data)
```
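
### Seasonal Boundary Detection
Find season transitions in time series:

```python
import numpy as np
from scipy.stats import norm
from aeon.segmentation import HMMSegmenter

# 4 seasons = 4 hidden states; emission means/scale are placeholders to tune
emissions = [(norm.pdf, {"loc": m, "scale": 5.0}) for m in (0, 8, 16, 24)]
transitions = np.full((4, 4), 0.01) + 0.96 * np.eye(4)  # each row sums to 1
segments = HMMSegmenter(emissions, transitions).fit_predict(temperature_data)
```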

## Evaluation Metrics

Compare predicted change points against ground truth with segmentation quality metrics:

```python
from aeon.benchmarking.metrics.segmentation import (
    count_error,
    hausdorff_error,
)

# y_true, y_pred: true and predicted change point indices

# Count error: absolute difference in the number of change points
count_err = count_error(y_true, y_pred)

# Hausdorff error: largest distance from a change point in one set
# to its nearest change point in the other set
hausdorff_err = hausdorff_error(y_true, y_pred)
```
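
For intuition, the two metrics can be written out in plain Python. This sketch follows the textbook definitions and is not aeon's code; the library functions may differ in details such as symmetry options or return types, and `count_error_sketch`, `directed_distance`, and `hausdorff_error_sketch` are hypothetical names.

```python
def count_error_sketch(true_cps, pred_cps):
    """Absolute difference between the numbers of change points."""
    return abs(len(true_cps) - len(pred_cps))


def directed_distance(source, target):
    """Largest distance from a point in source to its nearest point in target."""
    return max(min(abs(s - t) for t in target) for s in source)


def hausdorff_error_sketch(true_cps, pred_cps):
    """Symmetric Hausdorff distance between two non-empty change point sets."""
    return max(
        directed_distance(true_cps, pred_cps),
        directed_distance(pred_cps, true_cps),
    )


true_cps = [100, 200]
pred_cps = [95, 210, 260]
print(count_error_sketch(true_cps, pred_cps))      # 1
print(hausdorff_error_sketch(true_cps, pred_cps))  # 60
```

In this example the spurious prediction at 260 drives the Hausdorff value, which shows how a single stray change point can dominate the score even when the others are close.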

## Best Practices

1. **Normalize data**: Ensures change detection is not dominated by scale
2. **Choose appropriate metric**: Different algorithms optimize different criteria
3. **Validate segments**: Visualize to verify meaningful boundaries
4. **Handle noise**: Consider smoothing before segmentation
5. **Domain knowledge**: Use expected segment count if known
6. **Parameter tuning**: Adjust sensitivity parameters (thresholds, penalties)
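
Practices 1 and 4 can be sketched with the standard library alone. The helpers below (`z_normalize`, `moving_average`) are hypothetical names for illustration, and the window size is an assumption to adapt to your data.

```python
from statistics import fmean, pstdev


def z_normalize(series):
    """Rescale to zero mean and unit variance (constant series map to zeros)."""
    mean, std = fmean(series), pstdev(series)
    if std == 0:
        return [0.0 for _ in series]
    return [(x - mean) / std for x in series]


def moving_average(series, window=5):
    """Smooth with a centered moving average, shrinking the window at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        smoothed.append(fmean(series[lo:hi]))
    return smoothed


raw = [10.0, 12.0, 11.0, 13.0, 50.0, 52.0, 51.0, 53.0]
prepared = moving_average(z_normalize(raw), window=3)
print([round(x, 2) for x in prepared])
```

Smoothing blurs sharp transitions, so keep the window well below the shortest segment you expect to find.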

## Visualization

```python
import matplotlib.pyplot as plt

plt.figure(figsize=(12, 4))
plt.plot(y, label='Time Series')
for i, cp in enumerate(change_points):
    # Label only the first line so the legend shows a single entry
    plt.axvline(cp, color='r', linestyle='--', label='Change Point' if i == 0 else None)
plt.legend()
plt.show()
```