# Time Series Segmentation
Aeon provides algorithms to partition time series into regions with distinct characteristics, identifying change points and boundaries.
## Segmentation Algorithms
### Binary Segmentation
- `BinSegmenter` - Recursive binary segmentation
- Iteratively splits series at most significant change points
- Parameters: `n_cps` (number of change points to search for), `model` (cost function, e.g. `"l2"`)
- **Use when**: Known number of segments, hierarchical structure
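The recursive splitting idea can be illustrated with a minimal NumPy sketch. This is not aeon's implementation: the helper names (`l2_cost`, `best_split`, `binary_segmentation`) and the choice of a squared-error cost are assumptions made for illustration only.

```python
import numpy as np

def l2_cost(x):
    """Sum of squared deviations from the segment mean."""
    return float(np.sum((x - x.mean()) ** 2)) if len(x) else 0.0

def best_split(x, min_size=2):
    """Return (gain, index) of the split that most reduces the L2 cost."""
    total = l2_cost(x)
    best_gain, best_idx = 0.0, None
    for i in range(min_size, len(x) - min_size + 1):
        gain = total - l2_cost(x[:i]) - l2_cost(x[i:])
        if gain > best_gain:
            best_gain, best_idx = gain, i
    return best_gain, best_idx

def binary_segmentation(x, n_change_points, min_size=2):
    """Repeatedly split whichever segment offers the largest cost reduction."""
    x = np.asarray(x, dtype=float)
    boundaries = [0, len(x)]
    for _ in range(n_change_points):
        candidates = []
        for start, end in zip(boundaries[:-1], boundaries[1:]):
            gain, idx = best_split(x[start:end], min_size)
            if idx is not None:
                candidates.append((gain, start + idx))
        if not candidates:
            break  # no split improves the cost any further
        _, cp = max(candidates)
        boundaries = sorted(boundaries + [cp])
    return boundaries[1:-1]

# Piecewise-constant toy series with mean shifts at indices 50 and 120
series = np.concatenate([np.zeros(50), np.full(70, 5.0), np.full(40, -3.0)])
found = binary_segmentation(series, n_change_points=2)
print(found)  # [50, 120]
```

The sketch recomputes costs from scratch for every candidate split, which is quadratic in segment length; it is meant to show the logic, not to be fast.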
### Classification-Based
- `ClaSPSegmenter` - Classification Score Profile
- Uses classification performance to identify boundaries
- Discovers segments where classification distinguishes neighbors
- **Use when**: Segments have different temporal patterns
### Fast Pattern-Based
- `FLUSSSegmenter` - Fast Low-cost Unipotent Semantic Segmentation
- Efficient semantic segmentation using arc crossings
- Based on matrix profile
- **Use when**: Large time series, need speed and pattern discovery
### Information Theory
- `InformationGainSegmenter` - Information gain maximization
- Finds boundaries maximizing information gain
- **Use when**: Statistical differences between segments
### Gaussian Modeling
- `GreedyGaussianSegmenter` - Greedy Gaussian approximation
- Models segments as Gaussian distributions
- Incrementally adds change points
- **Use when**: Segments follow Gaussian distributions
### Hierarchical Agglomerative
- `EAggloSegmenter` - Bottom-up merging approach
- Estimates change points via agglomeration
- **Use when**: Want hierarchical segmentation structure
### Hidden Markov Models
- `HMMSegmenter` - HMM with Viterbi decoding
- Probabilistic state-based segmentation
- **Use when**: Segments represent hidden states
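To make the Viterbi step concrete, here is a small self-contained sketch of decoding the most likely state path and reading change points off it. It is a generic textbook formulation, not aeon's code; the two Gaussian states, their means, and the transition probabilities are hypothetical values chosen for the example.

```python
import numpy as np

def viterbi(log_emission, log_transition, log_initial):
    """Most likely state path.

    log_emission: (n_timepoints, n_states) log-likelihood of each observation
    log_transition: (n_states, n_states), rows are source states
    log_initial: (n_states,)
    """
    n_timepoints, n_states = log_emission.shape
    score = log_initial + log_emission[0]
    backpointer = np.zeros((n_timepoints, n_states), dtype=int)
    for t in range(1, n_timepoints):
        # candidate[i, j]: best score ending in state i at t-1, then moving to j
        candidate = score[:, None] + log_transition
        backpointer[t] = candidate.argmax(axis=0)
        score = candidate.max(axis=0) + log_emission[t]
    # Trace the best path backwards from the final time point
    path = np.empty(n_timepoints, dtype=int)
    path[-1] = score.argmax()
    for t in range(n_timepoints - 1, 0, -1):
        path[t - 1] = backpointer[t, path[t]]
    return path

# Two hypothetical states with unit-variance Gaussian emissions at 0 and 5
observations = np.array([0.1, -0.2, 0.3, 5.1, 4.8, 5.2, 0.0, 0.2])
means = np.array([0.0, 5.0])
log_emission = -0.5 * (observations[:, None] - means) ** 2  # constants dropped
log_transition = np.log(np.array([[0.9, 0.1], [0.1, 0.9]]))
log_initial = np.log(np.array([0.5, 0.5]))

states = viterbi(log_emission, log_transition, log_initial)
# A change point is any index where the decoded state differs from the previous one
hmm_change_points = (np.flatnonzero(np.diff(states)) + 1).tolist()
print(states.tolist())      # [0, 0, 0, 1, 1, 1, 0, 0]
print(hmm_change_points)    # [3, 6]
```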
### Dimensionality-Based
- `HidalgoSegmenter` - Heterogeneous Intrinsic Dimensionality Algorithm
- Detects changes in local dimensionality
- **Use when**: Dimensionality shifts between segments
### Baseline
- `RandomSegmenter` - Random change point generation
- **Use when**: Need null hypothesis baseline
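One way to use a random baseline is to check how often random change points score as well as a real segmenter's output. The sketch below uses only the standard library; `random_change_points`, `mean_nearest_distance`, and the candidate prediction are hypothetical stand-ins, not part of aeon.

```python
import random

def random_change_points(series_length, n_change_points, seed=0):
    """Draw distinct change point indices uniformly at random (seeded)."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, series_length), n_change_points))

def mean_nearest_distance(true_cps, pred_cps):
    """Average distance from each true change point to its nearest prediction."""
    return sum(min(abs(t - p) for p in pred_cps) for t in true_cps) / len(true_cps)

true_cps = [100, 200]
candidate = [98, 205]  # hypothetical output of a real segmenter

baseline_scores = [
    mean_nearest_distance(true_cps, random_change_points(300, 2, seed=s))
    for s in range(200)
]
candidate_score = mean_nearest_distance(true_cps, candidate)

# Fraction of random draws that do at least as well as the candidate
fraction_as_good = sum(s <= candidate_score for s in baseline_scores) / len(baseline_scores)
print(candidate_score, fraction_as_good)
```

A small fraction suggests the segmenter is doing better than chance on this series; a large one suggests the result is not distinguishable from random placement.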
## Quick Start
```python
from aeon.segmentation import ClaSPSegmenter
import numpy as np
# Create a time series with two regime changes (at indices 100 and 200)
y = np.concatenate([
    np.sin(np.linspace(0, 10, 100)),      # Segment 1
    np.cos(np.linspace(0, 10, 100)),      # Segment 2
    np.sin(2 * np.linspace(0, 10, 100)),  # Segment 3
])
# Segment the series; n_cps sets how many change points to search for
segmenter = ClaSPSegmenter(n_cps=2)
change_points = segmenter.fit_predict(y)
print(f"Detected change points: {change_points}")
```
## Output Format
Most segmenters return change point indices (some, such as `HMMSegmenter`, return one state label per time point instead):
```python
# change_points = [100, 200] # Boundaries between segments
# This divides series into: [0:100], [100:200], [200:end]
```
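Turning change point indices into explicit segment bounds is a one-liner. The helper below is a hypothetical convenience function, not part of aeon, and assumes the change points are indices into a series of known length.

```python
def change_points_to_segments(change_points, series_length):
    """Turn change point indices into half-open (start, end) pairs."""
    edges = [0, *sorted(change_points), series_length]
    return list(zip(edges[:-1], edges[1:]))

segments = change_points_to_segments([100, 200], series_length=300)
print(segments)  # [(0, 100), (100, 200), (200, 300)]
```

Each pair can be used directly as a slice, for example `y[start:end]`.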
## Algorithm Selection
- **Speed priority**: FLUSSSegmenter, BinSegmenter
- **Accuracy priority**: ClaSPSegmenter, HMMSegmenter
- **Known segment count**: BinSegmenter with the `n_cps` parameter (number of change points, i.e. segments minus one)
- **Unknown segment count**: ClaSPSegmenter, InformationGainSegmenter
- **Pattern changes**: FLUSSSegmenter, ClaSPSegmenter
- **Statistical changes**: InformationGainSegmenter, GreedyGaussianSegmenter
- **State transitions**: HMMSegmenter
## Common Use Cases
### Regime Change Detection
Identify when time series behavior fundamentally changes:
```python
from aeon.segmentation import InformationGainSegmenter
segmenter = InformationGainSegmenter(k_max=3)  # Up to 3 change points
change_points = segmenter.fit_predict(stock_prices)
```
### Activity Segmentation
Segment sensor data into activities:
```python
from aeon.segmentation import ClaSPSegmenter
segmenter = ClaSPSegmenter()
boundaries = segmenter.fit_predict(accelerometer_data)
```
### Seasonal Boundary Detection
Find season transitions in time series:
```python
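import numpy as np
from scipy.stats import norm
from aeon.segmentation import HMMSegmenter
# Placeholder Gaussian emissions (one per season) and sticky transitions; check the signature against your aeon version
emission_funcs = [(norm.pdf, {"loc": m, "scale": 5}) for m in (0, 12, 24, 12)]
transition_prob_mat = np.full((4, 4), 0.01) + 0.96 * np.eye(4)
segmenter = HMMSegmenter(emission_funcs, transition_prob_mat)
segments = segmenter.fit_predict(temperature_data)  # one state label per time point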
```
## Evaluation Metrics
Use segmentation quality metrics:
```python
from aeon.benchmarking.metrics.segmentation import (
    count_error,
    hausdorff_error,
)
# Count error: difference in number of change points
count_err = count_error(y_true, y_pred)
# Hausdorff: maximum distance between predicted and true points
hausdorff_err = hausdorff_error(y_true, y_pred)
```
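To see what these two metrics measure, the sketch below computes them by hand on a tiny example. The functions are illustrative re-implementations under the assumption of an absolute count difference and a symmetric Hausdorff distance; aeon's own functions may differ in options and edge-case handling.

```python
def count_error_sketch(true_cps, pred_cps):
    """Absolute difference in the number of change points."""
    return abs(len(true_cps) - len(pred_cps))

def hausdorff_sketch(true_cps, pred_cps):
    """Symmetric Hausdorff distance between two sets of indices."""
    def directed(a, b):
        # Worst case, over points in a, of the distance to the nearest point in b
        return max(min(abs(x - y) for y in b) for x in a)
    return max(directed(true_cps, pred_cps), directed(pred_cps, true_cps))

true_cps = [100, 200]
pred_cps = [95, 210, 260]

print(count_error_sketch(true_cps, pred_cps))  # 1
print(hausdorff_sketch(true_cps, pred_cps))    # 60
```

The spurious prediction at 260 dominates the Hausdorff value even though the other two predictions are close, which is why the two metrics are usually reported together.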
## Best Practices
1. **Normalize data**: Rescale the series so that change detection is not dominated by differences in scale
2. **Choose an appropriate metric**: Different algorithms optimize different criteria, so evaluate with a metric that matches your goal
3. **Validate segments**: Plot the series with its change points to verify that the boundaries are meaningful
4. **Handle noise**: Consider smoothing the series before segmentation
5. **Use domain knowledge**: Supply the expected segment count if it is known
6. **Tune parameters**: Adjust sensitivity parameters (thresholds, penalties)
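The normalization and smoothing steps can be sketched with plain NumPy. The helper names, the window size, and the synthetic series are assumptions for illustration; the right preprocessing depends on the data and the segmenter.

```python
import numpy as np

def z_normalize(x):
    """Rescale to zero mean and unit variance (mean-centre only if constant)."""
    x = np.asarray(x, dtype=float)
    std = x.std()
    return (x - x.mean()) / std if std > 0 else x - x.mean()

def moving_average(x, window=5):
    """Simple moving average; the output is shorter by window - 1 samples."""
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="valid")

# Synthetic series: a level shift at index 100 on a large offset, plus noise
rng = np.random.default_rng(0)
raw = 1000.0 + 50.0 * np.concatenate([np.zeros(100), np.ones(100)]) + rng.normal(0, 5, 200)

normalized = z_normalize(raw)
clean = moving_average(normalized, window=5)
print(len(raw), len(clean))  # 200 196
```

Because `mode="valid"` shortens the series, change point indices found on the smoothed series are offset relative to the original; account for this shift when mapping results back.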
## Visualization
```python
import matplotlib.pyplot as plt
plt.figure(figsize=(12, 4))
plt.plot(y, label='Time Series')
for i, cp in enumerate(change_points):
    # Label only the first line so the legend shows a single entry
    plt.axvline(cp, color='r', linestyle='--', label='Change Point' if i == 0 else None)
plt.legend()
plt.show()
```