Anomaly detection is the identification of rare items, events, or patterns that differ significantly from the majority of the data.
The Numenta Anomaly Benchmark (NAB):
- Contains many files with metrics collected from different sources. Metrics are naturally ordered in time, so each file is a time series.
- From NAB, we chose the real CPU utilization data from AWS CloudWatch metrics for Amazon Relational Database Service (RDS). The metrics are stored in CSV format, and the exact timestamps of the anomalies in them (let's call them labels) are stored in JSON format.
Models:
- ARIMA statistical model as a baseline: a classic autoregressive model built specifically for time series.
- Convolutional neural network (CNN): such networks are usually used for image processing, but at their core they look for patterns in images. Our time series also consists of patterns, so the same idea applies.
- Long short-term memory (LSTM) neural network: this type was designed specifically for time-related data.
Tools:
- Google Colab as the working environment for .ipynb files.
- scikit-learn for some data preprocessing.
- statsmodels library for the ARIMA model.
- PyTorch for neural networks.
- Plotly for plots and graphs.
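To make the data format concrete, here is a minimal sketch of how a CSV file of metrics and a JSON file of anomaly timestamps could be combined into one labeled series. It uses only the Python standard library. The column names `timestamp` and `value`, the timestamp format, the key `example/cpu.csv`, and the helper name `load_labeled_series` are assumptions made for illustration, and the sample values are made up rather than taken from NAB.

```python
import csv
import io
import json
from datetime import datetime

# Assumed timestamp format; adjust it to match the actual files.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def load_labeled_series(csv_text, labels_json, series_key):
    """Return a list of (timestamp, value, is_anomaly) tuples.

    csv_text    -- CSV content with assumed columns `timestamp` and `value`
    labels_json -- JSON object mapping a series key to a list of anomaly timestamps
    series_key  -- hypothetical key identifying this series in the labels
    """
    labels = json.loads(labels_json)
    anomaly_times = {
        datetime.strptime(t, TIME_FORMAT) for t in labels.get(series_key, [])
    }

    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        ts = datetime.strptime(record["timestamp"], TIME_FORMAT)
        # A point is labeled anomalous if its timestamp appears in the labels.
        rows.append((ts, float(record["value"]), ts in anomaly_times))
    return rows


# Tiny made-up sample standing in for the real files.
SAMPLE_CSV = """timestamp,value
2014-02-14 14:30:00,6.5
2014-02-14 14:35:00,5.8
2014-02-14 14:40:00,25.1
2014-02-14 14:45:00,6.1
"""
SAMPLE_LABELS = json.dumps({"example/cpu.csv": ["2014-02-14 14:40:00"]})

series = load_labeled_series(SAMPLE_CSV, SAMPLE_LABELS, "example/cpu.csv")
for ts, value, is_anomaly in series:
    print(ts, value, "ANOMALY" if is_anomaly else "")
```

With real files, the same function could be fed the text read from the CSV and JSON files. In a notebook, a pandas DataFrame with a boolean label column would be a natural alternative to the list of tuples.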