Time-series anomaly detection identifies unusual patterns, called outliers, that do not conform to expected behavior.
There are many applications, from intrusion detection (identifying strange patterns in network traffic that could signal a hack) to health monitoring (spotting a malignant tumor in an MRI scan), and from fraud detection in credit card transactions to fault detection in operating environments.
When you create an Anomaly Detection Workspace, you are presented with a number of configuration steps:
- Select Dataset - select an existing time-series dataset or upload a new dataset to analyze (please note that the anomaly detection algorithms work only with time-series data at this time)
- Cloud9QL data manipulation (optional) - post-process the data by applying a Cloud9QL transformation (see the Cloud9QL documentation for more information)
- Select the Date/Time Dimension - this is the time-series feature of the selected dataset that is plotted on the X axis of the chart
- Select the Numeric attribute - this is the numerical feature of the selected dataset that you'd like to monitor. It is plotted on the Y axis of the chart
- Choose your Algorithm - select one of the available anomaly forecasting algorithms (see below)
Anomaly forecasting algorithms
* Olympic Model (Seasonal Naive)
A naive seasonal model in which the prediction for the next point is a smoothed average over the previous n periods.
* Double and Triple Exponential Smoothing Models
Both are popular models used to produce smoothed time series. The exponential smoothing variants add trend and seasonality to the model. The ETS model used automatically picks the best-fitting exponential smoothing model.
* Moving Average Model
Here, the forecast is based on an artificially constructed time series in which the value for a given time period is replaced by the mean of that value and the values for some number of the preceding and succeeding time periods.
* Weighted Moving Average and Naive Forecasting Models
The forecast for both of these models is based on an artificially constructed time series in which the value for a given time period is replaced by the mean of that value and the values for some number of the preceding and succeeding time periods. The Weighted Moving Average is a variant of the moving average model in which the values in the window are assigned different weights.
* Regression Model
Models the relationship between x and y using one or more variables.
* ARIMA Model
Uses the Autoregressive Integrated Moving Average method.
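As a rough illustration of the Olympic and Moving Average descriptions above, here is a minimal Python sketch. It is not the product's implementation: the function names, the `season_length` and `half_window` parameters, the use of a plain (unweighted) mean, and the way the window is truncated at the edges of the series are all assumptions made for this example.

```python
from statistics import mean


def olympic_forecast(series, season_length, n_periods):
    """Forecast the next point as the average of the values seen at the
    same position in each of the previous n_periods seasons (assumed form)."""
    if len(series) < season_length * n_periods:
        raise ValueError("not enough history for the requested number of periods")
    # Step back one full season at a time from the point being forecast.
    same_phase = [
        series[len(series) - k * season_length] for k in range(1, n_periods + 1)
    ]
    return mean(same_phase)


def centered_moving_average(series, half_window):
    """Replace each value by the mean of itself and up to half_window
    neighbours on each side; the window shrinks at the series edges."""
    smoothed = []
    for i in range(len(series)):
        lo = max(0, i - half_window)
        hi = min(len(series), i + half_window + 1)
        smoothed.append(mean(series[lo:hi]))
    return smoothed
```

For example, with a season of three points, `olympic_forecast([10, 20, 30, 12, 22, 32, 14, 24, 34], 3, 3)` averages the first value of each of the three previous seasons (10, 12 and 14).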
Once the above steps have been completed and the Run Analysis option is selected, an anomaly detection model is trained and applied to the data. The precision of the model increases over time as more data is made available.
The anomaly detection visualization itself consists of a configurable blue band range of expected values (acceptable threshold limit) along with the actual metric data points. Any values outside of the blue band range are considered anomalies and will appear in red.
Configuring the Anomaly Detection Algorithm
The width of the blue band of expected values can be configured by setting the threshold attribute explicitly in the settings modal dialog. This anomaly detection threshold is the mean absolute percentage deviation from the expected value. The default threshold is 50%, but this can be modified.
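One way to read the threshold is as a percentage band around each expected value. The Python sketch below illustrates that reading only; the exact formula used by the product is not stated here, and the function name `flag_anomalies` and the result fields are hypothetical.

```python
def flag_anomalies(actual, expected, threshold_pct=50.0):
    """Flag points whose actual value falls outside expected +/- threshold_pct %.

    Assumption: the band is symmetric and proportional to the expected value.
    """
    results = []
    for a, e in zip(actual, expected):
        half_width = abs(e) * threshold_pct / 100.0
        lower, upper = e - half_width, e + half_width
        results.append(
            {
                "actual": a,
                "expected": e,
                "lower": lower,
                "upper": upper,
                "is_anomaly": a < lower or a > upper,
            }
        )
    return results
```

With the default of 50% and an expected value of 100, the band under this reading runs from 50 to 150, so an actual value of 160 or 40 would be flagged, while raising the threshold to 70% would accept both.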
Saving the Anomaly detection visualization
Optionally, you can save the anomaly detection visualization results as a widget that can then be shared on one or more dashboards. To do this, select the Save Widget option and enter a widget name. The widget will then appear in the general widget list for subsequent use outside of the Machine Learning module.
However, the anomaly-related settings cannot be edited from the widget settings bar. All anomaly detection settings have to be changed in the anomaly workspace directly.
Setting an Anomaly Detection alert
A key feature of anomaly detection is the ability to configure alerts that provide automatic notification when new anomalies are detected.
Channels such as email, webhook, and Slack can be set up by selecting the alerts button from the control list.
By default, the look-back interval is set equal to the alert frequency, so only anomalies that fall within that interval are reported. The system triggers the alert as soon as at least one anomaly is detected.
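The trigger rule described above can be sketched in Python as follows. This is an illustration under assumptions, not the product's code: `should_alert` and its parameters are hypothetical names, and treating the window as open at the start and closed at the end is a choice made for the example.

```python
from datetime import datetime, timedelta


def should_alert(anomaly_times, now, frequency, look_back=None):
    """Return (triggered, recent_anomalies) for one alert check.

    The look-back interval defaults to the alert frequency, and a single
    anomaly inside the window is enough to trigger the alert.
    """
    look_back = frequency if look_back is None else look_back
    window_start = now - look_back
    recent = [t for t in anomaly_times if window_start < t <= now]
    return len(recent) >= 1, recent


# Usage: an hourly alert evaluated at noon only sees the 11:30 anomaly.
now = datetime(2024, 1, 1, 12, 0)
triggered, recent = should_alert(
    [datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 11, 30)],
    now,
    frequency=timedelta(hours=1),
)
```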
There are several fixed email placeholders that may be used in the email template to add additional information:
* %DATASET_NAME% - represents the dataset name selected
* %ANOMALY_SIZE% - represents the number of anomalies within the look back interval
* %FREQUENCY% - represents the frequency of the alert chosen
* %ANOMALY_RESULTS% - represents the detailed information about the anomalies, including expected range and actual metric value
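To show how the placeholders listed above might be expanded, here is a small Python sketch using plain string replacement. The function `render_alert_email`, the shape of the anomaly records, and the layout of the detail lines are assumptions for illustration; the actual substitution happens inside the product.

```python
def render_alert_email(template, dataset_name, frequency, anomalies):
    """Fill the four documented placeholders in an email template.

    `anomalies` is assumed to be a list of dicts with the keys
    'time', 'actual', 'lower' and 'upper'.
    """
    details = "\n".join(
        f"{a['time']}: actual={a['actual']}, "
        f"expected range=[{a['lower']}, {a['upper']}]"
        for a in anomalies
    )
    values = {
        "%DATASET_NAME%": dataset_name,
        "%ANOMALY_SIZE%": str(len(anomalies)),
        "%FREQUENCY%": frequency,
        "%ANOMALY_RESULTS%": details,
    }
    for placeholder, value in values.items():
        template = template.replace(placeholder, value)
    return template
```

A template such as `"%ANOMALY_SIZE% anomalies found in %DATASET_NAME% (%FREQUENCY% check)"` would then be filled with the anomaly count, the dataset name, and the alert frequency.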
Adding additional analyses
The workspace can contain one or more anomaly detection models. To add another to the workspace, choose the Add Analysis button.