On-demand forecasting with machine learning in Elasticsearch
Editor's Note (August 3, 2021): This post uses deprecated features. Please reference the map custom regions with reverse geocoding documentation for current instructions.
The newest X-Pack machine learning feature in 6.1 is on-demand forecasting. Previously, Elastic’s machine learning was designed to use historical data to predict the normal range of values for “now” and compare that to the data we actually saw so it could identify anomalies in real time. Now in 6.1, machine learning can model the data and predict multiple time intervals into the future.
It is called “on-demand forecasting” because users can take an existing machine learning job and, using the predictive model that job has already built, project how the modeled metric is expected to behave over a chosen number of days. The forecast results are written to an Elasticsearch index, which lets users compare actual results against the forecast.
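As a rough sketch of what requesting a forecast looks like outside the UI, the helper below only assembles the HTTP method, path, and JSON body for the forecast endpoint and sends nothing. It assumes the 6.x endpoint layout (machine learning under the _xpack prefix) and the duration and expires_in parameters; the job name disk-usage and the function name are hypothetical.

```python
import json


def build_forecast_request(job_id, duration="1d", expires_in=None):
    """Return (method, path, body) for a forecast request on an existing job.

    Nothing is sent; pass the pieces to whichever HTTP client you use.
    """
    if not job_id:
        raise ValueError("job_id must be a non-empty string")
    # Assumption: 6.x exposes machine learning under the _xpack prefix.
    path = f"/_xpack/ml/anomaly_detectors/{job_id}/_forecast"
    body = {"duration": duration}
    if expires_in is not None:
        # Optional: how long the forecast results are kept.
        body["expires_in"] = expires_in
    return "POST", path, body


# "disk-usage" is a hypothetical job name used only for illustration.
method, path, body = build_forecast_request("disk-usage", duration="7d", expires_in="30d")
print(method, path)
print(json.dumps(body))
```

Keeping the request construction separate from the transport makes it easy to inspect the call before running it against a real cluster.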
Capacity planning with machine learning and forecasting
It has been said many times that past performance is not indicative of future results. Even so, the best way to predict results for capacity planning is to use past performance indicators.
How can you determine when a particular resource is going to reach its capacity? For example, if you are monitoring your server’s disk space you may need to estimate when it will run out of space. You can use Elastic’s predictive machine learning models to forecast into the future and identify when you will need to add storage to the system.
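To make the disk-space case concrete, here is a minimal sketch that scans forecast buckets for the first one at or above a capacity limit. The bucket values are synthetic, and the field names (timestamp, forecast_prediction, forecast_upper) are assumed to match what the forecast results contain; the helper name first_breach is made up for this example.

```python
from datetime import datetime, timedelta


def first_breach(forecast, capacity, field="forecast_prediction"):
    """Return the timestamp of the first bucket whose value reaches capacity, or None."""
    for bucket in sorted(forecast, key=lambda b: b["timestamp"]):
        if bucket[field] >= capacity:
            return bucket["timestamp"]
    return None


# Synthetic data: disk usage in percent, growing a little every day.
start = datetime(2018, 1, 1)
forecast = [
    {
        "timestamp": start + timedelta(days=d),
        "forecast_prediction": 80.0 + 2.0 * d,
        "forecast_upper": 82.0 + 3.0 * d,
    }
    for d in range(10)
]

likely = first_breach(forecast, capacity=90.0)
cautious = first_breach(forecast, capacity=90.0, field="forecast_upper")
print("predicted value reaches 90% on:", likely)
print("upper bound reaches 90% on:", cautious)
```

Checking the upper bound as well as the predicted value gives an earlier, more conservative date for planning purposes.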
Another use of capacity planning is to predict a volume metric at a specific time in the future, for example how many customer calls your business should expect on a Monday afternoon. By analyzing historical data with the machine learning models, you get the information you need to make decisions about staffing and resources.
Getting started using forecasting
Forecasts can be run from the Single Metric Viewer of existing machine learning jobs. Once a system has been upgraded to version 6.1, a new button for forecasting the job appears in the upper right corner.
The prediction for the machine learning job is drawn as a dark yellow trend line, with the confidence interval shown as a lighter yellow band around it. A thin band indicates greater confidence in the prediction. The band becomes thicker as the model gets less confident.
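The same idea can be expressed numerically. The sketch below measures the band width relative to the predicted value for a single forecast bucket; the field names forecast_lower, forecast_upper, and forecast_prediction are assumptions about the stored results, the helper name is hypothetical, and the two buckets are synthetic.

```python
def relative_band_width(bucket):
    """Width of the confidence band as a fraction of the predicted value."""
    width = bucket["forecast_upper"] - bucket["forecast_lower"]
    prediction = abs(bucket["forecast_prediction"])
    # Avoid dividing by zero when the prediction itself is zero.
    return float("inf") if prediction == 0 else width / prediction


# Synthetic buckets: one close to "now", one far into the future.
near = {"forecast_lower": 95.0, "forecast_prediction": 100.0, "forecast_upper": 105.0}
far = {"forecast_lower": 70.0, "forecast_prediction": 100.0, "forecast_upper": 130.0}

print(relative_band_width(near))
print(relative_band_width(far))
```

A larger ratio corresponds to a thicker band in the chart and therefore to a less confident prediction.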
Considerations when building a forecast
There are several details to consider when building forecasts so that you can better understand the results. Forecast results may not look like what you expected, and forecasting does not work with every dataset.
It is recommended that you collect enough historical data before attempting to run a machine learning job for forecasting. The sweet spot is usually about 3 weeks or 3 full intervals of periodic data. If you run a forecast too early in the learning phase, before the model can be established, it will likely display unusable results.
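One possible reading of this rule of thumb, as a small sketch: if the period of the data is known, require three full periods, otherwise fall back to three weeks. The function name and defaults are hypothetical, and the thresholds are the guideline from the text, not a limit enforced by the product.

```python
from datetime import timedelta


def has_enough_history(history_span, period=None, min_periods=3,
                       fallback=timedelta(weeks=3)):
    """Apply the rule of thumb: three full periods if known, else three weeks."""
    required = period * min_periods if period is not None else fallback
    return history_span >= required


# Ten days of data with no known periodicity: below the three-week guideline.
print(has_enough_history(timedelta(days=10)))
# Two weeks of data with a weekly pattern: only two full periods so far.
print(has_enough_history(timedelta(days=14), period=timedelta(weeks=1)))
# Three days of data with a daily pattern: three full periods.
print(has_enough_history(timedelta(days=3), period=timedelta(days=1)))
```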
If the forecast confidence bounds grow beyond reasonable limits, the forecast stops early and a message appears indicating that the confidence level fell outside acceptable limits.
Forecast results are much easier to understand if model plot is turned ‘on’. This is an option, and on single-metric jobs, it is on by default. For multi-metric jobs, model plot can be turned on by setting the ‘model_plot_config’ option in the machine learning job configuration.
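As an illustration, the sketch below adds a model_plot_config section to a job configuration expressed as a Python dict. The detector, field names, and bucket span are made up for the example, and the helper with_model_plot is hypothetical; only the shape of model_plot_config is the point here.

```python
import json


def with_model_plot(job_config):
    """Return a copy of the job configuration with model plot enabled."""
    updated = dict(job_config)
    updated["model_plot_config"] = {"enabled": True}
    return updated


# Hypothetical multi-metric job: mean disk usage, split per host.
job = {
    "analysis_config": {
        "bucket_span": "15m",
        "detectors": [
            {
                "function": "mean",
                "field_name": "disk_used_pct",
                "partition_field_name": "host",
            }
        ],
    },
    "data_description": {"time_field": "@timestamp"},
}

print(json.dumps(with_model_plot(job), indent=2))
```

Model plot stores extra results for every bucket, so on jobs with many partitions it is worth weighing the additional index space against the easier-to-read charts.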
To help track a specific forecast outside of the Single Metric Viewer, each forecast is given a unique ID, called the forecast_id, so every forecast can be queried separately. Multiple forecasts can be run for the same metric, but the UI only displays the last five forecasts run for any single metric. All forecasts remain available, however, and take up the corresponding index space. Forecast results are automatically deleted after 14 days if you run them from the UI, while direct API usage allows you to specify the data expiration. See the forecast documentation for the details.
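Querying one forecast by its ID could look roughly like the sketch below, which only builds the search body. It assumes forecast results live in the .ml-anomalies-* indices as documents with result_type set to model_forecast, alongside job_id, forecast_id, and timestamp fields; verify these names against your own cluster before relying on them. The job name and forecast ID shown are placeholders.

```python
import json


def forecast_results_query(job_id, forecast_id, size=1000):
    """Build a search body that returns the buckets of a single forecast."""
    return {
        "size": size,
        "sort": [{"timestamp": "asc"}],
        "query": {
            "bool": {
                "filter": [
                    {"term": {"job_id": job_id}},
                    # Assumption: forecast buckets are stored with this result type.
                    {"term": {"result_type": "model_forecast"}},
                    {"term": {"forecast_id": forecast_id}},
                ]
            }
        },
    }


# Placeholder values; use the ID returned when the forecast was created.
search_body = forecast_results_query("disk-usage", "example-forecast-id")
print("GET /.ml-anomalies-*/_search")
print(json.dumps(search_body, indent=2))
```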
Machine Learning comes with an Elastic Platinum subscription, but you can download a free trial of X-Pack and try it out.