Getting started with machine learning

Machine learning features analyze your data and generate models for its patterns of behavior. The type of analysis that you choose depends on the questions or problems you want to address and the type of data you have available.

Unsupervised machine learning

There are two types of analysis that can deduce the patterns and relationships within your data without training or intervention: anomaly detection and outlier detection.

Anomaly detection requires time series data. It constructs a probability model and can run continuously to identify unusual events as they occur. The model evolves over time; you can use its insights to forecast future behavior.
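To make the idea of a continuously evolving probability model concrete, here is a minimal sketch in plain Python. It assumes a simple running Gaussian model, which is an illustrative stand-in and not the model the machine learning features actually build; the function name `anomaly_scores`, the `warmup` parameter, and the sample series are all hypothetical.

```python
import math

def anomaly_scores(values, warmup=5):
    """Score each point by how improbable it is under a running Gaussian model."""
    scores = []
    count, mean, m2 = 0, 0.0, 0.0  # Welford running statistics
    for x in values:
        if count >= warmup and m2 > 0:
            std = math.sqrt(m2 / (count - 1))
            z = abs(x - mean) / std
            # Two-sided tail probability of a value at least this extreme
            p = math.erfc(z / math.sqrt(2))
            scores.append(1.0 - p)
        else:
            scores.append(0.0)  # not enough history to judge yet
        # Update the model after scoring, so it evolves with the stream
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    return scores

# Hypothetical time series with one obvious spike
series = [10, 11, 9, 10, 12, 10, 11, 50, 10, 9]
scores = anomaly_scores(series)
flagged = [i for i, s in enumerate(scores) if s > 0.999]
print(flagged)
```

Each value is scored before it is folded into the model, which mirrors the notion of identifying unusual events as they occur while the model keeps learning.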

Outlier detection does not require time series data; it identifies unusual points in a data set by analyzing how close each data point is to others and the density of the cluster of points around it. It does not run continuously; it generates a copy of your data set where each data point is annotated with an outlier score. The score indicates the extent to which a data point is an outlier compared to other data points.
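The distance-based part of that description can be sketched as follows. This is a simplified illustration that assumes a single technique (mean distance to the k nearest neighbors, rescaled to the range 0 to 1); it is not the scoring method the product uses, and the names `outlier_scores`, `k`, and the sample points are hypothetical.

```python
import math

def outlier_scores(points, k=2):
    """Score each point by its mean distance to its k nearest neighbors, scaled to [0, 1]."""
    raw = []
    for i, p in enumerate(points):
        # Distances from this point to every other point, nearest first
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        raw.append(sum(dists[:k]) / k)
    lo, hi = min(raw), max(raw)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all scores are equal
    return [(r - lo) / span for r in raw]

# Hypothetical data set: four points in a tight cluster and one far away
points = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (1.0, 1.2), (8.0, 8.0)]

# The result is a copy of the data where each point carries an outlier score
annotated = [
    {"point": p, "outlier_score": s}
    for p, s in zip(points, outlier_scores(points))
]
for row in annotated:
    print(row)
```

The output is a one-off annotated copy of the input rather than a continuously running analysis, which matches the batch nature of outlier detection described above.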

Supervised machine learning

There are two types of analysis that require training data sets: classification and regression.

In both cases, the result is twofold: a copy of your data set where each data point is annotated with predictions, and a trained model that you can deploy to make predictions for new data. For more information, refer to Introduction to supervised learning.

Classification learns relationships between your data points in order to predict discrete categorical values, such as whether a DNS request originates from a malicious or benign domain.

Regression learns relationships between your data points in order to predict continuous numerical values, such as the response time for a web request.
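The difference between the two can be sketched with deliberately tiny models. The example assumes ordinary least squares for regression and a nearest-centroid rule for classification, chosen only for brevity and not because the product uses them; the feature values, labels, and function names are all hypothetical.

```python
from statistics import mean

def train_regression(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

def train_classifier(xs, labels):
    """Nearest-centroid rule: predict the label whose mean feature value is closest."""
    centroids = {
        lab: mean(x for x, l in zip(xs, labels) if l == lab) for lab in set(labels)
    }
    return lambda x: min(centroids, key=lambda lab: abs(x - centroids[lab]))

# Regression: hypothetical payload sizes (KB) and response times (ms)
sizes = [1, 2, 3, 4]
times = [12, 14, 16, 18]
predict_time = train_regression(sizes, times)

# Classification: hypothetical single numeric feature per DNS request
feature = [2.1, 2.4, 2.0, 3.9, 4.2, 4.0]
labels = ["benign", "benign", "benign", "malicious", "malicious", "malicious"]
predict_label = train_classifier(feature, labels)

# Annotate the training data with predictions, then score unseen data
annotated = [(x, predict_label(x)) for x in feature]
print(annotated)
print(predict_time(5), predict_label(3.8))
```

In both cases the training step returns a reusable model, and applying it to the training data yields the annotated copy; only the type of the predicted value differs.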

Try it out

Ready to take machine learning for a test drive? Follow this tutorial to:

  • Try out the Data Visualizer
  • Create anomaly detection jobs for the Kibana sample data
  • Use the results to identify possible anomalies in the data

At the end of this tutorial, you should have a good idea of what machine learning is and will hopefully be inspired to use it to detect anomalies in your own data.

Need more context? Check out the Elasticsearch introduction to learn the lingo and understand the basics of how Elasticsearch works.

  1. Before you can play with the machine learning features, you must install Elasticsearch and Kibana. Elasticsearch stores the data and the analysis results. Kibana provides a helpful user interface for creating and viewing jobs.

    You can run Elasticsearch and Kibana on your own hardware, or use our hosted Elasticsearch Service on Elastic Cloud. The Elasticsearch Service is available on both AWS and GCP. Try out the Elasticsearch Service for free.

  2. Verify that your environment is set up properly to use the machine learning features. If the Elasticsearch security features are enabled, you need a user that has the authority to manage anomaly detection jobs to complete this tutorial. See Setup and security.
  3. Add the sample data sets that ship with Kibana.

    1. From the Kibana home page, click Add data, then select Sample data.
    2. Pick a data set. In this tutorial, you’ll use the Sample web logs. While you’re here, feel free to click Add data on all of the available sample data sets.

These data sets are now ready to be analyzed in machine learning jobs in Kibana.
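Jobs are created through the Kibana interface in this tutorial, but it can help to see roughly what an anomaly detection job definition looks like as a request body. The sketch below only builds and prints the JSON; it does not contact a cluster. The job ID, description, and one-hour bucket span are hypothetical choices, and the `timestamp` time field is an assumption about the Sample web logs data that you should check against your own index.

```python
import json

# Hypothetical job name; any valid job ID could be used instead
job_id = "sample-web-logs-event-rate"

job_config = {
    "description": "Event rate of the Kibana sample web logs (illustrative)",
    "analysis_config": {
        # Interval over which events are summarized before modeling
        "bucket_span": "1h",
        # A single detector that models the number of events per bucket
        "detectors": [{"function": "count"}],
    },
    # Assumed name of the time field in the sample data
    "data_description": {"time_field": "timestamp"},
}

request_line = f"PUT _ml/anomaly_detectors/{job_id}"
print(request_line)
print(json.dumps(job_config, indent=2))
```

A job defined this way would also need a datafeed to supply it with documents from the sample index, which Kibana sets up for you when you create the job in the interface.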