Data frame analytics job resources
Data frame analytics resources relate to APIs such as Create data frame analytics jobs and Get data frame analytics jobs.
Properties
analysis
- (object) The type of analysis that is performed on the source. For example: outlier_detection. For more information, see Analysis objects.
analyzed_fields
- (object) Specifies includes and/or excludes patterns that select the fields to analyze. If analyzed_fields is not set, only the relevant fields are included; for example, all the numeric fields for outlier detection. For the supported field types, see Supported fields.
  includes
  - (array) An array of strings that defines the fields that will be included in the analysis.
  excludes
  - (array) An array of strings that defines the fields that will be excluded from the analysis.
PUT _ml/data_frame/analytics/loganalytics
{
  "source": {
    "index": "logdata"
  },
  "dest": {
    "index": "logdata_out"
  },
  "analysis": {
    "outlier_detection": {
    }
  },
  "analyzed_fields": {
    "includes": [ "request.bytes", "response.counts.error" ],
    "excludes": [ "source.geo" ]
  }
}
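To make the interaction between includes and excludes concrete, here is a minimal client-side sketch of how such patterns could narrow a list of candidate fields. The function select_analyzed_fields is hypothetical, and its rules are assumptions rather than documented behavior: patterns are treated as shell-style wildcards, a missing includes list keeps every candidate, and excludes are applied after includes.

```python
from fnmatch import fnmatchcase


def select_analyzed_fields(candidate_fields, includes=None, excludes=None):
    """Return the candidate fields kept by includes/excludes patterns.

    Assumptions (hypothetical helper, not the Elasticsearch implementation):
    patterns are shell-style wildcards, an empty or missing includes list
    keeps every candidate, and excludes are applied after includes.
    """
    includes = includes or []
    excludes = excludes or []
    selected = []
    for field in candidate_fields:
        if includes and not any(fnmatchcase(field, p) for p in includes):
            continue  # not matched by any include pattern
        if any(fnmatchcase(field, p) for p in excludes):
            continue  # explicitly excluded
        selected.append(field)
    return selected


# Hypothetical fields found in the "logdata" source index.
fields = ["request.bytes", "response.counts.error", "response.counts.ok", "source.geo"]
print(select_analyzed_fields(
    fields,
    includes=["request.bytes", "response.counts.*"],
    excludes=["response.counts.ok"],
))  # ['request.bytes', 'response.counts.error']
```

Applying excludes last means an exclude pattern always wins over a broader include pattern, which is the usual convention for include/exclude filters.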
dest
- (object) The destination configuration of the analysis.
  index
  - (Required, string) Defines the destination index to store the results of the data frame analytics job.
  results_field
  - (Optional, string) Defines the name of the field in which to store the results of the analysis. Defaults to ml.
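Because results_field is optional, code that reads the destination index has to fall back to the default ml. The sketch below shows that lookup. The helper names results_field_name and read_results are hypothetical, and the keys inside the sample document are placeholders, not the exact fields a job writes.

```python
DEFAULT_RESULTS_FIELD = "ml"  # documented default for dest.results_field


def results_field_name(dest):
    """Return the field that holds analysis results for a dest config."""
    if "index" not in dest:
        raise ValueError("dest.index is required")
    return dest.get("results_field", DEFAULT_RESULTS_FIELD)


def read_results(document, dest):
    """Return the results object stored in one destination document."""
    return document.get(results_field_name(dest), {})


# Hypothetical destination document; the keys under "ml" are placeholders.
doc = {"request.bytes": 1024, "ml": {"score": 0.93}}
print(read_results(doc, {"index": "logdata_out"}))  # {'score': 0.93}
print(results_field_name({"index": "logdata_out", "results_field": "analysis"}))  # analysis
```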
id
- (string) The unique identifier for the data frame analytics job. This identifier can contain lowercase alphanumeric characters (a-z and 0-9), hyphens, and underscores. It must start and end with alphanumeric characters. This property is informational; you cannot change the identifier for existing jobs.
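The naming rule for id can be checked on the client before a job is created. The following is a minimal sketch; is_valid_job_id and JOB_ID_PATTERN are hypothetical names, Elasticsearch performs its own validation on the server, and any length limit is not covered because none is stated here.

```python
import re

# Lowercase letters, digits, hyphens and underscores; the first and last
# characters must be alphanumeric (a single alphanumeric character is allowed).
JOB_ID_PATTERN = re.compile(r"[a-z0-9](?:[a-z0-9_-]*[a-z0-9])?")


def is_valid_job_id(job_id):
    """Check a proposed data frame analytics job ID against the naming rule."""
    return JOB_ID_PATTERN.fullmatch(job_id) is not None


for candidate in ["loganalytics", "log-analytics_2", "-logs", "logs_", "LogData"]:
    print(candidate, is_valid_job_id(candidate))
# loganalytics True, log-analytics_2 True, -logs False, logs_ False, LogData False
```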
model_memory_limit
- (string) The approximate maximum amount of memory resources that are permitted for analytical processing. The default value for data frame analytics jobs is 1gb. If your elasticsearch.yml file contains an xpack.ml.max_model_memory_limit setting, an error occurs when you try to create data frame analytics jobs that have model_memory_limit values greater than that setting. For more information, see Machine learning settings.
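The create-time check against xpack.ml.max_model_memory_limit amounts to comparing two byte sizes. This sketch mimics that comparison under stated assumptions: parse_byte_size and check_model_memory_limit are hypothetical helpers, units are treated as binary multiples (1kb = 1024 bytes), only whole numbers with the suffixes b, kb, mb, gb, tb, and pb are handled, and the error text is illustrative rather than the message Elasticsearch returns.

```python
import re

# Assumption: binary multiples, integer values only.
_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4, "pb": 1024**5}
_SIZE = re.compile(r"(\d+)\s*(b|kb|mb|gb|tb|pb)")


def parse_byte_size(value):
    """Convert a string such as "1gb" or "512mb" to a number of bytes."""
    match = _SIZE.fullmatch(value.strip().lower())
    if match is None:
        raise ValueError(f"unsupported byte size: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit]


def check_model_memory_limit(model_memory_limit="1gb", max_model_memory_limit=None):
    """Reject a job limit above the node-level maximum; return the limit in bytes."""
    requested = parse_byte_size(model_memory_limit)
    if max_model_memory_limit is not None and requested > parse_byte_size(max_model_memory_limit):
        raise ValueError(
            f"model_memory_limit [{model_memory_limit}] exceeds "
            f"xpack.ml.max_model_memory_limit [{max_model_memory_limit}]"
        )
    return requested


print(check_model_memory_limit())                 # 1073741824 (default 1gb)
print(check_model_memory_limit("512mb", "1gb"))   # 536870912
```

Calling check_model_memory_limit("2gb", "1gb") raises ValueError, which corresponds to the error described above.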
source
- (object) The source configuration, consisting of an index and, optionally, a query object.
  index
  - (Required, string or array) The index or indices on which to perform the analysis. It can be a single index or index pattern, or an array of indices or patterns.
  query
  - (Optional, object) The Elasticsearch query domain-specific language (DSL). This value corresponds to the query object in an Elasticsearch search POST body. All the options that are supported by Elasticsearch can be used, as this object is passed verbatim to Elasticsearch. By default, this property has the following value: {"match_all": {}}.
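Since source.index accepts either a string or an array and source.query defaults to match_all, a client may want to normalize a source object before inspecting it. The sketch below is a hypothetical helper (normalize_source is not part of any Elasticsearch client); Elasticsearch applies the documented default on its own, so this only mirrors that behavior locally.

```python
import copy

DEFAULT_QUERY = {"match_all": {}}  # documented default for source.query


def normalize_source(source):
    """Return a copy of a source object with index as a list and a query set."""
    if "index" not in source:
        raise ValueError("source.index is required")
    normalized = copy.deepcopy(source)
    index = normalized["index"]
    # A single index or pattern is accepted, as well as an array of them.
    normalized["index"] = [index] if isinstance(index, str) else list(index)
    normalized.setdefault("query", copy.deepcopy(DEFAULT_QUERY))
    return normalized


print(normalize_source({"index": "logdata"}))
# {'index': ['logdata'], 'query': {'match_all': {}}}
print(normalize_source({"index": ["logdata-*", "archive"], "query": {"term": {"status": "error"}}}))
```

Deep copies keep the caller's dictionary and the shared default query from being mutated later.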
Analysis objects
Data frame analytics resources contain analysis objects. For example, when you create a data frame analytics job, you must define the type of analysis it performs. Currently, outlier_detection is the only available type of analysis; however, other types, such as regression, will be added.
Outlier detection configuration objects
An outlier detection configuration object has the following properties:
feature_influence_threshold
- (double) The minimum outlier score that a document needs to have in order to calculate its feature influence score. Value range: 0-1 (0.1 by default).
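The threshold acts as a filter: only documents whose outlier score reaches it get feature influence scores. This sketch shows that selection step. The function name is hypothetical, and treating the boundary as inclusive (a score equal to the threshold qualifies) is an assumption based on the word "minimum".

```python
def documents_needing_feature_influence(scores, feature_influence_threshold=0.1):
    """Return IDs of documents whose outlier score reaches the threshold.

    Hypothetical helper; the inclusive comparison (>=) is an assumption.
    """
    if not 0.0 <= feature_influence_threshold <= 1.0:
        raise ValueError("feature_influence_threshold must be in the range 0-1")
    return [doc_id for doc_id, score in scores.items() if score >= feature_influence_threshold]


# Hypothetical outlier scores keyed by document ID.
scores = {"doc-1": 0.02, "doc-2": 0.1, "doc-3": 0.87}
print(documents_needing_feature_influence(scores))       # ['doc-2', 'doc-3']
print(documents_needing_feature_influence(scores, 0.5))  # ['doc-3']
```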
method
- (string) Sets the method that outlier detection uses. If the method is not set, outlier detection uses an ensemble of different methods and normalizes and combines their individual outlier scores to obtain the overall outlier score. We recommend using the ensemble method. Available methods are lof, ldof, distance_kth_nn, and distance_knn.
n_neighbors
- (integer) Defines how many nearest neighbors each method of outlier detection uses to calculate its outlier score. When the value is not set, different values are used for different ensemble members, which helps improve diversity in the ensemble. Therefore, only override this value if you are confident that the value you choose is appropriate for the data set.
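To illustrate how method and n_neighbors fit together, the sketch below implements textbook versions of two distance-based scores named after the documented methods (distance_kth_nn: distance to the k-th nearest neighbor; distance_knn: mean distance to the k nearest neighbors) and a simple ensemble. This is not the Elasticsearch implementation: lof and ldof are omitted, and the min-max normalization and plain averaging used to combine members are assumptions chosen for clarity. Giving each member its own k mirrors the documented behavior when n_neighbors is not set.

```python
import math


def knn_distances(points, n_neighbors):
    """For each point, return sorted distances to its n_neighbors nearest neighbors."""
    if not 1 <= n_neighbors < len(points):
        raise ValueError("n_neighbors must be between 1 and len(points) - 1")
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[:n_neighbors])
    return result


def distance_kth_nn(points, n_neighbors):
    """Outlier score = distance to the k-th nearest neighbor."""
    return [d[-1] for d in knn_distances(points, n_neighbors)]


def distance_knn(points, n_neighbors):
    """Outlier score = mean distance to the k nearest neighbors."""
    return [sum(d) / len(d) for d in knn_distances(points, n_neighbors)]


def min_max(scores):
    """Rescale scores to 0-1 (assumed normalization, for illustration only)."""
    low, high = min(scores), max(scores)
    if high == low:
        return [0.0 for _ in scores]
    return [(s - low) / (high - low) for s in scores]


def ensemble(points, members):
    """Average the normalized scores of (method, n_neighbors) ensemble members."""
    normalized = [min_max(method(points, k)) for method, k in members]
    return [sum(column) / len(column) for column in zip(*normalized)]


points = [(0, 0), (0, 1), (1, 0), (1, 1), (8, 8)]
scores = ensemble(points, [(distance_kth_nn, 1), (distance_knn, 3)])
print(max(range(len(points)), key=scores.__getitem__))  # 4, the isolated point
```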