Explain data frame analytics API


This functionality is in beta and is subject to change. The design and code are less mature than official GA features and are provided as-is with no warranties. Beta features are not subject to the support SLA of official GA features.

Explains the following aspects of a data frame analytics config:

  • field selection: which fields are included in or excluded from the analysis
  • memory estimation: how much memory the analysis is estimated to require. The estimate can be used later when choosing an appropriate value for the model_memory_limit setting.
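As a rough illustration of how an estimate might feed into a model_memory_limit value, the sketch below adds some headroom to an estimated byte count and rounds up to whole megabytes. The class ModelMemoryLimitSketch, the method limitWithHeadroom and the 20 percent headroom are assumptions made for this example; they are not part of the client, and the right amount of headroom depends on your data.

```java
// Hypothetical helper for illustration only; not part of the Elasticsearch client.
final class ModelMemoryLimitSketch {
    private static final long BYTES_PER_MB = 1024L * 1024L;

    /**
     * Adds a percentage of headroom to an estimated byte count and rounds the
     * result up to whole megabytes, formatted as a byte-size string such as "60mb".
     */
    static String limitWithHeadroom(long estimatedBytes, int headroomPercent) {
        if (estimatedBytes < 0 || headroomPercent < 0) {
            throw new IllegalArgumentException("arguments must be non-negative");
        }
        long padded = estimatedBytes + (estimatedBytes * headroomPercent) / 100;
        // Round up to the next whole megabyte, with a floor of 1 MB.
        long megabytes = Math.max(1L, (padded + BYTES_PER_MB - 1) / BYTES_PER_MB);
        return megabytes + "mb";
    }

    public static void main(String[] args) {
        // Assumed estimate of 50 MB with an arbitrary 20 percent headroom.
        System.out.println(limitWithHeadroom(50L * BYTES_PER_MB, 20));
    }
}
```

The estimated byte count here is a plain long; in practice it would be read from the memory estimation in the response.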

The API accepts an ExplainDataFrameAnalyticsRequest object and returns an ExplainDataFrameAnalyticsResponse.

Explain data frame analytics request


The request can be constructed with the id of an existing data frame analytics job.

ExplainDataFrameAnalyticsRequest request = new ExplainDataFrameAnalyticsRequest("existing_job_id"); 

Constructing a new request with the id of an existing data frame analytics job

It can also be constructed with a data frame analytics config to explain it before creating it.

DataFrameAnalyticsConfig config = DataFrameAnalyticsConfig.builder()
    .setSource(DataFrameAnalyticsSource.builder().setIndex("explain-df-test-source-index").build())
    .setAnalysis(org.elasticsearch.client.ml.dataframe.OutlierDetection.createDefault())
    .build();
request = new ExplainDataFrameAnalyticsRequest(config); 

Constructing a new request containing a data frame analytics config

Synchronous execution


When executing an ExplainDataFrameAnalyticsRequest in the following manner, the client waits for the ExplainDataFrameAnalyticsResponse to be returned before continuing with code execution:

ExplainDataFrameAnalyticsResponse response = client.machineLearning().explainDataFrameAnalytics(request,
    RequestOptions.DEFAULT);

Synchronous calls may throw an IOException if the high-level REST client fails to parse the REST response, if the request times out, or in similar cases where no response comes back from the server.

When the server returns a 4xx or 5xx error code, the high-level client instead tries to parse the error details from the response body. It then throws a generic ElasticsearchException and adds the original ResponseException to it as a suppressed exception.
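The suppressed exception can be recovered with the standard Throwable.getSuppressed() method. The following is a minimal sketch using only JDK types: FakeResponseException and findSuppressed are hypothetical names introduced for this example, and a plain RuntimeException stands in for the ElasticsearchException described above.

```java
import java.io.IOException;

// Illustration only: stand-in types, not the Elasticsearch client classes.
final class SuppressedCauseSketch {

    /** Hypothetical stand-in for the client's ResponseException. */
    static final class FakeResponseException extends IOException {
        final int status;

        FakeResponseException(int status, String message) {
            super(message);
            this.status = status;
        }
    }

    /** Returns the first suppressed exception of the given type, or null if there is none. */
    static <T extends Throwable> T findSuppressed(Throwable error, Class<T> type) {
        for (Throwable suppressed : error.getSuppressed()) {
            if (type.isInstance(suppressed)) {
                return type.cast(suppressed);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Simulate a wrapper exception that carries the original as suppressed.
        RuntimeException wrapper = new RuntimeException("parsed error details");
        wrapper.addSuppressed(new FakeResponseException(404, "not found"));

        FakeResponseException original = findSuppressed(wrapper, FakeResponseException.class);
        System.out.println(original.status);
    }
}
```

In real code the same lookup would be applied inside a catch block around the synchronous call, with the client's own exception types in place of the stand-ins.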

Asynchronous execution


An ExplainDataFrameAnalyticsRequest can also be executed asynchronously so that the client returns immediately. Users specify how the response or potential failures are handled by passing the request and a listener to the asynchronous explain-data-frame-analytics method:

client.machineLearning().explainDataFrameAnalyticsAsync(request, RequestOptions.DEFAULT, listener); 

The ExplainDataFrameAnalyticsRequest to execute and the ActionListener to use when the execution completes

The asynchronous method does not block and returns immediately. Once the execution completes, the ActionListener is called back through its onResponse method if the execution succeeded, or through its onFailure method if it failed. Failure scenarios and expected exceptions are the same as in the synchronous execution case.

A typical listener for explain-data-frame-analytics looks like:

ActionListener<ExplainDataFrameAnalyticsResponse> listener = new ActionListener<ExplainDataFrameAnalyticsResponse>() {
    @Override
    public void onResponse(ExplainDataFrameAnalyticsResponse response) {
        
    }

    @Override
    public void onFailure(Exception e) {
        
    }
};

onResponse is called when the execution completes successfully.

onFailure is called when the whole ExplainDataFrameAnalyticsRequest fails.
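One common way to consume a two-callback listener like this is to bridge it to a CompletableFuture, so that callers can compose or wait on the result. The sketch below shows the idea with JDK types only: the nested Listener interface is a hypothetical stand-in that mirrors the onResponse and onFailure callbacks, and the completing helper is an assumption of this example rather than a client method.

```java
import java.util.concurrent.CompletableFuture;

// Illustration only: Listener is a stand-in mirroring ActionListener's two callbacks.
final class ListenerFutureSketch {

    /** Hypothetical stand-in for the client's listener interface. */
    interface Listener<T> {
        void onResponse(T response);

        void onFailure(Exception e);
    }

    /** Returns a listener that completes the given future from whichever callback fires. */
    static <T> Listener<T> completing(CompletableFuture<T> future) {
        return new Listener<T>() {
            @Override
            public void onResponse(T response) {
                future.complete(response);
            }

            @Override
            public void onFailure(Exception e) {
                future.completeExceptionally(e);
            }
        };
    }

    public static void main(String[] args) {
        CompletableFuture<String> future = new CompletableFuture<>();
        Listener<String> listener = completing(future);

        // Simulate the asynchronous callback arriving with a result.
        listener.onResponse("done");
        System.out.println(future.join());
    }
}
```

With the real client, an ActionListener written in the same shape would be passed to the asynchronous method, and the future would complete when the response or failure arrives.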

Response


The returned ExplainDataFrameAnalyticsResponse contains the field selection and the memory usage estimation.

List<FieldSelection> fieldSelection = response.getFieldSelection(); 
MemoryEstimation memoryEstimation = response.getMemoryEstimation(); 

getFieldSelection() returns a list where each item explains whether a field was selected for analysis.

getMemoryEstimation() returns the memory estimation for the data frame analytics job.