Infer trained model API

Evaluates a trained model. The model may be any supervised model either trained by data frame analytics or imported.

For model deployments with caching enabled, results may be returned directly from the inference cache.

This functionality is in technical preview and may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features.

Request

POST _ml/trained_models/<model_id>/_infer

Path parameters

<model_id>
(Required, string) The unique identifier of the trained model.

Query parameters

timeout
(Optional, time) Controls the amount of time to wait for inference results. Defaults to 10 seconds.

Request body

docs
(Required, array) An array of objects to pass to the model for inference. The objects should contain the fields matching your configured trained model input. Typically for NLP models, the field name is text_field. Currently for NLP models, only a single value is allowed. For data frame analytics or imported classification or regression models, more than one value is allowed.
inference_config

(Optional, object) The default configuration for inference. This can be: regression, classification, fill_mask, ner, pass_through, question_answering, text_classification, text_embedding, or zero_shot_classification. If regression or classification, it must match the target_type of the underlying definition.trained_model. If it is an NLP task (fill_mask, ner, pass_through, question_answering, text_classification, text_embedding, or zero_shot_classification), the model_type must be pytorch.
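The request shape above can be sketched with the Python standard library. This minimal sketch only builds the POST request (method, path, JSON body) and does not send it; the helper name build_infer_request and the model ID my-ner-model are hypothetical, and calling a real cluster would still need an HTTP client and authentication.

```python
import json
from urllib.parse import quote, urlencode


def build_infer_request(model_id, docs, inference_config=None, timeout=None):
    """Return (method, path, body) for a trained model _infer call.

    Hypothetical helper: nothing is sent over the network.
    """
    # URL-encode the model ID so unusual characters cannot break the path.
    path = "/_ml/trained_models/" + quote(model_id, safe="") + "/_infer"
    if timeout is not None:
        # Optional query parameter, for example "30s".
        path += "?" + urlencode({"timeout": timeout})
    body = {"docs": docs}
    if inference_config is not None:
        # Only include the override when the caller supplies one.
        body["inference_config"] = inference_config
    return "POST", path, json.dumps(body)


method, path, body = build_infer_request(
    "my-ner-model",  # hypothetical model ID
    [{"text_field": "Hi my name is Josh and I live in Berlin"}],
    inference_config={"ner": {"tokenization": {"bert": {"truncate": "first"}}}},
    timeout="30s",
)
print(method, path)
print(body)
```

Leaving inference_config out of the body entirely, rather than sending null, keeps the request identical to the minimal examples later on this page.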

Properties of inference_config
classification

(Optional, object) Classification configuration for inference.

Properties of classification inference
num_top_classes
(Optional, integer) Specifies the number of top class predictions to return. Defaults to 0.
num_top_feature_importance_values
(Optional, integer) Specifies the maximum number of feature importance values per document. Defaults to 0, which means no feature importance calculation occurs.
prediction_field_type
(Optional, string) Specifies the type of the predicted field to write. Valid values are: string, number, boolean. When boolean is provided, 1.0 is transformed to true and 0.0 to false.
results_field
(Optional, string) The field that is added to incoming documents to contain the inference prediction. Defaults to predicted_value.
top_classes_results_field
(Optional, string) Specifies the field to which the top classes are written. Defaults to top_classes.
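The prediction_field_type rule for boolean can be illustrated with a small Python function. This is a sketch of the documented mapping only (1.0 to true, 0.0 to false), not Elasticsearch's implementation; the function name is hypothetical, and the handling of other values and of the string type are assumptions.

```python
def to_prediction_field_type(value, field_type):
    """Illustrative mapping of a raw numeric prediction to prediction_field_type."""
    if field_type == "boolean":
        # Documented rule: 1.0 -> true, 0.0 -> false.
        if value == 1.0:
            return True
        if value == 0.0:
            return False
        # Assumption: any other value cannot be represented as a boolean.
        raise ValueError(f"cannot map {value!r} to a boolean")
    if field_type == "number":
        return value
    if field_type == "string":
        # Assumption: plain string formatting of the value.
        return str(value)
    raise ValueError(f"unknown prediction_field_type {field_type!r}")


print(to_prediction_field_type(1.0, "boolean"))
print(to_prediction_field_type(0.0, "boolean"))
```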
fill_mask

(Optional, object) Configuration for a fill_mask natural language processing (NLP) task. The fill_mask task works with models optimized for a fill mask action. For example, for BERT models, the following text may be provided: "The capital of France is [MASK]." The response indicates the value most likely to replace [MASK]. In this instance, the most probable token is paris.
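To make the fill mask idea concrete, the sketch below takes a hypothetical mapping of candidate tokens to probabilities (standing in for the model's output at the [MASK] position), keeps the k most likely, and substitutes each into the text. The scores are invented for illustration, and k only loosely mirrors num_top_classes; the helper name is hypothetical.

```python
import heapq


def fill_mask_candidates(text, token_scores, k=1, mask="[MASK]"):
    """Return (filled_text, token, score) for the k highest-scoring tokens.

    token_scores is a hypothetical {token: probability} mapping for the
    mask position; a real model scores its whole vocabulary.
    """
    top = heapq.nlargest(k, token_scores.items(), key=lambda kv: kv[1])
    # Replace only the first mask occurrence with each candidate token.
    return [(text.replace(mask, token, 1), token, score) for token, score in top]


scores = {"paris": 0.9, "lyon": 0.05, "nice": 0.02}  # invented values
for filled, token, score in fill_mask_candidates(
    "The capital of France is [MASK].", scores, k=2
):
    print(f"{score:.2f}  {filled}")
```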

Properties of fill_mask inference
num_top_classes
(Optional, integer) Number of top predicted tokens to return for replacing the mask token. Defaults to 0.
results_field
(Optional, string) The field that is added to incoming documents to contain the inference prediction. Defaults to predicted_value.
tokenization

(Optional, object) Indicates the tokenization to perform and the desired settings. The default tokenization configuration is bert. Valid tokenization values are:

  • bert: Use for BERT-style models
  • mpnet: Use for MPNet-style models
  • roberta: Use for RoBERTa-style and BART-style models
Properties of tokenization
bert

(Optional, object) BERT-style tokenization is to be performed with the enclosed settings.

Properties of bert
truncate

(Optional, string) Indicates how tokens are truncated when they exceed max_sequence_length. The default value is first.

  • none: No truncation occurs; the inference request receives an error.
  • first: Only the first sequence is truncated.
  • second: Only the second sequence is truncated. If there is just one sequence, that sequence is truncated.

For zero_shot_classification, the hypothesis sequence is always the second sequence. Therefore, do not use second in this case.

roberta

(Optional, object) RoBERTa-style tokenization is to be performed with the enclosed settings.

Properties of roberta
truncate

(Optional, string) Indicates how tokens are truncated when they exceed max_sequence_length. The default value is first.

  • none: No truncation occurs; the inference request receives an error.
  • first: Only the first sequence is truncated.
  • second: Only the second sequence is truncated. If there is just one sequence, that sequence is truncated.

For zero_shot_classification, the hypothesis sequence is always the second sequence. Therefore, do not use second in this case.

mpnet

(Optional, object) MPNet-style tokenization is to be performed with the enclosed settings.

Properties of mpnet
truncate

(Optional, string) Indicates how tokens are truncated when they exceed max_sequence_length. The default value is first.

  • none: No truncation occurs; the inference request receives an error.
  • first: Only the first sequence is truncated.
  • second: Only the second sequence is truncated. If there is just one sequence, that sequence is truncated.

For zero_shot_classification, the hypothesis sequence is always the second sequence. Therefore, do not use second in this case.
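The three truncate options can be sketched on plain token lists. This is an illustration of the behavior described above, not Elasticsearch's tokenizer: the function name is hypothetical, special tokens such as [CLS] and [SEP] are ignored (a real tokenizer reserves room for them), and raising an error when the chosen sequence is too short is an assumption. The returned flag corresponds loosely to the is_truncated field in responses.

```python
def truncate_pair(first, second, max_len, strategy="first"):
    """Apply a truncate strategy to one or two token sequences.

    Returns (first, second, is_truncated). Special tokens are ignored,
    which is a simplification.
    """
    first = list(first)
    second = list(second or [])
    excess = len(first) + len(second) - max_len
    if excess <= 0:
        return first, second, False
    if strategy == "none":
        # No truncation: the inference request fails instead.
        raise ValueError(f"input exceeds max_sequence_length by {excess} tokens")
    if strategy not in ("first", "second"):
        raise ValueError(f"unknown truncate strategy {strategy!r}")
    # "second" falls back to the only sequence when there is no second one.
    cut_second = strategy == "second" and len(second) > 0
    target = second if cut_second else first
    if excess >= len(target):
        # Assumption: refuse to truncate a sequence away entirely.
        raise ValueError("the chosen sequence is too short to absorb the excess")
    if cut_second:
        return first, second[: len(second) - excess], True
    return first[: len(first) - excess], second, True


print(truncate_pair(list("abcdef"), list("xyz"), 7))
print(truncate_pair(list("abcdef"), list("xyz"), 7, "second"))
```

For zero_shot_classification the hypothesis is the second sequence, which is why the second strategy would cut into the label hypothesis rather than the input text.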

ner

(Optional, object) Configures a named entity recognition (NER) task. NER is a special case of token classification: each token in the sequence is classified according to the provided classification labels. Currently, the NER task requires the classification_labels to be in Inside-Outside-Beginning (IOB) format. Only person, organization, location, and miscellaneous entities are supported.
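IOB labels mark the first token of an entity with B, following tokens of the same entity with I, and everything else with O. The sketch below groups tagged tokens into entities to show how the format works; it is not Elasticsearch's post-processing. The function name is hypothetical, accepting both B-PER and B_PER separators is an assumption, and joining tokens with spaces ignores subword tokenization.

```python
def iob_to_entities(tokens, tags):
    """Group IOB-tagged tokens into (class_name, text) entities."""
    entities = []
    current = None  # [class_name, [tokens]] for the entity being built
    for token, tag in zip(tokens, tags):
        prefix, cls = tag[:2], tag[2:]
        starts_new = prefix in ("B-", "B_") or (
            prefix in ("I-", "I_") and (current is None or current[0] != cls)
        )
        if starts_new:
            # A B tag, or an I tag that does not continue the open entity.
            if current is not None:
                entities.append(current)
            current = [cls, [token]]
        elif prefix in ("I-", "I_"):
            current[1].append(token)  # continue the open entity
        else:
            # "O" (outside): close any open entity.
            if current is not None:
                entities.append(current)
            current = None
    if current is not None:
        entities.append(current)
    return [(cls, " ".join(toks)) for cls, toks in entities]


tokens = ["Hi", "my", "name", "is", "Josh", "and", "I", "live", "in", "New", "York"]
tags = ["O"] * 4 + ["B-PER"] + ["O"] * 4 + ["B-LOC", "I-LOC"]
print(iob_to_entities(tokens, tags))
```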

Properties of ner inference
results_field
(Optional, string) The field that is added to incoming documents to contain the inference prediction. Defaults to predicted_value.
tokenization

(Optional, object) Indicates the tokenization to perform and the desired settings. The default tokenization configuration is bert. Valid tokenization values are:

  • bert: Use for BERT-style models
  • mpnet: Use for MPNet-style models
  • roberta: Use for RoBERTa-style and BART-style models
Properties of tokenization
bert

(Optional, object) BERT-style tokenization is to be performed with the enclosed settings.

Properties of bert
truncate

(Optional, string) Indicates how tokens are truncated when they exceed max_sequence_length. The default value is first.

  • none: No truncation occurs; the inference request receives an error.
  • first: Only the first sequence is truncated.
  • second: Only the second sequence is truncated. If there is just one sequence, that sequence is truncated.

For zero_shot_classification, the hypothesis sequence is always the second sequence. Therefore, do not use second in this case.

roberta

(Optional, object) RoBERTa-style tokenization is to be performed with the enclosed settings.

Properties of roberta
truncate

(Optional, string) Indicates how tokens are truncated when they exceed max_sequence_length. The default value is first.

  • none: No truncation occurs; the inference request receives an error.
  • first: Only the first sequence is truncated.
  • second: Only the second sequence is truncated. If there is just one sequence, that sequence is truncated.

For zero_shot_classification, the hypothesis sequence is always the second sequence. Therefore, do not use second in this case.

mpnet

(Optional, object) MPNet-style tokenization is to be performed with the enclosed settings.

Properties of mpnet
truncate

(Optional, string) Indicates how tokens are truncated when they exceed max_sequence_length. The default value is first.

  • none: No truncation occurs; the inference request receives an error.
  • first: Only the first sequence is truncated.
  • second: Only the second sequence is truncated. If there is just one sequence, that sequence is truncated.

For zero_shot_classification, the hypothesis sequence is always the second sequence. Therefore, do not use second in this case.

pass_through

(Optional, object) Configures a pass_through task. This task is useful for debugging, as no post-processing is done to the inference output and the raw pooling layer results are returned to the caller.

Properties of pass_through inference
results_field
(Optional, string) The field that is added to incoming documents to contain the inference prediction. Defaults to predicted_value.
tokenization

(Optional, object) Indicates the tokenization to perform and the desired settings. The default tokenization configuration is bert. Valid tokenization values are:

  • bert: Use for BERT-style models
  • mpnet: Use for MPNet-style models
  • roberta: Use for RoBERTa-style and BART-style models
Properties of tokenization
bert

(Optional, object) BERT-style tokenization is to be performed with the enclosed settings.

Properties of bert
truncate

(Optional, string) Indicates how tokens are truncated when they exceed max_sequence_length. The default value is first.

  • none: No truncation occurs; the inference request receives an error.
  • first: Only the first sequence is truncated.
  • second: Only the second sequence is truncated. If there is just one sequence, that sequence is truncated.

For zero_shot_classification, the hypothesis sequence is always the second sequence. Therefore, do not use second in this case.

roberta

(Optional, object) RoBERTa-style tokenization is to be performed with the enclosed settings.

Properties of roberta
truncate

(Optional, string) Indicates how tokens are truncated when they exceed max_sequence_length. The default value is first.

  • none: No truncation occurs; the inference request receives an error.
  • first: Only the first sequence is truncated.
  • second: Only the second sequence is truncated. If there is just one sequence, that sequence is truncated.

For zero_shot_classification, the hypothesis sequence is always the second sequence. Therefore, do not use second in this case.

mpnet

(Optional, object) MPNet-style tokenization is to be performed with the enclosed settings.

Properties of mpnet
truncate

(Optional, string) Indicates how tokens are truncated when they exceed max_sequence_length. The default value is first.

  • none: No truncation occurs; the inference request receives an error.
  • first: Only the first sequence is truncated.
  • second: Only the second sequence is truncated. If there is just one sequence, that sequence is truncated.

For zero_shot_classification, the hypothesis sequence is always the second sequence. Therefore, do not use second in this case.

question_answering

(Optional, object) Configures a question answering natural language processing (NLP) task. Question answering is useful for extracting answers for certain questions from a large corpus of text.

Properties of question_answering inference
max_answer_length
(Optional, integer) The maximum number of words in the answer. Defaults to 15.
num_top_classes
(Optional, integer) The number of top answers to return. Defaults to 0, meaning only the best answer is returned.
question
(Required, string) The question to use when extracting an answer.
results_field
(Optional, string) The field that is added to incoming documents to contain the inference prediction. Defaults to predicted_value.
tokenization

(Optional, object) Indicates the tokenization to perform and the desired settings. The default tokenization configuration is bert. Valid tokenization values are:

  • bert: Use for BERT-style models
  • mpnet: Use for MPNet-style models
  • roberta: Use for RoBERTa-style and BART-style models

It is recommended to set max_sequence_length to 386, span to 128, and truncate to none.

Properties of tokenization
bert

(Optional, object) BERT-style tokenization is to be performed with the enclosed settings.

Properties of bert
span

(Optional, integer) When truncate is none, you can partition longer text sequences for inference. The value indicates how many tokens overlap between each subsequence.

The default value is -1, indicating no windowing or spanning occurs.

When your typical input is just slightly larger than max_sequence_length, it may be best to simply truncate; there will be very little information in the second subsequence.

truncate

(Optional, string) Indicates how tokens are truncated when they exceed max_sequence_length. The default value is first.

  • none: No truncation occurs; the inference request receives an error.
  • first: Only the first sequence is truncated.
  • second: Only the second sequence is truncated. If there is just one sequence, that sequence is truncated.

For zero_shot_classification, the hypothesis sequence is always the second sequence. Therefore, do not use second in this case.

roberta

(Optional, object) RoBERTa-style tokenization is to be performed with the enclosed settings.

Properties of roberta
span

(Optional, integer) When truncate is none, you can partition longer text sequences for inference. The value indicates how many tokens overlap between each subsequence.

The default value is -1, indicating no windowing or spanning occurs.

When your typical input is just slightly larger than max_sequence_length, it may be best to simply truncate; there will be very little information in the second subsequence.

truncate

(Optional, string) Indicates how tokens are truncated when they exceed max_sequence_length. The default value is first.

  • none: No truncation occurs; the inference request receives an error.
  • first: Only the first sequence is truncated.
  • second: Only the second sequence is truncated. If there is just one sequence, that sequence is truncated.

For zero_shot_classification, the hypothesis sequence is always the second sequence. Therefore, do not use second in this case.

mpnet

(Optional, object) MPNet-style tokenization is to be performed with the enclosed settings.

Properties of mpnet
span

(Optional, integer) When truncate is none, you can partition longer text sequences for inference. The value indicates how many tokens overlap between each subsequence.

The default value is -1, indicating no windowing or spanning occurs.

When your typical input is just slightly larger than max_sequence_length, it may be best to simply truncate; there will be very little information in the second subsequence.

truncate

(Optional, string) Indicates how tokens are truncated when they exceed max_sequence_length. The default value is first.

  • none: No truncation occurs; the inference request receives an error.
  • first: Only the first sequence is truncated.
  • second: Only the second sequence is truncated. If there is just one sequence, that sequence is truncated.

For zero_shot_classification, the hypothesis sequence is always the second sequence. Therefore, do not use second in this case.
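The span setting can be pictured as a sliding window: each window holds at most max_sequence_length tokens, and neighboring windows share span tokens. The sketch below applies the recommended question answering values (386 and 128) to an arbitrary 1,000-token input. It is illustrative only: the function name is hypothetical, and it ignores special tokens and the question tokens, which a real tokenizer must also fit into each window, so actual windows carry fewer text tokens.

```python
def span_windows(tokens, max_sequence_length, span=-1):
    """Split a token list into windows that overlap by `span` tokens.

    span = -1 disables windowing, matching the documented default.
    """
    if span < 0 or len(tokens) <= max_sequence_length:
        return [list(tokens)]
    if span >= max_sequence_length:
        raise ValueError("span must be smaller than max_sequence_length")
    stride = max_sequence_length - span  # how far each window moves forward
    windows = []
    start = 0
    while True:
        windows.append(list(tokens[start:start + max_sequence_length]))
        if start + max_sequence_length >= len(tokens):
            break  # this window already reaches the end of the input
        start += stride
    return windows


windows = span_windows(list(range(1000)), 386, 128)
print(len(windows), [w[0] for w in windows])
```

With a large overlap, each window repeats much of the previous one, so more windows are needed; this is the trade-off behind the note that inputs only slightly longer than max_sequence_length are often better simply truncated.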

regression

(Optional, object) Regression configuration for inference.

Properties of regression inference
num_top_feature_importance_values
(Optional, integer) Specifies the maximum number of feature importance values per document. By default, it is zero and no feature importance calculation occurs.
results_field
(Optional, string) The field that is added to incoming documents to contain the inference prediction. Defaults to predicted_value.
text_classification

(Optional, object) A text classification task. Text classification classifies a provided text sequence into previously known target classes. A specific example of this is sentiment analysis, which returns the likely target classes indicating text sentiment, such as "sad", "happy", or "angry".

Properties of text_classification inference
classification_labels
(Optional, array) An array of classification labels.
num_top_classes
(Optional, integer) Specifies the number of top class predictions to return. Defaults to all classes (-1).
results_field
(Optional, string) The field that is added to incoming documents to contain the inference prediction. Defaults to predicted_value.
tokenization

(Optional, object) Indicates the tokenization to perform and the desired settings. The default tokenization configuration is bert. Valid tokenization values are:

  • bert: Use for BERT-style models
  • mpnet: Use for MPNet-style models
  • roberta: Use for RoBERTa-style and BART-style models
Properties of tokenization
bert

(Optional, object) BERT-style tokenization is to be performed with the enclosed settings.

Properties of bert
span

(Optional, integer) When truncate is none, you can partition longer text sequences for inference. The value indicates how many tokens overlap between each subsequence.

The default value is -1, indicating no windowing or spanning occurs.

When your typical input is just slightly larger than max_sequence_length, it may be best to simply truncate; there will be very little information in the second subsequence.

truncate

(Optional, string) Indicates how tokens are truncated when they exceed max_sequence_length. The default value is first.

  • none: No truncation occurs; the inference request receives an error.
  • first: Only the first sequence is truncated.
  • second: Only the second sequence is truncated. If there is just one sequence, that sequence is truncated.

For zero_shot_classification, the hypothesis sequence is always the second sequence. Therefore, do not use second in this case.

roberta

(Optional, object) RoBERTa-style tokenization is to be performed with the enclosed settings.

Properties of roberta
span

(Optional, integer) When truncate is none, you can partition longer text sequences for inference. The value indicates how many tokens overlap between each subsequence.

The default value is -1, indicating no windowing or spanning occurs.

When your typical input is just slightly larger than max_sequence_length, it may be best to simply truncate; there will be very little information in the second subsequence.

truncate

(Optional, string) Indicates how tokens are truncated when they exceed max_sequence_length. The default value is first.

  • none: No truncation occurs; the inference request receives an error.
  • first: Only the first sequence is truncated.
  • second: Only the second sequence is truncated. If there is just one sequence, that sequence is truncated.

For zero_shot_classification, the hypothesis sequence is always the second sequence. Therefore, do not use second in this case.

mpnet

(Optional, object) MPNet-style tokenization is to be performed with the enclosed settings.

Properties of mpnet
span

(Optional, integer) When truncate is none, you can partition longer text sequences for inference. The value indicates how many tokens overlap between each subsequence.

The default value is -1, indicating no windowing or spanning occurs.

When your typical input is just slightly larger than max_sequence_length, it may be best to simply truncate; there will be very little information in the second subsequence.

truncate

(Optional, string) Indicates how tokens are truncated when they exceed max_sequence_length. The default value is first.

  • none: No truncation occurs; the inference request receives an error.
  • first: Only the first sequence is truncated.
  • second: Only the second sequence is truncated. If there is just one sequence, that sequence is truncated.

For zero_shot_classification, the hypothesis sequence is always the second sequence. Therefore, do not use second in this case.

text_embedding

(Optional, object) Text embedding takes an input sequence and transforms it into a vector of numbers. These embeddings capture not simply tokens, but semantic meaning and context. They can be stored in a dense_vector field and compared with each other, for example for similarity search.
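A common way to compare two embeddings is cosine similarity, which is close to 1 for vectors pointing in the same direction. The sketch below assumes each embedding is available as a list of floats (for example, taken from predicted_value of a text_embedding response); the three-dimensional vectors are invented for readability, whereas real models produce hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


query = [0.1, 0.3, 0.5]       # invented embedding
candidate = [0.2, 0.6, 1.0]   # same direction, twice the length
print(round(cosine_similarity(query, candidate), 6))
```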

Properties of text_embedding inference
results_field
(Optional, string) The field that is added to incoming documents to contain the inference prediction. Defaults to predicted_value.
tokenization

(Optional, object) Indicates the tokenization to perform and the desired settings. The default tokenization configuration is bert. Valid tokenization values are:

  • bert: Use for BERT-style models
  • mpnet: Use for MPNet-style models
  • roberta: Use for RoBERTa-style and BART-style models
Properties of tokenization
bert

(Optional, object) BERT-style tokenization is to be performed with the enclosed settings.

Properties of bert
truncate

(Optional, string) Indicates how tokens are truncated when they exceed max_sequence_length. The default value is first.

  • none: No truncation occurs; the inference request receives an error.
  • first: Only the first sequence is truncated.
  • second: Only the second sequence is truncated. If there is just one sequence, that sequence is truncated.

For zero_shot_classification, the hypothesis sequence is always the second sequence. Therefore, do not use second in this case.

roberta

(Optional, object) RoBERTa-style tokenization is to be performed with the enclosed settings.

Properties of roberta
truncate

(Optional, string) Indicates how tokens are truncated when they exceed max_sequence_length. The default value is first.

  • none: No truncation occurs; the inference request receives an error.
  • first: Only the first sequence is truncated.
  • second: Only the second sequence is truncated. If there is just one sequence, that sequence is truncated.

For zero_shot_classification, the hypothesis sequence is always the second sequence. Therefore, do not use second in this case.

mpnet

(Optional, object) MPNet-style tokenization is to be performed with the enclosed settings.

Properties of mpnet
truncate

(Optional, string) Indicates how tokens are truncated when they exceed max_sequence_length. The default value is first.

  • none: No truncation occurs; the inference request receives an error.
  • first: Only the first sequence is truncated.
  • second: Only the second sequence is truncated. If there is just one sequence, that sequence is truncated.

For zero_shot_classification, the hypothesis sequence is always the second sequence. Therefore, do not use second in this case.

zero_shot_classification

(Optional, object) Configures a zero-shot classification task. Zero-shot classification allows text classification to occur without predetermined labels. The labels to classify can be adjusted at inference time, which makes this type of model and task exceptionally flexible.

If consistently classifying the same labels, it may be better to use a fine-tuned text classification model.
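Zero-shot classifiers are commonly built on natural language inference: the text is paired with a hypothesis for each label, and the model scores entailment against contradiction. The sketch below shows how the multi_label option described below could change scoring under that approach, following the convention popularized by the Hugging Face zero-shot pipeline. Whether Elasticsearch scores exactly this way is an assumption, and the logits are invented.

```python
import math


def _softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_scores(entail_logits, contra_logits, labels, multi_label=False):
    """Rank labels from per-label entailment and contradiction logits."""
    if multi_label:
        # Each label is judged independently (entailment vs contradiction),
        # so several labels can score highly at once.
        probs = [_softmax([e, c])[0] for e, c in zip(entail_logits, contra_logits)]
    else:
        # Labels compete: entailment logits are normalized across labels.
        probs = _softmax(entail_logits)
    return sorted(zip(labels, probs), key=lambda p: p[1], reverse=True)


labels = ["glad", "sad", "bad", "rad"]
entail = [3.0, -1.0, -0.5, 1.5]   # invented logits
contra = [-2.0, 2.0, 1.0, -0.5]
print(zero_shot_scores(entail, contra, labels))
print(zero_shot_scores(entail, contra, labels, multi_label=True))
```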

Properties of zero_shot_classification inference
labels
(Optional, array) The labels to classify. Can be set at creation for default labels, and then updated during inference.
multi_label
(Optional, boolean) Indicates if more than one true label is possible given the input. This is useful when labeling text that could pertain to more than one of the input labels. Defaults to false.
results_field
(Optional, string) The field that is added to incoming documents to contain the inference prediction. Defaults to predicted_value.
tokenization

(Optional, object) Indicates the tokenization to perform and the desired settings. The default tokenization configuration is bert. Valid tokenization values are:

  • bert: Use for BERT-style models
  • mpnet: Use for MPNet-style models
  • roberta: Use for RoBERTa-style and BART-style models
Properties of tokenization
bert

(Optional, object) BERT-style tokenization is to be performed with the enclosed settings.

Properties of bert
truncate

(Optional, string) Indicates how tokens are truncated when they exceed max_sequence_length. The default value is first.

  • none: No truncation occurs; the inference request receives an error.
  • first: Only the first sequence is truncated.
  • second: Only the second sequence is truncated. If there is just one sequence, that sequence is truncated.

For zero_shot_classification, the hypothesis sequence is always the second sequence. Therefore, do not use second in this case.

roberta

(Optional, object) RoBERTa-style tokenization is to be performed with the enclosed settings.

Properties of roberta
truncate

(Optional, string) Indicates how tokens are truncated when they exceed max_sequence_length. The default value is first.

  • none: No truncation occurs; the inference request receives an error.
  • first: Only the first sequence is truncated.
  • second: Only the second sequence is truncated. If there is just one sequence, that sequence is truncated.

For zero_shot_classification, the hypothesis sequence is always the second sequence. Therefore, do not use second in this case.

mpnet

(Optional, object) MPNet-style tokenization is to be performed with the enclosed settings.

Properties of mpnet
truncate

(Optional, string) Indicates how tokens are truncated when they exceed max_sequence_length. The default value is first.

  • none: No truncation occurs; the inference request receives an error.
  • first: Only the first sequence is truncated.
  • second: Only the second sequence is truncated. If there is just one sequence, that sequence is truncated.

For zero_shot_classification, the hypothesis sequence is always the second sequence. Therefore, do not use second in this case.

Examples

The response depends on the kind of model.

For example, for language identification the response is the predicted language and the score:

POST _ml/trained_models/lang_ident_model_1/_infer
{
  "docs":[{"text": "The fool doth think he is wise, but the wise man knows himself to be a fool."}]
}

Here are the results, predicting English with high probability:

{
  "inference_results": [
    {
      "predicted_value": "en",
      "prediction_probability": 0.9999658805366392,
      "prediction_score": 0.9999658805366392
    }
  ]
}

For a text classification model, the response contains the predicted classification and its probability.

For example:

POST _ml/trained_models/model2/_infer
{
  "docs": [{"text_field": "The movie was awesome!!"}]
}

The API returns the predicted label and the confidence.

{
  "inference_results": [{
    "predicted_value" : "POSITIVE",
    "prediction_probability" : 0.9998667964092964
  }]
}

For named entity recognition (NER) models, the response contains the annotated text output and the recognized entities.

POST _ml/trained_models/model2/_infer
{
  "docs": [{"text_field": "Hi my name is Josh and I live in Berlin"}]
}

In this case, the API returns:

{
  "inference_results": [{
    "predicted_value" : "Hi my name is [Josh](PER&Josh) and I live in [Berlin](LOC&Berlin)",
    "entities" : [
      {
        "entity" : "Josh",
        "class_name" : "PER",
        "class_probability" : 0.9977303419824,
        "start_pos" : 14,
        "end_pos" : 18
      },
      {
        "entity" : "Berlin",
        "class_name" : "LOC",
        "class_probability" : 0.9992474323902818,
        "start_pos" : 33,
        "end_pos" : 39
      }
    ]
  }]
}
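The NER response can be checked programmatically. In the example above, start_pos and end_pos behave as character offsets with an exclusive end (text[14:18] is "Josh"), and predicted_value marks entities as [text](CLASS&value). The sketch below verifies the offsets and extracts the annotations from a copy of that response; the helper names and the regular expression are illustrative assumptions, not part of the API.

```python
import json
import re

# The NER response from the example above, with values unchanged.
RESPONSE = json.loads("""
{
  "inference_results": [{
    "predicted_value": "Hi my name is [Josh](PER&Josh) and I live in [Berlin](LOC&Berlin)",
    "entities": [
      {"entity": "Josh", "class_name": "PER", "class_probability": 0.9977303419824,
       "start_pos": 14, "end_pos": 18},
      {"entity": "Berlin", "class_name": "LOC", "class_probability": 0.9992474323902818,
       "start_pos": 33, "end_pos": 39}
    ]
  }]
}
""")
TEXT = "Hi my name is Josh and I live in Berlin"

# Matches annotations of the form [surface text](CLASS&value).
ANNOTATION = re.compile(r"\[([^\]]+)\]\(([^&)]+)&([^)]*)\)")


def check_offsets(text, result):
    """Return entities whose start_pos/end_pos slice does not match the entity."""
    return [e for e in result["entities"] if text[e["start_pos"]:e["end_pos"]] != e["entity"]]


def parse_annotations(predicted_value):
    """Return (surface_text, class_name) pairs from an annotated predicted_value."""
    return [(surface, cls) for surface, cls, _ in ANNOTATION.findall(predicted_value)]


result = RESPONSE["inference_results"][0]
print(check_offsets(TEXT, result))
print(parse_annotations(result["predicted_value"]))
```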

Zero-shot classification models require extra configuration defining the class labels. These labels are passed in the zero-shot inference config.

POST _ml/trained_models/model2/_infer
{
  "docs": [
    {
      "text_field": "This is a very happy person"
    }
  ],
  "inference_config": {
    "zero_shot_classification": {
      "labels": [
        "glad",
        "sad",
        "bad",
        "rad"
      ],
      "multi_label": false
    }
  }
}

The API returns the predicted label and the confidence, as well as the top classes:

{
  "inference_results": [{
    "predicted_value" : "glad",
    "top_classes" : [
      {
        "class_name" : "glad",
        "class_probability" : 0.8061155063386439,
        "class_score" : 0.8061155063386439
      },
      {
        "class_name" : "rad",
        "class_probability" : 0.18218006158387956,
        "class_score" : 0.18218006158387956
      },
      {
        "class_name" : "bad",
        "class_probability" : 0.006325615787634201,
        "class_score" : 0.006325615787634201
      },
      {
        "class_name" : "sad",
        "class_probability" : 0.0053788162898424545,
        "class_score" : 0.0053788162898424545
      }
    ],
    "prediction_probability" : 0.8061155063386439
  }]
}

Question answering models require extra configuration defining the question to answer.

POST _ml/trained_models/model2/_infer
{
  "docs": [
    {
      "text_field": "<long text to extract answer>"
    }
  ],
  "inference_config": {
    "question_answering": {
      "question": "<question to be answered>"
    }
  }
}

The API returns a response similar to the following:

{
    "predicted_value": <string subsection of the text that is the answer>,
    "start_offset": <character offset of the start of the answer>,
    "end_offset": <character offset of the end of the answer>,
    "prediction_probability": <prediction score>
}

The tokenization truncate option can be overridden when calling the API:

POST _ml/trained_models/model2/_infer
{
  "docs": [{"text_field": "The Amazon rainforest covers most of the Amazon basin in South America"}],
  "inference_config": {
    "ner": {
      "tokenization": {
        "bert": {
          "truncate": "first"
        }
      }
    }
  }
}

When the input has been truncated due to the limit imposed by the model’s max_sequence_length, the is_truncated field appears in the response.

{
  "inference_results": [{
    "predicted_value" : "The [Amazon](LOC&Amazon) rainforest covers most of the [Amazon](LOC&Amazon) basin in [South America](LOC&South+America)",
    "entities" : [
      {
        "entity" : "Amazon",
        "class_name" : "LOC",
        "class_probability" : 0.9505460915724254,
        "start_pos" : 4,
        "end_pos" : 10
      },
      {
        "entity" : "Amazon",
        "class_name" : "LOC",
        "class_probability" : 0.9969992804311777,
        "start_pos" : 41,
        "end_pos" : 47
      }
    ],
    "is_truncated" : true
  }]
}
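Because is_truncated only appears when truncation happened, client code should treat a missing field as false. The helper below is a hypothetical sketch of that check on a parsed response; you might use it to decide whether to retry with different tokenization settings.

```python
def any_truncated(response):
    """Return True if any inference result reports truncated input.

    is_truncated is absent when no truncation occurred, so a missing
    field counts as False.
    """
    return any(
        result.get("is_truncated", False)
        for result in response.get("inference_results", [])
    )


truncated = {"inference_results": [{"predicted_value": "...", "is_truncated": True}]}
complete = {"inference_results": [{"predicted_value": "..."}]}
print(any_truncated(truncated), any_truncated(complete))
```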