IMPORTANT: No additional bug fixes or documentation updates will be released for this version. For the latest information, see the current release documentation.

Inference bucket aggregation

This functionality is in technical preview and may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features.

A parent pipeline aggregation which loads a pre-trained model and performs inference on the collated result fields from the parent bucket aggregation.

To use the inference bucket aggregation, you need the same security privileges that are required for using the Get trained models API.

Syntax

An inference aggregation looks like this in isolation:

{
  "inference": {
    "model_id": "a_model_for_inference", 
    "inference_config": { 
      "regression_config": {
        "num_top_feature_importance_values": 2
      }
    },
    "buckets_path": {
      "avg_cost": "avg_agg",
      "max_cost": "max_agg"
    }
  }
}

	`model_id`: The ID of the model to use.
	`inference_config`: The optional inference config, which overrides the model’s default settings.
	`buckets_path`: Maps the value of `avg_agg` to the model’s input field `avg_cost`, and the value of `max_agg` to `max_cost`.

Table 52. inference Parameters

Parameter Name	Description	Required	Default Value
`model_id`	The ID of the model to load and infer against	Required	-
`inference_config`	Contains the inference type and its options. There are two types: `regression` and `classification`	Optional	-
`buckets_path`	Defines the paths to the input aggregations and maps the aggregation names to the field names expected by the model. See `buckets_path` Syntax for more details	Required	-
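
As a rough illustration of the parameter table above, the following Python sketch assembles an inference aggregation body and rejects input that lacks either of the two required parameters. The helper name `build_inference_agg` is hypothetical and is not part of any Elasticsearch client; it only builds a plain dictionary that could be serialized as a request body.

```python
import json


def build_inference_agg(model_id, buckets_path, inference_config=None):
    """Return an inference aggregation body as a plain dict.

    Hypothetical helper for illustration only; not an Elasticsearch API.
    """
    # model_id and buckets_path are marked as required in the parameter table.
    if not model_id:
        raise ValueError("model_id is required")
    if not buckets_path:
        raise ValueError("buckets_path is required")

    body = {"model_id": model_id, "buckets_path": dict(buckets_path)}
    # inference_config is optional; leaving it out keeps the model's defaults.
    if inference_config is not None:
        body["inference_config"] = inference_config
    return {"inference": body}


agg = build_inference_agg(
    "a_model_for_inference",
    {"avg_cost": "avg_agg", "max_cost": "max_agg"},
)
print(json.dumps(agg, indent=2))
```

Omitting `inference_config` here mirrors the table: the parameter is optional, so the sketch only emits it when the caller supplies one.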

Configuration options for inference models

The inference_config setting is optional and usually isn’t required, as the pre-trained models come equipped with sensible defaults. In the context of aggregations, some options can be overridden for each of the two types of model.

Configuration options for regression models

num_top_feature_importance_values: (Optional, integer) Specifies the maximum number of feature importance values per document. By default, it is zero and no feature importance calculation occurs.

Configuration options for classification models

num_top_classes: (Optional, integer) Specifies the number of top class predictions to return. Defaults to 0.
num_top_feature_importance_values: (Optional, integer) Specifies the maximum number of feature importance values per document. By default, it is zero and no feature importance calculation occurs.
prediction_field_type: (Optional, string) Specifies the type of the predicted field to write. Acceptable values are: string, number, boolean. When boolean is provided, 1.0 is transformed to true and 0.0 to false.
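
The classification options listed above can be collected and sanity-checked on the client side before a request is sent. The Python sketch below is one possible way to do that; the helper name `classification_options` is hypothetical, and the checks only encode what this page states (integer-valued counts that default to zero, and the three acceptable values for `prediction_field_type`).

```python
ALLOWED_PREDICTION_FIELD_TYPES = ("string", "number", "boolean")


def classification_options(num_top_classes=0,
                           num_top_feature_importance_values=0,
                           prediction_field_type=None):
    """Return classification options as a plain dict.

    Hypothetical helper for illustration only; not an Elasticsearch API.
    """
    counts = {
        "num_top_classes": num_top_classes,
        "num_top_feature_importance_values": num_top_feature_importance_values,
    }
    for name, value in counts.items():
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"{name} must be an integer")

    options = dict(counts)
    # prediction_field_type is optional; include it only when provided.
    if prediction_field_type is not None:
        if prediction_field_type not in ALLOWED_PREDICTION_FIELD_TYPES:
            raise ValueError(
                "prediction_field_type must be one of "
                + ", ".join(ALLOWED_PREDICTION_FIELD_TYPES)
            )
        options["prediction_field_type"] = prediction_field_type
    return options


options = classification_options(num_top_classes=2,
                                 prediction_field_type="boolean")
print(options)
```

The resulting dictionary would then be placed inside `inference_config` under the classification type.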

Example

The following snippet aggregates a web log by client_ip and extracts a number of features via metric and bucket sub-aggregations. These features are the input to the inference aggregation, which is configured with a model trained to identify suspicious client IPs:

GET kibana_sample_data_logs/_search
{
  "size": 0,
  "aggs": {
    "client_ip": { 
      "composite": {
        "sources": [
          {
            "client_ip": {
              "terms": {
                "field": "clientip"
              }
            }
          }
        ]
      },
      "aggs": { 
        "url_dc": {
          "cardinality": {
            "field": "url.keyword"
          }
        },
        "bytes_sum": {
          "sum": {
            "field": "bytes"
          }
        },
        "geo_src_dc": {
          "cardinality": {
            "field": "geo.src"
          }
        },
        "geo_dest_dc": {
          "cardinality": {
            "field": "geo.dest"
          }
        },
        "responses_total": {
          "value_count": {
            "field": "timestamp"
          }
        },
        "success": {
          "filter": {
            "term": {
              "response": "200"
            }
          }
        },
        "error404": {
          "filter": {
            "term": {
              "response": "404"
            }
          }
        },
        "error503": {
          "filter": {
            "term": {
              "response": "503"
            }
          }
        },
        "malicious_client_ip": { 
          "inference": {
            "model_id": "malicious_clients_model",
            "buckets_path": {
              "response_count": "responses_total",
              "url_dc": "url_dc",
              "bytes_sum": "bytes_sum",
              "geo_src_dc": "geo_src_dc",
              "geo_dest_dc": "geo_dest_dc",
              "success": "success._count",
              "error404": "error404._count",
              "error503": "error503._count"
            }
          }
        }
      }
    }
  }
}

	A composite bucket aggregation that aggregates the data by `client_ip`.
	A series of metric and bucket sub-aggregations.
	Inference bucket aggregation that contains the model ID and maps the aggregation names to the model’s input fields.
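
The `buckets_path` in the example follows a visible pattern: metric sub-aggregations are referenced by name, while the `filter` sub-aggregations are referenced through their document count with `._count`. The Python sketch below derives such a mapping from a dictionary of sub-aggregations. The helper `derive_buckets_path` and its `rename` argument are hypothetical, introduced only to illustrate the pattern; `rename` covers cases such as `responses_total`, which the example maps to the model input field `response_count`.

```python
def derive_buckets_path(sub_aggs, rename=None):
    """Map model input field names to buckets_path expressions.

    Hypothetical helper for illustration only; not an Elasticsearch API.
    """
    rename = rename or {}
    paths = {}
    for agg_name, agg_body in sub_aggs.items():
        # Skip the inference aggregation itself; it consumes the paths.
        if "inference" in agg_body:
            continue
        # filter buckets are referenced by document count, as in the example;
        # metric aggregations are referenced directly by name.
        path = f"{agg_name}._count" if "filter" in agg_body else agg_name
        paths[rename.get(agg_name, agg_name)] = path
    return paths


# Sub-aggregations copied from the example request above.
sub_aggs = {
    "url_dc": {"cardinality": {"field": "url.keyword"}},
    "bytes_sum": {"sum": {"field": "bytes"}},
    "geo_src_dc": {"cardinality": {"field": "geo.src"}},
    "geo_dest_dc": {"cardinality": {"field": "geo.dest"}},
    "responses_total": {"value_count": {"field": "timestamp"}},
    "success": {"filter": {"term": {"response": "200"}}},
    "error404": {"filter": {"term": {"response": "404"}}},
    "error503": {"filter": {"term": {"response": "503"}}},
}

buckets_path = derive_buckets_path(
    sub_aggs, rename={"responses_total": "response_count"}
)
for field, path in buckets_path.items():
    print(f"{field} <- {path}")
```

Run against the sub-aggregations from the example, this yields the same eight entries as the `buckets_path` in the request.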
