Bucket correlation aggregation

edit

This functionality is in technical preview and may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features.

A sibling pipeline aggregation which executes a correlation function on the configured sibling multi-bucket aggregation.

Parameters

edit
buckets_path
(Required, string) Path to the buckets that contain one set of values to correlate. For syntax, see buckets_path Syntax.
function

(Required, object) The correlation function to execute.

Properties of function
count_correlation

(Required*, object) The configuration to calculate a count correlation. This function is designed for determining the correlation of a term value and a given metric. Consequently, it needs to meet the following requirements.

  • The buckets_path must point to a _count metric.
  • The total count of all the bucket_path count values must be less than or equal to indicator.doc_count.
  • When utilizing this function, an initial calculation to gather the required indicator values is required.
Properties of count_correlation
indicator

(Required, object) The indicator with which to correlate the configured bucket_path values.

Properties of indicator
doc_count
(Required, integer) The total number of documents that initially created the expectations. It’s required to be greater than or equal to the sum of all values in the buckets_path as this is the originating superset of data to which the term values are correlated.
expectations
(Required, array) An array of numbers with which to correlate the configured bucket_path values. The length of this value must always equal the number of buckets returned by the bucket_path.
fractions
(Optional, array) An array of fractions to use when averaging and calculating variance. This should be used if the pre-calculated data and the buckets_path have known gaps. The length of fractions, if provided, must equal expectations.

Syntax

edit

A bucket_correlation aggregation looks like this in isolation:

{
  "bucket_correlation": {
    "buckets_path": "range_values>_count", 
    "function": {
      "count_correlation": { 
        "indicator": {
          "expectations": [...],
          "doc_count": 10000
        }
      }
    }
  }
}

The buckets containing the values to correlate against.

The correlation function definition.

Example

edit

The following snippet correlates the individual terms in the field version with the latency metric. Not shown is the pre-calculation of the latency indicator values, which was done utilizing the percentiles aggregation.

This example is only using the 10s percentiles.

POST correlate_latency/_search?size=0&filter_path=aggregations
{
  "aggs": {
    "buckets": {
      "terms": { 
        "field": "version",
        "size": 2
      },
      "aggs": {
        "latency_ranges": {
          "range": { 
            "field": "latency",
            "ranges": [
              { "to": 0.0 },
              { "from": 0, "to": 105 },
              { "from": 105, "to": 225 },
              { "from": 225, "to": 445 },
              { "from": 445, "to": 665 },
              { "from": 665, "to": 885 },
              { "from": 885, "to": 1115 },
              { "from": 1115, "to": 1335 },
              { "from": 1335, "to": 1555 },
              { "from": 1555, "to": 1775 },
              { "from": 1775 }
            ]
          }
        },
        "bucket_correlation": { 
          "bucket_correlation": {
            "buckets_path": "latency_ranges>_count",
            "function": {
              "count_correlation": {
                "indicator": {
                   "expectations": [0, 52.5, 165, 335, 555, 775, 1000, 1225, 1445, 1665, 1775],
                   "doc_count": 200
                }
              }
            }
          }
        }
      }
    }
  }
}

The term buckets containing a range aggregation and the bucket correlation aggregation. Both are utilized to calculate the correlation of the term values with the latency.

The range aggregation on the latency field. The ranges were created referencing the percentiles of the latency field.

The bucket correlation aggregation that calculates the correlation of the number of term values within each range and the previously calculated indicator values.

And the following may be the response:

{
  "aggregations" : {
    "buckets" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "1.0",
          "doc_count" : 100,
          "latency_ranges" : {
            "buckets" : [
              {
                "key" : "*-0.0",
                "to" : 0.0,
                "doc_count" : 0
              },
              {
                "key" : "0.0-105.0",
                "from" : 0.0,
                "to" : 105.0,
                "doc_count" : 1
              },
              {
                "key" : "105.0-225.0",
                "from" : 105.0,
                "to" : 225.0,
                "doc_count" : 9
              },
              {
                "key" : "225.0-445.0",
                "from" : 225.0,
                "to" : 445.0,
                "doc_count" : 0
              },
              {
                "key" : "445.0-665.0",
                "from" : 445.0,
                "to" : 665.0,
                "doc_count" : 0
              },
              {
                "key" : "665.0-885.0",
                "from" : 665.0,
                "to" : 885.0,
                "doc_count" : 0
              },
              {
                "key" : "885.0-1115.0",
                "from" : 885.0,
                "to" : 1115.0,
                "doc_count" : 10
              },
              {
                "key" : "1115.0-1335.0",
                "from" : 1115.0,
                "to" : 1335.0,
                "doc_count" : 20
              },
              {
                "key" : "1335.0-1555.0",
                "from" : 1335.0,
                "to" : 1555.0,
                "doc_count" : 20
              },
              {
                "key" : "1555.0-1775.0",
                "from" : 1555.0,
                "to" : 1775.0,
                "doc_count" : 20
              },
              {
                "key" : "1775.0-*",
                "from" : 1775.0,
                "doc_count" : 20
              }
            ]
          },
          "bucket_correlation" : {
            "value" : 0.8402398981360937
          }
        },
        {
          "key" : "2.0",
          "doc_count" : 100,
          "latency_ranges" : {
            "buckets" : [
              {
                "key" : "*-0.0",
                "to" : 0.0,
                "doc_count" : 0
              },
              {
                "key" : "0.0-105.0",
                "from" : 0.0,
                "to" : 105.0,
                "doc_count" : 19
              },
              {
                "key" : "105.0-225.0",
                "from" : 105.0,
                "to" : 225.0,
                "doc_count" : 11
              },
              {
                "key" : "225.0-445.0",
                "from" : 225.0,
                "to" : 445.0,
                "doc_count" : 20
              },
              {
                "key" : "445.0-665.0",
                "from" : 445.0,
                "to" : 665.0,
                "doc_count" : 20
              },
              {
                "key" : "665.0-885.0",
                "from" : 665.0,
                "to" : 885.0,
                "doc_count" : 20
              },
              {
                "key" : "885.0-1115.0",
                "from" : 885.0,
                "to" : 1115.0,
                "doc_count" : 10
              },
              {
                "key" : "1115.0-1335.0",
                "from" : 1115.0,
                "to" : 1335.0,
                "doc_count" : 0
              },
              {
                "key" : "1335.0-1555.0",
                "from" : 1335.0,
                "to" : 1555.0,
                "doc_count" : 0
              },
              {
                "key" : "1555.0-1775.0",
                "from" : 1555.0,
                "to" : 1775.0,
                "doc_count" : 0
              },
              {
                "key" : "1775.0-*",
                "from" : 1775.0,
                "doc_count" : 0
              }
            ]
          },
          "bucket_correlation" : {
            "value" : -0.5759855613334943
          }
        }
      ]
    }
  }
}