Grouping Limitations with heterogeneous indices

Rollup groups currently have a known limitation that stems from an internal implementation detail. The Rollup feature leverages the composite aggregation from Elasticsearch. At the moment, the composite aggregation only returns buckets when all keys in the tuple are non-null. Put another way, if you request keys [A,B,C] in the composite aggregation, the only documents that are aggregated are those that have all of the keys A, B and C.
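The all-keys-must-be-present rule can be imitated outside Elasticsearch with a few lines of Python. This is only a sketch of the semantics described above, not how the composite aggregation is actually implemented; the function name `composite_buckets` and the sample documents are hypothetical.

```python
from collections import Counter


def composite_buckets(docs, keys):
    """Count documents per key tuple, skipping documents that lack any key.

    Imitates the limitation described above: a document contributes to a
    bucket only if every requested key is present and non-null.
    """
    buckets = Counter()
    for doc in docs:
        values = tuple(doc.get(key) for key in keys)
        if any(value is None for value in values):
            continue  # the document is dropped without any error
        buckets[values] += 1
    return dict(buckets)


# Hypothetical documents: only the first one has all of A, B and C.
docs = [
    {"A": 1, "B": 2, "C": 3},
    {"A": 1, "B": 2},
    {"A": 1, "C": 3},
]

result = composite_buckets(docs, ["A", "B", "C"])
print(result)
```

Of the three sample documents, only the one that carries all three keys ends up in a bucket. Requesting just `["A"]` would count all three.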

Because Rollup uses the composite agg during the indexing process, it inherits this behavior. Practically speaking, if all of the documents in your index are homogeneous (they have the same mapping), you can ignore this limitation and stop reading now.

However, if you have a heterogeneous collection of documents that you wish to roll up, you may need to configure two or more jobs to accurately cover the original data.

As an example, if your index has two types of documents:

{
  "timestamp": 1516729294000,
  "temperature": 200,
  "voltage": 5.2,
  "node": "a"
}

and

{
  "timestamp": 1516729294000,
  "price": 123,
  "title": "Foo"
}

it may be tempting to create a single, combined rollup job which covers both of these document types, something like this:

PUT _xpack/rollup/job/combined
{
    "index_pattern": "data-*",
    "rollup_index": "data_rollup",
    "cron": "*/30 * * * * ?",
    "page_size": 1000,
    "groups" : {
      "date_histogram": {
        "field": "timestamp",
        "interval": "1h",
        "delay": "7d"
      },
      "terms": {
        "fields": ["node", "title"]
      }
    },
    "metrics": [
        {
            "field": "temperature",
            "metrics": ["min", "max", "sum"]
        },
        {
            "field": "price",
            "metrics": ["avg"]
        }
    ]
}

You can see that it includes a terms grouping on both "node" and "title", fields that are mutually exclusive in the document types. This will not work. Because the composite aggregation (and by extension, Rollup) only returns buckets when all keys are non-null, and there are no documents that have both a "node" field and a "title" field, this rollup job will not produce any rollups.

Instead, you should configure two independent jobs. They can write to the same rollup index or to separate indices:

PUT _xpack/rollup/job/sensor
{
    "index_pattern": "data-*",
    "rollup_index": "data_rollup",
    "cron": "*/30 * * * * ?",
    "page_size": 1000,
    "groups" : {
      "date_histogram": {
        "field": "timestamp",
        "interval": "1h",
        "delay": "7d"
      },
      "terms": {
        "fields": ["node"]
      }
    },
    "metrics": [
        {
            "field": "temperature",
            "metrics": ["min", "max", "sum"]
        }
    ]
}

PUT _xpack/rollup/job/purchases
{
    "index_pattern": "data-*",
    "rollup_index": "data_rollup",
    "cron": "*/30 * * * * ?",
    "page_size": 1000,
    "groups" : {
      "date_histogram": {
        "field": "timestamp",
        "interval": "1h",
        "delay": "7d"
      },
      "terms": {
        "fields": ["title"]
      }
    },
    "metrics": [
        {
            "field": "price",
            "metrics": ["avg"]
        }
    ]
}

Notice that each job now deals with a single "document type", and will not run into the limitations described above. We are working on changes in core Elasticsearch to remove this limitation from the composite aggregation, and the documentation will be updated accordingly when this particular scenario is fixed.
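To make the difference between the combined job and the two split jobs concrete, the following Python sketch checks which of the two example documents each job's grouping would cover. It assumes that the grouping keys of a job are its `date_histogram` field plus its `terms` fields, and it only models the "all keys non-null" rule; the helper `covered_docs` is hypothetical and is not part of any Elasticsearch client.

```python
def covered_docs(docs, group_fields):
    """Return the documents that have every grouping field present and non-null."""
    return [
        doc
        for doc in docs
        if all(doc.get(field) is not None for field in group_fields)
    ]


# The two document types from the example above.
docs = [
    {"timestamp": 1516729294000, "temperature": 200, "voltage": 5.2, "node": "a"},
    {"timestamp": 1516729294000, "price": 123, "title": "Foo"},
]

# Assumed grouping keys per job: the date_histogram field plus the terms fields.
jobs = {
    "combined": ["timestamp", "node", "title"],
    "sensor": ["timestamp", "node"],
    "purchases": ["timestamp", "title"],
}

coverage = {name: len(covered_docs(docs, fields)) for name, fields in jobs.items()}
print(coverage)  # {'combined': 0, 'sensor': 1, 'purchases': 1}
```

The combined grouping matches no document, while each split job matches exactly the document type it was written for. Running a similar check against a sample of your own data before creating a job is one way to spot groupings that would silently produce no rollups.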