Elasticsearch highlights

This list summarizes the most important enhancements in Elasticsearch 8.5. For the complete list, go to Elasticsearch release highlights.

Speed up SQL queries by not tracking total hits by default

The SQL query translator now explicitly sets track_total_hits to false when total hits are not needed to calculate the final result. This significantly improves SQL query performance, especially when evaluating a single document is expensive (for example, queries that involve script evaluation) and in queries with a small LIMIT value. In our tests, some specific queries ran more than 50% faster, with peaks of ~95% (from 600ms to 20ms).
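
One way to inspect what the translator generates is the SQL translate API, which returns the search request a SQL query would run as. The following is a minimal sketch that assumes a hypothetical index named library with a page_count field. For a query like this one, the translated request should indicate that total hit tracking is disabled, although the exact output depends on the query and version.

# Hypothetical index (library) and field (page_count), used for illustration only
POST _sql/translate
{
  "query": "SELECT * FROM library ORDER BY page_count DESC LIMIT 10"
}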

#89106

ILM no longer rolls over empty indices

For both new and existing Index Lifecycle Management (ILM) policies, the rollover action will only execute if an index has at least one document.

For indices with a max_age condition that are no longer being written to, this means they no longer roll over every time max_age is reached.

A policy can override this behavior, and explicitly opt in to rolling over empty indices, by adding a "min_docs": 0 condition:

PUT _ilm/policy/allow_empty_rollover_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover" : {
            "max_age": "7d",
            "max_size": "100gb",
            "min_docs": 0
          }
        }
      }
    }
  }
}

The new behavior can also be disabled on a cluster-wide basis by setting indices.lifecycle.rollover.only_if_has_documents to false.
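
As a sketch, and assuming the setting can be updated dynamically through the cluster update settings API, the cluster-wide opt-out might look like this:

# Assumption: the setting is dynamic and can be changed without a restart
PUT _cluster/settings
{
  "persistent": {
    "indices.lifecycle.rollover.only_if_has_documents": false
  }
}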

#89557

Release time series data stream (TSDS) functionality

Elasticsearch offers support for time series data stream (TSDS) indices. A TSDS index is an index that contains time series metrics data as part of a data stream. Elasticsearch routes the incoming documents into a TSDS index so that all the documents for a particular time series are on the same shard, and then sorts the shard by time series and timestamp. This structure has a few advantages:

  1. Documents from the same time series are next to each other on the shard, and hence stored next to each other on disk. The operating system pages are therefore much more homogeneous and compress better, yielding a massive reduction in total cost of ownership (TCO).
  2. Analyzing a time series typically involves comparing every two consecutive documents (samples), examining the last document in a given time window, and similar operations. This is quite complex when the next document could be on any shard, or even in any index. Sorting by time series and timestamp improves analysis, both in terms of performance and in terms of our ability to add new aggregations.

Finally, as part of the Index Lifecycle Management of time series metrics data, Elasticsearch provides a Downsampling action. When an index is downsampled, Elasticsearch keeps a single document with statistical summaries for each time bucket in the time series. Supported aggregations can then be run on the data stream and include both downsampled indices and raw data indices, without the user needing to be aware of the difference. Downsampling already downsampled indices to a coarser time resolution is also supported.
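
To make the routing and sorting described above concrete, here is a minimal sketch of an index template that creates a TSDS. The template name, index pattern, and field names (host, cpu_usage) are hypothetical. The sketch assumes that dimensions are marked with time_series_dimension, metrics with time_series_metric, and that index.routing_path lists the dimension fields used to route documents to shards.

# Hypothetical template name, index pattern, and fields, used for illustration only
PUT _index_template/my-tsds-template
{
  "index_patterns": ["my-tsds-*"],
  "data_stream": {},
  "template": {
    "settings": {
      "index.mode": "time_series",
      "index.routing_path": ["host"]
    },
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date" },
        "host": { "type": "keyword", "time_series_dimension": true },
        "cpu_usage": { "type": "double", "time_series_metric": "gauge" }
      }
    }
  }
}

With a template like this, all documents that share the same host value form one time series and are routed to the same shard.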

#90116

Unattended mode for transforms

The new unattended setting for transforms introduces unattended mode. It enables the transform to retry indefinitely, even in the face of errors that require direct intervention to resolve. For example, a high disk watermark may prevent all ingest activities, including the indexing operation of a transform. However, once the system administrator resolves the original issue, the transform automatically continues from where it stopped. Neither the system administrator nor the end user needs to restart the transform.
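
A minimal sketch of a continuous transform with unattended mode enabled might look like the following. The transform ID, index names, and field names are hypothetical; the only part taken from the description above is the unattended setting.

# Hypothetical transform ID, indices, and fields, used for illustration only
PUT _transform/my-unattended-transform
{
  "source": { "index": "my-source-index" },
  "dest": { "index": "my-dest-index" },
  "pivot": {
    "group_by": {
      "host": { "terms": { "field": "host.name" } }
    },
    "aggregations": {
      "avg_cpu": { "avg": { "field": "cpu_usage" } }
    }
  },
  "frequency": "5m",
  "sync": { "time": { "field": "@timestamp", "delay": "60s" } },
  "settings": { "unattended": true }
}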

#89212

Frequent items aggregation

The frequent items aggregation is a new bucket aggregation that identifies items that often occur together. It is a form of association rule mining that helps to discover relationships between different data points. For example, the aggregation can find attributes of log events that tend to co-occur, which may give a more informative explanation of the possible causes of a spike in the logs.
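
For the log example, a request might be sketched as follows. This assumes the aggregation is registered under the name frequent_items in this release (the name may differ in other versions) and that it accepts fields, minimum_set_size, and size parameters. The index and field names are hypothetical.

# Assumption: aggregation name and parameters as described in the lead-in
# Hypothetical index (my-log-index) and fields, used for illustration only
GET my-log-index/_search
{
  "size": 0,
  "aggs": {
    "co_occurring_attributes": {
      "frequent_items": {
        "minimum_set_size": 2,
        "fields": [
          { "field": "host.name" },
          { "field": "log.level" },
          { "field": "service.name" }
        ],
        "size": 3
      }
    }
  }
}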

#83055

More efficient snapshots

The overhead of the snapshot operation has been reduced significantly. Snapshots now run in a more efficient order and require much less network traffic than before.

#89619