What’s new in 7.9

Here are the highlights of what’s new and improved in Elasticsearch 7.9! For detailed information about this release, see the Release notes and Breaking changes.

Fixed retries for cross-cluster replication

Cross-cluster replication now retries operations that failed due to a circuit breaker or a lost remote cluster connection.

Fixed index throttling

When indexing data, Elasticsearch and Lucene use heap memory for buffering. To control memory usage, Elasticsearch moves data from the buffer to disk based on your indexing buffer settings. If ongoing indexing outpaces the relocation of data to disk, Elasticsearch will now throttle indexing. In previous Elasticsearch versions, this feature was broken and throttling was not activated.

EQL

EQL (Event Query Language) is a declarative language dedicated to identifying patterns and relationships between events.

Consider using EQL if you:

  • Use Elasticsearch for threat hunting or other security use cases
  • Search time series data or logs, such as network or system logs
  • Want an easy way to explore relationships between events

A good introduction to EQL and its purpose is available in this blog post. See the EQL in Elasticsearch documentation for an in-depth explanation, and also the language reference.

This release includes the following features:

  • Event queries
  • Sequences
  • Pipes

An in-depth discussion of the scope of EQL in Elasticsearch can be found in #49581.
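
To make this concrete, here is a minimal sketch of an EQL search request. It assumes a hypothetical index named my-index whose documents follow the Elastic Common Schema, so they carry @timestamp, event.category, and fields such as host.name, process.name, and destination.port; adjust these names to your own data.

GET my-index/_eql/search
{
  "query": """
    sequence by host.name
      [ process where process.name == "cmd.exe" ]
      [ network where destination.port == 443 ]
  """
}

This sequence matches, per host, a cmd.exe process event that is later followed by a network event to port 443.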

Data streams

A data stream is a convenient, scalable way to ingest, search, and manage continuously generated time series data. Data streams provide a simpler way to split data across multiple indices and still query the data via a single named resource.

See the Data streams documentation to get started.
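
As a rough sketch of the setup, a composable index template whose pattern matches the stream name declares a data_stream object, and the stream itself is created automatically on the first indexing request. The template name, stream name, and message field below are placeholders.

PUT _index_template/my-data-stream-template
{
  "index_patterns": ["my-data-stream*"],
  "data_stream": {}
}

POST my-data-stream/_doc
{
  "@timestamp": "2020-08-18T12:00:00Z",
  "message": "example log line"
}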

Enable fully concurrent snapshot operations

Snapshot operations can now execute in a fully concurrent manner.

  • Create and delete operations can be started in any order
  • Delete operations wait for snapshot finalization to finish, are batched as much as possible to improve efficiency, and, once enqueued in the cluster state, prevent new snapshots from starting on data nodes until they are executed
  • Snapshot creation is completely concurrent across shards, but per shard snapshots are linearized for each repository, as are snapshot finalizations
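
As a simple illustration, with a snapshot repository already registered (my_repository below is a placeholder), several snapshot operations can now be submitted back to back without waiting for the previous one to complete:

PUT _snapshot/my_repository/snapshot-1?wait_for_completion=false

PUT _snapshot/my_repository/snapshot-2?wait_for_completion=false

DELETE _snapshot/my_repository/snapshot-1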

Improve speed and memory usage of multi-bucket aggregations

Before 7.9, many of our more complex aggregations made a simplifying assumption that required that they duplicate many data structures once per bucket that contained them. The most expensive of these weighed in at a couple of kilobytes each. So for an aggregation like:

POST _search
{
  "aggs": {
    "date": {
      "date_histogram": { "field": "timestamp", "calendar_interval": "day" },
      "aggs": {
        "ips": {
          "terms": { "field": "ip" }
        }
      }
    }
  }
}

When run over three years, this aggregation spends a couple of megabytes just on bucket accounting. More deeply nested aggregations spend even more on this overhead. Elasticsearch 7.9 removes all of this overhead, which should allow us to run better in lower memory environments.

As a bonus, we wrote quite a few Rally benchmarks for aggregations to make sure that these changes didn't slow down aggregations, so now we can think much more scientifically about aggregation performance. The benchmarks suggest that these changes don't affect simple aggregation trees and speed up complex aggregation trees of similar or greater depth than the example above. Your actual performance changes will vary, but this optimization should help!

Allow index filtering in field capabilities API

You can now supply an index_filter to the field capabilities API. Indices are filtered from the response if the provided query rewrites to match_none on every shard.
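
For example, a request along the following lines limits the field capabilities response to indices that contain recent data; the index pattern and the @timestamp filter are placeholders for your own names:

POST my-index-*/_field_caps?fields=*
{
  "index_filter": {
    "range": {
      "@timestamp": { "gte": "now-30d" }
    }
  }
}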

Support terms and rare_terms aggregations in transforms

Transforms now support the terms and rare_terms aggregations. The default behavior is that the results are collapsed in the following manner:

<AGG_NAME>.<BUCKET_NAME>.<SUBAGGS...>...

Or if no sub-aggregations exist:

<AGG_NAME>.<BUCKET_NAME>.<_doc_count>

The mapping is also defined as flattened by default. This is to avoid field explosion while still providing (limited) search and aggregation capabilities.
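
As a hedged sketch of what this enables, the following transform groups web logs by client IP and collects the user agents seen for each IP with a terms aggregation; the transform ID, index names, and field names are hypothetical:

PUT _transform/clientip-user-agents
{
  "source": { "index": "weblogs" },
  "dest": { "index": "weblogs_by_clientip" },
  "pivot": {
    "group_by": {
      "clientip": { "terms": { "field": "clientip" } }
    },
    "aggregations": {
      "agents": { "terms": { "field": "agent.keyword" } }
    }
  }
}

Because the agents aggregation has no sub-aggregations, each destination document contains flattened fields of the form agents.<BUCKET_NAME>._doc_count, following the naming scheme above.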

Optimize date_histograms across daylight savings time

Before 7.9, rounding dates on a shard that contained a daylight savings time transition was drastically slower than on a shard whose dates all fell on one side of the DST transition, and it also generated a large number of short-lived objects in memory. Elasticsearch 7.9 has a revised and far more efficient implementation that adds only a comparatively small overhead to requests.
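
For instance, an aggregation along these lines, which rounds timestamps to calendar days in a time zone that observes DST, is the kind of request this change speeds up; the field name and time zone are only examples:

POST _search
{
  "aggs": {
    "daily": {
      "date_histogram": {
        "field": "timestamp",
        "calendar_interval": "day",
        "time_zone": "Europe/Paris"
      }
    }
  }
}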

Improved resilience to network disruption

Elasticsearch now has mechanisms to safely resume peer recoveries after a network disruption, which would previously have failed any in-progress peer recoveries.

Wildcard field optimised for wildcard queries

Elasticsearch now supports a wildcard field type, which stores values optimised for wildcard grep-like queries. While such queries are possible with other field types, they suffer from constraints that limit their usefulness.

This field type is especially well suited for running grep-like queries on log lines. See the wildcard datatype documentation for more information.
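
A minimal sketch, using placeholder index and field names, maps a log message as a wildcard field and then runs a wildcard query against it:

PUT my-logs
{
  "mappings": {
    "properties": {
      "message": { "type": "wildcard" }
    }
  }
}

GET my-logs/_search
{
  "query": {
    "wildcard": {
      "message": { "value": "*failed login*" }
    }
  }
}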

Indexing metrics and back pressure

Elasticsearch 7.9 now tracks metrics about the number of indexing request bytes that are outstanding at each point in the indexing process (coordinating, primary, and replication). These metrics are exposed in the node stats API. Additionally, the new setting indexing_pressure.memory.limit controls the maximum number of bytes that can be outstanding and defaults to 10% of the heap. Once outstanding indexing bytes consume this much of a node's heap, Elasticsearch will start rejecting new coordinating and primary requests.

Additionally, since a failed replication operation can fail a replica, Elasticsearch allows replication bytes to grow to 1.5X the limit, and only replication bytes can trigger this higher limit. If replication bytes rise to high levels, the node will stop accepting new coordinating and primary operations until the replication workload has dropped.
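
The limit itself is a static node setting (indexing_pressure.memory.limit in elasticsearch.yml). The outstanding byte counts can be checked through the node stats API; the assumption here is that they appear under an indexing pressure section of each node's stats:

GET _nodes/stats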

Inference in pipeline aggregations

In 7.6, we introduced inference that enables you to make predictions on new data with your regression or classification models via a processor in an ingest pipeline. Now, in 7.9, inference is even more flexible! You can reference a pre-trained data frame analytics model in an aggregation to infer on the result field of the parent bucket aggregation. The aggregation uses the model on the results to provide a prediction. This addition enables you to run classification or regression analysis at search time. If you want to perform analysis on a small set of data, you can generate predictions without the need to set up a processor in the ingest pipeline.
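
A hedged sketch of such a request follows; the index, model ID, field names, and aggregation names are all hypothetical, and the keys under buckets_path are expected to match the feature names the model was trained on:

GET orders/_search
{
  "size": 0,
  "aggs": {
    "by_customer": {
      "terms": { "field": "customer_id" },
      "aggs": {
        "avg_price": { "avg": { "field": "price" } },
        "total_quantity": { "sum": { "field": "quantity" } },
        "churn_prediction": {
          "inference": {
            "model_id": "my_churn_model",
            "buckets_path": {
              "avg_price": "avg_price",
              "total_quantity": "total_quantity"
            }
          }
        }
      }
    }
  }
}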