What’s new in 8.9
editWhat’s new in 8.9
editHere are the highlights of what’s new and improved in Elasticsearch 8.9! For detailed information about this release, see the Release notes and Migration guide.
Other versions:
8.8 | 8.7 | 8.6 | 8.5 | 8.4 | 8.3 | 8.2 | 8.1 | 8.0
Better indexing and search performance under concurrent indexing and search
editWhen a query like a match phrase query or a terms query targets a constant keyword field we can skip query execution on shards where the query is rewritten to match no documents. We take advantage of index mappings including constant keyword fields and rewrite queries in such a way that, if a constant keyword field does not match the value defined in the index mapping, we rewrite the query to match no document. This will result in the shard level request to return immediately, before the query is executed on the data node and, as a result, skipping the shard completely. Here we leverage the ability to skip shards whenever possible to avoid unnecessary shard refreshes and improve query latency (by not doing any search-related I/O). Avoiding such unnecessary shard refreshes improves query latency since the search thread does not need to wait anymore for unnecessary shard refreshes. Shards not matching the query criteria will remain in a search-idle state and indexing throughput will not be negatively affected by a refresh. Before introducing this change a query hitting multiple shards, including those with no documents matching the search criteria (think about using index patterns or data streams with many backing indices), would potentially result in a "shard refresh storm" increasing query latency as a result of the search thread waiting on all shard refreshes to complete before being able to initiate and carry out the search operation. After introducing this change the search thread will just need to wait for refreshes to be completed on shards including relevant data. Note that execution of the shard pre-filter and the corresponding "can match" phase where rewriting happens, depends on the overall number of shards involved and on whether there is at least one of them returning a non-empty result (see pre_filter_shard_size setting to understand how to control this behavior). Elasticsearch does the rewrite operation on the data node in the so called "can match" phase, taking advantage of the fact that, at that moment, we can access index mappings and extract information about constant keyword fields and their values. This means we still "fan-out" search queries from the coordinator node to involved data nodes. Rewriting queries based on index mappings is not possible on the coordinator node because the coordinator node is missing index mapping information.
Add multiple queries for ranking to the search endpoint
editThe search endpoint adds a new top-level element called sub_searches
. This top-level element is a list of searches used for ranking where each "sub search" is executed independently. The sub_searches
element is used instead of query
to allow a search using ranking to execute multiple queries. The sub_searches
and query
elements cannot be used together.
Make text embedding for kNN search GA
editFrom 8.9, the text_embedding
query_vector_builder
extension to kNN search is generally available. This functionality is required to perform semantic search by converting text to dense vectors.
Asset tracking - geo_line in time-series aggregations
editThe geo_line
aggregation builds tracks from geo_points
.
It has previously needed to use large arrays in memory for collecting points into multiple buckets
and sorting those buckets.
With the advances made in TSDB features and the time_series
aggregation in particular,
it is now possible to rely on data aggregating in both TSID and timestamp order,
enabling the removal of all sorting, as well as the use of only a single bucket’s
worth of memory, a dramatic improvement in memory footprint. In addition, we can use the streaming line
simplifier algorithm introduced in https://github.com/elastic/elasticsearch/pull/94859 to replace the previous
behaviour of truncating very large tracks with the far more preferable approach of simplifying those tracks.
In this diagram, the grey line is the original geometry, the blue line is the truncated geometry as would be
produced by the original geo_line
aggregation, and the magenta line is the new simplified geometry.