Elasticsearch highlights


This list summarizes the most important enhancements in Elasticsearch 7.3.2. For the complete list, go to Elasticsearch release highlights.

Voting-only master nodes

A new node.voting_only role has been introduced that allows nodes to participate in master elections even though they are not eligible to become the master themselves. The benefit is that these nodes still help with high availability while requiring less CPU and heap than master-eligible nodes.
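As a sketch, a dedicated voting-only node could be configured in elasticsearch.yml like this (voting-only nodes must still be master-eligible, hence both settings):

```yaml
# elasticsearch.yml — illustrative settings for a dedicated voting-only node
node.master: true        # the node must be master-eligible to vote
node.voting_only: true   # but it will never be elected as the master itself
node.data: false         # optional: keep the node dedicated to voting
```

A common pattern is a two-node cluster plus one lightweight voting-only tiebreaker, giving a resilient three-vote quorum without a third full master node.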

The node.voting_only role is only available with the default distribution of Elasticsearch.

Reloading of search-time synonyms

A new analyzer reload API lets you reload the definition of search-time analyzers and their associated resources. A common use case for this API is reloading search-time synonyms. In earlier versions of Elasticsearch, users could force synonyms to be reloaded by closing the index and then opening it again. With this new API, synonyms can be updated without closing the index.
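For example, an index could declare an updateable synonym filter in its search-time analyzer and then reload it after the synonyms file changes (index, analyzer, and file names here are illustrative):

```console
PUT /my-index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonyms": {
          "type": "synonym_graph",
          "synonyms_path": "analysis/synonyms.txt",
          "updateable": true
        }
      },
      "analyzer": {
        "my_search_analyzer": {
          "tokenizer": "standard",
          "filter": ["lowercase", "my_synonyms"]
        }
      }
    }
  }
}

POST /my-index/_reload_search_analyzers
```

Filters marked `updateable` can only be used at search time, which is why this mechanism applies to search-time synonyms specifically.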

The Analyzer reload API is only available with the default distribution of Elasticsearch.

New flattened field type

A new flattened field type has been added, which can index arbitrary JSON objects into a single field. This helps avoid mapping explosions caused by a large number of fields, at the cost of more limited search functionality.
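A minimal sketch (index and field names are illustrative): the whole object is mapped as one field, and its leaf values are indexed as keywords.

```console
PUT /bug-reports
{
  "mappings": {
    "properties": {
      "labels": { "type": "flattened" }
    }
  }
}

POST /bug-reports/_doc/1
{
  "labels": {
    "priority": "urgent",
    "release": ["v1.2.5", "v1.3.0"]
  }
}
```

No matter how many distinct keys appear under `labels` across documents, the mapping contains a single field, and queries match on the keyword-style leaf values.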

The flattened field type is only available with the default distribution of Elasticsearch.

Functions on vector fields

Painless now supports computing the cosine similarity and the dot product of a query vector and the values of either a sparse_vector or dense_vector field.
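A sketch of scoring by cosine similarity against a dense_vector field (index and field names are illustrative; the `+ 1.0` keeps scores non-negative):

```console
GET /my-index/_search
{
  "query": {
    "script_score": {
      "query": { "match_all": {} },
      "script": {
        "source": "cosineSimilarity(params.query_vector, doc['my_dense_vector']) + 1.0",
        "params": {
          "query_vector": [0.5, 10.0, 6.0]
        }
      }
    }
  }
}
```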

These functions are only available with the default distribution of Elasticsearch.

Prefix and wildcard support for intervals

Intervals now support querying by prefix or wildcard.
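A sketch of an intervals query combining an exact match with a prefix rule (field and query values are illustrative; a wildcard rule uses a `pattern` key in the same way):

```console
POST /my-index/_search
{
  "query": {
    "intervals": {
      "my_text": {
        "all_of": {
          "ordered": true,
          "intervals": [
            { "match": { "query": "error" } },
            { "prefix": { "prefix": "time" } }
          ]
        }
      }
    }
  }
}
```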

Rare terms aggregation

A new rare_terms aggregation allows you to find the least frequent values in a field. It is intended to replace the "order" : { "_count" : "asc" } option of the terms aggregation.
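For example, to find values that occur in at most one document (index and field names are illustrative):

```console
GET /my-index/_search
{
  "size": 0,
  "aggs": {
    "rare_genres": {
      "rare_terms": {
        "field": "genre",
        "max_doc_count": 1
      }
    }
  }
}
```

Unlike sorting a terms aggregation by ascending count, which is prone to unbounded error on sharded indices, rare_terms bounds the result by document count rather than by a fixed number of buckets.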

Aliases are replicated via cross-cluster replication

Read aliases are now replicated via cross-cluster replication. Note that write aliases are still not replicated, since they only make sense for indices that are being written to, while follower indices do not receive direct writes.

SQL supports frozen indices

Elasticsearch SQL now supports querying frozen indices via the new FROZEN keyword.
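A sketch of the new keyword in use (index name is illustrative):

```console
POST /_sql?format=txt
{
  "query": "SELECT * FROM FROZEN archive_index LIMIT 10"
}
```

Without the FROZEN keyword, frozen indices are excluded from SQL queries, since searching them is intentionally slower and more resource-conscious.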

Fixed memory leak when using templates in document-level security

Document-level security was using an unbounded cache for the set of visible documents. This could lead to a memory leak when using a templated query as a role query. The cache has been fixed to evict based on memory usage and has a limit of 50MB.

More memory-efficient aggregations on keyword fields

Terms aggregations generally need to build global ordinals in order to run. Unfortunately this operation became more memory-intensive in 6.0 due to the move to doc-value iterators, which improved the handling of sparse fields. The memory pressure from building global ordinals is now back to a level similar to that of pre-6.0 releases.

Data frames: transform and pivot your streaming data

[beta] This functionality is in beta and is subject to change. The design and code are less mature than official GA features and are being provided as-is with no warranties. Beta features are not subject to the support SLA of official GA features.

Transforms are a core new feature in Elasticsearch that enable you to transform an existing index to a secondary, summarized index. Transforms enable you to pivot your data and create entity-centric indices that summarize the behavior of an entity, organizing the data into an analysis-friendly format.

Transforms were originally available in 7.2. With 7.3 they can now run either as a single batch transform or continuously, incorporating new data as it is ingested.

Data frames enable new possibilities for machine learning analysis (such as outlier detection), but they can also be useful for other types of visualizations and custom types of analysis.
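A sketch of a pivot transform that summarizes documents per customer (transform, index, and field names are illustrative):

```console
PUT _data_frame/transforms/orders_by_customer
{
  "source": { "index": "orders" },
  "dest": { "index": "orders_by_customer" },
  "pivot": {
    "group_by": {
      "customer_id": { "terms": { "field": "customer_id" } }
    },
    "aggregations": {
      "order_count": { "value_count": { "field": "order_id" } },
      "max_price": { "max": { "field": "total_price" } }
    }
  }
}
```

The resulting destination index contains one entity-centric document per customer rather than one document per order.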

Discover your most unusual data using outlier detection

The goal of outlier detection is to find the most unusual data points in an index. We analyse the numerical fields of each data point (a document in the index) and annotate each one with how unusual it is.

We use unsupervised outlier detection, which means there is no need to provide a training data set to teach outlier detection to recognize outliers. In practice, this is achieved by using an ensemble of distance-based and density-based techniques to identify those data points which are the most different from the bulk of the data in the index. We assign each analysed data point an outlier score, which captures how different it is from the other data points in the index.
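A sketch of creating and starting an outlier detection job (job and index names are illustrative):

```console
PUT _ml/data_frame/analytics/loganalysis
{
  "source": { "index": "logdata" },
  "dest": { "index": "logdata_outliers" },
  "analysis": { "outlier_detection": {} }
}

POST _ml/data_frame/analytics/loganalysis/_start
```

When the job completes, the destination index contains a copy of the source documents annotated with their outlier scores.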

In addition to the new outlier detection functionality, we are introducing the evaluate data frame analytics API, which enables you to compute a range of performance metrics such as confusion matrices, precision, recall, the receiver operating characteristic (ROC) curve, and the area under the ROC curve. If you are running outlier detection on a source index that has already been labeled to indicate which points are truly outliers and which are normal, you can use the evaluate data frame analytics API to assess the performance of the outlier detection analytics on your dataset.
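A sketch of evaluating outlier detection results against pre-existing labels (index and field names are illustrative; `is_outlier` is assumed to be a boolean label already present in the data):

```console
POST _ml/data_frame/_evaluate
{
  "index": "logdata_outliers",
  "evaluation": {
    "binary_soft_classification": {
      "actual_field": "is_outlier",
      "predicted_probability_field": "ml.outlier_score"
    }
  }
}
```

The response reports the requested metrics, letting you judge how well the computed outlier scores separate the labeled outliers from the normal points.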