Elasticsearch highlights
This list summarizes the most important enhancements in Elasticsearch 7.8. For the complete list, go to Elasticsearch release highlights.
Composable index templates
Index templates are an easy, repeatable way to configure mappings, index settings, and aliases for new indices. However, in previous versions, you had to define these configurations directly in the template. Managing multiple templates often meant copying similar configurations across templates.
In 7.8, we added a more modular version of index templates called composable index templates. You can still define configurations directly in these templates. However, composable index templates can also contain component templates. Also added in 7.8, component templates are reusable configurations for mappings, index settings, and aliases. With a component template, you can define a configuration once and reuse it across multiple index templates. If you later need to change the configuration, you only need to change its component template.
Composable index templates replace the previous version of index templates, which are now deprecated. If an index matches both a composable template and a legacy index template, Elasticsearch uses the composable template.
To get started with composable index templates, see Index templates.
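As a rough illustration of how the two template types fit together, the sketch below builds the request bodies for one component template and one composable index template that reuses it. The template names (`logs-settings`, `logs-template`), the index pattern, the alias, and the field names are hypothetical. The script only prints the requests and does not contact a cluster.

```python
import json

# Component template: a reusable block of settings and mappings.
# The name "logs-settings" and the mapped field are hypothetical examples.
component_template = {
    "template": {
        "settings": {"number_of_shards": 1},
        "mappings": {"properties": {"@timestamp": {"type": "date"}}},
    }
}

# Composable index template that reuses the component template by name.
# It can still define configuration directly, here a hypothetical alias.
index_template = {
    "index_patterns": ["logs-*"],
    "composed_of": ["logs-settings"],
    "priority": 100,
    "template": {"aliases": {"logs": {}}},
}

# (HTTP method, path, body) for each request, in the order they would be sent.
api_calls = [
    ("PUT", "/_component_template/logs-settings", component_template),
    ("PUT", "/_index_template/logs-template", index_template),
]

for method, path, body in api_calls:
    print(method, path)
    print(json.dumps(body, indent=2))
```

Because the shared settings and mappings live only in the component template, a later change to them is made in one place, and every index template listing it in `composed_of` picks up the change for new indices.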
Geo improvements
We have made several improvements to geo support in Elasticsearch 7.8.
- You can now run an aggregation that finds the bounding box (top left point and bottom right point) that contains all shapes matching a query. A shape is anything that is defined by multiple points. See Geo Bounds Aggregations.
- GeoHash grid aggregations and map tile grid aggregations allow you to group geo_points into buckets.
- Geo centroid aggregations allow you to compute the weighted centroid from all coordinate values for a geo_point field.
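To make the bounding-box idea concrete, here is a minimal sketch of the computation in plain Python. It is not how Elasticsearch implements the aggregation. It assumes each shape is a list of `(lat, lon)` pairs and ignores longitude wrapping around the date line. The function name and sample coordinates are made up for illustration.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (lat, lon)


def bounding_box(shapes: List[List[Point]]) -> Dict[str, Dict[str, float]]:
    """Return the top-left and bottom-right corners enclosing all shapes."""
    lats = [lat for shape in shapes for lat, _ in shape]
    lons = [lon for shape in shapes for _, lon in shape]
    if not lats:
        raise ValueError("at least one point is required")
    return {
        # Top left: northernmost latitude, westernmost longitude.
        "top_left": {"lat": max(lats), "lon": min(lons)},
        # Bottom right: southernmost latitude, easternmost longitude.
        "bottom_right": {"lat": min(lats), "lon": max(lons)},
    }


# Two hypothetical shapes: a line and a triangle.
line = [(40.0, -74.0), (41.0, -73.0)]
triangle = [(39.5, -75.0), (40.5, -72.5), (40.2, -74.2)]

box = bounding_box([line, triangle])
print(box)
```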
Add support for t-test aggregations
Elasticsearch now supports a t_test metrics aggregation, which performs a statistical hypothesis test in which the test statistic follows a Student's t-distribution under the null hypothesis. The test runs on numeric values extracted from the aggregated documents or generated by provided scripts. In practice, this tells you whether the difference between two population means is statistically significant and did not occur by chance alone. See T-Test Aggregation.
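For readers who want to see the underlying statistic, the sketch below computes a t statistic for two independent samples with possibly unequal variances (Welch's variant). This is a textbook formula, not the Elasticsearch implementation, and the sample values are invented. Converting the statistic to a p-value requires the t-distribution's cumulative distribution function, which is left out here.

```python
import math
from statistics import mean, variance


def welch_t_statistic(a, b):
    """t statistic for the difference of two sample means (unequal variances)."""
    # statistics.variance returns the sample variance (n - 1 denominator).
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


# Hypothetical measurements from two populations.
before = [1.0, 2.0, 3.0, 4.0, 5.0]
after = [2.0, 4.0, 6.0, 8.0, 10.0]

t = welch_t_statistic(before, after)
print(f"t statistic: {t:.4f}")
```

A t statistic far from zero suggests the two means differ by more than chance would explain. The p-value derived from it quantifies that.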
Expose aggregation usage in feature usage API
It is now possible to fetch a count of aggregations that have been executed via the node features API. This is broken down per combination of aggregation and data type, per shard on each node, from the last restart until the time when the counts are fetched. When trying to analyze how Elasticsearch is being used in practice, it is useful to know the usage distribution across aggregations and field types. For example, you might be able to conclude that a certain part of an index is not used a lot and could perhaps be eliminated.
Support value_count and avg aggregations over histogram fields
Elasticsearch now implements value_count and avg aggregations over histogram fields.
When the value_count aggregation is computed on histogram fields, the result of the aggregation is the sum of all numbers in the counts array of the histogram.
When the average is computed on histogram fields, the result of the aggregation is the weighted average of all elements in the values array, with each element weighted by the number in the same position in the counts array.
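The arithmetic described above can be sketched for a single histogram as follows. The `values` and `counts` numbers are invented for illustration, and this is plain Python rather than the aggregation itself.

```python
# A hypothetical pre-aggregated histogram: values[i] was observed counts[i] times.
histogram = {
    "values": [0.1, 0.2, 0.3],
    "counts": [3, 7, 23],
}

# value_count: the sum of all numbers in the counts array.
value_count = sum(histogram["counts"])

# avg: each value weighted by the count in the same position.
weighted_sum = sum(v * c for v, c in zip(histogram["values"], histogram["counts"]))
avg = weighted_sum / value_count

print("value_count:", value_count)
print("avg:", round(avg, 4))
```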
Reduce aggregation memory consumption
Elasticsearch now attempts to save memory on the coordinating node by delaying deserialization of the shard results for an aggregation until the last second. This is helpful because it makes the shard aggregation results "short lived" garbage. It should also shrink the memory usage of aggregations when they are waiting to be merged.
Additionally, when the search is in batched reduce mode, Elasticsearch will force the results to be serialized between batch reduces in an attempt to keep the memory usage as low as possible between reductions.
Scalar functions now supported in SQL aggregations
When querying Elasticsearch using SQL, it is now possible to use scalar functions inside aggregations. This allows for more complex expressions, including within GROUP BY or HAVING clauses. For example:
SELECT MAX(CASE WHEN a IS NULL THEN -1 ELSE ABS(a * 10) + 1 END) AS max, b
FROM test
GROUP BY b
HAVING MAX(CASE WHEN a IS NULL THEN -1 ELSE ABS(a * 10) + 1 END) > 5
Increase the performance and scalability of transforms with throttling
Transforms achieved GA status in 7.7, and in 7.8 they are even better with the introduction of throttling. You can spread out the impact of transforms on your cluster by defining the rate at which they perform search and index requests. Set the docs_per_second limit when you create or update your transform.
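As a minimal sketch, the snippet below builds an update request that throttles an existing transform. The transform ID and the limit of 500 are hypothetical, and the script only prints the request rather than sending it.

```python
import json

# Hypothetical transform ID; replace with the ID of an existing transform.
transform_id = "ecommerce-customers"

# Throttle the transform via its settings object. The value 500 is an
# arbitrary example, not a recommended limit.
update_body = {"settings": {"docs_per_second": 500}}

path = f"/_transform/{transform_id}/_update"
print("POST", path)
print(json.dumps(update_body, indent=2))
```

The same `settings` object can be included in the body when the transform is first created.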
Better estimates for machine learning model memory usage
For 7.8, we introduce dynamic estimation of the model memory limit for jobs in ML solution modules. The estimate is generated during job creation. It uses a calculation based on the specific detectors of the job and the cardinality of the partitioning and influencer fields. As a result, the job setup has better default values that reflect the size of the data being analyzed.
Additional loss functions for regression
Loss functions measure how well a machine learning model fits a specific data set. In 7.8, we added two new loss functions for regression analysis. In addition to the existing mean squared error function, there are now mean squared logarithmic error and Pseudo-Huber loss functions. These additions enable you to choose the loss function that fits best with your data set.
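To show how the three losses differ, here are their common textbook definitions in plain Python. This is only a sketch: the Elasticsearch implementation may parameterize them differently (for example, the offset used inside the logarithm), and the `delta` default and sample data are assumptions made for illustration.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error: penalizes large errors quadratically."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def msle(y_true, y_pred):
    """Mean squared logarithmic error: compares values on a log scale.

    Uses log(1 + x), so inputs must be greater than -1.
    """
    return sum(
        (math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred)
    ) / len(y_true)


def pseudo_huber(y_true, y_pred, delta=1.0):
    """Pseudo-Huber loss: quadratic near zero, roughly linear for large errors."""
    return sum(
        delta ** 2 * (math.sqrt(1 + ((t - p) / delta) ** 2) - 1)
        for t, p in zip(y_true, y_pred)
    ) / len(y_true)


# Hypothetical targets and predictions with one large error (an outlier).
actual = [1.0, 2.0, 3.0, 100.0]
predicted = [1.1, 1.9, 3.2, 50.0]

for name, fn in [("MSE", mse), ("MSLE", msle), ("Pseudo-Huber", pseudo_huber)]:
    print(f"{name}: {fn(actual, predicted):.4f}")
```

On data like this, the single large error dominates the mean squared error, while the other two losses grow more slowly with it. This is one reason to choose between them based on the data set.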
Extended upload limit and explanations for Data Visualizer
You can now upload files up to 1 GB in Data Visualizer. The file structure finder functionality of the Data Visualizer provides more detailed explanations after both successful and unsuccessful analysis, which makes it easier to diagnose issues with file upload.
Fixed out-of-memory error when using cross-cluster replication with large documents
A bug caused cross-cluster replication to use more memory than configured when replicating large documents, which could cause memory pressure or even out-of-memory errors in some cases. This bug is now fixed.