What’s new in 8.0

Here are the highlights of what’s new and improved in Elasticsearch 8.0!

For detailed information about this release, see the Release notes and Migration guide.

7.x REST API compatibility

8.0 introduces several breaking changes to the Elasticsearch REST APIs. While it’s important to update your application to account for these changes, finding and updating every API call in a single upgrade can be painful and error-prone. To make this process easier, we’ve added support for 7.x compatibility headers to our REST APIs. In many cases, these optional headers let you make 7.x-compatible requests to an 8.0 cluster and receive 7.x-compatible responses.

While we still recommend you update your application to use native 8.0 requests and responses, the 7.x API compatibility headers let you safely make these changes over a longer period of time.

For more information about the headers and how to use them, see REST API compatibility.
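The following standard-library Python sketch shows what the headers look like on a request. It builds the request but does not send it. The media type application/vnd.elasticsearch+json;compatible-with=7 is the one described in the REST API compatibility documentation. The cluster URL and index name are hypothetical placeholders.

```python
import json
import urllib.request
from typing import Optional

# Media type that asks an 8.x cluster for 7.x-compatible requests and responses.
COMPAT_MEDIA_TYPE = "application/vnd.elasticsearch+json;compatible-with=7"


def compat_headers(has_body: bool) -> dict:
    """Return the headers for a 7.x-compatible request."""
    headers = {"Accept": COMPAT_MEDIA_TYPE}
    if has_body:
        # A request body must be declared with the same compatible media type.
        headers["Content-Type"] = COMPAT_MEDIA_TYPE
    return headers


def build_compat_request(url: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Build, but do not send, a request that carries the compatibility headers."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(url, data=data, headers=compat_headers(data is not None))


if __name__ == "__main__":
    # Hypothetical cluster URL and index name.
    req = build_compat_request(
        "https://localhost:9200/my-index/_search",
        {"query": {"match_all": {}}},
    )
    print(req.get_method(), req.full_url)
    for name, value in req.header_items():
        print(f"{name}: {value}")
```

Because only the headers change, you can opt individual calls into 7.x behavior while migrating the rest of the application.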

Security features are enabled and configured by default

Running Elasticsearch without security leaves your cluster exposed to anyone who can send network traffic to Elasticsearch. In previous versions, you had to explicitly enable the Elasticsearch security features such as authentication, authorization, and network encryption (TLS). Starting in Elasticsearch 8.0, security is enabled and configured by default when you start Elasticsearch for the first time.
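One practical consequence is that HTTP clients of a default install generally need to connect over HTTPS and send credentials. The following standard-library Python sketch builds an authenticated HTTPS request for the built-in elastic user. The password is a hypothetical placeholder. The CA certificate path config/certs/http_ca.crt is an assumption based on the default auto-configuration layout, so adjust it to your install.

```python
import base64
import ssl
import urllib.request


def basic_auth_header(username: str, password: str) -> str:
    """Encode credentials for HTTP Basic authentication."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def build_secured_request(host: str, username: str, password: str) -> urllib.request.Request:
    """Build, but do not send, an authenticated HTTPS request to the cluster root."""
    # With security on by default, the HTTP layer expects TLS, so use https://.
    return urllib.request.Request(
        f"https://{host}:9200/",
        headers={"Authorization": basic_auth_header(username, password)},
    )


def send(req: urllib.request.Request, ca_cert_path: str) -> bytes:
    """Send the request, trusting the cluster's auto-generated HTTP CA certificate."""
    context = ssl.create_default_context(cafile=ca_cert_path)
    with urllib.request.urlopen(req, context=context) as response:
        return response.read()


if __name__ == "__main__":
    # Hypothetical password; use the one printed for the elastic user at first startup.
    req = build_secured_request("localhost", "elastic", "changeme")
    print(req.full_url, req.get_header("Authorization"))
    # Assumed default CA location; uncomment to run against a live cluster:
    # print(send(req, "config/certs/http_ca.crt"))
```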

At startup, we generate enrollment tokens that you use to connect a Kibana instance or enroll additional nodes in your secured Elasticsearch cluster, without having to generate security certificates or update YAML configuration files. Just use the generated enrollment token when starting new nodes or Kibana instances, and the Elastic Stack handles all of the security configuration for you. Out of the box, you’ll get:

  • User authentication
  • User authorization
  • Encrypted internode communication with TLS
  • Encrypted communication between Elasticsearch and Kibana with TLS

Need a new enrollment token? Use the elasticsearch-create-enrollment-token tool to create enrollment tokens for Elasticsearch nodes and Kibana instances.
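The tool's --scope option (short form -s) selects whether the token is for a node or a Kibana instance. The following Python sketch wraps the command with subprocess. The install path is hypothetical. The sketch also assumes that the tool writes only the token to standard output and that it runs on a host with a started, secured node.

```python
import subprocess
from typing import List

# Scopes accepted by the --scope option of elasticsearch-create-enrollment-token.
VALID_SCOPES = ("node", "kibana")


def enrollment_token_command(scope: str, es_home: str = ".") -> List[str]:
    """Build the argv that creates an enrollment token for the given scope."""
    if scope not in VALID_SCOPES:
        raise ValueError(f"scope must be one of {VALID_SCOPES}, got {scope!r}")
    return [f"{es_home}/bin/elasticsearch-create-enrollment-token", "--scope", scope]


def create_enrollment_token(scope: str, es_home: str = ".") -> str:
    """Run the tool and return what it prints (assumed to be just the token)."""
    result = subprocess.run(
        enrollment_token_command(scope, es_home),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Print the commands instead of running them; /opt/elasticsearch is hypothetical.
    for scope in VALID_SCOPES:
        print(" ".join(enrollment_token_command(scope, "/opt/elasticsearch")))
```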

Better protection for system indices

System indices store configurations and internal data for Elastic features, and they are reserved for internal use by those features. Accessing or changing a system index directly is possible, but it can cause instability and other issues.

In 8.0, we’ve made several changes to protect system indices from direct access. To access a system index, you must now have the allow_restricted_indices permission set to true.

The superuser role also no longer gives write access to system indices. As a result, the built-in elastic superuser can’t change system indices by default.

Where possible, manage a feature's data through Kibana or the feature's own Elasticsearch APIs instead of accessing its system index. If you access a system index directly, Elasticsearch returns a warning in the API response headers and records it in the deprecation logs.
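If a task truly requires direct access, the allow_restricted_indices flag belongs in a role's index privileges. The following Python sketch builds a request body for the create or update roles API (PUT /_security/role/<name>). The role name and index pattern are hypothetical. The role grants read access only, because of the stability risks of changing system indices.

```python
import json
from typing import Iterable


def restricted_index_role(index_patterns: Iterable[str], privileges: Iterable[str]) -> dict:
    """Build a role descriptor whose index privileges also cover restricted indices."""
    return {
        "indices": [
            {
                "names": list(index_patterns),
                "privileges": list(privileges),
                # Without this flag, the patterns above never match restricted indices.
                "allow_restricted_indices": True,
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical role name and system index pattern; read-only on purpose.
    body = restricted_index_role([".my-feature-system-index*"], ["read"])
    print("PUT /_security/role/system-index-reader")
    print(json.dumps(body, indent=2))
```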

New kNN search API

With 8.0, we’re introducing a technical preview of the kNN search API.

A k-nearest neighbor (kNN) search finds the k vectors in a dense_vector field that are nearest to a query vector, as measured by a similarity metric. kNN is commonly used to power recommendation engines and to rank results by relevance using natural language processing (NLP) algorithms.
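The following plain-Python sketch makes the definition concrete with an exact kNN search over an in-memory dictionary of vectors. It uses Euclidean (L2) distance as the similarity metric, which is only one of several possible metrics. It illustrates the idea and does not reflect how Elasticsearch implements exact or approximate search.

```python
import heapq
import math
from typing import Dict, List, Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_knn(vectors: Dict[str, Sequence[float]], query: Sequence[float], k: int) -> List[str]:
    """Return the ids of the k stored vectors closest to query, nearest first."""
    # Brute force: every stored vector is compared with the query, so the
    # cost of each search grows linearly with the number of vectors.
    nearest = heapq.nsmallest(k, vectors.items(), key=lambda item: l2_distance(item[1], query))
    return [doc_id for doc_id, _ in nearest]


if __name__ == "__main__":
    docs = {"a": [0.0, 0.0], "b": [1.0, 1.0], "c": [5.0, 5.0]}
    print(exact_knn(docs, [0.9, 0.9], k=2))  # ['b', 'a']
```

The linear scan is what makes exact search accurate but expensive on large datasets.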

Previously, Elasticsearch supported only exact kNN search, using a script_score query with a vector function. This method guarantees accurate results, but the script scores every matching document, so searches are often slow and scale poorly to large datasets. The new kNN search API runs approximate kNN searches instead. It trades slower indexing and imperfect accuracy for faster searches on larger datasets.
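The following Python sketch builds the two request bodies involved, following the 8.0 technical-preview format. The first is a dense_vector mapping indexed for approximate search. The second is a body for the _knn_search endpoint. The index name, field name, and vector values are hypothetical. Because the API is a technical preview, its shape may change in later releases, so check the kNN search API reference for your version.

```python
import json
from typing import List


def dense_vector_mapping(field: str, dims: int, similarity: str = "l2_norm") -> dict:
    """Index mapping for a dense_vector field indexed for approximate kNN."""
    return {
        "mappings": {
            "properties": {
                field: {
                    "type": "dense_vector",
                    "dims": dims,
                    "index": True,  # build the data structures approximate kNN needs
                    "similarity": similarity,
                }
            }
        }
    }


def knn_search_body(field: str, query_vector: List[float], k: int, num_candidates: int) -> dict:
    """Request body for the technical-preview _knn_search endpoint."""
    # Mirrors the server-side rule that num_candidates cannot be smaller than k.
    if num_candidates < k:
        raise ValueError("num_candidates must be at least k")
    return {
        "knn": {
            "field": field,
            "query_vector": query_vector,
            "k": k,
            # Candidates gathered per shard: higher is more accurate but slower.
            "num_candidates": num_candidates,
        }
    }


if __name__ == "__main__":
    # Hypothetical index and field names.
    print("PUT my-index")
    print(json.dumps(dense_vector_mapping("image-vector", dims=3), indent=2))
    print("GET my-index/_knn_search")
    print(json.dumps(knn_search_body("image-vector", [-5.0, 9.0, -12.0], k=10, num_candidates=100), indent=2))
```

Raising num_candidates relative to k is the main knob for trading speed against accuracy.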

Storage savings for keyword, match_only_text, and text fields

We’ve updated inverted indices, an internal data structure, to use a more space-efficient encoding. The change benefits keyword and match_only_text fields and, to a lesser extent, text fields. In our benchmarks on application logs, it reduced the index size of the message field (mapped as match_only_text) by 14.4% and the overall on-disk footprint by 3.5%.

New indices use the new encoding automatically, and existing indices pick it up for every new segment they write.

Faster indexing of geo_point, geo_shape, and range fields

We’ve optimized indexing speed for multi-dimensional points, an internal data structure used by geo_point, geo_shape, and range fields. Lucene-level benchmarks showed 10-15% faster indexing for these field types. Elasticsearch indices and data streams that consist mostly of these fields may see noticeably faster indexing.

PyTorch model support for natural language processing (NLP)

You can now upload PyTorch models that were trained outside Elasticsearch and use them for inference at ingest time. Support for third-party models brings modern natural language processing (NLP) and search use cases to the Elastic Stack, such as:

  • Fill-mask
  • Named entity recognition (NER)
  • Text classification
  • Text embedding
  • Zero-shot classification
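After a model has been uploaded and deployed (for example, with the Eland client's eland_import_hub_model script), ingest-time inference runs through an inference processor in an ingest pipeline. The following Python sketch builds such a pipeline body. The pipeline name, model ID, and source field are hypothetical. The sketch assumes that the model reads its input from a field named text_field, which is the convention for NLP models imported with Eland. Check your model's configuration before you rely on it.

```python
import json


def nlp_pipeline(model_id: str, source_field: str, target_field: str = "ml.inference") -> dict:
    """Ingest pipeline body that runs a deployed NLP model on one text field."""
    return {
        "description": "Run a third-party PyTorch NLP model at ingest time",
        "processors": [
            {
                "inference": {
                    "model_id": model_id,
                    "target_field": target_field,
                    # Map the document field onto the input field the model expects
                    # (assumed to be text_field, the Eland convention).
                    "field_map": {source_field: "text_field"},
                }
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical pipeline name, model ID, and source field.
    print("PUT _ingest/pipeline/ner-pipeline")
    print(json.dumps(nlp_pipeline("my-ner-model", "message"), indent=2))
```

The same pipeline shape works for any of the tasks listed above. Only the model and the shape of the results written to the target field change.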