What’s new in 8.8

Here are the highlights of what’s new and improved in Elasticsearch 8.8! For detailed information about this release, see the Release notes and Migration guide.

Encode using 40, 48 and 56 bits per value

We use the encoding as follows:

* for values taking [33, 40] bits per value, encode using 40 bits per value
* for values taking [41, 48] bits per value, encode using 48 bits per value
* for values taking [49, 56] bits per value, encode using 56 bits per value
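
The bucketing rule above amounts to rounding the required width up to the next whole byte. A minimal sketch, where the helper name `encoded_bits_per_value` is hypothetical and not part of Elasticsearch:

```python
def encoded_bits_per_value(bits_required: int) -> int:
    """Map a block's required bits per value to the 40-, 48- or 56-bit width.

    Hypothetical helper illustrating the rule above, not Elasticsearch code.
    Widths of 32 bits or less, and above 56 bits, are outside its scope.
    """
    if not 33 <= bits_required <= 56:
        raise ValueError("this sketch only covers 33 to 56 bits per value")
    # Round up to the next multiple of 8 bits, i.e. a whole number of bytes.
    return (bits_required + 7) // 8 * 8


# The widest value in a block determines the bits required.
block = [3, 2**44, 17]
print(encoded_bits_per_value(max(block).bit_length()))  # 48
```

Here the largest value, 2**44, needs 45 bits, so the block would be written with 48 bits per value.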

This is an improvement over the encoding used by ForUtils, which does not apply any compression to values taking more than 32 bits per value.

Note that 40, 48 and 56 bits per value are exact multiples of a byte (5, 6 and 7 bytes per value). As a result, we always write each value using 3, 2 or 1 fewer bytes than the 8 bytes required for a long value.
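
To illustrate why byte-aligned widths are convenient, the sketch below writes each value in exactly 5, 6 or 7 bytes and reads it back with no bit shifting across byte boundaries. The little-endian byte order and the function names are assumptions for illustration; this is not the actual Elasticsearch codec.

```python
def pack_block(values: list[int], bytes_per_value: int) -> bytes:
    """Write each non-negative value using exactly `bytes_per_value` bytes."""
    out = bytearray()
    for value in values:
        # int.to_bytes raises OverflowError if the value does not fit.
        out += value.to_bytes(bytes_per_value, "little")
    return bytes(out)


def unpack_block(data: bytes, bytes_per_value: int) -> list[int]:
    """Read back values written by pack_block."""
    return [
        int.from_bytes(data[i:i + bytes_per_value], "little")
        for i in range(0, len(data), bytes_per_value)
    ]


values = [2**40 - 1, 12345, 2**35]  # all fit in 40 bits
packed = pack_block(values, 5)
print(len(packed))  # 15: three values at 5 bytes each instead of 24
assert unpack_block(packed, 5) == values
```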

Looking at the savings in stored bytes, for a block of 128 (long) values we would normally store 128 x 8 bytes = 1024 bytes, while now we have the following:

* 40 bits per value: write 645 bytes instead of 1024, saving 379 bytes (37%)
* 48 bits per value: write 772 bytes instead of 1024, saving 252 bytes (24%)
* 56 bits per value: write 897 bytes instead of 1024, saving 127 bytes (12%)
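
The raw payload of a block is plain arithmetic: 128 values times the encoded width, divided by 8. The sketch below (hypothetical helper name) gives 640, 768 and 896 bytes, a few bytes less than the figures listed above, which presumably also include some per-block metadata that this sketch does not model.

```python
BLOCK_SIZE = 128  # long values per block, as in the text above


def raw_payload_bytes(bits_per_value: int, block_size: int = BLOCK_SIZE) -> int:
    """Bytes taken by the packed values alone, excluding any block metadata."""
    return block_size * bits_per_value // 8


for bits in (40, 48, 56, 64):
    payload = raw_payload_bytes(bits)
    saved = raw_payload_bytes(64) - payload
    print(f"{bits} bits: {payload} bytes, {saved} bytes saved vs. 64 bits")
```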

We also apply this compression to gauge metrics, on the assumption that compressing values taking more than 32 bits per value works well for floating-point values because of the way they are represented (IEEE 754 format).
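
One way to see why floating-point gauges can land in this range: nearby doubles share their sign, exponent and leading mantissa bits, so after subtracting the smallest bit pattern in a block the remainder typically needs more than 32 bits but well under 64. The sketch below assumes a simple min-offset step on raw IEEE 754 bit patterns of positive values; it illustrates the reasoning only and is not the actual Elasticsearch encoding pipeline. The gauge readings are hypothetical.

```python
import struct


def double_to_long_bits(x: float) -> int:
    """Raw IEEE 754 binary64 bit pattern of x as an unsigned 64-bit integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def bits_after_offset(values: list[float]) -> int:
    """Bits per value needed once the smallest bit pattern is subtracted.

    Assumes all values are positive, so bit patterns sort like the values.
    """
    patterns = [double_to_long_bits(v) for v in values]
    low = min(patterns)
    return max(p - low for p in patterns).bit_length()


gauge = [100.0 + i * 0.1 for i in range(10)]  # hypothetical readings
print(bits_after_offset(gauge))  # 46, which falls in the 48-bit bucket
```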

#93371

Add support for Reciprocal Rank Fusion (RRF) to the search API

This change adds reciprocal rank fusion (RRF), which merges 1...n result sets using the basic formula sum(1/(k+d)), where k is a ranking constant and d is a document's position within the result set of a query. The main advantage of ranking this way is that the scores of the different result sets do not have to be normalized relative to each other, because RRF relies only on positions within each result set.
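
Written out, with n result sets, ranking constant k, and d_i the document's position in result set i (positions assumed here to start at 1; a set that does not contain the document contributes nothing), followed by a worked case with k = 20 for a document ranked first in one set and third in another:

```latex
\mathrm{score} = \sum_{i=1}^{n} \frac{1}{k + d_i}
\qquad
k = 20,\ d_1 = 1,\ d_2 = 3:\quad
\frac{1}{20 + 1} + \frac{1}{20 + 3} = \frac{1}{21} + \frac{1}{23} = \frac{44}{483} \approx 0.0911
```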

The API for this change adds a top-level rank element to the search endpoint. For example:

{
  "query": {
    "match": {
      "product": {
        "query": "brown shoes"
      }
    }
  },
  "knn": {
    "field": "product-vector",
    "query_vector": [54, 10, -2],
    "k": 20,
    "num_candidates": 75
  },
  "rank": {
    "rrf": {
      "window_size": 100,
      "rank_constant": 20
    }
  }
}

The example above executes the search query and the knn search separately. Their result sets are kept separate until they are combined and ranked using RRF on the coordinating node.
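
The coordinating-node step can be simulated in a few lines. This is a sketch under stated assumptions, not Elasticsearch code: each result set is cut to `window_size` entries before fusion, positions start at 1, and the final list is trimmed to a hypothetical `size` standing in for the request's result count. The function name and document IDs are hypothetical.

```python
from collections import defaultdict


def rrf_merge(result_sets, rank_constant, window_size, size):
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.

    Sketch only: each result set is cut to `window_size` entries, every
    document scores sum(1 / (rank_constant + position)) over the sets it
    appears in (positions start at 1), and the top `size` documents are kept.
    """
    scores = defaultdict(float)
    for results in result_sets:
        for position, doc_id in enumerate(results[:window_size], start=1):
            scores[doc_id] += 1.0 / (rank_constant + position)
    # Highest RRF score first; ties broken by document ID for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:size]


# Hypothetical document IDs returned by the match query and the knn search.
query_hits = ["a", "b", "c"]
knn_hits = ["c", "a", "d"]
merged = rrf_merge([query_hits, knn_hits], rank_constant=20, window_size=100, size=3)
print([doc_id for doc_id, _ in merged])  # ['a', 'c', 'b']
```

Document c is first in the knn results but ends up second: a scores 1/21 + 1/22, slightly more than c's 1/23 + 1/21, because it ranks well in both lists.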

#93396

Add new similarity field to knn clause in _search

This adds a new similarity parameter to knn that filters out nearest-neighbor results falling outside a given similarity.

num_candidates and k are still required, as they control the accuracy and exploration of the nearest-neighbor vector search. On each shard, the query searches num_candidates candidates and keeps only those within the provided similarity boundary; the results are then reduced to the global top k as usual.

For example, when the field is indexed with the l2_norm similarity, the new parameter can be considered a radius post-filter on knn results.
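
The sketch below walks through those steps for an l2_norm field, assuming (consistent with the radius interpretation above) that similarity acts as a maximum distance from the query vector. It uses exact brute-force search in place of Elasticsearch's approximate search, and all function names and data are hypothetical. Similarity metrics where a larger value means closer are not covered.

```python
import math


def l2_distance(a, b):
    """Euclidean (l2_norm) distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def shard_knn(shard_docs, query_vector, num_candidates, similarity):
    """Per-shard step: take num_candidates nearest docs, then drop those
    farther than `similarity` from the query (the radius post-filter)."""
    # Exact brute-force search stands in for the approximate search.
    candidates = sorted(
        (l2_distance(vector, query_vector), doc_id)
        for doc_id, vector in shard_docs.items()
    )[:num_candidates]
    return [(dist, doc_id) for dist, doc_id in candidates if dist <= similarity]


def knn_search(shards, query_vector, k, num_candidates, similarity):
    """Merge the filtered per-shard candidates and reduce to the global top k."""
    merged = []
    for shard_docs in shards:
        merged.extend(shard_knn(shard_docs, query_vector, num_candidates, similarity))
    return [doc_id for _, doc_id in sorted(merged)[:k]]


# Two hypothetical shards mapping document IDs to 2-dimensional vectors.
shards = [
    {"a": [0.0, 0.0], "b": [3.0, 4.0]},
    {"c": [1.0, 0.0], "d": [0.0, 10.0]},
]
print(knn_search(shards, [0.0, 0.0], k=4, num_candidates=2, similarity=5.0))
# ['a', 'c', 'b']: d is a candidate on its shard but lies outside the radius.
```

Even with k=4, only three documents come back, because the similarity filter removes d before the global top-k reduction.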

Relates to: https://github.com/elastic/elasticsearch/issues/84929 and https://github.com/elastic/elasticsearch/pull/93574

#94828

GA release of the JWT realm

This PR removes the beta label from the JWT realm feature, making it generally available (GA).

#95398

The Elastic Learned Sparse EncodeR (ELSER) model

In 8.8, we introduce the Elastic Learned Sparse EncodeR (ELSER) model to our library of machine learning models that you can use out of the box. ELSER improves the relevance of your search results by enabling semantic search, a search method that considers the meaning of words rather than relying solely on literal terms. ELSER is a pre-trained, out-of-domain sparse vector model, so it does not need fine-tuning on your specific source data and provides relevant search results right from the start.