What’s new in 8.8
Here are the highlights of what’s new and improved in Elasticsearch 8.8! For detailed information about this release, see the Release notes and Migration guide.
Encode using 40, 48 and 56 bits per value
We use the encoding as follows:
* for values taking [33, 40] bits per value, encode using 40 bits per value
* for values taking [41, 48] bits per value, encode using 48 bits per value
* for values taking [49, 56] bits per value, encode using 56 bits per value
This is an improvement over the encoding used by ForUtils, which does not apply any compression to values taking more than 32 bits per value.
Note that 40, 48 and 56 bits per value are exact multiples of a byte (5, 6 and 7 bytes per value). As a result, we always write each value using 3, 2 or 1 bytes fewer than the 8 bytes required for a long value.
Looking at the savings in stored bytes, for a block of 128 (long) values we would normally store 128 x 8 bytes = 1024 bytes, while now we have the following:
* 40 bits per value: write 645 bytes instead of 1024, saving 379 bytes (37%)
* 48 bits per value: write 772 bytes instead of 1024, saving 252 bytes (24%)
* 56 bits per value: write 897 bytes instead of 1024, saving 127 bytes (12%)
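The width selection and byte-aligned packing described above can be illustrated with a short Python sketch. It is not the Elasticsearch codec: it assumes non-negative values, the function names are hypothetical, and it counts only the packed payload. For a 128-value block it therefore yields 640, 768 and 896 bytes, and it does not attempt to reproduce the exact byte counts listed above.

```python
from typing import List, Optional, Tuple


def byte_aligned_width(bits_required: int) -> Optional[int]:
    """Map a bits-per-value requirement to 40, 48 or 56 bits.

    Returns None when this byte-aligned scheme does not apply
    (32 bits or fewer, or more than 56 bits).
    """
    for low, high in ((33, 40), (41, 48), (49, 56)):
        if low <= bits_required <= high:
            return high
    return None


def pack_block(values: List[int]) -> Tuple[int, bytes]:
    """Pack a block of non-negative longs using 5, 6 or 7 bytes per value (sketch only)."""
    if not values:
        raise ValueError("empty block")
    if any(v < 0 for v in values):
        raise ValueError("this sketch assumes non-negative values")
    # The widest value in the block decides the width used for every value.
    bits_required = max(v.bit_length() for v in values)
    width = byte_aligned_width(bits_required)
    if width is None:
        raise ValueError(f"{bits_required} bits per value is outside the 33..56 range")
    nbytes = width // 8  # exactly 5, 6 or 7 bytes per value
    return width, b"".join(v.to_bytes(nbytes, "little") for v in values)


def unpack_block(width: int, data: bytes) -> List[int]:
    """Inverse of pack_block: read fixed-size little-endian chunks back into ints."""
    nbytes = width // 8
    return [int.from_bytes(data[i:i + nbytes], "little") for i in range(0, len(data), nbytes)]


block = [2**39 + i for i in range(128)]  # every value needs exactly 40 bits
width, data = pack_block(block)
print(width, len(data), 128 * 8 - len(data))  # 40 640 384
```

Packing at byte granularity keeps every value on a byte boundary, which is what makes the "3, 2 or 1 bytes fewer per value" arithmetic straightforward.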
We also apply compression to gauge metrics under the assumption that compressing values taking more than 32 bits per value works well for floating point values, because of the way floating point values are represented (IEEE 754 format).
Add support for Reciprocal Rank Fusion (RRF) to the search API
This change adds reciprocal rank fusion (RRF), which merges 1...n result sets using the basic formula sum(1/(k+d)), where k is a ranking constant and d is a document’s scored position within a result set from a query. The main advantage of ranking this way is that the scores of the different result sets do not have to be normalized relative to each other, because RRF relies only on positions within each result set.
The API for this change adds a top-level rank element to the search endpoint. An example:
{
  "query": {
    "match": {
      "product": {
        "query": "brown shoes"
      }
    }
  },
  "knn": {
    "field": "product-vector",
    "query_vector": [54, 10, -2],
    "k": 20,
    "num_candidates": 75
  },
  "rank": {
    "rrf": {
      "window_size": 100,
      "rank_constant": 20
    }
  }
}
The example above executes the match query and the knn search separately. Their result sets stay separate until they are ranked together on the coordinating node using RRF.
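The RRF formula and the two parameters in the example can be illustrated with a minimal Python sketch. It assumes positions start at 1, uses rank_constant as k in sum(1/(k+d)), and uses window_size as the number of hits taken from each result set. The function name, the list-of-document-ids input and the tie-breaking by id are hypothetical choices for this illustration, not Elasticsearch's implementation.

```python
from collections import defaultdict


def reciprocal_rank_fusion(result_sets, rank_constant, window_size):
    """Fuse best-first lists of document ids with RRF.

    Returns (doc_id, rrf_score) pairs sorted by descending fused score.
    """
    scores = defaultdict(float)
    for results in result_sets:
        # Only the first `window_size` hits of each result set take part.
        for position, doc_id in enumerate(results[:window_size], start=1):
            # Each appearance contributes 1 / (k + d); the raw query scores are never used,
            # so the result sets need no score normalization.
            scores[doc_id] += 1.0 / (rank_constant + position)
    # Highest fused score first; ties broken by doc id for a deterministic order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


match_hits = ["a", "b", "c"]  # hypothetical ids returned by the match query
knn_hits = ["c", "a", "d"]    # hypothetical ids returned by the knn search
fused = reciprocal_rank_fusion([match_hits, knn_hits], rank_constant=20, window_size=100)
print([doc_id for doc_id, _ in fused])  # ['a', 'c', 'b', 'd']
```

Document "a" ranks first because it appears near the top of both result sets (1/21 + 1/22), even though it is not first in the knn results.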
Add new similarity field to knn clause in _search
This adds a new similarity parameter to knn that filters out nearest-neighbor results falling outside a given similarity. num_candidates and k are still required, as they control the accuracy and exploration of the nearest-neighbor vector search. On each shard, the query searches num_candidates candidates and keeps only those within the provided similarity boundary; the results are then reduced to the global top k as usual.
For example, when using the l2_norm indexed similarity value, this could be considered a radius post-filter on knn.
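The per-shard flow described above can be sketched in Python. This is an exact brute-force search, whereas Elasticsearch uses approximate HNSW search. The sketch also assumes that, for l2_norm, the similarity threshold is compared against the raw L2 distance (smaller is closer), which matches the "radius" reading; the max_distance parameter stands in for similarity, and the function names and (doc_id, vector) shard layout are hypothetical.

```python
import math


def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search_shard(shard, query_vector, num_candidates, max_distance):
    """Per-shard step: explore num_candidates neighbours, then apply the similarity post-filter."""
    # Exact nearest neighbours, closest first (a stand-in for approximate HNSW search).
    nearest = sorted((l2_distance(vector, query_vector), doc_id) for doc_id, vector in shard)
    candidates = nearest[:num_candidates]
    # Keep only candidates inside the radius, i.e. within the similarity boundary.
    return [(dist, doc_id) for dist, doc_id in candidates if dist <= max_distance]


def knn_with_similarity(shards, query_vector, k, num_candidates, max_distance):
    """Merge the filtered per-shard candidates and reduce to the global top k."""
    merged = []
    for shard in shards:
        merged.extend(search_shard(shard, query_vector, num_candidates, max_distance))
    merged.sort()
    return [doc_id for _, doc_id in merged[:k]]


shards = [
    [("a", [0.0, 0.0]), ("b", [3.0, 4.0])],
    [("c", [1.0, 0.0]), ("d", [0.0, 10.0])],
]
print(knn_with_similarity(shards, [0.0, 0.0], k=2, num_candidates=2, max_distance=5.0))  # ['a', 'c']
```

Because the filter runs after the num_candidates exploration, tightening the radius can return fewer than k hits: with max_distance=0.5 only "a" survives.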
Relates to: https://github.com/elastic/elasticsearch/issues/84929 and https://github.com/elastic/elasticsearch/pull/93574
GA release of the JWT realm
This release removes the beta label from the JWT realm feature, making it generally available (GA).
The Elastic Learned Sparse EncodeR (ELSER) model
In 8.8, we introduce the Elastic Learned Sparse EncodeR model to our machine learning model library that you can use out of the box. ELSER improves the relevance of your search results by enabling semantic search. This search method considers the meaning of words rather than solely relying on literal terms. ELSER is a pre-trained, out-of-domain sparse vector model that eliminates the need for fine-tuning on your specific source data. It provides you with relevant search results right from the start.