Script score query
Uses a script to provide a custom score for returned documents.
The script_score query is useful if, for example, a scoring function is expensive and you only need to calculate the score of a filtered set of documents.
Example request
The following script_score query assigns each returned document a score equal to the likes field value divided by 10.
GET /_search { "query" : { "script_score" : { "query" : { "match": { "message": "elasticsearch" } }, "script" : { "source" : "doc['likes'].value / 10 " } } } }
Top-level parameters for script_score
- query: (Required, query object) Query used to return documents.
- script: (Required, script object) Script used to compute the score of documents returned by the query.
  Final relevance scores from the script_score query cannot be negative. To support certain search optimizations, Lucene requires scores be positive or 0.
- min_score: (Optional, float) Documents with a relevance score lower than this floating point number are excluded from the search results.
Notes
Use relevance scores in a script
Within a script, you can access the _score variable, which represents the current relevance score of a document.
Predefined functions
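You can use any of the available Painless functions in your script. You can also use the following predefined functions to customize scoring:
- Saturation
- Sigmoid
- Random score function
- Decay functions for numeric fields
- Decay functions for geo fields
- Decay functions for date fields
- Functions for vector fields
We suggest using these predefined functions instead of writing your own. These functions take advantage of efficiencies from Elasticsearch's internal mechanisms.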
Saturation
saturation(value, k) = value / (k + value)
"script" : { "source" : "saturation(doc['likes'].value, 1)" }
Sigmoid
sigmoid(value, k, a) = value^a / (k^a + value^a)
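The saturation and sigmoid formulas can be checked numerically with a minimal Python sketch. This is plain Python, not Painless, and the sample likes counts are hypothetical. It illustrates that k acts as a pivot: when value equals k, both functions return 0.5.

```python
def saturation(value: float, k: float) -> float:
    """saturation(value, k) = value / (k + value)"""
    return value / (k + value)


def sigmoid(value: float, k: float, a: float) -> float:
    """sigmoid(value, k, a) = value^a / (k^a + value^a)"""
    return value ** a / (k ** a + value ** a)


# Hypothetical 'likes' counts, chosen only for illustration.
for likes in (1, 2, 10, 100):
    print(likes, saturation(likes, 1), sigmoid(likes, 2, 1))
```

With a = 1, sigmoid reduces to saturation, so the two differ only in how sharply the curve bends around k.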
"script" : { "source" : "sigmoid(doc['likes'].value, 2, 1)" }
Random score function
The random_score function generates scores that are uniformly distributed from 0 up to but not including 1.
The randomScore function has the following syntax: randomScore(<seed>, <fieldName>). It has a required parameter, seed, as an integer value, and an optional parameter, fieldName, as a string value.
"script" : { "source" : "randomScore(100, '_seq_no')" }
If the fieldName parameter is omitted, the internal Lucene document IDs are used as the source of randomness. This is very efficient, but not reproducible, since documents might be renumbered by merges.
"script" : { "source" : "randomScore(100)" }
Note that documents that are within the same shard and have the same value for the field will get the same score, so it is usually desirable to use a field that has unique values for all documents across a shard. A good default choice might be the _seq_no field, whose only drawback is that scores will change if the document is updated, since update operations also update the value of the _seq_no field.
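The reproducibility argument above can be illustrated with a small Python sketch. The helper random_score_sketch is hypothetical and does not reproduce the hashing Elasticsearch uses internally; it only shows that a score derived from a seed and a field value is stable across calls, and that two documents sharing a field value share a score.

```python
import hashlib


def random_score_sketch(seed: int, field_value: int) -> float:
    """Map (seed, field_value) to a float in [0, 1).

    Hypothetical helper for illustration; NOT Elasticsearch's actual hash.
    """
    digest = hashlib.sha256(f"{seed}:{field_value}".encode()).digest()
    # Keep 53 bits so the division below is exact and always below 1.0.
    bits = int.from_bytes(digest[:8], "big") >> 11
    return bits / 2 ** 53


# Same seed and same field value: the score is reproducible.
print(random_score_sketch(100, 7) == random_score_sketch(100, 7))
# A different field value (for example, a new _seq_no after an update)
# leads to a different score.
print(random_score_sketch(100, 7) == random_score_sketch(100, 8))
```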
Decay functions for numeric fields
You can read more about decay functions in the function_score query documentation.
- double decayNumericLinear(double origin, double scale, double offset, double decay, double docValue)
- double decayNumericExp(double origin, double scale, double offset, double decay, double docValue)
- double decayNumericGauss(double origin, double scale, double offset, double decay, double docValue)
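To make the role of each parameter concrete, here is a minimal Python sketch of the three numeric decay shapes. It assumes these functions follow the same formulas documented for decay functions in the function_score query; the Python function names and sample values are illustrative, not part of any Elasticsearch API.

```python
import math


def _distance(origin, offset, doc_value):
    # Distance from the origin, ignoring anything within the offset.
    return max(0.0, abs(doc_value - origin) - offset)


def decay_numeric_linear(origin, scale, offset, decay, doc_value):
    s = scale / (1.0 - decay)
    return max(0.0, (s - _distance(origin, offset, doc_value)) / s)


def decay_numeric_exp(origin, scale, offset, decay, doc_value):
    lam = math.log(decay) / scale
    return math.exp(lam * _distance(origin, offset, doc_value))


def decay_numeric_gauss(origin, scale, offset, decay, doc_value):
    sigma_sq = -(scale ** 2) / (2.0 * math.log(decay))
    d = _distance(origin, offset, doc_value)
    return math.exp(-(d ** 2) / (2.0 * sigma_sq))


# Hypothetical parameters: at a distance of exactly `scale` from the origin,
# each function returns the `decay` value.
for fn in (decay_numeric_linear, decay_numeric_exp, decay_numeric_gauss):
    print(fn.__name__, fn(20, 10, 0, 0.5, 30))
```

Under these formulas, a document at the origin scores 1, and a document at distance scale + offset from the origin scores decay.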
Decay functions for geo fields
- double decayGeoLinear(String originStr, String scaleStr, String offsetStr, double decay, GeoPoint docValue)
- double decayGeoExp(String originStr, String scaleStr, String offsetStr, double decay, GeoPoint docValue)
- double decayGeoGauss(String originStr, String scaleStr, String offsetStr, double decay, GeoPoint docValue)
"script" : { "source" : "decayGeoExp(params.origin, params.scale, params.offset, params.decay, doc['location'].value)", "params": { "origin": "40, -70.12", "scale": "200km", "offset": "0km", "decay" : 0.2 } }
Decay functions for date fields
- double decayDateLinear(String originStr, String scaleStr, String offsetStr, double decay, JodaCompatibleZonedDateTime docValueDate)
- double decayDateExp(String originStr, String scaleStr, String offsetStr, double decay, JodaCompatibleZonedDateTime docValueDate)
- double decayDateGauss(String originStr, String scaleStr, String offsetStr, double decay, JodaCompatibleZonedDateTime docValueDate)
"script" : { "source" : "decayDateGauss(params.origin, params.scale, params.offset, params.decay, doc['date'].value)", "params": { "origin": "2008-01-01T01:00:00Z", "scale": "1h", "offset" : "0", "decay" : 0.5 } }
Decay functions on dates are limited to dates in the default format and default time zone. Also, calculations with now are not supported.
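As a rough illustration of how the date parameters interact, the following Python sketch applies a Gaussian decay to the distance between a document date and the origin. It assumes the same Gaussian formula documented for function_score decay functions, uses the origin, scale, offset, and decay values from the example above, and represents the scale and offset as timedelta objects instead of parsing strings such as "1h". The function name is illustrative.

```python
import math
from datetime import datetime, timedelta, timezone


def decay_date_gauss(origin, scale, offset, decay, doc_date):
    """Gaussian decay over the time distance between doc_date and origin."""
    scale_s = scale.total_seconds()
    distance_s = max(
        0.0, abs((doc_date - origin).total_seconds()) - offset.total_seconds()
    )
    sigma_sq = -(scale_s ** 2) / (2.0 * math.log(decay))
    return math.exp(-(distance_s ** 2) / (2.0 * sigma_sq))


origin = datetime(2008, 1, 1, 1, 0, 0, tzinfo=timezone.utc)
scale = timedelta(hours=1)
offset = timedelta(0)

# Hypothetical document dates: at the origin, one hour away, and one day away.
for delta in (timedelta(0), timedelta(hours=1), timedelta(days=1)):
    print(delta, decay_date_gauss(origin, scale, offset, 0.5, origin + delta))
```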
Functions for vector fields
Functions for vector fields are accessible through the script_score query.
Faster alternatives
The script_score query calculates the score for every matching document, or hit. There are faster alternative query types that can efficiently skip non-competitive hits:
- If you want to boost documents on some static fields, use the rank_feature query.
- If you want to boost documents closer to a date or geographic point, use the distance_feature query.
Transition from the function score query
We are deprecating the function_score query. We recommend using the script_score query instead.
You can implement the following functions from the function_score query using the script_score query:
script_score
Any script you used in the script_score function of the function_score query can be copied into the script_score query without changes.
weight
The weight function can be implemented in the script_score query through the following script:
"script" : { "source" : "params.weight * _score", "params": { "weight": 2 } }
random_score
Use the randomScore function, as described in the random score function section.
field_value_factor
The field_value_factor function can be implemented through a script:
"script" : { "source" : "Math.log10(doc['field'].value * params.factor)", "params" : { "factor" : 5 } }
To check whether a document is missing a value, you can use doc['field'].size() == 0. For example, this script uses a value of 1 if a document doesn't have the field named field:
"script" : { "source" : "Math.log10((doc['field'].size() == 0 ? 1 : doc['field'].value) * params.factor)", "params" : { "factor" : 5 } }
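This table lists how field_value_factor modifiers can be implemented through a script:
Modifier | Implementation in Script Score |
---|---|
none | - |
log | Math.log10(doc['f'].value) |
log1p | Math.log10(doc['f'].value + 1) |
log2p | Math.log10(doc['f'].value + 2) |
ln | Math.log(doc['f'].value) |
ln1p | Math.log(doc['f'].value + 1) |
ln2p | Math.log(doc['f'].value + 2) |
square | Math.pow(doc['f'].value, 2) |
sqrt | Math.sqrt(doc['f'].value) |
reciprocal | 1.0 / doc['f'].value |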
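The field_value_factor translation can also be sketched outside Painless. The Python below is a hypothetical model, not an Elasticsearch API: it maps the modifier names of the function_score query (none, log, log1p, log2p, ln, ln1p, ln2p, square, sqrt, reciprocal) to the corresponding math, multiplies the field value by the factor before applying the modifier (as in the scripts above), and falls back to a default when the field is missing. Documents are modelled as plain dictionaries.

```python
import math

# Modifier name -> function applied to (field value * factor).
MODIFIERS = {
    "none": lambda v: v,
    "log": math.log10,
    "log1p": lambda v: math.log10(v + 1),
    "log2p": lambda v: math.log10(v + 2),
    "ln": math.log,
    "ln1p": lambda v: math.log(v + 1),
    "ln2p": lambda v: math.log(v + 2),
    "square": lambda v: v ** 2,
    "sqrt": math.sqrt,
    "reciprocal": lambda v: 1.0 / v,
}


def field_value_factor(doc, field, factor=1.0, modifier="none", missing=1.0):
    """Score a dict-based document; `missing` is used when the field is absent."""
    value = doc.get(field, missing)
    return MODIFIERS[modifier](value * factor)


# Hypothetical documents: one with the field, one without it.
print(field_value_factor({"field": 20}, "field", factor=5, modifier="log"))
print(field_value_factor({}, "field", factor=5, modifier="log"))
```

Note that log and ln are undefined for values of 0 or less, so a script that uses them should guard against such field values to keep the final score non-negative.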
decay functions
The script_score query has equivalent decay functions that can be used in a script.
Functions for vector fields
This functionality is in technical preview and may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features.
These functions are used for dense_vector and sparse_vector fields.
When vector functions are calculated, all matched documents are linearly scanned. Expect the query time to grow linearly with the number of matched documents. For this reason, we recommend limiting the number of matched documents with a query parameter.
Let’s create an index with the following mapping and index a couple of documents into it.
PUT my_index
{ "mappings": { "properties": { "my_dense_vector": { "type": "dense_vector", "dims": 3 }, "my_sparse_vector" : { "type" : "sparse_vector" }, "status" : { "type" : "keyword" } } } }
PUT my_index/_doc/1
{ "my_dense_vector": [0.5, 10, 6], "my_sparse_vector": {"2": 1.5, "15" : 2, "50": -1.1, "4545": 1.1}, "status" : "published" }
PUT my_index/_doc/2
{ "my_dense_vector": [-0.5, 10, 10], "my_sparse_vector": {"2": 2.5, "10" : 1.3, "55": -2.3, "113": 1.6}, "status" : "published" }
For dense_vector fields, cosineSimilarity calculates the cosine similarity between a given query vector and document vectors.
GET my_index/_search { "query": { "script_score": { "query" : { "bool" : { "filter" : { "term" : { "status" : "published" } } } }, "script": { "source": "cosineSimilarity(params.query_vector, doc['my_dense_vector']) + 1.0", "params": { "query_vector": [4, 3.4, -0.2] } } } } }
Notes on this request:
- To restrict the number of documents on which script score calculation is applied, provide a filter.
- The script adds 1.0 to the cosine similarity to prevent the score from being negative.
- To take advantage of the script optimizations, provide a query vector as a script parameter.
If a document’s dense vector field has a number of dimensions different from the query’s vector, an error will be thrown.
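To see why adding 1.0 keeps the score non-negative, here is a minimal Python sketch of the cosine similarity computation for dense vectors, using the query vector and the two document vectors from the requests above. The function is an illustrative reimplementation, not Elasticsearch code; raising ValueError on a dimension mismatch stands in for the error described above.

```python
import math


def cosine_similarity(query_vector, doc_vector):
    """Cosine of the angle between two dense vectors of equal length."""
    if len(query_vector) != len(doc_vector):
        raise ValueError("query and document vectors must have the same dimensions")
    dot = sum(q * d for q, d in zip(query_vector, doc_vector))
    q_norm = math.sqrt(sum(q * q for q in query_vector))
    d_norm = math.sqrt(sum(d * d for d in doc_vector))
    return dot / (q_norm * d_norm)


query_vector = [4, 3.4, -0.2]
docs = {"1": [0.5, 10, 6], "2": [-0.5, 10, 10]}

for doc_id, doc_vector in docs.items():
    # Cosine similarity lies in [-1, 1], so adding 1.0 shifts it into [0, 2].
    print(doc_id, cosine_similarity(query_vector, doc_vector) + 1.0)
```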
Similarly, for sparse_vector fields, cosineSimilaritySparse calculates cosine similarity between a given query vector and document vectors.
GET my_index/_search { "query": { "script_score": { "query" : { "bool" : { "filter" : { "term" : { "status" : "published" } } } }, "script": { "source": "cosineSimilaritySparse(params.query_vector, doc['my_sparse_vector']) + 1.0", "params": { "query_vector": {"2": 0.5, "10" : 111.3, "50": -1.3, "113": 14.8, "4545": 156.0} } } } } }
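For sparse vectors, only dimensions present in both vectors contribute to the dot product. The Python sketch below models sparse vectors as dictionaries from dimension to value, reusing the query vector and document vectors shown above. It is an illustrative reimplementation under that assumption, not the Elasticsearch implementation.

```python
import math


def cosine_similarity_sparse(query_vector, doc_vector):
    """Cosine similarity between two sparse vectors stored as dicts."""
    # Dimensions missing from either vector are treated as 0.
    dot = sum(w * doc_vector[dim] for dim, w in query_vector.items() if dim in doc_vector)
    q_norm = math.sqrt(sum(w * w for w in query_vector.values()))
    d_norm = math.sqrt(sum(w * w for w in doc_vector.values()))
    return dot / (q_norm * d_norm)


query_vector = {"2": 0.5, "10": 111.3, "50": -1.3, "113": 14.8, "4545": 156.0}
docs = {
    "1": {"2": 1.5, "15": 2, "50": -1.1, "4545": 1.1},
    "2": {"2": 2.5, "10": 1.3, "55": -2.3, "113": 1.6},
}

for doc_id, doc_vector in docs.items():
    print(doc_id, cosine_similarity_sparse(query_vector, doc_vector) + 1.0)
```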
For dense_vector fields, dotProduct calculates the dot product between a given query vector and document vectors.
GET my_index/_search { "query": { "script_score": { "query" : { "bool" : { "filter" : { "term" : { "status" : "published" } } } }, "script": { "source": """ double value = dotProduct(params.query_vector, doc['my_dense_vector']); return sigmoid(1, Math.E, -value); """, "params": { "query_vector": [4, 3.4, -0.2] } } } } }
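The call sigmoid(1, Math.E, -value) in the script above may look unusual. Substituting into sigmoid(value, k, a) = value^a / (k^a + value^a) gives 1 / (e^(-value) + 1), which is the standard logistic function. It maps any dot product, including a negative one, into the range 0 to 1, so the score stays non-negative. The Python sketch below checks this identity numerically; the function names are illustrative.

```python
import math


def sigmoid(value, k, a):
    """sigmoid(value, k, a) = value^a / (k^a + value^a)"""
    return value ** a / (k ** a + value ** a)


def dot_product(query_vector, doc_vector):
    return sum(q * d for q, d in zip(query_vector, doc_vector))


def logistic(x):
    """Standard logistic function 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


# Query vector and document vector 2 from the requests above.
value = dot_product([4, 3.4, -0.2], [-0.5, 10, 10])
print(sigmoid(1, math.e, -value), logistic(value))
```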
Similarly, for sparse_vector fields, dotProductSparse calculates the dot product between a given query vector and document vectors.
GET my_index/_search { "query": { "script_score": { "query" : { "bool" : { "filter" : { "term" : { "status" : "published" } } } }, "script": { "source": """ double value = dotProductSparse(params.query_vector, doc['my_sparse_vector']); return sigmoid(1, Math.E, -value); """, "params": { "query_vector": {"2": 0.5, "10" : 111.3, "50": -1.3, "113": 14.8, "4545": 156.0} } } } } }
For dense_vector fields, l1norm calculates the L1 distance (Manhattan distance) between a given query vector and document vectors.
GET my_index/_search { "query": { "script_score": { "query" : { "bool" : { "filter" : { "term" : { "status" : "published" } } } }, "script": { "source": "1 / (1 + l1norm(params.queryVector, doc['my_dense_vector']))", "params": { "queryVector": [4, 3.4, -0.2] } } } } }
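Unlike cosineSimilarity, which measures similarity, l1norm and l2norm measure distance, so more similar vectors produce lower values. The script inverts the distance so that closer vectors score higher, and adds 1 to the denominator to avoid division by zero when a document vector matches the query vector exactly.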
For sparse_vector fields, l1normSparse calculates the L1 distance between a given query vector and document vectors.
GET my_index/_search { "query": { "script_score": { "query" : { "bool" : { "filter" : { "term" : { "status" : "published" } } } }, "script": { "source": "1 / (1 + l1normSparse(params.queryVector, doc['my_sparse_vector']))", "params": { "queryVector": {"2": 0.5, "10" : 111.3, "50": -1.3, "113": 14.8, "4545": 156.0} } } } } }
For dense_vector fields, l2norm calculates the L2 distance (Euclidean distance) between a given query vector and document vectors.
GET my_index/_search { "query": { "script_score": { "query" : { "bool" : { "filter" : { "term" : { "status" : "published" } } } }, "script": { "source": "1 / (1 + l2norm(params.queryVector, doc['my_dense_vector']))", "params": { "queryVector": [4, 3.4, -0.2] } } } } }
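The 1 / (1 + distance) pattern used with l1norm and l2norm turns a distance into a score in which closer vectors rank higher. The Python sketch below reimplements both distances for dense vectors and applies that transformation to the query vector and document vectors from the requests above. The function names are illustrative and this is not Elasticsearch code.

```python
import math


def l1norm(query_vector, doc_vector):
    """Manhattan (L1) distance."""
    return sum(abs(q - d) for q, d in zip(query_vector, doc_vector))


def l2norm(query_vector, doc_vector):
    """Euclidean (L2) distance."""
    return math.sqrt(sum((q - d) ** 2 for q, d in zip(query_vector, doc_vector)))


def distance_to_score(distance):
    # Distance 0 gives the maximum score of 1; larger distances approach 0.
    return 1.0 / (1.0 + distance)


query_vector = [4, 3.4, -0.2]
docs = {"1": [0.5, 10, 6], "2": [-0.5, 10, 10]}

for doc_id, doc_vector in docs.items():
    print(
        doc_id,
        distance_to_score(l1norm(query_vector, doc_vector)),
        distance_to_score(l2norm(query_vector, doc_vector)),
    )
```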
Similarly, for sparse_vector fields, l2normSparse calculates the L2 distance between a given query vector and document vectors.
GET my_index/_search { "query": { "script_score": { "query" : { "bool" : { "filter" : { "term" : { "status" : "published" } } } }, "script": { "source": "1 / (1 + l2normSparse(params.queryVector, doc['my_sparse_vector']))", "params": { "queryVector": {"2": 0.5, "10" : 111.3, "50": -1.3, "113": 14.8, "4545": 156.0} } } } } }
If a document doesn't have a value for a vector field on which a vector function is executed, an error will be thrown.
You can check whether a document has a value for the field my_vector with doc['my_vector'].size() == 0. Your overall script can look like this:
"source": "doc['my_vector'].size() == 0 ? 0 : cosineSimilarity(params.queryVector, doc['my_vector'])"