Combined fields
The combined_fields query supports searching multiple text fields as if their
contents had been indexed into one combined field. It takes a term-centric
view of the query: first it analyzes the query string into individual terms,
then looks for each term in any of the fields. This query is particularly
useful when a match could span multiple text fields, for example the title,
abstract, and body of an article:
GET /_search
{
  "query": {
    "combined_fields": {
      "query": "database systems",
      "fields": [ "title", "abstract", "body" ],
      "operator": "and"
    }
  }
}
The combined_fields query takes a principled approach to scoring based on the
simple BM25F formula described in
The Probabilistic Relevance Framework: BM25 and Beyond.
When scoring matches, the query combines term and collection statistics across
fields. This allows it to score each match as if the specified fields had been
indexed into a single combined field. (Note that this is a best attempt:
combined_fields makes some approximations, and scores will not obey this
model perfectly.)
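To make the "single combined field" idea concrete, the following Python sketch scores one term with a simplified BM25F formula. It is an illustration only, not Elasticsearch's or Lucene's implementation: the function name `bm25f_term_score` and its argument layout are hypothetical, and the defaults `k1=1.2` and `b=0.75` are the usual BM25 defaults, assumed here.

```python
import math


def bm25f_term_score(field_tfs, field_lengths, avg_field_lengths, boosts,
                     doc_freq, doc_count, k1=1.2, b=0.75):
    """Score one term for one document with a simplified BM25F formula.

    Hypothetical helper for illustration. The per-field dictionaries are
    keyed by field name, and `boosts` holds the per-field weights.
    """
    # Merge per-field statistics as if all fields were one weighted field.
    tf = sum(boosts[f] * field_tfs.get(f, 0) for f in boosts)
    length = sum(boosts[f] * field_lengths.get(f, 0) for f in boosts)
    avg_length = sum(boosts[f] * avg_field_lengths[f] for f in boosts)

    # Apply the usual BM25 idf and saturation to the merged statistics.
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    norm = k1 * (1 - b + b * length / avg_length)
    return idf * tf / (tf + norm)


# Example: the term occurs once in the title and three times in the body.
boosts = {"title": 2.0, "body": 1.0}
score = bm25f_term_score(
    field_tfs={"title": 1, "body": 3},
    field_lengths={"title": 6, "body": 120},
    avg_field_lengths={"title": 8.0, "body": 150.0},
    boosts=boosts,
    doc_freq=40,
    doc_count=1000,
)
print(f"{score:.4f}")
```

Because the statistics are merged before saturation is applied, a term found in several fields is saturated once overall rather than once per field. This is the property that separates the combined-field model from summing independent per-field scores.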
Field number limit
There is a limit on the number of fields that can be queried at once. It is
defined by the indices.query.bool.max_clause_count search setting, which
defaults to 1024.
Per-field boosting
Individual fields can be boosted with the caret (^) notation:
GET /_search
{
  "query": {
    "combined_fields": {
      "query": "distributed consensus",
      "fields": [ "title^2", "body" ]
    }
  }
}
Field boosts are interpreted according to the combined field model. For example,
if the title field has a boost of 2, the score is calculated as if each term
in the title appeared twice in the synthetic combined field.
The combined_fields query requires that field boosts are greater than
or equal to 1.0. Field boosts are allowed to be fractional.
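The boost rules above can be sketched in Python as follows. The helpers `parse_field_spec` and `combined_term_frequency` are hypothetical names used for illustration and are not part of any Elasticsearch client. The sketch shows only the caret parsing, the `>= 1.0` check, and how a boost acts as a multiplier on the term count.

```python
def parse_field_spec(spec):
    """Split a field spec such as "title^2" into (field_name, boost)."""
    name, sep, boost_text = spec.partition("^")
    boost = float(boost_text) if sep else 1.0  # no caret means boost 1.0
    if boost < 1.0:
        raise ValueError(f"boost for {name!r} must be >= 1.0, got {boost}")
    return name, boost


def combined_term_frequency(field_specs, term_counts):
    """Weighted count of a term in the synthetic combined field."""
    total = 0.0
    for spec in field_specs:
        name, boost = parse_field_spec(spec)
        # A boost of 2 counts each occurrence in that field twice.
        total += boost * term_counts.get(name, 0)
    return total


# One occurrence in title (boost 2) plus three in body (boost 1).
print(combined_term_frequency(["title^2", "body"], {"title": 1, "body": 3}))
```

Fractional boosts such as "title^1.5" pass the check, while "title^0.5" raises a `ValueError`, mirroring the constraint stated above.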
Top-level parameters for combined_fields
fields
    (Required, array of strings) List of fields to search. Field wildcard
    patterns are allowed. Only text fields are supported, and they must all
    have the same search analyzer.
query
    (Required, string) Text to search for in the provided <fields>. The
    combined_fields query analyzes the provided text before performing a
    search.
auto_generate_synonyms_phrase_query
    (Optional, Boolean) If true, match phrase queries are automatically
    created for multi-term synonyms. Defaults to true. See Use synonyms with
    match query for an example.
operator
    (Optional, string) Boolean logic used to interpret text in the query
    value. Valid values are:
    or (Default)
        For example, a query value of database systems is interpreted as
        database OR systems.
    and
        For example, a query value of database systems is interpreted as
        database AND systems.
minimum_should_match
    (Optional, string) Minimum number of clauses that must match for a
    document to be returned. See the minimum_should_match parameter for valid
    values and more information.
zero_terms_query
    (Optional, string) Indicates whether no documents are returned if the
    analyzer removes all tokens, such as when using a stop filter. Valid
    values are:
    none (Default)
        No documents are returned if the analyzer removes all tokens.
    all
        Returns all documents, similar to a match_all query.
    See Zero terms query for an example.
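As a rough illustration of how the parameters above fit together, this Python sketch assembles a request body and checks the enumerated values before sending anything. `build_combined_fields_query` is a hypothetical helper, not a function of any Elasticsearch client library. The resulting dictionary is plain JSON that could be sent to the _search endpoint with any HTTP client.

```python
import json

VALID_OPERATORS = {"or", "and"}
VALID_ZERO_TERMS = {"none", "all"}


def build_combined_fields_query(query, fields, operator="or",
                                minimum_should_match=None,
                                zero_terms_query="none",
                                auto_generate_synonyms_phrase_query=True):
    """Build a search request body for a combined_fields query."""
    if not fields:
        raise ValueError("fields must contain at least one field name")
    if operator not in VALID_OPERATORS:
        raise ValueError(f"operator must be one of {sorted(VALID_OPERATORS)}")
    if zero_terms_query not in VALID_ZERO_TERMS:
        raise ValueError(
            f"zero_terms_query must be one of {sorted(VALID_ZERO_TERMS)}")

    params = {
        "query": query,
        "fields": list(fields),
        "operator": operator,
        "zero_terms_query": zero_terms_query,
        "auto_generate_synonyms_phrase_query":
            auto_generate_synonyms_phrase_query,
    }
    # Only include the optional parameter when the caller sets it.
    if minimum_should_match is not None:
        params["minimum_should_match"] = minimum_should_match
    return {"query": {"combined_fields": params}}


body = build_combined_fields_query(
    "database systems", ["title", "abstract", "body"], operator="and")
print(json.dumps(body, indent=2))
```

The defaults in the signature mirror the documented defaults (or, none, true), so calling the helper with only `query` and `fields` describes the same search as omitting the optional parameters.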
Comparison to multi_match query
The combined_fields query provides a principled way of matching and scoring
across multiple text fields. To support this, it requires that all
fields have the same search analyzer.
If you want a single query that handles fields of different types like
keywords or numbers, then the multi_match
query may be a better fit. It supports both text and non-text fields, and
accepts text fields that do not share the same analyzer.
The main multi_match modes best_fields and most_fields take a
field-centric view of the query. In contrast, combined_fields is
term-centric: operator and minimum_should_match are applied per-term,
instead of per-field. Concretely, a query like
GET /_search
{
  "query": {
    "combined_fields": {
      "query": "database systems",
      "fields": [ "title", "abstract" ],
      "operator": "and"
    }
  }
}
is executed as
+(combined("database", fields:["title", "abstract"])) +(combined("systems", fields:["title", "abstract"]))
In other words, each term must be present in at least one field for a document to match.
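The difference between applying the and operator per-term and per-field can be sketched in Python. This is a toy model under stated assumptions: `tokenize` is a lowercase whitespace split standing in for a real analyzer, and both matcher functions are hypothetical names that model only matching, not scoring.

```python
def tokenize(text):
    """Stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def term_centric_and(doc, fields, query):
    """Every query term must occur in at least one of the fields."""
    field_terms = [set(tokenize(doc.get(f, ""))) for f in fields]
    return all(
        any(term in terms for terms in field_terms)
        for term in tokenize(query)
    )


def field_centric_and(doc, fields, query):
    """At least one single field must contain every query term."""
    query_terms = tokenize(query)
    return any(
        all(term in set(tokenize(doc.get(f, ""))) for term in query_terms)
        for f in fields
    )


doc = {
    "title": "Database internals",
    "abstract": "A survey of storage systems",
}
fields = ["title", "abstract"]

print(term_centric_and(doc, fields, "database systems"))   # True
print(field_centric_and(doc, fields, "database systems"))  # False
```

In this example the two terms are split across title and abstract, so the term-centric check matches while the field-centric check does not, because no single field contains both terms.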
The cross_fields multi_match mode also takes a term-centric approach and
applies operator and minimum_should_match per-term. The main advantage of
combined_fields over cross_fields is its robust and interpretable approach
to scoring based on the BM25F algorithm.
Custom similarities
The combined_fields query currently only supports the BM25 similarity
(which is the default unless a custom similarity is configured). Per-field
similarities are also not allowed. Using combined_fields in either of these
cases will result in an error.