Manipulating Relevance with Query Structure

The Elasticsearch query DSL is immensely flexible. You can move individual query clauses up and down the query hierarchy to make a clause more or less important. For instance, imagine the following query:

quick OR brown OR red OR fox

We could write this as a bool query with all terms at the same level:

GET /_search
{
  "query": {
    "bool": {
      "should": [
        { "term": { "text": "quick" }},
        { "term": { "text": "brown" }},
        { "term": { "text": "red"   }},
        { "term": { "text": "fox"   }}
      ]
    }
  }
}

But this query might give a document that contains quick, red, and brown the same score as a document that contains quick, red, and fox, even though the second document covers three distinct concepts and the first covers only two. Red and brown are synonyms here, so we probably need only one of them to match. Perhaps we really want to express the query as follows:

quick OR (brown OR red) OR fox

According to standard Boolean logic, this is exactly the same as the original query, but as we have already seen in Combining Queries, a bool query does not concern itself only with whether a document matches, but also with how well it matches.

A better way to write this query is as follows:

GET /_search
{
  "query": {
    "bool": {
      "should": [
        { "term": { "text": "quick" }},
        { "term": { "text": "fox"   }},
        {
          "bool": {
            "should": [
              { "term": { "text": "brown" }},
              { "term": { "text": "red"   }}
            ]
          }
        }
      ]
    }
  }
}

Now, red and brown compete with each other at their own level, while quick, fox, and the combined clause (red OR brown) compete with one another at the top level.
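
To make the effect of nesting concrete, here is a minimal toy model in Python. It is a sketch under stated assumptions, not how Elasticsearch actually computes scores: every matching term is assumed to contribute exactly 1.0, and each bool level is assumed to multiply the sum of its matching clauses by the fraction of its clauses that matched. The function name score and the list-based query representation are hypothetical and exist only for this illustration.

```python
# Toy model only (assumption): every matching term scores 1.0, and each
# bool level multiplies the sum of its matching clauses by
# matched_clauses / total_clauses. Real scores depend on term statistics
# and on the similarity used by your Elasticsearch version.

def score(clause, doc_terms):
    """Return the toy score of a clause against a set of document terms."""
    if isinstance(clause, str):
        # A plain string stands for a single term clause.
        return 1.0 if clause in doc_terms else 0.0
    # A list stands for a bool query; its items are the should clauses.
    scores = [score(sub, doc_terms) for sub in clause]
    matched = sum(1 for s in scores if s > 0)
    return sum(scores) * matched / len(clause)

# The two query structures from the text.
flat = ["quick", "brown", "red", "fox"]
nested = ["quick", "fox", ["brown", "red"]]

# One document matches both synonyms, the other matches three concepts.
doc_synonyms = {"quick", "red", "brown"}
doc_distinct = {"quick", "red", "fox"}

for name, query in (("flat", flat), ("nested", nested)):
    print(name, score(query, doc_synonyms), score(query, doc_distinct))
```

Under these assumptions, the flat query scores both documents 2.25, while the nested query scores the quick, red, and brown document 2.0 and the quick, red, and fox document 2.5. The absolute numbers mean nothing outside the toy model; the point is only that grouping the synonyms changes which document ranks higher.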

We have already discussed how the match, multi_match, term, bool, and dis_max queries can be used to manipulate scoring. In the rest of this chapter, we present three other scoring-related queries: the boosting query, the constant_score query, and the function_score query.