Query-Time Boosting

edit

In Prioritizing Clauses, we explained how you could use the boost parameter at search time to give one query clause more importance than another. For instance:

GET /_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": {
              "query": "quick brown fox",
              "boost": 2 
            }
          }
        },
        {
          "match": { 
            "content": "quick brown fox"
          }
        }
      ]
    }
  }
}

The title query clause is twice as important as the content query clause, because it has been boosted by a factor of 2.

A query clause without a boost value has a neutral boost of 1.

Query-time boosting is the main tool that you can use to tune relevance. Any type of query accepts a boost parameter. Setting a boost of 2 doesn’t simply double the final _score; the actual boost value that is applied goes through normalization and some internal optimization. However, it does imply that a clause with a boost of 2 is twice as important as a clause with a boost of 1.

Practically, there is no simple formula for deciding on the “correct” boost value for a particular query clause. It’s a matter of try-it-and-see. Remember that boost is just one of the factors involved in the relevance score; it has to compete with the other factors. For instance, in the preceding example, the title field will probably already have a “natural” boost over the content field thanks to the field-length norm (titles are usually shorter than the related content), so don’t blindly boost fields just because you think they should be boosted. Apply a boost and check the results. Change the boost and check again.

Boosting an Index

edit

When searching across multiple indices, you can boost an entire index over the others with the indices_boost parameter. This could be used, as in the next example, to give more weight to documents from a more recent index:

GET /docs_2014_*/_search 
{
  "indices_boost": { 
    "docs_2014_10": 3,
    "docs_2014_09": 2
  },
  "query": {
    "match": {
      "text": "quick brown fox"
    }
  }
}

This multi-index search covers all indices beginning with docs_2014_.

Documents in the docs_2014_10 index will be boosted by 3, those in docs_2014_09 by 2, and any other matching indices will have a neutral boost of 1.

t.getBoost()

edit

These boost values are represented in the Lucene’s Practical Scoring Function by the t.getBoost() element. Boosts are not applied at the level that they appear in the query DSL. Instead, any boost values are combined and passed down to the individual terms. The t.getBoost() method returns any boost value applied to the term itself or to any of the queries higher up the chain.

In fact, reading the explain output is a little more complex than that. You won’t see the boost value or t.getBoost() mentioned in the explanation at all. Instead, the boost is rolled into the queryNorm that is applied to a particular term. Although we said that the queryNorm is the same for every term, you will see that the queryNorm for a boosted term is higher than the queryNorm for an unboosted term.