WARNING: The 2.x versions of Elasticsearch have passed their EOL dates. If you are running a 2.x version, we strongly advise you to upgrade.

This documentation is no longer maintained and may be removed. For the latest information, see the current Elasticsearch documentation.


Multivalue Fields


A curious thing can happen when you try to use phrase matching on multivalue fields. Imagine that you index this document:

PUT /my_index/groups/1
{
    "names": [ "John Abraham", "Lincoln Smith"]
}


Then run a phrase query for Abraham Lincoln:

GET /my_index/groups/_search
{
    "query": {
        "match_phrase": {
            "names": "Abraham Lincoln"
        }
    }
}


Surprisingly, our document matches, even though Abraham and Lincoln belong to two different people in the names array. The reason for this comes down to the way arrays are indexed in Elasticsearch.

When John Abraham is analyzed, it produces this:

Position 1: john
Position 2: abraham

Then when Lincoln Smith is analyzed, it produces this:

Position 3: lincoln
Position 4: smith

In other words, Elasticsearch produces exactly the same list of tokens as it would have for the single string John Abraham Lincoln Smith. Our example query looks for abraham directly followed by lincoln. Both terms exist and they sit at adjacent positions (2 and 3), so the query matches.
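The position numbering described above can be mimicked in a few lines of Python. This is only a sketch of the idea, not how Elasticsearch or Lucene is implemented: the analyze, index_positions, and phrase_matches helpers are hypothetical names, and the analyzer is assumed to do nothing more than lowercase the text and split on whitespace.

```python
def analyze(text):
    # Assumption: a simplified stand-in for the real analyzer
    # (lowercase, then split on whitespace).
    return text.lower().split()


def index_positions(values):
    """Return (position, term) pairs for all values of a multivalue field."""
    postings = []
    position = 0
    for value in values:
        # Positions simply keep counting from one array element to the next.
        for term in analyze(value):
            position += 1
            postings.append((position, term))
    return postings


def phrase_matches(postings, phrase):
    """Return True if the phrase terms occupy consecutive positions."""
    terms = analyze(phrase)
    by_position = dict(postings)
    return any(
        all(by_position.get(start + offset) == term
            for offset, term in enumerate(terms))
        for start in by_position
    )


postings = index_positions(["John Abraham", "Lincoln Smith"])
print(postings)
# [(1, 'john'), (2, 'abraham'), (3, 'lincoln'), (4, 'smith')]
print(phrase_matches(postings, "Abraham Lincoln"))
# True
```

Because abraham and lincoln land on positions 2 and 3, the simulated phrase check succeeds, just as the real query does.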

Fortunately, there is a simple workaround for cases like these, called the position_increment_gap, which we need to configure in the field mapping:

DELETE /my_index/groups/ 

PUT /my_index/_mapping/groups 
{
    "properties": {
        "names": {
            "type":                "string",
            "position_increment_gap": 100
        }
    }
}


The DELETE request first removes the groups mapping and all documents of that type.
The PUT request then creates a new groups mapping with position_increment_gap set on the names field.

The position_increment_gap setting tells Elasticsearch that it should increase the current term position by the specified value for every new array element. So now, when we index the array of names, the terms are emitted with the following positions:

Position 1: john
Position 2: abraham
Position 103: lincoln
Position 104: smith

Our phrase query no longer matches a document like this, because lincoln no longer directly follows abraham: the gap puts 100 empty positions between them. You would have to add a slop value of at least 100 for this document to match.
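The arithmetic behind these positions and the required slop can be checked with a small Python sketch. It makes several simplifying assumptions: the analyzer only lowercases and splits on whitespace, each term occurs once, and slop is modeled only for two terms appearing in query order (real slop scoring also handles reordered terms). The helper names positions_with_gap and in_order_phrase_matches are hypothetical. The final dictionary shows the match_phrase request body with a slop parameter.

```python
def positions_with_gap(values, gap):
    """Map each term to its position, adding `gap` between array elements."""
    positions = {}
    position = 0
    for i, value in enumerate(values):
        if i > 0:
            # Jump ahead by the gap before indexing the next array element.
            position += gap
        for term in value.lower().split():
            position += 1
            positions[term] = position
    return positions


def in_order_phrase_matches(positions, first, second, slop=0):
    # Simplified model: with slop 0 the second term must sit exactly one
    # position after the first; each unit of slop allows one extra position.
    distance = positions[second] - positions[first]
    return 1 <= distance <= slop + 1


positions = positions_with_gap(["John Abraham", "Lincoln Smith"], gap=100)
print(positions)
# {'john': 1, 'abraham': 2, 'lincoln': 103, 'smith': 104}
print(in_order_phrase_matches(positions, "abraham", "lincoln"))
# False
print(in_order_phrase_matches(positions, "abraham", "lincoln", slop=100))
# True

# Request body for a phrase query that tolerates the gap.
query_body = {
    "query": {
        "match_phrase": {
            "names": {"query": "Abraham Lincoln", "slop": 100}
        }
    }
}
```

In practice you would choose a gap larger than any slop you intend to use, so that phrase queries never match across array elements by accident.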

