WARNING: The 2.x versions of Elasticsearch have passed their EOL dates. If you are running a 2.x version, we strongly advise you to upgrade.
This documentation is no longer maintained and may be removed. For the latest information, see the current Elasticsearch documentation.
Multivalue Fields
A curious thing can happen when you try to use phrase matching on multivalue fields. Imagine that you index this document:
PUT /my_index/groups/1
{ "names": [ "John Abraham", "Lincoln Smith" ] }
Then run a phrase query for Abraham Lincoln:
GET /my_index/groups/_search
{ "query": { "match_phrase": { "names": "Abraham Lincoln" } } }
Surprisingly, our document matches, even though Abraham and Lincoln belong to two different people in the names array. The reason for this comes down to the way arrays are indexed in Elasticsearch.
When John Abraham is analyzed, it produces this:
- Position 1: john
- Position 2: abraham
Then when Lincoln Smith is analyzed, it produces this:
- Position 3: lincoln
- Position 4: smith
In other words, Elasticsearch produces exactly the same list of tokens as it would have for the single string John Abraham Lincoln Smith. Our example query looks for abraham directly followed by lincoln. Both terms exist and sit right next to each other, so the query matches.
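The position numbering described above can be imitated in a few lines of Python. This is only a toy model, not Elasticsearch or Lucene code: the helper names index_positions and phrase_matches are hypothetical, and lowercasing plus whitespace splitting stands in for the real analyzer. It shows why the array and the single concatenated string yield the same token positions.

```python
def index_positions(values):
    """Assign 1-based positions to the tokens of a multivalue field.

    Toy stand-in for analysis: lowercase, then split on whitespace.
    Positions simply keep counting from one array element to the next.
    """
    positions = []
    last = 0
    for value in values:
        for token in value.lower().split():
            last += 1
            positions.append((last, token))
    return positions


def phrase_matches(positions, phrase):
    """Return True if the phrase terms occupy consecutive positions."""
    terms = phrase.lower().split()
    lookup = dict(positions)  # position -> token
    return any(
        all(lookup.get(start + offset) == term
            for offset, term in enumerate(terms))
        for start in lookup
    )


names = ["John Abraham", "Lincoln Smith"]
array_positions = index_positions(names)
single_positions = index_positions(["John Abraham Lincoln Smith"])

print(array_positions)
print(array_positions == single_positions)
print(phrase_matches(array_positions, "Abraham Lincoln"))
```

In this model the array and the single string are indistinguishable once indexed, which is why the phrase check succeeds across the element boundary.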
Fortunately, there is a simple workaround for cases like these, called the position_increment_gap, which we need to configure in the field mapping:
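DELETE /my_index/groups/
PUT /my_index/_mapping/groups
{ "properties": { "names": { "type": "string", "position_increment_gap": 100 } } }
First delete the existing groups mapping (the DELETE request).
Then create a new groups mapping that sets position_increment_gap (the PUT request).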
The position_increment_gap setting tells Elasticsearch that it should increase the current term position by the specified value for every new array element. So now, when we index the array of names, the terms are emitted with the following positions:
- Position 1: john
- Position 2: abraham
- Position 103: lincoln
- Position 104: smith
Our phrase query would no longer match a document like this because abraham and lincoln are now separated by a gap of 100 positions. You would have to add a slop value of 100 in order for this document to match.
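As a rough check of the arithmetic, the sketch below extends the same kind of toy model with a gap between array elements and a simplified slop test. The function names are hypothetical. The slop rule used here (the number of positions between two in-order terms must not exceed slop) is an assumption that covers only this two-term, in-order case, not the full behaviour of sloppy phrase queries in Lucene.

```python
def index_positions(values, position_increment_gap=0):
    """Assign 1-based token positions, adding a gap before each new element."""
    positions = []
    last = 0
    for i, value in enumerate(values):
        if i > 0:
            # Every array element after the first starts after the gap.
            last += position_increment_gap
        for token in value.lower().split():
            last += 1
            positions.append((last, token))
    return positions


def two_term_phrase_matches(positions, first, second, slop=0):
    """Simplified in-order check: positions skipped between terms <= slop."""
    first_pos = [p for p, t in positions if t == first]
    second_pos = [p for p, t in positions if t == second]
    return any(0 <= b - a - 1 <= slop for a in first_pos for b in second_pos)


gapped = index_positions(["John Abraham", "Lincoln Smith"],
                         position_increment_gap=100)
print(gapped)

print(two_term_phrase_matches(gapped, "abraham", "lincoln"))            # no slop
print(two_term_phrase_matches(gapped, "abraham", "lincoln", slop=99))   # still too small
print(two_term_phrase_matches(gapped, "abraham", "lincoln", slop=100))  # bridges the gap
```

Phrases inside a single element, such as john followed by abraham, are unaffected by the gap in this model, because the gap is added only between elements.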