NOTE: You are looking at documentation for an older release. For the latest information, see the current release documentation.
Keyword Repeat Token Filter
editKeyword Repeat Token Filter
editThe keyword_repeat
token filter Emits each incoming token twice once
as keyword and once as a non-keyword to allow an unstemmed version of a
term to be indexed side by side with the stemmed version of the term.
Given the nature of this filter each token that isn’t transformed by a
subsequent stemmer will be indexed twice. Therefore, consider adding a
unique
filter with only_on_same_position
set to true
to drop
unnecessary duplicates.
Here is an example of using the keyword_repeat
token filter to
preserve both the stemmed and unstemmed version of tokens:
PUT /keyword_repeat_example { "settings": { "analysis": { "analyzer": { "stemmed_and_unstemmed": { "type": "custom", "tokenizer": "standard", "filter": ["lowercase", "keyword_repeat", "porter_stem", "unique_stem"] } }, "filter": { "unique_stem": { "type": "unique", "only_on_same_position": true } } } } }
And you can test it with:
POST /keyword_repeat_example/_analyze { "analyzer" : "stemmed_and_unstemmed", "text" : "I like cats" }
And it’d respond:
{ "tokens": [ { "token": "i", "start_offset": 0, "end_offset": 1, "type": "<ALPHANUM>", "position": 0 }, { "token": "like", "start_offset": 2, "end_offset": 6, "type": "<ALPHANUM>", "position": 1 }, { "token": "cats", "start_offset": 7, "end_offset": 11, "type": "<ALPHANUM>", "position": 2 }, { "token": "cat", "start_offset": 7, "end_offset": 11, "type": "<ALPHANUM>", "position": 2 } ] }
Which preserves both the cat
and cats
tokens. Compare this to the example
on the Keyword Marker Token Filter.