NOTE: You are looking at documentation for an older release. For the latest information, see the current release documentation.
Keyword Repeat Token Filter
The keyword_repeat token filter emits each incoming token twice, once
as a keyword and once as a non-keyword, so that an unstemmed version of a
term can be indexed side by side with the stemmed version of the term.
Because of this, any token that is not changed by a subsequent stemmer
is indexed twice at the same position. To drop these unnecessary
duplicates, consider adding a unique filter with only_on_same_position
set to true after the stemmer.
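The behavior described above can be modeled in a few lines of Python. This is only an illustrative sketch, not how Elasticsearch or Lucene implement it: the token tuple layout, the function names, and in particular toy_stem (which merely strips a trailing "s") are assumptions made for the example and stand in for a real stemmer such as porter_stem.

```python
from typing import List, Tuple

# Simplified token model: (term, position, is_keyword).
# This layout is an assumption for illustration, not Lucene's attribute API.
Token = Tuple[str, int, bool]


def toy_stem(term: str) -> str:
    """Hypothetical stand-in for a real stemmer: strips one trailing 's'."""
    return term[:-1] if len(term) > 3 and term.endswith("s") else term


def keyword_repeat(tokens: List[Token]) -> List[Token]:
    """Emit every token twice at the same position: keyword first, then non-keyword."""
    out: List[Token] = []
    for term, pos, _ in tokens:
        out.append((term, pos, True))
        out.append((term, pos, False))
    return out


def stem(tokens: List[Token]) -> List[Token]:
    """Stem only non-keyword tokens; keyword tokens pass through unchanged."""
    return [(t, p, k) if k else (toy_stem(t), p, k) for t, p, k in tokens]


def unique_same_position(tokens: List[Token]) -> List[Token]:
    """Drop a token if the same term was already seen at the same position."""
    seen = set()
    out: List[Token] = []
    for term, pos, kw in tokens:
        if (term, pos) not in seen:
            seen.add((term, pos))
            out.append((term, pos, kw))
    return out


def analyze(text: str) -> List[Tuple[str, int]]:
    """Whitespace-split, lowercase, then repeat, stem, and de-duplicate."""
    tokens: List[Token] = [(w.lower(), i, False) for i, w in enumerate(text.split())]
    tokens = unique_same_position(stem(keyword_repeat(tokens)))
    return [(term, pos) for term, pos, _ in tokens]
```

In this model, skipping unique_same_position leaves two copies of every term the stemmer does not change, which is the duplication the unique filter is meant to remove.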
Here is an example of using the keyword_repeat token filter to
preserve both the stemmed and unstemmed version of tokens:
PUT /keyword_repeat_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "stemmed_and_unstemmed": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "keyword_repeat", "porter_stem", "unique_stem"]
        }
      },
      "filter": {
        "unique_stem": {
          "type": "unique",
          "only_on_same_position": true
        }
      }
    }
  }
}
You can test it with:
POST /keyword_repeat_example/_analyze
{
  "analyzer": "stemmed_and_unstemmed",
  "text": "I like cats"
}
The response is:
{
  "tokens": [
    {
      "token": "i",
      "start_offset": 0,
      "end_offset": 1,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "like",
      "start_offset": 2,
      "end_offset": 6,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "cats",
      "start_offset": 7,
      "end_offset": 11,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "cat",
      "start_offset": 7,
      "end_offset": 11,
      "type": "<ALPHANUM>",
      "position": 2
    }
  ]
}
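This preserves both the cat and cats tokens at position 2, while i and
like appear only once because the unique filter removed their unchanged
duplicates. Compare this to the example in the Keyword Marker Token
Filter documentation.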
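To make the shared position easier to see, the response tokens can be grouped by their position value. The snippet below is a small sketch: RESPONSE_TOKENS copies only the token and position fields from the response shown above, and terms_by_position is a hypothetical helper name, not part of any Elasticsearch client.

```python
from collections import defaultdict
from typing import Dict, List

# Token/position pairs copied from the _analyze response above
# (offsets and type omitted for brevity).
RESPONSE_TOKENS = [
    {"token": "i", "position": 0},
    {"token": "like", "position": 1},
    {"token": "cats", "position": 2},
    {"token": "cat", "position": 2},
]


def terms_by_position(tokens: List[dict]) -> Dict[int, List[str]]:
    """Group terms by position, keeping the order in which they were returned."""
    grouped: Dict[int, List[str]] = defaultdict(list)
    for tok in tokens:
        grouped[tok["position"]].append(tok["token"])
    return dict(grouped)


if __name__ == "__main__":
    for position, terms in terms_by_position(RESPONSE_TOKENS).items():
        print(position, terms)
```

Position 2 holds both the original and the stemmed term, so a query for either form can match the same place in the text.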