kuromoji_stemmer token filter
The kuromoji_stemmer token filter normalizes common katakana spelling variations ending in a long sound character by removing this character (U+30FC). Only full-width katakana characters are supported.
This token filter accepts the following setting:
minimum_length
    Katakana words shorter than the minimum length are not stemmed (default is 4).
PUT kuromoji_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "kuromoji_tokenizer",
            "filter": [
              "my_katakana_stemmer"
            ]
          }
        },
        "filter": {
          "my_katakana_stemmer": {
            "type": "kuromoji_stemmer",
            "minimum_length": 4
          }
        }
      }
    }
  }
}

GET kuromoji_sample/_analyze
{
  "analyzer": "my_analyzer",
  "text": "コピー"
}

GET kuromoji_sample/_analyze
{
  "analyzer": "my_analyzer",
  "text": "サーバー"
}
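The two _analyze requests above differ only in the length of the input word, which is what minimum_length acts on. The following Python sketch illustrates the rule as described on this page. It is not the plugin's implementation: the function names stem_katakana and is_fullwidth_katakana are hypothetical, and treating "full-width katakana" as the Unicode range U+30A0 to U+30FF is an assumption made for illustration.

```python
# Illustrative sketch only; names and the katakana range check are assumptions.
PROLONGED_SOUND_MARK = "\u30fc"  # the long sound character, U+30FC


def is_fullwidth_katakana(token: str) -> bool:
    """Return True if every character falls in U+30A0..U+30FF (assumed range)."""
    return bool(token) and all("\u30a0" <= ch <= "\u30ff" for ch in token)


def stem_katakana(token: str, minimum_length: int = 4) -> str:
    """Drop a trailing long sound character from a katakana word."""
    # Words shorter than minimum_length are left untouched.
    if len(token) < minimum_length:
        return token
    # Half-width katakana and non-katakana tokens are not handled.
    if not is_fullwidth_katakana(token):
        return token
    if token.endswith(PROLONGED_SOUND_MARK):
        return token[:-1]
    return token


if __name__ == "__main__":
    for word in ("コピー", "サーバー"):
        print(word, "->", stem_katakana(word))
```

Under this sketch, コピー has three characters, which is below the minimum length of 4, so it is returned unchanged, while サーバー has four characters and loses its trailing long sound character. Lowering minimum_length to 3 in the sketch would also shorten コピー to コピ.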