NOTE: You are looking at documentation for an older release. For the latest information, see the current release documentation.
kuromoji_stemmer token filter
The kuromoji_stemmer token filter normalizes common katakana spelling
variations ending in a long sound character by removing this character
(U+30FC). Only full-width katakana characters are supported.
This token filter accepts the following setting:

minimum_length
    Katakana words shorter than the minimum length are not stemmed (default is 4).
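
The rule described above can be sketched in a few lines of Python. This is only an illustration, not the plugin's actual implementation: the names `stem_katakana` and `is_fullwidth_katakana` are hypothetical, and the sketch assumes that "full-width katakana" means the Unicode Katakana block (U+30A0 to U+30FF) and that length is counted in characters.

```python
# Illustrative sketch only; not the plugin's real code.
# Assumption: "full-width katakana" = Unicode Katakana block U+30A0..U+30FF.
PROLONGED_SOUND_MARK = "\u30fc"  # the long sound character "ー"


def is_fullwidth_katakana(token: str) -> bool:
    """Return True if every character lies in the Katakana block."""
    return all("\u30a0" <= ch <= "\u30ff" for ch in token)


def stem_katakana(token: str, minimum_length: int = 4) -> str:
    """Drop one trailing long sound character from a katakana token.

    Tokens shorter than minimum_length, or containing anything other
    than full-width katakana, are returned unchanged.
    """
    if len(token) < minimum_length:
        return token
    if not is_fullwidth_katakana(token):
        return token
    if token.endswith(PROLONGED_SOUND_MARK):
        return token[:-1]
    return token


if __name__ == "__main__":
    for word in ("コピー", "サーバー"):
        print(word, "->", stem_katakana(word))
```

Under these assumptions, a three-character word such as コピー is left alone with the default `minimum_length` of 4, while the four-character サーバー loses its final long sound character.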

PUT kuromoji_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "kuromoji_tokenizer",
            "filter": [
              "my_katakana_stemmer"
            ]
          }
        },
        "filter": {
          "my_katakana_stemmer": {
            "type": "kuromoji_stemmer",
            "minimum_length": 4
          }
        }
      }
    }
  }
}

GET kuromoji_sample/_analyze
{
  "analyzer": "my_analyzer",
  "text": "コピー"
}

GET kuromoji_sample/_analyze
{
  "analyzer": "my_analyzer",
  "text": "サーバー"
}

The first request returns コピー unchanged, because the word has three characters and is shorter than the configured minimum_length of 4. The second request returns サーバ, with the trailing long sound character removed.