IMPORTANT: No additional bug fixes or documentation updates
will be released for this version. For the latest information, see the
current release documentation.
kuromoji_stemmer token filter
The kuromoji_stemmer token filter normalizes common katakana spelling
variations ending in a long sound character by removing this character
(U+30FC). Only full-width katakana characters are supported.
This token filter accepts the following setting:
- minimum_length: Katakana words shorter than the minimum length are not stemmed (default is 4).
PUT kuromoji_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "kuromoji_tokenizer",
            "filter": [
              "my_katakana_stemmer"
            ]
          }
        },
        "filter": {
          "my_katakana_stemmer": {
            "type": "kuromoji_stemmer",
            "minimum_length": 4
          }
        }
      }
    }
  }
}
POST kuromoji_sample/_analyze?analyzer=my_analyzer&text=コピー
POST kuromoji_sample/_analyze?analyzer=my_analyzer&text=サーバー
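With the minimum_length of 4 configured above, the two requests should behave differently: コピー is only three characters long, so it falls below the minimum length and is returned unchanged, while サーバー is long enough to be stemmed and should come back as サーバ with the trailing long sound character removed. A sketch of the expected output for the second request (offsets refer to the original text and may vary with the tokenizer version):
{
  "tokens" : [ {
    "token" : "サーバ",
    "start_offset" : 0,
    "end_offset" : 4,
    "type" : "word",
    "position" : 0
  } ]
}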
ja_stop token filter
The ja_stop token filter filters out Japanese stopwords (_japanese_), and
any other custom stopwords specified by the user. This filter only supports
the predefined _japanese_ stopwords list. If you want to use a different
predefined list, then use the
stop token filter instead.
PUT kuromoji_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "analyzer_with_ja_stop": {
            "tokenizer": "kuromoji_tokenizer",
            "filter": [
              "ja_stop"
            ]
          }
        },
        "filter": {
          "ja_stop": {
            "type": "ja_stop",
            "stopwords": [
              "_japanese_",
              "ストップ"
            ]
          }
        }
      }
    }
  }
}
POST kuromoji_sample/_analyze?analyzer=analyzer_with_ja_stop&text=ストップは消える
The above request returns:
{
  "tokens" : [ {
    "token" : "消える",
    "start_offset" : 5,
    "end_offset" : 8,
    "type" : "word",
    "position" : 2
  } ]
}
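Only 消える survives: ストップ is removed by the custom stopword entry, and the particle は is removed by the predefined _japanese_ list, which is why the remaining token starts at offset 5.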
kuromoji_number token filter
The kuromoji_number token filter normalizes Japanese numbers (kansūji)
to regular Arabic decimal numbers in half-width characters. For example:
PUT kuromoji_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "kuromoji_tokenizer",
            "filter": [
              "kuromoji_number"
            ]
          }
        }
      }
    }
  }
}
POST kuromoji_sample/_analyze?analyzer=my_analyzer&text=一〇〇〇
Which results in:
{
  "tokens" : [ {
    "token" : "1000",
    "start_offset" : 0,
    "end_offset" : 4,
    "type" : "word",
    "position" : 0
  } ]
}
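一〇〇〇 is the kansūji spelling of 1,000, so the filter emits the half-width Arabic form 1000; the offsets still refer to the four characters of the original text.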