kuromoji_part_of_speech token filter
The kuromoji_part_of_speech token filter removes tokens that match a set of part-of-speech tags. It accepts the following setting:
- stoptags: An array of part-of-speech tags that should be removed. It defaults to the stoptags.txt file embedded in the lucene-analyzer-kuromoji.jar.
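To make the matching rule concrete, here is a minimal Python sketch of what the filter does. It is a simplified model, not the Lucene implementation. It assumes that each token carries its full hierarchical part-of-speech tag as a string, and that a token is dropped only when that full tag appears in stoptags. Under this exact-match assumption, listing a parent tag such as 助詞 does not by itself remove a token tagged 助詞-格助詞-一般. The Token type, the remove_stoptags function, and the tags assigned to 寿司 and おいしい are hypothetical and chosen only for illustration. Surviving tokens keep their original positions, so each removed token leaves a gap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """Hypothetical token: surface form, full POS tag, offsets and position."""
    term: str
    pos_tag: str
    start_offset: int
    end_offset: int
    position: int


def remove_stoptags(tokens, stoptags):
    """Drop tokens whose full POS tag is in stoptags (exact match assumed).

    Surviving tokens keep their original positions, so each removed
    token leaves a gap in the position sequence.
    """
    stop = set(stoptags)
    return [t for t in tokens if t.pos_tag not in stop]


# Illustrative tokenization of 寿司がおいしいね; the POS tags are assumptions.
tokens = [
    Token("寿司", "名詞-一般", 0, 2, 0),
    Token("が", "助詞-格助詞-一般", 2, 3, 1),
    Token("おいしい", "形容詞-自立", 3, 7, 2),
    Token("ね", "助詞-終助詞", 7, 8, 3),
]

kept = remove_stoptags(tokens, ["助詞-格助詞-一般", "助詞-終助詞"])
for t in kept:
    print(t.term, t.start_offset, t.end_offset, t.position)
# 寿司 0 2 0
# おいしい 3 7 2
```

Using a set makes each membership check constant time, which matters little for two tags but keeps the sketch efficient when stoptags is a long list such as the default file.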
For example:
PUT kuromoji_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "kuromoji_tokenizer",
            "filter": [
              "my_posfilter"
            ]
          }
        },
        "filter": {
          "my_posfilter": {
            "type": "kuromoji_part_of_speech",
            "stoptags": [
              "助詞-格助詞-一般",
              "助詞-終助詞"
            ]
          }
        }
      }
    }
  }
}

GET kuromoji_sample/_analyze
{
  "analyzer": "my_analyzer",
  "text": "寿司がおいしいね"
}
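The response contains only 寿司 (sushi) and おいしい (delicious). The case particle が (助詞-格助詞-一般, "particle, case marker, general") and the sentence-final particle ね (助詞-終助詞, "particle, sentence-final") match the configured stoptags and are removed: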
{
  "tokens": [
    {
      "token": "寿司",
      "start_offset": 0,
      "end_offset": 2,
      "type": "word",
      "position": 0
    },
    {
      "token": "おいしい",
      "start_offset": 3,
      "end_offset": 7,
      "type": "word",
      "position": 2
    }
  ]
}
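In this response, おいしい is reported at position 2, not 1. This follows from the usual behavior of Lucene filters that drop tokens: the position of a removed token is still counted. Under that assumption, the gap at position 1 marks where が was removed. The short Python sketch below finds such gaps in a parsed _analyze response. The missing_positions helper is hypothetical and is not part of any Elasticsearch client. The response is embedded as a string so the example runs without a cluster.

```python
import json

# The _analyze response shown above, embedded so the sketch runs offline.
RESPONSE_TEXT = """
{
  "tokens": [
    {"token": "寿司", "start_offset": 0, "end_offset": 2, "type": "word", "position": 0},
    {"token": "おいしい", "start_offset": 3, "end_offset": 7, "type": "word", "position": 2}
  ]
}
"""


def missing_positions(tokens):
    """Return positions from 0 up to the last surviving token that have no token.

    A removed token still advances the position counter, so it shows up as a
    gap here. Tokens removed after the last surviving token (such as ね in this
    example) leave no gap and cannot be detected this way.
    """
    seen = {t["position"] for t in tokens}
    if not seen:
        return []
    return [p for p in range(max(seen) + 1) if p not in seen]


response = json.loads(RESPONSE_TEXT)
print(missing_positions(response["tokens"]))  # [1]
```

A gap shows only that some token was removed at that position. It does not show which token, so the original text or the tokenizer output is still needed to identify it.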