hiragana_uppercase token filter
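The hiragana_uppercase token filter normalizes small hiragana letters (捨て仮名, sutegana), such as ゃ and っ, into their standard-size forms.
This filter is useful when searching old-style Japanese text, such as patents, legal documents, and contract policies, where small letters are often written at standard size (for example, ちよつと instead of ちょっと). Normalizing both spellings to the same form lets a modern query match the older text.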
For example:
PUT kuromoji_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "kuromoji_tokenizer",
            "filter": [
              "hiragana_uppercase"
            ]
          }
        }
      }
    }
  }
}

GET kuromoji_sample/_analyze
{
  "analyzer": "my_analyzer",
  "text": "ちょっとまって"
}
The request returns the following tokens:
{
  "tokens": [
    {
      "token": "ちよつと",
      "start_offset": 0,
      "end_offset": 4,
      "type": "word",
      "position": 0
    },
    {
      "token": "まつ",
      "start_offset": 4,
      "end_offset": 6,
      "type": "word",
      "position": 1
    },
    {
      "token": "て",
      "start_offset": 6,
      "end_offset": 7,
      "type": "word",
      "position": 2
    }
  ]
}
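The character mapping behind this response can be sketched in a few lines of Python. This is an illustrative model, not the filter's implementation: the function name `hiragana_uppercase` and the exact set of small hiragana in `SMALL_TO_STANDARD` are assumptions (modeled on Lucene's JapaneseHiraganaUppercaseFilter), and the input tokens are the ones kuromoji_tokenizer produces for ちょっとまって in the example. Because each small letter maps to exactly one standard letter, token lengths, and therefore the start and end offsets, stay the same.

```python
# Hypothetical sketch of small-hiragana normalization, modeled on the
# behavior of the hiragana_uppercase token filter. The character set
# below is an assumption; check Lucene's JapaneseHiraganaUppercaseFilter.
SMALL_TO_STANDARD = str.maketrans({
    "ぁ": "あ", "ぃ": "い", "ぅ": "う", "ぇ": "え", "ぉ": "お",
    "っ": "つ", "ゃ": "や", "ゅ": "ゆ", "ょ": "よ", "ゎ": "わ",
    "ゕ": "か", "ゖ": "け",
})


def hiragana_uppercase(token: str) -> str:
    """Replace each small hiragana with its standard-size letter.

    The mapping is one character to one character, so the token length
    (and therefore the tokenizer's offsets) is unchanged.
    """
    return token.translate(SMALL_TO_STANDARD)


# Tokens as kuromoji_tokenizer splits "ちょっとまって" in the example.
tokens = ["ちょっと", "まっ", "て"]
normalized = [hiragana_uppercase(t) for t in tokens]
print(normalized)  # ['ちよつと', 'まつ', 'て']

# A modern spelling and an old-style spelling normalize to the same term.
print(hiragana_uppercase("ちょっと") == hiragana_uppercase("ちよつと"))  # True
```

Applying the same normalization at index time and at query time is what lets a query for ちょっと match a document that spells it ちよつと.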