ICU folding token filter
Case folding of Unicode characters based on UTR#30, like the ASCII-folding token filter on steroids. It registers itself as the icu_folding token filter and is available to all indices:
PUT icu_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "folded": {
            "tokenizer": "icu_tokenizer",
            "filter": [
              "icu_folding"
            ]
          }
        }
      }
    }
  }
}
The ICU folding token filter already performs Unicode normalization, so there is no need to also use the ICU normalization character filter or token filter.
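To see what the folded analyzer does to a particular string, a request to the _analyze API along the following lines can be used. This is a sketch that assumes the icu_sample index from the request above has been created; the sample text is arbitrary. Given the folding described above, the accented, mixed-case words would be expected to come back as lowercase tokens without diacritics.

```console
GET icu_sample/_analyze
{
  "analyzer": "folded",
  "text": "Résumé ÜBER Café"
}
```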
Which letters are folded can be controlled by specifying the unicode_set_filter parameter, which accepts a UnicodeSet. Only characters that match the set are folded; characters outside it are passed through unchanged.
The following example exempts Swedish characters from folding. Note that both the uppercase and lowercase forms must be specified, and that the exempted characters are not lowercased either, which is why the lowercase filter is added as well:
PUT icu_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "swedish_analyzer": {
            "tokenizer": "icu_tokenizer",
            "filter": [
              "swedish_folding",
              "lowercase"
            ]
          }
        },
        "filter": {
          "swedish_folding": {
            "type": "icu_folding",
            "unicode_set_filter": "[^åäöÅÄÖ]"
          }
        }
      }
    }
  }
}
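The _analyze API can also be used to confirm that the exemption behaves as intended. In this sketch, which assumes the icu_sample index was created with the swedish_analyzer settings above and uses an arbitrary sample string, å, ä and ö should be left unfolded (and then lowercased by the lowercase filter), while other accented letters such as é should still be folded.

```console
GET icu_sample/_analyze
{
  "analyzer": "swedish_analyzer",
  "text": "Ångström Café"
}
```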