NOTE: You are looking at documentation for an older release. For the latest information, see the current release documentation.
Normalizers
Normalizers are similar to analyzers except that they may only emit a single
token. As a consequence, they do not have a tokenizer and only accept a subset
of the available char filters and token filters. Only the filters that work on
a per-character basis are allowed. For instance, a lowercasing filter would be
allowed, but not a stemming filter, which needs to look at the keyword as a
whole. The following filters can currently be used in a normalizer:
arabic_normalization, asciifolding, bengali_normalization, cjk_width,
decimal_digit, elision, german_normalization, hindi_normalization,
indic_normalization, lowercase, persian_normalization, scandinavian_folding,
serbian_normalization, sorani_normalization, uppercase.
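The difference between an analyzer and a normalizer can be illustrated with a small Python sketch. This is not Elasticsearch code: the function names analyzer_like and normalizer_like are hypothetical, and splitting on whitespace is only a stand-in for a real tokenizer. It shows only that the analyzer-style path may emit several tokens while the normalizer-style path always emits exactly one.

```python
from typing import List


def analyzer_like(text: str) -> List[str]:
    """Hypothetical analyzer: split on whitespace, then lowercase each token."""
    return [token.lower() for token in text.split()]


def normalizer_like(text: str) -> List[str]:
    """Hypothetical normalizer: no tokenizer, so the whole value is one token."""
    return [text.lower()]


if __name__ == "__main__":
    print(analyzer_like("Quick Brown Fox"))
    print(normalizer_like("Quick Brown Fox"))
```

Because the normalizer-style path never splits the value, it suits keyword fields, where the whole value is matched as a single term.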
Custom normalizers
Elasticsearch does not ship with built-in normalizers so far, so the only way to get one is by building a custom one. Custom normalizers take a list of character filters and a list of token filters.
PUT index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "quote": {
          "type": "mapping",
          "mappings": [
            "« => \"",
            "» => \""
          ]
        }
      },
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "char_filter": ["quote"],
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "foo": {
          "type": "keyword",
          "normalizer": "my_normalizer"
        }
      }
    }
  }
}
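To get a feel for what my_normalizer does to a value of the foo field, the following Python sketch approximates its three steps: the quote char filter, then the lowercase and asciifolding token filters. It is an illustration under stated assumptions, not the Elasticsearch implementation. The function names are hypothetical, and asciifolding is approximated by Unicode NFKD decomposition with combining marks removed, which covers accented Latin letters but not every character the real filter folds (for example ß or ø).

```python
import unicodedata

# Same mappings as the "quote" char filter in the request above.
QUOTE_MAPPINGS = {"«": '"', "»": '"'}


def apply_quote_char_filter(text: str) -> str:
    """Replace « and » with a plain double quote, character by character."""
    return "".join(QUOTE_MAPPINGS.get(ch, ch) for ch in text)


def approximate_ascii_folding(text: str) -> str:
    """Assumption: approximate asciifolding by dropping combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def my_normalizer_sketch(value: str) -> str:
    """Apply the char filter first, then the token filters in declared order."""
    value = apply_quote_char_filter(value)
    value = value.lower()
    return approximate_ascii_folding(value)


if __name__ == "__main__":
    print(my_normalizer_sketch("«BÀR»"))
```

In this sketch the inputs BÀR, bar and Bar all reduce to the same single token, which is the kind of equivalence the normalizer is meant to give a keyword field.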