CJK Width Token Filter

The cjk_width token filter normalizes CJK width differences:

  • Folds fullwidth ASCII variants into the equivalent basic Latin
  • Folds halfwidth Katakana variants into the equivalent Kana
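
The two folds above can be approximated outside Elasticsearch with a few lines of Python. This is a minimal sketch for illustration, not the filter's actual implementation: the function name `fold_cjk_width` is hypothetical, and the code point ranges (U+FF01 to U+FF5E for fullwidth ASCII, U+FF65 to U+FF9F for halfwidth Katakana) are assumptions taken from the Unicode Halfwidth and Fullwidth Forms block.

```python
import re
import unicodedata

# Assumption: fullwidth ASCII variants occupy U+FF01..U+FF5E and sit exactly
# 0xFEE0 above their basic Latin counterparts U+0021..U+007E.
_FULLWIDTH_TO_ASCII = {cp: cp - 0xFEE0 for cp in range(0xFF01, 0xFF5F)}

# Assumption: halfwidth Katakana variants (including the halfwidth sound
# marks) occupy U+FF65..U+FF9F.
_HALFWIDTH_KANA_RUN = re.compile("[\uFF65-\uFF9F]+")


def fold_cjk_width(text: str) -> str:
    """Hypothetical helper that approximates the two folds described above."""
    # Fold fullwidth ASCII variants into basic Latin by code point offset.
    text = text.translate(_FULLWIDTH_TO_ASCII)
    # Fold each run of halfwidth Katakana by applying NFKC to that run only,
    # so characters outside the run are left untouched.
    return _HALFWIDTH_KANA_RUN.sub(
        lambda m: unicodedata.normalize("NFKC", m.group(0)), text
    )


print(fold_cjk_width("Ｅｌａｓｔｉｃ１２３"))  # Elastic123
print(fold_cjk_width("ｶﾀｶﾅ"))  # カタカナ
```

Because NFKC is applied only to runs of halfwidth Katakana, other compatibility characters such as `①` pass through unchanged. How the real filter treats edge cases, for example a sound mark with no preceding Kana, may differ from this sketch.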

This token filter can be viewed as a subset of NFKC/NFKD Unicode normalization. See the ICU Analysis Plugin for full normalization support.
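
To see why the width folds are only a subset of NFKC, it can help to run full NFKC normalization over a few sample strings. The sketch below uses Python's standard `unicodedata` module, not Elasticsearch or the ICU plugin; the helper name `nfkc` and the sample strings are chosen for illustration.

```python
import unicodedata


def nfkc(text: str) -> str:
    """Apply full NFKC normalization (illustrative helper)."""
    return unicodedata.normalize("NFKC", text)


# Width differences: the kind of folding cjk_width is described as doing.
print(nfkc("Ｅｌａｓｔｉｃ"))  # Elastic
print(nfkc("ｶﾀｶﾅ"))  # カタカナ

# Other compatibility mappings that go beyond width folding.
print(nfkc("①"))  # 1
print(nfkc("ﬁ"))  # fi
```

The first two results are width folds. The last two are compatibility mappings unrelated to character width, which is the kind of additional normalization full NFKC provides.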