WARNING: Version 1.7 of Elasticsearch has passed its EOL date.
This documentation is no longer being maintained and may be removed. If you are running this version, we strongly advise you to upgrade. For the latest information, see the current release documentation.
Analysis
editAnalysis
editThe index analysis module acts as a configurable registry of Analyzers
that can be used in order to both break indexed (analyzed) fields when a
document is indexed and process query strings. It maps to the Lucene
Analyzer.
Analyzers are composed of a single Tokenizer
and zero or more TokenFilters. The tokenizer may
be preceded by one or more CharFilters. The
analysis module allows one to register TokenFilters, Tokenizers and
Analyzers under logical names that can then be referenced either in
mapping definitions or in certain APIs. The Analysis module
automatically registers (if not explicitly defined) built in
analyzers, token filters, and tokenizers.
Here is a sample configuration:
index :
analysis :
analyzer :
standard :
type : standard
stopwords : [stop1, stop2]
myAnalyzer1 :
type : standard
stopwords : [stop1, stop2, stop3]
max_token_length : 500
# configure a custom analyzer which is
# exactly like the default standard analyzer
myAnalyzer2 :
tokenizer : standard
filter : [standard, lowercase, stop]
tokenizer :
myTokenizer1 :
type : standard
max_token_length : 900
myTokenizer2 :
type : keyword
buffer_size : 512
filter :
myTokenFilter1 :
type : stop
stopwords : [stop1, stop2, stop3, stop4]
myTokenFilter2 :
type : length
min : 0
max : 2000
Backwards compatibility
editAll analyzers, tokenizers, and token filters can be configured with a
version parameter to control which Lucene version behavior they should
use. Possible values are: 3.0 - 3.6, 4.0 - 4.3 (the highest
version number is the default option).