Simple analyzer
The simple analyzer breaks text into tokens at any non-letter character, such
as numbers, spaces, hyphens and apostrophes, discards non-letter characters,
and changes uppercase to lowercase.
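The rule described above can be approximated in a few lines of plain Python. This is only an illustrative sketch, not the analyzer's actual implementation: the function name `simple_analyze` is hypothetical, and it assumes that Python's `str.isalpha()` is a close enough stand-in for the analyzer's notion of a "letter".

```python
def simple_analyze(text):
    """Approximate the simple analyzer: split at non-letters, lowercase the rest."""
    tokens = []
    current = []  # letters of the token currently being built
    for ch in text:
        if ch.isalpha():  # assumption: isalpha() approximates "letter"
            current.append(ch.lower())
        elif current:
            # A non-letter ends the current token and is itself discarded.
            tokens.append("".join(current))
            current = []
    if current:  # flush a token that runs to the end of the text
        tokens.append("".join(current))
    return tokens


print(simple_analyze("The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."))
# ['the', 'quick', 'brown', 'foxes', 'jumped', 'over', 'the', 'lazy', 'dog', 's', 'bone']
```

Note that the digit `2` produces no token at all, and the apostrophe in `dog's` splits the word into `dog` and `s`.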
Example
resp = client.indices.analyze(
    analyzer="simple",
    text="The 2 QUICK Brown-Foxes jumped over the lazy dog's bone.",
)
print(resp)

response = client.indices.analyze(
  body: {
    analyzer: 'simple',
    text: "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
  }
)
puts response

const response = await client.indices.analyze({
  analyzer: "simple",
  text: "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone.",
});
console.log(response);

POST _analyze
{
  "analyzer": "simple",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
The simple analyzer parses the sentence and produces the following tokens:
[ the, quick, brown, foxes, jumped, over, the, lazy, dog, s, bone ]
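The analyze API returns more than the bare token text, so a small helper can be handy for getting a flat list like the one above. The sketch below assumes the response exposes a `tokens` list whose entries each carry a `token` field; the helper name `token_strings` is hypothetical, and `sample_response` is a hand-written, abbreviated stand-in rather than real server output.

```python
def token_strings(analyze_response):
    """Return only the token text from an analyze-style response mapping."""
    return [entry["token"] for entry in analyze_response["tokens"]]


# Hand-written, abbreviated stand-in for a real response (assumption):
# real responses carry additional per-token fields that are omitted here.
sample_response = {
    "tokens": [
        {"token": "the", "position": 0},
        {"token": "quick", "position": 1},
    ]
}

print(token_strings(sample_response))  # ['the', 'quick']
```

With the Python client, the same helper could likely be applied to the `resp` object from the example, since it supports dictionary-style access.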
Customize
To customize the simple analyzer, duplicate it to create the basis for
a custom analyzer. This custom analyzer can be modified as required, usually by
adding token filters.
resp = client.indices.create(
    index="my-index-000001",
    settings={
        "analysis": {
            "analyzer": {
                "my_custom_simple_analyzer": {
                    "tokenizer": "lowercase",
                    "filter": []
                }
            }
        }
    },
)
print(resp)

response = client.indices.create(
  index: 'my-index-000001',
  body: {
    settings: {
      analysis: {
        analyzer: {
          my_custom_simple_analyzer: {
            tokenizer: 'lowercase',
            filter: []
          }
        }
      }
    }
  }
)
puts response

const response = await client.indices.create({
  index: "my-index-000001",
  settings: {
    analysis: {
      analyzer: {
        my_custom_simple_analyzer: {
          tokenizer: "lowercase",
          filter: [],
        },
      },
    },
  },
});
console.log(response);
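As a sketch of what "adding token filters" might look like, the helper below builds the same settings structure with a filter list filled in. The function name `simple_analyzer_settings` is hypothetical, and the built-in `stop` token filter is chosen purely for illustration; any other token filter name could be substituted.

```python
def simple_analyzer_settings(name, filters=()):
    """Build index settings for a custom analyzer based on the lowercase tokenizer."""
    return {
        "analysis": {
            "analyzer": {
                name: {
                    "tokenizer": "lowercase",
                    # Token filters are applied in the order listed.
                    "filter": list(filters),
                }
            }
        }
    }


# Illustrative choice (assumption): add the built-in "stop" token filter.
settings = simple_analyzer_settings("my_custom_simple_analyzer", ["stop"])
print(settings)

# The result could then be passed to the create call shown above, e.g.:
# client.indices.create(index="my-index-000001", settings=settings)
```

Calling the helper with no filters reproduces the unmodified copy of the simple analyzer.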