New

The executive guide to generative AI

Read more

Multiplexer Token Filter

edit

A token filter of type multiplexer will emit multiple tokens at the same position, each version of the token having been run through a different filter. Identical output tokens at the same position will be removed.

If the incoming token stream has duplicate tokens, then these will also be removed by the multiplexer

Options

edit

filters

a list of token filters to apply to incoming tokens. These can be any token filters defined elsewhere in the index mappings. Filters can be chained using a comma-delimited string, so for example "lowercase, porter_stem" would apply the lowercase filter and then the porter_stem filter to a single token.

Shingle or multi-word synonym token filters will not function normally when they are declared in the filters array because they read ahead internally which is unsupported by the multiplexer

preserve_original
if true (the default) then emit the original token in addition to the filtered tokens

Settings example

edit

You can set it up like:

PUT /multiplexer_example
{
    "settings" : {
        "analysis" : {
            "analyzer" : {
                "my_analyzer" : {
                    "tokenizer" : "standard",
                    "filter" : [ "my_multiplexer" ]
                }
            },
            "filter" : {
                "my_multiplexer" : {
                    "type" : "multiplexer",
                    "filters" : [ "lowercase", "lowercase, porter_stem" ]
                }
            }
        }
    }
}

And test it like:

POST /multiplexer_example/_analyze
{
  "analyzer" : "my_analyzer",
  "text" : "Going HOME"
}

And it’d respond:

{
  "tokens": [
    {
      "token": "Going",
      "start_offset": 0,
      "end_offset": 5,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "going",
      "start_offset": 0,
      "end_offset": 5,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "go",
      "start_offset": 0,
      "end_offset": 5,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "HOME",
      "start_offset": 6,
      "end_offset": 10,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "home",          
      "start_offset": 6,
      "end_offset": 10,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}

The stemmer has also emitted a token home at position 1, but because it is a duplicate of this token it has been removed from the token stream

The synonym and synonym_graph filters use their preceding analysis chain to parse and analyse their synonym lists, and ignore any token filters in the chain that produce multiple tokens at the same position. This means that any filters within the multiplexer will be ignored for the purpose of synonyms. If you want to use filters contained within the multiplexer for parsing synonyms (for example, to apply stemming to the synonym lists), then you should append the synonym filter to the relevant multiplexer filter list.

Was this helpful?
Feedback