Keep types token filter
Keeps or removes tokens of a specific type. For example, you can use this filter to change 3 quick foxes to quick foxes by keeping only <ALPHANUM> (alphanumeric) tokens.
Token types are set by the tokenizer when converting characters to tokens. Token types can vary between tokenizers.
For example, the standard tokenizer can produce a variety of token types, including <ALPHANUM>, <HANGUL>, and <NUM>. Simpler tokenizers, like the lowercase tokenizer, only produce the word token type.
Certain token filters can also add token types. For example, the synonym filter can add the <SYNONYM> token type.
This filter does not work with tokenizers that don't support setting the token type attribute, such as the keyword, simple_pattern, and simple_pattern_split tokenizers.
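To make the idea of a token type concrete, the following Python sketch pairs each token with a type label. It is only a toy model: the toy_tokenize function is a hypothetical helper, and its rule (all-digit tokens become <NUM>, everything else becomes <ALPHANUM>) is an assumption made for illustration. The real standard tokenizer follows its own segmentation rules and emits more token types than these two.

```python
import re
from typing import List, Tuple

# A token is modeled as a (text, token type) pair.
Token = Tuple[str, str]


def toy_tokenize(text: str) -> List[Token]:
    """Toy stand-in for a tokenizer that sets token types.

    Assumption for illustration only: all-digit words are labeled <NUM>
    and every other word is labeled <ALPHANUM>. This is NOT how the
    standard tokenizer actually decides token types.
    """
    tokens = []
    for word in re.findall(r"\w+", text):
        token_type = "<NUM>" if word.isdigit() else "<ALPHANUM>"
        tokens.append((word, token_type))
    return tokens


print(toy_tokenize("3 quick foxes"))
# [('3', '<NUM>'), ('quick', '<ALPHANUM>'), ('foxes', '<ALPHANUM>')]
```

The keep_types filter only looks at the second element of each pair, which is why it has nothing to act on when a tokenizer does not set the token type attribute.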
This filter uses Lucene’s TypeTokenFilter.
The following analyze API request uses the keep_types filter to keep only <NUM> (numeric) tokens from 1 quick fox 2 lazy dogs.
GET _analyze
{
  "tokenizer": "standard",
  "filter": [
    {
      "type": "keep_types",
      "types": [ "<NUM>" ]
    }
  ],
  "text": "1 quick fox 2 lazy dogs"
}
The filter produces the following tokens:
[ 1, 2 ]
The following analyze API request uses the keep_types filter to remove <NUM> tokens from 1 quick fox 2 lazy dogs. Note that the mode parameter is set to exclude.
GET _analyze
{
  "tokenizer": "standard",
  "filter": [
    {
      "type": "keep_types",
      "types": [ "<NUM>" ],
      "mode": "exclude"
    }
  ],
  "text": "1 quick fox 2 lazy dogs"
}
The filter produces the following tokens:
[ quick, fox, lazy, dogs ]
types
    (Required, array of strings) List of token types to keep or remove.
mode
    (Optional, string) Indicates whether to keep or remove the specified token types. Valid values are:
    include
        (Default) Keep only the specified token types.
    exclude
        Remove the specified token types.
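The interaction between types and mode can be sketched in a few lines of Python. This is a conceptual model, not Lucene's TypeTokenFilter: the keep_types function below is a hypothetical helper, the token list is labeled by hand to match the examples above, and raising ValueError for an unknown mode is a choice made for this sketch.

```python
from typing import Iterable, List, Tuple

# A token is modeled as a (text, token type) pair.
Token = Tuple[str, str]


def keep_types(tokens: Iterable[Token], types: Iterable[str],
               mode: str = "include") -> List[Token]:
    """Model of the filter's behavior: keep or remove tokens by type."""
    if mode not in ("include", "exclude"):
        # Sketch-level validation; an assumption, not the filter's own error.
        raise ValueError("mode must be 'include' or 'exclude'")
    wanted = set(types)
    if mode == "include":
        # Keep only tokens whose type is listed.
        return [tok for tok in tokens if tok[1] in wanted]
    # exclude: drop tokens whose type is listed, keep the rest.
    return [tok for tok in tokens if tok[1] not in wanted]


# Hand-labeled tokens for "1 quick fox 2 lazy dogs".
tokens = [
    ("1", "<NUM>"), ("quick", "<ALPHANUM>"), ("fox", "<ALPHANUM>"),
    ("2", "<NUM>"), ("lazy", "<ALPHANUM>"), ("dogs", "<ALPHANUM>"),
]

kept = [text for text, _ in keep_types(tokens, ["<NUM>"])]
removed = [text for text, _ in keep_types(tokens, ["<NUM>"], mode="exclude")]
print(kept)     # ['1', '2']
print(removed)  # ['quick', 'fox', 'lazy', 'dogs']
```

The two modes are complements of each other: for the same types list, every token ends up in exactly one of the two results.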
To customize the keep_types filter, duplicate it to create the basis for a new custom token filter. You can modify the filter using its configurable parameters.
For example, the following create index API request uses a custom keep_types filter to configure a new custom analyzer. The custom keep_types filter keeps only <ALPHANUM> (alphanumeric) tokens.
PUT keep_types_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": [ "extract_alpha" ]
        }
      },
      "filter": {
        "extract_alpha": {
          "type": "keep_types",
          "types": [ "<ALPHANUM>" ]
        }
      }
    }
  }
}