Fingerprint Token Filter
The fingerprint token filter emits a single token, which is useful for fingerprinting
a body of text or for providing a token that can be clustered on. It does this by
sorting the tokens, deduplicating them, and then concatenating them back into a single token.
For example, the tokens ["the", "quick", "quick", "brown", "fox", "was", "very", "brown"] will be
transformed into a single token: "brown fox quick the very was". Notice how the tokens were sorted
alphabetically, and there is only one "quick".
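The sort, deduplicate, and concatenate steps can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not the filter's actual implementation; the function name `fingerprint` and its `separator` parameter are assumptions made for this sketch.

```python
def fingerprint(tokens, separator=" "):
    """Illustrative sketch: deduplicate, sort, and join tokens into one string."""
    unique_sorted = sorted(set(tokens))  # set() drops duplicates, sorted() orders them
    return separator.join(unique_sorted)


tokens = ["the", "quick", "quick", "brown", "fox", "was", "very", "brown"]
print(fingerprint(tokens))  # brown fox quick the very was
```

Because the output does not depend on token order or repetition, two texts containing the same set of words produce the same fingerprint, which is what makes the token usable for clustering.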
The following are settings that can be set for a fingerprint token
filter type:
| Setting | Description |
|---|---|
| `separator` | Defaults to a space. |
| `max_output_size` | Defaults to `255`. |
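As a rough sketch of how these settings might be supplied, the snippet below builds an index settings body as a Python dictionary and serializes it to JSON. The filter name `my_fingerprint`, the analyzer name `my_analyzer`, the choice of the `whitespace` tokenizer, and the example values `"+"` and `100` are all hypothetical and chosen only for illustration.

```python
import json

# Hypothetical names and values; only the filter type and the two setting
# keys (separator, max_output_size) refer to the filter itself.
index_settings = {
    "settings": {
        "analysis": {
            "filter": {
                "my_fingerprint": {
                    "type": "fingerprint",
                    "separator": "+",
                    "max_output_size": 100,
                }
            },
            "analyzer": {
                "my_analyzer": {
                    "type": "custom",
                    "tokenizer": "whitespace",
                    "filter": ["my_fingerprint"],
                }
            },
        }
    }
}

# Serialize to the JSON body that would be sent when creating the index.
request_body = json.dumps(index_settings, indent=2)
print(request_body)
```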
Maximum token size
Because a field may have many unique tokens, it is important to set a cutoff so that fields do not grow
too large. The `max_output_size` setting controls this behavior. If the concatenated fingerprint
grows larger than `max_output_size`, the token filter will exit and will not emit a token (i.e. the
field will be empty).
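The cutoff can be illustrated with a small Python sketch. It assumes the size is measured as the character length of the concatenated string, which is a simplification for illustration; the function name `fingerprint_with_cutoff` is hypothetical, and `None` stands in for "no token emitted".

```python
def fingerprint_with_cutoff(tokens, max_output_size, separator=" "):
    """Illustrative sketch: return the fingerprint, or None if it exceeds the cutoff."""
    result = separator.join(sorted(set(tokens)))
    # Assumption: size is the character length of the concatenated fingerprint.
    if len(result) > max_output_size:
        return None  # nothing is emitted, so the field ends up empty
    return result


print(fingerprint_with_cutoff(["fox", "brown", "fox"], max_output_size=20))  # brown fox
print(fingerprint_with_cutoff(["fox", "brown", "fox"], max_output_size=5))   # None
```

Note that an oversized fingerprint is dropped entirely rather than truncated, so a cutoff that is too small silently produces empty fields.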