Minhash Token Filter
A token filter of type `min_hash`
hashes each token of the token stream and divides
the resulting hashes into buckets, keeping the lowest-valued hashes per
bucket. It then returns these hashes as tokens.
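The bucketing step can be easier to follow in code. The following is a minimal Python sketch of the idea, not the filter's actual implementation: the function name `min_hash_sketch`, the salted SHA-256 hashing, and the 32-bit hash space are all assumptions made for illustration.

```python
import hashlib


def min_hash_sketch(tokens, hash_count=1, bucket_count=4, hash_set_size=1, bits=32):
    """Illustrative only: keep the lowest hashes per bucket for each hash function."""
    space = 1 << bits
    bucket_width = space // bucket_count
    # kept[h][b] collects the distinct hashes that hash function h placed in bucket b.
    kept = [[set() for _ in range(bucket_count)] for _ in range(hash_count)]
    for token in tokens:
        for h in range(hash_count):
            # Assumption: simulate independent hash functions by salting with h.
            digest = hashlib.sha256(f"{h}:{token}".encode("utf-8")).digest()
            value = int.from_bytes(digest[: bits // 8], "big")
            # Clamp so values in the remainder of the hash space land in the last bucket.
            bucket = min(value // bucket_width, bucket_count - 1)
            kept[h][bucket].add(value)
    # Keep only the hash_set_size lowest values in every bucket.
    return [[sorted(b)[:hash_set_size] for b in row] for row in kept]


if __name__ == "__main__":
    for row in min_hash_sketch(["quick", "brown", "fox"], hash_count=2):
        print(row)
```

Buckets that received no hash stay empty in this sketch, and the surviving values would be what the filter emits as tokens.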
A `min_hash` token filter accepts the following settings.
Setting | Description |
---|---|
`hash_count` | The number of hashes to hash the token stream with. Defaults to `1`. |
`bucket_count` | The number of buckets to divide the minhashes into. Defaults to `512`. |
`hash_set_size` | The number of minhashes to keep per bucket. Defaults to `1`. |
`with_rotation` | Whether or not to fill empty buckets with the value of the first non-empty bucket to its circular right. Only takes effect if `hash_set_size` is equal to one. Defaults to `true` if `bucket_count` is greater than one, else `false`. |
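
The phrase "first non-empty bucket to its circular right" can be illustrated with a small Python sketch. This is an assumption-laden illustration rather than the filter's own code: `fill_with_rotation` is a hypothetical helper, each bucket holds at most one value (as when `hash_set_size` is one), and `None` marks an empty bucket.

```python
def fill_with_rotation(buckets):
    """Fill each empty bucket (None) from the first non-empty bucket to its right,
    wrapping around to the start of the list when the end is reached."""
    n = len(buckets)
    filled = []
    for i, value in enumerate(buckets):
        if value is None:
            # Walk right from position i, wrapping circularly, until a value is found.
            for step in range(1, n):
                candidate = buckets[(i + step) % n]
                if candidate is not None:
                    value = candidate
                    break
        filled.append(value)
    return filled


if __name__ == "__main__":
    print(fill_with_rotation([None, 7, None, None, 3]))  # [7, 7, 3, 3, 3]
```

If every bucket is empty there is nothing to copy, so the sketch returns the list unchanged.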