WARNING: Version 0.90 of Elasticsearch has passed its EOL date.
This documentation is no longer being maintained and may be removed. If you are running this version, we strongly advise you to upgrade. For the latest information, see the current release documentation.
Terms Filter
Filters documents that have fields that match any of the provided terms (not analyzed). For example:
{ "constant_score" : { "filter" : { "terms" : { "user" : ["kimchy", "elasticsearch"]} } } }
The terms filter is also available under the alias in, which can be used as the filter name for simpler usage.
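As a minimal sketch, the request body shown above can also be assembled programmatically. The helper name terms_filter_body is hypothetical and not part of any Elasticsearch client; it only builds the JSON from this section and lets the filter name be switched to the in alias.

```python
import json

def terms_filter_body(field, terms, filter_name="terms"):
    """Build a constant_score query wrapping a terms filter (hypothetical helper)."""
    if filter_name not in ("terms", "in"):
        raise ValueError("filter_name must be 'terms' or 'in'")
    # Terms are not analyzed, so they are passed through exactly as given.
    return {"constant_score": {"filter": {filter_name: {field: list(terms)}}}}

body = terms_filter_body("user", ["kimchy", "elasticsearch"])
alias_body = terms_filter_body("user", ["kimchy", "elasticsearch"], filter_name="in")

print(json.dumps(body))
print(json.dumps(alias_body))
```

Both bodies carry the same field and terms; only the filter name differs.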
Execution Mode
By default, the terms filter executes by iterating over the provided terms, finding the matching docs (loading them into a bitset), and caching the result. Sometimes a different execution model is wanted. Such a model can still be achieved by building more complex queries in the DSL, but it is also supported in the more compact form that the terms filter provides.
The execution option accepts the following values:
| plain | The default. Works as today. Iterates over all the terms, building a bit set of matching documents, and filtering. The total filter is cached. |
| fielddata | Generates a terms filter that uses the fielddata cache to compare terms. This execution mode is great to use when filtering on a field that is already loaded into the fielddata cache from faceting, sorting, or index warmers. When filtering on a large number of terms, this execution can be considerably faster than the other modes. The total filter is not cached unless explicitly configured to do so. |
| bool | Generates a term filter (which is cached) for each term, and wraps those in a bool filter. The bool filter itself is not cached as it can operate very quickly on the cached term filters. |
| and | Generates a term filter (which is cached) for each term, and wraps those in an and filter. The and filter itself is not cached. |
| or | Generates a term filter (which is cached) for each term, and wraps those in an or filter. The or filter itself is not cached. |
Generally, the bool execution mode should be preferred.
If you don't want the generated individual term filters to be cached, you can use bool_nocache, and_nocache or or_nocache instead, but be aware that this will affect performance.
The "total" terms filter caching can still be explicitly controlled using the _cache option. Note that its default value depends on the execution value.
For example:
{ "constant_score" : { "filter" : { "terms" : { "user" : ["kimchy", "elasticsearch"], "execution" : "bool", "_cache": true } } } }
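To make the difference between the modes concrete, here is a toy model in Python. It is not Elasticsearch's implementation: the in-memory INDEX and the function run_terms_filter are hypothetical, document ids stand in for bitsets, and caching is left out. It assumes that plain, bool and or keep the any-term behaviour, while and only keeps documents matched by every per-term filter.

```python
# Toy inverted index (assumption): term -> ids of documents whose "user" field holds it.
INDEX = {
    "kimchy": {1, 2},
    "elasticsearch": {2, 3},
}

def run_terms_filter(index, terms, execution="plain"):
    """Return the ids of matching documents for a simplified execution mode."""
    # One set of matching ids per term, like one term filter per term.
    per_term = [index.get(term, set()) for term in terms]
    if execution in ("plain", "bool", "or"):
        # Any-term semantics: merge all per-term matches.
        return set().union(*per_term)
    if execution == "and":
        # An and filter keeps only documents matched by every term filter.
        return set.intersection(*per_term) if per_term else set()
    raise ValueError("unsupported execution mode in this sketch: %r" % execution)

any_match = run_terms_filter(INDEX, ["kimchy", "elasticsearch"], "bool")
all_match = run_terms_filter(INDEX, ["kimchy", "elasticsearch"], "and")
print(sorted(any_match))  # [1, 2, 3]
print(sorted(all_match))  # [2]
```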
Caching
The result of the filter is automatically cached by default. The _cache option can be set to false to turn it off.
Terms lookup mechanism
When a terms filter needs to specify a large number of terms, it can be beneficial to fetch those term values from a document in an index. A concrete example would be filtering tweets tweeted by your followers. The number of user ids specified in the terms filter can potentially be very large. In this scenario it makes sense to use the terms filter's terms lookup mechanism.
The terms lookup mechanism is supported from version 0.90.0.Beta1.
The terms lookup mechanism supports the following options:
| index | The index to fetch the term values from. Defaults to the current index. |
| type | The type to fetch the term values from. |
| id | The id of the document to fetch the term values from. |
| path | The field specified as path to fetch the actual values for the terms filter. |
| routing | A custom routing value to be used when retrieving the external terms doc. |
| cache | Whether to cache the filter built from the retrieved document. |
The values for the terms filter will be fetched from a field in a document with the specified id in the specified type and index. Internally a get request is executed to fetch the values from the specified path. At the moment, the _source needs to be stored for this feature to work.
Also, consider using an index with a single shard and fully replicated across all nodes if the "reference" terms data is not large. The lookup terms filter will prefer to execute the get request on a local node if possible, reducing the need for networking.
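The lookup options can be put together as in the following sketch. The function terms_lookup_filter is a hypothetical helper, not a client API. The keys index, type, id and path follow the twitter example in this section; routing and cache are assumed key names for the routing and caching options described above.

```python
import json

def terms_lookup_filter(field, doc_type, doc_id, path,
                        index=None, routing=None, cache=None):
    """Build a terms filter that looks its terms up in another document."""
    lookup = {"type": doc_type, "id": doc_id, "path": path}
    # Optional settings are only sent when given, so server defaults apply otherwise.
    if index is not None:
        lookup["index"] = index
    if routing is not None:
        lookup["routing"] = routing  # assumed key name
    if cache is not None:
        lookup["cache"] = cache      # assumed key name
    return {"terms": {field: lookup}}

lookup_filter = terms_lookup_filter("user", "user", "2", "followers", index="users")
print(json.dumps(lookup_filter, sort_keys=True))
```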
Terms lookup caching
There is an additional cache involved, which caches the mapping from the lookup document to the actual terms. This lookup cache is an LRU cache. It has the following options:
- indices.cache.filter.terms.size: The size of the lookup cache. The default is 10mb.
- indices.cache.filter.terms.expire_after_access: The time after the last read an entry should expire. Disabled by default.
- indices.cache.filter.terms.expire_after_write: The time after the last write an entry should expire. Disabled by default.
All options for the document lookup cache can only be configured via the elasticsearch.yml file.
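The idea of an LRU lookup cache can be illustrated with a small toy model. This is only a sketch under stated assumptions, not how Elasticsearch implements it: the class LookupCache is hypothetical, it bounds the cache by number of entries rather than by memory size, and it leaves out the expiry options.

```python
from collections import OrderedDict

class LookupCache:
    """Toy LRU cache mapping a lookup key to the terms resolved for it."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._entries = OrderedDict()

    def get(self, key, loader):
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        terms = loader(key)                 # stands in for the internal get request
        self._entries[key] = terms
        if len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return terms

loads = []

def load_terms(key):
    """Pretend to fetch the lookup document; records every real fetch."""
    loads.append(key)
    return ["1", "3"]

cache = LookupCache(max_entries=2)
key = ("users", "user", "2", "followers")  # index, type, id, path
cache.get(key, load_terms)
cache.get(key, load_terms)  # served from the cache, no second fetch
print(len(loads))  # 1
```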
When using the terms lookup, the execution option isn't taken into account and the filter behaves as if the execution mode was set to plain.
Terms lookup twitter example
# index the information for user with id 2, specifically, its followers
curl -XPUT localhost:9200/users/user/2 -d '{
  "followers" : ["1", "3"]
}'

# index a tweet, from user with id 2
curl -XPUT localhost:9200/tweets/tweet/1 -d '{
  "user" : "2"
}'

# search on all the tweets that match the followers of user 2
curl -XGET localhost:9200/tweets/_search -d '{
  "query" : {
    "filtered" : {
      "filter" : {
        "terms" : {
          "user" : {
            "index" : "users",
            "type" : "user",
            "id" : "2",
            "path" : "followers"
          },
          "_cache_key" : "user_2_friends"
        }
      }
    }
  }
}'
The above is highly optimized, both in the sense that the list of followers will not be fetched if the filter is already cached in the filter cache, and through the internal LRU cache used for fetching external values for the terms filter. Also, the entry in the filter cache will not hold all the terms, reducing the memory required for it.
Setting _cache_key is recommended, so it's simple to clear the cache associated with it using the clear cache API. For example:
curl -XPOST 'localhost:9200/tweets/_cache/clear?filter_keys=user_2_friends'
The structure of the external terms document can also include an array of inner objects, for example:
curl -XPUT localhost:9200/users/user/2 -d '{ "followers" : [ { "id" : "1" }, { "id" : "2" } ] }'
In which case, the lookup path will be followers.id.
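How a dotted path selects values from such a document can be sketched in Python. The function extract_terms is a hypothetical illustration of the idea, assuming that each path segment descends into objects and that arrays are flattened along the way; it is not Elasticsearch's own extraction code.

```python
def extract_terms(source, path):
    """Collect the values found at a dotted path, descending into arrays of objects."""
    nodes = [source]
    for key in path.split("."):
        next_nodes = []
        for node in nodes:
            if isinstance(node, dict) and key in node:
                value = node[key]
                if isinstance(value, list):
                    next_nodes.extend(value)  # flatten arrays along the path
                else:
                    next_nodes.append(value)
        nodes = next_nodes
    return nodes

flat_doc = {"followers": ["1", "3"]}
nested_doc = {"followers": [{"id": "1"}, {"id": "2"}]}

print(extract_terms(flat_doc, "followers"))       # ['1', '3']
print(extract_terms(nested_doc, "followers.id"))  # ['1', '2']
```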