The Delete by Query API is now a plugin
As of Elasticsearch's 2.0.0-beta1 release, we have removed the previous core implementation of the Delete by Query API and replaced it with a new Delete by Query plugin. Here we explain why we did this and how the plugin's implementation differs from the previous core implementation.
If you use the Delete by Query API, install the plugin after upgrading to 2.0 and then follow its documentation:
bin/plugin install delete-by-query
Why did we do this?
We take the quality of Elasticsearch's core APIs seriously, and the previous implementation of Delete by Query had major issues that could not easily be fixed:
- After each request, it performed a silent refresh, which made the deletes unexpectedly visible to searches and could easily lead to segment explosion, merge storms, drastic indexing slowdowns and heap exhaustion, crashing multiple nodes in the cluster.
- The query was re-executed on the primary and replicas, so different documents could be deleted, leading to silently inconsistent replicas (data corruption).
- It made upgrades fragile, because a Delete by Query request could leave a query in the transaction log that might not parse correctly after upgrading, or might not execute correctly, e.g. if it referred to index aliases that had since been deleted, leading to bugs like this one.
In contrast, the new plugin has a fully safe implementation: it runs a scan and scroll request to find all IDs matching the query, and then uses the bulk indexing API to delete them.
This implementation is necessarily slower, especially if the query deletes many documents. Be sure to test your application if you delete many documents using this API, and consider switching to a different approach where you can delete whole indices instead.
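To make the scan-and-scroll plus bulk-delete approach concrete, here is a minimal Python sketch of the same idea. It is not the plugin's actual code. The `fetch_page` and `send_bulk` callables are hypothetical stand-ins for whatever client you use to issue scroll and bulk requests; only the newline-delimited `delete` action format of the bulk API is taken from Elasticsearch itself.

```python
import json
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Hit = Dict[str, str]  # a scroll hit reduced to its _index, _type and _id

def build_bulk_delete_body(hits: Iterable[Hit]) -> str:
    """Render one bulk `delete` action per hit as newline-delimited JSON."""
    lines = [
        json.dumps({"delete": {"_index": h["_index"],
                               "_type": h["_type"],
                               "_id": h["_id"]}})
        for h in hits
    ]
    # The bulk API expects every action line, including the last, to end in "\n".
    return "".join(line + "\n" for line in lines)

def delete_by_query(
    fetch_page: Callable[[Optional[str]], Tuple[str, List[Hit]]],
    send_bulk: Callable[[str], None],
) -> int:
    """Scroll through all matching hits and delete them page by page.

    fetch_page (hypothetical): takes the previous scroll id (None on the
    first call) and returns (next_scroll_id, hits).
    send_bulk (hypothetical): sends one bulk request body.
    Returns the number of delete actions submitted.
    """
    submitted = 0
    scroll_id: Optional[str] = None
    while True:
        scroll_id, hits = fetch_page(scroll_id)
        if not hits:  # an empty page means the scroll is exhausted
            break
        send_bulk(build_bulk_delete_body(hits))
        submitted += len(hits)
    return submitted
```

Because every matching document costs a scroll hit plus an individual delete action, the work grows with the number of documents deleted, which is why dropping a whole index is so much cheaper when your data layout allows it.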
The Delete by Query plugin documentation describes more details about the motivation and differences in the new implementation.
A minimal Elasticsearch core
Switching to a plugin was not an easy decision: many users were able to use the faster core Delete by Query without issue. Still, the danger was always there, and a non-trivial number of users did hit the serious problems above.
Furthermore, Elasticsearch's core must remain reliable and lean. Any feature which can be built on top of the other core APIs really does not belong in the core, especially if it's buggy. All features in the core should be ironclad, and Delete by Query, despite its popularity and high performance, simply wasn't.
We take resiliency and quality seriously enough to make hard tradeoffs like this one, when necessary.
Deleting a Mapping is gone
Deleting a type's mappings has also been removed in 2.0 because it could lead to index corruption if the same field names were later re-used in a new type but with different mappings.
However, you can still delete all documents of a given type by using the plugin with a Match All Query against that type. Also strongly consider changing your approach to use separate indices instead of different types within one index.
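As an illustration of the Match All approach, the following Python sketch builds (but does not send) such a request using only the standard library. The host, index name and type name are placeholders, and the `/{index}/{type}/_query` path is our reading of the plugin's documented endpoint, so check it against the plugin documentation for your version.

```python
import json
import urllib.request

def build_delete_type_request(base_url: str, index: str, doc_type: str) -> urllib.request.Request:
    """Build a DELETE request that matches every document of one type.

    base_url, index and doc_type are caller-supplied placeholders; the
    `_query` path is assumed from the Delete by Query plugin documentation.
    """
    body = json.dumps({"query": {"match_all": {}}}).encode("utf-8")
    url = "{}/{}/{}/_query".format(base_url.rstrip("/"), index, doc_type)
    return urllib.request.Request(
        url,
        data=body,
        method="DELETE",
        headers={"Content-Type": "application/json"},
    )

# Example with placeholder names; pass the result to urllib.request.urlopen
# to actually send it to a node that has the plugin installed.
request = build_delete_type_request("http://localhost:9200", "my_index", "my_type")
```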