Announcing Curator 5
I am excited to announce the release of Curator 5! Let's dive right in with the changes, shall we?
Breaking Changes
There are really only two breaking changes to be aware of:
- Curator 5 only works with Elasticsearch 5.x versions.
- A tiny API change. If you only ever use Curator as a command-line tool, you won't even know this change is there.
So, why does Curator 5 only work with Elasticsearch 5.x? For one thing, backward compatibility is hard. For another, a new feature sometimes doesn't work with an older version of Elasticsearch, yet that feature still appears in the docs, even with a big warning that says, "this feature doesn't work with version X." As a result, new users can have a bad experience: they struggle for hours to make something work, then ask for help in the forums, only to learn that the feature will not work for them because of a version mismatch. To save everyone time and aggravation, Curator is moving onto the unified Elastic Stack release schedule (though it's still a few versions behind).
What's the same?
One of the nice things is that the configuration format remains unchanged. You are free to use `curator` and `curator_cli` exactly as before, without having to change any configuration. The change from Curator 3 to Curator 4 was a jarring one for many users. The improvements were effective, however, and there hasn't been a need to change the configuration syntax. Instant upgrade win!
What's new?
Now for the exciting part: new features!
Reindex
Perhaps the biggest new feature in Curator 5 is the addition of the reindex action. The Reindex API is extremely powerful.
```yaml
actions:
  1:
    description: "Reindex index1 into index2"
    action: reindex
    options:
      wait_interval: 9
      max_wait: -1
      request_body:
        source:
          index: index1
        dest:
          index: index2
    filters:
    - filtertype: none
```
This is just an example of a simple, local reindex with manually selected indices. "But," I hear you say, "this is Curator! It should support filtered index selection!" And you're absolutely right. That's accomplished like this, with the `REINDEX_SELECTION` placeholder:
```yaml
actions:
  1:
    description: >-
      Reindex all daily logstash indices from March 2017 into logstash-2017.03
    action: reindex
    options:
      wait_interval: 9
      max_wait: -1
      request_body:
        source:
          index: REINDEX_SELECTION
        dest:
          index: logstash-2017.03
    filters:
    - filtertype: pattern
      kind: prefix
      value: logstash-2017.03.
```
This example will reindex all of the daily Logstash indices from March 2017 into a single monthly index.
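To make the placeholder idea concrete, here is a minimal, hypothetical sketch of how a `REINDEX_SELECTION` placeholder could be resolved: a prefix filter picks the matching index names, and they are substituted into a copy of `request_body` as a list of source indices (the Reindex API accepts a list there). The function names `prefix_filter` and `resolve_request_body` are illustrative assumptions, not Curator's internal API.

```python
# Hypothetical sketch of resolving a REINDEX_SELECTION placeholder.
# The function names are illustrative; they are not Curator's internal API.
import copy

PLACEHOLDER = "REINDEX_SELECTION"


def prefix_filter(indices, value):
    """Mimic `filtertype: pattern, kind: prefix`: keep names starting with value."""
    return [name for name in indices if name.startswith(value)]


def resolve_request_body(request_body, selected):
    """Return a copy of request_body with the placeholder swapped for the selection."""
    body = copy.deepcopy(request_body)  # never mutate the caller's config
    if body["source"]["index"] == PLACEHOLDER:
        if not selected:
            raise ValueError("the filters selected no indices to reindex")
        # The Reindex API accepts a list of source indices.
        body["source"]["index"] = sorted(selected)
    return body


if __name__ == "__main__":
    cluster_indices = [
        "logstash-2017.02.28",
        "logstash-2017.03.01",
        "logstash-2017.03.02",
        "metrics-2017.03.01",
    ]
    request_body = {
        "source": {"index": PLACEHOLDER},
        "dest": {"index": "logstash-2017.03"},
    }
    selected = prefix_filter(cluster_indices, "logstash-2017.03.")
    print(resolve_request_body(request_body, selected))
    # {'source': {'index': ['logstash-2017.03.01', 'logstash-2017.03.02']}, 'dest': {'index': 'logstash-2017.03'}}
```

Raising an error on an empty selection is a design choice for the sketch. Reindexing "nothing" into a monthly index is almost certainly a configuration mistake.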
You can add all kinds of extra processing to these reindexing operations. Curator should support every possible configuration except one: manual slicing, which is likely to be a rare need since automatic slicing is available. Want to reindex through an ingest pipeline? No problem! Reindex from remote? Oh, that's the best part!
```yaml
actions:
  1:
    description: >-
      Reindex remote index1 to local index1
    action: reindex
    options:
      wait_interval: 9
      max_wait: -1
      request_body:
        source:
          remote:
            host: http://otherhost:9200
            username: myuser
            password: mypass
          index: index1
        dest:
          index: index1
    filters:
    - filtertype: none
```
This example will pull `index1` from `http://otherhost:9200` with the provided credentials, as long as the local node was started with the following setting in its `elasticsearch.yml` file. The setting cannot be applied dynamically, so if it was not present when the Elasticsearch node was started, you must add it and then restart the node:
```yaml
reindex.remote.whitelist: remote_host_or_IP1:9200, remote_host_or_IP2:9200
```
You must whitelist remote nodes in order to reindex from them. In this case, `remote_host_or_IP1` would likely be the same IP or host name as `otherhost` in the reindex `request_body`. Curator tests for the presence of the `dest` index after the task finishes. If the task completes successfully but that index is not found, Curator logs an error suggesting that whitelisting is not set up properly.
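As a rough illustration of what the whitelist compares, the sketch below extracts `host:port` from a remote URL and matches it against the comma-delimited setting value. This is a simplified approximation: `fnmatch` stands in for Elasticsearch's own wildcard matching, and `is_whitelisted` is a hypothetical helper, not part of Curator or Elasticsearch.

```python
# Simplified sketch: would a remote URL be accepted by reindex.remote.whitelist?
# fnmatch only approximates Elasticsearch's own wildcard matching.
from fnmatch import fnmatch
from urllib.parse import urlsplit


def parse_whitelist(setting):
    """Split the comma-delimited setting value into host:port patterns."""
    return [entry.strip() for entry in setting.split(",") if entry.strip()]


def is_whitelisted(remote_url, setting):
    """Return True if the remote's host:port matches any whitelist pattern."""
    parts = urlsplit(remote_url)
    if parts.hostname is None or parts.port is None:
        raise ValueError("remote host needs a scheme, host and port: " + remote_url)
    host_port = "{}:{}".format(parts.hostname, parts.port)
    return any(fnmatch(host_port, pattern) for pattern in parse_whitelist(setting))


if __name__ == "__main__":
    setting = "otherhost:9200, 10.0.0.5:9200"
    print(is_whitelisted("http://otherhost:9200", setting))  # True
    print(is_whitelisted("http://otherhost:9201", setting))  # False
```

The key point is that the whitelist entry must name the same host and port that appear in `request_body.source.remote.host`. A port mismatch alone is enough to be rejected.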
"But," I hear you say again, "what if I want to use Curator's index filters to select indices on the remote side?" I saw you coming:
```yaml
actions:
  1:
    description: >-
      Reindex all remote daily logstash indices from March 2017 into local index
      logstash-2017.03
    action: reindex
    options:
      wait_interval: 9
      max_wait: -1
      request_body:
        source:
          remote:
            host: http://otherhost:9200
            username: myuser
            password: mypass
          index: REINDEX_SELECTION
        dest:
          index: logstash-2017.03
      remote_filters:
      - filtertype: pattern
        kind: prefix
        value: logstash-2017.03.
    filters:
    - filtertype: none
```
This example will reindex all of the daily Logstash indices from March 2017 from `otherhost` into a single monthly index on the local cluster.
Generally speaking, Curator should be able to perform a remote reindex from any version of Elasticsearch 1.4 or newer. Strictly speaking, the Elasticsearch Reindex API can reindex from even older clusters, but Curator cannot facilitate this because it depends on changes released in 1.4.
However, a known bug prevents Elasticsearch 5.3.0 from reindexing from remote clusters older than 2.0. The fix will be available in Elasticsearch 5.3.1. Earlier versions of Elasticsearch 5.x are not affected by this bug.
There is a ton of documentation on what can go in a `request_body`, with even more examples than these.
Rollover
Lots of you have been asking for this feature, and here it is!
```yaml
action: rollover
description: >-
  Rollover the index associated with alias 'aliasname', which should be in the
  form of prefix-000001 (or similar), or prefix-YYYY.MM.DD-1.
options:
  name: aliasname
  conditions:
    max_age: 1d
    max_docs: 1000000
  extra_settings:
    index.number_of_shards: 3
    index.number_of_replicas: 1
  timeout_override:
  continue_if_exception: False
  disable_action: False
```
The conditions are described in the Elasticsearch Rollover API documentation. If the index currently behind the alias meets any one of them, it is rolled over to a new index.
Read more in the Curator documentation.
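The sketch below re-implements two Rollover API rules, as a hedged illustration only, to show what the conditions above mean. First, the index rolls over if *any* condition is met. Second, a name ending in `-` plus a number gets that number incremented and zero-padded to six digits. The names `conditions_met` and `next_index_name` are hypothetical, and the sketch handles only the simple numeric-suffix naming case.

```python
# Sketch of two Rollover API rules, re-implemented for illustration only:
# 1) the index rolls over if ANY condition is met;
# 2) a name ending in "-<number>" gets the number incremented, zero-padded to 6 digits.
import re
from datetime import datetime, timedelta


def conditions_met(created, doc_count, now, max_age=None, max_docs=None):
    """Return True if any configured condition (max_age, max_docs) is satisfied."""
    checks = []
    if max_age is not None:
        checks.append(now - created >= max_age)
    if max_docs is not None:
        checks.append(doc_count >= max_docs)
    return any(checks)


def next_index_name(current):
    """Increment a trailing '-<number>' suffix, e.g. logs-000001 -> logs-000002."""
    match = re.match(r"^(.*-)(\d+)$", current)
    if match is None:
        raise ValueError("index name must end with '-' and a number: " + current)
    prefix, number = match.groups()
    return "{}{:06d}".format(prefix, int(number) + 1)


if __name__ == "__main__":
    created = datetime(2017, 4, 6, 0, 0)
    now = datetime(2017, 4, 7, 12, 0)
    # 36 hours old, so max_age: 1d is met even though max_docs is not.
    print(conditions_met(created, 250000, now,
                         max_age=timedelta(days=1), max_docs=1000000))  # True
    print(next_index_name("logs-000001"))  # logs-000002
```

Because the conditions are OR'd, adding `max_docs` alongside `max_age` can only make rollovers happen sooner, never later.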
Date Math in create_index
Many users were eager to create indices with Curator but could not put a future timestamp in the index name. Now they can, thanks to date math support in index names. Credit for this actually goes to the Elasticsearch team.
```yaml
action: create_index
description: "Create index as named"
options:
  name: '<logstash-{now/d+1d}>'
  # ...
```
For example, if today's date were 2017-04-07, the name `<logstash-{now/d}>` would create an index named `logstash-2017.04.07`. If you wanted to create tomorrow's index, you would use the name `<logstash-{now/d+1d}>`, which adds 1 day. This pattern creates an index named `logstash-2017.04.08`. For many more configuration options, read the Elasticsearch date math documentation.
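To show the arithmetic in the example above, here is a minimal sketch that resolves only the `{now/d}`, `{now/d+Nd}`, and `{now/d-Nd}` forms. Elasticsearch itself supports far more, including other units, custom formats, and time zones. `resolve_daily_name` is a hypothetical helper, and passing `today` explicitly is an assumption made for testability (Elasticsearch evaluates `now` in UTC).

```python
# Minimal sketch: resolve only the "{now/d}" and "{now/d+Nd}" / "{now/d-Nd}" forms
# of an index-name date math expression. Elasticsearch supports far more.
import re
from datetime import date, timedelta

DAILY = re.compile(r"^<(?P<prefix>[^{}<>]*)\{now/d(?:(?P<sign>[+-])(?P<days>\d+)d)?\}>$")


def resolve_daily_name(expression, today):
    """Resolve e.g. '<logstash-{now/d+1d}>' against `today` (Elasticsearch uses UTC)."""
    match = DAILY.match(expression)
    if match is None:
        raise ValueError("unsupported expression: " + expression)
    offset = int(match.group("days") or 0)
    if match.group("sign") == "-":
        offset = -offset
    day = today + timedelta(days=offset)
    # YYYY.MM.dd is the default date format for date math index names.
    return match.group("prefix") + day.strftime("%Y.%m.%d")


if __name__ == "__main__":
    today = date(2017, 4, 7)
    print(resolve_daily_name("<logstash-{now/d}>", today))     # logstash-2017.04.07
    print(resolve_daily_name("<logstash-{now/d+1d}>", today))  # logstash-2017.04.08
```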
Unset Shard Routing Allocation
```yaml
action: allocation
description: "Apply shard allocation filtering rules to the specified indices"
options:
  key: tag
  value:
  allocation_type: require
filters:
- filtertype: ...
```
By leaving `value` unset or empty, you can unset a previously set value.
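The options above map onto an index setting named `index.routing.allocation.<allocation_type>.<key>`. The sketch below builds that settings body. It *assumes* an unset or empty `value` is sent as JSON `null`, which Elasticsearch treats as resetting the setting. Curator's own internals may handle this differently, and `allocation_settings` is a hypothetical helper.

```python
# Sketch of the index setting an allocation action would send. It assumes an
# unset/empty `value` is sent as JSON null, which Elasticsearch treats as
# "reset this setting"; Curator's own internals may do this differently.
import json


def allocation_settings(key, value=None, allocation_type="require"):
    """Build a settings body for index.routing.allocation.<type>.<key>."""
    if allocation_type not in ("require", "include", "exclude"):
        raise ValueError("allocation_type must be require, include, or exclude")
    setting = "index.routing.allocation.{}.{}".format(allocation_type, key)
    return {setting: value if value else None}


if __name__ == "__main__":
    print(json.dumps(allocation_settings("tag", "hot")))
    # {"index.routing.allocation.require.tag": "hot"}
    print(json.dumps(allocation_settings("tag")))
    # {"index.routing.allocation.require.tag": null}
```

A body like this could be passed to the elasticsearch-py client's `indices.put_settings` for the selected indices.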
Period Filter
This has been a long-requested feature. Now you can select blocks of whole units of time.
```yaml
- filtertype: period
  source: name
  range_from: -1
  range_to: -1
  timestring: '%Y.%m.%d'
  unit: weeks
  week_starts_on: sunday
```
With `range_from` and `range_to`, you can select multiple hours, days, weeks, months, or years. Negative numbers indicate the past, and positive numbers indicate the future. In the above example, setting both to `-1` means to select only the last whole week, counting Sunday as the first day of the week. If today is 2017-04-07, week `0` is this week, which starts on 2017-04-02. This means that `-1` actually selects the week starting on 2017-03-26 and ending on 2017-04-01.
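The week arithmetic in that example can be checked with a short sketch. It finds the start of week `0` (the week containing today), offsets by `range_from` and `range_to` whole weeks, and then tests dates parsed from index names against the resulting period. The helpers `week_period` and `in_period` are hypothetical, and only `unit: weeks` with `week_starts_on` of `sunday` or `monday` is modeled.

```python
# Sketch of whole-week period arithmetic (unit: weeks), mirroring the example above.
# Function names are hypothetical; this is not Curator's implementation.
from datetime import date, datetime, timedelta

# date.weekday(): Monday == 0 ... Sunday == 6
WEEK_STARTS = {"monday": 0, "sunday": 6}


def week_period(today, range_from, range_to, week_starts_on="sunday"):
    """Return (first_day, last_day) for whole weeks range_from..range_to; week 0 contains today."""
    days_into_week = (today.weekday() - WEEK_STARTS[week_starts_on]) % 7
    this_week_start = today - timedelta(days=days_into_week)
    first_day = this_week_start + timedelta(weeks=range_from)
    last_day = this_week_start + timedelta(weeks=range_to + 1) - timedelta(days=1)
    return first_day, last_day


def in_period(index_name, prefix, timestring, period):
    """Parse the date in an index name (source: name) and test it against the period."""
    stamp = datetime.strptime(index_name[len(prefix):], timestring).date()
    return period[0] <= stamp <= period[1]


if __name__ == "__main__":
    period = week_period(date(2017, 4, 7), -1, -1)
    print(period)  # (datetime.date(2017, 3, 26), datetime.date(2017, 4, 1))
    for name in ("logstash-2017.03.25", "logstash-2017.03.26",
                 "logstash-2017.04.01", "logstash-2017.04.02"):
        print(name, in_period(name, "logstash-", "%Y.%m.%d", period))
```

Note that the end of the range is the day *before* the week after `range_to` starts, which is why both values being `-1` yields exactly one whole week.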
There's too much to cover in one blog post, so be sure to read the documentation for this filter.
Dedicated internal wait_for_completion functionality
In previous versions of Curator, users had to set very high client connection timeout values for long-running actions, like snapshots. Curator even tried to compensate by automatically increasing those values for the long-running actions, which included allocation, cluster_routing, forceMerge, reindex, replicas, restore, and snapshot.
With the exception of forceMerge, these actions no longer hold a client connection open while waiting for the cluster to send a completion message. Instead, they poll to check for completion:
```yaml
action: snapshot
description: Snapshot selected indices to 'repository'
options:
  repository:
  # ...
  wait_for_completion: True
  max_wait: 3600
  wait_interval: 10
  # ...
filters:
- filtertype: ...
```
You still use the `wait_for_completion` setting, but now with a `max_wait` and a `wait_interval`. A `max_wait` of `-1` means to wait indefinitely for the operation to complete. Otherwise, specify the number of seconds Curator should wait for completion before giving up. The `wait_interval` defines how frequently, in seconds, Curator checks whether the task is complete. Curator does not check whether `wait_interval` is less than the `timeout` value you specify in the `curator.yml` client configuration file, so don't set it too high.
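The polling behavior described above can be sketched as a simple loop: check, give up once `max_wait` seconds have elapsed (unless it is `-1`), otherwise sleep `wait_interval` seconds and check again. This is a hypothetical illustration, not Curator's code. `check_complete` stands in for whatever status call an action actually makes, and returning `False` on timeout is an assumption of the sketch.

```python
# Hypothetical sketch of wait_for_completion-style polling with max_wait and wait_interval.
# `check_complete` stands in for whatever status call the action makes (an assumption);
# `sleep` and `clock` are injectable so the loop can be tested without real waiting.
import time


def poll_until_complete(check_complete, max_wait=-1, wait_interval=9,
                        sleep=time.sleep, clock=time.monotonic):
    """Call check_complete() every wait_interval seconds.

    Returns True once it reports completion, or False after max_wait seconds.
    A max_wait of -1 means wait forever.
    """
    start = clock()
    while True:
        if check_complete():
            return True
        if max_wait != -1 and clock() - start >= max_wait:
            return False  # give up; the task itself may still be running
        sleep(wait_interval)


if __name__ == "__main__":
    attempts = {"count": 0}

    def fake_task_done():
        attempts["count"] += 1
        return attempts["count"] >= 3  # pretend the task finishes on the third check

    print(poll_until_complete(fake_task_done, max_wait=5, wait_interval=0.1))  # True
```

Each check is a short request, so no single client connection has to stay open for the whole operation. Like Curator, this sketch does not compare `wait_interval` against the client `timeout`.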
So why doesn't this work with forceMerge? From the Elasticsearch documentation:
This call will block until the merge is complete. If the http connection is lost, the request will continue in the background, and any new requests will block until the previous force merge is complete.
So for forceMerge, be aware that timeouts can still occur, and set `timeout_override` accordingly.
Conclusion
This is a feature-laden release for Curator, and I'm excited to bring it to you. As always, if you run into a problem, help is available at https://discuss.elastic.co.
Happy Curating!