Curator: Tending your time-series indices

NOTE: This article now contains outdated information. Please reference our docs, peruse our latest blogs, and visit our forums for the latest and greatest. Thank you.

Background

A few years ago, I was managing an Elasticsearch, Logstash and Kibana (ELK) stack and needed a way to automatically clean up indices older than 30 days. After reading the API documentation and getting some help from the community in the #logstash and #elasticsearch IRC channels, I realized that this was fairly easy to set up with simple scripting and cron.

curl -XDELETE 'localhost:9200/logstash-2014.01.01?pretty'
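Wrapped in cron, the whole cleanup can be a single crontab entry. Here is a minimal sketch, assuming GNU date, the default logstash- index naming, and a 30-day window (note that % must be escaped in crontab):

# Delete the logstash index from 30 days ago, every night at 00:05
5 0 * * * curl -XDELETE "localhost:9200/logstash-$(date --date='30 days ago' +\%Y.\%m.\%d)"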

Sure, this works, and it’s not terribly hard to generate dates, but I wanted something a bit more elegant.

In the beginning…

So, I started writing a script in Python. It did the job with a cleaner command line and a single target number of days to keep, so I shared it with the larger community. Others polished it up and added new functionality. I wrote another script that also allowed me to “optimize” old indices (which really just means merging the segments in each shard until no more than n segments remain per shard). These scripts have now been merged and enhanced to become a single, helpful tool for managing your older indices like the fine works of art they are!

Introducing Curator

Here are a few of the index operations you can do with Curator:

  • Delete (by total space consumed or by date)
  • Close
  • Disable bloom filter cache
  • Optimize (Lucene forceMerge)

Installing Curator

As of this writing, Curator is at release 0.5.1 and works with Elasticsearch versions up through 0.90.10. Curator should also be compatible with Elasticsearch 1.0 (which is still only at RC1). We’ll be testing to ensure compatibility with each release.
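If you’re not sure which Elasticsearch version you’re running, the root endpoint of any node reports it (assuming the default host and port); check the version.number field in the response:

curl -XGET 'localhost:9200?pretty'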

Curator currently resides in a git repository. In the near future, it will be a pip-installable package (it’s Python-based). Don’t let that scare you away from using it, though! If you have Python and pip installed on your machine, installation is as simple as:

git clone https://github.com/elasticsearch/curator.git
cd curator
pip install -r requirements.txt

After that, you should be able to run this:

$ ./curator.py -v
curator.py 0.5.1

How-To and examples

Before we get to the examples, you may want to look over the options in context. The list is long (and included at the end of this post), but it illustrates just how much control you have. Note where defaults are listed; if a default suits you, you do not need to specify that flag.

Now let’s go through some simple examples to illustrate how Curator can make your ELK stack better, and even more responsive.

Delete

Let’s say you want to keep no more than 90 days of indices. The command is simple:

$ curator.py --host my-elasticsearch -d 90

Here the -d specifies the number of days. Simple, right?
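If you would rather preview what Curator is going to do before letting it loose, add the -n (--dry-run) flag from the help output below and no changes will be made:

$ curator.py --host my-elasticsearch -d 90 -n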

Delete by space

This is a special case where you might want to delete indices in excess of some number of gigabytes (starting with the oldest):

$ curator.py --host my-elasticsearch -C space -g 10240

Here you see that we specified curation (-C) by space, and a number of (-g) gigabytes (10240, or 10 TB). The -g argument also accepts decimal values, e.g. 1.5 or 0.5.

Please note that you cannot combine delete by space with any of the other Curator options.

Close

Closing an index is handled by the Open/Close Index API:

The open and close index APIs allow you to close an index, and later on open it. A closed index has almost no overhead on the cluster (except for maintaining its metadata), and is blocked for read/write operations. A closed index can be opened, which will then go through the normal recovery process.

Closing an index means it’s still there, but not searchable. Why is that useful?

Imagine that you have an obligation to keep indices for 90 days, but you rarely, if ever, search an index more than 30 days old. In that case, you can close the older indices, which saves valuable resources (heap space, in particular). This means your cluster will have more memory for searches and indexing! And if you ever need the data in those indices, you can reopen them with a single API call and they’ll be there again.
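That reopening call looks something like this (a sketch with curl, using an example index name):

curl -XPOST 'localhost:9200/logstash-2014.01.01/_open?pretty'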

In such an event, it may be wise to temporarily disable any scheduled execution of Curator to prevent your now-open indices from being closed again prematurely.

$ curator.py --host my-elasticsearch -c 30 -d 90

This builds on our previous example: it will close indices older than 30 days and delete indices older than 90 days. Still quite simple!

Disable Bloom Filters

This is a brand-new feature, usable only if you’re running Elasticsearch version 0.90.9 or later.

Don’t worry: the script checks that your version is sufficient before attempting the operation.

What is a bloom filter? Why would I want it disabled?

A bloom filter is a memory-resident structure that helps speed up indexing operations. With time-series data, it is only useful while an index is still being written to, and your index is probably not receiving new data two days after its date has rolled over. At that point, the bloom filter is holding on to resources the index no longer requires. With Curator, we can free those resources!
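Under the hood, this boils down to a dynamic index setting added in 0.90.9. If you wanted to flip it by hand for a single index, it would look something like this (a sketch with curl; the index.codec.bloom.load setting is the one introduced in that release):

curl -XPUT 'localhost:9200/logstash-2014.01.01/_settings' -d '{ "index.codec.bloom.load": false }'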

$ curator.py --host my-elasticsearch -b 2 -c 30 -d 90

Now we’re freeing those bloom filter resources on indices older than 2 days (you could also use 1), closing indices older than 30 days, and deleting indices older than 90 days.

Optimize, or rather forceMerge

It’s important to understand that optimize in the Elasticsearch API is not a command you need to run on a live index, or repeatedly on a “cold” index (one that is no longer actively indexing). In fact, optimize was renamed to forceMerge inside Lucene precisely so that people would stop calling it in the belief that it would improve their index. Merging segments in Elasticsearch can bring benefits, but understand the cost before you start optimizing all of your cold indices.

A forceMerge operation tries to reduce the segment count in each shard of your indices. Since each segment carries overhead, more segments mean more resources used, and merging down to fewer segments frees resources. That’s good, right?

It can be, but the disk and network I/O needed to perform a significant merge can take a toll on your disks and on your cluster’s normal write operations. My advice is to consider carefully whether you need this at all. It can make searches faster (by a few percentage points) and reduce resource usage. It can also make cluster recovery faster, since there are fewer segments to manage. On the other hand, optimizing a single index can take a very long time, perhaps an hour or more, so you will likely need to increase the client timeout; I recommend no less than 3600 seconds (one hour).

As with the warning labels on cleaning bottles, “test in an inconspicuous place before using”: try this during a period of reduced disk I/O and see whether it improves operations and resource usage in a way that makes sense for your cluster and use case. The default is to merge down to 2 segments per shard, but you can override that with the --max_num_segments flag.

Building on our previous example yet again,

$ curator.py --host my-elasticsearch -b 2 -o 2 -c 30 -d 90

This will disable bloom filters on indices older than 2 days, “optimize” indices older than 2 days, close indices older than 30 days, and delete indices older than 90 days.
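If you would rather run the optimize step on its own during a quiet window, raise the timeout and set the target segment count explicitly. A sketch (1 is the most aggressive setting; the default is 2):

$ curator.py --host my-elasticsearch -o 2 --max_num_segments 1 --timeout 3600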

Order of operations

The script enforces the following order to prevent operations from colliding. Why optimize an index that’s closed? Why close an index that’s slated for deletion?

  1. Delete (by space or time)
  2. Close
  3. Disable bloom filters
  4. Optimize

Usage considerations

In the most recent example, we performed all four operations in one command, but you may not want to run them all in a single pass.

$ curator.py --host my-elasticsearch -b 2 -o 2 -c 30 -d 90

…is functionally equivalent to…

$ curator.py --host my-elasticsearch -d 90
$ curator.py --host my-elasticsearch -c 30 
$ curator.py --host my-elasticsearch -b 2
$ curator.py --host my-elasticsearch -o 2

You could easily run these commands at different times, or with extra flags (especially the optimize run, which should have something like --timeout 3600).
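For example, a crontab that staggers the work might look like this (a sketch; the install path and times are assumptions):

# Nightly housekeeping: delete at 01:00, close at 01:15
0 1 * * * /opt/curator/curator.py --host my-elasticsearch -d 90
15 1 * * * /opt/curator/curator.py --host my-elasticsearch -c 30
# Heavier optimize run on Saturday mornings, with a long timeout
0 2 * * 6 /opt/curator/curator.py --host my-elasticsearch -o 2 --timeout 3600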

You may also have indices with prefixes other than the default logstash- to curate:

$ curator.py --host my-elasticsearch --prefix logstash- -d 30
$ curator.py --host my-elasticsearch --prefix othername- -d 30

Conclusion

Curator helps you manage the retention policy for your time-series indices. With an abundance of configuration options, you can easily manage your indices, whether your cluster has one node or a hundred or more. Feedback and contributions are welcome at https://github.com/elasticsearch/curator!

Help output (all arguments and options displayed)

$ curator.py -h
usage: curator.py [-h] [-v] [--host HOST] [--port PORT] [-t TIMEOUT]
                  [-p PREFIX] [-s SEPARATOR] [-C CURATION_STYLE]
                  [-T TIME_UNIT] [-d DELETE_OLDER] [-c CLOSE_OLDER]
                  [-b BLOOM_OLDER] [-g DISK_SPACE]
                  [--max_num_segments MAX_NUM_SEGMENTS] [-o OPTIMIZE] [-n]
                  [-D] [-l LOG_FILE]

Curator for Elasticsearch indices. Can delete (by space or time), close,
disable bloom filters and optimize (forceMerge) your indices.

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program version number and exit
  --host HOST           Elasticsearch host. Default: localhost
  --port PORT           Elasticsearch port. Default: 9200
  -t TIMEOUT, --timeout TIMEOUT
                        Elasticsearch timeout. Default: 30
  -p PREFIX, --prefix PREFIX
                        Prefix for the indices. Indices that do not have this
                        prefix are skipped. Default: logstash-
  -s SEPARATOR, --separator SEPARATOR
                        Time unit separator. Default: .
  -C CURATION_STYLE, --curation-style CURATION_STYLE
                        Curate indices by [time, space] Default: time
  -T TIME_UNIT, --time-unit TIME_UNIT
                        Unit of time to reckon by: [days, hours] Default: days
  -d DELETE_OLDER, --delete DELETE_OLDER
                        Delete indices older than n TIME_UNITs.
  -c CLOSE_OLDER, --close CLOSE_OLDER
                        Close indices older than n TIME_UNITs.
  -b BLOOM_OLDER, --bloom BLOOM_OLDER
                        Disable bloom filter for indices older than n
                        TIME_UNITs.
  -g DISK_SPACE, --disk-space DISK_SPACE
                        Delete indices beyond n GIGABYTES.
  --max_num_segments MAX_NUM_SEGMENTS
                        Maximum number of segments, post-optimize. Default: 2
  -o OPTIMIZE, --optimize OPTIMIZE
                        Optimize (Lucene forceMerge) indices older than n
                        TIME_UNITs. Must increase timeout to stay connected
                        throughout optimize operation, recommend no less than
                        3600.
  -n, --dry-run         If true, does not perform any changes to the
                        Elasticsearch indices.
  -D, --debug           Debug mode
  -l LOG_FILE, --logfile LOG_FILE
                        log file