Snapshot and Restore
UPDATE: This article refers to our hosted Elasticsearch offering by an older name, Found. Please note that Found is now known as Elastic Cloud.
Behind the Scenes
Let's take a closer look at Elasticsearch's snapshot and restore module and the files used to store snapshots, exemplified with snapshots on S3.
Introduction
With the new Snapshot and Restore API introduced in Elasticsearch 1.0, you can create snapshots of your data and store them in a repository. In essence, a snapshot is a fancy word for backup, and combined with the ability to restore, snapshots make for very useful backups. In this article we will look into the files used by Snapshot & Restore, the anatomy of a repository, and some of its implications. The aim is to uncover which Elasticsearch and Lucene features paved the way for this new and cool feature.
I will include all the Elasticsearch commands used, so no previous experience with the Snapshot & Restore API is required, but a general idea of how indexes, shards and Lucene segments are related will be beneficial. The short version of that story is that Elasticsearch divides every index into shards, and each shard is a Lucene index which in turn is built up of segments that reside either in memory or on disk. One very important property of those segments is that they are immutable. Elasticsearch takes advantage of this in many ways, for example for filter caches. For a better introduction to these topics I recommend reading Elasticsearch from the bottom up.
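If you want to see this relationship for yourself, the indices segments API exposes it. As a quick illustration (my_index is one of the indexes used later in this article; the output is omitted here), you can list the Lucene segments backing each shard of an index like this:

# List the Lucene segments behind each shard of the my_index index; the response is
# grouped per shard and shows each segment's name, document count and size on disk.
curl -XGET https://<cluster_id>-eu-west-1.foundcluster.com:9243/my_index/_segments?pretty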
Nomenclature
A snapshot is a copy of all the cluster data and may contain both indexes and cluster settings. A snapshot resides within a repository. Several repositories may be defined for a cluster, and each repository has a type. The two types available in core Elasticsearch are fs and url. The fs type requires a shared file system that is mounted on every instance in the cluster. The url type only requires the repository files to be readable from a url, but it is limited to read-only use. In particular, if you are running Elasticsearch on Amazon EC2, you may want to store your snapshots on Amazon S3. The elasticsearch-cloud-aws plugin allows you to do this by providing a repository type for S3.
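To make the two core types concrete, here is a sketch of how such repositories could be registered. The repository names and paths are made up for illustration, and the fs location must point to a shared file system that is mounted on every node:

# Register an fs repository backed by a shared file system (hypothetical name and path)
curl -XPUT http://localhost:9200/_snapshot/my_fs_repo -d '{
    "type": "fs",
    "settings": { "location": "/mount/backups/my_fs_repo" }
}'

# Register a read-only url repository pointing at the same files (hypothetical url)
curl -XPUT http://localhost:9200/_snapshot/my_readonly_repo -d '{
    "type": "url",
    "settings": { "url": "file:///mount/backups/my_fs_repo" }
}'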
Creating Your First Repository and Snapshot
For this demonstration I will reuse a cluster from a previous article. As detailed in the image below, it has three indexes with one shard each. The biggest one, the oslo3 index, receives about one hundred documents every five minutes.
We define a repository with the curl command below. As with all snapshot and restore commands, the endpoint used is _snapshot, followed by the repository name.
curl -XPUT https://<cluster_id>-eu-west-1.foundcluster.com:9243/_snapshot/myRepo -d '{
    "type": "s3",
    "settings": {
        "bucket": "myBucket",
        "region": "eu-west-1",
        "base_path": "myCluster"
    }
}'
With the elasticsearch-cloud-aws plugin installed on the cluster, I can specify the type s3. The only required settings are the bucket name and the access credentials. The latter are specified in elasticsearch.yml. The upcoming version of the plugin is expected to allow repository-specific credentials. More settings are described in the plugin documentation.
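For reference, the credentials typically end up in elasticsearch.yml along these lines. Treat this as a sketch, as the exact setting names depend on the plugin version:

# elasticsearch.yml (sketch): AWS credentials used by the elasticsearch-cloud-aws plugin
cloud.aws.access_key: <your AWS access key>
cloud.aws.secret_key: <your AWS secret key>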
Once the repository is created we can create the first snapshot with the following command:
curl -XPUT https://<cluster_id>-eu-west-1.foundcluster.com:9243/_snapshot/myRepo/test?wait_for_completion=true
Two things are worth noting about the above command: every snapshot needs a unique name, as specified in the url, and, unless otherwise specified (as we do here with wait_for_completion=true), snapshots are created asynchronously. If you want to check the progress of a snapshot, you can do a GET to the same url.
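For example, the snapshot we just created, or all snapshots in the repository, can be inspected like this (output omitted; the _all form simply lists every snapshot name and state):

# Check the state of the snapshot named test
curl -XGET https://<cluster_id>-eu-west-1.foundcluster.com:9243/_snapshot/myRepo/test

# List all snapshots in the repository
curl -XGET https://<cluster_id>-eu-west-1.foundcluster.com:9243/_snapshot/myRepo/_all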
Looking at the Files Created
Having connected to the bucket using an S3 client, I find that the snapshot has created the following files:
myCluster/index
myCluster/indices/kibana-int/0/__0
myCluster/indices/kibana-int/0/__1
myCluster/indices/kibana-int/0/__2
myCluster/indices/kibana-int/0/__3
myCluster/indices/kibana-int/0/snapshot-test
myCluster/indices/kibana-int/snapshot-test
myCluster/indices/my_index/0/__0
myCluster/indices/my_index/0/__1
myCluster/indices/my_index/0/__2
myCluster/indices/my_index/0/__3
myCluster/indices/my_index/0/snapshot-test
myCluster/indices/my_index/snapshot-test
myCluster/indices/oslo3/0/__0
myCluster/indices/oslo3/0/__1
myCluster/indices/oslo3/0/__10
myCluster/indices/oslo3/0/__11
myCluster/indices/oslo3/0/__12
myCluster/indices/oslo3/0/__13
myCluster/indices/oslo3/0/__14
myCluster/indices/oslo3/0/__15
myCluster/indices/oslo3/0/__16
myCluster/indices/oslo3/0/__17
myCluster/indices/oslo3/0/__18
myCluster/indices/oslo3/0/__19
myCluster/indices/oslo3/0/__1a
myCluster/indices/oslo3/0/__1b
myCluster/indices/oslo3/0/__1c
myCluster/indices/oslo3/0/__1d
myCluster/indices/oslo3/0/__1e
myCluster/indices/oslo3/0/__1f
myCluster/indices/oslo3/0/__1g
# skipped 162 similar lines
myCluster/indices/oslo3/0/__w
myCluster/indices/oslo3/0/__x
myCluster/indices/oslo3/0/__y
myCluster/indices/oslo3/0/__z
myCluster/indices/oslo3/0/snapshot-test
myCluster/indices/oslo3/snapshot-test
myCluster/metadata-test
myCluster/snapshot-test
From this rather lengthy list we can deduce the following structure in every repository:
- index
- indices
  - <index_name>
    - <shard_number>
      - <segment_id>
      - snapshot-<snapshot_name>
    - snapshot-<snapshot_name>
- metadata-<snapshot_name>
- snapshot-<snapshot_name>
A Second Snapshot
To get a better understanding of the files, we index some more data and then create a second snapshot, named test2. By checking the timestamps of the files, it’s clear that the following files were created or modified as part of the new snapshot:
myCluster/index
myCluster/indices/kibana-int/0/__4
myCluster/indices/kibana-int/0/snapshot-test2
myCluster/indices/kibana-int/snapshot-test2
myCluster/indices/my_index/0/__4
myCluster/indices/my_index/0/snapshot-test2
myCluster/indices/my_index/snapshot-test2
myCluster/indices/oslo3/0/__55
myCluster/indices/oslo3/0/__56
myCluster/indices/oslo3/0/__57
myCluster/indices/oslo3/0/__58
myCluster/indices/oslo3/0/__59
myCluster/indices/oslo3/0/__5a
myCluster/indices/oslo3/0/__5b
myCluster/indices/oslo3/0/__5c
myCluster/indices/oslo3/0/__5d
# skipped 90 similar lines
myCluster/indices/oslo3/0/__7w
myCluster/indices/oslo3/0/__7x
myCluster/indices/oslo3/0/__7y
myCluster/indices/oslo3/0/__7z
myCluster/indices/oslo3/0/__80
myCluster/indices/oslo3/0/__81
myCluster/indices/oslo3/0/__82
myCluster/indices/oslo3/0/__83
myCluster/indices/oslo3/0/__84
myCluster/indices/oslo3/0/__85
myCluster/indices/oslo3/0/__86
myCluster/indices/oslo3/0/__87
myCluster/indices/oslo3/0/snapshot-test2
myCluster/indices/oslo3/snapshot-test2
myCluster/metadata-test2
myCluster/snapshot-test2
By comparing the two lists we find that the only preexisting file in the bucket that has been modified is the myCluster/index file. Let’s have a look at its contents:
{"snapshots":["test","test2"]}
It’s not a big file, but it contains the names of all the snapshots in the repository.
Furthermore, there are many more files created in each snapshot for the oslo3 index. This is to be expected, as the oslo3 index is the only one receiving data regularly, causing new segments to be created and old segments to be merged. The Kibana index is a different story. As this index is used to store Kibana dashboards, it seldom gets any new data. After creating our previous two snapshots, the following files for the Kibana index exist in the bucket:
#test:
myCluster/indices/kibana-int/0/__0
myCluster/indices/kibana-int/0/__1
myCluster/indices/kibana-int/0/__2
myCluster/indices/kibana-int/0/__3
myCluster/indices/kibana-int/0/snapshot-test
myCluster/indices/kibana-int/snapshot-test

#test2:
myCluster/indices/kibana-int/0/__4
myCluster/indices/kibana-int/0/snapshot-test2
myCluster/indices/kibana-int/snapshot-test2
Now, this is actually interesting as it points at one of the really handy features of snapshot and restore: the snapshots are incremental. Let’s examine further and shed some light on how this is implemented.
First, by comparing the two files myCluster/indices/kibana-int/snapshot-test and myCluster/indices/kibana-int/snapshot-test2 we find that they’re identical JSON documents, and by adding a little whitespace formatting it’s easy to recognize the index settings and mappings:
{ "kibana-int":{ "version":2, "state":"open", "settings":{ "index.number_of_replicas":"1", "index.version.created":"900499", "index.number_of_shards":"1" }, "mappings":[ { "dashboard": { "properties": { "dashboard": {"type":"string"}, "group": {"type":"string"}, "title": {"type":"string"}, "user": {"type":"string"} } } } ], "aliases":{} } }
Moving on, we investigate the files for shard 0 and start by comparing snapshot-test and snapshot-test2. This time the files are a little different:
myCluster/indices/kibana-int/0/snapshot-test
{ "name" : "test", "index-version" : 5, "files" : [ { "name" : "__0", "physical_name" : "_0.cfs", "length" : 8037, "checksum" : "trhzg", "part_size" : 104857600 }, { "name" : "__1", "physical_name" : "_0.cfe", "length" : 314, "checksum" : "14i5z7r", "part_size" : 104857600 }, { "name" : "__2", "physical_name" : "_0.si", "length" : 270, "checksum" : "19azdai", "part_size" : 104857600 }, { "name" : "__3", "physical_name" : "segments_5", "length" : 107, "part_size" : 104857600 } ] }
myCluster/indices/kibana-int/0/snapshot-test2
{ "name" : "test2", "index-version" : 6, "files" : [ { "name" : "__0", "physical_name" : "_0.cfs", "length" : 8037, "checksum" : "trhzg", "part_size" : 104857600 }, { "name" : "__1", "physical_name" : "_0.cfe", "length" : 314, "checksum" : "14i5z7r", "part_size" : 104857600 }, { "name" : "__2", "physical_name" : "_0.si", "length" : 270, "checksum" : "19azdai", "part_size" : 104857600 }, { "name" : "__4", "physical_name" : "segments_6", "length" : 107, "part_size" : 104857600 } ] }
From these files we can deduce that the first snapshot uses the files __0, __1, __2 and __3, while the second snapshot uses the files __0, __1, __2 and __4. This is how the incremental nature of the snapshots is implemented on the file side. If you wonder how Elasticsearch is able to compare these files to the files it has on disk for each segment, there is a hint in the physical_name attributes and their extensions: .cfs, .cfe and .si are all file types used by Lucene segments. This implies that the core building blocks of indexes in snapshots are the same as on disk. Given that Lucene segments are immutable, making an incremental snapshot simply becomes a matter of copying the missing segment files to the repository and creating a record of which files are used by the new snapshot.
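If you want to verify this yourself, a quick way is to diff the file lists of the two shard-level snapshot files. The sketch below assumes the two snapshot-* files from the kibana-int shard have been downloaded locally and that jq is installed:

# Extract the repository file names referenced by each snapshot of shard 0
jq -r '.files[].name' snapshot-test  | sort > files-test.txt
jq -r '.files[].name' snapshot-test2 | sort > files-test2.txt

# Files referenced only by the second snapshot, i.e. what actually had to be uploaded for it
comm -13 files-test.txt files-test2.txt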
Deletes
Let’s assume we no longer need the test snapshot after having created the test2 snapshot. As you might have guessed from the file structure described above, and unlike most incremental backup solutions, snapshots in Elasticsearch have no significant ordering or dependency on one another, even though they are incremental. Hence, we are able to delete the first snapshot without affecting the later one, as long as we do the deletes through the Elasticsearch API and don’t start removing files on our own. Demo time:
curl -XDELETE https://<cluster_id>-eu-west-1.foundcluster.com:9243/_snapshot/myRepo/test
Looking at the files for the Kibana index again, only the following remain:
myCluster/indices/kibana-int/0/__0
myCluster/indices/kibana-int/0/__1
myCluster/indices/kibana-int/0/__2
myCluster/indices/kibana-int/0/__4
myCluster/indices/kibana-int/0/snapshot-test2
One obvious implication of this is that deleting a snapshot might not free up as much disk space as was consumed when the snapshot was created, if another snapshot has been created in the meantime. Another implication is that it’s not trivial to define how much disk space is consumed by a single snapshot. This is probably why the API does not currently expose the size used per snapshot. A neat feature I would like to see introduced is a dry run of sorts for deletes, where one could select one or more snapshots and have Elasticsearch calculate the amount of disk space that would be released if they were deleted.
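Until something like that exists, it can be approximated on the client side. Reusing the two file lists extracted with jq in the previous section, the files that only the test snapshot references, and that would therefore become reclaimable by deleting it, are:

# Segment files referenced by test but not by test2, i.e. what deleting test would free up
comm -23 files-test.txt files-test2.txt

Summing the corresponding length fields from the shard-level snapshot files, across all indexes and shards, would give an estimate of the number of bytes actually released.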
Merges
Knowing that segments are the building blocks of both indexes and snapshots, I suspect that a segment merge might force the next snapshot to copy the new, merged segment even if there are no changes to the documents or the index settings. I will test this hypothesis by stopping the traffic to the oslo3 index, creating a snapshot, optimizing the index down to one segment (it currently has about 17 segments) and creating another snapshot.
curl -XPUT https://<cluster_id>-eu-west-1.foundcluster.com:9243/_snapshot/myRepo/oslo3-before-merge?wait_for_completion=true -d '{
    "indices": "oslo3",
    "include_global_state": false
}'

curl -XPOST https://<cluster_id>-eu-west-1.foundcluster.com:9243/oslo3/_optimize?max_num_segments=1

curl -XPUT https://<cluster_id>-eu-west-1.foundcluster.com:9243/_snapshot/myRepo/oslo3-after-merge?wait_for_completion=true -d '{
    "indices": "oslo3",
    "include_global_state": false
}'
In chronological order, the above commands resulted in the following files in the bucket:
myCluster/indices/oslo3/0/__az
myCluster/indices/oslo3/0/snapshot-oslo3-before-merge
myCluster/indices/oslo3/0/__b0
myCluster/indices/oslo3/0/__b1
myCluster/indices/oslo3/0/__b2
myCluster/indices/oslo3/0/__b3
myCluster/indices/oslo3/0/__b4
myCluster/indices/oslo3/0/__b5
myCluster/indices/oslo3/0/__b6
myCluster/indices/oslo3/0/__b7
myCluster/indices/oslo3/0/__b8.part0
myCluster/indices/oslo3/0/__b8.part1
myCluster/indices/oslo3/0/__b8.part2
myCluster/indices/oslo3/0/__b9
myCluster/indices/oslo3/0/__ba
myCluster/indices/oslo3/0/__bb
myCluster/indices/oslo3/0/__bc
myCluster/indices/oslo3/0/__bd
myCluster/indices/oslo3/0/snapshot-oslo3-after-merge
As suspected, the merge forced the following snapshot to copy the new segment, even though the documents, strictly speaking, already existed within the repository. One advantage of this approach is that the segments don’t have to be merged again when restoring the snapshot. Nonetheless, it also implies that even if you never delete any documents from your index and take snapshots frequently, your repository will contain a lot of redundant data, at least if you don’t delete old snapshots. In other words, the fact that snapshots are incremental does not imply that they will never contain redundant data. The deduplication only works with complete segments and is not able to track which segments were merged together. For a really nice demonstration of how segment merges occur in Lucene while indexing, I recommend this video.
Restore
Backups have limited value if you are unable to restore them, so to make things a little more interesting, let’s do the restore to a different cluster. I create a new cluster through the Found console, selecting Elasticsearch version 1.0 and the same region as my previous cluster. I then create the same repository in the new cluster and restore with the following commands:
curl -XPUT https://<cluster_id>-eu-west-1.foundcluster.com:9243/_snapshot/myRepo -d '{
    "type": "s3",
    "settings": {
        "bucket": "myBucket",
        "region": "eu-west-1",
        "base_path": "myCluster"
    }
}'

curl -XPOST https://<cluster_id>-eu-west-1.foundcluster.com:9243/_snapshot/myRepo/oslo3-after-merge/_restore
Before taking this to the extreme and having all your clusters configured with the same repository, here’s a word of caution. This usage is not mentioned anywhere in the official documentation, and there are a number of potential race conditions that can occur when multiple systems access the same remote file system. It’s highly likely that the Elasticsearch developers have considered this use case and have plans to endorse it in the future; my guess as to why it hasn’t been documented is that they have yet to implement all the safeguards and test it properly. There is, however, a way to reduce the number of potential problems, and that is to never have more than one cluster with write access to any given repository. By first creating one repository per cluster and then making it available to the other clusters as a url repository, read-only access would be enforced.
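As a sketch of what the read-only side of that could look like, here is a url repository registered on a second cluster. This is hypothetical and assumes the repository files are readable over plain HTTPS, which for an S3 bucket means the objects must be publicly readable:

# On a cluster that should only be able to read snapshots, never write them (hypothetical url)
curl -XPUT https://<other_cluster_id>-eu-west-1.foundcluster.com:9243/_snapshot/myRepoReadOnly -d '{
    "type": "url",
    "settings": { "url": "https://myBucket.s3.amazonaws.com/myCluster/" }
}'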
Another constraint worth noting about restores is that if an index already exists in the cluster, it must be closed before it can be restored. One possible workaround is to rename the index as it is restored, as described in the official documentation.
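For completeness, here is a sketch of both approaches, using the index and snapshot names from earlier in this article; the rename settings are the ones described in the official documentation:

# Option 1: close the existing index before restoring over it
curl -XPOST https://<cluster_id>-eu-west-1.foundcluster.com:9243/oslo3/_close

# Option 2: restore the index under a new name instead
curl -XPOST https://<cluster_id>-eu-west-1.foundcluster.com:9243/_snapshot/myRepo/oslo3-after-merge/_restore -d '{
    "indices": "oslo3",
    "rename_pattern": "oslo3",
    "rename_replacement": "restored_oslo3"
}'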
Possible Use Cases
There are many use cases for snapshot and restore, and the most obvious one is of course backups: ad hoc ones before large changes, but also ones scheduled on a regular basis. Obviously, the scheduling is something you will have to implement on your own; just don’t forget to implement a retention policy as well. You can even use snapshot and restore for point-in-time recovery, but before you start automating a snapshot every minute, remember that while a shard is being snapshotted, all segment merges are postponed, and if you end up with too many segments your search performance will suffer. In all likelihood, the Elasticsearch developers decided to limit a cluster to one snapshot or restore operation at a time for the very same reasons that I recommend not allowing multiple clusters to have write access to a repository.
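As a sketch of what such scheduling could look like, here is a crontab entry that creates a date-stamped snapshot every night. The naming scheme is made up, and a matching cleanup job that deletes old snapshots is left as an exercise:

# Create a snapshot named after the current date every night at 01:30
# (note that % has to be escaped as \% inside a crontab)
30 1 * * * curl -s -XPUT "https://<cluster_id>-eu-west-1.foundcluster.com:9243/_snapshot/myRepo/nightly-$(date +\%Y\%m\%d)" >/dev/null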
The coolest thing about snapshot and restore, at least in my opinion, is the possibility of easily duplicating data from production clusters to development or test environments. This is one of those things that is never a problem at the beginning of a project - simply because you don’t have that much data yet! - but eventually you get to the point where it takes days to transfer all the data from the production environment to a separate system just to start tracking down that bug… Once that happens, you really wish you had some way of keeping the development system up to date with what goes on in production. Snapshot and restore to the rescue!