A Dive into the Elasticsearch Storage
UPDATE: This article refers to our hosted Elasticsearch offering by an older name, Found. Please note that Found is now known as Elastic Cloud.
In this article we'll investigate the files written to the data directory by various parts of Elasticsearch. We will look at node, index and shard level files and give a short explanation of their contents in order to establish an understanding of the data written to disk by Elasticsearch.
Elasticsearch Paths
Elasticsearch is configured with several paths:
- path.home: Home directory of the user running the Elasticsearch process. Defaults to the Java system property user.dir, which is the default home directory for the process owner.
- path.conf: A directory containing the configuration files. This is usually set by setting the Java system property es.config, as it naturally has to be resolved before the configuration file is found.
- path.plugins: A directory whose sub-folders are Elasticsearch plugins. Sym-links are supported here, which can be used to selectively enable/disable a set of plugins for a certain Elasticsearch instance when multiple Elasticsearch instances are run from the same executable.
- path.work: A directory that was used to store working/temporary files for Elasticsearch. It’s no longer used.
- path.logs: Where the generated logs are stored. It might make sense to have this on a separate volume from the data directory in case one of the volumes runs out of disk space.
- path.data: Path to a folder containing the data stored by Elasticsearch.
In this article, we’ll have a closer look at the actual contents of the data directory (path.data) and try to gain an understanding of what all the files are used for.
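For reference, the paths above can be set in the elasticsearch.yml configuration file. This is just a minimal sketch; the directory names below are examples, not defaults:
# elasticsearch.yml -- example locations, adjust to your own layout
path.data: /var/data/elasticsearch
path.logs: /var/log/elasticsearch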
Where Do the Files Come from?
Since Elasticsearch uses Lucene under the hood to handle the indexing and querying on the shard level, the files in the data directory are written by both Elasticsearch and Lucene.
The responsibilities of each are quite clear: Lucene is responsible for writing and maintaining the Lucene index files, while Elasticsearch writes metadata related to features on top of Lucene, such as field mappings, index settings and other cluster metadata – end user and supporting features that do not exist in low-level Lucene but are provided by Elasticsearch.
Let’s look at the outer levels of data written by Elasticsearch before we dive deeper and eventually find the Lucene index files.
Node Data
Simply starting Elasticsearch from an empty data directory yields the following directory tree:
$ tree data
data
└── elasticsearch
└── nodes
└── 0
├── _state
│ └── global-0.st
└── node.lock
The node.lock file is there to ensure that only one Elasticsearch installation is reading from or writing to a single data directory at a time.
More interesting is the global-0.st file. The global- prefix indicates that this is a global state file, while the .st extension indicates that this is a state file containing metadata. As you might have guessed, this binary file contains global metadata about your cluster, and the number after the prefix indicates the cluster metadata version, a strictly increasing versioning scheme that follows your cluster.
While it is technically possible to edit these files with a hex editor in an emergency, it is strongly discouraged because it can quickly lead to data loss.
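If you just want to see what metadata is in there, you don't need a hex editor: the cluster state API exposes the same information in a readable form. For example:
# dump the cluster state; the cluster metadata is in the "metadata" section of the response
$ curl 'localhost:9200/_cluster/state?pretty'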
Index Data
Let’s create a single shard index and look at the files changed by Elasticsearch:
$ curl localhost:9200/foo -XPUT -H 'Content-Type: application/json' -d '{"settings":{"index.number_of_shards": 1}}'
{"acknowledged":true}
$ tree -h data
data
└── [ 102] elasticsearch
└── [ 102] nodes
└── [ 170] 0
├── [ 102] _state
│ └── [ 109] global-0.st
├── [ 102] indices
│ └── [ 136] foo
│ ├── [ 170] 0
│ │ ├── .....
│ └── [ 102] _state
│ └── [ 256] state-0.st
└── [ 0] node.lock
We see that a new directory has been created corresponding to the index name. This directory has two sub-folders: _state and 0. The former contains what’s called an index state file (indices/{index-name}/_state/state-{version}.st), which contains metadata about the index, such as its creation timestamp, a unique identifier, and the settings and mappings for the index. The latter contains data relevant for the first (and only) shard of the index (shard 0), which we’ll have a closer look at in the next section.
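Incidentally, the settings and mappings stored in the index state file are the same ones exposed through the API, which is usually a much safer way to inspect them than reading the binary state file:
$ curl 'localhost:9200/foo/_settings?pretty'
$ curl 'localhost:9200/foo/_mapping?pretty'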
Lucene Index Files
Lucene has done a good job of documenting the files in the Lucene index directory, reproduced here for your convenience (the Lucene documentation also goes into detail about the changes these files have gone through all the way back to Lucene 2.1, so check it out):
Name | Extension | Brief Description
---|---|---
Segments File | segments_N | Stores information about a commit point
Lock File | write.lock | The write lock prevents multiple IndexWriters from writing to the same file
Segment Info | .si | Stores metadata about a segment
Compound File | .cfs, .cfe | An optional “virtual” file consisting of all the other index files for systems that frequently run out of file handles
Fields | .fnm | Stores information about the fields
Field Index | .fdx | Contains pointers to field data
Field Data | .fdt | The stored fields for documents
Term Dictionary | .tim | The term dictionary, stores term info
Term Index | .tip | The index into the term dictionary
Frequencies | .doc | Contains the list of docs which contain each term along with frequency
Positions | .pos | Stores position information about where a term occurs in the index
Payloads | .pay | Stores additional per-position metadata information such as character offsets and user payloads
Norms | .nvd, .nvm | Encodes length and boost factors for docs and fields
Per-Document Values | .dvd, .dvm | Encodes additional scoring factors or other per-document information
Term Vector Index | .tvx | Stores offset into the document data file
Term Vector Documents | .tvd | Contains information about each document that has term vectors
Term Vector Fields | .tvf | The field-level info about term vectors
Live Documents | .liv | Info about what documents are live
Often, you’ll also see a segments.gen file in the Lucene index directory, which is a helper file that contains information about the current/latest segments_N file and is used for filesystems that might not return enough information via directory listings to determine the latest generation segments file.
In older Lucene versions you’ll also find files with the .del suffix. These serve the same purpose as the Live Documents (.liv) files – in other words, these are the deletion lists. If you’re wondering what all this talk about Live Documents and deletion lists are about, you might want to read up on it in the section about building indexes in our Elasticsearch from the bottom-up article.
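You can see which of these files a shard actually has either by listing the shard’s index directory (the path matches the layout shown earlier) or, more conveniently, by asking Elasticsearch via the indices segments API:
# the on-disk Lucene files for shard 0 of the index "foo" created above
$ ls data/elasticsearch/nodes/0/indices/foo/0/index/
# or ask Elasticsearch which segments each shard currently consists of
$ curl 'localhost:9200/foo/_segments?pretty'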
Fixing Problematic Shards
Since an Elasticsearch shard contains a Lucene index, we can use Lucene’s wonderful CheckIndex tool, which enables us to scan and fix problematic segments, usually with minimal data loss. We would generally recommend that Elasticsearch users simply re-index the data, but if for some reason that’s not possible and the data is very important, it’s a route you can take, even if it requires quite a bit of manual work and time, depending on the number of shards and their sizes.
The Lucene CheckIndex tool is included in the default Elasticsearch distribution and requires no additional downloads.
# change this to reflect your shard path, the format is
# {path.data}/{cluster_name}/nodes/{node_id}/indices/{index_name}/{shard_id}/index/
$ export SHARD_PATH=data/elasticsearch/nodes/0/indices/foo/0/index/
$ java -cp lib/elasticsearch-*.jar:lib/*:lib/sigar/* -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex $SHARD_PATH
If CheckIndex detects a problem and its suggestion to fix it looks sensible, you can tell CheckIndex to apply the fix(es) by adding the -fix command line parameter.
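For example, a sketch using the same shard path as above. Take a copy of the shard directory first, since -fix removes any segments it cannot repair, and make sure the node is not running while you do this:
# back up the shard directory before attempting a fix
$ cp -r $SHARD_PATH /path/to/backup/
$ java -cp lib/elasticsearch-*.jar:lib/*:lib/sigar/* -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex $SHARD_PATH -fix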
Storing Snapshots
You might wonder how all these files translate into the storage used by the snapshot repositories. Wonder no more: taking this cluster, snapshotting it as my-snapshot to a filesystem-based gateway and inspecting the files in the repository, we’ll find these files (some files omitted for brevity):
$ tree -h snapshots
snapshots
├── [ 31] index
├── [ 102] indices
│ └── [ 136] foo
│ ├── [1.2K] 0
│ │ ├── [ 350] __0
│ │ ├── [1.8K] __1
...
│ │ ├── [ 350] __w
│ │ ├── [ 380] __x
│ │ └── [8.2K] snapshot-my-snapshot
│ └── [ 249] snapshot-my-snapshot
├── [ 79] metadata-my-snapshot
└── [ 171] snapshot-my-snapshot
At the root we have an index file that contains information about all the snapshots in this repository and each snapshot has an associated snapshot- and a metadata- file. The snapshot- file at the root contains information about the state of the snapshot, which indexes it contains and so on. The metadata- file at the root contains the cluster metadata at the time of the snapshot.
When compress: true is set, metadata- and snapshot- files are compressed using LZF, which focuses on compression and decompression speed, making it a great fit for Elasticsearch. The data is stored with a header: ZV + 1 byte indicating whether the data is compressed. After the header there will be one or more compressed 64K blocks in the format: 2 byte block length + 2 byte uncompressed size + compressed data. Using this information you can use any LibLZF compatible decompressor. If you want to learn more about LZF, check out this great description of the format.
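As a quick sanity check, you can look at the first bytes of one of these files; a minimal sketch using the metadata file from the listing above:
# the file should start with the ASCII characters 'Z' and 'V'
$ hexdump -C snapshots/metadata-my-snapshot | head -n 2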
At the index level there is another file, indices/{index_name}/snapshot-{snapshot_name} that contains the index metadata, such as settings and mappings for the index at the time of the snapshot.
At the shard level you’ll find two kinds of files: renamed Lucene index files and the shard snapshot file: indices/{index_name}/{shard_id}/snapshot-{snapshot_name}. This file contains information about which of the files in the shard directory are used in the snapshot and a mapping from the logical file names in the snapshot to the concrete filenames they should be stored as on-disk when being restored. It also contains the checksum, Lucene versioning and size information for all relevant files that can be used to detect and prevent data corruption.
You might wonder why these files have been renamed instead of just keeping their original file names, which potentially would have been easier to work with directly on disk. The reason is simple: it’s possible to snapshot an index, delete and re-create it before snapshotting it again. In this case, several files would end up having the same names, but different contents.
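For reference, a repository and snapshot like the ones inspected above can be created with the snapshot API. This is just a sketch: the repository name my_repository is an example, and the location has to be a path the node is allowed to write to (whitelisted via path.repo in more recent versions):
$ curl -XPUT localhost:9200/_snapshot/my_repository -H 'Content-Type: application/json' -d '{"type": "fs", "settings": {"location": "snapshots", "compress": true}}'
$ curl -XPUT 'localhost:9200/_snapshot/my_repository/my-snapshot?wait_for_completion=true'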
Summary
In this article we have looked at the files written to the data directory by various levels of Elasticsearch: the node, index and shard level. We’ve seen where the Lucene indexes are stored on disk, and briefly described how to use the Lucene CheckIndex tool to verify and fix problematic shards.
Hopefully, you won’t ever need to perform any operations on the contents of the Elasticsearch data directory, but having some insight into what kind of data is written to your file system by your favorite search-based database is always a good idea.
Editor’s Note (May 1, 2017): Starting with 6.0, any curl command to Elasticsearch containing content will require a valid content type header. As a result, this post has been updated to reflect this change and to set readers of this post up for success with future versions.