- Elasticsearch Guide: other versions:
- Getting Started
- Set up Elasticsearch
- Installing Elasticsearch
- Configuring Elasticsearch
- Important Elasticsearch configuration
- Important System Configuration
- Bootstrap Checks
- Heap size check
- File descriptor check
- Memory lock check
- Maximum number of threads check
- Max file size check
- Maximum size virtual memory check
- Maximum map count check
- Client JVM check
- Use serial collector check
- System call filter check
- OnError and OnOutOfMemoryError checks
- Early-access check
- G1GC check
- All permission check
- Starting Elasticsearch
- Stopping Elasticsearch
- Adding nodes to your cluster
- Installing X-Pack
- Set up X-Pack
- Configuring X-Pack Java Clients
- X-Pack Settings
- Bootstrap Checks for X-Pack
- Upgrade Elasticsearch
- API Conventions
- Document APIs
- Search APIs
- Aggregations
- Metrics Aggregations
- Avg Aggregation
- Weighted Avg Aggregation
- Cardinality Aggregation
- Extended Stats Aggregation
- Geo Bounds Aggregation
- Geo Centroid Aggregation
- Max Aggregation
- Min Aggregation
- Percentiles Aggregation
- Percentile Ranks Aggregation
- Scripted Metric Aggregation
- Stats Aggregation
- Sum Aggregation
- Top Hits Aggregation
- Value Count Aggregation
- Median Absolute Deviation Aggregation
- Bucket Aggregations
- Adjacency Matrix Aggregation
- Auto-interval Date Histogram Aggregation
- Intervals
- Children Aggregation
- Composite Aggregation
- Date Histogram Aggregation
- Date Range Aggregation
- Diversified Sampler Aggregation
- Filter Aggregation
- Filters Aggregation
- Geo Distance Aggregation
- GeoHash grid Aggregation
- Global Aggregation
- Histogram Aggregation
- IP Range Aggregation
- Missing Aggregation
- Nested Aggregation
- Parent Aggregation
- Range Aggregation
- Reverse nested Aggregation
- Sampler Aggregation
- Significant Terms Aggregation
- Significant Text Aggregation
- Terms Aggregation
- Pipeline Aggregations
- Avg Bucket Aggregation
- Derivative Aggregation
- Max Bucket Aggregation
- Min Bucket Aggregation
- Sum Bucket Aggregation
- Stats Bucket Aggregation
- Extended Stats Bucket Aggregation
- Percentiles Bucket Aggregation
- Moving Average Aggregation
- Moving Function Aggregation
- Cumulative Sum Aggregation
- Bucket Script Aggregation
- Bucket Selector Aggregation
- Bucket Sort Aggregation
- Serial Differencing Aggregation
- Matrix Aggregations
- Caching heavy aggregations
- Returning only aggregation results
- Aggregation Metadata
- Returning the type of the aggregation
- Metrics Aggregations
- Indices APIs
- Create Index
- Delete Index
- Get Index
- Indices Exists
- Open / Close Index API
- Shrink Index
- Split Index
- Rollover Index
- Put Mapping
- Get Mapping
- Get Field Mapping
- Types Exists
- Index Aliases
- Update Indices Settings
- Get Settings
- Analyze
- Index Templates
- Indices Stats
- Indices Segments
- Indices Recovery
- Indices Shard Stores
- Clear Cache
- Flush
- Refresh
- Force Merge
- cat APIs
- Cluster APIs
- Query DSL
- Mapping
- Analysis
- Anatomy of an analyzer
- Testing analyzers
- Analyzers
- Normalizers
- Tokenizers
- Standard Tokenizer
- Letter Tokenizer
- Lowercase Tokenizer
- Whitespace Tokenizer
- UAX URL Email Tokenizer
- Classic Tokenizer
- Thai Tokenizer
- NGram Tokenizer
- Edge NGram Tokenizer
- Keyword Tokenizer
- Pattern Tokenizer
- Char Group Tokenizer
- Simple Pattern Tokenizer
- Simple Pattern Split Tokenizer
- Path Hierarchy Tokenizer
- Path Hierarchy Tokenizer Examples
- Token Filters
- Standard Token Filter
- ASCII Folding Token Filter
- Flatten Graph Token Filter
- Length Token Filter
- Lowercase Token Filter
- Uppercase Token Filter
- NGram Token Filter
- Edge NGram Token Filter
- Porter Stem Token Filter
- Shingle Token Filter
- Stop Token Filter
- Word Delimiter Token Filter
- Word Delimiter Graph Token Filter
- Multiplexer Token Filter
- Conditional Token Filter
- Predicate Token Filter Script
- Stemmer Token Filter
- Stemmer Override Token Filter
- Keyword Marker Token Filter
- Keyword Repeat Token Filter
- KStem Token Filter
- Snowball Token Filter
- Phonetic Token Filter
- Synonym Token Filter
- Parsing synonym files
- Synonym Graph Token Filter
- Parsing synonym files
- Compound Word Token Filters
- Reverse Token Filter
- Elision Token Filter
- Truncate Token Filter
- Unique Token Filter
- Pattern Capture Token Filter
- Pattern Replace Token Filter
- Trim Token Filter
- Limit Token Count Token Filter
- Hunspell Token Filter
- Common Grams Token Filter
- Normalization Token Filter
- CJK Width Token Filter
- CJK Bigram Token Filter
- Delimited Payload Token Filter
- Keep Words Token Filter
- Keep Types Token Filter
- Exclude mode settings example
- Classic Token Filter
- Apostrophe Token Filter
- Decimal Digit Token Filter
- Fingerprint Token Filter
- Minhash Token Filter
- Remove Duplicates Token Filter
- Character Filters
- Modules
- Index Modules
- Ingest Node
- Pipeline Definition
- Ingest APIs
- Accessing Data in Pipelines
- Conditional Execution in Pipelines
- Handling Failures in Pipelines
- Processors
- Append Processor
- Bytes Processor
- Convert Processor
- Date Processor
- Date Index Name Processor
- Dissect Processor
- Drop Processor
- Dot Expander Processor
- Fail Processor
- Foreach Processor
- Grok Processor
- Gsub Processor
- Join Processor
- JSON Processor
- KV Processor
- Lowercase Processor
- Pipeline Processor
- Remove Processor
- Rename Processor
- Script Processor
- Set Processor
- Set Security User Processor
- Split Processor
- Sort Processor
- Trim Processor
- Uppercase Processor
- URL Decode Processor
- Managing the index lifecycle
- SQL Access
- Monitor a cluster
- Rolling up historical data
- Frozen indices
- Set up a cluster for high availability
- Secure a cluster
- Overview
- Configuring security
- Encrypting communications in Elasticsearch
- Encrypting communications in an Elasticsearch Docker Container
- Enabling cipher suites for stronger encryption
- Separating node-to-node and client traffic
- Configuring an Active Directory realm
- Configuring a file realm
- Configuring an LDAP realm
- Configuring a native realm
- Configuring a PKI realm
- Configuring a SAML realm
- Configuring a Kerberos realm
- FIPS 140-2
- Security settings
- Security files
- Auditing Settings
- How security works
- User authentication
- Built-in users
- Internal users
- Realms
- Realm chains
- Active Directory user authentication
- File-based user authentication
- LDAP user authentication
- Native user authentication
- PKI user authentication
- SAML authentication
- Kerberos authentication
- Integrating with other authentication systems
- Enabling anonymous access
- Controlling the user cache
- Configuring SAML single-sign-on on the Elastic Stack
- User authorization
- Auditing security events
- Encrypting communications
- Restricting connections with IP filtering
- Cross cluster search, tribe, clients, and integrations
- Tutorial: Getting started with security
- Tutorial: Encrypting communications
- Troubleshooting
- Can’t log in after upgrading to 6.5.0
- Some settings are not returned via the nodes settings API
- Authorization exceptions
- Users command fails due to extra arguments
- Users are frequently locked out of Active Directory
- Certificate verification fails for curl on Mac
- SSLHandshakeException causes connections to fail
- Common SSL/TLS exceptions
- Common Kerberos exceptions
- Common SAML issues
- Internal Server Error in Kibana
- Setup-passwords command fails due to connection failure
- Failures due to relocation of the configuration files
- Limitations
- Alerting on Cluster and Index Events
- Command line tools
- How To
- Testing
- Glossary of terms
- X-Pack APIs
- Info API
- Cross-cluster replication APIs
- Explore API
- Freeze index
- Index lifecycle management API
- Licensing APIs
- Migration APIs
- Machine learning APIs
- Add events to calendar
- Add jobs to calendar
- Close jobs
- Create calendar
- Create datafeeds
- Create filter
- Create jobs
- Delete calendar
- Delete datafeeds
- Delete events from calendar
- Delete filter
- Delete forecast
- Delete jobs
- Delete jobs from calendar
- Delete model snapshots
- Delete expired data
- Find file structure
- Flush jobs
- Forecast jobs
- Get calendars
- Get buckets
- Get overall buckets
- Get categories
- Get datafeeds
- Get datafeed statistics
- Get influencers
- Get jobs
- Get job statistics
- Get machine learning info
- Get model snapshots
- Get scheduled events
- Get filters
- Get records
- Open jobs
- Post data to jobs
- Preview datafeeds
- Revert model snapshots
- Start datafeeds
- Stop datafeeds
- Update datafeeds
- Update filter
- Update jobs
- Update model snapshots
- Rollup APIs
- Security APIs
- Authenticate
- Change passwords
- Clear cache
- Clear roles cache
- Create or update application privileges
- Create or update role mappings
- Create or update roles
- Create or update users
- Delete application privileges
- Delete role mappings
- Delete roles
- Delete users
- Disable users
- Enable users
- Get application privileges
- Get role mappings
- Get roles
- Get token
- Get users
- Has privileges
- Invalidate token
- SSL certificate
- Unfreeze index
- Watcher APIs
- Definitions
- Release Highlights
- Breaking changes
- Release Notes
- Elasticsearch version 6.6.2
- Elasticsearch version 6.6.1
- Elasticsearch version 6.6.0
- Elasticsearch version 6.5.4
- Elasticsearch version 6.5.3
- Elasticsearch version 6.5.2
- Elasticsearch version 6.5.1
- Elasticsearch version 6.5.0
- Elasticsearch version 6.4.3
- Elasticsearch version 6.4.2
- Elasticsearch version 6.4.1
- Elasticsearch version 6.4.0
- Elasticsearch version 6.3.2
- Elasticsearch version 6.3.1
- Elasticsearch version 6.3.0
- Elasticsearch version 6.2.4
- Elasticsearch version 6.2.3
- Elasticsearch version 6.2.2
- Elasticsearch version 6.2.1
- Elasticsearch version 6.2.0
- Elasticsearch version 6.1.4
- Elasticsearch version 6.1.3
- Elasticsearch version 6.1.2
- Elasticsearch version 6.1.1
- Elasticsearch version 6.1.0
- Elasticsearch version 6.0.1
- Elasticsearch version 6.0.0
- Elasticsearch version 6.0.0-rc2
- Elasticsearch version 6.0.0-rc1
- Elasticsearch version 6.0.0-beta2
- Elasticsearch version 6.0.0-beta1
- Elasticsearch version 6.0.0-alpha2
- Elasticsearch version 6.0.0-alpha1
- Elasticsearch version 6.0.0-alpha1 (Changes previously released in 5.x)
Index API
editIndex API
editThe index API adds or updates a typed JSON document in a specific index,
making it searchable. The following example inserts the JSON document
into the "twitter" index, under a type called _doc
with an id of 1:
PUT twitter/_doc/1 { "user" : "kimchy", "post_date" : "2009-11-15T14:12:12", "message" : "trying out Elasticsearch" }
The result of the above index operation is:
{ "_shards" : { "total" : 2, "failed" : 0, "successful" : 2 }, "_index" : "twitter", "_type" : "_doc", "_id" : "1", "_version" : 1, "_seq_no" : 0, "_primary_term" : 1, "result" : "created" }
The _shards
header provides information about the replication process of the index operation:
-
total
- Indicates how many shard copies (primary and replica shards) the index operation should be executed on.
-
successful
- Indicates the number of shard copies the index operation succeeded on.
-
failed
- An array that contains replication-related errors in the case an index operation failed on a replica shard.
The index operation is successful in the case successful
is at least 1.
Replica shards may not all be started when an indexing operation successfully returns (by default, only the
primary is required, but this behavior can be changed). In that case,
total
will be equal to the total shards based on the number_of_replicas
setting and successful
will be
equal to the number of shards started (primary plus replicas). If there were no failures, the failed
will be 0.
Automatic Index Creation
editThe index operation automatically creates an index if it does not already exist, and applies any index templates that are configured. The index operation also creates a dynamic type mapping for the specified type if one does not already exist. By default, new fields and objects will automatically be added to the mapping definition for the specified type if needed. Check out the mapping section for more information on mapping definitions, and the put mapping API for information about updating type mappings manually.
Automatic index creation is controlled by the action.auto_create_index
setting. This setting defaults to true
, meaning that indices are always
automatically created. Automatic index creation can be permitted only for
indices matching certain patterns by changing the value of this setting to a
comma-separated list of these patterns. It can also be explicitly permitted and
forbidden by prefixing patterns in the list with a +
or -
. Finally it can
be completely disabled by changing this setting to false
.
PUT _cluster/settings { "persistent": { "action.auto_create_index": "twitter,index10,-index1*,+ind*" } } PUT _cluster/settings { "persistent": { "action.auto_create_index": "false" } } PUT _cluster/settings { "persistent": { "action.auto_create_index": "true" } }
Permit only the auto-creation of indices called |
|
Completely disable the auto-creation of indices. |
|
Permit the auto-creation of indices with any name. This is the default. |
Operation Type
editThe index operation also accepts an op_type
that can be used to force
a create
operation, allowing for "put-if-absent" behavior. When
create
is used, the index operation will fail if a document by that id
already exists in the index.
Here is an example of using the op_type
parameter:
PUT twitter/_doc/1?op_type=create { "user" : "kimchy", "post_date" : "2009-11-15T14:12:12", "message" : "trying out Elasticsearch" }
Another option to specify create
is to use the following uri:
PUT twitter/_doc/1/_create { "user" : "kimchy", "post_date" : "2009-11-15T14:12:12", "message" : "trying out Elasticsearch" }
Automatic ID Generation
editThe index operation can be executed without specifying the id. In such a
case, an id will be generated automatically. In addition, the op_type
will automatically be set to create
. Here is an example (note the
POST used instead of PUT):
POST twitter/_doc/ { "user" : "kimchy", "post_date" : "2009-11-15T14:12:12", "message" : "trying out Elasticsearch" }
The result of the above index operation is:
{ "_shards" : { "total" : 2, "failed" : 0, "successful" : 2 }, "_index" : "twitter", "_type" : "_doc", "_id" : "W0tpsmIBdwcYyG50zbta", "_version" : 1, "_seq_no" : 0, "_primary_term" : 1, "result": "created" }
Optimistic concurrency control
editIndex operations can be made optional and only be performed if the last
modification to the document was assigned the sequence number and primary
term specified by the if_seq_no
and if_primary_term
parameters. If a
mismatch is detected, the operation will result in a VersionConflictException
and a status code of 409. See Optimistic concurrency control for more details.
Routing
editBy default, shard placement — or routing
— is controlled by using a
hash of the document’s id value. For more explicit control, the value
fed into the hash function used by the router can be directly specified
on a per-operation basis using the routing
parameter. For example:
POST twitter/_doc?routing=kimchy { "user" : "kimchy", "post_date" : "2009-11-15T14:12:12", "message" : "trying out Elasticsearch" }
In the example above, the "_doc" document is routed to a shard based on
the routing
parameter provided: "kimchy".
When setting up explicit mapping, the _routing
field can be optionally
used to direct the index operation to extract the routing value from the
document itself. This does come at the (very minimal) cost of an
additional document parsing pass. If the _routing
mapping is defined
and set to be required
, the index operation will fail if no routing
value is provided or extracted.
Distributed
editThe index operation is directed to the primary shard based on its route (see the Routing section above) and performed on the actual node containing this shard. After the primary shard completes the operation, if needed, the update is distributed to applicable replicas.
Wait For Active Shards
editTo improve the resiliency of writes to the system, indexing operations
can be configured to wait for a certain number of active shard copies
before proceeding with the operation. If the requisite number of active
shard copies are not available, then the write operation must wait and
retry, until either the requisite shard copies have started or a timeout
occurs. By default, write operations only wait for the primary shards
to be active before proceeding (i.e. wait_for_active_shards=1
).
This default can be overridden in the index settings dynamically
by setting index.write.wait_for_active_shards
. To alter this behavior
per operation, the wait_for_active_shards
request parameter can be used.
Valid values are all
or any positive integer up to the total number
of configured copies per shard in the index (which is number_of_replicas+1
).
Specifying a negative value or a number greater than the number of
shard copies will throw an error.
For example, suppose we have a cluster of three nodes, A
, B
, and C
and
we create an index index
with the number of replicas set to 3 (resulting in
4 shard copies, one more copy than there are nodes). If we
attempt an indexing operation, by default the operation will only ensure
the primary copy of each shard is available before proceeding. This means
that even if B
and C
went down, and A
hosted the primary shard copies,
the indexing operation would still proceed with only one copy of the data.
If wait_for_active_shards
is set on the request to 3
(and all 3 nodes
are up), then the indexing operation will require 3 active shard copies
before proceeding, a requirement which should be met because there are 3
active nodes in the cluster, each one holding a copy of the shard. However,
if we set wait_for_active_shards
to all
(or to 4
, which is the same),
the indexing operation will not proceed as we do not have all 4 copies of
each shard active in the index. The operation will timeout
unless a new node is brought up in the cluster to host the fourth copy of
the shard.
It is important to note that this setting greatly reduces the chances of
the write operation not writing to the requisite number of shard copies,
but it does not completely eliminate the possibility, because this check
occurs before the write operation commences. Once the write operation
is underway, it is still possible for replication to fail on any number of
shard copies but still succeed on the primary. The _shards
section of the
write operation’s response reveals the number of shard copies on which
replication succeeded/failed.
{ "_shards" : { "total" : 2, "failed" : 0, "successful" : 2 } }
Refresh
editControl when the changes made by this request are visible to search. See refresh.
Noop Updates
editWhen updating a document using the index API a new version of the document is
always created even if the document hasn’t changed. If this isn’t acceptable
use the _update
API with detect_noop
set to true. This option isn’t
available on the index API because the index API doesn’t fetch the old source
and isn’t able to compare it against the new source.
There isn’t a hard and fast rule about when noop updates aren’t acceptable. It’s a combination of lots of factors like how frequently your data source sends updates that are actually noops and how many queries per second Elasticsearch runs on the shard receiving the updates.
Timeout
editThe primary shard assigned to perform the index operation might not be
available when the index operation is executed. Some reasons for this
might be that the primary shard is currently recovering from a gateway
or undergoing relocation. By default, the index operation will wait on
the primary shard to become available for up to 1 minute before failing
and responding with an error. The timeout
parameter can be used to
explicitly specify how long it waits. Here is an example of setting it
to 5 minutes:
PUT twitter/_doc/1?timeout=5m { "user" : "kimchy", "post_date" : "2009-11-15T14:12:12", "message" : "trying out Elasticsearch" }
Versioning
editEach indexed document is given a version number. By default,
internal versioning is used that starts at 1 and increments
with each update, deletes included. Optionally, the version number can be
set to an external value (for example, if maintained in a
database). To enable this functionality, version_type
should be set to
external
. The value provided must be a numeric, long value greater than or equal to 0,
and less than around 9.2e+18.
When using the external version type, the system checks to see if the version number passed to the index request is greater than the version of the currently stored document. If true, the document will be indexed and the new version number used. If the value provided is less than or equal to the stored document’s version number, a version conflict will occur and the index operation will fail. For example:
PUT twitter/_doc/1?version=2&version_type=external { "message" : "elasticsearch now has versioning support, double cool!" }
NOTE: Versioning is completely real time, and is not affected by the near real time aspects of search operations. If no version is provided, then the operation is executed without any version checks.
The above will succeed since the the supplied version of 2 is higher than the current document version of 1. If the document was already updated and its version was set to 2 or higher, the indexing command will fail and result in a conflict (409 http status code).
External versioning supports the value 0 as a valid version number. This allows the version to be in sync with an external versioning system where version numbers start from zero instead of one. It has the side effect that documents with version number equal to zero can neither be updated using the Update By Query API nor be deleted using the Delete By Query API as long as their version number is equal to zero.
A nice side effect is that there is no need to maintain strict ordering of async indexing operations executed as a result of changes to a source database, as long as version numbers from the source database are used. Even the simple case of updating the Elasticsearch index using data from a database is simplified if external versioning is used, as only the latest version will be used if the index operations arrive out of order for whatever reason.
Version types
editNext to the external
version type explained above, Elasticsearch
also supports other types for specific use cases. Here is an overview of
the different version types and their semantics.
-
internal
- Only index the document if the given version is identical to the version of the stored document.
-
external
orexternal_gt
- Only index the document if the given version is strictly higher than the version of the stored document or if there is no existing document. The given version will be used as the new version and will be stored with the new document. The supplied version must be a non-negative long number.
-
external_gte
- Only index the document if the given version is equal or higher than the version of the stored document. If there is no existing document the operation will succeed as well. The given version will be used as the new version and will be stored with the new document. The supplied version must be a non-negative long number.
NOTE: The external_gte
version type is meant for special use cases and
should be used with care. If used incorrectly, it can result in loss of data.
There is another option, force
, which is deprecated because it can cause
primary and replica shards to diverge.
On this page