Index API

edit

Adds a JSON document to the specified data stream or index and makes it searchable. If the target is an index and the document already exists, the request updates the document and increments its version.

You cannot use the index API to send update requests for existing documents to a data stream. See Update documents in a data stream by query and Update or delete documents in a backing index.

Request

edit

PUT /<target>/_doc/<_id>

POST /<target>/_doc/

PUT /<target>/_create/<_id>

POST /<target>/_create/<_id>

You cannot add new documents to a data stream using the PUT /<target>/_doc/<_id> request format. To specify a document ID, use the PUT /<target>/_create/<_id> format instead. See Add documents to a data stream.

Prerequisites

edit
  • If the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or index alias:

    • To add or overwrite a document using the PUT /<target>/_doc/<_id> request format, you must have the create, index, or write index privilege.
    • To add a document using the POST /<target>/_doc/, PUT /<target>/_create/<_id>, or POST /<target>/_create/<_id> request formats, you must have the create_doc, create, index, or write index privilege.
    • To automatically create a data stream or index with an index API request, you must have the auto_configure, create_index, or manage index privilege.
  • Automatic data stream creation requires a matching index template with data stream enabled. See Set up a data stream.

Path parameters

edit
<target>

(Required, string) Name of the data stream or index to target.

If the target doesn’t exist and matches the name or wildcard (*) pattern of an index template with a data_stream definition, this request creates the data stream. See Set up a data stream.

If the target doesn’t exist and doesn’t match a data stream template, this request creates the index.

You can check for existing targets using the resolve index API.

<_id>

(Optional, string) Unique identifier for the document.

This parameter is required for the following request formats:

  • PUT /<target>/_doc/<_id>
  • PUT /<target>/_create/<_id>
  • POST /<target>/_create/<_id>

To automatically generate a document ID, use the POST /<target>/_doc/ request format and omit this parameter.

Query parameters

edit
if_seq_no
(Optional, integer) Only perform the operation if the document has this sequence number. See Optimistic concurrency control.
if_primary_term
(Optional, integer) Only perform the operation if the document has this primary term. See Optimistic concurrency control.
op_type

(Optional, enum) Set to create to only index the document if it does not already exist (put if absent). If a document with the specified _id already exists, the indexing operation will fail. Same as using the <index>/_create endpoint. Valid values: index, create. If document id is specified, it defaults to index. Otherwise, it defaults to create.

If the request targets a data stream, an op_type of create is required. See Add documents to a data stream.

pipeline
(Optional, string) ID of the pipeline to use to preprocess incoming documents. If the index has a default ingest pipeline specified, then setting the value to _none disables the default ingest pipeline for this request. If a final pipeline is configured it will always run, regardless of the value of this parameter.
refresh
(Optional, enum) If true, Elasticsearch refreshes the affected shards to make this operation visible to search, if wait_for then wait for a refresh to make this operation visible to search, if false do nothing with refreshes. Valid values: true, false, wait_for. Default: false.
routing
(Optional, string) Custom value used to route operations to a specific shard.
timeout

(Optional, time units) Period the request waits for the following operations:

Defaults to 1m (one minute). This guarantees Elasticsearch waits for at least the timeout before failing. The actual wait time could be longer, particularly when multiple waits occur.

version
(Optional, integer) Explicit version number for concurrency control. The specified version must match the current version of the document for the request to succeed.
version_type
(Optional, enum) Specific version type: external, external_gte.
wait_for_active_shards

(Optional, string) The number of shard copies that must be active before proceeding with the operation. Set to all or any positive integer up to the total number of shards in the index (number_of_replicas+1). Default: 1, the primary shard.

See Active shards.

require_alias
(Optional, Boolean) If true, the destination must be an index alias. Defaults to false.

Request body

edit
<field>
(Required, string) Request body contains the JSON source for the document data.

Response body

edit
_shards
Provides information about the replication process of the index operation.
_shards.total
Indicates how many shard copies (primary and replica shards) the index operation should be executed on.
_shards.successful

Indicates the number of shard copies the index operation succeeded on. When the index operation is successful, successful is at least 1.

Replica shards might not all be started when an indexing operation returns successfully—​by default, only the primary is required. Set wait_for_active_shards to change this default behavior. See Active shards.

_shards.failed
An array that contains replication-related errors in the case an index operation failed on a replica shard. 0 indicates there were no failures.
_index
The name of the index the document was added to.
_type
The document type. Elasticsearch indices now support a single document type, _doc.
_id
The unique identifier for the added document.
_version
The document version. Incremented each time the document is updated.
_seq_no
The sequence number assigned to the document for the indexing operation. Sequence numbers are used to ensure an older version of a document doesn’t overwrite a newer version. See Optimistic concurrency control.
_primary_term
The primary term assigned to the document for the indexing operation. See Optimistic concurrency control.
result
The result of the indexing operation, created or updated.

Description

edit

You can index a new JSON document with the _doc or _create resource. Using _create guarantees that the document is only indexed if it does not already exist. To update an existing document, you must use the _doc resource.

Automatically create data streams and indices

edit

If request’s target doesn’t exist and matches an index template with a data_stream definition, the index operation automatically creates the data stream. See Set up a data stream.

If the target doesn’t exist and doesn’t match a data stream template, the operation automatically creates the index and applies any matching index templates.

Elasticsearch includes several built-in index templates. To avoid naming collisions with these templates, see Avoid index pattern collisions.

If no mapping exists, the index operation creates a dynamic mapping. By default, new fields and objects are automatically added to the mapping if needed. For more information about field mapping, see mapping and the update mapping API.

Automatic index creation is controlled by the action.auto_create_index setting. This setting defaults to true, which allows any index to be created automatically. You can modify this setting to explicitly allow or block automatic creation of indices that match specified patterns, or set it to false to disable automatic index creation entirely. Specify a comma-separated list of patterns you want to allow, or prefix each pattern with + or - to indicate whether it should be allowed or blocked. When a list is specified, the default behaviour is to disallow.

The action.auto_create_index setting only affects the automatic creation of indices. It does not affect the creation of data streams.

response = client.cluster.put_settings(
  body: {
    persistent: {
      "action.auto_create_index": 'my-index-000001,index10,-index1*,+ind*'
    }
  }
)
puts response

response = client.cluster.put_settings(
  body: {
    persistent: {
      "action.auto_create_index": 'false'
    }
  }
)
puts response

response = client.cluster.put_settings(
  body: {
    persistent: {
      "action.auto_create_index": 'true'
    }
  }
)
puts response
PUT _cluster/settings
{
  "persistent": {
    "action.auto_create_index": "my-index-000001,index10,-index1*,+ind*" 
  }
}

PUT _cluster/settings
{
  "persistent": {
    "action.auto_create_index": "false" 
  }
}

PUT _cluster/settings
{
  "persistent": {
    "action.auto_create_index": "true" 
  }
}

Allow auto-creation of indices called my-index-000001 or index10, block the creation of indices that match the pattern index1*, and allow creation of any other indices that match the ind* pattern. Patterns are matched in the order specified.

Disable automatic index creation entirely.

Allow automatic creation of any index. This is the default.

Put if absent
edit

You can force a create operation by using the _create resource or setting the op_type parameter to create. In this case, the index operation fails if a document with the specified ID already exists in the index.

Create document IDs automatically
edit

When using the POST /<target>/_doc/ request format, the op_type is automatically set to create and the index operation generates a unique ID for the document.

POST my-index-000001/_doc/
{
  "@timestamp": "2099-11-15T13:12:00",
  "message": "GET /search HTTP/1.1 200 1070000",
  "user": {
    "id": "kimchy"
  }
}

The API returns the following result:

{
  "_shards": {
    "total": 2,
    "failed": 0,
    "successful": 2
  },
  "_index": "my-index-000001",
  "_id": "W0tpsmIBdwcYyG50zbta",
  "_version": 1,
  "_seq_no": 0,
  "_primary_term": 1,
  "result": "created"
}
Optimistic concurrency control
edit

Index operations can be made conditional and only be performed if the last modification to the document was assigned the sequence number and primary term specified by the if_seq_no and if_primary_term parameters. If a mismatch is detected, the operation will result in a VersionConflictException and a status code of 409. See Optimistic concurrency control for more details.

Routing
edit

By default, shard placement — or routing — is controlled by using a hash of the document’s id value. For more explicit control, the value fed into the hash function used by the router can be directly specified on a per-operation basis using the routing parameter. For example:

POST my-index-000001/_doc?routing=kimchy
{
  "@timestamp": "2099-11-15T13:12:00",
  "message": "GET /search HTTP/1.1 200 1070000",
  "user": {
    "id": "kimchy"
  }
}

In this example, the document is routed to a shard based on the routing parameter provided: "kimchy".

When setting up explicit mapping, you can also use the _routing field to direct the index operation to extract the routing value from the document itself. This does come at the (very minimal) cost of an additional document parsing pass. If the _routing mapping is defined and set to be required, the index operation will fail if no routing value is provided or extracted.

Data streams do not support custom routing unless they were created with the allow_custom_routing setting enabled in the template.

Distributed
edit

The index operation is directed to the primary shard based on its route (see the Routing section above) and performed on the actual node containing this shard. After the primary shard completes the operation, if needed, the update is distributed to applicable replicas.

Active shards
edit

To improve the resiliency of writes to the system, indexing operations can be configured to wait for a certain number of active shard copies before proceeding with the operation. If the requisite number of active shard copies are not available, then the write operation must wait and retry, until either the requisite shard copies have started or a timeout occurs. By default, write operations only wait for the primary shards to be active before proceeding (i.e. wait_for_active_shards=1). This default can be overridden in the index settings dynamically by setting index.write.wait_for_active_shards. To alter this behavior per operation, the wait_for_active_shards request parameter can be used.

Valid values are all or any positive integer up to the total number of configured copies per shard in the index (which is number_of_replicas+1). Specifying a negative value or a number greater than the number of shard copies will throw an error.

For example, suppose we have a cluster of three nodes, A, B, and C and we create an index index with the number of replicas set to 3 (resulting in 4 shard copies, one more copy than there are nodes). If we attempt an indexing operation, by default the operation will only ensure the primary copy of each shard is available before proceeding. This means that even if B and C went down, and A hosted the primary shard copies, the indexing operation would still proceed with only one copy of the data. If wait_for_active_shards is set on the request to 3 (and all 3 nodes are up), then the indexing operation will require 3 active shard copies before proceeding, a requirement which should be met because there are 3 active nodes in the cluster, each one holding a copy of the shard. However, if we set wait_for_active_shards to all (or to 4, which is the same), the indexing operation will not proceed as we do not have all 4 copies of each shard active in the index. The operation will timeout unless a new node is brought up in the cluster to host the fourth copy of the shard.

It is important to note that this setting greatly reduces the chances of the write operation not writing to the requisite number of shard copies, but it does not completely eliminate the possibility, because this check occurs before the write operation commences. Once the write operation is underway, it is still possible for replication to fail on any number of shard copies but still succeed on the primary. The _shards section of the write operation’s response reveals the number of shard copies on which replication succeeded/failed.

{
  "_shards": {
    "total": 2,
    "failed": 0,
    "successful": 2
  }
}
Refresh
edit

Control when the changes made by this request are visible to search. See refresh.

Noop updates
edit

When updating a document using the index API a new version of the document is always created even if the document hasn’t changed. If this isn’t acceptable use the _update API with detect_noop set to true. This option isn’t available on the index API because the index API doesn’t fetch the old source and isn’t able to compare it against the new source.

There isn’t a hard and fast rule about when noop updates aren’t acceptable. It’s a combination of lots of factors like how frequently your data source sends updates that are actually noops and how many queries per second Elasticsearch runs on the shard receiving the updates.

Timeout
edit

The primary shard assigned to perform the index operation might not be available when the index operation is executed. Some reasons for this might be that the primary shard is currently recovering from a gateway or undergoing relocation. By default, the index operation will wait on the primary shard to become available for up to 1 minute before failing and responding with an error. The timeout parameter can be used to explicitly specify how long it waits. Here is an example of setting it to 5 minutes:

PUT my-index-000001/_doc/1?timeout=5m
{
  "@timestamp": "2099-11-15T13:12:00",
  "message": "GET /search HTTP/1.1 200 1070000",
  "user": {
    "id": "kimchy"
  }
}
Versioning
edit

Each indexed document is given a version number. By default, internal versioning is used that starts at 1 and increments with each update, deletes included. Optionally, the version number can be set to an external value (for example, if maintained in a database). To enable this functionality, version_type should be set to external. The value provided must be a numeric, long value greater than or equal to 0, and less than around 9.2e+18.

When using the external version type, the system checks to see if the version number passed to the index request is greater than the version of the currently stored document. If true, the document will be indexed and the new version number used. If the value provided is less than or equal to the stored document’s version number, a version conflict will occur and the index operation will fail. For example:

response = client.index(
  index: 'my-index-000001',
  id: 1,
  version: 2,
  version_type: 'external',
  body: {
    user: {
      id: 'elkbee'
    }
  }
)
puts response
PUT my-index-000001/_doc/1?version=2&version_type=external
{
  "user": {
    "id": "elkbee"
  }
}

Versioning is completely real time, and is not affected by the near real time aspects of search operations. If no version is provided, then the operation is executed without any version checks.

In the previous example, the operation will succeed since the supplied version of 2 is higher than the current document version of 1. If the document was already updated and its version was set to 2 or higher, the indexing command will fail and result in a conflict (409 http status code).

A nice side effect is that there is no need to maintain strict ordering of async indexing operations executed as a result of changes to a source database, as long as version numbers from the source database are used. Even the simple case of updating the Elasticsearch index using data from a database is simplified if external versioning is used, as only the latest version will be used if the index operations arrive out of order for whatever reason.

Version types
edit

In addition to the external version type, Elasticsearch also supports other types for specific use cases:

external or external_gt
Only index the document if the given version is strictly higher than the version of the stored document or if there is no existing document. The given version will be used as the new version and will be stored with the new document. The supplied version must be a non-negative long number.
external_gte
Only index the document if the given version is equal or higher than the version of the stored document. If there is no existing document the operation will succeed as well. The given version will be used as the new version and will be stored with the new document. The supplied version must be a non-negative long number.

The external_gte version type is meant for special use cases and should be used with care. If used incorrectly, it can result in loss of data. There is another option, force, which is deprecated because it can cause primary and replica shards to diverge.

Examples

edit

Insert a JSON document into the my-index-000001 index with an _id of 1:

response = client.index(
  index: 'my-index-000001',
  id: 1,
  body: {
    "@timestamp": '2099-11-15T13:12:00',
    message: 'GET /search HTTP/1.1 200 1070000',
    user: {
      id: 'kimchy'
    }
  }
)
puts response
PUT my-index-000001/_doc/1
{
  "@timestamp": "2099-11-15T13:12:00",
  "message": "GET /search HTTP/1.1 200 1070000",
  "user": {
    "id": "kimchy"
  }
}

The API returns the following result:

{
  "_shards": {
    "total": 2,
    "failed": 0,
    "successful": 2
  },
  "_index": "my-index-000001",
  "_id": "1",
  "_version": 1,
  "_seq_no": 0,
  "_primary_term": 1,
  "result": "created"
}

Use the _create resource to index a document into the my-index-000001 index if no document with that ID exists:

PUT my-index-000001/_create/1
{
  "@timestamp": "2099-11-15T13:12:00",
  "message": "GET /search HTTP/1.1 200 1070000",
  "user": {
    "id": "kimchy"
  }
}

Set the op_type parameter to create to index a document into the my-index-000001 index if no document with that ID exists:

PUT my-index-000001/_doc/1?op_type=create
{
  "@timestamp": "2099-11-15T13:12:00",
  "message": "GET /search HTTP/1.1 200 1070000",
  "user": {
    "id": "kimchy"
  }
}