Index some documents
Once you have a cluster up and running, you’re ready to index some data. There are a variety of ingest options for Elasticsearch, but in the end they all do the same thing: put JSON documents into an Elasticsearch index.
You can do this directly with a simple PUT request that specifies the index you want to add the document to, a unique document ID, and one or more "field": "value" pairs in the request body:

PUT /customer/_doc/1
{
  "name": "John Doe"
}
This request automatically creates the customer index if it doesn’t already exist, adds a new document that has an ID of 1, and stores and indexes the name field.
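If you script this step instead of typing it into the console, the sketch below builds the same request using only the Python standard library. It assumes a cluster reachable at `http://localhost:9200` with security disabled. If your cluster needs authentication, add it yourself. `ES_URL`, `index_request`, and `send` are hypothetical names and are not part of any Elasticsearch client.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a local cluster without security enabled.
ES_URL = "http://localhost:9200"


def index_request(index: str, doc_id: str, doc: dict) -> urllib.request.Request:
    """Build (but do not send) a PUT /<index>/_doc/<id> request."""
    # Percent-encode the ID so characters such as "/" cannot alter the path.
    url = f"{ES_URL}/{index}/_doc/{urllib.parse.quote(doc_id, safe='')}"
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def send(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Against a running cluster, `send(index_request("customer", "1", {"name": "John Doe"}))` should return a dictionary shaped like the indexing response shown below. Building and sending are kept separate so the request can be inspected without a cluster.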
Because this is a new document, the response shows that version 1 of the document was created:

{
  "_index" : "customer",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "_seq_no" : 26,
  "_primary_term" : 4
}
The new document is available immediately from any node in the cluster. You can retrieve it with a GET request that specifies its document ID:
GET /customer/_doc/1
The response indicates that a document with the specified ID was found and shows the original source fields that were indexed.
{
  "_index" : "customer",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 26,
  "_primary_term" : 4,
  "found" : true,
  "_source" : {
    "name": "John Doe"
  }
}
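In a script, the useful part of a GET response is usually `_source`, and only when `found` is true. The minimal sketch below makes the same assumptions as any standard-library approach: a local, unsecured cluster at `http://localhost:9200`. The names `get_request` and `extract_source` are hypothetical helpers for illustration.

```python
import urllib.parse
import urllib.request
from typing import Optional

# Assumption: a local cluster without security enabled.
ES_URL = "http://localhost:9200"


def get_request(index: str, doc_id: str) -> urllib.request.Request:
    """Build (but do not send) a GET /<index>/_doc/<id> request."""
    url = f"{ES_URL}/{index}/_doc/{urllib.parse.quote(doc_id, safe='')}"
    return urllib.request.Request(url, method="GET")


def extract_source(response: dict) -> Optional[dict]:
    """Return the stored _source if the document was found, otherwise None."""
    if response.get("found"):
        return response.get("_source")
    return None
```

If you send the request with `urllib.request.urlopen`, note that a missing document is returned with HTTP status 404. `urllib` raises that as `urllib.error.HTTPError`, whose body you can still read to see the `"found": false` JSON.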
Indexing documents in bulk
If you have a lot of documents to index, you can submit them in batches with the bulk API. Using bulk to batch document operations is significantly faster than submitting requests individually, because it minimizes network round trips.
The optimal batch size depends on a number of factors: the document size and complexity, the indexing and search load, and the resources available to your cluster. A good place to start is with batches of 1,000 to 5,000 documents and a total payload between 5MB and 15MB. From there, you can experiment to find the sweet spot.
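The bulk API takes a newline-delimited JSON (NDJSON) body. Each operation is an action line, such as `{"index": {...}}`, followed on the next line by the document source, and every line must end with a newline, including the last one. As a minimal sketch, assuming your documents are plain Python dictionaries, the hypothetical helper `to_bulk_ndjson` builds such a body for `index` operations. Its optional `id_field` parameter, also an assumption, names a field whose value becomes the document `_id`. When it is omitted, Elasticsearch generates IDs.

```python
import json
from typing import Iterable, Optional


def to_bulk_ndjson(docs: Iterable[dict], id_field: Optional[str] = None) -> str:
    """Serialize documents as a bulk API body made of `index` operations.

    Each document produces two lines: an action/metadata line and the
    document source. Every line, including the last, ends with a newline.
    """
    lines = []
    for doc in docs:
        meta = {}
        if id_field is not None:
            # Use the chosen field's value as the document ID.
            meta["_id"] = str(doc[id_field])
        lines.append(json.dumps({"index": meta}))
        lines.append(json.dumps(doc))
    return "".join(line + "\n" for line in lines)
```

For example, `to_bulk_ndjson([{"account_number": 0, "firstname": "Bradshaw"}], id_field="account_number")` produces two lines: `{"index": {"_id": "0"}}` and then the document itself.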
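To apply both limits from the paragraph above, a document count and a payload size, you could group documents as in the sketch below. The helper name `batch_documents` and its defaults are assumptions chosen from the suggested starting range, not recommended values. Size is estimated from each document's serialized JSON only, so the bulk action lines are not counted and the byte cap is approximate.

```python
import json
from typing import Iterable, Iterator, List


def batch_documents(
    docs: Iterable[dict],
    max_docs: int = 1000,
    max_bytes: int = 5 * 1024 * 1024,
) -> Iterator[List[dict]]:
    """Yield lists of documents capped by count and by approximate byte size.

    A single document larger than max_bytes is emitted in a batch of its own.
    """
    batch: List[dict] = []
    batch_bytes = 0
    for doc in docs:
        size = len(json.dumps(doc).encode("utf-8")) + 1  # +1 for the newline
        # Close the current batch if adding this document would exceed a limit.
        if batch and (len(batch) >= max_docs or batch_bytes + size > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(doc)
        batch_bytes += size
    if batch:
        yield batch
```

Each yielded batch can then be serialized into a bulk request body and sent as one request. Tune `max_docs` and `max_bytes` while watching indexing throughput.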
To get some data into Elasticsearch that you can start searching and analyzing:

- Download the accounts.json sample data set. The documents in this randomly generated data set represent user accounts with the following information:

  {
    "account_number": 0,
    "balance": 16623,
    "firstname": "Bradshaw",
    "lastname": "Mckenzie",
    "age": 29,
    "gender": "F",
    "address": "244 Columbus Place",
    "employer": "Euron",
    "email": "bradshawmckenzie@euron.com",
    "city": "Hobucken",
    "state": "CO"
  }

- Index the account data into the bank index with the following _bulk request:

  curl -H "Content-Type: application/json" -XPOST "localhost:9200/bank/_bulk?pretty&refresh" --data-binary "@accounts.json"

  Then list the indices to confirm that the documents were indexed:

  curl "localhost:9200/_cat/indices?v"
The response indicates that 1,000 documents were indexed successfully.
health status index uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   bank  l7sSYV2cQXmu6_4rJWVIww   5   1       1000            0    128.6kb        128.6kb
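For scripts, the `_cat` APIs also accept `format=json`, which avoids text parsing entirely. If you already have the human-readable `?v` output, the sketch below turns it into one dictionary per row. The helper name `parse_cat_output` is hypothetical. It assumes no cell contains spaces, which holds for the `_cat/indices` columns shown here but is not guaranteed for every `_cat` API.

```python
from typing import Dict, List


def parse_cat_output(text: str) -> List[Dict[str, str]]:
    """Parse whitespace-aligned `_cat` output that includes a header row (`?v`)."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if not rows:
        return []
    header, *data = rows
    # Pair each value with its column name; all values remain strings.
    return [dict(zip(header, values)) for values in data]
```

For the output above, `parse_cat_output(...)[0]["docs.count"]` would be the string `"1000"`. Convert it with `int()` if you want to compare counts.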