WARNING: Version 2.4 has passed its EOL date.
This documentation is no longer being maintained and may be removed. If you are running this version, we strongly advise you to upgrade. For the latest information, see the current release documentation.
Hadoop Metrics
The Hadoop system records a set of metric counters for each job that it runs. elasticsearch-hadoop builds on this and, through the Hadoop Counters infrastructure, exposes metrics about its own activity for each job run. While each task instance is running, elasticsearch-hadoop reports its statistics; the Map/Reduce infrastructure aggregates them, and the results are available through the standard Hadoop APIs.
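Conceptually, the job-level value of each counter is the sum of the values reported by the individual tasks. The sketch below models that aggregation with plain Java maps so it runs without Hadoop on the classpath. The nested `EsCounter` enum is a hypothetical stand-in for a few of the counter names listed in this section (it is not the real elasticsearch-hadoop class), and the per-task numbers are made up for illustration.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

public class CounterAggregation {

    // Hypothetical subset of counter names; NOT org.elasticsearch.hadoop.mr.Counter.
    enum EsCounter { DOCS_SENT, BULK_TOTAL }

    // Sums the per-task counter values into job-level totals.
    static Map<EsCounter, Long> aggregate(List<Map<EsCounter, Long>> perTask) {
        Map<EsCounter, Long> job = new EnumMap<>(EsCounter.class);
        for (Map<EsCounter, Long> task : perTask) {
            task.forEach((counter, value) -> job.merge(counter, value, Long::sum));
        }
        return job;
    }

    public static void main(String[] args) {
        // Made-up values for two tasks writing to Elasticsearch.
        Map<EsCounter, Long> task1 = new EnumMap<>(EsCounter.class);
        task1.put(EsCounter.DOCS_SENT, 500L);
        task1.put(EsCounter.BULK_TOTAL, 10L);

        Map<EsCounter, Long> task2 = new EnumMap<>(EsCounter.class);
        task2.put(EsCounter.DOCS_SENT, 493L);
        task2.put(EsCounter.BULK_TOTAL, 10L);

        // prints {DOCS_SENT=993, BULK_TOTAL=20}
        System.out.println(aggregate(Arrays.asList(task1, task2)));
    }
}
```

In a real job this summing is done by the Map/Reduce framework; user code only reads the aggregated values.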
elasticsearch-hadoop provides the following counters, defined in the `org.elasticsearch.hadoop.mr.Counter` enum:
Table 8. Available counters

| Counter name | Purpose |
|---|---|
| **Data focused** | |
| BYTES_SENT | Total number of data/communication bytes sent over the network to Elasticsearch |
| BYTES_ACCEPTED | Data/documents accepted by Elasticsearch, in bytes |
| BYTES_RETRIED | Data/documents rejected by Elasticsearch, in bytes |
| BYTES_RECEIVED | Data/documents received from Elasticsearch, in bytes |
| **Document focused** | |
| DOCS_SENT | Number of documents sent over the network to Elasticsearch |
| DOCS_ACCEPTED | Number of documents sent and accepted by Elasticsearch |
| DOCS_RETRIED | Number of documents sent but rejected by Elasticsearch |
| DOCS_RECEIVED | Number of documents received from Elasticsearch |
| **Network focused** | |
| BULK_TOTAL | Number of bulk requests made to Elasticsearch |
| BULK_RETRIES | Number of bulk retries (caused by document rejections) |
| SCROLL_TOTAL | Number of scrolls pulled from Elasticsearch |
| NODE_RETRIES | Number of node fallbacks (caused by network errors) |
| NET_RETRIES | Number of network retries (caused by network errors) |
| **Time focused** | |
| NET_TOTAL_TIME_MS | Overall time (in ms) spent over the network |
| BULK_TOTAL_TIME_MS | Time (in ms) spent over the network by the bulk requests |
| BULK_RETRIES_TOTAL_TIME_MS | Time (in ms) spent over the network retrying bulk requests |
| SCROLL_TOTAL_TIME_MS | Time (in ms) spent over the network reading the scroll requests |
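The counters are raw totals, so a few simple ratios are often more telling than the totals themselves, for example the share of sent documents that were rejected, or the average time per bulk request. The sketch below is illustrative only: the nested `EsCounter` enum is a hypothetical stand-in mirroring a subset of the names in the table (it is not the real `org.elasticsearch.hadoop.mr.Counter` class), and a plain `Map` replaces Hadoop's counters object so the code runs without Hadoop.

```java
import java.util.EnumMap;
import java.util.Map;

public class EsCounterRatios {

    // Hypothetical stand-in mirroring a few names of org.elasticsearch.hadoop.mr.Counter;
    // in a real job the values would come from Hadoop's counters API instead.
    enum EsCounter {
        DOCS_SENT, DOCS_ACCEPTED, DOCS_RETRIED,
        BULK_TOTAL, BULK_TOTAL_TIME_MS
    }

    // Fraction of sent documents that Elasticsearch rejected (DOCS_RETRIED / DOCS_SENT).
    static double retryRatio(Map<EsCounter, Long> c) {
        long sent = c.getOrDefault(EsCounter.DOCS_SENT, 0L);
        return sent == 0 ? 0.0 : (double) c.getOrDefault(EsCounter.DOCS_RETRIED, 0L) / sent;
    }

    // Average network time per bulk request in ms (BULK_TOTAL_TIME_MS / BULK_TOTAL).
    static double avgBulkTimeMs(Map<EsCounter, Long> c) {
        long bulks = c.getOrDefault(EsCounter.BULK_TOTAL, 0L);
        return bulks == 0 ? 0.0 : (double) c.getOrDefault(EsCounter.BULK_TOTAL_TIME_MS, 0L) / bulks;
    }

    public static void main(String[] args) {
        // Values copied from the sample job report at the end of this section.
        Map<EsCounter, Long> c = new EnumMap<>(EsCounter.class);
        c.put(EsCounter.DOCS_SENT, 993L);
        c.put(EsCounter.DOCS_ACCEPTED, 993L);
        c.put(EsCounter.DOCS_RETRIED, 0L);
        c.put(EsCounter.BULK_TOTAL, 20L);
        c.put(EsCounter.BULK_TOTAL_TIME_MS, 518L);

        System.out.println("retry ratio: " + retryRatio(c));          // prints 0.0
        System.out.println("avg bulk time (ms): " + avgBulkTimeMs(c)); // prints 25.9
    }
}
```

Guarding against a zero denominator keeps the helpers usable for jobs that only read from Elasticsearch, where the bulk counters typically stay at 0.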
The counters can be used programmatically through either the mapred or the mapreduce API, depending on which one the job uses. Either way, elasticsearch-hadoop reports them automatically, without any user intervention: when using elasticsearch-hadoop, the statistics are printed at the end of the job run, for example:
    13:55:08,100 INFO main mapreduce.Job - Job job_local127738678_0013 completed successfully
    13:55:08,101 INFO main mapreduce.Job - Counters: 35
    ...
        Elasticsearch Hadoop Counters
            Bulk Retries=0
            Bulk Retries Total Time(ms)=0
            Bulk Total=20
            Bulk Total Time(ms)=518
            Bytes Accepted=159129
            Bytes Sent=159129
            Bytes Received=79921
            Bytes Retried=0
            Documents Accepted=993
            Documents Sent=993
            Documents Received=0
            Documents Retried=0
            Network Retries=0
            Network Total Time(ms)=937
            Node Retries=0
            Scroll Total=0
            Scroll Total Time(ms)=0
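If only the job log is at hand (for example, when the job was launched by another tool), the elasticsearch-hadoop counters can be recovered from the printed report. The following is a minimal sketch, assuming the report lists one `Display Name=value` counter per line beneath the `Elasticsearch Hadoop Counters` group header and that the next group header contains no `=`. The class `CounterReportParser` is hypothetical and is not part of elasticsearch-hadoop; reading the counters through the Hadoop API remains the more robust option.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CounterReportParser {

    static final String GROUP = "Elasticsearch Hadoop Counters";

    // Collects the "Display Name=value" lines that follow the elasticsearch-hadoop
    // group header, stopping at the first line without '=' (e.g. the next group header).
    static Map<String, Long> parse(String report) {
        Map<String, Long> counters = new LinkedHashMap<>();
        boolean inGroup = false;
        for (String raw : report.split("\\R")) {
            String line = raw.trim();
            if (line.equals(GROUP)) {
                inGroup = true;
                continue;
            }
            if (!inGroup) {
                continue; // skip log lines and other groups before the header
            }
            int eq = line.lastIndexOf('=');
            if (eq < 0) {
                break; // left the elasticsearch-hadoop group
            }
            String name = line.substring(0, eq).trim();
            long value = Long.parseLong(line.substring(eq + 1).trim());
            counters.put(name, value);
        }
        return counters;
    }

    public static void main(String[] args) {
        // A shortened excerpt of the sample report, one counter per line.
        String report = String.join("\n",
                "13:55:08,101 INFO main mapreduce.Job - Counters: 35",
                "\tElasticsearch Hadoop Counters",
                "\t\tBulk Retries=0",
                "\t\tBulk Total=20",
                "\t\tBulk Total Time(ms)=518",
                "\t\tDocuments Accepted=993",
                "\t\tDocuments Sent=993");
        System.out.println(parse(report));
    }
}
```

The keys are the display names from the report (such as `Bulk Total Time(ms)`), not the enum constant names from the table.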