Elasticsearch input plugin

  • Plugin version: v4.16.0
  • Released on: 2022-08-12
  • Changelog

For other versions, see the Versioned plugin docs.

Getting Help

For questions about the plugin, open a topic in the Discuss forums. For bugs or feature requests, open an issue in GitHub. For the list of Elastic supported plugins, consult the Elastic Support Matrix.

Description

Read from an Elasticsearch cluster, based on search query results. This is useful for replaying test logs, reindexing, etc. You can periodically schedule ingestion using a cron syntax (see the schedule setting) or run the query one time to load data into Logstash.

Example:

    input {
      # Read all documents from Elasticsearch matching the given query
      elasticsearch {
        hosts => "localhost"
        query => '{ "query": { "match": { "statuscode": 200 } }, "sort": [ "_doc" ] }'
      }
    }

This would create an Elasticsearch query with the following format:

    curl -H 'Content-Type: application/json' 'http://localhost:9200/logstash-*/_search?scroll=1m&size=1000' -d '{
      "query": {
        "match": {
          "statuscode": 200
        }
      },
      "sort": [ "_doc" ]
    }'

Scheduling

Input from this plugin can be scheduled to run periodically according to a specific schedule. This scheduling syntax is powered by rufus-scheduler. The syntax is cron-like, with some extensions specific to Rufus (e.g. timezone support).

Examples:

    * 5 * 1-3 *

will execute every minute of 5am every day of January through March.

    0 * * * *

will execute on the 0th minute of every hour every day.

    0 6 * * * America/Chicago

will execute at 6:00am in the America/Chicago time zone (UTC-6, or UTC-5 while daylight saving time is in effect) every day.

Further documentation describing this syntax can be found in the rufus-scheduler documentation.
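As a sketch, the following configuration combines the schedule setting with a time-bounded query so that each hourly run reads roughly the previous hour of data. The @timestamp field and the one-hour window are assumptions about your data. Runs are not coordinated with indexing, so documents near the window boundaries may be read twice or missed.

    input {
      elasticsearch {
        hosts => "localhost"
        # Hypothetical query: assumes documents carry an @timestamp field
        query => '{ "query": { "range": { "@timestamp": { "gte": "now-1h" } } }, "sort": [ "_doc" ] }'
        # Run at minute 0 of every hour
        schedule => "0 * * * *"
      }
    }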

Authentication

Authentication to a secure Elasticsearch cluster is possible using one of the following options:

  • user AND password
  • cloud_auth
  • api_key

Authorization

Authorization to a secure Elasticsearch cluster requires read permission at the index level and monitoring permission at the cluster level. The cluster-level monitoring permission is necessary to perform periodic connectivity checks.

Compatibility with the Elastic Common Schema (ECS)

When ECS compatibility is disabled, docinfo_target uses the "@metadata" field as its default. When ECS compatibility is enabled, the plugin uses the naming convention "[@metadata][input][elasticsearch]" as the default target for placing document information.

The plugin logs a warning when ECS is enabled and target isn’t set.

Set the target option to avoid potential schema conflicts.
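For example, the following minimal sketch enables ECS compatibility and sets target explicitly, so that hit sources do not land at the event root. The field name [document] is a hypothetical choice, not a plugin default.

    input {
      elasticsearch {
        hosts => "localhost"
        ecs_compatibility => "v8"
        # Place each hit's _source under [document] instead of the event root
        target => "[document]"
        docinfo => true
        # docinfo_target is left unset, so document information goes to
        # the ECS default, [@metadata][input][elasticsearch]
      }
    }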

Elasticsearch Input configuration options

This plugin supports the following configuration options plus the Common Options described later.

Also see Common Options for a list of options supported by all input plugins.

api_key

  • Value type is password
  • There is no default value for this setting.

Authenticate using an Elasticsearch API key. Note that this option also requires enabling the ssl option.

Format is id:api_key where id and api_key are as returned by the Elasticsearch Create API key API.
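Here is a minimal sketch of API key authentication. It assumes a hypothetical host name and an environment variable ES_API_KEY that holds the id:api_key string:

    input {
      elasticsearch {
        hosts => "es.example.org:9200"
        # api_key requires ssl to be enabled
        ssl => true
        # Hypothetical environment variable holding "id:api_key"
        api_key => "${ES_API_KEY}"
      }
    }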

ca_file

  • Value type is path
  • There is no default value for this setting.

SSL certificate authority file in PEM-encoded format. It must also include any chain certificates as necessary.
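For example, the following sketch trusts a private CA when connecting over SSL. The host name and certificate path are placeholders:

    input {
      elasticsearch {
        hosts => "es.example.org:9200"
        ssl => true
        # Placeholder path to a PEM bundle with the CA and any chain certificates
        ca_file => "/etc/logstash/certs/ca.pem"
      }
    }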

ca_trusted_fingerprint

  • Value type is string, and must contain exactly 64 hexadecimal characters.
  • There is no default value for this setting.
  • Use of this option requires Logstash 8.3+

The SHA-256 fingerprint of an SSL Certificate Authority to trust, such as the autogenerated self-signed CA for an Elasticsearch cluster.
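The following sketch uses a fingerprint instead of a CA file. The value shown is a placeholder of the required 64 hexadecimal characters, not a real fingerprint. Replace it with the SHA-256 fingerprint of your cluster's CA.

    input {
      elasticsearch {
        hosts => "es.example.org:9200"
        ssl => true
        # Placeholder value: exactly 64 hex characters (requires Logstash 8.3+)
        ca_trusted_fingerprint => "0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef"
      }
    }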

cloud_auth

  • Value type is password
  • There is no default value for this setting.

The cloud authentication string, in "<username>:<password>" format, is an alternative to the user/password pair.

For more info, check out the Logstash-to-Cloud documentation.

cloud_id

  • Value type is string
  • There is no default value for this setting.

Cloud ID, from the Elastic Cloud web console. If set, hosts should not be used.

For more info, check out the Logstash-to-Cloud documentation.
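Here is a sketch for an Elastic Cloud deployment. It assumes that hypothetical environment variables ES_CLOUD_ID and ES_CLOUD_AUTH hold the Cloud ID and the "<username>:<password>" string. Note that hosts is not set.

    input {
      elasticsearch {
        # Hypothetical environment variables; hosts is intentionally omitted
        cloud_id => "${ES_CLOUD_ID}"
        cloud_auth => "${ES_CLOUD_AUTH}"
      }
    }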

connect_timeout_seconds

  • Value type is number
  • Default value is 10

The maximum amount of time, in seconds, to wait while establishing a connection to Elasticsearch. Connect timeouts tend to occur when Elasticsearch or an intermediate proxy is overloaded with requests and has exhausted its connection pool.

docinfo

  • Value type is boolean
  • Default value is false

If set, include Elasticsearch document information, such as the index, type, and ID, in the event.

If you are ingesting documents with the intent to reindex or update them, note that the action option of the elasticsearch output needs to know how to handle them. The action can be assigned dynamically from a field added to the metadata.

Example

    input {
      elasticsearch {
        hosts => "es.production.mysite.org"
        index => "mydata-2018.09.*"
        query => '{ "query": { "query_string": { "query": "*" } } }'
        size => 500
        scroll => "5m"
        docinfo => true
        docinfo_target => "[@metadata][doc]"
      }
    }
    output {
      elasticsearch {
        index => "copy-of-production.%{[@metadata][doc][_index]}"
        document_type => "%{[@metadata][doc][_type]}"
        document_id => "%{[@metadata][doc][_id]}"
      }
    }

If set, you can use metadata information in the add_field common option.

Example

    input {
      elasticsearch {
        docinfo => true
        docinfo_target => "[@metadata][doc]"
        add_field => {
          identifier => "%{[@metadata][doc][_index]}:%{[@metadata][doc][_type]}:%{[@metadata][doc][_id]}"
        }
      }
    }

docinfo_fields

  • Value type is array
  • Default value is ["_index", "_type", "_id"]

If document metadata storage is requested by enabling the docinfo option, this option lists the metadata fields to save in the current event. See Meta-Fields in the Elasticsearch documentation for more information.
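For example, the following sketch keeps only the index name and document ID. Which fields you need depends on your pipeline.

    input {
      elasticsearch {
        hosts => "localhost"
        docinfo => true
        # Store only these meta-fields under [@metadata][doc]
        docinfo_fields => ["_index", "_id"]
        docinfo_target => "[@metadata][doc]"
      }
    }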

docinfo_target

  • Value type is string
  • Default value depends on whether ecs_compatibility is enabled:

    • ECS Compatibility disabled: "@metadata"
    • ECS Compatibility enabled: "[@metadata][input][elasticsearch]"

If document metadata storage is requested by enabling the docinfo option, this option names the field under which to store the metadata fields as subfields.

ecs_compatibility

  • Value type is string
  • Supported values are:

    • disabled: does not use ECS-compatible field names (for example, docinfo_target defaults to "@metadata")
    • v1, v8: Elastic Common Schema compliant behavior
  • Default value depends on which version of Logstash is running:

    • When Logstash provides a pipeline.ecs_compatibility setting, its value is used as the default
    • Otherwise, the default value is disabled

Controls this plugin’s compatibility with the Elastic Common Schema (ECS).

hosts

  • Value type is array
  • There is no default value for this setting.

List of one or more Elasticsearch hosts to use for querying. Each host can be either IP, HOST, IP:port, or HOST:port. The port defaults to 9200.
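For example, the following configuration uses hypothetical host names. The last entry has no port, so it uses the default, 9200:

    input {
      elasticsearch {
        hosts => ["es1.example.org:9200", "es2.example.org:9201", "10.0.0.12"]
      }
    }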

index

  • Value type is string
  • Default value is "logstash-*"

The index or alias to search. See the Multi Indices section of the Elasticsearch documentation for more information on how to reference multiple indices.
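For example, the following sketch searches all indices matching a wildcard plus one additional index or alias, using comma-separated Elasticsearch multi-target syntax. The names are hypothetical:

    input {
      elasticsearch {
        hosts => "localhost"
        # Hypothetical index pattern and alias, comma-separated
        index => "weblogs-2022.08.*,archived-weblogs"
      }
    }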

password

  • Value type is password
  • There is no default value for this setting.

The password to use together with the username in the user option when authenticating to the Elasticsearch server. If set to an empty string, authentication will be disabled.

proxy

  • Value type is uri
  • There is no default value for this setting.

Set the address of a forward HTTP proxy. An empty string is treated as if proxy was not set. This is useful when using environment variables, e.g. proxy => '${LS_PROXY:}'.

query

  • Value type is string
  • Default value is '{ "sort": [ "_doc" ] }'

The query to be executed. Read the Elasticsearch query DSL documentation for more information.
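As an illustration, the following query selects documents from the last 24 hours, based on a hypothetical @timestamp field. It keeps "sort": [ "_doc" ], as the default query does. This returns hits in index order, which is typically the most efficient order for scrolling.

    input {
      elasticsearch {
        hosts => "localhost"
        # Hypothetical time-range query on an assumed @timestamp field
        query => '{ "query": { "range": { "@timestamp": { "gte": "now-24h" } } }, "sort": [ "_doc" ] }'
      }
    }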

request_timeout_seconds

  • Value type is number
  • Default value is 60

The maximum amount of time, in seconds, for a single request to Elasticsearch. Request timeouts tend to occur when an individual page of data is very large, such as when it contains large-payload documents and/or the size has been specified as a large value.

retries

  • Value type is number
  • Default value is 0

The number of times to re-run the query after the first failure. If the query fails after all retries, it logs an error message. The default is 0 (no retry). This value should be equal to or greater than zero.
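For example, the following setting re-runs the query up to three more times after an initial failure, for four attempts in total:

    input {
      elasticsearch {
        hosts => "localhost"
        # Up to 3 retries after the first failed attempt
        retries => 3
      }
    }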

schedule

  • Value type is string
  • There is no default value for this setting.

Schedule of when to periodically run the query, in cron format, for example "* * * * *" (execute the query every minute, on the minute).

There is no schedule by default. If no schedule is given, the query is run exactly once.

scroll

  • Value type is string
  • Default value is "1m"

This parameter controls the keepalive time of the scrolling request, expressed as a time value such as "1m", and initiates the scrolling process. The timeout applies per round trip (that is, between one scroll request and the next).

size

  • Value type is number
  • Default value is 1000

This allows you to set the maximum number of hits returned per scroll.
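For example, the following sketch requests smaller pages and gives each round trip more time, which may help when individual documents are large. The values are illustrative, not recommendations.

    input {
      elasticsearch {
        hosts => "localhost"
        # At most 250 hits per scroll page
        size => 250
        # Keep the scroll context alive for 5 minutes between requests
        scroll => "5m"
      }
    }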

slices

  • Value type is number
  • There is no default value.
  • Sensible values range from 2 to about 8.

In some cases, it is possible to improve overall throughput by consuming multiple distinct slices of a query simultaneously using sliced scrolls, especially if the pipeline is spending significant time waiting on Elasticsearch to provide results.

If set, the slices parameter tells the plugin how many slices to divide the work into, and will produce events from the slices in parallel until all of them are done scrolling.

The Elasticsearch manual indicates that there can be negative performance implications to both the query and the Elasticsearch cluster when a scrolling query uses more slices than shards in the index.

If the slices parameter is left unset, the plugin will not inject slice instructions into the query.
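The following is a sketch of a sliced scroll. The value 4 assumes the target indices have at least four shards, in line with the guidance above about not using more slices than shards:

    input {
      elasticsearch {
        hosts => "localhost"
        index => "logstash-*"
        # Consume four slices of the scroll in parallel
        slices => 4
      }
    }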

ssl

  • Value type is boolean
  • Default value is false

If enabled, SSL will be used when communicating with the Elasticsearch server (i.e. HTTPS will be used instead of plain HTTP).

ssl_certificate_verification

  • Value type is boolean
  • Default value is true

Option to validate the server’s certificate. Disabling this severely compromises security. When certificate validation is disabled, this plugin implicitly trusts the machine resolved at the given address without validating its proof-of-identity. In this scenario, the plugin can transmit credentials to or process data from an untrustworthy man-in-the-middle or other compromised infrastructure. More information on the importance of certificate verification: https://www.cs.utexas.edu/~shmat/shmat_ccs12.pdf.

socket_timeout_seconds

  • Value type is number
  • Default value is 60

The maximum amount of time, in seconds, to wait on an incomplete response from Elasticsearch while no additional data has been appended. Socket timeouts usually occur while waiting for the first byte of a response, such as when executing a particularly complex query.
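The three timeout settings (connect, request, and socket) can be tuned together. The following sketch uses illustrative values only. It sets a short connect timeout to fail fast when the cluster is unreachable, plus longer request and socket timeouts for a slow or complex query.

    input {
      elasticsearch {
        hosts => "localhost"
        # Illustrative values, in seconds
        connect_timeout_seconds => 5
        request_timeout_seconds => 120
        socket_timeout_seconds => 120
      }
    }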

target

  • Value type is field reference
  • There is no default value for this setting.

Without a target, events are created from each hit’s _source at the root level. When the target is set to a field reference, the _source of the hit is placed in the target field instead.

This option can be useful to avoid populating unknown fields when a downstream schema such as ECS is enforced. It is also possible to target an entry in the event’s metadata, which will be available during event processing but not exported to your outputs (e.g., target => "[@metadata][_source]").

user

  • Value type is string
  • There is no default value for this setting.

The username to use together with the password in the password option when authenticating to the Elasticsearch server. If set to an empty string, authentication will be disabled.
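Here is a minimal sketch of basic authentication. It assumes a hypothetical user logstash_reader that has the permissions described under Authorization, and a hypothetical environment variable ES_PASSWORD:

    input {
      elasticsearch {
        hosts => "es.example.org:9200"
        ssl => true
        # Hypothetical user and environment variable
        user => "logstash_reader"
        password => "${ES_PASSWORD}"
      }
    }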

Common Options

The following configuration options are supported by all input plugins:

  • add_field: hash (not required)
  • enable_metric: boolean (not required)
  • id: string (not required)
  • tags: array (not required)
  • type: string (not required)

Details

add_field

  • Value type is hash
  • Default value is {}

Add a field to an event.

enable_metric

  • Value type is boolean
  • Default value is true

Disable or enable metric logging for this specific plugin instance. By default we record all the metrics we can, but you can disable metrics collection for a specific plugin.

id

  • Value type is string
  • There is no default value for this setting.

Add a unique ID to the plugin configuration. If no ID is specified, Logstash will generate one. It is strongly recommended to set this ID in your configuration. This is particularly useful when you have two or more plugins of the same type, for example, if you have two elasticsearch inputs. Adding a named ID in this case will help in monitoring Logstash when using the monitoring APIs.

    input {
      elasticsearch {
        id => "my_plugin_id"
      }
    }

Variable substitution in the id field only supports environment variables and does not support the use of values from the secret store.

tags

  • Value type is array
  • There is no default value for this setting.

Add any number of arbitrary tags to your event.

This can help with processing later.

type

  • Value type is string
  • There is no default value for this setting.

Add a type field to all events handled by this input.

Types are used mainly for filter activation.

The type is stored as part of the event itself, so you can also use the type to search for it in Kibana.

If you try to set a type on an event that already has one (for example, when you send an event from a shipper to an indexer), a new input will not override the existing type. A type set at the shipper stays with that event for its life, even when sent to another Logstash server.
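For example, the following sketch sets a type and a tag on events from this input and uses the type in a filter conditional. The value es_replay and the tag names are hypothetical. Events whose source documents already carry a type keep that type.

    input {
      elasticsearch {
        hosts => "localhost"
        type => "es_replay"
        tags => ["replay"]
      }
    }
    filter {
      # Matches events whose type is es_replay
      if [type] == "es_replay" {
        mutate { add_tag => ["from_elasticsearch"] }
      }
    }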