Azure Module

This functionality is in technical preview and may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features.

The Microsoft Azure module in Logstash helps you easily integrate your Azure activity logs and SQL diagnostic logs with the Elastic Stack.

[Figure: Azure work flow]

You can monitor your Azure cloud environments and SQL DB deployments with deep operational insights across multiple Azure subscriptions. You can explore the health of your infrastructure in real time, accelerating root-cause analysis and decreasing overall time to resolution. The Azure module helps you:

  • Analyze infrastructure changes and authorization activity
  • Identify suspicious behaviors and potential malicious actors
  • Perform root-cause analysis by investigating user activity
  • Monitor and optimize your SQL DB deployments

The Logstash Azure module is an X-Pack feature under the Basic License and is therefore free to use. Please contact Elastic for questions or more information.

The Azure module uses the Logstash Azure Event Hubs input plugin to consume data from Azure Event Hubs. The module taps directly into the Azure dashboard, parses and indexes events into Elasticsearch, and installs a suite of Kibana dashboards to help you start exploring your data immediately.

Dashboards

These Kibana dashboards are available and ready for you to use. You can use them as they are, or tailor them to meet your needs.

Infrastructure activity monitoring

  • Overview. Top-level view into your Azure operations, including info about users, resource groups, service health, access, activities, and alerts.
  • Alerts. Alert info, including activity, alert status, and alerts heatmap.
  • User Activity. Info about system users, their activity, and requests.

SQL Database monitoring

  • SQL DB Overview. Top-level view into your SQL Databases, including counts for databases, servers, resource groups, and subscriptions.
  • SQL DB Database View. Detailed info about each SQL Database, including wait time, errors, DTU and storage utilization, size, and read and write input/output.
  • SQL DB Queries. Info about SQL Database queries and performance.

Prerequisites

This module requires the Elastic Stack and Azure Monitor configured to stream to Azure Event Hubs.

Elastic prerequisites

The instructions below assume that you have Logstash, Elasticsearch, and Kibana running locally. You can also run Logstash, Elasticsearch, and Kibana on separate hosts.

The Elastic Stack version 6.4 (or later) is required for this module.

The Azure module uses the azure_event_hubs input plugin to consume logs and metrics from your Azure environment. It is installed by default with Logstash 6.4 (or later). Basic understanding of the plugin and options is helpful when you set up the Azure module. See the azure_event_hubs input plugin documentation for more information.

Elastic products are available to download and easy to install.

Azure prerequisites

Azure Monitor should be configured to stream logs to one or more Event Hubs. Logstash will need to access these Event Hubs instances to consume your Azure logs and metrics. See Microsoft Azure resources at the end of this topic for links to Microsoft Azure documentation.

Configure the module

Specify options for the Logstash Azure module in the logstash.yml configuration file.

  • Basic configuration. You can use the logstash.yml file to configure inputs from multiple Event Hubs that share the same configuration. Basic configuration is recommended for most use cases.
  • Advanced configuration. The advanced configuration is available for deployments where different Event Hubs require different configurations. The logstash.yml file holds your settings. Advanced configuration is not necessary or recommended for most use cases.

See the azure_event_hubs input plugin documentation for more information about basic and advanced configuration models.

Basic configuration sample

The configuration in the logstash.yml file is shared between Event Hubs. Basic configuration is recommended for most use cases.

modules:
  - name: azure
    var.elasticsearch.hosts: localhost:9200
    var.kibana.host: localhost:5601
    var.input.azure_event_hubs.consumer_group: "logstash" 
    var.input.azure_event_hubs.storage_connection: "DefaultEndpointsProtocol=https;AccountName=instance1..." 
    var.input.azure_event_hubs.threads: 9 
    var.input.azure_event_hubs.event_hub_connections:
      - "Endpoint=sb://...EntityPath=insights-operational-logs" 
      - "Endpoint=sb://...EntityPath=insights-metrics-pt1m" 
      - "Endpoint=sb://...EntityPath=insights-logs-blocks"
      - "Endpoint=sb://...EntityPath=insights-logs-databasewaitstatistics"
      - "Endpoint=sb://...EntityPath=insights-logs-errors"
      - "Endpoint=sb://...EntityPath=insights-logs-querystoreruntimestatistics"
      - "Endpoint=sb://...EntityPath=insights-logs-querystorewaitstatistics"
      - "Endpoint=sb://...EntityPath=insights-logs-timeouts"

Notes on the options in this sample:

  • consumer_group is optional but highly recommended. See Best practices.
  • storage_connection is optional. It sets the Azure Blob Storage connection for tracking processing state for Event Hubs when scaling out a deployment with multiple Logstash instances. See Scale Event Hub consumption for additional details.
  • threads: See Best practices for guidelines on choosing an appropriate number of threads.
  • The insights-operational-logs connection sets up the consumption of Activity Logs. By default, Azure Monitor uses the insights-operational-logs Event Hub name. Make sure this matches the name of the Event Hub specified for Activity Logs.
  • The insights-metrics-pt1m connection and the ones below it set up the consumption of SQL DB diagnostic logs and metrics. By default, Azure Monitor uses these Event Hub names.

The basic configuration requires the var.input.azure_event_hubs. prefix before each configuration option. Notice the notation for the threads option.

Advanced configuration sample

Advanced configuration in the logstash.yml file supports Event Hub specific options. Advanced configuration is available for more granular tuning of threading and Blob Storage usage across multiple Event Hubs. Advanced configuration is not necessary or recommended for most use cases. Use it only if it is required for your deployment scenario.

You must define the header array with name in the first position. You can define other options in any order. The per Event Hub configuration takes precedence. Any values not defined per Event Hub use the global config value.

In this example threads, consumer_group, and storage_connection will be applied to each of the configured Event Hubs. Note that decorate_events is defined in both the global and per Event Hub configuration. The per Event Hub configuration takes precedence, and the global configuration is effectively ignored when the per Event Hub setting is present.

modules:
  - name: azure
    var.elasticsearch.hosts: localhost:9200
    var.kibana.host: localhost:5601
    var.input.azure_event_hubs.decorate_events: true 
    var.input.azure_event_hubs.threads: 9 
    var.input.azure_event_hubs.consumer_group: "logstash"
    var.input.azure_event_hubs.storage_connection: "DefaultEndpointsProtocol=https;AccountName=instance1..."
    var.input.azure_event_hubs.event_hubs:
      - ["name",                                    "initial_position",  "storage_container",  "decorate_events",  "event_hub_connection"]                                   
      - ["insights-operational-logs",                 "TAIL",              "activity-logs1",    "true",             "Endpoint=sb://...EntityPath=insights-operational-logs"]
      - ["insights-operational-logs",                 "TAIL",              "activity-logs2",    "true",             "Endpoint=sb://...EntityPath=insights-operational-logs"]
      - ["insights-metrics-pt1m",                     "TAIL",              "dbmetrics",         "true",             "Endpoint=sb://...EntityPath=insights-metrics-pt1m"]
      - ["insights-logs-blocks",                      "TAIL",              "dbblocks",          "true",             "Endpoint=sb://...EntityPath=insights-logs-blocks"]
      - ["insights-logs-databasewaitstatistics",      "TAIL",              "dbwaitstats",       "false",            "Endpoint=sb://...EntityPath=insights-logs-databasewaitstatistics"]
      - ["insights-logs-errors",                      "HEAD",              "dberrors",          "true",             "Endpoint=sb://...EntityPath=insights-logs-errors"]
      - ["insights-logs-querystoreruntimestatistics", "TAIL",              "dbstoreruntime",    "true",             "Endpoint=sb://...EntityPath=insights-logs-querystoreruntimestatistics"]
      - ["insights-logs-querystorewaitstatistics",    "TAIL",              "dbstorewaitstats",  "true",             "Endpoint=sb://...EntityPath=insights-logs-querystorewaitstatistics"]
      - ["insights-logs-timeouts",                    "TAIL",              "dbtimeouts",        "true",             "Endpoint=sb://...EntityPath=insights-logs-timeouts"]

Notes on the options in this sample:

  • Options set with the var.input.azure_event_hubs. prefix, such as decorate_events, are global Event Hub options. They are overridden by any configuration specified in the event_hubs option.
  • threads: See Best practices for guidelines on choosing an appropriate number of threads.
  • The header array is the first row of event_hubs. It must be defined with name in the first position; other options can be defined in any order.
  • The second insights-operational-logs row enables consuming from a second Activity Logs Event Hub that uses a different Blob Storage container. This is necessary to prevent the offsets from the first insights-operational-logs from overwriting the offsets for the second insights-operational-logs.

The advanced configuration doesn’t require a prefix before a per Event Hub configuration option. Notice the notation for the initial_position option.

Scale Event Hub consumption

An Azure Blob Storage account is an essential part of Azure-to-Logstash configuration. It is required for users who want to scale out multiple Logstash instances to consume from Event Hubs.

A Blob Storage account is a central location that enables multiple instances of Logstash to work together to process events. It records the offset (location) of processed events. On restart, Logstash resumes processing exactly where it left off.

Configuration notes:

  • A Blob Storage account is highly recommended for use with this module, and is likely required for production servers.
  • The storage_connection option passes the blob storage connection string.
  • Configure all Logstash instances to use the same storage_connection to get the benefits of shared processing.

Sample Blob Storage connection string:

DefaultEndpointsProtocol=https;AccountName=logstash;AccountKey=ETOPnkd/hDAWidkEpPZDiXffQPku/SZdXhPSLnfqdRTalssdEuPkZwIcouzXjCLb/xPZjzhmHfwRCGo0SBSw==;EndpointSuffix=core.windows.net

Find the connection string to Blob Storage here: Azure Portal -> Blob Storage account -> Access keys.
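
As a minimal sketch of the shared-state setup described above, every Logstash instance in a scaled-out deployment could carry the same Azure input settings in its logstash.yml. The connection strings are elided placeholders in the same style as the samples earlier in this topic, the consumer group name logstash is only an example, and the Elasticsearch and Kibana settings are omitted for brevity.

```yaml
# logstash.yml fragment, repeated unchanged on each Logstash instance
modules:
  - name: azure
    # Same consumer group everywhere so the instances work together
    var.input.azure_event_hubs.consumer_group: "logstash"
    # Same Blob Storage connection everywhere so offsets are shared
    var.input.azure_event_hubs.storage_connection: "DefaultEndpointsProtocol=https;AccountName=instance1..."
    var.input.azure_event_hubs.event_hub_connections:
      - "Endpoint=sb://...EntityPath=insights-operational-logs"
```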

Best practices

Here are some guidelines to help you achieve a successful deployment, and avoid data conflicts that can cause lost events.

Create a Logstash consumer group

Create a new consumer group specifically for Logstash. Do not use the $default or any other consumer group that might already be in use. Reusing consumer groups among non-related consumers can cause unexpected behavior and possibly lost events. All Logstash instances should use the same consumer group so that they can work together for processing events.

Avoid overwriting offset with multiple Event Hubs

The offsets (position) of the Event Hubs are stored in the configured Azure Blob store. The Azure Blob store uses paths like a file system to store the offsets. If the paths between multiple Event Hubs overlap, then the offsets may be stored incorrectly.

To avoid duplicate file paths, use the advanced configuration model and make sure that at least one of these options is different per Event Hub:

  • storage_connection
  • storage_container (defaults to Event Hub name if not defined)
  • consumer_group
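
A trimmed sketch of this guideline, using the advanced configuration model with only the columns needed to keep offset paths apart. The two rows read Event Hubs that share a name, so each gets its own storage_container; the container names and the elided connection strings are illustrative placeholders.

```yaml
modules:
  - name: azure
    var.input.azure_event_hubs.consumer_group: "logstash"
    var.input.azure_event_hubs.storage_connection: "DefaultEndpointsProtocol=https;AccountName=instance1..."
    var.input.azure_event_hubs.event_hubs:
      # Header row: name must come first
      - ["name", "storage_container", "event_hub_connection"]
      # Same Event Hub name twice, so each row uses a different container
      - ["insights-operational-logs", "activity-logs1", "Endpoint=sb://...EntityPath=insights-operational-logs"]
      - ["insights-operational-logs", "activity-logs2", "Endpoint=sb://...EntityPath=insights-operational-logs"]
```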

Set number of threads correctly

By default, 16 threads service all Event Hubs. While this may be sufficient for most use cases, refining this number may improve throughput. When servicing a large number of partitions across one or more Event Hubs, setting a higher value may improve performance. The maximum number of threads is not strictly bound by the total number of partitions being serviced, but setting the value much higher than that may leave some threads idle.

The number of threads must be greater than or equal to the number of Event Hubs plus one.

Threads are currently available only as a global setting across all Event Hubs in a single azure_event_hubs input definition. However, if your configuration includes multiple azure_event_hubs inputs, the threads setting applies independently to each.

Sample scenarios:

  • Event Hubs = 4. Partitions on each Event Hub = 3. Minimum threads is 5 (4 Event Hubs plus one). Maximum threads is 13 (4 Event Hubs times 3 partitions plus one).
  • If you’re collecting activity logs from only one specified event hub instance, then only 2 threads (1 Event Hub plus one) are required.
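
The first scenario could be written as the following basic configuration fragment. It is a sketch only: the four Event Hub names are borrowed from the basic configuration sample, the connection strings are elided placeholders, and the count of 3 partitions per Event Hub is an assumption that the configuration itself does not express.

```yaml
modules:
  - name: azure
    # 4 Event Hubs below, so the minimum is 4 + 1 = 5 threads.
    # Assuming 3 partitions each, values up to 4 * 3 + 1 = 13 can be put to use.
    var.input.azure_event_hubs.threads: 5
    var.input.azure_event_hubs.event_hub_connections:
      - "Endpoint=sb://...EntityPath=insights-operational-logs"
      - "Endpoint=sb://...EntityPath=insights-metrics-pt1m"
      - "Endpoint=sb://...EntityPath=insights-logs-errors"
      - "Endpoint=sb://...EntityPath=insights-logs-timeouts"
```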

Set up and run the module

Be sure that the logstash.yml file is configured correctly.

First time setup

Run this command from the Logstash directory:

bin/logstash --setup

Logstash reads the azure module settings from logstash.yml and starts a pipeline for ingestion from Azure Event Hubs. The --setup option creates an azure-* index pattern in Elasticsearch and imports Kibana dashboards and visualizations.

Subsequent starts

Run this command from the Logstash directory:

bin/logstash

The --setup option is intended only for first-time setup. If you include --setup on subsequent runs, your existing Kibana dashboards will be overwritten.

Explore your data

When the Logstash Azure module starts receiving events, you can begin using the packaged Kibana dashboards to explore and visualize your data.

To explore your data with Kibana:

  1. Open a browser to http://localhost:5601 (username: "elastic"; password: "YOUR_PASSWORD")
  2. Click Dashboard.
  3. Click [Azure Monitor] Overview.

Configuration options

All Event Hubs options are common to both basic and advanced configurations, with the following exceptions. The basic configuration uses event_hub_connections to support multiple connections. The advanced configuration uses event_hubs and event_hub_connection (singular).

event_hubs

  • Value type is array
  • No default value
  • Ignored for basic and command line configuration
  • Required for advanced configuration

Defines the per Event Hub configuration for the advanced configuration.

The advanced configuration uses event_hub_connection instead of event_hub_connections. The event_hub_connection option is defined per Event Hub.

event_hub_connections

  • Value type is array
  • No default value
  • Required for basic and command line configuration
  • Ignored for advanced configuration

List of connection strings that identify the Event Hubs to be read. Connection strings include the EntityPath for the Event Hub.

checkpoint_interval

  • Value type is number
  • Default value is 5 seconds
  • Set to 0 to disable.

Interval in seconds to write checkpoints during batch processing. Checkpoints tell Logstash where to resume processing after a restart. Checkpoints are automatically written at the end of each batch, regardless of this setting.

Writing checkpoints too frequently can slow down processing unnecessarily.
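
For illustration, a basic configuration fragment that lengthens the interval might look like the sketch below. The value 10 is an arbitrary example, not a recommendation, and the option is written with the var.input.azure_event_hubs. prefix that the basic configuration requires.

```yaml
modules:
  - name: azure
    # Write interval-based checkpoints every 10 seconds instead of the default 5.
    # Checkpoints at the end of each batch are still written regardless.
    var.input.azure_event_hubs.checkpoint_interval: 10
```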

consumer_group

  • Value type is string
  • Default value is $Default

Consumer group used to read the Event Hub(s). Create a consumer group specifically for Logstash. Then ensure that all instances of Logstash use that consumer group so that they can work together properly.

decorate_events

  • Value type is boolean
  • Default value is false

Adds metadata about the Event Hub, including Event Hub name, consumer_group, processor_host, partition, offset, sequence, timestamp, and event_size.

initial_position

  • Value type is string
  • Valid arguments are beginning, end, look_back
  • Default value is beginning

When first reading from an Event Hub, start from this position:

  • beginning reads all pre-existing events in the Event Hub
  • end does not read any pre-existing events in the Event Hub
  • look_back reads end minus a number of seconds worth of pre-existing events. You control the number of seconds using the initial_position_look_back option.

If storage_connection is set, the initial_position value is used only the first time Logstash reads from the Event Hub.

initial_position_look_back

  • Value type is number
  • Default value is 86400
  • Used only if initial_position is set to look_back

Number of seconds to look back to find the initial position for pre-existing events. This option is used only if initial_position is set to look_back. If storage_connection is set, this configuration applies only the first time Logstash reads from the Event Hub.
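
A sketch of how the two options combine in the basic configuration, assuming you want roughly the last hour of pre-existing events on the first read. The 3600 value is an example, and the storage connection string is the elided placeholder used in the samples earlier in this topic.

```yaml
modules:
  - name: azure
    # On the first read, start 3600 seconds (one hour) before the end
    var.input.azure_event_hubs.initial_position: "look_back"
    var.input.azure_event_hubs.initial_position_look_back: 3600
    # With storage_connection set, both options apply only to the first read;
    # later restarts resume from the stored offsets
    var.input.azure_event_hubs.storage_connection: "DefaultEndpointsProtocol=https;AccountName=instance1..."
```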

max_batch_size

  • Value type is number
  • Default value is 125

Maximum number of events retrieved and processed together. A checkpoint is created after each batch. Increasing this value may help with performance, but requires more memory.

storage_connection

  • Value type is string
  • No default value

Connection string for blob account storage. Blob account storage persists the offsets between restarts, and ensures that multiple instances of Logstash process different partitions. When this value is set, restarts resume where processing left off. When this value is not set, the initial_position value is used on every restart.

We strongly recommend that you define this value for production environments.

storage_container

  • Value type is string
  • Defaults to the Event Hub name if not defined

Name of the storage container used to persist offsets and allow multiple instances of Logstash to work together.

To avoid overwriting offsets, you can use different storage containers. This is particularly important if you are monitoring two Event Hubs with the same name. You can use the advanced configuration model to configure different storage containers.

threads

  • Value type is number
  • Minimum value is 2
  • Default value is 16

Total number of threads used to process events. The value you set here applies to all Event Hubs. Even with advanced configuration, this value is a global setting, and can’t be set per event hub.

The number of threads should be the number of Event Hubs plus one or more. See Best practices for more information.

Common options

The following configuration options are supported by all modules:

var.elasticsearch.hosts
  • Value type is uri
  • Default value is "localhost:9200"

Sets the host(s) of the Elasticsearch cluster. For each host, you must specify the hostname and port. For example, "myhost:9200". If given an array, Logstash will load balance requests across the hosts specified in the hosts parameter. It is important to exclude dedicated master nodes from the hosts list to prevent Logstash from sending bulk requests to the master nodes. So this parameter should only reference either data or client nodes in Elasticsearch.

Any special characters present in the URLs here MUST be URL escaped! This means # should be put in as %23 for instance.

var.elasticsearch.username
  • Value type is string
  • Default value is "elastic"

The username to authenticate to a secure Elasticsearch cluster.

var.elasticsearch.password
  • Value type is string
  • Default value is "changeme"

The password to authenticate to a secure Elasticsearch cluster.

var.elasticsearch.ssl.enabled
  • Value type is boolean
  • There is no default value for this setting.

Enable SSL/TLS secured communication to the Elasticsearch cluster. Leaving this unspecified will use whatever scheme is specified in the URLs listed in hosts. If no explicit protocol is specified, plain HTTP will be used. If SSL is explicitly disabled here, the plugin will refuse to start if an HTTPS URL is given in hosts.

var.elasticsearch.ssl.verification_mode
  • Value type is string
  • Default value is "strict"

The hostname verification setting when communicating with Elasticsearch. Set to disable to turn off hostname verification. Disabling this has serious security concerns.

var.elasticsearch.ssl.certificate_authority
  • Value type is string
  • There is no default value for this setting

The path to an X.509 certificate to use to validate SSL certificates when communicating with Elasticsearch.

var.elasticsearch.ssl.certificate
  • Value type is string
  • There is no default value for this setting

The path to an X.509 certificate to use for client authentication when communicating with Elasticsearch.

var.elasticsearch.ssl.key
  • Value type is string
  • There is no default value for this setting

The path to the certificate key for client authentication when communicating with Elasticsearch.
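
Taken together, the Elasticsearch options above might be combined as in the following sketch for a TLS-secured cluster. The host name and certificate path are hypothetical placeholders, and YOUR_PASSWORD stands in for a real credential.

```yaml
modules:
  - name: azure
    # Hypothetical host; the https scheme in the URL selects TLS
    var.elasticsearch.hosts: "https://es.example.com:9200"
    var.elasticsearch.username: "elastic"
    var.elasticsearch.password: "YOUR_PASSWORD"
    var.elasticsearch.ssl.enabled: true
    # Keep hostname verification on (the default)
    var.elasticsearch.ssl.verification_mode: "strict"
    # Hypothetical path to the certificate used to validate the cluster's SSL certificate
    var.elasticsearch.ssl.certificate_authority: "/path/to/ca.pem"
```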

var.kibana.host
  • Value type is string
  • Default value is "localhost:5601"

Sets the hostname and port of the Kibana instance to use for importing dashboards and visualizations. For example: "myhost:5601".

var.kibana.scheme
  • Value type is string
  • Default value is "http"

Sets the protocol to use for reaching the Kibana instance. The options are: "http" or "https". The default is "http".

var.kibana.username
  • Value type is string
  • Default value is "elastic"

The username to authenticate to a secured Kibana instance.

var.kibana.password
  • Value type is string
  • Default value is "changeme"

The password to authenticate to a secure Kibana instance.

var.kibana.ssl.enabled
  • Value type is boolean
  • Default value is false

Enable SSL/TLS secured communication to the Kibana instance.

var.kibana.ssl.verification_mode
  • Value type is string
  • Default value is "strict"

The hostname verification setting when communicating with Kibana. Set to disable to turn off hostname verification. Disabling this has serious security concerns.

var.kibana.ssl.certificate_authority
  • Value type is string
  • There is no default value for this setting

The path to an X.509 certificate to use to validate SSL certificates when communicating with Kibana.

var.kibana.ssl.certificate
  • Value type is string
  • There is no default value for this setting

The path to an X.509 certificate to use for client authentication when communicating with Kibana.

var.kibana.ssl.key
  • Value type is string
  • There is no default value for this setting

The path to the certificate key for client authentication when communicating with Kibana.
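
The Kibana options follow the same pattern. This sketch assumes a Kibana instance served over HTTPS; the host name and certificate path are hypothetical placeholders.

```yaml
modules:
  - name: azure
    # Hypothetical Kibana host and port
    var.kibana.host: "kibana.example.com:5601"
    var.kibana.scheme: "https"
    var.kibana.username: "elastic"
    var.kibana.password: "YOUR_PASSWORD"
    var.kibana.ssl.enabled: true
    # Hypothetical path to the certificate used to validate Kibana's SSL certificate
    var.kibana.ssl.certificate_authority: "/path/to/ca.pem"
```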

Azure module schema

This module reads data from the Azure Event Hub and adds some additional structure to the data for Activity Logs and SQL Diagnostics. The original data is always preserved and any data added or parsed will be namespaced under azure. For example, azure.subscription may have been parsed from a longer more complex URN.

Name                       Description                                                              Notes
azure.subscription         Azure subscription from which this data originates.                      Some Activity Log events may not be associated with a subscription.
azure.group                Primary type of data.                                                    Current values are either activity_log or sql_diagnostics
azure.category*            Secondary type of data specific to group from which the data originated
azure.provider             Azure provider
azure.resource_group       Azure resource group
azure.resource_type        Azure resource type
azure.resource_name        Azure resource name
azure.database             Azure database name, for display purposes                                SQL Diagnostics only
azure.db_unique_id         Azure database name that is guaranteed to be unique                      SQL Diagnostics only
azure.server               Azure server for the database                                            SQL Diagnostics only
azure.server_and_database  Azure server and database combined                                       SQL Diagnostics only

Notes:

  • Activity Logs can have the following categories: Administrative, ServiceHealth, Alert, Autoscale, Security
  • SQL Diagnostics can have the following categories: Metric, Blocks, Errors, Timeouts, QueryStoreRuntimeStatistics, QueryStoreWaitStatistics, DatabaseWaitStatistics, SQLInsights

Microsoft documents the Activity log schema and the SQL Diagnostics data in its Azure documentation. Elastic does not own these data models, and as such, cannot make any assurances of information accuracy or passivity.

Special note - Properties field

Many of the logs contain a properties top level field. This is often where the most interesting data lives. There is not a fixed schema between log types for properties fields coming from different sources.

For example, one log may set properties.type as a String and another may set it as an Integer. To avoid mapping errors, the original properties field is moved to <azure.group>_<azure.category>_properties.<original_key>. For example, properties.type may end up as sql_diagnostics_Errors_properties.type or activity_log_Security_properties.type, depending on the group/category where the event originated.

Deploying the module in production

Use security best practices to secure your configuration. See Secure a cluster for details and recommendations.

Microsoft Azure resources

Microsoft is the best source for the most up-to-date Azure information.