Kibana Alerts

The Elastic Stack monitoring features provide Kibana alerts out of the box to notify you of potential issues in the Elastic Stack. These alerts are preconfigured based on the best practices recommended by Elastic. However, you can tailor them to meet your specific needs.

When you open Stack Monitoring, the preconfigured Kibana alerts are created automatically. If you collect monitoring data from multiple clusters, these alerts can search, detect, and notify on various conditions across the clusters. The alerts are visible alongside your existing Watcher cluster alerts. You can view details about the alerts that are active and view health and performance data for Elasticsearch, Logstash, and Beats in real time, as well as analyze past performance. You can also modify active alerts.

Kibana alerts in the Stack Monitoring app

To review and modify all the available alerts, use Alerts and Actions in Stack Management.

CPU threshold

This alert is triggered when a node has a consistently high CPU load. By default, the trigger condition is set at 85% or more averaged over the last 5 minutes. The alert is grouped across all the nodes of the cluster. Checks run every minute, with a re-notify interval of 1 day.
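
As a rough illustration of the default condition, the following Python sketch averages each node's CPU samples from the last 5 minutes and reports the nodes at or above 85%. The function name and the input shape are hypothetical, and this is not how Kibana implements the rule internally.

```python
from statistics import mean
from typing import Dict, List


def nodes_over_cpu_threshold(
    cpu_by_node: Dict[str, List[float]],
    threshold: float = 85.0,
) -> List[str]:
    """Return the nodes whose average CPU usage meets the threshold.

    cpu_by_node maps a node name to its CPU percentage samples from the
    last 5 minutes (hypothetical input shape, for illustration only).
    """
    firing = []
    for node, samples in cpu_by_node.items():
        # Nodes with no samples in the window are skipped, not treated as 0%.
        if samples and mean(samples) >= threshold:
            firing.append(node)
    return sorted(firing)
```

Because the condition uses an average, a short spike above 85% does not fire the alert unless the whole 5-minute window averages that high.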

Disk usage threshold

This alert is triggered when a node is nearly at disk capacity. By default, the trigger condition is set at 80% or more averaged over the last 5 minutes. The alert is grouped across all the nodes of the cluster. Checks run every minute, with a re-notify interval of 1 day.
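
Several alerts on this page pair a 1-minute check with a much longer re-notify interval. The sketch below shows one plausible reading of that pairing: the condition is evaluated on every check, but a notification is sent only if none has been sent within the re-notify interval. The function name and parameters are hypothetical, and the exact throttling behavior in Kibana is an assumption here.

```python
from datetime import datetime, timedelta
from typing import Optional


def should_notify(
    now: datetime,
    last_notified: Optional[datetime],
    renotify_interval: timedelta = timedelta(days=1),
) -> bool:
    """Decide whether an active alert should send a notification now.

    Assumption: the first detection always notifies, and later checks stay
    quiet until the re-notify interval has elapsed.
    """
    if last_notified is None:
        return True
    return now - last_notified >= renotify_interval
```

With the defaults described above, a node that stays over the threshold all day is checked 1,440 times but produces a single notification.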

JVM memory threshold

This alert is triggered when a node has consistently high JVM memory usage. By default, the trigger condition is set at 85% or more averaged over the last 5 minutes. The alert is grouped across all the nodes of the cluster. Checks run every minute, with a re-notify interval of 1 day.

Missing monitoring data

This alert is triggered when any stack product's nodes or instances stop sending monitoring data. By default, the trigger condition is met when data has been missing for 15 minutes, looking back 1 day. The alert is grouped across all the nodes of the cluster. Checks run every minute, with a re-notify interval of 6 hours.
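
The interplay of the 15-minute gap and the 1-day lookback can be sketched as follows. This is an illustrative interpretation only: the function name and input shape are hypothetical, and it assumes that an instance last seen more than a day ago falls outside the lookback and is no longer reported.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def missing_instances(
    last_seen: Dict[str, datetime],
    now: datetime,
    missing_for: timedelta = timedelta(minutes=15),
    lookback: timedelta = timedelta(days=1),
) -> List[str]:
    """Return instances that were seen within the lookback window but have
    sent no monitoring data for at least `missing_for`.

    last_seen maps an instance name to the timestamp of its most recent
    monitoring document (hypothetical input shape).
    """
    missing = []
    for instance, seen in last_seen.items():
        gap = now - seen
        # Older than the lookback: assumed to be out of scope for the alert.
        if missing_for <= gap <= lookback:
            missing.append(instance)
    return sorted(missing)
```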

Thread pool rejections (search/write)

This alert is triggered when a node experiences thread pool rejections. By default, the trigger condition is set at 300 or more rejections over the last 5 minutes. The alert is grouped across all the nodes of the cluster. Checks run every minute, with a re-notify interval of 1 day. Thresholds can be set independently for search and write rejections.
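
To show what independent thresholds mean in practice, here is a minimal Python sketch that checks search and write rejection counts against separate limits. The names and the input shape are hypothetical; only the default of 300 comes from the description above.

```python
from typing import Dict, List, Optional

DEFAULT_REJECTION_THRESHOLD = 300  # default described above


def rejection_alerts(
    rejections: Dict[str, int],
    thresholds: Optional[Dict[str, int]] = None,
) -> List[str]:
    """Return the thread pool types whose rejection count over the last
    5 minutes meets that type's threshold.

    rejections maps "search" / "write" to a rejection count
    (hypothetical input shape). thresholds overrides the default per type.
    """
    limits = {
        "search": DEFAULT_REJECTION_THRESHOLD,
        "write": DEFAULT_REJECTION_THRESHOLD,
    }
    if thresholds:
        limits.update(thresholds)
    return [
        pool for pool in ("search", "write")
        if rejections.get(pool, 0) >= limits[pool]
    ]
```

For example, lowering only the write threshold to 100 makes a node with 120 write rejections fire, while the search threshold stays at 300.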

CCR read exceptions

This alert is triggered if a read exception is detected on any of the replicated clusters. The trigger condition is met if 1 or more read exceptions are detected in the last hour. The alert is grouped across all replicated clusters. Checks run every minute, with a re-notify interval of 6 hours.

Large shard size

This alert is triggered if a large average shard size (across associated primaries) is found on any of the specified index patterns. The trigger condition is met if an index’s average shard size is 55gb or higher in the last 5 minutes. The alert is grouped across all indices that match the default pattern of *. Checks run every minute, with a re-notify interval of 12 hours.
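
One way to read "average shard size across associated primaries" is the total size of an index's primary shards divided by the number of primaries. The sketch below applies that reading. The function name and input shape are hypothetical, and treating 1 gb as 1024^3 bytes is an assumption based on the usual Elasticsearch byte-size convention.

```python
from typing import Dict, List, Tuple

BYTES_PER_GB = 1024 ** 3  # assumption: "gb" means 1024^3 bytes


def large_shard_indices(
    indices: Dict[str, Tuple[int, int]],
    threshold_gb: float = 55.0,
) -> List[str]:
    """Return indices whose average primary shard size meets the threshold.

    indices maps an index name to (primary_store_bytes, primary_shard_count),
    a hypothetical input shape chosen for this illustration.
    """
    flagged = []
    for name, (primary_bytes, primary_count) in indices.items():
        if primary_count <= 0:
            continue  # nothing to average
        average_gb = primary_bytes / primary_count / BYTES_PER_GB
        if average_gb >= threshold_gb:
            flagged.append(name)
    return sorted(flagged)
```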

Cluster alerts

These alerts summarize the current status of your Elastic Stack. You can drill down into the metrics to view more information about your cluster and specific nodes, instances, and indices.

An alert is triggered if any of the following conditions are met within the last minute:

  • Elasticsearch cluster health status is yellow (missing at least one replica) or red (missing at least one primary).
  • Elasticsearch version mismatch. You have Elasticsearch nodes with different versions in the same cluster.
  • Kibana version mismatch. You have Kibana instances with different versions running against the same Elasticsearch cluster.
  • Logstash version mismatch. You have Logstash nodes with different versions reporting stats to the same monitoring cluster.
  • Elasticsearch nodes changed. You have Elasticsearch nodes that were recently added or removed.
  • Elasticsearch license expiration. The cluster’s license is about to expire.

    If you do not preserve the data directory when upgrading a Kibana or Logstash node, the instance is assigned a new persistent UUID and shows up as a new instance.

  • Subscription license expiration. When the expiration date approaches, you will get notifications with a severity level relative to how soon the expiration date is:

    • 60 days: Informational alert
    • 30 days: Low-level alert
    • 15 days: Medium-level alert
    • 7 days: Severe-level alert

      The 60-day and 30-day thresholds are skipped for Trial licenses, which are only valid for 30 days.
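
The license expiration thresholds listed above can be expressed as a small lookup. This is a sketch only: the function name is hypothetical, and treating each threshold as inclusive ("days remaining is at or below the threshold") is an assumption.

```python
from typing import Optional

# (days remaining, severity, skipped for Trial licenses), most urgent first.
_LICENSE_THRESHOLDS = [
    (7, "severe", False),
    (15, "medium", False),
    (30, "low", True),
    (60, "informational", True),
]


def license_severity(days_remaining: int, is_trial: bool = False) -> Optional[str]:
    """Map the days left on a license to an alert severity.

    Returns None when no notification applies, including the 60-day and
    30-day thresholds that are skipped for Trial licenses.
    """
    for limit, severity, skipped_for_trial in _LICENSE_THRESHOLDS:
        if days_remaining <= limit:
            if is_trial and skipped_for_trial:
                return None
            return severity
    return None
```

Under these assumptions, a Trial license first produces a notification when 15 days remain, while other licenses start at 60 days.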

Some action types are subscription features, while others are free. For a comparison of the Elastic subscription levels, see the alerting section of the Subscriptions page.