How to set up monitoring

Learn how to configure your deployments for observability, which includes metric and log collection, troubleshooting views, and cluster alerts to automate performance monitoring.

These steps help set you up for success by making monitoring readily available and by automating alerts.

Before you begin

As you manage, monitor, and troubleshoot your deployment, make sure you understand the shared responsibilities between Elastic and yourself, so you know what you need to do to keep your deployments running smoothly.

You may also consider subscribing to incident notices reported on the Elasticsearch Service status page.

Enable logs and metrics

After you have created a new deployment, you should enable shipping logs and metrics to a monitoring deployment:

  1. Go to the Deployments page in Elastic Cloud.
  2. Find your deployment and go to the Logs and Metrics page.
  3. Select Enable.
  4. Choose where to send your logs and metrics.

    Anything used for production should go to a separate deployment you create only for monitoring. For development or testing, you can send monitoring data to the same deployment. Check Enable logging and monitoring.

  5. Select Save.

Optionally, turn on audit logging to capture security-related events, such as authentication failures, refused connections, and data-access events through the proxy. To turn on audit logging, edit your deployment’s elasticsearch.yml file to add these lines:

xpack.security.audit.enabled: true
# xpack.security.audit.logfile.events.include: _all
# xpack.security.audit.logfile.events.emit_request_body: true

The last two lines are commented out for now and left as placeholders so that you can turn them on easily later. These two settings generate large logs, but they can be helpful to enable temporarily when you troubleshoot traffic request bodies.
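
After the change is applied, you can optionally confirm that the audit settings are in place from your deployment's Elasticsearch API console. This is a minimal sketch; the filter_path expression is an assumption about how the nested node settings are rendered, and you can drop it to see the full settings output:

GET _nodes/settings?filter_path=nodes.*.settings.xpack.security.audit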

View your deployment health

From the monitoring deployment, you can now view your deployment’s health in Kibana using Stack Monitoring:

  1. Select the Kibana link for your monitoring deployment.
  2. From the app menu or the search bar, open Stack Monitoring.

    Stack Monitoring comes with many out-of-the-box rules, but you need to enable them when prompted.

To learn more about what Elasticsearch monitoring metrics are available, take a look at the different tabs. For example:

  • The Overview tab includes information about the search and indexing performance of Elasticsearch and also provides log entries.
  • The Nodes tab can help you monitor cluster CPU performance, JVM strain, and free disk space; a quick API alternative is sketched below.
Node tab in Kibana under Stack Monitoring
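
If you want a quick, text-only view of similar node-level metrics, you can also query the cat nodes API from the Elasticsearch API console. This is a sketch; the column names assume a recent Elasticsearch version, so adjust them if your version reports different headers:

GET _cat/nodes?v&h=name,cpu,heap.percent,disk.used_percent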

Some performance metrics are also available directly in the Elasticsearch Service Console and don’t require looking at your monitoring deployment. If you’re ever in a rush to determine if there is a performance problem, you can get a quick overview by going to the Performance page from your deployment menu:

Performance page of the Elastic Cloud console

Check the logs

If you suspect a performance issue, you can use your monitoring deployment to investigate what is going on in Kibana:

  • Through Observability > Logs > Stream: This page shows errors in real time and surfaces the same logs that Elastic Support reviews when a deployment experiences issues. Check Tail log files.
  • Through Discover: This page is a good option for investigating widespread historical patterns. Check Discover.

    Discover requires a quick setup in Kibana:

    1. Go to Stack Management > Data Views (formerly Index Patterns).
    2. Create a data view for elastic-cloud-logs* and set Timestamp field to @timestamp (you can first confirm that the log indices exist with the query below):

      Create data view example in Kibana
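
    Before creating the data view, you can optionally confirm that the log indices exist by listing them from the monitoring deployment's Elasticsearch API console (a quick sanity check, not a required step):

    GET _cat/indices/elastic-cloud-logs*?v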

Navigate to the Discover or Stream pages to check if you’ve misconfigured your SAML authentication setup by filtering for WARN and ERROR level logs and viewing the specific message fields, for example:

Log error in Stream page showing failed SAML authentication
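
If you prefer to query the log indices directly, a rough equivalent of that filter, run against the monitoring deployment, might look like the following sketch. The log.level and message field names and the WARN/ERROR values are assumptions based on typical ECS-formatted logs; check the fields in your data view if they differ:

GET elastic-cloud-logs*/_search
{
  "size": 20,
  "sort": [{ "@timestamp": "desc" }],
  "query": {
    "bool": {
      "filter": [
        { "terms": { "log.level": ["WARN", "ERROR"] } },
        { "match": { "message": "saml" } }
      ]
    }
  }
}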

You can also use this page to test how problematic proxy traffic requests show up in audit logs. To illustrate, create a spurious test request from the Elasticsearch API console:

Elasticsearch API console showing a spurious request that fails

You will get this request reported as a new log entry. Audit logs do not currently report the HTTP response status code, but they do report a correlating event.action column:

New log entry that shows failed spurious request issued from the API console
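
To see which event.action values your audit logs report, you could run a terms aggregation against the monitoring deployment. This is a sketch that assumes event.action is mapped as a keyword field, which is typical for ECS-formatted logs:

GET elastic-cloud-logs*/_search
{
  "size": 0,
  "query": { "exists": { "field": "event.action" } },
  "aggs": {
    "audit_actions": { "terms": { "field": "event.action" } }
  }
}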

Get notified

Take advantage of the default Elastic Stack monitoring alerts that are available out of the box. You don't have to do anything other than enable shipping logs and metrics (which you did earlier) to make them available to you.

On top of these default alerts, which write to indices you can investigate, you might want to add custom actions, such as a connector for Slack notifications. To set up these notifications, first configure a Slack connector and then append it to the default alerts and actions. From Kibana:

  1. Go to Stack Management > Rules and Connectors > Connectors and create your Slack connector:

    1. Select Slack.
    2. Create a Slack Webhook URL and paste it into the Webhook URL field.
    3. Select Save.
  2. Go to Stack Monitoring and select Enter setup mode.
  3. Edit an alert rule, such as CPU usage:

    1. Select one of the alert rule fields and select CPU Usage.
    2. Choose Edit rule and scroll down to the bottom of the screen to select Slack.
    3. Optional: Set up a customized message that helps you identify what the notification is for.
    4. Select Save.
    Alert rule example showing settings to send a Slack notification based on CPU Usage

Now, when your CPU usage alert goes off, you will also get a Slack notification to investigate if your cluster is experiencing a traffic blip or if you need to scale out. (You can automate the latter with deployment autoscaling.)

Keep monitoring

As a managed service, Elastic Cloud is here to help you manage the maintenance and upkeep. As part of your responsibilities, you should monitor deployment health on an ongoing basis. There are two main activities to perform:

  • Review the deployment logs
  • Act on automated alerts

When issues come up that you need to troubleshoot, you’ll frequently start with the same queries to determine which rabbit hole to investigate further, such as _cluster/health to determine overall deployment health.

Elasticsearch API console showing queries useful for monitoring
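
For example, a few commonly used diagnostic requests look like the following. This is a sketch rather than an exhaustive list; _cluster/allocation/explain in particular is only useful while the cluster has unassigned shards:

GET _cluster/health
GET _cat/nodes?v
GET _cat/indices?v&health=red
GET _cluster/allocation/explain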

You can run this query and many others from the API consoles available via:

  • The Elasticsearch API console in the Elastic Cloud console
  • Kibana > Dev Tools

You can also learn more about the queries you should run for your deployment by reading our blog Managing and Troubleshooting Elasticsearch Memory.