Agi K Thomas

LLM Observability with the new Amazon Bedrock Integration in Elastic Observability

Elastic's new Amazon Bedrock integration for Observability provides comprehensive insights into Amazon Bedrock LLM performance and usage. Learn how real-time, LLM-based metric and log collection with pre-built dashboards can help you effectively monitor and resolve LLM invocation errors and performance challenges.

Elastic is expanding support for LLM observability with Elastic Observability's new Amazon Bedrock integration, which gives you comprehensive visibility into the performance and usage of foundation models on Amazon Bedrock. As more LLM-based applications are developed, it is essential for SREs and developers to monitor GenAI application performance along with LLM performance and cost. The new Amazon Bedrock integration offers an out-of-the-box experience by simplifying the collection of Amazon Bedrock metrics and logs, making it easier to gain actionable insights and effectively manage your models. The integration is simple to set up and ships with pre-built dashboards. With real-time insights, SREs can monitor, optimize, and troubleshoot LLM applications that use Amazon Bedrock.

This blog will walk through the features available to SREs, such as monitoring invocations, errors, and latency information across various models, along with the usage and performance of LLM requests. Additionally, the blog will show how easy it is to set up and what insights you can gain from Elastic for LLM Observability.

Prerequisites

To follow along with this blog, please make sure you have:

  1. An Elastic deployment (on Elastic Cloud or self-managed)
  2. An AWS account with access to Amazon Bedrock

Configuring Amazon Bedrock Logs Collection

To collect Amazon Bedrock logs, you can choose from the following options:

  1. S3 bucket
  2. CloudWatch logs

S3 Bucket Logs Collection: When collecting logs from the S3 bucket, you can retrieve logs from S3 objects pointed to by S3 notification events, which are read from an SQS queue, or by directly polling a list of S3 objects in an S3 bucket. Refer to Elastic’s Custom AWS Logs integration for more details.
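Both S3 collection modes are typically expressed as an `aws-s3` input in the agent configuration. The fragment below is a hypothetical sketch, with placeholder queue URL and bucket ARN; option names follow the Filebeat/Elastic Agent `aws-s3` input:

```yaml
# Option A: read S3 objects referenced by notification events on an SQS queue
- type: aws-s3
  queue_url: https://sqs.us-east-1.amazonaws.com/123456789012/bedrock-logs-queue

# Option B: poll the bucket directly at a fixed interval
- type: aws-s3
  bucket_arn: arn:aws:s3:::my-bedrock-logs-bucket
  bucket_list_interval: 300s
  number_of_workers: 5
```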

CloudWatch Logs Collection: In this option, you will need to create a CloudWatch log group. After creating the log group, be sure to note down the ARN of the newly created log group, as you will need it for the Amazon Bedrock settings configuration and Amazon Bedrock integration configuration for logs.

Configure the Amazon Bedrock CloudWatch logs with the Log group ARN to start collecting CloudWatch logs.
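Besides the console, invocation logging can also be enabled programmatically. The sketch below builds the logging configuration payload for the Bedrock control-plane API; the log group name and role ARN are placeholders, and the actual call (commented out) assumes configured AWS credentials:

```python
# Sketch of enabling Bedrock model-invocation logging to CloudWatch.
# The log group name and IAM role ARN below are placeholders.
logging_config = {
    "cloudWatchConfig": {
        "logGroupName": "/bedrock/invocation-logs",               # placeholder
        "roleArn": "arn:aws:iam::123456789012:role/BedrockLogs",  # placeholder
    },
    "textDataDeliveryEnabled": True,
    "imageDataDeliveryEnabled": True,
    "embeddingDataDeliveryEnabled": True,
}

# With credentials configured, the call would be:
# import boto3
# boto3.client("bedrock").put_model_invocation_logging_configuration(
#     loggingConfig=logging_config
# )
print(sorted(logging_config))
```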

In the AWS Console, navigate to the "Settings" section under Amazon Bedrock and select your preferred method of collecting logs. Based on the Logging destination you select in the Amazon Bedrock settings, enter either the S3 location or the CloudWatch log group ARN.

Configuring Amazon Bedrock Metrics Collection

Configure Elastic's Amazon Bedrock integration to collect Amazon Bedrock metrics from your chosen AWS region at the specified collection interval.
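Conceptually, the metrics side of the integration boils down to a region and a collection interval. The fragment below is illustrative only; the exact option names in the Fleet integration policy may differ:

```yaml
# Illustrative integration settings (option names are assumptions)
aws_bedrock:
  metrics:
    regions:
      - us-east-1
    period: 5m
```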

Maximize Visibility with Out-of-the-Box Dashboards

The Amazon Bedrock integration offers rich out-of-the-box visibility into the performance and usage of models in Amazon Bedrock, including text and image models. The Amazon Bedrock Overview dashboard provides a summarized view of invocations, errors, and latency across various models.

The Text / Chat metrics section in the Amazon Bedrock Overview dashboard provides insights into token usage for Text models in Amazon Bedrock. This includes use cases such as text content generation, summarization, translation, code generation, question answering, and sentiment analysis.

The Image metrics section in the Amazon Bedrock Overview dashboard offers valuable insights into the usage of Image models in Amazon Bedrock.

The Logs section of the Amazon Bedrock Overview dashboard provides detailed insights into the usage and performance of LLM requests. It enables you to monitor key details such as model name, version, LLM prompt and response, usage tokens, request size, completion tokens, response size, and any error codes tied to specific LLM requests.

The detailed logs provide full visibility into raw model interactions, capturing both the inputs (prompts) and the outputs (responses) generated by the models. This transparency enables you to analyze and optimize how your LLM handles different requests, allowing for more precise fine-tuning of both the prompt structure and the resulting model responses. By closely monitoring these interactions, you can refine prompt strategies and enhance the quality and reliability of model outputs.
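For orientation, an invocation log record looks roughly like the abridged, illustrative example below; the values are placeholders and the exact schema of Bedrock invocation logs may differ:

```json
{
  "timestamp": "2024-10-01T12:34:56Z",
  "operation": "InvokeModel",
  "modelId": "anthropic.claude-v2",
  "input":  { "inputTokenCount": 42,   "inputBodyJson":  { "prompt": "..." } },
  "output": { "outputTokenCount": 128, "outputBodyJson": { "completion": "..." } },
  "errorCode": null
}
```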

The Amazon Bedrock Overview dashboard also provides a comprehensive view of initial and final response times. It includes a percentage comparison graph that highlights the performance difference between these response stages, enabling you to quickly identify efficiency improvements or potential bottlenecks in your LLM interactions.

Creating Alerts and SLOs to Monitor Amazon Bedrock

As with any Elastic integration, Amazon Bedrock logs and metrics are fully integrated into Elastic Observability, allowing you to leverage features like SLOs, alerting, custom dashboards, and detailed logs exploration.

To create an alert, for example to monitor LLM invocation latency in Amazon Bedrock, you can apply a Custom Threshold rule on the Amazon Bedrock datastream. Set the rule to trigger an alert when the LLM invocation latency exceeds a defined threshold. This ensures proactive monitoring of model performance, allowing you to detect and address latency issues before they impact the user experience.
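Such a rule can also be created through Kibana's alerting HTTP API (`POST /api/alerting/rule`). The payload below is only an illustration of the shape of a Custom Threshold rule; the rule type ID, field names, data stream pattern, and latency unit are assumptions for the sketch, not a verified schema:

```python
# Illustrative Custom Threshold rule payload; all identifiers are assumptions.
rule = {
    "name": "Bedrock invocation latency too high",
    "rule_type_id": "observability.rules.custom_threshold",  # assumed ID
    "schedule": {"interval": "1m"},
    "params": {
        "criteria": [{
            "comparator": ">",
            "threshold": [10000],  # 10 s, assuming latency in milliseconds
            "metrics": [{
                "name": "A",
                "field": "aws.bedrock.invocation_latency",  # assumed field
                "aggType": "avg",
            }],
        }],
        "searchConfiguration": {
            "index": "metrics-aws_bedrock.*",  # assumed data stream pattern
            "query": {"query": "", "language": "kuery"},
        },
    },
}
print(rule["params"]["criteria"][0]["threshold"])
```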

When a violation occurs, the Alert Details view linked in the notification provides detailed context, including when the issue began, its current status, and any history of similar violations. This rich information enables rapid triaging, investigation, and root cause analysis to resolve issues efficiently.

Similarly, to create an SLO for monitoring Amazon Bedrock invocation performance, you can define a custom query SLI where good events are Amazon Bedrock invocations that do not result in client or server errors and have latency under 10 seconds. Set an appropriate SLO target, such as 99%. This helps you identify errors and latency issues in LLM-based applications, allowing you to take corrective action before they affect the overall user experience.
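The SLI described above reduces to a simple good-events ratio. The toy sketch below makes that arithmetic concrete; the event shapes and field names are made up for the example:

```python
# Toy illustration of the SLI: an invocation counts as "good" when it
# produced no client/server error and completed in under 10 seconds.
events = [
    {"error": None, "latency_s": 2.1},
    {"error": None, "latency_s": 14.8},                   # too slow -> bad
    {"error": "ThrottlingException", "latency_s": 0.4},   # errored  -> bad
    {"error": None, "latency_s": 5.3},
]

good = sum(1 for e in events if e["error"] is None and e["latency_s"] < 10)
sli = good / len(events)
target = 0.99  # the 99% SLO target from the text

print(f"SLI = {sli:.2%}, target = {target:.0%}, met = {sli >= target}")
```

Here 2 of 4 invocations are good, so the SLI is 50% and the 99% target is missed, burning error budget.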

The image below highlights the SLOs, SLIs, and the remaining error budget for Amazon Bedrock models. The observed violations are the result of deliberately crafted long text-generation prompts, which led to extended response times. This example demonstrates how the system tracks performance against defined targets, helping you quickly identify latency issues and performance bottlenecks. By monitoring these metrics, you gain valuable insights for proactive issue triaging, allowing for timely corrective actions and an improved user experience for applications using LLMs.

Try it out today

The Amazon Bedrock playgrounds provide a console environment to experiment with running inference on different models and configurations before deciding to use them in an application. Deploy a cluster on our Elasticsearch Service or download the Elastic Stack, spin up the new technical preview of the Amazon Bedrock integration, open the curated dashboards in Kibana, and start monitoring your Amazon Bedrock service!
