Enhance threat response with custom knowledge sources for Elastic AI Assistant

November 13, 2024

As we’ve seen over the past year and a half, generative AI has been an extremely beneficial addition to security analyst workflows. Large language models (LLMs) are a tremendous knowledge resource on all things cybersecurity and can be asked virtually any question relating to a security analyst’s workflow.

We’ve seen incredible results from customers using the Elastic AI Assistant within their security operations workflows, enabling efficient operations and increased productivity.

However, LLMs fall short when it comes to answering questions about public content that falls outside their training cutoff date or questions related to private data sources.

There are various strategies for using custom knowledge sources with LLMs, but most of them, such as fine-tuning or instruction tuning, are fairly expensive or time-consuming. The resulting models also have a short shelf life: because new data arrives constantly, they become outdated almost as soon as they are produced.

Elastic's Search AI Platform to the rescue

At Elastic, we’re able to take a different approach to solve this problem for users of the Elastic AI Assistant. Because the assistant is built on top of the Search AI Platform, we can use a technique called retrieval augmented generation (RAG) to supplement the knowledge of LLMs with content contained within a user's Elasticsearch cluster. More importantly, we’re able to build workflows that let security operations teams use RAG in a simple, intuitive way, without the need for external tools, code, or scripts.

This allows teams to easily bridge the gap between their private data sources and LLMs in a secure, flexible, and scalable way.

How does it work?

When additional knowledge sources are made available to the Elastic AI Assistant, it draws on them according to the question a user asks. The assistant identifies whether a knowledge source needs to be searched before handing the query off to the chosen LLM, so the LLM gets the context it needs to answer the user’s question.

The Search AI Platform features allow the correct content to be searched and retrieved based on the intent and semantics of the user’s question. This is important because incorrect content will lead an LLM to provide an incorrect response, and sending too much content will end up being costly and ineffective. It’s also important to retrieve only data that the user is authorized to access. Custom knowledge sources should not be considered “free for all” and should respect role-based access control (RBAC) policies just like any other data source.
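To make the flow above concrete, here is a minimal, self-contained sketch of the retrieve-then-prompt pattern. It is not the Elastic AI Assistant's implementation: the KnowledgeEntry class, the role names, the sample texts, and the word-overlap scoring (a crude stand-in for semantic search) are all made up for illustration. It only shows the three ideas described above: filter by access first, rank by relevance, and cap how much context is sent to the LLM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgeEntry:
    """Hypothetical knowledge entry; not an Elastic data structure."""
    title: str
    text: str
    allowed_roles: frozenset  # roles permitted to read this entry


def tokenize(text):
    # Crude lowercase word split; a stand-in for real semantic matching.
    return {word.strip(".,?!:;()").lower() for word in text.split()} - {""}


def retrieve(question, entries, user_roles, max_entries=2):
    """Return the best-matching entries the user is allowed to read."""
    question_tokens = tokenize(question)
    scored = []
    for entry in entries:
        # Access control is applied before relevance is even considered.
        if not (entry.allowed_roles & user_roles):
            continue
        score = len(question_tokens & tokenize(entry.text))
        if score > 0:
            scored.append((score, entry))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Cap the amount of context so the prompt stays small.
    return [entry for _, entry in scored[:max_entries]]


def build_prompt(question, context_entries):
    """Combine retrieved context and the question into one LLM prompt."""
    if not context_entries:
        return question
    context = "\n".join(f"[{e.title}] {e.text}" for e in context_entries)
    return f"Context:\n{context}\n\nQuestion: {question}"


# Invented sample entries, for illustration only.
entries = [
    KnowledgeEntry("On-call schedule",
                   "Sample on-call rota for the analyst team.",
                   frozenset({"soc_analyst"})),
    KnowledgeEntry("Threat report",
                   "Sample notes on credential access techniques.",
                   frozenset({"soc_analyst", "threat_hunter"})),
    KnowledgeEntry("Restricted case notes",
                   "Sample restricted notes on credential misuse.",
                   frozenset({"hr_investigator"})),
]

question = "Which credential access techniques should we hunt for?"
context = retrieve(question, entries, user_roles=frozenset({"threat_hunter"}))
prompt = build_prompt(question, context)
print(prompt)
```

In this toy run, the restricted entry mentions credentials but is never considered, because the user's role does not grant access to it.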

[Figure: Elastic AI Assistant, behind the scenes]

Adding knowledge sources for the Elastic AI Assistant

Custom knowledge sources can take the form of a simple text or markdown entry as well as an index that has been configured with a semantic text field. The new knowledge settings user interface makes the process of adding custom knowledge sources a breeze, allowing you to configure the content and the sharing settings for that knowledge.

In addition, users can now ask Elastic AI Assistant to remember content as knowledge during a conversation. Simply ask the Elastic AI Assistant what you would like remembered, and it will be available as a custom knowledge source going forward.

Some examples of how custom knowledge sources can be used:

Attaching an index containing asset information, such as content found in a configuration management database (CMDB)
Adding your favorite threat intelligence reports to be used during a conversation
Documents containing any existing threat hunting playbooks or standard operating procedures
Historical incident or case information
On-call schedules

Examples

Adding a threat intelligence report PDF as custom knowledge

Security operations teams often maintain repositories of threat intelligence reports that contain a wealth of knowledge from the vendor producing the report. The challenge, however, is that the content of these reports typically sits in PDFs, making it difficult to retrieve and reference relevant information from the report during an incident or investigation or leverage any indicators of compromise (IoCs) for threat hunting. With the ability to use these reports as knowledge within the Elastic AI Assistant, this dynamic changes entirely.

Let’s use the Elastic Global Threat Report for 2024 as an example.

Step 1. Enabling and setting up the knowledge base
This is a very simple step that takes care of some of the prerequisites necessary for the knowledge base content to be used by Elastic AI Assistant. It’s a single button in the assistant management settings. The process only takes a few minutes to complete.

Step 2. Uploading the PDF
Once the knowledge base setup is complete, we can proceed to upload the PDF. To do this, we can use the integration titled Upload a file from the Integrations page.

You can select the PDF from the next screen.

Click Import when prompted.

For the next step, we will need to pivot to the Advanced tab. Once uploaded, this PDF will live in its own index, so feel free to name the index accordingly. There is no need to create a data view.

There is one last step before clicking on the import button. We need to add a semantic text field. This allows the assistant to retrieve the correct information from the report.

Click on Add additional field and then Add semantic text field.

You can leave the default settings that appear after clicking Add semantic text field.
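If you prefer to create the index through the API instead of the File Upload interface, the mapping could look roughly like the sketch below. This is an assumption about an equivalent setup, not an export of what the interface generates. The semantic_text field type is part of Elasticsearch, but the inference endpoint ID shown here is hypothetical and must match an endpoint that exists in your cluster (some versions let you omit it and fall back to a default).

```python
import json

INDEX_NAME = "global-threat-report-kb"  # index name used in this walkthrough
INFERENCE_ID = "my-elser-endpoint"      # hypothetical inference endpoint ID

# Request body for the create index API (PUT /<index>).
index_body = {
    "mappings": {
        "properties": {
            "content": {
                "type": "semantic_text",
                "inference_id": INFERENCE_ID,
            }
        }
    }
}

# Print the request so it can be pasted into Dev Tools or sent with any client.
print(f"PUT /{INDEX_NAME}")
print(json.dumps(index_body, indent=2))
```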

You can now click on Import.

When the file is imported successfully, you should see the following status:

It’s important to note that while we used the File Upload user interface to add this PDF, it’s possible to automate this functionality as part of any ingest process using the attachment processor.
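As a rough sketch of that automation, the snippet below builds an ingest pipeline body that uses the attachment processor, plus a document carrying a base64-encoded file. The pipeline ID, the data and filename field names, and the placeholder bytes are assumptions for illustration. The set processor that copies the extracted text into the content field is one possible way to feed the semantic text field, and we have not verified it against every Elasticsearch version.

```python
import base64
import json

PIPELINE_ID = "threat-report-pdf"       # hypothetical pipeline ID
INDEX_NAME = "global-threat-report-kb"  # index name used in this walkthrough

# Request body for PUT /_ingest/pipeline/<id>.
pipeline_body = {
    "description": "Extract text from a base64-encoded PDF",
    "processors": [
        # Parse the binary in "data" and write the text to attachment.content.
        {"attachment": {"field": "data", "remove_binary": True}},
        # Copy the extracted text into the field used for semantic search.
        {
            "set": {
                "field": "content",
                "copy_from": "attachment.content",
                "ignore_empty_value": True,
            }
        },
    ],
}


def build_document(pdf_bytes, filename):
    """Wrap raw file bytes in the shape the pipeline above expects."""
    return {
        "filename": filename,
        "data": base64.b64encode(pdf_bytes).decode("ascii"),
    }


# Placeholder bytes; in practice, read the real PDF from disk.
doc = build_document(b"%PDF-1.7 placeholder bytes", "report.pdf")

print(f"PUT /_ingest/pipeline/{PIPELINE_ID}")
print(json.dumps(pipeline_body, indent=2))
print(f"POST /{INDEX_NAME}/_doc?pipeline={PIPELINE_ID}")
print(json.dumps({**doc, "data": doc["data"][:16] + "..."}, indent=2))
```

The last print truncates the encoded payload for readability only; the full doc dictionary is what would be sent as the document body.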

Step 3. Adding the PDF index as custom knowledge
Returning to the AI Settings page, select New to add a new knowledge entry, and then select Index from the list.

You’ll then be asked to select the index that was just created (“global-threat-report-kb” in our example) and the semantic text field we just created (content), and to provide a description of how and when the assistant should use this knowledge. This should be a simple, one-sentence description of what the data is and when and how it should be queried. You can also set the relevant permissions for this knowledge entry from this view. When ready, hit Save.

Once added, you should see the new knowledge entry in the list:

The threat report is now available as knowledge and is ready to be used by the assistant.
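If you want to sanity-check the index outside the assistant, one option is to run a semantic query against the content field yourself. The sketch below only builds the request body. The semantic query type is part of Elasticsearch, but the question text and result size are arbitrary examples, and this is not necessarily the query the assistant issues behind the scenes.

```python
import json

INDEX_NAME = "global-threat-report-kb"  # index name used in this walkthrough

# Request body for POST /<index>/_search.
search_body = {
    "size": 3,  # keep the result set small, as you would for LLM context
    "query": {
        "semantic": {
            "field": "content",
            "query": "Which techniques are most common in cloud environments?",
        }
    },
}

print(f"POST /{INDEX_NAME}/_search")
print(json.dumps(search_body, indent=2))
```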

Comparing the results
If we compare results from the assistant before and after we add the knowledge base entry, we can see a clear difference.

Before the knowledge was added:

After the knowledge was added:

Our PDF went from being an idle bit of important — yet hard-to-use — information to being immediately accessible to our security operations team. The great thing about knowledge sources is that the Elastic AI Assistant is able to use a combination of them, depending on the questions asked. Remember that the Elastic AI Assistant can also ingest 500 of your latest alerts as knowledge by default, which allows for a powerful combination of questions that can be asked.

Below is an example of that in action — we’ll use the assistant to ask about a specific process or technique highlighted in our threat reports and perform a follow-up check to see if we’ve been impacted by similar behavior:

This one example clearly highlights the usefulness of having custom knowledge sources available to the assistant. And as we highlighted earlier, there are many other scenarios and examples of where custom knowledge sources can be useful.

For more information on how to add different types of knowledge sources, you can refer to our detailed documentation.

What’s next?

We expect to add the ability to use custom knowledge in our other AI features, such as Elastic Attack Discovery and Automatic Import. We’ll also be making it easier to use existing search connectors to continuously import and synchronize knowledge from systems such as GitHub, Confluence, Jira, ServiceNow, and many others.

Ready to try this out with your own data? Get started with a 14-day free trial!

The release and timing of any features or functionality described in this post remain at Elastic's sole discretion. Any features or functionality not currently available may not be delivered on time or at all.
