Preamble
In the first post of our Detonate series, we introduced the Detonate system and what we use it for at Elastic. We also discussed the benefits it provides our team when assessing the performance of our security artifacts.
In this publication, we will break down how Detonate works and dive deeper into the technical implementation. This includes how we create this sandboxed environment in practice, the technology that supports the overall pipeline, and how we submit information to and read information from the pipeline.
Interested in other posts on Detonate? Check out Part 1 - Click, Click…Boom!, where we introduce Detonate, explain why we built it, explore how it works, describe case studies, and discuss efficacy testing.
Architecture
Below is a high-level overview of the Detonate end-to-end architecture.
The overall system consists of a series of message queues and Python workers. Detonation tasks are created by an API server upon accepting a request with as little information as the sample file hash. The task then moves from queue to queue, picked up by workers that execute various operations along the way.
The server and workers run in a container on Amazon ECS. The pipeline can also be brought up locally using Docker Compose for early development and feature testing.
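Conceptually, a detonation task can be pictured as a small payload that accumulates state as workers hand it from queue to queue. The sketch below is purely illustrative; the field names are ours and not the production schema.

from dataclasses import dataclass
from typing import Optional

@dataclass
class DetonationTask:
    """Illustrative task payload; not the production schema."""
    sample_hash: Optional[str] = None   # a hash alone is enough to create a task
    command: Optional[str] = None       # alternatively, a raw bash/PowerShell command
    os: str = "windows"                 # target sandbox operating system
    status: str = "submitted"           # updated by workers as the task advances
    vm_name: Optional[str] = None       # set once a sandbox VM is provisioned
    results_url: Optional[str] = None   # set after post-detonation processing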
API server
The Detonate API server is a FastAPI Python application that accepts a variety of execution target requests: hashes of samples, native commands (in bash or PowerShell, with or without arguments), and uploaded files. The server also exposes endpoints for fetching alerts and raw agent telemetry from an Elastic cluster.
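As a rough sketch of what such an endpoint might look like (simplified and hypothetical; the route, fields, and helper below are not our production code):

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class HashSubmission(BaseModel):
    sample_hash: str        # e.g. the SHA-256 of the sample to detonate
    os: str = "windows"     # target sandbox operating system

def enqueue_detonation(submission: HashSubmission) -> str:
    # Placeholder: the real server pushes a task onto the first queue of the pipeline
    return "task-123"

@app.post("/tasks/hash")
def submit_hash(submission: HashSubmission):
    task_id = enqueue_detonation(submission)
    return {"task_id": task_id, "status": "submitted"}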
The API documentation is generated automatically by FastAPI and incorporated into our global API schema.
Interacting with the API server - CLI
We built a custom Python CLI (command-line interface) tool for interacting with our Detonate server. The CLI tool is built using the Python library click, along with rich for beautiful formatting in a terminal window. The tool is particularly useful for debugging the pipeline, as it can also be run against a local pipeline setup. The tool is installed and run using Poetry, our tool of choice for managing dependencies and running scripts.
❯ DETONATE_CLI_API_ROOT_URL="${API_ENDPOINT_URL}" \
DETONATE_CLI_API_AUTH_HEADER="${API_KEY}" \
poetry run cli \
--hash "${MY_FILE_HASH}"
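A heavily trimmed-down sketch of such a tool is shown below; the endpoint path and response handling are illustrative rather than the real CLI internals.

import click
import requests
from rich.console import Console

console = Console()

@click.command()
@click.option("--hash", "file_hash", help="Hash of the sample to detonate")
@click.option("--api-root", envvar="DETONATE_CLI_API_ROOT_URL")
@click.option("--api-key", envvar="DETONATE_CLI_API_AUTH_HEADER")
def cli(file_hash, api_root, api_key):
    """Submit a detonation task and pretty-print the server response."""
    response = requests.post(
        f"{api_root}/tasks/hash",                 # hypothetical endpoint path
        json={"sample_hash": file_hash},
        headers={"Authorization": api_key},
    )
    console.print(response.json())

if __name__ == "__main__":
    cli()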
Interacting with the API server - Web UI
Internally, we host a site called Protections Portal (written using Elastic UI components) to assist our team with research. For a more interactive experience with the Detonate API, we built a page in the Portal to interact with it. Along with submitting tasks, the Web UI allows users to see the feed of all detonations and the details of each task.
Each task can be expanded to see its full details. We provide the links to the data and telemetry collected during the detonation.
Interacting with the API server - HTTP client
If our users want to customize how they interact with the Detonate API, they can also run commands using their HTTP client of choice (such as curl, httpie, etc.). This allows them to add detonations to scripts or as the final step of their own workflows.
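For example, a few lines of Python (the endpoint paths and identifiers below are hypothetical) are enough to submit a sample at the end of some other workflow and wait for the result:

import time
import requests

API = "https://detonate.example.com/api"   # hypothetical API root
HEADERS = {"Authorization": "<api key>"}

# Submit a detonation as the final step of another workflow
task = requests.post(
    f"{API}/tasks/hash",
    json={"sample_hash": "<sha256-of-sample>"},
    headers=HEADERS,
).json()

# Poll until the pipeline reports the task as finished
while True:
    status = requests.get(f"{API}/tasks/{task['task_id']}", headers=HEADERS).json()
    if status["status"] in ("completed", "failed"):
        break
    time.sleep(30)

print(status)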
Queues
The pipeline is built on a series of queues and workers. Since we have very basic requirements for the message queue engine, we decided to go with Amazon SQS. One of the many benefits of using a popular service like SQS is the availability of open-source resources and libraries we can build upon. For example, we use softwaremill/elasticmq Docker images as the queue engine when running the pipeline locally.
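As a rough illustration (the queue name and endpoint below are hypothetical), boto3 talks to either real SQS or a local elasticmq container with nothing more than a different endpoint URL:

import os
import boto3

# Point boto3 at elasticmq when running the pipeline locally;
# leave SQS_ENDPOINT_URL unset to use the real Amazon SQS service.
endpoint = os.environ.get("SQS_ENDPOINT_URL")  # e.g. http://localhost:9324 for elasticmq

sqs = boto3.resource("sqs", region_name="us-east-1", endpoint_url=endpoint)
queue = sqs.get_queue_by_name(QueueName="detonate-start-vm")  # hypothetical queue name

# Producer side: hand a task to the next stage of the pipeline
queue.send_message(MessageBody='{"task_id": "task-123", "os": "windows"}')

# Consumer side: pick up work, process it, then delete the message
for message in queue.receive_messages(WaitTimeSeconds=10, MaxNumberOfMessages=1):
    print("received:", message.body)
    message.delete()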
The queues are configured and deployed with Terraform code that covers all our production and staging infrastructure.
Workers
Each worker is a Python script that acts as both a queue consumer and a queue producer. The workers are implemented in our custom mini-framework, with boilerplate code for error handling, retries, and monitoring built in. Our base worker is easily extended, allowing us to add new workers and evolve existing ones as additional requirements arise.
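The sketch below shows the general shape of such a worker. It is heavily simplified and hypothetical: the real framework also wraps each message in an Elastic APM transaction and reports failures, as described next.

import json
import time
import boto3

class BaseWorker:
    """Consume tasks from one queue, process them, and emit results to the next."""

    def __init__(self, in_queue: str, out_queue: str, max_retries: int = 3):
        sqs = boto3.resource("sqs", region_name="us-east-1")
        self.in_queue = sqs.get_queue_by_name(QueueName=in_queue)
        self.out_queue = sqs.get_queue_by_name(QueueName=out_queue)
        self.max_retries = max_retries

    def handle(self, task: dict) -> dict:
        # Subclasses override this with the actual work (start a VM, fetch a sample, ...)
        raise NotImplementedError

    def run_forever(self):
        while True:
            for message in self.in_queue.receive_messages(WaitTimeSeconds=10):
                task = json.loads(message.body)
                for attempt in range(1, self.max_retries + 1):
                    try:
                        result = self.handle(task)
                        self.out_queue.send_message(MessageBody=json.dumps(result))
                        message.delete()
                        break
                    except Exception:
                        # Simple backoff; if every attempt fails, the message is left on
                        # the queue and becomes visible again after its visibility timeout
                        time.sleep(2 ** attempt)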
For monitoring, we use the Elastic APM observability solution. It is incredibly powerful, giving us a view into the execution flow and making debugging pipeline issues a breeze. Below, we can see a Detonate task move between workers in the APM UI:
These software and infrastructure components give us everything we need to perform the submission, execution, and data collection that make up a detonation.
Detonations
The pipeline can execute commands and samples in Windows, Linux, and macOS virtual machines (VMs). For Windows and Linux environments, we use VM instances in Google Compute Engine. The wide selection of public images allows us to provision sandboxed environments with different versions of Windows, Debian, Ubuntu, CentOS, and RHEL.
For macOS environments, we use mac1.metal instances in AWS and an on-demand macOS VM provisioning solution from Veertu called Anka. Anka gives us the ability to quickly rotate multiple macOS VMs running on the same macOS bare metal instance.
Detonate is currently focused on the breadth of our OS coverage, scalability, and the collection of contextually relevant data from the pipeline. Research and engineering work to add sophisticated anti-analysis countermeasures to Detonate is ongoing.
VM provisioning
In order to keep our footprint in the VM to a minimum, we use startup scripts for provisioning. Minimizing our footprint is important because our activities within a VM are included in the events we collect, making analysis more complicated after a run. For Windows and Linux VMs, GCP startup scripts written in PowerShell and bash are used to configure the system; for macOS VMs, we wrote custom bash and AppleScript scripts.
The startup scripts perform these steps:
- Configure the system. For example, disable MS Defender, enable macro execution in MS Office, disable automatic system updates, etc.
- Download and install the Elastic Agent. The script verifies that the agent is properly enrolled with the Fleet Server and that the policies are applied.
- Download and detonate a sample, or execute a set of commands. The execution happens in a background process, while the main script collects the STDOUT / STDERR datastreams and sleeps for N seconds.
- Collect files from the filesystem (if needed) and upload them into the storage. This allows us to do any additional verification or debugging once the detonation is complete.
The VM lifecycle is managed by the start_vm and stop_vm workers. Since we expect some detonations to break the startup script execution flow (e.g., in the case of ransomware), every VM has a TTL set, which allows the stop_vm worker to delete VMs that are no longer in use.
This clean-slate approach, with the startup script used to configure everything needed for a detonation, allows us to use vendor-provided VM images from the Google Cloud public images catalog without any modifications!
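As a simplified sketch of what the start_vm worker does with the google-cloud-compute client (the project, zone, image, machine type, and label names below are hypothetical, and the production worker handles many more options and failure cases):

import time
from google.cloud import compute_v1

def start_vm(project: str, zone: str, startup_script: str, ttl_minutes: int = 30) -> str:
    """Create a short-lived sandbox VM from an unmodified public image."""
    name = f"detonate-{int(time.time())}"

    instance = compute_v1.Instance(
        name=name,
        machine_type=f"zones/{zone}/machineTypes/e2-standard-2",
        disks=[
            compute_v1.AttachedDisk(
                boot=True,
                auto_delete=True,
                initialize_params=compute_v1.AttachedDiskInitializeParams(
                    # Unmodified vendor image from the public catalog
                    source_image="projects/debian-cloud/global/images/family/debian-12",
                ),
            )
        ],
        network_interfaces=[compute_v1.NetworkInterface(network="global/networks/default")],
        # The startup script performs all in-VM configuration; the TTL label lets
        # the stop_vm worker find and delete instances that outlive their task.
        metadata=compute_v1.Metadata(
            items=[compute_v1.Items(key="startup-script", value=startup_script)]
        ),
        labels={"detonate-ttl-minutes": str(ttl_minutes)},
    )

    compute_v1.InstancesClient().insert(
        project=project, zone=zone, instance_resource=instance
    )
    return name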
Network configuration
Some of the samples we detonate are malicious and might produce malicious traffic, such as network scans, C2 callouts, etc. In order to keep our cloud resources and our vendors' infrastructure safe, we limit all outgoing traffic from the VMs. The instances are placed in a locked-down VPC that allows outgoing connections only to a predefined list of targets. We restrict traffic flows in the VPC using Google Cloud's routes and firewall rules, and AWS's security groups.
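For illustration, the kind of egress allowlisting we rely on looks roughly like the rules below, shown here with the Google API Python client rather than the Terraform we actually use; the project, network, and address values are placeholders.

from googleapiclient import discovery

compute = discovery.build("compute", "v1")

# Deny all egress by default at a low priority...
deny_all = {
    "name": "detonate-deny-all-egress",
    "network": "global/networks/detonate-sandbox",    # hypothetical VPC name
    "direction": "EGRESS",
    "priority": 65000,
    "denied": [{"IPProtocol": "all"}],
    "destinationRanges": ["0.0.0.0/0"],
}

# ...then allow outgoing connections only to a predefined list of targets
allow_fleet = {
    "name": "detonate-allow-fleet-egress",
    "network": "global/networks/detonate-sandbox",
    "direction": "EGRESS",
    "priority": 1000,
    "allowed": [{"IPProtocol": "tcp", "ports": ["443"]}],
    "destinationRanges": ["203.0.113.10/32"],          # placeholder allowed address
}

for rule in (deny_all, allow_fleet):
    compute.firewalls().insert(project="my-project", body=rule).execute()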
We also make use of VPC Flow Logs in GCE. These logs allow us to see private network traffic initiated by sandbox VMs in our VPC.
Telemetry collection
To observe detonations, we use the Elastic Agent with the Elastic Defend integration installed with all protections in “Detect” (instead of “Protect”) mode. This allows us to collect as much information from a VM as we can, while simultaneously allowing the Elastic Security solution to produce alerts and detections.
We cover two use cases with this architecture: we can validate protections (comparing events and alerts produced across different OS versions, agent versions, deployed security artifacts, etc.) and collect telemetry for analysis (for fresh samples or novel malware) at the same time. All data collected is kept in a persistent Elastic cluster and is available to our researchers.
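As a sketch of how that data is consumed (the cluster URL and host name below are hypothetical; the index patterns are the standard ones for Security alerts and Elastic Defend telemetry), pulling the results of a detonation back out of the cluster looks like this:

from elasticsearch import Elasticsearch

es = Elasticsearch("https://detonate-telemetry.example.com:9200", api_key="<api key>")

# Alerts produced by the Security solution during the detonation window
alerts = es.search(
    index=".alerts-security.alerts-default",
    query={"range": {"@timestamp": {"gte": "now-1h"}}},
    size=100,
)

# Raw endpoint telemetry (process, file, network events) from Elastic Defend
events = es.search(
    index="logs-endpoint.events.*",
    query={"term": {"host.name": "detonate-task-123"}},  # hypothetical sandbox VM hostname
    size=100,
)

print(alerts["hits"]["total"], events["hits"]["total"])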
Running in production
We recently completed a full month of running the Detonate pipeline in production, under the load of multiple data integrations, while serving internal users through the UI at the same time. Our record so far is 1,034 detonations in a single day, and we haven't seen any scalability or reliability issues.
The bulk of the submissions are Windows-specific samples, for now. We are working on increasing our coverage of Linux and macOS as well – stay tuned for the research blog posts coming soon!
We are constantly improving our support for various file types, making sure the detonation is as close to the intended trigger behavior as possible.
Looking at the detonations from the last month, we see that most of the tasks were completed in under 13 minutes (with a median of 515 seconds). This time includes task data preparation, VM provisioning and cleanup, sample execution, and post-detonation processing.
These are still the early days of the service, so it is normal to see outliers. Since most of the time in a task is spent waiting for a VM to provision, we can improve overall execution time by using custom VM images, pre-starting VM instances, and optimizing the startup scripts.
What's next?
Now that you have seen how Detonate works, our next posts will dive into more detailed use cases. We'll go further into how these detonations translate into protections for more of our users, including right here at Elastic!