Key takeaways
- The EMOTET developers have changed the way they encode their configuration in the 64-bit version of the malware.
- Using code emulation we can bypass multiple code obfuscation techniques.
- The use of code emulators in config extractors will become more prevalent in the future.
To download the EMOTET configuration extractor, check out our post on the tool:
Preamble
The EMOTET family broke onto the malware scene as a modular banking trojan in 2014, focused on harvesting and exfiltrating bank account information by inspecting traffic. EMOTET has been adapted as an early-stage implant used to load other malware families, such as QAKBOT, TRICKBOT, and RYUK. While multiple EMOTET campaigns have been dismantled by international law enforcement entities, it has continued to operate as one of the most prolific cybercrime operations.
For the last several months, Elastic Security has observed the EMOTET developers transition to a 64-bit version of their malware. While this change does not seem to impact the core functionality of the samples we have witnessed, we did notice a change in how the configuration and strings are obfuscated. In earlier versions of EMOTET, the configuration was stored in an encrypted form in the .data section of the binary. In the newer versions the configuration is calculated at runtime. The information we need to extract the configuration from the binary is thus hidden within the actual code.
In the next sections, we’ll discuss the following as it relates to 64-bit EMOTET samples:
- EMOTET encryption mechanisms
- Reviewing the EMOTET C2 list
- Interesting EMOTET strings
- The EMOTET configuration extractor utility
Encryption keys
EMOTET uses embedded Elliptic Curve Cryptography (ECC) public keys to encrypt its network communication. Previous versions stored an XOR-encrypted version of the key data in the .text section of the binary; now the content is calculated at runtime.
To make it harder for security researchers to find this code, the malware uses Mixed Boolean-Arithmetic (MBA) as one of its obfuscation techniques. MBA transforms constants and simple expressions into expressions that contain a mix of Boolean and arithmetic operations.
In this example, an array of constants is instantiated, but looking at the assembly we see that every constant is calculated at runtime. This method makes it challenging to develop a signature to target this function.
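To make the idea concrete, here is a minimal Python sketch of two well-known MBA identities and of a constant that only appears after evaluation. The function names and the operands in `obfuscated_constant` are hypothetical and are not taken from an EMOTET sample.

```python
MASK32 = 0xFFFFFFFF  # emulate 32-bit register arithmetic


def mba_add(x: int, y: int) -> int:
    """x + y rewritten as (x ^ y) + 2 * (x & y)."""
    return ((x ^ y) + 2 * (x & y)) & MASK32


def mba_xor(x: int, y: int) -> int:
    """x ^ y rewritten as (x | y) - (x & y)."""
    return ((x | y) - (x & y)) & MASK32


def obfuscated_constant() -> int:
    # Hypothetical operands: neither literal reveals the final value,
    # which only exists once the expression has been evaluated.
    return mba_xor(0x5A5A5A5A, 0x6B11191F)


if __name__ == "__main__":
    value = obfuscated_constant()
    print(hex(value), value.to_bytes(4, "little"))
```

Because the interesting value never appears as a literal in the code, a byte signature on the constant itself has nothing to match.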
We noticed that both the Elliptic Curve Diffie-Hellman (ECDH) and Elliptic Curve Digital Signature Algorithm (ECDSA) keys use the same function to decode the contents.
The ECDH key (which you can recognize by its magic bytes, ECK1) is used for encryption, while the ECDSA key (magic bytes ECS1) is used for verifying the C2 server's responses.
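The magic bytes and the 32-byte key length are consistent with the Windows CNG ECC public key blob layout: a 4-byte magic, a 4-byte little-endian key length, then the X and Y coordinates. Assuming the decoded data follows that layout, a minimal parser could look like this. `EccBlob` and `parse_ecc_blob` are illustrative names, not part of the extractor.

```python
import struct
from typing import NamedTuple

# Magic values named in the article and what they identify.
MAGIC_NAMES = {
    b"ECK1": "ECDH public key (encryption)",
    b"ECS1": "ECDSA public key (response verification)",
}


class EccBlob(NamedTuple):
    magic: bytes
    key_length: int
    x: bytes
    y: bytes


def parse_ecc_blob(blob: bytes) -> EccBlob:
    """Split a decoded key blob into header fields and curve coordinates."""
    if len(blob) < 8:
        raise ValueError("blob too short for header")
    magic, key_length = struct.unpack_from("<4sI", blob, 0)
    if magic not in MAGIC_NAMES:
        raise ValueError(f"unexpected magic: {magic!r}")
    end = 8 + 2 * key_length
    if len(blob) < end:
        raise ValueError("blob shorter than header claims")
    return EccBlob(magic, key_length, blob[8:8 + key_length], blob[8 + key_length:end])
```

Checking the magic before trusting the rest of the data is a cheap way to confirm that the emulation produced a key and not garbage.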
By leveraging a YARA signature to find the location of this decode function within the EMOTET binary we can observe the following process:
- Find the decoding algorithm within the binary.
- Locate any Cross References (Xrefs) to the decoding function.
- Emulate the function that calls the decoding function.
- Read the resulting data from memory.
As we mentioned, we first find the function in the binary by using YARA. The signature is provided at the end of this article. It is worth pointing out that these YARA signatures are used to identify locations in the binary but are, in their current form, not usable to identify EMOTET samples.
In order to automatically retrieve the data from multiple samples, we created a configuration extractor. Here we describe, at a high level, how it collects the configuration information from the malware samples.
The first step locates the decoding function:
- Load the YARA signature.
- Scan the file for a match.
- If the signature is found, calculate the function offset from the match offset in the file.
In order to locate the Xrefs to this function, we use the excellent SMDA disassembler. After locating the Xrefs, we can start the emulation process using the CPU emulator, Unicorn.
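The steps above can be sketched with the Python standard library alone. This is a stand-in for illustration, not the extractor's code: a regular expression plays the role of the YARA hex string (with `??` as a single-byte wildcard), and `Section`, `find_signature` and `file_offset_to_rva` are hypothetical helpers. The section values would normally come from the PE section table.

```python
import re
from typing import List, NamedTuple, Optional


class Section(NamedTuple):
    raw_offset: int       # where the section starts in the file
    raw_size: int         # size of the section in the file
    virtual_address: int  # RVA of the section once loaded


def hex_pattern_to_regex(pattern: str) -> "re.Pattern":
    """Translate 'AA ?? BB' into a bytes regex; ?? matches any one byte."""
    parts = []
    for token in pattern.split():
        if token == "??":
            parts.append(b".")
        else:
            parts.append(re.escape(bytes([int(token, 16)])))
    return re.compile(b"".join(parts), re.DOTALL)


def find_signature(data: bytes, pattern: str) -> Optional[int]:
    """Return the file offset of the first match, or None."""
    match = hex_pattern_to_regex(pattern).search(data)
    return match.start() if match else None


def file_offset_to_rva(offset: int, sections: List[Section]) -> int:
    """Map a file offset to a relative virtual address."""
    for sec in sections:
        if sec.raw_offset <= offset < sec.raw_offset + sec.raw_size:
            return offset - sec.raw_offset + sec.virtual_address
    raise ValueError("offset is not inside any section")
```

Note that the match lands somewhere inside the decoding function, so the resulting address still has to be mapped to the enclosing function before cross-references can be looked up.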
- Initialize the Unicorn emulator.
- Load the executable code from the PE file into memory.
- Disassemble the function to find the return and the end of the execution.
- The binary will try to use the Windows HeapAlloc API to allocate space for the decoded data. Since we don't want to emulate any Windows APIs, as this would add unnecessary complexity, we hook the code so that we can allocate space ourselves.
- After the emulation has run, the 64-bit RAX register contains a pointer to the key data in memory.
- To present the key in a more readable way, we convert it to the standard PEM format.
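The last step, converting the raw key to PEM, can be sketched without third-party libraries. This assumes the key is an uncompressed point on the NIST P-256 curve with 32-byte coordinates, which matches the reported key length of 32. `ecc_point_to_pem` is an illustrative name, and the fixed DER prefix is the standard SubjectPublicKeyInfo header for that curve.

```python
import base64

# DER header for SubjectPublicKeyInfo { id-ecPublicKey, prime256v1 }
# followed by the BIT STRING tag, length and unused-bits byte.
P256_SPKI_PREFIX = bytes.fromhex(
    "3059301306072a8648ce3d020106082a8648ce3d030107034200"
)


def ecc_point_to_pem(x: bytes, y: bytes) -> str:
    """Wrap raw P-256 coordinates into a PEM-encoded public key."""
    if len(x) != 32 or len(y) != 32:
        raise ValueError("expected 32-byte P-256 coordinates")
    der = P256_SPKI_PREFIX + b"\x04" + x + y  # 0x04 = uncompressed point
    b64 = base64.b64encode(der).decode("ascii")
    lines = [b64[i:i + 64] for i in range(0, len(b64), 64)]
    body = "\n".join(lines)
    return f"-----BEGIN PUBLIC KEY-----\n{body}\n-----END PUBLIC KEY-----\n"


if __name__ == "__main__":
    # Dummy coordinates, only to show the output shape.
    print(ecc_point_to_pem(bytes(32), bytes(32)))
```

Every key produced this way starts with the same base64 prefix, MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAE, because the DER header is identical for all P-256 public keys.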
By emulating the parts of the binary that we are interested in, we no longer have to statically defeat the obfuscation in order to retrieve the hidden contents. This approach adds a level of complexity to the creation of config extractors. However, since malware authors are adding ever more obfuscation, there is a need for a generic approach to defeating these techniques.
C2 server list
An important part of tracking malware families is to get new insights by identifying and discovering which C2 servers they use to operate their network.
In the 64-bit versions of EMOTET, we see that the IP and port information of the C2 servers are also dynamically calculated at runtime. Every C2 server is represented by a function that calculates and returns a value for the IP address and the port number.
These functions don’t have a direct cross reference available for searching. However, a procedure references all the C2 functions and creates the p_c2_list array of pointers.
After that, we can emulate every C2-server function individually to retrieve the IP and port combination as seen below.
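Once a C2 function has been emulated, the two returned values still need to be rendered as an address. The sketch below assumes the IP address is read as a 32-bit integer in network order and the port as a 16-bit integer. The exact encoding used by a given sample is not described here, so treat both the byte order and the `format_c2` helper as assumptions.

```python
import ipaddress


def format_c2(ip_value: int, port_value: int) -> str:
    """Render emulated IP and port values as 'a.b.c.d:port'."""
    # Assumption: most significant byte of ip_value is the first octet.
    ip = ipaddress.IPv4Address(ip_value & 0xFFFFFFFF)
    port = port_value & 0xFFFF
    return f"{ip}:{port}"


if __name__ == "__main__":
    print(format_c2(0xAE8A2131, 7080))
```

If a sample stores the octets in the opposite order, reversing the four bytes before the conversion is the only change needed.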
Strings
The same method is applied to the use of strings in memory. Every string has its own function. In the following example, the function would return a pointer to the string %s\regsvr32.exe "%s".
All of the EMOTET strings share a common function to decode or resolve the string at runtime. In the sample that we are analyzing here, the string resolver function is referenced 29 times.
This allows us to follow the same approach as noted earlier in order to decode all of the EMOTET strings. We pinpoint the string decoding function using YARA, find the cross-references, and emulate the resulting functions.
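For readers who prefer to see what the wide-string decoder does, the loop matched by the string_decode signature appears to read one 32-bit value at a time, XOR it with a key, and widen each of the four resulting bytes to a 16-bit character. The following is a rough Python equivalent of that reading. The function name is hypothetical, and how a sample supplies the key and the encoded data is outside the scope of this sketch.

```python
import struct


def decode_wide_string(encoded: bytes, key: int) -> str:
    """XOR each little-endian dword with key, widen bytes to UTF-16LE."""
    if len(encoded) % 4:
        raise ValueError("encoded data must be a multiple of 4 bytes")
    out = bytearray()
    for (dword,) in struct.iter_unpack("<I", encoded):
        plain = (dword ^ key) & 0xFFFFFFFF
        for byte in plain.to_bytes(4, "little"):
            out += bytes((byte, 0))  # one byte becomes one 16-bit character
    return out.decode("utf-16-le").rstrip("\x00")


def encode_for_test(text: str, key: int) -> bytes:
    """Inverse helper used only to build sample input for the sketch."""
    raw = text.encode("ascii")
    raw += b"\x00" * (-len(raw) % 4)  # pad to a dword boundary
    return b"".join(
        struct.pack("<I", dword ^ key)
        for (dword,) in struct.iter_unpack("<I", raw)
    )


if __name__ == "__main__":
    key = 0x1337C0DE  # arbitrary key for demonstration
    print(decode_wide_string(encode_for_test("bcrypt.dll", key), key))
```

Emulation makes a reimplementation like this unnecessary in practice, which is the point of the approach: the extractor keeps working even if the decoding routine changes.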
Configuration extractor
Automating the payload extraction from EMOTET is a crucial aspect of threat hunting as it gives visibility of the campaign and the malware deployed by the threat actors, enabling practitioners to discover new unknown samples in a timely manner.
% emotet-config-extractor --help
usage: Emotet Configuration Extractor [-h] (-f FILE | -d DIRECTORY) [-k] [-c] [-s] [-a]
options:
-h, --help show this help message and exit
-f FILE, --file FILE Emotet sample path
-d DIRECTORY, --directory DIRECTORY
Emotet samples folder
-k Extract Encryption keys
-c Extract C2 information
-s Extract strings
-a Extract strings (ascii)
Our extractor takes either a directory of samples (-d) or a single sample (-f) and can output the following parts of the configuration:
- -k : extract the encryption keys
- -c : extract the C2 information
- -s : extract the wide-character strings
- -a : extract the ASCII character strings
EMOTET uses different routines for decoding wide and ASCII strings. That is why the extractor provides flags to extract them separately.
The C2 information displays a list of IP addresses found in the sample. It is worth noting that EMOTET downloads submodules to perform specific tasks. These submodules can contain their own list of C2 servers. The extractor is also able to process these submodules.
The submodules that we observed do not contain encryption keys, so you can omit the -k flag when processing them.
[...]
[+] Key type: ECK1
[+] Key length: 32
-----BEGIN PUBLIC KEY-----
MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAE2DWT12OLUMXfzeFp+bE2AJubVDsW
NqJdRC6yODDYRzYuuNL0i2rI2Ex6RUQaBvqPOL7a+wCWnIQszh42gCRQlg==
-----END PUBLIC KEY-----
[...]
[+] Key type: ECS1
[+] Key length: 32
-----BEGIN PUBLIC KEY-----
MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAE9C8agzYaJ1GMJPLKqOyFrlJZUXVI
lAZwAnOq6JrEKHtWCQ+8CHuAIXqmKH6WRbnDw1wmdM/YvqKFH36nqC2VNA==
-----END PUBLIC KEY-----
[...]
[+] Found 64 c2 subs
174.138.33.49:7080
188.165.79.151:443
196.44.98.190:8080
[...]
[+] Starting emulation
[+] String BLOB address: 0x4000000
KeyDataBlob
[...]
[+] String BLOB address: 0x4000000
bcrypt.dll
[...]
[+] String BLOB address: 0x4000000
RNG
To enable the community to further defend themselves against existing and new variants of EMOTET, we are making the payload extractor open source under the Apache 2 License. Access the payload extractor documentation and binary download.
The future of EMOTET
The EMOTET developers are implementing new techniques to hide their configurations from security researchers. These techniques will slow down initial analysis. However, EMOTET will eventually have to execute to achieve its purpose, and that means we can collect information to uncover more about the campaign and infrastructure. Using code emulators, we can still find and extract the information from the binary without having to statically defeat the obfuscation. EMOTET is a good example of how multiple obfuscation techniques make static analysis harder, and we expect more malware authors to follow its lead. That is why we expect to see more emulation-based configuration extractors in the future.
Detection
YARA
Elastic Security has created YARA rules to support the configuration extractor. The rules shown here locate specific code within a sample and are not meant to detect EMOTET binaries on their own. The YARA rules for detecting EMOTET can be found in the protections-artifacts repository.
EMOTET key decryption function
rule resolve_keys
{
meta:
author = "Elastic Security"
description = "EMOTET - find the key decoding algorithm in the PE"
creation_date = "2022-08-02"
last_modified = "2022-08-11"
os = "Windows"
family = "EMOTET"
threat_name = "Windows.Trojan.EMOTET"
reference_sample = "debad0131060d5dd9c4642bd6aed186c4a57b46b0f4c69f1af16b1ff9c0a77b1"
strings:
$chunk_1 = {
45 33 C9
4C 8B D0
48 85 C0
74 ??
48 8D ?? ??
4C 8B ??
48 8B ??
48 2B ??
48 83 ?? ??
48 C1 ?? ??
48 3B ??
49 0F 47 ??
48 85 ??
74 ??
48 2B D8
42 8B 04 03
}
condition:
any of them
}
EMOTET C2 aggregation
rule c2_list
{
meta:
author = "Elastic Security"
description = "EMOTET - find the C2 collection in the PE"
creation_date = "2022-08-02"
last_modified = "2022-08-11"
os = "Windows"
family = "EMOTET"
threat_name = "Windows.Trojan.EMOTET"
reference_sample = "debad0131060d5dd9c4642bd6aed186c4a57b46b0f4c69f1af16b1ff9c0a77b1"
strings:
$chunk_1 = {
48 8D 05 ?? ?? ?? ??
48 89 81 ?? ?? ?? ??
48 8D 05 ?? ?? ?? ??
48 89 81 ?? ?? ?? ??
48 8D 05 ?? ?? ?? ??
48 89 81 ?? ?? ?? ??
48 8D 05 ?? ?? ?? ??
48 89 81 ?? ?? ?? ??
48 8D 05 ?? ?? ?? ??
48 89 81 ?? ?? ?? ??
48 8D 05 ?? ?? ?? ??
48 89 81 ?? ?? ?? ??
48 8D 05 ?? ?? ?? ??
48 89 81 ?? ?? ?? ??
}
condition:
any of them
}
EMOTET string decoder
rule string_decode
{
meta:
author = "Elastic Security"
description = "EMOTET - find the string decoding algorithm in the PE"
creation_date = "2022-08-02"
last_modified = "2022-08-11"
os = "Windows"
family = "EMOTET"
threat_name = "Windows.Trojan.EMOTET"
reference_sample = "debad0131060d5dd9c4642bd6aed186c4a57b46b0f4c69f1af16b1ff9c0a77b1"
strings:
$chunk_1 = {
8B 0B
49 FF C3
48 8D 5B ??
33 CD
0F B6 C1
66 41 89 00
0F B7 C1
C1 E9 10
66 C1 E8 08
4D 8D 40 ??
66 41 89 40 ??
0F B6 C1
66 C1 E9 08
66 41 89 40 ??
66 41 89 48 ??
4D 3B D9
72 ??
}
$chunk_2 = {
8B 0B
49 FF C3
48 8D 5B ??
33 CD
0F B6 C1
66 41 89 00
0F B7 C1
C1 E9 ??
66 C1 E8 ??
4D 8D 40 ??
66 41 89 40 ??
0F B6 C1
66 C1 E9 ??
66 41 89 40 ??
66 41 89 48 ??
4D 3B D9
72 ??
}
condition:
any of them
}