Thursday, July 23, 2020

Writing a ClamAV signature for Masslogger

By Nikhil Hegde.

MassLogger is a .NET info-stealer capable of taking screenshots, logging keystrokes, and more. The binary features two obfuscated loaders: the first decrypts the second using the Rijndael algorithm, and the second decodes the payload using the values in a Bitmap image. The multiple loaders and the obfuscation make it a good candidate for a walkthrough of ClamAV signature creation. In this post, we'll walk through the creation of ClamAV signatures for this malware. We hope this gives you a closer look at the work we do and at how we reverse-engineer malware.

I picked up the sample hash, 2b7455d2a9434cfe516d9d886248b45f1073c0cc9fef73b15e9a1ef187fe4677 from a tweet by Nocturnus. The sample is available on VirusTotal.

Basic and behavioral analysis

Type of binary

The first step during analysis is always to determine the type of sample we’re dealing with. The type of sample determines:

  • The kind of obfuscation/packing techniques that can be used.
  • The tools that can be used for deeper analysis.

In this case, PEiD (a packer/compiler identifier) identified the binary as a .NET executable.

Human-readable strings

The next step of the analysis is to look at the human-readable strings, which can provide insight into what the sample may be doing. Take, for example, the string "Apex Legends Installer", which refers to the video game "Apex Legends". It could mean any of five things:

  • The sample is an official “Apex Legends” installer
  • The sample is posing as the official “Apex Legends” installer
  • The sample is a trojanized version of the official “Apex Legends” installer
  • The sample is looking for the presence of the “Apex Legends” installer, or
  • The sample intentionally includes legitimate strings to fool reverse-engineers.

Finding one or more unique strings is great for writing a ClamAV signature. However, a signature that consists only of readable strings is not ideal, because strings are easy to change. A malware author can add, remove, or modify a single character, or change the string's encoding, and the signature would be rendered useless.
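A minimal Python sketch of a naive string-only signature shows how easily it breaks. The helper name and the byte buffers are made up for illustration. The UTF-16LE case matters in practice because .NET stores string literals in the #US metadata heap as UTF-16.

```python
TEXT = "Apex Legends Installer"

def string_signature_hits(data: bytes, text: str = TEXT) -> bool:
    """Naive string-only 'signature': the ASCII bytes of text must appear verbatim."""
    return text.encode("ascii") in data

# Hypothetical file contents: a couple of filler bytes followed by the string.
samples = {
    "original (ASCII)": b"\x00\x01" + TEXT.encode("ascii"),
    # Capital I replaced by a lowercase l: visually similar, different bytes.
    "one character changed": b"\x00\x01" + TEXT.replace("I", "l", 1).encode("ascii"),
    # Same text, different encoding: every character is followed by a 0x00 byte.
    "re-encoded as UTF-16LE": b"\x00\x01" + TEXT.encode("utf-16-le"),
}

for label, data in samples.items():
    print(f"{label}: {string_signature_hits(data)}")
```

Only the first buffer matches. A one-character edit or a change of encoding is enough to evade the signature.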

The following strings in the sample caught our eye:

  • IBtYoWoZVZsP.exe
  • $0ED9E969-8548-455D-B751-6A5DD454C8F8

The first string is the filename of the sample, and we could not find any benign files containing the second string. Later in the analysis, we found that the second string is a GUID hard-coded in the sample as an assembly attribute:
[assembly: Guid("0ED9E969-8548-455D-B751-6A5DD454C8F8")]

Behavior

When reading behavioral analysis reports, I tend to ignore the Files Opened and Registry Keys Opened sections. A binary opens every DLL or file it uses and every (often benign) Registry key it reads, so this information is rarely useful for the analysis. I generally start with the other sections, such as Files Written and Registry Keys Set. That said, the two "Opened" sections become important when the malware in question is second-stage malware that uses files dropped by a first-stage malware.

In this case, the following files were written to disk:

  • %APPDATA%\jyvnbfjfom.exe
  • <SYSTEM32>\tasks\updates\jyvnbfjfom
  • %TEMP%\8ebb36b8cf\log.txt

While we don't know what was written to these files, the WriteFile function was most likely used. We can also hypothesize that the sample is creating a scheduled task that uses the file jyvnbfjfom. To see the contents being written, we can jump to a WriteFile call in a disassembler, or place a breakpoint on the first instruction of the WriteFile function in a debugger.

The sample sets the following interesting Registry keys:

  • <HKLM>\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Schedule\TaskCache\Tasks\{FD1C51B3-5EBE-43ED-BA6C-60DBA47CE496}\Path
    • \Updates\JyVNBfjFOm
  • <HKLM>\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Schedule\TaskCache\Tasks\{FD1C51B3-5EBE-43ED-BA6C-60DBA47CE496}\Triggers
    • \x15\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xff\xff\xff\xff\xff\xff\xff\x68\x09\x41\x00\x48\x48\x48\x48\xe6\xc6\x25......

This shows that a task was scheduled with the GUID FD1C51B3-5EBE-43ED-BA6C-60DBA47CE496, and that it uses the file previously written to disk, \Updates\JyVNBfjFOm. I don't know exactly how the Triggers value is encoded, but presumably it describes the trigger that starts the task.
The sample creates the following process tree:



The VirusTotal report also indicates that the original sample was the target of process injection. In such cases, the usual assumption is that the injected process performs the malicious actions. The original sample's primary job is to spawn a copy of itself as a child process, inject malicious code into it, and then exit. This kind of in-memory, or "fileless", execution helps the malware evade AVs.

Imports / Exports

The Import Address Table (IAT) and Export Address Table (EAT) sometimes contain strong indicators of a binary's functionality. For example, cryptography-related APIs suggest cryptographic functionality (ransomware, etc.), and internet-related APIs suggest network activity (downloaders, etc.).

In this case, there was just one import, _CorExeMain, which is typical of .NET executables. _CorExeMain is a native function exported by mscoree.dll that .NET executables import, and jumping into this function loads the CLR for the process.

Debugging with dnSpy

Finding the injected code

On opening the sample in dnSpy-x86, I was greeted with a fantastic indicator of obfuscation:



We run de4dot regardless of whether it supports the packer/obfuscator. It does its best to restore the original assembly, and in most cases the result is easier to read than the original.

While the deobfuscated version isn't the pre-compilation source code, it is much easier to read and navigate.

My first goal was to find the code responsible for the process injection. Essentially, I was looking for a .Load(<assembly>) call such as the one below:



I placed a breakpoint on line 21 to see which assembly was being loaded and, voilà, it was a PE file being loaded into memory.
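Before digging further, it is worth confirming that a dumped buffer such as rawAssembly really is a PE file. The sketch below checks the DOS "MZ" magic and the "PE\0\0" signature at the offset stored in the e_lfanew field (offset 0x3C). The function name is our own. This is a quick sanity check, not a full PE parser.

```python
import struct

def looks_like_pe(buf: bytes) -> bool:
    """Cheap PE sanity check: 'MZ' DOS header plus 'PE\\0\\0' at e_lfanew."""
    # The DOS header is 64 bytes and starts with the "MZ" magic.
    if len(buf) < 0x40 or buf[:2] != b"MZ":
        return False
    # e_lfanew (little-endian uint32 at 0x3C) is the file offset of the NT headers.
    (e_lfanew,) = struct.unpack_from("<I", buf, 0x3C)
    # Slicing past the end simply yields fewer bytes, so truncated files fail here.
    return buf[e_lfanew:e_lfanew + 4] == b"PE\x00\x00"
```

Running it on the bytes saved from rawAssembly should return True. A random or still-encrypted buffer will almost always fail the first check.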



I saved the contents of the rawAssembly variable to a file on disk. The LateBinding.LateGet() call that follows invokes a method, whose name is returned by Class22.smethod_0(-2799749), on the newly loaded assembly. It passes the fourth argument of LateBinding.LateGet() as that method's arguments.

The following is a more readable version as Libra puts it:
public void (Class22.smethod_0(-2799749)) (
    Class22.smethod_0(-2799800),
    BindingFlags.InvokeMethod,
    null,
    null,
    new object[] {
        Class35.string_0,
        Class22.smethod_0(-2799783)
    }
)
Placing a breakpoint on the return statement inside Class22.smethod_0(), we saw the return values of the three function calls and the value of the string_0 field:

  • Class22.smethod_0(-2799749) returns "InvokeMember"
  • Class22.smethod_0(-2799800) returns "Register"
  • Class35.string_0 returns "xbOXDnoziEaClDkDAVzfoavxLAnDftvL"
  • Class22.smethod_0(-2799783) returns "Fluxx"

In other words, the Register function inside the newly loaded executable is called with the arguments ("xbOXDnoziEaClDkDAVzfoavxLAnDftvL", "Fluxx") through InvokeMember, part of .NET's reflection API.

ClamAV signature

ClamAV looks for one or more byte sequences that are present in the sample as it exists on disk. Bytes that the sample decodes at runtime with a custom routine cannot be used in a ClamAV signature, because ClamAV does not see those decoded bytes statically. The sample decrypts the assembly above using the Rijndael algorithm, so its bytes are not visible to ClamAV statically. The original process exits soon after loading the assembly, so the signature has to target the code that runs up to this point.



Loaded assembly bytes decryption

The function Class19.method_2() decrypts the loaded assembly bytes:
public byte[] method_2(byte[] byte_0, byte[] byte_1)
{
    Rfc2898DeriveBytes rfc2898DeriveBytes = new Rfc2898DeriveBytes(byte_1, new byte[8], 1);
    RijndaelManaged rijndaelManaged = new RijndaelManaged
    {
        Key = rfc2898DeriveBytes.GetBytes(16),
        IV = rfc2898DeriveBytes.GetBytes(16)
    };
    byte[] array = rijndaelManaged.CreateDecryptor().TransformFinalBlock(byte_0, 0, byte_0.Length);
    checked
    {
        byte[] array2 = new byte[array.Length - 17 + 1];
        Array.Copy(array, 16, array2, 0, array.Length - 16);
        return array2;
    }
}
The following statements create a 16-byte (128-bit) key and a 16-byte IV (Initialization Vector) from the byte_1 password. They use an 8-byte salt of all zeros and a single iteration (RFC 2898 recommends a minimum of 1,000 iterations):
...
Rfc2898DeriveBytes rfc2898DeriveBytes = new Rfc2898DeriveBytes(byte_1, new byte[8], 1);
RijndaelManaged rijndaelManaged = new RijndaelManaged
{
    Key = rfc2898DeriveBytes.GetBytes(16),
    IV = rfc2898DeriveBytes.GetBytes(16)
};

...
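The key derivation can be reproduced with Python's standard library. This is a sketch under stated assumptions. Rfc2898DeriveBytes uses HMAC-SHA1 by default. We assume that two consecutive GetBytes(16) calls consume the first 32 bytes of the PBKDF2 output stream. The password below is hypothetical: the real one is returned by Class22.smethod_0(-2799769) and is not reproduced here, and Encoding.Default is assumed to agree with ASCII for it. The second helper shows why one iteration is weak: each output block is just a single HMAC.

```python
import hashlib
import hmac
from typing import Tuple

def derive_key_iv(password: bytes, salt: bytes = bytes(8), iterations: int = 1) -> Tuple[bytes, bytes]:
    """Approximates Rfc2898DeriveBytes(password, new byte[8], 1) followed by two
    GetBytes(16) calls: the first 32 bytes of PBKDF2-HMAC-SHA1 output."""
    stream = hashlib.pbkdf2_hmac("sha1", password, salt, iterations, 32)
    return stream[:16], stream[16:32]

def pbkdf2_block1_single_iteration(password: bytes, salt: bytes = bytes(8)) -> bytes:
    """With one iteration, PBKDF2 block 1 is HMAC-SHA1(password, salt || INT_BE(1))."""
    return hmac.new(password, salt + (1).to_bytes(4, "big"), hashlib.sha1).digest()

# Hypothetical password, for illustration only.
password = "example-password".encode("ascii")
key, iv = derive_key_iv(password)
print("key:", key.hex())
print("iv: ", iv.hex())
```

Decrypting the resource would additionally need an AES implementation, which Python's standard library does not provide. RijndaelManaged defaults to a 128-bit block, CBC mode and PKCS7 padding, and method_2 then discards the first 16 bytes of the plaintext.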
The following statement decrypts the assembly bytes using the key and IV above:
...
byte[] array = rijndaelManaged.CreateDecryptor().TransformFinalBlock(byte_0, 0, byte_0.Length);
...
Class19.method_1() calls the above decryption function and is shown below:
public byte[] method_1(byte[] byte_0, string string_0)
{
    return this.method_2(byte_0, Encoding.Default.GetBytes(string_0));
}
The statement below calls Class19.method_1():
byte[] rawAssembly = this.method_1(Class20.smethod_3(), Class22.smethod_0(-2799769));
Now that we know the function call sequence, we also know that the first argument above is the encrypted byte sequence and the second argument is the string password. Class20.smethod_3() contains the following statements:
internal static byte[] smethod_3()
{
    object @object = Class20.smethod_0().GetObject(Class22.smethod_0(-2801361), Class20.cultureInfo_0);
    return (byte[])@object;
}
Notice that Class22.smethod_0() is also the function that returns the string password. Perhaps it's a string retrieval/decryptor function. Class22.smethod_0() is shown below:
internal static string smethod_0(int int_3)
{
    Class22.Class23 obj = Class22.class23_0;
    string result;
    lock (obj)
    {
        string text = Class22.class23_0.method_2(int_3);
        if (text != null)
        {
            result = text;
        }
        else
        {
            result = Class22.smethod_1(int_3, true);
        }
    }
    return result;
}
The above function accepts an integer argument and returns a string. The string result is returned by either Class22.class23_0.method_2() or Class22.smethod_1(). Class22.class23_0.method_2() is shown below:
public string method_2(int int_1)
{
    Class22.Class23.Struct0[] array = this.struct0_0;
    int num = array.Length;
    int num2 = int_1 & num - 1;
    string result = null;
    while (array[num2].int_0 != int_1)
    {
        if (array[num2].string_0 == null)
        {
            return result;
        }
        num2++;
        if (num2 >= num)
        {
            num2 = 0;
        }
    }
    return array[num2].string_0;
}
The array variable is an array of structure, Struct0 which has the form:
private struct Struct0
{
    public int int_0;
    public string string_0;
}
Class22.class23_0.method_2() returns string_0 from the entry whose int_0 equals the input argument int_1. If no such entry exists, it returns null, and control passes to Class22.smethod_1(). Because method_2() plays an integral role in the string retrieval process, I decided to use its statements in the ClamAV signature.

Class22.smethod_1() is a large function, but we can hypothesize that it is some kind of decryptor or string retrieval function. I noticed that this function uses the value Class22.int_0 in multiple math operations, such as binary OR and XOR. Based on this observation, we concluded that those operations were part of a decoder and decided to use those statements in the ClamAV signature as well.
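The lookup in Class22.class23_0.method_2() behaves like an open-addressing hash table with linear probing. The table size appears to be a power of two, the start slot is int_1 & (size - 1), and probing stops at the matching key or at an empty slot. The Python sketch below mirrors that logic. The insert helper is hypothetical, because the decompiled code shown here does not include how the cache is populated. Like the original loop, the sketch assumes the table is never completely full; otherwise a missing key would never terminate the search.

```python
from typing import List, Optional, Tuple

Slot = Tuple[int, Optional[str]]  # (int_0, string_0), like Struct0
EMPTY: Slot = (0, None)

def lookup(table: List[Slot], key: int) -> Optional[str]:
    """Mirror of Class22.class23_0.method_2(): linear probing from key & (size - 1)."""
    size = len(table)            # assumed to be a power of two
    index = key & (size - 1)     # low bits of a negative int match its 32-bit two's complement
    while table[index][0] != key:
        if table[index][1] is None:  # empty slot: the key is not cached
            return None
        index += 1
        if index >= size:            # wrap around to the start of the table
            index = 0
    return table[index][1]

def insert(table: List[Slot], key: int, value: str) -> None:
    """Hypothetical population helper (not shown in the decompiled code)."""
    size = len(table)
    index = key & (size - 1)
    while table[index][1] is not None and table[index][0] != key:
        index = (index + 1) % size
    table[index] = (key, value)
```

For example, after `insert(table, -2799749, "InvokeMember")` on an eight-slot table, `lookup(table, -2799749)` returns "InvokeMember", and any uncached key returns None. In the sample, that None is the point where control falls through to Class22.smethod_1().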

Note: Avoid using long runs of contiguous code statements in the ClamAV signature. The longer the run, the more opportunities the author has to substitute one of those statements and render the signature ineffective.

Effect of obfuscation

The ClamAV signature needs to be written against the obfuscated sample, not the cleaned version. The following are screenshots of the Class22.smethod_1() function from three different obfuscated Masslogger samples:

SHA256: 2b7455d2a9434cfe516d9d886248b45f1073c0cc9fef73b15e9a1ef187fe4677

SHA256: ff93d03d2064353ffe7482722da48dd84537cca05f780f1f4852995590acde3f

SHA256: 7e68b29fadf839d238225f50e5e5db677690a7cf968ab59484571bb6002a1311

Notice that the overall structure is similar, but the constants and math operations differ. For instance, consider the following statements on line 18 of the three screenshots above (after de4dot cleanup, \u0005\u2005.\u0006 corresponds to Class22.int_0):
\u0005\u2005.\u0006 |= 807638346 - num + num2;
\u0005\u2005.\u0006 |= 1261563936 + num + num2;
\u0005\u2005.\u0006 |= (num ^ 847395692) + num2;
Consequently, their hex codes are also slightly different:
7E18000004204A95233006590758608018000004
7E180000042020F0314B06580758608018000004
7E1800000406206C3B8232610758608018000004
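Decoding these bytes as CIL shows why only the middle part changes. The sketch below handles just the nine opcodes that occur in these three sequences, with opcode values per ECMA-335. It assumes num and num2 are locals 0 and 1, which matches the ldloc.0/ldloc.1 instructions. The 0x04000018 operand of ldsfld/stsfld is a metadata token pointing into the Field table (table 0x04), presumably the field that de4dot names Class22.int_0.

```python
import struct
from typing import List, Optional, Tuple

# Only the CIL opcodes that appear in the three sequences above.
# opcode byte -> (mnemonic, struct format of the inline operand, or None)
OPCODES = {
    0x06: ("ldloc.0", None),
    0x07: ("ldloc.1", None),
    0x20: ("ldc.i4", "<i"),   # signed 32-bit immediate
    0x58: ("add", None),
    0x59: ("sub", None),
    0x60: ("or", None),
    0x61: ("xor", None),
    0x7E: ("ldsfld", "<I"),   # 32-bit field metadata token
    0x80: ("stsfld", "<I"),   # 32-bit field metadata token
}

def disassemble(code: bytes) -> List[Tuple[str, Optional[int]]]:
    """Decode straight-line code using OPCODES; raises KeyError on anything else."""
    result = []
    pos = 0
    while pos < len(code):
        mnemonic, fmt = OPCODES[code[pos]]
        pos += 1
        operand = None
        if fmt is not None:
            (operand,) = struct.unpack_from(fmt, code, pos)
            pos += struct.calcsize(fmt)
        result.append((mnemonic, operand))
    return result

for hex_code in (
    "7E18000004204A95233006590758608018000004",
    "7E180000042020F0314B06580758608018000004",
    "7E1800000406206C3B8232610758608018000004",
):
    print(disassemble(bytes.fromhex(hex_code)))
```

All three decode to the same frame: ldsfld 0x04000018 at the start, or and stsfld 0x04000018 at the end. In the middle, the constants 807638346, 1261563936 and 847395692 appear as ldc.i4 operands, mixed with num and num2 through different operators.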
When writing the ClamAV signature, I replace the bytes between the common prefix and suffix with wildcards (in ClamAV's hex notation, ?? matches any single byte). In the above case, the following sub-signature matches all three sequences:
7E18000004{9}608018000004
{9} matches exactly nine arbitrary bytes, equivalent to writing ?? nine times.
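To test a sub-signature against raw bytes outside ClamAV, its wildcard syntax can be translated into a regular expression. This is a minimal sketch that supports only the constructs used in this post: plain hex bytes, ?? and {n}. It is not ClamAV's matcher, and the function name is ours.

```python
import re

def subsig_to_regex(subsig: str) -> "re.Pattern[bytes]":
    """Translate a hex sub-signature using only hex bytes, ?? and {n} into a bytes regex."""
    parts = []
    i = 0
    while i < len(subsig):
        if subsig[i] == "{":                  # {n}: exactly n arbitrary bytes
            j = subsig.index("}", i)
            parts.append(b".{%d}" % int(subsig[i + 1:j]))
            i = j + 1
        elif subsig[i:i + 2] == "??":         # ??: any single byte
            parts.append(b".")
            i += 2
        else:                                 # literal byte given as two hex digits
            parts.append(re.escape(bytes.fromhex(subsig[i:i + 2])))
            i += 2
    # DOTALL so that "." also matches the byte 0x0A.
    return re.compile(b"".join(parts), re.DOTALL)

pattern = subsig_to_regex("7E18000004{9}608018000004")
for hex_code in (
    "7E18000004204A95233006590758608018000004",
    "7E180000042020F0314B06580758608018000004",
    "7E1800000406206C3B8232610758608018000004",
):
    print(hex_code, bool(pattern.search(bytes.fromhex(hex_code))))
```

All three byte sequences from the samples match the single sub-signature.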

Signature

The following is a ClamAV signature that I created to detect Masslogger samples:
Win.Trojan.Masslogger;Engine:51-255,Target:1;0&1&((2&3&4&5&6&7&8&9&10)>6,7);06088F170000027B220000042C0C;0817580C080732CC160C2BC8;7E18000004{9}608018000004;7E18000004{10}0760618018000004;7E180000041107{9}58618018000004;7E18000004{9}618018000004;1F207E18000004618018000004;7E18000004{9}1711075860618018000004;7E18000004{19}618018000004;7E18000004{9}3334;7E1500000414FE037E18000004{9}FE0116FE012E37
A detailed description of a ClamAV signature can be printed by passing the signature to the sigtool utility with the --decode-sig option.

The decoded output below is not very human-friendly, because the signature's hex bytes are IL opcodes and operands rather than readable strings.
$ cat ../masslogger.ldb | sigtool --decode-sig
VIRUS NAME: Win.Trojan.Masslogger
TDB: Engine:51-255,Target:1
LOGICAL EXPRESSION: 0&1&((2&3&4&5&6&7&8&9&10)>6,7)
SUBSIG ID 0
+-> OFFSET: ANY
+-> SIGMOD: NONE
+-> DECODED SUBSIGNATURE:
�{",
SUBSIG ID 1
+-> OFFSET: ANY
+-> SIGMOD: NONE
+-> DECODED SUBSIGNATURE:
X
2�
+�
SUBSIG ID 2
+-> OFFSET: ANY
+-> SIGMOD: NONE
+-> DECODED SUBSIGNATURE:
~{WILDCARD_ANY_STRING(LENGTH==9)}`�
SUBSIG ID 3
+-> OFFSET: ANY
+-> SIGMOD: NONE
+-> DECODED SUBSIGNATURE:
~{WILDCARD_ANY_STRING(LENGTH==10)}`a�
SUBSIG ID 4
+-> OFFSET: ANY
+-> SIGMOD: NONE
+-> DECODED SUBSIGNATURE:
~{WILDCARD_ANY_STRING(LENGTH==9)}Xa�
SUBSIG ID 5
+-> OFFSET: ANY
+-> SIGMOD: NONE
+-> DECODED SUBSIGNATURE:
~{WILDCARD_ANY_STRING(LENGTH==9)}a�
SUBSIG ID 6
+-> OFFSET: ANY
+-> SIGMOD: NONE
+-> DECODED SUBSIGNATURE:
~a�
SUBSIG ID 7
+-> OFFSET: ANY
+-> SIGMOD: NONE
+-> DECODED SUBSIGNATURE:
~{WILDCARD_ANY_STRING(LENGTH==9)}X`a�
SUBSIG ID 8
+-> OFFSET: ANY
+-> SIGMOD: NONE
+-> DECODED SUBSIGNATURE:
~{WILDCARD_ANY_STRING(LENGTH==19)}a�
SUBSIG ID 9
+-> OFFSET: ANY
+-> SIGMOD: NONE
+-> DECODED SUBSIGNATURE:
~{WILDCARD_ANY_STRING(LENGTH==9)}34
SUBSIG ID 10
+-> OFFSET: ANY
+-> SIGMOD: NONE
+-> DECODED SUBSIGNATURE:
~�~{WILDCARD_ANY_STRING(LENGTH==9)}��.7
The created signature is called a logical signature and is placed in a .ldb file which has the following format:
SignatureName;TargetDescriptionBlock;LogicalExpression;Subsig0;Subsig1;Subsig2;...
The SignatureName has a specific format: platform.category.malware_name. In this case, this comes out to be Win.Trojan.Masslogger.

The TargetDescriptionBlock can include multiple options, which are listed in the ClamAV signature documentation. The Engine option specifies the range of ClamAV functionality levels the signature requires and is used to "define which versions of ClamAV the signature features support". A functionality level of 255 is an integer used to represent a future release of ClamAV. The Target option tells ClamAV which type of file to scan; Target:1 means PE files.
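The .ldb line format is simple enough to split apart programmatically. The sketch below is a hypothetical helper, not part of ClamAV's tooling. It splits a line into its name, Target Description Block, logical expression and sub-signatures. It assumes that no sub-signature contains a semicolon, which holds for the hex-only sub-signatures used here.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class LogicalSignature:
    name: str
    tdb: Dict[str, str]
    expression: str
    subsigs: List[str]

def parse_ldb_line(line: str) -> LogicalSignature:
    """Split SignatureName;TargetDescriptionBlock;LogicalExpression;Subsig0;Subsig1;..."""
    name, tdb, expression, *subsigs = line.strip().split(";")
    # TDB options look like "Engine:51-255,Target:1".
    tdb_fields = dict(item.split(":", 1) for item in tdb.split(","))
    return LogicalSignature(name, tdb_fields, expression, subsigs)
```

Fed the Masslogger signature above, it yields the name Win.Trojan.Masslogger, the options {"Engine": "51-255", "Target": "1"}, and eleven sub-signatures numbered 0 to 10 in order. These are the IDs used by the logical expression.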

The LogicalExpression represents the relationship between the sub-signatures. In this case, the sub-signatures are combined with a logical AND (&). In addition, sub-signatures 2 to 10 must match more than six times in total, and at least seven distinct sub-signatures among them must match. I added this constraint in case two of the sub-signatures 2 to 10 do not match because of the effects of obfuscation.
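Given how many times each sub-signature matched, the condition can be modeled as follows. This is a simplified model of the semantics described above, not ClamAV's own evaluator, and the function name is ours. It assumes sub-signatures 0 and 1 must both match, that sub-signatures 2 to 10 need more than six matches in total, and that at least seven distinct sub-signatures among 2 to 10 must match.

```python
from typing import Mapping

REQUIRED_IDS = (0, 1)
OPTIONAL_IDS = range(2, 11)

def masslogger_lsig_matches(counts: Mapping[int, int]) -> bool:
    """counts maps sub-signature ID -> number of matches found in the file."""
    required_ok = all(counts.get(i, 0) > 0 for i in REQUIRED_IDS)
    total = sum(counts.get(i, 0) for i in OPTIONAL_IDS)
    distinct = sum(1 for i in OPTIONAL_IDS if counts.get(i, 0) > 0)
    # (2&...&10)>6,7 : more than 6 matches in total, at least 7 distinct sub-signatures
    return required_ok and total > 6 and distinct >= 7
```

Under this model, a sample in which two of sub-signatures 2 to 10 were broken by obfuscation still matches. If a third sub-signature is broken, the sample no longer matches, even when the remaining six match many times.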

The following are the code statements I used in the signature and their corresponding hex codes, in sub-signature order:

  • Sub-signature 0: if (u[num2].\u0003 == null)
    • 06088F170000027B220000042C0C
  • Sub-signature 1: num2++; if (num2 >= num) { num2 = 0; }
    • 0817580C080732CC160C2BC8
  • Sub-signature 2: \u0005\u2005.\u0006 |= 807638346 - num + num2
    • 7E18000004{9}608018000004
  • Sub-signature 3: \u0005\u2005.\u0006 ^= ((-759257072 - num ^ num2) | num4);
    • 7E18000004{10}0760618018000004
  • Sub-signature 4: \u0005\u2005.\u0006 ^= num4 + ((num ^ 801305398) + num2);
    • 7E180000041107{9}58618018000004
  • Sub-signature 5: \u0005\u2005.\u0006 ^= -759043579 - num - num2;
    • 7E18000004{9}618018000004
  • Sub-signature 6: \u0005\u2005.\u0006 = (32 ^ \u0005\u2005.\u0006);
    • 1F207E18000004618018000004
  • Sub-signature 7: \u0005\u2005.\u0006 ^= (num + 759269396 + num2 | 1 + num4);
    • 7E18000004{9}1711075860618018000004
  • Sub-signature 8: \u0005\u2005.\u0006 = ((\u0005\u2005.\u0006 & (num + -582914518 ^ num2)) ^ (-801297468 ^ num) - num2);
    • 7E18000004{19}618018000004
  • Sub-signature 9: if (\u0005\u2005.\u0006 == 806071938 - num + num2)
    • 7E18000004{9}3334
  • Sub-signature 10: if (\u0005\u2005.\u000E != null != (\u0005\u2005.\u0006 != (num ^ 802818892) + num2))
    • 7E1500000414FE037E18000004{9}FE0116FE012E37

Local Verification

On Ubuntu, ClamAV can be installed using apt install clamav. Its on-demand scan utility, clamscan, can use our signature file (masslogger.ldb) to scan binaries as shown below:
$ clamscan -d ../masslogger.ldb *
2b7455d2a9434cfe516d9d886248b45f1073c0cc9fef73b15e9a1ef187fe4677: Win.Trojan.Masslogger.UNOFFICIAL FOUND
2f4964e14972eafa98f1fc8ad81f8dc2eeb45a00ef420cf59db34faba1592ac4: Win.Trojan.Masslogger.UNOFFICIAL FOUND
7e68b29fadf839d238225f50e5e5db677690a7cf968ab59484571bb6002a1311: Win.Trojan.Masslogger.UNOFFICIAL FOUND
d808126fdcb04b3b796f2dd35c378336fdf55479fe17852c4e033e4768d913c9: Win.Trojan.Masslogger.UNOFFICIAL FOUND
ff93d03d2064353ffe7482722da48dd84537cca05f780f1f4852995590acde3f: Win.Trojan.Masslogger.UNOFFICIAL FOUND
...
Known viruses: 1
Engine version: 0.102.3
Scanned directories: 0
Scanned files: 5
Infected files: 5
Data scanned: 3.40 MB
Data read: 3.40 MB (ratio 1.00:1)
Time: 0.057 sec (0 m 0 s)

Retrohunt

For large-scale verification, we can use VirusTotal's Retrohunt, which runs a YARA rule against files previously submitted to VirusTotal. The ClamAV signature converts easily into a YARA rule:
rule Masslogger {
    meta:
        author = "nikhegde"
        date = "07/19/2020"
        description = "This YARA rule detects Masslogger samples"
    strings:
        $sig_req_0 = { 06 08 8F 17 00 00 02 7B 22 00 00 04 2C 0C }
        $sig_req_1 = { 08 17 58 0C 08 07 32 CC 16 0C 2B C8 }
        $sig_opt_2 = { 7E 18 00 00 04 ?? ?? ?? ?? ?? ?? ?? ?? ?? 60 80 18 00 00 04 }
        $sig_opt_3 = { 7E 18 00 00 04 ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? 07 60 61 80 18 00 00 04 }
        $sig_opt_4 = { 7E 18 00 00 04 11 07 ?? ?? ?? ?? ?? ?? ?? ?? ?? 58 61 80 18 00 00 04 }
        $sig_opt_5 = { 7E 18 00 00 04 ?? ?? ?? ?? ?? ?? ?? ?? ?? 61 80 18 00 00 04 }
        $sig_opt_6 = { 1F 20 7E 18 00 00 04 61 80 18 00 00 04 }
        $sig_opt_7 = { 7E 18 00 00 04 ?? ?? ?? ?? ?? ?? ?? ?? ?? 17 11 07 58 60 61 80 18 00 00 04 }
        $sig_opt_8 = { 7E 18 00 00 04 ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? 61 80 18 00 00 04 }
        $sig_opt_9 = { 7E 18 00 00 04 ?? ?? ?? ?? ?? ?? ?? ?? ?? 33 34 }
        $sig_opt_10 = { 7E 15 00 00 04 14 FE 03 7E 18 00 00 04 ?? ?? ?? ?? ?? ?? ?? ?? ?? FE 01 16 FE 01 2E 37 }
    condition:
        (all of ($sig_req*)) and (7 of ($sig_opt*))
}
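The translation from ClamAV hex sub-signatures to YARA hex strings is mechanical enough to script. This sketch is our own helper. It handles only plain hex bytes and {n} jumps, expanding each jump into n ?? wildcards as in the rule above. Any other ClamAV construct raises an error rather than being silently mistranslated.

```python
import re

# A sub-signature made only of hex byte pairs and {n} jumps.
_SUPPORTED = re.compile(r"(?:[0-9A-Fa-f]{2}|\{\d+\})+")

def clamav_subsig_to_yara_hex(subsig: str) -> str:
    """Convert e.g. '7E18000004{9}608018000004' into a YARA hex string."""
    if _SUPPORTED.fullmatch(subsig) is None:
        raise ValueError(f"unsupported sub-signature syntax: {subsig!r}")
    tokens = []
    for hex_pair, skip in re.findall(r"([0-9A-Fa-f]{2})|\{(\d+)\}", subsig):
        if hex_pair:
            tokens.append(hex_pair.upper())
        else:
            tokens.extend(["??"] * int(skip))  # {n} becomes n single-byte wildcards
    return "{ " + " ".join(tokens) + " }"

print(clamav_subsig_to_yara_hex("7E18000004{9}608018000004"))
```

YARA's native jump syntax, such as [9], would work in place of the expanded ?? runs. Expanding them keeps the output identical to the hand-written rule.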
The next step is to verify whether the matched samples really are Masslogger samples. For confirmation, we looked for a command with the following structure in each behavioral report:
schtasks.exe" /Create /TN "Updates\<some_random_string1>" /XML "%TEMP%\tmp<some_random_string2>.tmp"
The loaded assembly executes this command at a later stage, which is beyond the scope of this post.
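To apply this confirmation step across many behavioral reports, the schtasks command structure can be expressed as a regular expression. This is a sketch under assumptions. The pattern and function name are ours, and the two random parts are matched loosely because their exact format is not known. We also assume the report lists command lines as plain strings.

```python
import re

# Loosely matches: schtasks.exe" /Create /TN "Updates\<random>" /XML "...\tmp<random>.tmp"
SCHTASKS_PATTERN = re.compile(
    r'schtasks\.exe"?\s+/Create\s+/TN\s+"Updates\\[^"\\]+"\s+'
    r'/XML\s+"[^"]*\\tmp[^"\\]*\.tmp"',
    re.IGNORECASE,
)

def has_masslogger_task_command(command_lines):
    """Return True if any command line from a behavioral report fits the pattern."""
    return any(SCHTASKS_PATTERN.search(line) for line in command_lines)
```

A hit is supporting evidence rather than proof, since other malware families may create scheduled tasks in a similar way.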