Friday, November 4, 2011

Bytecode signatures for polymorphic malware

About one year ago Alain presented the LLVM-based ClamAV bytecode. We've realised that, besides that initial introduction, we've never shown any real life use case, nor did we ever demonstrate the incredible power and flexibility of the ClamAV bytecode engine. I'll try to fix that today.

I decided to target the Xpaj virus because it's an polymorphic file infector, which means that it is not easily to detected with plain signatures.
Please note that I'm just focusing on the detection of Xpaj via bytecode signatures, not on Xpaj itself which was already thoroughly reviewed and explained.

Clean file
Pic.1: Clean file

Pic.2: Same file as above, but infected with Xpaj

For the scope of this blog post, it suffices to say that Xpaj is a file infector targeting 32-bit Windows executables and DLLs which employs entry-point obfuscation (EPO) capabilities in order to make the detection harder. In particular, the virus code hijacks a few API calls in the .text section of the file, diverting them to its own routine.

This routine is located within the .text section and consists of a series of small chunks of code connected by jumps. Most of that is “garbage”. The only thing this preliminary block of code does is compute the code address for the next stage and jump to it. The actual viral code, as well as the overwritten blocks, are stored, in encrypted form, inside the data section.

Well... enough technical info already. From now on I'll just focus on the Xpaj detection, or rather, the detection of a rather simplified version of it in order to keep this blog post small and readable. The geeks can find the full source code here.

Let's start with a look at the virus entry point code:
push   ebp
mov ebp, esp
sub esp, XX
While these are technically enough bytes to create a signature based on the opcodes, such a signature would be a really bad idea. What we have there, in fact, is just a pretty standard function entry point.

After that we have some optional trash (do nothing) code, and then the virus saves the content of 3 random registers, which will be clobbered later by both the virus code and the trash engine too.

So far we can still get away with a signature that makes use of a wildcard, however we still don't have much: stack allocation and 3 registers saved. That's still not enough.

Next, we've got the trash engine in all its glory, and eventually we reach a function call.
The trash code may or may not jump to another chunk of code. And that effectively kills our ability to use a normal (ndb or ldb) signature.

Not all is lost, though. We can still write a small piece of bytecode signature which follows the code through the trash and checks for specific fingerprints.

In particular we plan to scan the code section for something that looks like the following:
mov         edi, edi
push ebp
mov ebp, esp
sub esp, $STACKSIZE
[optional trash]
push eax (*)
push edx (*)
push edi (*)

(*) note, the registers are chosen randomly among the 32 bit general purpose registers except esp and ebp

[optional trash]
call $DELTA
Here we are inside "$DELTA"..
[optional trash]
mov register, [ebp-stacksize]
[optional trash]

Back outside the call we have a couple of other less interesting fingerprints and eventually the virus will jump to some runtime computed location. There are two ways by which this is achieved:
jmp         local_var
push        local_var

Ok let's code...

First we look for the 5 static bytes at the virus entry point (EP):
seek(begin_of_the_code_section, SEEK_SET);
cur = file_find_limit("\x55\x89\xe5\x83\xec", 5, end_of_the_code_section);
if(cur < 0) return 0;
Then we set ourselves in a disassembly loop and we check if we got what we expect. Something along the lines of:
while(1) {
struct DIS_fixed d;
int next = DisassembleAt(&d, cur, space_remaining);
if(next == -1) break; /* disasm error */
cur = next; /* cur now points at the next op */
[here we check the op]

As for the actual opcode matching, here are a few examples. The first thing we are interested in is the 3 pushes. In terms of bytecode we need to check that:

1. the opcode is OP_PUSH
2. the argument is a register
3. the register is one of (eax, ebx, ecx, edx, esi, edi)

In BC that'd be:
d.x86_opcode == OP_PUSH
d.arg[0].access_type == ACCESS_REG
d.arg[0].u.reg == REG_EAX || d.arg[0].u.reg == REG_ECX || d.arg[0].u.reg == REG_EDX || d.arg[0].u.reg == REG_EBX || d.arg[0].u.reg == REG_ESI || d.arg[0].u.reg == REG_EDI
if(d.x86_opcode == OP_PUSH && d.arg[0].access_type == ACCESS_REG && (d.arg[0].u.reg == REG_EAX || d.arg[0].u.reg == REG_ECX || d.arg[0].u.reg == REG_EDX || d.arg[0].u.reg == REG_EBX || d.arg[0].u.reg == REG_ESI || d.arg[0].u.reg ==  REG_EDI))
Then we need to check for the call $DELTA. In other words we check that:

1. the opcode is a call
i.e.: d.x86_opcode == OP_CALL
2. the argument is an immediate relative value
i.e.: d.arg[0].access_type == ACCESS_REL

Then we pick the call target and we "jump" to it, not before saving the return address:
int32_t target_address, return_address;
seek(cur-4, SEEK_SET); /* we position onto the call argument */
read(&target_address, sizeof(target_address)); /* we read the relative jump value */
target_address = le32_to_host(target_address); /* we handle big endian machines */
retaddr = cur; /* we save the address to return to */
target_address = cur + target_address; /* we compute the addres to jump to */

Another interesting example is the trash code parser. There can be 3 types or trash ops:

A. Arithmetic or logic operation on a stack allocated DWORD based on an immediate or register value. Eg:
mov [ebp-xx], immed
add [ebp-xx], register
B. Arithmetic or logic operation on a 32bit register based on a stack allocated DWORD or an immediate value. Eg:
mov register, [ebp-xx]
sub register, other_register
C. A jump to the next chunk of code.Eg:
jmp next_chunk
More in details, for case A we check that:

1. d.x86_opcode is one of (OP_ADD, OP_ADC, OP_AND, OP_MOV, OP_OR, OP_SBB, OP_SUB, OP_XOR), i.e.:
d.x86_opcode == OP_ADD || d.x86_opcode == OP_ADC || d.x86_opcode == OP_AND || d.x86_opcode == OP_MOV || d.x86_opcode == OP_OR || d.x86_opcode == OP_SBB || d.x86_opcode == OP_SUB || d.x86_opcode == OP_XOR

2. the dest argument is a mem region:
d.arg[0].access_type == ACCESS_MEM

3. the access size is a DWORD:
d.arg[0].u.mem.access_size == SIZED

4. the dest argument is in the form [ebx-displacement]:
d.arg[0].u.mem.scale_reg == REG_EBP && d.arg[0].u.mem.scale == 1 && d.arg[0].u.mem.add_reg == REG_INVALID

5. the displacement fits within the local funcion stack:
d.arg[0].u.mem.displacement <= -4 && d.arg[0].u.mem.displacement >= -(int32_t)stacksize

6. the source argument can be anything (i.e. a register or an immediate value): nothing to check!

Case B is very similar, except the arguments are reversed:

1. The dest argument is a register:
d.arg[0].access_type == ACCESS_REG

2a. The src arg is either another reg:
d.arg[1].access_type == ACCESS_REG

2b. Or it is an immediate:
d.arg[1].access_type == ACCESS_IMM

2c. Or it is a stack based DWORD:
d.arg[0].access_type == ACCESS_MEM && d.arg[0].u.mem.access_size == SIZED && d.arg[0].u.mem.scale_reg == REG_EBP && d.arg[0].u.mem.scale == 1 && d.arg[0].u.mem.add_reg == REG_INVALID && d.arg[0].u.mem.displacement <= -4 && d.arg[0].u.mem.displacement >= -(int32_t)stacksize

Finally, case C... Here we:

1. Check that the op is a jmp:
d.x86_opcode == OP_JMP

2. Check that it's got an immediate argument:
d.arg[0].access_type == ACCESS_REL

3. Then we can "jump" to the next position:
int32_t rel;
seek(cur-4, SEEK_SET); /* move onto the jmp argument */
read(&rel, sizeof(rel)); /* read it */
rel = le32_to_host(rel); /* make it big endian safe */
cur += rel; /* "jump" to it */

Blog post by Alberto Wu.