Tuesday, November 25, 2014

Sample File Properties Collection Analysis Bytecode Signature Walkthrough

One of the features for ClamAV 0.98.5 is the File Properties Collection Analysis which collects information on scanned files. However, question is, how do we go about using this new feature?

Well, the first thing to know is there is a new ClamAV target type set for scanning the internal JSON structure generated by enabling this feature, target type 13. All signatures set for target type 13 with be only applied to the internal JSON structure. Thus you can write a normal ClamAV signature to detect on the value of certain fields but this is very limited in a number of regards. For example, you cannot have any insight on how the properties are related to one another or, in certain cases, whether a match is from the file itself or an embedded file.

In fact, the other case where you can make that distinction is on the root file type and embedded file types but this can, at best, just provide a form of filtering. This is the case where a bytecode signature is needed. In this post, we will be examining a bytecode signature source on the common things to look for in writing bytecode signatures. This post is directed towards signature writers.

 This bytecode source can be used to examine all supported files for embedded executables:
/* ClamAV.BCC.SandBox.Submit */
/* ClamAV.BCC.SandBox.InActive */
VIRUSNAME_PREFIX("ClamAV.BCC.SandBox")
VIRUSNAMES("Submit""InActive")

/* Target type is 13, internal JSON properties */
TARGET(13)

/* JSON API call will require FUNC_LEVEL_098_5 = 78 */
FUNCTIONALITY_LEVEL_MIN(FUNC_LEVEL_098_5)

SIGNATURES_DECL_BEGIN
DECLARE_SIGNATURE(sig1)
SIGNATURES_DECL_END

SIGNATURES_DEF_BEGIN
/* search @offset 0 : '{ "Magic": "CLAMJSON' */
/* this can be readjusted for specific filetypes */
DEFINE_SIGNATURE(sig1, "0: 7b20224d61676963223a2022434c414d4a534f4e")
SIGNATURES_END

bool logical_trigger(void)
{
    return matches(Signatures.sig1);
}

#define STR_MAXLEN 256

int entrypoint ()
{
    int i;
    int32_t type, obj, objarr, objit, arrlen, strlen;
    char str[STR_MAXLEN];

    /* check is json is available, alerts on inactive (optional) */
    if (!json_is_active())
        foundVirus("InActive");

    /* acquire array of internal contained objects */
    objarr = json_get_object("ContainedObjects", 16, 0);
    type = json_get_type(objarr);
    /* debug print uint (no '\n' or prepended message */
    debug_print_uint(type);

    if (type != JSON_TYPE_ARRAY) {
        return -1;
    }

    /* check array length for iteration over elements */
    arrlen = json_get_array_length(objarr);
    for (i = 0; i < arrlen; ++i) {
        /* acquire json object @ idx i */
        objit = json_get_array_idx(i, objarr);
        if (objit <= 0) continue;

        /* acquire FileType object of the array element @ idx i */
        obj = json_get_object("FileType", 8, objit);
        if (obj <= 0) continue;

        /* acquire and check type */
        type = json_get_type(obj);
        if (type == JSON_TYPE_STRING) {
            /* acquire string length, note +1 is for the NULL terminator */
            strlen = json_get_string_length(obj)+1;
            /* prevent buffer overflow */
            if (strlen > STR_MAXLEN)
                strlen = STR_MAXLEN;
            /* acquire string data, note strlen includes NULL terminator */
            if (json_get_string(str, strlen, obj)) {
                /* debug print str (with '\n' and prepended message */
                debug_print_str(str,strlen);

                /* check the contained object's type */
                if (strlen == 14 && !memcmp(str, "CL_TYPE_MSEXE", 14)) {
                    /* alert for submission */
                    foundVirus("Submit");
                    return 0;
                }
            }
        }
    }

    return 0;
}


Reported Signatures
The first thing that a bytecode signature source needs is the detection name it returns upon detecting the desired trait. The signatures that a bytecode can reported are determined by the strings passed to the VIRUSNAME_PREFIX and VIRUSNAMES macros.
/* ClamAV.BCC.SandBox.Submit */
/* ClamAV.BCC.SandBox.InActive */
VIRUSNAME_PREFIX("ClamAV.BCC.SandBox")
VIRUSNAMES("Submit""InActive")

VIRUSNAME_PREFIX: a REQUIRED macro field. It consists of exactly one string value which may contain alphanumeric characters and periods; periods are used to mark different groups the signature is attributed to.
VIRUSNAMES: an OPTIONAL macro field. It consists of an array of string values which may only contain alphanumeric characters.

Once the names of possible detections are declared, you need to specify in the entrypoint function when a detection has occurred. The bytecode signature reports specific detections through the usage of the bytecode API function foundVirus() which takes a single string argument that correlates to the VIRUSNAMES.
        foundVirus("InActive");
              ...
                    foundVirus("Submit");

Using an empty string (“”) will have the bytecode simple report the VIRUSNAME_PREFIX. Note that the VIRUSNAME_PREFIX string is not part of the foundVirus() call. Bytecode signatures may report one detection; multiple calls to foundVirus() may overwrite the previous detection though this behavior is not guaranteed.


Target Group and Engine
Bytecode allows for the user to direct the application of a bytecode signature specifically at a particular filetype and for a specific version of the ClamAV engine. While normally, these parameters are optional, using the File Properties Collection Analysis requires TARGET(13) and FUNCTIONALITY_LEVEL_MIN(FUNC_LEVEL_098_5).
/* Target type is 13, internal JSON properties */
TARGET(13)

/* JSON API call will require FUNC_LEVEL_098_5 = 78 */
FUNCTIONALITY_LEVEL_MIN(FUNC_LEVEL_098_5)

TARGET: a normally OPTIONAL macro field. For the case of ClamAV internal File Properties Collection Analysis, this is REQUIRED and MUST BE set to 13. It consists of single integer value [1-13 at time of writing] which represents the intended target of the bytecode (bytecode will only run on that target type). Target types are listed in the ClamAV Signature document.
FUNCTIONALITY_LEVEL_MIN: a normally OPTIONAL macro field. For the case of File Properties Collection Analysis, this is REQUIRED and MUST BE set to at least FUNC_LEVEL_098_5. It consists of an enumeration value that represents the minimum functionality level of ClamAV for this bytecode to run on. ClamAV versions prior to this value will not load this bytecode.
FUNCTIONALITY_LEVEL_MAX: an OPTIONAL macro field. It consists of an enumeration value that represents the maximum functionality level of ClamAV for this bytecode to run on. ClamAV versions after this value will not load this bytecode.


Logical signatures
Running bytecode signatures are quite expensive and so bytecode signatures are only executed where a certain logical condition is fulfilled. The first section that needs to be specified is the associated subsignatures and the logical trigger function. These are REQUIRED.
SIGNATURES_DECL_BEGIN
DECLARE_SIGNATURE(sig1)
SIGNATURES_DECL_END

SIGNATURES_DEF_BEGIN
/* search @offset 0 : '{ "Magic": "CLAMJSON' */
/* this can be readjusted for specific filetypes */
DEFINE_SIGNATURE(sig1, "0: 7b20224d61676963223a2022434c414d4a534f4e")
SIGNATURES_END

bool logical_trigger(void)
{
    return matches(Signatures.sig1);
}

Excerpt from clambc-user.pdf (from ClamAV Bytecode Documentation):
“Logical signatures use .ndb style patterns….
Each pattern has a name (like a variable), and a string that is the hex pattern itself. The declarations are delimited by the macros SIGNATURES_DECL_BEGIN, and SIGNATURES_DECL_END. The definitions are delimited by the macros SIGNATURES_DEF_BEGIN, and SIGNATURES_END. Declarations must always come before definitions, and you can have only one declaration and declaration section! (think of declaration like variable declarations, and definitions as variable assignments, since that what they are under the hood). The order in which you declare the signatures is the order in which they appear in the generated logical signature.
You can use any name for the patterns that is a valid record field name in C, and doesn’t conflict with anything else declared.
After using the above macros, the global variable Signatures will have [one field, sig1]. [This] can be used as arguments to the functions count_match(), and matches() anywhere in the program…:
·         matches[Signatures.sig1) will return true when the [sig1] signature matches (at least once)
·         count_match(Signatures.sig1) will return the number of times the [sig1] signature matched
The condition in [the logical_trigger function can be interpreted as if sig1 is matched at least once].”

The logical trigger for the sample bytecode is set to trigger on all supported file tyes by triggering on the detection of the associated internal structure “file magic”. If you want to trigger the bytecode on a specific file type, you could add a subsignature for the file type, for example:


/* search '"FileType": "CL_TYPE_MSEXE"' */
DEFINE_SIGNATURE(sig2, "2246696c6554797065223a2022434c5f545950455f4d5345584522")

And then set the logical trigger return to use both subsignatures:
bool logical_trigger(void)
{
    return matches(Signatures.sig1) && matches(Signatures.sig2);
}

Note that specifying TARGET(13) will already force the signature to apply strictly to File Properties Collection Analysis so the inclusion of the subsignature is simply for the requirement fulfillment in our example.


Main Program (entrypoint)
The “entrypoint” function in the bytecode signature can be seen as effectively the “main” function within a C program. It is REQUIRED and must use this prototype.
int entrypoint ()
{
...
}

At this point, the bytecode uses roughly the same syntax as the C programming language to perform operations on the file to determine whether or not to report detection. There are a key number of limitations on the bytecode language from C however, all of which can be found in section 4 of the clambc-user.pdf document of the ClamAV Bytecode Compiler.


Example Entrypoint Walkthrough
    int i;
    int32_t type, obj, objarr, objit, arrlen, strlen;
    char str[STR_MAXLEN];
These are the declarations of the variables will use in the bytecode. While it is not strictly enforced, it is generally good practice to state all variables at the start of each function. Note that only basic C types (excluding floats and doubles) and fixed-sized ints can be used. STR_MAXLEN is a macro equal to 256.

    /* check is json is available, alerts on inactive (optional) */
    if (!json_is_active())
        foundVirus("InActive");
The bytecode API is used to query if JSON is enabled in the libclamav instance. Note that the target type requirement of 13 will generally force the returned value to be true; this statement is here for extra safety. This segment also reports a virus “InActive” in the case that the JSON is inactive. A call to “foundVirus()” does not terminate the run of the program, so this sample signature will actually continue running even though JSON is not available. This is alright as most JSON parsing API functions check if JSON is enabled and return an error value.

    /* acquire array of internal contained objects */
    objarr = json_get_object("ContainedObjects", 16, 0);
    type = json_get_type(objarr);
    /* debug print uint (no '\n' or prepended message */
    debug_print_uint(type);

    if (type != JSON_TYPE_ARRAY) {
        return -1;
    }
This segment acquires an object ID for the “ContainedObjects” object and checks to see if the object is typed JSON_TYPE_ARRAY.  The other call “debug_print_uint()” prints a debug message with only the uint value (no “LibClamAV Debug” or newline”).

    /* check array length for iteration over elements */
    arrlen = json_get_array_length(objarr);
    for (i = 0; i < arrlen; ++i) {
        /* acquire json object @ idx i */
        objit = json_get_array_idx(i, objarr);
        if (objit <= 0) continue;
This segment setups an iteration across all the members of the “ContainedObjects” array retrieved earlier. Note the check for the objit ID to be a valid value.

        /* acquire FileType object of the array element @ idx i */
        obj = json_get_object("FileType", 8, objit);
        if (obj <= 0) continue;

        /* acquire and check type */
        type = json_get_type(obj);
        if (type == JSON_TYPE_STRING) {
            /* acquire string length, note +1 is for the NULL terminator */
            strlen = json_get_string_length(obj)+1;
            /* prevent buffer overflow */
            if (strlen > STR_MAXLEN)
                strlen = STR_MAXLEN;
            /* acquire string data, note strlen includes NULL terminator */
            if (json_get_string(str, strlen, obj)) {
                /* debug print str (with '\n' and prepended message */
                debug_print_str(str,strlen);

                /* check the contained object's type */
                if (strlen == 14 && !memcmp(str, "CL_TYPE_MSEXE", 14)) {
                    /* alert for submission */
                    foundVirus("Submit");
                    return 0;
                }
            }
        }
    }
This segment retrieves the objit’s type and, if it is a string, retrieves the string value and stores it in the str user string. The returned user string is a copy and modifications to the string do not change the internal object’s value. Next the string is a comparison of the returned string against “CL_TYPE_MSEXE” to determine whether to “Submit”.  Effectively, this signature returns a “Submit” whenever it detects a PE file embedded within the parent file.

Note the checks on the received strlen value to prevent a buffer overflow vulnerability. Bytecode signatures are always compiled with various runtime checks and thus a case that would cause the vulnerability would trigger a runtime error and bytecode termination (clamav continues to run). Regardless, great care should be exercised in regards to user variable boundaries.

    return 0;
}
Entrypoint function needs to return an integer; returning a 0 reports no issues with the bytecode execution however, all return values are ignored by libclamav by default.

This should about cover a basic example of how to write a bytecode signature to trigger on File Properties Collection Analysis, for more example bytecode signatures, you can look at the source code under “fileprop_analysis “ in the examples directory of the ClamAV source distribution. For more information on how to write bytecode signatures, you can review the documentation in under user in the doc directory of the clambc-compiler.