Friday, February 15, 2019

ClamAV adopts Clang-Format


ClamAV has adopted the use of Clang-Format for the purposes of improving the readability and maintainability of our code base.

Contributors to ClamAV, and those wishing to adopt the same format rules for their own projects, are advised to use the latest version of Clang-Format available. Some of the rules we have selected require version 7 as a minimum. Details are included below on how to install Clang-Format, including recommended plugins for popular text editors.

Some of our readers may be concerned that auto-formatting the code base will make it difficult to identify when a bug was introduced. The concern is valid but we have decided that the benefits in code readability outweighed the extra steps that may be needed when viewing the git history. As a reminder, `git blame` and `git diff` have the `-w` option allowing you to ignore changes to white-space which can help tremendously when analyzing changes. The ignore-whitespace feature can also be enabled in most Git plugins for text editors, such as VSCode’s GitLens extension.

Auto-formatting exemptions

In a few instances, we had to modify the code to ensure that Clang-Format would not incorrectly format the code. This is because there are still some cases where manually formatted code is more legible than Clang-Format auto-formatted code.

Situations where formatting requires some manual finesse:

  1. The `try` keyword may not be used as a variable name because Clang-Format will interpret it in the context of exception handling. Avoid the naming variables with reserved words such as `try` to alleviate such issues.
  2. Programmers often choose to align consecutive macros to make related constants easier to identify at a glance. The AlignConsecutiveMacros Clang-Format feature is still a work in progress so we are not yet able to automatically align consecutive macros.
  3. A similar issue to the alignment of consecutive macros is the alignment of large array and large struct declarations. Clang-Format generally does an okay job formatting single and multi-dimensional array and/or structure declarations. However, manual formatting in this case is generally better.

To address items 2 and 3 above, we have resorted to using the `// clang-format on/off` markers to protect specific blocks of code from being modified by Clang-Format.

ClamAV code examples where manual formatting was better than Clang-Format:
Preparatory steps taken to ensure correct formatting when working with Clang-Format can be found here.

This commit in our Git history implemented our Clang-Format rules for the first time. You may find it interesting to peruse to see how the rules affected the code base.

Clang-Format Usage

To support our chosen code format style, we have included the following in the clamav-devel dev/0.102 branch:


Formatting options:
  • Run the following in a directory containing a “.clang-format” ruleset. It will apply the rules in-place to the file or directory you specified:

    ~/workspace/clamav-devel$ clang-format -i <filepath>
  • If you’re working regularly with ClamAV, or another project configured to use Clang-Format, you may wish to set the workspace settings for your editor to format-on-save automatically for you.
    IMPORTANT: Be sure that you do NOT enable format-on-save globally for all projects, or you may inadvertently make large changes to code in projects that do not provide a Clang-Format ruleset.
  • If you have made many changes across the ClamAV code base, you may wish to run our clam-format convenience script:

    ~/workspace/clamav-devel$ ./clam-format
    This script does three things:
    • Re-generate the .clang-format config, specifying the specific format style rules our team has chosen to enforce (as described in the next section).
    • Update the format for all ClamAV-owned code in the repository in-place.
    • Undo changes from step (2) to specific files that should not be reformatted.

The Clam C/C++ Format Style

ClamAV’s format style was chosen not only because we find it attractive and legible, but also because it aligned best with the existing ClamAV code prior to enacting the new rules. We have done our best to minimize frivolous changes to the code base using the options that Clang-Format provides.

The Clam C/C++ Format Style uses the following Clang-Format options:

Language: Cpp, 

These rules only apply to C & C++ code.

UseTab: Never,
IndentWidth: 4,

Code must not use tabs for indentation, and must use 4 spaces for indentation.

AlignTrailingComments: true,

Comments after code on consecutive lines will be aligned.

Example:
int a;     // My comment a
int b = 2; // comment b


AlignConsecutiveAssignments: true,

Variable assignments on consecutive lines will be aligned.

Example:
int aaaa = 12;
int b    = 23;

int ccc  = 23;

AlignAfterOpenBracket: true,

Function arguments split onto multiple lines will be aligned.

Example:

someLongFunction(argument1,
                 argument2);


AlignEscapedNewlines: Left, 

Backslashes for escaped newlines will be aligned as far left as possible.

Example:
#define A     \
    int aaaa; \
    int b;    \
    int dddddddddd;


AlignOperands: true,

Operands split onto multiple lines will be aligned.

Example:
int aaa = bbbbbbbbbbbbbbb +
          ccccccccccccccc;


AllowShortFunctionsOnASingleLine: Empty,

Only empty functions may be defined on a single line.

Example:
void f() {}
void f2() {
    bar2();
}


AllowShortIfStatementsOnASingleLine: true,
Short if-statements may be written on a single line.

Example:
true, if (a) return;

AllowShortLoopsOnASingleLine: true,
Short loops may be written on a single line.

Example:
true, while (true) continue;

BreakBeforeBraces: Linux,

The Linux style of breaking before braces will be used. In this style, there is a new line before the opening brace on function, namespace and class definitions, but control statements such as if, else, for, do, while, and try and braces variable declarations such as enum and struct do not get a new line before the opening brace.

Example:
try {    foo();
} catch () {
}

void foo() { bar(); }

class foo
{
};

if (foo()) {
} else {
}

enum X : int { A, B };


BreakBeforeTernaryOperators: true,

Multiline ternary statements will place the ternary operators before the values, otherwise clang-format will reformat the statement into a single line.

Example:

veryVeryVeryVeryVeryVeryVeryVeryVeryVeryVeryLongDescription
    ? firstValue
    : SecondValueVeryVeryVeryVeryLong;


ColumnLimit: 0,

A column limit of 0 means that there is no column limit. In this case, Clang-Format will respect the input’s line breaking decisions within statements unless they contradict other rules.

FixNamespaceComments: true,

The closing brace of a namespace will include a comment providing the namespace name.

Example:
namespace a {
foo();
} // namespace a;


SortIncludes: false,

Header #include statements will not be automatically rearranged by alphabetical order, because the specific order of header includes may inadvertently or intentionally affect compilation.

MaxEmptyLinesToKeep: 1,

Only 1 empty line may exist between lines of code.

SpaceBeforeParens: ControlStatements,

Control statements will have space before the opening paren. Functions and macros will not.

Example:
void f() {    if (true) {        f();
    }
}


IndentCaseLabels: true,

Case labels in switch statements will be indented.

Example:
switch (fool) {
    case 1:
        bar();
        break;
    default:        plop();
}

DerivePointerAlignment: true

Pointer alignment preference is up to the author of the file but must be consistent across the file. Inconsistencies in a specific file will be reformatted to match the rest of the file.

Example pointer alignment possibilities:
  • int * a;
  • int* b;
  • int *c;
Install Clang-Format


Clang-Format is easy to install on most operating systems, although we will note that due to choices we’ve made in our format specification. Clang-Format version 7 or higher is required for ClamAV. If the Clang-Format tool is not provided as a separate package, it can be obtained by installing Clang or LLVM.


CentOS:

sudo yum install epel-release && sudo yum install clang

Fedora:

sudo dnf install clang

Ubuntu & Debian:

sudo apt install clang-format

macOS:

First, install Homebrew.

Then:

brew install clang-format

Windows:

First, install Chocolatey.

Then:

C:\> choco install llvm

Plugins for popular text editors and IDEs


In addition to the above manual command-line usage, clang-format plugins are readily available for most popular text editors, enabling you to auto-format as you edit using commands in your editor, or to auto-format each time you save a file.

VSCode:

https://marketplace.visualstudio.com/items?itemName=xaver.clang-format

Sublime Text:

https://packagecontrol.io/packages/Clang%20Format

VIM & Emacs:

http://clang.llvm.org/docs/ClangFormat.html#vim-integration

https://github.com/rhysd/vim-clang-format

Atom Editor:

https://atom.io/packages/clang-format

Visual Studio:

https://marketplace.visualstudio.com/items?itemName=LLVMExtensions.ClangFormat

Emacs:

https://www.reddit.com/r/emacs/comments/7uq9w1/replace_emacs_c_autoformatting_with_clangformat/

Conventions and recommendations beyond Clang-Format

In addition to the rules enforced by Clang-Format, we have some additional guidelines to use when developing code for ClamAV.

Disclaimer: As project with a history dating back almost two decades, not all code in ClamAV adheres to these guidelines. New code should, however. During the course of working with older code, it should be updated to follow these as best as is feasible and as time permits.

Integer Types


Integer variables should use variable types from stdint that are specific about variable width (int8_t, uint32_t, int64_t, etc.) wherever possible. These types should be included using #include “clamav-types.h”, for portability. Not all operating systems or toolchains provide stdint.h or inttypes.h.

Notable exceptions to the above:

ptrdiff_t may be used to hold the signed numerical value of the difference of two pointers. Similarly, intptr_t and uintptr_t may be used to store and perform arithmetic on pointers specifically because they match the width of a pointer on a given machine.

size_t may be used to store unsigned size values as needed.

off_t, ssize_t, int, and long and other poorly defined types of yore may be required (at least temporarily) by library API's. These should be avoided in general, favoring ptrdiff_t (signed), size_t (unsigned), or an even more explicit type, such as uint32_t or int64_t. The older integer types are not well defined and its size may vary by compiler, OS, and system architecture. These types should be properly converted to better defined types as fast as is practical.

Integer Types in Format Strings

The format characters for stdint variable types are defined in clamav-types.h as of ClamAV 0.101.1.

Example format string usage:
int64_t obj_id = 0;
printf("Object ID: " PRIi64 "\n", obj_id);


In addition to these types, many people forget how to print size_t, ptrdiff_t, ssize_t, and off_t in a way that is portable across all systems.

For reference, these are:
  • size_t val
    • “%zu”, val
  • ptrdiff_t val
    • “%zd”, val
  • off_t val
    • “%lld”, (long long)val
  • ssize_t val
    • “%lld”, (long long)val
Inline Documentation / Function Comments
Functions should have Doxygen-style comment blocks. These comment blocks should be located where other developers can easily read them when working with the functions.

That is to say that:

  • For library API's, the Doxygen comment block should appear above the function prototype in the .h header file. For an example, see clamav.h.
  • For almost all other function types, it is best to place the doxygen comment block above the implementation in the .c / .cc / .cpp file.
Tip: Many text editors offer Doxygen extensions that significantly assist in writing function comments. For VSCode, the Doxygen Documentation Generator extension is quite helpful.

Example comment block including a brief, description, in, out, and in/out parameters:

/**
 * @brief Scan a file, given a file descriptor.
 *
 * This callback variant allows the caller to provide a context structure that caller provided callback functions can interpret.
 *
 * @param desc File descriptor of an open file. The caller must provide this or the map.
 * @param filename (optional) Filepath of the open file descriptor or file map.
 * @param[out] virname Will be set to a statically allocated (i.e. needs not be freed) signature name if the scan matches against a signature.
 * @param[out] scanned The number of bytes scanned.
 * @param engine The scanning engine.
 * @param scanoptions Scanning options.
 * @param[in/out] context An opaque context structure allowing the caller to record details about the sample being scanned.
 * @return cl_error_t CL_CLEAN, CL_VIRUS, or an error code if an error occurred during the scan.
 */
extern int cl_scandesc_callback(int desc, const char *filename, const char **virname, unsigned long int *scanned, const struct cl_engine *engine, struct cl_scan_options *scanoptions, void *context);

Packed Struct Definitions

Some structs require packing to ensure that they correctly represent data the same when written to or read from disk or the network.

Take the example struct:
struct example {
    short y;
    int x;
    char z;
}

Without packing, the compiler might optimize execution time by representing the struct on disk as the following bytes:
y
y


x
x
x
x
z




In reality, the programmer probably envisioned the struct without all that extra padding:
y
y
x
x
x
x
z


To force the compiler to properly pack a struct so that it is portable, surround the struct definition with the following:

#ifndef HAVE_ATTRIB_PACKED
#define __attribute__(x)
#endif

#ifdef HAVE_PRAGMA_PACK
#pragma pack(1)
#endif

#ifdef HAVE_PRAGMA_PACK_HPPA
#pragma pack 1
#endif

/* struct definition */
struct example_struct
{
    ...
} __attribute__((packed));

#ifdef HAVE_PRAGMA_PACK
#pragma pack()
#endif

#ifdef HAVE_PRAGMA_PACK_HPPA
#pragma pack
#endif


Note that all THREE preprocessor directive forms are necessary for cross-platform compatible code, as per https://bugzilla.clamav.net/show_bug.cgi?id=1752.