Wednesday, February 20, 2019
The ClamAV Development team is looking to add a new C programmer to the team, located in Fulton, Maryland USA!
Our team is a small and friendly one. At present, we have two software engineers and one quality assurance engineer. All three of us work in the Cisco office in Fulton, MD where we occupy a comfortable corner complete with our own snack area and Nerf gun arsenal. And we're not simply legacy software maintainers: we actively design and engineer new features while improving existing ones. If you’re seeking a long-term, head-in-the-sand, maintenance job, look elsewhere. Our team is constantly engaged. From discussing new ways to improve our processes, to opening ourselves to new concepts, our willingness to learn and explore new ideas leads us to write better code, test more effectively, and ultimately build a better product.
As a member of the ClamAV team, you wouldn’t be working in a vacuum, and you wouldn’t get stuck fixing bugs in legacy software day-in and day-out. The team has an ever-evolving list of cool, new ideas to work on. Project planning is a collaborative effort. We derive requirements from community requests as well as from our partners in Talos Malware Research, Talos Web Team, and stakeholders across a handful of Cisco product groups. We work as a team to find the best way to tackle the more difficult challenges. The ClamAV Dev team is part of a larger group of skilled developers, researchers, and analysts, with whom we collaborate daily. Working on ClamAV, you'll always have a friendly and highly skilled talent pool to draw upon and share ideas with.
Our team performs for a wide audience. ClamAV is oft regarded as an anti-malware product, but our tools do more than protect endpoint devices. In fact, ClamAV is run on most mail servers and many web servers on the internet, protecting the world from malware transmitted via email and file upload. What’s more, our open source scanning and matching technology is baked into many of Cisco's core products. As a consequence, our team is responsible for maintaining a healthy relationship not only with the open source community but also with those Cisco product development teams that integrate ClamAV into their software.
Speaking of Talos, the ClamAV team is a core component of the largest and coolest commercial threat intelligence organization on the planet. Our offices are new and shiny. Our workforce is skilled, but also casual. We enjoy our work, and we know how to relax, too. That means our office has an active roster of video gamers, board gamers, ping pong players, and more.
Ok … so the work sounds good to you, and the office sounds great too… but your schedule can be a little crazy, and sometimes you need to stay home in your pajamas. No problem. Our team generally works from home once a week. Usually we work from home on Fridays, but things can be adjusted as needed to accommodate life's inescapable work-week appointments.
Does this job sound too good to be true? Don't take my word for it. This year, Cisco placed #6 in Fortune’s rankings of the 100 best companies to work for! Our offices in Fulton, MD and in San Jose, CA are no exception.
To see the required skills, please check out the job listing on our careers page on TalosIntelligence.com.
Friday, February 15, 2019
ClamAV has adopted the use of Clang-Format for the purposes of improving the readability and maintainability of our code base.
Contributors to ClamAV, and those wishing to adopt the same format rules for their own projects, are advised to use the latest version of Clang-Format available. Some of the rules we have selected require version 7 as a minimum. Details are included below on how to install Clang-Format, including recommended plugins for popular text editors.
Some of our readers may be concerned that auto-formatting the code base will make it difficult to identify when a bug was introduced. The concern is valid, but we have decided that the benefits in code readability outweigh the extra steps that may be needed when viewing the git history. As a reminder, `git blame` and `git diff` have the `-w` option, which ignores whitespace changes and can help tremendously when analyzing changes. The ignore-whitespace feature can also be enabled in most Git plugins for text editors, such as VSCode’s GitLens extension.
Auto-formatting exemptions
In a few instances, we had to modify the code to ensure that Clang-Format would not incorrectly format the code. This is because there are still some cases where manually formatted code is more legible than Clang-Format auto-formatted code.
Situations where formatting requires some manual finesse:
- The `try` keyword may not be used as a variable name because Clang-Format will interpret it in the context of exception handling. Avoid naming variables with reserved words such as `try` to alleviate such issues.
- Programmers often choose to align consecutive macros to make related constants easier to identify at a glance. The AlignConsecutiveMacros Clang-Format feature is still a work in progress so we are not yet able to automatically align consecutive macros.
- A similar issue to the alignment of consecutive macros is the alignment of large array and large struct declarations. Clang-Format generally does an okay job formatting single and multi-dimensional array and/or structure declarations. However, manual formatting in this case is generally better.
To address items 2 and 3 above, we have resorted to using the `// clang-format on/off` markers to protect specific blocks of code from being modified by Clang-Format.
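As a minimal sketch of how these markers are used, the block below protects a hand-aligned group of macros and a small array from reformatting. The macro names, the table, and the helper function are hypothetical and are not taken from the ClamAV source.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants, hand-aligned so related values line up.
 * Clang-Format leaves everything between the two markers untouched. */
// clang-format off
#define EXAMPLE_TYPE_TEXT     0x01
#define EXAMPLE_TYPE_BINARY   0x02
#define EXAMPLE_TYPE_ARCHIVE  0x04

static const uint8_t example_table[2][4] = {
    { 0x00, 0x01, 0x02, 0x03 },
    { 0x10, 0x11, 0x12, 0x13 },
};
// clang-format on

/* Code after the "on" marker is formatted by Clang-Format as usual. */
static size_t example_table_rows(void)
{
    return sizeof(example_table) / sizeof(example_table[0]);
}
```

Keeping the protected region as small as possible means the rest of the file still benefits from automatic formatting.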
ClamAV code examples where manual formatting was better than Clang-Format:
Preparatory steps taken to ensure correct formatting when working with Clang-Format can be found here.
This commit in our Git history implemented our Clang-Format rules for the first time. You may find it interesting to peruse to see how the rules affected the code base.
Clang-Format Usage
To support our chosen code format style, we have included the following in the clamav-devel dev/0.102 branch:
- .clang-format C/C++ code format rules
- clam-format convenience script
Formatting options:
- Run the following in a directory containing a “.clang-format” ruleset. It will apply the rules in-place to the file or directory you specified:
~/workspace/clamav-devel$ clang-format -i <filepath>
- If you’re working regularly with ClamAV, or another project configured to use Clang-Format, you may wish to set the workspace settings for your editor to format-on-save automatically for you.
IMPORTANT: Be sure that you do NOT enable format-on-save globally for all projects, or you may inadvertently make large changes to code in projects that do not provide a Clang-Format ruleset.
- If you have made many changes across the ClamAV code base, you may wish to run our clam-format convenience script:
~/workspace/clamav-devel$ ./clam-format
This script does three things:
- Re-generate the .clang-format config, specifying the specific format style rules our team has chosen to enforce (as described in the next section).
- Update the format for all ClamAV-owned code in the repository in-place.
- Undo changes from step (2) to specific files that should not be reformatted.
The Clam C/C++ Format Style
ClamAV’s format style was chosen not only because we find it attractive and legible, but also because it aligned best with the existing ClamAV code prior to enacting the new rules. We have done our best to minimize frivolous changes to the code base using the options that Clang-Format provides.
The Clam C/C++ Format Style uses the following Clang-Format options:
Language: Cpp,
These rules only apply to C & C++ code.
UseTab: Never,
IndentWidth: 4,
Code must not use tabs for indentation, and must use 4 spaces for indentation.
AlignTrailingComments: true,
Comments after code on consecutive lines will be aligned.
Example:
int a; // My comment a
int b = 2; // comment b
AlignConsecutiveAssignments: true,
Variable assignments on consecutive lines will be aligned.
Example:
int aaaa = 12;
int b = 23;
int ccc = 23;
AlignAfterOpenBracket: true,
Function arguments split onto multiple lines will be aligned.
Example:
someLongFunction(argument1,
argument2);
AlignEscapedNewlines: Left,
Backslashes for escaped newlines will be aligned as far left as possible.
Example:
#define A \
int aaaa; \
int b; \
int dddddddddd;
AlignOperands: true,
Operands split onto multiple lines will be aligned.
Example:
int aaa = bbbbbbbbbbbbbbb +
ccccccccccccccc;
AllowShortFunctionsOnASingleLine: Empty,
Only empty functions may be defined on a single line.
Example:
void f() {}
void f2() {
bar2();
}
AllowShortIfStatementsOnASingleLine: true,
Short if-statements may be written on a single line.
Example:
if (a) return;
AllowShortLoopsOnASingleLine: true,
Short loops may be written on a single line.
Example:
while (true) continue;
BreakBeforeBraces: Linux,
The Linux style of breaking before braces will be used. In this style, there is a new line before the opening brace on function, namespace, and class definitions, but control statements such as if, else, for, do, while, and try, and braced declarations such as enum and struct, do not get a new line before the opening brace.
Example:
try {
    foo();
} catch () {
}
void foo() { bar(); }
class foo
{
};
if (foo()) {
} else {
}
enum X : int { A, B };
BreakBeforeTernaryOperators: true,
Multiline ternary statements will place the ternary operators before the values, otherwise clang-format will reformat the statement into a single line.
Example:
veryVeryVeryVeryVeryVeryVeryVeryVeryVeryVeryLongDescription
? firstValue
: SecondValueVeryVeryVeryVeryLong;
ColumnLimit: 0,
A column limit of 0 means that there is no column limit. In this case, Clang-Format will respect the input’s line breaking decisions within statements unless they contradict other rules.
FixNamespaceComments: true,
The closing brace of a namespace will include a comment providing the namespace name.
Example:
namespace a {
foo();
} // namespace a;
SortIncludes: false,
Header #include statements will not be automatically rearranged by alphabetical order, because the specific order of header includes may inadvertently or intentionally affect compilation.
MaxEmptyLinesToKeep: 1,
Only 1 empty line may exist between lines of code.
SpaceBeforeParens: ControlStatements,
Control statements will have space before the opening paren. Functions and macros will not.
Example:
void f() {
    if (true) {
        f();
    }
}
IndentCaseLabels: true,
Case labels in switch statements will be indented.
Example:
switch (fool) {
    case 1:
        bar();
        break;
    default:
        plop();
}
DerivePointerAlignment: true
Pointer alignment preference is up to the author of the file but must be consistent across the file. Inconsistencies in a specific file will be reformatted to match the rest of the file.
Example pointer alignment possibilities:
- int * a;
- int* b;
- int *c;
Clang-Format is easy to install on most operating systems, although note that, due to choices we’ve made in our format specification, Clang-Format version 7 or higher is required for ClamAV. If the Clang-Format tool is not provided as a separate package, it can be obtained by installing Clang or LLVM.
CentOS:
sudo yum install epel-release && sudo yum install clang
Fedora:
sudo dnf install clang
Ubuntu & Debian:
sudo apt install clang-format
macOS:
First, install Homebrew.
Then:
brew install clang-format
Windows:
First, install Chocolatey.
Then:
C:\> choco install llvm
Plugins for popular text editors and IDEs
In addition to the above manual command-line usage, clang-format plugins are readily available for most popular text editors, enabling you to auto-format as you edit using commands in your editor, or to auto-format each time you save a file.
VSCode:
https://marketplace.visualstudio.com/items?itemName=xaver.clang-format
Sublime Text:
https://packagecontrol.io/packages/Clang%20Format
VIM & Emacs:
http://clang.llvm.org/docs/ClangFormat.html#vim-integration
https://github.com/rhysd/vim-clang-format
Atom Editor:
https://atom.io/packages/clang-format
Visual Studio:
https://marketplace.visualstudio.com/items?itemName=LLVMExtensions.ClangFormat
Emacs:
https://www.reddit.com/r/emacs/comments/7uq9w1/replace_emacs_c_autoformatting_with_clangformat/
Conventions and recommendations beyond Clang-Format
In addition to the rules enforced by Clang-Format, we have some additional guidelines to use when developing code for ClamAV.
Disclaimer: As a project with a history dating back almost two decades, not all code in ClamAV adheres to these guidelines. New code should, however. When working with older code, update it to follow these guidelines as best as is feasible and as time permits.
Integer Types
Integer variables should use variable types from stdint that are specific about variable width (int8_t, uint32_t, int64_t, etc.) wherever possible. For portability, these types should be included using #include "clamav-types.h", because not all operating systems or toolchains provide stdint.h or inttypes.h.
Notable exceptions to the above:
ptrdiff_t may be used to hold the signed numerical value of the difference of two pointers. Similarly, intptr_t and uintptr_t may be used to store and perform arithmetic on pointers specifically because they match the width of a pointer on a given machine.
size_t may be used to store unsigned size values as needed.
off_t, ssize_t, int, long, and other poorly defined types of yore may be required (at least temporarily) by library APIs. These should be avoided in general, favoring ptrdiff_t (signed), size_t (unsigned), or an even more explicit type, such as uint32_t or int64_t. The older integer types are not well defined, and their sizes may vary by compiler, OS, and system architecture. These types should be converted to better-defined types as soon as is practical.
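The following is a small sketch of these guidelines in practice. The function names are hypothetical, and the example includes stdint.h directly because it is meant to compile outside the ClamAV tree; inside ClamAV, clamav-types.h would be included instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sum the bytes of a buffer.
 * size_t holds the length, uint8_t the raw bytes, and uint32_t gives
 * the result an explicit, fixed width on every platform. */
uint32_t example_checksum(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        sum += buf[i];
    }
    return sum;
}

/* ptrdiff_t holds the signed difference of two pointers into the same buffer. */
ptrdiff_t example_offset_of(const uint8_t *start, const uint8_t *pos)
{
    return pos - start;
}
```

For a four-byte buffer containing 1, 2, 3, and 250, example_checksum returns 256, and example_offset_of(buf, buf + 3) returns 3.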
Integer Types in Format Strings
The format characters for stdint variable types are defined in clamav-types.h as of ClamAV 0.101.1.
Example format string usage:
int64_t obj_id = 0;
printf("Object ID: %" PRIi64 "\n", obj_id);
In addition to these types, many people forget how to print size_t, ptrdiff_t, ssize_t, and off_t in a way that is portable across all systems.
For reference, these are:
- size_t val: "%zu", val
- ptrdiff_t val: "%zd", val
- off_t val: "%lld", (long long)val
- ssize_t val: "%lld", (long long)val
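Putting several of these together, a minimal sketch might look like the following. The function name is hypothetical, and the example assumes a POSIX system that provides inttypes.h and sys/types.h; within ClamAV the format macros would come from clamav-types.h instead.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: format an object ID, a size, and a file offset.
 * PRIi64 supplies only the length modifier and conversion character, so
 * the '%' must be written in the string literal that precedes it.
 * off_t has no portable format character, so it is cast to long long. */
int example_format(char *out, size_t out_len, int64_t obj_id, size_t size, off_t offset)
{
    assert(out != NULL);

    return snprintf(out, out_len,
                    "id=%" PRIi64 " size=%zu offset=%lld",
                    obj_id, size, (long long)offset);
}
```

Calling example_format(buf, sizeof(buf), 42, 7, 1024) with a 64-byte buffer writes "id=42 size=7 offset=1024" into buf.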
Functions should have Doxygen-style comment blocks. These comment blocks should be located where other developers can easily read them when working with the functions.
That is to say that:
- For library APIs, the Doxygen comment block should appear above the function prototype in the .h header file. For an example, see clamav.h.
- For almost all other function types, it is best to place the Doxygen comment block above the implementation in the .c / .cc / .cpp file.
Example comment block including a brief, description, in, out, and in/out parameters:
/**
* @brief Scan a file, given a file descriptor.
*
* This callback variant allows the caller to provide a context structure that caller provided callback functions can interpret.
*
* @param desc File descriptor of an open file. The caller must provide this or the map.
* @param filename (optional) Filepath of the open file descriptor or file map.
* @param[out] virname Will be set to a statically allocated (i.e. needs not be freed) signature name if the scan matches against a signature.
* @param[out] scanned The number of bytes scanned.
* @param engine The scanning engine.
* @param scanoptions Scanning options.
* @param[in,out] context An opaque context structure allowing the caller to record details about the sample being scanned.
* @return cl_error_t CL_CLEAN, CL_VIRUS, or an error code if an error occurred during the scan.
*/
extern int cl_scandesc_callback(int desc, const char *filename, const char **virname, unsigned long int *scanned, const struct cl_engine *engine, struct cl_scan_options *scanoptions, void *context);
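For functions that are not part of the library API, the same comment style sits directly above the implementation. The sketch below uses a hypothetical helper, not a function from the ClamAV source, to show input, output, and in/out parameters documented next to the code they describe.

```c
#include <stddef.h>
#include <stdint.h>

/**
 * @brief Count the occurrences of a byte value in a buffer.
 *
 * Hypothetical internal helper used only to illustrate the comment style.
 *
 * @param buf           Buffer to search. Must not be NULL if len is non-zero.
 * @param len           Number of bytes in buf.
 * @param needle        Byte value to count.
 * @param[out] first    (optional) Set to the offset of the first match, if any.
 * @param[in,out] total (optional) Running total, incremented by the number of matches.
 * @return size_t       Number of matches found in this buffer.
 */
size_t example_count_byte(const uint8_t *buf, size_t len, uint8_t needle, size_t *first, size_t *total)
{
    size_t count = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        if (buf[i] == needle) {
            if (count == 0 && first != NULL) {
                *first = i;
            }
            count++;
        }
    }
    if (total != NULL) {
        *total += count;
    }
    return count;
}
```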
Packed Struct Definitions
Some structs require packing to ensure that they represent data the same way when written to or read from disk or the network.
Take the example struct:
struct example {
    short y;
    int x;
    char z;
};
Without packing, the compiler might optimize execution time by representing the struct on disk as the following bytes:
| y | y |   |   | x | x | x | x | z |   |   |   |
In reality, the programmer probably envisioned the struct without all that extra padding:
| y | y | x | x | x | x | z |
To force the compiler to properly pack a struct so that it is portable, surround the struct definition with the following:
#ifndef HAVE_ATTRIB_PACKED
#define __attribute__(x)
#endif
#ifdef HAVE_PRAGMA_PACK
#pragma pack(1)
#endif
#ifdef HAVE_PRAGMA_PACK_HPPA
#pragma pack 1
#endif
/* struct definition */
struct example_struct
{
...
} __attribute__((packed));
#ifdef HAVE_PRAGMA_PACK
#pragma pack()
#endif
#ifdef HAVE_PRAGMA_PACK_HPPA
#pragma pack
#endif
Note that all THREE preprocessor directive forms are necessary for cross-platform compatible code, as per https://bugzilla.clamav.net/show_bug.cgi?id=1752.
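To see the effect of packing, the sketch below compares the size of a padded and a packed version of the example struct. It is a simplification that assumes a GCC- or Clang-compatible compiler supporting __attribute__((packed)); it omits the pragma guards shown above, which portable ClamAV code still needs. The struct and function names are hypothetical, and fixed-width types are used so the packed size is the same on every platform.

```c
#include <stddef.h>
#include <stdint.h>

/* Unpacked: the compiler is free to insert padding for alignment. */
struct example_padded {
    int16_t y;
    int32_t x;
    char z;
};

/* Packed: members are laid out back to back with no padding.
 * Assumes the compiler understands __attribute__((packed)). */
struct example_packed {
    int16_t y;
    int32_t x;
    char z;
} __attribute__((packed));

size_t example_padded_size(void)
{
    return sizeof(struct example_padded);
}

size_t example_packed_size(void)
{
    return sizeof(struct example_packed);
}
```

The packed struct occupies exactly 2 + 4 + 1 = 7 bytes, while the padded version is typically larger, with the exact size depending on the platform's alignment rules.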
Monday, February 11, 2019
Notice to all ClamAV users: our Bugzilla instance at bugzilla.clamav.net will be experiencing some downtime tomorrow at 9am EST, February 14th, 2019, for about 90 minutes. We are moving the data to a new machine at that time.