Monday, September 16, 2019

Understanding and transitioning to ClamAV's new On-Access scanner

We have a new On-Access scanner for ClamAV that separates functionality from clamd into a new application called clamonacc.

This post is for technically inclined users who have used ClamAV’s On-Access scanner in the past (0.99 - 0.101.3), and wish to transition to a newer version (>= 0.102.0). While this overview may be somewhat useful for new On-Access users, we first recommend setting up your environment using the official documentation, then returning here only if your use case is not met.

This post is also for anyone who may simply be trying to install ClamAV from source, on an older system, with an older version of Curl. If that’s the case, skip ahead to the section titled The Breakdown for your fix.

Things That Haven’t Changed

With a change this big, it’s easier to start with what’s the same. Here’s a list of the important things:
  • Fanotify and inotify still required
  • Clamd still needs to be run and clamd.conf still used by default
  • All working “OnAccessXYZ” clamd.conf configuration options still valid and work as expected
  • Only Linux systems are supported

The New Stuff

Now for the real reason you’re here: what’s different and how that affects you. Well, let’s start with the differences, and then I’ll break down each item to help you gain a fuller understanding of the new system.
  • Curl (version >= 7.45) required for installation
  • VirusEvent and Extra Scanning features re-enabled
  • Client application called clamonacc which interfaces with a clamd server
  • Command-line options
  • Separate and cleaner logging
  • Configuration option for excluding users via username
  • Configurable multi-threaded event handling architecture
  • Configuration options which allow tweaks to network communication and error handling

The Breakdown

Curl (version >= 7.45) required for installation:

This is only relevant if you are installing from source, but it is worth noting. If your curl version is out of date, the installation will fail with an error message stating that you need a version of curl >= 7.45 when you run:

    $ ./configure

If your OS package maintainers do not provide curl 7.45 or newer, we recommend installing the latest version of curl (and its headers) from source.

Alternatively, if you don’t need On-Access capabilities, you can skip installation on your system using the “./configure” flag “--disable-clamonacc”. If you are using a non-Linux system, installation of clamonacc will automatically be disabled.
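Before running ./configure, it can save time to check which libcurl version is installed. The following is a minimal sketch, assuming a `sort` that supports `-V` (GNU coreutils does) and that `curl-config`, which ships with the libcurl development package, is on your PATH. The helper name `version_ge` is hypothetical and not part of ClamAV.

```shell
# Hypothetical helper: succeeds when dotted version $1 is >= dotted version $2.
version_ge() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

required=7.45
if command -v curl-config >/dev/null 2>&1; then
    # curl-config --version prints a line such as "libcurl 7.64.0".
    installed=$(curl-config --version | awk '{print $2}')
    if version_ge "$installed" "$required"; then
        echo "libcurl $installed satisfies the >= $required requirement"
    else
        echo "libcurl $installed is too old; upgrade it or use --disable-clamonacc"
    fi
else
    echo "curl-config not found; install the libcurl development headers first"
fi
```

If the check reports an older version, the two options described above apply: build a newer curl from source, or configure ClamAV with --disable-clamonacc.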

VirusEvent and Extra Scanning features re-enabled:

Previous versions of the On-Access Scanner had disabled the VirusEvent and Extra Scanning features. The VirusEvent feature allows users to kick off a custom shell script whenever clamd finds a malicious object. Extra Scanning is a feature tied to inotify, which uses inotify's broader and more mature event detection to fill the gaps left (at the time) by fanotify event coverage. With Extra Scanning enabled, users can catch "create" and "move to" events, which were not available for capture with the fanotify API until kernel version 5.1. Without Extra Scanning, On-Access scanning will capture only "access" and "open" events.

Both VirusEvent and Extra Scanning features were disabled due to resource consumption issues when running the On-Access Scanner for long periods of time. However, the new On-Access Scanner has been re-architected with a long-running use-case at the forefront. As a result, it is more reliable, error tolerant, and much, much better at cleaning up after itself. All of this allows us to re-enable the Extra Scanning feature with confidence.

Similarly, due to the new separation between clamd and clamonacc, VirusEvent scripts should now work as expected. This is not so much “re-enabling the feature” as it is a direct (albeit planned and intended) result of this new separation. As with clamdscan, VirusEvent will be kicked off by the clamd process, not the new clamonacc application.

Client application called clamonacc which interfaces with a clamd server:

The biggest change to On-Access Scanning is its separation from the clamd server application. With this separation comes more flexibility in deployment options, better stability and up-time for both applications, and a much smaller potential attack surface.

Regarding flexibility, the application can be run on the same machine as a clamd instance, or for resource sensitive deployments clamonacc can “phone home” to a central clamd instance. Even better, multiple clamonacc instances on multiple systems can all receive verdicts from a single, centrally located clamd instance. This offloads verdicts to a single location, and scanning/protection tasking to a much lighter-weight application.

However, while such a deployment is possible, it requires streaming over a TCP socket connection, which comes with a number of drawbacks. First, this version of ClamAV requires users to secure their own TCP sockets. We are moving to change this in the future (the new curl requirement is a step in that direction), but it’s still important to note. Second, the version of clamonacc (and clamd) released with 0.102.0 is not optimized for sending files and receiving verdicts via a network stream. While there are plans to alleviate this, expect full file contents to be sent across the configured socket each time clamonacc requires a clamd verdict. This will obviously have a network impact on a distributed deployment. Third, and finally, caching still needs to be implemented on the clamonacc client side to reduce the number of overall network scan requests.

All that said, smart network engineering and a targeted clamonacc configuration, one which watches only the necessary files and directories and excludes the right UIDs and/or usernames, might let you mitigate or overcome these hurdles quite nicely.

Another benefit to this separation is increased stability for both clamd and clamonacc. During our testing, clamonacc was able to recover gracefully from just about every issue that arose--whether anticipated or not--while still providing necessary protections. Similarly, during the course of development and testing, clamd was not affected by any clamonacc failure. That said, this does not mean that clamonacc cannot affect clamd at all, or vice-versa. These applications do not exist in a vacuum and must necessarily interact with one another during normal operation.

With that in mind, one major goal of this rework was improving clamd’s security posture. In versions prior to 0.102, On-Access Scanning was tied directly into clamd, and thus required users to run clamd with elevated privileges (often root). This came with a host of security concerns given the size of clamd's attack surface. By separating clamonacc from clamd, a system admin need only ensure clamd has the read and access permissions necessary to deal with any file descriptors clamonacc may pass along. Of course, clamonacc still requires elevated permissions due to the fanotify interfaces used, but compared to clamd, clamonacc's attack surface is much smaller.

Command-line options:

In order of appearance when you run clamonacc with “--help” these are the command line options and their uses:

   --help

As one would expect, prints the version number, a command line usage example, and a very abbreviated explanation of each available command line option, alongside their shorter forms.

    --version              

Attempts a connection to the clamd server and requests clamd’s version, such that a version mismatch between server and client might be identified. If a clamd server is not found, the local client version is printed to the console instead.

    --verbose

This is akin to clamd’s or clamscan’s --debug option, but isn’t quite so noisy as either of those. By default, clamonacc does not print any output after daemonizing, so you will have to pair this option with --log or --foreground to use it.

    --log=FILE

FILE should be a full path to the logfile you wish clamonacc to use. Without this option, clamonacc will not keep a log. With this option, clamonacc will only output some information to the console if --foreground is enabled. As of the release of 0.102, it is a known bug that clamonacc lacks log rotation.

    --foreground

Forces clamonacc not to daemonize into the background and instead print output and verdicts to the console.

    --watch-list=FILE

This is the command line analogue to the “OnAccessIncludePath” configuration option. The file provided via FILE will be parsed at startup and all valid paths will have watch points placed on them. FILE must be a proper path, it must be a text file, and each path in the text file must be a full path to a valid directory. You must separate multiple paths in the text file with a newline. If you run clamonacc with --verbose, it will let you know if you got any of this wrong, but it will still start up, choosing to ignore invalid input instead of failing out.

    --exclude-list=FILE

This is the command line analogue for “OnAccessExcludePath”. Everything that holds true for --watch-list holds true for --exclude-list, except the end result is that the provided paths within the text file will not have watch points placed on them when clamonacc starts up.
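Because clamonacc silently skips invalid entries unless --verbose is set, it can help to check a list file before handing it to --watch-list or --exclude-list. The following is a rough sketch based only on the rules stated above (one full path per line, each pointing at an existing directory). The helper name `check_path_list` and the scratch paths are hypothetical, and clamonacc's own parsing may differ in details.

```shell
# Hypothetical helper: print the lines of a list file that break the rules
# described above, and fail if any were found.
check_path_list() {
    list=$1
    bad=0
    while IFS= read -r path || [ -n "$path" ]; do
        [ -z "$path" ] && continue            # skip blank lines
        case "$path" in
            /*) ;;                            # full path: fine so far
            *)  echo "not a full path: $path"
                bad=$((bad + 1))
                continue ;;
        esac
        if [ ! -d "$path" ]; then
            echo "not a directory: $path"
            bad=$((bad + 1))
        fi
    done < "$list"
    [ "$bad" -eq 0 ]
}

# Example: build a watch list in a scratch directory and check it.
scratch=$(mktemp -d)
mkdir "$scratch/uploads"
printf '%s\n' "$scratch/uploads" "relative/dir" "$scratch/missing" > "$scratch/watch.txt"
check_path_list "$scratch/watch.txt" || echo "fix the entries above before starting clamonacc"
```

In this example the second and third entries are reported, and only the first would be expected to receive a watch point.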

    --remove

Works the same way as clamdscan’s --remove option. In the event that a file is found to be malicious, clamonacc will make a best attempt at removal.

    --move=DIRECTORY
    --copy=DIRECTORY

These work as you'd expect, each sharing clamdscan’s core functionality. If clamd returns a malicious verdict, the clamonacc process will either move or copy the file into the given directory. The --remove, --move, and --copy options are mutually exclusive.

    --config-file=FILE

When loading configuration options, clamonacc checks for clamd.conf in ClamAV’s default install location. You can force clamonacc to use a configuration file in a location of your choice by using this option instead. This option is especially useful if you have broken up clamd and clamonacc configuration options into their own separate files.

    --allmatch

Every time a scan request is made, clamonacc will tell the clamd server to run in all-match mode when rendering verdicts.

    --fdpass

This is a niche option with an unclear use case, but we preserved it in case older clamdscan users know of a specific use case we do not. Generally, if you are running clamd on the same system as clamonacc, you will be using a local unix socket and file descriptor passing is enabled by default. One theoretical (untested) use is passing file descriptors along a socket between containers or between a container and the host.

    --stream 

Typically, the only time you would use this option is when you could otherwise pass file descriptors instead. Even if clamonacc and clamd were optimized for streaming, file descriptor passing would be the better and faster method. Its only use (besides debugging) is avoiding permission issues that arise when passing file descriptors to clamd.

Separate and cleaner logging:

On-Access Scanning no longer uses the same log file as clamd. To make clamonacc print its output to a log file, run clamonacc with the option “--log=FILE”, where “FILE” is the full path of the log file you wish to use. Without this option, clamonacc will by default fork into the background without printing any output. Regardless of whether a log file has been specified, clamonacc will still protect your system according to its configuration and any command line options passed. And no matter the logging situation, all VirusEvents will trigger from clamd as expected.

If you do choose to enable logging, know that On-Access logging has been cleaned up considerably in the move from 0.101 to 0.102. After startup, you will see only verdicts for malicious files and errors in the log. That’s it.

If the “--verbose” option is supplied at startup, significantly more output will be available to you. This information is primarily useful for troubleshooting and for developers. Therefore, only consider using it if you run into a recurring problem during application runtime.

Configuration option for excluding users via username:

A feature included on user request, this allows simple exclusion of any user and more flexible permission management. The option to use this feature is called “OnAccessExcludeUname” and you can use it as many times as you’d like.

Another useful exclusion option that existed in 0.101 and continues to exist in 0.102, but may seem out of place to some users, is “OnAccessExcludeRootUID”, which is a boolean option that--as it says on the box--will exclude all events triggered by processes running under the root UID “0” from being scanned. This option was added strictly as a workaround to an option parsing limitation, which entirely disabled the “OnAccessExcludeUID” option when set to “0”.
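As an illustration of the two exclusion options above, the sketch below writes a small configuration fragment to a scratch file so it can be reviewed before being merged into the clamd.conf that clamonacc reads. The usernames and the scratch file are placeholders chosen for this example, not recommendations.

```shell
# Write a hypothetical exclusion fragment to a scratch file for review.
exclude_conf=$(mktemp)
cat > "$exclude_conf" <<'EOF'
# One line per user; the option may be repeated (placeholder usernames).
OnAccessExcludeUname clamav
OnAccessExcludeUname backup
# Skip events triggered by root; use this rather than "OnAccessExcludeUID 0".
OnAccessExcludeRootUID yes
EOF
cat "$exclude_conf"
```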

Configurable multi-threaded event handling architecture:

Clamonacc has been re-architected to follow a multi-supplier, single-consumer queue model for event processing. It accomplishes this by keeping an active thread pool to handle verdict receipts, managed by a thread that kicks off work for the pool whenever new entries are added to the event queue it maintains. Currently, that event queue is fed with distilled information from the fanotify and inotify event monitoring threads, but in theory it could very easily be fed from other sources down the road, should the need or desire arise.

Clamonacc will start up with five worker threads available to consume events from the queue. However, if your system has the resources for it, you can drastically improve the performance of clamonacc by raising that number with the “OnAccessMaxThreads” option. If you do this, you will likely also want to increase the “MaxThreads” and “MaxQueue” values to ensure your clamd instance can keep up.
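A tuning fragment along these lines could look like the sketch below. The numbers are arbitrary placeholders chosen only to show the three options moving together; sensible values depend on your hardware and workload and should be checked against the clamd.conf documentation for your version.

```shell
# Write a hypothetical tuning fragment to a scratch file for review.
tuning_conf=$(mktemp)
cat > "$tuning_conf" <<'EOF'
# clamonacc worker threads consuming the event queue (placeholder value).
OnAccessMaxThreads 10
# clamd side: more scanner threads and queued requests so it can keep up.
MaxThreads 20
MaxQueue 200
EOF
cat "$tuning_conf"
```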

Configuration options which allow tweaks to network communication and error handling:

With the separation came increased inter-process complexity. And with that complexity arose more potential error cases. Of particular note are the new configuration options surrounding network communications between clamd and clamonacc applications. Two options are provided for tweaking network communication behavior to better suit your environment:
  • OnAccessCurlTimeout
  • OnAccessRetryAttempts
By default, each connection attempt made by clamonacc will time out after five seconds, and clamonacc will not attempt to reconnect. In case of connection failure or timeout due to known, intermittent network constraints, you may force clamonacc to reattempt the connection by setting OnAccessRetryAttempts to the number of retries you’d like clamonacc to make before giving up and reporting an error.
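The relationship between attempts and retries can be modelled in a few lines of shell. This is only an illustration of "one attempt plus N retries" as described above, not clamonacc's implementation; `attempt_with_retries` is a hypothetical name, and the command passed to it stands in for a connection attempt.

```shell
# Hypothetical model of "one attempt plus N retries"; not clamonacc's code.
# Usage: attempt_with_retries RETRIES COMMAND [ARGS...]
attempt_with_retries() {
    retries=$1
    shift
    attempt=0
    while :; do
        if "$@"; then
            return 0                      # the attempt succeeded
        fi
        if [ "$attempt" -ge "$retries" ]; then
            echo "giving up after $((attempt + 1)) attempt(s)" >&2
            return 1
        fi
        attempt=$((attempt + 1))
    done
}

# With zero retries (the default behaviour described above), a single
# failure is reported as an error straight away.
attempt_with_retries 0 false || echo "default: one failed attempt is an error"
```

On systems with GNU coreutils, prefixing the command with `timeout 5` would be a rough stand-in for the five-second limit on each attempt.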

Users experienced with prevention will now be wondering what happens in such a case. Will the file remain locked? Will clamonacc release its access hold automatically in case of failure?

Clamonacc is configured to allow all access attempts if an error occurs while prevention is enabled. However, you can change this behavior by enabling the “OnAccessDenyOnError” configuration option. When this option is enabled alongside “OnAccessPrevention”, clamonacc will deny process access to a file if any error is encountered during the scanning process.

As you can imagine, this is potentially a very dangerous setting and must be used with care to avoid locking your system out of important resources due to something so mundane as a clamd permission issue, or a brief network outage.
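To make the trade-off concrete, the following sketch models the decision described above with prevention enabled. It is a simplified illustration rather than clamonacc's actual logic: the function name and the outcome labels are hypothetical, and it assumes a malicious verdict always results in access being denied.

```shell
# Hypothetical model of verdict handling with OnAccessPrevention enabled.
# $1 = scan outcome (clean | infected | error)
# $2 = value of OnAccessDenyOnError (yes | no)
access_decision() {
    case "$1" in
        clean)    echo allow ;;
        infected) echo deny ;;
        error)
            if [ "$2" = yes ]; then echo deny; else echo allow; fi ;;
        *)  echo "unknown outcome: $1" >&2
            return 2 ;;
    esac
}

access_decision error no    # prints "allow": the default fails open
access_decision error yes   # prints "deny": the stricter setting fails closed
```

The second call is the case to think through before enabling the option: any scan error, however mundane, blocks access to the file.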

Wrap Up

That’s the bulk of it. A lot has changed from a technical standpoint, and while the amount of information shared above might seem overwhelming at first glance, from an operational standpoint there isn’t too much more you need to worry about. Be mindful of your UID/uname excludes, make sure clamd has the right permissions, lock down your TCP ports, be aware of your resource limitations and the knobs you’ve been given to tweak software performance, and you should have your deployment up in no time.

Finally, as I said before, if there’s anything that changed which I didn’t go over above, please leave a comment below so I can address your concern.

Happy clamming.

ClamAV 0.102.0 Release Candidate is now available

*This article was accidentally withdrawn and is being re-published so it is available for historical reference.

Today we are publishing the release candidate for ClamAV 0.102.0 (clamav-0.102.0-rc).

There have been some bug fixes and minor improvements since the 0.102.0 beta. We do not expect any additional changes to be necessary before publishing the 0.102.0 stable release.

Please take this opportunity to validate that the 0.102.0 release candidate works for your application and that there are no major issues blocking your upgrade to 0.102.0.

Release materials for 0.102.0-rc can be found on ClamAV's downloads site.
 

Release Notes

ClamAV 0.102.0 includes an assortment of improvements and a couple of significant changes.

Major changes

  • The On-Access Scanning feature has been migrated out of clamd and into a brand new utility named clamonacc. This utility is similar to clamdscan and clamav-milter in that it acts as a client to clamd. This separation from clamd means that clamd no longer needs to run with root privileges while scanning potentially malicious files. Instead, clamd may drop privileges to run under an account that does not have super-user privileges. In addition to improving the security posture of running clamd with On-Access enabled, this update fixed a few outstanding defects:
    • On-Access scanning for created and moved files (Extra-Scanning) is fixed.
    • VirusEvent for On-Access scans is fixed.
    • With clamonacc, it is now possible to copy, move, or remove a file if the scan triggered an alert, just like with clamdscan. For details on how to use the new clamonacc On-Access scanner, please refer to the user manual on ClamAV.net, and keep an eye out for a new blog post on the topic.
  • The freshclam database update utility has undergone a significant update. This includes:
    • Added support for HTTPS.
    • Support for database mirrors hosted on ports other than 80.
    • Removal of the mirror management feature (mirrors.dat).
    • An all new libfreshclam library API.

Notable changes

  • Added support for extracting ESTsoft .egg archives. This feature is new code developed from scratch using ESTsoft's Egg-archive specification and without referencing the UnEgg library provided by ESTsoft. This was necessary because the UnEgg library's license includes restrictions limiting the commercial use of the UnEgg library.
  • The documentation has moved!
    • Users should navigate to ClamAV.net to view the documentation online.
    • The documentation will continue to be provided in HTML format with each release for offline viewing in the docs/html directory.
    • The new home for the documentation markdown is in our ClamAV FAQ Github repository.
  • To remediate future denial of service conditions caused by excessive scan times, we introduced a scan time limit. The default value is 2 minutes (120000 milliseconds).

    To customize the time limit:
    • use the clamscan --max-scantime option
    • use the clamd MaxScanTime config option
  • Libclamav users may customize the time limit using the cl_engine_set_num function. For example:

    cl_engine_set_num(engine, CL_ENGINE_MAX_SCANTIME, time_limit_milliseconds)
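Since the limit is expressed in milliseconds, a quick conversion helps avoid off-by-a-thousand mistakes. The sketch below computes a value and prints the matching clamscan command line and clamd.conf line; the five-minute limit and the scan path are placeholders for illustration.

```shell
minutes=5                              # placeholder limit
limit_ms=$((minutes * 60 * 1000))      # the limit is given in milliseconds

# Printed for review rather than executed; adapt the path before running.
echo "clamscan --max-scantime=$limit_ms /path/to/scan"
echo "MaxScanTime $limit_ms"
```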

Other improvements

  • Improved Windows executable Authenticode handling, enabling both whitelisting and blacklisting of files based on code-signing certificates. Additional improvements to Windows executable (PE file) parsing. Work courtesy of Andrew Williams.
  • Added support for creating bytecode signatures for Mach-O and ELF executable unpacking. Work courtesy of Jonas Zaddach.
  • Re-formatted the entire ClamAV code-base using clang-format in conjunction with our new ClamAV code style specification. See the clamav.net blog post for details.
  • Integrated ClamAV with Google's OSS-Fuzz automated fuzzing service with the help of Alex Gaynor. This work has already proven beneficial, enabling us to identify and fix subtle bugs in both legacy code and newly developed code.
  • The clamsubmit tool is now available on Windows.
  • The clamscan metadata feature (--gen-json) is now available on Windows.
  • Significantly reduced number of warnings generated when compiling ClamAV with "-Wall" and "-Wextra" compiler flags and made many subtle improvements to the consistency of variable types throughout the code.
  • Updated the majority of third-party dependencies for ClamAV on Windows. The source code for each has been removed from the clamav-devel repository. This means that these dependencies have to be compiled independently of ClamAV. The added build process complexity is offset by significantly reducing the difficulty of releasing ClamAV with newer versions of those dependencies.
  • During the 0.102 development period, we've also improved our Continuous Integration (CI) processes. Most recently, we added a CI pipeline definition to the ClamAV Git repository. This chains together our build and quality assurance test suites and enables automatic testing of all proposed changes to ClamAV, with customizable parameters to suit the testing needs of any given code change.
  • Added a new clamav-version.h generated header to provide version number macros in text and numerical format for ClamAV, libclamav, and libfreshclam.
  • Improved cross-platform buildability of libxml2. Work courtesy of Eneas U de Queiroz with supporting ideas pulled from the work of Jim Klimov.

Bug fixes

  • Fix to prevent a possible crash when loading LDB type signature databases and PCRE is not available. Patch courtesy of Tomasz Kojm.
  • Fixes to the PDF parser that will improve PDF malware detection efficacy. Patch courtesy of Clement Lecigne.
  • Fix for regular expression phishing signatures (PDB R-type signatures).
  • Various other bug fixes.

New Requirements

  • Libcurl has become a hard-dependency. Libcurl enables HTTPS support for freshclam and clamsubmit as well as communication between clamonacc and clamd.
  • Libcurl version >= 7.45 is required when building ClamAV from source with the new On-Access Scanning application (clamonacc). Users on Linux operating systems that package older versions of libcurl (e.g. all versions of CentOS and Debian versions <= 8) have a number of options:
    • Wait for your package maintainer to provide a newer version of libcurl.
    • Install a newer version of libcurl from source.
    • Disable installation of clamonacc and On-Access Scanning capabilities with the ./configure flag --disable-clamonacc.
  • Non-Linux users do not need to take any action, as they are unaffected by this new requirement.

Acknowledgements

The ClamAV team thanks the following individuals for their code submissions:
  • Alex Gaynor
  • Andrew Williams
  • Carlo Landmeter
  • Chips
  • Clement Lecigne
  • Eneas U de Queiroz
  • Jim Klimov
  • Joe Cooper
  • Jonas Zaddach
  • Markus Kolb
  • Orion Poplawski
  • Ørjan Malde
  • Paul Arthur
  • Rick Wang
  • Romain Chollet
  • Rosen Penev
  • Thomas Jarosch
  • Tomasz Kojm

Finally, we'd like to thank Joe McGrath for building our quality assurance test suite and for working diligently to ensure knowledge transfer up until his last day on the team. Working with you was a pleasure, Joe, and we wish you the best of luck in your next adventure!