A comparison of several host/file integrity checkers (scanners)

By Rainer Wichmann (last update: Oct 13, 2004)

Caveat: The author of this study is also the author of one of these file integrity scanners (samhain). I.e. this study may be biased, because:

(a) the tests in this study are based on user feedback for samhain and the author's personal opinion on what basic functionality a file integrity scanner should provide, and

(b) a bug in samhain found during this study was fixed (see the remark on samhain under Remarks on individual programs).

If you think some of the results presented here are incorrect or outdated, you are welcome to point out corrections.

The lack of a trademark sign does not imply the non-existence of a trademark.

Content

What is the focus of this study ?
Explanation of table rows
Table of results
Remarks on individual programs
Relative speed
Logging options
Centralized management: osiris and samhain

What is the focus of this study ?

This study compares seven freely available (mostly open-source) file/host integrity scanners (file integrity check programs) with respect to the implementation of the core functionality, i.e. questions like:

  • Can the program check all files that you may want to check ?
  • Can the program handle corner cases of the filesystem (that may e.g. result from an intrusion, or from simple errors of users) ?
  • Does the program warn about an incorrect configuration (which may cause it to check not in the way you intended) ?

The results presented here are based on test runs, and sometimes also on investigation of the source code. All tests were performed on a Debian 3.0 Linux system. In general, tests were performed only with console logging (stdout/stderr).

Thus, while some "features" of these programs are mentioned that may be of interest for usability, the focus of the study was on testing the scanners' functionality, not on listing and/or comparing their features.

Explanation of table rows

Version
The version number of the file integrity scanner.
Date
The release date of the file integrity scanner. For PGP signed source code, this is the date of the PGP signature, otherwise the date listed on the web site, or in the source.
PGP signature
Is the distributed source code PGP-signed ? If there is no signature, it may be possible to put a trojan into the source code (this has happened in the past with several high-profile security-related programs)!
Language
The programming language of the file integrity scanner.
Required
Requirements (other than compiler or interpreter).
Log Options
What channels are supported for logging ?
DB sign/crypt
Does the scanner support signed or encrypted baseline databases ?
Conf sign/crypt
Does the scanner support signed or encrypted configuration files ?
Name Expansion
Does the scanner support expansion of file names (shell-style globbing or regular expressions) in the configuration file ?
Duplicate Path
Does the scanner check the configuration file for duplicate entries of files/directories (possibly with a different checking policy for the duplicate) ? Strict checking of the configuration file can help to avoid user errors.
PATH_MAX
Can the scanner handle a file whose path has the maximum allowed length (4095 on Linux) ?
Root Inode
Can the scanner handle the "/" directory inode ? This is the file with the shortest possible path, and also the only one with a "/" in its filename, so it may expose programming bugs (and you do want to check that inode).
Non-printable
Can the scanner handle filenames with weird or non-printable characters ? And if it can handle them internally, can it report results in a useful way ? Checked filenames were:

bash$ ls -l --quoting-style=c /
drwxr-xr-x 2 root root 4096 Feb 11 20:16 "\002\002\002\002"

As "\002" is non-printable, incorrect reporting will result in a report about removal of the root directory ("/"), if this file is removed ...

bash$ ls -l --quoting-style=c /opt
drwxr-xr-x 2 root root 1024 Feb 11 19:51 "this is_not_a_love_song\b\b\b\b\b\b\b\b\bwrong_filename"

As "\b" is backspace, incorrect reporting will result in a report for the non-existing file "this is_not_a_wrong_filename"
No User
Can the scanner handle files owned by a non-existing user (UID with no entry in /etc/passwd) ?
No Group
Can the scanner handle files owned by a non-existing group (GID with no entry in /etc/group) ?
Lock
Can the scanner handle a file if another process has acquired a mandatory (kernel-enforced) lock on it (yes, Linux has that kind of lock) ?
It is possible to open() such a file for reading, but the read() itself will block, so the scanner will hang indefinitely, unless precautions are taken.
On Linux, mandatory locking requires a special mount option, thus cannot usually be enforced by unprivileged users.
Race
File integrity scanners first lstat() a file to determine whether it is a regular file, then open() it to read it for checksumming. In between these two calls, a user with write access to the directory may replace the file with a named pipe. As a result, the open() call will block and the scanner may hang indefinitely, unless precautions are taken.
/proc
Is the scanner able to scan the /proc directory ? On Linux, at least some files in /proc are writeable and can be used to configure the kernel at runtime, so you may want to check these files. However, files in /proc may be listed with zero filesize, even if you can read plenty of data from them. Almost all scanners "optimize" by not checksumming zero-length files, which is incorrect in the Linux /proc filesystem. Additionally, some files may block on an attempt to read from them.
/dev
Does the scanner have problems with the /dev directory ?
Crea/Del
Can the scanner report on missing (deleted) or newly created files ?
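
The 'Race' entry above can be illustrated with a minimal Python sketch of one common precaution. It is not taken from any of the tested scanners; the function name checksum_regular_file and the choice of SHA-256 are assumptions made for illustration. The idea is to open the path without blocking, and then to check the type of the object that was actually opened rather than trusting the earlier lstat() result.

```python
import hashlib
import os
import stat


def checksum_regular_file(path):
    """Return the SHA-256 hex digest of path, or None if it is not a regular file."""
    # O_NONBLOCK keeps open() from blocking if the path has been replaced by a
    # named pipe; O_NOFOLLOW refuses a symlink swapped in after lstat().
    flags = os.O_RDONLY | os.O_NONBLOCK | os.O_NOFOLLOW
    try:
        fd = os.open(path, flags)
    except OSError:
        return None
    try:
        # Check the type of the object that was actually opened, not the path.
        if not stat.S_ISREG(os.fstat(fd).st_mode):
            return None
        digest = hashlib.sha256()
        while True:
            chunk = os.read(fd, 65536)
            if not chunk:
                break
            digest.update(chunk)
        return digest.hexdigest()
    finally:
        os.close(fd)
```

With this approach a named pipe that appears between lstat() and open() is simply skipped instead of stalling the scan. The sketch does not address mandatory locks, which need a separate precaution such as a timeout.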

Table of results (alphabetic order)

| | AIDE | FCheck | Integrit | Nabou | Osiris | Samhain | Tripwire |
|---|---|---|---|---|---|---|---|
| Version | 0.10 | 2.07.59 | 3.02 | 2.4 | 4.0.5 | 1.8.4 | 2.3.1-2 |
| Date | Nov 30, 2003 | May 03, 2001 | Sep 08, 2002 | Aug 30, 2004 | Sep 27, 2004 | Mar 17, 2004 | Mar 04, 2001 |
| PGP Signature | YES | NO | YES | YES | YES | YES | NO |
| Language | C | Perl | C | Perl | C | C | C++ |
| Required | libmhash | md5sum (or md5) | | PARI/GP library + about 11 Perl modules | OpenSSL 0.9.6j or newer | GnuPG (only if signed config/database used) | |
| Log Options | stdout, stderr, file, file descriptor | stdout, syslog | stdout | stdout, email | central log server (email+file on server side) | stderr, email, file, pipe, syslog, RDBMS, central log server, prelude, external script, IPC message queue | stdout, file, email, syslog |
| DB sign/crypt | NO | NO | NO | sign | NO | sign | sign+crypt |
| Conf sign/crypt | NO | NO | NO | NO | NO | sign | sign+crypt |
| Name Expansion | regex | NO | NO | see remarks | regex | glob (shell-style) | NO |
| Duplicate Path | NO | NO | NO | NO | NO | Warns | Exits |
| PATH_MAX | OK | OK | NO | OK | NO | OK | OK |
| Root Inode | see remarks | NO | OK | NO | OK | OK | OK |
| Non-printable | NO | NO | NO | NO | NO | OK | OK |
| No User | OK | OK | OK | see remarks | OK | OK | OK |
| No Group | OK | OK | OK | see remarks | OK | OK | OK |
| Lock | OK | Hangs | Hangs | Hangs | Hangs | Times out | Hangs |
| Race | Hangs | Hangs | Hangs | Hangs | Hangs | OK | Hangs |
| /proc | NO | NO | Hangs | Hangs | NO | OK | NO |
| /dev | OK | OK | OK | OK | OK | OK | OK |
| Crea/Del | OK | OK | OK | OK | OK | OK | OK |

Remarks on individual programs

AIDE

  • Segfaults on syntax error in config file (directory without policy).
  • When specifying the root directory, apparently '/.* R' does not match '/'; '/$ R' matches, but only if there is no other rule, so it is useless in practice (i.e. the root directory inode cannot be checked). This is either a bug or my misunderstanding of the regex syntax in the configuration file.
  • There is no tool to list the database (however, it is human-readable, not binary).
  • AIDE is the only scanner in this study that uses mmap() rather than read() to read a file. This is responsible for passing the 'Lock' test (the kernel denies mmapping a mandatorily locked file).
  • Judging from comments in the source code, AIDE tries to fix the 'Race' problem, but the solution does not work.
  • Omits checksum if file size is zero, which is incorrect for Linux /proc files.
  • For deleted / added files, only the path is printed.
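
The zero-size problem noted above can be avoided by checksumming whatever read() actually delivers instead of consulting the reported file size first. The following Python sketch shows the idea; the function name checksum_ignoring_size and the use of SHA-256 are assumptions for illustration and do not describe how any scanner in this study is implemented.

```python
import hashlib


def checksum_ignoring_size(path, chunk_size=65536):
    """Checksum whatever read() delivers, without consulting st_size first."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read until end-of-file; a reported size of 0 does not stop the loop.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

On Linux, a file such as /proc/modules is typically listed with size zero but still yields data when read, so this function returns a checksum of its real content. Because some files in /proc may block on read, a real scanner would have to combine this with a timeout or a similar safeguard.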

FCheck

  • Not possible to define different policies (e.g. ignore size change for logfiles).
  • Omits checksum if file size is zero, which is incorrect for Linux /proc files.
  • Filenames in baseline database are not properly escaped, thus it is not possible to check files with non-printable characters. Some of them may even corrupt the baseline database (e.g. filenames with newlines).
  • No check on config file syntax is done. Duplicate entries are scanned twice, and mis-typed directives (e.g. 'Directoy =' instead of 'Directory =') are silently ignored.
  • No tool to dump/read the baseline database, which is barely human-readable.
  • If a directory is scanned recursively, the top level directory inode itself is never included. Thus it is impossible to check the root inode.
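
The escaping problem mentioned above can be illustrated with a small Python sketch. It is not FCheck's code (FCheck is written in Perl), and the names escape_filename and unescape_filename are hypothetical. It encodes every byte outside printable ASCII, plus the backslash itself, as a three-digit octal escape, so that a filename can neither break a line-oriented database nor hide characters in a report.

```python
def escape_filename(name: bytes) -> str:
    """Render a raw filename with octal escapes for unsafe bytes."""
    out = []
    for byte in name:
        if byte == 0x5C:
            # Escape the backslash itself so the encoding stays unambiguous.
            out.append("\\\\")
        elif 0x20 <= byte <= 0x7E:
            # Printable ASCII (including space) is passed through unchanged.
            out.append(chr(byte))
        else:
            # Control characters, newlines and non-ASCII bytes become \ooo.
            out.append("\\" + format(byte, "03o"))
    return "".join(out)


def unescape_filename(text: str) -> bytes:
    """Reverse escape_filename()."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] != "\\":
            out.append(ord(text[i]))
            i += 1
        elif text[i + 1] == "\\":
            out.append(0x5C)
            i += 2
        else:
            out.append(int(text[i + 1:i + 4], 8))
            i += 4
    return bytes(out)
```

Raw filenames can be obtained as bytes with os.listdir(b"/"). A database that separates fields by whitespace would additionally need to escape the space character.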

Integrit

  • You can have only one root directory in the config file, which makes it complicated to scan (only) some directories scattered over the file system. You need to run one integrit instance per root with different (per-root) configuration files.
  • Internally, all path names start with a double '/'. This may cause the observed ENAMETOOLONG error on valid long paths (?).
  • Usage is simple and straightforward. According to the documentation, the lack of features is intentional to simplify usage (which certainly is a valid argument, as long as one does not need advanced features).
  • Judging from comments in the source code, Integrit tries to fix the 'Race' problem, but the solution does not work.
  • Comment by Ed L Cashin (integrit developer): While there are integrit users who agree with you, I maintain that running integrit three times using three configuration files is cleaner and easier than running integrit once with one more cluttered configuration file. You can even take advantage of parallel I/O if the roots are on different devices.

Nabou

  • The script does not check whether a file is a socket, so it tries to checksum sockets (and hangs).
  • The config file syntax is somewhat apache-like, and easy to understand. Liked the config file syntax best of all tested programs.
  • Filename expansion (globbing, i.e. shell-style) is (only) supported for excluded files.
  • Nabou only prints user/group names. If there is no user (group) for a UID (GID), it will print whitespace rather than the numeric UID (GID).
  • With 'use_ls', nabou prints an 'ls -l'-like line for matching files, but the file type is incorrect (e.g. devices are listed as regular files).
  • Dumping the database is possible (comma-separated format).
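
The missing user/group case noted above is usually handled by falling back to the numeric ID when the lookup fails. A minimal Python sketch follows; it is not Nabou's code (Nabou is a Perl script), and the helper names owner_name and group_name are hypothetical.

```python
import grp
import pwd


def owner_name(uid: int) -> str:
    """Return the user name for uid, or the numeric UID if there is no entry."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        # No passwd entry: report the number rather than an empty string.
        return str(uid)


def group_name(gid: int) -> str:
    """Return the group name for gid, or the numeric GID if there is no entry."""
    try:
        return grp.getgrgid(gid).gr_name
    except KeyError:
        return str(gid)
```

Reporting the numeric ID keeps the report meaningful for files owned by deleted accounts.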

Osiris

  • Files with a filename length of NAME_MAX are completely ignored (no database or log entries, except for the possibly modified timestamp of the parent directory).
  • Exclusion of subdirectories (option NoEntry) apparently does not work for the root directory (of course that could also be a user error on my side).
  • Omits checksum if file size is zero, which is incorrect for Linux /proc files.
  • For deleted / added files, only the path is logged.
  • The management command-line interface (CLI) has no support for non-printable chars (although 'space' is accepted). This not only hides the true path for records, but may also make the record details unavailable (e.g. if a real path gets duplicated by 'path + non-printable chars'), because the baseline database is binary (Berkeley DB) and only readable via the CLI (but see below).
  • There is no documented way to dump the baseline database to a human-readable format (format is Berkeley DB). Because details for new/missing files are not in the log, one has to look up each record individually with the CLI, which is cumbersome.
    More precisely, there is a tool printdb in the src/tools/, which is not compiled by default (cd src/tools/ && make), and does not work as-is (edit printdb.c, remove code referencing "db2" in main(), recompile, and use 'printdb -a <database>' - not a big thing if you know C ...).

Samhain

  • Suffers a bit from feature bloat, which probably causes a steeper learning curve than for other programs in this study.
  • In the 'Lock' test, samhain will time out.
  • In the interest of full disclosure: version 1.8.3 had a bug with formatting of long reports that caused samhain to fail on tests with long paths. Version 1.8.4 was fixed as a result of these tests.
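
Timing out on a blocked read, as in the 'Lock' test, can be sketched in Python with an interval timer. This is only an illustration of the general technique and makes no claim about how samhain implements its timeout; the names ReadTimeout and read_with_timeout are hypothetical. The sketch works on Unix only and must run in the main thread.

```python
import os
import signal


class ReadTimeout(Exception):
    """Raised when a read does not complete within the allowed time."""


def _on_alarm(signum, frame):
    # Raising here interrupts the blocked os.read() call.
    raise ReadTimeout()


def read_with_timeout(fd, nbytes, seconds):
    """Read from fd, giving up after `seconds` (main thread, Unix only)."""
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        return os.read(fd, nbytes)
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)   # cancel the pending timer
        signal.signal(signal.SIGALRM, previous)   # restore the old handler
```

A scanner using this pattern would catch ReadTimeout, log the file as unreadable, and continue with the next file instead of hanging.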

Tripwire

  • The makefile does not recognize that 'make' is GNU make and insists on 'gmake'. I made a symlink gmake->make to work around the problem.
  • Tripwire provides no details about modified/added/removed files, only path names, unless one uses twprint --report-level 4, which is pretty verbose.
  • Omits checksum if file size is zero, which is incorrect for Linux /proc files.
  • Apparently the open-source version of Tripwire has failed to attract any developer community. While it appears to be more solid than most other open-source integrity scanners in this study, I found the source code poorly commented and not particularly lucid.

Speed

Tests were performed under Debian 3.0 by checking two datasets, one with 206 Mb, the other with 1.1 Gb. Absolute times depend on the hardware - your mileage may vary.

All integrity checkers showed non-linear behaviour: the larger dataset was checked with less speed (Mb / minute) than the smaller one.

Integrity checkers written in C (AIDE, Integrit, Osiris, and Samhain) were I/O-limited (i.e. speed was limited by the disk I/O), and all were about equally fast (about 2 minutes for the small dataset, 18 minutes for the large one), with no significant difference between database initialization and checking.

Tripwire (written in C++) was slower (4 minutes / 26 minutes) than AIDE, Integrit, and Samhain, again with no significant difference between database initialization and checking.

For the smaller dataset, the two Perl scripts (FCheck and Nabou) were faster than Tripwire, but slower than the C programs.

For the larger dataset, the performance of the Perl scripts was much worse: Nabou took 85 minutes to initialize the database, and 41 minutes for checking. FCheck (also Perl) needed 40 minutes for initializing, and 54 minutes for checking.
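
The non-linear behaviour can be made explicit by converting the approximate times quoted above into throughput. The short Python calculation below assumes that 1.1 Gb corresponds to about 1100 Mb; the names used are for illustration only, and the results are only as precise as the rounded times in the text.

```python
# Dataset sizes in Mb as given in the text; 1.1 Gb is taken as 1100 Mb (assumption).
SMALL_MB = 206
LARGE_MB = 1100

# Approximate run times in minutes (small dataset, large dataset) from the text.
TIMINGS = {
    "C programs": (2, 18),
    "Tripwire": (4, 26),
}


def throughput(size_mb, minutes):
    """Return the scan rate in Mb per minute."""
    return size_mb / minutes


def rates():
    """Map each program group to its (small, large) dataset throughput."""
    return {
        name: (throughput(SMALL_MB, small), throughput(LARGE_MB, large))
        for name, (small, large) in TIMINGS.items()
    }


if __name__ == "__main__":
    for name, (small_rate, large_rate) in rates().items():
        print(f"{name}: {small_rate:.1f} Mb/min (small), {large_rate:.1f} Mb/min (large)")
```

Under these assumptions the C programs drop from about 103 to about 61 Mb per minute, and Tripwire from about 51.5 to about 42 Mb per minute.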

Logging options

This is an overview of the logging options provided by the different scanners. This information is mostly taken from the documentation, and usually not verified.

AIDE: reports can be printed to stdout, stderr, plaintext file, or to an open file descriptor. Any combination of these can be used, but the verbosity level cannot be set individually.

FCheck: reports are printed to stdout, and optionally logged to syslog (via the logger standard utility).

Integrit: reports are printed to stdout.

Nabou: nabou prints reports to stdout, or sends them via email.

Osiris: osiris clients only send scan results to the central server, which in turn logs reports to plaintext files and can send emails.

Samhain: samhain can log to stdout, plaintext file, and syslog. Also supported are: sending reports by email, sending reports to a central server, inserting reports into an RDBMS (MySQL, PostgreSQL, Oracle, or unixODBC), sending reports to a Prelude IDS system, writing reports to a named pipe, calling a user-defined external application to process reports (e.g. to send an SMS to a mobile phone), and providing reports via an IPC message queue.

For each supported logging facility, the level of logging can be configured individually. Any combination of facilities can be used in parallel.

Tripwire: tripwire prints reports on stdout, and stores them in binary files. Optionally it can send reports by email. It can also log to syslog (in a very terse way, where only the number of violations is logged).

Centralized management: osiris and samhain

Osiris and samhain are special insofar as they are the only host integrity scanners in this study that provide built-in support for centralized logging and management.

Both systems are able to collect reports/data from clients on a central server, and to store baseline databases and client configurations on the central server. Configuration changes and updates of the baseline database can be performed centrally rather than on individual hosts monitored by the system.

General design differences: push vs. pull

Centralized logging and management requires a client/server system where at least one side has to listen on the network for connections, and thus is potentially vulnerable to remote attacks.

For osiris, scan requests are pushed from the central server to the individual scanner clients. Thus the client, which needs root privileges to open and checksum privileged files, also listens on the network.
On Unix/Linux, this problem is mitigated by using privilege separation (similar to OpenSSH): there is a privileged process that only handles actions that require root privileges, and an unprivileged (sub-)process that does most of the work (including network connections).
However, on MS Windows, privilege separation is not supported.

Samhain works the other way: clients pull the baseline database from the server, and return reports. I.e. here the server has an open port, and the server does not need root privileges. Actually the samhain server (called 'yule') will only run as an unprivileged user (it drops root privileges if started with them), and can be chrooted.

Both samhain and osiris use encrypted client/server connections.
With osiris, (only) the server must authenticate to the client. However, similar to samhain, osiris clients negotiate a shared secret with the server that is kept in memory after startup, thus attempts to replace the client can be detected once it has started.
Samhain uses mutual authentication (where the client's credentials are located within the client executable). Upon successful authentication, a shared secret is negotiated that is kept in memory.

With osiris, clients send back snapshots of the file system, which are compared to the baseline on the server side, and stored in the same location as the baseline database. Thus the server (which is potentially vulnerable to malicious clients) needs write access to the directory where baseline data is stored.

Samhain clients only send back reports on filesystem modifications. These reports can be used to update the baseline database on the server via the central management console. The server only needs read access to the baseline data.

Additional features

In addition to file integrity checking, samhain can optionally check for kernel rootkits, search the filesystem for SUID binaries, check mount points (and their mount options), and watch login/logout events.

Osiris can report if (and which) users/groups have been added to /etc/passwd and/or /etc/group. It can also report newly loaded kernel modules (with samhain, this is possible in a limited way by monitoring the checksum of /proc/modules).

Samhain offers a large choice of different logging facilities (both on the client as well as on the server side) that can optionally be used simultaneously. Osiris clients only report to the central server, which in turn logs reports to files and optionally can send email notifications.

Both samhain and osiris support a central management interface. In the case of osiris, this is a command-line interface (CLI) that is part of the osiris package. For samhain, the management interface is a PHP web-based interface that is available as a separate package (beltane).

Osiris supports MS Windows natively, while samhain requires a POSIX emulation layer (such as Cygwin).

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.0 Germany License.