Sunday, June 21, 2015

Keeping up with the Kernels

As filesystems go, ZoL (zfsonlinux) is unusual in that each release is expected to build and run under any kernel from the 2.6.32 "enterprise" version up to the version that is current at the time of its release.  For example, ZoL version 0.6.4 should work properly under all release kernels from 2.6.32 to 4.0.  This post collects some of my observations from several years of following and contributing to the ZoL project.


ZoL uses the GNU autotools suite to configure itself for a specific kernel.  It currently has about 225 checks and, on a modern multi-core system, the configure process takes considerably longer than actually compiling the package.

The process by which ZoL gains compatibility with new kernels varies.  In some cases, a distro (typically Fedora) updates its kernel and causes compile failures.  This typically results in a user of that distro posting an issue (Github terminology).  In other cases, a developer or other user who builds their own kernels encounters the build failures.  Eventually, issues are raised, and pull requests are posted, evaluated and then merged into the repository.

The major problem I've noticed with the approach of basing changes on compile failures is that it can miss other, unrelated API changes which can invalidate existing autoconf checks.  For example, although ZoL currently only provides a stub, it does check for the existence of the superblock nr_cached_objects callback by trying to compile this code:

        #include <linux/fs.h>

        int nr_cached_objects(struct super_block *sb) { return 0; }

        static const struct super_operations
            sops __attribute__ ((unused)) = {
                .nr_cached_objects = nr_cached_objects,
        };

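If the compilation succeeds, HAVE_NR_CACHED_OBJECTS is defined and the callback is enabled within ZoL.  In a kernel commit that first appeared in version 3.12, however, the callback was extended to take an additional argument.  The one-argument function in the test no longer matches the member's type, so the test fails and HAVE_NR_CACHED_OBJECTS is left undefined, even though the callback still exists.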

The problem with this scheme is that changes to the kernel can silently disable features and possibly cause ZoL to fail in obscure ways.  In this particular case, it would have been better to check only for the existence of the .nr_cached_objects member, for example with a simple sizeof() operation.  The incompatible callback within ZoL would then fail to build, alerting the user to the kernel incompatibility.
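As a rough illustration of the existence-only check suggested above, here is a minimal sketch.  It assumes a hypothetical user-space stand-in for struct super_operations so that it compiles outside a kernel tree; the extra int parameter and the helper name nr_cached_objects_slot_size() are placeholders of mine, not kernel or ZoL code.  A real autoconf test would include <linux/fs.h> instead of declaring the struct itself.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the kernel's struct super_operations.  The
 * second parameter is only a placeholder for the argument added in 3.12.
 */
struct super_block;

struct super_operations {
        int (*nr_cached_objects)(struct super_block *sb, int extra);
};

/*
 * Existence-only probe: sizeof() names the member without depending on
 * its signature, so this compiles for any form of the callback and stops
 * compiling only if the member itself is removed.
 */
static_assert(sizeof(((struct super_operations *)0)->nr_cached_objects) > 0,
    "super_operations lacks nr_cached_objects");

/* Illustrative helper that makes the probed size visible at run time. */
size_t nr_cached_objects_slot_size(void)
{
        return sizeof(((struct super_operations *)0)->nr_cached_objects);
}
```

With a probe of this shape, HAVE_NR_CACHED_OBJECTS would stay defined across the signature change, and assigning the old one-argument callback to the member would then draw an incompatible-pointer-type diagnostic from the compiler instead of the feature quietly disappearing.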

XFS seems to be very actively developed within the upstream Linux kernel codebase and, since it uses many of the same interfaces as ZoL, is a good place to look for these types of interface changes.  In fact, the main XFS developer seems to be behind many changes of the kind shown above.

In conclusion, ZoL developers should examine the output of their config.log when building on a new kernel to see whether any checks for important interfaces have started to fail and inadvertently disabled a feature.
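One way to make that review less tedious is to list every check whose result was "no" and compare the list against the one from the previous kernel.  The sketch below is only a rough aid, not part of ZoL: the function name report_failed_checks() is made up, and it relies on the usual autoconf log layout in which "configure:NNN: checking ..." lines are followed by "configure:NNN: result: ..." lines.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Print each check in an autoconf config.log whose result was "no".
 * Returns the number of such checks, or -1 if the file cannot be opened.
 * Heuristic: a result is attributed to the most recent "checking" line.
 */
int report_failed_checks(const char *path, FILE *out)
{
        static const char check_tag[] = ": checking ";
        static const char result_tag[] = ": result: ";
        char line[1024];
        char current[1024] = "";
        int failed = 0;
        FILE *log;

        assert(path != NULL && out != NULL);

        log = fopen(path, "r");
        if (log == NULL)
                return -1;

        while (fgets(line, sizeof(line), log) != NULL) {
                char *p;

                line[strcspn(line, "\n")] = '\0';

                /* Skip compiler output and dumped test programs. */
                if (strncmp(line, "configure:", 10) != 0)
                        continue;

                if ((p = strstr(line, check_tag)) != NULL) {
                        /* Remember what is currently being checked. */
                        strcpy(current, p + strlen(check_tag));
                } else if ((p = strstr(line, result_tag)) != NULL &&
                    strcmp(p + strlen(result_tag), "no") == 0) {
                        fprintf(out, "no: %s\n", current);
                        failed++;
                }
        }

        fclose(log);
        return failed;
}
```

Calling report_failed_checks("config.log", stdout) after configuring against the old and the new kernel, and then diffing the two listings, should show which checks changed from "yes" to "no".  Not every "no" is a problem, since many checks are expected to fail on a given kernel, so the interesting entries are the ones that changed.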
