Kernel development

The current kernel release is still 2.4.3. The 2.4.4 prepatch is up to 2.4.4pre7; it continues to accumulate bug fixes. There has been no word on when a real 2.4.4 release might happen. Alan Cox's patch, meanwhile, is up to 2.4.3ac14.

The security module project only recently got off the ground, but the people involved are wasting no time. This project, remember, set out to define a generic security interface that could be used by any particular enhanced-security implementation to hook into the kernel without the need for further patching by the user. This interface would allow easy experimentation with several of the current offerings, and would make it (relatively) easy to switch between them. Linus has argued for this approach with the reasoning that, since there seems to be no agreement on what is the right approach to heightened security for Linux, there should be a simple way for all of them to work with stock kernels.

The interface that the group is settling on at this early stage is based on a structure called security_ops which, by way of a set of subsidiary structures, contains pointers to several dozen functions. The role of each function is to make a security decision in a particular situation, returning a value indicating whether or not a particular operation should be allowed. Thus, for example, before creating a symbolic link the kernel will make a call like:

    error = security_ops->inode_ops->symlink(dir, 
            dentry, oldname);
    if (error)
        goto nice_try_buddy;

The default implementations of these functions in the kernel simply allow anything at all. If a user wishes to impose a particular security policy, it is simply a matter of loading a module which replaces all of those functions with a new set that implements that policy.
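The hook-replacement idea can be modeled in user space. What follows is a minimal sketch and not the code from the actual patch: the structure layouts, the string arguments (the kernel passes inode and dentry pointers), the names register_policy(), unregister_policy() and do_symlink(), and the "no symlinks in /tmp" policy are all assumptions made for illustration.

```c
#include <errno.h>
#include <string.h>

/* User-space model only: simplified stand-ins, not the patch's structures. */
struct inode_security_ops {
    /* Return 0 to allow the operation, a negative errno to refuse it. */
    int (*symlink)(const char *dir, const char *name, const char *target);
};

struct security_operations {
    struct inode_security_ops *inode_ops;
};

/* Default policy: allow anything at all. */
int default_symlink(const char *dir, const char *name, const char *target)
{
    (void)dir; (void)name; (void)target;
    return 0;
}

/* Hypothetical stricter policy: refuse symlinks created in /tmp. */
int strict_symlink(const char *dir, const char *name, const char *target)
{
    (void)name; (void)target;
    return strcmp(dir, "/tmp") == 0 ? -EPERM : 0;
}

struct inode_security_ops default_inode_ops = { .symlink = default_symlink };
struct security_operations default_ops = { .inode_ops = &default_inode_ops };

struct inode_security_ops strict_inode_ops = { .symlink = strict_symlink };
struct security_operations strict_ops = { .inode_ops = &strict_inode_ops };

/* The hook table that the "kernel" consults before each operation. */
struct security_operations *security_ops = &default_ops;

/* What a policy module might do at load and unload time (assumed names). */
void register_policy(struct security_operations *ops) { security_ops = ops; }
void unregister_policy(void) { security_ops = &default_ops; }

/* Caller side, mirroring the check shown above. */
int do_symlink(const char *dir, const char *name, const char *target)
{
    int error = security_ops->inode_ops->symlink(dir, name, target);
    if (error)
        return error;
    /* ... the link itself would be created here ... */
    return 0;
}
```

With the default table in place, do_symlink("/tmp", "a", "b") succeeds; after register_policy(&strict_ops) the same call is refused with -EPERM. Because there is a single security_ops pointer, only one policy can be active at a time.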

This approach is conceptually simple, and has a very low overhead on systems where no added security policy is in use. It is relatively easy to implement; it's mostly a matter of deciding what operations need to be checked, and inserting a security_ops call for each one. A patch implementing this scheme already exists, thanks to the efforts of Greg Kroah-Hartman. It does not implement the full set of calls, of course, but it is a start which gives people something to play with.

There is one obvious limitation in this design: only one security policy can be in place at any given time. There is no way to "stack" multiple policies. That appears to be a deliberate design decision; as soon as you start playing with multiple policies you have the potential for no end of administrative problems and complicated interactions. Nonetheless, a stackable implementation would certainly allow for more flexibility in the creation and use of security policies.

There is also some discussion currently over whether one or more special system calls will be needed for the security module implementation.

This work is proceeding quickly; people who have an interest in how security modules hook into the system may want to make their views known before too long. There is now a web site available for the project, if you want further information.

Block driver API change. The 2.4.4 kernel will contain an incompatible API change that people working with block device drivers, at least, should know about.

The kernel maintains one or more "request queues" for each block driver in the system; each queue holds a structure for every I/O request which is waiting for attention from the device. In general, performance is improved if that queue is allowed to get reasonably long before being handed to the device itself. A long queue allows requests to be sorted to minimize disk head movement, as well as allowing the merging of contiguous requests.

The block I/O subsystem uses a technique called "plugging" to help with sorting and merging. When the request queue is emptied by the device, it will be plugged by the kernel, meaning that no more requests will be passed to the driver. The plug will be maintained for a short period of time while the queue fills, then the plug is pulled and the new set of requests will be processed.
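To see why a longer queue helps, consider what can be done with a batch of requests once the plug is pulled. The sketch below is a user-space illustration only: struct io_request and sort_and_merge() are invented names, and the kernel sorts requests as they are inserted rather than calling qsort() on an array.

```c
#include <stdlib.h>

/* Simplified stand-in for a block I/O request: a run of sectors. */
struct io_request {
    unsigned long sector;     /* first sector of the request */
    unsigned long nr_sectors; /* length of the request in sectors */
};

/* Order requests by starting sector. */
static int cmp_sector(const void *a, const void *b)
{
    const struct io_request *x = a, *y = b;
    return (x->sector > y->sector) - (x->sector < y->sector);
}

/*
 * Sort the queued requests by starting sector, then merge every request
 * that begins exactly where the previous one ends. Returns the number of
 * requests remaining at the front of the array.
 */
size_t sort_and_merge(struct io_request *q, size_t n)
{
    size_t out = 0, i;

    if (n == 0)
        return 0;
    qsort(q, n, sizeof(*q), cmp_sector);
    for (i = 1; i < n; i++) {
        if (q[out].sector + q[out].nr_sectors == q[i].sector)
            q[out].nr_sectors += q[i].nr_sectors; /* contiguous: merge */
        else
            q[++out] = q[i];                      /* gap: keep separate */
    }
    return out + 1;
}
```

Five requests starting at sectors 100, 0, 8, 108, and 200 (with lengths 8, 8, 8, 4, and 8) collapse into three: sectors 0-15, 100-111, and 200-207. With only one request in the queue at a time, none of that merging could happen.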

For most devices, this mechanism works reasonably well. There are exceptions, however. RAM disk devices, for example, do not benefit from request sorting and merging; doing that work is simply a waste of CPU time. Requests for compound devices, such as RAID arrays or disks managed by LVM, cannot usefully be sorted at that level; what looks like a pair of contiguous requests on a RAID volume will likely turn into operations on two or more separate devices later on. To accommodate these needs (and others), the block subsystem provides a function blk_queue_pluggable() which sets up a special "plug" function. Often all that function does is return, effectively disabling plugging.

At least, that's how it worked until recently. As of kernel 2.4.2, devices which simply disable plugging have not worked correctly, and, in 2.4.4, blk_queue_pluggable() is going away entirely. According to kernel hacker Jens Axboe, this change is being made because there are no longer any reasons for disabling plugging. A separate set of functions exists which allows control over sorting and merging of requests. But devices which truly do not benefit from sorting and merging probably should not be using a request queue at all. The 2.4 kernel allows drivers to provide a make_request() function which can be used to receive requests directly, before they go onto any queues.
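The difference between the two paths can be shown with a small user-space model. This is a sketch under stated assumptions, not the kernel's block layer: struct block_device, struct io_request, submit_request() and ramdisk_make_request() are names and layouts invented here, and the real make_request interface has a different signature.

```c
#include <stddef.h>
#include <string.h>

#define SECTOR_SIZE 512
#define QUEUE_MAX   16

/* Simplified request: transfer one sector to or from a buffer. */
struct io_request {
    int write;             /* nonzero for a write, zero for a read */
    unsigned long sector;
    unsigned char *buffer; /* SECTOR_SIZE bytes */
};

struct block_device {
    /* If set, requests go straight here and never touch the queue. */
    int (*make_request)(struct block_device *dev, struct io_request *req);
    struct io_request *queue[QUEUE_MAX];
    size_t queued;
    unsigned char *storage;   /* backing memory, used by the RAM disk */
    unsigned long nr_sectors;
};

/* RAM disk: service the request immediately with a memory copy. */
int ramdisk_make_request(struct block_device *dev, struct io_request *req)
{
    unsigned char *p;

    if (req->sector >= dev->nr_sectors)
        return -1;
    p = dev->storage + req->sector * SECTOR_SIZE;
    if (req->write)
        memcpy(p, req->buffer, SECTOR_SIZE);
    else
        memcpy(req->buffer, p, SECTOR_SIZE);
    return 0;
}

/* Generic submission path: direct handoff if available, else enqueue. */
int submit_request(struct block_device *dev, struct io_request *req)
{
    if (dev->make_request)
        return dev->make_request(dev, req);
    if (dev->queued == QUEUE_MAX)
        return -1; /* queue full */
    dev->queue[dev->queued++] = req; /* wait for sorting and merging */
    return 0;
}
```

A device that sets make_request sees each request as soon as it is submitted, so no sorting, merging, or plugging work is ever done on its behalf; a device that leaves it unset gets the ordinary queued path.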

The reasoning all makes sense, but changes of this nature make it clear that the 2.4 kernel has still not truly stabilized. When the core API is no longer changing, we can say that we have a stable kernel.

Non-GPL firmware in the kernel. Adam Richter posted a note on the Debian-legal list this week pointing out a bit of a licensing problem in the kernel source. Several of the header files in the drivers/usb/serial directory (such as keyspan_usa19_fw.h) contain the following text:

"The firmware contained herein as keyspan_usa19_fw.h is Copyright (C) 1999-2000 Keyspan, A division of InnoSys Incorporated ("Keyspan"), as an unpublished work. This notice does not imply unrestricted or public access to this firmware which is a trade secret of Keyspan, and which may not be reproduced, used, sold or transferred to any third party without Keyspan's prior written consent. All Rights Reserved.
This firmware may not be modified and may only be used with the Keyspan USA-19 Serial Adapter. Distribution and/or Modification of the keyspan.c driver which includes this firmware, in whole or in part, requires the inclusion of this statement."

Needless to say, this language is not exactly compatible with the GPL code that makes up the kernel.

The code in question is firmware for the Keyspan device; it is downloaded into the hardware when the driver initializes itself. In that sense, one can see it as not really being part of the kernel - it's part of the hardware. Certainly the kernel hackers have been willing to see it that way; the inclusion of this firmware is regarded as "mere aggregation," which is allowed by the GPL, even though the code is linked into the kernel image.

Not everybody agrees with that interpretation. But this issue came up on the Debian lists because Debian does not much care whether linking in restricted firmware in this manner is OK or not. Since the firmware is not free software, Debian does not wish to include it as part of its distribution. The Debian Project is highly inflexible in this regard, and most of its developers like it that way. While a conclusion had not been reached as this was being written, it seems likely that Debian will remove the Keyspan drivers from its kernels.

The longer-term solution has two different aspects:

Most modern hardware has code inside it, and that code is generally not free software. Even Debian happily runs on hardware that has restricted firmware in it. Adding the ability to update that firmware does not really change anything. Some tolerance of non-free firmware will likely be required in the future - though it is also important to make vendors aware of the need for better licensing.
In most cases, it is not really necessary to link the firmware into the kernel image itself. A solution for USB devices already exists where a user-space program downloads the firmware into the device via the hotplug mechanism. That removes this code (and its licensing issues) from the kernel, and also makes the kernel image smaller.

The user-mode solution will make a lot of things easier, but it is not likely to go into the mainline kernel until the 2.5 series starts.

Eric in KernelLand. Eric Raymond has had a busy week on the linux-kernel mailing list, and not all of it has been fun. As he seeks to expand his kernel contributions from CML2 into broader parts of how kernel development is done, he is running into resistance.

Relatively uncontroversial has been Eric's taking over responsibility for the Configure.help file. This file provides help text for (in an ideal world) every kernel configuration option. Maintaining this file along with the CML2 configuration system makes some sense, and nobody has complained, even though Eric has stated that he might like to convert the file into an XML-based format.

Eric then released a tool called 'kxref', which attempts to find broken configuration symbols in the kernel source. These symbols can be typos, old configuration options that no longer exist, and other types of related cruft. This tool turned up 731 apparently broken symbols out of 2096 total - seemingly quite a few. Some of them were clearly bugs, but others, as it turns out, were not.
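The basic check behind such a tool can be sketched in a few lines. How kxref actually works is not described here, so the following is purely an assumption-laden illustration: count_unknown_symbols() and is_defined() are invented names, the list of defined symbols is supplied by the caller, and every reference is counted (a real tool would presumably report each distinct symbol once).

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if the symbol sym[0..len) appears in the list of known names. */
static int is_defined(const char *sym, size_t len,
                      const char *const *defined, size_t ndefined)
{
    size_t i;

    for (i = 0; i < ndefined; i++)
        if (strlen(defined[i]) == len && strncmp(defined[i], sym, len) == 0)
            return 1;
    return 0;
}

/*
 * Scan source text for CONFIG_* identifiers and count the references
 * that match none of the defined symbols.
 */
size_t count_unknown_symbols(const char *text,
                             const char *const *defined, size_t ndefined)
{
    size_t unknown = 0;
    const char *p = text;

    while ((p = strstr(p, "CONFIG_")) != NULL) {
        const char *end = p;
        /* Ignore matches inside a longer identifier, e.g. MY_CONFIG_X. */
        int inside = p > text &&
                     (isalnum((unsigned char)p[-1]) || p[-1] == '_');

        while (isalnum((unsigned char)*end) || *end == '_')
            end++;
        if (!inside && !is_defined(p, (size_t)(end - p), defined, ndefined))
            unknown++;
        p = end;
    }
    return unknown;
}
```

A check this simple flags typos such as CONFIG_SPM for CONFIG_SMP, but it also flags any symbol that is defined somewhere the tool did not look, such as a port-specific tree, which is one way for "apparently broken" symbols to turn out not to be broken after all.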

Eric started posting patches to eliminate the dead symbols, and that's where the trouble started. It seems that quite a few of the symbols aren't quite as dead as Eric thought. Or they have already been fixed in other places. Many of the problematic symbols, as it turns out, are in architecture-specific code, and the port maintainers started to get a little grumpy about Eric posting patches for "their" stuff.

The problem is this: the official Linus kernel is not the definitive tree for ports other than the x86, and perhaps the Alpha. Almost all of the other architectures have their own development trees elsewhere; they can be found on the main kernel.org page. Development on ports tends to happen independently of the Linus kernel for long periods of time, with merges happening when things appear to be reasonably stable.

For 2.4, things aren't that stable yet, and most of those merges have not yet happened. Thus, any changes to port-specific code as found in the Linus kernel will be difficult to apply to the real port-specific tree. Cross-port changes of the type being attempted by Eric are always going to present some logistical challenges, but now appears to be an especially poor time. A point later in the stable series, when the port-specific trees are more in sync with Linus's kernel, should provide a better opportunity for this sort of cleanup.

Eric then went on to propose a new scheme for the MAINTAINERS file. This file lists, in theory, who is responsible for each part of the kernel source (curious people can look at the 2.4.3 version). Eric has concluded that this file "doesn't seem to be scaling well," mostly because he has had trouble finding maintainers for code he wants to change.

The new scheme would put a "map block" into most source files, listing who is responsible for it. New tools would then be created to merge these blocks into a coherent whole, and to make it easy, in theory, to find the maintainer for a specific module.

Response to this proposal has been almost uniformly negative. Not everybody agrees that the MAINTAINERS file is not scaling; Alan Cox, for example, says that updates are the real problem; people just don't bother to update the entries in the file. There appears to be some truth to that: is Remy Card really still maintaining the ext2 filesystem? Eric's plan might help somewhat by putting the maintainer entries with the code itself, but he also has a wider goal:

"However, if you think about it, you'll notice there's a common thread in all the proposals I've been making. If you still have trouble seeing it, remember that I hack social systems as much as I hack code. And consider lkml as a social machine. And consider -- carefully -- the things it is demonstrably poor at."

This kind of language tends to turn off kernel hackers, who, in general, probably feel little need to have their social system hacked. At least, not in such an overt way. Eric may yet achieve many of his goals, but a bit of a lighter touch might help.

Other patches and updates released this week include:

Daniel Phillips posted a look at file deletion performance, as a way of figuring out why it takes so long to delete a large directory full of files.
Jes Sorensen announced the creation of a new Logical Volume Manager (LVM) mailing list. Evidently the closed and excessively moderated nature of the old list aggravated a lot of developers; there is also some disgruntlement over coding practices and unfixed bugs in the LVM code. In response, the LVM list has been opened up, but it may have happened too late.
Ingo Molnar tracked down a swapping performance problem and produced a patch to fix it. (There has since been an updated patch, but the original posting explains the nature of the problem.)
Herbert Valerio Riedel released a version of the international kernel patch (which provides cryptographic capabilities) in a pure-module form.
A patch to make NFS work with ReiserFS was posted by Chris Mason. It's not presented as an optimal solution; instead, it's an attempt to produce a minimal patch that can get into the 2.4 kernel.
Alexander Viro has released a new version of his namespaces patch.
Tim Jansen posted version 0.2.0 of his device registry patch.
Ulrich Windl has posted an extension to adjtime() which fixes some limitations in that system call.
The 2001-04-24 release of the hotplug scripts was announced by Greg Kroah-Hartman.
Jeff Garzik has announced a web page for people dealing with ECN problems (see the September, 2000 kernel page).
D.W.Howells has released the fourth version of his R/W semaphore patch.
Bulent Abali announced a patch for "Memory Expansion Technology" (MXT) support. MXT uses hardware support to compress data stored in main memory, thus roughly doubling its effective capacity. (See also this description of the design of the Linux MXT implementation.)
Rusty Russell posted a lengthy netfilter patch intended for the 2.4.4 kernel.
Eric Raymond's latest CML2 patch is cml2-1.2.5.
Rejected patch of the week: somebody named Imel reached an interesting conclusion: "i found out that one of the big problem with linux and most other operating system is the multi-user thing." So he posted a patch which removes all permissions and privilege checking as a step toward the creation of a single-user kernel. Needless to say, the kernel hackers were not impressed...

Section Editor: Jonathan Corbet

April 26, 2001
