
See also: last week's Kernel page.

Kernel development


The current development kernel release is 2.5.3, which was released on January 30 (changelog). The biggest change in the more recent prepatches has been the split of the massive (> 1MB) Configure.help file into multiple, smaller files spread out over the source tree. This change will make those files easier to maintain (it is hoped); in the meantime, however, it has broken a number of the configuration tools. Other changes include a large ReiserFS update and the inclusion of Nathan Scott's extended attribute patch, which paves the way for access control lists and other useful stuff in the future.

Dave Jones's latest is 2.5.2-dj7, which is caught up to 2.5.3-pre6 and 2.4.18-pre7. It adds a number of small fixes, and, of course, the input layer changes (which require some configuration changes - see last week's LWN kernel page).

Guillaume Boissiere's 2.5 status summary has been updated to reflect the current and near-future state of affairs.

The current stable kernel release is still 2.4.17; Marcelo has not released any new prepatches over the last week. Alan Cox has released 2.4.18-pre7-ac1, which he describes as "a standing still release;" it mostly just catches up to the -pre7 prepatch.

For those with more modest hardware, SnapGear has announced the release of a new uClinux kernel based on 2.4.17. Your processor may not have a memory management unit, but now you can run things like ext3 anyway.

Alternate kernel tree of the week: Marcus Grando has announced 2.4.18-pre7-mg1, which adds the reverse mapping VM patch and some netfilter fixes to the 2.4.18 prepatch.

ACPI followup. Andy Grover, Linux ACPI developer, took exception to the discussion of ACPI, and its problems, in last week's LWN kernel page. His note challenges the complaints that have been made against ACPI, and states:

My hope is, the more people gain familiarity of Linux's ACPI code by testing and helping in its development, the more we all can accept it on its merits, and start improving Linux's PnP and power management by using the improved functionality ACPI provides.

His note is worth a read. The simple fact is that ACPI is in our future, whether we like it or not, and we will have to deal with it. The concerns remain, however, and those will have to be dealt with too.

The patch penguin debate. This discussion has been covered widely, from News.com to Slashdot, so we'll try to go over the main points without getting too far into the depths of it.

It all started, of course, with Rob Landley's 'modest proposal' calling for a "patch penguin" to help Linus manage patches from developers.

Okay everybody, this is getting rediculous. Patches FROM MAINTAINERS are getting dropped on the floor on a regular basis. This is burning out maintainers and is increasing the number of different kernel trees (not yet a major fork, but a lot of cracks and fragmentation are showing under the stress). Linus needs an integration lieutenant, and he needs one NOW.

Rob points out that there have been unofficial "patch penguins" in the past. Alan Cox filled that role through much of the 2.3 and 2.4 series, and Dave Jones is doing it in 2.5. In general, the "ac" or "dj" trees have indeed served as a useful staging area for patches on their way to Linus; Rob claims that there should be one such tree with some sort of official blessing from Linus.

The complaints are echoed by a number of developers who feel that their patches have been ignored for too long. Alan Cox goes so far as to suggest that Linus could find himself replaced: "Think gcc, think egcs. History is merely beginning to repeat itself."

Linus, for his part, feels that there is no real problem in how kernel development works. Adding a patch penguin would not help, since said penguin would scale no better than Linus does. The solution to dropped patches is to route them through the appropriate maintainers:

In short: don't try to come up with a "patch penguin". Instead try to help existing maintainers, or maybe help grow new ones. THAT is the way to scalability.

A number of high-profile kernel developers seem to agree with Linus that the system still works.

That is the core of the dispute. The more interesting part, perhaps, is what changes might result from the discussion. It appears that there might actually be a few:

  • Part of the problem seems to be a misunderstanding of Linus's view of a "maintainer." Linus sees "maintainers" as the 10-20 people he trusts to send him good patches - far short of the full list in the kernel maintainers file. He has, however, never spelled out just who the trusted people are, so there is confusion about where patches should really be sent. Linus did post a partial list of developers with "good taste," but it seems incomplete.

    One necessary result, if the existing system is to continue to work, will be a clearer definition of the protocol for getting patches to Linus. The "trusted" people, and their areas of expertise, need to be made explicit.

  • The issue of small patches was recognized, even by Linus, as a problem. Linus tends to lack the time to look over the large number of "one-liner" fixes that get sent in. But these fixes tend to be important, and should not get dropped. So Linus agrees that there may be a place for a "small stuff" patch penguin. Again, Alan Cox has served in that capacity in the past, and Dave Jones is doing it now.

    The addition of a bug-tracking system, and somebody to keep up with it, could only help as well.

  • Linus may actually start using a system to help with patch management - most likely BitKeeper. BitKeeper and its possible use in kernel development was first covered in LWN back in 1999; its adoption has been hindered by a lack of time on Linus's part, and its not-quite-free license. BitKeeper has some seriously nice features, though, and a number of kernel developers are using it for their own work. There are reasons to believe that it could be quite helpful in the management of kernel patches.

    Linus has never taken the time to get good at BitKeeper, but that may change. In one message he promised "to use bk exclusively for two months" if he gets one more feature added.

  • There is a resurgence in interest in online systems ("patchbots") that will help with the submission of patches. Two new development efforts have sprung up to try to develop such systems; the nascent projects can be found here and here.

Much of the coverage of this discussion has portrayed it as a major rift among kernel developers, with ominous overtones of an impending "fork" of the kernel project. The truth of the matter is that no large, collaborative project can continue to function without occasionally taking a look at how its processes work. Kernel development is certainly not without its challenges; with luck, this discussion will help bring about changes that will keep the kernel project sustainable into the future.

rmap, fork, and COW. Last week's discussion of the reverse mapping VM patch omitted a couple of important things that are worth a mention. First and easiest is the fact that the hashed page wait queues discussed as part of the rmap patch were actually implemented by William Lee Irwin. Credit where credit is due.

The discussion of the costs of the rmap patch concentrated on memory use, but (as Daniel Phillips pointed out) we overlooked one other important factor. When a child process is created with the fork() system call, one task that must be performed is the copying of the parent's page tables. When the rmap patch is applied, fork() must also copy all of the reverse mapping entries. The computational cost of this copying is not small; with rmap, the time required for a fork increases by anywhere from 10% (for small applications) to 400% (for something large). A fast fork() implementation is important for overall system performance; a 400% increase is likely to be seen as unacceptable.
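To see where the extra fork() work comes from, here is a toy model in Python. It is only a sketch: the dictionaries standing in for page tables and the reverse map, and the function names, are illustrative assumptions and bear no resemblance to the actual rmap data structures. It shows only that every copied page table entry also requires a new reverse mapping entry.

```python
from collections import defaultdict


def build_rmap(pid, table):
    """Build a toy reverse map (frame -> list of (pid, virtual page)) for one process."""
    rmap = defaultdict(list)
    for vpn, frame in table.items():
        rmap[frame].append((pid, vpn))
    return rmap


def fork_page_table(parent_table, child_pid, rmap=None):
    """Copy a toy page table (virtual page -> physical frame) for a new child.

    If a reverse map is supplied, it gains one entry per copied mapping,
    which is the additional per-page work described above.
    """
    child_table = {}
    for vpn, frame in parent_table.items():
        child_table[vpn] = frame                  # the copy fork() always makes
        if rmap is not None:
            rmap[frame].append((child_pid, vpn))  # the extra work rmap adds
    return child_table
```

In this model the additional work grows with the number of mapped pages, which is consistent with large applications seeing the biggest slowdown.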

There is a fix in the works, however, as described by Daniel Phillips: copy-on-write page tables. The COW idea has the potential to speed up fork() with or without rmap; it can also lead the way to other interesting page table optimizations in the future.

Under the COW scheme, a call to fork() does not result in the copying of the parent process's page tables. Instead, the tables are marked read-only, and their reference count is increased. Both processes then go off and execute with the (now shared) page tables. When either process makes a write access, it will be trapped with a page fault. At that point, the kernel copies the relevant page table (as well as the page being written to) and decreases the reference count on the shared original. The process, which now has its own page table, is then allowed to continue with its write operation.

Forks become very fast, since page tables are no longer copied at that time. If a process eventually accesses much of its memory, those copies will happen, but they will be more evenly spread out over the life of the process. The usual pattern, however, is for a fork() call to be quickly followed by an exec() call, which wipes out the page tables entirely. In this case, the overhead of copying most of the page tables is avoided altogether.
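The scheme described above can be sketched as a small Python simulation. This is a toy model under stated assumptions, not the unreleased patch: it treats each process as having a single page table (a real kernel would copy only the table covering the faulting address), and every class and method name here is hypothetical.

```python
class PageTable:
    """Toy page table: maps virtual page number -> physical frame number."""

    def __init__(self, entries):
        self.entries = dict(entries)
        self.refcount = 1  # how many processes share this table


class Process:
    def __init__(self, table):
        self.table = table


class ToyKernel:
    def __init__(self):
        self.frame_refs = {}   # frame -> number of page tables referencing it
        self.next_frame = 0
        self.table_copies = 0  # how many times a page table was really copied

    def alloc_frame(self):
        frame = self.next_frame
        self.next_frame += 1
        self.frame_refs[frame] = 1
        return frame

    def spawn(self, num_pages):
        """Create a process with num_pages freshly allocated pages."""
        entries = {vpn: self.alloc_frame() for vpn in range(num_pages)}
        return Process(PageTable(entries))

    def fork(self, parent):
        """COW fork: share the parent's table instead of copying it."""
        parent.table.refcount += 1
        return Process(parent.table)

    def do_exec(self, proc):
        """exec() throws the old address space away; nothing is copied."""
        old = proc.table
        old.refcount -= 1
        if old.refcount == 0:
            for frame in old.entries.values():
                self.frame_refs[frame] -= 1
        proc.table = PageTable({})

    def write(self, proc, vpn):
        """Simulate a write access; return the frame that receives the write."""
        if proc.table.refcount > 1:
            # Fault on a shared (read-only) table: give the writer its own copy.
            shared = proc.table
            shared.refcount -= 1
            proc.table = PageTable(shared.entries)
            for frame in proc.table.entries.values():
                self.frame_refs[frame] += 1
            self.table_copies += 1
        frame = proc.table.entries[vpn]
        if self.frame_refs[frame] > 1:
            # The data page is still shared, so it must be copied as well.
            self.frame_refs[frame] -= 1
            frame = self.alloc_frame()
            proc.table.entries[vpn] = frame
        return frame
```

Calling fork() and then do_exec() on the child leaves table_copies at zero, which mirrors the fork-then-exec case; the counter only moves when one of the sharing processes actually writes.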

So COW page tables are a win even in the absence of the rmap patch, and a bigger win when reverse mapping is used. The patch (which has not yet been released) is perhaps even more significant, however, in that it creates the first structure in the Linux kernel for the sharing of page tables. Linux processes can share mappings of memory or files (e.g. shared libraries), but they each have their own page tables for that shared memory. Private page tables are easier to manage, but there are some inefficiencies that result.

Example: most Linux processes have a shared mapping of the C library which occupies just over 1MB of address space (on the author's Debian 'sid' system). This mapping requires almost 300 page table entries (on an i386 system) for every process - and all of them live in unswappable kernel memory. KDE and GNOME applications tend to have many such library mappings, many of which are substantially larger. There would be a real performance advantage in being able to share the page tables for these mappings. The initial COW patch will probably not include support for sharing page tables in this manner, but it is a step in the right direction.
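The arithmetic behind that figure is easy to check. The sketch below assumes the standard i386 configuration of 4KB pages and 4-byte page table entries (no PAE, no large pages); the 1180KB mapping size is a hypothetical stand-in for "just over 1MB", not a measured value.

```python
PAGE_SIZE = 4096  # bytes per page on i386 without large pages
PTE_SIZE = 4      # bytes per page table entry on i386 without PAE


def ptes_for_mapping(mapping_bytes):
    """Number of page table entries needed to map mapping_bytes of address space."""
    return -(-mapping_bytes // PAGE_SIZE)  # ceiling division


def pte_bytes(mapping_bytes, num_processes):
    """Bytes of page table entries used when each process has private tables."""
    return ptes_for_mapping(mapping_bytes) * PTE_SIZE * num_processes


# Hypothetical C library mapping of 1180KB, mapped privately by 100 processes.
libc_mapping = 1180 * 1024
entries = ptes_for_mapping(libc_mapping)
total = pte_bytes(libc_mapping, 100)
```

Page tables are allocated in whole pages, so the real memory cost per process can be higher than the raw entry count suggests; with shared page tables the per-process multiplier would disappear for such mappings.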

Much of this is speculative, however, until the COW page table patch is posted and benchmarked. If it works as expected, and frees the rmap patch of its fork() penalty, the whole mess may well make its way into the 2.5 series. As Linus told Rik van Riel:

You may not believe me when I say so, but I personally _really_ hope your rmap patches will work out. I may not have believed in your patches in a 2.4.x kind of timeframe, but for 2.6.x I'm more optimistic.

If we're really lucky, the 2.6 (or, perhaps, 3.0?) kernel will have a top-quality VM implementation before it's released.

Asynchronous I/O patch writeup. Writing up Ben LaHaise's asynchronous I/O patch has been on the "todo" list for this page for some time. It is an interesting patch; it provides capabilities that some users seem to really need, but it also makes some fundamental changes to the I/O subsystem. We may still take a shot at the AIO patch, but, for now, Suparna Bhattacharya has beaten us to it. Have a look at the writeup for a thorough, detailed treatment of the patch and the reasons for its existence.

Other patches and updates. This section has gotten steadily longer over the years; we're experimenting a bit with its formatting in an attempt to make it more readable.

Core kernel code:

  • The latest preemptive kernel patch is available from Robert Love.

  • A new software suspend patch for 2.4.17 was posted by Pavel Machek.

  • Rusty Russell has updated his per-cpu data patch for 2.5.3-pre6.

  • William Lee Irwin's hashed page waitqueue patch has been ported to 2.5.3-pre6 by Christoph Hellwig.

  • Momchil Velikov has posted a version of his radix tree page cache patch for inclusion into 2.5.3.

Development tools:

  • Karim Yaghmour has released version 0.9.5pre5 of the Linux Trace Toolkit.

  • Jim Houston has posted a patch which adds a kernel trace mechanism to the kdb debugger.

Device drivers:

  • Greg Kroah-Hartman has posted a driverfs implementation for the USB core code. Greg has also updated the USB 2.5 TODO list.

  • Andrew Morton has released a patch which enables DMA transfers of audio data from CDROM drives. (Here's the latest version for those who want to apply the patch).

  • Richard Gooch has posted a new version of his patch which enables a 2.4.18-pre system to handle up to 2080 SCSI disks.

  • Also from Richard are devfs-v199.9 (for 2.4.18-pre7) and devfs-v208 (for 2.5.3-pre6).

  • Jaroslav Kysela has released a set of documentation for the ALSA library API.

Filesystems:

  • LVM 1.0.2 was announced by Heinz J. Mauelshagen.

  • For the more daring, there is a complete reimplementation of LVM (called "device mapper") available from Sistina. This is the beta device mapper release, and the developers are looking for feedback.

  • UVFS 0.2, a user-space filesystem kit, was announced by Britt Park.

  • Steve Best has announced version 1.0.13 of the JFS journaling filesystem.

  • Christoph Hellwig has announced version 0.0.92 of the OpenGFS filesystem.

Kernel building:

  • Justin Piszcz has sent us a detailed description of his "Install Kernel" utility, which helps with kernel builds and installation.

  • Anuradha Ratnaweera has released version 0.1.2 of the kernelconf utility. "Don't use it unless you are really adventurous."

  • CML2 2.2.0 is available from Eric Raymond.

  • Eric W. Biederman has posted a patch which enables the building of a bootable, ELF-formatted kernel. Such a kernel is useful for booting directly from Linux, for network booting, or for use with LinuxBIOS.


Section Editor: Jonathan Corbet


January 31, 2002

Copyright © 2002 Eklektix, Inc., all rights reserved
Linux ® is a registered trademark of Linus Torvalds